[Solved] How to find and count the duplicated rows between two different DataFrames? [closed]

Accepted Answer

Sort the columns c_x and c_y row-wise in both DataFrames so that each pair has one canonical order. For the movies, reshape df2 with DataFrame.pivot, count the non-missing values per row with DataFrame.count, and then join the result to df1:

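import numpy as np

# Sort each (c_x, c_y) pair row-wise so that (dan, bob) and (bob, dan) match
df2[['c_x','c_y']] = np.sort(df2[['c_x','c_y']], axis=1)

# Number the movies within each pair: 1, 2, ...
df2['g'] = df2.groupby(['c_x','c_y']).cumcount().add(1)

# One row per pair, one column per movie (a list-valued index needs pandas 1.1+)
df2 = df2.pivot(index=['c_x','c_y'], columns="g", values="movie").add_prefix('movie')
# Count the non-missing movies in each row
df2['number'] = df2.count(axis=1)
print(df2)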
g       movie1 movie2  number
c_x c_y                      
bob dan      c      f       2
    uni      a      f       2
kim kim      a    NaN       1
    lee      a      b       2

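Then sort the pairs in df1 the same way and join the counts on both columns:

# Same row-wise sort, so the keys in df1 line up with the index of df2
df1[['c_x','c_y']] = np.sort(df1[['c_x','c_y']], axis=1)

# Left join: pairs in df1 with no match in df2 get NaN
df = df1.join(df2, on=['c_x','c_y'])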
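
For anyone who wants to run the whole thing end to end, here is a minimal sketch of the steps above. The contents of df1 and df2 are hypothetical (the question's original data is not shown here); the rows were picked so that the pivoted table matches the printed output in this answer. The names cols, counts and result are also only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical input data, NOT taken from the original question.
df1 = pd.DataFrame({
    "c_x": ["dan", "kim", "uni", "kim", "amy"],
    "c_y": ["bob", "lee", "bob", "kim", "zed"],
})
df2 = pd.DataFrame({
    "c_x": ["bob", "dan", "uni", "bob", "kim", "lee", "kim"],
    "c_y": ["dan", "bob", "bob", "uni", "kim", "kim", "lee"],
    "movie": ["c", "f", "a", "f", "a", "a", "b"],
})

cols = ["c_x", "c_y"]

# Sort each pair row-wise so (dan, bob) and (bob, dan) become the same key.
df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)
df2[cols] = np.sort(df2[cols].to_numpy(), axis=1)

# Number the movies within each pair, then spread them into columns.
df2["g"] = df2.groupby(cols).cumcount().add(1)
counts = df2.pivot(index=cols, columns="g", values="movie").add_prefix("movie")

# Count the non-missing movies per pair.
counts["number"] = counts.count(axis=1)

# Left join on the pair; pairs missing from df2 end up with NaN.
result = df1.join(counts, on=cols)
print(result)
```

The pair (amy, zed) does not occur in df2, so its movie columns and number come back as NaN. If a 0 is preferred there, result["number"].fillna(0) is one option.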