Checking for Duplicate Rows
tags: #python/data_science/eda
Basic Syntax
To check for duplicate rows:
df.duplicated()
This method returns a boolean Series (a mask) with one entry per row. A row is marked True when it is an exact duplicate of an earlier row; by default the first occurrence is marked False:
0 False
1 False
2 False
3 False
4 False
5 True
dtype: bool
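A minimal sketch that reproduces a mask like the one above. The DataFrame, its column names and its values are made up for illustration; the `keep` and `subset` parameters of `duplicated()` are shown because they change which rows get flagged.

```python
import pandas as pd

# Hypothetical sample data: the last row repeats the first one.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve", "Ann"],
    "age": [31, 25, 40, 22, 35, 31],
})

# Default keep="first": the first occurrence is False, later copies are True.
mask = df.duplicated()
print(mask)

# keep=False flags every row that belongs to a duplicated group.
mask_all = df.duplicated(keep=False)

# subset limits the comparison to the listed columns.
mask_name = df.duplicated(subset=["name"])
```

With this data only row 5 is flagged by default, while `keep=False` flags rows 0 and 5.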
Retrieve Total Number of Duplicated Rows
df.duplicated().sum()
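This counts the True values in the boolean Series (True is treated as 1), giving the number of duplicated rows in the DataFrame df. With the default settings, the first occurrence of each repeated row is not counted.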
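A short sketch of the count in practice, using made-up data. The same mask can also be used to look at the duplicated rows themselves, which is often useful alongside the count.

```python
import pandas as pd

# Hypothetical sample data: (2, "b") appears twice, (3, "c") appears three times.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 3, 3],
    "value": ["a", "b", "b", "c", "c", "c"],
})

# Each True counts as 1, so the sum is the number of repeated rows
# (first occurrences are not counted).
n_dupes = int(df.duplicated().sum())
print(n_dupes)

# Boolean indexing with the same mask shows which rows were counted.
dupe_rows = df[df.duplicated()]
print(dupe_rows)
```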
To Check Whether Duplicated Rows Exist in the Overall Dataset
df.duplicated().any()
This returns a single boolean value: True if the DataFrame df contains at least one duplicated row, False otherwise.
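A small sketch of using this as a quick check during EDA. Both DataFrames are invented for illustration, and wrapping the result in `bool()` is just a convenience to get a plain Python boolean.

```python
import pandas as pd

# Hypothetical sample data: one frame without duplicates, one with.
df_clean = pd.DataFrame({"id": [1, 2, 3]})
df_dirty = pd.DataFrame({"id": [1, 2, 2]})

has_dupes_clean = bool(df_clean.duplicated().any())
has_dupes_dirty = bool(df_dirty.duplicated().any())

# Report which frames need a closer look.
for label, flag in [("df_clean", has_dupes_clean), ("df_dirty", has_dupes_dirty)]:
    print(f"{label}: duplicates found" if flag else f"{label}: no duplicates")
```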