Checking for Duplicate Rows

tags: #python/data_science/eda

Basic Syntax

To check for duplicate rows:

df.duplicated()

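This method returns a boolean Series with one entry per row. An entry is True when the row is an exact repeat of an earlier row. By default (keep="first") the first occurrence of a row is marked False and only its later repeats are marked True: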

0    False
1    False
2    False
3    False
4    False
5     True
dtype: bool
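The output above can be reproduced with a small DataFrame. The following is a minimal sketch: the column names and values are made up for illustration, and the keep and subset arguments are shown as optional variations that the note itself does not use.

```python
import pandas as pd

# Hypothetical sample data: row 5 repeats row 0 exactly.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve", "Ann"],
    "age": [30, 25, 41, 25, 37, 30],
})

# Default (keep="first"): only the later repeat is flagged.
mask_first = df.duplicated()

# keep=False flags every row that belongs to a duplicated group,
# including the first occurrence.
mask_all = df.duplicated(keep=False)

# subset restricts the comparison to the listed columns.
mask_age = df.duplicated(subset=["age"])

print(mask_first.tolist())
print(mask_all.tolist())
print(mask_age.tolist())
```

With this data the default mask matches the output shown above: only row 5 is True.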

Retrieve the Total Number of Duplicated Rows

df.duplicated().sum()

This counts the True values in that Series, which gives the number of duplicated rows in the DataFrame df. With the default settings the first occurrence of each row is not counted, only its later repeats.
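As a rough illustration of what the count means, the sketch below uses made-up data in which one row appears three times. It compares the default count with the count under keep=False, and cross-checks the default count against drop_duplicates(). The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: ("Oslo", 1) appears three times.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Oslo", "Lima"],
    "code": [1, 2, 1, 1, 3],
})

# Repeats only: the first ("Oslo", 1) row is not counted.
n_repeats = int(df.duplicated().sum())

# Every row that has at least one identical twin.
n_involved = int(df.duplicated(keep=False).sum())

# Cross-check: rows that drop_duplicates() would remove.
n_dropped = len(df) - len(df.drop_duplicates())

print(n_repeats, n_involved, n_dropped)
```

Here the default count is 2 while keep=False gives 3, so it is worth deciding which of the two numbers is wanted before reporting it.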

To Check Whether Duplicated Rows Exist in the Overall Dataset

df.duplicated().any()

This returns a single boolean value: True if the DataFrame df contains at least one duplicated row, and False otherwise.
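One possible use of this check is as a guard before inspecting the offending rows. The sketch below assumes a small made-up DataFrame, and the names has_duplicates and duplicate_rows are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: rows 1 and 2 are identical.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "score": [0.5, 0.7, 0.7, 0.9],
})

# Convert the NumPy boolean to a plain Python bool for clarity.
has_duplicates = bool(df.duplicated().any())

if has_duplicates:
    # keep=False selects every copy, so the repeats can be viewed together.
    duplicate_rows = df[df.duplicated(keep=False)]
    print(duplicate_rows)
```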
