Getting a Quick Statistical Summary

tags: #python/data_science/eda

To get a quick, simple description of the data, we can use the describe() method.

By default, it returns a statistical summary of every column with a numerical datatype (e.g., int, float): the count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum. This gives a high-level overview of how each column is distributed and helps spot potential outliers.

df.describe()
       numerical_column_1  numerical_column_2  numerical_column_3
count         1000.000000         1000.000000         1000.000000
mean            50.345000           25.678000            3.456000
std              7.546943            5.123456            1.234567
min             35.000000           18.000000            1.000000
25%             45.000000           22.000000            2.500000
50%             50.000000           25.000000            3.000000
75%             55.000000           29.000000            4.000000
max             70.000000           40.000000            6.000000
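
As a minimal sketch of the call above, the snippet below builds a small DataFrame and summarizes it. The column names and values are made up for illustration and are not the data behind the output shown above.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, 29, 52],
        "income": [32000.0, 48000.0, 61000.0, 39000.0, 75000.0],
        "city": ["Oslo", "Lima", "Oslo", "Pune", "Lima"],
    }
)

# With no arguments, describe() summarizes only the numeric columns,
# so "city" is left out of the result.
summary = df.describe()
print(summary)
```

Because describe() returns a DataFrame, a single statistic can be read back with a label lookup such as summary.loc["mean", "age"].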
Using the include Parameter

The include parameter of the pandas describe() method specifies which data types are included in the summary statistics. Setting include='all' summarizes every column, regardless of its data type (numeric or object). Statistics that do not apply to a column's type are reported as NaN.

For object columns, you get count, unique (number of distinct values), top (most frequently occurring value), and freq (how many times the top value occurs).
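
A short sketch of include='all' on mixed data follows. The DataFrame is hypothetical, chosen so that one text value is clearly the most frequent.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, 29, 52],
        "city": ["Oslo", "Lima", "Oslo", "Pune", "Oslo"],
    }
)

# include="all" summarizes both columns in one table.
full = df.describe(include="all")
print(full)

# Text-only statistics for the "city" column.
print(full.loc[["count", "unique", "top", "freq"], "city"])
```

In the combined table, rows such as mean and std are NaN for the text column, and rows such as unique and top are NaN for the numeric column.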
