Displaying Dataset Description
tags: #python/data_science/eda
To print a concise summary of a DataFrame object, we can use the info() method:
df.info()
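This is a useful method for quickly getting an overview of a DataFrame, including the index range, each column's data type and number of non-null values (from which the number of missing values can be inferred), and the memory usage.
Sample output (info() prints the summary and returns None; the exact layout varies between pandas versions):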
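As a minimal sketch (the DataFrame, its column names, and its values are made up for illustration), the snippet below shows that info() prints its summary rather than returning it, how to capture that summary as a string through the buf parameter, and how to count missing values directly with isna().sum(), since info() only reports non-null counts.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical DataFrame; "score" contains one missing value (NaN).
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "name": ["a", "b", "c", "d"],
        "score": [0.5, np.nan, 0.9, 0.1],
    }
)

# info() prints its summary to standard output and returns None.
result = df.info()

# Passing a text buffer via `buf` captures the summary as a string instead.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# info() reports non-null counts; isna().sum() counts missing values per column.
missing = df.isna().sum()
print(missing)
```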
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
column_1 10000 non-null int64
column_2 10000 non-null object
column_3 10000 non-null float64
column_4 10000 non-null int64
column_5 10000 non-null object
column_6 10000 non-null float64
column_7 10000 non-null int64
dtypes: float64(2), int64(3), object(2)
memory usage: 547.0+ KB
Interpreting the Output
This output provides some basic information about a pandas DataFrame, including:
- The class of the object (pandas.core.frame.DataFrame)
- The index type and range (RangeIndex: 10000 entries, 0 to 9999)
- The total number of columns (7 in this case)
- Each column's name, number of non-null values, and data type (column_1 holds 10000 non-null int64 values, column_2 holds 10000 non-null object values, and so on)
- The number of columns of each data type (dtypes: float64(2), int64(3), object(2))
- The memory usage of the DataFrame (547.0+ KB in this case); the "+" means the true usage may be higher, because by default the contents of object columns are not counted
Since every column reports 10000 non-null values out of 10000 entries, this DataFrame has no missing values.
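Each part of the info() summary also has a programmatic counterpart. The sketch below uses a small hypothetical DataFrame (one object column and one column with a missing value) to show them side by side, including the difference between the shallow memory figure that info() marks with "+" and the deep figure obtained with memory_usage(deep=True) or info(memory_usage="deep").

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is explicitly kept as object dtype,
# and "c" contains one missing value.
df = pd.DataFrame(
    {
        "a": pd.Series([1, 2, 3], dtype="int64"),
        "b": pd.Series(["x", "y", "z"], dtype=object),
        "c": pd.Series([1.5, np.nan, 2.5], dtype="float64"),
    }
)

print(type(df))                  # class of the object
print(df.index)                  # index type and range
print(len(df.columns))           # total number of columns
print(df.count())                # non-null values per column
print(df.dtypes)                 # data type of each column
print(df.dtypes.value_counts())  # number of columns per dtype (the "dtypes:" line)

# Shallow memory counts only the pointers stored in object columns;
# deep=True also counts the Python string objects they point to.
shallow = df.memory_usage().sum()
deep = df.memory_usage(deep=True).sum()
print(shallow, deep)

# info() can report the deep figure directly (no "+" qualifier).
df.info(memory_usage="deep")
```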
Getting the Dimension of the Dataset
To get the dimensions (shape) of the dataset, use the shape attribute (it is an attribute, so no parentheses are needed):
df.shape
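This returns a tuple containing the number of records (rows) followed by the number of features (columns):
([# of records], [# of features])
For the sample DataFrame above, df.shape would return (10000, 7).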
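A short sketch with a hypothetical 4-row, 3-column DataFrame, showing the order of the values in df.shape and how to unpack them:

```python
import pandas as pd

# Hypothetical DataFrame with 4 records (rows) and 3 features (columns).
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5.0, 6.0, 7.0, 8.0], "c": ["w", "x", "y", "z"]}
)

print(df.shape)  # (4, 3): (rows, columns)

# Unpack the tuple into named variables.
n_records, n_features = df.shape

# len(df) also gives the number of rows; len(df.columns) the number of columns.
print(len(df), len(df.columns))
```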