Creating a DataFrame

How do we create a DataFrame?

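To create a DataFrame, call the pd.DataFrame() constructor:

pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Here data holds the values, index and columns set the row and column labels, dtype forces a data type, and copy controls whether the input data is copied.
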
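A minimal sketch of the optional constructor parameters. The values and labels below are made up for illustration:

```python
import pandas as pd

# index labels the rows, columns labels the columns,
# and dtype forces one data type for every column
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=['row1', 'row2'],
    columns=['x', 'y'],
    dtype='float64',
)
print(df)
```
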
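Using the inplace Parameter

The inplace parameter is not an argument of the pd.DataFrame() constructor; it belongs to many DataFrame methods, such as drop(), fillna(), rename(), sort_values(), and reset_index(). It controls whether a method's changes are applied directly to the existing DataFrame (inplace=True) or returned as a new DataFrame.

With the default, inplace=False, the method leaves the original DataFrame unchanged and returns a modified copy. With inplace=True, the original DataFrame is modified and the method returns None.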
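The difference can be seen with drop(), one of the methods that accepts inplace. A small sketch with made-up columns A and B:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Default (inplace=False): drop returns a new DataFrame; df still has column B
new_df = df.drop(columns=['B'])
print(list(df.columns))      # ['A', 'B']
print(list(new_df.columns))  # ['A']

# inplace=True: df itself is modified and the method returns None
result = df.drop(columns=['B'], inplace=True)
print(result)                # None
print(list(df.columns))      # ['A']
```
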

DataFrame from a Dictionary

When creating a pandas DataFrame from a dictionary, each key becomes a column label and its value (usually a list) becomes that column's values.


import pandas as pd

data = {
    'colName': ['list', 'of', 'values', 'any', 'datatype'],
    'colName2': [1, 23, 4, 6, 10],
    # ... more columns, each list with the same length
}

df = pd.DataFrame(data)
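A short sketch of the length requirement: every list in the dictionary must have the same number of elements, or pandas raises a ValueError. The names and ages are illustrative:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
}
df = pd.DataFrame(data)
print(df.shape)  # (3, 2)

# Lists of different lengths cannot be aligned into rows
raised = False
try:
    pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25]})
except ValueError as err:
    raised = True
    print('ValueError:', err)
```
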

DataFrame from a List of Lists

When creating a DataFrame from a list of lists, each inner list represents one row of values. Make sure the position of each value in the inner list matches the column it belongs to.

import pandas as pd

data = [['Alice', 25, 'New York'],
        ['Bob', 30, 'San Francisco'],
        ['Charlie', 35, 'Los Angeles']]

df = pd.DataFrame(data, columns=['Name', 'Age', 'City']) # pass columns to name them; otherwise they are labeled 0, 1, 2
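If columns is omitted, pandas labels the columns with integers starting at 0. A minimal sketch showing the default labels and renaming them afterwards:

```python
import pandas as pd

data = [['Alice', 25, 'New York'],
        ['Bob', 30, 'San Francisco']]

# Without columns=, the columns are labeled 0, 1, 2
df = pd.DataFrame(data)
default_cols = list(df.columns)
print(default_cols)  # [0, 1, 2]

# Labels can still be attached later by assigning to df.columns
df.columns = ['Name', 'Age', 'City']
print(df.loc[1, 'City'])  # San Francisco
```
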

DataFrame from a List of Dictionaries

When passing a list of dictionaries, each dictionary represents one row of values, and each key: value pair maps the value to its corresponding column.

import pandas as pd

data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'Age': 30, 'City': 'San Francisco'},
        {'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}]

df = pd.DataFrame(data)
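The dictionaries do not need identical keys. In this sketch (made-up rows), a key missing from one dictionary is filled with NaN in that row:

```python
import pandas as pd

data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'City': 'San Francisco'}]  # no 'Age' key

df = pd.DataFrame(data)
print(df)

# Bob's missing Age becomes NaN, so the Age column is stored as float64
print(df['Age'].isna().tolist())  # [False, True]
```
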

DataFrame from a NumPy Array

You can also create a DataFrame from a NumPy array:

import pandas as pd
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]]) # each inner list is a row

df = pd.DataFrame(data, columns=['A', 'B', 'C']) # optional; without columns they are labeled 0, 1, 2
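A sketch that also sets row labels with index. Because a NumPy array holds a single dtype, every column of the resulting DataFrame starts out with that same dtype. The labels r1 to r3 are illustrative:

```python
import numpy as np
import pandas as pd

# A 2-D array with 3 rows and 2 columns: [[0, 1], [2, 3], [4, 5]]
arr = np.arange(6).reshape(3, 2)

df = pd.DataFrame(arr, columns=['A', 'B'], index=['r1', 'r2', 'r3'])
print(df)

# All columns share the array's dtype
print(df.dtypes.nunique())  # 1
print(df.loc['r3', 'B'])    # 5
```
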

DataFrame from an External File

To read data from an external source (e.g., a CSV file, an Excel workbook, or a SQL database):

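import sqlite3

import pandas as pd

# Reading from CSV
df_csv = pd.read_csv('filename.csv')

# Reading from Excel (.xlsx files need an engine such as openpyxl installed)
df_excel = pd.read_excel('filename.xlsx')

# Reading from SQL: connection must be an open database connection,
# e.g. a sqlite3 connection ('database.db' and table_name are placeholders)
connection = sqlite3.connect('database.db')
df_sql = pd.read_sql('SELECT * FROM table_name', connection)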
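A self-contained sketch that runs without any files on disk, assuming an in-memory text buffer in place of a CSV file and an in-memory SQLite database in place of a real one. The table name people and its rows are made up; Excel is left out because it needs an actual workbook and an extra engine:

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts any file-like object, so a StringIO buffer stands in for a file
csv_text = "Name,Age\nAlice,25\nBob,30\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)

# SQL: write the rows to a hypothetical 'people' table in an in-memory database,
# then read back only the rows that match a query
connection = sqlite3.connect(':memory:')
df_csv.to_sql('people', connection, index=False)
df_sql = pd.read_sql('SELECT * FROM people WHERE Age > 26', connection)
print(df_sql)
connection.close()
```
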