II. Python for Data Science
Python Manipulation Using Pandas
1) Basics of DataFrames
| File | Comments | Functionality |
|---|---|---|
| 1. Getting Started with Pandas | Installing and importing Pandas. | - |
| 2. Core Data Structures | Understanding the core data structures of Pandas: DataFrame and Series. | - |
| 3. Creating a DataFrame | Creating a DataFrame from lists, dictionaries, list of dictionaries, arrays, and loading data from an external file (e.g., CSV files) to a DataFrame. | Create New Object |
| 4. Creating a Series | Using pd.Series() | Create New Object |
| 5. Indexing a DataFrame | Indexing by rows or columns, and subsetting the DataFrame using [], loc, and iloc. | Filtering |
| 6. Chained Indexing | Sequential indexing. NOT recommended. | Filtering |
| 7. Boolean Masking | Filtering or subsetting a DataFrame based on a boolean condition (series). Alternative comparison and logical operators for DataFrames. | Filtering |
| 8. Exporting a DataFrame | Exporting a DataFrame as a CSV, Excel, or JSON file. | - |
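
The topics in this table can be sketched in a few lines. The example below is a minimal illustration, not taken from the course files: the sample data and the variable names (`df`, `ages`, `mask`, `csv_text`) are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 25, 40], "city": ["Lima", "Oslo", "Kobe"]}
)
ages = pd.Series([31, 25, 40], name="age")

# Label-based (loc) vs. position-based (iloc) selection.
first_name_by_label = df.loc[0, "name"]   # row label 0, column "name"
first_name_by_position = df.iloc[0, 0]    # first row, first column

# Boolean masking: combine conditions with & or |, each wrapped in parentheses.
mask = (df["age"] > 30) & (df["city"] != "Kobe")
over_30_not_kobe = df[mask]

# Prefer one .loc call over chained indexing such as df[df["age"] > 30]["age_group"] = ...
df.loc[df["age"] > 30, "age_group"] = "30+"

# Export: to_csv() returns the CSV text when no path is given.
csv_text = df.to_csv(index=False)
```

Passing a file path to `to_csv()` writes to disk instead of returning a string.
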
2) Manipulating the Physical DataFrame Object
| File | Comments | Functionality |
|---|---|---|
| 1. Resetting the Index | Resetting the index of a DataFrame using df.reset_index(), either dropping the old index or moving it into a new column. | Index |
| 2. Customize Index | Customizing the index of a DataFrame using the df.index attribute. | Index |
| 3. Setting New Index using df.set_index() | Setting existing columns as the new index of a DataFrame. | Index |
| 4. Multi-Level Indexing | How to set, drop and query multi-index | Index |
| 5. Sorting Rows by Column Values | Using df.sort_values() | Sorting |
| 6. Reordering Columns | Manually reorder columns in desired order. | Sorting |
| 7. Renaming Row Index and Column Names | - | Index |
| 8. Transposing the DataFrame | Flipping columns into rows and rows into columns. | Physical Structure |
| 9. Pivoting the DataFrame | There are two built-in functions: pivot() and pivot_table(). | Physical Structure |
| 10. Merging DataFrames | Merging DataFrame objects using merge(), join(), and concat(). | Physical Structure |
| 11. Stacking | Stacking using concat(), append() (removed in pandas 2.0), and stack(). | Physical Structure |
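
As a rough sketch of how several of these structural operations fit together, the example below uses a small made-up `sales` table. The data and all variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration.
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100, 120, 90, 150],
    }
)

# set_index() moves existing columns into the index; reset_index() moves them back.
indexed = sales.set_index(["region", "quarter"])
restored = indexed.reset_index()

# sort_values() orders rows by one or more columns.
by_revenue = sales.sort_values("revenue", ascending=False)

# pivot() reshapes without aggregating; pivot_table() can aggregate duplicate entries.
wide = sales.pivot(index="region", columns="quarter", values="revenue")

# .T transposes: quarters become rows and regions become columns.
transposed = wide.T

# merge() joins on key columns; concat() stacks objects along an axis.
managers = pd.DataFrame({"region": ["East", "West"], "manager": ["Kim", "Lee"]})
merged = sales.merge(managers, on="region", how="left")
doubled = pd.concat([sales, sales], ignore_index=True)
```

`pd.concat()` also covers the row-appending use case that `append()` served before it was removed.
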
2) Manipulating the Data in the DataFrame Object
| File | Comments | Functionality |
|---|---|---|
| 1. Adding Data to a DataFrame and Series | Adding a row or column to a DataFrame. Adding a value to a series. | Add Data |
| 2. Removing Data in a DataFrame and Series | Removing data in a pandas DataFrame, including removing rows and columns by index or name, replacing values with NaN, and handling missing data with dropna(). | Remove Data |
| 4. How to Handle Missing Data | Checking for missing values with isnull() and isna(), and handling missing values with dropna(). | Missing Data |
| 5. Replacing Values | Using a dictionary with the df.replace() function to remap values in specific columns. | Replace Values |
| 6. Converting DataTypes in a DataFrame | Using astype() to convert the data type of a specific column. | Change Data Type |
| 7. Type Casting | Implicit vs. explicit type conversions, and using df[col].astype() to explicitly change the data type of a column. | Change Data Type |
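
The data-level operations listed above might look like the following in practice. This is a minimal sketch under assumed data: the `item`, `price`, and `grade` columns and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration. Prices start out as strings.
df = pd.DataFrame(
    {"item": ["pen", "ink", "pad"], "price": ["1.5", "3.0", "2.0"], "grade": ["a", "b", "a"]}
)

# Add a column by assignment and a row with concat().
df["in_stock"] = [True, False, True]
new_row = pd.DataFrame({"item": ["cap"], "price": ["0.5"], "grade": ["b"], "in_stock": [True]})
df = pd.concat([df, new_row], ignore_index=True)

# Explicit type conversion with astype().
df["price"] = df["price"].astype(float)

# Remap values in one column by passing a nested dictionary to replace().
df = df.replace({"grade": {"a": "high", "b": "low"}})

# Mark a value as missing, detect it, then drop the affected row.
df.loc[1, "price"] = np.nan
missing_per_column = df.isna().sum()
cleaned = df.dropna(subset=["price"])

# Remove a column by name.
without_grade = cleaned.drop(columns=["grade"])
```

`isnull()` is an alias of `isna()`, so either call gives the same result here.
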
3) Grouping and Aggregation
| File | Comments | Type |
|---|---|---|
| 1. Group By Aggregation | Basic aggregation operations, .apply() and .agg(). | Group By |
| 2. Multi-Level Group By | - | Group By |
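
A short sketch of single-key and multi-level grouping follows. The `orders` data and variable names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical order data, used only for illustration.
orders = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "B"],
        "product": ["tea", "tea", "tea", "jam", "jam"],
        "units": [3, 5, 2, 4, 6],
    }
)

# Basic aggregation on a single key.
units_per_store = orders.groupby("store")["units"].sum()

# .agg() computes several statistics in one pass.
summary = orders.groupby("store")["units"].agg(["sum", "mean", "max"])

# .apply() runs an arbitrary function on each group's values.
units_range = orders.groupby("store")["units"].apply(lambda s: s.max() - s.min())

# Multi-level group by: several keys produce a MultiIndex result.
by_store_product = orders.groupby(["store", "product"])["units"].sum()
```

Individual entries of the multi-level result can be read with a tuple key, for example `by_store_product[("B", "jam")]`.
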
4) Merging and Concatenation
5) Useful Functions
| File | Comments | Functionality |
|---|---|---|
| 1. Checking Data Types with dtypes | Checking the data type of each column in a DataFrame or of a Series. | - |
| 2. Filter DataFrame Columns by select_dtypes() | Filtering DataFrame columns by data type using select_dtypes(). | Filtering |
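
For illustration, the two helpers above can be used as follows. The sample columns are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 25], "height": [1.62, 1.80]})

# dtypes reports the data type of every column as a Series.
column_types = df.dtypes

# select_dtypes() keeps or drops columns based on their data type.
numeric_only = df.select_dtypes(include="number")
non_numeric = df.select_dtypes(exclude="number")
```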