Pandas

Table of Contents

Install and Import
Setup
Data Structures
Read and Write
Inspecting Data
Selection & Indexing
Filtering & Boolean Indexing
Sorting
Handling Missing Data
Aggregations & Statistics
Grouping & Pivoting
String Methods (.str)
Datetime Handling (.dt)
Reshaping
Merging & Joining
Apply & Lambda

1. Install and Import

Before you start working with Pandas, you need to make sure it’s installed in your Python environment. Pandas is not included in Python’s standard library, so if you’re using a fresh Python setup (or a new virtual environment), you’ll likely need to install it first.

If you are using Anaconda or Google Colab, Pandas usually comes pre-installed.
If you are using pip (standard Python), you can install it manually.

Once installed, you should import Pandas into your code. The community convention is to import it as pd. This makes your code concise and consistent with most tutorials and examples online.

# Install pandas (if not already installed)

!pip install pandas

pip install pandas → downloads and installs the latest stable version of Pandas from PyPI. The leading ! is notebook syntax (Jupyter, Colab); in a terminal, run the command without it.

# Import pandas with the standard alias

import pandas as pd

import pandas as pd → imports the Pandas library and binds it to the short name pd. This is the universal convention.

# Check the installed version (run this after the import)

print(pd.__version__)

pd.__version__ → prints the version number so you know which release you’re working with (useful for debugging or following tutorials).

2. Setup

Once Pandas is imported, you can customize a few settings to improve your workflow. Pandas provides the pd.set_option() function that allows you to control how DataFrames are displayed in your notebook or terminal. This doesn’t affect the actual data — it only changes how you see it.

The most common options people adjust are:

display.max_rows → maximum number of rows to show when printing a DataFrame.
display.max_columns → maximum number of columns to show.
display.precision → number of decimal places for floating-point numbers.

This helps especially when you’re working with large datasets, where by default Pandas might cut off data or show ... for hidden rows/columns.

# Control number of rows displayed

pd.set_option('display.max_rows', 20)

pd.set_option('display.max_rows', 20) → when printing a DataFrame, show up to 20 rows.

# Control number of columns displayed

pd.set_option('display.max_columns', 10)

pd.set_option('display.max_columns', 10) → when printing a DataFrame, show up to 10 columns.

# Control precision for float values

pd.set_option('display.precision', 3)

pd.set_option('display.precision', 3) → float numbers will be rounded to 3 decimal places in display.

👉 These settings don’t alter your data — they only make the console or notebook display more user-friendly.
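To see that display options affect only the printed output, here is a minimal sketch. The DataFrame and its column name x are made up for illustration, and pd.option_context is used here (it is not mentioned above) so that the setting applies only inside the with block.

```python
import pandas as pd

# Illustrative data, not from the article.
df = pd.DataFrame({"x": [1.23456789, 2.3456789]})

# Temporarily display floats with 3 decimal places.
with pd.option_context("display.precision", 3):
    shown = repr(df)
print(shown)

# The stored value keeps its full precision.
stored = df.loc[0, "x"]
print(stored)

# Change an option globally, then restore its default.
default_rows = pd.get_option("display.max_rows")
pd.set_option("display.max_rows", 20)
pd.reset_option("display.max_rows")
```

pd.reset_option is handy at the end of a notebook session if you want later cells to print with the defaults again.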

3. Data Structures: Series & DataFrame

Series = one labeled column (1-D). Think: a single field like closing price, scores, or department names.
DataFrame = table (2-D) made of multiple Series aligned on the same index. Think: a dataset with many columns (id, name, dept, score, date).

When to use which?

Use a Series for a single variable (e.g., computing the % change of a price series).

Use a DataFrame when columns relate to the same observations (e.g., one row per student or transaction).

Key ideas:

Index → the row labels. You can keep the default 0 ... N−1 or set a meaningful key (e.g., id, date).

Dtypes → numeric, string, datetime, categorical… Choosing the right type helps performance & correctness.
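The ideas above can be sketched with a tiny student dataset. All names and values here are invented for illustration.

```python
import pandas as pd

# A Series: one labeled column of values.
scores = pd.Series([88, 92, 79], index=["s1", "s2", "s3"], name="score")

# A DataFrame: several columns sharing the same index.
students = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy"],
        "dept": ["math", "physics", "math"],
        "score": [88, 92, 79],
    },
    index=["s1", "s2", "s3"],  # a meaningful key instead of 0 ... N-1
)

# Selecting one column of a DataFrame gives back a Series.
col = students["score"]
print(type(col).__name__)

# Inspect the dtypes, then store the repeated labels as a categorical.
print(students.dtypes)
students["dept"] = students["dept"].astype("category")
```

Converting a low-cardinality text column such as dept to category is one common way to act on the "choose the right type" advice.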


Summary

1. Import & Setup

import pandas as pd
pd.__version__ → check version
pd.set_option('display.max_rows', n) → control display

2. Data Structures

Series: pd.Series(data, index)
DataFrame: pd.DataFrame(data, columns, index)

3. Input/Output (I/O)

pd.read_csv('file.csv')
pd.read_excel('file.xlsx')
pd.read_json('file.json')
pd.read_sql(query, connection)
df.to_csv('file.csv')
df.to_excel('file.xlsx')
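A minimal round-trip sketch for the CSV functions follows. It writes to an in-memory buffer rather than a real file so it runs anywhere; the data is invented for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()
print(csv_text)

# read_csv accepts a path or any file-like object.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Without index=False, the row labels are written as an extra unnamed column, which then shows up as a data column when the file is read back.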

4. Inspecting Data

df.head(n) / df.tail(n)

df.info()

df.shape

df.dtypes

df.columns

df.index

df.describe()

5. Selection & Indexing

df['col'] → select column

df[['col1','col2']] → multiple columns

df.loc[row_index, col_name] → label-based

df.iloc[row_index, col_index] → position-based

df.at[row, col] / df.iat[row, col] → fast scalar access
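The difference between label-based and position-based access is easiest to see on a DataFrame whose index is not 0 ... N−1. A small sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "score": [88, 92, 79]},
    index=["s1", "s2", "s3"],
)

by_label = df.loc["s2", "score"]     # row label "s2", column label "score"
by_position = df.iloc[1, 1]          # second row, second column
fast_scalar = df.at["s2", "score"]   # single value by label

# Label slices include the end label; position slices exclude the end.
label_slice = df.loc["s1":"s2", "name"]
position_slice = df.iloc[0:2, 0]
print(by_label, by_position, fast_scalar)
```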

6. Filtering & Boolean Indexing

df[df['col'] > value]

df.query('col > value')

df[(df['A'] > 0) & (df['B'] < 5)]
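As a quick sketch with invented numbers, a boolean mask and the equivalent query string select the same rows:

```python
import pandas as pd

df = pd.DataFrame({"A": [-1, 2, 3, 4], "B": [1, 7, 3, 4]})

# Each comparison yields a boolean Series; combine them with & (and), | (or).
# The parentheses are required because & binds tighter than > and <.
mask = (df["A"] > 0) & (df["B"] < 5)
filtered = df[mask]

# The same condition written as a query string.
queried = df.query("A > 0 and B < 5")
print(filtered)
```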

7. Sorting

df.sort_values('col')

df.sort_values(['col1','col2'], ascending=[True, False])

df.sort_index()
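A short sketch of multi-column sorting, using invented data: dept ascending, then score descending within each dept.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dept": ["math", "physics", "math", "physics"],
        "score": [79, 92, 88, 85],
    }
)

# One ascending flag per sort column.
ordered = df.sort_values(["dept", "score"], ascending=[True, False])
print(ordered)

# Sorting returns a new DataFrame; sort_index puts the rows back in index order.
restored = ordered.sort_index()
```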

8. Handling Missing Data

df.isnull() / df.notnull()

df.dropna()

df.fillna(value)

df.interpolate()
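The four approaches give different results on the same gaps. A minimal sketch on an invented Series (the same methods exist on DataFrames):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

n_missing = s.isnull().sum()   # count the missing values
dropped = s.dropna()           # remove them
filled = s.fillna(0)           # replace them with a constant
interpolated = s.interpolate() # linear interpolation between neighbors
print(n_missing, list(interpolated))
```

Which one is appropriate depends on the data: interpolation suits ordered numeric measurements, while a constant fill may be misleading there.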

9. Aggregations & Statistics

df.sum() / df.mean() / df.median()

df.min() / df.max()

df.std() / df.var()

df.count()

df.value_counts()

df.corr()

10. Grouping & Pivoting

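df.groupby('col').mean(numeric_only=True) → numeric_only=True skips non-numeric columns, which otherwise raise an error in pandas 2.0 and later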

df.groupby(['A','B']).agg({'C':'sum'})

df.pivot(index='col1', columns='col2', values='col3')

df.pivot_table(values='col', index='A', columns='B', aggfunc='mean')
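The practical difference between pivot and pivot_table is how they treat duplicate index/column pairs. A sketch with invented data, where the pair (x, p) appears twice:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["x", "x", "x", "y"],
        "B": ["p", "p", "q", "p"],
        "C": [1, 3, 5, 7],
    }
)

# Group by one column and aggregate another.
means = df.groupby("A")["C"].mean()

# pivot_table aggregates duplicates (here with the mean).
table = df.pivot_table(values="C", index="A", columns="B", aggfunc="mean")
print(table)

# pivot only reshapes, so duplicate (A, B) pairs are an error.
try:
    df.pivot(index="A", columns="B", values="C")
    pivot_failed = False
except ValueError:
    pivot_failed = True
```

Combinations that never occur in the data, such as (y, q) here, appear as missing values in the pivoted table.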

11. String Methods (.str)

df['col'].str.lower() / .upper()

df['col'].str.contains('text')

df['col'].str.replace('a','b')

df['col'].str.len()
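A brief sketch of the .str accessor on an invented Series that includes a missing value. The case, na and regex arguments are shown as options you may want, not as required settings.

```python
import pandas as pd

s = pd.Series(["Alice Smith", "bob jones", None])

lower = s.str.lower()                                   # missing stays missing
has_smith = s.str.contains("smith", case=False, na=False)
replaced = s.str.replace(" ", "_", regex=False)         # literal replacement
lengths = s.str.len()
print(list(has_smith))
```

Passing na=False makes the result a clean boolean mask, which matters when it is used for filtering.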

12. Datetime Handling (.dt)

pd.to_datetime(df['date'])

df['date'].dt.year / .month / .day

df['date'].dt.weekday

df['date'].dt.strftime('%Y-%m-%d')
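The .dt accessor works only once the column holds real datetimes, so conversion comes first. A small sketch with invented dates:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2024-01-15", "2024-02-29", "2024-03-10"]})

# Text to datetime; the .dt accessor is not available on plain strings.
df["date"] = pd.to_datetime(df["date"])

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.weekday            # Monday=0 ... Sunday=6
df["label"] = df["date"].dt.strftime("%d %b %Y") # back to formatted text
print(df)
```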

13. Reshaping

df.melt(id_vars, value_vars)

df.stack() / df.unstack()

df.pivot_table()
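A minimal sketch of going from wide to long with melt and back again with pivot. The column names subject and score are chosen for illustration.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "math": [90, 80], "physics": [85, 75]})

# Wide to long: one row per (id, subject) pair.
long = wide.melt(
    id_vars="id",
    value_vars=["math", "physics"],
    var_name="subject",
    value_name="score",
)
print(long)

# Long back to wide: subjects become columns again.
back = long.pivot(index="id", columns="subject", values="score")
print(back)
```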

14. Merging & Joining

pd.concat([df1, df2])

pd.merge(df1, df2, on='key')

pd.merge(df1, df2, how='left')

df1.join(df2)
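The effect of the how argument is clearest when the two tables only partly overlap. A sketch with two invented tables sharing a key column:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
right = pd.DataFrame({"key": [2, 3, 4], "score": [92, 79, 60]})

# Default (inner) merge keeps only keys present in both tables.
inner = pd.merge(left, right, on="key")

# Left merge keeps every row of left; unmatched rows get missing values.
left_join = pd.merge(left, right, on="key", how="left")

# concat stacks rows; ignore_index renumbers them 0 ... N-1.
stacked = pd.concat([left, left], ignore_index=True)
print(left_join)
```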

15. Apply & Lambda

df['col'].apply(func)

df.applymap(func) → elementwise on DataFrame (deprecated since pandas 2.1, where it was renamed df.map(func))

df.transform(lambda x: x+1)
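A short sketch of apply and transform on invented data, alongside the vectorized form that gives the same result for simple arithmetic:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply on a Series calls the function once per value.
doubled = df["a"].apply(lambda x: x * 2)

# transform returns a result with the same shape as its input.
plus_one = df.transform(lambda x: x + 1)

# For simple arithmetic, the vectorized form is shorter and usually faster.
vectorized = df["a"] * 2
print(plus_one)
```

apply is most useful when the logic cannot be expressed with built-in vectorized operations.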

16. Attributes (quick access)

df.shape → (rows, cols)

df.index → row index

df.columns → column names

df.dtypes → data types

df.size → total elements

df.values → underlying NumPy array (the pandas docs recommend df.to_numpy() instead)

17. Export & Save

df.to_csv('data.csv')

df.to_excel('data.xlsx')

df.to_json('data.json')

df.to_sql(table, connection)

18. Visualization (basic)

df.plot()

df['col'].hist()

df.plot.scatter(x='col1', y='col2')
