Pandas

Table of Contents

  1. Install and Import
  2. Setup
  3. Data Structures
  4. Read and Write
  5. Inspecting Data
  6. Selection & Indexing
  7. Filtering & Boolean Indexing
  8. Sorting
  9. Handling Missing Data
  10. Aggregations & Statistics
  11. Grouping & Pivoting
  12. String Methods (.str)
  13. Datetime Handling (.dt)
  14. Reshaping
  15. Merging & Joining
  16. Apply & Lambda

1. Install and Import

Before you start working with Pandas, you need to make sure it’s installed in your Python environment. Pandas is not included in Python’s standard library, so if you’re using a fresh Python setup (or a new virtual environment), you’ll likely need to install it first.
  • If you are using Anaconda or Google Colab, Pandas usually comes pre-installed.
  • If you are using pip (standard Python), you can install it manually.
Once installed, you should import Pandas into your code. The community convention is to import it as pd. This makes your code concise and consistent with most tutorials and examples online.

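# Install pandas (if not already installed)
!pip install pandas

pip install pandas → downloads and installs the latest stable version of Pandas from PyPI. The leading ! is notebook syntax (Jupyter, Colab) for running a shell command; in a regular terminal, run pip install pandas without it.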

# Import pandas with the standard alias
import pandas as pd

import pandas as pd → imports the Pandas library and shortens the reference name to pd. This alias is the convention used almost everywhere.

# Check the installed version (run this after the import, since it needs pd)
print(pd.__version__)

pd.__version__ → prints the version number so you know which release you’re working with (useful for debugging or following tutorials).

2. Setup

Once Pandas is imported, you can customize a few settings to improve your workflow. Pandas provides the pd.set_option() function that allows you to control how DataFrames are displayed in your notebook or terminal. This doesn’t affect the actual data — it only changes how you see it.
The most common options people adjust are:
  • display.max_rows → maximum number of rows to show when printing a DataFrame.
  • display.max_columns → maximum number of columns to show.
  • display.precision → number of decimal places for floating-point numbers.
This is especially helpful with large datasets, where Pandas truncates the output by default and shows ... in place of the hidden rows or columns.

# Control number of rows displayed
pd.set_option('display.max_rows', 20)

pd.set_option('display.max_rows', 20) → when printing a DataFrame, show up to 20 rows.

# Control number of columns displayed
pd.set_option('display.max_columns', 10)

pd.set_option('display.max_columns', 10) → when printing a DataFrame, show up to 10 columns.

# Control precision for float values
pd.set_option('display.precision', 3)

pd.set_option('display.precision', 3) → floating-point numbers are shown rounded to 3 decimal places.

👉 These settings don’t alter your data — they only make the console or notebook display more user-friendly.
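
To see that display options change only the printed text, here is a minimal sketch. It uses pd.option_context, which applies a setting temporarily inside a with-block; the DataFrame and its column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with more digits than we want to print.
df = pd.DataFrame({"value": [1.23456789, 2.3456789]})

# option_context applies the setting only inside the with-block.
with pd.option_context("display.precision", 3):
    shown = df.to_string()

# The stored value keeps full precision; only the printed text is rounded.
stored = df.loc[0, "value"]

print(shown)
print(stored)
```

Using option_context instead of set_option is a design choice here: the setting is restored automatically once the block ends.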

Summary

1. Import & Setup
  • import pandas as pd
  • pd.__version__ → check version
  • pd.set_option('display.max_rows', n) → control display

2. Data Structures
  • Series: pd.Series(data, index)
  • DataFrame: pd.DataFrame(data, columns, index)
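
As a small illustration of the two structures, the sketch below builds a Series and a DataFrame that share the same row labels. The fruit names and numbers are invented sample data.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# A DataFrame is a table of columns that share one row index.
inventory = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "stock": [10, 0, 7]},
    index=["apple", "banana", "cherry"],
)

print(prices["banana"])   # look up a value by its label
print(inventory.shape)    # (rows, columns)
```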

3. Input/Output (I/O)
  • pd.read_csv('file.csv')
  • pd.read_excel('file.xlsx')
  • pd.read_json('file.json')
  • pd.read_sql(query, connection)
  • df.to_csv('file.csv')
  • df.to_excel('file.xlsx')
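
A CSV round trip can be tried without touching the disk. This sketch assumes an in-memory io.StringIO buffer in place of a real file path; the column names are hypothetical.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# Write to an in-memory text buffer instead of 'file.csv'.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels

# Rewind the buffer and read the same text back.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Passing index=False is a common choice; without it, the row index is written as an extra unnamed column.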

4. Inspecting Data
df.head(n) / df.tail(n)
df.info()
df.shape
df.dtypes
df.columns
df.index
df.describe()
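
A quick sketch of a first look at a dataset, using an invented four-row table:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"city": ["Oslo", "Rome", "Lima", "Cairo"], "temp": [20.0, 22.0, 18.0, 24.0]}
)

first_two = df.head(2)    # first 2 rows
summary = df.describe()   # count, mean, std, min, quartiles, max for numeric columns

df.info()                 # prints column names, non-null counts, and dtypes
print(df.shape)
print(summary)
```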

5. Selection & Indexing
df['col'] → select column
df[['col1','col2']] → multiple columns
df.loc[row_index, col_name] → label-based
df.iloc[row_index, col_index] → position-based
df.at[row, col] / df.iat[row, col] → fast scalar access
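
The difference between label-based and position-based access is easiest to see on a table whose index is not 0, 1, 2. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "stock": [10, 0, 7]},
    index=["apple", "banana", "cherry"],
)

by_label = df.loc["banana", "stock"]   # row label, column name
by_position = df.iloc[2, 0]            # third row, first column
scalar = df.at["apple", "price"]       # single value by label
one_column_frame = df[["price"]]       # double brackets return a DataFrame

print(by_label, by_position, scalar)
```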

6. Filtering & Boolean Indexing
df[df['col'] > value]
df.query('col > @value') → @ refers to a Python variable; a bare name inside the string is read as a column
df[(df['A'] > 0) & (df['B'] < 5)]
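
The sketch below applies the same filter in two ways, once with a boolean mask and once with query. The variable threshold is a hypothetical name; inside a query string, the @ prefix refers to a Python variable.

```python
import pandas as pd

df = pd.DataFrame({"A": [-1, 2, 3, 4], "B": [1, 9, 4, 2]})

# Each condition needs its own parentheses; combine with & (and) or | (or).
mask_result = df[(df["A"] > 0) & (df["B"] < 5)]

# The same filter written as a query string.
threshold = 0
query_result = df.query("A > @threshold and B < 5")

print(mask_result)
```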

7. Sorting
df.sort_values('col')
df.sort_values(['col1','col2'], ascending=[True, False])
df.sort_index()
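
A short sketch of sorting by two columns in different directions, using invented team and score values:

```python
import pandas as pd

df = pd.DataFrame({"team": ["b", "a", "a", "b"], "score": [3, 1, 2, 4]})

# team ascending, then score descending within each team.
ordered = df.sort_values(["team", "score"], ascending=[True, False])

# sort_index restores the original row order by index label.
restored = ordered.sort_index()

print(ordered)
```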

8. Handling Missing Data
df.isnull() / df.notnull()
df.dropna()
df.fillna(value)
df.interpolate()
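
The three common strategies for a missing value (drop it, fill it, or interpolate it) can be compared side by side. A minimal sketch on a single made-up column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

missing_count = df["x"].isnull().sum()  # number of missing values
dropped = df.dropna()                   # remove rows containing NaN
filled = df.fillna(0)                   # replace NaN with a constant
interpolated = df.interpolate()         # linear interpolation between neighbors

print(interpolated)
```

None of these calls changes df itself; each returns a new DataFrame.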

9. Aggregations & Statistics
df.sum() / df.mean() / df.median()
df.min() / df.max()
df.std() / df.var()
df.count()
df.value_counts()
df.corr()
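
A small sketch of a few of these aggregations on invented numbers. Column b is exactly twice column a, so their correlation is 1.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8]})

totals = df.sum()        # one total per column
mean_b = df["b"].mean()  # mean of a single column
corr = df.corr()         # pairwise correlation matrix

# value_counts is most often used on a single column (a Series).
counts = pd.Series(["x", "y", "x"]).value_counts()

print(totals)
print(counts)
```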

10. Grouping & Pivoting
df.groupby('col').mean()
df.groupby(['A','B']).agg({'C':'sum'})
df.pivot(index='col1', columns='col2', values='col3')
df.pivot_table(values='col', index='A', columns='B', aggfunc='mean')
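
To make grouping and pivoting concrete, here is a sketch on a tiny invented table. It selects the numeric column before aggregating, since recent pandas versions raise an error when asked for the mean of a text column.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["x", "x", "y", "y"], "B": ["p", "q", "p", "q"], "C": [1, 2, 3, 4]}
)

# Sum of C for each value of A.
sums = df.groupby("A")["C"].sum()

# One row per A, one column per B, mean of C in each cell.
table = df.pivot_table(values="C", index="A", columns="B", aggfunc="mean")

print(sums)
print(table)
```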

11. String Methods (.str)
df['col'].str.lower() / .upper()
df['col'].str.contains('text')
df['col'].str.replace('a','b')
df['col'].str.len()
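
A brief sketch of the .str accessor on a column that contains a missing value. The na=False and regex=False arguments are choices made here to keep the results explicit.

```python
import pandas as pd

s = pd.Series(["Apple Pie", "banana", None])

lower = s.str.lower()                           # missing values stay missing
has_an = s.str.contains("an", na=False)         # treat missing as "no match"
replaced = s.str.replace("a", "o", regex=False) # literal replacement, not a regex
lengths = s.str.len()

print(has_an)
```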

12. Datetime Handling (.dt)
pd.to_datetime(df['date'])
df['date'].dt.year / .month / .day
df['date'].dt.weekday
df['date'].dt.strftime('%Y-%m-%d')
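
The .dt accessor only works once a column holds real datetimes, so the usual first step is pd.to_datetime. A minimal sketch with two made-up dates:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2024-01-15", "2024-02-29"]})

# Convert text to datetime64 so the .dt accessor becomes available.
df["date"] = pd.to_datetime(df["date"])

df["year"] = df["date"].dt.year
df["weekday"] = df["date"].dt.weekday               # Monday is 0, Sunday is 6
df["label"] = df["date"].dt.strftime("%Y/%m/%d")    # back to formatted text

print(df)
```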

13. Reshaping
df.melt(id_vars, value_vars)
df.stack() / df.unstack()
df.pivot_table()
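
Reshaping is easier to follow as a round trip: melt turns a wide table into a long one, and pivot turns it back. The column names subject and score below are hypothetical.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "math": [90, 80], "art": [70, 60]})

# Wide to long: one row per (id, subject) pair.
long = wide.melt(
    id_vars="id",
    value_vars=["math", "art"],
    var_name="subject",
    value_name="score",
)

# Long back to wide: one column per subject.
back = long.pivot(index="id", columns="subject", values="score")

print(long)
print(back)
```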

14. Merging & Joining
pd.concat([df1, df2])
pd.merge(df1, df2, on='key')
pd.merge(df1, df2, how='left')
df1.join(df2)
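
The effect of the how argument shows up when the two tables do not share all keys. A sketch with invented data:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

# Default inner join: only keys present in both tables.
inner = pd.merge(left, right, on="key")

# Left join: every row of left, with NaN where right has no match.
left_join = pd.merge(left, right, on="key", how="left")

# concat stacks tables on top of each other.
stacked = pd.concat([left, left], ignore_index=True)

print(left_join)
```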

15. Apply & Lambda
df['col'].apply(func)
df.applymap(func) → elementwise on DataFrame (deprecated since pandas 2.1 in favor of df.map(func))
df.transform(lambda x: x+1)
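
A short sketch of applying a function per value, per column, and per row, on made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

doubled = df["a"].apply(lambda x: x * 2)                    # per value in one column
plus_one = df.transform(lambda col: col + 1)                # per column, same shape out
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1) # per row

print(row_sum)
```

For simple arithmetic like this, plain vectorized expressions such as df["a"] * 2 give the same result and are usually faster; apply is most useful when the logic cannot be written that way.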

16. Attributes (quick access)
df.shape → (rows, cols)
df.index → row index
df.columns → column names
df.dtypes → data types
df.size → total elements
df.values → underlying NumPy array (the pandas documentation recommends df.to_numpy() instead)

17. Export & Save
df.to_csv('data.csv')
df.to_excel('data.xlsx')
df.to_json('data.json')
df.to_sql(table, connection)

18. Visualization (basic)
df.plot()
df['col'].hist()
df.plot.scatter(x='col1', y='col2')