Data Science II - Introduction to Pandas

February 6, 2020 · 4 mins read

Pandas is an open source library built on top of NumPy that lets us analyse and clean data before further processing.

Pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.

1. Pandas features
2. How to install Pandas?
3. What are we going to learn?
4. Series in Pandas
5. DataFrames in Pandas
6. DataFrame Operations
7. Working on missing data in DataFrames
8. Applying GroupBy (Aggregate) operations in Pandas
9. Merging, concatenating and joining DataFrames in Pandas

If you are starting out as a data scientist, Pandas is the next library you need to know. Because it is built on top of NumPy, we studied NumPy first.

Pandas features

The Pandas library has built-in visualization tools, which we are going to discuss in the next few parts.
It can read data from a wide variety of sources, such as CSV files, Excel spreadsheets, SQL databases and JSON, and help us clean it up.

There are many other features you might find useful, but these are enough to get started.
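As a small illustration of reading and inspecting data, here is a sketch that loads a CSV with pd.read_csv and counts the missing values in each column. The CSV is held in memory with io.StringIO so the example runs on its own; the names, ages and cities are made up, and the file name data.csv in the comment is only a placeholder.

```python
import io

import pandas as pd

# A small, made-up CSV held in memory. With a real file you would pass
# its path instead, e.g. pd.read_csv("data.csv") (placeholder name).
csv_text = """name,age,city
Asha,28,Delhi
Ravi,,Mumbai
Meera,35,
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df)
# Empty fields are read as NaN; isna().sum() counts them per column
print(df.isna().sum())
```

Because one age is missing, Pandas stores the age column as floats so it can hold NaN.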

How to install Pandas?

Installing Pandas is similar to installing NumPy, which we did in the last part.

Inside your virtual environment, run the following command:

pip install pandas
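To check that the installation worked, you can import Pandas in a Python session and print its version. Importing it as pd is a widely used convention, not a requirement.

```python
import pandas as pd

# If this import succeeds, Pandas is installed in the active environment
print(pd.__version__)
```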

What are we going to learn?

We are going to learn various methods of the Pandas library that will help us clean and analyse data.

Some important terms we will see in this post are:

Series
DataFrames
Missing Data
GroupBy
Merging, Joining and Concatenating …

Series in Pandas

A Pandas Series is similar to a NumPy array, except that every element of a Series is associated with an index label.

Let’s jump to the Jupyter notebook to learn more about the Pandas series.

Pandas Series Jupyter notebook

Recap

In this notebook, we discussed:

How to create a Pandas Series?
How to create a Pandas Series with custom indexes?
How to create a Series from a Python dictionary?
How to select elements from a Pandas Series?
How to apply arithmetic operations to a Pandas Series?
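For reference, here is a minimal sketch of the steps in the recap. It is not the code from the notebook; the values, labels and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series from a plain list gets a default integer index: 0, 1, 2
s = pd.Series([10, 20, 30])

# The same values with custom index labels
labels = ["a", "b", "c"]
s_custom = pd.Series([10, 20, 30], index=labels)

# A Series can also wrap a NumPy array
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=labels)

# From a dictionary, the keys become the index labels
marks = {"maths": 90, "physics": 85, "chemistry": 78}
s_dict = pd.Series(marks)

# Selecting elements by label, or by position with .iloc
print(s[0])                            # label 0 in the default index -> 10
print(s_custom["b"])                   # label "b" -> 20
print(s_custom.iloc[1])                # second position -> 20
print(s_dict[["maths", "chemistry"]])  # a list of labels returns a smaller Series

# Arithmetic is applied element by element
print(s_custom * 2)                    # a: 20, b: 40, c: 60
```

Using .iloc for positions and plain labels (or .loc) for labels keeps the two kinds of lookup from being confused when the index itself contains integers.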

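NOTE: Arithmetic on integer Series can return floats. This happens when the indexes of the two Series don't fully match (labels present in only one Series become NaN, which is a float) and when you use true division with /.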
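A short sketch of when integer Series turn into floats. The Series and labels are made up. Adding two integer Series with the same index keeps an integer dtype, while a partly matching index or true division gives float64.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["x", "y", "z"])
c = pd.Series([10, 20, 30], index=["x", "y", "w"])

same_index = a + b      # labels line up, so the result stays an integer dtype
partial_index = a + c   # "z" and "w" exist in only one Series, so they become NaN
divided = b / a         # true division always produces floats

print(same_index.dtype)     # int64
print(partial_index)
print(partial_index.dtype)  # float64
print(divided.dtype)        # float64
```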

DataFrames in Pandas

A Pandas DataFrame is a collection of Pandas Series that share the same index, arranged as the columns of a table.

Let’s jump on to the Jupyter notebook and continue the discussion over there.

Pandas DataFrame Jupyter notebook

In this notebook, we learned about the following topics:

How to create Pandas DataFrames?
How to select a column (as a Series) from a DataFrame?
How to add new data to a Pandas DataFrame?
How to remove columns (Series) from a Pandas DataFrame?
How to select rows from Pandas DataFrames?
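Here is a minimal sketch covering the topics above. It is not the notebook's code; the table contents, the row labels r1 to r3 and the age_next_year column are made up. It uses drop(columns=...), which does the same as drop(..., axis=1).

```python
import pandas as pd

# Create a DataFrame from a dictionary: each key becomes a column
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera"],
        "age": [28, 34, 25],
        "city": ["Delhi", "Mumbai", "Pune"],
    },
    index=["r1", "r2", "r3"],
)

# A single column comes back as a Series; a list of columns as a DataFrame
ages = df["age"]
subset = df[["name", "city"]]

# Add a new column computed from an existing one
df["age_next_year"] = df["age"] + 1

# Remove a column: drop returns a new DataFrame, so reassign the result
df = df.drop(columns=["age_next_year"])

# Select rows by label with .loc and by integer position with .iloc
row_by_label = df.loc["r2"]
row_by_position = df.iloc[0]
first_two_rows = df.iloc[0:2]

print(df)
print(row_by_label)
```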

DataFrame Operations

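Python's ordinary "and" and "or" operators don't work for combining conditions on a Series: they try to reduce the whole Series to a single True or False value, which is ambiguous, so Pandas raises an error. Use the element-wise operators & (and), | (or) and ~ (not) instead, and wrap each condition in parentheses.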
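A small sketch of filtering rows with combined conditions, using a made-up table of ages and scores. It also shows the error raised when "and" is used on two Series.

```python
import pandas as pd

df = pd.DataFrame({"age": [28, 34, 25, 41], "score": [72, 88, 95, 60]})

# Each comparison produces a boolean Series
older = df["age"] > 30
high_score = df["score"] > 80

# Parentheses are needed because & and | bind more tightly than > or <
both = df[(df["age"] > 30) & (df["score"] > 80)]
either = df[(df["age"] > 30) | (df["score"] > 80)]
younger = df[~older]

# Python's `and` asks for the truth value of a whole Series -> ValueError
error_message = None
try:
    df[older and high_score]
except ValueError as err:
    error_message = str(err)

print(both)
print(either)
print(error_message)
```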

Pandas DataFrame Operations Notebook

In this notebook we learned about the different methods used in Pandas to select, manipulate and operate on Pandas DataFrames.

Working on missing data in DataFrames

Pandas provides several methods for finding, removing and filling missing data in DataFrames, such as isna(), dropna() and fillna().

Let's head over to the Jupyter notebook and learn more about how to handle missing data in a DataFrame.
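Here is a minimal sketch of the main missing-data methods on a made-up DataFrame. Filling with each column's mean is just one possible strategy, chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1.0, 2.0, np.nan],
        "B": [5.0, np.nan, np.nan],
        "C": [1.0, 2.0, 3.0],
    }
)

# isna() marks missing values as True; sum() counts them per column
print(df.isna().sum())

# dropna() removes every row that contains at least one NaN
rows_without_nan = df.dropna()

# axis=1 drops columns that contain NaN instead of rows
cols_without_nan = df.dropna(axis=1)

# thresh=2 keeps only rows with at least 2 non-missing values
at_least_two = df.dropna(thresh=2)

# fillna() replaces NaN; here each column is filled with its own mean
filled = df.fillna(df.mean())

print(filled)
```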

Applying GroupBy (Aggregate) operations in Pandas

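GroupBy operations let us split the rows of a DataFrame into groups based on the values in one or more columns, and then apply an aggregate function such as sum, mean or count to each group.

Let's jump into the Jupyter notebook and learn how we can apply GroupBy techniques to a Pandas DataFrame.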
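A minimal sketch of GroupBy with aggregation. The team, player and points data is made up. Selecting the numeric points column before aggregating keeps the text column out of calculations like mean.

```python
import pandas as pd

games = pd.DataFrame(
    {
        "team": ["red", "red", "blue", "blue", "green", "green"],
        "player": ["Asha", "Ravi", "Meera", "Kabir", "Isha", "Dev"],
        "points": [10, 20, 5, 15, 30, 40],
    }
)

# Split the rows by team, then aggregate the points column per group
by_team = games.groupby("team")["points"]

print(by_team.sum())    # blue 20, green 70, red 30
print(by_team.mean())   # blue 10.0, green 35.0, red 15.0
print(by_team.count())  # 2 rows per team

# agg() applies several aggregate functions at once
summary = by_team.agg(["sum", "mean", "max"])
print(summary)
```

By default the group keys are sorted, which is why blue comes first in the output.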

Merging, concatenating and joining DataFrames in Pandas

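Pandas offers three main ways to combine DataFrames: pd.concat() stacks them along rows or columns, pd.merge() combines them on the values of shared columns (similar to a SQL join), and DataFrame.join() combines them on their indexes. Let's jump to the Jupyter notebook to learn more about this.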
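Here is a small sketch of concatenating, merging and joining, using made-up DataFrames keyed by K0 to K4. It is only one way to combine these tables; merge and join also accept other how= options such as "left", "right" and "outer".

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["K3", "K4"], "A": [4, 5]})
df3 = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": [10, 20, 30]})

# Concatenating: stack DataFrames with the same columns on top of each other.
# ignore_index=True builds a fresh 0..n-1 index instead of repeating 0, 1, 2, 0, 1.
stacked = pd.concat([df1, df2], ignore_index=True)

# Merging: combine rows that share a value in a common column, like a SQL join.
# how="inner" keeps only keys present in both DataFrames (K0 and K1 here).
merged = pd.merge(df1, df3, on="key", how="inner")

# Joining: combine on the index, so "key" is moved into the index first.
# join() defaults to a left join, so K2 is kept with a missing (NaN) B value.
joined = df1.set_index("key").join(df3.set_index("key"))

print(stacked)
print(merged)
print(joined)
```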

That's it for this part of the series. I will keep adding more operations and methods to this post if I find something interesting.

Ranvir Singh

