We’ll be showing you how to filter a Pandas DataFrame in Python. So let’s set the scene. You work for a massive food company that sells deli meats and fruits. You’ve been toiling at your computer for hours trying to create a DataFrame of your company’s product info. Department XYZ needs that data so they can estimate average costs for some random-ass quarterly presentation. To get it, you had to hit an API thousands of times, because the only way to gather the product info was to reverse engineer the APIs behind your own company’s website. Why won’t Karen just give you direct access to the database? Because she sucks. The only problem is that the data you now have includes rows you do not need. Department XYZ told you they didn’t need information on the deli meats. This quarter we’re focusing on fruits!
My Computer Setup
- Python 3 (finally getting used to adding parentheses around my print statements)
- macOS
Getting Your Environment Set Up
You can find some info on installing pip here. The article actually covers pip installation for Python 2, and the Python 3 pip installation differs slightly. I’ll be sure to add and link another article in the near future. So, assuming you have pip for Python 3 installed, execute the following to get the latest pandas package up and running:
pip3 install pandas
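Once the install finishes, a quick way to confirm it worked is to import pandas and print its version. This is just a sanity check, not a required step, and the version you see will depend on your machine:

```python
import pandas as pd

# If this import succeeds, pandas is installed for this interpreter.
# The printed version string will vary from machine to machine.
print(pd.__version__)
```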
Setting up the Problem
So let’s set up our sample data. We’ll be using pandas here.
import pandas as pd

# Start from an empty DataFrame, then add one column at a time
df = pd.DataFrame()
df['category'] = ['Deli', 'Fruits', 'Deli', 'Fruits', 'Fruits']
df['price'] = [1, 2, 3, 4, 5]
df['name'] = ['bologna', 'apple', 'turkey', 'orange', 'banana']
>>> df
  category  price     name
0     Deli      1  bologna
1   Fruits      2    apple
2     Deli      3   turkey
3   Fruits      4   orange
4   Fruits      5   banana
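As an aside, the same sample data can also be built in a single call by passing a dict of columns to pd.DataFrame. This is a minimal sketch of that alternative, which should give the same frame as the column-by-column version:

```python
import pandas as pd

# Each dict key becomes a column name; each list holds that column's values.
df = pd.DataFrame({
    'category': ['Deli', 'Fruits', 'Deli', 'Fruits', 'Fruits'],
    'price': [1, 2, 3, 4, 5],
    'name': ['bologna', 'apple', 'turkey', 'orange', 'banana'],
})

print(df)
```

Either approach works for a toy example like this one. The dict form just keeps the whole table in one place.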
Solution
So now that we have the sample data set for Karen in department XYZ, we can see that there’s info she doesn’t want. Soooo typical of Karen. We’ll use the built-in filtering abilities of Pandas DataFrames to clean up her data. All we need to do is filter out the rows labeled ‘Deli’. The comparison df['category'] != 'Deli' is True for every row we want to keep, and df.loc keeps exactly those rows.
df2 = df.loc[df['category'] != 'Deli']
>>> df2
  category  price    name
1   Fruits      2   apple
3   Fruits      4  orange
4   Fruits      5  banana
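To see what that one-liner is doing, it can help to split it into two steps: build the boolean mask first, then hand it to .loc. The sketch below rebuilds the sample data so it runs on its own, and finishes with the average price Department XYZ was after. The names mask and avg_price are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Deli', 'Fruits', 'Deli', 'Fruits', 'Fruits'],
    'price': [1, 2, 3, 4, 5],
    'name': ['bologna', 'apple', 'turkey', 'orange', 'banana'],
})

# Comparing a column to a value gives a boolean Series, one entry per row.
mask = df['category'] != 'Deli'
print(mask.tolist())  # [False, True, False, True, True]

# .loc keeps only the rows where the mask is True.
df2 = df.loc[mask]

# Average price of the remaining (fruit) rows: (2 + 4 + 5) / 3
avg_price = df2['price'].mean()
print(avg_price)
```

Note that df2 keeps the original index labels (1, 3, 4). If you would rather have them renumbered from 0, df2.reset_index(drop=True) does that.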
Pretty succinct and straightforward. We could have used quite a few other methods to filter a Pandas DataFrame in Python, such as DataFrame.query or Series.isin. Feel free to give me some feedback and happy coding! Hopefully Karen’s happy with our work.
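For the curious, here is a rough sketch of a few of those other methods. All three should select the same fruit rows as the .loc version on this sample data. The variable names are only for illustration, and which style reads best is mostly a matter of taste:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Deli', 'Fruits', 'Deli', 'Fruits', 'Fruits'],
    'price': [1, 2, 3, 4, 5],
    'name': ['bologna', 'apple', 'turkey', 'orange', 'banana'],
})

# 1. Plain bracket indexing, keeping the category we want
by_equality = df[df['category'] == 'Fruits']

# 2. isin(), handy when there is more than one category to keep
by_isin = df[df['category'].isin(['Fruits'])]

# 3. query(), which takes the condition as a string expression
by_query = df.query("category != 'Deli'")

print(by_equality.equals(by_isin) and by_equality.equals(by_query))
```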