Powerful DataFrame Filtering with the .query()
Method in Pandas
The .query()
method is a handy tool in pandas for filtering DataFrames based on conditional expressions. It offers a concise and readable way to select specific rows that meet certain criteria.
Here’s a breakdown of how it works:
Import pandas:
import pandas as pd
Create a DataFrame:
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
'age': [25, 30, 22, 28],
'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
Filter using .query()
:
filtered_df = df.query("age > 25 and (city == 'New York' or city == 'Los Angeles')")
- The query string
age > 25 and (city == 'New York' or city == 'Los Angeles')
defines the filtering criteria. - In this case, we’re selecting rows where:
age
is greater than 25 andcity
is either 'New York' or 'Los Angeles'
View the filtered DataFrame:
print(filtered_df)
name age city
1 Bob 30 Los Angeles
As you can see, the .query()
method has filtered the DataFrame to include only rows where the 'age' is above 25 and the 'city' is either 'New York' or 'Los Angeles'.
Key Points:
- The
.query()
method uses a boolean expression to filter rows. - You can use various comparison operators (
==
,!=
,<
,>
, etc.) and logical operators (and
,or
,not
). - For column names with spaces or special characters, enclose them in backticks (`).
Additional Considerations:
- The
.query()
method returns a new DataFrame by default. To modify the original DataFrame, use theinplace=True
argument (not recommended for most cases). - You can use string methods like
str.contains()
for pattern matching in string columns.
By effectively using the .query()
method, you can streamline your DataFrame manipulation tasks in pandas, making data filtering more intuitive and efficient.