Boolean Masking

What is it?

Boolean masking in Pandas involves creating a boolean condition or series and using it to filter data in a DataFrame.

What is a Boolean Series (Condition)?

A boolean condition, or boolean series, in Pandas is a series of boolean values (True or False), one per element of the original series, indicating whether that element satisfies a certain condition. Used as a mask, it marks which elements to keep (True) and which to drop (False).

When the mask is applied to a DataFrame, the result contains only the rows where the condition is True.
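As a minimal sketch of the idea, the snippet below builds a mask from a small Series and applies it. The values and the names scores, mask and selected are made up for illustration.

```python
import pandas as pd

# Hypothetical values, for illustration only.
scores = pd.Series([0.92, 0.65, 0.80, 0.55])

# The comparison is evaluated element by element and returns a Boolean Series.
mask = scores > 0.7          # True, False, True, False

# Indexing with the mask keeps only the elements marked True.
selected = scores[mask]      # keeps the values at positions 0 and 2
```

Note that the selected elements keep their original index labels (0 and 2 here); the result is not renumbered.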

Getting a Boolean Series

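Applying a comparison operator to a column broadcasts the comparison across every value in that column and returns the results as a Boolean Series:

df['colName'] <comparison_operator> <value>

The resulting series has the same index as the column, and each value is True or False depending on whether the boolean expression is satisfied for that row. Passing the series back into the DataFrame's indexing operator keeps only the rows marked True:

df[df['colName'] <comparison_operator> <value>]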
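The difference between the Boolean Series and the filtered DataFrame can be seen in a short sketch. The DataFrame contents and the string index labels are assumptions chosen for illustration; only the column name comes from these notes.

```python
import pandas as pd

# Hypothetical data with explicit index labels so the alignment is visible.
df = pd.DataFrame(
    {"chance of admit": [0.92, 0.72, 0.80, 0.55]},
    index=["a", "b", "c", "d"],
)

# Comparison on one column: a Boolean Series sharing the DataFrame's index.
mask = df["chance of admit"] >= 0.8

# Using the Boolean Series as an indexer: a DataFrame with only the True rows.
filtered = df[mask]
```

Here mask is a Series of dtype bool with the labels a to d, while filtered is a DataFrame holding only rows a and c.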

Combining Multiple Boolean Masks

To combine two or more boolean conditions, use the element-wise logical operators below. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <.

Logical Connective    Boolean Notation
AND                   &
OR                    |

Example:

# element-wise AND: use & in place of Python's `and` keyword
(df['chance of admit'] > 0.7) & (df['chance of admit'] < 0.9)

# element-wise OR: use | in place of Python's `or` keyword
(df['chance of admit'] > 0.7) | (df['chance of admit'] < 0.9)
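To make the two results concrete, here is a small sketch run on made-up data. The DataFrame values and the variable names above, below, both and either are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample values.
df = pd.DataFrame({"chance of admit": [0.95, 0.72, 0.80, 0.55]})

above = df["chance of admit"] > 0.7   # True, True, True, False
below = df["chance of admit"] < 0.9   # False, True, True, True

# AND: True only where both conditions hold (values strictly between 0.7 and 0.9).
both = above & below                  # False, True, True, False

# OR: True where at least one condition holds.
either = above | below                # True, True, True, True

# Either combined mask can be used to filter rows.
in_range = df[both]
```

With these two particular thresholds, the OR mask is True for every non-missing value, since any number is either above 0.7 or below 0.9. An OR is more useful for disjoint ranges, for example values below 0.6 or above 0.9.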

Alternative to comparison operators

Instead of comparison operators, you can use the equivalent built-in pandas comparison methods, which are available on both Series and DataFrame objects:

1. gt(): greater than
df['chance of admit'].gt(0.7)
2. lt(): less than
df['chance of admit'].lt(0.9)
3. le(): less than or equal to
df['chance of admit'].le(0.9)
4. ge(): greater than or equal to
df['chance of admit'].ge(0.9)
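The following sketch checks that the method forms give the same masks as the operators and shows how they chain. The Series values and the names s, same_gt, same_le and in_range are hypothetical.

```python
import pandas as pd

# Hypothetical sample values.
s = pd.Series([0.95, 0.72, 0.90, 0.55], name="chance of admit")

# Each method returns the same Boolean Series as its operator counterpart.
same_gt = s.gt(0.7).equals(s > 0.7)
same_le = s.le(0.9).equals(s <= 0.9)

# Method calls need no extra parentheses when combined with & or |.
in_range = s.gt(0.7) & s.le(0.9)      # False, True, True, False
```

One practical reason to prefer the method form is readability in chained expressions, since the call syntax already groups each condition.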