The Conditional Aggregation
tags: #sql/useful_functions
What is Conditional Aggregation?
Conditional aggregation refers to the process of aggregating data based on specified conditions. It involves applying aggregate functions like COUNT, SUM, AVG, etc., selectively to data based on certain criteria or conditions.
Within the aggregate function, conditions are applied using CASE expressions. Each row is evaluated against the specified conditions, and the aggregate function operates only on rows that meet the conditions.
The aggregate function computes the result based on the filtered rows and returns the aggregated value.
General structure that can be adapted to different use case:
SELECT
AGGREGATE_FUNCTION(CASE
WHEN condition1 THEN expression1
WHEN condition2 THEN expression2
...
ELSE default_expression
END) AS alias
FROM
your_table
GROUP BY
group_column;
Case 1: Count of Occurrences
SELECT
COUNT(CASE WHEN condition1 THEN 1 END) AS condition1_count,
COUNT(CASE WHEN condition2 THEN 1 END) AS condition2_count,
COUNT(CASE WHEN condition3 THEN 1 END) AS condition3_count
FROM
your_table;
1.1. Null Values
To get the counts of null values for each of the columns cola, colb, and colc, you can use the COUNT function along with the CASE statement. Here's an example query:
SELECT
COUNT(CASE WHEN cola IS NULL THEN 1 END) AS cola_null_count,
COUNT(CASE WHEN colb IS NULL THEN 1 END) AS colb_null_count,
COUNT(CASE WHEN colc IS NULL THEN 1 END) AS colc_null_count
FROM
your_table;
In this query:
cola_null_countis the count of null values in thecolacolumn.colb_null_countis the count of null values in thecolbcolumn.colc_null_countis the count of null values in thecolccolumn.
The CASE WHEN ... THEN 1 END construct is used to count null values as 1 and non-null values as 0. The COUNT function then tallies these values.
This query provides a result set with counts of null values for each specified column. If a column has a NOT NULL constraint, its corresponding null count should ideally be zero.
Adjust the table and column names in the query according to your actual schema and table structure.
Case 2: Sum and Average
Different Categories
SELECT
SUM(CASE WHEN category = 'A' THEN 1 ELSE 0 END) AS category_A_count,
SUM(CASE WHEN category = 'B' AND status = 'active' THEN 1 ELSE 0 END) AS active_category_B_count,
AVG(CASE WHEN sales > 1000 THEN sales END) AS avg_sales_above_1000
FROM
your_table;