The Conditional Aggregation

tags: #sql/useful_functions

What is Conditional Aggregation?

Conditional aggregation refers to the process of aggregating data based on specified conditions. It involves applying aggregate functions like COUNT, SUM, AVG, etc., selectively to data based on certain criteria or conditions.

Within the aggregate function, conditions are applied using CASE expressions. Each row is evaluated against the specified conditions, and the aggregate function operates only on rows that meet the conditions.

Summary

The aggregate function computes the result based on the filtered rows and returns the aggregated value.

General structure that can be adapted to different use case:

SELECT
    AGGREGATE_FUNCTION(CASE 
                          WHEN condition1 THEN expression1
                          WHEN condition2 THEN expression2
                          ...
                          ELSE default_expression 
                       END) AS alias
FROM
    your_table
GROUP BY
    group_column;


Case 1: Count of Occurrences

SELECT
    COUNT(CASE WHEN condition1 THEN 1 END) AS condition1_count,
    COUNT(CASE WHEN condition2 THEN 1 END) AS condition2_count,
    COUNT(CASE WHEN condition3 THEN 1 END) AS condition3_count
FROM
    your_table;

1.1. Null Values

To get the counts of null values for each of the columns cola, colb, and colc, you can use the COUNT function along with the CASE statement. Here's an example query:

SELECT
    COUNT(CASE WHEN cola IS NULL THEN 1 END) AS cola_null_count,
    COUNT(CASE WHEN colb IS NULL THEN 1 END) AS colb_null_count,
    COUNT(CASE WHEN colc IS NULL THEN 1 END) AS colc_null_count
FROM
    your_table;
Description

In this query:

  • cola_null_count is the count of null values in the cola column.
  • colb_null_count is the count of null values in the colb column.
  • colc_null_count is the count of null values in the colc column.

The CASE WHEN ... THEN 1 END construct is used to count null values as 1 and non-null values as 0. The COUNT function then tallies these values.

This query provides a result set with counts of null values for each specified column. If a column has a NOT NULL constraint, its corresponding null count should ideally be zero.

Adjust the table and column names in the query according to your actual schema and table structure.


Case 2: Sum and Average

Different Categories

SELECT
    SUM(CASE WHEN category = 'A' THEN 1 ELSE 0 END) AS category_A_count,
    SUM(CASE WHEN category = 'B' AND status = 'active' THEN 1 ELSE 0 END) AS active_category_B_count,
    AVG(CASE WHEN sales > 1000 THEN sales END) AS avg_sales_above_1000
FROM
    your_table;
Powered by Forestry.md