The OVER() and PARTITION BY() Function

What is the OVER() and PARTITION BY() Function?

The SQL PARTITION BY expression is a subclause of the OVER clause, which is used in almost all invocations of window functions[1] like AVG()MAX(), and RANK() to partition the data based on a single column or multiple columns.

In SQL, a window function is a function which uses values from one or multiple rows to return a value for each row

Use Cases

  1. Aggregating Data Within Groups: To perform aggregation functions (e.g., SUM(), AVG(), MIN()) within specific groups of data without collapsing the rows into a single output row per group (The GROUP BY Clause). This is achieved by partitioning the dataset by a e.g., column or condition.

  2. Ranking and Row Numbering: To assign ranks or row numbers within partitions of data

We can further combine this with Conditional Logic using CASE and The DECODE() Function when you need conditional logic as part of the aggregation.

How is using conditional logic different from using The GROUP BY Clause?

  • Using GROUP BY(): When you want to filter rows before the aggregation is applied. This means that only the rows that meet the WHERE condition are included in the computation.
  • Using Conditional Logic: This means that rows not meeting the condition STILL contribute to the result set (with whatever default value you set).


General Syntax

select
	column1
	, column2
	, ....
	, aggregate_function OVER (
		PARTITION BY column(s)_to_group_by  -- Comma separated
		ORDER BY column(s)_to_order_by      -- Comma separated
	) AS new_column_name
from table_name;
What if remove the PARTITION BY?

If we were to move the PARITION BY clause, we are effectively applying the aggregate function against every row in the dataset.

Example: Aggregating Sums

SELECT
    salesperson_id,
    sale_date,
    sale_amount,
    SUM(sale_amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS cumulative_sales
FROM
    sales;

Example 2: Counting Partitions

The Count() Function with PARTITION BY allows us to count rows within each partition separately.

SELECT 
    customer_id,
    COUNT(*) OVER (PARTITION BY customer_id) AS order_count_per_customer
FROM 
    Orders;

  1. A set of rows related to the current row is called a window or a window frame. Hence, the name of these functions: their result is based on a sliding window frame. ↩︎

Powered by Forestry.md