The OVER() and PARTITION BY() Function
What is the OVER() and PARTITION BY() Function?
The SQL PARTITION BY expression is a subclause of the OVER clause, which is used in almost all invocations of window functions[1] like AVG(), MAX(), and RANK() to partition the data based on a single column or multiple columns.
- This defines which column on which we need to perform the aggregation.
In SQL, a window function is a function which uses values from one or multiple rows to return a value for each row
-
Aggregating Data Within Groups: To perform aggregation functions (e.g., SUM(), AVG(), MIN()) within specific groups of data without collapsing the rows into a single output row per group (The GROUP BY Clause). This is achieved by partitioning the dataset by a e.g., column or condition.
-
Ranking and Row Numbering: To assign ranks or row numbers within partitions of data
We can further combine this with Conditional Logic using CASE and The DECODE() Function when you need conditional logic as part of the aggregation.
- Using GROUP BY(): When you want to filter rows before the aggregation is applied. This means that only the rows that meet the
WHEREcondition are included in the computation. - Using Conditional Logic: This means that rows not meeting the condition STILL contribute to the result set (with whatever default value you set).
General Syntax
select
column1
, column2
, ....
, aggregate_function OVER (
PARTITION BY column(s)_to_group_by -- Comma separated
ORDER BY column(s)_to_order_by -- Comma separated
) AS new_column_name
from table_name;
PARTITION BY?
If we were to move the PARITION BY clause, we are effectively applying the aggregate function against every row in the dataset.
Example: Aggregating Sums
SELECT
salesperson_id,
sale_date,
sale_amount,
SUM(sale_amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS cumulative_sales
FROM
sales;
SUM(sale_amount): This is the aggregate function to compute the sum.OVER (PARTITION BY salesperson_id ORDER BY sale_date): This defines the window:PARTITION BY salesperson_id: Divides the rows into partitions where each partition consists of sales made by a particular salesperson.ORDER BY sale_date: Orders the rows within each partition by the date of the sale to calculate the running total in chronological order.
Example 2: Counting Partitions
The Count() Function with PARTITION BY allows us to count rows within each partition separately.
SELECT
customer_id,
COUNT(*) OVER (PARTITION BY customer_id) AS order_count_per_customer
FROM
Orders;
A set of rows related to the current row is called a window or a window frame. Hence, the name of these functions: their result is based on a sliding window frame. ↩︎