π

# Calculate cumulative sum in SQL

Calculating cumulative sums or running totals is a common task in data analysis, especially when dealing with time series data or analyzing sequential patterns. In SQL, this can be achieved using window functions, providing a powerful way to perform complex aggregations. Here is how to calculate cumulative sums, percentages, and group-wise cumulative sums in SQL.

# Create a sample table

Letβs start by creating a small example table from a BigQuery public dataset, namely an extract of Google Analytics data from July 2017.

``````CREATE OR REPLACE TABLE `training.sql_cumsum`  AS (
SELECT
PARSE_DATE("%Y%m%d", date) AS day,
EXTRACT(WEEK FROM PARSE_DATE("%Y%m%d", date)) AS week,
SUM(totals.visits) AS sessions
GROUP BY 1,2
ORDER BY 1
LIMIT 10
);

SELECT * FROM `training.sql_cumsum`;``````
 day week sessions 0 2017-07-01 26 2048 1 2017-07-02 27 1895 2 2017-07-03 27 2046 3 2017-07-04 27 1938 4 2017-07-05 27 2885 5 2017-07-06 27 2658 6 2017-07-07 27 2450 7 2017-07-08 27 1859 8 2017-07-09 28 1921 9 2017-07-10 28 2769

# Running total

A running total, or cumulative sum, is the summation of a sequence of numbers that is updated each time a new number is added to the sequence. It's a common operation in time series analysis to understand trends over time.

To compute the running sum in SQL, use the window clause `SUM(metric) OVER (ORDER BY dimension)`:

``````SELECT
day,
sessions,
-- Running total of daily sessions
SUM(sessions) OVER (ORDER BY day) AS sessions_cumtd
FROM `training.sql_cumsum`
ORDER BY day
LIMIT 10``````
 day sessions sessions_cumtd 0 2017-07-01 2048 2048 1 2017-07-02 1895 3943 2 2017-07-03 2046 5989 3 2017-07-04 1938 7927 4 2017-07-05 2885 10812 5 2017-07-06 2658 13470 6 2017-07-07 2450 15920 7 2017-07-08 1859 17779 8 2017-07-09 1921 19700 9 2017-07-10 2769 22469

This query will return the daily sessions along with the cumulative sum, ordered by the date.

# Running percentage of total

Sometimes, it's useful to understand the cumulative percentage of a total, especially when analyzing proportions or contributions of parts to a whole. This can be achieved by dividing the cumulative sum by the total with `SUM(metric) OVER()`:

``````SELECT
day,
sessions,
-- Cumulated percentage of daily sessions
SUM(sessions) OVER (ORDER BY day) / SUM(sessions) OVER() AS sessions_pct_cumtd,
FROM `training.sql_cumsum`
ORDER BY day``````
 day sessions sessions_pct_cumtd 0 2017-07-01 2048 0.091148 1 2017-07-02 1895 0.175486 2 2017-07-03 2046 0.266545 3 2017-07-04 1938 0.352797 4 2017-07-05 2885 0.481196 5 2017-07-06 2658 0.599493 6 2017-07-07 2450 0.708532 7 2017-07-08 1859 0.791268 8 2017-07-09 1921 0.876764 9 2017-07-10 2769 1.000000

# Cumulated sum, by group

In many scenarios, you may want to calculate the cumulative sum within specific groups or categories. This can be done using the `PARTITION BY` clause, allowing you to segment the data and calculate the cumulative sum within each segment.

``````SELECT
day,
sessions,
week,
-- Cumulated sum of daily sessions
SUM(sessions) OVER (ORDER BY day) AS sessions_cumtd,
-- Cumulated sum of sessions, partitioned by week
SUM(sessions) OVER (PARTITION BY week ORDER BY day) AS sessions_cumtd_week
FROM `training.sql_cumsum`
ORDER BY day``````
 day sessions week sessions_cumtd sessions_cumtd_week 0 2017-07-01 2048 26 2048 2048 1 2017-07-02 1895 27 3943 1895 2 2017-07-03 2046 27 5989 3941 3 2017-07-04 1938 27 7927 5879 4 2017-07-05 2885 27 10812 8764 5 2017-07-06 2658 27 13470 11422 6 2017-07-07 2450 27 15920 13872 7 2017-07-08 1859 27 17779 15731 8 2017-07-09 1921 28 19700 1921 9 2017-07-10 2769 28 22469 4690

Calculating cumulative sums in SQL is a powerful technique that allows you to analyze trends, patterns, and sequences in your data. By leveraging window functions and the `PARTITION BY` clause, you can create complex aggregations tailored to your specific needs.