Using ClickHouse Cloud For Real Time Sports Analytics

Benjamin Wootton

Benjamin Wootton

Published on July 3, 2025

Using ClickHouse Cloud For Real Time Sports Analytics

Increasingly, we are seeing sports teams and players make use of data, analytics and AI to help gain an edge.

On the field, data can be used to help teams or players improve their performance by better understanding how they and their competitors are playing or identifying weaknesses in an opponents play.

Off the field, data can used for use cases such as transfer analysis, media or television coverage, fan engagement, identifying cheating or betting and gaming. These datasets can represent significant adjacent revenue streams when packaged in the right way.

With this in mind, I wanted to put together a demonstration of how we could use ClickHouse to analyse a sports related dataset. My aim was to generate data at a similar frequency and scale as we would expect in a real match in order to performance test ClickHouse in terms of scale and latency. I then wanted to generate insights in real time so that they could be used by a player or coach during a game.

I chose to do a football example (soccer for our American readers). I would record data about players moving around a pitch with their position being captured by an IOT sensor. I would then analyse their movements to understand at an aggregagated level how far players are travelling, where they are moving on the pitch, and how they are moving relative to other players. Though this example is football related, the same approach could be used in almost any sport.

The end result of my exploration looks like this, a live heatmap of player and ball movements around the pitch. The data is processed in ClickHouse and the visualisation is rendered through a React application.

Why Use ClickHouse Cloud For This

The performance of ClickHouse makes it a perfect fit for sports analytics use cases like this.

High volumes of data from IOT sensors or data entry teams could be streamed directly into the database in real time and at scale. There is no need for pre-processing and we can even import JSON or MQTT data directly into ClickHouse.

We can take the freshly ingested data and make use of powerful ClickHouse features for numeric analysis and to derive aggregates and statistics. The results of this analysis can be placed into materialised views to make them fast to lookup by our frontends.

If we wish to expose the data to thousands of concurrent website or mobile app visitors then ClickHouse is also a good choice of technology for backing a system like this.

In short, ClickHouse offers a unique combination of real time capabilities, concurrency and scalability which makes it perfect for this task.

Finally, we opted for ClickHouse Cloud over open source for this one due to the need to scale up and down. During the week the database could scale to zero, whereas at the weekend during a game we may need to scale up to a huge distributed cluster to serve all of our concurrent viewers. This bbursty workload makes ClickHouse Cloud very compelling.

About The Source Data

My first aim was to simulate both the ball and the players realistically moving around a pitch. Though we could have approximated this with a SQL insert query, I reached for Python to try to model realistic movement whereby the player is attracted to their primary position, but will drift away from it 1 or 2 cells per time to track the ball. The Python script was developed and ran from Hex, a data science notebook that we really like to use when working with ClickHouse.:

The Python script runs in real time and aims to insert roughly one record per second for each of the 22 players plus the match ball. Each record contains an X/Y coordinate of the player, between 0 and 105 on the X plane and 62 on the Y plane in line with a standard football pitch dimensions.

Each players X/Y coordinates are recorded as a time series in a denormalised match_positions table. Though the players position is recorded as a float, at any one time they can be allocated to a specific cell by taking the floor().

SELECT
    player_name,
    x_position,
    y_position,
    floor(x_position),
    floor(y_position),
    timestamp
FROM match_positions
ORDER BY x_position DESC
LIMIT 5

Query id: 0665b273-5175-4614-af46-567ecdd3adcd

   ┌─player_name─┬─x_position─┬─y_position─┬─floor(x_position)─┬─floor(y_position)─┬───────────────timestamp─┐
1. │ Ederson     │        104 │      34.89 │               104 │                34 │ 2025-06-23 15:06:09.000 │
2. │ Ederson     │        104 │      38.33 │               104 │                38 │ 2025-06-23 15:23:25.000 │
3. │ Ederson     │        104 │      30.46 │               104 │                30 │ 2025-06-23 15:16:31.000 │
4. │ Ederson     │        104 │      42.16 │               104 │                42 │ 2025-06-23 15:17:43.000 │
5. │ Ederson     │        104 │      33.93 │               104 │                33 │ 2025-06-23 15:06:10.000 │
   └─────────────┴────────────┴────────────┴───────────────────┴───────────────────┴─────────────────────────┘

5 rows in set. Elapsed: 0.009 sec. Processed 124.20 thousand rows, 3.95 MB (13.22 million rows/s., 420.21 MB/s.)
Peak memory usage: 7.24 MiB.

One of my aims for the project was to generate the heatmap shown above to visualise where players are spending most of their time. To inform this we can simply aggregate over this table to understand which cells each player spends their time in. In the example below we can see that Mohamed Salah was observed in cell 69x49 (on the right wing!) a total of 62 times in the dataset.

SELECT
    player_name,
    floor(x_position) AS x,
    floor(y_position) AS y,
    count()
FROM match_positions
WHERE player_name = 'Salah'
GROUP BY
    player_name,
    x,
    y
HAVING count() >= 5
ORDER BY 4 DESC
LIMIT 10

Query id: 94b8840a-6acd-45f0-ae30-5319d829b4d3

    ┌─player_name─┬──x─┬──y─┬─count()─┐
 1. │ Salah       │ 69 │ 49 │      62 │
 2. │ Salah       │ 69 │ 50 │      57 │
 3. │ Salah       │ 69 │ 48 │      50 │
 4. │ Salah       │ 68 │ 51 │      50 │
 5. │ Salah       │ 70 │ 51 │      49 │
 6. │ Salah       │ 70 │ 49 │      48 │
 7. │ Salah       │ 71 │ 51 │      48 │
 8. │ Salah       │ 68 │ 49 │      48 │
 9. │ Salah       │ 71 │ 49 │      47 │
10. │ Salah       │ 69 │ 51 │      47 │
    └─────────────┴────┴────┴─────────┘

10 rows in set. Elapsed: 0.009 sec. Processed 124.20 thousand rows, 2.95 MB (13.82 million rows/s., 328.57 MB/s.)
Peak memory usage: 2.42 MiB.

Another simple view which may be interesting is to understand which section of the field the ball spent it’s time in to indicate which team are attacking most. This can be calculated by dividing the pitch into sections using fixed widths. Totals and percentages can be calculated from the match positions:

WITH
    ball_positions AS
    (
        SELECT
            x_position,
            y_position,
            timestamp
        FROM match_positions
        WHERE player_id = 'Ball'
    ),
    ball_range AS
    (
        SELECT
            min(x_position) AS min_x,
            max(x_position) AS max_x
        FROM ball_positions
    ),
    all_sectors AS
    (
        SELECT number + 1 AS sector
        FROM numbers(5)
    ),
    sector_data AS
    (
        SELECT
            multiIf(x_position <= (min_x + ((max_x - min_x) * 0.2)), 1, x_position <= (min_x + ((max_x - min_x) * 0.4)), 2, x_position <= (min_x + ((max_x - min_x) * 0.6)), 3, x_position <= (min_x + ((max_x - min_x) * 0.8)), 4, 5) AS sector,
            count() AS time_in_sector
        FROM ball_positions
        CROSS JOIN ball_range
        GROUP BY sector
    ),
    total_time AS
    (
        SELECT sum(time_in_sector) AS total
        FROM sector_data
    )
SELECT
    s.sector AS sector,
    COALESCE(sd.time_in_sector, 0) AS time_in_sector,
    round((COALESCE(sd.time_in_sector, 0) * 100.) / tt.total, 2) AS percentage
FROM all_sectors AS s
LEFT JOIN sector_data AS sd ON s.sector = sd.sector
CROSS JOIN total_time AS tt
ORDER BY s.sector ASC

Query id: 69d18ab4-8cf2-41ac-a24c-2d0d56467582

   ┌─sector─┬─time_in_sector─┬─percentage─┐
1. │      1 │            185 │       3.43 │
2. │      2 │           1528 │       28.3 │
3. │      3 │           2737 │      50.69 │
4. │      4 │            865 │      16.02 │
5. │      5 │             85 │       1.57 │
   └────────┴────────────────┴────────────┘

5 rows in set. Elapsed: 0.037 sec. Processed 496.81 thousand rows, 9.83 MB (13.35 million rows/s., 264.06 MB/s.)
Peak memory usage: 14.66 MiB.

This data is used to build the following visualisation in our React frontend:

Though it probably wouldn’t be necessary at this scale, we exposed these views as materialised views so that the results can be incrementally computed each time we have a new match position record. This would keep the response time for our frontend as fast as possible even as the game progresses and more data is added into the source data. This would be particularly relevant when we have a high number of concurrent users.

Distance Calculations

Though we have a time series of the players X/Y positions, we have a little work to do to turn this into distance data. We can do this by making use of window functions to tie a players position to their previous position, then use the Euclidean distance formula to calculate the distance change for each record.

WITH position_distances AS
    (
        SELECT
            player_name,
            x_position,
            y_position,
            timestamp,
            lagInFrame(x_position, 1) OVER w AS prev_x,
            lagInFrame(y_position, 1) OVER w AS prev_y
        FROM soccer_analytics.match_positions
        WHERE player_name = 'Salah'
        WINDOW w AS (PARTITION BY player_name ORDER BY timestamp ASC)
    )
SELECT
    player_name,
    x_position,
    y_position,
    timestamp,
    sqrt(pow(x_position - prev_x, 2) + pow(y_position - prev_y, 2)) AS step_distance
FROM position_distances
ORDER BY
    player_name ASC,
    timestamp ASC
LIMIT 5

Query id: ba65524e-10bc-49cb-9503-a681af392c7b

   ┌─player_name─┬─x_position─┬─y_position─┬───────────────timestamp─┬──────step_distance─┐
1. │ Salah       │      59.06 │      49.47 │ 2025-06-23 15:00:00.000 │  77.04131867374484 │
2. │ Salah       │      59.38 │      52.36 │ 2025-06-23 15:00:01.000 │  2.907661651027494 │
3. │ Salah       │      60.74 │      49.28 │ 2025-06-23 15:00:02.000 │ 3.3668996034121434 │
4. │ Salah       │      60.34 │      55.21 │ 2025-06-23 15:00:03.000 │  5.943475821444074 │
5. │ Salah       │      66.29 │      52.15 │ 2025-06-23 15:00:04.000 │  6.690746904313433 │
   └─────────────┴────────────┴────────────┴─────────────────────────┴────────────────────┘

5 rows in set. Elapsed: 0.012 sec. Processed 124.20 thousand rows, 3.95 MB (10.24 million rows/s., 325.43 MB/s.)
Peak memory usage: 7.31 MiB.

We can use a similar approach to rank our players based on how much distance they have covered:

WITH
    position_distances AS
    (
        SELECT
            player_name,
            x_position,
            y_position,
            timestamp,
            lagInFrame(x_position, 1) OVER w AS prev_x,
            lagInFrame(y_position, 1) OVER w AS prev_y
        FROM soccer_analytics.match_positions
        WHERE player_name != 'Match Ball'
        WINDOW w AS (PARTITION BY player_name ORDER BY timestamp ASC)
    ),
    player_total_distances AS
    (
        SELECT
            player_name,
            sum(sqrt(pow(x_position - prev_x, 2) + pow(y_position - prev_y, 2))) AS total_distance_covered
        FROM position_distances
        WHERE prev_x IS NOT NULL
        GROUP BY player_name
    )
SELECT
    player_name,
    total_distance_covered,
    rank() OVER (ORDER BY total_distance_covered DESC) AS distance_rank
FROM player_total_distances
ORDER BY distance_rank ASC

Query id: 5204a713-1994-4255-9037-705feb5c73e0

    ┌─player_name──────┬─total_distance_covered─┬─distance_rank─┐
 1. │ Thiago           │     15885.452286834556 │             1 │
 2. │ Fabinho          │     15837.986919173645 │             2 │
 3. │ Gundogan         │     15760.845081490974 │             3 │
 4. │ Rodri            │     15623.971044052041 │             4 │
 5. │ Henderson        │     15508.640366100735 │             5 │
 6. │ Walker           │      14970.80207873978 │             6 │
 7. │ Alexander-Arnold │     14905.663935419661 │             7 │
 8. │ Robertson        │     14857.358633348349 │             8 │
 9. │ Nunez            │     14787.387002795449 │             9 │
10. │ Cancelo          │     14692.806968891939 │            10 │
11. │ De_Bruyne        │     14630.147235236394 │            11 │
12. │ Haaland          │     14625.545360647466 │            12 │
13. │ Diaz             │     14593.543208102144 │            13 │
14. │ Mahrez           │       14570.6445839304 │            14 │
15. │ Grealish         │      14546.08687134666 │            15 │
16. │ Konate           │     14531.325777164402 │            16 │
17. │ Salah            │     14527.366844371758 │            17 │
18. │ Van_Dijk         │     14496.910397768248 │            18 │
19. │ Ake              │     14481.101652761357 │            19 │
20. │ Ederson          │       14401.6964663955 │            20 │
21. │ Dias             │      14353.00043518532 │            21 │
22. │ Alisson          │      14250.78925854333 │            22 │
23. │ Ball             │      13016.01061425975 │            23 │
    └──────────────────┴────────────────────────┴───────────────┘

23 rows in set. Elapsed: 0.041 sec. Processed 124.20 thousand rows, 3.95 MB (3.00 million rows/s., 95.29 MB/s.)
Peak memory usage: 11.77 MiB.

This can be used to drive the following data table in our application frontend:

Distances can also be summed up in a running total to understand the cumulative total over time:

WITH position_distances AS
    (
        SELECT
            player_name,
            x_position,
            y_position,
            timestamp,
            lagInFrame(x_position, 1) OVER w AS prev_x,
            lagInFrame(y_position, 1) OVER w AS prev_y
        FROM soccer_analytics.match_positions
        WHERE player_name = 'Salah'
        WINDOW w AS (PARTITION BY player_name ORDER BY timestamp ASC)
    )
SELECT
    player_name,
    x_position,
    y_position,
    timestamp,
    sqrt(pow(x_position - prev_x, 2) + pow(y_position - prev_y, 2)) AS step_distance,
    sum(sqrt(pow(x_position - prev_x, 2) + pow(y_position - prev_y, 2))) OVER (PARTITION BY player_name ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_distance
FROM position_distances
ORDER BY
    player_name ASC,
    timestamp ASC
LIMIT 5

Query id: cc4f7273-8b39-480b-af3e-c21311632898

   ┌─player_name─┬─x_position─┬─y_position─┬───────────────timestamp─┬──────step_distance─┬────total_distance─┐
1. │ Salah       │      59.06 │      49.47 │ 2025-06-23 15:00:00.000 │  77.04131867374484 │ 77.04131867374484 │
2. │ Salah       │      59.38 │      52.36 │ 2025-06-23 15:00:01.000 │  2.907661651027494 │ 79.94898032477234 │
3. │ Salah       │      60.74 │      49.28 │ 2025-06-23 15:00:02.000 │ 3.3668996034121434 │ 83.31587992818447 │
4. │ Salah       │      60.34 │      55.21 │ 2025-06-23 15:00:03.000 │  5.943475821444074 │ 89.25935574962854 │
5. │ Salah       │      66.29 │      52.15 │ 2025-06-23 15:00:04.000 │  6.690746904313433 │ 95.95010265394197 │
   └─────────────┴────────────┴────────────┴─────────────────────────┴────────────────────┴───────────────────┘

5 rows in set. Elapsed: 0.015 sec. Processed 124.20 thousand rows, 3.95 MB (8.37 million rows/s., 265.91 MB/s.)
Peak memory usage: 7.31 MiB.

The chart below, surfaced from Hex shows the cumulative distance of Mohamed Salah over the 90 minute match.

Relative Movements

Finally I thought it would be interesting to look at distances between players to understand how they are moving relative to each other on the pitch. This involves joining all of the players with each other and claculating their distance at each timestamp, before taking an average:

WITH
    player_distances AS
    (
        SELECT
            p1.player_id AS player1_id,
            p1.player_name AS player1_name,
            p1.team_name AS team1_name,
            p1.player_position AS player1_position,
            p2.player_id AS player2_id,
            p2.player_name AS player2_name,
            p2.team_name AS team2_name,
            p2.player_position AS player2_position,
            sqrt(pow(p1.x_position - p2.x_position, 2) + pow(p1.y_position - p2.y_position, 2)) / 10 AS distance_meters,
            p1.timestamp
        FROM match_positions AS p1
        INNER JOIN match_positions AS p2 ON (p1.timestamp = p2.timestamp) AND (p1.match_id = p2.match_id) AND (p1.team_name != p2.team_name) AND (p1.player_id != 'Ball') AND (p2.player_id != 'Ball')
    ),
    avg_distances AS
    (
        SELECT
            player1_name,
            team1_name,
            player1_position,
            player2_name,
            team2_name,
            player2_position,
            round(avg(distance_meters), 2) AS avg_distance,
            round(min(distance_meters), 2) AS min_distance,
            count() AS measurements
        FROM player_distances
        GROUP BY
            player1_name,
            team1_name,
            player1_position,
            player2_name,
            team2_name,
            player2_position
        HAVING measurements > 100
    )
SELECT *
FROM avg_distances
ORDER BY
    team1_name ASC,
    player1_position ASC,
    avg_distance ASC
LIMIT 10

Query id: d7656fab-dbd2-4b11-a2ac-b32782e0b002

    ┌─player1_name─┬─team1_name───┬─player1_position─┬─player2_name─┬─team2_name──────┬─player2_position─┬─avg_distance─┬─min_distance─┬─measurements─┐
 1. │ Van_Dijk     │ Liverpool FC │ CB               │ Haaland      │ Manchester City │ ST               │         1.08 │         0.02 │         5400 │
 2. │ Konate       │ Liverpool FC │ CB               │ Haaland      │ Manchester City │ ST               │         1.12 │         0.01 │         5400 │
 3. │ Konate       │ Liverpool FC │ CB               │ Grealish     │ Manchester City │ LW               │         1.66 │         0.05 │         5400 │
 4. │ Van_Dijk     │ Liverpool FC │ CB               │ Mahrez       │ Manchester City │ RW               │         1.66 │         0.05 │         5400 │
 5. │ Van_Dijk     │ Liverpool FC │ CB               │ Grealish     │ Manchester City │ LW               │         2.54 │         0.17 │         5400 │
 6. │ Konate       │ Liverpool FC │ CB               │ Mahrez       │ Manchester City │ RW               │         2.59 │         0.59 │         5400 │
 7. │ Van_Dijk     │ Liverpool FC │ CB               │ Rodri        │ Manchester City │ CDM              │         3.47 │         0.94 │         5400 │
 8. │ Konate       │ Liverpool FC │ CB               │ Rodri        │ Manchester City │ CDM              │         3.52 │         1.14 │         5400 │
 9. │ Van_Dijk     │ Liverpool FC │ CB               │ De_Bruyne    │ Manchester City │ CAM              │         3.67 │         0.89 │         5400 │
10. │ Konate       │ Liverpool FC │ CB               │ Gundogan     │ Manchester City │ CM               │         3.98 │         1.56 │         5400 │
    └──────────────┴──────────────┴──────────────────┴──────────────┴─────────────────┴──────────────────┴──────────────┴──────────────┴──────────────┘

10 rows in set. Elapsed: 0.287 sec. Processed 248.40 thousand rows, 21.08 MB (865.71 thousand rows/s., 73.47 MB/s.)
Peak memory usage: 64.74 MiB.

The completed visualisation looks like this in our React application:

Frontend Application

I put together the frontend application by “vibe-coding” it with Claude Code. Incidentally, this was very easy for such a powerful and complex visualisation and I can see more and more people building interactive data apps over their ClickHouse data instead of yet another dashboard.

The application incorporates the the ability to select a specific player, the entire team or the match ball and to add time windows such that a coach can for instance analyse what has happened in the last 5 minutes. The application interacts with ClickHouse directly, issuing the queries shown above with relatively minor modifications to incorporate the time windows and match selection. The completed app looks like this:

Conclusion

I believe that ClickHouse Cloud is an ideal fit for real time sports analytics use cases.

Here we have shown a real time data stream which is processed and analysed immediately as data is captured.

We have demonstrated how using only SQL we can perform relatively complex analysis work including calculating point in time and cumulative distances from X/Y positions, and looking at how players move relative to each other on the pitch.

With relatively little work we could introduce a Python layer between the web application and ClickHouse for more complex statistical work.

Finally, using AI we have developed an interactive web application which incorporates a powerful visualisation that allows us to slice and dice the data in a way that could genuinely assist a player or coach mid game.

By using ClickHouse Cloud, we can scale the system to zero when not in use, and scale it up hugely in a situation where we need to serve the analytics to thousands of fans.