
The Risk Of Misusing ClickHouse Partitions

Benjamin Wootton
2025-12-10
8 min read

In ClickHouse it is possible to partition tables by a chosen field. This involves splitting the table into separate files on disk based on the value of the partition column. We do this using the PARTITION BY syntax, noting that we can partition and order by different fields:

CREATE TABLE sales (
    sale_date Date,
    product_id UInt32,
    customer_id UInt32,
    country String,
    amount Decimal(10, 2)
)
ENGINE = MergeTree
PARTITION BY country
ORDER BY (sale_date, product_id);

Many people reach for partitions expecting them to speed up queries. This may be a hangover from other relational databases, or from cloud data lakes where partitioning is an important performance lever.

In ClickHouse, however, partitions are less relevant to query speed. Though partitioning can improve performance through a process called partition pruning, it also comes with risks and downsides, especially if you get the partitioning scheme wrong.

As a ClickHouse consultant, I have found partitioning to be at the root of many performance and scalability issues and now only deploy partitions with great care.

Partitions Increase The Number Of Files (Parts)

Each ClickHouse partition contains one or more files known as parts.

Say we have a dataset which we split into 10 separate partitions such as in the country example above. Each of those partitions will contain at least one part which in turn stores the underlying data.

This can be shown by selecting the hidden _part column to show which physical parts rows have landed within. In the example below we can see that our rows have all landed in different physical part files with the exception of the Germany rows which have been colocated.

SELECT _part, country, amount
FROM sales
LIMIT 10

┌─_part─────────────────────┬─country────────┬───amount─┐
│ 8b5f2a1c9d3e7f4a_0_0_0    │ Germany        │   149.50 │
│ 8b5f2a1c9d3e7f4a_0_0_0    │ Germany        │    89.99 │
│ 3c7e9b2d1f8a6c5e_0_0_0    │ France         │   245.00 │
│ 9a4d8c3b2e7f1a6d_0_0_0    │ United Kingdom │   312.75 │
│ 2f6a9c8d4b3e7a1c_0_0_0    │ Spain          │    67.50 │
│ 7d3c8a9b2e4f6c1a_0_0_0    │ Italy          │   198.25 │
│ 4a8f2c6d9b3e1a7c_0_0_0    │ Netherlands    │   425.00 │
│ 6c1a9d4b8e2f7a3c_0_0_0    │ Poland         │    55.99 │
│ 1e7c3a9d8b4f2a6c_0_0_0    │ Belgium        │   178.50 │
│ 5b2d8a4c9e3f7a1b_0_0_0    │ Sweden         │   299.00 │
└───────────────────────────┴────────────────┴──────────┘

Often, a partition ends up containing multiple parts as the dataset grows and as new inserts arrive which have not yet been merged. This means that 10 partitions could turn into 20, 50 or even more than 100 parts depending on what is happening in the database.
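If you want to see how fragmented a table has become, the system.parts table reports the active parts in each partition. A sketch against the sales table above, using columns from the standard system.parts schema:

SELECT partition, count() AS part_count, sum(rows) AS total_rows
FROM system.parts
WHERE table = 'sales' AND active
GROUP BY partition
ORDER BY part_count DESC;

A part count that keeps climbing, or many partitions holding only a handful of rows each, is an early warning sign.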

This fragmentation has multiple downsides which we will cover in the rest of this article.

More Parts Increase File IO

If we execute a query which only targets a few partitions, such as our most recent hot data, partitioning can help by allowing ClickHouse to quickly prune unnecessary parts and focus only on those relevant to the query.
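You can check whether pruning is actually happening for a given query with EXPLAIN. A sketch against the sales table above; EXPLAIN indexes = 1 reports how many parts and granules the MinMax, partition and primary key indexes selected:

EXPLAIN indexes = 1
SELECT sum(amount)
FROM sales
WHERE country = 'Germany';

If the partition key does not appear in the WHERE clause, you should expect to see every part selected.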

However, if we execute a SELECT query which spans many partitions, we now have to open and close many part files to evaluate the query.

Furthermore, for any query which does not contain the partition key in its WHERE clause, we could again have to visit tens, hundreds or even thousands of parts when we have a granular partitioning scheme.

All of these files need to be opened, closed and managed, compared with just one or two larger ones in an unpartitioned scheme. This is always going to be slower and more expensive, especially when you account for things such as file handle limits, extra cache eviction, disk IO and concurrency limits.

Unless our queries are very predictable and uniform, partitioning can slow down just as many queries as it speeds up, or more.

Partitions Increase File Access On Inserts

Partitions can also lead to significantly more work on inserts.

Say we have partitioned by state and a batch of records are inserted into ClickHouse containing 50 distinct states. Instead of needing to write one big part, we now need to split that insert block and write to 50 separate partitions and therefore part files.
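You can observe this write amplification in system.part_log, which records a NewPart event for every part created (note that the part log must be enabled in the server config, as it usually is in recent versions). A sketch, assuming the sales table name from above:

SELECT partition_id, part_name, rows
FROM system.part_log
WHERE table = 'sales' AND event_type = 'NewPart'
ORDER BY event_time DESC
LIMIT 50;

A single INSERT showing up as dozens of NewPart events is a sign that your partitioning scheme is fragmenting your writes.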

That's a painful enough operation in itself, but the real consequence is how these fragmented parts lead to much more background work.

Partitions Increase Merge Activity

When inserts are fragmented into multiple partitions and parts, all of these different parts then need to be managed in the background.

With a single insert we could create tens of parts across different partitions. All of these have to be merged independently within their partition. This process involves opening all of the small files, reading them into memory, sorting, building new blocks and rewriting. Doing this for lots of small files is less efficient than doing it for fewer larger files.
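This background work is visible in the system tables. system.merges shows merges currently in flight, and system.part_log records each completed merge as a MergeParts event. A sketch:

SELECT table, partition_id, num_parts, progress
FROM system.merges;

SELECT count() AS merge_count, sum(rows) AS rows_rewritten
FROM system.part_log
WHERE table = 'sales' AND event_type = 'MergeParts';

If rows_rewritten is many multiples of your actual data volume, the server is spending a lot of its capacity re-sorting and rewriting small parts.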

This Doesn't Scale In A Clustered Situation

In a clustered scenario, more parts being inserted leads to disproportionately more work across the cluster: for instance, more interactions with ZooKeeper (or ClickHouse Keeper) and more parts that need to be downloaded between replicas.

A failure mode I have seen is when people insert directly into Distributed tables. Though this is inadvisable in general, it can be fatal when you have granular parts and partitions, leading to a situation where many small parts back up in the distribution queue.
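If you do insert through a Distributed table, the system.distribution_queue table shows whether pending blocks are backing up. A sketch:

SELECT database, table, data_files, data_compressed_bytes, broken_data_files
FROM system.distribution_queue;

A steadily growing data_files count means the shards cannot absorb the small parts as fast as they are being produced.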

Partitions Make Mutations More Expensive

Updating and deleting data in ClickHouse is typically an expensive exercise, and with partitions it becomes even more so. Rewriting multiple small parts, or even using lightweight deletes and updates, is much more expensive when data is fragmented across many partitions and parts.
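As an illustration, a classic mutation such as the one below has to rewrite every part containing a matching row, so the more parts the data is spread across, the more rewriting takes place. Progress can be tracked in system.mutations:

ALTER TABLE sales DELETE WHERE customer_id = 42;

SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'sales';

A high parts_to_do figure on a heavily partitioned table is exactly the cost this section describes.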

Partitions Can Break ReplacingMergeTree Deduplication

ReplacingMergeTree is commonly used when you want ClickHouse to automatically deduplicate rows based on a key, keeping only the most recent version.

However, ClickHouse only merges parts within the same partition. This means that deduplication only happens within a partition and duplicates can remain in place across different partitions.

This is a common source of data quality issues and can be very difficult to debug. The workaround is to either avoid partitioning ReplacingMergeTree tables, partition by the deduplication key itself, or use FINAL in your queries to force deduplication at query time at significant performance cost.
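As a minimal illustration of the pitfall, consider a hypothetical user_profiles table (not from the example above): if we partition a ReplacingMergeTree by month, two versions of the same user that land in different months will never be merged away.

CREATE TABLE user_profiles (
    user_id UInt64,
    updated_at DateTime,
    email String
)
ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(updated_at)
ORDER BY user_id;

-- Both versions can survive indefinitely because they sit in different partitions
SELECT * FROM user_profiles WHERE user_id = 1;

-- FINAL deduplicates at query time, at a performance cost
SELECT * FROM user_profiles FINAL WHERE user_id = 1;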

This All Slows Down Your Queries Due To Concurrency Issues

Remember that the starting point for introducing partitions is often about query performance. However, with all of the downstream implications of partitions, you can find that your server begins experiencing concurrency issues as your select queries wait for CPU cores which are being used for inserts and mutations.

In one recent project I put together what I thought was a good partitioning scheme for speeding up queries, but found that all of the issues outlined above combined to give me net worse performance.

When To Use Partitions

Partitions are most useful as a data management feature. We can do things such as drop a partition easily and efficiently or detach partitions if we want to move them out of scope.

However, this tends to be a small benefit in practice and one I have rarely needed. For dropping old data I would usually reach for a TTL instead.
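For completeness, these data management operations look like this against the sales table above, where the partition value is the country string itself:

-- Drop all data for one partition instantly, without a mutation
ALTER TABLE sales DROP PARTITION 'Germany';

-- Detach a partition so it is no longer visible to queries
ALTER TABLE sales DETACH PARTITION 'France';

For age-based expiry, a TTL achieves the same result without needing a partitioning scheme at all:

ALTER TABLE sales MODIFY TTL sale_date + INTERVAL 1 YEAR;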

What Are The Alternatives?

PRIMARY KEY and ORDER BY are far more important than partitioning for query performance. By ordering data in a way that aligns with your query patterns, ClickHouse can jump directly to the rows it needs just as quickly as a partitioning scheme could.
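A sketch of what this looks like for the sales example (the table name here is illustrative): rather than partitioning by country, put country at the front of the sorting key so that the sparse primary index can seek straight to the relevant rows.

CREATE TABLE sales_by_key (
    sale_date Date,
    product_id UInt32,
    customer_id UInt32,
    country String,
    amount Decimal(10, 2)
)
ENGINE = MergeTree
ORDER BY (country, sale_date, product_id);

Queries filtering on country, or on country plus a date range, now use the primary index instead of relying on partition pruning.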

Other ClickHouse optimisations such as projections and secondary (data skipping) indexes can often achieve what people are looking for from partitions in a better way. Skipping indexes, for instance, allow ClickHouse to skip vast quantities of data, giving the same outcome as partitions in a more idiomatic way.
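As a sketch, here is a bloom filter skipping index on a non-key column of the sales table (the index name and granularity are illustrative choices):

ALTER TABLE sales ADD INDEX idx_customer customer_id TYPE bloom_filter GRANULARITY 4;

-- Build the index for data that already exists
ALTER TABLE sales MATERIALIZE INDEX idx_customer;

Queries filtering on customer_id can now skip granules that definitely do not contain the value, without any of the part fragmentation that partitioning brings.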

Remember To Test

Partitions are something I find hard to reason about. I have had projects where I thought long and hard about partitions, introduced them and found that performance was horrible because of all of the knock-on effects. I then moved to a less granular partitioning scheme, such as from month to year, and found a huge step up in performance.

Fortunately they are relatively easy to test. Just create a table with an alternate partitioning scheme, INSERT INTO SELECT FROM, wait for the merges to complete, then EXCHANGE the tables.

CREATE TABLE sales_new (
    sale_date Date,
    product_id UInt32,
    customer_id UInt32,
    country String,
    amount Decimal(10, 2)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(sale_date)
ORDER BY (country, sale_date, product_id);

INSERT INTO sales_new SELECT * FROM sales;

EXCHANGE TABLES sales AND sales_new;

By combining this with some monitoring of the parts log, you have a pragmatic way to work out the optimal partitioning scheme.
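For example, after the swap you can compare the two schemes directly in system.parts (following the EXCHANGE, sales_new holds the old scheme):

SELECT table, count() AS active_parts, sum(rows) AS total_rows
FROM system.parts
WHERE table IN ('sales', 'sales_new') AND active
GROUP BY table;

Fewer, larger active parts for the same row count generally means less merge pressure and faster scans.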


Written by

Benjamin Wootton

Freelance Consultant - ClickHouse
