Why I'm Not Convinced About Data Lakehouse Architecture

Benjamin Wootton

2025-12-05

I've built three data lakehouses and I'm not convinced by the architecture at all. Maybe I'm doing it wrong, but I don't think I want to try again.

Unifying the best of the data warehouse and the data lake sounds compelling, but all of my projects suffered from the following challenges.

You End Up Rewriting a Data Warehouse

The marketing materials make it sound easy, but you inevitably end up thinking about how to lay out data on object stores, maintenance jobs, concurrent access and transactions, and sooner or later you are reading the Iceberg spec. Low-level, non-differentiating stuff.

This is engineering effort that could be spent on business logic rather than reinventing storage layer primitives.
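To give a flavour of that low-level work, here is a minimal Python sketch of one common maintenance job: planning the compaction of small files into larger ones. It is an illustration under assumptions, not how any particular engine does it. The function name plan_compaction, the 128 MB target and the first-fit-decreasing strategy are all hypothetical choices made for this example.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[float], target_mb: float = 128.0) -> List[List[float]]:
    """Group small files into batches whose combined size stays within target_mb.

    Uses a simple first-fit-decreasing heuristic. Files already at or above
    the target are left alone, and batches holding a single file are dropped
    because rewriting one file on its own gains nothing.
    """
    batches: List[List[float]] = []
    for size in sorted(file_sizes_mb, reverse=True):
        if size >= target_mb:
            continue  # already large enough, not a compaction candidate
        for batch in batches:
            if sum(batch) + size <= target_mb:
                batch.append(size)  # fits into an existing batch
                break
        else:
            batches.append([size])  # no batch had room, start a new one
    return [batch for batch in batches if len(batch) > 1]
```

Even this toy version leaves out the hard parts: committing the rewritten files atomically, coping with writers that land new files mid-job, and expiring the old files afterwards. A data warehouse does all of that for you.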

More Expensive Than You Think

You save money by using object storage, but you end up writing ingestion logic, transformation jobs and so on, which are expensive to run on platforms such as Databricks or EMR. You also pay for billions of S3 API calls. On top of that, the engineering time ends up costing more than the license fees you were trying to avoid.

The total cost of ownership often exceeds what you'd pay for a managed data warehouse when you factor in:

Compute costs for transformation engines
API call costs for object storage
Engineering time for custom tooling
Operational overhead
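The factors above can be added up in a few lines of Python. This is only a sketch of the arithmetic: the class name LakehouseCosts, its field names and the idea of a single blended request price are assumptions made for illustration, and any numbers you feed it should come from your own bills rather than from this post.

```python
from dataclasses import dataclass


@dataclass
class LakehouseCosts:
    """Monthly cost inputs for a lakehouse. All fields are hypothetical."""

    compute_usd: float            # transformation engines (Spark clusters etc.)
    storage_usd: float            # object storage at rest
    api_requests: int             # object store GET/PUT/LIST calls per month
    usd_per_1000_requests: float  # blended request price (assumed)
    engineer_days: float          # days spent on custom tooling and operations
    usd_per_engineer_day: float   # loaded day rate (assumed)

    def api_usd(self) -> float:
        """Cost of object store API calls."""
        return self.api_requests / 1000 * self.usd_per_1000_requests

    def engineering_usd(self) -> float:
        """Cost of engineering time and operational overhead."""
        return self.engineer_days * self.usd_per_engineer_day

    def total_usd(self) -> float:
        """Total monthly cost of ownership."""
        return (
            self.compute_usd
            + self.storage_usd
            + self.api_usd()
            + self.engineering_usd()
        )


def monthly_difference(lakehouse: LakehouseCosts, warehouse_bill_usd: float) -> float:
    """Positive result means the lakehouse costs more than the warehouse bill."""
    return lakehouse.total_usd() - warehouse_bill_usd
```

The storage line is usually the only one that looks good for the lakehouse. Once the API calls and engineering days are on the same sheet, the comparison with a managed data warehouse bill tends to look very different.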

Not Fast Enough for Real-Time Use Cases

You think you don't have a real-time use case, but you inevitably want to run an app or something latency-sensitive, and the lakehouse soon turns out to be too slow for anything fast and interactive. You are then totally snookered and have to implement caching layers.

Those caches add another layer to your architecture and bring cache invalidation challenges with them.
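As a rough illustration of where the trouble comes from, here is a minimal cache-aside sketch in Python. The names TTLCache and query_fn are hypothetical stand-ins for whatever sits in front of your lakehouse query engine, not any particular library's API.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside wrapper around a slow query function (hypothetical)."""

    def __init__(
        self,
        query_fn: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query_fn = query_fn
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time the value was fetched, cached value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        """Return a cached value if it is fresh, otherwise run the slow query."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # may be stale if the table changed since the fetch
        value = self._query_fn(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key; something has to call this whenever the data changes."""
        self._entries.pop(key, None)
```

The cache itself is the easy part. If new data lands in the lakehouse inside the TTL window, get keeps serving the old answer until the entry expires or something calls invalidate, so every ingestion and transformation job now needs to know which cache keys it affects.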

Ecosystem Tooling is Poor Compared to a Data Warehouse

Reporting, data quality, machine learning and so on: everything is more awkward than it is with a SQL data warehouse.

The maturity of tooling around traditional data warehouses is hard to beat. Years of ecosystem development mean better integrations, better debugging tools, and better operational workflows.

My Conclusion

The technology is cool for data geeks, and the idea of data built on open standards is compelling. But data warehouses now work well with unstructured data, and they export to and interoperate well with things like Iceberg and Parquet. So I honestly think I'll just build around a data warehouse from now on unless I'm operating at truly massive scale.

Modern data warehouses have evolved to handle many of the use cases that lakehouses were designed for, while maintaining the simplicity and performance that made them successful in the first place.

Written by

Benjamin Wootton

Freelance Consultant - ClickHouse

I am a freelance consultant specialising in ClickHouse. I help businesses deploy ClickHouse open source and ClickHouse Cloud, build solutions on top of ClickHouse for real-time analytics, observability and AI, and resolve performance and reliability issues with their existing deployments. Visit my home page to learn more.
