Why I'm Not Convinced About Data Lakehouse Architecture

I've built three data lakehouses, and I'm not convinced by this architecture at all. Maybe I'm doing it wrong, but I don't think I want to try again.
Unifying the best of the data warehouse and the data lake sounds compelling, but all three of my projects suffered from the following challenges.
You End Up Rewriting a Data Warehouse
The marketing materials make it sound easy, but you inevitably end up thinking about how to lay out data on object stores, maintenance jobs, concurrent access, and transactions, and/or reading the Iceberg spec. Low-level, non-differentiating stuff.
This is engineering effort that could be spent on business logic rather than on reinventing storage-layer primitives.
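To make "maintenance jobs" concrete, here is a minimal sketch of the kind of logic I mean: planning a small-file compaction. It is a toy illustration, not any table format's or engine's API. The names `DataFile` and `plan_compaction`, the 128 MB target, and the 0.75 "small file" cutoff are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    path: str        # hypothetical object-store key
    size_bytes: int


def plan_compaction(files: List[DataFile], target_bytes: int,
                    small_fraction: float = 0.75) -> List[List[DataFile]]:
    """Group small files into batches, each to be rewritten as one larger file."""
    # Files already close to the target size are left alone.
    threshold = target_bytes * small_fraction
    small = sorted((f for f in files if f.size_bytes < threshold),
                   key=lambda f: f.size_bytes, reverse=True)

    batches: List[List[DataFile]] = []
    current: List[DataFile] = []
    current_size = 0
    for f in small:
        # Close the current batch once adding this file would overshoot the target.
        if current and current_size + f.size_bytes > target_bytes:
            if len(current) > 1:  # rewriting a single file gains nothing
                batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if len(current) > 1:
        batches.append(current)
    return batches


MB = 1024 * 1024
files = [
    DataFile("part-0.parquet", 10 * MB),
    DataFile("part-1.parquet", 20 * MB),
    DataFile("part-2.parquet", 30 * MB),
    DataFile("part-3.parquet", 200 * MB),  # already large, not rewritten
]
batches = plan_compaction(files, target_bytes=128 * MB)
for batch in batches:
    print([f.path for f in batch], sum(f.size_bytes for f in batch) // MB, "MB")
```

Even this toy version leaves out the hard parts: committing the rewrite atomically, coping with concurrent writers, and cleaning up the old files afterwards.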
More Expensive Than You Think
You save by using object storage, but you end up writing ingestion logic, transformation jobs, etc., which are expensive to run (e.g., on Databricks or EMR). You also pay for billions of S3 access calls. And then the engineering time costs more than the license fees you were trying to avoid.
The total cost of ownership often exceeds what you'd pay for a managed data warehouse when you factor in:
- Compute costs for transformation engines
- API call costs for object storage
- Engineering time for custom tooling
- Operational overhead
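The list above can be turned into back-of-the-envelope arithmetic. The sketch below is only a way to structure the comparison: the class and field names are hypothetical, and every number in the example is an illustrative placeholder, not a quoted price or a figure from my projects. Plug in your own bills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyLakehouseCosts:
    storage_usd: float            # object storage bill
    compute_usd: float            # transformation engines
    requests: int                 # object-store API calls per month
    usd_per_1000_requests: float  # assumed blended request price
    engineer_hours: float         # time spent on custom tooling
    usd_per_engineer_hour: float
    ops_overhead_usd: float       # on-call, monitoring, upgrades

    def request_cost(self) -> float:
        return self.requests / 1000 * self.usd_per_1000_requests

    def engineering_cost(self) -> float:
        return self.engineer_hours * self.usd_per_engineer_hour

    def total(self) -> float:
        return (self.storage_usd + self.compute_usd + self.request_cost()
                + self.engineering_cost() + self.ops_overhead_usd)


# Placeholder inputs, for illustration only.
costs = MonthlyLakehouseCosts(
    storage_usd=500.0,
    compute_usd=4000.0,
    requests=2_000_000_000,
    usd_per_1000_requests=0.0005,
    engineer_hours=80,
    usd_per_engineer_hour=100.0,
    ops_overhead_usd=1000.0,
)
warehouse_usd = 10_000.0  # hypothetical managed-warehouse bill

print(f"lakehouse total: ${costs.total():,.2f}")
print(f"of which storage: ${costs.storage_usd:,.2f}")
print(f"warehouse bill:   ${warehouse_usd:,.2f}")
```

With these made-up inputs, storage is a small slice of the total, which matches the shape of the problem I kept running into: the cheap part is the part everyone quotes.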
Not Fast Enough for Real-Time Use Cases
You think you don't have a real-time use case, but you inevitably want to run an app or something else latency-sensitive, and the lakehouse soon turns out to be too slow for anything fast and interactive. You are then totally snookered and have to implement caching layers.
Those caches add another layer to your architecture and bring cache invalidation challenges with them.
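Here is a minimal sketch of why the cache becomes its own problem. It is a plain time-to-live cache in front of a slow query, written from scratch for illustration; `TTLCache`, `slow_lakehouse_query`, and the 60-second TTL are hypothetical, and a dict stands in for the lakehouse table.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Serve cached values until they are older than ttl_seconds."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # may be stale if the table changed since loading
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        # Someone has to know when to call this, which is the hard part.
        self._entries.pop(key, None)


table = {"orders_today": 1}   # stand-in for lakehouse data
fake_now = [0.0]              # controllable clock so the demo is deterministic


def slow_lakehouse_query(key: str) -> int:
    return table[key]


cache = TTLCache(slow_lakehouse_query, ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get("orders_today")
table["orders_today"] = 2                 # new data lands in the lakehouse
stale = cache.get("orders_today")         # still the old value
fake_now[0] += 61                         # wait out the TTL
refreshed = cache.get("orders_today")
table["orders_today"] = 3
cache.invalidate("orders_today")          # explicit invalidation on write
after_invalidate = cache.get("orders_today")
print(first, stale, refreshed, after_invalidate)
```

A short TTL brings the latency back, a long TTL serves stale data, and explicit invalidation means every writer now has to know about the cache.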
Ecosystem Tooling is Poor Compared to a Data Warehouse
Reporting, data quality, machine learning, etc.: everything is more awkward than it is with a SQL data warehouse.
The maturity of tooling around traditional data warehouses is hard to beat. Years of ecosystem development mean better integrations, better debugging tools, and better operational workflows.
My Conclusion
The technology is cool for data geeks, and the idea of data built on open standards is compelling. But given how well data warehouses now work with unstructured data, and how well they export to and interoperate with things like Iceberg and Parquet, I honestly think I'll just build around a data warehouse from now on unless I'm operating at truly massive scale.
Modern data warehouses have evolved to handle many of the use cases that lakehouses were designed for, while maintaining the simplicity and performance that made them successful in the first place.

