Designing and building an open-source ClickHouse cluster to capture network telemetry from thousands of routers at 500k rows per second, powering SOC and NOC analytics integrated with Splunk.
Our client operates a large enterprise network estate spanning thousands of routers, switches and edge devices distributed across multiple sites and geographies.
Network telemetry, flow data and device-level events are fundamental inputs to both their Security Operations Centre (SOC), for threat detection and incident response, and their Network Operations Centre (NOC), for availability, capacity and performance monitoring.
Their incumbent tooling was struggling to keep up with the sustained ingest rates and retention requirements, whilst the cost of pushing the full firehose of telemetry into Splunk for long-term retention had become prohibitive. They needed a high-throughput, cost-effective analytics tier that could sit alongside Splunk and act as the system of record for raw network telemetry.
The client was experiencing the following challenges prior to the engagement:
We took the following approach to this project:
Key outcomes of the project included: