March 2026 – Complex-ish Systems

The security data problem has a structural cause that most tooling conversations avoid. Traditional SIEMs couple storage to compute: you pay per byte ingested, which means every byte retained is a cost center. The rational response is to filter at ingestion — sampling endpoint telemetry, dropping low-priority logs, ignoring anything that doesn’t map cleanly to a known detection use case. The result is a detection layer built on an incomplete data model, by design.

This isn’t a vendor failure. It’s an architectural constraint producing predictable behavior.

The security lakehouse architecture — and Lakewatch specifically, announced today — bets that decoupling compute from storage changes the economics enough to change the behavior. Store everything in open formats on cheap object storage. Run compute against it on demand. Pay for queries, not retention.

That shift has concrete implications for detection engineering that are worth taking seriously. I helped design Lakewatch, so hear me out.

What the architecture enables

When retention cost approaches zero, the first-order benefit is obvious: you keep data you previously discarded. But the more interesting second-order effect is on your data model.

Most detection gaps aren’t gaps in rules; they’re gaps in coverage. You can’t detect lateral movement involving a SaaS application you’re not ingesting. You can’t correlate an endpoint event with an identity event if they live in systems with different retention windows. You can’t build behavioral baselines across sparse data sources if you’re sampling them.
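To make the retention-window point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the event shapes, the field names, the retention values, and the `correlate_by_user` helper are illustrations I made up, not Lakewatch or SIEM APIs. It joins endpoint and identity events on user after dropping anything older than each source's retention window.

```python
from datetime import datetime, timedelta


def still_retained(event, now, retention_days):
    """True if the event is young enough to still exist in its store."""
    return now - event["time"] <= timedelta(days=retention_days)


def correlate_by_user(endpoint_events, identity_events, now,
                      endpoint_retention_days, identity_retention_days):
    """Pair endpoint and identity events for the same user.

    Events older than their source's retention window are treated as
    already deleted, so they can never take part in a correlation.
    """
    live_endpoint = [e for e in endpoint_events
                     if still_retained(e, now, endpoint_retention_days)]
    live_identity = [e for e in identity_events
                     if still_retained(e, now, identity_retention_days)]

    identity_by_user = {}
    for ev in live_identity:
        identity_by_user.setdefault(ev["user"], []).append(ev)

    return [(ep, idn)
            for ep in live_endpoint
            for idn in identity_by_user.get(ep["user"], [])]


# Hypothetical data: a suspicious login, then endpoint activity a day later.
NOW = datetime(2026, 3, 1)
ENDPOINT = [{"user": "alice", "time": NOW - timedelta(days=39),
             "action": "new_service_installed"}]
IDENTITY = [{"user": "alice", "time": NOW - timedelta(days=40),
             "action": "login_from_new_device"}]

# Mismatched windows: endpoint data kept 14 days, identity data kept 90.
mismatched = correlate_by_user(ENDPOINT, IDENTITY, NOW, 14, 90)
# Consistent windows: both kept 90 days.
unified = correlate_by_user(ENDPOINT, IDENTITY, NOW, 90, 90)

print(len(mismatched), len(unified))  # prints: 0 1
```

The detection logic is identical in both calls. The only thing that changes is whether the endpoint half of the story still exists when you go looking for it.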

A unified data model with consistent retention across security, IT, and business data changes what detections are expressible, not just how fast you can run them. The multi-modal angle of ingesting video, audio, and unstructured sources for social engineering and insider threat detection extends the same argument. The constraint wasn’t that teams didn’t want that data. It was that the architecture made it prohibitively expensive to keep.

Detection-as-Code is also worth unpacking. Version-controlled detections with automated testing are conceptually straightforward, but the implementation friction has always been platform support. Most SIEMs treat detections as configuration rather than code, which means no CI/CD, no property-based testing, no systematic coverage analysis. Packaging that as a native feature rather than an afterthought changes how detection engineering can be practiced.
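As a rough illustration of what treating a detection as code can look like, here is a sketch in plain Python. It is not how Lakewatch packages detections: the `LoginEvent` type, the `detect_failed_then_success` rule, and the threshold are all assumptions of mine. The point is only that a detection written as a pure function can live in version control and be exercised by ordinary tests in CI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    user: str
    success: bool


def detect_failed_then_success(events, threshold=5):
    """Flag users with `threshold` or more consecutive failed logins
    immediately followed by a success.

    `events` is assumed to be in chronological order.
    """
    failures = {}
    flagged = set()
    for ev in events:
        if ev.success:
            if failures.get(ev.user, 0) >= threshold:
                flagged.add(ev.user)
            failures[ev.user] = 0  # a success resets the failure streak
        else:
            failures[ev.user] = failures.get(ev.user, 0) + 1
    return sorted(flagged)


def fails(user, n):
    """Test helper: n failed logins for one user."""
    return [LoginEvent(user, False) for _ in range(n)]


def run_tests():
    # True positive: five failures, then a success.
    events = fails("alice", 5) + [LoginEvent("alice", True)]
    assert detect_failed_then_success(events) == ["alice"]

    # Below threshold: four failures, then a success.
    events = fails("bob", 4) + [LoginEvent("bob", True)]
    assert detect_failed_then_success(events) == []

    # A success in the middle resets the streak.
    events = (fails("carol", 3) + [LoginEvent("carol", True)]
              + fails("carol", 3) + [LoginEvent("carol", True)])
    assert detect_failed_then_success(events) == []

    # Failures with no later success are not flagged by this rule.
    assert detect_failed_then_success(fails("dave", 10)) == []


run_tests()
```

Once the rule is a function with tests, a change to the threshold or the reset logic becomes a reviewable diff with a pass or fail signal attached, which is the workflow that detections-as-configuration cannot give you.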

Getting the most out of it

The teams that will extract the most value from this architecture are the ones that bring good data engineering practices to the platform: clear coverage goals defined against an actual asset model, detections maintained in version control, and quality gates before production deployment. Lakewatch removes the storage constraint that has historically made those practices hard to justify economically, which means now is exactly the right time to build them (if you haven’t already).
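Here is one way to sketch coverage goals and a quality gate, again in Python and again under assumptions of my own: the asset types, the technique labels, the detection metadata fields, and both helper functions are hypothetical, not part of any product. Goals are declared per asset type, each detection declares what it covers, and the gate fails when required pairs have no detection.

```python
def coverage_gaps(goals, detections):
    """Return required (asset_type, technique) pairs with no detection.

    goals:      dict mapping asset type -> set of techniques to cover
    detections: list of dicts with "asset_type" and "technique" keys
    """
    required = {(asset_type, technique)
                for asset_type, techniques in goals.items()
                for technique in techniques}
    covered = {(d["asset_type"], d["technique"]) for d in detections}
    return sorted(required - covered)


def quality_gate(goals, detections, max_gaps=0):
    """Pass only if the number of uncovered pairs is within budget."""
    return len(coverage_gaps(goals, detections)) <= max_gaps


# Hypothetical coverage goals derived from an asset model.
GOALS = {
    "endpoint": {"lateral_movement", "credential_access"},
    "saas": {"lateral_movement"},
}

# Hypothetical metadata for the detections currently in the repository.
DETECTIONS = [
    {"name": "remote_service_creation",
     "asset_type": "endpoint", "technique": "lateral_movement"},
]

gaps = coverage_gaps(GOALS, DETECTIONS)
print(gaps)
# prints: [('endpoint', 'credential_access'), ('saas', 'lateral_movement')]
print(quality_gate(GOALS, DETECTIONS))  # prints: False
```

Run as a CI step, a check like this turns "we should cover SaaS lateral movement" from an intention into a failing build until someone either writes the detection or explicitly raises the gap budget.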

The AI agents that automate triage and threat hunting are a real capability multiplier, but like any detection tooling they perform best on clean, well-modeled telemetry. Teams that invest in their data pipelines and schema normalization upfront will see compounding returns as the agentic layer matures.
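Schema normalization is the least glamorous part of that investment, so a small sketch may help. The two vendor formats, the field names, and the common schema below are invented for illustration and do not correspond to any real product or standard. The idea is simply to map each source's fields onto one shared shape and to fail loudly when a record is missing something, rather than passing partial data downstream.

```python
# Hypothetical per-source mappings: raw field name -> common field name.
FIELD_MAPS = {
    "vendor_a": {"usr": "user", "src": "source_ip", "ts": "timestamp"},
    "vendor_b": {"userName": "user", "clientIp": "source_ip",
                 "eventTime": "timestamp"},
}


def normalize(record, source):
    """Map one raw record onto the common schema.

    Raises KeyError for an unknown source and ValueError when a mapped
    field is missing, so malformed telemetry is caught at the pipeline
    boundary instead of inside a detection.
    """
    mapping = FIELD_MAPS[source]
    missing = [raw for raw in mapping if raw not in record]
    if missing:
        raise ValueError(f"{source} record missing fields: {missing}")
    return {common: record[raw] for raw, common in mapping.items()}


a = normalize({"usr": "alice", "src": "10.0.0.5",
               "ts": "2026-03-01T12:00:00Z"}, "vendor_a")
b = normalize({"userName": "alice", "clientIp": "10.0.0.5",
               "eventTime": "2026-03-01T12:00:00Z", "extra": "ignored"},
              "vendor_b")

print(a == b)  # prints: True
```

Two records that describe the same login now compare equal regardless of which vendor emitted them, which is the property any downstream detection or agent depends on.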

The market question

The SiftD acquisition, which brings in the team that built Splunk’s query language and search architecture, signals that Databricks understands practitioner trust matters as much as the data platform story. SPL became the lingua franca of detection engineering because it was optimized for the specific cognitive patterns of writing and debugging detections. That institutional knowledge is now inside Lakewatch, which matters for how the product evolves.

The architectural argument for the security lakehouse has been sound for years. Lakewatch is the most serious production bet on it yet. The teams that get ahead of it now are going to be well-positioned as the rest of the market catches up.

I work at Databricks. This blog is my own independent analysis and not affiliated with my employer.


The security lakehouse architecture is sound. Here’s what changes.