Confession…I’ve unironically bragged about the high quality of my 40-million-record database (the largest of its kind in the nation), on the strength of its 150-plus edit checks.
Each of those edit checks represented something we'd learned the hard way: a field that shouldn't be null, a date range that didn't make sense, a relationship between columns that broke when upstream systems changed without warning. One hundred fifty rules, hand-crafted, each one a scar from a data quality incident.
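To make that concrete, here's roughly the shape those checks took, a minimal sketch in Python with made-up field names rather than the real spreadsheet logic:

```python
from datetime import date

# Hypothetical record and field names, purely for illustration; the real
# checks lived in spreadsheets that validated other spreadsheets.
def run_edit_checks(record: dict) -> list[str]:
    failures = []

    # A field that shouldn't be null
    if not record.get("account_id"):
        failures.append("account_id is missing")

    # A date range that doesn't make sense
    if record.get("open_date") and record["open_date"] > date.today():
        failures.append("open_date is in the future")

    # A relationship between columns that breaks when upstream systems change
    if record.get("status") == "closed" and not record.get("close_date"):
        failures.append("closed account has no close_date")

    return failures

# Three checks shown here; now imagine maintaining 150 of them by hand.
print(run_edit_checks({"account_id": "A-1", "status": "closed"}))
```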
…and people were impressed.
The arc of data quality
When I started working with data in financial services, quality was craftsmanship. You learned the data. You knew its quirks. You built rules based on institutional knowledge, and you maintained those rules in spreadsheets that validated other spreadsheets. The best data engineers weren't the ones with the fanciest code; they were the ones who'd been around long enough to know which fields lied.
Then, tooling emerged. Data quality vendors gave us profiling, standardization, and matching. The rules got more sophisticated. The interfaces got prettier, but the fundamental model stayed the same: humans write rules, systems enforce them.
The observability turn changed the question. Instead of "did the data pass my rules?" we started asking "is the data behaving the way it usually behaves?" AI put anomaly detection “on steroids”…catching problems we hadn't thought to write rules for. Lineage meant understanding precisely where the problem originated.
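As a rough illustration of that shift (a sketch of the idea, not any vendor's implementation), here's a check that flags a daily row count when it strays far from its own recent history, with no hand-written rule about what the value should be:

```python
import statistics

def is_anomalous(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's metric when it sits far outside its recent behavior."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against a perfectly flat history
    return abs(today - mean) / stdev > z_threshold

# Two weeks of "normal" daily loads, then a day the pipeline half-fails.
daily_row_counts = [40_012, 40_118, 39_987, 40_201, 40_055, 39_940, 40_090,
                    40_160, 39_998, 40_075, 40_130, 39_960, 40_042, 40_110]
print(is_anomalous(daily_row_counts, today=21_500))  # True: the data isn't behaving like itself
```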
And now we're somewhere else entirely. AI can generate rules. AI can explain anomalies. AI can trace lineage and suggest root causes. The 150 edit checks I was so proud of? A modern system would derive most of them faster than I care to imagine.
That's not a threat, but a liberation. The question is no longer "how many rules do you have?" It's "how reliable is your data, and how fast can you fix it when it breaks?"
Where we stand in 2026
The data reliability landscape is in the middle of an identity crisis. Categories that seemed distinct a few years ago are collapsing into each other.
Data observability vendors started with anomaly detection and pipeline monitoring. Now they're adding rule-based quality checks, governance features, and cost visibility. Data quality vendors are adding observability dashboards and lineage tracking. Catalog vendors want to be control planes. Security vendors are building data context because scanning without lineage is just noise.
Meanwhile, the platforms are absorbing everything. Every major data platform wants to own quality, governance, lineage, and access management natively. The independent vendor's competitive edge is under pressure.
And then there's AI. Not just as a feature, but as a fundamental shift in how we approach the problem. Why write rules when a model can learn what "normal" looks like? Why manually trace lineage when an LLM can parse your SQL and explain it in plain language? Why wait for a human to triage an incident when an agent can correlate the anomaly with the deployment that caused it?
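On the lineage question in particular, plain parsing already gets you a long way before any LLM is involved: a deterministic parser can pull the table-level graph out of the SQL, and the model's contribution is explaining it in plain language. A hedged sketch using the open-source sqlglot parser, with a made-up query:

```python
import sqlglot
from sqlglot import exp

sql = """
CREATE TABLE reporting.daily_balances AS
SELECT a.account_id, SUM(t.amount) AS balance
FROM core.accounts AS a
JOIN core.transactions AS t ON t.account_id = a.account_id
GROUP BY a.account_id
"""

# Every table the statement touches: the raw material for lineage edges
# from core.accounts and core.transactions into reporting.daily_balances.
parsed = sqlglot.parse_one(sql)
tables = sorted({f"{t.db}.{t.name}" for t in parsed.find_all(exp.Table)})
print(tables)
```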
We're not in the "edit checks" era anymore.
What's next…
I'll be researching and writing about the trends I'm tracking: the identity crisis in data observability, the convergence of quality, governance, analytics, and security (and perhaps all things IT), as well as the build-vs-buy-vs-platform decisions that data teams increasingly face. The economics have changed: cloud-native architectures broke down the infrastructure silos, and now AI makes cross-domain correlation effective at scale.
If you're a vendor, practitioner, or participant navigating this space, I’d love to know what you’re seeing. Where are the tools working? Where are they falling short? What problems still aren't being solved? The most exciting part of this work is talking to people who are actually in the trenches.
Happy New Year. I’m excited to dig in.