Over the past few weeks, I've been writing about data quality from a few different angles: the critical role of the data analyst, and the importance of empathy (i.e., human judgment) in data work. The story's the same…automation and AI make the human aspect more important, not less.

So, having philosophized on that, let’s get practical. If human judgment is essential, what organizational structures actually support it?

The answer starts with a phrase I've been bombarded with lately: “the data isn't ready.”

I'm sorry, but I've been hearing that since day one. The data wasn't ready in 2005, when the first signs of the financial crisis emerged (you've all seen the movie). It definitely wasn't ready in the depths of that crisis, when every day brought a new recovery plan. And it wasn't ready in 2020, when pandemic response teams scrambled for reliable numbers.

The data has never been ready. What's changed are the needs…and, more recently, the client.

The primary consumer of data has always been people (which in and of itself seems like an absurd thing to say). People are forgiving and capable of making judgment calls. They look at an inconsistency and think, "That looks off," and CALL SOMEONE. They apply common sense and work around the gaps.

And, here’s the thing…humans hesitate. We pause when something doesn’t feel right. AI doesn’t.

This is why “good enough” data stops being good enough at AI scale. The problem isn't that AI makes mistakes…it's that it makes them confidently. AI ingests everything it's given uncritically and produces outputs that inherit every flaw. Garbage in, garbage out is an eternal truth, and AI now scales data quality problems instantly, delivering wrong answers with total certainty.

So once again, and as always, data quality matters.

The market has responded predictably. Every company needs to be an AI company now, and every AI company is shouting about data preparedness. But beneath the positioning, there's a real question…what does "data quality for AI" really mean?

The answer, increasingly, is more data. Observability watching pipelines, metadata tracking lineage. Data about data about data.

The tooling helps, but data quality has always been as much an organizational problem as a technical one. Everyone agrees data quality matters, yet no one is clearly empowered to act when it fails. The workflow matters as much as the tools. Who gets alerted? Who owns the fix? Which “silos” are quietly being kept intact?

The reality is that this is as much about governance as it is about quality.

Specifically: Who owns data quality? Do your data analysts have the authority to stop the line? Are you designing around imperfection? Do your tools amplify judgment or bypass it?

Practically, this means:

  • Escalation paths for data quality issues that actually get used
  • Defined ownership and responsibility at the dataset level
  • Human-in-the-loop workflows where the human has authority (see the sketch after this list)
  • Tools that support investigation
  • Documentation! (Capturing institutional knowledge is key)
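
To make that concrete, here's a minimal sketch of what a “stop the line” gate could look like. Every name in it is hypothetical (Dataset, QualityIssue, run_gate, the owner address)…it illustrates the shape of the workflow, where a check fails, the named owner gets alerted, the pipeline halts, and a human records the resolution. It's not any particular tool's API.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names here are hypothetical; this sketches the workflow shape,
# not a real library or vendor API.

@dataclass
class Dataset:
    name: str
    owner: str                    # ownership defined at the dataset level
    rows: list[dict] = field(default_factory=list)

@dataclass
class QualityIssue:
    dataset: str
    check: str
    detail: str
    resolved: bool = False
    resolution_note: str = ""     # documentation: what the human decided, and why

def null_revenue_check(ds: Dataset) -> list[QualityIssue]:
    """One example check: flag rows with missing revenue."""
    return [
        QualityIssue(ds.name, "null_revenue", f"row {i} has no revenue")
        for i, row in enumerate(ds.rows)
        if row.get("revenue") is None
    ]

def run_gate(ds: Dataset, checks: list[Callable]) -> list[QualityIssue]:
    """Run checks and escalate failures to the dataset's named owner.

    Any issue returned here blocks the pipeline until a human signs off.
    The human has the authority, not the automation.
    """
    issues = [issue for check in checks for issue in check(ds)]
    for issue in issues:
        # Escalation path that actually gets used: a named owner, not a shared inbox.
        print(f"[ALERT -> {ds.owner}] {issue.dataset}/{issue.check}: {issue.detail}")
    return issues

if __name__ == "__main__":
    sales = Dataset("daily_sales", owner="ana.lyst@example.com",
                    rows=[{"revenue": 120.0}, {"revenue": None}])
    open_issues = run_gate(sales, [null_revenue_check])
    if open_issues:
        print("Pipeline halted pending human review.")
        # A person investigates, decides, and documents the decision:
        open_issues[0].resolved = True
        open_issues[0].resolution_note = "Known late-arriving feed; backfilled manually."
```

The point of the sketch isn't the checks themselves…it's that the alert routes to a named owner, the line actually stops, and the resolution gets written down.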

The spotlight is on quality, but we can't treat governance, observability, and productization as if they're new, either. The data has always been messy, and the pipelines have always been fragile. AI scale has just raised the cost of being wrong.

I'm sorry to report that the data will never be ready. The good news is that's because we continue to need more, learn more, and do more with it.