Companies can access the multi-structured data they need to make AI/ML successful. But integrating and governing that data remains challenging.
That's my initial take as Shawn Rogers and I start to research adoption trends, requirements, and best practices for managing data to support AI initiatives.
We're excited to see what the hard data will tell us.
In the meantime, here’s my hunch about the maturity and readiness of data teams to drive AI success.
Data/AI leaders, what do you think? Where is your company in its journey? Quick cheat sheet: “+” signifies high maturity and readiness, “~” is medium, and “0” is low.
People:
Lack of AI team skills/expertise ranked as the top challenge in Shawn and Merv Adrian's recent report about AI adoption. But I think the picture is less gloomy on the data management side of the house.
Access: high (+)
Data engineers have solid experience collecting and accessing data objects of all types—tables, log files, even unstructured objects such as documents and images.
Integration: medium (~)
But they have less experience integrating and preparing those unstructured objects for AI. They need the help of data scientists and new pipeline tools.
Governance: medium (~)
Data stewards and other governance roles also have new things to learn about governing unstructured data. On one hand, authorization and masking techniques are straightforward. On the other hand, data quality is a whole new ball game for unstructured data. To make a vector database return accurate results, data engineers and data scientists must choose the right chunking technique, assemble the right metadata, and apply the right embedding model. All new stuff to learn.
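To make the chunking and metadata choices more concrete, here is a minimal Python sketch. It assumes a simple fixed-size character window with overlap. Many real pipelines chunk by sentence, paragraph, or token count instead. The helper name `chunk_document` and the metadata fields (`source`, `chunk_index`, `char_start`) are hypothetical. The embedding step is left out because it depends entirely on which model you pick.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, source: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Hypothetical helper: split text into overlapping fixed-size character windows,
    tagging each chunk with metadata that traces it back to its source."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks: list[Chunk] = []
    step = size - overlap  # each window starts `step` characters after the previous one
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + size]
        if not piece:  # empty input produces no chunks
            break
        chunks.append(Chunk(piece, {"source": source, "chunk_index": i, "char_start": start}))
        if start + size >= len(text):  # this window already reaches the end of the text
            break
    return chunks
```

Changing `size` or `overlap`, or swapping in a sentence-aware splitter, is the kind of chunking decision described above. Because the metadata travels with each chunk, a retrieval hit can be traced back to its source document, which also helps governance.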
Technology:
Access: high (+)
Data lakes and now lakehouses offer familiar platforms for bringing all types of data together, and many, if not most, companies now have them.
Integration: low (0)
New tools from innovative vendors such as Datavolo, unstructured.io, Flexor, and Airbyte help data engineers transform unstructured text into usable inputs for AI/ML models. But not enough companies have adopted such tools yet.
Governance: low (0)
Most companies aren’t yet using the right tools to catalog, observe, and validate either unstructured data inputs or AI/ML models.
Process:
Access: high (+)
Companies have mature processes for accessing multi-structured data.
Integration: low (0)
They have little or no process yet for transforming and iterating on those inputs for AI.
Governance: medium (~)
Many, if not most, companies have at least a rudimentary governance program in place to support BI. Now they must adapt those programs and processes to address new risks related to unstructured data and AI/ML.
Look forward to feedback on this.