Companies can access the multi-structured data they need to make AI/ML successful. But integrating and governing that data remains challenging.

That's my initial take as Shawn Rogers and I start to research adoption trends, requirements, and best practices for managing data to support AI initiatives. 

We're excited to see what the hard data will tell us. 

In the meantime, here’s my hunch about the maturity and readiness of data teams to drive AI success. 

Data/AI leaders, what do you think? Where is your company in its journey? Quick cheat sheet: “+” signifies high maturity and readiness, “~” is medium, and “0” is low. 

People: 

Lack of AI team skills/expertise ranked as the top challenge in Shawn and Merv Adrian's recent report about AI adoption. But I think the picture is less gloomy on the data management side of the house. 

Access: high

Data engineers have solid experience collecting and accessing data objects of all types—tables, log files, even unstructured objects such as documents and images. 

Integration: medium 

But they have less experience integrating and preparing those unstructured objects for AI. They need the help of data scientists and new pipeline tools. 

Governance: medium 

Data stewards and other governance roles also have new things to learn about governing unstructured data. On one hand, authorization and masking techniques are straightforward. On the other hand, data quality is a whole new ball game for unstructured data. To get accurate retrieval from a vector database, data engineers and data scientists must choose the right chunking technique, assemble the right metadata, and apply the right embedding model. All new stuff to learn.
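
To make the chunking and metadata choices concrete, here is a minimal sketch in Python. It assumes the simplest possible technique, fixed-size word windows with overlap, which is only one of many options. The names `Chunk` and `chunk_words`, the metadata fields, and the default sizes are all hypothetical, chosen for illustration. The embedding step is left out because it depends on whichever model a team picks.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_words(text, source, size=50, overlap=10):
    """Split text into overlapping word windows, each tagged with metadata.

    Hypothetical helper: size and overlap are counted in words, and the
    metadata fields are illustrative, not a standard schema.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(Chunk(
            text=" ".join(window),
            metadata={"source": source, "start_word": start, "n_words": len(window)},
        ))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then go to an embedding model and land in the vector database along with its metadata. Changing `size` or `overlap` changes what retrieval returns, which is why these choices become a data quality question.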

Technology: 

Access: high 

Data lakes and now lakehouses offer familiar platforms for bringing all types of data together, and many/most companies now have them.

Integration: low 

New tools from innovative vendors such as Datavolo, unstructured.io, Flexor, and Airbyte help data engineers transform unstructured text into usable inputs for AI/ML models. But not enough companies have adopted such tools yet. 

Governance: low 

Most companies aren’t yet using the right tools to catalog, observe, and validate either unstructured data inputs or AI/ML models.

Process: 

Access: high 

Companies have mature processes for accessing multi-structured data. 

Integration: low 

They have little or no process yet for transforming those inputs for AI and iterating on them.

Governance: medium 

Many/most companies have at least a rudimentary governance program in place to support BI. Now they must adapt those programs and processes to address new risks related to unstructured data and AI/ML. 

Look forward to feedback on this.