"A flaw of warehouses is that you need to move all your data into them so you can keep it going, and for a lot of organisations that's a big hassle,” says Will Martin, EMEA Evangelist at Dremio. “It can take a long time, it can be expensive, and you ultimately can end up ripping up processes that are there."

In this episode of the Don’t Panic It’s Just Data podcast, recorded live at Big Data LDN 2025, Will Martin joins Shubhangi Dua, Podcast Host and Tech Journalist at EM360Tech. They discuss how enterprises can enable the Agentic AI Lakehouse on Apache Iceberg and why query performance is critical for efficient data analysis.

"If you have a data silo, it exists for a reason—something's feeding information to it. You usually have other processes feeding off of it. So if you shift all that to a warehouse, it disrupts a lot of your business," Martin tells Dua.

This is where a lakehouse comes into play. With a lakehouse approach, organisations can federate access: data stays in its original location, while access to it is centralised through the lakehouse. That lets teams get started quickly, as the sketch below illustrates.
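As a concrete illustration, here is a minimal sketch of centralised access over data left in place, using PyIceberg against a shared catalogue. The library choice is an assumption, and the catalogue URI, warehouse path, and table name are hypothetical; this stands in for, rather than reproduces, Dremio's own federation layer.

```python
# A minimal sketch of centralised access with data left in place,
# using PyIceberg. An illustrative assumption, not Dremio's API;
# the endpoint, warehouse path, and table name are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakehouse",
    uri="https://catalog.example.com",   # one central catalogue endpoint
    warehouse="s3://company-data-lake",  # data stays where it already lives
)

# Analysts read the table through the catalogue; nothing is copied
# into a warehouse first.
sales = catalog.load_table("finance.sales")
emea_sales = sales.scan(row_filter="region = 'EMEA'").to_arrow()
print(emea_sales.num_rows)
```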

On data quality, Martin explains that when everything is accessed from one place, even data that still lives in separate silos becomes visible. That visibility lets you identify issues, address them, and improve your data quality, which benefits AI too.

Lakehouse Key to AI Infrastructure?

The lakehouse has been recognised for unifying and simplifying governance. An essential feature of a lakehouse is the data catalogue, which helps an organisation browse and find information. It also secures access and manages permissions.

"You can access in one place, but you can do all your security and permissions in one place rather than all these individual systems, which is great if you work in IT,” reflects Martin. "There are some drawbacks to lakehouses. So, a big component of a lakehouse is metadata. It can be quite big, and it needs managing. Certain companies and vendors are trying to deal with that."

AI and AI agents have raised the stakes for optimising analytics on a lakehouse, but at the same time the technical barriers to asking questions are disappearing. Martin explains that anyone can now prompt a question; an enterprise CEO, for instance, could query the data directly and demand justifications.

In the past, a request would have to be submitted, and a data scientist or engineer would create the dataset and hand it over. Now, engineers' roles have shifted towards optimisation: helping queries run smoothly and keeping tables efficient, work that agents cannot do for them.

Also Listen: Dremio: The State of the Data Lakehouse

Optimising the Lakehouse

Vendors such as Dremio provide services to manage and optimise lakehouses, offering autonomous features that help set up and tune the workflow. Martin says that in many cases, Dremio learns from clients' actions and improves their system. “This is evident in our reflections, which are optimised datasets that speed up performance,” he adds.

“In other situations, we handle tasks like file compaction and garbage collection, which are often less exciting for engineers. Now, there’s no need for engineers to manage those tasks, which benefits everyone.”
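For a sense of what those chores involve when done by hand, here is a sketch using Apache Iceberg's Spark maintenance procedures. It shows the kind of work being automated, not Dremio's actual implementation; the catalogue and table names are hypothetical, and the Spark session is assumed to be configured with an Iceberg catalogue.

```python
# What file compaction and snapshot cleanup look like when run manually,
# via Apache Iceberg's Spark procedures. A sketch of the chores being
# automated, not Dremio's implementation; names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact many small data files into fewer, larger ones.
spark.sql("CALL lakehouse.system.rewrite_data_files(table => 'finance.sales')")

# Expire old snapshots so unreferenced files can be garbage-collected.
spark.sql("""
    CALL lakehouse.system.expire_snapshots(
        table => 'finance.sales',
        older_than => TIMESTAMP '2025-01-01 00:00:00'
    )
""")
```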

Dremio itself is Iceberg native: it began its journey as a lakehouse provider and has continued down that road. The wider industry has since shifted gears to focus on lakehouses too, first Snowflake and now even Databricks, which developed its own table format, Delta Lake.

The ultimate goal is to incorporate more features—permissions, governance, and fine-grained access control. “These capabilities are things vendors typically sell, but they will soon become widely available for free,” Martin tells Dua.

Learn More: Visit dremio.com for more information on open data lakehouse technology.

Key Takeaways

  • Agentic AI and Apache Iceberg were the hot topics at Big Data LDN 2025.
  • Lakehouses offer quicker, less disruptive data access for AI compared to data warehouses.
  • Centralised access in a lakehouse improves data quality and simplifies AI integration.
  • Lakehouses, with their data catalogues, ease governance and permission management for AI agents working with sensitive data.
  • Apache Iceberg is resolving metadata format issues, though metadata management remains an overhead.
  • Dremio, an Iceberg-native provider, champions open source and interoperability, offering autonomous optimisation features to free engineers from mundane tasks.
  • Beyond technology, a robust data strategy is crucial for organisational data improvement.
  • Agentic AI will evolve to handle more delegated, multi-step tasks with less supervision.
  • The open-source ecosystem will see consolidation and improved features, making advanced catalogue and governance tools widely available.
  • Ultimately, for IT decision-makers, the quality of data is paramount for all analytical endeavours, including AI.

Chapters

  • 0:00 - Introduction to Agentic AI
  • 0:35 - Big Data London's Hot Topics: Agentic AI and Apache Iceberg
  • 1:37 - Data Lakehouse vs. Data Warehouse for AI
  • 2:30 - Data Quality and AI with a Lakehouse
  • 3:18 - AI Agents and Sensitive Data: Governance with a Lakehouse
  • 4:19 - Challenges and Solutions in Lakehouse Technology (Apache Iceberg)
  • 5:47 - Dremio's Use Cases and Interoperability
  • 7:40 - Dremio's Standout Features and Autonomous Optimisation
  • 9:39 - The Importance of Data Strategy
  • 10:29 - Future of Agentic AI
  • 11:34 - Future of the Open-Source Ecosystem
  • 12:51 - Final Takeaway for IT Decision Makers: Data Quality is Critical
  • 13:51 - Conclusion