Modern organizations recognize that updating their data architecture is essential to remain competitive. In recent years, an array of new technologies and methods has transformed how companies leverage data to serve customers and stay ahead in the market. Rather than simply reacting to events, today’s data-driven organizations use these advances to anticipate business needs and proactively shape outcomes. Those that fail to evolve risk falling behind in a fast-paced, data-centric world.
Eckerson Group has written and consulted extensively about modern data architectures. This article summarizes the major characteristics of a modern data architecture and serves as a guide for organizations that are in the midst of developing a new data strategy for the modern age. (For a slightly different perspective, read what drives companies to adopt a modern data architecture and what features they request most often.)
What is Data Architecture?
Like a conventional architect who designs homes or buildings, a data architect creates a blueprint of a data environment that aligns with the short- and long-term goals of an organization and its unique cultural and contextual requirements.
For most people, data architecture defines a standard set of products and tools an organization uses to manage data. But it is much more than that. A data architecture defines the processes to capture, transform, and deliver usable data to business users. Most importantly, it identifies the people who will consume that data and their unique requirements. A good data architecture flows right to left: from data consumers to data sources—not the other way.
From Old to New. In the past, organizations built fairly static IT-driven data architectures. We called them data warehouses. Because of the underlying technology and design patterns, most data warehouses take an army of people to build and change, providing minimal return on investment. Most are glorified corporate data dumps, although some sing beautifully, providing a rich harmony of integrated dimensional data for reporting and analysis.
A modern data architecture may still deliver a data warehouse—ideally, one that is flexible, adaptable, and agile. But a data warehouse is just one component of a modern data architecture or modern analytics ecosystem, as some call it. (See figure 1.) The new data environment is a living, breathing organism that detects and responds to changes, continuously learns and adapts, and provides governed, tailored access to every individual.
Versus a Data Platform. In addition, a data architecture is not a data platform. The latter refers to the engines and tools that do the heavy lifting of moving, shaping, and validating data. A data platform consists of the underlying database engines (e.g., relational, Hadoop, OLAP) that process data as well as the data assembly framework that enables data engineers from IT and the business to create data sets for business consumption.
“Data assembly” is a new term I’m using that supersedes the term “data integration,” which has an IT-centric connotation. Data assembly reinforces the notion that the modern data architecture is a collaborative venture between business and IT.
Ten Characteristics
A modern data architecture exhibits the following ten characteristics:
Customer-centric. Rather than focus on the data or the technology required to extract, ingest, transform, and present information, a modern data architecture starts with business users and their requirements and flows backward, as mentioned above. Customers can be internal or external to an organization and their needs vary by role, by department, and over time. A good data architecture continuously evolves to meet new and changing customer information needs.
Adaptable. In a modern data architecture, data flows like water from source systems to business users. The purpose of the architecture is to manage that flow by creating a series of interconnected and bidirectional data pipelines that serve various business needs. The pipelines are constructed using base data objects—data snapshots, data increments, data views, reference data, master data, and flat, subject-oriented tables. The data objects serve as building blocks that are continuously reused, repurposed, and replenished to ensure the steady flow of high quality, relevant data to the business.
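To make the building-block idea concrete, here is a minimal sketch in Python. It assumes a pipeline is just an ordered list of small, reusable steps; the names `snapshot`, `with_reference`, `subject_table`, and `build_pipeline` are hypothetical and chosen for illustration, not taken from any product.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def snapshot(rows: List[Row]) -> List[Row]:
    """Base object: a point-in-time copy, so source rows are never mutated."""
    return [dict(r) for r in rows]


def with_reference(ref: Dict[str, str], key: str, target: str) -> Step:
    """Base object: enrich rows with reference data looked up by `key`."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**r, target: ref.get(r[key], "unknown")} for r in rows]
    return step


def subject_table(columns: List[str]) -> Step:
    """Base object: a flat, subject-oriented projection of selected columns."""
    def step(rows: List[Row]) -> List[Row]:
        return [{c: r[c] for c in columns} for r in rows]
    return step


def build_pipeline(*steps: Step) -> Step:
    """Chain building blocks into one pipeline, applied left to right."""
    def run(rows: List[Row]) -> List[Row]:
        for s in steps:
            rows = s(rows)
        return rows
    return run


# Illustrative data: one source and one reference table feed two pipelines.
orders = [
    {"order_id": 1, "country": "DE", "amount": 40.0},
    {"order_id": 2, "country": "US", "amount": 15.5},
]
regions = {"DE": "EMEA", "US": "AMER"}

enrich = with_reference(regions, key="country", target="region")
finance_view = build_pipeline(snapshot, enrich, subject_table(["order_id", "amount", "region"]))
marketing_view = build_pipeline(snapshot, enrich, subject_table(["country", "region"]))
```

The same `snapshot` and `enrich` blocks serve both views, which is the reuse the paragraph above describes. A real environment would add scheduling, incremental loads, and persistence.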
Automated. To create an adaptable architecture in which data flows continuously, designers must automate everything. The architecture must profile and tag data as it’s ingested and map it to existing data sets and attributes, a process called metadata injection and a key function of data catalogs. In the same manner, it must detect changes in source schema and identify the impact of those changes on downstream objects and applications. In a real-time environment, it must detect anomalies and notify the appropriate individuals or trigger alerts in operational dashboards.
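As a rough illustration of schema-change detection and impact analysis, the sketch below compares two versions of a source schema and looks up which downstream objects depend on the affected columns. The schema representation (column name mapped to a type string) and the `lineage` mapping are assumptions made for this example; a data catalog would normally supply both.

```python
from typing import Dict, List, Set

Schema = Dict[str, str]  # column name -> declared type (assumed representation)


def diff_schema(old: Schema, new: Schema) -> Dict[str, List[str]]:
    """Report columns that were added, removed, or changed type."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }


def impacted_objects(changes: Dict[str, List[str]],
                     lineage: Dict[str, Set[str]]) -> List[str]:
    """Return downstream objects that read a removed or retyped column.

    `lineage` maps each downstream object to the source columns it uses.
    """
    affected = set(changes["removed"]) | set(changes["retyped"])
    return sorted(obj for obj, cols in lineage.items() if cols & affected)
```

For example, if `signup` changes from `date` to `timestamp`, any report whose lineage includes `signup` is flagged, while objects that only read untouched columns are not.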
Smart. The ideal data architecture is more than just automated; it uses machine learning and artificial intelligence to build the data objects, tables, views, and models that keep data flowing. It uses intelligence rather than brute force to identify data types, common keys and join paths, identify and fix data quality errors, map tables, identify relationships, recommend related data sets and analytics, and so on. A modern data architecture uses intelligence to learn, adjust, alert, and recommend, making people who administer and use the environment more efficient and effective.
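Identifying common keys and join paths can be approximated with a simple value-overlap heuristic. The sketch below is a rule-based stand-in, not machine learning: it flags column pairs whose distinct values largely coincide. The function name `candidate_join_keys` and the 0.8 threshold are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Table = Dict[str, list]  # column name -> list of values (assumed layout)


def candidate_join_keys(left: Table, right: Table,
                        min_overlap: float = 0.8) -> List[Tuple[str, str, float]]:
    """Suggest (left_column, right_column, overlap) pairs as possible join keys.

    Overlap is the share of the smaller column's distinct values that also
    appear in the other column.
    """
    suggestions = []
    for left_col, left_vals in left.items():
        left_set = set(left_vals)
        if not left_set:
            continue
        for right_col, right_vals in right.items():
            right_set = set(right_vals)
            if not right_set:
                continue
            overlap = len(left_set & right_set) / min(len(left_set), len(right_set))
            if overlap >= min_overlap:
                suggestions.append((left_col, right_col, round(overlap, 2)))
    # Highest-overlap pairs first.
    return sorted(suggestions, key=lambda s: -s[2])
```

A learned model could replace the overlap score with features such as column names, data types, and cardinality, but the output, a ranked list of suggested joins for a person to confirm, would look much the same.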
Flexible. A modern data architecture needs to be flexible enough to support a multiplicity of business needs. It needs to support multiple types of business users, load operations and refresh rates (e.g., batch, mini-batch, stream), query operations (e.g., create, read, update, delete), deployments (e.g., on premises, public cloud, private cloud, hybrid), data processing engines (e.g., relational, OLAP, MapReduce, SQL, graphing, mapping, programmatic), and pipelines (e.g., data warehouse, data mart, OLAP cubes, visual discovery, real-time operational applications). A modern data architecture has to be all things to all people.
Collaborative. Unlike in the past, when the IT department built everything, a modern data architecture splits the responsibility for acquiring and transforming data between IT and the business. The IT department still does the heavy lifting of ingesting data from core operational systems and creating generic reusable building blocks. But from there, business units take over (if they have the skills, desire, and need). Data engineers and analysts in business units use data preparation and data catalog tools to create custom data sets composed of corporate and local data and use them to create and power business unit applications. This collaboration frees IT from having to know business context, which has never been its strong suit.
Governed. Ironically, governance is the key to self-service. A modern data architecture defines access points for each type of user to meet their information requirements. This was the basis of my 2016 report, A Reference Architecture for Self-Service Analytics, which defines the access points for four classes of business users: data consumers, data explorers, data analysts, and data scientists. For instance, data scientists need to be given access to raw data in the landing area or, better yet, a purpose-built sandbox where they can mix raw corporate data with their own data.
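One way to picture access points is as a lookup from user class to the zones that class may read. In the sketch below, the four user classes and the data scientist's access to the landing area and sandbox come from the text above; every other zone name and assignment is a hypothetical placeholder, not the mapping defined in the report.

```python
from enum import Enum
from typing import Dict, Set


class UserClass(Enum):
    CONSUMER = "data consumer"
    EXPLORER = "data explorer"
    ANALYST = "data analyst"
    SCIENTIST = "data scientist"


# Hypothetical zone assignments. Only the scientist's landing_area and
# sandbox entries reflect the article; the rest are illustrative.
ACCESS_POINTS: Dict[UserClass, Set[str]] = {
    UserClass.CONSUMER: {"reports"},
    UserClass.EXPLORER: {"reports", "data_marts"},
    UserClass.ANALYST: {"reports", "data_marts", "warehouse"},
    UserClass.SCIENTIST: {"warehouse", "landing_area", "sandbox"},
}


def can_access(user_class: UserClass, zone: str) -> bool:
    """Return True if the user class has a defined access point for the zone."""
    return zone in ACCESS_POINTS[user_class]
```

Keeping the mapping in one declarative table is the design point: self-service works because each class of user knows where to go, and the table is easy to audit.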
Simple. Like Occam’s razor, the simplest architecture is the best architecture. This is a tall task given the diversity of the requirements and the complexity of components in today’s data architecture. To apply this rule, an organization with small data might be better served by a BI tool with a built-in data management environment rather than a massively parallel processing (MPP) appliance or Hadoop system. To reduce complexity, organizations should strive to limit data movement and data duplication and advocate for a uniform database platform, data assembly framework, and analytic platform, despite the howls of best-of-breed proponents.
Elastic. In the age of big data and variable workloads, organizations need a scalable, elastic architecture that adapts to changing data processing requirements on demand. Many companies are now flocking to cloud platforms (both public and private) to obtain on-demand scalability at affordable prices. Elastic architectures free administrators from having to calibrate capacity exactly, throttle usage if necessary, and overbuy hardware incessantly. Elasticity also spawns many types of applications and use cases, such as on-demand development and test environments, analytic sandboxes, and prototyping playgrounds.
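The arithmetic behind on-demand scaling is simple, as this minimal sketch shows. It assumes workload is measured in queued jobs and that each node handles a fixed number of jobs; the function name, parameters, and default limits are hypothetical, and cloud platforms apply their own, more sophisticated policies.

```python
import math


def target_nodes(queued_jobs: int, jobs_per_node: int,
                 min_nodes: int = 1, max_nodes: int = 20) -> int:
    """Size the cluster to current demand, within a floor and a ceiling."""
    needed = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

With the defaults, 95 queued jobs at 10 jobs per node calls for 10 nodes, an idle queue falls back to the 1-node floor, and a spike of 1,000 jobs is capped at 20 nodes.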
Secure. A modern data architecture is a freedom fortress: it provides authorized users ready access to data while keeping hackers and intruders at bay. It also complies with privacy regulations, including the Health Insurance Portability and Accountability Act (HIPAA) and the European Union’s General Data Protection Regulation (GDPR). It does this by encrypting data upon ingest, masking personally identifiable information (PII), and tracking all data elements in a data catalog, including their lineage, usage, and audit trail. Lifecycle management ensures each data object has an owner, a location, and an obsolescence plan.
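Masking PII at ingest can be sketched as follows. This example pseudonymizes values with a salted SHA-256 hash, which is one masking technique among several and is not encryption; the field list `PII_FIELDS` and the function names are assumptions for illustration, and a production system would manage the salt as a secret.

```python
import hashlib
from typing import Dict

# Hypothetical list of fields tagged as PII, e.g. by a data catalog.
PII_FIELDS = {"email", "ssn"}


def mask_value(value: str, salt: str) -> str:
    """Replace a value with a short, salted SHA-256 token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_record(record: Dict[str, object], salt: str) -> Dict[str, object]:
    """Mask PII fields and pass all other fields through unchanged."""
    return {
        key: mask_value(str(value), salt) if key in PII_FIELDS else value
        for key, value in record.items()
    }
```

Because the same input and salt always yield the same token, masked columns can still be joined and counted without exposing the underlying values.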
BONUS CHARACTERISTIC
Resilient. Any data architecture must be resilient, equipped with high availability, disaster recovery, and backup/restore capabilities. This is especially crucial in a modern data architecture, which often operates across vast cloud server infrastructures where outages are not uncommon. Fortunately, many cloud providers now offer built-in redundancy, failover options, and strong service level agreements, allowing companies to establish disaster recovery in distributed data centers at a relatively low cost.
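At the application level, failover amounts to trying the next replica when one is unreachable. The sketch below assumes each replica is exposed as a zero-argument callable that raises `ConnectionError` when it is down; the function name and that convention are hypothetical, and managed cloud services typically handle this for you.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def read_with_failover(readers: List[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, reader) in order; return the first successful result."""
    errors = []
    for name, reader in readers:
        try:
            return name, reader()
        except ConnectionError as exc:
            # Record the failure and fall through to the next replica.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))
```

Listing the primary first and a replica in another data center second gives a simple preference order, and the collected error messages feed the audit trail when every replica is down.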
While there are certainly more than ten characteristics of a modern data architecture, this list covers some of the most essential ones. If there’s a characteristic you feel is missing from this list, I’d love to hear your thoughts. And for further insights, check out our blog on the ten things companies seek in a modern data architecture.
This is a top trending article on the Eckerson Group site and was published on Nov 25, 2018.