A detailed overview
Every year, new tools emerge that help us keep up with the increasing volume of data generated every day by making the management, discovery, and retrieval of data so much easier. On this page, you’ll learn what a data mesh is, what it does, and when to use one. Without further ado, let’s get started.
What is a data mesh?
When you hear the term “data mesh” (a new concept in data management), two words should come to mind: decentralized and domain.
Something that’s decentralized is distributed across multiple locations. With a data mesh, this means that an organization’s data is spread across various domains within the organization. A domain typically corresponds to a department, such as sales or accounting.
But a data mesh goes beyond that, as you’ll learn below. To help you better understand what a data mesh is, we’ll compare it with two other data management approaches: a data lake and a data fabric.
Data mesh vs data lake
Essentially, a data lake is a centralized data architecture used for storing an organization’s data. All data coming from different domains in an organization, such as marketing, sales, and ERP, are stored in one central location. If a person from the marketing team needs to access a particular data point, they’ll need to consult the IT team responsible for managing the data lake.
A data mesh operates differently: it takes a distributed approach. Each domain within an organization is responsible for maintaining and managing its specific data. In other words, the marketing team manages the marketing data, the sales team manages the sales data, and so forth. The marketing team does not need to consult the IT team to access its own data, and neither does the sales team.
Data mesh vs data fabric
The main difference between a data mesh and a data fabric is how ownership and governance are distributed. As mentioned above, a data mesh is a relatively new concept that emphasizes the democratization of data ownership and governance. In contrast, a data fabric is a more established concept that focuses on a unified architecture for managing and governing data.
A data fabric enables data discovery, access, and governance to occur in one centralized location. A data mesh, on the other hand, allows for greater autonomy and agility among teams by organizing data around domains.
Principles of a data mesh
As stated earlier, a data mesh moves organizations away from traditional data management techniques, where a single team manages the entire data architecture, toward a more distributed approach. A data mesh has specific principles that define its architecture. You can only consider a data architecture a data mesh if it adheres to the following:
Domain-oriented ownership
Self-service data infrastructure
Federated data governance
Data as products
Domain-oriented ownership
Domain-oriented ownership means that each domain within an organization that generates and uses data should own and manage its own data. This includes setting standards and access controls for the domain team. For instance, the marketing domain should be able to implement access controls that restrict certain team members from accessing or modifying particular data.
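To make the access control example more concrete, here is a minimal sketch of a policy that a domain could own and maintain itself. The class name DomainAccessPolicy, the role names, and the dataset name are all hypothetical and chosen only for illustration; a real setup would rely on whatever access control system the organization already uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DomainAccessPolicy:
    """Hypothetical access policy owned and maintained by a single domain."""

    domain: str
    # Each mapping goes from a dataset name to the roles allowed that action.
    readers: Dict[str, Set[str]] = field(default_factory=dict)
    writers: Dict[str, Set[str]] = field(default_factory=dict)

    def can_write(self, role: str, dataset: str) -> bool:
        return role in self.writers.get(dataset, set())

    def can_read(self, role: str, dataset: str) -> bool:
        # Assumption for this sketch: anyone who may write may also read.
        return role in self.readers.get(dataset, set()) or self.can_write(role, dataset)


# The marketing domain decides for itself who may read or modify its data.
marketing_policy = DomainAccessPolicy(
    domain="marketing",
    readers={"campaign_results": {"marketing_intern"}},
    writers={"campaign_results": {"marketing_analyst"}},
)
```

In this sketch, a marketing intern can read the campaign results but cannot modify them, while a role from outside the policy gets no access at all.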
Self-service data infrastructure
Self-service data infrastructure means that the entire data infrastructure, from storage to processing and analytics, must be self-service. Each domain is responsible for managing its own infrastructure, including data storage, data processing, and the analytical dashboards it uses to gain insights from its data. As such, every domain must be proficient with the data infrastructure tools it relies on.
Federated data governance
Federated data governance means that responsibility for data governance is distributed across the organization and held together by shared standards and protocols. Each domain governs its own data while adhering to those shared standards and protocols.
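One way to picture federated governance is as a set of shared rules that every domain must pass, plus extra rules that a domain adds for itself. The sketch below assumes this split; the rule functions, the SHARED_RULES and DOMAIN_RULES names, and the dataset fields are hypothetical and not taken from any particular governance tool.

```python
from typing import Callable, Dict, List, Optional

Dataset = Dict[str, object]
# A rule returns an error message, or None when the dataset passes.
Rule = Callable[[Dataset], Optional[str]]


def require_owner(dataset: Dataset) -> Optional[str]:
    return None if dataset.get("owner") else "missing owner"


def require_lowercase_name(dataset: Dataset) -> Optional[str]:
    name = str(dataset.get("name", ""))
    ok = bool(name) and name == name.lower() and " " not in name
    return None if ok else "name must be lowercase without spaces"


def require_region(dataset: Dataset) -> Optional[str]:
    return None if dataset.get("region") else "missing region"


# Hypothetical standards agreed on by the whole organization.
SHARED_RULES: List[Rule] = [require_owner, require_lowercase_name]

# Hypothetical extra rules that an individual domain sets for itself.
DOMAIN_RULES: Dict[str, List[Rule]] = {"sales": [require_region]}


def check_dataset(domain: str, dataset: Dataset) -> List[str]:
    """Run the shared rules first, then the domain's own rules."""
    errors = []
    for rule in SHARED_RULES + DOMAIN_RULES.get(domain, []):
        message = rule(dataset)
        if message is not None:
            errors.append(message)
    return errors
```

Calling check_dataset("sales", {...}) returns a list of violations, which is empty when the dataset meets both the shared and the sales-specific rules.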
Data as products
Managing data as products means that domains that are responsible for managing data must think of it like a product. Whenever another domain needs to access that data, the domain that’s managing it must provide high-quality data. Also, the domain must provide metadata and other artifacts, thus creating a positive customer experience for other domains.
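As an illustration of what "metadata and other artifacts" might look like, the following sketch describes a data product as a small, self-documenting record. The DataProduct class and its fields are assumptions made for this example, not a standard format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor that a domain publishes alongside its data."""

    name: str
    owner_domain: str
    description: str
    schema: Dict[str, str]  # column name -> column type
    quality_checks: List[str] = field(default_factory=list)

    def describe(self) -> str:
        # Summarize the product so consumers in other domains can discover it.
        columns = ", ".join(f"{col} ({typ})" for col, typ in self.schema.items())
        return (
            f"{self.name} (owned by {self.owner_domain}): "
            f"{self.description} Columns: {columns}"
        )


orders = DataProduct(
    name="orders_daily",
    owner_domain="sales",
    description="One row per order per day.",
    schema={"order_id": "string", "amount": "decimal"},
    quality_checks=["order_id is unique"],
)
```

Another domain that wants to use this data can read the description, schema, and quality checks without having to ask the sales team how the data is structured.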
What is a data mesh used for?
A data mesh is a modern approach to data architecture that addresses the bottlenecks of centralized data management and governance. It creates an agile environment in which distributed data management encourages more people inside the organization to be data driven. Additionally, a data mesh helps break down data silos and allows teams to work autonomously. The result is faster data delivery and improved data quality compared to a centralized approach, where each domain needs to consult a central team just to get access to a particular data point.
Here are some of the benefits of using a data mesh:
Improved data quality. Domains are able to manage their own data using the expertise they’ve acquired inside that domain.
Better communication and collaboration. Data is now more accessible and transparent.
Faster data delivery. A centralized approach often results in unnecessary delays.
When to use a data mesh
Although a data mesh is a modern approach to data management and governance, it’s not a one-size-fits-all solution. Rather, it’s an approach that some organizations adopt because it better meets their requirements.
A data mesh is particularly suitable for large organizations struggling with data silos, delayed data delivery, and poor data quality. It’s also useful for organizations that want most of their team members to be data driven and that plan to equip their teams with the necessary data skills. It is especially effective for organizations with multiple teams that work with different datasets and need autonomy over their own data to improve its quality.
How to design and implement a data mesh
Designing and implementing a data mesh requires a strategic and collaborative approach that involves the entire organization, including all domains. Here are some steps to consider:
Identify the data domains that exist in your organization. This involves identifying the data that the organization uses, which domains or teams use it, and their relationship with the data.
Define the data products that each domain will handle. This involves determining things such as the data schema, business logic, and data quality requirements for each product.
Establish domain ownership. In other words, make the domains the owners of the data products they handle and hold them accountable for producing high-quality data products and delivering them on time.
Adopt a federated governance approach, in which each domain governs its own data while adhering to standards and policies shared across the organization.
Implement a data infrastructure that allows each domain to carry out its required tasks in terms of storage, processing, analytics, etc.
Establish a data culture that encourages everyone in the organization to be data driven and to understand why that matters.
Continuously improve and iterate the process, identify areas that need improvement, and make adjustments as needed.
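The first few steps above (identifying domains, defining data products, and establishing ownership) can be sketched as a small in-memory catalog. The MeshRegistry class and the domain and product names below are hypothetical; in practice this role is usually filled by a data catalog tool rather than hand-written code.

```python
from typing import Dict, List


class MeshRegistry:
    """Hypothetical catalog of domains and the data products they own."""

    def __init__(self) -> None:
        self._products_by_domain: Dict[str, List[str]] = {}

    def add_domain(self, domain: str) -> None:
        # Registering a domain twice keeps its existing products.
        self._products_by_domain.setdefault(domain, [])

    def add_product(self, domain: str, product: str) -> None:
        # Every data product must belong to a known domain.
        if domain not in self._products_by_domain:
            raise KeyError(f"unknown domain: {domain}")
        self._products_by_domain[domain].append(product)

    def owner_of(self, product: str) -> str:
        for domain, products in self._products_by_domain.items():
            if product in products:
                return domain
        raise KeyError(f"unknown product: {product}")

    def domains_without_products(self) -> List[str]:
        # Highlights domains that still need their data products defined.
        return sorted(d for d, p in self._products_by_domain.items() if not p)


registry = MeshRegistry()
for domain in ("marketing", "sales", "accounting"):
    registry.add_domain(domain)
registry.add_product("sales", "orders_daily")
registry.add_product("marketing", "campaign_results")
```

A catalog like this makes ownership explicit: owner_of answers who is accountable for a data product, and domains_without_products shows where the design work is still incomplete.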
In this tutorial, you learned what a data mesh is and how it differs from other types of data architecture. Additionally, you gained an understanding of the major principles of a data mesh as well as when and how to set one up. Thank you for reading! For more technical tutorials, check out our blog.