
In one of my previous articles on Data Repositories, we briefly discussed Data Marts, Data Lakes and Data Warehouses. Based on feedback from some of you, this article focuses specifically on the differences between Data Marts, Data Lakes and Data Warehouses.
You can view a presentation video on my Odysee channel.
Data Warehouse: A data warehouse is a centralized system that stores data from various sources. It is data-oriented and designed for analytical purposes. Think of it as a large repository that holds data about different aspects of an organization. It is a multipurpose enabler of operational and performance analysis. Data Warehouses are usually large, often ranging from 100 GB to terabytes in size. In a business entity, a Data Warehouse would collect data from different departments such as Marketing, Finance, HR and Project Deliverables for centralized analysis, contributing to BI and Predictive Analytics. It may take months or even years to build a Data Warehouse systematically.
Data Mart: A data mart, on the other hand, is a subset of a data warehouse. It is project-oriented and typically focuses on a specific business unit or department. Data Marts store data relevant to a particular team’s needs. In a business context, each Data Mart usually stores the data of one department, such as Marketing, Finance, HR, Compensation or Project Deliverables. Data Marts are usually less than 100 GB in size and normally take a few months to build, depending on the complexity of the data.

Data Lake: A data lake is usually a pool of data in which each data element is given a unique identifier and is tagged with metatags for further use. Data Lakes are organized based on specific use cases and store all the source data without exclusions.
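
To make the idea of unique identifiers and metatags concrete, here is a minimal in-memory sketch. It is only an illustration, not how any particular product works: the names `LakeEntry`, `DataLake`, `ingest` and `find`, as well as the sample records and tag names, are assumptions made up for this example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LakeEntry:
    """One data element, stored exactly as it arrived from its source."""
    payload: Any            # raw data, kept unchanged
    tags: dict[str, str]    # metatags describing the element
    entry_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class DataLake:
    """Hypothetical in-memory stand-in for a data lake."""

    def __init__(self) -> None:
        self._entries: dict[str, LakeEntry] = {}

    def ingest(self, payload: Any, **tags: str) -> str:
        """Store the payload without filtering and return its unique identifier."""
        entry = LakeEntry(payload=payload, tags=dict(tags))
        self._entries[entry.entry_id] = entry
        return entry.entry_id

    def find(self, **tags: str) -> list[LakeEntry]:
        """Return every entry whose metatags match all of the given tags."""
        return [
            entry for entry in self._entries.values()
            if all(entry.tags.get(key) == value for key, value in tags.items())
        ]

    def __len__(self) -> int:
        return len(self._entries)


lake = DataLake()
# Different formats sit side by side; nothing is excluded or reshaped on the way in.
lake.ingest({"campaign": "spring", "clicks": 1200}, department="marketing", fmt="json")
lake.ingest("2024-01-31,payroll,50000", department="finance", fmt="csv")
lake.ingest("<employee id='7'/>", department="hr", fmt="xml")

marketing_entries = lake.find(department="marketing")
```

The metatags are what make the raw pool usable later: a team can pull out only the elements relevant to its use case without the lake having imposed a schema up front.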
Data generally flows from raw sources into Data Lakes in its raw format. From there it can be extracted, transformed and loaded into several Data Marts, which in turn are consolidated in a Data Warehouse for further analysis and reporting.
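
The flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the sample records, the comma-separated layout and the function names (`extract`, `transform`, `load_marts`, `consolidate`) are all hypothetical, and plain lists and dictionaries stand in for real storage systems.

```python
from collections import defaultdict

# Hypothetical raw records, landed in the lake exactly as the sources emitted them.
raw_lake = [
    "marketing,2024-01,ad_spend,1500.50",
    "finance,2024-01,revenue,98000",
    "marketing,2024-02,ad_spend,1720.00",
    "hr,2024-01,headcount,42",
]


def extract(raw_line: str) -> list[str]:
    """Extract: split one raw line into its fields."""
    return raw_line.split(",")


def transform(fields: list[str]) -> dict:
    """Transform: give the fields names and convert the value to a number."""
    department, period, metric, value = fields
    return {
        "department": department,
        "period": period,
        "metric": metric,
        "value": float(value),
    }


def load_marts(raw_lines: list[str]) -> dict[str, list[dict]]:
    """Load: route each cleaned row into the data mart of its department."""
    marts: dict[str, list[dict]] = defaultdict(list)
    for line in raw_lines:
        row = transform(extract(line))
        marts[row["department"]].append(row)
    return dict(marts)


def consolidate(marts: dict[str, list[dict]]) -> list[dict]:
    """Consolidate all departmental marts into one warehouse table."""
    warehouse: list[dict] = []
    for department in sorted(marts):
        warehouse.extend(marts[department])
    return warehouse


marts = load_marts(raw_lake)
warehouse = consolidate(marts)

# Organization-wide reporting on the warehouse: total value per metric.
totals: dict[str, float] = defaultdict(float)
for row in warehouse:
    totals[row["metric"]] += row["value"]
```

Each mart holds only one department's rows, while the warehouse holds all of them, which is why cross-department questions are asked of the warehouse rather than of any single mart.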

Main Image by Владимир from Pixabay
