Lakehouse VS. Warehouse VS. Datamart – The Difference Between The Three Fabric Objects

Lakehouse VS. Warehouse VS Datamart at Microsoft Fabric

Three types of objects in the Microsoft Fabric have similarities in what they can do for an analytics system. These three are; Lakehouse, Data Warehouse, and Power BI Datamart. All three objects provide storage for the data, which can be loaded into them using an ETL process and read using something like a Power BI report. In this article and video, I’ll explain the actual differences and how to choose the best option for your implementation and architecture.

Video

What is Microsoft Fabric

Microsoft Fabric is an end-to-end Data Analytics platform software-as-a-service from Microsoft. This platform uses multiple workloads to carry all services related to data analytics, such as data integration, storage, data warehousing, data engineering, Business intelligence, Data science, etc.

These are good starting points if you want to get yourself more familiar with Microsoft Fabric;

Power BI Datamart

Power BI Datamart brings Azure SQL Database into the Power BI architecture. It combines Dataflow (ETL), Azure SQL Database (Storage for the dimensional model), and Power BI Dataset (Analytical model) in a unified web editor experience. Power BI Datamart enables citizen data analysts to build a data warehouse in a multi-layered Power BI architecture.

You can learn more about Datamart in my article and video below;

Lakehouse

Lakehouse is a place for storing structured data (data tables) and unstructured data (files). Lakehouse is one of the Microsoft Fabric objects under the Data Engineering workloads. It stores the data behind the scene in OneLake. The data can be loaded into the Lakehouse using Data Pipelines, Dataflows Gen2, Notebooks, and Azure Spark Job Definitions. The Data from the Lakehouse can be accessed via the Lakehouse Explorer in the browser and the SQL Endpoint, which allows writing query commands to the Lakehouse. A Power BI dataset will be associated with the Lakehouse, which can be used for reporting.

To learn more about Lakehouse, read my article below;

Data Warehouse or Warehouse

Microsoft Fabric Data Warehouse, or Warehouse, is a place where structured data tables can be stored. A Warehouse is a highly performant and scalable database, and the complexity of the scalability is invisible from the user’s view. The Warehouse will be automatically called up or down to provide the best performance needed. The Warehouse stores the data in OneLake. Data can be loaded into the Warehouse using the Data Pipeline, Dataflows Gen2, or SQL Commands. The Warehouse provides a full SQL Endpoint connection to write SQL commands that can read and write into the data tables. Likewise, the Lakehouse and the Power BI Datamart, a Power BI Dataset, will be associated with the Warehouse for reporting.

To learn more about the Warehouse, read my article here;

Similarities

It is easy to spot the similarities between these three objects. They all act as a storage system (or you can call it a database or a data warehouse) in the analytics solution. There is always an ETL to load data into these and a Power BI dataset associated with them for further reporting. They are all cloud-based database systems. They all use the web editor for editing and modifying the data model. So that is one point of confusion; it looks like all three are doing the same thing.

Similarities between Power BI Datamart, Lakehouse, and Warehouse

I often get this question; now that Microsoft Fabric comes with the Lakehosue and the Warehouse, will Power BI Datamart be deprecated? Or what is the reason I should use one over the other? The rest of this article answers these types of questions.

Licensing Matters

One of the most essential matters of the difference is the type of license you are using. With Fabric F SKUs, you would have all three, but you can also have the Power BI Datamart used only with a PPU license (Premium Per User). Some organizations don’t use Premium or Fabric capacities and still using PPU; they can use Power BI Datamart at its full function with a database size of up to 100TB.

Licensing Difference Between Datamart, Lakehouse, and Warehouse.

Suppose you are working in an organization with only five Power BI users and want full functionality for the Power BI, including a place to store the data. In that case, you can purchase PPU licenses for all five users and use the Power BI Datamart for your purpose.

Scalability

Another big difference between the three options is the scalability. Both Lakehouse and Warehouse are using OneLake as their storage. They both use Microsoft Fabric’s compute bursting behind the scene to speed up the performance when needed (let’s say at the time of loading data or fetching data using a query from Power BI). The Lakehouse and the Warehouse can easily handle massive amounts of data at ease.

Spark Compute process. Image source: https://learn.microsoft.com/en-us/fabric/data-engineering/spark-compute

Although the Power BI Datamarts can handle large amounts of data, there is a cap of 100TB. And the compute bursting won’t be an option here. On the other hand, the only way to load data into Datamart is Dataflow, which might not perform as fast as Copy data activity in Data Pipeline if we deal with massive data.

Structured and Unstructured Data

You can store Structured and Unstructured Data in the Lakehouse. However, both Warehouse and Datamart only support structured data. The Lakehouse makes a better choice when looking for a storage option to store files and tables.

Lakehouse is a storage for files and data tables

SQL Endpoint

The SQL Endpoint connects to the database part of the Datamart, Lakehouse, and Warehouse. For the Datamart and the Lakehouse, the connection is read-only. You can only query the data but cannot modify it or write commands that change the structure of the database (such as altering or creating a table). However, the SQL Endpoint for the Warehouse is a fully operational endpoint, which allows you to read and write. You can write a query to fetch the data and write SQL commands to alter and create the tables or insert or delete data from the tables.

SQL Endpoint of Warehouse enables SQL commands for reading and writing. Image source: https://learn.microsoft.com/en-us/fabric/data-warehouse/data-warehousing

This is good evidence that the Warehouse is more for a database developer, whereas the other two options are not.

Are you a Citizen Data Analyst, Database Developer, or Data Engineer?

The persona of the developer working with these technologies is another point of difference. Because each object uses a different set of technology areas, it would need someone with a particular skill set.

Datamart is for Citizen Data Analyst

Datamart is an environment where you can load data using Power BI Dataflows. The knowledge that you need to work in that environment is Power Query and Power BI. These are skills that a Power BI Developer has in the belt.

Data Warehouse is for Database Developer

Because of the powerful SQL endpoint of the Warehouse, the best outcome from it is achieved when a Database Developer works with it. In addition to working with Data Pipelines and Dataflows, the database developer can write SQL query commands or commands to change the data and even the data structure.

Lakehouse is for Data Engineer

You can use Data Pipelines and Dataflows to load data into the Lakehouse. However, you can also use Notebooks and Spark Job Definitions. These two would use languages such as SparkR, Scala, and Python. These languages and working with the Notebooks are usually in the skillset area of a data engineer or a data scientist.

Different personas for the users of Datamart, Data Warehouse, and Lakehouse.

From the Power BI Point of View

If you are only concerned from the Power BI point of view, then these options have a few differences. If you are Importing data into Power BI, then usually only the speed of Import is the important factor (Refresh time), Which would usually work faster when the data is in OneLake (Warehouse or Lakehouse), especially if you are dealing with a large volume of data.

If you use a DirectQuery option, the new Direct Lake option added to the Power BI will bring a much faster response time than the Datamart (This option is available for Lakehouse and Warehouse because their data is stored in OneLake). So if you are working with a huge amount of data, then Data Warehouse and Lakehouse would bring a better performance to the Power BI. But then, the Lakehouse and the Warehouse won’t be much different here because they store their data in the OneLake and provide the SQL Endpoint for SQL queries that read data from there.

Direct Lake

All-in-One Comparison

Here are some of the differences mentioned in the below table

ItemData warehouseLakehousePower BI Datamart
Data volumeUnlimitedUnlimitedUp to 100 GB
Type of dataStructuredUnstructured,
semi-structured,
structured
Structured
Primary developer personaData warehouse developer,
SQL engineer
Data engineer,
data scientist
Citizen data analyst
Primary developer skill setSQLSpark
(Scala, PySpark, Spark SQL, R)
No code (Power Query), SQL
ETLData Pipelines,
Dataflows Gen2,
SQL
Data Pipelines,
Dataflows Gen2,
Notebooks,
Spark Job Definitions
Dataflows
Languages to work withT-SQL
Spark
Spark
(Scala, PySpark, Spark SQL, R)
Dataflows, T-SQL
SQL EndpointRead and WriteRead-onlyRead-only
Primary development interfaceSQL scriptsSpark notebooks,
Spark job definitions
Power BI
LicensesFabric CapacityFabric CapacityFabric Capacity
Power BI Premium
PPU
Summary of differences

Why not one? Analogy to Understand

I know that some of you at this point might ask questions like the below;

Why didn’t Microsoft combine all these three into one? Why should we need multiple things with pros and cons and not all pros in one? Why making it so complicated etc.

To answer that, I like to use an analogy. Next time you are in your neighborhood, visit a car dealership of a brand you like (Mercedes, Jaguar, BMW, Ford, Toyota, Nissan etc, I let you choose yours). And ask them for a car! They would either ask you for what you need a car, what is your budget, or family size, etc. Or they might bring you to their yard, and you get a whole lot of different types of cars from the same brand. Here are some of the cars that Mercedes makes;

Some of the cars from Mercedes

The list above is all cars; you can drive each, and they take you from point A to point B. So why so many different cars, then? The answer is in their usage and pros and cons. Some are good for families and large groups (SUVs), some are good for simplicity and small family (compacts), some are good for performance (Coupes and Supercars), some are good for offroading, etc. Each has pros and cons. Car companies understand that each car has its own market, so they make not only one type but multiple types.

The Same is true everywhere. There are multiple programming languages. And here we are multiple objects in Fabric; Lakehouse, Warehouse, and the Datamart.

Summary

Although these three objects (Lakehouse, Warehouse, and Datamart) perform similar activities in an analytics project, they differ in many aspects. Their differences depend on the type of license you are using, the skillset and the person of the developer working with it, the scale and column of the data, and the type of data to be stored. Each object has its use case scenarios, and there are situations in which each can be helpful. I don’t see any of these objects getting replaced or deprecated at this stage. However, depending on how Microsoft focuses on its roadmap, things might change in the future.

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

2 thoughts on “Lakehouse VS. Warehouse VS. Datamart – The Difference Between The Three Fabric Objects

Leave a Reply