Microsoft Fabric Glossary

There are many similar terms when working with Fabric: Data Lake, Delta Lake, OneLake, Lakehouse, and the list continues. Many people find it confusing to understand the differences between them. Although explaining these terms and their differences in depth would take many separate blog articles, having one place with a quick definition of each can be helpful. Hence, this glossary.

I will keep this updated as new features, tools, and workloads are added, and I welcome your input on what keywords, terms, and terminologies you would like to see added here.

Enough talking; here is the list:


Fabric (or Microsoft Fabric)

It is a fully cloud-based set of services (a software-as-a-service, or SaaS, experience) for designing and building analytical solutions. Fabric uses one central portal for all development, simplifying the experience and bringing the power of the multiple modules of an analytical project into one toolset. Learn more about Microsoft Fabric here.

Lakehouse

It is an object that stores data. The data stored in a Lakehouse can be structured (such as tables) or unstructured (such as files, PDFs, images, etc.). Lakehouse is a portmanteau of Data Lake and Data Warehouse. A Lakehouse is a logical object: it does not store the data itself; it structures the data and stores it in an underlying structure in OneLake. A Lakehouse has a SQL analytics endpoint for database developers and a default Power BI semantic model for data analysts. The Lakehouse's SQL analytics endpoint supports read-only SQL commands. Learn more about Lakehouse here.

Warehouse

Warehouse is a high-performance database that moves all the complexity of running a database at scale behind the scenes, so you can use it like a simple database. A Warehouse can only store structured data (tables), and it gives users full SQL capabilities, including querying, updating, deleting, and inserting data. A Warehouse cannot store unstructured data. Learn more about Warehouse here.

Datamart (or Power BI Datamart)

It is a persistence layer for data, created using an Azure SQL database behind the scenes. Datamart is designed so that citizen data analysts with little database knowledge can persist their data in a database in a few simple steps. Learn more about Datamart here.

SQL Analytics Endpoint

Lakehouse, Warehouse, and Datamart all support SQL development and management tools connecting to their database engine. This ability is provided through the SQL analytics endpoint. The SQL analytics endpoint provides a database connection string that can be used in tools such as SSMS (SQL Server Management Studio), Visual Studio, or any other tool that works with databases, to connect to those objects and communicate with the database engine.
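As a rough sketch, the connection string you copy from the endpoint's settings in the Fabric portal follows the familiar SQL Server ODBC shape. The endpoint host and database names below are made-up placeholders, not real values:

```python
# Hypothetical sketch of an ODBC-style connection string for a Fabric
# SQL analytics endpoint. Both names below are placeholders; copy the real
# connection string from the item's settings in the Fabric portal.
endpoint = "myworkspace-abc123.datawarehouse.fabric.microsoft.com"  # placeholder
database = "MyLakehouse"  # hypothetical Lakehouse name

conn_str = (
    f"Driver={{ODBC Driver 18 for SQL Server}};"
    f"Server={endpoint};"
    f"Database={database};"
    f"Authentication=ActiveDirectoryInteractive;"
    f"Encrypt=yes;"
)
print(conn_str)
```

The same string can be pasted into SSMS's server field (host only) or handed to a library such as pyodbc.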

Semantic Model (or Power BI Semantic Model)

It is a logical structure of tables, their relationships, plus configurations at the table and column level. The tables' structure and relationships might look similar to a relational database. However, the semantic model is based on a different technology: the Power BI semantic model is based on the SQL Server Analysis Services Tabular engine (also called the VertiPaq engine), which stores data in a columnstore format and supports DAX as an expression language. Learn more about the Semantic model here.

OneLake

It is a logical storage layer that provides one way to access all the Fabric data across your tenant. The data in OneLake includes all the data items from Fabric (files or tables in Lakehouse, Warehouse, Eventhouse, and many other objects). OneLake is not the physical storage itself; it stores the data in ADLS (Azure Data Lake Storage) Gen2 behind the scenes. However, that complexity is abstracted from your view; you can work with OneLake as simply as you work with OneDrive. Learn more about OneLake here.

Data Lake

It is a structure that stores data in any form. Data Lake is a concept and technology that has been adopted and implemented by many vendors (such as Microsoft, Google, Amazon, etc.).

ADLS (Azure Data Lake Storage) Gen2

It is Microsoft's Data Lake implementation in the cloud. It is an Azure service that enables storing data in any format. OneLake uses ADLS Gen2 to store all the Fabric data items.

Azure

Azure is Microsoft's cloud computing and storage platform. Azure includes many services for different purposes, from SaaS (such as Power BI and Fabric) to IaaS (such as Virtual Machines).

SQL

It is a language for working with database systems. SQL is a standard language implemented by many database vendors. In Microsoft Fabric, SQL can be used when working with the SQL analytics endpoint of a Lakehouse or a Warehouse. Another variant of it, Spark SQL, can be used when working with Notebooks.
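Because SQL is a standard, the same core syntax works across engines. As an illustration, here is a small query run against Python's built-in SQLite engine; the SELECT / GROUP BY syntax is the same core SQL you would write against a Lakehouse SQL analytics endpoint or a Warehouse (the table and data are made up for the example):

```python
# Standard SQL illustrated with Python's built-in SQLite engine.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("bike", 100.0), ("bike", 250.0), ("helmet", 40.0)],
)

# Aggregate sales per product, exactly as you would in any SQL dialect.
rows = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY product"
).fetchall()
print(rows)  # [('bike', 350.0), ('helmet', 40.0)]
```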

KQL

It is the query language for Kusto database systems, which are designed for real-time streaming solutions.

KQL Database

A real-time streaming database, in which real-time events can be stored as records of a database table.

Eventhouse

A container for a group of KQL databases.

Eventstream

A hub that ingests real-time data from sources such as Azure Event Hubs, IoT Hub, or third-party real-time streaming services and then passes it on to destinations such as a KQL database (inside an Eventhouse) or a Lakehouse. Eventstream can also apply some simple transformations to the real-time data before sending it to the destinations. Learn more about Eventstream here.

Dataflow

A Power Query engine-based ETL system that can read data from over 200 data sources (on-premises and cloud-based), transform the data, and then load it into destinations (Dataflow Gen2 can load data into Lakehouse, Warehouse, KQL database, and Azure SQL database). Dataflow uses a rich graphical interface with plenty of data transformation options, and it uses the M scripting language behind the scenes. Learn more about Dataflow here.

Data Factory (Fabric Data Factory)

It is the Data Integration technology of Microsoft Fabric. Data Factory combines the Data Pipelines from ADF with the Dataflows from Power Query to provide a fully functional ETL system in the cloud supported by hundreds of data connectors. Learn more about Data Factory here.

Azure Data Factory (or ADF)

ADF is a cloud-based data ingestion and data orchestration tool. Azure Data Factory uses pipelines and dataflows to provide an ETL framework. Data Factory inside Microsoft Fabric is a much-enhanced evolution of ADF. There isn't an easy way to migrate ADF work to Fabric Data Factory at the moment, but there are workarounds.

Data Pipeline

The data orchestration object inside the Data Factory workload of Microsoft Fabric. Using a Data Pipeline, you can build a control flow of execution that runs one activity after another based on conditions and logic. Activities range widely; they can execute dataflows, run a Notebook, refresh a semantic model, etc. Each activity has multiple output states, and depending on that state, another activity can be executed. Learn more about Data Pipeline here.
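The "activity runs, and the next step depends on its output state" idea can be sketched in plain Python. This is a conceptual illustration only, not a Fabric API; the activity names and states are made up:

```python
# Conceptual sketch of Data Pipeline control flow: each activity returns an
# output state, and the next activity runs only when its condition is met.
def run_dataflow():
    return "Succeeded"  # pretend the dataflow ran fine

def refresh_semantic_model():
    return "Succeeded"

def notify_failure():
    return "NotificationSent"

executed = []

state = run_dataflow()
executed.append(("Run dataflow", state))

# Branch on the previous activity's output state, as a pipeline would.
if state == "Succeeded":
    state = refresh_semantic_model()
    executed.append(("Refresh semantic model", state))
else:
    state = notify_failure()
    executed.append(("Notify failure", state))

print(executed)
```

In a real pipeline, the branching is configured visually with "on success" / "on failure" connectors between activities rather than written as code.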

Notebook

It is a code-first experience for data engineers and data scientists in Microsoft Fabric. Using a Notebook, you can prepare data, explore it, build machine learning models with it, process the data, load it into tables in a Lakehouse, visualize it, and do many other tasks. The Notebook supports four languages: SparkR, PySpark, Scala, and Spark SQL. Learn more about Notebook here.

Spark

Spark is the parallel processing engine for big data. This engine is used behind the scenes of the data engineering and data science experiences in Microsoft Fabric. Spark uses configurations that can be set at the workspace level in Microsoft Fabric. Notebook code executions utilize the Spark engine behind the scenes. Learn more about Spark here.

Delta Lake (or Delta Lake Table)

Delta Lake is the format in which data tables are stored in the Microsoft Fabric Lakehouse and Warehouse. The Delta Lake format is an open standard that includes multiple file types: Parquet files are used to store the data, and JSON log files keep track of changes. The Delta Lake format enables the Spark engine to provide normal transactional database operations as well as advanced features such as time travel. Learn more about Delta Lake here.

Spark Job Definition

A pre-written application code to be executed by the Spark engine. A Spark Job Definition can be scheduled to execute, or it can be executed as part of a Data Factory Pipeline. One big difference between a Notebook and a Spark Job Definition is that the Notebook provides a medium to edit the code, whereas the Spark Job Definition is just for executing the code. Parameters can be passed to a Spark Job Definition as input or output.

M

M is the language of Power Query. M is a functional language and can be used in Power Query in Power BI Desktop, Power Query in Excel, and Power Query in Dataflow. Every transformation done in the Power Query Editor is written as an M script behind the scenes. Learn more about M here.

DAX

DAX stands for Data Analysis Expressions. It is the language used for writing calculations in Power BI, Power Pivot in Excel, and SQL Server Analysis Services Tabular. DAX offers many functions that are helpful for analytical calculations. Learning DAX is usually the most time-consuming part of learning Power BI. Learn more about DAX here.

DirectQuery

It is a type of connection to the source data from Power BI. When DirectQuery is used, the data remains in the source system, and there is no copy of it in Power BI. Power BI keeps only the metadata, and the visualizations and reports always query the data directly from the source system. This connection mode is slower, but the data is always up to date. Learn more about DirectQuery here.

Import Data

Import Data is the default connection mode from Power BI to many data sources. Using this connection type, a copy of the data from the source system is generated in Power BI in a column-store format, which is efficient for data analysis. This is the most efficient way of working with data in Power BI. However, depending on your license for Fabric and Power BI, you may face some limitations on the model size. A scheduled refresh of the data is needed to keep the data up to date.

Composite Model

The composite model combines the benefits of DirectQuery and Import Data in Power BI. Using a composite model, you can import smaller tables (such as Customer) and keep bigger tables (such as Sales) as DirectQuery. The composite model uses the dual storage mode and can even enhance performance by using aggregations. Learn more about Composite Model here.

Live Connection

Live Connection is the type of connection used when connecting to a Power BI semantic model or an Analysis Services dataset. In this type of connection, the Power BI report connects to an existing dataset (or semantic model) that has all the tables, relationships, and calculations. Power BI acts as a reporting-only tool in this mode. This mode is commonly used in multi-layered architectures with Power BI and in team-based development solutions. Learn more about Live Connection here.

Direct Lake

Direct Lake is a new type of connection from Power BI, available only when connecting to a Delta Lake table structure in Microsoft Fabric (Lakehouse and Warehouse). This type of connection does not copy the data from the source; the data is read directly from the Parquet files. The performance of this mode is closer to that of Import Data, yet, similar to DirectQuery, there is no need to refresh the data. Learn more about Direct Lake here.

Parquet

Parquet is a structure for storing data in a file. Unlike CSV, which stores the data in row format, Parquet stores it in a column-store format, which is more efficient for both reading and writing. The file size of a Parquet file is usually much smaller than its CSV equivalent. The Delta Lake format uses Parquet as its storage file structure.
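The row-versus-column difference can be shown with a toy example in plain Python, with no Parquet library needed. The point is that in a columnar layout, reading one column touches none of the other columns' data:

```python
# Toy illustration of row storage (CSV-like) vs. column storage (Parquet-like).
rows = [
    {"product": "bike", "amount": 100.0},
    {"product": "helmet", "amount": 40.0},
]

# Row-oriented: one complete record after another, as in a CSV file.
row_store = [(r["product"], r["amount"]) for r in rows]

# Column-oriented: all values of each column stored contiguously, as in Parquet.
column_store = {
    "product": [r["product"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Reading just the "amount" column from the column store needs no other data;
# the row store would force us to scan every record.
amounts = column_store["amount"]
print(amounts)  # [100.0, 40.0]
```

Contiguous columns also compress far better (similar values sit next to each other), which is why Parquet files tend to be much smaller than their CSV equivalents.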

SemPy

SemPy comes from the words Semantic model and Python. It is a new library in Microsoft Fabric that allows data scientists and data engineers to connect directly to a Power BI semantic model.

MLSpark

MLSpark is a machine learning library for use in Notebooks in Microsoft Fabric, optimized and enhanced to work with the Spark engine behind the scenes.

Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an author, trainer, speaker, and consultant. He has a BSc in Computer Engineering and more than 20 years' experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for 12 continuous years (from 2011 until now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of the Difinity conference in New Zealand, the Power BI Summit, and the Data Insight Summit.
Reza is the author of more than 14 books on Microsoft Business Intelligence, most of them published in the Power BI category, including Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, the Power Query book series, and Row-Level Security in Power BI.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday, and SQL user groups, and he is a Microsoft Certified Trainer.
Reza's passion is to help you find the best data solution; he is a data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
