Microsoft Fabric is a complete set of technologies that provides analytics as a service. Fabric uses a logical storage layer named OneLake. In this article, we will explore what OneLake is, why it is important, its essential features, and how it works with the rest of the Fabric objects.
Microsoft Fabric
Microsoft Fabric is an end-to-end data analytics software-as-a-service offering from Microsoft. It combines several products and services into a single, easy-to-use platform for data analytics. These products and services are the components (also called workloads) of Microsoft Fabric.
What is OneLake?
A data lake is a repository that keeps data in its raw format. The concept has been part of data-related technologies for some time, and the idea behind a lake is that you have ONE place to keep all the data.
Unfortunately, the reality is different. The lake is rarely used as the one place. Many organizations have multiple lakes, which leads to duplicated data and the maintenance problems that come with it. What we see in practice is siloed lakes rather than one lake.
This brings us to the primary promise of Microsoft’s OneLake. OneLake promises to be the ONE lake an organization needs, even if it has multiple branches across multiple time zones. Even with multiple business domains and units, there will be ONE lake for them to use, and no more. Hence the name OneLake.
OneLake is a single, unified, logical data lake for the whole organization. It is part of Microsoft Fabric and helps organizations reduce data duplication and manage data more easily. All the workloads in Fabric need their data items to be stored somewhere, and OneLake is that storage. You can think of OneLake as the OneDrive for data (as Microsoft says).
What is the Storage under the hood?
OneLake is a logical storage layer. Underneath it, the physical storage is ADLS Gen2 (Azure Data Lake Storage Gen2), and Fabric data items are stored there. Here is how OneLake works with the rest of the Fabric items and workloads:
- Compute engines store their data automatically in OneLake.
- Data is stored in a single common format.
  - The format is Delta Parquet, an open standard format used for tabular data (such as Power BI datasets).
- Compute engines are optimized to work with Delta Parquet files as their native format, without import or export. (Direct Lake is an example of this.)
Whether you create a Lakehouse, Warehouse, Datamart, or Power BI dataset, these artifacts and every other Fabric artifact are stored in ADLS Gen2 through the logical layer of OneLake.
Structure and Governance
Fabric is built on the same principles as Power BI, so it is understandable that OneLake uses a similar structure for governing objects. Workspaces and Domains separate Fabric items in this logical layer; underneath, they are all stored in ADLS Gen2.
OneCopy and Shortcuts ensure that there is no need to duplicate the data. Entities from one Domain can easily be made accessible in other Domains under the same tenant using Shortcuts.
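The idea of a Shortcut as a pointer rather than a copy can be sketched in a few lines of Python. This is only a toy model to illustrate the concept, not how Fabric implements Shortcuts; the `Shortcut` class, the `read` function, and the domain and table names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shortcut:
    """A named pointer to a path that lives elsewhere (hypothetical model)."""
    target: str


# Toy stand-in for the lake: one physical copy of each file, keyed by path.
storage = {"Sales/Lakehouse/Tables/orders": b"order rows..."}

# The Finance domain exposes the same table through a shortcut, not a copy.
shortcuts = {
    "Finance/Lakehouse/Tables/orders": Shortcut(target="Sales/Lakehouse/Tables/orders"),
}


def read(path: str) -> bytes:
    """Follow shortcuts until a physical path is reached, then read it."""
    seen = set()
    while path in shortcuts:
        if path in seen:
            raise ValueError(f"shortcut cycle at {path}")
        seen.add(path)
        path = shortcuts[path].target
    return storage[path]


sales_data = read("Sales/Lakehouse/Tables/orders")
finance_data = read("Finance/Lakehouse/Tables/orders")
```

Both reads return the same bytes while `storage` still holds a single entry, which is the point of the no-duplication promise.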
Accessing OneLake
There are multiple ways to access OneLake, in addition to accessing Fabric items through the Fabric portal.
OneLake Data hub
The OneLake Data hub is available inside the Fabric portal and lets you find Fabric items in OneLake. Items can be filtered by Domain, workspace, endorsement, or item type.
The OneLake Data hub is also available in desktop tools such as Power BI Desktop.
OneLake file explorer for Windows
When Microsoft calls OneLake the OneDrive for data, they mean it literally. You can access OneLake directly from Windows Explorer. You only need to download and install the OneLake file explorer. Once it is installed, you can open Windows Explorer and see Fabric items easily.
This includes the files and folders used for each Fabric item.
A sync process keeps OneLake’s data in sync with Windows Explorer. Not only can you see the data files in Explorer, but you can also copy and paste files as you would in any other folder.
URI
Another way to access OneLake is by URI. You can use a URI to reach individual objects and paths anywhere in your Fabric tenant. The syntax is as follows:
https://onelake.dfs.fabric.microsoft.com/<workspace>/<item>.<itemtype>/<path>/<fileName>
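As a small illustration, the pattern above can be assembled with a helper like the one below. This is a minimal sketch: the function name `onelake_https_uri` and the workspace, item, and file names are made up for the example, and percent-encoding each segment is my assumption about how names containing spaces should be handled.

```python
from urllib.parse import quote

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # host taken from the syntax above


def onelake_https_uri(workspace: str, item: str, item_type: str, *path_parts: str) -> str:
    """Build https://<host>/<workspace>/<item>.<itemtype>/<path>/<fileName>."""
    segments = [workspace, f"{item}.{item_type}", *path_parts]
    # Percent-encode each segment so spaces in names stay valid in a URL.
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    return f"https://{ONELAKE_HOST}/{encoded}"


# Hypothetical workspace, item, and file names, used only for illustration.
uri = onelake_https_uri("Sales WS", "SalesLakehouse", "Lakehouse", "Files", "raw", "orders.csv")
print(uri)
```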
An ABFS path is another form of URI. ABFS stands for Azure Blob File System driver. It looks like the following:
abfs[s]://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>/<fileName>
You can get the ABFS path for an item through the Fabric portal.
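To make the parts of an ABFS path concrete, here is a minimal sketch that builds one from its components and splits it back apart. The helper names `onelake_abfs_uri` and `split_abfs_uri` and the sample workspace and item names are hypothetical; the sketch also assumes the item segment always has the `<item>.<itemtype>` form shown above.

```python
from urllib.parse import urlsplit

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # host taken from the syntax above


def onelake_abfs_uri(workspace, item, item_type, *path_parts, secure=True):
    """Build abfs[s]://<workspace>@<host>/<item>.<itemtype>/<path>/<fileName>."""
    scheme = "abfss" if secure else "abfs"
    path = "/".join([f"{item}.{item_type}", *path_parts])
    return f"{scheme}://{workspace}@{ONELAKE_HOST}/{path}"


def split_abfs_uri(uri):
    """Split an ABFS path back into workspace, item, item type, and inner path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    # The first path segment is <item>.<itemtype>; the rest is the inner path.
    item_segment, _, inner_path = parts.path.lstrip("/").partition("/")
    item, _, item_type = item_segment.rpartition(".")
    return {
        "workspace": parts.username,  # the part before '@'
        "item": item,
        "item_type": item_type,
        "path": inner_path,
    }


# Hypothetical names, used only for illustration.
abfs_uri = onelake_abfs_uri("SalesWS", "SalesLakehouse", "Lakehouse", "Tables", "orders")
abfs_parts = split_abfs_uri(abfs_uri)
```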
API
As a software-as-a-service offering, OneLake provides an API for external services and applications. The API can read, write, and manage the data in OneLake.
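As a rough illustration of what a call against the API might look like, the sketch below prepares (but does not send) a request that lists a folder. It assumes OneLake accepts ADLS Gen2-style REST calls, with the workspace in the filesystem position and a Microsoft Entra bearer token for authentication. The function name `build_list_request`, the sample names, and the token placeholder are mine, so check the official API reference before relying on any of it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

ONELAKE_ENDPOINT = "https://onelake.dfs.fabric.microsoft.com"


def build_list_request(workspace, item, item_type, directory, token):
    """Prepare an ADLS Gen2-style 'list paths' request (assumed to work on OneLake)."""
    query = urlencode({
        "resource": "filesystem",
        "directory": f"{item}.{item_type}/{directory}",
        "recursive": "false",
    })
    url = f"{ONELAKE_ENDPOINT}/{quote(workspace, safe='')}?{query}"
    # A real call may need more headers (for example an API version header).
    return Request(url, method="GET", headers={"Authorization": f"Bearer {token}"})


# Hypothetical names and a placeholder token; nothing is sent over the network.
request = build_list_request("SalesWS", "SalesLakehouse", "Lakehouse", "Files/raw", "<access-token>")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would need a real token, which is why the sketch stops at constructing it.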
OneSecurity
The idea behind security in OneLake is that you apply security once, in OneLake, and all workloads using the data follow those settings. For example, you apply row-level, column-level, or table-level security in OneLake, and then the Warehouse, the Lakehouse, and the Power BI dataset and report all follow those settings without needing security rules of their own.
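The "define once, enforce everywhere" idea can be sketched with a toy model. Nothing here is a Fabric API: the policy structure, the role name, the sample rows, and the two "workload" functions are all hypothetical, and they only show how several consumers can share one set of row and column rules instead of each implementing its own.

```python
# Hypothetical policy, defined once next to the data.
POLICY = {
    "analyst": {
        "columns": {"region", "amount"},                  # column-level rule
        "row_filter": lambda row: row["region"] == "EU",  # row-level rule
    },
}

ROWS = [
    {"region": "EU", "amount": 120, "customer": "Contoso"},
    {"region": "US", "amount": 80, "customer": "Fabrikam"},
]


def secured_read(role):
    """Single enforcement point: every workload reads through this function."""
    rule = POLICY[role]
    return [
        {col: value for col, value in row.items() if col in rule["columns"]}
        for row in ROWS
        if rule["row_filter"](row)
    ]


def warehouse_query(role):
    """Toy 'warehouse' workload: returns the rows it is allowed to see."""
    return secured_read(role)


def report_total(role):
    """Toy 'report' workload: aggregates over the same secured rows."""
    return sum(row["amount"] for row in secured_read(role))


visible_rows = warehouse_query("analyst")
total = report_total("analyst")
```

Neither workload contains a security rule of its own, yet both see only the EU rows and only the permitted columns.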
Summary
OneLake is the logical data lake storage for all Fabric objects and compute engines. OneLake uses ADLS Gen2 underneath to store the data in Delta Parquet format, and it governs the data with structural items such as Domains and Workspaces. There is no need to duplicate the data, because OneCopy and Shortcuts make data available in other places in the same tenant without copying it. OneLake’s data can be accessed through the OneLake Data hub, Windows Explorer, the API, or URIs in the Fabric tenant. OneLake also provides the OneSecurity mechanism, so that data security is applied once in OneLake and honored by the other workloads.