What is Data Factory in Microsoft Fabric

What is Data Factory in Microsoft Fabric

Microsoft Fabric is an end-to-end data analytics solution in the cloud, and one of its workloads is called Data Factory. In this article, you will learn what Data Factory is, how it works with the rest of Microsoft Fabric, and what are elements and functions of Data Factory.

Video

Microsoft Fabric

To understand Data Factory, it is best to understand Microsoft Fabric first. Microsoft Fabric is an end-to-end Data Analytics software-as-a-service offering from Microsoft. Microsoft Fabric combined some products and services to cover an end-to-end and easy-to-use platform for data analytics. Here are the components (also called workloads) of Microsoft Fabric.

Microsoft Fabric

To learn more about Microsoft Fabric and enable it in your organization, I recommend reading the articles below;

Data Factory Origin

Microsoft Fabric has a workload for Data Integration. Any end-to-end data analytics system should have a data integration component. Microsoft has been a strong data integration tool and service leader for decades. This started with SQL Server tools such as DTS (Data Transformation Service) and SSIS (SQL Server Integration Services) and then stepped into cloud-based technologies such as ADF (Azure Data Factory). Microsoft also used a data transformation engine that first targeted citizen data analysts called Power Query.

Data Factory is the data integration component of Microsoft Fabric which brings the power of Azure Data Factory and Power Query Dataflows into one place. For many years, we had these two technologies doing data transformations separately. But now, these two are combined under Fabric, called Data Factory.

Power Query

Power Query Dataflows was first announced a few years ago as an additional component to Power BI for data transformation as a cloud technology that is simple to use for data analysts. But soon, it became more than just for Power BI; it became Power Platform Dataflows. These days, Power Query Dataflows are used for data transformations in Power BI projects and data migration in Power Apps projects.

Power Query

Although Power Query Dataflows is also on the dataflow side, it needed some enhancements on scalability and the control of execution with some control flow elements (such as loop structures, conditional execution, etc.).

Azure Data Factory

Azure Data Factory came into the market many years ago as the next generation of SSIS for in-the-cloud ETL. However, the data transformation engine of Azure Data Factory was not built on a strong basis, so most of the time, ADF was used for data ingestion, and then with the help of SQL stored procedures, etc., for doing the transformation afterward. ADF was not a tool for citizen data analysts. It was instead for data engineers and developers. ADF used data pipelines to execute a group of activities as a flow, and among those activities, there were tasks such as copy data, running a stored procedure, etc.

Azure Data Factory. Image sourced from: https://learn.microsoft.com/en-us/azure/data-factory/introduction

For the past few years, we have always had this split; If you wanted a simple-to-use data transformation engine but not much data, use Power Query Dataflows. If you want scalable data ingestion, then use Azure Data Factory.

Best of Both Worlds

Now in Microsoft Fabric, We combine the best things from Power Query Dataflows and Azure Data Factory Data Pipelines into one stream: Data Factory. Data Factory ensures that you still have a simple-to-use and powerful transformation engine of Power Query for data transformation, but on the other hand, you also have the scalability of Data Pipelines and can build a control flow for execution of the ETL using the Data Pipelines. In other words, Data Factory is a state-of-the-art ETL software-as-a-service offering for Microsoft Fabric.

Data Factory in Microsoft Fabric combines Azure Data Factory and Power Query Dataflows together.

Elements of Data Factory

Combining these two services brings great features that make the Data Factory an ultimate ETL service. Here are some of those below;

Data Connectors

For an ETL (Extract, Transform, Load) system, one of the most important aspects is what sources the data can be fetched from. Data Factory offers hundreds of data connectors, enabling you to get data from sources such as databases, files, folders, software-as-a-service systems, etc.

Data Factory Connectors

It is also possible to create your connector if you are keen.

Dataflows

Dataflows are the heart of Data Factory. This is where you get the data from the sources, define the data transformation and prepare it in any shape needed, and finally load it into destinations. Dataflows use the Power Query data transformation engine and the user interface for creating it using the simple-to-use Power Query Editor online.

Dataflow

Power Query Editor online is not only powerful in the graphical interface, it also enables the developer to write code in M language, which is the data transformation language for Power Query.

Power Query Editor online

To learn more about Dataflows, I suggest reading my article below.

Dataflows support a few destinations at the time of writing this article which are;

  • Azure Data Explorer (Kusto)
  • Azure SQL Database
  • Data Warehouse
  • Lakehouse

Data Pipelines

Although Dataflows are the main ETL component of the Data Factory, they can be enhanced when wrapped by a control flow execution component. This control flow execution component is called Data Pipeline. A Data Pipeline is a group of activities (or tasks) defined by a particular flow of execution. The activities in a Pipeline can involve copying data, running a Dataflow, executing a stored procedure, looping until a certain condition is met, or executing a particular set of activities if a condition is met, etc.

Data Pipeline

Data Pipelines can then be scheduled, and there is a monitoring tool to check the execution stage of the pipeline in addition to the activity-state-outputs where you can define what happens if a certain task fails or succeeds.

As mentioned, one of the most important activities that can be done in a Pipeline is the execution of a Dataflow. This is where Dataflows and Data Pipelines work together in their best way.

Executing Dataflows from Data Pipeline

To learn more about Data Pipelines, read my article below;

Summary

Data Factory is an ETL-in-cloud solution that is the data integration workload of Microsoft Fabric. Data Factory is not a new product or service; it comes from many years of Microsoft data transformation tools and services. It is built on top of Power Query and Azure Data Factory. Data Factory uses two main components to deliver the best ETL scenarios possible; Dataflows and Data Pipelines. Dataflows are for the main get data, transform, and load process, and the Data Pipeline can control the rest of the execution with control flow activities.

I highly recommend reading the articles below to study more about Data Factory;

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

2 thoughts on “What is Data Factory in Microsoft Fabric

  • ok, so this is a bit confusing. ADF has data flow and Fabric DF has data flow but they are both different. Why are data flow not the same in both products? or you think this is the plan moving forward

    • This was the case before Fabric. The Wrangling Dataflow is ADF was different from Power Query Dataflow in many ways.
      However, in Fabric, these came into agreement. You can run Dataflow Gen2 (which is pretty much Power Query Dataflow with additional benefits) inside a Data Pipeline
      Cheers
      Reza

Leave a Reply

%d bloggers like this: