Introduction to Azure Data Factory

Many of you have worked with SSIS and are familiar with the term ETL. Azure Data Factory is Microsoft's cloud-based ETL technology. It can handle large volumes of data and works with data sources such as on-premises SQL Server, SQL Azure, and Azure Blob storage. Data can be transformed with Azure Data Factory and loaded into a destination. Data Factory can be scheduled to run from the Azure Portal, and its monitoring features provide useful information about each job.

Azure Data Factory provides:

  • Access to data sources such as on-premises SQL Server, SQL Azure, and Azure Blob storage
  • Data transformation through Hive, Pig, and C#
  • Monitoring of the data pipeline, including validation and execution of scheduled jobs
  • Loading of data into destinations such as on-premises SQL Server, SQL Azure, and Azure Blob storage
  • Last but not least, a cloud-based service

The diagram below shows how Data Factory works:

source: Microsoft: http://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

 

What tools do you need to work with Azure Data Factory?

  • An Azure subscription and access to the Azure Portal.
  • SSDT is not required.
  • To connect to an on-premises SQL Server, you need the Data Management Gateway.
  • Azure Storage Explorer is not mandatory, but it is a good tool for uploading files to Azure Blob storage.
  • SSMS (SQL Server Management Studio) or the SQL Azure console.

 

What else do you need?

  • An understanding of JSON. Azure Data Factory metadata is written in JSON, a lightweight text format for structured data. In the next blog post I explain how to use it to define the metadata structure of an Azure Data Factory.
  • Scripting in C#, Pig, or Hive, depending on which transformations you want to apply.
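
If you have not worked with JSON before, the short Python sketch below may help you see what such a definition looks like. It builds a pipeline description as a dictionary and prints it as JSON text. The pipeline name, table names, and field names are assumptions made up for illustration; they are not the exact Azure Data Factory schema, which the next post covers.

```python
import json

# Hypothetical pipeline description. Every name and field below is an
# illustrative assumption, not the exact Azure Data Factory schema.
pipeline = {
    "name": "CopyBlobToSqlPipeline",
    "properties": {
        "description": "Copy a file from Azure Blob storage into SQL Azure",
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                "inputs": [{"name": "BlobInputTable"}],
                "outputs": [{"name": "SqlOutputTable"}],
            }
        ],
    },
}

# Serialize to JSON text: dictionaries become {...} objects, lists become [...] arrays.
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

The point to take away is the shape: a JSON document is made of nested objects, arrays, and simple values such as strings, which makes it easy to write by hand in a text editor.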

 

Supported Transformations

Data Factory supports transformations, but unlike SSIS it does not have many built-in ones. So far there are only three types of transformations:

  • Copy Activity
  • C# Activity
  • HDInsight Activity (Pig and Hive)

So your first impression is right: working with this service requires scripting. You may end up scripting in C#, Pig, or Hive.

 

Is there a Designer or Editor tool for Azure Data Factory?

  • If you mean something like the Control Flow or Data Flow designer in SSDT, then the answer is no. You have to write JSON scripts in files and upload them to Azure to create your pipeline.
  • If you mean a viewer where you can see your pipeline or data flow, then the answer is yes. You can view the pipeline in the Azure Portal.
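
Because the JSON files are written by hand, a small syntax check before uploading can save a round trip. The sketch below is one possible way to do that in Python, assuming your definitions are kept as .json files in a single local folder. The function name find_invalid_json is hypothetical, and the check only confirms that each file is well-formed JSON, not that it is a valid Data Factory definition.

```python
import json
from pathlib import Path


def find_invalid_json(folder):
    """Return (file name, error message) pairs for .json files that fail to parse."""
    problems = []
    for path in sorted(Path(folder).glob("*.json")):
        try:
            # json.loads raises JSONDecodeError on malformed JSON text.
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append((path.name, str(err)))
    return problems
```

Calling find_invalid_json on your folder returns an empty list when every file parses, which is a reasonable signal that the files are ready to upload.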

 

Is there a Monitoring and Admin tool?

Yes. You can monitor and administer Azure Data Factory through the Azure Portal.

 

In the next blog post I’ll explain how to build your first Data Factory through an example. You will learn how to configure and use the tools and services mentioned above.

 

Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an author, trainer, speaker, and consultant. He has a BSc in Computer Engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of the Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is the author of more than 14 books on Microsoft Business Intelligence, most of them published in the Power BI category. Among these are Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, the Power Query book series, and Row-Level Security in Power BI.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday, and SQL user groups. He is also a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution; he is a data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
