Introduction to Azure Data Factory


Many of you have worked with SSIS and are familiar with the term ETL. Azure Data Factory is Microsoft's cloud-based ETL technology. It can work with large volumes of data and with data sources such as on-premises SQL Server, SQL Azure, and Azure Blob storage. Data can be transformed with Azure Data Factory and loaded into a destination. Data Factory jobs can be scheduled through the Azure Portal, and the monitoring features there provide useful information about each job.

 

 

 

Azure Data Factory provides:

  • Access to data sources such as on-premises SQL Server, SQL Azure, and Azure Blob storage
  • Data transformation through Hive, Pig, and C#
  • Monitoring of the data pipeline, plus validation and execution of scheduled jobs
  • Loading of data into destinations such as on-premises SQL Server, SQL Azure, and Azure Blob storage
  • And last but not least, it is a cloud-based service

The diagram below shows how Data Factory works:

source: Microsoft: http://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

 

What tools do you need to work with Azure Data Factory?

  • You need an Azure subscription and access to the Azure Portal.
  • There is no requirement for SSDT.
  • If you want to connect to an on-premises SQL Server, you need the Data Management Gateway.
  • Azure Storage Explorer is not mandatory, but it is a good tool for working with Azure Blob storage, for example when uploading files.
  • SSMS (SQL Server Management Studio) or the SQL Azure Console.

 

What else do you need?

  • An understanding of JSON. Azure Data Factory uses JSON as its metadata language. JSON is a lightweight data-interchange format rather than a markup language. In the next blog post I explain how to use it to define the metadata structure of an Azure Data Factory.
  • Scripting in C#, Pig, or Hive, depending on which transformations you want to apply.
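
To give a feel for what "metadata as JSON" means in practice, here is a minimal Python sketch that builds a pipeline-like definition and serializes it to JSON. Treat it as an illustration only: the key names (`properties`, `activities`, `inputs`, `outputs`), the activity type string, and the dataset names `BlobInput` and `SqlOutput` are assumptions made up for this example, not the actual Data Factory schema.

```python
import json


def build_pipeline_definition(name, source_dataset, sink_dataset):
    """Return a dict describing a pipeline with a single copy step.

    All key names below are hypothetical placeholders chosen for
    illustration; they are not taken from the Data Factory schema.
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopySourceToSink",
                    "type": "Copy",  # assumed label for a copy step
                    "inputs": [{"name": source_dataset}],
                    "outputs": [{"name": sink_dataset}],
                }
            ]
        },
    }


# Build the definition and turn it into JSON text that could be saved to a file.
definition = build_pipeline_definition("SamplePipeline", "BlobInput", "SqlOutput")
pipeline_json = json.dumps(definition, indent=2)
print(pipeline_json)
```

The point of the sketch is only that a pipeline is described declaratively, as nested objects and arrays, rather than drawn in a designer.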

 

Supported Transformations

Data Factory supports transformations, but unlike SSIS it doesn’t have many of them built in. So far there are only three types of activities:

  • Copy Activity
  • C# Activity
  • HDInsight Activity (Pig and Hive)

So your first impression is right: working with this service requires scripting. You might end up scripting in C#, Pig, or Hive.

 

Is there a Designer or Editor tool for Azure Data Factory?

  • If you mean something like the Control Flow or Data Flow designer in SSDT, then the answer is no. You have to write JSON scripts in files and upload them to Azure to create your pipeline.
  • If you mean a viewer where you can see your pipeline or data flow, then the answer is yes. You can view the pipeline in the Azure Portal.
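
Because the definitions are hand-written JSON files with no designer to catch mistakes, it can help to check a file locally before uploading it. The following Python sketch shows one way to do that. The helper `check_definition` and the "required" keys `name` and `properties` are assumptions for illustration; they are not rules taken from the Data Factory documentation.

```python
import json


def check_definition(text):
    """Return a list of problems found in a JSON definition (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where the parser gave up so the typo is easy to find.
        return [f"not valid JSON: {err.msg} at line {err.lineno}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    for key in ("name", "properties"):  # hypothetical required keys
        if key not in doc:
            problems.append(f"missing key: {key}")
    return problems


# A complete definition, one with a missing key, and one that is cut off.
print(check_definition('{"name": "SamplePipeline", "properties": {}}'))
print(check_definition('{"name": "SamplePipeline"}'))
print(check_definition('{"name": '))
```

A check like this only catches syntax errors and obviously missing keys; whether the service accepts the definition is still decided at upload time.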

 

Is there a Monitoring and Admin tool?

Yes. You can monitor and administer Azure Data Factory through the Azure Portal.

 

In the next blog post I’ll explain how to create your first Data Factory through an example. You will learn how to configure and use the tools and services mentioned above.

 

Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer Engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for nine consecutive years (from 2011 till now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of the Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He has written some books on MS SQL BI and is writing others. He was also an active member of online technical forums such as MSDN and Experts-Exchange, was a moderator of the MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of the very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content, and of Power BI Pro Architecture, published by Apress.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. He is also a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution. He is a data enthusiast.
