Many of you have worked with SSIS and are familiar with the term ETL. Azure Data Factory is Microsoft's cloud-based ETL technology. It can work with large volumes of data and with data sources such as on-premises SQL Server, SQL Azure, and Azure Blob Storage. Data can be transformed with Azure Data Factory and loaded into a destination. Data Factory jobs can be scheduled from the Azure Portal, and monitoring features provide good information about each run.
Azure Data Factory provides:
- Access to data sources such as on-premises SQL Server, SQL Azure, and Azure Blob Storage
- Data transformation through Hive, Pig, and C#
- Monitoring of data pipelines, and validation and execution of scheduled jobs
- Loading of data into desired destinations such as on-premises SQL Server, SQL Azure, and Azure Blob Storage
- And last but not least, it is a cloud-based service
The diagram below shows how Data Factory works.
Source: Microsoft, http://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
What tools do you need to work with Azure Data Factory?
- You need an Azure subscription and access to the Azure Portal.
- There is no requirement for SSDT.
- If you want to connect to an on-premises SQL Server, you need the Data Management Gateway (see the linked service sketch after this list).
- Azure Storage Explorer is not mandatory, but it is a good tool for working with Azure Blob Storage and uploading files.
- Azure PowerShell, for running the Azure Data Factory cmdlets.
- SSMS (SQL Server Management Studio) or the SQL Azure console.
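As a small preview, here is a minimal sketch of what a linked service definition for an on-premises SQL Server, registered through the Data Management Gateway, might look like. The linked service, database, and gateway names here are invented for illustration, and the exact property names have varied between versions of the service, so treat this as the general shape rather than a copy-paste template:

```json
{
  "name": "OnPremSqlLinkedService",
  "properties": {
    "type": "OnPremisesSqlServer",
    "typeProperties": {
      "connectionString": "Data Source=MYSERVER;Initial Catalog=SalesDb;Integrated Security=True;",
      "gatewayName": "MyDataGateway"
    }
  }
}
```

Note that the gatewayName must match the name of a Data Management Gateway you have already installed and registered on the on-premises machine.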
What else do you need?
- An understanding of JSON. Azure Data Factory's metadata is defined in JSON, which is a lightweight data-interchange format. In the next blog post I will explain how to use it to define the metadata structure of an Azure Data Factory; a small taste is shown after this list.
- Scripting in C#, Pig, or Hive, depending on what transformations you want to apply.
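To give a feel for that JSON metadata, here is a rough sketch of a dataset definition pointing at a folder of comma-delimited files in Blob Storage. All of the names (the dataset, the linked service, the folder path) are hypothetical, and the schema details may differ between versions:

```json
{
  "name": "EmployeeBlobInput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "MyBlobStorageLinkedService",
    "typeProperties": {
      "folderPath": "adf-input/employees/",
      "format": { "type": "TextFormat", "columnDelimiter": "," }
    },
    "availability": { "frequency": "Day", "interval": 1 }
  }
}
```

The availability section tells Data Factory how often a slice of this data is produced, which the scheduler uses when running the pipeline.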
Supported transformations:
Data Factory supports transformations, but unlike SSIS it doesn't have many built-in ones. There are only three types of transformation activities so far (an example follows the list):
- Copy Activity
- C# Activity
- HDInsight Activity (Pig and Hive)
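For example, a pipeline that uses the Copy Activity to move data from Blob Storage into SQL Azure might look roughly like the sketch below. The dataset names reuse the hypothetical ones from the earlier examples, the output dataset is assumed to exist, and the schema details may vary by version:

```json
{
  "name": "CopyEmployeesPipeline",
  "properties": {
    "description": "Copy employee CSV files from Blob Storage into a SQL table",
    "activities": [
      {
        "name": "BlobToSqlCopy",
        "type": "Copy",
        "inputs": [ { "name": "EmployeeBlobInput" } ],
        "outputs": [ { "name": "EmployeeSqlOutput" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ],
    "start": "2015-01-01T00:00:00Z",
    "end": "2015-01-07T00:00:00Z"
  }
}
```

The start and end properties define the period over which the pipeline's scheduled slices run.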
So your very first impression is right: working with this service requires scripting. You might end up scripting in C#, or in Pig and Hive.
Is there a Designer or Editor tool for Azure Data Factory?
- If you mean something such as the Control Flow or Data Flow designer in SSDT, then the answer is no! You have to write JSON definitions in files and upload them to Azure to create your pipeline.
- If you mean a viewer where you can see your pipeline or data flow, then the answer is yes. You can view the pipeline in the Azure Portal.
Is there a Monitoring and Admin tool?
Yes. You can monitor and administer Azure Data Factory through the Azure Portal.
In the next blog post I'll explain how to build your first Data Factory through an example. You will learn how to configure and use the tools and services mentioned above.