Azure Data Factory reached general availability 10 days ago. An extension for Visual Studio was published for Data Factory a little earlier. The good news is that you can now create Azure Data Factory projects from Visual Studio. This is a great step forward in the development of Data Factory solutions. Previously, the only way to create Data Factory solutions was through the Azure portal with JSON scripts. The new set of templates for Azure Data Factory helps speed up the development of Azure-based data extracts and loads. In this post I'd like to share the features of these new templates with you.
What is Azure Data Factory?
Azure Data Factory is a cloud-based data extraction and ingestion service. It can connect to on-premises data sources (Oracle, SQL Server…) as well as Azure data sources (Azure SQL Database, Azure Blob Storage). Transformations can be applied to the dataset through a range of Activities defined in Data Factory, and the result set can be loaded into on-premises data stores as well as Azure data stores. If you want to learn more about Azure Data Factory, please read this blog post.
I encourage you to read the blog posts here to get a better understanding of Data Factory:
In the blog posts above you'll see that I created the Azure Data Factory through the Azure Portal, and that I created the other components of the data factory there as well, using JSON code. Until now this was the only way to create Data Factory solutions. Fortunately, an extension for Visual Studio has now been released that includes templates for Azure Data Factory projects. Let's see what these templates are.
Visual Studio SDK for Data Factory
You need Visual Studio 2013 with Update 4 (the extension has yet to be published for VS 2015, stay tuned). If you don't have Update 4 of Visual Studio 2013, download it from here.
You also need to install Windows Azure SDK for .NET version 2.7. Download it from here.
You can also update an existing SDK through the Tools menu in Visual Studio: open Extensions and Updates, then choose Update from Visual Studio Gallery, and then Microsoft Azure DataFactory Tools for Visual Studio.
Data Factory Project Templates
After installing the SDK above, you will see the Data Factory templates when you create a New Project in Visual Studio.
Empty Data Factory Project
The Empty Data Factory Project template just creates an empty solution with the essential folders for a Data Factory solution. The essential folders are:
- Linked Services
You can then easily add items to this solution. For example, right-click the LinkedServices folder and choose Add New Item. You can create a Linked Service from one of the existing item types, as the screenshot below shows.
You might be a bit disappointed here 😉: creating this Linked Service just opens a new JSON script, and you still have to modify it as JSON code. Fortunately you don't need to do much code manipulation; you only have to change a few configuration values. For example, the main part of the LinkedService below is the connection string to the Blob Storage.
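To give an idea of how little needs to be edited, here is a minimal sketch of what a Blob Storage Linked Service script can look like. It assumes the JSON format used by Data Factory at general availability; the name `StorageLinkedService` and the `<accountname>` and `<accountkey>` placeholders are illustrative only, and the script Visual Studio generates for you may contain additional fields.

```json
{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
    }
  }
}
```

In a script like this, the connection string is typically the only value you have to replace with your own.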
A Pipeline can also be added with the New Item option under the Pipelines folder, which gives you the options below.
Pipeline items are named after their activity types. For example, Copy Data Pipeline is a Pipeline with a Copy Activity. For pipelines the editor also comes with a designer. You can see the pipeline view in the designer with the code underneath, and the two stay in sync, which makes development much easier.
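As a rough illustration of the code behind the designer, a pipeline with a single Copy Activity could look like the sketch below. This is not the exact output of the Copy Data Pipeline item; it assumes the general-availability JSON format, and the names `CopyBlobToSqlPipeline`, `BlobToSql`, `InputBlobTable` and `OutputSqlTable`, as well as the dates, are hypothetical.

```json
{
  "name": "CopyBlobToSqlPipeline",
  "properties": {
    "description": "Copies data from a blob table to an Azure SQL table",
    "activities": [
      {
        "name": "BlobToSql",
        "type": "Copy",
        "inputs": [ { "name": "InputBlobTable" } ],
        "outputs": [ { "name": "OutputSqlTable" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ],
    "start": "2015-08-17T00:00:00Z",
    "end": "2015-08-18T00:00:00Z"
  }
}
```

The `inputs` and `outputs` entries refer to tables by name, so those tables have to exist in the same solution.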
You can also create Tables, which are the basis of each pipeline (its inputs and outputs), by adding a new item under the Tables folder. Here you can see the list of currently supported table types.
For tables you get a base script, which you can then modify by setting the structure, format, location/connection string, the name of the LinkedService related to this table, the frequency, and so on.
You can also build the solution to see any errors in the code.
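The settings mentioned above map onto the table script roughly as in the following sketch of a blob-based input table. It assumes the general-availability JSON format; the table name, column names, folder path and the `StorageLinkedService` reference are hypothetical, and the base script generated by Visual Studio may differ.

```json
{
  "name": "InputBlobTable",
  "properties": {
    "structure": [
      { "name": "CustomerId", "type": "Int32" },
      { "name": "CustomerName", "type": "String" }
    ],
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "mycontainer/input",
      "format": {
        "type": "TextFormat",
        "columnDelimiter": ","
      }
    },
    "external": true,
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}
```

Here `structure` holds the columns, `typeProperties` holds the location and format, `linkedServiceName` points to the Linked Service, and `availability` sets the frequency.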
Data Factory Templates
The other option when you create a Data Factory project from Visual Studio is to use one of the existing Data Factory templates. You can also choose to install sample data with the template.
As an example, here are the steps for the Customer Churn Analysis use-case template. You can choose to use an existing data factory for your template or a new one. You have options to configure and set up a new Data Factory, as you can see below.
Then you can configure the data sources (this may vary depending on the template).
Because this template uses a Machine Learning component, a configuration for that compute is required here.
You can verify what you have configured in the Summary.
Then, during deployment, you will see either deployment errors or a success message. For example, the screenshot below shows that the name of the data factory was not valid, so creating the new Data Factory failed.
Overall, having this extension in Visual Studio is a great step forward because it makes creating Data Factory solutions much easier. I'll write more blog posts soon with demos that show how to create Data Factory solutions from Visual Studio.