Building The First Azure Data Factory

In a previous post I explained what Azure Data Factory is, along with the tools and requirements for this service. In this post I want to go through a simple demo of Data Factory, so you get an idea of how a Data Factory project is built, developed, and scheduled to run. You may see some components of Azure Data Factory in this post that you don’t fully understand, but don’t worry, I’ll go through them in future posts.

As an overview from the previous post: Azure Data Factory is a Microsoft Azure service that ingests data from data sources, applies compute operations to the data, and loads it into the destination. The main purpose of Data Factory is data ingestion, and that is the big difference between this service and ETL tools such as SSIS (I’ll go through the differences between Data Factory and SSIS in a separate blog post). With Azure Data Factory you can:

  • Access data sources such as on-premises SQL Server, SQL Azure, and Azure Blob storage
  • Apply data transformations through Hive, Pig, and C#
  • Monitor the data pipeline, validation, and execution of scheduled jobs
  • Load the data into destinations such as on-premises SQL Server, SQL Azure, and Azure Blob storage
  • And last but not least, do all of this in a cloud-based service

[…]

I’ll Speak in SQL PASS Summit 2015; 3 Years in a row

It is an honor for me to have been selected to speak at SQL PASS Summit 2015. SQL PASS Summit is the largest SQL Server conference and event in the world. Last year about 6,000 people from more than 55 countries attended this great event. I’ve been honored previously to speak in this great conference […]

SSIS Demos in Action: Tutorial Videos

SQL Server Integration Services is not a new technology; it is a mature data transfer and data consolidation tool that has been on the market since 2005. Before that, SSIS had another name: DTS. However, SSIS nowadays is capable of doing far more than DTS had to offer.

You can transfer data from any source to any destination, upload or download files, set priorities on the execution of tasks, call third-party applications or command lines, get part of the data from a web service, zip or unzip the result set, change the structure of the result set, and load it wherever you want.

[…]

What’s New in MDS of SQL Server 2016

SQL Server 2016 CTP 2 was released about a day ago, and I’ve had a chance to install it and play with Master Data Services to see what changes have been made in this product. Master Data Services is a service of SQL Server first released in SQL Server 2008 R2 and enhanced in SQL Server 2012. There were no changes to MDS in SQL Server 2014. Fortunately the MDS wheels are spinning again, and there are some changes in 2016. The changes are not major, but they are evidence that Microsoft is investing in this service.

[…]

Incremental Load: Change Data Capture in SSIS

Incremental Load is always a big challenge in Data Warehouse and ETL implementation. In the enterprise world you face millions, billions, or even more records in fact tables. It isn’t practical to load all of those records every night, as that has many downsides, such as:

  • The ETL process will slow down significantly, and can’t be scheduled to run at short intervals.
  • Performance of the source and destination servers will be badly affected, and the downtime of these systems will be longer.
  • More resources will be required to maintain the process, such as better processors and more RAM, and adding these won’t help much in the end, because the amount of data increases as time passes.
  • And many other issues.

So what would be the solution? The solution is the Incremental Load approach. In this approach the data is loaded partially, preferably only the part of the data that has changed. A change set is much smaller than the total amount of data. As an example, in a fact table of 200 million records that stores data for 10 years, only 10% of that data might relate to the current year and change frequently, so you usually won’t need to reload the remaining 180 million records.
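The idea of loading only the change set can be sketched in a few lines of Python. This is a minimal illustration, not Change Data Capture itself: it assumes each record carries a hypothetical `modified_at` timestamp and that the time of the previous load is stored as a watermark. The names `select_changed_rows` and `incremental_load` are made up for this sketch.

```python
from datetime import datetime


def select_changed_rows(rows, last_load_time):
    """Return only the rows modified after the previous load (the change set)."""
    return [row for row in rows if row["modified_at"] > last_load_time]


def incremental_load(source_rows, destination, last_load_time):
    """Upsert the change set into destination, a dict keyed by row id."""
    changed = select_changed_rows(source_rows, last_load_time)
    for row in changed:
        destination[row["id"]] = row
    # New watermark: the latest modification time seen,
    # or the old watermark if nothing changed.
    new_watermark = max(
        (row["modified_at"] for row in changed), default=last_load_time
    )
    return len(changed), new_watermark


# Hypothetical source table: one old record and two changed this year.
source = [
    {"id": 1, "amount": 100, "modified_at": datetime(2014, 6, 1)},
    {"id": 2, "amount": 250, "modified_at": datetime(2015, 3, 10)},
    {"id": 3, "amount": 75, "modified_at": datetime(2015, 5, 2)},
]
# The destination already holds the old record from an earlier load.
destination = {1: {"id": 1, "amount": 100, "modified_at": datetime(2014, 6, 1)}}

loaded, watermark = incremental_load(source, destination, datetime(2015, 1, 1))
print(loaded, watermark)

# The arithmetic from the example above: 10% of 200 million records.
total_rows = 200_000_000
changed_rows = total_rows * 10 // 100
print(changed_rows, total_rows - changed_rows)
```

Only the two records changed since the watermark are touched; the old record is left alone. A timestamp watermark like this misses deleted rows, which is one reason a log-based mechanism such as Change Data Capture can be a better fit.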

[…]

TechDays Hong Kong 2015: Azure Data Factory vs. SSIS

It has been a long time since my presentation at Hong Kong TechDays 2015 in mid-February. I’ve been really busy and haven’t had a chance to upload my presentation files here. I would like to thank everyone who attended my session on Azure Data Factory vs. SSIS, and provide you with the link to my presentation slides. In this session you will see a comparison between SSIS and Azure Data Factory on different factors such as development, features, deployment, user experience, environment, and so on. For each comparison factor you will see a table comparing the two products and their pros and cons in different situations.

[…]

Codecamp Christchurch 2015 SQL and BI Stream Highlights

Codecamp Christchurch 2015 ran today at Christchurch Polytechnic Institute of Technology. I joined the event at the last minute, when the agenda was already full, so I only attended as an audience member. There was a really good turnout for the codecamp, with about 200 people registered. Three streams ran, on Software Development, SQL/BI, and SharePoint, with 5 sessions in each stream. Overall it was a good event, and I had a chance to meet some friends and the SQL community down in Christchurch. Here are my highlights of the event.

[…]

Introduction to Power BI Designer

Power BI Designer is the new editor for the main Power BI components: Power Query, Power Pivot, and Power View. Power BI Designer makes building Power BI solutions easier by integrating them in one tool. Power BI Designer files can be easily uploaded to the Power BI site. In this tutorial video you will learn the basics of Power BI Designer and see some demos of this product: getting data from the FIFA 2014 World Cup website, and creating charts and dashboards. In the demo you will also learn how to deploy the report to the Power BI site, and how to view the dashboard and report from the Power BI app.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 0: Prerequisites

This is the first part, which is published last! I previously published 7 posts in the series “I’m New to BI, Where to Start?”. However, I got some feedback from readers who don’t come from the database world and are not familiar with relational database structure, primary keys, foreign keys, constraints, indexes, T-SQL, and so on. So I felt the need for a preliminary post that links to some references to give a better understanding of the prerequisites for starting BI. Business Intelligence is the art of fetching information to support decisions based on the story behind the data. With this definition, the first and foremost prerequisite is to understand data and how to work with it.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 7: Azure

Many organizations nowadays are in transition from on-premises to the cloud, and many of them use hybrid solutions where part of the computing is done in the cloud and the rest on-premises. The trend nowadays is to use the cloud for better maintenance, lower costs, more reliable solutions, lower administrative effort, and powerful shared resources. In the BI world there is high demand for solutions to be in the cloud: some computing services such as data transfer and ETL are done in the cloud, some data analysis and mining solutions happen in the cloud, and at some stage even the data is stored in a cloud data warehouse. There are many BI vendors in the market, but few provide BI in the cloud.

[…]