Incremental Load: Change Data Capture in SSIS

Incremental load is always a big challenge in data warehouse and ETL implementations. In the enterprise world you face millions, billions, or even more records in fact tables. It is not practical to load all of those records every night, because doing so has many downsides, such as:

  • The ETL process slows down significantly and can’t be scheduled to run at short intervals.
  • Performance of the source and destination servers is badly affected, and downtime of these systems is longer.
  • More resources are required to maintain the process, such as better processors and more RAM, and adding these won’t help much in the end, because the amount of data keeps increasing as time passes.
  • And many other issues.

So what is the solution? The incremental load approach. In this approach the data is loaded partially, preferably only the part of the data that has changed. A change set is much smaller than the total amount of data. For example, in a fact table of 200 million records that stores 10 years of data, only 10% of that data might relate to the current year and change frequently, so you usually won’t need to reload the remaining 180 million records.
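
As a rough illustration of the idea, here is a minimal Python sketch of an incremental load driven by a "last modified" watermark. It is not the Change Data Capture feature named in the title, and it is not SSIS code; the function `incremental_load`, the `modified_at` column, and the sample rows are all hypothetical names chosen for this example.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_loaded_at):
    """Upsert only the rows modified after the previous run's watermark.

    Assumes every source row carries a hypothetical `modified_at` timestamp.
    """
    # Keep only the change set: rows touched since the last successful load.
    changed = [r for r in source_rows if r["modified_at"] > last_loaded_at]
    for row in changed:
        # Insert new rows and overwrite rows that changed.
        target[row["id"]] = row
    # The newest timestamp seen becomes the watermark for the next run.
    new_watermark = max(
        (r["modified_at"] for r in changed), default=last_loaded_at
    )
    return len(changed), new_watermark


# Hypothetical sample data: one old row and two rows changed this year.
source = [
    {"id": 1, "amount": 100, "modified_at": datetime(2014, 6, 1)},
    {"id": 2, "amount": 250, "modified_at": datetime(2015, 3, 2)},
    {"id": 3, "amount": 75, "modified_at": datetime(2015, 3, 5)},
]
target = {1: source[0]}  # row 1 was loaded by an earlier run

loaded, watermark = incremental_load(source, target, datetime(2015, 1, 1))
print(loaded, watermark)
```

Only the two rows modified after the watermark are touched; the older row is left alone, which is the saving the paragraph above describes. A plain timestamp watermark like this cannot see deleted rows, which is one reason to look at a dedicated change-tracking mechanism.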

[…]

TechDays Hong Kong 2015: Azure Data Factory vs. SSIS

It has been a long time since my presentation at TechDays Hong Kong 2015 in mid-February. I’ve been really busy and haven’t had a chance to upload my presentation files here. I would like to thank everyone who attended my session on Azure Data Factory vs. SSIS, and provide you with the link to my presentation slides. In this session you will see a comparison between SSIS and Azure Data Factory on different factors such as development, features, deployment, user experience, and environment. For each comparison factor you will see a table comparing the two products, along with their pros and cons in different situations.

[…]

Codecamp Christchurch 2015 SQL and BI Stream Highlights

Codecamp Christchurch 2015 ran today at Christchurch Polytechnic Institute of Technology. I joined the event at the last minute, when the agenda was already set, so I attended only as an audience member. There was a really good turnout for the codecamp, with about 200 registered. Three streams ran, on Software Development, SQL/BI, and SharePoint, with 5 sessions in each stream. Overall it was a good event, and I had a chance to meet some friends and the SQL community down in Christchurch. Here are my highlights of the event.

[…]

Introduction to Power BI Designer

Power BI Designer is the new editor for the main components of Power BI: Power Query, Power Pivot, and Power View. Power BI Designer makes building Power BI solutions easier by bringing these components together in one tool. Power BI Designer files can be easily uploaded to the Power BI site. In this tutorial video you will learn the basics of Power BI Designer and see some demos of the product: getting data from the FIFA 2014 World Cup website, and creating charts and dashboards. In the demo you will also learn how to deploy the report to the Power BI site, and how to view the dashboard and report from the Power BI app.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 0: Prerequisites

This is the first part, published last! I previously published 7 posts in the series “I’m New to BI, Where to Start?”. However, I got feedback from readers who do not come from the database world and are not familiar with relational database structure, primary keys, foreign keys, constraints, indexes, T-SQL, and so on. So I felt the need for a preliminary post that links to some references for a better understanding of the prerequisites for starting BI. Business Intelligence is the art of fetching information to support decisions based on the story behind the data. With this definition, the first and foremost prerequisite is to understand data and how to work with it.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 7: Azure

Many organizations nowadays are in transition from on-premises to the cloud, and many of them use hybrid solutions where part of the computing is done in the cloud and the rest on-premises. The trend nowadays is to use the cloud for better maintenance, lower costs, more reliable solutions, less administrative effort, and powerful shared resources. In the BI world there is high demand for solutions in the cloud: some computing services such as data transfer and ETL run in the cloud, some data analysis and mining happens in the cloud, and at some stage even the data is stored in a cloud data warehouse. There are many BI vendors in the market, but only a few provide BI in the cloud.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 6: Data Mining

Data Scientist is a hot job in the market nowadays, and there is a lot of demand for it. Data science originates from the BI field and sits under the umbrella of Business Intelligence, because data science and data mining help decision makers in their decision-making process. To get an idea of how data mining can solve real-world problems, think about the loyalty program of the supermarket close to your home that you usually use. A loyalty program simply analyzes your purchase history and, based on that, comes up with suggestions and predictions about your next purchases, and promotes them through text messages to your cell phone, email, or even flyers addressed directly to your home.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 5: Power BI

Power BI is not a strange term these days; there are many blog posts, videos, news items, and talks about it. Power BI is Microsoft’s cloud BI service, which was released recently. Power BI is not only cloud BI; it also offers a good self-service BI stack. Power BI has five main components:

  • Power Query: Data extraction, transformation, and load into the model.
  • Power Pivot: Data modelling tool.
  • Power View: Data visualization tool.
  • Power Map: 3D geo-spatial data visualization tool.
  • Power Q&A: Engine for asking and answering natural language questions.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 4: Data Visualization

Data Visualization is the front end of your BI system. From the user’s point of view it is the whole BI system! The reason is that users only see this part of the BI system; they won’t see the Data Warehouse, ETL, and Data Governance. What they see is only dashboards and charts explaining data values. So it is essential to do data visualization right and make it effective. A good data visualization should be able to tell the story behind the data. A bad data visualization won’t help users even if you have a good data warehouse or ETL design. So you should spend time analyzing every dashboard, chart, and table, and make each one a good storyteller for the user.

[…]

Walk-through Steps: I’m New to BI, Where to Start? – Part 3: Data Governance

Data Governance is one of the most important aspects of BI systems, yet unfortunately it is seen as less important in many organizations. The usefulness and greatness of a BI system will only be seen if good data governance is in place. If you build superb dashboards and data visualizations, they won’t help unless the quality of the data is high. Your ETL scenario won’t be of much use when there is more than one source for the same data, each with a different version of it.

Data Governance and Enterprise Information Management (EIM) are concepts that need to be covered in every organization that works with data, regardless of whether it uses a BI system or not. So in a nutshell, EIM and Data Governance are not components of a BI system; they are separate systems that can be used side by side with a BI system. However, because of their very close relationship with BI systems (especially because a BI system is based on data and information), we cover them here. I would like to stress that there won’t be a good BI system without good EIM or data governance in place.

[…]