In the last two articles, you have learned what is Dataflow and sample use case scenarios of that in Power BI, and also you learn how to create a dataflow. Before going any further in the dataflow discussion, we have to make a quick pause and talk about the Common Data Model (CDM). Dataflow integration with CDM is an important and valuable asset for business applications as well as Power BI. In this article, you will learn what is CDM, and all details about it that you need for your Power BI solution. If you want to learn more about Power BI; Read Power BI book from Rookie to Rock Star.
To understand the content of this article, it is good to have general information about what dataflow is, and scenarios of using dataflow:
Silos of Data
To understand the common data model, you have to first understand the need for it. Nowadays many organizations have tens of applications, applications that do different functions. For example, an application for accounting, another application for ERP, one application that takes care of their website, and an application for their timesheet system and etc. Many organizations also have many databases, because usually, each application comes with a database. There might be also some databases with no application for it. Such as an Excel file that a salesperson keeps having a list of leads and contacts to refer to it. There can be hundreds of databases in each organization.
As an example of the silos of data, try to answer this question; How many applications may keep customer data? ERP may keep that information. The online website may have a section in its database for it. Also, there might be a customer table in the accounting application. How would you sync all these customer data together? how to integrate?
One of the big challenges in a scenario mentioned above (tens of applications, and hundreds of databases) is the integration. The ERP application would want to access the data from the website. The accounting application would need some details to be imported from the ERP application. A reporting application would need data from all applications. If we want to design an integrated model between ten applications and hundreds of databases, we end up with something like below diagram!
Shared Data Model
As you can see in the above screenshot, integration between applications and databases looks very complicated and crazy. Most of the organizations have to spend a lot of resources and budget to get this done, and with an appearance of the next application or database, the whole operation needs to be done again! Instead of the model above, it is much more efficient, if we can integrate all the data into a “shared data model”. The shared data model would be the source of the single version of the truth. All systems will read and write into the shared data model. Other applications still have their own database, but if they want to integrate with another application, the shared data model is their adapter.
The shared data model can help the solution architecture to be much more efficient as you see in below:
The shared data model will make applications isolated from the changes in other applications, and as a result, it would need much fewer integration efforts. Now let’s see what a Common Data Model is.
Common data model: CDM
The Common Data Model (CDM) is the shared data model you have seen in the previous diagram. It is a place to keep all common data to be shared between applications and data sources. It ensures the application integrity and consistency. Each application should only know how to deal with the CDM. Common Data Model have templates of a table structure for most of the business functions needs. The initiative of the common data model started from Dynamics 365. So the table structure mainly derived from that structure. As of the time of writing this article, there are more than 250 tables in the common data model. the screenshot below shows some of these entities;
There is a CDM repository in Github for the schema of the Common Data Model: https://github.com/Microsoft/CDM. CDM works with other Microsoft technologies as explained in the next section.
CDM and Other Microsoft Services
The common data model is already supported in the Common Data Services for Apps, Dynamics 365, Power Apps, Power BI, and it will be supported in many upcoming Azure data services. As a reader of RADACAD’s articles, I assume, you are most probably coming from Power BI side, and that is the side that I will be focusing on in this article. Continuing the discussion from Dataflow; Dataflow can map the output of a query to an entity in the common data model. This feature is handled with “Map to Standard” option in the Dataflow Power Query Editor.
The Common Data Services also provides the hub for additional analysis as well as AI-driven business insight. I’ll leave the AI part of it to be covered by Leila in other articles in deep. The below diagram shows how CDM can be used with all other services of Microsoft.
CDM and Dataflow
The common data model can be the destination of storing data in a dataflow. To use this function, you need to use an option named “Map to Standard”. This is an option that you will see when you are in the online query editor of the dataflow.
You can then select the standard entity. The standard entity is one of the entities in the common data model, as you can see in the screenshot below, there are many entities pre-defined.
After selecting an entity, you can map the fields from the source column to the standard entity.
This process loads the data into the CDM table.
CDM and Business Applications
The common data model can be accessed from Microsoft business applications such as Microsoft PowerApps, Power BI, and Dynamics 365.
Data Storage for the Common Data Model
The common data model stores the data into Azure Data Lake storage. The structure of storage is using folders called as CDM folders. Each folder contains metadata files and multiple data files. The metadata file is a *.json file, and data is stored as *.csv files. The screenshot below shows an example structure of a CDM folder:
In another article, I’ll explain the structure of CDM folders in details and also explain how you can leverage the structure to build external dataflow. Stay tuned for that.
Main Advantage of CDM
As mentioned above, CDM helps to make the integration easier. As a direct result of that, it helps applications and data sources to be decoupled from each other. And what that means is that; you can have a report built for a specific purpose, and if that report is using CDM, then it can be easily applied to a similar scenario where the data source is different. The same process applies when you have applications using Dynamics 365 or PowerApps using CDM. This leads to some opportunities such as Industry Solution Accelerators, which are pre-packaged applications using CDM.
There are already some Microsoft Industry Solution Accelerators published here.
Dataflow is loading data into the Azure Data Lake storage, but it can be even much more efficient if it loads data into a shared data model. A shared data model will reduce the efforts needed for integration between data sources and applications. The Common Data Model is a shared data model that can be accessed by multiple Microsoft technologies. The common data model is saved in the Azure Data Lake. Dataflow can load data into the CDM and other applications, as well as Power BI, can get data from the CDM. Stay tuned for upcoming articles from Dataflow series; we still have a lot to discuss; Linked vs Computed Entities, Licensing, CDM folder structure etc.
Part 3: What is the Common Data Model and Why Should I Care? (this article)