In the last two articles, you have learned what is Dataflow and sample use case scenarios of that in Power BI, and also you learn how to create a dataflow. Before going any further in the dataflow discussion, we have to make a quick pause and talk about the Common Data Model (CDM). Dataflow integration with CDM is an important and valuable asset for business applications as well as Power BI. In this article, you will learn what is CDM, and all details about it that you need for your Power BI solution. If you want to learn more about Power BI; Read Power BI book from Rookie to Rock Star.
Prerequisite
To understand the content of this article, it is good to have general information about what dataflow is, and scenarios of using dataflow:
Part 1: What are the Use Cases of Dataflow for You in Power BI?
Part 2: Getting Starting with Dataflow in Power BI
Silos of Data
To understand the common data model, you have to first understand the need for it. Nowadays many organizations have tens of applications, applications that do different functions. For example, an application for accounting, another application for ERP, one application that takes care of their website, and an application for their timesheet system and etc. Many organizations also have many databases, because usually, each application comes with a database. There might be also some databases with no application for it. Such as an Excel file that a salesperson keeps having a list of leads and contacts to refer to it. There can be hundreds of databases in each organization.
As an example of the silos of data, try to answer this question; How many applications may keep customer data? ERP may keep that information. The online website may have a section in its database for it. Also, there might be a customer table in the accounting application. How would you sync all these customer data together? how to integrate?
Integration Challenge
One of the big challenges in a scenario mentioned above (tens of applications, and hundreds of databases) is the integration. The ERP application would want to access the data from the website. The accounting application would need some details to be imported from the ERP application. A reporting application would need data from all applications. If we want to design an integrated model between ten applications and hundreds of databases, we end up with something like below diagram!
Shared Data Model
As you can see in the above screenshot, integration between applications and databases looks very complicated and crazy. Most of the organizations have to spend a lot of resources and budget to get this done, and with an appearance of the next application or database, the whole operation needs to be done again! Instead of the model above, it is much more efficient, if we can integrate all the data into a “shared data model”. The shared data model would be the source of the single version of the truth. All systems will read and write into the shared data model. Other applications still have their own database, but if they want to integrate with another application, the shared data model is their adapter.
The shared data model can help the solution architecture to be much more efficient as you see in below:
The shared data model will make applications isolated from the changes in other applications, and as a result, it would need much fewer integration efforts. Now let’s see what a Common Data Model is.
Common data model: CDM
The Common Data Model (CDM) is the shared data model you have seen in the previous diagram. It is a place to keep all common data to be shared between applications and data sources. It ensures the application integrity and consistency. Each application should only know how to deal with the CDM. Common Data Model have templates of a table structure for most of the business functions needs. The initiative of the common data model started from Dynamics 365. So the table structure mainly derived from that structure. As of the time of writing this article, there are more than 250 tables in the common data model. the screenshot below shows some of these entities;
There is a CDM repository in Github for the schema of the Common Data Model: https://github.com/Microsoft/CDM. CDM works with other Microsoft technologies as explained in the next section.
CDM and Other Microsoft Services
The common data model is already supported in the Common Data Services for Apps, Dynamics 365, Power Apps, Power BI, and it will be supported in many upcoming Azure data services. As a reader of RADACAD’s articles, I assume, you are most probably coming from Power BI side, and that is the side that I will be focusing on in this article. Continuing the discussion from Dataflow; Dataflow can map the output of a query to an entity in the common data model. This feature is handled with “Map to Standard” option in the Dataflow Power Query Editor.
The Common Data Services also provides the hub for additional analysis as well as AI-driven business insight. I’ll leave the AI part of it to be covered by Leila in other articles in deep. The below diagram shows how CDM can be used with all other services of Microsoft.
CDM and Dataflow
The common data model can be the destination of storing data in a dataflow. To use this function, you need to use an option named “Map to Standard”. This is an option that you will see when you are in the online query editor of the dataflow.
You can then select the standard entity. The standard entity is one of the entities in the common data model, as you can see in the screenshot below, there are many entities pre-defined.
After selecting an entity, you can map the fields from the source column to the standard entity.
This process loads the data into the CDM table.
CDM and Business Applications
The common data model can be accessed from Microsoft business applications such as Microsoft PowerApps, Power BI, and Dynamics 365.
Data Storage for the Common Data Model
The common data model stores the data into Azure Data Lake storage. The structure of storage is using folders called as CDM folders. Each folder contains metadata files and multiple data files. The metadata file is a *.json file, and data is stored as *.csv files. The screenshot below shows an example structure of a CDM folder:
In another article, I’ll explain the structure of CDM folders in details and also explain how you can leverage the structure to build external dataflow. Stay tuned for that.
Main Advantage of CDM
As mentioned above, CDM helps to make the integration easier. As a direct result of that, it helps applications and data sources to be decoupled from each other. And what that means is that; you can have a report built for a specific purpose, and if that report is using CDM, then it can be easily applied to a similar scenario where the data source is different. The same process applies when you have applications using Dynamics 365 or PowerApps using CDM. This leads to some opportunities such as Industry Solution Accelerators, which are pre-packaged applications using CDM.
There are already some Microsoft Industry Solution Accelerators published here.
Summary
Dataflow is loading data into the Azure Data Lake storage, but it can be even much more efficient if it loads data into a shared data model. A shared data model will reduce the efforts needed for integration between data sources and applications. The Common Data Model is a shared data model that can be accessed by multiple Microsoft technologies. The common data model is saved in the Azure Data Lake. Dataflow can load data into the CDM and other applications, as well as Power BI, can get data from the CDM. Stay tuned for upcoming articles from Dataflow series; we still have a lot to discuss; Linked vs Computed Entities, Licensing, CDM folder structure etc.
Part 1: What are the Use Cases of Dataflow for You in Power BI?
Part 2: Getting Starting with Dataflow in Power BI
Part 3: What is the Common Data Model and Why Should I Care? (this article)
I have some serious doubts about CDM. It is already there for years, but no progress. Now suddenly Microsoft is making a push with CDM. But for how long and to what direction?
A data model is to… model data. Then start using the data and the model. There is no data modelling tool for CDM that shows what the relations and dependencies are. It is all mapped directly on entities. There is a definition of the entities, but on the attributes it gets vague. It is all focussed on using CDM.
Then if you start using CDM, if you want to define custom entities/attributes, again there is no modelling tool. Also no versioning. So how do you keep track… then you also need to keep track of how mapped your own company data to the CDM or to your custom extensions.
With other words: no way that is usable in any way. Not for small companies and definitely not enterprise ready.
I really would like to stand corrected on this, but I doubt it.
Hi Johannes
Thanks for your comment. I agree with most of your points here. There is not a good modeling tool to interact with and design the CDM. But as Microsoft focused on this recently even heavier (especially considering collaboration with Adobe and SAP), then I expect some new updates on this, maybe we see some tools coming 🙂
Cheers
Reza
I guess we can only map one entity to one CDM? And if so is it possible to only map some of the fields? Let’s say within our entity we only have to which fit to the CDM
Thanks
You can map one entity from the dataflow to one entity in the CDM (CDM has hundreds of entities). and Yes, correct. You can map only the fields you want. It is not mandatory to map all fields.
Hi ReZa Rad,
Thank you sharing Data flow concept.please can share different between Power BI pro vs Premium with Data flow.
becoz..next month i am starting new project my end user only 100 employees that way we choose Power BI pro licensee for 100 persons.
after Reading Your blog i will plan to implement dataflow in my project
My datasource is:Azure Sql DB(only one data base now)
thank
Lavidiya
Hi
You can definitely use Dataflow even using Pro license. However, you will miss some features; such as linked and computed entities, AI features in Dataflow and few others.
I will write a post later on about it.
Cheers
Reza
Hi Thanks for your post. Its really helpul.
Can we ingest and write data in CDM format using Azure Data Factory similar to what can be done with Power BI data flows?
If yes, Can you please brief us a bit with the process or info about it?
Hi Ayush
I am not an Azure Data Factory expert, but it seems there is a way to do it.
Cheers
Reza
Hi Reza,
Do you know where can we have a description of the MS Common Data Model categories? We want to understand what they include, so we can map them properly. I haven’t been able to find a clear definition on the main categories and the groupings under each.
Hi
I think here you can find some more information with an advanced navigator
Cheers
Reza
Hi, very good info!
Do you know if Dataflows can be stored in ADLS as parquet files instead of csv?
At the moment, CSV is the only possible format in ADLS
Hi Reza,
I do not see the map to standard option in power query.
Do I need to enable it from somewhere?
Have you made any video on how to actually implement this practically?
Thanks!
Hi Naser. This is in Power Query online as part of Power BI or Power Platform Dataflows.