What is the Common Data Model and Why Should I Care? Part 3 of Dataflow Series in Power BI

In the last two articles, you have learned what is Dataflow and sample use case scenarios of that in Power BI, and also you learn how to create a dataflow. Before going any further in the dataflow discussion, we have to make a quick pause and talk about the Common Data Model (CDM). Dataflow integration with CDM is an important and valuable asset for business applications as well as Power BI. In this article, you will learn what is CDM, and all details about it that you need for your Power BI solution. If you want to learn more about Power BI; Read Power BI book from Rookie to Rock Star.

Prerequisite

To understand the content of this article, it is good to have general information about what dataflow is, and scenarios of using dataflow:

Part 1: What are the Use Cases of Dataflow for You in Power BI?

Part 2: Getting Starting with Dataflow in Power BI

Silos of Data

To understand the common data model, you have to first understand the need for it. Nowadays many organizations have tens of applications, applications that do different functions. For example, an application for accounting, another application for ERP, one application that takes care of their website, and an application for their timesheet system and etc. Many organizations also have many databases, because usually, each application comes with a database. There might be also some databases with no application for it. Such as an Excel file that a salesperson keeps having a list of leads and contacts to refer to it. There can be hundreds of databases in each organization.

As an example of the silos of data, try to answer this question; How many applications may keep customer data? ERP may keep that information. The online website may have a section in its database for it. Also, there might be a customer table in the accounting application. How would you sync all these customer data together? how to integrate?

Integration Challenge

One of the big challenges in a scenario mentioned above (tens of applications, and hundreds of databases) is the integration. The ERP application would want to access the data from the website. The accounting application would need some details to be imported from the ERP application. A reporting application would need data from all applications. If we want to design an integrated model between ten applications and hundreds of databases, we end up with something like below diagram!

Shared Data Model

As you can see in the above screenshot, integration between applications and databases looks very complicated and crazy. Most of the organizations have to spend a lot of resources and budget to get this done, and with an appearance of the next application or database, the whole operation needs to be done again! Instead of the model above, it is much more efficient, if we can integrate all the data into a “shared data model”. The shared data model would be the source of the single version of the truth. All systems will read and write into the shared data model. Other applications still have their own database, but if they want to integrate with another application, the shared data model is their adapter.

The shared data model can help the solution architecture to be much more efficient as you see in below:

The shared data model will make applications isolated from the changes in other applications, and as a result, it would need much fewer integration efforts. Now let’s see what a Common Data Model is.

Common data model: CDM

The Common Data Model (CDM) is the shared data model you have seen in the previous diagram. It is a place to keep all common data to be shared between applications and data sources. It ensures the application integrity and consistency. Each application should only know how to deal with the CDM. Common Data Model have templates of a table structure for most of the business functions needs. The initiative of the common data model started from Dynamics 365. So the table structure mainly derived from that structure. As of the time of writing this article, there are more than 250 tables in the common data model. the screenshot below shows some of these entities;

There is a CDM repository in Github for the schema of the Common Data Model: https://github.com/Microsoft/CDM. CDM works with other Microsoft technologies as explained in the next section.

CDM and Other Microsoft Services

The common data model is already supported in the Common Data Services for Apps, Dynamics 365, Power Apps, Power BI, and it will be supported in many upcoming Azure data services. As a reader of RADACAD’s articles, I assume, you are most probably coming from Power BI side, and that is the side that I will be focusing on in this article. Continuing the discussion from Dataflow; Dataflow can map the output of a query to an entity in the common data model. This feature is handled with “Map to Standard” option in the Dataflow Power Query Editor.

The Common Data Services also provides the hub for additional analysis as well as AI-driven business insight. I’ll leave the AI part of it to be covered by Leila in other articles in deep. The below diagram shows how CDM can be used with all other services of Microsoft.

CDM and Dataflow

The common data model can be the destination of storing data in a dataflow. To use this function, you need to use an option named “Map to Standard”. This is an option that you will see when you are in the online query editor of the dataflow.

You can then select the standard entity. The standard entity is one of the entities in the common data model, as you can see in the screenshot below, there are many entities pre-defined.

After selecting an entity, you can map the fields from the source column to the standard entity.

This process loads the data into the CDM table.

CDM and Business Applications

The common data model can be accessed from Microsoft business applications such as Microsoft PowerApps, Power BI, and Dynamics 365.

Data Storage for the Common Data Model

The common data model stores the data into Azure Data Lake storage. The structure of storage is using folders called as CDM folders. Each folder contains metadata files and multiple data files. The metadata file is a *.json file, and data is stored as *.csv files. The screenshot below shows an example structure of a CDM folder:

In another article, I’ll explain the structure of CDM folders in details and also explain how you can leverage the structure to build external dataflow. Stay tuned for that.

Main Advantage of CDM

As mentioned above, CDM helps to make the integration easier. As a direct result of that, it helps applications and data sources to be decoupled from each other. And what that means is that; you can have a report built for a specific purpose, and if that report is using CDM, then it can be easily applied to a similar scenario where the data source is different. The same process applies when you have applications using Dynamics 365 or PowerApps using CDM. This leads to some opportunities such as Industry Solution Accelerators, which are pre-packaged applications using CDM.

There are already some Microsoft Industry Solution Accelerators published here.

Summary

Dataflow is loading data into the Azure Data Lake storage, but it can be even much more efficient if it loads data into a shared data model. A shared data model will reduce the efforts needed for integration between data sources and applications. The Common Data Model is a shared data model that can be accessed by multiple Microsoft technologies. The common data model is saved in the Azure Data Lake. Dataflow can load data into the CDM and other applications, as well as Power BI, can get data from the CDM. Stay tuned for upcoming articles from Dataflow series; we still have a lot to discuss; Linked vs Computed Entities, Licensing, CDM folder structure etc.

Part 1: What are the Use Cases of Dataflow for You in Power BI?

Part 2: Getting Starting with Dataflow in Power BI

Part 3: What is the Common Data Model and Why Should I Care? (this article)

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

14 thoughts on “What is the Common Data Model and Why Should I Care? Part 3 of Dataflow Series in Power BI

  • I have some serious doubts about CDM. It is already there for years, but no progress. Now suddenly Microsoft is making a push with CDM. But for how long and to what direction?

    A data model is to… model data. Then start using the data and the model. There is no data modelling tool for CDM that shows what the relations and dependencies are. It is all mapped directly on entities. There is a definition of the entities, but on the attributes it gets vague. It is all focussed on using CDM.

    Then if you start using CDM, if you want to define custom entities/attributes, again there is no modelling tool. Also no versioning. So how do you keep track… then you also need to keep track of how mapped your own company data to the CDM or to your custom extensions.

    With other words: no way that is usable in any way. Not for small companies and definitely not enterprise ready.

    I really would like to stand corrected on this, but I doubt it.

    • Hi Johannes
      Thanks for your comment. I agree with most of your points here. There is not a good modeling tool to interact with and design the CDM. But as Microsoft focused on this recently even heavier (especially considering collaboration with Adobe and SAP), then I expect some new updates on this, maybe we see some tools coming 🙂
      Cheers
      Reza

  • I guess we can only map one entity to one CDM? And if so is it possible to only map some of the fields? Let’s say within our entity we only have to which fit to the CDM
    Thanks

    • You can map one entity from the dataflow to one entity in the CDM (CDM has hundreds of entities). and Yes, correct. You can map only the fields you want. It is not mandatory to map all fields.

  • Hi ReZa Rad,
    Thank you sharing Data flow concept.please can share different between Power BI pro vs Premium with Data flow.
    becoz..next month i am starting new project my end user only 100 employees that way we choose Power BI pro licensee for 100 persons.
    after Reading Your blog i will plan to implement dataflow in my project
    My datasource is:Azure Sql DB(only one data base now)

    thank
    Lavidiya

    • Hi
      You can definitely use Dataflow even using Pro license. However, you will miss some features; such as linked and computed entities, AI features in Dataflow and few others.
      I will write a post later on about it.
      Cheers
      Reza

  • Hi Thanks for your post. Its really helpul.
    Can we ingest and write data in CDM format using Azure Data Factory similar to what can be done with Power BI data flows?
    If yes, Can you please brief us a bit with the process or info about it?

  • Hi Reza,
    Do you know where can we have a description of the MS Common Data Model categories? We want to understand what they include, so we can map them properly. I haven’t been able to find a clear definition on the main categories and the groupings under each.

  • Hi, very good info!
    Do you know if Dataflows can be stored in ADLS as parquet files instead of csv?

  • Hi Reza,
    I do not see the map to standard option in power query.
    Do I need to enable it from somewhere?
    Have you made any video on how to actually implement this practically?
    Thanks!

Leave a Reply