Step Beyond the 10GB Limitation of Power BI

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2016-04-10_15h41_53

Power BI free allows you to have 1GB data per user, and pro 10GB. However in many organizations you have much bigger size of data than 10GB, sometimes you deal with Tera bytes, even Peta bytes of data. Is Power BI limiting you for the visualization? Short answer; No! In this post I’ll explain you options that you have when you deal with large data sets. As any other solutions there are pros and cons of working with large data sets with Power BI, Let’s explore them together.

Compressed Structure of Power BI

First of all I have to mention that Power BI compress each data set effectively before loading into memory, this is one of the big advantages of xVelocity In-memory engine that Power BI, Power Pivot, and SSAS Tabular built on top of that. Compression happens automatically, that means you don’t need to set a configuration, or allow Power BI to do something. Any data set will be compressed in a reasonable level when you import the data into Power BI.

As an example I’ve imported a CSV file with ~800 MB data size. When it loaded into Power BI I had only 8MB power BI file to work with. That’s an impressive compression.

However compression rate isn’t always like that, it depends on type of data set, data type of columns and some other facts. The bottom line in this section is that, if you have a data set that is more than 10GB and won’t grow significantly, then consider loading that into Power BI to check how compressed it would be, maybe you get it under 10GB limitation and then you are good to go :) If that is not the case then continue reading.

Live Connection

With Power BI we can connect live to some data sources; On premises or in cloud. Well, here’s the trick; if your data set is large, then use Live connection! simple but useful trick. You can have a SQL Server, Oracle or whatever data source you want with any size of data you need, and create a live connection to it with Power BI. Live connection won’t import data into the model in Power BI. Live connection brings the metadata and data structure into Power BI, and then you can visualize data based on that. with every visualization a query will be sent to the data source and brings the response.

Live Connection to SQL Server On Premises

Let’s look at an example of such Live connection to SQL Server on premises database. I have a database with a large table contains 48 million records of data. the table itself is ~800 MB

2016-04-10_11h46_08

I know it is not more than 10GB, but the method would be the same even if I have 10TB sized data table. this table is big enough to show how live connection works for this example.

When you go through Get Data and select this table (or any other tables in the data set) and click Load, you’ll see the option to choose between Live connection or Import. DirectQuery means Live Connection.

2016-04-10_10h49_33

After creating the live Connection you can create relationship in the model, or build visualization for that.

2016-04-10_11h59_25

There are however some limitations with Live connection which I’ll explore later in this post.

** For the Live connection to on-premises data sources you need to have Power BI Enterprise or Personal Gateway installed and configured.

You can connect to wide range of on-premises data sources, such as other databases, and also SSAS. as an example you can do:

 Live Connection to Azure

There are also many Azure data sources that you can use for the live connection Azure SQL Database which is the similar database engine of SQL Server on-premises (with some differences of course) but on cloud. Azure SQL Data Warehouse is the cloud database structure which supports both unstructured and structured data. Azure SQL DW is capable of expanding the compute engine of the database regardless of the storage engine of that. And Sparks on Azure HDInsight is the other azure data source that can be used for live connection to big data structure on azure.

2016-04-10_12h29_39

Limitations of Live Connection

Live connection is good to connect with large data sets, and won’t load the data into the model, so Power BI solution would be very small. However there are some limitations for this type of connection, such as; No DAX support, no data model formatting in the Data tab of Power BI, No multiple data sources support, and No Power Q&A support. Let’s go through them one by one;

No DAX Support

With Live connection there won’t be any Data tab in Power BI to create calculated measures, columns or tables. You have to create all calculations in the data source level. It means that if you have SQL Server as the source, create all calculated columns or measures in SQL Server source tables. Your modelling should be done through the data source. In Power BI desktop you would be able to only set up relationship and that’s all in terms of modelling in Power BI.

2016-04-10_12h35_52
No Data Tab, No DAX in Live Connection from Power BI

 

Formatting is not also available through the Modeling tab of Power BI, So if you want to set decimal points, or set a data type of column it is not possible through Power BI. You need to handle all of these through the data source (or Power Query, read further down).

2016-04-10_12h43_23
No Data Modeling tab with Live Connection in Power BI

Full Visualization Support

Fortunately visualization part of Power BI is fully supported in Live connection mode. The underlying reason is that visualization is a separate engine in Power BI, and this is one of awesome reasons that a product built on top of separate components usually works better than a product all in one with no underlying component. You can build any visualization and you won’t be limited at all.

Multiple Data Sources is Not Supported

With Power BI one of the great advantages is that you can combine data sets from multiple sources, multiple data bases or files can be participated in building a model. Unfortunately when you work with Live connection you won’t be able to have data from more than one data source. Even if you want to get data from another database on the same server you will face messages like below;

2016-04-10_10h28_24

So adding another data source requires data to be imported, which is not useful for the scenario of working with large data sets.

No Power Q&A

As you know one of the features of Power BI website is natural language questions and answering engine called Power Q&A. This feature is not available (at the time of writing this post) in Power BI Live Connection. when you have a live connection your dashboard won’t have Power Q&A question box on the top.

2016-04-10_15h16_57
No Power Q&A with Power BI Live Connection

Don’t Forget the Power Query

Fortunately Power Query still is available with Live Connection. This gives you ability to join tables, flatten them if you require, apply data transformation and prepare the data as you want. Power Query can also set the data types in a way that be more familiar for the Power BI model to understand. If you want to learn more about Power Query read Power Query sections of Power BI online book.

2016-04-10_15h28_12

In screenshot above you can see that I’ve joined DimProduct, DimProductSubCategory, and DimProductCategory to bring all fields in single table; DimProduct. And this all happened with Live Connection.

Optimization at Data Source Level

Live connection to data source means report will send queries to data source. Data Sources are different in terms of response time. SSAS tabular might produce faster result, and normal SQL Server database slower. Don’t forget that all performance and indexing tips should be considered carefully when you work with Live connection. If you are working for example with SQL Server consider proper indexing, column-store indexes and many other optimization and performance tuning tips.

Just for a very small example of performance tuning; here is what performance I get when I have normal index on my table with 48 Million records;

2016-04-10_15h34_02
Regular Index

A regular select sum from my table with 48 million records takes 4 minutes and 4 seconds to run. and the same query responds in less than a second when I have clustered column store index;

and significantly improved performance when I have a Clustered Column Store index on the same table with same amount of data rows;

Performance Boost with Clustered Column Store Index
Performance Boost with Clustered Column Store Index

I’m not going to teach you all performance tuning in this post, and I can’t do it, because you have to read books, blog posts, watch videos to learn that. That is a whole different topic on its own. and the most important thing is that; Performance tuning is different for each data source. performance tuning for Oracle, SQL Server, and SSAS are totally different. Your friend for this part would be Google, and vast amount of free content available on the internet for you to study.

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

6 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">