Power Query can connect to a web page and get data from there, and I have explained it before in my article about movies data analysis. However, the default way of getting data from the web page doesn’t fetch images into your report. Images in a webpage come through their own URL. If you can use Power Query in a way that it reads that URL, then you can bring it into Power BI, and visualize it. In this article, I show you a method that you can use to fetch images from the webpage and bring it to visualization in Power BI, all using Power Query magic. To learn more about Power BI, read Power BI book from Rookie to Rock Star.
Sample Source
For this example, I am using the IMDB Top 250 movies list as a source. You can access it here: https://www.imdb.com/chart/top
You can see how the page looks like below. What we want to do in this example, is to get those images into our Power BI report.
Get Data from Web
You can use Get Data from Web in Power BI, and you’ll have the option to even customize it a bit, and use the Use Table from Examples to get part of the data from the table above.
I have explained in another article how you can get something like below from the table above:
The problem, however, is that still, you won’t get the images. So let’s see how we can get that.
Edit Queries
First thing you need to do is to go to Power Query Editor and remove all extra steps. Only leave the first step of reading data from that page. This might be using functions such as Web.Contents, Web.Page or Web.BrowserContent, or etc. This is a step that has the URL in it.
Then Convert it to Table:
Now you are ready to do the next step, which is searching for the HTML code of the images.
Finding the HTML code of the images in the page
There are different tools and methods to find the HTML code for the image. The one that is simple and I use often is using the Inspect feature of the Chrome web browser. If you browse to the same web page using Google Chrome, then you can right-click on one of the images, and select Inspect.
That will bring up the HTML code part of the page for that element you have right-click on. In this case, would be the source of the image!
The good part about the Inspect element of Chrome that I like is the ability to move in the code, and see which part of the page it is. If I hover a bit higher on top of the tbody class lister-list, I can see the entire table;
Breaking the Data using the HTML codes
Now that you know what code you have to search for, you can apply some transformations in Power Query to get into that part.
The first step for me is to isolate that table of all movies (I don’t start from images, because there might be other images in the page outside of that table). That table starts with tbody…. “lister-list. So the “lister-list” can be the delimiter for me to start. and then the delimiter to end is </tbody>.
So I can use Extract Text Between Delimiter
With configurations below:
Now I’ve got the table section. I can split it into rows, but using the <tr> part:
The key for Split Column is to Split into ROWS;
This will give you one row per movie, except the very first row which should be removed.
You can remove that using the Remove Top Rows then top 1 row.
Now let’s get to the image part of our table. as you can see in the code below from Inspect element of the Chrome browser, the image is within “<img src=” and ” width=”. (this combination would be of course different for other webpages)
So All we need to do is another Extract Text Between Delimiters:
And here you go, we got the URL of all images in that table:
You can use the same approach to get other parts of those movies, such as Title, the link to the movie page, etc. But let’s keep this article only focused on one subject. I got the rest of the data using simple Get data from the web and the method explained here and then merged it back to the Images, so here is my table at the end:
After Close and Apply. In the data model, I have to set the Data Category of this field to Image URL:
Now I’ve got images everywhere in my Power BI report I want:
Summary
You have seen in this article how powerful can be the get data experience of Power BI because it uses all the transformations in Power Query. For this example, I have used mostly Split Column and Extract Text Between Delimiter transformations. The key is always finding the part of HTML code you want to fetch, and then use appropriate transformations to fetch it correctly.
Hi Reza. You can actually do this a bit more easily using Html.Table, via the same approach used to extract links in this example: https://blog.crossjoin.co.uk/2018/08/30/power-bi-extract-urls-web-page/
Ehren
Hi Ehren.
Yes, Correct. I have seen that article of Chris’s before, and it is awesome.
However, the fact is that the images are not always in Table. sometimes in DIV, or many other tags. The main learning of this article is that you should dig in, and find the way to reach to the HTML code of the image.
Cheers
Reza
Hi Reza. I just wanted to let you know that as of the October release, you can now extract image src values as well as hrefs using the “by example” UI. So the kind of scraping you demonstrate here just got much, much easier. Please try it out and let us know what you think!
Thanks Ehren,
will try that for sure. Good to see these improvements coming.
Cheers
Reza
Excellent explanation, Keep up the great work