How to generate 1 billion rows using U-SQL


I was interested in generating some dummy data to do some load testing in MS Azure and came up with a pretty nifty way to generate lots and lots of data using U-SQL. The tip is simply to create a small U-SQL custom extractor and use it to extract from a dummy file.

First I created a dummy file… literally. In the input folder on my local machine I created a blank, 0-byte file, purely to stop the custom extractor complaining about a missing input file, even though I'm never actually going to read from it.

The custom extractor is written in C# and is built around a simple for loop.

This loop generates a single-column row on each iteration using the output.Set function. The full code lives in the code-behind file.
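As a rough sketch (not the original listing), a code-behind file along these lines would do the job. The namespace and class name follow the CustomExtractor.GenerateSeries call used in this post. The parameter names start, end and step, the inclusive upper bound, the guard on step and the AtomicFileProcessing setting are my assumptions.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Analytics.Interfaces;

namespace CustomExtractor
{
    // Assumption: process the dummy file as a single unit so the series
    // is generated once rather than once per file split.
    [SqlUserDefinedExtractor(AtomicFileProcessing = true)]
    public class GenerateSeries : IExtractor
    {
        private readonly long _start;
        private readonly long _end;
        private readonly long _step;

        // Hypothetical parameter names: they map onto the three parts of the for loop.
        public GenerateSeries(long start, long end, long step)
        {
            if (step <= 0)
            {
                throw new ArgumentOutOfRangeException(nameof(step), "step must be positive");
            }
            _start = start;
            _end = end;
            _step = step;
        }

        // The input reader is deliberately ignored: the dummy file is never read.
        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            for (long i = _start; i <= _end; i += _step)
            {
                // Write the counter into the first (and only) column, then emit the row.
                output.Set<long>(0, i);
                yield return output.AsReadOnly();
            }
        }
    }
}
```

Because the row is filled by column position, the EXTRACT statement that uses this class needs to declare exactly one column of type long.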

Once you have this in place you can call it from your U-SQL script
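A minimal sketch of such a script might look like the following. The file paths and the column name id are placeholders of mine, and the three arguments assume a (start, end, step) ordering rather than reproducing the original script.

```usql
// Sketch only: paths and the column name are placeholders.
@rows =
    EXTRACT id long
    FROM "/input/dummy.txt"
    USING new CustomExtractor.GenerateSeries(1, 1000000000, 1);

// Pass the generated column through; extra columns can be added here.
@t =
    SELECT id
    FROM @rows;

OUTPUT @t
TO "/output/series.csv"
USING Outputters.Csv();
```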


This script calls the CustomExtractor.GenerateSeries function and passes three arguments, which in turn become the three arguments used in the C# for loop. So these can be customised pretty easily.

The @t select statement allows you to inject additional columns.  This could be where you generate columns for random dates, products, quantities etc on a pretty major scale if you wanted.
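To give a feel for what that could look like, here is a small sketch that derives a date, a product name and a quantity from the generated number. The column names, the base date and the modulo values are made up for illustration, and a three-row VALUES list stands in for the generated series so the snippet can be tried on its own. I have derived everything from id rather than using a random number generator so that reruns produce the same data.

```usql
// Stand-in for the generated series: three rows with a single long column.
@rows =
    SELECT *
    FROM (VALUES ((long) 1), ((long) 2), ((long) 3)) AS T(id);

// Hypothetical extra columns, all computed from id with C# expressions.
@t =
    SELECT id,
           new DateTime(2020, 1, 1).AddDays(id % 365) AS order_date,
           "Product " + (id % 100).ToString() AS product,
           (int) (id % 10) + 1 AS quantity
    FROM @rows;

OUTPUT @t
TO "/output/series_with_columns.csv"
USING Outputters.Csv();
```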

I first ran this locally on my machine and filled up my hard drive pretty quickly, so I switched to my Azure Data Lake Store, where space is no issue. With 2 vertices the query took 20 seconds to prep, sat 7 seconds in the queue and then ran for 16 minutes.

The final result was an easy-to-customise file with 1 billion rows that was about 20GB in my Azure Data Lake Store account.

Philip Seamark
Phil is a Microsoft Data Platform MVP and an experienced database and business intelligence (BI) professional with a deep knowledge of the Microsoft BI stack, along with extensive knowledge of data warehouse (DW) methodologies and enterprise data modelling. He has 25+ years of experience in this field and is an active member of the Power BI community.
