File Management in Azure Data Lake Store (ADLS) Using RStudio


In this post, I am going to share my experiment in doing file management in ADLS from the RStudio environment.

So how does it work? We can manage ADLS from the RStudio environment using R scripts, so we can manage files without opening the portal and bring data from ADLS into RStudio for machine learning practice. Once we are sure that our code is good enough, we can use U-SQL inside ADLS to embed the R code in the ADLS environment. In this post, I first show how to access ADLS files from RStudio for file management and for practising machine learning there. In the next post on ADLS, I will show how, after testing your code, you can embed it inside ADLS using R scripts in U-SQL (the language we have in ADLS).

 

What do we need to start?

To do this, you need the following items:

1. An Azure subscription

2. An Azure Data Lake Store account

3. An Azure Active Directory application (for service-to-service authentication)

4. An authorization token from the Azure Active Directory application

You should have the following information at hand:

Client_id (application id), Tenant_Id, Client_secret, and the OAuth 2.0 token endpoint.

To start in RStudio, you need to install the packages below:
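The original package list was not preserved here, so the following is only a sketch under the assumption that `httr` (HTTP requests) and `jsonlite` (JSON parsing) are enough for the REST calls in this post. The helper names `find_missing` and `setup_packages` are hypothetical.

```r
# Packages assumed for this walkthrough (an assumption, not the original list).
required_packages <- c("httr", "jsonlite")

# Return the subset of `pkgs` that is not installed yet.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install anything that is missing, then attach all packages.
setup_packages <- function(pkgs = required_packages) {
  to_install <- find_missing(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Run once per session:
# setup_packages()
```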

Now I need to set up the connection and request information from ADLS using R code, as below:
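One way to set up the connection is an OAuth 2.0 client-credentials request against the token endpoint. This is a minimal sketch, assuming the `httr` package is installed and that the resource URI `https://management.core.windows.net/` applies to your account. The function names and the placeholder values in angle brackets are hypothetical.

```r
# Build the form fields for an OAuth 2.0 client-credentials request.
# The default resource URI is an assumption; adjust it if your setup differs.
token_request_body <- function(client_id, client_secret,
                               resource = "https://management.core.windows.net/") {
  list(
    grant_type    = "client_credentials",
    resource      = resource,
    client_id     = client_id,
    client_secret = client_secret
  )
}

# Request a bearer token from the Azure Active Directory token endpoint.
get_adls_token <- function(token_endpoint, client_id, client_secret) {
  response <- httr::POST(
    token_endpoint,
    body   = token_request_body(client_id, client_secret),
    encode = "form"
  )
  httr::stop_for_status(response)
  httr::content(response, as = "parsed", type = "application/json")$access_token
}

# Example with placeholder values (replace with your own):
# token <- get_adls_token(
#   "https://login.microsoftonline.com/<Tenant_Id>/oauth2/token",
#   "<Client_id>", "<Client_secret>"
# )
```

The returned token is then sent as an `Authorization: Bearer ...` header with every later request.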

After setting up the connection, I first explore the folders that I have in ADLS.

These are the folders that I have in ADLS:

[Screenshot: folders in my ADLS account]

 

Now, I am going to use R scripts inside RStudio to explore the folders and files in my ADLS, as below:
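Listing can be sketched with the WebHDFS-compatible REST endpoint of ADLS and the `LISTSTATUS` operation. This assumes `httr` and `jsonlite` are installed and that `token` holds a valid bearer token. The account name and the helper names `adls_url` and `list_adls` are hypothetical.

```r
# Build a WebHDFS-style URL for an ADLS account, path, and operation.
adls_url <- function(account, path = "", op) {
  path <- sub("^/+", "", path)  # drop leading slashes so the URL stays clean
  paste0("https://", account, ".azuredatalakestore.net/webhdfs/v1/",
         path, "?op=", op)
}

# List the entries under `path` (the root by default).
list_adls <- function(account, token, path = "") {
  response <- httr::GET(
    adls_url(account, path, "LISTSTATUS"),
    httr::add_headers(Authorization = paste("Bearer", token))
  )
  httr::stop_for_status(response)
  parsed <- jsonlite::fromJSON(
    httr::content(response, as = "text", encoding = "UTF-8")
  )
  # One entry per file or folder, with fields such as pathSuffix and type
  parsed$FileStatuses$FileStatus
}

# Example with a placeholder account name:
# folders <- list_adls("<account_name>", token)
# print(folders[, c("pathSuffix", "type")])
```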

After running the above code, I got the below message in the output window:

[Screenshot: List Folders]

It shows all the available folders in the root of my ADLS.

Create Folders

Suppose I want to create a folder inside my ADLS storage. To do that, I write the code below:

If I then check ADLS, I will find this folder:

[Screenshot: the new folder in ADLS]
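Folder creation can be sketched with the WebHDFS `MKDIRS` operation, sent as an HTTP PUT. This assumes `httr` is installed and `token` is a valid bearer token. The folder name `mytempdir` is borrowed from the Read Data section of this post, and the function names are hypothetical.

```r
# Build the MKDIRS URL for a folder in an ADLS account.
mkdir_url <- function(account, folder) {
  sprintf("https://%s.azuredatalakestore.net/webhdfs/v1/%s?op=MKDIRS",
          account, folder)
}

# Create the folder; returns TRUE when the service reports success.
create_adls_folder <- function(account, token, folder) {
  response <- httr::PUT(
    mkdir_url(account, folder),
    httr::add_headers(Authorization = paste("Bearer", token))
  )
  httr::stop_for_status(response)
  # WebHDFS answers MKDIRS with a JSON body of the form {"boolean": true}
  result <- httr::content(response, as = "parsed", type = "application/json")
  isTRUE(result$boolean)
}

# Example with a placeholder account name:
# create_adls_folder("<account_name>", token, "mytempdir")
```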

 

 Read Data

Now imagine that we want to read a file from ADLS and load it into RStudio, or even copy it to a local PC. The first step is to access the directory using the “Get” function, going through the mytempdir folder to reach the iris.csv file in it.

If I only need to load the file for working in RStudio, without downloading it, I can use the code below:
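Reading the file straight into memory can be sketched with the WebHDFS `OPEN` operation. This assumes `httr` is installed, `token` is a valid bearer token, and the file is a plain CSV with a header row. The function names and the account placeholder are hypothetical.

```r
# Build the OPEN URL for a file in an ADLS account.
open_url <- function(account, file_path) {
  paste0("https://", account, ".azuredatalakestore.net/webhdfs/v1/",
         file_path, "?op=OPEN&read=true")
}

# Fetch a CSV file from ADLS and parse it into a data frame,
# without writing anything to the local disk.
read_adls_csv <- function(account, token, file_path) {
  response <- httr::GET(
    open_url(account, file_path),
    httr::add_headers(Authorization = paste("Bearer", token))
  )
  httr::stop_for_status(response)
  csv_text <- httr::content(response, as = "text", encoding = "UTF-8")
  read.csv(text = csv_text, stringsAsFactors = FALSE)
}

# Example with a placeholder account name:
# Dataforiris <- read_adls_csv("<account_name>", token, "mytempdir/iris.csv")
# head(Dataforiris)
```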

All the data will then be loaded into the variable “Dataforiris”.

However, you may want a local copy of the file; in that case you can use the code below:
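Saving a local copy can be sketched by streaming the same `OPEN` response to disk with `httr::write_disk`. This assumes `httr` is installed and `token` is a valid bearer token. The local folder, the account placeholder, and the function names are hypothetical.

```r
# Work out where a remote file should land inside a local folder.
local_target <- function(local_dir, file_path) {
  file.path(local_dir, basename(file_path))
}

# Download a file from ADLS into `local_dir` and return the local path.
download_adls_file <- function(account, token, file_path, local_dir) {
  url <- paste0("https://", account, ".azuredatalakestore.net/webhdfs/v1/",
                file_path, "?op=OPEN&read=true")
  target <- local_target(local_dir, file_path)
  response <- httr::GET(
    url,
    httr::add_headers(Authorization = paste("Bearer", token)),
    httr::write_disk(target, overwrite = TRUE)  # stream the body to disk
  )
  httr::stop_for_status(response)
  invisible(target)
}

# Example with placeholder values:
# download_adls_file("<account_name>", token, "mytempdir/iris.csv", "C:/")
```

Writing to disk avoids holding the whole file in memory, which matters more as the dataset grows.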

This loads the data into my local C folder, where it can be accessed:

[Screenshot: iris.csv in the local C folder]

This practice used a small dataset, which took less than 1 second to load from ADLS. When I tried it with a dataset of 64 million records, loading from ADLS into RStudio took about 70 seconds. If you are interested in working with R only inside ADLS, you have to use R inside U-SQL, which I am going to talk about in the next post!

https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/03/14/using-r-to-perform-filesystem-operations-on-azure-data-lake-store/

Leila Etaati
Dr. Leila Etaati is a Principal Data Scientist, BI Consultant, and Speaker. She has over 10 years’ experience working with databases and software systems. She has been involved in many large-scale projects for large companies. Leila has a PhD from the Information Systems department, University of Auckland, and an MS and BS in computer science. Leila is a Microsoft Data Platform MVP.
