R Data Gathering from Spreadsheet and SQL Server

notebook-1071775_640

In previous post you’ve learned about data structures in R. In this post I’ll explain Data Gathering, Data Understanding, and Data Cleaning, which are the main tasks of data management that need to be done before any machine learning process. In this post I will explain how to fetch data from Spreadsheet and SQL Server. Gathering data from other resources help us to do some analysis inside R Studio.

2017-01-11_14h22_01

In RStudio we can fetch data from different resources such as Excel, SQL Server, HTML, SAS and so forth.

2017-01-11_14h24_57

excImport from Spreadsheet

There are two main approach to import Excel and CSV file into R:

  1. from menu and writing R codes.
  2. To import data from RStudio Menuย .

2017-01-11_11h25_48

For importing the excel file, from โ€œFileโ€ menu, โ€œImport datasetโ€ should be selected, then โ€œFrom Excelโ€.

Next, below windows will be shown up. We brows our desire Excel fileย from “File/Url “

2017-01-11_11h37_44

After we browse computer to import the required excel file, RStudio automatically detects the Excelย headers data preview.

2017-01-11_11h42_10

We also able to change importing options such asย first row as header, sheet number, skip how many rows and so forth.

2017-01-11_11h49_15

Moreover, the code behind the reading excel file, has been shown in Code Preview section.

2017-01-11_11h49_05

Finally, by clicking on import option, all the dataset will be shown in Rstudio as below

2017-01-11_12h40_41

This process is equal to write the below codes in RStudio command editor:

2017-01-11_11h49_05

R supports more than 6000 packages that each of them help the R users to have more functionality. For instance, to read the Excel file we need to call or install package โ€œreadxlโ€. If the package is already installed, we just call it via โ€œlibrary(readxl)โ€, if not we have to use command โ€œinstall.packages(readxl)โ€.

Next, we call the function โ€œread_excel()โ€. We should provide theย path of the excel file as argument for his function. Finally, by calling the View() function, dataset will be shown to user.

We can follow the same process for CSV files .

2017-01-11_12h41_26

 

The only difference between importing Excel and CSV file is on setting up the import option as there are some more options to set. Such as โ€œDelimiterโ€, โ€œTrim Spacesโ€ and so forth.

Moreover, the code behind the importing CSV file into RStudio is like:

2017-01-11_11h59_32

We need to call โ€œlibraray(readr)โ€ andย  call function โ€œread_csvโ€.

In function โ€œread_csvโ€ we can specify that do not consider the character type as Factor by adding stringsAsFactors = FALSE to the function arguments.

usedcars <- read_excel(“C:/โ€ฆโ€ฆโ€ฆ.xls”,stringsAsFactors = FALSE)

sqlserverSQL Server 2016

we can fetch data fromย SQL Server and we want to do some analysis on it in RStudio. To get data from SQL Server 2016, first we should install a package named RODBC.

2017-01-11_13h39_40

There is a function name odbcDriverConnect () that help us to connect to SQL Server database.

2017-01-11_13h42_23

We should specify the Driver as โ€œSQL Serverโ€, server as โ€œlocalhostโ€, DatabaseName asโ€DWโ€. The output of the odbcDriverConnect will be stored in a variable name dbhandle.

Then we can use sqlquery function to fetch data from SQL Server.in this function we pass the dbhandle as it contains the information about the database connection and the second argument should be SQL query to fetch data from tables.

For instance, we have a table name โ€œusedcarsโ€ in โ€œDWโ€ database in SQL Sever 2016, we want to fetch all columns into a variable name โ€œresโ€. The below query help us to do that.

2017-01-11_13h49_33

We use View () commend to see the table inside the res variable

In Next post I will explain how to explore data to check the behaviour of data.

Save

Leila Etaati on LinkedinLeila Etaati on TwitterLeila Etaati on Youtube
Leila Etaati
Trainer, Consultant, Mentor
Leila is the first Microsoft AI MVP in New Zealand and Australia, She has Ph.D. in Information System from the University Of Auckland. She is the Co-director and data scientist in RADACAD Company with more than 100 clients in around the world. She is the co-organizer of Microsoft Business Intelligence and Power BI Use group (meetup) in Auckland with more than 1200 members, She is the co-organizer of three main conferences in Auckland: SQL Saturday Auckland (2015 till now) with more than 400 registrations, Difinity (2017 till now) with more than 200 registrations and Global AI Bootcamp 2018. She is a Data Scientist, BI Consultant, Trainer, and Speaker. She is a well-known International Speakers to many conferences such as Microsoft ignite, SQL pass, Data Platform Summit, SQL Saturday, Power BI world Tour and so forth in Europe, USA, Asia, Australia, and New Zealand. She has over ten yearsโ€™ experience working with databases and software systems. She was involved in many large-scale projects for big-sized companies. She also AI and Data Platform Microsoft MVP. Leila is an active Technical Microsoft AI blogger for RADACAD.

Leave a Reply

Your email address will not be published. Required fields are marked *