R Data Structures for Machine Learning

Posted on Jan 9, 2017 in Azure Machine Learning, Data Mining, R

Every programming language has its own data structures, and R also has a set of predefined data structures, each useful for specific purposes. For machine learning in R we normally use vectors, factors, lists, data frames, arrays and matrices. In this post I will briefly explain some of them.

Vector – c()

A vector stores an ordered set of values, and all values in a vector have the same data type. The most common vector types are integer (numbers without decimals), double (numbers with decimals), character (text data) and logical (TRUE or FALSE values).
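To see these four types in practice, we can ask typeof() about one literal value of each kind. A small sketch with arbitrary values; note that a whole number only becomes an integer when it is written with the L suffix:

```r
# One literal value of each basic vector type (values are arbitrary)
int_value <- 42L     # the L suffix stores the number as an integer
dbl_value <- 42      # without L, R stores the number as a double
chr_value <- "text"  # text is stored as character
lgl_value <- TRUE    # TRUE/FALSE are logical

typeof(int_value)  # [1] "integer"
typeof(dbl_value)  # [1] "double"
typeof(chr_value)  # [1] "character"
typeof(lgl_value)  # [1] "logical"
```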

We use the function c() (short for "combine") to define a vector that stores people's names.
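A minimal sketch of creating such a vector; the three names, and the extra vector more_names, are made up for illustration:

```r
# c() combines values into a vector; these names are made up
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")

print(subject_name)   # shows the three names
length(subject_name)  # [1] 3

# c() can also combine an existing vector with new values
more_names <- c(subject_name, "Ann Smith")
length(more_names)    # [1] 4
```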

subject_name is a vector that contains character values (people's names).

We can use the typeof() function to determine the type of a vector.
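Below, typeof() is applied to the same made-up names vector. The second part (the vector mixed is a hypothetical example) illustrates why a vector has only one type: if we mix types inside c(), R silently converts every element to the most flexible type, here character:

```r
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
typeof(subject_name)  # [1] "character"

# Mixing types: the number and the logical value are converted to text
mixed <- c("John Doe", 42, TRUE)
typeof(mixed)         # [1] "character"
print(mixed)          # "John Doe", "42" and "TRUE" are now all strings
```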

For subject_name, the output is "character".

Next, we create another vector that stores each person's age.

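The age vector stores integer values; note that R only stores a number as an integer when it is written with the L suffix (for example 32L), while a plain 32 is stored as a double. We then create another vector that stores logical (TRUE or FALSE) information about whether each person is married or single.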
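A sketch of the two new vectors, using made-up values; the name married is our own choice for the married/single vector. It also shows the L-suffix point, since the same numbers typed without L give a double vector:

```r
# Ages as an integer vector (made-up values)
age <- c(32L, 45L, 28L)
typeof(age)            # [1] "integer"

# The same numbers without the L suffix form a double vector
typeof(c(32, 45, 28))  # [1] "double"

# Logical vector: TRUE if the person is married, FALSE if single
married <- c(TRUE, FALSE, TRUE)
typeof(married)        # [1] "logical"
```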

Calling typeof() on these two vectors returns "integer" for the age vector and "logical" for the married/single vector.

We can select specific elements of a vector by their position, counting from 1. For example, to extract the second name in the subject_name vector we write subject_name[2], which returns only that name.
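Indexing in practice, on the same made-up names. A negative index, shown here as an extra, does the opposite and drops that position:

```r
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")

subject_name[2]   # [1] "Jane Doe"   (R counts positions from 1)
subject_name[-2]  # every element except the second one
```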

We can also fetch a range of values from a vector. For example, to get the ages of the second and third people stored in the age vector we write age[2:3], which returns a vector containing those two ages.
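A sketch of range indexing on made-up ages. The colon operator 2:3 simply builds the index vector c(2, 3), so any vector of positions works the same way:

```r
age <- c(32L, 45L, 28L)

age[2:3]      # [1] 45 28
age[c(2, 3)]  # same result: 2:3 is just the vector c(2, 3)
age[c(1, 3)]  # [1] 32 28  (positions need not be adjacent)
```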

Factor – factor()

A factor is a special type of vector that stores categorical or ordinal variables. For instance, instead of storing the words "Female" and "Male" for every person, R stores the integer codes 1 and 2 together with the labels, which takes less space. To define a factor for storing gender, we first need a vector of genders, such as:

c("Female", "Male")

Then we pass this vector to the factor() function, for example gender <- factor(c("Female", "Male")).

When we print gender, R shows the values we stored in the vector plus a line called "Levels", which lists the possible values of the gender factor.
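A minimal sketch with made-up values. levels() lists the possible values, and as.integer() reveals the integer codes R stores behind the labels (by default the levels are sorted alphabetically, so Female is 1 and Male is 2):

```r
# A factor for gender (made-up values)
gender <- factor(c("Female", "Male", "Male"))

print(gender)       # the values, followed by a "Levels:" line
levels(gender)      # the two levels: "Female" and "Male"
as.integer(gender)  # [1] 1 2 2   (the codes stored behind the labels)
```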

The levels can also include values that do not appear in the data yet. For instance, currently we only have BA and Master students, but in future we may also have PhD or Diploma students.

To create a factor that supports these future values as well, we specify the levels explicitly with the levels argument: levels = c("BA", "Master", "PhD", "Diploma")
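A sketch of a factor with future levels, using made-up student levels (student_level is our own name for it). Because PhD is already a level, a PhD student can be recorded later, and table() reports a count of zero for levels that do not occur yet:

```r
# Levels are declared up front, including ones not used yet
student_level <- factor(c("BA", "Master", "Master"),
                        levels = c("BA", "Master", "PhD", "Diploma"))

levels(student_level)  # BA, Master, PhD and Diploma, in this order

# PhD is already a valid level, so a PhD student can be recorded later;
# assigning a value that is not a level would give NA with a warning
student_level[3] <- "PhD"

table(student_level)   # counts per level; Diploma is still 0
```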

List – list()

A list is similar to a vector, but while all elements of a vector must have the same data type, a list can combine elements of different data types.

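For instance, we can use a list to store a student's information, such as a name (character), an age (integer) and whether the student is married (logical). When the list elements are named, printing the list shows each element under its name, prefixed with $.

This is why lists are useful: they let us keep values of different data types together in a single object.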
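A sketch of such a list for one made-up student; the list name student and its element names are our own assumptions. $ and [[ ]] return a single element, while sapply() applies class() to every element to show that the types really differ:

```r
# One (made-up) student's information, mixing several data types
student <- list(name    = "John Doe",
                age     = 32L,
                gender  = factor("Male", levels = c("Female", "Male")),
                married = TRUE)

student$name            # [1] "John Doe"
student[["age"]]        # [1] 32
sapply(student, class)  # character, integer, factor, logical

# Single brackets return a smaller list rather than the element itself
is.list(student["age"])  # [1] TRUE
```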

Data Frame – data.frame()

Data frames are probably the most important data structure in the machine learning process. A data frame is similar to a table, as it has both rows and columns: each column is a vector, and all columns have the same length.

To define a data frame we use the data.frame() function, passing it the vectors that will become its columns.

studentData is a data frame whose columns are the vectors subject_name, age, gender and student_level.

In R versions before 4.0.0, data.frame() automatically converts every character vector into a factor. To avoid that, we normally pass the parameter stringsAsFactors = FALSE, which tells R to keep character columns as character rather than factor. Since R 4.0.0, FALSE is the default.
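A sketch of how studentData could be built from the vectors described above, with made-up values. str() prints one line per column with its type, which makes it easy to confirm that subject_name stayed character:

```r
subject_name  <- c("John Doe", "Jane Doe", "Steve Graves")
age           <- c(32L, 45L, 28L)
gender        <- factor(c("Male", "Female", "Male"))
student_level <- factor(c("BA", "Master", "Master"),
                        levels = c("BA", "Master", "PhD", "Diploma"))

# Each vector becomes one column; stringsAsFactors = FALSE keeps
# subject_name as character (already the default from R 4.0.0 on)
studentData <- data.frame(subject_name, age, gender, student_level,
                          stringsAsFactors = FALSE)

str(studentData)
```

The existing factors (gender and student_level) stay factors either way; the parameter only affects character vectors.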

Printing studentData shows the data as a table, with one row per student and one column per vector.

Because a data frame is like a table, we can access its cells, rows and columns separately.

For instance, to fetch a specific column such as age we can write studentData$age. Only the age column is returned, as a vector.
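A sketch of column access on a small made-up version of studentData. Both $ and [[ ]] return the column as a plain vector:

```r
# A small made-up version of studentData
studentData <- data.frame(
  subject_name = c("John Doe", "Jane Doe", "Steve Graves"),
  age          = c(32L, 45L, 28L),
  gender       = factor(c("Male", "Female", "Male")),
  stringsAsFactors = FALSE
)

studentData$age       # [1] 32 45 28
studentData[["age"]]  # the same vector, with the column name given as a string
```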

If we only want to see the age and gender of the students, we can select both columns by name with studentData[c("age", "gender")].
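The same made-up data frame, selecting two columns by name (age_gender is a hypothetical variable name). Because more than one column is selected, the result is still a data frame:

```r
studentData <- data.frame(
  subject_name = c("John Doe", "Jane Doe", "Steve Graves"),
  age          = c(32L, 45L, 28L),
  gender       = factor(c("Male", "Female", "Male")),
  stringsAsFactors = FALSE
)

age_gender <- studentData[c("age", "gender")]
# studentData[, c("age", "gender")] gives the same result
print(age_gender)   # only the age and gender columns
ncol(age_gender)    # [1] 2
```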

We can extract all the rows of the first column with studentData[, 1].

Alternatively, we can extract all the columns for a specific student by giving the row number, for example studentData[2, ] for the second student.
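Row and position-based access on the same made-up data frame. Inside [row, column], leaving one side empty means "all"; a logical condition can also select rows, shown here as an extra:

```r
studentData <- data.frame(
  subject_name = c("John Doe", "Jane Doe", "Steve Graves"),
  age          = c(32L, 45L, 28L),
  gender       = factor(c("Male", "Female", "Male")),
  stringsAsFactors = FALSE
)

studentData[, 1]        # all rows of the first column (subject_name), as a vector
studentData[2, ]        # all columns of the second row, as a one-row data frame
studentData[c(1, 3), ]  # all columns of rows 1 and 3
studentData[studentData$age > 30, ]  # rows where age is above 30
```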

In the next post I will show how to get data from different sources and how to visualize the data in R.

Reference: B. Lantz, Machine Learning with R, Packt Publishing, 2015.

Leila Etaati
Dr. Leila Etaati is Principal Data Scientist, BI Consultant, and Speaker. She has over 10 years’ experience working with databases and software systems. She has been involved in many large-scale projects for large companies. Leila holds a PhD from the Information Systems department at the University of Auckland, and an MS and a BS in computer science. Leila is a Microsoft Data Platform MVP.
