Every programming language has its own data structures. R also provides several predefined data structures, each useful for specific purposes. For machine learning in R, we normally use data structures such as vectors, lists, data frames, factors, arrays and matrices. In this post I will briefly explain some of them.
Vector – c()
A vector stores an ordered set of values, all of the same data type. A vector can have a type such as integer (numbers without decimals), double (numbers with decimals), character (text data), or logical (TRUE or FALSE values).
We use the function c() to define a vector, for example one that stores people's names.
Subject_name is a vector that contains character values (people's names).
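As a minimal sketch of the definition described above, the names below are made-up example values; only the vector name Subject_name comes from the text.

```r
# Hypothetical example values; any character strings would work here.
Subject_name <- c("Anna", "Omar", "Lena")

# Printing the vector shows its three elements in order.
print(Subject_name)

# The vector holds three values.
length(Subject_name)
# [1] 3
```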
We can use typeof() to determine the type of a vector.
For Subject_name, the output is "character".
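The type check can be sketched as follows, again assuming made-up names in the vector.

```r
# Hypothetical example values.
Subject_name <- c("Anna", "Omar", "Lena")

# typeof() reports how R stores the values internally.
typeof(Subject_name)
# [1] "character"
```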
Next, we define another vector that stores people's ages.
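A possible definition is sketched below with made-up ages. One assumption to note: the L suffix is used so that R stores the values as integers, since plain numbers such as 23 are stored as double.

```r
# Hypothetical ages; the L suffix makes R store them as integers.
Age <- c(23L, 41L, 32L)
typeof(Age)
# [1] "integer"

# Without the suffix, the same numbers are stored as double.
typeof(c(23, 41, 32))
# [1] "double"
```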
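The Age vector stores integer values. Note that R stores a whole number as an integer only when it is written with the L suffix, such as 23L; a plain 23 is stored as a double. We create another vector to store logical (Boolean) information about whether each person is married or single.
We can again use the typeof() function to see the vector type.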
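A small sketch of such a logical vector follows. The name Married and its values are assumptions made for illustration.

```r
# Hypothetical values: TRUE means married, FALSE means single.
Married <- c(FALSE, TRUE, TRUE)

typeof(Married)
# [1] "logical"
```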
We can select specific elements of a vector by position. For example, to extract the second name in the Subject_name vector, we put the position 2 in square brackets after the vector name.
The output is the second name stored in the vector.
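With the made-up names used for illustration, selecting the second element looks like this. Note that positions in R start at 1.

```r
# Hypothetical example values.
Subject_name <- c("Anna", "Omar", "Lena")

# Square brackets select elements by position (1-based).
Subject_name[2]
# [1] "Omar"
```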
We can also fetch a range of values from a vector. For example, to fetch the ages of the second and third people stored in the Age vector, we give the range 2:3 inside the square brackets.
The output contains the second and third ages.
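A sketch of the range selection, assuming the same made-up ages as before:

```r
# Hypothetical ages.
Age <- c(23L, 41L, 32L)

# 2:3 expands to the positions 2 and 3.
Age[2:3]
# [1] 41 32
```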
Factor – factor()
A factor is a special type of vector that stores categorical or ordinal variables. For instance, instead of storing the strings Female and Male for every person, R stores the integer codes 1 and 2 together with the list of labels, which can take less space. To define a factor for gender, we first need a vector of gender values, such as
c("Female", "Male")
Then we pass that vector to the factor() function.
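The two steps can be sketched as below. The intermediate name gender_vector and the three values are assumptions for illustration; gender is the factor name used in the text.

```r
# Hypothetical gender values for three people.
gender_vector <- c("Female", "Male", "Female")

# Convert the character vector into a factor.
gender <- factor(gender_vector)

# Printing shows the stored values followed by a "Levels:" line.
gender

# The distinct categories, in alphabetical order by default.
levels(gender)
# [1] "Female" "Male"

# The integer codes R stores internally for each value.
as.integer(gender)
# [1] 1 2 1
```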
When we call gender, R shows the gender values that we stored in the vector plus an extra line called "Levels". The levels show the possible values that the gender factor can take.
The levels of a factor can also include values that do not appear in the data yet. For instance, currently we only have BA and Master students. However, in the future we may also have PhD or Diploma students, so we create a factor that supports those future types as well.
To do so, we specify the levels argument like this: levels = c("BA", "Master", "PhD", "Diploma")
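A minimal sketch of such a factor follows. The three observed values are made up; the name Student_Level and the four levels come from the text.

```r
# Hypothetical data: only BA and Master students exist so far.
Student_Level <- factor(c("BA", "Master", "BA"),
                        levels = c("BA", "Master", "PhD", "Diploma"))

# All four levels are registered, including the two unused ones.
nlevels(Student_Level)
# [1] 4

# table() counts every level, so PhD and Diploma appear with a count of 0.
table(Student_Level)
```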
Lists – list()
A list is similar to a vector. The difference is that a list can hold a combination of data types, whereas a vector can hold only one data type.
For instance, we can use a list to store a student's information.
Calling the list prints each of its elements in turn.
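One way such a list might look is sketched below. The list name student and its element names and values are all assumptions for illustration.

```r
# Hypothetical record for one student, mixing three data types.
student <- list(name = "Anna", age = 23L, married = FALSE)

# Printing the list shows every element under its name.
student

# Elements can be accessed by name with $ or with [[ ]].
student$name
# [1] "Anna"
student[["age"]]
# [1] 23

# Each element keeps its own type.
typeof(student$married)
# [1] "logical"
```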
In short, a list lets us keep values of different data types together in one object.
Data frames – data.frame()
Data frames are the most important data structure in the machine learning process. A data frame is similar to a table, as it has both rows and columns.
To define a data frame, we use the data.frame() function.
studentData is a data frame that contains vectors such as Subject_name, Age, Gender and Student_Level.
In R versions before 4.0.0, data.frame() automatically converts every character vector to a factor. To avoid that, we normally pass stringsAsFactors = FALSE, which specifies that character data should not be treated as factors. Since R 4.0.0, FALSE is already the default.
Calling studentData prints the data as a table, with one row per student and one column per vector.
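The definition can be sketched as follows. All values are made up for illustration; the data frame and column names follow the text.

```r
# Hypothetical example vectors, one value per student.
Subject_name  <- c("Anna", "Omar", "Lena")
Age           <- c(23L, 41L, 32L)
Gender        <- factor(c("Female", "Male", "Female"))
Student_Level <- factor(c("BA", "Master", "BA"),
                        levels = c("BA", "Master", "PhD", "Diploma"))

# stringsAsFactors = FALSE keeps Subject_name as plain character data.
studentData <- data.frame(Subject_name, Age, Gender, Student_Level,
                          stringsAsFactors = FALSE)

# Print the table: one row per student, one column per vector.
studentData

# Number of rows and columns.
dim(studentData)
# [1] 3 4
```

Vectors that were already factors, such as Gender, stay factors; the argument only affects character vectors.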
Because a data frame is like a table, we can access its cells, rows and columns separately.
For instance, to fetch a specific column such as age, we can use the $ operator.
Only the Age column is returned, as a vector.
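A sketch of the column access, assuming a small made-up version of studentData:

```r
# Hypothetical data frame with made-up values.
studentData <- data.frame(Subject_name = c("Anna", "Omar", "Lena"),
                          Age = c(23L, 41L, 32L),
                          stringsAsFactors = FALSE)

# $ extracts one column as a plain vector.
studentData$Age
# [1] 23 41 32
```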
Moreover, if we only want to see the age and gender of the students, we can select just those two columns by name.
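One common way to do this is to pass a character vector of column names inside square brackets. The data below is made up, and the variable name age_gender is an assumption.

```r
# Hypothetical data frame with made-up values.
studentData <- data.frame(Subject_name = c("Anna", "Omar", "Lena"),
                          Age = c(23L, 41L, 32L),
                          Gender = c("Female", "Male", "Female"),
                          stringsAsFactors = FALSE)

# Leave the row position empty to keep all rows; name the wanted columns.
age_gender <- studentData[, c("Age", "Gender")]
age_gender

# The result is still a data frame, now with two columns.
ncol(age_gender)
# [1] 2
```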
We can also extract all the rows of the first column.
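Using the [row, column] form with an empty row position, this can be sketched as below on made-up data.

```r
# Hypothetical data frame with made-up values.
studentData <- data.frame(Subject_name = c("Anna", "Omar", "Lena"),
                          Age = c(23L, 41L, 32L),
                          stringsAsFactors = FALSE)

# Empty row position means "all rows"; 1 selects the first column.
studentData[, 1]
# [1] "Anna" "Omar" "Lena"
```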
Or we can extract all the columns for specific students by selecting their rows.
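Row selection mirrors column selection, with the column position left empty. The data and the names first_student and two_students are assumptions for illustration.

```r
# Hypothetical data frame with made-up values.
studentData <- data.frame(Subject_name = c("Anna", "Omar", "Lena"),
                          Age = c(23L, 41L, 32L),
                          stringsAsFactors = FALSE)

# All columns for the first student.
first_student <- studentData[1, ]
first_student

# All columns for the first and third students.
two_students <- studentData[c(1, 3), ]
nrow(two_students)
# [1] 2
```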
In the next post I will show how to get data from different sources and how to visualize the data inside R.
Reference: B. Lantz. Machine Learning with R, Packt Publishing, 2015