# List containing a number, a vector, and a string
my_list <- list(2, c(T,F,T,T), "hello")
# Print the list
print(my_list)
[[1]]
[1] 2
[[2]]
[1] TRUE FALSE TRUE TRUE
[[3]]
[1] "hello"
Appendix B
Vectors can contain only one data type (number, character, boolean)
Lists are data structures that can contain any R object
Lists can be created similarly to vectors, with the command list()
Elements of a list can be retrieved by indexing
my_list[[k]] returns the k-th element of my_list
You can return multiple items of a list via slicing:
my_list[c(k1, ..., kn)] returns the elements in positions k1, ..., kn
my_list[k1:k2] returns the elements in positions k1 to k2
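As a small illustration of the indexing rules above, the sketch below reuses the list from the first example. The variable names second, first_and_third and first_two are chosen here for illustration only.

```r
# List containing a number, a logical vector, and a string
my_list <- list(2, c(TRUE, FALSE, TRUE, TRUE), "hello")

# Double brackets return the element itself (here a logical vector)
second <- my_list[[2]]

# Single brackets with a vector of positions return a shorter list
first_and_third <- my_list[c(1, 3)]

# A range of positions also returns a list
first_two <- my_list[1:2]

print(second)
print(first_and_third)
```

Note that slicing with single brackets always gives back a list, even when only one position is requested, while double brackets give back the stored object.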
Components of a list can be named via names(my_list) <- c("name_1", ..., "name_k")
# Create list with 3 elements
my_list <- list(2, c(T,F,T,T), "hello")
# Name each of the 3 elements
names(my_list) <- c("number", "TF_vector", "string")
# Print the named list: the list is printed along with element names
print(my_list)
$number
[1] 2
$TF_vector
[1] TRUE FALSE TRUE TRUE
$string
[1] "hello"
The component of my_list named my_name can be accessed with the dollar operator: my_list$my_name
# Create list with 3 elements and name them
my_list <- list(2, c(T,F,T,T), "hello")
names(my_list) <- c("number", "TF_vector", "string")
# Access 2nd element using dollar operator and store it in variable
second_component <- my_list$TF_vector
# Print 2nd element
print(second_component)
[1] TRUE FALSE TRUE TRUE
Data frames are the best way of presenting a data set in R
Data frames can contain any R object
Data frames are similar to lists, with one difference: the elements of a data frame must be vectors of the same length
Data frames are constructed similarly to lists, using data.frame()
Example: We construct the Family Guy data frame. Variables are
person – Name of character
age – Age of character
sex – Sex of character
Think of a data frame as a matrix
You can extract the element in position (m,n) by using my_data[m, n]
Example: Peter is in the 1st row and person is the 1st column, so Peter's name is extracted with family[1, 1]
[1] "Peter"
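The code that builds the data frame is not shown in the text. A minimal sketch, assuming the values that appear in the printed tables of this section and the name family used in the later examples:

```r
# Build the Family Guy data frame: one vector per variable, all of length 5
family <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  sex    = c("M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE  # keep names as character strings in older R versions
)

# Peter is in row 1 and person is column 1
peter_name <- family[1, 1]
print(peter_name)
```

Setting stringsAsFactors = FALSE is a choice made here so that the sketch behaves the same way across R versions; in R 4.0 and later it is already the default.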
To extract multiple elements on the same row or column type
my_data[c(k1,...,kn), m] or my_data[k1:k2, m]
my_data[n, c(k1,...,km)] or my_data[n, k1:k2]
Example: Meg is listed in the 3rd row. Her age and sex are in columns 2 and 3, so we extract them with family[3, 2:3]
age sex
3 17 F
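The extraction can be sketched as follows. The data frame is rebuilt here so the snippet runs on its own, and the name meg_data is chosen for illustration.

```r
# Family Guy data frame, with values as printed in this section
family <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  sex    = c("M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# Row 3 (Meg), columns 2 and 3 (age and sex)
meg_data <- family[3, c(2, 3)]

# The range form selects the same columns
meg_data_range <- family[3, 2:3]

print(meg_data)
```

Both forms return a one-row data frame that keeps the original row label 3.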
To extract entire rows or columns type
my_data[c(k1,...,kn), ] or my_data[k1:k2, ]
my_data[, c(k1,...,km)] or my_data[, k1:k2]
peter_data <- family[1, ] # Extracts first row - Peter
sex_age <- family[, c(3,2)] # Extracts third and second columns:
# sex and age
print(peter_data)
print(sex_age)
person age sex
1 Peter 42 M
sex age
1 M 42
2 F 40
3 F 17
4 M 14
5 M 1
Use the dollar operator to access data frame columns
Suppose my_data contains a variable called my_variable
my_data$my_variable accesses column my_variable
my_data$my_variable is a vector
Example: To access age in the family data frame type
ages <- family$age # Stores ages in a vector
cat("Ages of the Family Guy characters are", ages)
cat("Meg's age is", ages[3])
Ages of the Family Guy characters are 42 40 17 14 1
Meg's age is 17
The size of a data frame can be discovered using:
nrow(my_data) – number of rows
ncol(my_data) – number of columns
dim(my_data) – vector containing number of rows and columns
family_dim <- dim(family) # Stores dimensions of family in a vector
cat("The Family Guy data frame has", family_dim[1],
"rows and", family_dim[2], "columns")
The Family Guy data frame has 5 rows and 3 columns
Adding data to an existing data frame my_data
Add a new record new_record
new_record must match the structure of my_data
Add it to my_data with my_data <- rbind(my_data, new_record)
Add a new variable new_variable
new_variable must have as many components as rows in my_data
Add it to my_data with my_data <- cbind(my_data, new_variable)
Example: We create new_record, a record for Brian matching the structure of family, and add new_record to family
person age sex
1 Peter 42 M
2 Lois 40 F
3 Meg 17 F
4 Chris 14 M
5 Stewie 1 M
6 Brian 7 M
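The code for this step is not shown in the text. A minimal sketch, assuming new_record is built as a one-row data frame with the same column names as family:

```r
# Family Guy data frame before adding Brian
family <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  sex    = c("M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# New record: same variables, in the same order, as family
new_record <- data.frame(person = "Brian", age = 7, sex = "M",
                         stringsAsFactors = FALSE)

# Append the record as a new row
family <- rbind(family, new_record)
print(family)
```

rbind() matches columns by name, so a record with different column names would be rejected.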
Example: We add to family a new variable funny
Create a vector funny with entries matching each character (including Brian)
Add funny to the Family Guy data frame family
person age sex funny
1 Peter 42 M High
2 Lois 40 F High
3 Meg 17 F Low
4 Chris 14 M Med
5 Stewie 1 M High
6 Brian 7 M Med
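A sketch of this step, assuming funny is a character vector whose entries follow the printed table above. The data frame is rebuilt with Brian included so the snippet is self-contained.

```r
# Family Guy data frame including Brian
family <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
  age    = c(42, 40, 17, 14, 1, 7),
  sex    = c("M", "F", "F", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# One entry per row of family, in row order
funny <- c("High", "High", "Low", "Med", "High", "Med")

# Append the vector as a new column; the column takes the vector's name
family <- cbind(family, funny)
print(family)
```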
Instead of using cbind we can add a new variable using the dollar operator:
To add a variable called new_variable, create a vector v containing values for the new variable
v must have as many components as rows in my_data
Add it to my_data with my_data$new_variable <- v
Example: We add to family a new variable with the age in months, obtained by multiplying family$age by 12
v <- family$age * 12 # Computes vector of ages in months
family$age.months <- v # Stores vector as new column in family
print(family)
person age sex funny age.months
1 Peter 42 M High 504
2 Lois 40 F High 480
3 Meg 17 F Low 204
4 Chris 14 M Med 168
5 Stewie 1 M High 12
6 Brian 7 M Med 84
We saw how to use logical flag vectors to subset vectors
We can use logical flag vectors to subset data frames as well
Suppose we have a data frame my_data containing a variable my_variable
We want to subset the records in my_data for which my_variable satisfies a condition
Use the commands
flag <- condition(my_data$my_variable)
my_data[flag, ]
Example: We subset family to the male characters, that is, the records for which family$sex == "M"
# Create flag vector for male Family Guy characters
flag <- (family$sex == "M")
# Subset data frame "family" and store in data frame "subset"
subset <- family[flag, ]
# Print subset
print(subset)
person age sex funny age.months
1 Peter 42 M High 504
4 Chris 14 M Med 168
5 Stewie 1 M High 12
6 Brian 7 M Med 84
R has many functions for reading characters from stored files
We will see how to read Table-Format files
Table-Formats are just tables stored in plain-text files
Typical file extensions are:
.txt for plain-text files
.csv for comma-separated values
Table-Formats can be read into R with the command read.table()
NA
#
*
read.table() reads a .txt or .csv file and outputs a data frame
Options of read.table():
header = T/F – Tells R if a header is present
na.strings = "string" – Tells R that "string" means NA
To read family_guy.txt
into R proceed as follows:
Download family_guy.txt and move file to Desktop
Open the R Console and change working directory to Desktop
Read family_guy.txt into R and store it in data frame family with read.table()
Note that read.table() must be told that family_guy.txt has a header and that missing values are denoted by *
Print data frame family to screen
person age sex funny age.mon
1 Peter NA M High 504
2 Lois 40 F <NA> 480
3 Meg 17 F Low 204
4 Chris 14 M Med 168
5 Stewie 1 M High NA
6 Brian NA M Med NA
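The read.table() call itself is not preserved in the text. The sketch below shows one way it could look. The file contents are an assumption reconstructed from the printed data frame, with * standing in for missing values, and the file is written to a temporary location so the snippet runs without downloading anything.

```r
# Assumed contents of family_guy.txt (reconstructed, not the original file)
file_lines <- c(
  "person age sex funny age.mon",
  "Peter * M High 504",
  "Lois 40 F * 480",
  "Meg 17 F Low 204",
  "Chris 14 M Med 168",
  "Stewie 1 M High *",
  "Brian * M Med *"
)
path <- tempfile(fileext = ".txt")
writeLines(file_lines, path)

# header = TRUE: first line holds the variable names
# na.strings = "*": every * in the file is read as NA
family <- read.table(path, header = TRUE, na.strings = "*")
print(family)
```

With the real file in the working directory, the temporary path would be replaced by "family_guy.txt".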
Example: Analysis of Consumer Confidence Index for 2008 crisis from Lecture 4
The data was previously entered with c(); here we read it from a .txt file instead
Goal: Perform t-test on CCI difference for mean difference μ = 0, reading the data with read.table()
The CCI dataset can be downloaded here 2008_crisis.txt
The text file looks like this
To perform the t-test on data 2008_crisis.txt
we proceed as follows:
Download dataset 2008_crisis.txt and move file to Desktop
Open the R Console and change working directory to Desktop
Read 2008_crisis.txt into R and store it in data frame scores with read.table()
Store the 2nd and 3rd columns of scores into 2 vectors
# CCI from 2007 is stored in 2nd column
score_2007 <- scores[, 2]
# CCI from 2009 is stored in 3rd column
score_2009 <- scores[, 3]
Perform the t-test on the difference with t.test
The output of t.test is below
One Sample t-test
data: difference
t = 38.144, df = 11, p-value = 4.861e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
68.15960 76.50706
sample estimates:
mean of x
72.33333
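The full sequence of steps can be sketched as a small function. Several details are assumptions: that the file has a header line, that the difference is taken as 2007 minus 2009 (consistent with the positive sample mean in the output), and the function name cci_t_test, which is hypothetical.

```r
# Reads a CCI table and runs a one-sample t-test on the yearly difference.
# Assumes column 2 holds the 2007 values and column 3 the 2009 values.
cci_t_test <- function(path, header = TRUE) {
  scores <- read.table(path, header = header)
  score_2007 <- scores[, 2]
  score_2009 <- scores[, 3]
  # Assumed direction of the difference: 2007 minus 2009
  difference <- score_2007 - score_2009
  # Two-sided test of the null hypothesis that the mean difference is 0
  t.test(difference, mu = 0)
}

# Run only if the dataset is present in the working directory
if (file.exists("2008_crisis.txt")) {
  print(cci_t_test("2008_crisis.txt"))
}
```

Wrapping the steps in a function keeps the intermediate vectors out of the workspace; the same lines can equally be run one by one at the console.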
File names should be meaningful and end in .R
Use an underscore (_) to separate words within a name
Some style guides use BigCamelCase instead
If possible avoid using names of existing functions and variables
Use <- and not = for assignment
Place spaces around all operators (=, +, -, <-, etc.)
The same applies to = when calling a function
:, :: and ::: do not need spacing
Extra spacing is ok if it improves alignment of = or <-
If a function definition runs over multiple lines, indent the second line to where the definition starts
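The spacing and indentation rules above can be illustrated with a short sketch. All names (average, index, total, scale_values) are hypothetical and chosen only for this example.

```r
# Spaces around <-, /, + and around = inside the function call
average <- mean(c(1, 2, 3) / 2 + 1, na.rm = TRUE)

# No spaces around : and ::
index <- 1:3
total <- base::sum(index)

# Definition running over two lines: the second line is indented
# to where the argument list starts
scale_values <- function(values, factor = 1,
                         offset = 0) {
  values * factor + offset
}

scaled <- scale_values(c(1, 2), factor = 2, offset = 1)
```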
return(object)
Often you can call a function without explicitly naming arguments:
plot(height, weight)
mean(weight)
This might be fine for plot() or mean()
However for less common functions:
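As an illustration, assuming the advice here is to name arguments explicitly when a function is less familiar, compare the two calls to seq() below. Both produce the same vector, but only the second makes clear what each number means.

```r
# Positional arguments: the reader must remember the argument order
steps_positional <- seq(0, 1, 0.25)

# Named arguments: the role of each value is explicit
steps_named <- seq(from = 0, to = 1, by = 0.25)

print(steps_named)
```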
Comments
Comment lines should begin with # and a single space
Use commented lines of - and = to break up code into easily readable chunks