seq(1, 11, length.out = 6)
[1] 1 3 5 7 9 11
Appendix B
Functions are a class of objects
A function is called by its name followed by parentheses containing the arguments
Functions take arguments and return a result
We have already encountered several built-in functions:
plot(x, y)
lines(x, y)
seq(x)
print("Stats is great!")
cat("R is great!")
mean(x)
sin(x)
plot(x, y) has formal arguments: two vectors x and y
plot(height, weight) has actual arguments: height and weight
In the call plot(height, weight) the arguments are matched by position:
height corresponds to the x-variable
weight corresponds to the y-variable
If a function has a lot of arguments, positional matching is tedious
For example, plot() accepts the following (and more!) arguments
Argument | Description |
---|---|
x | x coordinate of points in the plot |
y | y coordinate of points in the plot |
type | Type of plot to be drawn |
main | Title of the plot |
xlab | Label of x axis |
ylab | Label of y axis |
pch | Shape of points |
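The issue with having so many arguments is the following: suppose we only want to set pch = 2
Matching by position, we would have to supply all the arguments listed before pch in the table (x, y, type, main, xlab, ylab)
Instead, we can name the argument and set pch = 2 directly by the call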
plot(weight, height, pch = 2)
In this call:
weight is implicitly matched to x
height is implicitly matched to y
pch is explicitly matched to 2
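All the arguments can also be matched explicitly:
plot(x = weight, y = height, pch = 2)
Named arguments can be given in any order, so the following three calls are equivalent:
plot(height, weight)
plot(x = height, y = weight)
plot(y = weight, x = height)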
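To see matching by name at work, here is a minimal sketch with a small function whose name, difference_of, is hypothetical and introduced only for illustration. In the second call y is given before x, so only matching by name can produce the same result as the positional call.

```r
# Hypothetical example function with two formal arguments x and y
difference_of <- function(x, y) {
  return(x - y)
}

positional <- difference_of(5, 1)          # x = 5, y = 1 matched by position
named      <- difference_of(y = 1, x = 5)  # same match, given by name in another order
cat("Positional call:", positional, "\n")
cat("Named call:", named, "\n")
```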
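We have already seen another example of named actual arguments: the following two calls are equivalent
seq(from = 1, to = 11, by = 2)
seq(1, 11, 2)
If, however, we want to divide the interval [1, 11] into 5 equal parts, we must name the argument length.out:
seq(1, 11, length.out = 6)
The call seq(1, 11, 6) does not do this: the third positional argument of seq() is by, so seq(1, 11, 6) assumes that by = 6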
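The difference between these seq() calls can be checked directly. This sketch only uses base R; the variable names are arbitrary.

```r
by_two  <- seq(from = 1, to = 11, by = 2)   # step 2 from 1 to 11
six_pts <- seq(1, 11, length.out = 6)       # 6 points = 5 equal parts of [1, 11]
pos_by  <- seq(1, 11, 6)                    # third positional argument is by = 6
print(by_two)
print(six_pts)
print(pos_by)                               # only 1 and 7: the next step, 13, exceeds 11
```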
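A function with no arguments is still called with parentheses (), for example:
getwd() – which outputs the current working directory
ls() – which outputs the names of objects currently in memory
The general syntax for calling a function my_function is
my_function(arguments)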
The R function mean(x) computes the sample mean of the vector x
We want to define our own function to compute the mean
Example: The mean of x can be computed via
sum(x) / length(x)
We want to implement this code in the function my_mean(x):
my_mean takes the vector x as argument
my_mean returns a scalar – the mean of x
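We define my_mean and test it on an example
# Define my_mean: sum of the entries of x divided by their number
my_mean <- function(x) {
  mean_of_x <- sum(x) / length(x)
  return(mean_of_x)
}
# Generate a random vector of 1000 entries from N(0,1)
x <- rnorm(1000)
# Compute mean of x with my_mean
xbar <- my_mean(x)
# Compute mean of x with built-in function mean
xbar_check <- mean(x)
cat("Mean of x computed with my_mean is:", xbar, "\n")
cat("Mean of x computed with R mean is:", xbar_check, "\n")
cat("They coincide!\n")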
Mean of x computed with my_mean is: 0.02339032
Mean of x computed with R mean is: 0.02339032
They coincide!
print and cat produce different output on character vectors:
print(x) prints all the strings in x separately
cat(x) concatenates the strings, so there is no way to tell how many there were
Logical vectors take the values TRUE, FALSE or NA
TRUE and FALSE can be abbreviated with T and F
NA stands for not available
Logical vectors are extremely useful to evaluate conditions
Example: flag the entries of a vector x that are strictly above 5
# Generate a vector containing the sequence 1 to 8
x <- seq(from = 1, to = 8, by = 1)
# Generate vector of flags for entries strictly above 5
y <- (x > 5)
cat("Vector x is: (", x, ")\n")
cat("Entries above 5 are: (", y, ")\n")
Vector x is: ( 1 2 3 4 5 6 7 8 )
Entries above 5 are: ( FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE )
Question: How can we count the number of TRUE entries in a logical vector?
Hint: T/F are interpreted as 1/0 in arithmetic operations
sum(x) sums the entries of a vector x
Hence sum(x) counts the number of T entries in a logical vector x
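A minimal sketch of the hint on a small hand-made logical vector (the values are arbitrary). Since TRUE counts as 1, mean() of a logical vector also gives the proportion of TRUE entries.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
n_true    <- sum(flags)   # TRUE counts as 1, FALSE as 0
prop_true <- mean(flags)  # proportion of TRUE entries
cat("Number of TRUE entries:", n_true, "\n")
cat("Proportion of TRUE entries:", prop_true, "\n")
```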
x <- rnorm(1000) # Generates vector with 1000 normal entries
y <- (x > 0) # Generates logical vector of entries above 0
above_zero <- sum(y) # Counts entries above zero
cat("Number of entries which are above the average 0 is", above_zero, "\n")
cat("This is pretty close to 500!\n")
Number of entries which are above the average 0 is 513
This is pretty close to 500!
NA is the value Not Available
NA is carried through in computations: operations on NA yield NA as the result
Components of a vector can be retrieved by indexing
vector[k] returns the k-th component of vector
To modify an element of a vector use the following:
vector[k] <- value stores value in the k-th component of vector
Returning multiple items of a vector is known as slicing:
vector[c(k1, ..., kn)] returns components k1, ..., kn
vector[k1:k2] returns components k1 to k2
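A short sketch of indexing, modification and slicing on an example vector v (the values are arbitrary):

```r
v <- c(10, 20, 30, 40, 50)
third  <- v[3]           # single component: 30
v[2]   <- 99             # overwrite the 2nd component
picked <- v[c(1, 3, 5)]  # components 1, 3 and 5: 10 30 50
block  <- v[2:4]         # components 2 to 4: 99 30 40
cat("Third:", third, "\n")
cat("Picked:", picked, "\n")
cat("Block:", block, "\n")
```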
Entries of a vector x can be deleted by using
x[ -c(k1, ..., kn) ]
which deletes entries k1, ..., kn
# Create a vector x
x <- c(11, 22, 33, 44, 55, 66, 77, 88, 99, 100)
# Print vector x
cat("Vector x is:", x, "\n")
# Delete 2nd, 3rd and 7th entries of x
x <- x[ -c(2, 3, 7) ]
# Print x again
cat("Vector x with 2nd, 3rd and 7th entries removed:", x, "\n")
Vector x is: 11 22 33 44 55 66 77 88 99 100
Vector x with 2nd, 3rd and 7th entries removed: 11 44 55 66 88 99 100
Code: suppose we are given a vector x
Create a flag vector by using
flag <- condition(x)
where condition() is any function which returns a T/F vector of the same length as x
Subset x by using
x[flag]
Example: to extract the negative components of x use
x[ x < 0 ]
# Create numeric vector x
x <- c(5, -2.3, 4, 4, 4, 6, 8, 10, 40221, -8)
# Get negative components from x and store them in neg_x
neg_x <- x[ x < 0 ]
cat("Vector x is:", x, "\n")
cat("Negative components of x are:", neg_x, "\n")
Vector x is: 5 -2.3 4 4 4 6 8 10 40221 -8
Negative components of x are: -2.3 -8
To select the components of x between two values a and b, combine two flags with the logical AND operator &:
x[ (x > a) & (x < b) ]
# Create numeric vector
x <- c(5, -2.3, 4, 4, 4, 6, 8, 10, 40221, -8)
# Get components between 0 and 100
range_x <- x[ (x > 0) & (x < 100) ]
cat("Vector x is:", x, "\n")
cat("Components of x between 0 and 100 are:", range_x, "\n")
Vector x is: 5 -2.3 4 4 4 6 8 10 40221 -8
Components of x between 0 and 100 are: 5 4 4 4 6 8 10
which() converts a logical vector flag into a numeric vector of indices:
which(flag) is the vector of indices of flag which correspond to TRUE
# Create a logical flag vector
flag <- c(T, F, F, T, F)
# Indices at which flag is TRUE
true_flag <- which(flag)
cat("Flag vector is:", flag, "\n")
cat("Positions for which Flag is TRUE are:", true_flag, "\n")
Flag vector is: TRUE FALSE FALSE TRUE FALSE
Positions for which Flag is TRUE are: 1 4
which() can be used to delete certain entries from a vector x:
Create a flag vector by using
flag <- condition(x)
where condition() is any function which returns a T/F vector of the same length as x
Delete the entries flagged by condition using the code
x[ -which(flag) ]
# Create numeric vector x
x <- c(5, -2.3, 4, 4, 4, 6, 8, 10, 40221, -8)
# Print x
cat("Vector x is:", x, "\n")
# Flag positive components of x
flag_pos_x <- (x > 0)
# Remove positive components from x
x <- x[ -which(flag_pos_x) ]
# Print x again
cat("Vector x with positive components removed:", x, "\n")
Vector x is: 5 -2.3 4 4 4 6 8 10 40221 -8
Vector x with positive components removed: -2.3 -8
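One caveat of x[ -which(flag) ], sketched below on arbitrary values: if no entry is flagged, which(flag) is empty and x[ -which(flag) ] returns an empty vector instead of x. Negating the flag with ! and writing x[ !flag ] keeps the unflagged entries in every case.

```r
x <- c(5, -2.3, 4, 8)
flag_big <- (x > 100)            # no entry is above 100: all FALSE
wrong <- x[ -which(flag_big) ]   # which() gives integer(0), so nothing is kept
right <- x[ !flag_big ]          # negated flag keeps every unflagged entry
cat("Length with -which():", length(wrong), "\n")
cat("Length with !flag:", length(right), "\n")
```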
The main functions to generate vectors are
c() – concatenate
seq() – sequence
rep() – replicate
We have already met c() and seq(), but there are more details to discuss
Recall: c() generates a vector containing the input values
c() can also concatenate vectors
You can assign names to vector elements
This modifies the way the vector is printed
Given a named vector x:
names(x) returns the names of the elements of x
unname(x) returns x with the names removed, i.e. its values
# Create named vector
x <- c(first = "Red", second = "Green", third = "Blue")
# Access names of x via names(x)
names_x <- names(x)
# Access values of x via unname(x)
values_x <- unname(x)
cat("Names of x are:", names_x, "\n")
cat("Values of x are:", values_x, "\n")
Names of x are: first second third
Values of x are: Red Green Blue
The general form of seq is
seq(from =, to =, by =, length.out =)
If only from and to are given, the step is by = 1
Hence seq(x1, x2) is equivalent to x1:x2
The shorter x1:x2 is preferred to seq(x1, x2)
# Generate two vectors of integers from 1 to 6
x <- seq(1, 6)
y <- 1:6
cat("Vector x is:", x, "\n")
cat("Vector y is:", y, "\n")
cat("They are the same!\n")
Vector x is: 1 2 3 4 5 6
Vector y is: 1 2 3 4 5 6
They are the same!
rep generates repeated values from a vector:
if x is a vector and n is an integer, rep(x, n) repeats the vector x n times
# Create a vector with 3 components
x <- c(2, 1, 3)
# Repeat the vector x 4 times
y <- rep(x, 4)
cat("Original vector is:", x, "\n")
cat("Original vector repeated 4 times:", y, "\n")
Original vector is: 2 1 3
Original vector repeated 4 times: 2 1 3 2 1 3 2 1 3 2 1 3
The second argument of rep() can also be a vector:
if x and y are vectors of the same length, rep(x, y) repeats each entry of x as many times as the corresponding entry of y
x <- c(2, 1, 3) # Vector to replicate
y <- c(1, 2, 3) # Vector saying how to replicate
z <- rep(x, y) # 1st entry of x is replicated 1 time
# 2nd entry of x is replicated 2 times
# 3rd entry of x is replicated 3 times
cat("Original vector is:", x, "\n")
cat("Original vector repeated is:", z, "\n")
Original vector is: 2 1 3
Original vector repeated is: 2 1 1 3 3 3
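rep() works on character vectors too. The sketch below builds a hypothetical vector of group labels in two ways; the each argument of rep() repeats every entry in place, which here gives the same result as rep(x, y).

```r
# Hypothetical labels for 3 control and 3 treatment subjects
group_a <- rep(c("control", "treatment"), c(3, 3))
# each = 3 repeats every entry 3 times in place
group_b <- rep(c("control", "treatment"), each = 3)
print(group_a)
print(group_b)
```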
rep() can be useful to create vectors of labels
Vectors can contain only one data type (number, character, boolean)
Lists are data structures that can contain any R object
Lists can be created similarly to vectors, with the command list()
Elements of a list can be retrieved by indexing:
my_list[[k]] returns the k-th element of my_list
You can return multiple items of a list via slicing:
my_list[c(k1, ..., kn)] returns the elements in positions k1, ..., kn
my_list[k1:k2] returns the elements k1 to k2
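The difference between [[ ]] and slicing with [ ] can be checked with class(). In this sketch, on an arbitrary list, my_list[[2]] returns the element itself, while my_list[1:2] returns a shorter list.

```r
my_list <- list(2, c(TRUE, FALSE), "hello")
second    <- my_list[[2]]   # the element itself: a logical vector
first_two <- my_list[1:2]   # a list holding elements 1 and 2
cat("Class of my_list[[2]]:", class(second), "\n")
cat("Class of my_list[1:2]:", class(first_two), "\n")
cat("Length of my_list[1:2]:", length(first_two), "\n")
```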
Names can be given to the elements of a list with
names(my_list) <- c("name_1", ..., "name_k")
# Create list with 3 elements
my_list <- list(2, c(T,F,T,T), "hello")
# Name each of the 3 elements
names(my_list) <- c("number", "TF_vector", "string")
# Print the named list: the list is printed along with element names
print(my_list)
$number
[1] 2
$TF_vector
[1] TRUE FALSE TRUE TRUE
$string
[1] "hello"
The element of my_list named my_name can be accessed with the dollar operator:
my_list$my_name
# Create list with 3 elements and name them
my_list <- list(2, c(T,F,T,T), "hello")
names(my_list) <- c("number", "TF_vector", "string")
# Access 2nd element using dollar operator and store it in variable
second_component <- my_list$TF_vector
# Print 2nd element
print(second_component)
[1] TRUE FALSE TRUE TRUE
Data frames are the best way of presenting a data set in R
Data frames can contain any R object
Data frames are similar to lists and are constructed in a similar way, using data.frame()
Important: unlike in a list, the elements of a data frame must be vectors of the same length
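As a sketch, the Family Guy data frame used in the examples below can be built with data.frame(). The values are taken from the printed outputs later in this appendix; stringsAsFactors = FALSE keeps the text columns as character vectors (this is already the default since R 4.0.0).

```r
# Each argument becomes one column; all vectors have length 5
family <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  sex    = c("M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE
)
print(family)
```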
Example: We construct the Family Guy data frame family. The variables are
person – name of character
age – age of character
sex – sex of character
Think of a data frame as a matrix
You can extract the element in position (m, n) by using
my_data[m, n]
Example: Peter is in the 1st row and names are in the 1st column. We can extract Peter's name as follows
family[1, 1]
[1] "Peter"
To extract multiple elements on the same row or column type
my_data[c(k1,...,kn), m] or my_data[k1:k2, m]
my_data[n, c(k1,...,km)] or my_data[n, k1:k2]
Example: Meg is listed in the 3rd row. We extract her age and sex (columns 2 and 3)
family[3, 2:3]
age sex
3 17 F
To extract entire rows or columns type
my_data[c(k1,...,kn), ] or my_data[k1:k2, ]
my_data[, c(k1,...,km)] or my_data[, k1:k2]
peter_data <- family[1, ] # Extracts first row - Peter
sex_age <- family[, c(3,2)] # Extracts third and second columns:
# sex and age
print(peter_data)
print(sex_age)
person age sex
1 Peter 42 M
sex age
1 M 42
2 F 40
3 F 17
4 M 14
5 M 1
Use the dollar operator to access data frame columns:
if my_data contains a variable called my_variable, then my_data$my_variable accesses the column my_variable
my_data$my_variable is a vector
Example: To access age in the family data frame type
ages <- family$age # Stores ages in a vector
cat("Ages of the Family Guy characters are", ages, "\n")
cat("Meg's age is", ages[3], "\n")
Ages of the Family Guy characters are 42 40 17 14 1
Meg's age is 17
The size of a data frame can be discovered using:
nrow(my_data) – number of rows
ncol(my_data) – number of columns
dim(my_data) – vector containing the number of rows and columns
family_dim <- dim(family) # Stores dimensions of family in a vector
cat("The Family Guy data frame has", family_dim[1],
    "rows and", family_dim[2], "columns\n")
The Family Guy data frame has 5 rows and 3 columns
Adding data to an existing data frame my_data:
To add a record, create a data frame new_record
new_record must match the structure of my_data (same variables)
Append it to my_data with my_data <- rbind(my_data, new_record)
To add a variable, create a vector new_variable
new_variable must have as many components as rows in my_data
Add it to my_data with my_data <- cbind(my_data, new_variable)
Example: we add Brian to family by creating a record new_record and appending new_record to family
person age sex
1 Peter 42 M
2 Lois 40 F
3 Meg 17 F
4 Chris 14 M
5 Stewie 1 M
6 Brian 7 M
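A minimal sketch of code that produces the table above: it rebuilds family, creates new_record with the same variables as family and appends it with rbind(). Brian's values are read off the printed table.

```r
family <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age    = c(42, 40, 17, 14, 1),
                     sex    = c("M", "F", "F", "M", "M"),
                     stringsAsFactors = FALSE)
# New record with the same variables (names and order) as family
new_record <- data.frame(person = "Brian", age = 7, sex = "M",
                         stringsAsFactors = FALSE)
family <- rbind(family, new_record)
print(family)
```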
Example: we add to family a variable funny. Create a vector funny with entries matching each character (including Brian) and add funny to the Family Guy data frame family
person age sex funny
1 Peter 42 M High
2 Lois 40 F High
3 Meg 17 F Low
4 Chris 14 M Med
5 Stewie 1 M High
6 Brian 7 M Med
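A minimal sketch of code that produces the table above: it rebuilds the 6-row family and adds the vector funny as a new column with cbind(). The entries of funny are read off the printed table; cbind() uses the vector's name as the column name.

```r
family <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age    = c(42, 40, 17, 14, 1, 7),
                     sex    = c("M", "F", "F", "M", "M", "M"),
                     stringsAsFactors = FALSE)
# One entry per row of family, in the same order
funny <- c("High", "High", "Low", "Med", "High", "Med")
family <- cbind(family, funny)
print(family)
```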
Instead of using cbind we can add a new variable using the dollar operator:
to add new_variable, create a vector v containing the values for the new variable
v must have as many components as rows in my_data
Add it to my_data with my_data$new_variable <- v
Example: we add to family the ages in months, obtained by multiplying family$age by 12
v <- family$age * 12 # Computes vector of ages in months
family$age.months <- v # Stores vector as new column in family
print(family)
person age sex funny age.months
1 Peter 42 M High 504
2 Lois 40 F High 480
3 Meg 17 F Low 204
4 Chris 14 M Med 168
5 Stewie 1 M High 12
6 Brian 7 M Med 84
We saw how to use logical flag vectors to subset vectors
We can use logical flag vectors to subset data frames as well
Suppose we have a data frame my_data containing a variable my_variable
We want to subset the records in my_data for which my_variable satisfies a condition
Use the commands
flag <- condition(my_data$my_variable)
my_data[flag, ]
Example: we extract the male characters of family using the flag family$sex == "M"
# Create flag vector for male Family Guy characters
flag <- (family$sex == "M")
# Subset data frame "family" and store in data frame "males"
# (the name subset is avoided, since it is already a base R function)
males <- family[flag, ]
# Print males
print(males)
person age sex funny age.months
1 Peter 42 M High 504
4 Chris 14 M Med 168
5 Stewie 1 M High 12
6 Brian 7 M Med 84
R has many functions for reading data from stored files
We will see how to read Table-Format files
Table-Formats are just tables stored in plain-text files
Typical file extensions are:
.txt for plain-text files
.csv for comma-separated values
Table-Formats can be read into R with the command
read.table()
By default, read.table() reads the string NA as a missing value and ignores everything after # on a line; other markers of missing values, such as *, must be declared
read.table() reads a .txt or .csv file (for .csv files set sep = ",") and outputs a data frame
Main options of read.table():
header = T/F – Tells R if a header is present
na.strings = "string" – Tells R that "string" means NA
To read family_guy.txt into R proceed as follows:
Download family_guy.txt and move the file to the Desktop
Open the R Console and change the working directory to the Desktop
Read family_guy.txt into R and store it in the data frame family with read.table(), telling R that family_guy.txt has a header and that * stands for NA
Print family to screen
person age sex funny age.mon
1 Peter NA M High 504
2 Lois 40 F <NA> 480
3 Meg 17 F Low 204
4 Chris 14 M Med 168
5 Stewie 1 M High NA
6 Brian NA M Med NA
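A minimal sketch of the read.table() step. Since family_guy.txt itself is not reproduced here, the sketch writes a stand-in file to a temporary location using the table printed above, assuming (as the steps indicate) that missing values are stored as * in the file; with the real file in the working directory the call would be read.table("family_guy.txt", header = TRUE, na.strings = "*").

```r
# Hypothetical stand-in for family_guy.txt, with * marking missing values
lines <- c("person age sex funny age.mon",
           "Peter * M High 504",
           "Lois 40 F * 480",
           "Meg 17 F Low 204",
           "Chris 14 M Med 168",
           "Stewie 1 M High *",
           "Brian * M Med *")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# header = TRUE: the first line holds the variable names
# na.strings = "*": every * is read as NA
family <- read.table(path, header = TRUE, na.strings = "*")
print(family)
```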
Example: reading a .txt file. In the analysis of the Consumer Confidence Index (CCI) for the 2008 crisis from Lecture 4 the data was entered with c(); here we read it from a .txt file instead
Goal: Perform a t-test on the CCI difference for mean difference μ = 0, using data read with read.table()
The CCI dataset can be downloaded here 2008_crisis.txt
The text file looks like this
To perform the t-test on data 2008_crisis.txt we proceed as follows:
Download dataset 2008_crisis.txt and move the file to the Desktop
Open the R Console and change the working directory to the Desktop
Read 2008_crisis.txt into R and store it in the data frame scores with read.table()
Split scores into 2 vectors
# CCI from 2007 is stored in 2nd column
score_2007 <- scores[, 2]
# CCI from 2009 is stored in 3rd column
score_2009 <- scores[, 3]
The output of t.test on the differences is below
One Sample t-test
data: difference
t = 38.144, df = 11, p-value = 4.861e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
68.15960 76.50706
sample estimates:
mean of x
72.33333
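A sketch of how the output above can be produced. It assumes, as in the code above, that column 2 of scores holds the 2007 CCI and column 3 the 2009 CCI, and it takes the differences as 2007 minus 2009, which matches the positive sample mean. Wrapping the steps in a function (cci_t_test, a hypothetical name) and the header = TRUE option in the usage line are assumptions.

```r
# Hypothetical helper: one sample t-test on the CCI differences
cci_t_test <- function(scores) {
  score_2007 <- scores[, 2]   # CCI from 2007 (column 2)
  score_2009 <- scores[, 3]   # CCI from 2009 (column 3)
  # A positive mean difference means the CCI dropped from 2007 to 2009
  difference <- score_2007 - score_2009
  # Test H0: mean difference mu = 0 against a two-sided alternative
  return(t.test(difference, mu = 0))
}

# Usage, assuming 2008_crisis.txt has a header row and is in the working directory:
# scores <- read.table(file = "2008_crisis.txt", header = TRUE)
# print(cci_t_test(scores))
```

With 12 rows in scores, the test has 12 - 1 = 11 degrees of freedom, as in the output above.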
File names should be meaningful and end in .R
Variable and function names should be lowercase: use an underscore ( _ ) to separate words within a name, rather than BigCamelCase
If possible, avoid using names of existing functions and variables
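Use <- and not = for assignment
Place spaces around all binary operators (=, +, -, <-, etc.)
The same rule applies when using = to name arguments in a function call
The operators :, :: and ::: do not need spacing
Extra spacing is ok if it improves alignment of = or <-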
If a function definition runs over multiple lines, indent the second line to where the definition starts
Return the output of a function explicitly with return(object)
Often you can call a function without explicitly naming arguments:
plot(height, weight)
mean(weight)
This might be fine for plot() or mean()
However, for less common functions it is better to name the arguments explicitly, so that the code is easier to read
Comments:
each line of a comment should begin with the comment symbol # and a single space
use commented lines of - and = to break up code into easily readable chunks
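As an illustration of these conventions, here is a short sketch with a hypothetical function compute_summary: lowercase names with underscores, <- for assignment, spaces around operators, aligned assignments, an explicit return() and commented lines of = and - separating chunks.

```r
# Summary statistics ==========================================================

# Returns the sample mean and standard deviation of a numeric vector x
compute_summary <- function(x) {
  x_mean <- mean(x)
  x_sd   <- sd(x)  # extra spaces align the two assignments
  return(c(mean = x_mean, sd = x_sd))
}

# Usage -----------------------------------------------------------------------
stats <- compute_summary(c(1, 2, 3, 4, 5))
print(stats)
```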