1 First Steps

1.1 Why Program in R?

You should program your statistical analysis, and we will cover some of the basics of how to do this in R. You also want your work to be both replicable and reproducible:

Replicable: someone collecting new data comes to the same results.
Reproducible: someone reusing your data comes to the same results.

You can read more about the distinction in many places, including

We focus on R because it is good for complex stats, concise figures, and coherent organization. It is built and developed by applied statisticians for statistics, and used by many in academia and industry.

You see the exact steps you took and find an error.
- Google “worst Excel errors” and note how often they arise from copy/paste via the “point-and-click” approach. E.g., Fidelity’s $2.6 Billion Dividend Error.
- Find a problem before sending your work out. E.g., see https://retractionwatch.com/ and https://econjwatch.org/.
You try out a new plot you found in The Visual Display of Quantitative Information, by Edward Tufte.
- It’s not a standard plot, but AI answers most of your questions.
- Tutorials help avoid bad practices, such as plotting 2D data as a 3D object (see e.g., https://clauswilke.com/dataviz/no-3d.html).
You try out an obscure statistical approach that’s hot in your field.
- It doesn’t make the report, but you have some confidence that the candidate issue isn’t a big problem.

So my main sell to you is that being reproducible is in your own self-interest. For students, think about labor demand and what may be good for getting a job. Do some of your own research to best understand how much to invest.

1.2 First Steps

Install R.

First install R. Then install RStudio.

For Fedora (Linux) users, note that you need to first enable the repo and then install:

Code

sudo dnf install 'dnf-command(copr)'
sudo dnf copr enable iucar/rstudio
sudo dnf install rstudio-desktop

Make sure you have the latest version of R and RStudio for class. If not, then reinstall.

Interfacing with RStudio.

RStudio is perhaps the easiest interface to get going with. (There are other GUIs.)

In RStudio, there are 4 panes. If you do not see 4, click “File > New File > R Script” on the top left of the toolbar.

The top left pane is where you write your code. For example, type

1+1

The pane below is where your code is executed. Keep your cursor on the same line as your code, and then click “Run”. You should see

> 1+1
[1] 2

If you click “Run” again, you should see that same output printed again.

As we proceed, you can see both my source code and output like this:

Code

1 + 1
## [1] 2

Now try $2+2$ and $3+3$. First execute each one-at-a-time. Then highlight/run them both together.

You should add comments to your code, and you do this with the hash symbol (#). For example

Code

# This is my first comment!
1+1 # The simplest calculation I could think of
## [1] 2

# Other examples of running code
2+2
## [1] 4
3+3
## [1] 6

Now try some other mathematical operations

Code

2-3 #subtraction
## [1] -1
2*3 #multiplication
## [1] 6
2/3 #division
## [1] 0.6666667
2^3 #powers
## [1] 8

Now try some more complex examples

Code

# Example 1
5+2/10
## [1] 5.2
(5+2)/10 # notice the difference
## [1] 0.7

# Example 2
2^3/10 # notice the differences
## [1] 0.8
2/10^3
## [1] 0.002
(2/10)^3
## [1] 0.008

Reading This Textbook.

In later chapters, there are also special boxes for especially important statistical examples.

Note

This box contains need-to-know examples, such as

Code

2 + 7
## [1] 9

Tip

This box contains test-yourself examples and questions, such as

Code

(2 + 7)^3 / 10
## [1] 72.9

To understand each chunk of code:

Copy/paste it into your RStudio
Run it
Predict what happens if changed
Change it
Break it
Fix it

E.g., Use the following code to see if the empty space matters

Code

(2 + 7)^3 / 10
## [1] 72.9

Assignment.

You can create “variables” that store values. For example,

Code

x <- 1 # Make your first variable
x + 1 # The simplest calculation I could think of
## [1] 2

Code

x <- 23 #Another example
x + 1
## [1] 24

Code

y <- x + 1 #Another example
y
## [1] 24
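
One detail that often surprises beginners: a variable stores a value, not the formula that produced it. The sketch below reuses x and y from above to show that changing x later does not change y (the value 100 is an arbitrary choice).

```r
x <- 23
y <- x + 1  # y stores the value 24, not the formula "x + 1"
x <- 100    # reassigning x overwrites its old value
x           # 100
y           # still 24
y <- x + 1  # re-run the assignment to update y
y           # 101
```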

Your variables must be defined in order to use them. Otherwise you get an error. For example,

Code

X +   1 # notice that R is sensitive to capitalization 
## Error:
## ! object 'X' not found

Scripting.

Create a folder on your computer to save your scripts
Save your R Script file as My_First_Script.R in your folder
Close RStudio
Open your script and re-run it

As you work through the material, make sure to both execute and save your scripts. Add lots of commentary to your scripts. Name your scripts systematically.

There are often many ways to accomplish the same goal. Your first scripts will be very basic and rough, but you can edit them later based on what you learn. And you can always ask R for help:

Code

sum(x, 2) # x + 2
?sum
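
Help pages describe the arguments a function accepts. For example, the help page for sum documents an na.rm argument. The following is a small sketch of how you might try out what you read there; the missing value is added only for illustration.

```r
x <- 23
args(sum)                   # list the arguments of sum
sum(x, 2)                   # 25
sum(x, 2, NA)               # NA, because one input is missing
sum(x, 2, NA, na.rm=TRUE)   # 25, missing values are removed first
```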

We write scripts in the top-left pane so that we can find and fix common mistakes.

Code

# Mistake 1: using undefined objects
Y

#  Mistake 2: spelling and spacing
Y < - 43
Y_plus_z <- Y + z

# Mistake 3: half-completed code
x + y + 
x_plus_y_plus_z <- x + y + z
# Seeing "+" in the bottom console?
# press "Escape" and try again

Your variable names do not matter technically, but they should be informative to help avoid common mistakes.

1.3 Mathematical Objects

In R, scalars, vectors, and matrices are different kinds of “objects”.

These objects are used extensively in data analysis:

scalars: summary statistics (e.g., average household income of Canada).
vectors: single variable in a data set (e.g., the household income of each family in Vancouver).
matrices: two variables in a data set (e.g., the age and education level of every person in class).

Vectors are probably your most common object in R, but we will start with scalars.

Scalars.

Make your first scalar

Code

xs <- 2 # Make your first scalar
xs  # Print the scalar
## [1] 2

Perform simple calculations and see how R is doing the math for you

Code

xs + 2
## [1] 4
xs*2 # Perform and print a simple calculation
## [1] 4
(xs+1)^2 # Perform and print a simple calculation
## [1] 9
xs + NA # often used for missing values
## [1] NA

Now change xs, predict what will happen, then re-run the code.

Vectors.

Make your first vector

Code

x <- c(0,1,3,10,6) # Your First Vector
x # Print the vector
## [1]  0  1  3 10  6
x[2] # Print the 2nd element; 1
## [1] 1
x+2 # Print simple calculation; 2,3,5,12,8
## [1]  2  3  5 12  8
x*2
## [1]  0  2  6 20 12
x^2
## [1]   0   1   9 100  36

Apply mathematical calculations elementwise

Code

x+x
## [1]  0  2  6 20 12
x*x
## [1]   0   1   9 100  36
x^x
## [1] 1.0000e+00 1.0000e+00 2.7000e+01 1.0000e+10 4.6656e+04
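
Elementwise calculations also work with two different vectors of the same length. A minimal sketch, where y is a hypothetical second vector made up for this example:

```r
x <- c(0, 1, 3, 10, 6)
y <- c(1, 2, 3, 4, 5)  # hypothetical second vector of the same length
x + y      # 1 3 6 14 11
x * y      # 0 2 9 40 30
length(x)  # 5
length(y)  # 5
```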

In R, scalars are treated as a vector with one element.

Code

c(1)
## [1] 1
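
You can check this yourself with a few base R functions. A short sketch, using the scalar xs from above:

```r
xs <- 2
length(xs)          # 1, a scalar has one element
is.vector(xs)       # TRUE
identical(xs, c(2)) # TRUE, wrapping in c() changes nothing
xs[1]               # 2, you can index a scalar like a vector
```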

Matrices.

Matrices are also common objects

Code

x1 <- c(1, 4, 9)
x2 <- c(3, 0, 2)
x_mat <- rbind(x1, x2)

x_mat       # Print full matrix
##    [,1] [,2] [,3]
## x1    1    4    9
## x2    3    0    2
x_mat[2,  ]   # Print Second Row
## [1] 3 0 2
x_mat[ , 2]   # Print Second Column
## x1 x2 
##  4  0
x_mat[2, 2]  # Print Element in Second Column and Second Row
## x2 
##  0

There are elementwise calculations

Code

x_mat+2
##    [,1] [,2] [,3]
## x1    3    6   11
## x2    5    2    4
x_mat*2
##    [,1] [,2] [,3]
## x1    2    8   18
## x2    6    0    4
x_mat^2
##    [,1] [,2] [,3]
## x1    1   16   81
## x2    9    0    4

x_mat + x_mat
##    [,1] [,2] [,3]
## x1    2    8   18
## x2    6    0    4
x_mat * x_mat
##    [,1] [,2] [,3]
## x1    1   16   81
## x2    9    0    4
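
Calculations such as x_mat + x_mat work because both matrices have the same shape. The sketch below shows one way to inspect the shape of a matrix; the names x_rows and x_cols are made up for this example, and cbind is the column-wise counterpart of rbind.

```r
x1 <- c(1, 4, 9)
x2 <- c(3, 0, 2)
x_rows <- rbind(x1, x2)  # stack as rows: 2 rows, 3 columns
x_cols <- cbind(x1, x2)  # stack as columns: 3 rows, 2 columns
dim(x_rows)   # 2 3
dim(x_cols)   # 3 2
nrow(x_rows)  # 2
ncol(x_rows)  # 3
```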

1.4 Mathematical Functions

Creating Simple Functions.

Functions are applied to objects

Code

# Define a function that adds two to any vector
add_two <- function(input) { #input is a placeholder
    output <- input + 2 # new object defined locally 
    return(output) # return new object 
}
# Apply that function to a vector
x <- c(0, 1, 3, 10, 6)
add_two(input=x) #same as add_two(x)
## [1]  2  3  5 12  8
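
Objects created inside a function exist only while the function runs. To keep the result, assign the function's output to a new variable. A minimal sketch, where output_inside and result are names chosen only for this illustration:

```r
add_two <- function(input) {
    output_inside <- input + 2  # defined only inside the function
    return(output_inside)
}
result <- add_two(c(0, 1, 3))   # store the returned value
result                          # 2 3 5
exists("output_inside")         # FALSE, it is not available globally
```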

Common mistakes:

Code

print(input)
print(output)
# These are not available globally, only locally (inside of the function)

# Double check typos
x  < - add_two(Input=X) 

# Seeing "+" in the bottom console
# often means you forgot to close the function with "}" 
# click the bottom left panel, press "Escape", and try again in the top left panel
add_two <- function(input_vector) { 
    output_vector <- input_vector + 2 
    return(output_vector)
x <- c(0,1,3,10,6)
add_two(x)

There are many different functions, many of which have default values for their arguments.

Code

add_scalar <- function(input_vector1, input_scalar2) {
    output_vector <- input_vector1 + input_scalar2
    return(output_vector)
}
add_scalar(x, 3)
## [1]  3  4  6 13  9
add_scalar(x, 4)
## [1]  4  5  7 14 10

add_scalar3 <- function(input_vector1, input_scalar2=3) {
    output_vector <- input_vector1 + input_scalar2
    return(output_vector)
}
add_scalar3(x)
## [1]  3  4  6 13  9
add_scalar3(x,4)
## [1]  4  5  7 14 10
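
Arguments can also be passed by name, in which case their order does not matter. The sketch below redefines add_scalar3 from above so that it runs on its own:

```r
add_scalar3 <- function(input_vector1, input_scalar2=3) {
    output_vector <- input_vector1 + input_scalar2
    return(output_vector)
}
x <- c(0, 1, 3, 10, 6)
add_scalar3(input_vector1=x)                   # uses the default: 3 4 6 13 9
add_scalar3(input_scalar2=4, input_vector1=x)  # named, any order: 4 5 7 14 10
```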

Common Functions.

Perhaps the most common function we will use is summation. You can see exactly what a function does with ?.

Code

x1 <- c(1,4,9)
x1
## [1] 1 4 9
sum(x1)
## [1] 14

x2 <- c(3,0,2)
x_mat <- rbind(x1, x2)
x_mat
##    [,1] [,2] [,3]
## x1    1    4    9
## x2    3    0    2
sum(x_mat)
## [1] 19

# ?sum

You can apply functions to each row or column of a matrix

Code

x_mat
##    [,1] [,2] [,3]
## x1    1    4    9
## x2    3    0    2

# Row sums
y_row <- apply(x_mat, 1, sum)
y_row
## x1 x2 
## 14  5

#check row sums are correct
x_row1 <- x_mat[1, ]
sum(x_row1)
## [1] 14
x_row2 <- x_mat[2, ]
sum(x_row2)
## [1] 5

# Column sums
y_col <- apply(x_mat, 2, sum)
y_col
## [1]  4  4 11

#check column sums are correct: DIY
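
Base R also has the dedicated functions rowSums and colSums, which offer another way to check the apply results. A short sketch, separate from the DIY check above:

```r
x1 <- c(1, 4, 9)
x2 <- c(3, 0, 2)
x_mat <- rbind(x1, x2)
rowSums(x_mat)  # 14 5
colSums(x_mat)  # 4 4 11
# Compare with apply
all(apply(x_mat, 1, sum) == rowSums(x_mat))  # TRUE
all(apply(x_mat, 2, sum) == colSums(x_mat))  # TRUE
```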

Sometimes, we will use vectors that are entirely ordered. We make them with functions.

Code

seq(1, 7, by=1) #same as 1:7
## [1] 1 2 3 4 5 6 7
seq(1, 7, by=0.5)
##  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0

# Ordering data
sort(x)
## [1]  0  1  3  6 10
x[order(x)]
## [1]  0  1  3  6 10
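
The two approaches give the same answer because order returns the positions of the elements from smallest to largest, and those positions are then used to index x. A brief sketch, with the decreasing argument added as an extra illustration:

```r
x <- c(0, 1, 3, 10, 6)
order(x)                      # positions: 1 2 3 5 4
x[order(x)]                   # 0 1 3 6 10
sort(x, decreasing=TRUE)      # 10 6 3 1 0
x[order(x, decreasing=TRUE)]  # 10 6 3 1 0
```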

Loops.

Applying the same function over and over again

Code

# Example 1: simple division
x <- vector(length=3)
#Fill empty vector
for(i in seq(1,3)){
    x[i] <- i/2
}
x
## [1] 0.5 1.0 1.5
# Compare

# Example 2: using existing data
x <- c(1,3,9,2)
y <- vector(length=length(x))
for(i in seq(1,4) ){
    y[i] <- x[i] + 1
}
y
## [1]  2  4 10  3


#Example 3: recursion
x <- vector(length=4)
x[1] <- 1
for(i in seq(2,4) ){
    x[i] <- x[i-1]^2
}
x
## [1] 1 1 1 1
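
Two follow-up sketches may help here. The first compares the loop in Example 2 with the elementwise calculation from earlier, using seq_along(x) so that the loop adapts to the length of x. The second repeats Example 3 with a starting value of 2 (an arbitrary choice for illustration) so that the recursion is easier to see.

```r
# Loop version, using seq_along(x) instead of a hard-coded seq(1,4)
x <- c(1, 3, 9, 2)
y_loop <- vector(length=length(x))
for(i in seq_along(x)){
    y_loop[i] <- x[i] + 1
}
# Elementwise version
y_vec <- x + 1
identical(y_loop, y_vec) # TRUE

# Recursion with a starting value of 2
z <- vector(length=4)
z[1] <- 2
for(i in seq(2,4)){
    z[i] <- z[i-1]^2
}
z # 2 4 16 256
```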

Logic.

Calculations that are either TRUE or FALSE

Code

x <- c(1, 2, 3, NA)

x == 2
## [1] FALSE  TRUE FALSE    NA
any(x==2)
## [1] TRUE
all(x==2)
## [1] FALSE
2 %in% x
## [1] TRUE
is.na(x)
## [1] FALSE FALSE FALSE  TRUE
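
Missing values can make any and all return NA when the answer depends on the missing element. A minimal sketch of how this plays out, with the comparison values chosen only for illustration:

```r
x <- c(1, 2, 3, NA)
all(x > 0)               # NA, the missing value could be negative
all(x > 0, na.rm=TRUE)   # TRUE, missing values are removed first
any(x == 5)              # NA, the missing value could be 5
any(x == 5, na.rm=TRUE)  # FALSE
5 %in% x                 # FALSE
sum(is.na(x))            # 1, the number of missing values
```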

The & (“and”) and | (“or”) operators combine the logical vectors on their left and right, element by element.

Code

x < 2
## [1]  TRUE FALSE FALSE    NA
x >= 1
## [1] TRUE TRUE TRUE   NA
(x >= 1) & (x < 2)
## [1]  TRUE FALSE FALSE    NA
(x >= 1) | (x < 2)
## [1] TRUE TRUE TRUE   NA
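
Logical vectors are often used to pick out elements of another vector. The sketch below is one way this might look; the name keep is made up for this example.

```r
x <- c(1, 2, 3, NA)
keep <- !is.na(x) & (x >= 2)  # FALSE TRUE TRUE FALSE
x[keep]                       # 2 3
which(x >= 2)                 # positions 2 3
sum(x >= 2, na.rm=TRUE)       # 2, counts the TRUE values
```

Including !is.na(x) in the condition keeps the missing value out of the result.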

1.5 Further Reading

For more on why to program in R, see

http://www.r-bloggers.com/the-reproducibility-crisis-in-science-and-prospects-for-r/
https://github.com/qinwf/awesome-R#reproducible-research
A Guide to Reproducible Code in Ecology and Evolution

The most common programming tasks can be found at https://raw.githubusercontent.com/rstudio/cheatsheets/main/rstudio-ide.pdf

There are many good and free programming materials online. For help setting up, see any of the following links