R Basics

Install on Ubuntu

Add this line to /etc/apt/sources.list:

deb http://cran.cnr.Berkeley.edu/bin/linux/ubuntu trusty/

Then run apt-get update (as root or with sudo).

If you get an error message about GPG keys:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

Repeat this for each additional key ID reported in the error messages.

Install the base R packages:

apt-get install r-base

To get third-party packages from CRAN, start R as root. Then run a command like this:

install.packages(c("iterators", "foreach"))

Help System

help(glm)                    # get help on a function
?glm                         # same thing
example(glm)                 # see an example
help.search("regression")    # search all help files
??regression                 # same thing
vignette("affy")             # how to use a package

Running R

Start an interactive console with "R". Use the "q()" command to quit.

To run an R script from the command line:

R CMD BATCH generate_graphs.R

To run an R script from an interactive console:

source("generate_graphs.R")

Syntax

  • A comment is anything from a pound sign (#) to the end of the line.
  • Semicolons may be used to separate expressions on the same line.
  • Curly braces group a sequence of expressions; the value of the group is the value of its last expression.
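
The three rules above can be seen together in a small sketch. The variable names (a, b, total) are made up for illustration only:

```r
# Everything after the pound sign is ignored by the interpreter.
a <- 1; b <- 2          # two expressions on one line, separated by a semicolon

# A braced block evaluates each expression in turn;
# its value is the value of the last one.
total <- {
  a <- a + 10           # a becomes 11
  a + b                 # 11 + 2 is the value of the whole block
}
print(total)            # 13
```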

Control

if (x < y) x else y
 
repeat {if (i > 25) break else {print(i); i <- i + 5;}}
  
while (i <= 25) {print(i); i <- i + 5}

for (i in seq(from=5,to=25,by=5)) print(i)

iterators package (from CRAN):

> library(iterators)
> onetofive = iter(1:5)
> nextElem(onetofive)       # 1

foreach package (from CRAN):

> library(foreach)
> foreach(i=1:5) %do% sqrt(i)      # a list containing 1, 1.414214, 1.732051, 2, 2.236068

Assignment

x = 34
x <- 34
34 -> x

Values are generally copied on assignment.
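
A minimal sketch of what copying on assignment means in practice: changing the copy does not change the original. The word "generally" matters, since some objects, environments for example, are shared by reference instead.

```r
x <- c(1, 2, 3)
y <- x            # y starts out with the same values as x
y[1] <- 100       # modifying y ...
print(x)          # ... leaves x untouched: 1 2 3
print(y)          # 100 2 3
```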

Special Values

NA (missing data), Inf (infinity), NaN (not a number, the result of an undefined computation such as 0/0), NULL (the empty object), TRUE (=1), FALSE (=0)
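
A short sketch of how these values behave, using only base R functions. The vector v is an illustrative name:

```r
v <- c(1, NA, 3)
is.na(v)                    # FALSE TRUE FALSE
mean(v)                     # NA, because missing values propagate
mean(v, na.rm = TRUE)       # 2, after dropping the missing value
1 / 0                       # Inf
0 / 0                       # NaN
is.nan(0 / 0)               # TRUE
is.null(NULL)               # TRUE
length(NULL)                # 0
TRUE + TRUE                 # 2, logicals are coerced to 1 and 0 in arithmetic
sum(c(TRUE, FALSE, TRUE))   # 2, a common way to count matches
```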

Data Types and Classes

x = 22
class(x)                 # "numeric", class refers to how an object may be used
class("hi")              # "character"
typeof(x)                # "double", type refers to how an object is stored
x = as(22, "integer")    # type-casting
typeof(x)                # "integer"

Strings (or character vectors)

paste("a", "b")            # "a b", default separator is single space
paste("a", "b", sep="")    # "ab"

Vectors

A vector is a one-dimensional sequence of same-typed elements.

c(0, 1, 1, 2, 3, 5, 8)
x = 1:10      # same as x = c(1,2,3,4,5,6,7,8,9,10)
x[3]          # 3
x[[3]]        # 3
x[3:6]        # same as c(3,4,5,6)
x[-1:-5]      # c(6,7,8,9,10)
x[x>5]        # c(6,7,8,9,10)
x[x%%3 == 0]  # c(3, 6, 9)
x[c(1,5,10)]  # c(1,5,10)
x[12] = 12    # now x is c(1,2,3,4,5,6,7,8,9,10,NA,12)
seq(from=5,to=25,by=5)   # c(5,10,15,20,25)
sqrt(1:7)          # many built-in functions will automatically apply themselves to every element of a vector
sapply(1:7, sqrt)  # apply the function sequentially to each element of the vector

Arrays

An array is a multidimensional vector of same-typed elements.

> x = array(1:12, dim=c(3,4))
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x[1,3]             # 7
> x[,3]              # c(7,8,9)
> x[2:3,3]           # c(8,9)

Lists

A list is a heterogeneous collection of objects, each of which may be named.

x = list(thing="hat", size="8.25")
x[1]                                      # a one-element list containing thing="hat"
x$thing                                   # "hat"
x[["thing"]]                              # "hat"
x[["thin"]]                               # NULL
x[["thin", exact=FALSE]]                  # "hat", partial match on the start of the name
> nl = list(first=x,second=y)             # assumes y already exists; any object will do
> nl$first$thing                          # "hat"
> nl[[c("first", "thing")]]               # "hat" 

Factors and Levels

eye.colors = factor(c("brown", "blue", "blue", "green", "brown", "brown", "brown"))
levels(eye.colors)     # "blue"  "brown" "green"
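
A factor stores each value as an integer code that points into its levels. The sketch below, which uses only base R, shows the codes and a per-level count for the same eye.colors data. The names counts and codes are illustrative:

```r
eye.colors <- factor(c("brown", "blue", "blue", "green", "brown", "brown", "brown"))
counts <- table(eye.colors)       # observations per level: blue 2, brown 4, green 1
print(counts)
codes <- as.integer(eye.colors)   # underlying integer codes: 2 1 1 3 2 2 2
print(codes)
nlevels(eye.colors)               # 3
```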

Shingles

A shingle is a generalization of a factor to a continuous variable. A shingle consists of a numeric vector and a set of intervals. The intervals are allowed to overlap (much like roof shingles, hence the name shingles).
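
As a rough sketch, a shingle can be built with the shingle() function from the lattice package. The text above does not name a package, so treat lattice as an assumption here; the data and the interval endpoints are made up for illustration:

```r
library(lattice)                          # assumption: shingles as provided by lattice

x <- c(1, 3, 5, 7, 9)                     # the numeric vector
intervals <- rbind(c(0, 6), c(4, 10))     # two intervals that overlap between 4 and 6
sh <- shingle(x, intervals = intervals)
is.shingle(sh)                            # TRUE
```

The value 5 falls inside both intervals, which is exactly the overlap that a factor cannot express.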

Data Frames

A data frame is a list that contains multiple named vectors that are the same length, like a database table.

> teams <- c("PHI","NYM","FLA","ATL","WSN")
> w <- c(92, 89, 94, 72, 59)
> l <- c(70, 73, 77, 90, 102)
> nleast <- data.frame(teams,w,l)
> nleast
  teams  w   l
1   PHI 92  70
2   NYM 89  73
3   FLA 94  77
4   ATL 72  90
5   WSN 59 102
> nleast$w               # c(92, 89, 94, 72, 59)
> nleast$w[nleast$teams == "FLA"]  # 94
> cbind(df1, df2)        # combine two data frames into one by adding columns
> rbind(df1, df2)        # combine two data frames into one by adding rows

Models and Formulas

A linear model may be expressed as

y ~ x1 + x2 + ... + xn

which is a formula object. Example of linear model using built-in "cars" data set:

> cars.lm <- lm(formula=dist~speed,data=cars)
> cars.lm
Call:
lm(formula = dist ~ speed, data = cars)
Coefficients:
(Intercept)        speed  
    -17.579        3.932
> summary(cars.lm)     # shows lots more details

Other formula forms:

y ~ x1+x2+x3        # y is a linear function of x1, x2, and x3
y ~ x1+x2+x3+0      # y is a linear combination of x1, x2, and x3 with no intercept
y ~ x1+x2 | x3      # y is a linear function of x1 and x2 conditioned on x3 (conditioning is used by plotting functions such as coplot and lattice, not by lm)
y ~ I(x1+x2)        # y is a linear function of a single variable (x1+x2)
y ~ (x1+x2)*x3      # equivalent to y ~ x1+x2+x3+x1:x3+x2:x3, where x1:x3 is an interaction term
y ~ (x1+x2)^2       # equivalent to y ~ (x1+x2)*(x1+x2)
y ~ log(x1)+sin(x2) # can incorporate simple functions
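
One way to check how a formula expands is to ask R for its term labels. The sketch below uses terms() from base R and the built-in cars data set. The names with.intercept and no.intercept are illustrative:

```r
# Fit the same data with and without an intercept term.
with.intercept <- lm(dist ~ speed, data = cars)
no.intercept   <- lm(dist ~ speed + 0, data = cars)
names(coef(with.intercept))   # "(Intercept)" "speed"
names(coef(no.intercept))     # "speed"

# terms() expands a formula without needing any data.
attr(terms(y ~ (x1 + x2)^2), "term.labels")      # "x1" "x2" "x1:x2"
attr(terms(y ~ (x1 + x2) * x3), "term.labels")   # "x1" "x2" "x3" "x1:x3" "x2:x3"
```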

Functions

myFunc = function(x,y) { c(x+1, y+1) }
myFunc(2,3) # c(3,4)
myFunc # function(x,y){c(x+1,y+1)}

timethis <- function(...) {  # the "..." accepts any number of arguments and passes them on
   start.time <- Sys.time()
   eval.parent(...)          # evaluates the expression in the caller's environment
   print(Sys.time() - start.time)
}

> addemup <- function(x,...) {
+    args <- list(...)
+    for (a in args) x <- x + a
+    return(x)
+ }
> addemup(1,1)             # 2
> addemup(1,2,3,4,5)       # 15

assigns.x <- function(i) {x <<- i}  # assigns x in the enclosing environment (the global environment if x is not found on the way up)

Dates

as.Date("2009-09-08") + 7     # "2009-09-15"
today = Sys.Date()
format(today, "%m/%d/%Y")
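
A few more date operations, as a sketch using only base R. The names d1 and d2 are illustrative:

```r
d1 <- as.Date("2009-09-08")
d2 <- as.Date("09/15/2009", format = "%m/%d/%Y")   # parse a non-ISO date string
d2 - d1                                  # Time difference of 7 days
as.numeric(d2 - d1)                      # 7
format(d1, "%Y")                         # "2009"
seq(d1, by = "week", length.out = 3)     # "2009-09-08" "2009-09-15" "2009-09-22"
```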

Time Series

> ts(1:8,start=c(2008,2),frequency=4)
     Qtr1 Qtr2 Qtr3 Qtr4
2008         1    2    3
2009    4    5    6    7
2010    8

Environments

An environment is the set of symbols that are currently defined, together with their values. Each function call gets its own environment, and a list or data frame can be treated like one with with():

example.list <- list(a=1, b=2, c=3)
with(example.list, a+b+c) # 6
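
Environments can also be created and used directly. The sketch below is illustrative only (make.counter, counter, and e are made-up names): a function keeps state in the environment where it was defined, and new.env() creates a standalone environment.

```r
make.counter <- function() {
  count <- 0                 # lives in the environment created by this call
  function() {
    count <<- count + 1      # updates count in the enclosing environment
    count
  }
}
counter <- make.counter()
counter()                    # 1
counter()                    # 2

e <- new.env()               # an empty environment
assign("a", 10, envir = e)   # define the symbol a inside e
get("a", envir = e)          # 10
exists("a", envir = e, inherits = FALSE)   # TRUE
```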

Classes

> setClass("TimeSeries",
+   representation(
+     data="numeric",
+     start="POSIXct",
+     end="POSIXct"
+   )
+ )

> my.TimeSeries <- new("TimeSeries",
+    data=c(1,2,3,4,5,6),
+    start=as.POSIXct("07/01/2009 0:00:00",tz="GMT", format="%m/%d/%Y %H:%M:%S"),
+    end=as.POSIXct("07/01/2009 0:05:00",tz="GMT", format="%m/%d/%Y %H:%M:%S")
+ )

> setValidity("TimeSeries",
+    function(object) {
+      object@start <= object@end &&
+      length(object@start) == 1 &&
+      length(object@end) == 1
+    }
+  )
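
As a self-contained sketch of how the pieces fit together, the block below redefines the same class, reads a slot with @, and shows the validity function rejecting an object whose start is after its end. The names t0, good, and bad are illustrative, and the ISO date format is used only to keep the example short:

```r
library(methods)   # attached by default in interactive sessions

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

setValidity("TimeSeries", function(object) {
  length(object@start) == 1 &&
    length(object@end) == 1 &&
    object@start <= object@end
})

t0 <- as.POSIXct("2009-07-01 00:00:00", tz = "GMT")
good <- new("TimeSeries", data = c(1, 2, 3), start = t0, end = t0 + 300)
good@data                    # 1 2 3, slots are read with @
is(good, "TimeSeries")       # TRUE
validObject(good)            # raises no error for a valid object

# new() runs the validity check, so an inverted time range is an error.
bad <- tryCatch(
  new("TimeSeries", data = 1, start = t0 + 300, end = t0),
  error = function(e) "rejected")
bad                          # "rejected"
```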