R Basics
Latest revision as of 00:39, 28 February 2015
Install on Ubuntu
Add this line to /etc/apt/sources.list:
deb http://cran.cnr.Berkeley.edu/bin/linux/ubuntu trusty/
Then run apt-get update to refresh the package index.
If you get an error message about GPG keys:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
Repeat this for each additional key ID reported in the error messages.
Install the basic R packages:
apt-get install r-base
To get third-party packages from CRAN, start R as root. Then run a command like this:
install.packages(c("iterators", "foreach"))
Help System
help(glm)                    # get help on a function
?glm                         # same thing
example(glm)                 # see an example
help.search("regression")    # search all help files
??regression                 # same thing
vignette("affy")             # how to use a package
Running R
Start an interactive console with "R".  Use the "q()" command to quit.
To run an R script from the command line:
R CMD BATCH generate_graphs.R
To run an R script from an interactive console:
source("generate_graphs.R")
Syntax
- A comment is everything from a pound sign (#) to the end of the line.
- Semicolons may be used to separate expressions on the same line.
- Curly braces group a sequence of expressions; the value of the group is the value of its last expression.
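As a small illustration of the three rules above, here is a minimal sketch. The variable names (a, b, total) are hypothetical and are not used elsewhere on this page.

```r
# Everything after a pound sign is ignored by R.
a <- 1; b <- 2        # two expressions separated by a semicolon

total <- {
  a <- a + 10         # braces do not create a new scope, so a is now 11
  a + b               # the value of the whole block: 13
}
total
```

Because the braces only group expressions, the assignment to a inside the block is still visible afterwards.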
Control
if (x < y) x else y
 
repeat {if (i > 25) break else {print(i); i <- i + 5;}}
  
while (i <= 25) {print(i); i <- i + 5}
for (i in seq(from=5,to=25,by=5)) print(i)
iterators package (from CRAN)
> library(iterators)
> onetofive = iter(1:5)
> nextElem(onetofive)    # 1
foreach package (from CRAN)
> library(foreach)
> foreach(i=1:5) %do% sqrt(i)    # a list containing 1, 1.414214, 1.732051, 2, 2.236068
Assignment
x = 34
x <- 34
34 -> x
Values are generally copied on assignment.
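A minimal sketch of what copying on assignment means in practice. The names y, z and bump are hypothetical and chosen only for this example.

```r
x <- c(1, 2, 3)
y <- x            # y behaves as an independent copy of x
y[1] <- 99        # modifying y ...
x                 # ... leaves x unchanged: 1 2 3

# The same holds for function arguments: the caller's object is not modified.
bump <- function(v) { v[1] <- v[1] + 1; v }
z <- bump(x)      # z is 2 2 3, while x is still 1 2 3
```

The word "generally" matters: environments are a notable exception and are shared rather than copied.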
Special Values
NA (for missing data), Inf (infinity), NaN (bad computation), NULL, TRUE (=1), FALSE (=0)
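The following sketch shows how these special values tend to behave in ordinary computations. The variable names are hypothetical.

```r
v <- c(1, NA, 3)
na.flags   <- is.na(v)                 # FALSE TRUE FALSE
plain.mean <- mean(v)                  # NA: missing data propagates
clean.mean <- mean(v, na.rm = TRUE)    # 2: NA values dropped first

pos.inf <- 1 / 0                       # Inf
not.num <- 0 / 0                       # NaN
is.nan(not.num)                        # TRUE

is.null(NULL)                          # TRUE
true.count <- sum(c(TRUE, FALSE, TRUE))  # 2, since TRUE counts as 1 and FALSE as 0
```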
Data Types and Classes
x = 22
class(x)                 # "numeric", class refers to how an object may be used
class("hi")              # "character"
typeof(x)                # "double", type refers to how an object is stored
x = as(22, "integer")    # type-casting
typeof(x)                # "integer"
Strings (or character vectors)
paste("a", "b")            # "a b", default separator is single space
paste("a", "b", sep="")    # "ab"
Vectors
A vector is a one-dimensional sequence of same-typed elements.
c(0, 1, 1, 2, 3, 5, 8)
x = 1:10                 # same as x = c(1,2,3,4,5,6,7,8,9,10)
x[3]                     # 3
x[[3]]                   # 3
x[3:6]                   # same as c(3,4,5,6)
x[-1:-5]                 # c(6,7,8,9,10)
x[x>5]                   # c(6,7,8,9,10)
x[x%%3 == 0]             # c(3, 6, 9)
x[c(1,5,10)]             # c(1,5,10)
x[12] = 12               # now x is c(1,2,3,4,5,6,7,8,9,10,NA,12)
seq(from=5,to=25,by=5)   # c(5,10,15,20,25)
sqrt(1:7)                # many built-in functions will automatically apply themselves to every element of a vector
sapply(1:7, sqrt)        # apply the function sequentially to each element of the vector
Arrays
An array is a multidimensional vector of same-typed elements.
> x = array(1:12, dim=c(3,4))
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x[1,3]      # 7
> x[,3]       # c(7,8,9)
> x[2:3,3]    # c(8,9)
Lists
A list is a heterogeneous collection of objects, each of which may be named.
x = list(thing="hat", size="8.25")
x[1]                                      # a one-element list containing "hat"
x$thing                                   # "hat"
x[["thing"]]                              # "hat"
x[["thin"]]                               # NULL
x[["thin", exact=FALSE]]                  # "hat", partial match on the name
> nl = list(first=x,second=1:3)
> nl$first$thing                          # "hat"
> nl[[c("first", "thing")]]               # "hat"
Factors and Levels
eye.colors = factor(c("brown", "blue", "blue", "green", "brown", "brown", "brown"))
levels(eye.colors)     # "blue"  "brown" "green"
Shingles
A shingle is a generalization of a factor to a continuous variable. A shingle consists of a numeric vector and a set of intervals. The intervals are allowed to overlap (much like roof shingles, hence the name shingles).
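A minimal sketch of building a shingle, assuming the shingle() and is.shingle() functions from the lattice package. The text above does not name a package, so that choice is an assumption, and the data and interval bounds are invented for illustration.

```r
library(lattice)   # assumption: shingles as implemented by the lattice package

x <- 1:10
# Each row is one interval (lower, upper); neighbouring intervals overlap.
intervals <- rbind(c(0, 4), c(3, 7), c(6, 10))
s <- shingle(x, intervals = intervals)

is.shingle(s)      # TRUE
levels(s)          # shows the intervals attached to the shingle
```

Here the values 3, 4, 6 and 7 each fall into two intervals, which is the overlap the roofing metaphor refers to.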
Data Frames
A data frame is a list that contains multiple named vectors that are the same length, like a database table.
> teams <- c("PHI","NYM","FLA","ATL","WSN")
> w <- c(92, 89, 94, 72, 59)
> l <- c(70, 73, 77, 90, 102)
> nleast <- data.frame(teams,w,l)
> nleast
  teams  w   l
1   PHI 92  70
2   NYM 89  73
3   FLA 94  77
4   ATL 72  90
5   WSN 59 102
nleast$w               # c(92, 89, 94, 72, 59)
nleast$w[nleast$teams == "FLA"]  # 94
cbind(df1, df2)        # combine two dataframes into one by adding columns
rbind(df1, df2)        # combine two dataframes into one by adding rows
Models and Formulas
A linear model may be expressed as
y ~ x1 + x2 + ... + xn
which is a formula object. Example of linear model using built-in "cars" data set:
> cars.lm <- lm(formula=dist~speed,data=cars)
> cars.lm

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed
    -17.579        3.932

> summary(cars.lm)    # shows lots more details
y ~ x1+x2+x3          # y is a linear function of x1, x2, and x3
y ~ x1+x2+x3+0        # y is a linear combination of x1, x2, and x3 with no intercept
y ~ x1+x2 | x3        # y is a linear function of x1 and x2 conditioned on x3
y ~ I(x1+x2)          # y is a linear function of a single variable (x1+x2)
y ~ (x1+x2)*x3        # equivalent to y ~ x1+x2+x3+x1:x3+x2:x3
y ~ (x1+x2)^2         # equivalent to y ~ (x1+x2)*(x1+x2), i.e. y ~ x1+x2+x1:x2
y ~ log(x1)+sin(x2)   # can incorporate simple functions
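One way to check how R expands a formula is to inspect its terms. The sketch below uses the base terms() function; the helper name expand is hypothetical.

```r
# Return the term labels R derives from a formula (helper name is illustrative).
expand <- function(f) attr(terms(f), "term.labels")

crossed   <- expand(y ~ (x1 + x2) * x3)   # "x1" "x2" "x3" "x1:x3" "x2:x3"
squared   <- expand(y ~ (x1 + x2)^2)      # "x1" "x2" "x1:x2"
protected <- expand(y ~ I(x1 + x2))       # "I(x1 + x2)"

# A trailing +0 removes the intercept.
no.intercept <- attr(terms(y ~ x1 + x2 + 0), "intercept")   # 0
```

In formula notation a:b denotes the interaction between a and b, so the * and ^ operators add interaction terms rather than arithmetic products.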
Functions
myFunc = function(x,y) { c(x+1, y+1) }
myFunc(2,3) # c(3,4)
myFunc # function(x,y){c(x+1,y+1)}
timethis <- function(...) {  # the "..." accepts an arbitrary list of arguments and passes them on
   start.time <- Sys.time()
   eval.parent(...)          # arguments are evaluated lazily, so the timed expression runs here
   print(Sys.time() - start.time)
}
> addemup <- function(x,...) {
+    args <- list(...)
+    for (a in args) x <- x + a
+    return(x)
+ }
> addemup(1,1)             # 2
> addemup(1,2,3,4,5)       # 15
assigns.x <- function(i) {x <<- i}  # assigns x in the nearest enclosing environment that defines it, or else in the global environment
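A common use of the <<- operator is keeping state between calls in a closure. The following is a minimal sketch; make.counter and counter are hypothetical names.

```r
make.counter <- function() {
  count <- 0                 # lives in the environment of this call
  function() {
    count <<- count + 1      # updates count in the enclosing environment
    count
  }
}

counter <- make.counter()
first  <- counter()          # 1
second <- counter()          # 2
```

Each call to make.counter() creates a fresh count, so separate counters do not interfere with one another.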
Dates
as.Date("2009-09-08") + 7     # "2009-09-15"
today = Sys.Date()
format(today, "%m/%d/%Y")
Time Series
> ts(1:8,start=c(2008,2),frequency=4)
     Qtr1 Qtr2 Qtr3 Qtr4
2008         1    2    3
2009    4    5    6    7
2010    8
Environments
An environment consists of the set of symbols that have been defined. Each function call gets its own environment, and a list or data frame can be used like one with with():
example.list <- list(a=1, b=2, c=3)
with(example.list, a+b+c)    # 6
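Environments can also be created and inspected directly. This is a minimal sketch using base functions; the names e, total, f and local.value are hypothetical.

```r
e <- new.env()                   # an empty environment
assign("a", 10, envir = e)       # define the symbol a inside e
e$b <- 20                        # shorthand for the same kind of assignment
sort(ls(e))                      # "a" "b"
total <- with(e, a + b)          # 30, evaluated using the symbols in e

f <- function() {
  local.value <- 1               # exists only in this call's environment
  environment()                  # return that environment
}
ls(f())                          # "local.value"
```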
Classes
> setClass("TimeSeries",
+   representation(
+     data="numeric",
+     start="POSIXct",
+     end="POSIXct"
+   )
+ )
> my.TimeSeries <- new("TimeSeries",
+    data=c(1,2,3,4,5,6),
+    start=as.POSIXct("07/01/2009 0:00:00",tz="GMT", format="%m/%d/%Y %H:%M:%S"),
+    end=as.POSIXct("07/01/2009 0:05:00",tz="GMT", format="%m/%d/%Y %H:%M:%S")
+ )
> setValidity("TimeSeries",
+    function(object) {
+      object@start <= object@end &&
+      length(object@start) == 1 &&
+      length(object@end) == 1
+    }
+  )
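Once a validity function is registered, new() checks it whenever slots are supplied. The sketch below redefines the class so that it runs on its own; it is a variation on the validity function above that returns a message string on failure, and the names t0, t1, good and bad are hypothetical.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# Return TRUE when the object is valid, otherwise a description of the problem.
setValidity("TimeSeries", function(object) {
  if (length(object@start) != 1 || length(object@end) != 1)
    return("start and end must each have length 1")
  if (object@start > object@end)
    return("start must not be after end")
  TRUE
})

t0 <- as.POSIXct("2009-07-01 00:00:00", tz = "GMT")
t1 <- as.POSIXct("2009-07-01 00:05:00", tz = "GMT")

good <- new("TimeSeries", data = c(1, 2, 3), start = t0, end = t1)
good@data                        # slots are accessed with @

# Swapping start and end fails validation; capture the error message.
bad <- tryCatch(
  new("TimeSeries", data = 1, start = t1, end = t0),
  error = function(e) conditionMessage(e)
)
```

Returning a character string instead of FALSE lets the error report say which condition failed.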