R Basics
Latest revision as of 00:39, 28 February 2015
Install on Ubuntu
Add this line to /etc/apt/sources.list:
deb http://cran.cnr.Berkeley.edu/bin/linux/ubuntu trusty/
Then run apt-get update (as root or with sudo).
If you get an error message about GPG keys:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
Repeat this with the key ID from each additional signature error.
Install the base R packages:
apt-get install r-base
To get third-party packages from CRAN, start R as root. Then run a command like this:
install.packages(c("iterators", "foreach"))
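Before installing, it can help to see which packages are already present. The following is a minimal sketch, not part of the original instructions; missing_packages is a hypothetical helper name, and the actual install call is left commented out.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("iterators", "foreach")
to.install <- missing_packages(wanted)

if (length(to.install) > 0) {
  message("Not installed yet: ", paste(to.install, collapse = ", "))
  # install.packages(to.install)   # uncomment to actually install from CRAN
}
```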
Help System
help(glm)                  # get help on a function
?glm                       # same thing
example(glm)               # see an example
help.search("regression")  # search all help files
??regression               # same thing
vignette("affy")           # how to use a package
Running R
Start an interactive console with "R". Use the "q()" command to quit.
To run an R script from the command line:
R CMD BATCH generate_graphs.R
To run an R script from an interactive console:
source("generate_graphs.R")
Syntax
- A comment is everything from a pound sign (#) to the end of the line.
- Semicolons may be used to separate expressions on the same line.
- Curly braces group a sequence of expressions; the group evaluates to the value of its last expression.
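The three rules above can be seen together in a short sketch (the variable names are only illustrative):

```r
# Everything after a pound sign is ignored by R.
a <- 1; b <- 2      # two expressions on one line, separated by a semicolon

total <- {          # a braced group evaluates to its last expression
  tmp <- a + b
  tmp * 10
}
print(total)        # 30
```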
Control
if (x < y) x else y
repeat {if (i > 25) break else {print(i); i <- i + 5;}}
while (i <= 25) {print(i); i <- i + 5}
for (i in seq(from=5,to=25,by=5)) print(i)
iterators package (from CRAN)
> library(iterators)
> onetofive = iter(1:5)
> nextElem(onetofive)  # 1
foreach package (from CRAN)
> library(foreach)
> foreach(i=1:5) %do% sqrt(i)  # returns a list: 1, 1.414214, 1.732051, 2, 2.236068
Assignment
x = 34
x <- 34
34 -> x
Values are generally copied on assignment, so changing the new variable leaves the original unchanged.
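A small sketch of what copying on assignment means in practice (the variable names are illustrative):

```r
a <- c(1, 2, 3)
b <- a          # b behaves as an independent copy of a
b[1] <- 99      # modifying b ...
print(a)        # ... leaves a unchanged: 1 2 3
print(b)        # 99 2 3
```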
Special Values
NA (missing data), Inf (infinity), NaN (not a number, the result of an undefined computation such as 0/0), NULL (the empty object), TRUE (=1), FALSE (=0)
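The following sketch shows how these values behave in ordinary computations; the vector x is only an illustration.

```r
x <- c(4, NA, 10)
is.na(x)                # FALSE TRUE FALSE
mean(x)                 # NA: missing values propagate through computations
mean(x, na.rm = TRUE)   # 7: drop missing values first
1 / 0                   # Inf
0 / 0                   # NaN
is.na(0 / 0)            # TRUE: NaN is also treated as missing
length(NULL)            # 0
TRUE + TRUE             # 2: logicals act as 1 and 0 in arithmetic
```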
Data Types and Classes
x = 22
class(x)               # "numeric", class refers to how an object may be used
class("hi")            # "character"
typeof(x)              # "double", type refers to how an object is stored
x = as(22, "integer")  # type-casting
typeof(x)              # "integer"
Strings (or character vectors)
paste("a", "b")          # "a b", default separator is single space
paste("a", "b", sep="")  # "ab"
Vectors
A vector is a one-dimensional sequence of same-typed elements.
c(0, 1, 1, 2, 3, 5, 8)
x = 1:10               # same as x = c(1,2,3,4,5,6,7,8,9,10)
x[3]                   # 3
x[[3]]                 # 3
x[3:6]                 # same as c(3,4,5,6)
x[-1:-5]               # c(6,7,8,9,10)
x[x>5]                 # c(6,7,8,9,10)
x[x%%3 == 0]           # c(3, 6, 9)
x[c(1,5,10)]           # c(1,5,10)
x[12] = 12             # now x is c(1,2,3,4,5,6,7,8,9,10,NA,12)
seq(from=5,to=25,by=5) # c(5,10,15,20,25)
sqrt(1:7)              # many built-in functions will automatically apply themselves to every element of a vector
sapply(1:7, sqrt)      # apply a function to each element of the vector in turn
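Because all elements of a vector share one type, mixed input to c() is coerced to a common type. A brief sketch (v and w are illustrative names):

```r
v <- c(1, "two", TRUE)    # mixed input is coerced to a single common type
class(v)                  # "character"
v                         # "1" "two" "TRUE"

w <- c(1.5, TRUE, FALSE)  # logicals become numbers when mixed with numerics
w                         # 1.5 1.0 0.0
```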
Arrays
An array is a multidimensional vector of same-typed elements.
> x = array(1:12, dim=c(3,4))
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x[1,3]    # 7
> x[,3]     # c(7,8,9)
> x[2:3,3]  # c(8,9)
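As a further sketch, the same array can be summarized along a dimension with apply(), and arrays may have more than two dimensions (m and a3 are illustrative names):

```r
m <- array(1:12, dim = c(3, 4))   # filled column by column
dim(m)                            # 3 4
apply(m, 1, sum)                  # row sums: 22 26 30
apply(m, 2, sum)                  # column sums: 6 15 24 33

a3 <- array(1:24, dim = c(2, 3, 4))  # a three-dimensional array
a3[2, 3, 4]                          # 24, the last element
```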
Lists
A list is a heterogeneous collection of objects, each of which may be named.
x = list(thing="hat", size="8.25")
x[1]                       # a one-element list containing thing="hat"
x$thing                    # "hat"
x[["thing"]]               # "hat"
x[["thin"]]                # NULL
x[["thin", exact=FALSE]]   # "hat", partial match on the start of the name
> nl = list(first=x,second=y)   # y is any other object
> nl$first$thing                # "hat"
> nl[[c("first", "thing")]]     # "hat"
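The difference between single and double brackets, and how elements are added or removed, can be sketched as follows (item is an illustrative name):

```r
item <- list(thing = "hat", size = "8.25")
class(item[1])        # "list": single brackets keep the list wrapper
class(item[[1]])      # "character": double brackets extract the element itself

item$color <- "red"   # assigning to a new name adds an element
names(item)           # "thing" "size" "color"
item$size <- NULL     # assigning NULL removes an element
length(item)          # 2
```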
Factors and Levels
eye.colors = factor(c("brown", "blue", "blue", "green", "brown", "brown", "brown"))
levels(eye.colors)  # "blue" "brown" "green"
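Internally a factor stores integer codes that index into its levels. A short sketch using the same data:

```r
eye.colors <- factor(c("brown", "blue", "blue", "green", "brown", "brown", "brown"))
as.integer(eye.colors)   # 2 1 1 3 2 2 2: positions in levels(eye.colors)
nlevels(eye.colors)      # 3
table(eye.colors)        # counts per level: blue 2, brown 4, green 1
```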
Shingles
A shingle (provided by the lattice package) is a generalization of a factor to a continuous variable. A shingle consists of a numeric vector and a set of intervals. The intervals are allowed to overlap, much like roof shingles, hence the name.
Data Frames
A data frame is a list that contains multiple named vectors that are the same length, like a database table.
> teams <- c("PHI","NYM","FLA","ATL","WSN")
> w <- c(92, 89, 94, 72, 59)
> l <- c(70, 73, 77, 90, 102)
> nleast <- data.frame(teams,w,l)
> nleast
  teams  w   l
1   PHI 92  70
2   NYM 89  73
3   FLA 94  77
4   ATL 72  90
5   WSN 59 102
nleast$w                          # c(92, 89, 94, 72, 59)
nleast$w[nleast$teams == "FLA"]   # 94
cbind(df1, df2)                   # combine two dataframes into one by adding columns
rbind(df1, df2)                   # combine two dataframes into one by adding rows
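Continuing with the same data, the sketch below adds a derived column, filters rows, and sorts. The column name pct and the variable winners are illustrative additions.

```r
teams <- c("PHI", "NYM", "FLA", "ATL", "WSN")
w <- c(92, 89, 94, 72, 59)
l <- c(70, 73, 77, 90, 102)
nleast <- data.frame(teams, w, l)

# Add a derived column: winning percentage, rounded to three places.
nleast$pct <- round(nleast$w / (nleast$w + nleast$l), 3)

# Keep only the rows where wins exceed losses (note the trailing comma).
winners <- nleast[nleast$w > nleast$l, ]
as.character(winners$teams)                    # "PHI" "NYM" "FLA"

# Team names sorted by winning percentage, best first.
as.character(nleast$teams[order(-nleast$pct)]) # "PHI" "FLA" "NYM" "ATL" "WSN"
```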
Models and Formulas
A linear model may be expressed as
y ~ x1 + x2 + ... + xn
which is a formula object. Example of linear model using built-in "cars" data set:
> cars.lm <- lm(formula=dist~speed,data=cars)
> cars.lm

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed
    -17.579        3.932

> summary(cars.lm)  # shows lots more details
y ~ x1+x2+x3         # y is a linear function of x1, x2, and x3
y ~ x1+x2+x3+0       # y is a linear combination of x1, x2, and x3 with no intercept
y ~ x1+x2 | x3       # y is a linear function of x1 and x2 conditioned on x3
y ~ I(x1+x2)         # y is a linear function of a single variable (x1+x2)
y ~ (x1+x2)*x3       # equivalent to y ~ x1+x2+x3+x1:x3+x2:x3, where ":" denotes an interaction term
y ~ (x1+x2)^2        # equivalent to y ~ (x1+x2)*(x1+x2), i.e. y ~ x1+x2+x1:x2
y ~ log(x1)+sin(x2)  # can incorporate simple functions
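One way to check how a formula expands is to inspect its terms object; no data is needed for this. A minimal sketch (f, g, and h are illustrative names):

```r
f <- y ~ (x1 + x2) * x3
attr(terms(f), "term.labels")   # "x1" "x2" "x3" "x1:x3" "x2:x3"

g <- y ~ (x1 + x2)^2
attr(terms(g), "term.labels")   # "x1" "x2" "x1:x2"

h <- y ~ x1 + x2 + 0
attr(terms(h), "intercept")     # 0: the intercept has been removed
```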
Functions
myFunc = function(x,y) { c(x+1, y+1) }
myFunc(2,3)  # c(3,4)
myFunc       # prints the definition: function(x,y) { c(x+1, y+1) }

timethis <- function(...) {  # the "..." takes an arbitrary list of arguments and passes them on
  start.time <- Sys.time();
  eval.parent(...);          # evaluates the expression in the caller's environment
  print(Sys.time() - start.time);
}

> addemup <- function(x,...) {
+   args <- list(...)
+   for (a in args) x <- x + a
+   return(x)
+ }
> addemup(1,1)        # 2
> addemup(1,2,3,4,5)  # 15

assigns.x <- function(i) {x <<- i}  # assigns x in an enclosing environment (the global environment if x is not found on the way up)
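A common use of the <<- operator is a closure that keeps state between calls. The sketch below is illustrative; make.counter is a hypothetical name, not something from the original page.

```r
make.counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates count in the enclosing function's environment
    count
  }
}

counter <- make.counter()
counter()   # 1
counter()   # 2

other <- make.counter()   # each counter has its own private count
other()     # 1
```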
Dates
as.Date("2009-09-08") + 7   # "2009-09-15"
today = Sys.Date()
format(today, "%m/%d/%Y")   # today's date as month/day/year
Time Series
> ts(1:8,start=c(2008,2),frequency=4)
     Qtr1 Qtr2 Qtr3 Qtr4
2008         1    2    3
2009    4    5    6    7
2010    8
Environments
An environment is a set of symbol bindings (names and their values). Each function call gets its own environment, and a list or data frame can serve as one: with() evaluates an expression using the object's elements as variables.
example.list <- list(a=1, b=2, c=3)
with(example.list, a+b+c)  # 6
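Environments can also be created and inspected directly. A minimal sketch (e and f are illustrative names):

```r
e <- new.env()               # a fresh, empty environment
assign("a", 1, envir = e)    # bind a symbol inside it
e$b <- 2                     # the $ operator works as well
ls(e)                        # "a" "b"
eval(quote(a + b), e)        # 3: evaluate an expression using e's bindings

f <- function() {
  local.var <- 5
  environment()              # return the environment of this function call
}
ls(f())                      # "local.var"
```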
Classes
> setClass("TimeSeries",
+   representation(
+     data="numeric",
+     start="POSIXct",
+     end="POSIXct"
+   )
+ )
> my.TimeSeries <- new("TimeSeries",
+   data=c(1,2,3,4,5,6),
+   start=as.POSIXct("07/01/2009 0:00:00",tz="GMT", format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("07/01/2009 0:05:00",tz="GMT", format="%m/%d/%Y %H:%M:%S")
+ )
> setValidity("TimeSeries",
+   function(object) {
+     object@start <= object@end &&
+     length(object@start) == 1 &&
+     length(object@end) == 1
+   }
+ )
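Once a validity function is registered, new() rejects objects that fail it. The sketch below repeats the class definition so that it is self-contained; as an assumption of this example, the validity function returns a message string on failure instead of FALSE, and the names t0, t1, ok, and bad are illustrative.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# Variation on the validity check: return TRUE, or a message describing the problem.
setValidity("TimeSeries", function(object) {
  if (length(object@start) != 1 || length(object@end) != 1)
    return("start and end must each have length 1")
  if (object@start > object@end)
    return("start must not be after end")
  TRUE
})

t0 <- as.POSIXct("2009-07-01 00:00:00", tz = "GMT")
t1 <- as.POSIXct("2009-07-01 00:05:00", tz = "GMT")

ok <- new("TimeSeries", data = c(1, 2, 3), start = t0, end = t1)
ok@data   # slots are read with @: 1 2 3

# Swapping start and end violates the validity check, so new() signals an error.
bad <- tryCatch(
  new("TimeSeries", data = 1, start = t1, end = t0),
  error = function(e) conditionMessage(e)
)
print(bad)
```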