Scott: Created page with '==Basic stats== '''mean, min, max, range = c(min, max)'''

 mean(c(1,2,3,4,5,NA),na.rm=TRUE)   # 3, ignore NA's mean(c(-1,0:100,2000),trim=0.1)    # 50, ignore …'

2011-02-01T23:31:13Z

New page

==Basic stats==

'''<code>mean, min, max, range</code>''' (<code>range(x)</code> returns <code>c(min(x), max(x))</code>)
<pre>
mean(c(1,2,3,4,5,NA),na.rm=TRUE) # 3, NA values are dropped before averaging
mean(c(-1,0:100,2000),trim=0.1) # 50, drops 10% of the sorted values from each end before averaging
</pre>

'''<code>quantile, fivenum, IQR, summary</code>''' give quantile-related stats
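As a small illustration of these functions, the sketch below uses a made-up vector <code>x</code> (not taken from any real data set). Note that <code>fivenum</code> reports Tukey's hinges, which can differ slightly from the quartiles that <code>quantile</code> computes with its default method.
<pre>
x <- 1:10          # made-up example data

range(x)           # 1 10, same as c(min(x), max(x))
quantile(x)        # 0%: 1, 25%: 3.25, 50%: 5.5, 75%: 7.75, 100%: 10
fivenum(x)         # 1 3 5.5 8 10 (min, lower hinge, median, upper hinge, max)
IQR(x)             # 4.5, the 75% quantile minus the 25% quantile
summary(x)         # min, quartiles, median, mean and max in one call
</pre>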

==Correlation and covariance==

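'''Pearson''' correlation measures linear association; significance tests based on it (e.g. <code>cor.test</code>) assume the data are approximately bivariate normal.

'''Spearman''' correlation is a nonparametric, rank-based measure of monotonic association and makes no assumption about the underlying distribution: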
<pre>
cor(x, y, method="spearman") # Spearman rank correlation (the default method is "pearson")
cov(x, y, method="pearson") # covariance ("spearman" and "kendall" are also accepted)
</pre>
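To make the difference between the two methods concrete, here is a minimal sketch on made-up data in which <code>y</code> is a monotonic but non-linear function of <code>x</code>. The vectors are purely illustrative.
<pre>
x <- 1:10                       # made-up example data
y <- x^3                        # increases with x, but not linearly

cor(x, y)                       # Pearson (default): below 1, the relationship is not a straight line
cor(x, y, method = "spearman")  # 1, the ranks of x and y agree perfectly
cov(x, x)                       # covariance of a variable with itself equals var(x)
</pre>
Spearman reaches 1 here because it only looks at the ordering of the values, while Pearson is pulled below 1 by the curvature.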

==Principal Components Analysis==

http://en.wikipedia.org/wiki/Principal_component_analysis

http://www.youtube.com/watch?v=BfTMmoDFXyE

You have a data set in N dimensions. The first principal component is the linear combination of these dimensions that explains the most variance in the data. The second principal component is orthogonal to the first and explains the most of the remaining variance, and so on. PCA is useful for exploring a large multi-dimensional data set.

'''<code>princomp</code>''' computes the components from an eigenvalue decomposition of the data covariance (or correlation) matrix.

'''<code>prcomp</code>''' uses a singular value decomposition of the centered (and optionally scaled) data matrix, which generally gives better numerical accuracy.
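A short sketch of both functions follows. It uses <code>USArrests</code>, a data set that ships with R; that choice, and the decision to standardize the variables, are assumptions of this example rather than part of the notes above.
<pre>
# USArrests is a built-in data set; picking it is this example's assumption
pc_svd <- prcomp(USArrests, scale. = TRUE)   # SVD route, variables standardized first
pc_eig <- princomp(USArrests, cor = TRUE)    # eigendecomposition of the correlation matrix

summary(pc_svd)        # standard deviation and proportion of variance per component
pc_svd$rotation        # loadings: weight of each original variable in each component
head(pc_svd$x)         # scores: the observations expressed in component coordinates

# With standardized variables both routes give the same component standard deviations
all.equal(pc_svd$sdev, unname(pc_eig$sdev))   # TRUE (up to numerical tolerance)
</pre>
The signs of the loadings may differ between the two results, since a principal component is only defined up to a sign flip.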

==Probability Distributions==
<pre>
dnorm(x, mean = 0, sd = 1, log = FALSE) # density function, dnorm(0) = 0.3989423
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) # distribution function: pnorm(0) = 0.5
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) # quantile function: qnorm(0.5) = 0 (aka inverse dist fn)
rnorm(n, mean = 0, sd = 1) # generate n random values from norm dist
</pre>

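The same four function families (<code>d</code>, <code>p</code>, <code>q</code>, <code>r</code> prefixes) are available for other distributions, e.g. <code>dbeta</code>, <code>dbinom</code>, <code>dcauchy</code>.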
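As a quick illustration of the naming pattern, the following sketch calls the normal and binomial variants with arbitrary example arguments. The seed value is likewise arbitrary and only there to make the random draws repeatable.
<pre>
p_low  <- pnorm(1.96)                        # about 0.975
q_high <- qnorm(0.975)                       # about 1.96
round_trip <- qnorm(pnorm(1.5))              # 1.5, the q function inverts the p function

d3 <- dbinom(3, size = 10, prob = 0.5)       # 0.1171875, P(X = 3) for 10 fair coin flips
p3 <- pbinom(3, size = 10, prob = 0.5)       # 0.171875, P(X <= 3)

set.seed(42)                                 # arbitrary seed so the draws are reproducible
draws <- rbinom(5, size = 10, prob = 0.5)    # five random draws from the same binomial
</pre>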

==Compare data set to a distribution function==
<pre>
shapiro.test(x) # Shapiro-Wilk test for normality; a small p-value is evidence AGAINST normality
ks.test(x, "pnorm") # Kolmogorov-Smirnov test of whether x came from the named distribution (a CDF name such as "pnorm" or "pexp"); a small p-value means a poor match
</pre>
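The sketch below runs both tests on simulated data so the direction of the p-values is easy to see. In both tests the null hypothesis is that the data follow the reference distribution, so a small p-value is evidence against a match. The sample sizes, seed and distributions are arbitrary choices for illustration.
<pre>
set.seed(1)                 # arbitrary seed for reproducibility
x_norm <- rnorm(100)        # simulated sample from a standard normal
x_exp  <- rexp(100)         # simulated sample from an exponential (clearly non-normal)

shapiro.test(x_norm)$p.value       # expected to be large: no evidence against normality
shapiro.test(x_exp)$p.value        # very small: normality is rejected

ks.test(x_norm, "pnorm")$p.value   # compared with N(0, 1), the distribution it was drawn from
ks.test(x_exp,  "pnorm")$p.value   # very small: x_exp does not look like N(0, 1)
</pre>
If the parameters of the reference distribution are estimated from the same sample, the Kolmogorov-Smirnov p-value is no longer exact and should be read with caution.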

R Statistics - Revision history
