<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.scott5.org/index.php?action=history&amp;feed=atom&amp;title=Statistics</id>
	<title>Statistics - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.scott5.org/index.php?action=history&amp;feed=atom&amp;title=Statistics"/>
	<link rel="alternate" type="text/html" href="https://wiki.scott5.org/index.php?title=Statistics&amp;action=history"/>
	<updated>2026-04-13T00:31:11Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://wiki.scott5.org/index.php?title=Statistics&amp;diff=1172&amp;oldid=prev</id>
		<title>Scott: /* Mean */</title>
		<link rel="alternate" type="text/html" href="https://wiki.scott5.org/index.php?title=Statistics&amp;diff=1172&amp;oldid=prev"/>
		<updated>2013-12-10T00:46:16Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Mean&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 00:46, 10 December 2013&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot;&gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 11:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* population mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; is the average over the entire population of size N &amp;lt;math&amp;gt;\mu = \frac{1}{N}\sum^N x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* population mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; is the average over the entire population of size N &amp;lt;math&amp;gt;\mu = \frac{1}{N}\sum^N x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* sample mean &amp;lt;math&amp;gt;\overline{x}&amp;lt;/math&amp;gt; is the average over a sample of size n (usually n &amp;lt;&amp;lt; N) &amp;lt;math&amp;gt;\overline{x} = \frac{1}{n}\sum^n x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* sample mean &amp;lt;math&amp;gt;\overline{x}&amp;lt;/math&amp;gt; is the average over a sample of size n (usually n &amp;lt;&amp;lt; N) &amp;lt;math&amp;gt;\overline{x} = \frac{1}{n}\sum^n x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* test is &amp;lt;math&amp;gt;\xi\frac{\alpha}{\beta - 1}&amp;lt;/math&amp;gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Variance and Standard Deviation==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Variance and Standard Deviation==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Scott</name></author>
	</entry>
	<entry>
		<id>https://wiki.scott5.org/index.php?title=Statistics&amp;diff=1171&amp;oldid=prev</id>
		<title>Scott: /* Mean */</title>
		<link rel="alternate" type="text/html" href="https://wiki.scott5.org/index.php?title=Statistics&amp;diff=1171&amp;oldid=prev"/>
		<updated>2013-12-10T00:46:06Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Mean&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 00:46, 10 December 2013&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot;&gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 11:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* population mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; is the average over the entire population of size N &amp;lt;math&amp;gt;\mu = \frac{1}{N}\sum^N x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* population mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; is the average over the entire population of size N &amp;lt;math&amp;gt;\mu = \frac{1}{N}\sum^N x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* sample mean &amp;lt;math&amp;gt;\overline{x}&amp;lt;/math&amp;gt; is the average over a sample of size n (usually n &amp;lt;&amp;lt; N) &amp;lt;math&amp;gt;\overline{x} = \frac{1}{n}\sum^n x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* sample mean &amp;lt;math&amp;gt;\overline{x}&amp;lt;/math&amp;gt; is the average over a sample of size n (usually n &amp;lt;&amp;lt; N) &amp;lt;math&amp;gt;\overline{x} = \frac{1}{n}\sum^n x_i&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* test is &amp;lt;math&amp;gt;\xi\frac{\alpha}{\beta - 1}&amp;lt;/math&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Variance and Standard Deviation==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Variance and Standard Deviation==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Scott</name></author>
	</entry>
	<entry>
		<id>https://wiki.scott5.org/index.php?title=Statistics&amp;diff=91&amp;oldid=prev</id>
		<title>Scott at 17:33, 31 January 2011</title>
		<link rel="alternate" type="text/html" href="https://wiki.scott5.org/index.php?title=Statistics&amp;diff=91&amp;oldid=prev"/>
		<updated>2011-01-31T17:33:06Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;from Biostatistics, by Paulson 2008&lt;br /&gt;
&lt;br /&gt;
==Normal Distribution==&lt;br /&gt;
&lt;br /&gt;
* 68% of area lies within one standard deviation of the mean&lt;br /&gt;
* 95% for two&lt;br /&gt;
* 99.7% for three standard deviations&lt;br /&gt;
&lt;br /&gt;
==Mean==&lt;br /&gt;
&lt;br /&gt;
* population mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; is the average over the entire population of size N &amp;lt;math&amp;gt;\mu = \frac{1}{N}\sum^N x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
* sample mean &amp;lt;math&amp;gt;\overline{x}&amp;lt;/math&amp;gt; is the average over a sample of size n (usually n &amp;lt;&amp;lt; N) &amp;lt;math&amp;gt;\overline{x} = \frac{1}{n}\sum^n x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Variance and Standard Deviation==&lt;br /&gt;
&lt;br /&gt;
* population variance &amp;lt;math&amp;gt;\sigma^2 = \frac{1}{N}\sum (x_i-\mu)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample variance &amp;lt;math&amp;gt;s^2 = \frac{1}{n-1}\sum (x_i-\overline{x})^2 = \frac{\sum x_i^2 -n\overline{x}^2}{n-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* standard deviation is the square root of variance&lt;br /&gt;
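As a quick numeric check of these definitions, a minimal Python sketch (the data here is made up for illustration):

```python
import statistics

data = [4.0, 7.0, 5.0, 9.0, 5.0]

# sample mean: (1/n) times the sum of the values
xbar = statistics.mean(data)         # 6.0

# sample variance divides by n - 1 (Bessel's correction) ...
s2 = statistics.variance(data)       # 4.0
# ... while population variance divides by N
sigma2 = statistics.pvariance(data)  # 3.2

# standard deviation is the square root of variance
s = statistics.stdev(data)           # 2.0
```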
&lt;br /&gt;
==Mode, Median==&lt;br /&gt;
&lt;br /&gt;
* The mode of a sample is simply the value that occurs most often&lt;br /&gt;
* The median is the value that has an equal number of values above and below it. If there are an even number of values, you average the two middle ones.&lt;br /&gt;
&lt;br /&gt;
==Z and Student&amp;#039;s t Distribution==&lt;br /&gt;
&lt;br /&gt;
* The z distribution standardizes each sample value: &amp;lt;math&amp;gt;z_i = \frac{x_i-\overline{x}}{s}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Student&amp;#039;s t distribution approximates z distribution but compensates for smaller samples by fattening tails as n gets smaller. It is essentially standard normal for n &amp;gt; 100. The parameter is called &amp;quot;degrees of freedom&amp;quot;, and amounts to n-1&lt;br /&gt;
&lt;br /&gt;
[[Image:student_t.png|300px]]&lt;br /&gt;
&lt;br /&gt;
==Standard Error of the Mean (SEM)==&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;s_{\overline{x}} = \frac{s}{\sqrt{n}}&amp;lt;/math&amp;gt;&lt;br /&gt;
* standard deviation of the sample: 95% of sample values lie within the interval &amp;lt;math&amp;gt;\overline{x} \pm 2s&amp;lt;/math&amp;gt;&lt;br /&gt;
* standard deviation of the mean: the true mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;\overline{x} \pm 2s_{\overline{x}}&amp;lt;/math&amp;gt; 95% of the time&lt;br /&gt;
* In statistical tests, the means of samples are compared, not the data points themselves&lt;br /&gt;
&lt;br /&gt;
==Confidence Intervals==&lt;br /&gt;
&lt;br /&gt;
The interval &amp;lt;math&amp;gt;\overline{x} \pm t(\alpha/2, n-1) \frac{s}{\sqrt{n}}&amp;lt;/math&amp;gt; contains the mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; with a confidence level of &amp;lt;math&amp;gt;1-\alpha&amp;lt;/math&amp;gt;. For example, &amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt; would be 0.05 for a 95% confidence interval.&lt;br /&gt;
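With only the standard library there is no t quantile, but for large n the normal quantile is a close stand-in for t(alpha/2, n-1); a sketch under that approximation (the true t interval is somewhat wider for small samples, and the helper name is made up):

```python
import statistics

def mean_ci(data, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the mean,
    using the normal quantile in place of t(alpha/2, n-1)."""
    n = len(data)
    xbar = statistics.mean(data)
    sem = statistics.stdev(data) / n ** 0.5  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return (xbar - z * sem, xbar + z * sem)
```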
&lt;br /&gt;
==Hypothesis testing==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Upper-tail test&amp;#039;&amp;#039;&amp;#039;&amp;lt;nowiki&amp;gt;: Does sample A have higher values than sample B? We look at A - B &amp;gt; 0&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Lower-tail test&amp;#039;&amp;#039;&amp;#039;&amp;lt;nowiki&amp;gt;: Does sample A have lower values than sample B? We look at A - B &amp;lt; 0&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Two-tail test&amp;#039;&amp;#039;&amp;#039;&amp;lt;nowiki&amp;gt;: Is sample A different from sample B? We look at A - B not equal to zero&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* In all cases, the &amp;#039;&amp;#039;&amp;#039;null hypothesis&amp;#039;&amp;#039;&amp;#039; is that A - B is equivalent to zero within the accuracy of our test.&lt;br /&gt;
* A - B is represented by z or t values, and tails refer to the outlying values of the standard normal or t distribution.&lt;br /&gt;
* The two tail test is more general, but the upper and lower tail tests are more powerful because they are more specific.&lt;br /&gt;
* Rejecting the null hypothesis is significant, but failing to reject it is not.&lt;br /&gt;
&lt;br /&gt;
==Two types of errors==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Type 1 error&amp;#039;&amp;#039;&amp;#039;&amp;lt;nowiki&amp;gt;: alpha is the probability (or &amp;quot;acceptance level&amp;quot;) of rejecting the null hypothesis when you shouldn&amp;#039;t. So if &amp;lt;/nowiki&amp;gt;&amp;lt;math&amp;gt;\alpha = 0.05&amp;lt;/math&amp;gt;, we will falsely claim a significant difference 5 times out of 100.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Type 2 error&amp;#039;&amp;#039;&amp;#039;&amp;lt;nowiki&amp;gt;: beta is the probability (or &amp;quot;acceptance level&amp;quot;) of sticking with the null hypothesis when you shouldn&amp;#039;t. So if &amp;lt;/nowiki&amp;gt;&amp;lt;math&amp;gt;\beta = 0.20&amp;lt;/math&amp;gt;, we will wrongly ignore a significant difference 20 times out of 100.&lt;br /&gt;
* The &amp;#039;&amp;#039;&amp;#039;power&amp;#039;&amp;#039;&amp;#039; of the statistic is defined as &amp;lt;math&amp;gt;1-\beta&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Compare one sample to a known standard value==&lt;br /&gt;
&lt;br /&gt;
* Calculated t value is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;t_c = \frac{\overline{x}-c}{s/\sqrt{n}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where c is the known standard value. Note that this is close to zero if the mean is close to the known standard value.&lt;br /&gt;
&lt;br /&gt;
* Two-tail test: Null hypothesis is that&amp;lt;math&amp;gt;t(-\alpha/2, n-1) &amp;lt; t_c &amp;lt; t(\alpha/2, n-1)&amp;lt;/math&amp;gt;Rejection means that our mean is significantly different from the standard value.&lt;br /&gt;
* Lower-tail test: Null hypothesis is that&amp;lt;math&amp;gt;t_c &amp;gt; t(-\alpha, n-1)&amp;lt;/math&amp;gt;Rejection means that our mean is significantly less than the standard value.&lt;br /&gt;
* Upper-tail test: Null hypothesis is that&amp;lt;math&amp;gt;t_c &amp;lt; t(\alpha, n-1)&amp;lt;/math&amp;gt;Rejection means that our mean is significantly greater than the standard value.&lt;br /&gt;
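A sketch of the calculated t value (hypothetical helper name; looking up the critical t(alpha, n-1) still needs a table or a stats library):

```python
import statistics

def one_sample_t(data, c):
    """t_c = (xbar - c) / (s / sqrt(n)); close to zero when the
    sample mean is close to the known standard value c."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return (xbar - c) / (s / n ** 0.5)
```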
&lt;br /&gt;
==Determining adequate sample size for one-sample test==&lt;br /&gt;
&lt;br /&gt;
* Rough estimate:&amp;lt;math&amp;gt;n \ge \frac{z_{\alpha/2}^2s^2}{d^2}&amp;lt;/math&amp;gt;where s is an estimate of deviation and d is &amp;quot;detection level&amp;quot;, the minimum difference from the standard value that needs to be detected&lt;br /&gt;
* Iterative method:&amp;lt;math&amp;gt;n \ge \frac{t^2(\alpha/2, n-1)s^2}{d^2}&amp;lt;/math&amp;gt;Here, n shows up on both sides of the equation, so start with a large estimate of n and recalculate until it converges.&lt;br /&gt;
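The rough (z-based) estimate is easy to sketch with the standard library; the iterative version needs a t quantile, so only the first form is shown here (helper name made up):

```python
import math
import statistics

def sample_size_rough(s, d, alpha=0.05):
    """Smallest integer n with n >= z_{alpha/2}^2 * s^2 / d^2,
    where s estimates the deviation and d is the detection level."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * s / d) ** 2)
```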
&lt;br /&gt;
==Two-sample independent t test==&lt;br /&gt;
&lt;br /&gt;
* Assume that each sample comes from a normal distribution&lt;br /&gt;
* We can make no assumptions about the variances of the two samples&lt;br /&gt;
* Calculated test statistic is&amp;lt;math&amp;gt;t_c = (\overline{x}_A-\overline{x}_B)/\sqrt{\frac{s_A^2}{n_A}+\frac{s_B^2}{n_B}}&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;n_A+n_B-2&amp;lt;/math&amp;gt; degrees of freedom&lt;br /&gt;
* Sample size determination:&amp;lt;math&amp;gt;n \ge \frac{(s_1^2+s_2^2)(z_{\alpha/2}+z_{\beta})^2}{d^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
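A sketch of the unequal-variance statistic as defined above (the helper name is made up):

```python
import statistics

def independent_t(a, b):
    """t_c = (xbar_A - xbar_B) / sqrt(s_A^2/n_A + s_B^2/n_B),
    making no assumption that the two variances are equal."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se
```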
&lt;br /&gt;
==Two-sample pooled t test==&lt;br /&gt;
&lt;br /&gt;
* Assume that each sample comes from a normal distribution&lt;br /&gt;
* Assume that variances of the two samples are close to equal&lt;br /&gt;
* Pooled standard deviation is&amp;lt;math&amp;gt;s_{pooled} = \sqrt{\frac{(n_A-1)s_A^2+(n_B-1)s_B^2}{n_A+n_B-2}}\sqrt{\frac{1}{n_A}+\frac{1}{n_B}}&amp;lt;/math&amp;gt;&lt;br /&gt;
* Calculated test statistic is&amp;lt;math&amp;gt;t_c = \frac{\overline{x}_A - \overline{x}_B}{s_{pooled}}&amp;lt;/math&amp;gt;&lt;br /&gt;
* Sample size determination:&amp;lt;math&amp;gt;n \ge \frac{s_{pooled}^2(z_{\alpha/2}+z_{\beta})^2}{d^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
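A sketch that follows the definitions above literally; note that s_pooled as written already folds in the sqrt(1/n_A + 1/n_B) factor, so t_c is just the difference of means divided by it:

```python
import statistics

def pooled_t(a, b):
    """t_c = (xbar_A - xbar_B) / s_pooled, with s_pooled as defined above."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    sp = sp * (1 / na + 1 / nb) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / sp
```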
&lt;br /&gt;
==Paired t test==&lt;br /&gt;
&lt;br /&gt;
* Assume that each sample comes from a normal distribution&lt;br /&gt;
* Samples come in pairs (e.g. each test subject is matched to an otherwise identical control)&lt;br /&gt;
* We do statistics on &amp;lt;math&amp;gt;d = x_A-x_B&amp;lt;/math&amp;gt;&lt;br /&gt;
* Mean (n is the number of pairs!):&amp;lt;math&amp;gt;\overline{d} = \frac{1}{n}\sum{(x_A-x_B)} = \frac{1}{n}\sum{d}&amp;lt;/math&amp;gt;&lt;br /&gt;
* Standard deviation of pair differences is&amp;lt;math&amp;gt;s_{paired} = \sqrt{\frac{\sum{(d_i-\overline{d})^2}}{n-1}}&amp;lt;/math&amp;gt;&lt;br /&gt;
* Test statistic:&amp;lt;math&amp;gt;t_c = \frac{\overline{d}\sqrt{n}}{s_{paired}}&amp;lt;/math&amp;gt;&lt;br /&gt;
* Sample size determination:&amp;lt;math&amp;gt;n \ge \frac{s_{paired}^2(z_{\alpha/2}+z_{\beta})^2}{d^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
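A sketch of the paired statistic (helper name made up):

```python
import statistics

def paired_t(a, b):
    """t_c = dbar * sqrt(n) / s_paired, where d_i = a_i - b_i
    and n is the number of pairs."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) * len(d) ** 0.5 / statistics.stdev(d)
```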
&lt;br /&gt;
==Two boolean sample proportion t test==&lt;br /&gt;
&lt;br /&gt;
* If each trial is a boolean &amp;quot;success&amp;quot; or &amp;quot;failure&amp;quot;, let P denote the proportion of successes for the sample.&lt;br /&gt;
* The combined proportion of samples A and B is&amp;lt;math&amp;gt;P_c = \frac{n_A P_A+n_B P_B}{n_A+n_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
* The sample standard deviation is&amp;lt;math&amp;gt;s = \sqrt\frac{P_c(1-P_c)}{n_A+n_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
* The test statistic is&amp;lt;math&amp;gt;t_c = \frac{P_A - P_B}{s}&amp;lt;/math&amp;gt;&lt;br /&gt;
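The three formulas above translate directly; this sketch implements them exactly as written, including the n_A + n_B denominator inside s (helper name made up):

```python
def proportion_test(p_a, n_a, p_b, n_b):
    """Test statistic for two boolean samples, per the definitions above."""
    p_c = (n_a * p_a + n_b * p_b) / (n_a + n_b)  # combined proportion
    s = (p_c * (1 - p_c) / (n_a + n_b)) ** 0.5   # sample standard deviation
    return (p_a - p_b) / s
```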
&lt;br /&gt;
==One-factor ANOVA==&lt;br /&gt;
&lt;br /&gt;
* This is analogous to a two-sample independent t-test, but with more than two samples.&lt;br /&gt;
* We assume that the sampled populations are normally distributed with different means but the same variance.&lt;br /&gt;
* Let &amp;lt;math&amp;gt;j = 1 ... m&amp;lt;/math&amp;gt; denote the m different populations, and &amp;lt;math&amp;gt;i = 1 ... n&amp;lt;/math&amp;gt; denote the n members &amp;lt;math&amp;gt;x_{ij}&amp;lt;/math&amp;gt; in each sample.&lt;br /&gt;
* Let &amp;lt;math&amp;gt;\overline{x}_j = \frac{1}{n}\sum_{i=1}^n x_{ij}&amp;lt;/math&amp;gt; denote the mean of the jth sample, and let &amp;lt;math&amp;gt;\overline{\overline{x}} = \frac{1}{nm}\sum_i \sum_j x_{ij} = \frac{1}{m}\sum_{j=1}^m\overline{x}_j&amp;lt;/math&amp;gt; denote the mean of all the samples.&lt;br /&gt;
* The sample variance of the sample means, scaled by n (also &amp;quot;mean square treatment&amp;quot;), is&amp;lt;math&amp;gt;MST = \frac{n}{m-1}\sum_j(\overline{x}_j - \overline{\overline{x}})^2&amp;lt;/math&amp;gt;&lt;br /&gt;
* The mean of the sample variances (also &amp;quot;mean square error&amp;quot;) is&amp;lt;math&amp;gt;MSE = \frac{1}{m(n-1)}\sum_i \sum_j(x_{ij} - \overline{x}_j)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
* The test statistic is &amp;lt;math&amp;gt;F_c = \frac{MST}{MSE}&amp;lt;/math&amp;gt;&lt;br /&gt;
* The null hypothesis is that the populations are not significantly different from each other, so that &amp;lt;math&amp;gt;F_c \approx 1&amp;lt;/math&amp;gt;.&lt;br /&gt;
* We reject the null hypothesis at level alpha if &amp;lt;math&amp;gt;F_c &amp;gt; F_t = F[\alpha, m-1, m(n-1)]&amp;lt;/math&amp;gt;, where&lt;br /&gt;
* &amp;lt;math&amp;gt;F(\alpha, d_1, d_2)&amp;lt;/math&amp;gt; is the critical value of the Fisher F-distribution. See http://en.wikipedia.org/wiki/F-test&lt;br /&gt;
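For equal-size groups the whole procedure fits in a few lines. This sketch includes the factor of n on MST (as in the blocked case further down), which is what makes F_c follow the F distribution; the helper name is made up:

```python
import statistics

def one_factor_anova(samples):
    """Return (MST, MSE, F_c) for m equal-size samples given as lists."""
    m, n = len(samples), len(samples[0])
    means = [statistics.mean(s) for s in samples]
    grand = statistics.mean(means)
    mst = n * sum((mj - grand) ** 2 for mj in means) / (m - 1)
    sse = sum((x - mj) ** 2 for s, mj in zip(samples, means) for x in s)
    mse = sse / (m * (n - 1))
    return mst, mse, mst / mse
```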
&lt;br /&gt;
======Contrasts (Tukey method)======&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Tukey%27s_test&lt;br /&gt;
* Make the same assumptions as in one-factor ANOVA above, but instead of testing everything all at once, we compare each pair of populations independently.&lt;br /&gt;
* For each pair of means, we reject the null hypothesis (that they are the same) if&amp;lt;math&amp;gt;|\overline{x}_i - \overline{x}_j| &amp;gt; q(\alpha, m, m(n-1))\sqrt{\frac{MSE}{n}}&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;q(\alpha, d_1, d_2)&amp;lt;/math&amp;gt; is based on the studentized range distribution (Tukey-Kramer method).&lt;br /&gt;
&lt;br /&gt;
======Confidence Intervals======&lt;br /&gt;
&lt;br /&gt;
* We can also compare confidence intervals for each population:&amp;lt;math&amp;gt;\mu_i = \overline{x}_i \pm t(\alpha/2, m(n-1))\sqrt{\frac{MSE}{n}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
======Sample size estimate======&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;n \ge \frac{m\cdot MSE(z_{\alpha/2}+z_\beta)^2}{\delta^2}&amp;lt;/math&amp;gt;, where the MSE is estimated beforehand and delta is the desired detection level&lt;br /&gt;
&lt;br /&gt;
==Blocked ANOVA==&lt;br /&gt;
&lt;br /&gt;
* This is like the paired t-test for more than two populations.&lt;br /&gt;
* We assume that the sampled populations are normally distributed with different means but the same variance.&lt;br /&gt;
* We further assume that samples come in blocks, each member of the blocks coming from one of each population.&lt;br /&gt;
* Let &amp;lt;math&amp;gt;j = 1 ... m&amp;lt;/math&amp;gt; denote the m different populations, and &amp;lt;math&amp;gt;i = 1 ... n&amp;lt;/math&amp;gt; denote the n groups of members &amp;lt;math&amp;gt;x_{ij}&amp;lt;/math&amp;gt; in each sample.&lt;br /&gt;
* In addition to the notation above, let&amp;lt;math&amp;gt;\overline{x}_i = \frac{1}{m}\sum_{j=1}^m x_{ij}&amp;lt;/math&amp;gt; denote the mean of the ith block.&lt;br /&gt;
* The sample variance of the population means (also &amp;quot;mean square treatment&amp;quot;) is&amp;lt;math&amp;gt;MST = \frac{n}{m-1}\sum_j(\overline{x}_j - \overline{\overline{x}})^2&amp;lt;/math&amp;gt;&lt;br /&gt;
* The sample variance of the block means (also &amp;quot;mean square block&amp;quot;) is&amp;lt;math&amp;gt;MSB = \frac{m}{n-1}\sum_i(\overline{x}_i - \overline{\overline{x}})^2&amp;lt;/math&amp;gt;&lt;br /&gt;
* The residual variance (also &amp;quot;mean square error&amp;quot;) is&amp;lt;math&amp;gt;MSE = \frac{1}{(m-1)(n-1)}\sum_i \sum_j(x_{ij} - \overline{x}_i - \overline{x}_j + \overline{\overline{x}})^2&amp;lt;/math&amp;gt;&lt;br /&gt;
* There are two null hypotheses and two statistics:&lt;br /&gt;
* Reject the hypothesis that the populations share the same mean if&amp;lt;math&amp;gt;\frac{MST}{MSE} &amp;gt; F[\alpha, m-1, (m-1)(n-1)]&amp;lt;/math&amp;gt;&lt;br /&gt;
* Reject the hypothesis that the blocks share the same mean if&amp;lt;math&amp;gt;\frac{MSB}{MSE} &amp;gt; F[\alpha, n-1, (m-1)(n-1)]&amp;lt;/math&amp;gt;&lt;br /&gt;
* If the blocks turn out to be the same, you gain no power by choosing the blocked ANOVA over the standard one-way ANOVA.&lt;br /&gt;
&lt;br /&gt;
======Contrasts (Tukey method)======&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Tukey%27s_test&lt;br /&gt;
* Make the same assumptions as in blocked ANOVA above, but instead of testing everything all at once, we compare each pair of populations independently.&lt;br /&gt;
* For each pair of means, we reject the null hypothesis (that they are the same) if&amp;lt;math&amp;gt;|\overline{x}_i - \overline{x}_j| &amp;gt; q(\alpha, m, (m-1)(n-1))\sqrt{\frac{MSE}{n}}&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;q(\alpha, d_1, d_2)&amp;lt;/math&amp;gt; is based on the studentized range distribution (Tukey-Kramer method).&lt;br /&gt;
&lt;br /&gt;
======Confidence Intervals======&lt;br /&gt;
&lt;br /&gt;
* We can also compare confidence intervals for each population:&amp;lt;math&amp;gt;\mu_i = \overline{x}_i \pm t(\alpha/2, (m-1)(n-1))\sqrt{\frac{MSE}{n}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
======Sample size estimate======&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;n \ge \frac{m\cdot MSE(z_{\alpha/2}+z_\beta)^2}{\delta^2}&amp;lt;/math&amp;gt;, where the MSE is estimated beforehand and delta is the desired detection level&lt;br /&gt;
&lt;br /&gt;
==Fitting data with least-squares regression==&lt;br /&gt;
&lt;br /&gt;
* We have a set of n (x,y) pairs, and want to fit these points to a line &amp;lt;math&amp;gt;\hat{y} = a + bx&amp;lt;/math&amp;gt;&lt;br /&gt;
* Estimate the slope with&amp;lt;math&amp;gt;b = \frac{\sum{xy} - n\overline{x} \overline{y}}{\sum{x^2} - n\overline{x}^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
* Estimate the y-intercept with&amp;lt;math&amp;gt;a = \overline{y}-b\overline{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
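The two estimators translate directly (sketch; the helper name and data are illustrative):

```python
def least_squares(xs, ys):
    """Slope b and intercept a for yhat = a + b x, via the sums above."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (sxy - n * xbar * ybar) / (sxx - n * xbar ** 2)
    return ybar - b * xbar, b
```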
&lt;br /&gt;
==Linearizing Data==&lt;br /&gt;
&lt;br /&gt;
[[Image:linearizing_data.png]]&lt;br /&gt;
&lt;br /&gt;
* Linearize by increasing or decreasing the power of the x or y values, or both. Linearizing y only is most common.&lt;br /&gt;
* Here is the sequence of transformations to try:&amp;lt;math&amp;gt;..., y^{-3}, y^{-2}, y^{-1}, y^{-1/2}, \log{y}, y^{1/2}, y, y^2, y^3, ...&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Interpolating from the Linear Regression==&lt;br /&gt;
&lt;br /&gt;
* Extrapolating beyond the data range is risky.&lt;br /&gt;
* After solving for a and b, you can generate an alpha-level confidence interval for the &amp;#039;&amp;#039;average&amp;#039;&amp;#039; &amp;lt;math&amp;gt;\hat{y}&amp;lt;/math&amp;gt; for a given x:&amp;lt;math&amp;gt;\overline{\hat{y}} \pm t(\alpha/2, n-2)\sqrt{\frac{\sum(y-\hat{y})^2}{n-2}\left[\frac{1}{n}+\frac{(x-\overline{x})^2}{\sum(x-\overline{x})^2}\right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Correlation==&lt;br /&gt;
&lt;br /&gt;
* For a set of n (x,y) pairs, we can talk about how well the samples are correlated.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Correlation coefficient&amp;#039;&amp;#039;&amp;#039; is denoted as r:&amp;lt;math&amp;gt;r = \frac{\sum{xy}-\frac{1}{n}\sum{x}\sum{y}}{\sqrt{\left[\sum{x^2}-\frac{1}{n}\left(\sum{x}\right)^2\right]\left[\sum{y^2}-\frac{1}{n}\left(\sum{y}\right)^2\right]}}&amp;lt;/math&amp;gt;&lt;br /&gt;
* An r = 1 means a perfectly linear relationship with positive slope, r = -1 a perfectly linear relationship with negative slope, and r = 0 no linear relationship.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Coefficient of determination&amp;#039;&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;r^2&amp;lt;/math&amp;gt; and has a direct interpretation. If &amp;lt;math&amp;gt;r^2 = 0.9&amp;lt;/math&amp;gt;, that means 90% of the data variability can be explained by the regression equation. The rest is random noise.&lt;br /&gt;
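The sum formula for r translates directly; squaring the result gives the coefficient of determination (helper name made up):

```python
def correlation(xs, ys):
    """Pearson correlation coefficient r from the sum formula above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = sxy - sx * sy / n
    den = ((sxx - sx ** 2 / n) * (syy - sy ** 2 / n)) ** 0.5
    return num / den
```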
&lt;br /&gt;
==Testing boolean data==&lt;br /&gt;
&lt;br /&gt;
* We have a sample of size n from a population of boolean trials, and we know that proportion p of the trials resulted in success.&lt;br /&gt;
* The sample mean is just p, and for the sake of discussion, assume &amp;lt;math&amp;gt;p &amp;lt; 1-p&amp;lt;/math&amp;gt;.&lt;br /&gt;
* If &amp;lt;math&amp;gt;np &amp;gt; 5&amp;lt;/math&amp;gt;, we model the sample as binomial and take the sample variance as &amp;lt;math&amp;gt;s^2 = p(1-p)&amp;lt;/math&amp;gt;&lt;br /&gt;
* If &amp;lt;math&amp;gt;np \le 5&amp;lt;/math&amp;gt;, we model the sample as Poisson and take the sample variance as &amp;lt;math&amp;gt;s^2 = p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Confidence interval for mean of boolean sample==&lt;br /&gt;
&lt;br /&gt;
* The confidence interval for the sample mean is &amp;lt;math&amp;gt;p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;p \pm z_{\alpha/2}\sqrt{\frac{p}{n}}&amp;lt;/math&amp;gt;&lt;br /&gt;
* If p is anywhere near 0.5, we throw in the &amp;quot;Yates factor&amp;quot; to expand the confidence interval:&amp;lt;math&amp;gt;p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} - \frac{1}{2n} &amp;lt; \pi &amp;lt; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
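A sketch of the binomial-model interval, with the Yates widening as an option (helper name made up):

```python
import statistics

def boolean_ci(p, n, alpha=0.05, yates=False):
    """Interval p plus/minus z_{alpha/2} sqrt(p(1-p)/n), optionally
    widened by the Yates factor 1/(2n)."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half = z * (p * (1 - p) / n) ** 0.5
    if yates:
        half += 1 / (2 * n)
    return (p - half, p + half)
```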
&lt;br /&gt;
==Comparing the mean of a boolean sample to a standard value==&lt;br /&gt;
&lt;br /&gt;
* To compare the mean p to a standard value c, we use as test statistic&amp;lt;math&amp;gt;z_c = \frac{p-c}{\sqrt{\frac{p(1-p)}{n}}}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;z_c = \frac{p-c}{\sqrt{\frac{p}{n}}}&amp;lt;/math&amp;gt;Note that this is close to zero when p is close to c.&lt;br /&gt;
&lt;br /&gt;
* Two-tail test: Null hypothesis is that&amp;lt;math&amp;gt;-z_{\alpha/2} &amp;lt; z_c &amp;lt; z_{\alpha/2}&amp;lt;/math&amp;gt;Rejection means that our mean is significantly different from the standard value.&lt;br /&gt;
* Lower-tail test: Null hypothesis is that&amp;lt;math&amp;gt;z_c &amp;gt; -z_{\alpha}&amp;lt;/math&amp;gt;Rejection means that our mean is significantly less than the standard value.&lt;br /&gt;
* Upper-tail test: Null hypothesis is that&amp;lt;math&amp;gt;z_c &amp;lt; z_\alpha&amp;lt;/math&amp;gt;Rejection means that our mean is significantly greater than the standard value.&lt;/div&gt;</summary>
		<author><name>Scott</name></author>
	</entry>
</feed>