The mean of each vector will be equal to:
The standard deviation of each vector will be equal to:
The standard error of each vector will be equal to:
The formula for the t test is as follows:
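A standard set of definitions for these four quantities is sketched below. The notation is an assumption and is not taken from the text above: each vector holds $n$ measurements $x_1, \dots, x_n$, $s$ is the sample standard deviation with an $n-1$ denominator, and the t statistic is written in the unequal-variance (Welch) form for two independent sets. A matched-pairs design would instead use the mean and standard error of the pairwise differences.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
SE = \frac{s}{\sqrt{n}}

t = \frac{\bar{x}_{\mathrm{exp}} - \bar{x}_{\mathrm{ref}}}
         {\sqrt{SE_{\mathrm{exp}}^{2} + SE_{\mathrm{ref}}^{2}}}
```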
The t-test is used to test the difference between the means of two test sets, as in before-and-after studies or matched-pairs studies. There is a confidence interval for the mean and a critical value for t for the chosen level of significance associated with the t-test. For instance, a level of significance equal to 0.05 means that 95% of the cases will be within the confidence range if there is no significant difference between the means of the two test sets, or experiments, being compared. The confidence limits set upper and lower bounds on an estimate of the mean for the chosen level of significance (0.05). The confidence interval is the range within the bounds of the confidence limits. The confidence interval can be computed if you know the shape of your distribution. For normally distributed data, the confidence limits at the 0.05 significance level for an estimated mean are the sample mean plus or minus 1.96 times the standard error.
confidence interval (normal distribution): mean +/- 1.96 * SE
For example, if the sample mean is 10 and the standard error is 1.2, then 95% of the cases will be within the range of 10 plus or minus 1.96 times 1.2, or 10 plus or minus about 2.4, which is the range from roughly 7.6 to 12.4. Thus, if the experimental mean is outside the limits of this range computed for the reference mean, then the difference between the means of the two test sets is considered to be significant at the 0.05 level (95% confidence). The critical value for t at a given significance level for a specific type of distribution can be looked up in a table; most statistics books contain them. In the case of microarray data, if the absolute value of t is greater than the critical value, this indicates a significant difference in the gene expression between the reference and experimental test sets. Because the t-test is a parametric test that assumes a normal distribution, the statistical tests that are commonly used to analyze microarray data are more complex variations that are used for distributions other than normal distributions.
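The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, assuming normally distributed data; the function name `normal_confidence_interval` and its parameters are illustrative and do not come from the text.

```python
def normal_confidence_interval(mean, standard_error, z=1.96):
    """Return (lower, upper) confidence limits for a normally distributed mean.

    z=1.96 corresponds to the 0.05 significance level used in the text.
    """
    margin = z * standard_error
    return mean - margin, mean + margin


# Values from the worked example: sample mean 10, standard error 1.2.
lower, upper = normal_confidence_interval(10.0, 1.2)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

The unrounded margin is 1.96 × 1.2 = 2.352, so the limits are 7.648 and 12.352; the text rounds the margin to 2.4.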
Red:green ratios for each gene and measurement:
Gene | measurement 1 | measurement 2 | measurement 3 | measurement 4 | measurement 5 | measurement 6 | measurement 7 |
---|---|---|---|---|---|---|---|
A(ref) | 0.97 | 1.54 | 1.32 | 0.89 | 1.06 | 1.21 | |
A(exp) | 1.37 | 1.25 | 1.15 | 0.99 | 1.30 | 1.53 | 1.07 |
B(ref) | 1.67 | 1.78 | 2.01 | 1.89 | 1.75 | 1.81 | 1.69 |
B(exp) | 6.21 | 6.03 | 5.94 | 6.14 | 6.11 | | |
Assumptions for example problem:
What are the means for each row of data?
What are the standard deviations for each row of data?
What is the standard error for each row of data?
What is the value for t for the comparison between the reference and experimental test sets for Gene A?
What is the 95% confidence interval computed for the Gene A reference set?
Is there a significant difference between the mean values of the experimental versus the reference set for Gene A? (Explain the answer both in terms of the t value and the confidence interval.)
What is the value for t for the comparison between the reference and experimental test sets for Gene B?
What is the 95% confidence interval computed for the Gene B reference set?
Is there a significant difference between the mean values of the experimental versus the reference set for Gene B? (Explain the answer both in terms of the t value and the confidence interval.)
If there was a significant difference between the gene expression under experimental conditions versus the gene expression under reference conditions for either Gene A or Gene B, then estimate the significant increase or decrease observed.
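One way to work through these questions is sketched below in Python, using the red:green ratios from the table with blank cells left out. Several choices here are assumptions rather than the chapter's stated method: the sample standard deviation uses an n - 1 denominator, the t statistic uses the unequal-variance form for two independent sets (the rows have different numbers of measurements, so they cannot be paired), and the confidence interval uses the normal-distribution multiplier 1.96. The helper names `summarize`, `welch_t`, and `reference_interval` are illustrative.

```python
import math
import statistics

# Red:green ratios copied from the table; blank cells are omitted.
ratios = {
    "A(ref)": [0.97, 1.54, 1.32, 0.89, 1.06, 1.21],
    "A(exp)": [1.37, 1.25, 1.15, 0.99, 1.30, 1.53, 1.07],
    "B(ref)": [1.67, 1.78, 2.01, 1.89, 1.75, 1.81, 1.69],
    "B(exp)": [6.21, 6.03, 5.94, 6.14, 6.11],
}


def summarize(values):
    """Return (mean, sample standard deviation, standard error) for one row."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator (assumption)
    se = sd / math.sqrt(len(values))
    return mean, sd, se


def welch_t(exp_values, ref_values):
    """t statistic for two independent sets, unequal-variance form (assumption)."""
    exp_mean, _, exp_se = summarize(exp_values)
    ref_mean, _, ref_se = summarize(ref_values)
    return (exp_mean - ref_mean) / math.sqrt(exp_se ** 2 + ref_se ** 2)


def reference_interval(ref_values, z=1.96):
    """95% confidence limits for the reference mean, normal approximation."""
    mean, _, se = summarize(ref_values)
    return mean - z * se, mean + z * se


for row, values in ratios.items():
    mean, sd, se = summarize(values)
    print(f"{row}: mean={mean:.3f}  sd={sd:.3f}  se={se:.3f}")

for gene in ("A", "B"):
    ref = ratios[f"{gene}(ref)"]
    exp = ratios[f"{gene}(exp)"]
    t = welch_t(exp, ref)
    low, high = reference_interval(ref)
    exp_mean = statistics.mean(exp)
    outside = not (low <= exp_mean <= high)
    print(f"Gene {gene}: t={t:.2f}, reference 95% CI=({low:.3f}, {high:.3f}), "
          f"experimental mean={exp_mean:.3f}, outside CI={outside}")
```

With only five to seven measurements per row, the critical value of t from a table is somewhat larger than 1.96, so the t value should be compared against the tabulated value for the appropriate degrees of freedom rather than against 1.96.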
There are many software packages available that have been designed expressly for microarray data analysis. In addition to testing gene expression under a set of experimental conditions versus reference conditions, it is possible to identify "clustered" genes that seem to have similar responses under similar conditions. Also, genes can be identified that show related responses under similar conditions, such as one gene's expression always increasing when another's decreases. When two or more genes show this kind of clustered behavior, it can be an indication that they are part of the same pathway, or that they are regulating each other. Using this type of microarray data analysis, the scientist can combine the cluster analysis results with what is known through laboratory experiments and often come up with new hypotheses about biochemical pathways and regulation.
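One simple way to quantify whether two genes respond together is a correlation coefficient between their expression profiles. The sketch below is only an illustration: Pearson correlation is one of several similarity measures a clustering package might use, and the gene names and ratios are made-up values, not data from this chapter.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression ratios across five conditions (made-up values).
gene_x = [1.0, 1.5, 2.0, 2.5, 3.0]
gene_y = [2.1, 3.0, 4.1, 5.0, 6.1]  # rises when gene_x rises
gene_z = [3.0, 2.6, 2.0, 1.4, 1.1]  # falls when gene_x rises

r_xy = pearson(gene_x, gene_y)
r_xz = pearson(gene_x, gene_z)
print(f"gene_x vs gene_y: r={r_xy:.2f}")
print(f"gene_x vs gene_z: r={r_xz:.2f}")
```

A coefficient near +1 suggests the two genes move together, and one near -1 suggests that one increases as the other decreases; either pattern could prompt a closer look at a shared pathway or regulatory relationship.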