R : Copyright 2004, The R Foundation for Statistical Computing Version 1.9.1 (2004-06-21), ISBN 3-900051-00-3 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for a HTML browser interface to help. Type 'q()' to quit R. > #RUNNING THIS SCRIPT > #the simplest way to run this script is to put all of the data needed into one directory or folder > #open a terminal window and change directories to the data directory > #type R #R results will save the R output in results > #the plot will appear in an X window (if one is open) and in a postscript file called Rplots.ps > #the ps file may be viewed by running ghostscript on a linux/unix machine > #or clicking on the icon in mac os10 which generates a pdf file > > #PROGRAM NOTES > #load the limma package > #limma = linear models for microarray analysis > #see http://bioinf.wehi.edu.au/limma/ for description > > library(limma) > > #we are using a small data set of 95 genes that have been extracted > #from a much larger gene set to illustrate treatment of statistical variation > #mRNA from leaf or root tissue has been labeled with Cy3 or Cy5 (red or green) > #mixed and hybridized to slides. > #the next command reads in a text file rootleafnew.txt that shows a list of the arrays to be read > #these array files are gpr files that show one line for each gene, listing the gene (EST), > # the array number, and then foreground and background values for red and green (Rf,Rb,Gf,Gb) > #view r9.gpr to see the format of the data file > # > #each line in rootleafnew.txt shows label for root or leaf tissue > #view the file rootleafnew.txt to see the experimental protocol > > targets <- readTargets("rootleafnew.txt") > > #the next line creates an instance of the RGList data object > #the read.images command reads the data in all of the gpr files into the data object > #the object is a table or matrix of the data in the files with the column of the gpr file name, > #and the same data columns as in the gpr files > > RG <-read.maimages(targets$File.Name,columns=list(Rf="Rf",Gf="Gf",Rb="Rb",Gb="Gb")) Read r91.gpr Read r92.gpr Read r93.gpr Read r94.gpr > > #normally the Rb and Gb values would be subtracted from Rf and Gf > #by one of the methods described in the text > #our statistician says not to subtract background for this small a data set of 95 genes > #note that the foreground values in RG would normally be changed by this next command > > RG <- backgroundCorrect(RG, method="none") > > #now the red and green data points on each array will be normalized > #so that as the spread of data points is more uniform > #using the Loess method described in the text and normalizing > #within a sliding window interval of 0.4 along a plot of the red/green averages (A) - x axis > #vs. log of the ratio of red to green (M) - y axis > #a new data object MA with these normalized values is created > > MA <- normalizeWithinArrays(RG, method="loess", span=0.4) > > #now fit the normalized data to a linear model with the lmFit command > #the c argument represents information on the dye swaps > #here, the use of red and green labels for leaf and root on each array > #fit will have a new set of expression values based on the analysis > #described on page 635, Figure 13.10 > #these values take into account all sources of variation in the data > #(slide, dye) except for variation between root and leaf > > fit <- lmFit(MA, c(-1,1,-1,1)) > > #the data spread from other genes is now used to estimate how far each value > #is deviating from the mean based on the statistical variation in the average gene > #this is called an empirical Bayes fit to the data which "smoothes" the standard errors > > fit2 <- eBayes(fit) > > #topTable is now used to show a table of the 29 genes that are showing > #the most significant variation between roots and leaves > #shown in the table are gene, M value (log of R/G expression ratio), A value (average of R and G expression values) > #the t statistic between root and leaf values, the probability of a larger t corrected for false discover rate > #see web site on line 12 > #B is a measure of false discovery rate and is the log odds that the gene is differentially expressed > > topTable(fit2, adjust="fdr", n=29) M A t P.Value B 7 -4.3041351 11.331225 -42.483211 3.142499e-06 9.9414154 55 -3.6450484 11.270645 -34.865550 3.765769e-06 8.9698811 34 -4.4690967 10.982140 -33.730740 3.765769e-06 8.7985096 51 4.3606471 11.536743 26.325167 1.113696e-05 7.4500143 42 -4.2050610 13.331101 -23.715384 1.322064e-05 6.8539228 40 -3.9703779 14.090790 -23.713573 1.322064e-05 6.8534817 15 -2.4041778 9.097903 -22.983113 1.346677e-05 6.6722461 27 2.6961831 10.187489 22.139530 1.448119e-05 6.4542347 93 3.1848981 12.336154 20.563158 1.933317e-05 6.0195559 45 2.4099117 11.222206 19.646907 2.125475e-05 5.7489259 6 2.6685291 9.595381 19.487576 2.125475e-05 5.7004060 53 -2.1489284 10.843972 -18.232679 2.807940e-05 5.3020155 85 3.1439140 11.785828 12.492530 2.032768e-04 3.0018222 63 2.3095565 10.401730 10.495617 4.590316e-04 1.9374775 35 1.7494085 13.606037 10.397668 4.590316e-04 1.8803585 11 1.0065633 10.524029 10.325039 4.590316e-04 1.8376743 21 -0.9716747 11.443381 -9.413600 6.881459e-04 1.2766229 57 -1.4436630 12.238223 -9.355659 6.881459e-04 1.2392726 81 1.7105442 14.185770 8.927839 8.340344e-04 0.9567514 23 1.4347553 9.471391 8.654987 9.175590e-04 0.7700920 87 1.0349012 12.699132 8.600731 9.175590e-04 0.7323471 24 0.8826645 10.476534 8.158804 1.152851e-03 0.4167857 92 -2.2646784 10.729282 -7.880773 1.319655e-03 0.2104866 84 -1.3020188 10.245729 -7.283330 1.896089e-03 -0.2548227 18 1.1928899 9.658257 7.172774 1.942648e-03 -0.3444269 1 0.8121760 8.649424 7.135796 1.942648e-03 -0.3746530 88 -1.9485754 8.756081 -6.947053 2.143388e-03 -0.5309688 86 1.2539832 13.575521 6.825872 2.259174e-03 -0.6331659 89 0.9685733 10.873341 6.625832 2.533618e-03 -0.8051103 > > #plot an M/A plot as described above for the data before and after normalization > #cex is a value to increase the size of the text labels on the plot > > plotMA(RG,cex=0.5) > plotMA(MA,cex=0.5) > > #exercise - add the command volcanoplot(fit) - but limma version 1.8.6 is needed > #a volcano plot plots the fold change against the significance of the change > > #volcanoplot(fit) >