Thursday, May 14, 2015

Meng's Notes: Simple Enrichment Test -- calculate hypergeometric...

Meng's Notes: Simple Enrichment Test -- calculate hypergeometric...: Hypergeometric test are useful for enrichment analysis. For example, having a gene list in hand, people might want to tell which functions (...




















Wednesday, December 19, 2012


Simple Enrichment Test -- calculate hypergeometric p-values in R


Hypergeometric test are useful for enrichment analysis. For example,
having a gene list in hand, people might want to tell which functions
(GO terms) are enriched among these genes. Hypergeometric test (or its
equivalent: one-tailed
Fisher's exact test) will give you statistical confidence in p-values.



R software provids function phyper and fisher.test
for Hypergeometric and Fisher's exact test accordingly. However, it is
tricky to get it right. I spent some time to make it clear.





Here is a simple example:

Five cards were chosen from a well shuffled deck

x = the number of diamonds selected.

We use a 2x2 table to represent the case:



                Diamond     Non-Diamond

selected        x                     5-x               total 5 sampled cards

left               13-x                 34+x             total 47 left cards after sampling

                 13 Dia          39 Non-Dia         total 52 cards



We 're asking if diamond enriched or depleted in our selected cards, comparing to the background.




Here are the different parameters used by phyper and fisher.test:


phyper(x, 13, 39, 5, lower.tail=TRUE);
# Numerical parameters in order:
# (success-in-sample, success-in-bkgd, failure-in-bkgd, sample-size).
fisher.test(matrix(c(x, 13-x, 5-x, 34+x), 2, 2), alternative='less');
# Numerical parameters in order:
# (success-in-sample, success-in-left-part, failure-in-sample, failure-in-left-part).
It's obvious that hypergeometric test compares sample to bkgd, while
fisher's exact test compares sample to the left part of bkgd after
sampling without replacement. They will give the same p-value (because
they assume the same distribution).


Here is the results:

x=1; # x could be 0~5 
hitInSample 1  # could be 0~5
hitInPop 13 
failInPop 54-hitInPop 
sampleSize = 5
  • Test for under-representation (depletion)
phyper(hitInSample-1hitInPopfailInPopsampleSizelower.tailTRUE);
## [1] 0.6329532
fisher.test(matrix(c(hitInSamplehitInPop-hitInSamplesampleSize-hitInSamplefailInPop-sampleSize +hitInSample), 22), alternative='less')$p.value; 
## [1] 0.6329532
  • Test for over-representation (enrichment)
phyper(hitInSample-1hitInPopfailInPopsampleSizelower.tailFALSE);
## [1] 0.7784664
fisher.test(matrix(c(hitInSamplehitInPop-hitInSamplesampleSize-hitInSamplefailInPop-sampleSize +hitInSample), 22), alternative='greater')$p.value; 
## [1] 0.7784664
  •  Why hitInSample-1 when testing over-representation?
Because if lower.tail is TRUE (default), probabilities are
P[X ≤ x], otherwise, P[X > x]. We subtract x by 1, when P[X ≥ x] is needed.





So are there any advantages fisher.test has over phyper, as they give the same p-values?

Yes, fisher.test can do two other jobs: two-side test, and giving
confidence intervals of odds ratio. Please refer to its manual for
details. For one-side p-value calculating, they don't have any
difference if correct parameters were used.


Tuesday, May 12, 2015

How to change the alpha value of colours in R

How to change the alpha value of colours in R

Often I like to reduce the alpha value (level of transparency) of colours to identify patterns of over-plotting when displaying lots of data points with R. So, here is a tiny function that allows me to add an alpha value to a given vector of colours, e.g. a RColorBrewer palette, using col2rgb and rgb, which has an argument for alpha, in combination with the wonderful apply and sapply functions.

1 2 3 4 5 6 7 8
## Add an alpha value to a colour
add.alpha <- function(col, alpha=1){
if(missing(col))
stop("Please provide a vector of colours.")
apply(sapply(col, col2rgb)/255, 2,
function(x)
rgb(x[1], x[2], x[3], alpha=alpha))
}
view raw add.alpha.R hosted with ❤ by GitHub

The example below illustrates how this function can be used with colours provided in different formats, thanks to the col2rgb function.


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
# Source add.alpha function from Github
require(RCurl)
source(textConnection(getURL("https://gist.github.com/mages/5339689/raw/576263b8f0550125b61f4ddba127f5aa00fa2014/add.alpha.R")))
 
## Example
set.seed(1)
n <- 1200
dat <- data.frame(
x = gl(n=4, k=n/4),
y = rnorm(n)
)
 
myColours = c(1, "steelblue", "#FFBB00", rgb(0.4, 0.2, 0.3))
myColoursAlpha <- add.alpha(myColours, alpha=0.4)
## "#00000066" "#4682B466" "#FFBB0066" "#66334D66"
 
op <- par(mfrow=c(1,2), mar=c(2,2,3,1))
boxplot(y ~ x, data=dat, outline=FALSE,
axes=FALSE, main="alpha=1")
points(x=jitter(as.numeric(dat$x)), y=dat$y,
col=myColours[dat$x], pch=19)
box()
 
boxplot(y ~ x, data=dat, outline=FALSE,
axes=FALSE, main="alpha=0.4")
points(x=jitter(as.numeric(dat$x)), y=dat$y,
col=myColoursAlpha[dat$x], pch=19)
box()
par(op)
view raw add.alpha.example.R hosted with ❤ by GitHub

Session Info

sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RCurl_1.95-4.1 bitops_1.0-5