This page is outdated, please also see the man page.

How to run fenrich?

This mode allows assessing a set of QTLs fall within some functional annotations more often than what is expected by chance. To illustrate how it works, first download the required example data:


Step1: Prepare the QTL data

First, you need to prepare a BED file containing the positions of the QTLs of interest. To do so, extract all significant hits at a given FDR threshold (e.g. 5%):

Rscript ./script/runFDR_cis.R results.genes.full.txt.gz 0.05 results.genes

Then, transform the significant QTL list into a BED file using the following command:

cat results.genes.significant.txt | awk '{ print $9, $10-1, $11, $8, $1, $5 }' | tr " " "\t" | sort -k1,1 -k2,2n > results.genes.significant.bed

The resulting BED file looks like this:

head results.genes.significant.bed
1 15210 15211 1_15211 ENSG00000227232.4 -
1 735984 735985 1_735985 ENSG00000177757.1 +
1 735984 735985 1_735985 ENSG00000240453.1 -
1 739527 739528 1_739528 ENSG00000237491.4 +
1 754963 754964 1_754964 ENSG00000225880.4 -
1 769222 769223 1_769223 ENSG00000228794.4 +
1 832397 832398 1_832398 ENSG00000230699.2 +
1 844342 844343 1_844343 ENSG00000223764.2 -
1 879675 879676 1_879676 ENSG00000187634.6 +
1 895705 895706 1_895706 ENSG00000187961.9 +

Note here that this file contains the follwing columns:


Step2: Prepare the Phenotype data

Then, you need to prepare a BED file containing the positions of all Phenotypes you mapped QTLs for. To do so, run this command on the raw QTL mapping output:

zcat results.genes.full.txt.gz | awk '{ print $2, $3-1, $4, $1, $8, $5 }' | tr " " "\t" | sort -k1,1 -k2,2n > results.genes.quantified.bed

The resulting BED file looks like this:

head results.genes.quantified.bed
1 29369 29370 ENSG00000227232.4 1_15211 -
1 135894 135895 ENSG00000268903.1 1_985446 -
1 137964 137965 ENSG00000269981.1 1_1118728 -
1 139378 139379 ENSG00000237683.5 1_965604 -
1 267252 267253 ENSG00000228463.4 1_256498 -
1 317719 317720 ENSG00000237094.7 1_15211 +
1 564441 564442 ENSG00000225972.1 1_1079866 +
1 565019 565020 ENSG00000225630.1 1_1077962 +
1 566453 566454 ENSG00000237973.1 1_69428 +
1 568136 568137 ENSG00000229344.1 1_1392271 +

This file contains the follwing columns:


Step3: Run the enrichment measurements

Now that you've got your QTL and phenotype file ready to go, you can then measure the enrichement using the following command:

QTLtools fenrich --qtl results.genes.significant.bed --tss results.genes.quantified.bed --bed TFs.encode.bed.gz --out enrichement.QTL.in.TF.txt

This generates a file enrichement.QTL.in.TF.txt that contains the number of functional annotations per 1kb bins around QTLs.

cat enrichement.QTL.in.TF.txt
1088 10762 711.253 23.1664 0.000999001 1.7049 1.58987 1.48432

You can use the following option to tune the calculations:

Once you've got your observation and expectation, you can measure the enrichment related statistics in R using:

R> D=read.table("enrichement.QTL.in.TF.txt", head=FALSE, stringsAsFactors=FALSE)
R> fisher.test(matrix(c(D$V1, D$V2, round(D$V3), D$V2), ncol=2))
R= Fisher's Exact Test for Count Data
R= data: matrix(c(D$V1, D$V2, round(D$V3), D$V2), ncol = 2)
R= p-value < 2.2e-16
R= alternative hypothesis: true odds ratio is not equal to 1
R= 95 percent confidence interval:
R= 1.385641 1.690700
R= sample estimates:
R= odds ratio
R= 1.5302

So the enrichment has an odd ration of ~1.53 with a significant P-value < 2.2e-16.

Monday 11th July 2016