This page is outdated, please also see the man page.
How to run fenrich?
This mode allows assessing a set of QTLs fall within some functional annotations more often than what is expected by chance. To illustrate how it works, first download the required example data:
- The results from a full cis-analysis: TXT
- The functional annotation for few Transcription Factor Binding sites: BED
Step1: Prepare the QTL data
First, you need to prepare a BED file containing the positions of the QTLs of interest. To do so, extract all significant hits at a given FDR threshold (e.g. 5%):
Rscript ./script/runFDR_cis.R results.genes.full.txt.gz 0.05 results.genes
Then, transform the significant QTL list into a BED file using the following command:
cat results.genes.significant.txt | awk '{ print $9, $10-1, $11, $8, $1, $5 }' | tr " " "\t" | sort -k1,1 -k2,2n > results.genes.significant.bed
The resulting BED file looks like this:
head results.genes.significant.bed
1 15210 15211 1_15211 ENSG00000227232.4 -
1 735984 735985 1_735985 ENSG00000177757.1 +
1 735984 735985 1_735985 ENSG00000240453.1 -
1 739527 739528 1_739528 ENSG00000237491.4 +
1 754963 754964 1_754964 ENSG00000225880.4 -
1 769222 769223 1_769223 ENSG00000228794.4 +
1 832397 832398 1_832398 ENSG00000230699.2 +
1 844342 844343 1_844343 ENSG00000223764.2 -
1 879675 879676 1_879676 ENSG00000187634.6 +
1 895705 895706 1_895706 ENSG00000187961.9 +
Note here that this file contains the follwing columns:
- 1. Chromosome ID
- 2. Start position of the QTL
- 3. End position of the QTL
- 4. QTL ID
- 5. Targeted phenotype ID (crucial here, specify well the phenotype so that distance between QTL and phenotype can be determined)
- 6. Strand orientation (not used, can be whatever you want)
Step2: Prepare the Phenotype data
Then, you need to prepare a BED file containing the positions of all Phenotypes you mapped QTLs for. To do so, run this command on the raw QTL mapping output:
zcat results.genes.full.txt.gz | awk '{ print $2, $3-1, $4, $1, $8, $5 }' | tr " " "\t" | sort -k1,1 -k2,2n > results.genes.quantified.bed
The resulting BED file looks like this:
head results.genes.quantified.bed
1 29369 29370 ENSG00000227232.4 1_15211 -
1 135894 135895 ENSG00000268903.1 1_985446 -
1 137964 137965 ENSG00000269981.1 1_1118728 -
1 139378 139379 ENSG00000237683.5 1_965604 -
1 267252 267253 ENSG00000228463.4 1_256498 -
1 317719 317720 ENSG00000237094.7 1_15211 +
1 564441 564442 ENSG00000225972.1 1_1079866 +
1 565019 565020 ENSG00000225630.1 1_1077962 +
1 566453 566454 ENSG00000237973.1 1_69428 +
1 568136 568137 ENSG00000229344.1 1_1392271 +
This file contains the follwing columns:
- 1. Chromosome ID of the phenotype
- 2. Start position of the phenotype
- 3. End position of the phenotype
- 4. Phenotype ID
- 5. Top variants (not used, can be whatever you want)
- 6. Strand orientation (important to measure distance between QTLs and phenotypes)
Step3: Run the enrichment measurements
Now that you've got your QTL and phenotype file ready to go, you can then measure the enrichement using the following command:
QTLtools fenrich --qtl results.genes.significant.bed --tss results.genes.quantified.bed --bed TFs.encode.bed.gz --out enrichement.QTL.in.TF.txt
This generates a file enrichement.QTL.in.TF.txt that contains the number of functional annotations per 1kb bins around QTLs.
cat enrichement.QTL.in.TF.txt
1088 10762 711.253 23.1664 0.000999001 1.7049 1.58987 1.48432
- 1. observed number of QTLs falling within the functional annotations
- 2. total number of QTLs
- 3. mean expected number of QTLs falling within the functional annotations (across multiple permutations, see below)
- 4. standard deviation of the expected number of QTLs falling within the functional annotations (across multiple permutations, see below)
- 5-8. dummy fields
You can use the following option to tune the calculations:
- --permute [int]: number of permutations of the functional landscape done to get expectations (default is 1000)
Once you've got your observation and expectation, you can measure the enrichment related statistics in R using:
R> D=read.table("enrichement.QTL.in.TF.txt", head=FALSE, stringsAsFactors=FALSE)
R> fisher.test(matrix(c(D$V1, D$V2, round(D$V3), D$V2), ncol=2))
R= Fisher's Exact Test for Count Data
R= data: matrix(c(D$V1, D$V2, round(D$V3), D$V2), ncol = 2)
R= p-value < 2.2e-16
R= alternative hypothesis: true odds ratio is not equal to 1
R= 95 percent confidence interval:
R= 1.385641 1.690700
R= sample estimates:
R= odds ratio
R= 1.5302
So the enrichment has an odd ration of ~1.53 with a significant P-value < 2.2e-16.