This page is outdated, please also see the man page.

How to run fenrich?

This mode allows assessing a set of QTLs fall within some functional annotations more often than what is expected by chance. To illustrate how it works, first download the required example data:

The results from a full cis-analysis: TXT
The functional annotation for few Transcription Factor Binding sites: BED

Step1: Prepare the QTL data

First, you need to prepare a BED file containing the positions of the QTLs of interest. To do so, extract all significant hits at a given FDR threshold (e.g. 5%):


		Rscript ./script/runFDR_cis.R results.genes.full.txt.gz 0.05 results.genes

Then, transform the significant QTL list into a BED file using the following command:


		cat results.genes.significant.txt | awk '{ print $9, $10-1, $11, $8, $1, $5 }'  | tr " " "\t" | sort -k1,1 -k2,2n > results.genes.significant.bed

The resulting BED file looks like this:


		head results.genes.significant.bed

		1	15210	15211	1_15211	ENSG00000227232.4	-

		1	735984	735985	1_735985	ENSG00000177757.1	+

		1	735984	735985	1_735985	ENSG00000240453.1	-

		1	739527	739528	1_739528	ENSG00000237491.4	+

		1	754963	754964	1_754964	ENSG00000225880.4	-

		1	769222	769223	1_769223	ENSG00000228794.4	+

		1	832397	832398	1_832398	ENSG00000230699.2	+

		1	844342	844343	1_844343	ENSG00000223764.2	-

		1	879675	879676	1_879676	ENSG00000187634.6	+

		1	895705	895706	1_895706	ENSG00000187961.9	+

Note here that this file contains the follwing columns:

1. Chromosome ID
2. Start position of the QTL
3. End position of the QTL
4. QTL ID
5. Targeted phenotype ID (crucial here, specify well the phenotype so that distance between QTL and phenotype can be determined)
6. Strand orientation (not used, can be whatever you want)

Step2: Prepare the Phenotype data

Then, you need to prepare a BED file containing the positions of all Phenotypes you mapped QTLs for. To do so, run this command on the raw QTL mapping output:


		zcat results.genes.full.txt.gz | awk '{ print $2, $3-1, $4, $1, $8, $5 }'  | tr " " "\t" | sort -k1,1 -k2,2n > results.genes.quantified.bed

The resulting BED file looks like this:


		head results.genes.quantified.bed

		1	29369	29370	ENSG00000227232.4	1_15211	-

		1	135894	135895	ENSG00000268903.1	1_985446	-

		1	137964	137965	ENSG00000269981.1	1_1118728	-

		1	139378	139379	ENSG00000237683.5	1_965604	-

		1	267252	267253	ENSG00000228463.4	1_256498	-

		1	317719	317720	ENSG00000237094.7	1_15211	+

		1	564441	564442	ENSG00000225972.1	1_1079866	+

		1	565019	565020	ENSG00000225630.1	1_1077962	+

		1	566453	566454	ENSG00000237973.1	1_69428	+

		1	568136	568137	ENSG00000229344.1	1_1392271	+

This file contains the follwing columns:

1. Chromosome ID of the phenotype
2. Start position of the phenotype
3. End position of the phenotype
4. Phenotype ID
5. Top variants (not used, can be whatever you want)
6. Strand orientation (important to measure distance between QTLs and phenotypes)

Step3: Run the enrichment measurements

Now that you've got your QTL and phenotype file ready to go, you can then measure the enrichement using the following command:


		QTLtools fenrich --qtl results.genes.significant.bed --tss results.genes.quantified.bed --bed TFs.encode.bed.gz --out enrichement.QTL.in.TF.txt

This generates a file enrichement.QTL.in.TF.txt that contains the number of functional annotations per 1kb bins around QTLs.


		cat enrichement.QTL.in.TF.txt
 
		1088 10762 711.253 23.1664 0.000999001 1.7049 1.58987 1.48432

1. observed number of QTLs falling within the functional annotations
2. total number of QTLs
3. mean expected number of QTLs falling within the functional annotations (across multiple permutations, see below)
4. standard deviation of the expected number of QTLs falling within the functional annotations (across multiple permutations, see below)
5-8. dummy fields

You can use the following option to tune the calculations:

--permute [int]: number of permutations of the functional landscape done to get expectations (default is 1000)

Once you've got your observation and expectation, you can measure the enrichment related statistics in R using:


		R> D=read.table("enrichement.QTL.in.TF.txt", head=FALSE, stringsAsFactors=FALSE)

		R> fisher.test(matrix(c(D$V1, D$V2, round(D$V3), D$V2), ncol=2))

		R= Fisher's Exact Test for Count Data

		R= data:  matrix(c(D$V1, D$V2, round(D$V3), D$V2), ncol = 2)

		R= p-value < 2.2e-16

		R= alternative hypothesis: true odds ratio is not equal to 1

		R= 95 percent confidence interval:

		R= 1.385641 1.690700

		R= sample estimates:

		R= odds ratio 

		R= 1.5302

So the enrichment has an odd ration of ~1.53 with a significant P-value < 2.2e-16.

Monday 11th July 2016