This page is outdated; please also see the man page.

How to run an approximate pass in trans?

To perform an approximate analysis in trans, proceed in two steps: (1) build a null distribution and (2) use it to adjust nominal P-values. Unlike in the full permutation pass (link), the null distribution here is designed to adjust nominal P-values for the number of variants being tested. To illustrate this process, first download this example data set:

A phenotype data matrix for 358 samples on chr22: BED / index
A genotype data matrix for 358 samples on chr22: VCF / index

Step 1: Build the null distribution

To do so, run the command:


		QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --sample 1000 --normal --out trans.sample

This randomly chooses 1,000 phenotypes, permutes them, and tests each of them for association with all variants.
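
The sampling step can be pictured with the following minimal Python sketch. It is not the QTLtools implementation: the function names, the toy data, sampling with replacement, and the use of the largest absolute Pearson correlation as the "best association" are all assumptions made for illustration. For a fixed number of samples, the largest absolute correlation corresponds to the smallest nominal P-value.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def null_best_scores(phenotypes, genotypes, n_sample, rng):
    """Pick n_sample phenotypes at random, shuffle each, keep its best |r|."""
    ids = list(phenotypes)
    results = []
    for _ in range(n_sample):
        pid = rng.choice(ids)  # sampling with replacement (assumption)
        shuffled = list(phenotypes[pid])
        rng.shuffle(shuffled)  # shuffling breaks any real genotype link
        best = max(abs(pearson_r(shuffled, g)) for g in genotypes.values())
        results.append((pid, best))
    return results

# Hypothetical toy data: 5 phenotypes and 50 variants measured on 20 samples.
rng = random.Random(0)
n_samples = 20
phenotypes = {f"gene{i}": [rng.gauss(0, 1) for _ in range(n_samples)]
              for i in range(5)}
genotypes = {f"snp{j}": [rng.choice([0, 1, 2]) for _ in range(n_samples)]
             for j in range(50)}

null = null_best_scores(phenotypes, genotypes, n_sample=10, rng=rng)
print(null[:3])
```

Each returned pair plays the role of one line of the null file: a phenotype ID and the strength of its best association after shuffling.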

We strongly recommend using the --normal option here to make sure that all phenotypes follow the same distribution.
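
To see why such a transform puts every phenotype on the same scale, here is a minimal Python sketch of a rank-based inverse normal transform. The exact formula used by --normal is not stated on this page, so the (rank - 0.5) / n offset, the tie handling, and the function name are assumptions.

```python
from statistics import NormalDist

def rank_normalize(values):
    """Map values onto standard-normal quantiles according to their ranks."""
    n = len(values)
    # Sort sample indices by value; ties keep their original order here.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    standard_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        # (rank - 0.5) / n keeps the quantile strictly inside (0, 1).
        out[i] = standard_normal.inv_cdf((rank - 0.5) / n)
    return out

# Hypothetical, strongly skewed quantifications for one phenotype.
skewed = [0.1, 250.0, 3.2, 0.7, 12.5]
normalized = rank_normalize(skewed)
print([round(x, 3) for x in normalized])
```

After the transform, only the ordering of the samples matters, so an outlier such as 250.0 no longer dominates the association test.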

This produces 3 output files, but you are only interested in the file trans.sample.best.txt.gz, which lists the best association discovered for each shuffled phenotype.


		zcat trans.sample.best.txt.gz | head
		ENSG00000223704.1 -1 4.7752e-06
		ENSG00000025770.14 -1 0.00029513
		ENSG00000183569.13 -1 0.00017113
		ENSG00000226471.2 -1 2.28579e-05
		ENSG00000167077.8 -1 1.84116e-05
		ENSG00000226471.2 -1 3.45359e-05
		ENSG00000211644.2 -1 2.24922e-05
		ENSG00000268292.1 -1 3.26526e-05
		ENSG00000182841.8 -1 6.16083e-05
		ENSG00000241360.1 -1 2.49368e-05

The columns in this file are:

1. The ID of the randomly chosen phenotype with shuffled quantifications
2. Dummy field, not used here
3. The smallest nominal P-value discovered for the phenotype

Since the command was run with --sample 1000, this file contains 1,000 lines. Together they form a null distribution that can be used to correct the P-values of a nominal pass, similarly to what is done in cis.
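
If you want to inspect this null distribution yourself, the file can be parsed in a few lines of Python. The sketch below assumes whitespace-delimited columns as in the listing above. The function name is illustrative, and the demo runs on a small stand-in file holding the first three rows shown earlier instead of the real output.

```python
import gzip
import os
import tempfile

def read_null_pvalues(path):
    """Return the third column (smallest nominal P-value) of each line."""
    pvalues = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 3:
                pvalues.append(float(fields[2]))
    return pvalues

# Stand-in file with the first three rows of the listing above.
rows = [
    "ENSG00000223704.1 -1 4.7752e-06",
    "ENSG00000025770.14 -1 0.00029513",
    "ENSG00000183569.13 -1 0.00017113",
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.best.txt.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("\n".join(rows) + "\n")

null = read_null_pvalues(demo_path)
print(len(null), min(null))
```

On the real trans.sample.best.txt.gz, the returned list should contain one value per sampled phenotype.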

Step 2: Adjust the nominal P-values

To run a nominal pass and adjust the P-values given the null distribution, use:


		QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --adjust trans.sample.best.txt.gz --normal --threshold 0.1 --out trans.adjust

This does the same as --nominal, except that each nominal P-value is adjusted. Specifically, the most notable options used in this command are:

--adjust trans.sample.best.txt.gz: to read the null and fit a beta distribution to it
--normal: again, do not forget to add this option in order to be consistent with step 1
--threshold 0.1: this needs to be increased since it now applies to adjusted P-values rather than nominal ones
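
The idea behind the beta-based adjustment can be illustrated with a simplified Python sketch. It is not the fitting procedure used by QTLtools: as an assumption, the first shape parameter is fixed at 1, so the null of smallest P-values is modelled as Beta(1, b), where b acts as an effective number of independent tests. The function names are illustrative, and the ten null values are the ones listed in step 1.

```python
import math

def fit_effective_tests(null_min_pvalues):
    """Maximum-likelihood estimate of b for a Beta(1, b) null."""
    # The Beta(1, b) log-likelihood is N*log(b) + (b - 1)*sum(log(1 - p)),
    # which is maximised at b = -N / sum(log(1 - p)).
    total = sum(math.log1p(-p) for p in null_min_pvalues)
    return -len(null_min_pvalues) / total

def adjust(nominal_p, b):
    """Beta(1, b) CDF at nominal_p, i.e. 1 - (1 - p)^b."""
    return -math.expm1(b * math.log1p(-nominal_p))

# The ten smallest nominal P-values shown in the step 1 listing.
null = [4.7752e-06, 0.00029513, 0.00017113, 2.28579e-05, 1.84116e-05,
        3.45359e-05, 2.24922e-05, 3.26526e-05, 6.16083e-05, 2.49368e-05]

b = fit_effective_tests(null)
for p in (1e-8, 1e-6, 1e-4):
    print(p, adjust(p, b))
```

Under this model, an adjusted P-value is the probability that the best association of a shuffled phenotype is at least as strong as the nominal one, which is why a threshold on adjusted P-values must be much larger than one on nominal P-values.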

The output is now corrected for the multiple variants being tested, but we still need to correct for the multiple phenotypes being tested. To do so, we proceed in a way very similar to what is done in cis.

Extract significant hits

Here, we need to determine the adjusted P-value threshold corresponding to a given FDR and then use it to extract all significant hits. To do so, we look at two files produced by the last command: trans.adjust.best.txt.gz and trans.adjust.hits.txt.gz. The first is used to determine the adjusted P-value threshold, which is then used to filter non-significant hits out of the second. For this purpose, we provide a minimal R script that can be adapted, extended and optimized according to your needs. To run this script, use:


		Rscript ./script/runFDR_atrans.R trans.adjust.best.txt.gz trans.adjust.hits.txt.gz 0.05 output.txt

The file output.txt is the subset of trans.adjust.hits.txt.gz that contains all hits significant at the specified FDR level.
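
The two-file logic can be sketched in Python as follows. This is not a port of runFDR_atrans.R, whose internals are not described on this page: the Benjamini-Hochberg step-up rule is used here as a stand-in FDR procedure, and the function names, the (phenotype, variant, adjusted P-value) tuple layout, and the toy values are all hypothetical.

```python
def bh_threshold(best_pvalues, fdr):
    """Largest P-value passing the Benjamini-Hochberg step-up rule, or None."""
    m = len(best_pvalues)
    threshold = None
    for rank, p in enumerate(sorted(best_pvalues), start=1):
        if p <= fdr * rank / m:
            threshold = p  # keep the largest P-value that passes
    return threshold

def filter_hits(hits, threshold):
    """Keep (phenotype, variant, adjusted_p) hits at or below the threshold."""
    if threshold is None:
        return []
    return [hit for hit in hits if hit[2] <= threshold]

# Hypothetical best adjusted P-value per phenotype (one per phenotype).
best = [0.001, 0.015, 0.3, 0.8, 0.04]
# Hypothetical hits: several variants may be reported for one phenotype.
hits = [
    ("geneA", "snp1", 0.001),
    ("geneA", "snp2", 0.012),
    ("geneB", "snp3", 0.015),
    ("geneC", "snp4", 0.04),
    ("geneD", "snp5", 0.09),
]

threshold = bh_threshold(best, fdr=0.05)
significant = filter_hits(hits, threshold)
print(threshold, significant)
```

The threshold is learned from one value per phenotype, which accounts for the number of phenotypes tested, and is then applied to every reported hit.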

Monday 11th July 2016