This page is outdated, please also see the man page.

How to run a conditional analysis in cis?

A conditional analysis can only be run once a permutation pass on the data has been done. Why? Because the permutation pass allows to get a nominal P-value threshold of significance per phenotypes. To illustrate this, first download the example data below and go through the two steps; 1. permutation pass and 2. conditional analysis.


Step1: Run the permutation pass

This can be done quickly using 4 threads as follows:

for j in $(seq 1 16); do
  echo "cis --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --permute 200 --chunk $j 16 --out permutations_$j\_16.txt";
done | xargs -P4 -n14 QTLtools

Then, cat all chunks together and run FDR correction:

cat permutations_*.txt | gzip -c > permutations_all.txt.gz
Rscript ./script/runFDR_cis.R permutations_all.txt.gz 0.05 permutations_all

From this, you get a file permutations_all.thresholds.txt with the nominal P-value thresholds that you need for the conditional analysis:

ENSG00000182057.4 7.6768047839515e-05
ENSG00000233903.2 9.8313776034753e-05
ENSG00000273366.1 8.39380360819764e-05
...
ENSG00000225783.2 2.72012352409146e-05
ENSG00000206028.1 3.83673656872974e-05
ENSG00000244625.1 3.66061481116287e-05

Step2: Run the conditional analysis

Now you can proceed with the actual conditional analysis. For one chunk, this is done as:

QTLtools cis --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --mapping permutations_all.thresholds.txt --chunk 12 16 --out conditional_12_16.txt

The output conditional_12_16.txt looks like this:

ENSG00000073150.9 chr22 50609160 50609160 + 5641 -71259 rs12169319 chr22 50537901 50537901 0 6.60258e-08 0.202819 1 1 6.60258e-08 0.202819 1 1
ENSG00000073150.9 chr22 50609160 50609160 + 5641 -58747 rs9617077 chr22 50550413 50550413 0 6.84017e-05 0.0802672 0 1 6.84017e-05 0.0802672 0 1
ENSG00000073150.9 chr22 50609160 50609160 + 5641 -53474 rs760749 chr22 50555686 50555686 0 3.83196e-05 0.083734 0 1 3.83196e-05 0.083734 0 1
...
ENSG00000184319.11 chr22 51222185 51222185 + 3201 11115 rs62240042 chr22 51233300 51233300 0 9.89086e-06 -0.105956 0 1 0.000483508 -0.292154 0 0
ENSG00000184319.11 chr22 51222185 51222185 + 3201 -41251 rs9616824 chr22 51180934 51180934 1 4.9067e-06 -0.390492 1 1 4.9067e-06 -0.390492 1 1
ENSG00000184319.11 chr22 51222185 51222185 + 3201 -29599 rs5771006 chr22 51192586 51192586 1 7.71478e-05 -0.391773 0 1 7.71478e-05 -0.391773 0 1

With the various columns giving:

We strongly recommend to use the outcome of the backward pass for any downstrean analysis, since it represents what is given by a full linear model incorporating all signals together. The forward is just there for information.

Now, run all chunks and if you are only interested in the top variant for each signal, you can filter out the results for the column 19 as follows:

cat conditional_full.txt | awk '{ if ($19 == 1) print $0}' > conditional_top_variants.txt

Let's take an example:

cat conditional_top_variants | grep ENSG00000188130.9
ENSG00000188130.9 chr22 50700254 50700254 - 5139 -93 rs11913279 chr22 50700347 50700347 0 2.25986e-09 -0.278047 1 1 1.58737e-12 -0.724656 1 1
ENSG00000188130.9 chr22 50700254 50700254 - 5139 94985 rs116914202 chr22 50605269 50605269 1 7.6825e-05 -0.533554 0 1 7.58821e-06 -0.602191 1 1
ENSG00000188130.9 chr22 50700254 50700254 - 5139 65061 rs4084288 chr22 50635193 50635193 2 7.29639e-05 -0.347478 1 1 7.29639e-05 -0.347478 1 1

From this, you can see that the gene ENSG00000188130.9 has three significant eQTLs with independent effects: rs11913279 (rank=0), rs116914202 (rank=1) and rs4084288 (rank=2).