This page is outdated, please also see the man page.
How to run a conditional analysis in cis?
A conditional analysis can only be run once a permutation pass on the data has been done. Why? Because the permutation pass allows to get a nominal P-value threshold of significance per phenotypes. To illustrate this, first download the example data below and go through the two steps; 1. permutation pass and 2. conditional analysis.
- The phenotype data matrix for chr22 on 358 samples: BED / index
- The genotype data matrix for chr22 on 358 samples: VCF / index
- The covariate data matrix on 358 samples: TXT
Step1: Run the permutation pass
This can be done quickly using 4 threads as follows:
for j in $(seq 1 16); do
echo "cis --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --permute 200 --chunk $j 16 --out permutations_$j\_16.txt";
done | xargs -P4 -n14 QTLtools
Then, cat all chunks together and run FDR correction:
cat permutations_*.txt | gzip -c > permutations_all.txt.gz
Rscript ./script/runFDR_cis.R permutations_all.txt.gz 0.05 permutations_all
From this, you get a file permutations_all.thresholds.txt with the nominal P-value thresholds that you need for the conditional analysis:
ENSG00000182057.4 7.6768047839515e-05
ENSG00000233903.2 9.8313776034753e-05
ENSG00000273366.1 8.39380360819764e-05
...
ENSG00000225783.2 2.72012352409146e-05
ENSG00000206028.1 3.83673656872974e-05
ENSG00000244625.1 3.66061481116287e-05
Step2: Run the conditional analysis
Now you can proceed with the actual conditional analysis. For one chunk, this is done as:
QTLtools cis --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --mapping permutations_all.thresholds.txt --chunk 12 16 --out conditional_12_16.txt
The output conditional_12_16.txt looks like this:
ENSG00000073150.9 chr22 50609160 50609160 + 5641 -71259 rs12169319 chr22 50537901 50537901 0 6.60258e-08 0.202819 1 1 6.60258e-08 0.202819 1 1
ENSG00000073150.9 chr22 50609160 50609160 + 5641 -58747 rs9617077 chr22 50550413 50550413 0 6.84017e-05 0.0802672 0 1 6.84017e-05 0.0802672 0 1
ENSG00000073150.9 chr22 50609160 50609160 + 5641 -53474 rs760749 chr22 50555686 50555686 0 3.83196e-05 0.083734 0 1 3.83196e-05 0.083734 0 1
...
ENSG00000184319.11 chr22 51222185 51222185 + 3201 11115 rs62240042 chr22 51233300 51233300 0 9.89086e-06 -0.105956 0 1 0.000483508 -0.292154 0 0
ENSG00000184319.11 chr22 51222185 51222185 + 3201 -41251 rs9616824 chr22 51180934 51180934 1 4.9067e-06 -0.390492 1 1 4.9067e-06 -0.390492 1 1
ENSG00000184319.11 chr22 51222185 51222185 + 3201 -29599 rs5771006 chr22 51192586 51192586 1 7.71478e-05 -0.391773 0 1 7.71478e-05 -0.391773 0 1
With the various columns giving:
- 1. Phenotype ID
- 2. Phenotype chr ID
- 3. Phenotype start position
- 4. Phenotype end position
- 5. Phenotype strand orientation
- 6. Number of variants tested in cis
- 7. Distance between the variant and the phenotype
- 8. Variant ID
- 9. Variant chr ID
- 10. Variant start position
- 11. Variant end position
- 12. Rank of the association. This tells you if the variant has been mapped as belonging to the best signal (rank=0), the second best (rank=1), etc ... As a consequence, the maximal rank value for a given phenotype tells you how many independent signals there are (e.g. rank=2 means 3 independent signals).
- 13. Forward nominal P-value
- 14. Forward regression slope
- 15. Binary flag; 1 means that the variant is the top forward variant of this rank, 0 otherwise
- 16. Binary flag; 1 means that the forward P-value is below the threshold of this phenotype, 0 otherwise
- 17. Backward nominal P-value
- 18. Backward regression slope
- 19. Binary flag; 1 means that the variant is the top backward variant of this rank, 0 otherwise
- 20. Binary flag; 1 means that the backward P-value is below the threshold of this phenotype, 0 otherwise
Now, run all chunks and if you are only interested in the top variant for each signal, you can filter out the results for the column 19 as follows:
cat conditional_full.txt | awk '{ if ($19 == 1) print $0}' > conditional_top_variants.txt
Let's take an example:
cat conditional_top_variants | grep ENSG00000188130.9
ENSG00000188130.9 chr22 50700254 50700254 - 5139 -93 rs11913279 chr22 50700347 50700347 0 2.25986e-09 -0.278047 1 1 1.58737e-12 -0.724656 1 1
ENSG00000188130.9 chr22 50700254 50700254 - 5139 94985 rs116914202 chr22 50605269 50605269 1 7.6825e-05 -0.533554 0 1 7.58821e-06 -0.602191 1 1
ENSG00000188130.9 chr22 50700254 50700254 - 5139 65061 rs4084288 chr22 50635193 50635193 2 7.29639e-05 -0.347478 1 1 7.29639e-05 -0.347478 1 1
From this, you can see that the gene ENSG00000188130.9 has three significant eQTLs with independent effects: rs11913279 (rank=0), rs116914202 (rank=1) and rs4084288 (rank=2).