This page is outdated, please also see the man page.

Important changes in version 1.2

The quan mode in version 1.2 is not compatible with the quantifications generated by the previous versions. This due to bug fixes and slight adjustments to the way we quantify. DO NOT MIX QUANTIFICATIONS GENERATED BY EARLIER VERSIONS OF QTLTOOLS WITH QUANTIFICATIONS FROM VERSION 1.2 AS THIS WILL CREATE A BIAS IN YOUR DATASET.

Other changes include:

Filtering of reads in quantifications

Each read will be checked for the following filters in following order. If a read fails one of the filters it will not be checked for the latter filters. This means if a read fails multiple filters it will be counted towards the first filter it fails and you can find these counts in the stats files:

How to run quan?

The QTLtools mode quant allows quantifying exons and genes and requires 2 files as input:

To illustrate how this works, we provide the 4 following example files:

Then, to perform quantification from paired-end RNA-seq with 49 bp reads aligned with GEM, we recommend to use this command:

QTLtools quan --bam HG00381.chr22.bam --gtf gencode.v19.annotation.chr22.gtf.gz --sample HG00381 --out-prefix HG00381 --filter-mapping-quality 150 --filter-mismatch 5 --filter-mismatch-total 5 --rpkm

This command produces 5 output files HG00381.5zDu4oPgLES.exon.count.bed, HG00381.5zDu4oPgLES.exon.rpkm.bed, HG00381.5zDu4oPgLES.gene.count.bed, HG00381.5zDu4oPgLES.gene.rpkm.bed, and HG00381.5zDu4oPgLES.stats. If you open the gene rpkm file, you'll get something like this:

#chr start end gene length strand HG00381
22 16062156 16062157 ENSG00000233866.1 L=586;T=lincRNA;R=chr22:16062157-16063236;N=LA16c-4G1.3 + 0
22 16076171 16076172 ENSG00000229286.1 L=121;T=pseudogene;R=chr22:16076052-16076172;N=LA16c-4G1.4 - 0
22 16084825 16084826 ENSG00000235265.1 L=578;T=pseudogene;R=chr22:16084249-16084826;N=LA16c-4G1.5 - 0
22 16124972 16124973 ENSG00000223875.1 L=685;T=pseudogene;R=chr22:16100517-16124973;N=NBEAP3 - 0
22 16122719 16122720 ENSG00000215270.3 L=1049;T=pseudogene;R=chr22:16122720-16123768;N=LA16c-60H5.7 + 56.6973588
22 16193003 16193004 ENSG00000206195.6 L=5545;T=processed_transcript;R=chr22:16147979-16193004;N=AP000525.9 - 1.05850612
22 16151396 16151397 ENSG00000271672.1 L=622;T=pseudogene;R=chr22:16150776-16151397;N=DUXAP8 - 9.436360824
22 16154765 16154766 ENSG00000272872.1 L=694;T=sense_intronic;R=chr22:16154073-16154766;N=LL22NC03-N14H11.1 - 0
22 16157078 16157079 ENSG00000232775.2 L=1572;T=pseudogene;R=chr22:16157079-16172700;N=AP000525.10 + 0

This file is a BED file with the following 7 columns:

Now, if you open HG00381.5zDu4oPgLES.exon.rpkm.bed, you'll get:

#chr start end gene length strand HG00381
...
22 17517459 17517460 ENSG00000237438.2_17517460_17518234 ENSG00000237438.2 + 7.874866966
22 17517459 17517460 ENSG00000237438.2_17525763_17525890 ENSG00000237438.2 + 106.9270201
22 17517459 17517460 ENSG00000237438.2_17528214_17528320 ENSG00000237438.2 + 77.96262006
...
22 17565843 17565844 ENSG00000177663.9_17565844_17566119 ENSG00000177663.9 + 3.009379634
22 17565843 17565844 ENSG00000177663.9_17578687_17578833 ENSG00000177663.9 + 166.3142867
22 17565843 17565844 ENSG00000177663.9_17579665_17579777 ENSG00000177663.9 + 21.7314494
...

For exon quantifications, the output is slightly different. The 4th and the 5th columns give this time, the exonID and the gene it belongs to, respectively. The TSS position is the same for all exons within the same gene. These pieces of information are important for QTL mapping in cis.

The HG00381.5zDu4oPgLES.stats file will contain:

sample HG00381
total_reads 610860
total_secondary_alingments 0
total_unmapped 27706
total_duplicate 0
total_failqc 0
total_mapQ_less_than_150 80608
total_notpaired 1404
total_mismatches_greater_than_5_5 139022
total_merged_reads 160772
total_good 353278
total_exonic 335448
total_exonic_multi_counting 369218
total_exonic_multi_counting_after_merge_(used_for_rpkm) 340749.3782233436
good_over_total 0.5783289133352978
exonic_over_total 0.5491405559375307
exonic_over_good 0.9495298320302991

This file is a stats file with the following lines:

To quantify all samples for which you have RNA-seq, proceed as follows:

The expected results will look like this:

#chr start end gene length strand HG00096 HG00097 HG00099 HG00100 HG00101 HG00102
chr1 29369 29370 ENSG00000227232.4 1800 - 3.574873147 2.159108572 2.656253435 2.147152811 3.060300881 0.4603702761
chr1 135894 135895 ENSG00000268903.1 755 - 1.094681369 0.5444517286 0 0 1.015910092 0
chr1 137964 137965 ENSG00000269981.1 284 - 2.494419821 1.184234794 0.7113553138 0.9226247455 3.437314236 2.122065293
chr1 139378 139379 ENSG00000237683.5 2661 - 1.575143469 1.306025695 0.2530689078 0.3692579309 3.092051181 1.358887358
chr1 267252 267253 ENSG00000228463.4 3954 - 0.07465174813 0.09450983011 0 0.1822382211 0.03526978984 0.209576757
chr1 317719 317720 ENSG00000237094.7 6869 + 0 0.005440265952 0 0.009536520153 0 0.03290143452
chr1 564441 564442 ENSG00000225972.1 372 + 1269.561342 1101.687828 1226.812553 1438.322375 1157.266087 2486.809527
chr1 565019 565020 ENSG00000225630.1 1044 + 1324.772668 1573.121065 1388.501978 1507.901121 1498.158208 2563.208926
chr1 566453 566454 ENSG00000237973.1 1543 + 5898.256933 6113.821775 5275.691308 6484.280259 5137.033507 8913.352248

Hereafter the list of options that can be used to tune the quantification step:

Monday 2nd September 2019