This page is outdated; please also see the man page.

How to cite MBV?

If you use this particular tool in your study, please cite this paper:

Fort et al. MBV: an efficient sample mislabeling and technical bias detection method for combined genotype and sequencing assay data sets (in review).

How to run mbv?

The mbv mode of QTLtools (previously called match) requires two files as input: a BAM file of mapped sequence reads (--bam) and a VCF file of genotypes (--vcf).

To illustrate how it works, we provide the following two example files: HG00381.chr22.bam and genotypes.chr22.vcf.gz.

Then, you can run mbv on these two files with the command:

QTLtools mbv --bam HG00381.chr22.bam --vcf genotypes.chr22.vcf.gz --filter-mapping-quality 150 --out HG00381.chr22.bamstat.txt

Note that we use the option --filter-mapping-quality 150 because it is the recommended value for removing poor-quality reads from a BAM generated with the GEM mapper.

This command produces an output file HG00381.chr22.bamstat.txt (the name given to --out) as follows:

HG00096 0 23764 61721 175 499 91 333 29
HG00097 0 26639 58846 193 481 93 317 23
HG00099 0 27672 57813 216 458 93 294 26
HG00100 0 28267 57218 243 431 107 281 24
...
HG00381 0 27339 58146 213 461 204 408 28
...
HG00106 0 26046 59439 190 484 90 317 30
HG00108 0 25408 60077 205 469 85 297 31

The 9 columns, referred to below as A to I from left to right, give the sample ID in the VCF (A) followed by eight numeric fields.

You can then compute the ratios G/E and H/F (columns 7/5 and 8/6) to get the concordances at heterozygous and homozygous genotypes, respectively. This helps identify the genotyped sample in the VCF that best matches your sequence data. In the output above, the best match is HG00381, which has the same sample ID as the sequenced sample, meaning that there is no sample swap here. Alternatively, you can inspect the output by plotting it in R.

On this plot, we can clearly identify the set of genotyped samples that do not match the sequence data (in red) with the best match as a clear outlier (in green).
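The G/E and H/F ratios described above can also be computed programmatically. The following is a minimal Python sketch, not part of QTLtools. It assumes the output is whitespace-separated with 9 fields per row and that the columns are lettered A to I from left to right. The names parse_mbv and best_match are hypothetical helpers, and ranking samples by the sum of the two ratios is an illustrative choice rather than the tool's own method.

```python
from typing import NamedTuple


class Concordance(NamedTuple):
    sample: str
    het: float  # G / E: concordance at heterozygous genotypes
    hom: float  # H / F: concordance at homozygous genotypes


def parse_mbv(lines):
    """Turn mbv output rows into per-sample concordance ratios."""
    results = []
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or elided ("...") lines
        try:
            # Columns E, F, G, H are fields 5 to 8 (0-based indices 4 to 7).
            e, f, g, h = (int(fields[i]) for i in (4, 5, 6, 7))
        except ValueError:
            continue  # skip a non-numeric line such as a header, if any
        het = g / e if e else 0.0  # assumption: treat an empty denominator as 0
        hom = h / f if f else 0.0
        results.append(Concordance(fields[0], het, hom))
    return results


def best_match(results):
    """Pick the sample with the highest summed concordance (illustrative rule)."""
    return max(results, key=lambda r: r.het + r.hom)


if __name__ == "__main__":
    # Rows copied from the example output above.
    rows = [
        "HG00096 0 23764 61721 175 499 91 333 29",
        "HG00097 0 26639 58846 193 481 93 317 23",
        "HG00381 0 27339 58146 213 461 204 408 28",
        "HG00106 0 26046 59439 190 484 90 317 30",
    ]
    best = best_match(parse_mbv(rows))
    print(f"{best.sample} het={best.het:.3f} hom={best.hom:.3f}")
    # prints: HG00381 het=0.958 hom=0.885
```

To process a real output file, the open file handle can be passed directly, for example parse_mbv(open("HG00381.chr22.bamstat.txt")). If the best match has a different ID from the sequenced sample, that suggests a possible sample swap.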

Hereafter is the list of options that the mbv mode accepts to tune the analysis:

Monday 11th July 2016