Browsing latest articles
Browse All 97 View Live

How can I turn on or customize forum notifications?

By default, the forum does not send notification messages about new comments or discussions. If you want to turn on notifications or customize the type of notifications you want to receive (email,...

View Article


What is "Phone Home" and how does it affect me?

1. What it is and how it helps us improve the GATK: Since September 2010, the GATK has had a "phone-home" feature that sends us information about each GATK run via the Broad filesystem (within the...

View Article


Collected FAQs about VCF files

1. What file formats do you support for variant callsets? We support the Variant Call Format (VCF) for variant callsets. No other file formats are supported. 2. How can I know if my VCF file is valid?...

View Article

Why are some of the annotation values different with VariantAnnotator...

As featured in this forum question, two main things account for these kinds of differences, both linked to default behaviors of the tools: the tools downsample to different depths of coverage; the tools...

View Article

What is Map/Reduce and why are GATK tools called "walkers"?

Overview: One of the key challenges of working with next-gen sequence data is that input files are usually very large. We can’t just make the program open the files, load all the data into memory and...

View Article
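The map/reduce idea behind walkers can be illustrated with a minimal Python sketch. This is not GATK code: the `Read` record, the base-counting `map_read` step and the `reduce_counts` accumulator are hypothetical stand-ins. The point is that the map step looks at one read at a time, and the reduce step folds partial results together, so the data never has to be loaded into memory all at once.

```python
from functools import reduce
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record: a name and its base calls."""
    name: str
    bases: str


def map_read(read: Read) -> dict[str, int]:
    """Map step: compute a result for one read, independently of every other read."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in read.bases.upper():
        if base in counts:  # ignore N and other ambiguity codes
            counts[base] += 1
    return counts


def reduce_counts(total: dict[str, int], partial: dict[str, int]) -> dict[str, int]:
    """Reduce step: fold one partial result into the running total."""
    return {b: total.get(b, 0) + partial.get(b, 0) for b in total.keys() | partial.keys()}


def walk(reads: Iterable[Read]) -> dict[str, int]:
    """Traverse the reads: map each one, then reduce the results into a single summary."""
    return reduce(reduce_counts, map(map_read, reads), {})
```

Because `reduce_counts` is associative, two shards of the data can be walked separately and their results combined with one more reduce, which is what makes this style of traversal easy to parallelize.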


How do I submit a detailed bug report?

Note: only do this if you have been explicitly asked to do so. Scenario: You posted a question about a problem you had with GATK tools, we answered that we think it's a bug, and we asked you to submit...

View Article


What is GATK-Lite and how does it relate to "full" GATK 2.x? [RETIRED]

Please note that GATK-Lite was retired in February 2013 when version 2.4 was released. See the announcement here. You probably know by now that GATK-Lite is a free-for-everyone and completely...

View Article

How can I invoke read filters and their arguments?

Most GATK tools apply several read filters by default. You can look up exactly what the defaults are for each tool in its respective Technical Documentation page. But sometimes you want to specify...

View Article



What are the prerequisites for running GATK?

1. Operating system: The GATK runs natively on most, if not all, flavors of UNIX, which include Mac OS X, Linux and BSD. It is possible to get it running on Windows using Cygwin, but we don't provide any...

View Article


Can I use different versions of the GATK at different steps of my analysis?

Short answer: NO. Medium answer: no, at least not if you want to run a low-risk pipeline. Long answer: see below for details. The rationale: There are several reasons why you might want to do this:...

View Article

What types of variants can GATK tools detect / handle?

The answer depends on what tool we're talking about, and whether we're considering variant discovery or variant manipulation. Variant manipulation: GATK variant manipulation tools are able to recognize...

View Article
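To make the variant categories concrete, here is a minimal Python sketch that classifies a single REF/ALT pair by comparing allele lengths. It is a simplification and not how GATK tools type variants internally: it assumes VCF-style alleles where indels carry one shared padding base, does not trim shared bases from multi-base alleles, and treats symbolic alleles such as `<DEL>` or `<NON_REF>`, breakends and the `*` allele as out of scope.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair (simplified, VCF-style alleles)."""
    ref, alt = ref.upper(), alt.upper()
    # Symbolic alleles, breakends and the spanning-deletion "*" are not literal sequences
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "SYMBOLIC"
    if ref == alt:
        return "NO_VARIATION"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "INSERTION"  # e.g. A -> AT (the shared first base is VCF padding)
    if len(ref) > len(alt) and ref.startswith(alt):
        return "DELETION"   # e.g. ATG -> A
    return "COMPLEX"        # length change combined with a substitution
```

For example, `classify_variant("A", "AT")` gives `"INSERTION"` and `classify_variant("AT", "GCC")` gives `"COMPLEX"`.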

Where can I get the GATK source code?

We distinguish "Classic GATK" (major versions 1 through 3) and GATK 4, the next generation of GATK tools. "Classic GATK" (major versions 1 through 3) (current distribution): We provide the current GATK...

View Article

What do the VariantEval modules do?

VariantEval accepts two types of modules: stratification and evaluation modules. Stratification modules will stratify (group) the variants based on certain properties. Evaluation modules will compute...

View Article
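The split between stratification and evaluation modules can be sketched in a few lines of Python. This is an illustration of the idea, not VariantEval's implementation: the `Variant` record and the `known` flag are hypothetical, the stratifier groups variants by known versus novel status, and the evaluator computes a transition/transversion ratio for each group.

```python
from collections import defaultdict
from typing import Callable, Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal biallelic variant record."""
    chrom: str
    pos: int
    ref: str
    alt: str
    known: bool  # e.g. present in a known-sites resource such as dbSNP


def by_novelty(v: Variant) -> str:
    """Stratification: assign each variant to exactly one group."""
    return "known" if v.known else "novel"


TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_ratio(variants: list[Variant]) -> float:
    """Evaluation: transition/transversion ratio over the SNPs in one group."""
    ti = tv = 0
    for v in variants:
        if len(v.ref) == 1 and len(v.alt) == 1 and v.ref != v.alt:
            if frozenset((v.ref, v.alt)) in TRANSITIONS:
                ti += 1
            else:
                tv += 1
    return ti / tv if tv else float("nan")  # undefined when there are no transversions


def stratify_and_evaluate(
    variants: Iterable[Variant],
    stratifier: Callable[[Variant], str],
    evaluator: Callable[[list[Variant]], float],
) -> dict[str, float]:
    """Group variants by stratum, then run the evaluation once per group."""
    groups: dict[str, list[Variant]] = defaultdict(list)
    for v in variants:
        groups[stratifier(v)].append(v)
    return {stratum: evaluator(members) for stratum, members in groups.items()}
```

Keeping the two kinds of functions independent means any stratification can be combined with any evaluation, which is the same reason the two module types are separate.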


Collected FAQs about input files for sequence read data (BAM/CRAM)

1. What file formats do you support for sequence data input? The GATK supports the BAM format for reads, quality scores, alignments, and metadata (e.g. the lane of sequencing, center of origin, sample...

View Article

What is the structure of a GATK command?

Overview: This document describes how GATK commands are structured and how to add arguments to basic command examples. Basic Java syntax. Commands for GATK always follow the same basic syntax: java [Java...

View Article


Where can I get a gene list in RefSeq format?

1. About the RefSeq format. From the NCBI RefSeq website: The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including...

View Article

Which tools use pedigree information?

There are two types of GATK tools that are able to use pedigree (family structure) information. Tools that require a pedigree to operate: PhaseByTransmission and CalculateGenotypePosteriors will not run...

View Article
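Pedigree information is usually supplied as a PED file. As an illustration only, the Python sketch below parses the standard six PED columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, with `0` for an unknown parent) and extracts complete trios, the family structure that trio-based tools work on. The `complete_trios` helper is hypothetical and is not how any GATK tool reads pedigrees.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    """One row of a six-column PED file."""
    family: str
    individual: str
    father: str  # "0" means unknown or not in the study
    mother: str  # "0" means unknown or not in the study
    sex: str
    phenotype: str


def parse_ped(text: str) -> list[PedRecord]:
    """Parse whitespace-delimited PED rows, skipping blank and comment lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected 6 PED columns, got {len(fields)}: {line!r}")
        records.append(PedRecord(*fields[:6]))
    return records


def complete_trios(records: list[PedRecord]) -> list[tuple[str, str, str]]:
    """Return (child, father, mother) where both parents are also in the pedigree."""
    ids = {r.individual for r in records}
    return [
        (r.individual, r.father, r.mother)
        for r in records
        if r.father in ids and r.mother in ids
    ]
```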


I'm new to GATK. Where do I start?

If this is your first rodeo, you're probably asking yourself: What can GATK do for me? Identify variants in a bunch of sample sequences, with great sensitivity and specificity. How do I get GATK to do...

View Article


Can I use GATK on non-diploid organisms?

In general, most GATK tools don't care about ploidy. The major exception is, of course, at the variant calling step: the variant callers need to know what ploidy is assumed for a given sample in order...

View Article
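One reason ploidy matters at the calling step is that it determines which genotypes the caller has to consider. A genotype is an unordered multiset of `ploidy` alleles, so with `n` alleles the number of possible genotypes is the binomial coefficient C(n + ploidy - 1, ploidy). The Python sketch below is plain combinatorics, not GATK code.

```python
from math import comb


def genotype_count(num_alleles: int, ploidy: int) -> int:
    """Number of unordered genotypes: multisets of size `ploidy` over `num_alleles` alleles."""
    if num_alleles < 1 or ploidy < 1:
        raise ValueError("num_alleles and ploidy must be positive")
    return comb(num_alleles + ploidy - 1, ploidy)
```

For a biallelic site a diploid sample has 3 possible genotypes (0/0, 0/1, 1/1), while a tetraploid sample has 5 (0 to 4 copies of the alternate allele), and the count grows quickly as alleles are added.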

What is the GATKReport file format?

A GATKReport is simply a text document that contains a well-formatted, easy-to-read representation of some tabular data. Many GATK tools output their results as GATKReports, so it's important to...

View Article

What do I need to do before attending a workshop hands-on session?

So you're going to a GATK workshop, and you've been selected to participate in a hands-on session? Fantastic! We're looking forward to walking you through some exercises that will help you master the...

View Article


How can I use parallelism to make GATK tools run faster?

This document provides technical details and recommendations on how the parallelism options offered by the GATK can be used to yield optimal performance results. Overview: As explained in the primer on...

View Article
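The GATK's own parallelism options are described in the article itself; the Python sketch below only illustrates the general scatter-gather pattern. The genome is split into fixed-size, 1-based inclusive intervals (the scatter), a hypothetical `process_interval` function runs on each chunk in a worker pool, and the per-chunk results are combined (the gather). A thread pool is used here for simplicity; CPU-bound work in Python would normally use processes or separate jobs instead.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(contig_lengths: dict[str, int], chunk_size: int) -> list[tuple[str, int, int]]:
    """Split each contig into 1-based, inclusive intervals of at most chunk_size bases."""
    intervals = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)
            intervals.append((contig, start, end))
            start = end + 1
    return intervals


def process_interval(interval: tuple[str, int, int]) -> int:
    """Hypothetical per-interval work; here it just reports the interval length."""
    _contig, start, end = interval
    return end - start + 1


def scatter_gather(contig_lengths: dict[str, int], chunk_size: int, workers: int = 4) -> int:
    intervals = scatter(contig_lengths, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_interval, intervals))  # map keeps input order
    return sum(results)  # gather: combine the per-interval results
```

Because each interval is processed independently, the scatter step is also where the degree of parallelism is chosen: smaller chunks give more parallel jobs at the cost of more per-job overhead.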


What input files does the GATK accept / require?

All analyses done with the GATK typically involve several (though not necessarily all) of the following inputs: reference genome sequence, sequencing reads, intervals of interest, reference-ordered data...

View Article

What is uBAM and why is it better than FASTQ for storing unmapped sequence data?

Most sequencing providers generate FASTQ files with the raw unmapped read sequences, so that is the most common form in which the data is input into the mapping step of the pre-processing pipeline....

View Article


What is a GVCF and how is it different from a 'regular' VCF?

Overview: GVCF stands for Genomic VCF. A GVCF is a kind of VCF, so the basic format specification is the same as for a regular VCF (see the spec documentation here), but a Genomic VCF contains extra...

View Article
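Part of what a GVCF adds is a record of non-variant sites, which HaplotypeCaller compresses into reference blocks by grouping consecutive sites of similar genotype quality. The Python sketch below shows that grouping idea only; it is not HaplotypeCaller's algorithm. The GQ band boundaries, the `(pos, gq)` input shape and the choice to keep the minimum GQ per block are all assumptions made for illustration.

```python
from bisect import bisect_right


def gq_band(gq: int, boundaries: list[int]) -> int:
    """Index of the GQ band containing gq; boundaries are sorted band lower edges above 0."""
    return bisect_right(boundaries, gq)


def reference_blocks(
    sites: list[tuple[int, int]], boundaries: list[int]
) -> list[tuple[int, int, int, int]]:
    """Merge consecutive non-variant sites that fall in the same GQ band.

    sites: (pos, gq) pairs on one contig, sorted by position.
    Returns (start, end, band_index, min_gq) tuples, one per block.
    """
    blocks: list[tuple[int, int, int, int]] = []
    for pos, gq in sites:
        band = gq_band(gq, boundaries)
        if blocks:
            start, end, prev_band, min_gq = blocks[-1]
            if prev_band == band and pos == end + 1:  # adjacent and same band: extend
                blocks[-1] = (start, pos, band, min(min_gq, gq))
                continue
        blocks.append((pos, pos, band, gq))  # otherwise start a new block
    return blocks
```

With hypothetical boundaries `[20, 60]`, sites with GQ 30 and 45 at adjacent positions share a block, while a gap in coverage or a jump to GQ 70 starts a new one.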


How should I cite GATK in my own publications?

To date we have published three papers on GATK (citation details below). The ideal way to cite the GATK is to use all three as a triple citation, as in: We sequenced 10 samples on 10 lanes on an Illumina...

View Article

What should I use as known variants/sites for running tool X?

1. Notes on known sites. Why are they important? Each tool uses known sites differently, but what is common to all is that they use them to help distinguish true variants from false positives, which is...

View Article


Should I analyze my samples alone or together?

Together is (almost always) better than alone. We recommend performing variant discovery in a way that enables joint analysis of multiple samples, as laid out in our Best Practices workflow. That...

View Article

I have multiple read groups for 1 sample. How should I pre-process them?

Things can get a bit messy when you have multiple libraries (or read groups) for a sample. You may not know how to organize the data for the pre-processing steps or how to feed the data into Haplotype...

View Article



Lane, Library, Sample and Cohort -- what do they mean and why are they...

There are four major organizational units for next-generation DNA sequencing processes that are used throughout the GATK documentation: Lane: The basic machine unit for sequencing. The lane reflects the...

View Article
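In the data itself, these units are usually recorded in the read group (`@RG`) header lines of a SAM/BAM file, whose `SM` field names the sample, `LB` the library and `PU` the platform unit (typically flowcell and lane). The Python sketch below nests read groups as sample, then library, then lane, and treats the cohort as the set of all samples analyzed together. The `ReadGroup` record and the `organize` helper are hypothetical illustrations, not part of any GATK or Picard API.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadGroup(NamedTuple):
    """Subset of SAM @RG fields: ID, SM (sample), LB (library), PU (platform unit)."""
    id: str
    sample: str
    library: str
    platform_unit: str


def organize(read_groups: list[ReadGroup]) -> dict[str, dict[str, list[str]]]:
    """Nest read groups as sample -> library -> [platform units, e.g. lanes]."""
    tree: defaultdict = defaultdict(lambda: defaultdict(list))
    for rg in read_groups:
        tree[rg.sample][rg.library].append(rg.platform_unit)
    return {sample: dict(libraries) for sample, libraries in tree.items()}
```

The keys of the returned dictionary are the samples in the cohort; a sample sequenced from two libraries across several lanes shows up as one key with two library entries.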

What is the difference between QUAL and GQ annotations?

There has been a lot of confusion about QUAL versus GQ, and we hope this FAQ will clear it up. The basic difference is that QUAL refers to the variant site whereas GQ...

View Article
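Both annotations are Phred-scaled, meaning a probability p is reported as Q = -10 log10(p). For GQ, the Python sketch below assumes the usual convention for GATK output: genotype likelihoods are stored as normalized PLs (the most likely genotype has PL 0), GQ is the gap between the lowest and second-lowest PL, and the value is capped at 99. Treat it as an illustration of that convention rather than the tools' code.

```python
import math


def phred(p: float) -> float:
    """Phred-scale an error probability: Q = -10 * log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def gq_from_pls(pls: list[int], cap: int = 99) -> int:
    """GQ as the difference between the second-best and best PL, capped (assumed convention)."""
    if len(pls) < 2:
        raise ValueError("need at least two genotype likelihoods")
    best, second = sorted(pls)[:2]
    return min(second - best, cap)
```

For PLs of `0,30,300`, the called genotype is 30 Phred units more likely than the next-best alternative, so GQ is 30; an error probability of 0.01 corresponds to Q20.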


What is a VCF and how should I interpret it?

This document describes "regular" VCF files. For information on the special kind of VCF called gVCF, produced by HaplotypeCaller in -ERC GVCF mode, please see this companion document. Contents: What is...

View Article
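As a reading aid, the Python sketch below splits one VCF data line into its fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and, when present, the FORMAT column and per-sample columns. It is a minimal sketch assuming a well-formed, tab-delimited line: it ignores the header, does no type conversion of INFO or FORMAT values, and is no substitute for a real VCF library.

```python
from typing import NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alts: list[str]
    qual: Optional[float]
    filters: list[str]
    info: dict[str, Union[str, bool]]
    samples: list[dict[str, str]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-delimited VCF data line (not a header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, vid, ref, alt, qual, flt, info_str = cols[:8]
    info: dict[str, Union[str, bool]] = {}
    if info_str != ".":
        for entry in info_str.split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True  # flag fields have no "=value"
    samples = []
    if len(cols) > 9:
        keys = cols[8].split(":")  # FORMAT column names the per-sample fields
        for sample_col in cols[9:]:
            samples.append(dict(zip(keys, sample_col.split(":"))))
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alts=alt.split(",") if alt != "." else [],
        qual=None if qual == "." else float(qual),
        filters=[] if flt == "." else flt.split(";"),
        info=info,
        samples=samples,
    )
```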

How should I pre-process data from multiplexed sequencing and multi-library...

Our Best Practices Pre-processing documentation assumes a simple experimental design in which you have one set of input sequence files (forward/reverse or interleaved FASTQ, or unmapped uBAM) per...

View Article


Collected FAQs about interval lists

1. What file formats do you support for interval lists? We support three types of interval lists, as mentioned here. Interval lists should preferentially be formatted as Picard-style interval lists,...

View Article
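A common pitfall when moving between interval formats is the coordinate convention: BED uses 0-based, half-open intervals, while Picard-style and GATK-style interval lists use 1-based, closed intervals. The GATK reads BED files directly, so this conversion is not something you normally need to do yourself; the Python sketch below (with a hypothetical helper name) only shows the arithmetic by turning BED lines into 1-based `contig:start-end` strings.

```python
def bed_to_gatk_intervals(bed_text: str) -> list[str]:
    """Convert BED lines (0-based, half-open) to 1-based, closed 'contig:start-end' strings."""
    intervals = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        contig, start0, end = line.split()[:3]
        start1 = int(start0) + 1  # shift the 0-based start to 1-based
        end1 = int(end)           # a half-open end equals the closed 1-based end
        if start1 > end1:
            raise ValueError(f"empty or inverted BED interval: {line!r}")
        intervals.append(f"{contig}:{start1}-{end1}")
    return intervals
```

So the BED line `chr1 0 100` covers the same bases as `chr1:1-100`, and `chr2 999 1000` is the single base `chr2:1000-1000`.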

Should I use UnifiedGenotyper or HaplotypeCaller to call variants on my data?

Use HaplotypeCaller! The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper. Its ability to call SNPs is equivalent to that of the UnifiedGenotyper, its ability to call...

View Article

When should I use -L to pass in a list of intervals?

The -L argument (short for --intervals) enables you to restrict your analysis to specific intervals instead of running over the whole genome. Using this argument can have important consequences for...

View Article


How can I prepare a FASTA file to use as reference?

This article describes the steps necessary to prepare your reference file (if it's not one that you got from us). As a complement to this article, see the relevant tutorial. Why these steps are...

View Article
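Preparing a reference for the GATK includes creating a FASTA index (`.fai`), normally generated with `samtools faidx`. To show what that index contains, the Python sketch below computes the five `.fai` columns for each sequence: name, length, byte offset of the first base, bases per line, and bytes per line including the newline. It assumes `\n` line endings and uniform line lengths within each record (which the `.fai` format requires) and does not validate them; use samtools for real references.

```python
def fai_entries(fasta: bytes) -> list[tuple[str, int, int, int, int]]:
    """Compute .fai rows: (name, length, offset, line_bases, line_width).

    Assumes '\n' line endings and that all sequence lines of a record except
    the last have the same length; neither assumption is checked here.
    """
    entries = []
    offset = 0      # byte position of the current line in the file
    current = None  # [name, length, first_base_offset, line_bases, line_width]
    for raw in fasta.splitlines(keepends=True):
        line = raw.rstrip(b"\n")
        if line.startswith(b">"):
            if current is not None:
                entries.append(tuple(current))
            name = line[1:].split()[0].decode()  # name ends at the first whitespace
            current = [name, 0, offset + len(raw), 0, 0]
        elif current is not None and line:
            if current[3] == 0:  # first sequence line defines the line layout
                current[3] = len(line)
                current[4] = len(raw)
            current[1] += len(line)
        offset += len(raw)
    if current is not None:
        entries.append(tuple(current))
    return entries
```

The byte offset and line layout are what allow tools to seek straight to any base of any contig without reading the whole file.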


Which training sets / arguments should I use for running VQSR?

This document describes the resource datasets and arguments that we recommend for use in the two steps of VQSR (i.e. the successive application of VariantRecalibrator and ApplyRecalibration), based on...

View Article

What's in the resource bundle and how can I get it?

1. Obtaining the bundle. Inside the Broad, the latest bundle will always be available in: /humgen/gsa-hpprojects/GATK/bundle/current with a subdirectory containing for each reference sequence and...

View Article

Can I apply the germline variant joint calling workflow to my RNAseq data?

We have not yet validated the joint genotyping methods (HaplotypeCaller in -ERC GVCF mode per-sample then GenotypeGVCFs per-cohort) on RNAseq data. Our standard recommendation is to process RNAseq...

View Article

How should I select samples for a Panel of Normals for somatic analysis?

The Panel of Normals (PoN) plays two important roles in somatic variant analysis: Exclude germline variant sites that are found in the normals to avoid calling them as potential somatic variants in the...

View Article
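The first role of the PoN, excluding sites seen in normals, can be sketched as a simple set filter. This Python example is an illustration of the idea only and is not how Mutect2 builds or applies a PoN: sites are hypothetical `(contig, pos, ref, alt)` tuples, and the requirement that a site appear in at least `min_normals` normals is an assumed threshold.

```python
from collections import Counter

Site = tuple[str, int, str, str]  # (contig, pos, ref, alt)


def build_pon_sites(normal_callsets: list[set[Site]], min_normals: int = 2) -> set[Site]:
    """Sites observed in at least `min_normals` of the normal samples."""
    counts: Counter = Counter()
    for callset in normal_callsets:
        counts.update(set(callset))  # count each site at most once per normal
    return {site for site, n in counts.items() if n >= min_normals}


def filter_somatic_candidates(candidates: list[Site], pon_sites: set[Site]) -> list[Site]:
    """Drop candidate somatic sites that are present in the panel of normals."""
    return [site for site in candidates if site not in pon_sites]
```

Requiring a site to recur across several normals, rather than appear in just one, is one way to keep the panel focused on recurrent germline or artifactual sites; the right threshold depends on how many normals are in the panel.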


What do GATK workshops cover?

This is a summary description of our standard 3-day workshop, with an optional 1-day pipelining add-on at the discretion of the organizer. Overview: This workshop formula focuses on the core steps involved...

View Article