mutation fields
The mutation endpoint contains data from TCGA about genes and mutations.
A period (.) between words in a column name indicates that the term after the period is a nested field. The nesting structure can be browsed more easily in the mutation JSON schema.
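
To illustrate the dotted-name convention, the following Python sketch resolves a dotted column name against a nested record. The helper name `get_nested`, the sample record, and the field name `gene.symbol` are hypothetical and are not taken from the mutation JSON schema.

```python
from typing import Any


def get_nested(record: dict, column_name: str, default: Any = None) -> Any:
    """Follow a dotted column name through nested dictionaries.

    Returns `default` when any step of the path is missing.
    """
    current: Any = record
    for key in column_name.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical record: the nested layout is illustrative only.
record = {"gene": {"symbol": "TP53"}, "Chromosome": "chr17"}

print(get_nested(record, "gene.symbol"))    # nested field
print(get_nested(record, "Chromosome"))     # top-level field
print(get_nested(record, "gene.missing"))   # absent path falls back to default
```

A column name without a period is simply looked up at the top level, so the same helper works for both kinds of name.
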
column_name | description | data_type |
---|---|---|
AA_MAF | Non-reference allele and frequency of existing variant in NHLBI-ESP African American population | STRING |
AFR_MAF | Non-reference allele and frequency of existing variant in 1000 Genomes combined African population | FLOAT |
ALLELE_NUM | Allele number from input; 0 is reference, 1 is first alternate etc. | STRING |
AMR_MAF | Non-reference allele and frequency of existing variant in 1000 Genomes combined American population | FLOAT |
Allele | The variant allele used to calculate the consequence | STRING |
Amino_acids | Amino acid substitution caused by the mutation. Only given if the variation affects the protein-coding sequence | STRING |
BIOTYPE | Biotype of transcript | STRING |
CANONICAL | A flag (YES) indicating that the VEP-based canonical transcript, the longest translation, was used for this gene. If not, the value is null | STRING |
CCDS | The CCDS identifier for this transcript, where applicable | STRING |
CDS_position | Relative position of base pair in coding sequence. A - symbol is displayed as the numerator if the variant does not appear in coding sequence | STRING |
CLIN_SIG | Clinical significance of variant from dbSNP | STRING |
CONTEXT | The reference allele per VCF specs, and its five flanking base pairs | STRING |
COSMIC | Overlapping COSMIC variants | STRING |
Center | One or more genome sequencing centers reporting the variant | STRING |
Chromosome | Chromosome, possible values: chr1-22, and chrX | STRING |
Codons | The alternative codons with the variant base in upper case | STRING |
Consequence | Consequence type of this variant; sequence ontology terms | STRING |
DISTANCE | Shortest distance from the variant to transcript | INTEGER |
DOMAINS | The source and identifier of any overlapping protein domains | STRING |
EAS_MAF | Non-reference allele and frequency of existing variant in 1000 Genomes combined East Asian population | FLOAT |
EA_MAF | Non-reference allele and frequency of existing variant in NHLBI-ESP European American population | STRING |
ENSP | The Ensembl protein identifier of the affected transcript | STRING |
EUR_MAF | Non-reference allele and frequency of existing variant in 1000 Genomes combined European population | FLOAT |
EXON | The exon number (out of total number) | STRING |
End_Position | Highest numeric genomic position of the reported variant on the genomic reference sequence. Mutation end coordinate | INTEGER |
Entrez_Gene_Id | Entrez gene ID (an integer). 0 is used for regions that do not correspond to a gene region or Ensembl ID | INTEGER |
ExAC_AF | Global Allele Frequency from ExAC | STRING |
ExAC_AF_AFR | African/African American Allele Frequency from ExAC | STRING |
ExAC_AF_AMR | American Allele Frequency from ExAC | STRING |
ExAC_AF_Adj | Adjusted Global Allele Frequency from ExAC | STRING |
ExAC_AF_EAS | East Asian Allele Frequency from ExAC | STRING |
ExAC_AF_FIN | Finnish Allele Frequency from ExAC | STRING |
ExAC_AF_NFE | Non-Finnish European Allele Frequency from ExAC | STRING |
ExAC_AF_OTH | Other Allele Frequency from ExAC | STRING |
ExAC_AF_SAS | South Asian Allele Frequency from ExAC | STRING |
Existing_variation | Known identifier of existing variation | STRING |
Exon_Number | The exon number (out of total number) | STRING |
FILTER | Copied from input VCF. This includes filters implemented directly by the variant caller and other external software used in the DNA-Seq pipeline. See below for additional details. | STRING |
Feature | Stable Ensembl ID of feature (transcript, regulatory, motif) | STRING |
Feature_type | Type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature (or blank) | STRING |
GDC_FILTER | GDC filters applied universally across all MAFs | STRING |
GDC_Validation_Status | GDC implementation of validation checks. See notes section (#5) below for details | STRING |
GMAF | Non-reference allele and frequency of existing variant in 1000 Genomes | FLOAT |
Gene | The gene symbol. In this table, gene symbol is gene name e.g. ACADVL | STRING |
HGNC_ID | Gene identifier from the HUGO Gene Nomenclature Committee if applicable | STRING |
HGVS_OFFSET | Indicates by how many bases the HGVS notations for this variant have been shifted | INTEGER |
HGVSc | The coding sequence of the variant in HGVS recommended format | STRING |
HGVSp | The protein sequence of the variant in HGVS recommended format. p.= signifies no change in the protein | STRING |
HGVSp_Short | Same as the HGVSp column, but using 1-letter amino-acid codes | STRING |
Hugo_Symbol | HUGO symbol for the gene (HUGO symbols are always in all caps). Unknown is used for regions that do not correspond to a gene | STRING |
IMPACT | The impact modifier for the consequence type | STRING |
INTRON | The intron number (out of total number) | STRING |
MC3_Overlap | Indicates whether this region overlaps with an MC3 variant for the same sample pair | STRING |
MINIMISED | Alleles in this variant have been converted to minimal representation before consequence calculation (1 or null) | STRING |
Matched_Norm_Sample_UUID | Unique GDC identifier for normal aliquot (10189 unique) | STRING |
Mutation_Status | An assessment of the mutation as somatic, germline, LOH, post transcriptional modification, unknown, or none. The values allowed in this field are constrained by the value in the Validation_Status field | STRING |
NCBI_Build | The reference genome used for the alignment (GRCh38) | STRING |
One_Consequence | The single consequence of the canonical transcript in sequence ontology terms, eg missense_variant | STRING |
PHENO | Indicates if existing variant is associated with a phenotype, disease or trait (0, 1, or null) | STRING |
PICK | Indicates if this block of consequence data was picked by VEP's pick feature (1 or null) | STRING |
PUBMED | Pubmed ID(s) of publications that cite existing variant | STRING |
PolyPhen | The PolyPhen prediction and/or score | STRING |
Protein_position | Relative position of affected amino acid in protein. A - symbol is displayed as the numerator if the variant does not appear in coding sequence | STRING |
RefSeq | RefSeq identifier for this transcript | STRING |
Reference_Allele | The plus strand reference allele at this position. Includes the deleted sequence for a deletion or - for an insertion | STRING |
SAS_MAF | Non-reference allele and frequency of existing variant in 1000 Genomes combined South Asian population | FLOAT |
SIFT | The SIFT prediction and/or score, with both given as prediction (score) | STRING |
SOMATIC | Somatic status of each ID reported under Existing_variation (0, 1, or null) | STRING |
SWISSPROT | UniProtKB/Swiss-Prot accession | STRING |
SYMBOL | Gene symbol, e.g. TP53, LRP1B (same as the Hugo_Symbol field, except blank instead of Unknown) | STRING |
SYMBOL_SOURCE | The source of the gene symbol, usually HGNC, rarely blank, other sources include Uniprot_gn, EntrezGene, etc | STRING |
Sequencer | Instrument used to produce primary sequence data | STRING |
Start_Position | Lowest numeric position of the reported variant on the genomic reference sequence. Mutation start coordinate | INTEGER |
Strand | Either + or - to denote whether read mapped to the sense (+) or anti-sense (-) strand | STRING |
TRANSCRIPT_STRAND | The DNA strand (1 or -1) on which the transcript/feature lies | INTEGER |
TREMBL | UniProtKB/TrEMBL identifier of protein product | STRING |
TSL | Transcript support level, which is based on independent RNA analyses | INTEGER |
Transcript_ID | Ensembl ID of the transcript affected by the variant | STRING |
Tumor_Sample_UUID | Unique GDC identifier for tumor aliquot (10189 unique) | STRING |
Tumor_Seq_Allele1 | Primary data genotype for tumor sequencing (discovery) allele 1. A - symbol for a deletion represents a variant. A - symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases | STRING |
Tumor_Seq_Allele2 | Primary data genotype for tumor sequencing (discovery) allele 2. A - symbol for a deletion represents a variant. A - symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases | STRING |
Tumor_Validation_Allele1 | Secondary data from orthogonal technology. Tumor genotyping (validation) for allele 1. A - symbol for a deletion represents a variant. A - symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases | STRING |
Tumor_Validation_Allele2 | Secondary data from orthogonal technology. Tumor genotyping (validation) for allele 2 | STRING |
UNIPARC | UniParc identifier of protein product | STRING |
VARIANT_CLASS | Sequence Ontology variant class | STRING |
Validation_Method | The assay platforms used for the validation call | STRING |
Variant_Classification | Translational effect of variant allele | STRING |
Variant_Type | Type of mutation (SNP, DNP, TNP, ONP, INS, DEL, or Consolidated). TNP (tri-nucleotide polymorphism) is analogous to DNP (di-nucleotide polymorphism) but for three consecutive nucleotides. ONP (oligo-nucleotide polymorphism) is analogous to TNP but for consecutive runs of four or more nucleotides | STRING |
aliquot_barcode_normal | TCGA aliquot barcode for the normal control, eg TCGA-12-1089-01A-01D-0517-01 | STRING |
aliquot_barcode_tumor | TCGA aliquot barcode for the tumor, eg TCGA-12-1089-01A-01D-0517-01 | STRING |
all_effects | A semicolon delimited list of all possible variant effects, sorted by priority ([Symbol,Consequence,HGVSp_Short,Transcript_ID,RefSeq,HGVSc,Impact,Canonical,Sift,PolyPhen,Strand]) | STRING |
cDNA_position | Relative position of base pair in the cDNA sequence as a fraction. A - symbol is displayed as the numerator if the variant does not appear in cDNA | STRING |
callerName | Pipe-delimited list of mutation caller(s) that agreed on this particular call, always in alphabetical order: muse, mutect, somaticsniper, varscan | STRING |
case_barcode | Original TCGA case barcode, eg TCGA-DX-A8BN | STRING |
case_id | Unique GDC identifier for the underlying case | STRING |
dbSNP_RS | The rs-IDs from the dbSNP database, novel if not found in any database used, or null if there is no dbSNP record, but it is found in other databases | STRING |
dbSNP_Val_Status | The dbSNP validation status is reported as a semicolon-separated list of statuses. The union of all rs-IDs is taken when there are multiple | STRING |
fileName | Pipe-delimited list of names of the underlying MAF files | STRING |
fileUUID | Pipe-delimited list of unique GDC identifiers for the underlying MAF files | STRING |
n_depth | Read depth across this locus in normal BAM | STRING |
normal_bam_uuid | Unique GDC identifier for the underlying normal bam file | STRING |
project_short_name | Project short name: the program name followed by a project name abbreviation, e.g. TCGA-OV | STRING |
sample_barcode_normal | TCGA sample barcode for the normal control, eg TCGA-12-1089-01A. One sample may have multiple sets of CN segmentations corresponding to multiple aliquots; use GROUP BY appropriately in queries | STRING |
sample_barcode_tumor | TCGA sample barcode for the tumor, eg TCGA-12-1089-01A. One sample may have multiple sets of CN segmentations corresponding to multiple aliquots; use GROUP BY appropriately in queries | STRING |
src_vcf_id | Pipe-delimited list of GDC VCF file identifiers | STRING |
t_alt_count | Read depth supporting the variant allele in tumor BAM | STRING |
t_depth | Read depth across this locus in tumor BAM | STRING |
t_ref_count | Read depth supporting the reference allele in tumor BAM | STRING |
tumor_bam_uuid | Unique GDC identifier for the underlying tumor BAM file | STRING |
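
Several STRING columns in the table above pack more than one value into a single cell: the position columns use a fraction with - as a possible numerator, SIFT and PolyPhen combine a prediction and a score, all_effects is a semicolon-delimited list, and callerName is a delimited list. The Python sketch below shows one way such values might be unpacked. The function names and sample values are hypothetical, the pipe delimiter for callerName is an assumption, and the sketch assumes that sub-field values contain no commas or semicolons.

```python
import re
from typing import Dict, List, Optional, Tuple

# Sub-field order as listed in the all_effects description.
EFFECT_FIELDS = [
    "Symbol", "Consequence", "HGVSp_Short", "Transcript_ID", "RefSeq",
    "HGVSc", "Impact", "Canonical", "Sift", "PolyPhen", "Strand",
]


def parse_position(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Split a 'position/total' fraction (CDS_position, Protein_position,
    cDNA_position). A '-' or any non-integer part, such as a range,
    is returned as None in this sketch."""
    numerator, _, denominator = value.partition("/")

    def to_int(text: str) -> Optional[int]:
        return int(text) if text.isdigit() else None

    return to_int(numerator), to_int(denominator)


def parse_prediction(value: str) -> Tuple[Optional[str], Optional[float]]:
    """Split a SIFT or PolyPhen value written as 'prediction(score)'.
    Either part may be absent, since the column holds a prediction and/or score."""
    match = re.fullmatch(r"([^()]*?)\s*(?:\(([^()]*)\))?", value.strip())
    if match is None:
        return None, None
    label, score = match.group(1) or None, match.group(2)
    if score is None and label is not None:
        # No parentheses: the value is either a bare score or a bare prediction.
        try:
            return None, float(label)
        except ValueError:
            return label, None
    return label, float(score) if score else None


def parse_all_effects(value: str) -> List[Dict[str, str]]:
    """Turn the semicolon-delimited all_effects string into one dict per effect.
    zip() silently truncates entries that have fewer sub-fields than expected."""
    effects = []
    for entry in value.split(";"):
        if entry:
            effects.append(dict(zip(EFFECT_FIELDS, entry.split(","))))
    return effects


def split_callers(value: str, delimiter: str = "|") -> List[str]:
    """Split callerName; the '|' delimiter is an assumption."""
    return [name for name in value.split(delimiter) if name]


# Illustrative values only, not taken from the dataset.
example_effects = (
    "TP53,missense_variant,p.R175H,ENST00000269305,NM_000546.5,"
    "c.524G>A,MODERATE,YES,deleterious(0),probably_damaging(0.999),-1;"
)
print(parse_position("-/1182"))
print(parse_prediction("deleterious(0.01)"))
print(parse_all_effects(example_effects)[0]["Consequence"])
print(split_callers("muse|mutect|varscan"))
```

Returning None for a missing numerator or score keeps absent values distinct from a genuine 0, which matters for columns such as SIFT where 0 is a valid score.
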
Last update: 2022-09-28