First, we will get a summary of information about the Factor IX gene by clicking on one of the sequences in the window (or by clicking the F9 label to the left of the graphic window). You will be taken to a new page with extensive information about the F9 gene. Among other sources of data, this page has references to 1) GenBank mRNAs, 2) UniProt/Swiss-Prot and 3) PDB.
For instance, identify the heading "Descriptions from all associated GenBank mRNAs".
Click on one of the mRNAs listed there, M11309; the resulting page has information on that single sequence (it is a GenBank mRNA entry).
Go back to the page "Human Gene F9 (uc004fas.1) Description and Page Index" and try to find the answers to the following questions.
Q2. Is the protein found inside the cell or outside?
Q3. Are there any 3D structures for the protein? If so, what methods were used to determine these structures?
Q7. There are two variants of F9 (uc004fas.1 and uc004fat.1) shown in the browser. They are two different transcripts of the same gene. What is the difference between them?
Zoom in on the first exon. If you have a sufficiently small window the amino acid sequence encoded by the exon will be displayed.
Q9. How many nucleotides are within the 5' UTR?
Q11. What human gene is located "to the left" of Factor IX, and what strand is it on? What are the approximate distances between the Factor IX gene and the respective flanking genes?
hg19 chr12 location | SNP | Ref allele | Variant allele | AA sub | Change in methotrexate clearance
21331799 | rs4149056 | T | C | V174A | decrease
21330063 | rs11045819 | C | A | P155T | increase
In this part we will illustrate the use of the UCSC Table
Browser, which provides text-based access to the genome assemblies
and annotation data stored in the Genome Browser database so that
specific data can be retrieved. This tool offers an enhanced level of
query support that includes restrictions based on field values,
free-form SQL queries, and combined queries on multiple tables.
Output can be filtered to restrict the fields and lines returned,
and may be organized into one of several formats, including a
simple tab-delimited file that can be loaded into a spreadsheet or
database as well as advanced formats that may be uploaded into the
Genome Browser as custom annotation tracks. The Table Browser
provides a convenient alternative to downloading and manipulating
the entire genome and its massive data tracks.
First, let's examine a group of genes that are characterized by
three-nucleotide repeats. Such repeat regions are often associated
with disease. One example is Huntington's disease, which results
from the expansion of a repeat of the triplet CAG. Huntington's
disease is not the only disease caused by a CAG triplet repeat
expansion; a number of other hereditary neurodegenerative diseases
involve such an expansion. One example is DRPLA, which involves
the gene ATN1.
Here is a short list of genes with trinucleotide repeats from
McMurray, C. Mechanisms of trinucleotide repeat instability during
human development. Nature Reviews Genetics 11: 786-99:
Disease | Trinucleotide | Gene | Location
DRPLA | CAG | ATN1 | CDS, exon 5
Huntington | CAG | HTT | CDS, exon 1
DM1 | CTG | DMPK | 3' UTR
Here we will use the Table Browser to identify human genes with
CAG repeats.
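To make the idea of a trinucleotide repeat concrete, here is a minimal Python sketch that finds uninterrupted runs of a triplet in a DNA string. It is only an illustration: the function name `find_triplet_runs`, the `min_copies` threshold and the toy sequence are assumptions made for this example. The link labels in the exercise ("trf at ...") suggest that the Simple Repeats track comes from Tandem Repeats Finder, which also reports imperfect repeats, so this exact-match search is not equivalent to the track.

```python
import re

def find_triplet_runs(sequence, triplet="CAG", min_copies=3):
    """Return (start, end, copies) for each exact run of `triplet`.

    Only runs with at least `min_copies` consecutive copies are reported.
    Coordinates are 0-based and half-open.
    """
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(triplet), min_copies))
    runs = []
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(triplet)
        runs.append((match.start(), match.end(), copies))
    return runs

# Hypothetical toy sequence, not taken from any real gene.
toy = "ATG" + "CAG" * 5 + "CCGCCA" + "CAG" * 2 + "TAA"
print(find_triplet_runs(toy))  # [(3, 18, 5)]
```

The second stretch of CAG in the toy sequence has only two copies and is therefore not reported.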
Go to the UCSC Genome Browser homepage and open the Table
Browser by clicking either of the Table Browser links on the
homepage (Table browser/Tables).
In the Table Browser window, select the human genome, the Feb.
2009 assembly and the group "Variation and repeats". Then select
the track "Simple Repeats". For a specific track there may
be one or more tables to describe it. In this case there is only
one table "simpleRepeat". Click on the button "describe table
schema" if you want to get more information on what is contained
in that table.
Then we define the genomic region(s) to search. In this case we
will search the entire genome, so make sure that under "region"
the "genome" option is selected.
Now click the summary/statistics button.
Q20. How many simple repeats are there in the human genome? (Check "item count".)
Now we will filter the data to identify only the repeats with the
sequence CAG. Click the filter "create" button. The resulting
form has a number of fields, but we will use only
"sequence does match": enter "CAG" in that field in place of the
default asterisk (*). Click Submit. Note that the filter button has now changed
to two buttons, "edit" and "clear".
Again, click on the summary/statistics button.
Q21. How many simple repeats with the sequence CAG are there in the human genome?
We now want to output the result. For this example we will leave the boxes Galaxy and GREAT unchecked. Several output formats are available for this data table, depending on the data you want to obtain.
For this example select "all fields from selected table". Leave
the field "Output file" empty. Click the "get output" button. A
new page will open with a list of CAG repeat regions.
Q22. How many CAG repeat regions are there on the Y chromosome?
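If the list is long, counting rows by hand is error-prone. The sketch below shows one way to count rows per chromosome in tab-delimited output with Python's standard library. It assumes that the first line is a header beginning with "#" and that it names a "chrom" column; the sample rows are made up for illustration and contain fewer columns than the real table.

```python
import csv
import io
from collections import Counter

def count_by_chromosome(handle, chrom_field="chrom"):
    """Count data rows per chromosome in tab-delimited text."""
    reader = csv.reader(handle, delimiter="\t")
    # Assumption: the header line starts with '#', e.g. '#bin'.
    header = [name.lstrip("#") for name in next(reader)]
    column = header.index(chrom_field)
    return Counter(row[column] for row in reader if row)

# Made-up rows for illustration only.
sample = io.StringIO(
    "#bin\tchrom\tchromStart\tchromEnd\tsequence\n"
    "585\tchr1\t100\t130\tCAG\n"
    "585\tchrY\t200\t245\tCAG\n"
    "586\tchrY\t900\t930\tCAG\n"
)
counts = count_by_chromosome(sample)
print(counts["chrY"])  # 2
```

To use it on real output, enter a file name in the "Output file" field instead of leaving it empty, then pass the opened file (for example `open("cag_repeats.txt", newline="")`, a hypothetical file name) in place of `sample`.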
4.2 CAG repeats continued - intersections
We saw in the previous exercise an example of filtering data. We
will now examine intersections with the Table Browser. The
intersection tool lets you find out whether two datasets overlap.
For example, we may want to know whether there is any overlap in
chromosomal location between the “known genes” dataset and the
“simple repeat” dataset, and in that case we may want to download
the data for the overlapping regions. Here we will attempt to
identify all “CAG” repeats that are within known genes and
download these sequences.
If you return to the Table Browser, your previous search and
filter should still be in place.
Clicking on the "intersection" create button will take you to an
intersection page.
Here, you choose the group, annotation track and table that you
wish to intersect with the table that you selected on the main
page. We intersect our simple repeats with UCSC genes to find
which of our filtered repeats reside in known genes. Note that
the UCSC Known Genes table will only include coding exons in
intersections.
You can choose “any overlap”, “no overlap”, or “at least” or “at
most” percentage of overlap. You can also select either an
intersection or union of the data sets. Here we choose any
overlap. Once you have completed your choices, click submit.
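The "any overlap" option can be pictured with a small Python sketch. This is a simplified illustration and not the Table Browser's actual implementation: the function names and all coordinates are hypothetical, and intervals are treated as half-open (start included, end excluded).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def any_overlap(repeats, genes):
    """Keep each repeat that overlaps at least one gene on the same chromosome."""
    kept = []
    for chrom, start, end in repeats:
        if any(g_chrom == chrom and overlaps(start, end, g_start, g_end)
               for g_chrom, g_start, g_end in genes):
            kept.append((chrom, start, end))
    return kept

# Hypothetical coordinates, for illustration only.
repeats = [("chr1", 100, 130), ("chr1", 500, 530), ("chr2", 100, 130)]
genes = [("chr1", 120, 400), ("chr2", 300, 900)]
print(any_overlap(repeats, genes))  # [('chr1', 100, 130)]
```

The "no overlap" option corresponds to keeping exactly the repeats that this function drops.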
You will find that the “intersection”
choice changes to “edit” and “clear” and text appears that shows
you the intersection. If we look at the summary/statistics as we
did earlier, you will see that by intersecting our filtered
repeats with UCSC genes in the entire genome, we’ve narrowed our
search.
As output format select "hyperlinks to Genome Browser". Click 'get output'.
This will lead to a page with hyperlinks to the browser window; each link points to a specific CAG repeat region within a known gene.
We will examine four of the repeats. First, follow the link to
"trf at chr4:3076604-3076667". You will discover that this is the
gene for the Huntingtin protein, HTT. The CAG repeats give rise to a
poly-glutamine region at the protein level.
Q23. Zoom out so that you view the entire HTT gene in the browser window. Is the polyQ region located in the N-terminal or C-terminal region of the protein?
Also check out the link to "trf at chr12:7045880-7045938". This
will show the gene ATN1 mentioned above.
Then, examine "trf at chr17:17697094-17697134". This links to the
gene RAI1. Examine information for this gene.
Q24. What disease may be associated with a polymorphism of the poly-Gln region in the RAI1 gene?
Finally, examine the link to "trf at chr19:46273463-46273524".
This is a repeat region in the gene DMPK. This can be a bit
confusing, because this gene is on the minus (complementary) strand,
whereas repeats are listed in the Genome Browser database in the
orientation of the reference strand.
Q25. What is the trinucleotide repeat of the DMPK gene/mRNA? It may help to reverse the sequence in the browser window by clicking the "reverse" button below the browser image window.
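The strand issue can also be worked out by hand or with a few lines of code. The following minimal sketch computes the reverse complement of a DNA string; the function name and the example input are illustrative assumptions. Applying it to a repeat unit as listed on the reference strand gives the unit as read on the opposite strand.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # GCAT
```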
browser position chr22:20100000-20100900
track name=coords description="Chromosome coordinates list" visibility=2 color=255,0,0
chr22 20100000 20100100
chr22 20100011 20100200
You should now see a new track with lines corresponding to the
regions listed above.
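For longer lists of regions, the custom track text can be generated instead of typed. The sketch below builds a browser line, a track line with space-separated attributes, and one line per region. The function name `build_custom_track` is hypothetical, and the viewing window is simply taken from the outermost coordinates of the regions, so it is narrower than the window used in the example above.

```python
def build_custom_track(regions, name, description, visibility=2, color=(255, 0, 0)):
    """Return custom-track text for regions given as (chrom, start, end) tuples."""
    chroms = {chrom for chrom, _, _ in regions}
    if len(chroms) != 1:
        raise ValueError("this sketch expects all regions on one chromosome")
    chrom = chroms.pop()
    # Viewing window: from the smallest start to the largest end.
    view_start = min(start for _, start, _ in regions)
    view_end = max(end for _, _, end in regions)
    lines = [
        "browser position %s:%d-%d" % (chrom, view_start, view_end),
        'track name=%s description="%s" visibility=%d color=%s'
        % (name, description, visibility, ",".join(str(c) for c in color)),
    ]
    lines.extend("%s %d %d" % region for region in regions)
    return "\n".join(lines) + "\n"

text = build_custom_track(
    [("chr22", 20100000, 20100100), ("chr22", 20100011, 20100200)],
    name="coords",
    description="Chromosome coordinates list",
)
print(text)
```

The printed text can be pasted into the custom track form in the same way as the hand-written example.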