Resources

SEQUENCE DATABASES & RETRIEVAL SYSTEMS

The major sequence retrieval systems
These systems use the sequence databases listed under "Major public sequence databases" below.

NCBI Entrez (USA); also contains the PubMed literature database (MEDLINE).
   http://www.ncbi.nlm.nih.gov/Entrez/ NEW: Search all DB
   http://www.ncbi.nlm.nih.gov/entrez/ OLD: Choose DB
SRS (Europe, several sites)
   http://srs.ebi.ac.uk
   http://srs.sanger.ac.uk
   Lund http://titanic.thep.lu.se

Major public sequence databases
These databases belong to the International Nucleotide Sequence Database Collaboration and exchange data daily. Although each member database stores and presents the underlying data in a slightly different format, this data exchange makes all known nucleotide and protein sequence data available to all users, regardless of which of the three databases is queried.
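As a small illustration of the format differences mentioned above, the sketch below reads the primary accession number from either a GenBank-style or an EMBL-style flat-file record. It assumes only that GenBank-style records carry an ACCESSION keyword line and EMBL-style records carry an AC line; the two demo records and the accession DEMO0001 are made up for the example and are not real database entries.

```python
def primary_accession(record_text):
    """Return the first accession found in a GenBank- or EMBL-style record, or None."""
    for line in record_text.splitlines():
        if line.startswith("ACCESSION"):
            # GenBank-style keyword line: "ACCESSION   <acc> [<secondary> ...]"
            fields = line.split()[1:]
            if fields:
                return fields[0]
        elif line.startswith("AC "):
            # EMBL-style two-letter line code: "AC   <acc>; [<secondary>; ...]"
            fields = line[2:].replace(";", " ").split()
            if fields:
                return fields[0]
    return None


# Hypothetical, heavily truncated records used only for illustration.
genbank_like = "LOCUS       DEMO0001   12 bp    DNA\nACCESSION   DEMO0001\nORIGIN\n"
embl_like = "ID   DEMO0001; SV 1; linear; genomic DNA; STD; UNC; 12 BP.\nAC   DEMO0001;\n//\n"

print(primary_accession(genbank_like))
print(primary_accession(embl_like))
```

Both calls return the same accession, which is the practical point of the daily exchange: the same record can be located whichever database is queried.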

DNA Data Bank of Japan
   http://www.ddbj.nig.ac.jp
EMBL Nucleotide Sequence Database
   http://www.ebi.ac.uk/embl/index.html
GenBank
   http://www.ncbi.nlm.nih.gov

Expressed sequence tag clustering databases
The ability to bring together expressed sequence tag, mRNA and other related sequences into gene-oriented clusters often facilitates genomic analysis, since the method groups individual sequences that most likely arise from the same gene or transcript. These three databases provide gene-oriented views of the data, each using a different algorithm to calculate the individual gene clusters.
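The idea of gene-oriented clustering can be sketched with a toy example. The code below is not the algorithm used by UniGene, STACK or the TIGR Gene Indices; it is a minimal single-linkage sketch that assumes two sequences belong together whenever they share at least one exact k-mer, and it ignores reverse complements, sequencing errors and alignment quality. All names (cluster_sequences, the reads dictionary and its sequences) are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs, k=8, min_shared=1):
    """Single-linkage clustering: join sequences sharing >= min_shared k-mers."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kmer_sets = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up sequences: est1/est2 overlap, est2/mrna1 overlap, est3 is unrelated.
reads = {
    "est1": "ATGGCGTACGTTAGCCTAGG",
    "est2": "CGTTAGCCTAGGATTTCCAA",
    "mrna1": "GGATTTCCAAGGCTTAACGT",
    "est3": "TTTTTTTTCCCCCCCCGGGG",
}
print(cluster_sequences(reads))
# [['est1', 'est2', 'mrna1'], ['est3']]
```

Note that est1 and mrna1 end up in the same cluster only through est2. This transitive joining is what lets partial transcripts be assembled into one gene-oriented group, and it is also why real clustering pipelines apply stricter overlap criteria than this sketch does.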

UniGene
   http://www.ncbi.nlm.nih.gov/UniGene
STACK
   http://www.sanbi.ac.za/Dbases.html
TIGR Gene Indices
   http://www.tigr.org/tdb/tgi.shtml

STRUCTURAL/SECONDARY DATABASES

Alternative splicing databases
EBI: Alternative Splicing Database (ASD = AltSplice, AltExtron, AEdb)
   http://www.ebi.ac.uk/asd/
UCLA: Alternative splicing annotation project (ASAP = HASDB)
   http://www.bioinformatics.ucla.edu/ASAP/

Protein/RNA secondary databases
Human Protein Reference db
   http://www.hprd.org/
PROSITE (patterns, profiles)
   http://www.expasy.org/prosite/
Protein families database (Pfam)
   http://www.sanger.ac.uk/Software/Pfam/
NCBI Conserved Domain Database (= Pfam+SMART)
   http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
EBI InterPro (= Pfam, PROSITE, SCOP, ...)
   http://www.ebi.ac.uk/interpro/
Rfam (RNA covariance models)
   http://www.sanger.ac.uk/Software/Rfam/

Structural databases
These databases hold information on the 3D structures of proteins and nucleic acids.

Protein Data Bank (PDB)
   http://www.rcsb.org/pdb/
Nucleic Acid Database (NDB)
   http://ndbserver.rutgers.edu/
NCBI MMDB (subset of PDB)
   http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

OTHER DATABASES

A more extensive list of databases is in the online Nucleic Acids Research Database Collection, at http://nar.oupjournals.org/cgi/content/full/30/1/1/DC1.

GENOME INFORMATION

Genome Browsers

Ensembl
   http://www.ensembl.org
NCBI Map Viewer
   http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/map_search
UCSC Genome Browser
   http://genome.ucsc.edu
Celera
   http://www.celera.com/genomics/academic/home.cfm
ORNL Genome Channel
   http://compbio.ornl.gov/channel/
RIKEN Genomic Sciences Center
   http://hgrep.ims.u-tokyo.ac.jp/
VISTA (homology)
   http://pipeline.lbl.gov/cgi-bin/gateway2

Genome annotation
The following sites provide detailed information on annotations at each of the three major genome portals.

Distributed Annotation System
   http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/EnsemblDAS.html
Ensembl Science Documentation
   http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/ScienceDocumentation.html
NCBI Contig Assembly and Annotation Process
   http://www.ncbi.nlm.nih.gov/genome/guide/build.html
UCSC Annotation Database
   http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html

Human Genome Hub and Genome Central
These sites provide jumping-off points to major genome-based web sites. Resources available include trace data archives, access to cDNA and expressed sequence tag data and mapping information used to produce genome assemblies. The web sites of the individual members of the International Human Genome Sequencing Consortium may be accessed through these sites.

Ensembl Human Genome Central
   http://www.ensembl.org/genome/central/
NCBI Human Genome Central
   http://www.ncbi.nlm.nih.gov/genome/guide/central.html
NHGRI Genome Hub
   http://www.nhgri.nih.gov/genome_hub.html
UK HGMP GenomeWeb
   http://www.hgmp.mrc.ac.uk/GenomeWeb/genome-db.html

GENETIC / PHYSICAL MAPS

Human genetic and physical maps
The databases listed below represent a significant portion of the data underlying current human genome assemblies. Many of these data are available through DDBJ/EMBL/GenBank, but each database contains additional information on clones, constructs and similar material that is not available through the major sequence repositories. A more extensive list of human genetic and physical maps can also be found through the online Nucleic Acids Research Database Collection, at http://nar.oupjournals.org/cgi/content/full/30/1/1/DC1.

Bacterial artificial chromosome and accession maps
   http://genome.wustl.edu/projects/human/index.php?fpc=1
GenAtlas
   http://www.citi2.fr/GENATLAS/
Genebridge4 radiation hybrid maps
   http://www.sanger.ac.uk/Software/RHserver/RHserver.shtml
GeneMap '99
   http://www.ncbi.nlm.nih.gov/genemap99
GenMapDB
   http://genomics.med.upenn.edu/genmapdb
Généthon linkage map
   http://www.genethon.fr/index_en.html
HuGeMap
   http://www.infobiogen.fr/services/Hugemap
Marshfield genetic maps
   http://research.marshfieldclinic.org/genetics/Map_Markers/maps/IndexMapFrames.html
RHdb
   http://corba.ebi.ac.uk/RHdb
Stanford G3 and TNG radiation hybrid maps
   http://www-shgc.stanford.edu/RH/

Genomic Databases and Resources
Useful databases containing human mutation, variation, medical or expression data. The list is a short, representative cross-section of the types of database freely available to genome researchers. See also the 'lists of lists' found at the Human Genome Hub and Genome Central sites for a more extensive list.

Cancer Genome Anatomy Project (CGAP)
   http://www.ncbi.nlm.nih.gov/CGAP/
Cancer Biomedical Informatics grid (caBIG)
   http://cabig.nci.nih.gov/
Genome DataBase (GDB)
   http://www.gdb.org
HUGO Gene Nomenclature
   http://www.gene.ucl.ac.uk/nomenclature
Online Mendelian Inheritance in Man (OMIM)
   http://www.ncbi.nlm.nih.gov/Omim
SNP Consortium
   http://snp.cshl.org

EDUCATION & ETHICS

Genetic education
The following sites present basic information on genetics and genomics, much of which is appropriate for elementary and secondary school education, as well as for the college level. Many of these sites offer teaching plans, graphics and other teaching resources that can be freely used in the classroom or lecture hall.

Access Excellence
   http://www.accessexcellence.org/
Department of Energy education resources
   http://www.ornl.gov/hgmis/education/education.html
Genetics Education Center
   http://www.kumc.edu/gec/
NHGRI Exploring our Molecular Selves Multimedia Kit
   http://www.genome.gov/Pages/EducationKit/
NHGRI Glossary of Genetic Terms
   http://www.genome.gov/glossary.cfm

Ethical, legal and social Issues
Ethical, legal and social issues (ELSI) are becoming increasingly important in this age of genetic and genomic research. The following web sites provide an introduction to important issues related to genome biology as applied to human health and provide a jumping-off point for further information.

Swedish directives for stem-cell research
EU directives for stem-cell research
Swedish directives for biobanks
EU directives for biobanks
DOE ELSI Program
   http://www.ornl.gov/hgmis/elsi/elsi.html
Lawrence Berkeley National Laboratory
   http://www.lbl.gov/Education/ELSI/
NHGRI ELSI Program
   http://www.nhgri.nih.gov/ELSI/

SEARCHING FOR MATCHES IN DATABASES

Sequence-based searching
A list of sequence similarity search tools can be found on the ExPASy web site, at http://us.expasy.org/tools/.

BLAST
   http://www.ncbi.nlm.nih.gov/BLAST/
Mega-BLAST with Trace DB (Cross-species)
   http://www.ncbi.nlm.nih.gov/blast/tracemb.shtml
BLAT
   http://genome.ucsc.edu/cgi-bin/hgBlat?command=start
Ensembl BLAST
   http://www.ensembl.org/Homo_sapiens/blastview
SSAHA
   http://trace.ensembl.org/perl/ssahaview
EnsMART
   http://www.ensembl.org/Multi/martview?

Pattern/domain/profile-based searching
PROSITE
   http://us.expasy.org/tools/scanprosite/
MotifScan (Prosite patterns/profiles + Pfam)
   http://hits.isb-sib.ch/cgi-bin/PFSCAN?
NCBI Conserved Domain Database (= Pfam+SMART)
   http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd
InterPro scan
   http://www.ebi.ac.uk/InterProScan/
Pfam (protein HMM models)
   http://www.sanger.ac.uk/Software/Pfam/
   http://pfam.cgb.ki.se/
Rfam (RNA covariance models)
   http://www.sanger.ac.uk/Software/Rfam/
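
To make pattern-based searching concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It is not the ScanProsite implementation; it handles only the common syntax elements (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repetition, and < or > as terminal anchors at the ends of the pattern) and does not handle anchors placed inside brackets. The function name and the test sequence are hypothetical; the pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' into a regex string."""
    pattern = pattern.rstrip(".")  # PROSITE entries end patterns with a period
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repetition suffix such as (3) or (2,4).
        core, _, rep = element.partition("(")
        rep = "{" + rep.rstrip(")") + "}" if rep else ""
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues
        # [ABC] and single letters are already valid regex syntax.
        parts.append(("^" if anchor_start else "") + core + rep
                     + ("$" if anchor_end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)
# N[^P][ST][^P]

sequence = "MKNGSAANPTQ"  # made-up sequence for illustration
hits = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]
print(hits)
# [(2, 'NGSA')]
```

The second asparagine (position 7) is not reported because it is followed by a proline, which the pattern excludes. Profile- and HMM-based tools such as Pfam score matches probabilistically and cannot be reduced to a regular expression in this way.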

Structure-based searching/prediction
CBS Prediction Servers: TMHMM, SignalP, TargetP, ChloroP (and more)
   http://www.cbs.dtu.dk/services/
CBS Bioinformatics Tools: RevTrans, MatrixPlot, RNA Structure Logos
   http://www.cbs.dtu.dk/biotools/
PSIpred, Protein Structure Prediction Server
   http://bioinf.cs.ucl.ac.uk/psipred/
FFAS, Fold & Function Assignment System etc. (Burnham Institute)
   http://bioinformatics.ljcrf.edu/pages/
List of secondary structure prediction sites
   xxx
Mfold RNA secondary structure prediction
   http://bioweb.pasteur.fr/seqanal/interfaces/mfold-simple.html
DALI/FSSP 3D structure search
   http://www.ebi.ac.uk/dali/

Gene prediction
Genscan
   http://genes.mit.edu/GENSCAN.html
HMMGENE
   http://www.cbs.dtu.dk/services/HMMgene/

Multiple Sequence Alignments
ClustalW
   http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_clustalw.html
T-coffee
   http://www.ch.embnet.org/software/TCoffee.html

MODEL ORGANISMS

Model organisms, taxonomy & information
NCBI Taxonomy
   http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html
NIH Model organisms for biomedical research
   http://www.nih.gov/science/models/
TIGR Gene Indices
   http://www.tigr.org/tdb/tgi/
The Encyclopedia of Life (EOL), complete proteome of all organisms
   http://eol.sdsc.edu/

Model organism databases
The databases below cover some of the sequencing initiatives on model organisms. Additional information on the progress of numerous model organism sequencing initiatives can be found on the Model Organisms for Biomedical Research web page, at http://www.nih.gov/science/models/. A more extensive list of organismal databases can also be found through the online Nucleic Acids Research Database Collection, at http://nar.oupjournals.org/cgi/content/full/30/1/1/DC1.

* Arabidopsis thaliana *
The Arabidopsis Information Resource
   http://www.arabidopsis.org
Arabidopsis Genome Initiative
   http://mips.gsf.de/proj/thal/db/

* Caenorhabditis elegans *
AceDB
   http://www.acedb.org
WormBase
   http://www.wormbase.org/

* Chlamydomonas reinhardtii *
Chlamydomonas reinhardtii Resource Center
   http://www.biology.duke.edu/chlamy_genome/

* Danio rerio (zebrafish) *
Zebrafish Information Network
   http://zfin.org

* Dictyostelium discoideum *
NIH models: Dictyostelium
   http://www.nih.gov/science/models/d_discoideum/
dictyBase
   http://dictybase.org/

* Drosophila melanogaster *
Berkeley Drosophila Genome Project
   http://www.fruitfly.org/
FlyBase
   http://flybase.bio.indiana.edu/

* Escherichia coli *
EcoGene
   http://bmb.med.miami.edu/EcoGene/EcoWeb/

* Fugu/Takifugu rubripes (pufferfish) *
IMCB Fugu Genome Project
   http://www.fugu-sg.org/

* Microbial Genomes *
Comprehensive Microbial Resource
   http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
TIGR Microbial Database
   http://www.tigr.org/tdb/mdb/
Plasmodium Genome Resource
   http://www.plasmodb.org/

* Mouse *
Mouse Genome Database/Informatics
   http://www.informatics.jax.org/

* Rat *
Rat Genome Database
   http://rgd.mcw.edu
RatMap & GAPP
   http://ratmap.gen.gu.se/

* Xenopus laevis *
Xenbase
   http://www.xenbase.org/
Proteins of Xenopus laevis Database
   http://eol.sdsc.edu/perl/browser.pl?tax=Xenopus%20laevis&tid=8355

* Yeast *
Comprehensive Yeast Genome Database (MIPS)
   http://mips.gsf.de/proj/yeast/CYGD/db/
Saccharomyces Genome Database (SGD)
   http://genome-www.stanford.edu/Saccharomyces/
S. pombe Genome Sequencing Project
   http://www.sanger.ac.uk/Projects/S_pombe/
Génolevures
   http://cbi.labri.fr/Genolevures/
PROPHECY DataBase, links
   http://prophecy.lundberg.gu.se/
Yeast Intron Database
   http://www.cse.ucsc.edu/research/compbio/yeast_introns.html

Last modified: 11 March 2004, MAR