JGI Selaginella Portal

From Purdue Genomics Database Facility

Articles

Portal and assembly release information

JGI provides a genome annotation portal that includes multiple gene models, comparisons to other plant genomes, comparisons to ESTs, protein motif analysis, and GO terms.

  • Here is Bobby Otillar's (JGI) description of the release; see the JGI release notes for a complete description.
We are pleased to announce JGI annotation pre-release portal for the draft assembly and annotation of Selaginella moellendorffii (Selmo1). This genome was assembled by the Stanford Human Genome Center and annotated by JGI using JGI Annotation Pipeline.

The 212.5Mbp Selmo1 assembly contains 759 nuclear scaffolds and is predicted to have 34,292 genes. The following tables summarize the assembly and gene models statistics for Selmo1:

NUCLEAR GENOME ASSEMBLY [1]
Nuclear genome size (Mbp)                 212.5
Sequencing read coverage depth             7.0x
Reported # of contigs                     5,156
# of nuclear scaffolds                      759
# of nuclear scaffolds >2 Kbp               726
Nuclear scaffold N/L50               38/1.7 Mbp
Three largest Scaffolds (Mbp)               7.0
                                            6.2
                                            4.5
[1] While the estimated genome size is 110 Mb, we sequenced two divergent
haplotypes of Selaginella so this assembly represents 212 Mb of genomic
sequence. The statistics above include only nuclear scaffolds 1 kb or longer.
GENE MODELS [2]
average length (bp) of:
 gene            1711.37
 transcript      1229.57
 exon             213.66
 intron           103.33
Proteins [2]:
 protein length (aa)   392.04
 exons per gene          5.75
 # of gene models      34,292
[2] FilteredModels2 gene model track

JGI gene model names

JGI runs multiple gene modeling software packages. Models from each method (and sometimes from runs of the same method with different parameters) appear as separate tracks in the genome browser. Frequently none of the models is exactly correct; detecting and fixing this is the job of the expert annotator (you).

Consider the following BLAST search results:

jgi|Selmo1|73774|e_gw1.0.1.1                                5860   0.0    1
jgi|Selmo1|86461|e_gw1.8.3.1                                5848   0.0    1
jgi|Selmo1|138124|e_gw1.182.3.1                              787   0.0    9
jgi|Selmo1|447568|estExt_fgenesh2_pg.C_930077                608   0.0    13
jgi|Selmo1|425643|fgenesh2_pg.C_scaffold_76000001            590   0.0    15
jgi|Selmo1|91756|e_gw1.12.3.1                                587   0.0    22
jgi|Selmo1|92081|e_gw1.13.2.1                                484   0.0    15
jgi|Selmo1|121638|e_gw1.66.3.1                               484   0.0    15
jgi|Selmo1|94851|e_gw1.16.1.1                                431   0.0    14
jgi|Selmo1|76047|e_gw1.1.2.1                                 431   0.0    14
jgi|Selmo1|185288|estExt_Genewise1Plus.C_1080113             355   0.0    18
jgi|Selmo1|131043|e_gw1.104.1.1                              244   0.0    19
jgi|Selmo1|130462|e_gw1.101.1.1                              244   0.0    19
  • Each name has several parts separated by vertical bars (known as pipes to Unix users), e.g. jgi|Selmo1|73774|e_gw1.0.1.1
    • jgi|Selmo1 - assembly release
    • 73774 - gene model ID. This number is unique to each gene model; use it when referring to a specific model.
    • e_gw1.0.1.1 - gene model description
  • The gene model description is a code that describes some aspects of how the model was generated, e.g., for e_gw1.0.1.1
    • e_gw - indicates a GeneWise model including start and stop codon information
    • 1 - iteration (I think); some methods are run multiple times with different parameters, producing different results
    • .0 - scaffold zero; e_gw1.8.3.1, by comparison, is a model on scaffold 8
    • .1.1 - sequential model number on the scaffold (.3.1 in e_gw1.8.3.1). I think the last digit, e.g. '.1', is a revision number that allows models to be updated
    • C_930077 - another way of indicating scaffold and sequential model number, as in estExt_fgenesh2_pg.C_930077. This would be scaffold 93, model number 77
    • C_scaffold_76000001 - yet another way of indicating scaffold and sequential model number, as in fgenesh2_pg.C_scaffold_76000001. This would be scaffold 760, model 1.
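
The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not a JGI tool: the function and field names (parse_jgi_name, parse_dotted_description, iteration, revision) are made up here, and the meaning given to "iteration" and "revision" follows the tentative reading in the list above. Only the dotted form (e.g. e_gw1.0.1.1) is decoded; the C_ forms are left alone because the split between scaffold and model number is not certain.

```python
import re
from typing import NamedTuple, Optional


class JgiName(NamedTuple):
    source: str       # "jgi"
    assembly: str     # e.g. "Selmo1"
    model_id: int     # unique gene model ID, e.g. 73774
    description: str  # e.g. "e_gw1.0.1.1"


def parse_jgi_name(name: str) -> JgiName:
    """Split a pipe-separated JGI name into its four fields."""
    parts = name.strip().split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields: {name!r}")
    source, assembly, model_id, description = parts
    return JgiName(source, assembly, int(model_id), description)


# Dotted descriptions only: <method><iteration>.<scaffold>.<model>.<revision>
# ("iteration" and "revision" are assumed meanings, see the list above).
_DOTTED = re.compile(
    r"^(?P<method>[A-Za-z_]+?)(?P<iteration>\d+)"
    r"\.(?P<scaffold>\d+)\.(?P<model>\d+)\.(?P<revision>\d+)$"
)


def parse_dotted_description(description: str) -> Optional[dict]:
    """Decode a dotted description; return None for any other form."""
    match = _DOTTED.match(description)
    if match is None:
        return None
    fields = match.groupdict()
    # Everything except the method code is a number.
    return {k: (v if k == "method" else int(v)) for k, v in fields.items()}


name = parse_jgi_name("jgi|Selmo1|73774|e_gw1.0.1.1")
print(name.model_id, parse_dotted_description(name.description))
```

When collecting BLAST hits, keeping the numeric model ID from the third field is the safest way to refer back to a specific model in the portal.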

Gene model codes

Click on the i button next to the track to get information on the modeling software and parameters used. Here is a partial list of the model codes.

  • gw = GeneWise
  • e_gw = GeneWise with start and stop codon = GeneWisePlus
  • fgenesh2_pg = ab initio fgenesh
  • estExt_ = model modified using ESTs
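
As a rough aid when scanning long hit lists, the partial code list above can be turned into a small lookup. This is a sketch under assumptions: the function name describe_model is hypothetical, only the four codes listed here are covered, and descriptions that use other codes (for example estExt_Genewise1Plus) are reported only as far as the list allows.

```python
from typing import List

# Partial table taken from the list above. "e_gw" is checked before "gw"
# so that the longer code is matched first.
MODEL_CODES = [
    ("e_gw", "GeneWise with start and stop codon (GeneWisePlus)"),
    ("fgenesh2_pg", "ab initio fgenesh"),
    ("gw", "GeneWise"),
]

EST_PREFIX = "estExt_"


def describe_model(description: str) -> List[str]:
    """Return the known meanings of the codes found in a model description."""
    notes = []
    if description.startswith(EST_PREFIX):
        notes.append("model modified using ESTs")
        description = description[len(EST_PREFIX):]
    for code, meaning in MODEL_CODES:
        if description.startswith(code):
            notes.append(meaning)
            break
    return notes


for desc in ("e_gw1.0.1.1", "estExt_fgenesh2_pg.C_930077"):
    print(desc, "->", "; ".join(describe_model(desc)))
```

The i button next to each track remains the authoritative source for the software and parameters behind a model.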

Gribskov 09:42, 13 November 2007 (EST)

Haplotypes

The Selaginella isolate that was sequenced is not an inbred laboratory organism; the genomic sequence represents the combined sequence of both haplotypes.

  • Each gene should appear as two allelic variants, e.g., jgi|Selmo1|73774|e_gw1.0.1.1 and jgi|Selmo1|86461|e_gw1.8.3.1, above
  • Some genes will have only one model, because...
    • the two alleles were so similar that they assembled as one sequence, or
    • one or both alleles are mis-modeled
  • JGI expects that about 70% to 80% of the genes will have two variants.

JGI browser hints

  • In the scaffold track, red = gaps in assembly, black = contigs
  • In the Vista tracks, pink = intron/intergenic space, blue = exon
research Groups