JGI Selaginella Portal
From Purdue Genomics Database Facility
Articles
- Brief History of the Selaginella genome project and taxonomy
- Contamination analysis
- Distribution of genes
- Flagellar Proteins
- Protein kinases
- transcription factors
- Evolution of developmental genes
- Meristems
- Root
- Vascular system
- Cytoskeletons
- Epidermal cell differentiation
- Pollen
- ABA
- Seed
- Ethylene
- Brassinosteroids
- Gibberellins
- Cytokinin
- Evolution of cytokinin signaling
- Auxin
- Phosphoinositides
- Evolution of auxin signaling
- Epigenetic gene regulation
- Phase transition
- Cell cycle
- Circadian clock
- Light signalling
- small RNAs
- PEBP gene family
- Potassium channels
- PPR gene family
- Cyclic Nucleotide Gated Channels
- YABBY gene family
- Ras superfamily GTPases
- LUMINIDEPENDENS (LD)
- UNUSUAL FLORAL ORGANS (UFO)
- STERILE APETALA (SAP)
- PESCADILLO - Wnt-like signaling?
- KEGG differences
- The NTMC2 genes of embryophytes
- BRK1 and apical growth
- MADS-box genes
- Lignin monomer biosynthesis
- BAHD family acyltransferases
- SCPL (serine carboxypeptidase-like) enzymes
- Cytochrome P450
- Light Harvesting Complexes
- Transcription factors involved in senescence
- Phylogenetic_Patterns
- Metallothionein
- Duplicate Gene Analyses
- HAP3 gene family
- Sucrose transporter gene family
- Hexose/hexitol transporter gene family
- Amino acid transporter gene family
- Ammonium transporter gene family
- plastidic maltose transporter gene family
- Telomeres, subtelomeres and telomere-related genes
- Sulfate transporter gene family
- proton/urea cotransporter
- Cell wall composition and glycosyltransferases involved in cell wall formation
- KNOX Genes
- Cellulose synthase superfamily
- Ubiquitin-mediated proteolysis
- Whole Genome Syntenic Analysis of Haplotypes
- Lack of Synteny Between Selaginella and Physcomitrella patens, Oryza sativa japonica, and Vitis vinifera
Portal and assembly release information
JGI provides a genome annotation portal that includes multiple gene models, comparisons to other plant genomes, comparisons to ESTs, protein motif analysis, and GO terms.
- JGI Selaginella portal (A password is required)
- Here is Bobby Otillar's (JGI) description of the release; see the JGI release notes for a complete description.
The 212.5 Mbp Selmo1 assembly contains 759 nuclear scaffolds and is predicted to have 34,292 genes. The following tables summarize the assembly and gene model statistics for Selmo1:
NUCLEAR GENOME ASSEMBLY [1]
Nuclear genome size (Mbp) 212.5
Sequencing read coverage depth 7.0x
Reported # of contigs 5,156
# of nuclear scaffolds 759
# of nuclear scaffolds >2 Kbp 726
Nuclear scaffold N/L50 38/1.7 Mbp
Three largest scaffolds (Mbp) 7.0, 6.2, 4.5
[1] While the estimated genome size is 110 Mb, we sequenced two divergent
haplotypes of Selaginella so this assembly represents 212 Mb of genomic
sequence. The statistics above include only nuclear scaffolds 1 kb or longer.
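The N/L50 row can be read as follows: with scaffolds sorted from longest to shortest, the 38 longest together cover half of the assembly, and the shortest of those 38 is 1.7 Mbp long. The Python sketch below shows that calculation under stated assumptions. The function name `n_l50` is made up for illustration and is not JGI code, and the toy lengths are not the real Selmo1 scaffold lengths (only the first three echo the table). The naming follows the table above (N50 = count, L50 = length), which is the reverse of what some other tools use.

```python
def n_l50(lengths):
    """Return (count, length) for a list of scaffold lengths.

    count  = smallest number of longest scaffolds whose summed length
             reaches at least half of the total assembly length
    length = length of the shortest scaffold in that set
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length


# Toy scaffold lengths in kbp (hypothetical; not the Selmo1 scaffolds).
toy_lengths = [7000, 6200, 4500, 3000, 2000, 1000, 300]
count, length = n_l50(toy_lengths)
print(count, length)
```

Running the same function on the real scaffold lengths (for example, read from the assembly FASTA) should reproduce the 38 / 1.7 Mbp figures if the table was computed this way.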
GENE MODELS [2]
Average length (bp) of:
gene 1711.37
transcript 1229.57
exon 213.66
intron 103.33
PROTEINS [2]
Protein length (aa) 392.04
Exons per gene 5.75
# of gene models 34,292
[2] FilteredModels2 gene model track
JGI gene model names
JGI runs multiple gene modeling software packages. Models from each method (and sometimes from runs of the same method with different parameters) appear as separate tracks in the genome browser. It is frequently true that none of the models is exactly correct; detecting and fixing this is the job of the expert annotator (you).
Consider the following BLAST search result:
jgi|Selmo1|73774|e_gw1.0.1.1 5860 0.0 1
jgi|Selmo1|86461|e_gw1.8.3.1 5848 0.0 1
jgi|Selmo1|138124|e_gw1.182.3.1 787 0.0 9
jgi|Selmo1|447568|estExt_fgenesh2_pg.C_930077 608 0.0 13
jgi|Selmo1|425643|fgenesh2_pg.C_scaffold_76000001 590 0.0 15
jgi|Selmo1|91756|e_gw1.12.3.1 587 0.0 22
jgi|Selmo1|92081|e_gw1.13.2.1 484 0.0 15
jgi|Selmo1|121638|e_gw1.66.3.1 484 0.0 15
jgi|Selmo1|94851|e_gw1.16.1.1 431 0.0 14
jgi|Selmo1|76047|e_gw1.1.2.1 431 0.0 14
jgi|Selmo1|185288|estExt_Genewise1Plus.C_1080113 355 0.0 18
jgi|Selmo1|131043|e_gw1.104.1.1 244 0.0 19
jgi|Selmo1|130462|e_gw1.101.1.1 244 0.0 19
- each name has several parts separated by vertical bars (known as pipes to unixians), e.g. jgi|Selmo1|73774|e_gw1.0.1.1
- jgi|Selmo1 - assembly release
- 73774 - gene model ID. this is a unique number for each gene model. use this number when referring to a specific gene model.
- e_gw1.0.1.1 - gene model description
- the gene model description is a code that describes some aspects of how the model was generated, e.g., for e_gw1.0.1.1
- e_gw - indicates a GeneWise model including start and stop codon information
- 1 - iteration (I think) some methods are run multiple times with different parameters, producing different results
- .0 - scaffold zero, e_gw1.8.3.1 is a model on scaffold 8
- .3.1 - sequential model number on the scaffold (I think the last digit, e.g. '.1' is a revision number to allow models to be updated)
- C_930077 - another way of indicating scaffold and sequential model number. this would be scaffold 93, model number 77
- C_scaffold_76000001 - yet another way of indicating scaffold and sequential model number. this would be scaffold 76, model 1 (the model number is zero-padded to six digits here, versus four digits in C_930077).
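The naming scheme above can be sketched in Python. This is a minimal illustration, not a JGI tool: the names `ModelName`, `parse_model_name` and `gw_scaffold` are made up here, and the scaffold extraction follows the tentative reading given in the list (the field after the iteration number is the scaffold). Only the dotted `gw`/`e_gw` form is decoded; the `C_...` forms are left unparsed.

```python
import re
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    source: str       # e.g. "jgi"
    assembly: str     # e.g. "Selmo1"
    model_id: int     # unique gene model ID, e.g. 73774
    description: str  # method code, e.g. "e_gw1.0.1.1"


def parse_model_name(name: str) -> ModelName:
    """Split a pipe-separated name such as jgi|Selmo1|73774|e_gw1.0.1.1."""
    parts = name.strip().split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields: {name!r}")
    source, assembly, model_id, description = parts
    return ModelName(source, assembly, int(model_id), description)


# Dotted GeneWise form: <code><iteration>.<scaffold>.<model>.<revision>
# (the meaning of the last two fields is an assumption from the text).
GW_PATTERN = re.compile(r"^(?:e_)?gw(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def gw_scaffold(description: str) -> Optional[int]:
    """Return the scaffold number for gw/e_gw descriptions, else None."""
    match = GW_PATTERN.match(description)
    return int(match.group(2)) if match else None


model = parse_model_name("jgi|Selmo1|86461|e_gw1.8.3.1")
print(model.model_id, gw_scaffold(model.description))
```

Keeping the numeric model ID as the key, as recommended above, avoids depending on how any particular description code is laid out.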
Gene model codes
Click on the i button next to the track to get information on the modeling software and parameters used. Here is a partial list of the model codes.
- gw = GeneWise
- e_gw = GeneWise with start and stop codon = GeneWisePlus
- fgenesh2_pg = ab initio fgenesh
- estExt_ = model modified using ESTs
Gribskov 09:42, 13 November 2007 (EST)
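As an illustration of the partial code list above, the following Python sketch maps a gene model description to its modeling method. The function name `describe_method` and the lookup table are assumptions made for this example; codes not in the list above (for instance `Genewise1Plus`, which appears in the BLAST output) are reported as unlisted rather than guessed.

```python
EST_PREFIX = "estExt_"

# Partial table copied from the list above; other codes exist.
METHOD_PREFIXES = [
    ("e_gw", "GeneWisePlus (GeneWise with start and stop codon)"),
    ("gw", "GeneWise"),
    ("fgenesh2_pg", "ab initio fgenesh"),
]


def describe_method(description):
    """Return (method, est_modified) for a gene model description.

    method is None when the code is not in the partial table;
    est_modified is True when the description starts with "estExt_".
    """
    est_modified = description.startswith(EST_PREFIX)
    core = description[len(EST_PREFIX):] if est_modified else description
    for prefix, method in METHOD_PREFIXES:
        if core.startswith(prefix):
            return method, est_modified
    return None, est_modified


for code in ("e_gw1.0.1.1",
             "estExt_fgenesh2_pg.C_930077",
             "estExt_Genewise1Plus.C_1080113"):
    print(code, describe_method(code))
```

The i button in the browser remains the authoritative source for the software and parameters behind each track.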
Haplotypes
The Selaginella isolate that was sequenced is not an inbred laboratory organism; the genomic sequence represents the combined sequence of both haplotypes.
- Each gene should appear as two allelic variants, e.g., jgi|Selmo1|73774|e_gw1.0.1.1 and jgi|Selmo1|86461|e_gw1.8.3.1, above
- Some genes will have only one model, because...
- the two alleles were so similar that they assembled as one sequence, or
- one or both alleles are mis-modeled
- JGI expects that about 70% to 80% of the genes will have two variants.
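One rough way to spot candidate allelic pairs in a BLAST hit list is to look for consecutive hits with nearly identical scores, as with models 73774 and 86461 above. The Python sketch below does this under loud assumptions: the function name `candidate_allele_pairs` and the 1% score tolerance are hypothetical choices for illustration, not a JGI method. Similar scores are only a hint; a pair should be confirmed by aligning the two models.

```python
def candidate_allele_pairs(hits, rel_tol=0.01):
    """Pair consecutive hits whose scores differ by at most rel_tol.

    hits: iterable of (model_name, score) tuples.
    Returns a list of (name_a, name_b) tuples; each hit is used at most once.
    """
    ordered = sorted(hits, key=lambda hit: hit[1], reverse=True)
    pairs = []
    i = 0
    while i < len(ordered) - 1:
        (name_a, score_a), (name_b, score_b) = ordered[i], ordered[i + 1]
        if score_a > 0 and (score_a - score_b) / score_a <= rel_tol:
            pairs.append((name_a, name_b))
            i += 2  # both hits consumed
        else:
            i += 1
    return pairs


# A subset of the hits from the BLAST search above (name, score).
hits = [
    ("jgi|Selmo1|73774|e_gw1.0.1.1", 5860),
    ("jgi|Selmo1|86461|e_gw1.8.3.1", 5848),
    ("jgi|Selmo1|138124|e_gw1.182.3.1", 787),
    ("jgi|Selmo1|447568|estExt_fgenesh2_pg.C_930077", 608),
    ("jgi|Selmo1|92081|e_gw1.13.2.1", 484),
    ("jgi|Selmo1|121638|e_gw1.66.3.1", 484),
]
for pair in candidate_allele_pairs(hits):
    print(pair)
```

Hits left unpaired by this heuristic are not necessarily single-copy; they may fall into either of the single-model cases listed above.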
JGI browser hints
- in the scaffold track, red = gaps in assembly, black = contigs
- in the Vista tracks, pink = intron/intergenic space, blue = exon

