Contamination analysis
From Purdue Genomics Database Facility
MEGAN analysis
MEGAN (MEtaGenome ANalyzer) is a tool for analyzing large metagenomic datasets (see the MEGAN homepage and the online publication). MEGAN was used to check the Selaginella genome assembly for contamination, based on a blastp search against Genpept (rel. 162). For this purpose, the Selaginella genomic scaffolds were cut into 2000 nt subsequences and subjected to BLAST. MEGAN was then used to map the BLAST hits onto the NCBI taxonomy in order to summarize and order the results.
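The splitting step described above can be sketched as follows. This is a minimal illustration, not the script that was actually used: the function name `split_scaffold`, the identifier format, and the choice of non-overlapping windows with a shorter final piece are all assumptions.

```python
from typing import Iterator, Tuple


def split_scaffold(name: str, sequence: str, window: int = 2000) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, subsequence) pairs for consecutive windows of a scaffold.

    Assumptions (not taken from the original analysis): windows do not
    overlap, the last window may be shorter than `window`, and identifiers
    use 1-based inclusive coordinates in the form name:start-end.
    """
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        yield f"{name}:{start + 1}-{start + len(chunk)}", chunk


if __name__ == "__main__":
    # Toy scaffold of 4500 nt: two full windows plus a 500 nt remainder.
    for identifier, chunk in split_scaffold("scaffold_demo", "ACGT" * 1125):
        print(identifier, len(chunk))
```

Each subsequence can then be written out in FASTA format and used as a BLAST query.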
MEGAN mapping results
Filtering preferences:
- bit score cut-off: 50
- top percentage score: 10
- min support for taxa: 5
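To illustrate how the first two filters act on the hits of a single subsequence, here is a simplified sketch. It is not MEGAN's implementation: the function name `filter_hits` and the hit representation are hypothetical, and the third filter (min support), which operates across reads on the taxonomy, is not modelled.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (taxon name, bit score); hypothetical representation


def filter_hits(hits: List[Hit], min_score: float = 50.0, top_percent: float = 10.0) -> List[Hit]:
    """Keep the hits that pass a bit score cut-off and a top-percentage filter.

    A hit is kept if its bit score is at least `min_score` and lies within
    `top_percent` percent of the best remaining bit score for this read.
    """
    strong = [hit for hit in hits if hit[1] >= min_score]
    if not strong:
        return []
    best = max(score for _, score in strong)
    threshold = best * (1.0 - top_percent / 100.0)
    return [hit for hit in strong if hit[1] >= threshold]


if __name__ == "__main__":
    example = [("Taxon A", 200.0), ("Taxon B", 185.0), ("Taxon C", 120.0), ("Taxon D", 40.0)]
    # Taxon D fails the cut-off of 50; Taxon C is more than 10% below the best score.
    print(filter_hits(example))
```

In MEGAN, the hits that survive such filtering are then placed on the NCBI taxonomy; only the filtering idea is shown here.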
Summary
- File: Selaginella_moellendorfii_split_2000.fas.megan2
- Reads total: 107693
- Reads assigned: 62724
- Reads unassigned: 2272
- Reads with no hits: 42967
- Reads that only hit unknown taxa: 29
- Hits total: 107963
File:Selaginella moellendorffi bioperled split 200043.png
The size of each circle is proportional to the number of sequences assigned to the corresponding taxon/genus. The numbers on the right side are the numbers of Selaginella subsequences yielding a significant hit.
The hits in Metazoa and Fungi are mainly transposon-related. Such hits are also observed in Physcomitrella and should not be a cause for concern.
There are significant hits in Bacteria, especially B. selenitireducens. This organism has also been sequenced by the JGI (B. selenitireducens genome draft). We looked at these hits in more detail and noticed that only two proteins account for most of them.
These two proteins are:
| Genpept accession | Description | Number of Selaginella subsequences with hits |
| EDP81118.1 | hypothetical protein BselDRAFT_2604 [''Bacillus selenitireducens'' MLS10] | 1620 |
| EDP81119.1 | isochorismatase hydrolase [''Bacillus selenitireducens'' MLS10] | 955 |
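A per-protein tally like the one in the table can be sketched as follows. The input format is an assumption: tab-separated BLAST output with the query identifier in the first column and the subject accession in the second. The function name `queries_per_subject` and the example lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def queries_per_subject(lines: Iterable[str]) -> Dict[str, int]:
    """Count the distinct query subsequences that hit each subject accession.

    Assumes tab-separated lines with the query id in column 1 and the
    subject accession in column 2; comment and malformed lines are skipped.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        query, subject = fields[0], fields[1]
        seen[subject].add(query)
    return {subject: len(queries) for subject, queries in seen.items()}


if __name__ == "__main__":
    # Invented example lines; the query ids are placeholders, not real data.
    demo = [
        "query_1\tEDP81118.1\t61.2",
        "query_1\tEDP81118.1\t58.0",  # second hit of the same query, counted once
        "query_2\tEDP81118.1\t70.5",
        "query_2\tEDP81119.1\t66.1",
    ]
    print(queries_per_subject(demo))
```

Counting distinct queries rather than raw hit lines avoids inflating the tally when one subsequence produces several alignments against the same protein.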
The Selaginella scaffold subsequences yielding these hits are spread all over the Selaginella scaffolds.
Does this mean that there is contamination all over the Selaginella scaffolds? To answer this question, we had a closer look at these regions in the Selaginella genome.
For example:
- scaffold_121:440881-442880
- scaffold_1:2169498-2171497
- scaffold_5:297595-299594
- scaffold_27:1465927-1467926
- scaffold_3:1458579-1460578
These regions always appear to be intergenic and related to LTR retrotransposons. In conclusion, the Selaginella sequences producing these Bacillus selenitireducens protein hits are repetitive in the Selaginella genome and might be related to LTR retrotransposons. The fact that only two "Bacillus selenitireducens proteins" produce all these ~2000 hits suggests that they are in fact contaminations in the Bacillus selenitireducens genome. The two proteins, EDP81118.1 and EDP81119.1, lie on a single contig (4000241_Cont78) in the Bacillus genome. The results above support the conclusion that at least this contig is not bacterial but belongs to Selaginella.
Finally, the results suggest that there is no obvious bacterial contamination in the Selaginella genome.
G/C content of Selaginella genomic scaffolds
We used the software geecee (EMBOSS geecee) to calculate the fraction of G+C bases of the genomic scaffolds.
Selaginella genomic scaffolds G/C distribution plot.
The absence of a detectable secondary G/C peak indicates that there is no obvious or large-scale contamination in the genome assembly.
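The reasoning behind the G/C check can be sketched in a few lines. This is an illustration only, not the EMBOSS geecee program: the function names, the bin count, and the naive peak definition are assumptions, and the G+C fraction is taken over the full sequence length, which may differ from how geecee treats ambiguous bases.

```python
from typing import Iterable, List


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases over the full sequence length."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def gc_histogram(fractions: Iterable[float], bins: int = 20) -> List[int]:
    """Bin G+C fractions (0.0 to 1.0) into `bins` equal-width classes."""
    counts = [0] * bins
    for fraction in fractions:
        index = min(int(fraction * bins), bins - 1)  # a fraction of 1.0 goes in the last bin
        counts[index] += 1
    return counts


def count_peaks(counts: List[int]) -> int:
    """Count bins strictly higher than both neighbours (a naive peak definition)."""
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


if __name__ == "__main__":
    toy_scaffolds = ["GGCCAATT", "GCGCATAT", "GGCCGATTA"]  # invented sequences
    histogram = gc_histogram((gc_fraction(s) for s in toy_scaffolds), bins=10)
    print(histogram, count_peaks(histogram))
```

A genome assembly contaminated with a substantial amount of DNA from an organism with a different base composition would be expected to show more than one peak in such a histogram. Real data are noisy, so a histogram would normally be smoothed or inspected by eye rather than judged by this peak count alone.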
More Information
For further information and questions please contact:
andreas.zimmer@biologie.uni-freiburg.de
stefan.rensing@biologie.uni-freiburg.de

