Small RNAs

From Purdue Genomics Database Facility


Michael J. Axtell, Pennsylvania State University, Dept. of Biology, 410 Life Sciences Bldg. University Park, PA 16802 USA, mja18 -at- psu -dot- edu



microRNA Loci

Currently, the microRNA (miRNA) registry, miRBase, lists 58 S. moellendorffii miRNA loci; because some of these are paralogs, they represent 44 distinct miRNA families. The genomic loci giving rise to these miRNAs had previously been inferred by local genome assembly using the whole-genome shotgun traces (Axtell et al., 2007). These consensus sequences were mapped to the version 1 S. moellendorffii scaffolds using megablast. As expected, in most cases (43 out of 58 loci), both alleles were found in the assembly. For these 43 loci, following the convention for protein-coding loci, the allele on the lower-numbered scaffold was designated -1 and the allele on the higher-numbered scaffold was designated -2. Thus, 101 (2x43 + 15) distinct miRNA loci/alleles were mapped to this assembly. Similar to angiosperm miRNAs, but unlike Physcomitrella patens miRNAs, most S. moellendorffii miRNA loci were in non-annotated regions of the genome (80 out of 101 loci/alleles). This implies that these 80 derive from independent transcription units. The other 21 loci/alleles overlapped introns, exons, or both, in either the sense (14) or the antisense (7) orientation. Details can be found in this tab-delimited text file. miRBase-annotated mature miRNA sequences were also mapped onto each allele/locus. Unfortunately, the JGI genome browser does not allow annotation of non-protein-coding loci. The details of these miRNA hairpins and mature miRNAs can instead be found in this GFF-formatted text file. I have asked Bobby Otillar at JGI if he would post the miRNAs as a public browser track --Mj axtell 11:27, 1 February 2008 (EST)
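The allele-naming convention described above can be illustrated with a minimal Python sketch. The function name, the input format (pairs of locus name and scaffold number), and the choice to leave single-allele loci unsuffixed are all assumptions made for illustration; they are not taken from the files linked on this page.

```python
from collections import defaultdict

def designate_alleles(hits):
    """Label alleles by scaffold number.

    hits: iterable of (locus_name, scaffold_number) placements (hypothetical format).
    Returns {locus_name: [(label, scaffold_number), ...]}.
    Assumption: a locus found on only one scaffold keeps its plain name.
    """
    by_locus = defaultdict(list)
    for locus, scaffold in hits:
        by_locus[locus].append(scaffold)

    labels = {}
    for locus, scaffolds in by_locus.items():
        scaffolds.sort()  # lower-numbered scaffold first, so it receives -1
        if len(scaffolds) == 1:
            labels[locus] = [(locus, scaffolds[0])]
        else:
            labels[locus] = [
                (f"{locus}-{i}", scaffold)
                for i, scaffold in enumerate(scaffolds, start=1)
            ]
    return labels

# Tally from the text: 43 loci with both alleles plus 15 loci with one.
LOCI_WITH_BOTH_ALLELES = 43
LOCI_WITH_ONE_ALLELE = 58 - LOCI_WITH_BOTH_ALLELES
TOTAL_LOCI_ALLELES = 2 * LOCI_WITH_BOTH_ALLELES + LOCI_WITH_ONE_ALLELE
```

With made-up input such as `[("locusA", 12), ("locusA", 3), ("locusB", 7)]`, the copy of locusA on scaffold 3 would be labeled locusA-1 and the copy on scaffold 12 would be labeled locusA-2.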



Predicted microRNA Targets

The targets of the known S. moellendorffii miRNAs were predicted using a non-redundant set of queries comprising the currently known mature miRNA sequences (compiled from miRBase version 10.0). The transcript dataset was the JGI filtered models 3 dataset, which comprises only one allelic variant for each transcript model; thus, these predictions should largely be free of redundancies due to predicting both alleles of a given target. Target prediction essentially followed Allen et al. (2005), except that conservation in other genomes was not explicitly considered during the computational stage. The mismatch score cutoff was set at 3.5, except for the predicted targets of conserved miRNAs that were homologs of miRNA targets in other species. These target predictions were purposefully conservative, as I wished to avoid false-positive predictions. However, there are undoubtedly many true targets that were missed with this approach.
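To make the idea of a mismatch score concrete, here is a minimal Python sketch. The penalty weights are assumptions based on the scheme usually attributed to Allen et al. (2005): 1 per mismatch, 0.5 per G:U wobble, doubled at miRNA positions 2 to 13. This page does not state the weights, the sketch ignores bulges and gaps, and the function names are hypothetical. It is not the pipeline actually used for these predictions.

```python
# Base pairs written as (miRNA base, target base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

SCORE_CUTOFF = 3.5  # cutoff stated in the text


def reverse_complement_rna(seq):
    """Return the perfectly complementary RNA site, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper().replace("T", "U")))


def mismatch_score(mirna, target_site, core=(2, 13)):
    """Score an ungapped miRNA/target duplex; lower means better pairing.

    Both sequences are written 5'->3' and must be the same length.
    Assumed penalties: mismatch 1, G:U wobble 0.5, doubled inside `core`
    (positions counted from the miRNA 5' end, 1-based).
    """
    if len(mirna) != len(target_site):
        raise ValueError("this sketch only handles ungapped, equal-length duplexes")
    mirna = mirna.upper().replace("T", "U")
    target = target_site.upper().replace("T", "U")

    score = 0.0
    for pos, m_base in enumerate(mirna, start=1):
        t_base = target[len(target) - pos]  # antiparallel pairing partner
        pair = (m_base, t_base)
        if pair in WATSON_CRICK:
            penalty = 0.0
        elif pair in WOBBLE:
            penalty = 0.5
        else:
            penalty = 1.0
        if core[0] <= pos <= core[1]:
            penalty *= 2
        score += penalty
    return score
```

Under these assumed weights, a site would pass the stated cutoff when `mismatch_score(mirna, site) <= SCORE_CUTOFF`; a single mismatch in the core region already costs 2 of the 3.5 available points.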

The predicted targets were separated into two lists: an A list, comprising high-confidence predictions, and a B list, comprising lower-confidence predictions. High-confidence predictions were generally those with a score of 2.5 or less, those where multiple members of the same gene family were predicted as targets of a single miRNA (a known trait of many bona fide miRNA targets), or those of conserved miRNAs where a homologous target has been shown to be targeted in another plant species. Details of the A and B lists are in the files linked below.
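The A/B split can be expressed as a small decision rule. The following Python sketch is only an approximation: the text says the criteria were "generally" applied, so the actual lists may reflect manual judgment. The function and parameter names are hypothetical.

```python
def classify_prediction(score, family_members_hit=1, conserved_homolog=False,
                        cutoff=3.5, high_confidence_score=2.5):
    """Return "A", "B", or None (not reported) for one predicted target.

    score: mismatch score of the miRNA/target pair (lower is better).
    family_members_hit: number of members of the target's gene family
        predicted as targets of the same miRNA.
    conserved_homolog: True if the target is a homolog of a target of the
        same conserved miRNA in another plant species.
    """
    if conserved_homolog:
        # Per the text, the 3.5 cutoff was not applied to these targets.
        return "A"
    if score > cutoff:
        return None
    if score <= high_confidence_score or family_members_hit >= 2:
        return "A"
    return "B"
```

For example, a target with a score of 3.0 that is the only family member hit would land on the B list, whereas the same score with a second family member predicted for that miRNA would move it to the A list.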

The predicted targets of several well-conserved miRNA families are invariant between S. moellendorffii and other plants. For instance, miR160 was predicted to target an Auxin Response Factor (ARF) mRNA, miR166 to target a Homeodomain Leucine-Zipper III (HDZIP-III) mRNA, miR171 to target GRAS domain transcription factors, miR408 a plastocyanin-domain mRNA, and miR536 (which is conserved in the moss Physcomitrella patens but not in angiosperms) an F-box mRNA. High-confidence predicted targets were not found for some other well-characterized, long-conserved miRNAs, including miR156, miR159, miR319, and miR396.

Very few of the non-conserved S. moellendorffii miRNAs had high-confidence predicted targets. miR1094 was the major exception to this trend; miR1094 is predicted to target a large set of protein kinases. However, most non-conserved S. moellendorffii miRNAs either had only one or two lower-confidence predicted targets or, more often, no predicted targets at all. This mirrors observations made in other plant species where miRNA conservation is positively correlated with the ability to confidently predict and detect targets (Fahlgren et al., 2007).

"A" list of high-confidence target predictions

"B" list of lower-confidence target predictions



Other Small RNAs

NCBI GEO GSE7320 contains a set of 36,582 expressed small RNAs sequenced from a wild-type specimen (Axtell et al., 2007). Of these, 21,342 (~58%) could be mapped at least once onto the S. moellendorffii version 1.0 genome assembly. The other 42% of the small RNAs likely correspond to a mixture of sequencing errors (in the small RNA sequencing) and small RNAs originating from regions of the S. moellendorffii genome that were omitted from the assembly. The number of times a given small RNA is sequenced is a rough indicator of its molecular abundance. Because some small RNAs were sequenced more than once, the 21,342 mapped unique small RNAs account for a total of 127,327 reads. Only a small fraction of the unique sequences (670 / 21,342 ≈ 3.1%) correspond to the microRNA hairpins described above, but many of these are quite abundant in terms of total number of sequence reads, with over 40% of the reads corresponding to miRNA hairpins (52,813 / 127,327 ≈ 41.5%). However, as in other plants, much of the small RNA abundance and the overwhelming majority of small RNA sequence diversity are not attributable to currently annotated miRNAs.
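The percentages quoted above follow directly from the counts given in the text. This short Python check reproduces them; the variable names are illustrative only.

```python
# Counts taken from the text (unique sequences versus total reads).
UNIQUE_SEQUENCED = 36_582      # distinct small RNA sequences in GSE7320
UNIQUE_MAPPED = 21_342         # distinct sequences mapping to the v1.0 assembly
MAPPED_READS = 127_327         # total reads represented by the mapped sequences
MIRNA_UNIQUE = 670             # mapped distinct sequences from miRNA hairpins
MIRNA_READS = 52_813           # reads from miRNA hairpins


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


mapped_pct = percent(UNIQUE_MAPPED, UNIQUE_SEQUENCED)      # 58.3
mirna_unique_pct = percent(MIRNA_UNIQUE, UNIQUE_MAPPED)    # 3.1
mirna_read_pct = percent(MIRNA_READS, MAPPED_READS)        # 41.5

print(f"mapped unique sequences: {mapped_pct}%")
print(f"miRNA share of unique sequences: {mirna_unique_pct}%")
print(f"miRNA share of reads: {mirna_read_pct}%")
```

The contrast between the last two figures is the point of the paragraph: miRNA hairpins contribute few distinct sequences but a large share of the reads.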

Most miRNAs were 21 nt in size. In contrast, the population of non-microRNA small RNAs had a broad size distribution. Many of these reads were <= 19 nt or >= 25 nt; these sizes are incompatible with the size specificities of any known Dicers, and are likely the result of degradation in the original small RNA sample. However, there were also a significant number of small RNAs in approximately equal proportions across the 21-24 nt size range, suggestive of multiple Dicer activities in S. moellendorffii. In Arabidopsis, there are four Dicer proteins that are specialized for different classes and sizes of small RNAs: DCL1 makes 21 nt microRNAs, DCL2 makes 22 and 23 nt siRNAs, DCL3 makes 24 nt siRNAs, and DCL4 makes 21 nt siRNAs (Chapman and Carrington, 2007). The fact that the non-miRNA S. moellendorffii small RNAs fall into all of those size categories implies a similar differentiation of the small RNA-producing machinery. Indeed, Kurata and colleagues show on their wiki (Epigenetic_gene_regulation) that S. moellendorffii contains clear DCL1, DCL3 and DCL4 homologs. However, because the existing S. moellendorffii small RNA dataset appears to have significant contamination with degraded RNAs, further small RNA sequencing is needed to reach firm conclusions on the possible roles of siRNAs in this organism.
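A size distribution of this kind can be tabulated from a table of sequences and read counts. The Python sketch below assumes a hypothetical input format (a dictionary mapping each small RNA sequence to its read count) and uses the size boundaries given in the text; 20 nt reads, which the text does not discuss, are kept in a separate class.

```python
from collections import Counter


def size_class(length):
    """Assign a small RNA length (in nt) to one of the classes used in the text."""
    if length <= 19 or length >= 25:
        return "outside known Dicer sizes"
    if 21 <= length <= 24:
        return "21-24 nt"
    return "other (20 nt)"


def summarize_sizes(read_counts):
    """Tabulate size classes by unique sequence and by read abundance.

    read_counts: {sequence: number_of_reads} (hypothetical input format).
    Returns (unique_by_class, reads_by_class) as Counters.
    """
    unique_by_class = Counter()
    reads_by_class = Counter()
    for sequence, reads in read_counts.items():
        cls = size_class(len(sequence))
        unique_by_class[cls] += 1
        reads_by_class[cls] += reads
    return unique_by_class, reads_by_class
```

Counting both unique sequences and reads matters here because, as noted for the miRNAs, a few sequences can account for a large share of the total reads.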

File:Selmo sRNA histogram.png


References

Allen, E., Xie, Z., Gustafson, A.M., and Carrington, J.C. 2005. microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121: 207-221.

Axtell, M.J., Snyder, J.A., and Bartel, D.P. 2007. Common functions for diverse small RNAs of land plants. Plant Cell 19: 1750-1769.

Chapman, E.J. and Carrington, J.C. 2007. Specialization and evolution of endogenous small RNA pathways. Nat. Rev. Genet. 8: 884-896.

Fahlgren, N., Howell, M.D., Kasschau, K.D., Chapman, E.J., Sullivan, C.M., Cumbie, J.S., Givan, S.A., Law, T.F., Grant, S.R., Dangl, J.L. et al. 2007. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE 2: e219.
