Ras superfamily GTPases
From Purdue Genomics Database Facility
Marek Elias
Charles University in Prague, Faculty of Science, Department of Botany, Prague, Czech Republic
The Ras superfamily is the largest subgroup of the superclass of P-loop GTPases and related proteins (Leipe et al., 2002). The superfamily, often (but imprecisely) referred to as the "small GTPase superfamily", comprises the traditional RAB, RAS, RHO, RAN, and ARF/Sar1 families, as well as proteins not easily classified into any of these families, e.g., RJL, Miro, IFT27, and SR-beta. In addition, the alpha subunit of heterotrimeric G-proteins phylogenetically belongs here, as does the ROCO family of large proteins with the ROC GTPase domain. Most Ras superfamily members regulate important processes within the eukaryotic cell, such as membrane and protein trafficking, cytoskeleton dynamics, and transmission of signals from membrane receptors (Vernoud et al., 2003).
I used BLASTP and TBLASTN to identify all Ras superfamily GTPases encoded by the Selaginella genome and annotated the corresponding genes on the JGI genome portal (correcting existing gene models or creating new ones where necessary). My comparative analysis of the Ras superfamily genes in Selaginella and other sequenced embryophytes (Arabidopsis thaliana, Oryza sativa, Physcomitrella patens) revealed several interesting points, which are discussed below (see also Table 1):
1. Uniquely among embryophytes, Selaginella possesses an ortholog of the Chlamydomonas gene FAP9, called IFTA-2 in Caenorhabditis and RABL5 in mammals. FAP9 was found in the Chlamydomonas flagellar proteome (Pazour et al., 2005), and the Caenorhabditis IFTA-2 undergoes intraflagellar transport and is required for a certain cilium-based signalling pathway (Schafer et al., 2006). The absence of FAP9 from angiosperms thus simply reflects the loss of the flagellum, whereas its absence from Physcomitrella indicates a reduction of flagellum-associated processes in the moss compared to Selaginella.
2. Selaginella shares with Physcomitrella three Ras superfamily GTPases absent from angiosperms: RAB23, ARL3, and ARL13. Orthologs of all these proteins have been linked to the flagellum, through participation in cilium-based signalling pathways in mammals (RAB23, Wang et al., 2006), a crucial role in flagellum maintenance in Leishmania (ARL3, Cuvillier et al., 2000), or ciliary localisation and a role in the assembly of the axoneme in mammals (ARL13, Caspary et al., 2007). RAB23, ARL3, and ARL13 were therefore probably lost concomitantly with the flagellum in the angiosperm lineage.
3. Selaginella shares with angiosperms three genes that have orthologs outside embryophytes but are secondarily absent from Physcomitrella. This category includes SPG1 (Spg1 in S. pombe, Tem1 in S. cerevisiae), a GTPase known to be involved in regulation of the cell cycle or mitosis in yeasts but so far without a clearly defined role in angiosperms (Bedhomme et al., 2008); LIP1, a recently characterised GTPase implicated in regulation of the circadian clock in Arabidopsis (Kevei et al., 2007); and the alpha subunit of heterotrimeric G-proteins. It therefore seems that Physcomitrella has simplified several regulatory pathways compared to Selaginella and angiosperms.
4. Selaginella exhibits a phylogenetic diversity of the Rab11 (or RabA) subfamily intermediate between Physcomitrella and angiosperms. In Arabidopsis, RAB GTPases of the Rab11 subfamily were classified into six subclasses denoted RabA1 to RabA6 (Rutherford and Moore, 2002). As evident from a phylogenetic analysis of the Rab11 subfamily (see Figure 1), rice possesses representatives of subclasses RabA1 to RabA5 but lacks the subclass RabA6. Selaginella encodes orthologs of subclasses RabA1, RabA2, and RabA5. In addition, it has two genes that belong to a larger clade that also comprises the angiosperm subclasses RabA3 and RabA4 and a group of RABs from Physcomitrella. The resolution within this clade is not strong enough to allow for firm conclusions, but it is likely that RabA3 and RabA4 diversified only after the split between the lycophyte and angiosperm lineages, with RabA3 representing a specialised, more rapidly evolving paralog. Because RabA4 sequences appear to have retained more ancestral features and the related Selaginella and Physcomitrella RABs are more similar to RabA4 than to RabA3, I designate the whole clade “Rab11;A4” and annotate its Selaginella representatives accordingly. Physcomitrella, in addition to representatives of the Rab11;A4 clade, encodes orthologs of RabA2 and RabA5, but not RabA1. The following evolutionary scenario for the plant Rab11 subfamily can thus be envisaged based on the present analysis: The ancestral Rab11 gene was duplicated before the separation of the moss and tracheophyte lineages, yielding the Rab11;A2 clade, which probably retains most of the ancestral features (see the non-embryophyte Rab11 sequences branching close to or even within the Rab11;A2 clade), and the Rab11;A4 and Rab11;A5 clades, which represent “novel” paralogs. Tracheophytes added an additional paralog, Rab11;A1, probably by duplicating Rab11;A2, and angiosperms further expanded the subfamily by evolving RabA3 as a “novel” paralog within the Rab11;A4 clade.
The origin of the Arabidopsis RabA6 is unclear for the moment. The available, though still limited, experimental evidence from angiosperms suggests a specific function for each Rab11 subclass in post-Golgi membrane trafficking (de Graaf et al., 2005; Heo et al., 2005; Preuss et al., 2006; Chow et al., 2008). Thus, I suggest that the complexity of the Rab11 subfamily in plants, which correlates with the complexity of the body as expressed in the number of distinct cell and tissue types, reflects the involvement of the distinct Rab11 clades in transport processes (exocytosis…) responsible for cell differentiation and patterning.
5. Compared to other embryophytes, Selaginella has a substantially expanded ROCO family composed mostly of pseudogenes. The ROCO family comprises proteins sharing a unique combination of two domains: the ROC domain, a variant of the typical GTPase domain of the Ras superfamily, and the COR domain, of still unclear function, lying directly downstream of the ROC domain. This domain tandem, which probably forms a functional unit, is typically decorated with various additional domains, including leucine-rich repeats or ankyrin repeats at the N-terminus and kinase, RasGEF, WD40, PH, or other domains at the C-terminus. The cellular roles of ROCO proteins are apparently diverse, and they have been proposed to work as stand-alone signal transduction units (Marín et al., 2008). The single Arabidopsis member of this family, encoded by the TORNADO1 (TRN1) gene, functions in patterning processes during root and leaf development (Cnops et al., 2000; 2006). Analysis of the Selaginella genome revealed many loci related to the ROCO family, but the precise number of Selaginella ROCO genes is difficult to determine (Table 2). First, most of the loci are apparent pseudogenes bearing frame-shifts, stop codons and/or deletions disrupting the conserved structure of the gene. Most such cases seem genuine judging from inspection of the corresponding WGS trace data, but in a few cases the situation is less certain and the possibility of sequencing/assembly errors cannot be discounted. Second, the incomplete genome assembly complicates discrimination between separate genes and mere allelic variants of a single gene. Only a finished genome can resolve the details of the ROCO family in Selaginella, but the following observations can already be made. Most interestingly, out of 38 loci with predicted gene models (some additional short ROCO-like fragments remain unannotated), at least 24 represent pseudogenes.
In several cases where corresponding allelic variants could be recognised, one allele seems to be an intact gene whereas the other allele has been disrupted. Some disrupted loci are still transcribed (as witnessed by ESTs), so the pseudogenisation might have occurred relatively recently. Based on the primary sequence, the Selaginella ROCO proteins can be divided into four subgroups, Roco1 to Roco4. Representatives of subgroups 1, 2 and 4 seem to have leucine-rich repeats in the N-terminal part upstream of the ROC domain, similarly to ROCO proteins in other embryophytes, whereas subgroup 3 possesses a novel N-terminal extension without discernible similarity to other proteins. Subgroups 1 and 2 each comprise a single gene with two distinct alleles, whereas Roco3 and Roco4 are each represented by multiple paralogs that tend to occupy linked positions within the genome, forming clusters on a few scaffolds that contain multiple ROCO loci separated by non-ROCO genes. Hence, the ROCO family in Selaginella has probably been shaped by repeated local gene duplications, mostly followed by disrupting mutations. The cellular function of the family in Selaginella remains enigmatic, but the high evolutionary turnover is reminiscent of the evolutionary dynamics exhibited by multigene families implicated in disease resistance (see, e.g., Ameline-Torregrosa et al., 2008).
References
Ameline-Torregrosa C, Wang BB, O'Bleness MS, Deshpande S, Zhu H, Roe B, Young ND, Cannon SB. 2008. Identification and characterization of nucleotide-binding site-leucine-rich repeat genes in the model plant Medicago truncatula. Plant Physiol. 146(1):5-21.
Bedhomme M, Jouannic S, Champion A, Simanis V, Henry Y. 2008. Plants, MEN and SIN. Plant Physiol Biochem. 46(1):1-10.
Caspary T, Larkins CE, Anderson KV. 2007. The graded response to Sonic Hedgehog depends on cilia architecture. Dev Cell. 12(5):767-78.
Chow CM, Neto H, Foucart C, Moore I. 2008. Rab-A2 and Rab-A3 GTPases define a trans-Golgi endosomal membrane domain in Arabidopsis that contributes substantially to the cell plate. Plant Cell. 20(1):101-23.
Cnops G, Neyt P, Raes J, Petrarulo M, Nelissen H, Malenica N, Luschnig C, Tietz O, Ditengou F, Palme K, Azmi A, Prinsen E, Van Lijsebettens M. 2006. The TORNADO1 and TORNADO2 genes function in several patterning processes during early leaf development in Arabidopsis thaliana. Plant Cell. 18(4):852-66.
Cnops G, Wang X, Linstead P, Van Montagu M, Van Lijsebettens M, Dolan L. 2000. Tornado1 and tornado2 are required for the specification of radial and circumferential pattern in the Arabidopsis root. Development. 127(15):3385-94.
Cuvillier A, Redon F, Antoine JC, Chardin P, DeVos T, Merlin G. 2000. LdARL-3A, a Leishmania promastigote-specific ADP-ribosylation factor-like protein, is essential for flagellum integrity. J Cell Sci. 113(Pt 11):2065-74.
de Graaf BH, Cheung AY, Andreyeva T, Levasseur K, Kieliszewski M, Wu HM. 2005. Rab11 GTPase-regulated membrane trafficking is crucial for tip-focused pollen tube growth in tobacco. Plant Cell. 17(9):2564-79.
Heo JB, Rho HS, Kim SW, Hwang SM, Kwon HJ, Nahm MY, Bang WY, Bahk JD. 2005. OsGAP1 functions as a positive regulator of OsRab11-mediated TGN to PM or vacuole trafficking. Plant Cell Physiol. 46(12):2005-18.
Kevei E, Gyula P, Fehér B, Tóth R, Viczián A, Kircher S, Rea D, Dorjgotov D, Schäfer E, Millar AJ, Kozma-Bognár L, Nagy F. 2007. Arabidopsis thaliana circadian clock is regulated by the small GTPase LIP1. Curr Biol. 17(17):1456-64.
Leipe DD, Wolf YI, Koonin EV, Aravind L. 2002. Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol. 317(1):41-72.
Marín I, van Egmond WN, van Haastert PJ. 2008. The Roco protein family: a functional perspective. FASEB J. in press
Pazour GJ, Agrin N, Leszyk J, Witman GB. 2005. Proteomic analysis of a eukaryotic cilium. J Cell Biol. 170(1):103-13.
Preuss ML, Schmitz AJ, Thole JM, Bonner HK, Otegui MS, Nielsen E. 2006. A role for the RabA4b effector protein PI-4Kbeta1 in polarized expansion of root hair cells in Arabidopsis thaliana. J Cell Biol. 172(7):991-8.
Rutherford S, Moore I. 2002. The Arabidopsis Rab GTPase family: another enigma variation. Curr Opin Plant Biol. 5(6):518-28.
Schafer JC, Winkelbauer ME, Williams CL, Haycraft CJ, Desmond RA, Yoder BK. 2006. IFTA-2 is a conserved cilia protein involved in pathways regulating longevity and dauer formation in Caenorhabditis elegans. J Cell Sci. 119(Pt 19):4088-100.
Vernoud V, Horton AC, Yang Z, Nielsen E. 2003. Analysis of the small GTPase gene superfamily of Arabidopsis. Plant Physiol. 131(3):1191-208.
Wang Y, Ng EL, Tang BL. 2006. Rab23: what exactly does it traffic? Traffic. 7(6):746-50.
Zhang J, Hill DR, Sylvester AW. 2007. Diversification of the RAB guanosine triphosphatase family in dicots and monocots. J Integr Plant Biol. 46(8):1129-41.
Figure 1: The ML phylogenetic tree (PhyML-aLRT, WAG+G+I) of the Rab11 subfamily (rooted with the Rab2 subfamily used as an outgroup). Sequences in red, blue, green, and yellow come from Arabidopsis, rice, Selaginella, and Physcomitrella, respectively. Other taxa (Cre – Chlamydomonas reinhardtii, Hsa – Homo sapiens, Olu – Ostreococcus lucimarinus) are shown in black. Gene names for Arabidopsis are provided according to Rutherford and Moore (2002), and rice gene names follow the nomenclature proposed by Zhang et al. (2007). Names for Physcomitrella genes are taken from the Physcomitrella JGI genome portal (annotations by Tony Sanderfoot); names with quotation marks are proposed here for genes lacking annotation at the portal. Bootstrap values were calculated from 100 replicates.
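
The distribution of Rab11 clades described in point 4 can be summarised with a small Python sketch. It is illustrative only: the names `LINEAGE`, `PRESENCE`, and `smallest_shared_group` are hypothetical, the presence data are transcribed from the text rather than from the tree, and the function merely reports the least inclusive group containing every species in which a clade was found. It is not the phylogenetic analysis behind Figure 1 and it ignores possible secondary losses.

```python
# Nested groups for each species, ordered from least to most inclusive.
LINEAGE = {
    "Arabidopsis": ("Arabidopsis", "angiosperms", "tracheophytes", "embryophytes"),
    "rice": ("rice", "angiosperms", "tracheophytes", "embryophytes"),
    "Selaginella": ("Selaginella", "tracheophytes", "embryophytes"),
    "Physcomitrella": ("Physcomitrella", "embryophytes"),
}

ALL_FOUR = set(LINEAGE)

# Species in which each clade is reported in the text (point 4).
# "Rab11;A4" is the broad clade as designated in the text; "RabA3" is the
# angiosperm-specific subclass nested within it.
PRESENCE = {
    "Rab11;A1": {"Arabidopsis", "rice", "Selaginella"},
    "Rab11;A2": ALL_FOUR,
    "RabA3": {"Arabidopsis", "rice"},
    "Rab11;A4": ALL_FOUR,
    "Rab11;A5": ALL_FOUR,
    "RabA6": {"Arabidopsis"},
}


def smallest_shared_group(species):
    """Return the least inclusive group that contains every listed species."""
    shared = set.intersection(*(set(LINEAGE[s]) for s in species))
    # Walk one species' lineage from narrow to broad and stop at the first
    # group that all species have in common.
    reference = LINEAGE[sorted(species)[0]]
    return next(group for group in reference if group in shared)


DISTRIBUTION = {clade: smallest_shared_group(sp) for clade, sp in PRESENCE.items()}

for clade, group in DISTRIBUTION.items():
    print(f"{clade}: {group}")
```

Under these assumptions the sketch places Rab11;A2, Rab11;A4, and Rab11;A5 at the embryophyte level, Rab11;A1 at the tracheophyte level, and RabA3 at the angiosperm level, in line with the scenario outlined in point 4. For RabA6 it only records that the subclass was found in Arabidopsis alone; its origin remains unclear, as stated in the text.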
Table 1: The Ras superfamily in embryophytes
For Selaginella, the numbers of putative loci are indicated followed by numbers in parentheses corresponding to a total number of gene models including allelic variants. IDs of the respective Selaginella proteins are shown in the next column, with alternative allelic variants in parentheses. For other species, the numbers in parentheses indicate putative pseudogenes belonging to the respective gene groups.
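| Gene (or gene family) | Selaginella moellendorffii: number of loci (gene models) | Selaginella moellendorffii: protein IDs (allelic variants) | Arabidopsis thaliana: number of paralogs | Oryza sativa: number of paralogs | Physcomitrella patens: number of paralogs |
|---|---|---|---|---|---|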
| Rab1 | 2(4) | 228158 (229020), 449886 (177231) | 4 | 5 | 3 |
| Rab2 | 1(2) | 439463 (172085) | 3 | 3 | 4(+1) |
| Rab5 | 1(2) | 269316 (269914) | 2 | 3 | 2 |
| RabF1 | 1(2) | 449889 (449890) | 1 | 2 | 3 |
| Rab6 | 1(2) | 145680 (271169) | 5 | 2 | 7 |
| Rab7 | 1(2) | 228214 (228953) | 8 | 4 | 3(+2) |
| Rab8 | 3(6) | 449887 (449888), 270274 (147477), 437530 (168727) | 5 | 4 | 5 |
| Rab11;A1 | 2(4) | 158820 (135992), 146310 (269178) | 9 | 6 | 0 |
| Rab11;A2 | 1(2) | 145965 (162653) | 4 | 5 | 5 |
| Rab11;A4 | 2(4) | 412904 (119679), 232291 (234853) | 5(+1) | 4 | 3 |
| Rab11;A5 | 2(4) | 230012 (231747), 442026 (416682) | 5 | 2 | 4 |
| Rab18 | 1(2) | 98103 (234557) | 3 | 3 | 5(+1) |
| Rab21 | 1(2) | 230104 (184543) | 0 | 1 | 3(+2) |
| Rab23 | 1(2) | 449892 (449894) | 0 | 0 | 1 |
| ROP | 2(4) | 183027 (447605), 271540=229811 (451215) | 11 | 7(+2) | 4 |
| MIRO | 1(2) | 173355 (187516) | 3 | 1(+2) | 4 |
| SPG1 | 2(4) | 449907 (449908), 165210 (167256) | 2 | 1 | 0 |
| RAN | 1(2) | 270966 (271700) | 4 | 3 | 6 |
| LIP1 (RabL3) | 1(2) | 75974 (171849) | 2 | 2 | 0 |
| ArfA | 2(4) | 140818 (266840), 449867 (449869) | 7(+1) | 6 | 14 |
| ArfB | 1(2) | 100907 (132355) | 4 | 4 | 3 |
| ARL1 | 1(2) | 449871 (449873) | 1 | 1 | 2 |
| ARL2 | 1(2) | 231996 (127099) | 1 | 1 | 1 |
| ARL3 | 1(2) | 449874 (449879) | 0 | 0 | 1 |
| ARL5 | 1(2) | 439326 (179952) | 1 | 1 | 2 |
| ARL8 | 2(4) | 159101 (185994), 173564 (150767) | 3 | 2 | 3 |
| ARL13 | 1(2) | 450543 (450544) | 0 | 0 | 1 |
| ARFRP1 | 1(2) | 449880 (441459) | 1 | 2 | 2 |
| Sar1 | 2(4) | 179259 (271820), 165998 (149969) | 4(+1) | 4 | 4 |
| SR-beta | 1(2) | 449881 (444837) | 2 | 1(+1) | 1 |
| G-alpha | 1(2) | 449882 (449884) | 1 | 1 | 0 |
| XLG | 2(4) | 449915 (449918), 449913 (449914) | 3 | 4 | 1(+1) |
| FAP9 (RabL5) | 1(2) | 449910 (449912) | 0 | 0 | 0 |
| ROCO | 9? | | 1 | 1 | 2 |
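
The presence/absence patterns discussed in points 1 to 3 can be read directly from Table 1. The Python sketch below illustrates this for a subset of rows; the counts are copied from the table (pseudogene counts in parentheses are ignored), while the names `COUNTS` and `classify` and the category labels are hypothetical choices made for this example.

```python
# Paralog counts from Table 1 for a subset of genes, in the order
# (Selaginella loci, Arabidopsis, Oryza, Physcomitrella).
COUNTS = {
    "Rab23": (1, 0, 0, 1),
    "ARL3": (1, 0, 0, 1),
    "ARL13": (1, 0, 0, 1),
    "SPG1": (2, 2, 1, 0),
    "LIP1 (RabL3)": (1, 2, 2, 0),
    "G-alpha": (1, 1, 1, 0),
    "FAP9 (RabL5)": (1, 0, 0, 0),
    "Rab11;A1": (2, 9, 6, 0),
    "Rab21": (1, 0, 1, 3),
    "RAN": (1, 4, 3, 6),
}


def classify(selaginella, arabidopsis, oryza, physcomitrella):
    """Label a gene by its distribution relative to Selaginella."""
    in_angiosperms = arabidopsis > 0 or oryza > 0
    in_moss = physcomitrella > 0
    if selaginella == 0:
        return "absent from Selaginella"
    if not in_angiosperms and not in_moss:
        return "Selaginella only"
    if not in_angiosperms:
        return "absent from angiosperms"
    if not in_moss:
        return "absent from Physcomitrella"
    return "widespread"


GROUPS = {}
for gene, counts in COUNTS.items():
    GROUPS.setdefault(classify(*counts), []).append(gene)

for label, genes in GROUPS.items():
    print(f"{label}: {', '.join(genes)}")
```

Counts alone cannot distinguish a secondary loss from a later gain. Rab11;A1 falls into the same "absent from Physcomitrella" group as SPG1, LIP1, and G-alpha, although point 4 interprets it as a paralog added in tracheophytes rather than a gene lost from the moss.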
Table 2: The ROCO family in Selaginella moellendorffii
IDs with an asterisk represent apparent pseudogenes; the respective gene models are thus fragmentary to an extent that depends on how many disruptive mutations have occurred.
| subgroup | allele 1 | allele 2 | notes |
|---|---|---|---|
| Roco1 | 450041 (scaffold_40) | 430538 (scaffold_133) | |
| Roco2 | 451457* (scaffold_71) | 451459 (scaffold_99) | |
| Roco3 | 405805* (scaffold_4) | 450449 (scaffold_127) | |
| | 405813 (scaffold_4) | 430252* (scaffold_127) | |
| | 405817+405816+405815* (scaffold_4) | ? | |
| | 405852* (scaffold_4) | ? | |
| | 405854 (scaffold_4) | 431305* (scaffold_149) | assignment of the loci as alleles is uncertain |
| | 405856* (scaffold_4) | 431303* (scaffold_149) | assignment of the loci as alleles is uncertain |
| Roco4 | 451479* (scaffold_43) | ? | |
| | 451412 (scaffold_43) | 451417 (scaffold_144) | |
| | 451482 (scaffold_43) | 448941 (scaffold_144) | |
| | 451416 (scaffold_43) | 431047* (scaffold_144) | |
| | 444429* (scaffold_43) | ? | |
| | 419764* (scaffold_43) | ? | |
| | 444430* (scaffold_43) | 449236* (scaffold_173) | assignment of the loci as alleles is uncertain |
| | 444432* (scaffold_43) | 431857* (scaffold_173) | |
| | 444433 (scaffold_43) | 431856* (scaffold_173) | |
| | 451451* (scaffold_51) | ? | |
| | 421760* (scaffold_53) | ? | |
| | 421762* (scaffold_53) | ? | |
| | 451453* (scaffold_55) | ? | |
| | 449239* (scaffold_173) | ? | |
| | 431865* (scaffold_173) | ? | |
| | 449240 (scaffold_173) | ? | |
| | 431869* (scaffold_173) | ? | |
| | 432646 (incomplete sequence) (scaffold_615) | ? | |
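
The figures quoted in point 5 (38 loci with gene models, at least 24 of them pseudogenes) can be checked against Table 2 with a short tally. The Python sketch below is illustrative: the IDs are transcribed from the table without scaffold names, `ROWS` and `tally` are hypothetical names, and the composite entry 405817+405816+405815 is counted as a single locus, which is an assumption made here.

```python
from collections import Counter

# One tuple per row of Table 2: (subgroup, allele 1, allele 2).
# "*" marks an apparent pseudogene; "?" marks an allele that was not identified.
ROWS = [
    ("Roco1", "450041", "430538"),
    ("Roco2", "451457*", "451459"),
    ("Roco3", "405805*", "450449"),
    ("Roco3", "405813", "430252*"),
    ("Roco3", "405817+405816+405815*", "?"),
    ("Roco3", "405852*", "?"),
    ("Roco3", "405854", "431305*"),
    ("Roco3", "405856*", "431303*"),
    ("Roco4", "451479*", "?"),
    ("Roco4", "451412", "451417"),
    ("Roco4", "451482", "448941"),
    ("Roco4", "451416", "431047*"),
    ("Roco4", "444429*", "?"),
    ("Roco4", "419764*", "?"),
    ("Roco4", "444430*", "449236*"),
    ("Roco4", "444432*", "431857*"),
    ("Roco4", "444433", "431856*"),
    ("Roco4", "451451*", "?"),
    ("Roco4", "421760*", "?"),
    ("Roco4", "421762*", "?"),
    ("Roco4", "451453*", "?"),
    ("Roco4", "449239*", "?"),
    ("Roco4", "431865*", "?"),
    ("Roco4", "449240", "?"),
    ("Roco4", "431869*", "?"),
    ("Roco4", "432646", "?"),  # incomplete sequence, not marked as a pseudogene
]


def tally(rows):
    """Count annotated entries and apparent pseudogenes per subgroup."""
    total, pseudo = Counter(), Counter()
    for subgroup, *alleles in rows:
        for entry in alleles:
            if entry == "?":
                continue  # no allele identified, nothing to count
            total[subgroup] += 1
            if entry.endswith("*"):
                pseudo[subgroup] += 1
    return total, pseudo


TOTAL, PSEUDO = tally(ROWS)
for subgroup in sorted(TOTAL):
    print(f"{subgroup}: {TOTAL[subgroup]} entries, {PSEUDO[subgroup]} apparent pseudogenes")
print(f"all: {sum(TOTAL.values())} entries, {sum(PSEUDO.values())} apparent pseudogenes")
```

With the table as transcribed, the tally gives 38 entries and 24 apparent pseudogenes overall, 16 of which belong to Roco4 and 7 to Roco3.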

