RNA graph

From Purdue Genomics Database Facility

Jump to: navigation, search

| RNA Home | RNA Structure Match | RNA_Structure_Datasets | RNA_Papers |RNA_misc_(Internal_discussion)|

ISBRA paper: Pattern_Matching_in_RNA_Structures wordle pic produced by www.wordle.net

Please help by researching items labeled "needs reader". leave a note that you are doing so using the four tildes '~~~~'

Contents

CVS

CVS repository:/cbio/server/lib/CVS_Repository

Nomenclature

The idea of this project is to use graph theoretical approaches to find the greates common folded structure in an ad hoc set of RNA sequences. In outline

  • Each atomic stem (folded unit) is a node of graph
  • Edges represent relations between atomic units, which can be
    • nested or internal, stem b is in the loop of stem a (edge:i)
    • serial, stem a and stem b are a non-overlapping sequence (edge:s or none)
    • overlapping, e.g. pseudoknotted, half of stem b is in the loop of stem a (or vice versa) (edge:o)
    • incompatible (because they use the same sequence), stem a and stem b would use the same sequence ( edge:x)

There should be some examples here, but the drawings are hard to make.

Requirements

  • The structures to be considered must include pseudoknots
  • The approach must work in the presence of sequences that, in fact, do not have a common structure from the biological point of view. This is very important because a biologically selected set of sequences will rarely be clean - contamination with "junk" sequences is to be expoected
  • Do we need approximate subgraph matching? Or because we can enumerate all graphs up to a certain size, is exact matching sufficient (at least for initial screening).
  • Do we need induced graph matching?

Known Problems and Approaches

  • graph isomorphism - see the wikipedia introductory article
  • subgraph isomorphism - For graphs G1 and G2 does G2 contain a subgraph isomorphous to G1?
  • Induced subgraph isomorphism problem - for gaaphs G1=(V1, E1) and a graph G2=(V2, E2) of lesser or equal order to G1.
    , can one map the vertices of G2 to vertices of G1 such that for all vertices x, y in V2, edge (x, y) is in E2 if and only if the edge (f(x), f(y)) is in E1. This is different from the in that the inducted subgraph in G1 cannot have edges which aren't also in G2. In subgraph isomorphism, these "extra" edges may be present. (excerpted from Wikipedia)
  • Maximum common subgraph (MCS) isomorphism problem - For graphs G1 and G2, what is the largest subgraph of G1 that is isomorpus to a subgraph of G2? One possible solution for this problem is to build product graph, in which the largest clique (see below) represents a solution for the MCS problem. MCS algorithms have a long tradition in cheminformatics and pharmacophore mapping, but do not seem to be the best solution for finding the greates common structure in a contaminated set of RNA sequences. See also the folowing Wikipedia articles:
  • Clique problem - A clique in a graph is a set of pairwise adjacent vertices, or in other words, an induced subgraph, which is a complete graph. The clique problem is the problem of determining whether a graph contains a clique of at least a given size k. Once we have located k or more vertices which form a clique. The corresponding optimization problem, the maximum clique problem, is to find the largest clique in a graph. (excerpted from Wikipedia)

References for general approaches

  • SAGA: a subgraph matching tool for biological graphs. Yuanyuan Tian, Richard C. McEachin, Carlos Santos, David J. States and Jignesh M. Patel. Bioinformatics 23:232–239, 2007. doi:10.1093/bioinformatics/btl571 [[1]]

SAGA provides a tool for matching a query (<100 nodes) vs a database of preindexed graphs. Graph distance and indexing functions are very relevant but probably cannot be used without modification.Gribskov 10:24, 2 June 2007 (EDT)

  • He,H. and Singh,A.K. (2006) Closure-tree: an index structure for graph queries. In Proceedings ICDE 2006, pp. 38–49. (needs a reader)
  • Shasha,D. et al. (2002) Algorithmics and applications of tree and graph searching. In Proceedings of PODS 2002, pp. 39–52. (needs a reader) Aditi 18:10, 3 June 2007 (EDT)
  • Yan,X. et al. (2004) Graph indexing: a frequent structure-based approach. In Proceedings of SIGMOD 2004, pp. 335–346.(needs a reader) Reazur 3:10, 4 June 2007 (EDT)
  • Yan,X. et al. (2005) Substructure similarity search in graph databases. In Proceedings of SIGMOD 2005, pp. 766–777.(needs a reader) Aditi 17:43, 8 June 2007 (EDT)
  • Yan,X. et al. (2006) Searching substructures with superimposed distance. In Proceedings of ICDE 2006, pp. 88–99.(needs a reader)

Alternative approach to graph matching problem

These two papers describe tractable way of mining frequent graphs. The idea of generating DFS codes for enumerating structures might be helpful for us.

A working version of gSpan is available here

Related Software

Packages we might use

  • Graphviz - an open source graph visualization software with several graph layout programs.
  • nauty - (no automorphisms, yes?) is a set of procedures for determining the automorphism group of a vertex-coloured graph.
  • Vienna RNA package - The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
  • MFold - Michael Zuker's minimum free energy RNA folding program (and server)

research Groups