Using a GNN to cluster non-coding RNA
Download the latest version of ViennaRNA from https://www.tbi.univie.ac.at/RNA/#download. For version 2.6.4 (the current release at the time of writing), run:

    wget https://www.tbi.univie.ac.at/RNA/download/sourcecode/2_6_x/ViennaRNA-2.6.4.tar.gz
    tar -xzf ViennaRNA-2.6.4.tar.gz
    cd ViennaRNA-2.6.4
    ./configure
    make
    make check
    make install
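The README does not say whether the project calls ViennaRNA through its command-line tools or through its Python bindings (the `RNA` module, built by `./configure` when Python support is enabled). If you rely on the bindings, a quick check like the following can confirm the installation worked. This is a minimal sketch: `vienna_bindings_available` is a hypothetical helper, not part of this repository, and the test sequence is arbitrary.

```python
import importlib.util


def vienna_bindings_available() -> bool:
    """Return True if the ViennaRNA Python module `RNA` can be imported."""
    return importlib.util.find_spec("RNA") is not None


if __name__ == "__main__":
    if vienna_bindings_available():
        import RNA  # present only if ViennaRNA was built with Python bindings

        # RNA.fold returns the minimum free energy (MFE) structure in
        # dot-bracket notation and its free energy in kcal/mol
        structure, mfe = RNA.fold("GGGGAAAACCCC")
        print(structure, mfe)
    else:
        print("ViennaRNA Python bindings not found; check the ./configure output")
```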
VARNA (version 3.93) is a tool for visualizing RNA secondary structures (more information at https://varna.lisn.upsaclay.fr/index.php?lang=en&page=downloads&css=varna). It is distributed as a Java archive, so a Java runtime is required. Download it with:

    wget https://varna.lisn.upsaclay.fr/bin/VARNAv3-93.jar
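VARNA can display structures given in dot-bracket notation, the format RNAfold prints. Before passing a structure to a viewer, it can help to check that it is well formed and to list its base pairs. The sketch below is a hypothetical helper (not part of this repository) that handles only plain `(`, `)` and `.` characters, so pseudoknot brackets such as `[` or `]` are rejected.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the 0-based (i, j) base pairs of a dot-bracket string.

    Raises ValueError on unbalanced brackets or unsupported characters.
    """
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)  # opening base, waits for its partner
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # close the most recent '('
        elif char != ".":
            raise ValueError(f"unsupported character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(parse_dot_bracket("((..((...))..))"))
```

For the example structure the helper returns `[(0, 14), (1, 13), (4, 10), (5, 9)]`.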
Import the ClusteringRNA class from rnaClustering.py and use its methods as shown in example.ipynb:
- When creating a ClusteringRNA object, you must pass the list of sequences. You can also set the arguments used to train the GNN.
- fit_model() trains the GNN on the sequences.
- Set no_sleeping=True to run the training in the background (useful for long training runs).
- cluster() clusters the sequences using the trained GNN. You can specify the number of clusters, the gamma parameter of the Maximum Expected Accuracy (MEA) algorithm, and the entropy threshold used to filter the clusters.
- The representatives variable provides the list of selected structures for each cluster, along with the entropy of each cluster.
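The gamma passed to cluster() is described as the MEA parameter. To illustrate what that parameter controls, here is a simplified, pseudoknot-free MEA dynamic program over a base-pair probability matrix. It uses the objective documented for RNAfold's --MEA option: each base pair (i, j) scores 2·gamma·p_ij and each unpaired base scores its probability of being unpaired, so a larger gamma favors more base pairs. This is a sketch, not the project's code and not ViennaRNA's implementation; `mea_fold`, the hairpin minimum `min_loop=3`, and the toy probability matrix are assumptions. In practice the probabilities would come from a partition function computation.

```python
def mea_fold(bpp: list[list[float]], gamma: float = 1.0,
             min_loop: int = 3) -> tuple[str, float]:
    """Return (dot-bracket structure, expected accuracy) maximizing
    sum(2*gamma*p_ij over pairs) + sum(q_i over unpaired bases)."""
    n = len(bpp)
    # q[i]: probability that base i is unpaired (bpp assumed symmetric)
    q = [1.0 - sum(bpp[i][k] for k in range(n) if k != i) for i in range(n)]
    # best[i][j]: optimal score on the half-open interval [i, j)
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    # choice[i][j]: partner of i in the optimum, or -1 if i is unpaired
    choice = [[-1] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best[i][j] = q[i] + best[i + 1][j]  # case 1: i stays unpaired
            for k in range(i + min_loop + 1, j):  # case 2: i pairs with k
                if bpp[i][k] <= 0.0:
                    continue
                cand = 2.0 * gamma * bpp[i][k] + best[i + 1][k] + best[k + 1][j]
                if cand > best[i][j]:
                    best[i][j] = cand
                    choice[i][j] = k
    # Traceback: rebuild the structure from the stored choices
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue
        k = choice[i][j]
        if k < 0:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.append((i + 1, k))      # bases enclosed by the pair
            stack.append((k + 1, j))      # bases to the right of the pair
    return "".join(structure), best[0][n]


if __name__ == "__main__":
    n = 12
    bpp = [[0.0] * n for _ in range(n)]
    for i, j in [(0, 11), (1, 10), (2, 9), (3, 8)]:  # toy hairpin probabilities
        bpp[i][j] = bpp[j][i] = 0.9
    print(mea_fold(bpp, gamma=1.0))
    print(mea_fold(bpp, gamma=0.1))
```

With gamma=1.0 the toy matrix yields the hairpin `((((....))))`; with gamma=0.1 each pair is worth less than leaving its two bases unpaired, so the result is fully unpaired.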
Repository contents:
- example.ipynb: example of how to use the ClusteringRNA class
- rnaClustering.py: contains the ClusteringRNA class and the methods used to analyse the sequences
- script.py: trains the GNN and reports information about the training
- gae/data.py: contains the class that defines the data and the dataset
- gae/gae.py: contains the GNN model
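The directory name gae suggests a graph autoencoder, but the README does not describe the model or how an RNA is turned into a graph. The sketch below shows one common design under those assumptions: each nucleotide is a node with a one-hot feature, edges come from the backbone and from base pairs in the dot-bracket structure, a two-layer GCN encoder produces node embeddings, and an inner-product decoder reconstructs the adjacency matrix. All names (`rna_graph`, `gae_forward`) are hypothetical, the weights are random rather than trained, the training loss is omitted, and mean pooling into a single vector for clustering is an illustrative choice.

```python
import numpy as np

NUCLEOTIDES = "ACGU"


def rna_graph(sequence: str, structure: str) -> tuple[np.ndarray, np.ndarray]:
    """Build (adjacency, one-hot features) from a sequence and a balanced
    dot-bracket structure. Non-ACGU bases raise ValueError."""
    n = len(sequence)
    if len(structure) != n:
        raise ValueError("sequence and structure lengths differ")
    adj = np.zeros((n, n))
    for i in range(n - 1):  # backbone edges between consecutive nucleotides
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    stack = []
    for j, char in enumerate(structure):  # base-pair edges
        if char == "(":
            stack.append(j)
        elif char == ")":
            i = stack.pop()
            adj[i, j] = adj[j, i] = 1.0
    feats = np.zeros((n, len(NUCLEOTIDES)))
    for i, base in enumerate(sequence.upper().replace("T", "U")):
        feats[i, NUCLEOTIDES.index(base)] = 1.0
    return adj, feats


def normalize(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gae_forward(adj, feats, w1, w2):
    """Encoder: GCN layer + ReLU, then linear GCN layer.
    Decoder: sigmoid of inner products between node embeddings."""
    a_norm = normalize(adj)
    hidden = np.maximum(a_norm @ feats @ w1, 0.0)
    z = a_norm @ hidden @ w2
    recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    return z, recon


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, feats = rna_graph("GGGGAAAACCCC", "((((....))))")
    w1 = rng.normal(scale=0.1, size=(4, 16))
    w2 = rng.normal(scale=0.1, size=(16, 8))
    z, recon = gae_forward(adj, feats, w1, w2)
    graph_embedding = z.mean(axis=0)  # one vector per RNA, usable for clustering
    print(graph_embedding.shape)
```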
Dependencies:
- ViennaRNA
- VARNA