MolSoft ICM: “RNA–Protein Docking”

Ribonucleic acid (RNA) is a multifunctional biomolecule that plays a central role in the storage, transmission, and regulation of genetic information within biological systems (1–3). Chemically distinct from DNA by the presence of a ribose sugar and uracil base, RNA is predominantly single-stranded and exhibits remarkable structural plasticity (4,5). This intrinsic flexibility enables RNA to fold into diverse secondary and tertiary structures (6,7), allowing it to function not only as an information carrier but also as a structural scaffold, regulatory element, and catalytic molecule (1,2,8).

Figure 1: Chemical and Structural Features of RNA

RNA in Protein Biosynthesis

The best-established role of RNA is in protein synthesis (4,9). Messenger RNA (mRNA) acts as a transient molecular template that transfers genetic information from DNA to the ribosome (4,5). Translation is carried out by the coordinated action of ribosomal RNA (rRNA) and transfer RNA (tRNA) (4,9,10). rRNA forms the structural and catalytic core of the ribosome and is directly responsible for peptide bond formation (10,11), while tRNA molecules decode mRNA codons and deliver the correct amino acids during polypeptide elongation (4,12).

Figure 2: Overview of the Central Dogma with RNA Components

Regulatory Roles of Non-Coding RNAs

Beyond its role in translation, RNA is a key regulator of gene expression (13,14). Advances in transcriptomics have revealed that a significant proportion of eukaryotic genomes is transcribed into non-coding RNAs (ncRNAs) (13,15). Small ncRNAs, including microRNAs (miRNAs) and small interfering RNAs (siRNAs), regulate gene expression post-transcriptionally by guiding RNA-induced silencing complexes (RISC) to target mRNAs, resulting in translational repression or mRNA degradation (16–18). Larger ncRNAs, such as long non-coding RNAs (lncRNAs) and circular RNAs, are involved in chromatin remodeling, transcriptional control, RNA splicing, and the organization of nuclear architecture, underscoring RNA’s role in higher-order cellular regulation (19–21).

Figure 3: Gene Regulation by Non-Coding RNAs

RNA as a Structural and Catalytic Molecule

RNA is not merely a regulatory signal but can also function as a biochemical catalyst (1,22). Ribozymes possess intrinsic enzymatic activity and catalyse reactions such as RNA cleavage and ligation, providing strong support for the hypothesis that early life may have relied on RNA-based catalysis (2,22,23). Additionally, structured RNA elements known as riboswitches directly bind metabolites or ions and undergo conformational changes that regulate transcription or translation without the involvement of proteins (24,25).

Figure 4: Functional RNA Structures

Predicting Protein–RNA Interactions: A Guide to Docking with MolSoft ICM

The limited availability of high-resolution protein–RNA structures in the Protein Data Bank (PDB) has driven the need for predictive computational approaches. Computational docking addresses this gap by predicting how a protein and an RNA molecule fit together in three dimensions. One of the tools used for this purpose is MolSoft ICM (26–28), a molecular modeling platform designed to handle the unique challenges of protein–RNA interactions. The sections below describe the ICM RNA–protein docking method and how it was benchmarked.

Why Protein–RNA Interactions Are Different

Docking a protein to RNA may sound similar to docking one protein to another. In reality, it is much harder.

Proteins are relatively rigid and often bind through well-defined pockets. RNA, on the other hand:

  • Carries a strong negative charge along its phosphate backbone
  • Is highly flexible, able to bend and twist
  • Uses both its bases and backbone to bind proteins

Because of this, protein–RNA binding is driven largely by electrostatic attraction—positive charges on proteins pulling in the negatively charged RNA.

Figure 5: Protein-Protein and RNA-Protein docking

Protein–RNA docking

Protein–protein and RNA–protein docking is a two-step method:

  1. Initially, molecular recognition is assessed using a simplified scoring function that rapidly captures how well the interacting partners fit together in terms of shape, as well as how their hydrophobic and hydrophilic regions align in space. This streamlined approach allows fast screening while still accounting for the most important physicochemical features that govern binding. The method then applies an FFT-based translational search to systematically explore possible relative positions of the molecules. This positional search is paired with a stepwise rotational sampling strategy, starting with a coarse set of 60 × 27 orientations and gradually refining to 256 × 125 orientations, ultimately providing a detailed and high-resolution view of the binding landscape.
  2. In the second stage, the top few thousand docking poses—typically between 3,000 and 20,000—are re-evaluated using a more detailed and realistic energy model. This refined scoring takes into account electrostatic interactions as well as solvent effects using a solvent-accessible surface area (SAS)–based approach, leading to a more reliable assessment of binding strength. The resulting poses are then grouped by clustering based on contact fingerprints, which capture key interaction patterns between the molecules. This step helps eliminate redundant solutions and highlights the most consistent and meaningful binding modes.
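As a rough illustration of the FFT-based translational search in step 1, the sketch below scores every cyclic shift of a toy one-dimensional "ligand" grid against a "receptor" grid in one pass, using the correlation theorem. The grids, the function names, and the reduction to one dimension are assumptions made for illustration only; ICM works on three-dimensional grids, and its scoring terms are not reproduced here.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def translational_scores(receptor, ligand):
    """score[t] = sum_i receptor[(i + t) % n] * ligand[i], for every shift t."""
    n = len(receptor)
    # Correlation theorem: multiply one spectrum by the conjugate of the other.
    spectrum = [r * l.conjugate() for r, l in zip(fft(receptor), fft(ligand))]
    return [round((v / n).real, 6) for v in fft(spectrum, inverse=True)]

def brute_force_scores(receptor, ligand):
    """Same correlation computed directly in O(n^2), as a cross-check."""
    n = len(receptor)
    return [sum(receptor[(i + t) % n] * ligand[i] for i in range(n))
            for t in range(n)]

# Toy grids (hypothetical): 1 marks a favourable cell, 0 an empty one.
receptor = [0, 0, 1, 1, 0, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]

scores = translational_scores(receptor, ligand)
best_shift = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_shift)
```

The FFT route yields the same scores as the direct double loop but scales as n log n instead of n², which is what makes an exhaustive positional scan affordable for each sampled orientation.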

Figure 6: Protein-Protein/RNA-Protein docking workflow
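The clustering of poses by contact fingerprints in the second stage can be sketched as follows. This is a minimal illustration, not ICM's algorithm: it assumes a fingerprint is simply the set of (protein residue, RNA nucleotide) contacts, that similarity is measured with the Tanimoto coefficient, and that clustering is greedy with a hypothetical threshold of 0.5. All pose data are invented.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of contacts."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def cluster_poses(poses, threshold=0.5):
    """Greedy clustering of poses given as (energy, contact_set) tuples.

    Poses are visited from lowest to highest energy. Each pose joins the
    first cluster whose representative it resembles; otherwise it starts
    a new cluster. Returns lists of pose indices, representative first.
    """
    order = sorted(range(len(poses)), key=lambda i: poses[i][0])
    clusters = []
    for i in order:
        for cluster in clusters:
            representative = cluster[0]
            if tanimoto(poses[i][1], poses[representative][1]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical poses: (energy, {(protein_residue, rna_nucleotide), ...})
poses = [
    (-12.0, {("R45", "G7"), ("K48", "A8"), ("F50", "U9")}),
    (-11.5, {("R45", "G7"), ("K48", "A8"), ("F50", "U9"), ("Y52", "U9")}),
    (-9.0, {("K101", "C20"), ("R104", "G21")}),
]
clusters = cluster_poses(poses)
print(clusters)
```

Keeping only the lowest-energy member of each cluster is one simple way to remove redundant solutions while preserving distinct binding modes.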

The workflow for RNA–protein docking is the same as for protein–protein docking, with slight modifications to the protocol. Results depend on selecting an appropriate sampling precision level.

Sampling:
  • The coarse precision level employs an icosahedron with 12 φ/ψ angle pairs, applies a 72° θ angle step for global rotations on one side, and uses a 3 × 3 × 3 fine grid on the other side with a δ (delta) step size of 30°.
  • The medium precision level uses a dodecahedron with 20 φ/ψ angle pairs, applies a 60° θ angle step, and employs a 5 × 5 × 5 fine grid with a δ (delta) step size of 15°.
  • The fine precision level uses a dodecahedron–icosahedron hybrid polyhedron with 32 φ/ψ angle pairs, applies a 45° θ angle step, and employs a 5 × 5 × 5 fine grid with a δ (delta) step size of 9°.
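The number of orientations sampled at each precision level follows from the parameters listed above. The short calculation below is a sketch under one assumption that is not stated explicitly in the text: each φ/ψ axis direction is combined with every θ step over a full 360° turn to give the global rotations, and the fine grid contributes n × n × n local rotations. The dictionary keys are illustrative names, not ICM settings.

```python
# Sampling parameters as listed above; keys are illustrative, not ICM keywords.
PRECISION_LEVELS = {
    "coarse": {"axis_pairs": 12, "theta_step_deg": 72, "fine_grid": 3},
    "medium": {"axis_pairs": 20, "theta_step_deg": 60, "fine_grid": 5},
    "fine": {"axis_pairs": 32, "theta_step_deg": 45, "fine_grid": 5},
}

def orientation_counts(level):
    """Return (global rotations, local rotations, total) for a precision level."""
    p = PRECISION_LEVELS[level]
    # Assumption: every axis direction is paired with every theta step in 360 degrees.
    global_rotations = p["axis_pairs"] * (360 // p["theta_step_deg"])
    local_rotations = p["fine_grid"] ** 3
    return global_rotations, local_rotations, global_rotations * local_rotations

for name in PRECISION_LEVELS:
    print(name, orientation_counts(name))
```

Under this assumption the coarse level gives 60 × 27 = 1,620 orientations, the medium level 120 × 125 = 15,000, and the fine level 256 × 125 = 32,000. The coarse and fine figures match the 60 × 27 and 256 × 125 orientation sets quoted for the rotational sampling in the first docking step.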

Benchmarking Studies:

To evaluate how well the ICM-based docking approach performs—and to fairly compare it with other existing methods—we tested it on three well-established benchmark sets of RNA–protein complexes. All these benchmarks were built from structures available in the Protein Data Bank and include complexes solved by both X-ray crystallography and NMR. For NMR entries, the first model in each ensemble was used as the reference structure.

The first benchmark set (set1), introduced by Huang and Zou (29), contains 72 complexes. This includes 52 cases where both the protein and RNA are available in unbound form, and 20 cases where the protein is unbound but the RNA comes from a bound structure. Only one of these represents a truly unbound–unbound complex; many others are considered “pseudo-unbound” because the RNA is bound to a different protein with low sequence similarity. The second set (set2), compiled by Pérez-Cano et al. (30), includes 71 complexes, most of which involve unbound proteins paired with bound or pseudo-unbound RNA structures. The third set (set3), from Barik et al. (31), consists of 45 complexes covering a mix of truly unbound and unbound–bound systems. Together, these datasets span a wide range of biological functions and docking challenges, making them well suited for benchmarking.

To better understand how electrostatic interactions influence protein–protein binding—and to directly compare performance between protein–RNA and protein–protein docking—we used the well-established protein–protein docking benchmark version 4.0. This benchmark includes 228 non-redundant protein–protein complexes for which both the bound complex structures and the corresponding unbound protein structures are available. While NMR structures of the complexes themselves were excluded, NMR-derived structures of the unbound proteins were allowed.

The benchmark is organized into three categories based on how much the binding interface changes between the unbound and bound forms, measured by interface RMSD. Complexes labelled as “rigid” show minimal structural change and are generally easier to predict, whereas “medium” and “difficult” cases involve increasing levels of conformational change and pose greater challenges for docking. Specifically, the dataset contains 154 rigid, 41 medium, and 33 difficult complexes. This classification makes the benchmark particularly useful for assessing docking performance across a realistic range of interaction complexities.

Role of electrostatic interactions in protein–RNA docking

Electrostatic interactions are well known to play a key role in protein–RNA recognition, as supported by several previous studies. Although electrostatics were included in the final scoring stage of our docking protocol, the initial FFT-based search focused only on steric fit and lipophilic interactions. This simplification, while efficient, risked missing binding poses that are primarily stabilized by electrostatic forces.

To address this, we introduced an explicit electrostatic term into the energy function used during the FFT-based search and tested its impact using complexes from benchmark set1. Docking simulations were performed with different weights assigned to the electrostatic term (ω_el). The results clearly show that including electrostatics improves docking performance. When electrostatics were excluded (ω_el = 0), the overall success rate was 0.72, which increased to 0.79 when ω_el was set to 1. The best performance was observed at ω_el = 5, yielding a success rate of 0.89. Increasing the weight further, however, reduced accuracy, indicating that overly strong electrostatic contributions can be detrimental.

In addition, both the average rank and the number of near-native solutions improved at ω_el = 5. The protocol consistently generated many native-like poses across all tested conditions, demonstrating robust sampling. Based on these results, ω_el = 5 was selected for all subsequent protein–RNA docking simulations.
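To make the effect of the weight concrete, the sketch below combines three energy terms into a single search-stage score and shows how the preferred pose can change between ω_el = 0 and ω_el = 5. The linear form of the sum, the function names, and the pose energies are assumptions for illustration; the values are invented and are not taken from the benchmark.

```python
def fft_stage_score(e_steric, e_lipophilic, e_electrostatic, w_el=5.0):
    """Combined search-stage energy (assumed linear sum); lower is better."""
    return e_steric + e_lipophilic + w_el * e_electrostatic

# Hypothetical poses: (label, steric, lipophilic, electrostatic), arbitrary units.
poses = [
    ("shape-driven", -10.0, -4.0, -0.2),
    ("charge-driven", -8.0, -3.0, -1.5),
]

def best_pose(w_el):
    """Label of the lowest-energy pose for a given electrostatic weight."""
    return min(poses, key=lambda p: fft_stage_score(p[1], p[2], p[3], w_el))[0]

for weight in (0.0, 1.0, 5.0):
    print(weight, best_pose(weight))
```

With these toy numbers the shape-driven pose wins when the electrostatic term is switched off, while the charge-driven pose ranks first at ω_el = 5. This mirrors the concern that a search restricted to steric and lipophilic terms may discard poses stabilized mainly by electrostatics.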

Because the added electrostatic term mainly captures interactions between the negatively charged RNA phosphate groups and positively charged residues on the protein, we expected it to have a stronger impact on complexes where binding is dominated by RNA backbone contacts. To examine this idea, we combined complexes from benchmark sets 1, 2, and 3 and analysed the bound structures of each protein–RNA complex.

For every complex, two types of contact surface areas were calculated: one corresponding to interactions between RNA phosphate groups and protein heavy atoms (SA_phosph), and another describing contacts between RNA base atoms and the protein (SA_base). Based on which contact area was larger, complexes were classified as either “phosph” or “base” type; cases where the difference between the two areas was less than 10 Ų were excluded. It is worth noting that most complexes exhibited mixed interfaces involving both phosphate and base contacts.
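The classification rule described above can be written as a small function. The function and argument names are hypothetical, the example areas are invented, and the contact surface areas themselves are assumed to have been computed beforehand.

```python
def classify_interface(sa_phosph, sa_base, min_difference=10.0):
    """Classify a complex by its dominant contact area (areas in Å^2).

    Returns "phosph" or "base", or None when the two areas differ by
    less than min_difference and the complex is excluded.
    """
    if abs(sa_phosph - sa_base) < min_difference:
        return None
    return "phosph" if sa_phosph > sa_base else "base"

# Hypothetical contact areas (SA_phosph, SA_base) in Å^2.
examples = [(350.0, 120.0), (90.0, 240.0), (205.0, 200.0)]
labels = [classify_interface(p, b) for p, b in examples]
print(labels)
```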

Figure 7. Overlay of the experimental zinc finger–RNA complex (PDB ID: 1un6) with top docking poses obtained without (ω_el = 0, orange) and with electrostatics (ω_el = 5, magenta), using the unbound protein (PDB ID: 2hgh).

Docking success rates were then compared for both groups using simulations without electrostatics (ω_el = 0) and with electrostatics included (ω_el = 5). Without the electrostatic term, docking performed better for base-dominated complexes. When electrostatics were added, success rates increased for both groups, although base-type complexes still showed slightly higher performance. Importantly, these results demonstrate that the inclusion of electrostatics improves docking accuracy across different types of protein–RNA interfaces, rather than benefiting only a small subset of phosphate-driven interactions.

Internal Case Study:

The aim of this case study was to evaluate how well MolSoft ICM’s FFT-based docking reproduces the conformations of protein–RNA complexes in unbound–unbound and unbound–bound settings.

Protein–RNA complexes that were difficult to reproduce (29–31) were selected from the three benchmarking studies, together with a few recent RNA–protein complexes from the PDB. The sampling precision selected in the study was regular.

The medium precision level was used for the RNA–protein docking simulations summarized below.

Summary:
  • Of the 36 RNA–protein complexes docked, 3 did not reproduce the desired results.
  • Overall, the 1st-ranked docking pose reproduced the crystal conformation for 27 complexes, the 2nd-ranked pose for 2 complexes, and the 8th-ranked pose for 1 complex.
  • The crystal conformation could not be generated for 3 complexes. In these cases:
    • The binding interface between the protein and RNA is minimal.
    • Electrostatics played a smaller role in pose prediction.

References
  1. Sharp, P. A. The centrality of RNA. Cell 2009, 136, 577–580.
  2. Gesteland, R. F.; Cech, T. R.; Atkins, J. F. (Eds.). The RNA World, 3rd ed.; Cold Spring Harbor Laboratory Press, 2006.
  3. Doudna, J. A.; Cech, T. R. The chemical repertoire of natural ribozymes. Nature 2002, 418, 222–228.
  4. Alberts, B.; et al. Molecular Biology of the Cell, 7th ed.; W. W. Norton & Company, 2022.
  5. Watson, J. D.; et al. Molecular Biology of the Gene, 7th ed.; Pearson Education, 2013.
  6. Draper, D. E. A guide to ions and RNA structure. RNA 2004, 10, 335–343.
  7. Holbrook, S. R. Structural principles from large RNAs. Annu. Rev. Biophys. 2008, 37, 445–464.
  8. Leontis, N. B.; Westhof, E. Geometric nomenclature and classification of RNA base pairs. RNA 2001, 7, 499–512.
  9. Ramakrishnan, V. Ribosome structure and the mechanism of translation. Cell 2002, 108, 557–572.
  10. Steitz, T. A. A structural understanding of the dynamic ribosome machine. Nat. Rev. Mol. Cell Biol. 2008, 9, 242–253.
  11. Nissen, P.; et al. The structural basis of ribosome activity in peptide bond synthesis. Science 2000, 289, 920–930.
  12. Schimmel, P. Transfer RNA: structure, properties, and recognition by aminoacyl-tRNA synthetases. Annu. Rev. Biochem. 1987, 56, 125–158.
  13. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489, 57–74.
  14. Morris, K. V.; Mattick, J. S. The rise of regulatory RNA. Nat. Rev. Genet. 2014, 15, 423–437.
  15. Djebali, S.; et al. Landscape of transcription in human cells. Nature 2012, 489, 101–108.
  16. Bartel, D. P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004, 116, 281–297.
  17. Carthew, R. W.; Sontheimer, E. J. Origins and mechanisms of miRNAs and siRNAs. Cell 2009, 136, 642–655.
  18. Hannon, G. J. RNA interference. Nature 2002, 418, 244–251.
  19. Rinn, J. L.; Chang, H. Y. Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 2012, 81, 145–166.
  20. Quinn, J. J.; Chang, H. Y. Unique features of long non-coding RNA biogenesis and function. Nat. Rev. Genet. 2016, 17, 47–62.
  21. Kristensen, L. S.; et al. The biogenesis, biology and characterization of circular RNAs. Nat. Rev. Genet. 2019, 20, 675–691.
  22. Cech, T. R. RNA as an enzyme. Biochemistry 1987, 26, 6331–6337.
  23. Joyce, G. F. The antiquity of RNA-based evolution. Nature 2002, 418, 214–221.
  24. Breaker, R. R. Riboswitches and the RNA world. Cold Spring Harb. Perspect. Biol. 2012, 4, a003566.
  25. Serganov, A.; Nudler, E. A decade of riboswitches. Cell 2013, 152, 17–24.
  26. Abagyan, R.; Totrov, M.; Kuznetsov, D. ICM – a new method for protein modeling and design: applications to docking and structure prediction from the distorted native conformation. J. Comput. Chem. 1994, 15, 488.
  27. Abagyan, R.; Totrov, M. Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 1994, 235, 983–1002.
  28. Arnautova, Y. A.; Abagyan, R.; Totrov, M. Protein–RNA docking using ICM. J. Chem. Theory Comput. 2018, 14, 4971–4984.
  29. Huang, S.-Y.; Zou, X. A non-redundant structure dataset for benchmarking protein–RNA computational docking. J. Comput. Chem. 2013, 34, 311–318.
  30. Pérez-Cano, L.; Jiménez-García, B.; Fernández-Recio, J. A protein–RNA docking benchmark (II): extended set from experimental and homology modeling data. Proteins 2012, 80, 1872–1882.
  31. Barik, A.; Nithin, C.; Manasa, P.; Bahadur, R. P. A protein–RNA docking benchmark (I): nonredundant cases. Proteins 2012, 80, 1866–1871.