COMPUTER-BASED MODELLING OF LIGAND/RECEPTOR STRUCTURES

Title:

COMPUTER-BASED MODELLING OF LIGAND/RECEPTOR STRUCTURES

Document Type and Number:

WIPO Patent Application WO/2002/016930

Kind Code:

A2

Abstract:

Method for defining a binding site in biological macromolecule based on two-sphere grid and method for determining the free energy of ligand/RNA structure based on pseudo-energy values. These methods can be use in docking and also in high-throughput $i(in silico )screening of ligand libraries against RNA structures. Defining the binding site involves: (a) placing a grid over a 3D representation of the receptor; (b) identifying as an excluded volume grid point(s) which lie within the receptor; (c) centring a large sphere over grid point(s) and removing the grid points within the sphere if it does not overlap with the receptor; (e) centring a small sphere over all remaining grid points and determining whether one or more grid points within the small sphere overlap with the excluded volume of the receptor and identifying any non-overlapping grid points as representing a putative binding site cavity within the receptor.

More Like This:

WO/2008/131311	SHIFT-INVARIANT PREDICTIONS
WO2010110228	INFORMATION DISPLAY PROGRAM, INFORMATION DISPLAY DEVICE, AND INFORMATION DISPLAY METHOD
JP6976250	Antibody design with de novo

Inventors:

AFSHAR MOHAMMAD MICHEL (GB)
MORLEY STEPHEN DAVID (GB)

Application Number:

PCT/GB2001/003743

Publication Date:

February 28, 2002

Filing Date:

August 21, 2001

Export Citation:

Click for automatic bibliography generation Help

Assignee:

RIBOTARGETS LTD (GB)
AFSHAR MOHAMMAD MICHEL (GB)
MORLEY STEPHEN DAVID (GB)

International Classes:

G16B15/30; G01N33/566; C07B61/00; (IPC1-7): G01N33/00

Domestic Patent References:

WO1999058722A1

1999-11-18

Other References:

FILIKOV ANTON V ET AL: "Identification of ligands for RNA targets via structure-based virtual screening: HIV-1 TAR." JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, vol. 14, no. 6, August 2000 (2000-08), pages 593-610, XP008004783 ISSN: 0920-654X cited in the application
BOEHM HANS-JOACHIM: "Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3D database search programs." JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, vol. 12, no. 4, July 1998 (1998-07), pages 309-323, XP008004781 ISSN: 0920-654X cited in the application
RAREY MATTHIAS ET AL: "A fast flexible docking method using an incremental construction algorithm." JOURNAL OF MOLECULAR BIOLOGY, vol. 261, no. 3, 1996, pages 470-489, XP002203342 ISSN: 0022-2836
LIND KENNETH E ET AL: "Structure-based computational database screening, in vitro assay, and NMR assessment of compounds that target TAR RNA." CHEMISTRY & BIOLOGY (LONDON), vol. 9, no. 2, February 2002 (2002-02), pages 185-193, XP002203343 February, 2002 ISSN: 1074-5521

Attorney, Agent or Firm:

Howard, Paul Nicholas (43 Bloomsbury Square, London WC1A 2RA, GB)

Download PDF:

View/Download PDF PDF Help

Claims:

CLAIMS

1.	A method for determining the free energy of a ligand/RNA structure, wherein the method utilises a scoring function for calculating pseudoenergy values of the ligand/RNA structure.

2.

The method of claim 1, wherein the scoring function calculates pseudoenergy values for one or more of the following interactions between RNA and ligand: hydrogen bonds, lipophilic interactions, ionic interactions, repulsive ionic interactions, aromatic/cationpi interactions, and entropic costs of interaction.

3.	The method of claim 1 or claim 2, wherein formal charges are assigned to RNA and/or ligand atoms before scoring.

4.	The method of any one of claims 1 to 3, wherein the pseudoenergy values are weighted.

5.	The method of any one of claims 1 to 4, wherein the pseudoenergy value for hydrogen bond is a function of the donoracceptor bond length, the acute angle around the donor hydrogen, and the acute angle around the acceptor atom.

6.	The method of any one of claims 1 to 5, wherein the pseudoenergy value for aromatic/cationpi interactions utilises the average perpendicular distance from each ring centre to the other ring plane and the average slip angle between the rings.

7.	The method of any one of claims 1 to 6, wherein the pseudoenergy value for lipophilic interactions is a linear version of a LennardJones potential function.

8.	The method of claim 7, wherein the function has an attractive part and a repulsive part.

9.	The method of any one of claims 1 to 8, wherein said method utilises one or more of equations (I), (II), (III), (IV) and (V) disclosed herein.

10.

A method for defining a binding site in a receptor, comprising the following steps: (a) superimposing a grid, comprising a plurality of grid points, over a 3D representation of the receptor; (b) flagging as excluded volume the grid points which lie within the van der Waals surface of the receptor; (c) centring a large sphere on the grid points not flagged in step (b) and removing all grid points within said large sphere if said large sphere does not overlap with one or more grid point (s) flagged in step (b); and (d) removing the grid point (s) for which a small sphere centred on said grid point (s) overlaps with one or more grid point (s) flagged in step (b), the remaining grid points defining the receptor's binding site (s).

11.	The method of claim 10, comprising the further step of: (e) dividing the remaining grid points into contiguous cavity regions;.

12.

The method of claim 11, comprising the further steps of : (f) filtering the cavity (s) to remove (i) those smaller than a minimum size and/or (ii) those with a centre of mass further than a maximum distance from a designated active site centre, with the remaining grid points defining the receptor's binding site.

13.	The method of any one of claims 10 to 12, wherein the grid is cubic.

14.	The method of any one of claims 10 to 13, wherein the receptor is a protein, a nucleic acid, a carbohydrate, or a lipid.

15.	The method of claim 14, wherein the nucleic acid is RNA.

16.	The method of any one of claims 10 to 15, wherein the spacing of the grid used in step (a) is around 0.5Å.

17.	The method of any one of claims 10 to 16, wherein the radius of the large sphere used in step (c) is in the range 35A.

18.	The method of any one of claims 10 to 17, wherein the radius of the small sphere used in step (d) is around 1.75A.

19.	The method of any one of claims 12 to 18, wherein the minimum size used in step (f) is around 20 grid spacings.

20.	The method of any one of claims 12 to 19, wherein the maximum distance used in step (f) is around 10Å.

21.	A method for docking a ligand to a RNA, wherein free energy of a ligand/RNA structure is calculated using the method of any one of claims 1 to 9.

22.	The method of claim 20, wherein the binding site of the RNA is defined using the method of any one of claims 10 to 20.

23.	The method of claim 21 or claim 22, wherein the docking of ligand to RNA is done at high throughput.

24.	The method of any one of claims 21 to claim 23, wherein Monte Carlo simulated annealing is utilised.

25.	The method of any one of claims 21 to 24, wherein distance restraints are used to keep ligand atoms within a specified distance of at least one of the binding site grid points.

26.	The method of claim 25, wherein the specified distance is around 7A.

27.	The method of any one of claims 21 to 26, wherein a ligand placed within a binding site is subjected to Monte Carlo sampling.

28.	A highthroughput process for screening a library of ligand structures for those which interact with an RNA receptor of interest, comprising docking each structure from the library against the RNA receptor by the method of any one of claims 21 to 27.

29.

A method of screening a library of ligand structures to identify a ligand which interacts with an RNA molecule, the method comprising the steps of : docking a structure from the library against the RNA receptor using the method of any one of claims 21 to 27 in order to simulate a ligand: RNA complex.

30.	A ligand identified by the method of claim 28 or claim 29.

Description:

COMPUTER-BASED MODELLING OF LIGAND/RECEPTOR STRUCTURES All documents cited herein are incorporated by reference in their entirety and are indicated in the text by numerals in square brackets.

TECHNICAL FIELD This invention is in the field of computer-based biomolecular modelling. In particular it concerns the in silico docking of ligand ligand with biological biological receptor, such the docking docking ligands ligands with RNA and RNA-protein complexes. More particularly, it concerns prediction of the binding mode and affinity of a ligand binding to a receptor, starting with a 2D structure for the ligand and a 3D structure for the receptor.

BACKGROUND ART Computer-based molecular docking techniques are widely used for modelling the interaction between a ligand and a biological receptor [e. g. references 1 and 2], typically requiring known 3D structures for the ligand and receptor. Interacting ligand and receptor surfaces are described by different physical properties (e. g. Connolly surfaces, grids, protein slices, property-vectors etc.) to allow the identification of their geometrically complementary regions.

The goal of molecular docking is to reconstruct the conformation of the bound ligand/receptor complex starting from the structure of the unbound partners, and also to predict the affinity of the interaction. Despite the complexity of this problem, several approximations lead to effective and yet computationally feasible approaches.

Conceptually, docking can be split into two related tasks. First, a variety of feasible ligandlreceptor structures are generated, using a'search'algorithm. Second, these structures are each'scored'to estimate binding affinity and thus to evaluate the likelihood of the putative structure being the correct'real world'structure (i. e. to evaluate a structure's accuracy). Scores for a number of alternative putative structures can be ranked, with the highest scoring structure (e. g. the lowest energy structure) being predicted to represent the actual ligand/receptor complex.

Where a large number of scores are to be calculated for ranking, it is important for the scoring function to be as fast as possible whilst retaining accuracy.

The docking package DOCK [3] uses a sphere definition of a target binding site to guide the positioning of ligands. The centres of spheres are used to identify possible positions for ligand atoms in a putative binding site of a receptor. Selected ligand atoms are mapped onto subsets of sphere centres with internal distances between sphere centres approximating the distance

between ligand atoms. Each mapping attempt or'docking run'orients a ligand in the site. A minimum of four atom-sphere centre pairs is required to define a unique configuration for a ligand, and thus thousands of orientations are possible.

Following transformation of a ligand's coordinates (each coordinate representing the position of an atom within the ligand) into a particular orientation, the orientation is scored by evaluating the extent of shape complementarity between the receptor and the ligand using a scoring function which approximates van der Waals interactions between atoms. Ligand atoms docked within attractive distances of receptor atoms are assigned positive scores, while orientations in which ligand atoms overlap receptor atoms are assigned negative scores and discarded as unlikely to form in vivo.

To simplify the computational complexity of docking and reduce configuration space to a manageable level, search algorithms typically focus only on the binding site of a receptor-it is pointless to attempt to dock a ligand with a region in the receptor which is known not to be involved. A number of algorithms are available to predict binding sites for this purpose e. g.

SPHGEN [4], which defines the binding site by using clusters of overlapping spheres to represent the solvent-accessible surface of the receptor. The radii of the spheres are proportional to the concavity of the receptor surface-flat regions are represented by larger spheres while small spheres are generated in highly-featured regions.

Spheres are constructed using a molecular surface program, such as described in references 5,6 and 7. Spheres are calculated over the entire surface of a macromolecule (e. g. receptor or ligand), producing approximately one sphere per surface point. Spheres are then selectively eliminated by filtering to keep only the largest sphere associated with each surface atom. The filtered set is then clustered on the basis of radial overlap between the spheres using a single linkage algorithm.

This creates a negative image of the molecular surface, where any invagination is characterised by a set of overlapping spheres. These sets, or'clusters', are sorted according to numbers of constituent spheres, with the largest cluster in a receptor molecule (i. e. the cluster comprising the most spheres) typically identified as representing a ligand binding site.

In the biological field, docking has typically been used for predicting protein/protein and protein/small molecule interactions, with only a few reports of docking a ligand to RNA.

Reference 4 describes the application of SPHGEN to an RNA to define its binding site for a ligand, followed by rigid body orientation of a ligand within the binding site to overlap the

'shape'defined by spheres. Structures were scored on the basis of molecular mechanics, using a modified version of AMBER force field with the docking package DOCK [3].

Reference 8 describes a method similar to that described in reference 4 where the docking package DOCK (with its molecular mechanics score) is extended and applied to RNA.

In reference 9, electrostatic patches on the RNA are identified using Brownian dynamics. A pharmacophore-type motif is identified and aminoglycosides are docked using this as a guide to place their amino groups.

In reference 10, peptides are docked to RNA using molecular mechanics energy minimisation.

Reference 11 describes a docking method for RNA which uses a 3D pharmacophore search and molecular dynamics optimisation. Reference 12 describes MCSS-based docking of arginine and aminoglycosides to the TAR RNA using a fragment-based molecular mechanics approach.

Where scoring functions have been used in RNA docking [e. g. 4], these have been based on molecular mechanics, considering van der Waals and electrostatic interactions between ligand and receptor (e. g. Dock3.5 [3]). Pseudo-energies have been used as an alternative approach for protein docking (e. g. LUDI [13], FleXX [14], Hammerhead [15]). As the methods require training against empirically-determined receptor/ligand complexes, however, they have not been seen as suitable for RNA because of the relative lack of RNA/ligand structures.

It is an object of the invention to provide improved methods for molecular docking, particularly for docking ligands to RNA receptors. In particular, it is an object to provide improved methods for defining active sites in receptors, to provide improved scoring functions for RNA/ligand complexes, and to use docking to provide automated iM silico library screening for RNA-binding compounds.

DISCLOSURE OF THE INVENTION The invention provides a method for defining a binding site in a biological macromolecule based on using a two-sphere grid to map cavities on a receptor surface which are likely to be accessible to ligands, and a method for determining the free energy of a ligand/RNA structure based on pseudo-energy values. These methods can be use in docking and also in high-throughput in silico screening.

I. Defining a binding site in a biological macromolecule (receptor) To simplify the computational complexity of docking, it is helpful to identify the most important active site cavities on the receptor and to restrict the sampling of the ligand to these regions.

The invention provides a method for defining a binding site in a biological macromolecule (receptor) which utilises a two-sphere grid to map the receptor and to locate cavities likely to be accessible only to ligands (e. g. low molecular weight ligands).

The method involves the following steps: 1. A grid is placed over a 3D representation of the receptor; 2. Grid points which lie within the van der Waals'volume of the receptor are flagged as excluded volume; 3. The remaining grid points are checked for accessibility to a large sphere-if a large sphere centred on a given grid point does not overlap with grid points flagged in step 2 as excluded volume then all grid points within the large sphere are removed; 4. Any grid points still remaining are checked for accessibility to a smaller sphere (solvent accessibility)-if a small sphere centred on a given grid point does not overlap with receptor, the grid point is flagged as belonging to a cavity; and 5. The ensemble of adjacent grid points flagged in step 4 defines the binding cavity or cavities- there may be more than one cavity depending on the complexity of the receptor surface.

The method may optionally comprise either or both of the following further steps, with the remaining cavities defining the binding site (s) of the receptor: 6. The cavities are filtered to remove those smaller than a minimum size; and 7. The co-ordinate of the active site centre is defined and cavities with a centre of mass further than a maximum distance from the designated active site centre are removed.

In comparison with SPHGEN, the method of the invention avoids the need for manual intervention (e. g. user input of empirically-determined parameters) in order to guide the algorithm towards the binding site, although manual guidance can be used if desired. Cavity mapping is therefore improved.

In comparison with the binding site calculation method presented in reference 16, the method of the invention uses grid points and no surface calculation is performed. This results in improved speed of computation.

In comparison with the method implemented in Cerius2 [17], the use of two spheres (instead of a single sphere in Cerius2) allows the filtering of ripples at the bottom of the cavities, thus simplifying the surface and increasing the performance of the docking at later stages.

The receptor can be any biological macromolecule, such as a protein, a nucleic acid, a carbohydrate or a lipid. The method is ideally suited to defining the binding site of nucleic acid, in particular RNA.

The grid used in step 1 will typically be cubic and will comprise a plurality of x-, y-and z- coordinates, each representing a point in 3D space (a grid point). Its spacing will be selected according to the size of the receptor. As the grid becomes finer (i. e. the spacing decreases), the final binding site will be more detailed, but the method will involve more computation. A balance between required detail and computing power can thus be chosen for any particular situation. Typically, the grid will have a spacing of between 0. là and 2A. A preferred grid is cubic and has a spacing of between 0.2Å and 1Å, and most preferably the spacing is around 0.5Å. The grid is preferably placed using Cartesian coordinates. It may optionally be centred on a postulated active site region of the receptor.

The van der Waal's volume calculated in step 2 preferably uses standard atomic radii.

The radius of the large sphere used in step 3 is in the range 3-5A, typically about 4A. Step 3 has the effect of eliminating all large cavities and of smoothing convex regions of the receptor surface.

The radius of the small sphere used in step 4 represents a small ligand fragment or solvent (e. g.

H2O) molecule. The radius is typically in the range 0.5-3A, preferably 1-2A, and most preferably around 1.75A.

Taken together, steps 3 and 4 identify regions of the grid corresponding to"deep"cavities, i. e. those that are accessible to small spheres but not larger ones.

Cavities determined at the end of step 4 will often need to have a minimum size in order to be useful as target regions for the binding of ligands. The minimum size used in optional step 6 represents an empirical estimate of this minimum size. It will typically be in the range 5-50 grid points, preferably 10-30 grid points, and most preferably around 20 grid points. With a 0.5A grid, therefore, the minimum size will be 10A.

If extra information is available on the location of the"active"site, 3D coordinates of the active site centre can be identified in the context of the whole receptor. A"maximum distance"can then be used to define the extent of the active site in optional step 7. The centre point and the distance together define the active site sphere. The assumption is that regions of the cavity which are remote from the centre of the binding site (more than maximum distance) are predicted not to

be critically involved in binding. The maximum distance will typically be in the range 1-50Å, preferably 5-20A, and most preferably around 10A. The centre point can be defined by using the centre of mass of the receptor, or prior information on the location of the active site.

Evidently, the method requires a 3D representation of the receptor. 3D coordinates of target receptors can be obtained, for example, from a protein database (e. g. the Protein Data Bank [18] or the Cambridge Crystallographic Database [19]); as an alternative, they can be determined by experimental methods routine in the art (e. g. by X-ray diffraction or NMR) or a model structure may be used.

In one embodiment, a 3D representation of the receptor may be generated using a program such as ms which reads atomic coordinates in Protein Databank Format [e. g. references 20,21,22] or using other programs, such as those described in references 23 to 48.

II. Docking methods Once a binding site in the receptor has been defined, ligands (structural representations of ligands) can be docked to it. This can use any of the standard docking algorithms available.

The'search'aspect of docking samples the degrees of freedom of the receptor/ligand complex.

In some embodiments of the invention, the ligand can be considered to be rigid, and only six degrees of freedom need to be sampled. In other embodiments, alternate conformations of the ligands (each torsion angle that varies) can be considered, which adds complexity (e. g. computational time) to the search.

The search algorithm may use molecular mechanics (e. g. energy minimisation [49], molecular dynamics [50], multiple copy simultaneous search [51]), Monte Carlo simulated annealing [52, 53], a combinatorial search (e. g. in-site combinatorial search, or rotamer search), genetic algorithms, or may assemble the ligand from fragments (e. g. FlexX [14], Dock 4.0 [3], Ludi [13], Hammerhead [15], Growmol [54], Hook [55]). Molecular mechanics algorithms are preferred, as they can be readily used to sample the conformational space and optimise the geometry of the docked ligands. Furthermore, their theoretical basis allows the interpretation of the physico-chemical features that are relevant to a particular prediction. Monte Carlo algorithms are particularly preferred.

Thus the invention provides a method for docking a ligand to the binding site of a biological macromolecule (receptor), comprising the steps of : (i) defining the binding site of the receptor according to the invention; and (ii) docking a ligand structure to the binding site.

The binding site definition method of the invention results in a set of grid points which represent the (putative) binding site (s) of the receptor for use in docking. When performing a docking experiment, it is possible to include distance restraints to keep ligand atoms within a specified distance of at least one of these grid points. Atoms further away from the binding site thus do not contribute to the scoring function, but it may be preferred to retain calculation of the repulsive vdW term for these atoms. The specified distance will typically be in the range 1-20Å, preferably 5-10A, and more preferably around 7A. In effect, this provides a non-spherical docking region that is closely moulded to the shape of the binding site cavities.

Generally, a ligand is placed within one of the receptor's binding site cavities at the start of each independent docking run. A typical method will involve the following steps: 1. If multiple cavities are present, a cavity is selected at random, preferably with a probability proportional to its size; 2. The ligand centre of mass is placed either (a) at the cavity centre of mass or (b) on a randomly selected cavity grid point; 3. The ligand principal axes are aligned either (a) with the cavity principal axes or (b) along a randomly selected axis ; and 4. The internal conformation of the ligand is preferably retained from the best scoring bound conformation from the previous run (s).

In step 1, a preferred probability for placing the ligand in any given cavity is the ratio between the cavity volume and the total volume of all cavities.

In step 2, it is preferred to align the ligand centre of mass with the cavity centre of mass.

In step 3, it is preferred to align the ligand principal axes with the cavity principal axes.

Once the ligand is placed, it is preferably subjected to Monte Carlo sampling. A typical Monte Carlo trial comprises the steps of: 1. Translation of the ligand along a random vector selected from a spherical distribution; 2. Rotation of the ligand around a random axis vector (also selected from a spherical distribution) through the ligand centre of mass; and 3. Torsional rotation of one of the ligand rotatable bonds.

Generally, the maximum step sizes for each of the three steps will be chosen to give a Metropolis acceptance rate of approximately 0.5 [56].

The Monte Carlo sampling is preferably used within a non-adaptive simulated annealing cooling schedule for locating low energy bound conformations. A typical schedule is divided into a number of phases. In each phase a fixed number of Monte Carlo trials are performed in a series of constant temperature blocks. At the end of each temperature block, the sampling temperature is reduced by a constant factor (geometric cooling). In addition, the maximum Monte Carlo step sizes may be reduced (e. g. halved) if the acceptance rate falls below a threshold (e. g. 0.25) over a particular block. The next phase may begin from the lowest scoring ligand conformation found during the previous phase.

Standard Monte Carlo protocols and parameters can be found in references 57,58 and 59.

In some embodiments, the RNA is a drug target and the ligand is a candidate lead drug.

III. Scoring functions for ligand/RNA complexes To estimate the likelihood of a ligand/receptor in silico structure representing the actual in vivo situation, the structure can be scored. This is particularly useful for ranking a variety of models, thereby identifying the most promising model.

The invention provides a method for predicting the free energy of a ligand/RNA structure ("pseudo energy"), wherein the method utilises a simplified empirical scoring function for calculating energy values of the ligand/RNA structure. The need for molecular mechanics calculations is thus avoided and computational time is hugely reduced.

The invention also provides a method for docking a ligand to a RNA, wherein the free energy of a ligand/RNA structure is calculated using a scoring function based on pseudo-energy values.

The pseudo-energy methods of the invention can also be used for discriminating between native and non-native ligands of a particular receptor.

Preferably, the scoring function calculates pseudo-energy values for one or more (preferably all) of the following interactions between RNA and ligand: hydrogen bonds, attractive lipophilic interactions, repulsive lipophilic interactions, attractive ionic interactions, repulsive ionic interactions, aromatic/cation-pi interactions, and entropic costs of interaction. These terms may contribute to the scoring function equally, but are preferably weighted.

Preferably, the method of the invention uses the scoring function set out as equation (I) : I H-bonds Ionic E f, (A. Rij). f2 (ci, cj). f, (NNighb) + ionic AG AG,, rol), Ifl (ARperl))'fl (Aaslip) + equ' Ginding arom AG-.I fl' (ARij) + lipo AGrotNrot + AGo AV + AGo

Equation (I) is a scoring function for the binding free energy of a ligand/RNA complex as a weighted sum of defined interactions [c. f. ref. 13]. The interactions which are included in the equation, the functional forms used to describe each interaction, and the free energy weightings assigned to each interaction were determined empirically by fitting the parameters of the equation to known affinities for ligand/RNA complexes of known structure. Due to the relative lack of high-resolution RNA-ligand structures, however, a full statistical approach using a large training set was not possible. Rather, the scoring function was optimised against RNA aptamer complexes with emphasis on reproducing the experimental binding modes of each ligand (using Monte Carlo docking), as well as predicting the binding affinity. The ability of the scoring function of the invention to discriminate between native and non-native ligands has also been demonstrated through the calculation of a full cross-docking matrix of binding affinities, in which each ligand is docked into each aptamer in the training set.

Typical weightings and parameters for the free energy terms in the scoring function are: Term AG (kJmol-1) X X0 AXmin AXmax H-bond-3.4 R 1.9 A 0. 25A 0.6A αdon180 30 80 aAcc 150° 30° 70° Ionic-1. 0 Rij (#RvdW) +0.6A 0.25A 0.6A Arom-1.6 Rperp 3.5A 0. 25A 0.6A ashp 0° 20° 30° Lipo-0. 12 Rij (#RvdW)+0.6Å 0.25A 0.6A Rot +1.0 0 +5.4

Notes: X, Xo, AXmin, AXmaxc refer to equ" (II).

Formal charges Before applying equation (I) to a ligand/RNA structure, formal charges may be assigned to certain receptor and ligand atoms in order to modulate the strength of hydrogen bond and ionic interactions in the scoring function. Applying formal charges to ligand and receptor atoms in this way is not used in scoring a protein/ligand interaction and has not previously been done.

According to the invention, atoms in the RNA receptor are assigned formal charges as follows: RNA Residue Atom (s) * Formal charge A, C, G, U O1P, 02P-0.5 A N7-0. 5 C 02-0. 5 G N7, 06-0.5 U 02, 04-0.5

* : standard RNA atom nomenclature is used Anionic phosphate groups and base atoms carrying a high partial charge in molecular mechanics force fields are assigned a negative formal charge e. g. the O1P, 02P and N7 atoms in each A residue in the RNA receptor are assigned a formal charge of-0.5 etc. No positive charges are assigned on the receptor. The choice of the atoms that are charged and their associated charge are part of the present invention.

Atoms in the ligand are assigned formal charges as follows: Ligand Group Atom (s) Formal charge Primary amine 3 H +0. 33 (1/3) Secondary amine 2 H +0.5 Tertiary amine 1 H +1.0 Guanidinium 5 H + C +0.167 (1/6) Imidazole 2 H + C +0.33 ('/3) Carboxylate 2 O-0. 5 Phosphate 2 1

Formal positive charge is distributed evenly between all the polar hydrogens in any given group, except in guanidinium and imidazole groups the central nitrogen-bonded carbon also receives an equal portion of the charge. Formal negative charge is distributed evenly between all the terminal oxygen atoms in a group.

When applying the formal charges, positive ionisable groups which are usually protonated at physiological pH (e. g. amines, guanidines, imidazoles etc.) should be protonated and negatively ionisable groups (e. g. carboxylates, phosphates etc.) should be deprotonated.

#GH-bond_Hydrogen bond interactions Hydrogen bond geometries are defined by four atoms: donor parent-donor hydrogen---acceptor- acceptor parent, e. g. N-Hw0=C. The interaction score for a hydrogen bond is a function of the donor-acceptor bond length, the acute angle around the donor hydrogen and the acute angle around the acceptor atom. In cases where the acceptor is bonded to more than one parent, the acceptor angle is defined to the mean coordinates of all acceptor parents. This is similar to Bohm's [13] original and FleXX's [14] scoring functions but with the addition of the acceptor angular term, which advantageously helps to exclude unfeasible geometries in the docking simulations. This term is not used in the prior art protein methods because they typically deal with'real'structures which defacto do not include unfeasible geometries.

Hydrogen bond scores are scaled by function fi [13], a generic geometric scoring function : X = any geometric variable (distance, angle, plane angle) * 0 =ideal value A X-Xol AXMn = tolerance on ideal value AXMaX = deviation at which score is reduced to zero equi (II) 1 A (A Min/XMin < AX < AXM ; AXMax Miz 10 Mar 0 A>A Hydrogen bond scores are also scaled by function f2 [60], which relies on the formal charges applied as described above: /2 (ci,cj)=-sgn(cicj)#(1+n#ci#)(1+n#cj#) wherein ci, Cj = formal charges on atoms i and j n = scale factor (0.5) equi (III) + 1 for charges of opposite sign -sgncc _ -1 for charges of like sign

This provides a convenient way to increase the affinity of hydrogen bonds involving formally charged donors and/or acceptors. Function f2 ranges from 1.0 for neutral H-bonds through to 1.875 for e. g. tertiary amine-phosphate interactions.

Hydrogen bond scores are also scaled by local receptor density, f3 [13]: a 3 (J=f-1 wherein equ" (IV) NNeighb, 0 NNeighb = no. of non-H receptor atoms within 5A radius sphere of given receptor atom NNejgb 0 = normalisation factor (average local receptor density) = 25 cor=0. 5 As described in reference 13, function f3 is used to distinguish between convex (exposed) and concave (cavity) regions of the receptor surface, and increases the weighting given to interactions in tightly binding cavities. Function f3 typically ranges from 0.5 to 1.3. Together with the free energy weighting of-3.4 kJ/mol, functions f2 and f3 combine to give a wide range of ideal H-bond affinities, from 1.7 kJ/mol for exposed neutral interactions to 8.3 kJ/mol for highly charged interactions in tight concave cavities. dGionic-Ionic interactions This is a specialised term, which primarily helps the correct placement of charged, planar groups such as guanidinium and imidazole-conventional charged interactions such as amine-phosphate contacts are already accounted for in the hydrogen bonding term. The ionic term is also used to prevent close contact between polar atoms of like charge (e. g. donor-donor, acceptor-acceptor).

Distance-dependent interaction scores are calculated between the following sets of atoms: RNA Ligand Type All formally charged All formally charged non-Attractive or non-hydrogen atoms hydrogen atoms repulsive H-bond donor hydrogens H-bond donor hydrogens Repulsive (neutral and charged) (neutral and I

The ideal contact (or repulsive) distance is equal to the sum of the vdW radii of two atoms, plus an increment. The increment is preferably close or equal to 0.6A. As with hydrogen bonds, the score is further scaled by the formal charges (function f2, which changes sign for atoms of like charge) and the local receptor density (function f3).

The invention avoids the inclusion of repulsive interactions between neutral hydrogen bond acceptors. Including these interactions causes problems with pi-pi stacking, as ligands with acceptor atoms in an aromatic ring (e. g. furan) are prevented from stacking correctly on RNA bases. Thus the invention only considers formally charged acceptors in the repulsive term.

The weighting for AGionic is typically relatively low (e. g.-1. 0 kJ/mol)-it can be considered as accounting for solvation-like interactions above and below the plane of the charged ligand group that are not accounted for by the hydrogen bonding or aromatic terms. A good example of this type of interaction is presented by arginine when bound to its aptamer.

The use of a formal charge at the centre of planar groups such as guanidinium and imidazole and their derivatives has not previously been used in pseudo energy scores. Specifically, it is not used by Bohm [13], in FleXX [14] or in Hammerhead [15].

#Garom-Aromatic interactions The aromatic term favours parallel pi-pi stacking between aromatic rings. Guanidinium-like groups on the ligand are also included (cation-pi interactions) e. g. as described in refs 61 to 68.

As shown in Figure 1, the geometry is described by the average perpendicular distance from each ring centre to the other ring plane (ARperp), and by the average slip angle between the rings (ha5lip). Each ring in a fused ring system is scored independently. Perpendicular stacking geometries are not accounted for.

Unlike Böhm [13], FleXX [14] and other pseudo-energy scores, planar groups such as guanidinium and analogues are considered as aromatic-like. These interactions are key for the correct placement and scoring of guanidinium-like groups stacking on nucleotides, such as for example the interaction of Arginine in the TAR RNA pocket. lGtipO-Lipophilic interactions The functional form of the lipophilic interaction (fi') is a crude linear version of a Lennard-Jones potential [69]. The functional form of the term is similar to that used in FleXX [14]: Rij = interatomic distance Ru = ideal value AR = Ri}-Ro ARM", = tolerance on ideal value ARMaX = deviation at which score is reduced to zero Slope =10. 0 2 methylene (CH2) and methyl (CH3) carbons with implicit hydrogens ni, nj =<BR> <BR> <BR> 1 all other atoms equ" (V)

The function has an attractive part (equivalent to fl), and also a repulsive part to prevent atomic overlap during docking. The slope of the repulsive potential is between 1 and 20 times the slope of the attractive potential, preferably between 5 and 15 times and more preferably about 10 times. In order to promote close contact between the ligand and receptor surfaces, the lipophilic score is calculated for all receptor-ligand atom pairs (polar and non-polar).

The repulsive part of the lipophilic potential is the only score which is calculated for the ligand, and prevents unreasonable atomic clashes during Monte Carlo sampling of the ligand rotatable bonds. Whilst a more complete calculation of ligand intramolecular strain energy can be used according to the invention (e. g. using the majority of the interaction terms outlined above), this does not give any significant improvement in the quality of docked conformations or predicted binding affinities. In the context of docking into a rigid receptor with a semi-flexible ligand, it thus seems better to treat all ligand conformations as equally probable, rather than attempt to rank putative binding modes by intramolecular and intermolecular free energy simultaneously.

#GrotNrot+#G0-Entropic cost of binding The entropic cost of ligand binding is estimated from the number of ligand rotatable bonds (Nrot) plus a constant (AGo = +5.4 kJmol-1) that represents the loss of translational and rotational freedom of the ligand [cf. ref. 13]. This term is similar to that of reference 13, FleXX [14] etc.

IV. Automated in silico library screening for RNA-bindinv compounds Unlike previous methods [e. g. ref. 4], the methods of the invention are ideally suited to a high-throughput in silico screening process of a large number of ligands (e. g. several thousand ligands per day) against an RNA target of interest (e. g. for screening ligands from a library to identify ligand/RNA structures having minimal energy requirements). This is also true when comparing with the method of reference 8, which uses methods derived from DOCK and molecular mechanics.

The ligands may be designed de novo and their structures saved in a computer readable format, such as SD or MDL formats. Alternatively, many vendors provide their catalogue of compounds in a computer readable format.

A 3D structure of the ligands is preferably generated from the 2D representation e. g. using a program such as CORINA [e. g. 70], CONCORDE or InsightII.

Using the described scoring function, a complete docking run on a low-molecular weight ligand takes 5-15 seconds (SGI MIPS R10K 250MHz), depending on the number of ligand interaction centres.

An example of an application of the above described invention is the use of the X-ray co- ordinates of the 50S or 30S subunits of the ribosome for the identification of novel antibiotics by docking a library of small molecules into known or novel sites on the ribosome and identifying ligands that are predicted to bind, i. e. have a predicted binding energy lower than threshold, for example less than-20 kJ.

To optimise the scoring function for high-throughput methods, the receptor's rigidity and the short-range nature of the interaction scoring functions can be exploited. It is thus useful to pre-calculate neighbour lists for the receptor-an indexed grid centred over the active site stores lists of all receptor atoms within a given radius of each grid point. The radius is chosen to just exceed the maximum range of any of the scoring functions (e.g. vdW radius + 4.0Å). A lookup table of this type is extremely useful for computing lipophilic scores.

Once a ligand has been identified which interacts with a receptor, this may be provided (synthesised, purified or purchased, for instance) and the interaction can be verified experimentally, using any means known in the art for evaluating ligand/receptor interactions (e. g. NMR, filter-binding assays, gel-retardation assays, displacement assays, surface plasmon

resonance, reverse two-hybrid, FRAC [71] etc.). The invention provides a ligand identified using the methods of the invention.

The invention further provides a method for screening a library of ligand structures to identify ligands which interact with an RNA receptor of interest. The method comprises docking each of a plurality of structures representing ligands from the library against the RNA receptor structure and calculating the free energy of a ligand: RNA complex's structure e. g. using a scoring function for calculating pseudo-energy values of the ligand: RNA complex's structure to identify a structure having minimal energy requirements.

Ligands provided in the library can comprise proteins, polypeptides, carbohydrates, lipids, nucleic acids (e. g. DNA, RNA, and modified forms thereof), as well as drugs (e. g. chemical agents, antibiotics, etc.).

The method provides a way to identify candidate lead drugs which are likely to bind strongly in vivo and which are thus more likely to achieve a therapeutic effect. In one embodiment, where the receptor is an RNA molecule, the library is used to identify a molecule which can inhibit the translation of the RNA molecule.

V. Definitions In order to more clearly and concisely describe and point out the subject matter of the invention, example definitions are provided for specific terms that are used in the description and claims.

As used herein, the term"docking"refers to an in silico method characterising characterising receptor ligand ligand interactions. Docking comprises orienting a ligand structure with a putative binding site structure in a receptor and evaluating the biochemical stability of the interaction (e. g. the amount of energy required to form a ligand: receptor complex). Evaluation is performed using an algorithm comprising both searching and scoring functions which searches for possible receptor conformations and scores the conformation according to its likelihood of forming based on energy considerations (i. e. lower free energy values for a conformation would have a higher likelihood of forming in vivo) and/or the degree of shape complementarity between the receptor and the ligand.

As used herein, a"receptor"is any biological macromolecule to which another molecule can bind (naturally or after engineering the molecule). Receptors according to the invention include, but are not limited to, proteins, polypeptides, nucleic acids (DNA, RNA, and modified forms

thereof), nucleic acid-protein complexes, carbohydrates and lipids. Preferred receptors are RNA and RNA-protein complexes. A receptor has at least one binding site for a ligand.

As used herein, a"cavity"refers to a portion of a receptor which has the potential to receive and interact with a ligand. A"binding cavity"is a cavity which has a high likelihood of receiving and interacting with a native ligand, based on energy minima evaluations. The shape of a binding cavity generally corresponds to the negative shape of a ligand and comprises a binding site for the ligand.

As used herein, a"binding site"or"active site"refers to an atom or group of atoms within a receptor which form chemical contacts with a ligand (e. g. ionic bonds, van der Waals interactions etc.) which stabilise a ligand/receptor interaction (e. g. making the interaction more energetically favoured).

As used herein, a"ligand"is any molecule which can be bound by a receptor, and includes, but is not limited to, proteins, polypeptides, nucleic acids (DNA, RNA, and modified forms thereof), peptides, peptoids, metal-based ligands, modified nucleotides or nucleosides, polyamines, carbohydrates, lipids and small organic molecules (e. g. organic molecules with a molecular weight of between 50 & 2500 Daltons e. g. between 300 and 800 Daltons). A ligand has at least one site which is complementary to a binding site of a receptor (e. g. forming energetically favoured interactions with the binding site).

As used herein,"van der Waals forces"refers to interatomic or intermolecular forces of attraction due to the interaction between fluctuating dipole moments associated with molecules not possessing permanent dipole moments. These dipoles result from momentary dissymmetry in the positive and negative charges of the atom or molecule and on neighbouring atoms or molecules and tend to align in anti-parallel direction and thus result in a net attractive force.

As used herein, a"van der Waals surface"is a surface which provides the most favourable intramolecular van der Waals forces. The van der Waals surface can be represented in various ways including as a volume map, a dot surface, or a Connolly surface.

As used herein, a"van der Waals volume"refers to a volume of a molecule which is impenetrable by other molecules with thermal energies at ordinary temperatures. Calculations of van der Waals volumes are described in, for example, reference 72.

As used herein, the term"sphere"refers to a graphical representation of a portion (e. g. atom or groups of atoms) of a molecule. A sphere comprises a pre-selected radius e. g. the van der Waals

radius of an atom (or groups of atoms, such as the atoms in a small ligand, ligand fragment, or solvent).

As used herein,"an active site sphere"or"binding site sphere"refers to a sphere whose centre point is at the active site or binding site of a binding cavity.

As used herein, a"maximum distance from an active site sphere" (or grammatical equivalents of this term) refers to the maximum distance of a ligand may have from an active site and still have favourable energy interactions with the binding site.

As used herein, the term"molecular surface"refers to a surface defined by the inward surface of a hypothetical sphere (a"probe sphere") as it rolls over a protein molecule (see, for example, references 73 & 74). A molecular surface may be subdivided into a"contact surface", and a "reentrant surface."The"contact surfaces the part of the van der Waals surface of the atoms that the probe sphere can touch, while the"reentrant surface"is a space defined by the inward- facing part of a probe sphere when it is simultaneously touching three atoms and the volume swept out by the probe sphere as it rolls around a pair of atoms (see, for example, reference 75).

As defined herein, the"binding mode"refers to molecular interactions between a ligand and receptor.

As used herein,"aromatic/cation-pi interactions"refer to interactions between an aromatic ring and a positively charged molecule. An aromatic/cation-pi interaction can be either attractive or repulsive, as determined by the electrostatic potential surface of the aromatic ring itself (e. g. see reference 76).

As used herein,"configuration space"is a space which specifies the values of all potentially accessible physical degrees of freedom of a molecule.

As defined herein,"simulated annealing"refers to a method of defining minimal energy zones in a molecule using a random walk, in which a small, random perturbation from a given, current position is selected as a move, and the energy change associated with making this move is calculated. If the energy change is less than or equal to zero, then the move is accepted. A random move having an energy change which is greater than zero however, is accepted with an acceptance probability of : p = e~AE/T, where T is the"annealing temperature". The larger the value of hE, the smaller the acceptance probability (e. g. as described in reference 77).

As used herein, the term"docking run"refers to an iii silico simulation in which the orientations of sphere (s) representing a ligand within a putative binding cavity within a receptor are modified and the energy state of a ligand: receptor complex is determined.

As defined herein, an"aptamer"is an nucleic acid which binds to a target molecule with a much higher degree of affinity than it binds to non-target materials in a sample. The Kd of an aptamer for its target molecule is at least about 50-fold greater than for non-target molecules in a sample.

As used herein, the term"aptamer binding"refers to non-"Watson-Crick"binding interactions.

BRIEF DESCRIPTION OF DRAWINGS Figure 1 illustrates, for one embodiment of the invention, (A) how the average perpendicular distance from each ring centre to the other ring plane and (B) how the average slip angle between the rings are calculated for aromatic interactions.

Figure 2 shows a plot of calculated binding energy vs. RMS deviation of the recognition fragment from the experimental structure, for 100 docking runs of arginine into arginine aptamer (lkoc. pdb). The point to the far left of the graph represents the native structure prior to energy minimisation hence its poor score.

Figure 3 shows an overlay of the highest scoring docked structures with the experimental structures, for tobramycin aptamer (3A; 2tob. pdb) and AMP aptamer (3B; lamO. pdb).

MODES FOR CARRYING OUT THE INVENTION Docking experiments In order to demonstrate that the invention can correctly predict the binding mode of a set of ligands interacting with their receptor, docking experiments were carried out for models of five published RNA/ligand complexes (Arginine, Citruline, AMP, tobramycin and FMN). 3D models for RNA receptor and ligand were taken directly from PDB files lkoc, lamO, lfmn, 2tob and lkod, with no further energy minimisation.

Cross-docking experiments were also performed for the same complexes in which binding affinities for all pairwise ligand and RNA combinations were predicted. The hypothesis was that each aptamer will have a distinct preference for its own native ligand over non-native ligands, and that each ligand will prefer to bind to its own aptamer. The cross-docking experiments test the discrimination power of the invention.

100 independent docking runs were performed for each native aptamer-ligand complex, using a low-temperature cooling schedule as shown below. Fifty docking runs per ligand were used in the cross-docking experiment. Phase Initial Final No. of constant Cooling No. of MC Initial max step temp temp temperature factor trials/block sizes (trans, rot, (K) (K) blocks torsion) 1 300 50 25 0.928 100 0. lA, 10°, 10° 2 50 10 25 0. 935 100 0. 1A, 10°, 10°

Binding site mapping Binding site cavities for the RNA receptors were mapped using large and small sphere radii of 4.0A and 1.75A, and a grid resolution of 0. 5A. Cavities with a volume less than 20 grid points, or whose centre of mass was greater than SA from the binding site centre, were removed. In all cases, only one cavity remained. A distance restraint function was applied to keep all ligand atoms within 6A of any cavity grid point during docking. The results of the binding site mapping are shown below: Designated binding site Cavity centre of mass centre Cavity volume distance from REceptor designated bidin (mean coordinates of all (no. of grid points) atoms listed) site center (A) Arginine G8 : 06; G9: 02; G15: 06, N7; 164 4.5 G16: 06, N7; G20: 06, N7 AMP A5 : N1, N3, C4, C5, C6, C7; 363 3.6 G6: N1, C2, N3, C4, C5, C6 FMN G9: N3; G10 : C2; U12: C4; 150 2.7 G27: C6 Tobramycin U7: 04; G12 : C2 205 2. 1 Citruline U9: 02; U16 : 04; A18 : 04' 26 1. 7

Native dockings-free energies Free energies of docked RNA/ligand complexes were calculated using equations I, II, III, IV and V as described above. The lowest energy obtained in 100 dockings was taken as the result.

The structures of the docked result and the PDB file were compared to give (a) RMS deviation for the tightly bound portion of the ligand (recognition fragment) only and (b) RMS deviation for all ligand non-hydrogen atoms. The recognition fragment is typically much better defined in the NMR structure than the rest of the ligand.

The same equations were used to predict binding energies for the complex as described in the PDB files, without any docking.

Both results were compared with experimental AG values. E tl Calcd Calcd RMSD PDG Docked Recognition recognition PDB Ligand #Gbind #Gbind #Gbind Fragment (all) (kJmol) (kJmol-1) (kJmol-1) (Å) lkoc Arginine-28. 5-14.0-31.4 Guanidine 0.59 (3.73) lamO AMP-28.5-24.0-32.2 Adenine 0.36 (2.13) lfmn Fmn-30. 1-26.3-37.4 Fused rings 0.33 (0. 75) 2tob Tobramycin-45. 2-44.8-38.9 Rings 1,2 0.20 (1. 23) lkod Citruline-28.5-6.3-21.0 Urea 2.58 (4.20)

table A: docking results tor RNA aptamers and ligands The calculated energies in Table A are reported without the contribution from the repulsive lipophilic potential. The rationale for this [60] is that any residual atomic clashes in the final conformation are likely to be an artefact of the rigid receptor/semi-flexible ligand model, and could be easily resolved in reality by a slight relaxation of the receptor or ligand. Table B shows the breakdown of component interaction scores for each aptamer. There are significant repulsive lipophilic interactions in the PDB structures of tobramycin aptamer (2tob) in particular, and to a lesser extent in the citruline aptamer (lkod). These could be due to a combination of the particular set of vdW radii used and inaccuracies in the experimental structure. PDB Structure Aromatic H-bond Ionic Lipo Attr. Lipo Rep. lkoc PDB +0.0-6.8-5.8-12.7 +4.8 Docked +0.0-24.5-5.9-12.3 +1.8 lamO PDB-5.4-13.5 +2.5-20.0 +4.9 Docked-4.6-18.1 +0.3-22.2 +0.5 1 fmn PDB-3.0-12.8 +0.0-26.9 +0.0 Docked-2.4-23.7 +0.6-28.3 +1.5 2tob PDB +0.0-36.9 +0.0-29.3 +17.8 Docked +0.0-34.6 +1.0-26.8 +10.1 lkod PDB +0.0-7.1 +1.2-11.7 +9.8 Docked +0.0-24.3 +1.3-9.4 +1.3 Table B: Breakdown of component interaction scores for aptamer native dockings

Binding modes and free energies are well predicted for four out of the five RNA aptamers (arginine, amp, fmn, tobramycin). In the lowest energy docked conformation, the RMS deviation of the recognition fragment to the experimental binding mode is under 0.6A in all four cases.

The all-atom ligand RMS deviation can be much higher, reflecting the mobility or low resolution of peripheral ligand functionality in the bound conformation. Plots of binding energy against RMS deviation of the recognition fragment for all 100 docking runs show that there are no grossly mis-docked structures with comparable energies to the experimental structure (see figure 2 for an example). The rank order of binding energies is reproduced from arginine (, uMol) through to tobramycin (nM) aptamer.

In the case of citruline aptamer, the lowest energy ligand conformations is misdocked and the binding free energies is underestimated.

Ara aptamer The arginine binding cavity displayed by the arg aptamer brings together an array of negatively ionisable N7 atoms from G15, G16 and A18 as well as oxygens from G15, G16 and G20. This electronegative pocket accommodates the guanidinium moiety of the ligand. The interaction between the guanidine and the receptor are mediated through a hydrogen bond network, as well specific ionic"solvation"interaction above and under the guanidinium plane with 06 of G16 and G20. These interactions are well accounted for in the score and the docking reproduces well this binding geometry. The position of aliphatic portion of the ligand is less well defined both in the experimental structure and the docking.

Amp aptamer The binding of Amp is dominated by the stacking of the ligand adenine between 2 purines (A5 and G6). The 6 membered ring of the Adenine is perfectly stacked against the 6 membered ring of the A5 and 5 membered ring of G6. The interaction is stabilised by hydrogen bonds between Adenine N13 and the amino groups of A7 and G12; a hetero purine base pairing of the ligand with G3 further stabilises the complex (H-bond between G: H21/Adenine : N16 and Adenine: H19/G3 : N3.) There are also several hydrogen bonds between the ligand ribose and RNA backbone and an ionic interaction between the AMP phosphate and the H1 of G6.

The docking is able to reproduce the bound conformation of the ligand adenine moiety (see figure 3). The placement of the phosphate tail is less precise, consistent with the experimental data.

FMN aptamer FMN intercalates G10 and G9. The lower plateau of the binding site is formed by G10, U12 and A25 (base triple). The stacking interactions take place between the benzene ring of FMN and 6 membered rings of G10 and G9, and to a lesser extent between the last cycle of FMN and U12.

The FMN edge base pairs with A26, displaying 2 hydrogen bonds. The tail also makes hydrogen bonding contacts with bases and the RNA backbone.

The docking predicts correctly the position and orientation of the FMN. Stacking interactions are predicted accurately and so is the hydrogen bond between FMN: 043 and A26: H62.

Citruline aptamer Citruline is an uncharged molecule at physiological pH. Accordingly, in the citruline aptamer, the binding cavity is less negatively charged than what is observed for the arg aptamer. At position 31, G is replaced by U, losing a negatively charged N7. Further, by going from a C at position 13 to a U, N3 is switched from an H-bond acceptor to an H-bond donor, interacting with citruline's urea moiety.

This is the only test-case where the docking method of the invention was not able to predict the binding mode. The authors of citruline aptamer structure note that given the proximity of an array of 4 oxygens from the 4 Gs in the binding cavity, it is likely that a Na ion may bind this cavity and stabilise the complex. The omission of this ion in the structure and in the docking calculations may thus explain the scoring function's failure.

Tobramycin aptamer The tobramycin in complex with its aptamer show many complementary features involving vdW, hydrogen bonding and ionic contacts between the ligand and its receptor, consistent with its tight affinity.

Tobramycin ring I is sandwiched between the major groove and G12 acting as a flap. Amino N8 is interacting with U16-04, U4-04 and G12-06.09H hydrogen bonds H61 of A15 amine and OH14 hydrogen bonds G12-05'. Ring II has ionic interaction between its amine N24 and G5-

O1P and G6-N7, as well as between amine N25 and U7-01P. Finally OH52 hydrogen bonds G11 02P backbone and amine N42 interacts with G12-O1P, U9-04 and U8-04.

The docking reproduces the position of ring I and II, scoring for all the above mentioned interactions (see figure 3). Ring III is slightly less well defined, but the ionic interaction of NH3 is preserved.

Cross-dockinss The ability of the scoring function to match ligand and receptor correctly was assessed by cross- docking as described above.

For the four aptamers which were successfully docked (see above), the correct pairings were largely identified. The pairings for the other aptamer (citrulline) was also acceptable.

Table C shows the results, with the diagonal corresponding to docked AG values from table A. AGbind (expt) '"arg Amp fmn Tob Cit kjmol lkoc(arg)-28.5-31.4-17.9-21.0-23.0-20.5 1 amO(amp)-28.5-26.4-32.2-21.2-21.5-21.6 lfmn(fmn)-30.1-23. 1-21.4-37.4-24.0-21.6 2tob(tob)-45.2-32.7-20.1-28.7-38.9-24.4 lkod(cit)-28.5-24.7-16.6-14.1-22.2-21.0 Table C : cross-docking results The scoring function thus does not allow charged ligands such as arginine and tobramycin to indiscriminately bind all RNA receptors to the same extent. Instead, specific polar contacts and shape complementarity are required.

Conclusions The scoring function of the invention can thus predict the correct free energy of binding in most cases. Furthermore, it can discriminate between native and non-native ligands. The accuracy of the docking method of the invention is also confirmed, and its speed means that it is suitable for screening large virtual libraries of compounds.

It will be understood that the invention has been described by way of example only. Variations, modifications, and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and scope of the invention as claimed.

REFERENCES (the contents of which are incorporated herein in full) [1] Lengauer & Rarey (1996) Curr. Opin. Struct. Biol. 5: 402-406.

[2] Strynadka et al. (1996) Nature Struct. Biol. 3: 233-239.

[3] Kuntz et al. (1982) J. Mol. Biol. 161: 269-288-see also Ewing & Kuntz (1997) J. Comput.

Client. 18 (9): 1175-1189.

[4] Chen et al. (1997) Biochemistry 36: 11402-11407.

[5] Richards (1977) Annu., Rev. Biohyphys, Bioengineer. 6 : 151-176.

[6] Connolly (1983) Appl. Crystallogr. 16: 548-558.

[7] Connolly (1983) Science 221: 709-713.

[8] Filikov et al. (2000) J. C07nput. Aided Mol. Des. 14 (6): 593-610.

[9] Hermann & Westhof (1999) J. Med. Chem. 42: 1250-1261.

[10] Srinivasan et al. (1996) Folding & Design 1 : 463-472.

[11] Leclerc & Cedergen (1998) J. Med. Chem. 41: 175-182.

[12] Leclerc & Karplus (1999) Theo. Clzent. Accounts 101 : 131-137.

[13] Bohm (1998) J. Comput. Aided Mol. Des. 12 : 309.

[14] Available from Tripos Inc. (http ://www. tripos. com).

[15] Welch et al. (1996) Chem. Biol. 3 (6): 449-462.

[16] Afshar et al. (1994) J. Mol. Biol. 244: 554-571.

[17] www. msi. com.

[18] www. rcsb. org/pdb [19] Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 lEZ, UK.

[20] Connolly (1983) Appl. Crystallogr. 16: 548-558.

[21] Connolly (1983) Science 221: 709-713.

[22] Connolly (1993) J. Mol. Graphics 11: 139-141.

[23] Agishtein (1992) J. Biomolec. Struct. & Dynamics 9: 759-768.

[24] Colloc'h et al. (1990) J. Mol. Graphics 8: 133-140.

[25] Colloc'h et al. (1988) J. Mol. Graphics 5: 170.

[26] Connolly (1985) J. Mol. Graphics 3: 19-24.

[27] Connolly J. Am. Chem. Soc. 107 : 1118-1124.

[28] Connolly et al. (1985) Comput. Chez. 9: 1-6.

[29] Connolly (1985) J. Appl. Crystallogr. 18 : 499-505.

[30] Connolly (1986) J. Mol. Graphics 4: 3-6.

[31] Connolly (1986) J. Mol. Graphics 4: 93-96.

[32] Connolly (1987) Visual Computer 3: 72-81.

[33] Connolly (1992) Biopolymers 32: 1215-1236.

[34] Duncan et al. (1993) Biopolymers 33: 219-229.

[35] Duncan et al. (1993) Biopolymers 33 : 231-238.

[36] Eisenhaber et al. (1993) J. Comp. Chem. 14 (11): 1272-1280.

[37] Eisenhaber et al. (1995) J. Comp. Chem. 16 (3): 273-284.

[38] Gibson et al. (1988) Molec. Phys. 64: 641-644.

[39] Guerrero-Ruiz et al. (1991) Computers Chem. 15 (4): 351-352.

[40] Heiden et al. (1993) J. Comp. Chem. 14 (2): 246-250.

[41] Heiden et al. J. Comp. Aided Mol. Design 4: 255-269.

[42] Kundrot et al. (1991) J. Comp. Chez. 12 (3): 402-409.

[43] Lin et al. (1994) Proteins : Struct. Funct. Genet. 18: 94-101.

[44] Pascual-Ahuir et al. (1990) J. Coup. Chem. 11 (9): 1047-1060.

[45] Perrot et al. (1992) J. Comp. Chem. 13 (1): 1-11.

[46] Perrot et al. (1990) J. Mol. Grpahics 8: 141-144.

[47] Petitjean (1994) J. Comp. Chem. 15 (5): 507-523.

[48] Petitjean (1995) J. Comp. Chem. 16 (1) : 80-90.

[49] Sezerman et al. (1993) Protein Science 2: 1827-1843.

[50] Luty et al. (1995) J. Comput. Chem. 16: 454-464.

[51] Miranker et al. (1991) Proteins 1.1: 29-34.

[52] Caflishetal. (1996) J. Comput. Chem. 18 : 723-743.

[53] Goodsell & Olson (1990) Proteins: Struct. Funct. Grenet. 8: 195-202.

[54] Bohacek & McMartin (1995) SIAM J. Math. Anal. 116: 147-179.

[55] Eisen et al. (1994) Proteins: Struct. Funct. Genet. 19 : 199-221.

[56] Metropolis et al. (1953) J. Chem. Phys. 21: 1087.

[57] Morris et al. (1996) J. Comput. Aided Mol. Des. 10: 293-304.

[58] Liu et al. (1999) J. Comput. Aided Mol. Des. 13: 435-451.

[59] Curmnings et al. (1995) Protein Science 4 : 885-899.

[60] Jain (1996) J. Comput. Aided MoL Des. 10: 427-440.

[61] Dougherty (1996) Science 271: 163-168.

[62] Ma et al. (1997) Chem. Rev. 97: 1303-1324.

[63] Scrutton & Raine (1996) Biochem. J. 319: 18.

[64] Sussman et al., Science 253: 872-879.

[65] Hendlich (1998) Acta Crystallogr. D 54: 1178-1182.

[66] Wouters (1998) Protein Sci. 7: 2472-2475.

[67] Jorgensen et al. (1988) J. Am. Chem. Soc. 110: 1657-1666.

[68] Jorgensen et al. (1996) J. Am. Chem. Soc. 118: 11225-11236.

[69] Rarey et als (1996) J. Mol. Biol. 261: 470-489.

[70] Gasteiger et al. (1990) Tetrahedron Comp. Method. 3 : 537-547.

[71] W099/64625.

[72] Bondi (1964) J. of Physical Chemistry 68: 441-451.

[73] Richards (1977) Annu. Rev. Biophys. Bioeng. 6: 151-176.

[74] Greer & Bush (1978) Proc. Natl. Acad. Sci. USA 75: 303-307.

[75] Molecular Surface Package Implementation Manual, Version 3.9, February 10,2000,1259 El Camino Real, #184, Menlo Park, CA 94025 USA.

[76] Dougherty (1996) Science 271: 163-168.

[77] US patent 5,241,470.

Previous Patent: UROKINASE PEPTIDE STRUCTURE MIMETICS

Next Patent: MEASURING RELATIVE PROPORTIONS OF DISSIMILAR FLUIDS IN A PIPE