Title:
GENERATING CANDIDATE MOLECULE STRUCTURE
Document Type and Number:
WIPO Patent Application WO/2024/013028
Kind Code:
A1
Abstract:
A computer-implemented method and a computational device or apparatus for generating a candidate molecule structure are disclosed. A non-transitory computer-readable medium storing executable instructions and a computer program product for generating a candidate molecule structure are also disclosed.

Inventors:
VÁZQUEZ LOZANO JAVIER (ES)
GIBERT CODINA ENRIQUE (ES)
HERRERO ABELLANAS ENRIC (ES)
Application Number:
PCT/EP2023/068915
Publication Date:
January 18, 2024
Filing Date:
July 07, 2023
Assignee:
PHARMACELERA S L (ES)
International Classes:
G16C20/62
Domestic Patent References:
WO2018121866A1    2018-07-05
Foreign References:
US20030236631A1    2003-12-25
US7330793B2    2008-02-12
EP22382666A    2022-07-13
Other References:
KAWAI KENTARO ET AL: "De Novo Design of Drug-Like Molecules by a Fragment-Based Molecular Evolutionary Approach", vol. 54, no. 1, 28 December 2013 (2013-12-28), US, pages 49 - 56, XP055805856, ISSN: 1549-9596, Retrieved from the Internet DOI: 10.1021/ci400418c
Attorney, Agent or Firm:
ZBM PATENTS - ZEA, BARLOCCI & MARKVARDSEN (ES)
Claims:
CLAIMS:

1. A computer-implemented method for generating a candidate molecule structure, the method comprising: fragmenting a reference chemical structure; obtaining a reference fragment and a remaining fragment from the fragmented reference chemical structure; selecting a plurality of reference interaction field points of the reference fragment, each of the reference interaction field points being one point in space; determining a reference interaction field value for the reference interaction field point of the reference fragment; providing a plurality of candidate fragments; applying the reference interaction field points of the reference fragment to each of the candidate fragments to select a plurality of candidate interaction field points of each candidate fragment so that each of the candidate interaction field points is one point in space corresponding to the position in space of the reference interaction field points for each of the candidate interaction field points; determining candidate interaction field values for the candidate interaction field points of the candidate fragments; comparing the reference interaction field values with the candidate interaction field values; determining, based on the comparison, a replacing candidate fragment from the candidate fragments; and generating a candidate molecule structure by replacing the reference fragment with the replacing candidate fragment, to synthesize a candidate molecule structure to bind a specific receptor.

2. The computer-implemented method of claim 1, the method comprising filtering a plurality of chemical structures from a database to obtain the candidate fragments.

3. The computer-implemented method of claim 1 or 2, wherein generating the candidate molecule structure comprises combining the replacing candidate fragment with the remaining fragment.

4. The computer-implemented method of any of claims 1-3, wherein comparing the reference interaction field values with the candidate interaction field values comprises generating a similarity index indicating a level of similarity between the candidate interaction field values and the reference interaction field values.

5. The computer-implemented method of claim 4, wherein determining the replacing candidate fragment comprises determining whether the similarity index of the candidate fragment exceeds a minimum similarity threshold.

6. The computer-implemented method of any of the claims 4 or 5, wherein determining the replacing candidate fragment comprises ranking the similarity indexes of the candidate fragments and selecting the replacing candidate fragment corresponding to the N highest ranked similarity indexes of the candidate fragments.

7. The computer-implemented method of claim 6, wherein ranking the candidate fragments further comprises: determining a distance between an anchoring point of the candidate fragments and an anchoring point of the reference fragment; ranking the similarity indexes of the candidate fragments to select the M highest ranked similarity indexes of the candidate fragments based on the distance.

8. The computer-implemented method of any of the preceding claims, wherein obtaining the reference fragment comprises applying retrosynthetic rules.

9. The computer-implemented method of any of the preceding claims, wherein obtaining the candidate fragments comprises: obtaining a spatial orientation of the reference fragment; aligning one or more of the candidate fragments based on the spatial orientation of the reference fragment.

10. The computer-implemented method of any of the preceding claims, wherein filtering the plurality of chemical structures from a database to obtain the candidate fragments is based on a size of the reference fragment and/or based on a number of linkers of the reference fragment.

11. The computer-implemented method of any of the preceding claims, wherein selecting the reference interaction field point of the reference fragment is based on atom properties, atom coordinates, and linker type of the reference chemical structure.

12. The computer-implemented method of any of the preceding claims, wherein the reference chemical structure is a reference molecule.

13. The computer-implemented method of claim 12, wherein obtaining a reference fragment is based on one or more conformations of the reference molecule.

14. A computational device or apparatus for generating a candidate molecule structure configured to perform the method according to any of claims 1 to 13.

15. A non-transitory computer-readable medium storing executable instructions that, when executed by a processor, cause the processor to operate a method according to any of claims 1 to 13.

16. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of claims 1 to 13.

Description:
GENERATING CANDIDATE MOLECULE STRUCTURE

This application claims the benefit of European Patent Application EP22382666.0 filed on July 13, 2022.

The present disclosure relates to computational chemistry and more specifically to generating a candidate molecule structure.

BACKGROUND

Computational chemistry is a branch of chemistry that combines the effectiveness and advantages of computer simulations with the laws of chemistry to tackle difficult chemical problems. One of the main areas of computational chemistry concerns searching for molecules in order to find alternative structures capable of binding to a given receptor, or to decide which modifications of a molecule are the most appropriate to improve its affinity, solubility, etc.

In this context, a well-known problem is searching the structures of chemical compounds to identify compounds which may share a biological activity with a known compound. These compounds may be searched in databases containing commercially available compounds. These compounds may or may not share any common synthetic lineage with the known or reference compound. These searches aim at finding molecules or compounds in the database that are very similar to the reference compounds.

To this end, molecular alignment and similarity measures may play a role in capturing the degree of similarity between these molecules. It is not trivial to know the right alignment or similarity metric, as different properties of molecules will be considered more important depending on the problem. The use of methods for generating a degree of similarity between two molecules is also known. Therefore, there are multiple similarity metrics to help the comparison between molecules, some of which use steric or electrostatic fields in statistical methods for molecular activity prediction.

However, the new molecules are limited to the molecules or compounds contained in the database. In some cases, the similarity between the molecules of the database and the reference molecule is not sufficient to find a suitable alternative molecule. In order to explore a large chemical space comprising a larger number of compounds, bigger databases have been created. The size of these databases implies massive screening times and reduced accessibility. Other methods use machine learning techniques to enumerate molecules in a huge chemical space at a reduced computational cost. However, these methods end up retrieving molecules that are very complex to synthesize.

In addition, this process generally involves long computation times and substantial computational resources.

The present disclosure provides examples of systems and methods that at least partially resolve some of the aforementioned disadvantages.

SUMMARY

In a first aspect, a computer-implemented method for generating candidate molecule structure is disclosed.

The computer-implemented method comprises fragmenting a reference chemical structure and obtaining a reference fragment and a remaining fragment from the fragmented reference chemical structure. It may be noted that, for example, fragmenting the reference chemical structure may result in a plurality of fragments of the reference chemical structure. Depending on the complexity of the reference chemical structure, the number of reference chemical structure fragments may vary.

The reference chemical structure may be considered as a representation of a molecule, a portion of a molecule, a macromolecule, or a chemical structure intervening in a more complex chain of chemical components. The reference chemical structure may be employed as a starting point to generate a candidate molecule or candidate molecule structure which replaces at least a part or a fragment thereof. The reference chemical structure may be represented by reference chemical structure information. The reference chemical structure information, such as data, may be the crystallographic structure of the reference chemical structure. In some examples, the method may comprise receiving reference chemical structure information, for example a crystallographic structure, representing the reference chemical structure. The reference chemical structure may be fragmented into reference chemical structure fragments. In other examples, the reference chemical structure may comprise information about the reference chemical structure and about a plurality of fragments into which it is possible to divide the reference chemical structure. The reference fragment may be understood as a fragment or a portion of the reference chemical structure to be replaced or investigated to be replaced.

The computer-implemented method comprises selecting a plurality of reference interaction field points of the reference fragment, each of the reference interaction field points being one point in space.

It may be noted that each of the reference interaction field points is one different point in space.

The computer-implemented method comprises determining a reference interaction field value for the reference interaction field point of the reference fragment. An interaction field point represents a location in space where the reference fragment may be compared to another fragment. The reference chemical structure or any fragment of the reference chemical structure may be located in a space contained in a grid of interaction field points. In some examples the interaction field points may be defined by pharmacophoric points. An interaction field value may be used to describe the molecular properties of the reference chemical structure or the reference fragment, e.g., the influence of an electrostatic field at a given interaction field point. Therefore, the one or more interaction field points may be a discrete representation of the projections of, e.g., electrostatic or steric fields which are continuous in space. With this discrete representation, mathematical operations, e.g., comparing, may be more easily performed.

In addition, the computer-implemented method comprises providing a plurality of candidate fragments. Each candidate fragment may be understood as the fragment to be compared to the reference fragment as a potential replacement thereof.

In some examples, the computer-implemented method may further comprise filtering a plurality of chemical structures from a database to obtain the candidate fragments. The plurality of chemical structures comprises the reference chemical structure and/or chemical structure fragments from a database.

In some examples, the candidate fragments may be obtained from the reference chemical structure. In other examples, the candidate fragments may be obtained from a chemical structure database. In some examples, the candidate fragments may be obtained from the reference chemical structure and from a chemical structure database. A chemical structure database may be understood as comprising or storing a set or a plurality of chemical structures.

The computer-implemented method further comprises applying the reference interaction field points of the reference fragment to each of the candidate fragments to select a plurality of candidate interaction field points of each candidate fragment. Each candidate interaction field point is selected using the same position in space as each reference interaction field point. Therefore, each of the candidate interaction field points, which is one point in space, corresponds to the position in space of the reference interaction field points for each of the candidate interaction field points.

In some examples, one or more candidate interaction field points coincide with the one or more reference interaction field points.

The computer-implemented method further comprises determining candidate interaction field values for the candidate interaction field points of the candidate fragments.
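
The reuse of the reference interaction field points for every candidate fragment may be illustrated with a minimal Python sketch. It is not the disclosed implementation. The exponential distance kernel, the descriptor tuples, the fragment names, and the function name `field_values` are all assumptions made only for illustration. The point shown is that the coordinates are selected once for the reference fragment and then applied unchanged to each candidate, so that the resulting values line up position by position.

```python
import math

def field_values(points, descriptors):
    """One value per point: descriptor values damped by distance.

    The exp(-distance) kernel is an assumed placeholder, not the
    field definition of the disclosure.
    """
    return [
        sum(value * math.exp(-math.dist(point, position))
            for position, value in descriptors)
        for point in points
    ]

# Reference interaction field points: selected once, for the reference fragment.
reference_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Hypothetical descriptors: (position, property value) pairs for each fragment.
reference_fragment = [((0.0, 0.0, 0.0), 1.0)]
candidate_fragments = {
    "cand_a": [((0.0, 0.0, 0.0), 0.8)],
    "cand_b": [((1.0, 0.0, 0.0), -0.5)],
}

reference_values = field_values(reference_points, reference_fragment)

# The same coordinates are applied to every candidate fragment, so the
# candidate values align index by index with the reference values.
candidate_values = {
    name: field_values(reference_points, descriptors)
    for name, descriptors in candidate_fragments.items()
}
```

Because both lists are indexed by the same points, a later comparison step can operate on them element by element without any further alignment of the value vectors.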

In addition, the method comprises comparing the reference interaction field values with the candidate interaction field values. In some examples, the interaction field value may correspond to e.g., a hydrophobicity projection at the interaction field points, respectively for the reference fragment and the candidate fragments.

Furthermore, the method comprises determining, based on the comparison, a replacing candidate fragment from the candidate fragments; and generating a candidate molecule structure by replacing the reference fragment with the replacing candidate fragment, to synthesize a candidate molecule structure to bind a specific receptor.

In some examples, the specific receptor may be a nucleic acid like DNA or RNA.

In some examples, generating the candidate molecule structure may comprise combining the replacing candidate fragment with the remaining fragment. Since the number of fragments from the fragmented reference chemical structure may vary, some examples of the methods disclosed herein may comprise generating the candidate molecule structure through the replacement of the reference fragment by the replacing candidate fragment without the need of combining the replacing candidate fragment with any remaining fragment. In some examples, the replacing candidate fragment may be understood as the most appropriate candidate fragment to replace the reference fragment according to the comparison between the reference fragment and the candidate fragments.

According to this aspect, the candidate molecule structure may replace existing molecule structures. Therefore, the search for alternative molecules to replace existing molecules is improved. In addition, as the method generates the candidate molecule structure from the replacing candidate fragment, a potentially higher number of different molecule structures is provided than by the mere comparison between existing molecules. Accordingly, a higher number of candidate molecule structures may improve the probability of finding alternative structures that can be synthesized and are capable of binding to a given receptor. As a result, increasing the number of candidate molecule structures may enhance the matching to the reference chemical structure, or at least increase the probability of finding a suitable replacement therefor.

Fragmenting the reference chemical structure into the fragmented reference chemical structure may reduce the run time required to compare molecules. Accordingly, computational cost and time can be reduced.

Computational cost may be reduced because the disclosed computer-implemented method is based on the comparison of reference fragments (e.g., building blocks) of a reference chemical structure with fragments from a database (e.g., a building block family).

For example, a reference structure comprising a first building block BB1 and a second building block BB2 may be fragmented into two fragments (i.e., REF1 represents the first building block BB1, and REF2 represents the second building block BB2).

The building block library may comprise N different building blocks. In this example, N=3, therefore the building block library comprises 3 different building blocks (i.e., BB3, BB4, and BB5). The disclosed computer-implemented method may compare the fragment REF1 with each building block (i.e., BB3, BB4, and BB5) of the building block library. Therefore, 3 comparisons may be performed (i.e., first comparison: REF1 with BB3, second comparison: REF1 with BB4, and third comparison: REF1 with BB5). Similarly, the fragment REF2 may be compared with each building block of the building block library (i.e., BB3, BB4, and BB5). Therefore, 3 more comparisons may be performed (i.e., fourth comparison: REF2 with BB3, fifth comparison: REF2 with BB4, and sixth comparison: REF2 with BB5). In this example, there are in total 6 comparisons (2*N) that may be performed.

In contrast, if no fragments are used (e.g., in the state of the art), the reference chemical structure is e.g., a whole molecule REFMOL comprising 2 building blocks (i.e., a first building block BB1 and a second building block BB2). In this example, the reference chemical structure is compared with the molecules of the molecule library. The molecule library may comprise molecules of 2 building blocks, in which each of the two building blocks of a molecule may be any of N different building blocks.

In this example, N=3, therefore the molecule library comprises molecules of 2 building blocks formed from 3 different building blocks (i.e., BB3, BB4, and BB5). As a result, the molecule library consists of N*N molecules (i.e., 9 molecules) which are the following: molecule 1 which is represented by the building blocks (BB3-BB3), molecule 2 which is represented by the building blocks (BB3-BB4), molecule 3 which is represented by the building blocks (BB3-BB5), molecule 4 which is represented by the building blocks (BB4-BB3), molecule 5 which is represented by the building blocks (BB4-BB4), molecule 6 which is represented by the building blocks (BB4-BB5), molecule 7 which is represented by the building blocks (BB5-BB3), molecule 8 which is represented by the building blocks (BB5-BB4), and molecule 9 which is represented by the building blocks (BB5-BB5).

Therefore, in this example (where no fragments are used, and comparison is performed by using the whole molecule), 9 comparisons may be performed (i.e., first comparison: REFMOL with molecule 1, second comparison: REFMOL with molecule 2, third comparison: REFMOL with molecule 3, fourth comparison: REFMOL with molecule 4, fifth comparison: REFMOL with molecule 5, sixth comparison: REFMOL with molecule 6, seventh comparison: REFMOL with molecule 7, eighth comparison: REFMOL with molecule 8, and ninth comparison: REFMOL with molecule 9). In this example, there are in total 9 comparisons (N*N) that may be performed.

As a result, because the disclosed computer-implemented method is based on the comparison of reference fragments of a reference chemical structure with fragments, computational cost and time may be reduced (compared to methods based on the comparison of whole molecules with other molecules).
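
The counting in the example above can be reproduced with a short Python sketch. The function names are hypothetical and the "comparison" is reduced to listing the pairs that would have to be compared. The sketch only illustrates the 2*N versus N*N count of the worked example. Extending it to more than two building-block positions (k*N versus N^k) is an inference from the same counting argument rather than a statement of the disclosure.

```python
from itertools import product

def fragment_comparisons(reference_fragments, library):
    """Pairs compared when each reference fragment is screened on its own."""
    return [(ref, bb) for ref in reference_fragments for bb in library]

def whole_molecule_comparisons(reference_molecule, library, n_positions):
    """Pairs compared when every combination of building blocks is enumerated."""
    molecules = ["-".join(combo) for combo in product(library, repeat=n_positions)]
    return [(reference_molecule, molecule) for molecule in molecules]

library = ["BB3", "BB4", "BB5"]  # N = 3 building blocks

by_fragment = fragment_comparisons(["REF1", "REF2"], library)
by_molecule = whole_molecule_comparisons("REFMOL", library, n_positions=2)

print(len(by_fragment))  # 6 comparisons (2*N)
print(len(by_molecule))  # 9 comparisons (N*N)
```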

Furthermore, determining the interaction field value for the interaction field point of a fragment (e.g., the reference fragment or the candidate fragments) involves fewer computational resources and less time than determining the interaction field value for the interaction field point of the whole chemical structure. Less information may be required to represent or to describe fragments than to represent the whole chemical structure. Particularly, fewer interaction field points may be required due to the reduced size of the fragments compared to the whole chemical structure.

For example, the information, such as data, may be the crystallographic structure of e.g., the reference chemical structure, or any fragment. Accordingly, operations involving this information may be more efficiently performed.

Using the reference fragment and the candidate fragments may allow comparing the reference interaction field values of the reference interaction field point of the reference fragment with the candidate interaction field values of the candidate interaction field points of the candidate fragments in a simple manner. As a result, using fragments reduces the use of computational resources and time.

In some examples, comparing the reference interaction field values with the candidate interaction field values may comprise generating a similarity index indicating a level of similarity between the candidate interaction field values and the reference interaction field values. The similarity index may be a numerical value that represents the similarity between the interaction field values of two fragments (i.e., the reference fragment and the candidate fragment).

Specifically, the similarity index may indicate a level of similarity between the candidate interaction field values of the candidate fragments and the reference interaction field values of the reference fragment.
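
As an illustration only, a similarity index over two aligned lists of interaction field values could be sketched as follows. The text above does not fix a formula, so the cosine-type (Carbó-like) index used here is an assumption, as are the function names, the sample values, and the threshold. The selection helper mirrors the minimum similarity threshold and the N highest ranked indexes recited in claims 5 and 6.

```python
import math

def similarity_index(reference_values, candidate_values):
    """Cosine-type index in [-1, 1]; an assumed choice of similarity measure."""
    if len(reference_values) != len(candidate_values):
        raise ValueError("values must be sampled at the same field points")
    dot = sum(r * c for r, c in zip(reference_values, candidate_values))
    norm = math.sqrt(sum(r * r for r in reference_values)
                     * sum(c * c for c in candidate_values))
    return dot / norm if norm else 0.0

def select_replacements(reference_values, candidates, min_similarity, top_n):
    """Keep candidates whose index exceeds the threshold, then take the top N."""
    scored = [(similarity_index(reference_values, values), name)
              for name, values in candidates.items()]
    kept = [item for item in scored if item[0] > min_similarity]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [(name, score) for score, name in kept[:top_n]]

# Hypothetical field values sampled at three shared interaction field points.
reference = [1.0, 0.0, -1.0]
candidates = {
    "a": [2.0, 0.0, -2.0],   # same shape as the reference, larger magnitude
    "b": [1.0, 1.0, 0.0],    # partially similar
    "c": [-1.0, 0.0, 1.0],   # opposite sign everywhere
}

ranked = select_replacements(reference, candidates, min_similarity=0.4, top_n=2)
```

With these sample values, candidate "a" ranks first, "b" second, and "c" is rejected by the threshold. A cosine-type index ignores overall magnitude, which may or may not be desirable depending on the field being compared.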

In a second aspect, a computational device or apparatus for generating a candidate molecule structure is disclosed. The computational device or apparatus is configured to perform the methods for generating a candidate molecule structure according to any of the examples herein disclosed. The computational device or apparatus may comprise input means or an input module for introducing a reference chemical structure.

In a third aspect, a non-transitory computer-readable medium storing executable instructions is disclosed. When the executable instructions are executed by a processor, they cause the processor to operate a method for generating a candidate molecule structure according to any of the examples herein disclosed.

In a fourth aspect, a computer program product is disclosed. The computer program product comprises program instructions which, when the program is executed by a computer, cause the computer to carry out a method for generating a candidate molecule structure according to any of the examples herein disclosed.

One way of implementing the method herein disclosed would be through a software computer program to be executed in the computational device or apparatus, e.g., a computer. The computational device may employ graphic processing units (GPUs) for the parallelization of tasks through the usage of specific programming languages.

Finally, some tasks that are executed often may also be performed by ad-hoc computational devices designed specifically for these tasks and included e.g., in the computer. These computational devices may be implemented within an electronic circuit capable of performing such tasks and fabricated or realized in reprogrammable hardware devices such as Field-Programmable Gate Arrays (FPGAs).

Advantages derived from these last three aspects may be similar to those mentioned regarding the method for generating a candidate molecule structure of the first aspect.

The term “data” may refer to a collection of individual values which, when processed, convey information. Particularly, the term “data” may be understood as information which is a representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by a human being or an electronic machine such as a computer.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting examples of the present disclosure will be described in the following, with reference to the appended drawings, in which:

Figure 1 is a schematic representation of an example computer-implemented method according to the present disclosure.

Figure 2 is a schematic representation of an example computer-implemented method according to the present disclosure.

Figure 3 is a schematic representation of an example computer-implemented method according to the present disclosure comprising a reference molecule.

Figure 4 is a schematic representation of a controller comprising a processor and a non-transitory computer-readable storage medium according to an example.

Figure 5A is a schematic representation of a reference chemical structure according to the present disclosure.

Figure 5B is a schematic representation of a plurality of reference chemical structure fragments according to an example.

Figure 5C is a schematic representation of a plurality of candidate fragments to be compared with a reference fragment according to an example.

Figure 5D schematically represents a candidate molecule structure according to an example.

Figure 5E is a schematic representation of a plurality of candidate fragments to be compared with a reference fragment according to an example.

Figure 5F schematically represents a candidate molecule structure according to an example.

DETAILED DESCRIPTION OF EXAMPLES

In the figures herein disclosed, the same reference signs have been used to designate matching elements.

Fig. 1 schematically represents a computer-implemented method according to an example of the present disclosure. The method comprises fragmenting a reference chemical structure. Particularly, throughout the description the reference chemical structure may be a virtual reference chemical structure. By fragmenting the reference chemical structure, the reference chemical structure is fragmented into a reference fragment and a remaining fragment. Similarly, the virtual reference chemical structure may be fragmented into a plurality of reference chemical structure fragments comprising a virtual reference fragment and a virtual remaining fragment, as represented at block 102. In some examples, the reference chemical structure or the virtual reference chemical structure is a reference molecule as shown e.g., in Fig. 3 of the present disclosure.

Fragmenting the reference chemical structure or the virtual reference chemical structure may comprise the application of a set of rules established or stored into a lookup table. These rules may provide information about one or more chemical reactions reproducible when a portion of e.g., the reference chemical structure or the virtual reference fragment is involved. In this way, it is possible to know the capability of the reference fragment or the virtual reference fragment to interact with respectively the remaining fragment or the virtual remaining fragment. These rules may also provide information about a capability respectively of the reference chemical structure or the virtual reference chemical structure to interact with other fragments. Accordingly, the fragmenting may take into account the capability of the fragments of the reference chemical structure to involve known or expected chemical reactions.

As a result, the method herein disclosed provides key differences over existing solutions where no fragmentation is provided. Indeed, the fragmentation of the reference chemical structure or the virtual reference chemical structure reduces the computational cost of running a computer for performing a method treating whole chemical structures, e.g., big molecules.

In some examples, fragmenting the reference chemical structure into a plurality of reference chemical structure fragments (such as a reference fragment and a remaining fragment) comprises identifying or selecting fragments which can be synthesized from known chemistries. A list of bond types derived from common chemical reactions may be used to identify the regions where the reference chemical structure should be partitioned. If a fragment contains only small functional groups (e.g., hydrogen, methyl, ethyl, propyl, and butyl), the fragment may be left unfragmented. Consequently, these examples imply benefits because fragmenting the reference chemical structure takes into account the potential chemical reactions happening between substructures of the reference chemical structure.
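
A toy Python sketch of such bond-type-driven partitioning is given below. It is not the disclosed fragmentation procedure. The molecule is reduced to a graph of atom indices, the bond-type labels ("amide", "ether", "single") and the list of cleavable types are hypothetical, and the `min_atoms` parameter is only an assumed stand-in for the rule that very small groups are left attached rather than split off.

```python
def components(atoms, bonds):
    """Connected components of an undirected graph of atoms and (a, b) bonds."""
    neighbours = {atom: set() for atom in atoms}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, parts = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            atom = stack.pop()
            if atom in part:
                continue
            part.add(atom)
            stack.extend(neighbours[atom] - part)
        seen |= part
        parts.append(part)
    return parts

def fragment(atoms, typed_bonds, cleavable_types, min_atoms=2):
    """Cut bonds of a listed type unless the cut would leave a too-small piece."""
    kept = [(a, b) for a, b, _ in typed_bonds]
    for a, b, bond_type in typed_bonds:
        if bond_type not in cleavable_types:
            continue
        trial = [bond for bond in kept if bond != (a, b)]
        if all(len(part) >= min_atoms for part in components(atoms, trial)):
            kept = trial  # accept the cut
    return components(atoms, kept)

# Hypothetical six-atom chain with labelled bonds.
atoms = [0, 1, 2, 3, 4, 5]
typed_bonds = [
    (0, 1, "single"),
    (1, 2, "amide"),
    (2, 3, "single"),
    (3, 4, "single"),
    (4, 5, "ether"),
]

fragments = fragment(atoms, typed_bonds, cleavable_types={"amide", "ether"})
```

In this example the "amide" bond is cut, giving the pieces {0, 1} and {2, 3, 4, 5}, while the "ether" bond is kept because cutting it would isolate the single atom 5. A practical implementation would work on real chemical structures with a cheminformatics toolkit rather than on bare index graphs.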

In other examples, identifying or selecting structural fragments common to molecules which interact with residues of biological targets or target classes is used for fragmenting the reference chemical structure.

The comparison of fragments may be achieved by means of techniques for identification of structural fragments like pharmacophore modeling techniques. Pharmacophore modeling techniques have been developed where, for a given set of ligands, the importance of hydrophobic, hydrophilic, and charged functional groups and their geometric relationships on biological activity can be determined. Pharmacophore modeling techniques rely on a set of “pharmacophoric points” rather than e.g., atom connectivity.

In some examples, prior to fragmenting 102 the chemical structure into a reference fragment and a remaining fragment, the method may comprise receiving the information, such as data, of the reference chemical structure. Such a reference chemical structure may be introduced e.g., manually by a user, or e.g., may be received from a database of chemical structures according to selection criteria. The selection criteria may be defined or preestablished before executing the method and stored in a storage medium.

The method also comprises obtaining a reference fragment and a remaining fragment from the fragmented reference chemical structure.

In some examples, when the reference chemical structure is a virtual reference chemical structure, the method comprises obtaining a virtual reference fragment (VRF) and a remaining fragment from the plurality of virtual reference chemical structure fragments, as represented at block 104.

In some examples, obtaining the fragments, e.g., the reference fragment and the remaining fragment, comprises applying retrosynthetic rules. Retrosynthetic rules rely on well-known algorithms wherein fragments are built either by an expert through manual extraction or automatically by computing bond energies. Consequently, applying retrosynthetic rules implies fragmenting molecules around bonds which are formed by common chemical reactions, which synergistically combines with fragmenting a reference chemical structure to obtain substructures (e.g., the reference fragment, the remaining fragment) of the reference chemical structure. In some examples, a retrosynthetic combinatorial analysis procedure is performed. In these examples, the fragments are analysed through cluster analysis or frequency of occurrence and the reference fragment is obtained by selecting one fragment from the fragments of the fragmented reference chemical structure.

Furthermore, the method comprises identifying one or more reference interaction field points of the reference fragment, as represented at block 106. Identifying a reference interaction field point of the reference fragment may be understood as selecting a reference interaction field point of the reference fragment, where the reference interaction field point is one point in space. At block 108, determining a reference interaction field value for each of the reference interaction field points of the reference fragment is represented.

In some examples, identifying or selecting the reference interaction field points of the reference fragment is based on one or more of atom properties, atom coordinates, and linker type of the reference chemical structure.

The interaction field value may be determined or calculated for a single interaction field point or for a set of M interaction field points in space (C). This set of interaction field points may be uniformly distributed in space. Each point (c) may be represented spatially by three coordinate values (c_x, c_y, c_z). In a cubic uniformly distributed grid, M may be equal to the product of the number of points along each coordinate axis. In one example of the methods herein disclosed, the interaction field value may be determined or calculated for a set of M points in space (C).

In some examples, selecting, e.g., a set of interaction field points for the reference fragment may comprise creating a grid of points in space. Additionally, selecting the set of interaction field points may comprise associating an interaction field value with each of the interaction field points. For example, associating the reference interaction field points with the reference interaction field values may comprise calculating an influence of different hydrophobicity values of the reference fragment. Moreover, molecular properties are generally described in a descriptor point. A descriptor point may be a point defined in space where a molecular feature (or property) is described, e.g., where a hydrophobicity value (hv) is present. Partial charges to generate an electrostatic field or the van der Waals radius are other molecular features that may be described in a descriptor point. Therefore, for example, hydrophobicity may be a molecular property described in a descriptor point. Hydrophobicity may also be defined in a plurality of descriptor points. Each descriptor point may have three coordinate values (i_x, i_y, i_z). A hydrophobicity value (hv(i)) may be a numerical value representing the hydrophobicity of a fragment (e.g., the reference fragment or the remaining fragment) at each descriptor point. The hydrophobicity value (hv(i)) of, e.g., the reference fragment at a given descriptor point (i) may be understood as a representation of the hydrophobicity at that point of the reference fragment. This hydrophobicity may be represented by the logarithm of the partition coefficient P (logP) or by a partitioned form of the logP using fractional components. The number of descriptor points (N) may correspond to the total number of hydrophobicity values.
If the descriptor points of the fragments such as the reference fragment or the remaining fragment, are located at the center of each atom of the fragment, then N would be equal to the number of atoms of the fragment.

In some examples, a set of interaction field points may be calculated for a plurality of candidate fragments. Calculating the set of interaction field points for each candidate fragment may comprise creating a grid of points in space, i.e., a set of interaction field point coordinates. Similarly, the set of interaction field points for each candidate fragment may be determined as described for the reference fragment. Indeed, determining the set of interaction field points for each candidate fragment may be associated with calculating an influence of the different hydrophobicity values defined at one or more descriptor points of the candidate fragment to determine the candidate interaction field value at each candidate interaction field point. The one or more interaction field points and/or interaction field values for the candidate fragments may be obtained as described for the reference fragment.

In some examples, identifying 106 or selecting one or more interaction field points of the fragments (such as the reference fragment) may be achieved by calculating the set of points C. Calculating the set of points C may be performed by creating a 3D mesh (e.g., a cube or a sphere) of uniformly distributed points, with the reference fragment located at the center of the mesh. This may be performed by defining a border length (b) and a grid spacing distance (s) and finding the reference fragment coordinate extrema along each coordinate axis. The 3D mesh origin (c_0) may be defined by subtracting the border length from the minimum reference fragment coordinates (CoordMin): (Eq. 1)

Then the size of the 3D mesh may be calculated by finding the first integer number of points (D) in each coordinate direction that, multiplied by the grid spacing, is bigger than the distance between the maximum reference fragment coordinates (CoordMax) and the minimum reference fragment coordinates (CoordMin) plus two times the border length:

(Eq. 2)

Finally, once the 3D mesh origin and the number of points in each coordinate direction are defined, iteration over all the interaction field points (c) to calculate their coordinates may take place. Interaction field point coordinates may be calculated by adding to the 3D mesh origin coordinates the index of the field point multiplied by the grid spacing in each coordinate direction. Eq. 3 shows how the x-axis coordinate of field point Q is calculated (Q being an integer index from 0 to D_x - 1):

(Eq. 3)
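The grid construction described around Eq. 1 to Eq. 3 can be sketched as follows. This is a minimal illustration written from the prose alone, since the equations themselves are not reproduced in this text; the function name `build_grid` and its argument names are hypothetical, and a cubic (box-shaped) mesh is assumed.

```python
import math

def build_grid(coords, border, spacing):
    """Return (origin, counts, points) for a uniform box grid around coords.

    coords  : list of (x, y, z) atom coordinates of the fragment
    border  : border length b added on every side
    spacing : grid spacing distance s
    """
    # Coordinate extrema of the fragment along x, y and z.
    coord_min = [min(p[k] for p in coords) for k in range(3)]
    coord_max = [max(p[k] for p in coords) for k in range(3)]
    # Mesh origin: minimum coordinates minus the border length.
    origin = [coord_min[k] - border for k in range(3)]
    # First integer D per axis such that D * s > (max - min) + 2 * b.
    counts = [
        math.floor((coord_max[k] - coord_min[k] + 2 * border) / spacing) + 1
        for k in range(3)
    ]
    # Field point coordinates: origin + index * spacing along each axis.
    points = [
        (origin[0] + qx * spacing,
         origin[1] + qy * spacing,
         origin[2] + qz * spacing)
        for qx in range(counts[0])
        for qy in range(counts[1])
        for qz in range(counts[2])
    ]
    return origin, counts, points
```

With this construction the total number of points M equals the product of the three per-axis counts, in line with the cubic grid described earlier.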

In some examples, determining 108 the reference interaction field value, e.g., by calculating the influence of different hydrophobicity values (hv), for each of the reference interaction field points of the reference fragment at each point of the C set may be performed with the following formula:

(Eq. 4)

where the interaction field value F(c) is the sum of the contributions of the different hydrophobicity values to that field point, N being the number of descriptor points of the reference fragment, c the field point, and f(hv_i, d_ci) a field value formula using a hydrophobicity value (hv_i) and the field point distance (d_ci) between that descriptor point and the field point. The interaction field value F(c) is a numerical value representing the interaction field at a given field point (c). In some examples, the descriptor points correspond to the atom centers. In some examples, Eq. 4 may be used for determining a candidate interaction field value. In these examples, the candidate interaction field value is calculated with Eq. 4, wherein N may be used to indicate the number of descriptor points of one or more candidate fragments, (hv_i) may be used to indicate the hydrophobicity values of each of the descriptor points of the one or more candidate fragments, and (d_ci) may be used to indicate the field point distance between that descriptor point and the candidate interaction field point. Accordingly, the remaining equations referring to the reference fragment may be adapted to the one or more virtual candidate fragments.

The interaction field value may be calculated with a field value formula (f) that depends on all the hydrophobicity values (hv) and the absolute field point distance (d_ci) between the interaction field point and the descriptor point. The field point distance (d_ci) between the descriptor points and the interaction field points may be calculated with the following formula: (Eq. 5)

The field value formula f(hv_i, d_ci) may describe the influence of a hydrophobicity value on a given point in space.

In some examples, the field value formula f(hv_i, d_ci) may be: (Eq. 6)

In other examples, the field value formula f(hv_i, d_ci) may be: (Eq. 7) where a is an adjusting factor.
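The summation of Eq. 4 can be sketched as below. Two points are assumptions rather than content of this text: the distance of Eq. 5 is taken to be the Euclidean distance, and the kernel `exp_kernel` is a purely hypothetical stand-in for the field value formulas of Eq. 6 and Eq. 7, which are not reproduced here. The kernel is passed as a parameter so that any other formula can be substituted.

```python
import math

def distance(c, i):
    """Euclidean distance between field point c and descriptor point i (assumed form of Eq. 5)."""
    return math.sqrt(sum((ck - ik) ** 2 for ck, ik in zip(c, i)))

def exp_kernel(hv, d, a=1.0):
    """Hypothetical field value formula: hv decays exponentially with distance.

    This is NOT Eq. 6 or Eq. 7; it only illustrates the role of f(hv_i, d_ci)
    and of an adjusting factor a.
    """
    return hv * math.exp(-a * d)

def field_value(c, descriptors, kernel=exp_kernel):
    """F(c): sum of the contributions of all N descriptor points at field point c.

    descriptors is a list of ((x, y, z), hv) pairs, e.g. atom centers with
    their atomic hydrophobicity values.
    """
    return sum(kernel(hv, distance(c, pos)) for pos, hv in descriptors)
```

The same function applies unchanged to a candidate fragment by passing the candidate's descriptor points and hydrophobicity values.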

In some examples, the method may further comprise calculating a local representation of hydrophobicity at different areas of the reference fragment, e.g., at a predefined distance from each atom of the reference fragment.

In some examples, the hydrophobicity value (hv_i) of the reference fragment is a representation of the hydrophobicity value of the reference chemical structure related thereto.

In some examples, the hydrophobicity value (hv_i) is calculated using the contribution of each atom to the logP by using parameters related to the transfer of the reference fragment between apolar and polar phases. The partition coefficient P is the ratio of concentrations of a compound in a mixture of two immiscible phases at equilibrium, and logP is its logarithm. These two phases are usually solvents, typically water and an organic phase like octanol. In that case, the logarithm of the partition coefficient P may be calculated as follows:

(Eq. 8)

where ΔG_sol is the solvation free energy or Gibbs free energy in solution (water, or an organic phase like octanol), R is the gas constant and T is the temperature. The solvation free energy or Gibbs free energy (ΔG_sol) is the amount of free energy required for the transfer of the reference chemical structure from the gas phase to the interior of the solvent.

In other examples, hv could be calculated using the fractional logP (Pf), which may be defined as the logP calculated with any of the individual components of the solvation free energy, such as ΔG_Cav, ΔG_VW or ΔG_Ele: (Eq. 9)

In some examples, calculating the 3D distribution of polar and apolar regions in the reference fragment may comprise calculating the free energy of solvation (ΔG_sol) by combining the cavitation (ΔG_Cav), the van der Waals (ΔG_VW) and the electrostatic (ΔG_Ele) components.

In some examples, combining the cavitation (ΔG_Cav), the van der Waals (ΔG_VW) and the electrostatic (ΔG_Ele) components may comprise calculating the solvation free energy by using the accurate polarizable continuum model (PCM) developed by Miertus, Scrocco and Tomasi (MST), in which the solvation free energy is calculated by adding three energy contributions: the cavitation (ΔG_Cav), the van der Waals (ΔG_VW) and the electrostatic (ΔG_Ele) terms:

ΔG_sol = ΔG_Cav + ΔG_VW + ΔG_Ele (Eq. 10)

where ΔG_Cav is the free energy required for creating a cavity shaped to accommodate the solute in the solvent, ΔG_VW is the free energy accounting for dispersion-repulsion interactions between solute and solvent, and ΔG_Ele is the free energy needed to build up the solute charge distribution in the solvent.
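As a numerical illustration, the sketch below sums the three contributions of Eq. 10 and converts a pair of solvation free energies into a logP. The conversion uses the standard thermodynamic relation logP = (ΔG_sol,water - ΔG_sol,octanol) / (ln(10) R T), which is an assumption here: Eq. 8 is not reproduced in this text and may be written differently. Function names, units (kcal/mol) and the default temperature are likewise illustrative choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def solvation_free_energy(dg_cav, dg_vw, dg_ele):
    """Eq. 10: sum of the cavitation, van der Waals and electrostatic terms."""
    return dg_cav + dg_vw + dg_ele

def log_p(dg_sol_water, dg_sol_octanol, temperature=298.15):
    """log10 of the octanol/water partition coefficient.

    Inputs are solvation free energies in kcal/mol. A more negative value in
    octanol than in water favours the organic phase and gives a positive logP.
    """
    return (dg_sol_water - dg_sol_octanol) / (math.log(10) * R_KCAL * temperature)
```

For example, solvation free energies of -5.0 kcal/mol in water and -8.0 kcal/mol in octanol give a logP of about 2.2 at 298.15 K. Applying the same relation to a single component (for instance only the electrostatic terms in both solvents) would be one way to obtain a fractional contribution, under the same assumptions.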

The use of atomic contributions to the LogP or Pf at the descriptor points may be used to calculate the hydrophobic field value at the interaction field points which may be used to compare the reference fragment with one or more candidate fragments.

In some examples, the reference interaction field value and the candidate interaction field value comprise one of, or a combination of, shape, steric, Lennard-Jones, electrostatic or hydrophobic field values. The shape field value of the reference fragment (and, according to the same equation, of the candidate fragment) compares the van der Waals radius of the particles with the distance between the descriptor points and the interaction field points, e.g., calculated through Eq. 5. In such a scenario, the reference field value is binary, as it will be 0 or 1 as shown in Eq. 11.

Furthermore, the method comprises obtaining, based on the reference chemical structure, one or more candidate fragments from a plurality of chemical structures as represented at block 110. The plurality of chemical structures comprises the reference chemical structure and/or additional chemical structure fragments from a database. The use of a database of chemical structure fragments may improve the search for candidate fragments.

In some examples, obtaining one or more candidate fragments from a plurality of chemical structures may comprise filtering the plurality of chemical structures from a database to obtain candidate fragments based on a size of the reference fragment and/or based on a number of linkers of the reference fragment.

It may be noted that throughout the description the candidate fragment may be a virtual candidate fragment.

Existing methods looking for alternative fragments imply the fragmentation of molecules from a database of molecules. Therefore, these methods imply a burdensome process, because fragmenting molecules from a database of molecules produces the same fragment multiple times. In other words, partitioning a database of molecules is inefficient as some fragments are replicated. Instead of fragmenting a database of molecules, the method herein disclosed uses a database of fragments. As a result, fragmenting a database of molecules is avoided. The method relies on obtaining the candidate fragments from a database of chemical structure fragments. Consequently, finding repeated candidate fragments is avoided.

In some examples, obtaining 110 the one or more candidate fragments comprises designating one of the reference chemical structure fragments from the plurality of reference chemical structure fragments and/or a chemical structure fragment from a database as the one or more candidate fragments. As a result, the reference chemical structure may be filtered to obtain the candidate fragments.

Furthermore, the reference fragment may be compared to and replaced with a replacing candidate fragment from the plurality of candidate fragments, which may coincide with the remaining fragment or may be a portion thereof. Consequently, a reorganization of the fragments of the reference chemical structure in an optimal or suboptimal configuration may be provided.

In some examples, obtaining 110 or filtering the plurality of chemical structures from a database to obtain the one or more candidate fragments comprises: obtaining a spatial orientation of the reference fragment and aligning one of the candidate fragments based on the spatial orientation of the reference fragment.

The spatial orientation of the reference fragment may comprise an expansion center of the reference fragment. The expansion center may coincide, e.g., with the atom center of an anchoring point, or may, e.g., be understood as a location at the geometric center of one or more atoms of the reference fragment. The expansion center may be a monopole of the multipole expansion of the interaction field values. In some examples, the expansion center is a hydrophobic monopole of the reference fragment.

The expansion center may be defined both for the reference fragment and the candidate fragments e.g., obtained from a chemical structure fragments database. The following equation may be used to describe a way of aligning the candidate fragment e.g., by means of tensors:

(Eq. 12)

wherein r defines the position of an atom i. Atomic LogP is the parameter used in Eq. 12, although other examples may employ a volume or partial charges as an alternative to LogP. A quadrupolar moment tensor may be defined. A definition of the quadrupolar moment tensor includes a sequence of configurations of, e.g., electric charge or current, or gravitational mass that may be present in an ideal form, but it is reduced as a part of a multipole expansion of a more complex structure reflecting various orders of complexity. Defining the quadrupolar moment tensor at the center of expansion defined in Eq. 12 minimizes the dipole, which is zero at that center. Accordingly, the quadrupole moment tensor is traceless at this center of expansion (as expressed by or derivable from the following Eq. 13). The quadrupole tensor defines two independent principal values and three principal axes, which represent the canonical axes that define the molecular orientations of the chemical compound, e.g., the reference and candidate fragments, and are invariant to the translation of the reference chemical structure.

(Eq. 13)

Therefore, the quadrupolar tensor may yield an orthogonal set of principal axes that can be used for the alignment.
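The following sketch illustrates the idea of an expansion center with vanishing dipole and a traceless quadrupole tensor built about it. Eq. 12 and Eq. 13 are not reproduced in this text, so the forms used here are assumptions: the center is taken as the centroid weighted by atomic logP contributions, and the tensor uses the conventional traceless definition Q_ab = sum_i w_i (3 r_a r_b - |r|^2 delta_ab). All names are hypothetical.

```python
def expansion_center(coords, weights):
    """Weighted centroid: the point about which the weighted dipole vanishes."""
    total = sum(weights)
    if total == 0:
        raise ValueError("sum of weights is zero; the centroid is undefined")
    return tuple(
        sum(w * p[k] for p, w in zip(coords, weights)) / total
        for k in range(3)
    )

def dipole(coords, weights, center):
    """Weighted dipole vector about the given center."""
    return tuple(
        sum(w * (p[k] - center[k]) for p, w in zip(coords, weights))
        for k in range(3)
    )

def quadrupole_tensor(coords, weights, center):
    """Traceless 3x3 quadrupole tensor about the given center."""
    q = [[0.0] * 3 for _ in range(3)]
    for p, w in zip(coords, weights):
        r = [p[k] - center[k] for k in range(3)]
        r2 = sum(x * x for x in r)
        for a in range(3):
            for b in range(3):
                q[a][b] += w * (3 * r[a] * r[b] - (r2 if a == b else 0.0))
    return q
```

The principal axes used for alignment would then be the eigenvectors of this symmetric tensor; a symmetric eigensolver such as `numpy.linalg.eigh` could be used for that step, which is left out here to keep the sketch dependency-free.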

In some examples, the spatial orientation takes into account the substructures of the reference chemical structure. Accordingly, aligning the candidate fragment based on the spatial orientation of the reference fragment relies on an expansion center of the reference fragment, e.g., according to Eq. 12.

In these examples, the method overlays fragments (such as the reference fragment and the candidate fragment) onto each other by translating and rotating these fragments in space to cover the maximum area of a specific feature, for example, the maximum common chemical substructure between the candidate fragment and the reference fragment, a pharmacophoric pattern or a specific moiety.

In some examples, obtaining 110 one or more candidate fragments comprises filtering the plurality of chemical structures based on a size of the reference fragment. For example, potential candidate fragments that are bigger than the reference fragment by more than a predetermined percentage may be discarded. The predetermined percentage may be 20%. This percentage may be calculated based on a number of heavy atoms of the candidate fragment.

In some examples, potential candidate fragments that are smaller than the reference fragment by more than a predetermined percentage may be discarded. In some of these examples, the predetermined percentage may be 20%.

In some examples, filtering the plurality of chemical structures comprises filtering the whole chemical structure fragments database in order to obtain a limited number of candidate fragments. In other examples, in order to explore a larger number of candidate fragments, the filtering could be performed after obtaining the candidate fragments from the chemical structure fragments database. Nevertheless, filtering after obtaining the candidate fragments from the chemical structure fragments database has a higher computational cost than filtering before obtaining the candidate fragments. As a result, filtering after obtaining the candidate fragments is preferentially avoided.

In some examples, obtaining or providing 110 one or more candidate fragments comprises filtering the one or more candidate fragments based on a number of linkers of the reference fragment. Therefore, a candidate fragment that has the same number of linkers as the reference fragment may be preferred to a candidate fragment that has a different number of linkers. The number of linkers of the reference fragment may vary. For small or simple reference chemical structures, only one or a few linkers may be taken into consideration for the reference fragment. For more complex reference chemical structures, the reference fragment may have a plurality of linkers.
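The size and linker filters can be sketched as a simple pre-selection over fragment records. The record layout (dictionaries with hypothetical keys `name`, `heavy_atoms` and `linkers`) is an assumption, as is the strict equality on the number of linkers; the text only states that an equal number may be preferred, so a looser criterion would be equally consistent with it.

```python
def passes_size_filter(candidate_heavy_atoms, reference_heavy_atoms, tolerance=0.20):
    """Keep candidates whose heavy-atom count is within +/- tolerance of the reference."""
    lower = reference_heavy_atoms * (1 - tolerance)
    upper = reference_heavy_atoms * (1 + tolerance)
    return lower <= candidate_heavy_atoms <= upper

def filter_candidates(candidates, reference, tolerance=0.20):
    """Return the candidates passing both the size filter and the linker filter."""
    return [
        c for c in candidates
        if passes_size_filter(c["heavy_atoms"], reference["heavy_atoms"], tolerance)
        and c["linkers"] == reference["linkers"]  # assumed strict criterion
    ]
```

Running such a filter over the fragment database before computing any interaction field keeps the expensive field comparison limited to plausible candidates, which matches the preference for filtering early.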

The method also comprises identifying, based on the reference interaction field points of the reference fragment, one or more candidate interaction field points of the one or more candidate fragments, as represented at block 112. Identifying a candidate field point of the candidate fragment may be understood as selecting a plurality of candidate field points of each candidate fragment, where each of the candidate interaction field points is one point in space corresponding to the position in space of the reference interaction field points. The one or more candidate interaction field points of the one or more candidate fragments may be identified or selected according to any of the examples disclosed at block 106.

Particularly, the relative position in space of the reference interaction field points with the reference fragment is translated into the relative position in space of the candidate interaction field points with each candidate fragment so that each candidate fragment may be compared with the reference fragment by maintaining the same relative position in space of the interaction field points (i.e., reference interaction field point or candidate interaction field point) with the respective fragment (i.e., reference fragment or candidate fragment).

Furthermore, the method comprises determining a candidate interaction field value for each of the candidate interaction field points of the one or more candidate fragments, as represented at block 114. The candidate interaction field value may be determined according to any of the examples disclosed at block 108.

The method also comprises comparing the reference fragment with the one or more candidate fragments, as represented at block 116.

The reference interaction field value of each of the reference interaction field points of the reference fragment is compared to the candidate interaction field value of each one of the candidate interaction field points of the one or more candidate fragments.

Furthermore, the method comprises determining 118, based on the comparison, one or more replacing candidate fragments from the one or more candidate fragments for replacing the reference fragment.

The method further comprises generating 120 a candidate molecule structure by replacing the reference fragment with the replacing candidate fragment, to synthesize a candidate molecule structure to bind a specific receptor. Therefore, the generated candidate structure is based at least in part on the one or more replacing candidate fragments.

In some examples, the comparison between the reference interaction field values and the candidate interaction field values comprises generating a similarity index. The similarity index, as disclosed above, indicates a level of similarity between the candidate interaction field value of each of the one or more candidate fragments and the reference interaction field value of the reference fragment at the corresponding interaction field points.
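The definition of the similarity index is not part of this passage, so the sketch below uses a cosine (Carbó-type) index as a stand-in, which is an assumption and not necessarily the index of the disclosure. It compares the reference and candidate field values at corresponding field points and yields 1.0 for identical fields.

```python
import math

def similarity_index(ref_values, cand_values):
    """Cosine similarity between two field-value vectors of equal length.

    ref_values[k] and cand_values[k] are the interaction field values at the
    k-th pair of corresponding interaction field points.
    """
    if len(ref_values) != len(cand_values):
        raise ValueError("field vectors must have the same number of points")
    dot = sum(r * c for r, c in zip(ref_values, cand_values))
    norm = (math.sqrt(sum(r * r for r in ref_values))
            * math.sqrt(sum(c * c for c in cand_values)))
    # A vector of all zeros has no direction; report no similarity.
    return dot / norm if norm else 0.0
```

Because the index relies on point-by-point correspondence, it is only meaningful when the candidate field points keep the same relative position to the candidate fragment as the reference field points do to the reference fragment.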

In some examples, the method generates a similarity index. In these examples, determining 118 the one or more replacing candidate fragments is based on those candidate fragments having a similarity index equal to or greater than a minimum similarity threshold. In other words, the one or more replacing candidate fragments may be selected by determining whether the similarity index of each of the one or more corresponding candidate fragments is equal to or greater than the minimum similarity threshold.

In some examples, the method may determine the one or more replacing candidate fragments by ranking the similarity indexes of the one or more candidate fragments and selecting the one or more replacing candidate fragments corresponding to the N highest ranked similarity indexes of the candidate fragments. N may be understood as a natural number comprised between 10 and 10000.
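Threshold-based selection and top-N ranking can be combined in one small helper. The sketch assumes the similarity indexes have already been computed and are supplied as a mapping from a fragment identifier to its index; the function and parameter names are hypothetical.

```python
def select_replacing_fragments(similarity_by_fragment, min_similarity=None, top_n=None):
    """Rank fragments by similarity index and apply optional cut-offs.

    similarity_by_fragment : dict mapping fragment identifier -> similarity index
    min_similarity         : keep only indexes equal to or greater than this value
    top_n                  : keep only the N highest ranked fragments
    """
    ranked = sorted(similarity_by_fragment.items(), key=lambda kv: kv[1], reverse=True)
    if min_similarity is not None:
        ranked = [(name, s) for name, s in ranked if s >= min_similarity]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [name for name, _ in ranked]
```

For instance, with indexes `{"a": 0.9, "b": 0.4, "c": 0.75, "d": 0.6}`, a threshold of 0.5 keeps `a`, `c` and `d`, and adding `top_n=2` keeps only `a` and `c`.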

In some examples, N may be comprised between 500 and 2000. In some examples, in order to determine a strict number of replacing candidate fragments for any single reference fragment, N may correspond to, e.g., 500. In other examples, in which a short execution time (e.g., less than several hours or days) is required, N may be a natural number, e.g., between 10 and 100. In such cases it may be possible to use a filter to discard those replacing candidate fragments that are the same as the reference fragment. In some examples, the method may comprise clustering the replacing candidate fragments and selecting a representative of each cluster of candidate fragments. In some examples, the method may comprise receiving a user selection to discard at least one generated candidate molecule structure.

As explained along the disclosure, the similarity index may be used to prioritize fragments with a higher similarity index than fragments with a low similarity index (which may have a lower probability to result in a candidate molecule structure configured to bind a specific receptor).

In some examples wherein the method generates the similarity index, the method may comprise ranking the similarity index in the following manner: determining a distance between an anchoring point of the one or more candidate fragments and each anchoring point of the reference fragment. Furthermore, the method may comprise ranking the similarity indexes of the one or more candidate fragments to select the M highest ranked similarity indexes of the candidate fragments based on the distance. The reference fragment linker may be understood as the linker that encompasses a conjugating functionality suitable for attachment between the reference fragment and the remaining fragment from the plurality of reference chemical structure fragments. The candidate fragment linker may be understood as the linker that encompasses a conjugating functionality suitable for attachment to the very same remaining fragment.

In some examples, the method may comprise combining the one or more replacing candidate fragments with the remaining fragment. The combination may be based on known synthesis chemical reactions. The combination is realized e.g., through obtaining one or more of first synthesis chemical reactions for a replacing candidate fragment linker of the one or more replacing candidate fragments from the highest ranked candidate fragments. Then, the method of these examples comprises comparing the first synthesis chemical reactions with one or more second synthesis chemical reactions of a reference fragment linker of the remaining reference fragment from a database of synthesis chemical reactions. Furthermore, the method comprises comparing the first synthesis chemical reactions and the second synthesis chemical reactions and verifying if any of the first synthesis chemical reactions exist in the pool of the second synthesis chemical reactions. In other words, the method comprises searching common synthesis chemical reactions among the first and second synthesis chemical reactions. Finally, the method comprises selecting the one or more replacing candidate fragments with common chemical reactions with respect to the remaining fragment.
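The search for common synthesis chemical reactions amounts to a set intersection. In the sketch below, reactions are represented by plain string labels and the reaction database is reduced to in-memory collections; both are simplifying assumptions, and the reaction names in the usage note are illustrative only.

```python
def fragments_with_common_reactions(candidate_reactions, remaining_reactions):
    """Keep candidates sharing at least one reaction with the remaining fragment.

    candidate_reactions : dict mapping candidate fragment id -> iterable of
                          reaction labels available for its linker (first reactions)
    remaining_reactions : iterable of reaction labels available for the linker
                          of the remaining fragment (second reactions)
    Returns a dict mapping each retained candidate id to its sorted common reactions.
    """
    pool = set(remaining_reactions)
    selected = {}
    for fragment_id, reactions in candidate_reactions.items():
        common = set(reactions) & pool
        if common:
            selected[fragment_id] = sorted(common)
    return selected
```

As a usage example, if the remaining fragment supports `amide_coupling` and `suzuki_coupling`, a candidate offering only `reductive_amination` is dropped, while one offering `amide_coupling` is retained together with that shared reaction.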

In some examples, to reduce the computational cost, the computer-implemented method may employ multiple acceleration techniques. In some examples, these techniques include task parallelization or the usage of hardware accelerators to reduce the time required to perform similarity index calculation.

Task parallelization may take advantage of the data independence of various tasks of the method. For example, instead of executing the method in a sequential way, it may detect those tasks that may be executed in parallel and execute those tasks simultaneously, thus reducing the overall execution time. Task parallelization may be achieved at different levels. In some examples, task parallelization is implemented at the reference and candidate fragment level if multiple similarity indexes need to be calculated by calculating all similarity indexes in parallel. Task parallelization may also be implemented at the interaction field point level in the interaction field value calculation or in the comparison between the reference fragment and each of the candidate fragments. Additional techniques may be employed at the instruction level for a finer grain parallelization such as vectorization of mathematical operations.
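Parallelization at the fragment level can be sketched with Python's standard `concurrent.futures` module, since each similarity index depends only on the reference field and one candidate field. The cosine index is again a stand-in for the similarity index, and the function names are hypothetical. A thread pool is used to keep the example portable; for CPU-bound pure-Python work a process pool or a hardware accelerator would normally be the more effective choice.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def cosine_similarity(ref_values, cand_values):
    """Stand-in similarity index between two field-value vectors."""
    dot = sum(r * c for r, c in zip(ref_values, cand_values))
    norm = (math.sqrt(sum(r * r for r in ref_values))
            * math.sqrt(sum(c * c for c in cand_values)))
    return dot / norm if norm else 0.0

def similarities_in_parallel(ref_values, candidate_fields, max_workers=4):
    """Compute one similarity index per candidate fragment concurrently.

    candidate_fields maps a fragment identifier to its field-value vector.
    The tasks are independent, so they can be dispatched in any order.
    """
    names = list(candidate_fields)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = pool.map(
            lambda name: cosine_similarity(ref_values, candidate_fields[name]),
            names,
        )
        # pool.map preserves input order, so names and scores stay aligned.
        return dict(zip(names, scores))
```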

Fig. 2 schematically represents a computer-implemented method according to one example of the present disclosure. The method comprises receiving a reference chemical structure as represented at block 101.

In some examples according to fig. 2, receiving 101 the reference chemical structure may comprise receiving information such as data about the reference chemical structure and about a plurality of fragments in which it is possible to divide the reference chemical structure.

The method of fig. 2 comprises fragmenting the reference chemical structure data as shown at block 1020. The reference chemical structure data may be fragmented into a plurality of reference chemical structure fragments data representing a plurality of reference chemical structure fragments. However, in other examples, the reference chemical structure data comprises information about a plurality of fragments into which it is possible to divide the reference chemical structure.

Block 103 of Fig. 2 schematically illustrates a chemical structure database. The chemical structure database 103 is a database that may comprise or store a plurality of chemical structures. In some examples, the one or more candidate fragments are obtained from the chemical structure database. The one or more candidate fragments may be selected from the chemical structure database e.g., according to a size of the reference fragment as described for some examples of the method represented in fig. 1 at block 110.

As can be seen in fig. 2 the one or more candidate fragments may be obtained from the reference chemical structure data. In some examples, obtaining 110 the one or more candidate fragments comprises selecting one or more fragments of the remaining fragment. Consequently, the method may involve generating 120 a candidate molecule structure by combining and reorganizing the plurality of reference chemical structure fragments. In some examples, one or more candidate fragments are obtained from the reference chemical structure and one or more candidate fragments from a chemical structure database.

Apart from blocks 101, 103 and 1020, the remaining blocks of fig. 2 are also present in fig. 1. The examples in accordance with these blocks shown in fig. 1 may be combined with the examples described at blocks 101, 103 and 1020.

Fig. 3 schematically represents a computer-implemented method according to an example of the present disclosure. The reference chemical structure is a reference molecule. The reference molecule is a representation of a molecule. The method comprises fragmenting the reference molecule as shown at block 10200. Fragmenting 10200 the reference molecule may be performed using one of the methods of fragmenting the reference chemical structure represented at blocks 102 or 1020 respectively in fig. 1 or fig. 2.

In some examples, the computer-implemented method comprises receiving reference molecule data representing the reference molecule. As for the reference chemical structure, the reference molecule data may be fragmented into a plurality of reference chemical structure fragments data representing a plurality of reference chemical structure fragments or reference molecule fragments.

The plurality of reference molecule fragments may correspond to e.g., Mol2 file format or SDF (structure data file) file format. For a given molecule, Mol2 file format provides the positions of each atom thereof in space, typically with X, Y, and Z cartesian coordinates. Mol2 comprises a plain text tabular format representing the atoms of the molecule, chemical elements, atomic coordinates, chemical bond information, and metadata of a molecule. A Mol2 file catalogues the information about the molecule in a plurality of sections. For example, Mol2 file format comprises atomic coordinates, and information about how the atoms of the molecule are connected. As a result, a combination of the information contained in the section of the file may be taken into account to obtain the plurality of reference molecule fragments.
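A minimal reader for the atom and bond sections of a Mol2 file could look as follows. It relies on the `@<TRIPOS>ATOM` and `@<TRIPOS>BOND` record type indicators of the Tripos Mol2 format and on their whitespace-separated columns; it ignores every other section and optional column, performs no validation, and is a sketch only. The function name and the returned structure are hypothetical.

```python
def parse_mol2(text):
    """Extract atoms and bonds from the text of a Mol2 file.

    Returns (atoms, bonds) where atoms is a list of dicts with keys
    'id', 'name', 'xyz' and 'type', and bonds is a list of
    (origin_atom_id, target_atom_id, bond_type) tuples.
    """
    atoms, bonds, section = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            continue
        fields = line.split()
        if section == "ATOM" and len(fields) >= 6:
            # atom_id atom_name x y z atom_type [optional columns ...]
            atoms.append({
                "id": int(fields[0]),
                "name": fields[1],
                "xyz": (float(fields[2]), float(fields[3]), float(fields[4])),
                "type": fields[5],
            })
        elif section == "BOND" and len(fields) >= 4:
            # bond_id origin_atom_id target_atom_id bond_type
            bonds.append((int(fields[1]), int(fields[2]), fields[3]))
    return atoms, bonds
```

The atom coordinates provide the descriptor point positions, while the bond list gives the connectivity along which a molecule may be cut into fragments.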

The method may receive, e.g., a user selection of such reference molecule data, or reference molecule data from a database of molecules according to selection criteria. The selection criteria may be defined or pre-established before executing the method and stored in a storage medium.

Some examples according to fig. 3 comprise obtaining 104 a reference fragment. The reference fragment is one of the fragments obtained from one or more conformations of the reference molecule data. The conformation of the reference molecule may correspond to a local energy minimum or the bioactive conformation.

In some examples, the fragments (e.g., the reference fragment or the candidate fragment) may be based on one or more conformations of the reference molecule and/or one or more conformations of the candidate fragment, respectively.

In some examples, the reference molecule data may comprise receiving a reference receptor protein data representing a reference receptor protein. Receptor proteins are chemical structures responsible for binding, where a specific ligand molecule binds. When a ligand molecule binds to a protein, the receptor protein may change conformation, transmitting a signal into cells. Therefore, the interaction between ligand molecule and the receptor protein has a key role e.g., regulation of glucose concentration in the blood. Furthermore, as the changes in receptor protein conformation depend on the linked ligand molecule, the method may involve an efficient way of exploring similar chemical structures. As a result, the method may generate more than one candidate molecule structure.

As such, receiving the reference receptor protein data may allow the method to generate 120 a candidate molecule structure to replace a known drug.

Some examples according to fig. 3 comprise retrieving one or more candidate fragments from a chemical structure fragments library. Then the method obtains one or more candidate fragments based on the retrieved data. The chemical structure fragments library may comprise a set of building blocks. The set of building blocks may be obtained e.g., through retrosynthetic rules of decomposition of druglike chemical structures prior to the execution of the method of this example.

The candidate fragments obtained from the set of building blocks may present the same number of linkers as the reference fragment or a higher number of linkers. In order to compare the interaction field values of the candidate fragments and the reference fragment, the method may involve ignoring some of the linkers of the candidate fragment. In some examples, more candidate fragments may be obtained by replicating the candidate fragment, considering a different linker each time, and performing the method on each of the candidate fragments obtained.
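The replication of a candidate fragment over its linkers may be illustrated as follows. This is a minimal sketch, assuming that linkers are identified by atom indices; the function name `linker_variants` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def linker_variants(candidate_linkers: Sequence[int],
                    n_reference_linkers: int) -> List[Tuple[int, ...]]:
    """Enumerate the linker subsets of a candidate fragment to be considered.

    Each returned tuple keeps as many linkers as the reference fragment has;
    the other linkers of the candidate are ignored for that copy.
    """
    if len(candidate_linkers) < n_reference_linkers:
        return []  # the candidate cannot provide enough linkers
    return list(combinations(candidate_linkers, n_reference_linkers))


# A candidate with three linkers (atoms 0, 4 and 7) and a one-linker reference.
variants = linker_variants([0, 4, 7], 1)
```

In this sketch each variant would then be treated as a separate candidate fragment.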

In some examples, a spatial orientation of the reference fragment is obtained. In these examples, the method comprises aligning one of the candidate fragments based on the spatial orientation of the reference fragment. Aligning the one of the candidate fragments may rely on e.g., an expansion center of the reference fragment as described by Eq. 12. Additionally, the method may comprise determining a distance between an anchoring point of the one or more candidate fragments from the chemical structure fragments library and the one or more anchoring points of the reference fragment. Then, the method comprises obtaining one or more candidate fragments based on the chemical structure fragments library if the distance between anchoring points is equal to or less than a predefined maximum threshold.

In some examples, if multiple anchoring points exist, multiple distances are defined. These distances may be combined into a single value to be compared to a predefined maximum threshold.

In some examples, the maximum threshold is 2 angstroms (Å). In other examples, the value of the maximum threshold may be any suitable value. In some examples, the maximum threshold is between 1 Å and 4 Å.
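The anchoring point filter described above may be sketched as follows. The sketch assumes that the candidate fragment has already been aligned, that anchoring points are paired in order, and that multiple distances are combined as a root mean square; the text does not specify the combination, so this choice and the function names are illustrative assumptions only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def combined_anchor_distance(candidate: Sequence[Point],
                             reference: Sequence[Point]) -> float:
    """Combine per-anchor distances (in angstroms) into one RMS value."""
    if len(candidate) != len(reference) or not reference:
        raise ValueError("anchoring points must be paired one to one")
    squared = [math.dist(c, r) ** 2 for c, r in zip(candidate, reference)]
    return math.sqrt(sum(squared) / len(squared))


def passes_anchor_filter(candidate: Sequence[Point],
                         reference: Sequence[Point],
                         max_threshold: float = 2.0) -> bool:
    """Keep the candidate if the combined distance is at most the threshold."""
    return combined_anchor_distance(candidate, reference) <= max_threshold
```

The default of 2.0 corresponds to the 2 Å example; any value in the 1 Å to 4 Å range could be passed instead.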

In some examples, the expansion center may be defined at the linker position.

In some examples, the method comprises a virtual screening of the candidate molecule after generating a candidate molecule structure. For example, structure-based virtual screening (SBVS) and/or ligand-based virtual screening (LBVS) may be used.

In some examples, one or more candidate fragments are retrieved from a chemical structure fragments library comprising a set of building blocks and a virtual screening of the chemical space is employed. In this way the virtual screening of the chemical space is applied to the fragments instead of the molecules. The presence of the set of building blocks may reduce a number of drawbacks usually related to virtual screening. These drawbacks are e.g., the relatively small number of molecules explored compared to the size of the chemical space, the space required to store data, and the difficulty of accessing the results due to the high computational cost involved in running a virtual screening of the chemical space. The database may be tailored depending on e.g., the time available to generate a candidate molecule structure or whether an extensive exploration of the chemical space is requested. For example, taking 1000 candidate fragments from a set of 1 million building blocks to be combined through two reactions to generate a candidate molecule structure, a chemical space of 1 billion molecules would be explored to create 1 million candidate molecules from a focused chemical space. The obtained candidate molecule structures are configured to be synthesizable and to bind a specific receptor.

Fig. 4 schematically represents a controller 400 comprising a processor 401 and a non-transitory computer-readable storage medium 402 according to an example of the present disclosure.

The non-transitory computer-readable storage medium 402 comprises executable instructions 404 that, when executed by a processor, cause the processor 401 to operate a method according to any of the examples herein disclosed.

The non-transitory computer-readable storage medium 402 may include any electronic, magnetic, optical, or other physical storage device that stores executable instructions. The non-transitory computer-readable storage medium may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disk, and the like.

The method for generating a candidate molecule structure according to any of the examples herein disclosed may be carried out by a computer program product comprising program instructions executable by a computer.

The computer program product may be embodied on the storage medium 402 (for example, a CD-ROM, a DVD, a USB drive, on a computer memory or on a read-only memory) or carried on a carrier signal (for example, on an electrical or optical carrier signal).

The computer program may be in the form of source code, object code, a code intermediate source and object code such as in partially compiled form, or in any other form suitable for use in the implementation of the processes. The carrier may be any entity or device capable of carrying the computer program.

For example, the carrier may comprise a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a hard disk. Further, the carrier may be a transmissible carrier such as an electrical or optical signal, which may be conveyed via electrical or optical cable or by radio or other means.

When the computer program product is embodied in a signal that may be conveyed directly by a cable or other device or means, the carrier may be constituted by such cable or other device or means.

Alternatively, the carrier may be an integrated circuit in which the computer program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant methods.

Figure 5A is a schematic representation of a chemical structure used as the reference chemical structure 500. The reference chemical structure 500 is thus the starting structure to generate an alternative molecule structure. The reference chemical structure 500 is located in space by a set of interaction field points 504, distributed therein. The set of interaction field points 504 is represented as a grid of interaction field points where the interaction field values are calculated. Each interaction field value represents a discretization of e.g., the electrostatic or steric fields of the reference chemical structure 500, which are continuous.
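The discretization of a continuous field on a grid of interaction field points may be sketched as below. The sketch assumes a simple point-charge Coulomb potential in arbitrary units, with a minimum distance to avoid division by zero; the actual field model of the method may differ, and the names `make_grid` and `electrostatic_values` are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def make_grid(lower: Point, upper: Point, spacing: float) -> List[Point]:
    """Regular grid of interaction field points spanning a box."""
    axes = []
    for lo, hi in zip(lower, upper):
        n = int(round((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(n)])
    return list(product(*axes))


def electrostatic_values(grid: Sequence[Point],
                         atoms: Sequence[Tuple[Point, float]],
                         min_distance: float = 0.5) -> List[float]:
    """Sum of charge / distance over all atoms, evaluated at each grid point."""
    values = []
    for point in grid:
        total = 0.0
        for position, charge in atoms:
            r = max(math.dist(point, position), min_distance)
            total += charge / r
        values.append(total)
    return values


# One unit charge at the origin, sampled on the corners of a unit cube.
grid = make_grid((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1.0)
field = dict(zip(grid, electrostatic_values(grid, [((0.0, 0.0, 0.0), 1.0)])))
```

Each entry of `field` pairs one point in space with one interaction field value, which is the form in which reference and candidate fragments are later compared.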

As can be seen in fig. 5A, the reference chemical structure 500 comprises a ring chemical structure 501, e.g., a hydrocarbon cyclohexane. The ring chemical structure is linked to the fragment 503 through a chemical bond 502.

Figure 5B is a schematic representation of the reference chemical structure of fig. 5A split into a plurality of reference chemical structure fragments. Fragmenting the reference chemical structure into the plurality of reference chemical structure fragments entails obtaining the reference fragment 530 and the remaining fragment 510.

In the example of fig. 5B, the remaining fragment 510 comprises the ring chemical structure 501 and the reference fragment 530 comprises the fragment 503. In other examples the ring chemical structure 501 is the reference fragment. As can be seen in fig. 5B, neither the reference fragment 530 nor the remaining fragment 510 keeps the chemical bond 502. Therefore, the chemical bond 502 is not part of the reference fragment 530 and consequently is not compared with the one or more candidate fragments.

From the reference fragment 530, an anchoring point 532 is identified. A remaining fragment anchoring point 538 is also identified. These two anchoring points 532 and 538 are the atoms where the bond between the reference fragment 530 and the remaining fragment 510 existed. Once the reference interaction field points are identified, reference interaction field values are determined for each interaction field point.
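The splitting of a structure at a single bond, and the identification of the two anchoring points, may be illustrated as follows. This is a sketch under the assumption that the structure is given as a list of bonded atom index pairs; the function name `split_at_bond` and the atom numbering are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def split_at_bond(bonds: Iterable[Tuple[int, int]],
                  cut: Tuple[int, int]) -> Tuple[Set[int], Set[int]]:
    """Remove the bond `cut` and return the two atom sets it separated.

    The two atoms of `cut` are the anchoring points of the resulting fragments.
    """
    a, b = cut
    adjacency = defaultdict(set)
    for u, v in bonds:
        if {u, v} == {a, b}:
            continue  # the cut bond belongs to neither fragment
        adjacency[u].add(v)
        adjacency[v].add(u)

    def component(start: int) -> Set[int]:
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return seen

    side_a = component(a)
    if b in side_a:
        raise ValueError("cutting a ring bond does not split the structure")
    return side_a, component(b)


# Six-membered ring (atoms 0 to 5) bonded through atom 0 to a chain (atoms 6, 7).
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
remaining_atoms, reference_atoms = split_at_bond(BONDS, (0, 6))
```

Here atom 0 plays the role of the remaining fragment anchoring point and atom 6 the role of the reference fragment anchoring point.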

In other examples, the reference fragment comprises a plurality of anchoring points where the bonds connect to other fragments of the chemical structure. Therefore, fragmenting the chemical structure may provide a plurality of remaining fragments corresponding to the plurality of the anchoring points.

Fig. 5C is a schematic representation of the reference fragment 530 and three candidate fragments. A first candidate fragment 540 is represented. A second candidate fragment 550 is represented with e.g., a candidate anchoring point 552 of the second candidate fragment 550 in the atom that is going to form a bond with another chemical fragment, e.g., the remaining fragment 510 of fig. 5B. A third candidate fragment 560 is represented. The first candidate fragment 540 and the third candidate fragment 560 also comprise at least one candidate anchoring point (not shown in fig. 5C). A candidate interaction field value is determined for each of the candidate interaction field points of the first 540, second 550 and third 560 candidate fragments respectively.

In order to select a suitable candidate fragment from fig. 5C, the candidate fragments are overlaid on top of the reference fragment to perform the comparison. After performing the overlay, the distance of the candidate fragments anchoring points from the reference fragment anchoring points could be measured and used to filter out those candidate fragments where the distance is greater than a given threshold, e.g., the threshold could be a value between 1 Å and 4 Å. Similarly, a distance between a candidate fragment linker and a reference fragment linker may be determined to rank the candidate fragments and to select the highest ranked candidate fragments for replacement.

Furthermore, the reference interaction field values of the reference interaction field points are compared to each of the candidate interaction field values of the candidate interaction field points. Then, a replacing candidate fragment is determined. The replacing candidate fragment may be the candidate fragment with the candidate interaction field value most similar to the reference interaction field value.
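The comparison of field values and the selection of the replacing candidate fragment may be sketched as follows. The sketch assumes a cosine-type similarity index evaluated over field values sampled at the same points in space; this particular index is an illustrative assumption and may differ from the similarity index used by the method. The function names and the numerical values are hypothetical.

```python
import math
from typing import Dict, Sequence


def similarity_index(reference: Sequence[float],
                     candidate: Sequence[float]) -> float:
    """Cosine-type similarity between two sets of field values (range -1 to 1)."""
    if len(reference) != len(candidate):
        raise ValueError("field values must be sampled at the same points")
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm = (math.sqrt(sum(r * r for r in reference))
            * math.sqrt(sum(c * c for c in candidate)))
    return dot / norm if norm else 0.0


def select_replacing_fragment(reference: Sequence[float],
                              candidates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the candidate most similar to the reference."""
    return max(candidates, key=lambda label: similarity_index(reference, candidates[label]))


# Hypothetical field values at three shared interaction field points.
reference_values = [1.0, 0.0, -1.0]
candidate_values = {
    "540": [0.0, 1.0, 0.0],
    "550": [2.0, 0.0, -2.0],
    "560": [-1.0, 0.0, 1.0],
}
replacing = select_replacing_fragment(reference_values, candidate_values)
```

With these illustrative values the candidate labelled "550" is selected, since its field has the same shape as the reference field.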

Furthermore, a candidate molecule structure 580 is generated. As can be seen in fig. 5D, the candidate molecule structure 580 is generated by combining the replacing candidate fragment 550A (i.e., corresponding to the second candidate fragment 550 of fig. 5C) and the remaining fragment 510. As can be seen, there is the chemical bond 520 between the replacing candidate fragment 550A and the remaining fragment 510.
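The combination of a replacing candidate fragment with the remaining fragment may be illustrated as below. This is a minimal sketch, assuming that each fragment is described by a list of element symbols and a list of bonded index pairs, and that one new bond joins the two anchoring points; the function name `combine_fragments` is hypothetical.

```python
from typing import List, Sequence, Tuple

BondList = Sequence[Tuple[int, int]]


def combine_fragments(remaining_atoms: Sequence[str],
                      remaining_bonds: BondList,
                      remaining_anchor: int,
                      replacing_atoms: Sequence[str],
                      replacing_bonds: BondList,
                      replacing_anchor: int) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Merge two fragments and bond their anchoring points together."""
    offset = len(remaining_atoms)  # replacing atoms are renumbered after the remaining ones
    atoms = list(remaining_atoms) + list(replacing_atoms)
    bonds = list(remaining_bonds)
    bonds += [(u + offset, v + offset) for u, v in replacing_bonds]
    bonds.append((remaining_anchor, replacing_anchor + offset))  # the new chemical bond
    return atoms, bonds


# Six-carbon ring as remaining fragment, two-atom replacing fragment.
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
atoms, bonds = combine_fragments(["C"] * 6, ring_bonds, 0, ["N", "C"], [(0, 1)], 0)
```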

Fig. 5E is a schematic representation of a reference fragment 530 and three candidate fragments. A fourth candidate fragment 5010, a fifth candidate fragment 5020 and a sixth candidate fragment 5030 are suitable to replace the reference fragment 530 (corresponding to the ring chemical structure 501 of fig. 5A) of fig. 5E. A candidate anchoring point 5012 of the fourth candidate fragment 5010 is identified. The fifth and sixth candidate fragments 5020 and 5030 also comprise at least one anchoring point (not shown in fig. 5E). The reference chemical structure could be e.g., the reference chemical structure 500 of fig. 5A or the candidate molecule structure 580 of fig. 5D. Using the candidate molecule structure 580 of fig. 5D as a reference chemical structure implies modifying more than one reference fragment. More specifically, the method may be performed on the candidate molecule structure obtained previously by the method. Therefore, a previously obtained candidate molecule structure may be used as the reference chemical structure. Additionally, a fragment previously used by the method as a remaining fragment may be used as a reference fragment. Consequently, the new candidate molecule structure may comprise replacing candidate fragments replacing a plurality of reference fragments. In order to reduce the computational cost, task parallelization may be employed to obtain a plurality of candidate fragments for each reference fragment to be compared.
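The task parallelization mentioned above may be sketched as follows, with one task per reference fragment. The sketch assumes a thread pool from the Python standard library and a simple squared-difference score; for computationally heavy scoring a process pool could be substituted. All names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def best_candidates(reference_values: Sequence[float],
                    library: Dict[str, Sequence[float]],
                    top_n: int = 2) -> List[str]:
    """Rank library fragments by closeness of their field values to the reference."""
    def score(label: str) -> float:
        return -sum((r - c) ** 2 for r, c in zip(reference_values, library[label]))
    return sorted(library, key=score, reverse=True)[:top_n]


def screen_all(reference_fragments: Dict[str, Sequence[float]],
               library: Dict[str, Sequence[float]],
               top_n: int = 2) -> Dict[str, List[str]]:
    """Run one ranking task per reference fragment in parallel."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda values: best_candidates(values, library, top_n),
                           reference_fragments.values())
        return dict(zip(reference_fragments, results))


library = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
references = {"530": [1.0, 0.0], "510": [0.0, 1.0]}
ranked = screen_all(references, library)
```

Because the tasks for different reference fragments do not depend on one another, they can be distributed without coordination between them.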

Fig. 5F is a representation of a second candidate molecule structure 590, generated by combining a second replacing candidate 5010B (corresponding to the fourth candidate fragment 5010 of fig. 5E) and the remaining fragment 510 (corresponding to the replacing fragment 550A of fig. 5D). In this example, the candidate molecule structure 580 of fig. 5D is used as the reference chemical structure 500 from which the second candidate molecule structure 590 is generated.

For reasons of completeness, various aspects of the present disclosure are set out in the following numbered clauses:

Clause 1: A computer-implemented method for generating candidate molecule data, comprising: fragmenting a virtual reference chemical structure into a plurality of virtual reference chemical structure fragments; obtaining a virtual reference fragment and a virtual remaining fragment from the plurality of virtual reference chemical structure fragments; identifying one or more reference interaction field points of the virtual reference fragment; determining reference interaction field value data representing a reference interaction field value for each of the reference interaction field points of the virtual reference fragment; obtaining, based on the virtual reference chemical structure, one or more virtual candidate fragments from a plurality of virtual chemical structures, wherein the plurality of virtual chemical structures comprises the virtual reference chemical structure and/or a chemical structure fragments database; identifying, based on the reference interaction field points of the virtual reference fragment, one or more candidate interaction field points of the one or more virtual candidate fragments; determining candidate interaction field value data representing a candidate interaction field value for each of the candidate interaction field points of the one or more virtual candidate fragments; comparing the reference interaction field value data of each of the reference interaction field points of the virtual reference fragment with the candidate interaction field value data of each one of the candidate interaction field points of the one or more virtual candidate fragments to obtain comparison data; determining, based on the comparison data, one or more virtual replacing candidate fragments from the one or more virtual candidate fragments for replacing the virtual reference fragment; and generating candidate molecule data based on the one or more virtual replacing candidate fragments.

Clause 2: The computer-implemented method of clause 1, wherein generating candidate molecule data comprises combining the one or more virtual replacing candidate fragments with the virtual remaining fragment.

Clause 3: The computer-implemented method of any of the preceding clauses, wherein comparing the reference interaction field value data of each of the reference interaction field points of the virtual reference fragment with the candidate interaction field value data of each one of the candidate interaction field points of the one or more virtual candidate fragments comprises generating a similarity index indicating a level of similarity between the candidate interaction field value data of each of the one or more virtual candidate fragments and the reference interaction field value data of the virtual reference fragment.

Clause 4: The computer-implemented method of clause 3, wherein determining the one or more virtual replacing candidate fragments comprises determining whether the similarity index of each of the one or more virtual candidate fragments exceeds a minimum similarity threshold.

Clause 5: The computer-implemented method of any of the clauses 3-4, wherein determining the one or more virtual replacing candidate fragments comprises ranking the similarity indexes of the one or more virtual candidate fragments and selecting the one or more virtual replacing candidate fragments corresponding to the N highest ranked similarity indexes of the virtual candidate fragments.

Clause 6: The computer-implemented method of clause 5, wherein ranking the one or more virtual candidate fragments further comprises: determining a distance between an anchoring point of the one or more virtual candidate fragments and an anchoring point of the virtual reference fragment; ranking the similarity indexes of the one or more virtual candidate fragments to select the M highest ranked similarity indexes of the virtual candidate fragments based on the distance.

Clause 7: The computer-implemented method of any of clauses 5-6, wherein generating candidate molecule data comprises: obtaining one or more first synthesis chemical reactions for a virtual replacing candidate fragment linker of the one or more virtual replacing candidate fragments from the highest ranked virtual candidate fragments; comparing the first synthesis chemical reactions with one or more second synthesis chemical reactions of a virtual remaining reference fragment linker of the virtual remaining reference fragment from a database of synthesis chemical reactions; verifying if one or more of the first synthesis chemical reactions is the same as one or more of the second synthesis chemical reactions; selecting the virtual replacing candidate fragments having the one or more of the first synthesis chemical reactions equal to the one or more of the second synthesis chemical reactions; generating candidate molecule data based on the selection.

Clause 8: The computer-implemented method of any of the preceding clauses, wherein obtaining the virtual reference fragment comprises applying retrosynthetic rules.

Clause 9: The computer-implemented method of any of the preceding clauses, wherein obtaining the one or more virtual candidate fragments comprises designating one of the virtual reference chemical structure fragments from the plurality of virtual reference chemical structure fragments as the one or more virtual candidate fragments.

Clause 10: The computer-implemented method of any of the preceding clauses, wherein obtaining the one or more virtual candidate fragments comprises: obtaining a spatial orientation of the virtual reference fragment; aligning one of the virtual candidate fragments based on the spatial orientation of the virtual reference fragment.

Clause 11: The computer-implemented method of clause 10, wherein obtaining the spatial orientation of the virtual reference fragment comprises obtaining an expansion center of the virtual reference fragment.

Clause 12: The computer-implemented method of clause 11, wherein the expansion center is a monopole of the multipolar expansion of the interaction field values.

Clause 13: The computer-implemented method of clause 11, wherein the expansion center is an anchoring point of the virtual reference fragment.

Clause 14: The computer-implemented method of any of the preceding clauses, wherein obtaining the one or more virtual candidate fragments comprises filtering the plurality of virtual chemical structures, based on a size of the virtual reference fragment.

Clause 15: The computer-implemented method of any of the preceding clauses, wherein filtering the plurality of virtual chemical structures comprises comparing a size of the plurality of virtual chemical structures with the size of the virtual reference fragment.

Clause 16: The computer-implemented method of any of the preceding clauses, wherein obtaining the one or more virtual candidate fragments comprises filtering the one or more virtual candidate fragments based on a number of linkers of the virtual reference fragment.

Clause 17: The computer-implemented method of any of the preceding clauses, wherein identifying the one or more reference interaction field points of the virtual reference fragment is based on one or more of atom identification data, atom coordinate data, and linker type data of the virtual reference chemical structure.

Clause 18: The computer-implemented method of any of the preceding clauses, wherein the virtual reference chemical structure is a virtual reference molecule.

Clause 19: The computer-implemented method of any of the preceding clauses, comprising receiving virtual reference chemical structure data representing the virtual reference chemical structure.

Clause 20: The computer-implemented method according to clause 19, wherein the virtual reference chemical structure is a virtual reference molecule, and the virtual reference chemical structure data is virtual reference molecule data.

Clause 21: The computer-implemented method of clause 20, wherein obtaining a virtual reference fragment is based on one or more conformations of the virtual reference molecule data.

Clause 22: The computer-implemented method of clause 20, wherein receiving the virtual reference molecule data comprises receiving virtual reference receptor protein data representing a virtual reference receptor protein.

Clause 23: The computer-implemented method of any of the preceding clauses, wherein obtaining one or more virtual candidate fragments comprises: retrieving one or more virtual candidate fragments from a virtual chemical structure fragments library; obtaining one or more virtual candidate fragments based on the retrieved data.

Clause 24: The computer-implemented method of clause 23, wherein obtaining the one or more virtual candidate fragments is based on different conformations of the chemical structure fragments library.

Clause 25: The computer-implemented method of any of clauses 23-24, wherein obtaining the one or more virtual candidate fragments further comprises: obtaining a spatial orientation of the virtual reference fragment; aligning one of the virtual candidate fragments based on the spatial orientation of the virtual reference fragment; determining a distance between an anchoring point of the one or more virtual candidate fragments from the chemical structure fragments library and an anchoring point of the virtual reference fragment; obtaining one or more virtual candidate fragments based on the chemical structure fragments library if the distance is equal to or less than a predefined maximum threshold.

Clause 26: A computational device or apparatus for generating candidate molecule data configured to perform the method according to any of clauses 1 to 25.

Clause 27: The computational device or apparatus of clause 26 further comprising an input module for introducing virtual reference chemical structure data.

Clause 28: A non-transitory computer-readable storage medium storing executable instructions that, when executed by a processor, cause the processor to operate a method according to any of clauses 1 to 25.

Clause 29: A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of clauses 1 to 25.

Clause 30: A computer-implemented method for generating a candidate molecule structure, the method comprising: fragmenting a reference chemical structure; obtaining a reference fragment and a remaining fragment from the fragmented reference chemical structure; selecting a plurality of reference interaction field points of the reference fragment, each of the reference interaction field points being one point in space; determining a reference interaction field value for the reference interaction field point of the reference fragment; providing a plurality of candidate fragments; applying the reference interaction field points of the reference fragment to each of the candidate fragments to select a plurality of candidate interaction field points of each candidate fragment so that each of the candidate interaction field points is one point in space corresponding to the position in space of the reference interaction field point for each of the candidate interaction field points; determining candidate interaction field values for the candidate interaction field points of the candidate fragments; comparing the reference interaction field value with the candidate interaction field values; determining, based on the comparison, a replacing candidate fragment from the candidate fragments; and generating a candidate molecule structure by replacing the reference fragment with the replacing candidate fragment, to synthesize a candidate molecule structure to bind a specific receptor.

Clause 31: The computer-implemented method of clause 30, the method comprising filtering a plurality of chemical structures from a database to obtain the candidate fragments.

Clause 32: The computer-implemented method of any of clauses 30 or 31, wherein generating the candidate molecule structure comprises combining the replacing candidate fragment with the remaining fragment.

Clause 33: The computer-implemented method of any of clauses 30 to 32, wherein comparing the reference interaction field value with the candidate interaction field value comprises generating a similarity index indicating a level of similarity between the candidate interaction field value and the reference interaction field value.

Clause 34: The computer-implemented method of any of clauses 30 to 33, wherein determining the replacing candidate fragment comprises determining whether the similarity index of the candidate fragment exceeds a minimum similarity threshold.

Clause 35: The computer-implemented method of clause 34, wherein determining the replacing candidate fragment comprises ranking the similarity indexes of the candidate fragment and selecting the replacing candidate fragment corresponding to the N highest ranked similarity indexes of the candidate fragment.

Clause 36: The computer-implemented method of clause 35, wherein ranking the candidate fragment further comprises: determining a distance between an anchoring point of the candidate fragment and an anchoring point of the reference fragment; ranking the similarity indexes of the candidate fragment to select the M highest ranked similarity indexes of the candidate fragment based on the distance.

Clause 37: The computer-implemented method of any of clauses 30 to 36, wherein generating a candidate molecule structure comprises: obtaining one or more first synthesis chemical reactions for a replacing candidate fragment linker of the replacing candidate fragment from the highest ranked candidate fragment; comparing the first synthesis chemical reactions with one or more second synthesis chemical reactions of a remaining reference fragment linker of the remaining reference fragment from a database of synthesis chemical reactions; verifying if one or more of the first synthesis chemical reactions is the same as one or more of the second synthesis chemical reactions; selecting the replacing candidate fragment having the one or more of the first synthesis chemical reactions equal to the one or more of the second synthesis chemical reactions; generating the candidate molecule structure based on the selection.

Clause 38: The computer-implemented method of any of clauses 30 to 37, wherein obtaining the reference fragment comprises applying retrosynthetic rules.

Clause 39: The computer-implemented method of any of clauses 30 to 38, wherein providing the plurality of candidate fragments comprises designating one of the reference chemical structure fragments from the plurality of reference chemical structure fragments as the candidate fragment.

Clause 40: The computer-implemented method of any of clauses 30 to 39, wherein providing the plurality of candidate fragments comprises: obtaining a spatial orientation of the reference fragment; aligning the candidate fragment based on the spatial orientation of the reference fragment.

Clause 41: The computer-implemented method of clause 40, wherein obtaining the spatial orientation of the reference fragment comprises obtaining an expansion center of the reference fragment.

Clause 42: The computer-implemented method of clause 41, wherein the expansion center is a monopole of the multipolar expansion of the interaction field value.

Clause 43: The computer-implemented method of clause 41, wherein the expansion center is an anchoring point of the reference fragment.

Clause 44: The computer-implemented method of any of clauses 31 to 43, wherein filtering the plurality of chemical structures from a database to obtain the candidate fragments is based on a size of the reference fragment and/or based on a number of linkers of the reference fragment.

Clause 45: The computer-implemented method of any of clauses 31 to 44, wherein filtering the plurality of chemical structures comprises comparing a size of the plurality of chemical structures with the size of the reference fragment.

Clause 46: The computer-implemented method of any of clauses 30 to 45, wherein selecting the plurality of reference interaction field points of the reference fragment is based on atom properties, atom coordinates, and linker type of the reference chemical structure.

Clause 47: The computer-implemented method of any of clauses 30 to 46, wherein the reference chemical structure is a reference molecule.

Clause 48: The computer-implemented method of any of clauses 30 to 47, comprising receiving a reference chemical structure data representing the reference chemical structure.

Clause 49: The computer-implemented method according to clause 48, wherein the reference chemical structure is a reference molecule, and the reference chemical structure data is a reference molecule data.

Clause 50: The computer-implemented method of any of clauses 47 to 49, wherein obtaining a reference fragment is based on one or more conformations of the reference molecule.

Clause 51: The computer-implemented method of clause 49, wherein receiving the reference molecule data comprises receiving a reference receptor protein data representing a reference receptor protein.

Clause 52: The computer-implemented method of any of clauses 30 to 51, wherein obtaining a candidate fragment comprises: retrieving a candidate fragment from a virtual chemical structure fragments library; obtaining a candidate fragment based on the retrieved data.

Clause 53: The computer-implemented method of clause 52, wherein obtaining the candidate fragment is based on different conformations of the chemical structure fragments library.

Clause 54: The computer-implemented method of clauses 52 or 53, wherein obtaining the candidate fragment further comprises: obtaining a spatial orientation of the reference fragment; aligning the candidate fragment based on the spatial orientation of the reference fragment; determining a distance between an anchoring point of the candidate fragment from the chemical structure fragments library and an anchoring point of the reference fragment; obtaining a candidate fragment based on the chemical structure fragments library if the distance is equal to or less than a predefined maximum threshold.

Clause 55: A computational device or apparatus for generating a candidate molecule structure configured to perform the method according to any of clauses 30 to 54.

Clause 56: A non-transitory computer-readable medium storing executable instructions that, when executed by a processor, cause the processor to operate a method according to any of clauses 30 to 55.

Clause 57: A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of clauses 33 to 56.

Although only a number of examples have been disclosed herein, other alternatives, modifications, uses and/or equivalents thereof are possible. Furthermore, all possible combinations of the described examples are also covered. Thus, the scope of the present disclosure should not be limited by particular examples but should be determined only by a fair reading of the clauses that follow.