FOK MANSON (CN)
US20210371841A1 | 2021-12-02
CN111210871A | 2020-05-29
TRAGNI VINCENZO, PREZIUSI FRANCESCA, LAERA LUNA, ONOFRIO ANGELO, MERCURIO IVAN, TODISCO SIMONA, VOLPICELLA MARIATERESA, DE GRASSI : "Modeling SARS-CoV-2 spike/ACE2 protein–protein interactions for predicting the binding affinity of new spike variants for ACE2, and novel ACE2 structurally related human protein targets, for COVID-19 handling in the 3PM context", THE EPMA JOURNAL, SPRINGER, NL, vol. 13, no. 1, 1 March 2022 (2022-03-01), NL , pages 149 - 175, XP093134368, ISSN: 1878-5077, DOI: 10.1007/s13167-021-00267-w
YAZDANI-JAHROMI MEHDI, YOUSEFI NILOOFAR, TAYEBI AIDA, GARIBAY OZLEM OZMEN, SEAL SUDIPTA, KOLANTHAI ELAYARAJA, NEAL CRAIG J.: "Interpretable and Generalizable Attention-Based Model for Predicting Drug-Target Interaction Using 3D Structure of Protein Binding Sites: SARS-CoV-2 Case Study and in-Lab Validation", BIORXIV, 18 February 2022 (2022-02-18), pages 1 - 11, XP093134375, DOI: 10.1101/2021.12.07.471693
CLAIMS What is claimed is: 1. A method of modulating interaction between a binding protein and a ligand protein, comprising: providing a heterogeneous database comprising a plurality of datasets related to interactions between a first set of proteins and a second set of proteins, wherein the plurality of datasets comprises experimental data from a plurality of experimental techniques; preparing a structure dataset utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair comprising a ligand, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure dataset to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising a plurality of initial candidate binding proteins; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary candidate binding proteins, wherein the secondary binding proteins are selected by the protein interaction algorithm as comprising a modulated interaction with the ligand protein; screening at least some of the plurality of secondary candidate binding proteins for modulated interaction with the ligand protein using an in vivo or in vitro screening assay to identify a set of tertiary candidate binding proteins; and synthesizing at least a portion of the set of tertiary candidate binding protein for use in at least one of an in vitro biomedical assay, an in vivo biomedical assay, and in preparing a therapeutic formulation. 2. 
The method of claim 1, wherein the plurality of experimental techniques comprises collecting experimental data from two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.
3. The method of claim 1 or 2, wherein the binding protein is selected from the group consisting of an antibody, a fragment of an antibody, a single-chain antibody, and a fragment of a single-chain antibody.
4. The method of claim 3, wherein the antibody is a therapeutic antibody or is a result of immunization.
5. The method of one of claims 1 to 4, wherein the ligand is selected from the group consisting of an immune checkpoint protein, a tumor marker, and a component of a pathogen.
6. The method of claim 5, wherein the pathogen is a virus.
7. The method of claim 6, wherein the virus is a coronavirus and wherein the ligand is a spike protein of the coronavirus.
8. The method of one of claims 1 to 7, wherein the modulation comprises an increase in binding between the binding protein and the ligand protein.
9. The method of one of claims 1 to 8, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each first and second protein, wherein each of said graphical unified protein structure models is prepared as a graph G = (V, E), wherein V = {v_1, v_2, …, v_N} are node features for each amino acid residue and E are pairwise edge features between amino acid residues.
10. A composition comprising a mutated ligand generated by the method of claims 1 to 9.
11. A composition for use in treating a viral infection, comprising a mutated ligand generated by the method of claims 1 to 9.
12. The composition of claim 11, wherein the viral infection is a coronavirus infection, and wherein the mutated ligand shows increased affinity for ACE2 relative to a wild type SARS coronavirus spike protein or relative to SARS spike proteins of a plurality of SARS coronavirus variants.
13.
A method of identifying a mutated pathogen with increased infectivity or escape from immunotherapy, comprising: providing a heterogeneous database comprising data related to interactions between a first set of proteins and a second set of proteins, wherein the heterogeneous database comprises experimental data from a plurality of experimental techniques; preparing a structure database utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure database to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising sequence information for an immunotherapy protein or a host receptor and a plurality of initial candidate pathogen ligand proteins originated from mutant pathogens, wherein the immunotherapy protein or the host receptor interacts with a ligand protein of wild type pathogen; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary pathogen ligand proteins, wherein the secondary pathogen proteins are selected by the protein interaction algorithm as comprising a reduced interaction with the immunotherapy protein or an increased interaction with the host receptor ; and screening at least some of the plurality of secondary pathogen proteins for reduced interaction with the immunotherapy protein or increased interaction with the host receptor using an in vivo or in vitro assay; and reporting 
a pathogen comprising a secondary pathogen protein with reduced interaction with the immunotherapy protein relative to wild type pathogen to a practitioner as likely to escape treatment with the immunotherapy protein or comprising a secondary pathogen protein with increased interaction with the host receptor to a practitioner as having increased infectivity.
14. The method of claim 13, wherein the plurality of experimental techniques comprises collecting experimental data from two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.
15. The method of claim 13 or 14, wherein the immunotherapy protein is selected from the group consisting of an antibody, a fragment of an antibody, a single-chain antibody, and a fragment of a single-chain antibody.
16. The method of one of claims 13 to 15, wherein the mutated pathogen is a virus.
17. The method of one of claims 13 to 16, wherein the plurality of initial candidate pathogen proteins comprises a coronavirus spike protein.
18. The method of one of claims 13 to 17, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each first and second protein, wherein each of said graphical unified protein structure models is prepared as a graph G = (V, E), wherein V = {v_1, v_2, …, v_N} are node features for each amino acid residue and E are pairwise edge features between amino acid residues.
19.
A method of improving prediction of binding affinities between a first protein and a second protein, comprising: providing a heterogeneous database comprising data related to interactions between a first set of proteins and a second set of proteins, wherein the heterogeneous database comprises experimental data from a plurality of experimental techniques; preparing a structure database utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure database to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising sequence information for the first protein and the second protein; selecting a first protein and a second protein from the primary library by a user; generating, using the protein interaction algorithm, a predicted binding affinity between the first protein and the second protein and reporting the predicted binding affinity to the user.
20. The method of claim 19, wherein the plurality of experimental techniques comprises collecting experimental data from two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.
21. The method of claim 19 or 20, wherein the first protein is an antibody and the second protein is a ligand.
22. The method of one of claims 19 to 21, wherein the first protein is a coronavirus spike protein and the second protein is ACE2.
23.
A method of generating a high affinity antibody directed to an antigen, comprising: providing a heterogeneous database comprising data related to interactions between the antigen and a set of initial candidate antibody proteins that can form a protein binding pair, wherein the heterogeneous database comprises experimental data from a plurality of experimental techniques; preparing a structure database utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of the protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure database to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising sequence information for the antigen and a plurality of initial candidate antibody proteins originated from an initial antibody directed to the antigen; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary antibody proteins, wherein the secondary antibody proteins are selected by the protein interaction algorithm as comprising an increased interaction with the antigen relative to the initial antibody; and screening at least some of the plurality of secondary antibody proteins for increased interaction with the antigen using an in vivo or in vitro assay to identify a plurality of tertiary antibody proteins having increased affinity for the antigen relative to the initial antibody; and generating an antibody having improved affinity for the antigen from among the 
plurality of tertiary antibody proteins.
24. The method of claim 23, wherein the antibody is selected from the group consisting of a divalent antibody, a fragment of a divalent antibody, a single-chain antibody, and a fragment of a single-chain antibody.
25. The method of claim 23 or 24, wherein the antigen is derived from a pathogen, an immunotherapy target, or a cancer marker.
26. The method of one of claims 23 to 25, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each of the antigen and initial candidate antibody protein pairs, wherein each of said graphical unified protein structure models is prepared as a graph G = (V, E), wherein V = {v_1, v_2, …, v_N} are node features for each amino acid residue and E are pairwise edge features between amino acid residues.
27. A system for deriving protein binding characteristics comprising: a database module, comprising heterogeneous biologic data, wherein heterogeneous biologic data comprises protein sequence data and biologic data originating from a plurality of experimental techniques or forms of expression for the biologic data; a protein representation module, in which heterogeneous data from the database module is used to construct a plurality of graphical hierarchal protein structures for proteins represented in the database module; and an AI module, comprising encoded instructions to utilize the plurality of graphical hierarchal protein structures as a training set to derive a protein interaction algorithm and to apply the protein interaction algorithm to evaluate or estimate binding characteristics of wild type and/or mutated proteins provided to the AI module.
28. The system of claim 27, wherein the plurality of experimental techniques comprises two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.
29.
The system of claim 27 or 28, wherein the database module comprises binding energy estimates derived from experimental data.
30. The system of one of claims 27 to 29, wherein the database module comprises biological data directly related to a protein or proteins being characterized.
31. The system of one of claims 27 to 29, wherein the database module does not comprise biological data directly related to a protein or proteins being characterized.
32. The system of one of claims 27 to 31, comprising an effector.
33. The system of claim 32, wherein the effector is selected from the group consisting of a liquid handling device, a handler for a disposable component, and an incubator.
34. The system of one of claims 32 and 33, comprising a controller communicatively coupled to the effector.
35. The system of claim 34, wherein the controller comprises the AI module.
36. The system of one of claims 27 to 35, comprising a sensor.
37. The system of claim 36, wherein the sensor is selected from the group consisting of a colorimeter, a spectrophotometer, a fluorometer, a luminometer, and an imaging system.
38. The system of one of claims 36 or 37, wherein the sensor is communicatively coupled to the AI module.
39. An ACE2 analog for treating or preventing infection with a coronavirus, wherein the ACE2 analog has an increased affinity for a spike protein of the coronavirus relative to ACE2.
40. The ACE2 analog of claim 39, wherein the ACE2 analog has increased affinities for spike protein of a plurality of coronavirus strains relative to ACE2.
41. The ACE2 analog of claim 40, wherein the ACE2 analog is selected from the group consisting of ACE2-1, ACE2-2, ACE2-7, ACE2-8, and ACE2-9.
42.
A method of generating an ACE2 analog for use in treating or preventing infection with a coronavirus, comprising: providing a heterogeneous database comprising a plurality of datasets related to interactions between a first set of proteins and a second set of proteins, wherein the plurality of datasets comprises experimental data from a plurality of experimental techniques, wherein the first set of proteins comprises spike proteins from a plurality of coronavirus strains and the second set of proteins comprises a native ACE2; preparing a structure dataset utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair comprising a ligand, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure dataset to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising a plurality of initial candidate binding proteins, wherein the candidate binding proteins comprise ACE2 analogs; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary candidate binding proteins, wherein the secondary binding proteins are selected by the protein interaction algorithm as comprising an increased interaction with the ligand protein; screening at least some of the plurality of secondary candidate binding proteins for increased interaction with the ligand protein using an in vivo or in vitro screening assay to identify a set of tertiary candidate binding proteins; and synthesizing at 
least a portion of the set of tertiary candidate binding proteins as ACE2 analogs for use in preparing a therapeutic formulation.
43. The method of claim 42, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each first and second protein, wherein each of said graphical unified protein structure models is prepared as a graph G = (V, E), wherein V = {v_1, v_2, …, v_N} are features for each amino acid residue and E are pairwise edge features between amino acid residues.
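The graph G = (V, E) recited in claims 9, 18, 26 and 43 can be illustrated with a minimal sketch. The claims do not specify which node or edge features are used, so everything below is an assumption made for illustration only: a one-hot residue identity as the node feature, a C-alpha distance plus sequence separation as the edge feature, an 8 Å contact cutoff, and the hypothetical names `ResidueGraph` and `build_residue_graph`.

```python
from dataclasses import dataclass
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


@dataclass
class ResidueGraph:
    nodes: list  # V: one feature vector per amino acid residue
    edges: dict  # E: (i, j) -> pairwise edge feature vector, with i < j


def build_residue_graph(sequence, ca_coords, cutoff=8.0):
    """Build an illustrative residue-level graph G = (V, E).

    Assumed (not from the source): one-hot node features, and edges
    between residues whose C-alpha atoms lie within `cutoff` angstroms.
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("one coordinate triple is required per residue")

    # Node features: one-hot encoding of residue identity.
    nodes = [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS]
             for aa in sequence]

    # Edge features: spatial distance and sequence separation.
    edges = {}
    for i in range(len(sequence)):
        for j in range(i + 1, len(sequence)):
            d = dist(ca_coords[i], ca_coords[j])
            if d <= cutoff:
                edges[(i, j)] = [d, float(j - i)]
    return ResidueGraph(nodes, edges)


# Toy three-residue example with made-up coordinates.
graph = build_residue_graph(
    "QNY", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
)
```

In this toy input only the first two residues fall within the cutoff, so the graph has three nodes and a single edge. A model of a binding pair would additionally need edges that cross the interface between the two proteins, which this sketch does not attempt.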
Table 2
[0088] Five ACE2 variants (ACE2-1, ACE2-2, ACE2-7, ACE2-8 and ACE2-9) were selected for experimental validation and compared to sACE2.v2.4, known to have the highest affinity for RBD binding. ELISA experiments showed that the EC50s of these five variants (ranging from 0.54 μg/ml to 1.36 μg/ml) were lower than those of both wild-type ACE2 (5.2 μg/ml) and sACE2.v2.4 (1.83 μg/ml), indicating high binding affinity as predicted by UniBind™ (as shown in FIG. 4E). The results highlight the potential application of UniBind™ in therapeutic protein engineering.
[0089] A post-hoc Dunnett’s test was used for pairwise comparisons of each variant against ACE2-WT. Results are summarized in Table 3.
Table 3
[0090] Table 2 summarizes results of logistic fit parameters of ACE2 variant binding to S-protein. Mean ± S.E.M. of EC50 and log EC50 were derived from individual logistic fits to ELISA data. Data represent n = 3 for each variant. Statistical analysis was performed across different variants using one-way ANOVA (P < 0.0001). A one-way ANOVA with a post-hoc Dunnett’s comparison revealed that the five ACE2 variants cited above, which were predicted to bind to S-protein with higher affinity, all showed significantly lower log EC50 than ACE2-WT. This demonstrates the applicability of UniBind™ in the rational engineering of therapeutic proteins. The molecular basis for the affinity enhancement in regard to therapeutic proteins useful in treating and/or preventing coronavirus infection can be rationalized from structural analyses as shown in FIGs. 4F and 4G. FIG. 4F shows interactions between wild-type and genetically modified ACE2 and SARS-CoV-2-RBD, where genetically modified ACE2 has an N330Y mutation and interaction with P499 of SARS-CoV-2-RBD. FIG. 4G shows interactions between wild-type and genetically modified ACE2 and SARS-CoV-2-RBD, where genetically modified ACE2 has a Q42L mutation and interaction with Y449 of SARS-CoV-2-RBD.
For example, Q42 is situated in a highly negatively charged area which may prevent interaction with the RBD (refs. 35, 36). Therefore, the Q42L mutation may increase the hydrophobic area of ACE2 and improve binding to Q498 and Y449 on the RBD. Furthermore, the N330Y substitution may provide additional van der Waals contacts and H-bonds with the RBD.
[0091] Inventors have found that methods of the inventive concept provide greatly improved accuracy in predicting effects of single- and multi-point mutations on the strength of inter-protein interactions relative to conventional methods. Table 4 shows typical results of a comparison of methods of the inventive concept and various conventional methods on prediction performance in the SKEMPI 2.0 set with mutation-level validation. Values are Pearson correlations between AI model-predicted ∆∆G data and reported experimental ∆∆G data. S1131: a subset of 1,131 non-redundant interface single-point mutations. S4169: a subset of 4,169 single-point mutations compiled from the SKEMPI 2.0 dataset. S8338: a subset of 4,169 single-point mutations and all the corresponding reverse mutations. M1707: a subset of 1,707 non-redundant interface multi-point mutations.
Table 4
While this exemplary data is generated using the SKEMPI 2.0 data set, Inventors believe that similar improvements are realized on application of methods of the inventive concept to other protein-related data sets.
AI prediction of neutralizing antibody binding affinities to SARS-CoV-2 variants
[0092] Identifying the binding affinity between S-protein and antibody is essential for predicting immune evasion by current and future SARS-CoV-2 variants. UniBind™ can yield escape scores to reflect the S-protein antibody affinity after training using the PADB-SAb dataset.
[0093] Inventors evaluated the performance of UniBind™ on single S-protein mutations using the deep mutational scanning data in PADB-SAb.
Correlation analyses revealed a PCC of 0.85, indicating a good correlation between the predicted and actual escape scores (FIG. 5A, FIG. 5B, FIG. 5C). FIG. 5A shows regression performance of RBD-antibody affinity (escape score) prediction for the effects of different RBD mutations on RBD-Ab binding. FIG. 5B shows a heatmap of an experimental escape score matrix upon mutations of RBD to different antibodies. Brightness represents the escape score. A brighter dot indicates that a mutation at the site position on the x-axis is more likely to lead to higher immune escape from the antibody on the y-axis. MAE, mean absolute error; R², coefficient of determination; PCC, Pearson’s correlation coefficient. FIG. 5C shows a heatmap of a predicted escape score matrix upon mutations of RBD to different antibodies. Brightness represents the escape score. A brighter dot indicates that a mutation at the site position on the x-axis is more likely to lead to higher immune escape from the antibody on the y-axis. Moreover, the predicted escape score showed a good correlation with neutralization experiment data in the PADB-SAb dataset (FIG. 5D, FIG. 5E, FIG. 5F, FIG. 5G), with all PCCs above 0.8 in four classes of antibodies, validating its utility in predicting antibody escape. Antibody escape scores range from 0 to 1, with increasing scores denoting higher levels of escape. The average escape scores of each site on the RBD are shown in FIG. 5H. UniBind™ prediction of the RBD sites essential for immune escape is consistent with experimental data. FIGs. 5D to 5G show stratified analysis of regression performance on 4 classes of neutralization antibodies. FIG. 5H shows the average mutational effects on escape score at each RBD site. The blue line represents experimental data; the orange line represents predicted results; shadows of each color indicate the standard error of each line.
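The three regression metrics named in the figure legends above (PCC, MAE and R²) can be computed as in the following minimal sketch. The function names and the four example score pairs are invented for illustration and are not data from the source; R² is taken here as 1 minus the ratio of residual to total sum of squares, which is one common definition and is assumed rather than confirmed by the source.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient (PCC) between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute gap between experiment and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up escape scores in [0, 1], for illustration only.
experimental = [0.1, 0.4, 0.5, 0.9]
predicted = [0.2, 0.3, 0.6, 0.8]

pcc = pearson(experimental, predicted)
mae = mean_absolute_error(experimental, predicted)
r2 = r_squared(experimental, predicted)
```

PCC measures only linear association, so a model with a systematic offset can still score a high PCC; reporting MAE and R² alongside it, as the figure legends do, exposes such offsets.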
[0094] Next, to predict the antibody escape ability of different S-protein variants, Inventors utilized UniBind™ to generate an escape score matrix of common variants to current antibodies (FIG. 5I). FIG. 5I shows an escape score matrix of S-protein variants to different antibodies calculated by UniBind™. The x-axis is antibodies, and the y-axis is common single point mutations and variants of concern (VOCs). The higher the escape score, the easier it is to escape. The results are consistent with the current consensus that Omicron and its derivative variants display the strongest immune escape ability.
[0095] To further validate the performance, Inventors set several thresholds for neutralization data according to the original literature, and divided the predicted data into two groups: Escape or Non-Escape. The receiver operating characteristic (ROC) plot shows that our predicted escape score could accurately identify the ability of different variants to escape neutralization by different antibodies, with an area under the curve (AUC) of 0.944 (FIG. 5J).
[0096] Stratification analysis showed that our model has a robust performance on both single mutation variants (AUC = 0.954) and VOCs (AUC = 0.937). A subgroup analysis of antibody escape profile on eight VOCs indicates that the predicted escape score is closely correlated with a neutralization escape profile (FIG. 5K). FIG. 5K shows predicted escape scores of antibodies as individual VOC boxplots. For each analysis, antibodies were separated into two groups that can be escaped or not escaped by SARS-CoV-2 variants according to relevant literature. The center line indicates median; box limits indicate upper and lower quartiles; whiskers indicate 1.5× interquartile range; points indicate outliers; P values less than 0.05, 0.01, 0.001, 0.0001 are summarized with one to four asterisks, respectively.
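The Escape/Non-Escape validation described in paragraphs [0095] and [0096] can be sketched as follows. The source does not state which neutralization quantity was thresholded or what the cutoffs were, so the fold-reduction values, the threshold of 10, the predicted scores and the function names below are all hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen Escape case receives a higher predicted score than a randomly chosen Non-Escape case, with ties counted as one half.

```python
def label_escape(fold_reductions, threshold):
    """Binarize neutralization data: True means Escape (assumed rule)."""
    return [value >= threshold for value in fold_reductions]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for is_pos, s in zip(labels, scores) if is_pos]
    negatives = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical fold reductions in neutralization for five variant-antibody pairs.
fold_reductions = [1.2, 15.0, 2.0, 40.0, 0.9]
labels = label_escape(fold_reductions, threshold=10.0)

# Hypothetical predicted escape scores for the same five pairs.
predicted_scores = [0.10, 0.38, 0.35, 0.90, 0.40]
auc = roc_auc(labels, predicted_scores)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a sort-based rank computation would be preferable for large matrices of variant-antibody pairs.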
AI longitudinal prediction on viral evolution and antibody escape
[0097] Inventors have found that UniBind™ can accurately predict the S-protein ACE2 binding affinity and potential for antibody escape. Based on these findings Inventors believe that UniBind™ can be used to perform a prospective analysis on potential SARS-CoV-2 variants, including AI-based lineage analysis, AI-based deep mutational scanning, and model-guided evolution. Prediction of the affinity and immune escape of current and future VOCs can pre-emptively inform pandemic control measures such as vaccinations and targeted therapeutics development.
[0098] Inventors initially utilized the UniBind™ model to assess S-protein trimer-ACE2 binding affinities and antibody binding affinities, and plotted the predictions for each variant against the time of its appearance in the course of the COVID-19 pandemic (FIG. 6A), including the recently identified Omicron BA.4 and BA.5 variants. FIG. 6A shows predicted ACE2 binding affinity and antibody escape versus dates of variant emergence in the course of the COVID-19 pandemic. Circles represent reported SARS-CoV-2 variants from GISAID data; circles annotated with common VOC names and their PANGO lineage represent variants examined in a number of experimental ways. Plotted dimensions: time from January 2020 to September 2022 (x-axis), antibody escape scores (y-axis), ACE2 affinity (color and circle size; a redder and larger circle means increased affinity). AI prediction showed an overall trend for the newer variants to have a higher antibody escape score. The Omicron sub-lineages BA.1 and BA.2, which emerged late in 2021, showed a reduction in ACE2 affinity but enhanced antibody escape. The Omicron sub-lineages 22A (BA.4), 22B (BA.5), and 22C (BA.2.12.1) were predicted to show enhanced S-protein ACE2 binding affinity as well as an increase in antibody escape.
UniBind™’s predicted S-protein ACE2 affinities on all VOCs are consistent with those reported in the literature, with a PCC score of 0.74 in the RBD-ACE2 affinity prediction (FIG. 6B), and a PCC score of 0.89 in the S-protein trimer-ACE2 affinity prediction (FIG. 6C). FIG. 6B shows a correlation between AI-generated measurements of S-protein trimer-ACE2 affinity with experimental results. FIG. 6C shows a correlation between AI-generated measurements of RBD-ACE2 affinity with experimental results. Inventors have noted that such experiments performed using prior art methods showed inconsistent affinity measurements on some variants using a single RBD or trimer. However, UniBind™ predictions were in good agreement with those reported in the literature, suggesting an intrinsic affinity difference between the RBD-ACE2 and S-protein trimer-ACE2 interactions.
[0099] Inventors also performed in silico deep mutational scanning prediction on long amino acid sequences to generate a sequence-function profile, to simultaneously predict the consequences of amino acid substitution mutations on ACE2 affinity in the RBD segment (FIG. 6D) as well as the antibody escape status (FIG. 6E). FIG. 6D shows a heat map illustrating the effect of mutations in the RBD segment on ACE2 binding affinity changes; red means increased affinity and blue means decreased affinity. FIG. 6E shows a heat map showing the effect of mutations in the RBD segment on antibody escape scores; blue means decreased antibody binding affinity. The results of this approach are highly consistent with previous studies that used an experimental deep mutational scanning method to measure RBD mutations that affect ACE2 affinity (FIG. 6D). Moreover, UniBind™ can simultaneously predict affinity changes on multiple mutations, such as those from all 16 VOCs, which addresses the problem of a heterogeneous batch effect in experimental biology.
In addition, escape scores were calculated by surveying 80 neutralizing antibodies and averaging the AI-predicted escape scores across these antibodies to generate a sequence-to-escape heatmap (FIG. 6E), which can better reflect a variant’s overall immune escape ability.
[00100] The evolution of SARS-CoV-2 is mainly determined by two broad categories of changes to the virus: infectivity/transmissibility or immune response escape. To address this, Inventors designed an affinity-based evolutionary score (evo-score) system that takes into consideration both S-protein ACE2 binding affinity and S-protein antibody binding affinity, and compared these data to analogous scores calculated using a mutational fitness analysis method. Analysis of the UniBind™-predicted evo-score in the context of viral fitness showed a high correlation with the corresponding mutational fitness profile of each variant (FIG. 6J). Using the five existing Omicron variants (BA.2, BA.2.3, BA.4, BA.5, BA.2.12.1) as a starting point, UniBind™ predicted 7,560 variants based on combinations of 1 to 4 non-synonymous mutations drawn from the 40 substitutions with a high score of non-synonymous mutations. Stratifying the variants using this approach reveals a number of clusters consistent with the overall evolution of SARS-CoV-2 variants. Predictions for a viral evolution analysis of SARS-CoV-2 were validated using the GISAID dataset. The predicted viral fitness landscape of the current variants from the GISAID database is shown in FIG. 6F. FIG. 6F shows characteristics of a viral lineage evolutionary path. Contour lines represent the affinity-based landscape. Each circle marks a cluster of variants with a similar evolutionary property. UniBind™ also predicted unknown variants (blue area, FIG. 6F) for their S-protein ACE2 affinity and antibody escape ability.
According to the clustering obtained from our prediction, the evolution of SARS-CoV-2 was initially directed toward affinity enhancement, with only a modest incremental antibody escape (purple to red, FIG. 6F). The emergence of Omicron (BA.1) and Omicron (BA.2) represented a mild reduction in S-ACE2 affinity but dramatically enhanced antibody escape (red to green, FIG. 6F), which may be a result of immune pressure from previous infection or of vaccine-induced immune selection. The model predicted that the BA.4 and BA.5 variants can display further antibody escape, with little change in S-ACE2 binding affinity, relative to BA.1 and BA.2. Interestingly, the Omicron BA.2.12.1 variant displays a near one log-unit improvement in ACE2 affinity, without large changes in antibody escape. UniBind™ predicted a stronger overall evo-score for variants evolving in the direction of a higher antibody escape ability.
[00101] Inventors applied the evo-score to predict the evolution of the Omicron variants (FIG. 6G). FIG. 6G shows characteristics of the evolution of AI-predicted new variants based on five Omicron lineages. Blue dots represent new variants, green dots represent the original Omicron lineages, and orange dots represent variants of interest with the five highest evo-scores for each Omicron lineage. UniBind™ can predict the evolution of the Omicron variants to even higher evo-scores driven by several key non-synonymous mutations, particularly A475E, which occurs most frequently. The main determinant of higher evo-scores in these predicted variants is enhanced antibody escape, with their S-protein ACE2 affinity values remaining around the same. Finally, UniBind™ also predicted that there is a possibility for future variants evolving to high S-protein ACE2 affinity values, underscoring a risk for potentially more virulent strains (FIGs. 6H and 6I). FIG. 6H shows distribution maps of ACE2 binding affinity of variants from subsampled GISAID data.
FIG.6I shows distribution maps of AI-predicted ACE2 binding affinity values of potential variants. FIG.6J shows correlation analysis between reported fitness and affinity-based evolutionary score (evo-score). FIG.6K shows characteristics of single mutations’ effects on ACE2 binding affinity, antibody escape, and evo-score. Dashed lines represent contour lines of evo-score; blue dots represent single mutations; orange dots show several well-known mutations which could significantly improve virus evolution. FIG.6L shows effects of essential mutations on evo-scores of five recent Omicron lineages. Circle size and color represent appearance frequency in top score variants that were derived from a relevant original lineage. [00102] Inventors have combined heterogeneous sets of protein-protein binding affinity data with AI-based protein sequence-to-function modeling to systematically identify and determine various affinity related tasks. There are a number of challenges in such an approach to prediction of protein-protein interactions. First, protein representation needs to take into consideration the entirety of the binding interface and the residues which form chemical bonds with each other in the setting of protein-protein interactions. Secondly, prior learning approaches have limitations due to poor scalability to large datasets, and predictions limited to single mutation variants. Therefore, computational integration methods with scalability represent a significant hurdle towards development of rapid, robust, and accurate tools for assessment of protein interactions, which have applicability to a wide variety of protein:protein interaction application, such as infectivity and neutralization escape for new SARS-CoV-2 variants. [00103] Inventors have both developed a generalizable modular UniBind™ framework and demonstrated real-world application of same. 
A graph neural network was developed for protein representation by integrating the information on both the residue main backbone at a molecular level and the side chain information at an atomic level, and their interactions. A novel BindFormer block approach was developed to learn their interactions measured by quantum physics and thermodynamics. Therefore, UniBind™ is more in-tuned for protein-protein interaction prediction, as both structural changes and energetic effects are crucial for protein−protein binding affinity prediction. Furthermore, UniBind™ integrates several heterogeneous sources of datasets and performs multi-task learning and model assembling to predict various task-specific affinity changes (for example S-protein ACE2 interaction and antibody escape scores). Inventors have validated UniBind™ on major publicly available datasets for affinity prediction, and this has demonstrated that UniBind™ is accurate, robust, and scalable. Another advantage of UniBind™ prediction on the S-protein ACE2 affinities is that it is based on using a full-length S-protein which is very desirable and feasible by AI, yet not feasible or impractical by biological experiment designs. Inventors have found that prior art experimental methods using affinity values of the RBD-ACE2 interaction to represent affinities of the full- length S-protein ACE2 interaction have technical challenges, limitations, and inaccuracies. Inventors have also found that UniBind™ provides methods for designing high affinity ACE2 receptor decoy molecules as a general strategy to target all current and future variants and validated it experimentally. [00104] New variants of SARS-CoV-2 have and will continue to emerge that have significant improvements in their fitness, which drives waves of cases in the pandemic. SARS-CoV-2 variant fitness can be measured by the affinity between the S-protein and ACE2, and the affinities between the S-protein and its neutralizing antibodies. 
The first interaction provides a measure of infectivity and the second for the immune escape potential. These can be assessed using in vitro approaches in the laboratory, but this is potentially hazardous, time-consuming, costly, and error-prone (e.g., cross contamination, human error, etc.). Such prior art approaches also do not provide for the evaluation of the large numbers of variants which are currently being generated, and can only increasingly lag real-world needs as new variants emerge. Using UniBind™ deep mutational scanning, Inventors have developed an affinity-based evolutionary score (evo-score) system to take into consideration of S-protein ACE2 binding affinity and S- protein antibody binding affinity. To the best of the Inventor’s knowledge, this is the first time for construction and quantification of a SARS-COV-2 variant fitness landscape using these two main determinant factors, which can reveal the biological mechanisms of fitness and potential evolution trends. [00105] UniBind™ accurately predicted a significantly increased potential for immune escape, yet a modestly increased affinity of the Omicron variants S-protein towards ACE2. This is consistent with the growing body of in vitro and clinical data on the Omicron sub-lineages (BA.1, BA.2, BA.4, and BA.5) with respect to its increased transmission, neutralizing antibody escape, and/or decreased vaccine efficacy. UniBind™ predicted that the S-protein of the BA.4 and BA.5 variants have an increased antibody escape score, but its affinity towards ACE2 remains similar to the BA.2 variant. This means that the infectivity and severity of the BA.4 and BA.5 variants is expected to be comparably low, like the BA.2 variant. Inventors believe that the therapeutic efficacy of current neutralizing antibodies will be further compromised against the BA.4 and BA.5 variants. 
Importantly, UniBind™ predicted that additional mutations in the Omicron BA.4 background will result in variants with reduced S-protein affinity towards ACE2 (Figure 6g and Figure 6h). In contrast, UniBind™ predictions of the potential variants with the strongest S-protein ACE2 affinities pointed to lineages from clade 19B and Alpha (B.1.1.7) of the first wave of infections from December 2020 to May 2021, all of which carry the N501Y mutation (FIG.6A and FIG.6H). Other variants with the strongest S-protein ACE2 affinities shared the L452R mutation carried by the recently emerged Omicron BA.2 (FIG.6H). It was previously reported that N501Y and L452R increase S-ACE2 binding affinity. More than 30,000 current and predicted future SARS-CoV-2 variants were searched and evaluated. Inventors found that there may be future variants with an increased S-protein affinity towards ACE2, potentially emerging from some of the previous lineages such as the Alpha or Beta VOCs (Figure 6a, Figure 6i). Moreover, the current BA.1-4 lineages are less likely to evolve into a strain with a much higher S-ACE2 affinity (FIG.6I). [00106] Systems and methods of the inventive concept provide a general framework for predicting the affinity of a protein-protein complex, and so provide methods directed to rapid prediction and screening for future outbreaks and information for future vaccine development. However, their impact on biology and medicine goes far beyond VOC predictions. Examples of other applications include methods for designing better antibodies for immune-cancer therapies and designer drugs for a ligand-receptor interaction. Systems and methods of the inventive concept can include codes and datasets available through governmental Infectious Diseases Control Units as well as the entire scientific and medical communities in order to take full advantage of these resources. 
[00107] In some situations a decrease in the S-protein ACE2 binding affinity may be paralleled/ compensated by an increase in the viral replication efficiency, e.g. more efficient viral polymerase or other viral replication related proteins mutants, more efficient packaging mutants, or an improved host-viral interaction, all of which may facilitate viral survival or fitness. It is known that for virus to evolve, the various part of the genome will evolve in such a way that will cluster into the same genotype or subtypes. Therefore, even though there may be changes in other viral genome to account for viral fitness, such as enhanced viral replication efficiency, the changes in the S-protein should also be reflective of the same evolutionary changes. In some embodiments of the inventive concept, UniBind™ can incorporate and utilize data representing other parts of the viral genome (e.g., those related to reproductive efficiency) and utilize such data within functional models as described above for use in evaluating current and projected mutations for virulence, escape from immune protection, etc.. [00108] [00109] It should be appreciated that the AI-based methods for predicting protein-protein interactions have a variety of practical applications. In some embodiments such an AI-based approach can be used in methods to predict the effects of specific mutations in a protein’s amino acid sequence on interactions with one or more binding ligands (such as the same or a different protein, a nucleic acid, a carbohydrate polymer, etc.). If such a binding ligand is involved in a disease process, accurate prediction of mutations that provide for enhanced binding (e.g., providing a lower binding constant) relative to an initial therapeutic protein selected for optimization can be used to generate a library of one or more mutated proteins with improved binding characteristics. 
Such mutated proteins can then be utilized in screening studies to identify those with binding characteristics that can provide an improved therapeutic protein. Such screening studies can be performed in vitro (e.g., microplate or microbead-based binding studies using labeled proteins) and/or in vivo (e.g., using animal models of disease). For example, a method using an AI-based approach as described above can begin with a therapeutic antibody selected to bind to an immune checkpoint protein (such as PD-1 or PD-L1) to generate a library of one or more mutated antibodies with increased binding affinity for the immune checkpoint protein. Elements of such a library can then be screened for increased binding, for example using an ELISA directed to the immune checkpoint protein. Alternatively, or in addition, such elements of such a library can be screened using an animal model for a PD-1 bearing cancer. Screening methods utilizing cells grown in culture and/or artificial organ systems can also be used. In some embodiments data from such screening experiments can be provided to the AI-based method as an experimental database, which can in turn be used to refine results from the AI-based method, [00110] In some embodiments the target ligand can be associated with an infectious disease. In such embodiments the binding ligand can be a component of the pathogen, such as a surface protein or glycoprotein. Such a binding ligand can be directly involved in the disease process (e.g., a viral protein utilized in host cell recognition and entry) or can simply be characteristic of the pathogen. In such embodiments a method using an AI-based approach as described above can begin with a therapeutic antibody selected to bind to a target ligand of the pathogen to generate a library of one or more mutated antibodies with increased binding affinity for the target ligand. 
Elements of such a library can then be screened for increased binding, for example using an ELISA directed to the target ligand. Alternatively, or in addition, such elements of such a library can be screened using an animal model for the infectious disease. Screening methods utilizing cells grown in culture and/or artificial organ systems can also be used. In some embodiments data from such screening experiments can be provided to the AI-based method as an experimental database, which can in turn be used to refine results from the AI-based method, [00111] Alternatively, in some embodiments methods using an AI-based approach as described above can be used to identify strains or mutations of a pathogen with reduced affinity for binding interactions with a therapeutic protein. For example, mutated proteins encoded by emergent or potential pathogenic virus strains can be scored for binding to therapeutic proteins that interact with the corresponding wild-type protein. Strains expressing reduced binding to the therapeutic protein can be aggregated in a library of pathogen strains that may escape treatment with the therapeutic protein. Elements of such a library can then be screened for reduced binding, for example using an ELISA directed to the therapeutic protein. Alternatively, or in addition, such elements of such a library can be screened using an animal model for the infectious disease. Screening methods utilizing cells grown in culture and/or artificial organ systems can also be used. In such embodiments an AI-based approach as described above can be used to develop a library of mutated therapeutic proteins with enhanced interaction with elements of the library of mutated pathogen proteins, permitting identification of potential therapies as the mutated pathogen becomes prevalent. 
In some embodiments data from such screening experiments can be provided to the AI-based method as an experimental database, which can in turn be used to refine results from the AI-based method. [00112] Wildlife species are known reservoirs for coronaviruses. This raises the possibility that zoonoses and/or reverse zoonoses can occur between humans and animals, providing opportunities for rapid evolution of SARS-CoV-2 and/or SARS-CoV-1 to generate new and potentially highly virulent variants. Prediction of S-protein binding to ACE2 orthologues in different species can facilitate surveillance and early warning of potentially virulent strains of coronavirus in susceptible wildlife species. Inventors applied UniBind™ to predict cross-species binding affinities of RBD and ACE2, using high-throughput assay data that profiled binding affinity across sarbecoviruses and ACE2 orthologues. The resulting heatmap and association analysis showed that the predictions generated by UniBind™ were highly correlated with the experimental data, with a PCC of 0.87. FIG.4H shows a heatmap of S-protein–ACE2 binding affinities across species. The left panel of FIG.4H shows AI-predicted values generated by a method of the inventive concept. The right panel shows corresponding experimental data. Sarbecoviruses are colored by clade. FIG.4I shows a typical regression analysis of predicted versus experimental affinity change between S-proteins of sarbecoviruses and ACE2 orthologues of humans. FIG.4J shows a heatmap of predicted affinity values for S-protein–ACE2 binding between SARS-CoV-2 variants and ACE2 proteins from 24 animal species. Tiles with labels (circles and dots) represent the affinities between related ACE2 orthologues and SARS-CoV-2 spike variants. Circles indicate variants reported to bind the relevant ACE2 orthologues; dots indicate variants reported not to bind the relevant ACE2 orthologues. 
[00113] It should be appreciated that methods as described above can be performed using an automated or partially automated system. Such a system can include a computer encoding elements of the AI and that is in communication with suitable databases, as can include effectors and sensors suitable for performing physical screening studies. Suitable effectors include liquid handling devices, handlers for disposable components (e.g., test plates), and incubators. Suitable sensors include colorimeters, spectrophotometers, fluorometers, luminometers, imaging systems, etc., Such systems can include a controller for directing effector functions. Such a controller can include encoded instructions for the performance of screening assays of candidate proteins identified by the AI-based methods. In some embodiments data generated by sensor systems can be provided as an experimental database that is in communication with the computer encoding elements of the AI. Methods Expression and Purification of Recombinant ACE2 Variants [00114] The mature polypeptides of human ACE2 (GenBank NM_021804.1) wild type and variants were cloned into eukaryotic expression plasmid pFcIg (ABLINK Biotech) with a C- terminally fused Fc region of human IgG1 using Gibson Assembly method. The DH10B competent cells (ABLINK Biotech) were electroporated with assembly products and cultured on LB agarose plates containing 25 μg/mL Zeocin (Invitrogen). Monoclonal colonies were selected and sequenced to confirm the mutations. Then monoclonal colonies were cultured in LB containing 25 μg/mL Zeocin overnight to enhance the plasmid yield. Recombinant plasmids were extracted using an endotoxin removal plasmid extraction kit (TIANGEN™).50 ml HEK 293F cells were transfected with 25 ng recombinant plasmids using FectoPRO(ployPlus)™ transfection reagent to express target proteins. The culture medium was collected after 5 days incubation. 
Recombinant ACE2 proteins were extracted using protein A dextran and purified using SDS-PAGE electrophoresis. The obtained recombinant ACE2 proteins were in the natively dimeric form. ELISA EC50s of ACE2 mutants binding to RBD were measured by indirect ELISA as previous described 3. Wells of a 96-well plate were coated with 200 ng recombinant RBD protein (ABLINK Biotech) at 4℃ overnight. After removing the supernatant, the wells were blocked using 1% BSA at room temperature for 2 hours and then washed using wash buffer PT (Abcam) for 3 times.10 mg/ml ACE2, ACE2-1, ACE2-2, ACE2-7, ACE2-8, ACE2-9 and sACE2. v2.4 recombinant proteins were diluted at a ratio of 1:3 into 7 concentrations and added to blocked plate with 100 μl per well. After incubating at room temperature for 2 hours, each well was washed and added 100 μl HRP-conjugated Anti-human IgG Fc antibody (Sigma) for 1 hour at room temperature. Unbound antibody was removed by washing, and TMB substrate (Solarbio) was added for colorimetric reaction. After 5-10 min incubation, 50 μl 1 M Phosphoric acid were added to each well to stop the reaction. The signal, which reflects the absorbance of the product, was measured at 450 nm. The 4-parameter logistic model was applied to calculate the EC50 of ACE2 binding according to the NR values of serially diluted samples. Dataset characteristics [00115] Protein-protein binding affinity datasets used here were constructed from the SKEMPI V2.0 database and literature. SKEMPI V2.0 is a manually curated database which includes affinity changes upon mutations for structurally-solved protein-protein interactions, which currently contains 7,085 mutations in total. There are many kinetic or thermodynamic parameters that were reported by SKEMPI V2.0 database; here Inventors used dissociation constants (Kd) to represent affinity. The structures of mutant and wild-type protein complexes were downloaded from SKEMPI V2.0 website (https://life.bsc.es/pid/skempi2). 
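As an illustration of the four-parameter logistic (4PL) model mentioned in the ELISA method above, the following is a minimal sketch only. The dilution series, signal values, and parameter grids are synthetic and hypothetical, and the grid search with a linear least-squares inner step is an assumed fitting strategy, not the procedure the Inventors used.

```python
import numpy as np

def four_pl(x, bottom, top, ec50, hill):
    """4PL curve: the response rises from `bottom` to `top` as concentration x grows."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)

def fit_four_pl(conc, signal, ec50_grid, hill_grid):
    """Grid search over (ec50, hill); bottom and top come from linear least squares."""
    best = None
    for ec50 in ec50_grid:
        for hill in hill_grid:
            g = 1.0 / (1.0 + (ec50 / conc) ** hill)
            # For fixed (ec50, hill) the model is linear: signal = bottom + span * g
            design = np.column_stack([np.ones_like(g), g])
            coef, *_ = np.linalg.lstsq(design, signal, rcond=None)
            sse = float(np.sum((design @ coef - signal) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[0] + coef[1], ec50, hill)
    _, bottom, top, ec50, hill = best
    return bottom, top, ec50, hill

# Hypothetical 1:3 dilution series with 7 concentrations (arbitrary units)
conc = 10.0 / 3.0 ** np.arange(7)
# Synthetic, noise-free absorbance values generated from known parameters
signal = four_pl(conc, bottom=0.05, top=2.0, ec50=0.5, hill=1.2)

ec50_grid = np.logspace(-2, 1, 301)
hill_grid = np.linspace(0.5, 2.0, 16)
bottom, top, ec50, hill = fit_four_pl(conc, signal, ec50_grid, hill_grid)
print(f"estimated EC50 = {ec50:.3f}")
```

By construction the EC50 is the concentration at which the curve sits halfway between its lower and upper plateaus, which is why it serves as the summary of binding strength for each ACE2 variant.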
[00116] The effect of SARS-CoV-2 RBD mutations on RBD-ACE2 binding affinity was collected from literature sources, which were measured by apparent dissociation constants (KD,app) using deep mutational scanning approach. The effect of ACE2 mutations on RBD- ACE2 binding affinity was collected from literature sources and estimated by log2 enrichment ratio which was calculated by comparing transcript frequencies between enriched cell populations and naïve plasmid library. Flow cytometry analysis data was utilized for validation. The effects of RBD mutations on RBD-antibody binding affinity were collected from literature sources and estimated by escape score which were calculated by comparing barcode frequencies of variants between immune escape cell populations and reference populations followed by normalization within each antibody. Neutralization assay data from recent literature was included as validation. The structure of wild-type RBD-ACE2 complex was obtained from Protein Data Bank (PDB) with accession number 7df4. The structures of Spike-antibody complexes were collected and curated from Protein Data Bank (PDB). Structures of mutation harboring RBD- ACE2 complex were derived from the wild type structure by replacing amino acids with substitutions and optimized using EvoEF2 (Huang et al., 2020). The fitness of reported SARS- CoV-2 variants and existing variants were collected from GISAID data (22 May 2022). AI model overview [00117] Systems and methods of the inventive concept utilize a deep learning framework named UniBind, developed by the Inventors to estimate the functional impact of mutations on affinities of protein-protein interactions learned from multiple heterogeneous protein complex datasets and to search for unseen proteins with desired properties. 
There are three major components in this framework: a hierarchical protein representation as a graph at atom level and residue level; a new dual-path neural network named BindFormer, with geometry and energy attention (GEA) modules for aggregating messages and iterative refinement; and a multi-task learning method for heterogeneous biological data training. Finally, Inventors developed an affinity-based prospective analysis module for comprehensive analysis including lineage analysis, AI-generated deep mutational scan, fitness landscape depiction, and variant evolution. Protein representation as a graph [00118] Given an input of the wild type structure and its corresponding mutant structure, Inventors represented it as an attributed graph encoded with sequence, geometry, and energy information at both residue level and atom level. Specifically, for a 3D protein structure, systems and methods of the inventive concept transform it into a unified protein representation as a graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_N\}$ are node features for each residue and $E = \{z_{ij}\}_{i \neq j}$ are pairwise edge features between residues. For each residue, systems and methods of the inventive concept encode it as $v_i = (x_{r,i}, x_{a,i})$, where $x_{r,i} = (h_{r,i}, R_{r,i}, t_{r,i})$ is a residue feature and $x_{a,p} = (h_{a,p}, R_{a,p}, t_{a,p})$ is an atomic feature for atom $A_p$ in the atom set $\mathcal{A}_i$ of residue $i$. In both features, $h$ is the embedding vector based on the amino acid or atom types, residue sequence indices, chain ids, and mutant types. $t$ and $R$ are translation and rotation terms calculated from the coordinates of three specific atoms using the Gram–Schmidt process, where N − Cα − C are applied for a residue and − $A_p$ − C are applied for an atom $A_p$ in a residue. 
As changes in the protein−protein binding affinity upon a mutation are determined by both structural changes and energetic effects, systems and methods of the inventive concept utilize edge features to capture the energy of a biomolecule conformation. For each $z_{ij}$ in the edge feature set $E$, $z_{r,ij}$ is an energy term between residue $i$ and residue $j$, and $z_{a,pq}$ are energy terms between atoms $A_p$ and $A_q$. Following the setting in the Flex ddG method within the Rosetta macromolecular modeling suite, Inventors chose seven energy terms from Rosetta’s all-atom energy function for the calculation of ∆∆G, including solvation, hydrogen bonding, electrostatics, and Lennard-Jones atomic packing interactions. BindFormer with geometry and energy attention [00119] Based on the unified protein representation, Inventors developed BindFormer, a dual-path neural network that extracts and combines residue- and atom-level information around mutant sites to predict changes in protein−protein binding affinity upon a mutation. As the geometric and energy features are two key determinants for protein-protein interactions, Inventors implemented geometry and energy attention (GEA) to incorporate them into the message passing in the network. [00120] In a BindFormer block, given the input of residue feature $h_r$ and atom feature $h_a$, Inventors derived transformed features. In the process of the dual path with GEA layers, the $i$-th residue feature $h_{r,i}$ is first transformed to the atom level and combined with the feature $h_{a,p}$ of each atom $p \in \mathcal{A}_i$ using a multilayer perceptron (MLP). Then the atomic GEA layer aggregates atom-level messages from neighbor residues, where $x_{a,e}$ and $x_{a,g}$ are energy and geometry terms at the atom level. After the atom messages are passed among residues, the atom features for residue $i$ are aggregated and further propagated using the residue GEA layer, where $x_{r,e}$ and $x_{r,g}$ are energy and geometry terms at the residue level. A normalization layer is used after the residual connection. 
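The Gram–Schmidt construction of a rotation and translation from three atoms, as described above, can be sketched as follows. This is an illustrative sketch only: taking the middle atom as the origin, the axis ordering, and the coordinates used are assumptions that the text does not specify.

```python
import numpy as np

def local_frame(a, b, c):
    """Build (R, t) from three atom coordinates with the Gram-Schmidt process.

    Assumption: the middle atom `b` (e.g. C-alpha of a residue) is the origin,
    the first axis points from b to c, and the second lies in the a-b-c plane.
    """
    t = np.asarray(b, dtype=float)
    v1 = np.asarray(c, dtype=float) - t
    v2 = np.asarray(a, dtype=float) - t
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - np.dot(v2, e1) * e1        # remove the component along e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)                # completes a right-handed basis
    R = np.stack([e1, e2, e3], axis=1)   # columns are the local axes
    return R, t

def to_local(R, t, x):
    """Express a global point x in the local frame (R, t)."""
    return R.T @ (np.asarray(x, dtype=float) - t)

# Illustrative backbone coordinates in angstroms (hypothetical values)
n_atom, ca_atom, c_atom = (-0.525, 1.363, 0.0), (0.0, 0.0, 0.0), (1.526, 0.0, 0.0)
neighbor = (2.0, 1.0, 0.5)

R, t = local_frame(n_atom, ca_atom, c_atom)
local = to_local(R, t, neighbor)

# Apply a rigid motion to every point: the local coordinates do not change.
Q = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
shift = np.array([3.0, -2.0, 5.0])
move = lambda p: Q @ np.asarray(p, dtype=float) + shift
R2, t2 = local_frame(move(n_atom), move(ca_atom), move(c_atom))
local_moved = to_local(R2, t2, move(neighbor))
```

Expressing neighbor positions in such local frames is one common way to make features independent of the global pose of the complex.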
Multi-task learning for heterogeneous biological data integration [00121] Inventors developed a framework with an affinity consistent constraint loss, which bridges the gap between experiments by modeling affinity across experiments explicitly and trains the model with joint datasets. The multi-task learning framework consists of a shared encoder $z = f(x)$ and a separate decoder for each task $T$, where $x$ is the input and each task has its own label space. Systems and methods of the inventive concept can apply a consistency constraint to link all decoders, aiming at instructing decoders to agree consistently with the inferred affinity among the various label spaces. Concretely, given a task $T$ and its label space, during training and under the assumption of consistency of the molecular affinity property, the label of task $T$ is mapped to a binding free energy change via a learnable monotonic neural network. The combined loss function for the multi-task learning is then formulated from the model output for each task together with the consistency loss, with a weight applied to the consistency loss. FIG. 7 schematically depicts exemplary architectural details of a geometry and energy attention (GEA) module. Arrows show the information flow. Vector names: $L_M$: input sequence feature; $X_E$: input energy feature; $X_G$: input geometric feature; $X_N$: neighbor sequence feature; $t_N$: neighbor translation; and the output sequence feature. $L$: length of the amino acid sequence; $L_N$: number of nearest neighbor residues in the graph; $N_h$: number of heads in the multi-head attention. Dimension names: $d_s$: sequence feature; $d_E$: energy feature; $d_G$: geometric feature; $d_Q$: query vector; $d_K$: key vector; $d_V$: value vector. Operators: element-wise multiplication and element-wise addition. Functions: k-NN: search for the K nearest neighbors; Linear $d_1 \to d_2$: fully connected layer with an input dimension of $d_1$ and an output dimension of $d_2$; $f(x)$: geometric feature function of a three-dimensional input, defined in BindFormer in terms of a normal vector. 
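To make the idea of a consistency-constrained multi-task loss concrete, the following is a rough sketch under stated assumptions. The exact loss used by the Inventors is not reproduced here: the squared-error terms, the small tanh network used as the monotonic map, the weight name `lam`, and all numeric values are hypothetical choices for illustration.

```python
import numpy as np

def softplus(w):
    """Smooth positive transform, used to keep network weights positive."""
    return np.log1p(np.exp(w))

def monotonic_map(y, raw_w, raw_v, b, c):
    """Tiny monotone network mapping a task label to a free-energy-like value.

    Positive input and output weights on increasing tanh units make the
    output non-decreasing in y (an assumed construction).
    """
    hidden = np.tanh(np.outer(y, softplus(raw_w)) + b)   # shape (n, units)
    return hidden @ softplus(raw_v) + c                  # shape (n,)

def combined_loss(task_pred, task_label, ddg_pred, mapped_label, lam):
    """Sum over tasks of the task error plus lam times the consistency error."""
    total = 0.0
    for name in task_pred:
        task_term = np.mean((task_pred[name] - task_label[name]) ** 2)
        consistency = np.mean((ddg_pred[name] - mapped_label[name]) ** 2)
        total += task_term + lam * consistency
    return float(total)

# Hypothetical labels for one task, mapped onto the shared affinity scale
rng = np.random.default_rng(0)
labels = np.linspace(-2.0, 2.0, 9)
mapped = monotonic_map(labels, rng.normal(size=4), rng.normal(size=4),
                       rng.normal(size=4), 0.0)

# Hypothetical predictions: task error 1.0, consistency error 4.0, lam 0.5
loss = combined_loss(
    task_pred={"escape": np.array([1.0, 1.0])},
    task_label={"escape": np.array([0.0, 0.0])},
    ddg_pred={"escape": np.array([2.0, 2.0])},
    mapped_label={"escape": np.array([0.0, 0.0])},
    lam=0.5,
)
```

A monotonic map preserves the ordering of labels within each experiment, so labels measured on different scales can be compared on a single affinity axis without reordering any pair of measurements.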
[00122] GEA is a geometrically invariant multi-head attention layer that aggregates sequence features from neighbor nodes weighted by pairwise geometric and energy terms. Specifically, consider a graph of attributed nodes $x_i = (h_i, R_i, t_i)$ and pairwise energy-based attributed edges $z_{ij}$. The attention value $\alpha_{ij}$ between the $i$-th node and the $j$-th node is first calculated by applying SoftMax to the affinity logits between node pairs, where $f_Q$, $f_K$ and $f_Z$ are feed forward layers and $j'$ indexes all neighbor nodes. Then the sequence, energy, and geometric features from all neighbor nodes are merged using the attention value, where $f_V$ is a feed forward layer and $f_G$ is the geometric feature function defined in terms of a normal vector. Finally, the sequence feature of the current node is updated, where $f_{out}$ is a feed forward layer. Geometric invariance is guaranteed by using the relative distance in the attention calculation stage and the relative position in local coordinates in the message calculation stage. Model training and ensemble [00123] The training was performed for 200 epochs using the Adam optimizer with a learning rate of $10^{-3}$ and a weight decay of $10^{-6}$. Mutation and wild type inversion was applied to each complex pair during training as data augmentation to improve generalization of the network. The models were implemented using PyTorch. Inventors performed 10-fold cross-validation by leaving one fold of mutations out as a test set and using the rest of the mutations to train and tune the model, repeating this process for each fold. To improve the overall performance of the AI, Inventors applied a model ensemble. The reported predictions were obtained by aggregating the outputs of the 10-fold cross-validation. AI affinity-based prospective analysis [00124] Inventors conducted prospective analysis on SARS-CoV-2 variants, including AI-based lineage analysis, AI-based deep mutational scanning, fitness landscape depiction, and model-guided evolution. 
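The training protocol described above (wild type and mutant inversion as augmentation, 10-fold cross-validation, and an ensemble of the fold models) can be sketched as follows. This is a toy illustration, not the BindFormer implementation: a linear least-squares model on feature differences stands in for the network, averaging is assumed as the aggregation rule, and all features and labels are synthetic.

```python
import numpy as np

def augment_with_inversion(wt, mut, ddg):
    """Add the swapped (mutant to wild type) pairs with negated labels."""
    return (np.concatenate([wt, mut]),
            np.concatenate([mut, wt]),
            np.concatenate([ddg, -ddg]))

def kfold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k held-out folds."""
    order = np.random.default_rng(seed).permutation(n)
    return np.array_split(order, k)

def train_ensemble(wt, mut, ddg, k=10):
    """Fit one toy linear model per fold, training on the other folds."""
    models = []
    for held_out in kfold_indices(len(ddg), k):
        mask = np.ones(len(ddg), dtype=bool)
        mask[held_out] = False
        a_wt, a_mut, a_ddg = augment_with_inversion(wt[mask], mut[mask], ddg[mask])
        weights, *_ = np.linalg.lstsq(a_mut - a_wt, a_ddg, rcond=None)
        models.append(weights)
    return models

def ensemble_predict(models, wt, mut):
    """Average the predictions of all fold models (assumed aggregation)."""
    return np.mean([(mut - wt) @ w for w in models], axis=0)

# Synthetic, noise-free data with hypothetical 4-dimensional features
rng = np.random.default_rng(1)
wt = rng.normal(size=(50, 4))
mut = rng.normal(size=(50, 4))
true_w = np.array([0.5, -1.0, 0.25, 2.0])
ddg = (mut - wt) @ true_w

models = train_ensemble(wt, mut, ddg, k=10)
pred = ensemble_predict(models, wt, mut)
```

The inversion step reflects the physical expectation that reverting a mutation produces the opposite change in binding free energy, so each measured pair yields a second training example at no experimental cost.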
[00125] Based on the AI’s ability to assess affinity properties of variants, Inventors characterized SARS-CoV-2 variants by the changes in S-ACE2 binding affinity and antibody escape score. Inventors used 918 reported SARS-CoV-2 variants from GISAID data, from the wild type to the latest Omicron variants. Deep mutational scanning (DMS) is a high-throughput method that uses next-generation sequencing technology to measure the properties of more than 10^5 variants of a protein in a single experiment. However, the cost of wet-lab experiments increases dramatically as the number of desired variants and properties increases. Inventors therefore conducted AI-based deep mutational scanning by predicting the ACE2 binding affinity changes and averaged antibody escape scores of all single point mutations of the spike protein of SARS-CoV-2. [00126] For model-guided evolution of ACE2 variants and SARS-CoV-2 variants, Inventors constructed an evolutionary score (evo-score) using two main determinant factors, ACE2-S affinity and immune escape score, and further depicted the landscape to characterize the fitness of SARS-CoV-2 variants. Specifically, Inventors adopted a support vector machine (SVM) with a radial basis function (RBF) kernel to fit the fitness of each variant and visualized the topography of the fitness landscape to demonstrate the mutation effects. Variants belonging to the same variant of interest (VOI) or clade were highlighted, and clusters in the fitness landscape were grouped together using 2D Gaussian kernel density estimation. 
Systems and methods of the inventive concept can use a hill-climbing algorithm to search for variants with a set number of mutations from wild-type ACE2 or SARS-CoV-2 that maximize the minimum predicted functional score from an ensemble of 10-fold models, $\max_{x \in S_L} \min_{m \in M} f_m(x)$, where $x$ is the input with wild type structure and mutant structure, $S_L$ is the mutant space with mutant edit distance not larger than $L$, $M$ is the set of trained models, and $f_m(x)$ is a model’s predicted score for the input $x$. This evolution objective ensures that all models predict that the sequence will have a high functional score. Systems and methods of the inventive concept can initialize a hill-climbing run with selected variants of interest and potential single-point mutations based on AI-based DMS. For each run of the hill-climbing process, systems and methods of the inventive concept generate new variants by adding a possible single-point mutation. Systems and methods of the inventive concept can then move to the new population of 3 variants with the highest objective, which becomes the new reference point, and repeat this hill-climbing process until a local optimum or the expected number of variants is reached. Statistical analysis [00127] To evaluate the performance of regression models for continuous value prediction in this study, Inventors applied Mean Absolute Error (MAE), R-squared (R²), and Pearson Correlation Coefficient (PCC). The models for binary classification were evaluated by Receiver Operating Characteristic (ROC) curves of sensitivity versus 1 – specificity. The Area Under the Curve (AUC) of the ROC curves was reported with 95% Confidence Intervals (CIs). The AUCs were calculated using the Python package scikit-learn (version 0.22.1). [00128] It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. 
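The hill-climbing search described in the model-guided evolution method above, which maximizes the minimum score across an ensemble while keeping a population of 3 variants, can be sketched as follows. This is a minimal illustration under assumptions: the mutation labels `m1` to `m4`, the two additive scoring functions, and their weights are hypothetical stand-ins for AI-based DMS candidates and trained fold models, and conflicts between mutations at the same position are not handled.

```python
def objective(variant, models):
    """Minimum predicted score across the ensemble (the quantity to maximize)."""
    return min(model(variant) for model in models)

def hill_climb(start, candidate_mutations, models, max_mutations, beam=3):
    """Add one mutation per step and keep the `beam` best variants."""
    population = [frozenset(start)]
    best = population[0]
    for _ in range(max_mutations):
        children = {v | {m} for v in population
                    for m in candidate_mutations if m not in v}
        if not children:
            break
        ranked = sorted(children,
                        key=lambda v: (objective(v, models), sorted(v)),
                        reverse=True)
        population = ranked[:beam]
        if objective(population[0], models) <= objective(best, models):
            break  # local optimum: no child improves on the best so far
        best = population[0]
    return best

def additive_model(weights):
    """Hypothetical scorer: sums a per-mutation effect over the variant."""
    return lambda variant: sum(weights[m] for m in variant)

# Two toy ensemble members with made-up per-mutation effects
models = [
    additive_model({"m1": 1.0, "m2": 0.5, "m3": -0.2, "m4": 0.3}),
    additive_model({"m1": 0.2, "m2": 0.6, "m3": 0.4, "m4": -0.5}),
]
candidates = ["m1", "m2", "m3", "m4"]

best_variant = hill_climb(set(), candidates, models, max_mutations=4)
print(sorted(best_variant))
```

Scoring by the ensemble minimum is a conservative choice: a variant ranks highly only if every model agrees, which here rejects `m4` because one of the two toy models penalizes it.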
The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C … and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.