SYSTEMS AND METHODS FOR DETERMINATION OF PROTEIN INTERACTIONS

Document Type and Number:

WIPO Patent Application WO/2024/025963

Abstract:

Systems, methods, and compositions that incorporate, utilize, or are generated by an AI-based UniBind™ framework are described. The AI-based UniBind™ framework includes three major components: protein representation as a graph at the residue and atom levels, BindFormer™ blocks with geometry and energy attention, and multi-task learning for heterogeneous biological data integration. Trained on more than 70,000 curated protein structure-to-function data points, UniBind™ accurately predicted binding affinities of SARS-CoV-2 spike protein mutants to the human ACE2 receptor or to neutralizing monoclonal antibodies. Systematic tests on major benchmark datasets and experimental validation show that UniBind™ is accurate, robust, and scalable.

Inventors:

LAU JOHNSON YIU-NAM (US)
FOK MANSON (CN)

Application Number:

PCT/US2023/028727

Publication Date:

February 01, 2024

Filing Date:

July 26, 2023

Assignee:

LAU JOHNSON YIU NAM (US)

International Classes:

G01N33/68; G16H50/80

Foreign References:

US20210371841A1	2021-12-02
CN111210871A	2020-05-29

Other References:

BELL ERIC W., SCHWARTZ JACOB H., FREDDOLINO PETER L., ZHANG YANG: "PEPPI: Whole-proteome Protein-protein Interaction Prediction through Structure and Sequence Similarity, Functional Association, and Machine Learning", JOURNAL OF MOLECULAR BIOLOGY, ACADEMIC PRESS, UNITED KINGDOM, vol. 434, no. 11, 1 June 2022 (2022-06-01), United Kingdom , pages 167530, XP093134364, ISSN: 0022-2836, DOI: 10.1016/j.jmb.2022.167530
TRAGNI VINCENZO, PREZIUSI FRANCESCA, LAERA LUNA, ONOFRIO ANGELO, MERCURIO IVAN, TODISCO SIMONA, VOLPICELLA MARIATERESA, DE GRASSI : "Modeling SARS-CoV-2 spike/ACE2 protein–protein interactions for predicting the binding affinity of new spike variants for ACE2, and novel ACE2 structurally related human protein targets, for COVID-19 handling in the 3PM context", THE EPMA JOURNAL, SPRINGER, NL, vol. 13, no. 1, 1 March 2022 (2022-03-01), NL , pages 149 - 175, XP093134368, ISSN: 1878-5077, DOI: 10.1007/s13167-021-00267-w
YAZDANI-JAHROMI MEHDI, YOUSEFI NILOOFAR, TAYEBI AIDA, GARIBAY OZLEM OZMEN, SEAL SUDIPTA, KOLANTHAI ELAYARAJA, NEAL CRAIG J.: "Interpretable and Generalizable Attention-Based Model for Predicting Drug-Target Interaction Using 3D Structure of Protein Binding Sites: SARS-CoV-2 Case Study and in-Lab Validation", BIORXIV, 18 February 2022 (2022-02-18), pages 1 - 11, XP093134375, DOI: 10.1101/2021.12.07.471693

Attorney, Agent or Firm:

BRILLHART, Kurt L. et al. (US)

Claims:

What is claimed is:

1. A method of modulating interaction between a binding protein and a ligand protein, comprising: providing a heterogeneous database comprising a plurality of datasets related to interactions between a first set of proteins and a second set of proteins, wherein the plurality of datasets comprises experimental data from a plurality of experimental techniques; preparing a structure dataset utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair comprising a ligand, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure dataset to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising a plurality of initial candidate binding proteins; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary candidate binding proteins, wherein the secondary binding proteins are selected by the protein interaction algorithm as comprising a modulated interaction with the ligand protein; screening at least some of the plurality of secondary candidate binding proteins for modulated interaction with the ligand protein using an in vivo or in vitro screening assay to identify a set of tertiary candidate binding proteins; and synthesizing at least a portion of the set of tertiary candidate binding proteins for use in at least one of an in vitro biomedical assay, an in vivo biomedical assay, and in preparing a therapeutic formulation.

2. The method of claim 1, wherein the plurality of experimental techniques comprises collecting experimental data from two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.

3. The method of claim 1 or 2, wherein the binding protein is selected from the group consisting of an antibody, a fragment of an antibody, a single-chain antibody, and a fragment of a single-chain antibody.

4. The method of claim 3, wherein the antibody is a therapeutic antibody or is a result of immunization.

5. The method of one of claims 1 to 4, wherein the ligand is selected from the group consisting of an immune checkpoint protein, a tumor marker, and a component of a pathogen.

6. The method of claim 5, wherein the pathogen is a virus.

7. The method of claim 6, wherein the virus is a coronavirus and wherein the ligand is a spike protein of the coronavirus.

8. The method of one of claims 1 to 7, wherein the modulation comprises an increase in binding between the binding protein and the ligand protein.

9. The method of one of claims 1 to 8, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each first and second protein, wherein each of said graphical unified protein structure models is prepared as a graph $G=(V,E)$, wherein $V=\{v_1,v_2,\ldots,v_N\}$ are node features for each amino acid residue and $E=\{z_{ij}\}_{i\neq j}$ are pairwise edge features between amino acid residues.

10. A composition comprising a mutated ligand generated by the method of one of claims 1 to 9.

11. A composition for use in treating a viral infection, comprising a mutated ligand generated by the method of one of claims 1 to 9.

12. The composition of claim 11, wherein the viral infection is a coronavirus infection, and wherein the mutated ligand shows increased affinity for ACE2 relative to a wild type SARS coronavirus spike protein or relative to SARS spike proteins of a plurality of SARS coronavirus variants.

13. A method of identifying a mutated pathogen with increased infectivity or escape from immunotherapy, comprising: providing a heterogeneous database comprising data related to interactions between a first set of proteins and a second set of proteins, wherein the heterogeneous database comprises experimental data from a plurality of experimental techniques; preparing a structure database utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure database to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising sequence information for an immunotherapy protein or a host receptor and a plurality of initial candidate pathogen ligand proteins originated from mutant pathogens, wherein the immunotherapy protein or the host receptor interacts with a ligand protein of wild type pathogen; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary pathogen ligand proteins, wherein the secondary pathogen proteins are selected by the protein interaction algorithm as comprising a reduced interaction with the immunotherapy protein or an increased interaction with the host receptor; and screening at least some of the plurality of secondary pathogen proteins for reduced interaction with the immunotherapy protein or increased interaction with the host receptor using an in vivo or in vitro assay; and reporting a pathogen comprising a secondary pathogen protein with reduced interaction with the immunotherapy protein relative to wild type pathogen to a practitioner as likely to escape treatment with the immunotherapy protein or comprising a secondary pathogen protein with increased interaction with the host receptor to a practitioner as having increased infectivity.

14. The method of claim 13, wherein the plurality of experimental techniques comprises collecting experimental data from two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.

15. The method of claim 13 or 14, wherein the immunotherapy protein is selected from the group consisting of an antibody, a fragment of an antibody, a single-chain antibody, and a fragment of a single-chain antibody.

16. The method of one of claims 13 to 15, wherein the mutated pathogen is a virus.

17. The method of one of claims 13 to 16, wherein the plurality of initial candidate pathogen proteins comprises a coronavirus spike protein.

18. The method of one of claims 13 to 17, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each first and second protein, wherein each of said graphical unified protein structure models is prepared as a graph $G=(V,E)$, wherein $V=\{v_1,v_2,\ldots,v_N\}$ are node features for each amino acid residue and $E=\{z_{ij}\}_{i\neq j}$ are pairwise edge features between amino acid residues.

19. A method of improving prediction of binding affinities between a first protein and a second protein, comprising: providing a heterogeneous database comprising data related to interactions between a first set of proteins and a second set of proteins, wherein the heterogeneous database comprises experimental data from a plurality of experimental techniques; preparing a structure database utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure database to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising sequence information for the first protein and the second protein; selecting a first protein and a second protein from the primary library by a user; generating, using the protein interaction algorithm, a predicted binding affinity between the first protein and the second protein and reporting the predicted binding affinity to the user.

20. The method of claim 19, wherein the plurality of experimental techniques comprises collecting experimental data from two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.

21. The method of claim 19 or 20, wherein the first protein is an antibody and the second protein is a ligand.

22. The method of one of claims 19 to 21, wherein the first protein is a coronavirus spike protein and the second protein is ACE2.

23. A method of generating a high affinity antibody directed to an antigen, comprising: providing a heterogeneous database comprising data related to interactions between the antigen and a set of initial candidate antibody proteins that can form a protein binding pair, wherein the heterogeneous database comprises experimental data from a plurality of experimental techniques; preparing a structure database utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of the protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure database to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising sequence information for the antigen and a plurality of initial candidate antibody proteins originated from an initial antibody directed to the antigen; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary antibody proteins, wherein the secondary antibody proteins are selected by the protein interaction algorithm as comprising an increased interaction with the antigen relative to the initial antibody; and screening at least some of the plurality of secondary antibody proteins for increased interaction with the antigen using an in vivo or in vitro assay to identify a plurality of tertiary antibody proteins having increased affinity for the antigen relative to the initial antibody; and generating an antibody having improved affinity for the antigen from among the plurality of tertiary antibody proteins.

24. The method of claim 23, wherein the antibody is selected from the group consisting of a divalent antibody, a fragment of a divalent antibody, a single-chain antibody, and a fragment of a single-chain antibody.

25. The method of claim 23 or 24, wherein the antigen is derived from a pathogen, an immunotherapy target, or a cancer marker.

26. The method of one of claims 23 to 25, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each of the antigen and initial candidate antibody protein pairs, wherein each of said graphical unified protein structure models is prepared as a graph $G=(V,E)$, wherein $V=\{v_1,v_2,\ldots,v_N\}$ are node features for each amino acid residue and $E=\{z_{ij}\}_{i\neq j}$ are pairwise edge features between amino acid residues.

27. A system for deriving protein binding characteristics comprising: a database module, comprising heterogeneous biologic data, wherein heterogeneous biologic data comprises protein sequence data and biologic data originating from a plurality of experimental techniques or forms of expression for the biologic data; a protein representation module, in which heterogeneous data from the database module is used to construct a plurality of graphical hierarchal protein structures for proteins represented in the database module; and an AI module, comprising encoded instructions to utilize the plurality of graphical hierarchal protein structures as a training set to derive a protein interaction algorithm and to apply the protein interaction algorithm to evaluate or estimate binding characteristics of wild type and/or mutated proteins provided to the AI module.

28. The system of claim 27, wherein the plurality of experimental techniques comprises two or more of enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and a neutralization assay.

29. The system of claim 27 or 28, wherein the database module comprises binding energy estimates derived from experimental data.

30. The system of one of claims 27 to 29, wherein the database module comprises biological data directly related to a protein or proteins being characterized.

31. The system of one of claims 27 to 29, wherein the database module does not comprise biological data directly related to a protein or proteins being characterized.

32. The system of one of claims 27 to 31, comprising an effector.

33. The system of claim 32, wherein the effector is selected from the group consisting of a liquid handling device, a handler for a disposable component, and an incubator.

34. The system of claim 32 or 33, comprising a controller communicatively coupled to the effector.

35. The system of claim 34, wherein the controller comprises the AI module.

36. The system of one of claims 27 to 35, comprising a sensor.

37. The system of claim 36, wherein the sensor is selected from the group consisting of a colorimeter, a spectrophotometer, a fluorometer, a luminometer, and an imaging system.

38. The system of claim 36 or 37, wherein the sensor is communicatively coupled to the AI module.

39. An ACE2 analog for treating or preventing infection with a coronavirus, wherein the ACE2 analog has an increased affinity for a spike protein of the coronavirus relative to ACE2.

40. The ACE2 analog of claim 39, wherein the ACE2 analog has increased affinities for spike protein of a plurality of coronavirus strains relative to ACE2.

41. The ACE2 analog of claim 40, wherein the ACE2 analog is selected from the group consisting of ACE2-1, ACE2-2, ACE2-7, ACE2-8, and ACE2-9.

42. A method of generating an ACE2 analog for use in treating or preventing infection with a coronavirus, comprising: providing a heterogeneous database comprising a plurality of datasets related to interactions between a first set of proteins and a second set of proteins, wherein the plurality of datasets comprises experimental data from a plurality of experimental techniques, wherein the first set of proteins comprises spike proteins from a plurality of coronavirus strains and the second set of proteins comprises a native ACE2; preparing a structure dataset utilizing the heterogeneous database, wherein the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair comprising a ligand, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair; training an artificial intelligence (AI) system using the structure dataset to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models; providing the trained AI with a primary library comprising a plurality of initial candidate binding proteins, wherein the candidate binding proteins comprise ACE2 analogs; generating, using the protein interaction algorithm, a secondary library comprising a plurality of secondary candidate binding proteins, wherein the secondary binding proteins are selected by the protein interaction algorithm as comprising an increased interaction with the ligand protein; screening at least some of the plurality of secondary candidate binding proteins for increased interaction with the ligand protein using an in vivo or in vitro screening assay to identify a set of tertiary candidate binding proteins; and synthesizing at least a portion of the set of tertiary candidate binding proteins as ACE2 analogs for use in preparing a therapeutic formulation.

43. The method of claim 42, wherein preparing the structure dataset comprises generating graphical unified protein structure models for each first and second protein, wherein each of said graphical unified protein structure models is prepared as a graph $G=(V,E)$, wherein $V=\{v_1,v_2,\ldots,v_N\}$ are node features for each amino acid residue and $E=\{z_{ij}\}_{i\neq j}$ are pairwise edge features between amino acid residues.

Description:

SYSTEMS AND METHODS FOR DETERMINATION OF PROTEIN INTERACTIONS

[0001] This application claims the benefit of United States Provisional Patent Application Serial No. 63/392,420, filed on July 26, 2022. These and all other referenced extrinsic materials are incorporated herein by reference in their entirety. Where a definition or use of a term in a reference that is incorporated by reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein is deemed to be controlling.

Field of the Invention

[0002] The field of the invention is characterization of protein interactions.

Background

[0003] This background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.

[0004] Protein-protein interactions, characterized by binding affinities, underlie many biological functions, including pathogen-host interactions. Rapid progress on high throughput approaches in experimental biology has generated an unprecedented amount of sequence data with corresponding binding affinity information. However, computational integration and interpretation of the impact of mutations on protein binding affinity using large-scale heterogeneous datasets are challenging.

[0005] Artificial intelligence (AI) is already making an enormous impact on biomedicine. Recent breakthroughs in AI-based 3D protein structure prediction (e.g., AlphaFold2 and RoseTTAFold) show their potential to revolutionize structural biology. However, translating these structural predictions, particularly the effects of amino acid substitutions, into their functional impact on biology is challenging. Therefore, while AlphaFold2 and RoseTTAFold are successful in predicting 3D protein structures from primary protein sequences, the functional correlates of these predictions are less clear.

[0006] Protein-protein interactions, characterized by binding affinities, underlie many biological functions, including pathogen-host interactions. Protein polymorphisms or non-synonymous mutations alter the amino acid sequence and protein structure, which may change binding affinity. SARS-CoV-2 and other SARS-related sarbecoviruses bind to target cells through binding of their spike protein (S-protein) to the host cellular receptor angiotensin converting enzyme 2 (ACE2), which then mediates viral entry. The functional S-protein is a trimer, with each protomer consisting of two domains, S1 and S2. S1 contains the receptor binding domain (RBD), which directly interacts with ACE2. Conformational changes in S-protein from closed (or “down”) to open (or “up”) are required for its interaction with ACE2.

[0007] Neutralizing antibodies, whether acquired through natural infection or vaccination or administered as therapeutic monoclonal antibodies, can block S-protein binding to ACE2 and thus prevent infection. The S-protein has a remarkable capacity to accommodate large numbers of amino acid substitutions and even deletions, whilst retaining or even enhancing its ability to recognize and bind to ACE2 and to evade neutralizing antibody binding. This large number of potential mutations, together with the selection pressure of infection- or vaccination-acquired population immunity, facilitates the emergence of the fittest variants.

[0008] Viral mutational fitness is an essential element of viral adaptation. It is one of the key drivers of the various waves of COVID-19 infections in the current pandemic. Indeed, these waves of infections correspond to the evolution and emergence of new SARS-CoV-2 variants of concern (VOCs). For instance, the D614G and N439K non-synonymous mutations in the S-protein drove the initial wave of infections. Additional non-synonymous mutations accumulated in the Alpha and Delta variants drove the second and third waves of infections, respectively. More recently, the Omicron variant with its sub-lineages (BA.1 to BA.5) contains a hypermutated S-protein consisting of previously observed as well as novel mutations. Many non-synonymous mutations are located within the RBD. Some substitutions (for example Q493R, G496S, and Q498R) also form novel hydrogen bonds with ACE2. These new substitutions may enhance or reduce binding affinity towards ACE2, and as a result, it is important to be able to predict the impact of these substitutions, and of other as yet unknown substitutions, on ACE2 affinity, immune escape, and general mutational fitness.

[0009] Thus, there is still a need for systems and methods to predict effects of changes in protein primary structure on protein interactions.

Summary of The Invention

[0010] The inventive subject matter provides apparatus, systems, and methods for predicting effects of mutations in the amino acid sequence of a member of a ligand-receptor and/or antigen-antibody pair, thereby permitting generation of mutated proteins with improved binding affinities for therapeutic and/or diagnostic use. Such methods can also be used to predict the ability of pathogen strains to escape conventional therapy (e.g., through reduced interaction with a therapeutic ligand or receptor analog), and to identify pathogen strains with increased infectivity (e.g., in having increased binding affinity for a host receptor). In such systems and methods an artificial intelligence (AI) approach is used that incorporates a wide range of data related to protein interactions and generated using a range of methodologies to provide predicted values for protein-protein interactions that correlate with measurements.

[0011] One embodiment of the inventive concept is a method of modulating (e.g., increasing or decreasing relative to values for naturally occurring or known proteins) interaction between a binding protein and a ligand protein, by providing a heterogeneous database comprising a plurality of datasets related to interactions between a first set of proteins and a second set of proteins, wherein the plurality of datasets comprises experimental data from a plurality of experimental techniques. A structure dataset is prepared utilizing the heterogeneous database, where the structure dataset includes a plurality of graphical unified protein structure models. Each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair, and each graphical unified protein structure model includes a representation (e.g., a graphical representation) of binding strength between members of the protein binding pair. An artificial intelligence (AI) system is trained using the structure dataset to generate a trained AI that includes a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models. The trained AI is provided with a primary library comprising a plurality of initial candidate binding proteins and generates, using the protein interaction algorithm, a secondary library that includes a plurality of secondary candidate binding proteins, where the secondary binding proteins are selected by the protein interaction algorithm as including a modulated (e.g., increased or decreased, depending upon the desired end result) interaction with the ligand protein. These secondary candidate binding proteins are screened for modulated interaction with the ligand protein using an in vivo or in vitro screening assay to identify a set of tertiary candidate binding proteins with the desired modulation. At least a portion of the set of tertiary candidate binding proteins can be synthesized for use in, for example, an in vitro biomedical assay, an in vivo biomedical assay, or as a therapeutic protein formulation. In such embodiments the plurality of experimental techniques can include collecting experimental data from an enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, or a neutralization assay. Suitable binding proteins include, but are not limited to, an antibody, a fragment of an antibody, a single-chain antibody, and a fragment of a single-chain antibody. Such an antibody can be a therapeutic antibody or a result of immunization. In some embodiments the ligand can be an immune checkpoint protein, a tumor marker, or a component of a pathogen. In such an embodiment the pathogen can be a virus, such as a coronavirus, and the ligand can be a spike protein of the coronavirus. In such methods, preparing the structure dataset can include generating graphical unified protein structure models for each first and second protein, wherein each of said graphical unified protein structure models is prepared as a graph $G=(V,E)$, wherein $V=\{v_1,v_2,\ldots,v_N\}$ are node features for each amino acid residue and $E=\{z_{ij}\}_{i\neq j}$ are pairwise edge features between amino acid residues.

[0012] Another embodiment of the inventive concept is a composition that includes a mutated ligand generated by the method described above. Embodiments of the inventive concept include a composition for use in treating a viral infection that includes a mutated ligand generated as described above, which can act to compete for a host receptor of the virus causing the viral infection. In such an embodiment the viral infection can be a coronavirus infection, and the mutated ligand shows increased affinity for ACE2 relative to a wild type SARS coronavirus spike protein or relative to SARS spike proteins of a plurality of SARS coronavirus variants.

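The graph representation $G=(V,E)$ described in paragraph [0011] can be illustrated with a minimal Python sketch. The source states only that nodes carry per-residue features and that edges carry pairwise features for residue pairs with i ≠ j; the specific features used here (a one-hot residue type for each node, a C-alpha distance for each edge) and the names `build_residue_graph` and `AMINO_ACIDS` are assumptions chosen for illustration, not the features used by UniBind™.

```python
import math
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (one-letter codes)

def build_residue_graph(sequence, ca_coords):
    """Return (V, E): V[i] is a node feature vector, E[(i, j)] an edge feature."""
    if len(sequence) != len(ca_coords):
        raise ValueError("need exactly one coordinate per residue")
    # Node features v_i: one-hot residue type (illustrative assumption).
    V = [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in sequence]
    # Edge features z_ij for every ordered pair with i != j:
    # here simply the distance between C-alpha atoms (illustrative assumption).
    E = {
        (i, j): math.dist(ca_coords[i], ca_coords[j])
        for i, j in permutations(range(len(sequence)), 2)
    }
    return V, E

# Toy three-residue example with made-up coordinates (in angstroms).
V, E = build_residue_graph(
    "GSA", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
)
print(len(V), len(E))
```

A graph over N residues built this way has N node vectors and N(N-1) ordered edges; a practical implementation would likely restrict edges to spatial neighbors and use richer features.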
[0013] Embodiments of the inventive concept include a method of identifying a mutated pathogen with increased infectivity or escape from immunotherapy relative to that of a wild type or known pathogen, by providing a heterogeneous database comprising data related to interactions between a first set of proteins and a second set of proteins, where the heterogeneous database comprises experimental data from a plurality of experimental techniques. A structure database is generated utilizing the heterogeneous database, wherein the structure dataset includes a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair. The artificial intelligence (AI) system is trained using the structure database to generate a trained AI that includes a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models. The trained AI is provided with a primary library comprising sequence information for an immunotherapy protein directed to the pathogen or a host receptor for the pathogen and a plurality of initial candidate pathogen ligand proteins originated from mutant pathogens, where the immunotherapy protein or the host receptor interacts with a ligand protein of wild type pathogen. The protein interaction algorithm is used to generate a secondary library that includes a plurality of secondary pathogen ligand proteins, where the secondary pathogen ligand proteins are selected by the protein interaction algorithm as having a reduced interaction with the immunotherapy protein or an increased interaction with the host receptor. 
These secondary pathogen ligand proteins are screened for reduced interaction with the immunotherapy protein or increased interaction with the host receptor using an in vivo or in vitro assay. A pathogen that includes a secondary pathogen ligand protein with reduced interaction with the immunotherapy protein relative to the wild type pathogen is reported to a practitioner as likely to escape treatment with the immunotherapy protein. A pathogen that includes a secondary pathogen ligand protein with increased interaction with the host receptor relative to the wild type pathogen is reported to a practitioner as likely to be highly infective. The experimental techniques can include collecting experimental data from enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and/or a neutralization assay. In such embodiments the immunotherapy protein can be selected from the group consisting of an antibody, a fragment of an antibody, a single-chain antibody, and a fragment of a single-chain antibody. The mutated pathogen can be a virus, such as a coronavirus. Similarly, the plurality of initial candidate pathogen proteins can include a coronavirus spike protein. In some embodiments preparing the structure dataset can include generating graphical unified protein structure models for each first and second protein, where the graphical unified protein structure models are prepared as a graph $G=(V,E)$, wherein $V=\{v_1,v_2,\ldots,v_N\}$ are node features for each amino acid residue and $E=\{z_{ij}\}_{i\neq j}$ are pairwise edge features between amino acid residues.
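The graph G=(V,E) recited above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each node feature v_i is a one-hot encoding of the residue type and that each edge feature z_ij is the distance between the alpha carbons of residues i and j. The function name, the feature choices, and the toy coordinates are hypothetical and are not taken from the disclosure.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def build_residue_graph(sequence, ca_coords):
    """Return (V, E) for one protein.

    V[i] is a one-hot node feature for residue i.
    E[(i, j)] is an edge feature z_ij for every ordered pair with i != j.
    Assumption: the edge feature is the plain C-alpha distance (illustrative only).
    """
    coords = np.asarray(ca_coords, dtype=float)
    n = len(sequence)
    if coords.shape != (n, 3):
        raise ValueError("need one 3-D coordinate per residue")

    # Node features: one row per residue, one column per residue type.
    V = np.zeros((n, len(AMINO_ACIDS)))
    for i, aa in enumerate(sequence):
        V[i, AMINO_ACIDS.index(aa)] = 1.0

    # Edge features: defined only for i != j, matching E = {z_ij}, i != j.
    E = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                E[(i, j)] = float(np.linalg.norm(coords[i] - coords[j]))
    return V, E


# Toy three-residue fragment with made-up coordinates (in angstroms).
V, E = build_residue_graph(
    "NYK", [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0]]
)
```

A richer representation would attach further residue-level and atom-level descriptors to the same nodes and edges; the dictionary of pairwise features is used here only because it mirrors the set notation directly.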
[0014] Embodiments of the inventive concept include methods of improving accuracy of prediction of binding affinities between a first protein and a second protein, by providing a heterogeneous database comprising data related to interactions between a first set of proteins and a second set of proteins, where the heterogeneous database comprises experimental data from a plurality of experimental techniques. A structure database is prepared utilizing the heterogeneous database, where the structure dataset comprises a plurality of graphical unified protein structure models, wherein each graphical unified protein structure model incorporates both sequence and experimental data of members of a protein binding pair, and wherein each graphical unified protein structure model comprises a representation of binding strength between members of the protein binding pair. An artificial intelligence (AI) system is trained using the structure database to generate a trained AI comprising a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models. The trained AI is provided with a primary library comprising sequence information for the first protein and the second protein and, using the protein interaction algorithm, generates a predicted binding affinity between a first protein and a second protein selected from the primary library by a user. The predicted binding affinity is then reported. The experimental techniques can include collecting experimental data from enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and/or a neutralization assay. In some embodiments the first protein is an antibody and the second protein is a ligand. In some embodiments the first protein is a coronavirus spike protein and the second protein is ACE2.
[0015] Embodiments of the inventive concept include a method of generating a high affinity antibody directed to an antigen, by providing a heterogeneous database including data related to interactions between the antigen and a set of initial candidate antibody proteins that can form a protein binding pair, wherein the heterogeneous database comprises experimental data from a plurality of experimental techniques. A structure database is prepared using the heterogeneous database, where the structure dataset includes a plurality of graphical unified protein structure models, where each graphical unified protein structure model incorporates both sequence and experimental data of members of the protein binding pair, and where each graphical unified protein structure model includes a representation of binding strength between members of the protein binding pair. An artificial intelligence (AI) system is trained using the structure database to generate a trained AI that includes a protein interaction algorithm derived from correlation between sequence and binding strength elements of the plurality of graphical unified structure models. The trained AI is provided with a primary library that includes sequence information for the antigen and a plurality of initial candidate antibody proteins originated from an initial antibody directed to the antigen. The protein interaction algorithm generates a secondary library that includes a plurality of secondary antibody proteins, wherein the secondary antibody proteins are selected by the protein interaction algorithm as comprising an increased interaction with the antigen relative to the initial antibody. The secondary antibody proteins are screened for increased interaction with the antigen relative to the initial candidate antibody using an in vivo or in vitro assay to identify a plurality of tertiary antibody proteins having increased affinity for the antigen relative to the initial antibody. 
One or more of such screened secondary antibody proteins can be synthesized to generate an antibody or antibodies having improved affinity for the antigen from among the plurality of tertiary antibody proteins. The high affinity antibody can be a divalent antibody, a fragment of a divalent antibody, a single-chain antibody, or a fragment of a single-chain antibody. The antigen can be derived from a pathogen, an immunotherapy target, or a cancer marker. Preparing the structure dataset can include generating graphical unified protein structure models for each of the antigen and initial candidate antibody protein pairs, wherein each of said graphical unified protein structure models is prepared as a graph $G=(V,E)$, wherein $V=\{v_1,v_2,\ldots,v_N\}$ are node features for each amino acid residue and $E=\{z_{ij}\}_{i\neq j}$ are pairwise edge features between amino acid residues.
[0016] Embodiments of the inventive concept include a system for deriving protein binding characteristics, which includes: (a) a database module, including heterogeneous biologic data, wherein heterogeneous biologic data comprises protein sequence data and biologic data originating from a plurality of experimental techniques or forms of expression for the biologic data; (b) a protein representation module, in which heterogeneous data from the database module is used to construct a plurality of graphical hierarchal protein structures for proteins represented in the database module; and (c) an AI module, including encoded instructions to utilize the plurality of graphical hierarchal protein structures as a training set to derive a protein interaction algorithm and to apply the protein interaction algorithm to evaluate or estimate binding characteristics of wild type and/or mutated proteins provided to the AI module.
In such a system the plurality of experimental techniques can include enzyme linked immunosorbent assay, surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and/or a neutralization assay. The database module can include binding energy estimates derived from experimental data, and/or can include biological data directly related to a protein or proteins being characterized. Alternatively, the database module need not include biological data directly related to a protein or proteins being characterized. In some embodiments the system can include an effector, such as a liquid handling device, a handler for a disposable component, or an incubator. Such a system can include a controller (which can include the AI module) that is communicatively coupled to the effector. In some embodiments the system includes a sensor, such as a colorimeter, a spectrophotometer, a fluorometer, a luminometer, and/or an imaging system. In such embodiments the sensor can be communicatively coupled to the AI module.
[0017] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
Brief Description of the Drawings
[0018] FIG.1A schematically depicts an exemplary workflow of the UniBind™ framework.
[0019] FIG.1B schematically depicts an exemplary architecture of a deep neural network utilized in systems and methods of the inventive concept.
[0020] FIG.2A shows a typical regression correlation between calculated and experimental values of changes in binding affinity for all mutations in SKEMPI V2.0.
[0021] FIG.2B shows an exemplary study of the type shown in FIG.2A, where the analysis is stratified into protein complexes containing single amino acid substitution in the S-protein.
[0022] FIG.2C shows an exemplary study of the type shown in FIG.2A, where the analysis is stratified into protein complexes containing multiple amino acid substitutions in the S-protein.
[0023] FIG.2D shows typical mean absolute errors (MAEs) of the AI predictions from a study as shown in FIG.2A with one or more amino acid substitutions.
[0024] FIG.3A shows typical regression performance of affinity change prediction between ACE2 and RBD variants measured by ∆KD,app.
[0025] FIG.3B shows predicted performance on RBD mutation effects on SARS-CoV-2 variant Alpha (N501Y).
[0026] FIG.3C shows predicted performance on RBD mutation effects on SARS-CoV-2 variant Beta (K417N + E484K + N501Y).
[0027] FIG.3D shows predicted performance on RBD mutation effects on SARS-CoV-2 variant Delta (L452R+T478K).
[0028] FIG.3E shows predicted performance on RBD mutation effects on SARS-CoV-2 variant Eta (E484K).
[0029] FIG.4A shows typical regression performance of affinity change prediction between RBD and ACE2 variants measured by log2 enrichment ratio.
[0030] FIG.4B shows a typical correlation between predicted and experimental RBD-ACE2 affinities for single point mutations.
[0031] FIG.4C shows a typical correlation between predicted and experimental RBD-ACE2 affinities for multi-point mutations.
[0032] FIG.4D shows a distribution map of log2 enrichment scores of newly designed ACE2 variants.
[0033] FIG.4E shows typical results of an experimental test of ACE2 variants binding to RBD using ELISA analysis.
[0034] FIG.4F shows proposed interactions between wild-type and genetically modified ACE2 and SARS-CoV-2 RBD, where genetically modified ACE2 has an N330Y mutation.
[0035] FIG.4G shows proposed interactions between wild-type and genetically modified ACE2 and SARS-CoV-2 RBD, where genetically modified ACE2 has a Q42L mutation.
[0036] FIG.4H shows a heatmap of S-protein–ACE2 binding affinities across species.
[0037] FIG.4I shows a regression analysis of predicted versus experimental affinity change between S-proteins of sarbecoviruses and human ACE2 orthologues.
[0038] FIG.4J shows a heatmap of predicted affinity values for S-protein–ACE2 binding between SARS-CoV-2 variants and ACE2 proteins from 24 animal species.
[0039] FIG.5A shows regression performance of RBD-antibody affinity (escape score) prediction for the effects of different RBD mutations on RBD-Ab binding.
[0040] FIG.5B shows a typical heatmap of an experimental escape score matrix upon mutations of RBD to different antibodies.
[0041] FIG.5C shows a typical heatmap of a predicted escape score matrix upon mutations of RBD to different antibodies.
[0042] FIG.5D shows stratified analysis of regression performance on Class 1 neutralization antibodies.
[0043] FIG.5E shows stratified analysis of regression performance on Class 2 neutralization antibodies.
[0044] FIG.5F shows stratified analysis of regression performance on Class 3 neutralization antibodies.
[0045] FIG.5G shows stratified analysis of regression performance on Class 4 neutralization antibodies.
[0046] FIG.5H shows average escape scores of each site on the RBD.
[0047] FIG.5I shows an escape score matrix of S protein variants to different antibodies calculated by UniBind™.
[0048] FIG.5J shows a typical receiver operating characteristic (ROC) plot of predicted escape scores of different S protein variants.
[0049] FIG.5K shows predicted escape scores of antibodies as individual VOC boxplots.
[0050] FIG.6A shows predicted ACE2 binding affinity and antibody escape versus dates of variant emergence in the course of the COVID-19 pandemic.
[0051] FIG.6B shows a typical correlation between AI-generated measurements of S-protein trimer-ACE2 affinity with experimental results.
[0052] FIG.6C shows a typical correlation between AI-generated measurement on RBD-ACE2 affinity with experimental results.
[0053] FIG.6D shows a heat map illustrating the effect of mutations in the RBD segment on ACE2 binding affinity changes.
[0054] FIG.6E shows a heat map showing the effect of mutations in the RBD segment on antibody escape scores.
[0055] FIG.6F shows characteristics of a viral lineage evolutionary path.
[0056] FIG.6G shows characteristics of the AI-predicted evolution of new variants based on five Omicron lineages.
[0057] FIG.6H shows distribution maps of ACE2 binding affinity of variants from subsampled GISAID data.
[0058] FIG.6I shows distribution maps of AI-predicted ACE2 binding affinity values of potential variants.
[0059] FIG.6J shows a typical correlation analysis between reported fitness and affinity-based evolutionary score (evo-score).
[0060] FIG.6K shows characteristics of single mutations’ effects on ACE2 binding affinity, antibody escape, and evo-score.
[0061] FIG.6L shows effects of essential mutations on evo-scores of five recent Omicron lineages.
[0062] FIG.7 schematically depicts exemplary architectural details of a geometry and energy attention (GEA) module.
Detailed Description
[0063] The inventive subject matter provides apparatus, systems and methods in which an AI-based UniBind™ framework is provided that includes at least three major components: protein representation as a graph at the residue and atom levels, BindFormer blocks with geometry and energy attention, and multi-task learning for heterogeneous biological data integration. Trained on more than 70,000 curated protein structure-to-function data points, UniBind™ accurately predicted binding affinities of SARS-CoV-2 spike protein mutants to human ACE2 receptor, or to neutralizing monoclonal antibodies. Systematic tests on major benchmark datasets and experimental validation demonstrated that UniBind™ is accurate, robust, and scalable.
Inventors further applied it to predict future viral variants of concern and their evolutionary features, by searching an evolutionary space of more than 30,000 current and predicted future SARS-CoV-2 spike protein variants. This efficient in-silico approach for predicting protein-protein interactions and its ‘affinity maturation’ has the potential for wide applicability in biomedical research and designing effective customized therapeutics. [0064] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components. [0065] The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art. [0066] In some embodiments, the numbers expressing quantities of ingredients, properties such as concentration, reaction conditions, and so forth, used to describe and claim certain embodiments of the invention are to be understood as being modified in some instances by the term “about.” Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. 
The numerical values presented in some embodiments of the invention may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements. [0067] As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. [0068] The recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention. [0069] Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims. 
[0070] It should be noted that any language directed to a computer should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, controllers, or other types of computing devices operating individually or collectively. One should appreciate the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). The software instructions preferably configure the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclosed apparatus. In especially preferred embodiments, the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods. Data exchanges preferably are conducted over a packet- switched network, the Internet, LAN, WAN, VPN, or other type of packet switched network. [0071] One should appreciate that the disclosed techniques provide many advantageous technical effects including provision of rapid and accurate prediction of the effects of specified changes in protein amino acid sequence on interaction of the protein with elements of its environment. This can improve effectiveness of therapeutic proteins and reduce the time required for their development, as well as improving the accuracy of prediction of likely pathogen variants. [0072] The following discussion provides many example embodiments of the inventive subject matter. Although each embodiment represents a single combination of inventive elements, the inventive subject matter is considered to include all possible combinations of the disclosed elements. 
Thus, if one embodiment comprises elements A, B, and C, and a second embodiment comprises elements B and D, then the inventive subject matter is also considered to include other remaining combinations of A, B, C, or D, even if not explicitly disclosed.
[0073] As used herein, and unless the context dictates otherwise, the term "coupled to" is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms "coupled to" and "coupled with" are used synonymously.
[0074] Rapid progress on high throughput approaches in experimental biology has generated an unprecedented amount of sequence data with corresponding binding affinity information. There are different experimental methods for measuring binding affinities, including surface plasmon resonance, fluorescence spectroscopy, flow cytometry, and neutralization assay. The SKEMPI V2.0 database contains the affinity changes arising from amino acid substitutions of over 7,000 structurally-solved protein complexes. Moreover, ‘deep mutational scanning’ can be used to generate large-scale mutational data for any protein. These data can then be organized into a sequence-function map to reveal the functional consequences of all possible single mutations. These data, together with other large proteomic and biochemical databases such as SKEMPI V2.0, provide an opportunity to model changes in SARS-CoV-2 mutations and evaluate their functional impact.
[0075] While protein-protein interaction data were generated at an unprecedented scale, these data show enormous heterogeneity in terms of the measurement method (e.g., binding energies vs log2 enrichment ratios vs EC50s), as well as the experimental conditions. Therefore, integration of these heterogeneous datasets remains a challenge for any AI system.
Changes in binding free energies due to non-synonymous mutations (interface ΔΔG) influence both the conformation of the backbone amino acid residues, and interactions between the side chains at the atomic level. The present inventors (Inventors) have developed an AI-based UniBind™ framework to make a full use of multi-source and heterogeneous biological data and accomplish tasks related to protein–protein binding affinity prediction. [0076] UniBind™ is a generalizable modular framework which includes a hierarchical representation of protein structure as a graph at both atom- and amino acid residue-level and a dual-path neural network named BindFormer, with novel attention mechanisms of geometry and energy attention (GEA). Furthermore, to address the data heterogeneity issue, Inventors trained UniBind™ with multi-task learning and model ensemble to make full use of the unprecedented large-scale experimental data (for example, deep mutational scanning and high throughput genomic sequencing). Trained using over 70,000 mutations, which is 10-fold more data than previously available, UniBind™ predicted the binding affinity between SARS-CoV-2 S-protein mutants and human ACE2 receptor, or with neutralizing monoclonal antibodies. Systematic tests and validations on major benchmark datasets for affinity prediction demonstrated that UniBind™ is accurate, robust, and scalable. [0077] The GISAID initiative has generated a sharing platform for providing over 10 million SARS-CoV-2 genetic sequences. Such approaches have been used to make useful predictions regarding the evolution of SARS-CoV-2, and adapted to generate an early warning system for emerging variants. In addition to various affinity-related prediction validation, Inventors have found that UniBind™ is a powerful tool for prospective biology analysis: high-throughput deep mutational scanning in silico, lineage analysis (including for the new lineage trends), and affinity-based viral evolution. 
Furthermore, Inventors have applied UniBind™ to address infectivity and immune escape of different variants of SARS-CoV-2, by predicting properties of the S-protein in different variants and their sub-lineages, including its binding affinity to ACE2 and escape from antibody/vaccine therapies. Finally, with biological experiments as an instrumental data resource, Inventors predicted model-guided evolution of SARS-CoV-2 and identified single amino acid substitutions or groups of substitutions that can increase virulence through increased S-protein–ACE2 affinity, facilitate immune escape, or increase variant fitness. UniBind™ showed its potential in a variety of biomedical applications including protein engineering, drug design and pandemic control.
Dataset characteristics and system overview
[0078] Our training datasets included SKEMPI V2.0, an open-source database with information on binding free energy changes after mutations of 7,085 structurally-solved protein-protein interactions, and three other protein-protein binding affinity datasets (PADBs) constructed and curated from retrospective deep mutational scanning, flow cytometry or neutralization assay experiments. Examples of suitable datasets are provided in Table 1.
Table 1
PADBs contain information on affinity changes of the RBD-ACE2 binding upon mutations of the S protein or ACE2, as well as changes in RBD-antibody binding affinities upon RBD mutations (Table 1). Different evaluation metrics were employed as labels for multi-task learning. More specifically, the PADB-SA dataset comprised 45,363 affinity measurements of SARS-CoV-2 S-protein variants binding to human ACE2, with apparent dissociation constants (KD,app) determined using deep mutational scanning approaches (Starr et al., 2022a; Starr et al., 2020).
The PADB-AS dataset included 2,230 affinity measurements of human ACE2 variants binding to wild type SARS-CoV-2 S-protein, expressed as log2 enrichment ratios or fluorescence intensity change (∆MFU×1000) (Chan et al., 2020). PADB-SAb comprised 16,971 affinity measurements of different antibodies binding to SARS-CoV-2 S-protein variants, expressed as escape score, IC50, and fold change of IC50 (Cao et al., 2022; Liu et al., 2022; Wang et al., 2021).
Overview of the AI model
[0079] Inventors combined and integrated multi-source and heterogeneous biological data with UniBind™ for protein-protein interaction prediction tasks. UniBind™ includes three major components: protein representation as a graph data structure, BindFormer blocks with geometry and energy attention (GEA), and multi-task learning for heterogeneous biological data integration. As shown in FIG.1A, the datasets described above, together with corresponding protein structural and amino acid substitution information, were expressed as a graph data structure and fed into UniBind™. UniBind™ can then systematically identify and quantify ACE2 or S-protein variants with altered affinity and antibody binding affinity/escape. Inventors further conducted prospective studies including AI-based lineage analysis, AI-based deep mutational scanning, fitness landscape depiction, and model-guided evolution (FIG.1A). FIG.1A provides an exemplary workflow of the UniBind™ framework. A heterogeneous affinity dataset was constructed using SKEMPI V2.0 and multiple sets of binding affinity data which were collected and curated from different experimental methods. UniBind™ was then trained using heterogeneous multi-task learning methods for multiple regression predictions including affinity, log2 enrichment ratio, escape score, etc.
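Multi-task training on heterogeneous labels can be sketched as follows. The sketch assumes that each sample carries a label for only one measurement type, so the regression loss for each task is computed only over the samples labelled for that task and the per-task losses are then averaged. The task names, the use of NaN for missing labels, the mean squared error, and the equal task weighting are illustrative assumptions, not details taken from the disclosure.

```python
import numpy as np

# Hypothetical names for three regression heads, one per measurement type.
TASKS = ("affinity_change", "log2_enrichment", "escape_score")


def masked_multitask_mse(pred, target):
    """Average of per-task mean squared errors, skipping missing labels.

    pred   : array (n_samples, n_tasks), one prediction per task head.
    target : array (n_samples, n_tasks), NaN where a sample has no label
             for that task (assumed encoding of heterogeneous data).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = ~np.isnan(target)

    per_task = {}
    for k, name in enumerate(TASKS):
        rows = mask[:, k]
        if rows.any():  # tasks with no labels in this batch are skipped
            per_task[name] = float(np.mean((pred[rows, k] - target[rows, k]) ** 2))
    if not per_task:
        raise ValueError("batch contains no labels for any task")

    total = float(np.mean(list(per_task.values())))  # equal task weights
    return total, per_task


# Toy batch: two samples labelled with affinity change, one with a
# log2 enrichment ratio, one with an escape score (made-up numbers).
nan = np.nan
pred = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.5], [3.0, 0.0, 0.0]])
target = np.array([[0.0, nan, nan], [nan, 2.0, nan], [nan, nan, 0.0], [1.0, nan, nan]])
total, per_task = masked_multitask_mse(pred, target)
```

Masking lets a single shared network learn from datasets that report different quantities without converting them to a common unit first.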
Based on the preceding predictions on variants, multiple prospective analyses were performed, including lineage analysis, AI-based deep mutational scanning, and model-guided evolution, and were verified with experiments. For protein structure representation, Inventors developed a hierarchical representation of the protein as a graph at the atom level and the amino acid residue level, to better capture the functional characteristics of protein-protein interactions; this representation served as the input for the BindFormer module (FIG.1B). FIG.1B schematically depicts an exemplary architecture of a deep neural network utilized in systems and methods of the inventive concept. For protein representation, the residue-level and atom-level features of the protein were extracted and aggregated using a unified protein graph representation; for the BindFormer module, Inventors applied a dual-path graph neural network based on GEA (geometry and energy attention) mechanisms; finally, a multi-task learning method was used to integrate heterogeneous biological experiments and measurements for robust and scalable affinity tasks. Inventors believe that the integration of certain known ‘knowledge’ of the protein-protein interaction process, e.g. the binding free energy (solvation free energy and molecular mechanical energy), can improve prediction performance. The BindFormer module is a dual-path network with novel GEA attention mechanisms that enable message passing in the network. GEA is a geometry-invariant multi-head attention layer that aggregates geometric and energy terms. Furthermore, Inventors applied multi-task learning and model ensembling to increase tolerance to variations across biological experimental measurements (such as affinity change, apparent affinity change, escape score, and log2 enrichment ratio) simultaneously.
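The idea of an attention layer that is invariant to geometry and combines geometric and energy terms can be illustrated with a simplified, single-head sketch. The actual GEA layer is not specified in this passage, so everything below is an assumption made for illustration: the attention logits combine a feature similarity term, a penalty on inter-node distance, and a penalty on a pairwise energy value, and no learned projections are used. Because the geometry enters only through pairwise distances, rotating or translating the whole structure leaves the output unchanged.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def geometry_energy_attention(h, coords, energy, w_dist=1.0, w_energy=1.0):
    """Single-head attention over node features h of shape (N, d).

    Hypothetical logits:
        <h_i, h_j> / sqrt(d) - w_dist * dist_ij - w_energy * energy_ij
    so nearer pairs and pairs with lower (more favourable) energy
    receive more attention. Queries, keys and values are all h.
    """
    n, d = h.shape
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    logits = h @ h.T / np.sqrt(d) - w_dist * dist - w_energy * energy
    return softmax(logits, axis=-1) @ h


rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))        # toy node features
coords = rng.normal(size=(5, 3))   # toy 3-D positions
energy = rng.normal(size=(5, 5))
energy = (energy + energy.T) / 2   # symmetric toy pairwise energies

# Rigid motion: rotate 40 degrees about the z axis, then translate.
theta = np.deg2rad(40.0)
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = coords @ rotation.T + np.array([1.0, -2.0, 0.5])

out_a = geometry_energy_attention(h, coords, energy)
out_b = geometry_energy_attention(h, moved, energy)  # same result as out_a
```

A multi-head version would repeat this computation with separate learned projections per head and concatenate the results.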
Performance on systematic benchmarks and case studies indicates that the UniBind™ framework is accurate, robust, and scalable when applied to heterogeneous multi-source datasets. As shown, systems and methods of the inventive concept can be considered as a series of interconnected functional modules, through which information related to protein structure and interactions flows. One module can be a database module, which includes heterogeneous biologic data (i.e., biologic data originating from more than one source, more than one testing modality, and/or more than one form of expression for the biologic data). For example, such a database module can include protein sequence information and binding energy estimates derived from experimental results as provided by literature sources, as well as a range of experimental results (such as dose/response or EC50 curves for protein binding assays, surface plasmon resonance measurements, results of immunofluorescent microscopy studies, results from flow cytometry studies, etc.). Such a database module might or might not include biological data directly related to the protein or proteins being studied. Another module can be a protein representation module, in which heterogeneous data from the database module is used to construct hierarchal protein structures for proteins in the heterogeneous database. Such unified protein structures (which can be envisaged as a graph) incorporate both atom-level and amino acid-level structural data. Another module can be an AI module, which utilizes information from such unified protein structure models as a training set to derive a protein interaction algorithm. This AI module can also be used to apply the protein interaction algorithm so derived to evaluate or estimate binding characteristics of wild type and/or mutated proteins provided to the AI module.
Such binding characteristics can be reported to provide a library of one or more mutant proteins having a desired increase or decrease in binding to a second protein. Such a library of candidate mutant proteins can be further screened (e.g., using in vitro and/or in vivo methods) to determine functionality.
AI prediction of affinities of protein-protein interactions
[0080] Inventors validated UniBind™ for estimating the impact of single as well as multiple amino acid substitutions on protein-protein interactions using the SKEMPI V2.0 dataset (FIGs.2A to 2D). Changes in dissociation constant (∆Kd, kcal mol⁻¹) were employed to measure the effects of mutations on binding affinity. The AI-predicted versus experimentally measured affinity changes following single or multiple amino acid substitutions were plotted. Inventors applied 10-fold cross-validation to calculate the Pearson’s correlation coefficient (PCC) between experimental and calculated ∆Kd. Overall, the prediction by UniBind™ is accurate, with a PCC of 0.85 (FIG.2A). FIG.2A shows a typical regression correlation between calculated and experimental values of changes in binding affinity for all mutations in SKEMPI V2.0. Inventors stratified the analysis into protein complexes containing single amino acid substitution in the S-protein (FIG.2B) or those with multiple amino acid substitutions (FIG.2C), and accurate results were generated with PCC of 0.78 and 0.91, respectively. Inventors also evaluated the mean absolute error (MAE) of the AI predictions with one or more amino acid substitutions (FIG.2D). Inventors found an MAE of no more than 1.5 kcal mol⁻¹ in all amino acid substitution groups.
AI prediction of the binding affinity between RBD mutations and ACE2
[0081] Inventors evaluated UniBind™ prediction on ACE2-binding affinity changes for all single amino acid substitutions, as well as some multipoint mutations in five SARS-CoV-2 RBD variants in the PADB-SA dataset.
The effect of individual amino-acid mutations in the RBD of SARS-CoV-2 variants was collected from previous experimental studies, in which it was measured as KD,app using a deep mutational scanning approach.
[0082] Inventors analyzed the effects of S-protein amino acid substitutions on its apparent binding affinity change (∆KD,app) to ACE2 (FIG.3A). FIG.3A shows typical regression performance of affinity change prediction between ACE2 and RBD variants measured by ∆KD,app. FIGs.3A to 3E show prediction performance for RBD mutation effects on different SARS-CoV-2 variants including wild type, Alpha (N501Y), Beta (K417N+E484K+N501Y), Delta (L452R+T478K), and Eta (E484K). MAE, mean absolute error; R², coefficient of determination; PCC, Pearson’s correlation coefficient. UniBind™ demonstrated a reliable performance in predicting ∆KD,app with a PCC of 0.90. Inventors stratified the binding affinity prediction into several important SARS-CoV-2 VOC subgroups, including the Alpha (N501Y; FIG.3B), Beta (K417N+E484K+N501Y; FIG.3C), Delta (L452R+T478K; FIG.3D) and Eta (E484K; FIG.3E), using wild-type SARS-CoV-2 as the baseline. The results indicated that UniBind™ was accurate for evaluating the impact of RBD mutations on ACE2 binding, achieving good performance with PCCs between 0.78 and 0.86 for all four variants.
Prediction of ACE2-binding affinity to the S protein and engineering of decoy receptors
[0083] Inventors extended predictions of S-protein ACE2 binding affinity in relation to ACE2 mutations and validated the model on the PADB-AS dataset (FIG.4A). FIG.4A shows typical regression performance of affinity change prediction between RBD and ACE2 variants measured by log2 enrichment ratio.
[0084] FIGs.4B and 4C show typical results of an experimental test of ACE2 variants with single-point mutations (FIG.4B) and multi-point mutations (FIG.4C) binding to RBD using ELISA analysis. PCC, Pearson’s correlation coefficient; SCC, Spearman’s correlation coefficient.
For the effects on ACE2 binding of single-point mutations to S-protein, the predicted log2 enrichment ratio of protein complexes versus actual experimental measurements had a PCC of 0.73.
[0085] As shown, methods of the inventive concept can be used to generate ACE2-derived proteins and peptides that are ACE2 analogs having an increased affinity for coronavirus spike proteins relative to native ACE2. Such methods can, for example, be used to identify such ACE2 analogs that have high affinity binding to spike proteins of two or more pathogenic or potentially pathogenic coronavirus strains. Such ACE2 analogs or mixtures of such ACE2 analogs can be used as ACE2 receptor traps that are effective against a wide range of coronavirus strains. This approach can be extended to the development of analogs of host virus receptors that are similarly effective as receptor traps against a range of pathogenic or potentially pathogenic viruses that interact with such host receptors.
[0086] Inventors further validated the method using a subset of PADB-AS data by comparing predicted affinities with experimental multi-point mutations of ACE2 binding data obtained from flow cytometry, yielding a high PCC of 0.83 (FIG.4C). FIG.4C shows a typical correlation between predicted and experimental RBD-ACE2 affinities, measured by log2 enrichment score and ∆MFU×1000, respectively. Experimental data were collected from a literature source, in which RBD-ACE2 affinity was tested using flow cytometry.
[0087] Soluble ACE2 can act as a decoy to neutralize SARS-CoV-2 infection. Accordingly, UniBind™ can be used to design high affinity ACE2 receptor decoy molecules as a general strategy to target all current and future variants. Based on predictions, UniBind™ identified 111 single amino acid substitutions on ACE2 (SEQ ID NO.1) which could potentially increase the binding affinity towards S-protein.
Inventors further applied an in silico evolution method to generate 13,913 ACE2 variants containing between 1 and 4 amino acid changes that lead to increased affinity. Inventors compared these predictions with the best candidate in previous literature (sACE2.v2.4) and the wild type. The predicted log2 enrichment ratios of these ACE2 variants corresponded to affinities approximately 10-fold higher than sACE2.v2.4 and 1000-fold higher than the wild type (FIG.4D). FIG.4D shows a distribution map of log2 enrichment scores of newly designed ACE2 variants. The orange line shows the reference ACE2 variant (sACE2.v2.4) collected from literature, and the green lines show the ACE2 variants selected for experimental validation. While human ACE2 (ACE2-WT, SEQ ID NO.1) was utilized in these studies, Inventors contemplate that ACE2 of other species, such as Mus musculus (SEQ ID NO.2) or Rattus norvegicus (SEQ ID NO.3), can be similarly modified. Mutations to human ACE2 (SEQ ID NO.1) utilized in these studies are summarized below in Table 2.

Table 2
[0088] Five ACE2 variants were selected for experimental validation (ACE2-1, ACE2-2, ACE2-7, ACE2-8 and ACE2-9) and compared to sACE2.v2.4, known to have the highest affinity for RBD binding. ELISA experiments showed that the EC50s of these five variants (ranging from 0.54 μg/ml to 1.36 μg/ml) were lower than those of both wild-type ACE2 (5.2 μg/ml) and sACE2.v2.4 (1.83 μg/ml), indicating higher binding affinity as predicted by UniBind™ (as shown in FIG.4E). The results highlight the potential application of UniBind™ in therapeutic protein engineering.
[0089] A post-hoc Dunnett’s test was used for pairwise comparisons of each variant against ACE2-WT. Results are summarized in Table 3.
Table 3
[0090] Table 3 summarizes results of logistic fit parameters of ACE2 variant binding to S-protein. Mean ± S.E.M. of EC50 and log EC50 derived from individual logistic fits to ELISA data. Data represent n = 3 for each variant. Statistical analysis was performed across different variants using one-way ANOVA (P < 0.0001). A one-way ANOVA with a post-hoc Dunnett’s comparison revealed that the five ACE2 variants cited above, which were predicted to bind to S-protein with higher affinity, all showed significantly lower log EC50 than ACE2-WT. This demonstrates the applicability of UniBind™ in the rational engineering of therapeutic proteins. The molecular basis for the affinity enhancement in regard to therapeutic proteins useful in treating and/or preventing coronavirus infection can be rationalized from structural analyses as shown in FIGs.4F and 4G. FIG.4F shows interactions between wild-type and genetically modified ACE2 and SARS-CoV-2-RBD, where genetically modified ACE2 has an N330Y mutation and interaction with P499 of SARS-CoV-2-RBD. FIG.4G shows interactions between wild-type and genetically modified ACE2 and SARS-CoV-2-RBD, where genetically modified ACE2 has a Q42L mutation and interaction with Y449 of SARS-CoV-2-RBD.
For example, Q42 is situated in a highly negatively charged area which may prevent interaction with the RBD (refs. 35, 36). Therefore, the Q42L mutation may increase the hydrophobic area of ACE2 and improve binding to Q498 and Y449 on the RBD. Furthermore, the N330Y substitution may provide additional van der Waals contacts and H-bonds with the RBD.
[0091] Inventors have found that methods of the inventive concept provide greatly improved accuracy in predicting effects of single-point and multi-point mutations on the strength of inter-protein interactions relative to conventional methods. Table 4 shows typical results of a comparison of methods of the inventive concept and various conventional methods on prediction performance in the SKEMPI 2.0 set with mutation-level validation. Values are Pearson correlations between AI model-predicted ∆∆G data and reported experimental ∆∆G data. S1131: a subset of 1,131 non-redundant interface single-point mutations. S4169: a subset of 4,169 single-point mutations compiled from the SKEMPI 2.0 dataset. S8338: a subset of 4,169 single-point mutations and all the corresponding reverse mutations. M1707: a subset of 1,707 non-redundant interface multi-point mutations.
Table 4
While this exemplary data is generated using the SKEMPI 2.0 data set, Inventors believe that similar improvements are realized on application of methods of the inventive concept to other protein-related data sets.
AI prediction of neutralizing antibody binding affinities to SARS-CoV-2 variants
[0092] Identifying the binding affinity between S-protein and antibody is essential for predicting immune evasion by current and future SARS-CoV-2 variants. UniBind™ can yield escape scores to reflect the S-protein antibody affinity after training using the PADB-SAb dataset.
[0093] Inventors evaluated the performance of UniBind™ on single S-protein mutations using the deep mutational scanning data in PADB-SAb.
Correlation analyses revealed a PCC of 0.85, indicating a good correlation between the predicted versus actual escape scores (FIG.5A, FIG.5B, FIG.5C). FIG.5A shows regression performance of RBD-antibody affinity (escape score) prediction for the effects of different RBD mutations on RBD-Ab binding. FIG.5B shows a heatmap of an experimental escape score matrix upon mutations of RBD to different antibodies. Brightness represents the escape score. A brighter dot indicates that the mutation at the site position on the x-axis is more likely to lead to higher immune escape for the antibody on the y-axis. MAE, mean absolute error; R², coefficient of determination; PCC, Pearson’s correlation coefficient. FIG.5C shows a heatmap of a predicted escape score matrix upon mutations of RBD to different antibodies. Brightness represents the escape score. A brighter dot indicates that the mutation at the site position on the x-axis is more likely to lead to higher immune escape for the antibody on the y-axis. Moreover, the predicted escape score showed a good correlation with neutralization experiment data in the PADB-SAb dataset (FIG.5D, FIG.5E, FIG.5F, FIG.5G), with all PCCs above 0.8 in four classes of antibodies, validating its utility in predicting antibody escape. Antibody escape scores range from 0 to 1, with increasing scores denoting higher levels of escape. The average escape scores of each site on the RBD are shown in FIG.5H. UniBind™ prediction of the RBD sites essential for immune escape is consistent with experimental data. FIGs.5D to 5G show stratified analysis of regression performance on 4 classes of neutralizing antibodies. FIG.5H shows the average mutational effects on escape score at each RBD site. The blue line represents experimental data; the orange line represents predicted results; shadows of each color indicate the standard error of each line.
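The three regression metrics used throughout the figures (MAE, R², and PCC) can be illustrated with a minimal Python sketch. This is not the Inventors' evaluation code; the function name `regression_metrics` and the toy values are assumptions made only for illustration, and the formulas are the standard textbook definitions.

```python
import math


def regression_metrics(experimental, predicted):
    """Return (MAE, R2, PCC) for paired experimental and predicted values.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    # Mean absolute error: average size of the prediction error.
    mae = sum(abs(e - p) for e, p in zip(experimental, predicted)) / n
    # Coefficient of determination: 1 - (residual SS / total SS).
    ss_res = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    ss_tot = sum((e - mean_e) ** 2 for e in experimental)
    r2 = 1.0 - ss_res / ss_tot
    # Pearson correlation: covariance scaled by both standard deviations.
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pcc = cov / math.sqrt(ss_tot * var_p)
    return mae, r2, pcc


# Toy data: predictions are offset by a constant from the measurements.
mae, r2, pcc = regression_metrics([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
print(mae, r2, pcc)
```

In the toy data the predictions track the measurements perfectly in rank and slope, so the PCC is 1, while the constant offset gives an MAE of 1 and a negative R². This is one reason the figure legends report MAE and R² alongside PCC rather than PCC alone.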
[0094] Next, to predict the antibody escape ability of different S-protein variants, Inventors utilized UniBind™ to generate an escape score matrix of common variants to current antibodies (FIG.5I). FIG.5I shows an escape score matrix of S protein variants to different antibodies calculated by UniBind™. The x-axis is antibodies, and the y-axis is common single point mutations and variants of concern. The higher the escape score, the easier it is to escape. The results are consistent with the current consensus that Omicron and its derivative variants display the strongest immune escape ability.
[0095] To further validate the performance, Inventors set several thresholds for neutralization data according to the original literature, and divided the predicted data into two groups: Escape or Non-Escape. The receiver operating characteristic (ROC) plot shows that our predicted escape score could accurately identify the ability of different variants to escape neutralization by different antibodies, with an area under the curve (AUC) of 0.944 (FIG.5J).
[0096] Stratification analysis showed that our model has a robust performance on both single mutation variants (AUC = 0.954) and VOCs (AUC = 0.937). A subgroup analysis of antibody escape profile on eight VOCs indicates that the predicted escape score is closely correlated with a neutralization escape profile (FIG.5K). FIG.5K shows predicted escape scores of antibodies as individual VOC boxplots. For each analysis, antibodies were separated into two groups that can be escaped or not escaped by SARS-CoV-2 variants according to relevant literature. The center line indicates median; box limits indicate upper and lower quartiles; whiskers indicate 1.5x interquartile range; points indicate outliers; P values less than 0.05, 0.01, 0.001, 0.0001 are summarized with one to four asterisks, respectively.
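The Escape/Non-Escape validation described above can be sketched as follows. This is a minimal illustration rather than the Inventors' implementation. The helper names `label_escape` and `roc_auc` are hypothetical, and the direction of the threshold (a larger fold reduction in neutralization means escape) is an assumption, since the source states only that thresholds were taken from the original literature. The AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
def label_escape(fold_reductions, threshold):
    """Binarize neutralization data: 1 = Escape, 0 = Non-Escape.

    Assumption for illustration: values are fold reductions in
    neutralization, and values at or above the threshold count as escape.
    """
    return [1 if value >= threshold else 0 for value in fold_reductions]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen Escape case receives a
    higher predicted escape score than a randomly chosen Non-Escape case,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four variant/antibody pairs with made-up values.
labels = label_escape([12.0, 15.0, 1.5, 2.0], threshold=10.0)
print(roc_auc(labels, [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a sort-based rank computation gives the same value for larger datasets.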
AI longitudinal prediction on viral evolution and antibody escape
[0097] Inventors have found that UniBind™ can accurately predict the S-protein ACE2 binding affinity and potential for antibody escape. Based on these findings Inventors believe that UniBind™ can be used to perform a prospective analysis on potential SARS-CoV-2 variants, including AI-based lineage analysis, AI-based deep mutational scanning, and model-guided evolution. Prediction of the affinity and immune escape of current and future VOCs can pre-emptively inform pandemic control measures such as vaccinations and targeted therapeutics development.
[0098] Inventors initially utilized the UniBind™ model to assess S-protein trimer-ACE2 binding affinities and antibody binding affinities of variants, and plotted these predictions against the time of the variants’ appearance in the course of the COVID-19 pandemic (FIG.6A), including the recently identified Omicron BA.4 and BA.5 variants. FIG.6A shows predicted ACE2 binding affinity and antibody escape versus dates of variant emergence in the course of the COVID-19 pandemic. Circles represent reported SARS-CoV-2 variants from GISAID data; circles annotated with common VOC names and their PANGO lineage represent variants examined in a number of experimental ways. Co-dimensions: time from January 2020 to September 2022 (x-axis), antibody escape scores (y-axis), ACE2 affinity (color and circle size; more red and a larger circle mean increased affinity). AI prediction showed an overall trend for the newer variants to have a higher antibody escape score. The Omicron sub-lineages BA.1 and BA.2, which emerged late in 2021, showed a reduction in ACE2 affinity but enhanced antibody escape. The Omicron sub-lineages 22A (BA.4), 22B (BA.5), and 22C (BA.2.12.1) were predicted to show enhanced S-protein ACE2 binding affinity as well as an increase in antibody escape.
UniBind™’s predicted S-protein ACE2 affinities on all VOCs are consistent with those reported in the literature, with a PCC score of 0.74 in the RBD-ACE2 affinity prediction (FIG.6B), and a PCC score of 0.89 in the S-protein trimer-ACE2 affinity prediction (FIG.6C). FIG.6B shows a correlation between AI-generated measurements of S-protein trimer-ACE2 affinity with experimental results. FIG.6C shows a correlation between AI-generated measurements of RBD-ACE2 affinity with experimental results. Inventors have noted that such experiments performed using prior art methods showed inconsistent affinity measurements on some variants using a single RBD or trimer. However, UniBind™ predictions were in good agreement with those reported in the literature, suggesting an intrinsic affinity difference between the RBD-ACE2 and S-protein trimer-ACE2.
[0099] Inventors also performed in silico deep mutational scanning prediction on long amino acid sequences to generate a sequence-function profile, to simultaneously predict the consequences of amino acid substitution mutations on ACE2 affinity in the RBD segment (FIG.6D) as well as the antibody escape status (FIG.6E). FIG.6D shows a heat map illustrating the effect of mutations in the RBD segment on ACE2 binding affinity changes; red color means increased affinity and blue color means decreased affinity. FIG.6E shows a heat map showing the effect of mutations in the RBD segment on antibody escape scores; blue means decreased antibody binding affinity. This approach is highly consistent with previous studies using an experimental deep mutational scanning method for measurement of mutations in RBD that affect ACE2 affinity (FIG.6D). Moreover, UniBind™ can simultaneously predict affinity changes on multiple mutations such as those from all 16 VOCs, which addresses the problem of a heterogeneous batch effect in experimental biology.
In addition, escape scores were calculated by surveying 80 neutralizing antibodies and averaging the escape score of AI-based antibodies to generate a sequence-to-escape heatmap (FIG.6E), which can better reflect a variant’s overall immune escape ability.
[00100] The evolution of SARS-CoV-2 is mainly determined by two broad categories of changes to the virus: infectivity/transmissibility and immune response escape. To address this, Inventors designed an affinity-based evolutionary score (evo-score) system that takes into consideration S-protein ACE2 binding affinity and S-protein antibody binding affinity, and compared these data to analogous scores calculated using a mutational fitness analysis method. Analysis of the UniBind-predicted evo-score in the context of viral fitness showed a high correlation with the corresponding mutational fitness profile of each variant (FIG.6J). Using the five existing Omicron variants (BA.2, BA.2.3, BA.4, BA.5, BA.2.12.1) as a starting point, UniBind™ predicted 7,560 variants based on a combination of 1-4 non-synonymous mutations from the 40 substitutions with a high score of non-synonymous mutations. Stratifying the variants using this approach reveals a number of clusters consistent with the overall evolution of SARS-CoV-2 variants. Predictions for a viral evolution analysis of SARS-CoV-2 were validated using the GISAID dataset. The predicted viral fitness landscape of the current variants from the GISAID database is shown in FIG.6F. FIG.6F shows characteristics of a viral lineage evolutionary path. Contour lines represent the affinity-based landscape. Each circle marks a cluster of variants with a similar evolutionary property. UniBind™ also predicted unknown variants (blue area, FIG.6F) for their S-protein ACE2 affinity and antibody escape ability.
According to the clustering obtained from our prediction, the evolution of SARS-CoV-2 was initially directed toward affinity enhancement, with only a modest incremental antibody escape (purple to red, FIG.6F). The emergence of the Omicron (BA.1) and Omicron (BA.2) variants represented a mild reduction in S-ACE2 affinity, but dramatically enhanced antibody escape (red to green, FIG.6F), which may represent a result of the immune pressure from previous infection or of vaccine-induced immune selection. The model predicted that the BA.4 and BA.5 variants can display further antibody escape, with little change in S-ACE2 binding affinity, versus BA.1 and BA.2. Interestingly, the Omicron BA.2.12.1 variant displays a near one log-unit improvement in ACE2 affinity, without large changes in antibody escape. UniBind™ predicted a stronger overall evo-score for variants evolving in the direction of a higher antibody escape ability.
[00101] Inventors applied the evo-score to predict the evolution of the Omicron variants (FIG.6G). FIG.6G shows characteristics of the evolution of AI-predicted new variants based on five Omicron lineages. Blue dots represent new variants, green dots represent the original Omicron lineages, and orange dots represent variants of interest with the top five evo-scores for each Omicron lineage. UniBind™ can predict the evolution of the Omicron variants to even higher evo-scores driven by several key non-synonymous mutations, particularly A475E, which occurs most frequently. The main determinant of higher evo-scores in these predicted variants is enhanced antibody escape, with their S-protein ACE2 affinity values remaining around the same. Finally, UniBind™ also predicted that there is a possibility for future variants evolving to high S-protein ACE2 affinity values, underscoring a risk for potentially more virulent strains (FIGs.6H and 6I). FIG.6H shows distribution maps of ACE2 binding affinity of variants from subsampled GISAID data.
FIG.6I shows distribution maps of AI-predicted ACE2 binding affinity values of potential variants. FIG.6J shows correlation analysis between reported fitness and affinity-based evolutionary score (evo-score). FIG.6K shows characteristics of single mutations’ effects on ACE2 binding affinity, antibody escape, and evo-score. Dashed lines represent contour lines of evo-score; blue dots represent single mutations; orange dots show several well-known mutations which could significantly improve virus evolution. FIG.6L shows effects of essential mutations on evo-scores of five recent Omicron lineages. Circle size and color represent appearance frequency in top score variants that were derived from a relevant original lineage.
[00102] Inventors have combined heterogeneous sets of protein-protein binding affinity data with AI-based protein sequence-to-function modeling to systematically identify and determine various affinity related tasks. There are a number of challenges in such an approach to prediction of protein-protein interactions. First, protein representation needs to take into consideration the entirety of the binding interface and the residues which form chemical bonds with each other in the setting of protein-protein interactions. Second, prior learning approaches have limitations due to poor scalability to large datasets, and predictions limited to single mutation variants. Therefore, computational integration methods with scalability represent a significant hurdle towards development of rapid, robust, and accurate tools for assessment of protein interactions, which have applicability to a wide variety of protein-protein interaction applications, such as infectivity and neutralization escape for new SARS-CoV-2 variants.
[00103] Inventors have both developed a generalizable modular UniBind™ framework and demonstrated real-world application of same.
A graph neural network was developed for protein representation by integrating the information on both the residue main backbone at a molecular level and the side chain information at an atomic level, and their interactions. A novel BindFormer™ block approach was developed to learn their interactions measured by quantum physics and thermodynamics. Therefore, UniBind™ is better attuned to protein-protein interaction prediction, as both structural changes and energetic effects are crucial for protein-protein binding affinity prediction. Furthermore, UniBind™ integrates several heterogeneous sources of datasets and performs multi-task learning and model assembling to predict various task-specific affinity changes (for example S-protein ACE2 interaction and antibody escape scores). Inventors have validated UniBind™ on major publicly available datasets for affinity prediction, and this has demonstrated that UniBind™ is accurate, robust, and scalable. Another advantage of UniBind™ prediction on the S-protein ACE2 affinities is that it is based on using a full-length S-protein, which is very desirable and feasible by AI, yet infeasible or impractical in biological experiment designs. Inventors have found that prior art experimental methods using affinity values of the RBD-ACE2 interaction to represent affinities of the full-length S-protein ACE2 interaction have technical challenges, limitations, and inaccuracies. Inventors have also found that UniBind™ provides methods for designing high affinity ACE2 receptor decoy molecules as a general strategy to target all current and future variants, and have validated this experimentally.
[00104] New variants of SARS-CoV-2 have emerged and will continue to emerge that have significant improvements in their fitness, which drives waves of cases in the pandemic. SARS-CoV-2 variant fitness can be measured by the affinity between the S-protein and ACE2, and the affinities between the S-protein and its neutralizing antibodies.
The first interaction provides a measure of infectivity and the second a measure of the immune escape potential. These can be assessed using in vitro approaches in the laboratory, but this is potentially hazardous, time-consuming, costly, and error-prone (e.g., cross contamination, human error, etc.). Such prior art approaches also do not provide for the evaluation of the large numbers of variants which are currently being generated, and can only increasingly lag real-world needs as new variants emerge. Using UniBind™ deep mutational scanning, Inventors have developed an affinity-based evolutionary score (evo-score) system that takes into consideration S-protein ACE2 binding affinity and S-protein antibody binding affinity. To the best of the Inventors’ knowledge, this is the first construction and quantification of a SARS-CoV-2 variant fitness landscape using these two main determinant factors, which can reveal the biological mechanisms of fitness and potential evolution trends.
[00105] UniBind™ accurately predicted a significantly increased potential for immune escape, yet a modestly increased affinity of the Omicron variants’ S-protein towards ACE2. This is consistent with the growing body of in vitro and clinical data on the Omicron sub-lineages (BA.1, BA.2, BA.4, and BA.5) with respect to their increased transmission, neutralizing antibody escape, and/or decreased vaccine efficacy. UniBind™ predicted that the S-protein of the BA.4 and BA.5 variants has an increased antibody escape score, but its affinity towards ACE2 remains similar to that of the BA.2 variant. This means that the infectivity and severity of the BA.4 and BA.5 variants are expected to be comparably low, like the BA.2 variant. Inventors believe that the therapeutic efficacy of current neutralizing antibodies will be further compromised against the BA.4 and BA.5 variants.
Importantly, UniBind™ predicted that additional mutations in the Omicron BA.4 background will result in variants with reduced S-protein affinity towards ACE2 (FIG.6G and FIG.6H). In contrast, UniBind™ prediction on the potential variants with the strongest S-protein ACE2 affinities pointed to lineages from the clade 19B and Alpha (B.1.1.7) of the first wave of infections from December 2020 to May 2021, all of which carry the N501Y mutation (FIG.6A and FIG.6H). Other variants with the strongest S-protein ACE2 affinities shared the L452R mutation carried by the recently emerged Omicron BA.2 (FIG.6H). It was previously reported that N501Y and L452R increase S-ACE2 binding affinity. More than 30,000 current and predicted future SARS-CoV-2 variants were searched and evaluated. Inventors found that there may be future variants with an increased S-protein affinity towards ACE2, potentially emerging from some of the previous lineages such as Alpha or Beta VOCs (FIG.6A, FIG.6I). Moreover, the current BA.1-4 lineages are less likely to evolve into a strain with a much higher S-ACE2 affinity (FIG.6I).
[00106] Systems and methods of the inventive concept provide a general framework for predicting the affinity of a protein-protein complex, and so provide methods directed to rapid prediction and screening for future outbreaks and information for future vaccine development. However, their impact on biology and medicine goes far beyond VOC predictions. Examples of other applications include methods for designing better antibodies for immune-cancer therapies and designer drugs for a ligand-receptor interaction. Systems and methods of the inventive concept can include codes and datasets available through governmental Infectious Diseases Control Units as well as the entire scientific and medical communities in order to take full advantage of these resources.
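The model-guided evolution described earlier (adding one to four substitutions to an existing lineage, scoring each candidate, and keeping the highest-scoring variants) can be sketched in Python as follows. Everything here is an illustrative assumption rather than the Inventors' method: the source does not disclose the evo-score formula, so `evo_score` is shown as a hypothetical equally weighted sum, `predict` stands in for a trained model that returns an (ACE2 affinity, antibody escape) pair, and the rule that a variant carries at most one new substitution per site is a simplification.

```python
from itertools import combinations


def site(mutation):
    """Extract the residue position from a label such as 'N501Y'."""
    return int(mutation[1:-1])


def enumerate_variants(base_mutations, candidates, max_new=4):
    """Yield mutation sets: the base lineage plus 1..max_new candidates.

    Combinations placing two new substitutions at the same site are skipped.
    """
    base = frozenset(base_mutations)
    pool = sorted(set(candidates) - base)
    for k in range(1, max_new + 1):
        for combo in combinations(pool, k):
            sites = [site(m) for m in combo]
            if len(set(sites)) == len(sites):
                yield base | frozenset(combo)


def evo_score(affinity, escape, w_affinity=0.5, w_escape=0.5):
    """Hypothetical evo-score: a weighted sum of the two predicted values."""
    return w_affinity * affinity + w_escape * escape


def top_variants(base_mutations, candidates, predict, n=5, max_new=4):
    """Rank enumerated variants by evo-score and return the best n."""
    scored = []
    for variant in enumerate_variants(base_mutations, candidates, max_new):
        affinity, escape = predict(variant)
        scored.append((evo_score(affinity, escape), sorted(variant, key=site)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n]


# Toy stand-in for a trained predictor (not a real model).
def toy_predict(variant):
    return float(len(variant)), 1.0 if "A475E" in variant else 0.0


for score, mutations in top_variants(["N501Y"], ["A475E", "L452R", "L452Q"],
                                     toy_predict, n=3, max_new=2):
    print(score, mutations)
```

Exhaustive enumeration grows combinatorially with the number of candidate substitutions, so a practical search over a large candidate pool would likely need pruning or a beam search; the sketch keeps the exhaustive form because it is the simplest correct instance of the idea.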
[00107] In some situations a decrease in the S-protein ACE2 binding affinity may be paralleled or compensated by an increase in the viral replication efficiency, e.g., mutants with a more efficient viral polymerase or other viral replication-related proteins, more efficient packaging mutants, or an improved host-viral interaction, all of which may facilitate viral survival or fitness. It is known that as a virus evolves, the various parts of the genome evolve in such a way that they cluster into the same genotype or subtypes. Therefore, even though there may be changes in other parts of the viral genome that account for viral fitness, such as enhanced viral replication efficiency, the changes in the S-protein should also be reflective of the same evolutionary changes. In some embodiments of the inventive concept, UniBind™ can incorporate and utilize data representing other parts of the viral genome (e.g., those related to reproductive efficiency) and utilize such data within functional models as described above for use in evaluating current and projected mutations for virulence, escape from immune protection, etc.
[00108]
[00109] It should be appreciated that the AI-based methods for predicting protein-protein interactions have a variety of practical applications. In some embodiments such an AI-based approach can be used in methods to predict the effects of specific mutations in a protein’s amino acid sequence on interactions with one or more binding ligands (such as the same or a different protein, a nucleic acid, a carbohydrate polymer, etc.). If such a binding ligand is involved in a disease process, accurate prediction of mutations that provide for enhanced binding (e.g., providing a lower dissociation constant) relative to an initial therapeutic protein selected for optimization can be used to generate a library of one or more mutated proteins with improved binding characteristics.
Such mutated proteins can then be utilized in screening studies to identify those with binding characteristics that can provide an improved therapeutic protein. Such screening studies can be performed in vitro (e.g., microplate or microbead-based binding studies using labeled proteins) and/or in vivo (e.g., using animal models of disease). For example, a method using an AI-based approach as described above can begin with a therapeutic antibody selected to bind to an immune checkpoint protein (such as PD-1 or PD-L1) to generate a library of one or more mutated antibodies with increased binding affinity for the immune checkpoint protein. Elements of such a library can then be screened for increased binding, for example using an ELISA directed to the immune checkpoint protein. Alternatively, or in addition, elements of such a library can be screened using an animal model for a PD-1 bearing cancer. Screening methods utilizing cells grown in culture and/or artificial organ systems can also be used. In some embodiments data from such screening experiments can be provided to the AI-based method as an experimental database, which can in turn be used to refine results from the AI-based method.
[00110] In some embodiments the target ligand can be associated with an infectious disease. In such embodiments the binding ligand can be a component of the pathogen, such as a surface protein or glycoprotein. Such a binding ligand can be directly involved in the disease process (e.g., a viral protein utilized in host cell recognition and entry) or can simply be characteristic of the pathogen. In such embodiments a method using an AI-based approach as described above can begin with a therapeutic antibody selected to bind to a target ligand of the pathogen to generate a library of one or more mutated antibodies with increased binding affinity for the target ligand.
Elements of such a library can then be screened for increased binding, for example using an ELISA directed to the target ligand. Alternatively, or in addition, elements of such a library can be screened using an animal model for the infectious disease. Screening methods utilizing cells grown in culture and/or artificial organ systems can also be used. In some embodiments data from such screening experiments can be provided to the AI-based method as an experimental database, which can in turn be used to refine results from the AI-based method.
[00111] Alternatively, in some embodiments methods using an AI-based approach as described above can be used to identify strains or mutations of a pathogen with reduced affinity for binding interactions with a therapeutic protein. For example, mutated proteins encoded by emergent or potential pathogenic virus strains can be scored for binding to therapeutic proteins that interact with the corresponding wild-type protein. Strains expressing reduced binding to the therapeutic protein can be aggregated in a library of pathogen strains that may escape treatment with the therapeutic protein. Elements of such a library can then be screened for reduced binding, for example using an ELISA directed to the therapeutic protein. Alternatively, or in addition, elements of such a library can be screened using an animal model for the infectious disease. Screening methods utilizing cells grown in culture and/or artificial organ systems can also be used. In such embodiments an AI-based approach as described above can be used to develop a library of mutated therapeutic proteins with enhanced interaction with elements of the library of mutated pathogen proteins, permitting identification of potential therapies as the mutated pathogen becomes prevalent.
In some embodiments data from such screening experiments can be provided to the AI-based method as an experimental database, which can in turn be used to refine results from the AI-based method.

[00112] Wildlife species are known reservoirs for coronaviruses. This raises the possibility that zoonoses and/or reverse zoonoses can occur between humans and animals, providing opportunities for rapid evolution of SARS-CoV-2 and/or SARS-CoV-1 to generate new and potentially highly virulent variants. Prediction of S protein binding to ACE2 orthologues in different species can facilitate surveillance and early warning of potentially virulent strains of coronavirus in susceptible wildlife species. Inventors applied UniBind™ to predict cross-species binding affinities of RBD and ACE2, using high-throughput assay data that profiled binding affinity across sarbecoviruses and ACE2 orthologues. The resulting heatmap and association analysis showed that the predictions generated by UniBind™ were highly correlated with the experimental data, with a Pearson correlation coefficient (PCC) of 0.87. FIG. 4H shows a heatmap of S-protein–ACE2 binding affinities across species. The left panel of FIG. 4H shows AI-predicted values generated by a method of the inventive concept. The right panel shows corresponding experimental data. Sarbecoviruses are colored by clade. FIG. 4I shows a typical regression analysis of predicted versus experimental affinity change between S-proteins of sarbecoviruses and ACE2 orthologues of humans. FIG. 4J shows a heatmap of predicted affinity values for S-protein–ACE2 binding between SARS-CoV-2 variants and ACE2 proteins from 24 animal species. Tiles with labels (circles and dots) represent the affinities between related ACE2 orthologues and SARS-CoV-2 spike variants. Circles indicate that the variants were reported to bind to the relevant ACE2 orthologues; dots indicate that the variants were reported not to bind to the relevant ACE2 orthologues.
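The agreement between predicted and experimental affinities in paragraph [00112] is summarized by a Pearson correlation coefficient. The following is a minimal sketch of that statistic in plain Python. The function name and the input values in the usage note are illustrative assumptions and are not data from this specification.

```python
import math


def pearson_correlation(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)
```

For example, `pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` evaluates to 1.0 because the two hypothetical series are perfectly linearly related.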
[00113] It should be appreciated that methods as described above can be performed using an automated or partially automated system. Such a system can include a computer that encodes elements of the AI and is in communication with suitable databases, and can include effectors and sensors suitable for performing physical screening studies. Suitable effectors include liquid handling devices, handlers for disposable components (e.g., test plates), and incubators. Suitable sensors include colorimeters, spectrophotometers, fluorometers, luminometers, imaging systems, etc. Such systems can include a controller for directing effector functions. Such a controller can include encoded instructions for the performance of screening assays of candidate proteins identified by the AI-based methods. In some embodiments data generated by sensor systems can be provided as an experimental database that is in communication with the computer encoding elements of the AI.

Methods

Expression and Purification of Recombinant ACE2 Variants

[00114] The mature polypeptides of human ACE2 (GenBank NM_021804.1) wild type and variants were cloned into the eukaryotic expression plasmid pFcIg (ABLINK Biotech) with a C-terminally fused Fc region of human IgG1 using the Gibson Assembly method. DH10B competent cells (ABLINK Biotech) were electroporated with the assembly products and cultured on LB agarose plates containing 25 μg/mL Zeocin (Invitrogen). Monoclonal colonies were selected and sequenced to confirm the mutations. The monoclonal colonies were then cultured overnight in LB containing 25 μg/mL Zeocin to enhance the plasmid yield. Recombinant plasmids were extracted using an endotoxin removal plasmid extraction kit (TIANGEN™). 50 ml of HEK 293F cells were transfected with 25 ng of recombinant plasmids using FectoPRO™ (Polyplus) transfection reagent to express the target proteins. The culture medium was collected after 5 days of incubation.
Recombinant ACE2 proteins were extracted using protein A dextran and purified using SDS-PAGE electrophoresis. The obtained recombinant ACE2 proteins were in the natively dimeric form.

ELISA

EC50s of ACE2 mutants binding to RBD were measured by indirect ELISA as previously described (ref. 3). Wells of a 96-well plate were coated with 200 ng of recombinant RBD protein (ABLINK Biotech) at 4 °C overnight. After removing the supernatant, the wells were blocked using 1% BSA at room temperature for 2 hours and then washed 3 times using wash buffer PT (Abcam). Recombinant ACE2, ACE2-1, ACE2-2, ACE2-7, ACE2-8, ACE2-9 and sACE2.v2.4 proteins at 10 mg/ml were serially diluted 1:3 to give 7 concentrations and added to the blocked plate at 100 μl per well. After incubating at room temperature for 2 hours, each well was washed and 100 μl of HRP-conjugated anti-human IgG Fc antibody (Sigma) was added for 1 hour at room temperature. Unbound antibody was removed by washing, and TMB substrate (Solarbio) was added for the colorimetric reaction. After 5-10 min of incubation, 50 μl of 1 M phosphoric acid was added to each well to stop the reaction. The signal, which reflects the absorbance of the product, was measured at 450 nm. A 4-parameter logistic model was applied to calculate the EC50 of ACE2 binding according to the NR values of serially diluted samples.

Dataset characteristics

[00115] The protein-protein binding affinity datasets used here were constructed from the SKEMPI V2.0 database and the literature. SKEMPI V2.0 is a manually curated database of affinity changes upon mutation for structurally solved protein-protein interactions, which currently contains 7,085 mutations in total. The SKEMPI V2.0 database reports many kinetic and thermodynamic parameters; here Inventors used dissociation constants (Kd) to represent affinity. The structures of mutant and wild-type protein complexes were downloaded from the SKEMPI V2.0 website (https://life.bsc.es/pid/skempi2).
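Because paragraph [00115] uses dissociation constants to represent affinity, it may help to see how a pair of Kd values relates to a binding free energy change. The sketch below uses the standard thermodynamic relation ΔΔG = RT ln(Kd,mutant / Kd,wild-type). The temperature of 298.15 K, the kcal/mol units, the sign convention (positive means weaker binding), and the function name are assumptions made for illustration and are not stated in this specification.

```python
import math

# Gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 1.987204e-3


def ddg_from_kd(kd_mutant, kd_wild_type, temperature_k=298.15):
    """Binding free energy change (kcal/mol) implied by two dissociation constants.

    ASSUMPTION: temperature defaults to 298.15 K; the source does not state one.
    Positive values mean the mutant binds more weakly than the wild type.
    """
    if kd_mutant <= 0 or kd_wild_type <= 0:
        raise ValueError("dissociation constants must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_mutant / kd_wild_type)
```

Under these assumptions, a tenfold increase in Kd corresponds to roughly +1.36 kcal/mol, and equal Kd values give exactly zero.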
[00116] The effects of SARS-CoV-2 RBD mutations on RBD-ACE2 binding affinity were collected from literature sources, in which they were measured as apparent dissociation constants ($K_{D,app}$) using a deep mutational scanning approach. The effects of ACE2 mutations on RBD-ACE2 binding affinity were collected from literature sources and estimated by the log2 enrichment ratio, which was calculated by comparing transcript frequencies between enriched cell populations and the naïve plasmid library. Flow cytometry analysis data were utilized for validation. The effects of RBD mutations on RBD-antibody binding affinity were collected from literature sources and estimated by the escape score, which was calculated by comparing barcode frequencies of variants between immune escape cell populations and reference populations, followed by normalization within each antibody. Neutralization assay data from recent literature were included as validation. The structure of the wild-type RBD-ACE2 complex was obtained from the Protein Data Bank (PDB) with accession number 7df4. The structures of Spike-antibody complexes were collected and curated from the PDB. Structures of mutation-harboring RBD-ACE2 complexes were derived from the wild-type structure by replacing amino acids with substitutions and were optimized using EvoEF2 (Huang et al., 2020). Fitness data for reported and existing SARS-CoV-2 variants were collected from GISAID (22 May 2022).

AI model overview

[00117] Systems and methods of the inventive concept utilize a deep learning framework named UniBind, developed by the Inventors, which learns from multiple heterogeneous protein complex datasets to estimate the functional impact of mutations on the affinities of protein-protein interactions and to search for unseen proteins with desired properties.
There are three major components in this framework: a hierarchical protein representation as a graph at the atom level and residue level; a new dual-path neural network named BindFormer, with geometry and energy attention (GEA) modules for aggregating messages and iterative refinement; and a multi-task learning method for heterogeneous biological data training. Finally, Inventors developed an affinity-based prospective analysis module for comprehensive analysis including lineage analysis, AI-generated deep mutational scan, fitness landscape depiction, and variant evolution.

Protein representation as a graph

[00118] Given an input of the wild-type structure and its corresponding mutant structure, Inventors represented it as an attributed graph encoded with sequence, geometry, and energy information at both the residue level and the atom level. Specifically, for a 3D protein structure, systems and methods of the inventive concept transform it into a unified protein representation as a graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_N\}$ are node features for each residue and $E = \{z_{ij}\}_{i \neq j}$ are pairwise edge features between residues. For each residue, systems and methods of the inventive concept encode it as $v_i = (x_{r,i}, x_{a,i})$, where $x_{r,i} = (h_{r,i}, R_{r,i}, t_{r,i})$ is a residue feature and $x_{a,i} = \{h_{a,p}, R_{a,p}, t_{a,p}\}$ is an atomic feature for atom $A_p$ in the atom set $A_i$ of residue $i$. In both features, $h$ is the embedding vector based on the amino acid or atom types, residue sequence indices, chain ids, and mutant types. $t$ and $R$ are translation and rotation vectors calculated from the coordinates of three specific atoms using the Gram–Schmidt process, where $N - C_{\alpha} - C$ are applied for a residue and $- A_p - C$ are applied for an atom $A_p$ in a residue.
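The local frame construction described above (a translation and a rotation derived from three atoms by the Gram–Schmidt process) can be illustrated as follows. This is a minimal sketch and not the Inventors' implementation. Placing the origin at Cα, taking the first axis along Cα→C, and deriving the second axis from Cα→N are assumptions borrowed from common practice in structure-prediction code, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _normalize(a):
    norm = math.sqrt(_dot(a, a))
    if norm == 0:
        raise ValueError("degenerate geometry: zero-length vector")
    return _scale(a, 1.0 / norm)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_frame(n_atom, ca_atom, c_atom):
    """Return (rotation, translation) for one residue from N, CA, C coordinates.

    ASSUMPTION: origin at CA, first axis along CA->C, second axis from CA->N.
    The rotation is a tuple of three orthonormal row vectors.
    """
    e1 = _normalize(_sub(c_atom, ca_atom))
    v2 = _sub(n_atom, ca_atom)
    # Gram-Schmidt step: remove the component of v2 that lies along e1.
    e2 = _normalize(_sub(v2, _scale(e1, _dot(v2, e1))))
    e3 = _cross(e1, e2)  # completes a right-handed basis
    return (e1, e2, e3), ca_atom


def to_local(rotation, translation, point):
    """Express a global point in the residue's local coordinates."""
    shifted = _sub(point, translation)
    return tuple(_dot(axis, shifted) for axis in rotation)
```

Expressing neighbor positions with `to_local` gives coordinates that do not change when the whole complex is rotated or translated, which is the property a geometry-aware attention layer relies on. Collinear input atoms raise a `ValueError` because no unique frame exists in that case.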
As changes in protein-protein binding affinity upon a mutation are determined by both structural changes and energetic effects, systems and methods of the inventive concept utilize edge features to capture the energy of a biomolecule conformation. For each $z_{ij}$ in the edge feature set $E$, $z_{r,ij}$ is an energy term between residue $i$ and residue $j$, and the atom-level counterpart gives the energy terms between atoms $A_p$ and $A_q$. Following the setting in the Flex ddG method within the Rosetta macromolecular modeling suite, Inventors chose seven energy terms from Rosetta's all-atom energy function for the calculation of ∆∆G, including solvation, hydrogen bonding, electrostatics and Lennard-Jones atomic packing interactions.

BindFormer with geometry and energy attention

[00119] Based on the unified protein representation, Inventors developed BindFormer, a dual-path neural network that extracts and combines residue-level and atom-level information around mutant sites to predict changes in protein-protein binding affinity upon a mutation. As geometric and energy features are two key determinants of protein-protein interactions, Inventors implemented geometry and energy attention (GEA) to incorporate them into the message passing in the network.

[00120] In a BindFormer block, given the input of residue feature $h_r$ and atom feature $h_a$, Inventors derived transformed features. In the process of the dual path with GEA layers, the $i$-th residue feature $h_{r,i}$ is first transformed to the atom level and combined with the feature $h_{a,p}$ of the atom $p \in A_i$ using a multilayer perceptron (MLP). The atomic GEA layer is then used to aggregate atom-level messages from neighbor residues, where $x_{a,e}$ and $x_{a,g}$ are the energy and geometry terms at the atom level. After the atom messages are passed among residues, the atom features for residue $i$ are aggregated and further propagated using the residue GEA layer, where $x_{r,e}$ and $x_{r,g}$ are the energy and geometry terms at the residue level. A normalization layer was used after the residual connection.
Multi-task learning for heterogeneous biological data integration

[00121] Inventors developed a framework with an affinity-consistent constraint loss, which bridges the gap between experiments by modeling affinity across experiments explicitly and trains the model on joint datasets. The multi-task learning framework consists of a shared encoder $z = f(x)$ and separate decoders for each task $T$, where $x$ is the input and each task has its own label space. Systems and methods of the inventive concept can apply a consistency constraint to link all decoders, instructing the decoders to agree on the inferred affinity across the various label spaces. Concretely, given a task $T$ and its label space, during training, under the assumption that the molecular affinity property is consistent, the label of task $T$ is mapped to a binding free energy change via a learnable monotonic neural network. The combined loss function for the multi-task learning can then be formulated in terms of the model output for each task and the consistency loss, with a weight applied to the consistency loss.

FIG. 7 schematically depicts exemplary architectural details of a geometry and energy attention (GEA) module. Arrows show the information flow. Vector names: $L_M$: input sequence feature, $X_E$: input energy feature, $X_G$: input geometric feature, $X_N$: neighbor sequence feature, $t_N$: neighbor translation, and the output sequence feature. $L$: length of the amino acid sequence, $L_N$: number of nearest neighbor residues in the graph, $N_h$: number of heads in the multi-head attention. Dimension names: $d_s$: sequence feature, $d_E$: energy feature, $d_G$: geometric feature, $d_Q$: query vector, $d_K$: key vector, $d_v$: value vector. Operators: element-wise multiplication and element-wise addition. Functions: $k$-NN: search for the $k$ nearest neighbors; Linear $d_1 \to d_2$: fully connected layer with an input dimension of $d_1$ and an output dimension of $d_2$; $f(x)$: geometric feature function of a 3-dimensional input, which in BindFormer is defined using a normal vector.
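Two operations named in the FIG. 7 legend, the k-nearest-neighbor search and attention-weighted aggregation over neighbors, can be illustrated with a small sketch. It is deliberately simplified and is not the GEA module itself. Using the negative inter-residue distance as the attention logit is a placeholder assumption, a single head is used, and the learned feed-forward layers and energy terms of the actual module are omitted. All function names are hypothetical.

```python
import math


def k_nearest_neighbors(coords, index, k):
    """Indices of the k points closest to coords[index], excluding itself."""
    origin = coords[index]
    ranked = sorted(
        (math.dist(origin, point), j)
        for j, point in enumerate(coords) if j != index
    )
    return [j for _, j in ranked[:k]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def aggregate_neighbors(coords, features, index, k):
    """Attention-weighted mean of the neighbor feature vectors.

    ASSUMPTION: the logit is the negative distance to each neighbor; the
    module in the source learns its logits from sequence, geometry and energy.
    """
    neighbors = k_nearest_neighbors(coords, index, k)
    weights = softmax([-math.dist(coords[index], coords[j]) for j in neighbors])
    width = len(features[index])
    return [
        sum(w * features[j][d] for w, j in zip(weights, neighbors))
        for d in range(width)
    ]
```

Because the placeholder logits depend only on distances, the aggregated feature is unchanged by rotating or translating all coordinates together.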
[00122] GEA is a geometrically invariant multi-head attention layer that aggregates sequence features from neighbor nodes, weighted by pairwise geometric and energy terms. Specifically, consider a graph of attributed nodes $x_i = (h_i, R_i, t_i)$ and pairwise energy-based attributed edges $z_{ij}$. The attention value $\alpha_{ij}$ between the $i$-th node and the $j$-th node was first calculated by softmax from the aspect affinity logit between each node pair, where $f_Q$, $f_K$ and $f_Z$ are feed-forward layers and $j'$ are the indices of all neighbor nodes. The sequence, energy, and geometric features from all neighbor nodes were then each merged with the attention value, where $f_V$ is a feed-forward layer and $f_G$ is defined using a normal vector. Finally, the sequence feature of the current node was updated using $f_{out}$, a feed-forward layer. Geometric invariance was guaranteed by using the relative distance in the attention calculation stage and the relative position in local coordinates in the message calculation stage.

Model training and ensemble

[00123] Training was performed for 200 epochs using the Adam optimizer with a learning rate of $10^{-3}$ and a weight decay of $10^{-6}$. Mutation and wild-type inversion were applied to each complex pair during training as data augmentation, in order to enable improved and more generalized network learning. The models were implemented using PyTorch. Inventors performed 10-fold cross-validation by leaving one fold of mutations out as a test set and using the rest of the mutations to train and tune the model, repeating this process for each fold. To improve the overall performance of the AI, Inventors applied a model ensemble. The reported predictions were obtained by aggregating the outputs of the 10-fold cross-validation.

AI affinity-based prospective analysis

[00124] Inventors conducted prospective analysis on SARS-CoV-2 variants, including AI-based lineage analysis, AI-based deep mutational scanning, fitness landscape depiction, and model-guided evolution.
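The 10-fold cross-validation and ensemble procedure of paragraph [00123] can be sketched as below. The sketch is framework-agnostic: `train_fn` is a hypothetical callable standing in for model training (the source used PyTorch), and averaging the fold models is an assumption, since the source says only that the outputs were aggregated.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint folds."""
    if n_samples < n_folds:
        raise ValueError("need at least one sample per fold")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validated_ensemble(samples, labels, train_fn, n_folds=10, seed=0):
    """Train one model per fold; return (ensemble predictor, held-out predictions).

    train_fn(train_x, train_y) is a hypothetical function returning a callable
    model that maps one sample to one predicted value.
    """
    models = []
    held_out = {}
    for test_idx in k_fold_indices(len(samples), n_folds, seed):
        test_set = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in test_set]
        train_y = [y for i, y in enumerate(labels) if i not in test_set]
        model = train_fn(train_x, train_y)
        models.append(model)
        # Each sample is predicted only by the model that never saw it.
        for i in test_idx:
            held_out[i] = model(samples[i])

    def ensemble(sample):
        # ASSUMPTION: aggregate by simple averaging over the fold models.
        return sum(model(sample) for model in models) / len(models)

    return ensemble, held_out
```

The held-out predictions give an unbiased view of each fold model, while the averaged predictor is the kind of ensemble that can be applied to new variants.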
[00125] Based on the AI's ability to assess affinity properties of variants, Inventors characterized SARS-CoV-2 variants by the changes in S-ACE2 binding affinity and by antibody escape score. Inventors used 918 reported SARS-CoV-2 variants from GISAID data, from the wild type to the latest Omicron variants. The deep mutational scanning (DMS) approach is a high-throughput method that makes use of next-generation sequencing technology to measure the properties of more than $10^5$ variants of a protein in a single experiment. However, the cost of wet-lab experiments increases dramatically as the number of desired variants and properties increases. Inventors therefore conducted AI-based deep mutational scanning by predicting the affinity changes of ACE2 binding and the averaged antibody escape scores of all single point mutations of spike in SARS-CoV-2.

[00126] For model-guided evolution of ACE2 variants and SARS-CoV-2 variants, Inventors constructed an evolutionary score (evo-score) using the two main determinant factors of ACE2-S affinity and immune escape score, and further depicted the landscape to characterize the fitness of SARS-CoV-2 variants. Specifically, Inventors adopted a support vector machine (SVM) with a radial basis function (RBF) kernel to fit the fitness of each variant and visualized the topography of the fitness landscape to demonstrate the mutation effects. Variants belonging to the same variant of interest (VOI) or clade were highlighted, and clusters in the fitness landscape were grouped together using 2D Gaussian kernel density estimation.
Systems and methods of the inventive concept can use a hill-climbing algorithm to search for variants, with a set number of mutations from wild-type ACE2 or SARS-CoV-2, that maximize the minimum predicted functional score from an ensemble of 10-fold models, $x^{*} = \arg\max_{x \in S^{L}} \min_{m \in M} f_m(x)$, where $x$ is the input with wild-type structure and mutant structure, $S^{L}$ is the mutant space with mutant edit distance not larger than $L$, $M$ is the set of trained models, and $f_m(x)$ is a model's predicted score for the input $x$. This evolution objective ensures that all models predict that the sequence will have a high functional score. Systems and methods of the inventive concept can initialize a hill-climbing run with selected variants of interest and potential single-point mutations based on AI-based DMS. For each run of the hill-climbing process, systems and methods of the inventive concept generate new variants by adding a possible single-point mutation. Systems and methods of the inventive concept can then move to the new population of 3 variants with the highest objective, which becomes the new reference point, and repeat this hill-climbing process until a local optimum or the expected number of variants is reached.

Statistical analysis

[00127] To evaluate the performance of regression models for continuous value prediction in this study, Inventors applied Mean Absolute Error (MAE), R-squared ($R^2$), and Pearson Correlation Coefficient (PCC). The models for binary classification were evaluated by Receiver Operating Characteristic (ROC) curves of sensitivity versus 1 − specificity. The Area Under the Curve (AUC) values of the ROC curves were reported with 95% Confidence Intervals (CIs). The AUCs were calculated using the Python package scikit-learn (version 0.22.1).

[00128] It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein.
The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C … and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
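As an illustration of the hill-climbing search described in paragraph [00126], the following is a minimal sketch of a search that maximizes the minimum score across an ensemble of models and keeps the top 3 variants at each step. The representation of a variant as a set of `(position, residue)` pairs, the function names, and the `max_steps` safeguard are assumptions made for this example. Each model is a hypothetical callable that scores a variant.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

Mutation = Tuple[int, str]       # hypothetical encoding: (position, new residue)
Variant = FrozenSet[Mutation]
Model = Callable[[Variant], float]


def ensemble_objective(variant: Variant, models: Sequence[Model]) -> float:
    """Worst-case (minimum) predicted score across all ensemble members."""
    return min(model(variant) for model in models)


def hill_climb(start: Iterable[Variant],
               candidate_mutations: Sequence[Mutation],
               models: Sequence[Model],
               max_mutations: int,
               beam_width: int = 3,
               max_steps: int = 50) -> Tuple[Variant, float]:
    """Greedy search that adds one single-point mutation per step."""
    population: List[Variant] = list(start)
    best = max(population, key=lambda v: ensemble_objective(v, models))
    best_score = ensemble_objective(best, models)
    for _ in range(max_steps):
        proposals = set()
        for variant in population:
            if len(variant) >= max_mutations:
                continue  # edit-distance budget L is exhausted
            occupied = {position for position, _ in variant}
            for mutation in candidate_mutations:
                if mutation[0] not in occupied:
                    proposals.add(variant | {mutation})
        if not proposals:
            break
        ranked = sorted(proposals,
                        key=lambda v: ensemble_objective(v, models),
                        reverse=True)
        population = ranked[:beam_width]  # new reference population
        top_score = ensemble_objective(population[0], models)
        if top_score <= best_score:
            break  # local optimum: no proposal improves the objective
        best, best_score = population[0], top_score
    return best, best_score
```

Scoring by the minimum rather than the mean means a variant is kept only when every model in the ensemble rates it highly, which matches the stated aim that all models predict a high functional score.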
