

Title:
POLYMER DEGRADING ENZYMES
Document Type and Number:
WIPO Patent Application WO/2022/226109
Kind Code:
A1
Abstract:
Disclosed herein are PET hydrolase enzymes, and their nucleic acid and amino acid sequences. A number of candidates have been identified with detectable, quantifiable activity on PET and these enzymes possess desirable traits that are leveraged in the design and engineering of enzyme formulations targeted to degrade specific polymers. These enzymes have measurable PET degrading activity and, in an embodiment, may be active on polyester polyurethanes.

Inventors:
ERICKSON ERIKA MARIE (US)
BECKHAM GREGG TYLER (US)
GADO JAPHETH EMI (US)
MCGEEHAN JOHN E (GB)
PAYNE CHRISTINA MARIE (US)
Application Number:
PCT/US2022/025624
Publication Date:
October 27, 2022
Filing Date:
April 20, 2022
Assignee:
ALLIANCE SUSTAINABLE ENERGY (US)
UNIV OF PORTSMOUTH (GB)
NAT SCIENCE FOUNDATION NSF (US)
International Classes:
C12N15/52; B09B3/00; C12N9/18; C12N15/63
Foreign References:
US20200216851A1 2020-07-09
US20200332334A1 2020-10-22
Attorney, Agent or Firm:
BARKLEY, Sam J. (US)
Claims:
We claim:

1. An engineered organism capable of expressing PET hydrolase enzymes with PET hydrolase activity.

2. The engineered organism of claim 1 wherein the organism is used to degrade PET.

3. The engineered organism of claim 1 wherein the organism is genetically engineered to overexpress PET hydrolase enzymes.

4. A method for identifying PET hydrolase enzymes by identifying nucleic acid sequences from sequenced genomes that are likely to encode for active PET hydrolase enzymes.

5. The method of claim 4 wherein the identified sequences are expressed as engineered PET hydrolase enzymes from a genetically modified organism.

6. The method of claim 4 wherein the engineered organism is genetically engineered to overexpress PET hydrolase enzymes useful for degrading PET.

7. The method of claim 4 further comprising a step of comparing the sequences disclosed herein to sequences of genomes in order to identify PET hydrolases.

8. The method of claim 7 further comprising the step of applying an algorithm to predict the secondary, tertiary and quaternary structure of the PET hydrolases.

9. The method of claim 8 further comprising creating engineered PET hydrolases with increased PET hydrolase activity based upon the predicted tertiary or quaternary structure of the expressed amino acid sequences.

10. A system for identifying PET hydrolase enzymes comprising an engineered organism capable of expressing PET hydrolase enzymes with PET hydrolase activity and comparing the sequences of their corresponding genomes in order to identify PET hydrolases and further comprising the step of applying an algorithm to predict the secondary, tertiary and quaternary structure of the PET hydrolases.

11. The system of claim 10 further comprising creating engineered PET hydrolases with increased PET hydrolase activity based upon the predicted tertiary or quaternary structure of the expressed amino acid sequences.

12. The system of claim 10 wherein the organism is used to degrade PET.

Description:
POLYMER DEGRADING ENZYMES

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority under 35 U.S.C. § 119 to U.S. provisional patent application no. 63/177,334 filed on 20Apr2021 and 63/297,529 filed on 07Jan2022, the contents of which are hereby incorporated in their entirety.

CONTRACTUAL ORIGIN

[0002] The United States Government has rights in this invention under Contract No. DE-AC36-08GO28308 between the United States Department of Energy and Alliance for Sustainable Energy, LLC, the Manager and Operator of the National Renewable Energy Laboratory.

BACKGROUND

[0003] Plastics accumulation in nature represents a global environmental crisis. In response, microbes are evolving the capacity to utilize synthetic polymers as carbon and energy sources. Synthetic polymers pervade all aspects of modern life, due to their low cost, high durability, and impressive extents of tunability. Originally developed to avoid the use of animal-based products, plastics have now become so widespread that their leakage into the biosphere and accumulation in landfills is creating a global-scale environmental crisis. Indeed, plastics have been found widespread in the world’s oceans, in the soil, and more recently, microplastics have been reported even entrained in the air.

[0004] The accumulation of plastics waste in landfills and throughout the natural environment represents a global pollution crisis. Concurrently, petrochemical-derived plastics manufacturing and consumption are also major contributors to global greenhouse gas (GHG) emissions. These dual challenges in end-of-life management and the manufacturing of plastics have prompted a surge of activity in the development of chemical recycling technologies, wherein synthetic polymers are deconstructed to intermediates that can be recycled into the same material in a closed-loop process or converted into alternative products in an open-loop process. One of the most commonly used and discarded plastics is poly(ethylene terephthalate) (PET), which is a polyester employed in single-use beverage bottles, textiles, and packaging, among other applications. Given its ubiquity in consumer plastics and the relative ease of ester bond cleavage, PET is among the most well-studied polymers for chemical recycling, and thermal, catalytic, and biocatalytic approaches for PET recycling are currently being pursued. For biocatalytic conversion of PET, the use of hydrolase enzymes has witnessed major advances, especially in the last decade, both in advancing the industrial relevance of this approach and in the discovery of natural microbial systems that respond to the presence of PET in the biosphere.

[0005] Thirty-six serine hydrolase family enzymes have been experimentally confirmed to deconstruct PET to its constituent monomers, terephthalic acid (TPA) and ethylene glycol (EG). Most known PET hydrolases are cutinases, lipases, and carboxylesterases (Enzyme Commission 3.1.1.-). Based upon pioneering enzyme discoveries, multiple structural biology, protein engineering, and enzyme screening efforts have aimed to identify the necessary features for an enzyme to hydrolyze PET and to improve these enzymes for industrial application. Notably, the most efficient PET-degrading biocatalysts are thermostable enzymes that exhibit optimal PET hydrolysis activity near the PET glass transition temperature (PET Tg values can range from ~65-80°C). For example, others have engineered thermotolerant leaf-branch compost cutinase (LCC) variants that displayed substantial performance improvements for amorphous PET hydrolysis, and similar protein engineering efforts have achieved improved thermotolerance in Thermobifida cutinases, among others. Others have also recently reported a new thermotolerant cutinase with high structural similarity to LCC that exhibits excellent PET hydrolysis performance on amorphous substrates. Given the need for activity under thermophilic conditions for effective PET hydrolysis, multiple protein engineering efforts have also been conducted to improve the thermal stability of the mesophilic Ideonella sakaiensis PETase. These studies have made considerable advances, but progress could potentially be accelerated further via discovery of a broader diversity of enzyme scaffolds with PET hydrolytic activity.

[0006] The sequence and structural features that confer PET hydrolysis activity are not yet fully understood, both within and beyond the sequence space explored to date. Similarly, the diversity of enzymes naturally able to hydrolyze PET remains unclear. To address these questions, others applied a Hidden Markov Model (HMM) in 2018 to search metagenomic databases for potential PET hydrolases. They identified 504 putative PET hydrolases, based on the sequences known at the time, and further confirmed PET hydrolysis in four new enzymes. They noted that PET hydrolysis activity, based on the enzymes reported then, is likely quite rare in nature. As the authors discussed, there remains an urgent need to further develop the suite of known PET-active enzymes from natural diversity.

SUMMARY

[0007] In an aspect, disclosed herein are PET hydrolase enzymes, nucleic acid and amino acid sequences for PET hydrolase enzymes and methods for using algorithms to predict tertiary and quaternary structures of the expressed PET hydrolase enzymes useful for generating non-naturally occurring PET hydrolase enzymes with improved activity and stability. In an embodiment the PET hydrolase enzymes disclosed herein are useful for degrading PET. In an embodiment, the enzymes disclosed herein are useful for degrading polyester polyurethanes.

[0008] Other objects, advantages, and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIGs. 1A, 1B depict bioinformatics and machine learning to derive PET hydrolase sequences from natural diversity. FIG. 1A depicts a minimum-evolution phylogenetic tree of 74 PET hydrolase candidates selected by HMM and ML. Sequences retrieved from environmental (meta)genomes in JGI IMG with lower HMM scores (groups 1 to 3) are notably diverse compared to the sequences that comprise the rest of the tree (groups 4-7). The symbols around the tree show expression, activity, and previously reported PET activity. FIG. 1B depicts a Sequence Similarity Network (SSN) of enzymes with experimentally confirmed PET hydrolase activity. Edges represent pairwise BLAST similarity with E-value < 1e-10. The SSN clusters are consistent with the associated families in the ESTHER database and with the phylogenetic groups in FIG. 1A, and show that most reported PET hydrolases fall in the polyesterase-lipase-cutinase family.

[0010] FIGs. 2A, 2B depict enzyme activities. FIG. 2A depicts heat map profiles of pH and temperature screening on amorphous PET film for a diverse selection of enzymes and two control enzymes. The heat map gradient indicates the extent of measured product release up to 500 mg/L of total aromatic products after 96 h reaction time. FIG. 2B depicts a log-plot of the sum of aromatic products measured after 168 h reaction time as measured from time-course experiments using crystalline PET powder (open squares) and amorphous PET film (black squares) as substrates. Reaction conditions used for time-course experiments correspond to the pH and temperature resulting in the highest product release observed in screening reactions, and are listed in Table 5. For all enzymatic reactions shown in panels A-B, the enzyme loading was 0.7 mg enzyme/g PET and the solids loading was 2.9% (29 g/L). The reaction products were quantified with HPLC, and the results show the sum of aromatic products, including BHET, MHET, and TPA.

[0011] FIGs. 3A, 3B, and 3C depict the structural diversity of PET-active enzymes from phylogenetic groups. All structural models are shown to scale, rendered as cartoons with transparent accessible surface areas and putative active sites highlighted with the Ser-His-Asp catalytic triad in red sticks. FIG. 3A shows that PET hydrolase scaffolds identified from mesophilic (top, I. sakaiensis PETase, PDB ID 6EQE (32)) and thermophilic (middle, LCC, PDB ID 4EB0 (29), and bottom, T. fusca cutinase 1 DSM44342 (703)) sources occupy a narrow structural space with highly conserved α/β hydrolase folds. FIG. 3B shows that a selection of representatives from more distant phylogenetic groups reveals multiple additional and alternative structural features with substantial increases (102) and reductions (307) in the core fold. FIG. 3C depicts several additional distinct domains, including a Peripheral Subunit-Binding Domain (PSBD) and a Family 35 carbohydrate binding module (CBM).

[0012] FIGs. 4A, 4B, 4C, and 4D depict increasing degrees of structural diversity across phylogenetic groups. FIG. 4A depicts conserved canonical folds with surface residue changes in groups 5 and 6. Electrostatic surface representations are colored with a gradient from red (acidic) at -7 kT/e to blue (basic) at 7 kT/e (where k is Boltzmann’s constant, T is temperature, and e is the charge on an electron). The general location of active sites is indicated with a star, and known (LCC) and predicted catalytic triad residues are shown as stick representations in the corresponding images below. FIG. 4B depicts accessory lid domains in group 2 enzymes. The peptidase-like core is generally conserved across this group, with the exception of a few helical deletions distal from the predicted active sites. Examples of alternative lid domains are highlighted in green. FIG. 4C depicts mini-PETases created from large core deletions to the canonical fold. LCC is shown in the middle column (yellow) as a cartoon with the catalytic triad highlighted in red, and a surface representation below with a PET trimer (blue) docked in the active site cleft. A comparison with 307 on the left (cartoon shown without the lid domain for clarity) reveals the extent of the core deletion, removing four of the eight β-strands and corresponding helices. A comparison with 305 on the right reveals an almost complementary set of deletions. Enzyme 307 approximates the left half of the LCC core domain while 305 approximates the right half. These major rearrangements generate alternative binding clefts and docking studies predict vastly different binding modes (PET trimers in blue). FIG. 4D depicts an alternative enzyme family for PET hydrolysis. The enzymes 101 (left) and 102 (right) are colored according to the 3-domain arrangement in the Geobacillus stearothermophilus carboxylesterase EST55 (PDB ID 2OGT). Both enzymes display a truncated version of the catalytic domain (pink) compared to EST55 and have modified versions of the α/β domain (blue). Only enzyme 101 has a version of the regulatory domain, the absence of which in 102 disrupts the formation of the canonical active site (locations highlighted with red dashes). While the catalytic Ser and Glu residues are conserved between EST55 and 101 (pink and yellow sticks), there is no direct substitute for the His residue. In enzyme 102, only the catalytic Ser position is conserved, although there are other candidate residues that could potentially form a productive triad.

[0013] FIGs. 5A, 5B, 5C, 5D, and 5E depict time-course plots comparing product release from amorphous PET film and crystalline PET powder over 168 h reaction time. Error bars represent the standard deviation of reactions measured in triplicate. FIG. 5A depicts a comparison of control enzymes using peak activity reaction conditions from screening on amorphous PET film. FIG. 5B depicts a comparison of selected candidate enzymes using peak activity conditions from screening on amorphous PET film. FIG. 5C depicts a comparison of two reaction conditions for enzyme 606 showing that 606 has higher activity in more alkaline reaction conditions. FIG. 5D depicts a comparison of two reaction conditions for enzyme 611. Enzyme 611 is more selective for crystalline PET powder compared to amorphous PET in both conditions tested. FIG. 5E depicts a comparison of two reaction conditions for enzyme 704, showing that while 704 prefers a more alkaline reaction environment (pH 9), comparable activity is achieved even at pH 7.

DETAILED DESCRIPTION

[0014] Industrial adoption of new plastics recycling and upcycling technologies could incentivize the reclamation of waste plastics and reduce greenhouse gas emissions from virgin plastics manufacturing. To this end, the use of hydrolase enzymes for polyester recycling has witnessed a surge of interest from the biotechnology community. Process analysis has predicted that enzymatic PET recycling could have both substantial economic and sustainability benefits if deployed at scale. Thus far, approximately 36 related enzymes have been demonstrated to break down PET to its monomers, prompting the search for more distant and diverse functional biocatalysts for PET hydrolysis. Disclosed herein are methods to identify distantly related enzymes with high-temperature PET activity, thus providing a rich biochemical and structural resource for further engineering of enzymatic PET hydrolysis.

[0015] The leakage of plastics into the environment on a planetary scale has led to the subsequent discovery of multiple biological systems able to convert man-made polymers for use as a carbon and energy source. On the basis of these natural systems able to degrade synthetic plastics, the environmental microbiology community is interested to understand how natural enzymes evolve to convert non-natural substrates, which in turn will enable these systems to be used for biotechnology applications towards a circular materials economy.

[0016] New recycling solutions are critically needed to mitigate waste plastics pollution. To that end, the enzymatic deconstruction of a ubiquitous polyester, poly(ethylene terephthalate) (PET), is under intense investigation, particularly given the promise of a biological recycling approach that can depolymerize PET to its constituent monomers near the polymer glass transition temperature (~70°C). To date, reported PET hydrolases have been sourced from a relatively narrow sequence space. To enable such an enzymatic recycling approach, we sought to identify additional biocatalysts for PET deconstruction from natural diversity. In this work, we used bioinformatics and machine learning to identify 74 putative thermotolerant PET hydrolases, based on a set of known PET hydrolyzing enzymes. We successfully expressed, purified, and assayed 52 enzymes from seven distinct phylogenetic groups, and within this set, we observed PET hydrolysis activity in 37 enzymes in reactions spanning a range of pH from 4.5-9.0 and temperatures from 30-70°C. We conducted biophysical characterization and PET hydrolysis time-course reactions with the best-performing enzymes, which demonstrated that some enzymes exhibit higher specificity towards crystalline PET rather than the commonly observed preference for amorphous PET. We employed X-ray crystallography and the AlphaFold artificial intelligence-based protein structure prediction algorithm to interrogate the enzyme architectures, which revealed both protein folds and accessory domains not previously associated with PET deconstruction. Taken together, this study expands the number and structural diversity of thermotolerant protein scaffolds for PET hydrolysis, which can enable further engineering for enzymatic PET recycling and upcycling.

[0017] In an embodiment, an objective of the current disclosure is to expand the catalog of thermotolerant PET hydrolase scaffolds. To this end, we combined an HMM approach with machine learning (ML) to predict the temperature where the enzyme would be optimally active based on its sequence. In doing so, we selected 74 putative thermotolerant PET hydrolases for experimental screening, sourced from seven distinct phylogenetic groups, including several from which no PET hydrolysis activity has been previously reported to our knowledge. Expression and purification trials for each enzyme were conducted, and the proteins successfully expressed were screened for amorphous PET hydrolysis as a function of pH and temperature. For the best performing enzymes from each group, we conducted both thermal characterization to measure the melting temperature (Tm), and time-course reactions using crystalline PET powder and amorphous PET films as substrate to ascertain differences in reactivity as a function of substrate properties. Lastly, we combined X-ray crystallography and AlphaFold for structural characterization of all 74 enzymes to gain insights into the structure-activity relationships that confer PET hydrolytic activity. Taken together, this work suggests that PET hydrolytic activity can be sourced from a wider range of natural diversity than previously reported and expands the number of enzyme structural scaffolds for thermotolerant PET hydrolase engineering.

[0018] Bioinformatics and ML enable identification of 74 putative thermotolerant PET hydrolases from seven distinct phylogenetic groups. Similar to other successes in identifying PET hydrolases with HMM, we constructed an HMM from 17 characterized enzymes that were confirmed to exhibit PET hydrolysis activity as of December 2018, and applied the HMM to search sequences in the National Center for Biotechnology Information (NCBI) non-redundant database as well as select thermal metagenomes from the Joint Genome Institute Integrated Microbial Genome (JGI IMG) database (Table 2). We sought to limit the search to thermostable enzymes capable of PET hydrolysis near the PET Tg. To this end, we leveraged the correlation between enzyme maximum temperatures and the optimal growth temperature (OGT) of the host organisms. Hence, the HMM sequence hits were mapped to OGT data retrieved from the NCBI Bioproject database, the BacDive database, and the JGI IMG metagenome sample temperature. Sequences with OGT lower than 50°C were discarded. For sequences that could not be mapped to OGT data, we trained an ML model (ThermoProt) to discriminate between 8,000 proteins from thermophiles (>50°C) and 8,000 proteins from non-thermophiles (<50°C) using the support vector machine method with calculated amino acid features. ThermoProt demonstrated an accuracy of 86.6% in five-fold cross-validation tests.
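The thermostability filter described above can be illustrated with a minimal sketch. It assumes each HMM hit is a dictionary with a `sequence` and an optional `ogt_c` value, and it uses plain amino acid composition as the "calculated amino acid features"; the actual ThermoProt feature set and trained support vector machine are not given here, so `toy_classifier` is a hypothetical placeholder that only marks where a trained model would plug in.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
OGT_CUTOFF_C = 50.0  # cutoff stated in the text


def composition_features(seq):
    """Return the fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def keep_hit(hit, predict_thermophilic):
    """Apply the OGT cutoff when OGT is known, else fall back to the classifier."""
    ogt = hit.get("ogt_c")
    if ogt is not None:
        return ogt >= OGT_CUTOFF_C  # hits with OGT below 50 C are discarded
    return bool(predict_thermophilic(composition_features(hit["sequence"])))


def filter_hits(hits, predict_thermophilic):
    """Keep only the hits that pass the thermostability filter."""
    return [h for h in hits if keep_hit(h, predict_thermophilic)]


def toy_classifier(features):
    """Hypothetical stand-in for a trained model; NOT ThermoProt."""
    glu = features[AMINO_ACIDS.index("E")]
    lys = features[AMINO_ACIDS.index("K")]
    return (glu + lys) > 0.15


# Invented example hits, for illustration only.
example_hits = [
    {"id": "a", "sequence": "MKKEE", "ogt_c": 65.0},
    {"id": "b", "sequence": "MKKEE", "ogt_c": 37.0},
    {"id": "c", "sequence": "EKEKAAAA", "ogt_c": None},
    {"id": "d", "sequence": "AAAAGGGG", "ogt_c": None},
]
kept_ids = [h["id"] for h in filter_hits(example_hits, toy_classifier)]
```

In this sketch the classifier is consulted only for hits without OGT data, which mirrors the order of operations in the paragraph above.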

[0019] We observed that many of the top HMM hits from the JGI IMG metagenomes were identical or very similar to hits from NCBI. To diversify the sequence search space further, we selected proteins with predicted thermostability and high HMM scores (>100, E-value < 8.0e-26) from the NCBI hits, but thermophile-derived proteins with relatively low scores (<55, E-value > 2.0e-11) from the JGI IMG hits. Consequently, 74 sequences were selected. We note that 14 of these sequences have been reported in other studies to our knowledge and were retained in our assays as benchmarks. As illustrated in FIG. 1A, phylogenetic analysis showed that these 74 sequences comprise at least seven distinct phylogenetic groups, with the diverse JGI IMG sequences forming three clades (which we termed groups 1 to 3) that are clearly separate from the NCBI sequences. The NCBI sequences form two clades (which we termed groups 6 and 7) and two paraphyletic groups (termed groups 4 and 5) (FIG. 1A). Based on these results, the 74 PET hydrolase candidate sequences were assigned identification numbers according to these phylogenetic groups (101 and 102 in group 1, 201 and 202 in group 2, and so on). A list of candidate sequences, annotated with descriptions and accession numbers, is provided in Table 3.
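As an illustration of the selection and numbering scheme just described, the following sketch applies the two HMM score thresholds by database and then assigns group-based identification numbers. The record fields (`source`, `score`, `thermophilic`) and the example data are hypothetical, and the E-value criteria are omitted for brevity.

```python
from collections import defaultdict

HIGH_SCORE = 100.0  # NCBI hits: keep scores above this value
LOW_SCORE = 55.0    # JGI IMG hits: keep scores below this value


def select_candidates(hits):
    """Keep thermophilic high-scoring NCBI hits and low-scoring JGI IMG hits."""
    selected = []
    for hit in hits:
        if not hit["thermophilic"]:
            continue
        if hit["source"] == "NCBI" and hit["score"] > HIGH_SCORE:
            selected.append(hit)
        elif hit["source"] == "JGI_IMG" and hit["score"] < LOW_SCORE:
            selected.append(hit)
    return selected


def assign_ids(names_and_groups):
    """Number sequences within each phylogenetic group: 101, 102, 201, ..."""
    counters = defaultdict(int)
    ids = {}
    for name, group in names_and_groups:
        counters[group] += 1
        ids[name] = group * 100 + counters[group]
    return ids


# Invented records, for illustration only.
example_hits = [
    {"name": "n1", "source": "NCBI", "score": 150.0, "thermophilic": True},
    {"name": "n2", "source": "NCBI", "score": 80.0, "thermophilic": True},
    {"name": "j1", "source": "JGI_IMG", "score": 40.0, "thermophilic": True},
    {"name": "j2", "source": "JGI_IMG", "score": 40.0, "thermophilic": False},
    {"name": "j3", "source": "JGI_IMG", "score": 120.0, "thermophilic": True},
]
selected_names = [h["name"] for h in select_candidates(example_hits)]
example_ids = assign_ids([("x", 1), ("y", 1), ("z", 2)])
```

Deliberately keeping low-scoring metagenome hits is what brings the divergent groups 1 to 3 into the candidate set.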

[0020] To gain insight into the diversity of the selected sequences within the vast α/β hydrolase superfamily, we classified the sequences according to families in the ESTHER database (56) and predicted enzyme commission (EC) numbers. EC number predictions were assigned by transferring EC numbers (1) associated with the ESTHER families, (2) associated with the top annotated hit from a BLAST search of each sequence against the SwissProt database, and (3) predicted by the deep-learning tool, DeepEC. The results reveal that all candidate sequences in groups 4 to 7 with high HMM scores (>100) belong to the polyesterase-lipase-cutinase family, along with nearly all previously reported PET hydrolases, and are associated with carboxyl ester hydrolase (3.1.1.-) and cutinase (3.1.1.74) activities. However, the sequences derived from lower HMM scores (groups 1 to 3) diverge from canonical PET hydrolases and are associated with distant families such as peptidases (EC 3.4.-.-). A sequence similarity network (FIG. 1B) demonstrates the clustering of currently known PET hydrolases in the polyesterase-lipase-cutinase family and the divergence of candidate sequences from groups 1 to 3.

[0021] Screening on amorphous PET shows that PET hydrolysis activity is distributed among all seven phylogenetic groups. The 74 enzymes were expressed in Escherichia coli with each putative PET hydrolase gene codon-optimized and cloned into a pET21b(+) plasmid with a C-terminal hexa-histidine epitope tag. The likelihood of a signal peptide sequence in each of the 74 putative enzyme sequences was predicted using SignalP 5.0, and the predicted signal peptides were removed in the 36 relevant expression constructs (vide infra). Given the diversity of enzymes to be expressed and purified, we adopted a 4-stage expression screening approach that varied E. coli expression strains, growth medium composition, incubation temperature and time, induction protocol, and other relevant expression parameters. Enzyme purification followed a standardized protocol of affinity chromatography, buffer exchange, and size exclusion chromatography. Table 4 details the expression strategies that enabled production of 51 of the 74 enzymes.

[0022] Given the possible range of enzyme activities, we employed a comprehensive, semi-quantitative screening assay to first detect PET hydrolytic activity of each enzyme. Specifically, we used 100 mM NaCl with 50 mM buffer across a range of pH (citrate at pH 6.0, NaH2PO4 at pH 7.0, NaH2PO4 at pH 7.5, HEPES at pH 7.5, bicine at pH 8.0, and glycine at pH 9.0) and temperature (30°C to 70°C, in 10°C increments). All screening reactions were conducted in triplicate. In this initial activity screen, we employed commercially available amorphous PET film from Goodfellow, thereby enabling inter-study comparisons. All reactions were conducted for 96 h at an enzyme loading of 0.7 mg enzyme/g PET and a substrate loading of 2.9% by mass in polypropylene microcentrifuge tubes. Due to the molecular weight differences of the enzymes screened, the number of catalytic units added to the reactions differed. However, we chose this approach given that enzyme loadings for reactions of this nature are typically assessed for process cost on the basis of mass of enzyme loaded per mass of substrate. The aromatic reaction products, bis(2-hydroxyethyl) terephthalate (BHET), mono(2-hydroxyethyl) terephthalate (MHET), and TPA, were quantitated via ultra-high-performance liquid chromatography up to a product concentration of 500 mg/L accounting for dilution, above which the calibration curve was outside of the linear range. For this substrate loading, the upper limit of concentration of product corresponds to a maximum extent of conversion of 2.1% by mass. Aromatic product release data are reported throughout, relative to background aromatic product release detected in no-enzyme control reactions at each pH and temperature. As positive controls, we included the LCC wild-type enzyme and two improved mutant variants (ICCG and WCCG), the I. sakaiensis PETase wild-type enzyme and an improved double mutant variant (W159H/S238F), and T. fusca cutinase BTA-1.
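The screening design above can be checked with a short calculation. The sketch below enumerates the buffer and temperature grid and shows why a fixed mass loading gives different molar enzyme concentrations for enzymes of different size. The two phosphate entries are read here as NaH2PO4, and the 30 kDa and 60 kDa molecular weights are hypothetical values chosen only for illustration.

```python
from itertools import product

# (buffer, pH) pairs as listed in the text; phosphate entries assumed NaH2PO4.
BUFFERS = [
    ("citrate", 6.0),
    ("NaH2PO4", 7.0),
    ("NaH2PO4", 7.5),
    ("HEPES", 7.5),
    ("bicine", 8.0),
    ("glycine", 9.0),
]
TEMPERATURES_C = list(range(30, 71, 10))  # 30 to 70 C in 10 C increments

# Every (buffer, pH) pair is screened at every temperature.
conditions = [(name, ph, t) for (name, ph), t in product(BUFFERS, TEMPERATURES_C)]


def enzyme_mg_per_l(loading_mg_per_g, solids_g_per_l):
    """Enzyme mass concentration implied by a mass loading and a solids loading."""
    return loading_mg_per_g * solids_g_per_l


def enzyme_micromolar(conc_mg_per_l, mw_kda):
    """Convert mg/L to micromolar; (mg/L) / (kg/mol) equals umol/L."""
    return conc_mg_per_l / mw_kda


mass_conc = enzyme_mg_per_l(0.7, 29.0)         # about 20.3 mg enzyme per litre
small_um = enzyme_micromolar(mass_conc, 30.0)  # hypothetical 30 kDa enzyme
large_um = enzyme_micromolar(mass_conc, 60.0)  # hypothetical 60 kDa enzyme
```

At the same mass loading, the hypothetical 60 kDa enzyme is present at half the molar concentration of the 30 kDa enzyme, which is the sense in which the number of catalytic units differs between reactions.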

[0023] FIG. 2A shows illustrative heat maps of total aromatic product release across 30 reaction conditions for the best-performing enzymes from each of the seven phylogenetic groups, alongside two positive control enzymes, namely wild-type LCC and I. sakaiensis PETase. At least one enzyme from each of the phylogenetic groups shown in FIG. 1 exhibited measurable PET hydrolysis activity. Overall, 36 enzymes were found to be active for PET hydrolysis at statistically significant levels above the no-enzyme control, while 14 of the 51 enzymes did not exhibit any detectable PET hydrolytic activity above the no-enzyme control background. FIG. 2A shows that enzymes in groups 5, 6, and 7 exhibited the highest detected activity. This is not surprising given that most of the enzyme discovery efforts to date on PET hydrolases have identified enzymes belonging to the polyesterase-lipase-cutinase family, to which the enzymes in groups 5, 6, and 7 belong. Groups 1 and 4 also exhibited appreciable PET hydrolysis activity, while groups 2 and 3 displayed only minimal activity above the no-enzyme control background. Overall, this screening highlights 23 thermostable enzymes that, to our knowledge, have not been previously reported and that exhibit PET hydrolase activity beyond the 36 currently known enzymes.

[0024] As is apparent in FIG. 2A, there is a substantial breadth of enzyme activity across the pH and temperature ranges studied, with activity of at least one enzyme in every condition tested. For the four enzymes that exhibited optimal activity at pH 6.0 (102, 611, 702, 715), we further extended the pH screen across the same five temperatures and four additional pH conditions (50 mM citrate buffer at pH 5.0 and 5.5 and 50 mM sodium acetate buffer at pH 4.5 and 5.0), with the LCC wild-type enzyme and the LCC ICCG mutant as positive controls. The LCC ICCG mutant is active in buffered medium with a pH as low as 5.0, while 102 was not active in media with a pH of less than 6.0, and 611, 702, and 715 all exhibit detectable activity in medium with a pH less than 6.0.

[0025] Lastly, because I. sakaiensis PETase and some cutinases are secreted, we were interested in the potential effects on both protein expression and hydrolytic activity when signal peptide sequences predicted to enable protein secretion were included. We conducted the same screening experiments for a selection of putative PET hydrolases retaining the native signal peptide (nSP) in the expression sequence, namely 301, 401, 403, 410, 606, 607, and 711. The results demonstrate that the inclusion of a signal peptide in the expression sequence does not uniformly influence activity, as illustrated by our observations of complete abolishment of activity (301, 410, 711), a slight increase in activity (606), and reduction of activity (401, 403). Enzyme 607 could only be expressed when including the native signal peptide sequence, though much of the enzyme produced is insoluble. Enzyme 607-nSP (with native signal peptide) exhibited measurable PET hydrolytic activity, increasing the total number of unique catalytic domains expressed and screened to 52, and the number of new, thermostable PET hydrolases identified to 24.

[0026] Detailed characterization of the best-performing enzymes highlights reactivity differences on different substrates. We were also interested to learn if the best-performing enzymes from each phylogenetic group would exhibit different reactivity profiles on different PET substrates. For these comparisons we used two commercially available substrates that have been thoroughly characterized, namely a crystalline Goodfellow PET powder and a Goodfellow amorphous PET film. This set included 12 enzymes selected to represent a diverse group with the highest PET degradation extents observed from screening, see FIG. 2B and FIG. 5. Experiments were conducted with the LCC wild-type enzyme, the LCC ICCG mutant, and BTA-1 as positive controls. The reactions were run for 168 h to compare effects due to enzyme stability. As shown in FIG. 2B, both control enzymes and several group 7 enzymes (701, 704, 714, 716) exhibited higher activity on amorphous PET film, consistent with prior work.

However, we also identified enzymes with higher activity on crystalline PET powder compared to amorphous PET film (FIG. 2B), which has not previously been reported for thermophilic PET hydrolases to our knowledge. Additional comparisons of the 168 h reactions are in FIG. 5. Table 5 lists the corresponding reaction conditions employed in these experiments and the data.

[0027] Calorimetry confirms thermostability across the phylogenetic groups. Of the expressed and purified enzymes, 20 were of sufficient yield and solubility for thermostability analysis by differential scanning calorimetry (DSC), including at least one member from each of the seven distinct phylogenetic groups. The observed melting temperature (Tm) values in neutral buffer for the 17 enzymes of known origin (belonging to groups 4-7) ranged from 53.9°C for enzyme 606 originating from Marinactinospora thermotolerans, to 86.9°C for wild-type LCC (501), see Table 6. In addition, Tm values were obtained for single representative members from groups 1-3, each of which originates from metagenomic sequences from environmental samples. Two of these, enzymes 102 (66.0°C) and 202 (75.1°C), have Tm values within the established range for known thermophilic enzymes, whilst enzyme 306 exhibited the highest Tm (92.6°C) of all 20 enzymes analyzed. These measurements confirm the utility of the ThermoProt ML algorithm in identifying amino acid sequences with high thermal stability.

[0028] The majority of the above enzymes that were amenable to DSC analysis are members of group 7, including eight highly homologous polyester-lipase-cutinase enzymes originating from T. fusca (701-706, 714 and 715), and three from T. cellulosilytica (709, 711 and 716). With the exception of 709, each of these exhibits some degree of PET hydrolase activity. This comprehensive T. fusca enzyme DSC dataset illustrates the potential variation in thermostability (65.6 to 71.8°C) for homologous secreted enzymes from a single thermophilic species; from a biological perspective, such variation is tolerable since, in all cases, the Tm exceeds the optimal growth temperature (OGT) of the organism. An analysis of the Tm sequence dependence in these enzymes reveals point variants that influence their thermostability; for example, enzymes 702 and 705, which are 99% identical in sequence and differ at only three amino acid positions, have Tm values separated by 6.2°C. Such differences in their susceptibility to thermal denaturation may influence the optimal temperatures for PET hydrolysis and inform further engineering.

[0029] Structural characterization highlights diversity of PET-active enzymes. Given the range of sequence diversity captured in this work (FIG. 1B) and the opportunities to interrogate structure-function relationships across a broad group, we conducted comprehensive crystallization screening, resulting in eight high-resolution X-ray structures for enzymes 202 (7QJM), 306 (7QJN), 606 (7QJO), 611 (7QJP), 702 (7QJQ), 703 (7QJR), 705 (7QJS), and 711 (7QJT) at resolutions between 1.43 and 2.19 Å. As observed previously, the compact folds of α/β hydrolases can often yield high-quality atomic, and even sub-atomic, resolution X-ray data. However, as we screened beyond the folds homologous to the I. sakaiensis, Thermobifida, and LCC enzymes, the success rate of crystallization hits fell. With PET-active representatives identified in all seven phylogenetic groups, we sought to use the AlphaFold protein structure prediction system to interrogate the structural diversity of the 74 enzymes.

[0030] To investigate the utility of AlphaFold for thermotolerant enzyme folds, we first selected sequences where we already had unpublished X-ray structures, allowing direct comparison between the predictions and experimental data. In line with recent observations on compact folds within the human proteome, we observed that pLDDT data, the AlphaFold quality scoring metric (a per-residue measure of local confidence on a scale from 0 - 100 based on a Local Distance Difference Test), were generally favorable, indicating high confidence in the accuracy of these target structures. Superposition with the experimental structures revealed a high correlation with the general architecture, and geometric predictions matched the experimental structures down to the level of individual residues. This was particularly the case for residues that form key structural interactions within the core of the proteins and, crucially, those contributing to the active sites. Further validation of the utility of this approach was demonstrated by the successful use of an AlphaFold structure as a molecular replacement search model for a challenging experimental X-ray dataset from enzyme 306. Based on these results, we used AlphaFold to predict all 74 structures, with a selection of PET-active enzymes shown in FIG. 3.

[0031] As shown in FIG. 3A, representatives of known PET hydrolase enzymes, such as those in groups 5-7, share highly similar structures. Here, we show that expanded primary sequence phylogeny correlates with an unexpectedly large increase in structural diversity, comprising not simply changes in surface loops and secondary structural elements, but also large core deletions, modifications, and substantial fold extensions and additions (FIG. 3B). Overall, this group of enzymes spans molecular weights ranging from 13 to 55 kDa (I. sakaiensis PETase is ~27 kDa) and isoelectric points from 4.3 to 9.7, see Table 3. We focus on examples that capture the range of diversity, describing enzymes that are active on PET, and present structural features not previously associated with PET hydrolysis. Using LCC as the archetypal comparator, we explore multiple levels of structural divergence, from subtle changes in the catalytic cleft and surface charge distribution, to additional domains, major core deletions, and new folds constituting alternative active site arrangements and binding modes.

[0032] Wide-ranging surface residue modifications provide functional diversity while maintaining a conserved catalytic core. The group 5, 6, and 7 enzymes are the most characterized to date and share many common features including a highly conserved core domain with a 9-stranded β-sheet flanked by 8 or 9 α-helices. While the newly identified candidates in this study have not yet been subjected to protein engineering, these groups represent generally the most active members of the cohort of 74. Given their close similarities and the wealth of structural data, we were curious if there was a structural rationale for the observed differences in substrate preference in groups 5 and 6 compared to LCC, which itself is in group 5 (FIG. 2B). A comparison of LCC with enzymes 504 and 611 reveals high similarities, with RMSDs of 0.92 Å over 1,361 atoms and 0.81 Å over 1,366 atoms, respectively. With an X-ray structure of enzyme 611 extending to 1.56 Å, and a high-confidence AlphaFold model of enzyme 504, comparisons revealed almost identical active site triad geometries (FIG. 4A), making the substrate crystallinity differences surprising.

[0033] To investigate this further, analysis of the surface charge distribution revealed a highly acidic patch adjacent to the active site cavity of enzyme 504 compared to LCC, while 611 displays an exceptionally acidic surface extending around multiple faces, in stark contrast to canonical PET hydrolases that are generally more positively charged on the solvent-exposed surface (FIG. 4A). This correlates with an isoelectric point of 4.3 for enzyme 611, compared to 9.3 and 9.5 for LCC and the I. sakaiensis PETase, respectively.

[0034] A closer look at the active sites of 504 and 611 reveals more subtle, but potentially key differences. We employed computational substrate docking to compare the relative active site surface cavities and their influence on substrate binding (SI Appendix, Fig. S9). While LCC accommodates a PET trimer deep within a cleft, resulting in significant twisting of the aromatic molecules in the polymer chain, enzymes 504 and 611 present shallow clefts that appear to bind the polymer chain in a straighter conformation, possibly playing a role in the preferential accommodation of crystalline rather than amorphous PET, as disclosed herein.

[0035] Evolution of multiple lid and accessory domains generates additional variety. A variety of accessory domains is observed in groups 2, 3 and 4, ranging from small lids that cap or partially occlude the predicted active site regions, to large independent folds connected by flexible linkers (FIG. 3C, 4B). These include a Peripheral Subunit-Binding Domain (PSBD) in enzyme 202, not initially observed in the X-ray crystal structure, but revealed by AlphaFold predictions, and a Family 35 carbohydrate binding module (CBM) in enzyme 407 (FIG. 3C). Perhaps unsurprisingly, two candidates from the set of 74 enzymes that were not successfully expressed in E. coli included enzyme 408, which contains a putative cell wall anchor domain, and enzyme 212, which contains a predicted extended transmembrane anchor.

[0036] The group 2 enzymes represent a new family of peptidase-like hydrolases, all characterized by a central core with the addition of lid domains in a variety of constructions. Examples include a mixed helical and β-sheet arrangement (204), a three-helix bundle (211), and for enzyme 214, a substantial 80-residue extended helical domain which creates a 40 Å wide flat surface platform of unknown function, see FIG. 4B.

[0037] It is of note that the shapes of the group 2 active site clefts are also unusual. For example, the active site is partially covered in enzyme 204. However, this region of the predicted structure has a low confidence score in the AlphaFold prediction and may be dynamic. Nevertheless, equivalent elements are well defined in the X-ray structure of enzyme 202 to a resolution of 2.19 Å, a particularly interesting candidate given that it has a Tm of 75°C. It is similar to enzyme 214 in terms of the extensive lid domain, but enzyme 202 has two large α-helices and two β-strands which substantially extend the central β-sheet. Combined with the attachment of the PSBD, this is the largest representative of the group 2 enzymes, with a molecular mass of 41.5 kDa. In a departure from classical PET hydrolases, the active site is completely buried in this apo crystal structure, and while the two occluding structures, a helix on one side and a loop on the other, look to be robustly linked by hydrogen bonds and hydrophobic stacking interactions, these two regions have the highest B-factors of the catalytic core. In fact, the occluding helix sits on what appears to be a hinge-like structure which may have the potential to swing open to accommodate the polymer chain. If this were to occur, the cavity would expose three aromatic phenylalanine residues toward the PET surface.

[0038] Mini-PETases reconstitute productive active sites from only half the core domain. Enzyme 307 has a large deletion of around one half of the core domain, with only four strands in the central β-sheet compared to the typical eight or more strands found in canonical PET hydrolases, see FIG. 4C. Enzyme 307 would be the smallest protein in the set of 74, if not for the addition of a compact active site lid. Despite the absence of four helices in the core, this enzyme remarkably retains the conserved canonical active site. As a result of the deletion, the 307 active site is open in nature and docking studies predict potential electrostatic interactions that may stabilize an otherwise flexible protein following substrate binding. Docking simulations with a PET trimer reveal the potential for binding within a large open cleft, as compared to the relatively narrow groove of the LCC active site (FIG. 4C). The same minimal fold is also observed in candidate 201, in this case without the lid domain, making it the smallest representative from the entire set at 15.6 kDa. While not expressed in sufficient quantities for biochemical analysis, given it has the same active site triad arrangement, it may still find productive use for modelling the absolute minimal scaffold solution for a four-β-stranded PET hydrolase.

[0039] Highlighting the differences within a single phylogenetic group, enzyme 305 also displays a major deletion, but more surprisingly in the opposite half of the core compared to 307. The missing α-helical region would normally contribute half of the active site cavity and the His residue of the active site triad in the canonical fold. On closer inspection, an alternative His is positioned in the triad, reconstituting what appears to be a unique active site from the same half of the core. Both of these mini-PETases offer opportunities to investigate the minimal protein chain required for PET hydrolysis via two alternative active sites and may provide a starting point for de novo protein design.

[0040] Newly identified PET-active family members offer alternative folds, binding surfaces, and active site geometries. While the group 1 enzymes exhibit low activity relative to the other groups, examples such as enzyme 102, with a Tm of 65°C, are quite thermotolerant. These enzymes exhibit a distinct fold, closer to carboxylesterases, such as the EST55 enzyme from Geobacillus stearothermophilus (PDB ID 2OGT), see FIG. 4D, and a previously identified mesophilic enzyme with PET activity, Bacillus subtilis p-nitrobenzylesterase, BsEstB. An AlphaFold structural model reveals that the BsEstB enzyme is similar to EST55, sharing the same 3-domain architecture (catalytic, regulatory, and α/β) with conserved active site triad residues. However, the PET-active group 1 enzymes from this study are structurally divergent from these examples. For example, enzymes 101 and 102 have comparatively large deletions in the main catalytic domain, and enzyme 102 lacks the regulatory domain entirely (FIG. 4D). These truncations are significant because in the canonical fold they contribute around one half of the active site environment, including the catalytic His and Glu residues. Both 101 and 102 conserve the position of the catalytic Ser, but there is no equivalently positioned His in 101, and no equivalently positioned His or Glu in 102. Further studies will be required to characterize the active sites in these enzymes where major domain deletions result in unusually large flat surfaces surrounding potential active sites.

[0041] Discussion

[0042] Enzymes capable of PET hydrolysis have been sourced thus far from a relatively narrow sequence space, and are therefore unlikely to fully encompass the natural diversity of enzymes that can catalyze this reaction. Using bioinformatics and ML to gather sequences from environmental and cultivar genomes, we have discovered several distinct enzymes that hydrolyze PET, likely all via a serine hydrolase mechanism based on conservation of the catalytic triad, but with different enzyme architectures. We observed multiple adaptations in this enzyme cohort that will benefit from more detailed study. Many of these rearrangements and adaptations create alternative active site clefts, gorges, and planes, which may provide a useful diversity of structural motifs to achieve efficient interfacial biocatalysis for PET deconstruction. Furthermore, distinct differences in surface charge and in binding mode provide tractable parameters for enzyme engineering to develop biocatalysts with high selectivity for crystalline PET substrates. There are also many subtler adaptations observed in these enzymes, such as diverse N-glycosylation site distributions, a feature that has previously been shown to confer a significant reduction in thermally induced aggregation. Deletion and complementation of accessory domains could also provide productive improvement in enzyme performance. For example, several of the group 2 lid domains have N- and C-terminal attachment points in close proximity that could be trimmed, removed, or swapped to test the effects on active site occlusion and substrate binding. These data also indicate that signal peptide sequences, when present in the native genes, should be considered in the screening of putative PET hydrolases.

[0043] It is likely that lessons from canonical PET hydrolases will be more challenging to directly transfer to the enzymes from groups 1-3. Nevertheless, even for those enzymes with marginal activity on PET, the structural and biophysical characteristics provide a foothold for pursuing enzyme evolution. Improvement of these enzymes will benefit from the continuing advances in high-throughput screening and selection techniques. Again, this structural diversity combined with varied functional properties, including a range of thermal stabilities, pH operating ranges, and substrate discrimination, will provide new starting points for parallel engineering projects using these new folds. With the advent of enhanced structural predictions such as AlphaFold and RoseTTAFold, not only can we quickly gain structural insights from our most promising candidates, but we also gain additional insights from those enzyme homologs that are inactive. These technologies will allow the productive combination of negative and positive data to provide richer input for further engineering.

[0044] The disclosure herein should enable the discovery of additional enzyme scaffolds in nature. The JGI IMG sequences in groups 1 to 3 yielded low alignment scores with the PET hydrolase HMM (Table 3), and several of these sequences showed hydrolytic activity on PET, despite being markedly diverse relative to canonical PET hydrolases. This finding suggests that the distribution of currently known PET hydrolases, which are largely limited to the polyesterase-lipase-cutinase family (FIG. 1B), may result from biases of sequence similarity and HMM methods that limit the search to a narrow sequence space within the vicinity of canonical PET-active enzymes. Our data point to a wider diversity of PET hydrolases across environmental gradients, which should be the target of continued exploration.

[0045] To provide insight into the governing sequence characteristics responsible for PET hydrolysis, we further examined the ability of HMM scores to discriminate between active PET hydrolases and inactive homologs by computing the area under the curve (AUC) of the receiver operating characteristic plot and the Spearman correlation coefficient (ρ) between HMM scores and our experimental activity data. Our results indicate that the HMM scores demonstrate mediocre performance in predicting PET hydrolase activity of putative hits (AUC=0.581, ρ=0.167). Furthermore, we investigated the distribution of amino acids at each position in a multiple sequence alignment (MSA) of active PET hydrolases and inactive homologs to identify positions that correlate with activity and, therefore, could play key roles in PET hydrolysis activity. However, we did not find statistically significant (p < 0.01) relationships between positional variation in the MSA and activity. This suggests that pairwise covariation and higher-order interactions that are not captured by the HMM play dominant roles in PET hydrolase activity. Recent studies have shown that ML can successfully capture such complex pairwise interactions. Consequently, the application of ML with our experimental activity data within a semi-supervised framework provides promise for improved prospecting of additional active PET hydrolases.
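As an illustration of the two statistics named above, the following minimal Python sketch computes an AUC (using the rank-based Mann-Whitney formulation) and a Spearman correlation coefficient from a list of HMM scores and measured activities. The function names, the tie handling (ties count one half for the AUC and receive average ranks for ρ), and the example values are assumptions made for illustration only; they are not the analysis code or data used in this work.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """AUC as the probability that a random active outscores a random inactive.

    Ties between an active and an inactive score count as one half.
    """
    pos = [s for s, y in zip(scores, is_active) if y]
    neg = [s for s, y in zip(scores, is_active) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive sequence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical example values (not data from this work).
hmm_scores = [300.0, 120.0, 250.0, 90.0, 180.0, 60.0]
activity = [5.0, 0.0, 2.0, 0.0, 0.5, 1.0]
auc = roc_auc(hmm_scores, [a > 0 for a in activity])
rho = spearman_rho(hmm_scores, activity)
```

An AUC near 0.5 and a ρ near 0 indicate that the score ranks actives no better than chance, which is the sense in which the reported values (AUC=0.581, ρ=0.167) are described as mediocre.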

[0046] Given the diversity of putative PET hydrolases studied here, there was a risk of missing active enzymes by relying upon a limited range of expression conditions and activity assays. To mitigate this, we considered a range of heterologous protein expression and reaction conditions. Fortunately, some enzymes were active across broad temperature and pH ranges, while others exhibited narrower windows for activity. The screening results also highlight challenges associated with direct comparison of enzymes, where peak product release may be comparable, but the reaction conditions affording that are not. Furthermore, we found that different extents of codon optimization lead to substantially different expression and activity levels, including for the LCC enzyme and the corresponding 501 enzyme, and for BTA-1 and 715, which are enzyme pairs with identical protein sequences but different nucleotide sequences. The properties of the PET substrate are another critical consideration in identifying additional PET-active enzymes. We screened for activity using an amorphous PET film, and yet, upon further characterization, we observed selectivity differences for amorphous PET relative to a crystalline PET powder. This suggests screening should also be conducted using diverse substrates, in addition to multiple reaction conditions. While 74 enzymes represent only a modest number relative to variant libraries commonly encountered in enzyme evolution, we anticipate the lessons learned here will inform future screening efforts.

[0047] Our analysis of candidates from this study already extends to some industrially relevant functional parameters. For example, multiple studies have shown that high substrate crystallinity leads to reduced conversion extents relative to amorphous PET. From an industrial perspective, this has led to an emphasis on substrate pretreatment to thermo-mechanically convert post-consumer PET waste to an amorphous substrate. We recently reported a techno-economic analysis and life cycle assessment of enzymatic PET recycling. Of direct relevance to PET crystallinity and pretreatment, the base case process model included thermal extrusion, rapid quenching, and mechanical size reduction via a microgranulator to reduce the crystallinity of PET from post-consumer PET flake. Sensitivity analysis indicates a potential reduction in process electricity usage by 67%, overall process energy reductions of nearly 50%, and a savings of $0.24/kg recovered TPA if extensive substrate pretreatment could be avoided, thus motivating an interest in enzymes with specificity to crystalline substrates. As shown in FIG. 2B and FIG. 3, enzymes 102, 504, 611, and several others preferentially deconstruct crystalline PET powder relative to amorphous PET film, which suggests exciting possibilities in biocatalyst development for crystalline PET. For example, these enzymes could be used as a foundation from which to develop improved variants that retain preferential selectivity on crystalline PET, or from which to define differentiating enzyme features, such as surface charge distribution or binding cleft shape. Such features could be transplanted to the best-performing amorphous-active enzymes to assess potential gain-of-function on crystalline substrates. Moreover, this also suggests the potential to develop cocktails of PET hydrolases that contain enzymes with synergistic substrate specificity for amorphous and crystalline domains in the substrate, similar to how cellulase cocktails deconstruct cellulose. This could ultimately open new avenues for enzymatic hydrolysis of PET waste with reduced pretreatment energy inputs.

[0048] Materials and Methods

[0049] Sequence search and alignments

[0050] Environmental metagenomes (n=3,136) were retrieved from the Joint Genome Institute Integrated Microbial Genome (JGI IMG) database in April 2017. The metagenomes were first categorized into sub-categories (thermal springs, groundwater) as previously reported, and only thermal spring metagenomes were considered further (Table 2). Sequences from these metagenomes were retrieved (~38 million sequences). The National Center for Biotechnology Information (NCBI) non-redundant database was also downloaded as of 20 December 2018 (~184 million sequences). A dataset of 17 enzymes that have been confirmed to exhibit PET hydrolysis activity as of 20 December 2018 was compiled (Table 1). Sequences of the 17 PETases were retrieved and aligned with T-Coffee. T-Coffee performed better in aligning the distantly related sequences, compared to MAFFT, ClustalW2, and MUSCLE, particularly in correct placement of the catalytic Ser and His residues and the terminal Cys residues.

[0051] A profile hidden Markov Model (HMM) was constructed with the PETase alignment using the HMMER software (version 3.1b2) and putative PET hydrolases were retrieved by hmmsearch of the HMM against the retrieved NCBI and JGI IMG sequences. The NCBI search returned 2,165 hits with alignment scores ranging from 100 to 442 (E-values: 7.7e-25 to 8.6e-129). To diversify the sequence search space, the HMM threshold was lowered for the JGI IMG search and sequences with relatively lower scores were selected. The JGI search returned 1,367 hits with alignment scores ranging from 26 to 360 (E-values: 1.0e-2 to 1.8e-104). For organisms from which the NCBI sequence hits were derived, optimal growth temperature (OGT) data were retrieved from the NCBI Bioproject database (https://www.ncbi.nlm.nih.gov/bioproject/) and the BacDive database (10) (https://bacdive.dsmz.de/). The sample temperatures of the JGI IMG metagenomes (Table S2) were used as the OGT for the JGI IMG sequence hits. To limit the search to thermostable sequences, only thermophilic sequences with OGT of 50°C or greater were selected. Among the NCBI hits, 31 were selected as thermophilic, 1,777 were mesophilic and were discarded, and 353 were from organisms that could not be mapped to OGT data. The thermophilicity of these sequences that could not be mapped to OGT data was predicted with ThermoProt (vide infra). The final selection included 58 thermophilic sequences (predicted/OGT) from NCBI (scores: 104 - 442, E-values: 8.0e-26 - 8.6e-129) and 35 sequences from JGI IMG (scores: 27 - 35, E-values: 3.0e-3 - 2.6e-5). Redundant sequences (100% identity, excluding the predicted signal peptide region) were removed, which left 74 putative thermophilic PET hydrolases in the selection (Table 3).
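The triage described in this paragraph can be sketched in a few lines of Python. The sketch below bins HMM hits by OGT using the 50°C threshold stated above and then removes sequences that are identical once the predicted signal peptide is excluded. The record layout (`Hit`), the function names, and the idea of passing the signal peptide length in as a precomputed number are assumptions for illustration; the source does not specify how these steps were implemented.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

THERMOPHILE_OGT_C = 50.0  # threshold stated in the text: OGT of 50 C or greater


class Hit(NamedTuple):
    """Hypothetical record for one hmmsearch hit."""
    seq_id: str
    hmm_score: float
    ogt_c: Optional[float]  # None when the organism could not be mapped to OGT data


def triage_by_ogt(hits: Iterable[Hit],
                  threshold_c: float = THERMOPHILE_OGT_C) -> Dict[str, List[Hit]]:
    """Split hits into thermophilic, mesophilic (discarded), and unmapped bins."""
    bins: Dict[str, List[Hit]] = {"thermophilic": [], "mesophilic": [], "unmapped": []}
    for hit in hits:
        if hit.ogt_c is None:
            bins["unmapped"].append(hit)  # candidates for a thermophilicity predictor
        elif hit.ogt_c >= threshold_c:
            bins["thermophilic"].append(hit)
        else:
            bins["mesophilic"].append(hit)
    return bins


def drop_redundant(records: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Keep the first ID for each distinct mature sequence.

    Each record is (seq_id, sequence, signal_peptide_length); the signal
    peptide length is assumed to come from a separate prediction step.
    """
    seen = set()
    kept = []
    for seq_id, sequence, sp_len in records:
        mature = sequence[sp_len:]  # exclude the predicted signal peptide region
        if mature not in seen:
            seen.add(mature)
            kept.append(seq_id)
    return kept
```

Keeping the unmapped hits in their own bin mirrors the text, in which those sequences are passed to ThermoProt for prediction instead of being discarded.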

[0052] Unless otherwise stated, structure-based multiple sequence alignments were used in all further analyses. The structure-based alignment was performed as follows. First, a structural alignment of all crystal structures and AlphaFold structure models presented in this work was performed with the Promals3D web server. Then, all sequences to be analyzed were aligned with MAFFT using the structural alignment as constraint. Sequence analyses were implemented with the Biopython package.

[0053] Prediction of thermophilicity with machine learning (ThermoProt)

[0054] From the NCBI and BacDive databases, sequence and OGT data were retrieved for 24 organisms classified as psychrophilic (<15°C), mesophilic (25-37°C), thermophilic (45-70°C), or hyperthermophilic (>80°C). A separate testing set was formed of 22,299 proteins from an organism in each OGT class, and the remaining sequences (231,171) were used in training and validation. To prevent overestimation of the validation performance, the sequences were clustered at 40% sequence-identity threshold using the CD-HIT algorithm. From the CD-HIT output, 40,000 sequences were selected for validation such that there were 10,000 sequences in each class, with 8,000 sequences (2,000 in each class) set aside for hyperparameter optimization and feature selection, while the remaining 32,000 (8,000 in each class) were used for training, validation, and analysis.

[0055] Three categories of features were derived from the protein sequences.

[0056] Amino acid composition features: the relative amounts of 20 canonical amino acids in the sequence.

[0057] g-gap dipeptide composition: the relative amounts of the peptide, a(x)gb, where a and b are specific amino acids and (x)g represents a gap of g amino acids of any type, sandwiched between a and b. In this work, 1,200 g-gap dipeptides (i.e., the 400 possible a, b pairs for each of g = 0, 1, and 2) were tested and the top 10 were selected by their relative (Gini) importance in a random forest model. Additional g-gap dipeptides beyond 10 did not improve the random-forest classification performance.

[0058] Residue type and physicochemical features: in addition, 20 features that have been shown in previous studies to correlate with thermal stability were selected, namely the composition of acidic, basic, non-polar, acyclic, aliphatic, aromatic, charged, and EFMR (Glu, Phe, Met, Arg) residues; the ratio of basic to acidic, non-polar to polar, acyclic to cyclic, and charged to non-charged residues; the composition of tiny (Ala, Gly, Pro, Ser) and small (Thr, Asp) residues, the average maximum solvent accessible area (ASA), the ratio of (Glu + Lys) to (Gln + His), charged vs. polar composition (18), IVYWREL (Ile, Val, Tyr, Trp, Arg, Glu, Leu) composition, molecular weight, and heat capacity.
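The first two feature categories above can be illustrated with a short Python sketch. It computes the amino acid composition of a sequence and the g-gap dipeptide composition for a given gap g; running the latter for g = 0, 1, and 2 gives 3 × 400 = 1,200 features, matching the number tested. The function names, the feature key format (for example "AxC" for g = 1), and the choice to normalize by the number of windows containing only canonical residues are assumptions for illustration, since the text does not spell out these details.

```python
from collections import Counter
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def aa_composition(seq: str) -> Dict[str, float]:
    """Fraction of each canonical amino acid among the canonical residues."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}


def g_gap_dipeptide_composition(seq: str, g: int) -> Dict[str, float]:
    """Relative amounts of a(x)gb: residues a and b separated by g residues.

    Keys look like 'AC' (g = 0), 'AxC' (g = 1), 'AxxC' (g = 2).
    Windows whose end residues are non-canonical are skipped (an assumption).
    """
    pairs: Counter = Counter()
    for i in range(max(len(seq) - g - 1, 0)):
        a, b = seq[i], seq[i + g + 1]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    return {
        f"{a}{'x' * g}{b}": (pairs[(a, b)] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


def sequence_features(seq: str) -> Dict[str, float]:
    """20 composition features plus 1,200 g-gap dipeptide features (g = 0, 1, 2)."""
    features = dict(aa_composition(seq))
    for g in (0, 1, 2):
        features.update(g_gap_dipeptide_composition(seq, g))
    return features
```

In the workflow described above, a full feature table of this kind would then be reduced to the top 10 g-gap dipeptides by random-forest importance; that selection step is not reproduced here.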

[0059] Five machine-learning methods were tested with the Scikit-learn Python package (21): random forests, logistic regression, Gaussian naive Bayes, K-nearest neighbor, and support vector machine (SVM). Hyperparameters for each method were optimized with a grid search using a dataset of 8,000 proteins (2,000 per class). Four binary classifiers were tested: psychrophilic vs. mesophilic (PM), mesophilic vs. thermophilic (MT), thermophilic vs. hyperthermophilic (TH), and mesophilic vs. thermophilic/hyperthermophilic (MTH). Machine learning methods with the different binary classification schemes were used and measured over fivefold cross-validation with the dataset of 32,000 proteins (8,000 per class). All methods achieved accuracies between 68.0% and 86.6%. In addition to the accuracy, the true positive rate (recall), true negative rate (specificity), and Matthews correlation coefficient were also computed. The SVM method (termed ThermoProt) yielded the best performance (MTH, 86.6% accuracy) and was applied to the PETase HMM hits without OGT data to predict the thermophilicity.
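For reference, the four performance measures named above follow directly from the confusion matrix of a binary classifier. The sketch below uses the standard textbook definitions; the function name and the convention of returning 0.0 when a denominator is zero are assumptions for illustration, and the counts in the example are hypothetical, not results from this work.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, recall, specificity, and Matthews correlation coefficient.

    tp/fp/tn/fn are true-positive, false-positive, true-negative, and
    false-negative counts. Undefined ratios are reported as 0.0 (an assumption).
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for a mesophilic vs. thermophilic/hyperthermophilic split.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it a useful complement when one class is predicted much better than the other.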

[0060] It is important to note that while this work was ongoing, a dataset of OGT for 21,498 microbes was published which enabled regression models that directly predict the OGT (23, 24), and the optimal catalytic temperature (Topt) of an enzyme. These regression methods could be applied in future works for more precise prediction of the thermotolerance of putative PETases.

[0061] Discrimination of active PETases from inactive homologs with hidden Markov Models (HMM).

[0062] Sequence data of 60 enzymes with experimentally confirmed PET hydrolase activity were compiled, comprising 36 PETases reported in other studies (Table S1) and 24 non-redundant PETases newly presented in this study. Sequence data of 19 homologs that are experimentally confirmed to be inactive on PET were also compiled, comprising 15 sequences from this study, and PET28, PET29, PET38 (26), and Cbotu EstB reported previously. A structure-based alignment of all 79 active and inactive sequences was performed, and the alignment was split into separate sub-alignments of active and inactive sequences.

[0063] The performance of HMM in discriminating active PETases from inactive homologs was evaluated with fivefold cross-validation. The active/inactive sequences were split into five folds and the HMM was repeatedly built with the data in four folds and evaluated with the data in the left-out fold such that each fold was iteratively used in training and testing. Two methods of HMM prediction were considered. First, an HMM was built with active PETases in the training set and searched against sequences in the testing set. The HMM alignment score of test sequences was construed as a predictive measure of PET hydrolase activity (score method). In the second method (difference method), an additional HMM was built with inactive homologs in the training set, and searched against the testing set. The difference between the HMM score obtained from the active PETase HMM and the score from the inactive homologs HMM was construed as the predictive measure of PET hydrolase activity. With the score method, it is expected that sequences exhibiting high PET hydrolase activity would have high scores when searched against an HMM of active PETases, while inactive sequences or sequences with low activity would have low scores. With the difference method, it is expected that active sequences would have higher scores when searched against an HMM of active PETases than when searched against an HMM of inactive homologs, and, consequently, a higher score difference. Similar HMM approaches have proven remarkably successful in discriminating functional subtypes in protein families. However, the results indicate that HMM only demonstrates mediocre performance in discriminating PETases from inactive homologs.
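The cross-validation bookkeeping and the two predictive measures described above can be sketched as follows. The sketch assumes the HMM alignment scores have already been obtained for each test sequence (for example, from hmmsearch against an HMM built on the training folds) and are supplied as dictionaries keyed by sequence ID. The function names, the round-robin fold assignment, and the dictionary layout are assumptions for illustration; how sequences were assigned to folds is not stated in the text.

```python
from typing import Dict, List, Tuple


def fivefold_splits(n_items: int, n_folds: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs; each item is tested exactly once.

    Items are assigned to folds round-robin (an assumption).
    """
    folds = [list(range(k, n_items, n_folds)) for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


def score_method(active_hmm_scores: Dict[str, float]) -> Dict[str, float]:
    """Score method: the score against the active-PETase HMM is the predictor."""
    return dict(active_hmm_scores)


def difference_method(active_hmm_scores: Dict[str, float],
                      inactive_hmm_scores: Dict[str, float]) -> Dict[str, float]:
    """Difference method: active-HMM score minus inactive-HMM score.

    Only sequences scored against both HMMs are returned; how to treat a
    sequence missing from one search is left open here.
    """
    shared = active_hmm_scores.keys() & inactive_hmm_scores.keys()
    return {sid: active_hmm_scores[sid] - inactive_hmm_scores[sid] for sid in shared}


# Hypothetical scores for three test-fold sequences.
active = {"seqA": 310.0, "seqB": 150.0, "seqC": 95.0}
inactive = {"seqA": 120.0, "seqB": 160.0, "seqC": 90.0}
predictor = difference_method(active, inactive)
```

Under the difference method, a sequence such as the hypothetical seqB, which scores higher against the inactive-homolog HMM, receives a negative value and is ranked as less likely to be active.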

[0064] In addition, the amino-acid distribution in the alignment of active PET hydrolases and inactive homologs was investigated. If a residue position plays key roles in activity, it is expected that the amino acid distribution at that position would significantly vary between actives and inactives. A chi-squared test of independence was performed to compare the amino-acid distribution at each position in the structure-based alignment between 60 active PETases and 19 inactive homologs. Positions with gaps in more than 90% of the sequences were removed (805 removed, 437 remaining). The test was also performed to compare the distribution of amino acid types (aliphatic: Ala, Gly, Val, Leu, Ile, Met, Cys, Pro; aromatic: Phe, Trp, Tyr, His; positive: Arg, Lys; negative: Asp, Glu; polar: Asn, Gln, Ser, Thr). The results indicate that no single position in the alignment shows statistically significant difference (p < 0.01) between active PETase and inactive homologs.
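A per-position version of this test can be sketched in Python as follows. The sketch filters out alignment columns with more than 90% gaps, optionally maps residues to the five amino acid types listed above, and computes the Pearson chi-squared statistic and degrees of freedom for the 2 × k table of active versus inactive counts. The function names and the treatment of gap characters as their own category are assumptions for illustration; converting the statistic to a p-value requires the chi-squared survival function (available, for example, as `scipy.stats.chi2.sf`), which is omitted to keep the sketch dependency-free.

```python
from collections import Counter
from typing import Sequence, Tuple

# Residue-type grouping as listed in the text (one-letter codes).
RESIDUE_TYPE = {}
for _type, _letters in {
    "aliphatic": "AGVLIMCP",
    "aromatic": "FWYH",
    "positive": "RK",
    "negative": "DE",
    "polar": "NQST",
}.items():
    for _aa in _letters:
        RESIDUE_TYPE[_aa] = _type


def keep_position(column: Sequence[str], max_gap_fraction: float = 0.9) -> bool:
    """True unless more than 90% of the sequences have a gap at this position."""
    gaps = sum(1 for c in column if c == "-")
    return gaps / len(column) <= max_gap_fraction


def to_residue_types(column: Sequence[str]) -> list:
    """Map residues to type labels; gaps and unknown symbols are kept as-is."""
    return [RESIDUE_TYPE.get(c, c) for c in column]


def chi_squared_independence(active: Sequence[str],
                             inactive: Sequence[str]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table."""
    if not active or not inactive:
        raise ValueError("both groups must contain at least one sequence")
    observed = [Counter(active), Counter(inactive)]
    row_totals = [len(active), len(inactive)]
    grand_total = sum(row_totals)
    categories = sorted(set(active) | set(inactive))
    statistic = 0.0
    for cat in categories:
        col_total = observed[0][cat] + observed[1][cat]
        for row in range(2):
            expected = row_totals[row] * col_total / grand_total
            statistic += (observed[row][cat] - expected) ** 2 / expected
    dof = len(categories) - 1  # (2 - 1) * (k - 1)
    return statistic, dof
```

With only 19 inactive homologs spread over up to 20 residue categories, many expected counts in such a table are small, which limits the power of the test; grouping residues into types, as done above, reduces the number of categories.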

[0065] Phylogenetic analyses and sequence similarity network

[0066] Phylogenetic analyses were conducted with the MEGAX software. For the phylogeny of 74 candidate sequences (FIG. 1A), the evolutionary history was inferred using the Minimum Evolution (ME) method. The evolutionary distances were computed using the JTT matrix-based model and are in the units of the number of amino acid substitutions per site. The ME tree was searched using the Close-Neighbor-Interchange (CNI) algorithm at a search level of 1. The Neighbor-joining algorithm was used to generate the initial tree. All ambiguous positions were removed for each sequence pair with the pairwise deletion option.

[0067] A separate tree was constructed to further illustrate the phylogenetic relationships of 36 previously reported PET-hydrolases and the unique PET-hydrolases presented in this study using the maximum likelihood method with 1000 replicates and the JTT matrix-based model. The initial tree for the heuristic search was obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and then selecting the topology with superior log likelihood value. All positions with less than 95% site coverage were eliminated. The phylogenetic trees were visualized with the Interactive Tree of Life (iTOL) online tool.

[0068] The sequence similarity network (SSN) (Fig. 1B, main text) was implemented with the Enzyme Function Initiative Enzyme Similarity Tool (EFI-EST). Sequences were subjected to a BLASTall pairwise search and the SSN was constructed with a threshold of 1e-10. The SSN was visualized with Cytoscape.

[0069] Materials

[0070] Amorphous PET film (Product ES301445) and crystalline PET powder (Product 306031) were purchased from Goodfellow Corporation (USA). Percent crystallinity for each substrate has previously been reported. All reagents and buffer components were acquired from Sigma-Aldrich.

[0071] Plasmid construction

[0072] Coding sequences were codon optimized for Escherichia coli str. K-12 MG1655 using a guided random approach from the OPTIMIZER server (http://genomes.urv.es/OPTIMIZER). Optimized sequences for expression of the 6 control hydrolases (wild-type IsPETase, mutant variant IsPETase (W159H/S238F), wild-type LCC, the ICCG variant of LCC, the WCCG variant of LCC, and BTA-1), and all versions of the 74 candidate enzymes were synthesized by Twist Biosciences in pET21b(+) (EMD Millipore)-based plasmids. Each construct includes a C-terminal hexa-histidine epitope tag. Sequences are provided in Table SD1 (candidates) and Table SD2 (controls). All 74 genetic expression constructs have been deposited at AddGene at https://www.addgene.org/Gregg_Beckham/.

[0073] Enzyme expression

[0074] For identifying soluble heterologous protein expression, BL21 (DE3) E. coli (NEB), OverExpress™ C41 (DE3) (Lucigen), and Lemo21 (DE3) (NEB) competent cells were used. Competent cells were transformed with pET21b(+) plasmids encoding the enzyme of interest. Single colonies from transformation were then inoculated into a starter culture of lysogeny broth (LB) media containing 100 µg/mL ampicillin and grown at 37°C overnight. Four expression strategies were evaluated using 50 mL cultures and soluble expression was evaluated by SDS-PAGE with Coomassie staining and Western blot using primary antibody against the hexa-histidine epitope tag (Invitrogen). Using results from the 50 mL scale expression tests, the best condition was chosen for each control or candidate and scaled to 1-5 L, depending on expression level. Table S10 details which competent cell line and expression strategy was used for each control and candidate enzyme, and the final expression level (mg enzyme/L culture) obtained for each enzyme.

[0075] In strategy A, the starter culture was inoculated at a 100-fold dilution into a 2xYT medium (10 g NaCl, 10 g yeast extract, 16 g tryptone per L culture) containing 100 µg/mL ampicillin and grown at 37°C until the optical density measured at 600 nm (OD600) reached 0.6-0.8. Protein expression was then induced by addition of isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM. Cells were induced at 20°C for 18 to 24 h following IPTG addition, harvested by centrifugation, and stored at -80°C until purification.

[0076] In strategy B, the starter culture was inoculated at a 100-fold dilution into a 2xYT medium containing 100 µg/mL ampicillin and grown at 37°C until the OD600 reached 0.6. Protein expression was then induced by addition of IPTG to a final concentration of 0.5 mM. Cells were induced at 25°C for 16 to 18 h following IPTG addition, harvested by centrifugation, and stored at -80°C until purification.

[0077] In strategy C, the starter culture was inoculated at a 1000-fold dilution into ZYP-5052 medium containing 100 µg/mL ampicillin and grown at 28°C for 24 h. Cells were harvested by centrifugation and stored at -80°C until purification.

[0078] In strategy D, the starter culture was inoculated at a 500-fold dilution into ZYP-5052 medium with 0.3 M NaCl containing 100 µg/mL ampicillin and grown at 25°C for 72 h. Cells were harvested by centrifugation and stored at -80°C until purification.
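For reference, the four expression strategies can be summarized as a small data structure, together with the starter-culture volume implied by each dilution. This is an illustrative sketch only: the class and function names are hypothetical, and it assumes that an "N-fold dilution" means the starter volume is 1/N of the final culture volume.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpressionStrategy:
    medium: str
    dilution_fold: int        # starter culture dilution into the expression medium
    iptg_mm: Optional[float]  # None where the text describes no IPTG addition
    expression_temp_c: int    # temperature after induction, or throughout for C and D
    expression_hours: str     # duration as stated in the text


STRATEGIES = {
    "A": ExpressionStrategy("2xYT", 100, 1.0, 20, "18-24"),
    "B": ExpressionStrategy("2xYT", 100, 0.5, 25, "16-18"),
    "C": ExpressionStrategy("ZYP-5052", 1000, None, 28, "24"),
    "D": ExpressionStrategy("ZYP-5052 + 0.3 M NaCl", 500, None, 25, "72"),
}


def starter_volume_ml(strategy: str, culture_volume_ml: float) -> float:
    """Starter volume, assuming dilution is relative to the final culture volume."""
    return culture_volume_ml / STRATEGIES[strategy].dilution_fold
```

For the 50 mL test cultures, this gives 0.5 mL of starter culture for strategies A and B, 0.05 mL for strategy C, and 0.1 mL for strategy D.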

[0079] Enzyme purification

[0080] Harvested cells were thawed on ice and resuspended in a lysis buffer (300 mM NaCl, 10 mM imidazole, 20 mM Tris-HCl, pH 8.0) with 0.25 mg/mL lysozyme and 12.5 U/mL DNase I. Cells were lysed using either a bead beater (BioSpec Products, Inc.) or sonication with a microtip (39% power, 20 s ON, 20 s OFF for a total of 2 min 20 s ON). Lysate was clarified by centrifugation at 40,000 x g for 40 minutes at 4°C. Clarified lysate was filtered through a 0.45 µm PVDF membrane, then applied to a 5 mL HisTrap HP (Cytiva) affinity column using an ÄKTA Pure chromatography system (Cytiva) and eluted using a buffer comprising 300 mM NaCl, 500 mM imidazole, 20 mM Tris-HCl, pH 8.0. Resulting fractions containing the protein of interest were pooled and dialyzed at room temperature (25°C) using 3.5 kDa molecular weight exclusion membranes against an exchange reservoir of 300 mM NaCl, 20 mM Tris, pH 8.0 buffer at least 300 times the pooled sample volume. After 16 to 20 h of buffer exchange, samples were centrifuged and evaluated by SDS-PAGE with Coomassie staining. Pooled samples were concentrated using 3.5 kDa molecular weight cut-off spin columns and applied to a HiLoad Superdex 75 pg 16/60 (Cytiva) size exclusion column equilibrated with 300 mM NaCl, 20 mM Tris, pH 8.0 for use in screening or time course analysis. Protein in eluted fractions from affinity and size exclusion columns was assessed using SDS-PAGE with Coomassie staining and Western blot using primary antibody against the hexa-histidine epitope tag (Invitrogen). Total protein was assessed by BCA assay.

[0081] Signal peptide sequences

[0082] Presence of signal peptide sequences was predicted using SignalP 5.0 (40). From 74 putative thermophilic PET hydrolase sequences, 36 signal peptides were removed for construct synthesis. A selection of 12 truncated constructs that proved challenging to express were resynthesized to include the native signal peptide (nSP) and compared for changes in expression and activity. Of these signal peptide-containing constructs, 7 were successfully expressed and screened, of which only 607 could not be expressed without the native signal peptide. Sequences for the nSP-containing candidates are provided in Table SD1. Additionally, expression of the Thh Est enzyme (710) was previously reported from an expression plasmid (pET26b(+)) containing an N-terminal pelB signal peptide. Both the truncated version of 710 and the pelB-containing version (710-pelB) expressed enzyme, but neither showed activity during screening (data not shown for 710-pelB).

[0083] Protein calorimetry (DSC)

[0084] Apparent melting temperature (Tm) values for those purified enzymes that were sufficiently soluble (>0.1 mg/mL) in neutral buffer were assessed by differential scanning calorimetry (DSC). Immediately prior to DSC analysis, to ensure both mono-dispersity and an optimal buffer match, each enzyme was prepared by size-exclusion chromatography (SEC) through a HiLoad Superdex 75 pg column (Cytiva) pre-equilibrated with the DSC reference buffer comprising 50 mM NaH2PO4, pH 7.5, with either 300 mM NaCl (for 606) or 100 mM NaCl (for all other enzymes). The SEC column was calibrated with a mixture of globular protein standards (Sigma-Aldrich) - thyroglobulin (670 kDa), γ-globulin (158 kDa), albumin (67.0 kDa) and ribonuclease A (13.7 kDa) - to allow for the calculation of an apparent molecular weight (MWapp) for each enzyme from its elution volume. Subsequently, triplicate DSC analyses, each using 0.1 - 0.2 mg/mL enzyme, were performed on a MicroCal PEAQ-DSC-Automated instrument (Malvern Panalytical). The temperature of the sample and reference cells was raised from 30°C to 120°C at a rate of 1.5 °C/min using low feedback. Thereafter, reference buffer subtraction, baseline correction and apparent Tm determination were performed using the instrument’s data analysis software (v1.60).

[0085] Monomer quantitation

[0086] Analysis of BHET, MHET, and TPA was performed on an Infinity II 1290 ultra-high-performance liquid chromatography (UHPLC) system (Agilent Technologies) equipped with a G7117A diode array detector (DAD). Samples and standards were injected using a volume of 0.25 µL onto a Zorbax Eclipse Plus C18 Rapid Resolution HD (2.1 x 50 mm, 1.8 µm) (Agilent Technologies) column maintained at 40°C. The mobile phase used to separate the analytes of interest was composed of (A) 20 mM phosphoric acid in ultrapure water and (B) 100% methanol. Separation of analytes was carried out using a constant flow rate of 0.7 mL/min and a gradient program with a total run time of 3 min. The gradient program proceeded as follows: at t = 0 min, (A) = 80% and (B) = 20%; at t = 2 min, (A) = 35% and (B) = 65%; from t = 2.01 min until the end at t = 3 min, (A) = 80% and (B) = 20%. The calibration curve for each analyte was evaluated between concentrations of 1 - 200 mg/L with DAD detection at a wavelength of 240 nm. Ten calibration standards were used with an R² coefficient of 0.995 or better. Calibration verification standards (CVS) for each analyte were analyzed every 12-24 samples to ensure the integrity of the initial calibration. Samples were diluted with ultrapure water for analysis and maintained at 15°C during the analysis.
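The gradient program above can be expressed as a simple lookup of mobile-phase composition against time. The sketch below is illustrative only: it assumes the composition changes linearly between the stated time points, which the text does not state explicitly, and the function names are hypothetical.

```python
# (time in min, % mobile phase B) breakpoints taken from the gradient program.
GRADIENT = [(0.0, 20.0), (2.0, 65.0), (2.01, 20.0), (3.0, 20.0)]


def percent_b(t_min: float) -> float:
    """Percent methanol (B) at time t, assuming linear ramps between breakpoints."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the 3 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time not covered by the gradient table")


def percent_a(t_min: float) -> float:
    """Percent aqueous phosphoric acid (A); A and B always sum to 100%."""
    return 100.0 - percent_b(t_min)
```

Under the linear-ramp assumption, the composition at t = 1 min is 42.5% B and 57.5% A, and the column is back at the starting composition of 20% B for the final part of the run.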

[0087] Screening for activity on amorphous PET film

[0088] In each screening reaction, 2.9% loading by mass of an amorphous PET film (Goodfellow) was incubated with 10 µg enzyme of interest (0.7 mg enzyme/g PET), unless noted otherwise in Table 4 due to low expression levels. Reactions were performed in polypropylene tubes containing 100 mM NaCl and 50 mM buffering agent (citrate at pH 6.0, NaH2PO4 at pH 7.0, NaH2PO4 at pH 7.5, HEPES at pH 7.5, bicine at pH 8.0, and glycine at pH 9.0) and incubated at 30°C, 40°C, 50°C, 60°C, or 70°C. All reactions were terminated after 96 h by addition of equal volume 100% methanol and PET was removed from the reaction solution. Soluble fractions were filtered through 0.2 µm nylon filters for monomer quantitation. All PET hydrolysis screening reactions were performed in triplicate.
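The loading figures and the size of the screening matrix can be checked with a short calculation. The sketch below is illustrative and rests on two assumptions that are not stated in the text: that "2.9% loading by mass" means PET mass divided by total reaction mass, and that every buffer was tested at every temperature. The enzyme loading of 0.7 mg/g is a rounded value, so the derived masses are approximate.

```python
from itertools import product

BUFFERS = [
    "citrate pH 6.0",
    "NaH2PO4 pH 7.0",
    "NaH2PO4 pH 7.5",
    "HEPES pH 7.5",
    "bicine pH 8.0",
    "glycine pH 9.0",
]
TEMPERATURES_C = [30, 40, 50, 60, 70]
REPLICATES = 3


def pet_mass_mg(enzyme_ug: float, loading_mg_per_g: float) -> float:
    """PET mass implied by an enzyme amount and an enzyme loading."""
    enzyme_mg = enzyme_ug / 1000.0
    pet_g = enzyme_mg / loading_mg_per_g
    return pet_g * 1000.0


def reaction_mass_mg(pet_mg: float, pet_mass_fraction: float) -> float:
    """Total reaction mass, assuming loading = PET mass / total mass."""
    return pet_mg / pet_mass_fraction


# Assumed full factorial screen: every buffer at every temperature.
CONDITIONS = list(product(BUFFERS, TEMPERATURES_C))
REACTIONS_PER_ENZYME = len(CONDITIONS) * REPLICATES
```

With 10 µg enzyme at 0.7 mg enzyme/g PET, the implied PET mass is about 14.3 mg, and a 2.9% mass loading then corresponds to a total reaction mass of roughly 0.49 g. Under the full-factorial assumption, the screen comprises 30 conditions and 90 reactions per enzyme.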

[0089] For enzymes with peak activity at pH 6.0, an extended pH screening assay was performed using 2.9% loading by mass of amorphous PET film (Goodfellow) and 10 µg enzyme of interest (0.7 mg enzyme/g PET enzyme loading) in polypropylene tubes containing 100 mM NaCl and 50 mM citrate (pH 5.5 and pH 5.0) or 50 mM sodium acetate (pH 5.0 and pH 4.5). All reactions were terminated after 96 h by addition of equal volume 100% methanol and PET was removed from the reaction solution. Soluble fractions were filtered through 0.2 µm nylon filters for monomer quantitation. All PET hydrolysis screening reactions were performed in triplicate.

[0090] Aromatic product release data are reported throughout relative to background aromatic product release detected in no-enzyme control reactions at each pH and temperature. Background aromatic product release for both amorphous PET film and crystalline PET powder was below the detection limit for all pH and temperature combinations tested.

[0091] Characterization of PET hydrolysis activity on varied substrates with time resolution

[0092] Using the reaction conditions (buffer and temperature combination) where peak PET hydrolysis activity was measured from the screening assays, a selection of enzymes was further characterized over a 168 h reaction on amorphous PET film (Goodfellow) and crystalline PET powder (Goodfellow) substrates. Each reaction was performed using 2.9% by mass substrate loading and 10 µg enzyme of interest (0.7 mg enzyme/g PET). Reactions were terminated at the designated timepoint by addition of equal volume 100% methanol and PET was removed from the reaction solution. Soluble fractions were filtered through 0.2 µm nylon filters for monomer quantitation. All time course experiments were performed in triplicate and samples were diluted with ultrapure water for analyte quantitation. Table 5 provides details on the enzyme and reaction condition pairings evaluated over 168 h reaction time.

[0093] Structure determination

[0094] For crystallography, all proteins were concentrated and sitting drop crystallization trials were set up with a Mosquito crystallization robot (SPT Labtech) using SWISSCI 3-lens low profile crystallization plates. The proteins were crystallized using the following screens and conditions:

[0095] 202 - JCSG-plus screen (Molecular Dimensions), G7, 15 % PEG 3350, 0.1 M succinic acid.

[0096] 306 - SaltRx screen (Hampton Research), E8, 1.8 M sodium phosphate monobasic monohydrate, potassium phosphate dibasic pH 5.0.

[0097] 606 - Structure screen (Molecular Dimensions), F5, 0.1 M Sodium HEPES pH 7.5, 70% (v/v) MPD.

[0098] 611 - PACT screen (Molecular Dimensions), F1, 20 % PEG 3350, 0.2 M sodium fluoride, 0.1 M Bis-Tris propane pH 6.5.

[0099] 702 - PACT screen (Molecular Dimensions), F8, 20 % PEG 3350, 0.2 M sodium sulfate, 0.1 M Bis-Tris propane pH 6.5.

[00100] 703 - PACT screen (Molecular Dimensions), F10, 20 % PEG 3350, 0.02 M sodium/potassium phosphate, 0.1 M Bis-Tris propane pH 6.5.

[00101] 705 - JCSG screen (Molecular Dimensions), F1, 0.05 M Cesium Chloride, 0.1 M MES pH 6.5, 30% (v/v) Jeffamine M-600.

[00102] 711 - JCSG screen (Molecular Dimensions), D6, 0.2 M Magnesium Chloride Hexahydrate, 0.1 M Tris pH 8.5, 20 % (w/v) PEG 8000.

[00103] All crystals were cryo-protected with 20% glycerol in the crystallization solution and flash-frozen in liquid nitrogen. Diffraction data were collected at the Diamond Light Source (Didcot, UK) and automatically processed with STARANISO on ISPyB. STARANISO was also used for processing anisotropic data and calculating ellipsoidal completeness. The structures were solved within CCP4 Cloud by molecular replacement (MR) with Molrep (2) using search models created by Phyre2. For 306, MR was solved with an AlphaFold structure prediction. Model building was performed in Coot and the structures were refined with BUSTER and REFMAC5. MolProbity was used to evaluate the final models and PyMOL (Schrödinger, LLC) for protein model visualizations. The atomic coordinates have been deposited in the Protein Data Bank. Search for structural protein homologs and calculation of RMSD values were performed with the DALI server.

[00104] AlphaFold structure predictions were generated using the same models and inference procedure as employed in CASP14. This is described in the recent AlphaFold paper. Mean pLDDT (predicted local distance difference test) over the structure was used for model ranking, and pLDDT values were written into the B-factor column of each structure file.
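Because the pLDDT values are stored in the B-factor column, the mean pLDDT used for ranking can be recovered directly from a structure file. The sketch below is a minimal illustration rather than the procedure used in the study: it assumes standard fixed-width PDB `ATOM` records (B-factor in columns 61-66) and averages over C-alpha atoms only, so that each residue is counted once. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def mean_plddt(pdb_lines: Iterable[str]) -> float:
    """Mean pLDDT over residues, read from the B-factor column of CA atoms."""
    values = []
    for line in pdb_lines:
        # Atom name occupies columns 13-16; B-factor occupies columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


def rank_models(models: Dict[str, List[str]]) -> List[str]:
    """Return model names ordered from highest to lowest mean pLDDT."""
    return sorted(models, key=lambda name: mean_plddt(models[name]), reverse=True)
```

Given a mapping from model name to the lines of its PDB file, `rank_models` returns the names with the most confident prediction first.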

[00105] Molecular docking

[00106] Molecular docking calculations were performed using the program Molecular Operating Environment (MOE). Flexible PET dimers and trimers were optimized inside a rigid host structure. Initial placement of the PET oligomer units was carried out using the Triangle Matcher approach, with subsequent refinement via molecular mechanics. The position and energy of 200 poses were optimized and their ranking was carried out based on the highest molecular mechanics interaction energy, E_refine.

[00107] Table 1. List of current experimentally verified PET hydrolases. The HMM column shows the 17 sequences used in constructing the HMM, which were among the PET hydrolases known at the time of the initial enzyme candidate selection. The Candidate Enzyme ID column shows the identifier for sequences that are also contained in our set of 74 putative PET hydrolases.

[00108] Table 2. JGI IMG metagenomes from which putative sequences were derived. These metagenomes comprised a total of 38 million sequences, which were searched against the PETase HMM to derive putative PET hydrolases. The rows that are bolded in the Scaffold Key column highlight metagenomes from which the JGI candidates in our dataset (27 out of 74) were derived.

[00109] Table 3. Annotated list of the 74 candidate enzymes. The HMM score column shows the alignment scores obtained by searching the HMM built with 17 experimentally confirmed PETases against the NCBI and JGI databases. Sequences in groups 1 to 3 were retrieved from JGI IMG and the accession column shows the scaffold ID mapping the sequence to the corresponding metagenome (see Table 2). Sequences in groups 4 to 7 were retrieved from NCBI and the accession column shows the GenBank accession number.

[00110] Table 4. Expression and purification trial results for all 74 enzymes and the signal peptide-containing variants. Enzymes previously reported to have PET hydrolysis activity are shown in peach. Constructs that were not expressed sufficiently for screening are shown in grey. Expression yields are also shown in Fig. S2. Candidates noted nSP encode the native signal peptide sequence at the N-terminus of the expression sequence. The expression approaches employed are described in the Materials and Methods, annotated as strategies A-D. Briefly, in strategy A, induction in 2xYT media with 1 mM IPTG at 20°C was used; in strategy B, induction in 2xYT media with 0.5 mM IPTG at 25°C was used; in strategy C, autoinduction in ZYP-5052 media at 28°C was used; and in strategy D, autoinduction in ZYP-5052 media supplemented with 0.3 M NaCl at 25°C was used. The final concentration of protein per L of culture is reported after affinity and size exclusion chromatography. Also reported is the pH and temperature combination that resulted in the highest level of product release from the screening assays. C6 = citrate, pH 6; NP7 = NaH2PO4, pH 7; NP7.5 = NaH2PO4, pH 7.5; H7.5 = HEPES, pH 7.5; B8 = bicine, pH 8; G9 = glycine, pH 9. The enzyme loading (in µg enzyme per reaction) for each screening reaction is noted.

[00111] Table 5. Enzymes and reaction conditions tested in 168 h time course experiments. Selectivity ratio provides the mass ratio of products at 168 h and preference for amorphous PET film (A) or crystalline PET powder (C) is noted. Reaction conditions tested that are not shown in Fig. 2B are noted with an asterisk (*).

[00112] Table 6. Tm data for selected proteins.

[00113] Disclosed herein are predicted and verified PET hydrolase enzymes, their activity, and their nucleic acid and amino acid sequences. In an embodiment, as disclosed in Appendix A, are amino acid sequences of PET hydrolase enzymes that have been identified. In an embodiment, the amino acid sequences disclosed in Appendix A each begin with a methionine. In an embodiment, some of the identified sequences have been cloned, and the enzymes that they encode have been expressed and purified, and their PET hydrolase activity has been determined. In an embodiment, the PET hydrolase enzymes disclosed herein possess desirable traits that are leveraged in the design and engineering of enzyme formulations targeted to degrade specific polymers. In an embodiment, the PET enzymes disclosed herein have measurable PET degrading activity and may be active for degrading polyester polyurethanes.

[00114] In an embodiment, computational methods and other algorithms are used to predict and identify nucleic acid and amino acid sequences for active PET hydrolase enzymes. In an embodiment, the use of algorithms is contemplated to predict secondary, tertiary and quaternary structures for the predicted PET hydrolase enzymes.

[00115] Disclosed herein are seven clade groups of PET hydrolase enzymes that were identified using the methods disclosed herein and the accession numbers of the putative and actual PET hydrolase enzyme members of the clades are disclosed in Table 7.

[00116] Table 7:

[00117] Table 8 discloses PETcan group clades and controls, their respective sequence identifiers used herein, their respective PET hydrolase activity levels, their respective amino acid sequences, their respective nucleotide sequences, the expression conditions of the studied enzymes as well as additional information regarding yield of the expressed PET hydrolases.

[00118] Table 8:

[00119] In an embodiment, the sequences disclosed herein are as follows:

>PETcan_101

CLYLNIWTPDLNGSLPVMVFIHGGGNQQGSTAQIAGGARI YEGKNLARRGQVVVVTLQYR LGALGYLVHPGLEAESTHGKAGNYGALDQLAALLWIKENIRAFGGDPELVTLFGESAGAV NIGNLLVMPAAKGLFHRAILQSGSPRLKAYSAARNEGIAFAQKLGAAGTPEQQVAHLRTL

PVDSLVKGDSNPISGGSMAQGSWQPVLDGYWFPQAPLDAMRSGEHHRVPLIVGSSSD EMS

LYVPSW TPLMLQTFVQTTIPAPYRQQVLALYPPGTTNEQARASYVALVGDPLESTCRHA

S

>PETcan_102

QSPAQSSAPTVELDSGAIAGSTADGW SFKGIPYAAPPVGNLRWRAPQPVASWTGVRAAT EYGYDCIQLPLEGDAAASGGEMSEDCLVLNVWRPAEIAPGERLPVLVWIHGGGFLNGSAA APIYDGTAFAQQGLVW SFNYRLGRLGFFAHPALTAANEGPLGNYGLMDQIAALEWVQRN IAAFGGDPARITLMGQSAGGISVMYHLTAPESQGLFHQAAVLSGGGRTYLLGLRNLREST DALPSAEQSGLAFGRRFGIRGRGRAALRSLRSLSAEEVNGDLSMAALVEKPADYAG

>PETcan_103

QGITVRTPLGPALGQMEKGAIAFYGLPYAQASRFEAPRPVAAWPPGVGRERVACPQT PGT TARLGGYIPPQREDCLVANLFLPLEPPPPEGFPVMVYLHGGGFTSGSAAEPI YGGHRMAQ EGW W SVNYRLGPLGFLALPALEKENPKAVGNYGLLDLVEALRFVQRHIRYFGGNPQNV TLFGESAGGMLVCTLLATPEAQGLFHKAILQSGGCHQVRPLERDFPFGEQWAKNLGCSPE DLACLRNLPLSRLFPTMEPKAPPDITASALGFPNSPFKPHLGALLPESPTEALRKGQARD IPLLVGANLEELAFPGLAWLLGPRRWEEFGQRLAAQGLTQQQREALKGVYQKRFSEPRAA WGQAQTDLLLLCPSLKAARLQASFAPTYAYLFTFRVPGFEGLGAFHGLELAPLFGNFEEM PFLPLFLSAEAREKAEALGKRMRRYWVSFAREGEPRSWPHWPTYEEGYLLRLDEPPGLIP DLYEERCGVLEALGLL

>PETcan_104

VFLGWQGSPVQLPAHAGEQAPSPVEPLNLPDPARPGAYPVALLTYGSGQDKLRQEYA QGA ALLTPSVDASLLLEGWSSLRTAYWGFSPAELPLNGRVWYPQAEGRFPLVIAVHGNHPMEE TSESGYDYLGELLASRGFIFVAVDENFLNISAWGDVLFFNRLEGESDARGWLVLEHLRLW QSWNEQPGNPFYQRVDLNQIALLGHSRGGEAIVIAAAFNRLSHYPDNAALSFDYGFKIRS LIALAPADGQYQPGGLPTPLQDVNYLLLHGSHDMDVLTMMGAAPFERLTFSGQDDFFKSA VYIYGANHGQFNSVWGNKDIAEPIPRLYNLRQLLPQTEQQRIAQVLISAFLEDTLRGERA YRPLFQ

>PETcan_201

LVRIGEQEDAVAALEFLLQRDEIDTERIALAGYSFGAFVGLAALNGNENIKALVGVS PPL TLFEFSYLKNCTKPKLLIIGDMDQFTPLKVFKE FYEKIPEPKNKRIIEGADHFYWGYENE VGQW ADFLKKTFKNIP

>PETcan_202

VDITGNGMAATAPTDERIVDKPLPQPQIRSGNVRAMPAARKLAQEHGIDLSTLTGSG PGG

VIVKEDVERAITARAVPVSPLQRVNFYSAGYRLDGLLYTPRHLPAGERRPGW LLVGYTY

LKTMVMPDIAKVLNAAGYVALVFDYRGFGESEGPRGRLIPLEQVADARAALTFLAEQ SMV

DPDRLAVIGISLGGAHAITTAALDQRVRAW ALEPPGHGARWLRSLRRHWEWRQFLSRLA

EDRRQRVLSGGSTMVDPLEIVLPDPESQAFLDQVAAEFPQMKVTLPLESAEALIEYV SED LAGRIAPRPLLIIHSDADQLVPVAEAQAIAERAGSSAQLEI IPGMSHFNWVMPGSPGFTR VTDSIVKFLRNTLPVSADN

>PETcan_203

VPLILNVHGGPAGVFQQTFTGGRS IYPIATFAARGYAVLRPNPRGSSGYGVEFRRANLKD WGGMDYQDLMAGVDKVIEMGVADSSRLGVMGWSYGGEMTSWIVTQTNRFKAASAGAPVTN LTSFTTTADIPAFIPDYFGGQFWDSPEVYRAHSPISFVKSVTTPTMIQHGTADMRVPISQ GFEFYNALKARGIPTRM

>PETcan_204

VPSAGVGLSGVLHLPAGVSRPVLFLHGFTGNKTESGRLYTDMARVLCSAGYAALRFD FRG

HGDSPLPFEEFRISLAVEDARNAAGFLKNVPEVDGTRFGW GLSMGGGVAVSLAAGREDV

GALVLLSPALDWPELFQRARGFFRAEEGYVYWGPHRMRDVYAMETMNFSVMGLAEEI QAP

TLIIHSVDDMW PISQAKRFYEKLKVEKKFIEIEHGGHVFDDYNVRRRIEQEVLDWVKRH

L

>PETcan_205

RVLCSAGYAVLRFDYRCHGDSPLPFEEFRISMAVEDAENAVKYVKSLERVDGSSFAV IGL

SMGGGVAVKLAAGRDDVAALVLLSPALDWPELTGRVPFKVEEGYVYMGPFRMRAENA MEN

ARFTVMDIAEQVKAPTLIVHATDDEW PISQAKRFYEKLRVEKRFLEVKSGHVFNDYHVR

RNLEGEILSWVKSHL

>PETcan_206

VPSAGVGLSGVLHLPAGVSRPVLFLHGFTGNKTESGRLYTDMARVLCSAGYAALRFD FRC HGDSPLPFEEFRISLAVEDARNAAGFLKNVPEVDGTKFGW GLSMGGGVAVSLAAGREDV GALVLLSPALDWPELFQRARGFFRAEEGYVYW GPNRMRDVYAMETMNFSVMGLAEEIKAP TLIIHSVDDVVVPISQAKRFYEKLKVEKKFIEIEQGGHVFEDYNVRRRIEREVLDWVKRH L

>PETcan_207

GFTGNKAEAGRLYTDMARVLCAAGYAALRFDFRCHGDSPLPFEEFRISYAVEDARNA ASF LKIQPSVDGSRFAVIGLSMGGGVAVSLAAGRDDVAALVLLSPALDWPELAARIPQPKVEG GYVYMGPNRMKVECVTETMKFTVMDLAERVKAPTLIIHAADDMW PISQSKRFYEKLKVE KKEMEIERSGHVFDDYNVRRRVEAEVLDW IKKHL

>PETcan_208

DGCIEDLRFIEFDGFRLASTIHRPAIATSSAVLMLHGFTGNRIEVNRLYVDIARRLC SEG MW LRLDYRGHGESSLPFEEFKIGYALEDGGKALEVLQKLFNPVRIGW GFSLGGYVAIH LASRYRGAISSLALLAPGIKMDELATELARKL SLEGDFYIVRALKIRREGIESMIRSPSA MIYADTVDIPVLIIHAKNDSAVPYIHS IEFYEKIRSQKKRIVILDEGGHTFELHHIRDRV IEEW AWFRETLLYT

>PETcan_209

VDITGNGMAATAPTDERIVDKPLPQPQIRSGNVRAMPAARKLAQEHGIDLSTLTGSG PGG

VIVKEDVERAITARAVPVSPLQRVNFYSAGYRLDGLLYTPRHLPAGERRPGW LLVGYTY

LKTMVMPDIAKVLNAAGYVALVFDYRGFGESEGPRGRLIPLEQVADARAALTFLAEQ SMV DPDRLAVIGISLGGAHAITTAALDQRVRAW ALEPPGHGARWLRSLRRHWEWRQFLSRLA EDRRQRVLSGGSTMVDPLEIVLPDPESQAFLDQVAAEFPQMKVTLPLESAEALIEYVSED LAGRIAPRPLLIIHGDADQLVPVAEAQAIAERAGS SAQLEIIPG

>PETcan_210

LIRPVAFRNMNQQIIGILHTPDNIKPGEKTPGILMLHGFTGNKTEAHRLFVHVARSL SEY GFIVLRFDFRGSGDSDGEFEDMTLPGEVSDAERALTFLLRRRNIDRDRVGVIGLSMGGRV AAILASKDKRVKFAVLYSPALGPLRDRSLSEMSREKIERLNSGEAVEFFAEGWYIKKTFF ETVDYIVPLDIMDSIRVPVLIVHGDRDPI IPVEEAIRAYEKIKGVNKKNELYIVRGGDHT FSKKEHTQEVIKKTLDWIRALSVSEGS IVLFRLLE

>PETcan_211

LIRPVTFRNMNQQIIGILHTPDNIRLNEKVPGILMFHGFTGNKTEAHRLFVHVARSL SEH GFIVLRFDFRGSGDSDGEFEDMTLPGEVSDAERALTFLLRQRNVDKNRIGVIGLSMGGRV AAILASKDRRVKFAVLYSPALGPLRDRSLSEMSKEKIERLNSGEAVEFFAEGWYIKKAFF ETVDYIVPLDIMDSIKVPVLIVHGDKDPL IPVGEAIRAYEKIKGVNEKNELYIVRGGDHT FSKKEHTLEVIKKTLDWIRSLGI

>PETcan_212

LTITAIIYLLATIIAAILLW YIISSSASKKLATPPRKTGSWSPRDLGFEYEKVEVKTSD GLTLRGWLIPRGSEKTVIVIHGYTSCKWDEWYMKPVINILARHDFNW AFDMRAHGESDG EKTTLGYREVDDIGAIINYLKERGLASRLGI IGYSMGGAITLMSLARYEELKAGVADSPY IDIRASGKRWINRVGAPLRYILLASYPLIMRLTASRTGASPEKLVMYQYAKS ITKPLLII GGQQDDLVAIDEVRKFYEEVKKVNSNVELWETTSKHVSAIQDYPREYEERIVGFFNRWL

>PETcan_213

SELELNEVFKLIKLVSEMNKGQQI IGVLHKPDKIKPHEKVPGIVMFHGFTGNKSEAHRLF VHIARGLSSRGEMVLRFDFRGSGDSDGDFEDMTLPEEVSDAERAITFVLRQRNVDREKIG VIGLSIGGRVAAILASRDERIKFAVLYSPALGRLKERFLS LMGEEALRRLNCGEPIEVSS GWYLKKAFFETVDYIVPVEVMSNIRVPVL IIHGDRDEIIPVEESMKAYERIKGLNEKNEL YIVKGGDHTFSKREHTLEVLNKTIEWLSSLNLM

>PETcan_214

ARAAPISPLQRVNFYSAGYRLDGLLYTPRHLPAGERRPGW LLVGYTYLKTMVMPDIAKV

LNAAGYVALVFDYRGFGESEGPRGRLIPLEQVADARAALTFLAEQSMVDPDRLAVIG ISL

GGAHAITTAALDQRVRAW AIEPPGHGAHWLRSLRRHWEWSQFLSRLTEDRRQRVLSGVS

STVDPLEIVLPDPESQAFLDQVAAEFPQMKVTLPLESAEALIEYVPEDLAGRIAPRP LL

>PETcan_215

ATVLVIPKLGLTMTEGRVGRWLKQLGEPVQAGEPVLEVETEKLTVEVEAPASGILAY ILA EEGW LPVTAPVAVIAEPGEAVDLASLLPATSGAAATPVMAASSTMQEQARAQGPTPTGE IRATPAARKLARDHGIDLARVRGTGPGGRITAEDVERYLASQGTAWPRGEPVRFWSDGLA LAGELFLPPSTDTAVPGW LCTGIQGLKELGMPLLAQALADAGYAALIFDYRGFGASEGP RGRLLPQERIRDARAALTFLETHPLIDRTRLAILGLSLGGAHALSLAAIDDRVQACIAIA PLTNGRRWLRSLRAEWQWRV

>PETcan_301

QPYPVGTRTITYQDPVRNNRNIQTYLYYPATAAGANQPVAGGQFPW W GHGFTMNYAPY AFWGNALAESGYIVAIPNTETGFSPSHSAFAADMAFLVAKLYTENTNSSSPFYQHVQYNS CIIGHSMGGGCTYLAAQNNADVSATVT FAAAETNPSATAAAANVNCPSLVFSGSADCITP PAQHQVPMYNALPDCKAYGGSSRVDLQACK

>PETcan_302

VRRPNNTTFTAQLYYPATATGDNAPYDGSGAPYPAVSFGHGFLQPPERYRS ILEHLASWG YLTIATESGQELFPNHRAYAEDMRYCLTYLEEQNADPASWLFGQVATAQFGISGHSMGGG ASILAAAADARIKAVANLAAAETNPSAIQAS PNITVPHSLISGSADTITPLSSNGLRMYT AGLRAEAAARHSRRLGLRVPKTPS IFGCDSGSLPPRHA

>PETcan_303

IWYPAVRVRGQPQRTTYQYGPLIGEGRAYRDAPADLRGAPYPLLI FSHGLGGARIQSVFY AEHLASHGFW MAADHTGSTFADLLRGRADSILESFARRPLEILRQIEYAAALNADDDTL RGAIDAETVGVTGHSFGGYTALAAAGAQLNINAIREGCESGKLPEQQCLFVRSEEI IWRA RGLSAAPEGLYPPTTDPRIKAW ALAPSSAPTFGEAGAAALRVPLMIIVGSKDQATPPER DSYPIYQSVSSAQKALW FENAGHYIFVEQCVPALIALGRFEQCSDLVWDMQRAHDLINH FATAFFLHALKGDPAAKAALDPTAVQFIGITYRRDGAW

>PETcan_304

IVLLLNFDVEYKRIKFNGDYIDIYKPKAEGNYPFVI FSGGMNSPSSRYESFGKFLASNGF ITIIPDYKGWLFLLLIPLKILRI IDNLNKIDSSIKNEGCLGGHSLGAYFSMIVSYKRSSV KCLFLFSPPALFLNYSKIKVPVLIFAGTNDEITKFEANQKI IYEHLKTQKKLVLIEGGNH NGYMDRWDFVEALTDGYLGIEHKKQLEIVRDSVLKFLKEILLK

>PETcan_305

QVIQQTVTLQKTQLRLTKEGFVTNYRFPVDFYYPDSPESFPVILISHGFGSVRENFR TLA QHLASHGFLVAVPQHIGSDLQYRQELIKGTLSSALSPVEFLARPTDLSTI IDYLQATQNT GSWQKRANLQQIGVIGDSLGGTTALTIGGAPLDIPRLQTKCTSDNVIVNVALILQCQASF LPPSEYNLADSRVKAVIATHPLISGI FSPDSLAKIQIPVMITAGNFDIITP

>PETcan_306

KVKSKPLTLYNVSGDRITADVHFVESFLPAPW IYSHGFLGFKDWGFIPYVAERFAENGF VFVRFNFSHNGIGENPNKITEFDKLAKNTISKQIEDLTAVIEYVFSDEFGVLNDGQLFLL GHSGGGGISIIKAVEDERVRALALWAS ISTFRRYSKHQIEELEKNGYIFVRVPDSVIQVK IEKIVYDDFVENSERYDIIKAISKLKIPILIVHGTADAIVPLAEAEKLRNSNPEYTKLVL ISGANHLFNVKHPMEHSTDQLDKAIDETVLFFKKI IENKKAD

>PETcan_307

QTVTSMLKDLDAVITQVSEKFPQIDNKRVCLIGHSQGAYVSFLHATKDERIKCLVSW MGR LSDLKEFWSKLWFDEIERKGYIYEWDYKITKKYVRDSLKYNLSKAAWRIKVPTLLI YGEL DDIVPPSEGMKFYRNIKSPKKIVIVKDLNHTFSGEKAKKSVIRITLKWLSKWLKRLD

>PETcan_308

LKIIEDFASLDTGVKVFYRCILPESFKELAIVSHGFTSHSGFYIHIGKELASYGYGV CIH DQRGHGRTAQNLERGYVDSFNDFLVDLET FTMHVQRVFGGERTVLIGHSMGGLIVLLYAG KYGRVGDAW AVAPAVLIPETRRFSTLIFATIASILFPRKRIELPFTEQQIEEGMKRMDR ELLEAMGKDELVLRDTTIKLLVEIWKASRE FWRYVERIQIPTLLIHGEKDNIIPIEASRR TYSRLKTLKKELIVYPECGHSPLHEIGWRERIKNMVEWIRNNI

>PETcan_401

ANPPGGDPDPGCQTDCNYQRGPDPTDAYLEAASGPYTVSTIRVSSLVPGFGGGTIHY PTN AGGGKMAGIW IPGYLSFESSIEWWGPRLASHGFW MTIDTNTIYDQPSQRRDQIEAALQ YLVNQSNSSSSPISGMVDSSRLAAVGWSMGGGGTLQLAADGG IKAAIALAPWNSSINDFN RIQVPTLIFACQLDAIAPVALHASPFYNRIPNTTPKAFFEMTGGDHWCANGGNI YSALLG KYGVSWMKLHLDQDTRYAPFLCGPNHAAQTLISEYRGNCPY

>PETcan_402

AFAITPSPTPTPDPTPNPSPDPGSCSGAECYIRGPNPTVRALEADDGPYSVRTTNVS SFV SGFGGGTIHYPVGTEGKMGAIAVIPGYVSYESSIRWWGSRLASWGFW ITIDTNTIYDQP DSRANQLSAALDYVIAQSNSRNSS ISGMVDSNRLGVIGWSMGGGGSLKLSTQRTLKAAIP QAPWYSGFNSFNRITTPTLIIACELDW APVGQHASPFYNRIPSSTAKAFLEINGGDHFC ANSGYPNEDILGKYGVSWMKRFIDGDRRYDQFLCGPNHESDRS ISDYRETCNY

>PETcan_403

TTPTPTPEPEPEPPGGCGDCYQRGPDPTVAALEADRGPYSVRTINVSSWVSGFGGGT IHY PVGTQGTMGAIAVIPGYVSYENS IEWWGGRLASWGFW ITIDTNSIYDQPDSRANQLSAA LDYVIAQSNSSRSAIQGMVDPNRLGAIGWSMGGGGTLKLSTDRYLKAAIPQAPWYSGFNP FDEITTPTLIIACQLDAVAPVAQHASP FYNEIPNSTAKAFLEIRNGDHFCANSGYPDEDI LGKYGVAWMKRFIDDDRRYDAFLCGPNHEAEWDISEYRDTCNY

>PETcan_404

ADNPYQRGPDPTERSVTARRGPFAIDEISVNGGIGAGFNRGTI FYPTDRSQGTFGAVAVI PGFLSPESLVRWFGPRLASQGFW MTLTTNGLTDTPESRSEQLLAALDYLTTRSQVRDRI DPSRLAVMGHSMGGGGSLAAAAKRPTLRAAIPLAPWSLTKNWSDLTVPTLI IGAENDNVA PVAGHSERFYDSMTNVPEKAYLEMAGGNHVDPTAESDLVAKFTISWLKRFVDDDTRYDQF LCPAPRPNRQISEYRDTCPHS

>PETcan_405

QADTDTTAVAPAAANPYERGPAPTEASVTAARGPFAIAQVNVPSGSGAGFNDGTI YYPTD TSQGTFGAVAVIPGFISPQAVIQWFGPRLASQGFW FTLDSNGLADLPDARGRQLLAALD YLTTQSTVRTRIDPNRLAVMGHSMGGGGTLLAAENRPTLKAAIPLAPWEPDTSWEGVKVP TMIIGGESDW APVSSMAIPDYNSLSSAPEKAYLELRSGDHLAPASESPTVAEYALSWLK RFVDDDTRYDQFLCPGPTPDTDISQYLDTCPNGS

>PETcan_406

RFRVAASLPAEYLAVDNW LEGTAQPPAPGGSGYQKGPEPTAALLEAGTGPFATASVTLS

RSAASGFGGGTIHYPQGVAGPFAAVAW PGYLAAESTIAWWGPRLASHGFW ITMATNNT

LDLPASRSAQLTAALNQLKTLSATPGHAVFGLVDPNRLGW GWSYGGGGTLLNAQANPQL

KAAMALAPKTLLQGDFTGTTVPTLW GCQADTTAAPAFWAIPFYNKVSASTGKAYLEVRG

GSHFCVTSSTSDADKKALGKYGVAWLKRFMDEDTRYAPFLCGAPRQADVAGNAAISD YRD NCPY

>PETcan_407

ADNPYQRGPDPTRDSVAASRGTFATASTTVGSGNGFGAGFI YYPTDTSQGTFGAVAIVPG YTATWAAEGAWMGHWLASFGFW IGIDTINRNDWDTARGTQLLAALDYLTQRSTVRDRVD ASRLAVMGHSMGGGGAMYAALQRPSLKAAVGLAP FSPSQNLNGMRVPTMLLAGQHDTTTT PASITSLYNGIPAATEKAYLELSGAGHGFPTSNNSVMMRKVIPWLKI FVDSDVRYTQFLC PLMDNTGIRSYQSTCPLLPGTPTPPNRYEAETSPAVCTGTIASNHTGYSGTGFCDGNNAT NAYAQFTVNASAAGSMTLRVRFANGTTTARPASLIVNGSTVQTPSFEGTGAWTTWATKTL TVTLNAGNNTIRFNPTTANGLPNLDYIEIAAP

>PETcan_408

KPITFTLLFIFICSIFYSQCEEVNLES ISNSGPYAVGSLIEGVDPIRNGPDYDGATIYYP INGTPPYSGIAIIPGYCGVESDIQDWGPFYASHGIVAITLGTNDPCADWPSARSTALLDA IVTVKEENSRQDSPLKDKIDVNSFAVSGWSMGGGGSQLAAS IDPSLKAVIGLCPWLDLNG FEPSDLIHDVPVLIFTGENDDIANSAEYGYMHYQGTPSTTDKLYFEIANGGHGAANSPEL EGGEVGVYALSWLKTYLDNDPCYCEFLVNTPSNSSDYETNIECLNAGIDEGENLIHFI YP NPIQDYIEFSNDGMERTYELKSSNGKS IKSGIVSHGYNKILFEKQNTEIYFLIIAGKSYK LISIK

>PETcan_409

GDCPATAICRSESPGAYSGNGPYGSRSYTLSRFQTPGGATVYYPANAEPPYAGMVFT PPY TGTQAMFAAWGPFFASHGFVLVTMDTSTTLDSVDQRAAQQKEVLNALKSENTRSGSP LRG KLDTARLGAVGWSMGGGATWINSAEYSGLKTAMSLAGHNLTAVDIDSKGYNTRVPTL LFN GAQDLTYLGGLGQSDGVYNNIPAGIPKVFYEVSSAGHFDWGSPTAANRSVASLALAF HKA YLDGDTRWLQYITRPSSDVTTWRTANIR

>PETcan_410

SQVPPTDPQDAPLGECPATALCRSEAPGSYSGNGPYGYRSYSLSRLQTPGGATVYYP ANA EPPYSGLVFTPPYTGVQFMYAAWGPFFASHGIVLVTMDTTTTLDTVDQRARQQKTVL DVL KGENNRAASPLRGKLDTSRIGAVGWSMGGGATWINAAEYAGLKTAMSLAGHNLSAID PNA RGYNTRVPTLLFNGALDATYLGGLGQSDGVYNAIPAGIPKVFYEVASAGHFDWGSPT AAN RDVAGIALAFHKAFLDGDTRWVDYIRRPSRDVATWRTAYLPD

>PETcan_411

ADCPAGAICRYDEQPGGYTGDGPYRVGDYS ISTFQAAGGATVYYPTNATPPFAALVFCPP YTGVQYMYRDWGPFFASHGIVMVTMDSETTLDTVDQRADQQREVLDFLKRENTNSRSPLY GKLATDRFGVTGWSMGGGATWINSADYSGLKTAMSLAGHNLTALDPDSRGYSTRIPTLIM NGALDTTYLGGLGQSDGVYNAIPYGVPKVFYEVSSAGHFAWGSPTSASDDVAKVALAFQK TFLEGDTRWAEYIRRPFWGASEWETANLP

>PETcan_412

SQVPPTPPTDDPMGDCPSTAICRGEAPGSYSGNGPYGSRSYTLSRFQTPGGATVYYP SNA EPPYSGLVFTPPYTGTQAMFRAWGPFFASHGIVLVTMDTSTTVDTVDQRASQQKRVLDVL KQENTRSGSPLRGKLDTSRLGAVGWSMGGGATWINSAEYNGLKTAMSLAGHNMTAIDLDS KGGNTRVPTLLFNGALDLTMLGGLGQS IGVYNAIPRGIPKVIYEVASAGHFDWGSPTAAN RSVAGIALAFHKTFLDGDTRWVSYIKRPSSDVATWRTENLPQ

>PETcan_413

NKEKSSFDQTAKITTRSKSIFKT IFTYLLVLAFITTIFPMNAFANSPAIIRNEEAPGKYA GNGPFSYNSYRLPLLSVYGTGGATVYYPTSGTAPYSGLVYCPPYTAKQSALAAWGPFFAS HGIILVTFDTLTPLDPVSLRALQQRTVLNALKTENSRLNSPLYQKVATDRIGAMGWSMGG GATWINSAEYSGLKTAMTIAGHNLSSTNLNSKGYNTKCPTLIMNGAMDTTGLGGLGQSNG VYKNIPANVPKVLYEVASAGHLNWTSPISASNDVAAIALAFQKTYLDGDSRWLAFITRPN SNVSIWETSNLMNP

>PETcan_501

SNPYQRGPNPTRSALTADGPFSVATYTVSRLSVSGFGGGVI YYPTGTSLTFGGIAMSPGY TADASSLAWLGRRLASHGFW LVINTNSRFDYPDSRASQLSAALNYLRTSSPSAVRARLD ANRLAVAGHSMGGGGTLRIAEQNPSLKAAVPLTPWHTDKTFNTSVPVLIVGAEADTVAPV SQHAIPFYQNLPSTTPKVYVELDNASHFAPNSNNAAISVYTISWMKLWVDNDTRYRQFLC NVNDPALSDFRTNNRHCQ

>PETcan_502

QTSPPTSASLNATAGPLSVSTSSVSSWAARGFGGGTI YYPNATGRYGW AISPGYTARQS SIAWLGRRLATHGFW ITIDTNSTLDQPPSRATQLMAALNHW NNANATVRSRVDASKLA VAGHSMGGGGSLIAAENNPSLKAAYPLTPWSVSKNYSSVRVPTMI IGADGDSIASVSTHS RLFYNSLSSNVSKAYGELNNASHFTPNYTNTPIGRYAVTWMKRFVDNDTRYSPFLCGAPH DSYATRTVFDRYEDNCAY

>PETcan_503

ESPYERGPDPTSASVLDNGTFSLSSTSVSSLVTGFGGGTI YYPTSTTQGTFGGW LAPGY TASSSSYSSVARRVASHGFW FAIDTNSRYDQPDSRGSQILAAVSYLKNSASSTVASRLD ETRIAVSGHSMGGGGTLAAANQDSS IKAAVALQPWHTDKTWPGIQIPTMIIGAENDSVAP VASHSIPFYTSMTGAREKAYGEINNGDHFIANTDDDWQGRLFVTWLKRYVDDDTRYSQFL CPAPSSIYLSDYRNTCPD

>PETcan_504

QAQYQKGPDPTASALERNGPFAIRSTSVSRTSVSGFGGGRLYYPTASGTYGAIAVSP GFT GTSSTMTFWGERLASHGFW LVIDTITLYDQPDSRARQLKAALDYLATQNGRSSSPIYRK VDTSRRAVAGHSMGGGGSLLAARDNPSYKAAIPMAPWNTSSTAFRTVSVPTMI FGCQDDS IAPVFSHAIPFYNAIPNSTRKNYVE IRNDDHFCVMNGGGHDATLGKLGISWMKRFVDNDT RYSPFVCGAEYNRW SSYEVSRSYNNCPY

>PETcan_505

VEIGPAPTSTSLNSDGSFAVSSASVSSSACGSGCAGGTVYYPNTAGSYGVIAVCPGF TNT SSAISWFARRMATHGFVTIAMNTNSRYDFPASRATQLRAVLNYLVNSSSSTIRSRIRSAD RGVSGYSMGGGGTLLASRDDSTLKTGVPMAPYNSGTISGVNVPQMI IGGSNDSIAPVSSM ARPFYNNIPSTVKKALAVLNGASHLTFTSYDERAARYGVAFAKRFADGDTRYTPFLCGAE HTAYATSSRFTEYSSNCPY

>PETcan_601

AANPYQRGPDPTESLLRAARGPFAVSEQSVSRLSVSGFGGGRI YYPTTTSQGTFGAIAIS PGFTASWSSLAWLGPRLASHGFW IGIETNTRLDQPDSRGRQLLAALDYLTQRSSVRNRV DASRLAVAGHSMGGGGTLEAAKSRTSLKAAIPIAPWNLDKTWPEVRTPTLI IGGELDSIA PVATHSIPFYNSLTNAREKAYLELNNASHFFPQFSNDTMAKEMISWMKRFIDDDTRYDQF LCPPPRAIGDISDYRDTCPHT

>PETcan_602

AANPYQRGPNPTEASITAARGPFNTAEITVSRLSVSGFGGGKI YYPTTTSEGTFGAIAIS PGFTAYWSSLEWLGHRLASQGFW IGIETNTTLDQPDQRGQQLLAALDYLTQRSAVRDRV DASRLAVAGHSMGGGGSLEAAKARTSLKAAIPLAPWNLDKTWPEVRTPTLI IGGELDAVA PVATHSIPFYNSLSNAPEKAYLELDNASHFFPNITNTQMAKYMIAWMKRFIDDDTRYTQF LCPPPSTGLLSDFSDARFTCPM

>PETcan_603

AQNPYERGPAPTEQSVRAERGPFAISQVSVSRLAVSGFGGGTI YYPTSTAEGTFGAVAIA PGYTASQSSMAWYGPRLASQGFVIFTIDTITTGDQPDSRGRQLLAALDYLTQRSSVRSRV DASRLGVMGHSMGGGGSLEATVSRPSLQAAIPLTPWNLDKTWPEVRVPTLI IGAENDSIA PVSSHSEPFYASLPSTLDKAYLELNGASHFAPNVSDTTIARFS ISWLKRFIDNDTRYEQF LCPPPRVSTEISEYRDTCPHSG

>PETcan_604

ASPYERGPAPTSAILEASRGPFATSS INVSSLSVTGFGGGVIYYPTSTAEGTFGAVAISP GYTASWSSLSWLGPRIASHGFW IGIETNTRLDQPASRGRQLLAALDYLTERSSVRGRID SSRLAVAGHSMGGGGSLEAAAARPSLQAAVPLAPWNLDKTWSDVRVPTLI IGGETDSVAP VATHSIPFYNSIPASSEKAYLELDGASHFFPQTTNTPTAKQMVAWLKRFVDDDTRYEQFL CPGPSGSAIQEYRNTCPSA

>PETcan_605

AADNPYERGPAPTESSIEALRGPYAVSQTSVSRLAATGFGGGTI YYPTSTADGTFGAVAI SPGFTALESSISWLGPRLASQGFW FTIDTLTTVDQPGSRGDQLLAALDYLTQRSSVRGR IDSSRLGVMGHSMGGGGSLEAAKTRPSLKAAIPMTPWNLDKTWPELRTPTLI FGADADTI APVATHAKPFYNTLPSSLDRTYIELNNATHFAPNTSNTTIAKYS ISWLKRFIDKDTRYEQ FLCPLPQRSLTIDEAQGNCPHTS

>PETcan_606

SNPYERGPAPTESSVTAVRGYFDTDTDTVSSLVSGFGGGTIYYPTDTSEGTFGGW IAPG YTASQSSMAWMGHRIASQGFW FTIDTITRYDQPDSRGRQIEAALDYLVEDSDVADRVDG NRLAVMGHSMGGGGTLAAAENRPELRAAIPLTPWHLQKNWSDVEVPTMI IGAENDTVASV RTHSIPFYESLDEDLERAYLELDGASHFAPNISNTVIAKYS ISWLKRFVDEDERYEQFLC PPPDTGLFSDFSDYRDSCPHTT

>PETcan_607

ADNPYERGPAPTTASIEAARGPYAVSQTTVSSLAVTGFGGGTI YYPTSTGDGTFGAIAVS PGYTATQSSIAWLGPRLASQGFW FTIDTLTTLDQPDSRGRQLLAALDHLTQVSSVRTRV DGSRLGVMGHSMGGGGSLEAAKARPSLQAAIPLTPWNLDKSWPEVGTPTLIVGADGDTVA PVASHAEPFYSSLPSSLDRAYLELNNATHFTPNSSNTTIAKYGISWLKRFVDNDTRYEQF LCPLPQPSTTIDEYRGNCPHTS

>PETcan_608

ADNPYARGPEPTTASVEAARGPFAVAQTSVSRYAVSGFGGGTVYYPTTTTAGTFGAV AVS PGYTARQSSIAWLGPRLASQGFW ITIDTLSTYDQPASRGDQLRAALAYLTQRSSVRARI DPTRLAW GHSMGGGGALEAAKDDPSLQAAVPLTGWNLDKTWPEVRTPTLVIGAEDDGVA PVRSHSEPFYASLPATLDKAYLELRGAGHLAPTVSNTTIATYTLSWLKRFVDDDLRYDRF LCPAPATSTAIAEYRSTCPY

>PETcan_609

ADNPYQRGPAPTNASIEATRGPYAVSSTSVSSWLVSGFDGGTI YYPTTTADGTFGAVAIS PGYTAYESSIAWFGERLASQGFW FTFDTNTTVDQPAQRGDQLLAALDYLTQRSSVRSRV DASRLGVMGHSMGGGGSLEASKDRPSLKAAIPMTPWNTDKTWSEIRTPTLI FGAENDSVA PVASHSEPFYSTIPSTTNKMYIELNGASHFAPNSSNTTIAKYS ISWLKRFLDNDTRYDQF LCPLPTSALYIEESRGTCPLR

>PETcan_610

VEATDVHGPDPTEETITAPRGPFDVEQESVSRFEVEGFGGGTI YYPTDTTDGLFSAVSIS PGYTGTQESMAWYGPRLASHGFW FTIDTITTTDQPDSRARQLQASLDHLVDDSSVRDRV DPARLGVMGHSMGGGGSLKAALDNPALQAAIPLTPWHTTKDFSGVRTPTLI IGAQNDTVA PVSQHAEPFYESLPDDPGKAYLELAGAGHLAPNTPDTTIAKYSLAWLKRFLDDDTRYDQF LCPPPQDDPEIAEHRSTCPY

>PETcan_611

AEPADVHGPDPTEESITAPRGPFEVDEESVSRLSVSGFGGGT IYYPTDTTDGLFSAVSIS PGFTGTQETMAWYGPRLASQGFW FTIDTITTTDQPDSRARQLQASLDYLVNDSDVKDII DPARLGVMGHSMGGGGSLKAALDNPALKAAIPLTPWHTTKDFSGVQTPTL IIGAQNDTVA PVSQHAKPFYESLPDDPGKAYLELAGASHLAPNTDNTTIAKFS IAWLKRFLDDDTRYDQF LCPPPENDDSISDYQSTCPY

>PETcan_612

PGFLGSSSNYAWMGPRLASQGFIVFLINTNTRLDTPPQRGDQLLAALDWLVASSPSA VRT RLDARRLAVAGHSMGGGGALEASLDRPSLQASLPLQPWHTPASFSGVQVPTMI IGAEADT TAPVASHAEPFYESLTSASDRAYLELNGADHRVSTTSSTTQAKEMIAWLKRFVDN

>PETcan_701

ANPYERGPNPTDALLEARSGPFSVSEENVSRLSASGFGGGTI YYPRENNTYGAVAISPGY TGTEASIAWLGKRIASHGFW ITIDTITTLDQPDSRAEQLNAALNHMINRASSTVRSRID SSRLAVMGHSMGGGGSLRLASQRPDLKAAIPLTPWHLNKNWSSVRVPTLIIGADLDTIAP VLTHARPFYNSLPTSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPF

>PETcan_702

AANPYERGPNPTDALLEARSGPFSVSEENVSRLSASGFGGGTI YYPRESNTYGAVAISPG YTGTEASIAWLGERIASHGFW ITIDTITTLDQPDSRAEQLNAALNHMINRASSTVRSRI DSSRLAVMGHSMGGGGTLRLASQRPDLKAAIPLTPWHLNKNWSSVTVPTLI IGADLDTIA PVATHAKPFYNSLPSSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKWFVDNDTRYTQF LCPGPRDGLFGEVEEYRSTCPF

>PETcan_703

ANPYERGPNPTDALLEARSGPFSVSEENVSRLSASGFGGGTI YYPRENNTYGAVAISPGY TGTEASIAWLGERIASHGFW ITIDTITTLDQPDSRAEQLNAALNHMINRASSTVRSRID SSRLAVMGHSMGGGGSLRLASQRPDLKAAIPLTPWHLNKNWSSVRVPTLI IGADLDTIAP VLTHARPFYNSLPTSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPF

>PETcan_704

ANPYERGPNPTDALLEARSGPFSVSEENVSRLGASGFGGGT IYYPRENNTYGAVAISPGY TGTQASVAWLGKRIASHGFW ITIDTITTLDQPDSRARQLNAALDYMINDASSAVRSRID SSRLAVMGHSMGGGGSLRLASQRPDLKAAIPLTPWHLNKNWSSVRVPTLI IGADLDTIAP VLTHARPFYNSLPTSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPF

>PETcan_705

ANPYERGPNPTDALLEARSGPFSVSEERASRFGADGFGGGT IYYPRENNTYGAVAISPGY TGTQASVAWLGERIASHGFW ITIDTNTTLDQPDSRARQLNAALDYMINDASSAVRSRID SSRLAVMGHSMGGGGTLRLASQRPDLKAAIPLTPWHLNKNWSSVRVPTL IIGADLDTIAP VLTHARPFYNSLPTSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPF

>PETcan_706

ANPYERGPNPTDALLEARSGPFSVSEERASRFGADGFGGGT IYYPRENNTYGAVAISPGY TGTQASVAWLGKRIASHGFW ITIDTNTTLDQPDSRARQLNAALDYMINDASSAVRSRID SSRLAVMGHSMGGGGSLRLASQRPDLKAAIPLTPWHLNKNWSSVRVPTL IIGADLDTIAP VLTHARPFYNSLPTSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPF

>PETcan_707

ANPYERGPNPTDALLEASSGPFSVSEENVSRLSASGFGGGT IYYPRENNTYGAVAISPGY TGTEASIAWLGGRIASHGFW ITIDTITTLDQPDSRAEQLNAALNHMINRASSTVRSRID SSRLAVMGHSMGGGGTPRLASQRPDLKAAIPLTPWHLNKNRSSVTVPTL IIGADLDTIAP VATHAKPFYNSLPSSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYCSTCPF

>PETcan_708

ANPYERGPNPTESMLEARSGPFSVSEERASRLGADGFGGGTI YYPRENNTYGAIAISPGY TGTQSSIAWLGERIASHGFW IAIDTNTTLDQPDSRARQLNAALDYMLTDASSSVRNRID ASRLAVMGHSMGGGGTLRLASQRPDLKAAIPLTPWHLNKSWRD ITVPTLIIGADLDTIAP VSSHSEPFYNSIPSSTDKAYLELNNATHFAPNITNKTIGMYSVAWLKRFVDEDTRYTQFL CPGPRTGLLSDVDEYRSTCPF

>PETcan_709

ANPYERGPNPTQALLEARSGPFSVSSERAWRLGSDGFGGGT IYYPRENNTYGAVAISPGY TGTQASVAWLGERIASHGFW ITIDTNTTLDQPDSRARQLDAALDHMLNDASSAVRSRID RNRLAVMGHSMGGGGTLRLASQRPDLKAAIPLTPWHLNKSWSNVQVPTL IIGADLDTIAP VLTHAEPFYNSIPTSTRKAYLELDGATHFAPNITNSTIGMYSVAWLKRFVDEDTRYTQFL CPGPRTGLFSDVEEYRSTCPF

>PETcan_710

YNPYERGPNPTNSSIEALRGPFRVDEERVSRLQARGFGGGTI YYPTDNNTFGAVAISPGY TGTQSSISWLGERLASHGFVVMTIDTNTTLDQPDSRASQLDAALDYMVEDSSYSVRNRID SSRLAAMGHSMGGGGTLRLAERRPDLQAAIPLTPWHTDKTWGSVRVPTLIIGAENDTIAS VRSHSEPFYNSLPGSLDKAYLELDGASHFAPNLSNTTIAKYS ISWLKRFVDDDTRYTQFL CPGPSTGWGSDVEEYRSTCPF

>PETcan_711

ANPYERGPDPTQASLEASRGPFPVSEERVSSPVSGFGGGTI YYPQENNTYGAVAISPGYT ATQSSVAWLGERIASHGFW ITIDTNTTLDQPDSRADQLEAALDHMVDGASSTVRSRIDR NRLAVMGHSMGGGGTLRLASRRPDLKAAIPLTPWHLNKSWSNVQVPTLI IGAENDTVAPV ALHAEPSYTSIPTSTRKAYLELNGASHFAPSVANATIGMYGVAWLKRFVDEDTRYTRFLC PGPRTGLFSDVEEYRSTCPF

>PETcan_712

ANPYERGPNPTNSSIEALRGPYSVSEDSVS SLVSGFGGGTIYYPTGTNETFGAVAISPGY TGTQSSISWLGPRLASQGFW MTIDTNTTLDQPDSRASQLDAALDYMVNRSSSTVRNRID

>PETcan_713

ANPYERGPNPTNSSIEALRGPFRVDEERVSRLQARGFGGGTI YYPTDNNTFGAVAISPGY TGTQSSISWLGERLASHGFVVMTIDTNTTLDQPDSRASQLDAALDYMVEDSSYSVRNRID

>PETcan_714

ANPYERGPNPTDALLEARSGPFSVSEENVSRLSASGFGGGT IYYPRENNTYGAVAISPGY TGTEASIAWLGERIASHGFW ITIDTITTLDQPDSRAEQLNAALNHMINRASSTVRSRID SSRLAVMGHSMGGGGSLRLASQRPDLKAAIPLTPWHLNKNWSSVTVPTLI IGADLDTIAP VATHAKPFYNSLPSSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPFY

>PETcan_715

ANPYERGPNPTDALLEASSGPFSVSEENVSRLSASGFGGGT IYYPRENNTYGAVAISPGY TGTEASIAWLGERIASHGFW ITIDTITTLDQPDSRAEQLNAALNHMINRASSTVRSRID SSRLAVMGHSMGGGGTLRLASQRPDLKAAIPLTPWHLNKNWSSVTVPTL IIGADLDTIAP VATHAKPFYNSLPSSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPF

>PETcan_716

ANPYERGPNPTDALLEARSGPFSVSEENVSRFGADGFGGGT IYYPRENNTYGAVAISPGY TGTQASVAWLGERIASHGFW ITIDTNTTLDQPDSRARQLNAALDYMINDASSAVRSRID SSRLAVMGHSMGGGGTLRLASQRPDLKAAIPLTPWHLNKNWSSVRVPTLI IGADLDTIAP VLTHARPFYNSLPTSISKAYLELDGATHFAPNIPNKI IGKYSVAWLKRFVDNDTRYTQFL CPGPRDGLFGEVEEYRSTCPFALE

>PETcan_717

ANPYERGPNPTESMLEARSGPFSVSEERASRFGADGFGGGTI YYPRENNTYGAIAISPGY TGTQSSIAWLGERIASHGFW IAIDTNTTLDQPDSRARQLNAALDYMLTDASSAVRNRID ASRLAVMGHSMGGGGTLRLASQRPDLKAAIPLTPWHLNKSWRDITVPTLI IGAEYDTIAS VTLHSKPFYNSIPSPTDKAYLELDGASHFAPNITNKTIGMYSVAWLKRFVDEDTRYTQFL CPGPRTGLLSDVEEYRSTCPF