COMBINATION OF NUCLEIC ACID SEQUENCES ENCODING PROTEINS DERIVED FROM HELICHRYSUM UMBRACULIGERUM, AND ANY TRANSGENIC CELL, TISSUE, AND ORGANISM COMPRISING SAME

Document Type and Number:

WIPO Patent Application WO/2024/052918

Abstract:

The present invention provides an isolated DNA molecule including at least a first nucleic acid sequence encoding a first protein and at least a second nucleic acid sequence encoding a second protein, wherein the first protein and the second protein are derived from Helichrysum umbraculigerum and belonging to an enzyme family selected from: acyl activating enzyme (AAE), polyketide synthase (PKS), polyketide cyclase (PKC), prenyltransferase (PT), or cannabichromenic acid synthase (CBCAS), and wherein the first protein and the second protein belong to different enzyme families. Further provided are an artificial nucleic acid molecule including the isolated DNA molecule, a transgenic cell, a tissue, or a plant including same. Further provided is a method for synthesizing a cannabinoid, a precursor thereof, or any combination thereof.

Inventors:

AHARONI ASAPH (IL)
SONAWANE PRASHANT (IL)
JOZWIAK ADAM (IL)
BERMAN PAULA (IL)
DE-HARO LUIS (IL)

Application Number:

PCT/IL2023/050968

Publication Date:

March 14, 2024

Filing Date:

September 07, 2023

Assignee:

YEDA RES & DEV (IL)

International Classes:

C12N15/52; C12N5/04; C12N9/00; C12N9/02; C12N9/10; C12N9/88; C12N15/63; C12N15/82

Domestic Patent References:

WO2020208411A2

2020-10-15

Other References:

GÜLCK THIES; MØLLER BIRGER LINDBERG: "Phytocannabinoids: Origins and Biosynthesis", TRENDS IN PLANT SCIENCE, ELSEVIER, AMSTERDAM, NL, vol. 25, no. 10, 6 July 2020 (2020-07-06), pages 985-1004, XP086267951, ISSN: 1360-1385, DOI: 10.1016/j.tplants.2020.05.005
GÜLCK THIES, BOOTH J. K., CARVALHO Â., KHAKIMOV B., CROCOLL C., MOTAWIA M. S., MØLLER B. L., BOHLMANN J., GALLAGE N.: "Synthetic Biology of Cannabinoids and Cannabinoid Glucosides in Nicotiana benthamiana and Saccharomyces cerevisiae", JOURNAL OF NATURAL PRODUCTS, AMERICAN CHEMICAL SOCIETY, US, vol. 83, no. 10, 23 October 2020 (2020-10-23), pages 2877-2893, XP055800466, ISSN: 0163-3864, DOI: 10.1021/acs.jnatprod.0c00241
BERMAN PAULA, DE HARO LUIS ALEJANDRO, JOZWIAK ADAM, PANDA SAYANTAN, PINKAS ZOE, DONG YOUNGHUI, CVETICANIN JELENA, BARBOLE RANJIT, et al.: "Parallel evolution of cannabinoid biosynthesis", NATURE PLANTS, vol. 9, no. 5, 1 May 2023 (2023-05-01), pages 817-831, XP093146882, ISSN: 2055-0278, DOI: 10.1038/s41477-023-01402-3

Attorney, Agent or Firm:

GEYRA, Assaf et al. (IL)

Claims:

CLAIMS

What is claimed is:

1. An isolated DNA molecule comprising at least a first nucleic acid sequence encoding a first protein and at least a second nucleic acid sequence encoding a second protein, wherein said first protein and said second protein are derived from Helichrysum umbraculigerum and belonging to an enzyme family selected from the group consisting of: acyl activating enzyme (AAE), polyketide synthase (PKS), polyketide cyclase (PKC), prenyltransferase (PT), and cannabichromenic acid synthase (CBCAS), and wherein said first protein and said second protein belong to different enzyme families.

2. The isolated DNA molecule of claim 1, further comprising at least a third nucleic acid sequence encoding a third protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein said first protein, said second protein, and said third protein, belong to different enzyme families.

3. The isolated DNA molecule of claim 2, further comprising at least a fourth nucleic acid sequence encoding a fourth protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein said first protein, said second protein, said third protein, and said fourth protein, belong to different enzyme families.

4. The isolated DNA molecule of claim 3, further comprising at least a fifth nucleic acid sequence encoding a fifth protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein said first protein, said second protein, said third protein, said fourth protein, and said fifth protein, belong to different enzyme families.

5. The isolated DNA molecule of any one of claims 1 to 4, further comprising a nucleic acid sequence encoding a protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: uridine diphosphate (UDP)-glycosyltransferase (UGT), alcohol acyltransferase (AAT), and both.

6. The isolated DNA molecule of any one of claims 1 to 5, wherein: a. said AAE is encoded by a nucleic acid sequence having at least 89% homology to any one of SEQ ID Nos.: 1-11, and any combination thereof; b. said PKS is encoded by a nucleic acid sequence having at least 83% homology to any one of: SEQ ID Nos.: 23-26, and any combination thereof; c. said PKC is encoded by a nucleic acid sequence having at least 88% homology to any one of: SEQ ID Nos.: 31-38, and any combination thereof; d. said PT is encoded by a nucleic acid sequence having at least 91% homology to any one of: SEQ ID Nos.: 47-58, and any combination thereof; e. said CBCAS is encoded by a nucleic acid sequence having at least 82% homology to any one of: SEQ ID Nos.: 71-79, and any combination thereof; or f. any combination of (a) to (e).

7. The isolated DNA molecule of claim 5 or 6, wherein: a. said UGT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 89-101, and any combination thereof; b. said AAT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 115-129, and any combination thereof; or c. both (a) and (b).

8. The isolated DNA molecule of any one of claims 1 to 7, wherein: a. said AAE comprises an amino acid sequence with at least 93% homology to any one of SEQ ID Nos.: 12-22; b. said PKS comprises an amino acid sequence with at least 93% homology to any one of: SEQ ID Nos.: 27-30; c. said PKC comprises an amino acid sequence with at least 87% homology to any one of: SEQ ID Nos.: 39-46; d. said PT comprises an amino acid sequence with at least 92% homology to any one of: SEQ ID Nos.: 59-70; e. said CBCAS comprises an amino acid sequence with at least 86% homology to any one of: SEQ ID Nos.: 80-88; or f. any combination of (a) to (e).

9. The isolated DNA molecule of any one of claims 5 to 8, wherein: a. said UGT comprises an amino acid sequence with at least 90% homology to any one of: SEQ ID Nos.: 102-114; b. said AAT comprises an amino acid sequence with at least 91% homology to any one of: SEQ ID Nos.: 130-144; or c. both (a) and (b).

10. The isolated DNA molecule of any one of claims 1 to 9, wherein: a. said AAE consists of an amino acid sequence of any one of SEQ ID Nos.: 12-22; b. said PKS consists of an amino acid sequence of any one of SEQ ID Nos.: 27-30; c. said PKC consists of an amino acid sequence of any one of SEQ ID Nos.: 39-46; d. said PT consists of an amino acid sequence of any one of SEQ ID Nos.: 59-70; e. said CBCAS consists of an amino acid sequence of any one of SEQ ID Nos.: 80-88; or f. any combination of (a) to (e).

11. The isolated DNA molecule of any one of claims 5 to 10, wherein: a. said UGT consists of an amino acid sequence of any one of: SEQ ID Nos.: 102-114; b. said AAT consists of an amino acid sequence of any one of: SEQ ID Nos.: 130-144; or c. both (a) and (b).

12. The isolated DNA molecule of any one of claims 1 to 11, comprising a plurality of isolated DNA molecule types.

13. The isolated DNA molecule of claim 12, wherein each type of said plurality of isolated DNA molecule types encodes a protein or a plurality of proteins belonging to a different enzyme family.

14. An artificial nucleic acid molecule comprising the isolated DNA molecule of any one of claims 1 to 13.

15. A plasmid or an agrobacterium comprising the artificial nucleic acid molecule of claim 14.

16. A transgenic cell comprising: a. the isolated DNA molecule of any one of claims 1 to 13; b. the artificial nucleic acid molecule of claim 14; c. the plasmid or agrobacterium of claim 15; or d. any combination of (a) to (c).

17. The transgenic cell of claim 16, being any one of: a unicellular organism, a cell of a multicellular organism, and a cell in a culture.

18. The transgenic cell of claim 17, wherein said unicellular organism comprises a fungus or a bacterium.

19. The transgenic cell of claim 18, wherein said fungus is a yeast cell.

20. The transgenic cell of claim 19, being a transgenic Cannabis sativa cell.

21. An extract derived from the transgenic cell of any one of claims 19 to 20, or any fraction thereof.

22. The extract of claim 21, comprising a cannabinoid, a precursor thereof, or a combination thereof.

23. A transgenic plant, a transgenic plant tissue or a plant part, comprising: a. the isolated DNA molecule of any one of claims 1 to 13; b. the artificial nucleic acid molecule of claim 14; c. the plasmid or agrobacterium of claim 15; d. the transgenic cell of any one of claims 16 to 20; or e. any combination of (a) to (d).

24. The transgenic plant, transgenic plant tissue, or plant part of claim 23, wherein said plant is a transgenic C. sativa plant.

25. A composition comprising: a. the isolated DNA molecule of any one of claims 1 to 13; b. the artificial nucleic acid molecule of claim 14; c. the plasmid or agrobacterium of claim 15; d. the transgenic cell of any one of claims 16 to 20; e. the extract of claim 21 or 22; f. the transgenic plant, transgenic plant tissue, or plant part of claim 23 or 24; or g. any combination of (a) to (f), and an acceptable carrier.

26. A method for synthesizing a cannabinoid, a precursor thereof, or any combination thereof, comprising the steps: a. providing a transgenic cell or a cell transfected with the isolated DNA molecule of any one of claims 1 to 13 or the artificial nucleic acid molecule of claim 14; and b. culturing said transgenic cell or said transfected cell from step (a) such that at least said first protein and said second protein encoded by said artificial nucleic acid molecule are expressed, thereby synthesizing the cannabinoid, a precursor thereof, or any combination thereof.

27. The method of claim 26, wherein said precursor is selected from the group consisting of: acyl coenzyme A (CoA), a polyketide, a resorcinoid precursor, and any combination thereof.

28. The method of claim 27, wherein said acyl is C1-C8 alkyl.

29. The method of claim 27 or 28, wherein said acyl CoA is hexanoyl CoA.

30. The method of any one of claims 27 to 29, wherein said polyketide is a tetraketide.

31. The method of claim 30, wherein said tetraketide is a linear tetraketide.

32. The method of any one of claims 27 to 31, wherein said resorcinoid precursor is olivetolic acid.

33. The method of any one of claims 26 to 32, wherein said cannabinoid is cannabigerolic acid (CBGA), CBCA, or both.

34. The method of any one of claims 26 to 33, wherein said artificial nucleic acid molecule is an expression vector.

35. The method of any one of claims 26 to 34, wherein said transgenic cell or said transfected cell is a prokaryote cell or a eukaryote cell.

36. The method of any one of claims 26 to 35, wherein said transgenic cell or said transfected cell is a C. sativa cell.

37. The method of any one of claims 26 to 36, further comprising a step preceding step (a), comprising introducing or transfecting a cell with said artificial nucleic acid molecule, thereby obtaining the transgenic cell or the transfected cell.

38. The method of any one of claims 27 to 37, further comprising a step of extracting said transgenic cell or said transfected cell, thereby obtaining an extract from the transgenic cell or the transfected cell.

39. An extract of a transgenic cell or a transfected cell obtained according to the method of claim 38.

40. The extract of claim 39, comprising a cannabinoid, a precursor thereof, or any combination thereof.

41. The extract of claim 39 or 40, comprising CBGA, CBCA, or both.

42. A composition comprising the extract of any one of claims 39 to 41, and an acceptable carrier.

Description:

COMBINATION OF NUCLEIC ACID SEQUENCES ENCODING PROTEINS DERIVED FROM HELICHRYSUM UMBRACULIGERUM, AND ANY TRANSGENIC CELL, TISSUE, AND ORGANISM COMPRISING SAME

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

[001] The contents of the electronic sequence listing (YEDA-P-010-PCT ST26.xml; size: 251,312 bytes; and date of creation: August 20, 2023) are herein incorporated by reference in their entirety.

CROSS-REFERENCE TO RELATED APPLICATIONS

[002] This application claims the benefit of priority of U.S. Provisional Patent Application No. 63/404,645, titled "COMBINATION OF NUCLEIC ACID SEQUENCES ENCODING PROTEINS DERIVED FROM HELICHRYSUM UMBRACULIGERUM, AND ANY TRANSGENIC CELL, TISSUE, AND ORGANISM COMPRISING SAME", filed 8 September 2022, and of U.S. Provisional Patent Application No. 63/453,112, titled "COMBINATION OF NUCLEIC ACID SEQUENCES ENCODING PROTEINS DERIVED FROM HELICHRYSUM UMBRACULIGERUM, AND ANY TRANSGENIC CELL, TISSUE, AND ORGANISM COMPRISING SAME", filed 19 March 2023. The contents of both applications are incorporated herein by reference in their entirety.

FIELD OF INVENTION

[003] The present invention relates to combinations of enzymes derived from Helichrysum umbraculigerum including polynucleotides encoding same, and methods of using same, such as for producing cannabinoids.

BACKGROUND

[004] Cannabinoids are terpenophenolic compounds found in Cannabis sativa, an annual plant belonging to the Cannabaceae family. The plant contains more than 400 chemicals and approximately 70 cannabinoids. The latter accumulate mainly in the glandular trichomes. Of the naturally occurring cannabinoids, tetrahydrocannabinol (THC), for example, is used for treating a wide range of medical conditions, including glaucoma, AIDS wasting, neuropathic pain, treatment of spasticity associated with multiple sclerosis, fibromyalgia, and chemotherapy-induced nausea. THC is also effective in the treatment of allergies, inflammation, infection, epilepsy, depression, migraine, bipolar disorders, anxiety disorder, drug dependency and drug withdrawal syndromes.

[005] Additional active cannabinoids include cannabidiol (CBD), an isomer of THC, which is a potent antioxidant and anti-inflammatory compound known to provide protection against acute and chronic neuro-degeneration; cannabigerol (CBG), found in high concentrations in hemp, which acts as a high affinity α2-adrenergic receptor agonist, moderate affinity 5-HT1A receptor antagonist and low affinity CB1 receptor antagonist, and possibly has antidepressant activity; and cannabichromene (CBC), which possesses anti-inflammatory, antifungal and anti-viral properties. Many phytocannabinoids have therapeutic potential in a variety of diseases and may play a relevant role in plant defense as well as in pharmacology. Accordingly, biotechnological production of cannabinoids and cannabinoid-like compounds with therapeutic properties is of utmost importance. Thus, cannabinoids are considered to be promising agents for their beneficial effects in the treatment of various diseases.

[006] Despite their known beneficial effects, therapeutic use of cannabinoids is hampered by the high costs associated with the growing and maintenance of the plants in large scale and the difficulty in obtaining high yields of cannabinoids. Extraction, isolation and purification of cannabinoids from plant tissue is particularly challenging as cannabinoids oxidize easily and are sensitive to light and heat.

[007] Therefore, there is a need for developing methodologies that allow large-scale production of cannabinoids for therapeutic use.

SUMMARY

[008] According to a first aspect, there is provided an isolated DNA molecule comprising at least a first nucleic acid sequence encoding a first protein and at least a second nucleic acid sequence encoding a second protein, wherein the first protein and the second protein are derived from Helichrysum umbraculigerum and belonging to an enzyme family selected from the group consisting of: acyl activating enzyme (AAE), polyketide synthase (PKS), polyketide cyclase (PKC), prenyltransferase (PT), and cannabichromenic acid synthase (CBCAS), and wherein the first protein and the second protein belong to different enzyme families.

[009] According to another aspect, there is provided an artificial nucleic acid molecule comprising the isolated DNA molecule disclosed herein.

[010] According to another aspect, there is provided a plasmid or an agrobacterium comprising the artificial nucleic acid molecule disclosed herein.

[011] According to another aspect, there is provided a transgenic cell comprising: (a) the isolated DNA molecule of the invention; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; or (d) any combination of (a) to (c).

[012] According to another aspect, there is provided an extract derived from the transgenic cell disclosed herein, or any fraction thereof.

[013] According to another aspect, there is provided a transgenic plant, a transgenic plant tissue or a plant part, comprising: (a) the isolated DNA molecule of the invention; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the transgenic cell disclosed herein; or (e) any combination of (a) to (d).

[014] According to another aspect, there is provided a composition comprising: (a) the isolated DNA molecule of the invention; (b) the artificial nucleic acid disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the transgenic cell disclosed herein; (e) the extract disclosed herein; (f) the transgenic plant tissue or plant part disclosed herein; or (g) any combination of (a) to (f), and an acceptable carrier.

[015] According to another aspect, there is provided a method for synthesizing a cannabinoid, a precursor thereof, or any combination thereof, comprising the steps: (a) providing a transgenic cell or a cell transfected with the isolated DNA molecule of the invention or the artificial nucleic acid molecule disclosed herein; and (b) culturing the transgenic cell or the transfected cell from step (a) such that at least the first protein and the second protein encoded by the artificial nucleic acid molecule are expressed, thereby synthesizing the cannabinoid, a precursor thereof, or any combination thereof.

[016] According to another aspect, there is provided an extract of a transgenic cell or a transfected cell obtained according to the herein disclosed method.

[017] According to another aspect, there is provided a composition comprising the extract disclosed herein, and an acceptable carrier.

[018] In some embodiments, the isolated DNA molecule further comprises at least a third nucleic acid sequence encoding a third protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein the first protein, the second protein, and the third protein, belong to different enzyme families.

[019] In some embodiments, the isolated DNA molecule further comprises at least a fourth nucleic acid sequence encoding a fourth protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein the first protein, the second protein, the third protein, and the fourth protein, belong to different enzyme families.

[020] In some embodiments, the isolated DNA molecule further comprises at least a fifth nucleic acid sequence encoding a fifth protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein the first protein, the second protein, the third protein, the fourth protein, and the fifth protein, belong to different enzyme families.
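The selections described in paragraphs [008] and [018] to [020] amount to choosing between two and five distinct families out of the five named ones. As a minimal illustrative sketch, assuming only that each family is chosen at most once per selection (the function name `family_combinations` is hypothetical and not part of the disclosure), the possible family selections can be enumerated as follows:

```python
from itertools import combinations

# The five enzyme families named in the disclosure.
FAMILIES = ("AAE", "PKS", "PKC", "PT", "CBCAS")


def family_combinations(min_size=2, max_size=5):
    """Return every selection of distinct families with min_size..max_size members."""
    result = []
    for size in range(min_size, max_size + 1):
        result.extend(combinations(FAMILIES, size))
    return result


combos = family_combinations()
# 10 pairs + 10 triples + 5 quadruples + 1 quintuple
print(len(combos))  # 26
```

The sketch counts family selections only; it says nothing about which specific sequences within a family are used.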

[021] In some embodiments, the isolated DNA further comprises a nucleic acid sequence encoding a protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: uridine diphosphate (UDP)-glycosyltransferase (UGT), alcohol acyltransferase (AAT), and both.

[022] In some embodiments: (a) the AAE is encoded by a nucleic acid sequence having at least 89% homology to any one of SEQ ID Nos.: 1-11, and any combination thereof; (b) PKS is encoded by a nucleic acid sequence having at least 83% homology to any one of: SEQ ID Nos.: 23-26, and any combination thereof; (c) PKC is encoded by a nucleic acid sequence having at least 88% homology to any one of: SEQ ID Nos.: 31-38, and any combination thereof; (d) PT is encoded by a nucleic acid sequence having at least 91% homology to any one of: SEQ ID Nos.: 47-58, and any combination thereof; (e) CBCAS is encoded by a nucleic acid sequence having at least 82% homology to any one of: SEQ ID Nos.: 71-79, and any combination thereof; or (f) any combination of (a) to (e).

[023] In some embodiments: (a) the UGT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 89-101, and any combination thereof; (b) the AAT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 115-129, and any combination thereof; or (c) both (a) and (b).

[024] In some embodiments: (a) AAE comprises an amino acid sequence with at least 93% homology to any one of SEQ ID Nos.: 12-22; (b) PKS comprises an amino acid sequence with at least 93% homology to any one of: SEQ ID Nos.: 27-30; (c) PKC comprises an amino acid sequence with at least 87% homology to any one of: SEQ ID Nos.: 39-46; (d) PT comprises an amino acid sequence with at least 92% homology to any one of: SEQ ID Nos.: 59-70; (e) CBCAS comprises an amino acid sequence with at least 86% homology to any one of: SEQ ID Nos.: 80-88; or (f) any combination of (a) to (e).

[025] In some embodiments: (a) the UGT comprises an amino acid sequence with at least 90% homology to any one of: SEQ ID Nos.: 102-114; (b) the AAT comprises an amino acid sequence with at least 91% homology to any one of: SEQ ID Nos.: 130-144; or (c) both (a) and (b).
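The percentage thresholds recited in paragraphs [022] to [025] can be collected into a lookup table and checked against a candidate sequence. The following is a minimal sketch under stated assumptions: it approximates "homology" by simple percent identity over two sequences that have already been aligned to equal length (gaps written as `-`), which is an assumption of this example and not a definition taken from the disclosure. The function names and the toy sequences are hypothetical and are not SEQ ID entries.

```python
# Minimum percent values recited in paragraphs [022]-[025], keyed by enzyme family.
NUCLEIC_ACID_MIN = {"AAE": 89, "PKS": 83, "PKC": 88, "PT": 91,
                    "CBCAS": 82, "UGT": 87, "AAT": 87}
AMINO_ACID_MIN = {"AAE": 93, "PKS": 93, "PKC": 87, "PT": 92,
                  "CBCAS": 86, "UGT": 90, "AAT": 91}


def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns holding identical residues.

    Assumes both inputs come from the same pairwise alignment; columns
    where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(family, aligned_query, aligned_reference, amino_acid=True):
    """Check a candidate against the recited minimum for its enzyme family."""
    table = AMINO_ACID_MIN if amino_acid else NUCLEIC_ACID_MIN
    return percent_identity(aligned_query, aligned_reference) >= table[family]


# Toy peptides differing at one of ten positions (90% identity).
print(meets_threshold("PKC", "MKTAYIAKQR", "MKTAYIAKQK"))  # True  (90 >= 87)
print(meets_threshold("AAE", "MKTAYIAKQR", "MKTAYIAKQK"))  # False (90 < 93)
```

In practice the alignment step and the scoring convention would determine the reported percentage, so results from this sketch should be read as illustrative only.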

[026] In some embodiments: (a) the AAE consists of an amino acid sequence of any one of SEQ ID Nos.: 12-22; (b) the PKS consists of an amino acid sequence of any one of SEQ ID Nos.: 27-30; (c) the PKC consists of an amino acid sequence of any one of SEQ ID Nos.: 39-46; (d) the PT consists of an amino acid sequence of any one of SEQ ID Nos.: 59-70; (e) the CBCAS consists of an amino acid sequence of any one of SEQ ID Nos.: 80-88; (f) or any combination of (a) to (e).

[027] In some embodiments: (a) the UGT consists of an amino acid sequence of any one of: SEQ ID Nos.: 102-114; (b) the AAT consists of an amino acid sequence of any one of: SEQ ID Nos.: 130-144; or (c) both (a) and (b).

[028] In some embodiments, the isolated DNA molecule comprises a plurality of isolated DNA molecule types.

[029] In some embodiments, each type of the plurality of isolated DNA molecule types encodes a protein or a plurality of proteins belonging to a different enzyme family.

[030] In some embodiments, the transgenic cell is any one of: a unicellular organism, a cell of a multicellular organism, and a cell in a culture.

[031] In some embodiments, the unicellular organism comprises a fungus or a bacterium.

[032] In some embodiments, the fungus is a yeast cell.

[033] In some embodiments, the transgenic cell is a transgenic Cannabis sativa cell.

[034] In some embodiments, the extract comprises a cannabinoid, a precursor thereof, or a combination thereof.

[035] In some embodiments, the precursor is selected from the group consisting of: acyl coenzyme A (CoA), a polyketide, a resorcinoid precursor, and any combination thereof.

[036] In some embodiments, the acyl is C1-C8 alkyl.

[037] In some embodiments, the acyl CoA is hexanoyl CoA.

[038] In some embodiments, the polyketide is a tetraketide.

[039] In some embodiments, the tetraketide is a linear tetraketide.

[040] In some embodiments, the resorcinoid precursor is olivetolic acid.

[041] In some embodiments, the cannabinoid is cannabigerolic acid (CBGA), CBCA, or both.
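The precursors and cannabinoids named in paragraphs [035] to [041] can be read as successive intermediates of one route. The sketch below models that route as an ordered list and reports how far it proceeds for a given set of expressed enzyme families and a supplied starting substrate. The assignment of each family to a step follows the commonly described cannabinoid pathway and is an assumption of this example; co-substrates such as malonyl-CoA and geranyl pyrophosphate are omitted, and the function name `furthest_product` is hypothetical.

```python
# Ordered core pathway as (enzyme family, substrate, product).
# The family-to-step assignment is an assumption of this sketch.
PATHWAY = [
    ("AAE", "hexanoate", "hexanoyl-CoA"),
    ("PKS", "hexanoyl-CoA", "linear tetraketide"),
    ("PKC", "linear tetraketide", "olivetolic acid"),
    ("PT", "olivetolic acid", "CBGA"),
    ("CBCAS", "CBGA", "CBCA"),
]


def furthest_product(expressed, start):
    """Walk the pathway from `start`, stopping at the first missing enzyme family."""
    current = start
    for family, substrate, product in PATHWAY:
        if substrate == current:
            if family not in expressed:
                break  # the step that would consume `current` is not available
            current = product
    return current


# Feeding olivetolic acid to cells expressing only PT and CBCAS.
print(furthest_product({"PT", "CBCAS"}, "olivetolic acid"))  # CBCA
# Feeding hexanoate without a PKC stops at the linear tetraketide.
print(furthest_product({"AAE", "PKS", "PT"}, "hexanoate"))  # linear tetraketide
```

A linear model like this ignores side products and enzyme promiscuity, so it serves only to make the ordering of intermediates explicit.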

[042] In some embodiments, the artificial nucleic acid molecule is an expression vector.

[043] In some embodiments, the transgenic cell or the transfected cell is a prokaryote cell or a eukaryote cell.

[044] In some embodiments, the transgenic cell or the transfected cell is a C. sativa cell.

[045] In some embodiments, the method further comprises a step preceding step (a), comprising introducing or transfecting a cell with the artificial nucleic acid molecule, thereby obtaining the transgenic cell or the transfected cell.

[046] In some embodiments, the method further comprises a step of extracting the transgenic cell or the transfected cell, thereby obtaining an extract from the transgenic cell or the transfected cell.

[047] In some embodiments, the extract comprises a cannabinoid, a precursor thereof, or any combination thereof.

[048] Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

[049] Further embodiments and the full scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE FIGURES

[050] Figs. 1A-1I include structures of chemical compounds, images, a chromatogram, a table, and micrographs showing that H. umbraculigerum biosynthesizes CBGA 1 and other terpenophenols in all aerial plant parts. (1A) Proposed biosynthetic pathways of CBGA 1 and heliCBGA 2. (1B) Photographs of the H. umbraculigerum plant inflorescence (up) and shoot (down). (1C) Total ion chromatogram of an ethanolic extract of H. umbraculigerum fresh leaves. The most abundant peaks of identified metabolites are marked on the Figure and color-coded according to the class of terpenophenol. CBGA 1 and heliCBGA 2 are highlighted in red and blue, respectively. (1D) Absolute quantification of CBGA 1 in different plant tissues [% w/w per fresh weight, n=3; for lyophilized leaves % w/w per dry weight (DW), n=5]. Reported Cannabis values were added for comparison. (1E) Chemical structures and names of selected terpenophenols with similar chemical formulas as 1-3. Representative (1F) cryo-SEM and (1G) confocal micrographs of the adaxial top view domain of leaves showing stalked glandular trichomes (marked by arrows). (1H) TEM micrograph showing the multicellular structure of the different cell types in a stalked glandular trichome at secretory stage. BC, basal cell; SC, stalk cell; NC, neck cell; DC, disk cell; SCv, secretory cavity. The dashed line marks the surface of the SCv. (1I) High magnification image shows the ultrastructure of DCs. CW, cell wall; M, mitochondria; N, nucleus; P, plastid; PSP, periplasmic space; V, vacuole; Vs, vesicle. Arrows mark active secretions from vesicles to the periplasmic space by exocytosis.

[051] Figs. 2A-2E include fluorescent micrographs, graphs, and a scheme showing that cannabinoid-associated gene expression is correlated with cannabinoid metabolites accumulation in H. umbraculigerum glandular trichomes. (2A) Optical image and (2B) MALDI-MSI of m/z 361.23 ± 0.01 Da of a cross-sectioned leaf showing that CBGA 1 accumulates in stalked glandular trichomes of leaves. Glandular trichomes in (2A) are marked to improve interpretation. The signals in (2B) correspond with the protonated m/z of CBGA 1 and geranylphlorocaprophenone 4. (2C) Normalized Enrichment Score (NES) of each co-expressed module in each tissue. Module M4 is highlighted as it is highly expressed in trichomes and leaves. (2D) Spaghetti chart showing the expression profile of module M4. The expression levels of individual genes are shown in gray lines. Colored lines highlight the expression of candidate genes from the pathway. (2E) Genomic landscape of the eight longest scaffolds of H. umbraculigerum assembly. Track i represents the gene density; ii represents repeat element density; iii represents 3’ Tran-Seq coverage; iv represents TrueSeq coverage. These metrics are calculated in 0.1 Mb non-overlapping windows. Magnification of the marked area in scaffold 1 reveals a tandem gene cluster containing seven PKSs. The enzymes HuPKS1-3 and HuTKS4 were cloned and functionally characterized in this study.

[052] Figs. 3A-3F include a heatmap, graphs, and a table showing the discovery of the core cannabinoid biosynthetic pathway enzymes. (3A) Gene expression in young leaves, roots and trichomes of the putative enzymes characterized in this study [log(cpm+1), n=3]. The most active enzymes in this study were highlighted in pink. AAE, acyl activating enzyme; PKS, type III polyketide synthase; PKC, polyketide cyclase; PT, prenyl-transferase. (3B) Products of recombinant enzyme assays of purified HuAAE proteins using various alkyl (short- and medium-chain FAs) and aromatic (cinnamic and coumaric acids) substrates. Peak areas were used for the comparisons (mean ± s.d.; n=3). CoAT, acyl-CoA-transferase; EV, empty vector. (3C) Products of coupled recombinant enzyme assays of HuPKSs with either an EV or Cannabis olivetolic acid cyclase (CsOAC), in the presence of hexanoyl-CoA and malonyl-CoA. PDAL, pentyl diacetic acid lactone; HTAL, hexanoyl triacetic acid lactone; OA 92, olivetolic acid; PCP 95, phlorocaprophenone. Peak areas were used for the comparisons (mean ± s.d.; n=3). OA 92 and PCP 95 were identified using analytical standards ([M-H]⁻ = 223.097 Da). (3D) Activity assay of microsomal fractions expressing prenyltransferases (PTs) using an array of aromatic substrates and either geranyl pyrophosphate (GPP) or isopentenyl pyrophosphate (IPP) as the isoprenoid donors. Circles represent observed mono- or iso-prenylated products in H. umbraculigerum or in vitro assays. VA, divarinolic acid; DHSA 93, dihydrostilbenic acid; ND, not detected; CBGAS, cannabigerolic acid synthase. (3E) Steady state kinetic analysis of HuPT1, HuPT3 and HuCBGAS4 with OA 92 and GPP. The Michaelis-Menten Vm values were calculated using varying (0.5 µM-3 mM) and constant (1 mM) concentrations of each substrate (n = 3). The literature Km value of Cannabis CsGOT4 was added for comparison. (3F) Phylogenetic analysis of PT proteins from H. umbraculigerum and other plants. The selection of the proteins was based on functionally characterized enzymes as described by de Bruijn et al. (2020). The clades according to the different substrates are marked in colored circles. HuPT proteins are highlighted in red, while Cannabis and Rhododendron dauricum PTs which prenylate cannabinoids are highlighted in blue. A H. umbraculigerum flower and a Cannabis leaf highlight the active HuCBGAS4 and CsGOT4, respectively. A full list of protein IDs is available in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023).

[053] Figs. 4A-4F include a phylogenetic tree, a heatmap, a table, chromatograms, and structures of chemical compounds showing the functional characterization of cannabinoid tailoring enzymes. (4A) Phylogenetic analysis of selected uridine diphosphate-glycosyltransferase (UGT) proteins from H. umbraculigerum, Arabidopsis thaliana, Oryza sativa and Stevia rebaudiana. The clades were annotated according to Arabidopsis thaliana UGT family classification (numbers in colored circles). HuUGT proteins are highlighted in red, while other proteins from plant species not producing cannabinoids that were shown previously to be able to glycosylate cannabinoids are highlighted in blue. A full list of protein IDs is available in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023). H. umbraculigerum flowers mark the active HuCBGT1, HuCBGT6 and HuOAGT11. 4-Hydroxybenzoic acid (4-HBA) and 2,4-dihydroxybenzoic acid (2,4-DHBA), which are structurally similar to OA 92 and CBGA 1, are located next to the UGT enzymes that glycosylate them. Glycosylated hydroxyls are highlighted. (4B) Gene expression in young leaves, roots and trichomes of the putative UGT and alcohol acyl transferase (AAT) enzymes characterized in this study [log(cpm+1), n=3]. The enzymes found most active in this study were highlighted in pink. (4C) Comparison of steady state kinetic analysis of HuOAGT11 and HuUGT13 versus OsUGT and SrUGT, with OA 92 and uridine diphosphate glucose (UDP-Glc). Assays were performed using varying (0.5 µM-3 mM) and constant (1 mM) concentrations of each substrate (n = 3). (4D) Extracted ion chromatograms of monoglucosides according to the theoretical m/z values, following enzymatic assays with the purified enzymes in the presence of UDP-Glc and an array of aromatic substrates (additional assays appear in Fig. 12B). One to three glucosylated compounds were observed for each substrate. 
The peaks were putatively assigned by MS/MS fragmentation patterns (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Compounds naturally observed in H. umbraculigerum were marked with a green asterisk. Chromatograms were normalized to the highest value. (4E) Extracted ion chromatograms of the O-acylated cannabinoids following enzymatic assays with purified HuCoAT5 in the presence of different acyl donors and aromatic substrates as acceptors. Major ion products were selected in each LC-MS/MS chromatogram. A single peak was observed for each pair of substrates. The detected analog peaks shifted in retention time depending on their change in hydrophobicity relative to the acyl group. Identification was performed according to MS/MS fragmentation (Fig. 13, and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)) and retention time. Compounds naturally observed in H. umbraculigerum were marked with a purple asterisk. Chromatograms were normalized to the highest value. (4F) Potential glucosylation and observed O-acylation sites were highlighted in blue and/or purple on each chemical structure, respectively.

[054] Figs. 5A-5D include combination diagrams and graphs showing in vivo reconstruction of the core cannabinoid pathway in heterologous systems. Co-expression of different combinations of HuCoAT6, HuTKS4, and HuCBGAS4, along with CsOAC and CsOLS from Cannabis in (5A-5B) N. benthamiana leaves and (5C-5D) S. cerevisiae yeasts. Grey, yellow, and green boxes to the left of the graphs indicate biosynthetic genes that are included in a co-expression experiment; blue boxes mark supplementation of geranyl pyrophosphate (GPP) and either (5A and 5C) sodium hexanoate (HexNa) or (5B and 5D) OA 92. Peak areas were used for the comparisons (mean ± s.d.; n=3-6). N. benthamiana produced mainly glycosylated products identified according to the previously conducted in vitro UGT enzyme assays (Figs. 4D and 12B). All the metabolites were identified by exact mass, retention time and MS/MS spectra (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). EV, empty vector.

[055] Fig. 6 includes a scheme showing parallel and divergent evolution of the cannabinoid biosynthetic pathway. The scheme provides a side-by-side comparison of the cannabinoid biosynthetic routes in H. umbraculigerum and Cannabis. On the top part, the phylogenetic relationship between Arabidopsis thaliana, Solanum lycopersicum, Helianthus annuus, Lactuca sativa, Cannabis sativa and Helichrysum umbraculigerum illustrates the evolutionary distances between Cannabis and Helichrysum. The tree was constructed based on the whole proteomes of each species using the word-based software Prot-SpaM. Hybrid, yet unreported metabolites were produced in this study by reacting cannabinoids naturally biosynthesized in Cannabis (marked in green) with uridine diphosphate glucose (UDP-Glc) or acyl-CoAs in the presence of HuCoAT5, HuCBGT1 or HuCBGT6 enzymes from H. umbraculigerum (represented by blue). AAE, acyl activating enzyme; OLS, olivetol synthase; OAC, olivetolic acid cyclase; GOT, geranylpyrophosphate:olivetolate geranyltransferase; CBDAS, cannabidiolic acid synthase; CBCAS, cannabichromenic acid synthase; THCAS, (-)-Δ⁹-trans-tetrahydrocannabinolic acid synthase; PT, prenyl-transferase; UGT, uridine diphosphate-glycosyltransferase; AAT, alcohol acyltransferase. The active enzymes identified in this study are marked by their names. CoAT, acyl-CoA-transferase; TKS, tetraketide synthase; PKC, polyketide cyclase; CBGAS, cannabigerolic acid synthase; OAGT, olivetolic acid UGT; CBGT, cannabinoid UGT; CBAT, cannabinoid acyl-transferase; BBE-like, berberine bridge enzyme-like; Cyc, cyclase; CYP, cytochrome P450.

[056] Figs. 7A-7B include chromatograms and structures of chemical compounds showing LC-MS/MS fingerprinting of CBGA 1, heliCBGA 2 and APHA 3 in H. umbraculigerum. (7A) Extracted ion chromatograms and MS/MS spectral matching of cannabigerolic acid (CBGA 1 [M-H]⁻ = 359.222 Da), heli-cannabigerolic acid (heliCBGA 2 [M-H]⁻ = 393.206 Da), and pre-amorphastilbol (APHA 3 [M-H]⁻ = 391.191 Da) standards or authentic metabolites versus a H. umbraculigerum leaf extract. To confirm the assignment, CBGA 1 and heliCBGA 2 were purified and analyzed by NMR. (7B) Stable isotope labeling of CBGA 1, heliCBGA 2 and APHA 3 via feeding of H. umbraculigerum leaves with hexanoic-D11 acid, phenylalanine-Ds or phenylalanine-¹³C9. The MS/MS spectra of the non-labeled versus the labeled forms show similar fragmentation patterns with mass shifts corresponding with the labeled parts of the molecule.
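
As a rough cross-check of the [M-H]⁻ values quoted in the figure legends, the theoretical monoisotopic m/z of a deprotonated ion can be computed from an elemental formula. The sketch below is illustrative only: the elemental formulas C22H32O4 (for CBGA 1) and C12H16O4 (for OA 92), the function name, and the isotope mass constants are assumptions supplied for the example and are not stated in this document.

```python
# Monoisotopic masses (Da) of the most abundant isotopes (assumed constants).
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646  # mass of a proton (Da)


def deprotonated_mz(formula):
    """Return the theoretical m/z of the singly charged [M-H]- ion.

    `formula` maps element symbols to atom counts,
    e.g. {"C": 22, "H": 32, "O": 4}.
    """
    neutral = sum(MONO[element] * count for element, count in formula.items())
    return neutral - PROTON


cbga = {"C": 22, "H": 32, "O": 4}  # assumed formula for CBGA 1
oa = {"C": 12, "H": 16, "O": 4}    # assumed formula for OA 92

print(f"CBGA [M-H]-: {deprotonated_mz(cbga):.4f}")
print(f"OA   [M-H]-: {deprotonated_mz(oa):.4f}")
```

Under these assumptions the computed values fall within about 0.001 Da of the 359.222 Da and 223.097 Da figures given in the legends.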

[057] Figs. 8A-8J include micrographs and images showing stalked glandular trichomes in leaves and flowers of H. umbraculigerum. (8A-8B) Representative cryo-SEM micrographs of the lateral view of flower samples showing stalked glandular trichomes (marked by arrows). (8C) Light micrograph showing the biseriate structure of stalked glandular trichomes of H. umbraculigerum leaves. (8D-8F) Selected TEM micrographs of trichomes of H. umbraculigerum leaves at different stages of secretion. High magnification images show the ultrastructure of disk cells (DCs). CW, cell wall; M, mitochondria; N, nucleus; P, plastid; PSP, periplasmic space; SCv, secretory cavity; V, vacuole; Vs, vesicle. Arrows mark active secretions from the vesicles to the PSP by exocytosis. (8D) In the presecretory stage, DCs contained a very dense cytoplasm covered by ER and multiple ribosomes. There was no SCv or PSP and plastids were large and resembled pro-plastids. (8E) In the secretory stage, delamination of the apical DC wall led to the formation of the SCv. Electron transparent secretions were exuded out of plastids in vesicles delimited by an electron-dense layer. The vesicles released their contents to the PSP by exocytosis where the secretory product accumulated prior to secretion into the SCv. (8F) DCs of mature trichomes at the post-secretion stage were largely vacuolated with a cytoplasm restricted to the small remaining area. Plastids at this stage had degenerated and no vesicles were observed. The cell wall had a largely cutinized layer with a large SCv. MALDI-MSI of m/z 361.23 ± 0.01 Da signals of the (8G) abaxial and (8H) adaxial leaf domains, following partial removal of trichomes by duct tape (the peeled area is outlined by a green line). The areas with partially/fully removed trichomes show less or no signals compared to the untouched parts. (8I) Optical image and (8J) MALDI-MSI of m/z 361.23 ± 0.01 Da of a cross-sectioned flower receptacle. 
Glandular trichomes in 8I are marked to improve interpretation. The signals in 8G-8H and 8J correspond with the protonated m/z of CBGA 1 and geranylphlorocaprophenone 4. The white broken lines in 8G-8J mark the regions analyzed.

[058] Fig. 9 includes a scheme showing the predicted parallel metabolic pathways for the biosynthesis of cannabinoids and other terpenophenols present in H. umbraculigerum. The predicted types of enzymes catalyzing each reaction are marked by 1-8. Additional functional groups and rearrangements include hydroxylation, double bond isomerization or reduction, cyclization, and others. Alkyl chains can be linear or branched, with a length of one to seven carbons; AAE, acyl activating enzyme; PKS, type III polyketide synthase; PKC, polyketide cyclase; PT, prenyl-transferase; UGT, uridine diphosphate-glycosyltransferase; AAT, alcohol acyl transferase; DBR, double bond reductase; CHI, chalcone isomerase. The active enzymes identified in this study are marked by their names. CoAT, acyl-CoA-transferase; TKS, tetraketide synthase; CBGAS, cannabigerolic acid synthase; OAGT, olivetolic acid UGT; CBGT, cannabinoid UGT; CBAT, cannabinoid acyl-transferase.

[059] Figs. 10A-10E include chromatograms, a scheme, structures of chemical compounds, and curves showing functional characterization of HuAAE, HuPKS and HuPTs. (10A) Ion abundances from triple-Quad analyses of acyl-CoAs produced in vitro by the HuAAEs versus analytical standard (Std). (10B) A scheme showing the steps and types of products and by-products synthesized in vitro by the recombinant HuPKSs with or without the Cannabis olivetolic acid cyclase (CsOAC). (10C) Ion abundances from triple-Quad analyses of OA 92 and olivetol products from coupled recombinant enzyme assays of HuPKSs with either an empty vector (EV) or Cannabis olivetolic acid cyclase (CsOAC), in the presence of hexanoyl-CoA and malonyl-CoA. (10D) MS/MS spectra of prenylated OA 92 products with cannabigerolic acid synthase (HuCBGAS4) and either isopentenyl pyrophosphate (IPP), geranyl pyrophosphate (GPP) or farnesyl pyrophosphate (FPP) as the prenyl donors. CBPA 19, cannabiprenylic acid; CBGA 1, cannabigerolic acid; SesquiCBGA, sesquicannabigerolic acid (MS/MS spectrum corresponds to published data from Cannabis ¹⁵). (10E) Steady state kinetic analysis of H. umbraculigerum prenyltransferases HuPT1, HuPT3 and HuCBGAS4 with OA 92 and GPP. The Michaelis-Menten Km value of each enzyme was calculated using varying (0.5 µM-3 mM) and constant (1 mM) concentrations of each substrate (n = 3 technically independent samples; measurements were plotted individually).
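
For orientation, the Michaelis-Menten relationship underlying the Km values mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the study: the document does not say how the curves were fitted, the Hanes-Woolf linearization is chosen here only because it needs no external libraries (nonlinear regression is a common alternative), and the Vmax and Km values and the substrate concentrations are hypothetical, noise-free inputs.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def hanes_woolf_fit(s_values, v_values):
    """Estimate (Vmax, Km) by ordinary least squares on the Hanes-Woolf
    form [S]/v = [S]/Vmax + Km/Vmax (slope = 1/Vmax, intercept = Km/Vmax)."""
    xs = list(s_values)
    ys = [s / v for s, v in zip(s_values, v_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical data: substrate concentrations in µM spanning 0.5 µM to 3 mM,
# with rates generated from assumed Vmax = 10 (arbitrary units), Km = 50 µM.
substrate_um = [0.5, 5.0, 50.0, 500.0, 3000.0]
rates = [mm_rate(s, 10.0, 50.0) for s in substrate_um]

vmax_est, km_est = hanes_woolf_fit(substrate_um, rates)
print(round(vmax_est, 3), round(km_est, 3))
```

With real, noisy measurements the linearized fit weights points differently from a direct nonlinear fit, so the two approaches can give somewhat different Km estimates.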

[060] Figs. 11A-11D include phylogenetic trees showing phylogenetic analyses of enzymes and whole proteome from H. umbraculigerum and different plant species. Phylogenetic analysis of (11A) AAE, (11B) PKS and (11C) PT proteins from H. umbraculigerum and other plants. H. umbraculigerum and Cannabis proteins are highlighted in red and blue, respectively, and the active enzymes were marked by a flower and a leaf, respectively. A full list of protein IDs is available in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023). Bootstrap values are indicated at the nodes of each branch. The selection of the proteins was based on (11A) Arabidopsis thaliana enzymes or (11B-11C) functionally tested enzymes. Clades according to substrates or functionalities are marked by different colors. None of the active H. umbraculigerum enzymes clustered with any of the known Cannabis proteins. (11D) The phylogenetic relationship between Arabidopsis thaliana, Solanum lycopersicum, Helianthus annuus, Lactuca sativa, Cannabis sativa and Helichrysum umbraculigerum illustrates the evolutionary distance between the last two species (marked by a leaf and a flower, respectively). The tree was constructed based on the whole proteomes of each species using the word-based software Prot-SpaM.

[061] Figs. 12A-12C include graphs, chromatograms, structures of chemical compounds, and curves showing functional characterization of HuUGTs. (12A) Activities of lysates containing HuUGTs with olivetolic acid (OA 92), cannabigerolic acid (CBGA 1) and helicannabigerolic acid (heliCBGA 2) as substrates and uridine diphosphate glucose (UDP-Glc) as the sugar donor (n=1). Reactions show differing substrate specificities and type of products. Representative peaks correspond to chromatograms obtained for HuCBUGT1. The most abundant products in each assay are marked with asterisks. EV, empty vector. (12B) In vitro production of monoglucosides with the purified UGTs and additional substrates. Extracted ion chromatograms of the observed monoglucosides using UDP-Glc and either DHSA 93, olivetol, CBG, CBD, Δ⁹-THC, PCP 95, naringenin chalcone 97 or pinocembrin chalcone 100. The substrates naringenin chalcone 97 and pinocembrin chalcone 100 contained mixtures of the chalcones and respective flavanones. All LC-MS chromatograms were selected for the theoretical m/z values of the respective metabolites of interest. (12C) Comparison of steady state kinetics of UGTs with OA 92 and UDP-Glc. HuOAUGT11 and HuUGT13 were compared with UGTs from rice (OsUGT) and stevia (SrUGT). Kinetic values were calculated using varying (0.5 µM-3 mM) and constant (1 mM) concentrations of each substrate (n = 3 technically independent samples; measurements were plotted individually). V0 and Vmax were calculated using the calibration curve of OA 92 since there was no analytical standard available for Glc-OA 102.

[062] Figs. 13A-13C include structures of chemical compounds, chromatograms, and a phylogenetic tree showing functional characterization of HuAATs. (13A) Stable dual isotope labeling of O-MeButCBGA 120 via feeding of H. umbraculigerum leaves with either 2-methylbutyric-D9 acid or hexanoic-D11 acid. The MS/MS spectra of the non-labeled versus the two-labeled forms show fragmentation patterns with mass shifts corresponding with the labeled parts of the molecule. Fragments colored in red or purple correspond to the m/z of the specific fragment with labeled alkyl chain or acyl group, respectively. (13B) Activities of lysates containing HuAATs with different acyl donors and cannabinoid acceptors. Extracted ion chromatograms were selected for the theoretical m/z values of the respective metabolites. Only HuCBAT5 and HuAAT14 (red and blue, respectively) acylated CBGA 1 and heliCBGA 2 with both acyl-CoAs. EV, empty vector; Std, standard; ButCoA, butyryl-CoA; HexCoA, hexanoyl-CoA. (13C) Phylogenetic analysis of HuAAT proteins and identified BAHD AATs from other plants. The Maximum Likelihood tree was constructed with 100 bootstrap tests based on a MUSCLE multiple alignment using the MEGA11 software. The evolutionary distances were computed using the JTT matrix-based method. Bootstrap values are indicated at the nodes of each branch. The clades of the different AAT types are marked in circles based on Tuominen et al. (2011). The active HuCBAT5 and HuAAT14 were clustered in clade IIIa, which represents BAHDs of diverse catalytic functions. A full list of protein IDs is available in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023).

[063] Fig. 14 includes chromatograms and structures of chemical compounds showing MS/MS spectra of observed acylated cannabinoids following enzymatic assays with the purified HuCBAT5. OA 92, olivetolic acid; CBGA 1, cannabigerolic acid; HeliCBGA 2, helicannabigerolic acid; CBDA, cannabidiolic acid. Full data of MS/MS products appear in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023). MS/MS fragmentation and retention times correspond to the O-acylated cannabinoids found in the plant.

[064] Figs. 15A-15F include schemes, chromatograms, and a table showing the reconstruction of the core cannabinoid pathway in heterologous systems. Schematic representation of products observed in (15A) N. benthamiana leaves and (15D) S. cerevisiae yeasts following co-expression of different combinations of HuCoAT6, HuTKS4, and HuCBGAS4, along with CsOAC from Cannabis. NbUGT, N. benthamiana uridine diphosphate-glycosyltransferase; HexNa, sodium hexanoate; GPP, geranyl pyrophosphate; OA 92, olivetolic acid. Extracted ion chromatograms and MS/MS spectra showing (15B) glycosylated OA (Glc-OA 102), glycosylated phlorocaprophenone (Glc-PCP1/2) and glycosylated naringenin chalcone (Glc-Naringenin chalcone 1/2) following feeding with HexNa and GPP (I); and (15C) glycosylated cannabigerolic acid (Glc-CBGA 109) following feeding with OA 92 and GPP (II). Glycosylated metabolites synthesized by the recombinant stevia (SrUGT) or rice (OsUGT) enzymes were used as reference for identification of N. benthamiana products according to exact mass, retention time and MS/MS spectra. EV, empty vector; UDP-Glc, uridine diphosphate glucose. (15E) Extracted ion chromatograms of OA 92, PCP 95 and CBGA 1 products observed in yeasts without any feeding. Identification was according to analytical standards. (15F) Summary of the observed products in each assay. PDAL, pentyl acyl diacetic acid lactone; HTAL, hexanoyl acyl triacetic acid lactone.

DETAILED DESCRIPTION

[065] The present invention, in some embodiments, is directed to a DNA molecule comprising at least a first nucleic acid sequence encoding a first protein and at least a second nucleic acid sequence encoding a second protein, wherein the first protein and the second protein are derived from Helichrysum umbraculigerum, including methods of using same.

[066] In some embodiments, any one of the first protein and the second protein belongs to an enzyme family selected from: acyl activating enzyme (AAE), polyketide synthase (PKS), polyketide cyclase (PKC), prenyltransferase (PT), cannabichromenic acid synthase (CBCAS), uridine diphosphate (UDP)-glycosyltransferase (UGT), alcohol acyltransferase (AAT).

[067] In some embodiments, the DNA molecule further comprises at least a third nucleic acid sequence encoding a third protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.

[068] In some embodiments, the DNA molecule further comprises at least a fourth nucleic acid sequence encoding a fourth protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.

[069] In some embodiments, the DNA molecule further comprises at least a fifth nucleic acid sequence encoding a fifth protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.

[070] In some embodiments, the DNA molecule further comprises at least a sixth nucleic acid sequence encoding a sixth protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.

[071] In some embodiments, the DNA molecule further comprises at least a seventh nucleic acid sequence encoding a seventh protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.

[072] In some embodiments, the first protein and the second protein belong to different enzyme families.

[073] In some embodiments, the first protein, the second protein, and the third protein belong to different enzyme families.

[074] In some embodiments, the first protein, the second protein, the third protein, and the fourth protein belong to different enzyme families.

[075] In some embodiments, the first protein, the second protein, the third protein, the fourth protein, and the fifth protein belong to different enzyme families.

[076] In some embodiments, the first protein, the second protein, the third protein, the fourth protein, the fifth protein, and the sixth protein belong to different enzyme families.

[077] In some embodiments, the first protein, the second protein, the third protein, the fourth protein, the fifth protein, the sixth protein, and the seventh protein belong to different enzyme families.

[078] According to some embodiments: (a) an AAE protein is encoded by a nucleic acid sequence having at least 89% homology or identity to any one of SEQ ID Nos.: 1-11; (b) PKS is encoded by a nucleic acid sequence having at least 83% homology or identity to SEQ ID Nos.: 23-26; (c) PKC is encoded by a nucleic acid sequence having at least 88% homology or identity to SEQ ID Nos.: 31-38; (d) PT is encoded by a nucleic acid sequence having at least 91% homology or identity to SEQ ID Nos.: 47-58; (e) CBCAS is encoded by a nucleic acid sequence having at least 82% homology or identity to SEQ ID Nos.: 71-79; or (f) any combination of (a) to (e).

[079] In some embodiments, the DNA molecule further comprises a nucleic acid sequence being derived from Helichrysum umbraculigerum and encoding one or more protein(s) or enzyme(s) belonging to the uridine diphosphate (UDP)-glycosyltransferase (UGT) family; the alcohol acyltransferase (AAT) family, or both.

[080] In some embodiments: (a) UGT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 89-101, and any combination thereof; (b) AAT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 115-129, and any combination thereof; or (c) both (a) and (b).

[081] In some embodiments, the DNA molecule comprises at least two nucleic acid sequences encoding at least two enzymes, wherein each enzyme belongs to a different family, wherein the at least two families are selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.
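
The "different enzyme families" condition recited in the preceding paragraphs can be pictured as a simple check on a list of family labels. The following sketch is illustrative only; the function name and the representation of a construct as a list of family abbreviations are assumptions made for the example and carry no limiting meaning.

```python
# Enzyme family abbreviations as listed in the description.
ALLOWED_FAMILIES = {"AAE", "PKS", "PKC", "PT", "CBCAS", "UGT", "AAT"}


def families_are_distinct(families):
    """Return True if at least two families are given, every family is one
    of the listed families, and no family appears more than once."""
    fams = list(families)
    return (
        len(fams) >= 2
        and all(f in ALLOWED_FAMILIES for f in fams)
        and len(set(fams)) == len(fams)
    )


# Hypothetical constructs described only by the families of their encoded proteins.
print(families_are_distinct(["AAE", "PKS", "PT"]))  # three different families
print(families_are_distinct(["PKS", "PKS"]))        # same family twice
```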

[082] In some embodiments, the DNA molecule is an isolated DNA molecule. In some embodiments, the DNA molecule is a complementary DNA (cDNA) molecule.

[083] As used herein, the term “DNA molecule” refers to a polynucleotide comprising or consisting of deoxyribonucleotides.

[084] As used herein, the terms "isolated polynucleotide" and "isolated DNA molecule" refer to a nucleic acid molecule that is essentially free from contaminating cellular components, such as carbohydrate, lipid, or other proteinaceous impurities associated with the nucleic acid in nature. Typically, a preparation of isolated DNA or RNA contains the nucleic acid in a highly purified form, e.g., at least about 80% pure, at least about 90% pure, at least about 95% pure, greater than 95% pure, or greater than 99% pure. In some embodiments, the isolated polynucleotide is any one of DNA, RNA, and cDNA. In some embodiments, the isolated polynucleotide is a synthesized polynucleotide. Synthesis of polynucleotides is well known in the art and may be performed, for example, by ligating or covalently linking by primer linkers multiple nucleic acid molecules together.

[085] The term "nucleic acid" is well known in the art of molecular biology. A "nucleic acid" as used herein will generally refer to any molecule (e.g., a strand) of DNA, RNA or a derivative or analog thereof, comprising nucleotides. Nucleotides are comprised of nucleosides and phosphate groups. The nitrogenous bases of nucleosides include, for example, naturally occurring purine or pyrimidine nucleosides as found in DNA (e.g., an adenine "A," a guanine "G," a thymine "T" or a cytosine "C") or RNA (e.g., an A, a G, an uracil "U" or a C).

[086] The term "nucleic acid molecule" includes but is not limited to single-stranded RNA (ssRNA), double-stranded RNA (dsRNA), single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), small RNAs, circular nucleic acids, fragments of genomic DNA or RNA, degraded nucleic acids, amplification products, modified nucleic acids, plasmid or organellar nucleic acids, and artificial nucleic acids such as oligonucleotides.

[087] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGACGTCGTCAAAGAAGTTTACAGTTGAAGTTGAACCGGCGATTCCGGCCAA GGATGGAAAACCGTCGGCTGGACCGGTTTACCGTAGTATCTTTGCTAAAGACG GTTTTCCAGCTCATATTGACGGTTTAGATTCATGTTGGGATATTTTCCGCCTATC TGTGGAGAAATACCCCAATAATCGAATGCTTGGCACCCGTGAATTTGTGAATG GAAAGCATGGACCATATGTATGGTCGACTTACAAACAAGTATACGACAAGGTG

ATAAAGGTTGGAAATGCTATCCGTGCGTGTGGTGTCGAGCCAGGTGGTCGGTG

TGGGATCTATGGTGCCAATTGTGCAGAATGGATTATGAGCATGGAGGCATGTA

ATGCTCATGGGCTTTACTGTGTACCTTTATACGATACCTTAGGTGCTGGTGCAA

TTGAATTCATTCTTTGCCATGCCGAGGTTACAATTGCTTTTGTAGAAGAGAAAA

AGATCCCTGAGTTGTTGAAAACATTTCCGAAAGCTGGAGAATTTCTGAAAACA

ATTGTGAGCTTTGGAAAAGTTACTCCTGAACAAAGAGAACAAGCTGAAAACTT

TGGTTTAAAAATACATTCATGGGATGAATTCTTGACATTGGGTGATGATAAAA

ACTTTGACCTGCCACTGAAGGAAAAAACTGATATCTGTACAATAATGTACACT

AGTGGAACAACTGGTGATCCTAAGGGTGTTCTGATTTCAAATAACAGCATGGC

AACACTTATAGCTGGCGTCAATCGTCTACTAGATAGTGCAAAAGAATCTTTGA

ATCAACATGATGTCTATCTCTCGTTTTTACCTCTGGCACATATATTTGACCGTGT

GATTGAAGAATGTTTTATCAATCATGGAGCATCTATAGGATTCTGGCGTGGGG

ATGTTAAATTGCTGATTGAAGACATAGGGGAGCTGAAACCTACTATTTTCTGC

GCTGTTCCTCGAGTGTTGGATAGGATTTATTCAGGTTTGCAACAGAAAATTTCT

GCGGGGGGTTTTATCAAACGTAACTTATTTAATCTAGCCTATTCATACAAATTA

CGTAATATGAAGGGAGGGAAAACACATTCAGAGGCATCTCCATTGAGTGACA

AAATCGTCTTCAGTAAGGTTAAGCAGGGCCTAGGAGGAAATGTACGAATTATT

CTATCTGGAGCTGCTCCACTAGCTCCACATGTAGAAGCTTACCTGAAAGTAGT

GGCATGTAGTCACGTCCTGCAAGGATATGGCCTGACAGAAACTTGTGCTGGAT

CATTTGTCTCACTGCCAAACGAAATGGAGATGCTGGGTACAGTGGGCCCACCT

GTACCAGTTTTGGATGCCCGACTGGAGTCTGTTCCGGAGATGAACTATGATGCT

TGTTCAAGCAAACCACAAGGAGAAATATGTATTAGAGGGGATGTTCTGTTTTC

AGGATACTACAAGCGTGAGGACCTTACAAAAGAAGTCTTTGTTGATGGGTGGT

TCCATACAGGTGATATCGGTGAGTGGCAACCAGATGGAAGCATGAAAATTATT

GACCGAAAGAAAAACATTTTTAAGCTCTCACAAGGAGAGTACGTCGCAGTTGA

AAATCTGGAGAATGTTTATGGAAATGTTTCTGACATTGACACGATATGGATAT

ATGGGAACAGCTTCGAGTTTTGTCTTGTTGCTGTGGTCAACCCAAATGAGCCAG

CAATCAAACGTTATGCTGAAGCAAATAATATTTCTGGGGATTTTGATTCATTAT

GTGAAAATCCCAAAATTAAAGAATACATACTCGGAGAGCTCGCTAGAATTGGA

AAAGAGAAAAAGTTAAAAGGTTTTGAATTCGTCAAAGCTGTTCACCTTGACCC

TGTCCCTTTCGACATGGAACGTGACCTTCTGACCCCAACATTCAAGAAGAAAA

GGCCCCAGATGCTTAAGTACTACCAGGATGTAATTGATAACATGTACAAGACT ATTAACAAGAAGTGA (SEQ ID NO: 1).

[088] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 1, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 97%, 95% to 99%, or 90% to 100% homology or identity to SEQ ID NO: 1. Each possibility represents a separate embodiment of the invention.
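
By way of illustration only, a percent identity value of the kind recited above could be computed from two pre-aligned sequences as sketched below. The document does not specify an alignment tool or an identity formula, so the definition used here (identical columns divided by alignment columns, with gaps counted as mismatches) is an assumption, as are the function names and the one-base variant. The reference fragment is the first 20 bases of SEQ ID NO: 1.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding identical bases.

    Both inputs must already be aligned to equal length; '-' marks a gap.
    Columns in which both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, minimum=89.0):
    """True if the percent identity is at least `minimum` (89% by default)."""
    return percent_identity(aligned_a, aligned_b) >= minimum


reference = "ATGACGTCGTCAAAGAAGTT"  # first 20 bases of SEQ ID NO: 1
variant = "ATGACGTCGTCTAAGAAGTT"    # hypothetical variant, one substitution

print(percent_identity(reference, variant))
print(meets_threshold(reference, variant))
```

In practice the two sequences would first be aligned over their full length with a dedicated alignment program, and the resulting value depends on the alignment parameters chosen.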

[089] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGGATGCATTGAGGAAGCCTAATTCTGCGAATTCAAGCCCTTTAACTCCTATC GGATTCCTTGAAAGGGCAGCCGTCGTATTTGCCAACTCTCCTTCGATCGTATAC AACAATCTCATCTACACTTGGAGCGATACTTTTCATCGTTGTCTACGATTAGCT TCATCCATCTCTCGTCTCGCTATACGAAAAGGCGACGTTGTTTCAGTACTCGCA CCAAACATCCCTGCCATTTATGAGCTTCATTTTGGCATCACTATGACTGGGGCC ATAATCAACACCATCAATACCCGTTTGGATGCGCGTACTATCTCAATACTCCTT TGTCACAGTGAATCCAAGCTCGTCTTTGTTGATTACCAGTTGACTCGTCTTATA CGAGAAGCGGTTTCTTTGATGCCAGATGCTTGTGTTCCCCCACAACTCGTCCTC ATCGTAGATGACGGACATAATCTATCTTTACTTTCTGATCAATTTATCAATACT TATGAAGCTATGGTTGAAACAGGGGATCCTGGGTTCAATTGGGTTCGTCCAGA TAGCGATTGGGACCCTCTAACGTTGAATTACACTTCTGGGACGACTTCTTCCCC CAAAGGTGTTGTTAACAGCCACCGTGGATCGTTCATAGTAGCGTTTGATTCTTT ACTGGAGTGGCACGTACCGAAACAGCCGATCATGCTGTGGACTCTACCAATGT TCCACGCAAATGGGTGGAGCTTCGTTTGGGGTATGGCAGCTGTTGGTGGCACC AATGTTTGCCTTCGTAAATTCGATGCTACTATTATTTATGACACCATTCGTAAC CACCATGTGACGCACATGTGTGGCGCCCCTGTTGTACTCAACATGTTATCAGAA GGTAAGCCACTTGAACACACGGTTCACATAATGACAGCAGGAGCACCACCTCC AGCGGCCGTTTTGTTGCGAACCGAGTCGCTAGGGTTTGAGGTGACTCATGGGT TCGGGATGACAGAAACAGGCGGGTTAGTTGTGTCATGCTCATGGAAGAAAGA ATGGAATCGTCTGCCCGTGACTGAGAAAGCGAGATTGAAAGCGAGACAAGGA GTTAGAACACTTGGGATGACGGAAGTGGATATTGTGGATCCCGAGTCAGGAGT AAGTGTGACTCGAGACGGGTTAACTCAGGGGGAATTAGTGTTGCGAGGTGGGT CTATTATGTTGGGTTACTTAAAAGATCCGGAAACAACAAATAAATCCGTTAAA AACGGGTGGTTTTATACCGGCGACGTGGCGGTGATGCATCCAGATGGATATCT GGAAATAAAAGATAGATCAAAAGATGTAATAATAAGTGGTGGTGAGAATATA AGTAGTGTGGAGGTTGAGTCAATCTTGTATCAGCATCCTGCGATTAACGAGGC CGCGGTGGTGGGACGGCCTGATGAGTTTTGGGGCGAGTCGCCGTGTGCTTTCG TGAGTTTGAAAGATGATAACGGGAAGGTGGCTGTGCCAACAGCGGATGAGAT AATGAAGTTTTGTAAAGGAAAGTTGCCGGGTTACATGGTACCCAAATCGGTTG TGTTTAAGAAGGATCTTCCGAAGACATCTACCGGTAAGATTCAGAAATATGTG CTTAGAAAACTTGCTAAAGATTTGGGTTTTGCTGTAAAAAGTCGAATTTAG (SEQ ID NO: 2).
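
Because the sequences in this listing are wrapped with spaces and line breaks, a reader transcribing them may wish to confirm that a cleaned sequence still reads as a complete coding sequence. The sketch below is a minimal check under stated assumptions: it uses the standard stop codons TAA, TAG and TGA, the function names are hypothetical, and the short test sequence is invented for the example rather than taken from any SEQ ID NO.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_sequence(raw):
    """Remove whitespace left by line wrapping and upper-case the bases."""
    return "".join(raw.split()).upper()


def looks_like_complete_orf(raw):
    """Check for an ATG start codon, a length divisible by three, a terminal
    stop codon, and no in-frame stop codon before the end."""
    seq = clean_sequence(raw)
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


# Hypothetical 18-nt sequence with the kind of wrapping seen in the listing.
toy = "ATGGAT GCATTG\nAGGTAG"
print(clean_sequence(toy))
print(looks_like_complete_orf(toy))
```

A check of this kind only detects gross transcription problems such as a dropped base; it does not confirm that the sequence matches the deposited sequence listing.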

[090] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 83%, at least 87%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 2, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 85%, 80% to 92%, 82% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 2. Each possibility represents a separate embodiment of the invention.

[091] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGACCGAAGAGGAAAAAAATAAAGCAGAGTCCATGGGGATAAAAACGTATG CATGGAGCGACTTCCTTCATCTGGGGAGTAAAAATCCTTCAGAACTGCAAACG CCTAAAGCAACTGATATATGTACAATCATGTACACTAGTGGCACTAGTGGAGA CCCAAAAGGTGTTATATTGACACATGAAAATGCTACAACAAACATACGAGGGG TTGATCTTTTCATGGAACAATTCGAGGACAAGATGACCGTGGATGACGTTTAT ATATCTTTCTTGCCTCTTGCTCACATTCTTGATCGTATGATTGAAGAATACTTTT TCCGTAGTGGTGCCTCTGTCGGCTTCTATCATGGGGATATCAATGCGTTGAAGG AGGATTTGGCAGAGCTAAAGCCTACTTTTTTGGCTGGAGTACCTCGAGTTTTGG AAAAGATTCACGAAGGTGTGCTTAAAGGACTAGAAGAAGTTAATCCAAGGAG AAGGAAAATATTTAGCATTTTATACAATCACAAACTAAAATACATGAAAGCAG GTTACAAGCATAAATATGCATCACCACTTGCAGATCTGCTTGCTTTTAGAAAGG TTAAGAACAGGCTTGGTGGGCGAATTCGTCTTATGGTATCTGGAGGAGCTCCG TTAAGCACTGAGATTGAAGAGTTCATGAGGGTTACTTCATGTGCTTTTGTGGCG CAAGGATATGGTTTGACGGAAACATGTGGTTTGGCTACTTTAGGATTTCCAGAT GAGATGTGCATGATTGGAACAGTTGGTTCGCCCTTCGTGTATACAGAATTACG CCTCGAAGAAGTTTCAGATATGGGCTATGACCCGTTGGCCAATCCACCACGTG GTGAAATATGTGTTAAGGGAAAAACGCCTTTCGCAGGTTACTACAAGAATCCA GAACTCACTAATGAGGTCATGAAAGATGGGTGGTTTCATACAGGTGACATAGG AGAGATGCAACCAAACGGGGTATTGAAAATCATCGACAGAAAGAAACATCTG ATAAAACTATCTCAAGGGGAGTATATCGCGCTTGAATATCTAGAGAAAGTTTA CTGCATCACTCCCATTCTTGAAGACATCTGGGTATATGGGGATAGCTTCAAGTC ATCATTGGTCGCGGTAGCTGTACCAAACAAAGAAAACGCAGAAAAGTGGGCC GATCAAAAGGGCCTTAAAGTTTCTTACTCTGAGCTCTGCACACTAACACAGTTC AGAGATTATATCCAATCTGAACTGAAATCTACCGCGGAGAGAAACAAGCTAAG AGGTTTTGAGCATATAAAGGCTATAATTGTGGAGCCACGGACGTTTGAAGGAG ACCAGGAATTGTTGACTGCAACAATGAAGAAACGTAGAAATAAACTGCTTAAC CGTTACAAGGAGGGGATCGACAACCTTTACAAGAACTTGGCTGCAAACAAACG CTGA (SEQ ID NO: 3).

[092] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 3, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 86% to 94%, 88% to 97%, 86% to 100%, or 92% to 99% homology or identity to SEQ ID NO: 3. Each possibility represents a separate embodiment of the invention.

[093] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGGTGTACAAGTCTTTGAATTCAATATCCATATCAGATATAGTAAATCTTGGT ATATCACCTGAAACTGCAACTCAACTTCATCAGAAACTAACTGAAATCATTCA GATTTATGGTTTTGATGCTCCTCAAACATGGACCCAGATATCCACCCGGATTCT TCATCCGGACCTTCCCTTTTGTTTTCATCAGATGATGTATTATGGATGCTATGTT GATTTTGGACCGGATCCTCCTGCTTGGTCACCCGACCCGAAGGATGCAAAGTT AACAAACATAGGTAGTTTATTAGAGAGACGCGGAAAGGAGTTCTTGGGGCCTA GTTATAAAGATCCCATTTCAAGCTACTCTGCTCTTCAGGAATTTTCAGCCTTAA ATCTAGAGGTGTTTTGGAAAACAATATTGGATGAAATGAATATAACATTTTCT GTGCCTCCAAAACGCATATTAGTTGATGACCTGTCTAAAGAAAGCCAGTTATT GCATCCAGGTGGTCGATGGCTTCCCGGAGCTTATGTAAATCCAGCTAGAAATT GTTTGAGTTTAAGTAGCAAGAGAAGGTTAAGTGATATAGCAGTTATATGGCGT GATGAAGGAAATGATGATATGCCGGTCAACAAAATGACGTTTCAGCAGTTGCG CTCAGAGGTTTGGTTAGTTGCATATGCACTTGATACATTGGGAGTGGAAAAAG GATCTGCAATTGCAATCGATATGCCTATGGATGTCAAATCTGTGGTGATTTATC TAGCCATTGTTTTAGCAGGCTATGTGGTTGTATCTATTGCAGATAGTTTTGCTG CTGGTGAAATTTCGACCAGACTTGTATTATCAAAAGCAAAAGCAATTTTTACTC AGGATTTGATCATTCGTGGTGACAGAAGCCATCCCTTGTACAGCCGAGTTGTTG ATGCTCAATCACCTCTAGCAATTGTCATTCCTACGAGAGGCTCAAGTTTTAGTA TAAAATTACGTGACGGTGATATTTCTTGGCATGATTTTCTGGAACGAGCTAACA CTTACAGGAATGTTGAGTTTGTTGCTGTTGAACGACCCGTTGAAGCTTTCTCAA ATATCCTTTTCTCATCAGGAACTACAGGGGAACCGAAGGCAATTCCATGGACC CTTGCAACACCTTTCAAGGCTGGTGCAGACGCTTGGTGCCACATGGATGTCCA CAAAGGTGATGTTGTTGCATGGCCTACTAATCTTGGATGGATGATGGGTCCTTG GCTAATATATGCTTCATTGTTAAATGGGGGCTCACTTGCATTATACAACGGATC TCCCCTGACTTCTGGATTTGCCAAGTTTGTTCAGGATGCAAAAGTAACATTGTT GGGAGTGATACCAAGTATTGTGAGGGCATGGAGAACAAACAATAGTACAGCC GGCTTTGACTGGTCAACCATCCGGTGCTTTGGATCGACCGGTGAGGCCTCTAAT ACTGATGAATGTCTTTGGCTGATGGGAAGAGCTCATTACAAACCGGTCATCGA GTATTGCGGTGGCACAGAGATTGGTGGTGGTTTTATTACAGGATCTTTACTGCA GCCTCAGTGTTTGTCTGCTTTCAGCACACCAAGTTTGGGTTGTAAACTGTTAAT TCTTGGCGAAGATGGAATCCCTATACCACAAAACGCTCCTGGAATTGGTGAAT TGGCTCTGAATCCCCTCATGTTTGGGGCATCGAGCACACTACTAAATGCAAAC CACTATGATGTCTACTTTAAAGGCATGCCCTCTTGGAATGGTAAGGTTCTAAGA AGGCATGGAGATGTATTTGAGCGCACGTCTAAAGGATACTATCGTGCCCATGG TCGTGCAGATGATACTATGAATCTTGGGGGTATTAAGGTAAGTTCGGTTGAGA 
TTGAACGTGTATGCAACTCGATTGATGACAGAATTCTCGAGACAGCGGCTATA GGGGTTACACCTTCTGGTGGCGGGCCAGAGAGGTTGGTAATTGTTGTTGCTTTT AAAGATGGCAGTGGTTCGAAACCCGACTTAATCAAGTTGAAGGTCACACTGAA TTCAGCTTTACAAAAGAATCTGAACCCTTTGTTTAAGGTTTCTGATGTGGTGCC CTTTCCATCACTTCCTAGGACAGCAACAAACAAGGTAATGAGAAGGGTTTTGC GACAGCAGTTGACTCAAATTGGTCAAAATAGCAAGCTATAA (SEQ ID NO: 4).

[094] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 4, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 89% to 99%, 91% to 98%, or 88% to 100% homology or identity to SEQ ID NO: 4. Each possibility represents a separate embodiment of the invention.
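By way of non-limiting illustration only, a percent identity value of the kind recited above may be computed over a pairwise alignment of a candidate sequence against a reference sequence. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps. The function name percent_identity and the convention of counting gap-containing columns in the denominator are assumptions made for this example and are not definitions taken from this disclosure; other conventions give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: both inputs are the same length and use '-' for gaps.
    Columns where both sequences have a gap are ignored; columns with a
    gap in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Keep every column except those that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")

    # A match requires identical, non-gap characters.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy 10-nucleotide fragments differing at one position.
print(percent_identity("ATGGTGTACA", "ATGGTCTACA"))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch covers only the counting step.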

[095] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGGGTGATTCAGAGGGAAGCAGCATTAGTACTCCTACAACTGAACAAGTTGG TTTCTTGTCAAATATCATGGAAGACAAATCTTATAGTGCTGCAGTTGCAATTAT GGTTGCCATTGCTGTACCGTTGGTTCTTTCTTCAGTGTTTGCAGCGAAGAAGAA AGTGAAACAACGAGGCGTTCCCGTTCAAGTTGGTGGTGAGCCAGGTTTTGCCA TGCGTAACTCTAGATCAAACAAATTAGTTGATGTCCCATGGGAAGGAGCTAGA

ACAATGGCTGCTCTTTTTGAGCAGTCTTGTAAGAAGCATTCACAGCTTCGGTTT

CTTGGTACAAGGAAGTTGATTGAAAGAAGCTTTGTGAGTGGTAGTGATGGGAG

AAAATTCGAGAAGTTACATCTTGGGGAGTATCAGTGGGAGACATATGGGCAGA

TATTTGAACGTGTTTGCAACTTTGCATCTGGACTTATTCAGCTTGGTCATGACC

CTGATACTCGTATTGCCATCTTTTCTGACACACGAGCTGAATGGTTAATTGCAT

TTGAGGGATGCTTCAGGCAGAACATCACTGTGGTTACCATATATGCATCATTA

GGTGATGATGCCCTCATTCACTCTCTTAACGAGACTAAAGTATCGACCTTGATT

TGTGATTCCAAACTATTGAAAAAAGTGGCTGCAGTTAGTTCAAGCCTGAAAAC

TGTAGAAAACTTCATCTACTTTGAAAGTGACAACACTGAAGCTTTAAATGAAA

TCGGTGATTGGAAAATATCTTCTTTTTCTGAAGTCGAGAGCTTGGGACAGAAG

AGTCCAGTAAGTGCTAGACTGCCTATCAAGAAAGACGTTGCAGTGATCATGTA

TACAAGTGGCAGCACAGGTTTACCAAAGGGGGTGATGATGACTCATGGGAATG

TAGTAGCAACTGCAGCTGCGGTTATGACTGTAATCCCAAATATTGGGACCAAT

GATGTTTATCTGGCATACTTACCATTGGCTCATATTTTCGAGTTGGCTGCTGAG

ACTGTGATGGTAACTGCAGGTATTCCAATTGGTTATGGTTCAGCACTCACTTTA

ACAGACACATCAAATAAAATCAAGAAAGGAACCTTGGGAGATGCATCCATCTT

GAAGCCAACGTTAATGGCAGCTGTTCCAGCTATTTTAGATCGTGTCCGAGATG

GAGTATTAAAGAAGGTTGAGGAAAAGGGAGGTTTGACAACAAAAATATTCAA

TATAGCCTACAAAAGGCGTTTGCTAGCAGTAGATGGAAGTTGGCTGGGTGCAT

GGGGGTTAGAGAAGCTATTGTGGGATGCCATTGTTTTTAAGAAGATTCGTTCTG

TACTTGGAGGAGATATCCGTTTCATGCTCTGTGGTGGTGCTCCTTTAGCTGCAG

ATACTCAGCGATTTATAAATGTCTGCGTTGGGGCTCCAATTGGACAAGGATAT

GGGCTGACCGAAACATGCGCTGGAGCTGCTTTCTCTGAGGCAGATGATAATTC

TGTTGGGCGTGTTGGTCCACCACTTCCTTGTGTCTATATTAAACTTGTTTCATGG

GATGAAGGTGGGTATTTAACATCAGACAAACCAATGCCGCGAGGCGAAGTTGT

AGTTGGTGGGTACAGTGTAACCGCTGGTTACTTTAATAATGAGGAAAAGACCA

ATGAGGTTTACAAGGTTGATGAAAGTGGGATGCGTTGGTTCTACACTGGGGAC

ATTGGAAGGTTTCATCCTGATGGATGCCTTGAAATCATTGACAGGAAGAAGGA

TATTGTAAAACTTCAACATGGAGAGTACATCTCCTTGGGGAAGGTTGAGGCAG

CACTTGCGTCAAGCAAGTATGTAGAGAATGTAATGTTACATGCCGACCCCTTC

CACACTTATTGTGTCGCCTTAGTTGTCCCTGCGCGTCAGGTTATAGAACAGTGG

GCTCAAGATGCGGGTATTAGTTACCAAGATTTTGCTGAGTTGTGTGATAAAAA

GGAAACTGTCTCTGAGGTTCAGCAATCCCTTACCAAGGTAGCAAAAGATGCAA

AACTAGACAAGTTTGAAACGCCTGCAAAGATAAAGCTGATGCCAGATCCATGG ACTCCTGAATCTGGATTAGTAACAGCGGCTCTTAAGTTAAAAAGGGAACAACT

GAAGTCCAAATTTAAGGATGATCTGGATAAGCTATATGGGTGA (SEQ ID NO: 5).

[096] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 5, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 89% to 99%, 91% to 98%, or 88% to 100% homology or identity to SEQ ID NO: 5. Each possibility represents a separate embodiment of the invention.
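The nucleic acid sequences in this section are printed with spaces and line breaks between blocks of nucleotides. By way of non-limiting illustration only, the following minimal sketch shows one way such a printed sequence might be joined into a single contiguous string and given a basic sanity check before any comparison. The names normalize_sequence and looks_like_orf are hypothetical, and the checks (ATG start, length divisible by three, terminal stop codon) are assumptions for this example rather than requirements stated in this disclosure.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def normalize_sequence(printed: str) -> str:
    """Remove all whitespace from a printed sequence and upper-case it."""
    cleaned = re.sub(r"\s+", "", printed).upper()
    unexpected = set(cleaned) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return cleaned


def looks_like_orf(seq: str) -> bool:
    """Heuristic check: ATG start, whole codons, terminal stop codon."""
    return (
        len(seq) >= 6
        and len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in STOP_CODONS
    )


# Toy fragment broken across lines, as in the printed listing.
printed = "ATGGGT GAT\n\nTCA TGA"
joined = normalize_sequence(printed)
print(joined, looks_like_orf(joined))
```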

[097] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGTCGGTTTACACCGTTAAAGTCGAGGATTCACGGGCAGCTTCCGGAGAAAC CCCGTCAGCAGGGCCGGTTTACAGGTGCATTTATGCCAAGGATGCTCTCATGG AACTGCCCCCCGGTTATGAATCTCCCTGGGACTTCTTTAGTGAGTCTGTTAAAA GAAACCCAAAGAACCCAGCACTAGGTCGTCGTCAAGTCATCGATGGAAAGGCT GGTGGTTATTCATGGCTTTCATATCAAGAAGCCTACAATTCTGCTCTACGCATT GCTTCTGCCATCAGAAGCCGATCTGTTAATCCTGGGGATCGGTGTGGTATATAT GGACCTAACTGTCCTGAATGGATAATCTCAATGGAGGCTTGTAACAGCAATGG CATAACCTATGTTCCCCTATATGATACACTTGGTGCTAATGCGGTTGAATACAT CATCAACCATGCAGAAATTTCTTTAGTTTTTGTTCAAGAGAACAAGTTGTCTGC TATTTTATCATGTCTTCCAAATTGCTCATCAAATCTTAAAACAATCGTCAGCTTT GGGAAGTTCTCTGAATCACAAAAGAACGAAGCCATGGAACATGGCGTCGATTG CTTCTCTTGGGAAGAGTTTTCTTCGATGGGGAATTTGGAAGATGAACTTCCTGC AAAAAATAAGACTGACATTTGCACCATAATGTATACAAGTGGAACAACGGGA GAGCCTAAGGGTGTCGTACTAAGTAACAGAGCTTTCATGTCCGAAGTCTTGTCT ATGCATGAACTACTCATAGAAACAGACAAACCGGGCACAGAAGAAGATACCT ACTTCTCTTTTCTTCCTTTGGCACATATATTTGATCAAATAATGGAGACGTATTT CATCTACAGTGGTGCTTCGATAGGGTTTTGGCAAGGAGATATCAGATACTTGA TTGAAGACCTTCTTGTGTTGCAGCCAACCATATTTTGTGGTGTTCCAAGAGTTT ATGACCGCATTTATACGGGCATAATGGCTAAGATTTCAACTGGAGGTGCTATT CGGAAGGCATTATTTGATTTTGCATACAACTATAAATTAAGGAACCTTGAAAA GGGAATACAACAAGACAAATCAGCTCCTCTTTTGGACAAGCTGGTCTTCGATA AGATTAAACAAGGGTTTGGAGGAAGGGTTCGTCTTATGTTATCTGGAGCCGCA CCTTTGCCAAAACACGTGGAGGAATTTTTAAGAGTGACGTGCTGTACCGTTCTC TCACAAGGATACGGACTTACTGAAAGTTGTGGTGGATGCTTTACATCCATTGC GAATGTGTACTCTATGATCGGGACTGTTGGTGTACCCATGACAACTATTGAAG CAAGACTTGAGTCAGTGCCAGAGATGGGATATGATGCACTCAGTAGTGTGCCA TGTGGCGAAATTTGCCTCAGGGGAAACACACTATTTTCTGGGTACCACAAACG AGACGATCTAACTGATGCTGTCCTTGTAGATGGCTGGTTCCATACAGGTGACAT TGGGGAATGGCAGGCAGATGGAGCAATGAAAATCATTGACAGGAAAAAGAAT ATATTCAAATTGTCTCAAGGAGAATATGTTGCAGTTGAAAGTATTGAAAGCAC CTATTCACGGTGTCCTTTGGTTACCTCGATTTGGGTGTACGGCAATAGTTTTGA ATCTTTTCTAGTTGCGGTTGTGGTTCCCGATAGAGTAGCAGTTGAAGAGTTTGC TGCAAAGAACAATGAATCAGGAGATTATGCATCGTTGTGCAAGAACCCAAATG TCAGGAAATATGTTCTTGAAGAGCTGAATGCTGAAGCTCAATGCAATAAACTT CGCGGGTTTGAGATGCTAAAAGCAGTTCATTTGGATCCAGTCCCATTTGACTTC 
GAGAGGGATTTAATAACACCAACCTTTAAACTAAAAAGACAGCAGCTTCTAAA ATACTATAAGGATTGCGTTGAACAACTATATGCTGAAGCAAAGACATCCAAGA AATGA (SEQ ID NO: 6).

[098] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 6, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 99%, 91% to 98%, or 89% to 100% homology or identity to SEQ ID NO: 6. Each possibility represents a separate embodiment of the invention.

[099] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGGAAACTCATGGACCAAGGCTTCTAGGTGCAGCTTACAAAGATCCTATCAC GAGTTATAAACAGTTCCAAAAGTTCTCTGTTCAACATCTAGAGGTGTATTGGTC TCTTGTGTTAGAAAAGCTTTCAATCCAATTTCAGGAACGTCCAAAATGTATAGT AGATACTTCTGACAAATCAAAACACGGGGGCACATGGCTTCCCGGTTCAGTTT TGAACATTGCGGAGTGTTGTATATTGTCAACTACTGAAACAGATGAAAAGGTT GCGATTGTGTGGCGGGATGAAAGATGTGATAATCTGGATGTAAACAAGATGAC ATTCAAAGAATTGCGACAACAAGTAATGTTGGTTGCAAATGCATTGAAGTTAT TGTTTTCAAAAGGAGATCCTATTGCAATTGATATGCCAATGACAGTTACTGCAG TAATTCTATATTTGGCGATTGTATATTCTGGATTTGTGGTTGTATCTATAGCTGA CAGTTTTGCAGCTAAAGAGATTGCAACACGATTACGTGTATCTAATGCAAAGG CTATCTTTACTCAAGATTACATTGTTCGAGGTGGTCGAAGATTTCCTTTGTACA GTCGAGTTATTGAAGCCACCCAATGTAGAGCCATCGTGGTTCCTGCGATAGGG GAAAACGTAGAAGTTATTTTAAGAAAACAGGACATTTCATGGGGCGATTTTCT TTCTGGTGCAAAACAGCTTCCTAGCCCGGATTATTGCTCTCCAGTCTATCAATC CATAGACACGTTGACAAACATACTCTTCTCTTCGGGAACAACAGGAGACCCAA AAGCTATACCATGGACGCAAATATCTCCAATGAGATGTGCTGCTGACGGATGG GCTCATATGGATATTCAGGCTGGAGATGTTTATTGTTGGCCCACAAATCTGGGA TGGGTCATGGGACCCATTGTACTTTACTCGAGTTTTCTTACCGGTGCAACATTG GCTCTTTATAATGGCTCCCCTCTTGGTCATGGTTTTGGAAAATTTGTTCAGGAT GCAGGAGTGACAATTTTGGGCACGGTTCCAAGCATAGTCAAGTCTTGGAAGAG TACAAGATGTATGGAAGGACTGGACTGGACAAAGATAAAGGCATTTGGGTCG ACTGGTGAAGCTTCTAATGTCGACGATGACCTTTGGCTTTCCTCAAAGGCCTAC TACAAACCTGTTCTTGAATGCTGTGGAGGTACCGAGCTTGCATCTTCTTATGTT CAAGGGAATCTTCTACAGCCACAAGCCTTTGGAGCATTAAGCTCTGCTTCAAT GGGAACCGGATTTGTCATATTTGACGATCATGGAGTTCCTTACCCGGACGATG AACCCTGTGTTGGTGAAGTGGGTTTGTTTCCAGTATATATGGGAGCATCTGATA GACTACTGAATGCAGATCATGAAAAAATTTACTTCAAGGGAATGCCGAGTTAC AAAGGAATGCAACTAAGGAGACATGGAGATATCATCAAGAGAACAATTGGAG GATATTTGGTTGTACAAGGCAGGGCTGATGATACCATGAACCTTGGTGGCATA AAGACGAGCTCAATAGAAATTGAGCGTGTTTGTGAACAAGCTGATGGAAGCAT CATGGAAACTGCTGCAGTCAGTGTTGCACCTGCAACCGGTGGTCCAGAACTAT TAGCCATATTTGTGGTACTAAAGAACGGTTGCAACACTCAACCACAGGACCTA AAGATGATATTTTCAAAGGCCATTCAAAAAAACCTCAACCCATTGTTCAAGGT GAGCTTTGTAAAGGTTGTTCCAGAGTTCCCTCGAACCGCTTCTAACAAGTTATT GAGAAGAGTTTTAAGGAATCAAGTGAAGGAAGAGCTTCAAACTCGAAGTAAA ATATAA (SEQ ID 
NO: 7).

[0100] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 85%, at least 87%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 7, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 85% to 94%, 88% to 97%, 85% to 100%, or 92% to 99% homology or identity to SEQ ID NO: 7. Each possibility represents a separate embodiment of the invention.

[0101] In some embodiments, the DNA molecule comprises the nucleic acid sequence:

ATGGAGATCACTAAAAGCATCCAAGAATTAGGATTACAAGATCTACTAAACAC

TGGATTAACACCTAATGATGCAAAATCACTGCAAATCGAGATTAAACACATCA TTAATAGTCAAACTACTAATTCAAACCCAGTTGAGTTATGGCGTCAAATCACTT

CTGCAAAGCTGCTTAAACCCTCTTATCCTCATTCGTTGCACCAGCTCATCTACT

ACGCGGTGTACTGTAACTATGATGCATCCATCTATGGTCCTCCCCTGTATTGGT

TTCCATCTGAAATTGATTCTAAAAGGTCAAACTTGGGGAACATTATGGAAACT

CATGGACCAAGGCTTCTAGGTGCAGCTTACAAAGATCCTATCACGAGTTATAA

ACAGTTCCAAAAGTTCTCTGTTCAACATCTAGAGGTGTATTGGTCTCTTGTGTT

AGAAAAGCTTTCAATCCAATTTCAGGAACGTCCAAAATGTATAGTAGATACTT

CTGACAAATCAAAACACGGGGGCACATGGCTTCCCGGTTCAGTTTTGAACATT

GCGGAGTGTTGTATATTGTCAACTAGTGAAACAGATGATAAGGTTGCGATTGT

ATGGCGGGATGAAAGATGTGATAATCTGGATGTAAACAAGATGACATTCAAA

GAATTGCGACAACAAGTAATGTTGGTTGCAAATGCATTGAAGTTATTGTTTTCA

AAAGGAGATCCTATTGCAATTGATATGCCAATGACAGTTACTGCAGTAATTCT

ATATTTGGCGATTGTATATTCTGGATTTGTGGTTGTATCTATAGCTGACAGTTTT

GCAGCTAAAGAGATTGCAACACGATTACGTGTATCTAATGCAAAGGCTATCTT

TACTCAAGATTACATTGTTCGAGGTGGTCGAAGATTTCCTTTGTACAGTCGAGT

TATTGAAGCCACCCAATGTAGAGCCATCGTGGTTCCTGCGATAGGGGAAAACG

TAGAAGTTATTTTAAGAAAACAGGACATTTCATGGGGCGATTTTCTTTCTGGTG

CAAAACAGCTTCCTAGCCCGGATTATTGCTCTCCAGTCTATCAATCCATAGACA

CGTTGACAAACATACTCTTCTCTTCGGGAACAACAGGAGACCCAAAAGCTATA

CCATGGACGCAAATATCTCCAATGAGATGTGCTGCTGACGGATGGGCTCATAT

GGATATTCAGGCTGGAGATGTTTATTGTTGGCCCACAAATCTGGGATGGGTCA

TGGGACCCATTGTACTTTACTCGAGTTTTCTTACCGGTGCAACATTGGCTCTTT

ATAATGGCTCCCCTCTTGGTCATGGTTTTGGAAAATTTGTTCAGGATGCAGGAG

TGACAATTTTGGGCACGGTTCCAAGCATAGTCAAGTCTTGGAAGAGTACAAGA

TGTATGGAAGGACTGGACTGGACAAAGATAAAGGCATTTGGGTCGACTGGTGA

AGCTTCTAATGTCGACGATGACCTTTGGCTTTCCTCAAAGGCCTACTACAAACC

TGTTCTTGAATGCTGTGGAGGTACCGAGCTTGCATCTTCTTATGTTCAAGGGAA

TCTTCTACAGCCACAAGCCTTTGGAGCATTAAGCTCTGCTTCAATGGGAACCGG

ATTTGTCATATTTGACGATCATGGAGTTCCTTACCCGGACGATGAACCCTGTGT

TGGTGAAGTGGGTTTGTTTCCAGTATATATGGGAGCATCTGATAGACTACTGA

ATGCAGATCATGAAAAAATTTACTTCAAGGGAATGCCGAGTTACAAAGGAATG

CAACTAAGGAGACATGGAGATATCATCAAGAGAACAATTGGAGGATATTTGGT

TGTACAAGGCAGGGCTGATGATACCATGAACCTTGGTGGCATAAAGACGAGCT

CAATAGAAATTGAGCGTGTTTGTGAACAAGCTGATGGAAGCATCATGGAAACT

GCTGCAGTCAGTGTTGCACCTGCAACCGGTGGTCCAGAACTATTAGCCATATTT GTGGTACTAAAGAACGGTTGCAACACTCAACCACAGGACCTAAAGATGATATT TTCAAAGGCCATTCAAAAAAACCTCAACCCATTGTTCAAGGTTTTCTCCTAA (SEQ ID NO: 8).

[0102] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 84%, at least 87%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 8, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 84% to 94%, 88% to 97%, 84% to 100%, or 92% to 99% homology or identity to SEQ ID NO: 8. Each possibility represents a separate embodiment of the invention.

[0103] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGGTGTACAAGTCTTTGAATTCAATATCCATATCAGATATAGTAAATCTTGGT ATATCACCTGAAACTGCAACTCAACTTCATCAGAAACTAACTGAAATCATTCA GATTTATGGTTTTGATGCTCCTCAAACATGGACCCAGATATCCACCCGGATTCT TCATCCGGACCTTCCCTTTTGTTTTCATCAGATGATGTATTATGGATGCTATGTT GATTTTGGACCGGATCCTCCTGCTTGGTCACCCGACCCGAAGGATGCAAAGTT AACAAACATAGGTAGTTTATTAGAGAGACGCGGAAAGGAGTTCTTGGGGCCTA GTTATAAAGATCCCATTTCAAGCTACTCTGCTCTTCAGGAATTTTCAGCCTTAA ATCTAGAGGTGTTTTGGAAAACAATATTGGATGAAATGAATATAACATTTTCT GTGCCTCCAAAACGCATATTAGTTGATGACCTGTCTAAAGAAAGCCAGTTATT GCATCCAGGTGGTCGATGGCTTCCCGGAGCTTATGTAAATCCAGCTAGAAATT GTTTGAGTTTAAGTAGCAAGAGAAGGTTAAGTGATATAGCAGTTATATGGCGT GATGAAGGAAATGATGATATGCCGGTCAACAAAATGACGTTTCAGCAGTTGCG CTCAGAGGTTTGGTTAGTTGCATATGCACTTGATACATTGGGAGTGGAAAAAG GATCTGCAATTGCAATCGATATGCCTATGGATGTCAAATCTGTGGTGATTTATC TAGCCATTGTTTTAGCAGGCTATGTGGTTGTATCTATTGCAGATAGTTTTGCTG CTGGTGAAATTTCGACCAGACTTGTATTATCAAAAGCAAAAGCAATTTTTACTC AGGATTTGATCATTCGTGGTGACAGAAGCCATCCCTTGTACAGCCGAGTTGTTG ATGCTCAATCACCTCTAGCAATTGTCATTCCTACGAGAGGCTCAAGTTTTAGTA TAAAATTACGTGACGGTGATATTTCTTGGCATGATTTTCTGGAACGAGCTAACA CTTACAGGAATGTTGAGTTTGTTGCTGTTGAACGACCCGTTGAAGCTTTCTCAA ATATCCTTTTCTCATCAGGAACTACAGGGGAACCGAAGGCAATTCCATGGACC CTTGCAACACCTTTCAAGGCTGGTGCAGACGCTTGGTGCCACATGGATGTCCA CAAAGGTGATGTTGTTGCATGGCCTACTAATCTTGGATGGATGATGGGTCCTTG GCTAATATATGCTTCATTGTTAAATGGGGGCTCACTTGCATTATACAACGGATC TCCCCTGACTTCTGGATTTGCCAAGTTTGTTCAGGATGCAAAAGTAACATTGTT GGGAGTGATACCAAGTATTGTGAGGGCATGGAGAACAAACAATAGTACAGCC GGCTTTGACTGGTCAACCATCCGGTGCTTTGGATCGACCGGTGAGGCCTCTAAT ACTGATGAATGTCTTTGGCTGATGGGAAGAGCTCATTACAAACCGGTCATCGA GTATTGCGGTGGCACAGAGATTGGTGGTGGTTTTATTACAGGATCTTTACTGCA GCCTCAGTGTTTGTCTGCTTTCAGCACACCAAGTTTGGGTTGTAAACTGTTAAT TCTTGGCGAAGATGGAATCCCTATACCACAAAACGCTCCTGGAATTGGTGAAT TGGCTCTGAATCCCCTCATGTTTGGGGCATCGAGCACACTACTAAATGCAAAC

CACTATGATGTCTACTTTAAAGGCATGCCCTCTTGGAATGGTAAGGTTCTAAGA AGGCATGGAGATGTATTTGAGCGCACGTCTAAAGGATACTATCGTGCCCATGG TCGTGCAGATGATACTATGAATCTTGGGGGTATTAAGGTAAGTTCGGTTGAGA TTGAACGTGTATGCAACTCGATTGATGACAGAATTCTCGAGACAGCGGCTATA GGGGTTACACCTTCTGGTGGCGGGCCAGAGAGGTTGGTAATTGTTGTTGCTTTT AAAGATGGCAGTGGTTCGAAACCCGACTTAATCAAGTTGAAGGTCACACTGAA TTCAGCTTTACAAAAGAATCTGAACCCTTTGTTTAAGGTTTCTGATGTGGTGCC CTTTCCATCACTTCCTAGGACAGCAACAAACAAGGTAATGAGAAGGGTTTTGC GACAGCAGTTGACTCAAATTGGTCAAAATAGCAAGCTATAA (SEQ ID NO: 9).

[0104] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 9, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 89% to 99%, 91% to 98%, or 88% to 100% homology or identity to SEQ ID NO: 9. Each possibility represents a separate embodiment of the invention.

[0105] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGACGTTTCAGCAGTTGCGCTCAGAGGTTTGGTTAGTTGCATATGCACTTGAT ACATTGGGAGTGGAAAAAGGATCTGCAATTGCAATCGATATGCCTATGGATGT CAAATCTGTGGTGATTTATCTAGCCATTGTTTTAGCAGGCTATGTGGTTGTATC TATTGCAGATAGTTTTGCTGCTGGTGAAATTTCGACCAGACTTGTATTATCAAA AGCAAAAGCAATTTTTACTCAGGATTTGATCATTCGTGGTGACAGAAGCCATC CCTTGTACAGCCGAGTTGTTGATGCTCAATCACCTCTAGCAATTGTCATTCCTA CGAGAGGCTCAAGTTTTAGTATAAAATTACGTGACGGTGATATTTCTTGGCATG ATTTTCTGGAACGAGCTAACACTTACAGGAATGTTGAGTTTGTTGCTGTTGAAC GACCCGTTGAAGCTTTCTCAAATATCCTTTTCTCATCAGGAACTACAGGGGAAC CGAAGGCAATTCCATGGACCCTTGCAACACCTTTCAAGGCTGGTGCAGACGCT TGGTGCCACATGGATGTCCACAAAGGTGATGTTGTTGCATGGCCTACTAATCTT GGATGGATGATGGGTCCTTGGCTAATATATGCTTCATTGTTAAATGGGGGCTCA CTTGCATTATACAACGGATCTCCCCTGACTTCTGGATTTGCCAAGTTTGTTCAG GATGCAAAAGTAACATTGTTGGGAGTGATACCAAGTATTGTGAGGGCATGGAG AACAAACAATAGTACAGCCGGCTTTGACTGGTCAACCATCCGGTGCTTTGGAT CGACCGGTGAGGCCTCTAATACTGATGAATGTCTTTGGCTGATGGGAAGAGCT CATTACAAACCGGTCATCGAGTATTGCGGTGGCACAGAGATTGGTGGTGGTTT TATTACAGGATCTTTACTGCAGCCTCAGTGTTTGTCTGCTTTCAGCACACCAAG TTTGGGTTGTAAACTGTTAATTCTTGGCGAAGATGGAATCCCTATACCACAAAA CGCTCCTGGAATTGGTGAATTGGCTCTGAATCCCCTCATGTTTGGGGCATCGAG CACACTACTAAATGCAAACCACTATGATGTCTACTTTAAAGGCATGCCCTCTTG GAATGGTAAGGTTCTAAGAAGGCATGGAGATGTATTTGAGCGCACGTCTAAAG GATACTATCGTGCCCATGGTCGTGCAGATGATACTATGAATCTTGGGGGTATTA AGGTAAGTTCGGTTGAGATTGAACGTGTATGCAACTCGATTGATGACAGAATT CTCGAGACAGCGGCTATAGGGGTTACACCTTCTGGTGGCGGGCCAGAGAGGTT GGTAATTGTTGTTGCTTTTAAAGATGGCAGTGGTTCGAAACCCGACTTAATCAA GTTGAAGGTCACACTGAATTCAGCTTTACAAAAGAATCTGAACCCTTTGTTTAA GGTTTCTGATGTGGTGCCCTTTCCATCACTTCCTAGGACAGCAACAAACAAGGT AATGAGAAGGGTTTTGCGACAGCAGTTGACTCAAATTGGTCAAAATAGCAAGC TATAA (SEQ ID NO: 10).

[0106] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 10, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 97%, 95% to 99%, or 90% to 100% homology or identity to SEQ ID NO: 10. Each possibility represents a separate embodiment of the invention.

[0107] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGAATATAACATTTTCTGTGCCTCCAAAACGCATATTAGTTGATGACCTGTCT AAAGAAAGCCAGTTATTGCATCCAGGTGGTCGATGGCTTCCCGGAGCTTATGT AAATCCAGCTAGAAATTGTTTGAGTTTAAGTAGCAAGAGAAGGTTAAGTGATA TAGCAGTTATATGGCGTGATGAAGGAAATGATGATATGCCGGTCAACAAAATG ACGTTTCAGCAGTTGCGCTCAGAGGTTTGGTTAGTTGCATATGCACTTGATACA TTGGGAGTGGAAAAAGGATCTGCAATTGCAATCGATATGCCTATGGATGTCAA ATCTGTGGTGATTTATCTAGCCATTGTTTTAGCAGGCTATGTGGTTGTATCTATT GCAGATAGTTTTGCTGCTGGTGAAATTTCGACCAGACTTGTATTATCAAAAGCA AAAGCAATTTTTACTCAGGATTTGATCATTCGTGGTGACAGAAGCCATCCCTTG

TACAGCCGAGTTGTTGATGCTCAATCACCTCTAGCAATTGTCATTCCTACGAGA GGCTCAAGTTTTAGTATAAAATTACGTGACGGTGATATTTCTTGGCATGATTTT CTGGAACGAGCTAACACTTACAGGAATGTTGAGTTTGTTGCTGTTGAACGACC CGTTGAAGCTTTCTCAAATATCCTTTTCTCATCAGGAACTACAGGGGAACCGAA

GGCAATTCCATGGACCCTTGCAACACCTTTCAAGGCTGGTGCAGACGCTTGGT GCCACATGGATGTCCACAAAGGTGATGTTGTTGCATGGCCTACTAATCTTGGAT GGATGATGGGTCCTTGGCTAATATATGCTTCATTGTTAAATGGGGGCTCACTTG CATTATACAACGGATCTCCCCTGACTTCTGGATTTGCCAAGTTTGTTCAGGATG

CAAAAGTAACATTGTTGGGAGTGATACCAAGTATTGTGAGGGCATGGAGAACA AACAATAGTACAGCCGGCTTTGACTGGTCAACCATCCGGTGCTTTGGATCGAC CGGTGAGGCCTCTAATACTGATGAATGTCTTTGGCTGATGGGAAGAGCTCATT ACAAACCGGTCATCGAGTATTGCGGTGGCACAGAGATTGGTGGTGGTTTTATT

ACAGGATCTTTACTGCAGCCTCAGTGTTTGTCTGCTTTCAGCACACCAAGTTTG GGTTGTAAACTGTTAATTCTTGGCGAAGATGGAATCCCTATACCACAAAACGC TCCTGGAATTGGTGAATTGGCTCTGAATCCCCTCATGTTTGGGGCATCGAGCAC ACTACTAAATGCAAACCACTATGATGTCTACTTTAAAGGCATGCCCTCTTGGAA

TGGTAAGGTTCTAAGAAGGCATGGAGATGTATTTGAGCGCACGTCTAAAGGAT ACTATCGTGCCCATGGTCGTGCAGATGATACTATGAATCTTGGGGGTATTAAG GTAAGTTCGGTTGAGATTGAACGTGTATGCAACTCGATTGATGACAGAATTCT CGAGACAGCGGCTATAGGGGTTACACCTTCTGGTGGCGGGCCAGAGAGGTTGG

TAATTGTTGTTGCTTTTAAAGATGGCAGTGGTTCGAAACCCGACTTAATCAAGT TGAAGGTCACACTGAATTCAGCTTTACAAAAGAATCTGAACCCTTTGTTTAAG GTTTCTGATGTGGTGCCCTTTCCATCACTTCCTAGGACAGCAACAAACAAGGTA ATGAGAAGGGTTTTGCGACAGCAGTTGACTCAAATTGGTCAAAATAGCAAGCT

ATAA (SEQ ID NO: 11).

[0108] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 11, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 97%, 95% to 99%, or 90% to 100% homology or identity to SEQ ID NO: 11. Each possibility represents a separate embodiment of the invention.

[0109] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCATCCTCAATTAATATCTCCAAGATCAGAGAGGCTCAACGAGCACAAGG TCCAGCCTCTATTCTTGCTGTCGGTACCGCGAATCCGTCTAATTGCGTGTATCA AGCTGATTATCCTGATTACTACTTTCGAATCACTAAAAGTGAACACATGGTTGA TCTCAAACGGAAATTCAAGCGCATGTGTGACCAATCTATGATAAGAAAGCGGT ACATGCAAATTACGGAGGAGTATCTGAAAGAAAACCCCAACATTTGTGAATAC

ATGGCTCCATCACTTGACGCCCGTCAAGACGTTGTAGTCGTCGAAGTCCCAAA ACTCGGTAAAGAAGCCGCAACAAAAGCCATCAAAGAATGGGGCCAACCAAAA TCCAAAATTACCCATCTCATCTTTTGTACCACGTCCGGTGTCGACATGCCCGGA GCAGATTACCAGCTCACCAAACTCCTCGGTCTTTGTCCTTCAGTCAAACGCTTT ATGATGTACCAACAAGGTTGTTTTGCTGGTGGCACGGTTCTTCGTCTAGCTAAG

GACATCGCTGAGAACAATAAAGGTGCTCGTGTACTTGTCGTTTGTTCCGAGATT ACAGCTGTCATTTTTCGTGGACCCAACGACACTCACCTTGATTCACTTATCGGT CAAGCGTTATTTGGGGATGGGGCATCTTCGGTTATCGTGGGGTCTGACCCAGA CTTGACAACCGAGCGGCCATTGTTTGAAATCATATCGGCTGCACAAACGATTTT ACCGGACTCTGAAGGTGCGATAGATGGACACTTGAGGGAAGCTGGGTTAACTT

TTCATCTACTTAAAGACGTACCGAGGTTGATTTCGAAGAATATAGAGAAAGCT TTAACACAAGCATTTTCTCCCCTGGGAATTAGTGACTGGAACTCTATCTTTTGG GTCACGCACCCTGGTGGTCCAGCTATACTGGACCAAGTGGAACTCAAACTTGG ACTCAAAGAGGAGAAGATGAGAACCACTAGACATGTTCTCAGTGAATATGGG AACATGTCTAGTGCATGTGTTTTTTTTGTACTTGATGAAATGAGAAAGAGATCG

GCTAAAGGCGGTGCGAGGACCACCGGAGAAGGGTTAGATTGGGGTGTTCTGTT TGGGTTTGGTCCGGGTTTAACGGTTGAGACTGTGGTCCTTCATAGTCTCCCAAC TACTATGTCGATTGCGACTTAA (SEQ ID NO: 23).

[0110] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 85%, at least 87%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 23, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 88% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 23. Each possibility represents a separate embodiment of the invention.

[0111] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCATCCTCAATTAATATCTCCAAGATCAGAGAGGCTCAACGAGCACAAGG TCCAGCCTCTATTCTTGCTGTCGGTACTGCGAATCCGTCTAATTGTGTGTATCA AGCTGATTATCCTGATTACTACTTTCGAATCACTAAAAGTGAACACATGGTTGA TTTGAAAGAGAAATTCCAGCGCATGTGTGACAAATCTATGATAAGAAAGCGGC ACATTCACATTACGGAGGAGTTTTTGAAAGAAAACCCAAACCTTTGTGAATAC ATGGCTCCATCACTTGACACCCGTCAAGACGTTGTAGTCGTCGAAGTCCCAAA ACTCGGTAAAGAAGCCGCAACAAAAGCCATCAAAGAATGGGGCCAACCAAAA TCCAAAATTACCCATCTCATCTTTTGTACCACGTCCGGTGTCGACATGCCCGGA GCAGATTACCAGCTCACCAAACTCCTCGGTCTCCATCCTTCAGTCAAACGCTTT ATGATGTACCAACAAGGTTGTTTTGCTGGTGGCACGGTTCTTCGTCTAGCTAAG GACCTCGCTGAGAACAATAAAGGTGCTCGTGTACTTGCCGTTTGTTCCGAGATT ACAGCTGTCACGTTTCGTGGACCCAACGACACTCACATTGATTCACTTGTCGGT CAAGCATTATTTGGGGACGGGGCAGCTGCGGTTATCGTGGGGTCTGATCCTGA CTTGACAACTGAGCGGCCGTTGTTTGAAATCATATCGGCTGCACAAACGATTTT ACCGAACTCTGAAGGTGCGATAGATGGACATGTGAGGGAAGTTGGGGTAACT ATTCATATACTTAAAGACGTCCCGGTGTTGATTTCGAAGAATATAGAGAAAGC TTTAACACAAGCATTTTCTCCCTTAGGAATTAGTGACTGGAACTCGATCTTTTG GGTCGTACACCCTGGTGGTCCAGCTATACTGGACCAAGTGGAACTCAAACTTG GACTCAAAGAGGAGAAAATGAGAACCACTAGACATGTTCTCAGTGAATATGG GAACATGTCTAGTGCATGTGTTTTTTTTGTACTTGATGAAATGAGAAAGAGATC GGCTAAAGGCGGTGCGAGGACCACCGGAGAAGGGTTAGATTGGGGTGTTCTGT TTGGGTTTGGTCCAGGTTTAACGGTTGAGACGGTGGTCCTTCATAGTCTCCCAA CTACTATGTCGATTGCAACTTAA (SEQ ID NO: 24).

[0112] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 85%, at least 87%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 24, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 87% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 24. Each possibility represents a separate embodiment of the invention.

[0113] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCATCCTCAATTAATATCTCCAAGATCAGAGAGGCTCAACGAGCACAAGG TCCAGCCTCTATTCTTGCTGTCGGTACCGCGAATCCGTCTAATTGCGTGTATCA AGCTGATTATCCTAATTACTACTTTCGAATCACTAAAAGTGAACACATGGTTGA TCTCAAACGGAAATTCAAGCGCATGTGTGACCAATCTATGATAAGAAAGCGGT ACATGCAAATTACGGAGGAGTATCTGAAAGAAAACCCCAACATTTGTGAATAC ATGGCTCCATCACTTGACGCCCGTCAAGACGTTGTAGTCGTCGAAGTCCCAAA ACTCGGTAAAGAAGCCGCAACAAAAGCCATCAAAGAATGGGGCCAACCAAAA TCCAAAATTACCCATCTCATCTTTTGTACCACGTCCGGTGTCGACATGCCCGGA GCAGATTACCAGCTCACCAAACTCCTCGGTCTCTGTCCTTCAGTCAAACGCTTT ATGATGTACCAACAAGGTTGTTTTGCTGGTGGCACGGTTCTTCGTCTAGCTAAG GACATCGCTGAGAACAATAAAGGTGCTCGTGTACTTGTCGTTTGTTCCGAGATT ACAGCTGTCATTTTTCGTGGACCCAACGACACTCACCTTGATTCACTTATCGGT CAAGCGTTATTTGGGGATGGGGCATCTTCGGTTATCGTGGGGTCTGACCCAGA CTTGACAACCGAGCGGCCATTGTTTGAAATCATATCGGCTGCACAAACGATTTT ACCGGACTCTGAAGGTGCGATAGATGGACACTTGAGGGAAGCTGGGTTAACTT TTCATCTACTTAAAGACGTACCGGGGTTGATTTCGAAGAATATAGAGAAAGCT TTAACACAAGCATTTTCTCCCTTGGGAATTAGTGACTGGAACTCTATCTTTTGG GTCACGCACCCTGGTGGTCCAGCTATACTGGACCAAGTGGAACTCAAACTTGG ACTCAAAGAGGAGAAGATGAGAGCCTCTAGACATGTTCTCAGTGAATACGGG AACATGTCTAGTGCATGTGTTTTTTTTATACTTGATGAAATGAGAAAGAAATCG GATGAAGATGGTGCGCCGACCACTGGAGAAGGGTTAGATTGGGGTGTTCTGTT TGGGTTTGGTCCGGGTTTAACGGTTGAGACGGTGGTCCTTCATAGTCTCCCAAC TACTATGTCGATTGCGACTTAA (SEQ ID NO: 25).

[0114] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 87%, at least 89%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 25, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 88% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 25. Each possibility represents a separate embodiment of the invention.

[0115] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence: ATGGCATCCTCAATTAATATCTCTAAGATCAGAGAGGCTCAACGAGCACAAGG TCCAGCCTCTATTCTTGCTGTCGGTACTGCGAATCCATCTAATTATGAGATTCA AGCTGATTTTCCTGATTACTACTTTCGAGTCACTAAAAGTGAACACATGGCTGA TATGAAAGGGACATTCCAGCGCATGTGTGACAAATCTATGATAAGAAAGCGGC ACATGCTCATTACGGAGGAGTTTTTGAAAGAAAACCCAAACCTTTGTGAATAC ATGGCTCCATCACTTGACACCCGTCAAGACGTTGTAGTCGTCGAAGTCCCAAA ACTCGGTAAAGAAGCCGCAACAAAAGCCATCAAAGAATGGGGCCAACCAAAA TCCAAAATTACCCATCTCATCTTTTGTACTACAACTGGTGTCGACATGCCTGGA GCCGATTACCAGCTCACCAAGCTCCTCGGCCTCGCTCCTTCAGTCAAACGCTTT ATGATATACCAACAAGGTTGTTTTGCTGGTGGCACGGTTCTTCGTCTTGCTAAA GACATAGCTGAGAACAATAAAGGTGCTCGTGTACTTGCCGTATGTTCAGAGAT TACAGCTATGTCGTTTCGTGGGCCCAATGACACTCACGTTGATTCACTTGTCGG TCAAGCATTATTTGGGGACGGGGCAGCTGCAGTTATCGTGGGGTCTGATCCTG ACTTGACAACCGAGCGGCCGTTGTTTGAAATCATATCGGCTGCACAAACGATT TTACCAAACTCTGAAGGTGCGATAGATGGACATGTGAGGGAAGTTGGTTTAAC TATTCATATACTTAAAGACGTCCCGGTGTTGATATCGAAGAATATAGAGAAAG CTTTGACACAAGCATTTTCTCCCTTAGGAATTAGTGACTGGAACTCGATCTTTT GGATCGTACACCCTGGTGGTCCAGCTATACTGGACCAAGTGGAACTCAAAGTT GGACTCAAAAAGGAGAAAATGGCAACCAGTAGACATGTTCTAAGTGAATACG GGAACATGTCTAGTGCATGTGTTTTTTTTATAATGGATGAAATGAGAAAGAGA TCGGCTAAAGGCGGTGCGAGGACCACCGGAGAAGGGTTAGATTGGGGTGTTTT GTTTGGGTTTGGTCCAGGTTTAACGGTTGAGACGGTGGTCCTTCATAGTCTCCC

AACTACAATGTAG (SEQ ID NO: 26).

[0116] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 26, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 86% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 26. Each possibility represents a separate embodiment of the invention.

[0117] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCGGAGTTCACACATTTAGTGGTGGTTAAGTTCAAAGAAGAGGTGGTTGT GGAGGATATTATGAAAGGGTTGGAGAAACTTGTATCTCAACTTGATAGTGTCA AGTCCTTTGTTTGGGGAAAGGATATTGAAAGCATGGAGATGTTAAGGCAAGGA TTCACCCATGCAATCATGATGACATTTGGTTCTAAAGAAGATTTTACTGCATTT CAATCCCACCCAAACCATGTTGAATTCTCGGCTACGTTTTCAGCAGCAATCGAA AAGATCGTTCTTCTTGATTTCCCAGTTGTTGCTGTCAAGACTGCAACTGCTTGA (SEQ ID NO: 31).

[0118] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 72%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 31, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 72% to 95%, 72% to 100%, 75% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 31. Each possibility represents a separate embodiment of the invention.
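By way of non-limiting illustration only, the relationship between a coding nucleic acid sequence and the protein it encodes can be sketched as a codon-by-codon translation. The following minimal example assumes the standard genetic code and reading frame 1 starting at the first nucleotide. The function name translate and the choice to stop at the first stop codon are assumptions made for this example. The input shown is the first 18 nucleotides of SEQ ID NO: 31 as printed above.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    seq = "".join(cds.split()).upper()  # drop spaces and line breaks
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First six codons of SEQ ID NO: 31: ATG GCG GAG TTC ACA CAT
print(translate("ATGGCGGAGTTCACACAT"))  # MAEFTH
```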

[0119] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGTCGTCCTTACAAAACAAATTTATCGAACACATTGCTCTTATCAAAATCAAA CCCGGTGTTGAGTCTACCACCTTGATAGATAAACTCAACGGCCTTTCTTCGATT GAGGTGTTACTGCACTTCAGCGCGGGTGAACTCCTGGGATCATCCCACGGCTT CACTCACATCGTTCACTGCCGTGTCAGATCAAAGGATGATCTCCAAATCTACCT TACACATCCTATCCACTTGCATCTGGCTGATGATACTTTACCCTTACTTGATGA CGTCACCGTCGTTGACTGGTTTTCATCCAACTCTGATATTGTGGATCCTCCTAA ACCAGGATCTGCAATGAGAGTTACGCTGCTGAAGTTGAAACACGATTCGACTG AAAGTAATAAGTTAGTAGTGATTGAAGGAATTAAAAATCAGTTTAAAGGAATT GAAGACGTGATAGTTACAACTACTTTTGGTGAGAATTTGTTTCATGAAATGCAT GAGAATTTCTCGATTGAAATTGACAAAGGATACTCGATTGGTTCGATTGCCTTT GTTCCTGGATCTGCAGATTTCCAGGTTTTAAATTCAAAGGTAGATAATAATAAA CTCAATGATTTAACAGAAAGTGAAGTGGTGGTTGATTATGTGTTTCCATCAGCC AATTAA (SEQ ID NO: 32).

[0120] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 32, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 50% to 95%, 55% to 98%, 60% to 99%, or 50% to 100% homology or identity to SEQ ID NO: 32. Each possibility represents a separate embodiment of the invention.

[0121] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGTCCTCTGAAGAGCAGATCGTGGAACACGTGGTCCTGTTCAAAGTGAAACC TGATGCTGATCCTAGTAAAGTCGCGGCTTGGGTCAATGGGCTCAACGGTTTGA CCTCACTCCAGCTCGCCCTCCACCTCTCCGCTGGACAACTCATCCGGTGTCGGT CGTCGTCGCTCACCTTCACTCACATGCTTCACAGTCGTTACAGATCAAAGGAGC ATCTCCGGCAGTACACCGTTCATCCCGAGCACGTGCGCGTGGTTACAGAGGGT AAATCCATCATTGATGACGTCATGGCCCTTGATTGGATGATATCTAACGGCGCT GCTAGTAGCGTCTGTCCTAAGCCTGGATCAGCGGTGAGAGTTGGGTTTTATAA GTTAATGGAGAGTTTGGGGGAAATTGAGAAAGCTAGGGTTTTGGAAGTGATGG GAGGGATTGAAGAGTTAAGTGTTGGTGAGAGTTTTTGTGATGACAGGGCCAAG GGTTATACGATTGCTTCAACCGCCGTGTTTCCCAATGGCAATCCTGCTGCTGAT TTGGATTTATATCATTCCGGTGACCAGCTCCTGCTGAAAGAGGAAGTGATGAA GGATTCTATACAAAGTGTGGTGGTTGTTGATTACGTAATTCCATCTCCCTGA (SEQ ID NO: 33).

[0122] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 67%, at least 72%, at least 78%, at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 33, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 67% to 95%, 70% to 98%, 75% to 99%, or 67% to 100% homology or identity to SEQ ID NO: 33. Each possibility represents a separate embodiment of the invention.

[0123] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGAGAAGTGAAGCACATACTTTTAGCGAAGTTTAAGGATGGAATCTCGGA ACAACAGATCCAGCATCTCATCACAGGTTATGCTAACCTCGTCAATCTCGTTGA ACCCATGAAGTCTTTTCGATGGGGAAAAGATGTGAGCATTGAGAATCTGCACC AAGGCTTTACTCATGTGTTCGAGTCAACCTTTGAAACCACTGAAGGCATTGCA ACTTATATATCTCATCCTGCTCATGTCGAGTTCGCCACTGGTTTCCTGGATCAA CTGGAAAAAGTCATAGTCATCGACTACAAACCTACATCAGTTGACCCGTGA (SEQ ID NO: 34).

[0124] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 74%, at least 78%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 34, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 74% to 95%, 78% to 98%, 80% to 99%, or 75% to 100% homology or identity to SEQ ID NO: 34. Each possibility represents a separate embodiment of the invention.

[0125] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGCTATGTGCTCCAGCACGCACACGATTACTTCCATCAATTTCTCTCTTACCTT CCCAACATAACATCTTCCGCCGCCTGAACTGTCTCATCCACCGTCGCAACCACC ACCAAACGCCGATCACGATGTCTGCTCAACAACAAATCGTGGAACACGTAGTG CTCTTCAAAGTAAAACCGGATGTTGATTCTAGTAAAGTTGCTGCAATGGTCAA CGGACTCAACGGATTGACCTCACTCGATCTTACTCTCCACCTCTCCGCCGGACA GCTCCTCCGGTCACGGTCATCATCGCTGACCTTCACTCACATGCTTCACAGTCG TTACAGATCAAAGGACGATCTCCGGGAGTACGCTGCTCATCCTGACCACGTGC GAGTCGTGACGGAGAATATAAAACCGGTTATTGATGATATCATGGCTGTTGAT TGGATATCTAACGATGCCAGTGTATCGCCTAAGCCAGGGTCGGCGATGAGAGT AACATTTTTGAAATTAAAGGAGAATTTGGGGGAAAATGAGAAATCTAGGGTTT TGGAAGTGATTGGAGGAATCAAAAATCAGTTTAAATCAATTGAGGAGTTAAGT GTTGGTGAGAATTTTTCTCATGATAGAGCCAAGGGGTATACGATTGCTTCAATT GCTGTGTTACCCGGGCCTTCCGAGCTGGAGGCATTGGATTCGAATACTGAGCT GGTGAAGTTGGAAAAGGAGAAAGTGAAGGACTTACTGGAGAGCGTTGTGGTT GTTGATTATGTGATTCCATCTCTGCAATCGGCTAGTCTTTAA (SEQ ID NO: 35).

[0126] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 69%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 35, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 69% to 95%, 70% to 100%, 80% to 99%, or 68% to 100% homology or identity to SEQ ID NO: 35. Each possibility represents a separate embodiment of the invention.
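By way of non-limiting illustration only, a nucleic acid sequence written with line-wrap spaces, as in the listings herein, may be normalized and checked for an open reading frame as in the following sketch. The sketch assumes the standard genetic code and treats a sequence as a complete open reading frame when it begins with ATG, ends with a stop codon, has a length divisible by three, and contains no internal stop codon. All function names are hypothetical, and the example input is a short illustrative fragment rather than any complete sequence of the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_sequence(raw):
    """Remove whitespace, upper-case, and reject characters other than A, C, G, T."""
    seq = "".join(raw.split()).upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def codons(seq):
    """Split a sequence into consecutive complete codons, dropping any trailing remainder."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def is_complete_orf(seq):
    """True if seq starts with ATG, ends with a stop codon, and has no internal stop."""
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    triplets = codons(seq)
    return (
        triplets[0] == "ATG"
        and triplets[-1] in STOP_CODONS
        and not any(t in STOP_CODONS for t in triplets[:-1])
    )


def translate(seq):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for triplet in codons(seq):
        if triplet in STOP_CODONS:
            break
        protein.append(CODON_TABLE[triplet])
    return "".join(protein)


if __name__ == "__main__":
    fragment = clean_sequence("ATGGGAGAAGTG AAGCACTGA")  # illustrative fragment only
    print(is_complete_orf(fragment), translate(fragment))
```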

[0127] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCAGTTGCTCAACTTTCTTCCTCCCTCTGTATCTCCACACCCGCTAGAATCT CTACTGGTTCTGGGTTTTCGTCATCAGGTTTGCCTCGGATTGGGACAACGTTTG TATGCGGTTCAGGTTCGCCTCTTGTGATATCTGGAACATATCATCAGAAGGCTC GAGTACATAAGCCTGCAGCATTATCTGTGAGATGTGAACAAAGTAGTAAGGAT GGAAATGGTTTAAATGTGTGGCTTGGTCGAACAGCAATGGTTGGCTTTGCAGT GGCAATTAGTGTTGAAGTATCAACTGGGAAGGGGCTTCTTGAGAACTTTGGGC TCACATCACCCTTGCCAACAGTGGCCTTGGCACTGACTGCACTTGGGGGCGTTC TTACAGCACTTTTCATCTTCCAGTCTGCTTCTGAGAGTTGA (SEQ ID NO: 36).

[0128] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 73%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 36, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 73% to 95%, 73% to 100%, 80% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 36. Each possibility represents a separate embodiment of the invention.

[0129] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGATTGAACACATAGTCCTCCTCAAATTTAAATCCGACGTCGACTCTACCAAA GTCGAGTCCATGATTAACGAACTCAACGGATTGGCTTCACTCGATGTTGCACTC GACGTGAGTGCCGGTAAAATCCTGCGAGTGAGTAGTACATCATCCTCTTCTCTC ACTTTCACCCACCTCTTTCGCTGTTGTTTCAGATCAGCCGATGATCAGCAAGTC TTCTCTACTCATCCTGACCATCTACGAGTGGCCATTGAAGTTCGACCCGTAATT GAAGATATGGTAGTTGTTGACTTGGTATCCAAAACTACAATTGACTCACCAAA CCCAGGATCTGCAATGAAAGTTAGGATATTTAAGTTGAAAGACGATCTGATCG AAGATAGTAAGTTAGTAGTGATGGAAGGAATTAAAAATGAGTTAAAAGCAGT TGAACATATTAGGTTTGGTGACAACATTAATGTTATGGCAAAGGGATACTCGA TTGCTATGATTGCTTTTTTTCCTGATTTGGAATCTTCGGTTGCAGGTGCAGAAAT TGTTAAGGATTATATAGAGAGCGAGCTGGTGGTGGATTTTGTGTTTCCACCACC AAACGTTACAAGTCATTCATGA (SEQ ID NO: 37).

[0130] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 69%, at least 78%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 37, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 69% to 95%, 70% to 98%, 71% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 37. Each possibility represents a separate embodiment of the invention.

[0131] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCGGAGTTCACACATTTAGTGGTGGTTAAGTTCAAAGAAGAGGTGGTTGT AGAGGATATTATGAAAGGGTTGGAGAAACTTGCATCTCAACTTGATAGTGTCA AGTCCTTTGTTTGGGGAAAGGATATTGAAAGCATGGAGATGTTAAGGCAAGGA TTCACCCATGCAATCATGATGACATTTGGTTCTAAAGAAGATTTTACTGCATTT CAATCCCACCCAAACCATGTTGAATTCTCGGCTACGTTTTCAGCAGCAATCGAA AAGATCGTTCTTCTTGATTTCCCAGTTGTTGCAGTCAAGACTGCAACTGCTTGA (SEQ ID NO: 38).

[0132] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 38, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 88% to 98%, 89% to 99%, or 88% to 100% homology or identity to SEQ ID NO: 38. Each possibility represents a separate embodiment of the invention.

[0133] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAGTTATCACTCTCATCATCTTCTTCTTCATCCCTTCCCCAACTTCATACTC ATCCTTCATCATCATCATCTTCTTCACATTACATAAAAAAATCACCTTTTTTTAT TAATAAATTCAATAATCACACCAAATGCAAATTCCACAATTCCTCTGCTCTGAG AACTAATTTCTTCTACACTACCATAACTAAAACCTCATCATCAAGATTCGTTCT AAACAAAAACCCAAACCAATTTTCCGTCAAGGCTTGCAGTCAAGTTGGTTCTG CTGGATCCGATCCAGCATTGAATAAAGTTGCAGACTTTAAAGATGCATTTTGG AGGTTTCTAAGGCCCCATACTATTCGTGGGACAGCATTAGGATCAGTGTCTTTA GTAACGAGAGCACTACTTGAAAACCCAAACTTGATTCGGTGGTCACTTTTGCTC AAGGCATTTTCAGGTCTTGTTGCTTTGATATGTGGGAATGGTTATATAGTCGGG ATCAATCAGATCTATGATATCGGTATTGATAAGGTGAACAAACCATATTTACCT ATTGCTGCGGGAGATCTTTCTGTCCAGTCAGCATGGTTTTTGGTGTTAGCATTT GCAATGGTAGGCGTTATTATTGTTGGGATGAACTTCGGCCCATTCATCACCTCC CTTTATTCTCTCGGTCTTTTCTTGGGCACCATCTATTCCGTTCCACCACTTCGAA TGAAGAGATTTCCTGTTGTTGCATTTCTTATCATCGCCACGGTGAGAGGTTTTC TTCTAAATTTTGGTGTGTATTATGCGGTTAGAGCAGCTCTGGGACTAACATTCC AATGGAGCTCAGCAGTGGCTTTTATCACAACCTTCGTTACATTATTTGCTTTAG TCATTGCCATTACTAAAGATCTTCCTGATGTAGAGGGTGACCGAAAGTTTCAA ATTTCTACTTTTGCAACAAAACTTGGAGTAAGAAACATTGCATTATTAGGGTCA GGACTTCTGCTGATCAATTATATTGGGTCTATCGTTGCAGCACTTTACATGCCT CAGGCTTTCAGGAGCAGCTTGATGATACCATTACATACCATATTAGCTTCCTGT TTGATTTACCAGGCATGGATACTTGAGCGTGCGAATTACACCCAGGAGGCGAT AGCTGGGTACTACCGATTTGTATGGAATCTGTTTTATTCAGAGTACATCATATT TCCTTTCATCTGA (SEQ ID NO: 47).

[0134] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 75%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 47, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 75% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 47. Each possibility represents a separate embodiment of the invention.

[0135] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCTACTATGGCTTCTTCTTTGCTGAATCCTCTTTCTTGTTCCATTAAACCCA ACTCAAACAGACTACCATTACCAACACCCATTTCTCTATCTCGTTCTTGTAGAA GGCTAACAATCAAAGCAACGGAGACAGATGCAAATGAAGTGAAGCCAAAGGC GCCAGAGAAAGCACCAGCTGCAAGTGGATCTGGTTTTAATCAAATTCTTGGGA TTAAAGGGGCTAAACAAGAAACTAATAAATGGAAGATCCGTGTTCAACTTACA AAGCCGGTTACTTGGCCTCCATTAATTTGGGGAGTCGTATGTGGAGCTGCTGCT TCTGGTAACTTCCAATGGACTGTGGAAGATGTTGCTAAATCAATTGTTTGCATG TTGATGTCTGGCCCATTTCTAACCGGTTACACACAGACGATCAATGATTGGTAT GATAGAGACATTGATGCTATTAATGAACCTTACCGTCCAATTCCTTCCGGAGCC ATATCTGAAAATGAGGTCATTACTCAAATTTGGGTACTTCTTTTAGGAGGCATC GGATTGGCTGGTATATTAGACGTGTGGGCAGGGCATAAGTCCCCTACAATATT CTATCTTGCTTTGGGTGGATCATTGTTATCTTATATCTACTCAGCTCCACCTTTA AAGCTCAAACAGAATGGATGGATTGGCAACTTTGCATTAGGAGCAAGCTATAT TAGCTTACCATGGTGGGCTGGTCAAGCATTGTTCGGAACTCTTACACCTGATAT AGTAGTTCTCACACTTTTGTACAGCATAGCTGGGCTTGGTATTGCTATAGTAAA TGACTTTAAAAGTGTTGAAGGAGACAGGAAAATGGGGCTTCAGTCCCTTCCCG TGGCTTTTGGTGAAGAGACAGCTAAATGGATATGTGTTGGTGCCATTGACATA ACTCAACTCTCTATTGCAGGTTACCTTTTAGGATCTGGTAAACCATATTACGCC TTAGCACTCGTTGGGTTGATTGTTCCACAAATCTTTTTTCAGTTCAAGTACTTTC TTAAAGATCCAGTTAAATATGATGTCAAGTATCAGGCTAGTGCTCAACCATTTC
TCATTCTTGGTCTTCTGGTGACTGCCTTAGCTACTAGTCACTGA (SEQ ID NO: 48).

[0136] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 48, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 48. Each possibility represents a separate embodiment of the invention.

[0137] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGAAGTCTTTGATTATTGGGTCTTTTTCTAATAAGGTTTCTTGTTATTCCCCAT CATTACCAGATTCATCTTCTTCACTTATACCAACAGGTTGTTATCATGTATCACT AAGAACATTTCAGCGTAACCGAGCCATTCAAGCTCAATCAAGTCTTGTGAGAT GCAATATTGGCAAATTCAATGAAACATTACTACTTTCGCGGAAACGAAGTACA AAACATGTTGCATGTGCGGTTTCTGAACAACCCATTGAACCAGATGCTACAAA CCCTCAAAGTTCATTACCAAATGCTTTGGATGCTTTCTATAGGTTTTCAAGACC TCATACAGTTATAGGAACTGCATTGAGCATAGTTTCGGTTTCACTCCTAGCGGT TCAAAAGCTTTCGGATTTTTCTCCACTATTCTTCATTGGCGTTTTCGAGGCTATT GTTGCTGCCTTCTTTATGAACATATACATTGTTGGCTTGAACCAGCTATCCGAT ATTGAAATAGACAAGGTTAACAAGCCGTACCTTCCATTGGCATCTGGAGAATA TTCAGTTCAAACTGGTATTATCATTGTATCATCATTTGCAGTCATGAGTTTCTG GCTTGGATGGATCGTGGGCTCATGGCCTTTATTTTGGGCACTTTTCATAAGTTT TCTTCTAGGGACCGCATATTCAATCAATATACCGATGTTGAGATGGAAGCGCTT TGCTCTTGTGGCAGCAATGTGTATTCTAGCTGTAAGAGCTATTATAGTTCAAGT TGCATTTTATTTGCACATTCAGACTTTTGTGTATGGAAGACTCGCCGTGTTCCC AAAACCCGTGATATTTGCAACCGGATTTATGAGTTTCTTCTCTGTTGTTATAGC ATTGTTCAAGGACATACCCGACATTGTTGGAGACAAGATTTTTGGCATTCAATC ATTTACTGTCCGTATGGGTCAAAAACGGGTGTTTTGGATTTGCATCTTATTACT TGAAATAGCTTATGGTGTTGCTATTCTAGTTGGGGCATCATCTCCCTTCCTTTG GAGCCGATACATAACGGTATTGGGTCATGCGATTCTTGGTCTGATTCTCTGGGG TCGTGCCAAGTCAACGGATCTGGAGAGCAAATCAGCAATAACCTCATTTTACA TGTTCATATGGCAGTTGTTCTATGCCGAGTATTTGCTCATACCGCTCGTGAGAT GA (SEQ ID NO: 49).

[0138] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 75%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 49, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 75% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 49. Each possibility represents a separate embodiment of the invention.

[0139] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAGTTATCACTCTCATCATCTTCTTCTTCATCCCTTCCCCAACTTCATACTC ATCCTTCATCATCATCATCTTCTTCACATTACATAAAAAAATCACCTTTTTTTAT TAATAAATTCAATAATCACACCAAATGCAAATTCCACAATTCCTCTGCTCTGAG AACTAATTTCTTCTACACTACCATAACTAAAACCTCATCATCAAGATTCGTTCT AAACAAAAACCCAAACCAATTTTCCGTCAAGGCTTGCAGTCAAGTTGGTTCTG CTGGATCCGATCCAGCATTGAATAAAGTTGCAGACTTTAAAGATGCATTTTGG AGGTTTCTAAGGCCCCATACTATTCGTGGGACAGCATTAGGATCAGTGTCTTTA GTAACGAGAGCACTACTTGAAAACCCAAACTTGATTCGGTGGTCACTTTTGCTC AAGGCATTTTCAGGTCTTGTTGCTTTGATATGTGGGAATGGTTATATAGTCGGG ATCAATCAGATCTATGATATCGGTATTGATAAGGTGAACAAACCATATTTACCT ATTGCTGCGGGAGATCTTTCTGTCCAGTCAGCATGGTTTTTGGTGTTAGCATTT GCAATGGTAGGCGTTATTATTGTTGGGATGAACTTCGGCCCATTCATCACCTCC CTTTATTCTCTCGGTCTTTTCTTGGGCACCATCTATTCCGTTCCACCACTTCGAA TGAAGAGATTTCCTGTTGTTGCATTTCTTATCATCGCCACGGTGAGAGGTTTTC TTCTAAATTTTGGTGTGTATTATGCGGTTAGAGCAGCTCTGGGACTAACATTCC AATGGAGCTCAGCAGTGGCTTTTATCACAACCTTCGTTACATTATTTGCTTTAG TCATTGCCATTACTAAAGATCTTCCTGATGTAGAGGGTGACCGAAAGTTTCAA ATTTCTACTTTTGCAACAAAACTTGGAGTAAGAAACATTGCATTATTAGGGTCA GGACTTCTGCTGATCAATTATATTGGGTCTATCGTTGCAGCACTTTACATGCCT CAGGCTTTCAGGAGCAGCTTGATGATACCATTACATACCATATTAGCTTCCTGT TTGATTTACCAGGCATGGATACTTGAGCGTGCGAATTACACCCAGCGATCACA GTACTTTGACATGTCATCTTGCAGGAGGCGATAG (SEQ ID NO: 50).

[0140] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 91%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 50, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 91% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 50. Each possibility represents a separate embodiment of the invention.

[0141] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAGTTATCACTCTCATCATCTTCTTCTTCATCCCTTCCCCAACTTCATACTC ATCCTTCATCATCATCATCTTCTTCACATTACATAAAAAAATCACCTTTTTTTAT TAATAAATTCAATAATCACACCAAATGCAAATTCCACAATTCCTCTGCTCTGAG AACTAATTTCTTCTACACTACCATAACTAAAACCTCATCATCAAGATTCGTTCT AAACAAAAACCCAAACCAATTTTCCGTCAAGGCTTGCAGTCAAGTTGGTTCTG CTGGATCCGATCCAGCATTGAATAAAGTTGCAGACTTTAAAGATGCATTTTGG AGGTTTCTAAGGCCCCATACTATTCGTGGGACAGCATTAGGATCAGTGTCTTTA GTAACGAGAGCACTACTTGAAAACCCAAACTTGATTCGGTGGTCACTTTTGCTC AAGGCATTTTCAGGTCTTGTTGCTTTGATATGTGGGAATGGTTATATAGTCGGG ATCAATCAGATCTATGATATCGGTATTGATAAGGTGAACAAACCATATTTACCT ATTGCTGCGGGAGATCTTTCTGTCCAGTCAGCATGGTTTTTGGTGTTAGCATTT GCAATGGTAGGCGTTATTATTGTTGGGATGAACTTCGGCCCATTCATCACCTCC CTTTATTCTCTCGGTCTTTTCTTGGGCACCATCTATTCCGTTCCACCACTTCGAA TGAAGAGATTTCCTGTTGTTGCATTTCTTATCATCGCCACGGTGAGAGGTTTTC TTCTAAATTTTGGTGTGTATTATGCGGTTAGAGCAGCTCTGGGACTAACATTCC AATGGAGCTCAGCAGTGGCTTTTATCACAACCTTCGTTACATTATTTGCTTTAG TCATTGCCATTACTAAAGATCTTCCTGATGTAGAGGGTGACCGAAAGTTTCAA ATTTCTACTTTTGCAACAAAACTTGGAGTAAGAAACATTGCATTATTAGGGTCA GGACTTCTGCTGATCAATTATATTGGGTCTATCGTTGCAGCACTTTACATGCCT CAGGTGAAAACCACTTCGATAGACCATTACAGACCATACAGCTTCCTGGTTGA TTTACCAGGTCAAAATGGGATTACTTTAGCAGCTTGA (SEQ ID NO: 51).

[0142] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 91%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 51, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 91% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 51. Each possibility represents a separate embodiment of the invention.

[0143] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCTACTATGGCTTCTTCTTTGCTGAATCCTCTTTCTTGTTCCATTAAACCCA ACTCAAACAGACTACCATTACCATTACCAATACCCATTTCTCTATCTCGTTCTT GTAGAAGGCTAACAATCAAAGCAACGGAGACAGATGCAAATGAAGTGAAGCC AAAGGCGCCAGAGAAAGCACCAGCTGCAAGTGGATCTGGTTTTAATCAAATTC TTGGGATTAAAGGGGCTAAACAAGAAACTAATAAATGGAAGATCCGTGTTCAA CTTACAAAGCCGGTTACTTGGCCTCCATTAATTTGGGGAGTCGTATGTGGAGCT GCTGCTTCTGGTAACTTCCAATGGACTGTGGAAGATGTTGCTAAATCAATTGTT TGCATGTTGATGTCTGGCCCATTTCTAACCGGTTACACACAGACGATCAATGAT TGGTATGATAGAGACATTGATGCTATTAATGAACCTTACCGTCCAATTCCTTCC GGAGCCATATCTGAAAATGAGGTCATTACTCAAATTTGGGTACTTCTTTTAGGA GGCATCGGATTGGCTGGTATATTAGACGTGTGGGCAGGGCATAAGTCCCCTAC AATATTCTATCTTGCTTTGGGTGGATCATTGTTATCTTATATCTACTCAGCTCCA CCTTTAAAGCTCAAACAGAATGGATGGATTGGCAACTTTGCATTAGGAGCAAG CTATATTAGCTTACCATGGTGGGCTGGTCAAGCATTGTTCGGAACTCTTACACC TGATATAGTAGTTCTCACACTTTTGTACAGCATAGCTGGGCTTGGTATTGCTAT AGTAAATGACTTTAAAAGTGTTGAAGGAGACAGGAAAATGGGGCTTCAGTCCC TTCCCGTGGCTTTTGGTGAAGAGACAGCTAAATGGATATGTGTTGGTGCCATTG ACATAACTCAACTCTCTATTGCAGGTTACCTTTTAGGATCTGGTAAACCATATT ACGCCTTAGCACTCGTTGGGTTGATTGTTCCACAAATCTTTTTTCAGTTCAAGT ACTTTCTTAAAGATCCAGTTAAATATGATGTCAAGTATCAGGCTAGTGCTCAAC CATTTCTCATTCTTGGTCTTCTGGTGACTGCCTTAGCTACTAGTCACTGA (SEQ ID NO: 52).

[0144] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 52, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the polynucleotide comprises a nucleic acid sequence with 90% to 100%, 92% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 52. Each possibility represents a separate embodiment of the invention.

[0145] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCATCTCTAGCTATTGGTTCACTTGGTAGCCCAAGCTCACGTCAGTGTTCT AGCCCCGTTGCATCATCTTCTTCATTTGCGATAGGGTCACAAATAGCTTCAAAG TTTCTTCGGATATCAAAATTTGATAAGACTAAGAACAGCCCCTTAACATTGCAA CAAAAGCATATAAACAAAAGCATAGATCAAAGCTTCTTTGAGCCGCTTCCATT GCACAAAATAAACAAAGACAAGTTTAAGTTGTATGCAACATCTACAAACAATC CTCAGTTTGATGCAACTCATGATTTGAAGACTCCGGAAGTATCCATTATCAACT TTGTGGACGCTCTTTATAGGTTAATAAGGCCGTATACAGCAGTTGTAACGATCG TAAGTGTAGTCGCGATGTCCCTTCTTACAGTTAATAGCCTTTCAGATTTTTCCCC ATTGTTCTTCATCAAAGTGGTACAGGCTCTTATTGGAGGCATATTCATGCAAAT GTATGTTAGTGGTTTCAATCAAATTTGTGATATAGAACTCGACAAGGTTAACA AACAGTCTCTTCCATTAGCGGCTGGAGAACTATCTATGAAAACTGCGATCGTC ATCGCATCACTATCAGCTATCATGAGCTTATCGATTGGTTGGTTTGTTGGCTCC CCACCATTATTGTGGTGTCTTGTTTGGTGGTTTATTGTTGGGACTGCATATTCGG CCAACGTGCTGCCTTATTTGCGATGGAAAAGGTTTCCTTTCACAGCAGCATTTT GCGCCATGACGTCTCGGGCACTAGTTCTTCCTATTGGATATTACTTGCATATGC AGAATTCCATCCCGGGAGTATCTGCATTACTTTCAAGGCCAATATTATTTGCAG TCGCAATGCTCAGTGCATTTTCTTTATCAGCGATGTTCTTTAAGGACATCCCTG ATATTAAGGGAGATAGGATGCATGGAATCAAGTCTCTAGCAATTAAACTGGGT GAAAAACGGGTGTATTGGATTTCCATTTCGATTATTGAAATTGCTTATATTGCT GCTGCATTTATTGGAGCAACTTCACCCATAAGCTGGAGCAAGTATGTAACGAT TATCGGTCATCTTGGAATGGGATTACTACTTTGGGTACGAGCCAGATCAGTAG ATCCGACGAACACGGTAGCCGTTCAATCGATGTATATGTTCCTTATTAAGCTAG TATATGCAGAATACGGACTTATCTCGCTTGTACGCTGA (SEQ ID NO: 53).

[0146] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 53, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 53. Each possibility represents a separate embodiment of the invention.

[0147] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

[0148] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 54, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 100%, 92% to 100%, 94% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 54. Each possibility represents a separate embodiment of the invention.

[0149] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGTTGATTCACCATGAACATTTTTTGACAACCGGATTTGAAAGTTCAAACGAT CGAGCTGCTTATTCAATAAACTTTTCGAAACAACATCACTTACACATGGCGTCT ATAGCTACTGGTTCACTTTGTAGGCCAACCTCACATCAATTTTCTATCCCCGTT GCATCATCTTCTTCATTTGCGACAGGATCACAATTCGCTTCAAAGTTTCTTCAT ATATCAATATCTGCTAAAAAAAGCTCATTGACATTGCAACAAAGGCATATTCA TAAAAACATAGATCAAAGCTTCTTAAAGCCGCTTGCACTTCAAAAATTGAACA AAGACAAGTTTAAGTTGAATGGAACATCTCCAGACAATCCTCAGTTTGATGCA ACTCATGATTTGAAGACTCAAATAGAATCCACTATCAACTTTGTGGACGTTCTT TATAGGTTGTTAAGGCCGTATGCATTACTTCAAATGGGTTTATGTGTAGTCACG ATGAGTCTTCTTACCGTTGAAAGCCTTTCAGATTTTTCCCCATTGTTCTTCGTCA AAGTGGCACAGGCTCTTATTGGAGGCATATTCATGCAAATGTATGTTAATGGTT TTAATCAGATTTGTGATATAGAACTCGACAAGGTTAACAAACCGTCTCTTCCGT TAGCATCTGGGGAACTATCTAAGACAACTACTATAGTCGTCTCTTCACTATCAG CTATTACGAGCTTATCGATTGGTTGGTTTGTTGGCTCCCCACCATTGTTGTGGA GTCTTGTTGTGTGGTTTATTGCTGGGACTACATATTCGGCTAATCTGCCATATTT GCGATGGAAAAGGTTTCCTTTCACAAATATGTTTTGCAACTTGACGATGGCACT AGTTGTTCCTATTGGAACTTACTTGCATATGGAGAATTCCATCCACGGAGTATC CACATTACTTTCAAGGCCACTATTATTTACAGTTGCAATGTGCACTGTGTTTCC TGTTTCGATAATACTCTTTAAGGACATCCCTGATATTAAGGGAGACCGGATGC ATGGAATGAAGTCTCTAGCAATTATACTGGGTGAAAAACGGACGTATTGGATA TGCATTTGGATTCTTGAAATCACTTATATTGCTGCTGCTTTTTTCGGAGCAACTT CACCCATCAGCTGGAGCAAATATGTAACGATTATTAGTCATCTAGGAATGGGG TTCTTACTTTGGCTACGATCCAAATCAGTAGATGTGAAGAACACAGTAGCCGTT CAATCTATGTATATGTTCCTTTGGAAGCTACTCTATGCAGAATATGGCCTTATC
TTGCTTGTACGCTGA (SEQ ID NO: 55).

[0150] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 76%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 55, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 76% to 100%, 83% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 55. Each possibility represents a separate embodiment of the invention.

[0151] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGTTTATTCACCATGAACAGTTTTTGACAACCGGATTTGAAAGTTCAAACGAT CGAGCTGCCTATTCAATAAACTTTTTGAAACAACATCACTTACACATGGTGTCT ATAGCTACTGGTTCACTTTGTAGGCCAACCTCACATCGATTCTCTATCCCCGTT GCATCATCTTCTTCATTTGCGACAGGATCACAATTCGCTTCAATATCTGCTAAA AAAAGCTCATTGACATTGAAACAAAGGCATACTCATAAAAACATAGATCAAA GCTTCTTCAAGCCGCTTGCACTTCAAAAAATGAACAAAGGCAAGTTTAAGTTG AATGCAACATCTCCAGACAATTCTCAGTTGGATGCAACTCATGATTTGAAGAC TCAAATAGAATCCATTATCAACTTTGTGGACGTTCTTTATAGGTTGATAAGGCC GTATGTAGTACTTGGAATGGGTGTAACTATAGTCACGATGTGTCTTCTTACCGT TGATAGCCTTTCAGATTTTTCCCCATTGTTCTTCGTCAAAGTGGCACAGGCTCTT ATTGGAAGCATATTCATGGCAATGTATGTTAATAGTTTTAATGAGATTTGTGAT ATAGAACTCGACAAGGTTAACAAACCGTCTCTTCCGTTAGCGTCTGGGGAACT ATCTATGACAACTGCTATTGTCGTCTCTTCACTATCAGCTATCATGAGCTTATC GATTGGTTGGTTTGTTGGCTCCCCACCATTGTTGTGGAGTCTTGTTGTGTGGTTT ATTCTTGGGACTGCATATTCGGCTAATCTGCCATATTTGCGATGGAAAAGGTTT CCTTTAACAACACTGTCTTCCGCCCTGACGATGGGGGCACTAGTTATTCCTATT GGAAATTACATGCATATGGAGAATTCCATCCGCGGAGTAACCACATTACTTTC AAGGCCACTATTATTTGCAGTTGCAATGTGCGCTGCGTTTCATGTTTCGACGAT ACTCTTTAAGGACATCCCTGATATTAAGGGAGACCGGATGCATGGAATGAAGT CTCTAGCAATTAAACTGGGTGAAAAACGGATGTATTGGATATGCATTTGGATT CTTGAAATCGCTTATATTGCTGCTGCTTTTTTCGGAGCAACTTCACCCATCAGC TGGAGCAAATATGTAACGATTATTAGTCATCTAGGAATGGGGTTCTTACTTTGG CTACGATCCAAATCAGTAGATGTGAAGAACACAGTAGCCGTTCAATCTATGTA TATGTTCCTTTGGAAGCTATTCTATGTAGAACATGGTCTTATCTTGCTTGTACGT TGA (SEQ ID NO: 56).

[0152] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 56, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 75% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 56. Each possibility represents a separate embodiment of the invention.

[0153] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCGTCTATAGCTACTGGTTCACTTTGTAGGCCAACCTCACATCGATTTTCT ATCCACGTTGCATCATCTTCTTCATTTGCGACAGGATCACAGTTTGCTTCAAAG ATTCTTCAGATATCAATATCTGCTAAAAAAAGCTCATTGACATTGCAACAAAG GCATATTCATAAAAACATAGATCAAAGCTTCTTCAAGCCGCTTGCACTTCAAA AAATGAACAAAGACAAGTTTAAGTTGAATGCAACATCTCCAGACAATCCACAG TTTGATGCAACTCGTGATTTGAAGACTCAAATAGAATCCATTATCAAGTTTGTG GACGTTCTTTATAGGTTGTTAAGGCCGTACGCAATACTTGAAATGGGTTTAAGT GTAGTCACGATGAGTCTTCTTACCGTTGAAAGCCTTTCAGATTTTTCCCCGTTG TTCTTCGTCAAAGTGGCACAAGCTCTTATTGGAGGCATATTCATGCAAATGTAT GTTAATGGTTTTAATCAGATTTGTGATATAGAACTCGACAAGGTTAACAAACC GTCTCTTCCGTTAGCGTCTGGGGAACTATCTACGACAACTACTATAGTCGTCTC TTCACTATCAGCTATTATGAGCTTATCGATTGGTTGGTTTGTTGGCTCCCCACC ATTGTTGTGGAGTCTTGTTGTGTGGTTTATTGTTGGGACAACATATTCGACTAA TCTGCCATATTTGCGATGGAAAAGGTTTCCTTTCACAGCAATGTTTTGCAACCT GACGAGGGCACTAGTTGTTCCTATTGGAACTTACTTGCATATGAAGAATTCCAT CCACGAAGTATCCACATTACTTTCAAGGCCACTGTTATTTGCAGTTGCAATGTG CACTGTGTTTCCTATTTCGATAATACTCTTTAAGGACATCCCTGATATTAAGGG AGACCGGATGCATGGAATGAAGTCTCTAGCAATTATACTGGGTGAAGAACGGA CGTATTGGATATGCATTTGGATTCTTGAAATCGCTTATATTGCTGCTGCTTTTTT CGGAGCAACTTCACCCATCAGCTGGAGCAAATATGTAATGATTATTAGTCATC TAGGAATGGGGTTCTTACTTTGGCTACGATCCAAATCAGTAGATGTGAAGAAC ACAGTAGCCGTTCAATCTATGTATATGTTCCTTTGGAAGCTACTCTATGCAGAA TATGGCCTTATTTTGCTTGTACGCTGA (SEQ ID NO: 57).

[0154] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 76%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 57, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 76% to 100%, 85% to 100%, 90% to 100%, or 96% to 100% homology or identity to SEQ ID NO: 57. Each possibility represents a separate embodiment of the invention.

[0155] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCATCTCTAGCTATTGGTTCACTTGGTAGCCCAAGCTCACGTCAGTGTTCT AGCCCCGTTGCATCATCTTCTTCATTTGCGATAGGGTCACAAATAGCTTCAAAG TTTCTTCGGATATCAAAATTTGATAAGACTAAGAACAGCCCCTTAGCATTGCAA CAAAAGCATATAAACAAAAGCATAGATCAAAGCTTCTTTGAGCCGCTTCCATT GCACAAAATAAACAAAGACAAGTTTAAGTTGTATGCAACATCTACAAACAATC CTCAGTTTGATGCAACTCATGATTTGAAGACTCCGGAAGTATCCATTATCAACT TTGTGGACGCTCTTTATAGGTTAATAAGGCCGTATACAGCAGTTGTAACGATCG TAAGTGTAGTCGCGATGTCCCTTCTTACAGTTAATAGCCTTTCAGATTTTTCCCC ATTGTTCTTCATCAAAGTGGTACAGGCTCTTATTGGAGGCATATTCATGCAAAT GTATGTTAGTGGTTTCAATCAAATTTGTGATATAGAACTCGACAAGGTTAACA AACAGTCTCTTCCATTAGCGGCTGGAGAACTATCTATGAAAACTGCGATCGTC ATCGCATCACTATCAGCTATCATGAGCTTATCGATTGGTTGGTTTGTTGGCTCC CCACCATTATTGTGGTGTCTTGTTTGGTGGTTTATTGTTGGGACTGCATATTCGG CCAACGTGCTGCCTTATTTGCGATGGAAAAGGTTTCCTTTCACAGCAGCATTTT GCGCCATGACGTCTCGGGCACTAGTTCTTCCTATTGGATATTACTTGCATATGC AGAATTCCATCCCGGGAGTATCTGCATTACTTTCAAGGCCAATATTATTTGCAG TCGCAATGCTCAGTGCATTTTCTTTATCAGCGATGTTCTTTAAGGACATCCCTG ATATTAAGGGAGATAGGATGCATGGAATCAAGTCTCTAGCAATTAAACTGGGT GAAAAACGGGTGTATTGGATTTCCATTTCGATTATTGAAATTGCTTATATTGCT GCTGCATTTATTGGAGCAACTTCACCCATAAGCTGGAGCAAGTATGTAACGAT TATCGGTCATCTTGGAATGGGATTACTACTTTGGGTACGAGCCAGATCAGTAG ATCCGACGAACACGGTAGCCGTTCAATCGATGTATATGTTCCTTATTAAGCTAG TATATGCAGAATACGGACTTATCTCGCTTGTACGCTGA (SEQ ID NO: 58).

[0156] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 58, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 58. Each possibility represents a separate embodiment of the invention.

[0157] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGGCTAAACATTTGCACTAGATTTATACCTTGTTTGGTAGTGGTTCTCATG TTTTTGTTCACTTCAACATATTCAGCTACACCAGAAGACAAATTCCTTCAATGC ATATCTCAAAAATTAAATATCACAAACTCAGATGAAGTGTTCACTCAATCAAA CACACGATATTCATCTGTTCTTGAGTCAACAATAGTTAACCTTAGATTTGCCAC TTCTACAACGCCAAAACCATTTGCTATAATCACACCTTTGTCATATTCACATGT ACAATCTGCTGTAGTTTGTGCTAAAAAAGCCGGAATCCGAATTAGAATCAGAA GTGGTGGCCATGACTATGTGGGCCTTTCATATACTTCATCTGATAATGTCCCTT TTGTTGTTCTTGACCTTAAACAGCTGCAGAATGTTACGGTCGAGTATAGTAAGA AAACGGCTTGGGTTGAATCTGGTGCAACCATCGGTCAACTGTATTATTGGGTGT CTCAGAAAAGTAAAAATCTAGGATTCCCGGGTGGGACCTGCGCAACTATAGGG GTCGGAGGGCACCTAAGTGGTGGGGGTTTTGGTACTTTGGTAAGAAAGTATGG TCTATCGGCTGATAACGTTATTGATGCTAAGATAGTTGATGTCAATGGTAGACT TCTTGATAGAAAGTCTATGGGGGAAGATTTGTTTTGGGCAATTAGAGGAGGCG GTGGAGGAAGTTTCGGTGTTGTAGTAGCTTGGATGGTCAATCTTGTTCATGTTC CTGAAAAAGTTACAGCTTTTACTATTGTCAGGACTTTGGAACAAGGTGGTTCG GATCTTTTCAACAAGTGGCAGCACGTTGGGCCCAAATTAACCAAAGATTTGTT CATTAGTGTTATAATACAGCCCATTTCTGTTTGGAATGGAAACGGAACAGTTCA AGTTATATTCAACTCGATGTATCTTGGGACGGTTGATAAGCTCATGAAGACCGT CAACAGTAGCTTTCCGGAGTTGGGGTTACAAGCAAAAGACTGCACTGAGATGA GTTGGATTCAGTCAGTACTTTATTTTGCGGGTTACCCTATAGAAGGAAGTATGG ATGTTCTTAAAGATAGGAAACCCCAGACCAGAAGATACTTTAATAATAAATCA GATCACGTGAAAGAACCGATACCCAAAGAAAGATTAGAAGATTTATGGAAAT GGTGTATGGAAGGTGATTTTCCGATTCTTCTAATGGACCCACTCGGTGGAAAG ATGAACGAGATTGACACAACAAGAATTCCGTACCCTTATAGAAATGGTTATTC GTATATGATACAATACGTTGAGACCTGGGAAAACATTGGGGACTCAGAAAAGC GTATAAGTTGGATGAGACAGATGTATGAGAATATGACACCGTATGTGTCGAAG AATCCAAGGTCAGCTTATGTGAATTATAGGGATTTGGATTTAGGTAAAAACGA TAACGCTAAAAACACGAGTTACTTGGAAGCCATGAAATGGGGAAGCAAGTAC TTTGGTGACAATTTCAAGAGGTTGGCTATGGTGAAAGGTGTAGTTGATCCAGA CAATTTCTTCTTTCATGAACAAAGCATCCCACCTCTGAAAGTGTGA (SEQ ID NO: 71).

[0158] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 68%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 71, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 68% to 95%, 75% to 100%, 72% to 99%, or 68% to 100% homology or identity to SEQ ID NO: 71. Each possibility represents a separate embodiment of the invention.
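By way of non-limiting illustration only, a measured percent identity may be compared against recited ranges such as those of the preceding paragraph as in the following sketch. The function name is hypothetical, the treatment of range endpoints as inclusive is an assumption of this sketch, and the identity value used is an arbitrary example rather than a measured result.

```python
def covering_ranges(identity, ranges):
    """Return the (low, high) ranges that contain identity, endpoints assumed inclusive."""
    return [(low, high) for low, high in ranges if low <= identity <= high]


if __name__ == "__main__":
    # Ranges as recited for SEQ ID NO: 71; the identity value is an arbitrary example.
    recited = [(68, 95), (75, 100), (72, 99), (68, 100)]
    print(covering_ranges(96.5, recited))
```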

[0159] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGGTGTAATCTCTTGCAAAAACTTACTATTTTTGTTTTCTTTATCATGTCTA TTTCCATACCTTCTTTCGCTTACGAACACGAGCACGAGCATGAGCACGAACAC GAAAATGATCAAGATCGAGTACAGGATGAAAAGGAACCTACGGATGTCTTCA CTTCGTGTTTAACTCGGTTCGGTGTTCATAATTTTACAACTCATTCCAAGTCGA ATAATGATAATTCGGTTTACTATGAGCTTCTTAATTTTTCAATTCAAAATCTTA GATTTACGGGTTTATCGATGCCTAAACCGGTTGTTATCGTGTTCCCGGAGACGA AAGAACAGTTAGCAAAAACCGTGGTTTGTGCTCGAGAATCGTCGCTAGAAATT
CGGGTTCGGTGTGGTGGTCATAGCTATGAAGGGACATCATCCGTCTCCACGGA CGGACGTCCATTTGTGGTGATTGATATGACGAGATTAGACAATGTTTCGGTGG ACGTGAACTCGGGAACCGCATGGGTTGAAGCTGGCGCGACACTTGGTCAAATG TACTGCGCGATAGCAGAGTCGAGCACGGTCCATGGTTTCTCGGCAGGGTCATG
CCCCACTGTCGGAACAGGTGGTCATATTTCGGGTGGTGGGTTTGGGTTATTGTC GCGAAAATACGGGCTGGCTGCGGATAATGTAGTCGATGCGGTTTTAGTAACCG CAGATGGTGAATTACTGAACCGCGACACGATGGGTGAGGATGTTTTTTGGGCG ATTAGAGGTGGTGGTGGCGGGGTTTGGGGAATTGTGTACGCTTTTAATGTTAA
ATTATCAAGCGTACCAAAAACAGTCACTAATTTCGTCGTGTCTAGGCCAGGCA CGAAGGGACAAGTGACTGATTTGGTATATAAATGGCAGCATGTTGCGCCTAAA TTGCCCGACGACTTCTACTTATCCTCTTTCGTTGGTGCGGGTTTGCCTGAACGA AAAAATAAACCGGGTTTATCGGCTACGTTCAAAGGTTTTTATTTGGGATCGAA
AAGCAAAGCTTTATCGATCATGAACCAAACTTTCCCCGAGCTAAAAGTCATGG AAAACGACTGTAAAGAAACAAGTTGGATTGAGTCTATTCTTTTCTTCTCGGGTT ATGGAGATGAAAGCTCGGTTTCTGACTTGAAAAATCGCTTCTTACAAGATAAA TTGTATTACAAGGCCAAATCGGACTATGTTCGGAAACCTATTCCAAGATTCGGT
CTAACTACGGCACTAGAAATACTCGAGAAACAACCAAAAGGGTATGTGATCTT GGACCCATATGGTGGCGCAATGCAAACGATAAGTAGTGACTCGATCCCGTTCC CTCATAGGAAAGGTAATATTTTCACTATTCAATATCTAGTGGAATGGAAAGAA CCGGATAACGATAAAACGAATGATTACTTAGCGTGGATACGAGACTTTCATGG
CTCGATGACGCCCTATGTGGCACAAGACCCACGAGCCGCATACATTAACTACA TGGATGTTGATATTGGAGTCATGAATTGGATCAAAACTAGAGTGGACTCAGAT GATGCAGTTGAGATGGGTCGAGAATGGGGGGAGAAGTACTTTTATAAGAATTA CGATCGGCTAGTGAGAGCGAAGACACAAATCGATCCGTACAATGTTTTTAGGC
ATCAACAAAGCATCCCTCCAATGTCTTTGGAGAACAAGAATCGCAGGGGAAGT ATATCTAGTGAGTAG (SEQ ID NO: 72).

[0160] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 71%, at least 77%, at least 85%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 72, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 71% to 95%, 75% to 98%, 80% to 99%, or 71% to 100% homology or identity to SEQ ID NO: 72. Each possibility represents a separate embodiment of the invention.

[0161] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGAAAACATCATCAAATATGCTTTCCGTATTACTCATTCTATTCTTTATCACAT GCTCAAAAGCAGCTCTGGATCCTGATTCCGTCTATCAATCATTTCTCCAATGTT TACCGTTATACTCACCGGAGTCCGCGGAGGAACTCTCCAAGGTCGTATACAGC TCCACCTTGAACACCACAACATACGAAACCGTACTCCAAGAGTACATAAAAAA CGAACGATTTAACACCACCGCGACACCCAAACCGTCGGTTATCATAACCCCGA CAACCGAATCTCAAGTCCAAGCGGCCGTCCTATGCGCGAAAAAAACCGGTGTC CAAATTAAAATTCGTAGCGGCGGACACGACTACGAAGGAATATCGTATATTTC ATCCGAACCTGATTTTATCGTACTTGACATGTTTAACTTTCGGTCGATAAATGT TAATGTAGCGGACGAAACCGCGGTTGTGGGCGCCGGCGCGCAGTTGGGCGAG CTTTATTATAGGATTTACGAAAAAAGTAAAACTCTCGGGTTCCCCGCGGGAGT TTGTCAGACGGTTGGCGTGGGAGGTCATCTGAGCGGCGGTGGTTACGGAACTA TGCTGCGAAAATACGGGTTGTCAGTTGATCATGTGATTGATGCGAAAATTGTT GATGTGAATGGTCAGGTTTTGGATCGGAAATCGATGGGTGAGGATCTATTTTG GGCGATACGAGGTGGCGGTGGCGGTAGTTTTGGTGTGATTTTGTCGTATACTGT GAAGTTGGTTTCGGTTCCCGAGGTTAACACGGTCTTTCGCGTGCTGAAAACGA CGTCGGAAAATGCTTCTGAACTGATTTATAAGTGGCAGTCGATTATGCCGGAT ATTGATAACGATTTGTTTATCAGAGTTTTGTTACAACCGGTTACGGTGAATAAA CAGAAAGTTGGTCGGGCTACGTTTATAGCGCATTTTTTAGGTGATTCTGATAGA TTGGTGGCGTTGATGAGTAAAAACTTCCCGGAATTGGGTTTAAAGAAAGAGGA TTGTATCGAGGTGAGTTGGATAGAATCGGTACTTTATTGGGCTAACTTTGATTT GAATACGACGAAGCCAGAGATTCTTCTAGATCGACATTCCGACAGTGTGAGCT ATGGTAAACGAAAGTCGGACTATGTGCAAACCCCGATTCCTGAATCCGGGTTG GAATCGATTTTTGAAAAGTTAGTCGAATTGGGTAAAATCGGGTTGGTTTTTAAC TCGTATGGCGGGAGAATGTCGGAGGTTGCGGCTGACGCAACACCATTCCCTCA CCGAGCTGGGAACATTTTCAAGATTCAGTATTCGGTTAATTGGAATGATGCGG ACCCTGAACTAGAAGCGAATTACTTAAATCAAAGTAGGGTTATGTACGACTTC ATGACACCATTTGTATCGAAGAATCCGAGAGCTGCATTCTTGAATTATCGGGA TCTCGATATTGGAGTAATGACTCCTGGCAAGAACAGTTATAGTGAAGGTGAAG TTTATGGTGAGAAATACTTCATGGGAAATTTCGAAAGATTGGTGAAGATAAAA ACCGCGGTTGATCCCGATAATTTCTTTAGAAATGAACAAAGTATTCCGACTCG GGCCGCGAAAAATTCAGGCAAGTCAAGAAAGATGATGAAGTAA (SEQ ID NO: 73).

[0162] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 69%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 73, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 69% to 95%, 75% to 100%, 72% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 73. Each possibility represents a separate embodiment of the invention.

[0163] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

[0164] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 74, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 98%, 80% to 99%, 82% to 99%, or 79% to 100% homology or identity to SEQ ID NO: 74. Each possibility represents a separate embodiment of the invention.

[0165] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGACCAATATGTCATAACTAAATTTATATCATATCTTCTGGCGGTTTTTATG GCTTTATTCTGTTCAGATCCAACGGCTGATAAATTTCTTCAATGCTTCACTAAA GATTCAAATGCAACAGATTCAAACTTTGTGTTCACCCAAGAAAACACACAATA TTCATCTGTTCTTGAGTCAACTATCATAAACCTTAGATTTGCAACCTCCATAAC TCCAAAACCAATAGCTGTAATCACACCATTATCATATTCCCATGTACAATCAGC AATACTTTGTTCCAAAAAAATCGGATATCGAATTAGAATCAGAAGTGGTGGGC ATGACTATGCAGGAGTTTCATACACTTCATATGATCATGATCATACCCCTTTTG TTGTTCTTGATCTTAAAGAGCTGAGGACGATAACAATCGATTCGGGTGAGAAC ACTTCATGGGTTGAATCTGGTGCAACTGTTGGTGAACTGTATTATTGGGTGTCC CAAAAAAGTCGAAATCTTGGGTTCCCAGCTGGGATTTGTCCAACTGTTGGGGT AGGTGGTCATTTAAGTGGAGGTGGGGTTGGTACTATGGTAAGAAAGTATGGTC TAGCGGCTGATAATGTAATCGATGCTAGGATTATTGATGTAAATGGGCGAATT CTTGATAGGAAATCGATGGGGGAAGATTTGTTTTGGGCGATTAGAGGTGGTGG GGGAGCTAGTTTTGGTGTTATAGTAGCTTGGAAGGTAAATCTTGTTTATGTTCC TGAAAAAAGTTTCGGTTTTTAG (SEQ ID NO: 75).

[0166] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 87%, at least 92%, at least 96%, or at least 99% homology or identity to SEQ ID NO: 75, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 98%, 83% to 99%, 85% to 99%, or 82% to 100% homology or identity to SEQ ID NO: 75. Each possibility represents a separate embodiment of the invention.

[0167] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAGTTGTATATTAGCACTAGATTTATACTATGTTTTCTAGTGGTTCTTATGC TTATGTTCTCTTCAACATATTCAGATCCACTAGAAGATAAATTTCTTCGATGTC TATCTCAAAATTCAAATGCCACAAATTCAGACAATGTGTTCACTCAAGAAAAC ACACAGTATTCATCTGTTCTTGAGTCAACTATCATAAACCTTAGATTTGCAACC TCTACAACTCCGAAACCGTTAGCTATAATCACACCGTTGTCATGTTCCCATGTA CAATCTGCTGTACTTTGTGCCAAAAAAGTCGGAATCCGAATTAGAATCAGAAG TGGTGGCCATGACTATGCAGGCCTTTCATACACTTCATCTGAGAATGCCCCTTT TGTTGTTCTTGATCTTAAACAGCTGCAGAATGTTACGGTCGAGTCTAGTAAGAA AACGGCTTGGGTTGAATCTGGTGCAACCATCGGTCAATTGTATTATTGGGTGTC TCAAAAAAGTAAAAATCTAGGATTCCCAGCTGGGACCTGCGCGACTATAGGGG TCGGAGGGCACCTAAGTGGTGGGGGTTTCGGTACTTTGGTAAGAAAGTATGGT CTATCGGCTGATAACGTCATCGATGCTAAGATAGTTGATGTCAATGGTAGACTT CTTGATAGAAAGTCTATGGGGGAAGATTTGTTTTGGGCAATTAGAGGAGGCGG TGGAGGAAGTTTCGGTGTTGTAGTAGCTTGGAAGGTCAATCTTGTTCATGTTCC CGAAAAAGTTACGGCTTTTACTATTGTCAGGACTTTGGAACAAGGTGGTTCGG ATATTTTCAACAAATGGCAGCACATTGGGCACAAATTAACTAAAGATTTGTTC ATTAGAGTTATAATACAGCCTATTTCTGTTTCGAATGGAAACAGAACAGTTCA AGTTATATTCAACTCGATGTATCTGGGGACGGTTGATAAGCTCATGAAGACCG TCAACAGTAGCTTCCCGGAGTTGGGCTTACAAGAAAAAGACTGCACTGAGATG AGTTGGATTCAGTCAGTACTTTATTTTGCGGGTTACCCAATAGAAGGAAGTATG GATGTTCTTAAAGATAGGAAACCCGACACCCGAAATTACTTTGATAATAAATC AGATCACGTGAAAGAACCGATACCCAAAGAAAGATTAGAAGATCTATGGAAA TGGTGTATGGAAGTTGATTTTCCGATTCTTATAATGGAGCCACTCGGTGGAAAG ATGAACGAGATTGACACAACAAGAATTCCATACCCTTATAGAAAAGGTTATTC GTATATGATACAATATGTTGAGGCTTGGGATAACATTGGGGACTCGGAAAAAC ATATAAGTTGGTTGAGACAGATGTATGAGAATATGACACCATATGTGTCGAAG AATCCAAGGTCAGCTTATGTGAATTACCGGGATTTGGATTTAGGTAAAAACGA TAACGCTAAAAACACGAGTTACTTGGAAGCCATGAAATGGGGAAGCAAGTAC TTTGGTGACAATTTCAAGAGGTTGGCTATGGTGAAAGGTGTAGTTGATCCAGA CAATTTCTTCTTTCATGAACAAAGCATCCCACCTCTGAAAGTGTGA (SEQ ID NO: 76).

[0168] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 87%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 76, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 98%, 81% to 99%, 85% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 76. Each possibility represents a separate embodiment of the invention.

[0169] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

[0170] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 77, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 82% to 97%, 81% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 77. Each possibility represents a separate embodiment of the invention.

[0171] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGGGAAGATTTGTTTTGGGCAATTAGAGGAGGCGGTGGAGGAAGTTTCGG TGTTGTAGTAGCTTGGATGGTCAATCTTGTTCATGTTCCTGAAAAAGTTACAGC TTTTACTATTGTCAGGACTTTGGAACAAGGTGGTTCGGATCTTTTCAACAAGTG GCAGCACGTTGGGCCCAAATTAACCAAAGATTTGTTCATTAGTGTTATAATAC AGCCCATTTCTGTTTGGAATGGAAACGGAACAGTTCAAGTTATATTCAACTCG ATGTATCTTGGGACGGTTGATAAGCTCATGAAGACCGTCAACAGTAGCTTTCC GGAGTTGGGGTTACAAGCAAAAGACTGCACTGAGATGAGTTGGATTCAGTCAG TACTTTATTTTGCGGGTTACCCTATAGAAGGAAGTATGGATGTTCTTAAAGATA GGAAACCCCAGACCAGAAGATACTTTAATAATAAATCAGATCACGTGAAAGA ACCGATACCCAAAGAAAGATTAGAAGATTTATGGAAATGGTGTATGGAAGGT GATTTTCCGATTCTTCTAATGGACCCACTCGGTGGAAAGATGAACGAGATTGA CACAACAAGAATTCCGTACCCTTATAGAAATGGTTATTCGTATATGATACAAT ACGTTGAGACCTGGGAAAACATTGGGGACTCAGAAAAGCGTATAAGTTGGAT GAGACAGATGTATGAGAATATGACACCGTATGTGTCGAAGAATCCAAGGTCAG CTTATGTGAATTATAGGGATTTGGATTTAGGTAAAAACGATAACGCTAAAAAC ACGAGTTACTTGGAAGCCATGAAATGGGGAAGCAAGTACTTTGGTGACAATTT CAAGAGGTTGGCTATGGTGAAAGGTGTAGTTGATCCAGACAATTTCTTCTTTCA TGAACAAAGCATCCCACCTCTGAAAGTGTGA (SEQ ID NO: 78).

[0172] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 78, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 95%, 85% to 98%, 89% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 78. Each possibility represents a separate embodiment of the invention.
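The stepped thresholds and ranges recited above can be read as a simple classification of a measured identity value. The following is a minimal sketch under stated assumptions: the identity value is taken as already computed by some alignment method of the reader's choosing, "at least" and the range end points are treated as inclusive, and the names `TIERS_SEQ_78`, `highest_tier_met` and `in_range` are hypothetical helpers introduced only for illustration.

```python
from bisect import bisect_right

# Thresholds recited in paragraph [0172] for SEQ ID NO: 78, in percent.
TIERS_SEQ_78 = (80, 85, 89, 92, 95, 97, 99)


def highest_tier_met(identity_pct, tiers=TIERS_SEQ_78):
    """Return the largest recited threshold that identity_pct satisfies, or None."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    # bisect_right counts how many sorted thresholds are <= identity_pct.
    idx = bisect_right(tiers, identity_pct)
    return tiers[idx - 1] if idx else None


def in_range(identity_pct, low, high):
    """True if identity_pct lies in the closed interval [low, high] (assumed inclusive)."""
    return low <= identity_pct <= high
```

For example, a hypothetical measured identity of 91.3% would satisfy the "at least 89%" tier but not the "at least 92%" tier, and would lie within the recited 85% to 98% range.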

[0173] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAGTTGAAGTTGTTTACATGTAAACTCGTAACAATTATTCTAGCTCTGTCC CTCAGTTTTTTCACATCAACAAGCTCTAGTGACTTTCTTGATTGCATCTCTCAAA AAAACTTATCAAATATTATTTTCACTCCTAATGACACTTCATACTCAACTATTC TCCAATTTACCATCCCAAATCTTAGATTTAACACGCCTAAAACCACAAAACCAT TAGCAATAATCACACCTACAACGTATTCTCACGTACAATCTACTATAATATGCA GCGTGCAATTCAAGCACCATGTTCGCATCCGAAGTGGTGGTCATGACTACGAA GGTCTTTCGTATACTTCTTTCAATAACACCCCTTTTATACTTCTTGATCTCAACC AACTTCGGTCAGTAACGGTTGATTTAGATAGTAATACCACATGGGTCGAATCT GGTGCCACTCTAGGTGAACTTTTGTATTGGGTGTCTCGAAAAAGTAATATTCTT GGGATCCCAACCGGCGAGTGTACATCGGTGGGCGTTGGGGGACAATTAAGTGG AGGAGGGTTTGGAAATATGGCTAGAAAATATGGATTATTTTCGGATAATGCGG TTGACGCACTTATCATTGATGTAAATGGACGAATACTGGATAGAGATTCCATG GGTGAAGATTTGTTTTGGGCAATTAGAGGAGGTGGGGGTGGAAATTTTGGAGT TGTATTATCTTGGAAGATTAATCTAGTTTATGTTCCACCTAAAGTTACGGTTTTT ACTGTTTCTAAGATGTTAGATGAAAATGGTACCAAGATTGTTCACAAGTGGCA ATATATTGCGCATAATATAACGCAAGATTTGTTCATTAATCTTATAGTAAGTCC GGTTACCGTGTCAAATACAACGATTCTAGCAGTAACAATTAACTCGTTGTTTTT GGGGATGAAAAACGAGCTTGTAGCAACAATGGATGTAATATTTCCGGAATTAG GGTTACAAGAAAAGGATTGCATCGAAATGAGTTGGATAGAATCGGTGGTTTAC CATTCGGTTTATTTAAGAGGACAAAGTGTTGATGCTCTAATAGAAAGAAGACC ATGGCCTAAAAGTTACAACAAGTATAAATCAGATTATGTGAAGAAACCTATGT CAGAGAAAGCGCTTGAAAAACTGTGGAAATGGTGTTTGGAAGAGAATTTGATT CTGGCGATCGAGCCACATGGTGGAAAGATGAGCGAGATCGATGAGAGTTCGA CTCCGTATCCGCATAGAAAAGGGAATTTGTACATCATACAATATGTCATGCAA TGGGATGAAGGGTATAACACAACTCAAAAGCATGTTGCTTCCATAAGAAGGGT ATATAAGAAAATGGCACCTTTTGTGTCCAAGAACCCTAGGGAAGCTTATGTGA ACTTTAGAGATTTGGATTTGGGTACTAATGGTAATGCATGTGGTACAAGTGGT GCAAGCTATGTGCAAGCATTGAGATGGGGAAAAAAGTATTTTAAGGGAAATTT TAAGAGGTTGGCAATAGTGAAAGGTAGAGTTGACCCAACTAATTTCTTCTGTA ATGAACAAAGCATCCCACCTTATTCGTATTAG (SEQ ID NO: 79).

[0174] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 79, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 82% to 97%, 81% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 79. Each possibility represents a separate embodiment of the invention.

[0175] In some embodiments, the DNA molecule comprises the nucleic acid sequence: ATGACCAACTCGGAACTTGTTTTCATCCCATCTCCGGGAGCCGGCCACCTACCA CCTACGGTGGAGCTAGCAAAGCTCCTCCTCCACCGCGAACCACAGCTTTCGGT TACCATCATCATCATGAACCTCCCTCATGAAACAAAACCCACTACTGAAACTC GAATGTCCACTCCTCGTCTACGCTTTATTGACATACCTAAAGACGAGTCAACAA AAGATCTTATCTCACGCCACACATTCATATCCGCCTTCCTTGAACACCAAAAGC CACATGTTCGAAACATTGTCCGTTCAATCACCGAGTCTGACTCGGTTCGGTTAG TTGGGTTCGTCGTAGACATGTTTTGTATTGCCATGATGGACGTCGCAAACGAGC TGGGTGCTCCAACTTATCTTTATTTCACCTCCTCTGCCGCTTCACTTGGCCTCAT GTTTTGCCTACAGGCCAAACGAGACGACGAGGAGTTTGATGTGACCGAGTTGA AGGACAAAGATTCGGAACTCTCCATTCCGTGTTACACCAACCCACTCCCAGCT AAGTTGTTACCTTCGGTACTATTTGATAAGAGAGGTGGGTCAAAAACATTTATT GACCTCGCTAGAAAGTATCGCGAGTCGAGGGGTATAGTTGTAAATACTTTTCA AGAACTCGAAAGCTATGCTATTGAGTATCTTGCAAGTAGTAATGCTAACGTCC CACCGGTGTTTCCGGTGGGGGCGATACTAAACCAAGAAAAAAAGGTAAATGA TGATAAGACGGAGGAGATTATGACATGGTTAAACGAGCAACCGGAGAGTTCG GTGGTGTTTCTATGCTTCGGGAGCATGGGAAGCTTCGGTGAGGATCAAATTAA GGAAATAGCGCTTGCTATCGAAGAAAGCGGACAAAGGTTTTTGTGGTCACTAC GTCGTCCCCCTTCGAACGAAAATAAGTACCCGAAAGAATACGAAAATTTTGGA GAGGTTCTTCCGGAAGGTTTCCTTGAACGAACATCGAGTGTAGGGAAAGTGAT AGGATGGGCCCCACAAATGGCAGTGTTGTCCCATTCTTCAGTTGGTGGGTTTGT GTCACATTGCGGATGGAACTCGACACTCGAGAGCATATGGTGTGGTGTACCGG TAGCTGCGTGGCCATTATATGCAGAACAACAACTTAATGCTTTTAAACTAGTG GTGGAGTTGGGCTTAGCGGTCGAGATTAAGATTGATTATAGGAGTGAGAACGA GATTATTTTGACATCGAAAGAAATCGAGAGTGGGATTAGGAGGTTGATGAATG ATGAAGAGTTGAGGATGAAAGTGAAAGAGATGAAGGGGAATAGTAGGTTTGC AGTTTCAGAGGGTGGATCTTCTTACGTATCCATTAGGCGTTTTATCGACCTTGT GATGACTAAGGAGTAA (SEQ ID NO: 89).

[0176] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 89, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 95%, 78% to 100%, 79% to 99%, or 77% to 100% homology or identity to SEQ ID NO: 89. Each possibility represents a separate embodiment of the invention.

[0177] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGCCGACCTCAGAACTTGTTTTCATCCCATCCCCCGGTGTCGGCCACCTGTCG CCTACCATCGAACTCGTCAATCAACTCCTCCACCGCGACCAGCGCCTGTCTGTC ACAATCATCGTCATGAAGTTCTCTCTTGAATCAAAACACGATACAGAAACTCC TACATCCACTCCTCGATTACGCTTCATTGATATCCCTTATGACGAGTCCGCTAT GGCTCTCATTAACCCGAACACGTTCCTCTCCGCTTTCGTCGAGCACAACAAACC TCATGTTCGAAACATTGTTCGTGACATTTCCGAGTCTAACTCGGTTCGGCTCGC GGGGTTTGTTGTGGACATGTTTTGTGTAGCTATGACGGATGTAGTGAACGAGTT TGAAATTCCAACCTATATTTATTTTACCTCGACCGCGAACTTACTCGGACTCAT GTTTTACCTTCAGGCCAAGCGTGACGACGAGGGTTTTGATGTCACCGTGTTGAA AGACTCAGAATCAGAGTTTTTGTCTGTTCCGAGTTATGTCAACCCGGTTCCAGC TAAGGTTTTACCTGATGCAGTTTTGGATAAGAATGGTGGGTCTCAAATGTGTCT GGATCTTGCAAAAGGGTTTCGTGAGTCGAAGGGCATAATAGTAAATACATTTC AAGAACTCGAAAGGCGTGGAATCGAGCACCTTTTAAGTAGTAACATGAACCTC CCACCTGTGTTTCCTGTGGGGCCTATATTGAACTTGAGAAATGCGCCAAACGAT GGTAAAACGGCCGATATCATGACATGGTTAAATGACCATCCAGAGAACTCGGT TGTGTTCTTGTGTTTCGGAAGTATGGGAAGCTTCGAGAAAGAACAAGTGAAGG AGATAGCGATTGCCATCGAACAGAGTGGGCAACGGTTTCTATGGTCACTCCGT CGTCCAACATCGCTAGAAAAGTTTGAGTTTCCAAAGGATTACGAGAACCCGGA GGAGGTTTTGCCAAAGGGATTTCTTGAAAGGACAAAAGGTGTGGGAAAGGTTA TCGGGTGGGCCCCACAAATGGCGGTGTTGTCTCACCCGTCAGTGGGAGGGTTC GTGTCCCACTGTGGGTGGAACTCCACATTGGAGAGCATATGGTGTGGGGTCCC AATAGCGGCTTGGCCACTATATGCGGAACAAAAAATTAATGCTTTTCAATTGG TGGTAGAGATGGGAATGGCAGCTGAGATTAGGATCGACTATCGGACTAATACG AGACCGGGTGGTGGTAAAGAGATGATGGTAATGGCTGAAGAGATTGAGAGTG GTATTAGGAAGTTGATGAGCGATGATGAGATGAGAAAGAAAGTGAAAGGTAT GAAGGATAAAAGTAGGGCTGCTGTTCTTGAAGGTGGATCATCTCACACATCAA TTGGGATTTTAATTGAGAATTTGGTGAGTATAACGATCTAG (SEQ ID NO: 90).

[0178] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 76%, at least 77%, at least 85%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 90, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 76% to 95%, 77% to 98%, 80% to 99%, or 76% to 100% homology or identity to SEQ ID NO: 90. Each possibility represents a separate embodiment of the invention.

[0179] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGTGGGTCTCAAATGTTTTTGGATCTTGCAAAAAGGTTTTCGTGAGTCGAAG GGCATAATAGTAAATACATTTCAAGAACTCGAAAGGCGTGGAATCGAGCACCT TTTAAGTAGTAACATGGACCTCCCACCTGTGTTTCCTGTGGGGCCGATATTGAA CTTGAGAAATGCGCGAAACGATGGTAAAATGGCCGATATCATGACATGGTTAA ATGACCAGCCAGAGAACTCGGTTGTGTTCTTGTGTTTCGGAAGTAGGGGAAGC TTCAAGGAGGAACAAGTGAAGGAGATAGCAATTGCCATCGAACAAAGTGGGC AACGGTTTCTATGGTCACTCCGTCGTCCAACATCGATAGAAACGTTTGAGTTTC CAAAGTATTACGAGAACCCGGAGGAGGTTTTGCCAAAGGGATTTCTTGAAAGG ACAAAAAGTGTGGGAAAGGTTATCGGGTGGGCCCCACAAATGGCGGTATTGTC TCACCCGTCAGTGGGAGGGTTCGTGTCCCACTGTGGGTGGAACTCCACATTGG AGAGCATATGGTGTGGGGTCCCAATAGCGGCTTGGCCACTATATGCGGAACAA CAAACTAATGCTTTTCAATTGGTGGTCGAGATGGGAATGGCAGCAGAGATTAG GATCGACTATCGGACTAATACACCACTGGTTGGTGGTAAAGACATGATGGTAA CGGCTGAAGAGATTGAGAGAGGTATTAGGAAGTTGATGAGCGATGATGAGAT GCGAAAGAAAGTGAAAGACATGAAGGATAAGAGTAGAGGTGCAGTTTTAGAG GGTGGGTCATCTCATACATCAATTGGGAATTTAATTGATGTTTTGGTGAGTATA ACGATCTAG (SEQ ID NO: 91).

[0180] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 78%, at least 80%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 91, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 78% to 100%, 80% to 99%, or 79% to 100% homology or identity to SEQ ID NO: 91. Each possibility represents a separate embodiment of the invention.

[0181] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCGACCAACAACCTCCATTTCCTTCTAATTCCCCATATAGGTCCAGGCCAC ACTATTCCCATGATAGATATGGCTAAACTTCTTGCAAAACAACCAAATGTAAT GGTTACAATAGCTACAACACCTCTTAATATCACCCGTTACGGGCACACTCTCGC AGACGCCATCAACTCGTTTCGCTTCTTTGAGGTTCCATTTCCGGCAGTTGAGGC TGGATTACCTGAAGGATGTGAAAGCACGGATAAAATCCCAAGTATGGATCTAG TACCGAACTTTTTAACCGCGATTGGTATGCTAGAACAAAAGCTAGAAGAGCAT TTTCACTTGCTAGAGCCTCGTCCGAATTGTATTATTTCTGATAAGTACATGTCG TGGACGGGTGATTTTGCTGATAAGTATCGGATCCCTAGAATTATGTTTGATGGA ATGAGCTGTTTTAACGAGTTATGTTACAACAATTTGTATGAAAACAAGGTGTTT GAAGGGATGCATGAAACAGAACCATTTGTTGTCCCTGGTTTACCCGATAAAAT TGAGCTAACACGAAAACAGCTCCCACCTGAGTTTAACCCGAGCTCGATTGATA CAAGTGAGTTTCGTCAGCGGGCTAGGGACGCTGAGGTGAGGGCTTATGGAGTT GTGATCAATAGTTTTGAGGAGTTGGAACAAGAATATGTTAATGAGTATAAGAA GTTAAGAAAGGGTAAGGTTTGGTGTATCGGCCCGCTGTCACTGTGCAATAGTG ACAATTCGGATAAAGCCCAAAGAGGAAATATAGCGTCAGTCGATGAAGAAAA ATGTTTAAAATGGCTTGATTCTCATGAAGCCGACTCAGTAGTTTACGCTTGTTT TGGTAGCCTTGTTCGGGTCAACACCCCACAACTAATTGAGCTTGGTTTAGGCCT AGAAGCATCAAATCGCCCGTTCATTTGGGTGGTTAGATCGGTTCATAGAGAAA AAGAGGTCGAGGAATGGCTAGTGGAAAGTGGTTTTGAGGAGAGAATTAAAGA TAGAGGTTTAATAATCCGAGGTTGGGCCCCACAAGTACTTATCTTGTCTCACCC TTCTATTGGAGGGTTTTTAACGCATTGCGGTTGGAACTCGACCCTAGAATCAGT CTGTGCAGGTGTTCCAATGATCACATGGCCTCAATTTGCAGAGCAATTTATCAA CGAGAAGCTAATAGTGCAAGTGTTGGGGATTGGTGTGGGTGTTGGAGTTGATT CTGTTGTCCATGTGGGCGAAGAAGATAGATCTGGGGTGAAAGTGAAGAGGGA GAGTGTTACGAAGGCTATTGAGAAAGTCATGGATGACGAGATTGATGGAAATG AGAGACGGAGGAGATCGAAAGAGTTTGGAAAGATAGCTAATAACGCGATTAA AGAGGGAGGGTCTTCATACCTTAACTTGACTCTGCTAATTCAGGACATAATGC GTTATGCAAATGCAGATGCTTCAAGCTAA (SEQ ID NO: 92).

[0182] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 92, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 88% to 99%, 89% to 99%, or 87% to 100% homology or identity to SEQ ID NO: 92. Each possibility represents a separate embodiment of the invention.

[0183] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAAAAAACACCTCATATAGCCATTGTACCAAGTCCAGGAATGGGCCACTT GATCCCTTTAGTTGAGTTTGCTAAAAAACTAAAAAATCACCACAACATACATG CAACTTTCATCATCCCAAATGATGGACCTTTATCTATTTCTCAAAAGGTTTTTCT TGATTCACTTCCTAATGGTTTAAACTATCTCATTCTACCTCCGGTAAATTTTGAT GATTTACCACAAGATACCCAAATCGAAACTCGAATTAGTCTAATGGTAACACG GTCTCTTGATTCGCTACGTGAAGTGTTTAAGTCATTAGTTGTGGAAAAAAATAT GGTTGCTTTGTTTATTGATCTTTTTGGGACAGATGCATTTGATGTTGCTATTGAA TTTGGTGTTTCACCTTATGTGTTCTTTCCATCAACTGCTATGGCTTTATCTTTGTT TCTATATTTGCCTAAACTTGATCAGATGGTTTCATGTGAGTATAGGGAGCTTCC TGAACCGGTTCAAATTCCAGGTTGTATACCGGTTCGTGGACAAGACTTGGTTG ACCCGGTTCAAGATAGAAAGAATGATGCATACAAATGGGTGCTTCATAATGCA AAGAAGTATTCAATGGCTAAGGGTATAGCGGTAAATAGCTTCAAGGAGTTAGA AGGTGGAGCTTTGAATGCTTTGCTAGAAGATGAACCGGGTAAGCCAAAAGTTT ATCCGGTCGGACCGTTAGTACAAACCGGTTTTAGTTGTGATGTTGATTCGATAG AGTGCTTGAAGTGGTTAGATGGTCAGCCATGTGGTTCTGTTTTGTATATATCTT TTGGAAGCGGTGGGACCCTTTCATCCAGTCAACTTAATGAGTTAGCTATGGGTT TGGAGTTGAGTGAACAACGGTTCATATGGGTGGTTAGAAGCCCGAACGATCAA CCAAACGCCACGTACTTTGATTCTCATGGTCACAAAGACCCTCTTGGTTTTTTG CCCAAAGGGTTCTTGGAAAGAACCAAAGGAATTGGGTTTGTGATCCCTTCTTG GGCTCCACAAGCCCAGATCCTGAGTCACAGTGCCACAGGTGGATTTTTAACCC ACTGTGGTTGGAACTCAATTCTCGAGACTGTAGTCCATGGTGTGCCGGTGATTG CTTGGCCACTTTATGCCGAGCAAAAGATGAATGCAGTGTCTTTAACCGAGGGT ATAAAAATGGCGTTAAGACCCACGGTTGGTGAAAATGGGATTGTGGGTCGCTT AGAGGTTGCGAGAGTTGTGAAGAGTTTACTGGAAGGAGAAGAAGGGAAGGCG ATTAGGAGTCGAGTTCGTGATCTCAAGGATGCTGCTGCTAATGTTCTTAGTAAA GATGGGTCTTCTACAAAAACTTTAGATCAATTGGCTGTACAGTTGAAAAAACA AGAATTAAGCTAG (SEQ ID NO: 93).

[0184] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 93, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 88% to 99%, 89% to 99%, or 87% to 100% homology or identity to SEQ ID NO: 93. Each possibility represents a separate embodiment of the invention.

[0185] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGACTCAAAAGCAAATGCAAATGCAACCTCACTTTCTCTTAGTAACATATCCC GCACAAGGTCATATTAACCCGTCTCTCCAGTTCGCTGAACGTCTCATTCGGTTG GGTGTCAAAGTCACCTTCACAACAACTGTCTCTGCTTACCGCCGAATGAGTAA AGCGGGCAACATCTCAGAGTTTTTAAATTTTGCTGCTTTTTCAGACGGCTTTGA TGACGGTTTCAACTTCGAAACAGACGATCATGGTCTCTTCTTAACTCAATTGAG AAGCAGGGGAAAAGATAGCTTGAAAGAAACAATTCTTTCAAATGCTAAAAAT GGAACTCCAATTAGTTGTTTGGTTTACACACTCCTACTCCCTTGGGCTCCTGAG GTGGCACGTGGCCTAAACGTGCCCTCAGCCTTTCTTTGGATTCAACCAGCTTCT GTTTTACGACTTTACTATTACTACTTCAATGGGTACAATGAACTCATCGGTGAC GATTGTAATGAACCTTCATGGTCCATTCAATTACCAGGGTTACCATTGCTCAAA AGTCATGACCTTCCCTCCTTTTGTCTCCCTTCAAATCCTTACAGTAATGTACTGG CTCTAGTCAAAGAGCATTTAGATATGCTGGATCTGGAAGAGAAGCCTAAAATA CTTGTGAATAGTTTTGATGAGTTGGAGAGGGAGGCGTTGAATGAAATTAATGG AAAACTAAAAATGGTCGCCGTAGGGCCTTTGATTCCATCAGCTTTTTTGGATGG ACAAGATGCATCTGACAAATCTTTTAGGGGAGATTTGTTTGAAACATCCAAAG ATTATTTGGAATGGATGAATACAAAGCCTGAAGGGTCCATTGTTTACATATCTT TTGGTAGTCTTTTAGTGTTCTCAAAGATACAAAAGGAGGCAATGGCACATGCT TTGTTAGAGTGCGGGAGGCCGTTCTTGTGGGTGATAAGAGATGGAGAACAAGG AGAACAACTAAGTTGTATTGAGAAATTGGAACAATTAGGTTTGATAGTCCCAT GGTGTAGTCAACTAGAGGTATTATCACACCCTTCTTTAGGTTGTTTTGTGACAC ATTGTGGTTGGAACTCGACTTTAGAGAGTATAGTTTGTGGAGTTCCTGTGGTGG CATTTCCTCAATGGACAGATCAGACGACAAATGCAAAGCTTCTAGAAGACGTA TGGGGAACAGGGGTGAGAGTGACAACTAATGAAGACGGGGTTGTTGAAAGCG AGGAGATAAGAAGGTGCATCGAAATGGTAATGGGAGGCCGTGATAGTGAATC AACAATGAGAAAGAATGCTAAGAAGTGGAAGGATGTGGGAAGAGAGGCTATG AAAGAAACAGGATCTTCTTATATGAATCTCAAGGCTTTTATTAAAGAAGTGAA TGATGGTGAATCAACCATCAAAACTGAAATTGTTTCAACTATATGA (SEQ ID NO: 94).

[0186] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 87%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 94, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 98%, 81% to 99%, 85% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 94. Each possibility represents a separate embodiment of the invention.

[0187] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGACTAAAATACAACAGCAACCTCACTTTCTCTTAGTAACATATCCCGCACA AGGTCATATTAACCCGTCTCTCCGGTTCGCCGAACGACTCATTCGGTTGGGTGT CAAAGTCACCTTCACAATAACTGTCTCTGCTTACCGCCGAATGAGTAAAGCGG GCCACATCTCAGAGTTTTTAAATTTTGCTGTTTTTTCAGACGGCTTTGATGACG GTTTCAACTCCAAAACAGACGATTATGGTCTCTTCTTAACTCAATTCAGAAGCA GGGGAAAAGATAGCTTGAAAGAAACAATTCTTTCAAATGCTAAAAACGGAAC TCCAGTTAGTTGTTTGGTTTACACACTCCTACTCCCTTGGGCTCCTGAGGTGGC ACGTGGCCTAAACGTGCCCTCAGCCTTTCTTTGGATTCAACCAGCTTCTGTTTT ACGACTTTACTATTACTACTTCAATGGGTACAATGAACTCATCGGCGACGATTG TAACGAACCTTCATGGTCCATTCAATTACCAGGGTTACCATTGCTCAAAAGTCG TGACCTTCCCTCCTTTTGTCTCCCTTCAAATCCTTACGCTGATGTACTGACTTTA GTCAAAGAGCATTTAGATGTGTTGGATTTGGAAGAGAAGCCTAAAATACTTGT GAATAGTTTTGATGAGTTGGAGAGGGAGGCGTTGAATGAAATTGATGGGAAAC TAAAAATGGTTGCCGTAGGGCCTTTGATTCCATCAGCTTTTTTTGGATGGACAG GATGCATCTGA (SEQ ID NO: 95).

[0188] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 95, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 95%, 82% to 97%, 81% to 98%, or 77% to 100% homology or identity to SEQ ID NO: 95. Each possibility represents a separate embodiment of the invention.

[0189] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGTTCATGGCGGAATTCAAGAACAACGTCTACAAAGTTTTTATGGTTGATT TTACCGTTGATGGTGGTGACGGTGATTATAGGGGTAAAAAAGTCAAATTATGG GTCGAAGTATAATTATCCTTGGGTTTGGAGTTCAGTGATTAATTCTTATTCTTCT TCTGCGGTTAAAGAAGATGTAACGGTGGTGGCTGAAGGTCCTGTTGAATCATT TGGGTTGCGGTCAACGGTGGTCAACGGTGGTGGTGTGGTGGCGGAAGGGCCGT CGGAAGATTTTGGTTTTAATTCTTCTTATCCACCGTTGGCTATGGAAGATGAAA TGGATGTTGAGCTACCTGCTATTGCCAAGGAAGATGACTTGAACGCGACGTTG AGTGGACCCGACCTTTTTGTGTCTGCAAATCAAACTGGCGGACTTCATGTTGAT ATTGGAATCAACAGTAAGTATACCAGTTTGGATAAGCTTGAAGCCCGCTTAGG TCAGGTTCGAGCTGCAATAAAAGAAGCCGAATCAGGAAATAGAACTTACGATC CGGATTATGTACCAGAGGGTCCTATGTACTGGCATGCAGCCTCATTTCACAGG AGTTATTTGGAGATGGAAAAGCAATTTAAGGTGTTTGTATATGAAGAAGGAGA ACCACCAATATTTCATAACGGTCCTTGCAAAAACATATATGCAATGGAAGGTA ACTTTATCTACCATATGGAAACAACCAAGTTTAGGACAAAAAACCCCGAAAAA GCTCACACGTTTTTTCTCCCAATGAGTGCTGCAATGATGGTGAGGTTTATCTTT GAGCGTGATCCAAATGTTGACCATTGGCGTCCTATGAAGCAAACAATTAAAGA TTATGTTGATCTTGTGGGTGGTAAGTACCCATTTTGGAATCGAAGCTTAGGAGC CGATCACTTTACTGTTGCGTGCCACGATTGGGTGAGTAAAGTCTTTTATCCCAT CATTTTCATGCTTTTACTAGTATTTATCTTCAGAATGTCGACTGGATGCTGA (SEQ ID NO: 96).

[0190] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 96, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 95%, 83% to 98%, 82% to 99%, or 82% to 100% homology or identity to SEQ ID NO: 96. Each possibility represents a separate embodiment of the invention.

[0191] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGTCAACCGTTGAGGTTGCAAAGTTACTTGTGAATCGAGATCATCGTCTCTTC ATAACATTCCTTATCATTCAGCCTCCTAGCTCGGGTTCTGGCTCAGCTATCACC ACCTACATCGAATCATTAGCTGAGAAAGCTATGGACCGCATATCCTTCATTGA GCTACCTCAAGATAAAATCCCACCACCACGTTACCCGAAATCCCTGCCAACTG CAGAATCGAAAGCTCATCCCCTTATTTTCATGATTGAGTTCATTAAGTGTCACT GCAAATATGTTAGAAACATTGTATCTGACATGATAAGTCAACCGAGTTCGGGT CGGGTAGCTGGGTTGGTAATCGACATGCTTTGTTTCAGCATGATGGATGTCGCT AATGAGTTCAACATTCCAACCTATGTATTTGTCACTTCTAATGCTGCTTTTCTTG GATTTTATTTATATGTCCAGATACTCTCTAATGATCAGAACCAAGACGTTGTTG AGCTGAGCAAATCTGATACCGAGATATCGGTTCCAGGTTTTGTAAAGCCGGTG CCAACGAAAGTCTTCTGGACTGTTGTCCGCACTAAAGAAGGACTGGACTTTGT TTTGTCATCTGCCCAGAAACTTAGACAAGCCAAAGCAATCATGGTTAATACCTT CTTGGAGTTGGAAACACACGCAATCAAGTCGCTGTCTGATGACACCAGCATCC CGCCTGTGTATCCAGTGGGACCGATACTCAATTTAGAAGGTGGTGCTGGCAAA ACGTTCGACAATGACATTAGCAGGTGGTTGGACAGTCAACCGCCTTCCTCGGT GGTGTTCTTGTGCTTTGGAAGCCACGGATGTTTTGATGAGATCCAAGTGAAGG AGATAGCACATGCTTTAGAGCAGAGTGGCCACCGTTTCTTGTGGTCCCTACGTC GACCTCCATCAGATCAAACATTAAAAGTTCCCGGTGATTACGAGGATCCAGGA GTGGTATTACCGGAAGGATTTCTTGAGCGAACTGCTGGACGTGGGAAAGTAAT TGGGTGGGCCCCGCAGGTGATGGTGCTGGCTCACCGTGCAGTTGGAGGCTTCG TGTCCCACTGTGGGTGGAACTCGTTGTTGGAGAGTTTGTGGTTCGGCGTACCAA CGGCAACATGGCCGATCTATGCTGAGCAGCAGATGAATGCGTTTGAAATGGTG GTGGAGCTGGGACTGGCTGTGGAGATAACATTGGATTATAGGAATGATATGGA TATGTTCATTGTCACCGCACAGGAGATAGAAAGTGGTATAAGAAAGGTGATGG AGGATAATGAGGTAAGAACAAAAGTGAAAGAGAGAAGTGAGAAGAGTAGAG CAGCAGTGGCGGAGGGGGGGTCATCGTATGCATCTGTTGGTCATCTTATTAAA GAATTTACAGGAAACATCTCCTAA (SEQ ID NO: 97).

[0192] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 97, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 82% to 97%, 81% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 97. Each possibility represents a separate embodiment of the invention.

[0193] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGTCATCATTCATCAACTTTGTTGAATCCACAACACAACTTCAACCACAATTC GAACAACTCATCCAAACACTTCTTCCCATAACTGCGATAATATCGGATGGTTTT TTGATGTGGACACAAGATTCCGCCGAAAAATTCAATATCCCACGTCTGGTTTTT TATGGGACAAACATATTTTTCATGACTATGTGTAACATTATGGCACAATTTAAG CCACATGCGGCTGTTAATTCTGATGATGAGGCGTTTGATGTACCCGGTTTCACC AGGTTTAAGTTGACGGCTAATGATTTTGAGCCGCCTTTTAATGAGGTTGAACCG AAAGGTTCAATGTTGGATTTTTTATTGGAGCAACAAAAGGCTATGGTTAGGAG CCATGGGTTGGTGGTTAATAGTTTTTATGAGATTGAACATGAGTTTAATGTTTA TTGGAATCAGAACTATGGACCTAAAGCTTGGTTAATGGGACCATTTTGTGTAG CTAAGCCATATGCATCAAACGTCATGGATTCCGAGATATCGACTAAGGTGGTG AAAAAATCAGCATGGATCCAGTGGCTTGACAGGAAGCTTGCAGCGAACGAGC CAGTGTTATACATCTCATTTGGAACACAGGCAGAGGCGTCTATGGAGCACTTA CACGAGGTCGCTATTGGTTTGGAACGATCAAATGTAAGCTTCATTTGGGTGGT AAAAGCGAAGCAGATGCAATTAATTGGAGCAGGGTTTGAAGAGAGGGTGAAG GGGAGAGGAAAAGTGGTGACAGAATGGGTGGATCAGATGGAAATCTTGAAAC ATGAAATTGTAAGCGGGTTTTTAAGTCATTGTGGGTGGAACTCACTGCTAGAG AGTATGTGTGTGGGTGTGCCGGTGCTTGCAATGCCGTTGATGGCGGATCAACT CTTAAATGCAAGGTTGGTTGTGGAGGAGATTGGGATGGGGCTACGGTTGTGGC CGAGGGGTATGGTGGCACGTGGGATAGTTGGGGCGGAGGAAGTCGAGAAAAT GGTGGTGGAGTTGATGGAAGGGGAAGGTGGGAGAAGGGTGCGGAAAAGGGTC ATCGAGGTTAGAGAAATGGCATATGGTGCGATGAAGGAAGGAGGGTCATCAT CGAGGACATTAGACTCGTTGATTGATCATGTTTGTGAAGCCTTTCATAAGACGG TTTAA (SEQ ID NO: 98).

[0194] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 78%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 98, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 78% to 95%, 82% to 97%, 81% to 98%, or 78% to 100% homology or identity to SEQ ID NO: 98. Each possibility represents a separate embodiment of the invention.

[0195] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGGAGCTTGAAGAAAGGTGCACATATACTAATATTCCCATTCCCAGCACA AGGTCATATGCTCCCACTCCTAGACCTAACTCACCACCTAGCCACCAATGGGTT AACCATAACCATATTAGTCACACCCAAAAACCTACCAATCTTGAACCCACTTTT ATCTTCATCTCCAAACATCCAACCACTAGTCTTCCCTTTCCCACCTCACCCAAG ACTTCCACCACATGTTGAAAATGTTAAAGACATAGGTAACCATGCAAATGTCC CAATCACAAACTCACTAGCCAAATTACAAGACCAAATAATCCAGTGGTTTAAC TCCCACCATAACCCTCCTGTTGCCATCATCTCAGATTTCTTTCTTGGATGGACCC AACACCTTGCAAACAAACTTGGTATCCCTCGTGTCGGGTTTTTTTCTTCTGGTG CTTACTTGACTGCTGTTCTTGATTATGTTTGTCATAATATTAAAACTGTTAGGTC TCAAGAGGAGACTGTTTTTCATGACTTGCCAAATTCTCCTTGTTTTAAATTCGA GCATCTTCCGGGTTTGGCCCAGATTTATAAAGAGTCCGACCCGGAATGGGAAT TGGTTCTTGATGGTCATATTGCGAATGGGTTAAGTTGGGGTTGGATTGTGAATA CTTTTGATGGGTTGGAGTCTCGGTATATGGAGTATCTGACAAAGAAAATGGGT GTCGGACGGGTTTTTGGTGTCGGGCCAGTTAATTTGTTAAACGGGTCGGATCCC ATGACCCGTGGGAAATCGGAATCCGGGTCTGATTCCGGTGTGTTGAACTGGCT CGATGGAAAACCCGATGGGTCGGTTTTGTATGTGTGTTTTGGAAGTCAAAAGT TTCTTACTAATGACCAAATGGAGGGATTGTCAATTGGGCTTGAACAAAGTGGG GTCCATTATGTTTGGGTTGTGAAAGACGAACAAGGTGATGCAATTAGGTCCGG GTCGGGTAGAGGACTAGTGGTAACGGGTTGGGCCCCGCAAGTTTCAATATTGG GTCATGGAGCGGTGGGTGGGTTTTTGAGTCATTGCGGGTGGAACTCTGTTTTGG AAGCAATTGTAAATGGAGTTATGATATTGGCTTGGCCAATGGAGGCTGATCAA TTTGTTAATGCTAAGTTGTTAGTGGATGACCATGGTATAGGGGTGTGGGTTTGT GAGGGGCCGAATACGGTTCCTGATTCAACCGAGTTGGCTCGTAAAATTGGTGA GTCAATGAGTACGGATAAGAGTGAGAAGGTAAAGGCGAAAGAAATGAAAAAC AAAGCAAATGAAGCAGTTAAAGAAGGTGGGAGCTCATCAATGGAATTAAGCA GGCTTGTTAAGGAGCTGTCTAACTTTGAGACAAATGGGCCATGA (SEQ ID NO: 99).

[0196] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 99, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 95%, 82% to 97%, 83% to 98%, or 82% to 100% homology or identity to SEQ ID NO: 99. Each possibility represents a separate embodiment of the invention.

[0197] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGATACCCAAACACAAGTCAAGAAACAAAAACTTGAAACCATGGAACATA AAACATCATCCGCCGAAATCTTCGTGCTACCATTTTTTGGTACGGGTCATATAA ACCCAGCAATGGAGCTTTGCCGGAACATTTCATCACATAATTACAAAACTACC CTCATCATCCCTTCACATCTTTCTTCATCTATTCCTTCTCCCTTTTCTTCAACTTT ACTTCATGTTGCTGAGATCCCTTTCACTGCTTCTGACCCGGAACCCGGATCCGG AAGAGGGAACCCACTTGATGCCCAGAACAAGCAAATGGGTGAAGGGATTAAG GCGTTTATGTCTGCAAGATCTGACGGATCAAAACTACCCACGTGTGTTGTTATT GATGTCATGATGAACTGGAGTAAAGAGATATTTGTTGATTACCAGATTCCTATT GTCTCTTTTTTTACTTCTGGAGCTACTAATACTGCTATGGGTTATGGTAGGTGG AAAGCTAAAATTGGTGATCTGAAGCCCGGGGAGACCCGTGTGATCCCCGGACT TCCTACTGAAATGGCCGTTACTTTTGCGGATTTAAATCAAGGTCCTAGAGGCCG TGGGCCTCGGCCGGATGGGTCAAGGCCTGACGGGCCAAGGTCTGGACCACCTG GTGGGATGAGGTCCGGACCACCTCACGGGATGAGGGGTGGGGGACGAGGTGG GCGGGGCGGTGGACGACCCGGCCCGGATGCGAAACCACGTTGGGTAGATGAA GTGGACGGGTCGGTAGCTTTGCTTATCAACACGTGTGACAATCTCGAGCGTGT GTTTATTGATTACATTGCTGAAGAAACCAAGATTCCCGTTTATGGTGTTGGCCC GTTGCTGCCCGAAAAGTATTGGAAGTCAGCGGGTTCGTTGCTTCGTGATCATG AAATGAGGTCTAACCATAAAGCGAATTACTCGGAAGATGAGGTGTTTCAATGG CTAGAATCGAAACCAGTTGGGTCGGTTATTTACATATCGTTTGGGAGTGAAGTT GGCCCGACTATAGACGAGTATAAAGAGTTAGCTGGATCATTGGAAGGATCGAA TCAGAATTTCATTTGGGTGATCCAGCCCGGTTCGGGGATAACGGGCATGCCAA GATCGTTTTTGGGCCCGGTTAATACGGATAGTGAGGAAGAAGAGGAAGGGTAT TATCCTGAGGGATTAGATGTTAAAGTTGGGAACAGGGGTTTGATCATCACTGG ATGGGCTCCACAGTTGTTGATTTTGAGCCACCCATCTACAGGCGGGTTCTTATC ACATTGTGGGTGGAATTCAACTGTTGAGGCGATTGGGCGAGGTGTTCCGATAT TGGGTTGGCCCTTGAGGGGTGATCAGTTTGATAATGCGAAACTTGTGGCGAAT CATTTGAAAATTGGGTTTGCGATGTCAAGTGTGGCGAGTGAAGGCGGACGACC TGGGAAGTTCAACAAGGAGACTATAACAGCAGGGATTGAGAAACTAATGAAT GATGAAGATGTGCATAAACAGGCAAAGAAACTTAGTAAAGAATTTGAGAGTG GGTTTCCAGTGAGTTCAGTTAAAGCATTGGGTGCTTTCGTGGAGTCTATTAGCC AGAAAGCAACCTAA (SEQ ID NO: 100).

[0198] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 74%, at least 80%, at least 85%, at least 87%, at least 93%, or at least 99% homology or identity to SEQ ID NO: 100, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 74% to 95%, 75% to 97%, 76% to 98%, or 74% to 100% homology or identity to SEQ ID NO: 100. Each possibility represents a separate embodiment of the invention.

[0199] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGTCACTCGTGACTAATAACCCACATTTACTAGTCTACCCATTACCTACCTCC GGCCATATCATTCCGTTACTCGACCTGACCGACCTTCTTCTCCGCCGTGGCCTC ACCATCACCGTCGTGATATCCACCACAGACCTTACGCTTCTCGACACTCTCCTA TCCTCACACCCCACGTCTCTACACAAACTTTACTTCCCCGACCCCGAAATCGGC CCATCTTCTCATCCCGTTATTGCCAGAATAATTGCCACCCAAAAACTATTTGAT CCAATTGTTAAATGGTTTGAATCGCACCCTTCGCCTCCAGTCGCCATCATTTCC GACTTCTTTCTTGGGTGGACTAATGAACTTGCATCACGTTTAGGTATTCGACGT GTGGTGTTTTCACCTTCGGGAGCTCTTGGTCATTCCATTTTACAAAGTTTGTGG CGTGACGTGGCGGAGATCAATGCAAAAAATGTTGATGGAAATGGAAACTACTC GATTTCTTTTACCGATATACCAAACTCGCCCGAATTTCATTGGTGGCAGTTGTC ACAACTTTTGCGTGTTCATAGGGAGGGAGATCCGGACTTCGAATTTTTTAGGA ATGGAATGTTGGCTAATACGAAAAGTTGGGGTATTGTTTACAACACATTTGAA AGGATTGAAAAGGTTTACATTGACCATGTGAAGAAACAAATAGGTCATGATCG GGTATGGGCAATAGGCCCATTACTTCCCGAAGAACATGGCCCAGTTGGTAGCA CCGCACGTGGTGGGTCCAGTGTAGTGCCACCTCATGACCTTCTCACGTGGTTGG ACAAAAAGCCCCATGACTCGGTCGTATATATATGTTTTGGGAGTCGATTGACG TTAAGTGAGAAGCAAATGAGTGCATTAGCAAGTGCACTCGAGCTCAGTAACGT TGATTTTATTTTGTGTGTGAAGGCAAGTGGTTCGAGCTTCATTCCTAGTGGGTT CGAAGATCGAGTGGTGGGTCGGGGGTTCGTAATCAAAGGTTGGGCCCCACAGT TGGCGATATTGAGACATCGGGCTGTGGGGTCGTTTGTGACTCATTGTGGGTGG AACTCAACATTGGAAGGTGTTTCATCAGGAGTGATGATGTTGACGTGGCCAAT GGGTGCAGACCAATATGCAAATGCTAAGCTATTGGTCGACCAGTTAGGTGTTG GGAAACGAGTTTGTGAAGGTGGACCCGAGAGTGTTCCTGATTCAACTGAGTTG GCTCGGTTGTTGGAAGAGTCACTGAGTGGTGATACATCCGAGCGAGTTAAAGT CAAGGAGCTAAGTCGGGAAGCTAACACAGCTGTGAAAGAAGGAACTTCAATA AGAGATTTRGAACATGTTCGTTAACCTTTTATCCGAGCTCTAA (SEQ ID NO: 101).

[0200] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 101, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 95%, 82% to 97%, 81% to 98%, or 80% to 100% homology or identity to SEQ ID NO: 101. Each possibility represents a separate embodiment of the invention.

[0201] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCAACCCAAGTCAAAACCGAGGAGAAGCATTTGAAGGTAGAAATCATAA ACAAAACCTATGTGAAACCTGAAACACCACTAGGAAGAAAAGAGTGTCAATT GGTCACATTTGATCTTCCTTATATAGCCTTCTACTACAACCAAAAGTTGATCAT CTATAAAGGTGGTGTCGAGGAGTTCGAGGATACCGTCGAGAAACTGAAAGAC GGGTTAAAGGTAGTTTTGGGAGAGTTTCATCAATTGGCTGGAAAATTAGACAA AGATGATGACGGGGTGTTTAAGGTAGTGTACGATGATGACATGGATGGGGTGG AGGTGCTTTCTGCGGTCGCGGAAGACACTGCGACCGCAGATTTGATGGACGAA GAAGGGACCATCAAGCTTAAGGAGTTGGTCCCTTATAATAGTGTTTTGAACAT AGAGGGGCTTCATCGTCCGCTTTTATCGATTCAGATAACAAAACTAAAAGATG GGCTTGTACTGGGCTGTGCGTTCAACCACGCGATTTTAGACGGTACATCCACCT GGCACTTCATGAGCTCCTGGGCCCAAATTTGCTCCGGATCCAAATCCATTTCAG CGGCGCCTTTCCTTGACCGTACCCAAGCGCGTAACACGCGCGTGAAACTCGAT CTCACCCCTCCCGCCCAAACTAACGGCAATTCAAACGGCGACACTAACGGTGA TGCGAGCGCCACGAAGCCACCAGCACCGGCACCGTTAAGAGAAAAAATCTTC AAATTCTCAGAGTCAGCAATCGACAAAATCAAAGCAAAAATCAATGCGAATCC ACCGGAAGGATCAACCAAGCCATTCTCCACATTTCAATCGCTCTCCACACACA TATGGCACGCAGTTACACGCGCTCGCAATCTAAAACCGGAAGACTACACCGTT TTCACTGTTTTCGCCGATTGCCGGAAACGTGTCGATCCTCCGATGCCGGATAGC TATTTCGGAAACCTAATTCAAGCGATCTTCACCGTCACCGCTGCCGGATTATTG CAGGCGAATCCACCGGAATTCGCGGCGTCAATGATACAAAAAGCGATTGATAT GCACGATGCGAAAGCAATTGAAGCGCGTAACAAAGAATGGGAAAGTAATCCG ATTATATTTCAATACAAAGACGCCGGAGTTAATTGTGTTGCGGTTGGGAGTTCA CCTAGGTTTAAGGTTTATGATGTGGATTTCGGGTTTGGTAAACCCGAAAGTGTT CGGAGCGGGGCGAATAACCGGTTTGATGGTATGGTTTATTTGTATCAGGGAAA AAGTGGTGGAAGGAGTATTGATGTGGAGATTAGTTTGGATGCAAGTGCAATGG GAAATCTTGAAAAGGATAAGGAATTTCTTATCCAAGAATAA (SEQ ID NO: 115).

[0202] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 84%, at least 87%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 115, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 84% to 100%, 88% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 115. Each possibility represents a separate embodiment of the invention.

[0203] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCTTCTCTTCCTCTCTTAACTGTTCTTGAACAATCCCATGTATCACCACCGC CAGCCACCGTAGTCGATAAATCGTTGTCGCTAACCTTTTTCGATTTCCTGTGGC TAACTCAACCTCCAATTCACAATCTTTTCTTTTACGAGTTTTCAATCGACGAAA CTCAGTTCGTGGAAACTATCGTTCCTAGTCTTAAAAACTCGTTATCAATCACTC TTCAACATTTTTACCCGTTCGCCGGTAACCTTATCTTATTTCCTGATAACAAAA GGCCTGAAATTCGTTACGTTGAAGGTGATTATGTCATGGTTACATTTGCAAAAT CTAGCCTTGACTTCAATGAACTAGTAGGAAACCATCCTAGAGATTGTGACCAG TTTTATGATCTTATTCCTCCATTAGGTGAAAGTGTGAAAACTTCTGAATTTCGA AAAATCCCACTCTTTTCGGTCCAGGTGACGTTTTTTCCACAAAAAGGCGTATCG ATTGGTATGACGAATCATCATAGTCTTGGCGATGCTAGCACTCGGTTTTGTTTC TTGAACGCGTGGACATCGATTTCTAGATCTAGTTCAGATGAGTCATTTCTAGCA AACGGAACTAAACCGTTTTACGATAGAGTGATAAGTAACCCGAAACTAGATCA AAGTTATCTAAAATTTTCCAAGATCGATACTCTTTACGAGAAGTATCAACCTTT AAGCCTCTCTAGACCATCTAATAAACTTCGTGGCACGTTTATCTTGACGCGAAA AATCCTAAACGAGTTGAAAAAAAGTGTGTCAATTAAACTACCAACTTTATCAT ATGTATCATCTTTTACGGTTGCATGTGGTTATATTTGGAGTTGCATAGCGAAAT CACGAAACGATGATCTACAACTATTCGGGTTCACTATTGATTGTAGGGCACGTT TGGATCCACCGGTTCCATCAACTTATTTTGGGAATTGTGTCGGGGGTTGTATGG CGATGGCAAAAACAACGTTGTTAACCGAAGACGATGGATTTATAACGGCTGCT AAATTGCTTGGAGAAAGTTTACACAAGACGTTGACCGAATCGGGTGGAATCGT GAAAGATATAGAAGTGTTTGAAGATTTGTTTAAGGATGGATTACCAACAACTA TGATAGGAGTTGCGGGAACACCAAAGCTTAAGTTTTATGAGACGGATTTCGGG TGGGGGAACCCGAAAAAGGTGGAAACGATTTCGATTGATTATAACATGTCGAT TTCTATGAACGCTTGTAGAGAATCGAAGGATGATTTGGAGATTGGTGTTTGCCT TATGAATACTGAAATGGAAGCTTTTGTTCGTTTATTTGATGAAGGATTAGAATC ATACGTTTAG (SEQ ID NO: 116).

[0204] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 85%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 116, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 80% to 100%, 85% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 116. Each possibility represents a separate embodiment of the invention.

[0205] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGAAGTGAAAATGTTCACAAAATAATGAAAATCAACATCACTAAATCATC ATTTGTACAACCCTCAAAGCCTACAGTACTACCCACTAACCACATATGGACTTC TAACTTAGATTTAGTTGTGGGTAGAATTCATATTTTAACCGTTTACTTTTACCGT CCAAATGGTGCTTCGAATTTTTTTGATCCAATTGTTATGAAAAAAGCTTTAGCT GATGTGCTTGTTTCTTTTTATCCGATGGCCGGAAGAATAAGTAAAGATGATAAT GGTAGAGTTGTAATTAATTGTAATGATGAAGGTGTTTTGTTTGTTGAAGCTGAG TCAGATTCCACGTTGGATGACTTCGGTGAGTTTACACCGTCTCCGGAGCTCCGA CAACTTACCCCGACGATTGATTACTCCGGTGACATTTCAACGTACCCGCTATTT TTTGCACAGGTAACGCATTTCAAGTGTGGAGGAGTTGGTTTTGGTTGTGGTGTG TTTCATACACTTGCAGATGGTCTATCCTCTATACATTTCATCAACACATGGTCG GACATGGCTCGTGGTCTCTCGATAGCCATCCCGCCATTCACTGACCGGACCCTT CTTCGTGCACGTGAACCACCCACTCCCACTTTTGACCACGTAGAGTACCACCTC CCTCCGTCCATGAAAACTACCTCACAAACCAACAAATCCAGAAAGCCTTCCAC GGCCATGTTAAAGCTTACGCTTGATCAACTAAATGCTCTCAAAGCTGCTGCTAA GAATGAAGGCGGCAACACCAATTATAGCACGTACGAGATCCTGGCGGCTCATT TATGGCGGTGTGCCTGCAAGGCTCGAGGACTCCCTGATGACCAACTAACCAAA TTGTACGTGGCAACAGATGGACGGTCCAGATTGAGCCCTCAACTCCCACCAGG CTATCTAGGCAATGTTGTGTTCACCGCCACCCCAGTTGCCAAATCAGCTGACCT CACGACTCAACCATTGTCTAATGCAGCATCTTTGATCCGAACCACATTGACAA AAATGGATAACGACTATTTGAGATCTGCCATTGATTACCTTGAGGTGCAGCCA GATCTATCTGCTTTAATTCGTGGTCCTAGTTACTTTGCTAGCCCGAATTTGAAC ATAAACACGTGGACCCGGTTGCCAGTACATGATGCGGATTTCGGGTGGGGTCG GCCTGTTTTCATGGGACCAGCAGTGATATTGTATGAGGGCACCATCTATGTTCT ACCAAGCCCAAACAATGATAGGAGTATGTCATTGGCAGTCTGTTTAGATGCAG ATGAACAACCATCGTTTGAGAAGTTCCTGTATGACTTTTAA (SEQ ID NO: 117).

[0206] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 90%, at least 93%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 117, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 90% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 117. Each possibility represents a separate embodiment of the invention.

[0207] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGCCTTCATCATCATCATCGCCTTCTTCAACAGCTGATTCAGTTACCATAATC TCAAAATGCACAGTCTACCCACATATGAAAAACTCAACACCAGAATCCTTGCA GCTCTCTGTTTCTGATCTCCCAATGCTTTCATGTCAATACATACAAAAAGGTGT CTTACTTTCTCAACCGCCACCCAATCACACCAACAATATCATTTCCCACTTAAA ACTCTCTCTCTCTAAAACCCTCTCTCACTTCCCACCTCTCGCCGGCCGTCTTTCG ACCGACTCTCACGGCCACGTCTCTATCATCTGCAACGATTCCGGCGTCGAATTC GTTCACTCCACCGCTAACCACCTCCACACCCACCAAATCTTACCCCTCAATTCC GACGTTCACCCATGTTTTAAAACCTTTTTTGCTTTTGATAAAACTCTGAGTTAC GCCGGCCACCACCAACCAATCGCCGCCGTGCAAGTCACGGAGCTTGCTGATGG ACTCTTTATTGGGTGTACGGTAAATCATGCTGTCGTTGACGGGACTTCTTTTTG GAACTTTTTTAATACTTTTGCTGAGATCACAAAAGGGTGTCAGAAAGTAACGA ACTTGCCGGATTTTAGCCGGGAAAATGTTTTCATTTCTCCGGTTGTTTTGCCTCT TCCCTCCGGCGGCCCGTCGGCGACGTTCTCAGGTGATGAGCCGTTGAGGGAAA GGATCATTCATTTCAGTAGAGACGCGATTCTGAAGATGAAATTCAGAGCTAAT AATCCTTTATGGCGGCAACCACAAAATTCGGATCTGGATGATACAGAGATTTA CGGGAAAGTGTGTAACGACATTAACGGCAAAGTTAACGGGGCGTTTAAACCCA AAAGTGAAATTTCGTCCTTCCAGTCTTTATGTGGTCAGTTATGGCGTGCGGTTA CACGCGCGCGTAAATTCAACGACCCTATAAAAACGACGACGTTTCGAATGGCG GTGAATTGTAGGCATAGGCTAGACCCAAAGGTCGACAAACTTTATTTCGGGAA CTTGATCCAAAGCATCCCGACCGTTGCTTCAGTTGGGGAGTTGTTATCACATGA TTTGTCGTGGGCAGCCAATGAGCTTCACCAAAATGTGGTGGCGCATGATAATG CTACCGTGCGCAGGGGTGTTAAGGATTGGGAGAATAATCCAAAGTTGTTTCCT TTGGGGAATTTTGATGGTGCTATGATCACAATGGGAAGTTCTCCTAGGTTTCCA ATGTATAATAACGATTTCGGGTGGGGCCGCCCAATGGCGGTTCGTAGTGGTAA AGCTAATAAGTTTGATGGAAAGATTTCGGCTTTTCCGGGACGTGATGGTGATG GTAGTGTCGATCTTGAGGTTGTTTTAGCTCCCGAAACCATGGCATGTCTTGAAC GTGACCATGAATTTATGCAATATGTATCTTAA (SEQ ID NO: 118).

[0208] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 90%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 118, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 118. Each possibility represents a separate embodiment of the invention.

[0209] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGAAGTGGTTTTTCATAACCCATAAAGCAACCCAGCGTTGCCTTAATTCTAAA CAATTTCATCTTCACGGAGGTTCGAATTTCGTTTCCGGTAATAGATGTTTTCTTG CATCACACTCAATGGAGCGGCCAAAATTCATGTTGATACCATATTATCCCTACC AAATTCGGTCCTTAAATTCGAGTCACCGATATAGTAGTACGTCACCCAGCGGA TCCCCTCACAGTTTTCTGAATGGTACTAAGAATGAAAACTATACGAAGAAGGT AGATCTTGAAATAATTTCAAGAGAAATCATCAAACCAGCTTCTCCAACTCCAC ATCATTTAAGAAACTTCAACTTATCACTTTTGGACCAAATAGTATTTGATTGCT ACACCCCTGTAATCCTCTTTATTCCAAATAGTAATAAGGCTACTGTTACGGATG TCATGATCAAAAGATTGAAACATCTCAAGGAGACTTTATCTCGAATTCTAAGT CAATTTTATCCCTTTGCGGGAGAAGTTAAGGACAGATTGCATATCGAATGCAA TGACAAGGGAGTCAATTACATCGAGGCTCAAATCAATGAGACATTGGAAGAAT TTCTATGTCATCCAGATAACGAAAAGGCGAGGGAGCTTATGCCCGAAAGCCCT CATGTTCAAGAATCTGCAATAGGAAACTATGCTATGGGTATTCAGATAAACAT TTTCAGTTGCGGAGGGATTGGACTTTCCATGAGCATGGCACACAAGATCATGG ACTTCTACACATATACGATCTTCATGAAAGCATGGGCTGCAGCTGTTCGAGGTT CACCAGATACAATTATTTCACCAAGTTTTGTGGCTTCTGAGGTCTTTCCTAATG ATCCCAGCCAAGAAGATTCAATTCCTATCGAGTTAAAGTCTAGTAATTTGCTTA GCACAAAAAGATTTGAGTTTGATCCTACTGCGTTGGCTCTCCTAAAGGGACAA GTTGTCGCCAGCGGATCACCTCCCCAACGAGGACCAAGTCGTATGGAGGCGAC AACAGCCGTTATTTGGAAGGCCGCTGCAAAAGCTGCATCGACTGTCAGAAGAT TCGATCCAAAGTCACCTCATGCGCTGGCGTTACCAGTAAATATACGTAAAAGG GCATCACCTGCTCTCCCAGACAATTCCATAGGAAACATAGTTATGCGAGGTAT AGCAATTTGTTTTCCTGAGAGCCAACCGGACTTGCCAACTCTTATGGGTAAAGT GAGAGAATCAATAGCGAAACTTAACTCAGATTACATTGAGTCCCTGAAAGGTG AAAAGGGGCATGAGACAGTTAATAAGATGTTGAAGGAGTTGAAGCTTCGGAC GAATATGACAAAGGTAGGAGGGAAATTCGTTGCTAGTTGCATATTTAATAGTG GAATATATGAGTTGGATTTCGGGTGGGGAAAACCGATATGGTTCTATGTTGTG AATCCAGGAAGCGATAGTTGTGTGGTTTTGACTGATACGCTGAAGGGTGGTGG TGTTGAAGCCACAATTACACTACCACCAGATGAAATGGAGATATTCGAACGTG ATCATGAGCTTCTATCCTATACTACCATCAACCCTAGTCCACTGCGATTTCTTG ACCATTGA (SEQ ID NO: 119).

[0210] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 74%, at least 80%, at least 85%, or at least 95% homology or identity to SEQ ID NO: 119, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 74% to 100%, 80% to 100%, 87% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 119. Each possibility represents a separate embodiment of the invention.

[0211] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAGGTGCCTGACCAATTCCACCTAAACATTCTTGAACAATGCCACGTTTCA CCATCACCAAATTCCATCATACCTTCATTTTCACTACCCTTAACATTCTTAGAC ATCCCATGGCTTTTTTACCCTTCAAATCAAACCCTTTTTTTCTTCCCAGAACCAC CACCCAAAACCACCATCATCACCACCCTTAAACAATCACTCTCTCTTACCCTCC ACCACTTCCACCCTCTCGCCGGAAACCTCTCACTTCCATCACCTCCGGCGGAAC CCCACATTGTTTACACCAAAAATGACTCAATTGCACTCACAATTGCTCAAACA AACACCAACATCCACCATCTTTCTTGCAATCACCCAAGAAGTGTAAAAAATCT TTACTCTCTTTTACCCAAACTCCCATCTCCATCCATGTCACGTGAAACTCACGT GGGCCTTGTTATCCCCCTTCTTACCATCCAAATTACGGTTTTTGCTGATTTGGGG TATTCGATCGGAGTCACTATGCAACATGCAGCAGTTGATGAACGGACATTTGA TCAGTTTATGAAATGTTGGGCGTCTGTTTGTACATCTTTGTTGAAAAATGACTC ACTTTTTACATTCAAGTCTACACCTTGGTACGATAGGAGCGTAATTATCGACCC CAAATCGCTGAAAACAACGTTTTTAAAGCAATGGTGGAACCGATCTAATTCTC TCAATGAGTCACATGATCAAGAAAATGATGATCATGATCTTGTTCTAGCAACTT TTGTTTTGAGTTCATTAGATATTAACATGATCAAGAATCATATTCTTGCAAAAT GCAAGATGATAAATGAGGATCCACCACTACATTTATCTCCTTATGTTAGTGCAT GTGCTTATTTATGGAAATGTTTAATCAAAATTCAAGAAACCCATGATTCTATTA AGGGTGGTCCTCTCTATTTAGGGTTTAATGCCGGTGGGATTACTCGATTAGGGT ACGACATACCTTCAACTTATTTTGGGAATTGTATAGCTTTTGGGAGATGCAAGG CATTTGAGAGTGAATTATTGGGTGATAATGGTATTGTTTTCGCGGCAAAATCGA TTGGAAAAGAGATCAAGAGGCTTGATAAGGATGTTTTAGGAGGTGCTAATAAG TGGATTAGTGATTGGGATGAATTAACCATTAGGCTTCTTGGTTCACCAAAAGTT GATTCATATGGTATGGATTTTGGATGGGGTAAAGTTGAGAAGGTTGAAAAAAT ATCAAGTATTTCAAATCACGGTAGGGTTAATGTAATTTCTTTGAGTGGATGTAA GGATTTTAAAGGTGGAATAGAGATAGGGGTTGTTCTTTCTGTGGCTAAAATGA ATGTTTTCACTTCCCTCTTTCATGGAGGTTTAATGGAGTTTGCATATTGA (SEQ ID NO: 120).

[0212] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 87%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 120, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 120. Each possibility represents a separate embodiment of the invention.

[0213] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGAAAAATAAGAACCCGACTAGTGTGATCAGAGAGGCTTTAGCTAAGGTATT GGTGTTTTATTATCCGTTTGCTGGCCGGCTCAAGGAAGGGCCGGCCAGGAAAC TGATGGTGGATTGTTCTGGTGAAGGTGTGTTGTTTATTGAGGCAGAAGCTGATG TCACGTTGAAACAATTTGGTGACGCACTTCAACCGCCATTTCCTTGTTTAGAAG AGCTTCTTTACGATGTTCCTGGATCTACTGGTATTCTAGATACACCATTATTGCT GATTCAGGTGACACGATTGTTATGTGGAGGTTTTATCTTTGCTCTACGACTCAA CCACACCATGAGCGACGCAGCAGGTCTCGTTCAATTCATGACAGGGCTTGGTG AAATGGCACAAGGTGCATCAAGGCCATCAACGTTGCCTGTATGGCAAAGGGA GTTGCTTTTTGCAAGGGACCCACCACGCGTGACTTGTACTCATCACGAGTATAC TGAAGTGGAAGACACCAATGGTACAATCATTCCGCTAGATGACATGGCACATA AATCATTTTTCTTTGGACCTTCTGAGATATCAGCGTTGCGAAGGTTCGTTCCAT CATACCTAAAAAAGTGTTCTACTTTTGAGGTCTTAACCGCTTGCCTATGGCGTT GTCGTACAATTGCACTCCAGCCAGATCCCGAAGAAGAGATGCGCATGATATGC ATTGTTAATGCGCGTGGAAAGTTTAATCCTCCCCTATTACCCAAAGGATATTAT GGAAATGGTTTCGCTATACCAGTGGCCATTTCAACAGCTGGAGACCTATCTAG CAAACCATTAGGTCACGCATTGGAACTTGTAATGAAAGCCAAATCCAATGTCA CTGAGGAGTATATGAGATCAGTAGCCGACTTAATGGTAATCAAGGGACGACCC CACTATACGGTTGTCCGAAGCTACCTTGTATCGGATGTGACTCACGCTGGATTT GATGTTGTTGATTTCGGGTGGGGGAAAGCGTCCTATGGAGGACCTGCAAAAGG GGGAGTAGGTGCTATTCCCGGAGTTGTTACTTTCTTTATACCTTTTACAAACCA TAAAGGCGAGTCTGGAATTGTGCTACCTATATGTTTGCCGAGTGCAGCCATGG ATAAGTTTGTTGAAGAGTTAAATAAGATGTTGGTCCCAGACAACAACGAACAA GTACTCCGAGAACACAAGTTACTAGTTCTCGCTAGATTGTAA (SEQ ID NO: 121).

[0214] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 121, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 121. Each possibility represents a separate embodiment of the invention.

[0215] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGCACAAATCGACACTCCATTGACATTCAAAGTCCGGAGACATGCACCGGA GCTGATCGCTCCAGCGAAACCTACGCCACGAGAACTAAAACCTCTATCCGACA TTGATGATCAAGAAGGCCTTAGGTTTCATATCCCAGTGATTCAATTCTATCGTA GCGATCCAAAGATGAAAAATAAGAACCCGGCTAGTGTGATCAGAGAGGCTTT AGCTAAGGTGTTGGTGTTTTACTATCCGTTTGCTGGCCGGCTCAAGGAAGGGCC TGCCAGGAAACTGATGGTAGATTGCTCTGGTGAAGGTGTGTTGTTTATTGAGG CGGAAGCTGATGTCACGTTGAAACAATTTGGTGACGCCCTTCAACCGCCGTTTC CTTGTTTGGAAGAGCTTCTTTACGATGTTCCTGGATCTACTGGCGTTCTAGATA CACCGTTATTGCTGATTCAGGTGACACGATTGTTATGTGGAGGTTTTATCTTTG CTCTACGACTCAATCACACCATGAGCGACGCACCAGGTCTCGTTCAATTCATG ACAGGGCTCGGTGAAATGGCACAAGGTGCATCAAGGCCATCTACGTTGCCTGT ATGGCAAAGGGAGTTGCTTTTAGCAAGGGACCCACCACGCGTGACATGTACTC ATCACGAGTATACTGAAGTGGAAGACACCAAGGGTACAATCATTCCGCTAGAT GACATGGCACATAAATCATTTTTCTTTGGACCTTCTGAGATATCAGCATTGCGA AGGTTCGTTCCATCATACCTAAAAAAGTGTTCTACTTTTGAGGTCTTAACCGCT TGCCTATGGCGTTGTCGTACAATTGCACTCCAGCCAGATCCCGAAGAAGAGAT GCGCATAATATGCATTGTTAATGCGCGCGGAAAGTTTAATCCACCCCTTCCTAA AGGTTATTATGGAAATGGTTTTGCTTTCCCAGTGGCCATTTCAACAGCTGGAGA TCTATCCAGCAAACCATTAGGTCATGCATTGGAACTTGTAATGAAAGCCAAAT CCGATGTCACTGAGGAGTATATGAGATCAATAGCCGACTTAATGGTAATCAAG GGACGTCCCCACTTTACGGTTGTCAGAAGCTACCTTGTCTCGGATGTGACTCAC GCTGGATTTGATGTTGTTGATTTCGGGTGGGGGAAAGCGGCCTATGGAGGACC CGCTAAAGGGGGAGTAGGTGCTATCCCAGGTGTTGCTAGTTTCTATATACCTTT TACAAACCATAAAGGCGAGTCTGGAATTGTGCTACCTATATGTTTGCCGAGTG CGGCCATGGATAAGTTTGTTGAAGAGTTAAATAAGATGTTGGTCCCAGACAAC AACGAACAAGTACTCCGAGAACACAAGTTACTAGTTCTTGCTAGATTGTAA (SEQ ID NO: 122).

[0216] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 122, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 88% to 100%, 92% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 122. Each possibility represents a separate embodiment of the invention.

[0217] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGAAATACAAGTAATAAACTACTCATCAAAGCTAGTAAAACCCTTGACACC AACACCCACCGCAAATCGTTACTATAACATTTCTTTCACCGATGAGCTCGTCCC AACCATTTACGTCCCACTCATTCTCTACTACGCAACACCGAAAAACCCAAATG GTGATCACTTTGAAAACATTTGTGACCGTCTGGAGGAGTCGTTATCGAAAACG TTAAGTGATTTTTACCCACTGGCCGCGAGATTCATTCGTAAACTCTCCTTAATT GATTGTAACGATCAAGGGGTTTTGTTTGTCCTAGGCAATGTAAATATCCGACTT TCGGATGTTACAGGCCTAGGACTGACGTTTAAAACCAGTGTTTTAAATGATTTT CTCCCGTGTGAGATTGGAGGAGCGGATGAAGTCGATGATCCTATGCTTTGTGT CAAAGTCACCACTTTTGAGTGTGGTGGTTTTGCAATTGGTATGTGTTTTTCGCA TAGGCTTTCGGATATGGGTACCATGTGTAACTTTATTAACAATTGGGCTGCTAG AACTATTGGTGAATATGATAATGAAAAACATACTCCTATTTTTAATTCGCCGTT GTACTTCCCGCAACGAGGATTACCTGAACTTGACCTAAAAGTACCTAGGTCAA GTATTGGTGTGAAAAATGCAGCACGCATGTTTCACTTTAATGGGAAGGCAATA TCATCCATGAGAGAAGTTTTTGGAGTTGATGAAAATGGGTCTCGTAGACTCTC AAAGGTTCAACTTGTTGTAGCCTTGTTGTGGAAGGCCTTTGTTCGCATAGATGA TGTGAACGATGGCCAATCTAAGGCGTCTTTTCTGATCCAACCAGTTGGGTTGAG GGACAAAGTTGTCCCTCCATTACCATCAAACTCATTTGGGAATTTTTGGGGTCT AGCGACTTCCCAACTTGGTCCTGGTGAGGGTCACAAAATCGGTTTCCAAGAAT ATTTTTACATTTTGCGTGAATCTATTAAGAAAAGAGCTAGGGATTGCGCTAAA ATATTGACACACGGTGAAGAAGGATATGGGGTTGTAATCGATCCATATCTTGA GTCGAATCAAAAGATAGCTGATAATGGTACAAACTTTTACTTGTTCACTTGTTG GTGCAAGTTTTCGTTCTACGAAGCTGATTTTGGTTGTGGTAAGCCGATTTGGGC TAGCACCGGAAAGTTTCCGGTTCAAAATTTGGTGATCATGATGGATGATAATG AGGGTGATGGTGTAGAAGCGTGGGTTCATTTAGACGATAAACGCATGAATGAG TTAGAACAAGATCCTGATGTTAAACTCTACGCATGCAATTTAGCTTAA (SEQ ID NO: 123).

[0218] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 123, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 82% to 100%, 87% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 123. Each possibility represents a separate embodiment of the invention.

[0219] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGAAATTAGCAGTGAAGGAATCAGTGATAGTAAAACCATCCAAAACGACAC CGTGTCAGCAAATATGGACATCAAATCTTGATTTAGTGGTGGGTCGGATCCAT ATATTAACCGTTTACCTTTACAGACCAAATGGGTCTTCAAATTTCTTTGATTCC ATGGTTTTAAAGAAGGCTCTAGCCGACGTTTTAGTTTCTTTTTTTCCGGTGGCC GGACGGTTGGATAAAGACGGTGACGGCAGAGTTGTAATAGATTGTAACGGTG AGGGTGTTTTGTTTGTGGAAGCTGAAGCTGATTGTTGCATTGATGATTTTGGTG AGATTACTCCGTCGCCGGAGTTACGACGGTTGGTGCCGACGGTGGATTATTCC GGTGATATGTCTTCTTATCCGTTATTTATTACGCAGGTTACACGGTTCAAGTGT GGGGGAGTTTCGTTAGGCTGTGGACTACACCATACGTTATCGGATGGACTCTC AGCACTTCACTTCATCAACACATGGTCTGATGTAGCTAGAGGCCTATCGGTGG CAATCCCACCGTTCATTGACCGCTCCCTTCTTCGAGCTCGTGATCCACCATCCC CTGTGTTTGACCACATCGAATACCACCCACCACCGTCACTGATCACTCCGTTGC AAAACCAAAAGAACGCGTCACATTCGAGGTCTGCTTCAACTTTAATCCTACGG CTCACACTCCATCAAATAAACAATCTTAAATCAAAGGCTAAAGGCGATGGGAG CATGTACCATAGCACGTACGAGATCCTAGCTGCTCATCTATGGCGATGTGCGT GCAAAGCACGTGGACTAGCAAACGATCAACCAACCAAATTGTATGTGGCCACC GATGGACGGTCAAGATTGATTCCTCCACTCCCTCCGGGCTACCTTGGGAATGTC GTTTTCACGGCTACTCCTGTCGCTAAATCGGGAGATTTCGAATCTGAATCCTTG GCAGAGACAGCAAGGAGGATTCGCAGTGAGTTGGGTAAAATGAACGATGAGT ATCTTAGATCAGCTATTGACTACTTAGAGTCGGTATCTGATATTTCGACCCTTG TTAGAGGGCCGACTTACTTTGCGAGTCCAAATCTGAATGTAAACAGTTGGACT CGGTTACCAATATACGAATCTGACTTCGGTTGGGGTCGACCTATTTTCATGGGA CCCGCAAGTATACTTTACGAGGGTACGATTTACATCATACCGAGCCCTAGTGG TGATCGGAGTGTGTCTCTGGCCGTGTGCTTGGACCCTGATCACATGGCTTTGTT TAAAGAATGCTTGTACGTTTTTTAG (SEQ ID NO: 124).

[0220] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 84%, at least 89%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 124, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 84% to 100%, 88% to 100%, 93% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 124. Each possibility represents a separate embodiment of the invention.

[0221] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGAAGCTAGCAGTGAAGGAATCAGTGATAGTAAAACCATCCAAAACGACAC CGTGTCAGCAAATACGGACATCAAATCTTGATTTAGTGGCGGGTCGGATCCAT ATATTAGTCGTTTTCTTTTACAGACCAAATGGGTCTTCGAATTTCTTTGATTCCT TGGTTTTAAAGAAGGCTCTCGCCGACGTTTTAGTTCCTTTTTTTCCGGTGGCCG GACGGTTCAGTGAAGACGGTGACGGCAGAGTTGTAATTGATTGTAACGGTGAG GGTGTTTTGTTTGTGGAATCTGAAGCTGATTGTTGCATTGATGATTTTGGTGAG ATTACTCTGTCGCCGGAGTTACAACAGTTGGTGCCGACGGTGGATTATTCCGGT GATATGTCTTCTTATCCGTTATTTATTGCGCAGGTCACACGGTTCAAGTGTGGG GGAGTTTCGTTAGGTTGGGGACTACACCATACATTATTGGATGGACTCTCAGC ACTTCACTTCGTCAACACATGGGGTGATGTAGCTAGAGGCCTATCGGTGGCAA TCCAACCGTTCATTGACCGCTCCCTTCTTCGAGCTCGTGATCCACCGACCCCTG TGTTTGACCACATCGAATACCACCCACCACCGTCACTGATCACTCCATTGCAAA ACCAAAAGAACGCATCACATTCGAGGTCTGCTTCAACTTTAATCCTACAGCTC ACACCCGATCAAATAAAGAATCTTAAATCAAAGGCTAAAGGCGATGGGAGCA TGTACCATAGCACATACGAGATCCTAGCTGCTCATCTATGGCGATGTGCGTGC AAAGCGCGTGGACTAGCAAACGATCAACCAACCAAATTGTATGTGGCCGCCAA TGGACGGTCAAGATTGATTCCTCCACTCCCTCCGGGCTACCTTGGGAATGTCGT TTTCAACGCTACTCATGTCGCTAAATCGGGGGATTTTGAATCTGAATCCTTGGC AGAGACTGCAAGGAGGATTCACTGTGAGTTGGGTAAAATGAACGATGAGTATT TTAGATCAGCTATCGACTACTTAGAGTCGGTAGATGATATTTCAACCCTTGTCA AAGGGCCGACTTACTTTGCGAGTCCAAATCTGAATGTATACAGTTGGATTGGG ATACCAATATATGCATGTGACTTCGGATGGGGTCAACCTATTTTCATGAGACCC GCAAGTTTCCTTTACGATGGTTCCATTTACATCATACCGAGCCCTAGTGGTGAT CGGAGTGTGTTGTTGGCCGTGTGCTTGGACCCTGATCACATGGATTTGTTTAAA GAATGCTTGTACGCTTTTTAG (SEQ ID NO: 125).

[0222] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 125, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 125. Each possibility represents a separate embodiment of the invention.

[0223] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGTGATGATTAGCAAGCTTTTACGATTAGGTAGAAGAAAACTTCACACAAT TGTATCAAGAGATACCATTAGACCTTCTTCTCCAACTCCCTCTCATTCCAAAAC ATATAATCTCTCCTTGCTCGATCAAATAGCTGTAAATTCATACGTGCCGATTGT TGCTTTTTACCCAAGCTCAAATGTTTGTCGAAGTTCCGATGATAAGACGCTGGA GTTGAAGAACTCATTATCGAAAATATTAACTCATTACTATCCGTTTGCCGGTAG AATGAAGAAGAATCGCCCTACCGTCGTTGATTGCAATGATGAAGGGGTTGAGT TCGTTGAAGCACGTAATACCAACTCGTTATCAGATTTCCTCCAACAATCGGAGC ACGAAGATCTAGATCAACTCTTTCCAGATGATTGTGTATGGTTCAAACAAAAC CTTAAAGGTTCTATTAATGACGCAAATAATAGTAGCGTATGTCCATTGAGCATT CAAGTCAACCATTTCGCGTGTGGAGGTGTAGCAGTTGCAACTTCGTTACGCCA CAAGATTGGAGACGGAAGCAGTGCGTTAAATTTCATTAAACACTGGGCTGCAG TTACGTCACACTCTCGAGCAGGGAATCATCAAATTGATGCGACATCACCCATC ATTAATCCCCATTTCATTTCTTACCCAACTAGAACTTTTAAATTGCCAGATAGG TCACCATACATACCACCTAGTGATGTTGTGTCAAAAAGTTTTGTTTTCCCCAAC ACAAATATAAAGGACCTCCAAGCCAAGGTGGTAACCATGACCATGGGCTCTAG ACAACCTATCGTGAACCCTACCCGAGCTGATGTCGTATCATGGCTTCTACATAA GTGTGTAGTAGCAGCAGCTACCAAAAGGATATCGGGAAATTTTAAAGAAAGTT GCGTGATCTCGCCATTAAATCTGAGAAACAAGTTAGAAGAGCCATTGCCTGAA ACAAGCATAGGAAATATTTTCTATCTGATAACCTTTCCAATAAGCAATAATCAT GGCGATCTCATGCCCGATGACTTCATTAGCCAACTCAGGCTAGGAATACGTAA GTTTCAAAATATACGAAATTTGGAAACTGCATTACGAACCGTTGAAGAGATGA TATCTGAAACTTTTATCTTGGGTACGGCAGAAAGCATGGATACTAGTTATGTAT ATTCGAGCATCCGTGGGTTTCCGATGTATGATATTGATTTTGGGTGGGGGAAGC CCGTAAAAGTAACCGTTGGGGGAGCCCTTAAGAACTTAAGTATTCTGATGGAC ACTCCTGATGTCAATGGCATCGAAGCACTAGTGTCTTTGGATAAACAAGACAT GAAGATACTTCTAAACGACCCTGAGTTGTTGGCCTTTTGCTTGTAA (SEQ ID NO: 126).

[0224] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 72%, at least 80%, at least 85%, at least 87%, at least 93%, or at least 99% homology or identity to SEQ ID NO: 126, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 72% to 100%, 79% to 100%, 86% to 100%, or 91% to 100% homology or identity to SEQ ID NO: 126. Each possibility represents a separate embodiment of the invention.

[0225] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGAGTACTAGTGACAAAATGAAGATAACAATAAGAGAATCATCAATGATAA AACCATCCAAACCGACGCCGGATCAACGGATATGGAACTCAAATCTTGATTTG GTAGTGGGTCGGATCCATATCTTGACCCTTTACTTTTTTAGGCCAAATGGGTCT TCGGATTTCTTTGATTCTGAGGTTTTAAAGCAATCACTTGCCGACGTTCTTGTTT CTTTTTTTCCGATGGCCGGACGATTGGGATTAGACGGCGATGGCAGAGTTGAA ATTAATTGCAACGGTGAAGGTGTTTTGTTTGTTGAAGCTGAAGCGGATTGTAGT ATTGATGATTTTGGTGAGATTACTCCGTCGCCGGAGCTACGGCGGTTGGCGCC AACAGTGGATTATTCCGGCGATATCTCATCTTATCCACTCGTTATTACCCAGGT AACACATTTCAAATGTGGTGGAGTTTCTCTTGGGTGTGGACTACACCATACATT ATCCGATGGACTTTCATCTCTTCACTTCATCAACACATGGTCCGATGTTACCCG AGGCTTACCCGTTGCGATCCCGCCATTCGTAGATCGTACGGTTCTTCGTGCTAG GGACCCGCCAACCGTGGTCTTTGATCACGTGGAATACCACACTCCTCCTTCCAT GACCTCAAGTTTGGACAAAGACAAACCTCAATCCGAAGATGTTCATGTTTCCA CTTCCATGCTACGGCTCACACTCGATCAAATAAATGCACTAAAAGCAAAAGGC AAAGGTGACGGAATTGTGTACCATAGCACATATGAAATCCTAGCTGCTCATTT ATGGCGATGTGCGTGTAAAGCACGTGGGCTCCTGAATGATCAAATGACTAAAT TGTATGTAGCTACCGATGGACGGTCCAGATTGATTCCCCCACTCCCACCGGGGT ACTTAGGCAATGTGGTCTTCACCGCCACACCAATTGCCAAATCCGGCGAGCTC CAACAGGAACCACTAGCTACCACTGCAAGAAAAATTCATACAGAGTTGGCCAA AATGGATGACAAGTACCTCAGGTCGGCCCTCGACTACTTAGAGTCACAACAGG ACTTGTCAGCACTAATTCGAGGGCCAGCCTATTTTGCGTGCCCTAACCTCAACA TCAATAGTTGGACTCGCCTTCCAATATATGATGCGGACTTTGGGTGGGGTCGGC CCATATTTATGGGACCCGCCAGCATACTTTACGAGGGCACGATTTACATTATTC CGAGCCCTAGTGGTGACCGAAGTGTGTCGTTGGCTGTGTGCTTAGACCCCTCTC ATATGCCTCTCTTCCAAAAGTACTTGTATGAACTTTAA (SEQ ID NO: 127).

[0226] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 127, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 127. Each possibility represents a separate embodiment of the invention.

[0227] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGTGAATGTTGAGATCATTTCTAATGAATACATAAAACCATCCTCCCCAAC ACCACCACATCTTAAAATATACAATCTTTCCATCTTAGATCAACTCATTCCTGC CCCCTATGCACCTATCATACTATATTATCCGAATCAAGATCACATTAACGATTT TGAGGTTCACGAACGGTTGAAACTACTAAAAGATTCGTTATCGAAAACGCTAA CTCGTTTTTACCCATTAGCCGGAACCATCAAAGGCGATCTTTCCATTGATTGTA ACGATATTGGTGCTTACTTTGCAGTAGCTCATGTAAATACTCGCCTTGATGTGT TCCTGAACCATCCTGATCTTGACCTAATAAACTGTTTTCTTCCACGTGGGCCTT ACTTGAATGGTTCTAGTGAAGGAAGTTGTGTGAGTAATGTTCAAGTGAACATT TTTGAGTGTTGTGGGATTGCAATTAGTTTATGCATTTCTCACAAGATTCTTGAT GGTGCTGCGTTGAGTACTTTTCTTAAAGCATGGGCAGGGACAAGTTACGGGTC GAAAGAAGTAGTGTATCCAAACATGAGTGCACCATCTTTATTTCCTGCTAAAG ATTTGTGGCTTAAAGATTCATCAATGGTCATGTTTGGGTCTTTGTTTAAGATGG GTAAGTGTAGTACTAAAAGATTTGTTTTTGATTCATCAAAATTATCCTTCCTCA AAGCTAAGGCATCGCTAAATGGGCTAAAAGACCCAACCCGCGTAGAGGTGGT GTCTGCTTTACTATGGAAGTGTATCATGGCTGCATCTGAAGAAAACACTGGTTC TTGGAAGCCATCTCTGTTAAGCCATGTAGTTAACCTTCGCAAAAGGTTGGTTTC AACTTTATCAGAAGACTCAATTGGGAACTTAATTTGGTTAGCAAGCGCAGAAT GTAGAACCAACGCTCAATCCCGATTGAGTGATCTTGTTGAAAAGGTACGTGAT AGTGTGTCGAAAATCAATAGTGAGTTTGTGAAGAAAATACAAGGCGATAAAG GGACAAAAGTGATGGAAGAGTCTCTCAAGAGTATGAAAGATTGTGCGGATTAT ATCGGGTTTACGAGTTGGTGTAAGATGGGGTTTTACGATGTGGATTTTGGTTGG GGAAAGCCTGTATGGGTTTGTGGTAGCGTTTGTGAAGGTAGCCCGGTGTTCAT GAATTTTGTCATATTAATGGACACAAAATATGGTGATGGAATAGAAGCATGGG TGAGCTTGGATGAACACGAAATGCATATCTTAAAGCATAATCCCGAGCTCTTG GAATATGCATCAATCGATCCAAGTCCTCTGCAAATGAATAAGTGA (SEQ ID NO: 128).

[0228] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 128, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 88% to 100%, 93% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 128. Each possibility represents a separate embodiment of the invention.

[0229] In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:

ATGGGAACTATTTATCAATCTCCCATGATCAAATCTTCTACTCCCAAAATAATT GAAGACCTCAAAGTTATCATCCATGACACATTCACAATCTTCCCACCTCACGA AACCGAAAAGCGGTCCATGTTCTTATCGAACATTGACCAAGTTCTTACTTTCAA CGTTGAAACGGTCCATTTTTTTGCAGCCAACCCTGACTTTCCGCCACAAGTAGT GGCGGAAAAGCTCAAGTTGGCTCTAAGTAAGGCGCTGGTGCCATATGATTTTT TGGCAGGGAGGTTGAAGTTGAACCATGAGTCGCAACGGTTTGAGTTTGATTGT AATGGTGCTGGGGCTCGGTTCGTGGTGGGTTCGAGTGAGTTTGAGTTGGGTGA GATTGGTGACTTGGTGTATCCAAACCCTGGGTTTAGACAATTGGTTCAAAAGA GTTATGATAACTTGGAGTTACATGAAAAGCCACTATGCATTTTACAGCTGACAT CCTTCAAGTGTGGAGGATTTGCACTTGGTGTAGCAACAAATCATGCCACTTTTG ATGGCTTAAGTTTCAAAACATTTCTTCAAAATCTTGGTTCTTTGGCTGCTGATC AACCACTTGCCGTCGATCCCTGCAACGATCGCCACCTATTGGCAGCACGATCA CCACCAAAAGTCCAATTTGACCACCCTGAACTCCTCAAAATCCCAACAGGAAC AGACATCCCAAACCCAACAGTCTTTGACTGCCCAGAAAGTCAACTTGACTTCA AGATTTTCAACTTGACCTCAGATGACATAGCCCACTTAAAAACGAAAGCCAAA GATGGGCCTGGGTCAACCAATGCAAAAATCACTGGATTCAATGTGGTTGCAGC CCATGTATGGCGGTGCAAAGCGTTGTCCTCAGGGTCAGAATATGACCCCGAGA GAGTGTCAACCGTGTTATATGCTGTTGACATTCGGTCAAGATTGAACTTACCAT TATCATTAGCTGGCAATGCAGTTCTTAGTGCATACGCCTCGGCCAAATGCAAA GAGATTGAAGAAGGCCCGTTGTCAAGACTAGTGGAAATGGTGACCGAAGGTA CTAACAGAATGACTGGTGAGTATGCAAGATCGGTGATCGATTGGGGAGAGGTG AATAAAGGGTTTCCAAATGGGGAGTTTCTGATATCGTCATGGTGGCGATTGGG GTTTGCTGACGTGGAATATCCGTGGGGTAAACCTAGGTATAGTTGTCCCGTGGT TTATCATAGGAAAGATATAATATTACTCTTTCCGGATATTGTTGGTGCCGATAA CAACAATGAAGTGAATGTGTTGGTGGCTTTGCCTGGCAAAGAAATGGAGAAAT TTGAGACTTTATTTCATAAGTTTTTGGCATGA (SEQ ID NO: 129).

[0230] In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 129, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 90% to 100%, 94% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 129. Each possibility represents a separate embodiment of the invention.

[0231] In some embodiments, the DNA molecule comprises a plurality of nucleic acid sequences. In some embodiments, the polynucleotide comprises a plurality of types of polynucleotides.

[0232] As used herein, the term “plurality” comprises any integer equal to or greater than 2.

[0233] In some embodiments, the plurality of nucleic acid sequences encode proteins of different enzymatic functions or families as described herein. In some embodiments, the plurality of nucleic acid sequences encode at least two proteins of the same enzymatic function or family as described herein. In some embodiments, the plurality of nucleic acid sequences encode a plurality of proteins of a plurality of different enzymatic functions or families as described herein.

[0234] In some embodiments, the DNA molecule encodes a protein characterized by acyl activating enzymatic (AAE) activity. In some embodiments, the DNA molecule encodes an AAE protein. In some embodiments, the AAE is an AAE derived from Helichrysum umbraculigerum. In some embodiments, the DNA molecule encoding a protein characterized by acyl activating enzymatic (AAE) activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 1-11.

[0235] As used herein, the terms “acyl activating enzyme” and “AAE” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of catalyzing the activation of a carboxylic acid. In some embodiments, AAE activity comprises forming or formation of a thioester bond. In some embodiments, AAE activity comprises coupling a carboxyl group to an amine group. In some embodiments, AAE activity comprises coupling a carboxyl group to an alcohol. In some embodiments, the AAE is an acid-thiol ligase.

[0236] In some embodiments, the DNA molecule encodes a protein characterized by polyketide synthesizing activity. In some embodiments, the DNA molecule encodes a protein being a polyketide synthase (PKS). In some embodiments, the PKS is a PKS derived from Helichrysum umbraculigerum. As used herein, the terms “polyketide synthase” and “PKS” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the “olivetol synthase” or “OLS” of Cannabis sativa. In some embodiments, the DNA molecule encoding a protein characterized by polyketide synthesizing activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 23-26.

[0237] As used herein, the terms “polyketide synthase” and “PKS” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of catalyzing the elongation of a ketide or a polyketide chain. In some embodiments, PKS activity comprises transacylation. In some embodiments, PKS activity comprises Claisen condensation. In some embodiments, PKS activity comprises reduction of a β-keto group to a β-hydroxy group. In some embodiments, PKS activity comprises H2O splitting, thereby obtaining, providing, or resulting in an α-β-unsaturated alkene. In some embodiments, PKS activity comprises reducing an α-β-double-bond to a single-bond. In some embodiments, PKS activity comprises hydrolyzing a polyketide chain or a completed polyketide chain from an acyl carrier protein domain of the PKS. In some embodiments, PKS activity comprises polymerizing and/or ligating a diketide substrate into a polyketide chain. In some embodiments, PKS activity comprises elongating a diketide to a polyketide chain. In some embodiments, PKS activity comprises elongating a polyketide chain.

[0238] In some embodiments, the DNA molecule encodes a protein characterized by polyketide cyclizing activity. In some embodiments, the DNA molecule encodes a protein being a polyketide cyclase (PKC). In some embodiments, the PKC is a PKC derived from Helichrysum umbraculigerum. As used herein, the terms “polyketide cyclase” and “PKC” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the “olivetolic acid cyclase” or “OAC” of Cannabis sativa. In some embodiments, the DNA molecule encoding a protein characterized by polyketide cyclizing activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 31-38.

[0239] As used herein, the terms “polyketide cyclase” and “PKC” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of folding and/or cyclizing a polyketide. In some embodiments, PKC activity comprises an action of a cyclase subunit. In some embodiments, PKC activity comprises site-specific keto-reductase activity.

[0240] In some embodiments, the DNA molecule encodes a protein characterized by prenyl transferring activity. In some embodiments, the DNA molecule encodes a protein being a prenyltransferase (PT). In some embodiments, the PT is a PT derived from Helichrysum umbraculigerum. As used herein, the terms “prenyltransferase” and “PT” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the “geranylpyrophosphate:olivetolate geranyltransferase” or “GOT” of Cannabis sativa. In some embodiments, the GOT is GOT4 or CsGOT4. In some embodiments, the DNA molecule encoding a protein characterized by prenyl transferring activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 47-58.

[0241] As used herein, the terms “prenyltransferase” and “PT” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of transferring an allylic prenyl group to an acceptor molecule. In some embodiments, PT activity comprises cyclization. In some embodiments, PT activity comprises transferring an allylic prenyl group to an acceptor molecule.

[0242] In some embodiments, the DNA molecule encodes a protein characterized by cannabigerolic acid (CBGA) cyclization or cyclizing activity. In some embodiments, cyclizing activity comprises cyclization of CBGA to CBCA. In some embodiments, the polynucleotide encodes a protein capable of cyclizing or cyclization of CBGA to CBCA. In some embodiments, the DNA molecule encodes a protein characterized by being capable of synthesizing CBCA or being a CBCA synthase (CBCAS). In some embodiments, the CBCAS is a CBCAS derived from Helichrysum umbraculigerum. As used herein, the terms “CBCA synthase” and “CBCAS” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the CBCA synthase of Cannabis sativa (e.g., CsCBCAS). In some embodiments, the DNA molecule encoding a protein characterized by CBGA cyclization or cyclizing activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 71-79.

[0243] In some embodiments, the polynucleotide encodes a protein characterized by catalytic activity of transferring a glucuronic acid component of UDP-glucuronic acid to a small hydrophobic molecule (e.g., a UGT). In some embodiments, the polynucleotide encodes a protein characterized by glycosyltransferase catalytic activity. In some embodiments, the polynucleotide encodes a protein characterized by being capable of transferring a glucuronic acid component of UDP-glucuronic acid to a cannabinoid or a precursor thereof. In some embodiments, the polynucleotide encodes a protein characterized by having a catalytic activity of glycosylating a cannabinoid or a precursor thereof. In some embodiments, the polynucleotide encodes a UGT enzyme.

[0244] In some embodiments, the UGT is a UGT derived from Helichrysum umbraculigerum. As used herein, the term “UGT” encompasses any enzyme derived from H. umbraculigerum and having or characterized by having an activity as described herein.

[0245] In some embodiments, the UGT protein is encoded by a DNA molecule comprising SEQ ID Nos.: 89-101.

[0246] In some embodiments, the DNA molecule encodes a protein characterized by being capable of acting on an acyl group. In some embodiments, the DNA molecule encodes a protein characterized by catalytic activity of transferring an acyl group from a donor molecule to an acceptor molecule. In some embodiments, the acceptor molecule is a hydrophobic molecule, a small molecule, or both. In some embodiments, the donor molecule comprises an acyl group, CoA, or both. In some embodiments, the DNA molecule encodes a protein characterized by acyltransferase catalytic activity. In some embodiments, the DNA molecule encodes a protein characterized by being capable of transferring an acyl group to a cannabinoid. In some embodiments, the DNA molecule encodes a protein characterized by having a catalytic activity of acylating a cannabinoid. In some embodiments, the acyltransferase (AT) is an alcohol acyltransferase (AAT). In some embodiments, the DNA molecule encodes an AT enzyme. In some embodiments, the polynucleotide encodes an AAT enzyme.

[0247] In some embodiments, the AAT is an AAT derived from Helichrysum umbraculigerum. As used herein, the term “AAT” encompasses any enzyme derived from H. umbraculigerum and having or characterized by having an activity as described herein.

[0248] In some embodiments, the AAT protein is encoded by a DNA molecule comprising or consisting of SEQ ID Nos.: 115-129.
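
For illustration, the SEQ ID NO ranges recited in paragraphs [0234] to [0248] can be tabulated so that a pair of nucleic acid sequences is checked for membership in different enzyme families, as required of the first and second nucleic acid sequences. The Python sketch below is an assumption-laden aid and not part of the disclosure: the names FAMILY_RANGES, CORE_FAMILIES, family_of, and encodes_different_core_families are hypothetical, and only the nucleic acid ranges stated in those paragraphs are encoded.

```python
from typing import Optional

# Inclusive SEQ ID NO ranges for nucleic acid sequences, as recited in
# paragraphs [0234]-[0248]. Other SEQ ID NOs are deliberately left unmapped.
FAMILY_RANGES = {
    "AAE": (1, 11),
    "PKS": (23, 26),
    "PKC": (31, 38),
    "PT": (47, 58),
    "CBCAS": (71, 79),
    "UGT": (89, 101),
    "AAT": (115, 129),
}

# Families from which the first and second proteins are selected.
CORE_FAMILIES = {"AAE", "PKS", "PKC", "PT", "CBCAS"}


def family_of(seq_id_no: int) -> Optional[str]:
    """Return the enzyme family for a nucleic acid SEQ ID NO, or None if unmapped."""
    for family, (low, high) in FAMILY_RANGES.items():
        if low <= seq_id_no <= high:
            return family
    return None


def encodes_different_core_families(first: int, second: int) -> bool:
    """True if both SEQ ID NOs map to core families and the families differ."""
    family_a, family_b = family_of(first), family_of(second)
    return (
        family_a in CORE_FAMILIES
        and family_b in CORE_FAMILIES
        and family_a != family_b
    )
```

For example, SEQ ID NOs 1 and 23 map to AAE and PKS and therefore satisfy the check, whereas SEQ ID NOs 1 and 2 both map to AAE and do not.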

[0249] In some embodiments, the artificial vector comprises a plasmid. In some embodiments, the artificial vector comprises or is an Agrobacterium comprising the artificial nucleic acid molecule. In some embodiments, the artificial vector is an expression vector. In some embodiments, the artificial vector is a plant expression vector. In some embodiments, the artificial vector is for use in expressing any one of: AAE, PKS, PKC, PT, or CBCAS encoding nucleic acid sequence as disclosed herein, or any combination thereof. In some embodiments, the artificial vector is further for use in expressing UGT, AAT, or both. In some embodiments, the artificial vector is for use in heterologous expression of any one of: AAE, PKS, PKC, PT, or CBCAS encoding nucleic acid sequence as disclosed herein, or any combination thereof, in a cell, a tissue, or an organism. In some embodiments, the artificial vector is further for use in heterologous expression of UGT, AAT, or both in a cell, a tissue, or an organism. In some embodiments, the artificial vector is for use in producing or the production of an acyl-coenzyme A (acyl-CoA), a polyketide, a cannabinoid, e.g., CBGA, CBCA, any precursor thereof, or any combination thereof, in a cell, a tissue, or an organism. In some embodiments, the artificial vector is further used in producing or the production of a modified acyl-coenzyme A (acyl-CoA), a polyketide, a cannabinoid, e.g., CBGA, CBCA, any precursor thereof, or any combination thereof, in a cell, a tissue, or an organism, wherein the modified compound further comprises an acyl group, a glycan (e.g., glycosylated), or both.

[0250] Expressing a polynucleotide within a cell is well known to one skilled in the art. It can be carried out by, among many methods, transfection, viral infection, or direct alteration of the cell's genome. In some embodiments, the DNA molecule is in an expression vector such as a plasmid or a viral vector. A vector nucleic acid sequence generally contains at least an origin of replication for propagation in a cell and optionally additional elements, such as a heterologous polynucleotide sequence, an expression control element (e.g., a promoter, enhancer), a selectable marker (e.g., antibiotic resistance), and a poly-adenine sequence.

[0251] The vector may be a DNA plasmid delivered via non-viral methods or via viral methods. The viral vector may be a retroviral vector, a herpesviral vector, an adenoviral vector, an adeno-associated viral vector, a Virgaviridae viral vector, or a poxviral vector. The barley stripe mosaic virus (BSMV), the tobacco rattle virus and the cabbage leaf curl geminivirus (CbLCV) may also be used. The promoter may be active in plant cells. The promoter may be a viral promoter.

[0252] In some embodiments, the DNA molecule as disclosed herein is operably linked to a promoter. The term "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element or elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). In some embodiments, the promoter is operably linked to the polynucleotide of the invention. In some embodiments, the promoter is a heterologous promoter. In some embodiments, the promoter is the endogenous promoter.

[0253] In some embodiments, the vector is introduced into the cell by standard methods including electroporation (e.g., as described in Fromm et al., Proc. Natl. Acad. Sci. USA 82, 5824 (1985)), heat shock, infection by viral vectors, high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al., Nature 327, 70-73 (1987)), such as biolistic use of coated particles, and needle-like particles, Agrobacterium Ti plasmids and/or the like.

The term "promoter" as used herein refers to a group of transcriptional control modules that are clustered around the initiation site for an RNA polymerase, i.e., RNA polymerase II. Promoters are composed of discrete functional modules, each consisting of approximately 7-20 bp of DNA, and containing one or more recognition sites for transcriptional activator or repressor proteins. The promoter may extend upstream or downstream of the transcriptional start site and may be any size ranging from a few base pairs to several kilobases.

[0254] In some embodiments, the DNA molecule is transcribed by RNA polymerase II (RNAP II or Pol II). RNAP II is an enzyme found in eukaryotic cells, known to catalyze the transcription of DNA to synthesize precursors of mRNA and most snRNA and microRNA.

[0255] In some embodiments, a plant expression vector is used. In one embodiment, the expression of a polypeptide coding sequence is driven by any of a number of promoters. In some embodiments, viral promoters such as the 35S RNA and 19S RNA promoters of CaMV [Brisson et al., Nature 310:511-514 (1984)], or the coat protein promoter of TMV [Takamatsu et al., EMBO J. 6:307-311 (1987)] are used. In another embodiment, plant promoters are used such as, for example, the small subunit of RUBISCO [Coruzzi et al., EMBO J. 3:1671-1680 (1984); and Broglie et al., Science 224:838-843 (1984)] or heat shock promoters, e.g., soybean hsp17.5-E or hsp17.3-B [Gurley et al., Mol. Cell. Biol. 6:559-565 (1986)]. In one embodiment, constructs are introduced into plant cells using Ti plasmid, Ri plasmid, plant viral vectors, direct DNA transformation, microinjection, electroporation and other techniques well known to the skilled artisan. See, for example, Weissbach & Weissbach [Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463 (1988)]. Other expression systems such as insect and mammalian host cell systems, which are well known in the art, can also be used by the present invention.

[0256] In some embodiments, expression vectors containing regulatory elements from eukaryotic viruses such as retroviruses are used by the present invention. SV40 vectors include pSVT7 and pMT2. In some embodiments, vectors derived from bovine papilloma virus include pBV-1MTHA, and vectors derived from Epstein-Barr virus include pHEBO and p205. Other exemplary vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV-40 early promoter, SV-40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.

[0257] In some embodiments, recombinant viral vectors, which offer advantages such as systemic infection and targeting specificity, are used for in vivo expression. In one embodiment, systemic infection is inherent in the life cycle of, for example, the retrovirus and is the process by which a single infected cell produces many progeny virions that infect neighboring cells. In one embodiment, the result is that a large area becomes rapidly infected, most of which was not initially infected by the original viral particles. In one embodiment, viral vectors are produced that are unable to spread systemically. In one embodiment, this characteristic can be useful if the desired purpose is to introduce a specified gene into only a localized number of targeted cells.

[0258] In some embodiments, plant viral vectors are used. In some embodiments, a wildtype virus is used. In some embodiments, a deconstructed virus such as are known in the art is used. In some embodiments, Agrobacterium is used to introduce the vector of the invention into a virus.

[0259] Various methods can be used to introduce the expression vector of the present invention into cells. Such methods are generally described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York (1989, 1992), in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, Baltimore, Md. (1989), Chang et al., Somatic Gene Therapy, CRC Press, Ann Arbor, Mich. (1995), Vega et al., Gene Targeting, CRC Press, Ann Arbor Mich. (1995), Vectors: A Survey of Molecular Cloning Vectors and Their Uses, Butterworths, Boston Mass. (1988) and Gilboa et al. [Biotechniques 4 (6): 504-512, 1986] and include, for example, stable or transient transfection, lipofection, electroporation, Agrobacterium Ti plasmids and infection with recombinant viral vectors. In addition, see U.S. Pat. Nos. 5,464,764 and 5,487,992 for positive-negative selection methods.

[0260] It will be appreciated that other than containing the necessary elements for the transcription and translation of the inserted coding sequence (encoding the polypeptide), the expression construct of the present invention can also include sequences engineered to optimize stability, production, purification, yield, or activity of the expressed polypeptide.

[0261] In some embodiments, the artificial vector comprises a polynucleotide encoding a protein comprising an amino acid sequence as described herein.

[0262] According to some embodiments, there is provided a protein encoded by: (a) the DNA molecule disclosed herein; (b) the artificial vector disclosed herein; or (c) the plasmid or Agrobacterium disclosed herein.

[0263] In some embodiments, the protein is an isolated protein.

[0264] As used herein, the terms "peptide", "polypeptide" and "protein" are interchangeable and refer to a polymer of amino acid residues. In another embodiment, the terms "peptide", "polypeptide" and "protein" as used herein encompass native peptides, peptidomimetics (typically including non-peptide bonds or other synthetic modifications) and the peptide analogues peptoids and semipeptoids or any combination thereof. In another embodiment, the peptides, polypeptides and proteins described have modifications rendering them more stable while in the organism or more capable of penetrating into cells. In one embodiment, the terms "peptide", "polypeptide" and "protein" apply to naturally occurring amino acid polymers. In another embodiment, the terms "peptide", "polypeptide" and "protein" apply to amino acid polymers in which one or more amino acid residue is an artificial chemical analogue of a corresponding naturally occurring amino acid.

[0265] As used herein, the term "isolated protein" refers to a protein that is essentially free from contaminating cellular components, such as carbohydrate, lipid, or other proteinaceous impurities associated with the nucleic acid in nature. Typically, a preparation of an isolated protein contains the protein in a highly purified form, e.g., at least about 80% pure, at least about 90% pure, at least about 95% pure, greater than 95% pure, or greater than 99% pure. In some embodiments, the isolated protein is a synthesized protein. Synthesis of protein is well known in the art and may be performed, for example, by heterologous expression in a transformed cell, such as exemplified herein.

[0266] In some embodiments, the protein comprises or consists of the amino acid sequence: MTSSKKFTVEVEPAIPAKDGKPSAGPVYRSIFAKDGFPAHIDGLDSCWDIFRLSVEK YPNNRMLGTREFVNGKHGPYVWSTYKQVYDKVIKVGNAIRACGVEPGGRCGIYG ANCAEWIMSMEACNAHGLYCVPLYDTLGAGAIEFILCHAEVTIAFVEEKKIPELLK TFPKAGEFLKTIVSFGKVTPEQREQAENFGLKIHSWDEFLTLGDDKNFDLPLKEKT DICTIMYTSGTTGDPKGVLISNNSMATLIAGVNRLLDSAKESLNQHDVYLSFLPLA HIFDRVIEECFINHGASIGFWRGDVKLLIEDIGELKPTIFCAVPRVLDRIYSGLQQKIS AGGFIKRNLFNLAYSYKLRNMKGGKTHSEASPLSDKIVFSKVKQGLGGNVRIILSG AAPLAPHVEAYLKVVACSHVLQGYGLTETCAGSFVSLPNEMEMLGTVGPPVPVL DARLESVPEMNYDACSSKPQGEICIRGDVLFSGYYKREDLTKEVFVDGWFHTGDI GEWQPDGSMKIIDRKKNIFKLSQGEYVAVENLENVYGNVSDIDTIWIYGNSFEFCL VAVVNPNEPAIKRYAEANNISGDFDSLCENPKIKEYILGELARIGKEKKLKGFEFVK AVHLDPVPFDMERDLLTPTFKKKRPQMLKYYQDVIDNMYKTINKK (SEQ ID NO: 12).

[0267] In some embodiments, the protein comprises an amino acid sequence with at least 91%, at least 93%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 12, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 91% to 97%, 92% to 99%, 93% to 98%, or 90% to 100% homology or identity to SEQ ID NO: 12. Each possibility represents a separate embodiment of the invention.
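
By way of a non-limiting illustration, a percent identity of the kind recited above can be computed once two sequences have been aligned. The Python sketch below assumes a plain Needleman-Wunsch global alignment with unit match, mismatch, and gap scores, and it counts identity over all alignment columns, gaps included. Both choices are assumptions made for the example; this passage does not prescribe an alignment algorithm, scoring scheme, or identity convention, and the function names are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(identity: float, threshold: float) -> bool:
    """True if the identity is at least the recited threshold (e.g., 91.0)."""
    return identity >= threshold
```

Under a different convention, for example identity over the shorter sequence length or with affine gap penalties, the same pair of sequences may yield a different percentage, so the convention used should be stated alongside any reported value.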

[0268] In some embodiments, the protein comprises or consists of the amino acid sequence: MDALRKPNSANSSPLTPIGFLERAAVVFANSPSIVYNNLIYTWSDTFHRCLRLASSI SRLAIRKGDVVSVLAPNIPAIYELHFGITMTGAIINTINTRLD ARTIS ILLCHSESKLV FVDYQLTRLIREAVSLMPDACVPPQLVLIVDDGHNLSLLSDQFINTYEAMVETGDP GFNWVRPDSDWDPLTLNYTSGTTSSPKGVVNSHRGSFIVAFDSLLEWHVPKQPIM LWTLPMFHANGWSFVWGMAAVGGTNVCLRKFDATIIYDTIRNHHVTHMCGAPV VLNMLSEGKPLEHTVHIMTAGAPPPAAVLLRTESLGFEVTHGFGMTETGGLVVSC SWKKEWNRLPVTEKARLKARQGVRTLGMTEVDIVDPESGVSVTRDGLTQGELVL RGGSIMLGYLKDPETTNKSVKNGWFYTGDVAVMHPDGYLEIKDRSKDVIISGGEN ISSVEVESILYQHPAINEAAVVGRPDEFWGESPCAFVSLKDDNGKVAVPTADEIMK FCKGKLPGYMVPKSVVFKKDLPKTSTGKIQKYVLRKLAKDLGFAVKSRI (SEQ ID NO: 13).

[0269] In some embodiments, the protein comprises an amino acid sequence with at least 83%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 13, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 83% to 95%, 85% to 99%, 83% to 100%, or 84% to 97% homology or identity to SEQ ID NO: 13. Each possibility represents a separate embodiment of the invention.

[0270] In some embodiments, the protein comprises or consists of the amino acid sequence: MTEEEKNKAESMGIKTYAWSDFLHLGSKNPSELQTPKATDICTIMYTSGTSGDPKG VILTHENATTNIRGVDLFMEQFEDKMTVDDVYISFLPLAHILDRMIEEYFFRSGASV GFYHGDINALKEDLAELKPTFLAGVPRVLEKIHEGVLKGLEEVNPRRRKIFSILYNH KLKYMKAGYKHKYASPLADLLAFRKVKNRLGGRIRLMVSGGAPLSTEIEEFMRV TSCAFVAQGYGLTETCGLATLGFPDEMCMIGTVGSPFVYTELRLEEVSDMGYDPL ANPPRGEICVKGKTPFAGYYKNPELTNEVMKDGWFHTGDIGEMQPNGVLKIIDRK KHLIKLSQGEYIALEYLEKVYCITPILEDIWVYGDSFKSSLVAVAVPNKENAEKWA DQKGLKVSYSELCTLTQFRDYIQSELKSTAERNKLRGFEHIKAIIVEPRTFEGDQEL LTATMKKRRNKLLNRYKEGIDNLYKNLAANKR (SEQ ID NO: 14).

[0271] In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 14, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 93%, 86% to 95%, 88% to 97%, or 86% to 100% homology to SEQ ID NO: 14. Each possibility represents a separate embodiment of the invention.

[0272] In some embodiments, the protein comprises or consists of the amino acid sequence: MVYKSLNSISISDIVNLGISPETATQLHQKLTEIIQIYGFDAPQTWTQISTRILHPDLPF CFHQMMYYGCYVDFGPDPPAWSPDPKDAKLTNIGSLLERRGKEFLGPSYKDPISS YSALQEFSALNLEVFWKTILDEMNITFSVPPKRILVDDLSKESQLLHPGGRWLPGA YVNPARNCLSLSSKRRLSDIAVIWRDEGNDDMPVNKMTFQQLRSEVWLVAYALD TLGVEKGSAIAIDMPMDVKSVVIYLAIVLAGYVVVSIADSFAAGEISTRLVLSKAK AIFTQDLIIRGDRSHPLYSRVVDAQSPLAIVIPTRGSSFSIKLRDGDISWHDFLERANT YRNVEFVAVERPVEAFSNILFSSGTTGEPKAIPWTLATPFKAGADAWCHMDVHKG DVVAWPTNLGWMMGPWLIYASLLNGGSLALYNGSPLTSGFAKFVQDAKVTLLG VIPSIVRAWRTNNSTAGFDWSTIRCFGSTGEASNTDECLWLMGRAHYKPVIEYCG GTEIGGGFITGSLLQPQCLSAFSTPSLGCKLLILGEDGIPIPQNAPGIGELALNPLMFG ASSTLLNANHYDVYFKGMPSWNGKVLRRHGDVFERTSKGYYRAHGRADDTMNL GGIKVSSVEIERVCNSIDDRILETAAIGVTPSGGGPERLVIVVAFKDGSGSKPDLIKL KVTLNSALQKNLNPLFKVSDVVPFPSLPRTATNKVMRRVLRQQLTQIGQNSKL (SEQ ID NO: 15).

[0273] In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 15, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 93%, 86% to 95%, 88% to 97%, or 86% to 100% homology to SEQ ID NO: 15. Each possibility represents a separate embodiment of the invention.

[0274] In some embodiments, the protein comprises or consists of the amino acid sequence: MGDSEGSSISTPTTEQVGFLSNIMEDKSYSAAVAIMVAIAVPLVLSSVFAAKKKVK QRGVPVQVGGEPGFAMRNSRSNKLVDVPWEGARTMAALFEQSCKKHSQLRFLGT RKLIERSFVSGSDGRKFEKLHLGEYQWETYGQIFERVCNFASGLIQLGHDPDTRIAI FSDTRAEWLIAFEGCFRQNITVVTIYASLGDDALIHSLNETKVSTLICDSKLLKKVA AVSSSLKTVENFIYFESDNTEALNEIGDWKISSFSEVESLGQKSPVSARLPIKKDVA VIMYTSGSTGLPKGVMMTHGNVVATAAAVMTVIPNIGTNDVYLAYLPLAHIFELA AETVMVTAGIPIGYGSALTLTDTSNKIKKGTLGDASILKPTLMAAVPAILDRVRDG VLKKVEEKGGLTTKIFNIAYKRRLLAVDGSWLGAWGLEKLLWDAIVFKKIRSVLG GDIRFMLCGGAPLAADTQRFINVCVGAPIGQGYGLTETCAGAAFSEADDNSVGRV GPPLPCVYIKLVSWDEGGYLTSDKPMPRGEVVVGGYSVTAGYFNNEEKTNEVYK VDESGMRWFYTGDIGRFHPDGCLEIIDRKKDIVKLQHGEYISLGKVEAALASSKYV ENVMLHADPFHTYCVALVVPARQVIEQWAQDAGISYQDFAELCDKKETVSEVQQ SLTKVAKDAKLDKFETPAKIKLMPDPWTPESGLVTAALKLKREQLKSKFKDDLDK LYG (SEQ ID NO: 16).

[0275] In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 92%, at least 94%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 16, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 95%, 89% to 98%, 90% to 99%, or 89% to 100% homology to SEQ ID NO: 16. Each possibility represents a separate embodiment of the invention.

[0276] In some embodiments, the protein comprises or consists of the amino acid sequence: MSVYTVKVEDSRAASGETPSAGPVYRCIYAKDALMELPPGYESPWDFFSESVKRN PKNPALGRRQVIDGKAGGYSWLSYQEAYNSALRIASAIRSRSVNPGDRCGIYGPNC PEWIISMEACNSNGITYVPLYDTLGANAVEYIINHAEISLVFVQENKLSAILSCLPNC SSNLKTIVSFGKFSESQKNEAMEHGVDCFSWEEFSSMGNLEDELPAKNKTDICTIM YTSGTTGEPKGVVLSNRAFMSEVLSMHELLIETDKPGTEEDTYFSFLPLAHIFDQIM ETYFIYSGASIGFWQGDIRYLIEDLLVLQPTIFCGVPRVYDRIYTGIMAKISTGGAIR KALFDFAYNYKLRNLEKGIQQDKSAPLLDKLVFDKIKQGFGGRVRLMLSGAAPLP KHVEEFLRVTCCTVLSQGYGLTESCGGCFTSIANVYSMIGTVGVPMTTIEARLESV PEMGYDALSSVPCGEICLRGNTLFSGYHKRDDLTDAVLVDGWFHTGDIGEWQAD GAMKIIDRKKNIFKLSQGEYVAVESIESTYSRCPLVTSIWVYGNSFESFLVAVVVPD RVAVEEFAAKNNESGDYASLCKNPNVRKYVLEELNAEAQCNKLRGFEMLKAVHL DPVPFDFERDLITPTFKLKRQQLLKYYKDCVEQLYAEAKTSKK (SEQ ID NO: 17).

[0277] In some embodiments, the protein comprises an amino acid sequence with at least 93%, at least 94%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 17, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 93% to 98%, 93% to 99%, 93% to 100%, or 95% to 100% homology to SEQ ID NO: 17. Each possibility represents a separate embodiment of the invention.

[0278] In some embodiments, the protein comprises or consists of the amino acid sequence: METHGPRLLGAAYKDPITSYKQFQKFSVQHLEVYWSLVLEKLSIQFQERPKCIVDT SDKSKHGGTWLPGSVLNIAECCILSTTETDEKVAIVWRDERCDNLDVNKMTFKEL RQQVMLVANALKLLFSKGDPIAIDMPMTVTAVILYLAIVYSGFVVVSIADSFAAKE IATRLRVSNAKAIFTQDYIVRGGRRFPLYSRVIEATQCRAIVVPAIGENVEVILRKQ DISWGDFLSGAKQLPSPDYCSPVYQSIDTLTNILFSSGTTGDPKAIPWTQISPMRCA ADGWAHMDIQAGDVYCWPTNLGWVMGPIVLYSSFLTGATLALYNGSPLGHGFG KFVQDAGVTILGTVPSIVKSWKSTRCMEGLDWTKIKAFGSTGEASNVDDDLWLSS KAYYKPVLECCGGTELASSYVQGNLLQPQAFGALSSASMGTGFVIFDDHGVPYPD DEPCVGEVGLFPVYMGASDRLLNADHEKIYFKGMPSYKGMQLRRHGDIIKRTIGG YLVVQGRADDTMNLGGIKTSSIEIERVCEQADGSIMETAAVSVAPATGGPELLAIF VVLKNGCNTQPQDLKMIFSKAIQKNLNPLFKVSFVKVVPEFPRTASNKLLRRVLRN QVKEELQTRSKI (SEQ ID NO: 18).

[0279] In some embodiments, the protein comprises an amino acid sequence with at least 84%, at least 87%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 18, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 84% to 99%, 85% to 99%, 84% to 100%, or 90% to 100% homology to SEQ ID NO: 18. Each possibility represents a separate embodiment of the invention.

[0280] In some embodiments, the protein comprises or consists of the amino acid sequence: MEITKSIQELGLQDLLNTGLTPNDAKSLQIEIKHIINSQTTNSNPVELWRQITSAKLL KPSYPHSLHQLIYYAVYCNYDASIYGPPLYWFPSEIDSKRSNLGNIMETHGPRLLG AAYKDPITSYKQFQKFSVQHLEVYWSLVLEKLSIQFQERPKCIVDTSDKSKHGGT WLPGSVLNIAECCILSTSETDDKVAIVWRDERCDNLDVNKMTFKELRQQVMLVA NALKLLFSKGDPIAIDMPMTVTAVILYLAIVYSGFVVVSIADSFAAKEIATRLRVSN AKAIFTQDYIVRGGRRFPLYSRVIEATQCRAIVVPAIGENVEVILRKQDISWGDFLS GAKQLPSPDYCSPVYQSIDTLTNILFSSGTTGDPKAIPWTQISPMRCAADGWAHMD IQAGDVYCWPTNLGWVMGPIVLYSSFLTGATLALYNGSPLGHGFGKFVQDAGVTI LGTVPSIVKSWKSTRCMEGLDWTKIKAFGSTGEASNVDDDLWLSSKAYYKPVLEC CGGTELASSYVQGNLLQPQAFGALSSASMGTGFVIFDDHGVPYPDDEPCVGEVGL FPVYMGASDRLLNADHEKIYFKGMPSYKGMQLRRHGDIIKRTIGGYLVVQGRAD DTMNLGGIKTSSIEIERVCEQADGSIMETAAVSVAPATGGPELLAIFVVLKNGCNT QPQDLKMIFSKAIQKNLNPLFKVFS (SEQ ID NO: 19).

[0281] In some embodiments, the protein comprises an amino acid sequence with at least 82%, at least 87%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 19, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 82% to 99%, 83% to 99%, 82% to 100%, or 85% to 100% homology to SEQ ID NO: 19. Each possibility represents a separate embodiment of the invention.

[0282] In some embodiments, the protein comprises or consists of the amino acid sequence: MVYKSLNSISISDIVNLGISPETATQLHQKLTEIIQIYGFDAPQTWTQISTRILHPDLPF CFHQMMYYGCYVDFGPDPPAWSPDPKDAKETNIGSEEERRGKEFEGPSYKDPISS YSALQEFSALNLEVFWKTILDEMNITFSVPPKRILVDDLSKESQLLHPGGRWLPGA YVNPARNCLSLSSKRRLSDIAVIWRDEGNDDMPVNKMTFQQLRSEVWLVAYALD TLGVEKGSAIAIDMPMDVKSVVIYLAIVLAGYVVVSIADSFAAGEISTRLVLSKAK AIFTQDLIIRGDRSHPLYSRVVDAQSPLAIVIPTRGSSFSIKLRDGDISWHDFLERANT YRNVEFVAVERPVEAFSNILFSSGTTGEPKAIPWTLATPFKAGADAWCHMDVHKG DVVAWPTNLGWMMGPWLIYASLLNGGSLALYNGSPLTSGFAKFVQDAKVTLLG VIPSIVRAWRTNNSTAGFDWSTIRCFGSTGEASNTDECLWLMGRAHYKPVIEYCG GTEIGGGFITGSLLQPQCLSAFSTPSLGCKLLILGEDGIPIPQNAPGIGELALNPLMFG ASSTLLNANHYDVYFKGMPSWNGKVLRRHGDVFERTSKGYYRAHGRADDTMNL GGIKVSSVEIERVCNSIDDRILETAAIGVTPSGGGPERLVIVVAFKDGSGSKPDLIKL KVTLNSALQKNLNPLFKVSDVVPFPSLPRTATNKVMRRVLRQQLTQIGQNSKL (SEQ ID NO: 20).

[0283] In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 20, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 93%, 86% to 95%, 88% to 97%, or 86% to 100% homology to SEQ ID NO: 20. Each possibility represents a separate embodiment of the invention.

[0284] In some embodiments, the protein comprises or consists of the amino acid sequence: MTFQQLRSEVWLVAYALDTLGVEKGSAIAIDMPMDVKSVVIYLAIVLAGYVVVSI ADSFAAGEISTRLVLSKAKAIFTQDLIIRGDRSHPLYSRVVDAQSPLAIVIPTRGSSFS IKLRDGDISWHDFLERANTYRNVEFVAVERPVEAFSNILFSSGTTGEPKAIPWTLAT PFKAGADAWCHMDVHKGDVVAWPTNLGWMMGPWLIYASLLNGGSLALYNGSP LTSGFAKFVQDAKVTLLGVIPSIVRAWRTNNSTAGFDWSTIRCFGSTGEASNTDEC LWLMGRAHYKPVIEYCGGTEIGGGFITGSLLQPQCLSAFSTPSLGCKLLILGEDGIPI PQNAPGIGELALNPLMFGASSTLLNANHYDVYFKGMPSWNGKVLRRHGDVFERT SKGYYRAHGRADDTMNLGGIKVSSVEIERVCNSIDDRILETAAIGVTPSGGGPERL VIVVAFKDGSGSKPDLIKLKVTLNSALQKNLNPLFKVSDVVPFPSLPRTATNKVMR RVLRQQLTQIGQNSKL (SEQ ID NO: 21).

[0285] In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 92%, at least 94%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 21, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 95%, 89% to 98%, 90% to 99%, or 89% to 100% homology to SEQ ID NO: 21. Each possibility represents a separate embodiment of the invention.

[0286] In some embodiments, the protein comprises or consists of the amino acid sequence: MNITFSVPPKRILVDDLSKESQLLHPGGRWLPGAYVNPARNCLSLSSKRRLSDIAVI WRDEGNDDMPVNKMTFQQLRSEVWLVAYALDTLGVEKGSAIAIDMPMDVKSVV IYLAIVLAGYVVVSIADSFAAGEISTRLVLSKAKAIFTQDLIIRGDRSHPLYSRVVDA QSPLAIVIPTRGSSFSIKLRDGDISWHDFLERANTYRNVEFVAVERPVEAFSNILFSS GTTGEPKAIPWTLATPFKAGADAWCHMDVHKGDVVAWPTNLGWMMGPWLIYA SLLNGGSLALYNGSPLTSGFAKFVQDAKVTLLGVIPSIVRAWRTNNSTAGFDWSTI RCFGSTGEASNTDECLWLMGRAHYKPVIEYCGGTEIGGGFITGSLLQPQCLSAFST PSLGCKLLILGEDGIPIPQNAPGIGELALNPLMFGASSTLLNANHYDVYFKGMPSW NGKVLRRHGDVFERTSKGYYRAHGRADDTMNLGGIKVSSVEIERVCNSIDDRILE TAAIGVTPSGGGPERLVIVVAFKDGSGSKPDLIKLKVTLNSALQKNLNPLFKVSDV VPFPSLPRTATNKVMRRVLRQQLTQIGQNSKL (SEQ ID NO: 22).

[0287] In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 94%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 22, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 95%, 89% to 98%, 90% to 99%, or 88% to 100% homology to SEQ ID NO: 22. Each possibility represents a separate embodiment of the invention.

[0288] In some embodiments, the protein comprises or consists of the amino acid sequence: MASSINISKIREAQRAQGPASILAVGTANPSNCVYQADYPDYYFRITKSEHMVDLK RKFKRMCDQSMIRKRYMQITEEYLKENPNICEYMAPSLDARQDVVVVEVPKLGK EAATKAIKEWGQPKSKITHLIFCTTSGVDMPGADYQLTKLLGLCPSVKRFMMYQQ GCFAGGTVLRLAKDIAENNKGARVLVVCSEITAVIFRGPNDTHLDSLIGQALFGDG ASSVIVGSDPDLTTERPLFEIISAAQTILPDSEGAIDGHLREAGLTFHLLKDVPRLISK NIEKALTQAFSPLGISDWNSIFWVTHPGGPAILDQVELKLGLKEEKMRTTRHVLSE YGNMSSACVFFVLDEMRKRSAKGGARTTGEGLDWGVLFGFGPGLTVETVVLHSL PTTMSIAT (SEQ ID NO: 27).

[0289] In some embodiments, the protein comprises an amino acid sequence with at least 92%, at least 96%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 27, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 100%, 95% to 100%, 96% to 100%, or 98% to 100% homology or identity to SEQ ID NO: 27. Each possibility represents a separate embodiment of the invention.

[0290] In some embodiments, the protein comprises or consists of the amino acid sequence: MASSINISKIREAQRAQGPASILAVGTANPSNCVYQADYPDYYFRITKSEHMVDLK EKFQRMCDKSMIRKRHIHITEEFLKENPNLCEYMAPSLDTRQDVVVVEVPKLGKE AATKAIKEWGQPKSKITHLIFCTTSGVDMPGADYQLTKLLGLHPSVKRFMMYQQG CFAGGTVLRLAKDLAENNKGARVLAVCSEITAVTFRGPNDTHIDSLVGQALFGDG AAAVIVGSDPDLTTERPLFEIISAAQTILPNSEGAIDGHVREVGVTIHILKDVPVLISK NIEKALTQAFSPLGISDWNSIFWVVHPGGPAILDQVELKLGLKEEKMRTTRHVLSE YGNMSSACVFFVLDEMRKRSAKGGARTTGEGLDWGVLFGFGPGLTVETVVLHSL PTTMSIAT (SEQ ID NO: 28).

[0291] In some embodiments, the protein comprises an amino acid sequence with at least 91%, at least 94%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 28, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 91% to 100%, 94% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 28. Each possibility represents a separate embodiment of the invention.

[0292] In some embodiments, the protein comprises or consists of the amino acid sequence: MASSINISKIREAQRAQGPASILAVGTANPSNCVYQADYPNYYFRITKSEHMVDLK RKFKRMCDQSMIRKRYMQITEEYLKENPNICEYMAPSLDARQDVVVVEVPKLGK EAATKAIKEWGQPKSKITHLIFCTTSGVDMPGADYQLTKLLGLCPSVKRFMMYQQ GCFAGGTVLRLAKDIAENNKGARVLVVCSEITAVIFRGPNDTHLDSLIGQALFGDG ASSVIVGSDPDLTTERPLFEIISAAQTILPDSEGAIDGHLREAGLTFHLLKDVPGLISK NIEKALTQAFSPLGISDWNSIFWVTHPGGPAILDQVELKLGLKEEKMRASRHVLSE YGNMSSACVFFILDEMRKKSDEDGAPTTGEGLDWGVLFGFGPGLTVETVVLHSLP TTMSIAT (SEQ ID NO: 29).

[0293] In some embodiments, the protein comprises an amino acid sequence with at least 93%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 29, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 93% to 100%, 94% to 100%, 96% to 100%, or 98% to 100% homology or identity to SEQ ID NO: 29. Each possibility represents a separate embodiment of the invention.

[0294] In some embodiments, the protein comprises or consists of the amino acid sequence: MASSINISKIREAQRAQGPASILAVGTANPSNYEIQADFPDYYFRVTKSEHMADMK GTFQRMCDKSMIRKRHMLITEEFLKENPNLCEYMAPSLDTRQDVVVVEVPKLGKE AATKAIKEWGQPKSKITHLIFCTTTGVDMPGADYQLTKLLGLAPSVKRFMIYQQG CFAGGTVLRLAKDIAENNKGARVLAVCSEITAMSFRGPNDTHVDSLVGQALFGDG AAAVIVGSDPDLTTERPLFEIISAAQTILPNSEGAIDGHVREVGLTIHILKDVPVLISK NIEKALTQAFSPLGISDWNSIFWIVHPGGPAILDQVELKVGLKKEKMATSRHVLSE YGNMSSACVFFIMDEMRKRSAKGGARTTGEGLDWGVLFGFGPGLTVETVVLHSL PTTM (SEQ ID NO: 30).

[0295] In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 30, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 91% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 30. Each possibility represents a separate embodiment of the invention.

[0296] In some embodiments, the protein comprises or consists of the amino acid sequence: MAEFTHLVVVKFKEEVVVEDIMKGLEKLVSQLDSVKSFVWGKDIESMEMLRQGF THAIMMTFGSKEDFTAFQSHPNHVEFSATFSAAIEKIVLLDFPVVAVKTATA (SEQ ID NO: 39).

[0297] In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 39, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 99%, 88% to 98%, 90% to 99%, or 89% to 100% homology or identity to SEQ ID NO: 39. Each possibility represents a separate embodiment of the invention.

[0298] In some embodiments, the protein comprises or consists of the amino acid sequence: MSSLQNKFIEHIALIKIKPGVESTTLIDKLNGLSSIEVLLHFSAGELLGSSHGFTHIVH CRVRSKDDLQIYLTHPIHLHLADDTLPLLDDVTVVDWFSSNSDIVDPPKPGSAMRV TLLKLKHDSTESNKLVVIEGIKNQFKGIEDVIVTTTFGENLFHEMHENFSIEIDKGYS IGSIAFVPGSADFQVLNSKVDNNKLNDLTESEVVVDYVFPSAN (SEQ ID NO: 40).

[0299] In some embodiments, the protein comprises an amino acid sequence with at least 45%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 40, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 45% to 90%, 50% to 99%, 65% to 98%, or 55% to 100% homology or identity to SEQ ID NO: 40. Each possibility represents a separate embodiment of the invention.

[0300] In some embodiments, the protein comprises or consists of the amino acid sequence: MSSEEQIVEHVVLFKVKPDADPSKVAAWVNGLNGLTSLQLALHLSAGQLIRCRSS SLTFTHMLHSRYRSKEHLRQYTVHPEHVRVVTEGKSIIDDVMALDWMISNGAASS VCPKPGSAVRVGFYKLMESLGEIEKARVLEVMGGIEELSVGESFCDDRAKGYTIAS TAVFPNGNPAADLDLYHSGDQLLLKEEVMKDSIQSVVVVDYVIPSP (SEQ ID NO: 41).

[0301] In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 80%, at least 90%, or at least 99% homology or identity to SEQ ID NO: 41, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 97%, 75% to 99%, 80% to 98%, or 71% to 100% homology or identity to SEQ ID NO: 41. Each possibility represents a separate embodiment of the invention.

[0302] In some embodiments, the protein comprises or consists of the amino acid sequence: MGEVKHILLAKFKDGISEQQIQHLITGYANLVNLVEPMKSFRWGKDVSIENLHQGF THVFESTFETTEGIATYISHPAHVEFATGFLDQLEKVIVIDYKPTSVDP (SEQ ID NO: 42).

[0303] In some embodiments, the protein comprises an amino acid sequence with at least 87%, at least 92%, at least 96%, or at least 97% homology or identity to SEQ ID NO: 42, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 87% to 97%, 88% to 99%, 90% to 98%, or 87% to 100% homology or identity to SEQ ID NO: 42. Each possibility represents a separate embodiment of the invention.

[0304] In some embodiments, the protein comprises or consists of the amino acid sequence: MLCAPARTRLLPSISLLPSQHNIFRRLNCLIHRRNHHQTPITMSAQQQIVEHVVLFK VKPDVDSSKVAAMVNGLNGLTSLDLTLHLSAGQLLRSRSSSLTFTHMLHSRYRSK DDLREYAAHPDHVRVVTENIKPVIDDIMAVDWISNDASVSPKPGSAMRVTFLKLK ENLGENEKSRVLEVIGGIKNQFKSIEELSVGENFSHDRAKGYTIASIAVLPGPSELEA LDSNTELVKLEKEKVKDLLESVVVVDYVIPSLQSASL (SEQ ID NO: 43).

[0305] In some embodiments, the protein comprises an amino acid sequence with at least 85%, at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 43, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 85% to 97%, 87% to 99%, 89% to 98%, or 85% to 100% homology or identity to SEQ ID NO: 43. Each possibility represents a separate embodiment of the invention.

[0306] In some embodiments, the protein comprises or consists of the amino acid sequence: MAVAQLSSSLCISTPARISTGSGFSSSGLPRIGTTFVCGSGSPLVISGTYHQKARVHK PAALSVRCEQSSKDGNGLNVWLGRTAMVGFAVAISVEVSTGKGLLENFGLTSPLP TVALALTALGGVLTALFIFQSASES (SEQ ID NO: 44).

[0307] In some embodiments, the protein comprises an amino acid sequence with at least 79%, at least 82%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 44, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 79% to 95%, 79% to 99%, 80% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 44. Each possibility represents a separate embodiment of the invention.

[0308] In some embodiments, the protein comprises or consists of the amino acid sequence: MIEHIVLLKFKSDVDSTKVESMINELNGLASLDVALDVSAGKILRVSSTSSSSLTFT HLFRCCFRSADDQQVFSTHPDHLRVAIEVRPVIEDMVVVDLVSKTTIDSPNPGSAM KVRIFKLKDDLIEDSKLVVMEGIKNELKAVEHIRFGDNINVMAKGYSIAMIAFFPD LESSVAGAEIVKDYIESELVVDFVFPPPNVTSHS (SEQ ID NO: 45).

[0309] In some embodiments, the protein comprises an amino acid sequence with at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 45, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 50% to 90%, 55% to 99%, 60% to 97%, or 50% to 100% homology or identity to SEQ ID NO: 45. Each possibility represents a separate embodiment of the invention.

[0310] In some embodiments, the protein comprises or consists of the amino acid sequence: MAEFTHLVVVKFKEEVVVEDIMKGLEKLASQLDSVKSFVWGKDIESMEMLRQGF THAIMMTFGSKEDFTAFQSHPNHVEFSATFSAAIEKIVLLDFPVVAVKTATA (SEQ ID NO: 46).

[0311] In some embodiments, the protein comprises an amino acid sequence with at least 87%, at least 93%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 46, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 87% to 97%, 88% to 99%, 89% to 98%, or 87% to 100% homology or identity to SEQ ID NO: 46. Each possibility represents a separate embodiment of the invention.
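By way of illustration only, a percent-identity threshold such as those recited above can be checked computationally. The sketch below assumes the simplest case, two ungapped sequences of equal length compared position by position; this passage does not specify an alignment method, so the function name `percent_identity` and the position-wise comparison are assumptions made for illustration, and sequences of unequal length would first need to be aligned with a dedicated tool. The example compares SEQ ID NO: 39 and SEQ ID NO: 46 as listed above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length, ungapped sequences.

    Hypothetical helper for illustration; not a method defined in the specification.
    """
    # Drop the line-wrap whitespace that appears inside the printed sequences.
    a = "".join(a.split()).upper()
    b = "".join(b.split()).upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length; align them first")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Sequences copied from the listing above (SEQ ID NO: 39 and SEQ ID NO: 46).
SEQ_39 = (
    "MAEFTHLVVVKFKEEVVVEDIMKGLEKLVSQLDSVKSFVWGKDIESMEMLRQGF"
    "THAIMMTFGSKEDFTAFQSHPNHVEFSATFSAAIEKIVLLDFPVVAVKTATA"
)
SEQ_46 = (
    "MAEFTHLVVVKFKEEVVVEDIMKGLEKLASQLDSVKSFVWGKDIESMEMLRQGF"
    "THAIMMTFGSKEDFTAFQSHPNHVEFSATFSAAIEKIVLLDFPVVAVKTATA"
)

identity = percent_identity(SEQ_39, SEQ_46)
# Compare against the lowest threshold recited for SEQ ID NO: 46 (at least 87%).
meets_threshold = identity >= 87.0
print(f"identity: {identity:.2f}%  meets 87% threshold: {meets_threshold}")
```

A position-wise comparison is used here only because the two listed sequences have the same length; it is not a substitute for a gapped alignment when lengths differ.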

[0312] In some embodiments, the protein comprises or consists of the amino acid sequence: MELSLSSSSSSSLPQLHTHPSSSSSSSHYIKKSPFFINKFNNHTKCKFHNSSALRTNFF YTTITKTSSSRFVLNKNPNQFSVKACSQVGSAGSDPALNKVADFKDAFWRFLRPH TIRGTALGSVSLVTRALLENPNLIRWSLLLKAFSGLVALICGNGYIVGINQIYDIGID KVNKPYLPIAAGDLSVQSAWFLVLAFAMVGVIIVGMNFGPFrrSLYSLGLFLGTIYS VPPLRMKRFPVVAFLIIATVRGFLLNFGVYYAVRAALGLTFQWSSAVAFITTFVTL FALVIAITKDLPDVEGDRKFQISTFATKLGVRNIALLGSGLLLINYIGSIVAALYMPQ AFRSSLMIPLHTILASCLIYQAWILERANYTQEAIAGYYRFVWNLFYSEYIIFPFI (SEQ ID NO: 59).

[0313] In some embodiments, the protein comprises an amino acid sequence with at least 82%, at least 85%, at least 90%, or at least 99% homology or identity to SEQ ID NO: 59, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 82% to 99%, 85% to 98%, 84% to 99%, or 82% to 100% homology or identity to SEQ ID NO: 59. Each possibility represents a separate embodiment of the invention.

[0314] In some embodiments, the protein comprises or consists of the amino acid sequence: MATMASSLLNPLSCSIKPNSNRLPLPTPISLSRSCRRLTIKATETDANEVKPKAPEKA PAASGSGFNQILGIKGAKQETNKWKIRVQLTKPVTWPPLIWGVVCGAAASGNFQ WTVEDVAKSIVCMLMSGPFLTGYTQTINDWYDRDIDAINEPYRPIPSGAISENEVIT QIWVLLLGGIGLAGILDVWAGHKSPTIFYLALGGSLLSYIYSAPPLKLKQNGWIGN FALGASYISLPWWAGQALFGTLTPDIVVLTLLYSIAGLGIAIVNDFKSVEGDRKMG LQSLPVAFGEETAKWICVGAIDITQLSIAGYLLGSGKPYYALALVGLIVPQIFFQFK YFLKDPVKYDVKYQASAQPFLILGLLVTALATSH (SEQ ID NO: 60).

[0315] In some embodiments, the protein comprises an amino acid sequence with at least 92%, at least 93%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 60, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 98%, 93% to 99%, 94% to 98%, or 92% to 100% homology or identity to SEQ ID NO: 60. Each possibility represents a separate embodiment of the invention.

[0316] In some embodiments, the protein comprises or consists of the amino acid sequence: MKSLIIGSFSNKVSCYSPSLPDSSSSLIPTGCYHVSLRTFQRNRAIQAQSSLVRCNIG KFNETLLLSRKRSTKHVACAVSEQPIEPDATNPQSSLPNALDAFYRFSRPHTVIGTA LSIVSVSLLAVQKLSDFSPLFFIGVFEAIVAAFFMNIYIVGLNQLSDIEIDKVNKPYLP LASGEYSVQTGIIIVSSFAVMSFWLGWIVGSWPLFWALFISFLLGTAYSINIPMLRW KRFALVAAMCILAVRAHVQVAFYLHIQTFVYGRLAVFPKPVIFATGFMSFFSVVIA LFKDIPDIVGDKIFGIQSFTVRMGQKRVFWICILLLEIAYGVAILVGASSPFLWSRYI TVLGHAILGLILWGRAKSTDLESKSAITSFYMFIWQLFYAEYLLIPLVR (SEQ ID NO: 61).

[0317] In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 61, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 97%, 89% to 99%, 90% to 98%, or 89% to 100% homology or identity to SEQ ID NO: 61. Each possibility represents a separate embodiment of the invention.

[0318] In some embodiments, the protein comprises or consists of the amino acid sequence: MELSLSSSSSSSLPQLHTHPSSSSSSSHYIKKSPFFINKFNNHTKCKFHNSSALRTNFF YTTITKTSSSRFVLNKNPNQFSVKACSQVGSAGSDPALNKVADFKDAFWRFLRPH TIRGTALGSVSLVTRALLENPNLIRWSLLLKAFSGLVALICGNGYIVGINQIYDIGID KVNKPYLPIAAGDLSVQSAWFLVLAFAMVGVIIVGMNFGPFrrSLYSLGLFLGTIYS VPPLRMKRFPVVAFLIIATVRGFLLNFGVYYAVRAALGLTFQWSSAVAFITTFVTL FALVIAITKDLPDVEGDRKFQISTFATKLGVRNIALLGSGLLLINYIGSIVAALYMPQ AFRSSLMIPLHTILASCLIYQAWILERANYTQRSQYFDMSSCRRR (SEQ ID NO: 62).

[0319] In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 62, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 97%, 83% to 99%, 84% to 98%, or 81% to 100% homology or identity to SEQ ID NO: 62. Each possibility represents a separate embodiment of the invention.

[0320] In some embodiments, the protein comprises or consists of the amino acid sequence: MELSLSSSSSSSLPQLHTHPSSSSSSSHYIKKSPFFINKFNNHTKCKFHNSSALRTNFF YTTITKTSSSRFVENKNPNQFSVKACSQVGSAGSDPAENKVADFKDAFWRFERPH TIRGTAEGSVSEVTRAEEENPNEIRWSEEEKAFSGEVAEICGNGYIVGINQIYDIGID KVNKPYEPIAAGDESVQSAWFEVEAFAMVGVIIVGMNFGPFrrSEYSEGEFEGTIYS VPPERMKRFPVVAFEIIATVRGFEENFGVYYAVRAAEGETFQWSSAVAFITTFVTE FAEVIAITKDEPDVEGDRKFQISTFATKEGVRNIAEEGSGEEEINYIGSIVAAEYMPQ VKTTSIDHYRPYSFEVDEPGQNGITEAA (SEQ ID NO: 63).

[0321] In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 63, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 97%, 83% to 99%, 84% to 98%, or 81% to 100% homology or identity to SEQ ID NO: 63. Each possibility represents a separate embodiment of the invention.

[0322] In some embodiments, the protein comprises or consists of the amino acid sequence: MATMASSLLNPLSCSIKPNSNRLPLPLPIPISLSRSCRRLTIKATETDANEVKPKAPE KAPAASGSGFNQILGIKGAKQETNKWKIRVQLTKPVTWPPLIWGVVCGAAASGNF QWTVEDVAKSIVCMLMSGPFLTGYTQTINDWYDRDIDAINEPYRPIPSGAISENEVI TQIWVLLLGGIGLAGILDVWAGHKSPTIFYLALGGSLLSYIYSAPPLKLKQNGWIG NFALGASYISLPWWAGQALFGTLTPDIVVLTLLYSIAGLGIAIVNDFKSVEGDRKM GLQSLPVAFGEETAKWICVGAIDITQLSIAGYLLGSGKPYYALALVGLIVPQIFFQF KYFLKDPVKYDVKYQASAQPFLILGLLVTALATSH (SEQ ID NO: 64).

[0323] In some embodiments, the protein comprises an amino acid sequence with at least 92%, at least 93%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 64, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 98%, 93% to 99%, 94% to 98%, or 92% to 100% homology or identity to SEQ ID NO: 64. Each possibility represents a separate embodiment of the invention.

[0324] In some embodiments, the protein comprises or consists of the amino acid sequence: MASLAIGSLGSPSSRQCSSPVASSSSFAIGSQIASKFLRISKFDKTKNSPLTLQQKHIN KSIDQSFFEPLPLHKINKDKFKLYATSTNNPQFDATHDLKTPEVSIINFVDALYRLIR PYTAVVTIVSVVAMSLLTVNSLSDFSPLFFIKVVQALIGGIFMQMYVSGFNQICDIE LDKVNKQSLPLAAGELSMKTAIVIASLSAIMSLSIGWFVGSPPLLWCLVWWFIVGT AYSANVLPYLRWKRFPFTAAFCAMTSRALVLPIGYYLHMQNSIPGVSALLSRPILF AVAMLSAFSLSAMFFKDIPDIKGDRMHGIKSLAIKLGEKRVYWISISIIEIAYIAAAFI GATSPISWSKYVTIIGHLGMGLLLWVRARSVDPTNTVAVQSMYMFLIKLVYAEYG LISLVR (SEQ ID NO: 65).

[0325] In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 65, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 90%, 75% to 99%, 73% to 97%, or 71% to 100% homology or identity to SEQ ID NO: 65. Each possibility represents a separate embodiment of the invention.

[0326] In some embodiments, the protein comprises or consists of the amino acid sequence: MKSLIIGSFSNKVSCYSPSLPDSSSSLIPTGCYHVSLRTFQRNRAIQAQSSLVRCNIG KFNETLLLSRKRSTKHVACAVSEQPIEPDATNPQSSLPNALDAFYRFSRPHTVIGTA LSIVSVSLLAVQKLSDFSPLFFIGVFEAIVAAFFMNIYIVGLNQLSDIEIDKVNKPYLP LASGEYSVQTGIIIVSSFAVMSFWLGWIVGSWPLFWALFISFLLGTAYSINIPMLRW KRFALVAAMCILAVRAHVQVAFYLHIQTFVYGRLAVFPKPVIFATGFMSFFSVVIA LFKDIPDIVGDKIFGIQSFTVRMGQKRVFWICILLLEIAYGVAILVGASSPFLWSRYI TVLGHAILGLILWGRAKSTDLESKSAITSFYMFIWQLFYAEYLLIPLVR (SEQ ID NO: 66).

[0327] In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 66, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 97%, 89% to 99%, 90% to 98%, or 89% to 100% homology or identity to SEQ ID NO: 66. Each possibility represents a separate embodiment of the invention.

[0328] In some embodiments, the protein comprises or consists of the amino acid sequence: MLIHHEHFLTTGFESSNDRAAYSINFSKQHHLHMASIATGSLCRPTSHQFSIPVASSS SFATGSQFASKFLHISISAKKSSLTLQQRHIHKNIDQSFLKPLALQKLNKDKFKLNG TSPDNPQFDATHDLKTQIESTINFVDVLYRLLRPYALLQMGLCVVTMSLLTVESLS DFSPLFFVKVAQALIGGIFMQMYVNGFNQICDIELDKVNKPSLPLASGELSKTTTIV VSSLSAITSLSIGWFVGSPPLLWSLVVWFIAGTTYSANLPYLRWKRFPFTNMFCNLT MALVVPIGTYLHMENSIHGVSTLLSRPLLFTVAMCTVFPVSIILFKDIPDIKGDRMH GMKSLAIILGEKRTYWICIWILEITYIAAAFFGATSPISWSKYVTIISHLGMGFLLWL RSKSVDVKNTVAVQSMYMFLWKLLYAEYGLILLVR (SEQ ID NO: 67).

[0329] In some embodiments, the protein comprises an amino acid sequence with at least 68%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 67, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 68% to 97%, 69% to 99%, 70% to 98%, or 68% to 100% homology or identity to SEQ ID NO: 67. Each possibility represents a separate embodiment of the invention.

[0330] In some embodiments, the protein comprises or consists of the amino acid sequence: MFIHHEQFLTTGFESSNDRAAYSINFLKQHHLHMVSIATGSLCRPTSHRFSIPVASSS SFATGSQFASISAKKSSLTLKQRHTHKNIDQSFFKPLALQKMNKGKFKLNATSPDN SQLDATHDLKTQIESIINFVDVLYRLIRPYVVLGMGVTIVTMCLLTVDSLSDFSPLFF VKVAQALIGSIFMAMYVNSFNEICDIELDKVNKPSLPLASGELSMTTAIVVSSLSAI MSLSIGWFVGSPPLLWSLVVWFILGTAYSANLPYLRWKRFPLTTLSSALTMGALVI PIGNYMHMENSIRGVTTLLSRPLLFAVAMCAAFHVSTILFKDIPDIKGDRMHGMKS LAIKLGEKRMYWICIWILEIAYIAAAFFGATSPISWSKYVTIISHLGMGFLLWLRSKS VDVKNTVAVQSMYMFLWKLFYVEHGLILLVR (SEQ ID NO: 68).

[0331] In some embodiments, the protein comprises an amino acid sequence with at least 66%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 68, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 66% to 97%, 67% to 99%, 70% to 98%, or 66% to 100% homology or identity to SEQ ID NO: 68. Each possibility represents a separate embodiment of the invention.

[0332] In some embodiments, the protein comprises or consists of the amino acid sequence: MASIATGSLCRPTSHRFSIHVASSSSFATGSQFASKILQISISAKKSSLTLQQRHIHKN IDQSFFKPLALQKMNKDKFKLNATSPDNPQFDATRDLKTQIESIIKFVDVLYRLLRP YAILEMGLSVVTMSLLTVESLSDFSPLFFVKVAQALIGGIFMQMYVNGFNQICDIEL DKVNKPSLPLASGELSTTTTIVVSSLSAIMSLSIGWFVGSPPLLWSLVVWFIVGTTY STNLPYLRWKRFPFTAMFCNLTRALVVPIGTYLHMKNSIHEVSTLLSRPLLFAVAM CTVFPISIILFKDIPDIKGDRMHGMKSLAIILGEERTYWICIWILEIAYIAAAFFGATSP ISWSKYVMIISHLGMGFLLWLRSKSVDVKNTVAVQSMYMFLWKLLYAEYGLILL VR (SEQ ID NO: 69).

[0333] In some embodiments, the protein comprises an amino acid sequence with at least 68%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 69, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 68% to 97%, 69% to 99%, 70% to 98%, or 68% to 100% homology or identity to SEQ ID NO: 69. Each possibility represents a separate embodiment of the invention.

[0334] In some embodiments, the protein comprises or consists of the amino acid sequence: MASLAIGSLGSPSSRQCSSPVASSSSFAIGSQIASKFLRISKFDKTKNSPLALQQKHIN KSIDQSFFEPLPLHKINKDKFKLYATSTNNPQFDATHDLKTPEVSIINFVDALYRLIR PYTAVVTIVSVVAMSLLTVNSLSDFSPLFFIKVVQALIGGIFMQMYVSGFNQICDIE LDKVNKQSLPLAAGELSMKTAIVIASLSAIMSLSIGWFVGSPPLLWCLVWWFIVGT AYSANVLPYLRWKRFPFTAAFCAMTSRALVLPIGYYLHMQNSIPGVSALLSRPILF AVAMLSAFSLSAMFFKDIPDIKGDRMHGIKSLAIKLGEKRVYWISISIIEIAYIAAAFI GATSPISWSKYVTIIGHLGMGLLLWVRARSVDPTNTVAVQSMYMFLIKLVYAEYG LISLVR (SEQ ID NO: 70).

[0335] In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 70, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 70. Each possibility represents a separate embodiment of the invention.

[0336] In some embodiments, the protein comprises or consists of the amino acid sequence: MGLNICTRFIPCLVVVLMFLFTSTYSATPEDKFLQCISQKLNITNSDEVFTQSNTRYS SVLESTIVNLRFATSTTPKPFAIITPLSYSHVQSAVVCAKKAGIRIRIRSGGHDYVGL SYTSSDNVPFVVLDLKQLQNVTVEYSKKTAWVESGATIGQLYYWVSQKSKNLGF PGGTCATIGVGGHLSGGGFGTLVRKYGLSADNVIDAKIVDVNGRLLDRKSMGEDL FWAIRGGGGGSFGVVVAWMVNLVHVPEKVTAFTIVRTLEQGGSDLFNKWQHVG PKLTKDLFISVIIQPISVWNGNGTVQVIFNSMYLGTVDKLMKTVNSSFPELGLQAK DCTEMSWIQSVLYFAGYPIEGSIDVLKDRKPDTRNYFDNKSDHVKEPIPKERLEDL WKWCMEGDFPILLMDPLGGKMNEIDTTRIPYPYRNGYSYMIQYVETWENIGDSEK RISWMRQMYENMTPYVSKNPRSAYVNYRDLDLGKNDNAKNTSYLEAMKWGSK YFGDNFKRLAMVKGVVDPDNFFFHEQSIPPLKV (SEQ ID NO: 80).

[0337] In some embodiments, the protein comprises an amino acid sequence with at least 69%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 80, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 69% to 99%, 70% to 98%, 75% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 80. Each possibility represents a separate embodiment of the invention.

[0338] In some embodiments, the protein comprises or consists of the amino acid sequence: MGCNLLQKLTIFVFFIMSISIPSFAYEHEHEHEHEHENDQDRVQDEKEPTDVFTSCL TRFGVHNFTTHSKSNNDNSVYYELLNFSIQNLRFTGLSMPKPVVIVFPETKEQLAK TVVCARESSLEIRVRCGGHSYEGTSSVSTDGRPFVVIDMTRLDNVSVDVNSGTAW VEAGATLGQMYCAIAESSTVHGFSAGSCPTVGTGGHISGGGFGLLSRKYGLAADN VVDAVLVTADGELLNRDTMGEDVFWAIRGGGGGVWGIVYAFNVKLSSVPKTVT NFVVSRPGTKGQVTDLVYKWQHVAPKLPDDFYLSSFVGAGLPERKNKPGLSATF KGFYLGSKSKALSIMNQTFPELKVMENDCKETSWIESILFFSGYGDESSVSDLKNRF LQDKLYYKAKSDYVRKPIPRFGLTTALEILEKQPKGYVILDPYGGAMQTISSDSIPF PHRKGNIFTIQYLVEWKEPDNDKTNDYLAWIRDFHGSMTPYVAQDPRAAYINYM DVDIGVMNWIKTRVDSDDAVEMGREWGEKYFYKNYDRLVRAKTQIDPYNVFRH QQSIPPMSLENKNRRGSISSE (SEQ ID NO: 81).

[0339] In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 81, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 98%, 93% to 99%, 94% to 98%, or 92% to 100% homology or identity to SEQ ID NO: 81. Each possibility represents a separate embodiment of the invention.

[0340] In some embodiments, the protein comprises or consists of the amino acid sequence: MKTSSNMLSVLLILFFITCSKAALDPDSVYQSFLQCLPLYSPESAEELSKVVYSSTL NTTTYETVLQEYIKNERFNTTATPKPSVIITPTTESQVQAAVLCAKKTGVQIKIRSG GHDYEGISYISSEPDFIVLDMFNFRSINVNVADETAVVGAGAQLGELYYRIYEKSK TLGFPAGVCQTVGVGGHLSGGGYGTMLRKYGLSVDHVIDAKIVDVNGQVLDRKS MGEDLFWAIRGGGGGSFGVILSYTVKLVSVPEVNTVFRVLKTTSENASELIYKWQ SIMPDIDNDLFIRVLLQPVTVNKQKVGRATFIAHFLGDSDRLVALMSKNFPELGLK KEDCIEVSWIESVLYWANFDLNTTKPEILLDRHSDSVSYGKRKSDYVQTPIPESGLE SIFEKLVELGKIGLVFNSYGGRMSEVAADATPFPHRAGNIFKIQYSVNWNDADPEL EANYLNQSRVMYDFMTPFVSKNPRAAFLNYRDLDIGVMTPGKNSYSEGEVYGEK YFMGNFERLVKIKTAVDPDNFFRNEQSIPTRAAKNSGKSRKMMK (SEQ ID NO: 82).

[0341] In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 82, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 97%, 87% to 99%, 88% to 98%, or 86% to 100% homology or identity to SEQ ID NO: 82. Each possibility represents a separate embodiment of the invention.

[0342] In some embodiments, the protein comprises or consists of the amino acid sequence: MGLNICTRFIPCLVVVLMFLFTSTYSATPEDKFLQCISQKLNITNSDEVFTQSNTRYS SVLESTIVNLRFATSTTPKPFAIITPLSYSHVQSAVVCAKKAGIRIRIRSGGHDYVGL SYTSSDNVPFVVLDLKQLQNVTVEYSKKTAWVESGATIGQLYYWVSQKSKNLGF PGGTCATIGVGGHLSGGGFGTLVRKYGLSADNVIDAKIVDVNGRLLDRKSMGEDL FWAIRGGGGGSFGVVVAWMVNLVHVPEKVTAFTIVRTLEQGGSDLFNKWQHVG PKLTKDLFISVIIQPISVWNGNGTVQVIFNSMYLGTVDKLMKTVNSSFPELGLQAK DCTEMSWIQSVLYFAGYPIEGSMDVLKDRKPQTRRYFNNKSDHVKEPIPKERLED LWKWCMEGDFPILLMDPLGGKMNEIDTTRIPYPYRNGYSYMIQYVETWENIGDSE KRISWMRQMYENMTPYVSKNPRSAYVNYRDLDLGKNDNAKNTSYLEAMKWGS KYFGDNFKRLAMVKGVVDPDNFFFHEQSIPPLKV (SEQ ID NO: 83).

[0343] In some embodiments, the protein comprises an amino acid sequence with at least 69%, at least 75%, at least 80%, at least 85%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 83, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 69% to 97%, 70% to 99%, 75% to 98%, or 69% to 100% homology or identity to SEQ ID NO: 83. Each possibility represents a separate embodiment of the invention.

[0344] In some embodiments, the protein comprises or consists of the amino acid sequence: MDQYVITKFISYLLAVFMALFCSDPTADKFLQCFTKDSNATDSNFVFTQENTQYSS VEESTIINERFATSITPKPIAVITPESYSHVQSAIECSKKIGYRIRIRSGGHDYAGVSYT SYDHDHTPFVVEDEKEERTITIDSGENTSWVESGATVGEEYYWVSQKSRNEGFPA GICPTVGVGGHESGGGVGTMVRKYGEAADNVIDARIIDVNGRIEDRKSMGEDEFW AIRGGGGASFGVIVAWKVNEVYVPEKSFGF (SEQ ID NO: 84).

[0345] In some embodiments, the protein comprises an amino acid sequence with at least 84%, at least 87%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 84, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 84% to 97%, 86% to 99%, 85% to 98%, or 84% to 100% homology or identity to SEQ ID NO: 84. Each possibility represents a separate embodiment of the invention.

[0346] In some embodiments, the protein comprises or consists of the amino acid sequence: MELYISTRFILCFLVVLMLMFSSTYSDPLEDKFLRCLSQNSNATNSDNVFTQENTQ YSSVLESTIINLRFATSTTPKPLAIITPLSCSHVQSAVLCAKKVGIRIRIRSGGHDYAG LSYTSSENAPFVVLDLKQLQNVTVESSKKTAWVESGATIGQLYYWVSQKSKNLGF PAGTCATIGVGGHLSGGGFGTLVRKYGLSADNVIDAKIVDVNGRLLDRKSMGEDL FWAIRGGGGGSFGVVVAWKVNLVHVPEKVTAFTIVRTLEQGGSDIFNKWQHIGH KLTKDLFIRVIIQPISVSNGNRTVQVIFNSMYLGTVDKLMKTVNSSFPELGLQEKDC TEMSWIQSVLYFAGYPIEGSMDVLKDRKPDTRNYFDNKSDHVKEPIPKERLEDLW KWCMEVDFPILIMEPLGGKMNEIDTTRIPYPYRKGYSYMIQYVEAWDNIGDSEKHI SWLRQMYENMTPYVSKNPRSAYVNYRDLDLGKNDNAKNTSYLEAMKWGSKYF GDNFKRLAMVKGVVDPDNFFFHEQSIPPLKV (SEQ ID NO: 85).

[0347] In some embodiments, the protein comprises an amino acid sequence with at least 72%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 85, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 72% to 99%, 74% to 98%, 78% to 99%, or 72% to 100% homology or identity to SEQ ID NO: 85. Each possibility represents a separate embodiment of the invention.

[0348] In some embodiments, the protein comprises or consists of the amino acid sequence: MGLNICTRFIPCLVVVLMFLFTSTYSATPEDKFLQCISQKLNITNSDEVFTQSNTRYS SVLESTIVNLRFATSTTPKPFAIITPLSYSHVQSAVVCAKKAGIRIRIRSGGHDYVGL SYTSSDNVPFVVLDLKQLQNVTVEYSKKTAWVESGATIGQLYYWVSQKSKNLGF PGGTCATIGVGGHLSGGGFGTLVRKYGLSADNVIDAKIVDVNGRLLDRKSMGEDL FWAIRGGGGGSFGVVVAWMVNLVHVPEKVTAFTIVRTLEQGGSDLFNKWQHVG PKLTKDLFISVIIQPISVWNGNGTVQVIFNSMYLGTVDKLMKTVNSSFPELGLQAK DCTEMSWIQSVLYFAGYPIEGSMDVLKDRKPQTRRYFNNKSDHVKEPIPKERLED LWKWCMEGDFPILLMDPLGGKMNEIDTTRIPYPYRNGYSYMIQYVETWENIGDSE KRISWMRQMYENMTPYVSKNPRSAYVNYRDLDLGKNDNAKNTSYLEAMKWGS KYFGDNFKRLAMVKGVVDPDNFFFHEQSIPPLKV (SEQ ID NO: 86).

[0349] In some embodiments, the protein comprises an amino acid sequence with at least 69%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 86, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 69% to 99%, 70% to 98%, 75% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 86. Each possibility represents a separate embodiment of the invention.

[0350] In some embodiments, the protein comprises or consists of the amino acid sequence: MGEDLFWAIRGGGGGSFGVVVAWMVNLVHVPEKVTAFTIVRTLEQGGSDLFNK WQHVGPKLTKDLFISVIIQPISVWNGNGTVQVIFNSMYLGTVDKLMKTVNSSFPEL GLQAKDCTEMSWIQSVLYFAGYPIEGSMDVLKDRKPQTRRYFNNKSDHVKEPIPK ERLEDLWKWCMEGDFPILLMDPLGGKMNEIDTTRIPYPYRNGYSYMIQYVETWE NIGDSEKRISWMRQMYENMTPYVSKNPRSAYVNYRDLDLGKNDNAKNTSYLEA MKWGSKYFGDNFKRLAMVKGVVDPDNFFFHEQSIPPLKV (SEQ ID NO: 87).

[0351] In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 87, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 75% to 99%, 74% to 98%, 78% to 99%, or 71% to 100% homology or identity to SEQ ID NO: 87. Each possibility represents a separate embodiment of the invention.

[0352] In some embodiments, the protein comprises or consists of the amino acid sequence: MELKLFTCKLVTIILALSLSFFTSTSSSDFLDCISQKNLSNIIFTPNDTSYSTILQFTIP N LRFNTPKTTKPLAIITPTTYSHVQSTIICSVQFKHHVRIRSGGHDYEGLSYTSFNNTP FILLDLNQLRSVTVDLDSNTTWVESGATLGELLYWVSRKSNILGIPTGECTSVGVG GQLSGGGFGNMARKYGLFSDNAVDALIIDVNGRILDRDSMGEDLFWAIRGGGGG NFGVVLSWKINLVYVPPKVTVFTVSKMLDENGTKIVHKWQYIAHNITQDLFINLIV SPVTVSNTTILAVTINSLFLGMKNELVATMDVIFPELGLQEKDCIEMSWIESVVYHS VYLRGQSVDALIERRPWPKSYNKYKSDYVKKPMSEKALEKLWKWCLEENLILAIE PHGGKMSEIDESSTPYPHRKGNLYIIQYVMQWDEGYNTTQKHVASIRRVYKKMAP FVSKNPREAYVNFRDLDLGTNGNACGTSGASYVQALRWGKKYFKGNFKRLAIVK GRVDPTNFFCNEQSIPPYSY (SEQ ID NO: 88).

[0353] In some embodiments, the protein comprises an amino acid sequence with at least 74%, at least 79%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 88, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 74% to 99%, 78% to 98%, 81% to 99%, or 74% to 100% homology or identity to SEQ ID NO: 88. Each possibility represents a separate embodiment of the invention.

[0354] In some embodiments, the protein comprises or consists of the amino acid sequence: MTNSELVFIPSPGAGHLPPTVELAKLLLHREPQLSVTIIIMNLPHETKPTTETRMSTP RLRFIDIPKDESTKDLISRHTFISAFLEHQKPHVRNIVRSITESDSVRLVGFVVDMFCI AMMDVANELGAPTYLYFTSSAASLGLMFCLQAKRDDEEFDVTELKDKDSELSIPC YTNPLPAKLLPSVLFDKRGGSKTFIDLARKYRESRGIVVNTFQELESYAIEYLASSN ANVPPVFPVGAILNQEKKVNDDKTEEIMTWLNEQPESSVVFLCFGSMGSFGEDQIK EIALAIEESGQRFLWSLRRPPSNENKYPKEYENFGEVLPEGFLERTSSVGKVIGWAP QMAVLSHSSVGGFVSHCGWNSTLESIWCGVPVAAWPLYAEQQLNAFKLVVELGL AVEIKIDYRSENEIILTSKEIESGIRRLMNDEELRMKVKEMKGNSRFAVSEGGSSYV SIRRFIDLVMTKE (SEQ ID NO: 102).

[0355] In some embodiments, the protein comprises an amino acid sequence with at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 102, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 75% to 99%, 76% to 98%, or 75% to 100% homology or identity to SEQ ID NO: 102. Each possibility represents a separate embodiment of the invention.

[0356] In some embodiments, the protein comprises or consists of the amino acid sequence: MPTSELVFIPSPGVGHLSPTIELVNQLLHRDQRLSVTIIVMKFSLESKHDTETPTSTP RLRFIDIPYDESAMALINPNTFLSAFVEHNKPHVRNIVRDISESNSVRLAGFVVDMF CVAMTDVVNEFEIPTYIYFTSTANLLGLMFYLQAKRDDEGFDVTVLKDSESEFLSV PSYVNPVPAKVLPDAVLDKNGGSQMCLDLAKGFRESKGIIVNTFQELERRGIEHLL SSNMNLPPVFPVGPILNLRNAPNDGKTADIMTWLNDHPENSVVFLCFGSMGSFEK EQVKEIAIAIEQSGQRFLWSLRRPTSLEKFEFPKDYENPEEVLPKGFLERTKGVGKV IGWAPQMAVLSHPSVGGFVSHCGWNSTLESIWCGVPIAAWPLYAEQKINAFQLVV EMGMAAEIRIDYRTNTRPGGGKEMMVMAEEIESGIRKLMSDDEMRKKVKGMKD KSRAAVLEGGSSHTSIGILIENLVSITI (SEQ ID NO: 103).

[0357] In some embodiments, the protein comprises an amino acid sequence with at least 76%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 103, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 76% to 99%, 80% to 98%, or 76% to 100% homology or identity to SEQ ID NO: 103. Each possibility represents a separate embodiment of the invention.

[0358] In some embodiments, the protein comprises or consists of the amino acid sequence: MVGLKCFWILQKGFRESKGIIVNTFQELERRGIEHLLSSNMDLPPVFPVGPILNLRN ARNDGKMADIMTWLNDQPENSVVFLCFGSRGSFKEEQVKEIAIAIEQSGQRFLWS LRRPTSIETFEFPKYYENPEEVLPKGFLERTKSVGKVIGWAPQMAVLSHPSVGGFV SHCGWNSTLESIWCGVPIAAWPLYAEQQTNAFQLVVEMGMAAEIRIDYRTNTPLV GGKDMMVTAEEIERGIRKLMSDDEMRKKVKDMKDKSRGAVLEGGSSHTSIGNLI DVLVSITI (SEQ ID NO: 104).

[0359] In some embodiments, the protein comprises an amino acid sequence with at least 77%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 104, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 77% to 99%, 79% to 98%, or 77% to 100% homology or identity to SEQ ID NO: 104. Each possibility represents a separate embodiment of the invention.

[0360] In some embodiments, the protein comprises or consists of the amino acid sequence: MATNNLHFLLIPHIGPGHTIPMIDMAKLLAKQPNVMVTIATTPLNITRYGHTLADAI NSFRFFEVPFPAVEAGLPEGCESTDKIPSMDLVPNFLTAIGMLEQKLEEHFHLLEPR PNCIISDKYMSWTGDFADKYRIPRIMFDGMSCFNELCYNNLYENKVFEGMHETEP FVVPGLPDKIELTRKQLPPEFNPSSIDTSEFRQRARDAEVRAYGVVINSFEELEQEY VNEYKKLRKGKVWCIGPLSLCNSDNSDKAQRGNIASVDEEKCLKWLDSHEADSV VYACFGSLVRVNTPQLIELGLGLEASNRPFIWVVRSVHREKEVEEWLVESGFEERI KDRGLIIRGWAPQVLILSHPSIGGFLTHCGWNSTLESVCAGVPMITWPQFAEQFINE KLIVQVLGIGVGVGVDSVVHVGEEDRSGVKVKRESVTKAIEKVMDDEIDGNERRR RSKEFGKIANNAIKEGGSSYLNLTLLIQDIMRYANADASS (SEQ ID NO: 105).

[0361] In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 105, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 105. Each possibility represents a separate embodiment of the invention.

[0362] In some embodiments, the protein comprises or consists of the amino acid sequence: MEKTPHIAIVPSPGMGHLIPLVEFAKKLKNHHNIHATFIIPNDGPLSISQKVFLDSLP NGLNYLILPPVNFDDLPQDTQIETRISLMVTRSLDSLREVFKSLVVEKNMVALFIDL FGTDAFDVAIEFGVSPYVFFPSTAMALSLFLYLPKLDQMVSCEYRELPEPVQIPGCI PVRGQDLVDPVQDRKNDAYKWVLHNAKKYSMAKGIAVNSFKELEGGALNALLE DEPGKPKVYPVGPLVQTGFSCDVDSIECLKWLDGQPCGSVLYISFGSGGTLSSSQL NELAMGLELSEQRFIWVVRSPNDQPNATYFDSHGHKDPLGFLPKGFLERTKGIGFV IPSWAPQAQILSHSATGGFLTHCGWNSILETVVHGVPVIAWPLYAEQKMNAVSLT EGIKMALRPTVGENGIVGRLEVARVVKSLLEGEEGKAIRSRVRDLKDAAANVLSK DGSSTKTLDQLAVQLKKQELS (SEQ ID NO: 106).

[0363] In some embodiments, the protein comprises an amino acid sequence with at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 106, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 90% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 106. Each possibility represents a separate embodiment of the invention.

[0364] In some embodiments, the protein comprises or consists of the amino acid sequence: MTQKQMQMQPHFLLVTYPAQGHINPSLQFAERLIRLGVKVTFTTTVSAYRRMSKA GNISEFLNFAAFSDGFDDGFNFETDDHGLFLTQLRSRGKDSLKETILSNAKNGTPIS CLVYTLLLPWAPEVARGLNVPSAFLWIQPASVLRLYYYYFNGYNELIGDDCNEPS WSIQLPGLPLLKS (SEQ ID NO: 107).

[0365] In some embodiments, the protein comprises an amino acid sequence with at least 77%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 107, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 77% to 100%, 79% to 100%, 80% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 107. Each possibility represents a separate embodiment of the invention.

[0366] In some embodiments, the protein comprises or consists of the amino acid sequence: MTKIQQQPHFLLVTYPAQGHINPSLRFAERLIRLGVKVTFTITVSAYRRMSKAGHIS EFLNFAVFSDGFDDGFNSKTDDYGLFLTQFRSRGKDSLKETILSNAKNGTPVSCLV YTLLLPWAPEVARGLNVPSAFLWIQPASVLRLYYYYFNGYNELIGDDCNEPSWSIQ LPGLPLLKSRDLPSFCLPSNPYADVLTLVKEHLDVLDLEEKPKILVNSFDELEREAL NEIDGKLKMVAVGPLIPSAFFGWTGCI (SEQ ID NO: 108).

[0367] In some embodiments, the protein comprises an amino acid sequence with at least 73%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 108, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 73% to 100%, 77% to 100%, 85% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 108. Each possibility represents a separate embodiment of the invention.

[0368] In some embodiments, the protein comprises or consists of the amino acid sequence: MGSWRNSRTTSTKFLWLILPLMVVTVIIGVKKSNYGSKYNYPWVWSSVINSYSSS AVKEDVTVVAEGPVESFGLRSTVVNGGGVVAEGPSEDFGFNSSYPPLAMEDEMD VELPAIAKEDDLNATLSGPDLFVSANQTGGLHVDIGINSKYTSLDKLEARLGQVRA AIKEAESGNRTYDPDYVPEGPMYWHAASFHRSYLEMEKQFKVFVYEEGEPPIFHN GPCKNIYAMEGNFIYHMETTKFRTKNPEKAHTFFLPMSAAMMVRFIFERDPNVDH WRPMKQTIKDYVDLVGGKYPFWNRSLGADHFTVACHDWVSKVFYPIIFMLLLVFI FRMSTGC (SEQ ID NO: 109).

[0369] In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 109, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 100%, 85% to 100%, 87% to 100%, or 91% to 100% homology or identity to SEQ ID NO: 109. Each possibility represents a separate embodiment of the invention.

[0370] In some embodiments, the protein comprises or consists of the amino acid sequence: MSTVEVAKLLVNRDHRLFITFLIIQPPSSGSGSAITTYIESLAEKAMDRISFIELPQDK IPPPRYPKSLPTAESKAHPLIFMIEFIKCHCKYVRNIVSDMISQPSSGRVAGLVIDML CFSMMDVANEFNIPTYVFVTSNAAFLGFYLYVQILSNDQNQDVVELSKSDTEISVP GFVKPVPTKVFWTVVRTKEGLDFVLSSAQKLRQAKAIMVNTFLELETHAIKSLSD DTSIPPVYPVGPILNLEGGAGKTFDNDISRWLDSQPPSSVVFLCFGSHGCFDEIQVK EIAHALEQSGHRFLWSLRRPPSDQTLKVPGDYEDPGVVLPEGFLERTAGRGKVIG WAPQVMVLAHRAVGGFVSHCGWNSLLESLWFGVPTATWPIYAEQQMNAFEMV VELGLAVEITLDYRNDMDMFIVTAQEIESGIRKVMEDNEVRTKVKERSEKSRAAV AEGGSSYASVGHLIKEFTGNIS (SEQ ID NO: 110).

[0371] In some embodiments, the protein comprises an amino acid sequence with at least 74%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 110, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 74% to 100%, 79% to 100%, 85% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 110. Each possibility represents a separate embodiment of the invention.

[0372] In some embodiments, the protein comprises or consists of the amino acid sequence: MSSFINFVESTTQLQPQFEQLIQTLLPITAIISDGFLMWTQDSAEKFNIPRLVFYGTNI FFMTMCNIMAQFKPHAAVNSDDEAFDVPGFTRFKLTANDFEPPFNEVEPKGSMLD FLLEQQKAMVRSHGLVVNSFYEIEHEFNVYWNQNYGPKAWLMGPFCVAKPYAS NVMDSEISTKVVKKSAWIQWLDRKLAANEPVLYISFGTQAEASMEHLHEVAIGLE RSNVSFIWVVKAKQMQLIGAGFEERVKGRGKVVTEWVDQMEILKHEIVSGFLSHC GWNSLLESMCVGVPVLAMPLMADQLLNARLVVEEIGMGLRLWPRGMVARGIVG AEEVEKMVVELMEGEGGRRVRKRVIEVREMAYGAMKEGGSSSRTLDSLIDHVCE AFHKTV (SEQ ID NO: 111).

[0373] In some embodiments, the protein comprises an amino acid sequence with at least 76%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 111, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 76% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 111. Each possibility represents a separate embodiment of the invention.

[0374] In some embodiments, the protein comprises or consists of the amino acid sequence: MGSLKKGAHILIFPFPAQGHMLPLLDLTHHLATNGLTITILVTPKNLPILNPLLSSSP NIQPLVFPFPPHPRLPPHVENVKDIGNHANVPITNSLAKLQDQIIQWFNSHHNPPVAI ISDFFLGWTQHLANKLGIPRVGFFSSGAYLTAVLDYVCHNIKTVRSQEETVFHDLP NSPCFKFEHLPGLAQIYKESDPEWELVLDGHIANGLSWGWIVNTFDGLESRYMEY LTKKMGVGRVFGVGPVNLLNGSDPMTRGKSESGSDSGVLNWLDGKPDGSVLYV CFGSQKFLTNDQMEGLSIGLEQSGVHYVWVVKDEQGDAIRSGSGRGLVVTGWAP QVSILGHGAVGGFLSHCGWNSVLEAIVNGVMILAWPMEADQFVNAKLLVDDHGI GVWVCEGPNTVPDSTELARKIGESMSTDKSEKVKAKEMKNKANEAVKEGGSSSM ELSRLVKELSNFETNGP (SEQ ID NO: 112).

[0375] In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 112, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 100%, 85% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 112. Each possibility represents a separate embodiment of the invention.

[0376] In some embodiments, the protein comprises or consists of the amino acid sequence: MDTQTQVKKQKLETMEHKTSSAEIFVLPFFGTGHINPAMELCRNISSHNYKTTLIIP SHLSSSIPSPFSSTLLHVAEIPFTASDPEPGSGRGNPLDAQNKQMGEGIKAFMSARSD GSKLPTCVVIDVMMNWSKEIFVDYQIPIVSFFTSGATNTAMGYGRWKAKIGDLKP GETRVIPGLPTEMAVTFADLNQGPRGRGPRPDGSRPDGPRSGPPGGMRSGPPHGM RGGGRGGRGGGRPGPDAKPRWVDEVDGSVALLINTCDNLERVFIDYIAEETKIPV YGVGPLLPEKYWKSAGSLLRDHEMRSNHKANYSEDEVFQWLESKPVGSVIYISFG SEVGPTIDEYKELAGSLEGSNQNFIWVIQPGSGITGMPRSFLGPVNTDSEEEEEGYY PEGLDVKVGNRGLIITGWAPQLLILSHPSTGGFLSHCGWNSTVEAIGRGVPILGWPL RGDQFDNAKLVANHLKIGFAMSSVASEGGRPGKFNKETITAGIEKLMNDEDVHKQ AKKLSKEFESGFPVSSVKALGAFVESISQKAT (SEQ ID NO: 113).

[0377] In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 113, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 100%, 77% to 100%, 85% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 113. Each possibility represents a separate embodiment of the invention.

[0378] In some embodiments, the protein comprises or consists of the amino acid sequence: MSLVTNNPHLLVYPLPTSGHIIPLLDLTDLLLRRGLTITVVISTTDLTLLDTLLSSHPT SLHKLYFPDPEIGPSSHPVIARIIATQKLFDPIVKWFESHPSPPVAIISDFFLGWTNEL ASREGIRRVVFSPSGAEGHSIEQSEWRDVAEINAKNVDGNGNYSISFTDIPNSPEFH WWQESQEERVHREGDPDFEFFRNGMEANTKSWGIVYNTFERIEKVYIDHVKKQIG HDRVWAIGPEEPEEHGPVGSTARGGSSVVPPHDEETWEDKKPHDSVVYICFGSRE TESEKQMSAEASAEEESNVDFIECVKASGSSFIPSGFEDRVVGRGFVIKGWAPQEAI ERHRAVGSFVTHCGWNSTEEGVSSGVMMETWPMGADQYANAKEEVDQEGVGK RVCEGGPESVPDSTEEAREEEESESGDTSERVKVKEESREANTAVKEGTSIRDENM FVNEESEE (SEQ ID NO: 114).

[0379] In some embodiments, the protein comprises an amino acid sequence with at least 78%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 114, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 78% to 100%, 85% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 114. Each possibility represents a separate embodiment of the invention.

[0380] In some embodiments, the protein comprises or consists of the amino acid sequence: MATQVKTEEKHLKVEIINKTYVKPETPLGRKECQLVTFDLPYIAFYYNQKLIIYKG GVEEFEDTVEKLKDGLKVVLGEFHQLAGKLDKDDDGVFKVVYDDDMDGVEVLS AVAEDTATADLMDEEGTIKLKELVPYNSVLNIEGLHRPLLSIQITKLKDGLVLGCA FNHAILDGTSTWHFMSSWAQICSGSKSISAAPFLDRTQARNTRVKLDLTPPAQTNG NSNGDTNGDASATKPPAPAPLREKIFKFSESAIDKIKAKINANPPEGSTKPFSTFQSL STHIWHAVTRARNLKPEDYTVFTVFADCRKRVDPPMPDSYFGNLIQAIFTVTAAGL LQANPPEFAASMIQKAIDMHDAKAIEARNKEWESNPIIFQYKDAGVNCVAVGSSP RFKVYDVDFGFGKPESVRSGANNRFDGMVYLYQGKSGGRSIDVEISLDASAMGN LEKDKEFLIQE (SEQ ID NO: 130).

[0381] In some embodiments, the protein comprises an amino acid sequence with at least 87%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 130, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 87% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 130. Each possibility represents a separate embodiment of the invention.

[0382] In some embodiments, the protein comprises or consists of the amino acid sequence: MASLPLLTVLEQSHVSPPPATVVDKSLSLTFFDFLWLTQPPIHNLFFYEFSIDETQFV ETIVPSEKNSESITEQHFYPFAGNEIEFPDNKRPEIRYVEGDYVMVTFAKSSEDFNEE VGNHPRDCDQFYDEIPPEGESVKTSEFRKIPEFSVQVTFFPQKGVSIGMTNHHSEGD ASTRFCFENAWTSISRSSSDESFEANGTKPFYDRVISNPKEDQSYEKFSKIDTEYEK YQPLSLSRPSNKLRGTFILTRKILNELKKSVSIKLPTLSYVSSFTVACGYIWSCIAKSR NDDEQEFGFTIDCRAREDPPVPSTYFGNCVGGCMAMAKTTEETEDDGFITAAKEE GESEHKTETESGGIVKDIEVFEDEFKDGEPTTMIGVAGTPKEKFYETDFGWGNPKK VETISIDYNMSISMNACRESKDDEEIGVCEMNTEMEAFVREFDEGEESYV (SEQ ID NO: 131).

[0383] In some embodiments, the protein comprises an amino acid sequence with at least 72%, at least 80%, at least 89%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 131, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 72% to 100%, 80% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 131. Each possibility represents a separate embodiment of the invention.

[0384] In some embodiments, the protein comprises or consists of the amino acid sequence: MGSENVHKIMKINITKSSFVQPSKPTVLPTNHIWTSNLDLVVGRIHILTVYFYRPNG ASNFFDPIVMKKALADVLVSFYPMAGRISKDDNGRVVINCNDEGVLFVEAESDST LDDFGEFTPSPELRQLTPTIDYSGDISTYPLFFAQVTHFKCGGVGFGCGVFHTLADG LSSIHFINTWSDMARGLSIAIPPFTDRTLLRAREPPTPTFDHVEYHLPPSMKTTSQTN KSRKPSTAMLKLTLDQLNALKAAAKNEGGNTNYSTYEILAAHLWRCACKARGLP DDQLTKLYVATDGRSRLSPQLPPGYLGNVVFTATPVAKSADLTTQPLSNAASLIRT TLTKMDNDYLRSAIDYLEVQPDLSALIRGPSYFASPNLNINTWTRLPVHDADFGW GRPVFMGPAVILYEGTIYVLPSPNNDRSMSLAVCLDADEQPSFEKFLYDF (SEQ ID NO: 132).

[0385] In some embodiments, the protein comprises an amino acid sequence with at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 132, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 90% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 132. Each possibility represents a separate embodiment of the invention.

[0386] In some embodiments, the protein comprises or consists of the amino acid sequence: MPSSSSSPSSTADSVTIISKCTVYPHMKNSTPESLQLSVSDLPMLSCQYIQKGVLLSQ PPPNHTNNIISHLKLSLSKTLSHFPPLAGRLSTDSHGHVSIICNDSGVEFVHSTANHL HTHQILPLNSDVHPCFKTFFAFDKTLSYAGHHQPIAAVQVTELADGLFIGCTVNHA VVDGTSFWNFFNTFAEITKGCQKVTNLPDFSRENVFISPVVLPLPSGGPSATFSGDE PLRERIIHFSRDAILKMKFRANNPLWRQPQNSDLDDTEIYGKVCNDINGKVNGAFK PKSEISSFQSLCGQLWRAVTRARKFNDPIKTTTFRMAVNCRHRLDPKVDKLYFGN LIQSIPTVASVGELLSHDLSWAANELHQNVVAHDNATVRRGVKDWENNPKLFPLG NFDGAMITMGSSPRFPMYNNDFGWGRPMAVRSGKANKFDGKISAFPGRDGDGSV DLEVVLAPETMACLERDHEFMQYVS (SEQ ID NO: 133).

[0387] In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 133, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 133. Each possibility represents a separate embodiment of the invention.

[0388] In some embodiments, the protein comprises or consists of the amino acid sequence: MKWFFITHKATQRCLNSKQFHLHGGSNFVSGNRCFLASHSMERPKFMLIPYYPYQI RSLNSSHRYSSTSPSGSPHSFLNGTKNENYTKKVDLEIISREIIKPASPTPHHLRNFNL SLLDQIVFDCYTPVILFIPNSNKATVTDVMIKRLKHLKETLSRILSQFYPFAGEVKDR LHIECNDKGVNYIEAQINETLEEFLCHPDNEKARELMPESPHVQESAIGNYAMGIQI NIFSCGGIGLSMSMAHKIMDFYTYTIFMKAWAAAVRGSPDTIISPSFVASEVFPNDP SQEDSIPIELKSSNLLSTKRFEFDPTALALLKGQVVASGSPPQRGPSRMEATTAVIW KAAAKAASTVRRFDPKSPHALALPVNIRKRASPALPDNSIGNIVMRGIAICFPESQP DLPTLMGKVRESIAKLNSDYIESLKGEKGHETVNKMLKELKLRTNMTKVGGKFV ASCIFNSGIYELDFGWGKPIWFYVVNPGSDSCVVLTDTLKGGGVEATITLPPDEMEI FERDHELLSYTTINPSPLRFLDH (SEQ ID NO: 134).

[0389] In some embodiments, the protein comprises an amino acid sequence with at least 59%, at least 65%, at least 75%, at least 85%, at least 90%, or at least 99% homology or identity to SEQ ID NO: 134, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 59% to 100%, 70% to 100%, 80% to 100%, 90% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 134. Each possibility represents a separate embodiment of the invention.

[0390] In some embodiments, the protein comprises or consists of the amino acid sequence: MEVPDQFHLNILEQCHVSPSPNSIIPSFSLPLTFLDIPWLFYPSNQTLFFFPEPPPKTTI I TTEKQSESETEHHFHPEAGNESEPSPPAEPHIVYTKNDSIAETIAQTNTNIHHESCNH PRSVKNEYSEEPKEPSPSMSRETHVGEVIPEETIQITVFADEGYSIGVTMQHAAVDE RTFDQFMKCWASVCTSEEKNDSEFTFKSTPWYDRSVIIDPKSEKTTFEKQWWNRS NSENESHDQENDDHDEVEATFVESSEDINMIKNHIEAKCKMINEDPPEHESPYVSA CAYEWKCEIKIQETHDSIKGGPEYEGFNAGGITREGYDIPSTYFGNCIAFGRCKAFE SEEEGDNGIVFAAKSIGKEIKREDKDVEGGANKWISDWDEETIREEGSPKVDSYGM DFGWGKVEKVEKISSISNHGRVNVISESGCKDFKGGIEIGVVESVAKMNVFTSEFH GGEMEFAY (SEQ ID NO: 135).

[0391] In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 80%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 135, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 100%, 80% to 100%, 87% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 135. Each possibility represents a separate embodiment of the invention.

[0392] In some embodiments, the protein comprises or consists of the amino acid sequence: MKNKNPTSVIREALAKVLVFYYPFAGRLKEGPARKLMVDCSGEGVLFIEAEADVT LKQFGDALQPPFPCLEELLYDVPGSTGILDTPLLLIQVTRLLCGGFIFALRLNHTMS DAAGLVQFMTGLGEMAQGASRPSTLPVWQRELLFARDPPRVTCTHHEYTEVEDT NGTIIPLDDMAHKSFFFGPSEISALRRFVPSYLKKCSTFEVLTACLWRCRTIALQPDP EEEMRMICIVNARGKFNPPLLPKGYYGNGFAIPVAISTAGDLSSKPLGHALELVMK AKSNVTEEYMRSVADLMVIKGRPHYTVVRSYLVSDVTHAGFDVVDFGWGKASY GGPAKGGVGAIPGVVTFFIPFTNHKGESGIVLPICLPSAAMDKFVEELNKMLVPDN NEQVLREHKLLVLARL (SEQ ID NO: 136).

[0393] In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 136, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 92% to 100%, 97% to 100%, or 99% to 100% homology or identity to SEQ ID NO: 136. Each possibility represents a separate embodiment of the invention.

[0394] In some embodiments, the protein comprises or consists of the amino acid sequence: MAQIDTPLTFKVRRHAPELIAPAKPTPRELKPLSDIDDQEGLRFHIPVIQFYRSDPKM KNKNPASVIREALAKVLVFYYPFAGRLKEGPARKLMVDCSGEGVLFIEAEADVTL KQFGDALQPPFPCLEELLYDVPGSTGVLDTPLLLIQVTRLLCGGFIFALRLNHTMSD APGLVQFMTGLGEMAQGASRPSTLPVWQRELLLARDPPRVTCTHHEYTEVEDTK GTIIPLDDMAHKSFFFGPSEISALRRFVPSYLKKCSTFEVLTACLWRCRTIALQPDPE EEMRIICIVNARGKFNPPLPKGYYGNGFAFPVAISTAGDLSSKPLGHALELVMKAK SDVTEEYMRSIADLMVIKGRPHFTVVRSYLVSDVTHAGFDVVDFGWGKAAYGGP AKGGVGAIPGVASFYIPFTNHKGESGIVLPICLPSAAMDKFVEELNKMLVPDNNEQ VLREHKLLVLARL (SEQ ID NO: 137).

[0395] In some embodiments, the protein comprises an amino acid sequence with at least 91%, at least 93%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 137, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 91% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 137. Each possibility represents a separate embodiment of the invention.

[0396] In some embodiments, the protein comprises or consists of the amino acid sequence: MEIQVINYSSKLVKPLTPTPTANRYYNISFTDELVPTIYVPLILYYATPKNPNGDHFE NICDRLEESLSKTLSDFYPLAARFIRKLSLIDCNDQGVLFVLGNVNIRLSDVTGLGL TFKTSVLNDFLPCEIGGADEVDDPMLCVKVTTFECGGFAIGMCFSHRLSDMGTMC NFINNWAARTIGEYDNEKHTPIFNSPLYFPQRGLPELDLKVPRSSIGVKNAARMFHF NGKAISSMREVFGVDENGSRRLSKVQLVVALLWKAFVRIDDVNDGQSKASFLIQP VGLRDKVVPPLPSNSFGNFWGLATSQLGPGEGHKIGFQEYFYILRESIKKRARDCA KILTHGEEGYGVVIDPYLESNQKIADNGTNFYLFTCWCKFSFYEADFGCGKPIWAS TGKFPVQNLVIMMDDNEGDGVEAWVHLDDKRMNELEQDPDVKLYACNLA (SEQ ID NO: 138).

[0397] In some embodiments, the protein comprises an amino acid sequence with at least 73%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 138, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 73% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 138. Each possibility represents a separate embodiment of the invention.

[0398] In some embodiments, the protein comprises or consists of the amino acid sequence: MKLAVKESVIVKPSKTTPCQQIWTSNLDLVVGRIHILTVYLYRPNGSSNFFDSMVL KKALADVLVSFFPVAGRLDKDGDGRVVIDCNGEGVLFVEAEADCCIDDFGEITPSP ELRRLVPTVDYSGDMSSYPLFITQVTRFKCGGVSLGCGLHHTLSDGLSALHFINTW SDVARGLSVAIPPFIDRSLLRARDPPSPVFDHIEYHPPPSLITPLQNQKNASHSRSAST LILRLTLHQINNLKSKAKGDGSMYHSTYEILAAHLWRCACKARGLANDQPTKLYV ATDGRSRLIPPLPPGYLGNVVFTATPVAKSGDFESESLAETARRIRSELGKMNDEYL RSAIDYLESVSDISTLVRGPTYFASPNLNVNSWTRLPIYESDFGWGRPIFMGPASILY EGTIYIIPSPSGDRSVSLAVCLDPDHMALFKECLYVF (SEQ ID NO: 139).

[0399] In some embodiments, the protein comprises an amino acid sequence with at least 83%, at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 139, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 83% to 100%, 88% to 100%, 94% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 139. Each possibility represents a separate embodiment of the invention.

[0400] In some embodiments, the protein comprises or consists of the amino acid sequence: MKLAVKESVIVKPSKTTPCQQIRTSNLDLVAGRIHILVVFFYRPNGSSNFFDSLVLK KALADVLVPFFPVAGRFSEDGDGRVVIDCNGEGVLFVESEADCCIDDFGEITLSPEL QQLVPTVDYSGDMSSYPLFIAQVTRFKCGGVSLGWGLHHTLLDGLSALHFVNTW GDVARGLSVAIQPFIDRSLLRARDPPTPVFDHIEYHPPPSLITPLQNQKNASHSRSAS TLILQLTPDQIKNLKSKAKGDGSMYHSTYEILAAHLWRCACKARGLANDQPTKLY VAANGRSRLIPPLPPGYLGNVVFNATHVAKSGDFESESLAETARRIHCELGKMNDE YFRSAIDYLESVDDISTLVKGPTYFASPNLNVYSWIGIPIYACDFGWGQPIFMRPAS FLYDGSIYIIPSPSGDRSVLLAVCLDPDHMDLFKECLYAF (SEQ ID NO: 140).

[0401] In some embodiments, the protein comprises an amino acid sequence with at least 76%, at least 84%, at least 92%, or at least 99% homology or identity to SEQ ID NO: 140, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 76% to 100%, 83% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 140. Each possibility represents a separate embodiment of the invention.

[0402] In some embodiments, the protein comprises or consists of the amino acid sequence: MVMISKLLRLGRRKLHTIVSRDTIRPSSPTPSHSKTYNLSLLDQIAVNSYVPIVAFYP SSNVCRSSDDKTLELKNSLSKILTHYYPFAGRMKKNRPTVVDCNDEGVEFVEARN TNSLSDFLQQSEHEDLDQLFPDDCVWFKQNLKGSINDANNSSVCPLSIQVNHFACG GVAVATSLRHKIGDGSSALNFIKHWAAVTSHSRAGNHQIDATSPIINPHFISYPTRT FKLPDRSPYIPPSDVVSKSFVFPNTNIKDLQAKVVTMTMGSRQPIVNPTRADVVSW LLHKCVVAAATKRISGNFKESCVISPLNLRNKLEEPLPETSIGNIFYLITFPISNNHGD LMPDDFISQLRLGIRKFQNIRNLETALRTVEEMISETFILGTAESMDTSYVYSSIRGF PMYDIDFGWGKPVKVTVGGALKNLSILMDTPDVNGIEALVSLDKQDMKILLNDPE LLAFCL (SEQ ID NO: 141).

[0403] In some embodiments, the protein comprises an amino acid sequence with at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% homology or identity to SEQ ID NO: 141, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 60% to 100%, 70% to 100%, 80% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 141. Each possibility represents a separate embodiment of the invention.

[0404] In some embodiments, the protein comprises or consists of the amino acid sequence: MSTSDKMKITIRESSMIKPSKPTPDQRIWNSNLDLVVGRIHILTLYFFRPNGSSDFFD SEVLKQSLADVLVSFFPMAGRLGLDGDGRVEINCNGEGVLFVEAEADCSIDDFGEI TPSPELRRLAPTVDYSGDISSYPLVITQVTHFKCGGVSLGCGLHHTLSDGLSSLHFIN TWSDVTRGLPVAIPPFVDRTVLRARDPPTVVFDHVEYHTPPSMTSSLDKDKPQSED VHVSTSMLRLTLDQINALKAKGKGDGIVYHSTYEILAAHLWRCACKARGLLNDQ MTKLYVATDGRSRLIPPLPPGYLGNVVFTATPIAKSGELQQEPLATTARKIHTELAK MDDKYLRSALDYLESQQDLSALIRGPAYFACPNLNINSWTRLPIYDADFGWGRPIF MGPASILYEGTIYIIPSPSGDRSVSLAVCLDPSHMPLFQKYLYEL (SEQ ID NO: 142).

[0405] In some embodiments, the protein comprises an amino acid sequence with at least 85%, at least 89%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 142, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 85% to 100%, 90% to 100%, 93% to 100%, or 96% to 100% homology or identity to SEQ ID NO: 142. Each possibility represents a separate embodiment of the invention.

[0406] In some embodiments, the protein comprises or consists of the amino acid sequence: MVNVEIISNEYIKPSSPTPPHLKIYNLSILDQLIPAPYAPIILYYPNQDHINDFEVHERL KLLKDSLSKTLTRFYPLAGTIKGDLSIDCNDIGAYFAVAHVNTRLDVFLNHPDLDLI NCFLPRGPYLNGSSEGSCVSNVQVNIFECCGIAISLCISHKILDGAALSTFLKAWAG TSYGSKEVVYPNMSAPSLFPAKDLWLKDSSMVMFGSLFKMGKCSTKRFVFDSSKL SFLKAKASLNGLKDPTRVEVVSALLWKCIMAASEENTGSWKPSLLSHVVNLRKRL VSTLSEDSIGNLIWLASAECRTNAQSRLSDLVEKVRDSVSKINSEFVKKIQGDKGTK VMEESLKSMKDCADYIGFTSWCKMGFYDVDFGWGKPVWVCGSVCEGSPVFMNF VILMDTKYGDGIEAWVSLDEHEMHILKHNPELLEYASIDPSPLQMNK (SEQ ID NO: 143).

[0407] In some embodiments, the protein comprises an amino acid sequence with at least 82%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 143, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 143. Each possibility represents a separate embodiment of the invention.

[0408] In some embodiments, the protein comprises or consists of the amino acid sequence: MGTIYQSPMIKSSTPKIIEDLKVIIHDTFTIFPPHETEKRSMFLSNIDQVLTFNVETVH FFAANPDFPPQVVAEKLKLALSKALVPYDFLAGRLKLNHESQRFEFDCNGAGARF VVGSSEFELGEIGDLVYPNPGFRQLVQKSYDNLELHEKPLCILQLTSFKCGGFALG VATNHATFDGLSFKTFLQNLGSLAADQPLAVDPCNDRHLLAARSPPKVQFDHPEL LKIPTGTDIPNPTVFDCPESQLDFKIFNLTSDDIAHLKTKAKDGPGSTNAKITGFNVV AAHVWRCKALSSGSEYDPERVSTVLYAVDIRSRLNLPLSLAGNAVLSAYASAKCK EIEEGPLSRLVEMVTEGTNRMTGEYARSVIDWGEVNKGFPNGEFLISSWWRLGFA DVEYPWGKPRYSCPVVYHRKDIILLFPDIVGADNNNEVNVLVALPGKEMEKFETL FHKFLA (SEQ ID NO: 144).

[0409] In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 144, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 90% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 144. Each possibility represents a separate embodiment of the invention.

[0410] In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 12-22, is an AAE.

[0411] In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 27-30, is a PKS.

[0412] In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 39-46, is a PKC.

[0413] In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 59-70, is a PT.

[0414] In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 80-88, is a CBCAS.

[0415] In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 102-114, is a UGT.

[0416] In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 130-144, is an AAT.
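
The SEQ ID ranges recited in paragraphs [0410] to [0416] can be read as a lookup table. The following Python sketch is offered only as an illustration of that reading; the names SEQ_ID_FAMILIES and family_of are hypothetical and do not appear in the specification, and the ranges are assumed to be inclusive at both ends.

```python
from typing import Optional

# Protein SEQ ID NO ranges per enzyme family, transcribed from
# paragraphs [0410]-[0416]; both endpoints are treated as inclusive
# (an assumption of this sketch).
SEQ_ID_FAMILIES = {
    "AAE": range(12, 23),      # SEQ ID Nos. 12-22
    "PKS": range(27, 31),      # SEQ ID Nos. 27-30
    "PKC": range(39, 47),      # SEQ ID Nos. 39-46
    "PT": range(59, 71),       # SEQ ID Nos. 59-70
    "CBCAS": range(80, 89),    # SEQ ID Nos. 80-88
    "UGT": range(102, 115),    # SEQ ID Nos. 102-114
    "AAT": range(130, 145),    # SEQ ID Nos. 130-144
}


def family_of(seq_id: int) -> Optional[str]:
    """Return the enzyme family for a protein SEQ ID NO, or None if unlisted."""
    for family, ids in SEQ_ID_FAMILIES.items():
        if seq_id in ids:
            return family
    return None
```

For example, family_of(142) yields "AAT" under these assumptions, while a SEQ ID NO outside the listed ranges yields None.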

[0417] The terms “homology” or “identity”, as used interchangeably herein, refer to sequence identity between two amino acid sequences or two nucleic acid sequences, with identity being a stricter comparison. The phrases “percent identity or homology” and “% identity or homology” refer to the percentage of sequence identity found in a comparison of two or more amino acid sequences or nucleic acid sequences. Two or more sequences can be anywhere from 0-100% identical, or any value there between. Identity can be determined by comparing a position in each sequence that can be aligned for purposes of comparison to a reference sequence. When a position in the compared sequence is occupied by the same nucleotide base or amino acid, then the molecules are identical at that position. The degree of identity of amino acid sequences is a function of the number of identical amino acids at positions shared by the amino acid sequences. A degree of identity between nucleic acid sequences is a function of the number of identical or matching nucleotides at positions shared by the nucleic acid sequences. A degree of homology of amino acid sequences is a function of the number of amino acids at positions shared by the polypeptide sequences.

[0418] The following is a non-limiting example for calculating homology or sequence identity between two sequences (the terms are used interchangeably herein). The sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The optimal alignment is determined as the best score using the GAP program in the GCG software package with a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frame shift gap penalty of 5. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percentage identity between the two sequences is a function of the number of identical positions shared by the sequences.

[0419] In some embodiments, % homology or identity as described herein is calculated or determined using the basic local alignment search tool (BLAST). In some embodiments, % homology or identity as described herein is calculated or determined using a BLOSUM62 scoring matrix.
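
The position-by-position comparison described above can be sketched in Python as follows. This is a minimal illustration, not the GAP or BLAST implementation: it assumes the two sequences have already been aligned by some external tool, that "-" marks a gap, and that the denominator is the number of aligned columns in which at least one sequence has a residue. The function name percent_identity is hypothetical, and other denominator conventions exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Assumption of this sketch: the denominator counts every aligned column
    except those that are gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap and res_b == gap:
            continue  # a column that is empty in both sequences is skipped
        compared += 1
        if res_a == res_b:
            identical += 1  # same residue or base at this position
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float) -> bool:
    """True if the percent identity is at least `minimum` (e.g. 85.0)."""
    return percent_identity(aligned_a, aligned_b) >= minimum
```

Under these assumptions, two ten-residue aligned fragments that differ at a single position give 90% identity and therefore satisfy an "at least 85%" threshold but not an "at least 92%" threshold.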

[0420] In some embodiments, the protein comprises or is characterized by acyl activating enzymatic activity.

[0421] In some embodiments, an acyl is selected from: C1-C8 alkyl chain, and alpha-unsaturated phenylalkyl carboxylic acid.

[0422] In some embodiments, an acyl is a C1 alkyl chain. In some embodiments, an acyl is a C2 alkyl chain. In some embodiments, an acyl is a C3 alkyl chain. In some embodiments, an acyl is a C4 alkyl chain. In some embodiments, an acyl is a C5 alkyl chain. In some embodiments, an acyl is a C6 alkyl chain. In some embodiments, an acyl is a C7 alkyl chain. In some embodiments, an acyl is a C8 alkyl chain.

[0423] In some embodiments, a C1-C8 alkyl chain is hexanoic acid. In some embodiments, an acyl is hexanoic acid.

[0424] In some embodiments, an alpha-unsaturated phenylalkyl carboxylic acid comprises cinnamic acid or a derivative thereof.

[0425] In some embodiments, a cinnamic acid derivative comprises a hydroxylated derivative of cinnamic acid.

[0426] In some embodiments, a hydroxylated derivative of cinnamic acid comprises or is coumaric acid.

[0427] In some embodiments, the protein comprises or is characterized by polyketide synthesizing activity, as described herein. In some embodiments, the protein is characterized by having an activity of polymerizing a diketide substrate into a polyketide.

[0428] In some embodiments, a diketide substrate is obtained by coupling of an acyl CoA starting unit.

[0429] In some embodiments, an acyl CoA starting unit is selected from: acetyl CoA, butyryl CoA, hexanoyl CoA, octanoyl CoA, cinnamoyl CoA, coumaroyl CoA, or any combination thereof.

[0430] In some embodiments, an acyl CoA is or comprises hexanoyl CoA, cinnamoyl CoA, or both.

[0431] In some embodiments, an acyl CoA is hexanoyl CoA.

[0432] In some embodiments, a polyketide comprises a tetraketide. In some embodiments, a polyketide comprises a linear polyketide. In some embodiments, a polyketide comprises a linear tetraketide.

[0433] In some embodiments, the protein comprises or is characterized by polyketide cyclization or cyclizing activity, as described herein. In some embodiments, the protein is characterized by having an activity of cyclizing a polyketide.

[0434] In some embodiments, polyketide cyclization comprises aldol cyclization, Claisen cyclization, or both.

[0435] In some embodiments, a polyketide comprises an acyl group, as described herein.

[0436] In some embodiments, the protein comprises or is characterized by prenyl transferring activity, as described herein. In some embodiments, the protein is characterized by being capable of transferring a prenyl group to a substrate molecule. In some embodiments, the protein is characterized by being capable of transferring an allylic prenyl group to an acceptor molecule. In some embodiments, the protein is a prenyl diphosphate synthase. In some embodiments, the protein is a trans-prenyltransferase. In some embodiments, the protein is a cis-prenyltransferase.

[0437] In some embodiments, the prenyl group is selected from: dimethylallyl diphosphate, geranyl diphosphate, farnesyl diphosphate, or geranylgeranyl diphosphate.

[0438] In some embodiments, the protein is characterized by being capable of synthesizing a compound represented by Formula I:

, wherein: (i) R1 is selected from: C1-C8 alkyl, an alpha-unsaturated phenylalkyl carboxylic acid, or an alpha-saturated phenylalkyl carboxylic acid; and R2 is OH; or (ii) R1 is OH and R2 is selected from: C1-C8 alkyl, an alpha-unsaturated phenylalkyl carboxylic acid, or an alpha-saturated phenylalkyl carboxylic acid.

[0439] In some embodiments, the compound is represented by a formula selected from:

, wherein R3 is C1-C8 alkyl, and wherein R4 is alpha-unsaturated phenylalkyl carboxylic acid.

[0440] In some embodiments, the compound is selected from the group:

[0441] In some embodiments, the compound is:

[0442] In some embodiments, the protein is characterized by cannabigerolic acid (CBGA) cyclization or cyclizing activity. In some embodiments, the cyclizing activity comprises cyclization of CBGA to CBCA. In some embodiments, the protein is characterized by being capable of cyclizing or cyclization of CBGA to CBCA. In some embodiments, the protein is characterized by being capable of synthesizing CBCA or being a CBCA synthase (CBCAS).

[0443] In some embodiments, the protein is characterized by being capable of transferring a glucuronic acid component of UDP-glucuronic acid to a cannabinoid or precursor thereof.

[0444] In some embodiments, the protein is characterized by being capable of transferring an acyl group from a donor molecule to the cannabinoid.

[0445] According to some embodiments, there is provided a transgenic cell comprising: (a) the DNA molecule disclosed herein; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the protein disclosed herein; or any combination thereof.

[0446] In some embodiments, the cell further comprises a nucleic acid sequence encoding at least one enzyme related to cannabinoidogenesis derived from Cannabis sativa. In some embodiments, the at least one enzyme related to cannabinoidogenesis derived from C. sativa is selected from: olivetol synthase (OLS), olivetolic acid cyclase (OAC), prenyltransferase 1 (PT1/GOT1), PT4/GOT4, or any combination thereof.

[0447] In some embodiments, the at least one enzyme related to cannabinoidogenesis derived from C. sativa is selected from: OLS, OAC, or both.

[0448] As used herein, the term "transgenic cell" refers to any cell that has undergone human manipulation on the genomic or gene level. In some embodiments, the transgenic cell has had an exogenous polynucleotide, such as the DNA molecule as disclosed herein, introduced into it. In some embodiments, a transgenic cell comprises a cell that has an artificial vector introduced into it. In some embodiments, a transgenic cell is a cell which has undergone genome mutation or modification. In some embodiments, a transgenic cell is a cell that has undergone CRISPR genome editing. In some embodiments, a transgenic cell is a cell that has undergone targeted mutation of at least one base pair of its genome. In some embodiments, the exogenous polynucleotide (e.g., the DNA molecule disclosed herein) or vector is stably integrated into the cell. In some embodiments, the transgenic cell expresses a polynucleotide of the invention. In some embodiments, the transgenic cell expresses a vector of the invention. In some embodiments, the transgenic cell expresses a protein of the invention. In some embodiments, the transgenic cell is a cell that is devoid of a polynucleotide of the invention that has been transformed or genetically modified to include the polynucleotide of the invention. In some embodiments, CRISPR technology is used to modify the genome of the cell, as described herein.

[0449] In some embodiments, the cell is a unicellular organism, a cell of a multicellular organism, or a cell in a culture.

[0450] In some embodiments, a unicellular organism comprises a fungus or a bacterium.

[0451] In some embodiments, the fungus is a yeast cell.

[0452] In some embodiments, the cell is an insect cell. In some embodiments, the cell comprises an insect cell line.

[0453] Types of insect cell lines suitable for transformation and/or heterologous expression are common and would be apparent to one of ordinary skill in the art. Non-limiting examples of such insect cell lines include, but are not limited to, Sf-9 cells, SR+ Schneider cells, S2 cells, and others.

[0454] According to some embodiments, there is provided an extract derived from a transgenic cell disclosed herein, or any fraction thereof.

[0455] In some embodiments, the extract comprises the DNA molecule disclosed herein, a protein as disclosed herein, or any combination thereof.

[0456] According to some embodiments, there is provided a homogenate, a lysate, or an extract derived from a transgenic cell disclosed herein, any combination thereof, or any fraction thereof.

[0457] Methods and/or means for extracting, lysing, homogenizing, fractionating, or any combination thereof, a cell or a culture of same, are common and would be apparent to one of ordinary skill in the art of cell biology and biochemistry. Non-limiting examples include, but are not limited to, pressure lysis (e.g., such as using a French press), enzymatic lysis, soluble-insoluble phase separation (such as for obtaining a supernatant and a pellet), detergent-based lysis, solvent (e.g., polar, or nonpolar solvent), liquid chromatography mass spectrometry, or others.

[0458] According to some embodiments, there is provided a transgenic plant, a transgenic plant tissue or a plant part. In some embodiments, there is provided a transgenic plant, or any portion, seed, tissue, or organ thereof, comprising at least one transgenic plant cell of the invention. In some embodiments, the transgenic plant, transgenic plant tissue or plant part, comprises: (a) the DNA molecule disclosed herein; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the protein of the invention; (e) the transgenic cell disclosed herein; or any combination thereof.

[0459] In some embodiments, the transgenic plant, transgenic plant tissue, or plant part consists of transgenic plant cells of the invention. In some embodiments, the transgenic plant, transgenic plant tissue, or plant part comprises at least: 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% transgenic cells of the invention, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the transgenic plant, transgenic plant tissue, or plant part comprises 20%-50%, 20%-60%, 20%-70%, 20%-80%, 20%-90%, or 20%-100% transgenic cells of the invention. Each possibility represents a separate embodiment of the invention.

[0460] In some embodiments, the transgenic plant, transgenic plant tissue, or plant part is or is derived from a Cannabis sativa plant. In some embodiments, the transgenic plant is a C. sativa plant.

[0461] In some embodiments, the transgenic plant, transgenic plant tissue, or plant part is or is derived from hemp. In some embodiments, C. sativa comprises or is hemp.

[0462] According to some embodiments, there is provided a composition comprising any one of the herein disclosed: (a) the DNA molecule of the invention; (b) artificial vector; (c) plasmid or agrobacterium; (d) protein of the invention; (e) transgenic cell; (f) extract; (g) transgenic plant tissue or plant part; and (h) any combination of (a) to (g), and an acceptable carrier.

[0463] As used herein, the term “carrier”, “excipient”, or “adjuvant” refers to any component of a composition, e.g., pharmaceutical or nutraceutical, that is not the active agent. As used herein, the term “pharmaceutically acceptable carrier” refers to non-toxic, inert solid, semi-solid, or liquid filler, diluent, encapsulating material, formulation auxiliary of any type, or simply a sterile aqueous medium, such as saline. Some examples of the materials that can serve as pharmaceutically acceptable carriers are sugars, such as lactose, glucose and sucrose, starches such as corn starch and potato starch, cellulose and its derivatives such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; powdered tragacanth; malt, gelatin, talc; excipients such as cocoa butter and suppository waxes; oils such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; glycols, such as propylene glycol, polyols such as glycerin, sorbitol, mannitol and polyethylene glycol; esters such as ethyl oleate and ethyl laurate, agar; buffering agents such as magnesium hydroxide and aluminum hydroxide; alginic acid; pyrogen-free water; isotonic saline, Ringer's solution; ethyl alcohol and phosphate buffer solutions, as well as other non-toxic compatible substances used in pharmaceutical formulations. Some nonlimiting examples of substances which can serve as a carrier herein include sugar, starch, cellulose and its derivatives, powdered tragacanth, malt, gelatin, talc, stearic acid, magnesium stearate, calcium sulfate, vegetable oils, polyols, alginic acid, pyrogen-free water, isotonic saline, phosphate buffer solutions, cocoa butter (suppository base), emulsifier (e.g. carbomer, hydroxypropyl cellulose, sodium lauryl sulfate) as well as other non-toxic pharmaceutically compatible substances used in other pharmaceutical formulations. 
Wetting agents and lubricants such as sodium lauryl sulfate, as well as coloring agents, flavoring agents, excipients, stabilizers, antioxidants, and preservatives may also be present. Any non-toxic, inert, and effective carrier may be used to formulate the compositions contemplated herein. Suitable pharmaceutically acceptable carriers, excipients, and diluents in this regard are well known to those of skill in the art, such as those described in The Merck Index, Thirteenth Edition, Budavari et al., Eds., Merck & Co., Inc., Rahway, N.J. (2001); the CTFA (Cosmetic, Toiletry, and Fragrance Association) International Cosmetic Ingredient Dictionary and Handbook, Tenth Edition (2004); and the “Inactive Ingredient Guide,” U.S. Food and Drug Administration (FDA) Center for Drug Evaluation and Research (CDER) Office of Management, the contents of all of which are hereby incorporated by reference in their entirety. Examples of pharmaceutically acceptable excipients, carriers, and diluents useful in the present compositions include distilled water, physiological saline, Ringer's solution, dextrose solution, Hank's solution, and DMSO. These additional inactive components, as well as effective formulations and administration procedures, are well known in the art and are described in standard textbooks, such as Goodman and Gilman’s The Pharmacological Basis of Therapeutics, 8th Ed., Gilman et al. Eds. Pergamon Press (1990); Remington’s Pharmaceutical Sciences, 18th Ed., Mack Publishing Co., Easton, Pa. (1990); and Remington: The Science and Practice of Pharmacy, 21st Ed., Lippincott Williams & Wilkins, Philadelphia, Pa., (2005), each of which is incorporated by reference herein in its entirety. The presently described composition may also be contained in artificially created structures such as liposomes, ISCOMS, slow-releasing particles, and other vehicles which increase the half-life of the peptides or polypeptides in serum. 
Liposomes include emulsions, foams, micelles, insoluble monolayers, liquid crystals, phospholipid dispersions, lamellar layers, and the like. Liposomes for use with the presently described peptides are formed from standard vesicle-forming lipids which generally include neutral and negatively charged phospholipids and sterol, such as cholesterol. The selection of lipids is generally determined by considerations such as liposome size and stability in the blood. A variety of methods are available for preparing liposomes as reviewed, for example, by Coligan, J. E. et al, Current Protocols in Protein Science, 1999, John Wiley & Sons, Inc., New York, and see also U.S. Pat. Nos. 4,235,871, 4,501,728, 4,837,028, and 5,019,369.

[0464] The carrier may comprise, in total, from about 0.1% to about 99.99999% by weight of the pharmaceutical compositions presented herein.

Methods of synthesis

[0465] According to some embodiments, there is provided a method for synthesizing a cannabinoid, a precursor thereof, or any combination thereof.

[0466] According to some embodiments, there is provided a method for synthesizing acyl coenzyme A (CoA), polyketide, a compound represented by Formula I, a compound represented by Formula II, a cannabinoid, or any combination thereof.

[0467] In some embodiments, the method further comprises glycosylating a compound represented by Formula I, a compound represented by Formula II, a cannabinoid, or any combination thereof. In some embodiments, the method further comprises transferring an acyl group to a compound represented by Formula I, a compound represented by Formula II, a cannabinoid, or any combination thereof.

[0468] As used herein, the term “cannabinoid” or “cannabinoids” refers to a heterogeneous family of molecules usually exhibiting pharmacological properties by interacting with specific receptors. To date, two membrane receptors for cannabinoids, both coupled to G protein and named CB1 and CB2, have been identified. While CB1 receptors are mainly expressed in the central and peripheral nervous system, CB2 receptors have been reported to be more abundantly detected in cells of the immune system.

[0469] In some embodiments, the cannabinoid comprises any compound as presented in Fig. 2.

[0470] According to some embodiments, the method comprises the steps: (a) providing a transgenic cell or a cell transfected with the DNA molecule of the invention or the artificial nucleic acid molecule disclosed herein; and (b) culturing the transgenic cell or the transfected cell from step (a) such that at least a first protein and a second protein encoded by the DNA molecule or the artificial nucleic acid molecule are expressed, thereby synthesizing the cannabinoid, a precursor thereof, or any combination thereof.

[0471] In some embodiments, the precursor is selected from: acyl coenzyme A (CoA), a polyketide, a resorcinoid precursor, or any combination thereof.

[0472] In some embodiments, the resorcinoid precursor is olivetolic acid.

[0473] In some embodiments, the cannabinoid comprises or is CBGA, CBCA, or both.

[0474] According to some embodiments, there is provided a method for obtaining an extract from a transgenic cell or a transfected cell.

[0475] In some embodiments, the method comprises culturing a transgenic cell or a transfected cell in a medium and extracting the transgenic cell or the transfected cell.

[0476] In some embodiments, the method comprises the steps: (a) culturing a transgenic cell or a transfected cell in a medium; and (b) extracting the transgenic cell or the transfected cell, thereby obtaining an extract from the transgenic cell or the transfected cell.

[0477] In some embodiments, the transgenic cell or the transfected cell comprises the DNA molecule of the invention or a plurality thereof, as disclosed herein.

[0478] In some embodiments, the transgenic cell or the transfected cell comprises the artificial nucleic acid molecule or vector as disclosed herein.

[0479] In some embodiments, the cell is a transgenic cell, or a cell transfected with a DNA molecule as disclosed herein.

[0480] In some embodiments, the method further comprises a step preceding step (a), comprising introducing or transfecting the cell with the artificial nucleic acid molecule or vector, disclosed herein.

[0481] Methods for introducing or transfecting a cell with an artificial nucleic acid molecule or vector are common and would be apparent to one of ordinary skill in the art.

[0482] In some embodiments, introducing or transfecting comprises transferring an artificial nucleic acid molecule or vector comprising the DNA molecule disclosed herein into a cell; or modifying the genome of a cell to include the polynucleotide disclosed herein. In some embodiments, the transferring comprises transfection. In some embodiments, the transferring comprises transformation. In some embodiments, the transferring comprises lipofection. In some embodiments, the transferring comprises nucleofection. In some embodiments, the transferring comprises viral infection.

[0483] As used herein, the terms “transfecting” and “introducing” are interchangeable.

[0484] In some embodiments, the contacting is in a cell-free system.

[0485] Types of suitable cell-free systems for expression and/or synthesis utilizing any one of: the DNA molecule of the invention or a plurality thereof, as disclosed herein, and the protein of the invention, or a plurality thereof, would be apparent to one of ordinary skill in the art.

[0486] In some embodiments, the method further comprises a step preceding step (b), comprising separating the cultured transgenic cell or the cultured transfected cell from the medium.

[0487] Methods for separating a cell from a medium are common and may include, but are not limited to, centrifugation, ultracentrifugation, or others, as would be apparent to a skilled artisan.

[0488] According to some embodiments, there is provided an extract of a transgenic cell, or a transfected cell obtained according to the herein disclosed method.

[0489] In some embodiments, the extract comprises a cannabinoid, a precursor thereof, or any combination thereof.

[0490] In some embodiments, the extract comprises CBGA, CBCA, or both.

[0491] According to some embodiments, there is provided a medium or a portion thereof separated from a cultured transgenic cell or a cultured transfected cell, obtained according to the herein disclosed method.

[0492] According to some embodiments, there is provided a composition comprising: (a) the extract disclosed herein; (b) the medium disclosed herein or a portion thereof; or (c) any combination of (a) and (b), and an acceptable carrier, as described herein.

[0493] In some embodiments, a portion comprises a fraction or a plurality thereof.

[0494] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

[0495] As used herein, the term "about" when combined with a value refers to plus and minus 10% of the reference value. For example, a length of about 1,000 nanometers (nm) refers to a length of 1,000 nm ± 100 nm.
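
The definition of "about" given above reduces to simple arithmetic. The following Python sketch, in which the function name about_range and the parameter tolerance_percent are hypothetical, returns the lower and upper bounds implied by plus and minus 10% of a reference value.

```python
def about_range(value: float, tolerance_percent: float = 10.0) -> tuple:
    """Return (lower, upper) bounds for "about value".

    The default of 10 percent follows the definition of "about" given
    above; the function and parameter names are illustrative only.
    """
    delta = abs(value) * tolerance_percent / 100
    return (value - delta, value + delta)


# Worked example from the text: about 1,000 nm means 1,000 nm +/- 100 nm.
lower_nm, upper_nm = about_range(1000)
```

Here lower_nm and upper_nm evaluate to 900 and 1,100, matching the 1,000 nm ± 100 nm example.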

[0496] It is noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of such polynucleotides and reference to "the polypeptide" includes reference to one or more polypeptides and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only" and the like in connection with the recitation of claim elements or use of a "negative" limitation.

[0497] In those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" will be understood to include the possibilities of "A" or "B" or "A and B".

[0498] It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all subcombinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.

[0499] Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.

[0500] Various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below find experimental support in the following examples.

EXAMPLES

[0501] Generally, the nomenclature used herein, and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological, and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, F. M., ed. (1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988); Watson et al., "Recombinant DNA", Scientific American Books, New York; Birren et al. (eds) "Genome Analysis: A Laboratory Manual Series", Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); methodologies as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; "Cell Biology: A Laboratory Handbook", Volumes I-III Celis, J. E., ed. (1994); "Culture of Animal Cells - A Manual of Basic Technique" by Freshney, Wiley-Liss, N. Y. (1994), Third Edition; "Current Protocols in Immunology" Volumes I-III Coligan J. E., ed. (1994); Stites et al. (eds), "Basic and Clinical Immunology" (8th Edition), Appleton & Lange, Norwalk, CT (1994); Mishell and Shiigi (eds), "Strategies for Protein Purification and Characterization - A Laboratory Course Manual" CSHL Press (1996); all of which are incorporated by reference. Other general references are provided throughout this document.

Materials and Methods

Materials

[0502] Unless otherwise stated, all the analytical metabolites were >95% pure. CBGA 1, CBCA 15, CBDA, acetic acid, propionic acid, butyric acid, pentanoic acid, hexanoic acid, heptanoic acid, octanoic acid, ±2-methyl butyric acid, phenylalanine, hexanoic-D11 acid (D>98%), GPP, IPP, FPP, phloretin 98, naringenin 96, malonyl-CoA (>90%), acetyl-CoA (>93%), butyryl-CoA (>90%), hexanoyl-CoA (>85%), octanoyl-CoA, iso-valeryl-CoA (>90%), olivetol and sodium hexanoate were purchased from Sigma-Aldrich (Rehovot, Israel). Δ⁹-THCA was purchased from Silicol Scientific Equipment Ltd. (Or Yehuda, Israel). Acetic-Ds acid (D>99%), propionic-Ds acid (D>99%), butyric-Ds acid (D>98%), pentanoic-D9 acid (D>98%), heptanoic-Ds acid (D>99%), octanoic-Ds acid (D>99%), iso-butyric-D? acid (D>98%), ±2-methyl butyric-D9 acid (D>99%), iso-valeric-D9 acid (D>98%), iso-caproic-D11 acid (D>98%) were purchased from C/D/N isotopes (Quebec, Canada). Phenylalanine-Ds (D>98%) and phenylalanine-¹³C9, ¹⁵N1 (¹³C, ¹⁵N>99%) were synthesized by Cambridge Isotope Laboratories (Andover, MA). HeliCBGA 2 (NP009525, 90%) was purchased from Analyticon Discovery GmbH (Potsdam, Germany). APHA 3 was reported as an impurity (NP015136, 5%) in the heliCBGA analytical metabolite. OA 92 (>90%), VA (>90%) and iso-butyryl-CoA were purchased from Cayman Chemical (Ann Arbor, MI, USA). PCP 95, naringenin chalcone 97 and pinocembrin chalcone 100 were purchased from Wuhan ChemFaces Biochemical Co Ltd. (Hubei, China). Cinnamoyl-CoA and Coumaroyl-CoA were purchased from TransMIT GmbH (Hesse, Germany).

[0503] Seeds of H. umbraculigerum (Silverhill seeds, Cape Town, South Africa) were germinated, and grown in a greenhouse in a long-day photoperiod. Plants were propagated by cuttings.

Feeding Experiments

[0504] All feeding solutions were prepared as aqueous solutions containing 0.5 mg ml⁻¹ of the precursor. The pH of the FA solutions was adjusted to 5.5-6.0. The phenylalanine feeding experiments were performed on leaves from young mother plants, excised by cutting at the proximal side of the pedicel with scissors under water, leaving 1-2 cm of the pedicel attached. For the FA feeding experiments, 10 cm young cuttings were obtained from mother plants. The lower leaves were removed, leaving 4-5 leaves on each stem, and the stem was peeled to increase the uptake of the labeled solutions. Three to four leaves or the young cuttings were immersed in aqueous solutions [DDW (control), unlabeled or labeled precursors; each group consisted of a minimum of three biological replicates]. All feeding experiments were performed in a controlled environment for 48-96 h at 25 °C under constant fluorescent illumination and humidity, and the tubes were periodically refilled. Upon termination, the fresh leaves were rinsed with a small amount of water, dried gently, flash frozen and stored at -80 °C until extraction.

LC-MS chemical analysis

[0505] Unless otherwise stated, 100 mg of frozen powdered plant tissue was extracted with 300 µl ethanol, sonicated for 15 min, agitated for 30 min and centrifuged at 14,000 g for 10 min. The supernatant was filtered through a 0.22 µm syringe filter and analyzed at the obtained concentration. Detection was performed using both targeted and non-targeted approaches as described in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023) using an ultrahigh-performance liquid chromatography-tandem quadrupole time-of-flight (UPLC-qTOF) system comprised of a UPLC (Waters Acquity) with a diode array detector connected either to a XEVO G2-S QTof (Waters) or to a Synapt HDMS (Waters). The chromatographic separation was performed on a 100 mm x 2.1 mm i.d. (internal diameter), 1.7 µm UPLC BEH C18 column (Waters Acquity). The mobile phase consisted of 0.1% formic acid in acetonitrile:water (5:95, v/v; phase A) and 0.1% formic acid in acetonitrile (phase B). Terpenophenols were analyzed using UPLC Method 1 as follows: Initial conditions were 40% B for 1 min, raised to 100% B until 23 min, held at 100% B for 3.8 min, decreased to 40% B until 27 min, and held at 40% B until 29 min for re-equilibration of the system. The flow rate was 0.3 ml min⁻¹, and the column temperature was kept at 35 °C. Intermediates and glucosylated metabolites were analyzed using UPLC Method 2 as follows: Initial conditions were from 0% to 28% B over 22 min, raised to 100% B until 36 min, held at 100% B for 2 min, decreased to 0% B until 38.5 min, and held at 40% B until 40 min for re-equilibration of the system. The flow rate was 0.3 ml min⁻¹, and the column temperature was kept at 35 °C. Electrospray ionization (ESI) was used in either positive or negative ionization modes at an m/z range of 50-1,000 Da. Masses were detected with the following settings: capillary 1 kV, source temperature 140 °C, desolvation temperature 450 °C, and desolvation gas flow 800 l h⁻¹. Argon was used as the collision gas. The MS system was calibrated with sodium formate, and Leu-enkephalin was used as the lock mass. Data acquisition for untargeted analysis was performed in negative ionization using the MS^E mode. The collision energy was set to 4 eV for the low-energy function and to a 15-50 eV ramp for the high-energy function. The R package Miso was run as previously described. Differential metabolites were selected if the fold change was greater than or equal to 10 and the p-value was less than 0.05. MS/MS experiments were performed in positive or negative ionization modes according to the specific protonated or deprotonated masses with the following settings: capillary spray of 1 kV; cone voltage of 30 eV; collision energy ramps of 10-45 eV for positive mode and 15-50 eV for negative mode.
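The multistep gradient of UPLC Method 1 can be written as a table of breakpoints and evaluated at any time point. The following is a minimal sketch, not part of the described method; it assumes linear ramps between the stated breakpoints, and the names METHOD_1 and percent_b are hypothetical.

```python
from bisect import bisect_right

# (time in min, % phase B) breakpoints transcribed from UPLC Method 1.
# Linear ramps between breakpoints are an assumption of this sketch.
METHOD_1 = [
    (0.0, 40.0),    # initial conditions
    (1.0, 40.0),    # 40% B held for 1 min
    (23.0, 100.0),  # raised to 100% B until 23 min
    (26.8, 100.0),  # held at 100% B for 3.8 min
    (27.0, 40.0),   # decreased to 40% B until 27 min
    (29.0, 40.0),   # re-equilibration until 29 min
]


def percent_b(t_min, program=METHOD_1):
    """Return the % of phase B at time t_min by linear interpolation."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min)
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

Under the linear-ramp assumption, percent_b(12.0) evaluates to 70% B, halfway along the 1-23 min ramp.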

Absolute quantification of CBGA 1

[0506] Fresh samples of leaves (dark and light), flowers, stems and roots were collected from a plant at the flowering stage. Florets and the receptacle of flowers were detached using a scalpel and analyzed separately. All tissues were flash frozen in liquid N2 and ground into fine powder. To measure CBGA 1 content in dry tissue, fresh leaves were flash frozen, ground and lyophilized. For the extraction, 100 mg of the frozen powders were accurately weighed in triplicate, extracted with 1 ml ethanol, and prepared as previously described. Samples were injected in several dilutions to fit into the linear range of the calibration curves. Injections were performed on a UPLC (Waters) connected to a Triple Quad detector (TQ-S, Waters) in multiple reaction monitoring (MRM) mode. The system was operated with a similar column and mobile phase as for UPLC-qTOF analysis as follows: Initial conditions were 57% B raised to 85% B until 4 min, raised to 100% B until 4.2 min, held at 100% B until 6 min, decreased to 67% B until 6.2 min, and held at 67% B until 7 min for re-equilibration of the system. The flow rate was 0.6 ml min⁻¹, and the column temperature was kept at 40 °C. The instrument was operated in negative mode with a capillary voltage of 1.5 kV and a cone voltage of 40 V. Absolute quantification of CBGA 1 was performed by external calibration using two different transitions (359.3 > 191.2, 32 V for quantification; and 359.3 > 315.4, 21 V for qualification).
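External calibration of this kind can be illustrated with a short sketch. It assumes a linear detector response within the calibration range and uses invented standard concentrations and peak areas; the function names fit_line and quantify are hypothetical and do not reflect the software actually used.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(peak_area, slope, intercept, dilution_factor=1.0):
    """Convert a peak area to a concentration in the undiluted extract."""
    return (peak_area - intercept) / slope * dilution_factor


# Hypothetical calibration standards (concentration, quantifier peak area).
std_conc = [1.0, 2.0, 5.0, 10.0]
std_area = [105.0, 205.0, 505.0, 1005.0]
slope, intercept = fit_line(std_conc, std_area)

# A sample injected at a 10-fold dilution so that it falls in the linear range.
sample_conc = quantify(505.0, slope, intercept, dilution_factor=10.0)
```

The dilution factor is applied after interpolation, which mirrors the practice of injecting several dilutions and back-calculating to the original extract.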

Metabolite purification for NMR analysis

[0507] A total of 86 g of fresh leaves were flash frozen in liquid N2 and ground into fine powder using an electrical grinder, extracted with 600 ml ethanol, sonicated for 20 min, and agitated for 30 min. The supernatant was filtered, evaporated using a rotary evaporator at 40 °C and lyophilized. The extract was reconstituted in 25 ml acetonitrile and used for either direct purification (following a tenfold dilution) or prefractionation via medium pressure liquid chromatography (MPLC). The Büchi Sepacore MPLC System was equipped with two C-605 pump modules, a C-620 control unit, a C-660 fraction collector, a C-640 UV photometer (Büchi Labortechnik AG, Switzerland), and a manually packed C18 column. The mobile phase consisted of acetonitrile:water (5:95, v/v; phase A) and acetonitrile (phase B), with the following multistep gradient method: initial conditions were 0% B for 10 min, raised to 99% B until 530 min, and slowly raised to 100% B until 660 min. The flow rate was 15 ml min⁻¹, the injection volume was 15 ml, and the wavelengths were 210, 224, 270 and 350 nm. Fractions of 100 ml were collected throughout the run and analyzed by UPLC-qTOF to select specific metabolites for purification. The selected fractions were evaporated using a rotary evaporator at 40 °C, lyophilized, reconstituted in ethanol or methanol (only for the fraction with Glc-OA 102 and Glc-DHSA 103), and filtered through a 0.22 µm syringe filter. Purification of metabolites was performed on either an Agilent 1290 Infinity II UPLC system (System 1, the general instrument setup was according to Jozwiak et al. 2020); or a UPLC system (Waters Acquity) equipped with a binary pump, an autosampler, a fraction manager and a diode array detector (System 2), with a mobile phase similar to that used for the UPLC-qTOF. Triggering was performed using specific UV wavelengths according to the metabolite.

[0508] In System 1, method development was performed by acquisition of both MS and UV signals. MS spectra were acquired in negative full scan mode between m/z 50 and 1,700. HPLC columns were either XBridge (BEH C18, 250 x 4.6 mm i.d., 5 µm; Waters) or Luna (C18, 250 x 4.6 mm i.d., 5 µm; Phenomenex), and the conditions were adjusted and optimized for each metabolite. In this system, the eluent with the metabolites of interest was mixed with a makeup flow of 1.8 ml min⁻¹ water and then trapped on solid phase extraction (SPE) cartridges (10 x 2 mm Hysphere resin GP cartridges). Each cartridge was loaded four times with the same metabolite, and 36-72 cartridges were used for trapping one metabolite, depending on the concentration of the sample injected. After collection, SPE cartridges were dried with a stream of N2 and eluted with 150 µl methanol. Eluents containing the same metabolite were pooled, dried under a stream of N2, and stored at -20 °C until NMR analysis. A UPLC BEH C18 column (100 mm x 2.1 mm i.d., 1.7 µm; Waters) was used on System 2, apart from metabolites Glc-OA 102 and Glc-DHSA 103, which were fractionated on a Luna Phenyl-Hexyl column (150 mm x 2 mm i.d., 3 µm; Phenomenex). The flow rate was 0.3 ml min⁻¹, and the column temperature was kept at 35 °C. All other conditions were adjusted and optimized according to the sample. The eluent with the metabolite of interest was collected in 2 ml HPLC vials. Eluents containing the same metabolite were pooled, dried under a stream of N2, lyophilized, and stored at -20 °C until NMR analysis.

NMR Spectroscopy

[0509] Purified metabolites were resuspended in 300 µl of methanol-d4, dried under a stream of N2, reconstituted in 70 µl methanol-d4 with 0.01% of 3-(trimethylsilyl)propionic-2,2,3,3-d4 acid sodium salt (TMSP, used as an internal chemical shift reference for ¹H and ¹³C) and transferred into 1.7 mm micro-NMR test tubes for structure elucidation. NMR spectra were collected on a Bruker AVANCE NEO-600 NMR spectrometer equipped with a 5 mm TCI-xyz CryoProbe. All spectra were acquired at 298 K. The structures of the different metabolites were determined by one-dimensional (1D) ¹H NMR spectra, as well as various two-dimensional (2D) NMR spectra: ¹H-¹H Correlation Spectroscopy (COSY), ¹H-¹H Total Correlation Spectroscopy (TOCSY), ¹H-¹H Rotating Frame Nuclear Overhauser Spectroscopy (ROESY), Heteronuclear Single Quantum Coherence (HSQC), and Heteronuclear Multiple Bond Correlation (HMBC) spectra.

[0510] One-dimensional ¹H NMR spectra were collected using 16,384 data points and a recycling delay of 2.5 s. Two-dimensional COSY, TOCSY and ROESY spectra were acquired using 16,384-8,192 (t2) by 400-512 (t1) data points. 2D TOCSY spectra were acquired using isotropic mixing times of 100-300 ms. A T-ROESY (TOCSY-less ROESY) experiment, which effectively suppresses TOCSY transfer in ROESY experiments, was used in this study. T-ROESY spectra were recorded using spin lock pulses of 100-400 ms. 2D HSQC and 2D HMBC spectra were collected using 4,096 (t2) by 400-512 (t1) data points. Multiplicity-edited HSQC differentiates between methyl and methine groups, which give rise to positive correlations, and methylene groups, which appear as negative peaks. The HMBC delay for evolution of long-range couplings was set to observe long-range couplings of JH,C = 8 Hz. All data were processed and analyzed using TopSpin 4.1.1 software (Bruker).

MALDI Imaging

[0511] For the peeling experiment, whole fresh leaves from a young plant were attached onto glass slides using double-sided tape with either the abaxial or adaxial surfaces, gently peeled above/below the midrib using duct tape and desiccated overnight under moderate vacuum. Images were taken using a digital camera. For localization of metabolites to individual trichomes, fresh leaves and flowers were sectioned, and matrix was sprayed as previously described. Sections were imaged with a Nikon DS-Ri2 microscope. MALDI imaging was performed using a 7 T Solarix FT-ICR (Fourier Transform Ion Cyclotron Resonance) mass spectrometer (Bruker Daltonics). The datasets were collected in positive ionization using lock mass calibration (DHB matrix peak: [3DHB+H-3H2O]⁺, m/z 409.055408 Da) at a frequency of 1 kHz and a laser power of 40%, with 200 laser shots per pixel and pixel sizes of 50, 15 or 25 µm for the peeled trichomes and for the sectioned leaves and flowers, respectively. Each mass spectrum was recorded in the range of m/z 150-3,000 in broadband mode with a Time Domain for Acquisition of 1M, providing an estimated resolving power of 115,000 at m/z 400. The spectra were normalized to root-mean-square intensity, and MALDI images were plotted at theoretical m/z ± 0.005% with pixel interpolation on.
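A plotting tolerance of ± 0.005% of the theoretical m/z is equivalent to ± 50 ppm. The sketch below shows how such a window might be computed for a given ion; the function names are hypothetical, and the lock-mass value is used only as a convenient input.

```python
def mz_window(theoretical_mz, tolerance_percent=0.005):
    """Return (low, high) m/z bounds for a relative tolerance given in percent."""
    half_width = theoretical_mz * tolerance_percent / 100.0
    return theoretical_mz - half_width, theoretical_mz + half_width


def tolerance_ppm(tolerance_percent):
    """Convert a tolerance in percent to parts per million."""
    return tolerance_percent * 1e4


# Window around the DHB lock-mass ion quoted in the text.
low, high = mz_window(409.055408)
```

For m/z 409.055408 the half-width is about 0.0205, so the window spans roughly m/z 409.035 to 409.076.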

Cryo-SEM, TEM, and Confocal Microscopy

[0512] For cryo scanning electron microscopy (cryo-SEM) analyses, frozen samples were attached to a holder either by mechanical clamping (leaves) or by a glue made of a concentrated PVP solution. The holder with the samples was then plunge-frozen in liquid N2, transferred to a BAF 60 freeze fracture device (Leica Microsystems, Vienna, Austria) using a VCT 100 Vacuum Cryo Transfer device (Leica) and sublimed for 30 min at -95 °C. Samples were transferred to an Ultra 55 cryo-SEM (Zeiss, Germany) using a VCT 100 shuttle and were observed at -95 °C without coating, using mostly a mixed mode of InLens + SE detectors at 1-1.3 kV. For transmission electron microscopy (TEM) analysis, H. umbraculigerum leaves were fixed with 4% paraformaldehyde, 2% glutaraldehyde in 0.1 M cacodylate buffer containing 5 mM CaCl2 (pH 7.4), then postfixed with 1% osmium tetroxide supplemented with 0.5% potassium hexacyanoferrate trihydrate and potassium dichromate in 0.1 M cacodylate (1 h), stained with 2% uranyl acetate in water (1 h), dehydrated in graded ethanol solutions and embedded in Agar 100 epoxy resin (Agar Scientific Ltd., Stansted, UK). Ultrathin sections (70-90 nm) were viewed and photographed with an FEI Tecnai SPIRIT (FEI, Eindhoven, Netherlands) transmission electron microscope operated at 120 kV and equipped with a OneView Gatan camera. Confocal microscopy of trichomes was carried out on a Nikon Eclipse A1 microscope. Transmitted light was used to image the trichomes since they lack fluorescence. Autofluorescence of chlorophyll (chloroplasts) was used as a contrast for better visualization of the trichomes. A far-red laser was used to detect autofluorescence of chlorophyll (excitation: 640 nm; emission: 663-738 nm).

Trichome enrichment

[0513] Trichomes were enriched following the guidelines of Bergau et al., with modifications. Briefly, young leaves were harvested and soaked in ice-cold distilled water and then abraded using a BeadBeater machine (Biospec Products, Bartlesville, OK). The polycarbonate chamber was loaded with 15 g of plant material, filled to half its volume with glass beads (0.5 mm diameter) and XAD-4 resin (1 g/g plant material), and topped up to full volume with 80% ethanol. Leaves were beaten by 2-4 pulses of operation of 1 min each. This procedure was carried out at 4 °C, and after each pulse the chamber was allowed to cool on ice. Following abrasion, the contents of the chamber were first filtered through a kitchen mesh strainer and then through a 100 µm nylon mesh to remove the plant material, glass beads, and XAD-4 resin. The residual plant material and beads were scraped from the mesh and rinsed twice with additional 80% ethanol that was also passed through the 100 µm mesh. The presence of enriched glandular trichome secretory cells was checked by visualization in an inverted optical microscope.

Genome assembly

[0514] High molecular weight DNA was extracted from young frozen leaves and sequenced at the UC Davis Genome Center. Sequencing was done on a PacBio Sequel II platform with ~12-kilobase DNA SMRTbell library preparation according to the manufacturer's protocol. Three different SMRT 8M cells were used, yielding a total of 57.8 Gb of HiFi data (~44x haploid coverage). In addition to PacBio HiFi data, 200 M reads of PE 2x150 Illumina Hi-C data were obtained by the company Phase Genomics. Hifiasm software was used to integrate both PacBio HiFi and Hi-C data to produce chromosome-scale and haplotype-resolved assemblies. Further scaffolding was performed with the Hi-C data, mapping the reads following the Arima Genomics pipeline and the SALSA software. Visualizations of Hi-C heatmaps were performed with Juicer, and quality metrics were obtained with the Assemblathon 2 script. Finally, the assembly was softmasked for repetitive elements using EDTA with the -cds flag to incorporate CDS sequences from the transcriptomic data. Parameter details of each of the commands can be found in github.com/Luisitox/Helichrysum_paper.
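The stated yield and coverage imply an approximate haploid genome size, since coverage is total sequenced bases divided by genome size. The following sketch performs only that arithmetic; the function name is hypothetical, and the result is an estimate derived from the rounded figures above rather than a reported measurement.

```python
def implied_genome_size_gb(total_bases_gb, coverage):
    """Genome size implied by sequencing yield and fold coverage."""
    return total_bases_gb / coverage


# 57.8 Gb of HiFi data at roughly 44x haploid coverage.
size_gb = implied_genome_size_gb(57.8, 44.0)
```

With these inputs the implied haploid genome size is about 1.3 Gb.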

RNA sequencing and genome annotation

[0515] RNA was extracted from seven tissues: young leaves, old leaves, florets and receptacles of flowers, stems, roots and trichomes (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). RNA integrity was checked using a TapeStation instrument. Paired-end Illumina libraries were prepared for five of the tissues, sequenced on an Illumina HiSeq 3000 instrument (PE 2x150, ~40 M reads per sample) and processed following the guidelines of Freedman and Weeks. Briefly, random sequencing errors were corrected using Rcorrector, and uncorrectable reads were removed. Adaptor and quality trimming were performed using TrimGalore!. Ribosomal RNA was filtered by discarding reads mapping to the SILVA_132_LSURef and SILVA_138_SSURef non-redundant databases using bowtie2. Fastq quality checks at each of the steps were performed using MultiQC. The remaining reads were pooled and used for genome-guided and genome-independent de novo transcriptome assembly using Trinity.

[0516] The Iso-Seq data was obtained from four of the tissues (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)) and processed with isoseq3. Fused and unspliced transcripts were removed, and only polyA-positive transcripts were kept for a unique set of high-quality isoforms. Iso-Seq and Trinity transcripts were aligned to the assembly using minimap2, and the BAM files were incorporated into the PASA pipeline to generate RNA-based gene model structures. In addition, de novo gene structures were obtained using the software braker2 and the BAM file alignments of long and short reads as extrinsic training evidence. Ab initio and RNA-based gene models were combined using EvidenceModeler, followed by a final round of the PASA pipeline. Gene functional annotation was performed for the predicted mature transcripts using TransDecoder, which takes into account HMMER hits against Pfam and BLASTP hits against UniProt databases for similarity retention criteria. Further annotation of protein-coding transcripts was performed by taking the best hit of BLASTP searches against other plant protein databases (UniProt protein fasta files of sunflower id UP000215914_4232, Arabidopsis id UP000006548_3702, tomato id UP000004994_4081, rice id UP000059680_39947 and Cannabis NCBI id GCF_900626175.1_cs10). Signal peptides were predicted with SignalP, transmembrane domains were predicted with TMHMM, and GO and KEGG terms were obtained with Trinotate. The full script used for the functional annotation of the proteins can be found in github.com/Luisitox/Helichrysum_paper. BUSCO was used at multiple stages of the analysis to assess the completeness of the different versions of both the transcriptome and the genome.

3' RNA sequencing and gene co-expression network analysis

[0517] UMI-based 3' RNA-seq of three replicates of the seven tissues was obtained similarly as described. Adaptor and quality trimming were performed using TrimGalore! in two steps, including PolyA trimming mode. Reads were mapped to the genome using STAR, UMI-deduplicated using UMI-tools, and counts were obtained with featureCounts. Normalization was performed with the varianceStabilizingTransformation algorithm of DESeq2, and the CEMiTool package was used for co-expression analysis (dissimilarity threshold of 0.6, p-value of 0.1).

Circos and gene cluster plots

[0518] Gene and TE densities were calculated by intersecting the corresponding gff files with 0.1 Mb non-overlapping windows using bedtools makewindows and bedtools intersect. True-seq and Tran-seq coverage were calculated using bedtools genomecov in BedGraph format. The Circos plot was made with the R circlize package, and the gene cluster plots were made with the gggenes package. The full R scripts can be found at github.com/Luisitox/Helichrysum_paper.

Phylogenetic analyses of functionally tested enzymes

[0519] The selection of the proteins for each of the families analyzed in this study was based on functionally tested enzymes according to studies referenced in each Figure. The full list of IDs can be found in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023). The Maximum Likelihood trees were constructed with 100 bootstrap tests based on a MUSCLE multiple alignment using the MEGA11 software. The evolutionary distances were computed using the JTT matrix-based method.

Orthology and synteny analyses

[0520] Proteomes were obtained from all available annotated Asteraceae genomes present in NCBI: GCA_003112345.1 (Artemisia annua), GCA_009363875.1 (Mikania micrantha), GCA_023376185.1 (Cichorium endivia), GCA_023525715.1 (Cichorium intybus), GCA_023525745.1 (Arctium lappa), GCA_023525975.1 (Smallanthus sonchifolius), GCA_024762085.1 (Ambrosia artemisiifolia), GCF_001531365.2 (Cynara cardunculus var. scolymus), GCF_002127325.2 (Helianthus annuus), GCF_002870075.4 (Lactuca sativa), GCF_010389155.1 (Erigeron canadensis) and Cannabis sativa GCA_900626175.1. Orthogroups and their phylogenetic relationship were inferred with Orthofinder. Genomic positions and putative function of all the genes belonging to the orthogroups of HuCoAT6 (OG0014461), HuOLS4 (OG0000313), and HuCBGAS4 (OG0002538) were determined using the corresponding GFF files and the plots were produced with the gggenomes package. Phylogenetic gene trees generated by Orthofinder were plotted with MEGA11.

β-Glucosidase assay for preparation of DHSA 93

[0521] MPLC fractions (50 ml each) containing Glc-DHSA 103 were evaporated using a rotary evaporator at 40 °C, lyophilized and reconstituted in 15 ml McIlvaine buffer (20 mM, pH 5.0). Reactions were performed in separate 20 ml vials incubated at 45 °C for 24 h. Each reaction consisted of 6 ml of McIlvaine buffer (pH 5.0), 3 ml of a 0.1 mg ml⁻¹ almond β-glucosidase solution in McIlvaine buffer (>6 U mg⁻¹, Sigma Aldrich), and 1.5 ml of the fractions containing Glc-DHSA 103. The metabolites were extracted using 3 volumes of ethyl acetate:diethyl ether 1:1, evaporated using a rotary evaporator and reconstituted in 5 ml methanol. The products from the reaction contained a mixture of both glucosylated and non-glucosylated metabolites. DHSA 93 was therefore purified using System 2 and reconstituted in 100 µl methanol for the enzymatic assay. The purified DHSA 93 was analyzed via UPLC-qTOF to verify that the purified fraction did not contain Glc-DHSA 103.

AAE, PKS, PKC, UGT and AAT expression in E. coli and protein purification

[0522] HuAAE1-6, HuUGT1-13 and HuAAT1-15 coding sequences from H. umbraculigerum and previously characterized sequences from rice (OsUGT) and stevia (SrUGT) were individually cloned into the pET28b vector digested with EcoRI using the ClonExpress II one step cloning kit (Vazyme, Germany). HuPKS1-4, HuPKC1-5, CsOLS and CsOAC were ligated into the pOPINF vector (digested with HindIII and KpnI) using the ClonExpress II one step cloning kit (Vazyme, Germany). Due to the high sequence similarity of the coding sequences, HuPKS2-4 were synthesized by the company Twist Biosciences. All constructs were expressed in E. coli BL21 (DE3) cells (a complete list of the primers can be found in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Bacterial starters were grown overnight in LB medium at 37 °C, diluted 1:100 in fresh LB, and re-incubated at 37 °C. When cultures reached A600 = 0.6, protein expression was induced with 400 µM of isopropyl-1-thio-β-D-galactopyranoside (IPTG) overnight at 15 °C. Bacterial cells were lysed by sonication in 50 mM Tris-HCl pH 8.0, 0.5 mM phenylmethyl sulfonyl fluoride (PMSF, Sigma Aldrich) solution in isopropanol, 10% glycerol, protease inhibitor cocktail (Sigma Aldrich), and 1 mg ml⁻¹ lysozyme (Sigma Aldrich). The whole-cell extract was either kept for functional activity or used for protein purification. Purification of hexahistidine-tagged proteins was performed on Ni-NTA agarose beads (Adar Biotech). The proteins were eluted with 200 mM imidazole (Fluka) in buffer containing 50 mM NaH2PO4, pH 8.0, and 0.5 M NaCl. Protein concentration of the eluted fractions was measured with Pierce™ 660 nm protein assay reagent (Thermo Scientific).

AAE enzyme assays

[0523] Recombinant AAE assays were performed in a 20 µl reaction mix that contained 0.1 µg recombinant AAE, 50 mM HEPES pH 9.0, 8 mM ATP, 10 mM MgCl2, 0.5 mM CoA and 4 mM of the sodium salt of the respective acid (acetic, butyric, hexanoic, octanoic, cinnamic and coumaric acids) for 10 min at 40 °C. Reactions were terminated with 2 µl of 1 M HCl and stored on ice until analysis. After centrifugation at 15,000 g for 5 min at 4 °C, the samples were diluted 1:100 in water and analyzed on the TQ-S system in MRM mode using a similar column as previously described. The system was operated with an aqueous buffer pH 7.0 (10 mM ammonium acetate, 5 mM NH4HCO2, phase A) and acetonitrile (phase B). The flow rate was 0.3 ml min⁻¹, and the column temperature was kept at 25 °C. Metabolites were analyzed using a 15 min multistep gradient method: initial conditions were 1% B raised to 35% B until 10.5 min, and then raised to 100% B until 11 min, held at 100% B for 1 min, decreased to 1% B until 12.5 min, and held at 1% B until 15 min for re-equilibration of the system. The instrument was operated in positive mode with a capillary voltage of 3.0 kV and a cone voltage of 50 V. Metabolite identity was confirmed with authentic standards. Two different transitions were used for analysis of: acetyl-CoA (810.52 > 303.30, 27.0 V; 810.52 > 428.25, 24.0 V); butyryl-CoA (838.58 > 331.30, 28.0 V; 838.58 > 331.30, 25.0 V); hexanoyl-CoA (866.65 > 359.40, 28.0 V; 866.65 > 428.25, 26.0 V); octanoyl-CoA (894.65 > 387.55, 30.0 V; 894.65 > 428.25, 28.0 V); coumaroyl-CoA (914.59 > 407.37, 30.0 V; 914.59 > 428.25, 28.0 V); cinnamoyl-CoA (898.59 > 391.37, 30.0 V; 898.59 > 428.25, 28.0 V).
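In positive mode, acyl-CoA thioesters characteristically lose a neutral fragment of about 507 Da, so the first product ion of each pair above should sit roughly 507 below its precursor. The sketch below checks the listed values against that expectation; the table is transcribed from the text as written, and the names ACYL_COA_MRM and neutral_losses are hypothetical.

```python
# name: (precursor m/z, first product m/z, second product m/z), as listed above.
ACYL_COA_MRM = {
    "acetyl-CoA": (810.52, 303.30, 428.25),
    "butyryl-CoA": (838.58, 331.30, 331.30),
    "hexanoyl-CoA": (866.65, 359.40, 428.25),
    "octanoyl-CoA": (894.65, 387.55, 428.25),
    "coumaroyl-CoA": (914.59, 407.37, 428.25),
    "cinnamoyl-CoA": (898.59, 391.37, 428.25),
}


def neutral_losses(transitions):
    """Return the precursor minus first-product mass difference per compound."""
    return {
        name: round(precursor - product, 2)
        for name, (precursor, product, _second) in transitions.items()
    }


losses = neutral_losses(ACYL_COA_MRM)
```

All six differences fall between 507.1 and 507.3, consistent with a shared CoA-derived neutral loss, while the second product ion is the same m/z 428.25 fragment for every compound except butyryl-CoA as listed.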

PKS and PKC enzyme assays

[0524] Individual and coupled HuPKS and PKC (HuOACs or CsOAC) assays were carried out as described by Gagne et al. (2012) with some modifications. Enzyme assays were performed in 50 µl with 20 mM HEPES at pH 7.2, 5 mM DTT, 1.8 mM malonyl-CoA and 0.6 mM hexanoyl-CoA. HuPKSs (5 µg) and PKCs (10 µg) were added either individually or in combination. Reaction mixtures were incubated at 30 °C for 3 h. Reactions were stopped by extraction with 100 µl methanol, vortexing and centrifugation at 15,000 g for 10 min. The supernatant was filtered and analyzed with both UPLC-qTOF and triple-quad systems. The column and mobile phase were as for the metabolic profiling. Initial conditions were 10% B raised to 70% B until 6 min, raised to 100% B until 6.2 min, held at 100% B until 8 min, decreased to 10% B until 8.5 min, and held at 10% B until 11 min for re-equilibration of the system. The flow rate was 0.3 ml min⁻¹, and the column temperature was kept at 35 °C. UPLC-qTOF was run in both polarities with MS or MS/MS modes using similar parameters as previously described. The TQ-S system was operated in MRM mode in both positive (for olivetol) and negative modes with a capillary voltage of 3.5 or 1.5 kV, respectively, and a cone voltage of 40 or 20 V, respectively. Two different transitions were used for analysis of: OA 92 (223.1 > 179.1, 15.0 V; 223.1 > 137.1, 20.0 V); PDAL (181.2 > 137.1, 10.0 V; 181.2 > 97.1, 20.0 V); HTAL (223.1 > 179.1, 10.0 V; 223.1 > 125.1, 10.0 V); PCP 95 (223.1 > 179.1, 20.0 V; 223.1 > 81.0, 25.0 V); olivetol (181.1 > 111.0, 10.0 V; 181.1 > 71.2, 10.0 V). Olivetol, OA 92 and PCP 95 identities were confirmed with authentic standards.

PT enzyme assays

[0525] HuPT1-4 genes from H. umbraculigerum were separately cloned into the pESC-TRP vector. Microsomal preparations from yeast cells transformed with pESC-TRP vectors were performed as described by Jozwiak et al. (2020). PT enzymatic assays were carried out as described previously for CsPT4 with some modifications. The microsomes from yeasts expressing the HuPTs were resuspended in 3.3 ml buffer (10 mM Tris-HCl, 10 mM MgCl2, pH 8.0, 10% glycerol) and homogenized with a tissue grinder. The enzyme assays were performed in 50 µl with 2 µl of the respective membrane preparations dissolved in the reaction buffer (50 mM Tris-HCl, 10 mM MgCl2, pH 8.0), with 500 µM of the aromatic acceptor [OA 92, VA, DHSA 93, PCP 95, naringenin chalcone 97 or pinocembrin chalcone 100] and 500 µM of the isoprenoid (IPP, GPP or FPP). Samples were incubated for 1 h at 30 °C. Kinetic assays were similarly performed with 1 mM of GPP and varying (0.5 µM-1.5 mM) concentrations of OA 92, with 15 min incubation at 30 °C. Samples were extracted with 100 µl ethanol followed by vortexing and centrifugation. The supernatant was filtered and analyzed via UPLC-qTOF as for the terpenophenols (UPLC Method 1).
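Kinetic data of this type (initial rate versus acceptor concentration at a fixed donor concentration) are commonly summarized with Michaelis-Menten parameters. The text does not state which model or fitting procedure was used, so the following is only an illustrative sketch: it assumes Michaelis-Menten behavior, uses a Hanes-Woolf linearization ([S]/v against [S]), and runs on synthetic rates generated from invented parameters.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(s_values, v_values):
    """Estimate (Vmax, Km) from the line [S]/v = [S]/Vmax + Km/Vmax."""
    y = [s / v for s, v in zip(s_values, v_values)]
    n = len(s_values)
    mean_s = sum(s_values) / n
    mean_y = sum(y) / n
    sxx = sum((s - mean_s) ** 2 for s in s_values)
    sxy = sum((s - mean_s) * (yi - mean_y) for s, yi in zip(s_values, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_s
    vmax = 1.0 / slope
    return vmax, intercept * vmax


# Synthetic example: substrate range mirrors 0.5 uM to 1.5 mM (in uM);
# the Vmax and Km used to generate the rates are invented.
substrate_um = [0.5, 5.0, 50.0, 150.0, 500.0, 1500.0]
rates = [michaelis_menten(s, vmax=2.0, km=50.0) for s in substrate_um]
vmax_est, km_est = hanes_woolf_fit(substrate_um, rates)
```

With noise-free input the fit recovers the generating parameters; for real measurements a nonlinear least-squares fit is usually preferred because linearizations distort the error structure.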

UGT enzyme assays

[0526] The UGT enzyme assays were performed as described by Cai et al. (2021) with some modifications. UGT assays using different aromatic substrates were performed by mixing 1.5 µl of the UDP-Glc solution (80 mM, final concentration: 2.5 mM), 27.5 µl Tris buffer (100 mM, pH 8.0), 1 µl of each of the substrates (50 mM, final concentration: 1 mM) and 20 µl of the lysate enzyme solution. The reactions were incubated at 30 °C for 1 h. Reactions were stopped by extraction with 100 µl methanol, vortexing and centrifugation at 15,000 g for 10 min. The supernatant was filtered and analyzed via UPLC-qTOF using UPLC Method 2. The assay with the purified UGTs was performed by mixing 2 µl of the cannabinoid acceptors (OA 92, DHSA 93, CBGA 1, heliCBGA 2, CBDA, Δ⁹-THCA, CBCA 15, olivetol, CBG, CBD, Δ⁹-THC, PCP 95, naringenin chalcone 97 or pinocembrin chalcone 100) with 1.5 µl of 80 mM UDP-Glc, 46.5 µl Tris buffer (100 mM, pH 8.0) and 1 µl of each enzyme. The metabolites were extracted and analyzed as previously described. Kinetic assays were performed with the purified enzymes (1.5 µg µl⁻¹) dissolved in 45 µl Tris buffer (100 mM, pH 8.0); substrates were added using varying (0.5 µM-3 mM) and constant (1 mM) concentrations of OA 92 and UDP-Glc, and the total reaction volume was 50 µl. To stop the reactions, 100 µl methanol was added to each tube, and the metabolites were extracted and analyzed as previously described.
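The final concentrations in the lysate assay follow from the dilution relation C1·V1 = C2·V2 applied to the stated stock concentrations and volumes. The sketch below reproduces that arithmetic; the function and variable names are hypothetical.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Apply C1 * V1 = C2 * V2 and return C2 (same units as stock_conc)."""
    return stock_conc * stock_volume / total_volume


# Component volumes of the lysate assay in microliters, as stated in the text.
volumes_ul = {
    "UDP-Glc (80 mM)": 1.5,
    "Tris buffer": 27.5,
    "substrate (50 mM)": 1.0,
    "lysate": 20.0,
}
total_ul = sum(volumes_ul.values())

udp_glc_mm = final_concentration(80.0, 1.5, total_ul)
substrate_mm = final_concentration(50.0, 1.0, total_ul)
```

The component volumes sum to 50 µl, giving 1 mM substrate and 2.4 mM UDP-Glc, which is close to the approximately 2.5 mM quoted.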

AAT enzyme assay

[0527] Recombinant AAT assays using different donor and acceptor substrates were performed by mixing 7 µl of the cannabinoid acceptors (OA 92, CBGA 1, or heliCBGA 2, 1 mg ml⁻¹) with 58 µl of a potassium phosphate buffer (100 mM, pH 7.4), 5 µl of the acyl-CoA donors (butyryl-CoA, hexanoyl-CoA, iso-valeryl-CoA, or acetyl-CoA, 10 mM) and 30 µl of the enzyme solutions. The reactions were incubated at 30 °C for 3 h. Samples were extracted with 100 µl ethanol followed by vortexing and centrifugation. The supernatant was filtered and used for UPLC-qTOF analysis using a similar column, mobile phase and MS parameters as previously described for terpenophenols. Initial conditions were 40% B for 1 min, raised to 100% B until 14 min, held at 100% B for 3.8 min, decreased to 40% B until 18 min, and held at 40% B until 20 min for re-equilibration of the system. The flow rate was 0.3 ml min⁻¹, and the column temperature was kept at 35 °C.

[0528] The assay with the purified HuCBAT5 enzyme was performed by mixing 2 µl of the cannabinoid acceptors (OA 92, CBGA 1, heliCBGA 2, CBDA, Δ⁹-THCA or CBCA 15) with 2 µl of the acyl-CoA donors (butyryl-CoA, iso-butyryl-CoA, hexanoyl-CoA, iso-valeryl-CoA, or acetyl-CoA, 10 mM), 44 µl of a potassium phosphate buffer (100 mM, pH 7.4), and 2 µl of the purified HuCBAT5 enzyme solution. The reactions were incubated at 30 °C for 3 h. To stop the reactions, 50 µl ethanol was added to each tube, and the acylated metabolites were extracted and analyzed via UPLC-qTOF as for the terpenophenols (UPLC Method 1) in both MS and MS/MS modes. Extracted ion chromatograms using the major products were selected from the LC-MS/MS analyses as follows: cannabinoid acceptors without CoAs: OA 92 > 179.107; CBGA 1 and CBCA 15 > 191.107; heliCBGA 2 > 225.092; CBDA and Δ⁹-THCA > 245.154; acylated cannabinoids: OA 92 > 179.107; CBGA 1 > 231.102; heliCBGA 2 > 265.086; CBDA > 245.154; Δ⁹-THCA > 245.154; CBCA 15 > 191.107.

Transient expression of selected genes in N. benthamiana

[0529] Overexpression constructs of GFP (as negative control), CsOLS and CsOAC were generated using GoldenBraid cloning as described by Jozwiak et al. 2020 to a final vector of pAlpha2-Ubq10-CCD-Ter10. HuCoAT6, HuTKS4, and HuCBGAS were amplified and cloned in pAlpha2-NPTII-Ubq10-CCD-Ter10 vector digested with BsaI using ClonExpress II One Step Cloning kit (Vazyme). The full list of oligonucleotides used for cloning can be found in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023). All plasmids were sequenced and transformed into Agrobacterium tumefaciens strain GV3101 by electroporation. A. tumefaciens harboring the overexpression constructs were grown overnight at 28 °C in Luria-Bertani (LB) medium in the presence of kanamycin and gentamycin. Bacterial cells were collected by centrifugation, washed and resuspended in infiltration buffer (10 mM MES, 2 mM MgCl2, 2 mM Na3PO4, 0.5% glucose and 100 mM acetosyringone) to OD600 = 0.3. Equal volumes of A. tumefaciens suspension with different expression vectors were combined to obtain the desired gene combinations and incubated for 2 h at room temperature. The solutions were infiltrated into 4- or 5-week-old N. benthamiana leaves from the abaxial side using a 1-ml needleless syringe. Substrates (0.5 mM each) were infiltrated into the same leaf areas 2 days after initial infiltration, and leaves were collected for metabolite analysis after 24 h. Leaf samples were flash frozen and extracted as previously described with 300 µl methanol and analyzed on a similar UPLC system connected to an Orbitrap IQ-X Tribrid MS (Thermo Scientific, Bremen, Germany) using UPLC Method 2 in negative mode. The source parameters were: sheath gas flow rate, auxiliary gas flow rate and sweep gas flow rate: 45, 10 and 1 arbitrary units, respectively; vaporizer temperature: 300 °C; ion transfer tube temperature: 275 °C; spray voltage: 2.3 kV. The instrument was operated in full MS¹ with data dependent MS/MS (MS-dd-MS²). Data acquisition in full MS¹ mode was 60,000 resolution, the scan range 100-1000 m/z, normalized automatic gain control (AGC) target of 25% and a maximum injection time (IT) of 50 ms. Data acquisition in dd-MS² mode was with 15,000 resolution, a normalized AGC target of 20%, maximum IT of 150 ms, isolation window of 1.5 m/z and normalized collision energy of 40. Identification of metabolites was performed using analytical standards and/or products from in vitro UGT enzyme assays (Figs. 4D and 12B).

Heterologous expression in S. cerevisiae

[0530] For the expression of HuCoAT6, HuTKS4, CsOAC and HuCBGAS in S. cerevisiae, the CDSs were amplified, and the purified amplicons were inserted into a series of pESC (Amp^R) plasmids allowing simultaneous expression of two genes from one plasmid. HuCoAT6 and HuTKS4 were inserted using ClonExpress II One Step Cloning kit (Vazyme) into pESC-HIS plasmid linearized with SalI and SacI restriction enzymes, respectively. HuCBGAS and CsOAC were cloned in the same way into pESC-TRP plasmid linearized with SalI/SacI restriction enzymes, respectively. The full list of primers used for the cloning can be found in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023). pESC constructs were transformed into S. cerevisiae WAT11 using Yeastmaker yeast transformation system (Clontech). The inventors transformed yeast cells with combinations of pESC vectors allowing expression of all the four genes at once. Transformed yeast were grown on SD minimal media supplemented with appropriate amino acids and 2% glucose. Colonies were screened and the presence of the transgene was confirmed by colony PCR. For induction of gene expression, transformed cells were grown in 2 ml minimal medium with 2% glucose and after 24 h transferred to a minimal medium with 2% galactose without additional supplementation or supplemented with GPP (0.21 mM) and either sodium hexanoate (1 mM) or OA 92 (0.2 mM), and grown for an additional 24 h at 30 °C. Cultures were transferred to 2 ml Eppendorf tubes and centrifuged at 8,000 g for 1 min. The cell pellet was weighed, double the amount of glass beads (diameter 500 µm) and 500 µl of MeOH were added, and the cells were lysed using a bead beater at 22 Hz for 6 min. Lysed cells were centrifuged at 14,000 r.p.m. for 5 min, and the clear supernatant was collected and dried using SpeedVac. Dry residues were dissolved in 100 µl of methanol, filtered through a 0.22 µm filter and analyzed on LC-MS as detailed for N. benthamiana samples.

EXAMPLE 1

H. umbraculigerum produces CBGA

[0531] As two earlier reports regarding the presence of cannabinoids, specifically CBGA 1, in H. umbraculigerum were contradictory, the inventors decided to carry out comprehensive chemical profiling of cannabinoids in various H. umbraculigerum tissues. The inventors confirmed that CBGA 1 is a major component of H. umbraculigerum, accumulating up to 4.3% on a dry weight basis in leaves (Figs. 1C-1D), comparable to the maximum concentrations typically measured in inflorescences of Cannabis chemotypes (Fig. 1D). CBGA 1, its phenethyl analog heliCBGA 2, and pre-amorphastilbol (APHA, 3), the stilbene form of heliCBGA 2, represent three of the major peaks in the total ion chromatogram of an ethanolic extract of fresh leaves (Figs. 1C-1E, and 7A).

[0532] The inventors predicted that CBGA 1 and heliCBGA 2 biosynthesis originates from hexanoic acid and phenylalanine, respectively (Fig. 1A). Therefore, the inventors fed H. umbraculigerum leaves with unlabeled and stable isotopically labeled hexanoic acid (hexanoic-D11 acid) or phenylalanine (phenylalanine-D5 or phenylalanine-¹³C9) and compared the labeled versus non-labeled masses and their respective tandem mass spectrometry (MS/MS) spectra (Fig. 7B). Consequently, newly derived isotopologues were detected as co-eluting chromatographic peaks (unlabeled and labeled forms) with mass shifts and MS/MS fragmentation patterns corresponding with the isotopically-labeled parts of the molecule. These findings validated the existence of the alkyl and aralkyl cannabinoids in H. umbraculigerum and confirmed that their biosynthesis derives from the polyketide and phenylpropanoid pathways, respectively. Feeding experiments revealed the presence of some additional major prenyl-acyl-phloroglucinoids, prenylchalcones and prenylflavanones with similar chemical formulas as 1-3. Based on previously identified core structures and each metabolite MS/MS fragmentation spectra, the inventors assigned these peaks to the structures shown in Fig. 1E [see also Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)].
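The mass shift expected between an unlabeled metabolite and its labeled isotopologue can be estimated from the number of heavy atoms retained. The sketch below is illustrative only: the function name is hypothetical, and how many labels survive in a given product depends on the biosynthetic route, so the label counts used here (up to eleven deuterium atoms from a perdeuterated hexanoic acid chain, nine ¹³C atoms from phenylalanine-¹³C9) are assumptions rather than values taken from the specification.

```python
# Monoisotopic mass differences between heavy and light isotopes (Da).
D_MINUS_H = 2.014101778 - 1.007825032   # deuterium minus protium
C13_MINUS_C12 = 13.003354835 - 12.0     # carbon-13 minus carbon-12

def label_shift(n_deuterium=0, n_carbon13=0):
    """Expected monoisotopic mass shift (Da) for the retained labels."""
    return n_deuterium * D_MINUS_H + n_carbon13 * C13_MINUS_C12

# Assumed label counts; real products may retain fewer labels.
shift_d11 = label_shift(n_deuterium=11)
shift_13c9 = label_shift(n_carbon13=9)
print(f"11 x D:   +{shift_d11:.3f} Da")
print(f"9 x 13C:  +{shift_13c9:.3f} Da")
```

A labeled peak is then expected to co-elute with the unlabeled one at an m/z offset of the computed shift divided by the charge.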

EXAMPLE 2

Cannabinoids accumulate in glandular trichomes

[0533] The inventors employed various high-resolution imaging technologies to examine if, like Cannabis, H. umbraculigerum develops and accumulates cannabinoids in glandular trichomes. The inventors found that in flowers, the involucral bracts of the capitula had numerous non-glandular and glandular trichomes. In individual florets, glandular trichomes were particularly abundant on the tips of the corolla lobe (Figs. 8A-8B). In leaves, both the adaxial and abaxial surfaces were densely covered with both non-glandular and glandular trichomes (Fig. 1F). The glandular trichomes were slightly elevated from the epidermis and consisted of a biseriate stalk and a globose head (Fig. 8C). Two disk cells (DCs) were observed in the subcuticular space of the globose head (Fig. 1G). In Cannabis, cannabinoid biosynthesis takes place in these cells. The multicellular biseriate structure of the trichomes further consisted of two basal cells (BCs, not always observed), stalk cells (SCs), neck cells (NCs), and a secretory cavity (SCv) (Fig. 1H). DCs of trichomes at the secretion stage showed exudation of electron transparent secretions from plastids into vesicles, followed by exocytosis of their contents into the periplasmic space (PSP), where they accumulated prior to secretion into the SCv (Figs. 1I and 2D-2F).

[0534] Next, the inventors applied matrix-assisted laser desorption/ionization-mass spectrometry imaging (MALDI-MSI) to spatially localize cannabinoids in H. umbraculigerum. The inventors first analyzed the abaxial and adaxial leaf surfaces following partial removal of trichomes (Figs. 8G-8H, imaging m/z [M+H]⁺ = 361.237 Da corresponding to CBGA 1 and geranylphlorocaprophenone 4). As shown, metabolites were detected in the intact parts, while areas with partially or fully removed trichomes showed less or no signals, respectively. The inventors further analyzed cross-sections of H. umbraculigerum leaves and flowers. The inventors sectioned leaves crosswise so that trichomes on the adaxial and abaxial parts were exposed on each side (Fig. 2A). In flowers, the inventors sectioned the receptacle, exposing trichomes on the outer surface of the involucral bracts (Fig. 8I). As shown in Fig. 2B and Fig. 8J for the leaf and flower samples, respectively, CBGA 1 was found exclusively in glandular trichomes.

EXAMPLE 3

H. umbraculigerum produces both classical and novel cannabinoids

[0535] Cannabis produces various CBGA-type analogs with aliphatic chains of different lengths (one to seven carbons), derived from different linear short- and medium-chain fatty acids (FAs). The inventors observed in leaves of H. umbraculigerum several of these analogs, including cannabigerovarinic acid (CBGVA 9), cannabigerol butyric acid (CBGBA 10), cannabigerohexolic acid (CBGHA 11), and cannabigerophorolic acid (CBGPA 12), corresponding to three, four, six, and seven carbon-atom chains, respectively (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The inventors also observed two metabolites with similar masses and fragmentation patterns as CBGA 1 and CBGHA 11, which the inventors assigned as cannabinoids derived from branched FAs (13 and 14, respectively, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). These branched cannabinoids have not been identified in Cannabis. The inventors also found small amounts of CBCA 15 and its aromatic analog helichromenic acid (heliCBCA 16) and their hydroxylated forms (17 and 18, respectively, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)), and the isoprenyl-forms of CBGA 1 and heliCBGA 2 according to MS/MS fragmentation (CBPA 19 and heliCBPA 20, respectively, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The inventors did not detect Δ⁹-THCA- or CBDA-type cannabinoids in any of the tissues.

[0536] Some additional peaks exhibited MS/MS fragments and chemical formulas corresponding to one or two hydroxylations of the metabolites with five-carbon-atom chains, which were labeled following feeding with hexanoic-D11 acid (21-33, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Interestingly, hydroxylated amorfrutins were observed with similar fragmentation patterns as the cannabinoids (with m/z difference of 33.984 Da), suggesting similar chemical structures and enzymes associated with their metabolism (34-46, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The inventors purified from this group metabolite 26 and identified by NMR spectroscopy a new tetrahydroxanthane-type cannabinoid (12-OH-cyclocannabigerolic acid 26). According to its MS/MS fragmentation pattern, the inventors also putatively identified cyclocannabigerolic acid (cycloCBGA 47) and analogous amorfrutin types [12-OH-heli-cyclocannabigerolic acid (12-OH-helicycloCBGA 39) and heli-cyclocannabigerolic acid (helicycloCBGA 48), respectively, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)].
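The 33.984 Da offset between cannabinoids and amorfrutins is consistent with replacing a pentyl side chain (C5H11) by a phenethyl one (C8H9), a net change of C3H-2. The sketch below checks this with monoisotopic masses. It is illustrative only: the helper name is hypothetical, C22H32O4 is the known formula of CBGA, and the heliCBGA formula C25H30O4 is an assumption derived from that side-chain swap.

```python
import re

# Monoisotopic atomic masses (Da) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688

def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C22H32O4'."""
    return sum(MONO[el] * int(n or 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

cbga = mono_mass("C22H32O4")       # CBGA
heli_cbga = mono_mass("C25H30O4")  # heliCBGA (assumed formula)

cbga_mh = cbga + PROTON            # [M+H]+ ion
delta = heli_cbga - cbga           # aralkyl minus alkyl analog

print(f"CBGA [M+H]+ = {cbga_mh:.3f}")
print(f"heliCBGA - CBGA = {delta:.3f} Da")
```

Under these assumptions the [M+H]⁺ value comes out at about 361.237, matching the m/z imaged by MALDI-MSI in Example 2, and the analog offset at about 33.984 Da.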

[0537] According to the current feeding experiments, prenyl-acyl-phloroglucinoids, prenylchalcones, and prenylflavanones were derived from similar precursors as the cannabinoids and amorfrutins (49-91, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). A summary of the identified metabolites 1-91 appears in Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023).

EXAMPLE 4

Proposed cannabinoid biosynthetic pathway in H. umbraculigerum

[0538] The inventors postulated that the core cannabinoid pathway leading to CBGA 1 in H. umbraculigerum consists of similar types of enzymes and reactions as in Cannabis (Fig. 9). These include: an acyl-activating enzyme (AAE) for the activation of hexanoic acid into hexanoyl-CoA; a type III polyketide synthase (PKS) and a polyketide cyclase (PKC) to produce olivetolic acid (OA 92); and a membrane-bound aromatic prenyl transferase (PT) for geranylation of OA 92 to CBGA 1. In addition to CBGA 1 and other cannabinoids, the inventors propose that all the identified terpenophenols are produced via five parallel pathways (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). According to this scheme, cannabinoids and phloroglucinoids derive from a common linear or branched FA precursor activated via the same AAE enzyme. Amorfrutins and chalcones derive from cinnamic or coumaric acids, which originate from phenylalanine, and are also activated via an AAE enzyme (similar or different from the polyketide one). These activated intermediates can be further reduced by a double bond reductase (DBR) to form dihydro intermediates. The activated precursors are elongated using three malonyl-CoAs by one or more PKS-type enzymes, and further cyclized by the PKS in a Claisen reaction to form the phloroglucinoid or chalcone backbone, or in an aldol reaction assisted by a PKC to form the cannabinoids and amorfrutins. The fifth pathway employs a chalcone isomerase (CHI) enzyme that cyclizes chalcones to flavanones. All these intermediates are further prenylated by one or more PTs to form the different types of terpenophenols. Although most of the molecules enclosed are monoprenyls, other prenyl types were also observed. The terpenophenols can be further cyclized by berberine bridge-like enzymes (BBE-like) to produce cyclized metabolites like CBCA 15, cyclocannabinoids and cycloamorfrutins (26, 47, 39 and 48), and also cyclophloroglucinoids previously identified by Pollastro et al. (2017). Additional functional groups and rearrangements include hydroxylation, double bond isomerization or reduction and others. In support of these five pathways, the inventors identified in H. umbraculigerum the primary intermediates (before prenylation) from all the corresponding metabolic routes (92-101, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)).

EXAMPLE 5

Elucidation of the core cannabinoid pathway

[0539] To identify the enzymes responsible for cannabinoid biosynthesis in H. umbraculigerum, the inventors obtained a haplotype-resolved dual genome assembly using 44x PacBio HiFi reads, and 200 M reads of Illumina HiC chromatin interaction data (haploid size of ~1.3 Gb, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). After scaffolding, the N50 of the primary assembly was 174 Mb with eight scaffolds >10 Mb (Fig. 2C, and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The inventors also obtained RNAseq data using PacBio Iso-Seq, Illumina True-Seq, and Illumina UMI-aware 3’ Transeq of different tissues (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The genome was soft masked (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)) and gene models were obtained reaching BUSCO completeness values of 98.7% for the primary assembly and 99.3% for all transcripts, including those missing in the genome (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Based on Fig. 2B, the inventors expected that the biosynthetic genes would be highly expressed in trichomes. Weighted gene co-expression network analysis of H. umbraculigerum tissue transcriptomic data (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)) revealed a transcriptional module enriched in FA and terpenoid biosynthetic genes induced in trichomes and leaves (Figs. 2D-2E, and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). This module included two AAEs, three PKSs, one stress-related protein (potential PKC) and one PT (Fig. 2E and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Notably, three of these PKSs were also located in a tandem gene cluster consisting of seven enzymes of the same type (Fig. 2C). This region exhibited strong footprints of long terminal repeat (LTR) transposition activity, which might explain the observed patterns of gene duplication (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Overall, the inventors selected six HuAAEs, four HuPKSs, five HuPKCs and four HuPTs for further characterization (Fig. 3A and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The four selected PKSs showed subtle amino acid differences that would have been overlooked without the genomic sequence, and the inability to amplify the different variants from cDNA led the inventors to produce the genes synthetically.
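For readers unfamiliar with the N50 statistic quoted for the primary assembly, it is the length of the scaffold at which the cumulative length of scaffolds, taken from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that definition; the function name and the toy scaffold lengths are made up for illustration and are not data from the assembly.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")

# Toy scaffold lengths (arbitrary units), purely illustrative.
toy_scaffolds = [10, 8, 6, 4, 2]
print(n50(toy_scaffolds))
```

In the toy set the total is 30; the two longest scaffolds (10 + 8) already cover more than half, so the function returns 8.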

[0540] The first step in cannabinoid biosynthesis involves the formation of acyl-CoA thioesters by members of the AAE superfamily. As different acyl moieties are substrates for these enzymes, the inventors tested acetic, butyric, hexanoic, octanoic, cinnamic and coumaric acids. In vitro assays with purified recombinant proteins showed that HuAAE2 and HuAAE4 efficiently produced butyryl-CoA, and that HuAAE2 presented higher activity against acetic acid and formed acetyl-CoA (Figs. 3B and 10A). HuAAE6 (HuCoAT6) was the only enzyme with activities towards both medium chain alkyl (e.g., hexanoic and octanoic acids) and aralkyl (e.g., cinnamic and coumaric acids) precursors required for the five types of terpenophenols observed in H. umbraculigerum. Interestingly, while HuAAE4 belongs to the same clade as the most active Cannabis enzyme, HuCoAT6 is located within the clade of long-chain acyl-CoA synthetases (LACS, Fig. 11A and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)).

[0541] In Cannabis, the next step is performed by a coupled enzymatic reaction involving a CsOLS and the accessory protein CsOAC, resulting in the condensation of hexanoyl-CoA with three molecules of malonyl-CoA to yield OA 92. In in vitro assays, derailment of the unstable intermediates occurs producing additional by-products not naturally identified in plant extracts [olivetol, pentyl acyl diacetic acid lactone (PDAL) and hexanoyl acyl triacetic acid lactone (HTAL), Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)]. PDAL and HTAL are produced by spontaneous lactonization of the tri- and tetra-ketide unstable intermediates, whereas olivetol is produced by CsOLS in the absence of CsOAC in an aldol decarboxylation cyclization reaction resembling the production of resveratrol by a stilbene synthase (STS). When CsOAC is also present in the reaction, OA 92 is produced at the expense of olivetol. Here, the inventors cloned and expressed in E. coli HuPKS1-4, HuPKC1-5, CsOLS and CsOAC enzymes, and tested using hexanoyl-CoA and malonyl-CoA their ability to form OA 92 in coupled in vitro assays with all the possible combinations (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). In the absence of PKCs, all the HuPKSs produced the PDAL and HTAL by-products, while HuPKS1, HuPKS2 and HuPKS4 produced also olivetol (Figs. 3C and 10C). When the reactions were performed coupled to CsOAC, olivetol decreased and OA 92 increased, especially for HuPKS4 (HuTKS4) (Fig. 3C and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). However, in all the reactions, considerably smaller amounts of olivetol and OA 92 were observed compared to HTAL and PDAL (Fig. 3C). Interestingly, regardless of CsOAC, all HuPKSs produced the phloroglucinoid precursor phlorocaprophenone 95 (PCP), present in H. umbraculigerum (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023), and Fig. 3C). This suggested that the same HuPKS enzyme can carry out both the aldol and Claisen cyclization reactions. This phenomenon has been observed previously for CHS and STS enzymes producing different amounts of both naringenin and resveratrol, and PKSs producing different ratios of both resorcinolic-acid and phloroglucinoid products. Interestingly, the HuPKS protein sequences did not cluster with known resorcinolic-acid or phloroglucinoid producing PKSs such as CsOLS, Rhododendron dauricum orcinol synthase (RdOS) or Humulus lupulus valerophenone synthase (HlVPS) (Fig. 11B). None of the combinations including HuPKS1-HuPKS4 or CsOLS with the HuPKC enzymes (selected based on their expression profile and sequence homology to CsOAC) resulted in the formation of OA 92 (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). This suggests that the cyclization and possibly stabilization of the tetraketide intermediate is mediated by a different type of enzyme than in Cannabis. This was previously suggested to occur in Rhododendron dauricum in the production of orsellinic acid by RdOS and a yet to be identified PKC enzyme²⁰. Alternatively, H. umbraculigerum may contain another CsOAC homolog that the inventors did not characterize in this study.

[0542] In the next step, OA 92 or OA-derivatives are prenylated by aromatic PTs to form CBGA 1 and its derivatives. The inventors expressed four enzymes in yeast and purified the microsomal fractions used for enzymatic assays (HuPT1-4, Fig. 3D). The inventors examined an array of aromatic substrates and either geranyl pyrophosphate (GPP) or isopentenyl pyrophosphate (IPP) as the isoprenoid donors. All the HuPTs geranylated OA 92 and divarinolic acid (VA) to yield CBGA 1 and CBGVA 9, respectively. HuPT4 geranylated also the aromatic dihydro stilbenic acid (DHSA 93) and was the only enzyme that isoprenylated OA 92 and DHSA 93 (Fig. 3D). HuPT4 was also active with farnesyl pyrophosphate (FPP) yielding sesquicannabigerolic acid (SesquiCBGA, Fig. 10D). Kinetic assays of the HuPTs with GPP and OA 92 revealed that HuPT4 (HuCBGAS4) exhibited a smaller Michaelis-Menten Km value than the reported one from Cannabis CsPT4 (Figs. 3E and 10E). Interestingly, none of the HuPTs prenylated the phloroglucinoid or chalcone intermediates, and none of their sequences clustered with previously known terpenophenolic PTs (Fig. 3F).
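The significance of a smaller Michaelis-Menten constant (Km) can be illustrated with the standard rate law v = Vmax·[S]/(Km + [S]): at sub-saturating substrate concentrations, an enzyme with a lower Km runs closer to its maximum rate. The sketch below is illustrative only; the function name and all numeric values are hypothetical and are not kinetic parameters measured for HuPT4 or CsPT4.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = vmax * s / (km + s)."""
    if s < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * s / (km + s)

# Two hypothetical enzymes with equal Vmax but different Km (arbitrary units).
substrate = 10.0
low_km = mm_rate(substrate, vmax=1.0, km=10.0)   # substrate equals Km
high_km = mm_rate(substrate, vmax=1.0, km=50.0)

print(f"low-Km enzyme:  {low_km:.3f} of Vmax")
print(f"high-Km enzyme: {high_km:.3f} of Vmax")
```

When the substrate concentration equals Km, the rate is exactly half of Vmax, which is the usual operational definition of the constant.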

[0543] To gain more insight into the evolution of the pathway, the inventors searched for orthologous enzymes in Cannabis and in all other Asteraceae species with annotated genomes. To the best of the inventors' knowledge, these species do not accumulate terpenophenols. Similarly to the phylogenetic relationships observed for functionally tested enzymes (i.e., AAEs, PKSs and PTs, Fig. 11), the enzymes that enabled H. umbraculigerum to produce cannabinoids evolved independently in this lineage. Particularly for the PKS-type enzymes, multiple instances of gene duplication and subsequent specialization are likely to have occurred within this family. Interestingly, the PTs from Cannabis and the ones from H. umbraculigerum did not cluster in the same orthogroup, suggesting that they are derived from evolutionarily distant ancestors.

EXAMPLE 6

Decorated cannabinoids are formed by UGT- and BAHD- type enzymes

[0544] Glycosylated cannabinoids have not been reported to occur naturally in planta. Here the inventors identified glucosylated OA (Glc-OA 102) and glucosylated DHSA (Glc-DHSA 103) as well as glucosylated C3-C6 alkyl-chain intermediates (104-108), glucosylated CBGA (Glc-CBGA 109) and heliCBGA (Glc-heliCBGA 110), and their isoprenylated forms (Glc-CBPA 111 and Glc-heliCBPA 112) (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). All these metabolites exhibited neutral losses of 162.053 Da corresponding to hexose and similar fragments as the non-glucosylated compounds. Di-glucosylated metabolites were not identified in the extracts. In Arabidopsis thaliana uridine 5'-diphospho-glucuronosyltransferases (AtUGT89B1, AtUGT71B1, AtUGT75B1 and AtUGT71B2) catalyze the glycosylation of several hydroxybenzoic acids (HBA and DHBAs) which are structurally similar to OA 92 (Fig. 4A). The inventors selected thirteen gene candidates in H. umbraculigerum based on sequence similarity to these proteins and positive correlations between gene expression and the accumulation of glucosylated metabolites (HuUGT1-13, Figs. 4A-4B, and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)).

[0545] Eleven of the thirteen UGTs from H. umbraculigerum were expressed in E. coli and examined for enzyme activity using OA 92, CBGA 1, and heliCBGA 2 in a reaction including uridine diphosphate glucose (UDP-Glc) as the sugar donor. Eight out of the eleven enzymes showed activity on the different substrates, including HuUGT1-2, HuUGT4-7, HuUGT11, and HuUGT13 (Fig. 12A). The production of Glc-OA 102 in the enzyme assays with UDP-Glc was supported by the NMR assignment of the glucose moiety. The inventors next purified the four most active enzymes (HuUGT1, HuUGT6, HuUGT11, and HuUGT13) and performed in vitro assays with an array of cannabinoid substrates, both natural and unnatural to H. umbraculigerum (Figs. 4D and 12B, Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The inventors also included enzymes from stevia and rice (SrUGT and OsUGT, respectively) reported to possess cannabinoid glycosylation activity despite these plants not producing cannabinoids. All enzymes were active with varying substrate specificity and products. For example, HuUGT1 and HuUGT6 were most active on the cannabinoids (HuCBUGT1 and HuCBUGT6), while HuUGT11 (HuOAUGT11) and HuUGT13 were highly active on the cannabinoid intermediates while almost inactive on the prenylated metabolites. Di-glucosylation of acid metabolites was only observed in case of HuCBUGT6, while olivetol, cannabidiol (CBD) and cannabigerol (CBG) were di-glucosylated by different HuUGTs depending on the metabolite (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Interestingly, the UGTs from H. umbraculigerum also glucosylated the phloroglucinoid and flavonoid precursors naturally present in the plant (Fig. 12B), although the glucosylated forms were not observed in the plant extracts. Kinetic assays of HuOAUGT11, HuUGT13, OsUGT, and SrUGT showed highly significant catalytic activity of HuOAUGT11 with OA 92 and UDP-Glc as compared to all other enzymes (Figs. 4C and 12C). HuOAUGT11 was also co-expressed with other cannabinoid-related enzymes (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)) and is therefore the most likely enzyme responsible for the large quantities of Glc-OA 102 and Glc-DHSA 103 produced in H. umbraculigerum (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)).
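The 162.053 Da neutral loss used to recognize glucosylated metabolites corresponds to an anhydrohexose unit (C6H10O5). The sketch below shows one way such a loss could be screened for in an MS/MS fragment list. It is illustrative only: the function name, the tolerance, and the precursor and fragment m/z values are hypothetical and are not measurements from the specification.

```python
# Monoisotopic atomic masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}

# Anhydrohexose (C6H10O5) lost on cleavage of a hexoside.
HEXOSE_LOSS = 6 * MONO["C"] + 10 * MONO["H"] + 5 * MONO["O"]

def hexose_loss_fragments(precursor_mz, fragment_mzs, tol=0.005):
    """Return singly charged fragments lying one hexose loss below the precursor."""
    target = precursor_mz - HEXOSE_LOSS
    return [mz for mz in fragment_mzs if abs(mz - target) <= tol]

# Hypothetical negative-mode spectrum of a glucosylated acid.
precursor = 385.150
fragments = [223.098, 179.107, 161.045]

print(f"hexose neutral loss = {HEXOSE_LOSS:.3f} Da")
print(hexose_loss_fragments(precursor, fragments))
```

With these illustrative values only the fragment at 223.098 lies within tolerance of the expected aglycone ion.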

[0546] Previous reports identified in H. umbraculigerum isoprenylated O-acylated amorfrutins but not geranylated or alkyl-type ones which are also not found in Cannabis. Here the inventors identified a diverse group of O-acylated cannabinoids and amorfrutins including the O-acylated alkyl (113-130) and aralkyl (131-141) metabolites (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). The inventors hypothesized that the acyl group is derived from short- or medium-chain FAs (Fig. 9) and verified this using a precursor isotope-labeling approach (Fig. 13A). Most of the alkyl cannabinoids in this group had five-carbon-atom tails (according to labeling with hexanoic-D11 acid), and both alkyl and aralkyl metabolites comprised iso- or monoprenyls, and linear or branched short-chain O-acyl groups as shown by the specific labeling (Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). To confirm the identification of this group of metabolites, the inventors purified O-methylbutyryl-cannabigerolic acid (O-MeButCBGA 120) and O-methylbutyryl-helicannabigerolic acid (O-MeButheliCBGA 138) and confirmed their structure by NMR.

[0547] O-Acylation of specialized metabolites in plants is frequently catalyzed by BAHD-type alcohol acyl-transferase (AAT) enzymes. Therefore, the inventors selected fifteen H. umbraculigerum BAHD homologs, four of them co-expressed with other cannabinoid-related enzymes (Figs. 2E and 4B, and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Twelve of the fifteen AATs were expressed in E. coli and examined for their activity with butyryl- and hexanoyl-CoA as acyl donors, and CBGA 1 and heliCBGA 2 as acceptors. Only HuAAT5 and HuAAT14 showed activity towards these substrates (Fig. 13B). Phylogenetic analysis showed that these two enzymes clustered in clade IIIa representing BAHDs of diverse catalytic functions (Fig. 13C). HuAAT5 (HuCBAT5) produced larger amounts of products and was therefore selected for in-detail characterization with an array of acyl donors and acceptors. It accepted all acyl donors tested and acylated OA 92, CBGA 1, heliCBGA 2, and CBDA, giving rise to a single O-acyl-cannabinoid from each pair of substrates (Figs. 4E-4F and 14, and Berman et al., "Parallel evolution of cannabinoid biosynthesis"; Nature Plants 9 817-831 (2023)). Many of the cannabinoids produced were naturally observed in the plant (marked with an asterisk in Fig. 4E). On the other hand, the enzyme was inactive on Δ⁹-THCA and CBCA 15. It is therefore likely that it only acylates the hydroxyl in C5. In addition, O-acyl esterification in H. umbraculigerum was only observed on prenylated cannabinoids and amorfrutins and not on their intermediates.

EXAMPLE 7

In vivo reconstruction of the core cannabinoid pathway in heterologous systems

[0548] The inventors verified the in planta activity of the enzymes towards CBGA 1 by transiently co-expressing different combinations of HuCoAT6, HuTKS4, and HuCBGAS4, and the Cannabis CsOAC and CsOLS in N. benthamiana leaves. Following infiltration of the leaves with sodium hexanoate and GPP, the inventors observed the production of glycosylated forms of OA 92 (HuTKS4+CsOAC or CsOLS+CsOAC) and PCP 95 (only with HuTKS4, Figs. 5A and 15A-15B). This was consistent with previous studies reporting OA 92 glycosylation by endogenous enzymes in this plant. Interestingly, the inventors also observed glycosylated products of naringenin chalcone 97 with HuTKS4, suggesting that this enzyme can accept aromatic substrates in addition to aliphatic types (Figs. 5A and 15A-15B). However, the inventors did not observe CBGA 1 or its glycosylated forms with HuCBGAS4, likely due to the low availability of OA 92 and its rapid glycosylation in planta. When leaves expressing HuCBGAS4 were infiltrated with OA 92 and GPP, CBGA 1 and Glc-CBGA 109 were observed (Figs. 5B, 15A, and 15C).

[0549] The inventors also reconstituted the cannabinoid pathway by expressing the HuCoAT6, HuTKS4, CsOAC and HuCBGAS4 genes in S. cerevisiae. The inventors observed the production of OA 92, CBGA 1 and PCP 95 without precursor feeding (Figs. 5C, 15D, and 15E). Similarly to the in vitro assays, the inventors also observed peaks of HTAL and PDAL, which were not present in planta (Fig. 15F). When cells were supplemented with OA 92 and GPP, significantly larger amounts of CBGA 1 were produced (Fig. 5D).

[0550] Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
