Title:
CANNABICHROMENIC ACID SYNTHASE VARIANTS AND USES THEREOF
Document Type and Number:
WIPO Patent Application WO/2023/242195
Kind Code:
A1
Abstract:
The present invention relates to cannabichromenic acid synthase (CBCAS) variants that have increased enzymatic activity and thus allow efficient recombinant production of cannabichromenic acid (CBCA). The invention further relates to nucleic acids encoding this synthase, vectors comprising such nucleic acids, host cells that comprise the synthase variant, the nucleic acid or the vector, and methods for the production of CBCA using any of the synthase variant, the encoding nucleic acid, the vector or host cells described herein.

Inventors:
KAYSER OLIVER (DE)
THOMAS FABIAN JOHANNES (DE)
Application Number:
PCT/EP2023/065811
Publication Date:
December 21, 2023
Filing Date:
June 13, 2023
Assignee:
UNIV DORTMUND TECH (DE)
International Classes:
C12N1/16; C12N9/02; C12N15/52; C12P17/06
Domestic Patent References:
WO2021150636A1 2021-07-29
WO2022081681A1 2022-04-21
WO2022051433A1 2022-03-10
WO2015196275A1 2015-12-30
WO2022159589A1 2022-07-28
WO2020016287A1 2020-01-23
Other References:
ALTSCHUL, S.F.; GISH, W.; MILLER, W.; MYERS, E.W.; LIPMAN, D.J.: "Basic local alignment search tool", J. MOL. BIOL., vol. 215, 1990, pages 403 - 410, XP002949123, DOI: 10.1006/jmbi.1990.9999
ALTSCHUL, STEPHAN F.; THOMAS L. MADDEN; ALEJANDRO A. SCHAFFER; JINGHUI ZHANG; ZHENG ZHANG; WEBB MILLER; DAVID J. LIPMAN: "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", NUCLEIC ACIDS RES., vol. 25, 1997, pages 3389 - 3402, XP002905950, DOI: 10.1093/nar/25.17.3389
GLOVER, D. M.: "DNA cloning: a practical approach", vol. 1-3, 1985, IRL PRESS LTD
"Vectors: a survey of molecular cloning vectors and their uses", 1988, pages: 179 - 204
GOEDDEL, D. V.: "Systems for heterologous gene expression", METHODS ENZYMOL., vol. 185, 1990, pages 3 - 7
SAMBROOK, J.; FRITSCH, E. F.; MANIATIS, T.: "Molecular cloning: a laboratory manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
CLAASSENS ET AL.: "Improving heterologous membrane protein production in Escherichia coli by combining transcriptional tuning and codon usage algorithms", PLOS ONE, 2017
CHMIEL: "Bioprocess technology 1. Introduction to bioprocess engineering", 1991, GUSTAV FISCHER VERLAG, article "Bioprozesstechnik 1. Einfuhrung in die Bioverfahrenstechnik"
STORHAS: "Bioreactors and peripheral equipment", 1994, VIEWEG VERLAG, article "Bioreaktoren und periphere Einrichtungen"
"Manual of Methods for General Bacteriology", 1981, AMERICAN SOCIETY FOR BACTERIOLOGY
Attorney, Agent or Firm:
MAIWALD GMBH (DE)
Claims:

1. Cannabichromenic acid synthase (CBCAS) variant comprising or consisting of an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, at least 99% or at least 99.5% sequence identity over its entire length with the amino acid sequence set forth in SEQ ID NO:1 and comprising any one or more of the amino acid substitutions: C244W, M165K, V411A, R294A, and R294I, wherein the CBCAS variant has relative to the CBCAS of SEQ ID NO:1 an increased catalytic activity (for synthesizing cannabichromenic acid (CBCA) from cannabigerolic acid (CBGA) as a substrate).

2. The CBCAS variant of claim 1, wherein positions 114, 176, 292, 296, 354, 408, 413, 444, 448, and 484 of the amino acid sequence of SEQ ID NO:1 are invariable and/or positions 165, 244, 294, and 411 of SEQ ID NO:1 are either invariable or are substituted to C244W, M165K, V411A, R294A, or R294I.

3. The CBCAS variant of claim 1 or 2, wherein if truncated on the C-terminus, only the 7 most C-terminal amino acids, preferably only the 1-5 most C-terminal amino acids are deleted.

4. The CBCAS variant of any one of claims 1 to 3, wherein the CBCAS variant comprises 1, 2, 3, or all 4 of the amino acid substitutions C244W, M165K, V411A, R294A, and R294I.

5. The CBCAS variant of claim 4, wherein the CBCAS variant comprises the amino acid substitution C244W.

6. The CBCAS variant of any one of claims 1 to 5, wherein its amino acid sequence is that set forth in SEQ ID NO:1 comprising any one or more of the substitutions C244W, M165K, V411A, R294A, and R294I, preferably at least C244W.

7. The CBCAS variant of any one of claims 1 to 6, having a catalytic constant kcat of more than 0.02 s⁻¹, preferably at least 0.05 s⁻¹ or at least 0.1 s⁻¹.

8. The CBCAS variant of any one of claims 1 to 7, having a substrate affinity Km of 10 µM or higher.

9. The CBCAS variant of any one of claims 1 to 8, wherein increased activity relative to the CBCAS of SEQ ID NO:1 is at least 2-fold, preferably at least 3-fold, more preferably at least 5-fold or 10-fold increased.

10. Nucleic acid molecule encoding the CBCAS variant of any one of claims 1 to 9.

11. The nucleic acid molecule of claim 10, wherein the nucleic acid is DNA.

12. Vector comprising the nucleic acid molecule according to claim 10 or 11, preferably a plasmid vector.

13. Host cell comprising the CBCAS variant of any one of claims 1 to 9, the nucleic acid molecule of claim 10 or 11 or the vector of claim 12.

14. Method for the production of cannabichromenic acid (CBCA) comprising contacting the CBCAS variant of any one of claims 1 to 9 with a suitable substrate under suitable conditions that allow CBCAS enzymatic activity.

15. The method of claim 14, wherein the suitable substrate comprises cannabigerolic acid (CBGA).

Description:
CANNABICHROMENIC ACID SYNTHASE VARIANTS AND USES THEREOF

Field of the invention

The present invention is in the field of protein engineering and rational enzyme design. Specifically, it relates to cannabichromenic acid synthase (CBCAS) variants that have increased enzymatic activity and thus allow efficient recombinant production of cannabichromenic acid (CBCA). The invention further relates to nucleic acids encoding this synthase, vectors comprising such nucleic acids, host cells that comprise the synthase variant, the nucleic acid or the vector, and methods for the production of CBCA using any of the synthase variant, the encoding nucleic acid, the vector or host cells described herein.

Background of the invention

Terpenophenolic cannabinoids are the most prominent secondary metabolites in the annual herb Cannabis sativa L. In modern medicine, the two major cannabinoids, psychoactive Δ9-tetrahydrocannabinol (THC-C5) and non-psychoactive cannabidiol (CBD-C5), are widely used to treat neuropathic pain, convulsions, glaucoma, appetite loss, or childhood epilepsy, among others. Rare cannabinoids have received slightly less public and scientific attention, but cannabichromene (CBC-C5), the most abundant among them, is nonetheless the subject of several clinical studies for its anti-inflammatory, immunoprotective, anti-bacterial, and anti-fungal properties. In planta, cannabinoids are produced and stored in their carboxylic acid forms in the oily storage container of glandular trichomes. Δ9-Tetrahydrocannabinolic acid (THCA), cannabidiolic acid (CBDA), and cannabichromenic acid (CBCA) can be transformed to their bioactive neutral forms by heat decarboxylation. Structurally, CBC-C5 is an isomer of the psychoactive THC-C5 in which the isoprenyl moiety is oxidatively fused to the resorcinyl core. Because cannabis breeders have traditionally selected their plants for high THCA or CBDA levels, CBCA content rarely exceeds 0.5% in commonly cultivated variants.

As demand in clinics and for cosmeceuticals can be expected to increase, plant extraction may not meet expectations, and alternative ways of production are necessary. An industrial bioprocess could replace the rather inefficient extraction from plants and, if specific to CBCA, could avoid costly separation techniques. However, developing such a process requires a profound understanding of the corresponding enzyme, CBCA synthase (CBCAS), and research concerning CBCA synthase is in its infancy. The gene encoding CBCAS was identified recently, and the CBCA activity of the corresponding protein, expressed in Komagataella phaffii, was confirmed. It is remarkably similar to THCAS, with an amino acid identity of 93% and a nucleic acid identity of 96%. The latest kinetic studies by Laverty et al. (2019) showed that CBCAS has a higher substrate affinity (Km of 9.3 µM compared to 137 µM for THCAS); however, its catalytic capacity is significantly lower (kcat of 0.02 s⁻¹ compared to 0.2 s⁻¹ for THCAS). Reasons for these differences remain unclear. No crystal structure of the protein is available, since there are significant challenges caused by complex folding processes, requiring the incorporation of a covalent FAD, and at least six surface-exposed N-glycosylation sites, which result in heterogeneously mannosylated proteins in K. phaffii.

Summary of the invention

The inventors of the present invention have found a solution to the existing need for improved production of CBCA by identifying a modified CBCA synthase that has significantly increased catalytic activity. Specifically, the inventors relied on the 93% identity to THCAS to construct a protein homology model using a classical multistep homology modeling process including template alignment, backbone generation, loop modeling, side-chain modeling, as well as model optimization and validation. In addition to this classical model, a deep-learning-assisted prediction technology with AlphaFold2 has been used (Jumper et al., 2021). By training on PDB data, this deep learning algorithm seeks to essentially predict the most likely protein structure to appear as a new PDB structure submission. By using the latter technology to improve the results of classical modeling, for example, for improving the side-chain accuracy, regions of the model poorly represented within the closest template could be modeled very precisely. These models then allowed the identification of potential sites in the enzyme that may be involved in catalytic activity and are thus candidates for modification in an attempt to increase the catalytic activity of this enzyme.

The inventors successfully used these models to establish structure-function relationships for key residues within the catalytic pocket of CBCAS and to predict beneficial enzyme variants in a rational enzyme design approach. Using sequence similarity networks, it was found that CBCAS differs in key residues from a consensus sequence of an isofunctional protein cluster and this knowledge was employed to re-align CBCAS residues with the consensus and characterize the corresponding variants.

By using this approach, a 22-fold catalytic efficiency increase could be achieved in one CBCAS variant, and a set of several additional beneficial variants could be designed and successfully tested. With this increased CBCA output, the kinetic capacity of CBCAS is elevated to the level of its close relative THCA synthase, clearing the way for the biotechnological production of CBCA.

In a first aspect, the present invention is thus directed to a cannabichromenic acid synthase (CBCAS) variant comprising or consisting of an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, at least 99% or at least 99.5% sequence identity over its entire length with the amino acid sequence set forth in SEQ ID NO:1 and comprising any one or more of the amino acid substitutions: C244W, M165K, V411A, R294A, and R294I, wherein the CBCAS variant has relative to the CBCAS of SEQ ID NO:1 an increased catalytic activity (for synthesizing cannabichromenic acid (CBCA) from cannabigerolic acid (CBGA) as a substrate).

In various embodiments of this CBCAS variant,

(1) positions 114, 176, 292, 296, 354, 408, 413, 444, 448, and 484 of the amino acid sequence of SEQ ID NO:1 are invariable and/or positions 165, 244, 294, and 411 of SEQ ID NO:1 are either invariable or are substituted to C244W, M165K, V411A, R294A, or R294I; and/or

(2) if the variant is truncated on the C-terminus, only the 7 most C-terminal amino acids, preferably only the 1-5 most C-terminal amino acids, are deleted; and/or

(3) the variant comprises 1, 2, 3, or all 4 of the amino acid substitutions C244W, M165K, V411A, R294A, and R294I; and/or

(4) the variant comprises the amino acid substitution C244W; and/or

(5) the amino acid sequence is that set forth in SEQ ID NO:1 comprising any one or more of the substitutions C244W, M165K, V411A, R294A, and R294I, preferably at least C244W; and/or

(6) the variant has a catalytic constant kcat of more than 0.02 s⁻¹, preferably at least 0.05 s⁻¹ or at least 0.1 s⁻¹; and/or

(7) the variant has a substrate affinity Km of 10 µM or higher; and/or

(8) increased activity relative to the CBCAS of SEQ ID NO:1 is at least 2-fold, preferably at least 3-fold, more preferably at least 5-fold or 10-fold increased.

In another aspect, the present invention relates to a nucleic acid molecule encoding the CBCAS variant of the invention. The nucleic acid molecule may be DNA. Also encompassed are vectors comprising the nucleic acid molecule according to the invention, preferably in form of a plasmid vector.

In still another aspect, the present invention features a host cell comprising the CBCAS variant of the invention, the nucleic acid molecule of the invention or the vector of the invention.

The invention is also directed to a method for the production of cannabichromenic acid (CBCA) comprising contacting the CBCAS variant of the invention with a suitable substrate under suitable conditions that allow CBCAS enzymatic activity. The suitable substrate may comprise cannabigerolic acid (CBGA). Also contemplated is the use of the CBCAS variant of the invention, the nucleic acid encoding it or the vectors or host cells of the invention for the production of CBCA. The production of CBCA may be by recombinant means in a suitable host organism expressing the variant enzymes or in vitro by using the isolated and/or purified CBCAS variant of the invention.

Brief description of the figures

Figure 1 schematically shows the biosynthesis of Δ9-tetrahydrocannabinolic acid (THCA) and cannabichromenic acid (CBCA) by their respective enzymes in Cannabis sativa. In an oxidative cyclization the common precursor cannabigerolic acid is converted to THCA or CBCA, which can be transformed to the bioactive neutral cannabinoids Δ9-tetrahydrocannabinol (THC-C5) or cannabichromene (CBC-C5) by heat decarboxylation.

Figure 2: Template-based classical homology model of CBCA synthase with model quality assessment. A: Structural alignment of CBCAS homology model (grey) and template crystal structure of THCA synthase (light grey, PDB 3vte). The root mean square deviation (RMSD) between atoms of both structures is shown. B: Ramachandran plot of model residues, excluding glycine, proline and those immediately preceding prolines. Probability contours are based on a high-resolution protein reference set. 91.1% of residues in most favored regions, 8.4% in additional allowed regions, 0.4% in generously allowed regions, 0% in disallowed regions. C: Per-residue discrete optimized protein energy (DOPE) scores for the model (red line, CBCAS) and the reference (black line, THCAS), indicating the likelihood of a correct fold at a given position. DOPE scores can be associated with secondary structure motifs, of which sheets (vertical pattern), helices (diagonal pattern) and coils (no pattern) are shown. Less certain folds with high DOPE scores are characteristic for random coils. For most residues, the classical model shows slightly higher scores, although there are several loops where the opposite is true. The assessment suggests that the model quality is close to the template quality.

Figure 3: Deep-learning-assisted structure prediction of CBCA synthase with quality assessment. A: Structural alignment of AlphaFold generated prediction and crystal structure of THCA synthase (PDB 3vte). The root mean square deviation (RMSD) between atoms of both structures is shown. B: Ramachandran plot of structure residues, excluding glycine, proline and those immediately preceding prolines. Probability contours are based on a high-resolution protein reference set. 92.0% of residues in most favored regions, 8.0% in additional allowed regions, 0% in generously allowed regions, 0% in disallowed regions. C: Per-residue discrete optimized protein energy (DOPE) scores for the structure prediction (grey line, CBCAS) and for THCAS (black line), indicating the likelihood of a correct fold at a given position. DOPE scores can be associated with secondary structure motifs, of which sheets (vertical pattern), helices (diagonal pattern) and coils (no pattern) are shown. Less certain folds with high DOPE scores are characteristic for random coils. For most residues, the AlphaFold prediction shows better (lower) scores and managed to predict especially the loop at position 300 convincingly. The Ramachandran angles are also improved over the classical model. The assessment suggests that the model quality is higher than the template quality.

Figure 4: Enzyme activity of variants assessing structure-function relationships of CBCA synthase. CBCA content detected by HPLC-UV at 255 nm after 1 hour CBGA transformation assays at pH 4.85, normalized to wild type enzyme content. The analysis confirms that Y484, H114, C176 and W444 are essential residues for CBCAS, as was shown previously for THCAS. Results differ markedly for residues Y417 and H292, as well as for certain N-glycosylation sites, where CBCAS, in contrast to THCAS, was much less affected. Although far from the catalytic center of the enzyme, deleting eight C-terminal amino acids abolished CBCA activity.

Figure 5: Using sequence similarity networks (A) to establish a protein sequence consensus (B) from proteins that form an isofunctional cluster. The network consists of proteins containing Pfam motifs for both an FAD-binding domain and a BBE domain. Initially disorganized, the network was iteratively trimmed by increasing the stringency on protein relatedness. At an identity of 75% and a minimum alignment score of 200, isofunctional protein clusters formed. The cluster containing CBCAS (small green dot) was subjected to a multiple sequence alignment in order to obtain a consensus sequence of proteins isofunctional to CBCAS. An exemplary segment of the sequence logo is shown at the bottom.

Figure 6: SSN consensus variant C244W drastically improves CBCA synthesis. A: Excerpt of multiple sequence alignment of proteins very similar to CBCAS. Residues 230-260 are shown, containing the critical C244, which is a tryptophan in >99% of similar sequences (red arrow). B: HPLC-UV chromatograms highlighting CBCA content after CBGA conversion assays. CBGA elutes after 2.1 min and CBCA at 8.4 min. The variant C244W manages to convert almost all CBGA, in stark contrast to the wild type enzyme. C: CBCA content detected by HPLC-UV at 255 nm after 1 hour CBGA transformation assays at pH 4.85, normalized to wild type enzyme content. Aligning the CBCA synthase sequence to the SSN consensus resulted in a 22-fold CBCA activity increase.

Figure 7: Enzyme activity of variants suggested by sequence similarity network (SSN) consensus and functional hot spot analysis. CBCA content detected by HPLC-UV at 255 nm after 1 hour CBGA transformation assays at pH 4.85, normalized to wild type enzyme content. Lysine replacing M165 was beneficial to most similar enzymes, because of the formation of an additional hydrogen bond. Anchoring the superordinate sheet to a nearby helix improved CBCA activity 3.3-fold. Most variants suggested by the functional hot spot analysis of HotSpot Wizard did not facilitate enzyme production. The exception was V411A, where shortening the side-chain was beneficial.

Figure 8: Rational design: Two hydrophilic “gatekeeper” residues limit access to the catalytic center of CBCAS. A: Active site of CBCAS within the homology model in ribbon representation. Shown are the covalent cofactor FAD, precursor CBGA from protein-ligand docking, as well as two arginines at the immediate periphery of the catalytic center. B1: Surface rendering of the CBCAS catalytic center, colored by hydrophobicity. The bulky, hydrophilic arginines are visible at the top. B2: Surface rendering of the THCAS catalytic center, colored by hydrophobicity. THCA synthase with its significantly higher catalytic capacity features lysine and isoleucine at positions 294 and 296. C: Enzyme activity of variants designed to elucidate the significance of positions 294 and 296 (gray shading). CBCA content detected by HPLC-UV at 255 nm after 1 hour CBGA transformation assays at pH 4.85, normalized to wild type enzyme content. Replacing R294 with alanine or isoleucine has a significant positive impact on CBCA activity. Interestingly, leaving R294 unmodified while changing R296 abolishes all activity. The best results with 4.3-fold activity are obtained for the exact THCAS residues R294I_R296K. Bottom blue shading: Enzyme activity of the artificial disulfide bond that was beneficial to the C-terminal stability of THCAS. Although the same importance could be determined for the CBCAS C-terminus, this disulfide bond was detrimental to the CBCA activity.

Detailed description of the invention

"At least one", as used herein, refers to 1 or more, for example 2, 3, 4, 5, 6, 7, 8, 9 or more. In connection with the invention described herein, this information does not refer to the absolute amount of a feature or component, but to the type of the feature or component. Together with specified amounts, the specified amounts refer to the total amount of the correspondingly designated type of component.

Numerical values specified herein without decimal places refer in each case to the full specified value with one decimal place. For example, "99%" stands for "99.0%". The expressions “approximately” or “about” in connection with a numerical value refer to a variance of ± 10%, based on the specified numerical value, preferably ± 5%, particularly preferably ± 1%.

The terms “heterologous” or “recombinant” are used herein to indicate that the corresponding molecule does not occur naturally in the host organism. The heterologous or recombinant expression of one or more nucleotide sequence(s) in a host organism thus means that said host organism does not contain or express said nucleotide sequence(s) under natural conditions. This means that, in the host organism, it is possible to produce heterologous or recombinant proteins which would not be produced in the host organism under natural conditions. While the nucleotide sequences introduced into the host organism can be wild-type sequences from another organism, in the context of the present invention artificially created nucleic acid molecules are used that comprise at least one variation in the sequence that encodes for an amino acid substitution as described herein. In addition, the host cell can be altered/mutated in such a way that the expression of host genes or host nucleotide sequences is downregulated or switched off. The associated host proteins or associated functions of the downregulated or switched-off host genes can be replaced, altered, attenuated or boosted by the heterologously produced protein. Promoter sequences in the host organism can be modified, too, or activators or repressors can be introduced into the nucleotide sequence, the host genome or the expression vector in order to regulate the expression of the heterologous or recombinant nucleotide sequence.

“Modified” or “modification”, based on a nucleotide sequence or amino acid sequence or on a nucleic acid or protein/enzyme, means that the corresponding sequence is modified relative to the naturally occurring sequence (wild type), with the result that it is distinguishable therefrom. In particular, the modification is that a sequence can be mutated, for example by substitution, deletion or insertion. In the context of the present invention, the CBCAS variants disclosed comprise at least one amino acid substitution as defined herein.

The identity of nucleotide or amino acid sequences is determined by a sequence comparison. Said sequence comparison is based on the BLAST algorithm which is established in the prior art and commonly used (cf. for example Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215: 403-410, and Altschul, Stephan F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997): "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs"; Nucleic Acids Res., 25, pages 3389-3402) and is done in principle by assigning similar orders of nucleotides or amino acids in the nucleotide or amino acid sequences to one another. A tabular assignment of the relevant positions is referred to as an alignment. Another algorithm available in the prior art is the FASTA algorithm. Sequence comparisons (alignments), especially multiple sequence comparisons, are created using computer programs. Commonly used examples are the Clustal series, T-Coffee or programs based on these programs or algorithms. Sequence comparisons (alignments) are also possible using the computer program Vector NTI® Suite 10.3 (Invitrogen Corporation, 1600 Faraday Avenue, Carlsbad, California, USA) with the specified standard parameters, the AlignX module of which for sequence comparisons is based on ClustalW. Unless otherwise stated, the sequence identity specified herein is determined using the BLAST algorithm.

Such a comparison also makes it possible to state the similarity of the compared sequences in relation to one another. It is usually specified in terms of percent (sequence) identity, i.e., the proportion of identical nucleotides or amino acid residues at the same positions or positions corresponding to one another in an alignment. This means, for example, that amino acid sequences with a sequence identity of less than 100% are typically amino acid sequences in which one or more amino acids have been modified, for example added, removed, exchanged or modified in some other way, compared to an amino acid sequence serving as a reference. The broader term of homology includes conserved amino acid exchanges in amino acid sequences, i.e., amino acids having similar chemical activity, since they usually exercise similar chemical activities within the protein. Therefore, the similarity of the compared sequences can also be specified as percent homology or percent similarity. Identities and/or homologies can be specified over entire polypeptides or genes or only over individual regions. Homologous or identical regions of different nucleotide or amino acid sequences are therefore defined by matches in the sequences. Such regions often have identical functions. They can be small and comprise only a few nucleotides or amino acids. Such small regions often exercise essential functions for the overall activity of the protein. It may therefore be meaningful to base sequence matches only on individual, possibly small, regions. Unless otherwise stated, specified sequence identities or homologies in the present application are, however, based on the total length of the nucleotide or amino acid sequence specified in each case. 
This means, for example, that if a reference sequence has a length of 100 amino acids, each sequence to be compared having a sequence identity of, for example, 80% must have at least 80 identical amino acids at corresponding positions of the reference of 100 amino acids when both sequences are directly compared. Said 80 amino acids can be contiguous or noncontiguous. This means that the sequence to be compared must have at least a length of 80 amino acids. The remaining 20 amino acids can differ in the two sequences. A comparable definition of “sequence identity” can be applied to nucleotide sequences. Here, the term "identity" refers to identical nucleotides at corresponding positions.
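
The 80-of-100 example above can be sketched in Python. This is a minimal illustration only, assuming the two sequences have already been aligned (for instance by BLAST) and are supplied as equal-length strings in which `-` marks a gap. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from SEQ ID NO:1. The sketch counts identical residues at corresponding positions and divides by the full length of the reference, following the definition given in the text rather than any particular program's output format.

```python
def percent_identity(aligned_ref: str, aligned_query: str, gap: str = "-") -> float:
    """Percent identity of a query relative to the full (ungapped) reference length.

    Both arguments must come from the same pairwise alignment, so they have
    equal length and position i of one corresponds to position i of the other.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    # Length of the reference without alignment gaps (the "100" in the text).
    ref_length = sum(1 for r in aligned_ref if r != gap)
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    # Identical residues at corresponding positions (the "80" in the text).
    matches = sum(
        1 for r, q in zip(aligned_ref, aligned_query) if r != gap and r == q
    )
    return 100.0 * matches / ref_length


# Toy alignment (hypothetical sequences, for illustration only).
ref = "MKTAYIAKQR"    # 10 reference residues
query = "MKTAYLAK--"  # one exchange (I to L) and two missing C-terminal residues
identity = percent_identity(ref, query)
meets_80_percent = identity >= 80.0
```

In this toy case seven of the ten reference positions are identical, so the query would fall below an 80% threshold.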

In the context of this application, “wild type” refers to, for example, a cell, the genome of which is present in a state as has arisen naturally by evolution. The term is used both for the entire cell and for individual genes and proteins. Therefore, the term “wild type” does not cover those cells or those genes or proteins which have been altered at least in part by means of recombinant/gene-technology methods.

The above-described and further aspects, embodiments, features and advantages of the invention will become apparent to a person skilled in the art from studying the following detailed description and claims. In addition, any feature from one embodiment of the invention can be introduced into any other embodiment of the invention. Furthermore, it is self-evident that the examples contained herein are intended to describe and illustrate the invention, but do not restrict it, and the invention is especially not restricted to said examples.

The below-presented facts, subjects and embodiments which are described for the CBCAS according to the invention are also applicable to all other aspects of the invention, such as the nucleic acid molecule according to the invention, the vectors, the host cells, i.e. the recombinant organisms, and/or the disclosed methods and uses.

CBCAS variants

The following convention is used herein to describe substitutions affecting exactly one amino acid position (amino acid exchange): the naturally occurring amino acid is designated first in the form of the internationally customary single-letter code, followed by the associated sequence position and lastly the inserted amino acid. Multiple exchanges within the same polypeptide chain are separated from one another by slashes. In the case of insertions, additional amino acids are named after the sequence position. In the case of deletions, the missing amino acid is replaced by a symbol, for example an asterisk or a dash, or a Δ is given in front of the corresponding position. For example, C244W describes the substitution of cysteine at position 244 by tryptophan. This nomenclature is known to a person skilled in the field of enzyme technology. All positional numbering used herein refers to the amino acid sequence of SEQ ID NO:1, if not indicated otherwise.

The amino acid sequence of the modified CBCAS has, based on the numbering according to SEQ ID NO:1 and with respect to the amino acid sequence of SEQ ID NO:1, at least one amino acid substitution selected from C244W, M165K, V411A, R294A, and R294I. Optionally, 2, 3 or all 4 of these positions may be substituted accordingly. The invention thus covers variants in which the substitutions
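
The substitution notation described above can be read mechanically. The following Python sketch is an illustration under stated assumptions: it handles only simple exchanges of the form original residue, 1-based position, new residue (such as C244W) and slash-separated multiples, not the insertion or deletion forms. The names `Substitution`, `parse_substitution`, `parse_variant` and `apply_substitutions` are hypothetical, and the four-residue toy sequence stands in for a real sequence such as SEQ ID NO:1.

```python
import re
from typing import List, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_PATTERN = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    original: str     # single-letter code of the residue being replaced
    position: int     # 1-based position in the reference sequence
    replacement: str  # single-letter code of the new residue


def parse_substitution(code: str) -> Substitution:
    """Parse a single exchange such as 'C244W'."""
    match = _PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {code!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def parse_variant(label: str) -> List[Substitution]:
    """Parse slash-separated exchanges such as 'C244W/M165K'."""
    return [parse_substitution(part) for part in label.split("/")]


def apply_substitutions(sequence: str, subs: List[Substitution]) -> str:
    """Return a copy of the sequence with each exchange applied."""
    residues = list(sequence)
    for sub in subs:
        if not 1 <= sub.position <= len(residues):
            raise ValueError(f"position {sub.position} is outside the sequence")
        if residues[sub.position - 1] != sub.original:
            raise ValueError(
                f"expected {sub.original} at position {sub.position}, "
                f"found {residues[sub.position - 1]}"
            )
        residues[sub.position - 1] = sub.replacement
    return "".join(residues)


# Toy usage on a hypothetical four-residue sequence.
parsed = parse_substitution("C244W")
mutated = apply_substitutions("ACDC", parse_variant("C2W/D3K"))
```

Checking the original residue before replacing it guards against applying a label to a sequence with different numbering.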

(1) C244W and M165K;

(2) C244W and V411A;

(3) C244W and R294A;

(4) C244W and R294I;

(5) M165K and V411A;

(6) M165K and R294A;

(7) M165K and R294I;

(8) V411A and R294A;

(9) V411A and R294I;

(10) C244W, M165K and V411A;

(11) C244W, M165K and R294A;

(12) C244W, M165K and R294I;

(13) C244W, V411A and R294A;

(14) C244W, V411A and R294I;

(15) M165K, V411A and R294A;

(16) M165K, V411A and R294I;

(17) C244W, M165K, V411A, and R294A; or

(18) C244W, M165K, V411A, and R294I

are combined. Particularly preferred are all variants that comprise the C244W substitution, as this has been found to have the strongest impact on the increase in catalytic activity.
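
The eighteen combinations listed above follow from a simple rule: choose two or more of the five substitutions, but never both R294A and R294I, since they affect the same position. A short Python sketch, with the hypothetical helper names `position` and `enumerate_combinations`, reproduces that count under this assumption.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

SUBSTITUTIONS = ["C244W", "M165K", "V411A", "R294A", "R294I"]


def position(code: str) -> int:
    """Extract the sequence position from a label such as 'C244W'."""
    return int(code[1:-1])


def enumerate_combinations(
    subs: Sequence[str], min_size: int = 2
) -> List[Tuple[str, ...]]:
    """All combinations of at least min_size substitutions in which
    every sequence position is substituted at most once."""
    result = []
    for size in range(min_size, len(subs) + 1):
        for combo in combinations(subs, size):
            positions = [position(code) for code in combo]
            if len(set(positions)) == len(positions):
                result.append(combo)
    return result


combos = enumerate_combinations(SUBSTITUTIONS)
with_c244w = [combo for combo in combos if "C244W" in combo]
```

Under these assumptions the enumeration yields nine pairs, seven triples and two quadruples, matching items (1) to (18).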

In addition to these substitutions, the variant may comprise further substitutions, insertions or deletions relative to the amino acid sequence set forth in SEQ ID NO:1 as long as the resulting variant retains the defined sequence identity and has, relative to the wildtype sequence of SEQ ID NO:1 , an increased catalytic activity.

The variants may thus have at least 80 %, 81 %, 82%, 83%, 84%, 85 %, 86%, 87%, 88%, 89%, 90 %, 91 %, 92 %, 93 %, 94 %, 95%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99% or at least 99.5% sequence identity over their entire length with the amino acid sequence set forth in SEQ ID NO:1

In various embodiments of the CBCAS variants, positions 114, 176, 292, 296, 354, 408, 413, 444, 448, and 484 using the numbering of SEQ ID NO:1 are invariable. These positions have been found to be essential for catalytic activity, and their substitution or deletion may lead to an impairment of the activity of the enzyme. Similarly, positions 165, 244, 294, and 411 of SEQ ID NO:1 are preferably either invariable or are substituted to C244W, M165K, V411A, R294A, or R294I, as described above.
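These position constraints lend themselves to a simple automated check. The Python sketch below is hypothetical (the names `INVARIABLE`, `ALLOWED` and `check_substitutions` do not appear in the application); it takes substitutions as pairs of position and new residue in the SEQ ID NO:1 numbering and reports those that conflict with the embodiments described above.

```python
# Positions described above as invariable (numbering of SEQ ID NO:1).
INVARIABLE = {114, 176, 292, 296, 354, 408, 413, 444, 448, 484}

# Positions that are preferably unchanged or carry one of the named substitutions.
ALLOWED = {165: {"K"}, 244: {"W"}, 294: {"A", "I"}, 411: {"A"}}


def check_substitutions(substitutions):
    """Return a list of messages for substitutions that conflict with the constraints."""
    problems = []
    for position, residue in substitutions:
        if position in INVARIABLE:
            problems.append(f"position {position} is invariable")
        elif position in ALLOWED and residue not in ALLOWED[position]:
            allowed = ", ".join(sorted(ALLOWED[position]))
            problems.append(f"position {position}: {residue} is not among {allowed}")
    return problems


print(check_substitutions([(244, "W"), (165, "K")]))  # no conflicts expected
print(check_substitutions([(114, "A"), (294, "K")]))  # two conflicts expected
```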

As it has been found that the C-terminus of CBCAS is involved in its activity, the variants, if truncated on the C-terminus, lack only the 7 most C-terminal amino acids, preferably only the 1-5 most C-terminal amino acids. If the truncated variants comprise a truncation on the N-terminus, said limitation does not apply, although it may also be preferable that no more than 40, more preferably no more than 30, even more preferably no more than 20 or 10 terminal amino acids are deleted.

In various embodiments, the variants of the invention have a catalytic constant kcat of more than 0.02 s⁻¹, preferably at least 0.05 s⁻¹ or at least 0.1 s⁻¹. As the catalytic constant of the wildtype is 0.02 s⁻¹, this means that the variants have increased activity. The variant preferably also has a substrate affinity Km of 10 µM or higher. Again, this means that substrate affinity is at least as high as that of the wildtype enzyme. Generally, the variants of the invention preferably have increased activity relative to the CBCAS of SEQ ID NO:1 that is at least 2-fold, preferably at least 3-fold, more preferably at least 5-fold or 10-fold increased.
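The relation between the stated kcat thresholds and the fold increase can be made explicit with a short calculation. The Python sketch below assumes, for illustration only, that the fold increase in activity is expressed as the ratio of the catalytic constants; the function name is hypothetical, and the wildtype value of 0.02 s⁻¹ is the one given above.

```python
WILDTYPE_KCAT = 0.02  # s^-1, wildtype value stated in the description


def fold_increase(kcat_variant: float, kcat_wildtype: float = WILDTYPE_KCAT) -> float:
    """Ratio of the variant's catalytic constant to that of the wildtype."""
    if kcat_wildtype <= 0:
        raise ValueError("wildtype kcat must be positive")
    return kcat_variant / kcat_wildtype


for kcat in (0.05, 0.1, 0.2):
    print(f"kcat = {kcat} s^-1 corresponds to about {fold_increase(kcat):.1f}-fold")
```

Under this assumption, the preferred thresholds of 0.05 s⁻¹ and 0.1 s⁻¹ correspond to roughly 2.5-fold and 5-fold the wildtype value.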

The variants of the invention may comprise, consist essentially of or consist of the amino acid sequence of SEQ ID NO:1 with the substitutions as defined herein. This means that the variant has either the same length as the reference sequence of SEQ ID NO:1 or may comprise additional amino acids that are not present in the wildtype enzyme, particularly on either its C-terminus, its N-terminus or both. These additional sequences are principally not limited in length, but it can be preferred that they are not longer than 200, or 100, or 50 amino acids. In some embodiments, such additional amino acid sequences may comprise detectable labels or purification tags, such as a 6xHis tag.

Nucleic acids & Vectors

In various embodiments, the CBCAS variant is encoded by a nucleotide sequence which is codon-harmonized or codon-optimized for use in the desired host organism.

The nucleic acid molecule which encodes a modified CBCAS is preferably introduced into a host organism in the form of a vector or plasmid, for example by transformation, transduction, conjugation or a combination of these methods, preferably by means of transformation. Methods for transforming cells are established in the prior art and are well-known to a person skilled in the art. Heterologous expression is achieved especially by integration of the gene or the alleles into the chromosome of the host organism or with an extrachromosomally replicating vector.

In the context of the present invention, vectors are understood to mean elements consisting of nucleic acids, which elements contain a nucleic acid according to the invention as the characterizing nucleic acid region. They are able to establish it as a stable genetic element in a species or a cell line over multiple generations or cell divisions. Vectors are specific plasmids, i.e., circular genetic elements, used especially in bacteria or yeasts. In the context of the present invention, a nucleic acid according to the invention is cloned into a vector or can be such a vector. The vectors include, for example, those which originate from bacterial plasmids, viruses or bacteriophages, or predominantly synthetic vectors or plasmids containing elements of greatly differing origin. With the further genetic elements respectively present, vectors are able to establish themselves as stable units in the host cells in question over multiple generations. They can be present extrachromosomally as units of their own or integrate into a chromosome or chromosomal DNA.

Expression vectors comprise nucleotide sequences which enable them to replicate in the host cells containing them, preferably in microorganisms, particularly preferably in unicellular fungi, bacteria or yeasts, and to express a comprised nucleotide sequence there. Expression is influenced especially by the promoter(s) which regulate transcription. In principle, expression can be effected through the natural promoter originally located in front of the nucleic acid to be expressed, but also through a host cell promoter provided on the expression vector or else through a modified or a completely different promoter from another organism or another host cell. In the present case, at least one promoter is provided for the expression of a nucleic acid according to the invention and used for the expression thereof. Expression vectors can also be regulatable, for example by changing the culturing conditions or upon attainment of a certain cell density by the host cells containing them or by addition of certain substances, especially activators of gene expression. An example of such a substance is the galactose derivative isopropyl β-D-thiogalactopyranoside (IPTG), which is used as an activator of the bacterial lactose operon (lac operon). Another example is methanol, which acts in Pichia pastoris as an activator of the AOX1 gene, which encodes alcohol oxidase I. Furthermore, galactose can be used to regulate the Gal1 and Gal10 promoters in Saccharomyces cerevisiae. In contrast to expression vectors, the coding nucleotide sequence present is not expressed in cloning vectors.

Possible as plasmids or vectors are, in principle, all embodiments available to a person skilled in the art for this purpose. Such plasmids and vectors can, for example, be found in the brochures from Novagen, Promega, New England Biolabs, Clontech or Gibco BRL. Further preferred plasmids and vectors can be found in: Glover, D. M. (1985), DNA cloning: a practical approach, Vol. I-III, IRL Press Ltd., Oxford; Rodriguez, R. L. and Denhardt, D. T. (eds) (1988), Vectors: a survey of molecular cloning vectors and their uses, 179-204, Butterworth, Stoneham; Goeddel, D. V. (1990), Systems for heterologous gene expression, Methods Enzymol. 185, 3-7; Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989), Molecular cloning: a laboratory manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York. Suitable vectors are preferably those which are replicated in yeast cells. In a preferred embodiment, the vectors pESC (Agilent Technologies), pGAPZ A and/or pYES2 (Invitrogen, Darmstadt, Germany) or a modified form thereof can be used.

According to a preferred embodiment, the nucleic acid comprises or consists of a nucleotide sequence which encodes an above-described amino acid sequence and is, proceeding from the nucleotide sequence according to SEQ ID NO: 2, modified in such a way that (1) the above-described modified CBCAS is encoded and (2) it is optionally additionally adapted to the host organism by being codon-harmonized or codon-optimized.

Since a certain amino acid sequence can be encoded by multiple different nucleic acids because of the degeneracy of the genetic code, the invention includes all nucleotide sequences which can encode the CBCAS variants described herein. A person skilled in the art is capable of determining said nucleotide sequences unequivocally because, despite the degeneracy of the genetic code, defined amino acids must be assigned to individual codons. Proceeding from an amino acid sequence, a person skilled in the art can therefore ascertain without any problems nucleic acids encoding said amino acid sequence. Furthermore, in the case of nucleic acids according to the invention, it is possible, with respect to the wild-type or starting sequence, to replace one or more codons with synonymous codons (i.e., encoding the same amino acid). This aspect relates especially to the heterologous expression of the nucleic acids according to the invention. Since every organism, for example a host cell of a production strain, has a defined codon usage, a codon in a given organism may be translated less efficiently than a synonymous codon encoding the same amino acid.
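The degeneracy referred to above can be quantified. The following Python sketch builds the standard genetic code and counts how many different coding sequences encode a given peptide; the function names and the short example peptides are illustrative and not taken from the sequence listing.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid: str) -> list:
    """All codons of the standard code that encode the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_encodings(peptide: str) -> int:
    """Number of distinct coding sequences (without stop codon) for a peptide."""
    return prod(len(synonymous_codons(aa)) for aa in peptide)


print(synonymous_codons("C"))  # the two cysteine codons
print(count_encodings("MKV"))  # 1 (Met) x 2 (Lys) x 4 (Val)
```

Because the count is a product over all residues, the number of possible coding sequences grows exponentially with protein length, which is why the choice among synonymous codons discussed below leaves considerable freedom.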

“Codon optimization” of a nucleotide sequence is therefore preferably associated with a complete adaptation of the original nucleotide sequence to commonly used codons of the host organism. By contrast, “codon-harmonized” preferably describes an adaptation of the nucleotide sequence to the host organism while retaining a few rare codons of the original sequence. Since rare codons often have regulatory functions or may be involved in mRNA stability, it may be preferable to retain a few rare codons of the original organism, for example in order to increase the yield of active enzyme. An online tool for harmonization of sequences is, for example, available under “http://codonharmonizer.systemsbiology.nl/” (Claassens et al., Improving heterologous membrane protein production in Escherichia coli by combining transcriptional tuning and codon usage algorithms, PLoS One, 2017). The term “original organism” refers here to the organism from which the nucleotide sequence naturally originates. “Host organism” describes the organism into which the nucleotide sequence is introduced and in which it is expressed recombinantly.
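The difference between the two strategies can be sketched in Python as follows. This is a strongly simplified illustration and not the algorithm of the cited online tool: optimization replaces each codon by the most frequently used synonymous codon of the host, whereas harmonization is approximated here by choosing the host codon whose usage rank equals the rank of the original codon in the original organism. The usage fractions are invented placeholders covering only two amino acids and do not describe any real organism.

```python
# Placeholder usage fractions, invented purely for illustration.
ORIGINAL_USAGE = {"K": {"AAA": 0.3, "AAG": 0.7}, "C": {"TGT": 0.6, "TGC": 0.4}}
HOST_USAGE = {"K": {"AAA": 0.6, "AAG": 0.4}, "C": {"TGT": 0.65, "TGC": 0.35}}
CODON_TO_AA = {codon: aa for aa, usage in ORIGINAL_USAGE.items() for codon in usage}


def split_codons(sequence: str) -> list:
    """Cut a coding sequence into consecutive triplets."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def ranked(usage: dict) -> list:
    """Codons sorted from most to least used."""
    return sorted(usage, key=usage.get, reverse=True)


def optimize(sequence: str) -> str:
    """Use the most common host codon for every amino acid."""
    return "".join(ranked(HOST_USAGE[CODON_TO_AA[c]])[0] for c in split_codons(sequence))


def harmonize(sequence: str) -> str:
    """Keep the usage rank of each codon when moving from original to host organism."""
    result = []
    for codon in split_codons(sequence):
        amino_acid = CODON_TO_AA[codon]
        rank = ranked(ORIGINAL_USAGE[amino_acid]).index(codon)
        result.append(ranked(HOST_USAGE[amino_acid])[rank])
    return "".join(result)


print(optimize("AAATGT"), harmonize("AAATGT"))
```

In this toy case the lysine codon AAA is rare in the original organism; optimization keeps AAA because it is the most common lysine codon of the hypothetical host, while harmonization selects AAG so that the position remains a rarely used codon.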

On the basis of known DNA and/or amino acid sequences, it is possible for a person skilled in the art to produce the corresponding nucleic acids right up to complete genes via methods that are generally known nowadays, such as, for example, chemical synthesis or the polymerase chain reaction (PCR) in conjunction with standard methods in molecular biology and/or protein chemistry. Such methods are, for example, known from Sambrook, J., Fritsch, E.F. and Maniatis, T. 2001. Molecular cloning: a laboratory manual, 3rd Edition, Cold Spring Harbor Laboratory Press.

In various embodiments of the invention, the nucleic acid molecule according to the invention and as claimed herein is

(1) DNA; and/or

(2) an expression vector; and/or

(3) codon-harmonized for expression in a host organism, preferably in Saccharomyces cerevisiae, Kluyveromyces marxianus, Yarrowia lipolytica or Pichia pastoris, further preferably Saccharomyces cerevisiae or Pichia pastoris, especially Saccharomyces cerevisiae.

In various embodiments, some of the enzymes and precursors that are required for the synthesis of cannabichromenic acid may be missing in the host organisms used, especially in yeasts, such as Saccharomyces cerevisiae or Pichia pastoris. The missing enzymes or synthesis pathways can be introduced into the host organism in addition to the modified CBCAS according to the invention in order to form required precursors or substrates or to allow subsequent reactions. Furthermore, enzymes in the host organism can be exchanged or modified in order to increase their activity or stability. Ways of increasing enzyme activity have already been described further above. However, these methods and modifications are only of importance if the cannabichromenic acid is to be produced by recombinant cells. If the CBCAS variant is only to be expressed in the host cells, to be later isolated and/or purified and then used in separate methods for the production of cannabichromenic acid, such further modifications to the host cells may be unnecessary.

In embodiments, where the host organism further contains at least one further heterologous nucleic acid molecule, this may comprise a nucleotide sequence which

(1) encodes a prenyltransferase; and/or

(2) encodes a hexanoyl-CoA synthase; and/or

(3) encodes an olivetol synthase; and/or

(4) encodes an olivetolic acid cyclase, the host organism containing preferably at least 2, more preferably at least 3, further preferably all 4 such sequences.

All these enzymes may be modified to improve their usefulness. This means that the nucleic acid sequences used encode such modified enzymes and may also be codon-harmonized or codon-optimized for use in the selected host organism.

Modified prenyltransferases as well as suitable hexanoyl-CoA synthases, olivetol synthases and olivetolic acid cyclases are for example described in the international patent publication WO 2020/016287, which is hereby incorporated by reference in its entirety.

Host cells

In the aspects of the invention related to host cells, all cells, i.e., prokaryotic and eukaryotic cells, are in principle suitable as host organisms. These cells can be mammalian cells (such as, for instance, cells from humans), other animal cells (e.g., insect cells), plant cells or microorganisms such as yeasts, fungi or bacteria. Preference is however given to those host cells which can advantageously be used for biotechnology applications, for example with regard to the transformation with the nucleic acid or the vector and the stable establishment thereof. Furthermore, preferred host cells are distinguished by good microbiological and biotechnological manageability. This concerns, for example, easy culturability, high growth rates, low demands on fermentation media, and good production and secretion rates of foreign proteins. Furthermore, the proteins can be modified by the cells which produce them after production thereof, for example by attachment of sugar molecules (glycosylation), formylations, aminations, etc. Such post-translational modifications can functionally influence the enzymes.

Unicellular fungi, yeasts or bacteria are particularly preferred herein, most preferably yeasts. They are distinguished by short generation times and low demands on the culturing conditions. Cost-effective culturing processes or production processes can be established as a result. In addition, a person skilled in the art has a wealth of experience with unicellular fungi, yeasts or bacteria in fermentation technology.

Particularly suitable bacteria, yeasts or unicellular fungi are those bacteria, yeasts or unicellular fungi which are deposited as bacterial, yeast or fungal strains at the German Collection of Microorganisms and Cell Cultures GmbH (DSMZ), Braunschweig, Germany. Unicellular fungi, yeasts and bacteria which are suitable according to the invention belong to the genera which are present in the catalogs of the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH under http://www.dsmz.de.

Cells preferred according to the invention are those of the genera Aspergillus, Corynebacterium, Brevibacterium, Bacillus, Acinetobacter, Alcaligenes, Actinobacillus, Anaerobiospirillum, Basfia, Wolinella, Fibrobacter, Ruminococcus, Mannheimia, Lactobacillus, Lactococcus, Paracoccus, Candida, Pichia (also called Komagataella), Hansenula, Kluyveromyces, Saccharomyces, Escherichia, Zymomonas, Yarrowia, Methylobacterium, Ralstonia, Pseudomonas, Rhodospirillum, Rhodobacter, Burkholderia, Clostridium or Cupriavidus, particular preference being given to Aspergillus nidulans, Aspergillus niger, Alcaligenes latus, Bacillus megaterium, Bacillus subtilis, Brevibacterium flavum, Brevibacterium lactofermentum, Escherichia coli, Basfia succiniciproducens, Wolinella succinogenes, Fibrobacter succinogenes, Ruminococcus flavefaciens, Anaerobiospirillum succiniciproducens, Mannheimia succiniciproducens, Actinobacillus succinogenes, Saccharomyces cerevisiae, Kluyveromyces lactis, Kluyveromyces marxianus, Candida blankii, Candida rugosa, Corynebacterium glutamicum, Corynebacterium efficiens, Zymomonas mobilis, Yarrowia lipolytica, Methylobacterium extorquens, Hansenula polymorpha, Ralstonia eutropha, Rhodobacter sphaeroides, Paracoccus versutus, Pseudomonas aeruginosa, Acinetobacter calcoaceticus, Pichia pastoris (also called Komagataella phaffii), Thermoanaerobacter kivui, Acetobacterium woodii, Acetoanaerobium notera, Clostridium aceticum, Butyribacterium methylotrophicum, Clostridium acetobutylicum, Clostridium saccharoperbutylacetonicum, Clostridium beijerinckii, Clostridium butyricum, Moorella thermoacetica, Eubacterium limosum, Peptostreptococcus productus, Clostridium ljungdahlii, Clostridium carboxidivorans, Clostridium scatalogenes, Rhodospirillum rubrum, Burkholderia thailandensis and Pseudomonas putida.

In various embodiments, the host organism is a yeast, such as Saccharomyces cerevisiae, Kluyveromyces marxianus, Yarrowia lipolytica or Pichia pastoris, for example Saccharomyces cerevisiae or Pichia pastoris, especially Saccharomyces cerevisiae.

Host cells according to the invention can be altered with respect to their requirements for the culture conditions, can have different or additional selection markers or can also express other or additional proteins. The host cells can especially also be those which express multiple proteins or enzymes recombinantly.

The host organism can be contacted with the culture medium, cultured and fermented in a continuous or discontinuous manner in a batch process or in a fed-batch process or repeated fed-batch process for the purpose of producing cannabichromenic acid. A summary of the known culturing methods can be found in the textbook by Chmiel (“Bioprozesstechnik 1. Einführung in die Bioverfahrenstechnik” [Bioprocess technology 1. Introduction to bioprocess engineering] (Gustav Fischer Verlag, Stuttgart, 1991)) or in the textbook by Storhas (“Bioreaktoren und periphere Einrichtungen” [Bioreactors and peripheral equipment], Vieweg Verlag, Braunschweig/Wiesbaden, 1994). The culture medium to be used must appropriately meet the demands of the host cell strain in question. Descriptions of culture media for various microorganisms are provided in the manual “Manual of Methods for General Bacteriology” published by the American Society for Bacteriology (Washington D.C., USA, 1981). The product, preferably the cannabichromenic acid formed or a modified form thereof, can either be collected from the medium or be obtained by cell harvesting and subsequent cell disruption. A combination of the two methods is possible as well. The product formed is preferably obtained by cell harvesting and subsequent cell disruption. At least one sugar, preferably fructose, galactose or glucose, serves as the carbon source in the culture media used. In a preferred embodiment, the host organism is cultured under conditions in which glucose is used as the carbon source, further preferably as the only carbon source.

Organic nitrogen-containing compounds such as peptone, yeast extract, meat extract, malt extract, corn steep liquor, soybean meal and urea or inorganic compounds such as ammonium sulfate, ammonium chloride, ammonium phosphate, ammonium carbonate and ammonium nitrate can be used as nitrogen sources. The nitrogen sources can be used individually or as a mixture.

Phosphoric acid, potassium dihydrogen phosphate or dipotassium hydrogen phosphate or the corresponding sodium-containing salts can be used as the phosphorus source.

The culture medium can furthermore contain metal salts, such as, for example, magnesium sulfate or iron sulfate, which are necessary for the growth of cells.

Lastly, further substances, such as, for example, bases, amino acids, vitamins and/or trace elements, can additionally be added to the medium. Moreover, suitable precursors and substrates can be added to the culture medium. The stated starting materials can be added to the culture in the form of a single batch or appropriately fed in during culturing.

Basic compounds such as sodium hydroxide, potassium hydroxide, ammonia or ammonia water or acidic compounds such as hydrochloric acid, phosphoric acid or sulfuric acid can be appropriately used for pH control of the culture. Antifoam agents such as, for example, fatty acid polyglycol esters can be used to control foaming. Appropriate selectively acting substances such as, for example, antibiotics can be added to the medium to maintain the stability of vectors.

Culturing or fermentation is typically effected at a temperature in the range from 15°C to 45°C and preferably at 19°C to 37°C.

A further embodiment of the methods according to the invention further comprises the step of isolating cannabichromenic acid from the host organism.

In one of these embodiments, the cells of the host organism are lysed in order to isolate cannabichromenic acid from the host organism. The lysis can be done mechanically, for example by means of a French press, glass beads or homogenizer, or chemically. The cells used in the fermentation are preferably treated chemically with an organic solvent. The cell debris can then be separated from the extract. This can preferably be done by means of filtration, sedimentation, centrifugation or a combination of the methods. Any other method known to a person skilled in the art can be used here as well. The products can then preferably be purified, preferably by means of preparative chromatographic processes.

The following examples serve to illustrate the invention without restricting it to these specific embodiments.

Examples

Material and Methods

CBCA synthase sequence

The CBCA synthase genomic sequence has been proposed (Laverty et al., 2019) and patented (Page and Stout, 2019, 2015). A corresponding fragment was ordered as a yeast codon-optimized double-stranded nucleotide string from Invitrogen (Darmstadt, Germany). The nucleotide sequence encoding the natural signal peptide was replaced by the α-mating factor signal sequence, the peptide of which facilitates secretion of the protein during expression (complete nucleotide sequence set forth in SEQ ID NO:2).

Classical homology modeling

The standard MODELLER workflow for homology modeling with enabled heteroatoms was used to generate a model with an incorporated FAD cofactor (Eswar et al., 2008). The initial template search was performed against a dataset of all non-redundant PDB sequences at 95% sequence identity with an additional sequence length constraint of 100-2000 amino acids. The chosen template was 3vte, the THCAS crystal structure (Shoyama et al., 2008). After aligning both sequences, MODELLER generated 30 individual models and ranked them by their Discrete Optimized Protein Energy (DOPE) scores. The model with the lowest score was loaded into UCSF Chimera, where several loop refinements were performed to remodel loops with high DOPE scores.

AlphaFold v2.0 structure prediction

AlphaFold v2.0 was set up following the instructions and scripts provided at https://github.com/deepmind/alphafold. After installing Docker, all required genetic databases, and the model parameters, AlphaFold was executed using a 16 GB GPU with 32 GB of RAM. Using a FASTA file of each respective protein sequence and the monomer model of AlphaFold, models of THCAS, CBDAS and CBCAS were generated and downloaded as .pdb structures after Amber relaxation procedures (Case et al., 2021) had been carried out.

Protein model structure assessment

Root mean square deviations (RMSDs) were calculated by the Dali Protein Structure Comparison Server (Holm, 2020). UCSF Chimera (Pettersen et al., 2004) was used to export Ramachandran plots. Residue probability lines for Ramachandran plots were drawn after Lovell et al. (2003). From UCLA-DOE LAB SAVES v6.0, several protein structure assessment tools were used: ERRAT for overall structure quality, Verify3D for 3D to 1D sequence compatibility, as well as WHATCHECK and PROCHECK for a range of stereochemical quality parameters. Discrete Optimized Protein Energy (DOPE) per-residue scores were calculated using the ModEval Model Evaluation Server of SaliLab (Eramian et al., 2008). A parseable secondary structure overlay of cannabinoid synthases was also generated by the Dali Server.

Variant generation

CBCA synthase variants were cloned using a one-step site-directed mutagenesis protocol (Liu and Naismith, 2008). Following the primer design guidelines of that protocol, variants required the exchange of one or two nucleotides. Where possible, the yeast codon frequency of the newly introduced codon was kept similar to that of the original codon. The PCR protocol of the methods paper was carried out using Q5 polymerase (NEB, Frankfurt a.M., Germany), and 1 µL of the reaction was used to transform Escherichia coli DH5α cells. Plasmids were isolated (NucleoSpin Plasmid miniprep, Macherey-Nagel, Düren, Germany) and sent for routine sequencing. Table 1 shows a list of CBCA variants constructed and tested.
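Why such substitutions can require only one or two nucleotide exchanges is illustrated by the Python sketch below, which lists the codons of a target amino acid ordered by the number of nucleotide differences from a given starting codon. The starting codons used in the example (TGT, ATG, AGA) are assumptions made for illustration; the codons actually present in SEQ ID NO:2 are not reproduced here, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_changes(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


def mutagenesis_options(original_codon: str, target_amino_acid: str) -> list:
    """(changes, codon) pairs for the target amino acid, fewest changes first."""
    candidates = [c for c, aa in CODON_TABLE.items() if aa == target_amino_acid]
    return sorted((nucleotide_changes(original_codon, c), c) for c in candidates)


print(mutagenesis_options("TGT", "W"))      # Cys codon to Trp
print(mutagenesis_options("ATG", "K"))      # Met codon to Lys
print(mutagenesis_options("AGA", "A")[:1])  # Arg codon to Ala, closest option only
```

Among codons that need equally few changes, the description above additionally prefers the one whose yeast codon frequency is closest to that of the original codon; that selection step would require a codon usage table and is not part of this sketch.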

Komagataella phaffii expression strain

The expression strains used in this study were constructed as described previously (Zirpel et al., 2018). In brief, Komagataella phaffii Δade2 Δpep4 (PichiaPink™, Invitrogen, Darmstadt, Germany) was used to express CBCAS wild type and variants from a single genomic integration under the control of the methanol-inducible AOX1 promoter with a CYC1 terminator. The integration plasmids featuring CBCAS sequences and an adenine complementation marker were linearized and integrated into the genome of the adenine-deficient strain by electroporation. Plating on drop-out plates without adenine yielded white yeast colonies after 3-4 days of incubation at 30°C. The yeast strain additionally co-expresses CNE1p, FAD1p, and Hac1p. These proteins are involved in folding and ER quality control, cofactor availability, and the unfolded protein response signaling cascade, respectively (Zirpel and Kayser, 2018). It has been shown that these modifications dramatically increase the amount of correctly folded cannabinoid synthases in these cells.

Protein expression

Three fresh colonies of each variant expression strain were picked from the integration plates and used to inoculate 10 mL of YPD medium (2% (w/v) peptone, 1% (w/v) yeast extract, 2% (w/v) glucose) in 100 mL baffled shake flasks. These triplicate cultures were grown to saturation at 30°C and 200 rpm in 1-2 days. Subsequently, cells corresponding to an OD600 of 0.2 were transferred to YPD precultures in 100 mL baffled flasks at 10% filling volume. After 20 hours of incubation at 30°C and 200 rpm, biomass generation was achieved in 300 mL baffled flasks at 10% filling volume, again inoculating to an OD600 of 0.2. The biomass generation medium was BMGY (10 g L⁻¹ yeast extract, 20 g L⁻¹ peptone, 100 mM Bis-Tris-HCl buffer pH 5.8, 13.4 g L⁻¹ yeast nitrogen base, 0.4 mg L⁻¹ biotin, 1.5% (w/v) glycerol) and cells were incubated at 30°C and 200 rpm for 24 hours. Cells of each flask were centrifuged and resuspended in 30 mL BMMY expression medium (10 g L⁻¹ yeast extract, 20 g L⁻¹ peptone, 5 g L⁻¹ casamino acids, 100 mM Bis-Tris-HCl buffer pH 5.8, 13.8 g L⁻¹ yeast nitrogen base, 0.4 mg L⁻¹ biotin, 10 mg L⁻¹ riboflavin, 1% (v/v) methanol) and incubated at 15°C and 200 rpm for 110 hours. Methanol (0.5% v/v) was added to each flask every 24 hours to counteract metabolization and evaporation.

Table 1: List of CBCA synthase variants constructed and tested in this study.

CBCAS activity assays

The cells of each expression culture were separated from the culture supernatant by centrifugation at 20,000 g. Cells corresponding to a culture OD600 of 125 were disrupted mechanically with glass beads (0.75 - 1 mm) and vigorous shaking for 30 minutes. After separating the cell debris by centrifugation at 13,000 g for 30 minutes, the cell lysate supernatant was transferred to a fresh microcentrifuge tube. Two assays were performed for each culture of a variant triplicate (6 assays per variant), one with cell lysate and one with the culture supernatant obtained directly after expression. In a 100 µL volume, supernatant or lysate was adjusted to pH 4.85 with 1 M sodium acetate buffer to a final concentration of 50 mM. 300 µM CBGA in DMSO was added to each assay before incubation at 40°C and 1100 rpm shaking in a benchtop thermoshaker for 1 hour. Assays were stopped by adding 300 µL stop solution (90% acetonitrile, 10% formic acid) and incubation on ice for 15 minutes. After an additional centrifugation at 13,000 rpm for 30 minutes at 4°C, the supernatant was injected into the HPLC system.
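The volumes implied by this assay setup follow from the usual dilution relation c1 · V1 = c2 · V2. The Python sketch below is an illustration under the assumption that the 100 µL assay volume and the 300 µL of stop solution are simply additive; the function names are hypothetical.

```python
def stock_volume(final_concentration: float, final_volume: float,
                 stock_concentration: float) -> float:
    """Volume of stock needed so that final_volume has final_concentration.

    Concentrations must share one unit; the result has the unit of final_volume.
    """
    if stock_concentration <= 0:
        raise ValueError("stock concentration must be positive")
    return final_concentration * final_volume / stock_concentration


def dilution_factor(sample_volume: float, added_volume: float) -> float:
    """Factor by which a sample is diluted when added_volume is mixed in."""
    return (sample_volume + added_volume) / sample_volume


# 1 M (1000 mM) sodium acetate stock, 50 mM final concentration in 100 uL.
print(stock_volume(50.0, 100.0, 1000.0), "uL of buffer stock per assay")
# 300 uL stop solution added to a 100 uL assay.
print(dilution_factor(100.0, 300.0), "fold dilution before HPLC injection")
```

Under these assumptions, each assay contains 5 µL of the 1 M buffer stock, and analyte concentrations measured by HPLC are fourfold lower than in the assay itself.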

HPLC-UV cannabinoid detection

The cannabinoids were separated using an Agilent Infinity II 1260 system on a Poroshell 120 EC-C18 (2.1 x 100 mm, 2.7 µm diameter) column (Agilent, Waldbronn, Germany) heated to 40°C. At an isocratic flow of 0.7 mL min⁻¹ using 35% (v/v) H2O with 0.1% formic acid and 65% acetonitrile as mobile phase, CBGA eluted after 2.1 minutes and CBCA at 8.4 minutes. CBGA was detected at 225 nm and CBCA at 255 nm.

Sequence similarity network generation and trimming

The sequence similarity network was generated by the Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST). Two approaches were initially chosen, the first being the arrangement of a network from all proteins of the families Pfam 01565 and 08031 (FAD-binding and BBE-like domains). Secondly, a network was constructed from the 1000 first hits using the CBCAS sequence in a nucleotide BLAST. Proteins of both networks were combined and duplicate proteins removed, as well as all proteins shorter than 250 amino acids or longer than 750 amino acids. To achieve the isofunctional clusters required for this study, the network was gradually trimmed by increasing the stringency on the edges (the relationships between the protein nodes). Identity was >75%, and the minimum sequence alignment score was 200 when such clusters formed, and connections between the CBCAS cluster and other clusters disappeared. A total of 161 sequences, including CBCAS, remained in the cluster, which were used to perform a Clustal Omega multiple sequence alignment to determine the consensus sequence. Jalview was used to visualize the consensus logo, alongside conservation, alignment quality, and occupancy at each alignment position.
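The trimming logic described above can be sketched in Python as a filter on nodes and edges followed by a search for the connected cluster containing the query. This is not the EFI-EST implementation; the function names, the identifiers and all numbers in the toy network are hypothetical placeholders, and only the thresholds (length 250 to 750 amino acids, identity above 75%, alignment score of at least 200) come from the text.

```python
from collections import defaultdict, deque


def trim_network(lengths, edges, min_identity=75.0, min_score=200.0,
                 min_length=250, max_length=750):
    """Keep proteins within the length window and edges above both thresholds.

    lengths: dict mapping protein id to length in amino acids
    edges:   iterable of (id_a, id_b, percent_identity, alignment_score)
    """
    nodes = {p for p, n in lengths.items() if min_length <= n <= max_length}
    graph = defaultdict(set)
    for a, b, identity, score in edges:
        if a in nodes and b in nodes and identity > min_identity and score >= min_score:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def cluster_of(seed, graph):
    """All proteins connected to the seed through retained edges (breadth-first)."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Toy network with placeholder values.
toy_lengths = {"query": 545, "p1": 540, "p2": 530, "p3": 200, "p4": 500}
toy_edges = [
    ("query", "p1", 92.0, 600.0),
    ("p1", "p2", 80.0, 400.0),
    ("p2", "p4", 60.0, 350.0),     # removed: identity not above 75%
    ("query", "p3", 95.0, 300.0),  # removed: p3 is shorter than 250 residues
]
print(sorted(cluster_of("query", trim_network(toy_lengths, toy_edges))))
```

Raising `min_identity` or `min_score` step by step corresponds to the gradual increase in stringency described above, until the cluster containing the query is no longer connected to other clusters.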

Example 1: CBCA synthase homology model

For a classical modeling approach, MODELLER was used to generate the initial models, UCSF Chimera for loop and side-chain refinement, and CCDC GOLD for docking of ligands. First, the CBCAS sequence was given to MODELLER to identify suitable templates from a non-redundant dataset of PDB sequences at 95% sequence identity, using only sequences of length 100-2000 amino acids. As expected, the available THCAS crystal structure (PDB 3vte) was identified as the highest-ranking template and used for the subsequent alignment. MODELLER initially generated 30 CBCAS models from this alignment and ranked them using its Discrete Optimized Protein Energy (DOPE) method. The best model was subjected to several further optimization steps comprising loop refinement and side-chain optimization. The crystal structure of THCAS is missing three peptide loops in regions where the X-ray diffraction pattern was inconclusive. Consequently, there are breaks within the template structure (PDB 3vte) at the corresponding positions, and homology modeling may be difficult there. With UCSF Chimera, those three missing loops were re-modeled sequentially by generating 30 individual loops for each position and selecting the best one by DOPE scores.

Figure 2 shows the template-based classical homology model of CBCA synthase with model quality assessment and comprises a general assessment of the model quality by reference to its THCAS template 3vte. The backbone of the model structure, depicted in grey, is indistinguishable from the light grey reference structure through most parts of the 3D alignment. The only noticeable divergences are in flexible loop regions or in remodeled missing loops of the crystal structure, as seen in the top right of Figure 2A. At an RMSD of 0.7 Å, the reference structure is very accurately represented in the model and will be close to the real protein, barring any errors in the template structure. The overall model quality is evaluated further through Ramachandran plots (2B) and per-residue DOPE scores (2C). In the Ramachandran plot, 91.1% of all residues (that are not Gly, Pro, or pre-Pro) lie in the most favored regions, 8.4% in additional allowed regions, 0.4% in generously allowed regions, and no residues in any disallowed regions as determined by PROCHECK. These Ramachandran results are better than those for the crystal structure of THCAS, where only 86.8% of residues are in the most favored regions. In Figure 2C, model and template are compared by their per-residue DOPE scores, with lower scores indicating a more reliable fold. It is evident from the secondary structure, represented by graph shading (Figure 2C), that structural reliability is lowest for random coils (no shading), better for β-sheets (vertical pattern), and best for α-helices (diagonal pattern). The per-residue DOPE scores are comparable to the template scores and lie only marginally higher for specific complex loops, for example, around position 360. Overall, this evaluation suggests that the model is nearly as informative as the THCAS crystal structure used as its template.
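For orientation, the RMSD quoted above is the square root of the mean squared distance between equivalent atoms of two structures. The Python sketch below shows only this final calculation and assumes that both coordinate lists are already superimposed and list equivalent atoms in the same order; the structural alignment performed by the Dali server is not reproduced, and the function name and coordinates are illustrative.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Two toy "structures" of two atoms each, shifted by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, reference))
```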

Next, the open-source inference pipeline AlphaFold v2.0 was used to model the CBCAS 3D structure. From the input sequence, the neural network predicts the CBCAS structure referencing the entire PDB structure database. Once finished, AlphaFold gives every modeled residue a confidence score. Using the Local Distance Difference Test (lDDT) confidence score, the overwhelming majority of the residues achieve a “Very High (pLDDT > 90)” confidence, while a few loops are categorized as “Confident (90 > pLDDT > 70)”. By this assessment, model confidence is never “Low” or “Very low”. To compare the obtained model to the classical homology model as well as the THCAS crystal structure, the equivalent analyses were run as before and compiled in Figure 3. To evaluate the generated prediction, it was critically assessed in comparison to the published crystal structure. The overlay of both protein backbones in Figure 3A looks remarkably similar considering AlphaFold derived it just from the protein sequence. The RMSD between the THCAS crystal structure (orange) and the AlphaFold model (green) is higher than for the previous model. However, at 1.1 Å, it is still very low, considering the width of a carbon atom is 1.4 Å. In the Ramachandran plot (Figure 3B), it can be seen that 92% of residues are plotted in the most favored regions, 8% of residues in additionally allowed regions, and no residues in generously allowed or disallowed regions. In comparison to the previous classical model, the backbone dihedral angles of residues are closer to experimentally solved protein structures, indicating an improved approximation of the protein. Most notably, the per-residue scores of the DOPE analysis (Figure 3C), reflecting the likelihood of a correct fold at a given position in the structure, are generally higher than those of the reference 3vte crystal. The model achieves better values than the solved structure for specific loops (e.g., positions 90 and 300 in 3C).
Due to their high flexibility, these loop regions are challenging to resolve in crystals, which is why the deep learning algorithm might have an edge, having learned from hundreds of thousands of solved flexible loops. In summary, the validity indicators used here suggest that the predicted 3D structure generated by AlphaFold should be very close to an experimentally solved structure.
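The per-residue confidence categories mentioned above can be sketched as a small classifier. The band names and the cut-offs at 90, 70, and 50 follow the categories AlphaFold reports; how a score lying exactly on a boundary is assigned is an assumption of this sketch, and the example scores are hypothetical placeholders, not values from the CBCAS model.

```python
from collections import Counter

def plddt_band(score):
    """Map a per-residue pLDDT score (0-100) to a confidence band.
    Scores exactly on a cut-off are assigned to the lower band,
    which is an assumption of this sketch."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must lie between 0 and 100")
    if score > 90:
        return "Very high"
    if score > 70:
        return "Confident"
    if score > 50:
        return "Low"
    return "Very low"

def band_counts(scores):
    """Count how many residues fall into each confidence band."""
    return Counter(plddt_band(s) for s in scores)

# Hypothetical per-residue scores for illustration only.
example_scores = [97.1, 95.4, 88.0, 92.3, 74.5]
summary = band_counts(example_scores)
```

A summary of this kind makes it easy to check a statement such as "no residue is Low or Very low" for a given prediction.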

Comparing both homology models, classical and neural-network-assisted, most quality indicators suggest that the AlphaFold structure is superior. This result is convincing and comprehensible insofar as the existing THCAS crystal structure template is limited in its accuracy. The most reliable crystal structures are solved to atomic resolution (i.e., <1.2 Å, Fabiola Sanjuan-Szklarz et al., 2020). At the comparatively low resolution of 2.75 Å of the THCAS structure, there is some uncertainty in inferring exact side-chain positions from the electron density map. Thus, with decreasing resolution, a crystal structure gets increasingly inaccurate, especially in flexible loops. Additionally, by wwPDB validation scores, the published crystal structure compares mostly unfavorably to the PDB archive and lies several percentile ranks below structures of similar resolution. AlphaFold considers suitable structures from the entire PDB database for difficult positions, so the resolution or the quality of a single crystal structure is not as critical. Considering all mentioned quality data for both models, the informative value and accuracy are certainly high enough for the following structural analyses and rational protein engineering. For the reasons mentioned above, the AlphaFold prediction was primarily used over the classical model.

Example 2: Characterizing key structure elements of CBCA synthase

In a 3D protein structure alignment with THCAS, the established CBCAS models were critically assessed for any dissimilarities that may explain the mechanistic and kinetic differences between both enzymes. First, it was sought to confirm essential structure-catalysis relationships previously postulated for THCAS. The efforts were focused on the hypothetical catalytic residues (Y484, Y417, and H292), the FAD-binding residues (H114 and C176), two of the enzyme's N-glycosylation sites (N89 and N499), the hypothetical CBGA stabilization by W444, and the C-terminal stability. For the analysis, the wild type sequence and cloned CBCAS variants were expressed from single-copy genomic integrations in Komagataella phaffii cells. In vitro cell lysate activity assays were performed at enzyme optimum conditions (pH 4.85 and 40°C) by addition of the substrate CBGA. After protein precipitation and centrifugation, the assay supernatant was analyzed on an HPLC-UV system to detect the resulting CBCA.

Figure 4 describes the CBCA activity of the tested variants by plotting HPLC-UV-determined CBCA contents normalized to the wild type content. Several structure-catalysis relationships that were confirmed for THCAS are identical for CBCAS. The central catalytic residue that initially abstracts a proton from the C5 hydroxyl group of the resorcinyl ring (Figure 1, para-position of carboxyl group) seems to be Y484, similar to THCAS. The catalytic activity ceases in the corresponding variant, where tyrosine was changed to phenylalanine. A similar exchange at residue Y417, which might be considered an alternative candidate for the central catalytic base due to its orientation in the active site, retains the enzyme activity. Surprisingly, an exchange of histidine at position 292 to cysteine still retains a third of the wild type catalytic activity in the variant rather than completely abolishing it. This residue was found to be essential for THCAS as an important counterion to stabilize the CBGA carboxylate transition state towards THCA. Reaction mechanistic differences between THCAS and CBCAS are the likely explanation, and a diverging orientation of the common precursor CBGA within the active site of CBCAS is expected. On the other hand, the FAD-binding seems to be similar for both enzymes at H114 and C176, as suggested by the alignment of crystal structure and model. Regarding the glycosylation of CBCAS, it was reported previously that deglycosylated THCAS is more active, which motivated an extensive study of glycosylation site variants to improve enzyme activity. Zirpel et al. (2018) found that THCAS variants produced more THCA when specific N-glycosylation sites were modified and no longer glycosylated. In analogy to THCAS, the two most beneficial exchanges and their combination were tested for CBCAS. However, N89Q, N499Q, and their combination produced CBCA at 94% and 97% of the wild type enzyme, respectively.
For CBCAS, it was concluded that the impact of preventing glycosylation is less pronounced than for THCAS. From the obtained model structure, it seemed that the tryptophan at position 444 is involved in stabilizing the precursor within the catalytic site. Specifically, W444 might anchor CBGA through π-stacking interactions between both aromatic rings. The aim was to investigate this hypothesis by designing phenylalanine and leucine exchange variants. Both preserve the non-polar hydrophobic characteristics of the residue, while W444F is still aromatic and W444L is not. Both variants abolished the enzyme activity entirely, underlining the structural importance of the tryptophan but precluding any further conclusions about the importance of aromaticity. Additionally, it was determined that, as with THCAS, the C-terminus of the enzyme is very important for its catalytic activity. While deletions of three or five amino acids (variants C-terΔ3 and C-terΔ5 in Figure 4) achieved 106% and 103% of wild type activity, deleting eight residues prevented enzyme activity. Although far from the catalytic pocket, the C-terminus is critical for activity, similar to THCAS. Here, it was reasoned that, as for THCAS, a stabilization of the flexible C-terminus through an artificial disulfide bond could improve its robustness and catalytic activity. A corresponding variant was constructed and tested, which will be discussed later alongside other rational design variants.

This structure-catalysis analysis revealed many parallels between THCAS and CBCAS. The catalytic base Y484 and the FAD-binding residues H114 and C176 are identical, as is the apparent C-terminal instability. However, from a mechanistic perspective, the variant activities suggest differences, as expected from the dissimilarity of the enzyme products THCA and CBCA. Comparing the presented data with previously published results for THCAS, those differences become apparent in variants Y417F and H292C. Tyrosine at position 417 was irrelevant for CBCA activity, although its exchange brought THCA activity down to 40%. In contrast, the exchange of H292 retained 37% of CBCA activity, while THCA activities dropped to 5% (Shoyama et al., 2012). These results justify a closer investigation into CBGA binding and the differences in reaction mechanisms of THCA and CBCA formation. However, this is not the focus of this disclosure, which concentrates on structural insights and engineering of CBCA synthase. Once a sound structural understanding of CBCAS is established, it may be valuable to contrast it against THCAS in a broader approach. Another aspect that must be acknowledged is the structural importance of tryptophan W444, which is atypical for BBE-like enzymes. The positioning of W444 is spatially impossible in other BBE-like enzymes, where instead E417 resides, but surprisingly, W444 is essential for CBCA activity. The exact rotational position of this tryptophan is quite ambiguous and frequently differs between models. More precisely, when performing homology modeling or AlphaFold predictions multiple times with the same parameters, the side-chain rotation is different. This is atypical for a large tryptophan side-chain reaching into the catalytic pocket, which would normally be restricted by the residues around it. These outcomes can best be explained by a flexible side-chain that can change its rotational position within the active site.
The best molecular protein-ligand-docking results were consequently obtained for a flexible W444. It was hypothesized that this residue may guide the precursor CBGA towards the catalytic residues Y484 and H292.

Example 3: Using bioinformatics tools to suggest effective CBCAS variants

The main focus of these experiments was to go beyond the basic structure-catalysis relationships and towards enzyme engineering. The desired increase in CBCAS activity required careful assessment of the CBCAS structure, identifying potentially beneficial residues, and testing the resulting enzyme variants. A large-scale library screen of variants was not possible, because no high-throughput screening system is available for cannabinoid synthases. Screening by detection of hydrogen peroxide, which is conveniently released in stoichiometric amounts with CBCA to regenerate the cofactor, was considered. However, the cytosolic hydrogen peroxide concentration would disrupt accurate measurements, as would antioxidant defense mechanisms like catalases and peroxidases. Instead of library screens, it was chosen to rely on several bioinformatics approaches to suggest suitable variants.

First, it was narrowed down why the catalytic capacity of CBCAS was lagging behind that of other members of its protein family. The intention was to identify structural differences to similar enzymes that might explain its poor performance. A powerful approach to explore this question is to work with sequence similarity networks (SSNs). They feature a graphical representation of protein sequence relationships where proteins (nodes) are grouped into clusters by their relatedness (edges). The network was constructed using Pfam motifs for the FAD-binding domain (PF01565) and the BBE/BBE-like domain (PF08031), yielding several thousand proteins. The initially disorganized tangle of related proteins can be trimmed down by the elimination of weaker edges between protein nodes. Restricting the alignment length to a minimum of 100 residues, for example, will separate all protein nodes that do not meet this criterion.
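The trimming of a sequence similarity network described above can be sketched as follows: edges that fall below a threshold are discarded, and the remaining connected components form the clusters. This is a minimal illustration, not the tool used by the inventors; the `Edge` fields, function names, protein labels, and all numerical values are hypothetical.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    a: str            # first protein (node)
    b: str            # second protein (node)
    identity: float   # percent sequence identity of the pairwise alignment
    score: float      # alignment score
    length: int       # alignment length in residues

def trim_edges(edges, min_identity=0.0, min_score=0.0, min_length=0):
    """Keep only edges that meet every threshold."""
    return [e for e in edges
            if e.identity >= min_identity
            and e.score >= min_score
            and e.length >= min_length]

def clusters(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for e in edges:
        root_a, root_b = find(e.a), find(e.b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest cluster first, members sorted for a stable result.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))

# Hypothetical toy network.
nodes = ["CBCAS", "P1", "P2", "P3"]
edges = [
    Edge("CBCAS", "P1", 92.0, 450.0, 520),
    Edge("P1", "P2", 80.0, 300.0, 500),
    Edge("P2", "P3", 40.0, 120.0, 90),
]
trimmed = trim_edges(edges, min_length=100)
result = clusters(nodes, trimmed)
```

With a minimum alignment length of 100 residues, the weak edge to P3 is removed and P3 separates from the cluster that still contains CBCAS. The same function accepts identity and score thresholds, so stricter trimming only requires different arguments.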

As visualized in Figure 5A from left to right, the intention was to separate proteins that are less alike and keep the connections between highly similar proteins. This is done by increasing the stringency of the edge properties identity and minimal alignment score, so that all connections that do not reach the threshold are deleted. The network organizes itself into clusters containing proteins of high relatedness as the connections between the clusters get eliminated. Proteins still connected by edges form so-called isofunctional clusters, in which their members have the same catalytic function despite originating from a multitude of organisms. The network containing the CBCAS sequence (black dot in Figure 5A) organized itself into isofunctional clusters at 75% protein identity and a minimum alignment score of 200 (see Methods for full trimming of the network). 161 sequences were gathered from this cluster, a multiple sequence alignment (MSA) was run, and the consensus sequence of proteins similar to CBCAS was exported (Figure 5B). This analysis also revealed the frequency of occurrence, the alignment quality as determined by BLOSUM62, and the conservation of each aligned residue. The CBCAS sequence was compared to this consensus, and several interesting positions where it differed from the majority of similar sequences were found. Most notably, at position 244, the cysteine residue of CBCAS was a tryptophan in almost all other proteins (98.8% occupancy, Figure 6A). Here, a recent mutation event in C. sativa might have led to the altered sequence, which requires only a single nucleotide substitution. In this manner, further residues stood out as promising candidates for enzyme engineering, and M165K, M413L, and M413V were also included in the list of potentially interesting variants (see Methods, Table 1 for full details).
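The comparison of a query sequence against the consensus of an alignment can be sketched as follows. The sketch assumes a gapped alignment in which all rows have equal length, takes the most frequent non-gap residue of each column as the consensus, and defines occupancy as the fraction of non-gap residues in that column that match it; these definitions, the function name, and the toy sequences are assumptions for illustration and are not taken from the Methods.

```python
from collections import Counter

def consensus_deviations(query, others, min_fraction=0.9):
    """Return (position, query_residue, consensus_residue, fraction) for
    every column where the query differs from a consensus residue whose
    fraction among the other sequences is at least min_fraction.
    Positions are 1-based and count only non-gap query residues."""
    if any(len(seq) != len(query) for seq in others):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    position = 0
    for column, query_residue in enumerate(query):
        if query_residue == "-":
            continue  # gaps in the query do not advance its numbering
        position += 1
        residues = [seq[column] for seq in others if seq[column] != "-"]
        if not residues:
            continue
        consensus_residue, count = Counter(residues).most_common(1)[0]
        fraction = count / len(residues)
        if consensus_residue != query_residue and fraction >= min_fraction:
            hits.append((position, query_residue, consensus_residue, fraction))
    return hits

# Hypothetical toy alignment: the query carries C where all others carry W.
query = "MKCLA"
others = ["MKWLA", "MKWLA", "MRWLA", "MKWIA"]
deviations = consensus_deviations(query, others)
candidates = [f"{q}{pos}{c}" for pos, q, c, _ in deviations]
```

Each hit translates directly into a candidate exchange written in the usual notation (wild type residue, position, consensus residue), which is how a list such as the one above can be derived from an alignment.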

There are enzyme-engineering-focused web services available to identify potentially interesting target sites for amino acid exchanges. The solution that seemed most suitable for this project was the HotSpot Wizard 3.1 server for automated prediction of variants. The pipeline uses the given structure to identify essential residues, catalytic pockets, and tunnels. From an MSA, it then calculates “mutability” scores for each position and combines the gathered information into the probability of having found “functional hot spots”. These suggestions were critically assessed with the structure visualization and analysis suite UCSF Chimera to rank them by their expected efficacy. After considering proximity to the active site, hydrogen bond formation, and mutability score, five promising amino acid exchanges remained. Variants L345F, E408D, V411A, V411S, and T448G were constructed and tested.

Example 4: CBGA conversion assays reveal several variants with enhanced CBCA activity

The four SSN consensus variants and five functional hot spot variants were expressed in K. phaffii, and cell lysates were subjected to the CBGA assay test protocol mentioned above. After incubation with the substrate, inactivation of the enzyme, extraction of the obtained cannabinoids, and analysis by HPLC-UV, the CBCA content was compared. Because of its profound effect on enzyme activity, the data on enzyme variant C244W are shown first and separately from the remaining variants. In biological triplicates, the variant enzyme produced 22-fold more CBCA than the wild type, far exceeding the expectations (Figure 6C). In the 1 h CBGA conversion assays, variant C244W transformed most of the added precursor to CBCA (Figure 6B). This result is in stark contrast to the next best variant, which increases CBCA content fourfold (R294A, see Figure 7). Cysteine 244 is too far from the active site to influence reaction kinetics directly. Although no additional hydrogen bonds are formed in the immediate vicinity, the exchange for tryptophan seems to greatly benefit the enzyme structurally. This region is likely involved in the stabilization of the covalent FAD cofactor during protein maturation, and C244 could be detrimental there. Moreover, comparing the obtained results to the SSN consensus, most isofunctional enzymes from various organisms seem to feature this critical tryptophan residue for a good reason (Figure 6A).
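The comparison of variant and wild type CBCA contents can be expressed as a simple ratio of replicate means. The following sketch assumes that each replicate is summarized by one HPLC-UV-derived CBCA value; the function name and all numbers are illustrative placeholders and are not measured data from this disclosure.

```python
from statistics import mean

def relative_activity(variant_values, wild_type_values):
    """CBCA content of a variant relative to the wild type,
    calculated as the ratio of the replicate means."""
    wild_type_mean = mean(wild_type_values)
    if wild_type_mean <= 0:
        raise ValueError("wild type mean must be positive")
    return mean(variant_values) / wild_type_mean

# Placeholder triplicates in arbitrary units (not measured values).
wild_type = [10.0, 12.0, 11.0]
variant = [20.0, 22.0, 24.0]
fold_change = relative_activity(variant, wild_type)
percent_of_wild_type = 100.0 * fold_change
```

A value of 1.0 (100%) corresponds to wild type activity, so a variant reported at 168% has a ratio of 1.68 and a 22-fold improvement has a ratio of 22.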

Enzyme activities of the remaining variants suggested by the SSN analysis or the functional hot spot analysis are depicted in Figure 7. Regarding the SSN consensus variants, CBCAS_M165K resulted in a more active variant, as predicted. The lysine replacing the wild type methionine may form an additional hydrogen bond, anchoring its sheet to a nearby helix. This would enhance the overall stability in this region and could explain the increased enzyme activity. Methionine in position 413 reaches into the catalytic pocket and might therefore be involved in substrate binding. However, the most common alternatives in similar enzymes, leucine and valine, lead to decreased activities of the corresponding CBCAS variants. The lower section of Figure 7 shows five variants that were suggested as hot spots for enzyme engineering by HotSpot Wizard 3.1. With the exception of V411A, they were not particularly effective in this test system. V411A achieved a mean activity of 168% relative to the unmodified CBCAS. The exact structural reasons why shortening the side-chain (Ala) improved activity and incorporating a polar residue (Ser) abolished it remain unclear.

From the structural alignment of THCAS with the CBCAS models, two distinctly different residues were noticed at the entrance to the catalytic pocket (Figure 8A). By molecular protein-ligand-docking, it was possible to visualize the catalytic center through the position of the docked precursor CBGA, depicted in green. In its vicinity, the large and hydrophilic arginines R294 and R296 of CBCAS are lysine and isoleucine in THCAS, respectively. A surface rendering of both enzymes, colored according to the hydrophobicity of the residues, shows the presence of a bulky, hydrophilic region that narrows the entrance to the pocket in CBCAS (Figure 8B1, topmost grey region), but not in THCAS (Figure 8B2).

CBCA and its precursor CBGA are relatively hydrophobic molecules with logP values of 5.1 and 5.3, respectively. It is likely that two bulky hydrophilic “gatekeeper” residues negatively affect substrate coordination and product release. This could explain the deficit in CBCAS reaction kinetics as compared to THCAS. It was decided to test this hypothesis through a range of variants by installing neutral, as well as positively and negatively charged amino acids in positions 294 and 296. Figure 8C displays CBCA activities of these variants, relative to wild type enzyme activity, in the category “Catalytic pocket” (gray shading). Indeed, at residue R294, switching to the smaller, neutral amino acids alanine or isoleucine increased CBCAS activity almost fourfold. Surprisingly, every single substitution of R296 resulted in dysfunctional enzymes, be it neutral, positively, or negatively charged. This sharp contrast between the results obtained for both arginines indicates that R294 is more detrimental to CBCAS than R296 is. In addition, it seems that R296 has a mitigating function and is required as long as R294 is present. This hypothesis is substantiated by the results of variant R294I_R296K, showing the highest CBCA content of this approach. Here, the slightly shorter lysine instead of arginine in position 296 is advantageous, although it cannot exist alongside an R294 in a functioning enzyme. The remaining rational enzyme design variants of Figure 8C address the challenge of C-terminal stability. As discussed earlier (see chapter 3.2), variant P494C_R532C seeks to increase stability by introducing an artificial disulfide bond. However, no positive effect on enzyme activity was observed here, although a similar approach for THCAS resulted in a 1.7-fold increase in activity.

The extent to which it was possible to narrow the mutational space for engineering CBCA synthase toward improved CBCA production exceeded the expectations. Considering the SSN consensus analysis results, using any CBCAS sequence without a C244W exchange cannot reasonably be recommended. A 22-fold increase in activity is a remarkable achievement towards the engineering goal and dwarfs the next best variants. Nonetheless, several other sequence alterations also lead to a more active enzyme. Optimizations in enzyme structure and stability (M165K, V411A) and access to the catalytic pocket (R294A, R294I_R296K) stood out as most beneficial. Testing combinations of some or all of these modifications and characterizing different enzyme parameters beyond sheer activity remain as logical next steps for future research.

The analysis done by the inventors and described herein revealed structural similarities between Δ9-tetrahydrocannabinolic acid synthase (THCAS) and CBCAS, such as the shared central catalytic base Y484 and the FAD-binding sites H114 and C176. Some variants, however, performed differently for both enzymes and thus hint towards a divergent binding mode of the common precursor cannabigerolic acid (CBGA) within the active site. Besides structure-function considerations, the inventors also carried out sophisticated enzyme engineering towards improved CBCA activity. From a consensus sequence of 161 similar enzymes, several atypical residues within CBCAS were uncovered. Aligning these residues with the consensus of proteins that have all evolved toward similar functionalities had surprising implications. For the variant C244W, a 22-fold increase in CBCAS activity was confirmed, far exceeding the expectations. Over 99% of the analogous enzymes had the critical tryptophan at this position. A total of five positions within CBCAS were identified where amino acid substitution resulted in a significant elevation of CBCAS activity in the corresponding variants. Of these, two positions were found by rational design, comparing the immediate periphery of the catalytic centers of THCAS and CBCAS. R294 and R296 were identified as hydrophilic "gatekeeper" residues restricting access to the catalytic center of CBCAS. Replacement of both residues resulted in a fourfold increase in activity. It is expected that different combinations of the most advantageous mutations found could further increase CBCAS activity.