Title:
PROCESS FOR PREPARING URSODEOXYCHOLIC ACID
Document Type and Number:
WIPO Patent Application WO/2021/059182
Kind Code:
A1
Abstract:
A chemo-enzymatic process is described for preparation of ursodeoxycholic acid (UDCA) starting from deoxycholic acid (DCA), including preparation of an enzyme with high 7beta-hydroxylase activity for conversion of deoxycholic acid (DCA) into ursocholic acid (UCA), said conversion being the first step in said process.

Inventors:
GALDI GIANLUCA (IT)
RAPACIOLI SILVIA (IT)
VERGA ROBERTO (IT)
Application Number:
PCT/IB2020/058932
Publication Date:
April 01, 2021
Filing Date:
September 24, 2020
Assignee:
ICE S P A (IT)
International Classes:
C12N9/00
Domestic Patent References:
WO2013086499A2 2013-06-13
WO2018036982A1 2018-03-01
WO2016016213A1 2016-02-04
Other References:
ANONYMOUS: "UPI000ED72D3B", UNIPARC, 5 December 2018 (2018-12-05), XP055764543, Retrieved from the Internet [retrieved on 20210113]
J.-Y. LEE ET AL: "Contribution of the 7beta-hydroxysteroid dehydrogenase from Ruminococcus gnavus N53 to ursodeoxycholic acid formation in the human colon", THE JOURNAL OF LIPID RESEARCH, vol. 54, no. 11, 1 June 2013 (2013-06-01), pages 3062 - 3069, XP055174301, ISSN: 0022-2275, DOI: 10.1194/jlr.M039834
Attorney, Agent or Firm:
ADV IP S.R.L. (IT)
Claims:
CLAIMS

1. Enzyme with 7-β hydroxylase activity and SEQ ID N.1, wherein independently of one another: the amino acid at position 113 is a hydrophobic amino acid selected from A, E, F, I, L, P, Y, W, and V; the amino acid at position 110 is a hydrogen bond-forming amino acid selected from R, N, Q, S, and T; the amino acid at position 368 is I; the amino acid at position 365 is a hydrogen bond-forming amino acid selected from A, N, Q, S, and T.

2. The enzyme with 7-β hydroxylase activity of claim 1, wherein the amino acid at position 110 is R or S.

3. The enzyme with 7-β hydroxylase activity of claim 1 or 2, wherein the amino acid at position 365 is A or S.

4. The enzyme with 7-β hydroxylase activity of any of claims 1-3, wherein the amino acid at position 113 is V.

5. The enzyme with 7-β hydroxylase activity of any of claims 1-4, said enzyme having SEQ ID N.3.

6. The enzyme with 7-β hydroxylase activity of any of claims 1-5 wherein its amino acid sequence has at least 80% of sequence identity with SEQ ID N.1, preferably its amino acid sequence has a sequence identity of at least 85% with SEQ ID N.1.

7. Expression vector comprising the nucleotide sequence encoding for the enzyme with 7-β hydroxylase activity of any of claims 1-6.

8. A chemo-enzymatic process for the preparation of ursodeoxycholic acid (UDCA) comprising steps of: a) converting deoxycholic acid (DCA) into ursocholic acid (UCA), by using the enzyme with 7 beta-hydroxylase activity of claim 1, b) converting ursocholic acid (UCA) into 12K-UDCA, by using 12α-hydroxysteroid dehydrogenase (12α-HSDH) and lactic-dehydrogenase (LDH), and c) performing Wolff-Kishner reduction, thus obtaining UDCA.

9. The chemo-enzymatic process of claim 8, wherein step a) of converting DCA into UCA is performed by using a biomass of P. pastoris transformed with the expression vector of claim 7 encoding for the enzyme with 7 beta-hydroxylase activity.

10. The chemo-enzymatic process of claims 8 or 9, wherein in step b) a dehydrogenase 12α-HSDH is used, being selected from 12a_988, 12a_829, 12a_793, 12a_956, and 12a_698, and a lactic-dehydrogenase LDH is used, being selected from LDH_CUPNH, LDH_THECA, LDH_CAEEL, LDA_GEOSE, and LDH_LACPE.

11. The chemo-enzymatic process of claim 10, wherein in step b) a combination of lactic-dehydrogenase LDH_LACPE and dehydrogenase 12a_829, or a combination of lactic-dehydrogenase LDH_LACPE and dehydrogenase 12a_793, is used.

12. The chemo-enzymatic process of any of claims 8-11, wherein step b) is carried out at an incubation temperature of 24-30°C, at a phosphate buffer concentration of 0.1-1 M, and at a pH of 6.5-7.5, preferably at a temperature of about 27°C, at a pH = 7, and at a concentration of phosphate buffer of about 1 M.

13. The chemo-enzymatic process of any of claims 8-12, wherein in step b) the volume of LDH used is 3-6% v/v and the volume of 12α-HSDH used is 3-6% v/v.

Description:
“PROCESS FOR PREPARING URSODEOXYCHOLIC ACID”

DESCRIPTION

FIELD OF THE INVENTION

The present invention concerns a chemo-enzymatic process for preparing ursodeoxycholic acid (UDCA) by starting from deoxycholic acid (DCA), as well as an enzyme having a high 7beta-hydroxylase activity for converting DCA into ursocholic acid (UCA), said conversion being the first step of said process.

BACKGROUND ART

Ursodeoxycholic acid (UDCA) is a compound normally present in small quantities in human bile, where it increases the solubilizing capacity of bile in relation to cholesterol. UDCA, or 3α,7β-dihydroxy-5β-cholanoic acid, has the following formula:

[Structural formula of UDCA not reproduced]

UDCA is known for its therapeutic properties, for example in the treatment of hepatic disorders including cholesterol gallstones, primary biliary cholangitis, and sclerosing cholangitis.

It is consequently highly advantageous to have a method available for the industrial production of UDCA. A plurality of processes for the synthesis of UDCA are known in literature, of both chemical and enzymatic type. However, to date said processes have exhibited significant limitations, especially in terms of regioselectivity and final yield of UDCA.

It is therefore an object of the present invention to provide a process for the preparation of UDCA that is efficient, economical, and provides high yield and purity of UDCA.

SUMMARY OF THE INVENTION

Said object has been achieved by a chemo-enzymatic process for the preparation of UDCA, as described in the accompanying claims.

In another aspect, the present invention concerns an enzyme having a high 7β-hydroxylase activity, which can be advantageously used in the step of converting deoxycholic acid (DCA) into ursocholic acid (UCA) of said process.

In a further aspect, the present invention concerns an expression vector comprising the nucleotide sequence encoding said enzyme.

BRIEF DESCRIPTION OF THE FIGURES

The characteristics and advantages of the present invention will become evident from the following detailed description, the working examples provided for illustrative and non-limiting purposes, and the accompanying figures, wherein:

- Figure 1 shows regions of the amino acid sequence of the cytochrome identified for saturation mutagenesis testing, as per the examples. In grey, the mutagenesis target residues;

- Figure 2 shows Seq11, the amino acid sequence reconstructed with the mutations of the combined A and B clones. In grey, the two mutated amino acids;

- Figure 3 shows the chemical kinetics with different biocatalyst loading: 10%, 20%, 30% w/v. Conversion expressed as UCA produced (ppm) in 24h-48h-72h-96h;

- Figure 4 shows the sequence of the enzyme Seq73. In bold, the residues of random mutagenesis, as per the examples; and

- Figure 5 shows a comparison of the sequence of cytochrome g3484 with those of cytochromes exhibiting 7β-hydroxylase activity.

DETAILED DESCRIPTION OF THE INVENTION

An object of the present invention is therefore an enzyme having a 7-β hydroxylase activity and having SEQ ID N. 1, wherein independently of one another: the amino acid at position 113 is a hydrophobic amino acid selected from A, E, F, I, L, P, Y, W, and V; the amino acid at position 110 is a hydrogen bond-forming amino acid selected from R, N, Q, S, and T; the amino acid at position 368 is I; the amino acid at position 365 is a hydrogen bond-forming amino acid selected from A, N, Q, S, and T.

Preferably, in the enzyme of the present invention, the amino acid at position 113 is V.

Preferably, in the enzyme of the present invention, the amino acid at position 110 is R or S.

Preferably, in the enzyme of the present invention, the amino acid at position 365 is A or S.
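The residue constraints listed above can be read as a simple lookup per position. The following Python sketch is illustrative only: the function name `satisfies_constraints` and the dictionary input format are assumptions made for this example, while the allowed residue sets are copied from the text. The "parent" residues are those reported for unmodified cytochrome 2 in the examples (Glu110, Arg113, Leu365, Ala368).

```python
# Allowed one-letter residues per position, copied from the description.
ALLOWED = {
    110: set("RNQST"),
    113: set("AEFILPYWV"),
    365: set("ANQST"),
    368: set("I"),
}


def satisfies_constraints(residues):
    """Return True if every constrained position carries an allowed residue.

    `residues` maps 1-based positions to one-letter amino acid codes
    (a hypothetical input format chosen for this sketch).
    """
    return all(residues.get(pos) in allowed for pos, allowed in ALLOWED.items())


# One combination taken from the preferred options in the text.
preferred = {110: "S", 113: "V", 365: "A", 368: "I"}
# Residues reported for the unmodified cytochrome 2.
parent = {110: "E", 113: "R", 365: "L", 368: "A"}

print(satisfies_constraints(preferred), satisfies_constraints(parent))
```

A missing position is treated as a failure, since `residues.get` returns `None`, which is in none of the allowed sets.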

In preferred embodiments, the enzyme of the present invention has SEQ ID N. 2.

In the most preferred embodiments, the enzyme of the present invention has SEQ ID N. 3.

Preferably, the amino acid sequence of the enzyme subject of the invention has at least 80% sequence identity with SEQ ID N. 1, more preferably at least 85% sequence identity with SEQ ID N. 1.
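As an illustration of how such an identity threshold might be checked, the sketch below computes percent identity over two sequences that have already been aligned. The application does not state which definition of identity is meant; the choice here (identical residues divided by the number of aligned columns without a gap) is an assumption, as are the function names.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must come from the same pairwise alignment and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the identity reaches the threshold (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 10-residue alignment with one mismatch (not a real sequence).
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

Other conventions divide by the length of the shorter sequence or of the full alignment, and can give different values for the same pair.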

A further subject of the present invention is an expression vector comprising the nucleotide sequence encoding the enzyme having 7-β hydroxylase activity of the present invention.

Preferably, said expression vector comprises the nucleotide sequence SEQ ID N. 11, the nucleotide sequence SEQ ID N. 12, or the nucleotide sequence SEQ ID N. 13 encoding for the enzyme having 7-β hydroxylase activity of the present invention.

In a preferred aspect, the enzyme of the present invention is expressed in said expression vector under control of the promoter AOX1, TEF, GUT1, GCW14, GK1, or GAP, preferably the promoter TEF.

In a preferred aspect of the invention, this expression vector is capable of replicating itself in a eukaryotic cell.

Preferably, said enzyme is expressed in a cell of P. pastoris, E. coli, Bacillus, S. cerevisiae, K. lactis, or Aspergillus.

In a preferred aspect of the invention, said enzyme is expressed in a strain of P. pastoris through a fermentation process.

Said fermentation process is preferably performed in a culture medium comprising casein peptone, yeast extract, glycerol, and phosphate buffer, at a temperature of 25°C to 35°C, preferably about 30°C, and at a pH of 6.5 to 7.5, preferably about 7.0.

Preferably, said strain of P. pastoris is selected from P. pastoris X-33, P. pastoris KM71H, P. pastoris SMD1168H, P. pastoris M5 (Superman5), and combinations thereof. Preferably, the strain is P. pastoris X-33.

Alternatively, a subject of the present invention is also an enzyme having 7-β hydroxylase activity and SEQ ID N. 1, wherein independently of one another: the amino acid at position 110 is a hydrophobic amino acid selected from A, E, F, I, L, P, Y, W, and V; the amino acid at position 113 is a hydrogen bond-forming amino acid selected from R, N, Q, S, and T; the amino acid at position 365 is I; the amino acid at position 368 is a hydrogen bond-forming amino acid selected from A, N, Q, S, and T.

Preferably, in the enzyme of the present invention, the amino acid at position 110 is V.

Preferably, in the enzyme of the present invention, the amino acid at position 113 is R or S.

Preferably, in the enzyme of the present invention, the amino acid at position 368 is A or S.

A further subject of the present invention is a chemo-enzymatic process for the preparation of ursodeoxycholic acid (UDCA) comprising the steps of: a) converting deoxycholic acid (DCA) into ursocholic acid (UCA), by using the enzyme having a 7 beta-hydroxylase activity of the present invention, b) converting ursocholic acid (UCA) into 12K-UDCA by using a 12α-hydroxysteroid dehydrogenase (12α-HSDH) and lactic-dehydrogenase (LDH), and c) conducting a Wolff-Kishner reduction, thereby obtaining UDCA.

Below is a diagram of a preferred embodiment of the process of the invention:

In a preferred aspect, in step a) of the process of the present invention, the conversion of DCA to UCA is achieved by using biomass of P. pastoris transformed with the expression vector encoding the enzyme having 7 beta-hydroxylase activity of the present invention.

Preferably, in step a) of conversion, cells obtained from fermentation of Pichia pastoris are directly used.

Alternatively, in said step a) of conversion, the enzyme having 7 beta-hydroxylase activity of the present invention is used free in solution or immobilized onto solid matrices, selected from resins known in the art.

Said enzyme is obtained through lysis of the cells transformed with the vector of the present invention by cellular lysis methods known in the art.

In a preferred embodiment, the collected cells are added to a phosphate buffer 100 mM pH 7.00, containing sorbitol 200 mM, at a final concentration of 20%. The 10x solution of DCA dissolved in organic solvent, preferably DMSO (100 ml in every liter of suspension), is added under stirring. Preferably, the concentration of DCA is 5-30 g/L, more preferably is about 20 g/L. Preferably, the entire mixture is maintained under stirring for 48-96 h, more preferably 72 h.
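The dilution arithmetic implied by a 10x stock added at 100 ml per liter can be sketched as follows. This is a minimal illustration under one reading of the text, namely that the 100 ml of stock is counted within the final liter of suspension; the function name and return layout are assumptions for this example.

```python
def dca_stock_plan(final_g_per_l, final_volume_l, fold=10):
    """Return (stock concentration in g/L, stock volume in mL, DCA mass in g).

    Assumes the stock volume is part of the final volume, so a 10x stock
    occupies one tenth of it.
    """
    stock_conc = final_g_per_l * fold
    stock_volume_ml = 1000.0 * final_volume_l / fold
    mass_g = final_g_per_l * final_volume_l
    return stock_conc, stock_volume_ml, mass_g


# Preferred final concentration of about 20 g/L in 1 L of suspension.
print(dca_stock_plan(20, 1.0))
```

If the stock were instead added on top of one liter of suspension, the final concentration would be slightly lower than the nominal value.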

The final conversion yield percentage from DCA into UCA is about 60-65%.

In a preferred aspect, in step b) of the process of the present invention, a dehydrogenase 12α-HSDH selected from 12a_988, 12a_829, 12a_793, 12a_956, and 12a_698 and a lactic-dehydrogenase LDH selected from LDH_CUPNH, LDH_THECA, LDH_CAEEL, LDA_GEOSE, and LDH_LACPE, are used.

Preferably in step b), a combination of lactic-dehydrogenase LDH_LACPE and dehydrogenase 12a_829, or a combination of lactic-dehydrogenase LDH_LACPE and dehydrogenase 12a_793, is used.

In a further preferred aspect, the dehydrogenase 12α-HSDH and LDH enzymes used in step b) of the process of the present invention can be free in the reaction solution or immobilized onto solid matrices, selected from resins known in the art.

Preferably, the concentration of UCA is 20-50 g/L, more preferably about 40 g/L.

In a preferred aspect, step b) of the chemo-enzymatic process of the present invention is performed at an incubation temperature of 24-30°C, at a phosphate buffer concentration of 0.1-1 M, and at a pH of 6.5-7.5.

Preferably, step b) is carried out at a temperature of about 27 °C, at a pH=7 and at a phosphate buffer concentration of about 1 M.

In a further preferred aspect, the volume of LDH used is 3-6% v/v and the volume of 12α-HSDH used is 3-6% v/v.
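The operating window for step b) can be written down as a small range check. The sketch below is only an illustration: the parameter names, the `out_of_range` helper, and the idea of validating a planned run this way are assumptions, while the numeric ranges are those stated in the text.

```python
# Ranges stated for step b); keys are hypothetical parameter names.
RANGES = {
    "temperature_c": (24.0, 30.0),
    "phosphate_m": (0.1, 1.0),
    "ph": (6.5, 7.5),
    "ldh_pct_vv": (3.0, 6.0),
    "hsdh_pct_vv": (3.0, 6.0),
}


def out_of_range(conditions):
    """Return the names of parameters that fall outside the stated ranges."""
    return [name for name, (low, high) in RANGES.items()
            if not low <= conditions[name] <= high]


def enzyme_volume_ml(reaction_volume_ml, pct_vv):
    """Volume of enzyme preparation for a given % v/v of the reaction volume."""
    return reaction_volume_ml * pct_vv / 100.0


# Preferred conditions from the text, with enzyme loadings picked inside 3-6% v/v.
run = {"temperature_c": 27, "phosphate_m": 1.0, "ph": 7.0,
       "ldh_pct_vv": 3, "hsdh_pct_vv": 6}
print(out_of_range(run), enzyme_volume_ml(1000, run["ldh_pct_vv"]))
```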

The final percentage yield of conversion from UCA into 12K-UDCA is almost total, i.e. > 98-99%.

In alternative embodiments, steps a) and b) are performed in one-pot, i.e. in the same reaction environment, without intermediate separation steps.

A variant of the one-pot embodiment is a process wherein, before adding 12α-hydroxysteroid dehydrogenase and lactic-dehydrogenase, unreacted DCA is separated.

In step c) of the process, 12K-UDCA is subjected to Wolff-Kishner reduction, thereby obtaining UDCA.

Wolff-Kishner reduction is a chemical reaction of reductive deoxygenation that, starting from an aldehyde or a ketone, produces an alkane, and has been known in literature for decades. It can therefore also be implemented in the context of the invention, for example following the teaching of Dutcher et al. (“Studies on the Wolff-Kishner reduction of steroid ketones”, Columbia University, 1939, Journal of the American Chemical Society, Vol. 61) or the teaching of Szmant (“The Mechanism of the Wolff-Kishner Reduction, Elimination, and Isomerization Reactions”, Angew. Chem. Internat. Edit., 1968, Vol. 7, No. 2). The original method consisted of heating a hydrazone together with sodium ethoxide to about 200°C, but various modifications have been introduced in this method over time. Most recently, Wolff-Kishner reduction is performed in a single step using aldehyde or ketone, KOH, hydrazine, and ethylene glycol.

In an alternative embodiment, steps a), b), and c) are performed in one-pot, i.e. in the same reaction environment, without intermediate separation steps.

All the possible combinations of the preferred aspects of the preparation process for the enzyme, its uses and synthesis processes involving the same, are to be understood as hereby described and analogously preferred.

It is further to be understood that all the aspects identified as preferred and advantageous for the enzyme are analogously preferred and advantageous for the process using the same.

Below are working examples of the present invention provided for illustrative and non-limiting purposes.

EXAMPLES

Selection of candidate alternatives to ent-kaurene oxidase cytochrome g3484 of Fusarium sporotrichioides

A comparison was made between the cytochrome P450 identified in Fusarium sporotrichioides (ent-kaurene oxidase) and another 10 similar proteins recorded in a database.

Matching between the sequence of ent-kaurene oxidase of F. sporotrichioides and known sequences recorded in the UniProtKB/Swiss-Prot database identified 100 sequences exhibiting similarities of between 43% and 98% compared to the reference enzyme.

The sequences identified, compared to each other according to evolutionary distance, can be divided into 10 subgroups within which the proteins are phylogenetically correlated. The identifying ID's of the 10 selected sequences are set out in Table 1, indicating the percentage of similarity relative to the enzyme of F. sporotrichioides and the microorganism of origin.

Table 1. List of the 10 selected sequences for expression and comparison of bioconversion yield relative to the enzyme of F. sporotrichioides. n.d. = translation of putative genes, obtained by applying whole-genome sequencing techniques, the true activity of which has never been experimentally demonstrated.

CLONING AND TRANSFORMATIONS

The sequences were ordered and optimized for expression in P. pastoris. The genes were cloned in vectors for intracellular expression in yeast regulated by a methanol-inducible promoter, for example the vector pPICZA. The plasmids obtained were used to transform the strain of P. pastoris, also capable of expressing the fungal accessory protein, NAD(P)H-P450 reductase, identified as g11235, necessary for the operation of the cytochromes, providing the latter with the reducing power required to complete the reaction.

Two clones were selected for each vector, propagated in liquid medium and cryopreserved for the expression tests. The comparison between different clones of the same transformant is necessary because, unlike other host organisms such as E. coli, expression depends on how the vector has integrated into the genome and therefore there can be marked variability in expression between clones of the same plasmid.

BIOCONVERSION TESTS

The bioconversion tests were developed in two steps: accumulation of biomass that expresses the cytochrome under analysis, and actual bioconversion. In the first step the clones were inoculated in enriched medium in the presence of δ-aminolevulinic acid, a known precursor of the heme group, and of methanol, the inducer required by the expression system of P. pastoris. After 3 days of expression the biomass was collected, resuspended in buffer at 20% w/v and utilized for bioconversion reactions in the presence of DCA at 3 g/L.

The bile acids were extracted after 3 days utilizing 2 volumes of ethyl acetate following acidification of the cellular suspensions. The extracts were analyzed utilizing an INERTSIL ODS-2 column, C18, 250 x 4.6 mm, 5 µm, CPS analytical (cat.no. 5020-01128), routinely utilized in analytical laboratories (method EP 8.0).

Only 4 of the cytochromes analyzed are capable of converting DCA into UCA, namely proteins 1, 2, 3, and 10 listed in Table 1, which have a similarity of 65% to 98% relative to the reference protein.

With the exception of protein 10, the first three present the highest similarity of sequence in relation to the reference (Table 1) and belong to the same phylogenetic cluster; it is probable that having undergone a similar evolutionary course, they have acquired the same capacity to recognize DCA as a substrate in a more or less specific manner.

In an analysis of the bioconversion DCA -> UCA in the presence of g3484 cytochromes, it emerges that protein 1 has a very similar conversion profile to the cytochrome of F. sporotrichioides, resulting from the two sequences being practically identical (98% similarity).

Protein 10, which exhibits a sequence similarity of 65% relative to the reference, produces a reaction yield around 8% as a consequence of presenting various impurities, absent in the other samples, caused by nonspecific reactions that diminish the efficiency of the cytochrome for the production of UCA from DCA.

Cytochrome 3 exhibits a yield of around 30%, without the overoxidized compounds of the reference reaction, but including the appearance of other nonspecific peaks.

Protein 2 is interesting in that it offers a conversion yield of 49.7% compared to 40% for the standard. The superior result is because the impurities comprising overoxidized molecules in the standard reaction are present in smaller quantities, even if there is more residual unreacted substrate (approximately 45% compared to the 3% reference). The analysis of the percentage areas of impurity revealed that in the cytochrome g3484 reaction these comprise more than 55% of the bile acids present, compared to a percentage area of 18% in the case of conversion with the identified protein 2. In absolute terms, compared to the 1200 ppm of UCA obtained utilizing the gene g3484 following extraction using ethyl acetate, the quantity of relevant product measured with the new cytochrome was 1500 ppm.

Protein 2 is thus naturally more selective than cytochrome g3484 in the recognition of DCA as substrate and comprises a better starting point for the identification of the optimum protein for the bioconversion.

The samples were analyzed by HPLC utilizing the method 04 MA RS035 (Method HPLC-RI 12K-UDCA) Ed.1, which offers improved resolution of short retention-time compounds. It was observed that samples including the new cytochrome present fewer overoxidized compounds compared to the reference reaction.

It is clear from the results obtained that cytochrome 2, derived from Fusarium graminearum, is the most promising candidate for further increasing the reaction yield. This protein does not promote oxidation reaction towards undesired products, which constitute the contaminants responsible for the relatively lower yield when utilizing cytochrome g3484, and immediately enables a conversion yield DCA -> UCA greater than 49%.

A comparison was thus made with known protein structures in order to build a three-dimensional computer model applying the I-TASSER predictive method. The model indicates which residues are involved in binding the heme prosthetic group, typical of cytochromes, and which amino acids form the substrate-binding pocket, making it possible to establish which amino acids are involved in the formation of the substrate-binding site.

Subsequently a docking analysis was conducted, this being a simulation of the interaction between the protein and the substrate in question (DCA), such as to identify which amino acids to substitute for those present in order to obtain potentially superior enzymatic variants for the conversion of DCA into UCA.

RATIONAL MUTAGENESIS

For rational mutagenesis, cytochrome 2 was used for in silico computer modeling of a three-dimensional protein structure, exploiting similar proteins as scaffolds, the known structures of which are available in the RCSB-PDB database (Research Collaboratory for Structural Bioinformatics Protein Data Bank).

The highlighted positions 3 and 12 are the substrate carbons bonded to an OH group, while position 7 is the target position for the hydroxylation in question. The first docking analyses identified the residues of the protein that interact with the substrate, enabling it to be accommodated in the binding pocket with the correct orientation for the carbon at position 7 to react with the oxygen activated by the iron atom of the heme group, the characteristic catalytic center of cytochromes.

Table 2 summarizes the results of the analyses, showing the amino acid residues of the protein which, according to the model, could be in contact with the DCA and condition the outcome of β-hydroxylation of carbon 7.

Table 2. List of amino acid residues of cytochrome 2 that interact in the binding pocket with specific substrate atoms (carbon atoms of the rings, C, or substituent hydroxyl groups, OH). Val301 could be correlated with the position of the heme group, conditioning its reactivity.

Particularly significant are the amino acids close to the hydroxyl group at position 12, since the docking studies observed that the hydrogen bonds between the hydroxyl group at 12 and the biocatalyst promote oxidation at the β position of carbon 7 when the substrate in the binding pocket has its β face turned towards the heme.

RANDOM MUTAGENESIS

A commonly used method for directed evolution tests as an alternative to rational mutagenesis is random mutagenesis, which exploits techniques of molecular biology to introduce random mutations in known sequences and create enzymatic variants that are not definable a priori.

Error-prone PCR

Error-prone PCR enables the introduction of random mutations in a selected DNA sequence, exploiting the PCR (polymerase chain reaction) technique of molecular biology, which normally extends DNA sequences with high precision. Altering the reaction conditions, for example by introducing manganese salts or modifying the concentrations of the dNTPs, induces the DNA polymerase under reaction to utilize incorrect nucleotides during the polymerization of the DNA, resulting in the introduction of mutations into the extended DNA sequence.
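The effect of a lowered copying fidelity can be pictured with a toy simulation. The sketch below is not the laboratory procedure and makes strong simplifying assumptions: every base is miscopied independently with the same probability, substitutions are uniform over the three other bases, and insertions or deletions are ignored. The function names and the error rate are hypothetical.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, rng):
    """Copy `template`, replacing each base by a different one with probability `error_rate`."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Pick one of the three bases that differ from the template base.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def count_mutations(original, mutated):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(original, mutated) if x != y)


rng = random.Random(0)           # fixed seed so the run is repeatable
template = "ATGGCTTTGGCT" * 10   # arbitrary 120-base toy template
variant = error_prone_copy(template, 0.01, rng)
print(count_mutations(template, variant))
```

Real error-prone polymerases show base-specific biases, so a uniform model such as this one only conveys the general idea of a tunable mutation load.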

Ep-PCR is a commonly used technique in directed evolution tests to obtain variants of a reference nucleotide sequence that code for proteins with different amino acids from wild-type, in order to confer on the initial protein new and potentially more interesting characteristics according to the intended use.

This technique enabled the creation of a pool of sequences that code for proteins with random mutations, cloned in the commercial vector pPICZA for intracellular expression in yeast regulated by a methanol-inducible promoter. The plasmids obtained were utilized to transform the previously selected strain, also capable of expressing the fungal accessory protein, NAD(P)H-P450 reductase, known as g11235, necessary for the functioning of cytochromes in that it provides reducing power for the completion of the reaction. The clones were used in expression and bioconversion tests.

Saturation mutagenesis

In addition to the random mutagenesis test provided by the ep-PCR, a more targeted random mutagenesis test was conducted under saturation. Exploiting the data obtained up to this point from molecular docking, two regions of the protein were identified in contact with substrate and promising for random mutagenesis testing in more restricted regions compared to testing with ep-PCR.

The regions of the amino acid sequence of the cytochrome identified for saturation mutagenesis testing are indicated in Figure 1, in which the target mutagenesis residues are indicated.

Comparing the sequence of the cytochrome with those of the cytochromes that presented 7β-hydroxylase activity in the previous step, these regions were seen to be conserved (Figure 5), while the residues Glu110, Arg113, Leu365, Ala368 of cytochrome 2 exhibit scarcely conservative substitutions, comprising amino acids of steric bulk or charge different from the cytochrome 2 residues. Considering that the 4 proteins compared exhibit different bioconversion profiles and yields, these different residues could play an important role in defining the enzymatic performances. They were thus selected for the saturation mutagenesis tests, which involve substitution of each of the 4 residues with all of the other 20 L-amino acids, obtaining 20^4 possible different sequence combinations to test.
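The size of such a combinatorial library grows as a power of the number of randomized positions. The sketch below works through that arithmetic; the function names are assumptions, and the coverage figure uses the 960 clones reported for the screening campaign under the further assumption that all of them came from this library.

```python
def library_size(n_positions, n_residues=20):
    """Number of distinct protein variants when each position can hold any residue."""
    return n_residues ** n_positions


def fraction_screened(n_clones, n_positions):
    """Upper bound on the fraction of the library sampled by `n_clones` clones."""
    return n_clones / library_size(n_positions)


print(library_size(4))            # four randomized positions
print(fraction_screened(960, 4))  # share covered by 960 clones, at most
```

The fraction is an upper bound because independent clones can carry the same variant, so the number of distinct sequences tested may be smaller than the number of clones.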

The cytochrome 2 sequence was used as a PCR template, amplified in the presence of primers carrying degenerate codons at each of the 4 residues, thus introducing random mutations only in the four selected positions.

The primers utilized to introduce the maximum possible number of mutations, reducing the probability of obtaining stop codons to a minimum, are as follows:

EVARsite (SEQ ID N. 4)

CTTTGAAATCTGAGAGACAATTGGATTTCACTNNKGTTGCTNNSGATGATAC

TCACGGTTACATTCCAGG

LLGAFRsite (SEQ ID N. 5)

CTGTTATTAAAGAGTCTCAAAGATTGAGACCTGTTNNKTTGGGTNNSTTCAG

AAGAATGGCTTTGGCTGATG
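Both primers carry the degenerate codons NNK and NNS at the randomized positions. The sketch below enumerates what each degenerate codon can encode under the standard genetic code, which illustrates why such codons keep all 20 amino acids accessible while limiting stop codons. The helper names are assumptions; the IUPAC codes (N = any base, K = G or T, S = C or G) and the codon table are standard.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# IUPAC nucleotide codes needed for NNK and NNS.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}


def expand(degenerate_codon):
    """List every concrete codon matched by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def summarize(degenerate_codon):
    """Return (number of codons, number of amino acids encoded, number of stop codons)."""
    translated = [CODON_TABLE[c] for c in expand(degenerate_codon)]
    amino_acids = set(translated) - {"*"}
    return len(translated), len(amino_acids), translated.count("*")


for codon in ("NNK", "NNS", "NNN"):
    print(codon, summarize(codon))
```

Comparing the NNK and NNS rows with the fully degenerate NNN row shows the reduction in stop codons obtained by restricting the third base.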

A pool of sequences was obtained and cloned in the commercial vector pPICZA for intracellular expression in yeast regulated by a methanol-inducible promoter. The plasmids obtained were used to transform the previously selected strain of P. pastoris, also capable of expressing the fungal accessory protein, NAD(P)H-P450 reductase.

EXPRESSION AND BIOCONVERSION TESTS

The clones obtained from the transformations of P. pastoris using the sequences obtained from random mutagenesis were utilized for HTS testing for expression in P. pastoris and bioconversion, exploiting a system of miniaturization using 24 well microplates (2 ml of culture/well) for cellular growth, and 96 deep well microplates for the bioconversion step, such as to reduce the working volumes and enable wide spectrum screening of the various clones simultaneously, up to 192 clones per test.

A total of 10 expression and bioconversion tests were conducted, thus achieving an analysis of 960 clones in total.

The growth step included inoculating one clone per microplate well in buffered medium in the presence of the inducer (methanol), at a concentration of 2% (v/v). After three days of growth the cellular paste was separated from the exhausted medium by centrifugation and used for preparation of bioconversion tests, following resuspension at 20% wwcp/V in reaction buffer and with addition of DCA substrate to a final concentration of 3 g/L.

After 3 days of reaction, the total bile acids were extracted utilizing 2 volumes of methanol following acidification of the cellular suspensions. Analysis of the extracts was conducted utilizing an INERTSIL ODS-2 column, C18, 250 x 4.6 mm, 5 µm, CPS analytical (cat.no. 5020-01128), (method EP 8.0).

From among the 960 clones analyzed, 2 were identified as exhibiting interesting performances. Clone A achieved a conversion yield of 60.6%, obtaining 1820 ppm of UCA and 850 ppm residues of unconverted substrate (29% residue). Clone B instead achieved a conversion of 65%, obtaining 1950 ppm of product and 700 ppm of unreacted DCA (23.3%).
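The relation between the ppm figures and the percentages can be checked with a short calculation. The sketch below assumes that the percentages are simple mass ratios against the 3 g/L of DCA charged, with 1 g/L taken as 1000 ppm; this is an interpretation rather than something the text states, and it ignores the difference in molar mass between DCA and UCA. The function name is hypothetical.

```python
def percent_of_initial(ppm, initial_g_per_l=3.0):
    """Express a measured concentration (ppm) as a percentage of the DCA charged."""
    return 100.0 * ppm / (initial_g_per_l * 1000.0)


# ppm values reported for the two selected clones.
clones = {
    "A": {"uca_ppm": 1820, "dca_ppm": 850},
    "B": {"uca_ppm": 1950, "dca_ppm": 700},
}

for name, values in clones.items():
    uca = percent_of_initial(values["uca_ppm"])
    dca = percent_of_initial(values["dca_ppm"])
    print(f"clone {name}: UCA {uca:.1f}% of charge, residual DCA {dca:.1f}% of charge")
```

Under this reading the UCA figures land close to the reported yields; small differences from the quoted percentages may come from rounding or from a different calculation basis in the original work.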

ANALYSIS OF MUTATIONS

The two clones obtained were subjected to sequencing to identify which nucleotide substitutions, and therefore which mutations in the amino acid sequence, had been introduced. The analyses revealed that clone A expressed an enzyme in which the alanine residue at position 368 had been replaced by isoleucine (Ala368 -> Ile368), while the monooxygenase expressed by clone B exhibited replacement of an arginine residue by valine (Arg113 -> Val113). The two mutated residues are located in the target regions for the saturation mutagenesis described in the previous section. The first docking analysis results identified these sites as regions of interaction between substrate and protein. It is possible that the change in charge or polarity and the variation in steric bulk of the amino acid side chain modified the substrate-binding site of the protein, permitting improved accommodation of the substrate in the binding pocket and consequently optimizing the reaction.

Through PCR reactions the two mutations were combined into a single sequence, called Seq11 (SEQ ID N. 2).

The protein sequence coded by the nucleotide sequence is shown in Figure 2.
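Combining two point mutations into one sequence amounts to applying both substitutions to the same parent. The sketch below illustrates this on a placeholder sequence: the 400-residue string of glycines is purely hypothetical and stands in for the real protein, which is not reproduced here; only the two positions named in the text (Arg113 and Ala368) are set. The function name is an assumption.

```python
def apply_substitutions(sequence, substitutions):
    """Apply (position, expected_residue, new_residue) substitutions; positions are 1-based.

    Raises ValueError if the parent does not carry the expected residue,
    which guards against off-by-one numbering errors.
    """
    residues = list(sequence)
    for position, expected, new in substitutions:
        found = residues[position - 1]
        if found != expected:
            raise ValueError(f"position {position}: expected {expected}, found {found}")
        residues[position - 1] = new
    return "".join(residues)


# Placeholder parent: only positions 113 and 368 are meaningful.
toy = ["G"] * 400
toy[113 - 1] = "R"  # Arg113
toy[368 - 1] = "A"  # Ala368
toy_parent = "".join(toy)

clone_a = [(368, "A", "I")]  # Ala368 -> Ile368
clone_b = [(113, "R", "V")]  # Arg113 -> Val113
combined = apply_substitutions(toy_parent, clone_a + clone_b)
print(combined[113 - 1], combined[368 - 1])
```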

The new nucleotide sequence was cloned in the commercial vector pPICZA for intracellular expression in yeast regulated by a methanol-inducible promoter. The plasmid obtained was utilized to transform the previously selected strain of P. pastoris, also capable of expressing the fungal accessory protein, NAD(P)H-P450 reductase.

The protein expression and bioconversion test utilizing 3 g/L of DCA (according to the protocol described herein below) confirmed the yield achieved utilizing clone B, this being 65% yield. The two mutations proved to be compatible, consequently Seqll (SEQ ID N. 2) was used as a template to bring about other mutations with an additive effect relative to those already introduced, in order to achieve the predefined objectives. SECOND CYCLE OF MUTAGENESIS

Seq11 was the starting point for the second cycle of mutagenesis. Attention was directed not only to eliminating unwanted side reactions, as in the first step of modification of the enzyme, but also to reducing possible inhibition by the substrate, so as to increase the concentration of DCA in the reaction and achieve conversions useful for the industrialization of the process.

Exploiting the data from the docking analyses, the second cycle of mutagenesis was focused on certain positions identified in the bioinformatics analysis as important not only for increasing affinity between substrate and protein, but also for eliminating residues that could create a second binding pocket for the DCA. The bioinformatics substrate-cytochrome interaction models identified the potential presence of a secondary region where the DCA can position itself, thereby hindering entry to the active site.

Table 3 lists the main residues that, from the comparison of at least three docking models, are potentially involved in the formation of a second binding site for the DCA.

Table 3. List of residues involved in the formation of a secondary site for interaction between protein and DCA in a plurality of docking models.

It was observed that these residues are remote from one another in the primary protein sequence, but close together when the enzyme assumes the tertiary structure that determines its activity.

The residues were selected as targets for the saturation mutagenesis tests, which involve substituting each of the 6 residues listed in Table 3 with any of the 20 L-amino acids, giving 20^6 possible different sequence combinations for testing. Seq11 was utilized as the template for PCR reactions in which the sequence was amplified in the presence of primers carrying degenerate codons at each of the 6 residues, so as to introduce random mutations only at the 6 selected positions. The primers utilized to introduce the maximum possible number of mutations, while reducing to a minimum the probability of obtaining stop codons, are described below.

In the primers listed below the 4 nitrogenous bases are indicated using the traditional abbreviation and other letters with the following meanings:

A = adenine

C = cytosine

G = guanine

T = thymine

R = G A (purine)

Y = T C (pyrimidine)

K = G T (keto)

M = A C (amino)

S = G C (strong bonds)

W = A T (weak bonds)

B = G T C (all except A)

D = G A T (all except C)

H = A C T (all except G)

V = G C A (all except T)

N = A G C T (all)

Glu110 (SEQ ID N. 6)

CTTTGAAATCTGAGAGACAATTGGATTTCACTNNKGTTGCTGTCGATGATACTCACGGTTACATTCCAGG

His83 (SEQ ID N. 7)

GTAGATCTTTGTACAAGGATACTCCATATAAAGCTNNKACTGATTTGGGAGATGTTTTG

Arg220 (SEQ ID N. 8)

TTGAGTACACTGTTCAATTGTTTCAAACTGCTGATGAATTGBNKAATTACCCTAGATGGACTAGACCATATATTCATTGG

GIFRsite (SEQ ID N. 9)

GAGTCTCAAAGATTGAGACCTGTTAGTTTGHNKATCTTCNNKAGAATGGCTTTGGCTGATGTTACTTTGCCA

Leu365 (SEQ ID N. 10)

CTGTTATTAAAGAGTCTCAAAGATTGAGACCTGTTNNKTTGGGTATCTTCAGAAGAATGGCTTTGGCTGATG
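The degenerate codons that appear in the primers above (NNK, BNK, HNK) can be expanded with the IUPAC legend to see what each one can encode. The following Python sketch is purely illustrative and is not part of the procedure described; it assumes the standard genetic code, and the helper names `expand` and `summarize` are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes, as given in the legend above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}

# Standard genetic code (assumption), bases ordered T, C, A, G; '*' = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """Return every concrete codon matched by a 3-letter IUPAC codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def summarize(degenerate_codon):
    """Return (number of codons, encoded amino acids, stop codons)."""
    codons = expand(degenerate_codon)
    residues = [CODON_TABLE[c] for c in codons]
    amino_acids = sorted(set(residues) - {"*"})
    stops = [c for c, r in zip(codons, residues) if r == "*"]
    return len(codons), amino_acids, stops


for codon in ("NNN", "NNK", "BNK", "HNK"):
    n, amino_acids, stops = summarize(codon)
    print(f"{codon}: {n} codons, {len(amino_acids)} amino acids, stops: {stops}")
```

Under the standard genetic code, NNK still reaches all 20 amino acids while leaving a single stop codon (TAG) instead of the three possible with NNN, which is in line with the stated aim of keeping the probability of stop codons to a minimum.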

The pool of sequences obtained was cloned in the commercial vector pPICZA for intracellular expression in yeast regulated by a methanol-inducible promoter. The plasmids obtained were utilized to transform the previously selected strain of P. pastoris, also capable of expressing the fungal accessory protein, NAD(P)H-p450 reductase.

EXPRESSION AND BIOCONVERSION TESTS

The clones obtained from the transformations of P. pastoris, utilizing the sequences obtained from mutagenesis as described in the section herein above, were used for HTS tests for expression in P. pastoris and bioconversion, making use of a system of miniaturization in 24 well microplates (2 ml of culture/well) for cellular growth, and 96 deep well microplates for the bioconversion step, such as to reduce the working volumes and enable wide spectrum screening of the various clones simultaneously, up to 192 clones per test.

A total of 6 tests were conducted for expression and bioconversion, thereby analyzing a total of more than a thousand clones.
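The scale of the screening relative to the theoretical library can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the figures given in the text (6 target positions, 20 amino acids, up to 192 clones per test, 6 tests) and assumes that all six positions vary simultaneously and that every screened clone is distinct, which gives an upper bound on coverage.

```python
# Figures taken from the text; variable names are illustrative.
TARGET_POSITIONS = 6
AMINO_ACIDS = 20
CLONES_PER_TEST = 192
TESTS = 6

# Theoretical number of protein variants if all positions vary at once.
theoretical_variants = AMINO_ACIDS ** TARGET_POSITIONS

# Upper bound on the number of clones analyzed across all tests.
max_clones_screened = CLONES_PER_TEST * TESTS

# Upper bound on the fraction of sequence space sampled
# (assumes every screened clone carries a distinct variant).
fraction_sampled = max_clones_screened / theoretical_variants

print(f"theoretical protein variants: {theoretical_variants:,}")
print(f"maximum clones screened:      {max_clones_screened:,}")
print(f"fraction of sequence space:   {fraction_sampled:.2e}")
```

The upper bound of 1152 clones is consistent with the "more than a thousand clones" reported, and shows that the screening samples only a very small part of the theoretical sequence space.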

The growth step included inoculation of one clone per microplate well in buffered medium in the presence of inducer (methanol) at a concentration of 2% (v/v). After three days of growth, the cell paste was separated from the spent medium by centrifugation and used to prepare bioconversion tests, following resuspension at 20% wwcp/V in reaction buffer. Compared to the initial screenings, the concentration of substrate provided was increased from 3 to 5 g/L, so as to satisfy two requirements: to select enzymes capable of converting significant quantities of DCA with a view to industrialization of the process, and not to render screening on small volumes difficult owing to the poor solubility of the bile acid in question.

After 3 days of reaction, the total bile acids were extracted utilizing 2 volumes of methanol following acidification of the cell suspensions. The extracts were analyzed by means of an INERTSIL ODS-2 column, C18, 250 x 4.6 mm, 5 µm, CPS analytical (cat. no. 5020-01128), routinely utilized in PCA analysis laboratories (method EP 8.0).

Among the clones analyzed, one was identified, referred to as clone 73, capable of producing 4500 ppm of UCA relative to 350 ppm of unreacted substrate, compared to 1950 ppm of product from the best clone of the first cycle of mutagenesis. The mutated enzyme enables a yield of 90%, compared to the 65% obtained during the previous processing step.
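The yields quoted for the two mutagenesis cycles can be reproduced from the HPLC figures. The sketch below assumes that yield is expressed as product found over substrate loaded on a mass basis, with 1 g/L taken as 1000 ppm; this is an assumption that happens to match the quoted percentages, and it ignores the small molar-mass difference between DCA and its hydroxylated product. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bioconversion:
    clone: str
    dca_loaded_ppm: float    # substrate loading; assumes 1 g/L = 1000 ppm
    uca_ppm: float           # product found by HPLC
    dca_residual_ppm: float  # unreacted substrate found by HPLC

    @property
    def yield_pct(self) -> float:
        return 100 * self.uca_ppm / self.dca_loaded_ppm

    @property
    def residual_pct(self) -> float:
        return 100 * self.dca_residual_ppm / self.dca_loaded_ppm

    @property
    def unaccounted_ppm(self) -> float:
        # Loading not recovered as UCA or DCA (e.g. by-products, losses).
        return self.dca_loaded_ppm - self.uca_ppm - self.dca_residual_ppm


# Values as reported in the text for the first cycle (3 g/L loading)
# and the second cycle (5 g/L loading).
runs = [
    Bioconversion("clone A", 3000, 1820, 850),
    Bioconversion("clone B", 3000, 1950, 700),
    Bioconversion("clone 73", 5000, 4500, 350),
]

for r in runs:
    print(
        f"{r.clone}: yield {r.yield_pct:.1f}%, "
        f"residual DCA {r.residual_pct:.1f}%, "
        f"unaccounted {r.unaccounted_ppm:.0f} ppm"
    )
```

The unaccounted fraction is a rough mass-balance indicator only; on this simplified basis it is smallest, relative to loading, for clone 73, in keeping with the reported decrease in overoxidized impurities.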

The increased yield compared to the best clone of the first mutagenesis is a consequence of both greater consumption of DCA and increased selectivity in relation to the substrate, resulting in diminished overoxidized impurities. Measurements revealed both a reduction in unreacted substrate, and a reduction of the area of the peak corresponding to a compound presumably more polar than the required product.

MUTANT ANALYSIS

Clone 73 was sequenced to identify the nucleotide substitutions introduced during the mutagenesis cycle. The amino acid sequence resulting from these substitutions is reported in Table 4, in comparison with the template (Seq11) and the wild-type sequence (protein 2). Two mutations were introduced during the processing step described: the glutamic acid residue at position 110 was replaced by a serine residue, and the leucine residue at position 365 was also replaced by a serine residue.

Table 4. Comparison of sequences before mutagenesis (ICE2), after the first cycle of mutation (Seq11), and after the second cycle of mutagenesis (Seq73). Light grey indicates residues mutated after the first mutagenesis, dark grey indicates residues mutated after the second mutagenesis.

It is possible that the change in polarity, charge, and steric bulk of the residues modified the three-dimensional structure of the enzyme, facilitating entry of the DCA into the active site and increasing the affinity of the protein for the substrate in question. In particular, the substitution of the glutamic acid at position 110 with a serine may have eliminated the salt bridge between the glutamic acid and the arginine residue at position 220, at least partially modifying the secondary binding pocket between DCA and protein, which was responsible for the effect of substrate inhibition identified previously.
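The substitutions accumulated over the two cycles can be written in the usual shorthand (wild-type residue, position, new residue) and applied programmatically. The following Python sketch is illustrative only: the function name is hypothetical, and the 400-residue string is a placeholder built for the example, not the sequence of the enzyme, which is given in the sequence listing.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence, mutations):
    """Apply substitutions such as 'E110S' (1-based positions) to a sequence.

    Raises ValueError if a mutation is malformed, out of range, or if the
    residue found at the position does not match the stated wild type.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[position - 1] != old:
            raise ValueError(
                f"{mutation}: found {residues[position - 1]} at {position}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Placeholder sequence (NOT the real enzyme): glycine everywhere except
# at the four positions discussed in the text.
placeholder = list("G" * 400)
for position, residue in {110: "E", 113: "R", 365: "L", 368: "A"}.items():
    placeholder[position - 1] = residue
wild_type = "".join(placeholder)

# First cycle: Arg113 -> Val and Ala368 -> Ile, combined in Seq11.
seq11 = apply_mutations(wild_type, ["R113V", "A368I"])
# Second cycle: Glu110 -> Ser and Leu365 -> Ser, giving Seq73.
seq73 = apply_mutations(seq11, ["E110S", "L365S"])

print([seq73[p - 1] for p in (110, 113, 365, 368)])
```

Checking the stated wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are counted from different reference sequences.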

The sum of mutations from the sequential cycles of mutagenesis resulted not only in achievement of a minimum reaction yield of 70%, the final target defined for step 4, but also exceeded this value by achieving a conversion percentage of 90%. The mutagenesis activity also partially eliminated the effect of inhibition by the substrate, making it possible to exceed 2 g/L of hydroxylated substrate, which constituted the maximum conversion limit at the end of the first cycle of mutagenesis.

OPTIMIZATION OF FERMENTATION

The first critical step for scaling up the fermentation was the elimination of methanol, which was used as an inducer of expression in the commercial Pichia pastoris system utilized for the expression of the cytochrome in question, in order to avoid problems of flammability and toxicity.

Consequently, as an alternative to the AOX promoter, testing was conducted on 4 promoters utilized for constitutive expression of heterologous proteins in the selected host system. The yeast was transformed using the 4 new vectors generated, and the biomass obtained after 3 days of expression in medium without methanol was utilized for bioconversion reactions of the DCA. HPLC analysis of the bile acids extracted from the reaction mixtures enabled identification of promoter 1 as the optimum for regulating the expression of the protein in question.

A transition was made from the 3 ml of culture utilized during the HT screening step to 100 ml of culture medium in a conical flask. In enriched medium, a biomass production of 120 g/L was achieved in 96 h of total fermentation.

The second scale-up step involved a transition from culture in conical flask to a 2 L fermenter. This was an essential step and already indicative of the foreseeable yield of fermenters on the pilot scale, due to the possibility of controlling fundamental parameters for the growth of the yeast in culture, including pH, temperature, and oxygenation. It was also possible to assess alternative industrial raw materials in a buffered saline medium, in the presence of a carbon source and protein extracts and/or hydrolysates to supply the yeast with the nutrients, vitamins, and cofactors required not only for growth but also for the heterologous expression of the cytochrome.

An experimental design was then applied aiming to combine different protein extracts and hydrolysates, simultaneously assessing 48 growth conditions and consequent expressions.

The applied experimental design enabled identification of the best combination of components as an alternative to the enriched medium used in the conical flask, in particular eliminating the addition of pure cofactors essential for the synthesis of complex proteins like cytochromes, which is difficult to sustain economically in industrial fermentation. The resulting medium not only maintained the yield (125 gwcp/L) achieved with enriched medium even though it uses industrial components, but also halved the time required: maximum production was achieved in 48 h of fermentation instead of 96 h.

INCREASED DCA LOADING

The biomass produced was utilized in reaction at 20% w/v at increasing concentrations of DCA: 5 g/L, 10 g/L, 20 g/L, 30 g/L.

After 3 days of incubation the reactions were assessed by HPLC analysis (Table 5). The biomass also converted 10 g/L of substrate with a yield of 90%, doubling the loading achieved up to this point.

Table 5. HPLC analysis of the increased substrate loading tests. At 10 g/L conversion yields of 90% were maintained.

As shown in Table 5, when the DCA in the reaction is increased above 10 g/L the conversion yields diminish: the mutations introduced in the two cycles of mutagenesis have certainly reduced substrate inhibition of the enzyme, but without completely eliminating it. However, it should be noted that the maximum concentration of UCA, 12.5 g/L, was achieved at a loading of 20 g/L, and the decline in yield is attributable only to the accumulation of unreacted substrate and not to products of undesired oxidation, as initially occurred with the original enzyme.

REACTION KINETICS ANALYSIS

The reaction kinetics were assessed by preparing three different concentrations of wet cell paste (wcp), specifically 10%, 20%, and 30% w/v, utilizing an initial 20 g/L of DCA. The bile acids were extracted at successive intervals to assess the progress of the reaction. As shown in Figure 3, maximum conversion was achieved at 20% w/v of biomass in 72 h. Longer reaction times or higher concentrations of biocatalyst did not improve the yield, while lower concentrations of enzyme did not allow the required performance to be achieved.

The relatively high percentage of biocatalyst compared to reactions using free enzymes could be attributable to the complexity of the cytochrome, which is anchored to the cell membrane and depends for correct operation on accessory proteins that are also integral to the membrane, as well as to the need for the substrate and the bile acid products to permeate through the cell membrane.

STABILITY OF THE BIOCATALYST

Biocatalyst (biomass) stability tests were conducted assessing parameters including: absence or presence of stabilizers (glycerol or sorbitol), preservation temperature (specifically -80°C, -20°C, 4°C, and 25°C), and preservation times (7, 14, 30 days). Stability was assessed and expressed as bioconversion yield, in presence of 10 g/L of DCA, utilizing biomass preserved under the different conditions.

The comparison between reaction products demonstrated that the performance of the enzyme was maintained stable up to 14 days of preservation at 4°C, without the need to add stabilizers. After 30 days, the biocatalyst underwent a partial decrease in activity, thus maximum storage can be conservatively foreseen up to 14 to 15 days (at 4 to 6°C) after the recovery of the cellular mass following fermentation (possibly under more detailed subsequent assessment, within a temporal range of 14 to 30 days).

SELECTION OF THE BEST 12a-HSDH CANDIDATES

Considering that the 12a-HSDH of Bacillus sp (ID Databank: A3IBN0) used to date exhibited excellent performance in the required reaction, the sequence of this enzyme was the starting point for the selection of other potential 12a-HSDH within the range of those available in the database. The strategy adopted for the selection of candidates was to maintain maximum possible similarity with the sequence in use, presupposing that similar amino acid sequences will induce similar secondary and tertiary structures and thus confer to different proteins the same properties and the same behaviors in matching conditions. Bibliographic research in the UniProtKB/Swiss-Prot database for sequences most similar to that currently in use led to the selection of the proteins listed in Table 6.

Table 6. List of 12a-HSDH selected for expression in E. coli. Similarity is expressed as a percentage of similarity of amino acid sequence compared to 12a-HSDH of Bacillus sp (ID Databank: A3IBN0 described as “predicted short chain dehydrogenase”).

The selection of a range of similarity from 69% to 98.8% represents a good compromise between high similarity with the protein that to date has given excellent results, and adequate diversity.

SELECTION OF THE BEST LDH CANDIDATES

The heterologous expression of the lactic-dehydrogenase enzyme of rabbit muscle has always proved more problematic than that of 12a-HSDH. This is probably the consequence of incorrect folding and the resulting reduced solubility/activity frequently encountered during the expression of eukaryotic proteins in bacterial hosts. The selection strategy for the new LDH variants was thus based on ease of heterologous expression in the micro-organism E. coli, hence selecting sequences of bacterial origin or proteins with the least complex possible quaternary structure, as well as on maximum sequence diversity, both in relation to the protein used to date and among the new candidates, so as to select a representative of each of the lactic-dehydrogenase subclasses present in the database, and on independence from fructose 1,6-bisphosphate and on thermal stability.

The enzymes selected for expression are listed in Table 7.

Table 7. List of LDH selected for expression in E. coli.

The two proteins LDH_GEOSE and LDH_THECA exhibit mutations relative to the sequences recorded in the database, since these modifications confer independence from fructose 1,6-bisphosphate. In particular, LDH_GEOSE exhibits the mutations R104C, Q189L, and N293S, while the sequence LDH_THECA exhibits six mutations, specifically L67E, H68D, E178K, A235R, R173Q, and R216L, which significantly increase the thermostability of the protein.

The other enzymes in the list are instead already effector-independent in wild-type form. Analysis of the evolutionary distance between the sequences demonstrates that the selected proteins, in addition to representing different subclasses of LDH, are also only weakly related evolutionarily, which suggests the possibility of assessing enzymes that have evolved to adapt to different conditions and that consequently exhibit diversified properties.

PREPARING SYNTHETIC GENES

The genes that code for the selected enzymes were chemically synthesized after being optimized for expression in the micro-organism selected as host. The optimization algorithm exploits the degeneracy of the genetic code to design nucleotide sequences that, although coding for a protein identical to the one required, enable optimal transcription and translation in the selected expression system, stabilizing the messenger RNA, facilitating ribosome binding, and eliminating regions that induce premature interruptions in translation.
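The principle that a nucleotide sequence can be rewritten with synonymous codons without changing the encoded protein can be shown in a few lines. The sketch below is a deliberately minimal illustration and not the optimization algorithm referred to above, which is not disclosed: it assumes the standard genetic code, the `PREFERRED` table is a hypothetical set of one codon per amino acid chosen for the example rather than measured usage data, and real tools also take mRNA structure and other factors into account.

```python
# Standard genetic code (assumption), bases ordered T, C, A, G; '*' = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical "preferred" codon per residue, for illustration only.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}

# Sanity check: every chosen codon really encodes its amino acid.
assert all(CODON_TABLE[codon] == aa for aa, codon in PREFERRED.items())


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Rewrite a coding sequence using one preferred codon per residue."""
    return "".join(PREFERRED[aa] for aa in translate(dna))


original = "ATGTTAAGACCCTCATAA"  # toy example, not a gene from the text
optimized = recode(original)

print(original, "->", optimized)
print(translate(original), "==", translate(optimized))
```

The nucleotide sequence changes while the translated protein stays identical, which is the property the optimization relies on.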

CLONING IN EXPRESSION VECTORS

The five encoding sequences for LDH and the five for 12a-HSDH selected in the first operating step are summarized in Tables 8 and 9, respectively.

Table 8. List of the LDH selected for expression in E. coli

Table 9. List of the 12a-HSDH selected for expression in E. coli

The genes, synthesized after optimization for use of E. coli as host, were cloned in a specific vector (pET21 12a_988) for cytoplasmic expression, which carries a promoter inducible using lactose or synthetic analogues, for example IPTG (isopropyl β-D-1-thiogalactopyranoside), and confers resistance to ampicillin.

TRANSFORMATION AND LDH EXPRESSION TESTING

The constructed vectors were used to transform hyperproducing strains of E. coli commonly used for heterologous expression. Also transformed was the strain prepared previously by deletion of the hdhA gene, which codes for the endogenous enzyme 7a-HSDH, responsible for unwanted side reactions (strain ΔhdhA). One transformed clone for each strain was propagated in selective liquid medium and then cryopreserved in different quantities. These samples represented the starting point for all subsequent expression tests.

The preliminary tests using the glycerol stocks obtained were conducted in minimal medium with lactose induction. Qualitative and semi-quantitative analysis of expression was conducted using SDS-PAGE (electrophoresis on polyacrylamide gel in the presence of sodium dodecyl sulfate), the expected molecular weight of the proteins being between 32.8 kDa (LDH_THECA) and 36.8 kDa (LDH_CUPNH). The tests demonstrated that the strain utilized is not a limiting parameter, given that expression is independent of the type of E. coli utilized: in the comparison between the overproducing strain and the corresponding deletion strain there were no substantial differences in expression for any of the 5 enzymes analyzed.

In order to optimize expression, different conditions that could promote the expression of the proteins were tested on laboratory scale (final volume = 50 ml). For this purpose, all the tests were conducted at ambient temperature.

In particular, the following variables were considered:

Medium: standard and optimized;

Inducer concentration (lactose): 2 g/L or 10 g/L;

Growth phase at moment of induction: exponential or stationary phase;

Agitation: 250 rpm or 150 rpm.

An excellent level of overexpression was achieved for LDH_CUPNH, LDH_THECA, and LDH_LACPE, regardless of the conditions tested.

However, greater variability was observed in the expression of LDH_GEOSE and LDH_CAEEL.

The assessment of all combinations of the listed variables enabled identification of the condition affording the most favorable expression of all the LDH enzymes of interest, namely optimized medium, induction during exponential phase using 2 g/L of lactose, and agitation at 250 rpm.
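Assessing all combinations of the four variables listed above corresponds to a full factorial design with two levels per factor. The following Python sketch simply enumerates those combinations; the dictionary keys are illustrative names, and the assumption that every enzyme was grown under every condition is not stated in the text.

```python
from itertools import product

# Factors and levels as listed in the text; key names are illustrative.
factors = {
    "medium": ["standard", "optimized"],
    "lactose_g_per_l": [2, 10],
    "induction_phase": ["exponential", "stationary"],
    "agitation_rpm": [250, 150],
}

# Full factorial: every level of each factor with every level of the others.
conditions = [
    dict(zip(factors, levels)) for levels in product(*factors.values())
]

# Condition reported as the most favorable for all the LDH enzymes.
best = {
    "medium": "optimized",
    "lactose_g_per_l": 2,
    "induction_phase": "exponential",
    "agitation_rpm": 250,
}

print(f"{len(conditions)} conditions per enzyme")
print("reported optimum is condition", conditions.index(best) + 1)
```

With two levels for each of four factors the design comprises 16 conditions per enzyme, which remains manageable at the 50 ml laboratory scale used for these tests.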

Purification and solubility

The lactic-dehydrogenase enzymes were purified applying a process of mechanical lysis that involves treatment of the cells using ultrasound in the presence of detergents to facilitate the solubilization of the membrane. The soluble fraction, suitably separated from the insoluble fraction, was subject to a first qualitative analysis using SDS-PAGE. The band corresponding to the enzymes LDH_LACPE, LDH_CAEEL, and LDH_CUPNH is visible on polyacrylamide gel, including a more intense band for the first enzyme, which is thus the most soluble among the five proteins of interest. The enzymes LDH_GEOSE and LDH_THECA are instead not visible.

Enzymatic activity

The enzymes were titrated applying the method “UV Bict LDH” developed previously. The standard test conditions include incubation of the enzyme in phosphate buffer 100 mM, pH 7 at ambient temperature. The activity levels measured are presented in Table 10.

Table 10. Activity in terms of units/ml of cell lysates. Also reported is the activity of the LDH sample of rabbit muscle (LDH.A), lot LDH010111/4, the lactic-dehydrogenase utilized in the standard bioconversions.

The enzyme that proved most soluble in the qualitative analyses, LDH_LACPE, exhibited the greatest activity under standard conditions. Lower solubility also corresponds to lower activity, as observed for the enzymes LDH_GEOSE and LDH_THECA.

The titer is an activity value established in vitro under standard conditions. It is thus necessary to assess the performance of the LDH under the target reaction conditions, since these could influence the activity either positively or negatively for each specific protein.

TRANSFORMATION AND EXPRESSION TESTING OF 12a-HSDH

As in the case of the LDH, the constructed vectors were used to transform hyperproducing strains of E. coli commonly used for heterologous expression, including the strain ΔhdhA. One transformed clone for each strain was propagated in selective liquid medium and then cryopreserved. These frozen samples represent the starting point for all the subsequent expression tests.

The preliminary tests in minimal medium with lactose induction demonstrated that the fundamental parameter for the expression of 12a-HSDH is the host strain. It is possible that expression induces excessive stress for all the cells, with the exception of the ROS strain, which was thus selected as best for the subsequent tests.

For each ROS transformant a version was also created with deletion of the endogenous 7a-HSDH, for possible use as an expression strain free of side reactions other than the one intended. As for the expression of LDH, expression in the ROS strains was also optimized for the 12a-HSDH, testing on laboratory scale (final volume = 50 ml) different conditions that could promote the expression of the proteins in soluble form. For this purpose all the tests were conducted at ambient temperature, assessing different inducers (lactose or IPTG), growth phases at the moment of induction (exponential or stationary), and media (standard or optimized).

The best expression was achieved in optimized medium under lactose induction in exponential phase, and analogous results were obtained for the other enzymes.

Purification and solubility

The 12a-HSDH were purified utilizing a process of mechanical lysis that includes treatment of the cells using ultrasound and subsequent separation of soluble and insoluble fractions. A first analysis utilizing SDS-PAGE indicated that all the proteins are soluble once purified by way of cellular sonication, all the bands being visible in the soluble fraction.

Enzymatic activity

The enzymes were titrated using the method “Uv Bict 12a-HSDH” previously defined. The standard test conditions include incubation of the enzyme in phosphate buffer 100 mM, pH 8.5 at ambient temperature.

Table 11 shows the values obtained, including the activity of the enzyme Ox111 (lot 12A010113/1), which is used in standard reactions.

Table 11. Activity in terms of units/ml of cell lysates. Also reported is the activity of the 12a-HSDH Ox111 (lot 12A010113/1) sample, the enzyme used in the standard bioconversions.

All the expressed proteins exhibited lower activity compared to the enzyme Ox111 according to titration in vitro. However, the performance of the enzymes can vary under the reaction conditions, depending on how the structure and activity of the protein are influenced by the difference in reaction parameters compared to those in vitro.

It was further observed that, according to the tests conducted previously, the limiting factor in the bioconversion from cholic acid to 12K-CDCA is not the activity of the 12a-HSDH, but the recycling reaction of the cofactor NAD+/NADH and therefore the performance of the lactic-dehydrogenase (LDH). It is therefore possible that, by optimizing the latter enzyme, a lower activity of the new 12a-HSDH variants relative to the standard enzyme (Ox111) will not negatively influence the reaction yield.

Bioconversions

In order to identify the most favorable combination of a new LDH and a new 12a-HSDH in terms of yield, a preliminary test was implemented according to the factorial design set out in Table 12. All five lactic-dehydrogenase enzymes were combined with each of the five 12a-HSDH utilizing a miniaturization system applied at Bict which enables the execution of high throughput screening (HTS), simultaneously analyzing different parameters on very small volumes (1 ml). As a control, reactions were also implemented in combination with the enzymes LDH.A and Ox111.

Table 12. Combination of all the 6 LDH of interest (5 new and the control LDH.A) with all the 6 12a-HSDH of interest (5 new and the control Ox111). The reactions were conducted under standard conditions.

After 20 h the bioconversion yields were verified utilizing HPLC analysis according to the “Method Ox from CA to 12K-CDCA”. Table 13 presents the residual cholic acid value as a percentage area of the chromatogram.

Table 13. Residual cholic acid in terms of percentage area of the chromatogram.

Under standard conditions no combination achieved a yield comparable to that of the two standard enzymes LDH.A and Ox111: the percentage areas of residual substrate were around 50%, and only with LDH_CAEEL was a conversion of at least 80% achieved. Table 13 shows that, of the two enzymes, it is the lactic-dehydrogenase that conditions the yield, considering that similar values were obtained for residual cholic acid with the same LDH, regardless of the 12a-HSDH utilized. Even the most promising LDH in vitro, LDH_LACPE, gave very poor results under reaction conditions.

Limiting factors: LDH and enzymatic stability

Considering that one of the important factors conditioning enzymatic activity is the stability of the protein under reaction conditions, the residual activity of each lactic-dehydrogenase was subsequently titrated in vitro after incubation in a reaction mixture under standard conditions (T = 30°C, pH 8).

Table 14 shows the activity values (U/ml) recorded at specific times following sampling (1 h, 4 h, 20 h) from the reaction mixture.

Table 14. Titer measured using the “Method Uv-Bict LDH” following incubation at subsequent times in reaction mixture. The values in the table are expressed as units/ml (U/ml).

Table 14 shows that the eukaryotic proteins (LDH_LDH.A and LDH_CAEEL) lost more than 60% of their activity after an hour and the enzymatic titer remained constant over the following hours. The LDH_CUPNH, LDH_GEOSE, and LDH_THECA are stable, but the activity remained too low to achieve the target yields. LDH_LACPE, though exhibiting an initial titer much higher than the other proteins, dropped sharply in activity after an hour, proving to be extremely unstable in the bioconversion of interest.

It is thus evident that the standard conditions are not suitable for the activity of the five lactic-dehydrogenases selected, in particular as regards the one that appeared most promising in vitro, LDH_LACPE.

Limiting factors: LDH and pH

The main parameter of the standard reaction analyzed was the pH: different states of protonation of the amino acids of a protein can condition its three-dimensional structure and so its activity. Reactions were implemented in parallel in which Ox111 was associated with a cofactor recycling system based in each case on a different LDH, testing three pH values for each enzymatic combination: pH 7.5, pH 7, and pH 6.5. After 20 h the residual activity of the LDH was titrated utilizing the “Method Uv-Bict LDH”.

The experimental results obtained demonstrated that the lactic-dehydrogenase variants are relatively unstable following incubation in the reaction mixtures, in particular at lower pH values. The sole exception was LDH_LACPE, which increased in stability as the acidity of the reaction environment increased. This behavior is supported by literature data, according to which the optimum pH for the enzyme is 5.5, and possibly explains the low bioconversion yields despite the high protein titer in vitro compared to the others. The bioconversion yield at neutral or acid pH values was established utilizing HPLC analyses according to the “Method Ox from CA to 12K-CDCA”.

As predicted, the stability of LDH_LACPE increased as the pH decreased, and the reaction yields also increased, passing from a conversion of 50% (pH 8) to a conversion of approximately 93% (pH 7). It is probable that at pH 6.5 the yield was lower than under neutral conditions because the activity of Ox111 became limiting; it is known that 12a-HSDH variants exhibit a basic pH optimum (around 8).

The other LDH instead exhibited a deterioration in performance on shifting to acidic conditions.

Selection of 12a-HSDH

Having identified the best LDH and the condition that improves its performance, an effort was made to select a 12a-HSDH that enabled achievement of a yield at least comparable to that of Ox111.

Parallel reactions were implemented at pH values at which the activity of LDH_LACPE is more stable (pH 7, pH 6.5, pH 6), in combination with each of the 12a-HSDH alternatives to Ox111.

After 20 h at 30°C the samples were processed and analyzed according to the “Method Ox from CA to 12K-CDCA”.

The best yields were obtained using the enzyme LDH_LACPE with 12a_829 or 12a_793, which exhibited values comparable to those of the control Ox111, but only at pH 7, which represents a good compromise between the acid pH optimum of the LDH and the basic pH optimum of the 12a-HSDH.

Optimization hits

The combinations LDH_LACPE /12a-HSDH_829 and LDH_LACPE /12a-HSDH_793 that gave the best yields were subjected to optimization, in order to approach the required target (residual cholic acid lower than 0.5%).

Reactions were thus implemented varying parameters comprising: incubation temperature (30°C or 24°C), volume of LDH (6% v/v or 3% v/v), volume of 12a-HSDH (3% v/v or 6% v/v), and concentration of phosphate buffer pH 7 (1 M or 0.1 M). The yields were assessed applying the “Method Ox from CA to 12K-CDCA”.

The resulting values in Table 15 show that the discriminating element is the molar concentration of the phosphate buffer in which the reaction occurs, considering that the target was only achieved using phosphate buffer at a concentration of 100 mM. It can be noted that the concentration of LDH can be halved compared to the standard, since probably in these conditions, the enzyme is very active and greatly in excess.

Table 15. Reaction yield in the reaction optimization process using 12a_829 and 12a_793. The reported value is the percentage area of residual cholic acid in the HPLC chromatogram. Std: [LACPE] = 6% V/V, [12a-HSDH] = 3% V/V; 12A_2X: [12a-HSDH] = 6% V/V, [LACPE] = 6% V/V; LDH_0.5X: [LACPE] = 3% V/V, [12a-HSDH] = 3% V/V

On the basis of the data obtained during the screening and optimization steps on small volumes (1 ml), a scale-up test was conducted, reproducing the best conditions in a reaction volume of 200 ml. After completion of bioconversion, the products were purified and analyzed using HPLC.

Considering that during the screening step the two enzymes 12a_829 and 12a_793 returned similar results, they were both tested.

HPLC analyses returned the following percentage area values for residual cholic acid: 12a_829 = 0.466; 12a_793 = 0.657.

It emerges that the parameters selected on the small scale carried over to the larger volume: 12a_829 reached the required target (0.466%, below the 0.5% threshold), whereas 12a_793 (0.657%) remained slightly above it, so 12a_829 is the better of the two enzymes.
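The comparison against the target can be written out explicitly. The sketch below is illustrative only: it takes the 0.5% limit from the optimization target stated above and the two HPLC values from the scale-up test, and the function and variable names are hypothetical.

```python
# Target from the optimization section: residual cholic acid lower than 0.5% (HPLC area).
TARGET_MAX_RESIDUAL_CA = 0.5

# Residual cholic acid (% area) reported for the 200 ml scale-up test.
residual_ca = {"12a_829": 0.466, "12a_793": 0.657}

def meets_target(value, limit=TARGET_MAX_RESIDUAL_CA):
    """True when the residual cholic acid is strictly below the limit."""
    return value < limit

results = {name: meets_target(value) for name, value in residual_ca.items()}
best = min(residual_ca, key=residual_ca.get)

for name, ok in results.items():
    print(f"{name}: {residual_ca[name]:.3f}% -> {'meets' if ok else 'misses'} target")
print("lowest residual cholic acid:", best)
```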

Sample preparation

The enzymes LDH_LACPE, 12A_829 and 12A_793 were expressed in a fermentation process on laboratory scale (conical flask of 500 ml) utilizing the strains and optimal conditions described herein above. The proteins were purified according to the methods described herein above. The enzymes, in the form of cell lysates, were then resuspended at 25% v/v in glycerol in order to achieve improved protein and titer stability.

ADDITIONAL MUTAGENESIS

As observed herein above, the two cycles of mutagenesis conducted produced two performing mutants, the last of which (Seq 73) exhibited an almost total elimination of oxidized compounds and a conversion of approximately 90% utilizing up to 10 g/L of DCA substrate.
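The substitutions that separate the three protein variants can be read off from the sequence listing (SEQ ID N. 1 to 3). The following is a minimal sketch under stated assumptions: only two short windows copied from those sequences are compared, and the start positions (110 and 357) are obtained by counting residues from the N-terminal Met of SEQ ID N. 1, which agrees with primer names such as Leu365. The dictionary layout and function name are hypothetical.

```python
# Windows copied from SEQ ID N. 1 (protein 2), N. 2 (protein 11) and N. 3 (protein 73).
# Keys are the 1-based residue number of the first residue in each window (assumed numbering).
WINDOWS = {
    110: {"protein 2": "EVARDD", "protein 11": "EVAVDD", "protein 73": "SVAVDD"},
    357: {
        "protein 2": "ESQRLRPVLLGAFR",
        "protein 11": "ESQRLRPVLLGIFR",
        "protein 73": "ESQRLRPVSLGIFR",
    },
}

def substitutions(parent, child):
    """List substitutions (e.g. 'R113V') that turn the parent windows into the child windows."""
    found = []
    for start, seqs in sorted(WINDOWS.items()):
        for offset, (old, new) in enumerate(zip(seqs[parent], seqs[child])):
            if old != new:
                found.append(f"{old}{start + offset}{new}")
    return found

print(substitutions("protein 2", "protein 11"))
print(substitutions("protein 11", "protein 73"))
```

In these windows each mutagenesis cycle changes two residues, one in each window.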

A structural bioinformatics analysis of the enzyme revealed a number of other optional mutations in order to potentially increase performance, possibly in order to further reduce substrate inhibition (acting on the so-called “secondary pockets” in the three-dimensional structure of the enzyme).

Further consideration was thus dedicated to all the potentially useful mutations for producing additional improvements, making use of the information collected during the bioinformatics analysis.

Making use of the data derived from the docking analysis, the present work was directed towards certain positions that the bioinformatics analysis had identified as important, not only to increase the affinity between the substrate and protein, but also to eliminate residues that could create a second binding pocket for the DCA. Table 16 lists the main residues which, from the comparison of at least three docking models, are potentially involved in the formation of a second binding site for the DCA.

Table 16. List of residues involved in the formation of a secondary interaction site between protein and DCA in a plurality of docking models.

Figure 4 clearly shows the interesting residues involved within the amino acid sequence of the protein: they are distant from each other as regards primary linear sequence, but after folding of the biocatalyst they are all close to the region of interaction with the substrate. The bioinformatics models of substrate-cytochrome interaction identified the possible presence of a secondary region where the DCA can position itself, thus blocking entry to the actual active site and so reducing the performance of the enzyme.

The residues were selected as the target for saturation mutagenesis tests, which aim to substitute all of the 6 residues with all of the other 19 L-amino acids, obtaining 19^6 possible different sequence combinations for testing. Seq11 was used as a PCR reaction template in which the sequence was amplified in presence of a primer with degenerations affecting each of the 8 residues, so as to introduce random mutations in only the 8 selected positions. 5 different primers were used to introduce the maximum possible number of mutations, reducing the probability of obtaining stop codons to a minimum.
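The effect of degenerate codons on stop-codon frequency can be checked by expanding them against the standard genetic code. The sketch below is illustrative: it uses the IUPAC ambiguity codes that appear in the primers of the sequence listing (NNK, NNS, BNK, HNK), with fully random NNN added only as a reference. The helper names are hypothetical, and the library size is simply the 19^6 figure quoted above.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes used in the primers of the sequence listing.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "S": "CG", "B": "CGT", "H": "ACT"}

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand(degenerate_codon):
    """Return every concrete codon encoded by a degenerate codon such as 'NNK'."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in degenerate_codon))]

def summarize(degenerate_codon):
    """Count codons, stop codons and distinct amino acids for a degenerate codon."""
    residues = [CODON_TABLE[c] for c in expand(degenerate_codon)]
    return {
        "codons": len(residues),
        "stops": residues.count("*"),
        "amino_acids": len(set(residues) - {"*"}),
    }

for scheme in ("NNN", "NNK", "NNS", "BNK", "HNK"):
    print(scheme, summarize(scheme))

# Number of sequence combinations quoted in the text for 6 positions x 19 alternatives.
library_size = 19 ** 6
print(library_size)
```

Restricting the third base to G/T (NNK) or G/C (NNS) leaves TAG as the only possible stop codon while still covering all 20 amino acids.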

The pool of sequences obtained was cloned in the vector for intracellular expression in yeast regulated by a constitutive promoter, described herein above, thus eliminating the use of methanol as inducer, which is foreseen in commercial vectors but is difficult to apply in fermentations on an industrial scale (risks of flammability and toxicity). The plasmids obtained were utilized to transform the strain of P. pastoris selected previously, also capable of expressing the fungal accessory protein, NAD(P)H-P450 reductase.

EXPRESSION AND BIOCONVERSION TESTING

The clones derived from the transformations of P. pastoris using the sequences obtained from random mutagenesis were utilized for HTS expression and bioconversion tests, utilizing a 24 well microplate miniaturization system (2 ml of culture/well) for cellular growth, and 96 deep well microplates for the bioconversion step, such as to reduce the operating volumes and enable wide spectrum screening on different clones simultaneously, up to 192 clones per test.

Overall 5 expression and bioconversion tests were conducted, achieving an analysis of a total of 960 clones.
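The screening throughput follows from the plate formats given above. This is a minimal sketch, assuming one clone per well and no wells reserved for controls or blanks (the text does not say how controls were laid out); all names are illustrative.

```python
import math

CLONES_PER_TEST = 192      # maximum clones per test, from the text
NUMBER_OF_TESTS = 5        # expression and bioconversion tests run overall
GROWTH_PLATE_WELLS = 24    # 24-well microplates used for cellular growth
DEEP_WELL_PLATE_WELLS = 96 # 96 deep-well microplates used for bioconversion

def plates_needed(clones, wells_per_plate):
    """Plates required when each clone occupies one well (assumption)."""
    return math.ceil(clones / wells_per_plate)

total_clones = CLONES_PER_TEST * NUMBER_OF_TESTS
growth_plates = plates_needed(CLONES_PER_TEST, GROWTH_PLATE_WELLS)
bioconversion_plates = plates_needed(CLONES_PER_TEST, DEEP_WELL_PLATE_WELLS)

print(total_clones, growth_plates, bioconversion_plates)
```

Under these assumptions each test uses 8 growth plates and 2 deep-well plates, and five tests give the 960 clones reported.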

The growth step involved inoculating one clone per microplate well in buffered medium. After three days of growth, the cellular paste was separated from the exhausted medium by centrifugation and used to prepare bioconversion tests, following resuspension at 20% wwcp/V in reaction buffer and addition of DCA substrate to a final concentration of 20 g/L.

After 3 days of reaction the total bile acids were extracted using 2 volumes of methanol following acidification of the cellular suspensions. Analysis of the extracts was conducted using an INERTSIL ODS-2 column, C18, 250 x 4.6 mm, 5 µm, analytic CPS (cat. no. 5020-01128).

Of the 960 clones analyzed in the small scale HTS tests, none exhibited a better performance than the original clone, but three were identified (P2B1, P5C3, P8D5) offering similar performance compared to the positive control. The 3 clones identified were then tested on a larger scale, in 500 ml conical flasks with 100 ml of medium, such as to improve the oxygenation of the cultures, one of the parameters that most conditions growth of P. pastoris. In this way it was possible to assess whether the clones, under optimum growth and expression conditions, would produce a mutated cytochrome offering yields superior to the wild-type, or whether, though offering similar performances, the clones might exhibit superior growth in terms of biomass produced over time. From a perspective of industrial application, considering that the biocatalyst is comprised in the biomass itself, it is advantageous to have a yeast clone that grows as quickly as possible, for equal expression of the cytochrome of interest, such as to limit the fermentation times and consequently the costs. After three days of growth, the cellular paste was separated from the exhausted medium by way of centrifugation and used to prepare bioconversion tests, following suspension at 20% wwcp/V in reaction buffer with addition of DCA substrate at a final concentration of 20 g/L. The three clones did not exhibit higher growth rates compared to the yeast clone that expresses Seq73.

SEQUENCE LISTING

SEQ ID N. 1 (protein 2)

MATDLDLVLGKSQYALFCGITLFSFFILKYSLLGNGGKQYPYINPKKPFELSNQR

VVQDFIENARDILTKGRSLYKDTPYKAHTDLGDVLVIPPEFADALKSERQLDFT

EVARDDTHGYIPGFEPIGSPFDLVPLVNKYLTRALAKLTKPLWAEASLGVNHVL

GTSTEWHPINPGEDIMRIVSRMSSRIFMGEELCKDDDWLKVSIEYTVQLFQTAD

ELRNYPRWTRPYIHWFLPSCQGVRRKLQEARDLLQPHIDRRNAVKKEAIAEGR

PSPFDDSIEWFENEYEGKSDPATEQIKLSLVAIHTTTDLLSETMFNIALQPELLGP

LREEIVTVLSTEGLKKTSFYNLKLMDSVIKESQRLRPVLLGAFRRMALADVTLP

NGDVIKKGTKIICDTTHQWNPEYYPDASKFNAYRFLQMRQTPGQDKRAHLVST

SHDQMGFGHGLHACPGRFFAANEIKIALCHMLLKYDWKLPEGVVPKSKALGM

SLLGDREAKLMVKRRAAEIDIDAIGSDE

SEQ ID N. 2 (protein 11)

MATDLDLVLGKSQYALFCGITLFSFFILKYSLLGNGGKQYPYINPKKPFELSNQR

VVQDFIENARDILTKGRSLYKDTPYKAHTDLGDVLVIPPEFADALKSERQLDFT

EVAVDDTHGYIPGFEPIGSPFDLVPLVNKYLTRALAKLTKPLWAEASLGVNHVL

GTSTEWHPINPGEDIMRIVSRMSSRIFMGEELCKDDDWLKVSIEYTVQLFQTAD

ELRNYPRWTRPYIHWFLPSCQGVRRKLQEARDLLQPHIDRRNAVKKEAIAEGR

PSPFDDSIEWFENEYEGKSDPATEQIKLSLVAIHTTTDLLSETMFNIALQPELLGP

LREEIVTVLSTEGLKKTSFYNLKLMDSVIKESQRLRPVLLGIFRRMALADVTLPN

GDVIKKGTKIICDTTHQWNPEYYPDASKFNAYRFLQMRQTPGQDKRAHLVSTS

HDQMGFGHGLHACPGRFFAANEIKIALCHMLLKYDWKLPEGVVPKSKALGMS

LLGDREAKLMVKRRAAEIDIDAIGSDE

SEQ ID N. 3 (protein 73)

MATDLDLVLGKSQYALFCGITLFSFFILKYSLLGNGGKQYPYINPKKPFELSNQR

VVQDFIENARDILTKGRSLYKDTPYKAHTDLGDVLVIPPEFADALKSERQLDFT

SVAVDDTHGYIPGFEPIGSPFDLVPLVNKYLTRALAKLTKPLWAEASLGVNHVL

GTSTEWHPINPGEDIMRIVSRMSSRIFMGEELCKDDDWLKVSIEYTVQLFQTAD

ELRNYPRWTRPYIHWFLPSCQGVRRKLQEARDLLQPHIDRRNAVKKEAIAEGR

PSPFDDSIEWFENEYEGKSDPATEQIKLSLVAIHTTTDLLSETMFNIALQPELLGP

LREEIVTVLSTEGLKKTSFYNLKLMDSVIKESQRLRPVSLGIFRRMALADVTLPN

GDVIKKGTKIICDTTHQWNPEYYPDASKFNAYRFLQMRQTPGQDKRAHLVSTS

HDQMGFGHGLHACPGRFFAANEIKIALCHMLLKYDWKLPEGVVPKSKALGMS

LLGDREAKLMVKRRAAEIDIDAIGSDE

EVARsite (SEQ ID N. 4)

CTTTGAAATCTGAGAGACAATTGGATTTCACTNNKGTTGCTNNSGATGATAC

TCACGGTTACATTCCAGG

LLGAFRsite (SEQ ID N. 5)

CTGTTATTAAAGAGTCTCAAAGATTGAGACCTGTTNNKTTGGGTNNSTTCAG

AAGAATGGCTTTGGCTGATG

Glu110 (SEQ ID N. 6)

CTTTGAAATCTGAGAGACAATTGGATTTCACTNNKGTTGCTGTCGATGATAC

TCACGGTTACATTCCAGG

His83 (SEQ ID N. 7)

GTAGATCTTTGTACAAGGATACTCCATATAAAGCTNNKACTGATTTGGGAGATGTTTTG

Arg220 (SEQ ID N. 8)

TTGAGTACACTGTTCAATTGTTTCAAACTGCTGATGAATTGBNKAATTACCC

TAGATGGACTAGACCATATATTCATTGG

GIFRsite (SEQ ID N. 9)

GAGTCTCAAAGATTGAGACCTGTTAGTTTGHNKATCTTCNNKAGAATGGCTT

TGGCTGATGTTACTTTGCCA

Leu365 (SEQ ID N. 10)

CTGTTATTAAAGAGTCTCAAAGATTGAGACCTGTTNNKTTGGGTATCTTCAG

AAGAATGGCTTTGGCTGATG

SEQ ID N. 11 (nucleotide sequence Protein2)

ATGGCTACTGATTTGGATTTGGTTTTGGGTAAATCTCAATACGCTTTGTTTTG

TGGTATCACTTTGTTTTCTTTCTTTATCTTGAAGTATTCTTTGTTGGGTAACG

GTGGTAAACAATACCCTTACATCAACCCAAAGAAACCTTTCGAGTTGTCTAA

CCAAAGAGTTGTTCAAGATTTCATCGAAAACGCTAGAGATATCTTGACTAA

GGGTAGAAGTCCTTGTACAAGGATACTCCATATAAAGCTCATACTGATTTGG

GAGATGTTTTGGTTATTCCACCTGAATTTGCTGATGCTTTGAAATCTGAGAG

ACAATTGGATTTCACTGAAGTTGCTAGAGACGATACTCACGGTTACATTCCA

GGTTTTGAGCCTATTGGTTCTCCATTCGATTTGGTTCCTTTGGTTAACAAGTA

TTTGACTAGAGCTTTGGCTAAGTTGACTAAACCATTGTGGGCTGAAGCTTCT

TTGGGTGTTAATCATGTTTTGGGTACTTCTACTGAGTGGCACCCAATTAACC

CTGGTGAAGATATTATGAGAATCGTTTCTAGAATGTCTTCTAGAATTTTTAT

GGGTGAAGAGTTGTGTAAGGATGATGATTGGTTGAAAGTTTCTATTGAGTA

CACTGTTCAATTGTTTCAAACTGCTGATGAATTGAGAAATTACCCTAGATGG

ACTAGACCATATATTCATTGGTTCTTGCCTTCTTGTCAAGGTGTTAGAAGAA

AGTTGCAAGAAGCTAGAGATTTGTTGCAACCACACATTGATAGAAGAAATG

CTGTTAAGAAAGAGGCTATTGCTGAAGGTAGACCATCTCCTTTTGATGATTC

TATTGAGTGGTTCGAAAACGAGTACGAGGGTAAATCTGATCCAGCTACTGA

ACAAATTAAATTGTCTTTGGTTGCTATCCATACTACTACTGATTTGTTGTCTG

AGACTATGTTCAATATTGCTTTGCAACCTGAATTGTTGGGTCCATTGAGAGA

AGAGATTGTTACTGTTTTGTCTACTGAAGGTTTGAAGAAAACTTCTTTCTAC

AACTTGAAGTTGATGGATTCTGTTATTAAAGAGTCTCAAAGATTGAGACCTG

TTTTGTTGGGTGCTTTCAGAAGAATGGCTTTGGCTGATGTTACTTTGCCAAA

CGGAGATGTTATTAAGAAAGGTACTAAGATTATTTGTGATACTACTCATCAA

TGGAACCCTGAATACTATCCAGATGCTTCTAAGTTTAACGCTTACAGATTCT

TGCAAATGAGACAAACTCCTGGTCAAGATAAAAGAGCCCATTTGGTTTCTA

CTTCTCACGATCAAATGGGTTTTGGTCATGGTTTGCACGCTTGTCCAGGTAG

ATTTTTCGCTGCTAACGAGATTAAAATTGCTTTGTGTCACATGTTGTTGAAG

TATGATTGGAAATTGCCTGAAGGTGTTGTTCCAAAGTCTAAAGCTTTGGGTA

TGTCTTTGTTGGGAGATAGAGAGGCTAAGTTGATGGTTAAAAGAAGAGCTG

CTGAGATTGATATTGATGCTATTGGTTCTGATGAATAA

SEQ ID N. 12 (nucleotide sequence Seq11)

ATGGCTACTGATTTGGATTTGGTTTTGGGTAAATCTCAATACGCTTTGTTTTG

TGGTATCACTTTGTTTTCTTTCTTTATCTTGAAGTATTCTTTGTTGGGTAACG

GTGGTAAACAATACCCTTACATCAACCCAAAGAAACCTTTCGAGTTGTCTAA

CCAAAGAGTTGTTCAAGATTTCATCGAAAACGCTAGAGATATCTTGACTAA

GGGTAGAAGTCCTTGTACAAGGATACTCCATATAAAGCTCATACTGATTTGG

GAGATGTTTTGGTTATTCCACCTGAATTTGCTGATGCTTTGAAATCTGAGAG

ACAATTGGATTTCACTGAAGTTGCTGTCGACGATACTCACGGTTACATTCCA

GGTTTTGAGCCTATTGGTTCTCCATTCGATTTGGTTCCTTTGGTTAACAAGTA

TTTGACTAGAGCTTTGGCTAAGTTGACTAAACCATTGTGGGCTGAAGCTTCT

TTGGGTGTTAATCATGTTTTGGGTACTTCTACTGAGTGGCACCCAATTAACC

CTGGTGAAGATATTATGAGAATCGTTTCTAGAATGTCTTCTAGAATTTTTAT

GGGTGAAGAGTTGTGTAAGGATGATGATTGGTTGAAAGTTTCTATTGAGTA

CACTGTTCAATTGTTTCAAACTGCTGATGAATTGAGAAATTACCCTAGATGG

ACTAGACCATATATTCATTGGTTCTTGCCTTCTTGTCAAGGTGTTAGAAGAA

AGTTGCAAGAAGCTAGAGATTTGTTGCAACCACACATTGATAGAAGAAATG

CTGTTAAGAAAGAGGCTATTGCTGAAGGTAGACCATCTCCTTTTGATGATTC

TATTGAGTGGTTCGAAAACGAGTACGAGGGTAAATCTGATCCAGCTACTGA

ACAAATTAAATTGTCTTTGGTTGCTATCCATACTACTACTGATTTGTTGTCTG

AGACTATGTTCAATATTGCTTTGCAACCTGAATTGTTGGGTCCATTGAGAGA

AGAGATTGTTACTGTTTTGTCTACTGAAGGTTTGAAGAAAACTTCTTTCTAC

AACTTGAAGTTGATGGATTCTGTTATTAAAGAGTCTCAAAGATTGAGACCTG

TTTTGTTGGGTATCTTCAGAAGAATGGCTTTGGCTGATGTTACTTTGCCAAA

CGGAGATGTTATTAAGAAAGGTACTAAGATTATTTGTGATACTACTCATCAA

TGGAACCCTGAATACTATCCAGATGCTTCTAAGTTTAACGCTTACAGATTCT

TGCAAATGAGACAAACTCCTGGTCAAGATAAAAGAGCCCATTTGGTTTCTA

CTTCTCACGATCAAATGGGTTTTGGTCATGGTTTGCACGCTTGTCCAGGTAG

ATTTTTCGCTGCTAACGAGATTAAAATTGCTTTGTGTCACATGTTGTTGAAG

TATGATTGGAAATTGCCTGAAGGTGTTGTTCCAAAGTCTAAAGCTTTGGGTA

TGTCTTTGTTGGGAGATAGAGAGGCTAAGTTGATGGTTAAAAGAAGAGCTG

CTGAGATTGATATTGATGCTATTGGTTCTGATGAATAA

SEQ ID N. 13 (nucleotide sequence Seq73)

ATGGCTACTGATTTGGATTTGGTTTTGGGTAAATCTCAATACGCTTTGTTTTG

TGGTATCACTTTGTTTTCTTTCTTTATCTTGAAGTATTCTTTGTTGGGTAACG

GTGGTAAACAATACCCTTACATCAACCCAAAGAAACCTTTCGAGTTGTCTAA

CCAAAGAGTTGTTCAAGATTTCATCGAAAACGCTAGAGATATCTTGACTAA

GGGTAGAAGTCCTTGTACAAGGATACTCCATATAAAGCTCATACTGATTTGG

GAGATGTTTTGGTTATTCCACCTGAATTTGCTGATGCTTTGAAATCTGAGAG

ACAATTGGATTTCACTAGTGTTGCTGTCGACGATACTCACGGTTACATTCCA

GGTTTTGAGCCTATTGGTTCTCCATTCGATTTGGTTCCTTTGGTTAACAAGTA

TTTGACTAGAGCTTTGGCTAAGTTGACTAAACCATTGTGGGCTGAAGCTTCT

TTGGGTGTTAATCATGTTTTGGGTACTTCTACTGAGTGGCACCCAATTAACC

CTGGTGAAGATATTATGAGAATCGTTTCTAGAATGTCTTCTAGAATTTTTAT

GGGTGAAGAGTTGTGTAAGGATGATGATTGGTTGAAAGTTTCTATTGAGTA

CACTGTTCAATTGTTTCAAACTGCTGATGAATTGAGAAATTACCCTAGATGG

ACTAGACCATATATTCATTGGTTCTTGCCTTCTTGTCAAGGTGTTAGAAGAA

AGTTGCAAGAAGCTAGAGATTTGTTGCAACCACACATTGATAGAAGAAATG

CTGTTAAGAAAGAGGCTATTGCTGAAGGTAGACCATCTCCTTTTGATGATTC

TATTGAGTGGTTCGAAAACGAGTACGAGGGTAAATCTGATCCAGCTACTGA

ACAAATTAAATTGTCTTTGGTTGCTATCCATACTACTACTGATTTGTTGTCTG

AGACTATGTTCAATATTGCTTTGCAACCTGAATTGTTGGGTCCATTGAGAGA

AGAGATTGTTACTGTTTTGTCTACTGAAGGTTTGAAGAAAACTTCTTTCTAC

AACTTGAAGTTGATGGATTCTGTTATTAAAGAGTCTCAAAGATTGAGACCTG

TTAGTTTGGGTATCTTCAGAAGAATGGCTTTGGCTGATGTTACTTTGCCAAA

CGGAGATGTTATTAAGAAAGGTACTAAGATTATTTGTGATACTACTCATCAA

TGGAACCCTGAATACTATCCAGATGCTTCTAAGTTTAACGCTTACAGATTCT

TGCAAATGAGACAAACTCCTGGTCAAGATAAAAGAGCCCATTTGGTTTCTA

CTTCTCACGATCAAATGGGTTTTGGTCATGGTTTGCACGCTTGTCCAGGTAG

ATTTTTCGCTGCTAACGAGATTAAAATTGCTTTGTGTCACATGTTGTTGAAG

TATGATTGGAAATTGCCTGAAGGTGTTGTTCCAAAGTCTAAAGCTTTGGGTA

TGTCTTTGTTGGGAGATAGAGAGGCTAAGTTGATGGTTAAAAGAAGAGCTG

CTGAGATTGATATTGATGCTATTGGTTCTGATGAATAA