


Title:
IMPROVED OXIDOREDUCTASES FOR BIOCATALYSIS
Document Type and Number:
WIPO Patent Application WO/2016/101045
Kind Code:
A1
Abstract:
Engineered ancestral P450 enzymes and P450 reductase enzymes are provided. Also provided are methods for the production of these engineered ancestral enzymes. The engineered ancestral enzymes may have one or more improved or enhanced properties relative to one or more existing P450 enzymes or P450 reductase enzymes. One or more of these enzymes can be used to perform chemical reactions, such as for structure-activity relationship analysis; pharmacological testing; bioremediation; or biosensor technology.

Inventors:
GILLAM ELIZABETH (AU)
GUMULYA YOSEPHINE (AU)
Application Number:
PCT/AU2015/050847
Publication Date:
June 30, 2016
Filing Date:
December 24, 2015
Assignee:
UNIV QUEENSLAND (AU)
International Classes:
C12N15/53; C12Q1/00
Domestic Patent References:
WO2012038508A1	2012-03-29
Other References:
DATABASE GenBank 2 May 2011 (2011-05-02), Database accession no. AD060898.1
DATABASE UniProtKB "H3BGP9_LATCH: NADPH--cytochrome P450 reductase", Database accession no. H3BGP9
GILLAM E. M. J.: "Engineering Cytochrome P450 Enzymes", CHEMICAL RESEARCH IN TOXICOLOGY, vol. 21, 2008, pages 220 - 231
KABUMOTO H. ET AL.: "Directed Evolution of the Actinomycete Cytochrome p450 MoxA (CYP105) for Enhanced Activity.", BIOSCIENCE BIOTECHNOLOGY BIOCHEMISTRY, vol. 73, no. 9, 2009, pages 1922 - 1927, XP055187656, DOI: doi:10.1271/bbb.90013
KUMAR S. ET AL.: "Directed Evolution of Mammalian Cytochrome P450 2B1", THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 280, no. 20, 20 May 2005 (2005-05-20), pages 19569 - 19575
Attorney, Agent or Firm:
FISHER ADAMS KELLY CALLINANS (175 Eagle Street, Brisbane, Queensland 4000, AU)
Claims:
CLAIMS

1. An isolated protein comprising the amino acid sequence set forth in SEQ ID NO:1, or an amino acid sequence at least 80% identical to SEQ ID NO:1, wherein X1-X219 in SEQ ID NO:1, or in the amino acid sequence at least 80% identical to SEQ ID NO:1, may be any amino acid.

2. The isolated protein of Claim 1, wherein one or more of the residues X1-X219 is selected from the respective group of variable amino acids as set forth in Table 9.

3. The isolated protein of Claim 1 or Claim 2, wherein said isolated protein comprises an amino acid sequence set forth in any one of SEQ ID NOS:2-41 or SEQ ID NOS: 544-578, or an amino acid sequence at least 80% identical to any one of SEQ ID NOS:2-41 or SEQ ID NOS: 544-578.

4. An isolated protein comprising the amino acid sequence set forth in SEQ ID NO: 180, or an amino acid sequence at least 80% identical to SEQ ID NO: 180, wherein residues X1-X163 in SEQ ID NO: 180, or in the amino acid sequence at least 80% identical to SEQ ID NO: 180, may be any amino acid.

5. The isolated protein of Claim 4, wherein one or more of the residues X1-X163 is selected from the respective group of variable amino acids as set forth in Table 10.

6. The isolated protein of Claim 4 or Claim 5, wherein said isolated protein comprises an amino acid sequence set forth in any one of SEQ ID NOS: 181-250, or an amino acid sequence at least 80% identical to any one of SEQ ID NOS: 181-250.

7. The isolated protein of any preceding claim, wherein said isolated protein has P450 enzyme activity.

8. An isolated protein comprising the amino acid sequence set forth in SEQ ID NO:321, or an amino acid sequence at least 80% identical to SEQ ID NO:321, wherein residues X1-X11 in SEQ ID NO:321 or in the amino acid sequence at least 80% identical to SEQ ID NO:321, may be any amino acid.

9. The isolated protein of Claim 8, wherein one or more of the residues X1-X11 is selected from the respective group of variable amino acids as set forth in Table 11.

10. The isolated protein of Claim 8 or Claim 9, wherein said isolated protein comprises an amino acid set forth in any one of SEQ ID NOS:322-431, or an amino acid sequence at least 80% identical to any one of SEQ ID NOS:322-431.

11. The isolated protein of any one of Claims 8-10, wherein said isolated protein has P450 reductase enzyme activity.

12. An isolated protein that comprises an amino acid sequence set forth in any one of SEQ ID NOS:2-41, 181-250, 322-431, or 544-578, or an amino acid sequence at least 80% identical to any of these.

13. A method of producing or constructing an isolated protein, said method including the step of producing or constructing an engineered ancestral amino acid sequence of at least a fragment of a P450 protein or P450 reductase protein from one or more P450 protein or P450 reductase amino acid sequences that are different to the engineered ancestral amino acid sequence.

14. The method of claim 13, wherein the engineered ancestral P450 protein or P450 reductase protein displays or possesses one or more increased or enhanced properties compared to at least one of the one or more P450 proteins or P450 reductase proteins comprising amino acid sequences that are different to the engineered ancestral amino acid sequence.

15. An isolated engineered ancestral P450 protein or P450 reductase protein produced according to the method of Claim 13 or Claim 14.

16. The method of Claim 13 or Claim 14, or the protein of Claim 15, wherein the engineered ancestral P450 protein or P450 reductase protein comprises an amino acid sequence set forth in SEQ ID NOS: 1, 2, 180, 181, 321 and/or 322.

17. A method of producing or constructing a modified engineered ancestral P450 protein or P450 reductase protein, said method including the step of introducing one or more amino acid substitutions in an amino acid sequence of the engineered ancestral P450 protein or P450 reductase protein to thereby produce or construct the modified engineered ancestral P450 protein or P450 reductase protein.

18. The method of Claim 17, wherein the amino acid substitutions are non-conservative amino acid substitutions.

19. The method of Claim 18, wherein the modified engineered ancestral P450 protein or P450 reductase protein displays or possesses one or more increased or enhanced properties compared to the engineered ancestral P450 protein or P450 reductase protein.

20. A modified engineered ancestral P450 protein or P450 reductase protein produced according to the method of any one of Claims 17-19.

21. The method of any one of Claims 17-19, or the protein of Claim 20, wherein the modified engineered ancestral P450 protein or P450 reductase protein comprises an amino acid sequence set forth in SEQ ID NOS: 1, 3-41, 180, 182-250, 321, 323-431 and/or 544-578, or an amino acid sequence at least 80% identical to SEQ ID NOS: 1, 3-41, 180, 182-250, 321, 323-431 and/or SEQ ID NOS:544-578.

22. An isolated nucleic acid encoding the isolated protein of any one of Claims 1-12, 15, 16, or 20, or produced according to the method of any one of Claims 13, 14 or 16-21, or a fragment or derivative of said isolated protein.

23. A genetic construct comprising the isolated nucleic acid of Claim 22.

24. A host cell comprising the genetic construct of Claim 23.

25. An antibody or antibody fragment which binds and/or is raised against an isolated protein of any one of Claims 1-12, 15, 16, 20, or 21.

26. A composition for performing a chemical reaction, said composition comprising one or more of the isolated proteins of any one of Claims 1-12, 15, 16, 20, or 21, and one or more buffers, solvents and/or other reagents suitable for the chemical reaction.

27. A method of performing a chemical reaction, said method including the step of exposing a substrate molecule to one or more of the isolated proteins of any one of Claims 1-12, 15, 16, 20, or 21, to thereby perform the chemical reaction.

28. A reaction product produced according to the method of Claim 27.

29. The composition of Claim 26 or the method of Claim 27, wherein the one or more isolated proteins include a protein having P450 enzyme activity and a protein having P450 reductase activity.

30. Use of the isolated protein of any one of Claims 1-12, 15, 16, 20, or 21, or the composition of Claim 26 or Claim 29 for structure-activity relationship analysis; pharmacological testing; bioremediation; or biosensor technology.

Description:
TITLE

IMPROVED OXIDOREDUCTASES FOR BIOCATALYSIS

TECHNICAL FIELD

The present invention relates to the use of oxidoreductases for biocatalysis. More particularly, the invention relates to isolated, engineered oxidoreductase proteins with improved characteristics for use in biocatalysis, and a process for developing these proteins.

BACKGROUND

Biocatalysis, the use of enzymes to perform chemical transformations of industrial relevance, is emerging as an attractive means of addressing bottlenecks in the synthesis and modification of new chemicals. The basic principle underpinning this is that enzymes can catalyse reactions with greater chemo-, regio- and stereo-selectivity than can be accomplished by purely chemical means (Walsh 2001). Additionally, biocatalysts have substantial utility in biosensor technology (Ronkainen et al. 2010) and for bioremediation (Wood 2008). Oxidoreductases comprise a major proportion of biocatalysts exploited industrially (Straathof et al. 2002). The P450 oxidoreductases are amongst the most versatile enzymes known, a property which has made them a high-priority target for exploitation as biocatalysts (Guengerich 2002). Furthermore, P450 oxidoreductases can be used in the preparation of small quantities of metabolites for use as experimental standards, and P450 oxidoreductase-based biocatalytic assays have substantial potential for screening drug candidates.

In microorganisms, P450s tend to be specialized to interact in a highly efficient manner with a relatively small set of structurally related substrates. However, efficiency is generally markedly diminished when microbial P450 forms act on unnatural substrates. In multicellular organisms, certain P450s perform key roles in xenobiotic metabolism, including the metabolism of drugs. These P450s show unusual characteristics compared to other more typical enzyme catalysts in that they act on an extraordinarily wide spectrum of substrates (Rendic 2002); such xenobiotic-metabolizing P450s may therefore be useful in a broader range of biocatalytic applications. However, known xenobiotic-metabolizing P450s possess relatively poor catalytic efficiency and inefficient coupling of product formation to cofactor consumption when acting on any one substrate, as well as sub-optimal interactions with their principal redox partner, NADPH-cytochrome P450 reductase (referred to variously as NPR or CPR) (Shimada et al. 2005; Tan et al. 1997). Process efficiency considerations mean that it is usually desirable to add high concentrations of substrates and generate high concentrations of product in bioreactors (Straathof et al. 2002). Achieving high substrate concentrations may necessitate the use of organic solvents at concentrations that are deleterious to the activity or stability of most P450s (e.g. Chauret et al. 1998; Wong et al. 2004). Another inherent difficulty with using P450s in an industrial context is their thermal and mechanical instability; this is particularly a problem for the lipophilic, membrane-bound, mammalian P450s, which typically need to be processed and stored in a high concentration of glycerol or some other cryoprotectant (Guengerich 1995; Guengerich et al. 1996). Conducting reactions at increased temperature may also improve efficiency; however, at temperatures above those at which a P450 is stable, deactivation of the enzyme will negate any efficiency gains.

The identification and/or development of P450s with enhanced properties for industrial applications is clearly desirable. To date, screening libraries of existing P450s for desired properties has been seen as more expedient than engineering a new catalyst from a nonspecialized precursor (Straathof et al. 2002). However, improvements in protein engineering offer new scope to rapidly customize enzymes (Burton et al. 2002). Certain well-characterized and relatively efficient bacterial P450 forms have been engineered to redefine their substrate specificity, regio- and enantio-selectivity, and catalytic efficiency. However, one potential limitation of substrate-specific microbial P450-derived mutant proteins, particularly for medicinal chemistry applications, is that they will be unlikely to produce many minor metabolites that may be of interest. Thus, the xenobiotic-metabolizing P450s (e.g. mammalian enzymes) may be useful for engineering proteins that can produce a wider range of metabolites.

SUMMARY

The invention is broadly directed to the production of "engineered ancestral" P450 enzymes and/or redox partners suitable for use with P450 enzymes. It is a preferred object of the invention to provide "engineered ancestral" P450 enzymes and/or redox partners suitable for use with P450 enzymes that display or possess one or more desirable properties that are at least partly absent in extant P450 enzymes and/or redox partners or are relatively enhanced compared to extant P450 enzymes and/or redox partners.

In a first aspect, the invention provides an isolated protein comprising the amino acid sequence set forth in SEQ ID NO:1, or an amino acid sequence at least 80% identical to SEQ ID NO:1, wherein residues X1-X219 in SEQ ID NO:1, or in the amino acid sequence at least 80% identical to SEQ ID NO:1, may be any amino acid.

In an embodiment of this aspect, one or more of the residues X1-X219 is selected from the respective group of variable amino acids as set forth in Table 9.

In particular embodiments, the isolated protein comprises an amino acid sequence set forth in any one of SEQ ID NOS:2-41 or SEQ ID NOS: 544-578.

Suitably, the isolated protein of this aspect has P450 enzyme activity.

This aspect also provides fragments, variants and/or derivatives of the isolated protein.

In one embodiment the isolated protein comprises an amino acid sequence at least 80% identical to any one of SEQ ID NOS:2-41 or SEQ ID NOS:544-578.
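The "at least 80% identical" threshold used throughout these aspects can be illustrated with a minimal sketch. This section does not say how percent identity is to be calculated, so everything in the block is an assumption made for illustration: the hypothetical function `percent_identity` takes two sequences that have already been aligned to equal length, treats `-` as a gap, counts alignment columns that are not gaps in both sequences as the denominator, and lets an `X` position match any residue (mirroring the statement that the X residues "may be any amino acid"). The short sequences in the usage note are toy strings, not sequences from the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str, wildcard: str = "X") -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Illustrative assumptions (not taken from the specification):
    - '-' marks a gap;
    - columns where both sequences have a gap are ignored;
    - a column with a gap in only one sequence counts as a mismatch;
    - the wildcard residue matches any amino acid.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == "-" or b == "-":
            continue  # gap against a residue: counted, never a match
        if a == b or a == wildcard or b == wildcard:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `meets_threshold("MA-KT", "MAKKT")` compares five columns, four of which match, so the toy pair sits exactly on the 80% boundary. Real comparisons would normally rely on an established alignment tool and its own identity definition, which may differ from the one assumed here.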

In a second aspect, the invention provides an isolated protein comprising the amino acid sequence set forth in SEQ ID NO: 180, or an amino acid sequence at least 80% identical to SEQ ID NO: 180, wherein residues X1-X163 in SEQ ID NO: 180, or in the amino acid sequence at least 80% identical to SEQ ID NO: 180, may be any amino acid.

In an embodiment of this aspect, one or more of the residues X1-X163 is selected from the respective group of variable amino acids set forth in Table 10.

In particular embodiments, the isolated protein comprises an amino acid sequence set forth in any one of SEQ ID NOS: 181-250.

Suitably, the isolated protein of this aspect has P450 enzyme activity.

This aspect also provides fragments, variants and/or derivatives of the isolated protein.

One embodiment provides an isolated protein comprising an amino acid sequence at least 80% identical to the amino acid sequence of any one of SEQ ID NOS: 181-250.

In a third aspect, the invention provides an isolated protein comprising the amino acid sequence set forth in SEQ ID NO:321, or an amino acid sequence at least 80% identical to SEQ ID NO:321, wherein residues X1-X11 in SEQ ID NO:321, or in the amino acid sequence at least 80% identical to SEQ ID NO:321, may be any amino acid.

In an embodiment of this aspect, one or more of the residues X1-X11 is selected from the respective group of variable amino acids set forth in Table 11.

In particular embodiments, the isolated protein comprises an amino acid set forth in SEQ ID NOS:322-431. Suitably, the isolated protein of this aspect has P450 reductase enzyme activity. This aspect also provides fragments, variants and/or derivatives of the isolated protein.

One embodiment provides an isolated protein comprising an amino acid sequence at least 80% identical to the amino acid sequence of any one of SEQ ID NOS:322-431.

In a fourth aspect, the invention provides a method of producing or constructing an isolated protein, said method including the step of producing or constructing an engineered ancestral amino acid sequence of at least a fragment of a P450 protein or P450 reductase protein from one or more P450 protein or P450 reductase amino acid sequences that are different to the engineered ancestral amino acid sequence.

Suitably, the engineered ancestral P450 protein or P450 reductase protein displays or possesses one or more increased or enhanced properties compared to at least one of the one or more P450 proteins or P450 reductase proteins comprising amino acid sequences that are different to the engineered ancestral amino acid sequence.

In a fifth aspect, the invention provides an isolated P450 protein or P450 reductase protein produced according to the method of the fourth aspect.

In certain embodiments of the fourth or fifth aspects, the isolated protein having P450 enzyme activity is the isolated protein of the first aspect. In one particular embodiment, the isolated protein comprises an amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2.

Non-limiting examples of the one or more P450 protein amino acid sequences that are different to said engineered ancestral amino acid sequence are set forth in SEQ ID NOS:42-179.

In another embodiment, the isolated protein having P450 enzyme activity is the isolated protein of the second aspect. In one particular embodiment, the isolated protein comprises an amino acid sequence set forth in SEQ ID NO: 180 or SEQ ID NO: 181.

Non-limiting examples of the one or more P450 protein amino acid sequences that are different to said engineered ancestral amino acid sequence are set forth in SEQ ID NOS:251-320.

In another embodiment, the isolated protein having P450 reductase enzyme activity is the isolated protein of the third aspect. In one particular embodiment, the isolated protein comprises an amino acid sequence set forth in SEQ ID NO:321 or SEQ ID NO:322. Non-limiting examples of the one or more P450 reductase protein amino acid sequences that are different to said engineered ancestral amino acid sequence are set forth in SEQ ID NOS:432-540.

In a sixth aspect, the invention provides a method of producing or constructing a modified engineered ancestral P450 protein or P450 reductase protein, said method including the step of introducing one or more amino acid substitutions in an amino acid sequence of the engineered ancestral P450 protein or P450 reductase protein to thereby produce or construct the modified engineered ancestral P450 protein or P450 reductase protein.

In certain preferred embodiments, said amino acid substitutions are non-conservative amino acid substitutions.

Suitably, the modified engineered ancestral P450 protein or P450 reductase protein displays or possesses one or more increased or enhanced properties compared to the engineered ancestral P450 protein or P450 reductase protein.

In a seventh aspect, the invention provides a modified engineered ancestral P450 protein or P450 reductase protein produced according to the method of the sixth aspect.

In certain embodiments of the sixth or seventh aspect, the modified P450 enzyme comprises an amino acid sequence set forth in SEQ ID NO:1, any one of SEQ ID NOS:3-41 or SEQ ID NOS:544-578.

In other particular embodiments the isolated protein comprises an amino acid sequence at least 80% identical to any one of SEQ ID NOS:3-41 or SEQ ID NOS:544-578.

In other embodiments, the modified P450 enzyme comprises an amino acid sequence set forth in SEQ ID NO: 180 or any one of SEQ ID NOS: 182-250.

In still other embodiments the isolated protein comprises an amino acid sequence at least 80% identical to any one of SEQ ID NOS: 182-250.

In yet other embodiments, the modified P450 reductase enzyme comprises an amino acid sequence set forth in SEQ ID NO:321 or any one of SEQ ID NOS:323-431.

In still yet other embodiments the isolated protein comprises an amino acid sequence at least 80% identical to any one of SEQ ID NOS:323-431.

In an eighth aspect, the invention provides an isolated nucleic acid encoding an isolated protein, fragment or derivative of any one of the aforementioned aspects, or produced according to the fourth aspect.

In a ninth aspect, the invention provides a genetic construct comprising an isolated nucleic acid of the eighth aspect.

In a tenth aspect, the invention provides a host cell comprising the genetic construct of the ninth aspect.

In an eleventh aspect, the invention provides an antibody or antibody fragment which binds and/or is raised against an isolated protein of any one of the aforementioned aspects.

Suitably, in embodiments wherein the isolated protein is of the first aspect, the antibody or antibody fragment shows at least partial specificity for an isolated protein comprising any one of the amino acid sequences set forth in SEQ ID NOS:2-41 or SEQ ID NOS:544-578.

Suitably, in embodiments wherein the isolated protein is of the second aspect, the antibody or antibody fragment shows at least partial specificity for an isolated protein comprising any one of the amino acid sequences set forth in SEQ ID NOS: 181-250.

Suitably, in embodiments wherein the isolated protein is of the third aspect, the antibody or antibody fragment shows at least partial specificity for an isolated protein comprising any one of the amino acid sequences set forth in SEQ ID NOS:322-431.

A twelfth aspect of the invention relates to a composition for performing a chemical reaction, said composition comprising one or more isolated proteins according to the aforementioned aspects and one or more buffers, solvents and/or other reagents suitable for performing the chemical reaction.

A thirteenth aspect of the invention relates to a method of performing a chemical reaction, said method including the step of exposing one or more substrate molecules to one or more isolated proteins according to the aforementioned aspects to thereby perform the chemical reaction.

A fourteenth aspect of the invention relates to a reaction product produced according to the method of the thirteenth aspect.

A fifteenth aspect of the invention relates to use of the isolated protein or composition of the aforementioned aspects for biocatalysis, preferably for structure-activity relationship analysis; pharmacological testing; bioremediation; or biosensor technology.

It will be appreciated that the indefinite articles "a" and "an" are not to be read as singular indefinite articles or as otherwise excluding more than one or more than a single subject to which the indefinite article refers. For example, "a" protein includes one protein, one or more proteins or a plurality of proteins.

As used herein, unless the context requires otherwise, the words "comprise", "comprises" and "comprising" will be understood to mean the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

BRIEF DESCRIPTION OF THE FIGURES

In order that the invention may be readily understood and put into practical effect, preferred embodiments will now be described by way of example with reference to the accompanying figures, wherein:

Figure 1 sets out the amino acid sequences SEQ ID NOS:1 and 2.

Figure 2 sets out the amino acid sequences SEQ ID NOS: 180 and 181.

Figure 3 sets out the amino acid sequences SEQ ID NOS:321 and 322.

Figure 4 is a schematic representation of the construction of the bicistronic expression vector for CYP3_N1, pCW/3_N1His/hNPR.

Figure 5 is a schematic representation of the construction of the monocistronic expression vector for CYP3_N1, pCW/3_N1His.

Figure 6 is a schematic representation of the construction of the bicistronic vector for coexpression of the CYP3_N1-EGFP fusion with hCPR, pCW/3_N1-EYFP His/hNPR.

Figure 7 is a schematic representation of the construction of the bicistronic expression vector for CYP2D_N1.

Figure 8 sets out the N-terminal sequences of two inferred ancestors (CYP2D_N1-nat; and CYP2D_N1-FL which is herein referred to as CYP2D_N1) and the extant CYP2D proteins. The modifications made include changing the second residue to alanine (Gillam et al. 1995, Arch. Biochem. Biophys. 319, 540-550) and using the MAKKTSSKGK leader sequence (von Wachenfeldt et al. 1997, Arch. Biochem. Biophys. 339, 107-114) before the conserved PPGP motif. The CYP2D6 modifications used previously have also been included (Gillam et al. 1995, Arch. Biochem. Biophys. 319, 540-550). The "FL" sequences are full-length and the "trunc" sequences truncated.

Figure 9 is a schematic representation of the construction of the bicistronic expression vector for CPR_N1.

Figure 10 sets out thermostability profiles of extant (CYP3A4, CYP3A5, CYP3A27, and CYP3A37, labelled as 3A4, 3A5, 3A27, and 3A37, respectively) and engineered ancestral (CYP3_N1, labelled as 3_N1) CYP3 proteins, as measured by the percentage of folded protein after treatment with various temperatures. Heat treatment comprised heating the protein at the indicated temperature for 60 min, followed by cooling at 4°C and equilibration to room temperature for 5 min.

Figure 11 sets out thermostability profiles of extant (2D22 from mouse) and ancestor (2D_N1) CYP2Ds based on percentage of folded protein after heat treatment. 2D_N1 refers to the engineered ancestral CYP2D_N1 protein, as described herein. Heat treatment comprised heating the protein at the indicated temperature for 60 min, followed by cooling at 4°C and equilibration to room temperature for 5 min.

Figure 12 sets out an Arrhenius plot of extant (CYP3A4, CYP3A5, CYP3A27, and CYP3A37, labelled as 3A4, 3A5, 3A27, and 3A37, respectively) and engineered ancestral (CYP3_N1, labelled as 3_N1) CYP3 proteins based on the percentage of folded protein after heat treatment. Samples were periodically taken and the remaining folded protein was subsequently measured. Heat treatment comprised heating the protein at the indicated temperature for 60 min, followed by cooling at 4°C and equilibration to room temperature for 5 min.
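The kind of analysis behind an Arrhenius plot such as Figure 12 can be sketched as follows. The figure description does not state the kinetic model or fitting procedure, so this block rests on labelled assumptions: loss of folded protein is taken to be first order in time, the rate constant at each temperature is obtained from a least-squares line through the origin of ln(fraction folded) against time, and ln k is then regressed on 1/T to estimate an apparent activation energy. The function names `first_order_rate` and `arrhenius_fit` are hypothetical, and no data from the figures are reproduced.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def first_order_rate(times_min, percent_folded):
    """Estimate a first-order inactivation rate constant (per minute).

    Assumes ln(fraction folded) = -k * t and fits the slope through the
    origin by least squares. This model is an illustrative assumption.
    """
    num = 0.0
    den = 0.0
    for t, p in zip(times_min, percent_folded):
        if p <= 0:
            raise ValueError("percent folded must be positive")
        num += t * math.log(p / 100.0)
        den += t * t
    if den == 0:
        raise ValueError("need at least one non-zero time point")
    return -num / den


def arrhenius_fit(temps_celsius, rate_constants):
    """Fit ln k = ln A - Ea / (R * T) by ordinary least squares.

    Returns (Ea in J/mol, pre-exponential factor A in the units of k).
    """
    xs = [1.0 / (t + 273.15) for t in temps_celsius]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    if n < 2:
        raise ValueError("need rate constants at two or more temperatures")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope * R, math.exp(intercept)
```

Under these assumptions, a protein whose fitted line lies lower on the plot (smaller k at a given temperature) loses folded protein more slowly at that temperature. Whether the measured decay is actually first order would need to be checked against the time-course data.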

Figure 13 sets out thermostability profiles of CYP3_N1 expressed in mono- (labelled as 3_N1) and bicistronic (labelled as 3_N1_hNPR) format based on percentage of folded protein after heat treatment. Heat treatment comprised heating the protein at the indicated temperature for 60 min, followed by cooling at 4°C and equilibration to room temperature for 5 min.

Figure 14 sets out thermostability profiles of engineered ancestral CPR_N1 and extant human CPR (labelled as hNPR). Heat treatment comprised heating the protein at the indicated temperature for 60 min, followed by cooling at 4°C and equilibration to room temperature for 5 min. Data are shown for three preparations of CPR_N1 derived from separate cultures (CPR.N1 #1, CPR.N1 #2, and CPR.N1 #3) compared to a pooled preparation of hNPR.

Figure 15 sets out solvent stability profiles of engineered ancestral CYP3_N1 (labelled as TS) and extant CYP3A4 (labelled as 3A4) in methanol, DMSO, or acetonitrile at various concentrations.

Figure 16 sets out solvent stability profiles of engineered ancestral CYP3_N1 (labelled as TS) and extant CYP3A4 (labelled as 3A4) in 10% methanol, 10% DMSO, or 10% acetonitrile.

Figure 17 is a schematic representation of the construction of the CYP3_N1 library.

Figure 18 sets out ligand binding data for engineered ancestral CYP3_N1 (labelled as CYP TS) and extant CYP3A4 (labelled as Parent). '-' indicates that data is not presented.

Figure 19 sets out CYP3_N1 variant protein amino acid sequences SEQ ID NOS:3-41 respectively, in FASTA format in descending order.

Figure 20 sets out CYP2D_N1 variant protein amino acid sequences SEQ ID NOS: 182-250 respectively, in FASTA format in descending order.

Figure 21 sets out CPR_N1 variant protein amino acid sequences SEQ ID NOS:323-431 respectively, in FASTA format in descending order.

Figure 22 sets out extant animal CYP3 protein amino acid sequences SEQ ID NOS:42-179 respectively, in FASTA format in descending order.

Figure 23 sets out extant animal CYP2D protein amino acid sequences SEQ ID NOS:251-320 respectively, in FASTA format in descending order.

Figure 24 sets out extant animal CPR protein amino acid sequences SEQ ID NOS:432-540 respectively, in FASTA format in descending order.

Figure 25 sets out exemplary nucleotide sequences SEQ ID NOS: 541-543, encoding CYP3_N1, CYP2D_N1, and CPR_N1, respectively, in FASTA format in descending order.

Figure 26 sets out certain compounds, and the structure thereof, that may act as substrates for P450 enzymes.

Figure 27 sets out an overview of the activity of CYP3_N1 (labelled as CYP3 TS) on various substrates.

Figure 28 sets out the N-terminal "membrane anchor" region within certain extant CYP3 enzymes and the engineered ancestral CYP3_N1.

Figure 29 sets out substrate saturation curves for 6β-hydroxylation of testosterone by CYP3_N1 and CYP3A4.

Figure 30 sets out CYP3_N1 variant protein amino acid sequences SEQ ID NOS:544-578 respectively, in FASTA format in descending order.

Figure 31 sets out metabolism of tamoxifen (100 μM) to its major demethylated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle). hNPR expressed in the absence of any P450 (purple cross) was included as a control. Bacterial membranes containing 0.1 μM of the P450 enzymes indicated plus human NADPH-cytochrome P450 reductase were incubated with 100 μM tamoxifen. At 20 and 120 minutes, reactions were quenched by addition of two volumes of acetonitrile and protein was removed by sedimentation. Reaction extracts were lyophilised then resuspended for analysis by LC-MS. Results are the means +/- standard deviation of three independent replicates. The percent conversion is based on the mass spectrometer response and the ratio of metabolite to parent in that particular sample.

Figure 32 sets out metabolism of erythromycin (100 μM) to its major demethylated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle). hNPR expressed in the absence of any P450 (purple cross) was included as a control. Bacterial membranes containing 0.1 μM of the P450 enzymes indicated plus human NADPH-cytochrome P450 reductase were incubated with 100 μM erythromycin. At 20 and 120 minutes, reactions were quenched by addition of two volumes of acetonitrile and protein was removed by sedimentation. Reaction extracts were lyophilised then resuspended for analysis by LC-MS. Results are the means +/- standard deviation of three independent replicates. The percent conversion is based on the mass spectrometer response and the ratio of metabolite to parent in that particular sample.

Figure 33 sets out metabolism of erythromycin (10 μM) to its major demethylated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle). hNPR expressed in the absence of any P450 (purple cross) was included as a control. Bacterial membranes containing 0.1 μM of the P450 enzymes indicated plus human NADPH-cytochrome P450 reductase were incubated with 10 μM erythromycin. At 20 and 120 minutes, reactions were quenched by addition of two volumes of acetonitrile and protein was removed by sedimentation. Reaction extracts were lyophilised then resuspended for analysis by LC-MS. Results are the means +/- standard deviation of three independent replicates. The percent conversion is based on the mass spectrometer response and the ratio of metabolite to parent in that particular sample.

Figure 34 sets out metabolism of erythromycin (100 μM) to its minor demethylated metabolite by CYP2D_N1 (green triangle). hNPR expressed in the absence of any P450 (purple cross) was included as a control. Bacterial membranes containing 0.1 μM of the P450 enzymes indicated plus human NADPH-cytochrome P450 reductase were incubated with 100 μM erythromycin. At 20 and 120 minutes, reactions were quenched by addition of two volumes of acetonitrile and protein was removed by sedimentation. Reaction extracts were lyophilised then resuspended for analysis by LC-MS. Results are the means +/- standard deviation of three independent replicates. No activity was observed with CYP3_N1 or CYP3A4. The percent conversion is based on the mass spectrometer response and the ratio of metabolite to parent in that particular sample.

Figure 35 sets out metabolism of ticlopidine (100 μΜ) to its desaturated metabolite by CYP3 N1 (blue diamond); CYP3A4 (red square); and CYP2D N1 (green triangle). hNPR expressed in the absence of any P450 (purple cross) was included as a control. Bacterial membranes containing 0.1 μΜ of the P450 enzymes indicated plus human NADPH-cytochrome P450 reductase were incubated with 100 μΜ ticlopidine. At the times indicated, reactions were quenched by addition of two volumes of acetonitrile and protein was removed by sedimentation. Reaction extracts were lyophilised then resuspended for analysis by LC-MS. Results are the means +/- standard deviation of three independent replicates. The percent conversion is based on the mass spectrometer response and the ratio of metabolite to parent in that particular sample.

Figure 36 sets out metabolism of ticlopidine (10 μΜ) to its desaturated metabolite by CYP3 N1 (blue diamond); CYP3A4 (red square); and CYP2D N1 (green triangle). hNPR expressed in the absence of any P450 (purple cross) was included as a control. Bacterial membranes containing 0.1 μΜ of the P450 enzymes indicated plus human NADPH-cytochrome P450 reductase were incubated with 10 μΜ ticlopidine. At the times indicated, reactions were quenched by addition of two volumes of acetonitrile and protein was removed by sedimentation. Reaction extracts were lyophilised then resuspended for analysis by LC-MS. Results are the means +/- standard deviation of three independent replicates. The percent conversion is based on the mass spectrometer response and the ratio of metabolite to parent in that particular sample.

Figure 37 sets out metabolism of ticlopidine (100 μΜ) to its doubly desaturated metabolite by CYP3 N1 (blue diamond); CYP3A4 (red square); and CYP2D N1 (green triangle). hNPR expressed in the absence of any P450 (purple cross) was included as a control. Bacterial membranes containing 0.1 μΜ of the P450 enzymes indicated plus human NADPH-cytochrome P450 reductase were incubated with 100 μΜ ticlopidine. At the times indicated, reactions were quenched by addition of two volumes of acetonitrile and protein was removed by sedimentation. Reaction extracts were lyophilised then resuspended for analysis by LC-MS. Results are the means +/- standard deviation of three independent replicates. The percent conversion is based on the mass spectrometer response and the ratio of metabolite to parent in that particular sample.
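By way of illustration only, the percent conversion and the mean +/- standard deviation reported in Figures 32 to 37 may be sketched as follows. The figure legends state only that the conversion is based on the mass spectrometer response and the ratio of metabolite to parent; the exact formula used here (metabolite response divided by the sum of metabolite and parent responses), the use of the sample standard deviation, the function names and the peak areas are all assumptions made for this sketch.

```python
from statistics import mean, stdev


def percent_conversion(metabolite_response: float, parent_response: float) -> float:
    """Percent conversion from MS responses (ASSUMED formula, not from the source)."""
    total = metabolite_response + parent_response
    if total <= 0:
        raise ValueError("total response must be positive")
    return 100.0 * metabolite_response / total


def summarise(replicates):
    """Return (mean, sample standard deviation) of per-replicate conversions."""
    values = [percent_conversion(m, p) for m, p in replicates]
    return mean(values), stdev(values)


# Hypothetical (metabolite, parent) peak areas for three independent replicates.
toy_replicates = [(10.0, 90.0), (20.0, 80.0), (30.0, 70.0)]
avg, sd = summarise(toy_replicates)
print(f"{avg:.1f} +/- {sd:.1f} % conversion")
```

Because each sample is normalised against its own parent signal, this form of calculation does not depend on absolute instrument response, although it implicitly assumes that metabolite and parent ionise with similar efficiency.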

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

SEQ ID NO:1 Amino acid sequence of generic engineered ancestral CYP3 sequence.

SEQ ID NO:2 Amino acid sequence of engineered ancestral CYP3 sequence CYP3_N1.

SEQ ID NOS:3-41 Amino acid sequences of CYP3_N1 variants.

SEQ ID NOS:42-179 Amino acid sequences of extant animal CYP3 sequences.

SEQ ID NO:180 Amino acid sequence of generic engineered ancestral CYP2D sequence.

SEQ ID NO:181 Amino acid sequence of engineered ancestral CYP2D sequence CYP2D_N1.

SEQ ID NOS:182-250 Amino acid sequences of CYP2D_N1 variants.

SEQ ID NOS:251-320 Amino acid sequences of extant animal CYP2D sequences.

SEQ ID NO:321 Amino acid sequence of generic engineered ancestral CPR sequence.

SEQ ID NO:322 Amino acid sequence of engineered ancestral CPR sequence CPR_N1.

SEQ ID NOS:323-431 Amino acid sequences of CPR_N1 variants.

SEQ ID NOS:432-540 Amino acid sequences of extant animal CPR sequences.

SEQ ID NO:541 Exemplary nucleotide sequence encoding CYP3_N1.

SEQ ID NO:542 Exemplary nucleotide sequence encoding CYP2D_N1.

SEQ ID NO:543 Exemplary nucleotide sequence encoding CPR_N1.

SEQ ID NOS:544-578 Amino acid sequences of further CYP3_N1 variants.

DETAILED DESCRIPTION

This invention relates to the design and production of engineered P450 enzymes and/or P450 redox partners that represent enzymes ancestral to extant P450 enzymes and/or P450 redox partners, respectively. The invention is at least partly predicated on the surprising discovery that said engineered ancestral P450 enzymes and/or P450 redox partners may have one or more increased or enhanced properties relative to one or more extant P450 enzymes and/or P450 redox partners.

For example, in certain embodiments an engineered ancestral enzyme of the invention has substantially increased thermal stability. This is a highly surprising discovery because the temperature conditions to which the hypothetical engineered ancestral enzyme may have been exposed are likely to be substantially similar to the temperature conditions to which one or more corresponding extant enzymes are exposed.

In particular embodiments, "engineered ancestral" CYP3 and CYP2D P450 enzymes have been created which are putative or hypothetical ancestors of extant P450 enzymes. Furthermore, amino acid residues in the engineered ancestral CYP3 and CYP2D P450 enzymes have been identified that may be modified to confer, modify or remove one or more properties of the P450 enzymes. These include substrate specificity or promiscuity, thermal stability, pH stability and kinetic properties, although without limitation thereto.

In another aspect, the invention relates to a P450 redox partner suitable for use with a P450 enzyme such as CYP3 and CYP2D, although without limitation thereto. Suitably, the P450 redox partner is a P450 reductase. In a particular embodiment, the P450 redox partner is an "engineered ancestral" CPR P450 reductase. Furthermore, amino acid residues in the engineered ancestral CPR P450 reductase have been identified that may be modified to confer, modify or remove one or more properties of the P450 reductase enzymes. These include substrate specificity or promiscuity, thermal stability, pH stability and kinetic properties, although without limitation thereto.

For the purposes of this invention, by "isolated" is meant material that has been removed from its natural state or otherwise been subjected to human manipulation. Isolated material may be substantially or essentially free from components that normally accompany it in its natural state, or may be manipulated so as to be in an artificial state together with components that normally accompany it in its natural state. Isolated material may be in native, chemical synthetic or recombinant form. By "protein" is meant an amino acid polymer. The amino acids may be natural or non-natural amino acids, D- or L- amino acids as are well understood in the art.

A "peptide" is a protein having no more than fifty (50) amino acids.

A "polypeptide" is a protein having more than fifty (50) amino acids.
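The length-based definitions above may be expressed, purely as an illustrative sketch, by the following function. The function name and the one-letter-code string representation of a sequence are assumptions of the sketch.

```python
def classify_protein(sequence: str) -> str:
    """Classify an amino acid polymer by length, per the definitions above.

    A "peptide" has no more than fifty amino acids; a "polypeptide" has more
    than fifty. The sequence is assumed to be given in one-letter code.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one amino acid")
    return "peptide" if len(sequence) <= 50 else "polypeptide"


print(classify_protein("A" * 50))  # peptide
print(classify_protein("A" * 51))  # polypeptide
```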

As used herein terms such as "oxidizing" and "oxidation" refer to increasing the oxidation state of an atom, molecule or ion through a loss or transfer of one or more electrons from the atom, molecule or ion. Typically, oxidation is accompanied by an associated reduction in oxidation state of another atom, molecule or ion through a gain or transfer of one or more electrons by or to the atom, molecule or ion. These associated or linked changes in oxidation state are referred to as "redox" reactions.

As used herein a "P450 enzyme" or a "protein having P450 enzyme activity" is a protein having one or more activities of a P450 enzyme. A preferred activity is monooxygenase activity according to the reaction: RH + O2 + NAD(P)H + H+ → ROH + H2O + NAD(P)+, where R is a carbon-containing heteroatom. Particular, non-limiting examples of such reactions include hydroxylation at aromatic and aliphatic centres, epoxidation, N-, O-, S-dealkylation and N- and S-oxidation, acyl migration, oxidative dehalogenation, ring expansion, contraction and cleavage, C-C bond cleavage, denitrosation of N-nitrosamines, oxidative ester cleavage, aldehyde scissions (e.g. to alkenes and HCOOH), ipso attack on aromatic ring substituents and N- or O-dearylation. One or more other activities may include NO synthase-like activity, reductase activity (e.g. reductions of alkyl halides, N-oxides, nitro compounds, inorganic molecules such as SO2, Cr(VI) or NO), desaturations (e.g. dehydrogenations), one electron oxidations, isomerizations and/or phospholipase D activity (e.g. phosphate ester hydrolysis). As used herein, "CYP3 protein" or "CYP3 enzyme" refers to a particular class or family of proteins having P450 enzyme activity. Additionally, "CYP2D protein" or "CYP2D enzyme" refers to another particular class or family of proteins having P450 enzyme activity.

Typically, a redox partner is required for redox activity of a P450 enzyme. As used herein, a "redox partner" is a protein that facilitates the transfer of electrons from an electron donor molecule to a P450 enzyme. Generally, the redox partner of a P450 enzyme is a P450 reductase, although without limitation thereto.

As used herein, a "P450 reductase" or a "protein having P450 reductase activity" is a protein having one or more activities of a P450 reductase enzyme. A preferred activity is the transfer of electrons from an electron donor molecule to a P450 enzyme. As used herein, "CPR protein" or "CPR enzyme" refers to the "cytochrome P450 reductase" class or family of proteins having P450 reductase activity. Generally, a CPR enzyme comprises a "flavin adenine dinucleotide" ("FAD")-binding domain and a "flavin mononucleotide" ("FMN")-binding domain. The general scheme of electron flow during redox reactions involving CPR and P450 enzymes is: NADPH → FAD → FMN → P450 → O2, although without limitation thereto.

It will be appreciated that a P450 enzyme may have a diverse array of substrates including testosterone, progesterone, midazolam, nifedipine, tamoxifen, cyclosporin A, erythromycin, cyclophosphamide, paracetamol, lignocaine, ethosuximide, codeine, lovastatin, 7-benzyloxy-4-(trifluoromethyl)-coumarin, 7-benzyloxyresorufin, terfenadine, S-omeprazole and benzyloxyluciferin; pesticides such as an organochlorine pesticide, an organophosphate pesticide, or a pyrethroid pesticide; solvents such as perchloroethylene (PCE) or trichloroethylene (TCE); and food contaminants such as carbamate or organophosphate pesticide residues, although without limitation thereto.

Without limitation, certain compounds and the chemical structure thereof, that may be substrates of a P450 enzyme, are set out in FIG. 26 and/or FIG. 27.

Isolated proteins

The 138 amino acid sequences respectively set forth in SEQ ID NOS:42-179 (FIG. 22) are the amino acid sequences of certain isolated CYP3 proteins of animals (i.e. "extant" CYP3 proteins).

The amino acid sequence set forth in SEQ ID NO:1 is the amino acid sequence of an isolated engineered ancestral CYP3 enzyme, as set out in FIG. 1. SEQ ID NOS:2-41 (FIGS. 1 and 19) and SEQ ID NOS:544-578 (FIG. 30) are particular amino acid sequences of "variants" of SEQ ID NO:1, comprising variations at one or more of the amino acid residues designated X1-X219.

The location of the residues X1-X219 in SEQ ID NO:1 with reference to SEQ ID NO:2 is given in table form in Table 9. Certain preferred amino acids for each of the residues X1-X219, respectively, are set forth in Table 9 under the heading 'Variable'.

In one embodiment, one or more of the residues X1-X219 is an amino acid sequence selected from the respective groups consisting of the variable amino acids set forth in Table 9.

It will be appreciated that in some embodiments, the isolated protein comprising the amino acid sequence set forth in SEQ ID NO:1, and variants thereof, may be referred to as a protein having P450 activity that is "ancestral" to each of the proteins set forth in SEQ ID NOS:42-179.

Suitably, the isolated engineered ancestral CYP3 enzyme comprising the sequence set forth in SEQ ID NO:1, and variants thereof, have one or more improved or enhanced properties compared to one or more of the isolated proteins comprising sequences set forth in SEQ ID NOS:42-179.
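The relationship between a generic sequence containing variable residues designated X and a particular variant thereof may be illustrated by the following minimal sketch. The template and candidate strings are short hypothetical examples and are not taken from SEQ ID NO:1 or Table 9; the function name, the 0-based position indexing and the dictionary of permitted residues are likewise assumptions. The sketch tests only exact agreement at the fixed positions and does not address sequences defined by a percentage identity.

```python
# The twenty standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def matches_template(candidate, template, allowed=None):
    """Return True if candidate fits template, where 'X' marks a variable residue.

    allowed optionally maps a 0-based template position to the set of residues
    permitted at that X position; positions not listed accept any amino acid.
    """
    if len(candidate) != len(template):
        return False
    allowed = allowed or {}
    for i, (residue, slot) in enumerate(zip(candidate, template)):
        if slot == "X":
            if residue not in allowed.get(i, AMINO_ACIDS):
                return False
        elif residue != slot:
            return False
    return True


# Hypothetical toy template with two variable positions.
toy_template = "MAXLLXV"
print(matches_template("MAKLLTV", toy_template))  # True: any residue allowed at X
print(matches_template("MAKLLTV", toy_template, {2: {"K", "R"}, 5: {"S"}}))  # False
```

Restricting each X position to a set of preferred residues corresponds to the embodiment in which the variable residues are selected from the groups listed in a table, whereas omitting the restriction corresponds to the case in which X may be any amino acid.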

The particular engineered ancestral CYP3 enzyme variant comprising the amino acid sequence set forth in SEQ ID NO:2, and described in the EXAMPLES, is herein referred to as "CYP3_N1". Additionally, the isolated proteins comprising the amino acid sequences set forth in SEQ ID NOS:3-41 and SEQ ID NOS:544-578 are herein referred to as "CYP3_N1 variants".

The CYP3_N1 variants set forth in SEQ ID NOS:3-41 are variants of CYP3_N1 which represent the CYP3_N1 variant library constructed as set forth in EXAMPLES 9-10.

Furthermore, the CYP3_N1 variants set forth in SEQ ID NOS:544-578 are variants of CYP3_N1 which represent nodes of the evolutionary tree calculated for construction of the engineered ancestral CYP3 protein as described in EXAMPLE 1.

In certain preferred embodiments, a CYP3_N1 variant comprising an amino acid sequence set forth in SEQ ID NOS:3-41 or SEQ ID NOS:544-578 has one or more improved or enhanced properties compared to CYP3_N1.

Non-limiting examples of the one or more improved properties of an engineered ancestral CYP3 enzyme as herein described include: thermal stability, stability in solvents (e.g. organic solvents), metabolite production, catalytic versatility, catalytic efficiency (e.g. the efficiency of coupling of product formation to cofactor consumption), substrate specificity (e.g. increased specificity or increased genericity, as desired) and enzyme kinetic properties (e.g. increased Vmax, lower Km), although without limitation thereto.

The 70 amino acid sequences respectively set forth in SEQ ID NOS:251-320 (FIG. 23) are the amino acid sequences of certain isolated CYP2D proteins of animals (i.e. "extant" CYP2D proteins).

The amino acid sequence set forth in SEQ ID NO:180 is the amino acid sequence of an isolated engineered ancestral CYP2D protein, as set out in FIG. 2. SEQ ID NOS:181-250 (FIGS. 2 and 20) are particular amino acid sequences of variants of SEQ ID NO:180, comprising variations at one or more of the amino acid residues designated X1-X163. The location of the residues X1-X163 in SEQ ID NO:180 with respect to SEQ ID NO:181 is given in table form in Table 10.

Certain preferred amino acids for each of the residues X1-X163, respectively, are set forth in Table 10 under the heading 'Variable'.

In one embodiment, one or more of the residues X1-X163 is an amino acid sequence selected from the respective groups consisting of the variable amino acids set forth in Table 10.

It will be appreciated that in some embodiments, the isolated protein comprising the amino acid sequence set forth in SEQ ID NO: 180, and variants thereof, may be referred to as a protein having P450 activity that is ancestral to each of the proteins set forth in SEQ ID NOS:251-320.

Suitably, the isolated engineered ancestral CYP2D protein comprising the sequence set forth in SEQ ID NO: 180, and variants thereof, have one or more improved or enhanced properties compared to one or more of the isolated proteins comprising sequences set forth in SEQ ID NOS:251-320.

The particular engineered ancestral CYP2D enzyme variant comprising the amino acid sequence set forth in SEQ ID NO:181 is referred to herein as "CYP2D_N1". Additionally, the isolated proteins comprising the amino acid sequences set forth in SEQ ID NOS:182-250 are herein referred to as "CYP2D_N1 variants".

The CYP2D_N1 variants set forth in SEQ ID NOS:182-250 are variants of CYP2D_N1 which represent nodes of the evolutionary tree calculated for construction of the engineered ancestral CYP2D protein as described in EXAMPLE 3.

In certain preferred embodiments, a CYP2D_N1 variant comprising an amino acid sequence set forth in SEQ ID NOS:182-250 has one or more improved or enhanced properties compared to CYP2D_N1.

Non-limiting examples of the one or more improved properties of an engineered ancestral CYP2D enzyme as herein described include: thermal stability, stability in solvents (e.g. organic solvents), metabolite production, catalytic versatility, catalytic efficiency (e.g. the efficiency of coupling of product formation to cofactor consumption), ligand binding capacity (e.g. increased or decreased strength of binding; and/or increased or decreased specificity of binding, as desired), substrate specificity (e.g. increased specificity or increased genericity, as desired) and enzyme kinetic properties (e.g. increased Vmax, lower Km), although without limitation thereto.

The 109 amino acid sequences set forth in SEQ ID NOS:432-540 (FIG. 24) are the amino acid sequences of certain isolated CPR reductase proteins of animals (i.e. "extant" CPR reductase proteins).

The amino acid sequence set forth in SEQ ID NO:321 is the amino acid sequence of an isolated engineered ancestral CPR protein, as set out in FIG. 3. SEQ ID NOS:322-431 (FIGS. 3 and 21) are particular amino acid sequences of variants of SEQ ID NO:321, comprising variations at one or more of the amino acid residues designated X 1 -X 1 1 . The location of the residues X 1 -X 1 1 in SEQ ID NO:321 with respect to SEQ ID NO:322 is given in table form in Table 11.

Certain preferred amino acids for each of the residues X 1 -X 1 1 , respectively, are set forth in Table 11 under the heading 'Variable'.

In one embodiment, one or more of the residues X 1 -X 1 1 is an amino acid sequence selected from the respective groups consisting of the variable amino acids set forth in Table 11.

It will be appreciated that in some embodiments, the isolated protein comprising the amino acid sequence set forth in SEQ ID NO:321, and variants thereof, may be referred to as a protein having P450 reductase activity that is ancestral to each of the proteins set forth in SEQ ID NOS:432-540.

Suitably, the isolated engineered ancestral P450 reductase protein comprising the sequence set forth in SEQ ID NO: 321, and variants thereof, have one or more improved or enhanced properties compared to one or more of the isolated proteins comprising sequences set forth in SEQ ID NOS:432-540.

The particular engineered ancestral CPR enzyme variant comprising the amino acid sequence set forth in SEQ ID NO:322 is referred to herein as "CPR_N1". Additionally, the isolated proteins comprising the amino acid sequences set forth in SEQ ID NOS:323-431 are herein referred to as "CPR_N1 variants".

The CPR_N1 variants set forth in SEQ ID NOS:323-431 are variants of CPR_N1 which represent nodes of the evolutionary tree calculated for construction of the engineered ancestral CPR protein as described in EXAMPLE 4.

In certain preferred embodiments, a CPR_N1 variant comprising an amino acid sequence set forth in SEQ ID NOS:323-431 has one or more improved or enhanced properties compared to CPR_N1.

Non-limiting examples of the one or more improved properties of an engineered ancestral CPR enzyme as herein described include: thermal stability, stability in solvents (e.g. organic solvents), enzyme kinetic properties (e.g. increased Vmax, lower Km), ligand binding capacity (e.g. increased or decreased strength of binding; and/or increased or decreased specificity of binding, as desired), substrate specificity (e.g. increased specificity or increased genericity, as desired), ability to couple to a diversity of proteins (e.g. P450s and/or other electron acceptor proteins), and ability to use a diversity of electron donors (e.g. both NADPH and NADH, or NADH preferentially as an electron donor), although without limitation thereto.

Certain embodiments also relate to fragments of the isolated proteins disclosed herein.

In one embodiment, a protein "fragment" includes an amino acid sequence that constitutes less than 100%, but at least 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90-99% of said isolated protein.

In another embodiment, a protein fragment comprises no more than 6, 10, 12, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, 200, 250, 300, 350 or 400 contiguous amino acids of the isolated protein.
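By way of a non-limiting sketch, the percentage-based definition of a fragment may be checked as follows. The sketch assumes that a fragment is a contiguous stretch of the isolated protein and measures the percentage by length; the function names and the toy sequences are hypothetical.

```python
def fragment_percent(fragment: str, parent: str) -> float:
    """Length of a contiguous fragment as a percentage of the parent sequence."""
    if not fragment or fragment not in parent:
        raise ValueError("fragment must be a non-empty contiguous part of parent")
    return 100.0 * len(fragment) / len(parent)


def is_fragment(fragment: str, parent: str, min_percent: float = 20.0) -> bool:
    """True if fragment covers at least min_percent, but less than 100%, of parent."""
    try:
        pct = fragment_percent(fragment, parent)
    except ValueError:
        return False
    return min_percent <= pct < 100.0


toy_parent = "ACDEFGHIKL"  # hypothetical ten-residue sequence
print(is_fragment("CDE", toy_parent))        # True: 30% of the parent
print(is_fragment("A", toy_parent))          # False: only 10%
print(is_fragment(toy_parent, toy_parent))   # False: 100% is not a fragment
```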

In a preferred embodiment, the protein fragment has one or more activities of a P450 enzyme or P450 reductase enzyme as hereinbefore described.

In certain embodiments, the protein fragment does not comprise an N-terminal "membrane anchor" region. By way of example, with specific regard to CYP3_N1, the N-terminal membrane anchor region comprises the amino acid positions 1-38 set forth in SEQ ID NO:2, as set forth in FIG. 28.

Certain embodiments also relate to variants of an isolated protein of the invention. In a preferred embodiment, the protein variant has one or more activities of a P450 or P450 reductase enzyme as hereinbefore described.

As used herein "variant" proteins of the invention have one or more amino acids deleted or substituted by different amino acids. It is well understood in the art that some amino acids may be substituted or deleted without an expectation of changing the activity of the protein substantially ("conservative" substitutions). More substantial changes to activity may be made by introducing substitutions or deletions that are less conservative ("non-conservative" substitutions).

Preferably, protein variants share at least 70% or 75%, preferably at least 80% or 85% or more preferably at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with an amino acid sequence of the isolated protein. In particular preferred embodiments relating to protein variants of engineered ancestral CYP3 enzymes as herein described, said protein variants share at least 70% or 75%, preferably at least 80% or 85% or more preferably at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with at least one of the amino acid sequences set forth in SEQ ID NOS:2-41 or SEQ ID NOS:544-578.

In particular preferred embodiments relating to protein variants of engineered ancestral CYP2D enzymes as herein described, said variants share at least 70% or 75%, preferably at least 80% or 85% or more preferably at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with at least one of the amino acid sequences set forth in SEQ ID NOS:181-250.

In particular preferred embodiments relating to protein variants of engineered ancestral CPR enzymes as herein described, said variants share at least 70% or 75%, preferably at least 80% or 85% or more preferably at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with at least one of the amino acid sequences set forth in SEQ ID NOS:322-431.

In other preferred embodiments, protein variants share at least 80% or 85% or more preferably at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with an amino acid sequence of the isolated protein, excluding the N-terminal membrane anchor region of said protein. By way of example, with specific regard to CYP3_N1, the N-terminal membrane anchor region comprises the amino acid positions 1-38 of the amino acid sequence set forth in SEQ ID NO:2, as set forth in Figure 28 and Table 9. Additionally, the N-terminal membrane anchor region of CYP2D_N1 comprises the amino acid positions 1-36 as set forth in Table 10.

As will be understood by one skilled in the art, the N-terminal membrane anchor region of a P450 enzyme or P450 reductase enzyme of the invention may be substantially modified without substantially affecting the function of said protein.

In certain preferred embodiments, the protein variant comprises a modified N-terminal "membrane anchor" region. By way of example, CYP3_N1 comprises the modified N-terminal sequence MALLLAVFL at amino acid positions 1-9. Such a modified N-terminal membrane anchor region may assist with protein expression, although without limitation thereto.

Terms used generally herein to describe sequence relationships between respective proteins and nucleic acids include "comparison window", "sequence identity", "percentage of sequence identity" and "substantial identity". Because respective nucleic acids/proteins may each comprise (1) only one or more portions of a complete nucleic acid/protein sequence that are shared by the nucleic acids/proteins, and (2) one or more portions which are divergent between the nucleic acids/proteins, sequence comparisons are typically performed by comparing sequences over a "comparison window" to identify and compare local regions of sequence similarity. A "comparison window" refers to a conceptual segment of typically 6, 9 or 12 contiguous residues that is compared to a reference sequence. The comparison window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared to the reference sequence for optimal alignment of the respective sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by computerised implementations of algorithms (Geneworks program by Intelligenetics; GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive Madison, WI, USA, incorporated herein by reference) or by inspection and the best alignment (i.e., resulting in the highest percentage homology over the comparison window) generated by any of the various methods selected. Reference also may be made to the BLAST family of programs as for example disclosed by Altschul et al., 1997, Nucl. Acids Res. 25 3389, which is incorporated herein by reference. A detailed discussion of sequence analysis can be found in Unit 19.3 of CURRENT PROTOCOLS IN MOLECULAR BIOLOGY Eds. Ausubel et al. (John Wiley & Sons Inc NY, 1995-1999).

The term "sequence identity" is used herein in its broadest sense to include the number of exact nucleotide or amino acid matches having regard to an appropriate alignment using a standard algorithm, having regard to the extent that sequences are identical over a window of comparison. Thus, a "percentage of sequence identity" is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. For example, "sequence identity" may be understood to mean the "match percentage" calculated by the DNASIS computer program (Version 2.5 for Windows; available from Hitachi Software Engineering Co., Ltd., South San Francisco, California, USA).

Certain embodiments also relate to derivatives of an isolated protein of the present invention. In a preferred embodiment, the derivative protein has one or more activities of a P450 or P450 reductase enzyme as hereinbefore described.
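The "percentage of sequence identity" calculation defined above (matched positions divided by the window size, multiplied by 100) may be sketched as follows for two sequences that have already been optimally aligned. The sketch does not itself perform the alignment. The function name, the use of "-" as the gap character, the treatment of gap columns as non-matching positions that still count towards the window size, and the toy sequences are assumptions of the sketch.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a window of pre-aligned sequences.

    Both inputs must have the same length (the window size). A position counts
    as matched only if both sequences carry the same non-gap symbol there.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("window of comparison must not be empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


# Hypothetical toy alignments (not taken from the sequence listing).
print(percent_identity("ACDEFGHIKL", "ACDEYGHIKV"))  # 8 of 10 positions match
print(percent_identity("ACD-FG", "ACDEFG"))          # gap column counts as mismatch
```

To apply an identity threshold that excludes an N-terminal membrane anchor region, the corresponding alignment columns would simply be removed from both strings before calling the function.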

As used herein, "derivative" proteins have been altered, for example by conjugation or complexing with other chemical moieties, by post-translational modification (e.g. phosphorylation, acetylation etc), modification of glycosylation (e.g. adding, removing or altering glycosylation) and/or inclusion of additional amino acid sequences as would be understood in the art.

Additional amino acid sequences may include fusion partner amino acid sequences which create a fusion protein. By way of example, fusion partner amino acid sequences may assist in detection and/or purification of the isolated fusion protein. Non-limiting examples include metal-binding (e.g. polyhistidine) fusion partners, maltose binding protein (MBP), Protein A, glutathione S-transferase (GST), fluorescent protein sequences (e.g. GFP), epitope tags such as myc, FLAG and haemagglutinin tags.

For the particular purpose of fusion polypeptide purification by affinity chromatography, relevant matrices for affinity chromatography include glutathione-, amylose-, and nickel- or cobalt-conjugated resins respectively. Many such matrices are available in kit form, such as the QIAexpress™ system (Qiagen) useful with (HIS6) fusion partners and the Pharmacia GST purification system.

Preferably, the fusion partners also have protease cleavage sites, such as for Factor Xa or thrombin, which allow the relevant protease to partially digest the fusion polypeptide of the invention and thereby liberate the recombinant polypeptide of the invention therefrom. The liberated polypeptide can then be isolated from the fusion partner by subsequent chromatographic separation.
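The liberation of a recombinant polypeptide from its fusion partner may be illustrated in silico by the following sketch, which locates a Factor Xa recognition site and splits the fusion sequence after it. The sketch relies on the generally accepted Factor Xa recognition motif Ile-(Glu or Asp)-Gly-Arg, with cleavage after the arginine; the function name and the toy fusion sequence (a hexahistidine tag, the site and a short payload) are hypothetical, and only the first site found is cut.

```python
import re

# Factor Xa recognition motif I-(E/D)-G-R; cleavage occurs after the arginine.
FACTOR_XA_SITE = re.compile(r"I[ED]GR")


def cleave_after_site(fusion: str, site=FACTOR_XA_SITE):
    """Split a fusion sequence after the first protease site.

    Returns (fusion_partner_with_site, liberated_polypeptide), or None if the
    site does not occur in the sequence.
    """
    match = site.search(fusion)
    if match is None:
        return None
    return fusion[: match.end()], fusion[match.end():]


# Hypothetical fusion: Met + His6 tag + Factor Xa site + toy payload.
toy_fusion = "MHHHHHH" + "IEGR" + "ACDEFGK"
print(cleave_after_site(toy_fusion))  # ('MHHHHHHIEGR', 'ACDEFGK')
```

A check of this kind may also be used to confirm that the polypeptide to be liberated does not itself contain the recognition motif, since an internal site would be expected to be cut as well.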

Other derivatives contemplated by the invention include, but are not limited to, modification to amino acid side chains, incorporation of unnatural amino acids and/or their derivatives during peptide, polypeptide or protein synthesis and the use of crosslinkers and other methods which impose conformational constraints on the isolated protein, fragments and variants disclosed herein.

Methods for producing engineered ancestral proteins

In another aspect, the invention provides a method of producing or constructing an isolated protein, said method including the step of producing or constructing an engineered ancestral amino acid sequence of at least a fragment of a P450 protein or P450 reductase protein from one or more P450 protein or P450 reductase amino acid sequences that are different to the engineered ancestral amino acid sequence.

Suitably, the engineered ancestral P450 protein or P450 reductase protein displays or possesses one or more increased or enhanced properties compared to at least one of the one or more P450 protein or P450 reductase amino acid sequences that are different to the engineered ancestral amino acid sequence.

Non-limiting examples of the one or more increased or enhanced properties of the protein include thermal stability, stability in solvents (e.g. organic solvents), metabolite production, catalytic versatility, catalytic efficiency (e.g. the efficiency of coupling of product formation to cofactor consumption), ligand binding capacity (e.g. increased or decreased strength of binding; and/or increased or decreased specificity of binding, as desired), substrate specificity (e.g. increased specificity or increased genericity, as desired), enzyme kinetic properties (e.g. increased Vmax, lower Km), ability to couple to a diversity of proteins (e.g. P450s and/or other electron acceptor proteins), and ability to use a diversity of electron donors (e.g. both NADPH and NADH, or NADH preferentially as an electron donor), although without limitation thereto.

Suitably, the engineered ancestral protein having P450 or P450 reductase enzyme activity is distinct from any of a plurality of corresponding "extant" enzymes encoded by respective genomes of different organisms, such as those proteins used to construct or produce the engineered ancestral protein. In this regard, it will be appreciated by persons skilled in the art that a protein comprising an amino acid sequence comprised by the engineered ancestral protein having P450 or P450 reductase enzyme activity may or may not ever have actually existed until its construction according to the method of the invention.

In some embodiments of the method, the step of producing or constructing an engineered ancestral amino acid sequence of at least a fragment of a P450 protein or P450 reductase protein comprises the use of one or more computational methods for the reconstruction of engineered ancestral DNA and/or amino acid sequences. As will be understood by one skilled in the art, such methods can include "maximum parsimony" methods, "maximum likelihood" methods, and "Bayesian inference" methods. In some embodiments, said one or more computational methods include a "marginal likelihood" method and/or a "joint likelihood" method. In some embodiments of the method, the construction of a sequence of a hypothetical engineered ancestral protein includes the use of one or more software tools for the reconstruction of engineered ancestral DNA and/or amino acid sequences. Said software tools may include, although without limitation thereto, FastML (Ashkenazy et al. 2012), Phylobayes 3 (Lartillot et al. 2009) and/or one or more of those listed at http://topicpages.ploscompbiol.Org/wiki/Engineeredancestral_ reconstruction#Software, incorporated herein by reference.
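As a purely illustrative sketch of one of the computational approaches named above, the following code applies a "maximum parsimony" reconstruction (Fitch's algorithm) column by column to a small rooted binary tree of aligned sequences. It is not the method implemented by FastML or Phylobayes, which use likelihood and Bayesian models respectively, and it ignores branch lengths and substitution models. The tree topology, the four toy leaf sequences, the function names and the alphabetical tie-breaking rule are all assumptions of the sketch.

```python
def fitch_up(tree):
    """Bottom-up pass: compute the candidate state set at every node."""
    if isinstance(tree, str):  # leaf: a single observed residue
        return {"states": {tree}, "children": None}
    left, right = (fitch_up(child) for child in tree)
    common = left["states"] & right["states"]
    states = common if common else left["states"] | right["states"]
    return {"states": states, "children": (left, right)}


def fitch_down(node, parent_state=None):
    """Top-down pass: keep the parent's state where possible, else pick one."""
    if parent_state in node["states"]:
        node["state"] = parent_state
    else:
        node["state"] = min(node["states"])  # arbitrary alphabetical tie-break
    if node["children"]:
        for child in node["children"]:
            fitch_down(child, node["state"])
    return node


def column(tree, i):
    """Extract alignment column i from a tree whose leaves are aligned sequences."""
    if isinstance(tree, str):
        return tree[i]
    return tuple(column(child, i) for child in tree)


def ancestral_root_sequence(tree, length):
    """Reconstruct the root sequence one alignment column at a time."""
    return "".join(
        fitch_down(fitch_up(column(tree, i)))["state"] for i in range(length)
    )


# Hypothetical extant sequences arranged as ((s1, s2), (s3, s4)).
toy_tree = (("MAKL", "MARL"), ("MAKV", "MSKV"))
print(ancestral_root_sequence(toy_tree, 4))  # MAKL
```

In the last column the root state set contains both L and V, so the reported residue depends on the tie-breaking rule; ambiguity of this kind is one reason why likelihood-based methods that report probabilities for each residue are commonly preferred in practice.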

According to some embodiments of the method, "evolutionary intermediate sequences" may be calculated or constructed using one or more software tools in the process of constructing the engineered ancestral sequence, wherein said evolutionary intermediate sequences may or may not have ever existed prior to construction according to the method.

In some embodiments of the method, an evolutionary tree is constructed for the engineered ancestral P450 or P450 reductase protein using one or more of the aforementioned computational methods and/or software tools, and amino acid sequences which represent nodes within the evolutionary tree are identified. As used herein, a "node" within an evolutionary tree represents a common ancestor of the descendants that share or are linked by the node.

In one embodiment, the isolated protein having P450 enzyme activity is an isolated engineered ancestral CYP3 protein as hereinbefore described. In one particular embodiment, the isolated protein comprises an amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2.

Non-limiting examples of the one or more P450 protein amino acid sequences that are different to said engineered ancestral amino acid sequence are set forth in SEQ ID NOS:42-179.

In another embodiment, the isolated protein having P450 enzyme activity is an isolated engineered ancestral CYP2D protein as herein described. In one particular embodiment, the isolated protein comprises an amino acid sequence set forth in SEQ ID NO: 180 or SEQ ID NO: 181.

Non-limiting examples of the one or more P450 protein amino acid sequences that are different to said engineered ancestral amino acid sequence are set forth in SEQ ID NOS:251-320.

In another embodiment, the isolated protein having P450 reductase enzyme activity is an isolated engineered ancestral CPR protein, as herein described. In one particular embodiment, the isolated protein comprises an amino acid sequence set forth in SEQ ID NO:321 or SEQ ID NO:322.

Non-limiting examples of the one or more P450 protein amino acid sequences that are different to said engineered ancestral amino acid sequence are set forth in SEQ ID NOS:432-540.

A related aspect provides a modified engineered ancestral P450 protein or P450 reductase protein produced according to the method of this aspect.

In yet another aspect, the invention provides a method of producing or constructing a modified engineered ancestral P450 protein or P450 reductase protein, said method including the step of introducing one or more amino acid substitutions in an amino acid sequence of the engineered ancestral P450 protein or P450 reductase protein to thereby produce or construct the modified engineered ancestral P450 protein or P450 reductase protein. In certain preferred embodiments, said amino acid substitutions are non-conservative amino acid substitutions.

Suitably, the modified engineered ancestral P450 protein or P450 reductase protein displays or possesses one or more increased or enhanced properties compared to the engineered ancestral P450 protein or P450 reductase protein.

Non-limiting examples of the one or more improved or enhanced properties of the protein include thermal stability, stability in solvents (e.g. organic solvents), metabolite production, catalytic versatility, catalytic efficiency (e.g. the efficiency of coupling of product formation to cofactor consumption), substrate specificity (e.g. increased specificity or increased genericity, as desired) and enzyme kinetic properties (e.g. increased Vmax, lower Km).

In certain preferred embodiments, said non-conservative amino acid substitutions comprise functional or functionally important amino acids comprised by an engineered ancestral P450 protein or P450 reductase protein. In certain embodiments a functional or functionally important amino acid is one which is known or predicted to substantially affect a protein property or function, such as hereinbefore described.

In certain other preferred embodiments, said non-conservative amino acid substitutions comprise amino acid positions wherein there is low prediction confidence within an engineered ancestral P450 protein or P450 reductase protein amino acid sequence constructed using one or more computational methods and/or software tools, or discrepancy between ancestral P450 protein or P450 reductase protein amino acid sequences constructed using one or more computational methods and/or software tools, such as the computational methods and/or software tools as hereinbefore described. Said modifications may comprise amino acid positions wherein there is variation amongst a plurality of hypothetical engineered ancestral protein sequences constructed for the P450 or P450 reductase enzyme using different software tools and/or computational methods, although without limitation thereto.

By way of example, as set forth in Table 11, the positions X1-X11 of the generic engineered ancestral CPR protein sequence set forth in SEQ ID NO:322 represent amino acid positions at which there is variation between sequences constructed using a 'Joint Maximum Likelihood' method and a 'Marginal Maximum Likelihood' method, as hereinabove described. In this respect, CPR N1 (SEQ ID NO:321) represents the sequence constructed using the joint likelihood method.
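As a hedged illustration of how such variable positions might be located, the sketch below compares two equal-length reconstructions of the same ancestor and marks each disagreement as "X". The function names and the four-residue toy sequences are hypothetical and are not taken from SEQ ID NOS:321-322.

```python
def variable_positions(seq_a, seq_b):
    """Return 1-based positions at which two equal-length reconstructions differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("reconstructions must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def generic_sequence(seq_a, seq_b):
    """Build a generic sequence with 'X' wherever the two reconstructions disagree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("reconstructions must have the same length")
    return "".join(a if a == b else "X" for a, b in zip(seq_a, seq_b))


# Toy inputs standing in for a joint and a marginal reconstruction.
joint = "MKLV"
marginal = "MRLI"
positions = variable_positions(joint, marginal)
generic = generic_sequence(joint, marginal)
```

The positions returned in this way are candidates for substitution when constructing modified variants.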

In certain other preferred embodiments, said non-conservative amino acid substitutions comprise amino acid positions wherein there is variation amongst extant and/or evolutionary intermediate sequences used to construct a hypothetical engineered ancestral protein sequence, and/or sequences which represent nodes within an evolutionary tree constructed for the hypothetical engineered ancestral P450 or P450 reductase protein, as hereinbefore described.

By way of example, as set forth in Table 9, the positions X1-X219 of the generic engineered ancestral CYP3 protein sequence set forth in SEQ ID NO:1 represent amino acid positions at which there is variation amongst corresponding extant and/or evolutionary intermediate sequences used to construct the engineered ancestral CYP3 protein, and/or sequences which represent nodes within the evolutionary tree constructed for the engineered ancestral CYP3 protein.

Similarly, as set forth in Table 10, the positions X1-X163 of the generic engineered ancestral CYP2D protein sequence set forth in SEQ ID NO:180 represent amino acid positions at which there is variation amongst corresponding extant and/or evolutionary intermediate sequences used to construct the engineered ancestral CYP2D protein, and/or sequences which represent nodes within the evolutionary tree constructed for the engineered ancestral CYP2D protein.

In certain preferred embodiments, the modified engineered ancestral P450 protein is a modified engineered ancestral CYP3 protein. In one embodiment, the modified engineered ancestral CYP3 protein comprises an amino acid sequence set forth in SEQ ID NO:1 or any one of SEQ ID NOS:3-41 or SEQ ID NOS:544-578. In other embodiments the isolated protein comprises an amino acid sequence at least 80% identical to any one of SEQ ID NOS:3-41 or SEQ ID NOS:544-578, including at least 85% identical, at least 90% identical, and at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, and 99% identical.

In certain other preferred embodiments, the modified engineered ancestral P450 protein is a modified engineered ancestral CYP2D protein. In one embodiment, the modified engineered ancestral CYP2D protein comprises an amino acid sequence set forth in SEQ ID NO: 180 or any one of SEQ ID NOS: 182-250.

In other embodiments the isolated protein comprises an amino acid sequence at least 80% identical to any one of SEQ ID NOS: 182-250, including at least 85% identical, at least 90% identical, and at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, and 99% identical.

In certain other preferred embodiments, the modified engineered ancestral P450 reductase protein is a modified engineered ancestral CPR protein. In one embodiment, the modified engineered ancestral CPR protein comprises an amino acid sequence set forth in SEQ ID NO:321 or any one of SEQ ID NOS:323-431.

In other embodiments the isolated protein comprises an amino acid sequence at least 80% identical to any one of SEQ ID NOS:323-431, including at least 85% identical, at least 90% identical, and at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, and 99% identical.
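The identity thresholds recited above can be illustrated with a short calculation. The sketch below assumes that the two sequences have already been aligned to equal length and that identity is counted over all aligned columns other than those where both sequences carry a gap; that denominator is an assumption of this example, since the definition of percent identity is not given in this section. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: one mismatch in five aligned positions.
identity = percent_identity("MKLVA", "MKLIA")
```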

In certain embodiments, an isolated engineered ancestral protein of the invention has enhanced thermal stability compared to one or more corresponding extant animal proteins.

In certain embodiments, thermal stability is the ability of a protein to resist "thermal deactivation". As used herein, "thermal deactivation" refers to the loss of a properly folded state of a protein as a result of exposure to heat, and/or the loss of enzyme activity of a protein as a result of exposure to heat.

A measure of thermal stability may be expressed as an xT50 temperature (°C), or "T50", for folded protein, wherein 50% of the protein is properly folded after exposure to said temperature for a duration of x minutes. For example, the 60T50 for CYP3 N1 is about 66 ± 2°C (Table 1). Another measure of thermal stability may be expressed as an xT50 temperature (°C), or T50, for enzyme activity, wherein 50% of enzyme activity remains after exposure to said temperature for a duration of x minutes. Thermal stability may also be measured as the percentage of properly folded protein (%) after exposure to a given temperature (°C) for a given duration (min), and/or the percentage of enzyme activity (%) after exposure to a given temperature (°C) for a given duration (min).
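For illustration, a T50 value might be estimated from a series of measurements by linear interpolation between the two temperatures that bracket 50% residual folded protein or activity. This is a sketch under that assumption only; the function name is hypothetical and the numbers in the example are invented for demonstration and are not data from Table 1.

```python
def t50_from_curve(temps_c, fraction_remaining):
    """Estimate T50 (°C) by linear interpolation.

    temps_c            : incubation temperatures in ascending order (°C)
    fraction_remaining : fraction (0-1) of folded protein or activity left
                         after x minutes at each temperature
    """
    points = list(zip(temps_c, fraction_remaining))
    for (t_lo, f_lo), (t_hi, f_hi) in zip(points, points[1:]):
        # Find the interval in which the curve crosses 50%.
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            return t_lo + (f_lo - 0.5) * (t_hi - t_lo) / (f_lo - f_hi)
    raise ValueError("curve does not cross 50%")


# Invented illustrative values only.
estimate = t50_from_curve([40, 50, 60, 70], [1.0, 0.9, 0.7, 0.3])
```

In practice a sigmoidal fit over all points may be preferred to pairwise interpolation; the simpler form is used here to keep the definition visible.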

Yet another measure of thermal stability may be "thermal deactivation energy" (Ea), which can be defined as the minimum energy that is required to cause thermal deactivation of a protein. For example, Ea for CYP3 N1 is 251 kJ/mol (Table 3).
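As a hedged illustration, if thermal deactivation is assumed to follow Arrhenius kinetics (an assumption of this example; this section does not state how Ea was determined), Ea can be estimated from first-order deactivation rate constants measured at two temperatures using ln(k2/k1) = (Ea/R)(1/T1 - 1/T2). The function name and the rate constants below are hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def deactivation_energy_kj(k1, t1_c, k2, t2_c):
    """Estimate Ea (kJ/mol) from deactivation rate constants at two temperatures.

    k1, k2     : first-order deactivation rate constants (same units)
    t1_c, t2_c : corresponding temperatures in °C
    Assumes Arrhenius behaviour: k = A * exp(-Ea / (R * T)).
    """
    t1 = t1_c + 273.15
    t2 = t2_c + 273.15
    return GAS_CONSTANT * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2) / 1000.0


# Round trip: build a hypothetical pair of rate constants consistent with
# Ea = 251 kJ/mol, then recover that value.
k_at_60 = 0.01  # hypothetical rate constant at 60 °C
k_at_70 = k_at_60 * math.exp(251000.0 / GAS_CONSTANT * (1 / 333.15 - 1 / 343.15))
recovered = deactivation_energy_kj(k_at_60, 60.0, k_at_70, 70.0)
```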

In one particular embodiment, the isolated engineered ancestral CYP3 N1 protein of the invention has enhanced thermal stability compared to one or more of the isolated proteins comprising SEQ ID NOS:42-179. As will be evident from FIG. 10 and EXAMPLE 5 herein, isolated CYP3 N1 displayed substantially greater thermal stability than extant animal CYP3 proteins.

Additionally, it will be evident from EXAMPLE 10 that the CYP3 N1 variants comprising amino acid sequences set forth in SEQ ID NOS:3-28 displayed thermostability similar to or greater than that of CYP3 N1.

In another particular embodiment, the isolated CYP2D N1 protein of the invention has enhanced thermal stability compared to one or more of the isolated proteins comprising SEQ ID NOS:251-320. As will be evident from FIG. 11 and EXAMPLE 5 herein, isolated CYP2D N1 protein displayed substantially greater thermal stability than an extant animal CYP2D protein.

In another particular embodiment, the isolated CPR N1 protein of the invention has enhanced thermal stability compared to one or more of the isolated proteins comprising SEQ ID NOS:432-540. As will be evident from FIG. 14 and EXAMPLE 5 herein, isolated CPR N1 protein displayed substantially greater thermal stability than extant animal CPR proteins.

In certain particular embodiments, an isolated engineered ancestral protein of the invention has enhanced solvent stability compared to one or more of the corresponding extant animal proteins.

One measure of solvent stability may be the ability of a protein to adopt a properly folded state at a given concentration of solvent. Another measure of solvent stability may be the enzyme activity of a protein at a given concentration of solvent.

Preferably, said solvent is an organic solvent including, but not limited to: a polar protic solvent including an alcohol, e.g. ethanol, methanol, and isopropanol; a polar aprotic solvent e.g. dimethyl sulfoxide (DMSO), and acetonitrile; and a non-polar solvent including saturated and unsaturated hydrocarbon molecules and aromatic molecules.

In one particular embodiment, the isolated engineered ancestral CYP3 N1 molecule of the invention has enhanced stability in an organic solvent comprising methanol, compared to one or more of the isolated proteins comprising SEQ ID NOS:42-179.

As will be evident from EXAMPLE 7 and FIG. 15 and FIG. 16, CYP3 N1 displayed substantially higher velocity in methanol compared to extant animal CYP3 proteins.

In another particular embodiment, the isolated engineered ancestral CYP3 N1 molecule of the invention has enhanced stability in an organic solvent comprising acetonitrile, compared to one or more of the isolated proteins comprising SEQ ID NOS:42-179.

As will be evident from EXAMPLE 7 and FIG. 15 and FIG. 16, CYP3 N1 displayed substantially increased velocity in solvents comprising various concentrations of acetonitrile, compared to extant animal CYP3 proteins.

In certain other embodiments, an isolated engineered ancestral protein of the invention has altered binding affinity for one or more ligands as compared to one or more corresponding extant animal proteins.

One measure of ligand binding may be a "dissociation constant" (Kd) in molar units. As will be understood by those skilled in the art, Kd indicates the concentration of ligand at which the binding site on a particular protein is half occupied. Therefore, it will be understood that a lower Kd value indicates a higher binding affinity of a protein for a given ligand.
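The relationship between Kd and occupancy can be made concrete with a short sketch. It assumes a single, non-cooperative binding site and that the ligand concentration given is the free concentration; the function name and the example concentrations are hypothetical.

```python
def fraction_bound(ligand_conc, kd):
    """Fractional occupancy of a single binding site.

    ligand_conc : free ligand concentration (M)
    kd          : dissociation constant (M)
    """
    return ligand_conc / (kd + ligand_conc)


# At a ligand concentration equal to Kd the site is half occupied.
half = fraction_bound(1e-6, 1e-6)

# At the same ligand concentration, a lower Kd gives higher occupancy.
tight = fraction_bound(1e-6, 1e-7)
weak = fraction_bound(1e-6, 1e-5)
```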

In some preferred embodiments, said one or more ligands comprise a macrolide (e.g. azithromycin, cyclosporin A, troleandomycin, clarithromycin, erythromycin, and telithromycin) and/or a benzodiazepine (e.g. 2-keto benzodiazepines such as diazepam, and imidazo benzodiazepines such as midazolam, although without limitation thereto).

In yet other preferred embodiments, an isolated engineered ancestral protein of the invention demonstrates altered metabolism of chemicals, such as drugs, as compared to one or more corresponding extant animal proteins.

As will be evident from EXAMPLE 6 and FIGS. 31-37, CYP3 N1 demonstrated altered metabolism of tamoxifen, erythromycin, and ticlopidine compared to extant CYP3A4. In some embodiments, an isolated engineered ancestral protein of the invention may demonstrate more rapid conversion of a chemical, such as a drug, to a metabolite of that chemical; and/or result in a greater conversion of a chemical, such as a drug, to a metabolite of that chemical at completion (or near completion).

As set forth in EXAMPLE 6 and FIG. 31, as compared to CYP3A4, CYP3 N1 demonstrated more rapid metabolism of tamoxifen to its major demethylated metabolite, and resulted in a greater conversion of tamoxifen to its major demethylated metabolite after 120 minutes.

Furthermore, as set forth in EXAMPLE 6 and FIGS. 35-36, as compared to CYP3A4, CYP3 N1 demonstrated more rapid metabolism of ticlopidine to its desaturated metabolite, and resulted in a greater conversion of ticlopidine to its desaturated metabolite after 120 minutes.

In certain other embodiments, the activity of an isolated protein of the invention in a reaction comprising a given substrate produces a different metabolite profile, as compared to the activity of one or more extant animal proteins in a corresponding reaction comprising said substrate.

In certain embodiments, the different metabolite profile may comprise one or more metabolites that are present or absent, and/or a relative increase or decrease in one or more metabolites.

Isolated nucleic acids

Another aspect of the invention provides an isolated nucleic acid that encodes an isolated protein of the invention, inclusive of fragments, variants and derivatives of the isolated protein.

For embodiments of the aspect relating to isolated CYP3 proteins, said nucleic acid is exemplified in SEQ ID NO: 541 (FIG. 25).

For embodiments of the aspect relating to isolated CYP2D proteins, said nucleic acid is exemplified in SEQ ID NO: 542 (FIG. 25).

For embodiments of the aspect relating to isolated CPR proteins, said nucleic acid is exemplified in SEQ ID NO:543 (FIG. 25).

The term "nucleic acid" as used herein designates single- or double-stranded DNA and RNA. DNA includes genomic DNA and cDNA. RNA includes mRNA, RNA, RNAi, siRNA, cRNA and autocatalytic RNA. Nucleic acids may also be DNA-RNA hybrids. A nucleic acid comprises a nucleotide sequence which typically includes nucleotides that comprise an A, G, C, T or U base. However, nucleotide sequences may include other bases such as inosine, methylcytosine, methylinosine, methyladenosine and/or thiouridine, although without limitation thereto.

A "polynucleotide" is a nucleic acid having eighty (80) or more contiguous nucleotides, while an "oligonucleotide" has fewer than eighty (80) contiguous nucleotides.

A "probe" may be a single or double-stranded oligonucleotide or polynucleotide, suitably labelled for the purpose of detecting complementary sequences in Northern or Southern blotting, for example.

A "primer" is usually a single-stranded oligonucleotide, preferably having 15-50 contiguous nucleotides, which is capable of annealing to a complementary nucleic acid "template" and being extended in a template-dependent fashion by the action of a DNA polymerase such as Taq polymerase, RNA-dependent DNA polymerase or Sequenase™.

Another particular aspect of the invention provides a variant of an isolated nucleic acid that encodes an isolated protein, variant, fragment or derivative disclosed herein. In a preferred embodiment, nucleic acid variants share at least 60% or 65%, preferably at least 70% or 75%, more preferably at least 80% or 85%, and even more preferably at least 90% or 95% nucleotide sequence identity with an isolated nucleic acid that encodes one or more of the isolated proteins of the invention.

In yet another embodiment, nucleic acid variants hybridize to isolated nucleic acids of the invention under at least low stringency conditions, preferably under at least medium stringency conditions and more preferably under high stringency conditions.

"Hybridize" and "hybridization" are used herein to denote the pairing of at least partly complementary nucleotide sequences to produce a DNA-DNA, RNA-RNA or DNA-RNA hybrid. Hybrid sequences comprising complementary nucleotide sequences occur through base-pairing between complementary purines and pyrimidines, as is well known in the art.

In this regard, it will be appreciated that modified purines (for example, inosine, methylinosine and methyladenosine) and modified pyrimidines (thiouridine and methylcytosine) may also engage in base pairing.

"Stringency" as used herein refers to temperature and ionic strength conditions, and the presence or absence of certain organic solvents and/or detergents during hybridization. The higher the stringency, the higher will be the required level of complementarity between hybridizing nucleotide sequences. "High stringency conditions" designates those conditions under which only nucleic acids having a high frequency of complementary bases will hybridize.

Reference herein to high stringency conditions includes and encompasses:

(i) from at least about 31% v/v to at least about 50% v/v formamide and from at least about 0.01 M to at least about 0.15 M salt for hybridisation at 42°C, and at least about 0.01 M to at least about 0.15 M salt for washing at 42°C;

(ii) 1% BSA, 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65°C, and (a) 0.1 x SSC, 0.1% SDS; or (b) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 1% SDS for washing at a temperature in excess of 65°C for about one hour; and

(iii) 0.2 x SSC, 0.1% SDS for washing at or above 68°C for about 20 minutes.

In general, washing is carried out at Tm = 69.3 + 0.41 (G + C)% - 12°C. In general, the Tm of a duplex DNA decreases by about 1°C with every increase of 1% in the number of mismatched bases.
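The formula above can be worked through as follows. The sketch reads the expression as Tm = 69.3 + 0.41 x (%G+C), with washing carried out 12°C below that value and Tm reduced by 1°C per 1% of mismatched bases; that reading, the function names and the toy sequence are assumptions of this example.

```python
def melting_temp_c(seq):
    """Tm (°C) of a duplex from its G+C content: 69.3 + 0.41 * (%G+C)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 69.3 + 0.41 * gc_percent


def wash_temp_c(seq, mismatch_percent=0.0):
    """Wash temperature (°C): Tm, less 1 °C per 1% mismatch, less 12 °C."""
    return melting_temp_c(seq) - 1.0 * mismatch_percent - 12.0


# Toy sequence with 50% G+C content.
tm = melting_temp_c("ATGC")
wash = wash_temp_c("ATGC")
wash_with_mismatch = wash_temp_c("ATGC", mismatch_percent=5.0)
```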

Notwithstanding the above, stringent conditions are well known in the art, such as described in Chapters 2.9 and 2.10 of Ausubel et al., supra. A skilled addressee will also recognize that various factors can be manipulated to optimize the specificity of the hybridization. Optimization of the stringency of the final washes can serve to ensure a high degree of hybridization.

In other embodiments, isolated nucleic acid variants may be produced using a nucleic acid amplification technique. Suitable nucleic acid amplification techniques are well known to the skilled addressee, and include polymerase chain reaction (PCR); strand displacement amplification (SDA); rolling circle replication (RCR); nucleic acid sequence-based amplification (NASBA); Q-β replicase amplification; and helicase-dependent amplification, although without limitation thereto.

As used herein, an "amplification product" refers to a nucleic acid product generated by nucleic acid amplification.

Particularly for analytical purposes, nucleic acid amplification techniques may include quantitative and semi-quantitative techniques such as qPCR, real-time PCR and competitive PCR, as are well known in the art. Suitably, isolated nucleic acid variants may be produced using nucleic acid amplification techniques using one or more degenerate primers based on, or derived from, a nucleotide sequence of an isolated nucleic acid disclosed herein. By way of example, the degenerate primer(s) may be designed to anneal to one or more nucleotide sequences of a variant nucleic acid to thereby facilitate amplification of the variant nucleic acid, or a fragment thereof.
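By way of illustration, a degenerate primer written with IUPAC nucleotide ambiguity codes stands for a pool of concrete primers, which can be enumerated as sketched below. The function name and the three-base example are hypothetical and do not correspond to any primer disclosed herein.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete primer sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# 'R' is A or G and 'Y' is C or T, so this primer encodes four sequences.
pool = expand_degenerate("ARY")
```

The size of the pool grows multiplicatively with each ambiguous position, which is one practical reason to limit degeneracy when designing such primers.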

Genetic constructs

Yet another aspect of the invention provides a genetic construct that comprises an isolated nucleic acid or variant as herein described and one or more additional nucleotide sequences.

Suitably, the genetic construct may be in the form of, or comprise genetic components of, a plasmid, bacteriophage, a cosmid, or a yeast or bacterial artificial chromosome as are well understood in the art.

Genetic constructs may be suitable for maintenance and propagation of the isolated nucleic acid in bacteria or other host cells, for manipulation by recombinant DNA technology and/or expression of the nucleic acid or an encoded protein of the invention.

For the purposes of host cell expression, the genetic construct is an expression construct. Suitably, the expression construct comprises one or more nucleic acid or variants disclosed herein operably linked to one or more additional sequences in an expression vector.

An "expression vector" may be either a self-replicating extra-chromosomal vector such as a plasmid, or a vector that integrates into a host genome.

By "operably linked" is meant that said additional nucleotide sequence(s) is/are positioned relative to the nucleic acid of the invention preferably to initiate, regulate or otherwise control transcription.

In one embodiment, the additional nucleotide sequences are regulatory sequences. Regulatory nucleotide sequences will generally be appropriate for the host cell used for expression. Numerous types of appropriate expression vectors and suitable regulatory sequences are known in the art for a variety of host cells.

Typically, said one or more regulatory nucleotide sequences may include, but are not limited to, promoter sequences, leader or signal sequences, ribosomal binding sites, transcriptional start and termination sequences, translational start and termination sequences, and enhancer or activator sequences.

Constitutive or inducible promoters as known in the art are contemplated by the invention. The promoters may be either naturally occurring promoters, or hybrid promoters that combine elements of more than one promoter.

In another embodiment, the additional nucleotide sequence is a selectable marker gene to allow the selection of transformed host cells. Selectable marker genes are well known in the art and will vary with the host cell used.

The expression construct may also include an additional nucleotide sequence encoding a fusion partner (typically provided by the expression vector) so that the recombinant polypeptide of the invention is expressed as a fusion protein, as hereinbefore described.

Isolated proteins of the invention (inclusive of fragments, derivatives and homologs) may be prepared by any suitable procedure known to those of skill in the art. Preferably, the isolated protein is a recombinant protein.

By way of example only, a recombinant isolated protein of the invention may be produced by a method including the steps of:

(i) preparing an expression construct which comprises an isolated nucleic acid of the invention, operably linked to one or more regulatory nucleotide sequences;

(ii) transfecting or transforming a suitable host cell with the expression construct;

(iii) expressing a recombinant protein in said host cell; and

(iv) isolating the recombinant protein from said host cell.

Suitable host cells for expression may be prokaryotic or eukaryotic. For example, suitable host cells may be mammalian cells, plant cells, yeast cells, insect cells or bacterial cells. One preferred host cell for expression of an isolated protein according to the invention is a bacterium.

Introduction of genetic constructs into host cells (whether prokaryotic or eukaryotic) is well known in the art, as for example described in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY Eds. Ausubel et al, (John Wiley & Sons, Inc. 1995-2009), in particular Chapters 9 and 16.

The recombinant protein may be conveniently prepared by a person skilled in the art using standard protocols as for example described in Sambrook, et al, MOLECULAR CLONING. A Laboratory Manual (Cold Spring Harbor Press, 1989), in particular Sections 16 and 17; CURRENT PROTOCOLS IN MOLECULAR BIOLOGY Eds. Ausubel et al, (John Wiley & Sons, Inc. 1995-2009), in particular Chapters 10 and 16; and CURRENT PROTOCOLS IN PROTEIN SCIENCE Eds. Coligan et al, (John Wiley & Sons, Inc. 1995-2009), in particular Chapters 1, 5 and 6.

Antibodies

Another aspect of the invention provides an antibody or antibody fragment which binds, or has been raised against, an isolated protein disclosed herein.

In certain preferred embodiments wherein the isolated protein is an engineered ancestral CYP3 protein, the antibody or antibody fragment shows at least partial specificity for an isolated protein comprising any one of the amino acid sequences set forth in SEQ ID NOS:2-41 or SEQ ID NOS:544-578. Suitably, said antibody or antibody fragment does not bind, or demonstrates substantially reduced binding against, one or more of the isolated proteins set forth in SEQ ID NOS:42-179.

In certain preferred embodiments wherein the isolated protein is an engineered ancestral CYP2D protein, the antibody or antibody fragment shows at least partial specificity for an isolated protein comprising any one of the amino acid sequences set forth in SEQ ID NOS:182-250. Suitably, said antibody or antibody fragment does not bind, or demonstrates substantially reduced binding against, one or more of the isolated proteins comprising the amino acid sequences SEQ ID NOS:251-320.

In certain preferred embodiments wherein the isolated protein is an engineered ancestral CPR protein, the antibody or antibody fragment shows at least partial specificity for an isolated protein comprising any one of the amino acid sequences set forth in SEQ ID NOS:322-431. Suitably, the antibody or antibody fragment does not bind, or demonstrates substantially reduced binding against, one or more of the isolated proteins comprising the amino acid sequences SEQ ID NOS:432-540.

As used herein, an "antibody" is or comprises an immunoglobulin. The term "immunoglobulin" includes any antigen-binding protein product of a mammalian immunoglobulin gene complex, including immunoglobulin isotypes IgA, IgD, IgM, IgG and IgE and antigen-binding fragments thereof. Included in the term "immunoglobulin" are immunoglobulins that are chimeric or humanised or otherwise comprise altered or variant amino acid residues, sequences and/or glycosylation, whether naturally occurring or produced by human intervention (e.g. by recombinant DNA technology). Antibody fragments include Fab and F(ab')2 fragments, diabodies, triabodies and single chain antibody fragments (e.g. scFvs), although without limitation thereto. Typically, an antibody comprises respective light chain and heavy chain variable regions that each comprise CDR 1, 2 and 3 amino acid sequences. A preferred antibody fragment comprises at least one light chain variable region CDR and/or at least one heavy chain variable region CDR.

Antibodies and antibody fragments may be polyclonal or, preferably, monoclonal. Monoclonal antibodies may be produced using the standard method as, for example, described in an article by Kohler & Milstein, 1975, Nature 256, 495, or by more recent modifications thereof as, for example, described in Chapter 2 of Coligan et al, CURRENT PROTOCOLS IN IMMUNOLOGY, by immortalizing spleen or other antibody producing cells derived from a production species which has been inoculated with an isolated protein or a fragment thereof. It will also be appreciated that antibodies may be produced as recombinant synthetic antibodies or antibody fragments, for example by expressing a nucleic acid encoding the antibody or antibody fragment in an appropriate host cell. Recombinant synthetic antibody or antibody fragment heavy and light chains may be co-expressed from different expression vectors in the same host cell or expressed as a single chain antibody in a host cell. Non-limiting examples of recombinant antibody expression and selection techniques are provided in Chapter 17 of Coligan et al, CURRENT PROTOCOLS IN IMMUNOLOGY and Zuberbuhler et al, 2009, Protein Engineering, Design & Selection 22 169.

In some embodiments, the antibody or antibody fragment is labelled.

The label may be selected from a group including a chromogen, a catalyst, biotin, digoxigenin, an enzyme, a fluorophore, a chemiluminescent molecule, a radioisotope, a drug or other chemotherapeutic agent, a magnetic bead and/or a direct visual label.

It will be appreciated that the antibody or antibody fragment may be used for the detection and/or purification of an isolated protein disclosed herein.

Methods of use of the isolated protein

Another aspect of the invention relates to a method of performing a chemical reaction, said method including the step of exposing a molecule to one or more isolated proteins disclosed herein to thereby perform a chemical reaction in the molecule. An aspect of the invention also provides a composition suitable for performing the chemical reaction. The composition suitably comprises one or more buffers, salts, solvents and/or other reagents that facilitate or allow the reaction to proceed. It will be understood that such pH buffers, salts, solvents and/or other reagents are well known in the art and may be selected according to the particular type of chemical reaction.

Suitably, the method includes exposing the molecule to a protein having P450 activity and/or a redox partner for the protein having P450 activity.

In some preferred embodiments, said method includes the step of exposing the molecule to at least one of:

i) an isolated engineered ancestral CYP3 protein disclosed herein;

ii) an isolated engineered ancestral CYP2D protein disclosed herein; and

iii) an isolated CPR protein disclosed herein.

In preferred embodiments of the method wherein the protein having P450 activity comprises an isolated engineered ancestral CYP3 protein disclosed herein, the redox partner for said CYP3 protein comprises an isolated engineered ancestral CPR protein disclosed herein.

In other embodiments, the redox partner for said CYP3 protein comprises any other suitable redox partner(s), which may include, but is not limited to one or more of: cytochrome b5 and cytochrome b5 reductase; a ferredoxin (e.g. adrenodoxin); and a ferrodoxin reductase (e.g. adrenodoxin reductase) or flavodoxin reductase.

In a preferred embodiment of the method wherein the protein having P450 activity comprises an isolated engineered ancestral CYP2D protein disclosed herein, the redox partner for said CY2D protein comprises an isolated engineered ancestral CPR protein disclosed herein.

In other embodiments of the method wherein the protein having P450 activity comprises an isolated engineered ancestral CYP2D protein disclosed herein, the redox partner for said CYP2D protein comprises any other suitable redox partner(s), which may include, but is not limited to, one or more of: cytochrome b5 and cytochrome b5 reductase; a ferredoxin (e.g. adrenodoxin); and a ferredoxin reductase (e.g. adrenodoxin reductase) or flavodoxin reductase.

In a preferred embodiment of the method wherein the redox partner for a protein having P450 activity comprises an isolated engineered ancestral CPR protein disclosed herein, the protein having P450 activity comprises an isolated engineered ancestral CYP3 protein disclosed herein. In another preferred embodiment of the method wherein the redox partner for a protein having P450 activity comprises the CPR protein disclosed herein, the protein having P450 activity comprises an isolated engineered ancestral CYP2D protein disclosed herein.

In other embodiments wherein the redox partner for a protein having P450 activity is an isolated engineered ancestral CPR protein disclosed herein, the protein having P450 activity comprises any other suitable P450 protein, which may include one or more animal P450 proteins, e.g. CYP1, CYP2, CYP3, CYP4, CYP5, CYP6, CYP7, CYP8, CYP9, CYP11, CYP12, CYP17, CYP19, CYP20, CYP21, CYP24, CYP26, CYP27, CYP39, CYP46, and CYP51 animal P450 enzyme classes; microbial P450 proteins, e.g. CYP101, CYP107A1, and CYP119 microbial P450 enzyme classes; and/or plant P450 proteins, e.g. CYP51, CYP74, CYP97, CYP710, CYP711, CYP727, CYP746 plant P450 enzyme classes, although without limitation thereto.

Particular, non-limiting examples of such chemical reactions include redox reactions, hydroxylation at aromatic and aliphatic centres, epoxidation, N-, O- and S-dealkylation and N- and S-oxidation, acyl migration, oxidative dehalogenation, ring expansion, contraction and cleavage, C-C bond cleavage, denitrosation of N-nitrosamines, oxidative ester cleavage, aldehyde scissions (e.g. to alkenes and HCOOH), ipso attack on aromatic ring substituents and N- or O-dearylation, reductions of alkyl halides, N-oxides, nitro compounds, inorganic molecules such as SO2, Cr(VI) or NO, desaturations (e.g. dehydrogenations), one-electron oxidations, isomerizations and/or phospholipase D activity (e.g. phosphate ester hydrolysis).

These reactions may have applications for: the production of fine chemicals (e.g. pharmaceuticals, agrichemicals, fragrances, and dyes); gene therapy; bioremediation; biosensors; diagnostics; plant biotechnology; and/or medicinal chemistry (e.g. drug discovery and pharmacological testing). With particular regard to drug discovery, an isolated protein disclosed herein may be used for structural diversification of molecules present in molecular libraries, such as natural product libraries, synthetic combinatorial libraries, and/or rationally designed structure-based libraries, although without limitation thereto.

As will be understood by those skilled in the art, structural diversification may facilitate structure-activity relationship ("SAR") analysis to thereby generate an array of improved lead compounds having one or more improved properties such as: enhanced pharmacodynamic properties; reduced toxicity; enhanced bioavailability; enhanced half-life; enhanced formulation properties; and/or reduced production cost, although without limitation thereto. As will be understood by those skilled in the art, in the aforementioned context a "lead" compound refers to a chemical compound that has pharmacological or biological activity likely to be useful for a given purpose; for example, for therapeutic and/or industrial application, although without limitation thereto; but may still have suboptimal properties for said purpose.

With particular regard to pharmacological testing, an isolated protein described herein may be used to metabolize a xenobiotic, for example a drug, to produce metabolites that are produced during human metabolism of said xenobiotic. It is anticipated that said metabolites can be used for purposes including the assessment of toxicity to an animal, for example "general toxicity", "genotoxicity", "embryo-fetal toxicity", as will be understood by those skilled in the art; and "carcinogenicity", although without limitation thereto.

With particular regard to bioremediation, an isolated protein described herein may be used to metabolize an environmental pollutant, for example, although without limitation thereto, a hydrocarbon pollutant such as a diesel, a gasoline, or an oil; a pesticide pollutant such as an organochlorine pesticide, an organophosphate pesticide, or a pyrethroid pesticide; and a solvent pollutant such as perchloroethylene (PCE) or trichloroethylene (TCE). In some embodiments, the isolated protein may be expressed by a microorganism or a plant, thereby allowing said microorganism or plant to metabolize an environmental pollutant, or improving the efficiency with which said microorganism or plant metabolizes an environmental pollutant.

With particular regard to biosensor technology, an isolated protein described herein may be used to detect metabolites in a human blood sample, for example, drugs or drug metabolites; and/or food contaminants, such as carbamate or organophosphate pesticide residues, although without limitation thereto.

EXAMPLES

EXAMPLE 1. Ancestral sequence reconstruction of the cytochrome P450 family 3 (CYP3).

A total of 138 CYP3 sequences from 42 animal species were collected from Uniprot, NCBI and the cytochrome P450-nomenclature homepage (http://dnelson.utmem.edu/Cytochrome P450.html) database. The sequences were aligned using MAFFT (Katoh et al. 2002). The alignment was fine-tuned manually to improve its reliability at gap positions. Phylogenetic relationships of these CYP3 sequences were reconstructed using the Maximum Likelihood method under the JTT substitution model (Jones et al. 1992) with Phyml using T-REX (Alix et al. 2012). The initial tree was determined by neighbour-joining (BIONJ) (Gascuel 1997). Bootstrapping analysis was performed to evaluate the tree. Prediction of ancestral nodes of the tree (SEQ ID NOS:544-578) was performed using FastML (Pupko et al. 2000). The last common ancestor of CYP3 was identified and designated 'N1'.

A codon-optimized nucleotide sequence for this N1 amino acid sequence was synthesized by GeneArt (Germany), in which the first 19 codons of N1 were replaced with the sequence 5' ATG GCT CTG TTA TTA GCA GTT TTT CTG 3' encoding the peptide MALLLAVFL, which was known to facilitate the expression of P450 CYP3A4 in E. coli (Gillam et al. 1993). This nucleotide sequence encodes CYP3_N1, set forth in SEQ ID NO:2. A C-terminal hexa-His tag was also added to the coding sequence of the engineered ancestral protein, linked to the C-terminal end of the P450 coding sequence by a Ser-Thr linker encoding a SalI site.
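
The correspondence between the replacement leader and the MALLLAVFL peptide can be checked with a minimal sketch such as the following. It assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative only and are not part of the disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring spaces and any
    incomplete trailing codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Leader sequence quoted in Example 1.
leader = "ATG GCT CTG TTA TTA GCA GTT TTT CTG"
peptide = translate(leader)
```

Here `peptide` evaluates to the nine-residue leader stated in the text.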

EXAMPLE 2. Construction of CYP3_N1 expression plasmids.

The CYP3_N1 open reading frame was subcloned from the cloning vector pMA_RQ/3_N1 (GeneArt, Germany) into the pCW expression vector (Muchmore et al. 1989) containing the human NADPH-cytochrome P450 reductase (hNPR) sequence, using 5' NdeI and 3' SalI sites (FIG. 4). The sequence of the CYP3_N1 open reading frame in the expression plasmid was verified by automated dideoxy sequencing at the Brisbane Node of the Australian Genome Research Facility (St. Lucia, Australia). Subsequently, a monocistronic expression construct (lacking the hNPR open reading frame) and a CYP3_N1/enhanced yellow fluorescent protein (EYFP) fusion construct (retaining the hCPR expression cassette) were constructed by the strategies shown in FIG. 5 and FIG. 6. For the monocistronic construct, a synthetic, double-stranded oligonucleotide linker containing XbaI, SmaI, BlpI, SacI, NsiI, HindIII, and NheI restriction sites was first ligated into the bicistronic construct (pCW/3_N1His/hNPR) to facilitate the subsequent removal of hNPR via BlpI digestion followed by religation of the linearized monocistronic vector (FIG. 5). The CYP3_N1 EYFP fusion was obtained by ligating the EYFP fragment from pCW/2C19 FL-EYFPHis/hNPR digested with SalI into the bicistronic construct (FIG. 6).

EXAMPLE 3. Ancestral sequence reconstruction of the CYP2D subfamily (CYP2D)

A total of 70 CYP2D sequences from vertebrate species were collected from Uniprot, NCBI and the cytochrome P450-nomenclature homepage (http://dnelson.utmem.edu/Cytochrome P450.html) database. The sequences were aligned using MEGA 6 (Beta 2) (Tamura et al. 2013), using a Multiple Sequence Comparison by Log-Expectation (MUSCLE) alignment with the following parameters: gap open: -2.9, gap extend: -1.01, hydrophobicity multiplier: 1.2. The alignment was fine-tuned manually to improve its reliability at gap positions. Phylogenetic relationships of these CYP2D sequences were reconstructed using the Maximum Likelihood method under the JTT substitution model (Jones et al. 1992) with Phyml using T-REX (Alix et al. 2012). The initial tree was determined by neighbour-joining (BIONJ) (Gascuel 1997). Bootstrapping analysis was performed to evaluate the tree. Prediction of ancestral nodes of the tree (SEQ ID NOS: 182-250) was performed using FastML (Pupko et al. 2000) and the last common ancestor of CYP2D and the ancestor of the mammalian CYP2D forms were identified, with the last common ancestor designated 'N1'. A nucleotide sequence encoding the full length (FL) CYP2D_N1 amino acid sequence (which amino acid sequence is set forth in SEQ ID NO: 181) was designed in which the second codon was changed to GCT (Ala) and the first 12 codons were optimised to minimise mRNA secondary structure formation (Gillam et al. 1995) without further alteration of the amino acid sequence. Sequences were analysed from -20 bases preceding to +96 bases following the start codon (Goodman et al. 2013) using the online software Nupack (Zadeh et al. 2011) to calculate free energy values.

A C-terminal hexa-His tag was also added to the coding sequence of the engineered ancestral protein, linked to the C-terminal end of the P450 coding sequence by a Ser-Thr linker encoding a SalI site. Two 80 bp flanking sequences complementary to the destination expression vector were added to the open reading frame for this FL sequence, and the resultant sequence was synthesised as a GeneBlock by Integrated DNA Technologies (Singapore). As set forth in FIG. 7, the final bicistronic expression vector was generated by Gibson Assembly (Zadeh et al. 2011) of this GeneBlock with the backbone of the pCW72D22/hNPR vector from which the CYP2D22 insert had been removed by digestion with NdeI and XbaI.

A truncated version of the CYP2D_N1, as set forth in FIG. 8, was generated by PCR amplification using mutagenic primers: Forward (2DN1MAKFWD), 5' AGGTCATATGGCAAAAAAAACATCATCAAAAGGAAAATTCCCACCAGGCCCTATGTCATT 3' and Reverse (hNPR5'REV), 5' GACACGGTGGAGCTGGTGTCCACGTGG 3'. As set forth in FIG. 8, the MAKKTSSKGK leader sequence (von Wachenfeldt et al. 1997; Rowland et al. 2006) was added to the coding sequence upstream of the proline-rich region. The PCR product was digested with NdeI and XbaI and cloned into the cognate sites of pCW72D22/hNPR to generate the pCW72D_N1trunc/hNPR bicistronic expression vector.
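
As a consistency check on the forward primer, the sketch below locates the NdeI recognition site (CATATG) and translates from the ATG it contains. It is illustrative only: the primer string is the forward primer quoted above with extraction spaces removed, the standard genetic code is assumed, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

NDEI_SITE = "CATATG"  # NdeI recognition sequence; its ATG is the start codon

# Forward primer 2DN1MAKFWD, spaces removed (assumed to be extraction debris).
FORWARD_PRIMER = (
    "AGGTCATATGGCAAAAAAAACATCATCAAAAGGAAAATTCCCACCAGGCCCTATGTCATT"
)


def translate_from_ndei(primer):
    """Translate the primer starting at the ATG inside the NdeI site,
    dropping any incomplete trailing codon."""
    site = primer.find(NDEI_SITE)
    if site < 0:
        raise ValueError("no NdeI site found in primer")
    orf = primer[site + 3:]  # skip 'CAT' so the frame starts at ATG
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


encoded = translate_from_ndei(FORWARD_PRIMER)
has_leader = encoded.startswith("MAKKTSSKGK")
```

Under these assumptions the translated primer begins with the MAKKTSSKGK leader and runs straight into a proline-containing stretch, which agrees with the description of the leader being placed upstream of the proline-rich region.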

EXAMPLE 4. Ancestral sequence reconstruction of the NADPH-cytochrome P450 reductase (CPR)

CPR sequences were obtained from Uniprot, and by BLASTing the Uniprot sequences against the NCBI database (Altschul et al. 1990) to retrieve additional sequences of high homology, then aligned using the CLUSTALW method (Kyoto University Bioinformatics Center; <http://www.genome.jp/>). Anomalous sequences (e.g. obvious sequencing errors, clearly incomplete open reading frames, etc.) were removed and the remaining sequences were realigned; minor allelic variants and sequences that deviated significantly from the family characteristics (e.g. <55% sequence identity to all other forms, either globally or over any section of ~20 residues) were then pruned such that the ultimate alignment included only one sequence per enzyme. The remaining 109 sequences were then aligned with the house fly CPR sequence (designated as the outgroup sequence) and the evolutionary tree was derived in MEGA (Tamura et al. 2013) by the ML method, with the designated outgroup as the root (ML has been shown to be more accurate than MP; Gadagkar et al. 2005). This tree was then imported along with the alignment into the FastML web server (Ashkenazy et al. 2012), and prediction of ancestral nodes of the tree (SEQ ID NOS: 182-250) and the last common ancestor was performed using joint maximum likelihood and marginal maximum likelihood methods. The last common ancestor made using the joint reconstruction method was designated 'N1' and used for further experiments.

The engineered ancestral amino acid sequence was reverse translated using the GeneArt codon optimisation algorithm (Thermo Fisher Scientific: Life Technologies 2014) with codon usage optimised for Escherichia coli. The propensity for the corresponding mRNA to fold into stable secondary structure was analysed over the N-terminal nucleotide sequence (from -21 to +96 with respect to the start codon) using NUPACK (Zadeh et al. 2011) and minimised by iterative silent changes to the nucleotide sequence. The resultant sequence with two flanking sequences complementary to the destination expression vector added to the final open reading frame, encoding CPR_N1 set forth in SEQ ID NO:322, was synthesised as a GeneBlock by Integrated DNA Technologies (Singapore). As set forth in FIG. 9, the final bicistronic expression vector was generated by Gibson Assembly (Zadeh et al. 2011) of this synthetic gene fragment using the backbone of the pCW74A11/hNPR vector from which the hNPR insert had been removed by digestion with XbaI and HindIII.

EXAMPLE 5. Expression and thermostability measurement of the CYP3_N1, CYP2D_N1, and CPR_N1 engineered ancestral proteins

Recombinant proteins were expressed as described previously (Notley et al. 2002) with minor modifications. Starter cultures (5 mL LB media supplemented with 11.1 mM glucose, 100 μg/mL ampicillin and 20 μg/mL chloramphenicol, in 25 mL flasks) were inoculated from glycerol stocks or freshly transformed colonies and incubated overnight at 37°C, with shaking at 400 rpm. Expression cultures were set up in 100 mL flasks by inoculating 100 μL of starter culture into 10 mL Terrific Broth (TB) (Sambrook et al. 1989) containing 100 μg/mL ampicillin, 20 μg/mL chloramphenicol, 1 mM thiamine and trace elements (Bauer et al. 1974; TB expression medium). Cultures were incubated at 25°C with shaking at 180 rpm for an initial 5 h. Recombinant protein expression was then induced by adding arabinose (4 mg/mL), δ-aminolaevulinic acid (0.5 mM) and isopropyl-β-thiogalactopyranoside (IPTG) (1 mM) and the flasks were incubated at 25°C with shaking at 180 rpm for a further 43 h. As controls, the following recombinant extant proteins were expressed in parallel: for CYP3 experiments - CYP3A4 (Gillam et al. 1993), CYP3A5 (Gillam et al. 1995b), CYP3A27 (Behrendorff 2011) and CYP3A37 (Rawal et al. 2010); for CYP2D experiments - CYP2D22, CYP2D6, CYP2D12, CYP2D45, and CYP2D49; for CPR experiments - hNPR (extant human CPR).

For CYP3 and CYP2D experiments, cultures were harvested by centrifugation at 3000 x g for 10 min then resuspended in whole cell assay buffer (WCAB: 100 mM potassium phosphate, 6 mM magnesium acetate, 10 mM (+) glucose, pH 7.4). The resuspended enzyme solution was heated at various temperatures (30-75°C) for 1 h in a Bio-Rad MyCycler Thermal Cycler, cooled at 4°C for 5 min and equilibrated at room temperature for 5 min. The residual folded protein was then quantified by Fe(II) CO versus Fe(II) difference spectroscopy (Johnston et al. 2008). In this way, measures of thermal stability of CYP3 and CYP2D proteins based on the percentage of folded protein after treatment for 60 minutes at various temperatures were obtained. Results for CYP3 and CYP2D proteins are set forth in FIG. 10 and FIG. 11, respectively.

As will be evident from FIG. 10, the percentage of folded protein was substantially higher for CYP3_N1 (3_N1) after temperature treatment as compared to the percentage of folded protein for extant CYP3 proteins after temperature treatment. For example, after temperature treatment of 55°C, the percentage folded protein was ~ 100% for CYP3_N1, as compared to ~ 0% for the extant CYP3 proteins. Additionally, T50 values for folded protein were calculated, as presented in Table 1.

As will be evident from FIG. 11, the percentage of folded protein was substantially higher for CYP2D_N1 (2D_N1) after temperature treatment as compared to the percentage of folded protein for extant CYP2D proteins after temperature treatment. For example, after 60 minutes temperature treatment of 55°C, the percentage folded protein was ~ 100% for CYP2D_N1, as compared to ~ 0% for the extant CYP2D protein.

For CYP3 proteins, the half-life of folded protein was estimated by plotting the exponential decay graph of extant and ancestor proteins heated at various temperatures (50, 55, 60, 65, 70, and 75°C); results are set forth in Table 2. As will be evident from Table 2, the CYP3_N1 (3_N1) ancestor protein displayed a substantially longer half-life of folded protein as compared to the extant CYP3 proteins. For example, after heating at 55°C, the half-life of CYP3_N1 was ~ 566.5 minutes, while the half-life for the extant CYP3 proteins was between ~ 0.5 minutes and ~ 1.6 minutes. Furthermore, CYP3 protein samples were taken at specific time-points and the residual folded protein was quantified. The percentage of folded enzyme remaining after heat treatment was compared to the non-heat-treated samples and the data obtained were used to generate an Arrhenius plot as set forth in FIG. 12. As will be evident from FIG. 12, the slope of the Arrhenius plot was substantially steeper for CYP3_N1 than for the extant CYP3 animal proteins. It will therefore be understood from FIG. 12 that the thermal deactivation energy was higher for CYP3_N1 than for the extant CYP3 animal proteins, as set forth in Table 3.
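
The half-life and Arrhenius analysis described above can be sketched as follows. This is a minimal illustration, not the fitting procedure actually used: it assumes first-order (single-exponential) loss of folded protein, fits by ordinary least squares on log-transformed data, and uses hypothetical function names. No values from Tables 2 or 3 are reproduced.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def decay_rate(times_min, fraction_folded):
    """First-order rate constant k (min^-1) from ln(fraction) versus time."""
    slope, _ = fit_line(times_min, [math.log(f) for f in fraction_folded])
    return -slope


def half_life(k):
    """Half-life (min) of a first-order decay with rate constant k."""
    return math.log(2) / k


def deactivation_energy(temps_c, rate_constants):
    """Deactivation energy (J/mol) from the slope of ln k versus 1/T (K^-1)."""
    inv_t = [1.0 / (t + 273.15) for t in temps_c]
    slope, _ = fit_line(inv_t, [math.log(k) for k in rate_constants])
    return -slope * R
```

In this formulation a steeper (more negative) slope of ln k against 1/T corresponds directly to a larger deactivation energy, which is the reading of FIG. 12 given in the text.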

In order to evaluate whether any stability effects are contributed by the interaction of CYP3_N1 with its redox partner, the thermostability of proteins produced by monocistronic (pCW/3_N1His) and bicistronic (pCW/3_N1His/hNPR) constructs was compared. As will be evident from FIG. 13, the thermostability of proteins produced by monocistronic and bicistronic constructs was similar.

For CPR_N1 experiments, cells were subjected to sub-cellular fractionation and membranes were prepared for analysis using a cytochrome c reductase assay; CPR_N1 activity was then assessed via the reduction of the surrogate electron acceptor cytochrome c, as previously described by Guengerich (1994), before and after heat treatment as described above for P450s. Results are set forth in FIG. 14; data are shown for three preparations of CPR_N1 derived from separate cultures compared to a pooled preparation of extant human CPR (hNPR).

As will be evident from FIG. 14, the percentage of active enzyme was substantially increased for CPR_N1 after temperature treatment as compared to the percentage of active enzyme for extant CPR protein after temperature treatment. For example, the percent active enzyme after 60 minutes heating at 55°C was ~ 50% for CPR_N1, as compared to ~ 10% for extant CPR. Additionally, T50 enzyme activity values were obtained for CPR proteins. The 60T50 values were as follows: CPR_N1, 45.0 ± 1.0°C; hNPR, 38.5°C. Similar measurements on housefly reductase (60T50 = 41.2°C) and two Arabidopsis thaliana reductases (ATR1, 60T50 = 39.1°C; ATR2, 60T50 = 27.4°C) expressed in E. coli also showed these evolutionarily divergent CPRs to be less thermostable than CPR_N1.
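
One simple way to estimate a T50 value from a residual-activity curve is linear interpolation between the two temperatures that bracket 50%. The sketch below assumes that 60T50 denotes the temperature at which half of the initial activity remains after 60 minutes of heating; the text does not state how the reported values were fitted, and the function name is hypothetical.

```python
def t50(temps_c, percent_remaining, threshold=50.0):
    """Temperature at which the residual activity curve crosses `threshold`,
    by linear interpolation between adjacent measurements.

    temps_c must be in increasing order, with percent_remaining measured
    after the same heating time at each temperature.
    """
    points = list(zip(temps_c, percent_remaining))
    for (t_lo, p_lo), (t_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= threshold >= p_hi and p_lo != p_hi:
            fraction = (p_lo - threshold) / (p_lo - p_hi)
            return t_lo + fraction * (t_hi - t_lo)
    raise ValueError("curve does not cross the threshold")
```

A sigmoidal fit over all points would normally be preferred when replicate data are available; interpolation is shown only because it makes the definition explicit.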

EXAMPLE 6. Characterization of enzyme activity for CYP3_N1 and CYP2D_N1

Initial screens for activity towards fluorogenic or luminogenic or steroid marker substrates were done with intact cells washed and resuspended as above in WCAB. For detailed kinetic comparisons of CYP3_N1 and CYP2D_N1 and the respective extant P450 control forms (i.e. CYP3A4, CYP2D22, etc. as indicated for individual experiments), membrane fractions were isolated from bacteria expressing either CYP3_N1, CYP2D_N1 or an extant protein, with or without hCPR. Recombinant enzymes were expressed in 50 mL cultures, harvested and fractionated according to established procedures. An overview of the activity of CYP3_N1 on certain substrates is provided in FIG. 27.

P450-Glo assays

Activity assays were performed as described previously (Johnston et al. 2007) with luminogenic P450-Glo™ substrates at the following final concentrations: luciferin-CEE (15 μM), luciferin-ME (50 μM), luciferin-H (50 μM), luciferin-ME-EGE (15 μM), luciferin-H-EGE (5 μM), luciferin-PFBE (25 μM), luciferin-BE (25 μM) and luciferin-IPA (1.5 μM). Briefly, cells harvested from P450 expression cultures were resuspended in an equal volume of WCAB then 2 μL (~20 nM P450) were incubated with P450-Glo™ substrate in a total volume of 40 μL WCAB in white flat-bottomed 96 well plates. The reactions were allowed to proceed at 25°C for 1 h before quenching with luciferin detection reagent (20 μL). Samples were then incubated at room temperature for 20 min before quantification of luminescence using a Microbeta Trilux luminescence counter (Perkin-Elmer, USA). Results are presented in Table 4 and Table 5.

Resorufin O-dealkylation assays

Resorufin O-dealkylation assays were carried out as described in Chang and Waxman (2006). Results are presented in Table 4.

HPLC analysis of steroid hydroxylase activity

Analysis of steroid metabolism was performed as described previously (Hunter et al. 2011). Briefly, four 1 mL microaerobic cultures (Johnston et al. 2008) of each variant were combined, centrifuged (3000 x g, 10 min), and resuspended in 1 mL 100 mM potassium phosphate pH 7.4. Aliquots of each sample were taken to measure the P450 content, then the remaining cells were centrifuged and resuspended again in 1 mL 100 mM potassium phosphate pH 7.4 containing 10 mg/mL (+) glucose and substrate (testosterone or progesterone to final concentration 100 μM). The cultures were incubated for 48 h at 25°C, with shaking at 180 rpm, and supplemented at 4 h, 20 h, 28 h and 44 h with glucose (2.75 μmoles per addition, in 50 μL aliquots of a 10 mg/mL stock) and with ammonium hydroxide (5 μmoles per addition as 5 μL aliquots of a 1 M solution) in order to provide additional carbon and nitrogen sources. After 48 h, the cells were removed by centrifugation (3000 x g, 10 min) and 200 μL of supernatant from each incubation were added to an equal volume of HPLC-grade acetonitrile. Aliquots (25 μL) were then subjected to HPLC analysis on a C18 column (4.6 x 150 mm, 5 μm, Agilent Technologies), initially equilibrated in 24% (v/v) acetonitrile:water at a flow rate of 1.5 mL/min. The following gradient was used for elution of testosterone metabolites: 0-5.5 min, 24% acetonitrile; 5.5-12 min, linear gradient to 62% acetonitrile; 12-16 min, 62% acetonitrile; 16-18 min, linear gradient to 24% acetonitrile; 18-22 min, 24% acetonitrile. Metabolites were detected by absorbance at 247 nm. Under these conditions, the major metabolite 6β-hydroxytestosterone eluted at approximately 7 min. Products formed were identified by comparison to the previously calibrated retention times of 6β-hydroxytestosterone and other identified testosterone metabolites.
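
The supplement quantities quoted above can be cross-checked with simple unit arithmetic. The sketch below is illustrative only; it assumes anhydrous D-glucose (molar mass 180.16 g/mol), and the function names are hypothetical.

```python
GLUCOSE_MW = 180.16  # g/mol, anhydrous D-glucose (assumed form of the stock)


def stock_volume_ul(amount_umol, mw_g_per_mol, stock_mg_per_ml):
    """Volume of stock (uL) that delivers `amount_umol` of solute."""
    mass_ug = amount_umol * mw_g_per_mol  # umol x g/mol = ug
    return mass_ug / stock_mg_per_ml      # mg/mL is numerically ug/uL


def amount_umol(volume_ul, molarity_m):
    """Amount of solute (umol) in `volume_ul` of a solution of given molarity."""
    return volume_ul * molarity_m  # uL x mol/L = umol


glucose_volume = stock_volume_ul(2.75, GLUCOSE_MW, 10.0)  # about 49.5 uL
ammonium_amount = amount_umol(5.0, 1.0)                   # 5 umol
```

Under these assumptions, 2.75 μmoles of glucose corresponds to roughly 50 μL of a 10 mg/mL stock, and 5 μL of a 1 M ammonium hydroxide solution delivers 5 μmoles, in agreement with the stated additions.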

Kinetic analyses were carried out with isolated membrane fractions as described previously using 1 μM P450 and 10-500 μM testosterone for 10 min. Reactions (250 μL) were quenched by the addition of 25 μL of 1 mM progesterone (as internal standard) and metabolites were extracted with 1 mL of ethyl acetate prior to analysis by HPLC as described above. Results are presented in Table 4, Table 8, and Figure 29.

HPLC analysis of nifedipine oxidation

Analysis of nifedipine metabolism was performed as described previously using bacterial membranes containing 0.1 μM P450 in 100 mM Tris buffer pH 7.4 and 5-200 μM nifedipine added from a methanolic stock such that the final methanol concentration was 1% v/v. Reactions were quenched after 5 min with 50 μL tetrahydrofuran containing 100 μg/mL nordazepam as internal standard, then metabolites were extracted by addition of 800 μL of ethyl acetate and 200 μL sodium carbonate pH 10. Extracts were desiccated under a gentle stream of nitrogen then resuspended in 60 μL of mobile phase, of which 25 μL were analysed on a C18 column (4.6 x 150 mm, 5 μm, Agilent Technologies), eluted isocratically with mobile phase (55% methanol, 45% water, 0.02% triethylamine, pH 5.0) at a flow rate of 0.75 mL/min. Results are presented in Table 4.

Analysis of drug metabolism by LC-MSE

Bacterial membranes containing 0.1 μM CYP3_N1 (SEQ ID NO:2) coexpressed with human hNPR (human CPR); extant animal CYP3 protein CYP3A4 (Gillam et al. 1993); CYP2D_N1 (SEQ ID NO: 181); or hNPR (human CPR) in the absence of any P450 as a control, were incubated with the substrates listed in Table 7 at two concentrations, 10 and 100 μM, except for cyclosporin A, which was used at 2 and 20 μM. Incubations were prepared in 250 μL total volume of 0.1 M potassium phosphate buffer pH 7.4 and initiated by the addition of an NADPH-generating system consisting of 10 mM glucose-6-phosphate, 250 μM NADP+, and 0.5 U/mL glucose-6-phosphate dehydrogenase. Reactions were quenched at 0, 20 and 120 min by the addition of two volumes of ice-cold acetonitrile. Precipitated protein was removed by centrifugation and samples were lyophilized then resuspended in 50% v/v acetonitrile in water. Samples (10 μL) were injected into a Waters ACQUITY UPLC liquid chromatography system coupled to a Waters Synapt HDMS instrument (Waters, Milford, MA, USA) with an electrospray ionisation (ESI) source. Separation was achieved with a 130 Å, 1.7 μm, 2.1 mm x 100 mm Waters ACQUITY UPLC BEH C18 column (Waters, Milford, MA) at a column temperature of 45°C.

Mobile phases consisting of ultra-pure water supplemented with formic acid (0.1% v/v; mobile phase A) and pure acetonitrile (mobile phase B) were employed at a flow rate of 0.5 mL min⁻¹. The gradient used was as follows: 0.0-6.0 min (10-70% mobile phase B); 6.0-6.7 min (70-90% mobile phase B), then a return to the initial mobile phase composition over 0.01 min. The MSE analysis was performed with a Waters Synapt HDMS operating in V-mode positive electrospray ionization (ESI) conditions. A routine method with two scan functions was used, comprising an m/z range of 80-1000, cone voltage of 20 V and scan time of 0.1 s. The trap collision energy (CE) in function 1 was 20 V; in function 2 an energy ramp of 15-45 V was applied, and the transfer cell CE was 12 V. Data were collected in centroid mode. Leu-enkephalin (250 pg mL⁻¹) was used as a lock mass (m/z 556.2771) for internal calibration at a flow rate of 0.04 mL min⁻¹. The MSE data were processed in MetaboLynx 4.1 (Waters, Milford, MA) using the dealkylation tool and mass defect filter, and the proposed metabolites were manually reviewed.
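
The gradient program described above can be expressed as a piecewise-linear function of time, as in the sketch below. This is an illustration rather than instrument method code: it assumes linear ramps between the stated breakpoints and that the composition is held at the initial value after the return step. The names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in min, % mobile phase B) breakpoints taken from the text:
# 10-70% B over 0.0-6.0 min, 70-90% B over 6.0-6.7 min, then a return
# to the initial composition over 0.01 min.
GRADIENT = [(0.0, 10.0), (6.0, 70.0), (6.7, 90.0), (6.71, 10.0)]


def percent_b(t_min, program=GRADIENT):
    """Percentage of mobile phase B at time t_min, assuming linear ramps."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]  # hold the final composition (assumption)


def percent_a(t_min, program=GRADIENT):
    """Percentage of mobile phase A, the remainder of the binary mixture."""
    return 100.0 - percent_b(t_min, program)
```

For example, halfway through the first ramp (3.0 min) the function returns 40% mobile phase B.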

Results of metabolism of tamoxifen at a concentration of 100 μM to its major demethylated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle), and control results for hNPR (purple cross), are given in Figure 31.

Results of metabolism of erythromycin at a concentration of 100 μM to its major demethylated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle), and control results for hNPR (purple cross), are given in Figure 32. Similarly, results of metabolism of erythromycin at a concentration of 10 μM to its major demethylated metabolite by CYP3_N1 (blue diamond), CYP3A4 (red square), and CYP2D_N1 (green triangle), and control results for hNPR (purple cross), are given in Figure 33.

Results of metabolism of erythromycin at a concentration of 100 μM to its minor demethylated metabolite by CYP2D_N1 (green triangle) and control results for hNPR (purple cross) are given in Figure 34. Note that no metabolism of erythromycin to its minor demethylated metabolite by CYP3_N1 or CYP3A4 was detected.

Results of metabolism of ticlopidine at a concentration of 100 μM to its desaturated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle), and control results for hNPR (purple cross), are given in Figure 35.

Results of metabolism of ticlopidine at a concentration of 10 μM to its desaturated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle), and control results for hNPR (purple cross), are given in Figure 36.

Results of metabolism of ticlopidine at a concentration of 100 μM to its doubly desaturated metabolite by CYP3_N1 (blue diamond); CYP3A4 (red square); and CYP2D_N1 (green triangle), and control results for hNPR (purple cross), are given in Figure 37.

EXAMPLE 7. Solvent stability of CYP3_N1

Testosterone assays were carried out as described earlier, except that solvents (methanol, acetonitrile or DMSO) were included in incubation mixtures at the concentrations indicated in the figures. Results are presented in FIG. 15 and FIG. 16. As will be evident from FIG. 15 and FIG. 16, CYP3_N1 (TS) displayed substantially higher velocity than the extant animal CYP3 protein CYP3A4 (3A4) in solvents comprising various concentrations of acetonitrile or methanol. For example, velocity (pmol/min/pmol P450) in 10% methanol was ~ 2 for CYP3_N1 and ~ 1.5 for CYP3A4; and velocity (pmol/min/pmol P450) in 10% acetonitrile was ~ 0.8 for CYP3_N1 and ~ 0.25 for CYP3A4 (FIG. 16).

EXAMPLE 8. Ligand binding assays for CYP3_N1

Spectral binding assays were performed as described elsewhere (Isin et al. 2008) with bacterial membranes containing CYP3_N1 or CYP3A4 to a final concentration of 0.1 μM P450 in 1x TES using an OLIS-modified Aminco DW2A spectrophotometer. Binding data were analysed in Prism using the quadratic form of the binding equation (Isin et al. 2006). Results are presented in FIG. 18. In FIG. 18, CYP3_N1 is labelled as CYP3 TS.

EXAMPLE 9. Creation of a CYP3_N1 variant library

Overlap extension PCR was used for constructing a mutant library containing CYP3_N1 variant-encoding nucleic acids with codon substitutions at multiple sites (Williams et al. 2014). Each fragment A1, A2, B1, B2, C1, C2, D1, D2, E1, E2, as set forth in Table 6, was amplified individually with the following: 10 ng template DNA (pMA_RQ/3_N1), polymerase buffer, 200 μM of each dNTP, 0.5 μM of forward and reverse primers (silent or mutagenic primers, see Table 1), 0.2 U Phusion® high fidelity DNA polymerase (New England Biolabs, USA) and sterile water to a total volume of 20 μL. Fragment F was amplified using pCW/2C19FL His EYFP/hNPR as template. PCR cycling conditions were as follows: an initial "hot start" at 98°C for 1 min; 29 cycles of 98°C for 10 s, T_ann for 20 s, and 72°C for 10 s; a polishing stage at 72°C for 10 min; and finally, storage at 4°C until use. PCR products were analysed by agarose gel electrophoresis and purified using the Wizard® SV Gel and PCR clean-up system (Promega, Australia). Product yields were quantified using a Nanodrop spectrophotometer.

Equal quantities of matched fragments from each PCR (A1 and A2, B1 and B2, C1 and C2, D1 and D2, E1 and E2) were combined in a primerless reassembly PCR containing polymerase buffer, 200 μM of each dNTP and 0.6 U Phusion® high fidelity DNA polymerase (New England Biolabs, USA) in a total volume of 30 μL. Cycling conditions consisted of an initial hot start at 98°C for 2 min followed by 14 cycles of 98°C for 10 s and 72°C for 30 s before storage at 4°C until use. These 30 μL PCR mixtures were then supplemented with additional polymerase buffer, dNTP and Phusion® high fidelity DNA polymerase (New England Biolabs, USA) plus each of the two flanking primers to a final concentration of 0.5 μM in a final volume of 50 μL. The fragments were then amplified using the following conditions: an initial hot start at 98°C for 2 min; followed by 14 cycles of 98°C for 10 s and 72°C for 30 s; a polishing stage at 72°C for 10 min; then storage at 4°C until use.

Fragments A, B, C, D, E and F were combined in a stepwise manner. Equal quantities of fragments A and B were first combined in a primerless reassembly PCR followed by PCR with flanking primers as above. After gel purification, fragment AB was then combined with fragment C. The process was continued with the addition of fragments D, E and F in sequence, until the full-length open reading frame was obtained. The full-length mutant sequences were then subcloned into the NdeI and XbaI sites of the pCW/2C19 FL His/hNPR plasmid as set forth in FIG. 17.
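The stepwise logic (A + B gives AB, AB + C gives ABC, and so on) can be mimicked in silico by merging sequences that share an identical overlap at their ends. The sketch below is illustrative only: the toy fragment sequences and the function name `join_by_overlap` are hypothetical, and string merging is a simplified stand-in for the PCR-based reassembly, not a model of it.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 4) -> str:
    """Merge two sequences where the end of `left` matches the start of `right`."""
    longest = min(len(left), len(right))
    # Prefer the longest shared overlap; stop at the minimum permitted length.
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("no overlap of the required length found")

# Hypothetical toy fragments; each shares 4 bases with its neighbour.
fragments = ["ATGGCTCTGA", "CTGATTCCAG", "CCAGACCTGG", "CTGGCAATGT"]

intermediates = []          # products after each stepwise addition
product = fragments[0]
for fragment in fragments[1:]:
    product = join_by_overlap(product, fragment)
    intermediates.append(product)

full_length = product
print(full_length)
```

Keeping the intermediates mirrors the gel-purification checkpoints in the text, where each joined product is verified before the next fragment is added.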

EXAMPLE 10. Expression and thermostability screening of the CYP3_N1 first generation library

Library variants were expressed as fusion proteins with EYFP from a bicistronic expression vector that allowed co-expression of hCPR. This vector was chosen to enable facile screening to eliminate mutants containing frame-shift mutations or premature stop codons in the subsequent library. The general protocol for P450 expression described above was followed but modified for high throughput format. Each CYP3 library variant (bicistronic, monocistronic and EYFP fusion in bicistronic format with hCPR) was grown in duplicate from two independent colonies. All media were as described above but starter cultures were set up in 96-well plates, inoculated from single colonies and incubated overnight at 37°C, with shaking at 400 rpm in a 5 mm orbit microplate shaker. Expression cultures were set up in 24-well plates by inoculating 20 μL starter cultures into 1 mL of TB expression medium. Cultures were incubated at 25°C with shaking at 350 rpm for an initial 5 h, then recombinant protein expression was induced by adding arabinose (4 mg/mL), δ-aminolevulinic acid (0.5 mM) and isopropyl-β-thiogalactopyranoside (IPTG) (1 mM) as above. The plates were incubated at 25°C with shaking at 350 rpm for a further 43 h. In some cases, plates were sealed with BreathEasy membranes (Diversified Biotech, Boston, USA) to restrict oxygen availability (microaerobic conditions) during P450 expression.

Cultures were harvested by centrifugation at 2000 g for 10 min then resuspended in WCAB. The resuspended enzyme solution was heated to either 72°C or 73°C for 10 min, cooled at 4°C for 5 min and equilibrated at RT for 5 min. The residual folded protein was then quantified by Fe(II).CO versus Fe(II) difference spectroscopy (Johnston et al. 2008). Variants which showed 50-60% residual folded protein after heating at 73°C or 60-70% after heating at 72°C were also analysed at 74°C. Finally, 39 mutants were selected for determination of ¹⁰T₅₀ values (the temperature at which 50% of the protein remains folded after being heated for 10 min) as described above. Of these, 26 variants (comprising amino acid sequences set forth in SEQ ID NOS:3-28) showed the best thermostability in initial tests at 72-74°C and the remaining 13 variants (comprising amino acid sequences set forth in SEQ ID NOS:29-41) were chosen to reflect mutants with lower thermostability.
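The screening rule and the 50% threshold can be sketched in a few lines. The function names and the residual-folding data below are hypothetical, the bounds of the 50-60% and 60-70% windows are assumed to be inclusive, and linear interpolation between the two bracketing temperatures is a simple stand-in for whatever curve-fitting procedure was actually used.

```python
def needs_74c_followup(residual_72=None, residual_73=None):
    """True if a variant falls in either residual-folding window given in the text.

    Arguments are percent residual folded protein after 10 min at 72 or 73 degrees C;
    pass None for a temperature that was not tested. Inclusive bounds are an assumption.
    """
    if residual_73 is not None and 50.0 <= residual_73 <= 60.0:
        return True
    if residual_72 is not None and 60.0 <= residual_72 <= 70.0:
        return True
    return False

def t50_by_interpolation(temps_c, residual_pct, threshold=50.0):
    """Temperature at which residual folded protein crosses the threshold."""
    points = sorted(zip(temps_c, residual_pct))
    for (t_lo, r_lo), (t_hi, r_hi) in zip(points, points[1:]):
        # Look for the first pair of temperatures that brackets the threshold.
        if r_lo >= threshold >= r_hi and r_lo != r_hi:
            return t_lo + (r_lo - threshold) * (t_hi - t_lo) / (r_lo - r_hi)
    raise ValueError("residual folding does not cross the threshold")

# Hypothetical melting data for one variant (percent folded after 10 min).
example_t50 = t50_by_interpolation([66, 70, 72, 74, 78], [98, 90, 70, 30, 5])
print(example_t50)
```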

      3_N1    11.7 ± 1.4

75    3A4     0.1
      3A5     0.1
      3A27    n.a.
      3A37    0.1
      3_N1    1.3

Table 2. The estimated half-life of extant and ancestor CYP3s.

Variants    Ea (kJ/mol)
3A4         91
3A5         116
3A27        151
3A37        116
3_N1        251

Table 3. Thermal deactivation energy (Ea) of extant and ancestor CYP3s.
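For readers unfamiliar with how a half-life and a deactivation energy relate, the following sketch assumes first-order thermal deactivation (rate constant k = ln 2 / half-life) and Arrhenius behaviour (k = A·exp(-Ea/RT)). Both assumptions, the function names and the example numbers are illustrative; the specification's own fitting procedure is described elsewhere and may differ.

```python
import math

R_J_PER_MOL_K = 8.314  # gas constant

def rate_constant_from_half_life(t_half):
    """First-order deactivation rate constant, in reciprocal units of t_half."""
    return math.log(2) / t_half

def activation_energy_kj_per_mol(t1_c, k1, t2_c, k2):
    """Ea from rate constants at two temperatures (two-point Arrhenius estimate)."""
    t1 = t1_c + 273.15
    t2 = t2_c + 273.15
    # ln(k2/k1) = (Ea/R) * (1/T1 - 1/T2)
    return R_J_PER_MOL_K * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2) / 1000.0

# Hypothetical half-lives (min) at two temperatures, not taken from Table 2.
k_60 = rate_constant_from_half_life(10.0)
k_65 = rate_constant_from_half_life(2.5)
print(activation_energy_kj_per_mol(60.0, k_60, 65.0, k_65))
```

In practice Ea would be taken from the slope of ln k against 1/T over several temperatures; the two-point form is used here only to keep the example short.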

REFERENCES

Alix, B., Boubacar, D.A. & Vladimir, M. T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucl. Acids Res. 40, W573-W579 (2012).

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J Mol Biol 215, 403-410 (1990).

Ashkenazy, H. et al. FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids Res 40, W580-584 (2012).

Bauer, S. & Shiloach, J. Maximal exponential growth rate and yield of E. coli obtainable in a bench-scale fermentor. Biotechnol. Bioeng. 16, 933-941 (1974).

Behrendorff, J.B.Y.H. in School of Chemistry and Molecular Biosciences, Vol. Ph.D. (The University of Queensland, 2011).

Chang, T.K. & Waxman, D.J. Enzymatic Analysis of cDNA-Expressed Human CYP1A1, CYP1A2, and CYP1B1 With 7-Ethoxyresorufin as Substrate. in Cytochrome P450 Protocols (eds. I.R. Phillips & E.A. Shephard) 85-90 (Springer, Humana Press, Totowa, N.J.; 2006).

Gadagkar, S.R. & Kumar, S. Maximum likelihood outperforms maximum parsimony even when evolutionary rates are heterotachous. Mol Biol Evol 22, 2139-2141 (2005).

Gascuel, O. BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14, 685-695 (1997).

Gillam, E.M.J., Baba, T., Kim, B.-R., Ohmori, S. & Guengerich, F.P. Expression of modified human cytochrome P450 3A4 in Escherichia coli and purification and reconstitution of the enzyme. Arch. Biochem. Biophys. 305, 123-131 (1993).

Gillam, E.M.J., Guo, Z.Y., Martin, M.V., Jenkins, C.M. & Guengerich, F.P. Expression of cytochrome P450 2D6 in Escherichia coli: Purification, and spectral and catalytic characterization. Arch. Biochem. Biophys. 319, 540-550 (1995).

Gillam, E.M.J. et al. Expression of cytochrome P450 3A5 in Escherichia coli: Effects of 5' modification, purification, spectral characterization, reconstitution conditions, and catalytic activities. Arch. Biochem. Biophys. 317, 374-384 (1995).

Goodman, D.B., Church, G.M. & Kosuri, S. Causes and Effects of N-Terminal Codon Bias in Bacterial Genes. Science 342, 475-479 (2013).

Guengerich, F.P. in Principles and Methods of Toxicology, Edn. 3. (ed. A.W. Hayes) 1259-1313 (Raven Press, Ltd., New York; 1994).

Hunter, D.J.B. et al. Facile production of minor metabolites for drug development using a CYP3A shuffled library. Metab. Eng. 13, 682-693 (2011).

Isin, E.M. & Guengerich, F.P. Kinetics and thermodynamics of ligand binding by cytochrome P450 3A4. J. Biol. Chem. 281, 9127-9136 (2006).

Isin, E.M. & Guengerich, F.P. Substrate binding to cytochromes P450. Analytical and Bioanalytical Chemistry 392, 1019-1030 (2008).

Jones, D.T., Taylor, W.R. & Thornton, J.M. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8, 275-282 (1992).

Johnston, W.A., Huang, W., De Voss, J.J., Hayes, M.A. & Gillam, E.M.J. A shuffled CYP1A library shows both structural integrity and functional diversity. Drug Metab. Dispos. 35, 2177-2185 (2007).

Johnston, W.A., Huang, W., Hayes, M.A., De Voss, J.J. & Gillam, E.M.J. Quantitative whole cell cytochrome P450 measurement suitable for high throughput application. J. Biomol. Screen. 13, 135-141 (2008).

Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl. Acids Res. 30, 3059-3066 (2002).

Muchmore, D.C., McIntosh, L.P., Russell, C.B., Anderson, D.E. & Dahlquist, F.W. Expression and N-15 labeling of proteins for proton and N-15 nuclear magnetic resonance. Methods Enzymol. 177, 44-73 (1989).

Notley, L.M., de Wolf, C.J.F., Wunsch, R.M., Lancaster, R.G. & Gillam, E.M.J. Bioactivation of tamoxifen by recombinant human cytochrome P450 enzymes. Chem. Res. Toxicol. 15, 614-622 (2002).

Pupko, T., Pe'er, I., Shamir, R. & Graur, D. A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol Biol Evol 17, 890-896 (2000).

Rawal, S., Yip, S.S.M. & Coulombe, R.A. Cloning, expression and functional characterization of cytochrome P450 3A37 from turkey liver with high aflatoxin B1 epoxidation activity. Chem. Res. Toxicol. 23, 1322-1329 (2010).

Rowland, P. et al. Crystal structure of human cytochrome P450 2D6. J. Biol. Chem. 281, 7614-7622 (2006).

Sambrook, J., Fritsch, E.F. & Maniatis, T. Molecular cloning: a laboratory manual, Edn. 2nd. (Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; 1989).

Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Molecular Biology and Evolution 30, 2725-2729 (2013).

von Wachenfeldt, C., Richardson, T.H., Cosme, J. & Johnson, E.F. Microsomal P450 2C3 is expressed as a soluble dimer in Escherichia coli following modifications of its N-terminus. Arch. Biochem. Biophys. 339, 107-114 (1997).

Williams, E.M., Copp, J.N. & Ackerley, D.F. in Directed Evolution Library Creation, Vol. 1179. (eds. E.M.J. Gillam, J.N. Copp & D.F. Ackerley) 83-101 (Humana Press, Totowa, N.J.; 2014).

Zadeh, J.N. et al. NUPACK: Analysis and design of nucleic acid systems. Journal of Computational Chemistry 32, 170-173 (2011).

Throughout the specification, the aim has been to describe the preferred embodiments of the invention without limiting the invention to any one embodiment or specific collection of features. Various changes and modifications may be made to the embodiments described and illustrated without departing from the present invention.

The disclosure of each patent and scientific document, computer program and algorithm referred to in this specification is incorporated by reference in its entirety.