

Title:
METHODS FOR SELECTIVELY MODIFYING AMINO ACIDS AND PRODUCTS MADE THEREBY
Document Type and Number:
WIPO Patent Application WO/2021/007127
Kind Code:
A1
Abstract:
Disclosed herein are methods for the selective substitution of a hydrogen bonded to a carbon atom (e.g., a hydrogen of an aliphatic methylene group) of a compound, which comprise contacting the compound with a substituent in the presence of a BesD halogenase.

Inventors:
CHANG MICHELLE (US)
NEUGEBAUER MONICA E (US)
BENMAMAN JORGE MARCHAND (US)
Application Number:
PCT/US2020/040821
Publication Date:
January 14, 2021
Filing Date:
July 03, 2020
Assignee:
UNIV CALIFORNIA (US)
International Classes:
C12N9/02; C07K1/107; C07K14/36; C12P13/06; C12P13/08; C12P13/10; C12P21/00
Domestic Patent References:
WO2018200592A1 (2018-11-01)
Other References:
MARCHAND, JA ET AL.: "Discovery of a pathway for terminal-alkyne amino acid biosynthesis", NATURE, vol. 567, no. 7748, 13 March 2019 (2019-03-13), pages 420 - 424, XP036746296, DOI: 10.1038/s41586-019-1020-y
HUTCHINSON ROBIN I., GRANT RUSSELL J., MURPHY CORMAC D.: "Biosynthetic Origin of [R-(Z)]-4-Amino-3-chloro-2-pentenedioic Acid in Streptomyces viridogenes", BIOSCIENCE, BIOTECHNOLOGY, AND BIOCHEMISTRY, vol. 70, no. 12, 7 December 2006 (2006-12-07), pages 3046 - 3049, XP055782797, DOI: 10.1271/bbb.60372
YEH ELLEN, GARNEAU SYLVIE, WALSH CHRISTOPHER T: "Robust in vitro activity of RebF and RebH, a two-component reductase/halogenase, generating 7-chlorotryptophan during rebeccamycin biosynthesis", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE U.S.A., vol. 102, no. 11, 15 March 2005 (2005-03-15), pages 3960 - 3965, XP055782799, DOI: 10.1073/pnas.0500755102
NEUGEBAUER, ME ET AL.: "A family of radical halogenases for the engineering of amino-acid-based products", NATURE CHEMICAL BIOLOGY, vol. 15, no. 10, October 2019 (2019-10-01), pages 1009 - 1016, XP036888690, [retrieved on 20190923], DOI: 10.1038/s41589-019-0355-x
Attorney, Agent or Firm:
SUNDBY, Suzannah K. (US)
Claims:
What is claimed is:

1. A method of selective substitution of a hydrogen bonded to a carbon atom of a compound such as an amino acid, which comprises contacting the compound with a substituent in the presence of a BesD halogenase to result in a modified compound such as a modified amino acid, wherein the BesD halogenase is not SEQ ID NO: 1 and/or the substituent is not a chloride ion where the amino acid is lysine.

2. The method according to claim 1, wherein the selective substitution is regioselective and/or stereoselective.

3. The method according to claim 1 or claim 2, wherein the substituent is a halide ion (e.g., F-, Cl-, Br-, or I-) or an azide.

4. The method according to any one of claims 1 to 3, wherein the BesD halogenase has at least 56% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, or SEQ ID NO: 22.

5. The method according to any one of claims 1 to 4, wherein (a) a hydrogen of the C4 carbon of the amino acid is selectively substituted, or (b) a hydrogen of the C5 carbon of the amino acid is selectively substituted.

6. The method according to any one of claims 1 to 5, wherein both hydrogens bonded to the carbon atom are selectively substituted.

7. The method according to any one of claims 1 to 5, wherein (a) the modified amino acid is an R-substituted stereoisomer, e.g., R-halo stereoisomer, or (b) the modified amino acid is an S-substituted stereoisomer, e.g., S-halo stereoisomer.

8. The method according to claim 5, wherein (a) a hydrogen of the C4 carbon of the amino acid is selectively substituted, and the BesD halogenase has at least 56% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 18, or SEQ ID NO: 19.

9. The method according to claim 5, wherein (b) a hydrogen of the C5 carbon of the amino acid is selectively substituted, and the BesD halogenase has at least 56% sequence identity to SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, or SEQ ID NO: 20.

10. The method according to claim 6, wherein the BesD halogenase has at least 56% sequence identity to SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, or SEQ ID NO: 20.

11. The method according to claim 7, wherein (a) the modified amino acid is an R-substituted stereoisomer, and the BesD halogenase has at least 56% sequence identity to SEQ ID NO: 1.

12. The method according to any one of claims 1 to 11, wherein the amino acid is lysine, ornithine, leucine, isoleucine, norleucine, or norvaline.

13. The method according to any one of claims 1 to 12, wherein the modified amino acid is halogenated, azidated, hydroxylated, or nitrated.

14. The method according to any one of claims 1 to 13, which further comprises converting the modified amino acid into an alkyl ester, a keto acid, a heterocycle, or a diamine compound.

15. The method according to any one of claims 1 to 14, wherein (a) the method further comprises covalently attaching the modified amino acid, the alkyl ester, the keto acid, the heterocycle, or the diamine compound to a second amino acid, or (b) the amino acid is covalently attached to a second amino acid.

16. The method according to claim 15, wherein the second amino acid has been modified by the method according to any one of claims 1 to 14.

17. A modified amino acid made by the method according to any one of claims 1 to 16.

18. A protein comprising a modified amino acid made by the method according to any one of claims 1 to 16.

19. An amino acid selected from the group consisting of: 4,4-dichlorolysine, 5,5-dichlorolysine, 5,5-dichloronorleucine, 5-Cl-lysine, 4,4-dibromolysine, 4-Br-isoleucine, 4-Br-leucine, 4-Br-lysine, 4-Br-ornithine, 5,5-dibromolysine, 5,5-dibromonorleucine, 5-Br-lysine, 4,4-difluorolysine, 4-F-isoleucine, 4-F-leucine, 4-F-lysine, 4-F-ornithine, 5,5-difluorolysine, 5,5-difluoronorleucine, 5-F-lysine, 4,4-diiodolysine, 4-I-isoleucine, 4-I-leucine, 4-I-lysine, 4-I-ornithine, 5,5-diiodolysine, 5,5-diiodonorleucine, 5-I-lysine, 4-azido-isoleucine, 4-azido-leucine, 4-azido-lysine, 4-azido-ornithine, 5-azido-lysine, and alkyl esters, keto acids, heterocycles, and diamines thereof.

Description:

[0001] CROSS-REFERENCE TO RELATED APPLICATIONS

[0002] This application claims the benefit of U.S. Patent Application No. 62/871,111, filed July 6, 2019, which is herein incorporated by reference in its entirety.

[0003] REFERENCE TO A SEQUENCE LISTING SUBMITTED VIA EFS-WEB

[0004] The content of the ASCII text file of the sequence listing named “20200703_034044_207WO1_ST25”, which is 85.7 kb in size, was created on July 3, 2020, and was electronically submitted via EFS-Web herewith, is incorporated herein by reference in its entirety.

[0005] ACKNOWLEDGEMENT OF GOVERNMENT SUPPORT

[0006] This invention was made with Government support under 1710588 awarded by the National Science Foundation. The Government has certain rights in the invention.

[0007] BACKGROUND OF THE INVENTION

[0008] 1. FIELD OF THE INVENTION

[0009] The field generally relates to methods for post-translational modifications to amino acids and proteins.

[0010] 2. DESCRIPTION OF THE RELATED ART

[0011] The integration of synthetic and biological catalysis enables new approaches to the synthesis of small-molecule targets by combining the high selectivity of enzymes with the reaction diversity offered by synthetic chemistry. While organohalogens are valued for their bioactivity and their utility as synthetic building blocks, only a handful of enzymes that can carry out the regioselective functionalization of unactivated C(sp³)–H bonds have previously been identified.

[0012] The expansion of chemical diversity drives the discovery of small molecules and macromolecules with new functions. In this regard, both synthetic and cellular chemistry provide access to an extremely broad range of compounds, but the structural spaces occupied by molecules made by humans and by those made by Nature are often orthogonal. While living systems use the unparalleled selectivity of enzymes to construct molecules from a limited set of functional groups, synthetic methods instead draw on an extensive range of strategies for bond formation. As such, approaches that bridge synthetic and cellular chemistry can provide access to novel structures and classes of compounds by combining the exquisite selectivity of enzymes with the breadth of functional groups used in synthetic transformations. The introduction of halogens (X = F, Cl, Br, I) into a functional-group-dense scaffold is especially useful, both to tune bioactivity and to provide a reactive handle for forming new chemical bonds during the diversification and modification of late-stage synthetic intermediates. While the role of halogens in biosynthetic transformations continues to be elucidated, their value in synthetic transformations and in the design of synthetic routes is well accepted. As a result, both synthetic and enzymatic approaches to forming carbon–halogen bonds are currently an area of major interest.

[0013] The selective modification of unactivated sp³ C–H bonds is particularly difficult, because it is challenging to identify catalysts that are powerful enough to activate these bonds while maintaining regio- and stereoselectivity. Remarkably, Nature has evolved a set of non-heme Fe(II)/α-ketoglutarate (Fe(II)/αKG)-dependent enzymes that can achieve this task, using a high-valent metal-oxo intermediate to generate a substrate radical that can rebound with the bound halide ligand. However, only a handful of these radical halogenases have been characterized, and they are notable for their complex substrates. The SyrB2 family has a strict requirement for carrier-protein-tethered substrates, whereas the WelO5 family halogenates late-stage indole alkaloid natural products. Because these types of intermediates are not readily modified by downstream enzymatic pathways, the discovery of new enzymes that act on simple, modular building blocks would greatly expand the biosynthetic potential of radical halogenation. We recently discovered a radical halogenase, BesD, that chlorinates the free amino acid lysine, making it the first αKG-dependent radical halogenase reported to chlorinate an amino acid without the requirement for a carrier protein.

[0014] SUMMARY OF THE INVENTION

[0015] In some embodiments, the present invention provides a method for selective substitution of a hydrogen bonded to a carbon atom (e.g., a hydrogen of an aliphatic methylene group) of a compound. Such compounds include amino acids (e.g., lysine, ornithine, leucine, isoleucine, norleucine, norvaline, proline), 6-aminohexanoic acid, diamines, alpha-keto acids, pipecolates, piperazines, alkyl esters, keto acids,
heterocycles, cyclic amines, and derivatives thereof. The method comprises contacting the compound, e.g., an amino acid, with a substituent in the presence of a BesD halogenase to result in a modified compound. In some embodiments, the BesD halogenase is not SEQ ID NO: 1 and/or the substituent is not a chloride ion where the compound is lysine. In some embodiments, the substituent is a halide ion (e.g., F-, Cl-, Br-, or I-), and the BesD halogenase is not SEQ ID NO: 1 when the halide ion is a chloride ion and the compound is lysine. In some embodiments, the substituent is an azide group, a hydroxyl group, or a nitro group. In some embodiments, the substituent is derived from an anion. In some embodiments, the substituent is an acid. In some embodiments, the substituent is an alkane. In some embodiments, the substituent is a diamine. In some embodiments, the substituent is or contains an isotope or has a detectable label. In some embodiments, the BesD halogenase has at least 56% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, or SEQ ID NO: 22. In some embodiments, the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, or SEQ ID NO: 22. 
In some embodiments, the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 21, or SEQ ID NO: 22. In some embodiments, a hydrogen of the C4 carbon of the compound (e.g., amino acid) is selectively substituted. In some
embodiments, a hydrogen of the C4 carbon of the compound (e.g., amino acid) is selectively substituted and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 18, or SEQ ID NO: 19. In some embodiments, a hydrogen of the C4 carbon of the compound (e.g., amino acid) is selectively substituted and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 10. In some embodiments, a hydrogen of the C5 carbon of the compound (e.g., amino acid) is selectively substituted. In some embodiments, a hydrogen of the C5 carbon of the compound (e.g., amino acid) is selectively substituted and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, or SEQ ID NO: 20.
In some embodiments, a hydrogen of the C5 carbon of the compound (e.g., amino acid) is selectively substituted and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 14, SEQ ID NO: 16, or SEQ ID NO: 20. In some embodiments, both hydrogens bonded to the carbon atom are selectively substituted. In some embodiments, both hydrogens bonded to the carbon atom are selectively substituted and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, or SEQ ID NO: 20. In some embodiments, both hydrogens bonded to the carbon atom are selectively substituted and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, or SEQ ID NO: 20. In some embodiments, the modified compound (e.g., amino acid) is an R-substituted
stereoisomer, e.g., R-halo stereoisomer. In some embodiments, the modified compound (e.g., amino acid) is an R-substituted stereoisomer, e.g., R-halo stereoisomer and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 1. In some embodiments, the modified compound (e.g., amino acid) is an S-substituted stereoisomer, e.g., S-halo stereoisomer. In some embodiments, the compound is leucine or isoleucine and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 9 or SEQ ID NO: 17. In some embodiments, the compound is leucine, isoleucine, or norleucine and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 9 or SEQ ID NO: 17. In some embodiments, the compound is lysine and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 19, or SEQ ID NO: 20. 
In some embodiments, the compound is ornithine and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 13, SEQ ID NO: 15, or SEQ ID NO: 18. In some embodiments, the compound is norleucine and the BesD halogenase has 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 9 or SEQ ID NO: 17. In some embodiments, the BesD halogenase is not SEQ ID NO: 2, SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 15, SEQ ID NO: 18, or SEQ ID NO: 19. In some embodiments, the compound is lysine, ornithine, leucine, isoleucine, norleucine, or norvaline. In some embodiments, the compound is an amino acid which is covalently attached to a second amino acid. In some embodiments, the method further comprises converting the modified compound (e.g., modified amino acid) into an alkyl ester, a keto acid, a heterocycle, or a diamine compound. In some embodiments, the method further comprises covalently attaching the modified amino acid, the alkyl ester, the keto acid, the heterocycle, or the diamine compound to a second amino acid. In some embodiments, the amino acid and/or the second amino acid is a non-canonical amino acid. In some embodiments, the second amino acid has been modified using the methods as described herein. In some embodiments, the modified compound (e.g., modified amino acid) is halogenated or azidated. In some embodiments, the modified compound (e.g., modified amino acid) is hydroxylated or nitrated. In some embodiments, the BesD halogenase is provided in the form of a lysate. 
In some embodiments, the BesD halogenase is provided in the form of a composition consisting essentially of the BesD halogenase. A composition “consisting essentially of” a BesD halogenase means that the composition may comprise other ingredients so long as the other ingredients do not significantly impact the activity of the BesD halogenase to cause selective substitution of a hydrogen bonded to a carbon atom of an amino acid, as compared to its activity in the absence of the other ingredients.
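The embodiment lists in paragraph [0015] are easier to cross-check as sets. The following Python sketch transcribes them into a small index so that, for a given reference SEQ ID NO, one can see which embodiments recite it. The labels and the function name `embodiments_for` are illustrative assumptions, not claim language; membership means only that the summary recites that reference sequence (at 56% to 100% identity) for the embodiment, not that activity was measured for it.

```python
from typing import Dict, List, Set

# Reference SEQ ID NOs recited in paragraph [0015] for each embodiment.
# Transcribed from the summary; the labels are illustrative only.
EMBODIMENTS: Dict[str, Set[int]] = {
    "C4 substitution": {1, 5, 7, 9, 10, 13, 15, 17, 18, 19},
    "C5 substitution": {2, 3, 4, 6, 8, 11, 12, 14, 16, 20},
    "both hydrogens substituted": {2, 3, 4, 6, 7, 8, 9, 11, 12, 14, 16, 17, 20},
    "R-substituted stereoisomer": {1},
    "substrate: lysine": {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 16, 19, 20},
    "substrate: ornithine": {13, 15, 18},
    "substrate: leucine, isoleucine, or norleucine": {9, 17},
}


def embodiments_for(seq_id: int) -> List[str]:
    """Return the embodiment labels whose recited reference set includes seq_id."""
    if not 1 <= seq_id <= 22:
        raise ValueError("the summary recites SEQ ID NOs 1-22 for BesD halogenases")
    return [label for label, ids in EMBODIMENTS.items() if seq_id in ids]


if __name__ == "__main__":
    for sid in (1, 7, 13):
        print(sid, embodiments_for(sid))
```

For example, `embodiments_for(7)` returns the C4, both-hydrogens, and lysine entries, which makes it easy to see that SEQ ID NO: 7 (like SEQ ID NO: 9 and SEQ ID NO: 17) is recited both for C4 substitution and for substitution of both hydrogens.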

[0016] In some embodiments, the present invention is directed to a modified amino acid made by selective substitution as described herein. In some embodiments, the modified amino acid is di-halogenated. In some embodiments, the modified amino acid is a 4,4-dihalo amino acid or a 5,5-dihalo amino acid. In some embodiments, the modified amino acid is 4,4-dichlorolysine, 5,5-dichlorolysine, 5,5-dichloronorleucine, 5-Cl-lysine, 4,4-dibromolysine, 4-Br-isoleucine, 4-Br-leucine, 4-Br-lysine, 4-Br-ornithine, 5,5-dibromolysine, 5,5-dibromonorleucine, 5-Br-lysine, 4,4-difluorolysine, 4-F-isoleucine, 4-F-leucine, 4-F-lysine, 4-F-ornithine, 5,5-difluorolysine, 5,5-difluoronorleucine, 5-F-lysine, 4,4-diiodolysine, 4-I-isoleucine, 4-I-leucine, 4-I-lysine, 4-I-ornithine, 5,5-diiodolysine, 5,5-diiodonorleucine, 5-I-lysine, 4-azido-isoleucine, 4-azido-leucine, 4-azido-lysine, 4-azido-ornithine, or 5-azido-lysine.

[0017] In some embodiments, the present invention is directed to a di-halogenated amino acid. In some embodiments, the present invention is directed to a 4,4-dihalo amino acid or a 5,5-dihalo amino acid. In some embodiments, the present invention is directed to 4,4-dichlorolysine, 5,5-dichlorolysine, 5,5-dichloronorleucine, 5-Cl-lysine, Cl-norvaline, 4,4-dibromolysine, 4-Br-isoleucine, 4-Br-leucine, 4-Br-lysine, 4-Br-ornithine, 5,5-dibromolysine, 5,5-dibromonorleucine, 5-Br-lysine, 4,4-difluorolysine, 4-F-isoleucine, 4-F-leucine, 4-F-lysine, 4-F-ornithine, 5,5-difluorolysine, 5,5-difluoronorleucine, 5-F-lysine, 4,4-diiodolysine, 4-I-isoleucine, 4-I-leucine, 4-I-lysine, 4-I-ornithine, 5,5-diiodolysine, 5,5-diiodonorleucine, 5-I-lysine, 4-azido-isoleucine, 4-azido-leucine, 4-azido-lysine, 4-azido-ornithine, 5-azido-lysine, and alkyl esters, keto acids, heterocycles (including proline, pipecolate, and piperazine derivatives), and diamines thereof.

[0018] In some embodiments, the present invention is directed to a protein comprising a modified amino acid made by selective substitution as described herein. In some embodiments, the present invention is directed to a protein comprising one or more of the following: 4,4-dichlorolysine, 5,5-dichlorolysine, 5,5-dichloronorleucine, 5-Cl-lysine, Cl-norvaline, 4,4-dibromolysine, 4-Br-isoleucine, 4-Br-leucine, 4-Br-lysine, 4-Br-ornithine, 5,5-dibromolysine, 5,5-dibromonorleucine, 5-Br-lysine, 4,4-difluorolysine, 4-F-isoleucine, 4-F-leucine, 4-F-lysine, 4-F-ornithine, 5,5-difluorolysine, 5,5-difluoronorleucine, 5-F-lysine, 4,4-diiodolysine, 4-I-isoleucine, 4-I-leucine, 4-I-lysine, 4-I-ornithine, 5,5-diiodolysine, 5,5-diiodonorleucine, 5-I-lysine, 4-azido-isoleucine, 4-azido-leucine, 4-azido-lysine, 4-azido-ornithine, and 5-azido-lysine. In some embodiments, the present invention is directed to a protein comprising one or more alkyl esters, keto acids, heterocycles, and/or diamines of one or more of the following: 4,4-dichlorolysine, 5,5-dichlorolysine, 5,5-dichloronorleucine, 5-Cl-lysine, Cl-norvaline, 4,4-dibromolysine, 4-Br-isoleucine, 4-Br-leucine, 4-Br-lysine, 4-Br-ornithine, 5,5-dibromolysine, 5,5-dibromonorleucine, 5-Br-lysine, 4,4-difluorolysine, 4-F-isoleucine, 4-F-leucine, 4-F-lysine, 4-F-ornithine, 5,5-difluorolysine, 5,5-difluoronorleucine, 5-F-lysine, 4,4-diiodolysine, 4-I-isoleucine, 4-I-leucine, 4-I-lysine, 4-I-ornithine, 5,5-diiodolysine, 5,5-diiodonorleucine, 5-I-lysine, 4-azido-isoleucine, 4-azido-leucine, 4-azido-lysine, 4-azido-ornithine, and 5-azido-lysine.

[0019] Both the foregoing general description and the following detailed description are exemplary and explanatory only and are intended to provide further explanation of the invention as claimed. The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute part of this specification, illustrate several embodiments of the invention and, together with the description, explain the principles of the invention.

[0020] DESCRIPTION OF THE DRAWINGS

[0021] This invention is further understood by reference to the drawings, wherein:

[0022] Figure 1: Halogenase diversity expands the scope of accessible chlorinated small molecules. Amino acids serve as building blocks for the production of many classes of small molecules as well as ribosomally and nonribosomally synthesized peptides. Substituents can be introduced by modification of amino acids by BesD radical halogenases.

[0023] Figure 2-Figure 4: Sequence comparison of BesD, WelO5, and SyrB2. Figure 2: Structures of the substrates of BesD, WelO5, and SyrB2. BesD is the first halogenase reported to directly chlorinate a free amino acid. WelO5 halogenates indole alkaloids. SyrB2 halogenates amino acids that are tethered to carrier proteins. Figure 3: Sequence comparison (left) and RMSD (right) of BesD (PDB 6NIE), WelO5 (PDB 5IQT), and SyrB2 (PDB 2FCU). RMSD calculations were performed using methods in the art. Figure 4: An alignment of BesD to its nearest hydroxylase homolog demonstrates that BesD is closer in sequence (46% sequence identity) to its nearest putative hydroxylase neighbor (WP_107105619, S. sp. MBT76, SEQ ID NO: 90), which carries the characteristic HXD motif (marked with “XXX” above), than to the halogenases SyrB2 and WelO5. Alignment and sequence comparisons were performed using methods in the art. The top sequence is SEQ ID NO: 4, and the bottom sequence is SEQ ID NO: 90.

[0024] Figure 5: Data collection and refinement statistics for BesD structure (PDB ID 6NIE).

[0025] Figure 6: Schematic of the outcome of halogenation vs. hydroxylation. Radical halogenases are related in reaction mechanism to radical hydroxylases. In hydroxylases, the Fe is coordinated by an Asp or Glu residue, and rebound of the hydroxide with the substrate radical yields a hydroxylated product. In halogenases, the Asp or Glu is instead replaced with a Gly or Ala residue, leaving room for direct coordination of chloride to the Fe. In this case, rebound of either Cl or OH onto the substrate radical is possible, leading to halogenated or hydroxylated products, respectively. Halogenases have evolved to favor halogenation over the competing hydroxylation reaction.

[0026] Figure 7: Proposed mechanism of halogenation by BesD. The Fe is coordinated by His137, His204, chloride, and αKG in a distorted square pyramidal geometry (grey box). Based on the crystal structure in this study, the putative vacant site for oxygen binding appears to be trans to His137, on the opposite side of the His204-Cl-αKG plane from the substrate carbon that is targeted for hydrogen atom abstraction. Binding of substrate, followed by O₂, leads to decarboxylation of αKG via attack of the distal oxygen on C2 of αKG. Because the proposed binding site of O₂ is on the opposite side of the αKG-Cl-His204 plane from the substrate, hydrogen atom abstraction would be facilitated by shifting of the Fe(IV)-oxo toward the substrate. Finally, rebound of the suitably positioned chloride would yield the halogenated product. Note that the precise positions of the oxo species in the proposed intermediates are unknown.
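The two competing outcomes described for Figures 6 and 7 can be summarized as net reactions. This is a sketch assuming the standard overall stoichiometry of Fe(II)/αKG-dependent enzymes, with αKG and succinate written as dianions; R–H is a placeholder for the targeted substrate C–H bond, not notation taken from the figures.

```latex
\begin{aligned}
\text{halogenation:}\quad & \mathrm{R{-}H} + \mathrm{Cl^-} + \alpha\mathrm{KG}^{2-} + \mathrm{O_2}
  \longrightarrow \mathrm{R{-}Cl} + \mathrm{succinate}^{2-} + \mathrm{CO_2} + \mathrm{OH^-} \\
\text{hydroxylation:}\quad & \mathrm{R{-}H} + \alpha\mathrm{KG}^{2-} + \mathrm{O_2}
  \longrightarrow \mathrm{R{-}OH} + \mathrm{succinate}^{2-} + \mathrm{CO_2}
\end{aligned}
```

Both equations balance in atoms and charge. They share the oxidative decarboxylation of αKG to succinate and CO₂ and differ only in which ligand rebounds onto the substrate radical, which is the competition depicted in Figure 6.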

[0027] Figure 8: Alanine scan mutagenesis of key active site residues. LC/MS analysis of chlorolysine and hydroxylysine production by BesD and mutants. Following incubation of purified WT BesD from S. lavanduligriseus or the alanine mutants (N219A, T221A, R74A, D140A, W238A, H134A, W239A, and E120A) with lysine, Fe, NaCl, αKG, and ascorbate for 1 h, reactions were quenched with 2 vol of methanol + 1% formic acid, followed by centrifugation at 13,000 × g for 10 min to remove protein precipitates. Samples were analyzed by LC/MS (QTOF), and extracted ion counts for Cl-lysine (m/z = 181.0738) and hydroxylysine (m/z = 163.1077) were integrated for each sample. Data are mean ± s.d. (n = 3); Cl-lysine = left bar of each set, OH-lysine = right bar of each set. Hydroxylysine can arise enzymatically through direct hydroxylation as well as non-enzymatically through intramolecular cyclization of 4-Cl-lysine to the γ-lactone (1a)/ε-lactam (1b), followed by hydrolysis to 4-hydroxylysine (1c).

[0028] Figure 9-Figure 12: Alanine scan of active site residues. Figure 9: Lysine-binding residues of BesD. Figure 10: LC/MS analysis of lysine reaction products following incubation of WT or mutant SlBesD enzymes with Fe(II), αKG, NaCl, and lysine. Integrated extracted ion chromatograms for the formation of Cl-lysine (left bar of each set) and hydroxylysine (right bar of each set) by WT, H134A, N219A, and T221A SlBesD enzymes. R74A, D140A, E120A, W237A, and W238A yielded neither product (not shown). Data are averages ± s.d. (n = 3 experimental replicates). Figure 11: LOGOS plot showing the sequence conservation of active site residues within halogenase (top) and hydroxylase (bottom) homologs of BesD. BesD BLAST hits were collected and aligned using methods in the art. The key HXG/D site was identified and used to segregate the data set into halogenase and hydroxylase subsets: homologs of BesD (BLAST E-value cutoff of 10⁻⁵) were sorted based on the presence of the HXG (halogenase) or HXD (hydroxylase) motif. In both the halogenase and the hydroxylase data sets, a high level of conservation is observed at key substrate-, iron-, and αKG-binding positions. However, while Asn219 is highly conserved in the halogenases, more variability is observed at that position in the putative hydroxylases, where cysteine, serine, and even the nonpolar residues alanine and valine are found. This observation suggests that hydroxylases have fewer constraints on the amino acid at position 219, since replacement of the chloride ligand with Asp or Glu precludes any outcome other than hydroxylation. Figure 12: Binding site of αKG.
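The segregation of BesD homologs into halogenase and hydroxylase subsets by the HXG versus HXD/E motif can be sketched as follows. Assumptions: the homologs are already in a multiple sequence alignment; `his_col` is a hypothetical parameter giving the 0-based alignment column of the motif His; and the third motif position is read as the second residue after that His once gap characters are skipped. Gly/Ala are labeled halogenase and Asp/Glu hydroxylase, following the residue roles described for Figure 6. The function name is illustrative and is not the authors' pipeline.

```python
from typing import Dict

HALOGENASE_RESIDUES = {"G", "A"}   # HXG/HXA: leaves room for chloride on the Fe
HYDROXYLASE_RESIDUES = {"D", "E"}  # HXD/HXE: carboxylate occupies that site


def classify_by_motif(alignment: Dict[str, str], his_col: int) -> Dict[str, str]:
    """Label each aligned homolog as 'halogenase', 'hydroxylase', or 'unclassified'."""
    labels: Dict[str, str] = {}
    for name, row in alignment.items():
        if his_col >= len(row) or row[his_col] != "H":
            labels[name] = "unclassified"
            continue
        # Residues after the His with gaps removed; the motif's third
        # position is the second of these.
        downstream = row[his_col + 1:].replace("-", "")
        third = downstream[1] if len(downstream) >= 2 else ""
        if third in HALOGENASE_RESIDUES:
            labels[name] = "halogenase"
        elif third in HYDROXYLASE_RESIDUES:
            labels[name] = "hydroxylase"
        else:
            labels[name] = "unclassified"
    return labels


if __name__ == "__main__":
    toy = {"hal": "AAHVGK", "hyd": "AAHIDK", "gapped": "AAH-VA", "no_his": "AAQVGK"}
    print(classify_by_motif(toy, his_col=2))
```

Using sets for the residue classes (rather than strings such as `"GA"`) avoids the pitfall that an empty string would count as a substring match.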

[0029] Figure 13-Figure 15: Maximum-likelihood phylogeny of BesD homologs. Phylogenetic tree of BesD homologs from the NCBI non-redundant protein database, constructed using methods in the art. The protein clade containing the alkyne biosynthetic genes is indicated with brackets. Proteins tested in this study are indicated in red. Bootstrap values are indicated at branch points. Note that the tree does not contain sequences of hydroxylases, which were filtered out based on the presence of the HXD/E motif. Note that the middle branch is extended to fit the page, and the dotted line is not to scale.

[0030] Figure 16: Summary of the halogenases exemplified herein.

[0031] Figure 17-Figure 21: Amino acid halogenase diversity. Figure 17: BesD homologs and products of enzymatic amino acid halogenation. Sequence similarity network of BesD homologs generated using the Enzyme Function Initiative’s Sequence Similarity Network tools. Homologs from NCBI’s non-redundant protein database were identified using BLAST with an E-value cutoff of 1e-5. Using Cytoscape version 3.6.1, the network was adjusted by deleting edges with low alignment scores until the halogenases identified for β-ethynylserine (alkyne) biosynthesis were separated into an isofunctional cluster, which occurred at an alignment score value of 88 (corresponding to a sequence identity of about 56%). Halogenases from each cluster were cloned, expressed, purified, and tested for activity on a panel of amino acid substrates. Note that the enzymes tested from Cluster H displayed no activity on amino acids.
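Deleting low-scoring edges until clusters separate is equivalent to keeping only the edges at or above a score threshold and reading off the connected components of what remains. A minimal Python sketch of that step, using a union-find structure (our choice for illustration, not a description of how Cytoscape or the EFI tools work internally); the node names and edge scores are hypothetical, and only the threshold of 88 comes from the text:

```python
def isofunctional_clusters(nodes, edges, min_score):
    """Split a sequence similarity network into clusters at an alignment-score threshold.

    nodes: iterable of sequence identifiers.
    edges: iterable of (node_a, node_b, alignment_score) tuples.
    Edges scoring below min_score are dropped; the connected components
    of the remaining graph are returned as a list of sets.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = {}
    for n in parent:
        clusters.setdefault(find(n), set()).add(n)
    # Deterministic order: by each cluster's alphabetically first member.
    return sorted(clusters.values(), key=min)


# Hypothetical identifiers and scores, for illustration only.
names = ["BesD", "HalA", "HalB_Sw", "HalB_Si"]
scored_edges = [("BesD", "HalA", 95), ("BesD", "HalB_Sw", 70), ("HalB_Sw", "HalB_Si", 120)]
for cluster in isofunctional_clusters(names, scored_edges, min_score=88):
    print(sorted(cluster))
```

Lowering the threshold merges clusters (at 60, all four toy sequences join one component), which is why the cutoff was tuned until the known alkyne-pathway halogenases formed their own cluster.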

Figure 18: Regioselective mono-halogenation is observed in the formation of 4-Cl-lysine (1, m/z = 181.0738) or 5-Cl-lysine (2, m/z = 181.0738) by BesD or SiHalB, respectively. Figure 19: Regioselective di-halogenation is also observed in the formation of 5,5-dichlorolysine (3, m/z = 215.0349) or 4,4-dichlorolysine (4, m/z = 215.0349) by SwHalB or LaHalC, respectively. Figure 20: Ornithine is a substrate for both PkHalD and BesD (4-Cl-ornithine, 5, m/z = 167.0582). Figure 21: The nonpolar amino acids leucine, isoleucine, and norleucine are substrates for PrHalE (4-Cl-leucine, 6, m/z = 166.0629; 4-Cl-isoleucine, 7, m/z = 166.0629; 4-Cl-norleucine, 8, m/z = 166.0629). Assays include Fe(II), ascorbate, αKG, and NaCl in addition to the amino acid substrate. All extracted ion chromatograms are representative of at least 3 experimental replicates.
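The m/z values listed for Figures 18-21 correspond to monoisotopic [M+H]⁺ ions. The Python sketch below reproduces them from standard monoisotopic atomic masses; the neutral formulas are supplied by us for illustration (free amino acids with one H replaced by the substituent), and the parser handles only simple formulas without parentheses.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "Cl": 34.96885268,
    "Br": 78.9183371,
}
PROTON = 1.007276466812  # proton mass (Da), added for the [M+H]+ ion


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C6H13ClN2O2' into element counts."""
    counts: dict[str, int] = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def mz_protonated(formula: str) -> float:
    """Monoisotopic m/z of the singly charged [M+H]+ ion of a neutral formula."""
    counts = parse_formula(formula)
    return sum(MONOISOTOPIC[el] * n for el, n in counts.items()) + PROTON


# Neutral formulas (free amino acids) supplied for illustration.
products = {
    "Cl-lysine": "C6H13ClN2O2",
    "dichlorolysine": "C6H12Cl2N2O2",
    "Cl-ornithine": "C5H11ClN2O2",
    "Cl-leucine/isoleucine/norleucine": "C6H12ClNO2",
    "bromolysine (79Br)": "C6H13BrN2O2",
    "azidolysine": "C6H13N5O2",
}
for name, formula in products.items():
    print(f"{name}: {mz_protonated(formula):.4f}")
```

With the same masses, the bromolysine (⁷⁹Br) and azidolysine entries reproduce the values given for Figure 78 (225.0233 and 188.1142). Positional isomers such as 4-Cl- and 5-Cl-lysine share one formula, so they are distinguished by retention time and NMR rather than by m/z.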

[0032] Figure 22: Substrate profiles of tested halogenases from reaction screening.

Reactions (50 µL) contained the following amino acids: L-alanine, L-arginine, L-asparagine, L-aspartate, L-cysteine, L-glutamine, L-glutamate, glycine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-methionine, L-phenylalanine, L-proline, L-serine, L-threonine, L-tryptophan, L-valine, L-ornithine (O), and DL-norleucine (NL) (0.5 mM each), sodium αKG (5 mM), sodium ascorbate (1 mM), (NH₄)₂Fe(SO₄)₂·6H₂O (1 mM), and sodium chloride (5 mM) in 100 mM HEPES buffer (pH 7.5). Reactions were initiated by addition of purified halogenase variants (10 µM final concentration) and allowed to proceed for 1 h at room temperature before quenching in 2 volumes of methanol with 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF using the protocol for polar metabolite analysis above. Activity detected on substrates is shown in green. When SEQ ID NO: 2 was tested on individual amino acids, chlorinated product was detected with I, L, and NL, likely because no K was present to compete.
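The 50 µL screening reaction can be assembled from stock solutions with a C₁V₁ = C₂V₂ calculation. In the Python sketch below, the final concentrations come from the paragraph above, but every stock concentration is a hypothetical value chosen for round numbers; substitute the stocks actually on hand.

```python
def pipetting_volumes(total_ul: float, components: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Volume (uL) of each stock needed for a reaction of total_ul, plus water to volume.

    components maps name -> (stock concentration, final concentration), both in the same unit.
    """
    volumes: dict[str, float] = {}
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: stock is more dilute than the final concentration")
        volumes[name] = final * total_ul / stock  # C1 * V1 = C2 * V2
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to fit in the reaction volume")
    # Water is listed first; the enzyme (last entry) is added last to start the reaction.
    return {"water": water, **volumes}


# Hypothetical stock concentrations (mM); final concentrations (mM) from the text.
screen = {
    "HEPES pH 7.5": (1000.0, 100.0),
    "amino acid mix (each)": (10.0, 0.5),
    "sodium alpha-ketoglutarate": (100.0, 5.0),
    "sodium ascorbate": (20.0, 1.0),
    "ammonium iron(II) sulfate": (20.0, 1.0),
    "NaCl": (100.0, 5.0),
    "halogenase": (0.2, 0.01),  # 200 uM stock -> 10 uM final
}
for name, ul in pipetting_volumes(50.0, screen).items():
    print(f"{name}: {ul:.1f} uL")
```

With these assumed stocks, every component is 2.5 µL except HEPES (5 µL), leaving 30 µL of water.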

[0033] Figure 23: Percent sequence identity matrix of tested halogenases. An all-by-all sequence alignment was performed using methods in the art. The sequence identity (%) matrix output is displayed for halogenases A-H. The number representing each halogenase corresponds to the accession number as shown in Table 2. Enzymes within a cluster have a percent sequence identity greater than about 56%.
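A percent identity matrix such as the one in Figure 23 can be computed from an existing multiple sequence alignment. The Python sketch below is illustrative only: it assumes pre-aligned sequences and counts identities over columns where neither sequence has a gap, which is one of several denominators in common use (programs differ, so values may not match a given tool's output). Sequence names and strings are hypothetical; only the ~56% cluster cutoff comes from the text.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences over columns where both have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """All-by-all matrix keyed by (name_i, name_j)."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q]) for p in names for q in names}


def same_cluster_pairs(aligned: dict[str, str], cutoff: float = 56.0) -> list[tuple[str, str]]:
    """Pairs whose identity exceeds the cluster cutoff."""
    return [(p, q) for p, q in combinations(aligned, 2)
            if percent_identity(aligned[p], aligned[q]) > cutoff]


# Hypothetical aligned sequences.
toy = {"hal_1": "MKH-TGLV", "hal_2": "MKHSTALV", "hal_3": "MRQSDAIV"}
for (p, q), pid in identity_matrix(toy).items():
    print(p, q, round(pid, 1))
print(same_cluster_pairs(toy))
```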

[0034] Figure 24 and Figure 25: LC/MS analysis of regioselective formation of 4-Cl-lysine and 5-Cl-lysine by halogenase homologs. Figure 24: Extracted ion chromatograms of the products observed upon halogenation of lysine to 4-Cl-lysine (m/z = 181.0738) by homologs of BesD. Reactions contained L-lysine, αKG, sodium ascorbate, Fe(II), and chloride. HalA (from P. fluorescens and P. orientalis) and HalC (from L. anisa) yield mono-chlorinated products with the same exact mass and retention time as 4-Cl-lysine from BesD. Figure 25: Extracted ion chromatograms of the products observed upon halogenation of lysine to 5-Cl-lysine by HalB homologs of BesD. Reactions contained L-lysine, αKG, sodium ascorbate, Fe(II), and chloride. HalB homologs (from S. wuyuanensis, S. toyocaensis, S. viridosporus, A. awajinensis, S. griseus, S. afghaniensis, S. iranensis, and S. prunicolor), HalF (from S. sp. pristinaespiralis), and HalG (from M. pelagius) yield mono-chlorinated products with the same exact mass as 4-Cl-lysine from BesD (m/z = 181.0738) but with an earlier retention time. Note that a small peak corresponding to the mass and retention time of 5-Cl-lysine is also observed with the 4-Cl-lysine halogenases, although 4-Cl-lysine is the dominant product.

[0035] Figure 26-Figure 30: NMR analysis of 5-Cl-lysine methyl ester. Figure 26: HalB from S. iranensis (SiHalB) was incubated with fully ¹⁵N- and ¹³C-labeled L-lysine, Fe, NaCl, αKG, and ascorbate for 60 min before quenching with methanolic HCl (3 M) to yield the [¹⁵N₂, ¹³C₆]-5-chlorolysine methyl ester product. The derivatized HalB product was then isolated by HPLC and characterized. Extracted ion chromatograms of the [¹⁵N₂, ¹³C₆]-5-chlorolysine methyl ester (m/z = 203.1037) produced by SiHalB. Figure 27: Mass spectra show the characteristic Cl isotope pattern for [¹⁵N₂, ¹³C₆]-5-chlorolysine produced by SiHalB. Figure 28: 2D ¹H-¹³C CT-HSQC was used to confirm the assignment of Cα and Cε of the [¹⁵N₂, ¹³C₆]-5-chlorolysine methyl ester produced by SiHalB. Carbons with two neighbors have the opposite phase to carbons with one or three neighbors. Figure 29: 2D ¹H-¹³C HCCH COSY was used to assign the connectivity of carbons of the [¹⁵N₂, ¹³C₆]-5-chlorolysine methyl ester produced by SiHalB. Figure 30: Chemical shifts determined from the 2D-NMR data.
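The m/z values for the isotope-labeled products in these NMR experiments follow from the unlabeled formulas plus two corrections: the mass added by each ¹³C or ¹⁵N label, and the CH₂ added when the free acid is converted to its methyl ester by methanolic HCl. A minimal Python sketch, assuming the ester methyl group is unlabeled and using neutral formulas supplied by us for illustration:

```python
import re

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "Cl": 34.96885268}
PROTON = 1.007276466812
SHIFT_13C = 13.00335483507 - 12.0            # mass added per 12C -> 13C substitution
SHIFT_15N = 15.0001088982 - 14.0030740048    # mass added per 14N -> 15N substitution


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C6H13ClN2O2' into element counts."""
    counts: dict[str, int] = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def labeled_ester_mz(free_acid: str, n_13c: int = 0, n_15n: int = 0,
                     methyl_ester: bool = True) -> float:
    """[M+H]+ m/z of an isotope-labeled product, optionally as its methyl ester.

    Labels come from the substrate; the ester methyl group is assumed to come
    from unlabeled methanol, so esterification simply adds C1H2.
    """
    counts = parse_formula(free_acid)
    if methyl_ester:
        counts["C"] = counts.get("C", 0) + 1
        counts["H"] = counts.get("H", 0) + 2
    mass = sum(MONOISOTOPIC[el] * n for el, n in counts.items())
    mass += n_13c * SHIFT_13C + n_15n * SHIFT_15N
    return mass + PROTON


print(f"{labeled_ester_mz('C6H13ClN2O2', n_13c=6, n_15n=2):.4f}")  # labeled Cl-lysine methyl ester
```

The same function gives 237.0647 for the [¹⁵N₂, ¹³C₆]-dichlorolysine methyl ester, 188.0847 for the [¹⁵N₂, ¹³C₅]-4-Cl-ornithine methyl ester, and 214.0396 for the unlabeled 5,5-dichloronorleucine methyl ester, matching the values given for Figures 32, 50, and 69.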

[0036] Figure 31: LC/MS analysis of dichlorination of lysine to 4,4-dichlorolysine by HalC from L. anisa. Extracted ion chromatograms of the products observed upon halogenation of lysine to 4,4-dichlorolysine (m/z = 215.0349) by HalC from L. anisa (LaHalC). Reactions contained L-lysine, αKG, sodium ascorbate, Fe(II), and chloride. BesD and the homologs from Cluster A (P. fluorescens and P. orientalis) do not yield dichlorolysine; no product with the expected mass is observed. In contrast, LaHalC yields 4,4-dichlorolysine (m/z = 215.0349).

[0037] Figure 32-Figure 38: NMR analysis of 4,4-dichlorolysine methyl ester. Figure 32: HalC from L. anisa was incubated with fully ¹⁵N- and ¹³C-labeled L-lysine, Fe, NaCl, αKG, and ascorbate for 4 h before quenching with methanolic HCl (3 M) to yield the [¹⁵N₂, ¹³C₆]-4,4-dichlorolysine methyl ester product. The derivatized HalC product was then isolated by HPLC and characterized. Extracted ion chromatograms of the [¹⁵N₂, ¹³C₆]-4,4-dichlorolysine methyl ester (m/z = 237.0647). Figure 33: Mass spectra show the characteristic Cl isotope pattern for [¹⁵N₂, ¹³C₆]-4,4-dichlorolysine methyl ester. Figure 34: 2D ¹H-¹³C CT-HSQC was used to confirm the assignment of Cα and Cε of the [¹⁵N₂, ¹³C₆]-4,4-dichlorolysine methyl ester produced by HalC. Carbons with two neighbors have the opposite phase to carbons with one or three neighbors. Figure 35: 2D ¹H-¹³C HCCH COSY was used to assign the connectivity of carbons of the [¹⁵N₂, ¹³C₆]-4,4-dichlorolysine methyl ester produced by HalC. Figure 36: HCACO was used to confirm the assignment of the α proton. Figure 37: 1,1-ADEQUATE was used to obtain the shift of Cγ by correlation with Hδ. Figure 38: Chemical shifts determined from the 2D-NMR data.
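The characteristic Cl isotope pattern referred to in these mass spectra follows from the natural abundances of ³⁵Cl and ³⁷Cl. A minimal Python sketch using a binomial model, assuming representative abundances of 75.76% and 24.24% and ignoring isotope contributions from all other elements:

```python
from math import comb

# Representative natural abundances of the two stable chlorine isotopes (assumed values).
P_35CL = 0.7576
P_37CL = 0.2424


def chlorine_pattern(n_cl: int) -> list[float]:
    """Relative intensities of M, M+2, M+4, ... due to chlorine, scaled to the tallest peak.

    Binomial model over n_cl chlorine atoms: k heavy (37Cl) atoms shift the ion by +2k Da.
    """
    raw = [comb(n_cl, k) * P_35CL ** (n_cl - k) * P_37CL ** k for k in range(n_cl + 1)]
    top = max(raw)
    return [r / top for r in raw]


for n in (1, 2):
    print(n, [round(x, 3) for x in chlorine_pattern(n)])
# 1 [1.0, 0.32]
# 2 [1.0, 0.64, 0.102]
```

One chlorine gives M and M+2 peaks at roughly 3:1, and two chlorines give M, M+2, and M+4 at roughly 9:6:1, which is how mono- and dichlorinated products can be told apart in the spectra.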

[0038] Figure 39: LC/MS comparison of 4,4-dichlorolysine vs. 5,5-dichlorolysine products. Extracted ion chromatograms of the products observed upon halogenation of lysine to 4,4-dichlorolysine (m/z = 215.0349) by HalC from L. anisa and halogenation of lysine to 5,5-dichlorolysine (m/z = 215.0349) by halogenase homologs from Cluster B (from S. wuyuanensis, S. toyocaensis, S. viridosporus, A. awajinensis, S. griseus, S. afghaniensis, S. iranensis, and S. prunicolor), Cluster F (from S. sp. pristinaespiralis), and Cluster G (from M. pelagius). Reactions contained L-lysine, αKG, sodium ascorbate, Fe(II), and chloride. The structures of the observed products were confirmed by NMR.

[0039] Figure 40-Figure 45: NMR analysis of 5,5-dichlorolysine methyl ester. Figure 40: HalB from S. wuyuanensis (SwHalB) was incubated with fully ¹⁵N- and ¹³C-labeled L-lysine, Fe, NaCl, and αKG for 120 min before quenching with methanolic HCl (3 M) to yield the [¹⁵N₂, ¹³C₆]-5,5-dichlorolysine methyl ester product. The derivatized SwHalB product was then isolated by HPLC and characterized. Figure 41: Extracted ion chromatograms of the [¹⁵N₂, ¹³C₆]-5,5-dichlorolysine methyl ester (m/z = 237.0647). Mass spectra show the characteristic Cl isotope pattern for [¹⁵N₂, ¹³C₆]-5,5-dichlorolysine methyl ester. Figure 42: 2D ¹H-¹³C CT-HSQC was used to confirm the assignment of Cα and Cε of the [¹⁵N₂, ¹³C₆]-5,5-dichlorolysine methyl ester produced by SwHalB. Carbons with two neighbors have the opposite phase to carbons with one or three neighbors. Figure 43: 2D ¹H-¹³C HCCH COSY was used to assign the connectivity of carbons of the [¹⁵N₂, ¹³C₆]-5,5-dichlorolysine methyl ester. Figure 44: A 2D ¹H-¹³C long-range HCCH experiment was used to obtain the shift of Cδ by correlation to Hα, Hγ, and Hε. Figure 45: Chemical shifts determined from the 2D-NMR data.

[0040] Figure 46: LC/MS analysis of HalD chlorination of ornithine. Extracted ion chromatograms of the products observed upon incubation of halogenase homologs with Fe, ascorbate, NaCl, αKG, lysine, and ornithine. Halogenase homologs from P. kilonensis (PkHalD), P. sp. SHC52 (PsHalD), and P. trivialis (PtHalD) yield lower extracted ion counts for Cl-lysine by LC/MS analysis than the halogenase from S. cattleya (BesD). Instead, these halogenases yield more 4-Cl-ornithine (m/z = 167.0582) than BesD from S. cattleya.

[0041] Figure 47-Figure 49: Kinetic analysis of lysine halogenation vs. ornithine halogenation. Figure 47: Schematic of the coupled reaction for monitoring halogenase activity through NADH oxidation. The rate of succinate formation by HalA from P. fluorescens (PfHalA) and HalD from P. kilonensis (PkHalD) was monitored by the change in A340 using an NADH-coupled assay with succinyl-CoA synthetase (SCS), pyruvate kinase (PK), and lactate dehydrogenase (LDH). Figure 48: Steady-state kinetic analysis for the 4-Cl-lysine halogenase PfHalA with lysine or ornithine as substrates. Data are mean ± s.d. (n = 3). The table contains kcat, KM, and kcat/KM calculated by non-linear curve fitting to the Michaelis-Menten equation. Error in kcat/KM is obtained by propagation from the individual kinetic terms. Figure 49: Steady-state kinetic analysis for the 4-Cl-ornithine halogenase PkHalD with lysine or ornithine as substrates. Data are mean ± s.d. (n = 3). The table contains kcat, KM, and kcat/KM calculated by non-linear curve fitting to the Michaelis-Menten equation. Data are mean ± s.e. Error in kcat/KM is obtained by propagation from the individual kinetic terms.
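The error propagation described for Figures 48 and 49 can be written out explicitly: σ(kcat/KM) = (kcat/KM)·√((σkcat/kcat)² + (σKM/KM)²). The Python sketch below assumes kcat and KM have already been obtained by non-linear least squares (for example with `scipy.optimize.curve_fit`) and that their errors are independent; ignoring the covariance between the fitted parameters is an assumption of this sketch, not a statement about how the values in the figures were computed. The numbers are hypothetical.

```python
import math


def michaelis_menten(s: float, kcat: float, km: float) -> float:
    """Per-enzyme rate (units of kcat) at substrate concentration s (units of km)."""
    return kcat * s / (km + s)


def specificity_constant(kcat: float, kcat_err: float,
                         km: float, km_err: float) -> tuple[float, float]:
    """kcat/KM and its first-order propagated error, assuming uncorrelated errors."""
    ratio = kcat / km
    rel_err = math.sqrt((kcat_err / kcat) ** 2 + (km_err / km) ** 2)
    return ratio, ratio * rel_err


# Hypothetical fitted values: kcat = 3.0 ± 0.3 min^-1, KM = 0.50 ± 0.05 mM.
ratio, err = specificity_constant(3.0, 0.3, 0.50, 0.05)
print(f"kcat/KM = {ratio:.2f} ± {err:.2f} mM^-1 min^-1")
# kcat/KM = 6.00 ± 0.85 mM^-1 min^-1
```

If the fitting routine returns a parameter covariance matrix, a covariance term can be added to the relative error; with positively correlated kcat and KM estimates the independent-error formula tends to overstate the uncertainty.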

[0042] Figure 50-Figure 54: NMR characterization of 4-Cl-ornithine. Figure 50: HalD from P. kilonensis (PkHalD) was incubated with fully ¹⁵N- and ¹³C-labeled L-ornithine, Fe, NaCl, αKG, and ascorbate for 60 min before quenching with methanolic HCl (3 M) to yield the [¹⁵N₂, ¹³C₅]-4-Cl-ornithine methyl ester product. The derivatized PkHalD product was then isolated by HPLC and characterized. Extracted ion chromatograms of the [¹⁵N₂, ¹³C₅]-4-Cl-ornithine methyl ester (m/z = 188.0847) produced by PkHalD. Figure 51: Mass spectra show the characteristic Cl isotope pattern for [¹⁵N₂, ¹³C₅]-4-Cl-ornithine. Figure 52: 2D ¹H-¹³C CT-HSQC was used to confirm the assignment of Cα and Cδ of the [¹⁵N₂, ¹³C₅]-4-Cl-ornithine methyl ester produced by PkHalD. Carbons with two neighbors have the opposite phase to carbons with one or three neighbors. Figure 53: 2D ¹H-¹³C HCCH COSY was used to assign the connectivity of carbons of the [¹⁵N₂, ¹³C₅]-4-chloroornithine methyl ester produced by PkHalD. Figure 54: Chemical shifts determined from the 2D-NMR data.

[0043] Figure 55: LC/MS analysis of products of aliphatic amino acid halogenases. Extracted ion chromatograms of the 4-Cl-leucine (m/z = 166.0629) product observed upon incubation of P. sp. Root562 HalE (PrHalE), P. fulva HalE (PfHalE), and S. cattleya BesD (ScBesD) with Fe, ascorbate, NaCl, αKG, and leucine for 90 min. Extracted ion chromatograms of the 4-Cl-isoleucine (m/z = 166.0629) product observed upon incubation of PrHalE, PfHalE, and ScBesD with Fe, ascorbate, NaCl, αKG, and isoleucine for 90 min. Extracted ion chromatograms of the 5-Cl-norleucine (m/z = 166.0629) and 5,5-dichloronorleucine (m/z = 200.0240) products observed upon incubation of PrHalE, PfHalE, and ScBesD with Fe, ascorbate, NaCl, αKG, and DL-norleucine for 90 min.

[0044] Figure 56: Kinetic analysis of the aliphatic amino acid halogenase PrHalE. Steady-state kinetic analysis for the aliphatic amino acid halogenase PrHalE with norleucine, leucine, isoleucine, and lysine as substrates. A coupled reaction for monitoring halogenase activity through NADH oxidation was performed: the rate of succinate formation by HalE from P. sp. Root562 was monitored by the change in A340 using an NADH-coupled assay with succinyl-CoA synthetase (SCS), pyruvate kinase (PK), and lactate dehydrogenase. Data are mean ± s.d. (n = 3). Compared with PfHalA and PkHalD, the enzyme has a relatively high rate of succinate formation (about 2.5 min⁻¹) even in the absence of substrate. This indicates a high rate of turnover of αKG to succinate without halogenation of the primary substrate, or “uncoupling”. The rate of the enzyme increases with increasing concentration of norleucine, leucine, and isoleucine. However, the rate does not increase even when 10 mM lysine is added to the reactions.
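Converting a measured A340 slope into a per-enzyme turnover rate uses the 1:1 stoichiometry of the coupled assay (each succinate yields one ADP via SCS, one pyruvate via PK, and one NADH oxidized by lactate dehydrogenase) and the molar absorptivity of NADH at 340 nm (6220 M⁻¹ cm⁻¹). In the Python sketch below, the path length, enzyme concentration, and slopes are hypothetical values chosen only to illustrate the arithmetic, and the coupling enzymes are assumed not to be rate-limiting.

```python
EPSILON_NADH_340 = 6220.0  # molar absorptivity of NADH at 340 nm, M^-1 cm^-1


def turnover_per_min(slope_a340_per_min: float, enzyme_uM: float, path_cm: float = 1.0) -> float:
    """Apparent succinate turnovers per enzyme per minute from the A340 slope.

    A decreasing A340 (negative slope) reflects NADH oxidation; with 1:1 coupling,
    the NADH consumption rate equals the succinate formation rate.
    """
    nadh_M_per_min = -slope_a340_per_min / (EPSILON_NADH_340 * path_cm)
    return nadh_M_per_min / (enzyme_uM * 1e-6)


# Hypothetical traces with 2 uM enzyme in a 1 cm cuvette.
background = turnover_per_min(-0.0311, enzyme_uM=2.0)      # no amino acid added
with_substrate = turnover_per_min(-0.0622, enzyme_uM=2.0)  # amino acid added
print(round(background, 2), round(with_substrate - background, 2))
```

Because the assay reports succinate formation, subtracting the no-substrate background gives the substrate-stimulated rate but does not by itself separate halogenation from uncoupled αKG turnover.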

[0045] Figure 57-Figure 62: NMR analysis of 4-Cl-leucine. Figure 57: HalE from P. sp. Root562 (PrHalE) was incubated with fully ¹³C-labeled L-leucine, Fe, NaCl, and αKG for 90 min before quenching with methanolic HCl (3 M) to yield the [¹³C₆]-4-chloroleucine methyl ester product. The derivatized product was then isolated by HPLC and characterized. Extracted ion chromatograms of the [¹³C₆]-4-Cl-leucine methyl ester (m/z = 186.0987) produced by PrHalE. Figure 58: Mass spectra show the characteristic Cl isotope pattern for [¹³C₆]-4-Cl-leucine produced by PrHalE. Figure 59: 2D ¹H-¹³C CT-HSQC was used to confirm the assignment of Cβ of the [¹³C₆]-4-Cl-leucine methyl ester. Carbons with two neighbors have the opposite phase to carbons with one or three neighbors. Figure 60: 2D ¹H-¹³C HCCH COSY was used to assign the connectivity of carbons of the [¹³C₆]-4-Cl-leucine methyl ester produced by PrHalE. Figure 61: A 2D ¹H-¹³C long-range HCCH experiment was used to obtain the shift of Cγ using the adjacent Hδ proton shifts. Figure 62: Chemical shifts determined from the 2D-NMR data.

[0046] Figure 63-Figure 68: NMR analysis of 4-Cl-isoleucine. Figure 63: HalE from P. sp. Root562 was incubated with fully ¹⁵N- and ¹³C-labeled L-isoleucine, Fe, NaCl, and αKG for 90 min before quenching with methanolic HCl (3 M) to yield the [¹⁵N, ¹³C₆]-4-Cl-isoleucine methyl ester product. The derivatized product was then isolated by HPLC and characterized. Extracted ion chromatograms of the [¹⁵N, ¹³C₆]-4-Cl-isoleucine methyl ester (m/z = 187.0957) produced by HalE. Figure 64: Mass spectra show the characteristic Cl isotope pattern for [¹⁵N, ¹³C₆]-4-Cl-isoleucine produced by HalE. Note that the methyl-esterified [¹⁵N, ¹³C₆]-isoleucine substrate co-elutes with [¹⁵N, ¹³C₆]-4-Cl-isoleucine during purification. Figure 65: 2D ¹H-¹³C CT-HSQC was used to assign Cγ2 of the [¹⁵N, ¹³C₆]-4-Cl-isoleucine and [¹⁵N, ¹³C₆]-isoleucine methyl esters. Carbons with two neighbors (black/green) have the opposite phase to carbons with one or three neighbors (red/blue). Figure 66: 2D ¹H-¹³C HCCH COSY was used to assign the connectivity of carbons of the [¹⁵N, ¹³C₆]-4-Cl-isoleucine methyl ester produced by HalE and the residual [¹⁵N, ¹³C₆]-isoleucine methyl ester. Figure 67 and Figure 68: Structures and shifts obtained from the 2D-NMR data.

[0047] Figure 69-Figure 75: NMR analysis of 5,5-dichloronorleucine. HalE from P. sp. Root562 was incubated with DL-norleucine, Fe, NaCl, αKG, and ascorbate for 16 h to produce 5,5-dichloronorleucine before quenching the reaction and precipitating the protein with acetonitrile. The 5,5-dichloronorleucine product was isolated by HPLC, derivatized with 3 M HCl in methanol to yield the 5,5-dichloronorleucine methyl ester, and extracted into chloroform-d for LC/MS and NMR analysis. Figure 69: Extracted ion chromatograms of the 5,5-dichloronorleucine methyl ester (m/z = 214.0396) produced by HalE. Figure 70: Mass spectra show the characteristic Cl isotope pattern for 5,5-dichloronorleucine methyl ester. Figure 71: 2D ¹H-¹³C CT-HSQC was used to confirm the assignment of Cα and Cε of the 5,5-dichloronorleucine methyl ester. Carbons with two neighbors have the opposite phase to carbons with one or three neighbors. Figure 72: 2D-TOCSY and Figure 73: 2D ¹H-¹³C HCCH COSY were used to assign the connectivity of carbons of the 5,5-dichloronorleucine methyl ester produced by HalE. Figure 74: HMBC was used to obtain the chemical shift of the δ carbon, which is dichlorinated and lacks protons. Figure 75: Chemical shifts obtained from the 2D-NMR data.

[0048] Figure 76: Sequence alignment of representative halogenases A-E. Multiple sequence alignment of BesD and representative homologs from this study. Included are S. cattleya BesD (4-Cl-lysine halogenase, BesD), P. fluorescens HalA (4-Cl-lysine), S. iranensis HalB (5-Cl-lysine and 5,5-dichlorolysine), S. wuyuanensis HalB (5-Cl-lysine and 5,5-dichlorolysine), L. anisa HalC (4-Cl-lysine and 4,4-dichlorolysine), P. kilonensis HalD (4-Cl-ornithine), and P. sp. Root562 HalE (4-Cl-leucine, 4-Cl-isoleucine, and 5,5-dichloronorleucine). The alignment was performed using methods in the art. Residues investigated in this study are marked with an “X” above. The sequences from top to bottom are SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 15, SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 3, and SEQ ID NO: 19, respectively.

[0049] Figure 77: Variable C-terminus of halogenases. Of all the substrate-contacting residues, W238 and W239 are the most variable among halogenase homologs of BesD. W238 and W239 are located on the C-terminus of the protein (residues 237-239), which closes over the active site of BesD. The C-termini of BesD homologs are highly variable, differing by as much as 10 amino acids in length. R237 on the C-terminus of BesD contacts K117 and D119 of the internal loop of BesD (residues 117-119), which is also variable among the halogenases. The variable C-terminus and internal loop may be key factors in determining halogenase substrate specificity.

[0050] Figure 78-Figure 80: Engineering downstream pathways with amino acid halogenases. Figure 78: Bromination and azidation can be carried out with BesD halogenases. Bromolysine (m/z = 225.0233) and azidolysine (m/z = 188.1142) are produced by SwHalB using NaBr or NaN₃, respectively. Extracted ion chromatograms are representative of at least 3 experimental replicates. Figure 79: Incubation of substrates (lysine, ornithine, or norleucine) and cofactors (Fe(II), αKG, NaCl) with various halogenases and amino acid-metabolizing enzymes yields chlorinated heterocycles (10-12), chlorinated diamines (13), and chlorinated α-keto acids (14, 15). 10, 11: lysine with SwHalB and lysine cyclodeaminase; 12: ornithine with PkHalD and ornithine cyclodeaminase; 13: lysine with SwHalB and lysine decarboxylase; 14, 15: norleucine with PrHalE and the aliphatic amino acid transaminase IlvE. At least 3 experimental replicates were carried out. Figure 80: Substituents can be introduced into ribosomally synthesized peptides by addition of an amino acid halogenase. PfHalA or SwHalB could be used to generate Cl-lysine in situ, followed by in vitro transcription/translation (IVTT) initiated by addition of PURExpress kit components along with a plasmid encoding the METRSKNML (SEQ ID NO: 92) peptide. IVTT reactions were quenched after 4 h and analyzed by LC/MS. Extracted ion chromatograms show the production of peptides containing lysine (m/z = 1137.5391) or Cl-lysine (m/z = 1171.5010) and are representative of 3 experimental replicates.

[0051] Figure 81-Figure 83: In vitro formation of chlorinated heterocycles by lysine cyclodeaminase (RapL). Figure 81: Reaction scheme for the production of 5-Cl-pipecolate and 5,5-dichloropipecolate by HalB from S. wuyuanensis (SwHalB) and RapL from S. hygroscopicus. SwHalB was incubated with lysine, Fe, NaCl, αKG, and ascorbate for 90 min to form 5-Cl-lysine and 5,5-dichlorolysine before addition of RapL and NAD⁺ to the reaction mixture. Expected products are drawn to represent the [M+H]⁺ mass detected in positive scan mode on the QTOF. After 60 min, reactions were quenched by addition of 2 volumes of 1% formic acid in methanol. Figure 82: Extracted ion chromatograms and mass spectra for the formation of 5-Cl-pipecolate (m/z = 164.0473) upon addition of RapL to 5-chlorolysine formed by SwHalB. Figure 83: Extracted ion chromatograms and mass spectra for the formation of 5,5-dichloropipecolate (m/z = 198.0083) upon addition of RapL to 5,5-dichlorolysine formed by SwHalB.

[0052] Figure 84 and Figure 85: In vitro formation of chlorinated heterocycles by ornithine cyclodeaminase (OCD). Figure 84: Reaction scheme for the production of 4-Cl-proline from ornithine via a coupled reaction with purified HalD from P. kilonensis (PkHalD) and OCD from E. coli. PkHalD was incubated with ornithine, Fe, NaCl, αKG, and ascorbate for 90 min to form 4-Cl-ornithine before addition of OCD and NAD⁺ to the reaction mixture. After 60 min, reactions were quenched by addition of 2 volumes of 1% formic acid in methanol. Expected products are drawn to represent the [M+H]⁺ mass detected in positive scan mode on the QTOF. Figure 85: Extracted ion chromatograms and mass spectra for the formation of 4-Cl-proline (m/z = 150.0316) upon addition of OCD to 4-Cl-ornithine formed by PkHalD.

[0053] Figure 86 and Figure 87: In vitro chlorinated diamine formation by lysine decarboxylase (LDC). Figure 86: Reaction scheme for the production of 2,2-dichloropentane-1,5-diamine from lysine via a coupled reaction with HalB purified from S. wuyuanensis (SwHalB) and LDC from E. coli. SwHalB was incubated with lysine, Fe, NaCl, αKG, and ascorbate for 90 min to form 5,5-dichlorolysine before addition of LDC to the reaction mixture. After 60 min, reactions were quenched by addition of 2 volumes of 1% formic acid in methanol. Expected products are drawn to represent the [M+H]⁺ mass detected in positive scan mode on the QTOF. Figure 87: Extracted ion chromatograms for the formation of 2,2-dichloropentane-1,5-diamine (m/z = 171.0450) upon addition of LDC to 5,5-dichlorolysine formed by SwHalB. Mass spectra show the characteristic chlorine isotope pattern of 2,2-dichloropentane-1,5-diamine.

[0054] Figure 88-Figure 90: In vitro α-keto acid formation by the aliphatic amino acid aminotransferase IlvE. Figure 88: Reaction scheme for the production of the chlorinated α-keto acids from norleucine via a coupled reaction with HalE purified from P. sp. Root562 (PfHalE) and IlvE from E. coli. PfHalE was incubated with DL-norleucine, Fe, NaCl, αKG, and ascorbate for 90 min before addition of IlvE to the reaction mixture. After 60 min, reactions were quenched by addition of 2 volumes of 1% formic acid in methanol. Expected products are drawn to represent the [M-H]⁻ mass detected in negative scan mode on the QTOF. Figure 89: Extracted ion chromatograms for 5-Cl-2-oxohexanoate (m/z = 163.0167) product formation by PfHalE and IlvE. Figure 90: Extracted ion chromatograms for 5,5-dichloro-2-oxohexanoate (m/z = 196.9778) product formation by PfHalE and IlvE.
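The product m/z values in Figures 81-90 follow from the formula change of each downstream step: cyclodeamination removes NH₃, decarboxylation removes CO₂, and transamination replaces the α-amino group with a keto group (net loss of NH₃ and gain of O). The Python sketch below applies these deltas to neutral substrate formulas (supplied by us for illustration) and reports [M+H]⁺ or [M-H]⁻ as in the figure descriptions; it tracks elemental formulas only, not regiochemistry or stereochemistry.

```python
import re

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "Cl": 34.96885268}
PROTON = 1.007276466812

# Net formula change of each downstream step (substrate -> product).
CYCLODEAMINATION = {"N": -1, "H": -3}        # loss of NH3 (lysine/ornithine cyclodeaminase)
DECARBOXYLATION = {"C": -1, "O": -2}         # loss of CO2 (lysine decarboxylase)
TRANSAMINATION = {"N": -1, "H": -3, "O": 1}  # CH-NH2 -> C=O (aminotransferase)


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C6H13ClN2O2' into element counts."""
    counts: dict[str, int] = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def apply_step(formula: str, delta: dict[str, int]) -> dict[str, int]:
    """Element counts of the product after applying a formula delta to the substrate."""
    counts = parse_formula(formula)
    for element, change in delta.items():
        counts[element] = counts.get(element, 0) + change
        if counts[element] < 0:
            raise ValueError(f"step removes more {element} than {formula} contains")
    return counts


def mz(counts: dict[str, int], mode: str = "+") -> float:
    """Monoisotopic m/z of [M+H]+ (mode '+') or [M-H]- (mode '-')."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in counts.items())
    return neutral + PROTON if mode == "+" else neutral - PROTON


examples = [
    ("5-Cl-pipecolate", "C6H13ClN2O2", CYCLODEAMINATION, "+"),
    ("5,5-dichloropipecolate", "C6H12Cl2N2O2", CYCLODEAMINATION, "+"),
    ("4-Cl-proline", "C5H11ClN2O2", CYCLODEAMINATION, "+"),
    ("2,2-dichloropentane-1,5-diamine", "C6H12Cl2N2O2", DECARBOXYLATION, "+"),
    ("5-Cl-2-oxohexanoate", "C6H12ClNO2", TRANSAMINATION, "-"),
    ("5,5-dichloro-2-oxohexanoate", "C6H11Cl2NO2", TRANSAMINATION, "-"),
]
for name, substrate, step, mode in examples:
    print(f"{name}: m/z {mz(apply_step(substrate, step), mode):.4f}")
```

Each printed value matches the corresponding m/z in Figures 82, 83, 85, 87, 89, and 90, which is a quick consistency check when designing a new downstream step.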

[0055] Color versions of these figures may be found in Neugebauer, ME, et al. A family of radical halogenases for the engineering of amino-acid-based products. Nat Chem Biol 15, 1009–1016 (2019).

[0056] DETAILED DESCRIPTION OF THE INVENTION

[0057] In this work, we solved the crystal structure of BesD in complex with its substrate lysine to better understand the basis for direct halogenation of amino acids. By investigating the unexplored sequence space around the Fe(II)/αKG-dependent halogenase BesD, we have also identified additional members of this new family of radical halogenases that chlorinate different polar and nonpolar amino acids with varied regioselectivity. Additional active-site mutagenesis and bioinformatic analysis provide further insight into the reaction mechanism of these enzymes and how they catalyze selective chlorination. Taking advantage of the central role of amino acid building blocks in the cellular production of metabolites, natural products, and macromolecules, we demonstrate that the halogenated amino acid products can be further converted enzymatically to a range of compound classes such as nitrogen heterocycles, amines, keto acids, and peptides (Figure 1). Taken together, this work greatly expands the utility of radical halogenases and highlights the promise of the BesD halogenase family in the production of compounds that are halogenated, azidated, hydroxylated, or nitrated. That is, the BesD halogenases described herein can be used for selective substitution of a hydrogen bonded to the C4 or C5 carbon atom of compounds such as amino acids (e.g., lysine, ornithine, leucine, isoleucine, norleucine, norvaline, proline), 6-aminohexanoic acid, diamines, alpha-keto acids, pipecolates, piperazines, alkyl esters, keto acids, heterocycles, cyclic amines, and derivatives thereof with a substituent such as a halide ion, an azide group, a hydroxyl group, or a nitro group.

[0058] Enzymes serve as highly useful catalysts whose ability to achieve high selectivity and turnover numbers under mild conditions has advanced the development of streamlined syntheses of a broad range of simple and complex compounds. One particularly interesting reaction carried out by enzymes is the selective activation of C-H bonds to replace them with halogen substituents, which enables the functionalization of small molecules as semisynthetic intermediates for a broad range of downstream reaction pathways or to tune the bioactivity of lead compounds. The discovery of Fe(II)/αKG-dependent halogenases that can operate directly on simple amino acids expands the scope of known radical halogenation chemistry. Furthermore, this halogenase family opens the door to the biosynthetic engineering of a large number of small-molecule and macromolecular targets derived from amino acid building blocks.

[0059] Characterization of the radical halogenases in the BesD family has shown that these enzymes accept both polar and non-polar amino acids, halogenate Csp3-H bonds regioselectively, and carry out both mono- and dichlorination. Structural studies of BesD further provide valuable insight into its substrate selectivity and the potential mechanisms by which reaction partitioning between halogenation and hydroxylation is achieved. In addition to showing that bromination and azidation are catalyzed by the BesD family, we have demonstrated that downstream enzymatic pathways can be constructed to utilize halogenase-modified intermediates to produce different classes of compounds, including nitrogen heterocycles, diamines, keto acids, and ribosomally synthesized peptides. Altogether, these results highlight the potential to integrate this family of halogenases directly into engineered biological reaction pathways and expand our ability to access a wide variety of new compounds.

[0060] The integration of synthetic and biological catalysis enables new approaches to the synthesis of small-molecule targets by combining the high selectivity of enzymes with the reaction diversity offered by synthetic chemistry. While organohalogens are valued for their bioactivity and utility as synthetic building blocks, only a handful of enzymes that can carry out the regioselective functionalization of unactivated Csp3-H bonds have previously been identified. In this context, we report the structural characterization of BesD, a recently discovered radical halogenase from the Fe(II)/α-ketoglutarate-dependent family that chlorinates the free amino acid lysine. We also identify and characterize additional halogenases that produce mono- and dichlorinated as well as brominated and azidated amino acids. The substrate selectivity of this new family of radical halogenases takes advantage of the role of amino acids in metabolism and enables the engineering of biosynthetic pathways that afford a wide variety of compound classes, such as heterocycles, diamines, α-keto acids, and peptides.

[0061] BesD family of halogenases can be:

- Used to produce halogenated (X = Cl, Br, I), azidated, and hydroxylated amino acids directly, with downstream conversion through substitution/elimination to other groups such as amino, ether, ester, thioether, fluoro, cyano, etc.

- Used to produce downstream products of amino acids, such as keto acids, amines, and heterocycles

- Used to produce modified peptides and proteins for functional modification (addition of polymers, toxins, delivery agents, imaging agents, etc.)

- Used to produce modified peptides and proteins for structural modification (peptide macrocycles, stapled peptides, etc.)

- Used for in vivo tagging applications (e.g., Halo tagging)

[0062] As disclosed herein, some BesD halogenases result in C4-halogenated amino acids, whereas others result in C5-halogenated amino acids. Therefore, in some embodiments, a BesD halogenase may be used to regioselectively halogenate a given amino acid.

[0063] In addition to being regioselective, some or all of the BesD halogenases are also stereoselective. That is, at least some of the BesD halogenases stereoselectively halogenate a given amino acid.

[0064] For example, BesD yields the 4R-halo stereoisomer and not the 4S-halo stereoisomer.

[0065] Structural characterization of the lysine halogenase BesD

[0066] The Fe(II)/αKG-dependent halogenases are part of the large and highly diverse cupin superfamily, which contains members that catalyze a host of different reactions, including hydroxylation, halogenation, olefin epoxidation, and stereoinversion, on a broad range of substrates. As such, the prediction and discovery of new activities using bioinformatic approaches has been limited by the sequence variability of family members. BesD, for example, has very low sequence identity to the carrier protein-dependent halogenase SyrB2 (7% ID) and the indole alkaloid halogenase WelO5 (11% ID), and it displays higher sequence similarity to predicted but uncharacterized Fe(II)/αKG-dependent hydroxylases (≤46% ID) than to other known halogenases (Figure 2-Figure 4). A query of the superfamily database reveals that BesD groups in a different protein family within the cupin superfamily than previously reported Fe(II)/αKG-dependent halogenases. To gain a better mechanistic understanding of the BesD family of halogenases, we solved the X-ray crystal structure of BesD with lysine, Fe, chloride, and αKG bound in the active site (Figure 5).

[0067] BesD was crystallized anaerobically with lysine and αKG and soaked with Fe(II). The X-ray structure was solved at 1.95 Å resolution by Fe-SAD phasing. The structure contains four BesD monomers in the asymmetric unit, all of which possess the β-sandwich topology characteristic of Fe/αKG enzymes. Each monomer contains a single Fe coordinated by His137, His204, chloride, and αKG in a distorted square pyramidal geometry in the active site, which contains the HXG/A motif that is characteristic of halogenases. The binding site for lysine in BesD is largely polar, with a network of hydrogen bonds to the carboxylate (Arg74, His134, Trp238), α-amine (Asp140), and ε-amine (Glu120, Asn219, and Thr221) of the substrate (data not shown). In addition, Trp239 on the C-terminus of the protein stacks over the aliphatic side chain of lysine, closing over the substrate at the entrance of the active site. The structure also reveals a water network between Thr221 and the chloride ligand.

[0068] The crystal structure highlights differences in substrate binding between BesD and the carrier protein-dependent amino acid halogenases. In BesD, the positioning of the lysine substrate appears to be strongly determined by a two-point interaction between the free carboxylate of lysine and the guanidinium group of an active-site arginine (Arg74). In contrast, in SyrB2 the carboxylate of the threonine substrate is ligated to a phosphopantetheine (PPant) arm, which both tethers threonine to the carrier protein (SyrB1) through a thioester linkage and masks the carboxylic acid. It has been proposed that the Thr-PPant arm enters the active site through a tunnel and that interactions with threonine, the PPant arm, and SyrB1 are likely important for positioning the substrate in the active site for catalysis. Another notable difference is that BesD appears to have a covering lid that holds lysine in the active site for catalysis. This substrate binding mode is more reminiscent of the indole alkaloid halogenase WelO5, in which the substrate is held in place by a helix that closes over the active site.

[0069] Catalytic selectivity in BesD halogenases

[0070] Understanding how radical halogenases catalyze halogenation selectively over the closely related hydroxylation reaction remains an active area of investigation.

Fe(II)/αKG-dependent halogenases are closely related, both evolutionarily and mechanistically, to the larger and better-studied class of Fe(II)/αKG-dependent hydroxylases. Upon activation with O2 and αKG, both hydroxylases and halogenases generate a high-valent Fe(IV)-oxo species that can abstract an H atom from the substrate. For hydroxylases, rebound of the substrate radical with the hydroxyl group is the only possible pathway, because the coordination site occupied by a halide in halogenases is instead filled by the Asp or Glu ligand of the HXD/E motif. In halogenases, however, rebound with either the halide or the hydroxyl group is possible, leading to potential reaction partitioning between halogenation and hydroxylation, respectively (Figure 6). In the case of BesD, mutation of the active-site glycine of the HXG motif to an aspartate eliminates halogenation and results in only hydroxylation. Indeed, it has been shown that halogenases can carry out a low level of off-pathway hydroxylation, especially when challenged with non-native substrates. Efforts to engineer halogenases from hydroxylases have also been difficult, suggesting that second-sphere interactions are important in controlling reaction partitioning.

[0071] Combining insights from our structure with prior studies of Fe(II)/αKG-dependent enzymes, we can propose a mechanism for halogenation by BesD. Halogenases must orient the Fe(IV)-oxo intermediate so that it is positioned to perform hydrogen atom abstraction while disfavoring rebound of the resulting Fe(III)-OH intermediate in favor of Fe(III)-Cl rebound. Our structure, captured under anaerobic conditions, most likely represents a snapshot of the active site prior to O2 binding. Comparison of the BesD and WelO5 active sites suggests that in BesD the putative O2 binding site is on the opposite side of the αKG-Cl-His204 plane from the substrate, whereas in WelO5 the putative O2 binding site is on the same side as the substrate. Asn219 in BesD provides hydrogen-bonding contacts to the C1 carboxylate of αKG and possibly serves a mechanistic role analogous to that of Ser189 in WelO5. By analogy to other Fe(II)/αKG-dependent enzymes in the literature, the putative vacant site for oxygen binding appears to be trans to His137, on the opposite side of the His204-Cl-αKG plane from the substrate carbon targeted for hydrogen atom abstraction. If the oxo moiety remains in the same site upon formation of the Fe(IV)-oxo intermediate, the calculated distance between the oxo and the site of H atom abstraction (C4 of lysine) would be about 5.9 Å, based on the Fe(II)-C4(lysine) distance (5.1 Å) and the estimated Fe(IV)-oxo bond length (1.6 Å). While the precise structures of the reaction intermediates are unknown, this distance is significantly longer than that calculated from density-functional theory studies of SyrB2. Therefore, a shift before, during, or after decarboxylation of αKG may be needed to bring the substrate and Fe center geometries within the predicted range for hydrogen atom abstraction (Figure 7). Finally, rebound of a suitably positioned chloride would yield the halogenated product.
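The 5.9 Å figure is smaller than the simple sum 5.1 Å + 1.6 Å, which implies that an angle between the Fe···C4 vector and the Fe=O bond was assumed. The source does not report that angle. The sketch below therefore assumes plain triangle geometry (law of cosines, d(O···C4)² = d(Fe···C4)² + d(Fe=O)² - 2·d(Fe···C4)·d(Fe=O)·cos θ) to bound the oxo-C4 distance and to back-calculate the angle that would reproduce 5.9 Å. It is an illustration of the arithmetic, not the authors' modeling procedure.

```python
import math

def oxo_to_c4_distance(fe_c, fe_o, angle_deg):
    """Law of cosines: O...C4 distance (angstroms) from the Fe...C4 distance,
    the Fe=O bond length, and an assumed C4-Fe-O angle in degrees."""
    theta = math.radians(angle_deg)
    return math.sqrt(fe_c ** 2 + fe_o ** 2 - 2 * fe_c * fe_o * math.cos(theta))

def implied_angle_deg(fe_c, fe_o, o_c):
    """Inverse of the above: the C4-Fe-O angle that reproduces a given O...C4 distance."""
    cos_theta = (fe_c ** 2 + fe_o ** 2 - o_c ** 2) / (2 * fe_c * fe_o)
    return math.degrees(math.acos(cos_theta))

FE_C4 = 5.1  # Fe(II)...C4(lysine) distance in the anaerobic structure (angstroms)
FE_O = 1.6   # estimated Fe(IV)=O bond length (angstroms)

print(f"lower bound (angle 0 deg):   {oxo_to_c4_distance(FE_C4, FE_O, 0):.1f}")    # 3.5
print(f"upper bound (angle 180 deg): {oxo_to_c4_distance(FE_C4, FE_O, 180):.1f}")  # 6.7
print(f"angle giving 5.9 A: {implied_angle_deg(FE_C4, FE_O, 5.9):.0f} deg")        # 112
```

Under this assumption, the 5.9 Å estimate corresponds to a C4-Fe-O angle of roughly 112°, consistent with an oxo ligand that points away from the substrate side of the His204-Cl-αKG plane.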

[0072] In order to further investigate catalytic selectivity in BesD, we performed alanine scan mutagenesis of active site residues in BesD. Protein variants were expressed, purified, and incubated with lysine before product analysis by LC/MS (Figure 8). Most alanine variants abolished all activity (both halogenation and hydroxylation). However, three variants yielded interesting profiles (Figure 9-Figure 12). Strikingly, H134A increased overall enzyme activity and abolished enzyme selectivity, resulting in decreased halogenation:hydroxylation ratios. The N219A variant also abolished selectivity for halogenation, albeit with reduced activity compared to WT BesD.

Interestingly, the T221A mutation had no effect on activity or halogenation selectivity, despite the conservation of a polar amino acid at position 221 in halogenases.

[0073] Analysis of the BesD crystal structure in conjunction with sequence conservation of these residues suggests that His134 and Asn219 play important roles in the second-sphere interactions that control reaction partitioning. His134 is hydrogen-bonded to the carboxylate of lysine in the BesD structure, plays an important role in orienting lysine relative to the iron complex, and is also highly conserved. The loss in halogenation selectivity upon mutation of His134 is consistent with studies implicating precise positioning of the substrate within the active site as a factor that favors chloride rebound over hydroxide rebound. At 2.9 Å from αKG, Asn219 may orient αKG through hydrogen bonding such that the vacant putative O2 binding site is located far from the substrate. Additionally, if the Fe(III)-OH intermediate does shift toward the substrate as proposed, Asn219 could be involved in stabilizing it to differentially favor chloride rebound, analogous to the proposed role of Ser189 in WelO5. Upon examining the sequence conservation of key active-site residues in predicted halogenase and hydroxylase sequences with homology to BesD, we find that Asn219 is highly conserved in the halogenase homologs, whereas the hydroxylase homologs display greater variability at this position (Figure 12). This conservation of Asn219 further supports the significance of second-sphere interactions in the BesD amino acid halogenase family.

[0074] Discovery of new amino acid halogenases

[0075] To explore the sequence space around BesD, we performed a sequence-based homology search (BLAST E-value cutoff of 10⁻⁵) to identify other potential halogenase candidates. The related hydroxylases were filtered from the data set based on the presence of the characteristic HXD/E motif, as halogenases are known to possess an HXG/A motif instead, which allows halide coordination to Fe during catalysis. To maximize sampling of functional diversity, we generated a sequence similarity network (SSN) that grouped the homologous enzymes into eight new putative isofunctional groups, which we termed Clusters A-H (Figure 13-Figure 15). Genes adjacent to putative halogenase coding sequences were analyzed with the Enzyme Function Initiative's Genome Neighborhood Tools to identify conserved genomic contexts associated with these enzymes. This analysis revealed numerous genes predicted to be involved in amino acid metabolism, such as amino acid transporters, LysR family transcriptional regulators, serine hydroxymethyltransferases, ATP-grasp-dependent carboxylate-amine ligases, amino acid adenylation proteins, aspartate semialdehyde dehydrogenases, AsnC family transcriptional regulators, the lysine transporter LysE, and the threonine transporter RhtB. The conservation of genes predicted to be associated with amino acid metabolism suggested that the BesD homologs might also be amino acid halogenases. We cloned, heterologously expressed, and purified representative halogenase homologs from Clusters A-H and tested the proteins for activity against a panel of amino acids (Figure 16). LC/MS analysis of the products revealed newly accessible chlorinated amino acid products and identified substrates for enzymes within the clusters (Figure 16, Figure 17, Figure 22, Figure 23).
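The motif-based separation of halogenases from hydroxylases can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the sequences are already aligned (for example with MUSCLE) so that the motif His of every sequence sits in the same column, with no gap between the His and the third motif residue, and `motif_col` is a hypothetical, manually identified 0-based column index. The toy sequences are invented. A fuller script could read the alignment with Biopython's `Bio.SeqIO.parse(handle, "fasta")` instead of the small parser shown here.

```python
def read_fasta(text):
    """Minimal FASTA parser: returns {identifier: sequence} in file order."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def classify_by_motif(aligned, motif_col):
    """Split aligned sequences by the residue two columns after the motif His:
    G/A -> halogenase (HXG/A), D/E -> hydroxylase (HXD/E), otherwise unassigned."""
    groups = {"halogenase": [], "hydroxylase": [], "unassigned": []}
    for name, seq in aligned.items():
        his, third = seq[motif_col], seq[motif_col + 2]
        if his == "H" and third in ("G", "A"):
            groups["halogenase"].append(name)
        elif his == "H" and third in ("D", "E"):
            groups["hydroxylase"].append(name)
        else:
            groups["unassigned"].append(name)
    return groups

# Toy alignment (hypothetical sequences); the motif His is in column 2.
example = """>besd_like
MAHWGDK
>hydroxylase_like
MAHWDDK
>other
MAQWGDK
"""
print(classify_by_motif(read_fasta(example), motif_col=2))
# {'halogenase': ['besd_like'], 'hydroxylase': ['hydroxylase_like'], 'unassigned': ['other']}
```

Keying on a single alignment column mirrors the manual motif identification described in the bioinformatic methods; sequences with a gap or a non-His residue at that column are left unassigned for inspection rather than forced into either class.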

[0076] Among the characterized halogenases, we discovered members with different regioselectivities. BesD and enzymes from Cluster A halogenate lysine at the γ-carbon to produce 4-Cl-lysine. However, HalB from Streptomyces iranensis (SiHalB) yielded a product with the same exact mass and characteristic chlorine isotope pattern as 4-Cl-lysine, but with a different retention time (Figure 18, Figure 24, Figure 25). Through NMR analysis of the methyl-esterified product, we identified the product of SiHalB as 5-Cl-lysine (Figure 26-Figure 30). Although SiHalB and BesD are 44% identical, they have distinct regioselectivities, highlighting the capability of Fe(II)/αKG-dependent enzymes to perform regioselective halogenation under mild conditions. We also found halogenases that perform both mono- and dichlorination of lysine (Figure 19). When lysine is incubated with BesD, only monochlorinated 4-Cl-lysine is observed. However, in addition to monochlorination of lysine, Legionella anisa HalC (LaHalC) could also carry out dichlorination to produce 4,4-dichlorolysine (Figure 31, Figure 32-Figure 38), while HalB from Streptomyces wuyuanensis (SwHalB) yielded 5,5-dichlorolysine (Figure 39, Figure 40-Figure 45).
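The characteristic chlorine isotope pattern used to flag these products arises because chlorine has two stable isotopes, ³⁵Cl and ³⁷Cl, in roughly a 3:1 ratio. A minimal sketch, assuming a simple binomial model that considers only chlorine (the ¹³C, ¹⁵N, and ¹⁸O contributions that also add intensity at M+1 and M+2 are ignored), predicts the relative M, M+2, and M+4 intensities that distinguish a monochlorinated product such as 4-Cl-lysine from a dichlorinated one such as 4,4-dichlorolysine.

```python
from math import comb

# Natural abundances of the two stable chlorine isotopes (IUPAC representative values).
P_CL35 = 0.7576
P_CL37 = 0.2424

def chlorine_pattern(n_cl):
    """Relative intensities of the M, M+2, M+4, ... peaks produced by n_cl chlorine
    atoms (binomial model), normalized to the most intense peak."""
    raw = [comb(n_cl, k) * P_CL35 ** (n_cl - k) * P_CL37 ** k for k in range(n_cl + 1)]
    top = max(raw)
    return [round(x / top, 2) for x in raw]

print(chlorine_pattern(1))  # [1.0, 0.32]       roughly 3:1
print(chlorine_pattern(2))  # [1.0, 0.64, 0.1]  roughly 9:6:1
```

The second peak at about two-thirds of the base peak, plus a small M+4 peak, is the kind of signature that separates a dichlorinated lysine from a monochlorinated one in the LC/MS data.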

[0077] In addition to differing regioselectivities, other BesD family members were identified that act on other amino acid substrates. Enzymes from Cluster D exhibit a chain-length preference and halogenate the 5-carbon substrate ornithine preferentially over the 6-carbon lysine. In particular, Pseudomonas kilonensis HalD (PkHalD) was found to prefer L-ornithine (kcat/KM = 330 ± 70 mM⁻¹ min⁻¹) over L-lysine (kcat/KM = 13.2 ± 3.6 mM⁻¹ min⁻¹) by 25-fold (Figure 20, Figure 46, Figure 47-Figure 49, Figure 50). Finally, Cluster E was found to contain aliphatic amino acid halogenases. When we assayed Pseudomonas sp. Root562 HalE (PrHalE), we observed no activity on lysine or ornithine. However, when we incubated PrHalE with the aliphatic amino acids leucine, isoleucine, and norleucine, we observed the expected masses and isotope patterns for chlorinated products of those amino acids (Figure 21, Figure 55, Figure 56). We additionally observed a dichlorinated species when norleucine was used as a substrate. Further characterization by NMR indicated that PrHalE chlorinates leucine and isoleucine at the γ-position (the γ-CH2 of isoleucine) and norleucine at the δ-position (Figure 57-Figure 62, Figure 63-Figure 68, Figure 69-Figure 75).
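The 25-fold preference follows from the ratio of the two specificity constants (330 / 13.2 = 25). The sketch below also propagates the reported uncertainties, assuming they are independent standard errors and using the first-order (quadrature) rule for a quotient. The resulting uncertainty on the ratio is illustrative and is not reported in the source.

```python
import math

def ratio_with_error(a, sa, b, sb):
    """Return a/b and its first-order standard error, assuming the errors sa and sb
    on a and b are independent (relative errors add in quadrature)."""
    r = a / b
    sr = abs(r) * math.sqrt((sa / a) ** 2 + (sb / b) ** 2)
    return r, sr

# PkHalD specificity constants reported above, kcat/KM in mM^-1 min^-1
ORN, ORN_ERR = 330.0, 70.0   # L-ornithine
LYS, LYS_ERR = 13.2, 3.6     # L-lysine

r, sr = ratio_with_error(ORN, ORN_ERR, LYS, LYS_ERR)
print(f"ornithine/lysine preference: {r:.0f} +/- {sr:.0f} fold")  # 25 +/- 9 fold
```

Because both constants carry relative errors of roughly 20-30%, the fold preference is best read as "about 25-fold" rather than as a precise value.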

[0078] Remarkably, halogenases from Clusters A-H have evolved to accommodate different amino acids, with regioselectivity for specific sites on those substrates, while maintaining fidelity for halogenation over the competing side reaction of hydroxylation. To further explore the structural basis for these differences, the sequences of Halogenases A-E were aligned (Figure 76). As expected, the carboxylate- and amine-binding residues Arg74, His134, and Asp140 are highly conserved regardless of substrate preference. Surprisingly, however, the ε-amine-binding residues of BesD (Glu120, Asn219, and Thr221) are also highly conserved, even in halogenases that act on nonpolar amino acids. The greatest sequence disparity occurs at Trp238 and Trp239, which are located at the C-terminal end of the protein sequence. Sequence alignment reveals that the C-terminal regions of Halogenases A-E are highly variable, differing by as much as 10 amino acids in length and resulting in unreliable alignments of that region. Structural analysis of BesD reveals that the C-terminus closes over the substrate in the active site, occluding it from solvent (Figure 77). The C-terminus of BesD additionally interacts with a corresponding internal loop near the active site (residues 117-119), which also varies among the halogenase clusters. Taken together with the structure, the sequence analysis suggests that the C-terminus and the corresponding loop could be important determinants of substrate selectivity.

[0079] Biosynthetic transformations of halogenated amino acids and installation of bromide and azide functional groups

[0080] Amino acids are key building blocks for metabolism, serving as the source of many metabolites and natural products, as well as of ribosomally and nonribosomally synthesized peptides and proteins. The discovery of the BesD family of halogenases therefore raises the possibility that halogens and other functional groups could be incorporated into these classes of compounds by engineering downstream biosynthetic transformations. We first examined whether the repertoire of the amino acid halogenases could be expanded to bromination and azidation by replacing the Cl⁻ ligand with Br⁻ and N3⁻, respectively. Indeed, when NaCl was replaced with NaBr or NaN3, SwHalB catalyzed the corresponding bromination and azidation reactions, as observed by LC/MS (Figure 78). Thus, a variety of functional handles can be installed onto amino acid substrates with SwHalB, diversifying amino acid building blocks simply by providing an alternate ligand for the Fe complex. In particular, bromide serves as a handle for substitution reactions, and azide serves as a precursor to amines and nitrogen-containing heterocycles.
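Swapping the halide ligand shifts the expected product mass in a predictable way. As a minimal sketch, assuming lysine as the substrate and protonated [M+H]+ ions (the ion type is an assumption; the source states only that the products were observed by LC/MS), the monoisotopic m/z values of the mono-substituted products can be computed from standard monoisotopic atomic masses. The brominated product would additionally show a near 1:1 ⁷⁹Br/⁸¹Br doublet, which this sketch does not model.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"H": 1.00782503, "C": 12.0, "N": 14.00307401, "O": 15.99491462,
        "Cl": 34.96885268, "Br": 78.9183371}
PROTON = 1.00727646  # proton mass, added for [M+H]+ ions

LYSINE = {"C": 6, "H": 14, "N": 2, "O": 2}

def mono_mass(formula):
    """Neutral monoisotopic mass from an element-count dict."""
    return sum(MONO[el] * n for el, n in formula.items())

def substituted(formula, group):
    """Replace one C-H hydrogen with a substituent given as an element-count dict."""
    out = dict(formula)
    out["H"] -= 1
    for el, n in group.items():
        out[el] = out.get(el, 0) + n
    return out

for label, group in [("chloro", {"Cl": 1}), ("bromo", {"Br": 1}), ("azido", {"N": 3})]:
    mz = mono_mass(substituted(LYSINE, group)) + PROTON
    print(f"{label}-lysine [M+H]+: {mz:.4f}")
# chloro-lysine [M+H]+: 181.0738
# bromo-lysine [M+H]+: 225.0233
# azido-lysine [M+H]+: 188.1142
```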

[0081] Amino acids also participate in a broad range of metabolic pathways and can be directly transformed into a variety of new compound classes (Figure 79). For example, heterocycles are key structural components of a number of bioactive molecules and drugs, and pipecolate and proline can be made from lysine and ornithine by oxidative cyclization with lysine cyclodeaminase (RapL) and ornithine cyclodeaminase (OCD), respectively. To test whether the cyclodeaminase enzymes were sufficiently promiscuous to generate chlorinated heterocycles, we cloned, expressed, and purified RapL and OCD. Coupled reactions with lysine or ornithine halogenases and RapL or OCD, respectively, yielded chlorinated compounds with the predicted masses and isotope patterns of chloropipecolate (10), dichloropipecolate (11), and chloroproline (12) (Figure 81, Figure 84, Figure 85). Thus, the products of the newly discovered amino acid halogenases can be further elaborated enzymatically to form 5- and 6-membered chlorinated heterocycles.

[0082] Because diamines are useful precursors for polymers such as nylon, we next sought to access chlorinated diamines through coupled enzymatic reactions. Lysine decarboxylase (LDC) catalyzes the PLP-dependent decarboxylation of lysine to 1,5-pentanediamine (13). LDC from E. coli was heterologously expressed and purified to evaluate its tolerance of a chlorinated substrate. To generate a chlorinated diamine, LDC was added to a reaction following the enzymatic synthesis of 5,5-dichlorolysine by SwHalB. A mass consistent with a dichlorinated diamine was observed by LC/MS (Figure 86, Figure 87). We additionally accessed chlorinated α-keto acids through IlvE, an aliphatic amino acid aminotransferase. IlvE from E. coli has been shown to catalyze transamination of leucine, isoleucine, and norleucine into their corresponding α-keto acids. Incubation of enzymatically synthesized chloronorleucine with IlvE afforded chlorinated α-keto acids (14, 15), as observed by LC/MS (Figure 88, Figure 89, Figure 90).
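Each downstream enzyme in these coupled reactions changes the molecular formula in a fixed way: decarboxylation by LDC removes CO2, cyclodeamination by RapL or OCD removes NH3, and transamination by IlvE turns the α-CH-NH2 group into a ketone (net -NH3 +O). The sketch below is simple formula bookkeeping under those textbook stoichiometries. It predicts the neutral formulas and monoisotopic masses one would look for by LC/MS; it is not the analysis used in the source.

```python
from collections import Counter

MONO = {"H": 1.00782503, "C": 12.0, "N": 14.00307401, "O": 15.99491462, "Cl": 34.96885268}

# Net formula change of each enzymatic step (product minus substrate).
DELTAS = {
    "decarboxylation (LDC)": {"C": -1, "O": -2},            # loses CO2
    "cyclodeamination (RapL/OCD)": {"N": -1, "H": -3},      # loses NH3
    "transamination (IlvE)": {"N": -1, "H": -3, "O": 1},    # CH-NH2 -> C=O
}

def apply_step(formula, delta):
    """Add a signed formula delta; drop elements whose count reaches zero."""
    out = Counter(formula)
    for el, n in delta.items():
        out[el] += n
    if any(v < 0 for v in out.values()):
        raise ValueError("delta removes atoms the substrate does not have")
    return {el: n for el, n in out.items() if n}

def mono_mass(formula):
    return sum(MONO[el] * n for el, n in formula.items())

def hill(formula):
    """Hill-order formula string: C, then H, then the other elements alphabetically."""
    order = [el for el in ("C", "H") if el in formula]
    order += sorted(el for el in formula if el not in ("C", "H"))
    return "".join(f"{el}{formula[el] if formula[el] > 1 else ''}" for el in order)

dichlorolysine = {"C": 6, "H": 12, "Cl": 2, "N": 2, "O": 2}
chlorolysine = {"C": 6, "H": 13, "Cl": 1, "N": 2, "O": 2}
chloronorleucine = {"C": 6, "H": 12, "Cl": 1, "N": 1, "O": 2}

for name, substrate, step in [("5,5-dichlorolysine", dichlorolysine, "decarboxylation (LDC)"),
                              ("chlorolysine", chlorolysine, "cyclodeamination (RapL/OCD)"),
                              ("chloronorleucine", chloronorleucine, "transamination (IlvE)")]:
    product = apply_step(substrate, DELTAS[step])
    print(f"{name} -> {step} -> {hill(product)} (neutral, {mono_mass(product):.4f} u)")
```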

[0083] Finally, we investigated whether chlorolysine synthesized by BesD homologs could be incorporated into peptides with an in vitro transcription/translation (IVTT) system. Following incubation of amino acids with PfHalA or SwHalB, the transcription and translation machinery from the PURExpress IVTT kit, along with a plasmid encoding a 9-amino-acid peptide, was added to the reaction mixture. When the amino acids were pre-incubated with either halogenase, a product corresponding to the exact mass of the chlorinated peptide was detected by LC/MS (Figure 80). Therefore, halogenases from the BesD family can be coupled with IVTT systems to generate chlorinated peptides containing 4-Cl-lysine or 5-Cl-lysine.

[0084] In some embodiments, the present invention is directed to BesD halogenases and BesD homologs. As used herein, a "BesD halogenase" refers to a protein belonging to the BesD family of halogenases. The BesD family of halogenases includes Accession Numbers: WP_014151497 (SEQ ID NO: 1), SDN46247 (SEQ ID NO: 2), WP_003966730 (SEQ ID NO: 3), WP_004985557 (SEQ ID NO: 4), WP_016975823 (SEQ ID NO: 5), WP_019057205 (SEQ ID NO: 6), WP_019233318 (SEQ ID NO: 7), WP_020275004 (SEQ ID NO: 8), WP_028687259 (SEQ ID NO: 9), WP_030791981 (SEQ ID NO: 10), WP_036209184 (SEQ ID NO: 11), WP_037940874 (SEQ ID NO: 12), WP_041021480 (SEQ ID NO: 13), WP_044582715 (SEQ ID NO: 14), WP_046063366 (SEQ ID NO: 15), WP_053557074 (SEQ ID NO: 16), WP_056854138 (SEQ ID NO: 17), WP_057008702 (SEQ ID NO: 18), WP_057723975 (SEQ ID NO: 19), and WP_067690966 (SEQ ID NO: 20); and BesD homologs. Sequence alignments between the exemplified BesD halogenases show that those producing 4-Cl-leucine, 4-Cl-isoleucine, and 5,5-dichloronorleucine share about 82% or more sequence identity; those producing 4-Cl-lysine share about 54% or more sequence identity; those producing 4-Cl-ornithine share about 86% or more sequence identity; and those producing 5-Cl-lysine and 5,5-dichlorolysine share about 45% or more sequence identity. When sequences within the same cluster were considered, BesD halogenases that yielded the same products exhibited at least 56% sequence identity. Therefore, as used herein, a "BesD homolog" refers to a protein having (a) at least about 150 amino acids, (b) an amino acid sequence that has at least 56% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, or SEQ ID NO: 22, and (c) an HXG/A motif, when optimally aligned as described herein. In some embodiments, preferred BesD homologs have (a) at least about 150 amino acids, (b) an amino acid sequence that has at least 56% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, or SEQ ID NO: 20, and (c) an HXG/A motif, when optimally aligned as described herein. In some embodiments, preferred BesD homologs contain HHWG, HWGD, HHWGD, or HWGDY as the HXG/A motif. Exemplary BesD homologs are provided herein.
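The three-part definition of a BesD homolog, (a) length, (b) identity to a reference SEQ ID, and (c) an HXG/A motif when optimally aligned, can be expressed as a simple check. This sketch is illustrative only and rests on stated assumptions: the query has already been optimally aligned to one reference sequence (the alignment itself is not computed here); percent identity is counted over aligned columns in which neither sequence has a gap, which is one of several common conventions; and `motif_col`, the alignment column of the reference motif His, is a hypothetical input supplied by the user.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two rows of a pairwise alignment ('-' = gap),
    counted over columns where neither row has a gap (one common convention)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def meets_homolog_criteria(aln_query, aln_ref, motif_col, min_len=150, min_id=56.0):
    """Check (a) query length, (b) identity to the reference, and (c) an HXG/A motif
    in the query at the columns where the reference has its motif.
    motif_col is a hypothetical, user-supplied 0-based column of the motif His."""
    length_ok = len(aln_query.replace("-", "")) >= min_len
    identity_ok = percent_identity(aln_query, aln_ref) >= min_id
    his, _x, third = aln_query[motif_col:motif_col + 3]
    motif_ok = his == "H" and third in ("G", "A")
    return length_ok and identity_ok and motif_ok

# Toy example (not a real protein): 150 residues followed by an HWGD motif.
ref = "A" * 150 + "HWGD"
print(meets_homolog_criteria(ref, ref, motif_col=150))                 # True
print(meets_homolog_criteria("A" * 150 + "HWDD", ref, motif_col=150))  # False: HXD, not HXG/A
```

Checking the motif at the reference's aligned position, rather than searching the whole sequence for any H-X-G/A triplet, follows the "when optimally aligned" wording; a free-text search would match short HXG/A runs that occur by chance elsewhere in a protein.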

[0085] In some embodiments, the BesD homolog has at least about 150 amino acids and an amino acid sequence that has at least 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, or SEQ ID NO: 22, and an HXG/A motif (e.g., HHWG, HWGD, HHWGD, or HWGDY), when optimally aligned as described herein. In some embodiments, the BesD homolog has at least about 150 amino acids and an amino acid sequence that has at least 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, or SEQ ID NO: 20, and an HXG/A motif (e.g., HHWG, HWGD, HHWGD, or HWGDY), when optimally aligned as described herein.

[0086] BesD halogenases may be made using methods known in the art, including chemical synthesis, biosynthesis or in vitro synthesis using recombinant DNA methods, and solid-phase synthesis. See, e.g., Kelly & Winkler (1990) Genetic Engineering Principles and Methods, vol. 12, J. K. Setlow ed., Plenum Press, NY, pp. 1-19; Merrifield (1964) J Amer Chem Soc 85:2149; Houghten (1985) PNAS USA 82:5131-5135; and Stewart & Young (1984) Solid Phase Peptide Synthesis, 2nd ed., Pierce, Rockford, IL, which are herein incorporated by reference. BesD halogenases may be purified using protein purification techniques known in the art, such as reverse-phase high-performance liquid chromatography (HPLC), ion-exchange, metal affinity, or immunoaffinity chromatography, filtration or size exclusion, or electrophoresis. See, e.g., Olsnes and Pihl (1973) Biochem. 12(16):3121-3126; and Scopes (1982) Protein Purification, Springer-Verlag, NY, which are herein incorporated by reference. Alternatively, the polypeptides may be made by recombinant DNA techniques known in the art. Thus, polynucleotides that encode BesD halogenases are contemplated herein. In some embodiments, the polypeptides and polynucleotides are isolated.

[0087] As used herein, an "isolated" compound refers to a compound that is isolated from its native environment. For example, an isolated polynucleotide is one that does not have the bases normally flanking the 5′ end and/or the 3′ end of the polynucleotide as it is found in nature. As another example, an isolated polypeptide is one that does not have its native amino acids, which correspond to the full-length polypeptide, flanking the N-terminus, C-terminus, or both. For example, an isolated fragment of a BesD halogenase refers to an isolated polypeptide that consists of only a portion of the BesD homolog, or that comprises some, but not all, of the amino acid residues of the BesD homolog together with non-native amino acids (i.e., amino acids that differ from those found at the corresponding positions of the BesD homolog) at its N-terminus, C-terminus, or both. In some embodiments, isolated polynucleotides and polypeptides are made "by the hand of man", e.g., using synthetic and/or recombinant techniques.

[0088] The following examples are intended to illustrate but not to limit the invention.

[0089] EXAMPLES

[0090] Materials

[0091] Luria-Bertani (LB) Broth Miller, LB Agar Miller, Terrific Broth (TB), and glycerol were purchased from EMD Biosciences (Darmstadt, Germany). Carbenicillin (Cb), kanamycin (Km), chloramphenicol (Cm), isopropyl-β-D-thiogalactopyranoside (IPTG), sodium chloride, dithiothreitol (DTT), 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), magnesium chloride hexahydrate, acetonitrile, ethylenediaminetetraacetic acid disodium salt dihydrate (EDTA), hydrochloric acid, 3 kDa MWCO dialysis tubing, ammonium bicarbonate, sodium acetate, chloroform, and sodium hydroxide were purchased from Fisher Scientific (Pittsburgh, PA).

Phosphoenolpyruvate (PEP), adenosine triphosphate sodium salt (ATP), nicotinamide adenine dinucleotide reduced form dipotassium salt (NADH), pyruvate kinase, lactate dehydrogenase, lysozyme, poly(ethyleneimine) solution (PEI), ammonium iron(II) sulfate hexahydrate, α-ketoglutaric acid sodium salt, β-mercaptoethanol (βME), N,N,N',N'-tetramethylethane-1,2-diamine (TEMED), betaine, dimethyl sulfoxide (DMSO), pyridoxal 5′-phosphate (PLP), sodium L-ascorbate, acetonitrile (LC/MS grade), ammonium formate (LC/MS grade), deuterium oxide, L-lysine for crystallography, the 20 canonical amino acids, seleno-L-methionine, L-leucine (¹³C6, 98%), L-isoleucine (¹³C6, 98%; ¹⁵N, 98%), DL-norleucine, 2-(N-morpholino)ethanesulfonic acid hydrate, 4-morpholineethanesulfonic acid (MES), Pall Nanosep centrifugal devices with Omega membrane (3,000 MWCO), sodium bromide, and sodium azide were purchased from Sigma-Aldrich (St. Louis, MO). PageRuler Plus Prestained Protein Ladder was purchased from Thermo Fisher Scientific (Waltham, MA). Succinyl-CoA synthetase was purchased from Megazyme International (Bray, Ireland). Methanol (D4, 99.8%), chloroform (D, 99.8%), L-lysine·2HCl (¹³C6, 99%; ¹⁵N2, 99%), and L-ornithine (¹³C5, 99%; ¹⁵N2, 99%) were purchased from Cambridge Isotope Laboratories (Andover, MA). Formic acid was purchased from Acros Organics (Morris Plains, NJ). Restriction enzymes, T4 DNA ligase, Antarctic phosphatase, Phusion DNA polymerase, Q5 DNA polymerase, T5 exonuclease, Taq DNA ligase, and the PURExpress Δ(aa, tRNA) in vitro transcription/translation kit were purchased from New England Biolabs (Ipswich, MA). Deoxynucleotides (dNTPs) were purchased from Invitrogen (Carlsbad, CA). Oligonucleotides were purchased from Integrated DNA Technologies (Coralville, IA), resuspended at a stock concentration of 100 µM in water, and stored at either 4 °C for immediate use or -20 °C for long-term storage. DNA purification kits and Ni-NTA agarose were purchased from Qiagen (Valencia, CA). Zirconia/silica beads were purchased from BioSpec Products (Bartlesville, OK). Complete EDTA-free protease inhibitor was purchased from Roche Applied Science (Penzberg, Germany). PD-10 desalting columns and a Superdex 75 16/60 pg column were purchased from GE Healthcare (Pittsburgh, PA). Amicon Ultra 10,000 MWCO centrifugal concentrators and a Milli-Q Gradient water purification system were purchased from Millipore (Billerica, MA). Acrylamide/bis-acrylamide (30%, 37.5:1), electrophoresis-grade sodium dodecyl sulfate (SDS), and ammonium persulfate were purchased from Bio-Rad Laboratories (Hercules, CA). Ultrayield baffled flasks were purchased from Thompson Instrument Company (Oceanside, CA).

[0092] While chloride ions are exemplified as the halide ion (i.e., substituent) that replaces a hydrogen atom bonded to a carbon atom of an amino acid, other halide ions (i.e., F⁻, Br⁻, I⁻, and At⁻) and other ions (e.g., N3⁻, OH⁻, and NO2⁻) may be used as the substituent to give similarly modified amino acids as exemplified herein, e.g., 4-Br-lysine instead of 4-Cl-lysine.

[0093] Bioinformatic selection of halogenases for screening

[0094] To select halogenases for functional screening, the sequence of BesD (SEQ ID NO: 1) was used as a BLAST query against the non-redundant protein database with an E-value cutoff of 10⁻⁵, yielding 261 hits. BLAST hits were aligned using MUSCLE, and the sequence position of the HXD/E or HXG/A motif was identified manually in AliView. A Biopython script was used to parse the alignment and write the halogenases (HXG/A) to a separate file, yielding 105 sequences, which were retrieved using Batch Entrez (NCBI). From the list of halogenases, redundant sequences at 98% sequence identity were removed using CD-HIT. The remaining halogenase sequences with homology to BesD were input into the EFI-EST web tool using Option C, which reads FASTA files with headers. After initial processing, sequences shorter than 150 amino acids were excluded from the sequence similarity network. Using Cytoscape version 3.6.1, the network was adjusted by deleting edges with low alignment scores until the halogenases identified for Bes biosynthesis separated into an isofunctional cluster, which occurred at an alignment score of 88 (corresponding to a sequence identity of about 56%). The EFI-GNT web tool was used to identify adjacent genes shared among 50% of the halogenase sequences in each cluster, revealing genes associated with amino acid metabolism (Figure 8).

[0095] Phylogenetic analysis of halogenases

[0096] The list of halogenases for phylogenetic analysis was obtained as described above. Halogenase sequences were aligned using MUSCLE. The MEGA X interface was then used to analyze the alignment by the maximum-likelihood method based on the JTT matrix-based model, and uncertainty in the topology of the resulting tree was evaluated with 500 bootstrap replicates.

[0097] Bioinformatic analysis of active site conservation among halogenases v. hydroxylases

[0098] Separate lists of halogenases (HXG/A motif) and hydroxylases (HXD/E motif) were obtained as described above. Redundant sequences with at least 90% sequence identity to other sequences under consideration were removed using CD-HIT, leaving 48 halogenases and 97 hydroxylases. Next, the halogenase and hydroxylase lists were separately analyzed with LOGOS to generate a graphical representation of sequence conservation within each data set.

[0099] Bioinformatic analysis of the variability of the C-termini of halogenases

[0100] As above, a list of halogenases was obtained with an E-value cutoff of 10⁻⁵. From the list of 105 halogenases, redundant sequences at 90% sequence identity were removed using CD-HIT, leaving 48 sequences. These remaining halogenases were input into LOGOS to generate a graphical representation of sequence conservation throughout the halogenases.

[0101] Bacterial strains

[0102] E. coli DH10B-T1R was used for plasmid construction. E. coli BL21 Star (DE3) harboring the pRARE2 plasmid was used for heterologous production of all halogenases. Streptomyces cattleya NRRL 8057 (ATCC 35852) was obtained from the American Type Culture Collection (Manassas, VA). The following strains were purchased from DSMZ: Pseudomonas orientalis (DSMZ 17489), Pseudomonas trivialis (DSMZ 14937), Pseudomonas sp. Root562 (DSMZ 102504), and Pseudomonas fulva (DSMZ 17717). Pseudomonas putida KT2440 was a gift from the Keasling laboratory (UC Berkeley).

[0103] Construction of plasmids for protein expression

[0104] The strains and plasmids used for protein expression are summarized in Table 1:

[0105] The synthetic gene sequences are set forth in Table 2 as follows:

[0106] Gibson assembly was used for plasmid construction, with E. coli DH10B-T1R as the cloning host. PCR amplifications were carried out with Phusion or Q5 polymerase (New England Biolabs, Ipswich, MA) using the oligonucleotides listed in Table 3:

[0107] The intermediate cloning plasmid, pET16-His-PrescissionCutSite-IMPDH, was constructed by amplification of IMPDH from E. coli gDNA followed by insertion into NdeI/BamHI-digested pET16b. The halogenases SEQ ID NO: 1 (S. cattleya BesD), SEQ ID NO: 19 (P. orientalis HalA), SEQ ID NO: 19 (P. trivialis HalD), SEQ ID NO: 17 (P. sp. Root562 HalE), and SEQ ID NO: 9 (P. fulva HalE), as well as lysine decarboxylase WP_001020973 (LDC, E. coli), aminotransferase WP_000208517 (IlvE, E. coli), and ornithine cyclodeaminase WP_010954390 (OCD, P. putida KT2440), were amplified from genomic DNA of the corresponding organism and inserted into NdeI/BamHI-digested pET16-His-PrescissionCutSite-IMPDH. The halogenases SEQ ID NO: 5 (P. fluorescens HalA), SEQ ID NO: 14 (S. iranensis HalB), SEQ ID NO: 4 (P. kilonensis HalD), and SEQ ID NO: 22 (P. mediterranea HalH) were obtained as gBlocks and amplified with the respective primers before Gibson insertion into NdeI/BamHI-digested His-PrescissionCutSite-IMPDH. The remaining halogenases and the lysine cyclodeaminase RapL were obtained as gBlocks containing the overhangs required for direct Gibson insertion into NdeI/BamHI-digested pET16-His-Prescission-IMPDH.

[0108] To clone mutants of S. lavenduligriseus BesD (SlBesD), SlBesD was amplified in two pieces from the pET16-His-SlBesD plasmid. The first piece was amplified with Sl-BesD-F and the respective reverse primers. The second piece was amplified with the respective forward primers and either Sl-BesD-F (for R74A, E120A, H134A, D140A) or pET16-anneal-R (for N219A, T221A, W238A, W239A). Then both pieces were inserted into NdeI/BamHI-digested pET16-His-PrescissionCutSite-IMPDH to yield the mutant constructs. pSV272.1-His6-MBP-PfRoot562 HalE was constructed by genomic DNA amplification with the respective primers followed by insertion into SfoI/EcoRI-digested pSV272.1. IVTT-PepC was cloned by vector amplification of the DHFR plasmid from the PURExpress IVTT kit (E640S, NEB) using the PepC-F/R primer set, followed by Gibson assembly. Following plasmid construction, all cloned inserts were sequenced at Quintara Biosciences (San Francisco, CA) or the Barker Hall Sequencing Facility at UC Berkeley (Berkeley, CA).

[0109] Expression of His10-tagged and His10-MBP-tagged proteins

[0110] E. coli BL21 Star (DE3) harboring the pRARE2 plasmid was transformed with the appropriate protein expression plasmid. An overnight TB culture of the freshly transformed cells was used to inoculate TB (1 L) containing the appropriate antibiotics (50 µg/mL carbenicillin or kanamycin with 50 µg/mL chloramphenicol) in a 2.8-L baffled shake flask to OD600 = 0.05. The cultures were grown at 37 °C and 200 rpm to OD600 = 0.6 to 0.8, at which point they were cooled on ice for 20 min, followed by induction of protein expression with IPTG (0.25 mM) and overnight growth at 16 °C. Cell pellets were harvested by centrifugation at 9,800 × g for 7 min at 4 °C and stored at -80 °C.

[0111] Purification of His10-tagged and His10-MBP-tagged proteins

[0112] Frozen cell pellets were thawed and resuspended at 5 mL/g of cell paste in lysis buffer (50 mM HEPES, 300 mM NaCl, 10 mM imidazole, 20 mM β-mercaptoethanol (βME), 20% (v/v) glycerol, pH 7.5) supplemented with EDTA-free Protease Inhibitor Cocktail (Roche). The cell paste was homogenized and then lysed by passage through a French pressure cell (Thermo Scientific; Waltham, MA) at 9,000 psi or by sonication with a Misonix Sonicator 3000 (power = 7.5, 5 s on, 25 s off, 2 min total process time, ½-inch tip). The lysate was then centrifuged at 13,500 × g for 20 min at 4 °C to separate the soluble and insoluble fractions. DNA in the soluble fraction was precipitated by adding polyethyleneimine to 0.15% (w/v) and stirring at 4 °C for 30 min, and the precipitate was removed by centrifugation at 13,500 × g for 20 min at 4 °C. The soluble lysate was incubated with Ni-NTA resin (0.5 mL resin/g of cell paste) for 45 min at 4 °C, then resuspended and loaded onto a gravity-flow column. The column was washed with 15-20 column volumes of wash buffer (50 mM HEPES, 300 mM NaCl, 20 mM imidazole, 20 mM βME, 20% (v/v) glycerol, pH 7.5) and then eluted with elution buffer (50 mM HEPES, 300 mM NaCl, 300 mM imidazole, 20 mM βME, 20% (v/v) glycerol, pH 7.5). Fractions containing the target protein were identified by A280, pooled, and concentrated using an Amicon Ultra spin concentrator (10 kDa MWCO, Millipore). The protein was then exchanged into storage buffer (50 mM HEPES, 100 mM sodium chloride, 20% (v/v) glycerol, 1 mM DTT, pH 7.5) using PD-10 desalting columns. For the PLP-dependent enzymes IlvE and LDC, 20 µM PLP was included in all purification and storage buffers.
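Fractions are pooled by A280, and paragraph [0113] converts A280 readings into concentrations with ProtParam extinction coefficients. That conversion is the Beer-Lambert law, sketched below under stated assumptions: a 1 cm path length (NanoDrop absorbances are normally reported normalized to 10 mm), and a molecular weight of 35,000 Da that is purely a hypothetical placeholder, since the document does not list molecular weights. The helper names are illustrative.

```python
"""Beer-Lambert conversion of A280 readings to protein concentration (sketch)."""


def conc_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Molar concentration c = A / (epsilon * l), with epsilon in M^-1 cm^-1."""
    return a280 / (epsilon * path_cm)


def conc_mg_per_ml(a280: float, epsilon: float, mw_da: float, path_cm: float = 1.0) -> float:
    """mol/L times g/mol gives g/L, which is numerically equal to mg/mL."""
    return conc_molar(a280, epsilon, path_cm) * mw_da


def mg_per_ml_to_um(mg_per_ml: float, mw_da: float) -> float:
    """Convert a stock in mg/mL to micromolar, e.g. to set up the assays that follow."""
    return mg_per_ml / mw_da * 1e6


# epsilon listed for S. cattleya BesD; the 35,000 Da MW is a hypothetical placeholder.
eps_besd = 47_565
print(f"{conc_molar(1.0, eps_besd) * 1e6:.2f} uM per A280 unit")        # 21.02
print(f"{conc_mg_per_ml(1.0, eps_besd, 35_000):.3f} mg/mL per A280 unit")  # 0.736
print(f"{mg_per_ml_to_um(2.21, 35_000):.2f} uM for a 2.21 mg/mL stock")   # 63.14
```

Working in molar units first keeps the extinction coefficient, which is defined per mole, separate from the molecular weight, so the same function serves every enzyme in the list that follows.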

[0113] Final protein concentrations before storage were estimated from A280 measured on a NanoDrop, using extinction coefficients (ε280) calculated by ExPASy ProtParam. They are as follows: S. cattleya BesD: 2.21 mg/mL (ε280 = 47,565 M⁻¹ cm⁻¹); P. fluorescens HalA: 4.19 mg/mL (ε280 = 47,565 M⁻¹ cm⁻¹); P. orientalis HalA: 3.11 mg/mL (ε280 = 50,545 M⁻¹ cm⁻¹); S. wuyuanensis HalB: 10.67 mg/mL (ε280 = 45,045 M⁻¹ cm⁻¹); S. toyocaensis HalB: 12.84 mg/mL (ε280 = 43,555 M⁻¹ cm⁻¹); S. viridosporus HalB: 8.36 mg/mL (ε280 = 43,555 M⁻¹ cm⁻¹); A. awajinensis HalB: 2.82 mg/mL (ε280 = 46,075 M⁻¹ cm⁻¹); S. griseus HalB: 5.74 mg/mL (ε280 = 42,065 M⁻¹ cm⁻¹); S. afghaniensis HalB: 4.03 mg/mL (ε280 = 42,065 M⁻¹ cm⁻¹); S. iranensis HalB: 2.37 mg/mL (ε280 = 45,045 M⁻¹ cm⁻¹); S. prunicolor HalB: 1.24 mg/mL (ε280 = 41,035 M⁻¹ cm⁻¹); L. anisa HalC: 1.25 mg/mL (ε280 = 53,525 M⁻¹ cm⁻¹); P. kilonensis HalD: 4.06 mg/mL (ε280 = 60,515 M⁻¹ cm⁻¹); P. sp. SHC52 HalD: 4.93 mg/mL (ε280 = 60,515 M⁻¹ cm⁻¹); P. trivialis HalD: 3.52 mg/mL (ε280 = 60,515 M⁻¹ cm⁻¹); P. sp. Root562 HalE: 0.73 mg/mL (ε280 = 62,005 M⁻¹ cm⁻¹); P. fulva HalE: 1.30 mg/mL (ε280 = 57,410 M⁻¹ cm⁻¹); S. pristinaespiralis HalF: 2.23 mg/mL (ε280 = 32,430 M⁻¹ cm⁻¹); M. pelagius HalG: 8.92 mg/mL (ε280 = 49,515 M⁻¹ cm⁻¹); P. corrugata HalH: 6.46 mg/mL (ε280 = 46,660 M⁻¹ cm⁻¹); P. mediterranea HalH: 6.68 mg/mL (ε280 = 46,410 M⁻¹ cm⁻¹); S. lavenduligriseus BesD (SlBesD): 6.48 mg/mL, SlBesD N219A: 3.02 mg/mL, SlBesD T221A: 3.52 mg/mL, SlBesD R74A: 6.65 mg/mL, SlBesD D140A: 2.73 mg/mL, SlBesD E120A: 4.51 mg/mL, and SlBesD H134A: 3.67 mg/mL (ε280 = 50,545 M⁻¹ cm⁻¹ for each); SlBesD W238A: 6.80 mg/mL and SlBesD W239A: 7.28 mg/mL (ε280 = 45,045 M⁻¹ cm⁻¹ for each); RapL: 0.54 mg/mL (ε280 = 3,230 M⁻¹ cm⁻¹); OCD: 2.13 mg/mL (ε280 = 20,065 M⁻¹ cm⁻¹); IlvE: 13.7 mg/mL (ε280 = 49,515 M⁻¹ cm⁻¹); LDC: 4.01 mg/mL (ε280 = 106,605 M⁻¹ cm⁻¹). All proteins were aliquoted, flash-frozen in liquid nitrogen, and stored at -80 °C.

[0114] Preparation of proteins for crystallization

[0115] For the BesD halogenase from S. cattleya, the protein eluted from the Ni-NTA column was dialyzed 3 × 1:50 against dialysis buffer (50 mM HEPES, 100 mM NaCl, 1 mM DTT, pH 7.5) for 1.5 h to remove imidazole. After the third round of dialysis, the protein was incubated with Prescission protease (1 mg protease/50 mg protein) and dialyzed 3 × 1:50 overnight into dialysis buffer. The cleaved, dialyzed protein was passed through Ni-NTA (2 mL) to remove Prescission and the His10 tag. The flow-through was then diluted to a final salt concentration of 20 mM NaCl using Buffer A (50 mM HEPES, 20% (v/v) glycerol, 1 mM DTT, 1 mM EDTA, pH 7.5) and loaded onto a 5 mL HiTrap Q column for ion exchange on an ÄKTA Purifier FPLC system (GE Healthcare). The protein was eluted with a 0-100% gradient from Buffer A to Buffer B (50 mM HEPES, 1 M NaCl, 20% (v/v) glycerol, 1 mM DTT, 1 mM EDTA, pH 7.5) over 40 min. The protein sample was concentrated to 2 mL and loaded onto a Superdex 75 16/60 pg column (GE Healthcare) equilibrated with SEC buffer (20 mM HEPES, 100 mM NaCl, 1 mM DTT, pH 7.5). The protein eluate was concentrated to 15 mg/mL, and glycerol was added to a final concentration of 5% (v/v) before flash freezing in liquid nitrogen and storage at -80 °C.

[0116] Crystallization and structure determination

[0117] Crystals of BesD from S. cattleya were obtained by the hanging-drop vapor diffusion method by combining equal volumes of protein solution (10 mg/mL BesD, lysine (1.5 mM), αKG (3 mM, pH 7)) and reservoir solution (MES pH 6.5 (100 mM), sodium chloride (0.6 mM), containing 20% (v/v) PEG 4000). Thick rectangular plate crystals grew in 4 days. Crystals were transferred to an Eppendorf tube containing 100 µL of seed buffer (MES pH 6.5 (100 mM), sodium chloride (0.6 mM), containing 40% (v/v) PEG 4000) at 4 °C and vortexed for 30 s with ten 1 mm diameter zirconia/silica beads to produce a microseed solution. The seed solution was stored at -80 °C for future use.
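Because the drop is set up from equal volumes of protein and reservoir solution, every component starts at half its stock concentration (vapor equilibration then slowly concentrates the drop toward the reservoir). The short sketch below does that bookkeeping. The concentrations are copied from the paragraph above; the 1 µL + 1 µL volumes are an assumption, since only the 1:1 ratio is stated, and the `mix` helper is an illustrative name.

```python
"""Component concentrations after combining solutions, e.g. a 1:1 hanging drop (sketch)."""


def mix(solutions: list[tuple[float, dict[str, float]]]) -> dict[str, float]:
    """Combine (volume, {component: concentration}) pairs; each component keeps its own units."""
    total = sum(volume for volume, _ in solutions)
    result: dict[str, float] = {}
    for volume, components in solutions:
        for name, conc in components.items():
            # Dilution: C_final = C_stock * V_stock / V_total, summed over contributing solutions.
            result[name] = result.get(name, 0.0) + conc * volume / total
    return result


protein = {"BesD (mg/mL)": 10.0, "lysine (mM)": 1.5, "alpha-KG (mM)": 3.0}
reservoir = {"MES pH 6.5 (mM)": 100.0, "PEG 4000 (%)": 20.0}
drop = mix([(1.0, protein), (1.0, reservoir)])  # assumed 1 uL + 1 uL drop
for name, conc in drop.items():
    print(f"{name}: {conc:g}")
# BesD 5 mg/mL, lysine 0.75 mM, alpha-KG 1.5 mM, MES 50 mM, PEG 4000 10% at setup
```

The same helper applies to the two-step enzyme reactions later in the section, where a second volume of enzyme solution is added to a completed halogenation.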

[0118] Fe-bound BesD crystals were prepared in a Coy anaerobic chamber by microseeding equal volumes of protein solution (10 mg/mL BesD, lysine (1.5 mM), αKG (3 mM, pH 7)) and reservoir solution (MES pH 6.5 (100 mM), sodium chloride (0.6 mM), containing 20% (v/v) PEG 4000). Thick rectangular rods grew within 2 weeks. Crystals were soaked anaerobically for 30 min in a solution containing (NH4)2Fe(SO4)2·6H2O (1 mM), αKG (1 mM), sodium ascorbate (1 mM), lysine (1 mM), and glycerol (20% (v/v)) before flash freezing in liquid nitrogen. Data were collected at Beamline 8.3.1 at the Advanced Light Source (Lawrence Berkeley National Laboratory) at a wavelength of 1.11 Å. Data were processed with XDS, and scaled and merged with Aimless. The anomalous signal from the Fe atoms was sufficient to obtain initial phases using the CRANK2 pipeline in CCP4. Buccaneer was used to build an initial model of the structure, which was refined iteratively in Coot and Phenix. Ligands were added to the model in Coot and refined in Phenix. The overall B-factors of the protein are higher than those of the ligands because of partially disordered regions of the protein. However, the B-factors of the ligands are similar to those of the surrounding active-site residues. Omit maps for the ligand complex, including Fe, Cl, αKG, lysine, His137, and His204, were generated using the Phenix Map function; these atoms were removed from refinement before the maps were calculated. No map is visible around those ligands at the -3σ contour level. The structure was analyzed in PyMOL, which was also used to create figures.

[0119] General procedure for high-resolution HPLC/MS analysis of polar metabolites with HILIC

[0120] Samples containing polar metabolites were analyzed using an Agilent 1290 UPLC with a SeQuant ZIC-pHILIC column (5 µm, 2.1 × 100 mm; EMD Millipore) and the following buffers: Buffer A (90% acetonitrile, 10% water, 10 mM ammonium formate) and Buffer B (90% water, 10% acetonitrile, 10 mM ammonium formate). A linear gradient from 95% to 60% Buffer A over 17 min, followed by a linear gradient from 60% to 33% Buffer A over 8 min, was applied at a flow rate of 0.2 mL/min. Mass spectra were acquired in positive ionization mode on an Agilent 6530 QTOF.

[0121] In vitro screening of BesD homolog activity

[0122] Reactions (50 µL) contained the following amino acids: L-alanine, L-arginine, L-asparagine, L-aspartate, L-cysteine, L-glutamine, L-glutamate, glycine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-methionine, L-phenylalanine, L-proline, L-serine, L-threonine, L-tryptophan, L-valine, L-ornithine, and DL-norleucine (0.5 mM each), sodium αKG (5 mM), sodium ascorbate (1 mM), (NH4)2Fe(SO4)2·6H2O (1 mM), and sodium chloride (5 mM) in 100 mM HEPES buffer (pH 7.5). Reactions were initiated by addition of purified halogenase variants (10 µM final concentration) and allowed to proceed for 1 h at room temperature before quenching in 2 vol of methanol with 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF using the protocol for polar metabolite analysis above. Following identification of substrates from the amino acid pool, the reactions were repeated with lysine, ornithine, leucine, isoleucine, and norleucine tested individually (0.5 mM each). Because leucine, isoleucine, and norleucine are isomers whose substrates and chlorinated products share the same exact mass, the aliphatic halogenases that chlorinate them were re-assayed against each of these three amino acids separately (0.5 mM each) to confirm activity.

[0123] In vitro halogenation assay of BesD mutants

[0124] Reactions (50 µL) contained L-lysine·HCl (1 mM), sodium αKG (5 mM), sodium ascorbate (1 mM), (NH4)2Fe(SO4)2·6H2O (1 mM), and sodium chloride (5 mM) in 100 mM HEPES buffer (pH 7.5). Reactions were initiated by addition of purified halogenase mutants or wild-type BesD from S. lavenduligriseus (20 µM final concentration) and allowed to proceed for 20 min at room temperature before quenching in 2 vol of methanol with 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF using the protocol for polar metabolite analysis above.

[0125] Determination of kinetic parameters for PkHalD and PfHalA on lysine and ornithine

[0126] Reactions (100 µL) contained ATP (2.5 mM), MgCl2 (5 mM), phosphoenolpyruvate (PEP, 1 mM), NADH (0.3 mM), lactate dehydrogenase (LDH, 10 U/mL), pyruvate kinase (PK, 10 U/mL), succinyl-CoA synthetase (SCS, 3.2 U/mL), coenzyme A (1 mM), sodium αKG (1 mM), (NH4)2Fe(SO4)2·6H2O (0.2 mM), sodium chloride (10 mM), and sodium ascorbate (2 mM) in 100 mM HEPES buffer (pH 7.5). In this coupled assay, the succinate released by the halogenase is converted to succinyl-CoA by SCS with consumption of ATP; PK regenerates ATP from the resulting ADP and PEP, and LDH reduces the pyruvate formed, so one NADH is oxidized per succinate produced. Reactions were initiated by addition of PkHalD (5 µM) or PfHalA (5 µM) in the presence of varying concentrations of L-lysine·HCl (0-16 mM) or L-ornithine·HCl (0-2 mM). Initial rates of NADH consumption were measured by monitoring A340 on a SynergyMx microplate reader (BioTek) at room temperature. kcat and KM were determined by fitting the initial-rate data in Origin (OriginLab, Northampton, MA) to the equation:

$$v_0 = \frac{k_\mathrm{cat}\,[E]_0\,[S]}{K_\mathrm{M} + [S]}$$

where v0 is the initial rate, [E]0 is the total enzyme concentration, and [S] is the substrate concentration.

[0127] General method for HPLC purification of chlorinated amino acids for NMR analysis

[0128] Methyl-esterified chlorinated amino acids were purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column (5 µm, 200 Å, 150 × 21.1 mm; EMD Millipore) using the following buffers: Buffer A (90% acetonitrile, 10% water, 10 mM ammonium formate) and Buffer B (90% water, 10% acetonitrile, 10 mM ammonium formate). The gradient was as follows: a linear gradient from 100% to 40% Buffer A over 1 h, a hold at 40% Buffer A for 18 min, a change to 30% Buffer A over 5 min, and a linear gradient to 75% Buffer A over 4 min, all at a flow rate of 5 mL/min. Fractions containing the desired chlorinated compounds were identified by LC-QTOF MS screening, pooled, and concentrated by rotary evaporation, then dried in a vacuum centrifuge once the volume was below 1.5 mL.

[0129] General methods for NMR data collection

[0130] All experiments were recorded on a Bruker Avance II spectrometer operating at 900 MHz and 298 K, equipped with a TXI cryoprobe and controlled with TopSpin (version 3.2). Data were processed in TopSpin 3.2 by zero-filling once in each dimension, followed by apodization, Fourier transformation, and phasing, and were referenced to 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS). Spectra were analyzed using Mnova software (Mestrelab Research, Escondido, CA, USA).

[0131] Preparation, purification, and NMR analysis of [13C6,15N2]-5-Cl-lysine methyl ester

[0132] [13C6,15N2]-5-Cl-lysine methyl ester was prepared in a 200 µL reaction containing HalB from S. iranensis (SiHalB, 70 µM), [13C6,15N2]-L-lysine·2HCl (0.35 mM), αKG (5 mM), (NH4)2Fe(SO4)2·6H2O (1 mM), and sodium ascorbate (2.5 mM) in 50 mM HEPES (pH 7.5). Chloride is provided by the enzyme storage buffer (NaCl) and the lysine hydrochloride substrate. The reaction was quenched after 60 min by transfer into 3 mL of 3 M HCl in methanol, followed by incubation at 50 °C for 3 h. Samples were then cooled at room temperature for 1 h before neutralization by dropwise addition of 10 M NaOH. Precipitate was removed by centrifugation at 10,000 × g for 10 min. The [13C6,15N2]-5-Cl-lysine methyl ester was purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column as described above. Samples were resuspended in 60% acetonitrile-d3:40% D2O for NMR analysis on a Bruker 900 MHz instrument and for LC/QTOF-MS analysis. High-resolution ESI-MS [M+H]+: calculated for C¹³C₆H₁₆³⁵Cl¹⁵N₂O₂⁺ m/z 203.1037, found m/z 203.1034; calculated for C¹³C₆H₁₆³⁷Cl¹⁵N₂O₂⁺ m/z 205.1007, found m/z 205.1006.
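The calculated [M+H]+ values in these HRMS reports follow from summing the monoisotopic mass of every isotope in the ion formula and subtracting one electron mass for the single positive charge. The sketch below reproduces the value for the ³⁵Cl isotopologue of [13C6,15N2]-5-Cl-lysine methyl ester (C7H16ClN2O2⁺, with six ¹³C from the labeled lysine and one unlabeled ester carbon) and expresses the found value as a ppm error. Isotope masses are standard tabulated values; the function names are illustrative, not part of any vendor software.

```python
"""Monoisotopic m/z of a labeled cation and ppm error of a measured value (sketch)."""

# Monoisotopic masses (u) of the isotopes used in the ion formulas.
ISOTOPE_MASS = {
    "H": 1.00782503207,
    "12C": 12.0,
    "13C": 13.0033548378,
    "14N": 14.0030740048,
    "15N": 15.0001088982,
    "O": 15.99491461956,
    "35Cl": 34.96885268,
    "37Cl": 36.96590259,
}
ELECTRON_MASS = 0.00054857990946


def cation_mz(composition: dict[str, int], charge: int = 1) -> float:
    """Sum atom masses of the ion formula, remove `charge` electrons, divide by charge."""
    atoms = sum(ISOTOPE_MASS[iso] * count for iso, count in composition.items())
    return (atoms - charge * ELECTRON_MASS) / charge


def ppm_error(found: float, calculated: float) -> float:
    """Mass accuracy in parts per million."""
    return (found - calculated) / calculated * 1e6


# [M+H]+ of [13C6,15N2]-5-Cl-lysine methyl ester, 35Cl isotopologue.
ion = {"12C": 1, "13C": 6, "H": 16, "35Cl": 1, "15N": 2, "O": 2}
mz = cation_mz(ion)
print(f"calculated m/z {mz:.4f}, error of 203.1034: {ppm_error(203.1034, mz):+.1f} ppm")
# calculated m/z 203.1037, error of 203.1034: -1.4 ppm
```

Swapping the composition dictionary reproduces the other calculated values in this section, for example {"12C": 7, "H": 14, "35Cl": 2, "14N": 1, "O": 2} for unlabeled 5,5-dichloronorleucine methyl ester (m/z 214.0396).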

[0133] A constant-time 1H-13C HSQC experiment (Bruker pulse sequence hsqcctetgpsp) was recorded with carrier frequencies set to 5.48 ppm (1H) and 40 ppm (13C) and spectral widths set to 16 ppm (1H) and 80 ppm (13C). The carboxyl chemical shift was set to 176 ppm, and the 1H-13C coupling constant was set to 145 Hz. The recycle delay was set to 1.5 s. The constant-time evolution period was set to 13.6 ms to distinguish 1H-13C pairs with one or three carbon neighbors from those with two carbon neighbors. A 2D 1H-13C version of the 3D HCCH-COSY experiment (Bruker pulse sequence hcchcogp3d2) was used to identify neighboring 1H and 13C resonances. The carrier frequencies were set to 5.48 ppm (1H) and 39 ppm (13C), with spectral widths of 14 ppm (1H) and 75 ppm (13C). The recycle delay was set to 1.8 s. The methyl ester is formed by quenching with unlabeled methanol and does not appear under these data collection parameters. 1H NMR (900 MHz, 60% acetonitrile-d3:40% D2O): δ 4.30 (Hδ), 4.19 (Hα), 3.47 (Hε1), 3.24 (Hε2), 2.25 (Hβ1), 2.16 (Hβ2), 2.11 (Hγ1), 1.86 (Hγ2). 13C NMR (900 MHz, 60% acetonitrile-d3:40% D2O): δ 58.91 (Cδ), 53.11 (Cα), 46.10 (Cε), 31.22 (Cγ), 27.31 (Cβ).

[0134] Preparation, purification, and NMR analysis of [13C6,15N2]-5,5-dichlorolysine methyl ester

[0135] [13C6,15N2]-5,5-dichlorolysine methyl ester was prepared in a 400 µL reaction containing HalB from S. wuyuanensis (SwHalB, 70 µM), [13C6,15N2]-L-lysine·2HCl (0.35 mM), αKG (5 mM), (NH4)2Fe(SO4)2·6H2O (1 mM), and sodium ascorbate (2.5 mM) in 50 mM HEPES (pH 7.5). Chloride is provided by the enzyme storage buffer (NaCl) and the lysine hydrochloride substrate. The reaction was quenched after 60 min by transfer into 3 mL of 3 M HCl in methanol, followed by incubation at 50 °C for 3 h. Samples were then cooled at room temperature for 1 h before neutralization by dropwise addition of 10 M NaOH. Precipitate was removed by centrifugation at 10,000 × g for 10 min. The [13C6,15N2]-5,5-dichlorolysine methyl ester was purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column as described above. Samples were resuspended in D2O for NMR analysis on a Bruker 900 MHz instrument and for LC/QTOF-MS analysis. High-resolution ESI-MS [M+H]+: calculated for C¹³C₆H₁₅³⁵Cl₂¹⁵N₂O₂⁺ m/z 237.0647, found m/z 237.0647; calculated for C¹³C₆H₁₅³⁵Cl³⁷Cl¹⁵N₂O₂⁺ m/z 239.0618, found m/z 239.0618; calculated for C¹³C₆H₁₅³⁷Cl₂¹⁵N₂O₂⁺ m/z 241.0588, found m/z 241.0587.
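Dichlorinated products appear as a characteristic M, M+2, M+4 triplet because each chlorine is ³⁵Cl or ³⁷Cl with natural abundances of about 75.76% and 24.24%. The sketch below predicts positions and relative heights of those peaks from a binomial distribution over the number of ³⁷Cl atoms. It ignores the isotopes of the other elements, and the all-³⁵Cl input m/z (237.06471) is the calculated value for [13C6,15N2]-5,5-dichlorolysine methyl ester to five decimals; the helper name is illustrative.

```python
"""Chlorine isotope pattern (M, M+2, M+4, ...) for an ion with n chlorine atoms (sketch)."""
from math import comb

P35, P37 = 0.7576, 0.2424                  # natural abundances of 35Cl and 37Cl
DELTA_CL = 36.96590259 - 34.96885268       # mass added per 37Cl substitution (u)


def chlorine_pattern(n_cl: int, mono_mz: float, charge: int = 1) -> list[tuple[float, float]]:
    """(m/z, intensity relative to the tallest peak) for k = 0..n_cl 37Cl atoms.

    Only chlorine isotopes are modeled; 13C/15N labels and natural 13C are ignored.
    """
    probs = [comb(n_cl, k) * P35 ** (n_cl - k) * P37 ** k for k in range(n_cl + 1)]
    top = max(probs)
    return [(mono_mz + k * DELTA_CL / charge, p / top) for k, p in enumerate(probs)]


for mz, rel in chlorine_pattern(2, 237.06471):
    print(f"m/z {mz:.4f}  relative intensity {rel:.3f}")
# m/z 237.0647  relative intensity 1.000
# m/z 239.0618  relative intensity 0.640
# m/z 241.0588  relative intensity 0.102
```

The predicted positions match the three calculated m/z values reported for this compound, and the roughly 1 : 0.64 : 0.10 intensity ratio is what distinguishes a dichlorinated product from a monochlorinated one (about 1 : 0.32) at a glance.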

[0136] A constant-time 1H-13C HSQC spectrum was recorded with carrier frequencies set to 4.7 ppm (1H) and 40 ppm (13C) and spectral widths set to 16 ppm (1H) and 80 ppm (13C). The recycle delay was set to 1.5 s. An HCCH-COSY spectrum was recorded with carrier frequencies set to 4.7 ppm (1H) and 39 ppm (13C) and spectral widths of 14 ppm (1H) and 75 ppm (13C). HCCH-COSY spectra depend on neighboring 1H-13C pairs to resolve resonances, so the experiment fails when a carbon neighbor has no attached protons, as was the case for the δ position of 5,5-dichlorolysine. To confirm the resonance of Cδ, a 2D 1H-13C long-range HCCH experiment (Bruker pulse sequence hcchetgplr) was used. The carrier frequencies were set to 4.7 ppm (1H) and 50 ppm (13C), and the spectral widths were set to 14 ppm (1H) and 140 ppm (13C). The recycle delay was 2.0 s. The methyl ester is formed by quenching with unlabeled methanol and does not appear under these data collection parameters. 1H NMR (900 MHz, D2O): δ 4.32 (Hα), 3.82 (Hε), 2.60 (Hγ1), 2.50 (Hγ2), 2.50 (Hβ1), 2.44 (Hβ2). 13C NMR (900 MHz, D2O): δ 88.84 (Cδ), 53.98 (Cε), 53.33 (Cα), 41.66 (Cγ), 26.56 (Cβ).

[0137] Preparation, purification, and NMR analysis of [13C6,15N2]-4,4-dichlorolysine methyl ester

[0138] [13C6,15N2]-4,4-dichlorolysine methyl ester was prepared in a 400 µL reaction containing HalC from L. anisa (LaHalC, 70 µM), [13C6,15N2]-L-lysine·2HCl (1 mM), αKG (5 mM), (NH4)2Fe(SO4)2·6H2O (1 mM), and sodium ascorbate (2.5 mM) in 50 mM HEPES (pH 7.5). Chloride is provided by the enzyme storage buffer (NaCl) and the lysine hydrochloride substrate. The reaction was quenched after 4 h by transfer into 3 mL of 3 M HCl in methanol, followed by incubation at 50 °C for 3 h. Samples were then cooled at room temperature for 1 h before neutralization by dropwise addition of 10 M NaOH. Precipitate was removed by centrifugation at 10,000 × g for 10 min. The [13C6,15N2]-4,4-dichlorolysine methyl ester was purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column as described above. Samples were resuspended in deuterated methanol for NMR analysis on a Bruker 900 MHz instrument and for LC/QTOF-MS analysis. High-resolution ESI-MS [M+H]+: calculated for C¹³C₆H₁₅³⁵Cl₂¹⁵N₂O₂⁺ m/z 237.0647, found m/z 237.0651; calculated for C¹³C₆H₁₅³⁵Cl³⁷Cl¹⁵N₂O₂⁺ m/z 239.0618, found m/z 239.0621; calculated for C¹³C₆H₁₅³⁷Cl₂¹⁵N₂O₂⁺ m/z 241.0588, found m/z 241.0591.
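Because the methyl-ester carbon comes from unlabeled methanol during the quench, only the amino-acid carbons and nitrogens carry heavy isotopes. One consistency check a reader could apply (an illustration, not a step described in this document) is the expected mass offset between a labeled ion and its unlabeled counterpart, which confirms that a detected species derives from the labeled substrate. The function name below is illustrative.

```python
"""Expected mass offset between isotope-labeled and unlabeled species (sketch)."""

D13C = 13.0033548378 - 12.0            # extra mass per 13C (u)
D15N = 15.0001088982 - 14.0030740048   # extra mass per 15N (u)


def label_shift(n_13c: int, n_15n: int) -> float:
    """Mass increase for a species carrying n_13c 13C and n_15n 15N labels."""
    return n_13c * D13C + n_15n * D15N


# Substrate labels used in these preparations; the ester carbon adds no shift.
labels = {
    "[13C6,15N2]-lysine": (6, 2),
    "[13C5,15N2]-ornithine": (5, 2),
    "[13C6]-leucine": (6, 0),
    "[13C6,15N]-isoleucine": (6, 1),
}
for name, (c, n) in labels.items():
    print(f"{name}: +{label_shift(c, n):.4f} u")
# +8.0142, +7.0108, +6.0201, +7.0172
```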

[0139] A constant-time HSQC experiment was recorded with carrier frequencies of 4.7 ppm (1H) and 40 ppm (13C) and spectral widths of 16 ppm (1H) and 80 ppm (13C). The recycle delay was set to 2 s. An HCCH-COSY experiment was recorded with carrier frequencies of 4.7 ppm (1H) and 39 ppm (13C) and spectral widths of 14 ppm (1H) and 75 ppm (13C). Because Cγ lacks attached protons, a 1,1-ADEQUATE experiment was used to assign the Cγ shift through the correlation between Hδ and Cγ. The experiment was recorded with carrier frequencies of 4.67 ppm (1H) and 105 ppm (13C) and spectral widths of 16 ppm (1H) and 200 ppm (13C). The recycle delay was set to 1.5 s. A 2D 1H-13Cα plane of a 3D HCACO experiment (Bruker pulse sequence hcacogp3d) was used to unambiguously distinguish the Hα resonance from the other signals. The experiment was recorded with carrier frequencies set to 4.7 ppm (1H) and 50 ppm (13C) and spectral widths set to 16 ppm (1H) and 60 ppm (13C). The recycle delay was set to 1.5 s. 1H NMR (900 MHz, deuterated methanol): δ 3.86 (Hα), 3.33 (Hε), 2.96 (Hβ1), 2.79 (Hδ1), 2.68 (Hδ2), 2.51 (Hβ2). 13C NMR (900 MHz, deuterated methanol): δ 82.72 (Cγ), 53.12 (Cα), 52.83 (Cβ), 46.01 (Cδ), 37.32 (Cε).

[0140] Preparation, purification, and NMR analysis of [13C5,15N2]-4-Cl-ornithine methyl ester

[0141] [13C5,15N2]-4-Cl-ornithine methyl ester was prepared in a 400 µL reaction containing HalD from P. kilonensis (PkHalD, 70 µM), [13C5,15N2]-L-ornithine·HCl (0.5 mM), αKG (5 mM), (NH4)2Fe(SO4)2·6H2O (1 mM), and sodium ascorbate (2.5 mM) in 50 mM HEPES (pH 7.5). Chloride is provided by the enzyme storage buffer (NaCl) and the ornithine hydrochloride substrate. The reaction was quenched after 60 min by transfer into 3 mL of 3 M HCl in methanol, followed by incubation at 50 °C for 3 h. Samples were then cooled at room temperature for 1 h before neutralization by dropwise addition of 10 M NaOH. Precipitate was removed by centrifugation at 10,000 × g for 10 min. The [13C5,15N2]-4-Cl-ornithine methyl ester was purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column as described above. Samples were resuspended in D2O for NMR analysis on a Bruker 900 MHz instrument and for LC/QTOF-MS analysis. High-resolution ESI-MS [M+H]+: calculated for C¹³C₅H₁₄³⁵Cl¹⁵N₂O₂⁺ m/z 188.0847, found m/z 188.0847; calculated for C¹³C₅H₁₄³⁷Cl¹⁵N₂O₂⁺ m/z 190.0817, found m/z 190.0817.
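The carrier positions and spectral widths in these NMR paragraphs are given in ppm; the corresponding widths in hertz scale with each nucleus's observe frequency (1 ppm of a 900 MHz ¹H frequency is 900 Hz). The sketch below assumes the nominal 900 MHz ¹H frequency stated in the text (the true spectrometer frequency differs slightly) and the IUPAC Ξ ratio of 25.145020% for ¹³C referenced to DSS. Names are illustrative.

```python
"""Convert NMR spectral widths from ppm to Hz at a nominal 900 MHz 1H frequency (sketch)."""

XI_13C = 0.25145020   # 13C/1H frequency ratio (IUPAC Xi value, DSS reference)
H_MHZ = 900.0         # nominal 1H frequency from the text


def ppm_to_hz(ppm: float, observe_mhz: float) -> float:
    """Hz = ppm x observe frequency in MHz (1 ppm of 1 MHz is 1 Hz)."""
    return ppm * observe_mhz


c_mhz = H_MHZ * XI_13C            # 13C observe frequency on this magnet
sw_h = ppm_to_hz(16, H_MHZ)       # 16 ppm 1H spectral width
sw_c = ppm_to_hz(80, c_mhz)       # 80 ppm 13C spectral width
print(f"13C ~{c_mhz:.1f} MHz; SW(1H) = {sw_h:.0f} Hz; SW(13C) = {sw_c:.0f} Hz")
# 13C ~226.3 MHz; SW(1H) = 14400 Hz; SW(13C) = 18104 Hz
```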

[0142] A constant-time HSQC experiment was recorded with carrier frequencies of 4.7 ppm (1H) and 40 ppm (13C) and spectral widths of 16 ppm (1H) and 80 ppm (13C). The recycle delay was set to 1.5 s. An HCCH-COSY spectrum was recorded with carrier frequencies set to 4.7 ppm (1H) and 39 ppm (13C) and spectral widths of 14 ppm (1H) and 80 ppm (13C). The recycle delay was set to 2 s. The methyl ester was formed by quenching with unlabeled methanol and does not appear under these data collection parameters. 1H NMR (900 MHz, D2O): δ 4.69 (Hγ), 4.24 (Hα), 3.77 (Hδ1), 3.50 (Hδ2), 2.55 (Hβ1), 2.29 (Hβ2). 13C NMR (900 MHz, D2O): δ 51.5 (Cγ), 48.8 (Cδ), 46.1 (Cα), 32.3 (Cβ).

[0143] Preparation, purification, and NMR analysis of [13C6]-4-Cl-leucine methyl ester

[0144] [13C6]-4-Cl-leucine methyl ester was prepared in a 400 µL reaction containing MBP-tagged P. sp. Root562 HalE (80 µM), [13C6]-L-leucine (1 mM), αKG (2 mM), (NH4)2Fe(SO4)2·6H2O (0.25 mM), sodium ascorbate (2 mM), sodium chloride (5 mM), and 40% (v/v) glycerol in 30 mM sodium acetate (pH 6.0). The reaction was quenched after 90 min by passage through a 10 kDa MWCO Pall Nanosep spin filter to remove protein. The flow-through was dried in a vacuum centrifuge, resuspended in 200 µL of 3 M HCl in methanol, and incubated at 50 °C for 3 h. The methyl-esterified samples were then dried in a vacuum centrifuge and resuspended in 2 mL of 90% acetonitrile:10% water. Precipitate was removed by centrifugation at 10,000 × g for 10 min. The [13C6]-4-Cl-leucine methyl ester was purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column as described above. Fractions containing the compound of interest were dried in a vacuum centrifuge, resuspended in 100 µL of saturated ammonium bicarbonate in D2O, and extracted 3 times with 100 µL of chloroform-d (99.8% D) for NMR analysis on a Bruker 900 MHz instrument and for LC/QTOF-MS analysis. High-resolution ESI-MS [M+H]+: calculated for C¹³C₆H₁₅³⁵ClNO₂⁺ m/z 186.0987, found m/z 186.0988; calculated for C¹³C₆H₁₅³⁷ClNO₂⁺ m/z 188.0958, found m/z 188.0958.

[0145] A constant-time HSQC was recorded with carrier frequencies of 4.7 ppm (1H) and 50 ppm (13C) and spectral widths of 14 ppm (1H) and 100 ppm (13C). The recycle delay was set to 2 s. An HCCH-COSY spectrum was recorded with carrier frequencies of 4.7 ppm (1H) and 50 ppm (13C) and spectral widths of 14 ppm (1H) and 100 ppm (13C). The recycle delay was set to 1.8 s. To confirm the resonance of Cγ, a 2D 1H-13C long-range HCCH experiment was recorded with carrier frequencies of 5 ppm (1H) and 40 ppm (13C) and spectral widths of 14 ppm (1H) and 120 ppm (13C). The recycle delay was set to 2 s. The methyl ester was formed by quenching with unlabeled methanol and does not appear under these data collection parameters. 1H NMR (900 MHz, chloroform-d): δ 3.75 (Hα), 2.37 (Hβ1), 1.91 (Hβ2), 1.69 (Hδ1), 1.64 (Hδ2). 13C NMR (900 MHz, chloroform-d): δ 69.47 (Cγ), 52.39 (Cα), 50.32 (Cβ), 33.14 (Cδ2), 32.97 (Cδ1).

[0146] Preparation, purification, and NMR analysis of [13C6,15N]-4-Cl-isoleucine methyl ester

[0147] [13C6,15N]-4-Cl-isoleucine methyl ester was prepared in a 400 µL reaction containing P. sp. Root562 HalE (80 µM), [13C6,15N]-L-isoleucine (1 mM), αKG (2 mM), (NH4)2Fe(SO4)2·6H2O (0.25 mM), sodium ascorbate (2 mM), sodium chloride (5 mM), and 40% (v/v) glycerol in 30 mM sodium acetate (pH 6.0). The reaction was quenched after 90 min by passage through a 10 kDa MWCO Pall Nanosep spin filter to remove protein. The flow-through was dried in a vacuum centrifuge, resuspended in 200 µL of 3 M HCl in methanol, and incubated at 50 °C for 3 h. Samples were then dried in a vacuum centrifuge and resuspended in 2 mL of 90% acetonitrile:10% water. Precipitate was removed by centrifugation at 10,000 × g for 10 min. The [13C6,15N]-4-Cl-isoleucine methyl ester was purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column as described above. Fractions containing the compound of interest were dried in a vacuum centrifuge, resuspended in 100 µL of saturated ammonium bicarbonate in D2O, and extracted 3 times with 100 µL of chloroform-d (99.8% D) for NMR analysis on a Bruker 900 MHz instrument and for LC/QTOF-MS analysis. High-resolution ESI-MS [M+H]+: calculated for C¹³C₆H₁₅³⁵Cl¹⁵NO₂⁺ m/z 187.0957, found m/z 187.0959; calculated for C¹³C₆H₁₅³⁷Cl¹⁵NO₂⁺ m/z 189.0928, found m/z 189.0927.

[0148] A constant-time HSQC experiment was recorded with carrier frequencies set to 4.7 ppm (1H) and 50 ppm (13C) and spectral widths set to 14 ppm (1H) and 100 ppm (13C). The recycle delay was set to 2 s. A 2D HCCH-COSY spectrum was recorded with carrier frequencies of 4.7 ppm (1H) and 50 ppm (13C) and spectral widths of 14 ppm (1H) and 100 ppm (13C). The recycle delay was set to 2 s. The methyl ester was formed by quenching with unlabeled methanol and does not appear under these data collection parameters. [13C6,15N]-4-Cl-isoleucine methyl ester: 1H NMR (900 MHz, chloroform-d): δ 4.39 (Hγ2), 3.55 (Hα), 2.14 (Hβ), 1.49 (Hδ), 0.99 (Hγ1). 13C NMR (900 MHz, chloroform-d): δ 58.91 (Cγ2), 56.98 (Cα), 45.39 (Cβ), 20.97 (Cδ), 12.18 (Cγ1).

[13C6,15N]-isoleucine methyl ester: 1H NMR (900 MHz, chloroform-d): δ 3.37 (Hα), 1.75 (Hβ), 1.46 (Hγ2), 1.22 (Hγ2), 0.95 (Hγ1), 0.92 (Hδ). 13C NMR (900 MHz, chloroform-d): δ 58.68 (Cα), 38.72 (Cβ), 24.34 (Cγ2), 15.53 (Cγ1), 11.44 (Cδ).

[0149] Preparation, purification, and NMR analysis of 5,5-dichloronorleucine methyl ester (unlabeled)

[0150] 5,5-Dichloronorleucine methyl ester was prepared in a 500 µL reaction containing MBP-tagged P. sp. Root562 HalE (20 µM), DL-norleucine (1 mM), αKG (2 mM), (NH4)2Fe(SO4)2·6H2O (0.5 mM), and sodium ascorbate (2 mM) in 50 mM sodium acetate (pH 6.0). After 16 h, protein was removed by passage through a 10 kDa MWCO Pall Nanosep spin filter. After the flow-through was diluted to 1.5 mL with acetonitrile, 5,5-dichloronorleucine was purified on an Agilent 1200 HPLC with a SeQuant ZIC-HILIC semi-preparative column as described above. Fractions containing 5,5-dichloronorleucine were dried in a vacuum centrifuge, resuspended in 200 µL of 3 M HCl in methanol, and incubated at 50 °C for 3 h to methyl-esterify the amino acid. Samples were again dried under vacuum. The sample was then resuspended in 100 µL of saturated ammonium bicarbonate in D2O and extracted 3 times with 100 µL of chloroform-d (99.8% D) for NMR analysis on a Bruker 900 MHz instrument and for LC/QTOF-MS analysis. High-resolution ESI-MS [M+H]+: calculated for C₇H₁₄³⁵Cl₂NO₂⁺ m/z 214.0396, found m/z 214.0398; calculated for C₇H₁₄³⁵Cl³⁷ClNO₂⁺ m/z 216.0367, found m/z 216.0369; calculated for C₇H₁₄³⁷Cl₂NO₂⁺ m/z 218.0337, found m/z 218.0341.
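The coupling constants entered for these experiments (145 Hz one-bond 1H-13C in the HSQC of [0133], 10 Hz long-range in the HMBC of [0151]) translate into fixed transfer delays. The sketch below uses the standard textbook conventions, a 1/(4J) INEPT half-echo and a 1/(2J) HMBC long-range evolution delay; these conventions are assumptions stated here, not values read from the Bruker pulse-program files, and the function names are illustrative.

```python
"""Delays implied by the J values set in the HSQC (145 Hz) and HMBC (10 Hz) experiments (sketch)."""


def inept_delay_s(j_hz: float) -> float:
    """Half-echo INEPT delay, 1/(4J), for one-bond 1H-13C transfer."""
    return 1.0 / (4.0 * j_hz)


def hmbc_delay_s(j_lr_hz: float) -> float:
    """Long-range evolution delay, 1/(2J), commonly used in HMBC."""
    return 1.0 / (2.0 * j_lr_hz)


print(f"INEPT delay for 1J(CH) = 145 Hz: {inept_delay_s(145) * 1e3:.2f} ms")  # 1.72 ms
print(f"HMBC delay for nJ(CH) = 10 Hz: {hmbc_delay_s(10) * 1e3:.1f} ms")      # 50.0 ms
```

The roughly thirtyfold longer HMBC delay is why that experiment selects two- and three-bond correlations while the HSQC reports only directly bonded pairs.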

[0151] An edited 1H-13C HSQC experiment (Bruker pulse sequence hsqcedetgpsisp2.2), in which carbons with one or three protons are opposite in phase to those with two protons, was recorded with carrier frequencies set to 5 ppm (1H) and 50 ppm (13C) and spectral widths of 11 ppm (1H) and 100 ppm (13C). The recycle delay was 1.5 s. A 1H-13C HMBC experiment (Bruker pulse sequence hmbcgplpndqf) was used to correlate protons with carbons two and three bonds away via long-range proton-carbon couplings; it was optimized for a long-range proton-carbon coupling of 10 Hz. Carrier frequencies were set to 5 ppm (1H) and 50 ppm (13C), with spectral widths of 11 ppm (1H) and 100 ppm (13C). The recycle delay was set to 1.5 s. Gradient COSY and TOCSY experiments were recorded with Bruker pulse sequences cosygpqf and dipsi2etgpsi, respectively. For each experiment, the carrier frequency was set to 5 ppm, the spectral widths to 11 ppm in both dimensions, and the recycle delay to 1.5 s. The TOCSY mixing time was set to 100 ms. 1H NMR (900 MHz, chloroform-d): δ 3.51 (Hα), 2.41 (Hγ1), 2.29 (Hγ2), 2.19 (Hε), 2.16 (Hβ1), 1.93 (Hβ2). 13C NMR (900 MHz, chloroform-d): δ 89.9 (Cδ), 53.3 (Cα), 45.8 (Cγ), 32.6 (Cε), 30.8 (Cβ).

[0152] Bromination and azidation assays

[0153] Reactions (50 µL) contained L-lysine (1 mM), sodium αKG (5 mM), sodium ascorbate (5 mM), and (NH4)2Fe(SO4)2·6H2O (1 mM) in 50 mM sodium acetate buffer (pH 7, adjusted with acetic acid). The anion was provided as either sodium bromide (100 mM) or sodium azide (0.5 mM). Halogenases used for these assays were desalted into sodium acetate (25 mM, pH 7) to remove the chloride carried over from the enzyme storage buffer, which would otherwise compete with the added anion. Reactions were initiated by addition of SwHalB (30 µM final concentration) and allowed to proceed for 1 h at room temperature before quenching in 2 vol of methanol with 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF using the protocol for polar metabolite analysis above.

[0154] Lysine cyclodeaminase reactions

[0155] Reactions (25 µL) contained L-lysine·HCl (1 mM), sodium αKG (5 mM), sodium ascorbate (100 µM), (NH4)2Fe(SO4)2·6H2O (50 µM), and sodium chloride (1 mM) in 50 mM HEPES buffer (pH 7.5). Reactions were initiated by addition of SwHalB (15 µM final concentration) and allowed to proceed for 2 h at room temperature to form chlorinated lysine substrates for the lysine cyclodeaminase (RapL). Cyclodeamination was initiated by addition of 1 volume (25 µL) of cyclodeaminase solution (1.8 mM NAD+ and 14 µM RapL in 50 mM HEPES buffer, pH 7.5) and allowed to proceed for 2 h at room temperature before quenching in 2 volumes of methanol containing 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF using the protocol for polar metabolite analysis above.

[0156] Ornithine cyclodeaminase reactions

[0157] Reactions (25 µL) contained L-ornithine·HCl (1 mM), sodium αKG (5 mM), sodium ascorbate (100 µM), (NH4)2Fe(SO4)2·6H2O (50 µM), and sodium chloride (1 mM) in 50 mM HEPES buffer (pH 7.5). Reactions were initiated by addition of PkHalD (15 µM final concentration) and allowed to proceed for 2 h at room temperature to form chlorinated ornithine substrate for the ornithine cyclodeaminase (OCD).
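The halogenation mixes in [0153] to [0161] are specified by final concentrations only. As an illustration, the dilution relation C1·V1 = C2·V2 can convert those final concentrations into pipetting volumes. The Python sketch below does this for the small-molecule components of the PkHalD reaction in [0157]. The stock concentrations, the 2.5 µL held back for the enzyme that initiates the reaction, and the helper names `Component` and `pipetting_volumes` are all hypothetical choices for the example, not values or methods from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    final_uM: float  # final concentration in the reaction (µM), from [0157]
    stock_uM: float  # stock concentration (µM); HYPOTHETICAL values below


def pipetting_volumes(components, total_uL, reserve_uL=0.0):
    """Return µL of each stock (C1*V1 = C2*V2) plus water to reach total_uL.

    reserve_uL is volume held back for a later addition (e.g., the enzyme
    that initiates the reaction); it is not filled with water.
    """
    volumes = {}
    for c in components:
        if c.final_uM > c.stock_uM:
            raise ValueError(f"{c.name}: stock is more dilute than the target")
        # V1 = C2 * V2 / C1
        volumes[c.name] = c.final_uM * total_uL / c.stock_uM
    water = total_uL - reserve_uL - sum(volumes.values())
    if water < 0:
        raise ValueError("stock volumes exceed the available reaction volume")
    volumes["water"] = water
    return volumes


# Final concentrations from the PkHalD reaction in [0157]; stocks are hypothetical.
PKHALD_MIX = [
    Component("L-ornithine HCl", 1_000, 100_000),
    Component("sodium aKG", 5_000, 100_000),
    Component("sodium ascorbate", 100, 10_000),
    Component("Fe(II) salt", 50, 5_000),
    Component("NaCl", 1_000, 100_000),
    Component("HEPES pH 7.5", 50_000, 1_000_000),
]

if __name__ == "__main__":
    mix = pipetting_volumes(PKHALD_MIX, total_uL=25.0, reserve_uL=2.5)
    for name, vol in mix.items():
        print(f"{name:>18}: {vol:6.2f} µL")
```

With these assumed stocks, several components come out at 0.25 µL; in practice a concentrated master mix or intermediate dilutions would likely be prepared rather than pipetting sub-microliter volumes.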

Cyclodeamination was initiated by addition of 1 volume (25 µL) of cyclodeaminase solution (1.8 mM NAD+ and 20 µM OCD in 50 mM HEPES buffer, pH 7.5) and allowed to proceed for 2 h at room temperature before quenching in 2 volumes of methanol containing 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF using the protocol for polar metabolite analysis above.

[0158] Lysine decarboxylase reactions

[0159] Reactions (50 µL) contained L-lysine·HCl (1 mM), sodium αKG (5 mM), sodium ascorbate (200 µM), (NH4)2Fe(SO4)2·6H2O (100 µM), and sodium chloride (1 mM) in 50 mM HEPES buffer (pH 7.5). Reactions were initiated by addition of SwHalB (30 µM final concentration) and allowed to proceed for 2 h at room temperature to form chlorinated lysine substrates for the lysine decarboxylase (LDC). Decarboxylation was initiated by addition of 15 µL of LDC (48 µM) and allowed to proceed for 2 h at room temperature before quenching in 2 volumes of methanol containing 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF using the protocol for polar metabolite analysis above.

[0160] Aliphatic amino acid transaminase reactions

[0161] Reactions (50 µL) contained DL-norleucine (1 mM), sodium αKG (5 mM), sodium ascorbate (200 µM), (NH4)2Fe(SO4)2·6H2O (100 µM), and sodium chloride (1 mM) in 50 mM HEPES buffer (pH 7.5). Reactions were initiated by addition of SrHalE (10 µM final concentration) and allowed to proceed for 1.5 h at room temperature to form chlorinated norleucine substrates for the aliphatic amino acid aminotransferase (IlvE). Transamination was initiated by addition of 15 µL of IlvE (370 µM) and allowed to proceed for 1 h at room temperature before quenching in 2 volumes of methanol containing 1% (v/v) formic acid. Samples were then analyzed by LC/MS on an Agilent 1290 UPLC-6530 QTOF (negative ion mode) using a Phenomenex (Torrance, CA) Rezex ROA-Organic Acid H+ column (150 × 4.6 mm) with a Carbo-H+ SecurityGuard cartridge. In this isocratic method, the sample was eluted over 10 min with 0.5% (v/v) formic acid.

[0162] In vitro transcription/translation (IVTT) of a chlorinated peptide

[0163] The IVTT reactions were performed using the PURExpress Δ (aa, tRNA) kit (NEB), which supplies the amino acids and tRNAs separately from the other reaction components. Initial halogenation reactions (4.2 µL) contained the 20 canonical amino acids (1 mM), sodium αKG (1 mM), sodium ascorbate (167 µM), (NH4)2Fe(SO4)2·6H2O (167 µM), and NaCl (0.5 mM) in 33 mM HEPES buffer (pH 7.5). Halogenation was initiated by addition of PfHalA or SwHalB (83 µM final concentration) and allowed to proceed for 40 min at room temperature. IVTT was then initiated by addition of the remaining PURExpress kit components along with the plasmid encoding PepC (7 ng/µL final concentration), resulting in an approximately 3-fold dilution of the initial halogenation reaction components. The halogenases were omitted from control reactions as noted in the data. The final reactions (14 µL) were incubated at 37 °C for 4 h. Following dilution with 25 µL of 10 mM magnesium acetate, protein components were removed from the reaction by passage through a Pall Nanosep 10 kDa spin column. The flow-through was analyzed on an Agilent 1290 UPLC with a Poroshell 120 SB-Aq column (2.7 µm, 2.1 × 50 mm; Agilent) using a linear gradient from 0 to 100% acetonitrile over 5 min at a flow rate of 0.6 mL/min, with 0.1% (v/v) formic acid in the mobile phase. Mass spectra were acquired using an Agilent 6530 QTOF.

[0164] REFERENCES

[0165] The following references are herein incorporated by reference in their entirety with the exception that, should the scope and meaning of a term conflict with a definition explicitly set forth herein, the definition explicitly set forth herein controls:

Adams, PD, et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr. 66: 213–21 (2010).

Agarwal, V, et al. Enzymatic halogenation and dehalogenation reactions: pervasive and mechanistically diverse. Chem. Rev. 117: 5619–5674 (2017).

Aik, W, et al. Role of the jelly-roll fold in substrate binding by 2-oxoglutarate oxygenases. Curr. Opin. Struct. Biol. 22: 691–700 (2012).

Altschul, SF, et al. Basic local alignment search tool. J Mol Biol. 215: 403–10 (1990).

Amorim Franco, TM, et al. Chemical mechanism of the branched-chain aminotransferase IlvE from Mycobacterium tuberculosis. Biochemistry. 55: 6295–6303 (2016).

Bister, B, et al. Bromobalhimycin and chlorobromobalhimycins--illuminating the potential of halogenases in glycopeptide antibiotic biosyntheses. ChemBioChem 4: 658–662 (2003).

Blasiak, LC, et al. Crystal structure of the non-haem iron halogenase SyrB2 in syringomycin biosynthesis. Nature. 440: 368–371 (2006).

Bollinger, JM, et al. Mechanisms of 2-oxoglutarate-dependent oxygenases: the hydroxylation paradigm and beyond. In 2-Oxoglutarate-Dependent Oxygenases 95–122 (Royal Society of Chemistry, London, 2015).

Challis, GL, et al. Predictive, structure-based model of amino acid recognition by nonribosomal peptide synthetase adenylation domains. Chem. Biol. 7: 211–224 (2000).

Chang, W, et al. Mechanism of the C5 stereoinversion reaction in the biosynthesis of carbapenem antibiotics. Science 343: 1140–1144 (2014).

Chen, K, et al. Enzymatic construction of highly strained carbocycles. Science 360: 71–75 (2018).

Chung & Vanderwal. Stereoselective halogenation in natural product synthesis. Angew. Chemie Int. Ed. 55: 4396–4434 (2016).

Cock, PJA, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25: 1422–3 (2009).

Cowtan, K. The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Crystallogr Sect D Biol Crystallogr. 62: 1002–11 (2006).

Cresswell, AJ, et al. Catalytic, stereospecific syn-dichlorination of alkenes. Nat. Chem. 7: 146–152 (2015).

Crooks, GE, et al. WebLogo: a sequence logo generator. Genome Res. 14: 1188–90 (2004).

de Meijere & Diederich. Metal-Catalyzed Cross-Coupling Reactions. (Wiley-VCH, Weinheim, 2004).

Deb Roy, A, et al. Gene expression enabling synthetic diversification of natural products: chemogenetic generation of pacidamycin analogs. J. Am. Chem. Soc. 132: 12243–12245 (2010).

Dunwell, JM, et al. Cupins: the most functionally diverse protein superfamily? Phytochemistry 65: 7–17 (2004).

Durak, LJ, et al. Late-stage diversification of biologically active molecules via chemoenzymatic C–H functionalization. ACS Catal. 6: 1451–1454 (2016).

Edgar, RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 5: 113 (2004).

Emsley, P, et al. Features and development of Coot. Acta Crystallogr Sect D Biol Crystallogr. 66: 486–501 (2010).

Evans & Murshudov. How good are my data and what is the resolution? Acta Crystallogr D Biol Crystallogr. 69: 1204–14 (2013).

Flatt, PM, et al. Characterization of the initial enzymatic steps of barbamide biosynthesis. J. Nat. Prod. 69: 938–944 (2006).

Fu, GC. Transition-metal catalysis of nucleophilic substitution reactions: a radical alternative to SN1 and SN2 processes. ACS Cent. Sci. 3: 692–700 (2017).

Galonić, DP, et al. Halogenation of unactivated carbon centers in natural product biosynthesis: trichlorination of leucine during barbamide biosynthesis. J. Am. Chem. Soc. 128: 3900–1 (2006).

Galonić, DP, et al. Two interconverting Fe(IV) intermediates in aliphatic chlorination by the halogenase CytC3. Nat. Chem. Biol. 3: 113–116 (2007).

Gasteiger, E, et al. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 31: 3784–8 (2003).

Gatto, GJ, et al. Biosynthesis of pipecolic acid by RapL, a lysine cyclodeaminase encoded in the rapamycin gene cluster. J Am Chem Soc. 128: 3838–47 (2006).

Gerlt, JA. Genomic enzymology: web tools for leveraging protein family sequence-function space and genome context to discover novel functions. Biochemistry. 56: 4293–308 (2017).

Gibson, DG, et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 6: 343–5 (2009).

Goodman, JL, et al. Ornithine cyclodeaminase: structure, mechanism of action, and implications for the mu-crystallin family. Biochemistry. 43: 13883–13891 (2004).

Groll, M, et al. Crystal structures of Salinosporamide A (NPI-0052) and B (NPI-0047) in complex with the 20S proteasome reveal important consequences of beta-lactone ring opening and a mechanism for irreversible binding. J. Am. Chem. Soc. 128: 5136–5141 (2006).

Hartshorn, SR. Aliphatic Nucleophilic Substitution. (Cambridge University Press, London, 1973).

Hillwig & Liu. A new family of iron-dependent halogenases acts on freestanding substrates. Nat. Chem. Biol. 10: 6–10 (2014).

Hillwig, ML, et al. Discovery of a promiscuous non-heme iron halogenase in ambiguine alkaloid biogenesis: implication for an evolvable enzyme family for late-stage halogenation of aliphatic carbons in small molecules. Angew. Chemie Int. Ed. 55: 5780–5784 (2016).

Holm & Laakso. Dali server update. Nucleic Acids Res. 44: W351–5 (2016).

Huang, Y, et al. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 26: 680–2 (2010).

Jones, DT, et al. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 8: 275–82 (1992).

Junker, A, et al. Diverse modifications of the 4-methylphenyl moiety of TAK-779 by late-stage Suzuki–Miyaura cross-coupling. Org. Biomol. Chem. 12: 177–186 (2014).

Kabsch, W. XDS. Acta Crystallogr D Biol Crystallogr. 66: 125–32 (2010).

Kal & Que. Dioxygen activation by nonheme iron enzymes with the 2-His-1-carboxylate facial triad that generate high-valent oxoiron oxidants. J. Biol. Inorg. Chem. 22: 339–365 (2017).

Kan, SBJ, et al. Directed evolution of cytochrome c for carbon–silicon bond formation: Bringing silicon to life. Science. 354: 1048–1051 (2016).

Kan, SBJ, et al. Genetically programmed chiral organoborane synthesis. Nature. 552: 132 (2017).

Kulik & Drennan. Substrate placement influences reactivity in non-heme Fe(II) halogenases and hydroxylases. J. Biol. Chem. 288: 11233–41 (2013).

Kumar, S, et al. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Battistuzzi FU, editor. Mol Biol Evol. 35: 1547–9 (2018).

Larsson, A. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics. 30: 3276–8 (2014).

Li, C, et al. Efficient synthesis of sterically hindered arenes bearing acyclic secondary alkyl groups by Suzuki-Miyaura cross-couplings. Angew. Chem. Int. Ed. Engl. 54: 3792–3796 (2015).

Liu, W, et al. Oxidative aliphatic C-H fluorination with fluoride ion catalyzed by a manganese porphyrin. Science. 337: 1322–1325 (2012).

Marchand, JA, et al. Discovery of a pathway for terminal-alkyne amino acid biosynthesis. Nature. 567: 420–4 (2019).

Martin & Buchwald. Palladium-catalyzed Suzuki-Miyaura cross-coupling reactions employing dialkylbiaryl phosphine ligands. Acc. Chem. Res. 41: 1461–1473 (2008).

Martinie, RJ, et al. Experimental correlation of substrate position with reaction outcome in the aliphatic halogenase, SyrB2. J. Am. Chem. Soc. 137: 6912–6919 (2015).

Matthews, ML, et al. Direct nitration and azidation of aliphatic carbons by an iron-dependent halogenase. Nat. Chem. Biol. 10: 209–215 (2014).

Matthews, ML, et al. Substrate positioning controls the partition between halogenation and hydroxylation in the aliphatic halogenase, SyrB2. PNAS USA. 106: 17723–17728 (2009).

Matthews, ML, et al. Substrate-Triggered Formation and Remarkable Stability of the C-H Bond-Cleaving Chloroferryl Intermediate in the Aliphatic Halogenase, SyrB2. Biochemistry. 48: 4331–4343 (2009).

McCusker & Klinman. Modular behavior of TauD provides insight into the origin of specificity in alpha-ketoglutarate-dependent nonheme iron oxygenases. PNAS USA. 106: 19791–19795 (2009).

Mitchell, AJ, et al. Structural basis for halogenation by iron- and 2-oxo-glutarate-dependent enzyme WelO5. Nat. Chem. Biol. 12: 636–640 (2016).

Mitchell, AJ, et al. Structure-guided reprogramming of a hydroxylase to halogenate its small molecule substrate. Biochemistry. 56: 441–444 (2017).

Nakamura, H, et al. A new strategy for aromatic ring alkylation in cylindrocyclophane biosynthesis. Nat. Chem. Biol. 13: 916–921 (2017).

Neugebauer, ME, et al. A family of radical halogenases for the engineering of amino-acid-based products. Nat. Chem. Biol. 15: 1009–1016 (2019).

Neumann, CS, et al. Halogenation strategies in natural product biosynthesis. Chem. Biol. 15: 99–109 (2008).

Nicolaou, KC, et al. Palladium-catalyzed cross-coupling reactions in total synthesis. Angew. Chemie Int. Ed. 44: 4442–4489 (2005).

Nissen, P, et al. The structural basis of ribosome activity in peptide bond synthesis. Science. 289: 920–930 (2000).

Nyffeler, PT, et al. The chemistry of amine-azide interconversion: catalytic diazotransfer and regioselective azide reduction. J. Am. Chem. Soc. 124: 10773–10778 (2002).

O’Hagan, D. Understanding organofluorine chemistry. An introduction to the C–F bond. Chem. Soc. Rev. 37: 308–319 (2008).

Ortega & van der Donk. New insights into the biosynthetic logic of ribosomally synthesized and post-translationally modified peptide natural products. Cell Chem. Biol. 23: 31–44 (2016).

Pandurangan, AP, et al. The SUPERFAMILY 2.0 database: a significant proteome update and a new webserver. Nucleic Acids Res. 47: D490–D494 (2019).

Park, H, et al. Controlling Pd(IV) reductive elimination pathways enables Pd(II)-catalysed enantioselective C(sp3)-H fluorination. Nat. Chem. 10: 755–762 (2018).

Payne, JT, et al. Enantioselective desymmetrization of methylenedianilines via enzyme-catalyzed remote halogenation. J. Am. Chem. Soc. 140: 546–549 (2018).

Price, JC, et al. Kinetic dissection of the catalytic mechanism of taurine:α-ketoglutarate dioxygenase (TauD) from Escherichia coli. Biochemistry. 44: 8138–8147 (2005).

Puri, M, et al. Modeling non-heme iron halogenases: high-spin oxoiron(IV)–halide complexes that halogenate C–H bonds. J. Am. Chem. Soc. 138: 2484–2487 (2016).

Quinn, RK, et al. Site-selective aliphatic C–H chlorination using N-chloroamides enables a synthesis of chlorolissoclimide. J. Am. Chem. Soc. 138: 696–702 (2016).

Roughley & Jordan. The medicinal chemist’s toolbox: an analysis of reactions used in the pursuit of drug candidates. J. Med. Chem. 54: 3451–3479 (2011).

Rugg & Senn. Formation and structure of the ferryl [Fe=O] intermediate in the non-haem iron halogenase SyrB2: classical and QM/MM modelling agree. Phys. Chem. Chem. Phys. 19: 30107–30119 (2017).

Runguphan, W, et al. Integrating carbon-halogen bond formation into medicinal plant metabolism. Nature. 468: 461–464 (2010).

Savile, CK, et al. Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science. 329: 305–309 (2010).

Shimizu, Y, et al. Cell-free translation reconstituted with purified components. Nat. Biotechnol. 19: 751–755 (2001).

Sievers, F, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 7: 539 (2011).

Skubák & Pannu. Automatic protein structure solution from weak X-ray data. Nat Commun. 4: 2777 (2013).

Sletten & Bertozzi. Bioorthogonal chemistry: fishing for selectivity in a sea of functionality. Angew. Chem. Int. Ed. Engl. 48: 6974–6998 (2009).

Srnec & Solomon. Frontier molecular orbital contributions to chlorination versus hydroxylation selectivity in the non-heme iron halogenase SyrB2. J. Am. Chem. Soc. 139: 2396–2407 (2017).

Takatsuka, Y, et al. Gene cloning and molecular characterization of lysine decarboxylase from Selenomonas ruminantium delineate its evolutionary relationship to ornithine decarboxylases from eukaryotes. J. Bacteriol. 182: 6732–6741 (2000).

Ueki, M, et al. Enzymatic generation of the antimetabolite γ,γ-dichloroaminobutyrate by NRPS and mononuclear iron halogenase action in a streptomycete. Chem. Biol. 13: 1183–1191 (2006).

Vaillancourt, FH, et al. Cryptic chlorination by a non-haem iron enzyme during cyclopropyl amino acid biosynthesis. Nature. 436: 1191–1194 (2005).

Vaillancourt, FH, et al. SyrB2 in syringomycin E biosynthesis is a nonheme Fe(II) alpha-ketoglutarate- and O2-dependent halogenase. PNAS USA. 102: 10111–10116 (2005).

Wendisch, VF, et al. Biotechnological production of mono- and diamines using bacteria: recent progress, applications, and perspectives. Appl. Microbiol. Biotechnol. 102: 3583–3594 (2018).

Winn, MD, et al. Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr. 67: 235–42 (2011).

Wong, C, et al. Structural analysis of an open active site conformation of nonheme iron halogenase CytC3. J. Am. Chem. Soc. 131: 4872–4879 (2009).

Wong, SD, et al. Elucidation of the Fe(IV)=O intermediate in the catalytic cycle of the halogenase SyrB2. Nature. 499: 320–323 (2013).

Yeh, E, et al. Robust in vitro activity of RebF and RebH, a two-component reductase/halogenase, generating 7-chlorotryptophan during rebeccamycin biosynthesis. PNAS USA. 102: 3960–3965 (2005).

Zhang, Z, et al. Crystal structure of a clavaminate synthase-Fe(II)-2-oxoglutarate-substrate-NO complex: evidence for metal centered rearrangements. FEBS Lett. 517: 7–12 (2002).

[0166] All scientific and technical terms used in this application have meanings commonly used in the art unless otherwise specified.

[0167] As used herein, the terms “subject”, “patient”, and “individual” are used interchangeably to refer to humans and non-human animals. The terms “non-human animal” and “animal” refer to all non-human vertebrates, e.g., non-human mammals and non-mammals, such as non-human primates, horses, sheep, dogs, cows, pigs, chickens, and other veterinary subjects and test animals. In some embodiments, the subject is a mammal. In some embodiments, the subject is a human.

[0168] As used herein, the term “diagnosing” refers to the physical and active step of informing, i.e., communicating verbally or in writing (on, e.g., paper or electronic media), another party, e.g., a patient, of the diagnosis. Similarly, “providing a prognosis” refers to the physical and active step of informing, i.e., communicating verbally or in writing (on, e.g., paper or electronic media), another party, e.g., a patient, of the prognosis.

[0169] The use of the singular can include the plural unless specifically stated otherwise. As used in the specification and the appended claims, the singular forms “a”, “an”, and “the” can include plural referents unless the context clearly dictates otherwise.

[0170] As used herein, “and/or” means “and” or “or”. For example, “A and/or B” means “A, B, or both A and B” and “A, B, C, and/or D” means “A, B, C, D, or a combination thereof”, and said “A, B, C, D, or a combination thereof” means any subset of A, B, C, and D, for example, a single-member subset (e.g., A or B or C or D), a two-member subset (e.g., A and B; A and C; etc.), a three-member subset (e.g., A, B, and C; or A, B, and D; etc.), or all four members (e.g., A, B, C, and D).

[0171] As used herein, the phrase “one or more of”, e.g., “one or more of A, B, and/or C”, means “one or more of A”, “one or more of B”, “one or more of C”, “one or more of A and one or more of B”, “one or more of B and one or more of C”, “one or more of A and one or more of C”, and “one or more of A, one or more of B, and one or more of C”.

[0172] The phrase “comprises, consists essentially of, or consists of A” is used as a tool to avoid excess page and translation fees and means that in some embodiments the given thing at issue comprises A, consists essentially of A, or consists of A. For example, the sentence “In some embodiments, the composition comprises, consists essentially of, or consists of A” is to be interpreted as if written as the following three separate sentences: “In some embodiments, the composition comprises A. In some embodiments, the composition consists essentially of A. In some embodiments, the composition consists of A.”

[0173] Similarly, a sentence reciting a string of alternates is to be interpreted as if a string of sentences were provided such that each given alternate was provided in a sentence by itself. For example, the sentence “In some embodiments, the composition comprises A, B, or C” is to be interpreted as if written as the following three separate sentences: “In some embodiments, the composition comprises A. In some embodiments, the composition comprises B. In some embodiments, the composition comprises C.” As another example, the sentence “In some embodiments, the composition comprises at least A, B, or C” is to be interpreted as if written as the following three separate sentences: “In some embodiments, the composition comprises at least A. In some embodiments, the composition comprises at least B. In some embodiments, the composition comprises at least C.”

[0174] As used herein, the terms “protein”, “polypeptide”, and “peptide” are used interchangeably to refer to two or more amino acids linked together. Groups or strings of amino acid abbreviations are used to represent peptides. Except when specifically indicated, peptides are indicated with the N-terminus on the left and the sequences are written from the N-terminus to the C-terminus. Similarly, except when specifically indicated, nucleic acid sequences are indicated with the 5’ end on the left and the sequences are written from 5’ to 3’.

[0175] As used herein, a given percentage of “sequence identity” refers to the percentage of nucleotides or amino acid residues that are the same between sequences when compared and optimally aligned for maximum correspondence over a given comparison window, as measured by visual inspection or by a sequence comparison algorithm in the art, such as the BLAST algorithm, which is described in Altschul et al. (1990) J Mol Biol 215:403–410. Software for performing BLAST (e.g., BLASTP and BLASTN) analyses is publicly available through the National Center for Biotechnology Information (ncbi.nlm.nih.gov). The comparison window can span a given portion of one or both sequences, e.g., a functional domain or an arbitrarily selected number of contiguous nucleotides or amino acid residues. Alternatively, the comparison window can span the full length of the sequences being compared. For purposes herein, where a given comparison window (e.g., over 80% of the given sequence) is not provided, the recited sequence identity is over 100% of the given sequence.

Additionally, for the percentages of sequence identity of the proteins provided herein, the percentages are determined using BLASTP 2.8.0+, scoring matrix BLOSUM62, and the default parameters available at blast.ncbi.nlm.nih.gov/Blast.cgi. See also Altschul, et al. (1997) Nucleic Acids Res 25:3389–3402; and Altschul, et al. (2005) FEBS J 272:5101–5109.
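To make the arithmetic in [0175] concrete, the Python sketch below counts identical, non-gap columns in two sequences that have already been aligned (with "-" marking a gap) and divides by either the ungapped length of the reference sequence, which corresponds to identity taken over 100% of the given sequence, or the total number of alignment columns. It is a simplified illustration, not BLASTP 2.8.0+ itself: it performs no alignment or scoring, and the function name `percent_identity` and its `over` parameter are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_other: str, over: str = "reference") -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    over="reference": identical columns / ungapped length of aligned_ref
    over="alignment": identical columns / number of alignment columns
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for a, b in zip(aligned_ref, aligned_other) if a == b and a != "-")
    if over == "reference":
        denominator = len(aligned_ref.replace("-", ""))
    elif over == "alignment":
        denominator = len(aligned_ref)
    else:
        raise ValueError("over must be 'reference' or 'alignment'")
    return 100.0 * identical / denominator


if __name__ == "__main__":
    ref, other = "MKT-AYIAK", "MKTGAYIAR"
    print(percent_identity(ref, other))               # over the 8-residue reference
    print(percent_identity(ref, other, "alignment"))  # over all 9 columns
```

For the toy pair shown, 7 columns are identical, giving 87.5% over the 8-residue reference but about 77.8% over the 9 alignment columns, which illustrates why the comparison window must be stated.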

[0176] Unless specified herein, optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv Appl Math 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J Mol Biol 48:443 (1970), by the search for similarity method of Pearson & Lipman, PNAS USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by visual inspection.

[0177] The sequences referenced by accession number are herein incorporated by reference in their entirety.
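The Needleman & Wunsch global alignment cited in [0176] fills a dynamic-programming table in which each cell holds the best score for aligning two sequence prefixes, then traces back from the final cell to recover one optimal alignment. The Python sketch below uses a hypothetical linear scoring scheme (match +1, mismatch -1, gap -1) for readability; it is not the BLOSUM62 scoring used with BLASTP and is not the GAP or BESTFIT implementation. The function name `needleman_wunsch` is illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Scoring values are illustrative only.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

For example, aligning ACGT with AGT gives a score of 2 with ACGT placed against A-GT. A local (Smith & Waterman) variant would differ mainly by clamping cell scores at zero and tracing back from the highest-scoring cell.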

[0178] To the extent necessary to understand or complete the disclosure of the present invention, all publications, patents, and patent applications mentioned herein are expressly incorporated by reference herein to the same extent as though each were individually so incorporated.

[0179] Having thus described exemplary embodiments of the present invention, it should be noted by those skilled in the art that the within disclosures are exemplary only and that various other alternatives, adaptations, and modifications may be made within the scope of the present invention. Accordingly, the present invention is not limited to the specific embodiments as illustrated herein, but is only limited by the following claims.