
Patent Searching and Data


Title:
RECOMBINANT MICROBIAL TRANSGLUTAMINASES
Document Type and Number:
WIPO Patent Application WO/2016/170447
Kind Code:
A1
Abstract:
The present invention provides compositions and methods for recombinantly producing soluble and active microbial transglutaminases (mTGases). More particularly, the present invention relates to engineered microbial transglutaminase polypeptides comprising one or more substitutions in the pro-domain region to modulate the interaction between the pro-domain and the enzyme domain of the mTGase, and the nucleic acids encoding them. The present invention further relates to vectors and host cells comprising the engineered nucleic acids and polypeptides. The application further discloses methods of using microbial transglutaminases produced according to the methods of the disclosure.

Inventors:
RICKERT MATHIAS (US)
STROP PAVEL (US)
Application Number:
PCT/IB2016/052047
Publication Date:
October 27, 2016
Filing Date:
April 11, 2016
Assignee:
RINAT NEUROSCIENCE CORP (US)
International Classes:
C12N9/50; C12N9/10; C12N15/52; C12N15/70
Domestic Patent References:
WO1987004462A1 1987-07-30
WO2012059882A2 2012-05-10
WO2015015448A2 2015-02-05
Foreign References:
US4683195A 1987-07-28
US4800159A 1989-01-24
US4754065A 1988-06-28
US4683202A 1987-07-28
Other References:
KIKUCHI Y ET AL: "Secretion of Active-Form Streptoverticillium mobaraense Transglutaminase by Corynebacterium glutamicum: Processing of the Pro-Transglutaminase by a Cosecreted Subtilisin-Like Protease from Streptomyces albogriseolus", APPLIED AND ENVIRONMENTAL MICROBIOLOGY, AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 69, no. 1, 1 January 2003 (2003-01-01), pages 358 - 366, XP002979776, ISSN: 0099-2240, DOI: 10.1128/AEM.69.1.358-366.2003
KUN WANG ET AL: "New strategy for specific activation of recombinant microbial pro-transglutaminase by introducing an enterokinase cleavage site", BIOTECHNOLOGY LETTERS, vol. 35, no. 3, 10 November 2012 (2012-11-10), NL, pages 383 - 388, XP055276105, ISSN: 0141-5492, DOI: 10.1007/s10529-012-1090-5
DATE M ET AL: "Production of native-type Streptoverticillium mobaraense transglutaminase in Corynebacterium glutamicum", APPLIED AND ENVIRONMENTAL MICROBIOLOGY, AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 69, no. 5, 1 May 2003 (2003-05-01), pages 3011 - 3014, XP002989190, ISSN: 0099-2240, DOI: 10.1128/AEM.69.5.3011-3014.2003
HIROYA YURIMOTO ET AL: "The Pro-peptide of Streptomyces mobaraensis Transglutaminase Functions in cis and in trans to Mediate Efficient Secretion of Active Enzyme from Methylotrophic Yeasts", BIOSCIENCE BIOTECHNOLOGY BIOCHEMISTRY., vol. 68, no. 10, 1 January 2004 (2004-01-01), TOKYO, JAPAN, pages 2058 - 2069, XP055276106, ISSN: 0916-8451, DOI: 10.1271/bbb.68.2058
CHEN KANGKANG ET AL: "Enhancement of Streptomyces transglutaminase activity and pro-peptide cleavage efficiency by introducing linker peptide in the C-terminus of the pro-peptide", JOURNAL OF INDUSTRIAL MICROBIOLOGY AND BIOTECHNOLOGY, BASINGSTOKE, GB, vol. 40, no. 3, 24 January 2013 (2013-01-24), pages 317 - 325, XP035330676, ISSN: 1367-5435, [retrieved on 20130124], DOI: 10.1007/S10295-012-1221-Y
ZHANG DONGXU ET AL: "Microbial transglutaminase production: understanding the mechanism", BIOTECHNOLOGY AND GENETIC ENGINEERING REVIEWS, INTERCEPT LTD., ANDOVER, GB, vol. 26, 1 January 2010 (2010-01-01), pages 205 - 221, XP009190210, ISSN: 0264-8725
MATHIAS RICKERT ET AL: "Production of soluble and active microbial transglutaminase in Escherichia coli for site-specific antibody drug conjugation", PROTEIN SCIENCE, vol. 25, no. 2, 26 December 2015 (2015-12-26), US, pages 442 - 455, XP055276114, ISSN: 0961-8368, DOI: 10.1002/pro.2833
YOKOYAMA ET AL., APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 64, 2004, pages 447 - 454
STROP, BIOCONJUGATE CHEMISTRY, vol. 25, no. 5, 2014, pages 855 - 862
DI SANDRO ET AL., THE BIOCHEMICAL JOURNAL, vol. 429, 2010, pages 261 - 271
FOLK, ANNU REV BIOCHEM, vol. 49, 1980, pages 517 - 531
KLOCK; KHOSLA, PROTEIN SCI, vol. 21, 2012, pages 1781 - 1791
ANDO ET AL., AGRIC BIOL CHEM, vol. 53, 1989, pages 2613 - 2617
ZHANG ET AL., BIOTECHNOLOGY & GENETIC ENGINEERING REVIEWS, vol. 26, 2010, pages 205 - 222
KASHIWAGI ET AL., THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 277, 2002, pages 44252 - 44260
STROP P, BIOCONJUGATE CHEMISTRY, vol. 25, no. 5, 2014, pages 855 - 862
STROP ET AL., CHEMISTRY & BIOLOGY, vol. 20, 2013, pages 161 - 167
ZOTZEL ET AL.: "European journal of biochemistry", FEBS, vol. 270, 2003, pages 3214 - 3222
ZOTZEL ET AL.: "European journal of biochemistry", FEBS, vol. 270, 2003, pages 4149 - 4155
YURIMOTO ET AL., BIOSCIENCE, BIOTECHNOLOGY, AND BIOCHEMISTRY, vol. 68, 2004, pages 2058 - 2069
LIU ET AL., MICROBIAL CELL FACTORIES, vol. 10, 2011, pages 112
TAKEHANA ET AL., BIOSCIENCE, BIOTECHNOLOGY, AND BIOCHEMISTRY, vol. 58, 1994, pages 88 - 92
YANG ET AL., THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 286, 2011, pages 7301 - 7307
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR PRESS
"Oligonucleotide Synthesis", 1984
"Methods in Molecular Biology, Humana Press; Cell Biology: A Laboratory Notebook", 1998, ACADEMIC PRESS
"Animal Cell Culture", 1987
J.P. MATHER; P.E. ROBERTS: "Introduction to Cell and Tissue Culture", 1998, PLENUM PRESS
"Cell and Tissue Culture: Laboratory Procedures", 1993, J. WILEY AND SONS
"Methods in Enzymology", ACADEMIC PRESS, INC.
"Gene Transfer Vectors for Mammalian Cells", 1987
"Current Protocols in Molecular Biology", 1987
"PCR: The Polymerase Chain Reaction", 1994
SAMBROOK; RUSSELL: "Molecular Cloning: A Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 2002, JOHN WILEY & SONS
HARLOW; LANE: "Using Antibodies: A Laboratory Manual", 1998, COLD SPRING HARBOR LABORATORY PRESS
COLIGAN ET AL.: "Short Protocols in Protein Science", 2003, JOHN WILEY & SONS
"Short Protocols in Molecular Biology", 1999, WILEY AND SONS
YOKOYAMA ET AL., APPL MICROBIOL BIOTECHNOL, vol. 64, 2004, pages 447 - 454
DAYHOFF, M.O.: "Atlas of Protein Sequence and Structure", vol. 5, 1978, NATIONAL BIOMEDICAL RESEARCH FOUNDATION, article "A model of evolutionary change in proteins - Matrices for detecting distant relationships", pages: 345 - 358
HEIN J.: "Methods in Enzymology", vol. 183, 1990, ACADEMIC PRESS, INC., article "Unified Approach to Alignment and Phylogenies", pages: 626 - 645
HIGGINS, D.G.; SHARP, P.M., CABIOS, vol. 5, 1989, pages 151 - 153
MYERS, E.W.; MILLER W., CABIOS, vol. 4, 1988, pages 11 - 17
ROBINSON, E.D., COMB. THEOR., vol. 11, 1971, pages 105
SAITOU, N.; NEI, M., MOL. BIOL. EVOL, vol. 4, 1987, pages 406 - 425
SNEATH, P.H.A.; SOKAL, R.R.: "Numerical Taxonomy the Principles and Practice of Numerical Taxonomy", 1973, FREEMAN PRESS
WILBUR, W.J.; LIPMAN, D.J., PROC. NATL. ACAD. SCI. USA, vol. 80, 1983, pages 726 - 730
"PCR: The Polymerase Chain Reaction", 1994, BIRKAUSWER PRESS
GOEDDEL: "Gene Expression Technology: Methods in Enzymology", 1990, ACADEMIC PRESS
HOPP; WOODS, PROC. NATL. ACAD. SCI. USA, vol. 78, 1981, pages 3824
ENGSTROM, BIOCHEM. EXP. BIOL., vol. 11, 1974, pages 7 - 13
"Current Communications in Molecular Biology", 1986, COLD SPRING HARBOR LABORATORY
LOOMAN ET AL., EMBO J, vol. 6, 1987, pages 2489 - 2492
LAEMMLI, NATURE, vol. 227, 1970, pages 680 - 685
FOLK; COLE, THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 241, 1966, pages 5518 - 5525
YOKOYAMA ET AL., BIOSCIENCE, BIOTECHNOLOGY, AND BIOCHEMISTRY, vol. 64, 2000, pages 1263 - 1270
YANG ET AL., BIOSCIENCE, BIOTECHNOLOGY, AND BIOCHEMISTRY, vol. 73, 2009, pages 2531 - 2534
MARX ET AL., JOURNAL OF BIOTECHNOLOGY, vol. 136, 2008, pages 156 - 162
SOMMER ET AL., PROTEIN EXPRESSION AND PURIFICATION, vol. 77, 2011, pages 9 - 19
ZOTZEL ET AL., EUROPEAN JOURNAL OF BIOCHEMISTRY / FEBS, vol. 270, 2003, pages 4149 - 4155
MARX ET AL., ENZYME MICROB TECHNOL, vol. 40, 2007, pages 1543 - 1550
YANG ET AL., THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 286, 2011, pages 7301 - 7307
PASTERNACK ET AL.: "European journal of biochemistry", FEBS, vol. 257, 1998, pages 570 - 576
JEGER ET AL., ANGEWANDTE CHEMIE, vol. 49, 2010, pages 9995 - 9997
Attorney, Agent or Firm:
WALDRON, Roy F. (235 East 42nd Street, MS 235/9/S2, New York, New York, US)
Claims:
We claim:

1. A method of producing a microbial transglutaminase, comprising:

preparing a nucleic acid comprising a modified microbial transglutaminase gene and a nucleic acid comprising a protease gene; wherein the modified microbial transglutaminase gene encodes a polypeptide comprising at least one amino acid substitution within a pro-domain region of the microbial transglutaminase; and

expressing the nucleic acid comprising a modified microbial transglutaminase gene and the nucleic acid comprising a protease gene, in a host cell.

2. The method of claim 1, wherein the pro-domain region of the microbial transglutaminase is selected from the group consisting of:

a) residues 1-44 of SEQ ID NO: 1;

b) residues 1-44 of SEQ ID NO: 32;

c) residues 1-44 of SEQ ID NO: 33;

d) residues 1-46 of SEQ ID NO: 34;

e) residues 1-44 of SEQ ID NO: 35;

f) residues 1-52 of SEQ ID NO: 36;

g) residues 1-57 of SEQ ID NO: 37;

h) residues 1-57 of SEQ ID NO: 38;

i) residues 1-57 of SEQ ID NO: 39;

j) residues 1-56 of SEQ ID NO: 40;

k) residues 1-51 of SEQ ID NO: 41;

l) residues 1-52 of SEQ ID NO: 42;

m) residues 1-57 of SEQ ID NO: 43; and

n) residues 1-53 of SEQ ID NO: 44.

3. The method of claim 2, wherein the pro-domain region of the microbial transglutaminase is residues 1-44 of SEQ ID NO: 1.

4. The method of claim 1, wherein the at least one amino acid substitution is within a conserved region of a pro-domain region of the microbial transglutaminase.

5. The method of claim 4, wherein the conserved region of the pro-domain region is selected from the group consisting of:

a) residues Tyr10 to Glu29 of SEQ ID NO: 1,

b) residues Tyr10 to Glu29 of SEQ ID NO: 32,

c) residues Tyr10 to Glu29 of SEQ ID NO: 33,

d) residues Tyr10 to Glu29 of SEQ ID NO: 34,

e) residues Tyr10 to Glu29 of SEQ ID NO: 35,

f) residues Tyr17 to Glu36 of SEQ ID NO: 36,

g) residues Tyr12 to Lys31 of SEQ ID NO: 37,

h) residues Tyr12 to Lys31 of SEQ ID NO: 38,

i) residues Tyr12 to Lys31 of SEQ ID NO: 39,

j) residues Tyr11 to Lys30 of SEQ ID NO: 40,

k) residues Tyr12 to Glu31 of SEQ ID NO: 41,

l) residues Tyr12 to Glu31 of SEQ ID NO: 42,

m) residues Tyr12 to Glu31 of SEQ ID NO: 43, and

n) residues Tyr12 to Glu31 of SEQ ID NO: 44.

6. The method of claim 5, wherein the conserved region of the pro-domain region is amino acid residues 10 to 29 of SEQ ID NO: 1.

7. The method of any one of claims 1-6, wherein the protease is 3C protease, TEV (Tobacco Etch Virus) protease, thrombin, factor Xa, enterokinase, SUMO (Small ubiquitin Modifier) protease, TVMV (Tobacco Vein Mottling Virus) protease, or TAMEP (Transglutaminase Activating Metalloprotease).

8. The method of claim 7, wherein the 3C protease gene encodes a polypeptide comprising the amino acid sequence of SEQ ID NO: 2.

9. The method of any one of claims 1-8, wherein the at least one amino acid substitution is at an amino acid residue selected from the group consisting of:

a) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 1;

b) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 32;

c) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 33;

d) any one or more of Tyr10, His14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 34;

e) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 35;

f) any one or more of Tyr17, His21, Leu23, Asp27, Val28, Asp30, Ile31, Asn32, Leu34, Asn35, or Glu36 of SEQ ID NO: 36;

g) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 37;

h) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 38;

i) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 39;

j) any one or more of Tyr11, His15, Leu17, Asp21, Val22, Asn24, Ile25, Asn26, Leu28, Asn29, or Lys30 of SEQ ID NO: 40;

k) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 41;

l) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 42;

m) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 43; and

n) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Ser25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 44.

10. The method of claim 9, wherein the at least one or more amino acid substitution is at a residue selected from the group consisting of any one or more of Tyr14, Asp20, Ile24, and Asn25 of SEQ ID NO: 1.

11. The method of any one of claims 1-10, wherein the amino acid substitution is from a wild-type amino acid to Ala, His, Lys, Phe, Met, Thr, Gln, Asp, or Glu.

12. The method of any one of claims 1-10, wherein the amino acid substitution is from a wild-type amino acid to another amino acid which is a small, acidic, charged, or polar amino acid.

13. The method of any one of claims 1-10, wherein the amino acid substitution is selected from the group consisting of:

a) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 1;

b) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 32;

c) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 33;

d) any one or more of Tyr10Ala, His14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 34;

e) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 35;

f) any one or more of Tyr17Ala, His21Ala, Leu23Ala, Asp27Ala, Val28Ala, Asp30Ala, Ile31Ala, Asn32Ala, Leu34Ala, Asn35Ala, or Glu36Ala in SEQ ID NO: 36;

g) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 37;

h) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 38;

i) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 39;

j) any one or more of Tyr11Ala, His15Ala, Leu17Ala, Asp21Ala, Val22Ala, Asn24Ala, Ile25Ala, Asn26Ala, Leu28Ala, Asn29Ala, or Lys30Ala in SEQ ID NO: 40;

k) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 41;

l) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 42;

m) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 43; and

n) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Ser25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 44.

14. The method of any one of claims 1-13, wherein the microbial transglutaminase is harvested from the host cell.

15. The method of claim 14, wherein the host cell is a bacterial cell, a yeast cell, a filamentous fungal cell, an algal cell, an insect cell, or a mammalian cell.

16. The method of claim 14, wherein the host cell is a member of the genus Clostridium, Zymomonas, Escherichia, Salmonella, Serratia, Erwinia, Klebsiella, Shigella, Rhodococcus, Pseudomonas, Bacillus, Lactobacillus, Enterococcus, Alcaligenes, Paenibacillus, Arthrobacter, Corynebacterium, Brevibacterium, Schizosaccharomyces, Kluyveromyces, Yarrowia, Pichia, Candida, Pichia, or Saccharomyces.

17. The method of claim 14, wherein the host cell is Escherichia coli, Corynebacterium glutamicum, Bacillus subtilis, Saccharomyces cerevisiae, or Pichia pastoris.

18. The method of any one of claims 1-17, wherein the microbial transglutaminase is derived from bacteria or yeast.

19. The method of claim 17, wherein the microbial transglutaminase is derived from Streptomyces mobaraensis, Streptoverticillium mobaraense, Streptomyces viridis, Streptomyces ladakanum, Streptomyces caniferus, Streptomyces platensis, Streptomyces hygroscopicus, Streptomyces netropsis, Streptomyces fradiae, Streptomyces roseoverticillatus, Streptomyces cinnamoneus, Micrococcus, Clostridium, Torulopsis, Rhizopus, Monascus, Bacillus, Saccharomyces, Candida, Cryptococcus or isolates thereof.

20. The method of any one of claims 1-19, wherein the modified microbial transglutaminase gene and the protease gene are on two separate vectors expressed together in the same cell or in different cells.

21. The method of any one of claims 1-19, wherein the modified microbial transglutaminase gene and the protease gene are on the same vector expressed together in the same cell.

22. The method of any one of claims 1-21, further comprising a purification step, wherein the microbial transglutaminase is purified by a chromatography step.

23. The method of claim 22, wherein the chromatography step is affinity chromatography, ion-exchange chromatography, hydrophobic interaction chromatography, or size-exclusion chromatography.

24. The method of claim 22, wherein the microbial transglutaminase is purified first by affinity chromatography, followed by size-exclusion chromatography.

25. The method of any one of claims 22-24, wherein the modified microbial transglutaminase gene comprises a sequence encoding a polyhistidine tag at the carboxyl terminus of the microbial transglutaminase.

26. A method of producing a microbial transglutaminase, comprising the steps of:

a) preparing a nucleic acid comprising a modified microbial transglutaminase gene and a nucleic acid comprising a 3C protease gene; wherein the modified microbial transglutaminase gene encodes a polypeptide comprising at least one amino acid substitution within the sequence set forth by residues 1-44 of SEQ ID NO: 1;

b) subcloning the nucleic acid comprising a modified microbial transglutaminase gene and the nucleic acid comprising a 3C protease gene into an expression vector;

c) transforming and culturing a host cell with the expression vector of step b); and

d) harvesting the microbial transglutaminase from the host cell.

27. The method of claim 25, wherein the 3C protease gene encodes the amino acid sequence of SEQ ID NO: 2.

28. The method of any one of claims 1-27, wherein the modified microbial transglutaminase gene encodes a polypeptide comprising a 3C protease cleavage site engineered between the pro-domain and the enzyme domain of the polypeptide.

29. The method of any one of claims 1-28, wherein the modified microbial transglutaminase gene is selected from the group consisting of SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, SEQ ID NO: 27, or SEQ ID NO: 29.

30. A vector comprising a nucleic acid comprising a modified microbial transglutaminase gene and a protease gene; wherein the modified microbial transglutaminase gene encodes a microbial transglutaminase polypeptide comprising at least one amino acid substitution within a pro-domain region of the microbial transglutaminase.

31. The vector of claim 30, wherein the pro-domain region of the transglutaminase is selected from the group consisting of:

a) residues 1-44 of SEQ ID NO: 1;

b) residues 1-44 of SEQ ID NO: 32;

c) residues 1-44 of SEQ ID NO: 33;

d) residues 1-46 of SEQ ID NO: 34;

e) residues 1-44 of SEQ ID NO: 35;

f) residues 1-52 of SEQ ID NO: 36;

g) residues 1-57 of SEQ ID NO: 37;

h) residues 1-57 of SEQ ID NO: 38;

i) residues 1-57 of SEQ ID NO: 39;

j) residues 1-56 of SEQ ID NO: 40;

k) residues 1-51 of SEQ ID NO: 41;

l) residues 1-52 of SEQ ID NO: 42;

m) residues 1-57 of SEQ ID NO: 43; and

n) residues 1-53 of SEQ ID NO: 44.

32. The vector of claim 31, wherein the pro-domain region of the microbial transglutaminase is residues 1-44 of SEQ ID NO: 1.

33. The vector of claim 30, wherein the modified microbial transglutaminase gene encodes a microbial transglutaminase polypeptide comprising at least one amino acid substitution within a conserved region of a pro-domain region of the microbial transglutaminase.

34. The vector of claim 33, wherein the conserved region of the pro-domain region is selected from the group consisting of:

a) residues Tyr10 to Glu29 of SEQ ID NO: 1;

b) residues Tyr10 to Glu29 of SEQ ID NO: 32;

c) residues Tyr10 to Glu29 of SEQ ID NO: 33;

d) residues Tyr10 to Glu29 of SEQ ID NO: 34;

e) residues Tyr10 to Glu29 of SEQ ID NO: 35;

f) residues Tyr17 to Glu36 of SEQ ID NO: 36;

g) residues Tyr12 to Lys31 of SEQ ID NO: 37;

h) residues Tyr12 to Lys31 of SEQ ID NO: 38;

i) residues Tyr12 to Lys31 of SEQ ID NO: 39;

j) residues Tyr11 to Lys30 of SEQ ID NO: 40;

k) residues Tyr12 to Glu31 of SEQ ID NO: 41;

l) residues Tyr12 to Glu31 of SEQ ID NO: 42;

m) residues Tyr12 to Glu31 of SEQ ID NO: 43; and

n) residues Tyr12 to Glu31 of SEQ ID NO: 44.

35. The vector of claim 34, wherein the conserved region of the pro-domain region is amino acid residues 10 to 29 of SEQ ID NO: 1.

36. The vector of any one of claims 30-35, wherein the protease is 3C protease, TEV (Tobacco Etch Virus) protease, thrombin, factor Xa, enterokinase, SUMO (Small ubiquitin Modifier) protease, TVMV (Tobacco Vein Mottling Virus) protease, or TAMEP (Transglutaminase Activating Metalloprotease).

37. The vector of claim 36, wherein the 3C protease gene encodes a polypeptide comprising the amino acid sequence of SEQ ID NO: 2.

38. The vector of any one of claims 30-37, wherein the at least one amino acid substitution is at an amino acid residue selected from the group consisting of:

a) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 1;

b) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 32;

c) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 33;

d) any one or more of Tyr10, His14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 34;

e) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 35;

f) any one or more of Tyr17, His21, Leu23, Asp27, Val28, Asp30, Ile31, Asn32, Leu34, Asn35, or Glu36 of SEQ ID NO: 36;

g) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 37;

h) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 38;

i) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 39;

j) any one or more of Tyr11, His15, Leu17, Asp21, Val22, Asn24, Ile25, Asn26, Leu28, Asn29, or Lys30 of SEQ ID NO: 40;

k) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 41;

l) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 42;

m) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 43; and

n) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Ser25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 44.

39. The vector of claim 38, wherein the at least one or more amino acid substitution is at a residue selected from the group consisting of any one or more of Tyr14, Asp20, Ile24, and Asn25 of SEQ ID NO: 1.

40. The vector of any one of claims 30-39, wherein the amino acid substitution is from a wild-type amino acid to Ala, His, Lys, Phe, Met, Thr, Gln, Asp, or Glu.

41. The vector of any one of claims 30-39, wherein the amino acid substitution is from a wild-type amino acid to a small, acidic, charged or polar amino acid.

42. The vector of any one of claims 30-39, wherein the amino acid substitution is selected from the group consisting of:

a) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 1;

b) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 32;

c) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 33;

d) any one or more of Tyr10Ala, His14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 34;

e) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 35;

f) any one or more of Tyr17Ala, His21Ala, Leu23Ala, Asp27Ala, Val28Ala, Asp30Ala, Ile31Ala, Asn32Ala, Leu34Ala, Asn35Ala, or Glu36Ala in SEQ ID NO: 36;

g) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 37;

h) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 38;

i) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 39;

j) any one or more of Tyr11Ala, His15Ala, Leu17Ala, Asp21Ala, Val22Ala, Asn24Ala, Ile25Ala, Asn26Ala, Leu28Ala, Asn29Ala, or Lys30Ala in SEQ ID NO: 40;

k) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 41;

l) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 42;

m) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 43; and

n) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Ser25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 44.

43. The vector of any one of claims 30-42, wherein the 3C protease gene encodes the amino acid sequence of SEQ ID NO: 2.

44. The vector of any one of claims 30-43, wherein the modified microbial transglutaminase gene encodes a polypeptide comprising a 3C protease cleavage site engineered between the pro-domain and the enzyme domain of the polypeptide.

45. The vector of any one of claims 30-44, wherein the modified microbial transglutaminase gene is selected from the group consisting of SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, SEQ ID NO: 27, and SEQ ID NO: 29.

46. A modified microbial transglutaminase polypeptide comprising at least one amino acid substitution within the pro-domain region of the microbial transglutaminase.

47. A modified microbial transglutaminase polypeptide comprising at least one amino acid substitution within the pro-domain region of the microbial transglutaminase, wherein the pro-domain is selected from the group consisting of:

a) residues 1-44 of SEQ ID NO: 1;

b) residues 1-44 of SEQ ID NO: 32;

c) residues 1-44 of SEQ ID NO: 33;

d) residues 1-46 of SEQ ID NO: 34;

e) residues 1-44 of SEQ ID NO: 35;

f) residues 1-52 of SEQ ID NO: 36;

g) residues 1-57 of SEQ ID NO: 37;

h) residues 1-57 of SEQ ID NO: 38;

i) residues 1-57 of SEQ ID NO: 39;

j) residues 1-56 of SEQ ID NO: 40;

k) residues 1-51 of SEQ ID NO: 41;

l) residues 1-52 of SEQ ID NO: 42;

m) residues 1-57 of SEQ ID NO: 43; and

n) residues 1-53 of SEQ ID NO: 44.

48. The transglutaminase polypeptide of claim 46 or 47, wherein the polypeptide comprises at least one amino acid substitution within the sequence set forth by residues 1 -44 of SEQ ID NO: 1.

49. The transglutaminase polypeptide of claim 46 or 47, wherein the at least one amino acid substitution is within a conserved region of the pro-domain region of the microbial transglutaminase.

50. The transglutaminase polypeptide of claim 49, wherein the conserved region of the pro-domain region is selected from the group consisting of:

a) residues Tyr10 to Glu29 of SEQ ID NO: 1,

b) residues Tyr10 to Glu29 of SEQ ID NO: 32,

c) residues Tyr10 to Glu29 of SEQ ID NO: 33,

d) residues Tyr10 to Glu29 of SEQ ID NO: 34,

e) residues Tyr10 to Glu29 of SEQ ID NO: 35,

f) residues Tyr17 to Glu36 of SEQ ID NO: 36,

g) residues Tyr12 to Lys31 of SEQ ID NO: 37,

h) residues Tyr12 to Lys31 of SEQ ID NO: 38,

i) residues Tyr12 to Lys31 of SEQ ID NO: 39,

j) residues Tyr11 to Lys30 of SEQ ID NO: 40,

k) residues Tyr12 to Glu31 of SEQ ID NO: 41,

l) residues Tyr12 to Glu31 of SEQ ID NO: 42,

m) residues Tyr12 to Glu31 of SEQ ID NO: 43, and

n) residues Tyr12 to Glu31 of SEQ ID NO: 44.

51. The transglutaminase peptide of claim 50, wherein the conserved region of the pro-domain region is amino acid residues 10 to 29 of SEQ ID NO: 1.

52. The transglutaminase polypeptide of any one of claims 46-51, wherein the polypeptide comprises at least one substitution at an amino acid residue selected from:

a) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 1;

b) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 32;

c) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 33;

d) any one or more of Tyr10, His14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 34;

e) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 35;

f) any one or more of Tyr17, His21, Leu23, Asp27, Val28, Asp30, Ile31, Asn32, Leu34, Asn35, or Glu36 of SEQ ID NO: 36;

g) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 37;

h) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 38;

i) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 39;

j) any one or more of Tyr11, His15, Leu17, Asp21, Val22, Asn24, Ile25, Asn26, Leu28, Asn29, or Lys30 of SEQ ID NO: 40;

k) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 41;

l) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 42;

m) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 43; and

n) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Ser25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 44.

53. The transglutaminase polypeptide of claim 52, wherein the modified transglutaminase polypeptide comprises at least one amino acid substitution at a residue selected from the group consisting of any one or more of Tyr14, Asp20, Ile24, and Asn25 of SEQ ID NO: 1.

54. The transglutaminase polypeptide of any one of claims 46-53, wherein the amino acid substitution is from a wild-type amino acid to Ala, His, Lys, Phe, Met, Thr, Gln, Asp, or Glu.

55. The transglutaminase polypeptide of any one of claims 46-53, wherein the amino acid substitution is from a wild-type amino acid to a small, acidic, charged or polar amino acid.

56. The transglutaminase polypeptide of any one of claims 46-53, wherein the amino acid substitution is selected from the group consisting of:

a) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 1;

b) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 32;

c) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 33;

d) any one or more of Tyr10Ala, His14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 34;

e) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 35;

f) any one or more of Tyr17Ala, His21Ala, Leu23Ala, Asp27Ala, Val28Ala, Asp30Ala, Ile31Ala, Asn32Ala, Leu34Ala, Asn35Ala, or Glu36Ala in SEQ ID NO: 36;

g) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 37;

h) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 38;

i) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 39;

j) any one or more of Tyr11Ala, His15Ala, Leu17Ala, Asp21Ala, Val22Ala, Asn24Ala, Ile25Ala, Asn26Ala, Leu28Ala, Asn29Ala, or Lys30Ala in SEQ ID NO: 40;

k) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 41;

l) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 42;

m) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 43; and

n) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Ser25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 44.

57. The transglutaminase polypeptide of any one of claims 46-56, wherein the polypeptide comprises a 3C protease cleavage site engineered between the pro-domain and the enzyme domain of the polypeptide.

58. The transglutaminase polypeptide of any one of claims 46-57, wherein the polypeptide is soluble.

59. A transglutaminase polypeptide produced according to the methods of any one of claims 1 to 29.

60. An isolated host cell that recombinantly produces the transglutaminase polypeptide of any one of claims 46-58.

61. An isolated host cell comprising the vector of any one of claims 30 to 45.

Description:
RECOMBINANT MICROBIAL TRANSGLUTAMINASES

Cross-Reference To Related Application

This application claims the benefit of U.S. Provisional Application No. 62/152,720 filed April 24, 2015, which is hereby incorporated by reference in its entirety.

Field of the Invention

The field of this invention relates generally to methods of producing microbial transglutaminases. The invention also relates to engineered microbial transglutaminase polypeptides comprising one or more substitutions in the pro-domain region, and the nucleic acids encoding them. The invention also relates to vectors and host cells comprising the engineered nucleic acids and polypeptides.

Background of the Invention

Transglutaminase (EC 2.3.2.13, protein-glutamine γ-glutamyltransferase, TGase) is part of a family of enzymes which catalyze cross linking between the γ-carboxyamide groups of glutamine residues and a variety of primary amines, including the amino group of lysine (Yokoyama et al. (2004) Applied microbiology and biotechnology 64:447-454; Strop (2014) Bioconjugate chemistry 25(5):855-862). TGases can be found throughout all groups of organisms including prokaryotes, eukaryotes (Yokoyama et al. (2004) Applied microbiology and biotechnology 64:447-454), and plants (Di Sandro et al. (2010) The Biochemical journal 429:261-271). Animal TGases, for example, human blood coagulation factor XIII, TG2, and guinea pig liver TGase, are multi-subunit proteins and depend on calcium for regulation of enzyme function (Folk (1980) Annu Rev Biochem 49:517-531; Klock and Khosla (2012) Protein Sci 21:1781-1791).

Microbial transglutaminases (mTGases), on the other hand, such as the one first discovered in Streptomyces mobarensis (Ando et al. (1989) Agric Biol Chem 53:2613-2617), have only one subunit and do not depend on calcium for activity (Zhang et al. (2010) Biotechnology & genetic engineering reviews 26:205-222). Streptomyces mTGases evolved from proteases and the fold surrounding the catalytic triad shows similarities to the animal TGases, suggesting that both enzymes are related by convergent evolution. Animal TGases and mTGase utilize a catalytic triad comprised of Cys, His, and Asp residues, although, relative to the active site cysteine, the positions of His and Asp are reversed (Kashiwagi et al. (2002) The Journal of biological chemistry 277:44252-44260).

mTGase is mainly used in the food industry to modulate the texture of meat, fish, or dairy products like yogurt, and cheese (Zhang et al. (2010) Biotechnology & genetic engineering reviews 26:205-222). More recently, mTGase use has also been demonstrated in a wide variety of applications in pharmaceutical industry for conjugation of proteins, DNA, and peptides, as well as in tissue engineering (Strop P (2014) Bioconjugate chemistry 25(5):855-862). The feasibility of using mTGase for generating antibody drug conjugates (ADCs) for therapeutic applications was recently demonstrated (Strop et al. (2013) Chemistry & biology 20:161-167). The commercially available mTGase is produced by fermentation of wild type S. mobarensis (Ando et al. (1989) Agric Biol Chem 53:2613-2617). S. mobarensis expresses mTGase as an inactive zymogen and the pro-domain needs to be proteolytically processed in order to yield an active enzyme (Zotzel et al. (2003) European journal of biochemistry / FEBS 270:3214-3222; Zotzel et al. (2003) European journal of biochemistry / FEBS 270:4149-4155). Pro-mTGase is secreted into the surrounding medium together with activating proteases. The activation of mTGase occurs stepwise, and to date two endogenous enzymes, transglutaminase activating metalloprotease (TAMEP) and S. mobarensis tripeptidyl aminopeptidase (SM-TAP) have been purified and biochemically characterized (Zotzel et al. (2003) European journal of biochemistry / FEBS 270:4149-4155). The pro-domain of Streptomyces mTGase was shown to be essential for correct folding and activity of mature mTGase (Yurimoto et al. (2004) Bioscience, biotechnology, and biochemistry 68:2058-2069) and its chaperone activity was revealed when pro-mTGase was expressed as one molecule, or even when the pro-domain and mature-domain were co-expressed as separate chains (Yurimoto et al. (2004) Bioscience, biotechnology, and biochemistry 68:2058-2069; Liu et al. (2011) Microbial cell factories 10:112).

While S. mobarensis derived mTGase is widely used in the food industry, clinical applications of mTGase would benefit from the development of a more commonly used expression system for soluble and fully active mTGase. Large scale production of mTGase for pharmaceutical applications would be most convenient if accomplished in commonly used and well-characterized recombinant host systems such as, e.g., E. coli, due to their well characterized genetics, versatile cloning tools, and low cost. To date, however, attempts to produce large amounts of mTGase in a soluble and fully active form in commonly used systems such as yeast and E. coli have failed (Takehana et al. (1994) Bioscience, biotechnology, and biochemistry 58:88-92; Liu et al. (2011) Microbial cell factories 10:112). Accordingly, more efficient methods for producing recombinant microbial transglutaminase are needed. The present invention meets these needs.

Summary of the Invention

The present invention relates to compositions and methods for recombinantly producing soluble and active microbial transglutaminases (mTGases). The invention relates to the discovery that introducing one or more amino acid substitutions in the pro-domain of an mTGase polypeptide and/or introducing a protease cleavage site between the pro-domain and the enzyme-domain of the mTGase allows the expression and purification of soluble and active mTGase enzyme domain. E. coli produced transglutaminase exhibited identical enzyme activity compared to wild type mTGase from S. mobarensis. Accordingly, the present disclosure provides engineered microbial transglutaminase polypeptides comprising one or more substitutions in the pro-domain region, and the nucleic acids encoding them. The disclosure also provides vectors and host cells comprising the engineered nucleic acids and polypeptides. The disclosure further provides methods of using microbial transglutaminases produced according to the methods of the disclosure.

In certain aspects, the disclosure provides a method of producing a microbial transglutaminase, comprising: preparing a nucleic acid comprising a modified microbial transglutaminase gene and a nucleic acid comprising a protease gene; wherein the modified microbial transglutaminase gene encodes a polypeptide comprising at least one amino acid substitution within a pro-domain region of the microbial transglutaminase; and expressing the nucleic acid comprising a modified microbial transglutaminase gene and the nucleic acid comprising a protease gene, in a host cell.

In certain aspects, the disclosure provides a method of producing a microbial transglutaminase, comprising the steps of: a) preparing a nucleic acid comprising a modified microbial transglutaminase gene and a nucleic acid comprising a 3C protease gene; wherein the modified microbial transglutaminase gene encodes a polypeptide comprising at least one amino acid substitution within the sequence set forth by residues 1-44 of SEQ ID NO: 1; b) subcloning the nucleic acid comprising a modified microbial transglutaminase gene and the nucleic acid comprising a 3C protease gene into an expression vector; c) transforming and culturing a host cell with the expression vector of step b); and d) harvesting the microbial transglutaminase from the host cell.

In certain aspects, the disclosure provides a vector comprising a nucleic acid comprising a modified microbial transglutaminase gene and a protease gene; wherein the modified microbial transglutaminase gene encodes a microbial transglutaminase polypeptide comprising at least one amino acid substitution within a pro-domain region of the microbial transglutaminase.

In certain aspects, the disclosure provides a modified microbial transglutaminase polypeptide comprising at least one amino acid substitution within the pro-domain region of the microbial transglutaminase.

In certain aspects, the disclosure provides a modified microbial transglutaminase polypeptide comprising at least one amino acid substitution within the pro-domain region of the microbial transglutaminase, wherein the pro-domain is selected from the group consisting of: a) residues 1-44 of SEQ ID NO: 1; b) residues 1-44 of SEQ ID NO: 32; c) residues 1-44 of SEQ ID NO: 33; d) residues 1-46 of SEQ ID NO: 34; e) residues 1-44 of SEQ ID NO: 35; f) residues 1-52 of SEQ ID NO: 36; g) residues 1-57 of SEQ ID NO: 37; h) residues 1-57 of SEQ ID NO: 38; i) residues 1-57 of SEQ ID NO: 39; j) residues 1-56 of SEQ ID NO: 40; k) residues 1-51 of SEQ ID NO: 41; l) residues 1-52 of SEQ ID NO: 42; m) residues 1-57 of SEQ ID NO: 43; and n) residues 1-53 of SEQ ID NO: 44.

In some embodiments of any of the foregoing aspects, the pro-domain region of the microbial transglutaminase is selected from the group consisting of: a) residues 1-44 of SEQ ID NO: 1; b) residues 1-44 of SEQ ID NO: 32; c) residues 1-44 of SEQ ID NO: 33; d) residues 1-46 of SEQ ID NO: 34; e) residues 1-44 of SEQ ID NO: 35; f) residues 1-52 of SEQ ID NO: 36; g) residues 1-57 of SEQ ID NO: 37; h) residues 1-57 of SEQ ID NO: 38; i) residues 1-57 of SEQ ID NO: 39; j) residues 1-56 of SEQ ID NO: 40; k) residues 1-51 of SEQ ID NO: 41; l) residues 1-52 of SEQ ID NO: 42; m) residues 1-57 of SEQ ID NO: 43; and n) residues 1-53 of SEQ ID NO: 44. In certain embodiments, the pro-domain region of the microbial transglutaminase is residues 1-44 of SEQ ID NO: 1.
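For readers who wish to work with these boundaries programmatically, the pro-domain ranges recited above can be held in a simple lookup table. The following is a minimal illustrative sketch only; the names PRO_DOMAIN_END and in_pro_domain are hypothetical and do not appear in the disclosure, and the values are transcribed from the list above.

```python
# Last residue of the pro-domain for each SEQ ID NO, transcribed from the
# list above. Every recited pro-domain starts at residue 1.
# PRO_DOMAIN_END and in_pro_domain are hypothetical helper names.
PRO_DOMAIN_END = {
    1: 44, 32: 44, 33: 44, 34: 46, 35: 44, 36: 52, 37: 57,
    38: 57, 39: 57, 40: 56, 41: 51, 42: 52, 43: 57, 44: 53,
}


def in_pro_domain(seq_id: int, position: int) -> bool:
    """Return True if a 1-based residue position lies within the pro-domain."""
    return 1 <= position <= PRO_DOMAIN_END[seq_id]


if __name__ == "__main__":
    # Residue 44 is the last pro-domain residue of SEQ ID NO: 1.
    print(in_pro_domain(1, 44), in_pro_domain(1, 45))
```

Such a table may be used, for example, to confirm that a proposed substitution position falls inside the pro-domain of the chosen sequence before a construct is designed.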

In some embodiments of any of the foregoing aspects, the at least one amino acid substitution is within a conserved region of a pro-domain region of the microbial transglutaminase. In some embodiments of any of the foregoing aspects, the conserved region of the pro-domain region is selected from the group consisting of: a) residues Tyr10 to Glu29 of SEQ ID NO: 1, b) residues Tyr10 to Glu29 of SEQ ID NO: 32, c) residues Tyr10 to Glu29 of SEQ ID NO: 33, d) residues Tyr10 to Glu29 of SEQ ID NO: 34, e) residues Tyr10 to Glu29 of SEQ ID NO: 35, f) residues Tyr17 to Glu36 of SEQ ID NO: 36, g) residues Tyr12 to Lys31 of SEQ ID NO: 37, h) residues Tyr12 to Lys31 of SEQ ID NO: 38, i) residues Tyr12 to Lys31 of SEQ ID NO: 39, j) residues Tyr11 to Lys30 of SEQ ID NO: 40, k) residues Tyr12 to Glu31 of SEQ ID NO: 41, l) residues Tyr12 to Glu31 of SEQ ID NO: 42, m) residues Tyr12 to Glu31 of SEQ ID NO: 43, and n) residues Tyr12 to Glu31 of SEQ ID NO: 44. In some embodiments, the conserved region of the pro-domain region is amino acid residues 10 to 29 of SEQ ID NO: 1.

In some embodiments of any of the foregoing aspects, the protease is 3C protease, TEV (Tobacco Etch Virus) protease, thrombin, factor Xa, enterokinase, SUMO (Small ubiquitin Modifier) protease, TVMV (Tobacco Vein Mottling Virus) protease, or TAMEP (Transglutaminase Activating Metalloprotease).

In some embodiments of any of the foregoing aspects, the 3C protease gene encodes a polypeptide comprising the amino acid sequence of SEQ ID NO: 2.

In some embodiments of any of the foregoing aspects, the at least one amino acid substitution is at an amino acid residue selected from the group consisting of: a) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 1; b) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 32; c) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 33; d) any one or more of Tyr10, His14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 34; e) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 35; f) any one or more of Tyr17, His21, Leu23, Asp27, Val28, Asp30, Ile31, Asn32, Leu34, Asn35, or Glu36 of SEQ ID NO: 36; g) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 37; h) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 38; i) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 39; j) any one or more of Tyr11, His15, Leu17, Asp21, Val22, Asn24, Ile25, Asn26, Leu28, Asn29, or Lys30 of SEQ ID NO: 40; k) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 41; l) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 42; m) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 43; and n) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Ser25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 44.
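It may be noted that, in every list above, the eleven recited residues occupy the same positions relative to the first tyrosine of the conserved region; only the starting position differs between sequences. The following sketch illustrates that arithmetic. It is an observation drawn from the recited lists rather than a statement from the disclosure, and the names CONTACT_OFFSETS, CONSERVED_START, and contact_positions are hypothetical.

```python
# Offsets of the eleven recited residues relative to the first residue (the
# conserved Tyr) of the conserved region. Derived from the SEQ ID NO: 1 list:
# 10, 14, 16, 20, 21, 23, 24, 25, 27, 28, 29.
CONTACT_OFFSETS = (0, 4, 6, 10, 11, 13, 14, 15, 17, 18, 19)

# First residue of the conserved region for each SEQ ID NO, as recited above.
CONSERVED_START = {
    1: 10, 32: 10, 33: 10, 34: 10, 35: 10, 36: 17, 37: 12,
    38: 12, 39: 12, 40: 11, 41: 12, 42: 12, 43: 12, 44: 12,
}


def contact_positions(seq_id: int) -> list[int]:
    """Return the 1-based positions of the eleven recited residues."""
    start = CONSERVED_START[seq_id]
    return [start + offset for offset in CONTACT_OFFSETS]


if __name__ == "__main__":
    for seq_id in (1, 36, 40):
        print(seq_id, contact_positions(seq_id))
```

The sketch reproduces position numbers only. The residue identity at a given position can differ between sequences (for example, His rather than Tyr at the second position, or Ser25 in SEQ ID NO: 44), so the recited lists remain the authority for residue names.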

In some embodiments of any of the foregoing aspects, the at least one or more amino acid substitution is at a residue selected from the group consisting of any one or more of Tyr14, Asp20, Ile24, and Asn25 of SEQ ID NO: 1.

In some embodiments of any of the foregoing aspects, the amino acid substitution is from a wild-type amino acid to Ala, His, Lys, Phe, Met, Thr, Gln, Asp, or Glu.

In some embodiments of any of the foregoing aspects, the amino acid substitution is from a wild-type amino acid to another amino acid which is a small, acidic, charged or polar amino acid.

In some embodiments of any of the foregoing aspects, the amino acid substitution is selected from the group consisting of: a) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 1; b) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 32; c) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 33; d) any one or more of Tyr10Ala, His14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 34; e) any one or more of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 35; f) any one or more of Tyr17Ala, His21Ala, Leu23Ala, Asp27Ala, Val28Ala, Asp30Ala, Ile31Ala, Asn32Ala, Leu34Ala, Asn35Ala, or Glu36Ala in SEQ ID NO: 36; g) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 37; h) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 38; i) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 39; j) any one or more of Tyr11Ala, His15Ala, Leu17Ala, Asp21Ala, Val22Ala, Asn24Ala, Ile25Ala, Asn26Ala, Leu28Ala, Asn29Ala, or Lys30Ala in SEQ ID NO: 40; k) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 41; l) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 42; m) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 43; and n) any one or more of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Ser25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala of SEQ ID NO: 44.
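The substitution labels used above follow the pattern wild-type residue, 1-based position, replacement residue (for example, Tyr14Ala). The sketch below shows one way such a label could be parsed and applied to a sequence string. It is illustrative only: the function names parse_substitution and apply_substitution are hypothetical, and the toy sequence in the usage example is invented for demonstration and is not any SEQ ID NO of the disclosure.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'Tyr14Ala' into ('Y', 14, 'A')."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return THREE_TO_ONE[wild_type], int(position), THREE_TO_ONE[replacement]


def apply_substitution(sequence: str, label: str) -> str:
    """Return a copy of sequence with the labelled substitution applied."""
    wild_type, position, replacement = parse_substitution(label)
    if sequence[position - 1] != wild_type:
        raise ValueError(f"{label}: residue {position} is not {wild_type}")
    return sequence[: position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    toy = "MKTAYIAKQR"  # invented sequence, for illustration only
    print(apply_substitution(toy, "Tyr5Ala"))
```

Checking the wild-type residue before substituting guards against applying a label to the wrong sequence, which matters here because the same residue is numbered differently in different SEQ ID NOs.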

In some embodiments of any of the foregoing aspects, the microbial transglutaminase is harvested from the host cell.

In some embodiments of any of the foregoing aspects, the host cell is a bacterial cell, a yeast cell, a filamentous fungal cell, an algal cell, an insect cell, or a mammalian cell.

In some embodiments of any of the foregoing aspects, the host cell is a member of the genus Clostridium, Zymomonas, Escherichia, Salmonella, Serratia, Erwinia, Klebsiella, Shigella, Rhodococcus, Pseudomonas, Bacillus, Lactobacillus, Enterococcus, Alcaligenes, Paenibacillus, Arthrobacter, Corynebacterium, Brevibacterium, Schizosaccharomyces, Kluyveromyces, Yarrowia, Pichia, Candida, or Saccharomyces. In some embodiments, the host cell is Escherichia coli, Corynebacterium glutamicum, Bacillus subtilis, Saccharomyces cerevisiae, or Pichia pastoris.

In some embodiments of any of the foregoing aspects, the microbial transglutaminase is derived from bacteria or yeast. In some embodiments, the microbial transglutaminase is derived from Streptomyces mobarensis, Streptoverticillium mobarensis, Streptomyces viridis, Streptomyces ladakanum, Streptomyces caniferus, Streptomyces platensis, Streptomyces hygroscopicus, Streptomyces netropsis, Streptomyces fradiae, Streptomyces roseoverticillatus, Streptomyces cinnamomeus, Micrococcus, Clostridium, Torulopsis, Rhizopus, Monascus, Bacillus, Saccharomyces, Candida, Cryptococcus or isolates thereof.

In some embodiments of any of the foregoing aspects, the modified microbial transglutaminase gene and the protease gene are on two separate vectors expressed together in the same cell or in different cells.

In some embodiments of any of the foregoing aspects, the modified microbial transglutaminase gene and the protease gene are on the same vector expressed together in the same cell.

In some embodiments of any of the foregoing aspects, the method further comprises a purification step, wherein the microbial transglutaminase is purified by a chromatography step. In some embodiments, the chromatography step is affinity chromatography, ion-exchange chromatography, hydrophobic interaction chromatography, or size-exclusion chromatography. In some embodiments, the microbial transglutaminase is purified first by affinity chromatography, followed by size-exclusion chromatography.

In some embodiments of any of the foregoing aspects, the modified microbial transglutaminase gene comprises a sequence encoding a polyhistidine tag at the carboxyl terminus of the microbial transglutaminase.

In some embodiments of any of the foregoing aspects, the modified microbial transglutaminase gene encodes a polypeptide comprising a 3C protease cleavage site engineered between the pro-domain and the enzyme domain of the polypeptide.

In some embodiments of any of the foregoing aspects, the modified microbial transglutaminase gene is selected from the group consisting of SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, SEQ ID NO: 27, and SEQ ID NO: 29.

In some embodiments of any of the foregoing aspects, the transglutaminase polypeptide is soluble.

In certain aspects, the disclosure provides a transglutaminase polypeptide produced according to any of the methods of the disclosure.

In certain aspects, the disclosure provides an isolated host cell that recombinantly produces a transglutaminase polypeptide of the disclosure.

In certain aspects, the disclosure provides a host cell comprising a vector of the disclosure.

Brief Description of the Drawings

The foregoing summary, as well as the following detailed description of the invention, will be better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention there are shown in the drawings embodiment(s) which are presently preferred. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.

Figures 1A-1D show the structure of the expression plasmid pBAD-T7 and the results of the small scale E. coli expression study. Figure 1(A) shows the structure of expression plasmid pBAD-T7. The positions of the coding regions for pro-mTGase and 3C-protease are indicated by arrows. The positions for the T7- and araBAD-promoter are also indicated. Open reading frames for AraC regulator (AraC) and ampicillin resistance (Amp) are shown as open and filled black arrows, respectively. Figure 1(B) shows the DNA sequence of the T7-promoter gene cassette. DNA sequences colored in light grey represent positions of restriction enzymes. Sequences for T7-promoter and terminator are indicated in grey and black, respectively, and the coding region for pro-mTGase is shown as a black box. Figure 1(C) is an SDS-PAGE gel showing the results of the pBAD-T7 E. coli small scale expression study. Lanes 1 to 3 show controls for pro-mTGase, 3C-protease, and non-induced lysate, respectively. Lane 4 shows the simultaneous induction of pro-mTGase and 3C protease. Lane 5 shows induction of pro-mTGase, 20° overnight, followed by induction of 3C-protease, 30 min at room temperature; lane 6 shows induction of pro-mTGase, 20° overnight, followed by induction of 3C-protease, 1 h at room temperature; lane 7 shows induction of pro-mTGase, 20° overnight, followed by induction of 3C-protease, 2 h at room temperature; lane 8 shows induction of pro-mTGase, 20° overnight, followed by induction of 3C-protease, 30 min at room temperature plus additional purified 3C-protease, incubation for 2 h at room temperature. Arrows mark the positions for pro-mTGase, mTGase, and 3C-protease. Figure 1(D) shows an anti-His tag western blot analysis corresponding to the SDS-PAGE gel of Figure 1(C).

Figures 2A-2C show the purification of soluble mTGase expressed in E. coli. Figure 2(A) shows the results of size exclusion chromatography. The chromatogram was monitored at wavelength A280 nm. Labeled dashes above the chromatogram show retention times for protein size standard. Black arrows indicate protein elution fractions analyzed with SDS-PAGE gel shown in inset. Figure 2(B) shows the results of intact liquid chromatography-mass spectrometry. Labeled peaks confirm the size of mTGase pro-domain (5,629 Da; calculated mass: 5,629 Da) and mTGase enzyme domain (39,197 Da; calculated mass: 39,200 Da), respectively. Figure 2(C) shows the results of an mTGase enzyme activity assay. The black bar represents 100% enzyme activity of wild type S. m. mTGase control. The white bar indicates the activity of non-mutated but 3C-protease processed mTGase. Error bars represent deviation of two independently performed assays.

Figures 3A-3B show the design of mTGase pro-domain alanine mutants. Figure 3(A) gives a structural overview of Pro-mTGase. A surface representation of mature mTGase in grey and a cartoon representation of the pro-domain are shown (PDB 3IU0) (Yang et al. (2011) The Journal of biological chemistry 286:7301-7307). Pro-domain amino acid residues facing towards mTGase enzyme domain are shown in stick representation. Figure 3(B) shows the pro-domain amino acid sequence. Boxes indicate pro-domain alpha helices 1 and 2. Amino acids colored in bold, Y10, Y14, L16, D20, V21, N23, I24, N25, L27, N28, and E29 indicate the main contact residues with the mTGase enzyme domain. The amino acid sequence LEVLFQGP (SEQ ID NO: 45) indicates a 3C-protease cleavage site. For each mTGase enzyme domain contact residue, a mutant was generated exchanging the contact residue with the amino acid alanine.
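The effect of the engineered 3C-protease cleavage site on a polypeptide can be illustrated with a short sketch. It assumes that cleavage occurs between the glutamine and the glycine of LEVLFQGP, which is consistent with the glycine and proline described elsewhere herein as remaining at the N-terminus of the processed enzyme domain. The function name cleave_3c and the flanking toy residues are hypothetical and are not sequences of the disclosure.

```python
# 3C-protease recognition sequence recited above (SEQ ID NO: 45).
SITE_3C = "LEVLFQGP"
# Assumed scissile bond: after the Gln (Q), i.e. six residues into the site,
# so that Gly-Pro stays on the downstream (enzyme domain) fragment.
CUT_OFFSET = 6


def cleave_3c(polypeptide: str) -> tuple[str, str]:
    """Split a one-letter sequence at the first 3C site.

    Returns (upstream fragment ending in ...LEVLFQ,
             downstream fragment starting with GP...).
    """
    index = polypeptide.find(SITE_3C)
    if index == -1:
        raise ValueError("no 3C-protease cleavage site found")
    cut = index + CUT_OFFSET
    return polypeptide[:cut], polypeptide[cut:]


if __name__ == "__main__":
    # Invented flanking residues, for illustration only.
    toy = "AAAA" + SITE_3C + "DSDD"
    pro_fragment, enzyme_fragment = cleave_3c(toy)
    print(pro_fragment, enzyme_fragment)
```

In this toy example the upstream fragment retains LEVLFQ at its C-terminus and the downstream fragment begins with GP.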

Figures 4A-4D depict the analysis of mTGase pro-domain alanine mutants. Figure 4(A) shows the SDS-PAGE gel analysis of mTGase pro-domain mutants following Ni-NTA purification. Mature mTGase protein runs around 38 kDa, pro-domain around 5 kDa, and mTGase aggregates around 65-190 kDa. The differences in pro-domain size, most prominently displayed by mutant Leu27 [Fig. 4(A), lane 12], are attributed to SDS-PAGE mobility change. Figure 4(B) shows the results of an mTGase enzyme activity assay. The black bar control represents 100% enzyme activity of wild type S. mobarensis (S. m. mTGase). The white bar indicates the activity of non-mutated but 3C-protease processed mTGase. The remaining grey bars represent activities of mTGase pro-domain mutants. Error bars represent deviation of two independently performed assays. Figure 4(C) shows the results of an mTGase inhibitor assay. Black bars indicate inhibitor binding ratios of assay performed at 25°C. Grey bars indicate inhibitor binding ratios of assay performed at 37°C. Inhibitor binding to wild type S. mobarensis mTGase [Fig. 4(C), bar 1] was set to 100%. Figure 4(D) shows an anti-His western blot of mTGase pro-domain mutants shown on SDS-PAGE in Fig. 4(A). Black star in lane 1 indicates lane with wild type S. mobarensis mTGase without His-tag. Black arrowheads indicate pro-domain mutants with large amount of higher order mTGase aggregates. Black arrowheads in all panels highlight pro-domain mutants Tyr10, Leu16 and Val21. The stars highlight pro-domain mutants Asp20 and Asn23, respectively.

Figures 5A-5D show intact mass deconvolution of mTGase variants. Figures 5A-5D show intact mass deconvolutions of wild type S. m. mTGase (A), E. coli produced non-mutated and 3C-protease processed mTGase (B), mTGase mutant Tyr14 (C), and mTGase mutant Leu27 (D). While the SDS-PAGE analysis showed different mobility of the Leu27A pro-domain, the correct size of mutant Leu27A pro-domain was confirmed by mass spectroscopy. The larger molecular size of mature mTGase produced in E. coli (B) compared to wild type S. m. mTGase (A) is due to the presence of a C-terminal His-tag and two extra N-terminal amino acids, glycine and proline, left behind after 3C-protease processing of the pro-domain.

Figure 6 shows the protein yield of Ni-NTA purified non-mutated mTGase and mTGase pro-domain alanine mutants. The white bar represents the protein yield of non-mutated but 3C-protease processed mTGase, while the checkered bars represent the protein yields of mTGase pro-domain mutants.

Figures 7A-7E show mTGase-catalyzed generation of site specific antibody drug conjugates. Figure 7(A) depicts the chemical reaction catalyzed by mTGase. Figure 7(B) is a surface representation of an IgG1 antibody with the position of the inserted glutamine tag used for site specific conjugation encircled. Figure 7(C) shows the chemical structure of the Linker-payload utilized for making of the antibody drug conjugate. Figure 7(D) shows hydrophobic interaction chromatography (HIC) traces of anti-Ab1-AcLys-vc-0101 conjugated with wild type S. m. mTGase (dotted line, left panel) and anti-Ab1-AcLys-vc-0101 conjugated with mTGase produced and purified out of E. coli (pro-domain mutant Y14A) (dotted line, right panel). Antibody anti-Ab1 prior to conjugation is shown as solid lines and conjugation yields are shown in graph titles. Figure 7(E) shows mass spectroscopy analysis of the two anti-Ab1-AcLys-vc-0101 conjugates. The mass difference between the peaks corresponding to DAR 1 and 2 is 1,301 Da, which is the mass of the cytotoxic drug minus 17 Da due to ammonia loss in the conjugation reaction.
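The mass bookkeeping described for Figure 7(E) can be written out as a short worked example. Each conjugation event adds the payload mass less 17 Da for the ammonia released, so a per-payload shift of 1,301 Da implies a payload mass of 1,318 Da. In the sketch below the function names and the antibody mass used in the usage example are hypothetical; only the 1,301 Da and 17 Da figures come from the description above.

```python
AMMONIA_LOSS_DA = 17            # mass lost per conjugation event (from the text)
SHIFT_PER_PAYLOAD_DA = 1301     # observed DAR 1 to DAR 2 spacing (from the text)

# Payload mass implied by the two figures above: 1301 + 17 = 1318 Da.
PAYLOAD_MASS_DA = SHIFT_PER_PAYLOAD_DA + AMMONIA_LOSS_DA


def mass_shift_per_payload(payload_mass: float) -> float:
    """Net mass added to the antibody by one conjugated payload."""
    return payload_mass - AMMONIA_LOSS_DA


def expected_conjugate_mass(antibody_mass: float, dar: int,
                            payload_mass: float = PAYLOAD_MASS_DA) -> float:
    """Expected intact mass of an antibody carrying `dar` payloads."""
    return antibody_mass + dar * mass_shift_per_payload(payload_mass)


if __name__ == "__main__":
    hypothetical_antibody_mass = 145_000  # invented value, illustration only
    for dar in (0, 1, 2):
        print(dar, expected_conjugate_mass(hypothetical_antibody_mass, dar))
```

Adjacent DAR species are therefore expected to be separated by a constant 1,301 Da regardless of the antibody mass.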

Figure 8 shows the antibody conjugation efficiency (EC50) of selected mTGase mutants. The antibody (Ab1) was conjugated with AcLys-vc-0101 payload with decreasing amounts of particular mTGase pro-domain mutant. Average drug-antibody ratio (DAR) was determined by hydrophobic interaction chromatography. Titration of wild type S. mobarensis mTGase is shown in upward closed triangles (▲), of mutant Tyr14 in circles (●), of mutant Ile24 in a checked box, of mutant Leu16 in open triangles (△), of mutant Asp20 in open squares (□), of mutant Asn23 in diamonds, of mutant Asn28 in open circles (○), and of non-mutated processed E. coli mTGase in downward closed triangles (▼). The enzyme concentration is plotted on a logarithmic scale.
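An average DAR is commonly obtained from a chromatogram as the peak-area-weighted mean of the drug loading of each resolved species. The sketch below illustrates that general calculation. It is an assumption that the averages in Figure 8 were computed in this way, since the description does not give the formula; the function name average_dar and the peak areas in the usage example are hypothetical and are not data from the disclosure.

```python
def average_dar(peak_areas: dict[int, float]) -> float:
    """Area-weighted mean drug loading.

    peak_areas maps drug loading (0, 1, 2, ...) to the integrated peak area
    of that species, in any consistent unit.
    """
    total_area = sum(peak_areas.values())
    if total_area <= 0:
        raise ValueError("total peak area must be positive")
    weighted = sum(loading * area for loading, area in peak_areas.items())
    return weighted / total_area


if __name__ == "__main__":
    # Invented peak areas (percent of total), for illustration only.
    hypothetical_areas = {0: 5.0, 1: 20.0, 2: 75.0}
    print(average_dar(hypothetical_areas))
```

Plotting such averages against enzyme concentration on a logarithmic axis yields a titration curve of the kind shown in Figure 8.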

Figures 9A-9C show the alignment of mTGase pro-domains from different Streptomyces species. Exemplary residues of the pro-domain region that may be modified in some embodiments of the invention are indicated by an "X".

Detailed Description of the Invention

General Techniques

Unless otherwise defined herein, scientific and technical terms used in connection with the present invention shall have the meanings that are commonly understood by those of ordinary skill in the art. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. Generally, nomenclatures used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those well known and commonly used in the art.

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are within the skill of the art. Such techniques are explained fully in the literature, such as, Molecular Cloning: A Laboratory Manual, second edition (Sambrook et al., 1989) Cold Spring Harbor Press; Oligonucleotide Synthesis (M.J. Gait, ed., 1984); Methods in Molecular Biology, Humana Press; Cell Biology: A Laboratory Notebook (J.E. Cellis, ed., 1998) Academic Press; Animal Cell Culture (R. I. Freshney, ed., 1987); Introduction to Cell and Tissue Culture (J. P. Mather and P.E. Roberts, 1998) Plenum Press; Cell and Tissue Culture: Laboratory Procedures (A. Doyle, J.B. Griffiths, and D.G. Newell, eds., 1993-1998) J. Wiley and Sons; Methods in Enzymology (Academic Press, Inc.); Gene Transfer Vectors for Mammalian Cells (J.M. Miller and M.P. Calos, eds., 1987); Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987); PCR: The Polymerase Chain Reaction, (Mullis et al., eds., 1994); Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (2001); Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, NY (2002); Harlow and Lane Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1998); Coligan et al., Short Protocols in Protein Science, John Wiley & Sons, NY (2003); Short Protocols in Molecular Biology (Wiley and Sons, 1999).

Enzymatic reactions and purification techniques are performed according to manufacturer's specifications, as commonly accomplished in the art or as described herein. The nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, biochemistry, immunology, molecular biology, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well known and commonly used in the art. Standard techniques are used for chemical syntheses, and chemical analyses.

Throughout this specification and claims, the word "comprise," or variations such as "comprises" or "comprising," will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

It is understood that wherever embodiments are described herein with the language "comprising," otherwise analogous embodiments described in terms of "consisting of" and/or "consisting essentially of" are also provided. The articles "a", "an" and "the" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.

Reference to "about" a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se. For example, description referring to "about X" includes description of "X." Numeric ranges are inclusive of the numbers defining the range.

Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements. Moreover, all ranges disclosed herein are to be understood to encompass any and all subranges subsumed therein. For example, a stated range of "1 to 10" should be considered to include any and all subranges between (and inclusive of) the minimum value of 1 and the maximum value of 10; that is, all subranges beginning with a minimum value of 1 or more, e.g., 1 to 6.1, and ending with a maximum value of 10 or less, e.g., 5.5 to 10.

Where aspects or embodiments of the invention are described in terms of a Markush group or other grouping of alternatives, the present invention encompasses not only the entire group listed as a whole, but also each member of the group individually and all possible subgroups of the main group, as well as the main group absent one or more of the group members. The present invention also envisages the explicit exclusion of one or more of any of the group members in the claimed invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In case of conflict, the present specification, including definitions, will control. Throughout this specification and claims, the word "comprise," or variations such as "comprises" or "comprising" will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers. Unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. Any example(s) following the term "e.g." or "for example" is not meant to be exhaustive or limiting. Exemplary methods and materials are described herein, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention. The materials, methods, and examples are illustrative only and not intended to be limiting.

Definitions

The following terms, unless otherwise indicated, shall be understood to have the following meanings:

The terms "polypeptide", "oligopeptide", "peptide" and "protein" are used interchangeably herein to refer to chains of amino acids of any length. The chain may be linear or branched, it may comprise modified amino acids, and/or may be interrupted by non-amino acids. The terms also encompass an amino acid chain that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), as well as other modifications known in the art. It is understood that the polypeptides can occur as single chains or associated chains.

As known in the art, "polynucleotide," or "nucleic acid," as used interchangeably herein, refer to chains of nucleotides of any length, and include DNA and RNA. The nucleotides can be deoxyribonucleotides, ribonucleotides, modified nucleotides or bases, and/or their analogs, or any substrate that can be incorporated into a chain by DNA or RNA polymerase. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and their analogs. If present, modification to the nucleotide structure may be imparted before or after assembly of the chain. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component. Other types of modifications include, for example, "caps", substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), those containing pendant moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide(s). Further, any of the hydroxyl groups ordinarily present in the sugars may be replaced, for example, by phosphonate groups, phosphate groups, protected by standard protecting groups, or activated to prepare additional linkages to additional nucleotides, or may be conjugated to solid supports. 
The 5' and 3' terminal OH can be phosphorylated or substituted with amines or organic capping group moieties of from 1 to 20 carbon atoms. Other hydroxyls may also be derivatized to standard protecting groups. Polynucleotides can also contain analogous forms of ribose or deoxyribose sugars that are generally known in the art, including, for example, 2'-O-methyl-, 2'-O-allyl, 2'-fluoro- or 2'-azido-ribose, carbocyclic sugar analogs, alpha- or beta-anomeric sugars, epimeric sugars such as arabinose, xyloses or lyxoses, pyranose sugars, furanose sugars, sedoheptuloses, acyclic analogs and abasic nucleoside analogs such as methyl riboside. One or more phosphodiester linkages may be replaced by alternative linking groups. These alternative linking groups include, but are not limited to, embodiments wherein phosphate is replaced by P(O)S ("thioate"), P(S)S ("dithioate"), (O)NR2 ("amidate"), P(O)R, P(O)OR', CO or CH2 ("formacetal"), in which each R or R' is independently H or substituted or unsubstituted alkyl (1-20 C) optionally containing an ether (-O-) linkage, aryl, alkenyl, cycloalkyl, cycloalkenyl or aralkyl. Not all linkages in a polynucleotide need be identical. The preceding description applies to all polynucleotides referred to herein, including RNA and DNA.

As used herein, the terms "wild-type amino acid," "wild-type mTGase," "wild-type pro-domain," "wild-type protein," or "wild-type nucleic acid" refer to a sequence of amino or nucleic acids that occurs naturally within a certain population (e.g., a particular microbial species, etc.).

As used herein, the terms "engineered" or "modified" (e.g., engineered polypeptide or modified polypeptide, engineered polynucleotide or modified polynucleotide) are used interchangeably herein to refer to a non-native sequence that has been manipulated to have one or more changes relative to a native sequence. As outlined elsewhere herein, certain positions of the mTGase polypeptide can be altered. By "position" as used herein is meant a location in the sequence of a protein. Corresponding positions are generally determined through alignment with other parent sequences.

As used herein, "residue" refers to a position in a protein and its associated amino acid identity. For example, Tyrosine 10 (also referred to as Tyr10, also referred to as Y10) is a residue in mTGase.

As used herein, a "host cell" includes an individual cell or cell culture that can be or has been a recipient for vector(s) for incorporation of polynucleotide inserts. Host cells include progeny of a single host cell, and the progeny may not necessarily be completely identical (in morphology or in genomic DNA complement) to the original parent cell due to natural, accidental, or deliberate mutation. A host cell includes cells transfected and/or transformed in vivo with a polynucleotide of this invention.

As used herein, "vector" means a construct, which is capable of delivering, and, preferably, expressing, one or more gene(s) or sequence(s) of interest in a host cell. Examples of vectors include, but are not limited to, viral vectors, naked DNA or RNA expression vectors, plasmid, cosmid or phage vectors, DNA or RNA expression vectors associated with cationic condensing agents, DNA or RNA expression vectors encapsulated in liposomes, and certain eukaryotic cells, such as producer cells.

As used herein, "expression control sequence" means a nucleic acid sequence that directs transcription of a nucleic acid. An expression control sequence can be a promoter, such as a constitutive or an inducible promoter, or an enhancer. The expression control sequence is operably linked to the nucleic acid sequence to be transcribed.

As used herein, "isolated molecule" (where the molecule is, for example, a polypeptide, a polynucleotide, or fragment thereof) is a molecule that by virtue of its origin or source of derivation (1) is not associated with naturally associated components that accompany it in its native state, (2) is substantially free of other molecules from the same species, (3) is expressed by a cell from a different species, or (4) does not occur in nature. Thus, a molecule that is chemically synthesized, or expressed in a cellular system different from the cell from which it naturally originates, will be "isolated" from its naturally associated components. A molecule also may be rendered substantially free of naturally associated components by isolation, using purification techniques well known in the art. Molecule purity or homogeneity may be assayed by a number of means well known in the art. For example, the purity of a polypeptide sample may be assayed using polyacrylamide gel electrophoresis and staining of the gel to visualize the polypeptide using techniques well known in the art. For certain purposes, higher resolution may be provided by using HPLC or other means well known in the art for purification.

"Homologous," in all its grammatical forms and spelling variations, refers to the relationship between two proteins that possess a "common evolutionary origin," including proteins from superfamilies in the same species of organism, as well as homologous proteins from different species of organism. Such proteins (and their encoding nucleic acids) have sequence homology, as reflected by their sequence similarity, whether in terms of percent identity or by the presence of specific residues or motifs and conserved positions.

However, in common usage and in the instant application, the term "homologous," when modified with an adverb such as "highly," may refer to sequence similarity and may or may not relate to a common evolutionary origin.

The term "sequence similarity," in all its grammatical forms, refers to the degree of identity or correspondence between nucleic acid or amino acid sequences that may or may not share a common evolutionary origin.

"Percent (%) sequence identity" with respect to a reference polypeptide (or nucleotide) sequence is defined as the percentage of amino acid residues (or nucleic acids) in a candidate sequence that are identical with the amino acid residues (or nucleic acids) in the reference polypeptide (nucleotide) sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.

As used herein, "purify," and grammatical variations thereof, refers to the removal, whether completely or partially, of at least one impurity from a mixture containing the polypeptide and one or more impurities, which thereby improves the level of purity of the polypeptide in the composition (i.e., by decreasing the amount (ppm) of impurity(ies) in the composition).

As used herein, "substantially pure" refers to material which is at least 50% pure (i.e., free from contaminants), more preferably, at least 90% pure, more preferably, at least 95% pure, yet more preferably, at least 98% pure, and most preferably, at least 99% pure.

As used herein, "acyl donor glutamine-containing tag", "glutamine tag," "Q-containing tag", or "Q-tag", refers to a polypeptide or a protein containing one or more Gln residue(s) that acts as a transglutaminase amine acceptor.

As used herein, "amine donor agent" or "acyl acceptor" refers to an agent containing one or more reactive amines (e.g., primary amines). For example, the amine donor agent can comprise an amine donor unit (e.g., primary amine NH2), a linker, and an agent moiety (e.g., a small molecule). The amine donor agent can also be a polypeptide (e.g., an antibody) or a biocompatible polymer containing a reactive Lys (e.g., an endogenous Lys).

As used herein, the term "biocompatible polymer" refers to a polymer (e.g., repeating monomeric or structural units) that is suitable for therapeutic or medical treatment in a recipient (e.g., human) without eliciting any undesirable local or systemic effects in the recipient. A biocompatible polymer (synthetic, recombinant, or native) can be a water soluble or water insoluble polymer. A biocompatible polymer can also be a linear or a branched polymer.

As used herein, the term "site specificity," "site-specifically conjugated," or "site-specifically crosslinked" refers to the specific conjugation or crosslinking of the amine donor agent to the polypeptide engineered with an acyl donor glutamine-containing tag at a specific site (e.g., carboxyl terminus or amino terminus of the antibody or toxin polypeptide, accessible site in the antibody (e.g., antibody light chain and/or heavy chain loops) or toxin polypeptide (e.g., polypeptide loops)). The polypeptide engineered with an acyl donor glutamine-containing tag can be a Fc-containing polypeptide, Fab-containing polypeptide, or a toxin polypeptide. The term "site specificity," "site-specifically conjugated," or "site-specifically crosslinked" can also refer to the specific conjugation or crosslinking of the polypeptide (e.g., toxin polypeptide) to the biocompatible polymer engineered with an acyl donor glutamine-containing tag at a specific site (e.g., an accessible site in the biocompatible polymer). Site specificity can be measured by various techniques, including, but not limited to, mass spectrometry (e.g., matrix-assisted laser-desorption ionization mass spectrometry (MALDI-MS), electrospray ionization mass spectrometry (ESI-MS), tandem mass spectrometry (MS), and time-of-flight mass spectrometry (TOF-MS)), hydrophobic interaction chromatography, ion exchange chromatography, site-directed mutagenesis, fluorescence-labeling, size exclusion chromatography, and X-ray crystallography.

As used herein, the term "spatially adjacent to" refers to interference with the desired transglutaminase reaction.

As used herein, the term "antibody" is an immunoglobulin molecule capable of specific binding to a target, such as a carbohydrate, polynucleotide, lipid, polypeptide, etc., through at least one antigen recognition site, located in the variable region of the immunoglobulin molecule. As used herein, unless otherwise indicated by context, the term is intended to encompass not only intact polyclonal or monoclonal antibodies, but also fragments thereof (such as Fab, Fab', F(ab')2, Fv), single chain (ScFv) and domain antibodies (including shark and camelid antibodies), and fusion proteins comprising an antibody portion, multivalent antibodies, multispecific antibodies (e.g., bispecific antibodies so long as they exhibit the desired biological activity) and antibody fragments as described herein, and any other modified configuration of the immunoglobulin molecule that comprises an antigen recognition site. An antibody includes an antibody of any class, such as IgG, IgA, or IgM (or sub-class thereof), and the antibody need not be of any particular class. Depending on the antibody amino acid sequence of the constant domain of its heavy chains, immunoglobulins can be assigned to different classes. There are five major classes of immunoglobulins: IgA, IgD, IgE, IgG, and IgM, and several of these may be further divided into subclasses (isotypes), e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2. The heavy-chain constant domains that correspond to the different classes of immunoglobulins are called alpha, delta, epsilon, gamma, and mu, respectively. The subunit structures and three-dimensional configurations of different classes of immunoglobulins are well known. In one aspect, the immunoglobulin is a human, murine, or rabbit immunoglobulin.

As used herein, "Fab containing polypeptide" refers to a polypeptide comprising an Fab fragment, Fab' fragment, or F(ab')2 fragment. An Fab-containing polypeptide may comprise part or all of a wild-type hinge sequence (generally at its carboxyl terminus). An Fab-containing polypeptide may be obtained or derived from any suitable immunoglobulin, such as from at least one of the various IgG1, IgG2, IgG3, or IgG4 subtypes, or from IgA, IgE, IgD or IgM. An Fab-containing polypeptide may be an Fab-containing fusion polypeptide, wherein one or more polypeptides is operably linked to an Fab-containing polypeptide. An Fab fusion combines the Fab polypeptide of an immunoglobulin with a fusion partner, which in general may be any protein, polypeptide, or small molecule. Virtually any protein or small molecule may be linked to the Fab polypeptide to generate an Fab-containing fusion polypeptide. Fab-containing fusion partners may include, but are not limited to, the target-binding region of a receptor, an adhesion molecule, a ligand, an enzyme, a cytokine, a chemokine, or some other protein or protein domain.

An "Fc-containing polypeptide" may be an Fc-containing fusion polypeptide, wherein one or more polypeptides is operably linked to an Fc-containing polypeptide. An Fc fusion combines the Fc polypeptide of an immunoglobulin with a fusion partner, which in general may be any protein, polypeptide, or small molecule. Virtually any protein or small molecule may be linked to the Fc region to generate an Fc-containing fusion polypeptide. Fc-containing fusion partners may include, but are not limited to, the target-binding region of a receptor, an adhesion molecule, a ligand, an enzyme, a cytokine, a chemokine, or some other protein or protein domain.

As used herein, the term "conjugation efficiency" or "crosslinking efficiency" is the ratio of the experimentally measured amount of engineered polypeptide conjugate to the maximum expected engineered polypeptide conjugate amount. Conjugation efficiency or crosslinking efficiency can be measured by various techniques well known to persons skilled in the art, such as hydrophobic interaction chromatography. Conjugation efficiency can also be measured at different temperatures, such as room temperature or 37°C.
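The ratio in this definition can be sketched in a few lines. The function name `conjugation_efficiency` and the example numbers are hypothetical. Expressing both amounts as a drug-antibody ratio (a measured average DAR against the number of tagged sites per antibody) is an assumption made for illustration, and any consistent unit would serve.

```python
def conjugation_efficiency(measured_conjugate: float, max_expected_conjugate: float) -> float:
    """Ratio of measured conjugate amount to the maximum expected amount."""
    if max_expected_conjugate <= 0:
        raise ValueError("maximum expected amount must be positive")
    if measured_conjugate < 0:
        raise ValueError("measured amount cannot be negative")
    # Both arguments must be in the same units for the ratio to be meaningful.
    return measured_conjugate / max_expected_conjugate


# Hypothetical numbers: an average DAR of 1.9 measured for an antibody
# carrying two glutamine tags (maximum expected DAR of 2.0).
efficiency = conjugation_efficiency(1.9, 2.0)
print(f"{efficiency:.0%}")
```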

Engineered Microbial Transglutaminase Polypeptides

This application discloses an mTGase expression system that allows recombinant expression of soluble and fully active mTGase from S. mobarensis in the cytoplasm of, e.g., E. coli, and yields mTGase with enzymatic activity identical to that of S. mobarensis derived mTGase. The present disclosure provides a new system to express genetically engineered mTGase soluble in the cytoplasm of E. coli. A straightforward 2-step purification yields mature enzyme with activity identical to that of wild type mTGase from S. mobarensis.

In one aspect, the mTGase expression system of the disclosure comprises engineered mTGase polypeptides comprising one or more amino acid substitutions in the pro-domain region. In some embodiments, the mTGase expression system of the disclosure comprises engineered mTGase polypeptides comprising a protease cleavage site between the pro-domain region and the enzyme-domain region. In some embodiments, the mTGase expression system of the disclosure comprises a single vector comprising both an mTGase encoding nucleic acid and a protease encoding nucleic acid. In some embodiments, the mTGase expression system of the disclosure comprises two or more vectors, with one vector comprising an mTGase encoding nucleic acid and one vector comprising a protease encoding nucleic acid. In some embodiments, the mTGase and the protease are co-expressed (e.g., simultaneously or in overlapping manner). In some embodiments, the mTGase and the protease are sequentially expressed.

The present invention relates to engineered microbial transglutaminase (mTGase) polypeptides that allow efficient production of soluble and active mTGase. The engineered polypeptides preferably have one or more mutations in the pro-domain region (e.g., one or more amino acid substitutions) relative to the corresponding wild-type polypeptide and/or comprise one or more protease cleavage sites engineered between the pro-domain region and the enzyme domain region of the polypeptide. In some embodiments, co-expression of a protease along with a polypeptide of the invention results in a simple and efficient method of active mTGase production. In certain embodiments, one or more mutations in the pro-domain region modulate the folding chaperone activity of the pro-domain and the strength of the pro- and enzyme-domain interaction. In certain embodiments, the one or more mutations in the pro-domain region weaken the interaction between the pro- and enzyme domains of the mTGase polypeptide.

In some embodiments, the recombinantly expressed mature mTGase of the invention has activity comparable to that of a native mature mTGase (e.g., native mature mTGase produced by fermentation of wild-type S. mobarensis). In some embodiments, a mature mTGase produced according to the methods of the invention has at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 65%, at least 60%, at least 55%, at least 50%, at least 45%, at least 40%, at least 35%, at least 30%, or at least 25% of the enzyme activity of a native mature mTGase. In some embodiments, a mature mTGase of the invention has 95%-100%, 90%-95%, 85%-90%, 80%-85%, 75%-80%, 70%-75%, 65%-70%, 60%-65%, 55%-60%, 50%-55%, 45%-50%, 40%-45%, 35%-40%, 30%-35%, or 25%-30% of the enzyme activity of a native mature mTGase.

In some embodiments, a mature mTGase produced according to the methods of the invention has at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 65%, at least 60%, at least 55%, at least 50%, at least 45%, at least 40%, at least 35%, at least 30%, or at least 25% of the pro-domain dissociated from it. In some embodiments, a mature mTGase of the invention has 95%-100%, 90%-95%, 85%-90%, 80%-85%, 75%-80%, 70%-75%, 65%-70%, 60%-65%, 55%-60%, 50%-55%, 45%-50%, 40%-45%, 35%-40%, 30%-35%, or 25%-30% of the pro-domain dissociated from it.

Transglutaminases of microbiological origin generally have low molecular weight, as opposed to transglutaminases isolated from animal tissues. Microbiological transglutaminases are typically simple monomeric proteins (not a glycoprotein or lipoprotein) (Yokoyama et al. Appl Microbiol Biotechnol. 2004; 64:447-454) and are produced as an inactive pro-enzyme. Subsequently, proteolytic cleavage of the propeptide or pro-domain region yields active mTGase. As used herein, "microbial transglutaminase" or "mTGase" refers to any transglutaminase or protein-glutamine gamma-glutamyltransferase which may be derived from any suitable microbial organism. It is generally known that a secretory protein is translated as a prepeptide or a prepropeptide and is converted into a mature protein. That is, it is generally known that a secretory protein is translated as a prepeptide or a prepropeptide and is converted into a mature peptide or a propeptide upon cleavage of the pre-domain, and the propeptide is further converted into a mature protein upon cleavage of the pro-domain with a protease. Unless specified otherwise, "mTGase" encompasses the mTGase proenzyme ("pro-mTGase", comprising both the pro-domain and the enzyme domain) as well as the mature enzyme (enzyme domain only). The pro-domain and enzyme domains of an mTGase may be identified through various methods known in the art (e.g., by sequence alignments with known domains). As used herein, "microbial transglutaminase" or "mTGase" also encompasses fragments, functional variants, isoforms, and other homologs of such transglutaminases. The mTGase polypeptides of the invention described herein can be derived from a variety of sources. Variant mTGase polypeptides will generally be characterized by having the same type of activity as naturally occurring mTGase, such as the ability to catalyze the acyl transfer reaction between the γ-carboxyamide group of peptide-bound glutamine residues and a variety of primary amines.
In some embodiments, the mTGase polypeptide is derived from a fungal protein (e.g., Oomycetes, Actinomycetes, Saccharomyces, Candida, Cryptococcus, Monascus, or Rhizopus transglutaminases). In some embodiments, the mTGase polypeptide is derived from Myxomycetes (e.g., Physarum polycephalum transglutaminase). In some embodiments, the mTGase polypeptide is derived from a bacterial protein, such as transglutaminase from Streptoverticillium sp. or Streptomyces sp. (e.g., Streptomyces mobarensis or Streptoverticillium mobarensis). In some embodiments, the mTGase polypeptide is derived from a bacterial protein, such as transglutaminase from, but not limited to, Streptoverticillium mobarensis, Streptoverticillium griseocarneum, Streptoverticillium ladakanum, Streptomyces mobarensis, Streptomyces viridis, Streptomyces ladakanum, Streptomyces caniferus, Streptomyces platensis, Streptomyces hygroscopius, Streptomyces netropsis, Streptomyces fradiae, Streptomyces roseovertivillatus, Streptomyces cinnamaoneous, Streptomyces griseocarneum, Streptomyces lavendulae, Streptomyces lividans, Streptomyces lydicus, Streptomyces sioyansis, Actinomadura sp., Bacillus (e.g., Bacillus circulans, Bacillus subtilis spores, etc.), Corynebacterium ammoniagenes, Corynebacterium glutamicum, Clostridium, Enterobacter sp., Micrococcus, Providencia sp., or isolates thereof. In some embodiments, the transglutaminase is a calcium independent transglutaminase which does not require calcium to induce enzyme conformational changes and allow enzyme activity. In some embodiments, the mTGase polypeptide is derived from S. mobarensis. The wild-type S. mobarensis mTGase polypeptide sequence is shown in SEQ ID NO: 4. Residues 1-44 of SEQ ID NO: 4 correspond to the pro-domain region (which is also set forth in SEQ ID NO: 1) and residues 45-375 of SEQ ID NO: 4 correspond to the enzyme domain.
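The residue numbering given for SEQ ID NO: 4 can be made concrete with a short sketch that splits a pro-mTGase sequence into its pro-domain (residues 1-44) and enzyme domain (residues 45-375). The helper names and the placeholder string are hypothetical. The placeholder merely has the right length and is not the sequence of SEQ ID NO: 4, which is not reproduced here.

```python
# 1-based, inclusive residue ranges as stated for SEQ ID NO: 4.
PRO_DOMAIN = (1, 44)
ENZYME_DOMAIN = (45, 375)


def slice_residues(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("residue range outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


def split_pro_mtgase(sequence: str) -> tuple[str, str]:
    """Split a pro-mTGase sequence into (pro-domain, enzyme domain)."""
    return (
        slice_residues(sequence, *PRO_DOMAIN),
        slice_residues(sequence, *ENZYME_DOMAIN),
    )


# Placeholder of the right length only: 'P' marks pro-domain positions and
# 'E' marks enzyme-domain positions. This is NOT the real sequence.
placeholder = "P" * 44 + "E" * 331
pro, enzyme = split_pro_mtgase(placeholder)
print(len(pro), len(enzyme))
```

With this numbering, a pro-domain mutant such as Y14A refers to position 14 of the pro-domain slice, i.e. index 13 in 0-based terms.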

In one aspect, the engineered transglutaminase polypeptide of the invention is a modified mTGase polypeptide and comprises one or more mutations relative to the wild-type microbial polypeptide. In some embodiments, the engineered transglutaminase polypeptide of the invention has one or more mutations in the pro-domain region. In some embodiments, the modification or mutation is an amino acid deletion, insertion, substitution, or any combination thereof. A modified polypeptide may comprise 1, 2, 3, 4, 5, up to 10, or more amino acid substitutions and/or deletions and/or insertions. A "deletion" may comprise the deletion of individual amino acids, deletion of small groups of amino acids such as 2, 3, 4 or 5 amino acids, or deletion of larger amino acid regions, such as the deletion of specific amino acid domains or other features. An "insertion" may comprise the insertion of individual amino acids, insertion of small groups of amino acids such as 2, 3, 4 or 5 amino acids, or insertion of larger amino acid regions, such as the insertion of specific amino acid domains or other features. A "substitution" comprises replacing a wild type amino acid with another (e.g., a non-wild type amino acid). In some embodiments, the another (e.g., non-wild type) or inserted amino acid is Ala (A), His (H), Lys (K), Phe (F), Met (M), Thr (T), Gln (Q), Asp (D), or Glu (E). In some embodiments, the another (e.g., non-wild type) or inserted amino acid is A. In some embodiments, the another (e.g., non-wild type) amino acid is Arg (R), Asn (N), Cys (C), Gly (G), Ile (I), Leu (L), Pro (P), Ser (S), Trp (W), Tyr (Y), or Val (V). Conventional or naturally occurring amino acids are divided into the following basic groups based on common side-chain properties: (1) non-polar: Norleucine, Met, Ala, Val, Leu, Ile; (2) polar without charge: Cys, Ser, Thr, Asn, Gln; (3) acidic (negatively charged): Asp, Glu; (4) basic (positively charged): Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; and (6) aromatic: Trp, Tyr, Phe, His.

Conventional amino acids include L or D stereochemistry. In some embodiments, the replacement (e.g., non-wild type) amino acid is a member of a different group (e.g., an aromatic amino acid is substituted for a non-polar amino acid). Substantial modifications in the biological properties of the polypeptide are accomplished by selecting substitutions that differ significantly in their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a β-sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain. Naturally occurring residues are divided into groups based on common side-chain properties: (1) Non-polar: Norleucine, Met, Ala, Val, Leu, Ile; (2) Polar without charge: Cys, Ser, Thr, Asn, Gln; (3) Acidic (negatively charged): Asp, Glu; (4) Basic (positively charged): Lys, Arg; (5) Residues that influence chain orientation: Gly, Pro; and (6) Aromatic: Trp, Tyr, Phe, His. In some embodiments, the replacement (e.g., non-wild type) amino acid is a member of a different group (e.g., a hydrophobic amino acid for a hydrophilic amino acid, a charged amino acid for a neutral amino acid, an acidic amino acid for a basic amino acid, etc.). In some embodiments, the replacement (e.g., non-wild type) amino acid is a member of the same group (e.g., another basic amino acid, another acidic amino acid, another neutral amino acid, another charged amino acid, another hydrophilic amino acid, another hydrophobic amino acid, another polar amino acid, another aromatic amino acid or another aliphatic amino acid). In some embodiments, the replacement (e.g., non-wild type) amino acid is an unconventional amino acid. Unconventional amino acids are non-naturally occurring amino acids.
Examples of an unconventional amino acid include, but are not limited to, aminoadipic acid, beta-alanine, beta-aminopropionic acid, aminobutyric acid, piperidinic acid, aminocaproic acid, aminoheptanoic acid, aminoisobutyric acid, aminopimelic acid, citrulline, diaminobutyric acid, desmosine, diaminopimelic acid, diaminopropionic acid, N-ethylglycine, N-ethylasparagine, hydroxylysine, allo-hydroxylysine, hydroxyproline, isodesmosine, allo-isoleucine, N-methylglycine, sarcosine, N-methylisoleucine, N-methylvaline, norvaline, norleucine, ornithine, 4-hydroxyproline, γ-carboxyglutamate, ε-N,N,N-trimethyllysine, ε-N-acetyllysine, O-phosphoserine, N-acetylserine, N-formylmethionine, 3-methylhistidine, 5-hydroxylysine, o-N-methylarginine, and other similar amino acids and imino acids (e.g., 4-hydroxyproline). The sites of greatest interest for substitutional mutagenesis include the pro-domain regions, but enzyme domain alterations are also contemplated.
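By way of illustration only, the side-chain grouping set out above can be expressed as a lookup table for checking whether a proposed substitution stays within a group or crosses between groups. The following is a minimal sketch and not part of the disclosure; the names SIDE_CHAIN_GROUPS, group_of and is_same_group are hypothetical, and "Nle" is assumed here as the three-letter abbreviation for norleucine.

```python
# Side-chain groups as enumerated in the text above.
# "Nle" (norleucine) is an assumed abbreviation; all names are illustrative.
SIDE_CHAIN_GROUPS = {
    "non-polar": {"Nle", "Met", "Ala", "Val", "Leu", "Ile"},
    "polar without charge": {"Cys", "Ser", "Thr", "Asn", "Gln"},
    "acidic": {"Asp", "Glu"},
    "basic": {"Lys", "Arg"},
    "chain orientation": {"Gly", "Pro"},
    "aromatic": {"Trp", "Tyr", "Phe", "His"},
}


def group_of(residue: str) -> str:
    """Return the name of the side-chain group containing the residue."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"residue not in any listed group: {residue}")


def is_same_group(wild_type: str, replacement: str) -> bool:
    """True if both residues fall in the same side-chain group."""
    return group_of(wild_type) == group_of(replacement)
```

Under this grouping, an Asp to Glu change stays within the acidic group, whereas a Tyr to Ala change moves from the aromatic group to the non-polar group.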

In one aspect, an engineered polypeptide comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 conservative or non-conservative substitutions relative to the wild-type polypeptide.

In another aspect, the modified mTGase polypeptide of the invention comprises modified sequences, wherein such modifications can include both conservative and non-conservative substitutions, deletions, and/or additions, and typically include peptides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the corresponding wild-type mTGase polypeptide. The engineered mTGase polypeptides of the invention can be derived from a transglutaminase of any microorganism having such a gene (e.g., a bacterial transglutaminase). In certain embodiments, the engineered mTGase polypeptides of the invention can be derived from a transglutaminase of any of the microorganisms disclosed herein. In some embodiments, the engineered mTGase polypeptides of the invention have one or more amino acid substitutions (1, 2, 3, 4, 5, up to 10, or more amino acid substitutions) within the wild-type pro-domain region. In some embodiments, the wild-type pro-domain region of the engineered mTGase is selected from the group consisting of a) residues 1-44 of SEQ ID NO: 1 (pro-domain of Streptomyces mobaraensis_Q6E0Y3); b) residues 1-44 of SEQ ID NO: 32 (pro-domain of Streptomyces mobaraensis_Q6RET8); c) residues 1-44 of SEQ ID NO: 33; d) residues 1-46 of SEQ ID NO: 34 (pro-domain of Streptomyces mobaraensis_Q6TGB5); e) residues 1-44 of SEQ ID NO: 35 (pro-domain of Streptomyces virides_A5PHK1); f) residues 1-52 of SEQ ID NO: 36 (pro-domain of Streptomyces ladakanum_Q84AM9); g) residues 1-57 of SEQ ID NO: 37 (pro-domain of Streptomyces caniferus_A5PHK3); h) residues 1-57 of SEQ ID NO: 38 (pro-domain of Streptomyces platensis_A5PHK5); i) residues 1-57 of SEQ ID NO: 39 (pro-domain of Streptomyces platensis_Q6Q6T1); j) residues 1-56 of SEQ ID NO: 40 (pro-domain of Streptomyces hygroscopicus_B1PMA0); k) residues 1-51 of SEQ ID NO: 41 (pro-domain of Streptomyces netropsis_A9Q0L6); l) residues 1-52 of SEQ ID NO: 42 (pro-domain of Streptomyces fradiae_Q0GYU0); m) residues 1-57 of SEQ ID NO: 43 (pro-domain of Streptomyces roseovertivillatus_A5PHK2); and n) residues 1-53 of SEQ ID NO: 44 (pro-domain of Streptomyces cinnamoneus_Q8GR90), and one or more amino acid substitutions are made relative to the above sequences. In some embodiments, the one or more residues to be substituted are selected by aligning the pro-domain polypeptide sequences and identifying one or more conserved residues. In some embodiments, the one or more residues to be substituted are selected by carrying out structural alignments and identifying one or more residues that contact the enzyme domain. In some embodiments, the pro-domain region of the engineered mTGase comprises the pro-domain region from S. mobarensis (e.g., residues 1-44 of SEQ ID NO: 1, residues 1-44 of SEQ ID NO: 32, or residues 1-44 of SEQ ID NO: 33). In some embodiments, the amino acid substitution is from a wild-type amino acid to Ala, His, Lys, Phe, Met, Thr, Gln, Asp, or Glu. In some embodiments, the amino acid substitution is from a wild-type amino acid to Ala. In some embodiments, the amino acid substitution is from a wild-type amino acid to Arg, Asn, Cys, Gly, Ile, Leu, Pro, Ser, Trp, Tyr, or Val. In some embodiments, the amino acid substitution is from a wild-type amino acid to a small, acidic, charged or polar amino acid. In some embodiments, the amino acid substitution is from a wild-type amino acid to a conservative substitution of the wild-type amino acid or to an amino acid found at the corresponding position in a different bacterial species. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 of the above residues may be substituted. In some embodiments, one or more of the above residues (e.g., 1, 2, 3, 4, or 5) may be deleted.
In some embodiments, the pro-domain region of the mTGase polypeptide comprises a variant of these sequences, wherein such variants can include both conservative and non-conservative substitutions, deletions, and/or additions, and typically include peptides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any of the specific sequences disclosed herein. In some embodiments, the pro-domain region of the engineered mTGase comprises an amino acid sequence that is at least 85%, 90%, 95%, 97%, 99% or 100% identical to residues 1-44 of SEQ ID NO: 1.
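The percent sequence identity thresholds recited above can be illustrated with a short calculation. The sketch below assumes one common convention, namely identical residues divided by the number of aligned columns in which neither sequence has a gap; the text does not specify a particular algorithm, other conventions exist, and the function name percent_identity is hypothetical. The two input strings are assumed to have been aligned beforehand, with "-" marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pre-computed pairwise alignment.

    Assumption: gaps are written as '-' and columns containing a gap in
    either sequence are excluded from the comparison.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```

For a 44-residue pro-domain such as residues 1-44 of SEQ ID NO: 1, this convention implies that a variant with four substitutions and no gaps is 40/44, or about 90.9%, identical.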

In some embodiments, the engineered mTGase polypeptides of the invention have one or more amino acid substitutions (1, 2, 3, 4, 5, up to 10, or more amino acid substitutions) within conserved regions of the pro-domains. Conservation of protein sequences may be indicated by the presence of identical or functionally similar amino acid residues at analogous positions of proteins across species. Conserved regions of the pro-domain may be identified using various methods known in the art. In some embodiments, one such conserved region is as shown in Figures 9A-9C and corresponds to a) residues Tyr10 to Glu29 of SEQ ID NO: 1 (pro-domain of Streptomyces mobaraensis_Q6E0Y3); b) residues Tyr10 to Glu29 of SEQ ID NO: 32 (pro-domain of Streptomyces mobaraensis_Q6RET8); c) residues Tyr10 to Glu29 of SEQ ID NO: 33; d) residues Tyr10 to Glu29 of SEQ ID NO: 34 (pro-domain of Streptomyces mobaraensis_Q6TGB5); e) residues Tyr10 to Glu29 of SEQ ID NO: 35 (pro-domain of Streptomyces virides_A5PHK1); f) residues Tyr17 to Glu36 of SEQ ID NO: 36 (pro-domain of Streptomyces ladakanum_Q84AM9); g) residues Tyr12 to Lys31 of SEQ ID NO: 37 (pro-domain of Streptomyces caniferus_A5PHK3); h) residues Tyr12 to Lys31 of SEQ ID NO: 38 (pro-domain of Streptomyces platensis_A5PHK5); i) residues Tyr12 to Lys31 of SEQ ID NO: 39 (pro-domain of Streptomyces platensis_Q6Q6T1); j) residues Tyr11 to Lys30 of SEQ ID NO: 40 (pro-domain of Streptomyces hygroscopicus_B1PMA0); k) residues Tyr12 to Glu31 of SEQ ID NO: 41 (pro-domain of Streptomyces netropsis_A9Q0L6); l) residues Tyr12 to Glu31 of SEQ ID NO: 42 (pro-domain of Streptomyces fradiae_Q0GYU0); m) residues Tyr12 to Glu31 of SEQ ID NO: 43 (pro-domain of Streptomyces roseovertivillatus_A5PHK2); and n) residues Tyr12 to Glu31 of SEQ ID NO: 44 (pro-domain of Streptomyces cinnamoneus_Q8GR90), and one or more amino acid substitutions are made relative to the above sequences. In some embodiments, a conserved region of the pro-domain region of the engineered mTGase comprises a conserved region from S. mobarensis (e.g., residues 10-29 of SEQ ID NO: 1, residues 10-29 of SEQ ID NO: 32, or residues 10-29 of SEQ ID NO: 33). In some embodiments, the amino acid substitution is from a wild-type amino acid to Ala, His, Lys, Phe, Met, Thr, Gln, Asp, or Glu. In some embodiments, the amino acid substitution is from a wild-type amino acid to Ala. In some embodiments, the amino acid substitution is from a wild-type amino acid to Arg, Asn, Cys, Gly, Ile, Leu, Pro, Ser, Trp, Tyr, or Val. In some embodiments, the amino acid substitution is from a wild-type amino acid to a small, acidic, charged or polar amino acid. In some embodiments, the amino acid substitution is from a wild-type amino acid to a conservative substitution of the wild-type amino acid or to an amino acid found at the corresponding position in a different bacterial species. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 of the above residues may be substituted. In some embodiments, one or more of the above residues (e.g., 1, 2, 3, 4, or 5) may be deleted.
In some embodiments, the corresponding region of the pro-domain of the mTGase polypeptide comprises a variant of these sequences, wherein such variants can include both conservative and non-conservative substitutions, deletions, and/or additions, and typically include peptides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any of the specific sequences disclosed herein. In some embodiments, the corresponding region of the pro-domain of the engineered mTGase comprises an amino acid sequence that is at least 85%, 90%, 95%, 97%, 99% or 100% identical to residues 10-29 of SEQ ID NO: 1. Without being bound by theory, in some embodiments, the pro-domain residues that are modified (e.g., substituted with another amino acid residue) include those that modulate the interaction between the pro-domain and the enzyme domain of the mTGase polypeptide and/or are important for the folding chaperone activity of the pro-domain. Exemplary pro-domain residues to be modified may be selected from: a) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 1; b) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 32; c) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 33; d) any one or more of Tyr10, His14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 34; e) any one or more of Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, or Glu29 of SEQ ID NO: 35; f) any one or more of Tyr17, His21, Leu23, Asp27, Val28, Asp30, Ile31, Asn32, Leu34, Asn35, or Glu36 of SEQ ID NO: 36; g) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 37; h) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 38; i) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Lys31 of SEQ ID NO: 39; j) any one or more of Tyr11, His15, Leu17, Asp21, Val22, Asn24, Ile25, Asn26, Leu28, Asn29, or Lys30 of SEQ ID NO: 40; k) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 41; l) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 42; m) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Asn25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 43; or n) any one or more of Tyr12, His16, Leu18, Asp22, Val23, Ser25, Ile26, Asn27, Leu29, Asn30, or Glu31 of SEQ ID NO: 44. In some embodiments, the one or more amino acid residues of the pro-domain that are modified include any one or more (1, 2, 3, or 4) of Tyr14, Asp20, Ile24, and Asn25 of SEQ ID NO: 1. In some embodiments, the amino acid substitution is from a wild-type amino acid to Ala, His, Lys, Phe, Met, Thr, Gln, Asp, or Glu. In some embodiments, the amino acid substitution is from a wild-type amino acid to Ala. In some embodiments, the amino acid substitution is from a wild-type amino acid to Arg, Asn, Cys, Gly, Ile, Leu, Pro, Ser, Trp, Tyr, or Val. In some embodiments, the amino acid substitution is from a wild-type amino acid to a small, acidic, charged or polar amino acid.
In some embodiments, the amino acid substitution is from a wild-type amino acid to a conservative substitution of the wild-type amino acid or to an amino acid found at the corresponding position in a different bacterial species. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 of the above residues may be substituted. In some embodiments, one or more of the above residues (e.g., 1, 2, 3, 4, or 5) may be deleted.
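The residue numbers listed above for each sequence differ from those of SEQ ID NO: 1 by a constant shift within the conserved region (for example, Tyr10 to Glu29 of SEQ ID NO: 1 corresponds to Tyr17 to Glu36 of SEQ ID NO: 36). The sketch below reproduces the listed positions from that shift. It is offered for illustration only; the names CONSERVED_START, REFERENCE_POSITIONS and equivalent_positions are hypothetical, and the constant offset is an observation about the positions enumerated above rather than a general property of the alignment.

```python
# First residue of the conserved region, keyed by SEQ ID NO,
# taken from the positions listed in the text above.
CONSERVED_START = {
    1: 10, 32: 10, 33: 10, 34: 10, 35: 10,
    36: 17, 37: 12, 38: 12, 39: 12, 40: 11,
    41: 12, 42: 12, 43: 12, 44: 12,
}

# The eleven exemplary positions in SEQ ID NO: 1 numbering.
REFERENCE_POSITIONS = (10, 14, 16, 20, 21, 23, 24, 25, 27, 28, 29)


def equivalent_positions(seq_id_no: int) -> tuple:
    """Map the SEQ ID NO: 1 positions onto another listed pro-domain.

    Assumes the conserved region is gap-free, so a single offset applies.
    """
    offset = CONSERVED_START[seq_id_no] - CONSERVED_START[1]
    return tuple(position + offset for position in REFERENCE_POSITIONS)
```

For SEQ ID NO: 36 this yields positions 17, 21, 23, 27, 28, 30, 31, 32, 34, 35 and 36, matching item f) above. The residue identity at a mapped position may differ between sequences (e.g., Tyr14 of SEQ ID NO: 1 corresponds to His21 of SEQ ID NO: 36).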

Exemplary substitutions in the pro-domains of the engineered polypeptides include a) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 1; b) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 32; c) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 33; d) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr10Ala, His14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 34; e) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr10Ala, Tyr14Ala, Leu16Ala, Asp20Ala, Val21Ala, Asn23Ala, Ile24Ala, Asn25Ala, Leu27Ala, Asn28Ala, or Glu29Ala in SEQ ID NO: 35; f) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr17Ala, His21Ala, Leu23Ala, Asp27Ala, Val28Ala, Asp30Ala, Ile31Ala, Asn32Ala, Leu34Ala, Asn35Ala, or Glu36Ala in SEQ ID NO: 36; g) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 37; h) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 38; i) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Lys31Ala in SEQ ID NO: 39; j) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr11Ala, His15Ala, Leu17Ala, Asp21Ala, Val22Ala, Asn24Ala, Ile25Ala, Asn26Ala, Leu28Ala, Asn29Ala, or Lys30Ala in SEQ ID NO: 40; k) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala in SEQ ID NO: 41; l) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala in SEQ ID NO: 42; m) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Asn25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala in SEQ ID NO: 43; or n) any one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of Tyr12Ala, His16Ala, Leu18Ala, Asp22Ala, Val23Ala, Ser25Ala, Ile26Ala, Asn27Ala, Leu29Ala, Asn30Ala, or Glu31Ala in SEQ ID NO: 44.
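The substitution notation used above (wild-type residue, 1-based position, replacement residue, e.g., Tyr14Ala) can be applied to a one-letter sequence string as sketched below. This is an illustrative sketch only; the function name apply_substitution is hypothetical, and the short sequence in the usage note is invented for the example and is not any of the sequences of the disclosure.

```python
import re

# Standard three-letter to one-letter codes for the 20 conventional amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# e.g. "Tyr14Ala": wild-type residue, 1-based position, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def apply_substitution(sequence: str, substitution: str) -> str:
    """Return a copy of a one-letter sequence with one substitution applied."""
    match = SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in THREE_TO_ONE or replacement not in THREE_TO_ONE:
        raise ValueError(f"unknown residue code in: {substitution}")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against numbering mistakes: the stated wild-type residue must be present.
    if sequence[position - 1] != THREE_TO_ONE[wild_type]:
        raise ValueError(f"residue {position} is not {wild_type}")
    return sequence[:position - 1] + THREE_TO_ONE[replacement] + sequence[position:]
```

With the invented sequence "MKTAYLDGN", apply_substitution("MKTAYLDGN", "Tyr5Ala") returns "MKTAALDGN". Checking the wild-type residue before substituting helps catch numbering differences between homologs, such as Tyr10 in SEQ ID NO: 1 versus Tyr17 in SEQ ID NO: 36.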

In one aspect, an engineered mTGase polypeptide comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 amino acid substitutions, deletions, and/or additions in the enzyme domain relative to the wild-type polypeptide. In some embodiments, an engineered polypeptide comprises 1, 2, 3, 4, or 5 amino acid deletions in the enzyme domain. In some embodiments, an engineered polypeptide comprises 1, 2, 3, 4, or 5 amino acid substitutions in the enzyme domain. In some embodiments, an engineered polypeptide comprises 1, 2, 3, 4, or 5 amino acid insertions in the enzyme domain.

Exemplary mTGase polypeptide sequences are set forth in SEQ ID NOs: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, and 30. In some embodiments, an mTGase polypeptide of the invention comprises the amino acid sequence set forth in SEQ ID NO: 10. In some embodiments, an mTGase polypeptide of the invention comprises the amino acid sequence set forth in SEQ ID NO: 14. In some embodiments, an mTGase polypeptide of the invention comprises the amino acid sequence set forth in SEQ ID NO: 20. In some embodiments, an mTGase polypeptide of the invention comprises the amino acid sequence set forth in SEQ ID NO: 22. In some embodiments, an mTGase polypeptide of the invention comprises a polypeptide encoded by the nucleic acid sequences set forth in SEQ ID NOs: 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31. In some embodiments, an mTGase polypeptide of the invention comprises a variant of these sequences, wherein such variants can include both conservative and non-conservative substitutions, deletions, and/or additions, and typically include peptides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the specific amino acid sequences set forth in SEQ ID NOs: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, and 30. In another aspect, the invention contemplates mTGase polypeptides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any of the engineered polypeptides (e.g., a polypeptide with one or more amino acid substitutions in the pro-domain region of the polypeptide) described herein.

In some embodiments, the mTGase polypeptides of the invention further comprise a protease cleavage site engineered between the pro-domain and the enzyme domain. In some embodiments, the protease cleavage site is engineered into the loop connecting the pro- and enzyme domains of the polypeptide. In some embodiments, a 3C-protease cleavage site is inserted between the pro-domain and the enzyme domain. In some embodiments, the 3C-protease cleavage site comprises the amino acid sequence LEVLFQGP (SEQ ID NO: 45). In some embodiments, the protease cleavage site is selected from: a TEV (Tobacco Etch Virus) protease cleavage site (e.g., ENLYFQ (SEQ ID NO: 46)), a thrombin cleavage site, a factor Xa cleavage site, an enterokinase cleavage site, a SUMO (Small Ubiquitin-like Modifier) protease cleavage site, a TVMV (Tobacco Vein Mottling Virus) protease cleavage site, or a TAMEP (Transglutaminase Activating Metalloprotease) cleavage site. In some embodiments, the protease cleavage site is a subtilisin-like serine protease (SAM-P45) cleavage site, an SM-TAP cleavage site, a trypsin cleavage site, or a dispase cleavage site. In some embodiments, one or more additional protease cleavage sites are engineered between the pro-domain and the enzyme domain of the engineered polypeptide. In some embodiments, one or more additional protease cleavage sites are engineered into a different region of the polypeptide. In some embodiments, the one or more protease cleavage sites are recognized by the same or different proteases. In some embodiments, the engineered polypeptides comprise one protease cleavage site. In some embodiments, the engineered polypeptides comprise one, two, or three protease cleavage sites.
It will be understood by the skilled artisan that many site-specific proteases known in the art can be employed in the invention and that the corresponding protease recognition and cleavage sites can be inserted into the region between the pro-domain and the enzyme domain of the engineered polypeptides.
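As a simple illustration of placing a protease recognition sequence between the two domains, the sketch below inserts the 3C-protease site LEVLFQGP (SEQ ID NO: 45) immediately after a 44-residue pro-domain, following the 1-44 and 45-375 numbering given for SEQ ID NO: 4. The exact insertion point is an assumption made for the example; the text states only that the site lies between the domains or within the connecting loop. The function name insert_cleavage_site is hypothetical.

```python
THREE_C_SITE = "LEVLFQGP"  # 3C-protease cleavage site, SEQ ID NO: 45
PRO_DOMAIN_LENGTH = 44     # residues 1-44 of SEQ ID NO: 4 form the pro-domain


def insert_cleavage_site(
    proenzyme: str,
    site: str = THREE_C_SITE,
    pro_domain_length: int = PRO_DOMAIN_LENGTH,
) -> str:
    """Insert a protease site directly after the pro-domain.

    Assumption: the site is placed at the exact domain boundary, which is
    only one of the placements contemplated in the text.
    """
    if len(proenzyme) <= pro_domain_length:
        raise ValueError("sequence is too short to contain an enzyme domain")
    pro_domain = proenzyme[:pro_domain_length]
    enzyme_domain = proenzyme[pro_domain_length:]
    return pro_domain + site + enzyme_domain
```

Applied to a 375-residue proenzyme, the result is 383 residues long, with the site occupying positions 45-52 and the enzyme domain beginning at position 53.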

In certain aspects, functional variants or modified forms of the soluble polypeptides include fusion proteins having at least a portion of the soluble polypeptide and one or more fusion domains. Well-known examples of such fusion domains include, but are not limited to, polyhistidine, Glu-Glu, glutathione S-transferase (GST), thioredoxin, protein A, protein G, an immunoglobulin heavy chain constant region (Fc), and maltose binding protein (MBP), which are particularly useful for isolation of the fusion proteins by affinity chromatography. In some embodiments, the mTGase polypeptides of the invention comprise a polyhistidine tag at the carboxyl terminus. For the purpose of affinity purification, relevant matrices for affinity chromatography, such as glutathione-, amylose-, and nickel- or cobalt-conjugated resins, are used. Another fusion domain well known in the art is green fluorescent protein (GFP). Fusion domains also include "epitope tags," which are usually short peptide sequences for which a specific antibody is available. Well-known epitope tags for which specific monoclonal antibodies are readily available include FLAG, influenza virus haemagglutinin (HA), and c-myc tags. In some cases, the fusion domains have a protease cleavage site, such as for Factor Xa or Thrombin, which allows the relevant protease to partially digest the fusion proteins and thereby liberate the recombinant proteins therefrom. The liberated proteins can then be isolated from the fusion domain by subsequent chromatographic separation. In certain embodiments, the soluble polypeptides of the present invention contain one or more modifications that are capable of stabilizing the soluble polypeptides. For example, such modifications may enhance the in vitro half-life of the soluble polypeptides, enhance the circulatory half-life of the soluble polypeptides, or reduce proteolytic degradation of the soluble polypeptides.

The mTGase polypeptides of the invention can be characterized using methods known in the art, whereby the ability of the mature polypeptide to catalyze the acyl transfer reaction between the γ-carboxyamide group of peptide-bound glutamine residues and a variety of primary amines is detected and/or measured. The assays may be performed in various formats. In some embodiments, mTGase enzyme activity is measured using a colorimetric hydroxamate procedure using N-carbobenzoxy-L-glutaminyl-glycine as the amine acceptor substrate and hydroxylamine as amine donor (e.g., a commercially available kit such as the ZediXclusive Microbial Transglutaminase Assay Kit, Z0009, Zedira, Germany). In some embodiments, mTGase enzyme activity is measured by incubating a candidate antibody containing a glutamine tag (acyl donor) with a payload (acyl acceptor) and monitoring the formation of an antibody-payload conjugate. Polypeptide "fragments" or "portions" according to the invention may be made by truncation, e.g., by removal of one or more amino acids from the N- and/or C-terminal ends of a polypeptide. Up to 10, up to 20, up to 30, up to 40 or more amino acids may be removed from the N and/or C terminus in this way. Fragments may also be generated by one or more internal deletions. An mTGase polypeptide of the invention may be, or may comprise, a fragment of any of the full-length mTGase polypeptides described herein, or a variant thereof.

Engineered Nucleic Acids

The invention also provides engineered polynucleotides or nucleic acids encoding any of the mTGase polypeptides including polypeptide fragments, and variants and modified polypeptides described herein. As described above, the present invention provides engineered polypeptides that preferably have one or more mutations in the pro-domain region (e.g., one or more amino acid substitutions) relative to the corresponding wild-type polypeptide and/or comprise one or more protease cleavage sites engineered between the pro-domain region and the enzyme domain region of the polypeptide. Accordingly, the invention provides engineered nucleic acids comprising a modified mTGase gene, wherein the modified mTGase gene preferably encodes a polypeptide with one or more mutations in the pro-domain region (e.g., one or more amino acid substitutions). In another aspect, the invention provides a method of making any of the polynucleotides or nucleic acids described herein. Polynucleotides can be made and expressed by procedures known in the art. Accordingly, the invention provides polynucleotides or compositions comprising polynucleotides, encoding any of the engineered mTGase polypeptides or portions thereof described herein.

In some embodiments, the invention provides a polynucleotide comprising a modified mTGase gene derived from a fungus (e.g., Oomycetes, Actinomycetes, Saccharomyces, Candida, Cryptococcus, Monascus, or Rhizopus transglutaminases). In some embodiments, the modified mTGase gene is derived from Myxomycetes (e.g., Physarum polycephalum transglutaminase). In some embodiments, the modified mTGase gene is derived from a bacterial species such as Streptoverticillium sp. or Streptomyces sp. (e.g., Streptomyces mobarensis or Streptoverticillium mobarensis). In some embodiments, the modified mTGase gene is derived from a bacterium such as, but not limited to, Streptoverticillium mobarensis, Streptoverticillium griseocarneum, Streptoverticillium ladakanum, Streptomyces mobarensis, Streptomyces viridis, Streptomyces ladakanum, Streptomyces caniferus, Streptomyces platensis, Streptomyces hygroscopicus, Streptomyces netropsis, Streptomyces fradiae, Streptomyces roseovertivillatus, Streptomyces cinnamaoneous, Streptomyces griseocarneum, Streptomyces lavendulae, Streptomyces lividans, Streptomyces lydicus, Streptomyces sioyansis, Actinomadura sp., Bacillus (e.g., Bacillus circulans, Bacillus subtilis spores, etc.), Corynebacterium ammoniagenes, Corynebacterium glutamicum, Clostridium, Enterobacter sp., Micrococcus, Providencia sp., or isolates thereof.

In one aspect, a modified mTGase gene of the invention comprises one or more mutations relative to the wild-type microbial gene. In some embodiments, the modified gene encodes a transglutaminase polypeptide with one or more mutations in the pro-domain region. In some embodiments, a mutation is an amino acid deletion, insertion, substitution, or any combination thereof. A modified gene may encode a polypeptide comprising 1, 2, 3, 4, 5, up to 10, or more amino acid substitutions and/or deletions and/or insertions. The sites of greatest interest for substitutional mutagenesis include the pro-domain regions, but enzyme domain alterations are also contemplated.

In one aspect, a modified mTGase gene encodes a polypeptide with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 conservative or non-conservative substitutions relative to the wild-type polypeptide.

In another aspect, the modified mTGase genes of the invention comprise modified nucleic acid sequences, wherein such modifications can include missense mutations, nonsense mutations, duplications, deletions, and/or additions, and typically include nucleic acid sequences that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the corresponding wild-type mTGase nucleic acid sequences. One of ordinary skill in the art will appreciate that nucleic acid sequences complementary to the nucleic acids, and variants of the nucleic acids, are also within the scope of this invention. In further embodiments, the nucleic acid sequences of the invention can be isolated, recombinant, and/or fused with a heterologous nucleotide sequence, or in a DNA library. In one aspect, a modified mTGase gene encodes a polypeptide with one or more amino acid substitutions (1, 2, 3, 4, 5, up to 10, or more amino acid substitutions) within the wild-type pro-domain region. In some embodiments, a modified mTGase gene encodes a polypeptide with one or more amino acid deletions (1, 2, 3, 4, or 5) in the pro-domain. In some embodiments, a modified mTGase gene encodes a polypeptide with one or more (1, 2, 3, 4, or 5) amino acid insertions in the pro-domain.

In one aspect, a modified mTGase gene encodes a polypeptide with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 amino acid substitutions, deletions, and/or additions in the enzyme domain relative to the wild-type polypeptide. In some embodiments, a modified mTGase gene encodes a polypeptide with 1, 2, 3, 4, or 5 amino acid deletions in the enzyme domain. In some embodiments, a modified mTGase gene encodes a polypeptide with 1, 2, 3, 4, or 5 amino acid substitutions in the enzyme domain. In some embodiments, a modified mTGase gene encodes a polypeptide with 1, 2, 3, 4, or 5 amino acid insertions in the enzyme domain.

Exemplary sequences of modified mTGase genes are set forth in SEQ ID NOs: 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31. In some embodiments, a modified mTGase gene of the invention comprises the nucleic acid sequence set forth in SEQ ID NO: 11. In some embodiments, a modified mTGase gene of the invention comprises the nucleic acid sequence set forth in SEQ ID NO: 15. In some embodiments, a modified mTGase gene of the invention comprises the nucleic acid sequence set forth in SEQ ID NO: 21. In some embodiments, a modified mTGase gene of the invention comprises the nucleic acid sequence set forth in SEQ ID NO: 23. In some embodiments, an mTGase polynucleotide of the invention comprises a variant of these sequences, wherein such variants can include missense mutations, nonsense mutations, duplications, deletions, and/or additions, and typically include polynucleotides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the specific nucleic acid sequences set forth in SEQ ID NOs: 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31. One of ordinary skill in the art will appreciate that nucleic acid sequences complementary to the nucleic acids, and variants of the nucleic acids, are also within the scope of this invention. In further embodiments, the nucleic acid sequences of the invention can be isolated, recombinant, and/or fused with a heterologous nucleotide sequence, or in a DNA library.

In another aspect, the invention contemplates mTGase-encoding polynucleotides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any of the polynucleotides encoding any of the engineered polypeptides (e.g., a polypeptide with one or more amino acid substitutions in the pro-domain region of the polypeptide) described herein. One of ordinary skill in the art will appreciate that nucleic acid sequences complementary to the nucleic acids, and variants of the nucleic acids, are also within the scope of this invention. In further embodiments, the nucleic acid sequences of the invention can be isolated, recombinant, and/or fused with a heterologous nucleotide sequence, or in a DNA library.

In some embodiments, the mTGase-encoding polynucleotides of the invention further comprise a nucleic acid sequence encoding a protease cleavage site engineered between the sequences encoding the pro-domain and the enzyme domain. In some embodiments, the nucleic acid sequence encoding a protease cleavage site is engineered into the sequence encoding the loop connecting the pro- and enzyme domains of the polypeptide. In some embodiments, a nucleic acid sequence encoding a 3C-protease cleavage site is inserted between the sequences encoding the pro-domain and the enzyme domain. In some embodiments, the 3C-protease cleavage site comprises the amino acid sequence LEVLFQGP (SEQ ID NO: 45). In some embodiments, a nucleic acid sequence encoding a protease cleavage site is selected from: a nucleic acid encoding a TEV (Tobacco Etch Virus) protease cleavage site (e.g., ENLYFQ (SEQ ID NO: 46)), a thrombin cleavage site, a factor Xa cleavage site, an enterokinase cleavage site, a SUMO (Small Ubiquitin-like Modifier) protease cleavage site, a TVMV (Tobacco Vein Mottling Virus) protease cleavage site, or a TAMEP (Transglutaminase Activating Metalloprotease) cleavage site. In some embodiments, a nucleic acid sequence encoding a protease cleavage site is selected from: a nucleic acid encoding a subtilisin-like serine protease (SAM-P45) cleavage site, an SM-TAP cleavage site, a trypsin cleavage site, or a dispase cleavage site. In some embodiments, one or more additional protease cleavage site-encoding sequences are engineered between the sequences encoding the pro-domain and the enzyme domain of the engineered polypeptide. In some embodiments, one or more additional protease cleavage site-encoding sequences are engineered into a sequence encoding a different region of the polypeptide. In some embodiments, the one or more protease cleavage sites are recognized by the same or different proteases. In some embodiments, the engineered polypeptides encoded by the polynucleotides comprise one protease cleavage site. In some embodiments, the engineered polypeptides encoded by the polynucleotides comprise one, two, or three protease cleavage sites. It will be understood by the skilled artisan that many site-specific proteases known in the art can be employed in the invention and that the corresponding nucleic acid sequences for the recognition and/or cleavage sites can be engineered into the region between the sequences encoding the pro-domain and the enzyme domain of the polypeptides.
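By way of illustration only, the placement of a cleavage site between the two domains can be sketched at the amino acid level in Python. The 3C-protease site LEVLFQGP is taken from the text above; the two domain fragments, the function names, and the cut position (after the glutamine of the site, leaving Gly-Pro on the downstream fragment) are assumptions introduced for this sketch and are not sequences or methods of the disclosure.

```python
# Illustrative sketch only: the domain fragments below are made-up placeholders,
# not sequences from the disclosure.
SITE_3C = "LEVLFQGP"  # 3C-protease cleavage site given in the text (SEQ ID NO: 45)


def insert_cleavage_site(pro_domain: str, enzyme_domain: str, site: str = SITE_3C) -> str:
    """Return a pro-enzyme with a cleavage site placed between the two domains."""
    return pro_domain + site + enzyme_domain


def cleave_3c(polypeptide: str, site: str = SITE_3C) -> list[str]:
    """Cut at the first occurrence of the site, between its Q and G residues.

    Assumption (not stated in the text): the protease cuts after the Gln of
    LEVLFQ, so Gly-Pro remains at the N-terminus of the downstream fragment.
    """
    start = polypeptide.find(site)
    if start == -1:
        return [polypeptide]  # no site present, so nothing is cut
    cut = start + site.index("QG") + 1  # index just after the Q residue
    return [polypeptide[:cut], polypeptide[cut:]]


pro = "MDNGAGEE"         # hypothetical pro-domain fragment
enzyme = "DSDDRVTPPAEP"  # hypothetical enzyme-domain fragment
construct = insert_cleavage_site(pro, enzyme)
fragments = cleave_3c(construct)
print(construct)
print(fragments)
```

In this toy model the upstream fragment carries the pro-domain plus the first six residues of the site, and the downstream fragment is the enzyme domain preceded by Gly-Pro.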

In one aspect, the invention provides methods and compositions for the co-expression of a protease along with a polypeptide of the invention in a novel method of producing active mTGase. Therefore, in one aspect, the invention further provides nucleic acids encoding proteases, or functional variants or functional fragments thereof. In some embodiments, the mTGase and the protease are encoded by separate polynucleotide molecules. Alternatively, both the mTGase and the protease are encoded by a single polynucleotide. In some embodiments, the protease is 3C-protease. An exemplary amino acid sequence of 3C-protease is set forth in SEQ ID NO: 2. An exemplary nucleotide sequence encoding 3C-protease is set forth in SEQ ID NO: 3. In some embodiments, the protease is selected from: a TEV (Tobacco Etch Virus) protease, thrombin, factor Xa, enterokinase, SUMO (Small Ubiquitin-like Modifier) protease, TVMV (Tobacco Vein Mottling Virus) protease, or TAMEP (Transglutaminase Activating Metalloprotease). In some embodiments, the protease is subtilisin-like serine protease (SAM-P45), SM-TAP, trypsin, or dispase. Polynucleotides may comprise a native sequence (i.e., an endogenous sequence that encodes a protease or a functional fragment thereof) or may comprise a variant of such a sequence. Polynucleotide variants contain one or more substitutions, additions, deletions and/or insertions. Variants preferably exhibit at least about 70% identity, more preferably, at least about 80% identity, yet more preferably, at least about 90% identity, and most preferably, at least about 95% identity to a polynucleotide sequence that encodes a native protease or a fragment thereof.

In certain aspects, the modified mTGase genes of the invention comprise a sequence encoding one or more fusion domains. Well known examples of such fusion domains include, but are not limited to, polyhistidine, Glu-Glu, glutathione S transferase (GST), thioredoxin, protein A, protein G, an immunoglobulin heavy chain constant region (Fc), and maltose binding protein (MBP), which are particularly useful for isolation of the fusion proteins by affinity chromatography. In some embodiments, the modified mTGase genes of the invention comprise a sequence encoding a polyhistidine tag at the carboxyl terminus of the mTGase polypeptide.

Polynucleotides complementary to any of the polynucleotide sequences disclosed herein are also encompassed by the present invention. Polynucleotides may be single-stranded (coding or antisense) or double-stranded, and may be DNA (genomic or synthetic) or RNA molecules. RNA molecules include mRNA molecules. Additional coding or non-coding sequences may, but need not, be present within a polynucleotide of the present invention, and a polynucleotide may, but need not, be linked to other molecules and/or support materials.

Two polynucleotide or polypeptide sequences are said to be "identical" if the sequence of nucleotides or amino acids in the two sequences is the same when aligned for maximum correspondence as described below. Comparisons between two sequences are typically performed by comparing the sequences over a comparison window to identify and compare local regions of sequence similarity. A "comparison window" as used herein, refers to a segment of at least about 20 contiguous positions, usually 30 to about 75, or 40 to about 50, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.

Optimal alignment of sequences for comparison may be conducted using the MegAlign® program in the Lasergene® suite of bioinformatics software (DNASTAR®, Inc., Madison, WI), using default parameters. This program embodies several alignment schemes described in the following references: Dayhoff, M.O., 1978, A model of evolutionary change in proteins - Matrices for detecting distant relationships. In Dayhoff, M.O. (ed.) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington DC, Vol. 5, Suppl. 3, pp. 345-358; Hein J., 1990, Unified Approach to Alignment and Phylogenies, pp. 626-645, Methods in Enzymology vol. 183, Academic Press, Inc., San Diego, CA; Higgins, D.G. and Sharp, P.M., 1989, CABIOS 5:151-153; Myers, E.W. and Muller W., 1988, CABIOS 4:11-17; Robinson, E.D., 1971, Comb. Theor. 11:105; Saitou, N., Nei, M., 1987, Mol. Biol. Evol. 4:406-425; Sneath, P.H.A. and Sokal, R.R., 1973, Numerical Taxonomy the Principles and Practice of Numerical Taxonomy, Freeman Press, San Francisco, CA; Wilbur, W.J. and Lipman, D.J., 1983, Proc. Natl. Acad. Sci. USA 80:726-730.

Preferably, the "percentage of sequence identity" is determined by comparing two optimally aligned sequences over a window of comparison of at least 20 positions, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less, usually 5 to 15 percent, or 10 to 12 percent, as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the reference sequence (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.
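The calculation just described can be sketched in a few lines of Python. The sketch assumes the two sequences have already been optimally aligned to equal length, with "-" marking a gap in the compared sequence; the function name and the example sequences are hypothetical and are given for illustration only.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percent identity over a pre-aligned comparison window.

    Matched positions are divided by the window size (the length of the
    reference segment) and multiplied by 100. A gap ('-') never counts
    as a match.
    """
    if len(reference) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    if not reference:
        raise ValueError("comparison window is empty")
    matches = sum(1 for r, c in zip(reference, candidate) if r == c and r != "-")
    return 100.0 * matches / len(reference)


# Hypothetical 20-position window: one substitution and one gap in the candidate.
ref = "ATGGCGTACGATTACGGATC"
cand = "ATGGCGTACGTTTAC-GATC"
print(percent_identity(ref, cand))  # 18 matched positions out of 20
```

Here 18 of the 20 positions match, so the sketch reports 90 percent identity for this window.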

Modified mTGase polynucleotides or variants may also, or alternatively, be substantially homologous to a native gene, or a portion or complement thereof. Such polynucleotide variants are capable of hybridizing under moderately stringent conditions to a naturally occurring DNA sequence encoding a native mTGase polypeptide (or a complementary sequence).

Suitable "moderately stringent conditions" include prewashing in a solution of 5X SSC, 0.5% SDS, 1.0 mM EDTA (pH 8.0); hybridizing at 50°C-65°C, 5X SSC, overnight; followed by washing twice at 65°C for 20 minutes with each of 2X, 0.5X and 0.2X SSC containing 0.1% SDS.

As used herein, "highly stringent conditions" or "high stringency conditions" are those that: (1) employ low ionic strength and high temperature for washing, for example 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate at 50°C; (2) employ during hybridization a denaturing agent, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride, 75 mM sodium citrate at 42°C; or (3) employ 50% formamide, 5X SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5X Denhardt's solution, sonicated salmon sperm DNA (50 µg/ml), 0.1% SDS, and 10% dextran sulfate at 42°C, with washes at 42°C in 0.2X SSC (sodium chloride/sodium citrate) and 50% formamide at 55°C, followed by a high-stringency wash consisting of 0.1X SSC containing EDTA at 55°C. The skilled artisan will recognize how to adjust the temperature, ionic strength, etc. as necessary to accommodate factors such as probe length and the like.
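The SSC concentrations recited above follow from simple scaling: since 5X SSC corresponds to 0.75 M NaCl and 0.075 M sodium citrate, 1X SSC corresponds to 0.15 M NaCl and 0.015 M sodium citrate. A minimal Python sketch of that arithmetic is given below; the function and constant names are hypothetical.

```python
# 1X SSC composition, derived from the 5X SSC values recited in the text
# (0.75 M NaCl and 0.075 M sodium citrate).
SSC_1X_NACL_M = 0.15      # mol/L sodium chloride
SSC_1X_CITRATE_M = 0.015  # mol/L sodium citrate


def ssc_molarity(fold: float) -> tuple[float, float]:
    """Return (NaCl, sodium citrate) molarities for a given SSC strength."""
    return (fold * SSC_1X_NACL_M, fold * SSC_1X_CITRATE_M)


# Wash strengths mentioned in the stringency definitions.
for fold in (5, 2, 0.5, 0.2, 0.1):
    nacl, citrate = ssc_molarity(fold)
    print(f"{fold}X SSC: {nacl:.4f} M NaCl, {citrate:.4f} M sodium citrate")
```

On this scale, the low ionic strength wash of condition (1), 0.015 M sodium chloride with 0.0015 M sodium citrate, corresponds to 0.1X SSC.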

It will be appreciated by those of ordinary skill in the art that, as a result of the degeneracy of the genetic code, there are many nucleotide sequences that encode a polypeptide as described herein. Some of these polynucleotides bear minimal homology to the nucleotide sequence of any native gene. Nonetheless, polynucleotides that vary due to differences in codon usage are specifically contemplated by the present invention. Further, alleles of the genes comprising the polynucleotide sequences provided herein are within the scope of the present invention. Alleles are endogenous genes that are altered as a result of one or more mutations, such as deletions, additions and/or substitutions of nucleotides. The resulting mRNA and protein may, but need not, have an altered structure or function. Alleles may be identified using standard techniques (such as hybridization, amplification and/or database sequence comparison).

The polynucleotides of this invention can be obtained using chemical synthesis, recombinant methods, or PCR. Methods of chemical polynucleotide synthesis are well known in the art and need not be described in detail herein. One of skill in the art can use the sequences provided herein and a commercial DNA synthesizer to produce a desired DNA sequence.

For preparing polynucleotides using recombinant methods, a polynucleotide comprising a desired sequence can be inserted into a suitable vector, and the vector in turn can be introduced into a suitable host cell for replication and amplification, as further discussed herein. Polynucleotides may be inserted into host cells by any means known in the art. Cells are transformed by introducing an exogenous polynucleotide by direct uptake, endocytosis, transfection, F-mating or electroporation. Once introduced, the exogenous polynucleotide can be maintained within the cell as a non-integrated vector (such as a plasmid) or integrated into the host cell genome. The polynucleotide so amplified can be isolated from the host cell by methods well known within the art. See, e.g., Sambrook et al., 1989. Alternatively, PCR allows reproduction of DNA sequences. PCR technology is well known in the art and is described in U.S. Patent Nos. 4,683,195, 4,800,159, 4,754,065 and 4,683,202, as well as PCR: The Polymerase Chain Reaction, Mullis et al. eds., Birkhäuser Press, Boston, 1994.

RNA can be obtained by using the isolated DNA in an appropriate vector and inserting it into a suitable host cell. When the cell replicates and the DNA is transcribed into RNA, the RNA can then be isolated using methods well known to those of skill in the art, as set forth in Sambrook et al., 1989, supra, for example.

In other embodiments, nucleic acids of the invention also include nucleotide sequences that hybridize under highly stringent conditions to the nucleotide sequences set forth in SEQ ID NOs: 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, nucleotide sequences encoding polypeptides of SEQ ID NOs: 1, 32-44, or sequences complementary thereto. One of ordinary skill in the art will readily understand that appropriate stringency conditions which promote DNA hybridization can be varied. For example, one could perform the hybridization at 6.0X sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0X SSC at 50°C. For example, the salt concentration in the wash step can be selected from a low stringency of about 2.0X SSC at 50°C to a high stringency of about 0.2X SSC at 50°C. In addition, the temperature in the wash step can be increased from low stringency conditions at room temperature, about 22°C, to high stringency conditions at about 65°C. Both temperature and salt may be varied, or temperature or salt concentration may be held constant while the other variable is changed. In one embodiment, the invention provides nucleic acids which hybridize under low stringency conditions of 6X SSC at room temperature followed by a wash at 2X SSC at room temperature.

Isolated nucleic acids which differ due to degeneracy in the genetic code are also within the scope of the invention. For example, a number of amino acids are designated by more than one triplet. Codons that specify the same amino acid, or synonyms (for example, CAU and CAC are synonyms for histidine) may result in "silent" mutations which do not affect the amino acid sequence of the protein. One skilled in the art will appreciate that these variations in one or more nucleotides (up to about 3-5% of the nucleotides) of the nucleic acids encoding a particular protein may exist among members of a given species due to natural allelic variation. Any and all such nucleotide variations and resulting amino acid polymorphisms are within the scope of this invention.
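The effect of synonymous codons can be illustrated with a short Python sketch. The codon table below is deliberately partial and covers only the codons used in the example; the mRNA sequences and the function name are hypothetical and do not correspond to any sequence of the disclosure.

```python
# Partial RNA codon table: only the codons used in this example.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "CAU": "H",  # histidine
    "CAC": "H",  # histidine (synonym of CAU)
    "CAA": "Q",  # glutamine
    "GGU": "G",  # glycine
    "UAA": "*",  # stop
}


def translate(mrna: str) -> str:
    """Translate an mRNA reading frame until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


original = "AUGCAUGGUUAA"  # Met-His-Gly-stop
silent = "AUGCACGGUUAA"    # CAU -> CAC, still histidine
missense = "AUGCAAGGUUAA"  # CAU -> CAA, histidine replaced by glutamine

print(translate(original), translate(silent), translate(missense))
```

The first two sequences differ at one nucleotide yet encode the same tripeptide, whereas the third substitution changes the encoded residue.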

The present invention further provides oligonucleotides that hybridize to a polynucleotide having the nucleotide sequence set forth in SEQ ID NOs: 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, nucleotide sequences encoding polypeptides of SEQ ID NOs: 1, 32-44, or to a polynucleotide molecule having a nucleotide sequence which is the complement of a sequence listed above. Such oligonucleotides are at least about 10 nucleotides in length, and preferably from about 15 to about 30 nucleotides in length, and hybridize to one of the aforementioned polynucleotide molecules under highly stringent conditions, i.e., washing in 6X SSC/0.5% sodium pyrophosphate at about 37°C for about 14-base oligos, at about 48°C for about 17-base oligos, at about 55°C for about 20-base oligos, and at about 60°C for about 23-base oligos. In a preferred embodiment, the oligonucleotides are complementary to a portion of one of the aforementioned polynucleotide molecules. These oligonucleotides are useful for a variety of purposes including encoding or acting as antisense molecules useful in gene regulation, or as primers in amplification of mTGase-encoding polynucleotide molecules.

In certain embodiments, the recombinant nucleic acids of the invention may be operably linked to one or more regulatory nucleotide sequences in an expression construct. Regulatory nucleotide sequences will generally be appropriate for a host cell used for expression. Numerous types of appropriate expression vectors and suitable regulatory sequences are known in the art for a variety of host cells. Typically, said one or more regulatory nucleotide sequences may include, but are not limited to, promoter sequences, leader or signal sequences, ribosomal binding sites, transcriptional start and termination sequences, translational start and termination sequences, and enhancer or activator sequences. Constitutive or inducible promoters as known in the art are contemplated by the invention. The promoters may be either naturally occurring promoters, or hybrid promoters that combine elements of more than one promoter. An expression construct may be present in a cell on an episome, such as a plasmid, or the expression construct may be inserted in a chromosome. In certain embodiments, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selectable marker genes are well known in the art and will vary with the host cell used. This invention also pertains to a host cell transfected with a recombinant gene including a coding sequence for one or more of the mTGase polypeptides. The host cell may be any prokaryotic or eukaryotic cell. For example, an mTGase polypeptide of the invention may be expressed in bacterial cells such as E. coli or yeast cells. Other suitable host cells are described below and are known to those skilled in the art.

Recombinant Systems

The present invention further provides recombinant cloning vectors and expression vectors that are useful in cloning or expressing a polynucleotide of the present invention including polynucleotide molecules encoding an mTGase with one or more amino acid substitutions in the pro-domain region. The present invention further provides transformed host cells comprising a polynucleotide molecule or recombinant vector of the invention, and novel strains or cell lines derived therefrom.

A host cell may be a bacterial cell, a yeast cell, a filamentous fungal cell, an algal cell, an insect cell, or a mammalian cell. In some embodiments, the host cell is a member of a genus selected from: Clostridium, Zymomonas, Escherichia, Salmonella, Serratia, Erwinia, Klebsiella, Shigella, Rhodococcus, Pseudomonas, Bacillus, Lactobacillus, Enterococcus, Alcaligenes, Paenibacillus, Arthrobacter, Corynebacterium, Brevibacterium, Schizosaccharomyces, Kluyveromyces, Yarrowia, Pichia, Candida, or Saccharomyces. In some embodiments, the host cell is E. coli. A variety of different vectors have been developed for specific use in each of these host cells, including phage, high copy number plasmids, low copy number plasmids, and shuttle vectors, among others, and any of these can be used to practice the present invention.

Suitable cloning vectors may be constructed according to standard techniques, or may be selected from a large number of cloning vectors available in the art. While the cloning vector selected may vary according to the host cell intended to be used, useful cloning vectors will generally have the ability to self-replicate, may possess a single target for a particular restriction endonuclease, and/or may carry genes for a marker that can be used in selecting clones containing the vector. Suitable examples include plasmids and bacterial viruses, e.g., pBAD18, pUC18, pUC19, Bluescript (e.g., pBS SK+) and its derivatives, mp18, mp19, pBR322, pMB9, ColE1, pCR1, RP4, phage DNAs, and shuttle vectors such as pSA3 and pAT28. These and many other cloning vectors are available from commercial vendors such as Bio-Rad, Stratagene, and Invitrogen.

Expression vectors are further provided. Expression vectors generally are replicable polynucleotide constructs that contain a polynucleotide according to the invention. It is implied that an expression vector must be replicable in the host cells either as episomes or as an integral part of the chromosomal DNA. Suitable expression vectors include but are not limited to plasmids, viral vectors, including adenoviruses, adeno-associated viruses, retroviruses, cosmids, and expression vector(s) disclosed in PCT Publication No. WO 87/04462. Vector components may generally include, but are not limited to, one or more of the following: a signal sequence; an origin of replication; one or more marker genes; suitable transcriptional controlling elements (such as promoters, enhancers and terminators). For expression (i.e., translation), one or more translational controlling elements are also usually required, such as ribosome binding sites, translation initiation sites, and stop codons. Recombinant vectors of the present invention, particularly expression vectors, are preferably constructed so that the coding sequence for the polynucleotide molecule of the invention is in operative association with one or more regulatory elements necessary for transcription and translation of the coding sequence to produce a polypeptide. As used herein, the term "regulatory element" includes but is not limited to nucleotide sequences that encode inducible and non-inducible promoters, enhancers, operators and other elements known in the art that serve to drive and/or regulate expression of polynucleotide coding sequences. Also, as used herein, the coding sequence is in "operative association" with one or more regulatory elements where the regulatory elements effectively regulate and allow for the transcription of the coding sequence or the translation of its mRNA, or both. Exemplary regulatory sequences are described in Goeddel, Gene Expression Technology: Methods in Enzymology, Academic Press, San Diego, CA (1990). For instance, any of a wide variety of expression control sequences that control the expression of a DNA sequence when operatively linked to it may be used in these vectors to express DNA sequences encoding a soluble polypeptide. Such useful expression control sequences include, for example, the araBAD promoter, the early and late promoters of SV40, tet promoter, adenovirus or cytomegalovirus immediate early promoter, the lac system, the trp system, the TAC or TRC system, T7 promoter whose expression is directed by T7 RNA polymerase, the major operator and promoter regions of phage lambda, the control regions for fd coat protein, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase, e.g., Pho5, the promoters of the yeast α-mating factors, the polyhedrin promoter of the baculovirus system and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof. It should be understood that the design of the expression vector may depend on such factors as the choice of the host cell to be transformed and/or the type of protein desired to be expressed. Moreover, the vector's copy number, the ability to control that copy number and the expression of any other protein encoded by the vector, such as antibiotic markers, should also be considered.

Fusion protein expression vectors can be used to express an mTGase polypeptide-fusion protein. The purified fusion protein can be used, for example, to aid in the identification or purification of the expressed mTGase polypeptide. Possible fusion protein expression vectors include but are not limited to vectors incorporating sequences that encode β-galactosidase and trpE fusions, maltose-binding protein fusions, glutathione-S-transferase fusions and polyhistidine fusions (carrier regions). In an alternative embodiment, an mTGase polypeptide or a portion thereof can be fused to another mTGase polypeptide, or portion thereof, derived from another microbial species or strain.

mTGase polypeptides can be engineered to comprise a region useful for purification. For example, mTGase-maltose-binding protein fusions can be purified using amylose resin; mTGase-glutathione-S-transferase fusion proteins can be purified using glutathione-agarose beads; and mTGase-polyhistidine fusions can be purified using divalent nickel resin. In some embodiments, the mTGase polypeptides of the invention comprise a polyhistidine tag at the carboxyl terminus. Alternatively, antibodies against a carrier protein or peptide can be used for affinity chromatography purification of the fusion protein. For example, a nucleotide sequence coding for the target epitope of a monoclonal antibody can be engineered into the expression vector in operative association with the regulatory elements and situated so that the expressed epitope is fused to the mTGase polypeptide. For example, a nucleotide sequence coding for the FLAG™ epitope tag (International Biotechnologies Inc.), which is a hydrophilic marker peptide, can be inserted by standard techniques into the expression vector at a point corresponding, e.g., to the carboxyl terminus of the mTGase polypeptide. The expressed mTGase polypeptide-FLAG™ epitope fusion product can then be detected and affinity-purified using commercially available anti-FLAG™ antibodies.

The expression vector encoding the mTGase polypeptide can also be engineered to contain polylinker sequences that encode specific protease cleavage sites so that the expressed mTGase polypeptide can be released from the carrier region or fusion partner by treatment with a specific protease. For example, a vector can include DNA sequences encoding thrombin or factor Xa cleavage sites, among others.

A signal sequence upstream from, and in reading frame with, the mTGase ORF can be engineered into the expression vector by known methods to direct the trafficking and secretion of the expressed polypeptide. Non-limiting examples of signal sequences include those from α-factor, immunoglobulins, outer membrane proteins, penicillinase, and T-cell receptors, among others. In some embodiments, the mTGase polypeptides are directed to the cytoplasm of a bacterial host cell and lack signal sequences.

To aid in the selection of host cells transformed or transfected with cloning or expression vectors of the present invention, the vector can be engineered to further comprise a coding sequence for a reporter gene product or other selectable marker. Such a coding sequence is preferably in operative association with the regulatory element coding sequences, as described above. Reporter genes that are useful in the invention are well-known in the art and include those encoding green fluorescent protein, luciferase, xylE, and tyrosinase, among others. Nucleotide sequences encoding selectable markers are well known in the art, and include those that encode gene products conferring resistance to antibiotics or anti-metabolites, or that supply an auxotrophic requirement. Examples of such sequences include those that encode resistance to ampicillin, erythromycin, thiostrepton or kanamycin, among many others.

In one aspect of the invention, co-expression of a protease along with an engineered mTGase polypeptide of the invention results in a simple and efficient method of active mature mTGase production. Therefore, in one aspect, the invention provides a method of producing a mature mTGase, the method comprising preparing a nucleic acid comprising a modified microbial transglutaminase gene and a nucleic acid comprising a protease gene, wherein the modified microbial transglutaminase gene, for example, encodes a polypeptide comprising at least one amino acid substitution within a pro-domain region of the microbial transglutaminase; and expressing the nucleic acid comprising a modified microbial transglutaminase gene and the nucleic acid comprising a protease gene in a host cell. In some embodiments, the modified microbial transglutaminase gene and the protease gene are on two separate vectors expressed together in the same cell or in different cells. In some embodiments, the modified microbial transglutaminase gene and the protease gene are on the same vector expressed together in the same cell. Combinations of vectors that are compatible in the same cell are also well known to those skilled in the art. A genetic construct, which is constructed as described above and contains a nucleic acid encoding a pro-mTGase polypeptide according to the present invention (e.g., a pro-mTGase polypeptide with one or more amino acid substitutions in the pro-domain and a protease cleavage site engineered between the pro-domain and the enzyme domain), and a genetic construct, which contains a nucleic acid encoding a site-specific protease, may be provided on a single vector and introduced into a host cell (e.g., E. coli) to produce in a single host cell both the pro-mTGase according to the present invention and the protease, thereby allowing the pro-mTGase to be converted into a mature mTGase in the host cell.

A genetic construct, which is constructed as described above and contains a nucleic acid encoding a pro-mTGase polypeptide according to the present invention (e.g., a pro-mTGase polypeptide with one or more amino acid substitutions in the pro-domain and a protease cleavage site engineered between the pro-domain and the enzyme domain), may be introduced into a host cell (e.g., E. coli) containing a genetic construct which contains a nucleic acid encoding a site-specific protease, or vice versa, to produce in a single host cell both the pro-mTGase according to the present invention and the protease, thereby allowing the pro-mTGase to be converted into a mature mTGase in the host cell. Therefore, mature mTGase can be obtained by introducing an appropriate genetic expression construct encoding a pro-mTGase according to the present invention and/or an appropriate genetic construct encoding a protease on the same or separate vectors into a host cell, thereby allowing the genetic constructs which can express the pro-mTGase and the site-specific protease to coexist in the same host cell, culturing the host cell, and maintaining the culture under appropriate conditions such that the protease is active. In some embodiments, the genetic constructs expressing the pro-mTGase and the protease may be under the control of different promoters, allowing for simultaneous or sequential induction of each construct. In some embodiments, the genetic constructs expressing the pro-mTGase and the protease may be under the control of the same promoter.

In some embodiments, the modified microbial transglutaminase gene and the protease gene are on the same vector expressed together in the same cell. In a particular embodiment described in the examples below, and depicted in FIG. 1A, a pBAD/T7 plasmid was constructed that contains an mTGase-encoding nucleic acid and a 3C-protease-encoding nucleic acid under the T7 and araBAD promoters, respectively. Such hybrid vectors can be transformed into E. coli cells.
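A dual-promoter arrangement of this kind can be modeled, for illustration only, as a small Python data structure. The promoter assignments follow the text above; the inducers (IPTG for the T7 promoter, which presumes a host such as a DE3 lysogen expressing T7 RNA polymerase under lac control, and L-arabinose for the araBAD promoter) are assumptions not stated in this passage, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    gene: str
    promoter: str
    inducer: str  # assumed inducer; the passage names only the promoters


# Promoters per the text; inducers are assumptions for this sketch.
CONSTRUCT = (
    Cassette(gene="pro-mTGase", promoter="T7", inducer="IPTG"),
    Cassette(gene="3C-protease", promoter="araBAD", inducer="L-arabinose"),
)


def independently_inducible(cassettes) -> bool:
    """True when every cassette sits under a different promoter."""
    promoters = [c.promoter for c in cassettes]
    return len(set(promoters)) == len(promoters)


def induction_plan(cassettes, order):
    """Return the inducers in the order the named genes are to be switched on."""
    by_gene = {c.gene: c for c in cassettes}
    return [by_gene[gene].inducer for gene in order]


print(independently_inducible(CONSTRUCT))
print(induction_plan(CONSTRUCT, ["pro-mTGase", "3C-protease"]))
```

Because the two cassettes are under different promoters, the same structure accommodates either simultaneous induction or sequential induction in either order.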

The vectors containing the polynucleotides of interest and/or the polynucleotides themselves, can be introduced into the host cell by any of a number of appropriate means, including electroporation, transfection employing calcium chloride, rubidium chloride, calcium phosphate, DEAE-dextran, or other substances; microprojectile bombardment; lipofection; and infection (e.g., where the vector is an infectious agent such as vaccinia virus). The choice of introducing vectors or polynucleotides will often depend on features of the host cell.

The present invention further provides transformed host cells comprising a polynucleotide molecule or recombinant vector of the invention, and novel strains or cell lines derived therefrom. In some embodiments, host cells useful in the practice of the invention are E. coli cells. A strain of E. coli can typically be used, such as, e.g., E. coli TOP10, E. coli BL21 (DE3), DH5α, etc., available from the American Type Culture Collection (ATCC), 10801 University Blvd., Manassas, Va. 20110, USA and from commercial sources. In some embodiments, other prokaryotic cells or eukaryotic cells may be used. In some embodiments, the host cell is a member of a genus selected from: Clostridium, Zymomonas, Escherichia, Salmonella, Serratia, Erwinia, Klebsiella, Shigella, Rhodococcus, Pseudomonas, Bacillus, Lactobacillus, Enterococcus, Alcaligenes, Paenibacillus, Arthrobacter, Corynebacterium, Brevibacterium, Schizosaccharomyces, Kluyveromyces, Yarrowia, Pichia, Candida, or Saccharomyces. Such transformed host cells typically include but are not limited to microorganisms, such as bacteria transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA vectors, or yeast transformed with recombinant vectors, among others. Preferred eukaryotic host cells include yeast cells, although mammalian cells or insect cells can also be utilized effectively. Suitable host cells include prokaryotes (such as E. coli, B. subtilis, S. lividans, or C. glutamicum) and yeast (such as S. cerevisiae, S. pombe, P. pastoris, or K. lactis).

The recombinant expression vector of the invention is preferably transformed or transfected into one or more host cells of a substantially homogeneous culture of cells. The expression vector is generally introduced into host cells in accordance with known techniques, such as, e.g., by protoplast transformation, calcium phosphate precipitation, calcium chloride treatment, microinjection, electroporation, transfection by contact with a recombined virus, liposome-mediated transfection, DEAE-dextran transfection, transduction, conjugation, or microprojectile bombardment. Selection of transformants can be conducted by standard procedures, such as by selecting for cells expressing a selectable marker, e.g., antibiotic resistance, associated with the recombinant vector, as described above.

Once the expression vector is introduced into the host cell, the integration and maintenance of the mTGase gene either in the host cell chromosome or episomally can be confirmed by standard techniques, e.g., by Southern hybridization analysis, restriction enzyme analysis, PCR analysis, including reverse transcriptase PCR (RT-PCR), or by immunological assay to detect the expected gene product. Host cells containing and/or expressing the recombinant mTGase coding sequence can be identified by any of at least four general approaches which are well-known in the art, including: (i) DNA-DNA, DNA-RNA, or RNA-antisense RNA hybridization; (ii) detecting the presence of "marker" gene functions; (iii) assessing the level of transcription as measured by the expression of mTGase-specific mRNA transcripts in the host cell; and (iv) detecting the presence of mature polypeptide product as measured, e.g., by immunoassay or by the presence of mTGase biological activity.

Methods of producing soluble and active mTGases

The invention includes methods of producing soluble and active mTGases. Once the mTGase coding sequence has been stably introduced into an appropriate host cell, the transformed host cell is clonally propagated, and the resulting cells can be grown under conditions conducive to the maximum production of the mature mTGase polypeptide. Such conditions typically include growing cells to high density. Where the expression vector comprises an inducible promoter, appropriate induction conditions such as, e.g., temperature shift, exhaustion of nutrients, addition of gratuitous inducers (e.g., arabinose, analogs of carbohydrates, such as isopropyl-β-D-thiogalactopyranoside (IPTG)), accumulation of excess metabolic by-products, or the like, are employed as needed to induce expression.

Where the expressed mTGase polypeptide is retained inside the host cells, the cells are harvested and lysed, and the product isolated and purified from the lysate under extraction conditions known in the art to minimize protein degradation such as, e.g., at 4 °C, or in the presence of protease inhibitors, or both. Where the expressed mTGase polypeptide is secreted from the host cells, the exhausted nutrient medium can simply be collected and the product isolated therefrom. In preferred embodiments, the mTGase polypeptide is present as a cytoplasmic protein. The expressed and/or processed mTGase polypeptide can be isolated or substantially purified from cell lysates or culture medium, as appropriate, using standard methods, including but not limited to any combination of the following methods: ammonium sulfate precipitation, size fractionation, ion exchange chromatography, hydrophobic interaction chromatography, size-exclusion chromatography, HPLC, density centrifugation, and affinity chromatography.

Where the expressed and/or processed mTGase polypeptide (e.g., mature mTGase) exhibits biological activity, increasing purity of the preparation can be monitored at each step of the purification procedure by use of an appropriate assay. Whether or not the expressed and/or processed mTGase polypeptide exhibits biological activity, it can be detected based, e.g., on size, on reactivity with an antibody otherwise specific for mTGase, or on the presence of a fusion tag. As used herein, an mTGase polypeptide is "substantially purified" where the product constitutes more than about 20 wt % of the protein in a particular preparation. Also, as used herein, an mTGase polypeptide is "isolated" where the product constitutes at least about 80 wt % of the protein in a particular preparation.

The present invention further provides a method for producing a mature mTGase, comprising culturing a host cell transformed with one or more recombinant expression vectors, said vector or vectors comprising a nucleic acid encoding a pro-mTGase polypeptide and/or a nucleic acid encoding a protease, respectively, which nucleic acids are in operative association with one or more regulatory elements that control expression of the nucleic acids in the host cell, under conditions conducive to the production of mature mTGase polypeptide, and recovering the mature mTGase from the cell culture. In some embodiments, the nucleic acid encoding the pro-mTGase polypeptide and the nucleic acid encoding the protease are present in the same vector, under the control of the same or different promoters. In some embodiments, the nucleic acid encoding the pro-mTGase polypeptide and the nucleic acid encoding the protease are present on different vectors.

In a non-limiting embodiment, the invention provides a method of producing a microbial transglutaminase, comprising the steps of a) preparing a nucleic acid comprising a modified microbial transglutaminase gene and a nucleic acid comprising a 3C protease gene; wherein the modified microbial transglutaminase gene encodes a polypeptide comprising at least one amino acid substitution within the sequence set forth by residues 1-44 of SEQ ID NO: 1; b) subcloning the nucleic acid comprising a modified microbial transglutaminase gene and the nucleic acid comprising a 3C protease gene into an expression vector; c) transforming and culturing a host cell with the expression vector of step b); and d) harvesting the mature microbial transglutaminase from the host cell. In some embodiments, the modified microbial transglutaminase gene encodes a polypeptide comprising a 3C protease cleavage site engineered between the pro-domain and the enzyme domain of the polypeptide. In some embodiments, the modified microbial transglutaminase gene comprises a sequence encoding a polyhistidine tag at the carboxyl terminus of the microbial transglutaminase.

In some embodiments, the mature mTGase polypeptide is purified from the host cell by a chromatography step. In some embodiments, the chromatography step is affinity chromatography, ion-exchange chromatography, hydrophobic interaction chromatography, or size-exclusion chromatography, or a combination thereof. For example, in certain embodiments, the mature mTGase is purified first by affinity chromatography, followed by size-exclusion chromatography.

In some embodiments, the transglutaminase of the invention described herein can be a purified protein. For example, the purified transglutaminase is at least about 50% pure. As used herein, "pure" or "purified" protein refers to a protein (e.g., transglutaminase) free from other protein contaminants. In some embodiments, the purified transglutaminase is at least about any of 55%-60%, 60%-65%, 65%-70%, 70%-75%, 75%-80%, 80%-85%, 85%-90%, 90%-95%, 95%-98%, or 99% pure. In some embodiments, the purified transglutaminase is about any of 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% pure. In certain embodiments, the transglutaminase is substantially pure.

Once an mTGase polypeptide (e.g., a mature mTGase polypeptide) of sufficient purity has been obtained, it can be characterized by standard methods, including by SDS-PAGE, size exclusion chromatography, amino acid sequence analysis, biological activity, etc. For example, the amino acid sequence of the mTGase polypeptide can be determined using standard peptide sequencing techniques. The mTGase polypeptide can be further characterized using hydrophilicity analysis (see, e.g., Hopp and Woods, 1981, Proc. Natl. Acad. Sci. USA 78:3824), or analogous software algorithms, to identify hydrophobic and hydrophilic regions of the mTGase polypeptide. Structural analysis can be carried out to identify regions of the mTGase polypeptide that assume specific secondary structures. Biophysical methods such as X-ray crystallography (Engstrom, 1974, Biochem. Exp. Biol. 11:7-13), computer modelling (Fletterick and Zoller (eds), 1986, in: Current Communications in Molecular Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.), and nuclear magnetic resonance (NMR) can be used to map and study sites of interaction between the mTGase polypeptide and its substrate. Information obtained from these studies can be used to select new sites for mutation in the mTGase ORF to help develop new mTGase polypeptides having the desired production characteristics.

In some embodiments, the recombinantly expressed mature mTGase of the invention has activity comparable to that of a native mature mTGase (e.g., the native mTGase produced by fermentation of wild-type S. mobaraensis). In some embodiments, a mature mTGase of the invention has at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 65%, at least 60%, at least 55%, at least 50%, at least 45%, at least 40%, at least 35%, at least 30%, or at least 25% of the enzyme activity of a native mature mTGase. In some embodiments, a mature mTGase of the invention has 95%-100%, 90%-95%, 85%-90%, 80%-85%, 75%-80%, 70%-75%, 65%-70%, 60%-65%, 55%-60%, 50%-55%, 45%-50%, 40%-45%, 35%-40%, 30%-35%, or 25%-30% of the enzyme activity of a native mature mTGase. The activity of the mTGase polypeptides of the invention can be measured using methods known in the art, whereby the ability of the mature polypeptide to catalyze the acyl transfer reaction between the γ-carboxyamide group of peptide-bound glutamine residues and a variety of primary amines is detected and/or measured. The assays may be performed in various formats. In some embodiments, mTGase enzyme activity is measured using a colorimetric hydroxamate procedure using N-carbobenzoxy-L-glutaminyl-glycine as the amine acceptor substrate and hydroxylamine as amine donor (e.g., a commercially available kit such as the ZediXclusive Microbial Transglutaminase Assay Kit, Z0009, Zedira, Germany). In some embodiments, mTGase enzyme activity is measured by incubating a candidate antibody containing a glutamine tag (acyl donor) with a payload (acyl acceptor) and monitoring the formation of an antibody-payload conjugate.
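The percentage comparisons above can be illustrated with a minimal Python sketch. The function names and the example activity values are hypothetical and are not taken from the disclosure; the only inputs assumed are two activity readings measured in the same assay and the same units.

```python
def relative_activity(sample_activity: float, native_activity: float) -> float:
    """Return the sample activity as a percentage of the native reference."""
    if native_activity <= 0:
        raise ValueError("native_activity must be positive")
    return 100.0 * sample_activity / native_activity


def meets_threshold(sample_activity: float, native_activity: float,
                    min_percent: float = 25.0) -> bool:
    """True if the sample retains at least min_percent of native activity."""
    return relative_activity(sample_activity, native_activity) >= min_percent


# Hypothetical readings in arbitrary but identical units.
print(relative_activity(19.0, 20.0))        # percentage of native activity
print(meets_threshold(4.0, 20.0, 25.0))     # 20% of native is below 25%
```

Both readings should come from the linear range of the assay for the ratio to be meaningful.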

Exemplary Uses

The recombinantly expressed mTGase produced by the methods of the invention is useful for a variety of purposes, including various applications in the food and pharmaceutical industries (e.g., generation of antibody drug conjugates (ADCs) for therapeutic applications). Exemplary methods of generating antibody drug conjugates using mTGase are described in International Patent Application Publication Nos. WO2012/059882 and WO2015/015448, which are incorporated herein by reference. Exemplary uses are further described below.

The mTGase polypeptides produced according to the methods of the invention may be used to generate site-specific and homogenous antibody-drug conjugates, antibody conjugates, or protein conjugates. Protein conjugation or modification using transglutaminase provides the advantages of high selectivity, simplified reaction procedures, and mild reaction conditions.

The mTGase polypeptides produced according to the methods of the invention may be used to make engineered polypeptide conjugates (e.g., antibody-drug conjugates, toxin-(biocompatible polymer) conjugates, antibody-(biocompatible polymer) conjugates, and bispecific antibodies) comprising acyl donor glutamine-containing tags and amine donor agents.

In one aspect, the mTGase polypeptides produced according to the methods of the invention may be used to make engineered polypeptide conjugates (e.g., Fc-containing polypeptide-drug conjugates, bispecific antibodies, Fab-containing polypeptide-biocompatible polymer conjugates, and toxin-biocompatible polymer conjugates). An mTGase polypeptide produced according to the invention can be used to covalently crosslink an Fc-containing polypeptide engineered with an acyl donor glutamine-containing tag (e.g., Gln-containing peptide tags or Q-tags) or an endogenous glutamine made reactive by polypeptide engineering (e.g., via amino acid deletion, insertion, substitution, mutation, or deglycosylation on the polypeptide), with an amine donor agent (e.g., a small molecule comprising or attached to a reactive amine) to form a stable and homogenous population of an engineered Fc-containing polypeptide conjugate with the amine donor agent being site-specifically conjugated to the Fc-containing polypeptide through the acyl donor glutamine-containing tag or the accessible/exposed/reactive endogenous glutamine. The conjugation efficiency of the Fc-containing polypeptide engineered with an acyl donor glutamine-containing tag (or the reactive endogenous glutamine) and the amine donor agent is at least about 51%, and the conjugation efficiency between the Fc-containing polypeptide and the amine donor agent is less than about 5% in the absence of an acyl donor glutamine-containing tag or the accessible/exposed/reactive endogenous glutamine. For example, deletion or mutation of the last amino acid from Lys (lysine) to another amino acid in the Fc-containing polypeptide spatially adjacent to the Gln-containing peptide tag provides a significant increase in conjugation efficiency of the Fc-containing polypeptides and the small molecule (e.g., a cytotoxic agent or an imaging agent). The mTGase polypeptides produced according to the methods of the invention can be used to generate a stable and homogenous population of bispecific antibody using a Gln-containing peptide tag engineered to a first Fc-containing polypeptide directed to an epitope and another peptide tag (e.g., a Lys-containing polypeptide tag) engineered to a second Fc-containing polypeptide directed to a second epitope in a reducing environment. A similar bispecific antibody can also be made by combining two different Fc-containing polypeptides engineered to two Gln-containing peptide tags with a diamine.

The mTGase polypeptides produced according to the methods of the invention can also be used to generate a stable and homogenous Fab-containing polypeptide conjugate or a toxin polypeptide conjugate with longer half-life by covalently reacting a Gln-containing peptide tag engineered to a Fab-containing polypeptide or a toxin polypeptide with a biocompatible polymer. The selection of the acyl donor glutamine-containing tags, Fc-containing polypeptides, and/or the amine donor agents for site-specific conjugation is described in International Patent Application Publication Nos. WO2012/059882 and WO2015/015448. Without wishing to be bound by theory, the antibody-drug conjugates, bispecific antibodies, antibody-biocompatible polymer conjugates, and toxin-biocompatible polymer conjugates generated using the methods described herein are stable, resistant to proteolytic degradation in vivo, in vitro, and ex vivo, and/or have longer half-life.

In one aspect, a mature mTGase polypeptide produced according to the methods of the invention is used to make an engineered Fc-containing polypeptide conjugate comprising the formula (Fc-containing polypeptide)-T-A, wherein T is an acyl donor glutamine-containing tag engineered at a specific site or comprises an endogenous glutamine made reactive by the Fc-containing polypeptide engineering, wherein A is an amine donor agent, and wherein the amine donor agent is site-specifically conjugated to the acyl donor glutamine-containing tag or the endogenous glutamine. Exemplary acyl donor tags are disclosed in International Patent Application Publication Nos. WO2012/059882 and WO2015/015448.

In another aspect, a mature mTGase polypeptide produced according to the methods of the invention is used to make an engineered Fc-containing polypeptide conjugate comprising the formula: (Fc-containing polypeptide)-T-A, wherein the Fc-containing polypeptide conjugate comprises a first Fc-containing polypeptide; wherein T is an acyl donor glutamine-containing tag engineered at a specific site or comprises an endogenous glutamine made reactive by the Fc-containing polypeptide engineering; wherein A is an amine donor agent; wherein the amine donor agent comprises a second Fc-containing polypeptide and a tag and does not comprise a reactive Gln; and wherein the acyl donor glutamine-containing tag or the endogenous glutamine is site-specifically crosslinked to the first Fc-containing polypeptide and the second Fc-containing polypeptide.

In another aspect, a mature mTGase polypeptide produced according to the methods of the invention is used to make an engineered Fc-containing polypeptide conjugate comprising the formula: (Fc-containing polypeptide)-T-A, wherein the Fc-containing polypeptide comprises a first Fc-containing polypeptide and a second Fc-containing polypeptide; wherein T is an acyl donor glutamine-containing tag comprising a first acyl donor glutamine-containing tag and a second acyl donor glutamine-containing tag crosslinked to the first Fc-containing polypeptide and the second Fc-containing polypeptide, respectively; wherein A is an amine donor agent; and wherein the first and the second acyl donor glutamine-containing tags are site-specifically crosslinked to each other.

In some embodiments, a mature mTGase polypeptide produced according to the methods of the invention is used to make an engineered Fc-containing polypeptide conjugate comprising the formula: (Fc-containing polypeptide)-A, wherein the Fc-containing polypeptide is aglycosylated at amino acid position 295 (e.g., in human IgG1 and IgG2) and comprises an amino acid modification at amino acid position 297 relative to a wild-type human IgG1 antibody; wherein A is an amine donor agent; and wherein the amine donor agent is site-specifically conjugated to the endogenous glutamine at amino acid position 295 in the Fc-containing polypeptide. In some embodiments, the amino acid modification is not a substitution from asparagine (Asn or N) to glutamine at position 297 of human IgG (EU numbering scheme).

In another aspect, a mature mTGase polypeptide produced according to the methods of the invention is used to make an engineered Fab-containing polypeptide conjugate comprising the formula: (Fab-containing polypeptide)-T-A, wherein T is an acyl donor glutamine-containing tag engineered at a specific site or comprises an endogenous glutamine made reactive by the Fab-containing polypeptide engineering; wherein A is an amine donor agent; wherein the amine donor agent is a biocompatible polymer comprising a reactive amine; and wherein the biocompatible polymer is site-specifically conjugated to the acyl donor glutamine-containing tag or the endogenous glutamine at a carboxyl terminus, an amino terminus, or at another site in the Fab-containing polypeptide.

In another aspect, a mature mTGase polypeptide produced according to the methods of the invention is used to make an engineered toxin polypeptide conjugate comprising the formula: (toxin polypeptide)-T-A, wherein T is an acyl donor glutamine-containing tag engineered at a specific site or comprises an endogenous glutamine made reactive by the toxin polypeptide engineering; wherein A is an amine donor agent; wherein the amine donor agent is a biocompatible polymer comprising a reactive amine; and wherein the biocompatible polymer is site-specifically conjugated to the acyl donor glutamine-containing tag or the endogenous glutamine at a carboxyl terminus, an amino terminus, or at another site in the toxin polypeptide.

In another aspect, a mature mTGase polypeptide produced according to the methods of the invention is used to make an engineered toxin polypeptide conjugate comprising the formula: (toxin polypeptide)-T-B, wherein T is an acyl donor glutamine-containing tag at a specific site; wherein B is a biocompatible polymer; and wherein the toxin polypeptide is site-specifically conjugated to the acyl donor glutamine-containing tag at any site in the biocompatible polymer.

In some embodiments, the molecules to be conjugated are contacted in a reducing environment in the presence of a transglutaminase. Methods of making the conjugates described above are described in detail in International Patent Application Publication Nos. WO2012/059882 and WO2015/015448.

In some embodiments, a mature mTGase polypeptide produced according to the methods of the invention is utilized in the production of foods such as gelatin, cheese, yogurt, tofu, kamaboko, hams, sausages, and noodles, etc. as well as for improving the quality of meat, and the like, and in the production of raw materials for heat-stable microcapsules, or carriers for immobilized enzymes and the like.

Equivalents

The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the disclosure. The foregoing description and Examples detail certain exemplary embodiments of the disclosure. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the disclosure may be practiced in many ways and the disclosure should be construed in accordance with the appended claims and any equivalents thereof.

All references cited herein, including patents, patent applications, papers, text books, and the like, and the references cited therein, to the extent that they are not already, are hereby incorporated herein by reference in their entirety.

Exemplary Embodiments

The invention is further described in detail by reference to the following experimental examples. These examples are provided for purposes of illustration only, and are not intended to be limiting unless otherwise specified. Thus, the invention should in no way be construed as being limited to the following examples, but rather, should be construed to encompass any and all variations which become evident as a result of the teaching provided herein.

Examples

Example 1. Materials and Methods

Materials and cell strains

All chemicals used throughout the study were analytical grade and purchased from Sigma-Aldrich (USA), unless stated otherwise. Bacterial strains E. coli TOP10 and BL21 (DE3) were obtained from Invitrogen. All enzymes for DNA manipulation were purchased from New England Biolabs.

Bacterial strains, plasmids and growth conditions

E. coli TOP10 and E. coli BL21 (DE3) were used as hosts for DNA manipulation and protein expression, respectively. Plasmids pET20b(+) (EMD Millipore) and pBAD-A (Life Technologies) served as the basis for expression construct generation. Luria broth (LB) and Terrific broth (TB) were used for plasmid DNA cloning and protein expression, respectively. E. coli strains were grown at 37 °C for plasmid propagation and at a combination of 37 °C and 20 °C for protein expression.

Construction of expression plasmids for simultaneous expression of mTGase and 3C-protease

Genes for mTGase and 3C-protease were both chemically synthesized and codon-optimized (GeneArt, Life Technologies) for expression in E. coli. Expression plasmid pBAD-T7 [Fig. 1(A)] was constructed by sub-cloning the T7-promoter and T7-terminator of plasmid pET20b(+) into plasmid pBAD-A with PciI and BsmBI restriction sites at the 5' and 3' end, respectively [Fig. 1(A)]. For cloning the mPro-TGase gene into plasmid pBAD-T7 under the control of the T7-promoter, restriction sites XmaI and NotI were inserted between the T7-promoter and T7-terminator [Fig. 1(A)]. To optimize protein expression levels of mTGase, a codon for the amino acid lysine (AAA) was placed as the first codon following the start codon ATG (Looman et al. (1987) EMBO J 6:2489-2492). The six nucleotides coding for restriction site XmaI add the amino acids proline and glycine to the N-terminus of the mTGase pro-domain [Fig. 1(A) and (B)]. A 3C-protease cleavage site was inserted into the connecting loop of the mTGase pro-domain and enzyme-domain between Pro44 and Asp45 of SEQ ID NO: 4 [Fig. 4(B)]. The full-length 3C-protease gene was cloned into plasmid pBAD-T7 under the control of the araBAD promoter with SacI and EcoRI restriction sites at the 5' and 3' end, respectively [Fig. 1(A)].

mTGase pro-domain mutagenesis

The previously published protein crystal structure of full-length mTGase (PDB ID 3IU0) was used to identify contact amino acid residues between the mTGase pro-domain and enzyme-domain (Fig. 3) (Kashiwagi et al. (2002) The Journal of biological chemistry 277:44252-44260). Amino acid exchange of mTGase pro-domain residues Tyr10, Tyr14, Leu16, Asp20, Val21, Asn23, Ile24, Ala25, Leu27, Asn28, and Glu29 of SEQ ID NO: 1 or SEQ ID NO: 4 to alanine residues was performed using the QuikChange II site-directed mutagenesis kit (Agilent Technologies). Mutagenesis primers and PCR conditions were designed according to the manufacturer's instructions.

Fermentation and protein induction

Escherichia coli BL21 (DE3) cells transformed with plasmid pBAD-T7 were inoculated and grown overnight in LB broth at 37 °C on a platform shaker at 250 rpm. For selection of transformed cells, carbenicillin was added at a concentration of 100 μg/ml. 3 ml of cell culture was added to 100 ml TB broth supplemented with 100 μg/ml carbenicillin and incubated at 37 °C on a platform shaker at 250 rpm until an OD600 of 1.0 to 1.6 was reached. The temperature was lowered to 20 °C and cells were equilibrated to the lowered temperature for 30 to 40 minutes with continuous shaking. For simultaneous induction of the genes mPro-TGase and 3C-protease, isopropyl β-D-1-thiogalactopyranoside (IPTG) and L-arabinose were added at final concentrations of 0.4 mM and 0.2%, respectively. E. coli cells were further incubated at 20 °C for 20 hours.
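The inducer additions can be worked out with a short dilution calculation. The Python sketch below is illustrative only: the stock concentrations (1 M IPTG, 20% L-arabinose) are assumptions, since the protocol states only the final concentrations, and the helper name is hypothetical. The culture volume of 103 ml follows from the 3 ml inoculum plus 100 ml TB broth.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, culture_ml: float) -> float:
    """Volume of stock (ml) to add so that the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (culture_ml + v) for v, so the
    volume contributed by the stock itself is taken into account.
    final_conc and stock_conc must be in the same unit (mM, %, ...).
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("need 0 < final_conc < stock_conc")
    return final_conc * culture_ml / (stock_conc - final_conc)


CULTURE_ML = 103.0           # 3 ml overnight culture + 100 ml TB broth
IPTG_STOCK_MM = 1000.0       # ASSUMED stock concentration (1 M), not from the protocol
ARABINOSE_STOCK_PCT = 20.0   # ASSUMED stock concentration (20%), not from the protocol

# Each inducer is computed independently; the small extra volume added by
# the other inducer is ignored in this sketch.
iptg_ml = stock_volume_ml(0.4, IPTG_STOCK_MM, CULTURE_ML)
arabinose_ml = stock_volume_ml(0.2, ARABINOSE_STOCK_PCT, CULTURE_ML)

print(f"IPTG stock to add:      {iptg_ml * 1000:.1f} microliters")
print(f"Arabinose stock to add: {arabinose_ml:.2f} ml")
```

With other stock concentrations, only the two constants marked as assumed need to change.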

Protein harvest and 2-step Ni-NTA purification

Cells were harvested by centrifugation at 6000g for 30 min. The cell pellet was frozen and stored at -80 °C. The frozen cell pellet was completely thawed on ice and resuspended in 50 ml of ice-cold 1x PBS. 30 mg of lyophilized lysozyme (Sigma Aldrich) was dissolved in the cell suspension and incubated for 30 min on ice with careful inversion of the tube every 10 min. Cells were disintegrated by sonication (Misonix; 100W; 4,000J; 6 sec/pulse; duration 1:30 min). The disintegrated cells were centrifuged at 13,000g for 30 min and the supernatant was sterile filtered through a Millex-GP 0.2 micron syringe filter (Fisher Scientific). Imidazole, from a 2 M stock solution at pH 8.0, and sodium chloride were added at final concentrations of 10 mM and 500 mM, respectively. The filtered supernatant was loaded onto a pre-packed 5 ml Ni-NTA column (HisTrap FF, GE Healthcare), equilibrated in loading-buffer (50 mM Tris-HCl, pH 8.0, 1 M NaCl, 5 mM imidazole) at a flow rate of 2 ml/min on an AKTA explorer system (GE Healthcare). The column was washed with 6 column volumes of wash-buffer (50 mM Tris-HCl, pH 8.0, 1 M NaCl, 40 mM imidazole) and protein was eluted off the column with elution-buffer (50 mM Tris-HCl, pH 8.0, 1 M NaCl, 250 mM imidazole). Elution fractions containing protein were pooled and concentrated to 2 ml total volume in spin concentrators (Amicon Ultra-15, 10,000 NMWL, Millipore). As a second purification step, the protein was loaded onto a size exclusion column (HiLoad 16/600 Superdex 200 pg, GE Healthcare) which was equilibrated with acetate buffer (50 mM acetate, pH 5.0, 400 mM NaCl). Purified protein was flash frozen in liquid nitrogen and stored at -80 °C.

SDS-PAGE and Western blot

Polyacrylamide gel electrophoresis (PAGE) was performed as described by the method of Laemmli (Laemmli (1970) Nature 227:680-685), using a Novex Mini Cell system and NuPAGE 4-12% Bis-Tris gels (Life Technologies). Protein gels were stained with Coomassie InstantBlue solution (Expedeon) according to manufacturer's instructions.

For Western blot analysis, proteins were transferred to a nitrocellulose membrane using the iBlot system (Life Technologies) following the manufacturer's instructions. The membrane was incubated in blocking solution (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.01% Tween® 20, 1.5% BSA) for 2 hours at 4 °C. His-tagged proteins on the membrane were detected with mouse anti-His tag antibody (GenScript) for 1 hour at RT in binding buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.01% Tween 20, 1% BSA). Unbound antibody was washed away with washing buffer (50 mM Tris-HCl, 150 mM NaCl, 0.01% Tween 20), 3 times for 10 min on an orbital shaker. Bound antibody was probed with secondary goat anti-mouse alkaline phosphatase-conjugated F(ab')2 fragment (Jackson Immuno) in binding buffer for 1 hour at RT. Blotted protein was detected on the membrane with 1-Step™ NBT/BCIP solution (Thermo Scientific). The reaction was stopped by washing with ddH2O 3 times.

Transglutaminase activity assay

mTGase enzyme activity was measured with a commercially available kit, which is based on a previously described colorimetric hydroxamate procedure using N-carbobenzoxy-L-glutaminyl-glycine (Zedira, Germany) (Folk and Cole (1966) The Journal of biological chemistry 241:5518-5525). mTGase protein concentration was measured using a NanoDrop 2000 spectrophotometer (Thermo Scientific) assuming a calculated protein molar extinction coefficient of 70530 (Molar A280). mTGase enzyme protein concentration was adjusted to 0.125 mg/ml (3 μM) to ensure activity measurements within the linear range of the enzyme assay.

Mass spectroscopy

Prior to LC/MS analysis, ADCs were deglycosylated with PNGase F (NEB, cat# P0704L) under non-denaturing conditions at 37 °C overnight. ADCs or TGase (500 ng) were loaded onto a reverse phase column packed with a polymeric material (Michrom-Bruker, cat# CM8/00920/00). LC/MS analysis was performed using an Agilent 1100 series HPLC system, comprising binary HPLC pump, degasser, temperature controlled auto sampler, column heater and diode-array detector (DAD), coupled to an Orbitrap Velos Pro (Thermo Scientific) mass spectrometer with electrospray ion source. The mass ranges acquired were m/z 1000-4000 and m/z 500-3000 for the detection of ADC and TGase, respectively. The resulting mass spectra were deconvoluted using ProMass software (Thermo Fisher Scientific).
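To relate the acquired m/z windows to intact-protein masses, the following Python sketch computes which positive charge states of a protein fall inside a given window, using the standard electrospray relation m/z = (M + z × 1.007276)/z. The neutral masses used (150,000 Da for an antibody-sized species and 38,000 Da for an mTGase-sized species) are round illustrative assumptions, not values reported in this document, and the function names are hypothetical.

```python
import math

PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz(neutral_mass_da: float, z: int) -> float:
    """m/z of an [M + zH]^z+ ion for a given neutral mass and charge."""
    return (neutral_mass_da + z * PROTON_MASS_DA) / z


def charge_states_in_window(neutral_mass_da: float, mz_low: float, mz_high: float):
    """Return (z_min, z_max), the charge states whose m/z lies in [mz_low, mz_high]."""
    # m/z <= mz_high  <=>  z >= M / (mz_high - proton mass)
    z_min = max(1, math.ceil(neutral_mass_da / (mz_high - PROTON_MASS_DA)))
    # m/z >= mz_low   <=>  z <= M / (mz_low - proton mass)
    z_max = math.floor(neutral_mass_da / (mz_low - PROTON_MASS_DA))
    return z_min, z_max


# ASSUMED round masses, for illustration only.
adc_range = charge_states_in_window(150_000.0, 1000.0, 4000.0)
tgase_range = charge_states_in_window(38_000.0, 500.0, 3000.0)
print("ADC-sized species, m/z 1000-4000:", adc_range)
print("mTGase-sized species, m/z 500-3000:", tgase_range)
```

Deconvolution software combines the peaks from such a charge-state series into a single neutral mass; the sketch only shows why the two acquisition windows differ for the larger and smaller protein.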

Antibody conjugation

For the conjugation of the antibody (Ab1) to the linker-payload AcLys-vc-0101 (acetyl-lysine-valine-citrulline-p-aminobenzyloxycarbonyl-2-methylalanyl-N-[(3R,4S,5S)-3-methoxy-1-{(2S)-2-[(1R,2R)-1-methoxy-2-methyl-3-oxo-3-{[(1S)-2-phenyl-1-(1,3-thiazol-2-yl)ethyl]amino}propyl]pyrrolidin-1-yl}-5-methyl-1-oxoheptan-4-yl]-N-methyl-L-valinamide), Ab1 was adjusted to 5 mg/mL in buffer containing 25 mM Tris-HCl at pH 8.0 and 150 mM NaCl. AcLys-vc-0101 was added in a 10-fold molar excess over antibody, and the enzymatic reaction was initiated by addition of 2% (w/v) bacterial transglutaminase (Ajinomoto Activa TI, Japan) (Strop et al. (2013) Chemistry & biology 20:161-167). Following incubation with gentle shaking at 22 °C for 16 hours, the ADC was purified using MabSelect SuRe (GE Healthcare LifeSciences) following standard procedures.
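The 10-fold molar excess described above can be illustrated with a short calculation. This is a minimal sketch only: the antibody molecular weight of 150 kDa is an assumption typical of an IgG and is not stated in the text, and the function name is hypothetical.

```python
def payload_concentration_um(ab_mg_per_ml, ab_mw_da, fold_excess):
    """Return the linker-payload concentration (µM) for a given molar excess."""
    # mg/mL equals g/L, so dividing by g/mol gives mol/L; scale to µM.
    ab_um = ab_mg_per_ml / ab_mw_da * 1e6
    return ab_um * fold_excess


# Assumed IgG molecular weight (Da); not given in the text.
ASSUMED_AB_MW_DA = 150_000

payload_um = payload_concentration_um(5.0, ASSUMED_AB_MW_DA, 10)
print(f"Linker-payload concentration: {payload_um:.0f} µM")
```

Under this assumption, 5 mg/mL antibody is about 33 µM, so a 10-fold excess corresponds to roughly 333 µM linker-payload.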

HIC chromatography

Antibody-drug conjugates with zero, one, or two drug moieties per antibody were separated using a TSK-GEL® Butyl-NPR column (4.6 mm x 3.5 cm) (Tosoh Bioscience, King of Prussia, PA) on an Agilent HP 1100 HPLC (Agilent, Santa Clara, CA). The HIC method utilized a mobile phase of 1.5 M ammonium sulfate, 50 mM potassium phosphate at pH 7 for Buffer A, and 50 mM potassium phosphate, 15% 2-propanol at pH 7 for Buffer B. Using a flow rate of 0.8 mL/min, 40 µg of ADC in 0.75 M ammonium sulfate was loaded onto the column and eluted with a gradient consisting of a 2.5 minute hold at 0% B, followed by a 35 minute linear gradient to 100% B. The column was washed with 100% Buffer B for 2.5 minutes and re-equilibrated at initial conditions for 5 minutes.
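The gradient program above can be written out as a simple function of time. The following is an illustrative sketch, not instrument method code: the function names are hypothetical, and an immediate step back to 0% B at the start of re-equilibration is an assumption, since the text does not describe that transition.

```python
def percent_b(t_min):
    """Return the programmed %B at time t_min (minutes after injection)."""
    hold_end = 2.5                    # 2.5 min hold at 0% B
    gradient_end = hold_end + 35.0    # 35 min linear gradient to 100% B
    wash_end = gradient_end + 2.5     # 2.5 min wash at 100% B
    run_end = wash_end + 5.0          # 5 min re-equilibration
    if t_min < 0 or t_min > run_end:
        raise ValueError("time outside the 45 min program")
    if t_min <= hold_end:
        return 0.0
    if t_min <= gradient_end:
        return (t_min - hold_end) / 35.0 * 100.0
    if t_min <= wash_end:
        return 100.0
    return 0.0  # assumed step back to initial conditions


def ammonium_sulfate_molar(t_min):
    """Ammonium sulfate concentration (M) delivered by the pumps at t_min."""
    # Buffer A contains 1.5 M ammonium sulfate; Buffer B contains none.
    return 1.5 * (1.0 - percent_b(t_min) / 100.0)
```

For instance, 20 minutes into the run the program delivers 50% B, which corresponds to 0.75 M ammonium sulfate, the same concentration in which the sample is loaded.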

Example 2. Design of pro-mTGase plus 3C-protease dual gene expression plasmids

In an effort to express soluble and active microbial transglutaminase (mTGase) in E. coli, a new system to express genetically engineered mTGase soluble in the cytoplasm of E. coli was developed. A vector was designed to simultaneously express genetically modified mTGase together with 3C-protease in the cytoplasm of E. coli.

Attempts at directly producing mature mTGase as a soluble protein in the cytoplasm, or in the periplasm of E. coli were unsuccessful (data not shown).

Expression of mTGase without its pro-domain led either to significant growth retardation upon induction of mTGase or to generation of inclusion bodies (data not shown). Pro-mTGase and mature mTGase can be overexpressed in the cytoplasm of E. coli as insoluble inclusion bodies and subsequently refolded to yield various amounts of soluble pro-enzyme and mature enzyme (Yokoyama et al. (2000) Bioscience, biotechnology, and biochemistry 64:1263-1270; Yang et al. (2009) Bioscience, biotechnology, and biochemistry 73:2531-2534). Soluble pro-TGase can be produced in large quantities in the cytoplasm of E. coli by lowering the temperature of protein induction below 37 °C (Marx et al. (2008) Journal of biotechnology 136:156-162; Sommer et al. (2011) Protein expression and purification 77:9-19). Both approaches to produce pro-mTGase, however, necessitate in vitro removal of the pro-domain by proteases such as TAMEP, Dispase, and Trypsin to yield active mTGase enzyme (Zotzel et al. (2003) European journal of biochemistry / FEBS 270:4149-4155; Marx et al. (2007) Enzyme Microb Technol 40:1543-1550; Yang et al. (2009) Bioscience, biotechnology, and biochemistry 73:2531-2534).

To produce large amounts of mature mTGase in E. coli, it would be beneficial to have an expression system that allows soluble expression of mature mTGase followed by a straightforward purification procedure. Towards that goal, E. coli expression vectors that facilitate simultaneous expression of pro-mTGase and 3C-protease in the E. coli cytoplasm were designed [Fig. 1(A)]. To enable proteolytic cleavage of the pro-domain, the recognition site of 3C-protease was cloned between the mTGase pro-domain and enzyme domain [Fig. 3(B)]. Both combinations were constructed: pro-mTGase and 3C-protease expressed separately under the control of the T7- and araBAD-promoters, respectively, or vice versa. Since the two promoters are induced by different reagents (IPTG for the T7-promoter and L-arabinose for the araBAD-promoter), it is possible either to co-induce both genes or to induce them sequentially.

Example 3. Small scale protein expression

Expression of pro-mTGase was tested in both combinations, with pro-mTGase under the control of either the araBAD- or the T7-promoter. In addition to the different promoter control of pro-mTGase, the influence of 3C-protease expression was probed by either co-inducing it together with pro-mTGase or inducing it sequentially after a delay. The results of a small scale expression study with pro-mTGase under the control of the T7-promoter and 3C-protease under the control of the araBAD-promoter are summarized in [Fig. 1(C)]. The soluble fraction of crude lysates is shown in [Fig. 1(C)], where lane 3 is the non-induced control supernatant. Simultaneous co-induction of both genes [Fig. 1(C), lane 4] resulted in strong expression of fully processed mature mTGase, as further demonstrated by anti-His-tag western blot analysis in [Fig. 1(D)]. Sequential induction, with overnight expression of pro-mTGase followed by 3C-protease expression for increasing durations from 30 min up to 2 hours, showed a gradual decrease in the amount of pro-mTGase [Fig. 1(C) and (D), lanes 5-7]. The addition of extra purified 3C-protease to the lysate almost completely processed pro-mTGase to mature mTGase [Fig. 1(C) and (D), lane 8].

Interestingly, the same small scale expression study with the alternative orientation of gene expression, with pro-mTGase under the control of the araBAD-promoter and 3C-protease expression controlled by the T7-promoter, showed only weak expression of soluble processed mTGase enzyme (data not shown). The observed differences in expression levels of soluble mTGase depending on promoter orientation could be related to a combination of promoter strength and tightness of promoter control. The combination of genes and promoters shown in [Fig. 1(A)] enables a faster protein production rate of pro-mTGase relative to 3C-protease, leading to optimal levels of mature mTGase accumulation.

Example 4. Expression, purification, and activity of E. coli derived mTGase

Expression was scaled up to enable protein purification of soluble mTGase enzyme for protein characterization and enzyme activity assays. The cloned His-tag at the C-terminus of mTGase facilitated straightforward Ni-NTA affinity purification of expressed protein from crude E. coli cell lysate, which was followed by size exclusion chromatography (SEC). A single symmetric protein peak eluted at the expected retention time corresponding to a molecular weight of approximately 40 kDa [Fig. 2(A)]. Two protein bands were visible when the main peak from the SEC was run on an SDS-PAGE gel [Fig. 2(A)]: one prominent larger band running at the size of the mTGase mature-domain, and a smaller protein band of around 5-6 kDa. The mass of the Ni-NTA purified mTGase was verified by mass spectroscopy on an Orbitrap Velos Pro (Thermo Scientific) mass spectrometer. The results revealed two masses of 5.63 and 39.2 kDa, corresponding to the mTGase pro-domain and mature-domain, respectively [Fig. 2(B)]. The results of the mass spectroscopy analysis suggested that pro-mTGase was completely processed by 3C-protease, but that a significant fraction of the pro-domain protein remained non-covalently attached to the enzyme domain even after the sequential purification steps of Ni-NTA and SEC chromatography.

Wild type mTGase from Streptomyces mobarensis, on the other hand, does not show any attached pro-domain [Fig. 5(A)]. The complete dissociation of the pro-domain in Streptomyces mobarensis expressed mTGase is likely due to multiple proteases acting on the pro-domain, cleaving it in several places and yielding fully active enzyme. The E. coli expression system described here, however, contains only one engineered protease cleavage site. The selectivity of the 3C-protease towards its recognition site makes it unlikely to cleave anywhere else within the mTGase pro-domain. The effect of the non-covalently associated pro-domain on the activity of E. coli mTGase was tested with a colorimetric assay (see Example 1). The enzyme activity of the purified wild type mTGase from Streptomyces mobarensis was used as the reference and represented 100% enzyme activity [Fig. 2(C)]. The mTGase purified from the described E. coli system displayed approximately 50% enzyme activity as compared with wild type mTGase [Fig. 2(C)]. The data suggest that the decreased activity of E. coli mTGase is due to the presence of the non-covalently attached pro-domain. This appears to be the first report that correlates reduced activity of heterologously expressed mTGase with the inability of the pro-domain to dissociate.

Example 5. Design of mTGase pro-domain mutants

The crystal structures of the zymogen and mature form of mTGase (Kashiwagi et al. (2002) The Journal of biological chemistry 277:44252-44260; Yang et al. (2011) The Journal of biological chemistry 286:7301-7307) were utilized to design a series of pro-domain alanine mutants intended to loosen the interaction between the pro-domain and enzyme-domain and thereby achieve full enzymatic activity (Fig. 3). Since the pro-domain is known to also act as a folding chaperone for the enzyme-domain (Yurimoto et al. (2004) Bioscience, biotechnology, and biochemistry 68:2058-2069; Liu et al. (2011) Microbial cell factories 10:112), mutants were sought that would still assist in mTGase folding but would allow the pro-domain to dissociate more readily once the linker between pro-domain and enzyme-domain is cleaved by the co-expressed 3C-protease. The crystal structure of mTGase zymogen revealed an L-shaped pro-domain binding to the enzyme-domain and completely occluding the active-site cleft of mTGase (Yang et al. (2011) The Journal of biological chemistry 286:7301-7307). The L-shaped pro-domain is composed of a short α-helix (residues 9-15) connected by a single residue to the second α-helix (residues 17-30), and together they cover the active-site cleft [Fig. 3(A) and (B)]. Within the short α-helix, two residues (Tyr10 and Tyr14) form the majority of the interaction with the enzyme active-site cleft [Fig. 3(A)] (Yang et al. (2011) The Journal of biological chemistry 286:7301-7307). Only one hydrophobic residue, Leu16, forms the short connective loop between α-helices 1 and 2 of the mTGase pro-domain and points deep into the active-site cleft [Fig. 3(A)]. Eight residues of the longer second α-helix face towards the active-site cleft (Asp20, Val21, Asn23, Ile24, Asn25, Leu27, Asn28, and Glu29) and were selected together with Tyr10, Tyr14, and Leu16 for mutagenesis [Fig. 3(B)].

Example 6. Protein expression, inhibitor binding, and enzyme activity of pro-domain mutants

All of the pro-domain alanine mutants as well as the non-mutated control of pro-mTGase were expressed as described in Example 1 (Fig. 4). Pro-mTGase and 3C-protease protein production was induced simultaneously, and soluble processed mTGase was purified from crude E. coli lysate and analyzed by SDS-PAGE [Fig. 4(A)]. The non-mutated pro-domain mTGase control [Fig. 4(A), lane 3] and all alanine mutants showed various amounts of associated pro-domain. The amount of attached pro-domain of the mTGase mutants seemed to correlate inversely with the amount of mTGase aggregate formation, as judged by the staining intensity of higher order bands on the SDS-PAGE gel [Fig. 4(A) and Fig. 4(D)]. Consistent with this finding, mTGase derived from Streptomyces, lacking any pro-domain, also showed high levels of mTGase aggregates [Fig. 4(A), second lane]. Three mutants (Tyr10, Leu16, and Val21) revealed strong formation of mTGase aggregates, whereas mutants Tyr14, Asp20, Ile24, and Asn25 showed only weak formation of mTGase aggregates. All other mutants and the control (non-mutated mTGase) did not form visible amounts of aggregates, which was also confirmed by western blot analysis [Fig. 4(A), Fig. 4(D)].

The level of enzyme activity of the mutants and controls was measured with a colorimetric assay at 37 °C [Fig. 4(B)]. Based on activity, the mTGase pro-domain mutants fell into two groups. The first group of mutants (Tyr10, Tyr14, Leu16, Asp20, Val21, Ile24, and Asn25) displayed full activity comparable to the native S. mobarensis mTGase. The second group of mutants (Asn23, Leu27, Asn28, and Glu29) showed approximately 50% activity and were similar in activity to the non-mutated processed mTGase [Fig. 4(B)].

Since all mTGase pro-domain mutants showed various levels of co-purified and associated pro-domain [Fig. 4(A)], it was surprising that some of the mTGase pro-domain mutants had nearly 100% activity at 37 °C but still showed the presence of the associated pro-domain (for example, Y14A, D20A, and V21A) [Fig. 4(A)]. To explain how some mutants can display full activity with Z-Gln-Gly substrate (MW 337 Da, Zedira, Germany) even in the presence of a presumably active-site-occluding pro-domain, we tested binding of a covalent inhibitor (MW 370 Da, Zedira, Germany) to mTGase variants at different temperatures [Fig. 4(C)]. Following incubation, the ratio of inhibitor-bound versus non-bound mTGase enzyme was calculated using mass spectroscopy. When the mTGase inhibitor binding assay was done at 37 °C (the same temperature as the activity assay) [Fig. 4(C), lighter bars], the ratio of inhibitor-bound to unbound mTGase enzyme for all mTGase pro-domain mutants and controls corresponded very well to the measured mTGase enzyme activity shown in [Fig. 4(B)]. The inhibitor binding experiment was also repeated at a lower temperature [Fig. 4(C), darker bars]. Temperature-independent binding was observed for pro-domain mutants Tyr10, Tyr14, Leu16, Asp20, and Val21 [Fig. 4(C)], almost all of which showed an enzyme activity identical to that of the wild type mTGase enzyme from S. mobarensis [Fig. 4(B)]. On the other hand, pro-domain mutants Asn23, Ile24, Asn25, Leu27, Asn28, and Glu29 showed reduced or no inhibitor binding when the assay was performed at 25 °C.
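One way the extent of inhibitor binding could be estimated from a deconvoluted mass spectrum is sketched below. This is an illustration under stated assumptions, not the procedure used in this study: the function name, the mass tolerance, and the example peak list are hypothetical, and the adduct mass shift is assumed to equal the full inhibitor mass of 370 Da (a covalent adduct may in practice differ if a leaving group is lost). The sketch reports the bound fraction, bound/(bound + unbound), rather than the bound-to-unbound ratio.

```python
def fraction_inhibitor_bound(peaks, enzyme_mass_da, adduct_shift_da, tol_da=5.0):
    """Fraction of enzyme carrying the inhibitor, from deconvoluted peaks.

    peaks: iterable of (mass_da, intensity) tuples.
    """
    unbound = 0.0
    bound = 0.0
    for mass, intensity in peaks:
        if abs(mass - enzyme_mass_da) <= tol_da:
            unbound += intensity      # unmodified enzyme domain
        elif abs(mass - (enzyme_mass_da + adduct_shift_da)) <= tol_da:
            bound += intensity        # enzyme domain plus inhibitor
    total = bound + unbound
    if total == 0:
        raise ValueError("no enzyme peaks found within tolerance")
    return bound / total


# Hypothetical peak list: enzyme domain, enzyme + inhibitor, free pro-domain.
example_peaks = [(39200.0, 4.0e5), (39570.0, 6.0e5), (5630.0, 2.0e5)]
print(fraction_inhibitor_bound(example_peaks, 39200.0, 370.0))
```

Peaks that match neither species, such as the dissociated pro-domain at about 5.63 kDa, are ignored in this sketch.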

In most cases, the yield of purified soluble mTGase protein per liter of E. coli cell culture was inversely correlated with the level of enzyme activity [Figs. 4(B) and 6]. Pro-domain mutants with a high level of mTGase enzyme activity (Tyr14, Leu16, Val21, Ile24, and Asn25) yielded significantly less protein than mutants with low enzyme activity (Asn23, Leu27, Asn28, and Glu29). One exception was the Asp20 mutant, which showed only about 20% reduced enzyme activity but yielded a relatively high expression level [Fig. 6].

Previous results in the literature showed that intact non-processed pro-TGase does not show any enzyme activity (Pasternack et al. (1998) European journal of biochemistry / FEBS 257:570-576). 3C-protease clipping of the connective loop between the mTGase pro-domain and enzyme-domain [Fig. 3(B)] loosened the interaction between the two domains enough to gain about 50% of wild type S. mobarensis enzyme activity [Fig. 4(B), lane 2]. Like the processed non-mutated control, which appears to be about 50% active, none of the mutants with a similar 50% activity (Asn23, Leu27, Asn28, and Glu29) showed formation of higher order mTGase aggregates [Fig. 4(A), Fig. 4(D)]. The appearance and amount of mTGase aggregates closely correlated with high enzyme activity [Fig. 4(A) and (B)] but were inversely correlated with purified enzyme yield [Fig. 6]. This correlation, which was especially evident with pro-domain mutants Tyr10, Leu16, and Val21 [Figs. 4(A), (B), and Fig. 6], likely reflects the impact of active mTGase enzyme on E. coli cell viability during protein induction.

Example 7. Pro-domain interactions

Tyr10 and Tyr14 are located in helix1 of the L-shaped pro-domain, while Leu16 is in the connecting loop between helix1 and helix2 [Fig. 3(A)]. All three residues protrude deep into the active site cleft of mTGase and occlude the catalytic triad (Cys110, Asp301, and His320) (Yang et al. (2011) The Journal of biological chemistry 286:7301-7307). The results of the inhibitor study shown in [Fig. 4(C)] suggest that single contact residue mutations within the short helix1, the connecting loop, or the N-terminal part of helix2 can change the strength of the pro-domain:enzyme-domain interaction enough to allow access of substrate (or inhibitor) to the active site and can restore full activity. Helix2 of the mTGase pro-domain mainly interacts with the protruding loops (Asn285-Asn299) of the mTGase enzyme domain via hydrogen bonds and van der Waals contacts (Yang et al. (2011) The Journal of biological chemistry 286:7301-7307).

Mutations in helix2 (residues Asn23, Asn28, and Glu29) did not show any binding of the covalent inhibitor, or showed only partial binding (residues Ile24, Asn25, and Leu27) when incubated at 25 °C. At 37 °C, however, the amount of bound inhibitor closely corresponded with the level of enzyme activity [Fig. 4(B) and (C)].

The temperature-dependent inhibitor binding pattern suggests that mutations within helix1, the linker, and the N-terminal part of helix2 are more critical for active site access. Without being bound by theory, it is hypothesized that helix1, which directly covers the active site Cys110, is mostly responsible for preventing premature activation of the pro-enzyme upon secretion of the molecule into the medium. The longer pro-domain helix2 seems to stabilize the interaction with the enzyme domain; individual molecular interactions along helix2 are weaker, though, compared with helix1 (Yang et al. (2011) The Journal of biological chemistry 286:7301-7307). Processing of the non-mutated pro-domain after helix2 by 3C-protease in our initial construct [Fig. 3(B)] yields an enzyme with approximately 50% activity as compared to wild type S. mobarensis mTGase [Fig. 4(B), second bar]. Consistent with this result, this construct also displayed about 50% inhibitor binding; this binding, however, unlike that of the less active mutants, was independent of temperature [Fig. 4(C), second bar].

Our pro-domain alanine mutational study, together with the results of the activity and inhibitor binding assays, suggests that more than one mTGase processing enzyme likely exists in S. mobarensis to facilitate complete dislodging of the pro-domain from the enzyme domain upon mTGase activation outside of the microbial cell. This appears to be the first study that sheds light on the residues at the interface that are important for the interaction between the mTGase pro-domain and enzyme domain.

Example 8. Generation of site specific antibody drug conjugates

To verify that the E. coli derived mTGase of the disclosure is suitable for pharmaceutical applications, the drug conjugation efficiencies of commercially available mTGase (S. m. TGase) and an mTGase produced and purified from E. coli as described herein (E. coli mTGase, pro-domain mutant Y14A) were compared.

Previously, it has been shown that mTGase does not recognize naturally occurring glutamines in the constant regions of glycosylated human IgG1 antibodies (See, e.g., Jeger et al. (2010). Angewandte Chemie 49:9995-9997; Strop et al. (2012). International Patent Application Publication No. WO2012059882), providing the opportunity to design a specific "glutamine tag" that can be engineered at desired locations in an antibody. Therefore, mTGase was used to conjugate the AcLys-vc-PABC-Aur0101 linker-payload [Fig. 7(C)] to an antibody (Ab1) engineered with a glutamine tag at the C-terminus of the heavy chain [Fig. 7(B)]. Hydrophobic interaction chromatography (HIC) revealed that both S. m. TGase and E. coli mTGase yielded a product with an average drug-antibody ratio (DAR) of 1.95 out of an expected maximum of 2.0 [Fig. 7(D)]. Intact liquid chromatography-mass spectrometry analysis confirmed the loading ratios for each conjugate [Fig. 7(E)].
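An average DAR of this kind is commonly computed as the area-weighted mean of the drug loads of the resolved HIC peaks. The sketch below illustrates that arithmetic only: the function name and the peak areas are hypothetical (chosen to reproduce an average of 1.95), and an equal detector response for all drug-load species is assumed.

```python
def average_dar(peak_areas):
    """Area-weighted average drug-antibody ratio.

    peak_areas: dict mapping drug load (0, 1, 2, ...) to HIC peak area.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    weighted = sum(load * area for load, area in peak_areas.items())
    return weighted / total


# Hypothetical relative peak areas (%) for the DAR 0, 1, and 2 species.
hypothetical_areas = {0: 0.5, 1: 4.0, 2: 95.5}
print(f"Average DAR: {average_dar(hypothetical_areas):.2f}")
```

With two engineered glutamine tags per antibody (one per heavy chain), the maximum value this function can return for loads 0 to 2 is 2.0.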

For selected pro-domain mutants and controls, the EC50 values of antibody drug conjugation efficiency were determined (Fig. 8). The data show that at high enzyme concentrations, all mutants were able to achieve high conjugation yields. As the amounts of mTGase were titrated down, mutants that did not show full S. mobarensis mTGase activity yielded lower average drug-antibody ratios.

Although the disclosed teachings have been described with reference to various applications, methods, kits, and compositions, it will be appreciated that various changes and modifications can be made without departing from the teachings herein and the claimed invention below. The foregoing examples are provided to better illustrate the disclosed teachings and are not intended to limit the scope of the teachings presented herein. While the present teachings have been described in terms of these exemplary embodiments, the skilled artisan will readily understand that numerous variations and modifications of these exemplary embodiments are possible without undue experimentation. All such variations and modifications are within the scope of the current teachings.

All references cited herein, including patents, patent applications, papers, text books, and the like, and the references cited therein, to the extent that they are not already, are hereby incorporated by reference in their entirety. In the event that one or more of the incorporated literature and similar materials differs from or contradicts this application, including but not limited to defined terms, term usage, described techniques, or the like, this application controls. The foregoing description and Examples detail certain specific embodiments of the invention and describe the best mode contemplated by the inventors. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the invention may be practiced in many ways and the invention should be construed in accordance with the appended claims and any equivalents thereof.