


Title:
HIGH AFFINITY DIGOXIGENIN BINDING PROTEINS
Document Type and Number:
WIPO Patent Application WO/2014/159947
Kind Code:
A1
Abstract:
Isolated polypeptides with steroid binding activity and methods for their use as therapeutics and detection agents are disclosed herein.

Inventors:
BAKER DAVID (US)
TINBERG CHRISTINE E (US)
KHARE SAGAR D (US)
DOU JIAYI (US)
Application Number:
PCT/US2014/025500
Publication Date:
October 02, 2014
Filing Date:
March 13, 2014
Assignee:
UNIV WASHINGTON CT COMMERCIALI (US)
International Classes:
C07K14/435
Foreign References:
DE3508307A1 1986-04-24
US20070020624A1 2007-01-25
Other References:
DATABASE UniProt [online] 3 April 2007 (2007-04-03), "SubName: Full=Putative uncharacterized protein;", XP002726608, retrieved from EBI accession no. UNIPROT:A3LDC9 Database accession no. A3LDC9
DATABASE UniProt [online] 28 November 2012 (2012-11-28), "SubName: Full=Isomerase; SubName: Full=Uncharacterized protein;", XP002726609, retrieved from EBI accession no. UNIPROT:K1DMJ3 Database accession no. K1DMJ3
DATABASE UniProt [online] 25 January 2012 (2012-01-25), "SubName: Full=Uncharacterized protein;", XP002726610, retrieved from EBI accession no. UNIPROT:G5FN39 Database accession no. G5FN39
CHRISTINE E. TINBERG ET AL: "Computational design of ligand-binding proteins with high affinity and selectivity", NATURE, vol. 501, no. 7466, 4 September 2013 (2013-09-04), pages 212 - 216, XP055125367, ISSN: 0028-0836, DOI: 10.1038/nature12443
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
"Gene Expression Technology", vol. 185, 1991, ACADEMIC PRESS
"Methods in Enzymology", 1990, ACADEMIC PRESS, INC., article "Guide to Protein Purification"
INNIS ET AL.: "PCR Protocols: A Guide to Methods and Applications", 1990, ACADEMIC PRESS
R.I. FRESHNEY: "Culture of Animal Cells: A Manual of Basic Technique", 1987, LISS, INC.
"Gene Transfer and Expression Protocols", THE HUMANA PRESS INC., pages: 109 - 128
SAMBROOK; FRITSCH; MANIATIS: "Molecular Cloning, A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
SCHREIER, B.; STUMPP, C.; WIESNER, S.; HÖCKER, B.: "Computational design of ligand binding is not a solved problem", PROC. NATL. ACAD. SCI. USA, vol. 106, 2009, pages 18491 - 18496
DE WOLF, F.A.; BRETT, G.M.: "Ligand-binding proteins: their potential for application in systems for controlled delivery and uptake of ligands", PHARMACOL. REV., vol. 52, 2000, pages 207 - 236, XP002241035
HUNTER, M.M.; MARGOLIES, M.N.; JU, A.; HABER, E.: "High-affinity monoclonal antibodies to the cardiac glycoside, digoxin", J. IMMUNOL., vol. 129, 1982, pages 1165 - 1172
SHEN, X.Y.; ORSON, F.M.; KOSTEN, T.R.: "Vaccines against drug abuse", CLIN. PHARMACOL. THER., vol. 91, 2012, pages 60 - 70
BRADBURY, A.R.M.; SIDHU, S.; DÜBEL, S.; MCCAFFERTY, J.: "Beyond natural antibodies: the power of in vitro display technologies", NAT. BIOTECHNOL., vol. 29, 2011, pages 245 - 254
BRUSTAD, E.M.; ARNOLD, F.H.: "Optimizing non-natural protein function with directed evolution", CURR. OPIN. CHEM. BIOL., vol. 15, 2011, pages 201 - 210, XP028187353, DOI: doi:10.1016/j.cbpa.2010.11.020
CHEN, G. ET AL.: "Isolation of high-affinity ligand-binding proteins by periplasmic expression with cytometric screening (PECS)", NAT. BIOTECHNOL., vol. 19, 2001, pages 537 - 542, XP001146674, DOI: doi:10.1038/89281
TELMER, P.G.; SHILTON, B.H.: "Structural studies of an engineered zinc biosensor reveal an unanticipated mode of zinc binding", J. MOL. BIOL., vol. 354, 2005, pages 829 - 840, XP005216832, DOI: doi:10.1016/j.jmb.2005.10.016
BAKER, D.: "An exciting but challenging road ahead for computational enzyme design", PROTEIN SCI., vol. 19, 2010, pages 1817 - 1819
JIANG, L. ET AL.: "De novo computational design of retro-aldol enzymes", SCIENCE, vol. 319, 2008, pages 1387 - 1391
KHARE, S.D.; FLEISHMAN, S.J.: "Emerging themes in the computational design of novel enzymes and protein-protein interfaces", FEBS LETT., 2013
KHERSONSKY, O. ET AL.: "Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59", PROC. NATL. ACAD. SCI. USA, vol. 109, 2012, pages 10358 - 10363
ROTHLISBERGER, D. ET AL.: "Kemp elimination catalysts by computational enzyme design", NATURE, vol. 453, 2008, pages 190 - 195
WANG, L. ET AL.: "Structural analyses of covalent enzyme-substrate analog complexes reveal strengths and limitations of de novo enzyme design", J. MOL. BIOL., vol. 415, 2012, pages 615 - 625, XP028440900, DOI: doi:10.1016/j.jmb.2011.10.043
BOEHR, D.D.; NUSSINOV, R.; WRIGHT, P.E.: "The role of dynamic conformational ensembles in biomolecular recognition", NAT. CHEM. BIOL., vol. 5, 2009, pages 789 - 796
FLEISHMAN, S.J.; KHARE, S.D.; KOGA, N.; BAKER, D.: "Restricted sidechain plasticity in the structures of native proteins and complexes", PROTEIN SCI., vol. 20, 2011, pages 753 - 757
LAWRENCE, M.C.; COLMAN, P.M.: "Shape complementarity at protein/protein interfaces", J. MOL. BIOL., vol. 234, 1993, pages 946 - 950, XP024008764, DOI: doi:10.1006/jmbi.1993.1648
"The Digitalis Investigation Group. The effect of digoxin on mortality and morbidity in patients with heart failure", N. ENGL. J. MED., vol. 336, 1997, pages 525 - 533
EISEL, D.; SETH, O.; GRUNEWALD-JANHO; KRUCHEN, B.: "Roche Diagnostics", 2008, article "DIG application manual for nonradioactive in situ hybridization"
FLANAGAN, R.J.; JONES, A.L.: "Fab antibody fragments: some applications in clinical toxicology", DRUG SAFETY, vol. 27, 2004, pages 1115 - 1133
CHAO, G. ET AL.: "Isolating and engineering human antibodies using yeast surface display", NAT. PROTOC., vol. 1, 2006, pages 755 - 768, XP002520702, DOI: doi:10.1038/NPROT.2006.94
FOWLER, D.M. ET AL.: "High-resolution mapping of protein sequence-function relationships", NAT. METHODS, vol. 7, 2010, pages 741 - 746, XP055013750, DOI: doi:10.1038/nmeth.1492
MCLAUGHLIN JR, R.N.; POELWIJK, F.J.; RAMAN, A.; GOSAL, W.S.; RANGANATHAN, R.: "The spatial architecture of protein function and adaptation", NATURE, vol. 491, 2012, pages 138 - 142
WHITEHEAD, T.A. ET AL.: "Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing", NAT. BIOTECHNOL., vol. 30, 2012, pages 543 - 548
FERSHT, A.R. ET AL.: "Hydrogen bonding and biological specificity analysed by protein engineering", NATURE, vol. 314, 1985, pages 235 - 238
FREDERICK, K.K.; MARLOW, M.S.; VALENTINE, K.G.; WAND, A.J.: "Conformational entropy in molecular recognition by proteins", NATURE, vol. 448, 2007, pages 325 - 329
FLEISHMAN, S.J.; BAKER, D.: "Role of the biomolecular energy gap in protein design, structure, and evolution", CELL, vol. 149, 2012, pages 262 - 273
ZANGHELLINI, A. ET AL.: "New algorithms and an in silico benchmark for computational enzyme design", PROTEIN SCI., vol. 15, 2006, pages 2785 - 2794
KUHLMAN, B.; BAKER, D.: "Native protein sequences are close to optimal for their structures", PROC. NATL. ACAD. SCI. USA, vol. 97, 2000, pages 10383 - 10388
ROSSI, A.M.; TAYLOR, C.W.: "Analysis of protein-ligand interactions by fluorescence polarization", NAT. PROTOC., vol. 6, 2011, pages 365 - 387, XP055233184, DOI: doi:10.1038/nprot.2011.305
FLEISHMAN, S.J. ET AL.: "RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite", PLOS ONE, vol. 6, 2011, pages E20161
JIANG, L. ET AL.: "De novo computational design of retro-aldol enzymes", SCIENCE, vol. 319, 2008, pages 1387 - 1391
SIEGEL, J.B. ET AL.: "Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction", SCIENCE, vol. 329, 2010, pages 309 - 313, XP055263022, DOI: doi:10.1126/science.1190239
RICHTER, F.; LEAVER-FAY, A.; KHARE, S.D.; BJELIC, S.; BAKER, D.: "De novo enzyme design using Rosetta3", PLOS ONE, vol. 6, 2011, pages E19230
KELLOGG, E.H.; LEAVER-FAY, A.; BAKER, D.: "Role of conformational sampling in computing mutation-induced changes in protein structure and stability", PROTEINS, vol. 79, 2010, pages 830 - 838
COOPER, S. ET AL.: "Predicting protein structures with a multiplayer online game", NATURE, vol. 466, 2010, pages 756 - 760
BENATUIL, L.; PEREZ, J.M.; BELK, J.; HSIEH, C.-M.: "An improved yeast transformation method for the generation of very large human antibody libraries", PROTEIN ENG. DES. SEL., vol. 23, 2010, pages 155 - 159, XP002637051, DOI: doi:10.1093/PROTEIN/GZQ002
WHITEHEAD, T.A. ET AL.: "Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing", NAT. BIOTECHNOL., vol. 30, 2012, pages 543 - 548
FOWLER, D.M.; ARAYA, C.L.; GERARD, W.; FIELDS, S.: "Enrich: software for analysis of protein function by enrichment and depletion of variants", BIOINFORMATICS, vol. 27, 2011, pages 3430 - 3431
Attorney, Agent or Firm:
HARPER, David, S. (Berghoff LLP, 300 South Wacker Drive, Suite 310, Chicago IL, US)
Claims:
We claim

1. An isolated polypeptide comprising an amino acid sequence according to SEQ ID NO: 1, wherein the amino acid sequence is at least 70% identical to the amino acid sequence of SEQ ID NO: 15, and wherein the amino acid sequence is not the amino acid sequence of SEQ ID NO: 24.

2. The isolated polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO: 2.

3. The isolated polypeptide of claim 1 or 2, comprising the amino acid sequence of SEQ ID NO: 3.

4. The isolated polypeptide of any one of claims 1-3, comprising the amino acid sequence of SEQ ID NO: 4.

5. The isolated polypeptide of any one of claims 1-4, comprising the amino acid sequence of SEQ ID NO: 5.

6. The isolated polypeptide of any one of claims 1-4, comprising the amino acid sequence of SEQ ID NO: 6.

7. The isolated polypeptide of any one of claims 1-6, comprising the amino acid sequence of SEQ ID NO: 7.

8. The isolated polypeptide of any one of claims 1-6, comprising the amino acid sequence of SEQ ID NO: 8.

9. The isolated polypeptide of any one of claims 1-6, comprising the amino acid sequence of SEQ ID NO: 9.

10. The isolated polypeptide of any one of claims 1-9, wherein each of residues 34, 101, and 115 is Y.

11. The isolated polypeptide of any one of claims 1-9, wherein 1, 2, or all 3 of residues 34, 101, and 115 are F.

12. The isolated polypeptide of any one of claims 1-9, wherein residue 84 is Y.

13. The isolated polypeptide of any one of claims 1-12, wherein at least one of the following is true:

Residue 7 is L;

Residue 41 is W;

Residue 58 is H;

Residue 61 is H;

Residue 64 is W;

Residue 90 is V;

Residue 97 is Y;

Residue 103 is T;

Residue 115 is L;

Residue 119 is W;

Residue 124 is I; and/or

Residue 128 is A.

14. The isolated polypeptide of claim 13, wherein 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the residues are as defined.

15. The isolated polypeptide of any one of claims 1-14, selected from the group consisting of SEQ ID NOS: 10-23.

16. The isolated polypeptide of any one of claims 1-15, wherein the isolated polypeptide is at least 80% identical to the amino acid sequence of SEQ ID NO: 15.

17. The isolated polypeptide of any one of claims 1-16, further comprising a detectable tag.

18. A pharmaceutical composition, comprising one or more polypeptides of any one of claims 1-17 and a pharmaceutically acceptable carrier.

19. An isolated nucleic acid encoding the polypeptide of any one of claims 1-17.

20. A recombinant expression vector comprising the isolated nucleic acid of claim 19 operably linked to a control sequence.

21. A recombinant host cell comprising the recombinant expression vector of claim 20.

22. A method for treating digoxin overdose and/or toxicity, comprising administering to a subject in need thereof an amount effective of the polypeptide of any one of claims 1-17 or the pharmaceutical composition of claim 18 to treat the digoxin toxicity.

23. The method of claim 22, wherein the polypeptide is the polypeptide of claim 10 or 15.

24. A method for detecting digoxin, comprising contacting a sample of interest with the polypeptide of any one of claims 1-17 under suitable conditions for binding the detectable polypeptide to digoxin present in the sample to form a polypeptide-digoxin binding complex, and detecting the polypeptide-digoxin binding complex.

Description:
HIGH AFFINITY DIGOXIGENIN BINDING PROTEINS

Related Applications

This Application claims priority to U.S. Provisional Patent Application Serial No. 61/784,618, filed March 14, 2013, incorporated by reference herein in its entirety.

Statement of Government Rights

This invention was made with government support under HDTRA1-11-1-0041 awarded by the Defense Threat Reduction Agency. The government has certain rights in the invention.

Background

The ability to design proteins with high affinity and selectivity for any given small molecule would have numerous applications in biosensing, diagnostics, and therapeutics, and is a rigorous test of our understanding of the physicochemical principles that govern molecular recognition phenomena. Attempts to design ligand-binding proteins have met with little success, however, and the computational design of precise molecular recognition between proteins and small molecules remains an "unsolved problem".

Summary of the Invention

The present invention provides polypeptides that are high-affinity polypeptide ligands of the steroid digoxigenin (DIG) or of the related steroids digitoxigenin, progesterone, and β-estradiol, as well as digoxin. The inventors have identified positions of the polypeptides of the invention that provide specificity of the polypeptides for DIG or for one or more of the related steroids. As such, the polypeptides of the invention can be used, for example, in steroid biosensors and diagnostics, as well as for therapeutic applications.

In one aspect, the invention provides isolated polypeptides comprising or consisting of an amino acid sequence according to SEQ ID NO: 1, wherein the amino acid sequence is at least 70% identical to the amino acid sequence of SEQ ID NO: 15, and wherein the amino acid sequence is not the amino acid sequence of SEQ ID NO: 24.
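Claim 1 and the aspect above define the polypeptides in part by percent identity to SEQ ID NO: 15, but this passage does not say how identity is calculated. The following minimal Python sketch assumes two gap-free sequences already aligned to the same length and simply counts identical positions; the toy sequences are hypothetical placeholders, not SEQ ID NO: 15.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues in two aligned sequences.

    Assumption: both sequences are already aligned, gap-free and of equal
    length; the patent text here does not specify an alignment method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(candidate: str, reference: str,
                             threshold: float = 70.0) -> bool:
    """True if `candidate` is at least `threshold` percent identical to `reference`."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 10-residue toy sequences (not SEQ ID NO: 15).
reference = "MKTAYIAKQR"
candidate = "MKTAFIAKQL"  # two substitutions
print(percent_identity(candidate, reference))          # 80.0
print(meets_identity_threshold(candidate, reference))  # True
```

The same check with `threshold=80.0` corresponds to the narrower identity limit of claim 16, again only under the pre-aligned, gap-free assumption.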

In various embodiments, the polypeptides comprise or consist of the amino acid sequence of one of SEQ ID NOS: 2-23. In another embodiment, each of residues 34, 101, and 115 is Y. In a further embodiment, 1, 2, or all 3 of residues 34, 101, and 115 are F. In a further embodiment, residue 84 is Y. In another embodiment, at least one of the following is true:

Residue 7 is L;

Residue 41 is W;

Residue 58 is H;

Residue 61 is H;

Residue 64 is W;

Residue 90 is V;

Residue 97 is Y;

Residue 103 is T;

Residue 115 is L;

Residue 119 is W;

Residue 124 is I; and/or

Residue 128 is A.
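The embodiment above is met when at least one of the twelve listed positions carries the specified residue, and claim 14 extends this to 2 through 12 positions. A minimal sketch of that check, assuming a one-letter sequence numbered from residue 1 in the same frame as the listed positions (the test sequence is a hypothetical poly-glycine placeholder):

```python
# Residue identities listed above, keyed by 1-based residue position.
DEFINED_RESIDUES = {
    7: "L", 41: "W", 58: "H", 61: "H", 64: "W", 90: "V",
    97: "Y", 103: "T", 115: "L", 119: "W", 124: "I", 128: "A",
}


def matching_positions(sequence: str, defined: dict = DEFINED_RESIDUES) -> list:
    """Return the listed positions whose residue in `sequence` matches.

    Assumption: `sequence` is numbered from residue 1 in the same frame as the
    listed positions; positions past the end of the sequence are skipped.
    """
    return sorted(
        pos for pos, aa in defined.items()
        if pos <= len(sequence) and sequence[pos - 1] == aa
    )


# Hypothetical 130-residue poly-Gly placeholder with two listed residues set.
residues = ["G"] * 130
residues[7 - 1] = "L"
residues[119 - 1] = "W"
hits = matching_positions("".join(residues))
print(hits)            # [7, 119]
print(len(hits) >= 1)  # True: at least one listed condition holds
```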

In another aspect, pharmaceutical compositions are provided, comprising one or more polypeptides of the invention and a pharmaceutically acceptable carrier. The invention also provides isolated nucleic acids encoding a polypeptide of the invention, recombinant expression vectors comprising the isolated nucleic acid of the invention operably linked to a control sequence, and recombinant host cells comprising the recombinant expression vectors of the invention.

In a further aspect, the invention provides methods for treating digoxin overdose and/or toxicity, comprising administering to a subject in need thereof an amount effective of one or more polypeptides or pharmaceutical compositions of the invention to treat the digoxin toxicity.

In another aspect, the invention provides methods for detecting digoxin, comprising contacting a sample of interest with one or more polypeptides of the invention under suitable conditions for binding the detectable polypeptide to digoxin present in the sample to form a polypeptide-digoxin binding complex, and detecting the polypeptide-digoxin binding complex.

Description of the Figures

Figure 1. Computational Design Methodology and Experimental Validation. a, Overview of the computational design procedure. First, the geometric positions of a set of pre-chosen interaction side chains are defined with respect to the ligand (left panel), and rotamers for each interaction side chain are enumerated (left panel, inset). Second, a set of scaffolds is searched for backbones that can accommodate all of the desired interactions. For cases in which all chosen interaction residues can be placed in the scaffold protein and orient the ligand in the native binding cavity with no backbone clashes, the binding site sequence is optimized for binding affinity (center panel). Designs having native-like properties, such as high shape complementarity and binding site pre-organization, are chosen for experimental characterization (right panel). b, Ranking of the 17 experimentally characterized DIG designs by ligand interaction energy (Rosetta energy units, REU) and by the average (geometric mean) Boltzmann weight of the conformations of the side chains that hydrogen bond to the ligand. DIG10, depicted in red, scores best by both metrics. c, Flow cytometric analysis of yeast cells expressing computationally designed proteins as part of a surface-targeted fusion protein with a C-terminal c-Myc tag. Yeast surface expression and DIG binding were confirmed by labeling the cells with a fluorescein (FITC)-conjugated anti-c-Myc antibody and with a pre-incubated mixture of 2.7 μM biotinylated DIG-functionalized BSA (~10 DIG/BSA) and phycoerythrin (PE)-conjugated streptavidin, respectively. Cell populations shown are a negative control for binding (ZZ(-)), an anti-DIG antibody serving as a positive control for binding (ZZ(+)), DIG10, DIG10 in the presence of excess (730 μM) unlabeled DIG competitor, and scaffold 1z1s. DIG10 labeled with 2.7 μM biotinylated DIG-functionalized RNase (~6 DIG/RNase) is also shown. d, On-yeast substitution of the DIG10 designed hydrogen-bonding residues Tyr34, Tyr101, and Tyr115 to Phe, and of binding cavity residue Val117 to Arg, reduces expressing-population compensated mean binding (PE) signals to background nonbinding (ZZ(-)) levels.
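Panel b ranks designs in part by the geometric mean Boltzmann weight of the side chains that hydrogen bond to the ligand. The document does not give the calculation, so the sketch below is only one plausible reading: each side chain's designed rotamer is weighted against its alternative rotamers with a Boltzmann factor, and the per-side-chain weights are combined by a geometric mean. The energies, the kT value of 1.0, and the choice of index 0 as the designed rotamer are all hypothetical.

```python
import math


def boltzmann_weight(energies, index, kT=1.0):
    """Boltzmann probability of conformation `index` among `energies`.

    Energies are shifted by their minimum for numerical stability; the shift
    cancels in the ratio.
    """
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / kT) for e in energies]
    return factors[index] / sum(factors)


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical rotamer energies (arbitrary units) for three side chains that
# hydrogen bond to the ligand; index 0 is assumed to be the designed rotamer.
side_chains = [
    [0.0, 2.0, 3.0],
    [0.0, 0.5],
    [1.0, 0.0, 4.0],
]
weights = [boltzmann_weight(energies, 0) for energies in side_chains]
score = geometric_mean(weights)  # under this reading, closer to 1 = more pre-organized
print(round(score, 3))
```

A geometric mean penalizes a single poorly pre-organized side chain more strongly than an arithmetic mean would, which suits a metric meant to flag any floppy hydrogen-bonding residue.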

Figure 2. Affinity Maturation. a, Binding fitness landscape of DIG10.1a probed by deep sequencing. The effect of each amino acid substitution at 39 binding site residues on binding ($\Delta E_i^x$) was assessed by determining the log2 ratios of the frequencies of substitutions to each amino acid at each position after selection for DIG binding to the frequencies of the substitutions in the unselected population. Colored grids represent single point mutations having >20 counts in the unselected N-terminal (fragment 1) and C-terminal (fragment 2) libraries. White grids show mutations for which there were not enough sequences in the unselected library to draw a definitive conclusion about function. The initial DIG10.1a amino acid at each position is indicated in bold using its one-letter amino acid code. b, The optimality of each DIG10.1a input residue type mapped onto the computational model of DIG10.1a. Optimality is defined as the positional Z-score, $Z_i = (x_i - \mu)/\sigma$, where $x_i$ is the sum of enrichment values at position $i$, $\mu$ is the mean sum of enrichment values for all interrogated positions within the fragment library, and $\sigma$ is the standard deviation of the sums of enrichment values for all interrogated positions within the fragment library. Blue is very optimal (mutations to all other amino acids are disfavored) and red is very suboptimal (mutations are preferred). c, Equilibrium fluorescence polarization measurements of DIG-PEG3-Alexa488 treated with increasing amounts of DIG10, DIGS, scaffold 1z1s, and negative control bovine serum albumin (left panel). Solid lines represent fits to the data to obtain dissociation constants (Kd values). Error bars represent standard deviations for at least three independent measurements. Kd values of relevant designs and affinity-matured DIG10 variants are given in the right panel. d, Mutations identified during affinity maturation to generate DIG10.1a, DIG10.2, and DIG10.3 mapped onto the computational model of DIG10.3.
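Panels a and b combine a per-variant log2 enrichment ratio with a positional Z-score. The sketch below follows those definitions; the pseudocount of 0.5, the use of the population standard deviation, and the toy counts are assumptions, not values from the document.

```python
import math
import statistics


def enrichment(sel_count, unsel_count, sel_total, unsel_total, pseudocount=0.0):
    """log2 ratio of a variant's frequency after selection to its frequency before."""
    selected_freq = (sel_count + pseudocount) / sel_total
    unselected_freq = (unsel_count + pseudocount) / unsel_total
    return math.log2(selected_freq / unselected_freq)


def positional_z_scores(counts, sel_total, unsel_total, min_unsel=20, pseudocount=0.5):
    """counts maps position -> {amino acid: (selected count, unselected count)}.

    Variants with no more than `min_unsel` unselected reads are skipped, as in
    the '>20 counts' filter above. The pseudocount (an assumption) keeps log2
    defined when a variant is absent after selection.
    """
    sums = {}
    for pos, variants in counts.items():
        sums[pos] = sum(
            enrichment(sel, unsel, sel_total, unsel_total, pseudocount)
            for sel, unsel in variants.values()
            if unsel > min_unsel
        )
    values = list(sums.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population SD (assumption)
    if sigma == 0:
        raise ValueError("all positions have the same summed enrichment")
    return {pos: (x - mu) / sigma for pos, x in sums.items()}


# Hypothetical counts: position 10 is depleted by selection, 11 enriched, 12 neutral.
counts = {
    10: {"A": (5, 50), "G": (4, 40)},
    11: {"A": (100, 50), "G": (80, 40)},
    12: {"A": (50, 50), "G": (40, 40), "W": (0, 3)},  # W is filtered out
}
z = positional_z_scores(counts, sel_total=1000, unsel_total=1000)
```

In this toy example position 10, where every substitution is depleted by selection, receives the lowest Z-score, which the caption's color scheme would render as most optimal.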

Figure 3. Crystal Structures of the DIG10.2-DIG and DIG10.3-DIG Complexes. a, Surface representation of the DIG10.2-DIG complex showing the high overall shape complementarity of the interface. DIG is depicted as spheres. DIG10.2 is a dimer and crystallized with four copies in the asymmetric unit. b, 2Fo − Fc omit map electron density of DIG interacting with Tyr34, Tyr101, and Tyr115, contoured at 1.0σ. c, Backbone superposition of the crystal structure of the DIG10.2-DIG complex with the computational model shows close agreement between the two. d, Binding site backbone superposition shows that the ligand and the three programmed Tyr hydroxyl groups are in their designed conformations. e, Configurational side chain entropy across the four crystallographic copies of the DIG10.2-DIG complex (left panel) and chains A, B, C, H, and I of the DIG10.3-DIG complex (right panel). The side chains of DIG10.3 at positions 103, 105, and 115 each adopt only a single rotamer. DIG10.2 Tyr115 conformation A adopts a more canonical hydrogen-bonding geometry than conformation B.
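Panel e reports a configurational side chain entropy across crystallographic copies without stating the formula. One common choice, assumed here, is the Shannon entropy of the rotamer labels observed at a position across copies, in units of k_B; the rotamer labels below are hypothetical.

```python
import math
from collections import Counter


def rotamer_entropy(rotamers):
    """Shannon entropy, in units of k_B, of rotamer labels seen across copies.

    Assumption: each crystallographic copy contributes one rotamer label and
    all copies are weighted equally.
    """
    if not rotamers:
        raise ValueError("need at least one observation")
    n = len(rotamers)
    return sum((c / n) * math.log(n / c) for c in Counter(rotamers).values())


# Hypothetical rotamer labels across four copies.
print(rotamer_entropy(["A", "A", "A", "A"]))  # 0.0: a single rotamer
print(rotamer_entropy(["A", "A", "B", "B"]))  # ln 2, about 0.693
```

Under this assumption, a position that adopts a single rotamer in every copy (as stated for DIG10.3 positions 103, 105, and 115) scores zero, and two equally populated conformations, such as conformations A and B of DIG10.2 Tyr115 if each appeared in half the copies, score ln 2.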

Figure 4. Steroid Binding Selectivity. a, The X-ray crystal structure of the DIG10.3-DIG complex (left panel) and the chemical structures of the steroids interrogated in equilibrium competition fluorescence polarization assays (right panel). b, Steroid selectivity profile of DIG10.3. Solid lines represent fits to the data to obtain half-maximal inhibitory concentrations (IC50 values), and error bars indicate standard deviations for at least three independent measurements. c, Steroid selectivity profile of DIG10.3 Tyr101Phe. Dashed lines show qualitative assessments of the inhibitory effects for cases in which the data could not be fit due to experimental limitations (see Supplementary Methods). d, Steroid selectivity profile of DIG10.3 Tyr34Phe. e, Steroid selectivity profile of DIG10.3 Tyr34Phe/Tyr99Phe/Tyr101Phe.
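The variant names in this and later captions use three-letter substitution notation, with multiple substitutions separated by slashes (for example Tyr34Phe/Tyr99Phe/Tyr101Phe). The following is a small sketch of a parser that applies such a specification to a one-letter sequence and checks each wild-type residue before substituting; the 120-residue test sequence is a hypothetical placeholder, not a DIG10.3 sequence.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
MUTATION = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def apply_mutations(sequence: str, spec: str) -> str:
    """Apply slash-separated substitutions such as 'Tyr34Phe/Tyr99Phe'.

    Positions are 1-based; the wild-type residue is checked before each change.
    """
    residues = list(sequence)
    for token in spec.split("/"):
        match = MUTATION.fullmatch(token.strip())
        if (match is None or match.group(1) not in THREE_TO_ONE
                or match.group(3) not in THREE_TO_ONE):
            raise ValueError(f"cannot parse mutation {token!r}")
        wt = THREE_TO_ONE[match.group(1)]
        pos = int(match.group(2))
        new = THREE_TO_ONE[match.group(3)]
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {match.group(1)} at {pos}, "
                             f"found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical placeholder: poly-Ala with Tyr at positions 34, 99, 101 and 115.
parent = ["A"] * 120
for position in (34, 99, 101, 115):
    parent[position - 1] = "Y"
variant = apply_mutations("".join(parent), "Tyr34Phe/Tyr99Phe/Tyr101Phe")
```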

Figure 5. Experimental characterization of computationally designed DIG binders by yeast surface display. a, Compensated mean binding (PE) signals of the expressing populations of yeast cells displaying computationally designed proteins on their cell surfaces. Binding was interrogated by labeling cells with a pre-incubated mixture of 2.7 μM biotinylated DIG-functionalized BSA (~10 DIG/BSA) and phycoerythrin (PE)-conjugated streptavidin. Cell populations shown are an anti-DIG antibody serving as a positive control for binding (ZZ(+)), an engineered DIG-binding lipocalin (DigA16(+)), two negative controls for binding (ZZ(-) and S2(-)), and designed proteins DIG1 through DIG17. DIG10 and DIG15 show strong binding signals. DIG8 shows a reproducible signal that is slightly above background levels (starred). b, Binding was interrogated by labeling cells with a pre-incubated mixture of 2.7 μM biotinylated DIG-functionalized RNase (~6 DIG/RNase) and phycoerythrin (PE)-conjugated streptavidin. DIG5, DIG8, and DIG10 show strong binding signals. DIG10 and DIG8 bind to both labels.
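The "compensated mean binding (PE) signal of the expressing population" combines two flow cytometry steps: correcting each event for spectral spillover between the FITC and PE channels, then averaging PE over events that pass an expression gate on FITC. The sketch below shows this for two colours under stated assumptions: the spillover coefficients and gate value are hypothetical, and in practice they are usually derived from single-stained controls.

```python
def compensate(fitc_obs, pe_obs, fitc_into_pe, pe_into_fitc):
    """Two-colour spillover compensation by inverting the 2x2 spillover matrix.

    Model: fitc_obs = fitc + pe_into_fitc * pe, pe_obs = fitc_into_pe * fitc + pe.
    """
    det = 1.0 - fitc_into_pe * pe_into_fitc
    fitc = (fitc_obs - pe_into_fitc * pe_obs) / det
    pe = (pe_obs - fitc_into_pe * fitc_obs) / det
    return fitc, pe


def expressing_mean_binding(events, fitc_into_pe, pe_into_fitc, expression_gate):
    """Mean compensated PE over events whose compensated FITC exceeds the gate."""
    binding = []
    for fitc_obs, pe_obs in events:
        fitc, pe = compensate(fitc_obs, pe_obs, fitc_into_pe, pe_into_fitc)
        if fitc > expression_gate:  # expressing (c-Myc positive) population
            binding.append(pe)
    if not binding:
        raise ValueError("no events pass the expression gate")
    return sum(binding) / len(binding)


# Hypothetical events as (observed FITC, observed PE); the spillover values and
# gate are illustrative, not instrument settings from the document.
events = [(1005.0, 700.0), (2007.0, 1100.0), (10.05, 7.0)]
print(expressing_mean_binding(events, fitc_into_pe=0.2, pe_into_fitc=0.01,
                              expression_gate=100.0))  # about 600
```

Gating on expression before averaging keeps non-displaying cells from diluting the binding signal, which is why the captions compare designs using the expressing population only.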

Figure 6. Experimental yeast surface display competition assays of DIG5, DIG8, and DIG15. a, Compensated mean binding (PE) signals of the expressing populations of yeast cells displaying DIG5 and DIG5 scaffold 1z1s. 1z1s (structural genomics target PA3332) is a protein of unknown function from Pseudomonas aeruginosa and has no functionally characterized homologs with >20% sequence identity. 1z1s belongs to the nuclear transport factor 2 (NTF2)-like structural superfamily, a functionally diverse fold class of which the steroid-metabolizing enzyme ketosteroid isomerase is also a member. Cells were labeled with 2.7 μM DIG-RNase-biotin and SAPE. In the presence of 790 μM unlabeled DIG, the signal is reduced to that of the negative control ZZ(-), revealing that binding is specific for DIG and not for other assay components. Scaffold 1z1s does not show a binding signal, suggesting that binding is mediated by the designed interface. b, Compensated mean binding (PE) signals of the expressing populations of yeast cells displaying DIG8 and DIG8 scaffold 3hk4. Cells were labeled with 2.7 μM DIG-RNase-biotin and SAPE. In the presence of 790 μM unlabeled DIG, the DIG8 signal is reduced by half, suggesting that binding is likely specific for DIG. The binding signal is probably not reduced to background levels because binding is weak and the unlabeled DIG is not present at a high enough concentration to overcome the avidity effects of the DIG-RNase-biotin label. This explanation is corroborated by the observation that 130 μM unlabeled RNase does not affect the DIG8 signal. Scaffold 3hk4, an NTF2-like superfamily member and structural genomics target MLR739 (PDB ID 3hk4) from Mesorhizobium loti, does bind to DIG-RNase-biotin, with a weaker (and more avidity-based) signal than DIG8. 3hk4 has not been functionally characterized. c, Compensated mean binding (PE) signals of the expressing populations of yeast cells displaying DIG15. Cells were labeled with 2.7 μM DIG-BSA-biotin and SAPE. In the presence of 1.6 mM unlabeled DIG, the binding signal is only slightly reduced. Similar effects are seen with ~1.6 mM DIG-linker conjugate (DIG-NHS ester reacted with glycine) and with BSA. However, the signal was completely abolished upon incubation with 18 μM DIG-BSA. The signal is slightly reduced in the presence of additional BSA (0.6% or 1.1% BSA). The DIG15 binding signal is not reduced to background levels in the presence of unlabeled DIG either because binding is weak and the amount of competitor in the assay is not enough to overcome the avidity effects of the DIG-RNase-biotin label, or because the design recognizes both DIG and BSA nonspecifically. Due to these complications, DIG15 was not characterized further.

Figure 7. Yeast surface display knockout mutagenesis studies of DIG5 and DIG8. a, Functional substitutions of DIG5 key modeled interacting residues lead to expressing-population compensated mean binding (PE) signals that are reduced relative to DIG5 (left panel). Cells were labeled with 1.5 μM DIG-RNase-biotin and SAPE. Mutation of binding site residue Trp119 to the larger Arg indicates that DIG binds in the intended pocket of the computational model (right panel). Mutation of hydrogen-bonding residue Tyr84 to Phe, alone and in combination with the Tyr115Phe substitution, leads to a binding signal that is reduced to background negative control (ZZ(-)) levels, confirming that this residue is necessary for binding. Mutation of His58 and Tyr97, which make hydrogen bonds in the computational model, to Ala and Phe, respectively, also leads to reduced binding signals. c, Functional substitutions of DIG8 key modeled interacting residues lead to expressing-population compensated mean binding (PE) signals that are reduced relative to DIG8 (left panel). Cells were labeled with 2.7 μM DIG-RNase-biotin and SAPE. Mutation of binding site residue Val86 to the larger Arg indicates that DIG binds in the intended pocket of the computational model (right panel). Simultaneous mutation of hydrogen-bonding residues Tyr10, His101, and Tyr103 to Phe, Ala, and Phe, respectively, leads to a binding signal that is reduced to background negative control (ZZ(-)) levels. Mutation of His101 to Ala also leads to a reduced binding signal. Despite the observation that the DIG8 scaffold PDB ID 3hk4 does bind to DIG-RNase (Fig. 8), these results indicate that the designed interface could contribute to binding of the design.

Figure 8. Affinity maturation of DIG10: Round 1. a, Strategy for round 1 affinity maturation of DIG10. A single site-saturation mutagenesis library was constructed by mutating each of 34 positions in the binding pocket (magenta sticks, left panel) to all other amino acids by Kunkel mutagenesis with degenerate NNK primers. After several rounds of selections with highly avid DIG-BSA-biotin and SAPE (step 1), eight positions were identified at which mutations lead to improved binding. Beneficial mutations, chemically similar residue types, and the DIG10 "wild type" amino acids at these positions (magenta sticks, middle panel) were combined combinatorially by Kunkel mutagenesis with degenerate primers. Following several rounds of increasingly stringent selections with DIG-BSA-biotin or DIG-RNase and SAPE (step 2), two variants, DIG10.1a and DIG10.1b, were identified, each having five mutations from DIG10 (sticks, right panel). b, Flow cytometric analysis of cells expressing DIG10, DIG10.1a, and DIG10.1b. DIG10 shows a strong binding signal when labeled with a pre-incubated mixture of 2.7 μM DIG-BSA-biotin and SAPE, but not when subjected to a more stringent multistep labeling procedure in which cells were first labeled with 5 pM DIG-RNase and then with SAPE. DIG10.1a and DIG10.1b show strong binding signals from the latter labeling procedure, however, demonstrating that these variants have improved binding affinities.
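Panel a uses degenerate NNK primers (N = any base, K = G or T). The sketch below, which uses the standard genetic code, shows why this scheme is common for site-saturation libraries: its 32 codons encode all 20 amino acids while admitting only one stop codon (TAG).

```python
from itertools import product

# Standard genetic code; codons indexed with bases ordered T, C, A, G at
# the first, second and third positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, first), (j, second), (k, third) in product(enumerate(BASES), repeat=3)
}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def degenerate_codons(pattern):
    """Expand a degenerate codon such as 'NNK' into its concrete codons."""
    return ["".join(bases) for bases in product(*(IUPAC[ch] for ch in pattern))]


nnk = degenerate_codons("NNK")
encoded = {CODON_TABLE[codon] for codon in nnk}
stops = [codon for codon in nnk if CODON_TABLE[codon] == "*"]
print(len(nnk), len(encoded - {"*"}), stops)  # 32 20 ['TAG']
```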

Figure 9. Fluorescence polarization affinity measurements of evolved DIG10 variants. a, Equilibrium fluorescence polarization measurements of DIG-PEG3-Alexa488 treated with increasing amounts of DIG10.1 (left panel), DIG10.2 (middle panel), and DIG10.3 (right panel). Solid lines represent fits to the data to obtain dissociation constants (Kd values). For DIG10.1, [DIG-PEG3-Alexa488] = 10 nM. For DIG10.2, [DIG-PEG3-Alexa488] = 1 nM. For DIG10.3, [DIG-PEG3-Alexa488] = 0.5 nM. b, Equilibrium fluorescence polarization measurements of DIG-PEG3-Alexa488 treated with increasing amounts of DIG10.3 variants Tyr34Phe (panel 1), Tyr101Phe (panel 2), Tyr115Phe (panel 3), Tyr99Phe/Tyr101Phe (panel 4), Tyr34Phe/Tyr99Phe/Tyr101Phe (panel 5), and Tyr34Phe/Tyr99Phe/Tyr101Phe/Tyr115Phe (panel 6). Solid lines represent fits to the data to obtain dissociation constants (Kd values). Error bars represent standard deviations for at least three independent measurements collected using at least two different batches of purified protein. For Tyr34Phe, Tyr101Phe, Tyr115Phe, and Tyr99Phe/Tyr101Phe, [DIG-PEG3-Alexa488] = 2 nM. For Tyr34Phe/Tyr99Phe/Tyr101Phe and Tyr34Phe/Tyr99Phe/Tyr101Phe/Tyr115Phe, [DIG-PEG3-Alexa488] = 10 nM.
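The captions state that the solid lines are fits yielding Kd values but do not give the binding model. Because the probe concentrations listed here (0.5 to 10 nM) can be comparable to Kd, a reasonable assumption is the exact 1:1 (quadratic) binding equation, which accounts for depletion of free protein by the probe. The sketch below further assumes that polarization is linear in the fraction of probe bound, and it substitutes a simple log-spaced grid search over Kd for a nonlinear least-squares fitter. All data are synthetic.

```python
import math


def fraction_bound(protein_total, probe_total, kd):
    """Fraction of probe bound for 1:1 binding, solved exactly (quadratic).

    Unlike the simple hyperbola, this accounts for depletion of free protein
    by the probe, which matters when probe_total is comparable to kd.
    """
    s = protein_total + probe_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * probe_total)) / 2.0
    return complex_conc / probe_total


def fit_kd(protein_conc, polarization, probe_total, kd_grid):
    """Grid search over Kd; free and bound polarization fit by linear least squares.

    Assumes polarization = p_free + (p_bound - p_free) * fraction_bound.
    Returns (kd, p_free, p_bound) for the grid point with the smallest SSE.
    """
    best = None
    n = len(protein_conc)
    mean_y = sum(polarization) / n
    for kd in kd_grid:
        f = [fraction_bound(p, probe_total, kd) for p in protein_conc]
        mean_f = sum(f) / n
        sxx = sum((fi - mean_f) ** 2 for fi in f)
        if sxx == 0:
            continue
        slope = sum((fi - mean_f) * (yi - mean_y)
                    for fi, yi in zip(f, polarization)) / sxx
        p_free = mean_y - slope * mean_f
        sse = sum((p_free + slope * fi - yi) ** 2 for fi, yi in zip(f, polarization))
        if best is None or sse < best[0]:
            best = (sse, kd, p_free, p_free + slope)
    if best is None:
        raise ValueError("no usable Kd values in grid")
    return best[1], best[2], best[3]


# Synthetic example (concentrations in nM): probe at 1 nM, true Kd about 2 nM.
grid = [10 ** (k / 20) for k in range(-40, 61)]  # 0.01 to 1000 nM
true_kd = grid[46]                                # 10 ** (6 / 20)
conc = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
mp = [50.0 + 200.0 * fraction_bound(p, 1.0, true_kd) for p in conc]
print(fit_kd(conc, mp, probe_total=1.0, kd_grid=grid))
```

A useful sanity check on the quadratic model is that exactly half of the probe is bound when the total protein concentration equals Kd plus half the probe concentration.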

Figure 10. DIG10.1a deep sequencing library construction and selections. a, DIG10.1a-based deep sequencing fragment libraries. Residues included in the fragment 1 (left) and fragment 2 (right) libraries are depicted (upper panels). Libraries were constructed by recursive PCR using a combination of mutagenized (colored) and wild-type (gray) oligos (lower panels). b, Flow cytometry plots of yeast surface display selections for deep sequencing experiments. Fragment libraries 1 and 2 were first labeled with anti-c-Myc-FITC, and the expressing populations were collected using fluorescent gates (black squares, left panel, step 1). Expressing cells were recovered and labeled with 100 nM monovalent DIG-PEG3-biotin and then SAPE, and library clones having higher PE binding signals than DIG10.1a were collected using fluorescent gates (black quadrilaterals, center panel, step 2). To reduce noise, this procedure was repeated once more using the same conditions.

Following these two rounds of binding selections, the selected cells showed higher binding signals than DIG10.1a (right panel). DNA from the expression-sorted naive libraries and from the selected libraries was subjected to deep sequencing.

Figure 11. DIG10.1a deep sequencing library statistics. a, A data matrix showing the number of counts for each single mutation in the unselected N-terminal (fragment 1) and C-terminal (fragment 2) deep sequencing libraries. The DIG10.1a amino acid at each position is indicated in bold using its one-letter amino acid code. b, A histogram of the deep sequencing data in Fig. 2a indicates that most mutations are deleterious for binding.

Figure 12. Affinity maturation of DIG10.1a: Round 2. a, The round 2 DIG10.1a affinity maturation library was constructed by pooling the products of three recursive PCR reactions using different combinations of mutagenized and wild-type (gray) oligos (upper panel). After several rounds of selection of the library with monovalent DIG-BSA-biotin and then SAPE, a single best variant, DIG10.2, having two mutations from DIG10.1a, was identified. b, Flow cytometry plots of yeast surface display selections for affinity maturation round 2. Yeast cells were subjected to five increasingly stringent rounds of selection with DIG-PEG3-biotin and then SAPE using fluorescent gates (black quadrilaterals).

Figure 13. Affinity maturation of DIG10.2: Round 3. a, Strategy for round 3 affinity maturation of DIG10.2. Mutations having ΔE > ~3.5 in the deep sequencing experiment (left panel) and the DIG10.2 amino acids at these positions (middle panel) were combined combinatorially by Kunkel mutagenesis with degenerate primers. Selections converged to a single variant, DIG10.3, having six mutations from DIG10.2 (right panel). b, Flow cytometry plots of yeast surface display selections for affinity maturation round 3. The library was subjected to four increasingly stringent rounds of selection with DIG-PEG3-biotin and then SAPE using fluorescent gates (black quadrilaterals). An off-rate selection was used in the last round.
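The combinatorial step in panel a can be made concrete: if each selected position may carry either the DIG10.2 residue or any of its beneficial mutations, the number of intended variants is the product of the per-position option counts. The sketch below enumerates such a library. The positions and substitutions in `SITES` are hypothetical placeholders, not the mutations actually combined in round 3, and real degenerate codons may encode additional, unintended residues that this count ignores.

```python
from itertools import product
from math import prod

# Hypothetical input: position -> (parent residue, beneficial substitutions).
# These are placeholders, NOT the round-3 data from the deep sequencing experiment.
SITES = {
    5: ("L", ["W", "F"]),
    20: ("S", ["A"]),
    50: ("V", ["I"]),
}

def library_size(sites):
    """Number of intended variants: (1 parent + k mutants) options per position."""
    return prod(1 + len(muts) for _, muts in sites.values())

def enumerate_variants(sites):
    """Yield each variant as a dict position -> residue (parent residue included)."""
    positions = sorted(sites)
    choices = [[sites[p][0]] + sites[p][1] for p in positions]
    for combo in product(*choices):
        yield dict(zip(positions, combo))

if __name__ == "__main__":
    print(library_size(SITES))
    for variant in enumerate_variants(SITES):
        print(variant)
```

Under this assumption, n positions with one beneficial mutation each give 2^n intended variants.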

Figure 14. Equilibrium competition fluorescence polarization assays of DIG10.3 with digoxin. Unlabeled digoxin or digoxigenin was added in increasing amounts to a solution of DIG10.3 and DIG-PEG3-Alexa488. Solid lines represent fits to the data to obtain the half-maximal inhibitory concentrations (IC50 values). Error bars represent standard deviations for at least three independent measurements collected using at least two different batches of purified protein. The affinity of DIG10.3 for digoxin is slightly higher than that for DIG.
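The caption reports IC50 values from fits but does not state the model. A common choice, assumed here, is a four-parameter logistic curve. If the top and bottom plateaus are fixed from controls and the Hill slope is fixed (1 below), each data point in the informative part of the curve can be inverted for IC50 directly. The function names and the 10-90% window are illustrative assumptions. An IC50 from a competition assay depends on the protein and probe concentrations, so it compares ligands only under identical conditions; under such conditions a lower IC50 for digoxin than for DIG corresponds to higher affinity.

```python
import math

def four_pl(x, bottom, top, ic50, hill=1.0):
    """Four-parameter logistic: signal falls from top to bottom as competitor x increases."""
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)

def estimate_ic50(concs, signals, bottom, top, hill=1.0, lo=0.1, hi=0.9):
    """Invert the 4PL point by point (fixed plateaus and slope); geometric mean of estimates."""
    log_estimates = []
    for x, y in zip(concs, signals):
        frac = (y - bottom) / (top - bottom)   # remaining fraction of the signal window
        if x > 0 and lo <= frac <= hi:
            ratio = 1.0 / frac - 1.0           # equals (x / ic50) ** hill
            log_estimates.append(math.log(x) - math.log(ratio) / hill)
    if not log_estimates:
        raise ValueError("no data points fall inside the informative window")
    return math.exp(sum(log_estimates) / len(log_estimates))

if __name__ == "__main__":
    # Synthetic competition curve with a hypothetical IC50 of 100 nM.
    concs = [1, 10, 30, 100, 300, 1000, 10000]
    data = [four_pl(c, 50.0, 250.0, 100.0) for c in concs]
    print(estimate_ic50(concs, data, 50.0, 250.0))
```

Points near the plateaus are excluded because small signal errors there translate into large errors in the inverted IC50.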

Figure 15. Crystal packing in the DIG10.2-DIG and DIG10.3-DIG complexes. a, The asymmetric unit of the DIG10.2 crystal structure contains four copies, each of which binds a molecule of DIG. b, The asymmetric unit of the DIG10.3 crystal structure contains nine copies, each of which binds a molecule of DIG.

Figure 16. Evidence that DIG proteins are dimeric. a, The dimeric unit of DIG10.2 observed in the crystal structure. The protomers are related by a pseudosymmetric or symmetric C2 axis. DIG is shown in magenta sticks. b, The dimer interface is formed by specific intermolecular salt bridges, packing interactions, and hydrogen bonds between the solvent-facing sides of the curved β-sheets. c, Preparative Superdex 75 gel filtration traces of DIG10, DIG10.1b, molecular weight standard horse heart cytochrome c (Mr = 29 kDa, red 1), and molecular weight standard bovine erythrocyte carbonic anhydrase (Mr = 12.4 kDa, red 2). Both DIG10 and DIG10.1b elute near their expected dimeric molecular weights (36 kDa). Both proteins are well behaved in solution, and the traces show no evidence of sample heterogeneity or higher-order aggregate species. d, Analytical Superdex 75 gel filtration traces of DIG10.3, pre-formed DIG10.3-DIG complex, molecular weight standard horse heart cytochrome c (Mr = 29 kDa), molecular weight standard bovine erythrocyte carbonic anhydrase (Mr = 12.4 kDa), and bovine aprotinin (6.5 kDa, red 3). DIG10.3 elutes near its expected dimeric molecular weight (36 kDa). The DIG10.3-DIG complex elutes at a slightly shorter retention volume. DIG10.3 and the DIG10.3-DIG complex are both well behaved in solution, and the traces show no evidence of sample heterogeneity or higher-order aggregate species.

Figure 17. Ligand binding site 2Fo-Fc maps of the DIG10.2-DIG complex. 2Fo-Fc omit map electron density of DIG interacting with Tyr34, Tyr41, Tyr101, and Tyr115 in chains A, B, C, and D contoured at 1.0 sigma.

Figure 18. Comparison of the side-chain rotamers in the DIG10.2-DIG crystal structure versus the computational model. a, A backbone superposition of the computational model (gray) and the X-ray crystal structure of DIG10.2 shows that the majority of the amino acid side chains in the binding cavity adopt their modeled conformations. b, A side-by-side comparison of the computational model (top panel) and the X-ray crystal structure (bottom panel) of DIG10.2 highlighting the conformations of the six incorrectly modeled amino acids. Tyr34 adopts a statistically less probable rotamer (chi2 = 153°) in the crystal structure than in the computational model (chi2 = 80°), which may result from an unanticipated hydrogen bond with His54: a subtle shift in the backbone position of this histidine allows it to face inward toward the binding cavity and interact with Tyr34 instead of being fully solvent-exposed as predicted by the model. Perhaps to relieve hydrophobic clashes with the crystallographic Tyr34 rotamer, Leu117 also adopts a different side-chain conformation. Finally, Tyr41, which engages in a second-shell hydrogen-bonding interaction with Tyr34 in the computational model, adopts a different chi1 rotamer and instead participates in a long (3.6 Å) hydrogen bond with the A-ring hydroxyl group of DIG. Ser103 and Leu105 also show different conformations in the structure and the computational model, but these residues are characterized by high conformational heterogeneity among the four protomers in the crystallographic asymmetric unit (see Fig. 3e).
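The chi angles quoted above are dihedral angles between side-chain atoms: for tyrosine, chi1 is defined by N-CA-CB-CG and chi2 by CA-CB-CG-CD1 (because the ring is symmetric, chi2 is often reported modulo 180°). The sketch below computes a signed dihedral angle from four Cartesian coordinates using the standard atan2 formulation. The coordinates in the usage lines are toy values, not atoms from the DIG10.2 model or crystal structure.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))   # unit vector along the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))            # b0 projected onto the plane normal to b1
    w = _sub(b2, _scale(b1, _dot(b2, b1)))            # b2 projected onto the same plane
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # eclipsed (cis)
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
```

For a Tyr side chain, `dihedral(ca, cb, cg, cd1)` with coordinates read from a model or a deposited structure gives chi2.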

Figure 19. Crystal structure of the DIG10.3-DIG complex. a, 2Fo-Fc omit map electron density of DIG and hydrogen-bonding residues Tyr34, Tyr41, Tyr101, and Tyr115 in chains A through I of the DIG10.3-DIG crystal structure contoured at 1.0 sigma. At this contour level and the low resolution (3.2 Å), density is not observed for all hydrogen-bonding residues in all crystallographic copies of the protein. b, Backbone binding site superposition of the crystal structures of DIG10.3 and DIG10.2. c, Backbone binding site superposition of the crystal structures of DIG10.3 and DIG10.2 chains A (left) and B (right). The DIG10.3 Tyr115 rotamer is similar to that observed in DIG10.2 chains A, C, and D but different from that observed in chain B. In DIG10.3 and in chains A, C, and D of DIG10.2, the hydroxyl group of Tyr115 is in plane with the lactone ring (~5° torsion), but in DIG10.2 chain B it is out of plane (-70° torsion) and is therefore expected to make a weaker interaction.

Figure 20. Side-chain conformational heterogeneity in the crystal structures of DIG10.2-DIG and DIG10.3-DIG. a, 2Fo-Fc omit map electron density of DIG, Tyr115, Leu105, and Ser103 in chains A through D of the DIG10.2-DIG crystal structure contoured at 1.0 sigma. Tyr115, Leu105, and Ser103 all explore more than one rotameric conformation. b, 2Fo-Fc omit map electron density of DIG, Tyr115, and Trp105 in chains A through I of the DIG10.3-DIG crystal structure contoured at 1.0 sigma. At this contour level and the low resolution (3.2 Å), density for Tyr115 is observed in only five of the nine copies; however, in the copies in which density is observed, this amino acid is clearly in the same conformation. The position of Trp105, which is the same in all nine crystallographic copies, is inconsistent with the alternative conformation of Tyr115 observed in chain B of DIG10.2-DIG. These data suggest that DIG10.3 is more ordered than DIG10.2.

Figure 21. CD spectra of DIG10-based proteins. a, CD wavelength scan of DIG10.1a-TEV-his6 at 25 °C (left panel) and 70 °C (right panel). Protein was prepared in PBS, pH 7.5. The protein exhibits the α/β character expected from the structure. At temperatures above 65 °C, the protein shows greater β-sheet character. b, Temperature melting curves of DIG10.1a-TEV-his6, DIG10.2-TEV-his6, DIG10.3-TEV-his6, and 1z1s-TEV-his6. All proteins are stable at temperatures below 60 °C. DIG10.3-TEV-his6 does not unfold, even at 95 °C.
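Thermal melting curves such as those in panel b are often interpreted with a two-state (folded/unfolded) model. The sketch below assumes two-state behavior, neglects the heat-capacity change on unfolding, and converts a melting temperature Tm and a van't Hoff enthalpy into the fraction unfolded at each temperature. The Tm and enthalpy values are hypothetical. Under this model, a curve that shows no transition up to 95 °C, as for DIG10.3-TEV-his6, is consistent with a Tm above the measured range, so the Tm cannot be extracted from that curve.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def fraction_unfolded(temp_c: float, tm_c: float, dh_kj_per_mol: float) -> float:
    """Two-state model with the heat-capacity change neglected: dG(T) = dH * (1 - T/Tm)."""
    t = temp_c + 273.15
    tm = tm_c + 273.15
    dg = dh_kj_per_mol * 1000.0 * (1.0 - t / tm)   # unfolding free energy, J/mol
    k_unfold = math.exp(-dg / (R * t))              # equilibrium constant [U]/[F]
    return k_unfold / (1.0 + k_unfold)

def cd_signal(temp_c, tm_c, dh_kj_per_mol, folded_signal, unfolded_signal):
    """Observed signal as a population-weighted average (flat baselines assumed)."""
    fu = fraction_unfolded(temp_c, tm_c, dh_kj_per_mol)
    return folded_signal + (unfolded_signal - folded_signal) * fu

if __name__ == "__main__":
    for temp in range(25, 100, 10):
        print(temp, round(fraction_unfolded(temp, tm_c=70.0, dh_kj_per_mol=300.0), 3))
```

Real melting data usually need sloping pre- and post-transition baselines; flat baselines keep the sketch short.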

Figure 22. Sequence alignment of DIG10-based designs. Binding site residues are highlighted. Residues in magenta represent designed amino acids that differ from the scaffold 1z1s. Other colored residues indicate residues that evolved during affinity maturation to yield DIG10.1, DIG10.2, and DIG10.3, respectively. The three hydrogen-bonding tyrosines are marked with a star.

Detailed Description of the Invention

All references cited are herein incorporated by reference in their entirety. Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991, Academic Press, San Diego, CA), "Guide to Protein Purification" in Methods in Enzymology (M.P. Deutscher, ed., (1990) Academic Press, Inc.); PCR Protocols: A Guide to Methods and Applications (Innis, et al., 1990, Academic Press, San Diego, CA), Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R.I. Freshney, 1987, Liss, Inc., New York, NY), Gene Transfer and Expression Protocols (pp. 109-128, ed. E.J. Murray, The Humana Press Inc., Clifton, N.J.), and the Ambion 1998 Catalog (Ambion, Austin, TX).

As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. "And" as used herein is interchangeably used with "or" unless expressly stated otherwise.

As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).

All embodiments of any aspect of the invention can be used in combination, unless the context clearly dictates otherwise.

Unless the context clearly requires otherwise, throughout the description and the claims, the words 'comprise', 'comprising', and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to". Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words "herein," "above," and "below" and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.

The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While the specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize.

In a first aspect, the invention provides isolated polypeptides comprising or consisting of an amino acid sequence according to SEQ ID NO: 1, wherein the amino acid sequence is at least 70% identical to the amino acid sequence of SEQ ID NO: 15 (DIG10.3), and wherein the amino acid sequence is not the amino acid sequence of SEQ ID NO: 24 (PDB ID 1z1s): MNAKEILTHSLRLLENGDARGWCDLFHPEGVLEFPYAPPGWKTRFEGRETIWAHMRLFPEHLTVRFTDVQFYETADPDLAIGEFHGDGVATVSGGKLAQDYISVLRTRDGQILLYRDFWNPLRHLEALGGVEAAAKIVQGA.

Table 1. SEQ ID NO: 1
Residue #   Alignment AA   Specificity: Alternative residues
1 M White/unlabeled: Hydrophobic or absent
2 N Green: Any amino acid
3 A White/unlabeled: Hydrophobic AAs
4 K Green: Any amino acid
5 E Green: Any amino acid
6 I White/unlabeled: Hydrophobic AAs
7 V White/unlabeled: Hydrophobic AAs
8 V Green: Any amino acid
9 H Green: Any amino acid
10 S, A Gray/aqua: Any amino acid
11 L White/unlabeled: Hydrophobic AAs
12 R Green: Any amino acid
13 L White/unlabeled: Any amino acid
14 L White/unlabeled: Hydrophobic AAs
15 E Green: Any amino acid
16 N Green: Any amino acid
17 G White/unlabeled: G, A, or S
18 D Green: Any amino acid
19 A White/unlabeled: Hydrophobic AAs
20 R Green: Any amino acid
21 G Green: Any amino acid
22 W White/unlabeled: Aromatic/polar neutral AAs
23 C, S Gray/aqua: Any amino acid
24 D Green: Any amino acid
25 L White/unlabeled: Hydrophobic AAs
26 F White/unlabeled: Hydrophobic AAs
27 H Green: Any amino acid
28 P White/unlabeled: Hydrophobic AAs
29 E Green: Any amino acid
30 G White/unlabeled: G, A, or S
31 V Green: Any amino acid
32 L White/unlabeled: Hydrophobic or polar neutral AAs
33 E Green: Any amino acid
34 Y Pink: Aromatic
35 P White/unlabeled: Hydrophobic AAs
36 Y Dark green: Polar neutral
37 A, P Gray/aqua: Any amino acid
38 P White/unlabeled: Hydrophobic
39 P White/unlabeled: Hydrophobic AAs
40 G White/unlabeled: Hydrophobic AAs or G
41 H, Y Gray/aqua: Any amino acid
42 K Green: Any amino acid
43 T White/unlabeled: Polar neutral AAs
44 R Green: Any amino acid
45 F White/unlabeled: Hydrophobic or polar neutral AAs
46 E Green: Any amino acid
47 G White/unlabeled: G, A, or S
48 R Green: Any amino acid
49 E Green: Any amino acid
50 T Green: Any amino acid
51 I White/unlabeled: Hydrophobic AAs
52 W Green: Any amino acid
53 A Green: Any amino acid
54 H White/unlabeled: Basic, hydrophobic, or polar neutral AAs
55 M White/unlabeled: Hydrophobic
56 R Green: Any amino acid
57 L Green: Any amino acid
58 F White/unlabeled: Hydrophobic, basic, or polar neutral AAs
59 P White/unlabeled: Hydrophobic AAs
60 E Green: Any amino acid
61 Y Green: Any amino acid
62 V, M Gray/aqua: Hydrophobic AAs
63 T Green: Any amino acid
64 V, I Gray/aqua: Any amino acid
65 R Green: Any amino acid
66 F White/unlabeled: Hydrophobic AAs
67 T Green: Any amino acid
68 D Green: Any amino acid
69 V White/unlabeled: Hydrophobic AAs
70 Q Green: Any amino acid
71 F White/unlabeled: Hydrophobic AAs
72 Y Dark green: Aromatic AAs
73 E Green: Any amino acid
74 T White/unlabeled: Polar neutral AAs
75 A Green: Any amino acid
76 D Green: Any amino acid
77 P White/unlabeled: Hydrophobic AAs
78 D Green: Any amino acid
79 L Green: Any amino acid
80 A White/unlabeled: Hydrophobic AAs
81 I Dark green: Aliphatic AAs
82 G White/unlabeled: G, A, or S
83 E Dark green: Acidic AAs
84 F White/unlabeled: Hydrophobic or charged AAs
85 H Green: Any amino acid
86 G White/unlabeled: Hydrophobic or polar neutral AAs
87 D Green: Any amino acid
88 G White/unlabeled: Hydrophobic or polar neutral
89 V Green: Any amino acid
90 H, L Gray/aqua: Any amino acid
91 T Green: Any amino acid
92 V, A Gray/aqua: Any amino acid

As described in the examples that follow, the inventors describe a general method for the computational design of small-molecule binding sites with pre-organized hydrogen-bonding and hydrophobic interfaces and high overall shape complementarity to the ligand, and use it to design the polypeptides of the present invention, which are high-affinity polypeptide ligands of the steroid digoxigenin (DIG) or of the related steroids digitoxigenin, progesterone, and β-estradiol, as well as of digoxin. The inventors have identified positions of the polypeptides of the invention that provide specificity of the polypeptides for DIG or for one or more of the related steroids. As such, the polypeptides of the invention can be used, for example, in steroid biosensors and diagnostics, as well as for therapeutic applications. For example, digoxigenin (DIG) is the aglycone of digoxin, a cardiac glycoside used to treat heart disease. Digoxin has a narrow therapeutic window, and thus the polypeptides of the invention can be used, for example, to treat digoxin overdoses. The polypeptides can also be used, for example, to detect DIG and/or one or more of the related steroids. The polypeptides of the invention provide a cheaper, more selective alternative to currently used digoxigenin-binding antibodies, which are costly to produce and are not selective for digoxigenin over other steroids. The polypeptides of the invention can also be used for in vivo biosensing applications, whereas the antibodies cannot, because of their structurally necessary disulfide bonds and because they are difficult to express robustly.

The polypeptides of the invention are non-naturally occurring polypeptides designed using the computational methods of the invention (described herein). The starting polypeptide was SEQ ID NO: 24 (PDB ID 1z1s), which is a hypothetical protein from Pseudomonas aeruginosa. Thus, the polypeptides of the invention do not comprise or consist of SEQ ID NO: 24. Of the specific polypeptides tested, the polypeptide of SEQ ID NO: 15 (DIG10.3) was the best binder. Thus, the polypeptides of the invention are at least 70% identical to the amino acid sequence of SEQ ID NO: 15 over its full length. In various embodiments, the polypeptides of the invention are at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO: 15 over its full length.
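The identity thresholds above can be checked mechanically once a query has been aligned to SEQ ID NO: 15. The sketch below assumes a gapless alignment of equal-length sequences and computes identity over the full length of the reference; sequences with insertions or deletions would first need a proper pairwise alignment, which this sketch does not perform. The sequences shown are toy strings, not SEQ ID NO: 15, whose sequence is not reproduced in this section.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent of reference positions with an identical residue (gapless, equal length)."""
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)

def meets_threshold(query: str, reference: str, threshold: float = 70.0) -> bool:
    """True if identity over the full reference length is at least the threshold (%)."""
    return percent_identity(query, reference) >= threshold

if __name__ == "__main__":
    ref = "ACDEFGHIKL"    # toy reference (hypothetical)
    query = "ACDEFGHIKV"  # one substitution out of ten positions
    print(percent_identity(query, ref), meets_threshold(query, ref))
```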

SEQ ID NO: 1 is presented in Table 1, which includes the following information:

(a) "Residue number": Position in the polypeptide amino acid sequence;

(b) "Alignment amino acid": Residues that are in exemplary polypeptides;

(c) "Specificity": Indication of toleration for amino acid substitution at the specific residue based on biochemical analysis; and

(d) "Alternative residues": Tolerated residues at the position based on deep mutational scanning analysis.


Deep mutational scanning of DIG10.1 (SEQ ID NO: 13) was carried out to reveal all point mutations that preserve or enhance function (ΔE > 0) and those that negatively affect function (ΔE < 0). These data are summarized in the "Specificity" column of Table 1, with the recited colors representing the following information:

• Gray/aqua: positions that differ between the DIG10 series constructs described and tested in the examples that follow;

• Pink: mutation of these residues switches the steroid specificity profile of the polypeptides; only conservative substitutions are permitted;

• Green: surface residues not at the critical dimer interface; these can be mutated without affecting function;

• Dark green: surface residues at the dimer interface; conservative substitutions permitted; and

• White/unlabeled: active site/core residues; conservative substitutions permitted.
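This section does not define how ΔE is computed. One common formulation in deep mutational scanning, assumed here purely for illustration, is the log2 ratio of a mutant's read frequency after versus before selection, normalized to the same ratio for the wild-type sequence, so that ΔE > 0 marks a mutation enriched relative to wild type and ΔE < 0 one that is depleted. Normalizing to wild type within each library cancels differences in sequencing depth. The pseudocount guarding against zero counts is an arbitrary, hypothetical choice.

```python
import math

def delta_e(sel_mut: int, unsel_mut: int, sel_wt: int, unsel_wt: int,
            pseudocount: float = 0.5) -> float:
    """log2 enrichment of a mutant relative to wild type (assumed definition)."""
    mut_ratio = (sel_mut + pseudocount) / (unsel_mut + pseudocount)
    wt_ratio = (sel_wt + pseudocount) / (unsel_wt + pseudocount)
    return math.log2(mut_ratio / wt_ratio)

def classify(score: float) -> str:
    """Map a score onto the categories used in the text."""
    if score > 0:
        return "preserves/enhances"
    if score < 0:
        return "deleterious"
    return "neutral"

if __name__ == "__main__":
    # Hypothetical counts: mutant 100 -> 400 reads, wild type 1000 -> 1000 reads.
    score = delta_e(400, 100, 1000, 1000, pseudocount=0.0)
    print(score, classify(score))
```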

Thus, some residues can be substituted with any amino acid, and the "alternative residues" for these positions are listed in the Tables herein as "any amino acid." Other positions can tolerate only conservative substitutions, and the "alternative residues" for these positions define one or more amino acid groupings, as noted in the Tables herein. These amino acid groupings are defined as follows:

• Polar neutral AAs: H, N, Q, Y, T, S, and C;

• Hydrophobic AAs: A, I, L, V, M, F, W, P, and G;

o Aliphatic AAs (subset of hydrophobic AAs): A, I, L, V, and M;

o Aromatic AAs (subset of hydrophobic AAs): Y, W, and F;

• Charged AAs: K, R, D, E, and H;

o Basic AAs (subset of charged AAs): K and R; and

o Acidic AAs (subset of charged AAs): D and E.

In one embodiment, the isolated polypeptides comprise or consist of the amino acid sequence of SEQ ID NO: 2 (Table 2). In this embodiment, the specificity-defining residues ("pink") are limited to those residues in the polypeptides that have been made and tested in the examples that follow. As is shown in the examples, modifications at these positions (residues 34, 101, and 115) between Y and F change the steroid specificity profile of the resulting polypeptide.

Table 2. SEQ ID NO: 2

45 F White/unlabeled: Hydrophobic or polar neutral AAs
46 E Green: Any amino acid
47 G White/unlabeled: G, A, or S
48 R Green: Any amino acid
49 E Green: Any amino acid
50 T Green: Any amino acid
51 I White/unlabeled: Hydrophobic AAs
52 W Green: Any amino acid
53 A Green: Any amino acid
54 H White/unlabeled: Basic, hydrophobic, or polar neutral AAs
55 M White/unlabeled: Hydrophobic
56 R Green: Any amino acid
57 L Green: Any amino acid
58 F White/unlabeled: Hydrophobic, basic, or polar neutral AAs
59 P White/unlabeled: Hydrophobic AAs
60 E Green: Any amino acid
61 Y Green: Any amino acid
62 V, M Gray/aqua: Hydrophobic AAs
63 T Green: Any amino acid
64 V, I Gray/aqua: Any amino acid
65 R Green: Any amino acid
66 F White/unlabeled: Hydrophobic AAs
67 T Green: Any amino acid
68 D Green: Any amino acid
69 V White/unlabeled: Hydrophobic AAs
70 Q Green: Any amino acid
71 F White/unlabeled: Hydrophobic AAs
72 Y Dark green: Aromatic AAs
73 E Green: Any amino acid
74 T White/unlabeled: Polar neutral AAs
75 A Green: Any amino acid
76 D Green: Any amino acid
77 P White/unlabeled: Hydrophobic AAs
78 D Green: Any amino acid
79 L Green: Any amino acid
80 A White/unlabeled: Hydrophobic AAs
81 I Dark green: Aliphatic AAs
82 G White/unlabeled: G, A, or S
83 E Dark green: Acidic AAs
84 F White/unlabeled: Hydrophobic or charged AAs
85 H Green: Any amino acid
86 G White/unlabeled: Hydrophobic or polar neutral AAs or G
87 D Green: Any amino acid
88 G White/unlabeled: Hydrophobic or polar neutral
89 V Green: Any amino acid
90 H, L Gray/aqua: Any amino acid
91 T Green: Any amino acid
92 V, A Gray/aqua: Any amino acid
93 S Green: Any amino acid
94 G Green: Any amino acid
95 G Green: Any amino acid
96 K Green: Any amino acid
97 L White/unlabeled: Hydrophobic or polar neutral AAs
98 A Green: Any amino acid
99 A, Y Gray/aqua: Any amino acid
100 D Green: Any amino acid
101 Y Pink: F or Y
102 I Dark green: Aliphatic AAs
103 S, A Gray/aqua: Any amino acid
104 V Dark green: Aliphatic AAs
105 L, W Gray/aqua: Any amino acid
106 R Green: Any amino acid
107 T White/unlabeled: Polar neutral AAs or hydrophobic residues
108 R Green: Any amino acid
109 D Green: Any amino acid
110 G White/unlabeled: G, A, or S
111 Q Green: Any amino acid
112 I White/unlabeled: Hydrophobic, polar neutral, or basic AAs
113 L Green: Any amino acid
114 L Green: Any amino acid
115 Y Pink: F or Y
116 R Dark green: Basic AAs
117 V, L Gray/aqua: Any amino acid
118 F Dark green: Aromatic AAs
119 F White/unlabeled: Aromatic, polar neutral, or basic
120 N Dark green: Polar neutral AAs
121 P White/unlabeled: Hydrophobic AAs
122 L Dark green: Aliphatic AAs
123 R Green: Any amino acid
124 V White/unlabeled: Hydrophobic, polar neutral, acidic
125 L White/unlabeled: Aliphatic AAs
126 E Green: Any amino acid
127 A, P Gray/aqua: Any amino acid
128 L Green: Any amino acid
129 G Green: Any amino acid
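The residue groupings defined above turn each row of Table 2 into a set of allowed amino acids, so a candidate variant can be screened position by position. The sketch encodes only a few rows transcribed from Table 2 (positions 97 to 116); positions not listed, including every "Green: Any amino acid" row, are treated as unconstrained. The function name and the candidate variants are hypothetical examples.

```python
# Amino acid groupings as defined in the text above.
GROUPS = {
    "polar_neutral": set("HNQYTSC"),
    "hydrophobic": set("AILVMFWPG"),
    "aliphatic": set("AILVM"),
    "aromatic": set("YWF"),
    "charged": set("KRDEH"),
    "basic": set("KR"),
    "acidic": set("DE"),
}

# A few rows transcribed from Table 2 (SEQ ID NO: 2); unlisted positions are unconstrained.
TABLE2_EXCERPT = {
    97: GROUPS["hydrophobic"] | GROUPS["polar_neutral"],
    101: {"F", "Y"},
    102: GROUPS["aliphatic"],
    104: GROUPS["aliphatic"],
    110: {"G", "A", "S"},
    112: GROUPS["hydrophobic"] | GROUPS["polar_neutral"] | GROUPS["basic"],
    115: {"F", "Y"},
    116: GROUPS["basic"],
}

def violations(residues, constraints):
    """Return (position, residue) pairs that fall outside the allowed set."""
    bad = []
    for pos, aa in sorted(residues.items()):
        allowed = constraints.get(pos)   # None means any amino acid is tolerated
        if allowed is not None and aa.upper() not in allowed:
            bad.append((pos, aa))
    return bad

if __name__ == "__main__":
    print(violations({101: "F", 102: "V", 115: "Y", 116: "R"}, TABLE2_EXCERPT))
    print(violations({101: "W", 116: "D"}, TABLE2_EXCERPT))
```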

In a further embodiment, the isolated polypeptides comprise or consist of the amino acid sequence of SEQ ID NO: 3 (Table 3). This embodiment differs from SEQ ID NO: 2 (Table 2) in that the surface residues at the dimer interface ("dark green") are limited to those residues in the polypeptides that have been made and tested in the examples that follow.

Table 3. SEQ ID NO: 3
42 K Green: Any amino acid
43 T White/unlabeled: Polar neutral AAs
44 R Green: Any amino acid
45 F White/unlabeled: Hydrophobic or polar neutral AAs
46 E Green: Any amino acid
47 G White/unlabeled: G, A, or S
48 R Green: Any amino acid
49 E Green: Any amino acid
50 T Green: Any amino acid
51 I White/unlabeled: Hydrophobic AAs
52 W Green: Any amino acid
53 A Green: Any amino acid
54 H White/unlabeled: Basic, hydrophobic, or polar neutral AAs
55 M White/unlabeled: Hydrophobic
56 R Green: Any amino acid
57 L Green: Any amino acid
58 F White/unlabeled: Hydrophobic, basic, or polar neutral AAs
59 P White/unlabeled: Hydrophobic AAs
60 E Green: Any amino acid
61 Y Green: Any amino acid
62 V, M Gray/aqua: Hydrophobic AAs
63 T Green: Any amino acid
64 V, I Gray/aqua: Any amino acid
65 R Green: Any amino acid
66 F White/unlabeled: Hydrophobic AAs
67 T Green: Any amino acid
68 D Green: Any amino acid
69 V White/unlabeled: Hydrophobic AAs
70 Q Green: Any amino acid
71 F White/unlabeled: Hydrophobic AAs
72 Y Dark green: Y
73 E Green: Any amino acid
74 T White/unlabeled: Polar neutral AAs
75 A Green: Any amino acid
76 D Green: Any amino acid
77 P White/unlabeled: Hydrophobic AAs
78 D Green: Any amino acid
79 L Green: Any amino acid
80 A White/unlabeled: Hydrophobic AAs
81 I Dark green: I
82 G White/unlabeled: G, A, or S
83 E Dark green: E
84 F White/unlabeled: Hydrophobic or charged AAs
85 H Green: Any amino acid
86 G White/unlabeled: Hydrophobic or polar neutral AAs or G
87 D Green: Any amino acid
88 G White/unlabeled: Hydrophobic or polar neutral
89 V Green: Any amino acid
90 H, L Gray/aqua: Any amino acid
91 T Green: Any amino acid
92 V, A Gray/aqua: Any amino acid
93 S Green: Any amino acid
94 G Green: Any amino acid
95 G Green: Any amino acid
96 K Green: Any amino acid
97 L White/unlabeled: Hydrophobic or polar neutral AAs
98 A Green: Any amino acid
99 A, Y Gray/aqua: Any amino acid
100 D Green: Any amino acid
101 Y Pink: F or Y
102 I Dark green: I
103 S, A Gray/aqua: Any amino acid
104 V Dark green: V
105 L, W Gray/aqua: Any amino acid
106 R Green: Any amino acid
107 T White/unlabeled: Polar neutral AAs or hydrophobic residues
108 R Green: Any amino acid
109 D Green: Any amino acid
110 G White/unlabeled: G, A, or S
111 Q Green: Any amino acid
112 I White/unlabeled: Hydrophobic, polar neutral, or basic AAs
113 L Green: Any amino acid
114 L Green: Any amino acid
115 Y Pink: F or Y
116 R Dark green: R
117 V, L Gray/aqua: Any amino acid
118 F Dark green: F
119 F White/unlabeled: Aromatic, polar neutral, or basic
120 N Dark green: N
121 P White/unlabeled: Hydrophobic AAs
122 L Dark green: L
123 R Green: Any amino acid
124 V White/unlabeled: Hydrophobic, polar neutral, acidic
125 L White/unlabeled: Aliphatic AAs
126 E Green: Any amino acid
127 A, P Gray/aqua: Any amino acid
128 L Green: Any amino acid
129 G Green: Any amino acid

In another embodiment, the isolated polypeptides comprise or consist of the amino acid sequence of SEQ ID NO: 4 (Table 4). In this embodiment, the polypeptides differ from the polypeptides of SEQ ID NO: 3 (Table 3) in that the active site/core residues ("white") are more narrowly defined.

Table 4. SEQ ID NO: 4
42 K Green: Any amino acid
43 T White/unlabeled: Polar neutral AAs
44 R Green: Any amino acid
45 F White/unlabeled: Hydrophobic or polar neutral AAs
46 E Green: Any amino acid
47 G White/unlabeled: G, A, or S
48 R Green: Any amino acid
49 E Green: Any amino acid
50 T Green: Any amino acid
51 I White/unlabeled: Aliphatic AAs
52 W Green: Any amino acid
53 A Green: Any amino acid
54 H White/unlabeled: Basic, hydrophobic, or polar neutral AAs
55 M White/unlabeled: Hydrophobic
56 R Green: Any amino acid
57 L Green: Any amino acid
58 F White/unlabeled: Aromatic AAs
59 P White/unlabeled: Hydrophobic AAs
60 E Green: Any amino acid
61 Y Green: Any amino acid
62 V, M Gray/aqua: Hydrophobic AAs
63 T Green: Any amino acid
64 V, I Gray/aqua: Any amino acid
65 R Green: Any amino acid
66 F White/unlabeled: Aromatic AAs
67 T Green: Any amino acid
68 D Green: Any amino acid
69 V White/unlabeled: Aliphatic AAs
70 Q Green: Any amino acid
71 F White/unlabeled: Aromatic AAs
72 Y Dark green: Y
73 E Green: Any amino acid
74 T White/unlabeled: Polar neutral AAs
75 A Green: Any amino acid
76 D Green: Any amino acid
77 P White/unlabeled: Hydrophobic AAs
78 D Green: Any amino acid
79 L Green: Any amino acid
80 A White/unlabeled: Aliphatic AAs
81 I Dark green: I
82 G White/unlabeled: G, A, or S
83 E Dark green: E
84 F White/unlabeled: Aromatic AAs
85 H Green: Any amino acid
86 G White/unlabeled: Hydrophobic or polar neutral AAs or G
87 D Green: Any amino acid
88 G White/unlabeled: Hydrophobic or polar neutral
89 V Green: Any amino acid
90 H, L Gray/aqua: Any amino acid
91 T Green: Any amino acid
92 V, A Gray/aqua: Any amino acid
93 S Green: Any amino acid
94 G Green: Any amino acid
95 G Green: Any amino acid
96 K Green: Any amino acid
97 L White/unlabeled: Hydrophobic or polar neutral AAs
98 A Green: Any amino acid
99 A, Y Gray/aqua: Any amino acid
100 D Green: Any amino acid
101 Y Pink: F or Y
102 I Dark green: I
103 S, A Gray/aqua: Any amino acid
104 V Dark green: V
105 L, W Gray/aqua: Any amino acid
106 R Green: Any amino acid
107 T White/unlabeled: Polar neutral AAs
108 R Green: Any amino acid
109 D Green: Any amino acid
110 G White/unlabeled: G, A, or S
111 Q Green: Any amino acid
112 I White/unlabeled: Aliphatic AAs
113 L Green: Any amino acid
114 L Green: Any amino acid
115 Y Pink: F or Y
116 R Dark green: R
117 V, L Gray/aqua: Any amino acid
118 F Dark green: F
119 F White/unlabeled: Aromatic, polar neutral, or basic
120 N Dark green: N
121 P White/unlabeled: Hydrophobic AAs
122 L Dark green: L
123 R Green: Any amino acid
124 V White/unlabeled: Hydrophobic, polar neutral, acidic
125 L White/unlabeled: Aliphatic AAs
126 E Green: Any amino acid
127 A, P Gray/aqua: Any amino acid
128 L Green: Any amino acid
129 G Green: Any amino acid

In another embodiment, the isolated polypeptides comprise or consist of the amino acid sequence of SEQ ID NO: 5 (Table 5). In this embodiment, the polypeptides differ from those of SEQ ID NO: 4 (Table 4) in that the active site/core residues ("white") are limited to specific amino acid residues identified in the deep mutational scanning analysis as preserving or enhancing function, and/or to residues present in the polypeptides made and tested.

Table 5. SEQ ID NO: 5

40 G White/unlabeled: G
41 H, Y Gray/aqua - Any amino acid
42 K Green: Any amino acid
43 T White/unlabeled: T
44 R Green: Any amino acid
45 F White/unlabeled: F, H, T, Y
46 E Green: Any amino acid
47 G White/unlabeled: G
48 R Green: Any amino acid
49 E Green: Any amino acid
50 T Green: Any amino acid
51 I White/unlabeled: I
52 W Green: Any amino acid
53 A Green: Any amino acid
54 H White/unlabeled: H, C, I, T
55 M White/unlabeled: M, F, I
56 R Green: Any amino acid
57 L Green: Any amino acid
58 F White/unlabeled: F, H, A, I, P, V, W
59 P White/unlabeled: P
60 E Green: Any amino acid
61 Y Green: Any amino acid
62 V, M Gray/aqua - Hydrophobic AAs
63 T Green: Any amino acid
64 V, I Gray/aqua - Any amino acid
65 R Green: Any amino acid
66 F White/unlabeled: F
67 T Green: Any amino acid
68 D Green: Any amino acid
69 V White/unlabeled: V
70 Q Green: Any amino acid
71 F White/unlabeled: F
72 Y Dark green: Y
73 E Green: Any amino acid
74 T White/unlabeled: T
75 A Green: Any amino acid
76 D Green: Any amino acid
77 P White/unlabeled: P
78 D Green: Any amino acid
79 L Green: Any amino acid
80 A White/unlabeled: A
81 I Dark green: I
82 G White/unlabeled: G
83 E Dark green: E
84 F White/unlabeled: F, A, D, W, Y
85 H Green: Any amino acid
86 G White/unlabeled: F, I, L, T, G
87 D Green: Any amino acid
88 G White/unlabeled: G, A, F, I, L, N
89 V Green: Any amino acid
90 H, L Gray/aqua - Any amino acid
91 T Green: Any amino acid
92 V, A Gray/aqua - Any amino acid
93 S Green: Any amino acid
94 G Green: Any amino acid
95 G Green: Any amino acid
96 K Green: Any amino acid
97 L White/unlabeled: I, F, L, M, S, T, W, Y
98 A Green: Any amino acid
99 A, Y Gray/aqua - Any amino acid
100 D Green: Any amino acid
101 Y Pink: F or Y
102 I Dark green: I
103 S, A Gray/aqua - Any amino acid
104 V Dark green: V
105 L, W Gray/aqua - Any amino acid
106 R Green: Any amino acid
107 T White/unlabeled: T
108 R Green: Any amino acid
109 D Green: Any amino acid
110 G White/unlabeled: G
111 Q Green: Any amino acid
112 I White/unlabeled: I
113 L Green: Any amino acid
114 L Green: Any amino acid
115 Y Pink: F or Y
116 R Dark green: R
117 V, L Gray/aqua - Any amino acid
118 F Dark green: F
119 F White/unlabeled: …, G, M, …, W
120 N Dark green: N
121 P White/unlabeled: P
122 L Dark green: L
123 R Green: Any amino acid
124 V White/unlabeled: V, E, F, L, N, P, R, S, W
125 L White/unlabeled: L
126 E Green: Any amino acid
127 A, P Gray/aqua - Any amino acid
128 L Green: Any amino acid
129 G Green: Any amino acid
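By way of illustration only, the restricted ("white" and "pink") rows of Table 5 can be treated as a map from residue position to allowed residues and checked against a candidate sequence. The Python sketch below encodes only a subset of those rows, uses 1-based positions as in the table, and takes DIG10.3 (SEQ ID NO: 15, optional tail omitted) as the example input; the names TABLE5_SUBSET and violations are hypothetical.

```python
# Illustrative subset of restricted rows from Table 5 (SEQ ID NO: 5).
# Keys are 1-based residue positions; values are the residues the table allows.
TABLE5_SUBSET = {
    45: set("FHTY"),
    58: set("FHAIPVW"),
    66: set("F"),
    84: set("FADWY"),
    97: set("IFLMSTWY"),
    101: set("FY"),  # pink: F or Y
    115: set("FY"),  # pink: F or Y
    124: set("VEFLNPRSW"),
}

# DIG10.3 (SEQ ID NO: 15) without the optional C-terminal tail, in blocks of 10.
DIG10_3 = (
    "MNAKEIVVHA" "LRLLENGDAR" "GWSDLFHPEG" "VLEYPYPPPG" "YKTRFEGRET"
    "IWAHMRLFPE" "YMTIRFTDVQ" "FYETADPDLA" "IGEFHGDGVL" "TASGGKLAYD"
    "YIAVWRTRDG" "QILLYRLFFN" "PLRVLEPLG"
)


def violations(seq: str, allowed: dict[int, set[str]]) -> list[str]:
    """List residues (e.g. 'L115') that fall outside the allowed set at their position."""
    bad = []
    for pos in sorted(allowed):
        residue = seq[pos - 1]  # table positions are 1-based
        if residue not in allowed[pos]:
            bad.append(f"{residue}{pos}")
    return bad


if __name__ == "__main__":
    print(violations(DIG10_3, TABLE5_SUBSET))
    hypothetical_mutant = DIG10_3[:114] + "L" + DIG10_3[115:]  # Y115L, for illustration
    print(violations(hypothetical_mutant, TABLE5_SUBSET))
```

Rows marked "Any amino acid" impose no restriction and are simply omitted from the map; with DIG10.3 the sketch reports no violations, while the hypothetical Y115L substitution is flagged.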

In another embodiment, the isolated polypeptides comprise or consist of the amino acid sequence of SEQ ID NO: 6 (Table 6). In this embodiment, the polypeptides differ from those of SEQ ID NO: 5 (Table 5) in that the active/core site residues ("white") are limited to substitutions present in the polypeptides made and tested in the examples that follow.

Table 6. SEQ ID NO: 6

45 F White/unlabeled: F
46 E Green: Any amino acid
47 G White/unlabeled: G
48 R Green: Any amino acid
49 E Green: Any amino acid
50 T Green: Any amino acid
51 I White/unlabeled: I
52 W Green: Any amino acid
53 A Green: Any amino acid
54 H White/unlabeled: H
55 M White/unlabeled: M
56 R Green: Any amino acid
57 L Green: Any amino acid
58 F White/unlabeled: F, H
59 P White/unlabeled: P
60 E Green: Any amino acid
61 Y Green: Any amino acid
62 V, M Gray/aqua - Hydrophobic AAs
63 T Green: Any amino acid
64 V, I Gray/aqua - Any amino acid
65 R Green: Any amino acid
66 F White/unlabeled: F
67 T Green: Any amino acid
68 D Green: Any amino acid
69 V White/unlabeled: V
70 Q Green: Any amino acid
71 F White/unlabeled: F
72 Y Dark green: Y
73 E Green: Any amino acid
74 T White/unlabeled: T
75 A Green: Any amino acid
76 D Green: Any amino acid
77 P White/unlabeled: P
78 D Green: Any amino acid
79 L Green: Any amino acid
80 A White/unlabeled: A
81 I Dark green: I
82 G White/unlabeled: G
83 E Dark green: E
84 F White/unlabeled: F, Y
85 H Green: Any amino acid
86 G White/unlabeled: G
87 D Green: Any amino acid
88 G White/unlabeled: G
89 V Green: Any amino acid
90 H, L Gray/aqua - Any amino acid
91 T Green: Any amino acid
92 V, A Gray/aqua - Any amino acid
129 G Green: Any amino acid

In another embodiment, the isolated polypeptides comprise or consist of the polypeptide of SEQ ID NO: 7, which differs from SEQ ID NO: 6 (Table 6) in being limited, at the surface residues ("green") or at highly variable regions in the peptides tested ("gray/aqua"), to the residues shown in Table 7.

Table 7. (SEQ ID NO: 7)

Residue #  Alignment AA  Specificity: Alternative residues

2 N Green: N, D, S, I, C
4 K Green: Charged AAs
5 E Green: Charged AAs
8 V Green: Hydrophobic or aliphatic AAs or Charged AAs
9 H Green: Charged or polar neutral AAs
10 S, A Gray/aqua - Aliphatic or polar neutral AAs
12 R Green: Charged AAs
15 E Green: Charged AAs
16 N Green: Polar neutral AAs
18 D Green: Charged AAs
20 R Green: Charged AAs
21 G Green: Polar neutral AAs
23 C, S Gray/aqua - Polar neutral AAs or A
24 D Green: Charged AAs
27 H Green: Polar neutral or Charged AAs
29 E Green: Charged AAs
31 V Green: Hydrophobic or aliphatic AAs or Charged AAs
33 E Green: Charged AAs
37 A, P Gray/aqua - Hydrophobic, polar neutral, or Charged AAs
41 H, Y Gray/aqua - Hydrophobic or basic AAs
42 K Green: Charged AAs
44 R Green: Charged AAs
46 E Green: Charged AAs
48 R Green: Charged AAs
49 E Green: Charged AAs
50 T Green: Polar neutral AAs
52 W Green: Aromatic AAs or Charged AAs
53 A Green: Aliphatic AAs or Charged AAs
56 R Green: Charged AAs
57 L Green: Aliphatic AAs or Charged AAs
60 E Green: Charged AAs
61 Y Green: Aromatic AAs
62 V, M Gray/aqua - Hydrophobic or Aliphatic AAs
63 T Green: Polar neutral AAs
64 V, I Gray/aqua - Hydrophobic AAs
65 R Green: Charged AAs
67 T Green: Polar neutral AAs
68 D Green: Charged AAs
70 Q Green: Polar neutral AAs
73 E Green: Charged AAs
75 A Green: Aliphatic AAs
76 D Green: Charged AAs
78 D Green: Charged AAs
79 L Green: Aliphatic AAs
85 H Green: Polar neutral AAs
87 D Green: Charged AAs
89 V Green: Aliphatic AAs or Charged AAs
90 H, L Gray/aqua - H, L, A, C, F, I, Q, R, S, T, V, Y
91 T Green: Polar neutral AAs or Charged AAs
92 V, A Gray/aqua - V, A, D, E, F, G, I, K, L, M, P, Q, R, S, T, W
93 S Green: Polar neutral or basic AAs
94 G Green: G, A, S
95 G Green: Polar neutral, acidic, or aromatic AAs
96 K Green: Charged AAs
98 A Green: Aliphatic AAs
99 A, Y Gray/aqua - Hydrophobic or polar neutral AAs
100 D Green: Charged AAs
103 S, A Gray/aqua - S, A, C, D, L, N, R, T, V, W
105 L, W Gray/aqua - L, W, A, F, I, K, M, S, T, V
106 R Green: Charged AAs
108 R Green: Charged AAs
109 D Green: Charged AAs
111 Q Green: Polar neutral AAs
113 L Green: Aliphatic or Charged AAs
114 L Green: Aliphatic or Charged AAs
117 V, L Gray/aqua - V, L, A, D, G, M, N, S, Y
123 R Green: Charged AAs
126 E Green: Charged AAs
127 A, P Gray/aqua - Aliphatic or polar neutral AAs
128 L Green: L, E, G, H, I, K, P, Q, R, T, V, or A
129 G Green: G, A, S

In another embodiment, the isolated polypeptides comprise or consist of the polypeptide of SEQ ID NO: 8, which differs from SEQ ID NO: 6 (Table 6) in being limited, at the surface residues ("green") or at highly variable regions in the peptides tested ("gray/aqua"), to the residues shown in Table 8. In one embodiment, no more than 4 of the residues of SEQ ID NO: 8 are cysteine. In various embodiments, no more than 1, 2, or 3 of the residues of SEQ ID NO: 8 are cysteine.

Table 8. SEQ ID NO: 8

127 A, P Gray/aqua - A, P, H, I, L, S, V
128 L Green: L, E, G, H, I, K, P, Q, R, T, V, C, or A
129 G Green: G or C

In another embodiment, the isolated polypeptides comprise or consist of the polypeptide of SEQ ID NO: 9, which differs from SEQ ID NO: 6 (Table 6) in being limited, at the surface residues ("green") or at highly variable regions in the peptides tested ("gray/aqua"), to the residues shown in Table 9. The residues shown in Table 9 are all present in polypeptides made and tested in the examples that follow.

In one further embodiment of any of the polypeptides of the invention, each of residues 34, 101, and 115 is Y. In this embodiment, the polypeptides of the invention show high specificity for DIG. In another embodiment of any of the polypeptides of the invention, 1, 2, or all 3 of residues 34, 101, and 115 are F. In these various embodiments, the steroid specificity of the polypeptides of the invention is shifted such that certain variants bind better to digoxigenin and others bind better to the related steroids digitoxigenin, progesterone, and β-estradiol, as described in more detail herein.

In another embodiment of any of the polypeptides of the invention, residue 84 is Y. In this embodiment, polypeptides of the invention are exemplified by DIG5.1 (SEQ ID NO: 11), which differs from the DIG10 series in its hydrogen-bonding pattern, specifically in the residue that contacts the lactone ring of DIG: in the DIG10 series this is Y115, but in DIG5.1 it is Y84. In a further embodiment of this embodiment, at least one of the following is true:

Residue 7 is L;
Residue 41 is W;
Residue 58 is H;
Residue 61 is H;
Residue 64 is W;
Residue 90 is V;
Residue 97 is Y;
Residue 103 is T;
Residue 115 is L;
Residue 119 is W;
Residue 124 is I; and/or
Residue 128 is A.

In various embodiments, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the residues are as defined.
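Whether a sequence falls within this embodiment, and how many of the twelve listed identities it satisfies, can be checked position by position. The following Python sketch is illustrative only; the names DIG51_CONDITIONS, conditions_met, and has_y84 are hypothetical, positions are 1-based, and the example input is DIG5 (SEQ ID NO: 10) without its optional tail, as transcribed in this application.

```python
# Residue identities listed for the DIG5.1-type embodiment (1-based positions).
DIG51_CONDITIONS = {
    7: "L", 41: "W", 58: "H", 61: "H", 64: "W", 90: "V",
    97: "Y", 103: "T", 115: "L", 119: "W", 124: "I", 128: "A",
}

# DIG5 (SEQ ID NO: 10) without the optional C-terminal tail, in blocks of 10.
DIG5 = (
    "MNAKEILVHS" "LRLLENGDAR" "GWCDLFHPEG" "VLEFPYAPPG" "WKTRFEGRET"
    "IWAHMRLHPE" "HVTVRFTDVQ" "FYETADPDLA" "IGEYHGDGVV" "TVSGGKYAAD"
    "FITVLRTRDG" "QILLYRVFWN" "PLRALEAAG"
)


def has_y84(seq: str) -> bool:
    """The base requirement of this embodiment: residue 84 is Y."""
    return seq[83] == "Y"


def conditions_met(seq: str, conditions: dict[int, str] = DIG51_CONDITIONS) -> list[int]:
    """Return the sorted positions whose residue matches the listed identity."""
    return sorted(pos for pos, aa in conditions.items() if seq[pos - 1] == aa)


if __name__ == "__main__":
    met = conditions_met(DIG5)
    print(has_y84(DIG5), len(met), "of", len(DIG51_CONDITIONS), met)
```

With DIG5 as transcribed here, residue 84 is Y and nine of the twelve identities hold; positions 64, 115, and 124 do not match.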

In various further embodiments, the isolated polypeptide of the invention comprises or consists of a polypeptide selected from the group consisting of: (Residues in parentheses are optional)

DIG5 (SEQ ID NO: 10)

MNAKEILVHSLRLLENGDARGWCDLFHPEGVLEFPYAPPGWKTRFEGRETIWAHMRLHPEHVTVRFTDVQFYETADPDLAIGEYHGDGVVTVSGGKYAADFITVLRTRDGQILLYRVFWNPLRALEAAG(GVEAAAKIVQGA);

DIG5.1 (SEQ ID NO: 11)

MNAKEILVHSLRLLENGDARGWCDLFHPEGVLEYPYAPPGWKTRFEGRETIWAHMRLHPEHVTWRFTDVQFYETADPDLAIGEYHGDGVVTVSGGKYAADYITVLRTRDGQILLLRVFWNPLRILEAAG(GVEAAAKIVQGA);

DIG10 (SEQ ID NO: 12)

MNAKEIVVHSLRLLENGDARGWCDLFHPEGVLEYPYAPPGHKTRFEGRETIWAHMRLFPEYVTVRFTDVQFYETADPDLAIGEFHGDGVHTVSGGKLAADYISVLRTRDGQILLYRVFFNPLRVLEALG(GVEAAAKIVQGA);

DIG10.1 (SEQ ID NO: 13)

MNAKEIVVHALRLLENGDARGWCDLFHPEGVLEYPYAPPGHKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVHTVSGGKLAADYISVLRTRDGQILLYRLFFNPLRVLEPLG(GVEAAAKIVQGA);

DIG10.2 (SEQ ID NO: 14)

MNAKEIVVHALRLLENGDARGWCDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVHTVSGGKLAADYISVLRTRDGQILLYRLFFNPLRVLEPLG(GVEAAAKIVQGA);

DIG10.3 (SEQ ID NO: 15)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAYDYIAVWRTRDGQILLYRLFFNPLRVLEPLG(GVEAAAKIVQGA);

MNAKEIVVHALRLLENGDARGWCDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVHTVSGGKLAADYISVLRTRDGQILLYRLFFNPLRVLEPLG;

DIG10.3t (SEQ ID NO: 17)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAYDYIAVWRTRDGQILLYRLFFNPLRVLEPLG;

DIG10.3 Y99F (SEQ ID NO: 18)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAFDYIAVWRTRDGQILLYRLFFNPLRVLEPLG(GVEAAAKIVQGA);

DIG10.3 Y101F (SEQ ID NO: 19)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAYDFIAVWRTRDGQILLYRLFFNPLRVLEPLG(GVEAAAKIVQGA);

DIG10.3 Y115F (SEQ ID NO: 20)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAYDYIAVWRTRDGQILLFRLFFNPLRVLEPLG(GVEAAAKIVQGA);

DIG10.3 Y99F/Y101F (SEQ ID NO: 21)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEYPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAFDFIAVWRTRDGQILLYRLFFNPLRVLEPLG(GVEAAAKIVQGA);

DIG10.3 Y34F/Y99F/Y101F (SEQ ID NO: 22)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEFPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAFDFIAVWRTRDGQILLYRLFFNPLRVLEPLG(GVEAAAKIVQGA); and

DIG10.3 Y34F/Y99F/Y101F/Y115F (SEQ ID NO: 23)

MNAKEIVVHALRLLENGDARGWSDLFHPEGVLEFPYPPPGYKTRFEGRETIWAHMRLFPEYMTIRFTDVQFYETADPDLAIGEFHGDGVLTASGGKLAFDFIAVWRTRDGQILLFRLFFNPLRVLEPLG(GVEAAAKIVQGA).
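The variant names above follow the usual point-mutation convention (wild-type residue, 1-based position, new residue, e.g., Y99F), and the parenthesized segments are optional. A minimal Python sketch, with hypothetical helper names core_sequence and point_mutations, that strips the optional segment from a listed entry and recovers the mutation names by comparing a variant with its parent:

```python
import re

# DIG10.3 (SEQ ID NO: 15) without the optional C-terminal tail, in blocks of 10.
DIG10_3 = (
    "MNAKEIVVHA" "LRLLENGDAR" "GWSDLFHPEG" "VLEYPYPPPG" "YKTRFEGRET"
    "IWAHMRLFPE" "YMTIRFTDVQ" "FYETADPDLA" "IGEFHGDGVL" "TASGGKLAYD"
    "YIAVWRTRDG" "QILLYRLFFN" "PLRVLEPLG"
)


def core_sequence(entry: str) -> str:
    """Remove whitespace, parenthesized optional residues, and a trailing ';' or '.'."""
    compact = "".join(entry.split())
    compact = re.sub(r"\([A-Z]*\)", "", compact)
    return compact.rstrip(";.")


def point_mutations(parent: str, variant: str) -> list[str]:
    """Name substitutions as wild-type residue + 1-based position + new residue."""
    if len(parent) != len(variant):
        raise ValueError("parent and variant must have the same length")
    return [f"{p}{i}{v}" for i, (p, v) in enumerate(zip(parent, variant), start=1) if p != v]


if __name__ == "__main__":
    listed_entry = DIG10_3 + "(GVEAAAKIVQGA);"  # entry as written, tail optional
    print(core_sequence(listed_entry) == DIG10_3)
```

Comparing the core of DIG10.3 with the core of DIG10.3 Y34F/Y99F/Y101F in this way should return exactly those three names, which is a quick consistency check on a transcribed listing.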

As used throughout the present application, the term "polypeptide" is used in its broadest sense to refer to a sequence of subunit amino acids. The polypeptides of the invention may comprise L-amino acids, D-amino acids (which are resistant to L-amino acid-specific proteases in vivo), or a combination of D- and L-amino acids. The polypeptides described herein may be chemically synthesized or recombinantly expressed. The polypeptides may be linked to other compounds to promote an increased half-life in vivo, such as by PEGylation, HESylation, PASylation, glycosylation, etc. Such linkage can be covalent or non-covalent, as is understood by those of skill in the art.

In a further embodiment, the polypeptides of any embodiment of any aspect of the invention may further comprise a tag, such as a detectable moiety. The tag(s) can be linked to the polypeptide through covalent or non-covalent bonding, including, but not limited to, disulfide bonding, hydrogen bonding, electrostatic bonding, nucleophilic (e.g., Cys or Lys) conjugation chemistry, recombinant fusion, and conformational bonding. Alternatively, the tag(s) can be linked to the polypeptide by means of one or more linking compounds. Techniques for conjugating tags to polypeptides are well known to the skilled artisan. Polypeptides comprising a detectable tag can be used diagnostically to, for example, identify the presence of digoxin or another steroid in a sample of interest. However, they may also be used for other detection, analytical, and/or diagnostic purposes. Any suitable detection tag can be used, including but not limited to enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, radioactive materials, positron-emitting metals, and nonradioactive paramagnetic metal ions. The tag used will depend on the specific detection/analysis/diagnosis techniques and/or methods used, such as immunohistochemical staining of (tissue) samples, flow cytometric detection, scanning laser cytometric detection, fluorescent immunoassays, enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), bioassays (e.g., neutralization assays), Western blotting applications, etc. For immunohistochemical staining of tissue samples, preferred tags are enzymes that catalyze production and local deposition of a detectable product. Enzymes typically conjugated to polypeptides to permit their immunohistochemical visualization are well known and include, but are not limited to, acetylcholinesterase, alkaline phosphatase, beta-galactosidase, glucose oxidase, horseradish peroxidase, and urease.
Typical substrates for production and deposition of visually detectable products are also well known to those skilled in the art. The polypeptides can be labeled using colloidal gold, or they can be labeled with radioisotopes, such as ³³P, ³²P, ³⁵S, ³H, and ¹²³I. Polypeptides of the invention can be attached to radionuclides directly or indirectly via a chelating agent by methods well known in the art.

When the polypeptides of the invention are used for flow cytometric detection, scanning laser cytometric detection, or fluorescent immunoassays, the tag may comprise, for example, a fluorophore. A wide variety of fluorophores useful for fluorescently labeling the polypeptides of the invention are known to the skilled artisan. When the polypeptides are used for in vivo diagnostics, the tag can comprise, for example, a magnetic resonance imaging (MRI) contrast agent such as gadolinium diethylenetriaminepentaacetic acid, an ultrasound contrast agent, or an X-ray contrast agent, or the polypeptides can be labeled radioisotopically.

The polypeptides of the invention can also be attached to solid supports, which are particularly useful for in vitro assays or purification of digoxin or other steroids. Such solid supports might be porous or nonporous, planar or nonplanar, and include, but are not limited to, glass, cellulose, polyacrylamide, nylon, polystyrene, polyvinyl chloride, or polypropylene supports. The polypeptides can also, for example, usefully be conjugated to filtration media, such as NHS-activated Sepharose or CNBr-activated Sepharose, for purposes of affinity chromatography. They can also usefully be attached to paramagnetic microspheres, typically by biotin-streptavidin interaction. As another example, the polypeptides of the invention can usefully be attached to the surface of a microtiter plate for ELISA.

In another aspect, the present invention provides pharmaceutical compositions comprising one or more polypeptides of the invention and a pharmaceutically acceptable carrier. In this aspect, the polypeptides of the invention may be used, for example, to treat digoxin overdoses. In addition to the polypeptide of the invention, the pharmaceutical composition may comprise (a) a lyoprotectant; (b) a surfactant; (c) a bulking agent; (d) a tonicity adjusting agent; (e) a stabilizer; (f) a preservative; and/or (g) a buffer.

In some embodiments, the buffer in the pharmaceutical composition is a Tris buffer, a histidine buffer, a phosphate buffer, a citrate buffer, or an acetate buffer. The pharmaceutical composition may also include a lyoprotectant, e.g., sucrose, sorbitol, or trehalose. In certain embodiments, the pharmaceutical composition includes a preservative, e.g., benzalkonium chloride, benzethonium, chlorhexidine, phenol, m-cresol, benzyl alcohol, methylparaben, propylparaben, chlorobutanol, o-cresol, p-cresol, chlorocresol, phenylmercuric nitrate, thimerosal, benzoic acid, or various mixtures thereof. In other embodiments, the pharmaceutical composition includes a bulking agent, such as glycine. In yet other embodiments, the pharmaceutical composition includes a surfactant, e.g., polysorbate-20, polysorbate-40, polysorbate-60, polysorbate-65, polysorbate-80, polysorbate-85, poloxamer-188, sorbitan monolaurate, sorbitan monopalmitate, sorbitan monostearate, sorbitan monooleate, sorbitan trilaurate, sorbitan tristearate, sorbitan trioleate, or a combination thereof. The pharmaceutical composition may also include a tonicity adjusting agent, e.g., a compound that renders the formulation substantially isotonic or isoosmotic with human blood.

Exemplary tonicity adjusting agents include sucrose, sorbitol, glycine, methionine, mannitol, dextrose, inositol, sodium chloride, arginine, and arginine hydrochloride. In other embodiments, the pharmaceutical composition additionally includes a stabilizer, e.g., a molecule that, when combined with a protein of interest, substantially prevents or reduces chemical and/or physical instability of the protein of interest in lyophilized or liquid form. Exemplary stabilizers include sucrose, sorbitol, glycine, inositol, sodium chloride, methionine, arginine, and arginine hydrochloride.

In a further aspect, the present invention provides isolated nucleic acids encoding a polypeptide of the present invention. The isolated nucleic acid sequence may comprise RNA or DNA. As used herein, "isolated nucleic acids" are those that have been removed from their normal surrounding nucleic acid sequences in the genome or in cDNA sequences. Such isolated nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded protein, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the invention.

In another aspect, the present invention provides recombinant expression vectors comprising the isolated nucleic acid of any aspect of the invention operatively linked to a suitable control sequence. "Recombinant expression vector" includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. "Control sequences" operably linked to the nucleic acid sequences of the invention are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences, and the promoter sequence can still be considered "operably linked" to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type known in the art, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to CMV, SV40, RSV, actin, and EF promoters) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline-, ecdysone-, and steroid-responsive promoters). The construction of expression vectors for use in transfecting prokaryotic cells is also well known in the art, and thus can be accomplished via standard techniques. (See, for example, Sambrook, Fritsch, and Maniatis, in: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989; Gene Transfer and Expression Protocols, pp. 109-128, ed. E.J. Murray, The Humana Press Inc., Clifton, N.J.; and the Ambion 1998 Catalog (Ambion, Austin, TX).) The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In a preferred embodiment, the expression vector comprises a plasmid. However, the invention is intended to include other expression vectors that serve equivalent functions, such as viral vectors.

In a still further aspect, the present invention provides host cells that have been transfected with the recombinant expression vectors disclosed herein, wherein the host cells can be either prokaryotic (such as bacteria) or eukaryotic. The cells can be transiently or stably transfected. Such transfection of expression vectors into prokaryotic and eukaryotic cells can be accomplished via any technique known in the art, including but not limited to standard bacterial transformation, calcium phosphate co-precipitation, electroporation, or liposome-mediated, DEAE dextran-mediated, polycation-mediated, or viral-mediated transfection. (See, for example, Molecular Cloning: A Laboratory Manual (Sambrook et al., 1989, Cold Spring Harbor Laboratory Press); Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R.I. Freshney, 1987, Liss, Inc., New York, NY).) A method of producing a polypeptide according to the invention is an additional part of the invention. The method comprises the steps of (a) culturing a host cell according to this aspect of the invention under conditions conducive to the expression of the polypeptide, and (b) optionally, recovering the expressed polypeptide.

In another aspect, the invention provides methods for treating digoxin overdose and/or toxicity, comprising administering to a subject in need thereof an amount effective of one or more polypeptides or pharmaceutical compositions of the invention to treat the digoxin toxicity⁷. Digitalis and its constituents, digoxin and digitoxin, are the primary cardiotonic steroids used to treat cardiac arrhythmias, cardiac insufficiency, and congestive heart failure. Digoxin and digitoxin have narrow therapeutic ranges (1.0-1.9 nmol/L, or approximately 0.8-1.5 ng/mL, serum digoxin concentration), and overdose is not uncommon. In the methods of the invention, digoxin overdose and/or life-threatening digoxin toxicity are treated by administering one or more polypeptides of the invention, which counteract the effects of digoxin or digitalis by binding to digoxin, thereby preventing it from inhibiting or regulating the expression or function of Na⁺/K⁺-ATPase. In a preferred embodiment, each of residues 34, 101, and 115 of the polypeptide is Y. In this embodiment, the polypeptides of the invention show high specificity for DIG.
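As a check on the units, the two therapeutic ranges quoted above are related through the molar mass of digoxin (about 780.9 g/mol for C41H64O14). A minimal Python sketch of the conversion, in which the function name is hypothetical:

```python
DIGOXIN_MOLAR_MASS_G_PER_MOL = 780.9  # approximate, C41H64O14


def nmol_per_l_to_ng_per_ml(conc_nmol_per_l: float,
                            molar_mass: float = DIGOXIN_MOLAR_MASS_G_PER_MOL) -> float:
    """Convert a molar serum concentration to a mass concentration."""
    # nmol/L x g/mol = ng/L; divide by 1000 because 1 L = 1000 mL.
    return conc_nmol_per_l * molar_mass / 1000.0


for conc in (1.0, 1.9):
    print(f"{conc} nmol/L = {nmol_per_l_to_ng_per_ml(conc):.2f} ng/mL")
```

The endpoints come out at roughly 0.78 and 1.48 ng/mL, which round to the 0.8-1.5 ng/mL range stated in the text.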

The subject may be any subject suffering from or at risk of suffering from digoxin overdose and/or toxicity, including but not limited to subjects being treated with digitalis or digoxin for cardiac arrhythmias, cardiac insufficiency, or congestive heart failure. The subject may be a mammal, such as a human. As used herein, "treating" means to provide any clinical benefit in reducing digoxin toxicity or the effects of digoxin overdose.

As used herein, an "amount effective" refers to an amount of the polypeptide that is effective for treating the digoxin overdose and/or toxicity. The pharmaceutical compositions, such as those disclosed above, can be administered via any suitable route, including orally, parenterally, by inhalation spray, rectally, or topically in dosage unit formulations containing conventional pharmaceutically acceptable carriers, adjuvants, and vehicles. The term parenteral as used herein includes subcutaneous, intravenous, intra-arterial, intramuscular, intrasternal, intratendinous, intraspinal, intracranial, intrathoracic, and intraperitoneal injection or infusion techniques. Dosage regimens can be adjusted to provide the optimum desired response (e.g., a therapeutic or prophylactic response). A suitable dosage range may, for instance, be 0.1 µg/kg to 100 mg/kg body weight; alternatively, it may be 0.5 µg/kg to 50 mg/kg, 1 µg/kg to 25 mg/kg, or 5 µg/kg to 10 mg/kg body weight. The polypeptides can be delivered in a single bolus or may be administered more than once (e.g., 2, 3, 4, 5, or more times) as determined by an attending physician.
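The per-kilogram ranges above scale linearly with body weight. The following Python sketch illustrates only that unit arithmetic (1 µg/kg = 0.001 mg/kg); the helper name and the 70 kg body weight in the usage lines are hypothetical examples, not values from this application.

```python
# Dosage ranges from the text, converted to mg/kg (1 ug/kg = 0.001 mg/kg).
DOSE_RANGES_MG_PER_KG = [
    (0.0001, 100.0),  # 0.1 ug/kg to 100 mg/kg
    (0.0005, 50.0),   # 0.5 ug/kg to 50 mg/kg
    (0.001, 25.0),    # 1 ug/kg to 25 mg/kg
    (0.005, 10.0),    # 5 ug/kg to 10 mg/kg
]


def total_dose_range_mg(range_mg_per_kg: tuple[float, float],
                        body_weight_kg: float) -> tuple[float, float]:
    """Scale a per-kilogram range to a total amount in mg for one administration."""
    low, high = range_mg_per_kg
    return low * body_weight_kg, high * body_weight_kg


for rng in DOSE_RANGES_MG_PER_KG:
    low_mg, high_mg = total_dose_range_mg(rng, 70.0)  # hypothetical 70 kg subject
    print(f"{low_mg:g} mg to {high_mg:g} mg")
```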

In another aspect, the invention provides methods for detecting digoxin, comprising contacting a sample of interest with a detectable polypeptide of the invention under suitable conditions for binding the detectable polypeptide to digoxin present in the sample to form a polypeptide-digoxin binding complex, and detecting the polypeptide-digoxin binding complex. In one embodiment, the sample is a biological sample, including but not limited to blood, serum, nasal secretions, tissue, or other biological material from a subject to be tested. The polypeptides of the invention for use in this aspect may comprise a conjugate as disclosed above, to provide a tag useful for any detection technique suitable for a given assay. The tag used will depend on the specific detection/analysis/diagnosis techniques and/or methods used. The methods may be carried out in solution, or the polypeptide(s) of the invention may be bound or attached to a carrier or substrate, e.g., microtiter plates (e.g., for ELISA), membranes, beads, etc. Carriers or substrates may be made of glass, plastic (e.g., polystyrene), polysaccharides, nylon, nitrocellulose, Teflon, etc. The surface of such supports may be solid or porous and of any convenient shape.

In one non-limiting embodiment, polypeptides with a Y residue at positions 34, 99, and 101 (including but not limited to DIG10.3 (SEQ ID NO: 15)) may be used in assays for detecting DIG and/or digoxin and/or distinguishing them from other steroids. In other non-limiting embodiments, (a) polypeptides with an F residue at position 101, a Y residue at position 34, and an F or Y at position 99 (including but not limited to DIG10.3 Y101F (SEQ ID NO: 19)) may be used for detecting digitoxigenin and/or distinguishing it from other steroids (such as from DIG, digoxin, or progesterone); and (b) polypeptides with an F residue at each of residues 34 and 101, and either Y or F at position 99 (including but not limited to DIG10.3 Y34F/Y99F/Y101F (SEQ ID NO: 23)), may be used to detect digitoxigenin and/or progesterone, and/or to distinguish them from other steroids (such as from DIG or digoxin).
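The residue-based selection rules in the preceding paragraph can be expressed as a small decision function over positions 34, 99, and 101. The Python sketch below is illustrative only: the function names and return labels are hypothetical, positions are 1-based as elsewhere in this application, and the example sequence is DIG10.3 (SEQ ID NO: 15) without its optional tail.

```python
from typing import Optional

# DIG10.3 (SEQ ID NO: 15) without the optional C-terminal tail, in blocks of 10.
DIG10_3 = (
    "MNAKEIVVHA" "LRLLENGDAR" "GWSDLFHPEG" "VLEYPYPPPG" "YKTRFEGRET"
    "IWAHMRLFPE" "YMTIRFTDVQ" "FYETADPDLA" "IGEFHGDGVL" "TASGGKLAYD"
    "YIAVWRTRDG" "QILLYRLFFN" "PLRVLEPLG"
)


def substitute(seq: str, subs: dict[int, str]) -> str:
    """Apply point substitutions given as {1-based position: new residue}."""
    residues = list(seq)
    for pos, aa in subs.items():
        residues[pos - 1] = aa
    return "".join(residues)


def suggested_target(seq: str) -> Optional[str]:
    """Map residues 34, 99 and 101 onto the detection embodiments described above."""
    r34, r99, r101 = seq[33], seq[98], seq[100]
    if (r34, r99, r101) == ("Y", "Y", "Y"):
        return "DIG/digoxin"
    if r34 == "Y" and r101 == "F" and r99 in ("F", "Y"):
        return "digitoxigenin"
    if r34 == "F" and r101 == "F" and r99 in ("F", "Y"):
        return "digitoxigenin and/or progesterone"
    return None  # combination not covered by these particular embodiments


print(suggested_target(DIG10_3))
print(suggested_target(substitute(DIG10_3, {101: "F"})))
print(suggested_target(substitute(DIG10_3, {34: "F", 99: "F", 101: "F"})))
```

Returning None for other combinations (e.g., Y34F alone) keeps the sketch limited to the embodiments actually described, rather than extrapolating selectivity.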

Example 1

We developed a computational method for designing ligand-binding proteins with three properties characteristic of naturally occurring binding sites: (1) specific, energetically favorable hydrogen bonding and van der Waals interactions with the ligand; (2) high overall shape complementarity to the ligand; and (3) structural pre-organization in the unbound protein state, which minimizes entropy loss upon ligand binding¹⁵,¹⁶. To program in specific interactions with the small molecule, disembodied binding sites are created by positioning amino acid side chains around the ligand in orientations optimal for hydrogen bonding and other energetically favorable interactions; these sites are then placed at geometrically compatible positions in a set of scaffold protein structures. The surrounding side-chain identities and conformations are then optimized to generate additional protein-ligand interactions and buttressing protein-protein interactions (Fig. 1a). Designs with protein-small molecule shape complementarity below that typical of native protein complexes¹⁷, or having interface side-chain conformations with low Boltzmann-weighted probabilities in the unbound state¹⁰, are then discarded.
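The two filtering criteria just described can be sketched in Python as follows. The Boltzmann weighting uses the standard form p_i = exp(-E_i/kT) / Σ_j exp(-E_j/kT); the rotamer energies, the kT value (about 0.593 kcal/mol near 298 K), and the cutoffs sc_min and p_min are hypothetical placeholders, since this section does not state the values used in the design protocol.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate kT near 298 K (assumed)


def boltzmann_probability(energies: list[float], index: int,
                          kt: float = KT_KCAL_PER_MOL) -> float:
    """Boltzmann-weighted probability of conformation `index` among `energies` (kcal/mol)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    return weights[index] / sum(weights)


def passes_filters(shape_complementarity: float,
                   designed_rotamer_probs: list[float],
                   sc_min: float = 0.6,          # hypothetical cutoff
                   p_min: float = 0.5) -> bool:  # hypothetical cutoff
    """Keep a design only if Sc and every designed rotamer probability clear their cutoffs."""
    return shape_complementarity >= sc_min and all(p >= p_min for p in designed_rotamer_probs)


# Hypothetical example: designed rotamer (index 0) versus two alternatives, in kcal/mol.
p_designed = boltzmann_probability([0.0, 1.5, 2.0], 0)
print(round(p_designed, 3), passes_filters(0.66, [p_designed]))
```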

We used the method to design proteins that bind the steroid digoxigenin (DIG), the aglycone of digoxin, a cardiac glycoside used to treat heart disease¹⁸, and a commonly used non-radioactive biomolecular labeling reagent¹⁹. Anti-DIG antibodies are routinely administered to treat overdoses of digoxin, which has a narrow therapeutic window²⁰, and are used widely to detect biomolecules in applications such as fluorescence in situ hybridization¹⁹. We created idealized DIG binding sites with hydrogen bonds from Tyr or His to the lactone carbonyl oxygen and both hydroxyl groups of DIG, and hydrophobic packing interactions between Tyr, Phe, or Trp and the steroid ring system (Fig. 1a). These interactions were embedded in designed binding sites with high shape complementarity to DIG as outlined above, and 17 designs were selected for experimental characterization based on computed binding affinity, shape complementarity, and the extent of binding-site pre-organization in the unbound state (Fig. 1b).

Binding of the designed proteins to DIG was probed by yeast surface display²¹ and flow cytometry using biotinylated DIG-functionalized bovine serum albumin (DIG-BSA) or ribonuclease (DIG-RNase). DIG5 and DIG10 bound to both labels (Figs. 1c and 5), and binding was reduced to background levels when ~1 mM unlabeled DIG was added as a competitor (Figs. 1c and 6). Fluorescence polarization (FP) measurements with purified proteins and Alexa488 fluorophore-conjugated DIG (DIG-PEG3-Alexa488) indicated affinities in the low to mid micromolar range, with DIG10 binding more tightly (Fig. 2c). The scaffold from which both DIG5 and DIG10 derive, PDB ID 1Z1S, a protein of unknown function from Pseudomonas aeruginosa, does not bind to either label when expressed on the yeast surface (Figs. 1c and 6a) or to DIG-PEG3-Alexa488 in solution (Fig. 2c), suggesting that the binding activities of both proteins are mediated by the computationally designed interfaces. Indeed, substitution of small nonpolar residues in the central binding pockets of DIG5 and DIG10 with arginines resulted in complete loss of binding, and mutation of the designed hydrogen-bonding tyrosine and histidine residues to the nearly isosteric phenylalanine reduced binding; for DIG10, substitution of any of the three interacting tyrosines abolished binding completely (Figs. 1d and 7). Optimization of DIG10 by a single round of mutagenesis and selection using yeast surface display and fluorescence-activated cell sorting (FACS) identified small-to-large hydrophobic amino acid changes that increase binding affinity 75-fold to yield DIG10.1a, likely by optimizing steric packing against the ligand (Fig. 2c,d and Figs. 8-9).
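A fold change in affinity can be expressed as a binding free-energy change using ΔΔG = RT ln(fold change). The Python sketch below assumes that the 75-fold figure refers to the dissociation constant and that the temperature is 298.15 K (an assumption, as the assay temperature is not stated here); under these assumptions the gain corresponds to roughly 2.6 kcal/mol.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_fold_change(fold_change: float, temp_k: float = 298.15) -> float:
    """Binding free-energy change (kcal/mol) equivalent to a fold change in Kd: RT ln(fold)."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold_change)


print(f"75-fold -> {ddg_from_fold_change(75):.2f} kcal/mol")
```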

To provide feedback for improving the overall design methodology and to evaluate the contribution of each residue in the DIG10.1a binding site, we used next-generation sequencing to generate a comprehensive binding fitness map²²⁻²⁴. A library of variants with ~1-3 substitutions at 39 designed interface positions in DIG10.1a was generated using doped oligonucleotide mutagenesis, displayed on yeast, and subjected to selections using a monovalent DIG-PEG3-biotin conjugate (Fig. 10). Variants with increased affinity for DIG were isolated by FACS, and next-generation sequencing was used to quantify the frequency of each single point mutation in the unselected and selected populations. A large majority of the interrogated variants were depleted in the selected population relative to the unselected input library, suggesting that most of the designed residues are close to optimal for binding (Figs. 2a and 11). In particular, mutation of the three designed hydrogen-bonding residues, Tyr34, Tyr101, and Tyr115, to any other amino acid was disfavored. Several large hydrophobic residues that pack against the ligand in the computational model are also optimal for binding (e.g., Phe66 and Phel 1). Besides A99, which directly contacts DIG, most of the observed mutations that improve binding are located in the second coordination shell of the ligand and fall into two functional categories: (1) core substitutions tolerating mutation to chemically similar amino acids (e.g., Leu105 and Cys23), and (2) solvent-exposed loop amino acids having high sequence entropy (e.g., His90 and Val92). The single best clone in the libraries, DIG10.2, features two of the most highly enriched mutations, Ala37Pro and His41Tyr (Fig. 2c,d and Figs. 9 and 12).
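The enrichment analysis just described compares the frequency of each point mutation before and after selection. One common way to express this is a log2 ratio of frequencies with a small pseudocount to avoid division by zero; that normalization is assumed here, since this section does not give the exact formula, and the read counts in the usage lines are hypothetical.

```python
import math


def log2_enrichment(selected_count: int, selected_total: int,
                    input_count: int, input_total: int,
                    pseudocount: float = 0.5) -> float:
    """log2 of (frequency after selection / frequency in the unselected input)."""
    f_selected = (selected_count + pseudocount) / selected_total
    f_input = (input_count + pseudocount) / input_total
    return math.log2(f_selected / f_input)


# Hypothetical read counts for two point mutants (not data from this application).
print(round(log2_enrichment(400, 100_000, 100, 100_000), 2))  # enriched: positive
print(round(log2_enrichment(5, 100_000, 100, 100_000), 2))    # depleted: negative
```

Under this convention, the "depleted" variants mentioned above would have negative scores and the affinity-enhancing mutations positive ones.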

Combining the most highly enriched substitutions in a library, followed by selections, led to DIG10.3 (Fig. 13), which binds both DIG and its cardiac glycoside derivative digoxin with picomolar affinity (Figs. 2c,d and 14), rivaling the affinities of anti-digoxin antibody therapeutics 20 and of an evolved single-chain variable anti-DIG antibody fragment. FP-based affinity measurements of DIG10.3 and its Tyr knockouts suggest that the designed hydrogen bonds each contribute ~2 kcal/mol to the binding energy (Fig. 9). Although many residues in DIG10.3 and its less-evolved variants contribute to binding, Tyr34, Tyr101, and Tyr115 make direct and important interactions.

The crystal structures of DIG10.2 and DIG10.3 in complex with DIG were solved to 2.05 Å and 3.2 Å resolution, respectively (Figs. 3a,b and 15-17). The structure of DIG10.2 bound to DIG shows atomic-level agreement with the design model (average all-atom root-mean-square deviation (RMSD) of 0.54 Å) (Fig. 3c). The ligand-protein interface has high shape complementarity (Sc = 0.66), and there are no observable water molecules within the binding pocket. The DIG binding mode is nearly identical in the structure and the model, with an average RMSD of 0.99 Å for all 28 ligand heavy atoms (Fig. 3d). As anticipated, Tyr34, Tyr101, and Tyr115 make the designed hydrogen bonds with O3, O2, and O1 of DIG, respectively. Tyr41, a residue identified during affinity maturation, forms an additional long hydrogen bond with the terminal hydroxyl group of DIG (O5) (Fig. 17). Of 27 non-glycine, non-alanine, non-surface protein residues within ~10 Å of the ligand, 21 adopt the rotamer conformations of the design model (Fig. 18), including Tyr101 and Tyr115 (in chain B) as well as the first-shell packing residues Trp22, Phe58, and Phe119. The structure of DIG10.3 bound to DIG also agrees with the design model (average all-atom RMSD of 0.68 Å) (Fig. 19).

We assessed the binding specificity of DIG10.3 by determining its binding affinities for a series of related steroids in equilibrium competition fluorescence polarization assays. Experiments with DIG, digitoxigenin, progesterone, and β-estradiol showed decreases in affinity corresponding to the loss of one, two, and three hydrogen bonds, respectively (assuming ~1.8 kcal/mol per hydrogen bond 25), as expected from the structure if these compounds bind in the same orientation as DIG (Fig. 4a,b and Table 10). We next investigated whether the observed steroid selectivity could be reprogrammed by mutagenesis of the key hydrogen-bonding tyrosine residues. The variants Tyr101Phe, Tyr34Phe, and Tyr34Phe/Tyr99Phe/Tyr101Phe show clear preferences for more hydrophobic steroids, in a predictable manner that depends on the hydrogen-bonding capabilities of both the protein and the steroid. Mutation of Tyr101 to Phe eliminates the DIG-specific hydrogen bond with O2 of DIG and provides better hydrophobic packing for the other three steroids, which lack a hydroxyl group at that position (Fig. 4c). Substitution of Tyr34 with Phe removes a hydrogen bond to the C14 hydroxyl groups of both DIG (O3) and digitoxigenin, enhancing the preference for progesterone while maintaining the relative binding order of DIG and digitoxigenin, because the DIG-specific Tyr101-DIG O2 hydrogen bond remains intact (Fig. 4d). Mutation of Tyr101, Tyr34, and binding-site residue Tyr99 to Phe decreases binding affinity for DIG and increases affinity for the more hydrophobic steroids (Fig. 4e). These results confirm that the selectivity of DIG10.3 for DIG is conferred largely through the designed hydrogen-bonding interactions, and they demonstrate how selectivity can be programmed through positive design alone, by controlling designed protein-ligand hydrogen-bonding and nonpolar interactions.

Table 10. Specificity-swapped DIG10.3 variants

Variant                   Steroid              Inhibition constant (Ki)
DIG10.3                   digoxigenin (DIG)    653 ± 262 pM
                          digitoxigenin        19 ± 7 nM
                          progesterone         243 ± 91 nM
                          β-estradiol          2.1 ± 0.8 μM
                          digoxin              223 ± 105 pM
DIG10.3 Y101F             digoxigenin (DIG)    39 ± 8 nM
                          digitoxigenin        < 3.8 nM
                          progesterone         30 ± 9 nM
                          β-estradiol          1.7 ± 0.4 μM
DIG10.3 Y34F              digoxigenin (DIG)    59 ± 6 nM
                          digitoxigenin        714 ± 79 nM
                          progesterone         76 ± 14 nM
                          β-estradiol          15 ± 2 μM
DIG10.3 Y34F/Y99F/Y101F   digoxigenin (DIG)    580 ± 229 nM
                          digitoxigenin        < 16 nM
                          progesterone         < 17 nM
                          β-estradiol          1.6 ± 0.6 μM
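The hydrogen-bond interpretation of the DIG10.3 specificity data can be checked against the first block of Table 10 using ΔΔG = RT ln(Ki,steroid / Ki,DIG). The sketch below assumes a temperature of 298.15 K (the assay temperature is not stated in this section) and divides by the ~1.8 kcal/mol per hydrogen bond used in the text to obtain an apparent number of lost hydrogen bonds. The Ki values are copied from the table; the function and variable names are illustrative.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal/(mol*K)
T = 298.15                    # assumed assay temperature, K
KCAL_PER_HBOND = 1.8          # per-hydrogen-bond estimate used in the text

# DIG10.3 inhibition constants from Table 10, in molar units
KI_DIG10_3 = {
    "digoxigenin": 653e-12,
    "digitoxigenin": 19e-9,
    "progesterone": 243e-9,
    "beta-estradiol": 2.1e-6,
}


def ddg_kcal(ki_weak, ki_ref, temp=T):
    """Binding free-energy penalty (kcal/mol) of the weaker ligand relative to the reference."""
    return R_KCAL * temp * math.log(ki_weak / ki_ref)


def apparent_hbonds_lost(ki_table, reference="digoxigenin"):
    """ddG relative to the reference ligand, expressed in units of one hydrogen bond."""
    ref = ki_table[reference]
    return {name: ddg_kcal(ki, ref) / KCAL_PER_HBOND
            for name, ki in ki_table.items() if name != reference}


if __name__ == "__main__":
    for name, n in apparent_hbonds_lost(KI_DIG10_3).items():
        print(f"{name}: {n:.2f} hydrogen-bond equivalents")
```

Under these assumptions the values come out at roughly 1.1, 1.9, and 2.7 for digitoxigenin, progesterone, and β-estradiol, in line with the one-, two-, and three-hydrogen-bond trend described above; β-estradiol is the loosest match.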

Comparison of the properties of successful and unsuccessful designs provides a test of the hypotheses underlying the design methodology. While all 17 designed proteins by construction had high computed shape complementarity to DIG, the DIG10 design, which had the highest affinity for DIG, had the most favorable computed ligand interaction energy and was predicted to have the most preorganized binding site (Fig. 1b), suggesting that these attributes should remain the focus of future design methodology development. One potential avenue for obtaining more favorable interaction energies would be to incorporate additional binding-site backbone flexibility to achieve more tightly packed binding sites: the observation that substituting small hydrophobic interface residues with larger ones increased binding affinity indicates that the original DIG10 design was underpacked.

The binding fitness landscape, in combination with the X-ray co-crystal structures, highlights the importance of second-shell interactions in stabilizing binding-competent conformations. The fitness landscape favors substitution of Leu105, adjacent to the key hydrogen-bonding residue Tyr115, with Trp or other large hydrophobic residues (Fig. 2a). Both Tyr115 and Leu105 exhibit obvious side-chain conformational heterogeneity in the four independent protein subunits of the 2.0 Å resolution DIG10.2 crystal structure. Mutation of Leu105 to Trp results in a more uniform set of side-chain conformations at both positions in the lower-resolution DIG10.3 structure (which contains nine independently visualized subunits), as well as a more canonical hydrogen-bond geometry between Tyr115 and DIG (Figs. 3e and 20). The higher affinity of DIG10.3 might result from a larger population of the preorganized, higher-affinity conformation of the protein 13. Indeed, all key hydrogen-bonding tyrosines, particularly Tyr115, have higher computed Boltzmann-weighted side-chain probabilities in apo-DIG10.3 than in apo-DIG10.2 and apo-DIG10. Similarly, reduced backbone conformational entropy likely explains the increased fitness of substitutions that increase β-sheet propensity at positions 90 and 92, which probably stabilize a more ordered, extended-strand backbone conformation (Fig. 2a). That conformational flexibility is selected against during affinity maturation suggests that accounting for free-energy gaps between binding-competent and alternative states of the binding site 27, possibly by better assessing side-chain entropy or by explicitly designing second-shell buttressing interactions for key contacts, should help achieve high affinity in the next generation of computationally designed ligand-binding proteins.

The DIG binding affinity of DIG10.3 is within a factor of two of that of the widely used anti-DIG antibodies 20, and because DIG10.3 is very stable and can be expressed at high levels in bacteria, it could provide a more cost-effective alternative. With continued improvement of the methodology through feedback from experimental results, computational protein design should provide an increasingly powerful approach to creating a new generation of small-molecule receptors for synthetic biology, therapeutic scavengers for toxic compounds, and robust binding domains for diagnostic devices.

Methods Summary

Design calculations were performed using RosettaMatch 28 to incorporate five preselected interactions with DIG into a set of 401 scaffolds. RosettaDesign 7,9 was then used to optimize each binding-site sequence for maximal ligand-binding affinity. Designs with favorable interface energy, high shape complementarity, and a preorganized binding site were selected for experimental characterization.

Designs were displayed on the surface of yeast strain EBY100 and examined for binding to a mixture of 2.7 μM biotinylated DIG-conjugated BSA or DIG-conjugated RNase and streptavidin-phycoerythrin on an Accuri C6 flow cytometer. Binding clones from yeast-surface-displayed libraries based on DIG10 were selected using highly avid DIG-BSA or DIG-RNase, or monovalent DIG-conjugated biotin, on a Cytopeia inFlux cell sorter.

DNA from the DIG10.1a-derived libraries was sequenced in paired-end mode on an Illumina MiSeq.

Proteins for biochemical assays were expressed in E. coli Rosetta 2 (DE3) cells with a C-terminal TEV protease-cleavable His6 tag. For crystallographic analysis of DIG10 variants, a 12-amino-acid structurally disordered C-terminus derived from the scaffold protein 1Z1S was replaced directly with a His6 tag. Binding affinities were determined by equilibrium fluorescence polarization 30 on a SpectraMax M5e microplate reader by monitoring the anisotropy of DIG-conjugated Alexa488 as a function of protein concentration. Equilibrium fluorescence polarization competition assays were performed by examining the effect of increasing concentrations of unlabeled DIG, digitoxigenin, progesterone, and β-estradiol on the anisotropy of the complex between the designed protein and DIG-conjugated Alexa488.

Methods

Computational methods. Full details for all computational methods are given in Supplementary Methods. Example command lines and RosettaScripts 31 design protocols are provided in Supplementary Data. Source code is freely available to academic users through the Rosetta Commons agreement. Design models, the scaffold library, and scripts for running the design calculations are provided on the Baker lab website.

Matching. A set of 401 scaffolds was searched with RosettaMatch for backbones that can accommodate five predefined side-chain interactions with DIG. This set contained scaffolds previously used for design projects within our lab as well as structural homologs of a subset of these scaffolds that are known to tolerate mutations.

Rosetta sequence design. Two successive rounds of sequence design were employed. The purpose of the first was to maximize binding affinity for the ligand 36. The goal of the second was to minimize protein destabilization caused by aggressive scaffold mutagenesis while maintaining the binding interface designed during the first round. During the second round, ligand-protein interactions were up-weighted by a factor of 1.5 relative to intra-protein interactions to ensure that binding energy was preserved. Two criteria were used to minimize protein destabilization: (1) native scaffold residue identities were favored by 1.5 Rosetta energy units (Reu), and (2) no more than five residues were allowed to change from residue types observed in a multiple sequence alignment (MSA) of the scaffold if (a) these residues were present in the MSA with a frequency greater than 0.6 and (b) the calculated ΔΔG for mutation of the scaffold residue to alanine 37 was greater than 1.5 Reu in the context of the scaffold sequence. In some design calculations, the identities of the matched hydrogen-bonding residues were allowed to vary, subject to the MSA and ΔΔG criteria described above. Designs having fewer than three hydrogen bonds between the protein and the ligand were rejected.
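The MSA/ΔΔG stability criterion (2) is essentially a counting rule applied to each design. The Python below is a hypothetical sketch of that bookkeeping only and does not call Rosetta: the `Position` record and its fields are invented for illustration, and it assumes one reading of the rule, namely that a position is restricted when its native residue occurs in the MSA column at a frequency above 0.6 and its Ala-scan ΔΔG exceeds 1.5 Reu, and that a "change" means the designed residue never appears in that column.

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    """One scaffold position considered during the second design round (hypothetical record)."""
    native: str                 # scaffold residue, one-letter code
    designed: str               # residue chosen by sequence design
    ddg_ala: float              # estimated ddG (Reu) for mutating the native residue to Ala
    msa_freq: dict = field(default_factory=dict)  # residue -> frequency in the MSA column


def is_restricted(pos, freq_cutoff=0.6, ddg_cutoff=1.5):
    """Native residue is conserved in the MSA and predicted to matter for stability."""
    return pos.msa_freq.get(pos.native, 0.0) > freq_cutoff and pos.ddg_ala > ddg_cutoff


def count_restricted_changes(positions):
    """Restricted positions where the designed residue never appears in the MSA column."""
    return sum(1 for pos in positions
               if is_restricted(pos) and pos.msa_freq.get(pos.designed, 0.0) == 0.0)


def passes_stability_rule(positions, max_changes=5):
    """Criterion (2): at most `max_changes` departures from MSA-observed residue types."""
    return count_restricted_changes(positions) <= max_changes
```

Design choice: substitutions to residue types already seen in the alignment (for example Leu to Ile in a column that contains Ile) are not counted, which matches the text's wording "change from residue types observed in a multiple sequence alignment".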

Design evaluation. Designs were evaluated on interface energy, shape complementarity, and preorganization of the apo-protein binding site. Preorganization was enforced by two metrics: (1) explicitly introducing second-shell amino acids that hold the preselected residues in place using Foldit 38, and (2) eliminating designs having rotamer Boltzmann probabilities 39 < 0.1 for more than one of the hydrogen-bonding residues (Supplementary Table 5). All designs were evaluated for compatibility of the local sequence with the scaffold secondary structure, and those predicted to have backbone conformations that varied by > 0.8 Å from their native scaffold were rejected (see Supplementary Methods).

General experimental methods. Detailed procedures for the syntheses of DIG-BSA-biotin, DIG-RNase-biotin, DIG-PEG3-biotin, and DIG-PEG3-Alexa488, as well as protein expression, purification, crystallization, cloning, and mutagenesis methods, are given in Supplementary Methods. Details of the fluorescence polarization binding assays, gel filtration analysis, and protein stability measurements are also provided in Supplementary Methods.

Yeast surface display. Designed proteins were tested for binding using yeast surface display. Yeast surface protein expression was monitored by binding of anti-c-myc FITC to the C-terminal myc epitope tag of the displayed protein. DIG binding was assessed by quantifying the phycoerythrin (PE) fluorescence of the displaying yeast population following incubation with DIG-BSA-biotin, DIG-RNase-biotin, or DIG-PEG3-biotin, and streptavidin-phycoerythrin (SAPE). In a typical experiment using DIG-BSA-biotin or DIG-RNase-biotin, cells were resuspended in a premixed solution of PBSF (PBS + 1 g/L BSA) containing a 1:100 dilution of anti-c-myc FITC, 2.66 μM DIG-BSA-biotin or DIG-RNase-biotin, and 664 nM SAPE, and incubated for 2-4 hr at 4 °C. Cellular fluorescence was monitored on an Accuri C6 flow cytometer using a 488 nm laser for excitation and a 575 nm band-pass filter for emission. Phycoerythrin fluorescence was compensated to minimize bleed-over contributions from the FITC channel. Competition assays with free digoxigenin were performed as above, except that between 750 μM and 1.5 mM digoxigenin was added to each labeling reaction mixture. Full details are given in Supplementary Methods.

Affinity maturation. Detailed procedures for constructing and selecting all libraries, including those for deep sequencing, are provided in Supplementary Methods. Yeast surface display library selections were conducted on a Cytopeia inFlux cell sorter using increasingly stringent fluorescence gates. In all labeling reactions for selections, care was taken to maintain at least a 10-fold molar excess of label over cell surface protein. Cell surface protein molarity was estimated by assuming that an OD600 of 1.0 corresponds to 1 × 10^7 cells/mL and that each cell displays 50,000 copies of protein 40. In each round of sorting, at least 10 times the theoretical library size was sorted. FlowJo software v. 7.6 was used to analyze all data.
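The label-excess rule reduces to a short calculation. A minimal sketch using only the assumptions stated above (OD600 of 1.0 ≈ 1 × 10^7 cells/mL, 50,000 displayed copies per cell, a 10-fold molar excess, and sorting at least 10 times the theoretical library size); the function names are illustrative.

```python
AVOGADRO = 6.02214076e23  # 1/mol


def displayed_protein_molar(od600, culture_volume_ml, labeling_volume_ml,
                            cells_per_ml_per_od=1e7, copies_per_cell=50_000):
    """Molar concentration of displayed protein after resuspending cells for labeling."""
    cells = od600 * cells_per_ml_per_od * culture_volume_ml
    moles = cells * copies_per_cell / AVOGADRO
    return moles / (labeling_volume_ml * 1e-3)  # mL -> L


def minimum_label_molar(displayed_molar, fold_excess=10.0):
    """Lowest label concentration that keeps the required molar excess."""
    return fold_excess * displayed_molar


def cells_to_sort(theoretical_library_size, coverage=10):
    """Number of cells to sort for the stated oversampling of the library."""
    return coverage * theoretical_library_size
```

For example, cells from 1 mL of culture at OD600 = 1.0 resuspended in 1 mL of labeling solution carry roughly 0.83 nM displayed protein, so the label should be present at about 8.3 nM or more.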

Next-generation sequencing. Two sequencing libraries based on DIG10.1a were assembled by recursive PCR: an N-terminal library (fragment 1 library) and a C-terminal library (fragment 2 library). To introduce mutations, we used degenerate PAGE-purified oligos (TriLink BioTechnologies) in which 39 selected positions within the binding site were doped with a small amount of each non-native base, at a level expected to yield 1-2 mutations per gene. Yeast cells were transformed with the DNA insert and restriction-digested pETCON 41. Surface protein expression was induced 41, and cells were labeled with anti-c-myc-FITC and sorted for protein expression. Expressing cells were recovered, induced, labeled with 100 nM DIG-PEG3-biotin for > 3 hr at 4 °C and then with SAPE and anti-c-myc-FITC for 8 min at 4 °C, and sorted. For each library, clones with binding signals higher than that of DIG10.1a were collected (Fig. 10). To reduce noise from the first round of cell sorting, the sorted libraries were recovered, induced, and subjected to a second round of sorting under the same conditions (Supplementary Methods).
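The doping level needed for "1-2 mutations per gene" can be estimated by treating nucleotide substitutions at the doped positions as rare, independent events, so that the number per gene is approximately Poisson distributed (the small-rate limit of a binomial). This sketch assumes all three bases of each doped codon carry the same total non-native rate and ignores synonymous changes; the actual doping levels are not given in this section, and `n_codons` can be set per fragment library.

```python
import math


def doping_rate_for_mean(mean_mutations, n_codons, bases_per_codon=3):
    """Total non-native fraction per doped base that gives the requested mean mutation count."""
    return mean_mutations / (n_codons * bases_per_codon)


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mutation_count_distribution(rate, n_codons, bases_per_codon=3, k_max=5):
    """Approximate fraction of genes carrying 0..k_max nucleotide mutations."""
    lam = rate * n_codons * bases_per_codon
    return {k: poisson_pmf(k, lam) for k in range(k_max + 1)}
```

With a target mean of 1.5 mutations over 39 codons, the rate is about 1.3% per doped base, and roughly 22% of genes would carry no mutation, 33% one, and 25% two.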

Library DNA was prepared as described 42. Illumina adapter sequences and unique library barcodes were appended to each library pool by PCR amplification using population-specific primers. DNA was sequenced in paired-end mode on an Illumina MiSeq using a 300-cycle reagent kit and custom primers (see Supplementary Methods). Of a total of 5,630,105 paired-end reads, 2,531,653 reads were mapped to library barcodes. For each library, paired-end reads were fused and filtered for quality (Phred > 30). The resulting full-length reads were aligned against DIG10.1a using Enrich 43. For single mutations having > 7 counts in the original input library, a relative enrichment ratio between the input library and each selected library was calculated 43-45. The effect of each amino acid substitution at the 39 binding-site residues on binding (Δε) is given as the log base 2 of the frequency of observing mutation x at position i in the selected versus the unselected population, relative to the same ratio for DIG10.1a.
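The enrichment score described above can be written out directly. This is a minimal sketch of the stated ratio only, not of the Enrich software itself: it assumes reads have already been counted per single-mutant variant, the 0.5 pseudocount is an added assumption so that fully depleted variants receive a finite score, and the variant labels and counts in the example are made up.

```python
import math


def enrichment_scores(input_counts, selected_counts, wt_key="WT",
                      min_input_count=7, pseudocount=0.5):
    """log2 enrichment of each variant relative to the parent (DIG10.1a) sequence.

    input_counts / selected_counts map a variant label (e.g. "A37P") to read counts.
    Only variants with more than `min_input_count` reads in the input library are scored.
    """
    in_total = sum(input_counts.values())
    sel_total = sum(selected_counts.values())

    def sel_over_input(key):
        f_in = (input_counts.get(key, 0) + pseudocount) / in_total
        f_sel = (selected_counts.get(key, 0) + pseudocount) / sel_total
        return f_sel / f_in

    wt_ratio = sel_over_input(wt_key)
    return {key: math.log2(sel_over_input(key) / wt_ratio)
            for key, n in input_counts.items()
            if key != wt_key and n > min_input_count}


if __name__ == "__main__":
    input_library = {"WT": 1000, "A37P": 100, "Y34F": 100, "rare": 5}   # hypothetical counts
    selected_library = {"WT": 1000, "A37P": 400, "Y34F": 0}
    for variant, score in enrichment_scores(input_library, selected_library).items():
        print(variant, round(score, 2))
```

Positive scores mark substitutions enriched relative to the parent sequence after selection and negative scores mark depleted ones; "rare" is skipped because it falls below the input-count threshold.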

Fluorescence polarization binding assays. Fluorescence polarization-based affinity measurements of designs and their evolved variants were performed as described 40 using Alexa488-conjugated DIG (DIG-PEG3-Alexa488). Fluorescence anisotropy (r) was measured in 96-well plate format on a SpectraMax M5e microplate reader (Molecular Devices) with λex = 485 nm and λem = 538 nm, using a 515 nm emission cutoff filter. Fluorescence polarization equilibrium competition binding assays were used to determine the binding affinities of DIG10.3 and its variants for unlabeled digoxigenin, digitoxigenin, progesterone, β-estradiol, and digoxin. The inhibition constant for each protein-ligand interaction, Ki, was calculated from the total concentration of unlabeled ligand producing 50% inhibition of the binding signal (IC50) and the Kd of the protein-label interaction, according to a model accounting for receptor-depletion conditions 40.

Supplementary Methods

Computational Methods. Digoxigenin binders were designed using an updated version 1 of RosettaMatch 2 to search for PDB scaffold backbones that can accommodate predefined interactions with the ligand, followed by RosettaDesign 3 to optimize the binding-site amino acid sequences of the matches for ligand-binding affinity.

Generation of ligand and ligand conformer library. The three-dimensional structure of digoxigenin (DIG) was obtained from PDB ID 1LKE 4. Because our experimental validation and selection methods rely on a linker that connects the O5 hydroxyl of the DIG molecule to either biotin or a carrier protein, we included this linker in our ligand model. Linker atoms were added to DIG using the Build functionality of MacPyMOL (Schrödinger, LLC).

A ligand conformer library was generated by sampling conformations around the C3-O5 and 1-C26 bonds at -60° ± 30°, 60° ± 30°, and 180° ± 30°. Conformers with significant intramolecular clashes were rejected using an intra fa_rep cutoff of 0.25 Rosetta energy units (Reu). Although the lactone-cardenolide bond (C17-C20) of the steroid rotates freely in solution, for simplicity we restricted this torsion angle to the value found in PDB IDs 1LKE and 1IGJ.
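The torsion grid described above can be enumerated as follows. The 15° step within each ±30° window is a hypothetical choice (the text gives only the window centers and widths), and the clash filter (intramolecular fa_rep below 0.25 Reu) requires Rosetta scoring, so it is left out; each tuple would be handed to a conformer builder and then filtered.

```python
import itertools


def torsion_samples(centers=(-60, 60, 180), half_width=30, step=15):
    """Dihedral values (degrees, wrapped to [-180, 180)) within +/- half_width of each center."""
    angles = set()
    for center in centers:
        for offset in range(-half_width, half_width + 1, step):
            angles.add((center + offset + 180) % 360 - 180)
    return sorted(angles)


def conformer_torsions(n_bonds=2, **sampling):
    """All combinations of sampled dihedrals for the rotatable linker bonds."""
    samples = torsion_samples(**sampling)
    return list(itertools.product(samples, repeat=n_bonds))


if __name__ == "__main__":
    combos = conformer_torsions()
    print(len(combos), "torsion combinations before clash filtering")
```

With these settings each bond gets 15 distinct values, so two bonds give 225 combinations before filtering.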

Scaffold selection. A set of 401 scaffolds was generated for use as input structures for matching. This set contained scaffold proteins previously used for enzyme design projects within our lab 5-7 as well as structural homologs 8 of a subset of these scaffolds (PDB codes 1m4w, 1oho, 1a53, 1dl3, 1e1a and 1thf) with a DALI Z-score cutoff of 8 relative to the input search model. These scaffolds were chosen because of previous enzyme-design successes in these fold classes 5 and/or because of their thermostability. Directed evolution experiments have shown that more stable scaffolds can acquire new functions more easily than their less stable counterparts 9,10. All scaffolds are < 350 amino acids long, have been expressed previously in E. coli, and were stripped of their cognate bound small molecules and water molecules before use. To identify residue positions to be used for matching in the homolog scaffolds, each homolog crystal structure was superimposed on that of its parent scaffold using the CEAlign plug-in of the PyMOL molecular visualization program, and homolog residue positions within 5.0 Å of any ligand heavy atom present in the parent scaffold were identified. For PDBs 1a53, 1dl3 and 1oho, ligands present in the crystal structures were used in this search. For 1m4w, 1e1a and 1thf, ligand positions from the computational design models of a retroaldolase (RA60) 5, a Diels-Alderase (DA_20) 1, and a Kemp eliminase (KE_07) 6 were used, respectively.

Geometric placement of ligand using a set of preselected interactions (matching). Geometric criteria for enforcing binding-site interactions were determined by inspecting structures of digoxin bound to the anti-digoxigenin antibody 26-10 (PDB ID 1IGJ 11) and of digoxigenin bound to the engineered lipocalin DigA16 (PDB ID 1LKE 4). From these structures we defined five interface criteria: (1) a hydrogen bond between the lactone carbonyl oxygen O1 and a Tyr side chain, (2) a hydrogen bond between the O2 hydroxyl and a His or Tyr side chain, (3) a hydrogen bond between the O3 hydroxyl and a His or Tyr side chain, (4) a hydrophobic packing interaction on the top face of the ligand, and (5) a hydrophobic packing interaction on the bottom face of the ligand. Two active-site configurations were specified: one with Tyr, Tyr, His, Phe/Tyr, and Phe/Tyr/Trp satisfying design criteria 1-5 (DIG_yyhff), and one with Tyr, His, His, Phe/Tyr/Trp, and Tyr/Trp satisfying design criteria 1-5 (DIG_yhhff).

Geometric criteria were defined using six degrees of freedom between the ligand and the desired interacting side chain, specified in a matching constraints file 1. Extra rotamer sampling (two half-step standard deviations) was performed around all side-chain torsion angles. To enforce burial of the lactone head group within a binding pocket, for constraint 1 (hydrogen bond to the lactone carbonyl oxygen) we considered only binding-site positions that had at least 14 neighboring residues during matching. A neighbor was defined as a residue whose Cα lies within 10 Å of the Cα of the binding-site position under consideration. Secondary matching 1 was used for constraints 3, 4, and 5. To eliminate high-energy rotamer conformations, a maximum Dunbrack energy (fa_dun) cutoff of 4.5 Reu (unweighted) was used while building rotamers for all constraints. Using these matching criteria, 29,274 and 30,861 matches were found for DIG_yyhff and DIG_yhhff, respectively.
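The burial criterion for constraint 1 is a neighbor count over Cα coordinates. A minimal sketch, assuming the Cα coordinates have already been extracted from the scaffold as (x, y, z) tuples in Å; the function names are illustrative and this is not RosettaMatch code.

```python
import math


def neighbor_counts(ca_coords, radius=10.0):
    """Number of other residues whose C-alpha lies within `radius` Å of each residue's C-alpha."""
    counts = []
    for i, a in enumerate(ca_coords):
        counts.append(sum(1 for j, b in enumerate(ca_coords)
                          if j != i and math.dist(a, b) <= radius))
    return counts


def buried_positions(ca_coords, min_neighbors=14, radius=10.0):
    """Indices of positions eligible for the lactone hydrogen-bond constraint."""
    return [i for i, n in enumerate(neighbor_counts(ca_coords, radius)) if n >= min_neighbors]
```

The all-pairs loop is quadratic in protein length, which is unproblematic for scaffolds under 350 residues; a spatial index would only matter for much larger inputs.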

Rosetta sequence design. Active-site amino acid sequences of each match were designed to maximize binding affinity for the ligand according to the Rosetta energy function, using the enzdes weights set for the energy terms 1,12. Explicit electrostatics were not used. Design moves were followed by steepest-descent gradient minimization in which side-chain degrees of freedom and the orientation of the ligand relative to the protein were allowed to minimize freely 13, but backbone minimization was restricted so that Cα atoms could move < 0.05 Å from their pre-minimization positions. Internal torsions of the ligand were allowed to minimize but were constrained to within 5 degrees of their initial values.

Two successive rounds of sequence design were used to generate designs. The purpose of the first round was to maximize binding affinity for the ligand 1. To prevent destabilization of the apo-protein that can result from mutating potentially stabilizing residues whose side chains are important for core packing, aromatic residues in the scaffold were allowed to mutate only to other aromatics during this round of design.

After the first round, a second round of binding-site sequence design was performed on the output files of the first round. The goal of this round was to optimize protein stability while maintaining the binding interface designed during the first round as much as possible. Ligand-protein interactions were up-weighted by a factor of 1.5 relative to intra-protein interactions during sequence optimization in an attempt to ensure that interface binding affinity was maintained, and two criteria were used to optimize protein stability: (1) native scaffold residue identities were favored by 1.5 Rosetta energy units (Reu), and (2) no more than five residues were allowed to change from identities observed in a multiple sequence alignment (MSA) if (a) these residues were present in the MSA with a frequency greater than 0.6, as specified by a position-specific scoring matrix (PSSM), and (b) the calculated ΔΔG for mutation of the scaffold residue to alanine was greater than 1.5 Reu in the context of the scaffold sequence. The ΔΔG for mutation to alanine was estimated as described 14, and PSSM files were generated using NCBI PSI-BLAST. For both the DIG_yhhff and the DIG_yyhff designs, a first method restricted the amino acid identities of the hydrogen-bonding (Tyr/His) residues to their preselected (matched) identities during design. For the DIG_yhhff designs, we also used an alternative second method in which the matched residues were allowed to mutate to any amino acid, subject to the MSA and ΔΔG criteria described above. Designs generated using this latter protocol were filtered to ensure the presence of at least three hydrogen bonds between the protein and the ligand.

Evaluation of designs. Designs passing the filters encoded in the XML files were subjected to several additional filtering criteria. High shape complementarity was enforced by rejecting designs having Sc < 0.6. Shape complementarity was computed with the Sc program 16 of the CCP4 package v.6.0.2 12, using the Rosetta radii library. A common feature of the engineered DIG-binding lipocalin DigA16 (PDB IDs 1LKE and 1KX0) 4 and the anti-DIG 26-10 antibody (PDB IDs 1IGJ and 1IGI) 11 is that the binding site is largely preorganized: there are very few structural changes between the bound and unbound forms of the proteins. We therefore attempted to enforce preorganization of the binding-competent conformation of the apo-protein using two metrics: (1) introducing second-shell amino acids that hold the preselected residues in place via hydrogen bonding or sterics, using Foldit 17, and (2) eliminating designs having Boltzmann-weighted side-chain probabilities 18 < 0.1 for more than one of the hydrogen-bonding residues.
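The two numerical filters in this paragraph can be combined into one predicate. This sketch assumes the per-residue probability is a plain Boltzmann weight of the designed rotamer against alternative rotamer energies at the same position, with a hypothetical kT of 1.0 in the same energy units; the Rosetta implementation cited in the text may define the competing states and the temperature differently. Function names are illustrative.

```python
import math


def boltzmann_probability(energies, index, kT=1.0):
    """Probability of state `index` among alternative states with the given energies."""
    e_min = min(energies)  # shift by the minimum to avoid overflow in exp()
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    return weights[index] / sum(weights)


def passes_evaluation(sc, hbond_residue_probs, sc_cutoff=0.6, prob_cutoff=0.1, max_low=1):
    """Keep a design if Sc >= 0.6 and at most one H-bonding residue has probability < 0.1."""
    n_low = sum(1 for p in hbond_residue_probs if p < prob_cutoff)
    return sc >= sc_cutoff and n_low <= max_low


if __name__ == "__main__":
    # Hypothetical rotamer energies (designed rotamer first) for three H-bonding residues
    probs = [boltzmann_probability(e, 0) for e in ([0.0, 3.0], [0.0, -1.0, 0.5], [0.0, 2.0, 2.0])]
    print(probs, passes_evaluation(0.66, probs))
```
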

Compatibility of designed sequence with local backbone structure. We reasoned that binding-site preorganization would be compromised if substituting amino acid side chains during (fixed-backbone) design changed the backbone conformational preference in regions sequence-local to the sites of substitution. We therefore developed a metric to estimate the impact of design on local backbone structure and used it to discard designs predicted to alter the backbone structure. Using the structure prediction modules of Rosetta 19, we generated a set of 9-mer fragment structures for each designed and wild-type scaffold sequence and compared the average RMSD of these fragments to the scaffold backbone structure. If the average RMSD of the conformations predicted in these fragments (200 9-mers) near any designed position was greater (by > 0.8 Å) for the designed sequence than for the wild-type scaffold sequence, we flagged that region of the designed protein as unlikely to adopt the local backbone conformation of the scaffold protein and rejected that design.
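The fragment comparison rests on superposition RMSD. The sketch below uses the standard Kabsch algorithm with NumPy and assumes each fragment is represented by one atom per residue (for example Cα) as an N × 3 array; it also assumes the 0.8 Å rule means "designed-sequence mean RMSD minus wild-type mean RMSD exceeds 0.8 Å". The fragment representation and function names are illustrative, not the Rosetta implementation.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two N x 3 coordinate sets after optimal superposition (Kabsch)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    h = p.T @ q                                  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def mean_fragment_rmsd(fragments, native_window):
    """Average RMSD of predicted fragments to the scaffold backbone window."""
    return sum(kabsch_rmsd(f, native_window) for f in fragments) / len(fragments)


def flag_region(design_fragments, wild_type_fragments, native_window, threshold=0.8):
    """True if design fragments deviate from the scaffold more than wild-type ones by > threshold."""
    delta = (mean_fragment_rmsd(design_fragments, native_window)
             - mean_fragment_rmsd(wild_type_fragments, native_window))
    return delta > threshold
```

Comparing against the wild-type fragments, rather than using an absolute RMSD cutoff, discounts regions where fragment prediction is poor even for the native sequence.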

Design scoring. Following automated filtering, all designs were inspected manually using Foldit 17, and some ligand-proximal residues were manually reverted to their native scaffold identities to increase the likelihood of design stability. In total, 17 designs in 14 unique scaffolds were chosen for experimental testing (Supplementary Table 2). For scoring, all design models were relaxed with backbone and side-chain heavy-atom constraints 20 using Rosetta relax 21.

Modeling directed evolution mutations. Mutations arising from directed evolution studies were modeled using RosettaScripts. Mutations were introduced into the parent model, and residues having Cα within 10 Å of any ligand heavy atom and having C within 12 Å of any ligand heavy atom and Cβ closer to any heavy atom in the ligand than Cα were then repacked using the soft_rep score function 22. All side chains, the rigid-body orientation of the ligand relative to the protein, and internal ligand torsions were minimized using the Rosetta energy function with the enzdes weights set. Backbone minimization was restricted so that Cα atoms could move < 0.05 Å from their pre-minimization positions. Ten trajectories were run, and the one with the lowest interface energy was selected.

Materials. Digoxigenin, digoxin, digitoxigenin, progesterone, and β-estradiol were purchased from Sigma Aldrich (St. Louis, MO) and used as received. DIG-BSA was purchased from CalBioreagents (San Mateo, CA; ~10 DIG molecules per BSA). EZ-Link Sulfo-NHS-Biotin was purchased from Thermo Fisher Scientific (Waltham, MA).

Ribonuclease A (RNase A) and DIG-NHS were from Sigma Aldrich (St. Louis, MO).

Reagents and solvents used for the synthesis of the digoxigenin derivatives were purchased from Sigma Aldrich and used without further purification. Dimethylsulfoxide was stored over activated molecular sieves (Sigma-Aldrich, 4 Å, beads 8-12 mesh) for at least 24 hours before use. High-resolution mass spectra (HRMS) were collected with an LCQ Fleet Ion Trap Mass Spectrometer (Thermo Scientific). Reverse-phase analytical high-pressure liquid chromatography (RP-HPLC) was run on a Dionex system equipped with a P680 pump, an ASI-100 automatic sample injector, and an UltiMate 3000 diode array detector for product visualization, using a Waters Symmetry C18 column (5 μm, 3.9 × 150 mm). Reverse-phase preparative high-pressure liquid chromatography was performed on a Dionex system equipped with an UltiMate 3000 pump and a UVD 170U UV-Vis detector for product visualization, using a Waters SunFire™ Prep C18 OBD™ column (5 μm, 19 × 150 mm). Proton and carbon nuclear magnetic resonance (NMR) spectra were recorded at room temperature on a Bruker Avance III 400 or on a Bruker DRX-600 equipped with a cryoprobe. Chemical shifts (δ) are reported in ppm relative to the solvent residual signals. Synthetic schemes are given in Fig. 21.

Biotinylation of DIG-BSA. Biotinylated DIG-BSA was prepared by reacting 50 μL of a 58 μM solution of DIG-BSA (2.9 nmol) with 8 μL of a 1.8125 mM solution of EZ-Link Sulfo-NHS-Biotin (14.5 nmol, 5 eq) in PBS for 1 hr at RT. A 10 μL portion of 14.5 mM glycine was added to quench the reaction. After 30 min, the reaction mixture was centrifuged, and soluble protein was separated from excess small molecules by repeated rounds of centrifugal concentration and dilution into PBS until the absorbance of the flow-through remained constant.

Synthesis of DIG-RNase-biotin. A 460 μL portion of a 365 μM solution of ribonuclease A (RNase A; 168 nmol) prepared in PBS was reacted with 30 μL of a 9.73 mM solution of EZ-Link Sulfo-NHS-Biotin (292 nmol, 1.7 eq) prepared in PBS and 10 μL of a 106.3 mM solution of DIG-NHS (1 μmol, 6 eq) prepared in DMSO for 1 hour at RT. A 20 μL portion of 385 mM glycine was added to quench the reaction. After 20 min, the reaction mixture was centrifuged, and soluble protein was separated from excess small molecules by repeated rounds of centrifugal concentration and dilution into PBS until the absorbance of the flow-through remained constant.
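The amounts and equivalents quoted in the two bioconjugation procedures follow from volume × concentration (1 μL × 1 μM = 1 pmol). A small checking sketch with illustrative helper names reproduces the DIG-RNase-biotin numbers: about 168 nmol of RNase A, about 292 nmol of biotin reagent (≈ 1.7 eq), and about 1.06 μmol of DIG-NHS (≈ 6 eq).

```python
def nmol(volume_ul, conc_um):
    """Amount in nmol from a volume in uL and a concentration in uM (uL x uM = pmol)."""
    return volume_ul * conc_um / 1000.0


def equivalents(amount_nmol, reference_nmol):
    """Molar equivalents of a reagent relative to the limiting species."""
    return amount_nmol / reference_nmol


# Values from the DIG-RNase-biotin synthesis
rnase = nmol(460, 365)        # RNase A, 365 uM
biotin = nmol(30, 9730)       # EZ-Link Sulfo-NHS-Biotin, 9.73 mM
dig_nhs = nmol(10, 106300)    # DIG-NHS, 106.3 mM

if __name__ == "__main__":
    print(f"RNase A {rnase:.1f} nmol; biotin {equivalents(biotin, rnase):.2f} eq; "
          f"DIG-NHS {equivalents(dig_nhs, rnase):.2f} eq")
```

The same helpers reproduce the DIG-BSA biotinylation: 8 μL of 1.8125 mM reagent (14.5 nmol) against 50 μL of 58 μM DIG-BSA (2.9 nmol) gives 5 eq.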

Synthesis of Biotin-PEG3-NH2 (2). Biotin (1, 13.5 mg, 55.3 μmol, 1 eq) was dissolved in 100 μL of dimethylsulfoxide (DMSO), and diisopropylethylamine (DIEA) was added (19.3 μL, 2 eq). O-(N-Succinimidyl)-N,N,N',N'-tetramethyluronium tetrafluoroborate (TSTU, 15.0 mg, 0.9 eq) was added, and the clear solution was stirred for 10 minutes at room temperature to form the biotin-NHS ester. 4,7,10-Trioxa-1,13-tridecanediamine (18 mg, 1.5 eq) was dissolved in 200 μL of dry DMSO, and the biotin-NHS ester solution was added dropwise under vigorous stirring over 5 minutes. The mixture was stirred for a further 10 minutes at room temperature. Diethyl ether (1.5 mL) was added to the clear solution, and the resulting suspension was centrifuged. The supernatant ether phase was discarded, and the remaining oil was purified by preparative RP-HPLC (5 mL/min, 10-100% acetonitrile in 0.1% TFA in H2O). The fractions containing the product were lyophilized to afford 2 as a yellowish liquid (1 mg, 67%). [HRMS (ESI): 447.42 m/z (447.7 m/z expected). 1H NMR (400 MHz, DMSO) δ 7.78 (t, 1 H, J = 5.6 Hz), 7.70 (m, 2 H), 6.42 (d, 1 H, J = 0.2 Hz), 6.37 (m, 1 H), 4.31 (m, 1 H), 4.13 (dd, 1 H, J = 7.6, 4.5 Hz), 3.50 (m, 11 H), 3.39 (t, 2 H, J = 6.3 Hz), 3.08 (m, 3 H), 2.85 (m, 3 H), 2.05 (t, 2 H, J = 7.4 Hz), 1.78 (m, 2 H), 1.61 (m, 4 H), 1.49 (m, 3 H), 1.30 (m, 2 H). 13C NMR (101 MHz, DMSO) δ 172.4, 163.2, 70.2, 70.1, 70.0, 70.0, 68.6, 67.8, 61.5, 59.7, 55.9, 40.3, 37.3, 36.2, 35.7, 29.9, 28.7, 28.5, 27.7, 25.8.]

Synthesis of Digoxigenin-PEG3-biotin (3). Digoxigenin-NHS ester (1 mg, 1.5 µmol) was dissolved in 100 µL of DMSO and DIEA (0.4 mg, 3.0 µmol) was added, followed by 2 (1.3 mg, 3.0 µmol). The reaction was stirred for 10 minutes at room temperature and then purified by preparative HPLC (5 mL/min, 10-100% acetonitrile in 0.1% TFA in H2O). The fractions containing the product were lyophilized to afford 3 as a yellowish liquid (0.4 mg, 27%). [HRMS (ESI): 990.4 m/z (990.6 m/z expected). 1H NMR (400 MHz, DMSO) δ 7.74 (m, 2 H), 7.52 (m, 1 H), 6.44 (s, 1 H), 6.34 (s, 1 H), 5.83 (s, 1 H), 4.88 (m, 3 H), 4.32 (m, 2 H), 4.15 (m, 2 H), 3.77 (m, 1 H), 3.60 (m, 1 H), 3.52 (m, 2 H), 3.47 (m, 2 H), 3.44-3.2 (30 H), 3.08 (m, 2 H), 2.84 (m, 1 H), 2.81 (m, 1 H), 2.60 (m, 1 H), 2.57 (m, 2 H), 2.45 (m, 1 H), 2.05 (m, 2 H), 1.74 (m, 2 H), 1.61 (m, 4 H), 1.44 (m, 3 H), 1.25 (m, 2 H), 0.87 (s, 2 H), 0.66 (s, 2 H)]

Synthesis of Alexa488-PEG3-NH2 (5). Alexa Fluor 488 (4, 4.74 mg, 8.9 µmol) was dissolved in 100 µL of DMSO and treated with DIEA (3.1 µL, 17.8 µmol), followed by TSTU (3.22 mg, 10.7 µmol). The reaction was stirred at room temperature for 10 minutes. 4,7,10-Trioxa-1,13-tridecanediamine (3.92 mg, 17.8 µmol) was dissolved in 100 µL of dry DMSO, and the Alexa 488 reaction mixture was added dropwise under vigorous stirring over 5 minutes. The clear orange solution was stirred for 10 minutes at room temperature and then purified by preparative HPLC (5 mL/min, 10-100% acetonitrile in 0.1% TFA in H2O). The fractions containing the product were lyophilized to afford 5 as a deep red liquid (2.8 mg, 43%). [HRMS (ESI): 738.3 m/z (738.7 m/z expected). 1H NMR (400 MHz, DMSO) δ 8.74 (m, 1 H), 8.62 (m, 1 H), 8.26 (m, 1 H), 7.62 (m, 2 H), 7.26 (m, 1 H), 6.86 (m, 3 H), 3.54 (m, 4 H), 3.48 (m, 2 H), 3.3-3.4 (6 H), 2.83 (m, 2 H), 2.08 (d, 1 H, J = 0.7 Hz), 1.84 (m, 2 H), 1.73 (m, 2 H), 1.25 (m, 1 H), 1.10 (t, 4 H, J = 7.0 Hz)]

Synthesis of Digoxigenin-PEG3-Alexa488 (6). Compound 5 (0.56 mg, 0.76 µmol) was dissolved in 200 µL of DMSO and treated with DIEA (0.20 mg, 1.52 µmol). Digoxigenin-NHS ester (0.5 mg, 0.8 µmol) was added at once, and the reaction was stirred for 10 minutes at room temperature and then purified by preparative HPLC (5 mL/min, 10-100% acetonitrile in 0.1% TFA in H2O). The fractions containing the product were lyophilized to afford 6 as a deep red liquid (0.59 µmol, 78%). [1H NMR (400 MHz, DMSO) δ 8.87 (s, 1 H), 8.69 (s, 3 H), 8.27 (dd, 1 H, J = 7.9, 1.3 Hz), 7.74 (m, 1 H), 7.54 (m, 2 H), 7.00 (dd, 4 H, J = 3.2, 1.6 Hz), 5.81 (s, 1 H), 4.86 (m, 3 H), 3.8-3.5 (31 H), 3.37 (m, 3 H), 3.22 (m, 2 H), 3.16 (s, 2 H), 3.06 (m, 2 H), 2.08 (s, 4 H), 2.02 (t, 2 H, J = 7.3 Hz), 1.81 (d, 2 H, J = 6.5 Hz), 1.73 (m, 2 H), 1.59 (m, 4 H), 1.43 (m, 7 H), 1.19 (d, 3 H, J = 6.2 Hz), 1.07 (m, 2 H), 0.86 (s, 2 H), 0.64 (s, 2 H).]

Gene synthesis. Designs DIG1-17, DigA16, and 3hk4 were ordered from Genscript (Piscataway, NJ) between the NdeI and XhoI restriction sites of a custom pET29-based vector having an N-terminal FLAG tag and a C-terminal His6 tag (pET29FLAG). Codon usage was optimized for both E. coli and yeast, with preference given to E. coli. DNA sequences are given in Supplementary Table 1.

Yeast surface display assays. Designed proteins were tested for binding using yeast surface display 23. Designs DIG1-17/pET29FLAG, DigA16/pET29FLAG, and 3hk4/pET29FLAG were subcloned into the NdeI/XhoI cloning sites of pETCON 24. Designs and control proteins in pETCON were transformed into EBY100 cells using lithium acetate and polyethylene glycol 25, with dH2O instead of single-stranded carrier DNA, and were plated on selective media (C -ura -trp). Freshly transformed cells were inoculated into 1 mL of SDCAA media 26 and grown at 30 °C, 200 rpm. After ~12 hrs, 1e7 cells were collected by centrifugation at 1,700 x g for 3 min and resuspended in 1 mL of SGCAA media to induce protein expression. Following induction for 24-48 hrs at 18 °C, 4e6 cells were collected by centrifugation and washed twice by incubation with PBSF (PBS supplemented with 1 g/L of BSA) for 10 min at room temperature.

Yeast surface protein expression was monitored by binding of anti-c-myc-FITC (Miltenyi Biotec GmbH, Germany) to the C-terminal myc epitope tag of the displayed protein. DIG binding was assessed by quantifying the phycoerythrin (PE) fluorescence of the displaying yeast population following incubation with DIG-BSA-biotin, DIG-RNase-biotin, or DIG-PEG3-biotin, and streptavidin-phycoerythrin (SAPE; Invitrogen, Carlsbad, CA). In a typical experiment using DIG-BSA-biotin or DIG-RNase-biotin, 4e6 cells were resuspended in 50 µL of a premixed solution of PBSF containing a 1:100 dilution of anti-c-myc-FITC, 2.66 µM DIG-BSA-biotin or DIG-RNase-biotin, and 664 nM SAPE. Following a 2-4 hr incubation at 4 °C in the dark on a rotator, cells were collected by centrifugation at 1,700 x g for 3 min and washed with 200 µL of PBSF at 4 °C. Cell pellets were resuspended in 200 µL of ice-cold PBSF immediately before use. Cellular fluorescence was monitored on an Accuri C6 flow cytometer using a 488 nm laser for excitation and a 575 nm band-pass filter for emission. Phycoerythrin fluorescence was compensated to minimize bleed-over contributions from the FITC fluorescence channel.
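Compensation removes the portion of the FITC signal that spills into the PE detector, so that well-displayed but non-binding cells do not appear PE-positive. A minimal two-channel sketch, assuming a linear spillover model; the spillover coefficients and event values are hypothetical (in practice the coefficients are typically estimated from single-stained controls and applied by the analysis software as a compensation matrix).

```python
def compensate_two_channel(fitc_obs, pe_obs, s_fitc_to_pe, s_pe_to_fitc=0.0):
    """Return (fitc, pe) after removing spectral spillover between two channels.

    Model: fitc_obs = fitc + s_pe_to_fitc * pe
           pe_obs   = pe   + s_fitc_to_pe * fitc
    Solving this 2x2 linear system gives the compensated signals.
    """
    det = 1.0 - s_fitc_to_pe * s_pe_to_fitc
    if abs(det) < 1e-12:
        raise ValueError("spillover matrix is singular")
    fitc = (fitc_obs - s_pe_to_fitc * pe_obs) / det
    pe = (pe_obs - s_fitc_to_pe * fitc_obs) / det
    return fitc, pe


# Hypothetical event: bright FITC (high display) and dim true PE (weak binder),
# with hypothetical 20% FITC->PE and 1% PE->FITC spillover.
fitc, pe = compensate_two_channel(fitc_obs=10005.0, pe_obs=2500.0,
                                  s_fitc_to_pe=0.20, s_pe_to_fitc=0.01)
print(round(fitc), round(pe))  # 10000 500
```

Without compensation this event would report a PE signal five times higher than its true binding signal.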

Two positive controls having different affinities for digoxigenin were used to assess the binding assay: DigA16 20, and a commercially available anti-DIG monoclonal antibody 9H27L19 (Life Technologies). Experiments using DigA16 were conducted in an identical fashion to those with designs DIG1-17. For those employing the DIG antibody, two tandem Z domains of protein A (ZZ domain) 27,28 were displayed on the yeast cell surface. Washed cells were resuspended in 20 µL of PBSF with 2 µL of rabbit anti-DIG mAb 9H27L19 (Invitrogen, Carlsbad, CA). Following a 30-min incubation at 4 °C on a rotator, excess antibody was removed by washing the cells with 200 µL of PBSF. Labeling reactions were then performed as above. Negative controls for binding were the ZZ domain without mAb and an orthogonal gp120-based library available in the Baker lab (82). FlowJo software version 7.6 was used to analyze all flow cytometry data presented here.

Competition assays with free digoxigenin were performed as above, except that between 750 µM and 1.5 mM of digoxigenin (Sigma Aldrich, St. Louis, MO), prepared as a stock solution in MeOH, was added to each labeling reaction mixture. Control experiments performed in a similar manner showed that the small amount of MeOH added does not affect the fluorescence or binding properties of SAPE (data not shown).

Knockout mutations. Knockout mutations were introduced into the appropriate DIG design in pETCON or pET29b(+) by the method of Kunkel 29. These variants included the single point mutants V117R, Y101F, Y115F, and Y34F and the triple mutant Y101F/Y115F/Y34F for DIG10; the single point mutants W119R, H58A, Y84F, and Y97F and the double mutant Y115F/Y84F for DIG5; the single point mutants V86R and H101A and the triple mutant Y10F/H101A/Y103F for DIG8; and the single point mutants Y34F, Y101F, and Y115F, the double mutant Y99F/Y101F, the triple mutant Y34F/Y99F/Y101F, and the quadruple mutant Y34F/Y99F/Y101F/Y115F for DIG10.3. Oligos were ordered from Integrated DNA Technologies, Inc. (Coralville, IA) and are listed in Supplementary Table 12 with the mutagenized region(s) highlighted in red.

Recursive PCR assembly of 1z1s. The gene for 1z1s having additional pETCON overlap fragments at either end for yeast homologous recombination was assembled via recursive PCR. Oligo sequences were designed using DNAWorks 30 and are given in Supplementary Table 13. Oligos were ordered from Integrated DNA Technologies, Inc. (Coralville, IA). A 2 µL portion of a 2.5 µM stock solution of each oligo was combined, and the mixture was added to 8 µL of 1.25 mM dNTPs, 20 µL of 5X Phusion buffer HF, 3 µL of DMSO, and 1 µL of Phusion high-fidelity polymerase (NEB, Waltham, MA) in 100 µL. Full-length gene product was assembled by 30 cycles of PCR (98 °C 10 s, 61 °C 30 s, 72 °C 15 s). Correctly assembled PCR product was amplified by a second round of PCR. Reaction product (5 µL) was combined with 2 µL of 10 µM pCTCON2f (Supplementary Table 14), 2 µL of 10 µM pCTCON2r (Supplementary Table 14), 8 µL of 1.25 mM dNTPs, 20 µL of 5X Phusion buffer HF, 3 µL of DMSO, and 1 µL of Phusion high-fidelity polymerase (NEB, Waltham, MA) in 100 µL. Product was obtained by 30 cycles of PCR (98 °C 10 s, 60 °C 30 s, 72 °C 15 s). Following confirmation of a single band at the correct molecular weight by 1% agarose gel electrophoresis, the PCR product was purified using a Qiagen PCR cleanup kit (Qiagen) and eluted in dH2O.
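The final concentrations in these 100 µL reactions follow from C1V1 = C2V2. This sketch computes them for the second-round amplification reaction and for a single assembly oligo in the first round; it assumes the volumes are exactly as stated and treats neat DMSO as 100%.

```python
def final_conc(stock: float, volume_added_ul: float, total_volume_ul: float = 100.0) -> float:
    """C1 * V1 = C2 * V2, so C2 = C1 * V1 / V2 (same units as the stock)."""
    return stock * volume_added_ul / total_volume_ul


# Second-round (amplification) reaction from the text: (stock, unit, uL added).
amplification = {
    "dNTPs": (1.25, "mM", 8.0),
    "pCTCON2f": (10.0, "uM", 2.0),
    "pCTCON2r": (10.0, "uM", 2.0),
    "Phusion buffer HF": (5.0, "X", 20.0),
    "DMSO": (100.0, "% v/v", 3.0),  # neat DMSO treated as 100%
}
for name, (stock, unit, vol) in amplification.items():
    print(f"{name}: {final_conc(stock, vol):g} {unit}")

# First-round assembly: 2 uL of each 2.5 uM oligo in 100 uL.
print(f"each assembly oligo: {final_conc(2.5, 2.0) * 1000:g} nM")
```

This gives 0.1 mM dNTPs, 0.2 µM of each flanking primer, 1X buffer, 3% DMSO, and 50 nM of each assembly oligo.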

Yeast EBY100 cells were transformed with 240 ng of 1z1s gene DNA and 400 ng of gel-purified pETCON digested with NdeI and XhoI using lithium acetate and polyethylene glycol 25, with dH2O instead of single-stranded carrier DNA. The correct sequence was confirmed by colony PCR and gene sequencing, and plasmids from these colonies were harvested using a Zymoprep Yeast Miniprep II kit (Zymo Research Corporation, Irvine, CA).

DIG10 site-saturation mutagenesis library (directed evolution round 1a). A DIG10 single site-saturation mutagenesis (SSM) library was constructed by Kunkel mutagenesis 29 using degenerate NNK primers targeting the following 34 amino acid positions: S10, L11, L14, W22, L32, Y34, A37, P38, G40, H41, H54, M55, L57, F58, Y61, V62, V64, F66, F84, G86, G88, H90, V92, S93, L97, A99, Y101, S103, Y115, V117, F119, V124, A127, and L128. These positions were chosen from the model based on the following requirements: (1) they have Cα within 7 Å of any ligand heavy atom, and/or (2) they have Cα within 9 Å of any ligand heavy atom and Cβ closer to any heavy atom in the ligand than Cα. The theoretical library size was 1088 clones. Primers were ordered from Integrated DNA Technologies (Coralville, IA).
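The two position-selection criteria can be expressed directly on model coordinates. This is a sketch under stated assumptions: residues are supplied as hypothetical (Cα, Cβ) coordinate pairs keyed by an ID, "within" is taken as less than or equal to, and "Cβ closer to the ligand than Cα" is interpreted as Cβ's minimum distance to any ligand heavy atom being smaller than Cα's. Reading coordinates from an actual model file is left out.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def min_dist(atom: Vec, ligand: Sequence[Vec]) -> float:
    """Smallest distance (in A) from one atom to any ligand heavy atom."""
    return min(math.dist(atom, lig) for lig in ligand)


def select_positions(residues: Dict[str, Tuple[Vec, Optional[Vec]]],
                     ligand: Sequence[Vec],
                     ca_cutoff: float = 7.0,
                     pointing_cutoff: float = 9.0) -> List[str]:
    """Apply the two selection criteria from the text.

    (1) CA within ca_cutoff of any ligand heavy atom, or
    (2) CA within pointing_cutoff and CB closer to the ligand than CA.
    `residues` maps an ID such as "Y34" to (CA, CB); CB is None for glycine.
    """
    chosen = []
    for rid, (ca, cb) in residues.items():
        d_ca = min_dist(ca, ligand)
        if d_ca <= ca_cutoff:
            chosen.append(rid)
        elif d_ca <= pointing_cutoff and cb is not None and min_dist(cb, ligand) < d_ca:
            chosen.append(rid)
    return chosen


# Toy coordinates (hypothetical): a single ligand atom at the origin.
ligand_atoms = [(0.0, 0.0, 0.0)]
toy = {
    "R1": ((5.0, 0.0, 0.0), (4.0, 0.0, 0.0)),   # criterion 1
    "R2": ((8.0, 0.0, 0.0), (7.0, 0.0, 0.0)),   # criterion 2: CB points inward
    "R3": ((8.0, 0.0, 0.0), (9.0, 0.0, 0.0)),   # CB points away: rejected
    "G4": ((8.5, 0.0, 0.0), None),              # glycine beyond 7 A: rejected
    "R5": ((10.0, 0.0, 0.0), (9.0, 0.0, 0.0)),  # CA beyond 9 A: rejected
}
print(select_positions(toy, ligand_atoms))  # ['R1', 'R2']
```

The second criterion captures residues somewhat farther from the ligand whose side chains nevertheless point into the pocket.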

Kunkel mutagenesis of each position was carried out independently. DNA from each reaction was dialyzed into dH2O using a 0.025 µm membrane filter (Millipore, Billerica, MA), and then the dialyzed reaction mixtures were pooled, concentrated to a volume of < 10 µL using a Savant SpeedVac centrifugal vacuum concentrator, and transformed into yeast strain EBY100 using the method of Benatuil 31, yielding 2.5e5 transformants. After transformation, cells were grown in 250 mL of SDCAA media for 36 hrs at 30 °C. Cells (5e8) were collected by centrifugation at 1,700 x g for 4 min, resuspended in 50 mL of SGCAA media, and induced at 18 °C for 24 hrs.
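The stated theoretical size of 1088 clones is consistent with NNK codons (32 codons per position) at 34 positions, counted at the codon level including each position's parent codon. The sketch below reproduces that number and estimates how thoroughly 2.5e5 transformants sample it, assuming every variant is equally represented and transformants are drawn independently (a Poisson approximation that pooled Kunkel reactions only roughly satisfy).

```python
import math

NNK_CODONS = 4 * 4 * 2   # N = A/C/G/T at codon positions 1-2, K = G/T at position 3
POSITIONS = 34           # positions targeted in the SSM library
TRANSFORMANTS = 2.5e5    # yeast transformants obtained

library_size = POSITIONS * NNK_CODONS   # single-site variants at the codon level
oversampling = TRANSFORMANTS / library_size
# Poisson approximation: expected fraction of variants seen at least once.
expected_coverage = 1.0 - math.exp(-oversampling)

print(library_size, round(oversampling))  # 1088 230
print(f"{expected_coverage:.6f}")
```

With roughly 230-fold oversampling, essentially every designed variant should be present in the transformed population under these assumptions.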

Cells were subjected to three rounds of permissive cell sorting (Supplementary Table 8). For each round of sorting, cells were washed and then labeled with a pre-incubated mixture of 2.66 µM DIG-BSA-biotin, 664 nM SAPE, and anti-c-myc-FITC as noted above for single clones. After each sort, cells were grown in SDCAA for 24 hrs and then induced in SGCAA for 24 hrs before the next sort. After the final sort, the mean compensated PE fluorescence of the expressing population of the sorted cells was considerably higher than that of DIG10, indicating the presence of one or more point mutants with increased binding affinity. After each sort, a portion of the cells was plated and grown at 30 °C. Plasmids from individual colonies were harvested using a Zymoprep Yeast Miniprep II kit (Zymo Research Corporation, Irvine, CA), and the gene was amplified by 30 cycles of PCR (98 °C 10 s, 61 °C 30 s, 72 °C 15 s) using Phusion high-fidelity polymerase (NEB, Waltham, MA) with the pCTCON2r and pCTCON2f primers. Sanger sequencing (Genewiz, Inc., South Plainfield, NJ) was used to sequence at least 10 colonies from each population.

DIG10 combinatorial mutagenesis library (directed evolution round 1b). Beneficial mutations identified in the DIG10 SSM library were combined by Kunkel mutagenesis 29 using degenerate primers. At each mutagenized position, the original DIG10 amino acid and amino acids chemically similar to those identified were also allowed, resulting in a combinatorial library. Amino acid substitutions included S, A, or M at position S10; L, H, or Q at position L11; A or P at position A37; I, L, V, F, or M at position V62; I, L, V, F, or M at position V64; H, T, or at position H90; I, L, V, F, or M at position V117; and A or P at position A127. The theoretical library size was 1.35e4 clones. Primers were ordered from Integrated DNA Technologies, Inc. (Coralville, IA).
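The theoretical size of a combinatorial library is the product of the number of residues allowed at each position. In this sketch the counts come from the substitution list for round 1b; the third residue allowed at H90 is not legible in the source, so only its count is used, and three is the count consistent with the stated 1.35e4 clones. The figure is at the protein level; degenerate codons can make the DNA-level diversity larger.

```python
from math import prod

# Number of residues allowed at each position in the round 1b library.
allowed_counts = {
    "S10": 3,   # S, A, M
    "L11": 3,   # L, H, Q
    "A37": 2,   # A, P
    "V62": 5,   # I, L, V, F, M
    "V64": 5,   # I, L, V, F, M
    "H90": 3,   # H, T, plus one residue not legible in the source (count inferred)
    "V117": 5,  # I, L, V, F, M
    "A127": 2,  # A, P
}

library_size = prod(allowed_counts.values())
print(library_size)  # 13500, matching the stated 1.35e4
```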

Four independent Kunkel reactions using different oligo concentrations ranging from 36 nM to 291 nM during polymerization were performed to minimize sequence-dependent priming bias. For the same reason, oligos encoding native substitutions contained at least one codon base change. Library DNA was pooled, prepared as above, and transformed into electrocompetent E. coli strain III .2 h: 1)1.3 ) cells (1800 V, 200 Ω, 25 µF), yielding 8e4 transformants. Library plasmid DNA was isolated from expanded cultures using a Qiagen miniprep kit. Gene insert was amplified from 10 ng of library DNA by 30 cycles of PCR (98 °C 10 s, 61 °C 30 s, 72 °C 15 s) using Phusion high-fidelity polymerase (NEB, Waltham, MA) with the pCTCON2r and pCTCON2f primers.

Yeast EBY100 cells were transformed with 4.0 µg of PCR-purified DNA insert and 1.0 µg of gel-purified pETCON digested with NdeI and XhoI using the method of Benatuil 31, yielding 8e5 transformants. After transformation, cells were grown in 150 mL of low-pH SDCAA media supplemented with Pen/Strep for 48 hrs at 30 °C. Cells (5e8) were collected by centrifugation at 1,700 x g for 4 min, resuspended in 50 mL of SGCAA media, and induced at 18 °C for 24 hrs.

Cells were subjected to seven rounds of cell sorting. For the first four rounds, cells were washed and then labeled with a pre-incubated mixture of DIG-BSA-biotin, SAPE, and anti-c-myc-FITC as noted above for single clones. Label concentrations for rounds one through four were: (1) 1 µM DIG-BSA-biotin and 250 nM SAPE, (2) 750 nM DIG-BSA-biotin and 187.5 nM SAPE, (3) 50 nM DIG-BSA-biotin and 12.5 nM SAPE, and (4) 5 nM DIG-BSA-biotin and 1.25 nM SAPE. For rounds five through seven, DIG-RNase-biotin was used in a multistep labeling procedure to minimize selection for carrier protein (BSA) binding and because this procedure showed a larger dynamic range in several control experiments. In these experiments, cells were washed as before, labeled with DIG-RNase-biotin for 3-4 hrs at 4 °C, and then treated with a solution of PBSF containing a 1:100 dilution of SAPE and a 1:100 dilution of anti-c-myc-FITC (secondary label) for < 15 min at 4 °C before washing and sorting. DIG-RNase-biotin label concentrations were 10 pM, 5 pM, and 5 pM for rounds five through seven, respectively. At least 10 clones from each round were sequenced as noted for the DIG10 SSM library. After seven rounds, the library converged to two sequences differing by a single point substitution: DIG10.1a, harboring the S10A substitution, and DIG10.1b, containing S10M (Supplementary Fig. 5). Mutations common to both DIG10.1a and DIG10.1b were V62M, V64I, V117L, and A127P. Analysis of both clones using the multistep labeling procedure with 5 pM DIG-RNase-biotin showed that DIG10.1a had a slightly higher mean PE fluorescence signal for the expressing population than did DIG10.1b.

DIG10.1L library (directed evolution round 2). Library DNA was a mixture of DNA from DIG10.1L_f1, DIG10.1L_f2, and a third library (DIG10.1L_3) combining mutations from the two fragment libraries (see the section on next-generation library construction). For DIG10.1L_3, the library was constructed using the oligos DIG10.1L_hr1, DIG10.1L_f1a_rc_variable, DIG10.1L_f1b_variable, DIG10.1L_f2_rc_variable, and DIG10.1L_hr2 and the procedures detailed below.

Yeast EBY100 cells were transformed with a mixture of DNA insert from DIG10.1L_f1 (3.0 µg), DIG10.1L_f2 (3.0 µg), and DIG10.1L_3 (24.0 µg) and 10.0 µg of gel-purified pETCON digested with NdeI and XhoI using the method of Benatuil 31, yielding 1.5e7 transformants. After transformation, cells were grown for 24 hrs in 250 mL of low-pH SDCAA media supplemented with Pen/Strep at 30 °C, passaged once, and grown for an additional 24 hrs under the same conditions. Cells (5e8) were collected by centrifugation, resuspended in 50 mL of SGCAA, and induced overnight at 18 °C.

Cells were subjected to five rounds of cell sorting using monovalent DIG-PEG3-biotin and the multistep labeling procedure detailed for directed evolution round 1b sorts five through seven, to increase stringency by avoiding avidity effects. DIG-PEG3-biotin label concentrations were 80 nM, 80 nM, 50 nM, 1 nM, and 1 nM for the five rounds. After the final sort, the mean compensated PE fluorescence of the expressing population of the sorted cells was considerably higher than that of DIG10.2, indicating the presence of one or more mutants with increased binding affinity.

At least 10 clones from each round were sequenced as noted for the DIG10 SSM library. After four rounds, the library converged to one sequence (DIG10.2) having the two loop mutations A37P and H41Y, which were two of the most enriched single point mutations identified in the next-generation sequencing experiment.

DIG10.2 combinatorial library based on deep sequencing data (directed evolution round 3). Mutations having normalized next-generation sequencing enrichment values (ΔE) > ~3.5 were combined by Kunkel mutagenesis 29 using degenerate primers. DIG10.2 was used as the library template. At each mutagenized position, the original DIG10.2 amino acid and amino acids chemically similar to those identified were also allowed, resulting in a combinatorial library. Amino acid substitutions included C or S at position C23; F, H, or Y at position F45; M or F at position M62; H, I, L, F, or Y at position H90; V or A at position V92; A, V, I, T, F, or Y at position A99; S, A, or V at position S103; L, V, or W at position L105; I or F at position I112; V or F at position V124; and P, I, L, or V at position P127. The theoretical library size was 1.04e5 clones. Primers were ordered from Integrated DNA Technologies (Coralville, IA).
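For the round 3 library every allowed residue is listed, so the library members can be enumerated explicitly. This sketch (not the authors' code) generates protein-level variants lazily with `itertools.product`; because each residue string below starts with the DIG10.2 parent residue, the first member generated is the parent sequence at these positions.

```python
from itertools import product

# Allowed residues per position for the round 3 library, as listed in the text.
# Each string starts with the DIG10.2 (parent) residue.
ALLOWED = {
    23: "CS", 45: "FHY", 62: "MF", 90: "HILFY", 92: "VA", 99: "AVITFY",
    103: "SAV", 105: "LVW", 112: "IF", 124: "VF", 127: "PILV",
}


def library_members(allowed):
    """Lazily yield each protein-level library member as {position: residue}."""
    positions = sorted(allowed)
    for combo in product(*(allowed[p] for p in positions)):
        yield dict(zip(positions, combo))


size = sum(1 for _ in library_members(ALLOWED))
first = next(library_members(ALLOWED))
print(size)  # 103680, i.e. ~1.04e5
print("".join(first[p] for p in sorted(first)))  # CFMHVASLIVP (parent residues)
```

Generating members lazily keeps memory use flat even though the library holds about 10^5 combinations.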

Four Kunkel reactions using different oligo concentrations ranging from 36 nM to 291 nM during polymerization, and two Kunkel reactions using reduced oligo concentrations for the M62M substitution relative to those for the M62F substitution, were performed to minimize sequence-dependent priming bias. For the same reason, oligos encoding native substitutions contained at least one codon base change. Library DNA was pooled, prepared as above, and transformed into E. coli strain ElectroMAX DH10B (Invitrogen, Carlsbad, CA) cells (2500 V, 200 Ω, 25 µF), yielding 1.6e7 transformants. Library plasmid DNA was isolated from expanded cultures using a Qiagen miniprep kit. Gene insert was amplified from 10 ng of library DNA by 30 cycles of PCR (98 °C 10 s, 61 °C 30 s, 72 °C 15 s) using Phusion high-fidelity polymerase (NEB, Waltham, MA) with the pCTCON2r and pCTCON2f primers.

Yeast EBY100 cells were transformed with 6.0 µg of PCR-purified DNA insert and 2.0 µg of gel-purified pETCON digested with NdeI and XhoI using the method of Benatuil 31, yielding 5e6 transformants. After transformation, cells were grown in 150 mL of low-pH SDCAA media supplemented with Pen/Strep for 48 hrs at 30 °C. Cells (5e8) were collected by centrifugation at 1,700 x g for 4 min and resuspended in 50 mL of SGCAA media. Cells were induced at 18 °C for 24 hrs.

Cells were subjected to four rounds of cell sorting (Fig. 13). The first three sorts utilized monovalent DIG-PEG3-biotin and the multistep labeling procedure detailed for directed evolution round 1b sorts five through seven, to increase stringency by avoiding avidity effects. DIG-PEG3-biotin label concentrations were 400 pM, 20 pM, and 20 pM for the first three rounds. For the fourth round, an off-rate selection 24 was used to better discriminate between high-affinity binders. Cells (4e6) were washed and labeled with 20 pM DIG-PEG3-biotin as described above. Labeled cells were collected by centrifugation at 1,700 x g for 4 min and resuspended in 100 µL of 100 nM DIG in PBSF. Cells were incubated with free DIG for 20 min at room temperature (20 min was found to be the half-life of the DIG10.2-DIG-PEG3-biotin complex in off-rate experiments), collected by centrifugation, labeled with secondary label as described above, washed, and sorted. After the final sort, the mean compensated PE fluorescence of the expressing population of the sorted cells was considerably higher than that of DIG10.2, indicating the presence of one or more mutants with increased binding affinity.
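The 20-minute chase can be read through first-order dissociation kinetics: with excess free DIG blocking rebinding, the fraction of labeled complex remaining after time t is exp(-k_off·t), and k_off = ln 2 / t½. This sketch assumes simple single-exponential dissociation; the 10- and 40-minute half-lives are hypothetical comparison values, not measured variants.

```python
import math


def koff_from_half_life(t_half_s: float) -> float:
    """First-order dissociation rate constant (s^-1) from a half-life in seconds."""
    return math.log(2) / t_half_s


def fraction_bound(koff: float, t_s: float) -> float:
    """Fraction of labeled complex remaining after t_s seconds, assuming no rebinding."""
    return math.exp(-koff * t_s)


CHASE_S = 20 * 60                        # 20-min incubation with free DIG
k_parent = koff_from_half_life(20 * 60)  # half-life reported for DIG10.2
print(f"k_off ~ {k_parent:.2e} s^-1")

for t_half_min in (10, 20, 40):          # hypothetical variants for comparison
    k = koff_from_half_life(t_half_min * 60)
    print(t_half_min, round(fraction_bound(k, CHASE_S), 3))
# prints 10 0.25, then 20 0.5, then 40 0.707
```

Choosing a chase time equal to the parent's half-life leaves the parent at 50% labeling, so slower-dissociating variants stand out above it and faster ones fall below.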

At least 10 clones from each round were sequenced as noted for the DIG10 SSM library. After four rounds, the library converged to one sequence (DIG10.3) having the mutations C23S, H90L, V92A, A99Y, S103A, and L105W.

Next-generation DIG10.1 library construction and selections. Paired-end 151 Illumina sequencing was used to simultaneously assess the effects of mutation on binding of DIG10.1 to digoxigenin at 39 amino acid positions within the binding-site pocket. Two libraries were constructed: an N-terminal library with mutations between residues S10 and F66 (fragment 1 library, DIG10.1L_f1) and a C-terminal library with mutations between residues F84 and L128 (fragment 2 library, DIG10.1L_f2). For each library, the full-length DIG10.1 gene having additional pETCON overlap fragments at either end for yeast homologous recombination was assembled via recursive PCR. To introduce mutations, we used degenerate PAGE-purified oligos in which selected positions within the binding site were doped with a small amount of each non-native base at a level expected to yield 1-2 mutations per gene (TriLink BioTechnologies, San Diego, CA). All other wild-type oligos were also PAGE-purified (Integrated DNA Technologies). For DIG10.1L_f1, bases coding for the following 20 amino acid positions were allowed to vary: A10, L11, L14, W22, C23, F26, L32, Y34, A37, P38, G40, H41, F45, H54, M55, F58, Y61, M62, I64, and F66. For DIG10.1L_f2, bases coding for the following 19 amino acid positions were allowed to vary: F84, G86, G88, H90, V92, S93, G95, L97, A99, Y101, S103, L105, I112, Y115, L117, F119, V124, P127, and L128.
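The doping level controls how many substitutions each library member carries. A sketch using a binomial model, assuming all three bases of each of the 20 fragment 1 codons were doped and that each doped base is independently non-native with probability p; the value of p is hypothetical, chosen to give a mean of 1.5 base changes per gene. Some base changes are synonymous, so amino acid mutations per gene would be somewhat fewer.

```python
from math import comb


def substitution_distribution(n_bases: int, p: float, k_max: int) -> list:
    """Binomial P(k base substitutions) for k = 0..k_max among n_bases doped bases."""
    return [comb(n_bases, k) * p**k * (1 - p) ** (n_bases - k) for k in range(k_max + 1)]


N_BASES = 20 * 3            # fragment 1: 20 codons, assuming all 3 bases were doped
TARGET_MEAN = 1.5           # middle of the 1-2 mutations per gene target
p = TARGET_MEAN / N_BASES   # hypothetical per-base probability of a non-native base

dist = substitution_distribution(N_BASES, p, k_max=4)
for k, prob in enumerate(dist):
    print(k, f"{prob:.3f}")
```

Under this model roughly a fifth of genes carry no change and single and double substitutions dominate, which suits measuring the effect of individual mutations.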

For assembly of DIG10.1L_f1, 2 µL of 2.5 µM DIG10.1L_hr1, 2 µL of 2.5 µM DIG10.1L_f1a_rc_variable, 2 µL of 2.5 µM DIG10.1L_f1b_variable, 2 µL of 2.5 µM DIG10.1L_f2_rc_WT, and 2 µL of 2.5 µM DIG10.1L_hr2 were combined with 8 µL of 1.25 mM dNTPs, 20 µL of 5X Phusion buffer HF, 3 µL of DMSO, and 1 µL of Phusion high-fidelity polymerase (NEB, Waltham, MA) in 100 µL. Reaction mixtures for assembly of DIG10.1L_f2 were the same, except that DIG10.1L_f1a_rc_variable, DIG10.1L_f1b_variable, and DIG10.1L_f2_rc_WT were substituted with DIG10.1L_f1a_rc_WT, DIG10.1L_f1b_WT, and DIG10.1L_f2_rc_variable, respectively. Full-length products were assembled by 30 cycles of PCR (98 °C 10 s, 61 °C 30 s, 72 °C 15 s).

Correctly assembled PCR products were amplified by a second round of PCR. Reaction products (5 µL) were combined with 2 µL of 10 µM DIG10.1L_assembly_fwd, 2 µL of 10 µM DIG10.1L_assembly_rev, 8 µL of 1.25 mM dNTPs, 20 µL of 5X Phusion buffer HF, 3 µL of DMSO, and 1 µL of Phusion high-fidelity polymerase (NEB, Waltham, MA) in 100 µL. Products were amplified by 30 cycles of PCR (98 °C 10 s, 60 °C 30 s, 72 °C 15 s). Following confirmation of a single band at the correct molecular weight by 1% agarose gel electrophoresis, PCR products were purified using a Qiagen PCR cleanup kit (Qiagen) and eluted in ddH2O.

Yeast EBY100 cells were transformed with 5.4 µg of library DNA insert and 1.8 µg of gel-purified pETCON digested with NdeI and XhoI using the method of Benatuil 31, yielding 4e6 and 3e6 transformants for the DIG10.1L_f1 and DIG10.1L_f2 libraries, respectively. After transformation, cells were grown for 24 hrs in 100 mL of low-pH SDCAA media supplemented with Pen/Strep at 30 °C, passaged once, and grown for an additional 24 hrs under the same conditions. Cells (5e8) were collected by centrifugation, resuspended in 50 mL of SGCAA, and induced overnight at 18 °C.

Induced cells (3e7) were labeled with 4 µL of anti-c-myc-FITC (Miltenyi Biotec GmbH, Germany) in 200 µL of PBSF for 20 min (DIG10.1L_f1) or 60 min (DIG10.1L_f2) at 4 °C. The labeled cells were then washed with PBSF and sorted. In this first round of sorting, all cells showing a positive signal for protein expression were collected (Fig. 10). Cells were recovered overnight in ~1 mL of low-pH SDCAA supplemented with Pen/Strep at 30 °C, pelleted by centrifugation at 1,700 x g for 4 min, resuspended in 5 mL of low-pH SDCAA supplemented with Pen/Strep, and grown for an additional 24 hrs at 30 °C. Cells (2e7) were collected by centrifugation, resuspended in 2 mL of SGCAA, and induced overnight at 18 °C.

Induced cells from expression-sorted DIG10.1L_f1 (2e7 cells), expression-sorted DIG10.1L_f2 (2e7 cells), and two DIG10.1a reference samples (5e6 cells per sample) were washed with 600 µL of PBSF and then labeled with 100 nM DIG-PEG3-biotin in 400 µL of PBSF (libraries) or 200 µL of PBSF (reference samples) for > 3 hrs at 4 °C. Labeled cells were washed with 200 µL of PBSF and then incubated with a secondary label solution of 0.8 µL of SAPE (Invitrogen) and 4 µL of anti-c-myc-FITC (Miltenyi Biotec GmbH, Germany) in 400 µL of PBSF for 8 min at 4 °C. Cells were washed with 200 µL of PBSF, resuspended in either 800 µL of PBSF (libraries) or 400 µL of PBSF (reference samples), and sorted (Fig. 10). Each library was sorted according to two different stringency conditions: (1) clones having binding signals higher than that of DIG10.1a (DIG10.1_f1_better and DIG10.1_f2_better), and (2) clones having binding signals equivalent to or higher than that of DIG10.1a (DIG10.1_f1_neutral and DIG10.1_f2_neutral). Collected cells were recovered overnight in ~1 mL of low-pH SDCAA supplemented with Pen/Strep at 30 °C, pelleted by centrifugation at 1,700 x g for 4 min, resuspended in 2 mL of low-pH SDCAA supplemented with Pen/Strep, and grown for an additional 24 hrs at 30 °C. Cells (2e7) were resuspended in 2 mL of SGCAA and induced overnight at 18 °C.

To reduce noise from the first round of cell sorting, the sorted libraries were labeled and subjected to a second round of cell sorting using the same conditions and gates as in the first round (Fig. 10). Collected cells were recovered overnight in 800 µL of low-pH SDCAA supplemented with Pen/Strep at 30 °C, pelleted by centrifugation at 1,700 x g for 4 min, resuspended in 2 mL of low-pH SDCAA supplemented with Pen/Strep, and grown for an additional 24 hrs at 30 °C.

One hundred million cells from the expression-sorted DIG10.1L_f1 and DIG10.1L_f2 libraries and at least 2e7 cells from doubly sorted DIG10.1_f1_better and DIG10.1_f2_better were pelleted by centrifugation at 1,700 x g for 4 min, resuspended in 1 mL of freezing solution (50% YPD, 2.5% glycerol), transferred to cryogenic vials, slow-frozen in an isopropanol bath, and stored at -80 °C until further use.

Next-generation library sequencing. Library DNA was prepared as detailed previously. Illumina adapter sequences and unique library barcodes were appended to each library pool through PCR amplification using population-specific HPLC-purified primers (Integrated DNA Technologies, Coralville, IA). The library amplicons were verified on a 2% agarose gel stained with SYBR Gold (Invitrogen) and then purified using an Agencourt AMPure XP bead-based purification kit (Beckman Coulter, Inc.). Each library amplicon was denatured using NaOH and then diluted to 6 pM. A sample of PhiX control DNA (Illumina, Inc., San Diego, CA) was prepared in the same manner as the library samples and added to the library DNA to create high enough sample diversity for the Illumina base-calling algorithm. The final DNA sample was prepared by pooling 300 µL of 6 pM PhiX control DNA (50%), 102 µL of 6 pM expression-sorted DIG10_1L_f1 (17.0%), 102 µL of 6 pM expression-sorted DIG10_1L_f2 (17.0%), 33 µL of 6 pM DIG10_1L_f1_neutral (5.5%), 33 µL of 6 pM DIG10_1L_f2_neutral (5.5%), 15 µL of 6 pM DIG10_1L_f1_better (2.5%), and 15 µL of 6 pM DIG10_1L_f2_better (2.5%). DNA was sequenced in paired-end mode on an Illumina MiSeq using a 300-cycle reagent kit and custom HPLC-purified primers (Integrated DNA Technologies, Inc., Coralville, IA).
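Because every sample was diluted to the same 6 pM concentration, each sample's share of the pooled library equals its share of the total volume. The sketch below recomputes the stated percentages from the pooled volumes.

```python
# Volumes (uL) pooled for sequencing; every sample was at 6 pM.
pool_ul = {
    "PhiX control": 300,
    "DIG10_1L_f1 (expression-sorted)": 102,
    "DIG10_1L_f2 (expression-sorted)": 102,
    "DIG10_1L_f1_neutral": 33,
    "DIG10_1L_f2_neutral": 33,
    "DIG10_1L_f1_better": 15,
    "DIG10_1L_f2_better": 15,
}
total_ul = sum(pool_ul.values())
shares = {name: 100.0 * v / total_ul for name, v in pool_ul.items()}

print(f"total: {total_ul} uL")
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The 600 µL pool reproduces the 50%, 17.0%, 5.5%, and 2.5% fractions given in the text.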

Processing of sequencing results. Data from each next-generation sequencing library were demultiplexed using the unique library barcodes added during the amplification steps. Of a total of 5,630,105 paired-end reads, 2,531,653 reads were mapped to library barcodes. For each library, paired-end reads were fused and filtered for quality (Phred > 30). The resulting full-length reads were aligned against the relevant segments of the DIG10.1a sequence using scripts from the software package Enrich 33. For single mutations having > 7 counts in the original input library, a relative enrichment ratio between the input library and each selected library was calculated 34. A pseudocount value of 0.3 was added to the total reads for each selected library mutation to allow calculation of enrichment values for mutations that disappeared completely during selection.
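A minimal sketch of a relative enrichment calculation with the count filter and pseudocount described above. It assumes one common definition, the log2 ratio of a mutation's frequency in the selected library to its frequency in the input library, with the pseudocount added to each selected count; Enrich's own implementation, and any normalization to wild type, may differ. The counts in the example are made up.

```python
import math


def log2_enrichment(input_counts, selected_counts, min_input=7, pseudocount=0.3):
    """Return {mutation: log2(f_selected / f_input)} for sufficiently sampled mutations.

    input_counts / selected_counts map a mutation label (e.g. "A37P") to read counts.
    Mutations with <= min_input reads in the input library are skipped, and the
    pseudocount is added to each selected count so that mutations absent after
    selection still receive a finite (strongly negative) value.
    """
    total_in = sum(input_counts.values())
    total_sel = sum(selected_counts.values())
    scores = {}
    for mut, n_in in input_counts.items():
        if n_in <= min_input:
            continue
        f_in = n_in / total_in
        f_sel = (selected_counts.get(mut, 0) + pseudocount) / total_sel
        scores[mut] = math.log2(f_sel / f_in)
    return scores


# Made-up counts for illustration only; "G40S" is a hypothetical low-count mutation.
inp = {"A37P": 100, "H41Y": 100, "G40S": 5}
sel = {"A37P": 400, "H41Y": 0}
print(log2_enrichment(inp, sel))
```

Without the pseudocount, the vanished mutation would give log2(0) and could not be ranked alongside the others.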

Protein expression and purification. Selected DIG designs and variants were expressed in E. coli in pET29FLAG or with a TEV protease-cleavable His6 purification tag (pET29-TEV-His6). For the latter, DIG genes were amplified from the appropriate pETCON-based plasmid using a forward primer and a reverse primer harboring a TEV protease recognition sequence insertion. The PCR products were digested with NdeI and XhoI and ligated into similarly digested pET29b(+). Ligation products were transformed into Rosetta 2 (DE3) cells for expression. Rosetta 2 (DE3)/pET29b(+) cells were grown in 1 L of LB or TB medium at 37 °C to an OD600 of ~0.7, and protein expression was then induced by the addition of 0.5 mM IPTG (isopropyl β-D-1-thiogalactopyranoside). Cultures were incubated at 37 °C for 3-4 hrs or at 18 °C for 18 hrs and then harvested by centrifugation at 1,912 x g for 20 min. Cell pellets were stored at -20 °C until further use.

Proteins were purified by gravity-flow chromatography over Ni-NTA resin (Qiagen, Hilden, Germany) columns. Frozen cell pellets were resuspended in 15 mL of wash buffer (PBS pH 7.4, 30 mM imidazole) supplemented with 300 µL of 100 mM phenylmethanesulfonyl fluoride (PMSF; Sigma Aldrich, St. Louis, MO) prepared in neat ethanol, 2 mg/mL of lysozyme, and 0.2 mg/mL of DNase I. Cells were lysed by sonication for a total of 4 min (30 s on, 20 s off) using a Branson sonifier at 75% power. Insoluble material was removed by centrifugation at 38,724 x g for 30 min, and particulate matter was further removed from the supernatant by filtration through a 0.45 µm syringe filter. The supernatant was then passed through gravity columns containing 3 mL of Ni-NTA resin (Qiagen, Hilden, Germany) equilibrated in wash buffer. Bound proteins were washed with 45 mL of wash buffer and then eluted in 20 mL of elution buffer (PBS pH 7.4, 200 mM imidazole). Proteins were concentrated to ~5-40 mg/mL using Vivaspin 5 kDa MWCO centrifugal concentration devices (Sartorius Stedim Biotech GmbH, Goettingen, Germany), and imidazole was removed by dialysis (3 x 2 L) into PBS pH 7.4 at 4 °C.

Yields for the DIG designs expressed in pET29FLAG are given in Supplementary Table 2. Typical yields for DIG10-TEV-his6, DIG10.1a-TEV-his6, DIG10.2-TEV-his6, DIG10.3-TEV-his6, DIG10.3-TEV-his6 variants, and 1z1s-TEV-his6 range from 10 to 60 mg/L. For all solution experiments, protein concentrations were determined from absorbance at 280 nm measured on a NanoDrop spectrophotometer (Thermo Scientific) using extinction coefficients calculated from primary amino acid sequences.
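Sequence-based extinction coefficients are commonly computed with the Pace et al. values also used by tools such as ExPASy ProtParam (5500 M⁻¹cm⁻¹ per Trp, 1490 per Tyr, 125 per cystine), after which the Beer-Lambert law gives the concentration. This is a sketch under that assumption; the example sequence and absorbance are hypothetical and do not correspond to a DIG design.

```python
def epsilon_280(sequence: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from amino acid composition.

    Uses the Pace et al. coefficients: 5500 per Trp, 1490 per Tyr, 125 per cystine
    (disulfide-bonded Cys pair). Free Cys contributes nothing.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def concentration_um(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert: A = epsilon * c * l, so c (M) = A / (epsilon * l); returned in uM."""
    return a280 / (epsilon * path_cm) * 1e6


# Hypothetical example sequence and absorbance.
eps = epsilon_280("MKWYYGSLE")
print(eps, round(concentration_um(0.848, eps), 1))  # 8480 100.0
```

Multiplying the molar concentration by the molecular weight calculated from the same sequence converts the result to mg/mL.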

Size-exclusion chromatography. Protein oligomerization states were assessed by size-exclusion chromatography on an ÄKTA FPLC (GE Healthcare) using a Superdex 75 10/300 GL column equilibrated in running buffer (25 mM Tris-HCl pH 7.4, 250 mM NaCl). Proteins were run over the column at a flow rate of 0.5 mL/min. Bovine erythrocyte carbonic anhydrase (29 kDa), horse heart cytochrome c (12.4 kDa), and bovine aprotinin (6.5 kDa) molecular weight standards (Sigma Aldrich, St. Louis, MO) were analyzed in the same manner as the protein samples. Under these conditions, carbonic anhydrase, cytochrome c, DIG10-TEV-His6 (expected monomer MW: 17.9 kDa), DIG10.1b-TEV-His6 (expected monomer MW: 17.9 kDa), DIG10.2t-His6 (see below; expected monomer MW: 15.9 kDa), DIG10.3-TEV (His6 tag removed with TEV protease; expected monomer MW: 16.9 kDa), and DIG5-TEV-His6 (expected monomer MW: 17.8 kDa) eluted at 12.05 mL, 13.65 mL, 11.88 mL, 11.81 mL, 11.40 mL, 11.35 mL, and 11.78 mL, respectively (Supplementary Fig. 13). For preparative runs, fractions containing pure protein were identified by absorbance at 280 nm and by SDS gel electrophoresis. Analytical Superdex 75 gel filtration was also performed, using the same procedure, on 100 μM DIG10.3-TEV-His6 alone and on 100 μM DIG10.3-TEV-His6 pre-incubated with 500 μM DIG for ~60 min at room temperature. Under these conditions, DIG10.3 and the DIG10.3-DIG complex eluted at 11.29 mL and 10.71 mL, respectively (Supplementary Fig. 13).
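Apparent molecular weights can be read off the elution volumes above using a calibration line of log10(MW) versus elution volume. Only two standards have reported elution volumes (29 kDa at 12.05 mL and 12.4 kDa at 13.65 mL; the aprotinin volume is not given). The sketch below is therefore a crude two-point calibration that also assumes globular, non-interacting proteins. It reports an apparent MW and its ratio to the expected monomer MW, which by itself does not establish an oligomeric state.

```python
import math


def fit_calibration(standards):
    """Least-squares line log10(MW) = intercept + slope * Ve from (MW_kDa, Ve_mL) pairs."""
    xs = [ve for _, ve in standards]
    ys = [math.log10(mw) for mw, _ in standards]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def apparent_mw(ve_ml: float, intercept: float, slope: float) -> float:
    """Apparent MW (kDa) interpolated or extrapolated from the calibration line."""
    return 10 ** (intercept + slope * ve_ml)


if __name__ == "__main__":
    standards = [(29.0, 12.05), (12.4, 13.65)]  # (kDa, mL) as reported above
    intercept, slope = fit_calibration(standards)
    samples = [  # (construct, elution volume in mL, expected monomer MW in kDa)
        ("DIG10-TEV-His6", 11.88, 17.9),
        ("DIG10.1b-TEV-His6", 11.81, 17.9),
        ("DIG10.2t-His6", 11.40, 15.9),
        ("DIG10.3-TEV", 11.35, 16.9),
        ("DIG5-TEV-His6", 11.78, 17.8),
    ]
    for name, ve, monomer in samples:
        mw = apparent_mw(ve, intercept, slope)
        print(f"{name}: apparent {mw:.1f} kDa ({mw / monomer:.2f} x monomer)")
```

Every sample elutes before the 29 kDa standard, so its value is an extrapolation beyond the calibrated range and should be read with corresponding caution.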

Preparation of samples for crystallography. Crystallization trials with the DIG10-based C-terminal TEV-His6 constructs (TEV-cleaved or uncleaved) failed to yield diffraction-quality crystals. All 1Z1S-based designs contained a 12-residue C-terminal tail that was disordered in the 1Z1S structure but was retained when the designs were ordered, in case it was needed for protein stability or folding. To reduce entropic effects from this disordered tail that might prevent crystal formation, we cloned the DIG10 designs into new pET29b(+)-based constructs in which all 12 tail residues were eliminated and a non-cleavable His6 tag was placed immediately after the last ordered residue (DIG10t-His6, DIG10.1at-His6, DIG10.2t-His6, and DIG10.3t-His6).

Truncated samples were expressed and purified by gravity flow over Ni-NTA resin using the procedure above. Typical expression yields were comparable to those of their untruncated, TEV-cleavable His6-tagged counterparts (see above). All proteins for crystallization attempts were further purified by preparative size-exclusion chromatography using the procedure above.

Crystallization. Purified DIG10 and its evolved variants were incubated at 4 °C with 1 mM digoxigenin for 16-20 hr. The protein-ligand complexes were then screened against several commercially available sparse-matrix crystallization screens using a nanoliter-drop-volume crystallization robot (TTP LabTech 'Mosquito'). Potential hits were set up in sealed vapor-diffusion plates with a 1:1 ratio of reservoir solution to protein-ligand complex. Several diffraction-quality crystals were obtained for DIG10.2t-His6 and DIG10.3t-His6.

Crystals of DIG10.2t-His6 were grown at a protein concentration of 15 mg mL⁻¹ in 0.1 M acetate pH 5.5, 1.5% MPD, 2.5 M sodium chloride and 12% PEG 1500. Crystals of DIG10.3t-His6 were grown at 13.5 mg mL⁻¹ in 0.2 M ammonium acetate, 0.1 M Bis-Tris pH 5.5 and 20% PEG 3350. DIG10.2t-His6 and DIG10.3t-His6 crystals were transferred to artificial mother liquor containing 20% sucrose or glycerol, respectively, then individually removed in fiber loops and flash-frozen in liquid nitrogen.
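The crystallization concentrations are given in mg mL⁻¹, while the ligand was added at 1 mM, so converting to molarity makes the two directly comparable. A small sketch using the expected monomer MW of DIG10.2t-His6 reported in the size-exclusion section (15.9 kDa). The MW of DIG10.3t-His6 is not stated in this section, so it would have to be computed from its sequence.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration to micromolar (monomer basis).

    mg/mL equals g/L; dividing by g/mol (1 kDa = 1000 g/mol) gives mol/L.
    """
    return conc_mg_per_ml / (mw_kda * 1000.0) * 1e6


if __name__ == "__main__":
    # DIG10.2t-His6 at 15 mg/mL, expected monomer MW 15.9 kDa.
    print(f"{mg_per_ml_to_micromolar(15.0, 15.9):.0f} uM monomer")
```

On a monomer basis, 15 mg mL⁻¹ of DIG10.2t-His6 corresponds to roughly 0.94 mM, the same order as the 1 mM digoxigenin used to form the complex.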

Crystallographic data collection and processing. Datasets from crystals of DIG10.2t-His6 and DIG10.3t-His6 were collected at the Advanced Light Source (ALS) synchrotron facility (Berkeley, CA) on beamline 5.0.2 using a CCD area detector. Data for DIG10.2t-His6 comprised 360° of 1° oscillation images collected at a crystal-to-detector distance of 150 mm with an exposure time of 1 s per 1° oscillation. Data for DIG10.3t-His6 comprised a full 360° of 1° oscillation images collected at a distance of 230 mm with an exposure time of 1 s per 1° oscillation.

Data were processed using the HKL2000 software package³⁶. Molecular replacement was performed using PHASER³⁷ in the CCP4 software suite³⁸,³⁹, with Pseudomonas aeruginosa hypothetical protein PA3332 (PDB 1Z1S) as the search model⁴⁰. Refinement and model building were carried out using Refmac5⁴¹ and COOT (Crystallographic Object-Oriented Toolkit)⁴², respectively. The geometric quality of the final models was validated using ProCheck⁴³, SFCheck⁴⁴ and MolProbity⁴⁵, as well as the validation tools provided by the RCSB Protein Data Bank⁴⁶.

The diffraction dataset collected from the DIG10.3t-His6 crystal could only be processed to 3.2 Å resolution, in space group C2. Several of the independent copies of the protein-ligand complex in the asymmetric unit displayed significant disorder, which resulted in very high average B-factors.

Fluorescence polarization equilibrium binding assays. Fluorescence polarization-based affinity measurements of designs and their evolved variants were performed as described previously, using Alexa488-conjugated DIG (DIG-PEG3-Alexa488). In a typical experiment, the concentration of DIG-PEG3-Alexa488 was fixed below the Kd of the interaction being monitored, and the effect of increasing protein concentration on the fluorescence anisotropy of Alexa488 was determined. Fluorescence anisotropy (r) was measured in 96-well plate format on a SpectraMax M5e microplate reader (Molecular Devices) with λex = 485 nm and λem = 538 nm, using a 515 nm emission cutoff filter. In all experiments, PBS (pH 7.4) was used as the buffer system and the temperature was 25 °C. DIG-PEG3-Alexa488 solutions were prepared from a 1 mM stock in DMSO. Equilibrium dissociation constants (Kd) were determined by fitting plots of the anisotropy, averaged over 20 to 40 min after reaction initiation, versus protein concentration as described previously. Reported Kd values represent the average of at least three independent measurements with at least two separate batches of purified protein.

Design-TEV-His6 constructs were used for all measurements. The [DIG-PEG3-Alexa488] used for the sets of experiments on each protein was as follows: DIG5: 2 μM, DIG10: 2 μM, 1Z1S: 2 μM, BSA: 2 μM, DIG10.1a: 10 nM, DIG10.2: 1 nM, DIG10.3: 0.5 nM, DIG10.3 Y34F: 2 nM, DIG10.3 Y99F: 2 nM, DIG10.3 Y101F: 2 nM, DIG10.3 Y115F: 2 nM, DIG10.3 Y99F/Y101F: 2 nM, DIG10.3 Y34F/Y99F/Y101F: 10 nM, and DIG10.3 Y34F/Y99F/Y101F/Y115F: 10 nM.
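The Kd fitting procedure is given above only by reference ("as described previously"). One common model for a direct titration, sketched here as an assumption rather than the authors' exact fit, treats the labeled ligand's fraction bound with the exact quadratic (so protein depleted by the label is accounted for). It also assumes the label is equally bright free and bound, so that r = r_free + (r_bound - r_free)·fb. A grid search over Kd replaces a nonlinear optimizer so that only the standard library is needed. The 0.5 nM label concentration is the one listed for DIG10.3; the Kd and anisotropy end points used to build the synthetic curve are hypothetical.

```python
import math


def fraction_bound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of labeled ligand in complex, from the exact quadratic solution
    (accounts for protein depleted by the label; all inputs in the same units)."""
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / l_total


def fit_kd(p_values, r_values, l_total, kd_grid):
    """Grid search over Kd. For a fixed Kd, r = r_free + (r_bound - r_free) * fb is
    linear in the two end-point anisotropies, which are solved by least squares."""
    n = len(p_values)
    mean_r = sum(r_values) / n
    best = None
    for kd in kd_grid:
        fb = [fraction_bound(p, l_total, kd) for p in p_values]
        mean_fb = sum(fb) / n
        sxx = sum((x - mean_fb) ** 2 for x in fb)
        if sxx == 0.0:
            continue  # flat curve at this Kd, end points not identifiable
        sxy = sum((x - mean_fb) * (r - mean_r) for x, r in zip(fb, r_values))
        span = sxy / sxx  # r_bound - r_free
        r_free = mean_r - span * mean_fb
        sse = sum((r - r_free - span * x) ** 2 for x, r in zip(fb, r_values))
        if best is None or sse < best[0]:
            best = (sse, kd, r_free, r_free + span)
    _, kd, r_free, r_bound = best
    return kd, r_free, r_bound


if __name__ == "__main__":
    label_nM = 0.5  # [DIG-PEG3-Alexa488] listed above for DIG10.3
    true_kd, r_free, r_bound = 1.0, 0.05, 0.25  # hypothetical values for a synthetic curve
    proteins = [0.05 * 2 ** i for i in range(14)]  # nM, two-fold titration series
    data = [r_free + (r_bound - r_free) * fraction_bound(p, label_nM, true_kd) for p in proteins]
    kd_grid = [10 ** (k / 100) for k in range(-200, 201)]  # 0.01 to 100 nM, log-spaced
    print(fit_kd(proteins, data, label_nM, kd_grid))
```

Keeping the label below the Kd, as the protocol does, keeps the curve's midpoint close to Kd. The quadratic form remains valid even when that condition is only approximately met.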

Fluorescence polarization equilibrium competition binding assays. Fluorescence polarization equilibrium competition binding assays were used to determine the binding affinities of DIG10.3 and its variants for unlabeled digoxigenin, digitoxigenin, progesterone, β-estradiol, and digoxin. In a typical experiment, the concentration of DIG-PEG3-Alexa488 was kept near or below the Kd of the interaction being monitored, the protein concentration was fixed at a saturating value such that >95% of the DIG-PEG3-Alexa488 in the system was bound to protein, and the effect of increasing concentrations of unlabeled ligand on the fluorescence anisotropy of Alexa488 was determined as described above. Unlabeled stock solutions of digoxigenin, digitoxigenin, progesterone, and β-estradiol were prepared in methanol.

Unlabeled stock solutions of digoxin were prepared in DMSO. Ligand stock solutions were 10 mM for DIG, digitoxigenin, and digoxin, and 1 mM for progesterone and β-estradiol. For each ligand concentration, a negative control sample containing only DIG-PEG3-Alexa488 and the corresponding dilution of a methanol-only control solution in PBS was measured. At all concentrations employed, methanol did not affect fluorescence anisotropy (data not shown). Similarly, the highest concentration of DMSO employed did not affect fluorescence anisotropy (data not shown).

Fluorescence anisotropy (r) was measured as described above. In all experiments, PBS (pH 7.4) was used as the buffer system and the temperature was 25 °C. The concentration of total unlabeled ligand producing 50% binding signal inhibition (I50) was determined by fitting a plot of the anisotropy, averaged over 30 min to 3 hr after reaction initiation, versus unlabeled ligand concentration as described previously. For some experiments, limits on the attainable steroid concentration made it impossible to collect data in the regime of complete inhibition; in these cases, the data were fit by fixing the anisotropy at infinite steroid concentration to the value measured for other steroids for which it could be determined experimentally. For cases in which the Kd for the steroid was much lower than the Kd for DIG-PEG3-Alexa488, the data could not be fit to the model and only qualitative conclusions could be reached (Fig. 4, dashed lines).
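For the cases above in which complete inhibition could not be reached, the anisotropy at infinite steroid concentration was fixed to a value measured with other steroids. A minimal sketch of such a constrained fit, assuming a logistic inhibition curve (the exact model is given only by reference) with the bottom held fixed while I50, Hill slope and top anisotropy are fitted. A grid search over I50 and Hill slope is used, with the top solved in closed form at each grid point. All numbers in the usage lines are hypothetical and serve only to build a synthetic, truncated curve.

```python
def inhibition_curve(conc, r_top, r_bottom, i50, hill):
    """Logistic decrease of anisotropy with total unlabeled ligand concentration."""
    return r_bottom + (r_top - r_bottom) / (1.0 + (conc / i50) ** hill)


def fit_i50_fixed_bottom(concs, r_values, r_bottom, i50_grid, hill_grid):
    """Grid search over I50 and Hill slope with the fully inhibited anisotropy held fixed.
    For each (I50, hill), r - r_bottom = (r_top - r_bottom) * g is linear in one unknown."""
    best = None
    for i50 in i50_grid:
        for hill in hill_grid:
            g = [1.0 / (1.0 + (c / i50) ** hill) for c in concs]
            amp = sum(x * (r - r_bottom) for x, r in zip(g, r_values)) / sum(x * x for x in g)
            sse = sum((r - r_bottom - amp * x) ** 2 for x, r in zip(g, r_values))
            if best is None or sse < best[0]:
                best = (sse, i50, hill, r_bottom + amp)
    _, i50, hill, r_top = best
    return i50, hill, r_top


if __name__ == "__main__":
    r_bottom = 0.05  # hypothetical bottom "borrowed" from a steroid that reaches full inhibition
    concs = [1, 3, 10, 30, 100, 300, 1000]  # nM; the top point reaches only ~91% inhibition
    data = [inhibition_curve(c, 0.25, r_bottom, 100.0, 1.0) for c in concs]  # synthetic
    i50_grid = [10 ** (k / 50) for k in range(0, 201)]  # 1 nM to 10 uM, log-spaced
    hill_grid = [0.5, 0.75, 1.0, 1.5, 2.0]
    print(fit_i50_fixed_bottom(concs, data, r_bottom, i50_grid, hill_grid))
```

Fixing the bottom removes the parameter that a truncated curve constrains worst. The fitted I50 then inherits any error in the borrowed bottom value.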

The inhibition constant for each protein-ligand interaction, Ki, was calculated from the IC50 and the Kd of the protein-label interaction according to a model accounting for receptor-depletion conditions. IC50 values, the concentrations of free unlabeled ligand producing 50% binding signal inhibition, were calculated from the measured I50 values⁴⁷. Reported I50 and derived Ki values represent the average of at least three independent measurements from at least two batches of purified protein, with a fresh unlabeled inhibitor stock prepared for each. For DIG10.3, [DIG-PEG3-Alexa488] = 1 nM and [DIG10.3-TEV-His6] = 20 nM. For DIG10.3 Y34F, [DIG-PEG3-Alexa488] = 10 nM and [DIG10.3 Y34F-TEV-His6] = 200 nM. For DIG10.3 Y101F, [DIG-PEG3-Alexa488] = 10 nM and [DIG10.3 Y101F-TEV-His6] = 200 nM. For DIG10.3 Y34F/Y99F/Y101F, [DIG-PEG3-Alexa488] = 500 nM and [DIG10.3 Y34F/Y99F/Y101F-TEV-His6] = 5 μM.

Circular dichroism spectroscopy. Circular dichroism spectra were collected on an Aviv 62A DS spectrometer. Samples were prepared in PBS. Fixed-temperature scans were conducted at 25 °C. All proteins were stable below 60 °C.
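The Ki calculation in the inhibition-constant paragraph above converts a total-ligand I50 into a free-ligand IC50 and then into Ki while accounting for depletion of protein by the label. The specific model of ref. 47 is not reproduced in this text. The sketch below is one exact treatment for single-site competitive binding, assuming the signal is proportional to the protein-label complex. It solves mass balance at 0% and 50% inhibition directly. The result is algebraically equivalent to the frequently used closed form Ki = IC50 / ([L]50/Kd + [P]0/Kd + 1), where [L]50 is free label at 50% inhibition and [P]0 is free protein without inhibitor. The label and protein concentrations in the usage lines are the DIG10.3 conditions stated above; the Kd and I50 values are hypothetical.

```python
import math


def label_complex(p_total: float, l_total: float, kd: float) -> float:
    """Protein-label complex concentration without inhibitor (exact quadratic)."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def ki_from_i50(i50_total: float, p_total: float, l_total: float, kd: float):
    """Return (Ki, free-ligand IC50) from a total-ligand I50 under depletion conditions.

    Assumes single-site competitive binding and a signal proportional to the
    amount of protein-label complex; all concentrations in the same units.
    """
    pl0 = label_complex(p_total, l_total, kd)  # complex at 0% inhibition
    pl50 = pl0 / 2.0                            # complex at 50% inhibition
    l50 = l_total - pl50                        # free label at the I50
    p50 = kd * pl50 / l50                       # free protein at the I50
    pi50 = p_total - p50 - pl50                 # protein-inhibitor complex at the I50
    ic50_free = i50_total - pi50                # free unlabeled ligand at the I50
    if ic50_free <= 0.0:
        raise ValueError("I50 is within the titration regime; Ki cannot be resolved")
    return p50 * ic50_free / pi50, ic50_free


if __name__ == "__main__":
    # DIG10.3 conditions from the text: 1 nM label, 20 nM protein.
    # The Kd (1 nM) and measured I50 (100 nM) are hypothetical illustration values.
    ki, ic50 = ki_from_i50(i50_total=100.0, p_total=20.0, l_total=1.0, kd=1.0)
    print(f"IC50 (free) = {ic50:.1f} nM, Ki = {ki:.2f} nM")
```

When the competitor binds far more tightly than the label, the I50 approaches the amount of protein-inhibitor complex and the free IC50 becomes small and poorly determined. The guard above rejects only the physically impossible case in which the computed free IC50 is zero or negative.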

REFERENCES

1. Schreier, B., Stumpp, C., Wiesner, S., & Höcker, B. Computational design of ligand binding is not a solved problem. Proc. Natl. Acad. Sci. USA 106, 18491-18496 (2009).

2. de Wolf, F.A. & Brett, G.M. Ligand-binding proteins: their potential for application in systems for controlled delivery and uptake of ligands. Pharmacol. Rev. 52, 207-236 (2000).

3. Hunter, M.M., Margolies, M.N., Ju, A., & Haber, E. High-affinity monoclonal antibodies to the cardiac glycoside, digoxin. J. Immunol. 129, 1165-1172 (1982).

4. Shen, X.Y., Orson, F.M., & Kosten, T.R. Vaccines against drug abuse. Clin. Pharmacol. Ther. 91, 60-70 (2012).

5. Bradbury, A.R.M., Sidhu, S., Dübel, S., & McCafferty, J. Beyond natural antibodies: the power of in vitro display technologies. Nat. Biotechnol. 29, 245-254 (2011).

6. Brustad, E.M. & Arnold, F.H. Optimizing non-natural protein function with directed evolution. Curr. Opin. Chem. Biol. 15, 201-210 (2011).

7. Chen, G. et al. Isolation of high-affinity ligand-binding proteins by periplasmic expression with cytometric screening (PECS). Nat. Biotechnol. 19, 537-542 (2001).

8. Telmer, P.G. & Shilton, B.H. Structural studies of an engineered zinc biosensor reveal an unanticipated mode of zinc binding. J. Mol. Biol. 354, 829-840 (2005).

9. Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 19, 1817-1819 (2010).

10. Jiang, L. et al. De novo computational design of retro-Aldol enzymes. Science 319, 1387-1391 (2008).

11. Khare, S.D. & Fleishman, S.J. Emerging themes in the computational design of novel enzymes and protein-protein interfaces. FEBS Lett., in press (2013).

12. Khersonsky, O. et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl. Acad. Sci. USA 109, 10358-10363 (2012).

13. Röthlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190-195 (2008).

14. Wang, L. et al. Structural analyses of covalent enzyme-substrate analog complexes reveal strengths and limitations of de novo enzyme design. J. Mol. Biol. 415, 615-625 (2012).

15. Boehr, D.D., Nussinov, R., & Wright, P.E. The role of dynamic conformational ensembles in biomolecular recognition. Nat. Chem. Biol. 5, 789-796 (2009).

16. Fleishman, S.J., Khare, S.D., Koga, N., & Baker, D. Restricted sidechain plasticity in the structures of native proteins and complexes. Protein Sci. 20, 753-757 (2011).

17. Lawrence, M.C. & Colman, P.M. Shape complementarity at protein/protein interfaces. J. Mol. Biol. 234, 946-950 (1993).

18. The Digitalis Investigation Group. The effect of digoxin on mortality and morbidity in patients with heart failure. N. Engl. J. Med. 336, 525-533 (1997).

19. Eisel, D., Seth, O., Grünewald-Janho, S., & Kruchen, B. DIG application manual for nonradioactive in situ hybridization, 4th ed. (Roche Diagnostics, Penzberg, 2008).

20. Flanagan, R.J. & Jones, A.L. Fab antibody fragments: some applications in clinical toxicology. Drug Safety 27, 1115-1133 (2004).

21. Chao, G. et al. Isolating and engineering human antibodies using yeast surface display. Nat. Protoc. 1, 755-768 (2006).

22. Fowler, D.M. et al. High-resolution mapping of protein sequence-function relationships. Nat. Methods 7, 741-746 (2010).

23. McLaughlin Jr, R.N., Poelwijk, F.J., Raman, A., Gosal, W.S., & Ranganathan, R. The spatial architecture of protein function and adaptation. Nature 491, 138-142 (2012).

24. Whitehead, T.A. et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat. Biotechnol. 30, 543-548 (2012).

25. Fersht, A.R. et al. Hydrogen bonding and biological specificity analysed by protein engineering. Nature 314, 235-238 (1985).

26. Frederick, K.K., Marlow, M.S., Valentine, K.G., & Wand, A.J. Conformational entropy in molecular recognition by proteins. Nature 448, 325-329 (2007).

27. Fleishman, S.J. & Baker, D. Role of the biomolecular energy gap in protein design, structure, and evolution. Cell 149, 262-273 (2012).

28. Zanghellini, A. et al. New algorithms and an in silico benchmark for computational enzyme design. Protein Sci. 15, 2785-2794 (2006).

29. Kuhlman, B. & Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. USA 97, 10383-10388 (2000).

30. Rossi, A.M. & Taylor, C.W. Analysis of protein-ligand interactions by fluorescence polarization. Nat. Protoc. 6, 365-387 (2011).

31. Fleishman, S.J. et al. RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLOS ONE 6, e20161 (2011).

32. Zanghellini, A. et al. New algorithms and an in silico benchmark for computational enzyme design. Protein Sci. 15, 2785-2794 (2006).

33. Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387-1391 (2008).

34. Röthlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190-195 (2008).

35. Siegel, J.B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science 329, 309-313 (2010).

36. Richter, F., Leaver-Fay, A., Khare, S.D., Bjelic, S., & Baker, D. De novo enzyme design using Rosetta3. PLOS ONE 6, e19230 (2011).

37. Kellogg, E.H., Leaver-Fay, A., & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830-838 (2010).

38. Cooper, S. et al. Predicting protein structures with a multiplayer online game. Nature 466, 756-760 (2010).

39. Fleishman, S.J., Khare, S.D., Koga, N., & Baker, D. Restricted sidechain plasticity in the structures of native proteins and complexes. Protein Sci. 20, 753-757 (2011).

40. Chao, G. et al. Isolating and engineering human antibodies using yeast surface display. Nat. Protoc. 1, 755-768 (2006).

41. Benatuil, L., Perez, J.M., Belk, J., & Hsieh, C.-M. An improved yeast transformation method for the generation of very large human antibody libraries. Protein Eng. Des. Sel. 23, 155-159 (2010).

42. Whitehead, T.A. et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat. Biotechnol. 30, 543-548 (2012).

43. Fowler, D.M., Araya, C.L., Gerard, W., & Fields, S. Enrich: software for analysis of protein function by enrichment and depletion of variants. Bioinformatics 27, 3430-3431 (2011).

44. Fowler, D.M. et al. High-resolution mapping of protein sequence-function relationships. Nat. Methods 7, 741-746 (2010).

45. McLaughlin Jr, R.N., Poelwijk, F.J., Raman, A., Gosal, W.S., & Ranganathan, R. The spatial architecture of protein function and adaptation. Nature 491, 138-142 (2012).

46. Rossi, A.M. & Taylor, C.W. Analysis of protein-ligand interactions by fluorescence polarization. Nat. Protoc. 6, 365-387 (2011).