


Title:
DE NOVO DESIGNED MODULAR PEPTIDE BINDING PROTEINS BY SUPERHELICAL MATCHING
Document Type and Number:
WIPO Patent Application WO/2024/091897
Kind Code:
A2
Abstract:
The disclosure provides polypeptides that bind tripeptide repeat sequences and that have an amino acid sequence at least 50% identical to the amino acid sequence of any one of SEQ ID NO: 1-105, nucleic acids encoding such polypeptides, expression vectors including such nucleic acids, and host cells including such polypeptides, nucleic acids, and/or expression vectors.

Inventors:
STEWART LANSING (US)
BAKER DAVID (US)
WU KEJIA (US)
HICKS DERRICK (US)
BRUNETTE TJ (US)
SILVA MANZANO DANIEL (US)
SHEFFLER WILLIAM (US)
GORESHNIK INNA (US)
DERIVERY EMMANUEL (GB)
MCNALLY KERRIE (GB)
BHABHA GIRA (US)
EKIERT DAMIAN (US)
REDLER RACHEL (US)
CHANG ATTY (US)
BAI HUA (US)
Application Number:
PCT/US2023/077575
Publication Date:
May 02, 2024
Filing Date:
October 24, 2023
Assignee:
UNIV WASHINGTON (US)
RES & INNOVATION UK (GB)
UNIV NEW YORK (US)
International Classes:
C07K14/47; A61K38/16
Attorney, Agent or Firm:
HARPER, David, S. (US)
Claims:
We claim:
1. A polypeptide comprising or consisting of an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105.
2. The polypeptide of claim 1 comprising an amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105.
3. The polypeptide of claim 1 comprising an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105.
4. The polypeptide of claim 1 comprising an amino acid sequence at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105.
5. The polypeptide of claim 1 comprising an amino acid sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105.
6. The polypeptide of claim 1 comprising an amino acid sequence at least 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105.
7. The polypeptide of any one of claims 1-6, wherein any N-terminal methionine residue is deleted.
8. The polypeptide of any one of claims 1-6, wherein any N-terminal methionine residue is present.

9. The polypeptide of any one of claims 1-6, wherein 1, 2, 3, 4, or 5 amino acids at the N-terminus and/or C-terminus of the polypeptide may be absent.
10. The polypeptide of any one of claims 1-9, wherein substitutions relative to the reference sequence are conservative amino acid substitutions.
11. The polypeptide of any one of claims 1-10, further comprising one or more functional domains.
12. A nucleic acid encoding the polypeptide of any one of claims 1-11.
13. An expression vector comprising the nucleic acid of claim 12 operatively linked to a suitable control sequence.
14. A host cell comprising the polypeptide, nucleic acid, or expression vector of any preceding claim.
15. A pharmaceutical composition comprising (a) the polypeptide, nucleic acid, expression vector, and/or host cell of any preceding claim; and (b) a pharmaceutically acceptable carrier.
16. A library comprising a plurality of different polypeptides of any preceding claim.
17. Use of the polypeptide, nucleic acid, expression vector, library, pharmaceutical composition, and/or host cell of any preceding claim for any suitable purpose, including but not limited to use as synthetic peptide ligands for receptor activation.

Description:
De novo designed modular peptide binding proteins by superhelical matching

Cross Reference
This application claims priority to U.S. Provisional Patent Application Serial No. 63/381,109 filed October 26, 2022, incorporated by reference herein in its entirety.

Federal Funding Statement
This invention was made with government support under Grant No. 5U19AG065156-02, awarded by the National Institute on Aging. The government has certain rights in the invention.

Sequence Listing Statement
A computer readable form of the Sequence Listing is filed with this application by electronic submission and is incorporated into this application by reference in its entirety. The Sequence Listing is contained in the file created on September 26, 2023 having the file name “22-1316-WO.xml” and is 131,404 bytes in size.

Summary
In one aspect, the disclosure provides polypeptides comprising or consisting of an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105. In another embodiment, the disclosure provides a library comprising a plurality of different polypeptides of any embodiment of the disclosure. In another aspect, the disclosure provides nucleic acids encoding the polypeptide of any embodiment of the disclosure. In a further embodiment, the disclosure provides expression vectors comprising a nucleic acid of the disclosure operatively linked to a suitable control sequence. In a further embodiment, the disclosure provides host cells comprising a polypeptide, nucleic acid, or expression vector of any embodiment of the disclosure. In another embodiment, the disclosure provides pharmaceutical compositions comprising (a) the polypeptide, nucleic acid, expression vector, and/or host cell of any embodiment herein; and (b) a pharmaceutically acceptable carrier.
The disclosure further provides methods for using the polypeptide, nucleic acid, expression vector, library, pharmaceutical composition, and/or host cell of any embodiment for any suitable purpose, including but not limited to use as synthetic peptide ligands for receptor activation.

Description of Figures
Figure 1. Overview of the modular peptide binder design procedure. (A) Like all repeating structures, repeat proteins and peptides form superhelices with a constant axial displacement (ΔZ) and angular twist between adjacent repeat units. For in-register binding, the protein and peptide parameters must match (for some integral multiple of repeat units). (B) Construction of hash tables for privileged residue-residue interactions. Top row: classes of sidechain-backbone interactions for which hash tables were built: the sidechain amide group of asparagine or glutamine forming bidentate interactions with the N-H and C=O groups on the backbone of a single residue (left) or of consecutive residues (middle), or with the backbone N-H group and sidechain oxygen atom of a serine or threonine (right). Second row: as illustrated for the glutamine-backbone bidentate interaction, the hash table is built by Monte Carlo sampling over the rigid body orientation between the terminal amide group and the backbone, and over the backbone torsions phi and psi, saving configurations with low-energy bidentate hydrogen bonds. For each configuration, the possible placements of the glutamine backbone are enumerated by growing sidechain rotamers back from the terminal amide. Third row: from the six rigid body degrees of freedom relating the backbones of the two residues, together with the phi and psi torsion angles, a hash key is calculated using an 8-dimensional hashing scheme. The hash key is then added to the hash table with the sidechain name and torsions as the value.
(C) To dock repeat proteins and repeat peptides with compatible superhelical parameters, their superhelical axes are first aligned, and the repeat peptide is then rotated around and slid along this axis. For each dock, and for each pair of repeat protein-peptide residues within a threshold distance, the hash key is calculated from the rigid body transform between backbones and the backbone torsions of the peptide residue, and the hash table is queried. If the key is found in the hash table, side chains with the stored identities and torsion angles are installed in the docking interface. (D) The sequence of the remainder of the interface is optimized using Rosetta™ for high-affinity binding. Two representative designed binding complexes are shown to highlight the peptide binding groove and the shape complementarity. The close-up snapshots illustrate hydrophobic interactions, salt bridges, and π-π stacks incorporated during design.

Figure 2. Designed binders function in living cells. (A) Experimental design: U2OS cells coexpress the target peptide fused to GFP and the specific binder fused to mScarlet™ and a mitochondrial targeting sequence (MitoTag™). If binding occurs in cells, the GFP signal is relocalized onto the mitochondria, while control cells not expressing the binder show a cytosolic GFP signal. (B-E) In vivo binding. Live, spreading U2OS cells expressing PLPx6-GFP alone (B), IRPx6-GFP alone (D), PLPx6-GFP and Mito-RPB_PLP2_R6-mScarlet (C), or IRPx6-GFP and Mito-RPB_LRP2_R6_FW6-mScarlet (E) were imaged live by spinning disk confocal microscopy (SDCM). Note that the GFP signal is cytosolic in controls but relocalized to mitochondria upon coexpression with the respective binder. (F-G) In vivo multiplexing. (F) Experimental design: cells coexpress two target peptides, fused to GFP and mScarlet™ respectively, and their corresponding specific binders fused to mitochondrial or peroxisomal targeting sequences.
If orthogonal binding occurs, the GFP and mScarlet™ signals should not overlap. (G) Live, spreading U2OS cells coexpressing PLPx6-GFP, IRPx6-mScarlet, Mito-RPB_PLP2_R6, and PEX-RPB_LRP2_R6_FW6, imaged by SDCM. Note the absence of overlap between channels. Images correspond to maximum intensity z-projections (Δz = 6 μm). Dashed line: cell outline. Scale bars: 10 μm.

Figure 3. Evaluation of design accuracy by X-ray crystallography. (A-C) Superposition of computational design models on experimentally determined crystal structures: (A) RPB_PEW3_R4-PAWx4, (B) RPB_PLP3_R6-PLPx6, (C) RPB_LRP2_R4-LRPx4. (D-G) RPB_PLP1_R6-PLPx6: (D) overview of the superimposition of the computational design model and the crystal structure; (E) 90-degree rotation of (D), with the complex shown in surface mode to illustrate shape complementarity; (F) zoom-in on the interactions of the internal three repeat units from (D) (front view), with glutamine residues from the protein in both design and crystal structure shown as sticks to show the accuracy of the designed sidechain-to-backbone bidentate ladder; (G) back view of (F), with tyrosine residues from the protein in both design and crystal structure shown as sticks to show the accuracy of the designed polar interactions on the other side.

Figure 4. Design of binders specific for endogenous human proteins. (A) Schematic model of the human PAXT complex, composed of a heterotetramer of ZFC3H1 and MTR4. Domain acronyms: CC, coiled-coil; ZN, Zn-finger domain. The inset shows details of the environment of the target sequence (SEQ ID NO: 112). (B) Surface shape complementarity between the target peptide from ZFC3H1 (spheres) and the highest-affinity cognate binder, αZFC-high. (C) Fluorescence polarization binding curves between the indicated ZFC3H1 binders and the target ZFC3H1 peptide (PLP)4PEDPEQPPKPP (SEQ ID NO: 107). As a negative control, we used the (PLP)x6 binder RPB_PLP3_R6 (see Fig. 3).
αZFC-high shows higher binding affinity to the target peptide than αZFC-low, in contrast to RPB_PLP3_R6, which shows negligible binding. (D) Superdex™ 200 10/300 GL size exclusion chromatography profiles of purified αZFC-high, of a fusion between GFP and a 103-amino-acid fragment of the disordered region of ZFC3H1 containing the target sequence (see A), or of a 1:1 mix of the two after 2 h incubation. (E) Top: HeLa cell extracts were subjected to pull-down using the indicated binders bound to Ni-NTA agarose beads, or naked beads as a control. Recovered proteins were processed for western blot against endogenous ZFC3H1 (or tubulin as a loading control). Bottom: Coomassie-stained SDS-PAGE gel of the samples analyzed in the top panel. These panels are representative of n=3 experiments. (F) Proteomics analysis of the His-pull-down samples shown in (E). Top panel: overlap between the proteins identified, using a threshold of five peptides for correct identification. Bottom panel: examples of proteins identified (numbers indicate exclusive peptide counts; protein coverage is indicated in parentheses).

Figure 5. Examples of repeat proteins computationally designed to bind to (A) extended beta strand, (B) polyproline II, and (C) helical peptide backbones.

Figure 6. Mitochondria immunostainings in control U2OS cells. Wild type U2OS cells were spread onto fibronectin coverslips as in Fig. 2, then fixed and processed for immunofluorescence using TOM20 antibodies as a marker of mitochondria. Note that the mitochondrial appearance in these control cells is similar to that observed upon overexpression of designed binders fused to mitochondrial targeting sequences (Fig. 2), suggesting that these constructs do not affect mitochondrial shape. Scale bar: 10 μm.

Figure 7. SSM binding interface footprinting results were consistent with the design model and crystal structure. (A) Using a PPL repeat peptide binder as an example, a heatmap presenting enrichment analysis for each mutation is generated.
WT sequences are indicated in the cells labeled with amino acid one-letter codes. Mutants missing from the expression library are labeled with asterisks. Two positions (109Q and 156Q) are highlighted as examples of conserved positions: almost all mutations away from the WT residue at these two positions are strongly depleted. (B) Illustration of the SSM region and the two conserved positions (109Q and 156Q).

Figure 8. Characterization of ZFC3H1 binders. HeLa cell extracts were subjected to pull-down using the indicated binders bound to Ni-NTA agarose beads, or naked beads as a control. Recovered proteins were processed for western blot against endogenous ZFC3H1 (or tubulin as a loading control). Two completely independent experiments are shown. These experiments repeat the experiment presented in Fig. 4E, but at a different salt concentration (50 mM instead of 150 mM).

Detailed Description
All references cited are herein incorporated by reference in their entirety. Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook et al., 1989, Cold Spring Harbor Laboratory Press); Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991, Academic Press, San Diego, CA); “Guide to Protein Purification” in Methods in Enzymology (M.P. Deutscher, ed., 1990, Academic Press, Inc.); PCR Protocols: A Guide to Methods and Applications (Innis et al., 1990, Academic Press, San Diego, CA); Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R.I. Freshney, 1987, Liss, Inc., New York, NY); Gene Transfer and Expression Protocols, pp. 109-128 (E.J. Murray, ed., The Humana Press Inc., Clifton, N.J.); and the Ambion 1998 Catalog (Ambion, Austin, TX). As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V). All embodiments of any aspect of the disclosure can be used in combination, unless the context clearly dictates otherwise. Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application. The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While the specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize. In one aspect, the disclosure provides polypeptides comprising or consisting of an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-105. 
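The identity thresholds recited above reduce to a simple ratio once two sequences are aligned. The sketch below is illustrative only: it assumes the query and reference have already been aligned without gaps and have equal length (the disclosure does not specify an alignment method), and the example sequences are made up rather than taken from SEQ ID NO: 1-105. The optional removal of a leading methionine mirrors the embodiment in which an N-terminal methionine is not considered when determining percent identity.

```python
from typing import Optional

# Identity thresholds (percent) recited in the disclosure.
THRESHOLDS = [50, 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]


def percent_identity(query: str, reference: str, drop_n_terminal_met: bool = False) -> float:
    """Percent identity of two pre-aligned, gap-free, equal-length sequences.

    Assumption of this sketch: alignment has already been done elsewhere.
    If drop_n_terminal_met is True, a leading Met on either sequence is
    removed before comparison, so it is not counted.
    """
    if drop_n_terminal_met:
        query = query[1:] if query.startswith("M") else query
        reference = reference[1:] if reference.startswith("M") else reference
    if not reference or len(query) != len(reference):
        raise ValueError("expected non-empty, pre-aligned sequences of equal length")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


def highest_threshold_met(pid: float) -> Optional[int]:
    """Largest recited threshold reached by the identity value, or None if below 50%."""
    met = [t for t in THRESHOLDS if pid >= t]
    return max(met) if met else None


# Made-up sequences for illustration (not SEQ ID NO: 1-105).
reference = "MAKELLKRLEAG"
query = "AKDLIKRLEAG"  # expressed without the initiator Met
pid = percent_identity(query, reference, drop_n_terminal_met=True)
print(f"{pid:.1f}% identity; highest threshold met: {highest_threshold_met(pid)}%")
```

Here 9 of 11 positions match once the leading Met is dropped, so the example prints 81.8% identity with 80% as the highest recited threshold reached. Without `drop_n_terminal_met=True` the lengths differ and the sketch raises an error instead of guessing an alignment.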
The polypeptides of the disclosure bind tripeptide repeat sequences (for example, PLP, LRP, PEW, IYP, PRM and/or PKW) and thus can be used, for example, as synthetic peptide ligands for receptor activation or to bind to proteins natively expressed by cells. The proteins are expressed at high levels in E. coli, are hyperstable, and bind peptides with 4-6 copies of the target tripeptide sequences with nanomolar to picomolar affinities both in vitro and in living cells. The amino acid sequences of SEQ ID NO: 1-105 are provided in Tables 1-3. In one embodiment, any N-terminal methionine residue is deleted and is not considered in determining percent identity relative to the reference protein. In another embodiment, any N-terminal methionine residue is present and is considered in determining percent identity relative to the reference protein. In other embodiments, 1, 2, 3, 4, or 5 amino acids at the N-terminus and/or C-terminus of the polypeptide may be absent and are not considered in determining percent identity relative to the reference protein. In one embodiment, substitutions relative to the reference sequence are conservative amino acid substitutions. As used herein, a “conservative amino acid substitution” means that a given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substituting one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are known. Amino acids can be grouped according to similarities in the properties of their side chains (in A. L.
Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Particular conservative substitutions include, but are not limited to, Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu. The polypeptides of the disclosure may comprise further amino acids, such as additional amino acids at the N-terminus and/or C-terminus, which are not considered when determining percent identity relative to the reference polypeptide. In one embodiment, the polypeptides further comprise one or more functional domains. The polypeptides may comprise any further functional domain fused to the polypeptide that may be of use for an intended purpose. In various non-limiting embodiments, the resulting fusion protein comprises an additional functional domain such as a detectable protein, a purification tag, a protein antigen, or a protein therapeutic. The functional domain may be a genetic fusion or may be otherwise covalently linked to the polypeptide.
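The list of particular conservative substitutions given above can be encoded as a lookup table to check whether the differences between a variant and its reference are all conservative. This is a minimal sketch under stated assumptions: the two sequences are already aligned without gaps, the example sequences are made up, and the table is read strictly as written (reference residue into replacement residue).

```python
from typing import Dict, List, Set, Tuple

# "Particular conservative substitutions" from the disclosure, keyed by the
# reference residue and giving the allowed replacement residues.
CONSERVATIVE: Dict[str, Set[str]] = {
    "A": {"G", "S"}, "R": {"K"}, "N": {"Q", "H"}, "D": {"E"}, "C": {"S"},
    "Q": {"N"}, "E": {"D"}, "G": {"A", "P"}, "H": {"N", "Q"},
    "I": {"L", "V"}, "L": {"I", "V"}, "K": {"R", "Q", "E"},
    "M": {"L", "Y", "I"}, "F": {"M", "L", "Y", "V", "I"},
    "S": {"T"}, "T": {"S"}, "W": {"Y"}, "Y": {"W"},
}


def classify_substitutions(query: str, reference: str) -> List[Tuple[int, str, str, bool]]:
    """Return (1-based position, reference residue, query residue, is_conservative)
    for every mismatch between two pre-aligned, equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("expected pre-aligned sequences of equal length")
    return [
        (i + 1, r, q, q in CONSERVATIVE.get(r, set()))
        for i, (q, r) in enumerate(zip(query, reference))
        if q != r
    ]


def all_conservative(query: str, reference: str) -> bool:
    """True if every mismatch is on the conservative-substitution list."""
    return all(item[3] for item in classify_substitutions(query, reference))


# Made-up example: E->D and L->I are on the list; E->P is not.
print(classify_substitutions("AKDLIKR", "AKELLKR"))
print(all_conservative("AKDLIKR", "AKELLKR"), all_conservative("AKPLIKR", "AKELLKR"))
```

Read strictly, the list has no entry with Val or Pro as the reference residue, so this sketch reports any change at such positions as non-conservative; a broader reading based on the side-chain groupings above would accept more pairs.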
As used throughout the present application, the term "polypeptide" is used in its broadest sense to refer to a sequence of subunit amino acids. The polypeptides of the invention may comprise L-amino acids + glycine, D-amino acids + glycine (which are resistant to L-amino acid-specific proteases in vivo), or a combination of D- and L-amino acids + glycine. The polypeptides described herein may be chemically synthesized or recombinantly expressed. The polypeptides may be linked to other compounds to promote an increased half-life in vivo, such as by PEGylation, HESylation, PASylation, or glycosylation, or may be produced as an Fc-fusion or in deimmunized variants. Such linkage can be covalent or non-covalent as is understood by those of skill in the art. In another embodiment, the disclosure provides a library comprising a plurality (at least 2, 5, 10, 25, 50, 75, 100, or more) of different polypeptides of any embodiment or combination of embodiments herein. The disclosure also provides nucleic acids encoding the polypeptide of any embodiment or combination of embodiments of the disclosure. The nucleic acid sequence may comprise single stranded or double stranded RNA (such as an mRNA) or DNA in genomic or cDNA form, or DNA-RNA hybrids, each of which may include chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Such nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded polypeptide, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the disclosure.
In a further aspect, the disclosure provides expression vectors comprising the nucleic acid of any aspect of the disclosure operatively linked to a suitable control sequence. "Expression vector" includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the disclosure are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences, and the promoter sequence can still be considered "operably linked" to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In another aspect, the disclosure provides host cells that comprise the polypeptides, oligomers, nucleic acids, and/or expression vectors (i.e., episomal or chromosomally integrated) disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the nucleic acids or expression vector of the disclosure, using techniques including but not limited to bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection. In one embodiment, the disclosure provides pharmaceutical compositions comprising (a) the polypeptide, nucleic acid, expression vector, and/or host cell of any preceding claim; and (b) a pharmaceutically acceptable carrier. In one embodiment, the polypeptide, nucleic acid, expression vector, and/or host cell comprise or encode a polypeptide comprising one or more functional domains selected from protein antigens and protein therapeutics. The composition may also comprise (a) a lyoprotectant; (b) a surfactant; (c) a bulking agent; (d) a tonicity adjusting agent; (e) a stabilizer; (f) a preservative; and/or (g) a buffer. In some embodiments, the buffer in the pharmaceutical composition is a Tris buffer, a histidine buffer, a phosphate buffer, a citrate buffer or an acetate buffer. The composition may also include a lyoprotectant, e.g., sucrose, sorbitol or trehalose. In certain embodiments, the composition includes a preservative, e.g., benzalkonium chloride, benzethonium, chlorhexidine, phenol, m-cresol, benzyl alcohol, methylparaben, propylparaben, chlorobutanol, o-cresol, p-cresol, chlorocresol, phenylmercuric nitrate, thimerosal, benzoic acid, and various mixtures thereof. In other embodiments, the composition includes a bulking agent, such as glycine.
In yet other embodiments, the composition includes a surfactant, e.g., polysorbate-20, polysorbate-40, polysorbate-60, polysorbate-65, polysorbate-80, polysorbate-85, poloxamer-188, sorbitan monolaurate, sorbitan monopalmitate, sorbitan monostearate, sorbitan monooleate, sorbitan trilaurate, sorbitan tristearate, sorbitan trioleate, or a combination thereof. The composition may also include a tonicity adjusting agent, e.g., a compound that renders the formulation substantially isotonic or isoosmotic with human blood. Exemplary tonicity adjusting agents include sucrose, sorbitol, glycine, methionine, mannitol, dextrose, inositol, sodium chloride, arginine and arginine hydrochloride. In other embodiments, the composition additionally includes a stabilizer, e.g., a molecule which substantially prevents or reduces chemical and/or physical instability of the nanostructure, in lyophilized or liquid form. Exemplary stabilizers include sucrose, sorbitol, glycine, inositol, sodium chloride, methionine, arginine, and arginine hydrochloride. The disclosure further provides methods for use of the polypeptide, nucleic acid, expression vector, library, and/or host cell of any embodiment herein for any suitable purpose. In one embodiment, the methods comprise use as synthetic peptide ligands for receptor activation. In another embodiment, the polypeptide comprises an additional functional domain comprising a protein antigen and/or protein therapeutic, and the methods may comprise a method for treating a disorder against which the protein therapeutic is effective, or for generating an immune response against the protein antigen. The disclosure also provides methods for designing a peptide binding polypeptide, comprising any combination of steps as disclosed in the examples.

Examples
General approaches for designing sequence-specific peptide binding proteins would have wide utility in proteomics and synthetic biology.
Although considerable progress has been made in designing proteins that bind to other proteins, the general peptide binding problem is more challenging: most peptides do not have defined structures in isolation, and to offset the loss of solvation upon binding, the protein binding interface has to provide specific hydrogen bonds that complement the majority of the buried peptide’s backbone polar groups. We describe a general approach for de novo design of proteins made of repeating units that bind peptides with repeating sequences, such that there is a one-to-one correspondence between repeat units on the protein and the peptide. We develop a rapid docking plus geometric hashing method to identify protein backbones and protein-peptide rigid body arrangements that are compatible with bidentate hydrogen bonds between side chains on the protein and the backbone of the peptide; the remainder of the protein sequence is then designed using Rosetta™ to incorporate additional interactions with the peptide and drive folding to the desired structure. We use this approach to design, from scratch, alpha helical repeat proteins that bind six different tripeptide repeat sequences (PLP, LRP, PEW, IYP, PRM and PKW) in near polyproline II helical conformations. The proteins are expressed at high levels in E. coli, are hyperstable, and bind peptides with 4-6 copies of the target tripeptide sequences with nanomolar to picomolar affinities both in vitro and in living cells. Crystal structures reveal repeating protein-peptide interactions as designed, including a ladder of protein sidechain to peptide backbone hydrogen bonds. By redesigning the binding interfaces of individual repeat units, specificity can be achieved for non-repeating sequences and for naturally occurring proteins containing disordered regions. Our approach provides a general route to designing specific binding proteins for a broad range of repeating and non-repetitive peptide sequences.
We set out to generalize peptide recognition by modular repeat-protein scaffolds to arbitrary repeating peptide backbone geometries. This requires solving two main challenges: building protein structures with a repeat spacing and orientation matching that of the target peptide conformation, and ensuring the replacement of peptide-water hydrogen bonds in the unbound state with peptide-protein hydrogen bonds in the bound state. The first challenge is critical for modular and extensible sequence recognition: if individual repeat units in the protein are to bind individual repeat units on the peptide in the same orientation, the geometric phasing of the repeat units on protein and peptide must be compatible. The second challenge is critical for achieving high binding affinity: in conformations other than the alpha and 3-10 helix, the NH and C=O groups make hydrogen bonds with water in the unbound state that need to be replaced with hydrogen bonds to the protein upon binding to avoid incurring a substantial free energy penalty. To address the first challenge, we reasoned that a necessary (but not sufficient) criterion for in-phase geometric matching between repeating units on designed protein and peptide was a correspondence between the superhelices that the two trace out. All repeating polymeric structures trace out superhelices which can be described by three parameters: the translation (rise) along the helical axis per repeat unit, the rotation (twist) around this axis, and the distance (radius) of the repeat unit centroid from the axis (Fig.1A). As described in the methods, we generated large sets of repeating protein backbones sampling a wide range of superhelical geometries. We generated corresponding sets of repeating peptide backbones by randomly sampling di-peptide and tri-peptide conformations in allowed regions of the Ramachandran map (avoiding intra-peptide steric clashes), and then repeating these four to six times to generate 12-24 residue peptides. 
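The three superhelical parameters can be computed from the rigid transform that maps one repeat unit onto the next. Treating that transform as a screw motion x → Rx + t, the twist is the rotation angle of R, the rise is the component of t along the rotation axis, and the radius is the distance of a repeat-unit centroid from that axis. The sketch below is a standard screw-axis decomposition in plain Python, not the authors' code; it assumes R and t have already been obtained (for example by superimposing adjacent repeat units), and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def superhelix_params(R: Sequence[Sequence[float]], t: Sequence[float],
                      centroid: Sequence[float]) -> Tuple[float, float, float]:
    """Return (rise in Å, twist in degrees, radius in Å) for the screw motion
    x -> R x + t mapping one repeat unit onto the next."""
    trace = R[0][0] + R[1][1] + R[2][2]
    theta = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))  # twist angle
    s2 = 2.0 * math.sin(theta)
    if abs(s2) < 1e-8:
        raise ValueError("twist near 0 or 180 degrees: axis not recoverable this way")
    # Unit screw axis from the antisymmetric part of R.
    u = ((R[2][1] - R[1][2]) / s2, (R[0][2] - R[2][0]) / s2, (R[1][0] - R[0][1]) / s2)
    rise = dot(t, u)  # translation component along the axis
    t_perp = tuple(ti - rise * ui for ti, ui in zip(t, u))
    # A point on the axis: p = (t_perp + cot(theta/2) * (u x t_perp)) / 2
    cot_half = 1.0 / math.tan(theta / 2.0)
    uxt = cross(u, t_perp)
    p = tuple(0.5 * (a + cot_half * b) for a, b in zip(t_perp, uxt))
    # Radius: component of (centroid - p) perpendicular to the axis.
    d = tuple(ci - pi for ci, pi in zip(centroid, p))
    along = dot(d, u)
    d_perp = tuple(di - along * ui for di, ui in zip(d, u))
    return rise, math.degrees(theta), math.sqrt(dot(d_perp, d_perp))


# Synthetic check: 30 degree twist about a z-parallel axis through (2, 3, 0),
# 5 Å rise per repeat, centroid 10 Å from the axis.
ang = math.radians(30.0)
c, s = math.cos(ang), math.sin(ang)
R = ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))
axis_point = (2.0, 3.0, 0.0)
R_p = tuple(dot(row, axis_point) for row in R)
t = (axis_point[0] - R_p[0], axis_point[1] - R_p[1], 5.0)
rise, twist, radius = superhelix_params(R, t, centroid=(12.0, 3.0, 0.0))
```

Because `acos` returns an angle between 0 and 180 degrees, handedness is carried by the direction of the recovered axis (and hence the sign of the rise) rather than by the sign of the twist.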
We then searched for matching pairs of repeat protein and repeat peptide backbones, requiring that the rise agree within 0.2 Å, the twist within 5 degrees, and that the radii differ by at least 4 Å (the difference in radius is necessary to avoid clashes between peptide and protein; the peptide can wrap either outside or inside the protein). To address the second challenge, we reasoned that bidentate hydrogen bonds between side chains on the protein and pairs of backbone groups, or backbone and side-chain groups, on the peptide could allow sufficient peptide surface area to be buried on the protein to achieve high-affinity binding without incurring a large desolvation penalty. As the geometric requirements for such bidentate hydrogen bonds are quite strict, we developed a geometric hashing approach to enable rapid identification of rigid-body docks of the peptide on the protein that are compatible with ladders of bidentate interactions. To generate the hash tables for bidentate side chain-backbone interactions, Monte Carlo simulations of individual side-chain functional groups making bidentate hydrogen-bonding interactions with peptide backbone and/or side-chain groups were carried out using the Rosetta™ energy function (12) and a move set consisting of both rigid-body perturbations and changes to the peptide backbone torsions (Fig. 1B; see Methods for details). For each accepted (low-energy) arrangement, side-chain rotamer conformations were built backwards from the functional group to identify the possible placements of the protein backbone from which the bidentate interaction could be realized.
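The matching step can be sketched as a simple filter over precomputed superhelical parameters. The thresholds are those quoted above; the `Superhelix` container, function names and example values are hypothetical, and the sketch assumes that rise and twist are expressed in the same signed convention for proteins and peptides. The twist comparison wraps angles so that, for example, 179° and -179° count as 2° apart.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Superhelix:
    """Superhelical parameters of one repeating backbone (hypothetical container)."""
    name: str
    rise: float    # Å per repeat unit
    twist: float   # degrees per repeat unit, signed
    radius: float  # Å, repeat-unit centroid to axis


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def superhelices_match(protein: Superhelix, peptide: Superhelix,
                       max_rise_diff: float = 0.2,
                       max_twist_diff: float = 5.0,
                       min_radius_gap: float = 4.0) -> bool:
    """Rise within 0.2 Å, twist within 5 degrees, and radii at least 4 Å apart
    so that the peptide can wrap outside or inside the protein without clashing."""
    return (abs(protein.rise - peptide.rise) <= max_rise_diff
            and abs(angle_diff(protein.twist, peptide.twist)) <= max_twist_diff
            and abs(protein.radius - peptide.radius) >= min_radius_gap)


def matching_pairs(proteins: Iterable[Superhelix],
                   peptides: Iterable[Superhelix]) -> List[Tuple[str, str]]:
    """All-against-all comparison; adequate for a sketch, slow for huge sets."""
    peptides = list(peptides)
    return [(p.name, q.name) for p in proteins for q in peptides
            if superhelices_match(p, q)]


if __name__ == "__main__":
    # Made-up parameter values, for illustration only.
    proteins = [Superhelix("prot_A", 9.5, 120.0, 18.0), Superhelix("prot_B", 5.0, 30.0, 12.0)]
    peptides = [Superhelix("pep_X", 9.6, 118.0, 8.0), Superhelix("pep_Y", 9.6, 118.0, 16.0)]
    print(matching_pairs(proteins, peptides))
```

For very large backbone libraries, sorting or binning the sets by rise before comparing would avoid the quadratic loop.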
The results of these calculations were stored in hash tables: for each placement, a hash key was computed from the rigid-body transformation and the peptide backbone and side-chain torsion angles determining the positions of the hydrogen-bonding groups (for example, the phi and psi torsion angles for a bidentate hydrogen bond to the NH and CO groups of the same amino acid), and the chi angles of the corresponding rotamer were stored in the hash under this key (18). Hash tables were generated for ASN and GLN making bidentate interactions with the N-H and C=O groups on the backbone of a single residue or of adjacent residues, for ASP and GLU making bidentate interactions with the N-H groups of two successive amino acids, and for side chain-side chain pi-pi and cation-pi interactions (see Methods). To identify rigid-body docks that enable multiple bidentate hydrogen bonds between repeat protein and peptide, we took advantage of the fact that when two superhelical structures are matched along their common axis, there are only two degrees of freedom: the translational and rotational offsets of one superhelix relative to the other. For each repeat protein-repeat peptide pair, we carried out a grid search over these two degrees of freedom, sampling relative translations and rotations in ~1 Å and 10 degree increments (Fig. 1E). For each generated dock, we computed the rigid-body orientation for each peptide-protein residue pair and queried the hash tables to determine very rapidly whether these were suitable for any of the bidentate interactions; docks with fewer than a threshold number of matches were discarded.
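The hash-and-grid-search idea can be illustrated in reduced form. This is a simplified stand-in, not the authors' implementation: instead of a full six-dimensional rigid-body transform, each protein-peptide residue pair is described by its radial gap, axial offset and angular offset on the shared superhelical axis, plus the peptide residue's phi/psi; these are binned into a key, and a dictionary maps keys to stored rotamer chi angles. The bin widths, descriptor, threshold and all names are hypothetical. The grid search then scans the two remaining degrees of freedom (axial shift and rotation) in 1 Å and 10° steps, as in the text, and keeps docks with at least a threshold number of hash hits.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Key = Tuple[int, ...]
Chis = Tuple[float, ...]

# Hypothetical bin widths; the real hash keys use full rigid-body transforms.
DIST_BIN = 1.0    # Å
ANGLE_BIN = 10.0  # degrees


def make_key(radial_gap: float, axial_offset: float, angular_offset: float,
             phi: float, psi: float) -> Key:
    """Discretize a reduced residue-pair descriptor into a hashable key."""
    def abin(angle: float) -> int:  # periodic angles wrapped into [0, 360)
        return math.floor((angle % 360.0) / ANGLE_BIN)
    return (math.floor(radial_gap / DIST_BIN), math.floor(axial_offset / DIST_BIN),
            abin(angular_offset), abin(phi), abin(psi))


def build_hash(samples: Iterable[Tuple[Sequence[float], Chis]]) -> Dict[Key, List[Chis]]:
    """samples: (descriptor, chi angles) for each accepted low-energy arrangement."""
    table: Dict[Key, List[Chis]] = defaultdict(list)
    for descriptor, chis in samples:
        table[make_key(*descriptor)].append(tuple(chis))
    return dict(table)


def dock_search(protein_res, peptide_res, table, z_range=(-5.0, 5.0),
                z_step=1.0, rot_step=10.0, min_matches=3):
    """Grid search over the two superhelical offsets (axial shift dz, rotation dphi).

    protein_res: (radius, z, angle) per protein residue on its superhelix.
    peptide_res: (radius, z, angle, phi, psi) per peptide residue.
    Returns (dz, dphi, [(protein_idx, peptide_idx, chis), ...]) for kept docks.
    """
    kept = []
    n_z = int(round((z_range[1] - z_range[0]) / z_step)) + 1
    for iz in range(n_z):
        dz = z_range[0] + iz * z_step
        for ir in range(int(round(360.0 / rot_step))):
            dphi = ir * rot_step
            matches = []
            for j, (rp, zp, ap) in enumerate(protein_res):
                for k, (rq, zq, aq, phi, psi) in enumerate(peptide_res):
                    key = make_key(rp - rq, zq + dz - zp, aq + dphi - ap, phi, psi)
                    if key in table:  # constant-time lookup instead of geometry checks
                        matches.append((j, k, table[key][0]))
            if len(matches) >= min_matches:  # discard docks below the threshold
                kept.append((dz, dphi, matches))
    return kept
```

The design choice this illustrates is that the expensive part (sampling which geometries permit a bidentate hydrogen bond) is done once when building the table, so each dock is scored by cheap dictionary lookups.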
For the remaining docks, after building the interacting side chains using the chi angle information stored in the hash and rigid-body minimization to optimize hydrogen-bond geometry, we used Rosetta™ combinatorial optimization to design the protein and peptide sequences (20), keeping the residues identified in the hash matching fixed and enforcing sequence identity between repeats in both peptide and protein (see Methods). In initial calculations with unrestricted sampling of peptide conformations, designs were generated with a wide range of peptide conformations. Examples of repeat proteins designed to bind extended beta-strand, polyproline II, and helical peptide backbones, as well as a range of less canonical structures, are shown in Fig. 5. Reasoning that proline-containing peptides would incur a lower entropic cost upon binding, we decided to start experimental characterization with designs containing at least one proline residue; in most such designs the peptide backbone is in or near the polyproline II region of the Ramachandran map. Our design strategy requires matching the twist of the peptide repeat unit with that of the protein, so choosing a peptide repeat length that generates close to a full 360 degree turn requires less twist in the repeat protein; the polyproline II helix has roughly 3 residues per turn, which likely explains why we obtained more designs targeting 3-residue than 2-residue proline-containing repeat units. We selected for experimental characterization 43 designed complexes with near-ideal bidentate hydrogen bonds between protein and peptide, favorable protein-peptide interaction energies (12), high interface shape complementarity (21), and few unsatisfied interface hydrogen bonds (22), and which consistently retained more than 80% of the interchain hydrogen bonds in 20 ns molecular dynamics trajectories.
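The twist argument can be made concrete with a little arithmetic. Assuming an idealized left-handed polyproline II helix with exactly 3 residues per turn (a per-residue twist of -120°), the net rotation contributed by one peptide repeat unit, wrapped into [-180°, 180°), is zero for a 3-residue repeat but ±120° for 2- or 4-residue repeats. Only the tripeptide repeat can therefore be matched by a nearly untwisted repeat protein. The function and constant names are hypothetical.

```python
def repeat_twist(twist_per_residue_deg: float, residues_per_repeat: int) -> float:
    """Net rotation per repeat unit in degrees, wrapped to [-180, 180)."""
    total = twist_per_residue_deg * residues_per_repeat
    return (total + 180.0) % 360.0 - 180.0


# Idealized left-handed PPII helix: 3 residues per turn, i.e. -120 degrees per residue.
PPII_TWIST_PER_RESIDUE = -120.0

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, repeat_twist(PPII_TWIST_PER_RESIDUE, n))
    # 2 120.0
    # 3 0.0
    # 4 -120.0
```

Real peptide backbones deviate from the ideal helix, so in practice the residual twist is small rather than exactly zero.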
We obtained synthetic genes encoding the designed proteins with a terminal biotinylation tag, expressed the proteins in E. coli, and purified them by Ni-NTA chromatography; 30 of 49 were monomeric and soluble. To assess binding, the target peptides were displayed on the yeast cell surface (23), and binding of the repeat proteins was monitored by flow cytometry. To obtain a complete readout of the peptide-binding specificity of individual designs, we used, in parallel, large-scale array-based oligonucleotide synthesis to generate yeast display libraries encoding all 2- and 3-residue repeat peptides with 8 repeat units each, and used fluorescence-activated cell sorting (FACS) followed by Sanger sequencing to identify the peptides recognized by each designed protein. Many of the designs bound peptides with sequences similar to those targeted, but both affinity and specificity were relatively low, and most of the successes were for 3-residue repeat units. Based on these results, we sought to increase the peptide sequence specificity of the computational design protocol, focusing on the design of binders for peptides with 3-residue repeat units. First, we required that each non-proline residue in the peptide make specific contacts with the protein, and that the pockets and grooves engaging side chains emanating from the two sides of the peptide be quite distinct. Second, following design, we evaluated the change in binding energy (Rosetta™ DDG) (24) for all single-residue changes to the peptide repeating unit, and selected only designs for which the target sequence made the most favorable interactions with the designed protein. Third, we used computational alanine scanning to remove hydrophobic residues on the protein surface that do not contribute to binding specificity, in order to decrease non-specific binding (25).
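The second specificity filter can be sketched as follows: enumerate every single-residue change to the repeat unit (3 positions × 19 alternatives = 57 variants for a tripeptide) and keep the design only if the target unit scores best. The scoring callable is a hypothetical stand-in for a Rosetta DDG calculation on the complex (lower is more favorable); the toy scorer in the example exists only to make the sketch runnable, and all names are hypothetical.

```python
from typing import Callable, Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(unit: str) -> Iterator[str]:
    """Yield every repeat unit that differs from `unit` at exactly one position."""
    for i, original in enumerate(unit):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield unit[:i] + aa + unit[i + 1:]


def target_is_best(unit: str, ddg: Callable[[str], float], margin: float = 0.0) -> bool:
    """True if the target unit scores lower (more favorable) than every
    single-substitution variant by more than `margin`."""
    best = ddg(unit)
    return all(ddg(variant) > best + margin for variant in single_substitutions(unit))


def toy_ddg(unit: str) -> float:
    """Hypothetical scorer: one penalty unit per mismatch to PLP."""
    return sum(1.0 for a, b in zip(unit, "PLP") if a != b)


if __name__ == "__main__":
    print(sum(1 for _ in single_substitutions("PLP")), target_is_best("PLP", toy_ddg))
```

In a real pipeline each substitution would be applied to every repeat of the peptide before scoring, since the change is to the repeating unit.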
Fourth, to assess the structural specificity of the designed peptide-binding interface, we carried out Monte Carlo flexible-backbone docking calculations, starting from large numbers of peptide conformations with superhelical parameters in the range of those of the proteins, and selected designs whose docked peptide backbones converged (RMSD < 2.0 Å among the 20 docked models with the lowest DDG) close to the design model (RMSD < 1.5 Å). We tested 54 second-round designed protein-peptide pairs using the yeast flow cytometry assay described above. Of these, 42 designed proteins were solubly expressed in E. coli, and 16 designs bound their targets with considerably higher affinity and specificity than in the first round. We selected six designs with diverse superhelical parameters and shapes, and a range of target peptides, for more detailed characterization (PLPx6, LRPx6, PEWx6, IYPx6, PRMx6 and PKWx6). As evident in the design models, these are six-repeat proteins with a one-to-one match between repeat units in the protein and in the target peptide. Small-angle X-ray scattering (SAXS) profiles (26, 27) were close to those computed from the design models, suggesting that the proteins fold into the designed shapes in solution. Circular dichroism (CD) studies showed that all six are largely helical and thermostable up to 95°C. Bio-Layer Interferometry (BLI) characterization of binding to biotinylated target peptides immobilized on Octet™ sensor chips revealed dissociation constant (Kd) values ranging from <500 pM to ~40 nM, with five of the six designs having dissociation half-lives ≥ 500 s. Little decrease in binding was observed after storage of the proteins for 30 days at 4°C. Three of the six designs showed little dissociation after 1,000-2,000 s in buffer, indicating that their binding is too tight for the Kd to be measured accurately by BLI.
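The BLI quantities quoted above are linked by the standard 1:1 (Langmuir) binding relations Kd = koff/kon and dissociation half-life t½ = ln 2/koff, so a half-life of 500 s corresponds to koff ≈ 1.4 × 10⁻³ s⁻¹. The sketch below is not the Octet analysis software: it estimates koff from a baseline-subtracted dissociation trace by a log-linear least-squares fit, assuming single-exponential decay to zero and strictly positive signal, and its function names and synthetic trace are hypothetical. It also shows why very slow dissociation is hard to quantify: when little signal is lost over a 1,000-2,000 s window, the fitted slope, and hence koff and Kd, are poorly determined.

```python
import math
from typing import Sequence


def fit_koff(times_s: Sequence[float], signal_nm: Sequence[float]) -> float:
    """Least-squares slope of ln(signal) vs time for R(t) = R0 * exp(-koff * t)."""
    if len(times_s) != len(signal_nm) or len(times_s) < 2:
        raise ValueError("need at least two paired time points")
    ys = [math.log(r) for r in signal_nm]  # requires strictly positive signal
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, ys))
    return -sxy / sxx


def half_life_s(koff_per_s: float) -> float:
    """Dissociation half-life t1/2 = ln 2 / koff."""
    return math.log(2.0) / koff_per_s


def kd_molar(kon_per_molar_s: float, koff_per_s: float) -> float:
    """Equilibrium dissociation constant Kd = koff / kon."""
    return koff_per_s / kon_per_molar_s


if __name__ == "__main__":
    t = [10.0 * i for i in range(101)]               # 0-1000 s dissociation step
    r = [0.8 * math.exp(-1.0e-3 * ti) for ti in t]   # synthetic, noise-free trace
    koff = fit_koff(t, r)
    print(f"koff = {koff:.2e} 1/s, t1/2 = {half_life_s(koff):.0f} s")
    print(f"Kd = {kd_molar(1.0e5, koff):.1e} M")
```

A global nonlinear fit of association and dissociation phases, as instrument software typically performs, is more robust to noise and baseline drift than this log-linear shortcut.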
The binding surfaces of several related designs were subjected to site saturation mutagenesis (SSM) (28) on yeast; following incorporation of 1-3 enriched substitutions, strong binding signals were obtained in flow cytometry using only 10 pM biotinylated cognate peptides for two designs (RPB_PLP1_R6 and RPB_PEW1_R6). Many current cell biology approaches (29) involve tagging cellular target proteins with a protein or peptide, and then introducing into the same cell a protein that binds the tag with high affinity and specificity but does not bind endogenous targets. A bottleneck in such studies is that binders obtained from antibody-scaffold (scFv or VHH) library screens often do not fold properly in the reducing environment of the cytosol, resulting in loss of binding (30). We reasoned that our binders would not have this limitation, as they are designed for stability and lack disulfide bonds. As a proof of concept, we coexpressed the peptide PLPx6 fused to GFP together with its cognate binder, RPB_PLP2_R6 (a variant of RPB_PLP1_R6), fused to both mScarlet™ and a targeting sequence for the mitochondrial outer membrane (Fig. 2A). While the PLPx6 peptide on its own was diffuse in the cytosol (Fig. 2B), upon coexpression with the binder it was relocalized to mitochondria (Fig. 2C; see also Fig. 6 for controls showing that binder overexpression does not affect mitochondrial shape). Thus the PLPx6/RPB_PLP2_R6 pair retains binding activity in cells. Similar results were obtained for IRPx6-GFP and its cognate binder PXX13_FW6 (Fig. 2D,E). If individual repeat units on the designed protein engage individual repeat units on the target peptide, binding affinity should increase with the number of repeats. We investigated this with four of our designed systems, in two cases varying the number of protein repeats while keeping the peptide constant, and in the other two varying the number of peptide repeats while keeping the protein constant.
Six-repeat versions of RPB_LRP2_R6 and RPB_PEW2_R6 had higher affinity for eight-repeat LRP and PEW peptides than four-repeat versions, without any decrease in specificity. Similarly, six-repeat IYP and PLP peptides had higher affinity for six-repeat versions of the cognate designed repeat proteins (RPB_IYP1_R6, RPB_PLP1_R6) than four-repeat versions. These results are consistent with a one-to-one modular interaction between repeat units on the protein and peptide, and suggest a route to very high binding affinity by simply increasing the number of interacting repeat units. The ability to tune affinity simply by varying the number of repeats could be useful in many contexts where competitive binding is advantageous; for example, in affinity purification, a peptide with more repeats than the one fused to the protein being purified could be used for elution. To assess the structural accuracy of our design method, we used X-ray crystallography. We obtained high-resolution co-crystal structures of three first-round designs (RPB_PEW3_R4 - PAWx4, RPB_LRP2_R4 - LRPx4, RPB_PLP3_R6 - PLPx6) and one second-round design (RPB_PLP1_R6 - PLPx6) (Figure 3), as well as a crystal structure of the unbound first-round design RPB_LRP2_R4. In the crystal structure of the RPB_PLP3_R6 - PLPx6 complex, the PLP units fit exactly into the designed curved groove formed by repeating tyrosine, alanine, and tryptophan residues, matching the design model with near-atomic accuracy, with Cα RMSDs of 1.70 Å for the binder alone, 2.00 Å for the peptide-neighbor interface and 1.64 Å for the whole complex (Figure 3B).
In the co-crystal structure of RPB_PEW3_R4 - PAWx4, the PAW units bind to a relatively flat groove formed by repeating histidine and glutamine residues as designed (Figure 3A); the Cα root-mean-square deviation (RMSD) between the design model and the crystal structure over the repeat protein is 2.08 Å, and the median RMSD to the crystal structure over the peptide and binding residues in the ensemble generated by flexible docking (which converged less well than for the second-round designs) is 2.12 Å (range 0.03-3.89 Å). For RPB_LRP2_R4 - LRPx4, flexible-backbone docking converged well, with the LRP units sitting between repeating glutamine and phenylalanine residues as designed, and the peptide arginine side chain sampling two distinct states associated with parallel and antiparallel protein binding modes. The lowest-energy docked structure was close to the crystal structure, with Cα RMSDs of 1.15 Å for the binder alone, 0.98 Å for the peptide plus protein contacting residues, and 1.16 Å over the entire complex (Figure 3C). SSM binding-interface footprinting results were consistent with the design model and crystal structure (Fig. 7), and an F-to-W substitution that increases interactions across the interface substantially increases affinity. The 2.15 Å crystal structure of the second-round design RPB_PLP1_R6 - PLPx6 (SEQ ID NO: 114) highlights key features of the computational design protocol. The PLPx6 (SEQ ID NO: 114) peptide binds to the slightly curved groove primarily through polar interactions from tyrosine, hydrophobic interactions from valine, and side chain-backbone bidentate hydrogen bonds from glutamine, exactly as designed (Figure 3D-G). The Cα RMSDs are 1.11 Å for the peptide-neighbor interface, 1.81 Å for the binder alone, and 1.91 Å for the complex. All interacting side chains on both the protein and peptide sides of the computational design model are nearly perfectly recapitulated in the crystal structure.
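The docking-convergence criterion used to select second-round designs, and the ensemble RMSD statistics discussed here, can be sketched as follows. This encodes one possible reading, stated as an assumption: docked models are taken to be in the frame of the fixed protein (so RMSD is computed without further superposition), "converged" is taken to mean that every pairwise RMSD among the 20 lowest-DDG models is below 2.0 Å, and the lowest-DDG model must lie within 1.5 Å of the design model. Function names are hypothetical; comparisons against crystal structures would first superimpose the structures (for example with the Kabsch algorithm).

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between matched atom lists already in a common frame (no superposition)."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def docking_converged(models: List[Tuple[float, Coords]], design: Coords,
                      top_n: int = 20, max_pairwise: float = 2.0,
                      max_to_design: float = 1.5) -> bool:
    """models: (ddg, peptide backbone coords) per docked model; lower ddg is better."""
    top = [coords for _, coords in sorted(models, key=lambda m: m[0])[:top_n]]
    converged = all(rmsd(a, b) < max_pairwise for a, b in combinations(top, 2))
    return converged and rmsd(top[0], design) < max_to_design
```

Requiring all pairwise RMSDs to pass is a strict reading; a looser variant would test the mean or median pairwise RMSD instead.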
This design has near-picomolar binding affinity and high specificity for the PLP target sequence. We next investigated the specificity of the six designs. The PLPx6 (SEQ ID NO: 114), LRPx6 (SEQ ID NO: 115), PEWx6 (SEQ ID NO: 116), IYPx6 (SEQ ID NO: 117) and PKWx6 (SEQ ID NO: 118) binders showed almost complete orthogonality in the 5-40 nM concentration range, with each design binding its cognate designed repeat peptide much more strongly than the other repeat peptides. For example, PLPx6 binds design RPB_PLP1_R6 strongly at 5 nM but shows no binding signal to design RPB_IYP1_R6 at 40 nM, while PEWx6 binds design RPB_PEW1_R6 but shows no binding at all to design RPB_PKW1_R6 at 20 nM. Some crosstalk was observed between the PRMx6 (SEQ ID NO: 119) and LRPx6 (SEQ ID NO: 115) binders, perhaps involving the arginine residue, which makes cation-pi interactions in both designs. We observe similar orthogonality of the peptide/binder interactions in cells: the IRPx6 (SEQ ID NO: 120) and PLPx6 (SEQ ID NO: 114) binders specifically direct localization of their cognate peptides to different compartments when coexpressed in the same cells (Fig. 2E, F). By enabling the design of multiple orthogonal protein-peptide pairs, our approach provides a route to probing the effects of localizing different proteins to different locations in the same cell. As described thus far, our approach enables specific binding of peptides with perfectly repeating structures. To go beyond this limitation and enable targeting of a much wider range of non-repeating peptides, we investigated redesigning a subset of the peptide repeat-unit binding pockets to change their specificity. We broke the symmetry of the designed repetitive binding interface by redesigning both protein and peptide in one or more repeats of six-repeat complexes; the rest of the interface was kept untouched to maintain binding affinity.
Following redesign, the peptide backbone conformation was optimized by Monte Carlo resampling and rigid-body optimization (see Methods). Designs were selected for experimental characterization as described above, favoring those for which the new design had a lower binding energy for the new peptide than for the original peptide. We redesigned the PLPx6 (SEQ ID NO: 114) binder RPB_PLP3_R6 to bind two PEP units in the third and fourth positions (target binding sequence PLPPLPPEPPEPPLPPLP (SEQ ID NO: 106), or more concisely, PLP2PEP2PLP2). The redesigned protein, called RPB_hyb1_R6, bound the redesigned peptide considerably more tightly in Octet™ experiments, while the original design favored the original perfectly repeating sequence, resulting in nearly complete orthogonality. We next designed another hybrid starting from the RPB_IYP1_R6 - IYPx6 complex, changing three of the IYP units to RYP to generate IYP3RYP3, and redesigning the corresponding binding pockets. The new design, RPB_hyb2_R6, also selectively bound its intended cognate target. We measured binding of all four proteins against all four peptides and observed quite high specificity of the designed repeat proteins for their intended peptide targets. The ability to design hybrid binders against non-repetitive sequences opens the door to the de novo design of binders against endogenous proteins. Intrinsically disordered regions (IDRs) are targets of choice, as they have been very difficult to recognize specifically using other approaches, and folding will not interfere with binding. As a proof of concept, we focused on human ZFC3H1, a 200 kDa protein that together with MTR4 forms the heterotetrameric poly(A) tail exosome targeting (PAXT) complex, which directs a subset of long polyadenylated RNAs for exosomal degradation (Fig. 4A).
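The compact notation used here (PLP2PEP2PLP2, IYP3RYP3, and the ZFC3H1 target PLP4PEDPEQPPKPPF) writes each repeating unit followed by its copy number, and a small parser makes the correspondence with the full sequences explicit. The parsing rule is an assumption for illustration: each run of uppercase one-letter amino-acid codes is a unit, an optional trailing number (optionally preceded by "x", as in PLPx6) is its copy number, and a run without a number appears once. The function name is hypothetical.

```python
import re

# An uppercase unit, then an optional copy number (optionally written as "x6").
_TOKEN = re.compile(r"([A-Z]+)(?:x?(\d+))?")


def expand_repeats(notation: str) -> str:
    """Expand compact repeat notation into a full one-letter sequence."""
    text = "".join(notation.split())  # tolerate spaces, e.g. "IYP 3 RYP 3"
    parts, pos = [], 0
    for m in _TOKEN.finditer(text):
        if m.start() != pos:
            raise ValueError(f"unparsed text at position {pos}: {text[pos:m.start()]!r}")
        unit, count = m.group(1), int(m.group(2) or 1)
        parts.append(unit * count)
        pos = m.end()
    if pos != len(text):
        raise ValueError(f"unparsed text at position {pos}: {text[pos:]!r}")
    return "".join(parts)


if __name__ == "__main__":
    print(expand_repeats("PLP2PEP2PLP2"))  # matches SEQ ID NO: 106
    print(len(expand_repeats("PLP4PEDPEQPPKPPF")))
```

Expanding PLP4PEDPEQPPKPPF gives a 24-residue sequence, consistent with the 24-residue target peptide described for the ZFC3H1 binders.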
We designed binders against ZFC3H1 residues 594-620 (PLP4PEDPEQPPKPPF (SEQ ID NO: 107)), which lie within a ~100 residue disordered region (Fig. 4A), by extending both the protein and the peptide in the PLPx4 (SEQ ID NO: 109) designed complex. On the peptide side, we kept the (PLP)x4 (SEQ ID NO: 109) backbone fixed and used Monte Carlo sampling with Ramachandran map biases to model the remaining sequence (PEDPEQPPKPPF (SEQ ID NO: 108)); on the protein side, we extended the PLPx4 design with four additional repeats and designed the interface with each conformer, then selected eight designs for experimental characterization as described above for the pure repeat binders. The eight designs were expressed and found to bind the extended target peptide by biolayer interferometry. The two highest-affinity designs were further characterized by fluorescence polarization and found to bind the 24-residue target peptide. The one with the highest affinity, named αZFC-high (Fig. 4B), co-eluted by size-exclusion chromatography (SEC) with a 103 amino acid segment of the disordered region of ZFC3H1 containing the targeting sequence (Fig. 4D), demonstrating that the binder can recognize the target peptide in a larger protein context. αZFC-high specifically pulled down endogenous ZFC3H1 from human cell extracts, as assessed by western blot with established antibodies (Fig. 4E, upper panel), in contrast to the lower-affinity binder αZFC-low, which has similar size and surface composition and hence provides a control for non-specific association (see Fig. 8 for replicates, and Fig. 4F for independent identification of ZFC3H1 by mass spectrometry). Mass spectrometry revealed that MTR4 was enriched in the αZFC-high pulldown, demonstrating that the binder can recognize the native PAXT complex in a physiological context.
We also detected in the αZFC-high pulldown, but not in the αZFC-low pulldown, additional ZFC3H1 partners present in the BioPlex 3.0 interactome in multiple cell lines (33, 34), including BUB3 and ZN207, as well as multiple RNA-binding proteins that likely associate with PAXT-RNA assemblies (Fig. 4F). Our results demonstrate that by matching superhelical parameters between repeating protein and peptide conformations, together with incorporating specific hydrogen-bonding and hydrophobic interactions, new repeat proteins that bind repeating peptide sequences with high affinity and specificity can now be designed. The approach should be generalizable to a wide range of repeating peptide structures, and the ability to break symmetry by redesigning individual repeat units opens the door to more general peptide recognition. Our approach complements current efforts to achieve general peptide recognition by redesigning naturally occurring repeat proteins; an advantage of our approach is that a much broader range of protein conformations and binding-site geometries can be generated by de novo protein design than by starting from a native protein backbone. Proteins binding repeating or nearly repeating sequences could have applications as affinity reagents for diseases associated with repeat expansions, such as Huntington’s disease, and the rigid fusion of protein modules designed with the approach described here to recognize different di-, tri- and tetrapeptide sequences provides an avenue to specific recognition of entirely non-repeating sequences. The ability to design specific binders to proteins containing large disordered regions, demonstrated by the specific pulldown of the PAXT complex (Fig. 4), should help delineate the functions of this important but relatively poorly understood class of proteins and reduce reliance on animal immunization to generate antibodies, which can also suffer from reproducibility issues.
More generally, our results demonstrate the power of computational protein design for targeting peptides that lack rigid three-dimensional structures, and as the designed proteins are expressed at high levels and are very stable, we anticipate that these and further designs for a wider range of target sequences should find broad use in proteomics and other applications requiring specific peptide recognition.
REFERENCES
1. N. London, D. Movshovitz-Attias, O. Schueler-Furman, The Structural Basis of Peptide-Protein Binding Strategies. Structure 18, 188–199 (2010).
2. V. Neduva, R. Linding, I. Su-Angrand, A. Stark, F. de Masi, T. J. Gibson, J. Lewis, L. Serrano, R. B. Russell, Systematic Discovery of New Recognition Peptides Mediating Protein Interaction Networks. PLOS Biol. 3, e405 (2005).
3. V. Neduva, R. B. Russell, Peptides mediating interaction networks: new leads at last. Curr. Opin. Biotechnol. 17, 465–471 (2006).
4. P. Ernst, A. Plückthun, Advances in the design and engineering of peptide-binding repeat proteins. Biol. Chem. 398, 23–29 (2017).
5. M. A. Andrade, C. Petosa, S. I. O’Donoghue, C. W. Müller, P. Bork, Comparison of ARM and HEAT protein repeats. J. Mol. Biol. 309, 1–18 (2001).
6. C. Reichen, S. Hansen, C. Forzani, A. Honegger, S. J. Fleishman, T. Zhou, F. Parmeggiani, P. Ernst, C. Madhurantakam, C. Ewald, P. R. E. Mittl, O. Zerbe, D. Baker, A. Caflisch, A. Plückthun, Computationally Designed Armadillo Repeat Proteins for Modular Peptide Recognition. J. Mol. Biol. 428, 4467–4489 (2016).
7. E. Conti, J. Kuriyan, Crystallographic analysis of the specific yet versatile recognition of distinct nuclear localization signals by karyopherin α. Structure 8, 329–338 (2000).
8. E. Conti, M. Uy, L. Leighton, G. Blobel, J. Kuriyan, Crystallographic Analysis of the Recognition of a Nuclear Localization Signal by the Nuclear Import Factor Karyopherin α. Cell 94, 193–204 (1998).
9. N. Zeytuni, R.
Zarivach, Structural and Functional Discussion of the Tetra-Trico-Peptide Repeat, a Protein Interaction Module. Structure 20, 397–405 (2012).
10. L. D. D’Andrea, L. Regan, TPR proteins: the versatile helix. Trends Biochem. Sci. 28, 655–662 (2003).
11. P. Ernst, F. Zosel, C. Reichen, D. Nettels, B. Schuler, A. Plückthun, Structure-Guided Design of a Peptide Lock for Modular Peptide Binders. ACS Chem. Biol. 15, 457–468 (2020).
12. R. F. Alford, A. Leaver-Fay, J. R. Jeliazkov, M. J. O’Meara, F. P. DiMaio, H. Park, M. V. Shapovalov, P. D. Renfrew, V. K. Mulligan, K. Kappel, J. W. Labonte, M. S. Pacella, R. Bonneau, P. Bradley, R. L. Dunbrack, R. Das, D. Baker, B. Kuhlman, T. Kortemme, J. J. Gray, The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
13. S. Hansen, D. Tremmel, C. Madhurantakam, C. Reichen, P. R. E. Mittl, A. Plückthun, Structure and Energetic Contributions of a Designed Modular Peptide-Binding Protein with Picomolar Affinity. J. Am. Chem. Soc. 138, 3526–3532 (2016).
14. C. Reichen, S. Hansen, A. Plückthun, Modular peptide binding: From a comparison of natural binders to designed armadillo repeat proteins. J. Struct. Biol. 185, 147–162 (2014).
15. J. A. Cross, M. S. Chegkazi, R. A. Steiner, D. N. Woolfson, M. P. Dodding, Fragment-linking peptide design yields a high-affinity ligand for microtubule-based transport. Cell Chem. Biol. 28, 1347–1355.e5 (2021).
16. P. J. Fleming, G. D. Rose, Do all backbone polar groups in proteins form hydrogen bonds? Protein Sci. 14, 1911–1917 (2005).
17. T. J. Brunette, F. Parmeggiani, P.-S. Huang, G. Bhabha, D. C. Ekiert, S. E. Tsutakawa, G. L. Hura, J. A. Tainer, D. Baker, Exploring the repeat protein universe through computational protein design. Nature 528, 580–584 (2015).
18. L. Shimoni, J. P. Glusker, Hydrogen bonding motifs of protein side chains: descriptions of binding of arginine and amide groups. Protein Sci. 4, 65–74 (1995).
19. J.
A. Fallas, G. Ueda, W. Sheffler, V. Nguyen, D. E. McNamara, B. Sankaran, J. H. Pereira, F. Parmeggiani, T. J. Brunette, D. Cascio, T. R. Yeates, P. Zwart, D. Baker, Computational design of self-assembling cyclic protein homo-oligomers. Nat. Chem. 9, 353–360 (2017).
20. J. K. Leman, B. D. Weitzner, S. M. Lewis, J. Adolf-Bryfogle, N. Alam, R. F. Alford, M. Aprahamian, D. Baker, K. A. Barlow, P. Barth, B. Basanta, B. J. Bender, K. Blacklock, J. Bonet, S. E. Boyken, P. Bradley, C. Bystroff, P. Conway, S. Cooper, B. E. Correia, B. Coventry, R. Das, R. M. De Jong, F. DiMaio, L. Dsilva, R. Dunbrack, A. S. Ford, B. Frenz, D. Y. Fu, C. Geniesse, L. Goldschmidt, R. Gowthaman, J. J. Gray, D. Gront, S. Guffy, S. Horowitz, P.-S. Huang, T. Huber, T. M. Jacobs, J. R. Jeliazkov, D. K. Johnson, K. Kappel, J. Karanicolas, H. Khakzad, K. R. Khar, S. D. Khare, F. Khatib, A. Khramushin, I. C. King, R. Kleffner, B. Koepnick, T. Kortemme, G. Kuenze, B. Kuhlman, D. Kuroda, J. W. Labonte, J. K. Lai, G. Lapidoth, A. Leaver-Fay, S. Lindert, T. Linsky, N. London, J. H. Lubin, S. Lyskov, J. Maguire, L. Malmström, E. Marcos, O. Marcu, N. A. Marze, J. Meiler, R. Moretti, V. K. Mulligan, S. Nerli, C. Norn, S. Ó’Conchúir, N. Ollikainen, S. Ovchinnikov, M. S. Pacella, X. Pan, H. Park, R. E. Pavlovicz, M. Pethe, B. G. Pierce, K. B. Pilla, B. Raveh, P. D. Renfrew, S. S. R. Burman, A. Rubenstein, M. F. Sauer, A. Scheck, W. Schief, O. Schueler-Furman, Y. Sedan, A. M. Sevy, N. G. Sgourakis, L. Shi, J. B. Siegel, D.-A. Silva, S. Smith, Y. Song, A. Stein, M. Szegedy, F. D. Teets, S. B. Thyme, R. Y.-R. Wang, A. Watkins, L. Zimmerman, R. Bonneau, Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17, 665–680 (2020).
21. D. Kuroda, J. J. Gray, Shape complementarity and hydrogen bond preferences in protein–protein interfaces: implications for antibody modeling and protein–protein docking. Bioinformatics 32, 2451–2456 (2016).
22. B. Coventry, D.
Baker, Protein sequence optimization with a pairwise decomposable penalty for buried unsatisfied hydrogen bonds. PLOS Comput. Biol. 17, e1008061 (2021).
23. E. T. Boder, K. D. Wittrup, Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol. 15, 553–557 (1997).
24. T. Kortemme, D. Baker, A simple physical model for binding energy hot spots in protein–protein complexes. Proc. Natl. Acad. Sci. 99, 14116–14121 (2002).
25. T. Kortemme, D. E. Kim, D. Baker, Computational Alanine Scanning of Protein-Protein Interfaces. Sci. STKE 2004, pl2 (2004).
26. G. L. Hura, H. Budworth, K. N. Dyer, R. P. Rambo, M. Hammel, C. T. McMurray, J. A. Tainer, Comprehensive macromolecular conformations mapped by quantitative SAXS analyses. Nat. Methods 10, 453–454 (2013).
27. Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat. Methods (available at https://www.nature.com/articles/nmeth.1353).
28. R. M. P. Siloto, R. J. Weselake, Site saturation mutagenesis: Methods and applications in protein engineering. Biocatal. Agric. Biotechnol. 1, 181–189 (2012).
29. J. Helma, M. C. Cardoso, S. Muyldermans, H. Leonhardt, Nanobodies and recombinant binders in cell biology. J. Cell Biol. 209, 633–644 (2015).
30. S. Moutel, N. Bery, V. Bernard, L. Keller, E. Lemesre, A. de Marco, L. Ligat, J.-C. Rain, G. Favre, A. Olichon, F. Perez, NaLi-H1: A universal synthetic library of humanized nanobodies providing highly functional antibodies and intrabodies. eLife 5, e16228 (2016).
31. A.-E. Foucher, L. Touat-Todeschini, A. B. Juarez-Martinez, A. Rakitch, H. Laroussi, C. Karczewski, S. Acajjaoui, M. Soler-López, S. Cusack, C. D. Mackereth, A. Verdel, J. Kadlec, Structural analysis of Red1 as a conserved scaffold of the RNA-targeting MTREC/PAXT complex. Nat. Commun. 13, 4969 (2022).
32. N. Meola, M. Domanski, E. Karadoulama, Y. Chen, C. Gentil, D. Pultz, K. Vitting-Seerup, S. Lykke-Andersen, J. S. Andersen, A. Sandelin, T. H.
Jensen, Identification of a Nuclear Exosome Decay Pathway for Processed Transcripts. Mol. Cell 64, 520–533 (2016).
33. D. K. Schweppe, E. L. Huttlin, J. W. Harper, S. P. Gygi, BioPlex Display: An Interactive Suite for Large-Scale AP–MS Protein–Protein Interaction Data. J. Proteome Res. 17, 722–726 (2018).
34. E. L. Huttlin, R. J. Bruckner, J. Navarrete-Perea, J. R. Cannon, K. Baltier, F. Gebreab, M. P. Gygi, A. Thornock, G. Zarraga, S. Tam, J. Szpyt, B. M. Gassaway, A. Panov, H. Parzen, S. Fu, A. Golbazi, E. Maenpaa, K. Stricker, S. Guha Thakurta, T. Zhang, R. Rad, J. Pan, D. P. Nusinow, J. A. Paulo, D. K. Schweppe, L. P. Vaites, J. W. Harper, S. P. Gygi, Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 184, 3022–3040.e28 (2021).
Materials and Methods
Synthetic gene constructs
For both the first- and second-round designs, a His-tag with a TEV protease cleavage site, together with short linkers, was added to the N-terminus of the protein sequences. For proteins lacking a tryptophan residue, a single tryptophan was added to the short N-terminal linker following the TEV protease cleavage site to help with protein concentration quantification by A280. The protein sequence along with the linker (MGSSHHHHHHHHSSGGSGGLNDIFEAQKIEWHEGGSGGSENLYFQSG (SEQ ID NO: 110) or LEHHHHHH (SEQ ID NO: 111)) was reverse translated into DNA using a custom Python script that attempts to maximize the host-specific codon adaptation index¹ and IDT synthesizability, which includes optimizing whole-gene and local GC content as well as removing repetitive sequences. Finally, a TAATCA stop codon was appended to the end of each gene. Genes were delivered cloned into pET-29b+ between NdeI/XhoI restriction sites.
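The custom codon-optimization script itself is not given here. As a hedged illustration of the kinds of synthesizability checks it is described as performing, the sketch below computes whole-gene GC content, flags sliding windows whose local GC content falls outside a range, and lists repeated k-mers. The window size, GC bounds and k-mer length are hypothetical placeholders, not the authors' or IDT's actual criteria, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_window_violations(seq: str, window: int = 50, lo: float = 0.25,
                         hi: float = 0.75) -> List[Tuple[int, float]]:
    """(start, GC fraction) for each sliding window whose GC lies outside [lo, hi]."""
    hits = []
    for start in range(len(seq) - window + 1):
        frac = gc_fraction(seq[start:start + window])
        if frac < lo or frac > hi:
            hits.append((start, frac))
    return hits


def repeated_kmers(seq: str, k: int = 12) -> List[str]:
    """Sorted k-mers that occur more than once (candidate repetitive sequence)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(kmer for kmer, n in counts.items() if n > 1)
```

A codon optimizer would typically run checks like these after each round of codon choices and resample codons in flagged regions; genes encoding repeat proteins are especially prone to repeated k-mers.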
For the second-round designs, the designed amino acid sequences were inserted directly into pET-29b+ between NdeI/XhoI restriction sites.
Protein expression and purification
Proteins were transformed into Lemo21(DE3) E. coli from New England Biolabs (NEB) and expressed as 50 ml cultures in 250 ml flasks using Studier's M2 autoinduction medium with 50 µg/mL kanamycin. The cultures were grown either at 37°C for ~6-8 hours and then at ~18°C overnight (~14 hours), or at 37°C for the entire ~14 hours. Cells were pelleted at 4,000g for 10 minutes, and the supernatant was discarded. Pellets were resuspended in 30 ml lysis buffer (25 mM Tris HCl pH 8, 150 mM NaCl, 30 mM imidazole, 1 mM PMSF, 0.75% CHAPS, 1 mM DNase, 10 mM lysozyme, with a Thermo Scientific Pierce protease inhibitor tablet). Cell suspensions were lysed by microfluidizer or sonication, and the lysate was clarified at 20,000g for ~30 minutes. The His-tagged proteins were bound to Ni-NTA resin (Qiagen) by gravity flow and washed with wash buffer (25 mM Tris HCl pH 8, 150 mM NaCl, 30 mM imidazole). Protein was eluted with elution buffer (25 mM Tris HCl pH 8, 150 mM NaCl, 300 mM imidazole). For the first-round designs, the His-tag was removed by TEV cleavage, followed by IMAC purification to remove the TEV protease. The flowthrough was collected and concentrated prior to further purification by SEC/FPLC on a Superdex™ 200 Increase 10/300 GL column in TBS (25 mM Tris pH 8.0, 150 mM NaCl).
Circular dichroism
Circular dichroism spectra were measured with an AVIV Model 420 CD or Jasco J-1500 CD spectrometer. Samples were 0.25 mg/mL in TBS (25 mM Tris pH 8.0, 150 mM NaCl), and a 1-mm path length cuvette was used. The CD signal was converted to mean residue ellipticity by dividing the raw spectra by N × C × L × 10, where N is the number of residues, C is the concentration of protein, and L is the path length (0.1 cm).
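The mean residue ellipticity conversion can be written out explicitly. The sketch assumes the raw signal is in millidegrees and C is the molar protein concentration, which gives [θ]MRE in deg·cm²·dmol⁻¹; converting the 0.25 mg/mL sample concentration to molarity requires the protein's molecular weight, which is an input here. Function names and the example protein (25 kDa, 220 residues, -20 mdeg) are hypothetical.

```python
def molar_concentration(mg_per_ml: float, mw_da: float) -> float:
    """mg/mL equals g/L, so dividing by g/mol gives mol/L."""
    return mg_per_ml / mw_da


def mean_residue_ellipticity(theta_mdeg: float, n_residues: int,
                             conc_molar: float, path_cm: float = 0.1) -> float:
    """[theta]_MRE in deg cm^2 dmol^-1 = theta / (N x C x L x 10)."""
    return theta_mdeg / (n_residues * conc_molar * path_cm * 10.0)


if __name__ == "__main__":
    # Hypothetical 25 kDa, 220-residue design at 0.25 mg/mL in a 1-mm cuvette,
    # with a raw reading of -20 mdeg at 222 nm.
    c = molar_concentration(0.25, 25000.0)
    print(mean_residue_ellipticity(-20.0, 220, c))
```

Normalizing per residue lets helicity be compared across designs of different length, which is why the spectra are reported this way.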
Size exclusion chromatography with multi-angle light scattering
After the initial SEC run, purified samples were pooled and then concentrated or diluted as needed to a final concentration of 2 mg/mL. 100 μL of each sample was then run on a high-performance liquid chromatography system (Agilent) using a Superdex™ 200 10/300 GL column. These fractionation runs were coupled to a multi-angle light scattering detector (Wyatt) to determine the absolute molecular weight of each designed protein, as described previously 2.

Small angle X-ray scattering
Small-angle X-ray scattering (SAXS) data were collected at the SIBYLS High Throughput SAXS beamline at the Advanced Light Source in Berkeley, California 3. Beam exposures of 0.3 s over 10.2 s resulted in 33 frames per sample. Data were collected at low (~1.5 mg/mL) and high (~2-3 mg/mL) protein concentrations in SAXS buffer (25 mM Tris pH 8.0, 150 mM NaCl, 2% glycerol). The SIBYLS website ("SAXS FrameSlice" n.d.) was used to analyze the data for the high- and low-concentration samples and to average the best dataset. If there was obvious aggregation over the 33 frames, only the data points collected before aggregation arose were used for the Guinier region; otherwise, all data were included for the Guinier region. All data were used for the Porod and wide-angle regions. The averaged file was used with scatter.jar to remove data points with outlier residuals in the Guinier region. Finally, the data were truncated at q = 0.25. This dataset was then compared to the SAXS profile predicted from the design model using the FoXS SAXS server, and the volatility ratio (Vr) was calculated to quantify how well the predicted profile matched the experimental data. Proteins with a Vr of less than 2.5 were considered to be folded into the designed quaternary shape.

Biolayer interferometry
Biolayer interferometry binding data were collected on an Octet™ RED96 (ForteBio) and processed using the instrument's integrated software.
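The text does not state which kinetic model the instrument software fits. Assuming the common 1:1 (Langmuir) binding model, the association and dissociation phases follow the equations in the sketch below, and koff can be estimated from the dissociation phase by a log-linear fit. The rate constants and concentrations in the example are hypothetical, and the functions are illustrative rather than the Octet software's implementation.

```python
import math


def association(t, conc, kon, koff, rmax):
    """1:1 association phase: R(t) = Req * (1 - exp(-(kon*C + koff) * t))."""
    kobs = kon * conc + koff
    req = rmax * kon * conc / kobs  # equivalent to Rmax * C / (C + KD)
    return req * (1.0 - math.exp(-kobs * t))


def dissociation(t, r0, koff):
    """1:1 dissociation phase after returning the sensor to buffer: R(t) = R0 * exp(-koff * t)."""
    return r0 * math.exp(-koff * t)


def fit_koff(times, responses):
    """Estimate koff as minus the least-squares slope of ln(R) versus t."""
    ys = [math.log(r) for r in responses]
    n = len(times)
    t_mean, y_mean = sum(times) / n, sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    den = sum((t - t_mean) ** 2 for t in times)
    return -num / den


# Hypothetical binder: kon = 1e5 M^-1 s^-1, koff = 1e-3 s^-1, so KD = koff / kon = 10 nM.
kon, koff, rmax, conc = 1e5, 1e-3, 1.0, 100e-9
r_end = association(600.0, conc, kon, koff, rmax)  # response after 600 s of association
times = [float(t) for t in range(0, 600, 10)]
decay = [dissociation(t, r_end, koff) for t in times]
koff_est = fit_koff(times, decay)
print("estimated koff:", koff_est, "KD (M):", koff_est / kon)
```

Real traces are noisy and often deviate from 1:1 behavior, so a global fit across all analyte concentrations is generally preferred over this single-curve estimate.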
To measure the affinity of peptide binders, N-terminally biotinylated (biotin-Ahx) target peptides with a short linker (GGS) were loaded onto streptavidin-coated biosensors (SA, ForteBio) at 50-100 nM in binding buffer (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.05% surfactant P20, 0.5% non-fat dry milk) for 120 s. Analyte proteins were diluted from concentrated stocks into the binding buffer. After a baseline measurement in binding buffer alone, binding kinetics were monitored by dipping the biosensors into wells containing the analyte protein at the indicated concentration (association step) and then dipping the sensors back into baseline buffer (dissociation step).

Yeast surface display
S. cerevisiae EBY100 cultures were grown in C-Trp-Ura medium and induced in SGCAA medium following the protocol in (reference). Cells were washed with PBSF (PBS with 1% BSA) and labeled with biotinylated designed proteins using two labeling methods: with-avidity and without-avidity labeling. For the with-avidity method, the cells were incubated with biotinylated RBD together with anti-c-Myc fluorescein isothiocyanate (FITC, Miltenyi Biotec) and streptavidin-phycoerythrin (SAPE, ThermoFisher), with SAPE used at one quarter of the concentration of the biotinylated RBD. The with-avidity method was used in the first few rounds of screening against the repeat peptide library to enrich weak binder candidates. For the without-avidity method, the cells were first incubated with biotinylated designed proteins, washed, and then secondarily labeled with SAPE and FITC.

Crystallography
Crystallographic data collection and refinement statistics are given in Tables 4 and 5. Table notes: (1) numbers in parentheses refer to the highest-resolution shell; (2) as reported by MolProbity 4.
Crystallization and structure determination for RPB_PEW3_R4-PAWx4
Purified RPB_PEW3_R4 protein + PAWx4 (SEQ ID NO: 121) peptide at a concentration of 36 mg/mL was used to conduct sitting-drop, vapor-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PEW3_R4-PAWx4 grew from drops consisting of 100 nL protein plus 100 nL of a reservoir solution consisting of 0.1 M MES pH 5.0, 30% (w/v) PEG 6K at 4 °C, and were cryoprotected by supplementing the reservoir solution with 5% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-D, indexed to P2₁2₁2₁, and reduced using XDS 5 (Tables 4-5). The structure was phased by molecular replacement using Phaser™. A set of ~50 of the lowest-energy predicted models from Rosetta™ was used as search models. Several of these models gave clear solutions, which were adjusted in Coot™ and refined using Phenix™. Model refinement in P2₁2₁2₁ initially resulted in an unacceptably large difference between Rfree and Rwork. Refinement was therefore first performed in lower-symmetry space groups (P1 and P2₁). In the late stages of refinement, these P1 and P2₁ models were refined against the P2₁2₁2₁ data, which ultimately yielded acceptable, albeit somewhat higher, R-factors.

Crystallization and structure determination for RPB_PLP3_R6-PLPx6
Purified RPB_PLP3_R6 protein + PLPx4 (SEQ ID NO: 109) peptide at a concentration of 70 mg/mL was used to conduct sitting-drop, vapor-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP3_R6-PLPx6 grew from drops consisting of 100 nL protein plus 100 nL of a reservoir solution consisting of 2.4 M (NH₄)₂SO₄, 0.1 M Na₃Cit pH 4 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 2.2 M sodium malonate pH 4. Native diffraction data were collected at APS beamline 23-ID-D, indexed to I422, and reduced using XDS 5 (Tables 4-5).
The structure was phased by molecular replacement using Phaser™. A set of ~28 of the lowest-energy predicted models from Rosetta™ was used as search models. Several of these models gave clear solutions, which were adjusted in Coot™ and refined using Phenix™.

Crystallization and structure determination for RPB_LRP2_R4-LRPx4
Purified RPB_LRP2_R4 protein + LRPx4 (SEQ ID NO: 122) peptide at a concentration of 21.4 mg/mL was used to conduct sitting-drop, vapor-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_LRP2_R4-LRPx4 grew from drops consisting of 100 nL protein plus 100 nL of a reservoir solution consisting of 0.1 M HEPES pH 7, 10% (w/v) PEG 6000 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 25% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P3₂21, and reduced using XDS 5 (Tables 4-5). The structure was phased by molecular replacement using Phaser™, with the coordinates of apo RPB_LRP2_R4 from the proteolyzed/filament structure as the search model. The resulting model was adjusted in Coot™ and refined using Phenix™. As in the apo structure, this crystal structure of RPB_LRP2_R4 also contained “infinitely long” filaments, this time with peptide bound.

Crystallization and structure determination for RPB_PLP1_R6-PLPx6
Purified RPB_PLP1_R6 protein + PLPx6 (SEQ ID NO: 114) peptide at a concentration of 143 mg/mL was used to conduct sitting-drop, vapor-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP1_R6-PLPx6 grew from drops consisting of 100 nL protein plus 100 nL of a reservoir solution consisting of 0.2 M NaCl, 20% (w/v) PEG 3350 at 4 °C, and were cryoprotected by supplementing the reservoir solution with 15% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed to H32, and reduced using XDS 5 (Tables 4-5).
The structure was phased by molecular replacement using Phaser™. A set of ~230 of the lowest-energy predicted models from Rosetta™ was used as search models. Several of these models gave clear solutions, which were adjusted in Coot™ and refined using Phenix™. In the later stages of refinement, two copies of the PLPx6 (SEQ ID NO: 114) peptide were built into clearly defined electron density in the asymmetric unit. The first copy occupies the location expected from the design and makes the designed interactions with RPB_PLP1_R6. The density for this peptide and the final atomic model (19 amino acid residues) are slightly longer than the peptide used in crystallization (18 residues); this is likely due to “slippage”/misregistration of the peptide relative to R6PO11 in many unit cells, resulting in density longer than the peptide itself. A second copy of the peptide lies across a 2-fold symmetry axis at ~50% occupancy, so that this peptide is superposed on a symmetry-related copy of itself running in the opposite direction. Despite this, the location of each Pro/Leu side-chain unit was reasonably well defined. However, binding of the peptide at this second site seems unlikely to occur readily in solution.

Crystallization and structure determination for RPB_PLP1_R6, alternative conformation 1
Purified RPB_PLP1_R6 protein + PLPx6 (SEQ ID NO: 114) peptide at a concentration of 166 mg/mL was used to conduct sitting-drop, vapor-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP1_R6-PLPx6 grew from drops consisting of 100 nL protein plus 100 nL of a reservoir solution consisting of 0.02 M CaCl₂, 30% (v/v) MPD, 0.1 M sodium acetate pH 4.6 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 5% MPD. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P22₁2₁, and reduced using XDS 5 (Tables 4-5).
The structure was phased by molecular replacement using Phaser™, with the coordinates of R6PO11 (alternative conformation 1) as the search model. The model was adjusted in Coot™ and refined using Phenix™. In the later stages of refinement, one copy of the PLPx6 peptide was modeled at a crystal contact, where it is sandwiched between adjacent subunits in a way that likely occurs only in the crystal lattice.

Crystallization and structure determination for RPB_PLP1_R6, alternative conformation 2
Purified RPB_PLP1_R6 protein + PLPx6 (SEQ ID NO: 114) peptide at a concentration of 166 mg/mL was used to conduct sitting-drop, vapor-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP1_R6-PLPx6 grew from drops consisting of 100 nL protein plus 100 nL of a reservoir solution consisting of 40% (v/v) MPD, 0.1 M Na Phos Cit pH 4.2 at 18 °C, and were cryoprotected by supplementing the reservoir solution. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P22₁2₁, and reduced using XDS 5 (Tables 4-5). Initial attempts to phase by molecular replacement using Phaser™ and ~500 predicted models from Rosetta™ and RoseTTAFold™ failed to yield any clear solutions. Similarly, several thousand truncations of these models (containing all combinations of 1, 2, 3, 4, or 5 of the 6 repeat units) also failed to give clear solutions. To identify correct but low-scoring solutions among the outputs of these trials, we ran SHELXE autobuilding and density modification on a large number of these potential solutions. Ultimately, we identified an MR solution with 2 of the 6 repeats correctly placed, which allowed autobuilding of a polyalanine model and yielded an interpretable map that was further improved by iterative rounds of rebuilding in Coot™ and refinement using Phenix™.
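The truncated search models keep every combination of 1 to 5 of the 6 repeat units, which gives 62 subsets per starting model (2^6 minus the empty and the complete set). The sketch below only enumerates repeat indices; mapping indices to residue ranges and writing coordinate files are omitted, and the function name is hypothetical.

```python
from itertools import combinations


def repeat_subsets(n_repeats=6, max_kept=5):
    """Return every combination of kept repeat indices, keeping 1..max_kept repeats."""
    subsets = []
    for k in range(1, max_kept + 1):
        subsets.extend(combinations(range(n_repeats), k))
    return subsets


subsets = repeat_subsets()
print(len(subsets))  # 62
```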
The final model revealed that in this crystal form, as in a similar crystallization condition (RPB_PLP1_R6, alternative conformation 1, above), RPB_PLP1_R6 adopted an alternative fold.

Crystallization and structure determination for RPB_LRP2_R4
Purified RPB_LRP2_R4-LRPx4 protein at a concentration of 33 mg/mL was used to conduct sitting-drop, vapor-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_LRP2_R4 grew from drops consisting of 100 nL protein plus 100 nL of a reservoir solution consisting of 0.2 M K₂HPO₄, 20% (w/v) PEG 3350 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 15% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P3₂21, and reduced using XDS 5 (Tables 4-5). The structure was phased by molecular replacement using Phaser™. A set of ~50 of the lowest-energy predicted models from Rosetta™, as well as a variety of truncated models, were used as search models. Several of these models gave clear solutions, which were adjusted in Coot™ and refined using Phenix™. Four helical repeat modules were present in the asymmetric unit. Unexpectedly, however, the side-chain density for all four repeats was very similar and matched the sequence of the internal helical repeats, but not that of the N- and C-terminal capping repeats, which differ slightly from the internal ones. In addition, these four repeat units pack tightly against adjacent, symmetry-related molecules such that they form an “infinitely long” repeat protein running throughout the crystal. Careful examination of the junctions between repeat units revealed no clear breaks in electron density; the backbone density is continuous through the asymmetric unit and continuous with the symmetry-related molecules near the N- and C-termini of the molecule in the asymmetric unit.
Rather than a truly infinite polymer, we suspect that proteolytic cleavage of RPB_LRP2_R4 (during either purification or crystallization) removed the N- and/or C-terminal caps in many molecules, which could allow the internal repeats from separate molecules to polymerize into fibers in the crystal. Heterogeneity in these cleavage products and in how they assemble into the crystal lattice (misregistration) could explain the “continuous” filaments of this repeat protein observed in these crystals.

Cell Studies

Plasmids
For expression in cells, constructs were synthesized by GenScript and cloned into a modified pUC57 plasmid (GenScript) allowing mammalian expression under an EF1α promoter. Target peptides were cloned as C-terminal fusions with a linker (GAGAGAGRP (SEQ ID NO: 123)) followed by EGFP (SEQ ID NO: 124). Binders were expressed with an N-terminal fusion to the first 34 residues of the Mas70p protein (Mito-tag), which has been shown to efficiently relocalize proteins to mitochondria in mammalian cells 9, and a C-terminal fusion to mScarlet™ 10. Plasmids encoding the GFP-tagged peptide and the mScarlet™-tagged binder were then cotransfected into cells. Alternatively, to demonstrate in vivo multiplexed binding between different peptides and their cognate binders (Fig. 2F,G), two bicistronic plasmids were generated. Each expressed a binder fused to a Mito-tag (or, in the second plasmid, a PEX-tag consisting of the first 66 residues of human PEX3, which targets proteins to peroxisomes 11), followed by a stop codon, an internal ribosome entry site (IRES), and the target peptide tagged with EGFP (SEQ ID NO: 124) (or, in the second plasmid, mScarlet™). Cells were then cotransfected with both bicistronic plasmids to express all four proteins.

Cells
U2OS cells (ATCC, HTB-96) were cultured in DMEM (Corning) supplemented with 10% fetal bovine serum (Gibco) and 1% Pen/Strep (Gibco) at 37 °C with 5% CO₂.
Cells were transfected with Lipofectamine™ 3000 (Invitrogen) according to the manufacturer's instructions and imaged after 1 day of expression.

Live-cell imaging
For live-cell imaging (Fig. 2), U2OS cells were plated on glass-bottom dishes (World Precision Instruments, FD35) coated with fibronectin (Sigma, F1141, 50 μg/mL in PBS) for 1 hour at 37 °C in DMEM-10% serum. The medium was then changed to Leibovitz's L-15 medium (Gibco) supplemented with 20 mM HEPES (Gibco) for live-cell imaging. Imaging was performed on a custom spinning-disk confocal instrument composed of a Nikon Ti stand equipped with the Perfect Focus System, a fast Z piezo stage (ASI), a Plan Apo Lambda 1.45 NA 100× objective, and a spinning disk head (Yokogawa CSU-X1). Images were recorded with a Photometrics Prime™ 95B back-illuminated sCMOS camera run in pseudo-global shutter mode and synchronized with the spinning disk wheel. Excitation was provided by 488 nm and 561 nm lasers (Coherent OBIS, mounted in a Cairn laser launch), and emission was collected through dedicated single-bandpass filters for each channel mounted on a Cairn Optospin wheel (Chroma 525/50 for GFP and Chroma 595/50 for mScarlet™). To enable fast 4D acquisitions, an FPGA module (National Instruments sbRIO-9637 running custom code) was used for hardware-based synchronization of the instrument, in particular to ensure that the piezo Z stage moved only during the readout period of the sCMOS camera. Temperature was kept at 37 °C using a temperature control chamber (MicroscopeHeaters.Com, Brighton UK). The system was operated with MetaMorph.

Immunofluorescence
For immunofluorescence of mitochondria (Fig. 6), U2OS Flp-In™ T-REx cells were spread on glass-bottom dishes coated with fibronectin as above. Cells were washed with PBS and then fixed in 4% PFA for 20 minutes at room temperature. Following fixation, cells were washed with PBS and then permeabilized with 0.1% Triton X-100 in PBS for 5 minutes at room temperature.
Cells were washed again with PBS and blocked in 1% BSA in PBS for 15 minutes. Cells were then incubated with TOM20 antibody (Santa Cruz), diluted in 1% BSA in PBS, for 1 hour at room temperature. Cells were washed 3 times with PBS and then incubated with DAPI and anti-mouse Alexa Fluor™ 488, diluted 1:500 in 1% BSA in PBS, for one hour at room temperature. Cells were washed a final 3 times in PBS and then imaged using the spinning-disk confocal described above.

Pull down of endogenous proteins from extracts using designed binders
For pull down of endogenous ZFC3H1 from human cell extracts, HeLa Flp-In T-REx cells were lysed in lysis buffer (25 mM HEPES, 150 mM NaCl, 0.5% Triton X-100, 0.5% NP-40, 20 mM imidazole, pH 7.4, supplemented with Roche EDTA-free protease inhibitor tablets). The lysate was incubated on ice for 10 minutes to continue lysis and then centrifuged at 4,000 × g for 15 minutes at 4 °C. The supernatant was incubated with pre-washed Ni-NTA agarose (Qiagen, 30210318/AV/01) for 1 hour with rocking at 4 °C to remove or reduce proteins in the lysate that bind to the resin non-specifically. For each condition, 50 μL of fresh Ni-NTA agarose resin was washed twice in lysis buffer. Equimolar amounts of purified His-tagged binder, or, as a control, an equal volume of buffer, were added to the Ni-NTA agarose. The pre-cleared HeLa lysate was split evenly between the 3 conditions. An input sample was taken from each condition, and the tubes were incubated for 2 hours at 4 °C with rocking. Beads were then washed twice in lysis buffer and twice in wash buffer (25 mM HEPES, 150 mM NaCl, 20 mM imidazole, pH 7.4). Proteins were then eluted from the beads in elution buffer (25 mM HEPES, 150 mM NaCl, 500 mM imidazole, pH 7.4). Inputs and elutions were run on a NuPAGE 3-8% Tris-Acetate gel (Invitrogen, EA0375) and transferred to a nitrocellulose membrane using the iBlot system (Thermo Fisher).
Membranes were blocked in 5% (w/v) milk in TBS-TWEEN™ (10 mM Tris-HCl, 120 mM NaCl, 1% (w/v) TWEEN 20, pH 7.4) for 30 minutes at room temperature with gentle shaking. Rabbit anti-ZFC3H1 (Sigma, HPA007151, used at 1:250) and mouse anti-alpha-tubulin 488 (clone DM1A, Sigma T6199, directly labeled with Abberior® STAR 488 NHS ester to a degree of labeling of 4.5 dyes per antibody, and used at 0.1 μg/mL final concentration) were diluted in 1% (w/v) milk in TBS-TWEEN and incubated with the membrane overnight at 4 °C with gentle shaking. The membrane was washed 3 times in TBS-TWEEN and then incubated with goat anti-rabbit Alexa 555 (Invitrogen, A32732, 1:2000) for one hour at room temperature with gentle shaking. The membrane was washed twice with TBS-TWEEN and a final time with TBS-TWEEN containing 0.001% SDS. Membranes were imaged using a ChemiDoc™ system (Bio-Rad). Alternatively, the same samples were analyzed on 4-12% Bis-Tris gels (Invitrogen NP0323BOX) and stained with InstantBlue Coomassie stain (Sigma ISB1L). Note that αZFC-high is also able to pull down endogenous ZFC3H1 from human cell extracts when 50 mM rather than 150 mM NaCl was used in all buffers (Figure 8C).

Mass Spectrometry
Each lane of the polyacrylamide gel presented in Fig. 4C was cut into six pieces and prepared for mass spectrometric analysis by manual in situ enzymatic digestion (the gel area containing the binder was omitted from the analysis to avoid saturating the detector with overabundant binder peptides). Briefly, the excised protein gel pieces were placed in a well of a 96-well microtiter plate and destained with 50% v/v acetonitrile and 50 mM ammonium bicarbonate, reduced with 10 mM DTT, and alkylated with 55 mM iodoacetamide. After alkylation, proteins were digested with 6 ng/μL trypsin (Promega, UK) overnight at 37 °C. The resulting peptides were extracted in 2% v/v formic acid, 2% v/v acetonitrile.
The digest was analyzed by nano-scale capillary LC-MS/MS using an UltiMate U3000 HPLC (ThermoScientific Dionex, San Jose, USA) delivering a flow of approximately 300 nL/min. A C18 Acclaim PepMap™ 100 (5 μm, 100 μm × 20 mm) nanoViper (ThermoScientific Dionex, San Jose, USA) trapped the peptides prior to separation on a C18 Acclaim PepMap 100 (3 μm, 75 μm × 150 mm) nanoViper™ (ThermoScientific Dionex, San Jose, USA). Peptides were eluted with a gradient of acetonitrile. The analytical column outlet was directly interfaced, via a modified nano-flow electrospray ionisation source, with a hybrid dual-pressure linear ion trap mass spectrometer (Orbitrap™ Velos, ThermoScientific, San Jose, USA). Data-dependent analysis was carried out using a resolution of 30,000 for the full MS spectrum, followed by ten MS/MS spectra in the linear ion trap. MS spectra were collected over an m/z range of 300–2000. MS/MS scans were collected using a threshold energy of 35 for collision-induced dissociation. LC-MS/MS data were then searched against an in-house protein sequence database, containing Swiss-Prot and the protein constructs specific to the experiment, using the Mascot search engine (Matrix Science, UK) 18. Database search parameters were set with a precursor tolerance of 5 ppm and a fragment ion mass tolerance of 0.8 Da. Two missed enzyme cleavages were allowed, and variable modifications for oxidized methionine, carbamidomethyl cysteine, pyroglutamic acid, phosphorylated serine, threonine, tyrosine, tert-butyloxycarbonyl-lysine, norbornene-lysine and prop-2-yn-1-yloxycarbonyl-lysine were included. MS/MS data were validated using the Scaffold program (Proteome Software Inc., USA) 19. All data were additionally interrogated manually. For the Venn diagram in Fig. 4D, a protein was considered identified if at least 5 of its peptides were detected.
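Two of the search settings, up to two missed tryptic cleavages and a 5 ppm precursor tolerance, can be illustrated with a short sketch. It uses the simple rule that trypsin cleaves after K or R unless the next residue is P; the search engine's own enzyme definitions and scoring are not reproduced, and the peptide and masses in the example are hypothetical.

```python
def tryptic_peptides(seq, max_missed=2):
    """In silico trypsin digest allowing up to `max_missed` missed cleavages."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        # Cleave C-terminal to K/R, except before proline.
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    peptides = []
    for i in range(len(fragments)):
        # Join up to max_missed neighbouring fragments to model missed cleavages.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def within_ppm(observed, theoretical, tol_ppm=5.0):
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


print(tryptic_peptides("AKPGRCK", max_missed=1))  # ['AKPGR', 'AKPGRCK', 'CK']
print(within_ppm(1000.004, 1000.0), within_ppm(1000.006, 1000.0))  # True False
```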
Design Methods

DHR scaffold generation
Each Designed Helical Repeat (DHR) scaffold is formed by a helix-loop-helix-loop topology that is repeated four or more times 12. The helices range from 18 to 30 residues and the loops from 3 to 4 residues. The DHR design process consists of backbone design, sequence design, and computational validation by energy landscape exploration. To match the peptides, the designs were required to have a twist (omega) between 0.6 and 1.0 radians, a radius of 0 to 13 Å, and a rise between 0 and 10 Å. The geometry of a repeat protein can be
The backbone is designed using Rosetta™ fragment assembly guided by motifs 14. Backbone coordinates are built up through 3,200 Monte Carlo fragment assembly steps with fragments harvested from a non-redundant set of structures from the PDB. Following each fragment insertion, the rigid-body transform is propagated to the downstream repeats. The score that guides fragment assembly is composed of van der Waals interactions, packing, backbone dihedral angles, and Residue-Pair-Transform (RPX) motifs 14. RPX motifs provide a fast way to measure the full-atom hydrophobic packability of the backbone prior to assigning side chains. After design, backbones are screened for native-like features. Loops are required to be within 0.4 Å of a naturally occurring loop or are rebuilt. Structures with helices above 0.14 Å
Sequence is designed using Rosetta™ for each backbone that passes filtering. Design begins in a symmetric mode, in which every repeat is identical, using the RepeatProteinRelax mover. Core residues are restricted to be hydrophobic and surface residues hydrophilic using the layer design task operators. Sequence is biased toward natural proteins with similar local structure using the structure profile mover. After the symmetric design is complete, the N-terminal and C-terminal repeats are redesigned to eliminate exposed hydrophobics.
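The twist, rise and radius constraints above describe the screw transformation that maps one repeat onto the next. A minimal NumPy sketch of how such parameters can be computed: superimpose two equivalent repeats with the Kabsch algorithm, then decompose the rigid transform into a rotation angle (twist), a translation along the screw axis (rise), and the distance of the first repeat's centroid from that axis (radius). The choice of atoms and the centroid-based radius are assumptions, not necessarily the definitions used in Rosetta™; the window check uses the ranges stated in the text.

```python
import numpy as np


def kabsch(p, q):
    """Rotation r and translation t minimizing ||p @ r.T + t - q|| (p, q: N x 3)."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_mean).T @ (q - q_mean)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, q_mean - r @ p_mean


def superhelical_params(repeat_a, repeat_b):
    """Twist (rad), rise and radius (input units) of the screw mapping repeat_a onto repeat_b."""
    r, t = kabsch(repeat_a, repeat_b)
    twist = float(np.arccos(np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0)))
    axis = np.array([r[2, 1] - r[1, 2], r[0, 2] - r[2, 0], r[1, 0] - r[0, 1]])
    axis /= np.linalg.norm(axis)  # undefined when the twist is 0 or pi
    rise = float(t @ axis)
    # A point c on the screw axis satisfies (I - R) c = t - rise * axis.
    c, *_ = np.linalg.lstsq(np.eye(3) - r, t - rise * axis, rcond=None)
    v = repeat_a.mean(axis=0) - c
    radius = float(np.linalg.norm(v - (v @ axis) * axis))
    return twist, abs(rise), radius


def in_design_window(twist, rise, radius):
    """Ranges from the text: twist 0.6-1.0 rad, radius 0-13 Å, rise 0-10 Å."""
    return 0.6 <= twist <= 1.0 and 0.0 <= radius <= 13.0 and 0.0 <= rise <= 10.0


# Synthetic check: a hypothetical repeat moved by a known screw about the z axis.
rng = np.random.default_rng(0)
angle = 0.8
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
repeat_1 = rng.normal(size=(30, 3)) + np.array([10.0, 0.0, 0.0])
repeat_2 = repeat_1 @ rot_z.T + np.array([0.0, 0.0, 5.0])
params = superhelical_params(repeat_1, repeat_2)
print(params, in_design_window(*params))
```

The same function applies to the repeat peptides, whose superhelical parameters are compared against those of the curated DHRs before docking.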
Designs with poor core packing, as measured by Rosetta™ holes < 0.5, are then filtered 15. The designs are computationally validated using Rosetta™ ab initio structure prediction on Rosetta@Home™. Rosetta™ ab initio verifies that the design is in a lower energy state than the thousands of alternative conformations sampled. Simulating a protein using Rosetta@Home™ can take several days on hundreds of CPUs. To speed this up, we used machine learning to filter out the designs most likely to fail 13.

Backbone generation of curved repeat protein monomers in poly-proline II conformation
A second round of designs was made to ensure that the distance between helices matches the 10.9 Å distance between prolines in the poly-proline II conformation. To design these backbones, we used AtomPair constraints between the first helix of each repeat. The atom pair constraints were set to 10.9 Å with a tolerance of 0.5 Å. For these designs we found the topologies that most

Peptide binders design

Modular peptide docking and hashing
To construct hash tables storing the pre-computed privileged residue interactions, we first surveyed the non-redundant PDB database and extracted the intended interacting residues as seeds. For each seed pair of interacting residues, random perturbations were applied to search for alternative relative conformations of the interacting residues. In the case of sidechain-backbone bidentate interactions, random rigid-body perturbations were applied to the backbone residues, using a random set of Euler angles drawn from a normal distribution with a mean of 0° and a standard deviation of 60°, and a random set of translation distances in 3D space drawn from a normal distribution with a mean of 0 Å and a standard deviation of 1 Å. At the same time, the backbone torsion angles φ and ψ of the backbone residue were randomly modified to values drawn from a Ramachandran density plot based on structures from the PDB.
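The rigid-body perturbation step can be sketched with NumPy as follows. The Euler-angle convention (z-x-z), rotation about the residue centroid, the toy coordinates, and the function names are assumptions; only the 60° and 1 Å standard deviations come from the text, and the φ/ψ resampling from a Ramachandran density is omitted.

```python
import numpy as np


def _rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def _rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def random_rigid_perturbation(coords, rng, angle_sd_deg=60.0, trans_sd=1.0):
    """Rotate coords (N x 3) about their centroid by z-x-z Euler angles drawn from
    N(0, angle_sd_deg), then translate by a vector drawn from N(0, trans_sd)."""
    a, b, g = np.radians(rng.normal(0.0, angle_sd_deg, size=3))
    rot = _rz(a) @ _rx(b) @ _rz(g)
    shift = rng.normal(0.0, trans_sd, size=3)
    center = coords.mean(axis=0)
    return (coords - center) @ rot.T + center + shift


rng = np.random.default_rng(42)
backbone = np.array([[0.0, 0.0, 0.0], [1.46, 0.0, 0.0], [2.0, 1.4, 0.0]])  # toy N, CA, C
moved = random_rigid_perturbation(backbone, rng)
```

Because the move is rigid, internal distances of the perturbed residue are unchanged; only its placement relative to the interacting side chain varies, and moves that break the intended interaction are then discarded.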
The transformed sets of residues that lost the intended interactions were discarded, and those that kept the interactions were collected. The side chains of the sidechain residues were then replaced with all reasonable rotamers to further diversify the samples of interacting residue sets. Finally, the geometric relationship of each set of residues keeping the intended interactions was passed to an 8D hash function (6D rigid-body transformation plus two torsion angles) and represented as a 64-bit unsigned integer, used as the key of an entry in the hash table. The identity and side-chain torsion angles (χs) of the side-chain residues were stored as the value of the entry. Similar processes, with minor alterations, were carried out to build hash tables for other interactions. For example, for pi-pi and cation-pi interactions, only a 6D hash function was used, because there is no need to perturb or consider the backbone torsions. For ASN, GLN, ASP, or GLU interacting with two backbone residues, a 10D hash table was used to represent the geometric relationship; in these cases, the geometries of the backbone N-H and C=O groups were treated as 5D rays.
To sample repeat peptides matching the superhelical parameters of the DHRs, we randomly generate a set of backbone torsion angles φ and ψ, for example [φ1, ψ1, φ2, ψ2, φ3, ψ3] for tripeptide repeats. If any pair of φ and ψ angles has a Rosetta Ramachandran score above the threshold of -0.5, that pair is likely to introduce intra-peptide steric clashes, so we randomly regenerate a new pair of φ and ψ angles until it is acceptable according to the Rosetta Ramachandran score. Next, we set the backbone torsion angles of the repeat peptide by applying this set of φ and ψ angles repetitively across the 8 repeats.
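The torsion sampling for tripeptide repeats amounts to rejection sampling: draw a (φ, ψ) pair, redraw it while its Ramachandran score exceeds -0.5, then tile the accepted torsions across 8 repeats. The sketch takes the scoring function as an argument; `toy_rama_score` is a hypothetical stand-in, not Rosetta's Ramachandran term, which would be needed in practice.

```python
import random


def toy_rama_score(phi, psi):
    """Hypothetical stand-in for a Ramachandran energy: favorable (-1.0) only for
    phi between -180 and -45 degrees; NOT Rosetta's rama term."""
    return -1.0 if -180.0 <= phi <= -45.0 else 1.0


def sample_repeat_torsions(score_fn, residues_per_repeat=3, threshold=-0.5, rng=None):
    """Draw one (phi, psi) pair per residue, redrawing any pair scoring above threshold."""
    rng = rng or random.Random()
    torsions = []
    for _ in range(residues_per_repeat):
        while True:
            phi, psi = rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0)
            if score_fn(phi, psi) <= threshold:
                break
        torsions.append((phi, psi))
    return torsions


def tile_repeat_torsions(repeat_torsions, n_repeats=8):
    """Apply the same torsion set to every repeat of the peptide."""
    return repeat_torsions * n_repeats


repeat = sample_repeat_torsions(toy_rama_score, rng=random.Random(7))
peptide_torsions = tile_repeat_torsions(repeat)  # 24 (phi, psi) pairs for 8 tripeptide repeats
```

Because every repeat shares identical torsions, the resulting peptide is an ideal superhelix whose twist, rise and radius can then be measured from adjacent repeats.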
We then calculate the superhelical parameters from the 3D coordinates of adjacent repeat units of the repeat peptide. Repeat peptides matching the superhelical parameters of any one of the curated DHRs are saved for the docking step. To dock cognate repeat proteins and repeat peptides with matching superhelical parameters, both are first aligned to the z axis by their own superhelical axes. Next, a 2D grid search (rotation around and translation along the z axis) is carried out to sample compatible positions of the repeat peptide in the binding groove of the repeat protein. Once a reasonable dock without steric clashes is generated, the relevant hash function is used to iterate through all potential peptide-protein interacting residue sets and calculate their hash keys. If a hash key exists in the hash table, the interacting side-chain identities and torsion angles are retrieved immediately and installed on all equivalent positions of this repeat peptide-repeat protein docking conformation. The docked peptide-DHR pair is saved for the interface design step if the peptide-DHR hydrogen-bond interactions are satisfied.

Peptide binding interface design
If a single dock was accepted with the designed repetitive peptide-DHR hydrogen bond, the peptide was first trimmed to exactly the same number of repeats as the DHR (e.g., 4 or 6 repeats). Then, on both the peptide and DHR sides, each amino acid was linked to the corresponding amino acids at the same position in every repeat unit, so that all subsequent design steps preserved exactly the same internal symmetry in both the DHR and the peptide. During our design cycles, residues within 9 Å of the DHR-peptide binding interface were set as designable, and residues within 11 Å defined the overall minimization range.
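The 9 Å designable and 11 Å minimization ranges partition residues by their distance to the binding partner. A minimal sketch, assuming one representative atom per residue (for example CA) and a minimum-distance criterion, neither of which is specified in the text; Rosetta's own residue selectors may measure distance differently, and the coordinates are hypothetical.

```python
import numpy as np


def interface_layers(own_xyz, partner_xyz, design_cut=9.0, minimize_cut=11.0):
    """Label each residue of one chain as 'design', 'minimize' or 'fixed'.

    own_xyz: (N, 3) representative atoms of this chain; partner_xyz: (M, 3) of the partner.
    """
    d = np.linalg.norm(own_xyz[:, None, :] - partner_xyz[None, :, :], axis=-1).min(axis=1)
    labels = np.where(d <= design_cut, "design",
                      np.where(d <= minimize_cut, "minimize", "fixed"))
    return labels.tolist()


dhr = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 8.0], [0.0, 0.0, 20.0]])
peptide = np.array([[5.0, 0.0, 0.0]])
print(interface_layers(dhr, peptide))  # ['design', 'minimize', 'fixed']
```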
Three rounds of full hydrophobic FastDesign 14 followed by hydrophilic FastDesign were carried out, with each hydrophobic or hydrophilic FastDesign repeated twice. The Rosetta score function beta_nov16 was used in all design cycles. Complexes in which the peptide itself had an averaged score (over three calculations) larger than 20.0, or in which the complex scored larger than -10.0, were rejected directly. After the preliminary design, we carried out two types of sanity checks to further optimize the designed peptide sequence and the designed DHR interface. Specifically, on the peptide side, in the tripeptide repeat units, every two amino acids other than proline were scanned for possible mutation to all twenty amino acids except cysteine, unless the originally designed peptide amino acid made the hashed sidechain-backbone, sidechain-sidechain, or sidechain-sidechain-backbone hydrogen bond with the DHR interface. DDG (binding energy of the peptide-DHR complex) was compared before and after each peptide mutation, and the mutation was accepted if the delta DDG (DDG_after - DDG_before) was larger than 1.0. Similarly, we checked the designed DHR interface by mutation, scanning the whole DHR. For designed hydrophobic amino acids that were originally hydrophilic, a delta DDG of -5.0 was set as the threshold for accepting them as necessary designs making a sufficient binding contribution. For designed hydrophilic amino acids, a delta DDG of -2.0 was used as the threshold. For experimental characterization, we selected designed complexes with near-ideal bidentate hydrogen bonds between protein and peptide, favorable protein-peptide interaction energies (DDG -35.0), interface shape complementarity (Iface_SCval >= 0.65), tolerable numbers of unsatisfied interface hydrogen bonds (Iface_HbondsUnsatBB <= 2, Iface_HbondsUnsatSC <= 4), and low peptide apo energies (ScoreRes_chainB <= 0.9).
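The pre-filter and final selection criteria can be collected into simple predicates. This is a sketch only: the metric names mirror the labels in the text, the dataclass is hypothetical, and because the DDG criterion is printed without a comparison operator, we assume DDG <= -35.0 (more negative meaning more favorable). The bidentate hydrogen-bond criterion is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ComplexMetrics:
    ddg: float          # peptide-DHR binding energy
    iface_scval: float  # Iface_SCval, interface shape complementarity
    unsat_bb: int       # Iface_HbondsUnsatBB
    unsat_sc: int       # Iface_HbondsUnsatSC
    peptide_apo: float  # ScoreRes_chainB


def passes_prefilter(peptide_scores, complex_score):
    """Reject if the peptide's averaged score (three calculations) > 20.0
    or the complex score > -10.0."""
    mean_peptide = sum(peptide_scores) / len(peptide_scores)
    return mean_peptide <= 20.0 and complex_score <= -10.0


def passes_selection(m: ComplexMetrics) -> bool:
    """Final selection thresholds for experimental characterization."""
    return (
        m.ddg <= -35.0  # ASSUMED operator; the text only gives "DDG -35.0"
        and m.iface_scval >= 0.65
        and m.unsat_bb <= 2
        and m.unsat_sc <= 4
        and m.peptide_apo <= 0.9
    )
```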
Forward Docking
For the designed complexes selected in the second round, forward docking was performed to check binding specificity in silico. For each designed complex, 10,000 arbitrary peptide conformations were generated as above using the designed sequence and docked against the unmodified designed DHR with the same protocol as in the docking stage. FastRelax (ref. 17) was then applied to all 10,000 docks, and DDG was plotted against peptide backbone RMSD to check whether the complex converged. Only “converged” complexes were selected for experimental characterization, i.e., those for which i) the peptide backbone RMSD was < 2.0 Å among the top 20 models with the lowest DDG during forward docking, and ii) the averaged peptide backbone of these top 20 models was close to the original design model (RMSD < 1.5 Å).

SSM library preparation
We carried out site saturation mutagenesis (SSM) studies on some of the designed peptide-protein binding pairs to better understand the peptide binding modes and to search for improved peptide binders. For each designed repeat protein, because of the length limit of chip-synthesized DNA, we ordered an SSM library covering a central span of 65 amino acids within the repeat protein; this span corresponds to roughly one and a half repeat units, across three helices. The chip-synthesized DNA oligos for each SSM library were amplified and transformed into EBY100 yeast together with a linearized pETCON3 vector encoding the rest of the designed repeat protein. Each SSM library was first subjected to an expression sort to filter out low-quality sequences caused by chip synthesis defects or recombination errors. The collected yeast population, which expresses the designed repeat protein mutants, was regrown and subjected to further rounds of peptide-binding sorts. Next-generation sequencing of this population also served as the reference data for the SSM analysis.
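The two forward-docking convergence criteria described above can be sketched as follows. This is a sketch under stated assumptions, with hypothetical function names. Criterion i) is read as requiring every pairwise peptide backbone RMSD among the 20 lowest-DDG models to be below 2.0 Å. Peptide backbone coordinates are assumed to already share the frame of the fixed DHR, so no superposition is performed before computing RMSD.

```python
import numpy as np


def backbone_rmsd(x: np.ndarray, y: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) arrays, without superposition (shared DHR frame assumed)."""
    return float(np.sqrt(((x - y) ** 2).sum(axis=1).mean()))


def forward_docking_converged(ddgs, backbones, design_backbone,
                              top_n=20, pairwise_cut=2.0, mean_cut=1.5) -> bool:
    """Check both convergence criteria on the top_n lowest-DDG relaxed docks."""
    order = np.argsort(ddgs)[:top_n]                 # most favorable (lowest) DDG first
    top = np.asarray(backbones, dtype=float)[order]  # shape (top_n, n_atoms, 3)
    # i) assumed reading: all pairwise peptide backbone RMSDs below pairwise_cut
    for i in range(len(top)):
        for j in range(i + 1, len(top)):
            if backbone_rmsd(top[i], top[j]) >= pairwise_cut:
                return False
    # ii) averaged backbone of the top models must be close to the design model
    return backbone_rmsd(top.mean(axis=0), design_backbone) < mean_cut
```

Averaging Cartesian coordinates in criterion ii) is only meaningful because all models are assumed to share the DHR reference frame. If they did not, each model would first need to be superimposed onto the DHR.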
The subsequent peptide-binding sorts were performed without avidity, using target peptide concentrations from 1 nM to 1000 nM depending on the initial peptide-binding ability. The peptide-bound yeast populations were collected and sequenced using an Illumina NextSeq kit, and the mutants identified were compared with those in the expression libraries. Enrichment analysis was used to identify beneficial mutants and to help interpret the peptide binding modes. For each mutant, the enrichment value was calculated by dividing its frequency in the peptide-bound population by its frequency in the expression population; the enrichment values were then log10-transformed and plotted as heatmaps for the SSM analysis.

Design of binders against endogenous targets
To evaluate which endogenous proteins could currently be targeted with our method (Fig. 4), we wrote Python code that searches databases for subsequences matching permutations of the set of amino acid triplets for which we designed binders in this study (i.e., LRP, PEW, PLP, IYP, PKW, IRP, LRT, LRN, LRQ, RRN, PSR, and PRQ). This code is freely available (https://github.com/tjs23/prot_pep_scan). We then ranked all outputs to find the longest possible subsequences and manually inspected the candidates to find subsequences located in disordered regions. This analysis of the human proteome suggested that ZFC3H1 could be a good target for two main reasons: 1) the protein contains the sequence (PLP)x4 (SEQ ID NO: 109) within a large disordered domain, with a downstream sequence (PEDPEQPPKPPF (SEQ ID NO: 108)) that is within reach of our binder design method; and 2) the protein is well studied and, in particular, commercial, highly specific, validated antibodies against it exist.
Table 7. First-round experimental characterization summary.
Among the binders, most bound peptides with sequences similar, but not identical, to those targeted, and peptides with 3-residue repeat units were targeted more successfully (19 in total) than those with 2-residue repeat units (2 in total).
Table 8. Second-round experimental characterization summary. In total, 54 second-round designed protein-peptide pairs were tested: 42 of the designed proteins were solubly expressed in E. coli, 25 were monomeric and monodisperse by size-exclusion chromatography (SEC), and 16 bound their targets with considerably higher affinity and specificity than the first-round designs.

References
1. Sharp, P. M. & Li, W. The codon adaptation index: a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15(3), 1281–1295 (1987).
2. Fallas, J., Ueda, G., Sheffler, W. et al. Computational design of self-assembling cyclic protein homo-oligomers. Nat. Chem. 9, 353–360 (2017).
3. Dyer, K. N., Hammel, M., Rambo, R. P., Tsutakawa, S. E., Rodic, I., Classen, S., Tainer, J. A. & Hura, G. L. High-throughput SAXS for the characterization of biomolecules in solution: a practical approach. Methods Mol. Biol. 1091, 245–258 (2014).
4. Williams, C. J. et al. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 (2018).
5. Kabsch, W. XDS. Acta Crystallogr. D Biol. Crystallogr. 66, 125–132 (2010).
6. McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
7. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).
8. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221 (2010).
9. Kessels, M. M. & Qualmann, B. Syndapins integrate N-WASP in receptor-mediated endocytosis. EMBO J. 21(22), 6083–6094 (2002).
10. Bindels, D., Haarbosch, L., van Weeren, L. et al. mScarlet: a bright monomeric red fluorescent protein for cellular imaging. Nat. Methods 14, 53–56 (2017).
11. Fakieh, M. H., Drake, P. J., Lacey, J., Munck, J. M., Motley, A. M. & Hettema, E. H. Intra-ER sorting of the peroxisomal membrane protein Pex3 relies on its luminal domain. Biol. Open 2(8), 829–837 (2013).
12. Brunette, T. J. et al. Exploring the repeat protein universe through computational protein design. Nature 528, 580–584 (2015).
Brunette, T. J. et al. Modular repeat protein sculpting using rigid helical junctions. Proc. Natl. Acad. Sci. U.S.A. 117, 8870–8875 (2020).
Fallas, J. A. et al. Computational design of self-assembling cyclic protein homo-oligomers. Nat. Chem. 9, 353–360 (2017).
Sheffler, W. & Baker, D. RosettaHoles: rapid assessment of protein core packing for structure prediction, refinement, design, and validation. Protein Sci. 18, 229–239 (2009).
Bradley, P., Misura, K. M. S. & Baker, D. Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868–1871 (2005).
Tyka, M. D., Keedy, D. A., André, I. et al. Alternate states of proteins revealed by detailed energy landscape mapping. J. Mol. Biol. 405(2), 607–618 (2011).
Perkins, D. N., Pappin, D. J. C., Creasy, D. M. & Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
Keller, A., Nesvizhskii, A. I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).