Title:
METHODS FOR LABELING N-TERMINAL GLYCINE FOR PROTEOMICS
Document Type and Number:
WIPO Patent Application WO/2024/036317
Kind Code:
A1
Abstract:
Provided herein are compositions, systems, methods, and kits for peptide analysis, including peptide sequencing. In some cases, the methods may be used for mapping post-translational modifications on peptides or proteins.

Inventors:
ANSLYN ERIC (US)
LIU HONGXU (US)
SWAMINATHAN JAGANNATH (US)
SOMEKH TALLI (US)
DEOL HARNIMARTA (US)
Application Number:
PCT/US2023/072097
Publication Date:
February 15, 2024
Filing Date:
August 11, 2023
Assignee:
UNIV TEXAS (US)
ERISYON INC (US)
International Classes:
C07K1/13; C07K7/06; G01N33/68
Foreign References:
US20200131254A1 2020-04-30
Other References:
STEGER MARTIN, KARAYEL ÖZGE, DEMICHEV VADIM: "Ubiquitinomics: History, methods, and applications in basic research and drug discovery", PROTEOMICS, WILEY-VCH VERLAG, WEINHEIM, DE, vol. 22, no. 15-16, 1 August 2022 (2022-08-01), DE, XP093141361, ISSN: 1615-9853, DOI: 10.1002/pmic.202200074
STES ELISABETH, LAGA MATHIAS, WALTON ALAN, SAMYN NOORTJE, TIMMERMAN EVY, DE SMET IVE, GOORMACHTIG SOFIE, GEVAERT KRIS: "A COFRADIC Protocol To Study Protein Ubiquitination", JOURNAL OF PROTEOME RESEARCH, AMERICAN CHEMICAL SOCIETY, vol. 13, no. 6, 6 June 2014 (2014-06-06), pages 3107 - 3113, XP093141363, ISSN: 1535-3893, DOI: 10.1021/pr4012443
CHICOOREE, N. ET AL.: "Enhanced Detection of Ubiquitin Isopeptides Using Reductive Methylation", JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, vol. 24, no. 3, 2013, pages 421 - 430, XP035354361, DOI: 10.1007/s13361-012-0538-0
TANAKA, F. ET AL.: "Development of a Small Peptide Tag for Covalent Labeling of Proteins", BIOCONJUGATE CHEMISTRY, vol. 18, no. 4, 2007, pages 1318 - 1324, XP093098893, DOI: 10.1021/bc070080x
Attorney, Agent or Firm:
LONG, S., Reid (US)
Claims:
CLAIMS

WHAT IS CLAIMED IS:

1. A method of covalently labeling a peptide comprising an N-terminal glycine residue comprising introducing a substituted or unsubstituted beta-diketone to said peptide to produce a pyrrole or imidazolidinone coupled to said peptide.

2. The method of claim 1, wherein said introducing occurs in solution.

3. The method of claim 2, wherein said solution comprises a polar aprotic or nonaqueous polar protic solvent.

4. The method of claim 3, wherein said solution comprises a polar aprotic solvent, wherein said polar aprotic solvent comprises N,N-dimethylformamide (DMF), dimethyl sulfoxide (DMSO), pyridine, acetic acid, acetone, or any combination thereof.

5. The method of any one of claims 1-4, wherein said method is performed under conditions substantially free of water.

6. The method of any one of claims 2-4, wherein said solution comprises a desiccant.

7. The method of any one of claims 1-6, wherein said substituted or unsubstituted beta-diketone is according to Formula Ia: (Formula Ia) wherein:

Ra and Rb are independently selected at each instance from substituted or unsubstituted alkyl(c<8) or substituted or unsubstituted 5- to 10-membered aryl or heteroaryl; and

Rc is -H, substituted or unsubstituted alkyl(c<8), -CH2C≡CH, -N3, N-linked dibenzylcyclooctyne; N-linked biotin, -norbornene, or -tetrazine; substituted or unsubstituted alkylamino; or an N-linked fluorophore.

8. The method of claim 7, wherein Ra and Rb are independently selected at each instance from substituted or unsubstituted 5- to 10-membered aryl or heteroaryl.

9. The method of claim 8, wherein Rc is -H.

10. The method of claim 7, wherein Ra and Rb are independently selected at each instance from substituted or unsubstituted alkyl(c<8); and

Rc is -H; or substituted or unsubstituted alkyl(c<8).

11. The method of any one of claims 7-9, wherein Ra or Rb is according to Formula Ia(1): (Formula Ia(1)) wherein:

R5 is independently selected at each instance from -H, -OH, substituted or unsubstituted alkoxy, -O(alkyl(c<8)), -OCH2C≡CH, -N3, or allyl.

12. The method of any one of claims 7-11, wherein said beta-diketone is according to Formula Ia(2) or Formula Ia(3):

13. The method of claim 12, wherein said beta-diketone is according to any one of the following:

14. The method of any one of claims 1-13, further comprising contacting said peptide comprising said N-terminal glycine residue with a monoketone according to Formula Ib: (Formula Ib) wherein:

Rd and Re are independently selected at each instance from substituted or unsubstituted alkyl(c<8) or substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or Rd and Re are taken together to form a substituted or unsubstituted 5- or 6-membered aromatic or nonaromatic cycle.

15. The method of claim 14, wherein Rd is substituted or unsubstituted alkyl(c<8) and Re is substituted or unsubstituted 5- to 10-membered aryl or heteroaryl.

16. The method of claim 15, wherein Rd is methyl or ethyl.

17. The method of claim 14, wherein Re is according to any one of the following:

18. The method of any one of claims 17-16, wherein Re is methyl or ethyl.

19. The method of claim 14, wherein said monoketone is according to Formula Ic: wherein:

R5 is independently selected at each instance from -H, -OH, and -O(alkyl(c<8)).

20. The method of claim 14, wherein said monoketone is according to any one of the following:

21. The method of any one of claims 1-20, wherein said peptide comprises at least 3 or at least 6 residues.

22. The method of any one of claims 1-21, wherein said peptide comprises at least 50 residues.

23. The method of any one of claims 1-22, wherein said peptide comprising an N-terminal glycine residue comprises N-terminal residues of -Gly-Gly or -Ala-Gly.

24. The method of any one of claims 1-23, wherein said peptide comprises naturally-occurring amino acids.

25. The method of any one of claims 1-24, further comprising producing said peptide comprising said N-terminal glycine residue prior to said introducing by digesting a parent peptide comprising a K/R-G dipeptide with trypsin or Arg-C.

26. The method of claim 25, wherein said parent peptide is a polypeptide comprising a ubiquitin conjugation.

27. The method of any one of claims 1-26, wherein said pyrrole or imidazolidinone coupled to said peptide is fluorescent.

28. A peptide according to Formula IIa (Formula IIa) wherein:

R1 is independently selected at each instance from substituted or unsubstituted aryl or heteroaryl;

R2 is -H, -OH, -SH, substituted or unsubstituted alkyl(c<8), substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or a side chain of naturally occurring amino acid; and

R3 is an amino acid or a polypeptide chain; and

R4 is -H, substituted or unsubstituted alkyl(c<8), or -CH2C≡CH, N-linked dibenzylcyclooctyne; N-linked biotin, -norbornene, or -tetrazine; substituted or unsubstituted alkylamino; or an N-linked fluorophore, provided that at least one of R1 and R4 is not H.

29. The peptide of claim 28, wherein R1 is independently selected at each instance from substituted or unsubstituted six-membered aryl or heteroaryl.

30. The peptide of claim 28 or 29, wherein said peptide is according to Formula IIa(1): wherein:

R5 is independently selected at each instance from -H, -OH, substituted or unsubstituted alkoxy, -O(alkyl(c<8)), -OCH2C≡CH, -N3, or allyl.

31. The peptide of any one of claims 28-30, wherein R5 is -OCH3 or -OCH2C≡CH.

32. The peptide of any one of claims 28-31, wherein said peptide comprises at least 3 or at least 6 residues.

33. The peptide of any one of claims 28-32, wherein said peptide comprises at least 50 residues.

34. The peptide of any one of claims 28-33, wherein R3 is -Gly or -Ala, or is a peptide chain beginning with -Gly or -Ala.

35. A peptide according to Formula IIb or Formula IIc: (Formula IIb)

wherein:

R2 is -H, -OH, -SH, substituted or unsubstituted Ci-Ce alkyl, substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or a side chain of naturally occurring amino acid; R3 is an amino acid or a polypeptide chain;

Rd and Re are independently selected at each instance from substituted or unsubstituted alkyl(c<8) or substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or Rd and Re are taken together to form a substituted or unsubstituted 5- or 6-membered cycle; and Rc is -H, substituted or unsubstituted alkyl(c<8), -CH2C≡CH, N-linked dibenzylcyclooctyne; N-linked biotin, -norbornene, or -tetrazine; substituted or unsubstituted alkylamino; or an N-linked fluorophore.

36. The peptide according to claim 35, wherein the peptide is according to Formula IIb(1): wherein

Ring A is C3-C8 substituted or unsubstituted aryl or heteroaryl.

37. The peptide according to claim 36, wherein the peptide is according to Formula IIb(2) (Formula IIb(2))

wherein:

R1 is substituted or unsubstituted alkoxy, -O(alkyl(c<8)), or -OCH2C≡CH; R2 is H, OH, SH, substituted or unsubstituted alkyl(c<8), substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or a side chain of naturally occurring amino acid; and R3 is an amino acid.

38. A method of identifying a presence of ubiquitin or ubiquitin-like conjugation in a parent polypeptide, comprising:

(a) digesting said parent polypeptide with a sequence-specific protease to generate a C-terminal stub residue chain at a site of ubiquitin (UBI) or ubiquitin-like protein (UBLs) conjugation in said parent peptide, wherein said C-terminal stub residue chain comprises less than a full ubiquitin (UBI) or ubiquitin-like protein (UBLs) chain;

(b) performing an N-terminal peptide selective reaction on said C-terminal stub residue chain to label said site of UBI or UBL conjugation; and

(c) detecting said site of UBI or UBL conjugation via said label at said site of UBI or UBL conjugation.

39. The method of claim 38, wherein said sequence-specific protease is trypsin or Arg-C.

40. The method of claim 38, comprising generating a C-terminal stub residue chain at a site of ubiquitin (UBI) conjugation in said peptide.

41. The method of claim 40, wherein said N-terminal peptide specific reaction is an N-terminal glycine specific reaction according to any one of claims 1-27.

42. The method of any one of claims 38-41, wherein said detecting comprises Edman degradation, peptide sequencing, MS/MS, or fluorescence detection.

43. A kit comprising:

(a) a beta-diketone according to any one of Formulas la, Ia(2), or Ia(3);

(b) Trypsin or Arg-C; and

(c) instructions for processing a parent peptide to produce a diglycine-labelled N-terminus at sites of ubiquitination in a parent protein.

44. The kit of claim 43, further comprising a monoketone according to Formula Ib.

45. The kit of claim 43 or 44, wherein said beta-diketone or said monoketone comprises a propargyl group.

46. The kit of claim 45, further comprising an azide-coupled fluorophore.

47. A method of covalently labeling a peptide comprising an N-terminal amino acid residue lacking a side chain comprising introducing a substituted or unsubstituted beta-diketone to said peptide to produce a pyrrole or imidazolidinone coupled to said peptide.

Description:
DESCRIPTION

METHODS FOR LABELING N-TERMINAL GLYCINE FOR PROTEOMICS

CROSS-REFERENCE

[0001] This application claims the benefit of U.S. Provisional Application 63/397,687, entitled “Methods of Mapping Post-Translational Modifications on Proteins”, filed on August 12, 2022, which is incorporated herein in its entirety.

REFERENCE TO A SEQUENCE LISTING

[0002] This application contains a Sequence Listing XML, which has been submitted electronically and is hereby incorporated by reference in its entirety. Said XML Sequence Listing, created on August 11, 2023, is named UTFBP1319WO.xml and is ~18 kilobytes in size.

BACKGROUND

[0003] Ubiquitination is the biochemical process in which proteins are marked by ubiquitin, a 76 amino acid protein. Ubiquitination occurs intracellularly in eukaryotes and regulates a wide variety of biological processes. The ubiquitination of a protein most commonly results in degradation of the protein via the ubiquitin-proteasome pathway but can also serve to alter protein-protein interaction.

INCORPORATION BY REFERENCE

[0004] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

SUMMARY

[0005] In some aspects, the present disclosure provides for a method of covalently labeling a peptide comprising an N-terminal glycine residue comprising introducing a substituted or unsubstituted beta-diketone to said peptide to produce a pyrrole or imidazolidinone coupled to said peptide. In some embodiments, said introducing occurs in solution. In some aspects, the present disclosure provides for a method of covalently labeling a peptide comprising an N-terminal amino acid residue lacking a side chain comprising introducing a substituted or unsubstituted beta-diketone to said peptide to produce a pyrrole or imidazolidinone coupled to said peptide. In some embodiments, said solution comprises a polar aprotic or nonaqueous polar protic solvent. In some embodiments, said solution comprises a polar aprotic solvent, wherein said polar aprotic solvent comprises N,N-dimethylformamide (DMF), dimethyl sulfoxide (DMSO), pyridine, acetic acid, acetone, or any combination thereof. In some embodiments, said method is performed under conditions substantially free of water. In some embodiments, said solution comprises a desiccant. In some embodiments, said substituted or unsubstituted beta-diketone is according to Formula Ia:

(Formula Ia) wherein: Ra and Rb are independently selected at each instance from substituted or unsubstituted alkyl(c<8) or substituted or unsubstituted 5- to 10-membered aryl or heteroaryl; and Rc is -H, substituted or unsubstituted alkyl(c<8), -CH2C≡CH, -N3, N-linked dibenzylcyclooctyne; N-linked biotin, -norbornene, or -tetrazine; substituted or unsubstituted alkylamino; or an N-linked fluorophore. In some embodiments, Ra and Rb are independently selected at each instance from substituted or unsubstituted 5- to 10-membered aryl or heteroaryl. In some embodiments, Rc is -H. In some embodiments, Ra and Rb are independently selected at each instance from substituted or unsubstituted alkyl(c<8); and Rc is -H; or substituted or unsubstituted alkyl(c<8). In some embodiments, Ra or Rb is according to Formula Ia(1): (Formula Ia(1)) wherein: R5 is independently selected at each instance from -H, -OH, substituted or unsubstituted alkoxy, -O(alkyl(c<8)), -OCH2C≡CH, -N3, or allyl. In some embodiments, said beta-diketone is according to Formula Ia(2) or Formula Ia(3): (Formula Ia(3)) In some embodiments, said beta-diketone is according to any one of the following:

In some embodiments, the method further comprises contacting said peptide comprising said N-terminal glycine residue with a monoketone according to Formula Ib: (Formula Ib) wherein: Rd and Re are independently selected at each instance from substituted or unsubstituted alkyl(c<8) or substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or Rd and Re are taken together to form a substituted or unsubstituted 5- or 6-membered aromatic or nonaromatic cycle. In some embodiments, Rd is substituted or unsubstituted alkyl(c<8); and Re is substituted or unsubstituted 5- to 10-membered aryl or heteroaryl. In some embodiments, Rd is methyl or ethyl. In some embodiments, Re is according to any one of the following:

In some embodiments, Re is methyl or ethyl. In some embodiments, said monoketone is according to Formula Ic: wherein: R5 is independently selected at each instance from -H, -OH, and -O(alkyl(c<8)). In some embodiments, said monoketone is according to any one of the following:

In some embodiments, said peptide comprises at least 3 or at least 6 residues. In some embodiments, said peptide comprises at least 50 residues. In some embodiments, said peptide comprising an N-terminal glycine residue comprises N-terminal residues of -Gly-Gly or -Ala-Gly. In some embodiments, said peptide comprises naturally-occurring amino acids. In some embodiments, the method further comprises producing said peptide comprising said N-terminal glycine residue prior to said introducing by digesting a parent peptide comprising a K/R-G dipeptide with a positively-charged amino acid specific endoprotease. In some embodiments, the positively-charged amino acid specific endoprotease is trypsin or Arg-C.
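As an illustration of the digestion step described above, the following Python sketch performs an in-silico digest of a linear sequence and reports the fragments that begin with glycine. It is a minimal sketch under stated assumptions and not part of the disclosed method: the names `digest` and `n_terminal_glycine_fragments` are hypothetical, cleavage is modeled as occurring C-terminal to K or R (trypsin) or to R only (Arg-C) except when the next residue is proline, and the example input is the C-terminal tail of ubiquitin written as a linear string, which ignores the branched isopeptide linkage to a substrate lysine.

```python
# Assumed, simplified cleavage rules: residues after which each protease cuts.
CLEAVAGE_RESIDUES = {"trypsin": "KR", "arg-c": "R"}


def digest(sequence, protease="trypsin"):
    """Split a linear one-letter sequence C-terminal to the protease's
    target residues, skipping sites that are followed by proline."""
    residues = CLEAVAGE_RESIDUES[protease.lower()]
    fragments = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cut only at internal sites; the last residue never creates a cut.
        if aa in residues and next_aa and next_aa != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def n_terminal_glycine_fragments(fragments):
    """Return the fragments whose N-terminal residue is glycine."""
    return [frag for frag in fragments if frag.startswith("G")]


if __name__ == "__main__":
    # C-terminal tail of ubiquitin (residues 64-76), used as a linear example.
    tail = "ESTLHLVLRLRGG"
    for protease in ("trypsin", "arg-c"):
        pieces = digest(tail, protease)
        print(protease, pieces, n_terminal_glycine_fragments(pieces))
```

With either protease, the final fragment of the example is `GG`, consistent with a diglycine stub carrying a free N-terminal glycine after cleavage at the K/R-G site.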

In some embodiments, said parent peptide is a polypeptide comprising a ubiquitin conjugation. In some cases, the polypeptide comprises a ubiquitin-lysine conjugation. In some embodiments, said pyrrole or imidazolidinone coupled to said peptide is fluorescent.

[0006] In some aspects, the present disclosure provides for a peptide according to Formula IIa (Formula IIa) wherein: R1 is independently selected at each instance from substituted or unsubstituted aryl or heteroaryl; R2 is -H, -OH, -SH, substituted or unsubstituted alkyl(c<8), substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or a side chain of naturally occurring amino acid; and R3 is an amino acid or a polypeptide chain; and R4 is -H, substituted or unsubstituted alkyl(c<8), or -CH2C≡CH, N-linked dibenzylcyclooctyne; N-linked biotin, -norbornene, or -tetrazine; substituted or unsubstituted alkylamino; or an N-linked fluorophore. In some embodiments, at least one of R1 and R4 is not H. In some embodiments, R1 is independently selected at each instance from substituted or unsubstituted six-membered aryl or heteroaryl. In some embodiments, said peptide is according to Formula IIa(1): wherein: R5 is independently selected at each instance from -H, -OH, substituted or unsubstituted alkoxy, -O(alkyl(c<8)), -OCH2C≡CH, -N3, or allyl. In some embodiments, R5 is -OCH3 or -OCH2C≡CH. In some embodiments, said peptide comprises at least 3 or at least 6 residues. In some embodiments, said peptide comprises at least 50 residues. In some embodiments, R3 is -Gly or -Ala, or is a peptide chain beginning with -Gly or -Ala.

[0007] In some aspects, the present disclosure provides for a peptide according to Formula IIb or Formula IIc: wherein: R2 is -H, -OH, -SH, substituted or unsubstituted Ci-Ce alkyl, substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or a side chain of naturally occurring amino acid; R3 is an amino acid or a polypeptide chain; Rd and Re are independently selected at each instance from substituted or unsubstituted alkyl(c<8) or substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or Rd and Re are taken together to form a substituted or unsubstituted 5- or 6-membered cycle; and Rc is -H, substituted or unsubstituted alkyl(c<8), -CH2C≡CH, N-linked dibenzylcyclooctyne; N-linked biotin, -norbornene, or -tetrazine; substituted or unsubstituted alkylamino; or an N-linked fluorophore. In some embodiments, the peptide is according to Formula IIb(1): (Formula IIb(1)) wherein: Ring A is C3-C8 substituted or unsubstituted aryl or heteroaryl. In some embodiments, the peptide is according to Formula IIb(2) (Formula IIb(2)) wherein: R1 is substituted or unsubstituted alkoxy, -O(alkyl(c<8)), or -OCH2C≡CH; R2 is H, OH, SH, substituted or unsubstituted alkyl(c<8), substituted or unsubstituted 5- to 10-membered aryl or heteroaryl, or a side chain of naturally occurring amino acid; and R3 is an amino acid.

[0008] In some aspects, the present disclosure provides for a method of identifying a presence of ubiquitin or ubiquitin-like conjugation in a parent polypeptide, comprising: (a) digesting said parent polypeptide with a sequence-specific protease to generate a C-terminal stub residue chain at a site of ubiquitin (UBI) or ubiquitin-like protein (UBLs) conjugation in said parent peptide, wherein said C-terminal stub residue chain comprises less than a full ubiquitin (UBI) or ubiquitin-like protein (UBLs) chain; (b) performing an N-terminal peptide selective reaction on said C-terminal stub residue chain to label said site of UBI or UBL conjugation; and (c) detecting said site of UBI or UBL conjugation via said label at said site of UBI or UBL conjugation. In some embodiments, said sequence-specific protease is trypsin or Arg-C. In some embodiments, the method further comprises generating a C-terminal stub residue chain at a site of ubiquitin (UBI) conjugation in said peptide. In some embodiments, said N-terminal peptide specific reaction is an N-terminal glycine specific reaction according to any of the aspects or embodiments described herein. In some embodiments, said detecting comprises Edman degradation, peptide sequencing, MS/MS, or fluorescence detection.

[0009] In some aspects, the present disclosure provides for a kit comprising: (a) a beta-diketone according to any one of Formulas Ia, Ia(2), or Ia(3), or any of the beta-diketones described herein; (b) a charged-residue specific endoprotease; and (c) instructions for processing a parent peptide to produce a diglycine-labelled N-terminus at sites of ubiquitination in a parent protein. In some embodiments, the kit further provides instructions for labeling a diglycine-labelled N-terminus generated at sites of ubiquitination in a parent protein using said beta-diketone. In some embodiments, the charged-residue specific endoprotease is Trypsin or Arg-C. In some embodiments, the kit further comprises a monoketone according to Formula Ib, or any of the monoketones described herein. In some embodiments, said beta-diketone or said monoketone comprises a propargyl group. In some embodiments, the kit further comprises an azide-coupled fluorophore.

[0010] Various aspects of the present disclosure provide a method, comprising: (a) providing a biopolymer comprising a biomolecular tag coupled thereto, which biomolecular tag comprises a peptide comprising a first portion and a second portion, wherein the first portion is coupled to the second portion, and wherein a terminal end of the second portion is coupled to the biopolymer; (b) cleaving the first portion from the second portion of the biomolecular tag to generate a fragmented biomolecular tag, wherein the fragmented biomolecular tag comprises the first portion of the peptide, and wherein the biopolymer is coupled to the second portion of the peptide; (c) labeling the second portion of the peptide with a label to generate a labeled biopolymer; and (d) identifying the labeled biopolymer, wherein the identifying is by sequencing by degradation.

[0011] In some embodiments, the biopolymer is a polypeptide. In some embodiments, the biopolymer is a protein. In some embodiments, the label is a fluorescent label. In some embodiments, the label is a dye. In some embodiments, the identifying comprises detecting a signal from the labeled biopolymer. In some embodiments, the identifying determines a position of the biomolecular tag within the biopolymer. In some embodiments, the labeling, or the identifying are configured to distinguish a ubiquitin-family protein from a plurality of proteins comprising ubiquitin-family proteins.

[0012] In some embodiments, the plurality of proteins comprises a ubiquitin protein, a SUMO protein, a UBL5 protein, an ISG15 protein, an E1 SUMO protein, an E1 Nedd8 protein, a 1Y8XB protein, a UBL3/MUB protein, a 1WGHA protein, a Nedd8 protein, a 1NDDC protein, a 1BT0A protein, a 1AARA protein, a 1A5R protein, or any ortholog or combination thereof. In some embodiments, after the cleaving, the labeling selectively labels a terminal amino acid residue on the second portion of the peptide. In some embodiments, the biopolymer comprises a second peptide, a nucleic acid, a saccharide, a lipid, a polyketide, a polyterpene, a polyhydroxyalkanoate, or any combination thereof.

[0013] In some embodiments, the biomolecular tag is on an internal amino acid residue of the biopolymer. In some embodiments, the identifying comprises identifying a position of the internal amino acid residue. In some embodiments, the biomolecular tag is coupled to a sidechain of the internal amino acid residue. In some embodiments, the internal amino acid residue is lysine. In some embodiments, the internal amino acid residue is glycine. In some embodiments, the internal amino acid residue is cysteine. In some embodiments, the second portion of the peptide is coupled to the biomolecular tag via an isopeptide bond.

[0014] In some embodiments, the isopeptide bond comprises a lysyl butylamine of the peptide. In some embodiments, the isopeptide bond comprises a carbonyl of the biomolecular tag. In some embodiments, the biomolecular tag comprises a ubiquitin-family protein. In some embodiments, the ubiquitin-family protein comprises a ubiquitin protein, a SUMO protein, a UBL5 protein, an ISG15 protein, an E1 SUMO protein, an E1 Nedd8 protein, a 1Y8XB protein, a UBL3/MUB protein, a 1WGHA protein, a Nedd8 protein, a 1NDDC protein, a 1BT0A protein, a 1AARA protein, a 1A5R protein, or any ortholog or combination thereof.

[0015] In some embodiments, the ubiquitin-family protein comprises ubiquitin. In some embodiments, the labeling couples a carbonyl group to the second portion of the peptide. In some embodiments, the labeling covalently modifies an N-terminal amine of the second portion of the peptide. In some embodiments, the labeling comprises specificity for the N-terminal amine of the second portion of the peptide. In some embodiments, after the cleaving, the second portion of the peptide comprises a terminal diglycine moiety. In some embodiments, the labeling comprises specificity for the diglycine moiety. In some embodiments, the labeling covalently modifies an N-terminal amine of the diglycine moiety. In some embodiments, the labeling covalently modifies the second portion of the peptide.

[0016] In some embodiments, the cleaving is site-selective. In some embodiments, the cleaving comprises site-selectivity for an amino acid or peptide sequence of the biomolecular tag. In some embodiments, the cleaving comprises site-selectivity for the first portion of the peptide. In some embodiments, the cleaving comprises site-selectivity for the second portion of the peptide. In some embodiments, the cleaving comprises enzymatic digestion. In some embodiments, the cleaving comprises protease-mediated cleavage. In some embodiments, the cleaving comprises endopeptidase-mediated cleavage. In some embodiments, the cleaving fragments the biopolymer, thereby generating the fragmented biopolymer.

[0017] In some embodiments, the fragmented biopolymer comprises an average molecular mass of less than about 55 kilodaltons (kDa). In some embodiments, the fragmented biopolymer comprises an average molecular mass of less than about 20 kilodaltons (kDa). In some embodiments, the fragmented biopolymer comprises an average of at most about 50 amino acids. In some embodiments, the fragmented biopolymer comprises an average of at most about 20 amino acids. In some embodiments, the biopolymer comprises a plurality of biomolecular tags.

[0018] In some embodiments, the identifying comprises quantifying the plurality of biomolecular tags. In some embodiments, the method further comprises coupling the biopolymer to a solid support before the providing. In some embodiments, the method further comprises, before the labeling, blocking a reactive moiety of the biopolymer. In some embodiments, the blocking inhibits a reactivity of the reactive moiety against the labeling. In some embodiments, the reactive moiety comprises a peptide N-terminus, a peptide C-terminus, or an amino acid side chain.

[0019] In some embodiments, the method further comprises labeling an amino acid moiety of the biopolymer to generate a labeled biopolymer. In some embodiments, the labeling the amino acid moiety of the biopolymer comprises covalently modifying the amino acid moiety. In some embodiments, the method further comprises labeling the fragmented biomolecular tag. In some embodiments, the method further comprises distinguishing the labeled fragmented biomolecular tag from the labeled biopolymer. In some embodiments, the method further comprises quantifying the fragmented biomolecular tag.

[0020] In some embodiments, the method further comprises sequencing the biopolymer. In some embodiments, the sequencing by degradation comprises fluorosequencing. In some embodiments, the sequencing by degradation comprises Edman degradation. In some embodiments, the sequencing by degradation comprises subjecting the labeled biopolymer to conditions sufficient to remove at least one amino acid from the labeled biopolymer.
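The readout of sequencing by degradation can be pictured with a short Python sketch that tracks how many labels remain on a peptide after each removal cycle. It is a minimal sketch under stated assumptions, not the disclosed implementation: the names `label_trace` and `drop_cycles` are hypothetical, each cycle is assumed to remove exactly one N-terminal residue without error, and the peptide length and labeled position in the example are invented for illustration.

```python
def label_trace(peptide_length, labeled_positions, cycles):
    """Return the number of labels still attached before any cycle and
    after each degradation cycle. Positions are 1-indexed from the
    N-terminus; cycle n is assumed to remove residue n."""
    for pos in labeled_positions:
        if not 1 <= pos <= peptide_length:
            raise ValueError(f"labeled position {pos} is outside the peptide")
    remaining = set(labeled_positions)
    trace = [len(remaining)]
    for cycle in range(1, cycles + 1):
        remaining.discard(cycle)  # the label leaves with its residue
        trace.append(len(remaining))
    return trace


def drop_cycles(trace):
    """Return the cycles at which the label count decreased."""
    return [i for i in range(1, len(trace)) if trace[i] < trace[i - 1]]


if __name__ == "__main__":
    # Hypothetical 7-residue peptide with a single label on residue 4.
    trace = label_trace(peptide_length=7, labeled_positions={4}, cycles=6)
    print(trace)               # label count per cycle
    print(drop_cycles(trace))  # cycle(s) at which a label was lost
```

In this idealized model, the cycle at which the count drops corresponds to the position of the labeled residue, which is how a label at a tagged internal residue could report its position in the chain.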

[0021] In some embodiments, the labeled biopolymer generates at least one signal or at least one signal change. In some embodiments, the at least one signal or the at least one signal change is an optical signal. In some embodiments, the at least one signal or the at least one signal change comprises a plurality of signals of different intensities. In some embodiments, the at least one signal or the at least one signal change comprises a plurality of signals of different frequencies or signals of different frequency ranges.

[0022] In some embodiments, the sequencing by degradation comprises enzymatic cleavage of the label from the biopolymer. In some embodiments, the sequencing by degradation comprises chemical cleavage of the label from the biopolymer. In some embodiments, the chemical cleavage comprises cyanogen bromide cleavage, BNPS-skatole cleavage, formic acid cleavage, hydroxylamine cleavage, 2-nitro-5-thiocyanobenzoic acid cleavage, or any combination thereof. In some embodiments, the first portion is coupled to the second portion by a linker.

[0023] Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.

[0024] Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.

[0025] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “figure” and “FIG.” herein), of which:

[0027] FIG. 1 shows a computer system that is programmed or otherwise configured to implement methods provided herein.

[0028] FIG. 2 illustrates the chemical reactions used for ubiquitin mapping.

[0029] FIG. 3A shows the reaction of a glycine residue, acetylacetone, and acetone. FIG. 3B shows the proposed reaction mechanism for labeling an N-terminal glycine.

[0030] FIG. 4A shows that an external ketone can be used to compete with the acetone to form a GFP-chromophore mimicking structure. FIG. 4B shows the results of a cross-talk experiment using two different 2,3-diketones.

[0031] FIG. 5 shows the mass spectrometry data resulting from the cross-talk experiment using two different 2,3-diketones.

[0032] FIG. 6 shows example reactions of glycine modification that were used to test the substrate scope and optimize reaction conditions.

[0033] FIG. 7 shows the mass spectrometry data resulting from glycine modification of a peptide.

[0034] FIG. 8 shows the mass spectrometry data resulting from glycine modification of a peptide.

[0035] FIG. 9 depicts a scheme for methylene-facilitated amine differentiation for glycine labeling.

[0036] FIG. 10 depicts a scheme showing a proposed mechanism for the formation of pyrrole-based products via the condensation of N-terminal glycines and 1,3-diketones (proton transfers are not shown, and order of steps are not conclusive).

[0037] FIG. 11 depicts a scheme showing: (a) the proposed pathway for the formation of MDZ-1 (proton transfers are not shown, and order of steps are not conclusive); and (b) a crossover study conducted as a mechanism test.

[0038] FIG. 12 depicts the reaction scope of imidazolidinone formation tested in Example 5.

[0039] FIG. 13 depicts schemes and characterization of imidazolidinone and pyrrole adducts to glycine. Shown are: (a) the scheme, products, and crystal structures of Py-1 and Py-2 obtained via the reaction between glycine-N-methylamide and benzoylacetone; (b) the reaction scheme and products for pyrrole formation; (c) absorbance spectrum of molecule Py-3 (30 µM in DMSO); (d) emission spectrum (excitation: 315 nm), and fluorescence image of Py-3 under the 365 nm UV lamp (30 µM in DMSO).

[0040] FIG. 14 depicts structures and characterization of pyrrole adducts to various peptides. Shown are: (a) pyrrole-based products formed after the specific labeling of various N-glycine-terminated peptides; (b) emission spectrum of molecule H2N-GAKYAA (SEQ ID NO: 1) (30 µM in DMSO, excitation: 275 nm); (c) emission spectrum of Py-(G)AKYAA (SEQ ID NO: 2) (30 µM in DMSO, excitation: 315 nm); (d) fluorescence image of H2N-GAKYAA (SEQ ID NO: 1) (unlabeled) and Py-(G)AKYAA (SEQ ID NO: 2) (labeled) under a 365 nm UV lamp (30 µM in DMSO).

[0041] FIG. 15 depicts different conditions that were used for reaction optimization (e.g., of imidazolidinone formation) in Example 5.

[0042] FIG. 16 depicts a reaction scheme and mass spectrometry trace of the crosstalk reaction of glycine-N-methylamide with acetylacetone and 3,5-heptanedione in Example 5.

[0043] FIG. 17 depicts a reaction scheme and mass spectrometry trace of the selectivity test for pyrrole formation via the reaction between alanine-N-methylamide hydrochloride and acetylacetone performed in Example 5 (3:1 molar ratio). (Note: No ion peak (167.12) from the “double condensation” which happened for MDZ-1 formation was observed.)

[0044] FIG. 18 depicts a reaction scheme and mass spectrometry trace of the selectivity test for pyrrole formation via the reaction between valine-N-methylamide hydrochloride and acetylacetone performed in Example 5 (3:1 molar ratio). (Note: No ion peak (195.14) from the “double condensation” which happened for MDZ-1 formation was observed.)

[0045] FIG. 19 depicts the chemical and crystal structure of Py-3.

[0046] FIG. 20 depicts (a) absorbance and (b) emission spectra (excitation: 315 nm) of Py-4 in 30 µM of DMSO (optical image obtained under UV irradiation at 365 nm).

[0047] FIG. 21 depicts (a) absorbance and (b) emission spectra (excitation: 312 nm) of Py-5 in 30 µM of DMSO (optical image obtained under UV irradiation at 365 nm).

[0048] FIG. 22 depicts a scheme and mass spectrometry trace of the selectivity test in Example 5 for pyrrole formation via the reaction between alanine-N-methylamide hydrochloride and DBM-1 (1:3 molar ratio).

[0049] FIG. 23 depicts a scheme and mass spectrometry trace for the formation of MDZ-(G)GAAAA in Example 5.

[0050] FIG. 24 depicts a scheme and mass spectrometry trace for the formation of MDZ-(G)GAAAA in Example 5.

[0051] FIG. 25 depicts a scheme of the reaction between peptide NH2-GGAAAA (SEQ ID NO: 3) and DBM-1, and the HRMS of the formed peptide Py-(G)GAAAA (SEQ ID NO: 4) in Example 5.

[0052] FIG. 26 depicts (a) absorbance spectrum of NH2-GAKYAA (SEQ ID NO: 1) in DMSO, and (b) absorbance spectrum of Py-(G)AKYAA (SEQ ID NO: 2) in DMSO.

[0053] FIG. 27 depicts a scheme and mass spectrometry trace of the selectivity test by the reaction of NH2-KAAYAA (SEQ ID NO: 12) with DBM-1 performed in Example 5.

[0054] FIG. 28 depicts a scheme and mass spectrometry trace of the selectivity test by the reaction of NH2-VAAYAA (SEQ ID NO: 13) with DBM-1 in Example 5.

[0055] FIG. 29 depicts an HRMS spectrum for the propargyl-pyrrole linked peptide depicted.

[0056] FIG. 30 depicts an HRMS spectrum for the 7-(diethylamino)-2-oxo-2H-chromene-3-carboxylic acid-linked peptide depicted.

[0057] FIG. 31 depicts photophysical studies performed on the 7-(diethylamino)-2-oxo-2H-chromene-3-carboxylic acid-linked peptide depicted in FIG. 30.

INCORPORATION BY REFERENCE

[0058] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

DETAILED DESCRIPTION

[0059] While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

[0060] Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.

[0061] Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.

[0062] The term “analyte” or “analytes,” as used herein, generally refers to a molecule whose presence or absence is measured or identified. An analyte may be a molecule for which a detectable probe or assay exists or may be produced. For example, an analyte may be a macromolecule, such as, for example, a nucleic acid, a polypeptide, a carbohydrate, a small organic, an inorganic compound, or an element, for example, gold, iron, or lead. An analyte may be part of a sample that contains other components, or may be the sole or the major component of the sample. An analyte may be a component of a whole cell or tissue, a cell or tissue extract, a fractionated lysate thereof or a substantially purified molecule. In some cases, the target analyte is a polypeptide.

[0063] The terms “polypeptide” and “peptide” generally refer to a polymer of amino acids in which an amino acid may be linked to another amino acid by a peptide bond. In some examples, a polypeptide is a protein. The amino acid may be a naturally occurring amino acid or a non-naturally occurring amino acid (e.g., amino acid analogue). The polymer may be linear or branched and may include modified amino acids, and/or may be interrupted by non-amino acids. Polypeptides may occur as single chains or associated chains. The polymer may include a plurality of amino acids and may have a secondary and tertiary structure (e.g., protein). In some examples, the polymer comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 1000, 10,000, or more amino acids.

[0064] The term “amino acid,” as used herein, generally refers to a naturally occurring or non-naturally occurring amino acid (amino acid analogue). The non-naturally occurring amino acid may be a synthesized amino acid. The terms “amino acid sequence,” “peptide sequence,” and “polypeptide sequence,” as used herein, generally refer to at least two amino acids or amino acid analogs that are covalently linked by a peptide (amide) bond or an analog of a peptide bond. The term peptide includes oligomers and polymers of amino acids or amino acid analogs. The amino acids of the peptide may be L-amino acids or D-amino acids. A peptide, polypeptide, or protein may be synthetic, recombinant, or naturally occurring. A synthetic peptide may be a peptide that is produced by artificial approaches in vitro. In some cases, the term “amino acid” in general refers to organic compounds that contain at least one amino group, -NH2, which may be present in its ionized form, -NH3+, and one carboxyl group, -COOH, which may be present in its ionized form, -COO−, where the carboxylic acids are deprotonated at neutral pH, having the basic formula of NH2CHRCOOH.

[0065] As used herein, the term “terminal” refers to a terminus (singular) or to termini (plural).

[0066] As used herein, the term “side chains” generally refers to unique structures attached to the alpha carbon (attaching the amine and carboxylic acid groups of the amino acid) that render uniqueness to each type of amino acid. Side chains have a variety of shapes, sizes, charges, and reactivities, such as charged polar side chains, either positively or negatively charged, such as lysine (+), arginine (+), histidine (+), aspartate (-), and glutamate (-); amino acids may also be basic, such as lysine, or acidic, such as glutamic acid; uncharged polar side chains have hydroxyl, amide, or thiol groups, such as cysteine having a chemically reactive side chain, e.g., a thiol group that may form bonds with another cysteine, serine (Ser) and threonine (Thr), that have hydroxylic R side chains of different sizes; asparagine (Asn), glutamine (Gln), and tyrosine (Tyr); non-polar hydrophobic amino acid side chains include the amino acid glycine, alanine, valine, leucine, and isoleucine having aliphatic hydrocarbon side chains ranging in size from a methyl group for alanine to isomeric butyl groups for leucine and isoleucine; methionine (Met) has a thiol ether side chain; proline (Pro) has a cyclic pyrrolidine side group. Phenylalanine (with its phenyl moiety) (Phe) and tryptophan (Trp) (with its indole group) contain aromatic side chains, which are characterized by bulk as well as lack of polarity.

[0067] Amino acids can also be referred to by a name or 3-letter code or 1-letter code, for example, Cysteine, Cys, C; Lysine, Lys, K; Tryptophan, Trp, W, respectively.
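By way of non-limiting illustration only, the correspondence between 1-letter and 3-letter codes can be sketched in a few lines of Python. The function names `to_three_letter` and `to_one_letter` are hypothetical and are not part of the disclosure; the table covers only the twenty canonical amino acids and uses the standard code assignments.

```python
# Standard 1-letter -> 3-letter codes for the 20 canonical amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(sequence):
    """Convert a 1-letter sequence (N- to C-terminus) to hyphenated 3-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


def to_one_letter(sequence):
    """Convert hyphenated 3-letter codes back to a 1-letter sequence."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in sequence.split("-"))


# Hypothetical usage with a peptide sequence that appears in the figure descriptions.
print(to_three_letter("GAKYAA"))
```

A residue outside the twenty canonical amino acids raises a `KeyError` in this sketch; handling of non-naturally occurring amino acids would need an extended table.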

[0068] The term “cleavable unit,” as used herein, generally refers to a molecule that may be split into at least two molecules. Non-limiting examples of cleavage reagents and conditions to split a cleavable unit include: enzymes, nucleophilic or basic reagents, reducing agents, photo-irradiation, electrophilic or acidic reagents, organometallic or metal reagents, and oxidizing reagents.

[0069] The term “sample,” as used herein, generally refers to a sample containing or suspected of containing a polypeptide. For example, a sample may be a biological sample containing one or more polypeptides. The biological sample may be obtained (e.g., extracted or isolated) from or include blood (e.g., whole blood), plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears. The biological sample may be a fluid or tissue sample (e.g., skin sample). In some examples, the sample is obtained from a cell-free bodily fluid, such as whole blood, saliva, or urine. In some examples, the sample may include circulating tumor cells. In some examples, the sample is an environmental sample (e.g., soil, waste, ambient air), industrial sample (e.g., samples from any industrial processes), and food samples (e.g., dairy products, vegetable products, and meat products). The sample may be processed before loading into a microfluidic device. For example, the sample may be processed to purify the polypeptides and/or to include reagents.

[0070] As used herein, sequencing of peptides “at the single molecule level” generally refers to amino acid sequence information obtained from individual (e.g., single) peptide molecules in a mixture of diverse peptide molecules. The amino acid sequence information may be obtained from an entirety of an individual peptide molecule or one or more portions of the individual peptide molecule, such as a contiguous amino acid sequence of at least a portion of the individual peptide molecule. Alternatively, partial amino acid sequence information may be obtained, which may allow for identification of the peptide or protein. Partial amino acid sequence information, including for example, the pattern of a specific amino acid residue (e.g., lysine) within individual peptide molecules, may be sufficient to uniquely identify an individual peptide molecule. For example, a pattern of amino acids may comprise a plurality of identified positions (e.g., identified as a particular amino acid type, such as lysine, or identified as a particular set of amino acids, such as the set of carboxylate side chain-containing amino acids), and a plurality of unidentified positions. The sequence of identified positions may be searched against a documented proteome of a given organism to identify the individual peptide molecule. In some examples, sequencing of a peptide at the single molecule level may identify a pattern of a certain type of amino acid (e.g., lysine) in an individual peptide molecule. Such information may be used to identify a macromolecule (e.g., protein) from which the peptide was derived. This may advantageously preclude identifying all amino acids of the peptide.

[0071] As used herein, the term “Edman degradation” generally refers to methods comprising chemical removal of amino acids from peptides or proteins. In some cases, Edman degradation denotes terminal (e.g., N- or C-terminal) amino acid removal. In specific cases, Edman degradation refers to N-terminal amino acid removal through isothiocyanate (e.g., phenyl isothiocyanate) coupling and cyclization with the terminal amine group of an N-terminal residue, such that the N-terminal amino acid is removed from a peptide. In some cases, Edman degradation broadly encompasses N-terminal amino acid functionalizations leading to N-terminal amino acid removal. In some cases, Edman degradation encompasses C-terminal amino acid removal. In some cases, Edman degradation comprises terminal amino acid functionalization (e.g., N-terminal amino acid isothiocyanate functionalization) followed by enzymatic removal (e.g., by an ‘Edmanase’ with specificity for chemically derivatized N-terminal amino acids).

[0072] As used herein, the term “single molecule sensitivity” generally refers to the ability to acquire data (including, for example, amino acid sequence information) from individual peptide molecules in a mixture of diverse peptide molecules. In one non-limiting example, the mixture of diverse peptide molecules may be immobilized on a solid surface (including, for example, a glass slide, or a glass slide whose surface has been chemically modified). This may include the ability to simultaneously record the fluorescent intensity of multiple individual (e.g., single) peptide molecules distributed across the glass surface. Optical devices are commercially available that may be applied in this manner. For example, a microscope equipped with total internal reflection illumination and an intensified charge-coupled device (CCD) detector is available. Imaging with a high sensitivity CCD camera allows the instrument to simultaneously record the fluorescent intensity of multiple individual (e.g., single) peptide molecules distributed across a surface. Image collection may be performed using an image splitter that directs light through two band pass filters (one suitable for each fluorescent molecule) to be recorded as two side-by-side images on the CCD surface. Using a motorized microscope stage with automated focus control to image multiple stage positions in the flow cell may allow millions of individual single peptides (or more) to be sequenced in one experiment.

[0073] As used herein, the term “support” generally refers to an entity to which a substance (e.g., molecular construct) may be immobilized. The support may be a solid or semi-solid (e.g., gel) support. As a non-limiting example, a support may be a bead, a polymer matrix, an array, a microscopic slide, a glass surface, a plastic surface, a transparent surface, a metallic surface, a magnetic surface, a multi-well plate, a nanoparticle, a microparticle, or a functionalized surface. The support may be planar. As an alternative, the support may be non-planar, such as including one or more wells. A bead may be, for example, a marble, a polymer bead (e.g., a polysaccharide bead, a cellulose bead, a synthetic polymer bead, a natural polymer bead), a silica bead, a functionalized bead, an activated bead, a barcoded bead, a labeled bead, a PCA bead, a magnetic bead, or a combination thereof. A bead may be functionalized with a functional motif. Some non-limiting examples of functional motifs include a capture reagent (e.g., pyridinecarboxyaldehyde (PCA)), a biotin, a streptavidin, a strep-tag II, a linker, or a functional group that may react with a molecule (e.g., an aldehyde, a phosphate, a silicate, an ester, an acid, an amide, an alkyne, an azide, or an aldehyde dithiolane). The functional group may couple specifically to an N-terminus or a C-terminus of a peptide. The functional group may couple specifically to an amino acid side chain. The functional group may couple to a side chain of an amino acid (e.g., the acid of a glutamate or aspartate, the thiol of a cysteine, the amine of a lysine, or the amide of a glutamine or asparagine). The functional group may couple specifically to a reactive group on a particular species, such as a label. In some examples of functionalized beads, the functional motif may be reversibly coupled and cleaved. A functional motif may also irreversibly couple to a molecule.

[0074] As used herein, the term “array” generally refers to a population of sites. Such populations of sites may be differentiated from one another according to relative location. Different molecules that are at different sites of an array may be differentiated from each other according to the locations of the sites in the array. An individual site of an array may include one or more molecules of a particular type. For example, a site may include a single polypeptide having a particular sequence or a site may include several polypeptides having the same sequence. The sites of an array may be different features located on the same substrate. Such features may include, without limitation, wells in a substrate, beads (or other particles) in or on a substrate, projections from a substrate, ridges on a substrate or channels in a substrate. The sites of an array may be separate substrates each bearing at least one molecule. Different molecules attached to separate substrates may be identified according to the locations of the substrates on a surface to which the substrates are associated or according to the locations of the substrates in a liquid or gel. Such different molecules may have the same or different sequences. An array may include one or more wells, and a well of the one or more wells may have one or more beads. As an alternative, the array may be a planar surface having, for example, a molecule immobilized thereon, or, as another example, one or more beads immobilized thereon.

[0075] As used herein, the term “label” generally refers to a molecular or macromolecular construct that may couple to a reactive group, such as an amino acid side chain, C-terminal carboxylate, or N-terminal amine. The label may comprise at least one reactive group (e.g., a first reactive group and a second reactive group). The at least one reactive group may be configured to couple to a polypeptide. The at least one reactive group may be configured to couple to a support. The at least one reactive group may be coupled to or configured to couple to a detectable moiety. A label may provide a measurable signal.

[0076] As used herein, the term “polymer matrix” generally refers to a continuous phase material that comprises at least one polymer. In some cases, the polymer matrix refers to the at least one polymer as well as the interstitial space not occupied by the polymer. A polymer matrix may be composed of one or more types of polymers. A polymer matrix may include linear, branched, and crosslinked polymer units. A polymer matrix may also contain non-polymeric species intercalated within its interstitial spaces not occupied by polymer chains. The intercalated species may be solid, liquid or gaseous species. For example, the term ‘polymer matrix’ may encompass desiccated hydrogels, hydrated hydrogels, and hydrogels containing glass fibers.

[0077] The term “reporter moiety,” as used herein, generally refers to an agent that generates a measurable signal. Such a signal may include, but is not limited to, fluorescence (e.g., a dye), visible light, motion (e.g., a mass tag), radiation, or a nucleic acid sequence (e.g., a barcode). Such a signal may include, but is not limited to, fluorescence, phosphorescence, or radiation. Such signal may be light (or electromagnetic radiation). The light may include a frequency or frequency distribution in the visible portion of the electromagnetic spectrum. For example, the light may be infrared or ultraviolet light. The signal may be an electrostatic, a conductive, or an impedance signal. The signal may be a charge. A “reporter” may comprise a “reporter moiety”. The reporter may comprise a reactive group. The reactive group may be configured to couple to a label.

[0078] Peptide sequence information may be obtained from a polypeptide molecule or from one or more portions of the polypeptide molecule. Peptide sequencing may provide complete or partial amino acid sequence information for a peptide sequence or a portion of a peptide sequence. At least a portion of the peptide sequence may be determined at the single molecule level. In some cases, partial amino acid sequence information, including for example, the relative positions of a specific type of amino acid (e.g., lysine) within a peptide or portion of a peptide, may be sufficient to uniquely identify an individual peptide molecule. For example, a pattern of amino acids, such as, for example, X-X-X-Lys-X-X-X-X-Lys-X-Lys, which indicates the distribution of lysine molecules within an individual peptide molecule, may be searched against a documented proteome of a given organism to identify the individual peptide molecule. Such information may be used to identify a macromolecule (e.g., protein) from which the peptide was derived, and may preclude identifying all amino acids of the peptide.
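By way of non-limiting illustration only, searching a partial pattern such as X-X-X-Lys-X-X-X-X-Lys-X-Lys against a set of reference sequences can be sketched as follows. This is a minimal sketch under stated assumptions and is not the search procedure of the disclosure: the function names `residue_pattern` and `find_matches`, and the two-entry `TOY_PROTEOME` with its invented sequences, are hypothetical. Each reference sequence is reduced to the same alphabet as the observation (labeled residue kept, every other position written as X) and then scanned window by window.

```python
def residue_pattern(sequence, labeled="K"):
    """Reduce a sequence to a partial pattern: labeled residues kept, others 'X'."""
    return "".join(aa if aa in labeled else "X" for aa in sequence)


def find_matches(pattern, proteome, labeled="K"):
    """Return (protein_id, start) for every window whose reduced form equals pattern."""
    hits = []
    width = len(pattern)
    for protein_id, sequence in proteome.items():
        reduced = residue_pattern(sequence, labeled)
        for start in range(len(reduced) - width + 1):
            if reduced[start:start + width] == pattern:
                hits.append((protein_id, start))
    return hits


# Hypothetical toy reference set; these sequences are invented for illustration.
TOY_PROTEOME = {
    "protein_A": "MGAAKLLVSKTKAG",
    "protein_B": "MKKAGAKLLVSAAG",
}

# X-X-X-Lys-X-X-X-X-Lys-X-Lys written with 1-letter codes.
observed = "XXXKXXXXKXK"
print(find_matches(observed, TOY_PROTEOME))
```

In this toy set only one window matches the observed pattern, which mirrors the point made above: the positions of a single residue type can be enough to single out the parent sequence without identifying every amino acid. Passing a longer `labeled` string (for example `"KDE"`) would keep several residue types distinct in the pattern.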

[0079] Peptide sequencing may be used to acquire information (including, for example, amino acid sequence information) from individual peptide molecules in a mixture of diverse peptide molecules. In a non-limiting example, a plurality of peptides may be immobilized on a solid surface (including, for example, a glass slide, or a glass slide whose surface has been chemically modified, a plastic slide, a multi-well plate, a cassette), amino acids from the plurality of peptides may be coupled to fluorescent reporter moieties, and the fluorescent reporter moieties may be optically detected.

[0080] Numerous commercially available optical devices may be applied in this manner. For example, microscopes equipped with total internal reflection illumination and intensified charge-coupled device (CCD) detectors may be adapted for sequencing methods disclosed herein. A high sensitivity CCD camera may be configured to simultaneously record the fluorescence intensity of multiple individual (e.g., single) peptide molecules distributed across a surface, and may be coupled to an image splitter to facilitate the simultaneous collection of multiple, distinct images (e.g., a first image comprising light of a first wavelength and a second image comprising light of a second wavelength). Using a motorized microscope stage with automated focus control to image multiple stage positions in the flow cell may allow thousands or more (e.g., millions) of individual single peptides to be sequenced in a single experiment.

[0081] In an aspect, the present disclosure provides solutions to the aforementioned challenges by providing expeditious and facile methods for analyzing a polypeptide. Additionally, some aspects of the present disclosure provide compositions that facilitate effective peptide characterization and analysis. Furthermore, in some aspects the present disclosure provides kits which enable effective polypeptide analysis.

[0082] The term “linker,” as used herein, generally refers to a moiety that couples at least two molecules. In some embodiments, a linker couples at least two molecules directly or indirectly.

[0083] When used in the context of a chemical group: “hydrogen” means -H; “hydroxy” means -OH; “oxo” means =O; “carbonyl” means -C(=O)-; “carboxy” means -C(=O)OH (also written as -COOH or -CO2H); “halo” means independently -F, -Cl, -Br or -I; “amino” means -NH2; “hydroxyamino” means -NHOH; “nitro” means -NO2; “imino” means =NH; “cyano” means -CN; “isocyanate” means -N=C=O; “azido” means -N3; in a monovalent context “phosphate” means -OP(O)(OH)2 or a deprotonated form thereof; in a divalent context “phosphate” means -OP(O)(OH)O- or a deprotonated form thereof; “mercapto” means -SH; and “thio” means =S; “sulfonyl” means -S(O)2-; and “sulfinyl” means -S(O)-.

[0084] In the context of chemical formulas, the symbol “-” means a single bond, “=” means a double bond, and “≡” means a triple bond. The symbol “----” represents an optional bond, which if present is either single or double. The symbol [solid line paired with a dashed line] represents a single bond or a double bond. Thus, the formula [structure not reproduced] covers, for example, [structures not reproduced]. And it is understood that no one such ring atom forms part of more than one double bond. Furthermore, it is noted that the covalent bond symbol “-”, when connecting one or two stereogenic atoms, does not indicate any preferred stereochemistry. Instead, it covers all stereoisomers as well as mixtures thereof. The symbol [wavy line], when drawn perpendicularly across a bond (e.g., [wavy line]-CH3 for methyl), indicates a point of attachment of the group. It is noted that the point of attachment is often only identified in this manner for larger groups in order to assist the reader in unambiguously identifying a point of attachment. The symbol [solid wedge] means a single bond where the group attached to the thick end of the wedge is “out of the page.” The symbol [hashed wedge] means a single bond where the group attached to the thick end of the wedge is “into the page”. The symbol [wavy bond] means a single bond where the geometry around a double bond (e.g., either E or Z) is undefined. Both options, as well as combinations thereof, are therefore intended. Any undefined valency on an atom of a structure shown in this application implicitly represents a hydrogen atom bonded to that atom. A bold dot on a carbon atom indicates that the hydrogen attached to that carbon is oriented out of the plane of the paper.

[0085] When a variable is depicted as a “floating group” on a ring system, for example, the group “R” in the formula [structure not reproduced], then the variable may replace any hydrogen atom attached to any of the ring atoms, including a depicted, implied, or expressly defined hydrogen, so long as a stable structure is formed. When a variable is depicted as a “floating group” on a fused ring system, as for example the group “R” in the formula [structure not reproduced], then the variable may replace any hydrogen attached to any of the ring atoms of either of the fused rings unless specified otherwise. Replaceable hydrogens include depicted hydrogens (e.g., the hydrogen attached to the nitrogen in the formula above), implied hydrogens (e.g., a hydrogen of the formula above that is not shown but understood to be present), expressly defined hydrogens, and optional hydrogens whose presence depends on the identity of a ring atom (e.g., a hydrogen attached to group X, when X equals -CH-), so long as a stable structure is formed. In the example depicted, R may reside on either the 5-membered or the 6-membered ring of the fused ring system. In the formula above, the subscript letter “y” immediately following the R enclosed in parentheses represents a numeric variable. Unless specified otherwise, this variable can be 0, 1, 2, or any integer greater than 2, only limited by the maximum number of replaceable hydrogen atoms of the ring or ring system.

[0086] For the chemical groups and compound classes, the number of carbon atoms in the group or class is indicated as follows: “Cn” defines the exact number (n) of carbon atoms in the group/class. “C≤n” defines the maximum number (n) of carbon atoms that can be in the group/class, with the minimum number as small as possible for the group/class in question. For example, it is understood that the minimum number of carbon atoms in the groups “alkyl(C≤8)”, “cycloalkanediyl(C≤8)”, “heteroaryl(C≤8)”, and “acyl(C≤8)” is one, the minimum number of carbon atoms in the groups “alkenyl(C≤8)”, “alkynyl(C≤8)”, and “heterocycloalkyl(C≤8)” is two, the minimum number of carbon atoms in the group “cycloalkyl(C≤8)” is three, and the minimum number of carbon atoms in the groups “aryl(C≤8)” and “arenediyl(C≤8)” is six. “Cn-n'” defines both the minimum (n) and maximum number (n') of carbon atoms in the group. Thus, “alkyl(C2-10)” designates those alkyl groups having from 2 to 10 carbon atoms. These carbon number indicators may precede or follow the chemical groups or class they modify and may or may not be enclosed in parentheses, without signifying any change in meaning. Thus, the terms “C5 olefin”, “C5-olefin”, “olefin(C5)”, and “olefinC5” are all synonymous. When any of the chemical groups or compound classes defined herein is modified by the term “substituted”, any carbon atom in the moiety replacing the hydrogen atom is not counted. Thus methoxyhexyl, which has a total of seven carbon atoms, is an example of a substituted alkyl(C1-6). Unless specified otherwise, any chemical group or compound class listed in a claim set without a carbon atom limit has a carbon atom limit of less than or equal to twelve.
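By way of non-limiting illustration only, the carbon-count convention of the preceding paragraph can be expressed as a small parser that returns the minimum and maximum carbon numbers for a term. This is a sketch under stated assumptions: the function name `carbon_range` is hypothetical, only the parenthesized forms written here as “group(Cn)”, “group(C≤n)”, and “group(Cn-n')” are handled, and the table of minimum carbon counts contains only the groups expressly listed in that paragraph.

```python
import re

# Minimum carbon counts stated in the text for the "C≤n" form.
MIN_CARBONS = {
    "alkyl": 1, "cycloalkanediyl": 1, "heteroaryl": 1, "acyl": 1,
    "alkenyl": 2, "alkynyl": 2, "heterocycloalkyl": 2,
    "cycloalkyl": 3,
    "aryl": 6, "arenediyl": 6,
}

# Matches group(Cn), group(C≤n), or group(Cn-n').
_TERM = re.compile(
    r"^(?P<group>[a-z]+)\(C(?:(?P<exact>\d+)|≤(?P<max>\d+)|(?P<lo>\d+)-(?P<hi>\d+))\)$"
)


def carbon_range(term):
    """Return (min_carbons, max_carbons) for a term such as 'alkyl(C≤8)'."""
    match = _TERM.match(term)
    if match is None:
        raise ValueError(f"unrecognized carbon-count term: {term!r}")
    group = match.group("group")
    if match.group("exact") is not None:
        n = int(match.group("exact"))
        return (n, n)  # "Cn": exact number of carbon atoms
    if match.group("max") is not None:
        if group not in MIN_CARBONS:
            raise ValueError(f"no minimum carbon count listed for {group!r}")
        return (MIN_CARBONS[group], int(match.group("max")))  # "C≤n"
    return (int(match.group("lo")), int(match.group("hi")))  # "Cn-n'"


# Hypothetical usage with terms taken from the paragraph above.
for term in ("alkyl(C≤8)", "aryl(C≤8)", "alkyl(C2-10)", "olefin(C5)"):
    print(term, carbon_range(term))
```

The sketch does not model the rule that carbon atoms of a substituent are not counted, nor the default limit of twelve carbon atoms for groups recited without an explicit limit.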

[0087] The term “aliphatic” when used without the “substituted” modifier generally signifies that the compound or chemical group so modified is an acyclic or cyclic, but non-aromatic hydrocarbon compound or group. In aliphatic compounds/groups, the carbon atoms can be joined together in straight chains, branched chains, or non-aromatic rings (alicyclic). Aliphatic compounds/groups can be saturated, that is joined by single carbon-carbon bonds (alkanes/alkyl), or unsaturated, with one or more carbon-carbon double bonds (alkenes/alkenyl) or with one or more carbon-carbon triple bonds (alkynes/alkynyl).

[0088] The term “aromatic” generally signifies that the compound or chemical group so modified has a planar unsaturated ring of atoms with 4n + 2 electrons in a fully conjugated cyclic π system. An aromatic compound or chemical group may be depicted as a single resonance structure; however, depiction of one resonance structure is taken to also refer to any other resonance structure. For example:

Aromatic compounds may also be depicted using a circle to represent the delocalized nature of the electrons in the fully conjugated cyclic π system, two non-limiting examples of which are shown below:

[0089] The term “alkyl” when used without the “substituted” modifier generally refers to a monovalent saturated aliphatic group with a carbon atom as the point of attachment, a linear or branched acyclic structure, and no atoms other than carbon and hydrogen. The groups -CH3 (Me), -CH2CH3 (Et), -CH2CH2CH3 (n-Pr or propyl), -CH(CH3)2 (i-Pr, iPr or isopropyl), -CH2CH2CH2CH3 (n-Bu), -CH(CH3)CH2CH3 (sec-butyl), -CH2CH(CH3)2 (isobutyl), -C(CH3)3 (tert-butyl, t-butyl, t-Bu or tBu), and -CH2C(CH3)3 (neo-pentyl) are non-limiting examples of alkyl groups. The term “alkanediyl” when used without the “substituted” modifier refers to a divalent saturated aliphatic group, with one or two saturated carbon atom(s) as the point(s) of attachment, a linear or branched acyclic structure, no carbon-carbon double or triple bonds, and no atoms other than carbon and hydrogen. The groups -CH2- (methylene), -CH2CH2-, -CH2C(CH3)2CH2-, and -CH2CH2CH2- are non-limiting examples of alkanediyl groups. The term “alkylidene” when used without the “substituted” modifier refers to the divalent group =CRR' in which R and R' are independently hydrogen or alkyl. Non-limiting examples of alkylidene groups include: =CH2, =CH(CH2CH3), and =C(CH3)2. An “alkane” refers to the class of compounds having the formula H-R, wherein R is alkyl as this term is defined above. When any of these terms is used with the “substituted” modifier, one or more hydrogen atom has been independently replaced by -OH, -F, -Cl, -Br, -I, -NH2, -NO2, -CO2H, -CO2CH3, -CN, -SH, -OCH3, -OCH2CH3, -C(O)CH3, -NHCH3, -NHCH2CH3, -N(CH3)2, -C(O)NH2, -C(O)NHCH3, -C(O)N(CH3)2, -OC(O)CH3, -NHC(O)CH3, -S(O)2OH, or -S(O)2NH2. The following groups are non-limiting examples of substituted alkyl groups: -CH2OH, -CH2Cl, -CF3, -CH2CN, -CH2C(O)OH, -CH2C(O)OCH3, -CH2C(O)NH2, -CH2C(O)CH3, -CH2OCH3, -CH2OC(O)CH3, -CH2NH2, -CH2N(CH3)2, and -CH2CH2Cl.
The term “haloalkyl” is a subset of substituted alkyl, in which the hydrogen atom replacement is limited to halo (e.g., -F, -Cl, -Br, or -I) such that no other atoms aside from carbon, hydrogen and halogen are present. The group -CH2Cl is a non-limiting example of a haloalkyl. The term “fluoroalkyl” is a subset of substituted alkyl, in which the hydrogen atom replacement is limited to fluoro such that no other atoms aside from carbon, hydrogen and fluorine are present. The groups -CH2F, -CF3, and -CH2CF3 are non-limiting examples of fluoroalkyl groups.

[0090] The term “aryl” generally refers to a monovalent unsaturated aromatic group with an aromatic carbon atom as the point of attachment, said carbon atom forming part of one or more aromatic ring structures, each with six ring atoms that are all carbon, and wherein the group consists of no atoms other than carbon and hydrogen. If more than one ring is present, the rings may be fused or unfused. Unfused rings are connected with a covalent bond. As used herein, the term aryl does not preclude the presence of one or more alkyl groups (carbon number limitation permitting) attached to the first aromatic ring or any additional aromatic ring present. Non-limiting examples of aryl groups include phenyl (Ph), methylphenyl, (dimethyl)phenyl, -C6H4CH2CH3 (ethylphenyl), naphthyl, and a monovalent group derived from biphenyl (e.g., 4-phenylphenyl). The term “arenediyl” refers to a divalent aromatic group with two aromatic carbon atoms as points of attachment, said carbon atoms forming part of one or more six-membered aromatic ring structures, each with six ring atoms that are all carbon, and wherein the divalent group consists of no atoms other than carbon and hydrogen. As used herein, the term arenediyl does not preclude the presence of one or more alkyl groups (carbon number limitation permitting) attached to the first aromatic ring or any additional aromatic ring present. If more than one ring is present, the rings may be fused or unfused. Unfused rings are connected with a covalent bond. Non-limiting examples of arenediyl groups include:

An “arene” generally refers to the class of compounds having the formula H-R, wherein R is aryl as that term is defined above. Benzene and toluene are non-limiting examples of arenes. When any of these terms is used with the “substituted” modifier, one or more hydrogen atom has been independently replaced by -OH, -F, -Cl, -Br, -I, -NH2, -NO2, -CO2H, -CO2CH3, -CN, -SH, -OCH3, -OCH2CH3, -C(O)CH3, -NHCH3, -NHCH2CH3, -N(CH3)2, -C(O)NH2, -C(O)NHCH3, -C(O)N(CH3)2, -OC(O)CH3, -NHC(O)CH3, -S(O)2OH, or -S(O)2NH2.

[0091] The term “heteroaryl” generally refers to a monovalent aromatic group with an aromatic carbon atom or nitrogen atom as the point of attachment, said carbon atom or nitrogen atom forming part of one or more aromatic ring structures, each with three to eight ring atoms, wherein at least one of the ring atoms of the aromatic ring structure(s) is nitrogen, oxygen or sulfur, and wherein the heteroaryl group consists of no atoms other than carbon, hydrogen, aromatic nitrogen, aromatic oxygen and aromatic sulfur. If more than one ring is present, the rings are fused; however, the term heteroaryl does not preclude the presence of one or more alkyl or aryl groups (carbon number limitation permitting) attached to one or more ring atoms. Non-limiting examples of heteroaryl groups include benzoxazolyl, benzimidazolyl, furanyl, imidazolyl (Im), indolyl, indazolyl (Im), isoxazolyl, methylpyridinyl, oxazolyl, phenylpyridinyl, pyridinyl (pyridyl), pyrrolyl, pyrimidinyl, pyrazinyl, quinolyl, quinazolyl, quinoxalinyl, triazinyl, tetrazolyl, thiazolyl, thienyl, and triazolyl. The term “N-heteroaryl” refers to a heteroaryl group with a nitrogen atom as the point of attachment. A “heteroarene” refers to the class of compounds having the formula H-R, wherein R is heteroaryl. Pyridine and quinoline are non-limiting examples of heteroarenes. The term “heteroarenediyl” refers to a divalent aromatic group, with two aromatic carbon atoms, two aromatic nitrogen atoms, or one aromatic carbon atom and one aromatic nitrogen atom as the two points of attachment, said atoms forming part of one or more aromatic ring structures, each with three to eight ring atoms, wherein at least one of the ring atoms of the aromatic ring structure(s) is nitrogen, oxygen or sulfur, and wherein the divalent group consists of no atoms other than carbon, hydrogen, aromatic nitrogen, aromatic oxygen and aromatic sulfur.
If more than one ring is present, the rings are fused; however, the term heteroarenediyl does not preclude the presence of one or more alkyl or aryl groups (carbon number limitation permitting) attached to one or more ring atoms. Non-limiting examples of heteroarenediyl groups include:

[0092] When any of these terms is used with the “substituted” modifier, one or more hydrogen atom has been independently replaced by -OH, -F, -Cl, -Br, -I, -NH2, -NO2, -CO2H, -CO2CH3, -CN, -SH, -OCH3, -OCH2CH3, -C(O)CH3, -NHCH3, -NHCH2CH3, -N(CH3)2, -C(O)NH2, -C(O)NHCH3, -C(O)N(CH3)2, -OC(O)CH3, -NHC(O)CH3, -S(O)2OH, or -S(O)2NH2.

[0093] The term “alkoxy” when used without the “substituted” modifier generally refers to the group -OR, in which R is an alkyl, as that term is defined above. Non-limiting examples include: -OCH3 (methoxy), -OCH2CH3 (ethoxy), -OCH2CH2CH3, -OCH(CH3)2 (isopropoxy), or -OC(CH3)3 (tert-butoxy). The terms “alkylthio” and “acylthio” when used without the “substituted” modifier refer to the group -SR, in which R is an alkyl and acyl, respectively. The term “alcohol” corresponds to an alkane, as defined above, wherein at least one of the hydrogen atoms has been replaced with a hydroxy group. The term “ether” corresponds to an alkane, as defined above, wherein at least one of the hydrogen atoms has been replaced with an alkoxy group. When any of these terms is used with the “substituted” modifier, one or more hydrogen atom has been independently replaced by -OH, -F, -Cl, -Br, -I, -NH2, -NO2, -CO2H, -CO2CH3, -CN, -SH, -OCH3, -OCH2CH3, -C(O)CH3, -NHCH3, -NHCH2CH3, -N(CH3)2, -C(O)NH2, -C(O)NHCH3, -C(O)N(CH3)2, -OC(O)CH3, -NHC(O)CH3, -S(O)2OH, or -S(O)2NH2.

[0094] The term “alkylamino” when used without the “substituted” modifier generally refers to the group -NHR, in which R is an alkyl, as that term is defined above. Non-limiting examples include: -NHCH3 and -NHCH2CH3. The term “dialkylamino” when used without the “substituted” modifier refers to the group -NRR', in which R and R' can be the same or different alkyl groups. Non-limiting examples of dialkylamino groups include: -N(CH3)2 and -N(CH3)(CH2CH3). When any of these terms is used with the “substituted” modifier, one or more hydrogen atom attached to a carbon atom has been independently replaced by -OH, -F, -Cl, -Br, -I, -NH2, -NO2, -CO2H, -CO2CH3, -CN, -SH, -OCH3, -OCH2CH3, -C(O)CH3, -NHCH3, -NHCH2CH3, -N(CH3)2, -C(O)NH2, -C(O)NHCH3, -C(O)N(CH3)2, -OC(O)CH3, -NHC(O)CH3, -S(O)2OH, or -S(O)2NH2. The groups -NHC(O)OCH3 and -NHC(O)NHCH3 are non-limiting examples of substituted amido groups.

[0095] The term "monoketone" generally refers to a compound incorporating one carbonyl group with two carbon atoms attached to the carbon of the carbonyl group, including saturated compounds and unsaturated compounds including double and/or triple bonds. Alkyl chains of a monoketone can be linear or branched. The term "monoketone" can also encompass compounds with a cyclic carbon backbone.

[0096] As used herein, the term “beta-diketone” generally refers to a compound where two ketones occur at beta carbon positions to each other (e.g., a compound having the core formula -C(O)-CH2-C(O)-, or a 1,3-diketone).

Post-translational modification mapping

[0097] Proteomics is the large-scale study of proteins present in an organism, system, or biological consortium. Proteins are essential to organisms, facilitating the majority of chemical and physical processes carried out by life. Accordingly, the set of proteins expressed within a cell, organism, or system is often strongly reflective of health, biological state, biological activity, and physical conditions (e.g., heat stress, nutrient depletion, or stimulation). Peptide sequencing is therefore a tool that may be used in a variety of applications within the field of proteomics.

[0098] Mapping the sites of post-translational modifications on peptides and proteins at high sensitivity is not currently available. Detection via antibodies fails to provide unbiased detection of post-translational modifications. Further, the stoichiometry of post-translational modifications may span 3 or more orders of magnitude, which makes quantification by mass spectrometry infeasible. Disclosed herein are methods of detecting and mapping post-translational modification of proteins using single molecule fluorosequencing.

[0099] The present disclosure also provides methods and systems for peptide (e.g., protein) analysis (e.g., compositional analysis and sequencing). Methods of the present disclosure may permit a peptide (e.g., protein) to be analyzed (e.g., sequenced) in a manner that provides various non-limiting benefits, such as, for example, sequencing a protein or peptide comprising a chemically modified N-terminal amino acid (e.g., modified with a fluorophore). Peptide sequencing may be used to reveal biomarkers (locations of post-translational modifications) for the diagnosis of cancer and other diseases or in understanding the function of healthy cells. Peptides produced by cells or tissues may act as unique biomarkers. Enhanced detection of these biomarkers through peptide sequencing may provide earlier, more accurate diagnoses of disease.

[0100] There are no existing tools for covalent and selective labeling of ubiquitin for proteomic applications. Existing detection methods, such as tandem mass spectrometry, use detection of the Gly-Gly peptide on tryptic peptides for identification. However, the sensitivity of this method is low, about 6 orders of magnitude lower than that of fluorosequencing technology. A second existing method for detecting ubiquitinated proteins uses affinity reagents. Both methods are semi-quantitative at best, and determining the stoichiometry of modification requires two different calibrants, which reduces accuracy.

[0101] Disclosed herein are methods and compositions for site-specifically labeling amino acid residues with post-translational modification. In some cases, the methods and compositions of the disclosure may label terminal glycine residues on ubiquitin peptides, which remain after trypsinization of a parent protein. In some cases, the methods and compositions of the disclosure may label N-terminal glycine residues on ubiquitin peptides, which remain after trypsinization of a parent protein. In some cases, the methods and compositions of the disclosure may label C-terminal glycine residues on ubiquitin peptides, which remain after trypsinization of a parent protein. In some cases, the methods and compositions of the disclosure may label internal glycine residues on ubiquitin peptides, which remain after trypsinization of a parent protein.
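To illustrate why glycine residues remain after trypsinization, the following sketch performs a simplified in-silico tryptic digest of the C-terminal tail of ubiquitin. It is an illustration under stated assumptions, not the disclosed method: the cleavage rule (cut after lysine or arginine unless the next residue is proline) is a common simplification, the function name `trypsin_digest` is hypothetical, and the tail sequence is the commonly reported C-terminal region of human ubiquitin rather than a sequence taken from this disclosure.

```python
def trypsin_digest(sequence):
    """Cleave after K or R, except when the next residue is P (simplified rule)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep whatever follows the last cleavage site.
        fragments.append(sequence[start:])
    return fragments


# Assumed C-terminal tail of human ubiquitin (residues 64-76); not from the disclosure.
UBIQUITIN_C_TAIL = "ESTLHLVLRLRGG"

fragments = trypsin_digest(UBIQUITIN_C_TAIL)
remnant = fragments[-1]  # the piece that would stay attached to the substrate lysine
print(fragments, remnant)
```

Under these assumptions the last fragment is the dipeptide Gly-Gly, which matches the Gly-Gly signature mentioned in paragraph [0100] and exposes a glycine with a free amine that a glycine-selective label could target.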

[0102] A method of the present disclosure may be configured to analyze peptides spanning at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, or at least about 10 orders of magnitude in concentration in a sample. For example, a method of the present disclosure may permit simultaneous measurements of ubiquitinated proteins from human serum, peptides that are difficult to simultaneously detect due to their 7+ order of magnitude concentration differences. A method of the present disclosure may be configured to identify at least about 100, at least about 500, at least about 1000, at least about 5000, at least about 10⁴, at least about 5×10⁴, at least about 10⁵, or at least about 5×10⁵ different proteins from a sample. A method of the present disclosure may be configured to identify at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, at least about 1200, at least about 1500, at least about 1800, at least about 2000, at least about 2500, at least about 3000, at least about 3500, at least about 4000, or at least about 5000 types of proteins from a sample (e.g., human lung homogenate).
A method of the present disclosure may be configured to simultaneously (e.g., within a single assay) identify at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, at least about 1200, at least about 1500, at least about 1800, at least about 2000, at least about 2500, at least about 3000, at least about 3500, at least about 4000, or at least about 5000 types of proteins from a sample (e.g., buffy coat lysate). A method of the present disclosure may be configured to identify at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, or at least about 95% of the types of peptides in a biological sample (e.g., a human biological sample).
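The concentration span referred to in paragraph [0102] can be expressed as the base-10 logarithm of the ratio between the highest and lowest concentrations in a sample. The short sketch below shows that arithmetic; the function name and the example concentrations are hypothetical and are not measurements from this disclosure.

```python
import math


def orders_of_magnitude(concentrations):
    """Return log10(max/min) for positive concentrations given in the same units."""
    values = list(concentrations)
    if not values or min(values) <= 0:
        raise ValueError("need at least one concentration, all positive")
    return math.log10(max(values) / min(values))


# Hypothetical concentrations in mol/L, for illustration only.
example = {"abundant_protein": 5e-4, "rare_peptide": 2e-11}
span = orders_of_magnitude(example.values())
print(f"{span:.1f} orders of magnitude")
```

With these illustrative values the two species differ by a little more than 7 orders of magnitude, the regime that the paragraph above describes as difficult to measure simultaneously.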

Biopolymers and post-translational modifications

[0103] Ubiquitination is the biochemical process in which proteins are marked by ubiquitin, a 76 amino acid protein. Ubiquitination occurs via formation of an isopeptide bond between an internal lysine of a substrate and the C-terminal glycine (glycine 76) of ubiquitin. In some cases, ubiquitination occurs between the C-terminus (-RGGV-OH) of ubiquitin and an ε-amine of a lysine side chain of the parent protein. Ubiquitination occurs intracellularly in eukaryotes and regulates a wide variety of biological processes. The ubiquitination of a protein most commonly results in degradation of the protein via the ubiquitin-proteasome pathway but may also alter protein-protein interactions. The location of ubiquitination on a protein determines the biological response resulting from ubiquitination. In some cases, the methods of the disclosure may detect and map the presence of a single ubiquitin molecule (e.g., mono-ubiquitination). In some cases, the methods of the disclosure may detect and map the presence of a plurality of ubiquitin molecules (e.g., poly-ubiquitination). In some cases, the methods of the disclosure may detect and map the presence of at least about 1, at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, or at least about 10 ubiquitin molecules. In some cases, the methods of the disclosure may detect and map the presence of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 ubiquitin molecules. In some cases, the methods of the disclosure may detect and map the presence of at most about 1, at most about 2, at most about 3, at most about 4, at most about 5, at most about 6, at most about 7, at most about 8, at most about 9, or at most about 10 ubiquitin molecules.

[0104] Small ubiquitin-like modifier (SUMO) proteins are a family of small proteins that are covalently attached to and detached from other proteins in cells to modify their function through a process called SUMOylation. SUMOylation is a post-translational modification involved in various cellular processes, such as nuclear-cytosolic transport, transcriptional regulation, apoptosis, protein stability, response to stress and progression through the cell cycle. SUMO proteins are similar to ubiquitin and are members of the ubiquitin-like family. Mature SUMO is produced when the terminal four amino acids of the C-terminus have been cleaved off to allow formation of an isopeptide bond between the C-terminal glycine residue of SUMO and an acceptor lysine on the target protein. In some cases, SUMOylation occurs as an isopeptide bond between the C-terminal glycine carboxylate of the SUMO-(1,2,3,5) protein and the ε-amine of a lysine of a substrate protein.

[0105] In some cases, the methods of the disclosure may detect post-translational modification of a full-length protein. In some cases, the methods of the disclosure may detect post-translational modification of a recombinant protein. In some cases, the methods of the disclosure may detect post-translational modification of a peptide.

[0106] The peptide may comprise a plurality of amino acids. The peptide may be an oligomer or polymer comprising amino acids or amino acid analogues. The peptide may comprise amino acids that are L-amino acids or D-amino acids. A peptide may be synthetic, recombinant, or naturally occurring. A synthetic peptide may be a peptide that is produced by artificial approaches in vitro. At least one amino acid of the plurality of amino acids may be selected from the group consisting of lysine, cysteine, glutamic acid, aspartic acid, tyrosine, arginine, histidine, threonine, serine, glutamine, asparagine and tryptophan. The plurality of amino acids may comprise one or more amino acids, the one or more amino acids selected from the group consisting of lysine, cysteine, glutamic acid, aspartic acid, tyrosine, arginine, histidine, threonine, serine, glutamine, asparagine and tryptophan. The peptide may comprise one amino acid selected from the group consisting of lysine, cysteine, glutamic acid, aspartic acid, tyrosine, arginine, histidine, threonine, serine, glutamine, asparagine and tryptophan. The plurality of amino acids may comprise a non-natural amino acid. The plurality of amino acids may comprise a D-amino acid.

[0107] The plurality of amino acids may comprise at least two or more amino acid types. The at least two or more amino acid types may comprise a first amino acid type and a second amino acid type. The first amino acid type may be coupled to a first label. The second amino acid type may be coupled to a second label. The first amino acid type may be coupled to a first label and the second amino acid type may be coupled to a second label. The first label and the second label may each be coupled to a different reporter moiety. The plurality of amino acids may comprise at least about one, two, three, four, five, six, seven, eight, nine, ten, eleven, or more amino acid types. The plurality of amino acids may comprise between two and twenty amino acid types. The plurality of amino acids may comprise between 4 and 18 amino acid types. The plurality of amino acids may comprise between 6 and 16 amino acid types. The plurality of amino acids may comprise between 8 and 14 amino acid types. The plurality of amino acids may comprise between 9 and 11 amino acid types. Less than all of the amino acid types of the plurality of amino acids may be labelled. Each amino acid type of the at least two amino acid types may be coupled to a different label. The peptide may comprise at least four amino acid types, wherein each amino acid type of the at least four amino acid types is coupled to a different label. Less than all of the plurality of amino acids may be labelled. Each of the plurality of amino acids may be labelled.
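As a minimal sketch of the labeling scheme in paragraph [0107], the code below assigns a first label to one amino acid type and a second label to another, then reports where each label falls along the peptide. The peptide sequence, the label names, and the function name `label_positions` are hypothetical and chosen only for illustration; the disclosure does not prescribe this data structure.

```python
def label_positions(peptide, labels):
    """Map each label to the 1-based positions (from the N-terminus) it marks.

    `labels` maps a one-letter amino acid code to a label name.
    Residues whose type is not in `labels` are left unlabeled.
    """
    positions = {name: [] for name in labels.values()}
    for index, residue in enumerate(peptide, start=1):
        if residue in labels:
            positions[labels[residue]].append(index)
    return positions


# Hypothetical peptide and label assignment, for illustration only.
peptide = "GKAGCKYA"
labels = {"K": "label_1", "C": "label_2"}
result = label_positions(peptide, labels)
print(result)
```

In this example only two of the amino acid types present are labelled, which corresponds to the case above in which less than all of the amino acid types are labelled.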

[0108] The plurality of amino acids may comprise at least two amino acid types, and each amino acid type of the at least two amino acid types may be coupled to a different label. The peptide may comprise at least about three amino acid types, wherein each amino acid type of the at least about three amino acid types is coupled to a different label. The peptide may comprise at least about four amino acid types, wherein each amino acid type of the at least about four amino acid types is coupled to a different label. The peptide may comprise at least about five or six amino acid types, wherein each amino acid type of the at least about five or six amino acid types is coupled to a different label. The peptide may comprise at least about eight amino acid types, wherein each amino acid type of the at least about eight amino acid types is coupled to a different label. The peptide may comprise at least about ten amino acid types, wherein each amino acid type of the at least about ten amino acid types is coupled to a different label. Each label coupled to a different amino acid type may independently be coupled to a reporter moiety configured to emit a signal corresponding to each amino acid type. In some cases, the majority of the plurality of amino acids are labelled. In some cases, the majority of the plurality of amino acids are unlabeled.

[0109] The amino acid that may be coupled to a label may be an amino acid selected from the group consisting of lysine, cysteine, glutamic acid, aspartic acid, tyrosine, arginine, histidine, threonine, serine, proline, asparagine, glutamine, and tryptophan. The amino acid that is coupled to a label may comprise a post-translational modification. The post-translational modification may be glycosylation, acetylation, alkylation, biotinylation, glutamylation, isoprenylation, phosphorylation, lipoylation, phosphopantetheinylation, sulfation, selenation, amidation, ubiquitination, hydroxylation, nitration, nitrosylation, citrullination, cyclization (such as N-terminal glutamate or glutamine cyclization), or SUMOylation. In some cases, the post-translational modification is biotinylation. In some cases, the post-translational modification is ubiquitination. In some cases, the post-translational modification is SUMOylation.

[0110] A peptide, composed of two or more amino acids, may have an N-terminus and a C-terminus. These termini may be separated by one or more amino acids. The N-terminus is a terminal amino acid and may contain a terminal amine. The terminal amine may be unsubstituted, or may be substituted. In some instances, the amine may be cleaved, blocked, functionalized, or otherwise modified. Naturally-occurring peptides generally contain an unsubstituted amine at the N-terminal position. Any amino acid may become an N-terminus following a bond cleaving event. Similarly, the C-terminus is a terminal amino acid and may contain a terminal carboxylic acid. The terminal carboxylic acid may be unsubstituted or substituted. In some instances, the carboxylic acid may be cleaved, blocked, functionalized, or otherwise modified. Naturally-occurring peptides generally contain an unsubstituted carboxylic acid at the C-terminal position. Any amino acid may become a C-terminus following a bond cleavage event. In some examples, as provided herein, the C-terminus may be any amino acid. In other examples, the C-terminus is an acidic amino acid (e.g., glutamate or aspartate). The present disclosure provides for specific cleavage of a first peptide at a documented site in order to yield a second peptide with a specific (e.g., predetermined) C-terminal amino acid residue. The C-terminal amino acid, following cleavage, may be an acidic residue. The C-terminal amino acid, following cleavage, may be a non-acidic residue. Similarly, a first peptide may be intentionally cleaved to yield a second peptide with a specific (e.g., predetermined) N-terminal amino acid residue.

[0111] Peptides may have non-linear structures. In some cases, a peptide may be branched. In some cases, a peptide may be cyclic. In some cases, two or more peptides may be crosslinked. In some cases, two or more peptides may be covalently crosslinked. In some cases, two or more peptides may be non-covalently associated.

[0112] In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is from about 25 to about 4000 residues, from about 25 to about 100 residues, from about 100 to about 200 residues, from about 200 to about 300 residues, from about 300 to about 400 residues, from about 400 to about 500 residues, from about 500 to about 600 residues, from about 600 to about 700 residues, from about 700 to about 800 residues, from about 800 to about 900 residues, from about 900 to about 1000, from about 1000 to about 1250, from about 1250 to about 1500, from about 1500 to about 1750, from about 1750 to about 2000, from about 2000 to about 2250, from about 2250 to about 2500, from about 2500 to about 2750, from about 2750 to about 3000, from about 3000 to about 3250, from about 3250 to about 3500, from about 3500 to about 3750, or from about 3750 to about 4000 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is from about 25 to about 100 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is from about 100 to about 200 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is from about 200 to about 300 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is from about 3000 to about 3250 residues in length.

[0113] In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at least about 25, at least about 50, at least about 75, at least about 100, at least about 125, at least about 150, at least about 175, at least about 200, at least about 225, at least about 250, at least about 275, at least about 300, at least about 325, at least about 350, at least about 375, at least about 400, at least about 425, at least about 450, at least about 475, at least about 500, at least about 550, at least about 600, at least about 650, at least about 700, at least about 750, at least about 800, at least about 850, at least about 900, at least about 950, or at least about 1000 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at least about 1000, at least about 1250, at least about 1500, at least about 1750, at least about 2000, at least about 2250, at least about 2500, at least about 2750, at least about 3000, at least about 3250, at least about 3500, at least about 3750, or at least about 4000 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at least about 50 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at least about 100 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at least about 150 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at least about 200 residues in length.
In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at least about 3000 residues in length.

[0114] In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is about 25, about 50, about 75, about 100, about 125, about 150, about 175, about 200, about 225, about 250, about 275, about 300, about 325, about 350, about 375, about 400, about 425, about 450, about 475, about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 850, about 900, about 950, or about 1000 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is about 1000, about 1250, about 1500, about 1750, about 2000, about 2250, about 2500, about 2750, about 3000, about 3250, about 3500, about 3750, or about 4000 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is about 50 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is about 100 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is about 150 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is about 200 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is about 3000 residues in length.

[0115] In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at most about 25, at most about 50, at most about 75, at most about 100, at most about 125, at most about 150, at most about 175, at most about 200, at most about 225, at most about 250, at most about 275, at most about 300, at most about 325, at most about 350, at most about 375, at most about 400, at most about 425, at most about 450, at most about 475, at most about 500, at most about 550, at most about 600, at most about 650, at most about 700, at most about 750, at most about 800, at most about 850, at most about 900, at most about 950, or at most about 1000 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at most about 1000, at most about 1250, at most about 1500, at most about 1750, at most about 2000, at most about 2250, at most about 2500, at most about 2750, at most about 3000, at most about 3250, at most about 3500, at most about 3750, or at most about 4000 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at most about 50 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at most about 100 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at most about 150 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at most about 200 residues in length. In some cases, the methods of the disclosure may detect post-translational modification of a peptide that is at most about 3000 residues in length.

Sample Types

[0116] The methods described herein may comprise analyzing a biological sample. A biological sample may be derived from a subject (e.g., a patient or a participant in a study), from a tissue sample (e.g., an engineered tissue sample), from a cell culture (e.g., a human cell line or a bacterial colony), from a cell (e.g., a cell isolated during a single cell sorting assay), or a portion thereof (e.g., an organelle from a cell or an exosome from a blood sample). A biological sample may be synthetic, such as a composition comprising chemically synthesized peptides. A sample may comprise a single species or a mixture of species. A biological sample may comprise biomaterial from a single organism, from a colony of genetically near-identical organisms, or from multiple organisms (e.g., enterocytes and microbiota from a human digestive tract). A biological sample may be fractionated (e.g., plasma separated from whole blood), filtered, or depleted (e.g., high abundance proteins such as albumin and ceruloplasmin removed from plasma).

[0117] A sample may comprise all or a subset of the biomolecules from the subject, tissue sample, cell culture, cell, or portion thereof. For example, a sample from a subject may comprise the majority of proteins present in that subject, or may comprise a small subset of the proteins from that subject. A biological sample may comprise a bodily fluid such as cerebrospinal fluid, saliva, urine, tears, blood, plasma, serum, breast aspirate, prostate fluid, seminal fluid, stool, amniotic fluid, intraocular fluid, mucus, or any combination thereof. A biological sample may comprise a tissue culture, for example a tumor sample, or tissue from a kidney, liver, lung, pancreas, stomach, intestine, bladder, ovary, testis, skin, colorectal, breast, brain, esophagus, placenta, or prostate.

[0118] The biological sample may comprise a molecule whose presence or absence may be measured or identified. The biological sample may comprise a macromolecule, such as, for example, a polypeptide or a protein. The macromolecule may be isolated (e.g., separated from other components from which it was sourced) or purified, such that the macromolecule comprises at least about 0.5%, at least about 1%, at least about 2%, at least about 3%, at least about 4%, at least about 5%, at least about 7.5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 90%, at least about 95%, at least about 98%, or at least about 99% of a composition by weight (e.g., by dry weight or including solvent). The macromolecule may be isolated (e.g., separated from other components from which it was sourced) or purified, such that the macromolecule comprises about 0.5%, about 1%, about 2%, about 3%, about 4%, about 5%, about 7.5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, about 60%, about 70%, about 75%, about 80%, about 90%, about 95%, about 98%, or about 99% of a composition by weight (e.g., by dry weight or including solvent). The macromolecule may be isolated (e.g., separated from other components from which it was sourced) or purified, such that the macromolecule comprises at most about 0.5%, at most about 1%, at most about 2%, at most about 3%, at most about 4%, at most about 5%, at most about 7.5%, at most about 10%, at most about 15%, at most about 20%, at most about 25%, at most about 30%, at most about 40%, at most about 50%, at most about 60%, at most about 70%, at most about 75%, at most about 80%, at most about 90%, at most about 95%, at most about 98%, or at most about 99% of a composition by weight (e.g., by dry weight or including solvent).

[0119] The biological sample may be complex, and may comprise a plurality of components (e.g., different polypeptides, or a heterogeneous sample from the CSF of a proteopathy patient). The biological sample may comprise a component of a cell or tissue, a cell or tissue extract, or a fractionated lysate thereof. The biological sample may be substantially purified to contain molecules of a single type (e.g., peptides, nucleic acids, lipids, or small molecules). A biological sample may comprise a plurality of peptides configured for a method of the present disclosure (e.g., digestion, C-terminal labeling, or fluorosequencing).

[0120] Methods consistent with the present disclosure may comprise isolating, enriching, or purifying a biomolecule, biomacromolecular structure (e.g., an organelle or a ribosome), a cell, or tissue from a biological sample. A method may utilize a biological sample as a source for a biological species of interest. For example, an assay may derive a protein, such as alpha synuclein, a cell, such as a circulating tumor cell (CTC), or a nucleic acid, such as cell-free DNA, from a blood or plasma sample. A method may derive multiple, distinct biological species from a biological sample, such as two separate types of cells. In such cases, the distinct biological species may be separated for different analyses (e.g., CTC lysate and buffy coat proteins may be partitioned and separately analyzed) or pooled for common analysis. A biological species may be homogenized, fragmented, or lysed before analysis. In particular instances, a species or plurality of species from among the homogenate, fragmentation products, or lysate may be collected for analysis. For example, a method may comprise collecting circulating tumor cells during a liquid biopsy, optionally isolating individual circulating tumor cells, lysing the circulating tumor cells, isolating peptides from the resulting lysate, and analyzing the peptides by a fluorosequencing method of the present disclosure. A method may comprise capturing peptides from a sample using a C-terminal capture reagent, and analyzing the peptides (e.g., by a fluorosequencing method).

[0121] Methods consistent with the present disclosure may comprise nucleic acid analysis, such as sequencing, Southern blot, or epigenetic analysis. Nucleic acid analysis may be performed in parallel with a second analytical method, such as a fluorosequencing method of the present disclosure. The nucleic acid and the subject of the second analytical method may be derived from the same subject or the same sample. For example, a method may comprise collecting cell-free DNA and peptides from a human plasma sample, sequencing the cell-free DNA (e.g., to identify a cancer marker), and performing proteomic analysis on the plasma proteins.

[0122] In some cases, the methods of the disclosure may enrich for ubiquitinated proteins. In some cases, the methods of the disclosure may enrich for ubiquitinated proteins using an immunoprecipitation kit, for example, a ubiquitin enrichment kit.

Cleavage of biomolecular tags

[0123] In some cases, the methods of the disclosure may comprise use of a protein mixture that is prepared with enzyme digestion. The enzyme digestion used in methods of the disclosure may cleave a moiety at or after a specific amino acid residue. In some cases, the methods of the disclosure may comprise use of 1, 2, 3, 4, 5 or more enzymes to obtain a protein mixture. In some cases, the methods of the disclosure may comprise use of 1 to 3 enzymes to obtain a protein mixture. In some cases, the methods of the disclosure may comprise use of 3 to 5 enzymes to obtain a protein mixture. In some cases, the methods of the disclosure may use 1 enzyme to obtain a protein mixture. In some cases, a protein mixture may be prepared with digestion with a trypsin, wherein the trypsin cuts at the C-terminus of Lys and Arg. In some cases, a protein mixture may be prepared with an engineered N-terminal recognition protein. In some cases, a protein mixture may be prepared with a protease. In some cases, the protease is lysarginase (e.g., cuts at the N-terminus of Lys and Arg). In some cases, the protease is GluC. In some cases, the protease is AspN. In some cases, a protein mixture may be prepared with an enzyme that recognizes N-terminal arginine. In some cases, the methods of the disclosure may use 2 enzymes to obtain a protein mixture. In some cases, the methods of the disclosure may use 3 enzymes to obtain a protein mixture. In some cases, the methods of the disclosure may use GluC and trypsin.
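As an illustrative sketch only, and not part of the disclosed methods, the cleavage rules described above can be modeled in silico. The Python function below, whose name `digest` and whose arguments are hypothetical, splits a one-letter amino acid sequence either after each listed residue (as described for trypsin at Lys and Arg) or before each listed residue (as described for lysarginase). The sketch assumes complete digestion and ignores any effect of neighboring residues on enzyme activity.

```python
def digest(sequence, residues, side):
    """Split `sequence` at every residue found in `residues`.

    side="C": cut after the residue (C-terminal side, trypsin-like).
    side="N": cut before the residue (N-terminal side, lysarginase-like).
    Assumes complete digestion; sequence context is ignored.
    """
    if side not in ("C", "N"):
        raise ValueError("side must be 'C' or 'N'")
    fragments = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa not in residues:
            continue
        cut = i + 1 if side == "C" else i
        # Skip cuts that would produce an empty fragment at either end.
        if start < cut < len(sequence):
            fragments.append(sequence[start:cut])
            start = cut
    fragments.append(sequence[start:])
    return fragments


# Hypothetical example peptide, chosen only for illustration.
trypsin_like = digest("AAKGGRTTK", "KR", "C")
lysarginase_like = digest("AAKGGRTTK", "KR", "N")
```

With the same input, the trypsin-like rule leaves Lys or Arg at the C-terminus of each fragment, whereas the lysarginase-like rule leaves Lys or Arg at the N-terminus.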

[0124] In some cases, a protein mixture may be prepared with digestion with a trypsin. In some cases, digestion with trypsin of a ubiquitinated protein results in cleavage at the C-terminus of Lys and Arg. In some cases, cleavage of a ubiquitinated protein with trypsin leaves an NH2-GG peptide covalently attached to the Lys side chain of a peptide on the parent protein. In some cases, cleavage of a ubiquitinated protein with trypsin leaves an NH2-GGV-C(O)-Lys residue on the lysine of the parent protein.
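The origin of the diglycine remnant can be illustrated with a short sketch. It assumes, as background not stated in this passage, that the C-terminal tail of ubiquitin ends in the residues LRLRGG and that the final glycine is the residue joined to the substrate lysine. The function name `tryptic_remnant` is hypothetical. Cutting after the last Lys or Arg of the modifier tail leaves the residues that remain attached to the substrate.

```python
# Assumed C-terminal tail of ubiquitin (one-letter codes); the last Gly is
# taken to be the residue attached to the substrate Lys side chain.
UBIQUITIN_C_TERMINAL_TAIL = "LRLRGG"


def tryptic_remnant(modifier_tail):
    """Return the residues left on the substrate Lys after a trypsin-like
    cut on the C-terminal side of the last Lys/Arg in `modifier_tail`.

    If the tail contains no Lys or Arg, the whole tail is returned.
    """
    last_cut = max(modifier_tail.rfind("K"), modifier_tail.rfind("R"))
    return modifier_tail[last_cut + 1:]


remnant = tryptic_remnant(UBIQUITIN_C_TERMINAL_TAIL)
```

Under these assumptions the remnant is the two-residue GG stub, whose free N-terminal glycine is the kind of site targeted by the labeling chemistry discussed later in this disclosure.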

[0125] In some cases, digestion with trypsin of a SUMOylated protein results in cleavage at the C-terminus of Lys and Arg on both the substrate and the SUMO. In some cases, digestion with trypsin of a SUMOylated protein results in an 18 amino acid peptide (e.g., SUMO1). In some cases, digestion with trypsin of a SUMOylated protein generates a terminal glutamic acid residue. In some cases, digestion with trypsin of a SUMOylated protein results in a 32 amino acid peptide (e.g., SUMO2). In some cases, digestion with trypsin of a SUMOylated protein generates a terminal phenylalanine residue.

[0126] In some cases, digestion with LysArginase of a SUMOylated protein may result in a Lys or Arg on the SUMO peptide. In some cases, the resulting Lys or Arg of a SUMO peptide may be fluorescently labeled. In some cases, digestion with AspN protease may result in cleavage at the N-terminus of aspartic acid. In some cases, cleavage with AspN protease may result in a 5 amino acid peptide. In some cases, cleavage with AspN protease may generate a terminal aspartic acid residue.

[0127] Chemical techniques that allow for mild and sequential protein degradation conditions may be important for proteomics. Degradation may be used as a method to sequence polymers (e.g., proteins or peptides) to determine the order and identity of the amino acids of a polymer. A peptide or protein may be subsequently subjected to additional cleavage conditions until the sequence of at least a portion of the peptide or protein is identified. The entire sequence of a peptide or a protein may be determined using the methods and compositions described herein. Removal of each amino acid residue may be carried out through a variety of techniques including, for example, Edman degradation, organophosphate degradation, or proteolytic cleavage. In some aspects, Edman degradation may be used to remove a terminal amino acid residue. These terminal amino acid residues may be removed from either the C-terminus or the N-terminus of the peptide chain. In some instances, the amino acid residue at the N-terminus of the peptide chain may be removed. A chemical or enzymatic technique for removing a terminal amino acid may remove a defined number of (e.g., exactly one) amino acids. Accordingly, a method for analyzing a peptide may comprise successive degradation and analysis operations, such that the removal of a defined number of amino acids from an N-terminus or C-terminus per operation provides position and sequence specific amino acid identifications during analysis. A chemical or enzymatic technique for removing a terminal amino acid may cleave a peptide at a defined location (e.g., only in between two alanine residues).
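The way successive degradation operations yield position-specific identifications can be sketched as follows. This is a simplified model under stated assumptions, not an implementation of any disclosed instrument: exactly one N-terminal residue is removed per cycle, removal is perfectly efficient, and only residues of the labeled types are observable. The function name `degradation_readout` and the example peptide are hypothetical.

```python
def degradation_readout(peptide, labeled_types):
    """Simulate removing exactly one N-terminal residue per cycle.

    Returns a list of (cycle, residue) pairs for each labeled residue,
    where `cycle` is the 1-based degradation cycle in which that residue
    was removed, and therefore its position in the peptide.
    """
    readout = []
    remaining = peptide
    cycle = 0
    while remaining:
        cycle += 1
        removed, remaining = remaining[0], remaining[1:]
        if removed in labeled_types:
            readout.append((cycle, removed))
    return readout


# Hypothetical peptide with labels on Lys (K) and Cys (C) only.
observed = degradation_readout("GAKCGK", {"K", "C"})
```

In this model the readout is a sparse positional pattern of the labeled residues rather than a full sequence, which is why the choice of which amino acid types to label matters for identification.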

[0128] An Edman degradation method may comprise chemically functionalizing a peptide N-terminus or C-terminus (e.g., to form a thiourea or a guanidinium derivative of an N-terminal amine), and then contacting the functionalized terminal amino acid with a reagent (e.g., a hydrazine), a condition (e.g., a high or low pH or temperature), or an enzyme (e.g., an Edmanase with specificity for the functionalized terminal amino acid) to remove the functionalized terminal amino acid.

[0129] A diactivated phosphate or phosphonate may be used for peptide cleavage. Such a method may utilize an acid to remove a functionalized amino acid. The diactivated phosphate or phosphonate may be a dihalophosphate ester. In other embodiments, the techniques involve using an enzyme to remove the terminal amino acid residue, such as, for example, an exopeptidase or an Edmanase. For example, a method may comprise derivatizing an N-terminal amino acid of a peptide with a diactivated phosphate, and contacting the peptide with an Edmanase with cleavage activity toward phosphate-functionalized N-terminal amino acids.

[0130] A cleavage method (e.g., a cleavage method implemented within a sequencing method) may comprise enzymatic cleavage. The cleavage method may comprise the use of a single protease, a series of proteases (e.g., provided in a specific order), or a combination of proteases. Example proteases and their associated cleavage sites are provided in TABLE 1. A cleavage method may comprise decoupling a peptide barcode from a molecule. For example, a peptide barcode may comprise a cleavable linker comprising a cleavage site recognized by a protease listed in TABLE 1. In such cases, the sequence of the cleavage site may be present in the cleavable linker and absent in the peptide barcode. A cleavage method may comprise fragmenting a peptide barcode (e.g., cleaving an internal peptide bond before peptide barcode sequencing).

TABLE 1

[0131] Peptide cleavage may comprise chemical cleavage. Examples of chemical cleavage reagents consistent with the present disclosure include cyanogen bromide, BNPS-skatole, formic acid, hydroxylamine, and 2-nitro-5-thiocyanobenzoic acid. A peptide barcode may comprise a chemically cleavable moiety, such as a disulfide. A peptide barcode may be coupled to a molecule by a linker which comprises a chemically cleavable moiety. A peptide barcode may be coupled to a molecule by a chemically cleavable bond. A cleavage method may comprise a combination (e.g., parallel or sequential use) of chemical and enzymatic cleavage reagents. A cleavage method may comprise activating (e.g., functionalizing) an amino acid for chemical or enzymatic cleavage. For example, a method may comprise derivatizing an N-terminal amino acid residue of a peptide, and then contacting the peptide with an ‘Edmanase’ enzyme configured to remove the derivatized N-terminal amino acid residue.

[0132] Peptide cleavage conditions may be achieved with a solvent. The solvent may be an aqueous solvent, organic solvent, or a combination thereof. The solvent may be a mixture of solvents. The solvent may be an organic solvent. The organic solvent may be anhydrous. The solvent may be a non-polar solvent (e.g., hexane, dichloromethane (DCM), diethyl ether, etc.), a polar aprotic solvent (e.g., tetrahydrofuran (THF), ethyl acetate, dimethylformamide (DMF), acetonitrile (MeCN), dimethyl sulfoxide (DMSO), etc.), or a polar protic solvent (e.g., isopropanol (IPA), ethanol, methanol, acetic acid, water, etc.). The solvent may be a polar aprotic solvent. The solvent may be DMF. The solvent may be a C1-Cn haloalkane. The C1-Cn haloalkane may be DCM. The solvent may be a mixture of two or more solvents. The mixture of two or more solvents may be a mixture of a polar aprotic solvent and a C1-Cn haloalkane. The mixture of two or more solvents may be a mixture of DMF and DCM. The mixture of solvents may be any combination thereof.

[0133] A degradation process may comprise a plurality of operations. For example, a method may comprise an initial operation for derivatizing a terminal amino acid of a peptide, and a subsequent operation for cleaving the derivatized terminal amino acid from the peptide. One such method comprises organophosphorus compound-mediated N-terminal functionalization and removal, and thus provides an alternative to the isothiocyanate (e.g., phenyl isothiocyanate) based processes of some Edman degradation schemes.

[0134] An organophosphate-based degradation scheme may comprise dissolving the peptide in an organic solvent or organic solvent mixture (e.g., a mixture of dichloromethane and dimethylformamide) in the presence of an organic base (e.g., triethylamine, N,N-diisopropylethylamine (DIPEA), 1,8-diazabicyclo[5.4.0]undec-7-ene (DBU), pyridine, 1,5-diazabicyclo[4.3.0]non-5-ene, 2,6-di-tert-butylpyridine, imidazole, histidine, sodium carbonate, etc.). The peptide may then be contacted with at least one organophosphorus compound. The cleavage of the peptide or protein N-terminus may be initiated through the addition of a weak acid (e.g., formic acid in water). The cleavage of the peptide or protein N-terminus may also be initiated with water. The resulting products may include the terminal amino acid of the peptide or protein released from the peptide as a phosphoramide and the peptide or protein that is shortened by the terminal amino acid residue, which comprises a free N-terminus that may be used to perform a subsequent cleavage reaction.

[0135] A cleavage method may comprise digesting a peptide to generate fragments of a predetermined average length. The cleavage method may generate peptides (e.g., by acting upon a complex mixture of peptides, such as cell lysate) with an average length of at least about 5 amino acids, at least about 8 amino acids, at least about 10 amino acids, at least about 12 amino acids, at least about 15 amino acids, at least about 20 amino acids, at least about 25 amino acids, at least about 30 amino acids, at least about 40 amino acids, or at least about 50 amino acids. The cleavage method may generate peptides with an average length of about 5 amino acids, about 8 amino acids, about 10 amino acids, about 12 amino acids, about 15 amino acids, about 20 amino acids, about 25 amino acids, about 30 amino acids, about 40 amino acids, or about 50 amino acids. The cleavage method may generate peptides with an average length of at most about 50 amino acids, at most about 40 amino acids, at most about 30 amino acids, at most about 25 amino acids, at most about 20 amino acids, at most about 15 amino acids, at most about 12 amino acids, at most about 10 amino acids, at most about 8 amino acids, or at most about 5 amino acids. The cleavage method may generate peptide fragments with an average length of between 5 and 20 amino acids, between 5 and 30 amino acids, between 10 and 20 amino acids, between 10 and 30 amino acids, between 12 and 18 amino acids, between 15 and 30 amino acids, between 20 and 40 amino acids, or between 30 and 50 amino acids.

[0136] The reaction mixture may comprise a stoichiometric or an excess concentration of the cleavage compound (e.g., relative to the concentration of peptides to be cleaved). The reaction mixture may comprise at least about 0.001% v/v, at least about 0.01% v/v, at least about 0.1% v/v, at least about 1% v/v, at least about 5% v/v, at least about 10% v/v, at least about 15% v/v, at least about 20% v/v, at least about 30% v/v, at least about 40% v/v, at least about 50% v/v, or more of the cleavage compound. The reaction mixture may comprise about 50% v/v, about 40% v/v, about 30% v/v, about 20% v/v, about 15% v/v, about 10% v/v, about 5% v/v, about 1% v/v, about 0.1% v/v, about 0.01% v/v, about 0.001% v/v, or less of the cleavage compound. The reaction mixture may comprise at most about 50% v/v, at most about 40% v/v, at most about 30% v/v, at most about 20% v/v, at most about 15% v/v, at most about 10% v/v, at most about 5% v/v, at most about 1% v/v, at most about 0.1% v/v, at most about 0.01% v/v, or at most about 0.001% v/v of the cleavage compound. The reaction mixture may comprise from about 0.1% v/v to about 20% v/v, about 0.5% v/v to about 10% v/v, or about 1% v/v to about 10% v/v of the cleavage compound. The reaction mixture may comprise about 5% v/v of the cleavage compound.
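For bench planning, a volume-per-volume percentage converts to a dispensed volume by simple proportion. The helper below is a minimal sketch with a hypothetical name, `cleavage_compound_volume`; it assumes the percentage is expressed relative to the final reaction volume and that volumes are additive on mixing.

```python
def cleavage_compound_volume(total_volume_ul, percent_v_v):
    """Return the volume (in microliters) of cleavage compound needed so that
    it makes up `percent_v_v` percent of a reaction of `total_volume_ul`.

    Assumes the percentage refers to the final reaction volume.
    """
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    if not 0 <= percent_v_v <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_volume_ul * percent_v_v / 100.0


# Example: about 5% v/v in a hypothetical 200 microliter reaction.
volume_ul = cleavage_compound_volume(200.0, 5.0)
```

For the hypothetical 200 microliter reaction, about 5% v/v corresponds to 10 microliters of cleavage compound and 190 microliters of the remaining components.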

[0137] The reaction may be performed at a temperature of at least about 0 °C, at least about 5 °C, at least about 10 °C, at least about 15 °C, at least about 20 °C, at least about 25 °C, at least about 30 °C, at least about 40 °C, at least about 50 °C, at least about 60 °C, at least about 70 °C, at least about 80 °C, at least about 90 °C, at least about 100 °C, at least about 110 °C, at least about 120 °C, at least about 130 °C, at least about 140 °C, or at least about 150 °C. The reaction may be performed at a temperature of about 0 °C, about 5 °C, about 10 °C, about 15 °C, about 20 °C, about 25 °C, about 30 °C, about 40 °C, about 50 °C, about 60 °C, about 70 °C, about 80 °C, about 90 °C, about 100 °C, about 110 °C, about 120 °C, about 130 °C, about 140 °C, or about 150 °C. The reaction may be performed at a temperature of at most about 0 °C, at most about 5 °C, at most about 10 °C, at most about 15 °C, at most about 20 °C, at most about 25 °C, at most about 30 °C, at most about 40 °C, at most about 50 °C, at most about 60 °C, at most about 70 °C, at most about 80 °C, at most about 90 °C, at most about 100 °C, at most about 110 °C, at most about 120 °C, at most about 130 °C, at most about 140 °C, or at most about 150 °C. The reaction may be performed at a temperature above room temperature. In some embodiments, the reaction may be performed at a temperature of about 100 °C. In some embodiments, the reaction may be performed at a temperature of about 130 °C. In some embodiments, the reaction may be performed at a temperature of about 150 °C. In some embodiments, the reaction may be heated using microwave heating. In some embodiments, the reaction may be heated using appropriate heating methods, for example, an oil bath or a sand bath. The reaction may be performed at room temperature.

[0138] The peptide and the cleavage compound may be mixed or incubated for at least about 1 minute, at least about 5 minutes, at least about 10 minutes, at least about 20 minutes, at least about 30 minutes, at least about 40 minutes, at least about 50 minutes, at least about 60 minutes, at least about 90 minutes, at least about 2 hours, at least about 3 hours, at least about 4 hours, at least about 6 hours, at least about 8 hours, at least about 10 hours, at least about 12 hours, at least about 16 hours, at least about 20 hours, or at least about 24 hours. The peptide and the cleavage compound may be mixed or incubated for about 1 minute, about 5 minutes, about 10 minutes, about 20 minutes, about 30 minutes, about 40 minutes, about 50 minutes, about 60 minutes, about 90 minutes, about 2 hours, about 3 hours, about 4 hours, about 6 hours, about 8 hours, about 10 hours, about 12 hours, about 16 hours, about 20 hours, or about 24 hours. The peptide and the cleavage compound may be mixed or incubated for at most about 24 hours, at most about 20 hours, at most about 16 hours, at most about 12 hours, at most about 10 hours, at most about 8 hours, at most about 6 hours, at most about 4 hours, at most about 3 hours, at most about 2 hours, at most about 1 hour, at most about 50 minutes, at most about 40 minutes, at most about 30 minutes, at most about 20 minutes, at most about 10 minutes, at most about 5 minutes, at most about 1 minute, or less. The peptide and the cleavage compound may be mixed or incubated from about 1 minute to about 24 hours, about 5 minutes to about 6 hours, about 5 minutes to about 2 hours, or about 5 minutes to about 30 minutes.

Labeling of amino acid residues

[0139] The present disclosure provides a range of compositions, systems, and methods for selectively labeling amino acids, for example for peptide sequencing. Sample preparation may be improved by selectively labeling specific amino acid types (e.g., glycine, cysteine, lysine, histidine, tyrosine, threonine, serine, arginine, glutamate, aspartate, tryptophan, or any combination thereof) and amino acid positions (e.g., N-terminal or C-terminal amino acids). A label may comprise a first reactive group configured to couple to a specific amino acid type (e.g., glycine) or to a collection of amino acid types (e.g., lysine and cysteine).

[0140] A composition, system, or method of the present disclosure may selectively label glycine, cysteine, lysine, tyrosine, histidine, glutamic acid, aspartic acid, threonine, serine, arginine, N-terminal amines, C-terminal carboxyl groups, or any combination thereof. A composition, system, or method may selectively label a group of amino acids, for example a specific reagent may be configured to couple to terminal glycine residues present in a sample.

[0141] In some cases, a chemical tag may be used to site-specifically label a site on a parent protein. In some cases, a terminal amino acid resulting from cleavage of a post-translational modification may be labeled. In some cases, the terminal amino acid is Gly. In some cases, the terminal amino acid is Lys. In some cases, the terminal amino acid is Cys. In some cases, the terminal amino acid is an N-terminal glycine. In some cases, the terminal amino acid is a C-terminal glycine. In some cases, an amino acid may be site-specifically labeled using a chemical method. In some cases, an amino acid may be site-specifically labeled using an enzymatic method.

[0142] In some embodiments, disclosed herein is a composition comprising: a) an ortho-substituted benzaldehyde; and b) a peptide comprising an N-terminal diglycine.

[0143] In some cases, an amino acid may be site-specifically labeled using a chemical method. In some cases, an amino acid may be site-specifically labeled using a compound of Formula (Ia), (Ib), or (Ic): wherein n is 1, 2, 3, 4, or 5.

[0144] In some cases, an amino acid may be site-specifically labeled using a compound of Formula (Id), (Ie), or (If): wherein n is 1, 2, 3, 4, or 5; and R is a reactive group.

[0145] In some cases, an amino acid may be site-specifically labeled using a compound of Formula (Ia'), (Ib'), or (Ic'):

Formula (Ia') Formula (Ib') Formula (Ic').

[0146] In some cases, an amino acid may be site-specifically labeled using a compound of Formula (Id'), (Ie'), or (If'):

Formula (Id') Formula (Ie') Formula (If').

[0147] In some cases, n is 1. In some cases, n is 2. In some cases, n is 3. In some cases, n is 4. In some cases, n is 5.

[0148] In some cases, R is an azide moiety. In some cases, R is an alkynyl moiety. In some cases, R is a thiol moiety. In some cases, R is a succinimidyl ester moiety. In some embodiments, R is a fluorophore. In some cases, R is a fluorophore introduced by coupling a fluorophore-NH2 group through amide coupling. In some cases, a fluorophore may be introduced through click chemistry.

[0149] In some cases, the fluorophore is Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 633, Alexa Fluor 647, Alexa Fluor 660, Alexa Fluor 680, Fluorescein, Oregon Green 488, Pacific blue dye, Pacific orange dye, Texas red dye, or biotin.

[0150] In some embodiments, an amino acid may be site-specifically labeled to generate a fluorescent moiety. In some embodiments, an amino acid may be site-specifically labeled to generate a chromophore. In some embodiments, an amino acid may be site-specifically labeled to generate a GFP-chromophore mimetic. In some embodiments, a terminal glycine can be site-specifically labeled to generate a chromophore. In some embodiments, a terminal glycine can be site-specifically labeled to generate a GFP-chromophore mimetic. In some embodiments, the GFP-chromophore mimetic is 2,3-dimethyl-5-(propan-2-ylidene)-3,5-dihydro-4H-imidazol-4-one or a derivative thereof.

[0151] In some embodiments, a terminal glycine can be site-specifically labeled by reacting the terminal glycine moiety with an external ketone. In some embodiments, the external ketone is a 2,3-diketone. In some embodiments, the external ketone is acetylacetone. In some embodiments, the external ketone is heptane-3,5-dione.

[0152] In some embodiments, an amino acid may be site-specifically labeled using a reagent ratio of diketone:amino acid of about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:3.5, about 1:4, about 1:4.5, about 1:5, about 1:5.5, about 1:6, about 1:6.5, about 1:7, about 1:7.5, about 1:8, about 1:8.5, about 1:9, about 1:9.5 or about 1:10. In some embodiments, an amino acid may be site-specifically labeled using a reagent ratio of amino acid:diketone of about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:3.5, about 1:4, about 1:4.5, about 1:5, about 1:5.5, about 1:6, about 1:6.5, about 1:7, about 1:7.5, about 1:8, about 1:8.5, about 1:9, about 1:9.5 or about 1:10. In some embodiments, an amino acid may be site-specifically labeled using a reagent ratio of diketone:amino acid of about 1:2. In some embodiments, an amino acid may be site-specifically labeled using a reagent ratio of diketone:amino acid of about 1:2.5. In some embodiments, an amino acid may be site-specifically labeled using a reagent ratio of diketone:amino acid of about 1:3. In some embodiments, an amino acid may be site-specifically labeled using a reagent ratio of diketone:amino acid of about 1:3.5. In some embodiments, an amino acid may be site-specifically labeled using a reagent ratio of diketone:amino acid of about 1:4.

[0153] In some embodiments, an amino acid may be site-specifically labeled in the presence of a drying agent. In some embodiments, the drying agent is molecular sieves. In some embodiments, the drying agent is 3 Å, 4 Å, or 5 Å molecular sieves. In some embodiments, the drying agent is 3 Å molecular sieves. In some embodiments, the drying agent can be added to the amino acid labeling reaction in an amount of at least about 50 mg, at least about 100 mg, at least about 150 mg, at least about 200 mg, at least about 250 mg, at least about 300 mg, at least about 350 mg, at least about 400 mg, at least about 450 mg, or at least about 500 mg per 0.5 mmol of the labeling agent (e.g., diketone). In some embodiments, the drying agent can be added to the amino acid labeling reaction in an amount of about 50 mg, about 100 mg, about 150 mg, about 200 mg, about 250 mg, about 300 mg, about 350 mg, about 400 mg, about 450 mg, or about 500 mg per 0.5 mmol of the labeling agent. In some embodiments, the drying agent can be added to the amino acid labeling reaction in an amount of at most about 50 mg, at most about 100 mg, at most about 150 mg, at most about 200 mg, at most about 250 mg, at most about 300 mg, at most about 350 mg, at most about 400 mg, at most about 450 mg, or at most about 500 mg per 0.5 mmol of the labeling agent. In some embodiments, the drying agent can be added to the amino acid labeling reaction in an amount of about 100 mg per 0.5 mmol of the labeling reagent. In some embodiments, the drying agent can be added to the amino acid labeling reaction in an amount of about 150 mg per 0.5 mmol of the labeling reagent. In some embodiments, the drying agent can be added to the amino acid labeling reaction in an amount of about 200 mg per 0.5 mmol of the labeling reagent.
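Because the drying agent loadings above are stated per 0.5 mmol of labeling agent, scaling to another reaction size is a linear calculation. The following sketch uses a hypothetical helper, `drying_agent_mass_mg`, and assumes that the loading scales proportionally with the amount of labeling agent, which the passage implies but does not state explicitly.

```python
REFERENCE_LABELING_AGENT_MMOL = 0.5  # reference scale used in the text


def drying_agent_mass_mg(labeling_agent_mmol, mg_per_reference_scale):
    """Scale a drying agent loading given per 0.5 mmol of labeling agent
    to an arbitrary amount of labeling agent (linear scaling assumed).
    """
    if labeling_agent_mmol < 0 or mg_per_reference_scale < 0:
        raise ValueError("amounts must be non-negative")
    return mg_per_reference_scale * (
        labeling_agent_mmol / REFERENCE_LABELING_AGENT_MMOL
    )


# Example: about 100 mg per 0.5 mmol, applied to a hypothetical 1.0 mmol run.
sieves_mg = drying_agent_mass_mg(1.0, 100.0)
```

At a loading of about 100 mg per 0.5 mmol, a hypothetical reaction using 1.0 mmol of labeling agent would call for about 200 mg of molecular sieves.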

[0154] In some embodiments, an amino acid may be site selectively labeled at a reaction temperature of at least about 25 °C, at least about 30 °C, at least about 40 °C, at least about 50 °C, at least about 60 °C, at least about 70 °C, at least about 80 °C, at least about 90 °C, at least about 100 °C, at least about 110 °C, at least about 120 °C, at least about 130 °C, at least about 140 °C, or at least about 150 °C. In some embodiments, an amino acid may be site selectively labeled at a reaction temperature of about 25 °C, about 30 °C, about 40 °C, about 50 °C, about 60 °C, about 70 °C, about 80 °C, about 90 °C, about 100 °C, about 110 °C, about 120 °C, about 130 °C, about 140 °C, or about 150 °C. In some embodiments, an amino acid may be site selectively labeled at a reaction temperature of at most about 25 °C, at most about 30 °C, at most about 40 °C, at most about 50 °C, at most about 60 °C, at most about 70 °C, at most about 80 °C, at most about 90 °C, at most about 100 °C, at most about 110 °C, at most about 120 °C, at most about 130 °C, at most about 140 °C, or at most about 150 °C. In some embodiments, an amino acid may be site selectively labeled at a reaction temperature of about 100 °C. In some embodiments, an amino acid may be site selectively labeled at a reaction temperature of about 120 °C. In some embodiments, an amino acid may be site selectively labeled at a reaction temperature of about 130 °C. In some embodiments, an amino acid may be site selectively labeled at a reaction temperature of about 150 °C.

[0155] In some embodiments, an amino acid may be site selectively labeled with a reaction time of at least about 15 minutes, at least about 30 minutes, at least about 60 minutes, at least about 90 minutes, at least about 2 hr, at least about 2.5 hr, at least about 3 hr, at least about 3.5 hr, at least about 4 hr, at least about 4.5 hr, at least about 5 hr, at least about 5.5 hr, at least about 6 hr, at least about 8 hr, at least about 10 hr, at least about 12 hr, at least about 18 hr, or at least about 24 hr. In some embodiments, an amino acid may be site selectively labeled with a reaction time of about 15 minutes, about 30 minutes, about 60 minutes, about 90 minutes, about 2 hr, about 2.5 hr, about 3 hr, about 3.5 hr, about 4 hr, about 4.5 hr, about 5 hr, about 5.5 hr, about 6 hr, about 8 hr, about 10 hr, about 12 hr, about 18 hr, or about 24 hr. In some embodiments, an amino acid may be site selectively labeled with a reaction time of at most about 15 minutes, at most about 30 minutes, at most about 60 minutes, at most about 90 minutes, at most about 2 hr, at most about 2.5 hr, at most about 3 hr, at most about 3.5 hr, at most about 4 hr, at most about 4.5 hr, at most about 5 hr, at most about 5.5 hr, at most about 6 hr, at most about 8 hr, at most about 10 hr, at most about 12 hr, at most about 18 hr, or at most about 24 hr. In some embodiments, an amino acid may be site selectively labeled with a reaction time of about 60 minutes. In some embodiments, an amino acid may be site selectively labeled with a reaction time of about 90 minutes.

[0156] In some embodiments, an amino acid may be site selectively labeled using a solvent or a mixture of solvents. In some embodiments, an amino acid may be site selectively labeled using one solvent. In some embodiments, an amino acid may be site selectively labeled using a mixture of solvents. In some embodiments, an amino acid may be site selectively labeled using a solvent system comprising DMF, acetic acid, toluene, 1,4-dioxane, 1,2-dichloroethane, and/or pyridine. In some embodiments, an amino acid may be site selectively labeled using a mixture of DMF and pyridine. In some embodiments, an amino acid may be site selectively labeled using DMF and pyridine at a ratio of about 3:1, about 2.5:1, about 2:1, about 1.5:1, about 1:1, about 1:1.5, about 1:2, about 1:2.5, or about 1:3. In some embodiments, an amino acid may be site selectively labeled using a 1:1 mixture of DMF and pyridine.

[0157] In some cases, an amino acid may be site-specifically labeled using an enzymatic method. In some cases, an N-terminal glycine may be labeled using a sortase enzyme. Sortase may couple a donor peptide to a target peptide, where the target peptide has an N-terminal glycine residue. In some cases, the donor peptide may be a peptide with a recognition motif. In some cases, the donor peptide may be H2N-LPXTG-COOH. In some cases, the N-terminus of the donor peptide may be modified with a reactive group, for example, a fluorophore. In some cases, the donor peptide may be a chemically synthesized peptide. In some cases, an acceptor peptide may be an N-terminal glycine residue resulting from cleavage of a post-translational modification of a target peptide or protein.
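The two sequence conditions stated above for sortase coupling (a donor peptide carrying the LPXTG recognition motif and a target peptide bearing an N-terminal glycine) can be illustrated with a short sequence check. This is a sketch under stated assumptions: sequences are written N-terminus to C-terminus in one-letter code, "X" is read as any of the 20 standard amino acids, and the function names are hypothetical. It screens sequences only and says nothing about reaction efficiency.

```python
import re

# Assumption: "X" in LPXTG stands for any of the 20 standard amino acids.
STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"
SORTASE_MOTIF = re.compile(f"LP[{STANDARD_RESIDUES}]TG")


def has_sortase_motif(donor: str) -> bool:
    """Return True if the donor sequence contains an LPXTG motif."""
    return SORTASE_MOTIF.search(donor.upper()) is not None


def has_n_terminal_glycine(target: str) -> bool:
    """Return True if the target sequence starts with glycine (G)."""
    return target.upper().startswith("G")


def is_candidate_pair(donor: str, target: str) -> bool:
    """Both sequence-level conditions from the paragraph above are met."""
    return has_sortase_motif(donor) and has_n_terminal_glycine(target)


if __name__ == "__main__":
    print(is_candidate_pair("LPETG", "GAKLV"))  # donor has the motif, target starts with G
    print(is_candidate_pair("LPETG", "AKLVG"))  # target lacks an N-terminal glycine
```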

[0158] In some cases, amino acid residues may be site-specifically labeled using an affinity reagent. In some cases, the affinity reagent selectively recognizes N-terminal glycine residues. In some cases, the affinity reagent selectively recognizes C-terminal glycine residues. In some cases, the affinity reagent is a supramolecular complex.

[0159] In some cases, amino acid residues may be site-specifically labeled using a ligase. In some cases, the ligase may be Butelase-1. In some cases, the donor peptide may be H2N-NHV-COOH or a thioester. In some cases, an acceptor nucleophile may be X1-X2, wherein X1 is a first amino acid and X2 is a second amino acid. In some cases, X1 may be Gly. In some cases, X2 may be Ile, Leu, Val, or Cys.

[0160] In some cases, serine or threonine residues may be selectively labeled using an oxidizer and reacting the resulting aldehyde with a nucleophile. In some cases, the oxidizer is HIO4. In some cases, the nucleophile is an amine. In some cases, the nucleophile is a hydrazine. In some cases, an amino acid may be selectively labeled using a Gly-His tag. In some cases, the Gly-His tag may be introduced by reacting an N-terminal amino acid with gluconolactone and 4-methoxyphenyl esters as acylating agents. In some cases, an amino acid may be selectively labeled using an engineered aminopeptidase, for example, an alanine aminopeptidase.

[0161] In some cases, one or more internal amino acid residues may be labeled using a fluorescent dye. In some cases, a fluorescent dye may be introduced using “click-clack chemistry”. In some cases, the fluorescent dye or fluorophore is Atto647N, JF549, or Atto495.

[0162] Aspects of the present disclosure provide amino acid labels comprising a first reactive group for coupling to an amino acid (or a portion thereof, such as a reactive functional group of an amino acid side chain) and a second reactive group for coupling to a reporter moiety or a protecting group. Such a system may be referred to as a “click-clack” labeling system, wherein a “click” reagent refers to a label configured to couple to an amino acid, and a “clack” reagent refers to a reporter moiety or protecting group configured to couple to the “click” reagent. The second reactive group of a label may be configured to couple to a reporter moiety, a protecting group, or any combination thereof reversibly or irreversibly. The second reactive group may be reversibly coupled to a protecting group, decoupled from the protecting group, and then coupled to a reporter moiety. For example, the label may be provided with a protecting group coupled to its first or second reactive group (e.g., a diol coupled to an aldehyde reactive group of the label). Such a modular labeling process may enable multi-amino acid labeling schemes with diminished cross-reactivity between amino acid and label types. Such a labeling process may also enable the use of chemically sensitive reporter moieties (e.g., pH sensitive or chemically quenchable dyes), by allowing their attachment following amino acid labeling operations. 
For example, a method may comprise selectively labeling cysteine residues of a peptide with a first label, selectively labeling lysine residues of the peptide with a second label, selectively labeling carboxylate-containing residues (e.g., aspartate and glutamate) of the peptide with a third label, selectively labeling arginine residues of the peptide with a fourth label, chemically modifying (e.g., oxidizing) methionine residues of the peptide, selectively labeling the chemically modified methionine residues of the peptide with a fifth label, and coupling different reporter moieties (e.g., different color dyes) to each of the first, second, third, fourth, and fifth labels in a single operation (e.g., upon addition of all labeling reagents simultaneously). It is also conceivable that one or more reporter fluorophores can directly label the amino acids on the peptide chains. A bifunctional label of the present disclosure may prevent cross-reactivity between a first reactive group of a label and a reporter moiety. For example, the use of bifunctional labels may permit use of reporter moieties which are cross-reactive with a first reactive group of a label, such as an iodoacetamide-reactive dye and a label comprising a cysteine-reactive iodoacetamide group.

[0163] A label of the present disclosure may be used to crosslink two biological species, such as two amino acid residues. For example, a method may comprise coupling a lysine-selective label to a first peptide and a cysteine-selective label to a second peptide, and then crosslinking the lysine- and cysteine-selective labels. The crosslinking may directly couple (e.g., through a chemical bond) the lysine- and cysteine-selective labels, or may comprise a linker, such as a “click” reagent configured to couple to second reactive groups on the lysine- and cysteine-selective labels.

[0164] Examples of amino acid selective labels comprising second reactive groups, as well as example reagent pairs for their syntheses, are provided in TABLE 2. A cysteine- and lysine-selective “Click” label may comprise an iodoacetamide as a first reactive group (e.g., for coupling to cysteine or lysine) and an azide as a second reactive group (e.g., for coupling to a “Clack” reporter moiety or protecting group), such as the iodoacetamide PEG azide compound shown in Row A of TABLE 2. A cysteine-selective “Click” label may comprise an iodoacetamide as a first reactive group and a norbornene as a second reactive group, such as the reactant shown in Row B of TABLE 2. Such a reagent may be synthesized by coupling a norbornene amine with an iodoacetamide N-hydroxysuccinimide ester. A cysteine-selective “Click” label may comprise an iodoacetamide as a first reactive group and an aldehyde as a second reactive group, such as 2-iodo-N-(3-oxopropyl)acetamide (as shown in Row C of TABLE 2). Such a compound may be generated by coupling an N-hydroxysuccinimide ester with an amine comprising a geminal diether configured to hydrolyze to an aldehyde. A cysteine-selective label may comprise a first reactive group for coupling to cysteine but lack a second reactive group (e.g., the label may be a “dummy” label), and therefore be unable to couple to a “Clack” reagent (a reporter moiety or protecting group). An example of such a reagent may be iodoacetamide, as shown in Row D of TABLE 2.

[0165] A lysine-selective “Click” label may comprise an N-hydroxysuccinimide ester as a first reactive group and a norbornene as a second reactive group, such as the reagent shown in Row F of TABLE 2. A lysine-selective “Click” label may comprise an N-hydroxysuccinimide ester as a first reactive group and a geminal diether as a second reactive group, such as the reagent shown in Row G of TABLE 2. Such a reagent may be generated by coupling 1-hydroxypyrrolidine-2,5-dione to the carboxylic acid of a compound comprising a geminal diether. A lysine-selective label may comprise a first reactive group for coupling to lysine but lack a second reactive group for coupling to a “Clack” reporter moiety or protecting group. An example of such a reagent may be an activated ester, such as the compound shown in Row H of TABLE 2.

[0166] A carboxylate-selective (e.g., selective for aspartate and glutamate side chain carboxylates) “Click” label may comprise an amine as a first reactive group and an azide as a second reactive group, such as the reagent shown in Row I of TABLE 2. A carboxylate-selective “Click” label may comprise an amine as a first reactive group and a norbornene as a second reactive group, such as the reagent shown in Row J of TABLE 2. A carboxylate-selective “Click” label may comprise an amine as a first reactive group and a geminal diether as a second reactive group, such as the reagents shown in Rows K and L of TABLE 2. A carboxylate-selective label may comprise a first reactive group for coupling to a carboxylate but lack a second reactive group for coupling to a “Clack” reporter moiety or protecting group. An example of such a reagent may be an alkyl amine, such as the compound shown in Row M of TABLE 2.

[0167] A phosphoserine-, phosphothreonine-, and/or glycosylation-selective “Click” reagent may comprise a disulfide as a first reactive group and an azide, a norbornene, a geminal diether, or an aldehyde as a second reactive group, as shown in Rows N-R of TABLE 2. A phosphoserine-, phosphothreonine-, and/or glycosylation-selective “Click” reagent may comprise a disulfide as a first reactive group and may lack a second reactive group. In some embodiments, compounds according to the current disclosure include any of the structures or linkers described in TABLE 2 below.

TABLE 2

[0168] Further “click” reagents useful with methods according to the disclosure may be prepared by any one of Schemes 1-5 below (wherein R can denote a methyl group, an amino acid, or a peptide chain):

[0169] Scheme 1

[0172] Scheme 4

[0174] In some cases, additional labeling may be performed on the parent peptide. In some cases, an internal amino acid residue may be labeled on the parent peptide.

[0175] Sample preparation may be improved by labeling a plurality of amino acid residues through a series of sequential operations. The present disclosure provides a range of systems to facilitate labeling of multiple amino acid types. The system may minimize cross-reactivity of amino acids or reporter moieties (e.g., fluorescent molecules such as dyes), or the decomposition of, for example, sensitive reporter moieties (e.g., fluorescent molecules such as dyes).

[0176] The peptide or any of the molecules described herein may comprise a plurality of amino acids coupled to a plurality of labels. The plurality of amino acids may comprise a plurality of amino acids coupled to a plurality of labels. The plurality of amino acids coupled to a plurality of labels may comprise a first amino acid coupled to a first label and a second amino acid coupled to a second label. The plurality of amino acids may comprise a plurality of first amino acids coupled to a plurality of first labels. The plurality of amino acids may comprise a plurality of second amino acids coupled to a plurality of second labels. The plurality of amino acids may comprise (i) a plurality of first amino acids coupled to a plurality of first labels and (ii) a plurality of second amino acids coupled to a plurality of second labels. The first label, or the plurality thereof, may couple to the first amino acid, or the plurality thereof. The second label, or the plurality thereof, may couple to the second amino acid, or the plurality thereof. The first label, or the plurality thereof, may couple to the first amino acid, or the plurality thereof, and the second label, or the plurality thereof, couples to the second amino acid, or the plurality thereof. At least one label of the plurality of labels may be coupled to a specific amino acid type of the plurality of amino acids. For example, one label of the plurality of labels may be coupled to a lysine, a cysteine, a glutamic acid, an aspartic acid, a tyrosine, an arginine, a histidine, a threonine, a serine, a glutamine, an asparagine, or a tryptophan.

[0177] A label may comprise a first reactive group that is configured to couple to a second reactive group. The first reactive group may be selected from the group consisting of an azide, an alkyne, an alkene, an aldehyde, a ketone, a tetrazine, a thiol, a dithiol, a cyclooctene, and a norbornene. The second reactive group may be selected from the group consisting of an azide, an alkyne, an alkene, an aldehyde, a ketone, a tetrazine, a thiol, a dithiol, a cyclooctene, and a norbornene. The first reactive group may be selected from the group consisting of an azide, an alkene, an aldehyde, a ketone, and a tetrazine. The first reactive group may be a strained alkyne. The second reactive group may be selected from the group consisting of an azide, a thiol, a dithiol, a cyclooctene, an alkene, an aldehyde, a ketone, a tetrazine, a norbornene, and an alkyne. The second reactive group may be a strained alkyne.

[0178] At least one label of the plurality of labels may be configured to react with a specific second reactive group coupled to a specific reporter moiety. The first reactive group may be selected from the group consisting of an alkyne, a thiol, a dithiol, and a cyclooctene, and the second reactive group may be selected from the group consisting of an alkyne, an azide, a thiol, a dithiol, a cyclooctene, an alkene, an aldehyde, a ketone, a tetrazine, and a norbornene. The first reactive group may be configured to react with a particular second reactive group. For example, the first reactive group may be an azide and the second reactive group may be an alkyne, the first reactive group may be an alkyne and the second reactive group may be an azide, the first reactive group may be an alkene and the second reactive group may be a thiol, the first reactive group may comprise a carbonyl (e.g., a ketone or an aldehyde) and the second reactive group may be a dithiol, or the first reactive group may be a tetrazine and the second reactive group may be a cyclooctene (e.g., trans-cyclooctene).
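The example pairings of first and second reactive groups listed in the preceding paragraph can be captured as a small lookup table, which may be convenient when planning which reporter is matched to which label. The following is an illustrative sketch only; the names `COMPATIBLE_PAIRS` and `can_couple` are hypothetical, and the table contains only the example pairs named above, not every combination the disclosure may encompass.

```python
# Example pairs taken from the paragraph above:
# (first reactive group on the label, second reactive group on the reporter).
COMPATIBLE_PAIRS = {
    ("azide", "alkyne"),
    ("alkyne", "azide"),
    ("alkene", "thiol"),
    ("aldehyde", "dithiol"),   # carbonyl example
    ("ketone", "dithiol"),     # carbonyl example
    ("tetrazine", "cyclooctene"),
}


def can_couple(first_group: str, second_group: str) -> bool:
    """Return True if the pair is one of the example pairings listed above."""
    key = (first_group.strip().lower(), second_group.strip().lower())
    return key in COMPATIBLE_PAIRS


if __name__ == "__main__":
    print(can_couple("Azide", "alkyne"))     # listed pair
    print(can_couple("tetrazine", "thiol"))  # not among the listed examples
```

The lookup is direction-sensitive on purpose, since the paragraph lists azide/alkyne in both directions but the other pairs in one direction only.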

[0179] The at least one label that couples to the amino acid, or plurality thereof, may be coupled to a reporter moiety. The reporter moiety may be configured to emit a signal upon excitation. The signal may be a detectable signal. For example, the signal may be an optical signal, such as a fluorescent or phosphorescent signal. The optical signal may be produced by a dye. The reporter moieties may also produce non-optically detectable signals. For example, a reporter moiety may produce an electrical signal, a radioactive signal or a chemical signal. The reporter moiety may be coupled to a spacer. The spacer may adjoin a reporter moiety and a second reactive group. A reporter may be configured to react with the label. The reporter may comprise a reporter moiety and a reactive group (e.g., a second reactive group). The reporter may comprise a reporter moiety, a reactive group (e.g., a second reactive group), and a spacer.

[0180] In another aspect, provided herein is a system comprising a peptide, wherein the peptide is immobilized to at least one support; and comprises an amino acid coupled to a label, wherein the label comprises (i) a first reactive group that is configured to couple to a second reactive group that is coupled to a reporter moiety configured to emit a signal or (ii) a protecting group configured to prevent coupling between the label and the second reactive group.

[0181] In another aspect, provided herein is a system for processing or analyzing a peptide, comprising a peptide comprising an amino acid coupled to a first reactive group and a support coupled to a second reactive group, wherein the first reactive group is configured to couple to the second reactive group to immobilize the peptide adjacent to the support. In some cases, the system is configured to couple a peptide to a support (e.g., a surface). In some cases, the system is configured to couple an amino acid residue of a peptide to a support (e.g., a surface). The support (e.g., the surface) may comprise a reactive group configured to couple to a functional group coupled to the amino acid residue of the peptide.

[0182] The plurality of peptides may be immobilized to a plurality of supports. The plurality of peptides may comprise at least about 1, at least about 10, at least about 100, at least about 1000, at least about 10,000, at least about 100,000, or at least about 1,000,000 or more peptides immobilized to the same support. The plurality of peptides may comprise about 1, about 10, about 100, about 1000, about 10,000, about 100,000, or about 1,000,000 peptides immobilized to the same support. The plurality of peptides may comprise at most about 1, at most about 10, at most about 100, at most about 1000, at most about 10,000, at most about 100,000, or at most about 1,000,000 peptides immobilized to the same support. The at least one support may be a bead, a polymer matrix, an array, or any combination thereof. The at least one support may be a bead, a polymer matrix, or an array. The at least one support may be a bead and an array. The at least one support may be a bead. The at least one support may be an array. The array may be a surface. The array may be a slide. The slide may be a microscopic slide. The at least one support may be a microscopic slide. The at least one support may be a polymer matrix.

[0183] The support may be a solid support or a semi-solid support. The solid support or semi-solid support may be a bead. The bead may be a gel bead. The bead may be a polymer bead. The support may be a resin. Non-limiting supports may comprise, for example, agarose, sepharose, polystyrene, polyethylene glycol (PEG), or any combination thereof. The support may be a polystyrene bead. The support may include functional groups, such as, for example, amines, sulfhydryls, acids, alcohols, bromides, maleimides, succinimidyl esters (NHS), sulfosuccinimidyl esters, disulfides, azides, alkynes, isothiocyanates (ITC), or combinations thereof. The support may be a PEGA resin. The support may be an amino PEGA resin. The support may comprise an amine group. The support may include protected functional groups, such as, for example, Boc, Fmoc, alkyl ester, Cbz, or combinations thereof. The bead may contain a metal core. The bead may be a polymer magnetic bead. The polymer magnetic bead may comprise a metal oxide. The support may comprise at least one iron oxide core.

[0184] An N-terminus, a C-terminus, an internal amino acid, or any combination thereof, of the peptide may be coupled to the at least one support. The N-terminus and the C-terminus of the peptide may be coupled to the at least one support. The N-terminus of the peptide may be coupled to one support and the C-terminus of the peptide may be coupled to another support. The N-terminus of the peptide may be coupled to a bead. The C-terminus of the peptide may be coupled to a slide. The N-terminus of the peptide may be coupled to a bead and the C-terminus may not be coupled to a support. The C-terminus of the peptide may be coupled to a slide and the N-terminus of the peptide may not be coupled to a support. The N-terminus of the peptide may be coupled to a bead and the C-terminus of the peptide may be coupled to a slide.

[0185] The N-terminus of the peptide may be coupled to the at least one support. The N-terminus of the peptide may be coupled to a cleavable unit. The cleavable unit may be coupled to the at least one support. The cleavable unit may comprise at least one of (i) a cleavable moiety, (ii) an aldehyde, (iii) the at least one support, or (iv) a spacer. The cleavable unit may comprise a cleavable moiety. The cleavable moiety may comprise a Rink group. The cleavable unit may comprise an aldehyde. The aldehyde may be a pyridinecarboxaldehyde (PCA), or any derivative thereof. The cleavable unit may comprise a spacer. The cleavable unit may comprise at least two of (i) a cleavable moiety, (ii) an aldehyde, (iii) the at least one support, or (iv) a spacer. The cleavable unit may comprise a cleavable moiety and an aldehyde. The cleavable unit may comprise the at least one support and an aldehyde. The cleavable unit may comprise at least three of (i) a cleavable moiety, (ii) an aldehyde, (iii) the at least one support, or (iv) a spacer. The cleavable unit may comprise an aldehyde, the at least one support, and a spacer. The cleavable unit may comprise an aldehyde, the at least one support, and a cleavable moiety. The cleavable unit may comprise a spacer, the at least one support, and a cleavable moiety. The cleavable unit may comprise (i) a cleavable moiety, (ii) an aldehyde, (iii) the at least one support, and (iv) a spacer. The cleavable unit may be as described in WO2020072907A1. The aldehyde, the spacer, the cleavable moiety, or any combination thereof may be as described in WO2020072907A1.

[0186] The C-terminus of the peptide may be modified with an agent configured to couple the C-terminus to at least one support. The agent may comprise an alkyne or an azide, either of which may be configured to couple to at least one support. The C-terminus may comprise an acidic amino acid. The C-terminus may comprise a first acidic residue and a second acidic residue. The first acidic residue may be a C-terminal carboxylic acid. The second acidic residue may be an aspartic acid side chain or a glutamic acid side chain. The first acidic residue and second acidic residue of the C-terminus may be modified. In cases where the C-terminus of the peptide contains two acidic residues, both the first and second acidic residues may be modified by an agent comprising an alkyne or an azide, either of which may be configured to couple to at least one support.

[0187] A reporter (or a reporter moiety) for use in the present system may, by way of a non-limiting example, emit a detectable or an optical signal (e.g., from a fluorescent dye). However, any number of reporters (or reporter moieties) as described herein may be used for their various advantageous features. As an additional example, a reporter (or a reporter moiety) may emit a radiometric signal, which can be detected by an ionization chamber, a gaseous ionization detector, a Geiger counter, a photodetector, a scintillation counter, or a semiconductor detector, among others. Conversely, a reporter (or a reporter moiety) may not emit a signal at all. Reporters (and reporter moieties) may selectively label specific amino acids by reacting with their side chains, or may detect a post-translational modification to an amino acid. In some examples, a plurality of amino acids will be contained within the peptide, of which many or all may be coupled to a label and/or a reporter (or a reporter moiety).

[0188] In some cases, peptides or proteins may be immobilized on a solid phase by N-terminal capture. In some cases, peptides or proteins may be reversibly captured on a solid support to prepare the molecules for mass spectrometry, sequencing, single molecule protein sequencing, or NMR analysis.

[0189] In some cases, molecule capture may be performed on a solid support by N-terminal covalent bonding of an aromatic or a heteroaromatic carboxaldehyde (e.g., 2-pyridinylcarboxaldehyde (PCA)), which, despite being covalent, is fully reversible under specific conditions. The solid support-bound molecule may be chemically and biologically modified while on the solid support and released when the molecule is prepared for analysis. The molecules may be proteins, peptides, or small molecules containing a 2-aminoacetamide. This method allows for rapid and high-yield preparation for peptide/protein analysis techniques that require chemical manipulation.

[0190] The method may be conducted at peptide concentrations of at least about 0.001 nanomolar (nM), at least about 0.01 nM, at least about 0.1 nM, at least about 1 nM, at least about 10 nM, or at least about 100 nM. The method may be conducted at peptide concentrations of about 0.001 nM, about 0.01 nM, about 0.1 nM, about 1 nM, about 10 nM, or about 100 nM. The method may be conducted at peptide concentrations of at most about 0.001 nM, at most about 0.01 nM, at most about 0.1 nM, at most about 1 nM, at most about 10 nM, or at most about 100 nM. The method may be conducted at peptide concentrations from 0.001 nM to 100 nM, 0.01 nM to 100 nM, or 0.1 nM to 50 nM.

Sequencing of labeled peptides or proteins

[0191] In some cases, labeled peptides or proteins of the disclosure may be sequenced for analysis. In some cases, labeled peptides or proteins of the disclosure may be sequenced for analysis using fluorosequencing.

[0192] In some cases, labeled peptides or proteins of the disclosure may be sequenced for analysis using nanopore sequencing. In some cases, labeled peptides or proteins of the disclosure may be sequenced for analysis using N-terminal affinity reagents. In some cases, labeled peptides or proteins of the disclosure may be sequenced for analysis using mass spectrometry.

[0193] Various aspects of the present disclosure provide compositions and methods for peptide fluorosequencing. A fluorosequencing method disclosed herein may provide peptide sequence information at the single molecule level. Example fluorosequencing methods are provided in U.S. Patent No. 9,625,469, U.S. Patent Application Serial No. 16/709,903, and U.S. Patent Application Serial No. 15/510,962. A method consistent with the present disclosure may subject a peptide to fluorosequencing and an additional form of analysis. For example, a molecule of hemoglobin may be interrogated for glycation with immunostaining, and then subsequently digested and subjected to fluorosequencing for sequencing analysis.

[0194] A characteristic feature of many fluorosequencing methods is coupling a fluorescent label to a peptide to be sequenced. A fluorescent label may be an amino acid specific label (e.g., configured to couple to a specific type of amino acid or a specific set of types of amino acids). A fluorosequencing method may comprise labeling a plurality of types of amino acids with separate, amino acid type specific fluorescent labels. A fluorosequencing method may comprise labeling one, two, three, four, five, six, or more different types of amino acid residues in a subject peptide or protein. A plurality of amino acid residues may include, for example, an N-terminal amino acid, cysteine, lysine, glutamic acid, aspartic acid, tryptophan, tyrosine, serine, threonine, arginine, histidine, methionine, or any combination thereof. Each of these amino acid residues may be labeled with a different labeling moiety. Multiple amino acid residues may be labeled with the same labeling moiety.

[0195] A label may comprise a reporter moiety. The reporter moiety may be optically detectable (e.g., fluorescent, phosphorescent, luminescent, or light absorbing). The reporter moiety may be electrochemically detectable (e.g., a redox active moiety with a characteristic oxidation or reduction potential). The reporter moiety may comprise a mass tag (e.g., for identification with mass spectrometry). A reporter moiety may identify a label to which it is attached. A plurality of labels may comprise a plurality of detectable moieties which identify labels of the plurality of labels by their type. For example, a method may comprise a plurality of types of labels configured to couple to different amino acids, each comprising a different reporter moiety that uniquely identifies the label by its type.

[0196] A label may comprise a reactive group. The reactive group may be configured to couple to a reporter moiety, a protecting group, or any combination thereof. A method may comprise coupling a label to an amino acid of a peptide (e.g., coupling a label to each amino acid of a particular type), and then coupling a reporter moiety or protecting group to the label. A method may comprise coupling a plurality of types of labels comprising reactive groups to a plurality of amino acids of a peptide, and coupling a plurality of reporter moieties, protecting groups, or combinations thereof to the labels based on their types. A method may comprise coupling a plurality of types of labels to a plurality of amino acids of a peptide, wherein the plurality of types of labels comprise labels with reactive groups, labels with reporter moieties (e.g., a cysteine-reactive label coupled to a dye), labels lacking reactive groups and reporter moieties, or any combination thereof.

[0197] A label (e.g., a label comprising a reactive group configured to couple to a reporter moiety or protecting group) may reversibly or irreversibly bind to an amino acid type, and thus may be chemically (e.g., by addition of a cleavage reagent) or physically (e.g., by addition of heat or light) decoupled from a target peptide. A method may thus comprise blocking a first amino acid type, labeling a second amino acid type (e.g., threonine), unblocking the first amino acid type, and labeling the first amino acid type. Examples of reversible labels may include silanes (e.g., trimethylsilane), acetyl groups, benzoyl groups, unsaturated pyran and furan groups, urea-forming groups, carbamate-forming groups, carbonate-forming groups, thiourea-forming groups, thiocarbamate-forming groups, thiocarbonate-forming groups, and derivatives thereof. Examples of irreversible labels may include alkyl groups, oxo-groups, amide-forming groups (e.g., an acyl chloride configured to convert an amine into an amide), and derivatives thereof.

[0198] Labeling specificity may be a major challenge for a fluorosequencing method. In many cases, a label may comprise reactivity toward a plurality of amino acid types. For example, some maleimide labels may react with cysteine, lysine, and N-terminal amines. A number of strategies may be employed to utilize or prevent such cross-reactivity. A method may comprise sequential amino acid labeling, for example to ensure that a multi-specific label is added to a system after one or more amino acid types with which the multi-specific label is configured to couple are chemically blocked or labeled, and therefore unable to react with the multi-specific label.

[0199] Discriminating between comparably reactive amino acid residues may require precise ordering of labeling operations. In the above maleimide example, lysine may be discriminated from cysteine by first reacting cysteine with a cysteine specific labeling operation (e.g., blocking cysteine in an iodoacetamide coupling operation performed at pH 7-8), thereby preventing further cysteine labeling in a subsequent lysine labeling operation. A method may comprise cysteine labeling before lysine labeling. A method may comprise cysteine labeling before glutamate labeling. A method may comprise cysteine labeling before aspartate labeling. A method may comprise cysteine labeling before tryptophan labeling. A method may comprise cysteine labeling before tyrosine labeling. A method may comprise cysteine labeling before serine labeling. A method may comprise cysteine labeling before threonine labeling. A method may comprise cysteine labeling before histidine labeling. A method may comprise cysteine labeling before arginine labeling. A method may comprise lysine labeling before glutamate labeling. A method may comprise lysine labeling before aspartate labeling. A method may comprise lysine labeling before tryptophan labeling. A method may comprise lysine labeling before tyrosine labeling. A method may comprise lysine labeling before serine labeling. A method may comprise lysine labeling before threonine labeling. A method may comprise lysine labeling before arginine labeling. A method may comprise carboxylate side chain (e.g., glutamate and aspartate side chain) labeling before tryptophan labeling. A method may comprise carboxylate side chain (e.g., glutamate and aspartate side chain) labeling before tyrosine labeling. A method may comprise carboxylate side chain (e.g., glutamate and aspartate side chain) labeling before serine labeling. A method may comprise carboxylate side chain (e.g., glutamate and aspartate side chain) labeling before threonine labeling. 
A method may comprise carboxylate side chain (e.g., glutamate and aspartate side chain) labeling before histidine labeling. A method may comprise carboxylate side chain (e.g., glutamate and aspartate side chain) labeling before arginine labeling. A method may comprise at least 2, at least 3, at least 4, at least 5, or at least 6 amino acid labeling operations performed in a sequence configured to minimize or prevent label cross-reactivity (e.g., labeling more than the intended type or types of amino acids).
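The ordering constraints listed above can be read as a precedence graph, and any topological ordering of that graph is a labeling sequence that respects them. The following is a minimal sketch under that reading, not a procedure stated in the disclosure: the names `MUST_PRECEDE` and `labeling_order` are hypothetical, and "carboxylate" stands in for glutamate and aspartate side chains.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the "X labeling before Y labeling" statements above.
# Key = amino acid type labeled earlier; value = types that are labeled later.
MUST_PRECEDE = {
    "cysteine": ["lysine", "carboxylate", "tryptophan", "tyrosine",
                 "serine", "threonine", "histidine", "arginine"],
    "lysine": ["carboxylate", "tryptophan", "tyrosine",
               "serine", "threonine", "arginine"],
    "carboxylate": ["tryptophan", "tyrosine", "serine",
                    "threonine", "histidine", "arginine"],
}


def labeling_order(targets):
    """Return one order of labeling operations consistent with MUST_PRECEDE."""
    targets = sorted(set(targets))
    sorter = TopologicalSorter()
    for amino_acid in targets:
        sorter.add(amino_acid)  # register the node even if it has no constraints
    for earlier, later_types in MUST_PRECEDE.items():
        for later in later_types:
            if earlier in targets and later in targets:
                sorter.add(later, earlier)  # "later" depends on "earlier"
    return list(sorter.static_order())


print(labeling_order(["tryptophan", "lysine", "carboxylate", "cysteine"]))
# → ['cysteine', 'lysine', 'carboxylate', 'tryptophan']
```

For target sets that the constraints do not fully order (e.g., serine and tyrosine), more than one valid sequence exists and the sketch returns only one of them.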

[0200] Fluorosequencing may comprise removing peptides through techniques such as Edman degradation following or preceding subject peptide detection. Sequential peptide removal may generate sequence or position-specific information. For example, a reduction in fluorescence following an N-terminal amino acid removal operation may indicate that a labeled amino acid, and thus that a specific type of amino acid, was disposed at a peptide N-terminus. Removal of each amino acid residue may be carried out with a variety of different techniques including Edman degradation and proteolytic cleavage. The techniques may include using Edman degradation to remove the terminal amino acid residue. Alternatively, the techniques may involve using an enzyme to remove the terminal amino acid residue. These terminal amino acid residues may be removed from either the C-terminus or the N-terminus of the peptide chain. In situations where Edman degradation is used, the amino acid residue at the N-terminus of the peptide chain is removed.

[0201] A labeling moiety used in the instant application may be configured to withstand conditions for removing one or more of the amino acid residues. Some non-limiting examples of potential labeling moieties that may be used in the instant methods include, for example, those which emit a fluorescence signal in the red to infrared spectra such as an Alexa Fluor® dye, an Atto dye, a Janelia Fluor® dye, a rhodamine dye, or other similar dyes. Examples of each of these dyes which were capable of withstanding the conditions of removing the amino acid residues include Alexa Fluor® 405, Rhodamine B, tetramethyl rhodamine, Janelia Fluor® 549, Alexa Fluor® 555, Atto647N, and (5)6-naphthofluorescein. The labeling moiety may be a fluorescent peptide or protein or a quantum dot.

[0202] Peptide detection or imaging may comprise immobilizing the peptide on a surface. The peptide may be immobilized to the surface by coupling a peptide-derived cysteine residue, the peptide N terminus, or the peptide C terminus with the surface or with a reagent coupled to the surface. The peptide may be immobilized by reacting the cysteine residue with the surface or with a capture reagent coupled to the surface. The peptide may be immobilized by coupling the peptide C-terminus or N-terminus with a capture moiety described herein. The peptide may be immobilized on a surface. Detecting the immobilized peptide may comprise capturing an image comprising the peptide. The image may comprise a spatial address specific to the peptide. A plurality of peptides may be detected in a single image, wherein one or more of the peptides may comprise a spatial address within the image. The surface may be optically transparent across the visible spectrum and/or the infrared spectrum. The surface may possess a low refractive index (e.g., a refractive index between 1.3 and 1.6). The surface may be between 10 and 50 nm thick, between 20 and 80 nm thick, between 50 and 200 nm thick, between 100 and 500 nm thick, between 200 and 800 nm thick, between 500 nm and 1 µm thick, between 1 and 5 µm thick, between 2 and 10 µm thick, between 5 and 20 µm thick, between 20 and 50 µm thick, between 50 and 200 µm thick, between 200 and 500 µm thick, or greater than 500 µm in thickness. The surface may be chemically resistant to organic solvents. The surface may be chemically resistant to strong acids such as trifluoroacetic acid or sulfuric acid. 
A large range of substrates (like fluoropolymers (Teflon-AF (Dupont), Cytop® (Asahi Glass, Japan)), aromatic polymers (polyxylenes (Parylene, Kisco, Calif.), polystyrene, polymethylmethacrylate) and metal surfaces (gold coating)), coating schemes (spin-coating, dip-coating, electron beam deposition for metals, thermal vapor deposition and plasma enhanced chemical vapor deposition) and functionalization methodologies (polyallylamine grafting, use of ammonia gas in PECVD, doping of long chain end-functionalized fluoroalkanes etc.) may be used in the methods described herein as a useful surface. A 20 nm thick, optically transparent fluoropolymer surface made of Cytop® may be used in the methods described herein. The surfaces used herein may be further derivatized with a variety of fluoroalkanes that will sequester peptides for sequencing and modified targets for selection. Alternatively, an aminosilane-modified surface may be used in the methods described herein. The methods may comprise immobilizing the peptides on the surface of beads, resins, gels, quartz particles, glass beads, or combinations thereof. In some non-limiting examples, the methods contemplate using peptides that have been immobilized on the surface of Tentagel® beads, Tentagel® resins, or other similar beads or resins. The surface used herein may be coated with a polymer, such as polyethylene glycol. The surface may be amine functionalized or thiol functionalized.

[0203] A sequencing technique described herein may involve imaging the peptide or protein to determine the presence of one or more labeling moieties (e.g., amino acid labels) coupled to the peptide. The sequencing technique may comprise imaging a plurality of peptides or proteins to determine the presence of one or more labeling moieties on individual peptides from among the plurality of peptides. The sequencing technique may comprise imaging at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or more proteins or peptides (e.g., imaging a portion of a surface comprising at least 10^3 to at least 10^8 proteins or peptides). These images may be taken after each removal of an amino acid residue and thus may enable determination of the location of the specific amino acid in the peptide sequence. For example, a C-terminal immobilized peptide may comprise a sequence (from N-terminal to C-terminal) of KDDYAGGGAAGKDA (SEQ ID NO: 5) (wherein ‘K’ denotes lysine, ‘D’ denotes aspartate, ‘Y’ denotes tyrosine, ‘A’ denotes alanine, and ‘G’ denotes glycine), and may comprise labels coupled to each lysine and tyrosine residue. A first image comprising the C-terminal immobilized peptide may indicate the presence of two lysines and one tyrosine in the peptide. The N-terminal amino acid may be removed (e.g., by Edman degradation), such that a second image comprising the C-terminal immobilized peptide may indicate the presence of one lysine and one tyrosine in the peptide. This process may be repeated until a sequence of KXXYXXXXXXXKX is identified for the peptide, wherein ‘X’ indicates a non-lysine, non-tyrosine amino acid, ‘K’ indicates a lysine, and ‘Y’ indicates a tyrosine. A method of the present disclosure may identify the position of a specific amino acid in a peptide sequence. 
A method may be used to determine the locations of specific amino acid residues in the peptide sequence, or these results may be used to determine the entire list of amino acid residues in the peptide sequence. A method may involve determining the location of one or more amino acid residues in the peptide sequence and comparing these locations to documented peptide sequences, which may identify the entire list of amino acid residues in the peptide sequence. For example, identifying the positions of the lysines and cysteines in a 40 amino acid fragment of a human protein may uniquely identify the protein (e.g., only one human protein contains the specific pattern of lysine and cysteine residues identified in the 40 amino acid fragment).

[0204] An imaging method may involve a variety of different spectrophotometric and microscopy methods, such as fluorimetry, diffuse reflectance, interferometric scattering, Raman, resonance enhanced Raman, infrared absorbance, visible light absorbance, ultraviolet absorbance, and fluorescence. The fluorescent methods may employ fluorescent techniques such as fluorescence polarization, Förster resonance energy transfer (FRET), or time-resolved fluorescence. A spectrophotometric or microscopy method may be used to determine the presence of one or more fluorophores coupled to a single peptide. Such imaging methods may be used to determine the presence or absence of a label on a specific peptide sequence. After repeated cycles of removing an amino acid residue and imaging a subject peptide, the position of the labeled amino acid residue may be determined in the peptide.
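The cycle of imaging and N-terminal removal described above can be illustrated with an idealized simulation. This is a sketch under simplifying assumptions that are not part of the disclosure: every lysine and tyrosine is labeled, each cycle removes exactly one N-terminal residue, no label is lost for any other reason, and each image reports an exact label count per channel rather than a noisy intensity. The function names are hypothetical.

```python
def label_counts_per_cycle(peptide, labeled=("K", "Y")):
    """Idealized images: labeled-residue counts after 0, 1, 2, ... removals."""
    counts = []
    for removed in range(len(peptide) + 1):
        remaining = peptide[removed:]  # residues still attached to the surface
        counts.append({aa: remaining.count(aa) for aa in labeled})
    return counts


def positions_from_drops(counts):
    """A drop in a channel between image n-1 and image n places that
    labeled amino acid type at position n (counted from the N-terminus)."""
    positions = {aa: [] for aa in counts[0]}
    for cycle in range(1, len(counts)):
        for aa in positions:
            if counts[cycle][aa] < counts[cycle - 1][aa]:
                positions[aa].append(cycle)
    return positions


counts = label_counts_per_cycle("KDDYAGGGAAGKDA")
print(counts[0])  # → {'K': 2, 'Y': 1}
print(counts[1])  # → {'K': 1, 'Y': 1}
print(positions_from_drops(counts))  # → {'K': [1, 12], 'Y': [4]}
```

The first two printed counts correspond to the first and second images in the lysine/tyrosine example, and the recovered positions are those of the labeled residues only; unlabeled positions remain unidentified.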

Sample Types

[0205] The methods described herein may comprise analyzing a biological sample. A biological sample may be derived from a subject (e.g., a patient or a participant in a study), from a tissue sample (e.g., an engineered tissue sample), from a cell culture (e.g., a human cell line or a bacterial colony), from a cell (e.g., a cell isolated during a single cell sorting assay), or a portion thereof (e.g., an organelle from a cell or an exosome from a blood sample). A biological sample may be synthetic, such as a composition of synthetic peptides. A sample may comprise a single species or a mixture of species. A biological sample may comprise biomaterial from a single organism, from a colony of genetically near-identical organisms, or from multiple organisms (e.g., enterocytes and microbiota from a human digestive tract). A biological sample may be fractionated (e.g., plasma separated from whole blood), filtered, or depleted (e.g., high abundance proteins such as albumin and ceruloplasmin removed from plasma).

[0206] A sample may comprise all or a subset of the biomolecules from the subject, tissue sample, cell culture, cell, or portion thereof. For example, a sample from a subject may comprise the majority of proteins present in that subject, or may comprise a small subset of the proteins from that subject. A biological sample may comprise a bodily fluid such as cerebrospinal fluid, saliva, urine, tears, blood, plasma, serum, breast aspirate, prostate fluid, seminal fluid, stool, amniotic fluid, intraocular fluid, mucus, or any combination thereof. A biological sample may comprise a tissue culture, for example a tumor sample, or tissue from a kidney, liver, lung, pancreas, stomach, intestine, bladder, ovary, testis, skin, colorectal, breast, brain, esophagus, placenta, or prostate.

[0207] The biological sample may comprise a molecule whose presence or absence may be measured or identified. The biological sample may comprise a macromolecule, such as, for example, a polypeptide or a protein. The macromolecule may be isolated (e.g., separated from other components from which it was sourced) or purified, such that the macromolecule comprises at least 0.5%, at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 7.5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% of a composition by weight (e.g., by dry weight or including solvent). The biological sample may be complex, and may comprise a plurality of components (e.g., different polypeptides, or a heterogeneous sample from the CSF of a proteopathy patient). The biological sample may comprise a component of a cell or tissue, a cell or tissue extract, or a fractionated lysate thereof. The biological sample may be substantially purified to contain molecules of a single type (peptides, nucleic acids, lipids, small molecules). A biological sample may comprise a plurality of peptides configured for a method of the present disclosure (e.g., digestion, C-terminal labeling, or fluorosequencing).

[0208] Methods consistent with the present disclosure may comprise isolating, enriching, or purifying a biomolecule, biomacromolecular structure (e.g., an organelle or a ribosome), a cell, or tissue from a biological sample. A method may utilize a biological sample as a source for a biological species of interest. For example, an assay may derive a protein, such as alpha synuclein, a cell, such as a circulating tumor cell (CTC), or a nucleic acid, such as cell-free DNA, from a blood or plasma sample. A method may derive multiple, distinct biological species from a biological sample, such as two separate types of cells. In such cases, the distinct biological species may be separated for different analyses (e.g., CTC lysate and buffy coat proteins may be partitioned and separately analyzed) or pooled for common analysis. A biological species may be homogenized, fragmented, or lysed before analysis. In particular instances, a species or plurality of species from among the homogenate, fragmentation products, or lysate may be collected for analysis. For example, a method may comprise collecting circulating tumor cells during a liquid biopsy, optionally isolating individual circulating tumor cells, lysing the circulating tumor cells, isolating peptides from the resulting lysate, and analyzing the peptides by a fluorosequencing method of the present disclosure. A method may comprise capturing peptides from a sample using a C-terminal capture reagent, and analyzing the peptides (e.g., by a fluorosequencing method).

[0209] Methods consistent with the present disclosure may comprise nucleic acid analysis, such as sequencing, Southern blot, or epigenetic analysis. Nucleic acid analysis may be performed in parallel with a second analytical method, such as a fluorosequencing method of the present disclosure. The nucleic acid and the subject of the second analytical method may be derived from the same subject or the same sample. For example, a method may comprise collecting cell-free DNA and peptides from a human plasma sample, sequencing the cell-free DNA (e.g., to identify a cancer marker), and performing proteomic analysis on the plasma proteins.

Fluorosequencing

[0210] Fluorosequencing (e.g., sequencing by degradation) refers to sequencing peptides in a complex protein sample at the level of single molecules. In some embodiments, millions of individual fluorescently labeled peptides are visualized in parallel, monitoring changing patterns of fluorescence intensity as N-terminal amino acids are sequentially removed, and/or using the resulting fluorescence signatures (fluorosequences) to uniquely identify individual peptides. In some embodiments, amino acids are selectively labeled on immobilized peptides, and/or the amino acids are subjected to successive cycles of removing the peptide N-terminal residues (Edman degradation) and/or imaging the corresponding decrease of fluorescent intensity for individual peptide molecules. In some embodiments, amino acids are cleaved using chemical degradation, photochemical degradation, or enzymatic degradation. The methods of the present disclosure are capable of producing patterns sufficiently reflective of the peptide sequences to allow unique identification of a majority of proteins from a species. The resulting stair-step patterns of fluorescence decreases provide positional information of the select amino acid residues. This partial pattern is often sufficient to allow unique identification of the peptide by comparison to a reference proteome. The patterns of cleavage (even for a portion of the protein) provide sufficient information to identify a significant fraction of proteins within a documented proteome, e.g. where the sequences of proteins are predetermined. In one embodiment, the single-molecule technologies of the present application allow the identification and/or absolute quantitation of a given peptide or protein in a biological sample.
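The comparison of a partial pattern to a reference proteome can be sketched as a prefix match on masked sequences. The example below is illustrative only and rests on assumptions not found in the disclosure: the reference is a three-entry toy dictionary with hypothetical names and sequences, unlabeled residues are written as 'X', and the observed pattern is assumed to be error-free.

```python
def fluorosequence(sequence, labeled):
    """Mask a peptide sequence: keep labeled residue types, write 'X' elsewhere."""
    return "".join(aa if aa in labeled else "X" for aa in sequence)


def match_reference(observed, reference, labeled):
    """Names of reference peptides whose masked sequence begins with `observed`."""
    return [
        name
        for name, sequence in reference.items()
        if fluorosequence(sequence, labeled).startswith(observed)
    ]


# Toy reference set; names and sequences are hypothetical placeholders.
REFERENCE = {
    "peptide_1": "KDDYAGGGAAGKDA",
    "peptide_2": "KDDAYGGGAAGKDA",
    "peptide_3": "ADKYAGGKAAGDDA",
}

print(match_reference("KXX", REFERENCE, labeled={"K", "Y"}))
# → ['peptide_1', 'peptide_2']
print(match_reference("KXXY", REFERENCE, labeled={"K", "Y"}))
# → ['peptide_1']
```

In this toy case three cycles leave two candidates and a fourth cycle resolves them, which is the sense in which a partial pattern can be sufficient for identification. A practical matcher would also have to tolerate missed cleavages and label loss, which this sketch ignores.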

[0211] In some embodiments, the methods disclosed herein can be used to perform large-scale sequencing (including but not limited to partial sequencing) of single intact peptides (not denatured) at the single molecule level by selectively labeling amino acids on immobilized peptides followed by successive cycles of labeling and/or removal of the peptide amino-terminal amino acids. The methods and/or systems of the disclosure can identify amino acids in peptides, including peptides comprising unnatural amino acids. In one embodiment, the present disclosure comprises labeling the N-terminal amino acid with a first label and/or labeling an internal amino acid with a second label. In some embodiments, the labels are fluorescent labels. In other embodiments, the internal amino acid is Lysine. In other embodiments, amino acids in peptides are identified based on the fluorescent signature for each peptide at the single molecule level.

[0212] Various aspects of the present disclosure provide compositions and/or methods for peptide fluorosequencing, also called sequencing by degradation. A method consistent with the present disclosure may subject a peptide to fluorosequencing and/or an additional form of analysis. For example, a molecule of hemoglobin may be interrogated for glycation with immunostaining, and/or then subsequently digested and/or subjected to fluorosequencing for sequencing analysis. In one embodiment, the present disclosure provides a massively parallel and/or rapid method for identifying and/or quantitating individual peptide and/or protein molecules within a given complex sample.

[0213] In some embodiments, the method of the disclosure comprises: (a) providing a polypeptide, wherein the polypeptide comprises at least one labeled internal amino acid; (b) detecting at least one signal or signal change from the polypeptide to identify at least a portion of a sequence of the polypeptide; and/or (c) subjecting the polypeptide to conditions sufficient to remove at least one amino acid from the polypeptide. In some embodiments, the at least one amino acid is removed from an N-terminus of the polypeptide. In some embodiments, subsequent to (c), the at least one labeled internal amino acid becomes a labeled terminal amino acid. In some embodiments, the at least one labeled internal amino acid is from a plurality of labeled amino acids, and/or wherein the at least one signal or signal change comprises a collective signal from the plurality of labeled amino acids. In some embodiments, the plurality of labeled amino acids comprise amino acids with different labels. In some embodiments, the different labels generate signals with different signal patterns.

[0214] In some embodiments, the at least one labeled internal amino acid comprises one or more members selected from the group consisting of lysine, glutamate, and aspartate. In some embodiments, the at least one labeled internal amino acid comprises an amino acid having a label covalently attached thereto, which label generates the at least one signal or signal change. In some embodiments, the at least one labeled internal amino acid comprises an amino acid having a dye coupled thereto, which dye generates the at least one signal or signal change. In some embodiments, the at least one signal or signal change is an optical signal. In some embodiments, the at least one signal or signal change is detected with an optical detector having single-molecule sensitivity. In some embodiments, the at least one signal or signal change comprises a plurality of signals of different intensities. In some embodiments, the at least one signal or signal change comprises a plurality of signals of different frequencies or frequency ranges.

[0215] In some embodiments, the label is coupled to an internal monomeric subunit of the plurality of monomeric subunits. In some embodiments, the label is an amino acid specific label. In some embodiments, the amino acid specific label comprises a methionine specific label, an arginine specific label, a histidine specific label, a tyrosine specific label, a carboxylic acid R-group specific label, a lysine specific label, a cysteine specific label, a tryptophan specific label, or any combination thereof. In some embodiments, the amino acid specific label comprises a non-natural amino acid specific label. In some embodiments, the non-natural amino acid specific label is a phosphoserine specific label, phosphothreonine specific label, pyroglutamic acid specific label, hydroxyproline specific label, azidolysine specific label, or dehydroalanine specific label. In some embodiments, the label is a fluorescent label. In some embodiments, the label is a dye.

[0216] In some embodiments, the at least one amino acid is removed from the polypeptide by a degradation reaction. In some embodiments, the degradation reaction is Edman degradation. In some embodiments, the method further comprises processing at least the portion of the sequence against a reference sequence to identify the polypeptide or a protein from which the polypeptide is derived. In some embodiments, the method further comprises, subsequent to (c), (i) identifying the at least the portion of the sequence of the polypeptide to identify the polypeptide, and/or (ii) using the polypeptide identified in (i) to quantify the polypeptide or a protein from which the polypeptide was derived. In some embodiments, in (a), less than all amino acids of the polypeptide are labeled. In some embodiments, the method further comprises (i) repeating (b) and/or (c) to detect at least one additional signal or signal change from the polypeptide and/or (ii) using the at least one signal or signal change and/or the at least one additional signal or signal change to identify the at least the portion of the sequence.

[0217] A characteristic feature of many fluorosequencing methods is coupling amino acid labels to a peptide to be sequenced. A label may be an amino acid specific label (e.g., configured to couple to a specific type of amino acid or a specific set of types of amino acids). A fluorosequencing method may comprise labeling a plurality of types of amino acids with separate, amino acid type specific labels. A fluorosequencing method may comprise labeling one, two, three, four, five, six, or more different types of amino acid residues in a subject peptide or protein. A plurality of amino acid residues may include, for example, an N-terminal amino acid, cysteine, lysine, glutamic acid, aspartic acid, tryptophan, tyrosine, serine, threonine, arginine, histidine, methionine, or any combination thereof. Each of these amino acid residues may be labeled with a different labeling moiety. Multiple amino acid residues may be labeled with the same labeling moiety such as (i) aspartic acid and/or glutamic acid or (ii) serine and/or threonine.

[0218] In one embodiment, a method of labeling a peptide comprises: a) providing, i) a peptide having at least one Cysteine amino acid, at least one Lysine amino acid, an N-terminal end, an amino acid having at least one carboxylate side group, a C-terminal end, and/or at least one Tryptophan amino acid, and/or ii) a first compound, iii) a second compound, iv) a third compound, v) a fourth compound, and/or vi) a fifth compound; and/or b) labeling the Cysteine with the first compound, c) labeling the Lysine with the second compound, d) labeling the N-terminal end with the third compound, e) labeling the carboxylate side group and/or the C-terminal end with the fourth compound; and/or f) labeling the Tryptophan with the fifth compound for providing a peptide having specific labels. In one embodiment, b-f are sequential in order from b-f. In one embodiment, the labeling in b-f is performed in one (a single) solution. In one embodiment, b-f are sequential in order from b-f and/or performed in one solution. In one embodiment, the first compound is iodoacetamide. In one embodiment, the second compound is 2-methylthio-2-imidazoline hydroiodide (MDI). In one embodiment, the third compound is 1-(4,4-dimethyl-2,6-dioxocyclohexylidene)-3-methylbutyl diethyl phosphate (Phos-ivDde). In one embodiment, the fourth compound is selected from the group consisting of benzylamine (BA), 3-dimethylaminopropylamine, and isobutylamine. In one embodiment, the fifth compound is 2,4-dinitrobenzenesulfenyl chloride.

[0219] In one embodiment, disclosed herein is a method of treating a peptide comprising: a) providing a plurality of peptides immobilized on a solid support, each peptide comprising an N-terminal amino acid and internal amino acids, the internal amino acids comprising Lysine, each Lysine labeled with a first label, the first label producing a first signal for each peptide, and/or the N-terminal amino acid of each peptide labeled with a second label, the second label being different from the first label; b) treating the plurality of immobilized peptides under conditions such that each N-terminal amino acid of each peptide is removed; and/or c) detecting the first signal for each peptide at the single molecule level. In one embodiment, the second label is attached via an amine-reactive dye. In one embodiment, the second label is selected from the group consisting of fluorescein isothiocyanate, rhodamine isothiocyanate, or another synthesized fluorescent isothiocyanate derivative. In one embodiment, portions of the emission spectrum of the first label do not overlap with the emission spectrum of the second label. In one embodiment, the removal of the N-terminal amino acid in b) is done under conditions such that the remaining peptides each have a new N-terminal amino acid. In one embodiment, the method further comprises d) adding the second label to the new N-terminal amino acids of the remaining peptides. In one embodiment, among the remaining peptides the new end terminal amino acid is Lysine. In one embodiment, the method further comprises e) detecting the next signal for each peptide at the single molecule level.

[0220] In one embodiment, the method further comprises an operation of treating the immobilized peptides under conditions such that each N-terminal amino acid of each peptide is removed by an Edman degradation reaction; and/or an operation of detecting the signal for each peptide at the single molecule level. In one embodiment, the label is attached to a fluorophore by a covalent bond. In one embodiment, the fluorophore and/or the covalent bond is resistant to degradation effects when incubated in an Edman degradation reaction solvent. In some embodiments, the fluorophore is a fluorophore that remains intact and/or attached to the label during Edman degradation sequencing.

[0221] The repetitive detection of signal for each peptide at the single molecule level results in a pattern. The resulting pattern is unique to a single peptide within the plurality of immobilized peptides. In one embodiment, the single-peptide pattern is compared to the proteome of an organism to identify the peptide. In one embodiment, the intensity of the labels is measured amongst the plurality of immobilized peptides. In some embodiments, the peptides are immobilized via Cysteine residues. In some embodiments, the detecting in c) is done with optics capable of single-molecule resolution. In a specific embodiment, one or more of the plurality of peptides comprises one or more unnatural amino acids.

[0222] In some embodiments, the emission spectrum of the first label does not overlap with the emission spectrum of the second label. In some embodiments, the removal of the N-terminal amino acid in b) is done under conditions such that the remaining peptides each have a new N-terminal amino acid. In one embodiment, the method further comprises d) adding the second label to the new N-terminal amino acids of the remaining peptides. In some embodiments, among the remaining peptides, the new end terminal amino acid is Lysine. In one embodiment, the method further comprises e) detecting the next signal for each peptide at the single molecule level. In one embodiment, the intensity of the first and/or second labels is measured amongst the plurality of immobilized peptides. In some embodiments, the peptides are immobilized via Cysteine residues. In some embodiments, the detecting in c) is done with optics capable of single-molecule resolution. In one embodiment, one or more of the plurality of peptides comprises one or more unnatural amino acids. In one embodiment, the unnatural amino acids comprise moieties selected from the group consisting of hydroxycarboxylates, aldehydes, thiols, and olefins. In one embodiment, one or more of the plurality of peptides comprises one or more beta amino acids.

[0223] In one embodiment, the method further comprises an operation of treating an immobilized peptide (e.g., a peptide immobilized on a support or bead) under conditions such that each N-terminal amino acid of each peptide is removed by an Edman degradation reaction; and/or an operation of detecting the signal for each peptide at the single molecule level. In some embodiments, the N-terminal amino acid removing operation and/or the detecting operation are successively repeated from about 1 time to about 5 times, from about 5 times to about 10 times, from about 10 times to about 20 times, from about 20 times to about 30 times, from about 30 times to about 40 times, from about 40 times to about 50 times, from about 50 times to about 60 times, from about 60 times to about 70 times, from about 70 times to about 80 times, from about 80 times to about 90 times, or from about 90 times to about 100 times. In some embodiments, the N-terminal amino acid removing operation and/or the detecting operation are successively repeated at least about 5 times, at least about 10 times, at least about 20 times, at least about 30 times, at least about 40 times, at least about 50 times, at least about 60 times, at least about 70 times, at least about 80 times, at least about 90 times, or at least about 100 times. In some embodiments, the N-terminal amino acid removing operation and/or the detecting operation are successively repeated about 5 times, about 10 times, about 20 times, about 30 times, about 40 times, about 50 times, about 60 times, about 70 times, about 80 times, about 90 times, or about 100 times. In some embodiments, the N-terminal amino acid removing operation and/or the detecting operation are successively repeated at most about 5 times, at most about 10 times, at most about 20 times, at most about 30 times, at most about 40 times, at most about 50 times, at most about 60 times, at most about 70 times, at most about 80 times, at most about 90 times, or at most about 100 times.

[0224] A label may comprise a detectable moiety. The detectable moiety (e.g., label) may be optically detectable (e.g., fluorescent, phosphorescent, luminescent, or light absorbing). The detectable moiety may be electrochemically detectable (e.g., a redox active moiety with a characteristic oxidation or reduction potential). The detectable moiety may comprise a mass tag (e.g., for identification with mass spectrometry). A detectable moiety may identify a label to which it is attached. A plurality of labels may comprise a plurality of detectable moieties which identify labels of the plurality of labels by their type. For example, a method may comprise a plurality of types of labels configured to couple to different amino acids, each comprising a different detectable moiety that uniquely identifies the label by its type.

[0225] Labeling specificity can be a major challenge for a fluorosequencing method. In many cases, a label may comprise reactivity toward a plurality of amino acid types. For example, some maleimide labels can react with cysteine, lysine, and/or N-terminal amines. Discriminating between similarly reactive amino acid residues can require precise ordering of labeling operations. In the above maleimide example, lysine may be discriminated from cysteine by first reacting cysteine with a cysteine specific labeling operation (e.g., iodoacetamide coupling at pH 7-8), thereby preventing further cysteine labeling in a subsequent lysine labeling operation. A method may comprise cysteine labeling before lysine labeling. A method may comprise cysteine labeling before aspartate and/or glutamate labeling. A method may comprise cysteine labeling before tryptophan labeling. A method may comprise cysteine labeling before tyrosine labeling. A method may comprise cysteine labeling before serine and/or threonine labeling. A method may comprise cysteine labeling before histidine labeling. A method may comprise cysteine labeling before arginine labeling. A method may comprise lysine labeling before glutamate labeling. A method may comprise lysine labeling before aspartate labeling. A method may comprise lysine labeling before tryptophan labeling. A method may comprise lysine labeling before tyrosine labeling. A method may comprise tyrosine labeling before lysine labeling. A method may comprise lysine labeling before serine and/or threonine labeling. A method may comprise lysine labeling before arginine labeling. A method may comprise carboxylate side chain (e.g., glutamate and/or aspartate side chain) labeling before tryptophan labeling. A method may comprise carboxylate side chain (e.g., glutamate and/or aspartate side chain) labeling before tyrosine labeling. A method may comprise carboxylate side chain (e.g., glutamate and/or aspartate side chain) labeling before serine labeling. 
A method may comprise carboxylate side chain (e.g., glutamate and/or aspartate side chain) labeling before serine and/or threonine labeling. A method may comprise carboxylate side chain (e.g., glutamate and/or aspartate side chain) labeling before histidine labeling. A method may comprise carboxylate side chain (e.g., glutamate and/or aspartate side chain) labeling before arginine labeling. A method may comprise C-terminal carboxylate labeling before lysine labeling. A method may comprise C-terminal carboxylate labeling before tyrosine labeling. A method may comprise C-terminal carboxylate labeling before histidine labeling. A method may comprise C-terminal carboxylate labeling before tryptophan labeling. A method may comprise C-terminal carboxylate labeling before glutamate and/or aspartate labeling. A method may comprise C-terminal carboxylate labeling before serine and/or threonine labeling. A method may comprise at least 2, at least 3, at least 4, at least 5, or at least 6 amino acid labeling operations performed in a sequence configured to minimize or prevent label cross-reactivity (e.g., labeling more than the intended type or types of amino acids). A method may comprise 2, 3, 4, 5, or 6 amino acid labeling operations performed in a sequence configured to minimize or prevent label cross-reactivity (e.g., labeling more than the intended type or types of amino acids).
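
As an illustration only, the ordering requirement described above can be treated as a set of "label X before Y" constraints and resolved with a topological sort. The sketch below uses a hypothetical subset of the pairings listed in this paragraph; the names `BEFORE` and `labeling_order` are assumptions introduced for the example and are not part of the disclosure. Alternative embodiments that contradict one another (for example, lysine before tyrosine together with tyrosine before lysine) cannot be combined in a single sequence, and the standard-library sorter reports this as a cycle.

```python
from graphlib import TopologicalSorter

# Hypothetical subset of "label the first residue type before the second"
# constraints, chosen from the pairings listed in the paragraph above.
BEFORE = [
    ("cysteine", "lysine"),
    ("cysteine", "carboxylate"),
    ("cysteine", "tryptophan"),
    ("lysine", "carboxylate"),
    ("lysine", "tryptophan"),
    ("carboxylate", "tryptophan"),
]


def labeling_order(constraints):
    """Return one labeling sequence that satisfies every (first, second) pair."""
    sorter = TopologicalSorter()
    for first, second in constraints:
        # TopologicalSorter.add takes a node followed by its predecessors.
        sorter.add(second, first)
    # static_order raises graphlib.CycleError if the constraints conflict.
    return list(sorter.static_order())


order = labeling_order(BEFORE)
print(" -> ".join(order))
```

A sequence produced this way only encodes the precedence relations supplied to it; the reaction conditions for each labeling operation still need to be chosen separately.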

[0226] Fluorosequencing may comprise removing peptides through techniques such as Edman degradation following or preceding subject peptide detection. Sequential peptide removal may generate sequence or position-specific information. For example, a reduction in fluorescence following an N-terminal amino acid removal operation may indicate that a labeled amino acid, and/or thus that a specific type of amino acid, was disposed at a peptide N-terminal. Removal of each amino acid residue can be carried out with a variety of different techniques including Edman degradation and/or proteolytic cleavage. The techniques may include using Edman degradation to remove the terminal amino acid residue. Alternatively, the techniques may involve using an enzyme to remove the terminal amino acid residue. These terminal amino acid residues may be removed from either the C-terminus or the N-terminus of the peptide chain. In situations where Edman degradation is used, the amino acid residue at the N-terminus of the peptide chain is removed.

[0227] In some embodiments, the sequencing by degradation comprises Edman degradation. In some embodiments, the sequencing by degradation comprises subjecting the oligomeric barcode to conditions sufficient to remove at least one monomeric subunit from the oligomeric barcode. In some embodiments, the sequencing by degradation comprises subjecting the oligomeric barcode to conditions sufficient to remove at least one amino acid from the oligomeric barcode. In some embodiments, the label generates at least one signal or at least one signal change. In some embodiments, the at least one signal or the at least one signal change is an optical signal. In some embodiments, the at least one signal or the at least one signal change comprises a plurality of signals of different intensities. In some embodiments, the at least one signal or the at least one signal change comprises a plurality of signals of different frequencies or signals of different frequency ranges.

[0228] In one embodiment, the label is attached to a fluorophore by a covalent bond. In one embodiment, the fluorophore and/or the covalent bond is resistant to degradation effects when incubated in an Edman degradation reaction solvent. A labeling moiety used in the instant application may be configured to withstand conditions for removing one or more of the amino acid residues. Some non-limiting examples of potential labeling moieties that may be used in the instant methods include, for example, those which emit a fluorescence signal in the red to infrared spectra such as an Alexa Fluor® dye, an Atto dye, a Janelia Fluor® dye, a rhodamine dye, or other similar dyes. Examples of each of these dyes which were capable of withstanding the conditions of removing the amino acid residues (or examples of fluorophores that can be used with methods or compositions according to the disclosure) include Alexa Fluor® 405, Rhodamine B, tetramethylrhodamine, Janelia Fluor® 549, Alexa Fluor® 555, Atto647N, and/or (5)6-naphthofluorescein. In some embodiments, a labeling moiety is tetramethylrhodamine, Si-Rhodamine, Rhodamine B, Rhodamine B N,N'-dimethylethylenediamine, Rhodamine B sulfenyl chloride, Alexa Fluor 555, Alexa Fluor 405, Atto647N, (5)6-naphthofluorescein, variants and/or derivations thereof, etc. In one embodiment, the fluorophore is selected from the group consisting of tetramethylrhodamine, Si-Rhodamine, Rhodamine B, Rhodamine B N,N'-dimethylethylenediamine, Rhodamine B sulfenyl chloride, Alexa Fluor 555, Alexa Fluor 405, Atto647N, (5)6-naphthofluorescein, variants and/or derivations thereof. The labeling moiety may be a fluorescent peptide or protein or a quantum dot. In some embodiments, two-color single molecule peptide sequencing reactions can be used to identify and/or quantify biomolecules by using two or more fluorescent molecules.

[0229] In some embodiments, amino acids can be removed from the carboxy terminus of a biomolecule, revealing C-terminal sequences instead of N-terminal sequences. In some embodiments, an engineered carboxypeptidase is used to mimic Edman degradation. In some embodiments, the sequencing by degradation comprises enzymatic cleavage of the oligomeric barcode from the biomolecule. In some embodiments, the sequencing by degradation comprises chemical cleavage of the oligomeric barcode from the biomolecule. In some embodiments, the chemical cleavage comprises cyanogen bromide cleavage, BNPS-skatole cleavage, formic acid cleavage, hydroxylamine cleavage, 2-nitro-5-thiocyanobenzoic acid cleavage, or any combination thereof.

[0230] In some embodiments, the methods disclosed herein comprise identifying amino acids in peptides, comprising: a) providing a plurality of peptides immobilized on a solid support, each peptide comprising an N-terminal amino acid and internal amino acids, the internal amino acids comprising Lysine, each Lysine labeled with a first label, the first label producing a first signal for each peptide, and/or the N-terminal amino acid of each peptide labeled with a second label, the second label being different from the first label and/or selected from the group consisting of Alexa fluor dyes and Atto dyes, wherein a subset of the plurality of peptides comprise an N-terminal Lysine having both the first and/or second label; b) treating the plurality of immobilized peptides under conditions such that each N-terminal amino acid of each peptide is removed by an Edman degradation reaction; and/or c) detecting the first signal for each peptide at the single molecule level under conditions such that the subset of peptides comprising an N-terminal Lysine is identified. It is preferred that the removal of the N-terminal amino acid in (b) is done under conditions such that the remaining peptides each have a new N-terminal amino acid. 
The present disclosure further contemplates, in one embodiment, a method of identifying amino acids in peptides, comprising: a) providing a plurality of peptides immobilized on a solid support, each peptide comprising an N-terminal amino acid and internal amino acids, the internal amino acids comprising Lysine, each Lysine labeled with a first label, the first label producing a first signal for each peptide, and/or the N-terminal amino acid of each peptide labeled with a second label, the second label being different from the first label and/or selected from the group consisting of Alexa fluor dyes and Atto dyes, wherein a subset of the plurality of peptides comprise an N-terminal amino acid that is not Lysine; b) treating the plurality of immobilized peptides under conditions such that each N-terminal amino acid of each peptide is removed by an Edman degradation reaction; and/or c) detecting the first signal for each peptide at the single molecule level under conditions such that the subset of peptides comprising an N-terminal amino acid that is not Lysine is identified. It is preferred that the removal of the N-terminal amino acid in (b) is done under conditions such that the remaining peptides each have a new N-terminal amino acid. It is preferred that the peptides are immobilized via Cysteine residues. In one embodiment, one or more of the plurality of peptides comprises one or more unnatural amino acids. In one embodiment, the unnatural amino acids comprise moieties selected from the group consisting of hydroxycarboxylates, aldehydes, thiols, and/or olefins. In one embodiment, one or more of the plurality of peptides comprises one or more beta amino acids.

[0231] Detecting the immobilized peptide may comprise capturing an image comprising the peptide. The image may comprise a spatial address specific to the peptide. A plurality of peptides may be detected in a single image, wherein one or more of the peptides may comprise a spatial address within the image. The surface may be optically transparent across the visible spectrum and/or the infrared spectrum. The surface may possess a low refractive index (e.g., a refractive index between 1.3 and 1.6). The surface may be between 10 and 50 nm thick, between 20 and 80 nm thick, between 50 and 200 nm thick, between 100 and 500 nm thick, between 200 and 800 nm thick, between 500 nm and 1 µm thick, between 1 and 5 µm thick, between 2 and 10 µm thick, between 5 and 20 µm thick, between 20 and 50 µm thick, between 50 and 200 µm thick, between 200 and 500 µm thick, or greater than 500 µm in thickness. The surface may be chemically resistant to organic solvents. The surface may be chemically resistant to strong acids such as trifluoroacetic acid or sulfuric acid. A large range of substrates (like fluoropolymers (Teflon-AF (Dupont), Cytop® (Asahi Glass, Japan)), aromatic polymers (polyxylenes (Parylene, Kisco, Calif.), polystyrene, polymethylmethacrylate) and/or metal surfaces (Gold coating)), coating schemes (spin-coating, dip-coating, electron beam deposition for metals, thermal vapor deposition and/or plasma enhanced chemical vapor deposition) and/or functionalization methodologies (polyallylamine grafting, use of ammonia gas in PECVD, doping of long chain end-functionalized fluoroalkanes etc.) may be used in the methods described herein as a useful surface. A 20 nm thick, optically transparent fluoropolymer surface made of Cytop® may be used in the methods described herein. The surfaces used herein may be further derivatized with a variety of fluoroalkanes that will sequester peptides for sequencing and/or modified targets for selection. 
Alternatively, an aminosilane-modified surface may be used in the methods described herein. The methods may comprise immobilizing the peptides on the surface of beads, resins, gels, quartz particles, glass beads, or combinations thereof. In some non-limiting examples, the methods contemplate using peptides that have been immobilized on the surface of Tentagel® beads, Tentagel® resins, or other similar beads or resins. The surface used herein may be coated with a polymer, such as polyethylene glycol. The surface may be amine functionalized or thiol functionalized.

[0232] A sequencing technique described herein may involve imaging the peptide or protein to determine the presence of one or more labeling moieties (e.g., amino acid labels) coupled to the peptide. The sequencing technique may comprise imaging a plurality of peptides or proteins to determine the presence of one or more labeling moieties on individual peptides from among the plurality of peptides. The sequencing technique may comprise imaging from about 10^3 to about 10^4, from about 10^4 to about 10^5, from about 10^5 to about 10^6, from about 10^6 to about 10^7, or from about 10^7 to about 10^8 proteins or peptides. The sequencing technique may comprise imaging at least about 10^3, at least about 10^4, at least about 10^5, at least about 10^6, at least about 10^7, or at least about 10^8 or more proteins or peptides (e.g., imaging a portion of a surface comprising at least about 10^3 to at least about 10^8 proteins or peptides). The sequencing technique may comprise imaging about 10^3, about 10^4, about 10^5, about 10^6, about 10^7, or about 10^8 or more proteins or peptides (e.g., imaging a portion of a surface comprising about 10^3 to about 10^8 proteins or peptides). The sequencing technique may comprise imaging at most about 10^3, at most about 10^4, at most about 10^5, at most about 10^6, at most about 10^7, or at most about 10^8 or more proteins or peptides (e.g., imaging a portion of a surface comprising at most about 10^3 to at most about 10^8 proteins or peptides).

[0233] These images may be taken after each removal of an amino acid residue and thus may enable determination of the location of the specific amino acid in the peptide sequence. For example, a C-terminal immobilized peptide may comprise a sequence (from N-terminal to C-terminal) of KDDYAGGGAAGKDA (SEQ ID NO: 5) (wherein ‘K’ denotes lysine, ‘D’ denotes aspartate, ‘Y’ denotes tyrosine, ‘A’ denotes alanine, and ‘G’ denotes glycine), and/or may comprise labels coupled to each lysine and/or tyrosine residue. A first image comprising the C-terminal immobilized peptide may indicate the presence of two lysines and/or one tyrosine in the peptide. The N-terminal amino acid may be removed (e.g., by Edman degradation), such that a second image comprising the C-terminal immobilized peptide may indicate the presence of one lysine and/or one tyrosine in the peptide. This process may be repeated until a sequence of KXXYXXXXXXXKXX is identified for the peptide, wherein ‘X’ indicates a non-lysine, non-tyrosine amino acid, ‘K’ indicates a lysine, and ‘Y’ indicates a tyrosine. A method of the present disclosure can identify the position of a specific amino acid in a peptide sequence. A method may be used to determine the locations of specific amino acid residues in the peptide sequence, or these results may be used to determine the entire list of amino acid residues in the peptide sequence. A method may involve determining the location of one or more amino acid residues in the peptide sequence and/or comparing these locations to documented peptide sequences, which may identify the entire list of amino acid residues in the peptide sequence. For example, identifying the positions of the lysines and/or cysteines in a 40 amino acid fragment of a human protein may uniquely identify the protein (e.g., only one human protein contains the specific pattern of lysine and/or cysteine residues identified in the 40 amino acid fragment).
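
The image-by-image reasoning in the example above can be sketched in a few lines of Python. This is an idealized illustration, assuming that every removal cycle succeeds, that every lysine and tyrosine carries exactly one label, and that each image reports an exact label count; the function names `label_counts_per_cycle` and `pattern_from_counts` are hypothetical and introduced only for this sketch.

```python
def label_counts_per_cycle(sequence, labeled):
    """Count the labeled residues still present after 0, 1, 2, ... removals."""
    counts = []
    for cycle in range(len(sequence) + 1):
        remaining = sequence[cycle:]  # residues left after `cycle` removals
        counts.append({aa: remaining.count(aa) for aa in labeled})
    return counts


def pattern_from_counts(counts, labeled):
    """Emit the residue letter whose count dropped at each removal, else 'X'."""
    pattern = []
    for before, after in zip(counts, counts[1:]):
        dropped = [aa for aa in labeled if after[aa] < before[aa]]
        pattern.append(dropped[0] if dropped else "X")
    return "".join(pattern)


SEQ = "KDDYAGGGAAGKDA"  # example sequence from the paragraph above
counts = label_counts_per_cycle(SEQ, "KY")
pattern = pattern_from_counts(counts, "KY")

print(counts[0])  # label counts seen in the first image
print(counts[1])  # label counts seen after one removal
print(pattern)    # one character per removed residue
```

The first two entries of `counts` correspond to the first and second images described above, and `pattern` contains one character for each residue removed.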

[0234] An imaging method may involve a variety of different spectrophotometric and/or microscopy methods, such as fluorimetry, diffuse reflectance, interferometric scattering, Raman, resonance enhanced Raman, infrared absorbance, visible light absorbance, ultraviolet absorbance, and/or fluorescence. In some embodiments, a microscope equipped with total internal reflection illumination and/or an intensified charge-coupled device (CCD) detector may be used for imaging. Depending on the absorption and/or emission spectra of fluorescent Edman labels employed, appropriate filters can be used to record the emission intensity of the labels. The fluorescent methods may employ fluorescence techniques such as fluorescence polarization, Förster resonance energy transfer (FRET), or time-resolved fluorescence. A spectrophotometric or microscopy method may be used to determine the presence of one or more fluorophores coupled to a single peptide. Such imaging methods may be used to determine the presence or absence of a label on a specific peptide sequence. After repeated cycles of removing an amino acid residue and/or imaging a subject peptide, the position of the labeled amino acid residue can be determined in the peptide.

[0235] For each Edman cycle, the fluorescence intensity of a label is recorded after each cleavage operation. The loss and/or uptake of a label after each cleavage operation and/or coupling operation serves as 1) a counter for the number of amino acid residues removed, and/or 2) an internal error control indicating the successful completion of each round of Edman degradation for each immobilized peptide.

[0236] Following image processing to filter noise and/or identify the location of peptides, and/or to map the locations of the same peptides across the set of collected images, intensity profiles for labels are associated with each peptide as a function of Edman cycle. The label intensity profile of each error free peptide sequencing reaction is transformed into a binary sequence in which a “1” precedes a drop in fluorescence intensity and/or its location (e.g., position within the binary sequence) identifies the number of Edman cycles performed. A database of predicted potential proteins is used as a reference database. The binary intensity profile of each peptide, as generated from the single molecule microscopy, is then compared to the entries in the simulated peptide database. Quantification can be accomplished by counting peptides derived from each protein observed.
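
A minimal sketch of this analysis step is given below, under stated assumptions: intensities are expressed in units of a single label's brightness, a drop of at least `min_drop` counts as the loss of one label, a single residue type is labeled, and the reference database is a small hypothetical dictionary of peptide sequences. All function names, the threshold, and the database entries are illustrative and are not taken from the disclosure.

```python
from collections import Counter


def binary_profile(intensities, min_drop=0.5):
    """Write '1' for each cycle whose removal lowers the intensity, else '0'.

    intensities[i] is the label intensity recorded after i Edman cycles.
    """
    bits = []
    for before, after in zip(intensities, intensities[1:]):
        bits.append("1" if before - after >= min_drop else "0")
    return "".join(bits)


def predicted_profile(peptide, labeled_residue, n_cycles):
    """Profile expected from a known sequence over the first n_cycles residues."""
    bits = "".join("1" if aa == labeled_residue else "0" for aa in peptide[:n_cycles])
    return bits.ljust(n_cycles, "0")  # pad when the peptide is shorter


def match_proteins(observed, database, labeled_residue):
    """Return the names of database entries whose predicted profile matches."""
    n = len(observed)
    return [name for name, peptide in database.items()
            if predicted_profile(peptide, labeled_residue, n) == observed]


def quantify(traces, database, labeled_residue):
    """Count observed peptides per protein, keeping only unambiguous matches."""
    tally = Counter()
    for trace in traces:
        hits = match_proteins(binary_profile(trace), database, labeled_residue)
        if len(hits) == 1:
            tally[hits[0]] += 1
    return tally


# Hypothetical reference database and lysine-label intensity traces.
DATABASE = {"protein_A": "KDDYAGGGAAGKDA", "protein_B": "AKDGKAYGDA"}
TRACES = [
    [2.1, 1.0, 1.05, 0.95, 1.0],  # drop at cycle 1
    [2.0, 1.9, 1.0, 1.0, 0.9],    # drop at cycle 2
    [1.9, 1.0, 1.0, 1.1, 1.0],    # drop at cycle 1
]
tally = quantify(TRACES, DATABASE, "K")
print(tally)
```

Traces that match no entry or more than one entry are skipped in this sketch; a practical pipeline would likely score partial matches and model failed cycles rather than discard such reads.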

Applications

[0237] In some cases, the methods of the disclosure may be used to modulate a ubiquitin proteasome system using proteolysis targeting chimeras (PROTACs). In some cases, the methods of the disclosure may be used to stratify patient groups based on quantification of ubiquitin:target protein stoichiometry on a target protein. In some cases, the methods of the disclosure may be used to monitor progress of a therapy by measuring ubiquitination levels of a target protein.

[0238] In some cases, the methods of the disclosure may be used to sequence a ubiquitinated protein in a patient sample to identify a diseased state. In some cases, the methods of the disclosure may use targeted protein sequencing to map positions of ubiquitin in a patient sample. In some cases, the methods of the disclosure may be used to map the position of one or more ubiquitination sites. In some cases, the methods of the disclosure may be used to map the position of at least 1, at least 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, or at least about 50 ubiquitination sites. In some cases, the methods of the disclosure may be used to map the position of at least 1 ubiquitination site. In some cases, the methods of the disclosure may be used to map the position of at least 5 ubiquitination sites. In some cases, the methods of the disclosure may be used to map the position of at least 10 ubiquitination sites.

[0239] In some cases, the methods of the disclosure may be used to map the position of 1, 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, or about 50 ubiquitination sites. In some cases, the methods of the disclosure may be used to map the position of at most 1, at most 2, at most about 3, at most about 4, at most about 5, at most about 6, at most about 7, at most about 8, at most about 9, at most about 10, at most about 15, at most about 20, at most about 25, at most about 30, at most about 35, at most about 40, at most about 45, or at most about 50 ubiquitination sites.

[0240] In some cases, methods of the disclosure may be used to identify the presence of a neurodegenerative disease or disorder. In some cases, the neurodegenerative disease or disorder is selected from the group consisting of ischemic and hemorrhagic stroke, spinal cord injury, brain injury, Huntington's disease, Alzheimer's disease, Parkinson's disease, Schizophrenia, Autism, Ataxia, Amyotrophic Lateral Sclerosis, Lou Gehrig's Disease, Lyme Disease, Meningitis, Migraine, Motor Neuron Diseases, Neuropathy, pain, brain damage, brain dysfunction, spinal cord disorders, peripheral nervous system disorders, cranial nerve disorders, autonomic nervous system disorders, seizure disorders such as epilepsy, movement disorders such as Parkinson's disease, sleep disorders, headaches, lower back and neck pain, neuropathic pain, delirium and dementia such as Alzheimer's disease, dizziness and vertigo, stupor and coma, head injury, stroke, tumors of the nervous system, multiple sclerosis and other demyelinating diseases, infections of the brain or spinal cord, and prion diseases.

[0241] In some cases, methods of the disclosure may be used to identify a ubiquitinylated Parkinson’s disease biomarker, for example, α-synuclein. In some cases, methods of the disclosure may be used to identify a ubiquitinylated Alzheimer’s disease biomarker, for example, Tau. In some cases, methods of the disclosure may be used to identify a ubiquitinylated Huntington’s disease biomarker, for example, Htt.

[0242] In some cases, methods of the disclosure may be used to identify the presence of a cancer in a subject. A method of the disclosure may be used to, for example, slow the proliferation of cancer cell lines, or kill cancer cells. Non-limiting examples of cancer that may be identified using the methods disclosed herein include: acute lymphoblastic leukemia, acute myeloid leukemia, adrenocortical carcinoma, AIDS-related cancers, AIDS-related lymphoma, anal cancer, appendix cancer, astrocytomas, basal cell carcinoma, bile duct cancer, bladder cancer, bone cancers, brain tumors, such as cerebellar astrocytoma, cerebral astrocytoma/malignant glioma, ependymoma, medulloblastoma, supratentorial primitive neuroectodermal tumors, visual pathway and hypothalamic glioma, breast cancer, bronchial adenomas, Burkitt lymphoma, carcinoma of unknown primary origin, central nervous system lymphoma, cerebellar astrocytoma, cervical cancer, childhood cancers, chronic lymphocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorders, colon cancer, cutaneous T-cell lymphoma, desmoplastic small round cell tumor, endometrial cancer, ependymoma, esophageal cancer, Ewing's sarcoma, germ cell tumors, gallbladder cancer, gastric cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal tumor, gliomas, hairy cell leukemia, head and neck cancer, heart cancer, hepatocellular (liver) cancer, Hodgkin lymphoma, Hypopharyngeal cancer, intraocular melanoma, islet cell carcinoma, Kaposi sarcoma, kidney cancer, laryngeal cancer, lip and oral cavity cancer, liposarcoma, liver cancer, lung cancers, such as non-small cell and small cell lung cancer, lymphomas, leukemias, macroglobulinemia, malignant fibrous histiocytoma of bone/osteosarcoma, medulloblastoma, melanomas, mesothelioma, metastatic squamous neck cancer with occult primary, mouth cancer, multiple endocrine neoplasia syndrome, myelodysplastic syndromes, myeloid leukemia, nasal cavity and paranasal sinus 
cancer, nasopharyngeal carcinoma, neuroblastoma, non-Hodgkin lymphoma, non-small cell lung cancer, oral cancer, oropharyngeal cancer, osteosarcoma/malignant fibrous histiocytoma of bone, ovarian cancer, ovarian epithelial cancer, ovarian germ cell tumor, pancreatic cancer, pancreatic cancer islet cell, paranasal sinus and nasal cavity cancer, parathyroid cancer, penile cancer, pharyngeal cancer, pheochromocytoma, pineal astrocytoma, pineal germinoma, pituitary adenoma, pleuropulmonary blastoma, plasma cell neoplasia, primary central nervous system lymphoma, prostate cancer, rectal cancer, renal cell carcinoma, renal pelvis and ureter transitional cell cancer, retinoblastoma, rhabdomyosarcoma, salivary gland cancer, sarcomas, skin cancers, skin carcinoma merkel cell, small intestine cancer, soft tissue sarcoma, squamous cell carcinoma, stomach cancer, T-cell lymphoma, throat cancer, thymoma, thymic carcinoma, thyroid cancer, trophoblastic tumor (gestational), cancers of unknown primary site, urethral cancer, uterine sarcoma, vaginal cancer, vulvar cancer, Waldenstrom macroglobulinemia, and Wilms tumor.

[0243] In some cases, the methods of the disclosure may be used to sequence a biochemical process. In some cases, the methods of the disclosure may be used to quantify the number of ubiquitin moieties on a parent protein. In some cases, the methods of the disclosure may be used to quantify at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 ubiquitin moieties on a parent protein. In some cases, the methods of the disclosure may be used to quantify at least 1 ubiquitin moiety on a parent protein. In some cases, the methods of the disclosure may be used to quantify at least 5 ubiquitin moieties on a parent protein. In some cases, the methods of the disclosure may be used to quantify at least 10 ubiquitin moieties on a parent protein.

[0244] In some cases, the methods of the disclosure may be used to determine a ubiquitin chain length, for example, the ratio of number of ubiquitin moieties to a parent protein. In some cases, the methods of the disclosure may be used to determine a ubiquitin:parent protein ratio of from about 1:1 to about 3:1, from about 3:1 to about 6:1, from about 6:1 to about 10:1, from about 10:1 to about 20:1, from about 20:1 to about 30:1, or from about 30:1 to about 50:1. In some cases, the methods of the disclosure may be used to determine a ubiquitin:parent protein ratio of about 1:1, about 2:1, about 3:1, about 4:1, about 5:1, about 6:1, about 7:1, about 8:1, about 9:1, about 10:1, about 15:1, about 20:1, about 25:1, about 30:1, about 35:1, about 40:1, about 45:1, or about 50:1. In some cases, the methods of the disclosure may be used to determine a ubiquitin:parent protein ratio of about 1:1. In some cases, the methods of the disclosure may be used to determine a ubiquitin:parent protein ratio of about 5:1. In some cases, the methods of the disclosure may be used to determine a ubiquitin:parent protein ratio of about 10:1.

Computer Systems

[0245] The present disclosure provides computer systems that are programmed to implement methods of the disclosure. FIG. 1 shows a computer system 101 that is programmed or otherwise configured to implement methods or parts of methods disclosed herein, including compiling, analyzing, and displaying data obtained through the present methods. The computer system 101 may regulate various aspects of the present disclosure, such as, for example, controlling cell partitioning and optical imaging devices. The computer system 101 may be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device may be a mobile electronic device.

[0246] The computer system 101 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 105, which may be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 101 also includes memory or memory location 110 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 115 (e.g., hard disk), communication interface 120 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 125, such as cache, other memory, data storage and/or electronic display adapters. The memory 110, storage unit 115, interface 120 and peripheral devices 125 are in communication with the CPU 105 through a communication bus (solid lines), such as a motherboard. The storage unit 115 may be a data storage unit (or data repository) for storing data. The computer system 101 may be operatively coupled to a computer network (“network”) 130 with the aid of the communication interface 120. The network 130 may be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 130 in some cases is a telecommunication and/or data network. 
The network 130 may include one or more computer servers, which may enable distributed computing, such as cloud computing. The network 130, in some cases with the aid of the computer system 101, may implement a peer-to-peer network, which may enable devices coupled to the computer system 101 to behave as a client or a server.

[0247] The CPU 105 may execute a sequence of machine-readable instructions, which may be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 110. The instructions may be directed to the CPU 105, which may subsequently program or otherwise configure the CPU 105 to implement methods of the present disclosure. Examples of operations performed by the CPU 105 may include fetch, decode, execute, and writeback.

[0248] The CPU 105 may be part of a circuit, such as an integrated circuit. One or more other components of the system 101 may be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).

[0249] The storage unit 115 may store files, such as drivers, libraries and saved programs. The storage unit 115 may store user data, e.g., user preferences and user programs. The computer system 101 in some cases may include one or more additional data storage units that are external to the computer system 101, such as located on a remote server that is in communication with the computer system 101 through an intranet or the Internet.

[0250] The computer system 101 may communicate with one or more remote computer systems through the network 130. For instance, the computer system 101 may communicate with a remote computer system of a user (e.g., a fluorimeter or a cell sorting device). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC’s (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user may access the computer system 101 via the network 130.

[0251] Methods as described herein may be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 101, such as, for example, on the memory 110 or electronic storage unit 115. The machine executable or machine-readable code may be provided in the form of software. During use, the code may be executed by the processor 105. In some cases, the code may be retrieved from the storage unit 115 and stored on the memory 110 for ready access by the processor 105. In some situations, the electronic storage unit 115 may be precluded, and machine-executable instructions are stored on memory 110.

[0252] The code may be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or may be compiled during runtime. The code may be supplied in a programming language that may be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

[0253] Aspects of the systems and methods provided herein, such as the computer system 101, may be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” often in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code may be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media may include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.

[0254] Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

[0255] The computer system 101 may include or be in communication with an electronic display 135 that comprises a user interface (UI) 140 for providing, for example, orders and options for controlling flow rates in a cell sorting device. Examples of UI’s include, without limitation, a graphical user interface (GUI) and web-based user interface.

[0256] Methods and systems of the present disclosure may be implemented by way of one or more algorithms. An algorithm may be implemented by way of software upon execution by the central processing unit 105. The algorithm may, for example, determine a correlation using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), Support Vector Machine (SVM), Naive Bayes, Random Forest, or any other suitable method.

[0257] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

EXAMPLES

EXAMPLE 1: Reaction of terminal glycine with acetylacetone and acetone

[0258] FIG. 3A shows the reaction of a glycine residue, acetylacetone, and acetone. A glycine residue is reacted with acetylacetone and acetone in a 1:1 mixture of DMF and pyridine. The resulting mixture is heated in a microwave to form an end product with a 5-membered ring structure, 2,3-dimethyl-5-(propan-2-ylidene)-3,5-dihydro-4H-imidazol-4-one. The 5-membered ring structure was selectively formed on an N-terminal glycine residue due to the presence of additional hydrogen atoms at the alpha carbon, which are unavailable on other amino acid residues. FIG. 3B shows the proposed reaction mechanism for labeling an N-terminal glycine.

[0259] The proposed mechanism was validated by introducing an external ketone (e.g., acetophenone) to compete with the acetone released during the reaction process, which resulted in the formation of a GFP-chromophore mimic structure. FIG. 4A shows that an external ketone can be used to compete with the acetone to form a GFP-chromophore mimicking structure.

[0260] A cross-talk experiment was performed using two different 1,3-diketones, acetylacetone and heptane-3,5-dione. The ketone released from C-C bond cleavage was confirmed by mass spectrometry. FIG. 4B shows the results of the cross-talk experiment. FIG. 5 shows the mass spectrometry data resulting from the cross-talk experiment. The data show that the expected products were detected, including 2,3-dimethyl-5-(propan-2-ylidene)-3,5-dihydro-4H-imidazol-4-one, 2-ethyl-3-methyl-5-(propan-2-ylidene)-3,5-dihydro-4H-imidazol-4-one, (Z)-5-(butan-2-ylidene)-2,3-dimethyl-3,5-dihydro-4H-imidazol-4-one, and (Z)-5-(butan-2-ylidene)-2-ethyl-3-methyl-3,5-dihydro-4H-imidazol-4-one.

EXAMPLE 2: Acetophenone-acetone competition reaction

[0261] Glycine-N-methylamide hydrochloride (188.4 mg, 1.5 mmol) and 3 Å molecular sieves (150 mg) were added to a 10 mL reaction vessel containing a mixed solvent of pyridine and DMF (2 mL, 1:1 v/v). Next, acetylacetone (51 µL, 0.5 mmol) and acetophenone (292.5 µL, 2.5 mmol) were added, and the vessel was heated to 130 °C in a microwave (CEM, Discover 2.0) and reacted for 1.5 h. Pyridine was removed under reduced pressure, and the resulting solution was diluted with ethyl acetate (50 mL) and washed with brine (3 × 50 mL). The organic phases were then combined, dried over anhydrous sodium sulfate, concentrated under reduced pressure, and purified by flash column chromatography on silica gel (0 - 70% ethyl acetate in hexane). The end products 1 and 2 were obtained as orange solids in 35% and 16% yields, respectively.

EXAMPLE 3: Optimization of glycine modification reaction using acetylacetone

[0262] The glycine modification reaction with acetylacetone was optimized by performing the reaction in the presence or absence of microwave heating and/or 3 Å molecular sieves (MS); changing the equivalents of diketone and/or glycine; adding additives (e.g., TMSOTf, TiCl4, acetone, proline, 10-camphorsulfonic acid, and/or H3PO4); modifying solvents (e.g., acetic acid, toluene, 1,4-dioxane, 1,2-dichloroethane, DMF, and/or pyridine); and modifying reaction temperatures and/or reaction times. The resulting reaction yields are shown in TABLE 3 and TABLE 4. The optimal conditions were determined to be: acetone as additive; a 2.7:3:1 ratio of acetone:glycine amide:1,3-diketone; 150 mg of 3 Å MS (for 0.5 mmol diketone); microwave heating; a reaction temperature of 130 °C; and a reaction time of 1.5 h (bolded row in TABLE 4).

TABLE 3

TABLE 4

[0263] Reaction conditions for the synthesis of GFP-chromophore analogues were investigated using acetophenone. Conditions for glycine modification using acetylacetone in the presence of acetophenone were further optimized by modifying reactant ratios and reaction time, and by performing the reaction in the presence or absence of 3 Å MS (TABLE 5). The optimal reaction conditions were determined to be acetophenone:glycine amide:1,3-diketone = 5:3:1, 150 mg of 3 Å MS (for 0.5 mmol diketone), microwave heating at 130 °C, and a reaction time of 1.5 h. FIG. 6 shows preliminary tests to determine the substrate scope of the GFP-chromophore analogue synthesis using 5-hydroxy-2,3-dihydro-1H-inden-1-one, 5-hydroxy-2,3-dihydro-1H-inden-1-one, 1-(4-hydroxyphenyl)ethan-1-one, 2-amino-N-hexylacetamide hydrochloride, 2-amino-N-isopropylacetamide hydrochloride, 2-amino-N-phenylacetamide hydrochloride, and 2-amino-N-benzylacetamide. FIG. 7 and FIG. 8 show LC-MS results of modifying peptides comprising terminal glycine moieties to form terminal 2,3-dimethyl-5-(propan-2-ylidene)-3,5-dihydro-4H-imidazol-4-one moieties, which indicate that the expected product was obtained.
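
The stated optimum (acetophenone:glycine amide:1,3-diketone = 5:3:1, with 150 mg of 3 Å MS per 0.5 mmol of diketone) can be scaled to other batch sizes by simple arithmetic. The following Python sketch illustrates this; the function name, the dictionary keys, and the linear scaling of the molecular sieve mass with the amount of diketone are illustrative assumptions and are not part of the disclosed procedure.

```python
def scale_reaction(diketone_mmol, ratio=(5.0, 3.0, 1.0), sieves_mg_per_mmol=150.0 / 0.5):
    """Return reagent amounts for a chosen amount of 1,3-diketone.

    ratio is ketone : glycine amide : 1,3-diketone (5:3:1 in the text).
    Scaling the sieve mass linearly with the diketone is an assumption.
    """
    if diketone_mmol <= 0:
        raise ValueError("diketone amount must be positive")
    ketone_eq, glycine_eq, diketone_eq = ratio
    return {
        "ketone_mmol": diketone_mmol * ketone_eq / diketone_eq,
        "glycine_amide_mmol": diketone_mmol * glycine_eq / diketone_eq,
        "diketone_mmol": diketone_mmol,
        "sieves_mg": diketone_mmol * sieves_mg_per_mmol,
    }


# At the 0.5 mmol diketone scale used in the text.
amounts = scale_reaction(0.5)
for name, value in amounts.items():
    print(f"{name}: {value:g}")
```

At 0.5 mmol of diketone the sketch returns 2.5 mmol of ketone and 1.5 mmol of glycine amide, which matches the quantities given in EXAMPLE 2.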

TABLE 5

EXAMPLE 4: Labeling of ubiquitin proteins (prophetic)

[0264] FIG. 2 illustrates a method of mapping ubiquitination on a peptide. A ubiquitin moiety is cleaved from a ubiquitinated parent protein by tryptic cleavage. The tryptic cleavage results in a GG-ubiquitin tag, which comprises the isopeptide bond from the ubiquitin moiety. The remaining N-terminal glycine moiety is specifically labeled with a fluorophore, and the peptide is sequenced to map the position of the ubiquitin moiety on the parent peptide.
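
The origin of the GG tag can be illustrated with a short in-silico digest. The Python sketch below applies a commonly used simplified trypsin rule (cleavage after K or R unless the next residue is P) to the last six residues of ubiquitin, which end in the sequence RGG. The function name and the simplified cleavage rule are assumptions made for illustration; real digests also produce missed cleavages, which are ignored here.

```python
def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P.

    This is a simplified trypsin rule; missed cleavages are not modeled.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


# C-terminal residues of ubiquitin; the final glycine is the one joined
# to the substrate lysine through the isopeptide bond.
UBIQUITIN_C_TERMINUS = "LRLRGG"

fragments = tryptic_digest(UBIQUITIN_C_TERMINUS)
remnant = fragments[-1]
print(fragments, remnant)
```

Because the last tryptic site in ubiquitin precedes the two C-terminal glycines, the fragment that stays attached to the substrate lysine is the GG dipeptide, whose free N-terminal glycine is the labeling target described above.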

[0265] Isolation of ubiquitinated proteins: 2 L of yeast W303 is grown in rich media to mid-log phase. The cells are lysed by bead beating in 6 M guanidine-HCl, 1 M glycine, adjusted to pH 9.0. The soluble fraction, comprising the proteins, is isolated and the protein concentration is measured using a BCA Assay (Thermo, Cat# 23225). Immunoprecipitation of ubiquitin-tagged proteins is performed using the Pierce Ubiquitin Enrichment kit (Thermo, Cat# 89899) according to the vendor's instructions.

[0266] Trypsin digestion: The enriched proteins are denatured and cysteines are alkylated with 10 mM iodoacetamide at room temperature in the dark. The reduced and alkylated samples are incubated with sequencing-grade trypsin at a protein:trypsin ratio of 30:1 at 37 °C with mild shaking for 6 hours. The peptides are desalted using C18 tips (Thermo, Cat# 84850), dried to completion, and stored at -20 °C.

[0267] N-terminal Glycine labeling with Azide-Acetophenone: 0.1 mL of peptide is mixed with a mixture of pyridine and DMF (1 mL, 1:1 v/v). Next, acetylacetone (51 µL, 0.5 mmol) and azide-acetophenone (292.5 µL, 2.5 mmol; CAS # 223513-47-1; Enamine, Cat# EN300-1707019) are added, and the vessel is heated to 130 °C in a microwave (CEM, Discover 2.0) and reacted for 1.5 h. Next, pyridine is removed under reduced pressure and the solution is diluted in sodium phosphate buffer (pH 7.5, 100 mM).

[0268] Sample preparation for fluorosequencing: Trypsin digestion yields peptides bearing a GG tag coupled to the epsilon amine of the lysine (on the parent protein). The G-G residue marks the ubiquitin tag. C-termini differentiation and tyrosine and acidic residue labeling are performed following the immobilization of peptides on a solid support via the pyridinecarboxaldehyde moiety on the support. Tyrosine and acidic residues (glutamic acid, aspartic acid) are labeled with the fluorophores Atto647N (Atto-tec) and Janelia Fluor (JF549, Tocris), respectively. The azide moiety generated through glycine labeling is reacted with DBCO-Pro30-Atto495 (Atto-tec). The results are fluorescently labeled peptides, in which the C-termini are labeled with an alkyne moiety and the tyrosine and acidic residues are labeled with fluorophores.

[0269] Fluorosequencing: For single-molecule peptide sequencing, 40 mm German Desag 263 borosilicate glass coverslip (Bioptechs) surfaces are first cleaned by UV/ozone and then functionalized by soaking the coverslips for 30 min in methanol containing 0.01% azidopropyltriethoxysilane (Gelest) and 4 mM acetic acid. Weakly attached silane is removed by agitating the coverslips gently for 10 min in a bath of methanol, and subsequently gently agitating the coverslips for 10 min in water. The coverslip with immobilized peptides is dried under a nitrogen gas stream and baked in a vacuum oven for 20 min at 110 °C. Peptides are covalently coupled to the coverslip surface via copper-catalyzed click chemistry between the alkyne-modified C-terminal AA residue and the azido silane. Coverslips are incubated with a fresh solution of 2 mM copper sulfate, 1 mM tris(3-hydroxypropyltriazolylmethyl)amine (Sigma), 20 mM HEPES (pH 8.0), and 5 mM sodium ascorbate with fluorescently labeled angiotensin for 30 min at room temperature. The coverslips are then washed with water to remove unbound peptides and dried under a nitrogen gas stream. Single-molecule sequencing is performed as described. Fluorosequencing datasets are analyzed using the SigProc software tool.

[0270] Data Analysis: The ubiquitinated peptides generate a signal corresponding to the fluorescently labeled GG residue. The fluorosequence generated from the parent peptide is used to identify the protein and the site of ubiquitin modification.
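
The labeling scheme described in this example (tyrosine with Atto647N, aspartic and glutamic acid with JF549, and the glycine-derived azide with Atto495) implies a per-fluorophore pattern of label positions along the parent peptide. The Python sketch below tabulates that pattern. It is a minimal illustration and is not the SigProc tool; the peptide sequence, the function name, and the choice to report the GG label at the position of the modified lysine are hypothetical.

```python
# Residue-to-fluorophore assignments taken from the sample preparation text.
LABELS = {"Y": "Atto647N", "D": "JF549", "E": "JF549"}
GG_LABEL = "Atto495"  # attached to the azide installed on the GG remnant


def label_positions(peptide, gg_site):
    """Map each fluorophore to the 1-based peptide positions it occupies.

    gg_site is the 1-based position of the lysine carrying the GG remnant.
    Reporting the GG label at the lysine position is an assumption.
    """
    if peptide[gg_site - 1] != "K":
        raise ValueError("the GG remnant must sit on a lysine")
    positions = {}
    for index, residue in enumerate(peptide, start=1):
        dye = LABELS.get(residue)
        if dye is not None:
            positions.setdefault(dye, []).append(index)
    positions.setdefault(GG_LABEL, []).append(gg_site)
    return positions


# Hypothetical tryptic peptide with a ubiquitinated lysine at position 4.
pattern = label_positions("AYEKLDR", gg_site=4)
print(pattern)
```

A pattern of this kind, compared against the patterns predicted for candidate proteins, is one way the protein and the modified lysine could be identified.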

EXAMPLE 5: N-terminal-glycine Selective Peptide Labelling, and the Introduction of Fluorophores

[0271] Chemical methods for the selective labeling of amino acids (AAs) in peptides have drawn significant attention due to their potential use in bioconjugation, peptide-based drug discovery, and proteomic studies. Although various peptide modification methods are now available, AA-selective N-terminal labeling is still challenging, especially for glycine, which has no side chain at the α-carbon position.

[0272] Toward the goal of methylene group labeling, a new strategy to selectively label an N-terminal glycine was envisaged, one that can perform “double” activation on the α-methylene group and incorporate both the amine and methylene positions in the product (FIG. 9).

[0273] The present disclosure provides a mechanistic postulate that guides selective reactivity to N-terminal glycine involving a double activation at its α-methylene group, which allows this AA to be differentiated from the other 19 AAs. This postulate led to the discovery of two reactions that commence with N-terminal glycine and β-diketones, resulting in imidazolidinone and/or pyrrole-based products in substantial yields. Depending upon the β-diketone used, the products can be fluorescent. Mechanistic investigations revealed that the imidazolidinones formed via a C-C cleavage-induced molecular fragmentation and subsequent recombination mechanism. This allowed production of GFP-like chromophores from N-glycine-terminated molecules via a one-pot reaction. Interestingly, when aromatic ring-substituted diketones were employed, the reaction generated pyrrole moieties. The reaction generating pyrroles was further investigated in the labeling of peptides and was found to be specific for N-terminal glycine over other AAs.

Mechanistic Postulate

[0274] Our approach to selective methylene/glycine labeling started with a mechanistic postulate that involved a cascade of imine formation and double activation of the glycine methylene α-carbon via enols, providing the opportunity to differentiate and label N-terminal glycine because the α-carbons of all other AAs are methine groups. It was postulated that the reaction of 1,3-diketones (e.g., beta-diketones) with glycine commences with condensation between the primary amine and one of the carbonyl groups to form an imine (Step A, FIG. 10), followed by an enol tautomerization precedented by the Rai study (Step B, FIG. 10), an intramolecular cyclization (Step C, FIG. 10), elimination of water (Step D, FIG. 10), and a tautomerization (Step E, FIG. 10) to form a pyrrole.

[0275] Herein, systematic investigations on the selective pyrrole formation with N-glycine terminated peptides are described. First discussed is an unanticipated product (an imidazolidinone) and mechanistic analysis for its formation. With an understanding of this mechanism, rational reagent modification led toward devising methods for the target pyrrole formation. Importantly, both reactions were found to be specific to N-terminal glycine, and can be implemented with N-glycine terminated peptides. Lastly, depending upon reagent structure, the resulting imidazolidinones and pyrroles are fluorescent, allowing for selective fluorescence tagging of N-glycine terminated peptides.

An unanticipated product

[0276] With the mechanistic postulate in hand (Scheme 1), the possibility of this pathway was first tested. Glycine-N-methylamide hydrochloride (GMA) and acetylacetone were chosen as the model substrates, and the reactions were performed under various conditions (FIG. 15). However, instead of a pyrrole, a product with the same molar mass was isolated and identified as the imidazolidinone-based product MDZ-1 via X-ray diffraction (XRD) (FIG. 16). The unusual structure suggested that a carbon-carbon bond cleavage and recombination was involved in the reaction. To understand the reaction, a pathway was proposed which starts with an imine formation (FIG. 11 panel a, Step A). However, instead of enolization and double activation at the glycine α-carbon, an intramolecular cyclization occurred between the amide nitrogen and the activated imine (FIG. 11 panel a, Step B), a step that has been exploited in other N-terminal peptide labeling studies. Subsequently, an acetone molecule was released via a C-C bond cleavage (FIG. 11 panel a, Step C, a retro-Mannich reaction). Enolization of the ring generates an aromatic product (FIG. 11 panel a, Step D), which then recombined with the “parent” fragment via a condensation reaction at the methylene position (FIG. 11 panel a, Step E), ultimately generating the imidazolidinone product MDZ-1 after eliminating water (FIG. 11 panel a, Step F).

[0277] To test this mechanistic hypothesis, a crossover study of GMA with a 1:1 mixture of acetylacetone and 3,5-heptanedione (FIG. 11 panel b) was designed. If the proposed reaction pathway was correct, it was hypothesized that four combinations of the released ketones with their parent imidazolidinone rings (FIG. 11 panel b) would be generated by the reaction. As is evident in FIG. 16, the mass peaks of the corresponding imines IM-1 and IM-2, the imidazolidinone intermediates INT-1 and INT-2, and all four of the products were detected by mass spectrometry, validating the proposed fragmentation pathway. After elucidating the fragmentation, the reaction conditions were optimized by varying the solvents, reaction time, additives, and temperature (FIG. 15). Finally, the reaction yield increased to at least 55% with the assistance of microwave heating at 130 °C for 1.5 hours in a mixed solvent of N,N-dimethylformamide (DMF) and pyridine (1:1). The increase in yield from 33% to 54% after the external addition of three equiv. of acetone was further evidence supporting the proposed reaction pathway, in which acetone was first released after C-C cleavage and then recombined to form the end product (FIG. 11 panel a, Steps C, D, and E). When acetylacetone was replaced by 3-methylacetylacetone, a mixture of trans and cis isomers (1:1 ratio) of MDZ-5 was obtained in 44% yield. The results revealed that ketones other than acetone can also be installed into the end product at the recombination step (FIG. 11 panel a, Step E), bringing the opportunity to expand the product diversity by incorporating exogenous ketones.
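
The mass-balance logic of the crossover study can be sketched in Python: each product is treated as one ring fragment plus one released ketone, minus one water from the final condensation. The molecular formulas below are inferred from the product names listed in EXAMPLE 1 and from the assumption that 3,5-heptanedione releases 2-butanone; the labels, the function names, and the protonated-ion ([M+H]+) convention are illustrative assumptions, not values read from FIG. 16.

```python
from collections import Counter
from itertools import product

# Monoisotopic masses of the most abundant isotopes (Da) and of the proton.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647

# Ring fragments left after ketone release (formulas inferred, see lead-in).
RINGS = {
    "2-methyl ring (from acetylacetone)": Counter(C=5, H=8, N=2, O=1),
    "2-ethyl ring (from 3,5-heptanedione)": Counter(C=6, H=10, N=2, O=1),
}
# Ketones assumed to be released by the C-C cleavage of each diketone.
KETONES = {
    "acetone": Counter(C=3, H=6, O=1),
    "2-butanone": Counter(C=4, H=8, O=1),
}
WATER = Counter(H=2, O=1)


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in an element-count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def crossover_products():
    """Return (ring, ketone, formula, [M+H]+) for every ring/ketone pairing."""
    results = []
    for (ring_name, ring), (ketone_name, ketone) in product(RINGS.items(), KETONES.items()):
        formula = ring + ketone      # recombination of the two fragments
        formula.subtract(WATER)      # loss of water in the condensation step
        mz = round(monoisotopic_mass(formula) + PROTON, 4)
        results.append((ring_name, ketone_name, dict(formula), mz))
    return results


products = crossover_products()
for ring_name, ketone_name, formula, mz in products:
    print(f"{ring_name} + {ketone_name}: {formula} [M+H]+ = {mz}")
```

Under these assumptions the two mixed products (2-methyl ring with 2-butanone, and 2-ethyl ring with acetone) share the formula C9H14N2O, so they would appear at the same m/z and could not be told apart by mass alone.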

[0278] Because the imidazolidinone products share the same core structure with GFP chromophores, it was investigated whether the ability to incorporate exogenous ketones can be used to directly create GFP chromophore derivatives, or analogous fluorescent moieties, onto glycine-terminated molecules. The hypothesis was examined using 3 equiv. of acetophenone as the added ketone, along with acetylacetone and GMA. These reactants were heated for 5 hours in a microwave oven, successfully affording MDZ-6 in 35% yield (FIG. 12). To further increase the structural diversity of the GFP chromophore derivatives, attempts to introduce electron-withdrawing and electron-donating groups to the benzene ring were made. As is shown in Figure 2, with fluoride as the electron-withdrawing group, the product was obtained in 30% yield. When methyl, methoxy, 3,4-methylenedioxy, dimethylamino, and hydroxyl were introduced as electron-donating groups, the corresponding products were obtained in 23% to 42% yields. The stereochemistry was confirmed to be Z via a crystal structure of MDZ-12 (FIG. 12), presumably due to steric effects. Although these products are analogues of GFP chromophores, none of the obtained products were found to be fluorescent, which is attributed to isomerization and rotation of the double bond in the excited state that leads to internal conversion. In other reports, cyclic rings have been shown to lock the aromatic moieties on GFP chromophore structures, prohibiting nonradiative relaxations and contributing to the generation of fluorescence.

[0279] In a further attempt to generate fluorophores, 5-hydroxy-1-indanone and 7-hydroxy-1-indanone were utilized as the exogenous ketones, generating imidazolidinones MDZ-13 and MDZ-14 in 37% and 24% yields, respectively. Interestingly, MDZ-14 was found to be fluorescent with a maximum emission at 592 nm, while no significant fluorescence was observed for MDZ-13. This is likely because of the intramolecular hydrogen bonding formed in product MDZ-14, which restricts nonradiative relaxation processes and facilitates fluorescence generation. Additionally, when the amide substituent was changed to phenyl (MDZ-15), the maximum emission was found to be 588 nm, which is substantially the same as MDZ-14.

[0280] Finally, although the MDZ structures logically arise selectively from a glycine moiety due to the double activation of the α-carbon, the selectivity of these reactions for glycine was examined experimentally. This was done using alanine- and valine-N-methylamide hydrochloride with acetylacetone. As is evident in FIGs. 17 and 18, an imine created by the initial condensation was detected and mass peaks from other products were not detected, validating the selectivity of this labeling method for N-terminal glycine.

Seeking the Pyrrole Product

[0281] When contrasting the target reaction (FIG. 10) with the unanticipated one discovered (FIG. 11 panel a), it appeared that the enolization required in FIG. 10 to generate conjugation with the imine was out-competed by a cyclization followed by fragmentation. It was reasoned that one way to favor enolization was to extend the conjugation. Thus, a phenyl ring was introduced on the diketone. The first test used benzoylacetone, which once again produced imidazolidinones (MDZ-6 and MDZ-16). But gratifyingly, two other products possessing the same molecular mass as MDZ-6 were isolated. After a series of characterizations, the two products were confirmed by XRD analysis to be the target pyrrole-based isomers Py-1 and Py-2. The formation of pyrrole-based products demonstrated that the reaction can proceed as originally hypothesized in FIG. 10.

[0282] Because the major difference between benzoylacetone and acetylacetone was the introduction of a single phenyl group, the use of an unsymmetrical diketone was considered unproductive for the target reaction. If condensation to form an imine occurred on the phenyl carbonyl, this was hypothesized to lead to fragmentation. To test the hypothesis, dibenzoylmethane (DBM-1) was allowed to react with GMA, and now the product generated was the target pyrrole-based molecule Py-3 (FIG. 19) in 21% isolated yield (FIG. 13). In an attempt to improve the yield, electronic effects from electron-withdrawing para-fluorine (DBM-2) and electron-donating para-methoxy groups (DBM-3) were next examined. As expected, both reactants were successfully converted into the target products, Py-4 and Py-5, respectively, but no improvement in the reaction yields was observed. Intriguingly, blue fluorescence with emission maxima at 367 nm, 364 nm, and 376 nm was observed for Py-3 (FIG. 13 panels c and d), Py-4 (FIG. 19), and Py-5 (FIG. 20), respectively, offering a way to directly place a fluorescent moiety on an N-terminal glycine.

[0283] According to the central postulate of FIG. 10, the derivatization was designed to be specific for glycine, as this AA possesses the methylene group to lead to the formation of a pyrrole and the other natural amino acids do not. Thus, a control experiment was performed using alanine-N-methylamide hydrochloride and DBM-1. As shown in FIG. 22, although the peak of the corresponding imine was detected, peaks with the calculated mass of the pyrrole or imidazoline products were not detected, validating the selectivity of this reaction for glycine.

Labeling N-Terminal Glycine Containing Peptides

[0284] Considering that these reactions were able to directly generate fluorophores in one step on N-terminal glycine using non-fluorescent reactants, they provide new approaches for the fluorescent labeling of peptides. Their single-step nature, high specificity for N-terminal glycine residues, and simplicity can make them quite useful in peptide chemistry, so the work proceeded to the goal of specifically labeling peptides. A glycine-terminated peptide H2NOC-Ala-Ala-Ala-Ala-Gly-Gly-NH2 (AAAAGG-NH2) (SEQ ID NO: 6) was synthesized as the model peptide via Solid Phase Peptide Synthesis (SPPS) with Rink Amide resin as the solid support. First examined was a protocol that generated imidazolidinone-based fluorophores using AAAAGG-NH2 (SEQ ID NO: 6), acetylacetone, and 7-hydroxy-1-indanone. The labeling was carried out in DMF to support solubilization of the peptide, with all other optimal conditions retained. After reacting for 5 hours, a fluorescent product AAAAG(G)-MDZ (SEQ ID NO: 7) was afforded in 6% yield (FIGs. 23 and 24), lower than the model system using GMA. This yield drove investigation of the pyrrole-based peptide labeling strategy.

[0285] Gratifyingly, when AAAAGG-NH2 (SEQ ID NO: 6) was allowed to react with DBM-1, the target pyrrole AAAAG(G)-Py (SEQ ID NO: 8) (Py stands for N-terminal pyrrole) was successfully obtained in 21% isolated yield, which was comparable to the small-molecule-based reaction (FIG. 25). To further test the influence of adjacent AAs on this reaction, peptide AAGAAG-NH2 (SEQ ID NO: 9) was examined and gave a 24% isolated yield of pyrrole. To investigate the compatibility of this reaction with other amino acids, especially lysine, which has a free amino group on the side chain, peptides AAYAAG-NH2 (SEQ ID NO: 10) and AAYKAG-NH2 (SEQ ID NO: 14) were synthesized, and reactions were conducted with DBM-1. As is shown in FIG. 14, 15% and 17% isolated yields were obtained for AAYAA(G)-Py (SEQ ID NO: 11) and AAYKA(G)-Py (SEQ ID NO: 15), respectively. Notably, although an additional imine-based byproduct AAYK(imine)A(G)-Py on the free amine of lysine was observed after the reaction, it could be converted to the target product AAYKA(G)-Py (SEQ ID NO: 15) by hydrolysis after treatment in acidic environments. Similarly, these two peptides were labeled using 4-methoxy-substituted DBM-3, and successfully yielded the end peptides AAYAA(G)-PyOMe (SEQ ID NO: 16) and AAYKA(G)-PyOMe (SEQ ID NO: 17).

[0286] As many applications for this reaction may not involve isolating the end product, and the purification process was thought to significantly lower the isolated yields of peptides relative to the yields in the reaction, the crude reaction yields were calculated using HPLC with toluic acid as an internal standard and found to be significantly higher than the isolated ones. For example, the calculated yields of AAYAA(G)-Py (SEQ ID NO: 11) and AAYAA(G)-PyOMe (SEQ ID NO: 16) were found to be 26 ± 5.5 % and 20 ± 5.5 %, respectively, almost twice as much as their isolated yields (FIG. 14).
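
An internal-standard yield calculation of the kind mentioned above can be sketched in Python as follows. This is a generic formulation rather than the procedure actually used: the function name, the relative response factor (which would normally come from a calibration of product against the standard), and all peak areas and amounts in the example are hypothetical values chosen only to illustrate the arithmetic.

```python
def crude_yield_percent(area_product, area_standard, standard_umol,
                        limiting_umol, response_factor=1.0):
    """Estimate a crude yield from HPLC peak areas and an internal standard.

    response_factor is the detector response of the product relative to the
    standard per mole; 1.0 is a placeholder, not a measured value.
    """
    if area_standard <= 0 or limiting_umol <= 0 or response_factor <= 0:
        raise ValueError("areas, amounts and response factor must be positive")
    product_umol = (area_product / area_standard) * standard_umol / response_factor
    return 100.0 * product_umol / limiting_umol


# Hypothetical run: 0.5 umol of standard added to a 1.0 umol peptide reaction.
example = crude_yield_percent(
    area_product=520.0,
    area_standard=1000.0,
    standard_umol=0.5,
    limiting_umol=1.0,
)
print(f"crude yield: {example:.1f} %")
```

For comparison with the values in the text, the reported 26% crude yield of AAYAA(G)-Py against its 15% isolated yield corresponds to a ratio of about 1.7.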

[0287] Because the small-molecule-based products Py-3 and Py-5 were found to be fluorescent, photophysical studies on the labeled peptides were next performed. As is evident in FIG. 14 and FIG. 26, the pyrrole-labeled peptide AAYAA(G)-Py (SEQ ID NO: 11) has a broader absorbance spectrum than the unlabeled peptide (AAYAAG-NH2) (SEQ ID NO: 10) and exhibits blue fluorescence with the maximum emission at ~382 nm. Similarly, blue fluorescence was also observed for other labeled peptides with different side chains and sequences (Py-(G)AAYAA (SEQ ID NO: 11) and PyOMe-(G)AAYAA (SEQ ID NO: 16)), validating the versatility of this labeling method on N-terminal glycine peptides.

[0288] Even though the labeling is specific to glycine in small molecule-based reactions, whether this specificity was retained for peptide labeling needed evaluation. Thus, lysine- and valine-terminated peptides, AAYAAK-NH2 (SEQ ID NO: 18) and AAYAAV-NH2 (SEQ ID NO: 19), respectively, were allowed to react with DBM-1. No products other than imine-based byproducts (which are reversible and could be converted back to reactants under acidic conditions) were detected by liquid chromatography mass spectrometry (LCMS) for either peptide (FIGs. 28 and 28). These results further validated the selectivity of this peptide labeling strategy.

Conclusion

[0289] In conclusion, the present report describes two approaches for the selective labeling of N-terminal glycine peptides using 1,3-diketones (e.g. beta-diketones) with alkyl or aryl substituents, affording imidazolidinone and pyrrole-based products, respectively. The imidazolidinones formed via a C-C cleavage-induced molecular fragmentation and subsequent recombination mechanism, which was confirmed via crossover experiments. The mechanistic investigations spurred development of a method for the synthesis of GFP-like chromophores from N-glycine-terminated molecules via a one-pot reaction. In contrast, when aromatic ring-substituted diketones, e.g., dibenzoylmethane derivatives, were employed, the reaction occurred via a cascade double-condensation on the amine and glycine methylene groups, generating pyrrole moieties on an N-terminal glycine. This approach was then successfully applied to the labeling of peptides (e.g. fluorescent labeling of peptides) and was found to be specific for N-terminal glycine over other N-terminal AAs.

[0290] This appears to be the first report of a chemical approach for the selective differentiation and labeling of N-terminal glycine peptides which exploits the terminal amine and adjacent methylene group (rather than the methine of other AAs) to achieve the specificity. Moreover, in both the imidazolidinone- and pyrrole-forming reactions, proper choice of the diketone synthon resulted in fluorescent species being generated, thus allowing one to follow N-terminal glycine generation via fluorescence.

EXAMPLE 6: Synthetic methods

Materials and instruments

[0291] All chemicals were purchased and used without further purification, unless otherwise stated. Microwave reactions were performed in a CEM Discover 2.0 microwave reactor. 1H and 13C NMR spectra were recorded on a Varian DirectDrive or Varian INOVA 400 MHz NMR spectrometer. 1H NMR chemical shifts are reported in ppm with TMS and residual solvents as standards (TMS signal at 0.00 ppm, acetone signal at 2.05 ppm, methanol signal at 3.31 ppm and DMSO signal at 2.50 ppm). 13C NMR chemical shifts are reported with the internal chloroform signal at 77.05 ppm, acetone signal at 29.84 ppm and DMSO signal at 39.52 ppm as standards.

Liquid chromatography-mass spectrometry (LCMS) and flow injection analysis-mass spectrometry (FIA-MS) data were collected on an Agilent 6120 Single Quadrupole or Agilent 6125B Single Quadrupole LC/MS System (UT Austin Mass Spectrometry Facility). High-resolution mass spectrometry (HRMS) data were collected using an Agilent 6530 Accurate-Mass Q-TOF LC/MS or Agilent 6546 Q-TOF LC/MS (UT Austin Mass Spectrometry Facility). Absorption spectra were obtained using a Cary 100 UV-vis spectrophotometer. Fluorescence and excitation spectra were recorded on a Horiba Jobin Yvon Fluorolog®-3 spectrofluorometer (UT Austin Texas Materials Institute).

Synthesis of Reagents

[0292] Boc-Gly-OH (1 g, 5.71 mmol) and HBTU (2.38 g, 6.28 mmol) were dissolved into 15 mL of dry DMF in a 100 mL round-bottom flask. Hexylamine (754 µL, 5.71 mmol) and DIEA (2 mL, 11.42 mmol) were next added slowly, and the reaction was stirred at room temperature for 20 hours until completion. The solution was then diluted using 100 mL of ethyl acetate (EA) and washed with brine (50 mL) 5 times. The organic phase was collected, dried using anhydrous Na2SO4, concentrated and purified by flash column chromatography on silica gel (0 - 30% EA in hexane as eluent) to obtain product 1 as a light-yellow oil (1.274 g, 87% yield). 1.2 g of product 1 was dissolved into 6 mL of DCM and then added to 12 mL of 4 M hydrogen chloride solution in dioxane. After stirring for 5 hours at room temperature, the final product 2 was obtained by filtration as a white solid (768 mg, 85% yield). 1H NMR (400 MHz, D2O) δ 3.76 (s, 2H), 3.22 (t, 2H, J = 7.0 Hz), 1.54-1.45 (m, 2H), 1.33-1.23 (m, 6H), 0.84 (t, 3H, J = 6.8 Hz); 13C NMR (100 MHz, D2O) δ 166.4, 40.2, 39.4, 30.5, 28.0, 25.6, 21.8, 13.2; HRMS (ESI): m/z calcd for C8H18N2O [M + H]+: 159.1492; found: 159.1491.
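The "calcd" HRMS values quoted throughout this example follow from the elemental composition. As a minimal sketch (the helper names are hypothetical, and the isotope masses are standard monoisotopic values rather than anything stated in the source), the [M + H]+ value for product 2 and the deviation of the found value can be reproduced as follows:

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # mass of H+ (a hydrogen atom minus one electron)


def mz_protonated(formula):
    """m/z of the singly charged [M + H]+ ion for a composition given as a dict."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


def ppm_error(found, calculated):
    """Signed mass error in parts per million."""
    return (found - calculated) / calculated * 1e6


# Product 2, C8H18N2O, with the found value reported above.
product_2 = {"C": 8, "H": 18, "N": 2, "O": 1}
calculated_mz = mz_protonated(product_2)
print(f"calcd {calculated_mz:.4f}; error {ppm_error(159.1491, calculated_mz):.1f} ppm")
```

The same helper can be applied to the other [M + H]+ entries in this example by changing the composition dictionary.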

[0293] Boc-Gly-OH (1 g, 5.71 mmol) and HCTU (2.6 g, 6.28 mmol) were dissolved into 15 mL of dry DMF in a 100 mL round-bottom flask. Benzylamine (624 µL, 5.71 mmol) and DIEA (2 mL, 11.42 mmol) were next added slowly, and the reaction was stirred at 50 °C for 18 hours until completion. The solution was then diluted using 100 mL of EA and washed with brine (50 mL) 5 times. The organic phase was collected, dried using anhydrous Na2SO4, concentrated and purified by flash column chromatography on silica gel to obtain compound 3 as a yellow solid (503 mg, 33% yield). 476 mg of compound 3 was dissolved into 4 mL of DCM and then added to 4 mL of 4 M hydrogen chloride solution in dioxane. After stirring for 3 hours at room temperature, the final product 4 was obtained by filtration as a white solid (268 mg, 74% yield). 1H NMR (400 MHz, D2O) δ 7.45-7.37 (m, 2H), 7.37-7.31 (m, 3H), 4.43 (s, 2H), 3.84 (s, 2H); 13C NMR (100 MHz, D2O) δ 166.7, 137.4, 128.7, 127.5, 127.2, 43.0, 40.3; HRMS (ESI): m/z calcd for C9H12N2O [M + H]+: 165.1022; found: 165.1015.

[0294] Boc-Gly-OH (1 g, 5.71 mmol) and HCTU (2.6 g, 6.28 mmol) were dissolved into 15 mL of dry DMF in a 100 mL round-bottom flask. Aniline (516 µL, 5.71 mmol) and DIEA (2 mL, 11.42 mmol) were next added slowly, and the reaction was stirred at 50 °C for 18 hours until completion. The solution was then diluted using 100 mL of EA and washed with brine (50 mL) 5 times. The organic phase was collected, dried using anhydrous Na2SO4, concentrated and purified by flash column chromatography on silica gel to obtain compound 5 as a yellow solid (765 mg, 54% yield). 610 mg of compound 5 was dissolved into 6 mL of DCM and then added to 8 mL of 4 M hydrogen chloride solution in dioxane. After stirring for 3 hours at room temperature, the final product 6 was obtained by filtration as a white solid (427 mg, 94% yield). 1H NMR (400 MHz, D2O) δ 7.47-7.39 (m, 4H), 7.28-7.23 (m, 1H), 3.96 (s, 2H); 13C NMR (100 MHz, D2O) δ 165.3, 136.1, 129.2, 125.7, 121.3, 40.8; HRMS (ESI): m/z calcd for C8H10N2O [M + H]+: 151.0866; found: 151.0863.

[0295] Glycine-N-methylamide hydrochloride (GMA, 188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and acetone (100 µL) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product MDZ-1 as a yellow solid (42 mg, 55% yield). It should be noted that the reaction was stirred only for the first 1 min of heating, and continuous stirring resulted in low yield. 1H NMR (400 MHz, CDCl3) δ 3.12 (s, 3H), 2.39 (s, 3H), 2.27 (s, 3H), 2.23 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 168.5, 156.7, 148.3, 136.5, 29.7, 26.3, 22.2, 19.5, 15.3; HRMS (ESI): m/z calcd for C8H12N2O [M + H]+: 153.1022; found: 153.1025.
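The reported percentage for MDZ-1 is consistent with a yield calculated against acetylacetone (0.5 mmol) as the limiting reagent. The sketch below checks that arithmetic. Treating acetylacetone as limiting with one product molecule per diketone is an assumption inferred from the quantities, and the helper names are hypothetical.

```python
# Standard average atomic masses (g/mol).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Average molar mass (g/mol) for a composition given as a dict."""
    return sum(AVERAGE_MASS[element] * count for element, count in formula.items())


def percent_yield(product_mg, formula, limiting_mmol):
    """Isolated yield (%) relative to the limiting reagent (1:1 stoichiometry assumed)."""
    theoretical_mg = limiting_mmol * molar_mass(formula)  # mmol x g/mol = mg
    return 100.0 * product_mg / theoretical_mg


# MDZ-1, C8H12N2O: 42 mg isolated from 0.5 mmol of acetylacetone.
mdz_1 = {"C": 8, "H": 12, "N": 2, "O": 1}
mdz_1_yield = percent_yield(42.0, mdz_1, 0.5)
print(f"MDZ-1 yield: {mdz_1_yield:.1f}%")
```

Under the same assumption, the 42 mg of MDZ-2 (C13H22N2O) reported in the next paragraph works out to roughly 38%, matching the stated value.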

[0296] Compound 2 (292.1 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and acetone (100 µL) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel (0 - 50% of EA in hexane) to obtain product MDZ-2 as an orange oil (42 mg, 38% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 3.52 (t, 2H, J = 7.5 Hz), 2.39 (s, 3H), 2.28 (s, 3H), 2.22 (s, 3H), 1.62-1.53 (m, 2H), 1.38-1.26 (m, 6H), 0.92-0.84 (m, 3H); 13C NMR (100 MHz, CDCl3) δ 168.4, 156.7, 148.0, 136.4, 40.4, 31.4, 29.3, 26.5, 22.5, 22.1, 19.4, 15.3, 14.0; HRMS (ESI): m/z calcd for C13H22N2O [M + H]+: 223.1805; found: 223.1806.

[0297] Compound 4 (250 mg, 1.25 mmol) and 3A molecular sieves (125 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (42.4 µL, 0.415 mmol) and acetone (83 µL) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel (0 - 30% of EA in hexane) to obtain product MDZ-3 as a yellow oil (38.5 mg, 41% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.35-7.24 (m, 3H), 7.22-7.18 (m, 2H), 4.76 (s, 2H), 2.43 (s, 3H), 2.25 (s, 3H), 2.16 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 168.3, 156.7, 148.9, 136.4, 136.2, 128.9, 127.7, 127.0, 43.5, 22.2, 19.6, 15.6; HRMS (ESI): m/z calcd for C14H16N2O [M + H]+: 229.1335; found: 229.1337.

[0298] Compound 6 (280 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and acetone (100 µL) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel (0 - 70% of EA in hexane, then 0 - 3% of methanol in DCM) to obtain product MDZ-4 as a red oil (33 mg, 31% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.52-7.38 (m, 3H), 7.20 (d, 2H, J = 7.56 Hz), 2.42 (s, 3H), 2.29 (s, 3H), 2.16 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 167.5, 156.1, 149.3, 135.8, 133.6, 129.5, 128.6, 127.4, 22.4, 19.6, 15.9; HRMS (ESI): m/z calcd for C13H14N2O [M + H]+: 215.1179; found: 215.1178.

[0299] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. 3-Methylacetylacetone (58.2 µL, 0.5 mmol) and 2-butanone (134.4 µL, 1.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product MDZ-5 as a yellow oil (36.7 mg, 44% yield). The ratio of E and Z isomers was determined to be ~1:1 by the relative integration of the methylene protons in 1H NMR. It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 3.12 (s, 6H), 2.89 (q, 2H, J = 7.6 Hz), 2.64 (q, 2H, J = 7.6 Hz), 2.38 (s, 3H), 2.27 (s, 3H), 2.22 (s, 3H), 1.10-1.16 (m, 6H); 13C NMR (100 MHz, CDCl3) δ 168.7, 168.0, 156.9, 156.7, 154.2, 153.4, 149.0, 148.9, 135.9, 135.7, 28.4, 26.3, 25.6, 19.4, 16.7, 15.3, 15.2, 12.3, 12.2; HRMS (ESI): m/z calcd for C9H14N2O [M + H]+: 167.1179; found: 167.1179.

[0300] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and acetophenone (292.5 µL, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel to obtain product MDZ-6 as a yellow solid (37.6 mg, 35% yield). It should be noted that the reaction was stirred only for the first 1 min of heating, and continuous stirring resulted in low yield. 1H NMR (400 MHz, CDCl3) δ 7.71-7.65 (m, 2H), 7.43-7.32 (m, 3H), 3.17 (s, 3H), 2.75 (s, 3H), 2.28 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 170.0, 158.3, 145.2, 140.0, 136.2, 129.7, 128.9, 128.0, 26.5, 18.4, 15.4; HRMS (ESI): m/z calcd for C13H14N2O [M + H]+: 215.1179; found: 215.1174.

[0301] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 7 (303.4 µL, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel (0 - 70% of EA in hexane, then 0 - 2% of methanol in DCM) to obtain product MDZ-7 in 30% yield (34.5 mg). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.77-7.70 (m, 2H), 7.09 (t, 2H, J = 8.7 Hz), 3.17 (s, 3H), 2.73 (s, 3H), 2.28 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 169.9, 163.0 (JC-F = 248.5 Hz), 159.5, 143.5, 136.1, 135.9 (JC-F = 3.4 Hz), 131.8 (JC-F = 8.2 Hz), 115.0 (JC-F = 21.4 Hz), 26.5, 18.2, 15.4; HRMS (ESI): m/z calcd for C13H13FN2O [M + H]+: 233.1085; found: 233.1083.

[0302] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 8 (334 µL, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel to obtain product MDZ-8 in 42% yield (47 mg). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.62 (d, 2H, J = 8.2 Hz), 7.21 (d, 2H, J = 7.9 Hz), 3.16 (s, 3H), 2.73 (s, 3H), 2.37 (s, 3H), 2.27 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 170.0, 157.9, 145.3, 139.1, 137.1, 135.9, 129.7, 128.7, 26.4, 21.4, 18.2, 15.4; HRMS (ESI): m/z calcd for C14H16N2O [M + H]+: 229.1335; found: 229.1333.

[0303] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 9 (375.4 mg, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel to obtain product MDZ-9 in 30% yield (36.5 mg). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.76 (d, 2H, J = 8.8 Hz), 6.93 (d, 2H, J = 8.8 Hz), 3.84 (s, 3H), 3.17 (s, 3H), 2.74 (s, 3H), 2.28 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 170.1, 160.3, 157.5, 144.7, 135.3, 132.3, 131.7, 113.5, 55.3, 26.4, 17.9, 15.4; HRMS (ESI): m/z calcd for C14H16N2O2 [M + H]+: 245.1285; found: 245.1282.

[0304] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 10 (410.4 mg, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel to obtain product MDZ-10 in 23% yield (30.2 mg). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.40 (d, 1H, J = 1.6 Hz), 7.26 (dd, 1H, J = 8.3, 1.5 Hz), 6.85 (d, 1H, J = 8.2 Hz), 5.99 (s, 2H), 3.16 (s, 3H), 2.72 (s, 3H), 2.28 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 170.0, 157.8, 148.3, 147.3, 144.4, 135.6, 133.9, 124.4, 110.8, 107.9, 101.3, 26.5, 18.2, 15.4; HRMS (ESI): m/z calcd for C14H14N2O3 [M + H]+: 259.1077; found: 259.1073.

[0305] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 11 (408.1 mg, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel (0 - 80% of EA in hexane) to obtain product MDZ-11 in 26% yield (32.4 mg). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.86 (d, 2H, J = 9.1 Hz), 6.72 (d, 2H, J = 9.1 Hz), 3.16 (s, 3H), 3.01 (s, 6H), 2.75 (s, 3H), 2.28 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 170.1, 156.0, 151.0, 145.5, 134.0, 131.9, 127.1, 111.3, 40.2, 26.4, 17.4, 15.3; HRMS (ESI): m/z calcd for C15H19N3O [M + H]+: 258.1601; found: 258.1598.

[0306] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 12 (340.4 mg, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solution was then diluted using 50 mL of EA and washed with brine (30 mL) 3 times. The organic phase was then collected, dried using anhydrous Na2SO4 and concentrated under reduced pressure. The final product was purified by flash column chromatography on silica gel (0 - 70% of EA in hexane) to obtain product MDZ-12 in 33% yield (38 mg). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, d-acetone) δ 8.73 (s, 1H), 7.88 (d, 2H, J = 8.9 Hz), 6.85 (d, 2H, J = 8.9 Hz), 3.13 (s, 3H), 2.69 (s, 3H), 2.26 (s, 3H); 13C NMR (100 MHz, d-acetone) δ 170.6, 159.1, 158.9, 142.7, 136.1, 133.0, 132.1, 115.4, 26.4, 17.1, 15.2; HRMS (ESI): m/z calcd for C13H14N2O2 [M + H]+: 231.1128; found: 231.1124.

[0307] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 13 (370.5 mg, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product MDZ-13 as a yellow solid (45 mg, 37% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. The 1H NMR data were consistent with a previous report (ref. 1). 1H NMR (400 MHz, d-DMSO) δ 8.43 (d, 1H, J = 8.2 Hz), 6.78-6.72 (m, 2H), 3.18-3.14 (m, 2H), 3.06 (s, 3H), 3.02-2.97 (m, 2H), 2.28 (s, 3H); HRMS (ESI): m/z calcd for C14H14N2O2 [M + H]+: 243.1128; found: 243.1124.

[0308] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 14 (370.5 mg, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product MDZ-14 as a yellow solid (29.1 mg, 24% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.35-7.28 (m, 1H), 6.82 (q, 1H, J = 7.3 Hz), 6.76 (d, 1H, J = 8.2 Hz), 3.36 (t, 2H, J = 5.7 Hz), 3.17 (s, 3H), 3.11 (t, 2H, J = 5.9 Hz), 2.34 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 166.9, 157.0, 154.5, 153.8, 153.0, 134.9, 126.8, 125.0, 115.8, 115.5, 30.7, 30.6, 26.4, 14.9; HRMS (ESI): m/z calcd for C14H14N2O2 [M + H]+: 243.1128; found: 243.1128.

[0309] Compound 6 (280 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (51 µL, 0.5 mmol) and ketone 14 (370.5 mg, 2.5 mmol) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product MDZ-15 as a yellow solid (36 mg, 24% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 7.56-7.43 (m, 3H), 7.35 (t, 1H, J = 7.7 Hz), 7.27 (d, 2H, J = 8.1 Hz), 6.86 (d, 1H, J = 7.3 Hz), 6.81 (d, 1H, J = 8.2 Hz), 3.42 (t, 2H, J = 5.7 Hz), 3.14 (t, 2H, J = 5.7 Hz), 2.27 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 166.3, 157.1, 155.4, 153.9, 152.4, 135.0, 133.2, 129.7, 128.9, 127.4, 126.3, 125.1, 115.9, 115.6, 30.8, 30.7, 15.7; HRMS (ESI): m/z calcd for C19H16N2O2 [M + H]+: 305.1285; found: 305.1285.

[0310] GMA (188.4 mg, 1.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Diketone 15 (81 mg, 0.5 mmol) was next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain products MDZ-6, MDZ-16, Py-1 and Py-2. It should be noted that the reaction was stirred only for the first 1 min of heating.

[0311] 1H NMR (400 MHz, CDCl3) δ 7.85 (d, 2H, J = 7.4 Hz), 7.75 (d, 2H, J = 6.3 Hz), 7.51-7.35 (m, 6H), 3.34 (s, 3H), 2.85 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 170.9, 158.9, 146.9, 139.9, 136.6, 131.0, 130.2, 129.6, 129.2, 128.7, 128.6, 128.0, 28.9, 18.4; HRMS (ESI): m/z calcd for C18H16N2O [M + H]+: 277.1335; found: 277.1337.

[0312] 1H NMR (400 MHz, CDCl3) δ 9.75 (s, 1H), 7.55 (d, 2H, J = 7.8 Hz), 7.36 (t, 2H, J = 7.6 Hz), 7.27-7.22 (m, 1H), 6.35 (d, 1H, J = 2.6 Hz), 5.78 (s, 1H), 3.01 (d, 3H, J = 4.8 Hz), 2.37 (s, 3H); 13C NMR (100 MHz, CDCl3) δ 162.7, 133.6, 131.6, 128.9, 127.2, 124.5, 123.4, 121.3, 110.1, 26.5, 13.6; HRMS (ESI): m/z calcd for C13H14N2O [M + H]+: 215.1179; found: 215.1177.

[0314] GMA (62.8 mg, 0.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of DMF in a 10 mL reaction vessel. Diketone DBM-1 (336.4 mg, 1.5 mmol) was next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product Py-3 as an orange solid (29.3 mg, 21% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 10.00 (s, 1H), 7.62 (d, 2H, J = 7.4 Hz), 7.51-7.43 (m, 4H), 7.42-7.36 (m, 3H), 7.31-7.26 (m, 1H), 6.50 (d, 1H, J = 2.8 Hz), 5.76 (d, 1H, J = 3.8 Hz), 2.79 (d, 3H, J = 4.9 Hz); 13C NMR (100 MHz, CDCl3) δ 161.9, 135.7, 133.8, 131.3, 129.3, 129.0, 128.9, 127.8, 127.6, 127.4, 124.6, 122.2, 109.4, 26.1; HRMS (ESI): m/z calcd for C18H16N2O [M + H]+: 277.1335; found: 277.1337.

[0315] GMA (62.8 mg, 0.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of DMF in a 10 mL reaction vessel. Diketone DBM-2 (390.4 mg, 1.5 mmol) was next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product Py-4 as orange solid (27.2 mg, 17% yield). It should be noted that the reaction was only stirred at the

[0316] GMA (62.8 mg, 0.5 mmol) and 3A molecular sieves (150 mg) were first dispersed into 2 mL of DMF in a 10 mL reaction vessel. Diketone DBM-3 (426.5 mg, 1.5 mmol) was next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by flash column chromatography on silica gel to obtain product Py-5 as a yellow solid (30.2 mg, 18% yield). It should be noted that the reaction was stirred only for the first 1 min of heating. 1H NMR (400 MHz, CDCl3) δ 9.53 (s, 1H), 7.51 (d, 2H, J = 8.8 Hz), 7.40 (d, 2H, J = 8.7 Hz), 7.00 (d, 2H, J = 8.7 Hz), 6.94 (d, 2H, J = 8.8 Hz), 6.36 (d, 1H, J = 3.1 Hz), 5.75 (d, 1H, J = 4.6 Hz), 3.87 (s, 3H), 3.84 (s, 3H), 2.80 (d, 3H, J = 4.9 Hz); 13C NMR (100 MHz, CDCl3) δ 162.2, 159.2, 159.1, 134.0, 130.5, 127.9, 127.5, 126.0, 124.4, 121.7, 114.5, 114.3, 108.6, 55.4, 26.2; HRMS (ESI): m/z calcd for C20H20N2O3 [M + H]+: 337.1547; found: 337.1553.

General procedure for the labeling of glycine-terminated peptides via pyrrole formation

[0317] Peptides (1 equiv.) and 3A molecular sieves (150 mg for 0.2 mmol of peptide) were first dispersed into 2 mL of DMF in a 10 mL reaction vessel. Three equiv. of diketone were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, the solvent was removed under reduced pressure and the final product was purified by reverse phase chromatography with water-acetonitrile as eluent to obtain the labeled peptide. It should be noted that the reaction was stirred only for the first 1 min of heating. To confirm that the labeling of peptides was successful, the products AAAAG(G)-Py (SEQ ID NO: 8), AAGAA(G)-Py (SEQ ID NO: 20) and AAYAA(G)-Py (SEQ ID NO: 11) were characterized using NMR (shown below) and HRMS (Sections 4). All other labeled peptides were separated and characterized using HRMS (Sections 4).

11.65 (s, 1H), 8.18 (d, 1H, J = 7.2 Hz), 8.07 (d, 1H, J = 7.2 Hz), 7.90-7.77 (m, 4H), 7.73 (d, 2H, J = 7.5 Hz), 7.54 (d, 2H, J = 7.0 Hz), 7.42 (t, 2H, J = 7.7 Hz), 7.34 (t, 2H, J = 7.5 Hz), 7.31-7.23 (m, 2H), 7.21 (s, 1H), 7.01 (s, 1H), 6.72 (d, 1H, J = 2.5 Hz), 4.35-4.10 (m, 4H), 3.87 (d, 2H, J = 5.4 Hz), 1.27-1.10 (m, 12H); 13C NMR (100 MHz, d-DMSO) δ 174.5, 172.6, 172.5, 172.0, 169.4, 161.6, 136.0, 133.7, 131.9, 129.4, 129.2, 128.4, 127.4, 126.9, 125.1, 122.8, 109.3, 48.9, 48.7, 48.4, 42.9, 18.7, 18.5, 18.2, 18.0; HRMS (ESI): m/z calcd for C31H37N7O6 [M + Na]+: 626.2698; found: 626.2693.

11.69 (s, 1H), 8.20 (d, 1H, J = 7.0 Hz), 8.11 (t, 1H, J = 5.3 Hz), 7.99 (d, 1H, J = 6.8 Hz), 7.88 (d, 1H, J = 7.6 Hz), 7.82 (d, 2H, J = 7.8 Hz), 7.56 (d, 1H, J = 7.0 Hz), 7.51 (d, 2H, J = 7.5 Hz), 7.42 (t, 2H, J = 7.6 Hz), 7.35 (t, 2H, J = 7.5 Hz), 7.31-7.23 (m, 2H), 7.15 (s, 1H), 6.99 (s, 1H), 6.70 (s, 1H), 4.48-4.38 (m, 1H), 4.28-4.10 (m, 3H), 3.66 (d, 2H, J = 5.0 Hz), 1.30-1.15 (m, 12H); 13C NMR (100 MHz, d-DMSO) δ 174.6, 173.1, 172.6, 172.0, 169.1, 160.8, 136.0, 133.6, 131.9, 129.5, 129.2, 128.4, 127.4, 127.0, 125.1, 122.8, 109.3, 49.0, 48.84, 48.77, 48.5, 42.5, 18.51, 18.48, 18.41, 18.35; HRMS (ESI): m/z calcd for C31H37N7O6 [M + Na]+: 626.2698; found: 626.2697.
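The calculated value 626.2698 quoted for C31H37N7O6 in the two data sets above can be checked against common ESI adducts. The sketch below (helper names are hypothetical, and the isotope masses are standard values not taken from the source) computes the protonated and sodiated ions and picks whichever lies closest to a measured m/z.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858

# Mass added to the neutral molecule by each singly charged cation adduct.
ADDUCT_SHIFT = {
    "[M + H]+": 1.00782503 - ELECTRON,
    "[M + Na]+": 22.98976928 - ELECTRON,
}


def adduct_table(formula):
    """Return {adduct name: m/z} for a composition given as a dict."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return {name: neutral + shift for name, shift in ADDUCT_SHIFT.items()}


def best_adduct(formula, found_mz):
    """Name of the adduct whose calculated m/z is closest to the measured value."""
    table = adduct_table(formula)
    return min(table, key=lambda name: abs(table[name] - found_mz))


labeled_peptide = {"C": 31, "H": 37, "N": 7, "O": 6}
for name, mz in adduct_table(labeled_peptide).items():
    print(f"{name}: {mz:.4f}")
print("closest to 626.2697:", best_adduct(labeled_peptide, 626.2697))
```

For this composition the protonated ion falls near m/z 604.29, so a value of 626.2698 corresponds to the sodium adduct.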

[0320] PyOMe-(G)AAYAA (SEQ ID NO: 16): 15% isolated yield (26 ± 5.5% yield, calculated by HPLC). 1H NMR (400 MHz, CD3OD) δ 7.67 (d, 2H, J = 8.7 Hz), 7.45 (d, 2H, J = 8.5 Hz), 6.99 (d, 2H, J = 8.6 Hz), 6.92 (t, 4H, J = 7.9 Hz), 6.60 (d, 2H, J = 8.3 Hz), 6.50 (s, 1H), 4.39-4.21 (m, 5H), 4.17-4.09 (m, 1H), 3.83 (s, 3H), 3.80 (s, 3H), 2.93-2.86 (m, 1H), 2.67-2.58 (m, 1H), 1.41-1.35 (m, 6H), 1.29 (d, 3H, J = 7.0 Hz), 1.21 (d, 3H, J = 7.2 Hz); HRMS (ESI): m/z calcd for C40H47N7O9 [M + Na]+: 792.3327; found: 792.3308.

Crossover study on the mechanism of MDZ-1 formation

[0321] GMA (75.3 mg, 0.6 mmol), acetylacetone (25.5 µL, 0.25 mmol) and 3,5-heptanedione (33.9 µL, 0.25 mmol) were dispersed into 2 mL of glacial acetic acid in a 10 mL reaction vessel. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave oven for 1 hour. After completion, 5.6 µL of the solution was taken out and dried using air flow. The final product was dissolved into 1.5 mL of a mixed solvent of water and methanol (1:2, v/v ratio) and analyzed by flow injection analysis-mass spectrometry (FIA-MS) using an Agilent 6120 Single Quadrupole LC/MS System.

General method for fluorescence tests (MDZ-14 as the example)

[0322] 2.42 mg of MDZ-14 was dissolved into 1 mL of DMSO to make a 10 mM stock solution. After dilution in DMSO, the absorbance spectrum was recorded on a Beckman DU-640 UV-vis spectrophotometer. Based on the absorbance, 387 nm was chosen as the excitation wavelength and the fluorescence spectrum of MDZ-14 (30 µM in DMSO) was next recorded on a Horiba Fluorolog®-3 spectrofluorometer. The same method was applied for the fluorescence tests of other MDZ- and pyrrole-based products.

Specificity test of N-terminal glycine labeling via MDZ formation

[0323] Alanine-N-methylamide hydrochloride (52 mg, 0.375 mmol) and 3A molecular sieves (100 mg) were first dispersed into 1 mL of the DMF/pyridine mixed solvent (1:1, v/v) in a 10 mL reaction vessel. Acetylacetone (12.8 µL, 0.125 mmol) and acetone (25 µL) were next added. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 1.5 hours. After completion, 6 µL of the solution was taken out and dried using air flow. The final product was dissolved into 1.5 mL of a mixed solvent of MeCN and methanol (2:1, v/v ratio) and analyzed by FIA-MS using an Agilent 6120 Single Quadrupole LC/MS System. The same method was also applied for the test using valine-N-methylamide hydrochloride (62.5 mg, 0.375 mmol).

Specificity test of N-terminal glycine labeling via pyrrole formation

[0324] Alanine-N-methylamide hydrochloride (79.3 mg, 0.5 mmol), DBM-1 (336 mg, 1.5 mmol) and 3A molecular sieves (100 mg) were first dispersed into 2 mL of DMF in a 10 mL reaction vessel. The vessel was sealed and reacted at 130 °C in a CEM Discover 2.0 microwave for 5 hours. After completion, 6 µL of the solution was taken out and dried using air flow. The final product was dissolved into 1.5 mL of a mixed solvent of MeCN and methanol (2:1, v/v ratio) and analyzed by FIA-MS using an Agilent 6120 Single Quadrupole LC/MS System.

Synthesis of exogenous fluorophore (7-(diethylamino)-2-oxo-2H-chromene-3-carboxylic acid)-linked peptide

[0325] Propargyl-pyrrole-linked peptide and azide-derivatized 7-(diethylamino)-2-oxo-2H-chromene-3-carboxylic acid were synthesized via the scheme below. MS spectra of the propargyl-pyrrole-linked peptide and the final 7-(diethylamino)-2-oxo-2H-chromene-3-carboxylic acid product were acquired.

[0326] Scheme 6: Synthesis of exogenous fluorophore-linked peptides

[0327] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.