


Title:
TUNING BINDING COOPERATIVITY IN NUCLEIC ACID TARGETING COMPOUNDS
Document Type and Number:
WIPO Patent Application WO/2020/172664
Kind Code:
A1
Abstract:
Described herein are genetic recognition reagents that bind specifically to a target nucleic acid and comprise terminal left-handed PNA cooperative binding domains that enable concatenation of two or more of the recognition reagents when the reagents are hybridized to a target nucleic acid. Also provided are methods of using the genetic recognition reagents, e.g., to treat or diagnose a repeat expansion disorder, such as DM1.

Inventors:
LY DANITH (US)
THADKE SHIVAJI (US)
HSIEH WEI-CHE (TW)
Application Number:
US2020/019490
Publication Date:
August 27, 2020
Filing Date:
February 24, 2020
Assignee:
UNIV CARNEGIE MELLON (US)
International Classes:
A61K48/00; C07D239/47; C07D239/54; C07D251/16; C07D471/04; C07D475/08
Domestic Patent References:
WO2018058091A1 (2018-03-29)
Foreign References:
US20160083433A1 (2016-03-24)
US20160083434A1 (2016-03-24)
US20170058325A1 (2017-03-02)
Other References:
YU ET AL.: "Orthogonal gammaPNA Dimerization Domains Empower DNA Binders with Cooperativity and Versatility Mimicking that of Transcription Factor Pairs", CHEM. EUR. J., vol. 24, no. 53, 12 July 2018 (2018-07-12), pages 14183 - 14188, XP055733954
SACUI ET AL.: "Gamma Peptide Nucleic Acids: As Orthogonal Nucleic Acid Recognition Codes for Organizing Molecular Self-Assembly", J. AM. CHEM. SOC., vol. 137, no. 26, 16 June 2015 (2015-06-16), pages 8603 - 8610, XP055733952
MANICARDI ET AL.: "Effect of chirality in gamma-PNA:PNA interaction, another piece in the picture", ARTIFICIAL DNA: PNA & XNA, vol. 5, no. 3, 17 March 2016 (2016-03-17), pages 1 - 5, XP055733953
Attorney, Agent or Firm:
HIRSHMAN, Jesse A. et al. (US)
Claims:
We claim:

1. A compound, comprising:

a nucleic acid or nucleic acid analog recognition domain comprising a first end and a second end and from three to eight nucleobases in a sequence complementary to a target nucleic acid;

a first peptide nucleic acid (PNA) cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence, covalently linked to the first end of the nucleic acid or nucleic acid analog recognition domain; and

a second PNA cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence that is complementary to the nucleobase sequence of the first PNA cooperative binding domain, covalently linked to the second end of the nucleic acid or nucleic acid analog recognition domain, wherein the PNA cooperative binding domains are left-handed gamma-PNA (LH-γPNA) or at least one of the three to eight nucleobases of the PNA cooperative binding domains does not bind with sufficient strength to form duplex structures with natural nucleobases of RNA or DNA under conditions in which the nucleic acid or nucleic acid analog recognition domain binds to a target sequence in RNA or DNA,

or a pharmaceutically-acceptable salt thereof.

2. The compound of claim 1, wherein the first and second PNA cooperative binding domains are LH-γPNA.

3. The compound of claim 1 or 2, wherein at least one nucleobase of the nucleobase sequences of the first and second PNA cooperative binding domains does not hydrogen bond with natural nucleobases.

4. The compound of any one of claims 1-3, wherein the nucleic acid or nucleic acid analog recognition domain is PNA.

5. The compound of claim 4, wherein the nucleic acid or nucleic acid analog recognition domain is right-handed gamma-PNA (RH-γPNA).

6. The compound of claim 4, comprising one or more pendant guanidine-containing groups, amino acid side chains, or polyethylene glycol (PEG)-containing groups comprising the moiety -(O-CH2-CH2)q-, where q ranges from 1 to 100.

7. The compound of any one of claims 1-6, wherein one or both of the first and second cooperative binding domains is covalently linked to the nucleic acid or nucleic acid analog recognition domain by a linker comprising from one to 10 ethylene glycol moieties.

8. The compound of any one of claims 1-6, having the structure: H-L-Arg-L-Dab(b)-DR-L-Orn(b’)-L-Arg-NH2; H-L-Arg-L-Orn(b)-DR-L-Orn(b’)-L-Arg-NH2; or H-L-Arg-L-Lys(b)-DR-L-Lys(b’)-L-Arg-NH2, wherein Orn is ornithine, Dab is diaminobutyric acid, b and b’ are the first and second PNA cooperative binding domains, and DR is the nucleic acid or nucleic acid analog recognition domain.

9. The compound of any one of claims 1-6, having the structure:

where,

n is an integer ranging from 1 to 6;

each instance of R is, independently, one of the three to eight nucleobases;

B is a ribose, deoxyribose, or a nucleic acid analog backbone residue;

each L is, independently, a linker; and

b and b’ are PNA cooperative binding domains, wherein b and b’ are complementary.

10. The compound of claim 1, wherein the recognition domain comprises nucleic acid analog backbone residues.

11. The compound of claim 10, wherein the nucleic acid analog backbone residues comprise conformationally preorganized backbone residues.

12. The compound of claim 11, wherein the conformationally preorganized backbone residues are γPNA, LNA, or glycol nucleic acid backbone residues.

13. The compound of claim 11, wherein the conformationally preorganized nucleic acid analog backbone residues are γPNA backbone residues.

14. The compound of claim 13, wherein one or more of the γPNA backbone residues are substituted with a group comprising an ethylene glycol unit having from 1 to 100 ethylene glycol residues, selected from the group consisting of: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50, wherein the ethylene glycol unit is optionally attached to the one or more γPNA backbone residues by a (C1-C6) divalent hydrocarbyl linker or a poly(ethylene glycol) linker comprising from two to six ethylene glycol moieties.

15. The compound of claim 11, wherein the conformationally preorganized nucleic acid analog backbone residues are RH-γPNA.

16. The compound of claim 13, having the structure:

where, each R is independently, one of the three to eight nucleobases;

n is an integer ranging from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

each L is, independently, a linker;

each R1 and R2 is, independently: [structure not reproduced]; a guanidine-containing group; an amino acid side chain; methyl; ethyl; linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain comprising from 1 to 50 units; H; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; and

R3 is the first PNA cooperative binding domain and R3’ is the second PNA cooperative binding domain complementary to R3.

17. The compound of claim 16, wherein each linker independently comprises one or more guanidine-containing groups, one or more amino acid side chains, or one or more contiguous amino acid residues.

18. The compound of any one of claims 1-6 and 10-15, wherein, independently, each PNA cooperative binding domain is linked to the recognition domain with a linker comprising one or more contiguous amino acid residues; a guanidine-containing group; an amino acid side chain; methyl; ethyl; linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)qOP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2; or one or more contiguous amino acid residues substituted with one or more of: a guanidine-containing group, an amino acid side chain, methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)qOP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2;

where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene;

q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

19. The compound of any one of claims 1-16, wherein, independently, each PNA cooperative binding domain is linked to the recognition domain with a linker comprising the group [structure not reproduced], where v ranges from 1 to 10 and each instance of X is -(OCH2CH2)-, methylene, or a heteroatom, such as one or more S-, O-, or N-containing moieties.

20. The compound of claim 19, wherein v ranges from 1 to 3 and/or X is -(OCH2CH2)-.

21. The compound of any one of claims 1-16, wherein, independently, each PNA cooperative binding domain is linked to the recognition domain with a linker comprising a first amino acid residue having the side group [structure not reproduced], where v ranges from 1 to 10 and each X is -(OCH2CH2)-, methylene, or a heteroatom-containing moiety, and, optionally, N-terminal and C-terminal arginine, lysine, ornithine, or diaminobutyric acid residues attached to each of the first amino acid residues.

22. The compound of any one of claims 1-6 and 10-15, comprising a plurality of linkers having from 5 to 25 carbon atoms covalently connecting the recognition domain and the cooperative binding domains, wherein, optionally, the linker comprises one or more heteroatoms and/or has an atomic mass of less than 400.

23. The compound of claim 13, having the structure:

where,

each R is independently, one of the three to eight nucleobases;

n is an integer ranging from 1 to 6;

each R1 and R2 is, independently: a guanidine-containing group; an amino acid side chain; methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; H; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene;

one of R4 or R5 is -L-R3, and one of R6, R7, or R8 is -L-R3’, where R3 is the first PNA cooperative binding domain and R3’ is the second PNA cooperative binding domain complementary to R3, and each L is, independently, a linker, and the remainder of R4, R5, R6, R7, and R8 are each, independently: H; one or more contiguous amino acid residues; a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)qOP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2; or one or more contiguous amino acid residues substituted with:

one or more of a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)qOP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2;

where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene;

q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

24. The compound of claim 23, wherein one or more of R1, R2, R4, R5, R6, R7, or R8 is (C1-C6)alkyl substituted with -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

25. The compound of claim 23 or 24, wherein R4 and R7 are -L-R3 and -L-R3’, respectively.

26. The compound of any one of claims 23-25, wherein Rs and Rs independently comprise an arginine, lysine, ornithine, or diamino butyric acid residue.

27. The compound of claim 13, having the structure: where

n is an integer ranging from 1 to 8;

each m is, independently, an integer ranging from 1 to 5; each X is, independently, -(OCH2CH2)-, methylene, or a heteroatom-containing moiety;

each R2 is: a guanidine-containing group; an amino acid side chain; methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R3 is the first left-handed PNA cooperative binding domain and R3’ is the second left-handed PNA cooperative binding domain complementary to R3; and

each of Rs, R7, and Rs is, independently, H; a guanidine-containing group that is [structure not reproduced], where z = 1, 2, 3, 4, or 5; an amino acid side chain; or one or more contiguous amino acid residues.

28. The compound of claim 27, wherein one or more of R2, R3, R3’, Rs, R7, or Rs is (C1-C6)alkyl substituted with -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

29. The compound of any one of claims 13-26, wherein R1 is H and R1 and R2 are different.

30. The compound of any one of claims 1-29, comprising a guanidine moiety.

31. The compound of claim 30, wherein the guanidine-containing moiety is [structure not reproduced].

32. The compound of any one of claims 16-29, wherein R2 is -CH2-(OCH2-CH2)r-OH, wherein r is an integer from 1 to 50, from 1 to 10, or 2.

33. The compound of any one of claims 1-32, wherein the recognition domain is fully complementary to a nucleic acid having an expanded repeat that is associated with a repeat expansion disease.

34. The compound of claim 33, wherein the repeat expansion disease is FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

35. The compound of claim 33 or claim 34, wherein the expanded repeat has one of the following sequences: (GAA)w, (CGG)w, (CCG)w, (CAG)w, (CTG)w, (CCTG)w, (ATTCT)w, or (GGGGCC)w, where w is at least 3.

36. A method of binding a nucleic acid comprising a target sequence, the method comprising contacting the nucleic acid with a compound of any one of claims 1-32, in which the sequence of the recognition domain is complementary to the target sequence.

37. The method of claim 36, wherein the nucleobase sequence of the recognition domain is fully complementary to a nucleic acid having an expanded repeat associated with a repeat expansion disease.

38. The method of claim 37, wherein the repeat expansion disease is FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

39. The method of claim 37 or claim 38, wherein the expanded repeat has one of the following sequences: (GAA)w, (CGG)w, (CCG)w, (CAG)w, (CTG)w, (CCTG)w, (ATTCT)w, or (GGGGCC)w, where w is at least 3.

40. A method of knocking down expression of an mRNA in a cell, where the mRNA comprises a repeated target sequence, comprising contacting the target sequence of the mRNA with a compound of any one of claims 1-32, in which the sequence of the recognition domain is complementary to the repeated target sequence of the mRNA.

41. The method of claim 40, wherein the nucleobase sequence of the recognition domain is fully complementary to a nucleic acid having an expanded repeat associated with a repeat expansion disease.

42. The method of claim 41, wherein the repeat expansion disease is FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

43. The method of claim 41 or claim 42, wherein the expanded repeat has one of the following sequences: (GAA)w, (CGG)w, (CCG)w, (CAG)w, (CTG)w, (CCTG)w, (ATTCT)w, or (GGGGCC)w, where w is at least 3.

44. A composition comprising the compound of any one of claims 1-34 and a pharmaceutically-acceptable carrier.

45. A genetic recognition reagent comprising:

a nucleic acid recognition domain or nucleic acid analog recognition domain comprising a plurality of units, wherein each unit comprises a residue of a nucleic acid backbone monomer or a nucleic acid analogue backbone monomer, wherein the residue comprises a nucleobase, wherein a sequence of the plurality of units is complementary to a sequence of a target nucleic acid; and

a first peptide nucleic acid (PNA) cooperative binding domain that (1) comprises nucleobases that do not bind with sufficient strength to form duplex structures with natural nucleobases of the nucleic acid recognition domain, the nucleic acid analog recognition domain, or the target nucleic acid, or (2) is of opposite handedness relative to the target nucleic acid, wherein the first PNA cooperative binding domain is linked to the nucleic acid backbone or the nucleic acid analog backbone.

46. The genetic recognition reagent of claim 45, further comprising a second PNA cooperative binding domain having a nucleobase sequence that is complementary to a nucleobase sequence of the first PNA cooperative binding domain.

47. The genetic recognition reagent of claim 45, wherein the first PNA cooperative binding domain is a left-handed gamma-PNA (LH-γPNA).

48. The genetic recognition reagent of claim 45, wherein the nucleic acid recognition domain or the nucleic acid analog recognition domain is PNA.

49. The genetic recognition reagent of claim 47, wherein the nucleic acid recognition domain or the nucleic acid analog recognition domain is right-handed gamma-PNA (RH-γPNA).

Description:
TUNING BINDING COOPERATIVITY IN NUCLEIC ACID TARGETING COMPOUNDS

STATEMENT REGARDING FEDERAL FUNDING

[0001] This invention was made with government support under Grant No. R21 NS098102 awarded by the National Institutes of Health. The government has certain rights in the invention.

CROSS REFERENCE TO RELATED APPLICATIONS

[0002] This application claims the benefit of co-pending U.S. Provisional Application No. 62/919,008, filed February 22, 2019, which is incorporated herein by reference in its entirety.

BACKGROUND

[0003] Described herein are compositions and methods for binding nucleic acids using nucleic acid and nucleic acid oligomer compositions. A method of treating repeat expansion disorders, such as myotonic dystrophy type 1 (DM1) and type 2 (DM2), is also provided.

[0004] In most organisms, genetic information is encoded in double-stranded DNA in the form of Watson-Crick base pairing, in which adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G). Which portions of this genetic information are decoded through transcription and translation determines the developmental program and physiological status of the organism. Molecules that can be tailor-designed to bind sequence-specifically to any part of this genetic biopolymer, thereby enabling control of the flow of genetic information and assessment and manipulation of the genome’s structures and functions, are important for biological and biomedical research, as well as for medicinal and therapeutic applications in the treatment and detection of genetic diseases.
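The pairing rules above can be written as a simple lookup table. The following Python sketch is illustrative only and is not part of the described compositions; the helper name `reverse_complement` is hypothetical. Assuming antiparallel binding, it returns the partner strand of a DNA sequence, written 5'→3' like the input, that a perfectly matched Watson-Crick binder would carry.

```python
# Watson-Crick pairs for DNA: A-T and C-G.
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the antiparallel partner strand, written 5'->3' like the input."""
    # Read the input 3'->5' and swap each base for its Watson-Crick partner.
    return "".join(WATSON_CRICK[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGCCG"))  # prints: CGGCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.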

[0005] Compared to proteins, RNA molecules are easier to target because they are made up of just four building blocks (A, C, G, U), whose interactions are defined by the well-established rules of Watson-Crick base pairing. Compared to standard double-stranded DNA (or RNA), RNA secondary structures are generally thermodynamically less stable and, thus, energetically less demanding to bind, because in addition to canonical (perfectly matched) base pairs they contain many noncanonical (mismatched) pairs as well as single-stranded loops, bulges, and junctions. These local interacting domains are essential for ‘tertiary’ interactions and for the assembly of secondary structures into compact three-dimensional shapes. As such, slight variations in the interaction patterns or bonding strengths within these regions can have a profound effect on the overall three-dimensional folding of RNA. Molecules that modulate RNA interactions, and thereby interfere with RNA folding, are therefore important as molecular tools for assessing RNA function and as therapeutic and diagnostic reagents.

[0006] Genetic disorders generally result from aberrant protein function caused by mutation in the DNA coding sequence, or from dysregulation at the transcriptional or translational level, resulting in the loss or gain of protein function. However, over the last three decades a preponderance of evidence has shown that a large number of neuromuscular disorders, more than 20 to date, including myotonic dystrophy type 1 (DM1) and type 2 (DM2), result from an unstable repeat expansion. An expansion in the coding region of a gene can lead to altered protein function, whereas an expansion in a noncoding region can cause disease without altering a protein sequence, through a toxic gain of RNA function and, in certain cases, inadvertent production of deleterious polypeptides through repeat-associated non-ATG (RAN) translation.

[0007] A prototype of the latter class of genetic disorders is DM1, a debilitating muscular atrophy that affects one in every 8000 adults worldwide and for which there is no effective treatment. DM1 is caused by a CTG-repeat expansion in the 3'-untranslated region (3'-UTR) of the dystrophia myotonica protein kinase (DMPK) gene, from a normal range of 5-35 repeats to a pathogenic range of 80 to >2500. The etiology of DM1 is largely attributed to RNA toxicity. Upon transcription, the expanded rCUG repeats (rCUGexp) adopt an imperfect hairpin structure that sequesters muscleblind-like protein 1 (MBNL1), a key RNA splicing regulator. Their association yields an rCUGexp-MBNL1 complex that is trapped in the nucleus, which both precludes export of the transcript to the cytoplasm for production of DMPK protein and prevents MBNL1 from carrying out its normal physiological function. Therapeutic interventions for DM1, and possibly for other related neuromuscular conditions, may be developed by targeting the mutant transcript. The challenge, however, is to design molecules that target the expanded transcripts without interfering with the wild-type (wt) transcripts and that can displace non-cognate proteins such as MBNL1 from rCUGexp.
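The repeat-count ranges in the preceding paragraph lend themselves to a short worked check. The Python sketch below is an illustration under stated assumptions, not a diagnostic method from this application, and all function names are hypothetical. It finds the longest uninterrupted run of CTG units in a DNA string, labels the count using only the ranges given above (5-35 normal, 80 or more pathogenic), reports every other count as not assigned, and shows that a CTG tract on the sense strand corresponds to rCUG repeats in the transcript.

```python
import re

# Repeat-count ranges for the DMPK CTG tract as stated in paragraph [0007].
NORMAL_RANGE = range(5, 36)  # 5-35 repeats
PATHOGENIC_MIN = 80          # 80 to >2500 repeats


def longest_repeat_run(seq: str, unit: str = "CTG") -> int:
    """Largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def classify_ctg_count(count: int) -> str:
    if count in NORMAL_RANGE:
        return "normal"
    if count >= PATHOGENIC_MIN:
        return "pathogenic"
    # The paragraph does not assign counts below 5 or between 36 and 79.
    return "not assigned"


def to_rna(dna: str) -> str:
    """Sense-strand DNA to transcript sequence (T -> U), e.g. CTG -> rCUG."""
    return dna.upper().replace("T", "U")


allele = "GAC" + "CTG" * 12 + "AAG"  # hypothetical example sequence
n = longest_repeat_run(allele)
print(n, classify_ctg_count(n), to_rna("CTG" * 3))  # prints: 12 normal CUGCUGCUG
```

The "not assigned" label is deliberate: the sketch only reproduces the two ranges the text states and makes no claim about intermediate counts.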

[0008] Nucleic acid interactions, such as RNA-RNA and RNA-protein interactions, play key roles in gene regulation, including replication, translation, folding, and packaging. The ability to bind selectively to perturbed regions within RNA secondary structures, such as mismatches, loops, bulges, and junctions, is important for manipulating their physiological functions.

SUMMARY OF THE INVENTION

[0009] According to one aspect of the disclosure, a compound is provided. The compound may comprise: a nucleic acid or nucleic acid analog recognition domain comprising a first end and a second end and from three to eight nucleobases in a sequence complementary to a target nucleic acid; a first peptide nucleic acid (PNA) cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence, covalently linked to the first end of the nucleic acid or nucleic acid analog recognition domain; and a second PNA cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence that is complementary to the nucleobase sequence of the first PNA cooperative binding domain, covalently linked to the second end of the nucleic acid or nucleic acid analog recognition domain, wherein the PNA cooperative binding domains are left-handed gamma-PNA (LH-γPNA) or at least one of the three to eight nucleobases of the PNA cooperative binding domains does not bind with sufficient strength to form duplex structures with natural nucleobases of RNA or DNA under conditions in which the nucleic acid or nucleic acid analog recognition domain binds to a target sequence in RNA or DNA, or a pharmaceutically-acceptable salt thereof.
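The length and complementarity constraints in [0009] can be expressed as a simple design check. The following Python sketch is a hypothetical illustration, not the applicants' design procedure, and the class and method names are illustrative. It assumes sequences are written with natural A/C/G/T letters and that the two cooperative binding domains pair antiparallel when both are written in the same direction, so that b′ is the reverse complement of b. It does not model helical handedness or the non-natural nucleobase alternative.

```python
from __future__ import annotations

from dataclasses import dataclass

# Natural nucleobase letters commonly used to write PNA sequences.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[base] for base in reversed(seq))


@dataclass(frozen=True)
class ReagentDesign:
    recognition: str  # recognition domain: 3-8 nucleobases
    b: str            # first cooperative binding domain: 2-5 residues
    b_prime: str      # second cooperative binding domain: 2-5 residues

    def problems(self) -> list[str]:
        """Return the violated constraints (an empty list if none are found)."""
        issues = []
        if not 3 <= len(self.recognition) <= 8:
            issues.append("recognition domain must have 3-8 nucleobases")
        for name, domain in (("b", self.b), ("b_prime", self.b_prime)):
            if not 2 <= len(domain) <= 5:
                issues.append(f"{name} must have 2-5 residues")
        if not set(self.b + self.b_prime) <= set(PAIR):
            # Non-natural nucleobases are outside the scope of this sketch.
            issues.append("cooperative domains contain letters outside A/C/G/T")
        elif reverse_complement(self.b) != self.b_prime:
            # Assumption: antiparallel pairing of b and b' written in the same direction.
            issues.append("b_prime is not the reverse complement of b")
        return issues


print(ReagentDesign(recognition="CAGCAG", b="AGC", b_prime="GCT").problems())  # prints: []
```

The example sequences are hypothetical; `CAGCAG` is simply a 6-nucleobase domain of the kind that would pair with an rCUG repeat under the stated assumptions.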

[0010] According to another aspect, a method of binding a nucleic acid comprising a target sequence is provided. The method may comprise contacting the nucleic acid with a compound comprising: a nucleic acid or nucleic acid analog recognition domain comprising a first end and a second end and from three to eight nucleobases in a sequence complementary to the target sequence; a first peptide nucleic acid (PNA) cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence, covalently linked to the first end of the nucleic acid or nucleic acid analog recognition domain; and a second PNA cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence that is complementary to the nucleobase sequence of the first PNA cooperative binding domain, covalently linked to the second end of the nucleic acid or nucleic acid analog recognition domain, wherein the PNA cooperative binding domains are left-handed gamma-PNA (LH-γPNA) or at least one of the three to eight nucleobases of the PNA cooperative binding domains does not bind with sufficient strength to form duplex structures with natural nucleobases of RNA or DNA under conditions in which the nucleic acid or nucleic acid analog recognition domain binds to a target sequence in RNA or DNA, or a pharmaceutically-acceptable salt thereof.
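For a repeat target such as rCUG, both the recognition domain sequence and the number of reagents that could sit side by side on the repeat follow from simple arithmetic. The Python sketch below is a simplified illustration under assumptions that the application does not state: PNA sequences use natural A/C/G/T letters and are written N→C, pairing antiparallel with an RNA target written 5'→3'; reagents bind contiguously with no gaps; and cooperative binding domains, linkers, and sterics are ignored, so the count is only an upper bound. Both function names are hypothetical.

```python
# Assumption: PNA letters A/C/G/T, written N->C, antiparallel to an RNA target written 5'->3'.
RNA_TO_PNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def recognition_domain_for(target_window: str) -> str:
    """PNA recognition sequence complementary to an RNA target window."""
    return "".join(RNA_TO_PNA[base] for base in reversed(target_window.upper()))


def max_tiled_reagents(repeat_unit: str, w: int, k: int) -> int:
    """Upper bound on copies of one k-nucleobase reagent bound side by side on (repeat_unit)w."""
    if not 3 <= k <= 8:
        raise ValueError("recognition domain must have 3 to 8 nucleobases")
    if k % len(repeat_unit) != 0:
        # With contiguous tiling, every copy lands in the same repeat frame
        # only when k is a whole number of repeat units.
        raise ValueError("k must be a multiple of the repeat-unit length")
    return (len(repeat_unit) * w) // k


print(recognition_domain_for("CUGCUG"))   # prints: CAGCAG
print(max_tiled_reagents("CUG", 100, 6))  # prints: 50
```

Because each reagent in this sketch carries one fixed sequence, lengths that are not a multiple of the repeat unit are rejected rather than tiled out of frame.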

[0011] In another aspect, a method of knocking down expression of an mRNA in a cell is provided, where the mRNA comprises a repeated target sequence. The method may comprise contacting the target sequence of the mRNA with a compound comprising: a nucleic acid or nucleic acid analog recognition domain comprising a first end and a second end and from three to eight nucleobases in a sequence complementary to the repeated target sequence; a first peptide nucleic acid (PNA) cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence, covalently linked to the first end of the nucleic acid or nucleic acid analog recognition domain; and a second PNA cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence that is complementary to the nucleobase sequence of the first PNA cooperative binding domain, covalently linked to the second end of the nucleic acid or nucleic acid analog recognition domain, wherein the PNA cooperative binding domains are left-handed gamma-PNA (LH-γPNA) or at least one of the three to eight nucleobases of the PNA cooperative binding domains does not bind with sufficient strength to form duplex structures with natural nucleobases of RNA or DNA under conditions in which the nucleic acid or nucleic acid analog recognition domain binds to a target sequence in RNA or DNA, or a pharmaceutically-acceptable salt thereof.

[0012] In another aspect, a genetic recognition reagent is provided. The reagent may comprise: a nucleic acid recognition domain or nucleic acid analog recognition domain comprising a plurality of units, wherein each unit comprises a residue of a nucleic acid backbone monomer or a nucleic acid analogue backbone monomer, wherein the residue comprises a nucleobase, wherein a sequence of the plurality of units is complementary to a sequence of a target nucleic acid; and a first peptide nucleic acid (PNA) cooperative binding domain that (1) comprises nucleobases that do not bind with sufficient strength to form duplex structures with natural nucleobases of the nucleic acid recognition domain, the nucleic acid analog recognition domain, or the target nucleic acid, or (2) is of opposite handedness relative to the target nucleic acid, wherein the first PNA cooperative binding domain is linked to the nucleic acid backbone or the nucleic acid analog backbone.
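The either/or requirement on the first cooperative binding domain in [0012] can be read as a logical predicate. The following Python sketch encodes it in simplified form and all names are hypothetical: the boolean `uses_pairing_nucleobases` stands in for whether the domain's nucleobases could form duplexes with natural nucleobases, and the default target handedness is taken as right-handed, an assumption corresponding to typical A-form RNA and B-form DNA duplexes.

```python
from dataclasses import dataclass
from enum import Enum


class Handedness(Enum):
    LEFT = "left"
    RIGHT = "right"


@dataclass(frozen=True)
class CooperativeDomain:
    handedness: Handedness
    # True if the nucleobases could form duplexes with natural nucleobases
    # (e.g., natural A/C/G/T); False for non-pairing nucleobases.
    uses_pairing_nucleobases: bool


def meets_orthogonality_condition(
    domain: CooperativeDomain, target: Handedness = Handedness.RIGHT
) -> bool:
    # Alternative (1): nucleobases that do not form duplexes with natural nucleobases.
    # Alternative (2): handedness opposite to that of the target (assumed right-handed).
    return (not domain.uses_pairing_nucleobases) or domain.handedness is not target


lh_gamma_pna = CooperativeDomain(Handedness.LEFT, uses_pairing_nucleobases=True)
print(meets_orthogonality_condition(lh_gamma_pna))  # prints: True
```

In this reading, a left-handed domain with natural nucleobases satisfies the condition through alternative (2), while a right-handed domain satisfies it only if its nucleobases are non-pairing.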

[0013] In another aspect, a composition is provided, comprising a compound or genetic recognition reagent as described in the preceding paragraphs, and a pharmaceutically-acceptable carrier.

[0014] The following numbered clauses describe non-limiting embodiments or aspects of the invention:

Clause 1. A compound, comprising:

a nucleic acid or nucleic acid analog recognition domain comprising a first end and a second end and from three to eight nucleobases in a sequence complementary to a target nucleic acid;

a first peptide nucleic acid (PNA) cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence, covalently linked to the first end of the nucleic acid or nucleic acid analog recognition domain; and a second PNA cooperative binding domain of from two to five PNA monomer residues having a nucleobase sequence that is complementary to the nucleobase sequence of the first PNA cooperative binding domain, covalently linked to the second end of the nucleic acid or nucleic acid analog recognition domain,

wherein the PNA cooperative binding domains are left-handed gamma-PNA (LH-γPNA) or at least one of the three to eight nucleobases of the PNA cooperative binding domains does not bind with sufficient strength to form duplex structures with natural nucleobases of RNA or DNA under conditions in which the nucleic acid or nucleic acid analog recognition domain binds to a target sequence in RNA or DNA,

or a pharmaceutically-acceptable salt thereof.
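The length and complementarity requirements of Clause 1 lend themselves to a mechanical check. The Python sketch below is illustrative only: the class name `Clause1Design`, the helper `antiparallel_complementary`, and the pairing table (Watson-Crick pairs plus the isocytosine (I)/isoguanine (Q) pair named for Figs. 9 and 10) are hypothetical. Antiparallel pairing of b with b' is an assumption of the sketch, and backbone handedness (LH-γPNA) is not modeled.

```python
from dataclasses import dataclass

# Assumed pairing table: Watson-Crick pairs (U read as T) plus the orthogonal
# isocytosine (I) / isoguanine (Q) pair used in Figs. 9 and 10.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("I", "Q"), ("Q", "I")}


def antiparallel_complementary(b: str, b_prime: str) -> bool:
    """True if each base of b pairs with the base of b_prime read in reverse."""
    b, b_prime = b.upper().replace("U", "T"), b_prime.upper().replace("U", "T")
    if len(b) != len(b_prime):
        return False
    return all((x, y) in PAIRS for x, y in zip(b, reversed(b_prime)))


@dataclass
class Clause1Design:
    recognition: str  # nucleobases of the recognition domain
    b: str            # cooperative binding domain on the first end
    b_prime: str      # cooperative binding domain on the second end

    def problems(self) -> list[str]:
        """List every Clause 1 length or complementarity requirement that is not met."""
        issues = []
        if not 3 <= len(self.recognition) <= 8:
            issues.append("recognition domain needs 3-8 nucleobases")
        for name, domain in (("b", self.b), ("b'", self.b_prime)):
            if not 2 <= len(domain) <= 5:
                issues.append(f"{name} needs 2-5 PNA residues")
        if not antiparallel_complementary(self.b, self.b_prime):
            issues.append("b and b' are not complementary")
        return issues


print(Clause1Design("CAGCAG", "IQQ", "IIQ").problems())  # []
print(Clause1Design("CAGCAG", "IQQ", "IQQ").problems())
```

Under these assumptions, the orthogonal pair IQQ/IIQ mentioned in paragraph [0029] passes, whereas a domain paired with an identical copy of itself does not.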

Clause 2. The compound of clause 1, wherein the first and second PNA cooperative binding domains are LH-γPNA.

Clause 3. The compound of clause 1 or 2, wherein at least one nucleobase of the nucleobase sequences of the first and second PNA cooperative binding domains does not hydrogen bond with natural nucleobases.

Clause 4. The compound of any one of clauses 1-3, wherein the nucleic acid or nucleic acid analog recognition domain is PNA.

Clause 5. The compound of clause 4, wherein the nucleic acid or nucleic acid analog recognition domain is right-handed gamma-PNA (RH-γPNA).

Clause 6. The compound of clause 4, comprising one or more pendant guanidine-containing groups, amino acid side chains, or polyethylene glycol (PEG)-containing groups comprising the moiety -(O-CH2-CH2)q-, where q ranges from 1 to 100.

Clause 7. The compound of any one of clauses 1-6, wherein one or both of the first and second cooperative binding domains is covalently linked to the nucleic acid or nucleic acid analog recognition domain by a linker comprising from one to 10 ethylene glycol moieties.
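Clause 7 bounds the ethylene glycol linker at one to 10 moieties, and Clause 22 optionally limits a linker to 5 to 25 carbon atoms and an atomic mass below 400. A minimal Python sketch of that arithmetic follows, assuming rounded standard atomic weights and treating each -(OCH2CH2)- moiety as C2H4O. The function names are hypothetical, and the fragment formulas ignore the atoms at the points of attachment, so they only approximate a real linker.

```python
# Rounded standard atomic weights in g/mol (an assumption of this sketch).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def formula_mass(counts: dict[str, int]) -> float:
    """Sum of atomic weights for an element-count formula such as {"C": 2, "H": 4, "O": 1}."""
    return sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())


def ethylene_glycol_fragment(n_units: int) -> dict[str, int]:
    """Formula of n -(OCH2CH2)- moieties (C2H4O each), ignoring end groups."""
    return {"C": 2 * n_units, "H": 4 * n_units, "O": n_units}


def within_clause22(counts: dict[str, int], check_mass: bool = True) -> bool:
    """5-25 carbon atoms and, when check_mass is set, a mass below 400."""
    carbons_ok = 5 <= counts.get("C", 0) <= 25
    return carbons_ok and (not check_mass or formula_mass(counts) < 400)


for n in (2, 3, 8, 10):
    fragment = ethylene_glycol_fragment(n)
    print(n, round(formula_mass(fragment), 2), within_clause22(fragment))
```

Under these assumptions, two moieties fall short of five carbon atoms, eight moieties (about 352) satisfy both limits, and ten moieties (about 440.5) exceed the optional mass limit of Clause 22.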

Clause 8. The compound of any one of clauses 1-6, having the structure: H-L-Arg-L-Dab(b)-DR-L-Orn(b’)-L-Arg-NH2; H-L-Arg-L-Orn(b)-DR-L-Orn(b’)-L-Arg-NH2; or H-L-Arg-L-Lys(b)-DR-L-Lys(b’)-L-Arg-NH2, wherein Orn is ornithine, Dab is diaminobutyric acid, b and b’ are the first and second PNA cooperative binding domains, and DR is the nucleic acid or nucleic acid analog recognition domain.

Clause 9. The compound of any one of clauses 1 -6, having the structure:

where,

n is an integer ranging from 1 to 6;

each instance of R is, independently, one of the three to eight nucleobases; B is a ribose, deoxyribose, or a nucleic acid analog backbone residue;

each L is, independently, a linker; and

b and b’ are PNA cooperative binding domains, wherein b and b’ are complementary.

Clause 10. The compound of clause 1, wherein the recognition domain comprises nucleic acid analog backbone residues.

Clause 11. The compound of clause 10, wherein the nucleic acid analog backbone residues comprise conformationally preorganized backbone residues.

Clause 12. The compound of clause 11, wherein the conformationally preorganized backbone residues are γPNA, LNA, or glycol nucleic acid backbone residues.

Clause 13. The compound of clause 11, wherein the conformationally preorganized nucleic acid analog backbone residues are γPNA backbone residues.

Clause 14. The compound of clause 13, wherein one or more of the γPNA backbone residues are substituted with a group comprising an ethylene glycol unit having from 1 to 100 ethylene glycol residues, selected from the group consisting of: -(OCH2-CH2)q-OP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50, wherein the ethylene glycol unit is optionally attached to the one or more γPNA backbone residues by a (C1-C6) divalent hydrocarbyl linker or a poly(ethylene glycol) linker comprising from two to six ethylene glycol moieties.

Clause 15. The compound of clause 11, wherein the conformationally preorganized nucleic acid analog backbone residues are RH-γPNA.

Clause 16. The compound of clause 13, having the structure:

where,

each R is, independently, one of the three to eight nucleobases;

n is an integer ranging from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

each L is, independently, a linker;

each R1 and R2 is, independently: a guanidine-containing group; an amino acid side chain; methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain comprising from 1 to 50 units; H; -CH2-(OCH2-CH2)q-OP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; and R3 is the first PNA cooperative binding domain and R3’ is the second PNA cooperative binding domain complementary to R3.

Clause 17. The compound of clause 16, wherein each linker independently comprises one or more guanidine-containing groups, one or more amino acid side chains, or one or more contiguous amino acid residues.

Clause 18. The compound of any one of clauses 1-6 and 10-15, wherein, independently, each PNA cooperative binding domain is linked to the recognition domain with a linker comprising: one or more contiguous amino acid residues; a guanidine-containing group; an amino acid side chain; methyl; ethyl; linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)q-OP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2; or one or more contiguous amino acid residues substituted with one or more of:

a guanidine-containing group, an amino acid side chain, methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)q-OP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2;

where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

Clause 19. The compound of any one of clauses 1-16, wherein, independently, each PNA cooperative binding domain is linked to the recognition domain with a linker comprising the group , where v ranges from 1 to 10, and each instance of X is -(OCH2CH2)-, methylene, or a heteroatom, such as one or more S-, O-, or N-containing moieties.

Clause 20. The compound of clause 19, wherein v ranges from 1 to 3 and/or X is -(OCH2CH2)-.

Clause 21. The compound of any one of clauses 1-16, wherein, independently, each PNA cooperative binding domain is linked to the recognition domain with a linker comprising a first amino acid residue having the side group , where v ranges from 1 to 10 and each X is -(OCH2CH2)-, methylene, or a heteroatom-containing moiety, and, optionally, N-terminal and C-terminal arginine, lysine, ornithine, or diaminobutyric acid residues attached to each of the first amino acid residues.

Clause 22. The compound of any one of clauses 1-6 and 10-15, comprising a plurality of linkers having from 5 to 25 carbon atoms covalently connecting the recognition domain and the cooperative binding domains, wherein, optionally, the linker comprises one or more heteroatoms and/or has an atomic mass of less than 400.

Clause 23. The compound of clause 13, having the structure:

where,

each R is, independently, one of the three to eight nucleobases;

n is an integer ranging from 1 to 6;

each R1 and R2 is, independently: a guanidine-containing group; an amino acid side chain; methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; H; -CH2-(OCH2-CH2)q-OP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene;

one of R4 or R5 is -L-R3, and one of R6, R7, or R8 is -L-R3’, where R3 is the first PNA cooperative binding domain and R3’ is the second PNA cooperative binding domain complementary to R3, and each L is, independently, a linker, and the remainder of R4, R5, R6, R7, and R8 are each, independently: H; one or more contiguous amino acid residues; a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)q-OP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2; or one or more contiguous amino acid residues substituted with:

one or more of a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)q-OP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2;

where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene;

q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

Clause 24. The compound of clause 23, wherein one or more of R1, R2, R4, R5, R6, R7, or R8 is (C1-C8)alkyl substituted with -(OCH2-CH2)q-OP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

Clause 25. The compound of clause 23 or 24, wherein R4 and R7 are -L-R3 and -L-R3’, respectively.

Clause 26. The compound of any one of clauses 23-25, wherein Rs and Rs independently comprise an arginine, lysine, ornithine, or diaminobutyric acid residue.

Clause 27. The compound of clause 13, having the structure:

where,

n is an integer ranging from 1 to 8;

each m is, independently, an integer ranging from 1 to 5;

each X is, independently, -(OCH2CH2)-, methylene, or a heteroatom-containing moiety;

each R2 is: a guanidine-containing group; an amino acid side chain; methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)q-OP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R3 is the first left-handed PNA cooperative binding domain and R3’ is the second left-handed PNA cooperative binding domain complementary to R3; and each of Rs, R7, and Rs is, independently, H, a guanidine-containing group, an amino acid side chain, or one or more contiguous amino acid residues.

Clause 28. The compound of clause 27, wherein one or more of R2, R3, R3’, Rs, R7, or Rs is (C1-C6)alkyl substituted with -(OCH2-CH2)q-OP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

Clause 29. The compound of any one of clauses 13-26, wherein R1 is H and R1 and R2 are different.

Clause 30. The compound of any one of clauses 1-29, comprising a guanidine moiety.

Clause 31. The compound of clause 30, wherein the guanidine-containing moiety is

Clause 32. The compound of any one of clauses 16-29, wherein R2 is -CH2-(OCH2-CH2)r-OH, wherein r is an integer of from 1 to 50, from 1 to 10, or 2.

Clause 33. The compound of any one of clauses 1-32, wherein the recognition domain is fully complementary to a nucleic acid having an expanded repeat that is associated with a repeat expansion disease.

Clause 34. The compound of clause 33, wherein the repeat expansion disease is FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

Clause 35. The compound of clause 33 or clause 34, wherein the expanded repeat has one of the following sequences: (GAA)w, (CGG)w, (CCG)w, (CAG)w, (CTG)w, (CCTG)w, (ATTCT)w, or (GGGGCC)w, where w is at least 3.
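The (unit)w notation of Clause 35 can be checked against a sequence by counting contiguous tandem copies of each listed unit. A minimal Python sketch follows; the helper names `longest_run` and `expanded_repeats` are hypothetical, RNA input is assumed to be converted to the DNA alphabet (U read as T), and the threshold `min_w` defaults to the "at least 3" of the clause rather than to any clinical cutoff.

```python
import re

# Repeat units listed in Clause 35 (DNA alphabet; RNA input has U mapped to T).
REPEAT_UNITS = ["GAA", "CGG", "CCG", "CAG", "CTG", "CCTG", "ATTCT", "GGGGCC"]


def longest_run(seq: str, unit: str) -> int:
    """Largest w such that (unit)w occurs as a contiguous tandem run in seq."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for match in re.finditer(f"(?:{unit})+", seq):
        best = max(best, len(match.group()) // len(unit))
    return best


def expanded_repeats(seq: str, min_w: int = 3) -> dict[str, int]:
    """Listed units whose longest tandem run has at least min_w copies."""
    runs = {unit: longest_run(seq, unit) for unit in REPEAT_UNITS}
    return {unit: w for unit, w in runs.items() if w >= min_w}


transcript = "AUG" + "CUG" * 5 + "UAA"  # hypothetical RNA fragment
print(expanded_repeats(transcript))  # prints {'CTG': 5}
```

A fragment with only two CUG copies returns an empty dictionary, because its run falls below w = 3.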

Clause 36. A method of binding a nucleic acid comprising a target sequence, the method comprising contacting the nucleic acid with a compound of any one of clauses 1-32, in which the sequence of the recognition domain is complementary to the target sequence.

Clause 37. The method of clause 36, wherein the nucleobase sequence of the recognition domain is fully complementary to a nucleic acid having an expanded repeat associated with a repeat expansion disease.

Clause 38. The method of clause 37, wherein the repeat expansion disease is FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

Clause 39. The method of clause 37 or clause 38, wherein the expanded repeat has one of the following sequences: (GAA)w, (CGG)w, (CCG)w, (CAG)w, (CTG)w, (CCTG)w, (ATTCT)w, or (GGGGCC)w, where w is at least 3.

Clause 40. A method of knocking down expression of an mRNA in a cell, where the mRNA comprises a repeated target sequence, comprising contacting the target sequence of the mRNA with a compound of any one of clauses 1-32, in which the sequence of the recognition domain is complementary to the repeated target sequence of the mRNA.

Clause 41. The method of clause 40, wherein the nucleobase sequence of the recognition domain is fully complementary to a nucleic acid having an expanded repeat associated with a repeat expansion disease.

Clause 42. The method of clause 41, wherein the repeat expansion disease is FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

Clause 43. The method of clause 41 or clause 42, wherein the expanded repeat has one of the following sequences: (GAA)w, (CGG)w, (CCG)w, (CAG)w, (CTG)w, (CCTG)w, (ATTCT)w, or (GGGGCC)w, where w is at least 3.

Clause 44. A composition comprising the compound of any one of clauses 1-34, and a pharmaceutically-acceptable carrier.

Clause 45. A genetic recognition reagent comprising: a nucleic acid recognition domain or nucleic acid analog recognition domain comprising a plurality of units, wherein each unit comprises a residue of a nucleic acid backbone monomer or a nucleic acid analogue backbone monomer, wherein the residue comprises a nucleobase, wherein a sequence of the plurality of units is complementary to a sequence of a target nucleic acid; and

a first peptide nucleic acid (PNA) cooperative binding domain that (1) comprises nucleobases that do not bind with sufficient strength to form duplex structures with natural nucleobases of the nucleic acid recognition domain, the nucleic acid analog recognition domain, or the target nucleic acid, or (2) is of opposite handedness relative to the target nucleic acid, wherein the first PNA cooperative binding domain is linked to the nucleic acid backbone or the nucleic acid analog backbone.

Clause 46. The genetic recognition reagent of clause 45, further comprising a second PNA cooperative binding domain having a nucleobase sequence that is complementary to a nucleobase sequence of the first PNA cooperative binding domain.

Clause 47. The genetic recognition reagent of clause 45, wherein the first PNA cooperative binding domain is a left-handed gamma-PNA (LH-γPNA).

Clause 48. The genetic recognition reagent of clause 45, wherein the nucleic acid recognition domain or the nucleic acid analog recognition domain is PNA.

Clause 49. The genetic recognition reagent of clause 47, wherein the nucleic acid recognition domain or the nucleic acid analog recognition domain is right-handed gamma-PNA (RH-γPNA).

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] Fig. 1 shows cooperative binding of nucleic acid probes to expanded RNA repeats.

[0016] Fig. 2A: Examples of chemical compositions of PNA for recognition (a') and cooperative binding (b and b') domains. Fig. 2B provides exemplary naturally-compatible and orthogonal nucleobases for B of Fig. 2A.

[0017] Fig. 3 (A-F) provides exemplary structures of nucleic acid analogs. R is a nucleobase moiety.

[0018] Fig. 4 provides examples of amino acid side chains.

[0019] Figure 5 provides structures of exemplary nucleobases. R is H or a protecting group.

[0020] Figures 6A-6C provide structures of exemplary divalent nucleobases. R is a nucleic acid backbone or a nucleic acid analog backbone, or a residue thereof. R1 is H or a protecting group.

[0021] Fig. 7 provides examples of repeated expansions in humans.

[0022] Fig. 8 depicts a first scheme for targeting repeat expansions as described in Example 1.

[0023] Fig. 9 depicts a second scheme for targeting repeat expansions as described in Example 2. I is isocytosine and Q is isoguanine, as depicted in Fig. 2B.

[0024] Fig. 10 depicts a third scheme for targeting repeat expansions as described in Example 3. I is isocytosine and Q is isoguanine, as depicted in Fig. 2B.

DETAILED DESCRIPTION

[0025] Unless expressly indicated otherwise, the numerical values in the various ranges specified in this application are stated as approximations, as though the minimum and maximum values within the stated ranges were both preceded by the word "about". In this manner, slight variations above and below the stated ranges can be used to achieve substantially the same results as values within the ranges. Also, unless indicated otherwise, the disclosure of ranges is intended as a continuous range including every value between the minimum and maximum values. As used herein, “a” and “an” refer to one or more.

[0026] As used herein, the term “comprising” is open-ended and may be synonymous with “including”, “containing”, or “characterized by”. The term “consisting essentially of” limits the scope of a claim to the specified materials or steps and those that do not materially affect the basic and novel characteristic(s). The term “consisting of” excludes any element, step, or ingredient not specified in the claim. As used herein, embodiments “comprising” one or more stated elements or steps also include, but are not limited to, embodiments “consisting essentially of” and “consisting of” these stated elements or steps.

[0027] Provided herein are compositions and methods for binding target sequences in nucleic acids, for example, for binding repeat expansions of nucleotide sequences associated with disease. Fig. 1 is a schematic diagram illustrating the cooperative binding of a non-limiting example of the recognition reagents (genetic recognition reagents) described herein, targeting CUG repeats in RNA hairpin structures as seen in DM1. In this example, cooperative binding of each module to adjacent modules is facilitated by the terminal left-handed PNA groups. The recognition reagents bind cooperatively to a template nucleic acid by Watson-Crick or Watson-Crick-like base pairing. In a cell, a template nucleic acid is an RNA or DNA molecule; in vitro, a template nucleic acid can be any RNA or DNA, as well as a modified nucleic acid or a nucleic acid analog. Recognition reagents in sufficient proximity, for example, reagents bound to adjacent sequences on a template nucleic acid, concatenate via hybridization of their left-handed PNA cooperative binding domains, essentially forming a longer oligomer or polymer.

[0028] Fig. 1 provides a non-limiting example of a scheme depicting cooperative binding of nucleic acid probes (recognition reagents) to expanded RNA repeats, as described herein. Each recognition reagent has two types of domain: a nucleic acid or nucleic acid analog recognition domain (a') and nucleic acid or nucleic acid analog cooperative binding domains (b and b'). The recognition domain comprises a nucleic acid or nucleic acid analog, such as an achiral or chiral peptide nucleic acid backbone to which nucleobases are attached. The nucleobases can be natural or unnatural (not found in nature). In an individual probe in the unbound state, b and b' are unable to form stable Watson-Crick (or related) hydrogen-bonding interactions with each other, owing to length constraints in the linkers that connect the recognition domain to the cooperative binding domains and/or to the length and rigidity of the backbone of the recognition domain a'. In use, for example in the presence of expanded RNA repeats, the probes hybridize to the RNA repeats next to one another, an arrangement mediated by intermolecular Watson-Crick hydrogen bonding between the cooperative binding domains (b and b') of adjacent probes. Figs. 8 through 10 (see Examples, below) provide non-limiting examples of molecular designs for targeting RNA repeat expansions. This binding mode enables the probes to discriminate expanded RNA repeats from the normal transcript, providing a novel therapeutic modality for treating neuromuscular and neurodegenerative disorders associated with unstable repeat expansions (Fig. 7, below).
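Why cooperative binding favors expanded repeats can be illustrated with a toy count of how many probes fit end to end on a template. The Python sketch below is a simplified model, not the disclosed method: it assumes each probe covers exactly one copy of a fixed template segment (`site`), and that two probes can pair through b and b' only when their sites abut without overlap. The function names are hypothetical.

```python
def binding_sites(template: str, site: str) -> list[int]:
    """Start positions at which the probe's target site occurs in the template."""
    k = len(site)
    return [i for i in range(len(template) - k + 1) if template[i:i + k] == site]


def longest_chain(template: str, site: str) -> int:
    """Largest number of probes that can sit end to end on the template.

    Toy assumptions: each probe covers exactly len(site) nucleotides, and two
    probes can pair through b/b' only when their sites abut without overlap.
    """
    if not site:
        raise ValueError("site must be non-empty")
    k = len(site)
    sites = set(binding_sites(template, site))
    best = 0
    for start in sites:
        if start - k in sites:
            continue  # a longer chain containing this one starts further upstream
        length, pos = 0, start
        while pos in sites:
            length += 1
            pos += k
        best = max(best, length)
    return best


site = "CUGCUG"  # template segment covered by one hypothetical probe
print(longest_chain("CUG" * 10, site), longest_chain("CUG" * 4, site))
```

In this model, ten CUG copies accommodate a chain of five concatenated probes, whereas four copies accommodate only two, so the number of mutually stabilizing b/b' contacts grows with repeat length.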

[0029] Of note, the cooperative binding domains are orthogonal to natural nucleic acids, such as RNA or DNA. By “orthogonal” it is meant that they do not hybridize to natural RNA or DNA sequences, or do not do so to any significant extent in the context of the use of the recognition reagent. Orthogonality may be achieved by using an orthogonal nucleic acid analog backbone for the cooperative binding domains, such as left-handed γPNA (see, e.g., Example 1, below), or by using orthogonal nucleobases for the cooperative binding domains (see, e.g., Examples 2 and 3, below). Orthogonal nucleic acid analog backbones and orthogonal nucleobases may also be combined in one or both cooperative binding domains, for example by combining left-handed γPNA, as in Example 1, with orthogonal nucleobase sequences such as IQQ and IIQ, as depicted in Examples 2 and 3.

[0030] In reference to Fig. 2A, a is PNA (achiral), b is right-handed γPNA, and c is left-handed γPNA. X includes, but is not limited to, any one of the following functional groups:

(1) Amino acid side chains (Ala, -CH3; Val, -CH(CH3)2; Ile, -CH(CH3)CH2CH3; Leu, -CH2CH(CH3)2; Met, -CH2CH2SCH3; Phe, -CH2C6H5; Tyr, -CH2C6H4OH; Trp, -CH2C8H5NH; Ser, -CH2OH; HSer, -CH2CH2OH; Thr, -CH(CH3)OH; Asn, -CH2CONH2; Gln, -CH2CH2CONH2; Cys, -CH2SH; Sec, -CH2SeH; Gly, -H; Pro, -(CH2)3-; Arg, -(CH2)3NHC(NH)NH2; His, -CH2C3H3N2; Lys, -(CH2)4NH2; Asp, -CH2CO2H; and Glu, -(CH2)2CO2H).

(2) Linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, -CH2-(OCH2-CH2)q-OH, -CH2-(OCH2-CH2)q-NH2, -CH2-(OCH2-CH2)q-NHC(NH)NH2, -CH2-(OCH2-CH2-O)q-SH, -CH2-(SCH2-CH2)q-SH, and -(CH2CH2)q-NHC(NH)NH2, where subscript q is an integer from 0 to 25.

[0031] Nucleobases may include any of the nucleobases depicted in Fig. 2B (both natural and unnatural pairs, and derivatives thereof). One or more of the nucleobases may, independently, be orthogonal nucleobases, such as an orthogonal nucleobase depicted in Fig. 2B. In some cases, an orthogonal nucleobase (e.g., in the RH-γPNA cooperative binding domains) can be a nucleobase that does not bind to a natural nucleobase.

[0032] Advantages may include (1) the tight, specific, selective, and cooperative binding of the nucleic acid probes to the expanded RNA repeats, and (2) the generalizability of this approach to other repeated sequences, including, but not limited to, CUG (DM1), CCUG (DM2), GCC (FRAX-E), GAA (FRDA), AUUCU (SCA10), and GGGGCC (ALS). A combination of right-handed (RH) and left-handed (LH) gamma peptide nucleic acid (γPNA), along with natural and unnatural nucleic acid recognition elements, can be employed in the construction of nucleic acid probes that discriminate expanded RNA repeats.
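A starting point for a recognition domain against each repeat listed above is the antiparallel complement of the repeat unit. The Python sketch below derives it, assuming natural Watson-Crick pairing and writing the result left to right (5'-to-3' for DNA, which corresponds to N-to-C for a PNA aligned antiparallel to the RNA). The function name is hypothetical. Because Clause 1 allows three to eight nucleobases, the number of copies and the reading frame (for a CUG repeat, CAG, AGC, or GCA) are design choices that this sketch does not make.

```python
# RNA repeat units named in paragraph [0032], keyed by the associated disease.
REPEATS = {"DM1": "CUG", "DM2": "CCUG", "FRAX-E": "GCC",
           "FRDA": "GAA", "SCA10": "AUUCU", "ALS": "GGGGCC"}

# Complement of each RNA base, written with T as in PNA/DNA sequences.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def recognition_sequence(rna_unit: str, copies: int = 1) -> str:
    """Antiparallel complement of `copies` tandem units, written left to right."""
    target = rna_unit.upper() * copies
    return "".join(COMPLEMENT[base] for base in reversed(target))


for disease, unit in REPEATS.items():
    print(disease, unit, recognition_sequence(unit))
```

For DM1, for example, one copy of CUG gives CAG, and two copies give CAGCAG, a six-nucleobase sequence within the three-to-eight range of Clause 1.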

[0033] As used herein, a "patient" can be an animal, such as a mammal, including, but not limited to, a primate (such as a human or a non-human primate, e.g., a monkey or a chimpanzee), a non-primate (such as a cow, a pig, a camel, a llama, a horse, a goat, a rabbit, a sheep, a hamster, a guinea pig, a cat, a dog, a rat, a mouse, or a whale), or a bird (e.g., a duck or a goose).

[0034] As used herein, "protecting group" can refer to a moiety of a compound that masks or alters the properties of a functional group to which it is bound, or the properties of the compound as a whole. Non-limiting examples of protecting groups include methyl, formyl, ethyl, acetyl, anisyl, benzyl, benzoyl, carbamate, trifluoroacetyl, diphenylmethyl, triphenylmethyl, benzyloxymethyl, benzyloxycarbonyl, 2-nitrobenzoyl, t-Boc (tert-butyloxycarbonyl), 4-methylbenzyl, 4-nitrophenyl, 2-chlorobenzyloxycarbonyl, 2-bromobenzyloxycarbonyl, 2,4,5-trichlorophenyl, thioanisyl, thiocresyl, cbz (carbobenzyloxy), p-methoxybenzyl carbonyl, 9-fluorenylmethyloxycarbonyl, pentafluorophenyl, p-methoxybenzyl, 3,4-dimethoxybenzyl, p-methoxyphenyl, 4-toluenesulfonyl, p-nitrobenzenesulfonyl, 2-nitrophenylsulfenyl, 2,2,5,7,8-pentamethylchroman-6-sulfonyl, N-succinimidyl, and p-bromobenzenesulfonyl.

[0035] As used herein, the terms "treating" or "treatment" can refer to a beneficial or specific result, such as an improvement in one or more functions or symptoms of a disease. The terms "treating" and "treatment" can also include, but are not limited to, alleviation or amelioration of one or more symptoms of a repeat expansion disease, such as DM1, DM2, or Huntington’s disease. "Treatment" can also mean prolonging survival as compared to expected survival in the absence of treatment.

[0036] "Lower," in the context of a disease marker or symptom, can refer to a clinically relevant and/or statistically significant decrease in such level. The decrease can be, for example, at least 10%, at least 20%, at least 30%, at least 40%, or more, down to a level accepted as within the range of normal for an individual without such disorder, or to below the level of detection of the assay. In certain aspects, the decrease can be down to a level accepted as within the range of normal for an individual without such disorder, which can also be referred to as normalization of a level. In certain aspects, the reduction can be the normalization of the level of a sign or symptom of a disease, that is, a reduction in the difference between the subject's level of a sign of the disease and the normal level of that sign (e.g., to the upper limit of normal when the subject's value must be decreased to reach a normal value, and to the lower limit of normal when the subject's value must be increased to reach a normal value). The methods may include a clinically relevant inhibition of expression of an mRNA of a repeat expansion disease, such as DM1, DM2, or Huntington’s disease, e.g., as demonstrated by a clinically relevant outcome after treatment of a subject with a recognition reagent as described herein.
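The two quantities defined above, a percentage decrease and the remaining gap to the nearest limit of the normal range, reduce to simple arithmetic. A minimal Python sketch follows; the function names and the marker values are hypothetical, and the normal range is assumed to be a closed interval [lower, upper].

```python
def percent_decrease(baseline: float, treated: float) -> float:
    """Decrease of a marker relative to baseline, in percent (positive = lower)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - treated) / baseline


def gap_to_normal(value: float, lower: float, upper: float) -> float:
    """Distance to the nearest limit of the normal range [lower, upper].

    Measured to the upper limit when the value is too high and to the lower
    limit when it is too low; 0.0 means the level is normalized.
    """
    if value > upper:
        return value - upper
    if value < lower:
        return lower - value
    return 0.0


baseline, treated = 200.0, 150.0  # hypothetical marker levels
normal_range = (20.0, 100.0)      # hypothetical normal range
print(percent_decrease(baseline, treated))  # 25.0
print(gap_to_normal(baseline, *normal_range), gap_to_normal(treated, *normal_range))
```

With these hypothetical values, the marker is lowered by 25%, which meets the "at least 20%" example, and the gap to the upper limit of normal shrinks from 100 to 50 without yet reaching normalization.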

[0037] "Therapeutically effective amount," as used herein, can include the amount of a recognition reagent as described herein that, when administered to a subject having a disease, can be sufficient to effect treatment of the disease (e.g., by diminishing, ameliorating or maintaining the existing disease or one or more symptoms of disease). The "therapeutically effective amount" may vary depending on the recognition reagent (agent), how the agent is administered, the disease and its severity and the history, age, weight, family history, genetic makeup, the types of preceding or concomitant treatments, if any, and other individual characteristics of the subject to be treated.

[0038] A "therapeutically effective amount" can also include an amount of an agent that produces a local or systemic effect at a reasonable benefit/risk ratio applicable to any treatment. Recognition reagents employed in the methods described herein may be administered in an amount sufficient to produce a reasonable benefit/risk ratio applicable to such treatment.

[0039] The phrase "pharmaceutically-acceptable carrier" as used herein can refer to a pharmaceutically-acceptable material, composition, or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the subject compound from one organ, or portion of the body, to another organ, or portion of the body. Each carrier can be "acceptable" in the sense of being compatible with the other ingredients of the formulation and not injurious to the subject being treated. Some non-limiting examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose, and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, ethyl cellulose, and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate, and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol, and polyethylene glycol; (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates, and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL, and LDL; and (24) other non-toxic compatible substances employed in pharmaceutical formulations.

[0040] As used herein, the terms “cell” and “cells” can refer to any type of cell from any animal, such as, without limitation, rat, mouse, monkey, and human. For example and without limitation, cells can be progenitor cells, such as stem cells, or differentiated cells, such as endothelial cells or smooth muscle cells. In certain embodiments, cells for medical procedures can be obtained from the patient for autologous procedures or from other donors for allogeneic procedures.

[0041] The terms "expression" or “gene expression” can refer to the overall flow of information from a gene (without limitation, a functional genetic unit for producing a gene product, such as RNA or a protein, in a cell or other expression system, encoded on a nucleic acid and comprising: a transcriptional promoter and other cis-acting elements, such as response elements and/or enhancers; an expressed sequence that typically encodes a protein (open-reading frame or ORF) or a functional/structural RNA; and a polyadenylation sequence) to produce a gene product (typically a protein, optionally post-translationally modified, or a functional/structural RNA). The phrases "expression of genes under transcriptional control of," or alternately "subject to control by," with regard to a designated sequence, can refer to gene expression from a gene containing the designated sequence operably linked (functionally attached, typically in cis) to the gene. The designated sequence may be all or part of the transcriptional elements (without limitation, promoters, enhancers and response elements), and may wholly or partially regulate and/or affect transcription of a gene. A "gene for expression of" a stated gene product can be a gene capable of expressing that stated gene product when placed in a suitable environment, that is, for example, when transformed, transfected, transduced, etc. into a cell, and subjected to suitable conditions for expression. In the case of a constitutive promoter, "suitable conditions" can mean that the gene typically need only be introduced into a host cell. In the case of an inducible promoter, "suitable conditions" can refer to conditions that, when an amount of the respective inducer is administered to the expression system (e.g., cell), are effective to cause expression of the gene.

[0042] As used herein, the term “knockdown” can mean that expression of one or more genes in an organism is reduced, typically significantly, with respect to a functional gene, such as to a therapeutically-effective degree. Gene knockdown can also include complete gene silencing. As used herein, “gene silencing” can mean that expression of a gene is essentially completely prevented. Knockdown and gene silencing may occur either at the transcriptional stage or at the translational stage. Use of the described recognition reagents to target an RNA in a cell, such as an mRNA, may modify gene expression by knocking down or silencing a gene or genes at the translational stage.

[0043] As used herein, the term “nucleic acid” refers to deoxyribonucleic acids (DNA) and ribonucleic acids (RNA). Nucleic acid analogs include, for example and without limitation: 2’-O-methyl-substituted RNA, locked nucleic acids, unlocked nucleic acids, triazole-linked DNA, peptide nucleic acids, morpholino oligomers, dideoxynucleotide oligomers, glycol nucleic acids, threose nucleic acids and combinations thereof including, but not limited to, ribonucleotide or deoxyribonucleotide residue(s). Herein, “nucleic acid” and “oligonucleotide,” in reference to nucleic acids and nucleic acid analogs, are used interchangeably, and can refer to a short, single-stranded structure made up of nucleotides. An oligonucleotide may be referred to by the length (i.e., number of nucleotides) of the strand, through the nomenclature “-mer”. For example, an oligonucleotide of 22 nucleotides would be referred to as a 22-mer.

[0044] A “nucleic acid analog” can be a composition comprising a sequence of nucleobases arranged on a substrate, such as a polymeric backbone, and can bind DNA and/or RNA by hybridization by Watson-Crick, or Watson-Crick-like, hydrogen bond base pairing. Non-limiting examples of common nucleic acid analogs include peptide nucleic acids (PNAs), such as γPNA, morpholino nucleic acids, phosphorothioates, locked nucleic acid (2’-O-4’-C-methylene bridge, including, but not limited to, oxy, thio or amino versions thereof), unlocked nucleic acid (the C2’-C3’ bond is cleaved), 2’-O-methyl-substituted RNA, threose nucleic acid, glycol nucleic acid, etc.

[0045] A conformationally preorganized nucleic acid analog can be a nucleic acid analog that has a backbone (a preorganized backbone) that forms either a right-handed helix or a left-handed helix, depending on the structure of the nucleic acid backbone. As shown herein, one example of a conformationally preorganized nucleic acid analog is γPNA, which has a chiral center at the γ carbon and, depending on the chirality of the groups at the γ carbon, forms a right-handed helix or a left-handed helix. Likewise, locked nucleic acids can comprise a ribose with a bridge between the 2’ oxygen and the 4’ carbon, which “locks” the ribose into a 3’-endo (North) conformation.

[0046] In the context of the present disclosure, a “nucleotide” can refer to a monomer comprising at least one nucleobase and a backbone element (backbone moiety), which in a nucleic acid, such as RNA or DNA, is ribose or deoxyribose. “Nucleotides” also typically comprise reactive groups that permit polymerization under specific conditions. In native DNA and RNA, those reactive groups are the 5’ phosphate and 3’ hydroxyl groups. For chemical synthesis of nucleic acids and analogs thereof, the bases and backbone monomers may contain modified groups, such as blocked amines. A “nucleotide residue” can refer to a single nucleotide that is incorporated into an oligonucleotide or polynucleotide. Likewise, a “nucleobase residue” can refer to a nucleobase incorporated into a nucleotide or a nucleic acid or analog thereof. A “genetic recognition reagent” can refer generically to a nucleic acid or a nucleic acid analog that comprises a sequence of nucleobases that is able to hybridize to a complementary nucleic acid or nucleic acid analog sequence on a nucleic acid by cooperative base pairing, e.g., Watson-Crick base pairing or Watson-Crick-like base pairing.

[0047] In further detail, nucleotides, for either RNA, DNA, or nucleic acid analogs, can have the structure A-B, wherein A is a backbone monomer moiety and B is a nucleobase as described herein. The backbone monomer can be any suitable nucleic acid backbone monomer, such as a ribose triphosphate or deoxyribose triphosphate, or a monomer of a nucleic acid analog, such as peptide nucleic acid (PNA), such as a gamma PNA (γPNA). In one example the backbone monomer is a ribose mono-, di-, or tri-phosphate or a deoxyribose mono-, di-, or tri-phosphate, such as a 5’ monophosphate, diphosphate, or triphosphate of ribose or deoxyribose. The backbone monomer can include both the structural “residue” component, such as the ribose in RNA, and any active groups that are modified in linking monomers together, such as the 5’ triphosphate and 3’ hydroxyl groups of a ribonucleotide, which are modified when polymerized into RNA to leave a phosphodiester linkage. Likewise for PNA, the C-terminal carboxyl and N-terminal amine active groups of the N-(2-aminoethyl)glycine backbone monomer can be condensed during polymerization to leave a peptide (amide) bond. In another aspect, the active groups are phosphoramidite groups useful for phosphoramidite oligomer synthesis. The nucleotide also optionally comprises one or more protecting groups, such as 4,4’-dimethoxytrityl (DMT), and as described herein. A number of additional methods of preparing synthetic genetic recognition reagents depend on the backbone structure and particular chemistry of the base addition process. Determination of which active groups to utilize in joining nucleotide monomers and which groups to protect in the bases, and the required steps in preparation of oligomers, is well within the abilities of those of ordinary skill in the chemical arts and in the field of nucleic acid and nucleic acid analog oligomer synthesis.

[0048] Non-limiting examples of common nucleic acid analogs include peptide nucleic acids, such as γPNA, phosphorothioate (e.g., Fig. 3 (A)), locked nucleic acid (2’-O-4’-C-methylene bridge, including, but not limited to, oxy, thio or amino versions thereof, e.g., Fig. 3 (B)), unlocked nucleic acid (the C2’-C3’ bond is cleaved, e.g., Fig. 3 (C)), 2’-O-methyl-substituted RNA, morpholino nucleic acid (e.g., Fig. 3 (D)), threose nucleic acid (e.g., Fig. 3 (E)), glycol nucleic acid (e.g., Fig. 3 (F), showing R and S forms), and phosphorodiamidate morpholino oligomer (PMO). Fig. 3 (A-F) shows monomer structures for various examples of nucleic acid analogs; each panel shows two monomer residues incorporated into a longer chain, as indicated by the wavy lines. Incorporated monomers are referred to herein as “residues”, and the part of the nucleic acid or nucleic acid analog excluding the nucleobases is referred to as the “backbone” of the nucleic acid or nucleic acid analog. As an example, for RNA, an exemplary nucleobase is adenine, a corresponding monomer is adenosine triphosphate, and the incorporated residue is an adenosine monophosphate residue. For RNA, the “backbone” consists of ribose subunits linked by phosphates, and, thus, the backbone monomer is ribose triphosphate prior to incorporation and a ribose monophosphate residue after incorporation. Like γPNA, locked nucleic acid (e.g., Fig. 3 (B)) is conformationally preorganized.

[0049] A “moiety” is a part of a molecule, and can include as a class “residues”, which are the portion of a compound or monomer that remains in a larger molecule, such as a polymer chain, after incorporation of that compound or monomer into the larger molecule, such as a nucleotide as incorporated into a nucleic acid or an amino acid as incorporated into a polypeptide or protein.

[0050] The term "polymer composition" is a composition comprising one or more polymers. As a class, "polymers" can include, without limitation, homopolymers, heteropolymers, co-polymers, block polymers, block co-polymers and can be both natural and synthetic. Homopolymers contain one type of building block, or monomer, whereas co-polymers contain more than one type of monomer. An “oligomer” can be a polymer that comprises a small number of monomers, such as, for example, from 3 to 100 monomer residues. As such, the term “polymer” can include oligomers. The terms “nucleic acid” and “nucleic acid analog” can include nucleic acid and nucleic acid polymers and oligomers.

[0051] A polymer "comprises" or is "derived from" a stated monomer if that monomer is incorporated into the polymer. Thus, the incorporated monomer that the polymer comprises is not the same as the monomer prior to incorporation into a polymer, in that at the very least, certain linking groups are incorporated into the polymer backbone or certain groups are removed in the polymerization process. A polymer is said to comprise a specific type of linkage if that linkage is present in the polymer. An incorporated monomer can be a “residue”. A typical monomer for a nucleic acid or nucleic acid analog is referred to as a nucleotide.

[0052] “Non-reactive”, in the context of a chemical constituent, such as a molecule, compound, composition, group, moiety, ion, etc., can mean that the constituent does not react with other chemical constituents in its intended use to any substantial extent. The non-reactive constituent is selected so as not to interfere, or to interfere only insignificantly, with the intended use of the constituent, moiety, or group as a recognition reagent. In the context of the linker moieties described herein, the constituents can be non-reactive in that they do not interfere with the binding of the recognition reagents to a target template, and do not interfere with concatenation of the recognition reagents on the target template.

[0053] As used herein, "alkyl" refers to straight, branched chain, or cyclic hydrocarbon groups including, for example, from 1 to about 20 carbon atoms, for example and without limitation C1-3, C1-6, C1-10 groups, for example and without limitation, straight, branched chain alkyl groups such as methyl, ethyl, propyl, butyl, pentyl, hexyl, heptyl, octyl, nonyl, decyl, undecyl, dodecyl, and the like. An alkyl group can be, for example, a C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25, C26, C27, C28, C29, C30, C31, C32, C33, C34, C35, C36, C37, C38, C39, C40, C41, C42, C43, C44, C45, C46, C47, C48, C49, or C50 group that is substituted or unsubstituted. Non-limiting examples of straight alkyl groups include methyl, ethyl, propyl, butyl, pentyl, hexyl, heptyl, octyl, nonyl, and decyl. Branched alkyl groups comprise any straight alkyl group substituted with any number of alkyl groups. Non-limiting examples of branched alkyl groups include isopropyl, isobutyl, sec-butyl, and t-butyl. Non-limiting examples of cyclic alkyl groups include cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl, cycloheptyl, and cyclooctyl groups. Cyclic alkyl groups also comprise fused-, bridged-, and spiro-bicycles and higher fused-, bridged-, and spiro-systems. A cyclic alkyl group can be substituted with any number of straight, branched, or cyclic alkyl groups. "Substituted alkyl" can include alkyl substituted at 1 or more (e.g., 1, 2, 3, 4, 5, or even 6) positions, which substituents are attached at any available atom to produce a stable compound, with substitution as described herein. "Optionally substituted alkyl" refers to alkyl or substituted alkyl. "Halogen," "halide," and "halo" refer to -F, -Cl, -Br, and/or -I.
"Alkylene" and "substituted alkylene" can include divalent alkyl and divalent substituted alkyl, respectively, including, without limitation, methylene, ethylene, trimethylene, tetramethylene, pentamethylene, hexamethylene, heptamethylene, octamethylene, nonamethylene, or decamethylene. "Optionally substituted alkylene" can include alkylene or substituted alkylene.
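For the unbranched, unsubstituted members of the series named above, the molecular formula follows directly from the definitions: a monovalent alkyl group is CnH2n+1 and a divalent polymethylene (alkylene) group is CnH2n. The sketch below is a hypothetical illustration, not part of the disclosure; the helper names are assumptions, and it simply tabulates those formulas for methyl through decyl and methylene through decamethylene.

```python
# Illustrative only: straight-chain, unsubstituted groups named in [0053].
ALKYL_NAMES = ["methyl", "ethyl", "propyl", "butyl", "pentyl",
               "hexyl", "heptyl", "octyl", "nonyl", "decyl"]
ALKYLENE_NAMES = ["methylene", "ethylene", "trimethylene", "tetramethylene",
                  "pentamethylene", "hexamethylene", "heptamethylene",
                  "octamethylene", "nonamethylene", "decamethylene"]


def _formula(carbons: int, hydrogens: int) -> str:
    # Conventional notation omits a subscript of 1 (CH3, not C1H3).
    c = "C" if carbons == 1 else f"C{carbons}"
    return f"{c}H{hydrogens}"


def alkyl_formula(n: int) -> str:
    """Monovalent straight-chain alkyl group: CnH(2n+1)."""
    if n < 1:
        raise ValueError("an alkyl group has at least one carbon atom")
    return _formula(n, 2 * n + 1)


def alkylene_formula(n: int) -> str:
    """Divalent straight-chain alkylene group -(CH2)n-: CnH(2n)."""
    if n < 1:
        raise ValueError("an alkylene group has at least one carbon atom")
    return _formula(n, 2 * n)


if __name__ == "__main__":
    for n, (alkyl, alkylene) in enumerate(zip(ALKYL_NAMES, ALKYLENE_NAMES), start=1):
        print(f"{alkyl:<8} {alkyl_formula(n):<8} {alkylene:<15} {alkylene_formula(n)}")
```

Substituted, branched, and cyclic groups fall outside this simple formula and are not modeled.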

[0054] "Alkene" or "alkenyl" can include straight, branched chain, or cyclic hydrocarbyl groups including, e.g., from 2 to about 20 carbon atoms, such as, without limitation, C2-3, C2-6, C2-10 groups having one or more, e.g., 1, 2, 3, 4, or 5, carbon-to-carbon double bonds. The olefin or olefins of an alkenyl group can be, for example, E, Z, cis, trans, terminal, or exo-methylene. An alkenyl or alkenylene group can be, for example, a C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25, C26, C27, C28, C29, C30, C31, C32, C33, C34, C35, C36, C37, C38, C39, C40, C41, C42, C43, C44, C45, C46, C47, C48, C49, or C50 group that is substituted or unsubstituted. A halo-alkenyl group can be any alkenyl group substituted with any number of halogen atoms. "Substituted alkene" can include alkene substituted at 1 or more, e.g., 1, 2, 3, 4, or 5 positions, which substituents are attached at any available atom to produce a stable compound, with substitution as described herein. "Optionally substituted alkene" can include alkene or substituted alkene. Likewise, "alkenylene" can refer to divalent alkene. Examples of alkenylene include, without limitation, ethenylene (-CH=CH-) and all stereoisomeric and conformational isomeric forms thereof. "Substituted alkenylene" can refer to divalent substituted alkene. "Optionally substituted alkenylene" can refer to alkenylene or substituted alkenylene.

[0055] "Alkyne" or "alkynyl" refers to a straight chain, branched chain, or cyclic unsaturated hydrocarbon having the indicated number of carbon atoms and at least one triple bond. The triple bond of an alkyne or alkynyl group can be internal or terminal. Examples of a (C2-C8)alkynyl group include, but are not limited to, acetylene, propyne, 1-butyne, 2-butyne, 1-pentyne, 2-pentyne, 1-hexyne, 2-hexyne, 3-hexyne, 1-heptyne, 2-heptyne, 3-heptyne, 1-octyne, 2-octyne, 3-octyne and 4-octyne. An alkynyl group can be unsubstituted or optionally substituted with one or more substituents as described herein below. An alkyne or alkynyl group can be, for example, a C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25, C26, C27, C28, C29, C30, C31, C32, C33, C34, C35, C36, C37, C38, C39, C40, C41, C42, C43, C44, C45, C46, C47, C48, C49, or C50 group that is substituted or unsubstituted. A halo-alkynyl group can be any alkynyl group substituted with any number of halogen atoms. The term "alkynylene" refers to divalent alkyne. Examples of alkynylene include, without limitation, ethynylene and propynylene. "Substituted alkynylene" refers to divalent substituted alkyne.

[0056] The term "alkoxy" can refer to an -O-alkyl group having the indicated number of carbon atoms. An ether or an ether group comprises an alkoxy group. For example, a (C1-C6)alkoxy group includes -O-methyl (methoxy), -O-ethyl (ethoxy), -O-propyl (propoxy), -O-isopropyl (isopropoxy), -O-butyl (butoxy), -O-sec-butyl (sec-butoxy), -O-tert-butyl (tert-butoxy), -O-pentyl (pentoxy), -O-isopentyl (isopentoxy), -O-neopentyl (neopentoxy), -O-hexyl (hexyloxy), -O-isohexyl (isohexyloxy), and -O-neohexyl (neohexyloxy). "Hydroxyalkyl" refers to a (C1-C10)alkyl group wherein one or more of the alkyl group's hydrogen atoms is replaced with an -OH group. Examples of hydroxyalkyl groups include, but are not limited to, -CH2OH, -CH2CH2OH, -CH2CH2CH2OH, -CH2CH2CH2CH2OH, -CH2CH2CH2CH2CH2OH, -CH2CH2CH2CH2CH2CH2OH, and branched versions thereof. The term "ether" or "oxygen ether" refers to an alkyl group wherein one or more of the alkyl group's carbon atoms is replaced with an -O- group. The term ether can include -CH2-(OCH2-CH2)q-OP1 compounds, where P1 is a protecting group, -H, or a (C1-C10)alkyl. Exemplary ethers include polyethylene glycol, diethyl ether, methylhexyl ether and the like.

[0057] “PEG” refers to polyethylene glycol. “PEGylated” refers to a compound comprising a moiety that comprises two or more consecutive ethylene glycol moieties. Non-limiting examples of PEG moieties for PEGylation of a compound include one or more blocks of a chain of from 1 to 50 ethylene glycol moieties, such as -(O-CH2-CH2)n-, -(CH2-CH2-O)n-, or -(O-CH2-CH2)n-OH, where n ranges from 2 to 50.
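Because each -(O-CH2-CH2)- unit has the formula C2H4O, the mass that a PEG block adds to a PEGylated compound scales linearly with n. The sketch below is an illustration rather than anything specified in the disclosure: it assumes standard atomic weights rounded to three decimals, and it treats the -(O-CH2-CH2)n-OH form as n repeat units plus one terminal -OH. The function names are hypothetical.

```python
# Standard atomic weights (g/mol), rounded to three decimals for illustration.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}

# One ethylene glycol repeat unit, -(O-CH2-CH2)-, has the formula C2H4O.
REPEAT_UNIT = {"C": 2, "H": 4, "O": 1}


def mass(composition: dict[str, int]) -> float:
    """Sum of atomic weights for an elemental composition."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in composition.items())


def peg_block_mass(n: int) -> float:
    """Mass (g/mol) of an internal -(O-CH2-CH2)n- block, n from 2 to 50 as in [0057]."""
    if not 2 <= n <= 50:
        raise ValueError("n is outside the 2-50 range given in the text")
    return n * mass(REPEAT_UNIT)


def peg_oh_mass(n: int) -> float:
    """Mass (g/mol) of a -(O-CH2-CH2)n-OH chain: n repeat units plus a terminal -OH."""
    return peg_block_mass(n) + mass({"O": 1, "H": 1})


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, round(peg_block_mass(n), 3), round(peg_oh_mass(n), 3))
```

Each added repeat unit contributes about 44.05 g/mol, which gives a quick way to estimate how much a chosen PEG length enlarges a recognition reagent.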

[0058] “Heteroatom” refers to any atom other than carbon or hydrogen, for example, N, O, P and S. Compounds that contain N or S atoms can be optionally oxidized to the corresponding N-oxide, sulfoxide or sulfone compounds. “Hetero-substituted” refers to an organic compound in any embodiment described herein in which one or more carbon atoms are substituted with any atom other than carbon or hydrogen, for example, N, O, P or S.

[0059] “Aryl,” alone or in combination, refers to an aromatic ring system such as phenyl or naphthyl. "Aryl" also can include aromatic ring systems that are optionally fused with a cycloalkyl ring. A "substituted aryl" is an aryl that is independently substituted with one or more substituents attached at any available atom to produce a stable compound, wherein the substituents are as described herein. The substituents can be, for example, hydrocarbyl groups, alkyl groups, alkoxy groups, and halogen atoms. "Optionally substituted aryl" refers to aryl or substituted aryl. An aryloxy group can be, for example, an oxygen atom substituted with any aryl group, such as phenoxy. An arylalkoxy group can be, for example, an oxygen atom substituted with any aralkyl group, such as benzyloxy. "Arylene" denotes divalent aryl, and "substituted arylene" refers to divalent substituted aryl. "Optionally substituted arylene" refers to arylene or substituted arylene. A “polycyclic aryl group” and related terms, such as “polycyclic aromatic group”, refer to a group composed of at least two fused aromatic rings. “Heteroaryl” or “hetero-substituted aryl” refers to an aryl group substituted with one or more heteroatoms, such as N, O, P, and/or S.

[0060] “Cycloalkyl” refers to monocyclic, bicyclic, tricyclic, or polycyclic, 3- to 14-membered ring systems, which are either saturated or partially unsaturated. The cycloalkyl group may be attached via any atom. Cycloalkyl also contemplates fused rings wherein the cycloalkyl is fused to an aryl or heteroaryl ring. Representative examples of cycloalkyl include, but are not limited to, cyclopropyl, cyclobutyl, cyclopentyl, and cyclohexyl. A cycloalkyl group can be unsubstituted or optionally substituted with one or more substituents as described herein below. “Cycloalkylene” refers to divalent cycloalkyl. The term "optionally substituted cycloalkylene" refers to cycloalkylene that is substituted with at least 1, 2 or 3 substituents, attached at any available atom to produce a stable compound, wherein the substituents are as described herein.

[0061] “Carboxyl” or “carboxylic” refers to a group having an indicated number of carbon atoms, where indicated, and terminating in a -C(O)OH group, thus having the structure -R-C(O)OH, where R is an unsubstituted or substituted divalent organic group that can include linear, branched, or cyclic hydrocarbons. Non-limiting examples of these include C1-6 carboxylic groups, such as ethanoic, propanoic, 2-methylpropanoic, butanoic, 2,2-dimethylpropanoic, pentanoic, etc. “Amine” or “amino” refers to a group having the indicated number of carbon atoms, where indicated, and terminating in a -NH2 group, thus having the structure -R-NH2, where R is an unsubstituted or substituted divalent organic group that, e.g., includes linear, branched, or cyclic hydrocarbons, and optionally comprises one or more heteroatoms. The term “alkylamino” refers to a radical of the formula -NHRx or -NRxRx, where each Rx is, independently, an alkyl radical as defined above.

[0062] Terms combining the foregoing refer to any suitable combination of the foregoing, such as arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkenyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkenyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, and alkynylheteroaryl. As an example, “arylalkylene” refers to a divalent alkylene wherein one or more hydrogen atoms in an alkylene group is replaced by an aryl group, such as a (C3-C8)aryl group. Examples of (C3-C8)aryl-(C1-C6)alkylene groups include, without limitation, 1-phenylbutylene, phenyl-2-butylene, 1-phenyl-2-methylpropylene, phenylmethylene, phenylpropylene, and naphthylethylene. The term "(C3-C8)cycloalkyl-(C1-C6)alkylene" refers to a divalent alkylene wherein one or more hydrogen atoms in the C1-C6 alkylene group is replaced by a (C3-C8)cycloalkyl group. Examples of (C3-C8)cycloalkyl-(C1-C6)alkylene groups include, without limitation, 1-cyclopropylbutylene, cyclopropyl-2-butylene, cyclopentyl-1-phenyl-2-methylpropylene, cyclobutylmethylene and cyclohexylpropylene.

[0063] An “amino acid” can include compounds that have the structure H2N-C(R)-C(O)OH, where R is a side chain or H, such as an amino acid side chain. An “amino acid residue” represents the remainder of an amino acid when incorporated into a chain of amino acids, such as when incorporated into a recognition reagent as disclosed herein, e.g., having the structures -NH-C(R)-C(O)-, H2N-C(R)-C(O)- (when at the N-terminus of a polypeptide), or -NH-C(R)-C(O)OH (when at the C-terminus of a polypeptide). An “amino acid side chain” is a side chain for an amino acid, including, but not limited to, proteinogenic or non-proteinogenic amino acids. Non-limiting examples of amino acid side chains are shown in Fig. 4. Glycine (H2N-CH2-C(O)OH) has no side chain.
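The three residue forms above (N-terminal, internal, and C-terminal) can be assembled mechanically into a condensed formula for a short chain, which shows how each internal residue contributes -NH-C(R)-C(O)-. The function below is a hypothetical illustration that uses the document's H2N-C(R)-C(O)OH notation, leaving the alpha-carbon hydrogen implicit; R = "H" stands for glycine and "CH3" for alanine.

```python
def chain(side_chains: list[str]) -> str:
    """Condensed formula of an amino acid chain, written N-terminus first.

    Uses the residue forms from [0063]: H2N-C(R)-C(O)- at the N-terminus,
    -NH-C(R)-C(O)- internally, and -NH-C(R)-C(O)OH at the C-terminus.
    A single side chain gives the free amino acid H2N-C(R)-C(O)OH.
    """
    if not side_chains:
        raise ValueError("at least one residue is required")
    if len(side_chains) == 1:
        return f"H2N-C({side_chains[0]})-C(O)OH"
    first, *middle, last = side_chains
    parts = [f"H2N-C({first})-C(O)"]
    parts += [f"NH-C({r})-C(O)" for r in middle]
    parts.append(f"NH-C({last})-C(O)OH")
    # The "-" joining C(O) to the next NH is the amide (peptide) bond between residues.
    return "-".join(parts)


if __name__ == "__main__":
    print(chain(["H"]))                # free glycine
    print(chain(["H", "CH3"]))         # glycyl-alanine
    print(chain(["H", "CH3", "H"]))    # glycyl-alanyl-glycine
```

This only composes the notation; it does not model stereochemistry or side-chain protecting groups.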

[0064] A "peptide nucleic acid" refers to a nucleic acid analog, or DNA or RNA mimic, in which the sugar phosphodiester backbone of the DNA or RNA is replaced by an N-(2-aminoethyl)glycine unit. A gamma PNA (γPNA) is an oligomer or polymer of gamma-modified N-(2-aminoethyl)glycine monomers of the following structure:

[Structure: γ-modified N-(2-aminoethyl)glycine monomer bearing nucleobase moiety R and γ-carbon substituents R1 and R2]

where R is a nucleobase moiety, and at least one of R1 or R2 attached to the gamma carbon is not a hydrogen, such that the gamma carbon is a chiral center. When R1 and R2 are both hydrogen (N-(2-aminoethyl)-glycine backbone), or are the same, there is no such chirality about the gamma carbon. Where R2 is hydrogen and R1 is not, the γPNA is said to be left-handed. Where R1 is hydrogen and R2 is not, the γPNA is said to be right-handed. Right-handed PNA is able to hybridize in a Watson-Crick or Watson-Crick-like manner with complementary single-stranded DNA or RNA. In some cases where neither R1 nor R2 is hydrogen but R1 and R2 are different, the handedness of the residue or oligomer may be dictated by the respective bulkiness of the R1 and R2 groups, or by other physical or chemical properties of the groups, and the chirality may be designated right-handed where the configuration hybridizes to complementary DNA or RNA strands but the opposite configuration does not.
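The handedness rules above amount to a small decision table over the two gamma-carbon substituents. The sketch below encodes that table as written; the function name and the use of the string "H" for hydrogen are illustrative assumptions, and "CH3" and "CH2OH" are arbitrary example substituents. When neither substituent is hydrogen and they differ, it returns "undetermined", because the text assigns handedness in that case by bulk or other properties and, ultimately, by whether the configuration hybridizes to complementary DNA or RNA.

```python
def gamma_handedness(r1: str, r2: str) -> str:
    """Classify a gammaPNA residue from its gamma-carbon substituents R1 and R2.

    "H" denotes hydrogen; any other string stands for a non-hydrogen group.
    "achiral" refers only to the gamma carbon.
    """
    if r1 == r2:
        # Both hydrogen (unmodified PNA backbone) or identical groups: no gamma chirality.
        return "achiral"
    if r2 == "H":
        return "left-handed"   # R2 is hydrogen, R1 is not
    if r1 == "H":
        return "right-handed"  # R1 is hydrogen, R2 is not
    # Neither is hydrogen and they differ: decided by bulk or other properties.
    return "undetermined"


if __name__ == "__main__":
    for pair in [("H", "H"), ("CH3", "H"), ("H", "CH3"), ("CH3", "CH2OH")]:
        print(pair, gamma_handedness(*pair))
```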

[0065] An incorporated PNA or γPNA monomer is referred to herein as a PNA or γPNA “residue”, in reference to the structure remaining after integration into an oligomer or polymer. Each residue has the same or a different R group as its base (nucleobase), such as adenine, guanine, cytosine, thymine and uracil bases, or other bases, such as the monovalent and divalent bases described herein, and the order of bases on the PNA is its “sequence”, as with DNA or RNA. A sequence of nucleobases in a nucleic acid or a nucleic acid analog oligomer or polymer, such as a PNA or γPNA oligomer or polymer, binds to a complementary sequence of adenine, guanine, cytosine, thymine and/or uracil residues in a nucleic acid or nucleic acid analog strand by nucleobase pairing, in a Watson-Crick or Watson-Crick-like manner, essentially as with double-stranded DNA or RNA.

[0066] A “guanidine” or “guanidinium” group may be added to the recognition reagent to increase solubility and/or bioavailability. Because PNA is produced in a similar manner to synthetic peptides, a simple way to add guanidine groups is to add one or more terminal arginine (Arg) residues to the N-terminal and/or C-terminal ends of the PNA, e.g., γPNA, recognition reagent. Likewise, an arginine side group, -(CH2)3-NH-C(=NH)-NH2, or a guanidine-containing moiety, such as [structure], where z, for example and without limitation, ranges from 1-5, or a salt thereof, can be attached to a recognition reagent backbone as described herein. A guanidine-containing group is a group comprising a guanidine moiety, and may have fewer than 100 atoms, fewer than 50 atoms, e.g., fewer than 30 atoms. In one aspect, the guanidine-containing group has the structure [structure], where L is a linker according to any aspect described herein, e.g., a non-reactive aliphatic hydrocarbyl linker, such as a methylene, ethylene, trimethylene, tetramethylene, or pentamethylene linker. In aspects, the guanidine-containing group has the structure [structure], where z is 1-5, e.g., the guanidine group may be arginine.

[0067] A “nucleobase” can include primary nucleobases: adenine, guanine, thymine, cytosine, and uracil, as well as modified, non-natural, purine and pyrimidine bases, such as, without limitation, hypoxanthine, xanthine, 7-methylguanine, 5,6-dihydrouracil, 5-methylcytosine, and 5-hydroxymethylcytosine. Figs. 5 and 6A-6C also depict non-limiting examples of nucleobases, including, but not limited to, monovalent nucleobases (e.g., adenine, cytosine, guanine, thymine or uracil, which bind to one strand of nucleic acid or nucleic acid analogs), divalent nucleobases (e.g., JB1-JB16 described herein), which bind complementary nucleobases on two strands of DNA simultaneously, and “clamp” nucleobases, such as a “G-clamp,” which binds complementary nucleobases with enhanced strength. Additional purine, purine-like, pyrimidine and pyrimidine-like nucleobases are disclosed, for example, in United States Patent Nos. 8,053,212, 8,389,703, and 8,653,254. For divalent nucleobases JB1-JB16, shown in Fig. 6A, Table A shows the specificity of the different nucleobases. Of note, the JB1-JB4 series bind complementary base pairs (C-G, G-C, A-T and T-A), while JB5-JB16 bind mismatches, and, thus, can be used to bind two strands of matched and/or mismatched bases. Divalent nucleobases are described in further detail in United States Patent Application Publication No. 2016/0083434 A1 and International Patent Publication No. WO 2018/058091, both of which are incorporated herein by reference.

Table A - Divalent Nucleobases

*diaminopurine, an adenine analog.
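The split between JB1-JB4 (matched pairs) and JB5-JB16 (mismatches) is consistent with simple counting: over the four DNA bases there are 16 ordered base pairs across two strands, of which four are Watson-Crick matches and twelve are mismatches. Reading that as one divalent nucleobase per pair is an assumption. The sketch below only performs the counting and sorts a pair into one series or the other. It does not assign individual JB nucleobases to specific pairs, since that mapping is given in Table A and Fig. 6A and is not reproduced here; the function name is hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Watson-Crick pairs, written as (base on strand 1, base on strand 2).
WATSON_CRICK = {("C", "G"), ("G", "C"), ("A", "T"), ("T", "A")}


def divalent_series(top: str, bottom: str) -> str:
    """Return which divalent-nucleobase series addresses a given two-strand base pair."""
    if top not in BASES or bottom not in BASES:
        raise ValueError("bases must be A, C, G or T")
    return "JB1-JB4" if (top, bottom) in WATSON_CRICK else "JB5-JB16"


if __name__ == "__main__":
    pairs = list(product(BASES, repeat=2))
    matched = [p for p in pairs if divalent_series(*p) == "JB1-JB4"]
    mismatched = [p for p in pairs if divalent_series(*p) == "JB5-JB16"]
    print(len(pairs), len(matched), len(mismatched))  # 16 4 12
```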

[0068] Exemplary γPNA structures that are not end-modified with left-handed PNAs in the manner described herein, but which may be so modified, as described herein, are disclosed in International Patent Publication No. WO 2012/138955, incorporated herein by reference.

[0069] “Complementary” refers to the ability of polynucleotides (nucleic acids) to hybridize to one another, forming inter-strand base pairs. Base pairs are formed by hydrogen bonding between nucleotide units in polynucleotide or polynucleotide analog strands that are typically in antiparallel orientation. Complementary polynucleotide strands can base pair (hybridize) in the Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. In RNA, as opposed to DNA, uracil rather than thymine is the base that is complementary to adenine. Two sequences comprising complementary sequences can hybridize if they form duplexes under specified conditions, such as in water, saline (e.g., normal saline or 0.9% w/v saline), or phosphate-buffered saline, or under other stringency conditions, such as, for example and without limitation, 0.1X SSC (saline sodium citrate) to 10X SSC, where 1X SSC is 0.15 M NaCl and 0.015 M sodium citrate in water. Hybridization of complementary sequences is dictated, e.g., by salt concentration and temperature, with the melting temperature (Tm) lowering with increased mismatches and increased stringency. Perfectly matched sequences are said to be “fully complementary”, though one sequence (e.g., a target sequence in an mRNA) may be longer than the other, as in the case of the small recognition reagents described herein in relation to the much longer target sequences on which they concatenate, such as mRNAs containing repeat expansions. Binding of adjacent genetic recognition reagents as described herein is said to be cooperative, in that binding of the reagents to the target sequences of their recognition domains enhances the end-to-end binding, or concatemerization, of the reagents via binding of their left-handed PNA cooperative binding domains as described herein.
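The notions of antiparallel Watson-Crick complementarity, and of a short sequence being "fully complementary" to part of a much longer target, can be made concrete with a few lines of string handling. The sketch below is a simplified model, not the disclosed method: it treats both strands as RNA written 5' to 3' (so a PNA probe is represented only by the RNA sequence it would pair with, which is an assumption), ignores Watson-Crick-like and non-canonical pairing, and includes the SSC arithmetic given in the paragraph above. All function names are hypothetical.

```python
# Watson-Crick pairing for RNA (A-U, G-C); DNA thymine is not handled in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """5'->3' sequence of the antiparallel strand that fully pairs with seq (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def fully_complementary_sites(probe: str, target: str) -> list[int]:
    """0-based positions in target (5'->3') where probe (5'->3') pairs with no mismatch."""
    site = reverse_complement(probe)
    k = len(site)
    target = target.upper()
    return [i for i in range(len(target) - k + 1) if target[i:i + k] == site]


def ssc_molarities(x: float) -> tuple[float, float]:
    """(NaCl, sodium citrate) molarity of x-fold SSC, given 1X = 0.15 M / 0.015 M."""
    return 0.15 * x, 0.015 * x


if __name__ == "__main__":
    target = "CUG" * 4  # a short r(CUG) repeat tract
    print(reverse_complement("CAGCAG"))                  # CUGCUG
    print(fully_complementary_sites("CAGCAG", target))   # [0, 3, 6]
    print(ssc_molarities(10))
```

A short probe can therefore be fully complementary at several overlapping positions of a repeat tract, which is why the longer target can host more than one reagent.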

[0070] Toxic RNAs containing expanded trinucleotide repeats are the cause of many neuromuscular disorders, one being myotonic dystrophy type 1 (DM1). DM1 is triggered by CTG-repeat expansion in the 3'-UTR of the DMPK gene, resulting in a toxic gain of RNA function through sequestration of MBNL1 protein, among others. Described herein are short probes that are capable of binding nucleic acid sequences, such as repeated nucleic acid sequences, in a sequence-specific and selective manner. For example, as described in the Examples, a short PNA probe, two triplet-repeats in length, containing terminal left-handed γPNA cooperative binding domains can bind rCUG-repeats in a sequence-specific and selective manner. This probe can discriminate the pathogenic rCUGexp from the wild-type transcript and is able to disrupt the rCUGexp-MBNL1 complex. As such, in aspects, described herein are short nucleic acid probes, referred to as genetic recognition reagents, for targeting RNA-repeat expansions associated with DM1 and other related neuromuscular disorders.
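Because the probe in this example spans two triplet repeats, the number of probes that could in principle sit end to end on a perfect r(CUG) tract grows with the repeat count. That is one simple way to picture why concatenated, cooperative binding favors long expansions. The sketch below counts only that geometry. It assumes a hypothetical six-nucleotide binding site, CUGCUG, and requires probes to abut with no gaps or overlaps. It says nothing about affinity, about the left-handed γPNA cooperative binding domains themselves, or about the discrimination data in the Examples. The repeat counts used are arbitrary illustrations, not clinical thresholds.

```python
def cug_repeat(n: int) -> str:
    """An r(CUG)n tract, written 5'->3'."""
    return "CUG" * n


def longest_concatemer(target: str, site: str) -> int:
    """Largest number of copies of site lying back to back (no gaps, no overlaps) in target."""
    if not site:
        raise ValueError("site must be non-empty")
    k = len(site)
    best = 0
    for start in range(len(target) - k + 1):
        count, pos = 0, start
        # Extend the array one abutting site at a time while the sequence still matches.
        while target[pos:pos + k] == site:
            count += 1
            pos += k
        best = max(best, count)
    return best


if __name__ == "__main__":
    site = "CUGCUG"  # assumed binding site of a probe two triplet repeats long
    for n in (5, 20, 100):
        print(n, longest_concatemer(cug_repeat(n), site))
```

On a perfect tract of n repeats, the longest array is n // 2 probes, so a long expansion offers many more adjacent sites for end-to-end binding than a short tract does.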

[0071] The methods and compositions described herein overcome three major hurdles presently facing conventional antisense and antigene approaches. The first hurdle concerns the scale and cost of oligonucleotide synthesis. Because oligonucleotides are traditionally synthesized in a step-wise fashion on a solid support, their production is difficult to scale up. This translates to high cost and unmet demand for oligonucleotide therapeutics. The methods and compositions described herein overcome this challenge because the recognition reagents are relatively small, 3 to 8 nucleotides in length, with molecular weights bordering those of small molecules and biomimetics. The compounds described herein can be produced on a large scale using convergent, solution-phase synthesis methods, which would translate to lower production costs and greater accessibility of these materials for treatment.

[0072] The second hurdle concerns cellular delivery, specifically how to get these nucleic acid probes across the lipid bilayer of the cell membrane and into the cytoplasm and nucleus of the target cells. Most oligonucleotides do not permeate the cell membrane because of their relatively large molecular weight. Their delivery into cells requires the aid of transfection reagents, or mechanical or electrical transduction. While these approaches have been used successfully to transport oligonucleotides and other macromolecules into cells, they are limited to small-scale, in vitro (tissue culture) experimental setups. In vivo, systemic delivery (a requirement for treatment of genetic and most infectious diseases) remains an issue, especially for diseases of the central nervous system. This can be overcome by the reduced size of the recognition reagents and the flexibility of their chemical modification. Because the recognition reagents are relatively small, they are taken up more readily by cells and cross the nuclear membrane more readily. Further, PNAs such as γPNAs are synthetically flexible, in that any chemical group can be incorporated into the PNA backbone, so these recognition reagents can be readily modified with specific chemical functionalities to promote cellular uptake and systemic delivery.

[0073] The third hurdle concerns nonspecific binding and cytotoxic effects. When introduced into a cell, a naked oligonucleotide 10-30 nt in length, synthetic or otherwise, would bind not only to its designated target but also to a slew of other DNA or RNA regions with related sequences. Such nonspecific binding would trap the probe, preventing it from freely diffusing, searching for, and binding to its target. The resulting reduction in the effective concentration of the probe would lead to a reduction in efficacy. Moreover, such nonspecific binding can also lead to cytotoxic effects as a result of misregulation of gene expression and/or perturbation of the function of other key proteins. Nonspecific binding, in fact, has been identified as the main cause of side effects of oligonucleotide therapeutics (as well as small-molecule drugs), and presently there is no solution in sight. This may be overcome by taking advantage of the weak interaction between the short recognition reagent (typically 3 to 8 nt in length) and the target. This weak, ‘kissing’ interaction permits the module to diffuse freely in the intracellular environment in search of its target. Its designated target, in this case, differs from a ‘random,’ ‘single-binding-site’ hit in that it contains repeated sequence elements, which enable the modules to assemble next to one another in a cooperative manner through adjacent base stacking.

[0074] Presently there are several genetic diseases associated with unstable repeat expansions of nucleotide sequences, as illustrated in Fig. 7. The challenge is that the target, in this case DNA or RNA, is monotonous in its three-dimensional architecture in comparison to proteins. This makes it difficult for small molecules to discriminate a particular site from a sea of other DNA or RNA sequences.

[0075] An advantage over the “small molecule drug” approach lies in the treatment of cancer and in combating bacterial, viral, and parasitic infections, where the targets evolve rapidly owing to high mutation rates. There are a number of conserved and repeated elements within the genomes and transcriptomes of tumorigenic clones and of bacterial, viral, and parasitic pathogens that can potentially be targeted with this approach. Compared with the ‘small molecule-protein recognition’ approach, it is unlikely that these tumor cells or pathogens would evade the recognition reagents described herein and become resistant, because a resistance mutation would have to occur at every repeat element within the DNA/RNA template.

[0076] The recognition reagents described herein are designed to be chemically inert until they enter the cytoplasm and/or nucleus of a cell, where the recognition reagent hybridizes to a complementary sequence in a nucleic acid and, if another recognition reagent is hybridized to an adjacent sequence, the adjacent left-handed PNA cooperative binding domains of the two reagents hybridize, thereby concatenating the recognition reagents. The recognition reagents can recognize and bind their DNA or RNA target through cooperative Watson-Crick (or Watson-Crick-like, hydrogen-bonding) base-pairing interactions, upon which the adjacent modules non-covalently concatenate via hybridization of complementary, adjacent left-handed PNA cooperative binding domains to form extended, concatenated oligomers in a head-to-tail fashion, e.g., as shown in Fig. 1. The rate of intramolecular vs. intermolecular hybridization of the left-handed PNA cooperative binding domains in the recognition reagents can be controlled by modulating the rigidity of the recognition reagent’s backbone: rigid, e.g., conformationally preorganized, backbones such as γPNA or LNA backbones limit intramolecular hybridization of the left-handed PNA cooperative binding domains, and thus potential inactivation of the recognition reagent. Even if some intramolecular hybridization occurs, the recognition reagent can “open up” when hybridizing to its target sequence. In the presence of the target sequence, hybridization to the target sequence and concatenation can predominate.
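The head-to-tail concatenation described above can be approximated, purely for illustration, by tiling a short reagent along a repeat-containing RNA and grouping reagents whose binding sites abut. The Python sketch below assumes a perfect-match-only binding rule, a reagent sequence written 5’ to 3’, and that only reagents bound to immediately adjacent sites can pair through their cooperative binding domains; it does not model affinities, and the function names are hypothetical.

```python
# Complement of each probe base, written in the RNA alphabet.
RNA_COMPLEMENT = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}


def target_site(probe: str) -> str:
    """RNA site (5'->3') bound antiparallel by a 5'->3' probe."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(probe.upper()))


def concatemers(probe: str, target: str) -> list[int]:
    """Greedily tile non-overlapping copies of `probe` along `target` and
    return the number of reagents in each head-to-tail run.

    Reagents on directly abutting sites are counted as one concatemer,
    because only then can their terminal cooperative binding domains meet
    (a simplifying assumption of this sketch).
    """
    if not probe:
        raise ValueError("probe must be non-empty")
    site = target_site(probe)
    rna = target.upper().replace("T", "U")
    k = len(site)
    runs: list[int] = []
    last_end = -1  # end index of the previously bound site; -1 means none yet
    pos = 0
    while pos <= len(rna) - k:
        if rna[pos:pos + k] == site:
            if pos == last_end:
                runs[-1] += 1   # abuts the previous reagent: extend the run
            else:
                runs.append(1)  # isolated site: start a new run
            last_end = pos + k
            pos += k            # bound sites cannot overlap
        else:
            pos += 1
    return runs
```

With a hypothetical two-triplet probe CAGCAG, an uninterrupted (CUG)12 tract yields a single run of six reagents, whereas two CUGCUG sites separated by unrelated bases yield two isolated reagents that cannot concatenate.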

[0077] The recognition reagents described herein combine the features of small molecules, for example, low molecular weight, ease of large-scale production, low production cost, cell permeability, and pharmacokinetics, with the sequence-specific recognition of oligonucleotides and concatenation via Watson-Crick base pairing. Concatenation of the oligomeric recognition reagents is described in accordance with several examples, which are intended to be illustrative in all respects rather than restrictive. Thus, many variations in detailed implementation may be possible.

[0078] One example application of the recognition reagents described herein is the treatment of genetic diseases involving repeat expansion of short sequences, such as those listed in Fig. 7, e.g., where the expanded repeat has one of the following sequences: (GAA)w, (CGG)w, (CCG)w, (CAG)w, (CTG)w, (CCTG)w, (ATTCT)w, or (GGGGCC)w, where w is at least 3.
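To make the (unit)w notation concrete, the following Python sketch (an illustration under stated assumptions, not a diagnostic method from the disclosure) locates tandem tracts of a given repeat unit with at least w = 3 copies using a regular expression. It assumes exact, uninterrupted repeats and reports each tract in the reading frame in which it is first encountered; the function name is hypothetical.

```python
import re


def find_repeat_expansions(seq: str, unit: str, min_copies: int = 3) -> list[tuple[int, int]]:
    """Return (start, copies) for each tandem tract (unit)w with w >= min_copies.

    The regex quantifier is greedy, so each tract is reported once at its
    full length, starting at the first in-frame copy of `unit`.
    """
    if not unit or min_copies < 1:
        raise ValueError("unit must be non-empty and min_copies >= 1")
    unit = unit.upper()
    # e.g. unit="CTG", min_copies=3  ->  pattern (?:CTG){3,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]
```

Interrupted repeats, which occur in some patient alleles, would need a more permissive pattern than this sketch uses.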

[0079] Based on the sequences shown in Fig. 7, recognition reagents that would target the gene products described therein include, but are not limited to, reagents having the following sequences in a 5’ to 3’ direction: GAA, CGG, CCG, CAG, CTG, CCTG, ATTCT, and GGGGCC, or a sequence complementary thereto, e.g., TTC, CCG, CGG, CTG, CAG, CAGG, AGAAT, or GGCCCC, for targeting RNA; or reagents that hybridize to target sequences comprising repeats of GAA, CGG, CCG, CAG, CTG, CCTG, ATTCT, or GGGGCC, or a sequence complementary thereto.
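The complementary sequences listed above are the reverse complements of the repeat units, i.e., they are written 5’ to 3’ and pair antiparallel with the unit. A minimal Python sketch, assuming standard DNA letters only (the helper name is hypothetical), reproduces the pairing:

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Repeat units listed in the text (5' to 3').
REPEAT_UNITS = ["GAA", "CGG", "CCG", "CAG", "CTG", "CCTG", "ATTCT", "GGGGCC"]


def reverse_complement(seq: str) -> str:
    """Antiparallel DNA complement, written 5' to 3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for unit in REPEAT_UNITS:
        print(f"{unit:>6} -> {reverse_complement(unit)}")
```

Running it maps GAA to TTC, CCTG to CAGG, ATTCT to AGAAT, and GGGGCC to GGCCCC, matching the list above.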

[0080] Other potential applications include the treatment of cancer (targeting telomeres), bacterial infections (resistant strains, targeting the repeated and conserved elements unique to the pathogenic strains), hepatitis C (which affects 3% of the world population and for which there is no effective treatment, by targeting the repeated elements within the viral RNA genome), malaria (targeting microsatellites that have been shown to be essential in the replication and life cycle of Plasmodium), and AIDS (a rapidly moving target, for which each new mutant sequence can be chased by dialing in the corresponding nucleobase sequence in the recognition reagents).

[0081] Therefore, provided herein are compositions, e.g., recognition reagents (modified nucleic acids) that assemble on a nucleic acid template, cooperatively bind to each other on the template, and concatenate. The recognition reagents can include a recognition domain, which is an oligomer of a nucleic acid or a nucleic acid analog, e.g., from three to ten bases, e.g., 3, 4, 5, 6, 7, 8, 9, or 10 bases, or from 3 to 8 bases in length, comprising terminal PNA cooperative binding domains at its ends (e.g., the 5’ end and the 3’ end, relative to a nucleic acid). The PNA cooperative binding domains are orthogonal to natural RNA or DNA in that their backbone is unable to hybridize to any significant extent to natural RNA or DNA under typical oligonucleotide hybridization conditions, such as when the recognition domain hybridizes with its target RNA or DNA, and/or in that the nucleobases of the PNA cooperative binding domain do not bind, e.g., hydrogen bond, with any natural nucleobases (A, T, G, C, or U) under oligonucleotide hybridization conditions, or do not bind with sufficient strength to form duplex structures under typical oligonucleotide hybridization conditions. The hybridization conditions can be conditions in which the composition or recognition reagent is employed, such as in treatment of a patient, or in vitro in a cell culture or typical hybridization assay, e.g., under physiological conditions or isotonic conditions based on cellular or bodily fluid, such as in normal saline (0.9% w/v), lactated Ringer's solution at 37°C, water, blood, blood serum, blood plasma, cerebrospinal fluid, lymph, cell lysate, or cell culture media, or in 2X SSC, 0.1% SDS at 25°C, or 0.5X SSC, 0.1% SDS at 37°C.
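The SSC strengths cited above scale linearly from the 1X definition given in [0069] (0.15 M NaCl and 0.015 M sodium citrate). The Python sketch below is illustrative only; it additionally assumes the citrate is trisodium citrate (three Na+ per citrate) when reporting total sodium, and its names are hypothetical.

```python
from dataclasses import dataclass

# 1X SSC as defined in the text: 0.15 M NaCl and 0.015 M sodium citrate.
NACL_1X_M = 0.15
CITRATE_1X_M = 0.015


@dataclass(frozen=True)
class SSCBuffer:
    nacl_m: float     # NaCl, mol/L
    citrate_m: float  # sodium citrate, mol/L
    sodium_m: float   # total Na+, assuming trisodium citrate (3 Na+ each)


def ssc(strength: float) -> SSCBuffer:
    """Concentrations in a `strength`X SSC solution (e.g. 2 for 2X SSC)."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    nacl = NACL_1X_M * strength
    citrate = CITRATE_1X_M * strength
    return SSCBuffer(nacl, citrate, nacl + 3 * citrate)
```

Under these assumptions, 2X SSC contains 0.30 M NaCl and 0.030 M citrate (about 0.39 M Na+ in total), and 0.5X SSC contains 0.075 M NaCl.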

[0082] The PNA of the cooperative binding domains may be left-handed PNA, which is orthogonal to natural RNA and DNA. The two PNA cooperative binding domains are complementary to each other, such that the PNA cooperative binding domain at one end of the recognition domain of a first molecule binds to the PNA cooperative binding domain at the other end of the recognition domain of a second molecule when the recognition domains of the two molecules are bound to adjacent sequences in a single RNA strand. Each PNA cooperative binding domain is attached at its N-terminal end or its C-terminal end to the recognition reagent backbone in any suitable manner, e.g., with a linker, using any appropriate linking chemistry. For example, as shown in Figs. 8-10, the PNA cooperative binding domain can be linked to the recognition domain using a linker terminating in a carboxyl group at one end and an amine at the other end, to form amide bonds. The linker may comprise from one to 10, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, ethylene oxide groups and/or methylene (-CH2-) groups, optionally including a heteroatom-containing moiety, such as one or more O-, N-, P-, or S-containing bonds or linkages, such as amide, ether, ester, thioether, or phosphodiester bonds. The recognition domain may be modified to comprise one or more terminal carboxyl or amine groups, or such groups may be natively present in the recognition domain, as in the case of a PNA recognition domain. The recognition domain is not orthogonal to natural RNA or DNA, in that it hybridizes to complementary sequences in RNA and DNA under typical hybridization conditions as described above. A recognition domain may be end-modified at one end, or independently at both ends, with one or more amino acids or amino acid-like groups, such as N-(2-aminoethyl)glycine, or an amino acid comprising two amine groups, such as arginine (Arg), diaminobutyric acid (Dab), ornithine (Orn), or lysine (Lys), that can be linked to a PNA cooperative binding domain.
Use of a linker having a terminal carboxyl group and a terminal amine group with a recognition domain having a terminal carboxyl group and a terminal amine group allows for antiparallel alignment of the two left-handed PNA cooperative binding domains. The binding domains of the genetic recognition reagents may be rigid or conformationally preorganized, as with γPNA and locked nucleic acid backbones. All nucleotide sequences are provided in a 5’ to 3’ direction, left to right, unless indicated otherwise. For PNA oligomers, which can hybridize in a parallel or antiparallel orientation, the sequences are, unless indicated otherwise, depicted 5’ to 3’ with respect to their nucleobase sequences in relation to their specific Watson-Crick or Watson-Crick-like binding to a complementary nucleic acid strand.

[0083] A moiety in a compound, such as a nucleobase, that is covalently attached to the recognition domain backbone is said to be “linked” to the backbone. Depending on the chemistry used to prepare the compound, the linkage may be direct, or through a “linker”, which is a moiety that covalently attaches two other moieties or groups. In one aspect, the PNA cooperative binding domains are attached to the recognition domain via a linker. In some embodiments, the linker is a non-reactive moiety that links the aromatic group to the backbone of the recognition reagent and, in some aspects, includes from 5-25 carbon atoms (C1-C10), optionally substituted with a heteroatom, such as N, S, or O, or a non-reactive linkage, such as an amide linkage (peptide bond) formed by reacting an amine with a carboxyl group. Examples of C1-C10 alkylenes include linear or branched alkylene (bivalent) moieties such as a methylene, ethylene, trimethylene, tetramethylene, pentamethylene, hexamethylene, heptamethylene, octamethylene, nonamethylene, or decamethylene moiety (e.g., -CH2-[CH2]n-, where n = 1 to 9), optionally comprising an amide linkage and optionally comprising a cyclic moiety. The linkers may comprise from one to four ethylene oxide (e.g., -O-CH2-CH2- or -CH2-CH2-O-) moieties. The linkers are non-bulky in that they do not sterically hinder or otherwise interfere to any substantial extent with the binding of the recognition reagents to a target template, and do not interfere with concatenation of the recognition reagents on the target template. The linker, when incorporated into a compound, is the remaining moiety or residue resulting from the linking of the PNA cooperative binding domains to the nucleic acid or nucleic acid analog recognition domain.

[0084] The linker or linking group may be an organic moiety that connects two parts of a compound, e.g., covalently attaches two parts of a compound, such as, for example and without limitation, connection of the aromatic groups to the backbone of the recognition reagent, connection of a nucleobase to the nucleic acid or nucleic acid analog backbone, and/or connection of a guanidinium group to the recognition reagent. Linkers typically comprise a direct bond or an atom such as oxygen, nitrogen, phosphorus, or sulfur; a unit such as C(O), C(O)NH, SO, SO2, or SO2NH; or a chain of atoms, such as, but not limited to, substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkenyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkenyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, or alkynylheteroaryl, in which one or more carbons, e.g., methylenes or methylidynes (-CH=), are optionally interrupted or terminated by a heteroatom, such as O, S, or N; substituted or unsubstituted aryl; substituted or unsubstituted heteroaryl; or substituted or unsubstituted heterocyclic.
In one aspect, the linker may comprise or consist of between about 5 and 25 atoms, e.g., 5-20 or 5-10, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 atoms, or a total of from 1 to 10, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, C and heteroatoms, e.g., O, P, N, or S atoms. The linker may have a molecular weight, based on the atomic masses of its constituent atoms, of less than 500 daltons (Da) or less than 400 Da.
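The size limits above can be checked for a candidate linker from its elemental formula. Because the text does not state whether hydrogens count toward the atom totals, the Python sketch below reports both the total and the non-hydrogen atom count alongside the molecular weight. The example residue of 8-amino-3,6-dioxaoctanoic acid is a hypothetical choice of a common PEG-type spacer, not a linker specified in the disclosure.

```python
# Standard atomic weights (g/mol), rounded.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "P": 30.974, "S": 32.06}


def linker_stats(formula: dict[str, int]) -> tuple[int, int, float]:
    """Return (all atoms, non-hydrogen atoms, molecular weight in Da)."""
    total = sum(formula.values())
    heavy = total - formula.get("H", 0)
    mw = sum(ATOMIC_MASS[element] * count for element, count in formula.items())
    return total, heavy, mw


def within_mw_limit(formula: dict[str, int], limit_da: float = 500.0) -> bool:
    """True if the linker residue is lighter than `limit_da`."""
    return linker_stats(formula)[2] < limit_da


# Hypothetical example: the residue -NH-CH2CH2-O-CH2CH2-O-CH2-C(O)- left by
# 8-amino-3,6-dioxaoctanoic acid after forming two amide bonds (C6H11NO3).
AEEA_RESIDUE = {"C": 6, "H": 11, "N": 1, "O": 3}
```

For this residue the sketch gives 21 atoms in total, 10 non-hydrogen atoms, and about 145.16 Da, well under the 400 Da and 500 Da limits.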

[0085] For linkage to a PNA, such as a γPNA, a convenient and readily available linkage is formed by reacting an amine with a carboxyl group to form an amide bond, e.g., using peptide synthesis chemistry to add amino acids to the recognition reagent, where amino acids such as arginine may be pre-modified with a chemical moiety, such as a left-handed γPNA or a guanidine group. Linking to non-peptide nucleic acid analogs can be achieved using any suitable linking chemistry, such as carbodiimide chemistry.

[0086] In linking the PNA cooperative binding domains to the nucleic acid or nucleic acid analog recognition domain, the linker is of an appropriate size or length to orient the PNA cooperative binding domains to enable concatenation of the recognition reagents on a target nucleic acid sequence, as described herein.

[0087] A compound or recognition reagent may be provided, comprising: a nucleic acid or nucleic acid analog recognition domain, having a first end and a second end, prepared from three or more, e.g., from 3 to 10, or 3, 4, 5, 6, 7, 8, 9, or 10, or from 3-8, nucleic acid or nucleic acid analog backbone residues. The backbone of the recognition domain may be conformationally preorganized. The recognition domain has a sequence of nucleobases complementary to a target RNA or DNA sequence. The compound comprises a first PNA cooperative binding domain linked to the first end of the recognition domain, and a second PNA cooperative binding domain linked to the second end of the recognition domain. The PNA cooperative binding domains may be PNA (achiral) or LH-γPNA. The PNA cooperative binding domains are complementary to each other and orthogonal to natural RNA or DNA, e.g., their backbones are LH-γPNA and/or their nucleobases do not form stable hydrogen bonds or hybridize with natural RNA or DNA under typical hybridization conditions, e.g., as described above. The compound may have the structure:

where,

n is an integer ranging from 1 to 6;

each R is, independently, a nucleobase, producing a sequence of nucleobases, optionally complementary to a target nucleic acid;

each B is, independently, a ribose-5’-phosphate residue, a deoxyribose-5’-phosphate residue, or a nucleic acid analog backbone residue, and in one aspect is a backbone residue of a conformationally preorganized nucleic acid analog, such as RH-γPNA or LNA;

each L is, independently, a linker, e.g., a non-reactive linker or a non-reactive, non-bulky linker, and each instance of L can be the same or different; and

b and b’ are the PNA, e.g., LH-γPNA, cooperative binding domains, wherein b and b’ are complementary.

[0088] The composition may comprise a PNA backbone, and thus has a structure exemplified by:

where each R is, independently, a nucleobase;

n is an integer ranging from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

each L is, independently, a linker;

each R1 and R2 may be, independently: a guanidine-containing group such as -(CH2)n-NH-C(=NH)-NH2, where n = 1, 2, 3, 4, or 5; an amino acid side chain, such as:

methyl, ethyl, linear or branched (C3-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; H; -CH2-(OCH2-CH2)q-OP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH;

-CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; and in one aspect R1 and R2 are different, e.g., R1 is H and R2 is not H, or R2 is H and R1 is not H. For binding to natural nucleic acids, such as RNA or DNA, R1 is H and R2 is not H, thereby forming “right-handed” L-γPNA. “Left-handed” D-γPNA, in which R2 is H and R1 is not H, does not bind natural nucleic acids; and R3 is the first PNA, e.g., LH-γPNA, cooperative binding domain and R3’ is the second PNA, e.g., LH-γPNA, cooperative binding domain complementary to R3,

or a pharmaceutically-acceptable salt thereof.
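The handedness rule stated above depends only on which γ-substituent is hydrogen, which can be written as a small lookup. In the Python sketch below, the right-handed and left-handed cases follow the text directly; the achiral case (R1 = R2 = H, i.e., an unmodified PNA backbone with no γ stereocenter) and the assumption that achiral PNA bearing natural nucleobases hybridizes to RNA/DNA are added from general knowledge. The case where both substituents are non-hydrogen is left outside this sketch, and all names are hypothetical.

```python
from enum import Enum


class Backbone(Enum):
    ACHIRAL_PNA = "achiral PNA"   # R1 = R2 = H: no stereocenter at the gamma carbon
    RH_GAMMA_PNA = "RH-gammaPNA"  # R1 = H, R2 != H ("right-handed" L-gammaPNA)
    LH_GAMMA_PNA = "LH-gammaPNA"  # R2 = H, R1 != H ("left-handed" D-gammaPNA)


def classify_gamma_substituents(r1_is_h: bool, r2_is_h: bool) -> Backbone:
    """Map the hydrogen pattern of R1/R2 to a backbone class."""
    if r1_is_h and r2_is_h:
        return Backbone.ACHIRAL_PNA
    if r1_is_h:
        return Backbone.RH_GAMMA_PNA
    if r2_is_h:
        return Backbone.LH_GAMMA_PNA
    raise ValueError("both R1 and R2 non-H: not covered by this sketch")


def expected_to_bind_natural_nucleic_acids(backbone: Backbone) -> bool:
    # Per the text, the left-handed form does not bind natural nucleic acids.
    # Achiral PNA with natural nucleobases is assumed here to hybridize.
    return backbone is not Backbone.LH_GAMMA_PNA
```

On this reading, only the left-handed backbone qualifies as orthogonal by backbone alone, which is why it is used for the cooperative binding domains while the recognition domain uses the right-handed form.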

[0089] Further, each instance of L is, independently, a linker, and may comprise one or more amino acid residues, or substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkenyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkenyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, or alkynylheteroaryl, in which one or more carbons, e.g., methylenes or methylidynes (-CH=), are optionally interrupted or terminated by a heteroatom, such as O, S, or N; substituted or unsubstituted aryl; substituted or unsubstituted heteroaryl; or substituted or unsubstituted heterocyclic; and optionally comprises a guanidine-containing group, such as an amino acid side chain.

[0090] Each L may comprise or consist of between about 5 and 25 atoms, e.g., 5-20 or 5-10, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 atoms, or a total of from 1 to 10, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, C and heteroatoms, e.g., O, P, N, or S atoms. In one aspect, R1 or R2 may be (C1-C6)alkyl substituted with -(OCH2-CH2)q-OP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

[0091] The recognition reagent may have the structure:

where,

each R is, independently, a nucleobase, and each instance of R can be the same, or a different nucleobase;

n is an integer ranging from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

each R1 and R2 may be, independently: a guanidine-containing group such as

(C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; H; -CH2-(OCH2-CH2)q-OP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; and in one aspect R1 and R2 are different, e.g., R1 is H and R2 is not H, or R2 is H and R1 is not H. For binding to natural nucleic acids, such as RNA or DNA, R1 is H and R2 is not H, thereby forming RH-γPNA;

one of R4 and R5 is -L-R3, and one of R6, R7, and R8 is -L-R3’, where R3 is a first PNA, e.g., LH-γPNA, cooperative binding domain and R3’ is a second PNA, e.g., LH-γPNA, cooperative binding domain complementary to R3; and

[0092] Each L is a linker, e.g., a non-reactive linker or a non-reactive, non-bulky linker, and each instance of L can be the same or different, and may comprise an amino acid residue, or substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkenyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkenyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, or alkynylheteroaryl, in which one or more carbons, e.g., methylenes or methylidynes (-CH=), are optionally interrupted or terminated by a heteroatom, such as O, S, or N; substituted or unsubstituted aryl; substituted or unsubstituted heteroaryl; or substituted or unsubstituted heterocyclic; and the remainder of R4, R5, R6, R7, and R8 are each, independently: H; one or more contiguous amino acid residues; a guanidine-containing group such as -(CH2)z-NH-C(=NH)-NH2, where z = 1, 2, 3, 4, or 5; an amino acid side chain, such as:

(C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)q-OP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50. R4 and R7 may be -L-Ar, and in another aspect, R4 and R7 are -L-Ar and R5 and R8 are Arg. In one aspect, the linker comprises between about 5 and 25 atoms, e.g., 5-20 or 5-10, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 atoms, or a total of from 1 to 10, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, C and heteroatoms, e.g., O, P, N, or S atoms. In another aspect, one or more of R1, R2, R4, R5, R6, R7, or R8 is (C1-C6)alkyl substituted with -(OCH2-CH2)q-OP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

[0093] In yet another aspect, the recognition domain comprises an RH-γPNA backbone, and thus the composition has the structure:

[Chemical structure not reproduced in this text.]

where n is an integer ranging from 1 to 8, including 1, 2, 3, 4, 5, 6, 7, or 8;

m is an integer ranging from 1 to 5, such as from 1-3, including 1, 2, 3, 4, or 5;

each R2 is: a guanidine-containing group such as -(CH2)z-NH-C(=NH)-NH2, where z = 1, 2, 3, 4, or 5; an amino acid side chain, such as:

branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, each optionally substituted with a polyethylene glycol chain of 1 to 50 units; -CH2-(OCH2-CH2)q-OP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R3 is a first PNA, e.g., LH-γPNA, cooperative binding domain and R3’ is a second PNA, e.g., LH-γPNA, cooperative binding domain complementary to R3; and

each of R5, R7, and R8 is, independently, H; a guanidine-containing group such as -(CH2)z-NH-C(=NH)-NH2, where z = 1, 2, 3, 4, or 5; an amino acid side chain; or one or more contiguous amino acid residues, such as one or more Arg, Orn, Lys, or Dab residues.

[0094] Each R2 may be -CH2-(OCH2-CH2)r-OH, where r is an integer ranging from 1 to 50, e.g., 1 to 10, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, and in one example 2. One or more of R2, R5, R7, or R8 may be (C1-C6)alkyl substituted with -(OCH2-CH2)q-OP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, where P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.
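The elemental composition of the R2 substituent -CH2-(OCH2-CH2)r-OH follows directly from r: one CH2, r repeats of OCH2CH2, and a terminal OH give C(1+2r), H(4r+3), and O(r+1). The Python sketch below is an illustrative bookkeeping aid, not part of the disclosure; it counts the substituent as a radical (the bond to the γ-carbon is not counted), and its function names are hypothetical.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def peg_side_chain(r: int) -> dict[str, int]:
    """Atom counts of the substituent -CH2-(OCH2-CH2)r-OH.

    The open bond to the gamma carbon is not counted, so the counts are
    those of the substituent radical: C(1+2r) H(4r+3) O(r+1).
    """
    if not 1 <= r <= 50:
        raise ValueError("r must be an integer from 1 to 50")
    return {"C": 1 + 2 * r, "H": 2 + 4 * r + 1, "O": r + 1}


def hill_formula(counts: dict[str, int]) -> str:
    """Formula string in Hill order (C, H, then other elements alphabetically)."""
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}"
                   for e in order if counts.get(e))


def mass(counts: dict[str, int]) -> float:
    """Sum of standard atomic weights (g/mol) for the given atom counts."""
    return sum(ATOMIC_MASS[e] * n for e, n in counts.items())
```

For the r = 2 example, the sketch gives C5H11O3, about 119.14 g/mol; for r = 1 it gives C3H7O2.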

[0095] Another aspect of the disclosure provides a genetic recognition reagent comprising: a nucleic acid recognition domain or nucleic acid analog recognition domain comprising a plurality of units, wherein each unit comprises a residue of a nucleic acid backbone monomer or a nucleic acid analog backbone monomer, wherein the residue comprises a nucleobase, and wherein a sequence of the plurality of units is complementary to a sequence of a target nucleic acid; and a first peptide nucleic acid (PNA) cooperative binding domain that (1) comprises nucleobases that do not bind with sufficient strength to form duplex structures with natural nucleobases of the nucleic acid recognition domain, the nucleic acid analog recognition domain, or the target nucleic acid, or (2) is of opposite handedness relative to the target nucleic acid, wherein the first PNA cooperative binding domain is linked to the nucleic acid backbone or the nucleic acid analog backbone.

[0096] In some embodiments, the genetic recognition reagent comprises a second PNA cooperative binding domain having a nucleobase sequence that is complementary to a nucleobase sequence of the first PNA cooperative binding domain. In some embodiments, the first PNA cooperative binding domain is a left-handed gamma-PNA (LH-γPNA). In some embodiments, the nucleic acid recognition domain or the nucleic acid analog recognition domain is PNA. In some embodiments, the nucleic acid recognition domain or the nucleic acid analog recognition domain is right-handed gamma-PNA (RH-γPNA).

[0097] Another aspect of the disclosure provides a genetic recognition reagent comprising a nucleic acid recognition domain or nucleic acid analog recognition domain comprising a plurality of units. Each unit can comprise a residue of a nucleic acid backbone monomer or a nucleic acid analog backbone monomer. The residue may comprise a nucleobase. The sequence of the plurality of units can be complementary to a sequence of a target nucleic acid. The genetic recognition reagent further comprises a first peptide nucleic acid (PNA) cooperative binding domain, which can comprise nucleobases that do not bind with sufficient strength to form duplex structures with natural nucleobases of the nucleic acid recognition domain, the nucleic acid analog recognition domain, or the target nucleic acid. Alternatively or additionally, the first PNA cooperative binding domain can be of opposite handedness relative to the target nucleic acid. The first PNA cooperative binding domain can be linked to the nucleic acid backbone or the nucleic acid analog backbone.

[0098] In some embodiments, the genetic recognition reagent comprises a second PNA cooperative binding domain having a nucleobase sequence that is complementary to a nucleobase sequence of the first PNA cooperative binding domain. In some embodiments, the first PNA cooperative binding domain is a left-handed gamma-PNA (LH-γPNA). In some embodiments, the nucleic acid recognition domain or the nucleic acid analog recognition domain is PNA. In some embodiments, the nucleic acid recognition domain or the nucleic acid analog recognition domain is right-handed gamma-PNA (RH-γPNA).

[0099] In any of the structures above, e.g., in the R3, R4, R5, R6, R7, and/or R8 positions, one or more, or all, chiral amino acid residues may be L-amino acids. In another example, when present, e.g., in the R3, R4, R5, R6, R7, and/or R8 positions, one or more, or all, chiral amino acid residues may be D-amino acids.

[00100] In any of the structures above, the sequence of the recognition domain may target expanded repeats. Exemplary nucleobase sequences (e.g., either in a single recognition reagent or when concatenated) therefore include TTC, CCG, CGG, CTG, CAG, CAGG, AGAAT, or GGCCCC for targeting mRNA (sense), or non-natural nucleobases with the same specificity. These sequences are merely exemplary: for targeting a sequence in a repeated element, such as ...CAGCAGCAG..., any sequential combination of repeated sequences may be included in a single recognition reagent. For example, the sequence CTG will target complementary CAG repeats, but so will TGC and GCT, and their dimers CTGCTG, TGCTGC, and GCTGCT. Therefore, the following sequences, or equivalent nucleobases having the same nucleobase binding specificities, may be used to target the expanded repeats in mRNA shown in Table B: TTC, TTCTTC, TCT, TCTTCT, CTT, CTTCTT, CCG, CCGCCG, CGC, CGCCGC, GCC, GCCGCC, CGG, CGGCGG, GCG, GCGGCG, GGC, GGCGGC, CTG, CTGCTG, TGC, TGCTGC, GCT, GCTGCT, CAG, CAGCAG, AGC, AGCAGC, GCA, GCAGCA, CAGG, CAGGCAGG, AGGC, AGGCAGGC, GGCA, GGCAGGCA, GCAG, GCAGGCAG, AGAAT, GAATA, AATAG, ATAGA, TAGAA, GGCCCC, GCCCCG, CCCCGG, CCCGGC, CCGGCC, and CGGCCC. The preceding sequences are listed 5’ to 3’ antiparallel to the sense mRNA strand and may be, for example, in a C-terminal to N-terminal direction in PNA.

[00101] R groups of the recognition reagents described herein are arranged in a sequence complementary to a sequence of nucleobases in the template nucleic acid(s), so that the compositions will bind to that sequence of nucleobases. A “template nucleic acid” can include any nucleic acid or nucleic acid analog. When the template is within a cell, it will likely be a nucleic acid, such as DNA or RNA, e.g., an mRNA to be silenced. If the recognition reagents are assembled in vitro, the template can be a nucleic acid or any analog thereof that permits specific hybridization to the recognition reagents described herein.
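The frame-and-dimer pattern of the sequence list in [00100] is regular enough to generate mechanically. The Python sketch below illustrates it under the assumption that each listed variant is a cyclic rotation (reading frame) of the repeat unit; the function names are hypothetical and not taken from the disclosure. Because the list in [00100] gives dimers for the three- and four-base units but only single frames for AGAAT and GGCCCC, the sketch exposes an include_dimers flag.

```python
def frames(unit: str) -> list[str]:
    """Distinct cyclic rotations (reading frames) of a repeat unit, e.g. CTG -> CTG, TGC, GCT."""
    rotated = (unit[i:] + unit[:i] for i in range(len(unit)))
    return list(dict.fromkeys(rotated))  # drop duplicates for self-periodic units, keep order


def reagent_sequences(unit: str, include_dimers: bool = True) -> list[str]:
    """Each frame of the unit, optionally followed by its dimer, in the order of the [00100] list."""
    sequences = []
    for frame in frames(unit):
        sequences.append(frame)
        if include_dimers:
            sequences.append(frame * 2)
    return sequences


if __name__ == "__main__":
    print(reagent_sequences("CTG"))
    print(reagent_sequences("AGAAT", include_dimers=False))
```

For CTG this yields CTG, CTGCTG, TGC, TGCTGC, GCT, GCTGCT, the same order as the list above. Deduplicating frames only matters for self-periodic units such as CGCG, which do not appear in the list.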

[00102] Unless otherwise indicated, the recognition reagents described herein are not described with respect to any particular sequence of nucleobases. The present disclosure is directed to methods and compositions for concatenating the described recognition reagents, such as those based on the γPNA backbone, and is independent of the identity and sequence of the bases attached thereto. Any nucleobase sequence attached to the backbone of the described γPNA oligomers can hybridize in a specific manner with a complementary nucleobase sequence of a target nucleic acid or nucleic acid analog by Watson-Crick or Watson-Crick-like hydrogen bonding. The compositions and methods described herein are sequence-independent and describe a novel, generalized method, and related compositions, for template-directed assembly of longer γPNA sequences from shorter γPNA (precursor) sequences.

[00103] Nucleobases of the recognition reagents described herein are arranged in a sequence complementary to target sequences of the template nucleic acid, such as an mRNA, e.g., an mRNA containing expanded repeats, so that two or more recognition reagents as described herein bind by base pairing (e.g., by Watson-Crick or Watson-Crick-like base pairing) to sequences of bases on the template nucleic acid, and concatenate. A non-limiting example of a combination of recognition reagents that may be assembled according to the methods described herein is two recognition reagents in which a first recognition reagent has a nucleobase sequence complementary to a first sequence of a template nucleic acid or nucleic acid analog, and a second recognition reagent has a nucleobase sequence complementary to a second sequence on the template immediately adjacent to the first sequence, such that the two precursors bind a contiguous series of bases on the template, for example, as depicted in Fig. 1. Two or more different recognition reagents can be assembled in that manner, each binding adjacent short sequences in a longer, contiguous sequence of the template nucleic acid and concatenating with adjacent recognition reagents. In one example, as shown in Fig. 1, the recognition reagent has a single sequence of nucleobases complementary to a repeated sequence on the template, so that two or more identical recognition reagents bind tandemly to a contiguous sequence of repeats on the template. As indicated above, based on Table B, recognition reagents that target the gene products described above can include TTC, CCG, CGG, CTG, CAG, CAGG, AGAAT, or GGCCCC. In another example, two or more different recognition reagents, having different sequences, can be selected to concatenate on a unique, non-repeated sequence. For example, two different hexamer recognition reagents can be produced to hybridize adjacent to each other on a unique 12-nucleotide sequence present in a target sequence, such as an mRNA.
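The unique-sequence case at the end of [00103] amounts to splitting a target into adjacent windows and designing one antiparallel complement per window. The Python sketch below illustrates this for two hexamers on a 12-nucleotide RNA. The target sequence, the function names, and the choice of equal, gap-free windows are hypothetical assumptions made for the example; reagents are written 5’ to 3’ antiparallel to the RNA, following the convention stated in [00100].

```python
# Watson-Crick partner, written as a DNA/PNA letter, for each RNA base of the template.
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def probe_for(segment: str) -> str:
    """Reagent sequence written 5'->3', antiparallel to the given RNA segment."""
    return "".join(PAIR[base] for base in reversed(segment))


def tile(target: str, k: int) -> list[tuple[int, str]]:
    """Split the target into adjacent, non-overlapping k-mers and design one reagent per k-mer."""
    if len(target) % k:
        raise ValueError("target length must be a multiple of k for gap-free tiling")
    return [(i, probe_for(target[i:i + k])) for i in range(0, len(target), k)]


if __name__ == "__main__":
    hypothetical_target = "AUGCUAGGCAUU"  # illustrative 12-nt RNA, 5'->3'
    for start, reagent in tile(hypothetical_target, 6):
        print(start, reagent)
```

Because the two windows abut, the reagents' termini land next to each other on the template, which is the geometry that lets their cooperative binding domains pair.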

[00104] As indicated above, a method of producing a concatenated nucleic acid or nucleic acid analog (e.g., a conformationally preorganized nucleic acid analog such as γPNA) is provided, comprising binding a plurality of the compounds or recognition reagents described above to a template nucleic acid or nucleic acid analog. The compositions will concatenate when their terminal PNA cooperative binding groups are in proximity to each other, that is, when the groups are sufficiently close to hybridize with one another, e.g., as shown in Fig. 1.
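The proximity condition in [00104] can be pictured as an interval problem on the template. The Python sketch below, offered only as an illustration, represents each bound reagent as a hypothetical (start, length) pair in template coordinates and groups reagents whose ends abut into one concatenated chain. The max_gap tolerance is an assumed stand-in for "sufficiently close to hybridize" and is not a parameter defined in the disclosure; overlapping placements simply start a new chain.

```python
Site = tuple[int, int]  # (start position on template, reagent length); hypothetical representation


def concatenated_chains(sites: list[Site], max_gap: int = 0) -> list[list[Site]]:
    """Group bound reagents into chains whose neighbouring ends lie within max_gap bases."""
    chains: list[list[Site]] = []
    for start, length in sorted(sites):
        if chains:
            prev_start, prev_length = chains[-1][-1]
            gap = start - (prev_start + prev_length)
            if 0 <= gap <= max_gap:           # terminal binding domains close enough to pair
                chains[-1].append((start, length))
                continue
        chains.append([(start, length)])      # isolated reagent or start of a new chain
    return chains


if __name__ == "__main__":
    bound = [(6, 3), (0, 3), (12, 3), (3, 3)]  # three abutting trimers and one isolated trimer
    print(concatenated_chains(bound))
```

With the default max_gap of 0, only reagents that sit exactly end to end are joined, mirroring the contiguous-sequence case described in [00103].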

[00105] In another method, the recognition reagent described herein is introduced into a cell for a therapeutic effect. A variety of transfection reagents or liposomal preparations suitable for in vitro or in vivo use (commercially available from multiple sources) can be used to deliver the compositions described herein to cells. Once within the cells, the recognition reagents will concatenate on a suitable nucleic acid template, such as a native mRNA. When more than one recognition reagent hybridizes to the same template nucleic acid such that their terminal PNA cooperative binding groups are in proximity to each other (e.g., next to each other in a contiguous sequence), the recognition reagents will concatenate through hybridization of the proximal PNA cooperative binding groups. Compounds that bind a nucleic acid containing neither repeats of the recognition reagent’s recognition domain sequence nor adjacent sequences complementary to more than one of the delivered recognition reagents will release from that nucleic acid, because the binding of a single short reagent (e.g., a 3- to 8-mer) is not strong enough to keep it hybridized. When more than one recognition reagent binds to a target nucleic acid sequence, e.g., to adjacent, repeated sequences, the reagents will form a concatenated structure of sufficient length to bind the nucleic acid with sufficient strength to remain hybridized to the target sequence, achieving a specific effect, such as gene silencing where the composition has a sequence of an siRNA, miRNA, mirtron, or similar composition.
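The selectivity argument in [00105] depends on whether a template offers end-to-end sites for several copies of a short reagent. The Python sketch below is a simplified, sequence-level check under assumed conditions: exact Watson-Crick matches only, reagents written 5’ to 3’ antiparallel to an RNA template as in [00100], and no thermodynamics. The function names and example templates are hypothetical.

```python
# RNA base that pairs with each reagent letter (reagent written 5'->3', antiparallel to the RNA).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def rna_site(reagent: str) -> str:
    """RNA sequence, read 5'->3', that the reagent would pair with."""
    return "".join(RNA_PARTNER[base] for base in reversed(reagent))


def binding_sites(template: str, reagent: str) -> list[int]:
    """Start positions of exact complementary sites on the RNA template (overlaps allowed)."""
    site = rna_site(reagent)
    return [i for i in range(len(template) - len(site) + 1) if template.startswith(site, i)]


def has_tandem_sites(template: str, reagent: str) -> bool:
    """True if two sites lie end to end, so bound copies could concatenate."""
    starts = set(binding_sites(template, reagent))
    return any(start + len(reagent) in starts for start in starts)


if __name__ == "__main__":
    repeat_tract = "GGCAGCAGCAGCAGUU"  # hypothetical short r(CAG) tract
    lone_site = "AUCAGUUU"             # hypothetical template with a single CAG
    for template in (repeat_tract, lone_site):
        print(template, binding_sites(template, "CTG"), has_tandem_sites(template, "CTG"))
```

A CTG reagent finds four abutting sites on the repeat tract but only one isolated site on the second template. In the model of [00105], only the first case would support a concatenated, stably bound assembly.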

[00106] In some aspects, a method of treating a patient having a disease listed in Table B, such as myotonic dystrophy type 1 (DM1) and type 2 (DM2) or Huntington’s disease, is provided. The method comprises administering an amount of a composition or recognition reagent according to any embodiment or aspect described herein, complementary to a repeat expansion target sequence in mRNA of the patient, effective to knock down expression of the mRNA comprising the repeat expansion target sequence. For DM1, the target sequence is (CTG)n, and thus the recognition reagent has the sequence CAG, CAGCAG, AGC, AGCAGC, GCA, or GCAGCA. For DM2, the target sequence is (CCTG)n, and thus the recognition reagent has the sequence CAGG, CAGGCAGG, AGGC, AGGCAGGC, GGCA, GGCAGGCA, GCAG, or GCAGGCAG. For Huntington’s disease, the target sequence is (CAG)n, and thus the recognition reagent has the sequence CTG, CTGCTG, TGC, TGCTGC, GCT, or GCTGCT. Sequences complementary to those above are useful for binding the antisense strand of DNA containing the expanded repeats.
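The reagent lists for DM2 and Huntington’s disease in [00106] follow from the antiparallel convention of [00100]: take the reverse complement of the repeat unit, then list each reading frame followed by its dimer. The Python sketch below reproduces those two lists under that assumption. The function names are hypothetical, and sequences are written with DNA letters, as in the text.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Antiparallel complement, read 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))


def frames(unit: str) -> list[str]:
    """Distinct cyclic rotations of a repeat unit, in order."""
    return list(dict.fromkeys(unit[i:] + unit[:i] for i in range(len(unit))))


def reagents_for_repeat(target_unit: str) -> list[str]:
    """Reading frames of the complementary unit, each followed by its dimer."""
    reagents = []
    for frame in frames(reverse_complement(target_unit)):
        reagents += [frame, frame * 2]
    return reagents


if __name__ == "__main__":
    # Target repeat units as written in [00106]; DNA letters stand in for the mRNA repeats.
    for disease, unit in (("DM2", "CCTG"), ("Huntington's disease", "CAG")):
        print(disease, reagents_for_repeat(unit))
```

For (CCTG)n, this yields CAGG, CAGGCAGG, AGGC, AGGCAGGC, GGCA, GGCAGGCA, GCAG, GCAGGCAG, and for (CAG)n it yields CTG, CTGCTG, TGC, TGCTGC, GCT, GCTGCT, matching the lists above.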

[00107] For treatment of a patient, the recognition reagent can be administered by any effective route of administration, such as, without limitation: parenteral administration, such as intravenous, intraperitoneal, intra-organ (e.g., delivery to the liver), or intramuscular injection; inhalation, e.g., from a spray or aerosol metered-dose inhaler; topical delivery, such as dermal, transdermal, otic, or ophthalmic delivery; transmucosal delivery, such as transvaginal or buccal delivery; or oral delivery. The composition may be administered as an individual dose or in multiple doses over time, so as to maintain reduced expression of a target RNA.

[00108] Also provided herein are pharmaceutical compositions and formulations that can include the recognition reagents described herein. In one aspect, provided herein are pharmaceutical compositions containing a recognition reagent, as described herein, and a pharmaceutically acceptable carrier. The pharmaceutical compositions containing the recognition reagents are useful for treating a disease, such as a repeat expansion disease, e.g., as listed in Fig. 7. Such pharmaceutical compositions can be formulated based on the mode of delivery. One example is compositions formulated for systemic administration via parenteral delivery, e.g., by intravenous (IV) or subcutaneous delivery. The pharmaceutical compositions may be administered in dosages sufficient to treat the disease, e.g., by knocking down expression of an mRNA, as with repeat expansion diseases. A suitable dose of a recognition reagent can be in the range of about 0.001 to about 200.0 milligrams per kilogram body weight of the recipient per day, for example in the range of about 1 to 50 mg per kilogram body weight per day. A repeat-dose regimen may include administration of a therapeutic amount of the recognition reagent on a regular basis, such as every other day or once a year. In certain aspects, the recognition reagent is administered about once per month to about once per quarter (e.g., about once every three months). After an initial treatment regimen, the treatments can be administered on a less frequent basis. Certain factors can influence the dosage and timing required to effectively treat a subject, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and other diseases present. Moreover, treatment of a subject with a therapeutically effective amount of a composition can include a single treatment or a series of treatments. Estimates of effective dosages and in vivo half-lives for the individual recognition reagents encompassed herein can be made using conventional methodologies or based upon in vivo testing in an appropriate animal model.
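The per-kilogram ranges in [00108] scale linearly with body weight. The Python sketch below only performs that multiplication for a hypothetical 70 kg recipient, restating the arithmetic of the stated ranges; the function name and the example body weight are assumptions, not values from the disclosure.

```python
def daily_dose_range_mg(body_weight_kg: float, low_mg_per_kg: float,
                        high_mg_per_kg: float) -> tuple[float, float]:
    """Scale a per-kilogram daily dose range to a total daily amount in milligrams."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not 0 < low_mg_per_kg <= high_mg_per_kg:
        raise ValueError("expected 0 < low <= high")
    return body_weight_kg * low_mg_per_kg, body_weight_kg * high_mg_per_kg


if __name__ == "__main__":
    weight_kg = 70  # hypothetical recipient
    print("broad range:", daily_dose_range_mg(weight_kg, 0.001, 200.0))
    print("example range:", daily_dose_range_mg(weight_kg, 1, 50))
```

For a 70 kg recipient, the 1 to 50 mg/kg example range corresponds to 70 to 3,500 mg per day.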

[00109] The pharmaceutical compositions can be administered in any number of ways, depending upon whether local or systemic treatment is selected and upon the area to be treated. Administration can be topical (e.g., by a transdermal patch); pulmonary, e.g., by inhalation or insufflation of powders or aerosols, including, but not limited to, by nebulizer; intratracheal; intranasal; epidermal and transdermal; oral; or parenteral. Parenteral administration includes, but is not limited to, intravenous, intraarterial, subcutaneous, intraperitoneal, or intramuscular injection or infusion; subdermal administration, e.g., via an implanted device; or intracranial administration, e.g., by intraparenchymal, intrathecal, or intraventricular administration. The recognition reagent can be delivered in a manner that targets a particular tissue, such as the liver (e.g., the hepatocytes of the liver).

[00110] Pharmaceutical compositions and formulations for topical administration can include transdermal patches, ointments, lotions, creams, gels, drops, suppositories, sprays, liquids, and powders. Conventional pharmaceutical carriers, aqueous, powder, or oily bases, thickeners, and the like can be used. Coated condoms, gloves, and the like can also be useful. Suitable topical formulations include, but are not limited to, those in which the recognition reagents are in admixture with a topical delivery agent such as lipids, liposomes, fatty acids, fatty acid esters, steroids, chelating agents, and surfactants. Suitable lipids and liposomes include, but are not limited to, neutral (e.g., dioleoylphosphatidyl ethanolamine (DOPE), dimyristoylphosphatidyl choline (DMPC), and distearoylphosphatidyl choline), negative (e.g., dimyristoylphosphatidyl glycerol (DMPG)), and cationic (e.g., DOTAP and DOTMA) lipids and liposomes. Recognition reagents can be encapsulated within liposomes or can form complexes with them, in particular with cationic liposomes. Alternatively, recognition reagents can be complexed to lipids, in particular to cationic lipids. Suitable fatty acids and esters include, but are not limited to, arachidonic acid, oleic acid, eicosanoic acid, lauric acid, caprylic acid, capric acid, myristic acid, palmitic acid, stearic acid, linoleic acid, linolenic acid, dicaprate, tricaprate, monoolein, dilaurin, glyceryl 1-monocaprate, 1-dodecylazacycloheptan-2-one, an acylcarnitine, an acylcholine, or a C1-20 alkyl ester (e.g., isopropyl myristate (IPM)), monoglyceride, or diglyceride, or a pharmaceutically acceptable salt thereof. Topical formulations are described in detail in U.S. Patent No. 6,747,014. A person of skill in the pharmaceutical and compounding arts can prepare suitable formulations for delivery of the recognition reagents as described herein.

Example 1

[00111] Fig. 8 depicts the design of a nucleic acid probe containing RH-γPNA recognition and LH-γPNA cooperative binding domains for discrimination of r(CAG)-repeat expansion. Of note, LH-γPNA is orthogonal to RH-γPNA, RNA, and DNA in recognition; therefore, any combination of base pairs can be employed in the cooperative binding domains without concern that they will interact nonspecifically with unintended genetic material.

Example 2

[00112] Fig. 9 depicts the design of a nucleic acid probe containing RH-γPNA recognition and cooperative binding domains for discrimination of r(CAG)-repeat expansion. The design employs orthogonal nucleobases in the RH-γPNA cooperative binding domains that do not bind natural nucleobases. Such a design minimizes or prevents nonspecific interaction of the cooperative binding domain with unintended genetic material.

Example 3

[00113] Fig. 10 depicts the design of a nucleic acid probe containing PNA recognition and cooperative binding domains for discrimination of r(CAG)-repeat expansion. The PNA backbones do not discriminate between natural nucleic acids and LH-γPNA, but the orthogonal nucleobases in the LH-γPNA cooperative binding domains do not bind natural nucleobases. Such a design minimizes or prevents nonspecific interaction of the cooperative binding domain with unintended genetic material.

[00114] The present invention has been described with reference to certain exemplary embodiments, dispersible compositions and uses thereof. However, it will be recognized by those of ordinary skill in the art that various substitutions, modifications or combinations of any of the exemplary embodiments may be made without departing from the spirit and scope of the invention. Thus, the invention is not limited by the description of the exemplary embodiments.