Title:
CAPTURING GENETIC TARGETS USING A HYBRIDIZATION APPROACH
Document Type and Number:
WIPO Patent Application WO/2021/168261
Kind Code:
A1
Abstract:
Provided herein are methods of determining a location of an analyte using hybridization as a method for enhancing detection of the analyte. In particular, capture probes comprising a spatial barcode and a capture domain are used to capture analytes in a biological sample contacted with a substrate. The analyte may be a nucleic acid or a protein. Bait oligonucleotides are used to enrich a nucleic acid of interest before sequencing.

Inventors:
KOHLWAY ANDREW SCOTT (US)
MESCHI FRANCESCA (US)
CHEW JENNIFER (US)
ARTHUR JOSEPH GLENN (US)
DELANEY NIGEL (US)
BENT ZACHARY (US)
PFEIFFER KATHERINE (US)
HILL ANDREW JOHN (US)
ALVARADO MARTINEZ LUIGI JHON (US)
Application Number:
PCT/US2021/018795
Publication Date:
August 26, 2021
Filing Date:
February 19, 2021
Assignee:
10X GENOMICS INC (US)
International Classes:
C12Q1/6806; C12Q1/6834
Domestic Patent References:
WO2015117163A22015-08-06
WO2019213254A12019-11-07
WO2020047004A22020-03-05
WO2020123309A12020-06-18
WO2018091676A12018-05-24
WO2020176788A12020-09-03
WO2020123320A22020-06-18
WO2020061108A12020-03-26
WO2020061066A12020-03-26
Foreign References:
US20190145982A12019-05-16
US20180245142A12018-08-30
US20160122817A12016-05-05
US10774374B22020-09-15
US10724078B22020-07-28
US10480022B22019-11-19
US10059990B22018-08-28
US10041949B22018-08-07
US10002316B22018-06-19
US9879313B22018-01-30
US9783841B22017-10-10
US9727810B22017-08-08
US9593365B22017-03-14
US8951726B22015-02-10
US8604182B22013-12-10
US7709198B22010-05-04
US20200239946A12020-07-30
US20200080136A12020-03-12
US20200277663A12020-09-03
US20200024641A12020-01-23
US20190330617A12019-10-31
US20190264268A12019-08-29
US20200256867A12020-08-13
US20200224244A12020-07-16
US20190194709A12019-06-27
US20190161796A12019-05-30
US20190085383A12019-03-21
US20190055594A12019-02-21
US20180216161A12018-08-02
US20180051322A12018-02-22
US20170241911A12017-08-24
US20170089811A12017-03-30
US20170067096A12017-03-09
US20170029875A12017-02-02
US20170016053A12017-01-19
US20160108458A12016-04-21
US20150000854A12015-01-01
US20130171621A12013-07-04
US20200061064W2020-11-18
US16951854A
US20200053655W2020-09-30
US16951864A
US16951843A
US20180156784A12018-06-07
USPP62979652P
USPP62980124P
US63077019A
USPP62970066P
USPP62929686P
Other References:
RODRIQUES ET AL., SCIENCE, vol. 363, no. 6434, 2019, pages 1463 - 1467
LEE ET AL., NAT. PROTOC., vol. 10, no. 3, 2015, pages 442 - 458
TREJO ET AL., PLOS ONE, vol. 14, no. 2, 2019, pages e0212031
CHEN ET AL., SCIENCE, vol. 348, no. 6233, 2015, pages aaa6090
GAO ET AL., BMC BIOL, vol. 15, 2017, pages 50
GUPTA ET AL., NATURE BIOTECHNOL, vol. 36, 2018, pages 1197 - 1202
CREDLE ET AL., NUCLEIC ACIDS RES, vol. 45, no. 14, 21 August 2017 (2017-08-21), pages e128
BOLOGNESI ET AL., J. HISTOCHEM. CYTOCHEM., vol. 65, no. 8, 2017, pages 431 - 444
LIN ET AL., NAT COMMUN, vol. 6, 2015, pages 8390
PIRICI ET AL., J. HISTOCHEM. CYTOCHEM., vol. 57, 2009, pages 899 - 905
HARROW ET AL.: "GENCODE: The reference human genome annotation for The ENCODE Project", GENOME RES, vol. 22, no. 9, 2012, pages 1760 - 1774, XP055174460, DOI: 10.1101/gr.135350.111
FLICEK ET AL.: "Ensembl 2014", NUCLEIC ACIDS RES. 42(DATABASE ISSUE, 2014, pages D749 - D755
CUNNINGHAM ET AL., ENSEMBL, 2019
KANDOTH ET AL.: "Mutational landscape and significance across 12 major cancer types", NATURE, vol. 502, no. 7471, 2013, pages 333 - 339, XP055614322, DOI: 10.1038/nature12634
HOADLEY ET AL.: "Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer", CELL, vol. 173, no. 2, 2018, pages 291 - 304, XP085371385, DOI: 10.1016/j.cell.2018.03.022
BAILEY ET AL.: "Comprehensive Characterization of Cancer Driver Genes and Mutations", CELL, vol. 173, no. 2, 2018, pages 371 - 385, XP085371382, DOI: 10.1016/j.cell.2018.02.060
THORSSON ET AL.: "The Immune Landscape of Cancer", IMMUNITY, vol. 48, no. 4, 2018, pages 812 - 830, XP085382263, DOI: 10.1016/j.immuni.2018.03.023
BEHAN ET AL.: "Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens", NATURE, vol. 568, 2019, pages 511 - 516, XP036953699, DOI: 10.1038/s41586-019-1103-9
SANCHEZ-VEGA ET AL.: "Oncogenic Signaling Pathways in The Cancer Genome Atlas", CELL, vol. 173, no. 2, 2018, pages 321 - 337, XP085371390, DOI: 10.1016/j.cell.2018.03.035
FANG ET AL.: "A genetics-led approach defines the drug target landscape of 30 immune-related traits", NATURE GENETICS, vol. 51, no. 7, 2019, pages 1082 - 1091, XP036824719, DOI: 10.1038/s41588-019-0456-1
Attorney, Agent or Firm:
ZHAO, Hanchao et al. (US)
Claims:
What is claimed is:

1. A method for identifying abundance and location of an analyte in a biological sample, the method comprising:

(a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, wherein the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the analyte, an analyte derivative, or a complement thereof; and a bait oligonucleotide of the plurality of bait oligonucleotides comprises:

(i) a capture domain that hybridizes to all or a portion of the sequence of the analyte, the analyte derivative, or a complement thereof, and

(ii) a molecular tag;

(b) capturing the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and

(c) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the extended nucleic acid, and using the determined sequences of (i) and (ii) to identify the abundance and the location of the analyte in the biological sample.

2. The method of claim 1, wherein the analyte is a nucleic acid.

3. The method of claim 2, wherein the method further comprises generating the plurality of nucleic acids, which comprises:

(a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality comprises (i) the spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid;

(b) hybridizing the capture probe to the nucleic acid;

(c) extending a 3’ end of the capture probe using the nucleic acid that is bound to the capture domain as a template to generate an extended capture probe; and

(d) amplifying the extended capture probe to produce the extended nucleic acid.

4. The method of claim 3, wherein the extended nucleic acid is released from the extended capture probe.

5. The method of claim 1, wherein the analyte is a protein.

6. The method of claim 1 or 5, wherein the analyte derivative is an oligonucleotide comprising an analyte binding moiety barcode, and an analyte capture sequence.

7. The method of claim 6, wherein the method further comprises generating the plurality of nucleic acids, the method comprising:

(a) contacting a plurality of analyte capture agents to the biological sample, wherein: an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds to the protein and (ii) the oligonucleotide comprising the analyte binding moiety barcode and the analyte capture sequence;

(b) contacting the plurality of analyte capture agents to a substrate comprising a plurality of capture probes, wherein a capture probe of the plurality comprises the spatial barcode and a capture domain, wherein the capture domain binds to the analyte capture sequence;

(c) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and

(d) amplifying the extended capture probe to produce the extended nucleic acid.

8. The method of claim 7, wherein the extended nucleic acid is released from the extended capture probe.

9. The method of claim 7, wherein step (a) and step (b) are performed at substantially the same time.

10. A method for enriching an analyte or an analyte derivative in a biological sample, the method comprising:

(a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, wherein: the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the analyte, an analyte derivative, or a complement thereof; and a bait oligonucleotide of the plurality of bait oligonucleotides comprises:

(i) a capture domain that hybridizes to all or a portion of the sequence of the analyte, the analyte derivative, or a complement thereof, and

(ii) a molecular tag;

(b) capturing a complex of the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and

(c) isolating the complex of the bait oligonucleotide bound to the extended nucleic acid, thereby enriching the analyte or the analyte derivative in the biological sample.

11. The method of claim 10, wherein the analyte is a nucleic acid.

12. The method of claim 11, wherein the method further comprises generating the plurality of nucleic acids, which comprises:

(a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality comprises (i) the spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid;

(b) hybridizing the capture probe to the nucleic acid;

(c) extending a 3’ end of the capture probe using the nucleic acid that is bound to the capture domain as a template to generate an extended capture probe; and

(d) amplifying the extended capture probe to produce the extended nucleic acid.

13. The method of claim 12, wherein the extended nucleic acid is released from the extended capture probe.

14. The method of claim 10, wherein the analyte is a protein.

15. The method of claim 10 or 14, wherein the analyte derivative is an oligonucleotide comprising an analyte binding moiety barcode, and an analyte capture sequence.

16. The method of claim 15, wherein the method further comprises generating the plurality of nucleic acids, the method comprising:

(a) contacting a plurality of analyte capture agents to the biological sample, wherein: an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds to the protein and (ii) the oligonucleotide comprising the analyte binding moiety barcode and the analyte capture sequence;

(b) contacting the plurality of analyte capture agents to a substrate comprising a plurality of capture probes, wherein a capture probe of the plurality comprises the spatial barcode and a capture domain, wherein the capture domain binds to the analyte capture sequence;

(c) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and

(d) amplifying the extended capture probe to produce the extended nucleic acid.

17. The method of claim 16, wherein the extended nucleic acid is released from the extended capture probe.

18. The method of claim 16, wherein step (a) and step (b) are performed at substantially the same time.

19. The method of any one of claims 1-18, wherein the analyte from the biological sample is associated with a disease or condition.

20. The method of any one of claims 1-19, wherein the capture domain of the bait oligonucleotide binds to a 3’ portion, a 5’ portion, an intron, an exon, an untranslated 3’ region, or an untranslated 5’ region of the sequence of the analyte, the analyte derivative, or a complement thereof.

21. The method of any one of claims 1-20, wherein the capture domain of the bait oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides.

22. The method of any one of claims 1-21, wherein the molecular tag comprises a protein, a small molecule, a nucleic acid, or a carbohydrate.

23. The method of any one of claims 1-21, wherein the molecular tag is streptavidin, avidin, biotin, or a fluorophore.

24. The method of any one of claims 1-23, wherein the agent that binds to the molecular tag comprises a protein (e.g., an antibody), a nucleic acid, or a small molecule.

25. The method of any one of claims 1-24, wherein the molecular tag is biotin and the agent that binds to the molecular tag is avidin or streptavidin.

26. The method of any one of claims 1-25, wherein the agent that binds specifically to the molecular tag is attached to a substrate.

27. The method of claim 26, wherein the substrate is a bead, a well, or a slide.

28. The method of any one of claims 1-27, wherein the extended nucleic acid is a DNA molecule (e.g., a cDNA molecule).

29. The method of any one of claims 1-28, wherein the extended nucleic acid further comprises a primer sequence or a complement thereof; a unique molecular sequence or a complement thereof; or an additional primer binding sequence or a complement thereof.

30. The method of any one of claims 1-29, wherein the biological sample is a tissue sample selected from a formalin-fixed, paraffin-embedded (FFPE) tissue sample or a frozen tissue sample.

31. The method of any one of claims 1-30, wherein the biological sample was previously stained using a detectable label, a hematoxylin and eosin (H&E) stain, immunofluorescence, or immunohistochemistry.

32. The method of any one of claims 1-31, wherein the biological sample is a permeabilized biological sample.

33. The method of any one of claims 1-9, and 19-32, wherein the determining step comprises sequencing (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the nucleic acid from the biological sample.

34. The method of claim 33, wherein the sequencing is high throughput sequencing.

35. The method of any one of claims 1-34, wherein the analyte is dysregulated or differentially expressed in a cancer cell, an immune cell, a cellular signaling pathway, or a neurological cell.

36. A method for identifying abundance and location of a nucleic acid in a biological sample, the method comprising:

(a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality comprises (i) a spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid;

(b) hybridizing the capture probe to the nucleic acid;

(c) extending a 3’ end of the capture probe using the nucleic acid that is bound to the capture domain as a template to generate an extended capture probe; and

(d) amplifying the extended capture probe to produce the extended nucleic acid; wherein the extended nucleic acid comprises (i) the spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the nucleic acid, or a complement thereof;

(e) releasing the extended nucleic acid from the extended capture probe;

(f) contacting a plurality of the released nucleic acids comprising the extended nucleic acid from step (e) with a plurality of bait oligonucleotides, wherein a bait oligonucleotide of the plurality of bait oligonucleotides comprises:

(i) a capture domain that hybridizes to all or a portion of the sequence of the nucleic acid, or a complement thereof, and

(ii) a molecular tag;

(g) capturing the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and

(h) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the extended nucleic acid, and using the determined sequences of (i) and (ii) to identify the abundance and the location of the nucleic acid in the biological sample.

37. A method for identifying abundance and location of a protein in a biological sample, the method comprising:

(a) contacting a plurality of analyte capture agents to the biological sample, wherein: an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds to the protein and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence;

(b) contacting the plurality of analyte capture agents to a substrate comprising a plurality of capture probes, wherein a capture probe of the plurality comprises a spatial barcode and a capture domain, wherein the capture domain binds to the analyte capture sequence;

(c) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe;

(d) amplifying the extended capture probe to produce the extended nucleic acid; wherein the extended nucleic acid comprises (i) the spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the oligonucleotide, or a complement thereof;

(e) releasing the extended nucleic acid from the extended capture probe;

(f) contacting a plurality of the released nucleic acids comprising the extended nucleic acid from step (e) with a plurality of bait oligonucleotides, wherein a bait oligonucleotide of the plurality of bait oligonucleotides comprises:

(i) a capture domain that hybridizes to all or a portion of the sequence of the oligonucleotide, or a complement thereof, and

(ii) a molecular tag;

(g) capturing the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and

(h) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the extended nucleic acid, and using the determined sequences of (i) and (ii) to identify the abundance and the location of the protein in the biological sample.

38. A composition comprising a bait oligonucleotide and an extended nucleic acid, wherein the bait oligonucleotide is bound to the extended nucleic acid, wherein the extended nucleic acid comprises (i) a spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of an analyte, an analyte derivative, or a complement thereof; and wherein the bait oligonucleotide comprises a molecular tag, wherein the molecular tag is selected from the group consisting of streptavidin, avidin, biotin, or a fluorophore.

39. The composition of claim 38, wherein the bait oligonucleotide is bound to the extended nucleic acid by a capture domain that hybridizes to all or a portion of the sequence of the analyte, the analyte derivative, or a complement thereof.

40. The composition of claim 38 or 39, further comprising an agent that binds to the molecular tag.

41. The composition of claim 40, wherein the agent that binds to the molecular tag comprises a protein (e.g., an antibody), a nucleic acid, or a small molecule.

42. The composition of claim 40, wherein the molecular tag is biotin and the agent that binds to the molecular tag is avidin or streptavidin.

43. The composition of any one of claims 38-42, further comprising a substrate, wherein the agent that binds to the molecular tag is attached to the substrate.

44. The composition of claim 43, wherein the substrate is a bead, a well, or a slide.

45. A kit comprising: an array comprising a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain; a plurality of bait oligonucleotides; and instructions for performing the method of any one of claims 1-4, 10-13, and 19-36.

46. A kit comprising: a plurality of analyte capture agents, wherein an analyte capture agent of the plurality of analyte capture agents comprises an analyte binding moiety, an analyte binding moiety barcode, and an analyte capture sequence; an array comprising a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain; a plurality of bait oligonucleotides; and instructions for performing the method of any one of claims 1, 5-9, 10, 14-35, and 37.

47. The kit of claim 45 or 46, further comprising reagents and/or enzymes for performing the method.

Description:
CAPTURING GENETIC TARGETS USING A HYBRIDIZATION APPROACH

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application No. 62/979,652, filed February 21, 2020; U.S. Provisional Patent Application No. 62/980,124, filed February 21, 2020; and U.S. Provisional Patent Application No. 63/077,019, filed September 11, 2020. The contents of the foregoing applications are incorporated herein by reference in their entirety.

BACKGROUND

Cells within a tissue of a subject have differences in cell morphology and/or function due to varied analyte abundance (e.g., gene and/or protein expression) within the different cells. The specific position of a cell within a tissue (e.g., the cell’s position relative to neighboring cells or the cell’s position relative to the tissue microenvironment) can affect, e.g., the cell’s morphology, differentiation, fate, viability, proliferation, behavior, and signaling and cross-talk with other cells in the tissue.

Spatial heterogeneity has been previously studied using techniques that only provide data for a small handful of analytes in the context of an intact tissue or a portion of a tissue, or provide a lot of analyte data for single cells, but fail to provide information regarding the position of the single cell in a parent biological sample (e.g., tissue sample).

Whole exome sequencing provides coverage for each transcript in a sample. There is a need in the art for a transcriptome-specific method for the high-fidelity enrichment of nucleic acid molecules for targeted sequencing while reducing cost, maximizing efficiency, and minimizing redundancy.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted on a compact disc and is hereby incorporated by reference in its entirety. Said compact disc, created on February 19, 2021, is named 47706-0198WO1_sequence_listing_CORRECTED.txt and is 316,281,109 bytes in size. 3 copies of the compact discs are submitted.

SUMMARY

Disclosed herein is a method for identifying abundance and location of an analyte in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, in some embodiments, the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the analyte, an analyte derivative, or a complement thereof; and a bait oligonucleotide of the plurality of bait oligonucleotides comprises: (i) a capture domain that hybridizes to all or a portion of the sequence of the analyte, the analyte derivative, or a complement thereof, and (ii) a molecular tag; (b) capturing the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and (c) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the extended nucleic acid, and using the determined sequences of (i) and (ii) to identify the abundance and the location of the analyte in the biological sample.
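
As an illustration of determining step (c) only, the following minimal Python sketch tabulates abundance per location once the spatial barcode and the analyte have been determined for each sequenced molecule. It is not part of the disclosed method. The barcode-to-coordinate table, the barcode strings, the analyte names, and the function name `tabulate_abundance` are all hypothetical, and the reads are assumed to have been assigned to analytes by an upstream step.

```python
from collections import Counter

# Hypothetical lookup from spatial barcode to an (x, y) position on the array.
BARCODE_TO_XY = {
    "AAACGT": (0, 0),
    "CCGTAA": (0, 1),
}

def tabulate_abundance(reads, barcode_to_xy):
    """Count reads per (location, analyte).

    `reads` is an iterable of (spatial_barcode, analyte_name) pairs,
    assumed to come from sequencing plus an upstream assignment step.
    Reads whose barcode is not in the lookup table are skipped.
    """
    counts = Counter()
    for barcode, analyte in reads:
        xy = barcode_to_xy.get(barcode)
        if xy is None:
            continue  # barcode not known on this (hypothetical) array
        counts[(xy, analyte)] += 1
    return dict(counts)

# Illustrative input only; analyte names are placeholders.
reads = [
    ("AAACGT", "ANALYTE_A"),
    ("AAACGT", "ANALYTE_A"),
    ("CCGTAA", "ANALYTE_B"),
    ("GGGGGG", "ANALYTE_A"),  # unknown barcode, ignored
]
counts = tabulate_abundance(reads, BARCODE_TO_XY)
```

In this sketch the spatial barcode supplies the location and the read count supplies a crude abundance estimate. It does not correct for amplification duplicates.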

In some embodiments, the analyte is a nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids, which comprises: (a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, in some embodiments, a capture probe of the plurality comprises (i) the spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid; (b) hybridizing the capture probe to the nucleic acid; (c) extending a 3’ end of the capture probe using the nucleic acid that is bound to the capture domain as a template to generate an extended capture probe; and (d) amplifying the extended capture probe to produce the extended nucleic acid.

In some embodiments, the extended nucleic acid is released from the extended capture probe.

In some embodiments, the analyte is a protein.

In some embodiments, the analyte derivative is an oligonucleotide comprising an analyte binding moiety barcode, and an analyte capture sequence.

In some embodiments, the method further comprises generating the plurality of nucleic acids, the method comprising: (a) contacting a plurality of analyte capture agents to the biological sample, in some embodiments, an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds to the protein and (ii) the oligonucleotide comprising the analyte binding moiety barcode and the analyte capture sequence; (b) contacting the plurality of analyte capture agents to a substrate comprising a plurality of capture probes, in some embodiments, a capture probe of the plurality comprises the spatial barcode and a capture domain, in some embodiments, the capture domain binds to the analyte capture sequence; (c) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (d) amplifying the extended capture probe to produce the extended nucleic acid.

In some embodiments, the extended nucleic acid is released from the extended capture probe.

In some embodiments, step (a) and step (b) are performed at substantially the same time. In some embodiments, step (a) is performed prior to step (b). In some embodiments, step (b) is performed prior to step (a).

Also disclosed herein is a method for enriching an analyte or an analyte derivative in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, in some embodiments, the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the analyte, an analyte derivative, or a complement thereof; and a bait oligonucleotide of the plurality of bait oligonucleotides comprises: (i) a capture domain that hybridizes to all or a portion of the sequence of the analyte, the analyte derivative, or a complement thereof, and (ii) a molecular tag; (b) capturing a complex of the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and (c) isolating the complex of the bait oligonucleotide bound to the extended nucleic acid, thereby enriching the analyte or the analyte derivative in the biological sample.
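
The enrichment logic above can be mimicked in silico with a short Python sketch. This is a simplification for illustration, not the disclosed laboratory procedure. Hybridization is modeled as an exact reverse-complement match, the molecular tag and its binding agent are reduced to a simple keep-or-discard decision, and the sequences and the function names `reverse_complement` and `enrich` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def enrich(library, baits):
    """Keep library molecules that at least one bait could hybridize to.

    Assumption: a bait hybridizes when the reverse complement of its
    capture domain occurs exactly within the molecule. Real hybridization
    tolerates mismatches and depends on reaction conditions.
    """
    targets = [reverse_complement(bait) for bait in baits]
    return [mol for mol in library if any(t in mol for t in targets)]

# Illustrative sequences only.
bait = "GGATCCTTAGCA"
on_target = "AAAA" + reverse_complement(bait) + "TTTT"
off_target = "C" * 20
captured = enrich([on_target, off_target], [bait])
```

Here `captured` holds only the molecule containing the bait's complementary region, which corresponds to isolating the bait-bound complex and discarding unbound molecules.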

In some embodiments, the analyte is a nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids, which comprises: (a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, in some embodiments, a capture probe of the plurality comprises (i) the spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid; (b) hybridizing the capture probe to the nucleic acid; (c) extending a 3’ end of the capture probe using the nucleic acid that is bound to the capture domain as a template to generate an extended capture probe; and (d) amplifying the extended capture probe to produce the extended nucleic acid.

In some embodiments, the extended nucleic acid is released from the extended capture probe.

In some embodiments, the analyte is a protein. In some embodiments, the analyte derivative is an oligonucleotide comprising an analyte binding moiety barcode, and an analyte capture sequence.

In some embodiments, the method further comprises generating the plurality of nucleic acids, the method comprising: (a) contacting a plurality of analyte capture agents to the biological sample, in some embodiments, an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds to the protein and (ii) the oligonucleotide comprising the analyte binding moiety barcode and the analyte capture sequence; (b) contacting the plurality of analyte capture agents to a substrate comprising a plurality of capture probes, in some embodiments, a capture probe of the plurality comprises the spatial barcode and a capture domain, in some embodiments, the capture domain binds to the analyte capture sequence; (c) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (d) amplifying the extended capture probe to produce the extended nucleic acid.

In some embodiments, the extended nucleic acid is released from the extended capture probe.

In some embodiments, step (a) and step (b) are performed at substantially the same time. In some embodiments, step (a) is performed prior to step (b). In some embodiments, step (b) is performed prior to step (a).

In some embodiments, the analyte from the biological sample is associated with a disease or condition.

In some embodiments, the capture domain of the bait oligonucleotide binds to a 3’ portion, a 5’ portion, an intron, an exon, an untranslated 3’ region, or an untranslated 5’ region of the sequence of the analyte, the analyte derivative, or a complement thereof.

In some embodiments, the capture domain of the bait oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides.

In some embodiments, the molecular tag comprises a protein, a small molecule, a nucleic acid, or a carbohydrate.

In some embodiments, the molecular tag is streptavidin, avidin, biotin, or a fluorophore.

In some embodiments, the agent that binds to the molecular tag comprises a protein (e.g., an antibody), a nucleic acid, or a small molecule.

In some embodiments, the molecular tag is biotin and the agent that binds to the molecular tag is avidin or streptavidin. In some embodiments, the agent that binds specifically to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead, a well, or a slide.

In some embodiments, the extended nucleic acid is a DNA molecule (e.g., a cDNA molecule).

In some embodiments, the extended nucleic acid further comprises a primer sequence or a complement thereof; a unique molecular sequence or a complement thereof; or an additional primer binding sequence or a complement thereof.
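
To illustrate how such components might be used downstream, the following Python sketch splits a sequence into a primer, a spatial barcode, a unique molecular sequence, and the remaining insert, then counts distinct molecules. The layout (primer first, then barcode, then unique molecular sequence), the segment lengths, and all names are assumptions made for this example. The text above does not specify an order or lengths.

```python
from typing import NamedTuple, Optional

class ParsedMolecule(NamedTuple):
    spatial_barcode: str
    unique_molecular_sequence: str
    insert: str

def parse_extended(seq, primer, barcode_len, ums_len) -> Optional[ParsedMolecule]:
    """Split a sequence under the assumed layout: primer | barcode | UMS | insert.

    Returns None if the primer is missing or no insert remains.
    """
    if not seq.startswith(primer):
        return None
    barcode_start = len(primer)
    barcode_end = barcode_start + barcode_len
    ums_end = barcode_end + ums_len
    if len(seq) <= ums_end:
        return None
    return ParsedMolecule(seq[barcode_start:barcode_end],
                          seq[barcode_end:ums_end],
                          seq[ums_end:])

def count_unique_molecules(seqs, primer, barcode_len, ums_len):
    """Count distinct (spatial barcode, unique molecular sequence) pairs."""
    seen = set()
    for seq in seqs:
        parsed = parse_extended(seq, primer, barcode_len, ums_len)
        if parsed is not None:
            seen.add((parsed.spatial_barcode, parsed.unique_molecular_sequence))
    return len(seen)

# Illustrative sequences only: 4-nt primer, 4-nt barcode, 3-nt UMS.
seqs = [
    "ACGT" + "AAAA" + "CCC" + "GGGTTT",
    "ACGT" + "AAAA" + "CCC" + "GGGTTT",  # amplification duplicate
    "ACGT" + "AAAA" + "GGG" + "GGGTTT",  # same location, different molecule
    "TTTT" + "AAAA" + "CCC" + "GGGTTT",  # primer missing, skipped
]
n_molecules = count_unique_molecules(seqs, "ACGT", 4, 3)
```

Collapsing on the pair of spatial barcode and unique molecular sequence keeps amplification duplicates from inflating the abundance estimate at a given location.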

In some embodiments, the biological sample is a tissue sample selected from a formalin-fixed, paraffin-embedded (FFPE) tissue sample or a frozen tissue sample.

In some embodiments, the biological sample was previously stained using a detectable label, a hematoxylin and eosin (H&E) stain, immunofluorescence, or immunohistochemistry.

In some embodiments, the biological sample is a permeabilized biological sample.

In some embodiments, the determining step described herein comprises sequencing (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the nucleic acid from the biological sample.

In some embodiments, the sequencing is high throughput sequencing.

In some embodiments, the analyte is dysregulated or differentially expressed in a cancer cell, an immune cell, a cellular signaling pathway, or a neurological cell.

Also disclosed herein is a method for identifying abundance and location of a nucleic acid in a biological sample, the method comprising: (a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, in some embodiments, a capture probe of the plurality comprises (i) a spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid; (b) hybridizing the capture probe to the nucleic acid; (c) extending a 3’ end of the capture probe using the nucleic acid that is bound to the capture domain as a template to generate an extended capture probe; and (d) amplifying the extended capture probe to produce the extended nucleic acid; in some embodiments, the extended nucleic acid comprises (i) the spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the nucleic acid, or a complement thereof; (e) releasing the extended nucleic acid from the extended capture probe; (f) contacting a plurality of the released nucleic acids comprising the extended nucleic acid from step (e) with a plurality of bait oligonucleotides, in some embodiments, a bait oligonucleotide of the plurality of bait oligonucleotides comprises: (i) a capture domain that hybridizes to all or a portion of the sequence of the nucleic acid, or a complement thereof, and (ii) a molecular tag; (g) capturing the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and (h) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the extended nucleic acid, and using the determined sequences of (i) and (ii) to identify the abundance and the location of the nucleic acid in the biological sample.

Also disclosed herein is a method for identifying abundance and location of a protein in a biological sample, the method comprising: (a) contacting a plurality of analyte capture agents to the biological sample, in some embodiments, an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds to the protein and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence; (b) contacting the plurality of analyte capture agents to a substrate comprising a plurality of capture probes, in some embodiments, a capture probe of the plurality comprises a spatial barcode and a capture domain, in some embodiments, the capture domain binds to the analyte capture sequence; (c) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; (d) amplifying the extended capture probe to produce the extended nucleic acid; in some embodiments, the extended nucleic acid comprises (i) the spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of the oligonucleotide, or a complement thereof; (e) releasing the extended nucleic acid from the extended capture probe; (f) contacting a plurality of the released nucleic acids comprising the extended nucleic acid from step (e) with a plurality of bait oligonucleotides, in some embodiments, a bait oligonucleotide of the plurality of bait oligonucleotides comprises: (i) a capture domain that hybridizes to all or a portion of the sequence of the oligonucleotide, or a complement thereof, and (ii) a molecular tag; (g) capturing the bait oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and (h) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the extended nucleic acid, and using the determined sequences of (i) and (ii) to identify the abundance and the location of the protein in the biological sample.

Also disclosed herein is a composition comprising a bait oligonucleotide and an extended nucleic acid, in some embodiments, the bait oligonucleotide is bound to the extended nucleic acid, in some embodiments, the extended nucleic acid comprises (i) a spatial barcode or a complement thereof, and (ii) all or a portion of a sequence of an analyte, an analyte derivative, or a complement thereof; and in some embodiments, the bait oligonucleotide comprises a molecular tag, in some embodiments, the molecular tag is selected from the group consisting of streptavidin, avidin, biotin, and a fluorophore.

In some embodiments, the bait oligonucleotide is bound to the extended nucleic acid by a capture domain that hybridizes to all or a portion of the sequence of the analyte, the analyte derivative, or a complement thereof.

In some embodiments, the composition as described herein further comprises an agent that binds to the molecular tag.

In some embodiments, the agent that binds to the molecular tag comprises a protein (e.g., an antibody), a nucleic acid, or a small molecule.

In some embodiments, the molecular tag is biotin and the agent that binds to the molecular tag is avidin or streptavidin.

In some embodiments, the composition as described herein further comprises a substrate, in some embodiments, the agent that binds to the molecular tag is attached to the substrate. In some embodiments, the substrate is a bead, a well, or a slide.

Also disclosed herein is a kit comprising: an array comprising a plurality of capture probes, in some embodiments, a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain; a plurality of bait oligonucleotides; and instructions for performing the method as described herein.

Also disclosed herein is a kit comprising: a plurality of analyte capture agents, in some embodiments, an analyte capture agent of the plurality of analyte capture agents comprises an analyte binding moiety, an analyte binding moiety barcode, and an analyte capture sequence; an array comprising a plurality of capture probes, in some embodiments, a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain; a plurality of bait oligonucleotides; and instructions for performing the method as described herein.

In some embodiments, the kit as described herein further comprises reagents and/or enzymes for performing the method.

Also disclosed herein is a method for identifying a location of a nucleic acid in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids from a biological sample with a plurality of bait oligonucleotides, in some embodiments, a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of a sequence of a nucleic acid from a biological sample, or a complement thereof; and a bait oligonucleotide of the plurality of bait oligonucleotides comprises: a capture domain that binds specifically to all or a portion of the sequence of the nucleic acid from the biological sample, or a complement thereof, and a molecular tag; (b) capturing the bait oligonucleotide specifically bound to the nucleic acid using a substrate comprising an agent that binds specifically to the molecular tag; and (c) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the nucleic acid from the biological sample, and using the determined sequences of (i) and (ii) to identify the location of the nucleic acid in the biological sample.

In some embodiments, the capture domain of the bait oligonucleotide binds specifically to all or a portion of the sequence of the nucleic acid from the biological sample. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to a 3’ portion of the sequence of the nucleic acid from the biological sample or a complement thereof. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to a 5’ portion of the sequence of the nucleic acid from the biological sample or a complement thereof. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to an intron in the sequence of the nucleic acid from the biological sample or a complement thereof. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to an exon in the sequence of the nucleic acid from the biological sample or a complement thereof. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to an untranslated 3’ region of the nucleic acid from the biological sample or a complement thereof. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to an untranslated 5’ region of the nucleic acid from the biological sample or a complement thereof.

In some embodiments, the nucleic acid from the biological sample is associated with a disease or condition. In some embodiments, the nucleic acid from the biological sample comprises a mutation. In some embodiments, the nucleic acid from the biological sample comprises a single nucleotide polymorphism (SNP). In some embodiments, the nucleic acid from the biological sample comprises a trinucleotide repeat.

In some embodiments, the capture domain of the bait oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides.

In some embodiments, the molecular tag comprises a moiety. In some embodiments, the moiety is streptavidin, avidin, biotin, or a fluorophore. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is positioned 5’ to the capture domain in the bait oligonucleotide. In some embodiments, the molecular tag is positioned 3’ to the capture domain in the bait oligonucleotide. In some embodiments, the agent that binds specifically to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that binds specifically to the molecular tag comprises a nucleic acid. In some embodiments, the agent that binds specifically to the molecular tag comprises a small molecule. In some embodiments, the agent that binds specifically to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the nucleic acid is DNA. In some embodiments, the nucleic acid further comprises a functional sequence. In some embodiments, the functional sequence is a primer sequence or a complement thereof. In some embodiments, the nucleic acid further comprises a unique molecular sequence or a complement thereof. In some embodiments, the nucleic acid further comprises an additional primer binding sequence or a complement thereof.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample was previously stained using a detectable label. In some embodiments, the biological sample was previously stained. In some embodiments, the biological sample was previously stained using hematoxylin and eosin (H&E). In some embodiments, the biological sample was previously stained using immunofluorescence or immunohistochemistry. In some embodiments, the biological sample is a permeabilized biological sample that has been permeabilized with a permeabilization agent selected from an organic solvent, a cross-linking agent, a detergent, and an enzyme, or a combination thereof. In some embodiments, the permeabilization agent is a cross-linking agent. In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule.

In some embodiments, the determining in step (c) comprises sequencing (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the nucleic acid from the biological sample. In some embodiments, the sequencing is high throughput sequencing. In some embodiments, the sequencing comprises ligating an adapter to the nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids, and the generating comprises: (a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, in some embodiments, a capture probe of the plurality comprises (i) the spatial barcode and (ii) a capture domain that binds specifically to a sequence present in the analyte; (b) extending a 3’ end of the capture probe using the analyte that is specifically bound to the capture domain as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce the nucleic acid. In some embodiments, the amplifying is isothermal.

In some embodiments, the produced nucleic acid is released from the extended capture probe.

In some embodiments, the nucleic acid is dysregulated or differentially expressed in a cancer cell. In some embodiments, the nucleic acid is dysregulated or differentially expressed in an immune cell. In some embodiments, the nucleic acid is dysregulated in a cellular signaling pathway. In some embodiments, the nucleic acid is dysregulated or differentially expressed in a neurological cell.

Also disclosed herein is a method for identifying a location of a nucleic acid in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, in some embodiments, a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of a binding moiety barcode, or a complement thereof; and a bait oligonucleotide of the plurality of bait oligonucleotides comprises: a capture domain that binds specifically to (i) all or a portion of the nucleic acid, or a complement thereof, and/or (ii) all or a portion of the binding moiety barcode, or a complement thereof, and a molecular tag; (b) capturing a complex of the bait oligonucleotide specifically bound to the nucleic acid or a complement thereof, or a complex of the bait oligonucleotide specifically bound to the analyte binding moiety barcode or a complement thereof, using a substrate comprising an agent that binds specifically to the molecular tag; and (c) determining (i) all or a portion of the sequence of the nucleic acid or the complement thereof, and/or (ii) all or a portion of the sequence of the analyte binding moiety barcode or the complement thereof, and using the determined sequences of (i) and (ii) to identify the location of the nucleic acid in the biological sample.

In some embodiments, the capture domain of the bait oligonucleotide binds specifically to all or a portion of the nucleic acid sequence from the biological sample. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to all or a portion of the analyte binding moiety barcode.

In some embodiments, the nucleic acid from the biological sample is associated with a disease or condition.

In some embodiments, the capture domain of the bait oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides. In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is streptavidin, avidin, or biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is positioned 5’ to the domain in the bait oligonucleotide. In some embodiments, the molecular tag is positioned 3’ to the domain in the bait oligonucleotide. In some embodiments, the agent that binds specifically to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that binds specifically to the molecular tag comprises a nucleic acid. In some embodiments, the agent that binds specifically to the molecular tag comprises a small molecule.

In some embodiments, the agent that binds specifically to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the nucleic acid is DNA. In some embodiments, the nucleic acid further comprises a primer binding sequence or a complement thereof. In some embodiments, the nucleic acid further comprises a unique molecular sequence or a complement thereof. In some embodiments, the nucleic acid further comprises an additional primer binding sequence or a complement thereof.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample was previously stained using a detectable label. In some embodiments, the biological sample was previously stained. In some embodiments, the biological sample was previously stained using hematoxylin and eosin (H&E). In some embodiments, the biological sample was previously stained using immunofluorescence or immunohistochemistry. In some embodiments, the biological sample is a permeabilized biological sample that has been permeabilized with a permeabilization agent selected from an organic solvent, a cross-linking agent, a detergent, and an enzyme, or a combination thereof. In some embodiments, the permeabilization agent is selected from an organic solvent, a cross-linking agent, a detergent, and an enzyme, or a combination thereof. In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule.

In some embodiments, the determining in step (c) comprises sequencing (i) all or a portion of the sequence of the nucleic acid or the complement thereof, and (ii) all or a portion of the sequence of the analyte binding moiety barcode. In some embodiments, the sequencing is high throughput sequencing. In some embodiments, the sequencing comprises ligating an adapter to the nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids, and the generating comprises: (a) contacting a plurality of capture agents to the biological sample disposed on a substrate, in some embodiments, a capture agent of the plurality of the capture agents comprises (i) a binding moiety that binds specifically to the nucleic acid in the biological sample, (ii) a binding moiety barcode; and (iii) a capture sequence; and the substrate comprises a plurality of capture probes, in some embodiments, a capture probe of the plurality comprises a spatial barcode and a capture domain, in some embodiments, the capture domain binds specifically to the analyte capture sequence; and (b) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce the nucleic acid. In some embodiments, the amplifying is isothermal.

In some embodiments, the produced nucleic acid is released from the extended capture probe. In some embodiments, the nucleic acid is a nucleic acid that is dysregulated or differentially expressed in a cancer cell. In some embodiments, the nucleic acid is a nucleic acid that is dysregulated or differentially expressed in an immune cell. In some embodiments, the nucleic acid is a nucleic acid that is dysregulated in a cellular signaling pathway. In some embodiments, the nucleic acid is a nucleic acid that is dysregulated or differentially expressed in a neurological cell.

Also disclosed herein is a method for enriching a nucleic acid in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, in some embodiments, a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of a binding moiety barcode, or a complement thereof; and a bait oligonucleotide of the plurality of bait oligonucleotides comprises: a capture domain that binds specifically to (i) all or a portion of the nucleic acid, or a complement thereof, and/or (ii) all or a portion of the binding moiety barcode, or a complement thereof, and a molecular tag; (b) capturing a complex of the bait oligonucleotide specifically bound to the nucleic acid or a complement thereof, or a complex of the bait oligonucleotide specifically bound to the analyte binding moiety barcode or a complement thereof, using a substrate comprising an agent that binds specifically to the molecular tag; and (c) isolating the complex of the bait oligonucleotide specifically bound to the nucleic acid or a complement thereof, or a complex of the bait oligonucleotide specifically bound to the analyte binding moiety barcode or a complement thereof, thereby enriching the nucleic acid in the biological sample.
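The selection logic of such an enrichment can be sketched in software as a toy model. The example below is not a description of the wet-lab procedure: it assumes that specific binding can be approximated by an exact reverse-complement match between a bait capture domain and a library molecule, which ignores mismatches, temperature, and the pull-down by the molecular tag. The sequences and the names `reverse_complement`, `enrich`, `library`, and `baits` are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def enrich(library, bait_capture_domains):
    """Keep library molecules containing a site complementary to any bait.

    Exact substring matching is a crude stand-in for hybridization.
    """
    targets = [reverse_complement(bait) for bait in bait_capture_domains]
    return [
        molecule
        for molecule in library
        if any(target in molecule["insert"] for target in targets)
    ]


# Hypothetical library: each molecule carries a spatial barcode and an insert.
library = [
    {"spatial_barcode": "AAACGT", "insert": "TTGACCGGATCA"},
    {"spatial_barcode": "CCGTTA", "insert": "GGGGCCCCAAAA"},
]
baits = ["TGATCCGGTC"]  # 10-nucleotide capture domain, complementary to the first insert

enriched = enrich(library, baits)
```

Because each retained molecule keeps its spatial barcode, the enriched subset can still be assigned to a location after sequencing.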

In some embodiments, the capture domain of the bait oligonucleotide binds specifically to all or a portion of the nucleic acid sequence from the biological sample. In some embodiments, the capture domain of the bait oligonucleotide binds specifically to all or a portion of the analyte binding moiety barcode. In some embodiments, the nucleic acid from the biological sample is associated with a disease or condition. In some embodiments, the capture domain of the bait oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides.

In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is streptavidin, avidin, or biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is positioned 5’ to the domain in the bait oligonucleotide. In some embodiments, the molecular tag is positioned 3’ to the domain in the bait oligonucleotide. In some embodiments, the agent that binds specifically to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that binds specifically to the molecular tag comprises a nucleic acid. In some embodiments, the agent that binds specifically to the molecular tag comprises a small molecule. In some embodiments, the agent that binds specifically to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the nucleic acid is DNA. In some embodiments, the nucleic acid further comprises a primer binding sequence or a complement thereof. In some embodiments, the nucleic acid further comprises a unique molecular sequence or a complement thereof. In some embodiments, the nucleic acid further comprises an additional primer binding sequence or a complement thereof.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample was previously stained using a detectable label. In some embodiments, the biological sample was previously stained. In some embodiments, the biological sample was previously stained using hematoxylin and eosin (H&E). In some embodiments, the biological sample was previously stained using immunofluorescence or immunohistochemistry. In some embodiments, the biological sample is a permeabilized biological sample that has been permeabilized with a permeabilization agent selected from an organic solvent, a cross-linking agent, a detergent, and an enzyme, or a combination thereof. In some embodiments, the permeabilization agent is selected from an organic solvent, a cross-linking agent, a detergent, and an enzyme, or a combination thereof.

In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule.

In some embodiments, the method further comprises generating the plurality of nucleic acids, and the generating comprises: (a) contacting a plurality of capture agents to the biological sample disposed on a substrate, in some embodiments, a capture agent of the plurality of the capture agents comprises (i) a binding moiety that binds specifically to the nucleic acid in the biological sample, (ii) a binding moiety barcode; and (iii) a capture sequence; and the substrate comprises a plurality of capture probes, in some embodiments, a capture probe of the plurality comprises a spatial barcode and a capture domain, in some embodiments, the capture domain binds specifically to the analyte capture sequence; and (b) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce the nucleic acid. In some embodiments, the amplifying is isothermal.

In some embodiments, the produced nucleic acid is released from the extended capture probe.

In some embodiments, the nucleic acid is a nucleic acid that is dysregulated or differentially expressed in a cancer cell. In some embodiments, the nucleic acid is a nucleic acid that is dysregulated or differentially expressed in an immune cell. In some embodiments, the nucleic acid is a nucleic acid that is dysregulated in a cellular signaling pathway. In some embodiments, the nucleic acid is a nucleic acid that is dysregulated or differentially expressed in a neurological cell.

Also disclosed herein is a method for identifying a location of a nucleic acid in a biological sample, the method comprising: (a) contacting a plurality of capture agents to the biological sample disposed on a substrate, in some embodiments, a capture agent of the plurality of the capture agents comprises (i) a binding moiety that binds specifically to the nucleic acid in the biological sample, (ii) a binding moiety barcode; and (iii) a capture sequence; and the substrate comprises a plurality of capture probes, in some embodiments, a capture probe of the plurality comprises a spatial barcode and a capture domain, in some embodiments, the capture domain binds specifically to the analyte capture sequence; and (b) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce the nucleic acid; in some embodiments, the nucleic acid comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of an analyte binding moiety barcode, or a complement thereof; (d) releasing the produced nucleic acid from the extended capture probe; (e) contacting a plurality of released nucleic acids from step (d) with a plurality of bait oligonucleotides, in some embodiments, a bait oligonucleotide of the plurality of bait oligonucleotides comprises: a capture domain that binds specifically to (i) all or a portion of the nucleic acid, or a complement thereof, and/or (ii) all or a portion of the analyte binding moiety barcode, or a complement thereof, and a molecular tag; (f) capturing a complex of the bait oligonucleotide specifically bound to the nucleic acid or a complement thereof, or a complex of the bait oligonucleotide specifically bound to the analyte binding moiety barcode or a complement thereof, using a substrate comprising an agent that binds specifically to the molecular tag; and (g) determining (i) all or a portion of the sequence of the nucleic acid or the complement thereof, and/or (ii) all or a portion of the sequence of the analyte binding moiety barcode or the complement thereof, and using the determined sequences of (i) and (ii) to identify the location of the nucleic acid in the biological sample.

Also disclosed herein is a composition comprising a bait oligonucleotide bound to a nucleic acid, in some embodiments, the nucleic acid comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of an analyte binding moiety barcode, or a complement thereof.

In some embodiments, the bait oligonucleotide is bound to the nucleic acid by a capture domain that binds specifically to (i) all or a portion of the nucleic acid, and/or (ii) all or a portion of the analyte binding moiety barcode, or a complement thereof.

In some embodiments, the composition further comprises a molecular tag. In some embodiments, the composition further comprises an agent that binds specifically to the molecular tag. In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is streptavidin, avidin, or biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is positioned 5’ to the domain in the bait oligonucleotide. In some embodiments, the molecular tag is positioned 3’ to the domain in the bait oligonucleotide. In some embodiments, the agent comprises streptavidin, avidin, or biotin. In some embodiments, the agent that binds specifically to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that binds specifically to the molecular tag comprises a nucleic acid. In some embodiments, the agent that binds specifically to the molecular tag comprises a small molecule.

In some embodiments, the composition further comprises a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

All publications, patents, patent applications, and information available on the internet and mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, patent application, or item of information was specifically and individually indicated to be incorporated by reference. To the extent publications, patents, patent applications, and items of information incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

Where values are described in terms of ranges, it should be understood that the description includes the disclosure of all possible sub-ranges within such ranges, as well as specific numerical values that fall within such ranges irrespective of whether a specific numerical value or specific sub-range is expressly stated.

The term “each,” when used in reference to a collection of items, is intended to identify an individual item in the collection but does not necessarily refer to every item in the collection, unless expressly stated otherwise, or unless the context of the usage clearly indicates otherwise.

As used herein, the term “analyte derivative” refers to a molecule (e.g., a nucleic acid or protein molecule) that is derived from an analyte (e.g., a nucleic acid), or conveys information (e.g., identity information) that corresponds to an analyte (e.g., a protein). In some embodiments, the analyte derivative comprises all or a portion of the analyte (e.g., a nucleic acid) described herein. In some embodiments, the analyte derivative comprises all or a portion of the analyte capture agent (e.g., all or a portion of the analyte capture moiety, the analyte binding moiety barcode, and/or the analyte capture sequence) described herein.

Various embodiments of the features of this disclosure are described herein. However, it should be understood that such embodiments are provided merely by way of example, and numerous variations, changes, and substitutions can occur to those skilled in the art without departing from the scope of this disclosure. It should also be understood that various alternatives to the specific embodiments described herein are also within the scope of this disclosure.

DESCRIPTION OF DRAWINGS

The following drawings illustrate certain embodiments of the features and advantages of this disclosure. These embodiments are not intended to limit the scope of the appended claims in any manner. Like reference symbols in the drawings indicate like elements.

FIG. 1 is a schematic diagram showing an example of a barcoded capture probe, as described herein.

FIG. 2 is a schematic illustrating a cleavable capture probe, wherein the cleaved capture probe can enter into a non-permeabilized cell and bind to target analytes within the sample.

FIG. 3 is a schematic diagram of an exemplary multiplexed spatially-barcoded feature.

FIG. 4 is a schematic diagram of an exemplary analyte capture agent.

FIG. 5 is a schematic diagram depicting an exemplary interaction between a feature-immobilized capture probe 524 and an analyte capture agent 526.

FIGs. 6A-6C are schematics illustrating how streptavidin cell tags can be utilized in an array-based system to produce spatially-barcoded cells or cellular contents.

FIG. 7 is a workflow schematic showing the targeted spatial gene expression.

FIGs. 8A-8B show the on-target, enrichment, and complexity values obtained with different spatial libraries. The spatial libraries included samples from heart, breast, breast cancer, or lymph. Either the hLlk_200.1 panel or the Immune panel was used.

FIG. 9 is an exemplary workflow showing targeted spatial gene expression.

FIG. 10 is an exemplary workflow for hybridization and capture-based enrichment of targeted nucleic acid sequences using a spatial gene expression library.

FIG. 11 is an exemplary graph showing the correlation between a mouse brain tissue spatial UMI count using a 65 gene targeted enrichment neuro panel (Y axis) as compared to a Visium control whole transcriptome analysis (X axis).

FIGs. 12A-12B are exemplary pictures demonstrating A) expression of the OLIG2 gene from a Visium whole transcriptome spatial array (control) and B) expression of the OLIG2 gene from a 65 gene targeted enrichment neuro panel. Pictures correlate to data of FIG. 11.

FIGs. 13A-13B are exemplary graphs showing A) mapped reads for whole transcriptome Visium versus four Visium targeted gene panels, and B) UMI count correlation between the Visium whole transcriptome control (X axis) as compared to the Visium targeted Pan Cancer gene panel (Y axis).

FIGs. 14A-14B are exemplary graphs demonstrating targeted metrics for 12 human tissue type spatial arrays using four targeted gene enrichment panels as compared to a Visium whole transcriptome spatial array for A) fraction of genes recovered against matched Visium whole transcriptome, and B) UMI R-squared concordance against matched Visium whole transcriptome.

FIGs. 15A-15C demonstrate human cerebral cortex clustering of six gene clusters for A) Visium whole transcriptome and B) neuroscience targeted gene enrichment panel. FIG. 15C demonstrates the UMI correlation between FIG. 15A and FIG. 15B.

FIGs. 16A-16D show spatial gene expression in a human breast cancer tissue section from the Pan-Cancer gene panel; A) pathologist’s annotations (control), B) whole transcriptome Visium data, C) 196 breast cancer genes from the Pan-Cancer enrichment panel and D) ERBB2 gene expression from the Pan-Cancer enrichment panel.

FIGs. 17A-17D provide details on the genetic targets that are encompassed in the Cancer Group in accordance with an embodiment of the present disclosure.

FIGs. 18A-18C provide details on the genetic targets that are encompassed in the Immunology Probe Group in accordance with an embodiment of the present disclosure.

FIGs. 19A-19D provide details on the genetic targets that are encompassed in the Pathway Group in accordance with an embodiment of the present disclosure.

FIGs. 20A-20D provide details on the genetic targets that are encompassed in the Neuro Group in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION

I. Introduction

Spatial analysis methodologies and compositions described herein can provide a vast amount of analyte and/or expression data for a variety of analytes within a biological sample at high spatial resolution, while retaining native spatial context. Spatial analysis methods and compositions can include, e.g., the use of a capture probe including a spatial barcode (e.g., a nucleic acid sequence that provides information as to the location or position of an analyte within a cell or a tissue sample (e.g., mammalian cell or a mammalian tissue sample)) and a capture domain that is capable of binding to an analyte (e.g., a protein and/or a nucleic acid) produced by and/or present in a cell. Spatial analysis methods and compositions can also include the use of a capture probe having a capture domain that captures an intermediate agent for indirect detection of an analyte. For example, the intermediate agent can include a nucleic acid sequence (e.g., a barcode) associated with the intermediate agent. Detection of the intermediate agent is therefore indicative of the analyte in the cell or tissue sample.

Non-limiting aspects of spatial analysis methodologies and compositions are described in U.S. Patent Nos. 10,774,374, 10,724,078, 10,480,022, 10,059,990, 10,041,949, 10,002,316, 9,879,313, 9,783,841, 9,727,810, 9,593,365, 8,951,726, 8,604,182, 7,709,198, U.S. Patent Application Publication Nos. 2020/239946, 2020/080136, 2020/0277663, 2020/024641, 2019/330617, 2019/264268, 2020/256867, 2020/224244, 2019/194709, 2019/161796, 2019/085383, 2019/055594, 2018/216161, 2018/051322, 2018/0245142, 2017/241911, 2017/089811, 2017/067096, 2017/029875, 2017/0016053, 2016/108458, 2015/000854, 2013/171621, WO 2018/091676, WO 2020/176788, Rodriques et al., Science 363(6434):1463-1467, 2019; Lee et al., Nat. Protoc. 10(3):442-458, 2015; Trejo et al., PLoS ONE 14(2):e0212031, 2019; Chen et al., Science 348(6233):aaa6090, 2015; Gao et al., BMC Biol. 15:50, 2017; and Gupta et al., Nature Biotechnol. 36:1197-1202, 2018; the Visium Spatial Gene Expression Reagent Kits User Guide (e.g., Rev C, dated June 2020), and/or the Visium Spatial Tissue Optimization Reagent Kits User Guide (e.g., Rev C, dated July 2020), both of which are available at the 10x Genomics Support Documentation website, and can be used herein in any combination. Further non-limiting aspects of spatial analysis methodologies and compositions are described herein.

Some general terminology that may be used in this disclosure can be found in Section (I)(b) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. Typically, a “barcode” is a label, or identifier, that conveys or is capable of conveying information (e.g., information about an analyte in a sample, a bead, and/or a capture probe). A barcode can be part of an analyte, or independent of an analyte. A barcode can be attached to an analyte. A particular barcode can be unique relative to other barcodes. For the purpose of this disclosure, an “analyte” can include any biological substance, structure, moiety, or component to be analyzed. The term “target” can similarly refer to an analyte of interest.

Analytes can be broadly classified into one of two groups: nucleic acid analytes, and non-nucleic acid analytes. Examples of non-nucleic acid analytes include, but are not limited to, lipids, carbohydrates, peptides, proteins, glycoproteins (N-linked or O-linked), lipoproteins, phosphoproteins, specific phosphorylated or acetylated variants of proteins, amidation variants of proteins, hydroxylation variants of proteins, methylation variants of proteins, ubiquitylation variants of proteins, sulfation variants of proteins, viral proteins (e.g., viral capsid, viral envelope, viral coat, viral accessory, viral glycoproteins, viral spike, etc.), extracellular and intracellular proteins, antibodies, and antigen binding fragments. In some embodiments, the analyte(s) can be localized to subcellular location(s), including, for example, organelles, e.g., mitochondria, Golgi apparatus, endoplasmic reticulum, chloroplasts, endocytic vesicles, exocytic vesicles, vacuoles, lysosomes, etc. In some embodiments, analyte(s) can be peptides or proteins, including without limitation antibodies and enzymes. Additional examples of analytes can be found in Section (I)(c) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. In some embodiments, an analyte can be detected indirectly, such as through detection of an intermediate agent, for example, a connected probe (e.g., a ligation product) or an analyte capture agent (e.g., an oligonucleotide-conjugated antibody), such as those described herein.

A “biological sample” is typically obtained from the subject for analysis using any of a variety of techniques including, but not limited to, biopsy, surgery, and laser capture microscopy (LCM), and generally includes cells and/or other biological material from the subject. In some embodiments, a biological sample can be a tissue section. In some embodiments, a biological sample can be a fixed and/or stained biological sample (e.g., a fixed and/or stained tissue section). Non-limiting examples of stains include histological stains (e.g., hematoxylin and/or eosin) and immunological stains (e.g., fluorescent stains). In some embodiments, a biological sample (e.g., a fixed and/or stained biological sample) can be imaged. Biological samples are also described in Section (I)(d) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663.

In some embodiments, a biological sample is permeabilized with one or more permeabilization reagents. For example, permeabilization of a biological sample can facilitate analyte capture. Exemplary permeabilization agents and conditions are described in Section (I)(d)(ii)(13) or the Exemplary Embodiments Section of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663.

Array-based spatial analysis methods involve the transfer of one or more analytes from a biological sample to an array of features on a substrate, where each feature is associated with a unique spatial location on the array. Subsequent analysis of the transferred analytes includes determining the identity of the analytes and the spatial location of the analytes within the biological sample. The spatial location of an analyte within the biological sample is determined based on the feature to which the analyte is bound (e.g., directly or indirectly) on the array, and the feature’s relative spatial location within the array.

A “capture probe” refers to any molecule capable of capturing (directly or indirectly) and/or labelling an analyte (e.g., an analyte of interest) in a biological sample. In some embodiments, the capture probe is a nucleic acid or a polypeptide. In some embodiments, the capture probe includes a barcode (e.g., a spatial barcode and/or a unique molecular identifier (UMI)) and a capture domain. In some embodiments, a capture probe can include a cleavage domain and/or a functional domain (e.g., a primer-binding site, such as for next-generation sequencing (NGS)).

FIG. 1 is a schematic diagram showing an exemplary capture probe, as described herein. As shown, the capture probe 102 is optionally coupled to a feature 101 by a cleavage domain 103, such as a disulfide linker. The capture probe can include a functional sequence 104 that is useful for subsequent processing. The functional sequence 104 can include all or a part of a sequencer-specific flow cell attachment sequence (e.g., a P5 or P7 sequence), all or a part of a sequencing primer sequence (e.g., an R1 primer binding site, an R2 primer binding site), or combinations thereof. The capture probe can also include a spatial barcode 105. The capture probe can also include a unique molecular identifier (UMI) sequence 106. While FIG. 1 shows the spatial barcode 105 as being located upstream (5’) of UMI sequence 106, it is to be understood that capture probes wherein UMI sequence 106 is located upstream (5’) of the spatial barcode 105 are also suitable for use in any of the methods described herein. The capture probe can also include a capture domain 107 to facilitate capture of a target analyte.

In some embodiments, the capture probe comprises one or more additional functional sequences that can be located, for example between the spatial barcode 105 and the UMI sequence 106, between the UMI sequence 106 and the capture domain 107, or following the capture domain 107. The capture domain can have a sequence complementary to a sequence of a nucleic acid analyte. The capture domain can have a sequence complementary to a connected probe described herein. The capture domain can have a sequence complementary to a capture handle sequence present in an analyte capture agent. The capture domain can have a sequence complementary to a splint oligonucleotide. Such splint oligonucleotide, in addition to having a sequence complementary to a capture domain of a capture probe, can have a sequence of a nucleic acid analyte, a sequence complementary to a portion of a connected probe described herein, and/or a capture handle sequence described herein.

The functional sequences can generally be selected for compatibility with any of a variety of different sequencing systems, e.g., Ion Torrent Proton or PGM, Illumina sequencing instruments, PacBio, Oxford Nanopore, etc., and the requirements thereof. In some embodiments, functional sequences can be selected for compatibility with non-commercialized sequencing systems. Examples of such sequencing systems and techniques, for which suitable functional sequences can be used, include (but are not limited to) Ion Torrent Proton or PGM sequencing, Illumina sequencing, PacBio SMRT sequencing, and Oxford Nanopore sequencing. Further, in some embodiments, functional sequences can be selected for compatibility with other sequencing systems, including non-commercialized sequencing systems.

In some embodiments, the spatial barcode 105 and functional sequences 104 are common to all of the probes attached to a given feature. In some embodiments, the UMI sequence 106 of a capture probe attached to a given feature is different from the UMI sequence of a different capture probe attached to the given feature.
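
For illustration only, the arrangement of FIG. 1 can be modeled as a simple record. The following Python sketch is not part of the disclosed methods: the class name, field names, UMI length, and example sequences are hypothetical, and the 5’ to 3’ order (functional sequence, spatial barcode, UMI, capture domain) follows FIG. 1, although the reverse order of spatial barcode and UMI is equally contemplated above.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class CaptureProbe:
    """Hypothetical model of the capture probe of FIG. 1, written 5' to 3'."""
    functional_sequence: str  # element 104, e.g., a sequencing primer site
    spatial_barcode: str      # element 105, shared by all probes on a feature
    umi: str                  # element 106, differs between probes on a feature
    capture_domain: str       # element 107, e.g., poly(T) for mRNA capture

    def sequence(self) -> str:
        """Concatenate the domains in the order shown in FIG. 1."""
        return (self.functional_sequence + self.spatial_barcode
                + self.umi + self.capture_domain)


def make_feature_probes(functional_sequence, spatial_barcode, capture_domain,
                        n_probes, umi_length=12, seed=0):
    """Build probes for one feature: common barcode, distinct UMIs."""
    rng = random.Random(seed)
    umis = set()
    while len(umis) < n_probes:  # draw random UMIs until n_probes are distinct
        umis.add("".join(rng.choice("ACGT") for _ in range(umi_length)))
    return [CaptureProbe(functional_sequence, spatial_barcode, umi, capture_domain)
            for umi in sorted(umis)]


# Placeholder sequences chosen only for the example.
probes = make_feature_probes("ACGTACGT", "AAACCCGGGTTTACGT", "T" * 30, n_probes=5)
```

In this sketch every probe of the feature carries the same spatial barcode, while the UMI distinguishes individual probes, which mirrors the embodiment described in the preceding paragraph.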

FIG. 2 is a schematic illustrating a cleavable capture probe, wherein the cleaved capture probe can enter into a non-permeabilized cell and bind to analytes within the sample. The capture probe 201 contains a cleavage domain 202, a cell penetrating peptide 203, a reporter molecule 204, and a disulfide bond (-S-S-). 205 represents all other parts of a capture probe, for example a spatial barcode and a capture domain.

FIG. 3 is a schematic diagram of an exemplary multiplexed spatially-barcoded feature. In FIG. 3, the feature 301 can be coupled to spatially-barcoded capture probes, wherein the spatially-barcoded probes of a particular feature can possess the same spatial barcode, but have different capture domains designed to associate the spatial barcode of the feature with more than one target analyte. For example, a feature may be coupled to four different types of spatially-barcoded capture probes, each type of spatially-barcoded capture probe possessing the spatial barcode 302. One type of capture probe associated with the feature includes the spatial barcode 302 in combination with a poly(T) capture domain 303, designed to capture mRNA target analytes. A second type of capture probe associated with the feature includes the spatial barcode 302 in combination with a random N-mer capture domain 304 for gDNA analysis. A third type of capture probe associated with the feature includes the spatial barcode 302 in combination with a capture domain complementary to a capture handle sequence of an analyte capture agent of interest 305. A fourth type of capture probe associated with the feature includes the spatial barcode 302 in combination with a capture domain that can specifically bind a nucleic acid molecule 306 that can function in a CRISPR assay (e.g., CRISPR/Cas9). While only four different capture probe-barcoded constructs are shown in FIG. 3, capture-probe barcoded constructs can be tailored for analyses of any given analyte associated with a nucleic acid and capable of binding with such a construct.
For example, the schemes shown in FIG. 3 can also be used for concurrent analysis of other analytes disclosed herein, including, but not limited to: (a) mRNA, a lineage tracing construct, cell surface or intracellular proteins and metabolites, and gDNA; (b) mRNA, accessible chromatin (e.g., ATAC-seq, DNase-seq, and/or MNase-seq), cell surface or intracellular proteins and metabolites, and a perturbation agent (e.g., a CRISPR crRNA/sgRNA, TALEN, zinc finger nuclease, and/or antisense oligonucleotide as described herein); (c) mRNA, cell surface or intracellular proteins and/or metabolites, a barcoded labelling agent (e.g., the MHC multimers described herein), and a V(D)J sequence of an immune cell receptor (e.g., T-cell receptor). In some embodiments, a perturbation agent can be a small molecule, an antibody, a drug, an aptamer, a miRNA, a physical environmental (e.g., temperature change), or any other known perturbation agents. See, e.g., Section (II)(b) (e.g., subsections (i)-(vi)) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. Generation of capture probes can be achieved by any appropriate method, including those described in Section (II)(d)(ii) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663.

In some embodiments, more than one analyte type (e.g., nucleic acids and proteins) from a biological sample can be detected (e.g., simultaneously or sequentially) using any appropriate multiplexing technique, such as those described in Section (IV) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663.

In some embodiments, detection of one or more analytes (e.g., protein analytes) can be performed using one or more analyte capture agents. As used herein, an “analyte capture agent” refers to an agent that interacts with an analyte (e.g., an analyte in a biological sample) and with a capture probe (e.g., a capture probe attached to a substrate or a feature) to identify the analyte. In some embodiments, the analyte capture agent includes: (i) an analyte binding moiety (e.g., that binds to an analyte), for example, an antibody or antigen-binding fragment thereof; (ii) analyte binding moiety barcode; and (iii) a capture handle sequence (e.g., an analyte capture sequence). As used herein, the term “analyte binding moiety barcode” refers to a barcode that is associated with or otherwise identifies the analyte binding moiety. As used herein, the term “analyte capture sequence” or “capture handle sequence” refers to a region or moiety configured to hybridize to, bind to, couple to, or otherwise interact with a capture domain of a capture probe. In some embodiments, a capture handle sequence is complementary to a capture domain of a capture probe. In some cases, an analyte binding moiety barcode (or portion thereof) may be able to be removed (e.g., cleaved) from the analyte capture agent.

FIG. 4 is a schematic diagram of an exemplary analyte capture agent 402 comprised of an analyte-binding moiety 404 and an analyte binding moiety barcode domain 408. The exemplary analyte-binding moiety 404 is a molecule capable of binding to an analyte 406 and the analyte capture agent is capable of interacting with a spatially-barcoded capture probe. The analyte-binding moiety can bind to the analyte 406 with high affinity and/or with high specificity. The analyte capture agent can include an analyte binding moiety barcode domain 408, a nucleotide sequence (e.g., an oligonucleotide), which can hybridize to at least a portion or an entirety of a capture domain of a capture probe. The analyte binding moiety barcode domain 408 can comprise an analyte binding moiety barcode and a capture handle sequence described herein. The analyte-binding moiety 404 can include a polypeptide and/or an aptamer. The analyte-binding moiety 404 can include an antibody or antibody fragment (e.g., an antigen-binding fragment).

FIG. 5 is a schematic diagram depicting an exemplary interaction between a feature-immobilized capture probe 524 and an analyte capture agent 526. The feature-immobilized capture probe 524 can include a spatial barcode 508 as well as functional sequences 506 and UMI 510, as described elsewhere herein. The capture probe can also include a capture domain 512 that is capable of binding to an analyte capture agent 526. The analyte capture agent 526 can include a functional sequence 518, analyte binding moiety barcode 516, and a capture handle sequence 514 that is capable of binding to the capture domain 512 of the capture probe 524. The analyte capture agent can also include a linker 520 that allows the capture agent barcode domain 516 to couple to the analyte binding moiety 522.

FIGs. 6A, 6B, and 6C are schematics illustrating how streptavidin cell tags can be utilized in an array-based system to produce a spatially-barcoded cell or cellular contents. For example, as shown in FIG. 6A, peptide-bound major histocompatibility complex (MHC) can be individually associated with biotin (β2m) and bound to a streptavidin moiety such that the streptavidin moiety comprises multiple pMHC moieties. Each of these moieties can bind to a TCR such that the streptavidin binds to a target T-cell via multiple MHC/TCR binding interactions. Multiple interactions synergize and can substantially improve binding affinity. Such improved affinity can improve labelling of T-cells and also reduce the likelihood that labels will dissociate from T-cell surfaces. As shown in FIG. 6B, a capture agent barcode domain 601 can be modified with streptavidin 602 and contacted with multiple molecules of biotinylated MHC 603 such that the biotinylated MHC 603 molecules are coupled with the streptavidin conjugated capture agent barcode domain 601. The result is a barcoded MHC multimer complex 605. As shown in FIG. 6B, the capture agent barcode domain sequence 601 can identify the MHC as its associated label and also includes optional functional sequences such as sequences for hybridization with other oligonucleotides. As shown in FIG. 6C, one example oligonucleotide is capture probe 606 that comprises a complementary sequence (e.g., rGrGrG corresponding to C C C), a barcode sequence and other functional sequences, such as, for example, a UMI, an adapter sequence (e.g., comprising a sequencing primer sequence (e.g., R1 or a partial R1 (“pR1”), R2), a flow cell attachment sequence (e.g., P5 or P7 or partial sequences thereof)), etc. In some cases, capture probe 606 may at first be associated with a feature (e.g., a gel bead) and released from the feature.
In other embodiments, capture probe 606 can hybridize with a capture agent barcode domain 601 of the MHC-oligonucleotide complex 605. The hybridized oligonucleotides (Spacer C C C and Spacer rGrGrG) can then be extended in primer extension reactions such that constructs comprising sequences that correspond to each of the two spatial barcode sequences (the spatial barcode associated with the capture probe, and the barcode associated with the MHC-oligonucleotide complex) are generated. In some cases, one or both of these corresponding sequences may be a complement of the original sequence in capture probe 606 or capture agent barcode domain 601. In other embodiments, the capture probe and the capture agent barcode domain are ligated together. The resulting constructs can be optionally further processed (e.g., to add any additional sequences and/or for clean-up) and subjected to sequencing. As described elsewhere herein, a sequence derived from the capture probe 606 spatial barcode sequence may be used to identify a feature and the sequence derived from spatial barcode sequence on the capture agent barcode domain 601 may be used to identify the particular peptide MHC complex 604 bound on the surface of the cell (e.g., when using MHC-peptide libraries for screening immune cells or immune cell populations).

Additional description of analyte capture agents can be found in Section (II)(b)(ix) of WO 2020/176788 and/or Section (II)(b)(viii) of U.S. Patent Application Publication No. 2020/0277663.

There are at least two methods to associate a spatial barcode with one or more neighboring cells, such that the spatial barcode identifies the one or more cells, and/or contents of the one or more cells, as associated with a particular spatial location. One method is to promote analytes or analyte proxies (e.g., intermediate agents) out of a cell and towards a spatially-barcoded array (e.g., including spatially-barcoded capture probes). Another method is to cleave spatially-barcoded capture probes from an array and promote the spatially-barcoded capture probes towards and/or into or onto the biological sample.

In some cases, capture probes may be configured to prime, replicate, and consequently yield optionally barcoded extension products from a template (e.g., a DNA or RNA template, such as an analyte or an intermediate agent (e.g., a connected probe (e.g., a ligation product) or an analyte capture agent), or a portion thereof), or derivatives thereof (see, e.g., Section (II)(b)(vii) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663 regarding extended capture probes). In some cases, capture probes may be configured to form a connected probe (e.g., a ligation product) with a template (e.g., a DNA or RNA template, such as an analyte or an intermediate agent, or portion thereof), thereby creating ligation products that serve as proxies for a template.

As used herein, an “extended capture probe” refers to a capture probe having additional nucleotides added to the terminus (e.g., 3’ or 5’ end) of the capture probe thereby extending the overall length of the capture probe. For example, an “extended 3’ end” indicates additional nucleotides were added to the most 3’ nucleotide of the capture probe to extend the length of the capture probe, for example, by polymerization reactions used to extend nucleic acid molecules including templated polymerization catalyzed by a polymerase (e.g., a DNA polymerase or a reverse transcriptase). In some embodiments, extending the capture probe includes adding to a 3’ end of a capture probe a nucleic acid sequence that is complementary to a nucleic acid sequence of an analyte or intermediate agent bound to the capture domain of the capture probe. In some embodiments, the capture probe is extended using reverse transcription. In some embodiments, the capture probe is extended using one or more DNA polymerases. The extended capture probes include the sequence of the capture probe and the sequence of the spatial barcode of the capture probe.
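
By way of a non-limiting illustration, extension of a 3’ end by reverse transcription can be sketched in a few lines of Python. The sketch assumes a poly(T) capture domain hybridized to the 3’ poly(A) tail of an mRNA and full-length extension to the 5’ end of the template; the function names and the example sequences are hypothetical and are not taken from this disclosure.

```python
# Map each RNA base to its complementary DNA base.
_RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA (5'->3') complementary to an RNA written 5'->3'."""
    return rna.translate(_RNA_TO_CDNA)[::-1]


def extend_capture_probe(probe: str, mrna: str, capture_len: int) -> str:
    """Extend the 3' end of a capture probe along a hybridized mRNA.

    The last `capture_len` bases of `probe` are treated as the capture
    domain and must be complementary to the 3' end of `mrna`.
    """
    capture_domain = probe[-capture_len:]
    if reverse_transcribe(mrna[-capture_len:]) != capture_domain:
        raise ValueError("capture domain is not complementary to the mRNA 3' end")
    # Nucleotides are added to the 3' end, copying the rest of the template.
    return probe + reverse_transcribe(mrna[:-capture_len])


probe = "GATTACA" + "T" * 8     # placeholder barcode/UMI + poly(T) capture domain
mrna = "AUGGCC" + "A" * 8       # placeholder transcript with a poly(A) tail
extended = extend_capture_probe(probe, mrna, capture_len=8)
```

The extended probe retains the original probe sequence, including its spatial barcode, at the 5’ end and gains the complement of the captured template at the 3’ end.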

In some embodiments, extended capture probes are amplified (e.g., in bulk solution or on the array) to yield quantities that are sufficient for downstream analysis, e.g., via DNA sequencing. In some embodiments, extended capture probes (e.g., DNA molecules) act as templates for an amplification reaction (e.g., a polymerase chain reaction).

Additional variants of spatial analysis methods, including in some embodiments, an imaging step, are described in Section (II)(a) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. Analysis of captured analytes (and/or intermediate agents or portions thereof), for example, including sample removal, extension of capture probes, sequencing (e.g., of a cleaved extended capture probe and/or a cDNA molecule complementary to an extended capture probe), sequencing on the array (e.g., using, for example, in situ hybridization or in situ ligation approaches), temporal analysis, and/or proximity capture, is described in Section (II)(g) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. Some quality control measures are described in Section (II)(h) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663.

Spatial information can provide information of biological and/or medical importance. For example, the methods and compositions described herein can allow for: identification of one or more biomarkers (e.g., diagnostic, prognostic, and/or for determination of efficacy of a treatment) of a disease or disorder; identification of a candidate drug target for treatment of a disease or disorder; identification (e.g., diagnosis) of a subject as having a disease or disorder; identification of stage and/or prognosis of a disease or disorder in a subject; identification of a subject as having an increased likelihood of developing a disease or disorder; monitoring of progression of a disease or disorder in a subject; determination of efficacy of a treatment of a disease or disorder in a subject; identification of a patient subpopulation for which a treatment is effective for a disease or disorder; modification of a treatment of a subject with a disease or disorder; selection of a subject for participation in a clinical trial; and/or selection of a treatment for a subject with a disease or disorder.

Spatial information can provide information of biological importance. For example, the methods and compositions described herein can allow for: identification of transcriptome and/or proteome expression profiles (e.g., in healthy and/or diseased tissue); identification of multiple analyte types in close proximity (e.g., nearest neighbor analysis); determination of up- and/or down-regulated genes and/or proteins in diseased tissue; characterization of tumor microenvironments; characterization of tumor immune responses; characterization of cell types and their co-localization in tissue; and identification of genetic variants within tissues (e.g., based on gene and/or protein expression profiles associated with specific disease or disorder biomarkers).

Typically, for spatial array-based methods, a substrate functions as a support for direct or indirect attachment of capture probes to features of the array. A “feature” is an entity that acts as a support or repository for various molecular entities used in spatial analysis. In some embodiments, some or all of the features in an array are functionalized for analyte capture. Exemplary substrates are described in Section (II)(c) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. Exemplary features and geometric attributes of an array can be found in Sections (II)(d)(i), (II)(d)(iii), and (II)(d)(iv) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. Generally, analytes and/or intermediate agents (or portions thereof) can be captured when contacting a biological sample with a substrate including capture probes (e.g., a substrate with capture probes embedded, spotted, printed, fabricated on the substrate, or a substrate with features (e.g., beads, wells) comprising capture probes). As used herein, “contact,” “contacted,” and/or “contacting,” a biological sample with a substrate refers to any contact (e.g., direct or indirect) such that capture probes can interact (e.g., bind covalently or non-covalently (e.g., hybridize)) with analytes from the biological sample. Capture can be achieved actively (e.g., using electrophoresis) or passively (e.g., using diffusion). Analyte capture is further described in Section (II)(e) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663.

In some cases, spatial analysis can be performed by attaching and/or introducing a molecule (e.g., a peptide, a lipid, or a nucleic acid molecule) having a barcode (e.g., a spatial barcode) to a biological sample (e.g., to a cell in a biological sample). In some embodiments, a plurality of molecules (e.g., a plurality of nucleic acid molecules) having a plurality of barcodes (e.g., a plurality of spatial barcodes) are introduced to a biological sample (e.g., to a plurality of cells in a biological sample) for use in spatial analysis. In some embodiments, after attaching and/or introducing a molecule having a barcode to a biological sample, the biological sample can be physically separated (e.g., dissociated) into single cells or cell groups for analysis. Some such methods of spatial analysis are described in Section (III) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663.

In some cases, spatial analysis can be performed by detecting multiple oligonucleotides that hybridize to an analyte. In some instances, for example, spatial analysis can be performed using RNA-templated ligation (RTL). Methods of RTL have been described previously. See, e.g., Credle et al., Nucleic Acids Res. 2017 Aug 21;45(14):e128. Typically, RTL includes hybridization of two oligonucleotides to adjacent sequences on an analyte (e.g., an RNA molecule, such as an mRNA molecule). In some instances, the oligonucleotides are DNA molecules. In some instances, one of the oligonucleotides includes at least two ribonucleic acid bases at the 3’ end and/or the other oligonucleotide includes a phosphorylated nucleotide at the 5’ end. In some instances, one of the two oligonucleotides includes a capture domain (e.g., a poly(A) sequence, a non-homopolymeric sequence). After hybridization to the analyte, a ligase (e.g., SplintR ligase) ligates the two oligonucleotides together, creating a connected probe (e.g., a ligation product). In some instances, the two oligonucleotides hybridize to sequences that are not adjacent to one another. For example, hybridization of the two oligonucleotides creates a gap between the hybridized oligonucleotides. In some instances, a polymerase (e.g., a DNA polymerase) can extend one of the oligonucleotides prior to ligation. After ligation, the connected probe (e.g., a ligation product) is released from the analyte. In some instances, the connected probe (e.g., a ligation product) is released using an endonuclease (e.g., RNase H). The released connected probe (e.g., a ligation product) can then be captured by capture probes (e.g., instead of direct capture of an analyte) on an array, optionally amplified, and sequenced, thus determining the location and optionally the abundance of the analyte in the biological sample.
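
The adjacency requirement of RTL can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: hybridization is treated as exact complementarity, the gap-filling variant is not modeled, and the function names, argument names, and sequences are hypothetical.

```python
# Map each DNA base to the RNA base it pairs with.
_DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def target_site(dna_probe: str) -> str:
    """RNA sequence (5'->3') to which a DNA probe (5'->3') hybridizes."""
    return dna_probe.translate(_DNA_TO_RNA)[::-1]


def templated_ligation(rna, probe_3p_end, probe_5p_phosphate, capture_domain=""):
    """Return the connected probe, or None if the two probes do not
    hybridize to directly adjacent sites on the RNA.

    probe_3p_end:       probe whose 3' end lies at the ligation junction
    probe_5p_phosphate: probe whose phosphorylated 5' end lies at the junction
    capture_domain:     optional 3' tail (e.g., poly(A)) on the second probe
    """
    # On the RNA, the site of the 5'-phosphorylated probe lies immediately
    # 5' of the site of the other probe when the two are adjacent.
    joined_site = target_site(probe_5p_phosphate) + target_site(probe_3p_end)
    if joined_site not in rna:
        return None
    return probe_3p_end + probe_5p_phosphate + capture_domain


rna = "GGAUCC" + "GAUUACA" + "GGCUAAG" + "C"          # placeholder transcript
gapped_rna = "GGAUCC" + "GAUUACA" + "U" + "GGCUAAG" + "C"

connected = templated_ligation(rna, "CTTAGCC", "TGTAATC", capture_domain="A" * 10)
not_ligated = templated_ligation(gapped_rna, "CTTAGCC", "TGTAATC")
```

In this model the connected probe is produced only for the first transcript; the one-nucleotide gap in the second transcript prevents ligation unless a polymerase first extends one of the oligonucleotides, as noted above.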

During analysis of spatial information, sequence information for a spatial barcode associated with an analyte is obtained, and the sequence information can be used to provide information about the spatial distribution of the analyte in the biological sample. Various methods can be used to obtain the spatial information. In some embodiments, specific capture probes and the analytes they capture are associated with specific locations in an array of features on a substrate. For example, specific spatial barcodes can be associated with specific array locations prior to array fabrication, and the sequences of the spatial barcodes can be stored (e.g., in a database) along with specific array location information, so that each spatial barcode uniquely maps to a particular array location.

Alternatively, specific spatial barcodes can be deposited at predetermined locations in an array of features during fabrication such that at each location, only one type of spatial barcode is present so that spatial barcodes are uniquely associated with a single feature of the array. Where necessary, the arrays can be decoded using any of the methods described herein so that spatial barcodes are uniquely associated with array feature locations, and this mapping can be stored as described above.

When sequence information is obtained for capture probes and/or analytes during analysis of spatial information, the locations of the capture probes and/or analytes can be determined by referring to the stored information that uniquely associates each spatial barcode with an array feature location. In this manner, specific capture probes and captured analytes are associated with specific locations in the array of features. Each array feature location represents a position relative to a coordinate reference point (e.g., an array location, a fiducial marker) for the array. Accordingly, each feature location has an “address” or location in the coordinate space of the array.
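
By way of non-limiting illustration, the lookup from spatial barcode to array feature location can be sketched in Python as follows. This is a minimal sketch, not the disclosed implementation: the barcode sequences, the (row, column) coordinates, and the names `BARCODE_TO_FEATURE` and `locate_reads` are hypothetical.

```python
from collections import Counter

# Hypothetical stored mapping: each spatial barcode maps to exactly one
# array feature location, expressed as (row, column) relative to a
# coordinate reference point of the array.
BARCODE_TO_FEATURE = {
    "AAACGT": (0, 0),
    "CCGTTA": (0, 1),
    "GGTACA": (1, 0),
}


def locate_reads(reads):
    """Count analyte reads per array feature.

    `reads` is an iterable of (spatial_barcode, analyte_name) pairs.
    Reads whose barcode is not in the stored mapping are skipped.
    """
    counts = Counter()
    for barcode, analyte in reads:
        location = BARCODE_TO_FEATURE.get(barcode)
        if location is not None:
            counts[(location, analyte)] += 1
    return counts


example_reads = [
    ("AAACGT", "GENE_A"),
    ("AAACGT", "GENE_A"),
    ("GGTACA", "GENE_B"),
    ("TTTTTT", "GENE_A"),  # unknown barcode, ignored
]
feature_counts = locate_reads(example_reads)
```

Each key of `feature_counts` pairs a feature "address" with an analyte, so the same structure yields both location and abundance.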

Some exemplary spatial analysis workflows are described in the Exemplary Embodiments section of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. See, for example, the Exemplary embodiment starting with “In some non-limiting examples of the workflows described herein, the sample can be immersed... ” of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663. See also, e.g., the Visium Spatial Gene Expression Reagent Kits User Guide (e.g., Rev C, dated June 2020), the Targeted Gene Expression-Spatial User Guide (e.g., Rev A, dated October 2020), and/or the Visium Spatial Tissue Optimization Reagent Kits User Guide (e.g., Rev C, dated July 2020).

In some embodiments, spatial analysis can be performed using dedicated hardware and/or software, such as any of the systems described in Sections (II)(e)(ii) and/or (V) of WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663, or any of one or more of the devices or methods described in Sections Control Slide for Imaging, Methods of Using Control Slides and Substrates for, Systems of Using Control Slides and Substrates for Imaging, and/or Sample and Array Alignment Devices and Methods, Informational labels of WO 2020/123320.

Suitable systems for performing spatial analysis can include components such as a chamber (e.g., a flow cell or sealable, fluid-tight chamber) for containing a biological sample. The biological sample can be mounted for example, in a biological sample holder. One or more fluid chambers can be connected to the chamber and/or the sample holder via fluid conduits, and fluids can be delivered into the chamber and/or sample holder via fluidic pumps, vacuum sources, or other devices coupled to the fluid conduits that create a pressure gradient to drive fluid flow. One or more valves can also be connected to fluid conduits to regulate the flow of reagents from reservoirs to the chamber and/or sample holder.

The systems can optionally include a control unit that includes one or more electronic processors, an input interface, an output interface (such as a display), and a storage unit (e.g., a solid state storage medium such as, but not limited to, a magnetic, optical, or other solid state, persistent, writeable and/or re-writeable storage medium). The control unit can optionally be connected to one or more remote devices via a network. The control unit (and components thereof) can generally perform any of the steps and functions described herein. Where the system is connected to a remote device, the remote device (or devices) can perform any of the steps or features described herein. The systems can optionally include one or more detectors (e.g., CCD, CMOS) used to capture images. The systems can also optionally include one or more light sources (e.g., LED-based, diode-based, lasers) for illuminating a sample, a substrate with features, analytes from a biological sample captured on a substrate, and various control and calibration media.

The systems can optionally include software instructions encoded and/or implemented in one or more of tangible storage media and hardware components such as application specific integrated circuits. The software instructions, when executed by a control unit (and in particular, an electronic processor) or an integrated circuit, can cause the control unit, integrated circuit, or other component executing the software instructions to perform any of the method steps or functions described herein.

In some cases, the systems described herein can detect (e.g., register an image) the biological sample on the array. Exemplary methods to detect the biological sample on an array are described in PCT Application No. 2020/061064 and/or U.S. Patent Application Serial No. 16/951,854.

Prior to transferring analytes from the biological sample to the array of features on the substrate, the biological sample can be aligned with the array. Alignment of a biological sample and an array of features including capture probes can facilitate spatial analysis, which can be used to detect differences in analyte presence and/or level within different positions in the biological sample, for example, to generate a three-dimensional map of the analyte presence and/or level. Exemplary methods to generate a two- and/or three-dimensional map of the analyte presence and/or level are described in PCT Application No. 2020/053655 and spatial analysis methods are generally described in WO 2020/061108 and/or U.S. Patent Application Serial No. 16/951,864.

In some cases, a map of analyte presence and/or level can be aligned to an image of a biological sample using one or more fiducial markers, e.g., objects placed in the field of view of an imaging system which appear in the image produced, as described in the Substrate Attributes Section, Control Slide for Imaging Section of WO 2020/123320, PCT Application No. 2020/061066, and/or U.S. Patent Application Serial No. 16/951,843. Fiducial markers can be used as a point of reference or measurement scale for alignment (e.g., to align a sample and an array, to align two substrates, to determine a location of a sample or array on a substrate relative to a fiducial marker) and/or for quantitative measurements of sizes and/or distances.
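
By way of non-limiting illustration, using fiducial markers as points of reference for alignment can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a method described in the cited applications: it assumes exactly two matched fiducials and a similarity transform (rotation, uniform scaling, and translation), and the function names and coordinates are hypothetical.

```python
def fit_similarity(array_pts, image_pts):
    """Fit the map z -> a*z + b from two matched fiducial positions.

    Points are (x, y) tuples. Treating them as complex numbers lets `a`
    encode rotation and uniform scale, and `b` encode translation.
    """
    z0, z1 = (complex(*p) for p in array_pts)
    w0, w1 = (complex(*p) for p in image_pts)
    a = (w1 - w0) / (z1 - z0)
    b = w0 - a * z0
    return a, b


def to_image(point, a, b):
    """Map an array coordinate into image coordinates."""
    w = a * complex(*point) + b
    return (w.real, w.imag)


# Hypothetical fiducials: positions in array coordinates and the positions
# at which the same fiducials appear in the image (rotated 90 degrees and
# scaled 2x relative to the array, then shifted).
array_fiducials = [(0.0, 0.0), (10.0, 0.0)]
image_fiducials = [(100.0, 50.0), (100.0, 70.0)]
a, b = fit_similarity(array_fiducials, image_fiducials)
feature_in_image = to_image((5.0, 5.0), a, b)
```

With more than two fiducials, or where shear or non-uniform scaling is expected, a least-squares affine fit would be a more appropriate choice than this two-point solution.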

The sandwich process is described in PCT Patent Application Publication No. WO 2020/123320, which is incorporated by reference in its entirety.

II. Targeted spatial gene expression profiling by hybridization and capture of spatial cDNA

(a) Introduction

Spatial analysis methods using capture probes and/or analyte capture agents provide information regarding the abundance and location of an analyte (e.g., a nucleic acid or protein). Traditionally, the sequencing result contains unwanted genes, ribosomal or mitochondrial transcripts, and other reads that are not of interest. Detection of the analytes of interest depends, at least in part, on the sequencing capacity available for all captured analytes on the array. Disclosed herein are methods and compositions for capturing target analytes of interest using bait oligonucleotides that are specific to the target analytes of interest. In this way, the sequencing result contains a higher percentage of reads from the target analytes of interest, which can improve the spatial resolution and reduce the sequencing cost for the target analytes of interest.

Disclosed herein are methods of capturing target analytes of interest using bait oligonucleotides that are specific to the target analytes of interest. As disclosed herein, bait oligonucleotides are short (40 bp to 160 bp) oligonucleotides that hybridize to transcribed (e.g., mRNA) sequences in order to detect the mRNA, and the expression thereof. It has been identified that bait oligonucleotides can hybridize to the 5’ end of a transcript, the 3’ end of the transcript, or any intervening sequence of the transcript. In particular, designing probes to hybridize to an intervening sequence of a transcript (i.e., not the 5’ end and not the 3’ end) has certain advantages. For example, many transcripts in the human genome have varying sequences at the 5’ and 3’ end. Thus, designing baits that would target intervening transcript sequences, and particularly conserved sequences, could allow a single bait to hybridize to multiple isoforms of the same gene. Thus, disclosed herein are compositions and methods that comprise bait oligonucleotides that hybridize to multiple isoforms of the same gene.
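
By way of non-limiting illustration, identifying an intervening region shared by multiple isoforms of the same gene can be sketched in Python as follows. This is a minimal sketch, not the disclosed bait design procedure: the function name `shared_windows`, the toy sequences, and the exact-match criterion are hypothetical, and strand orientation, mismatches, and melting temperature are not considered.

```python
def shared_windows(isoforms, bait_length):
    """Return windows of length `bait_length` present in every isoform.

    `isoforms` is a list of transcript sequences (strings) for one gene.
    Windows are reported in the order they occur in the first isoform.
    """
    if not isoforms:
        return []
    first, rest = isoforms[0], isoforms[1:]
    found = []
    for start in range(len(first) - bait_length + 1):
        window = first[start:start + bait_length]
        if window not in found and all(window in seq for seq in rest):
            found.append(window)
    return found


# Toy isoforms that differ at the 5' and 3' ends but share an internal region.
# Baits in the text are 40 to 160 nucleotides long; a length of 8 is used
# here only to keep the example short.
core = "ATGGCCAAGTTC"
isoform_a = "GGGG" + core + "TTTT"
isoform_b = "CCCCCC" + core + "AA"
candidates = shared_windows([isoform_a, isoform_b], bait_length=8)
```

Every candidate window falls inside the shared internal region, so a single bait targeting any of them would hybridize to both toy isoforms.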

Accordingly, provided herein are methods that comprise designing and using probes (e.g., bait oligonucleotides, nucleic acid baits, and the like) to capture full-length (e.g., unfragmented) cDNA for sequencing analysis, rather than bait oligonucleotides for capturing final library fragments encompassing 3’ UTRs. By targeting full-length cDNA, it is possible to utilize the more reliably annotated regions of the target gene, such as coding sequences, for each transcript (e.g., isoform) of the respective gene. Accordingly, steps for identifying a target of interest in a sample using this method include, but are not limited to, preparation of a nucleic acid library so that nucleic acid baits can hybridize to one or more target analytes; hybridization of the nucleic acid baits to the one or more target analytes; and determining the location and abundance of the target analyte in the biological sample.

Also featured herein are methods of detecting analytes of interest that have been captured by a capture probe on a substrate (i.e., a spatial array). In some instances, target analytes of interest are identified using probes (e.g., nucleic acid baits) after the analytes of interest are captured on a spatial array. Thus, steps for identifying a target of interest in a sample using this method include, but are not limited to, capture of an analyte in a sample using a capture probe; amplification of the hybridized capture probe/analyte product to create a cDNA library; preparation of the cDNA library in order that nucleic acid baits can hybridize to one or more target analytes; hybridization of the nucleic acid baits to the one or more target analytes; and determining the location and abundance of the target analyte in the biological sample.

Profiling the phenotype of a biological sample using either of the methods described above and as further elucidated herein helps to avoid confounding biological factors, such as detection of strongly expressed genes that are not relevant to a study or cell-cycle phenotypes that could mask biological variance. In addition, enhancing detection of one or more target analytes helps to dramatically reduce sequencing cost to obtain information from a panel of genes of interest, by minimizing the number of reads spent on non-panel genes (e.g., ribosomal protein transcripts, mitochondrial transcripts, etc.). As such, the methods described herein are further cost effective.
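
By way of non-limiting illustration, the relationship between the fraction of on-panel reads and the sequencing depth required can be sketched in Python as follows. The gene panel, the read counts, and the function names are hypothetical and are not data from this disclosure.

```python
def on_target_fraction(read_counts, panel):
    """Fraction of reads assigned to genes in the target panel."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    on_target = sum(n for gene, n in read_counts.items() if gene in panel)
    return on_target / total


def reads_needed(target_reads, fraction):
    """Total reads to sequence in order to expect `target_reads` panel reads."""
    return target_reads / fraction


panel = {"GENE_A", "GENE_B"}
# Hypothetical read counts before and after bait enrichment.
before = {"GENE_A": 50, "GENE_B": 50, "RPL13": 600, "MT-CO1": 300}
after = {"GENE_A": 450, "GENE_B": 350, "RPL13": 120, "MT-CO1": 80}

frac_before = on_target_fraction(before, panel)  # 100 of 1000 reads
frac_after = on_target_fraction(after, panel)    # 800 of 1000 reads
depth_before = reads_needed(1000, frac_before)
depth_after = reads_needed(1000, frac_after)
```

In this toy example, raising the on-panel fraction from 0.1 to 0.8 lowers the depth needed for 1,000 panel reads from about 10,000 reads to 1,250 reads.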

(b) Biological Samples, Analytes, and Preparation of the Same

(i) Biological Samples and Analytes

In some embodiments, the biological sample used in the methods disclosed herein is a cell culture sample. In some instances, the biological sample is a tissue sample. In some embodiments, the biological sample is a section of a tissue sample. In some embodiments, the biological sample is a fresh tissue sample. In some embodiments, the biological sample is a fresh-frozen tissue sample. In some embodiments, the biological sample is a tissue sample that has been formalin-fixed and paraffin-embedded (FFPE) (i.e., an FFPE sample). In some embodiments, the biological sample is a tissue sample embedded in optimal cutting temperature (OCT) compound. In some embodiments, the biological sample has been previously stained (e.g., immunohistochemistry (IHC) or histological staining) and imaged, and optionally, destained.

In some instances, a biological sample is obtained from the subject for analysis using any of a variety of techniques including, but not limited to, biopsy, surgery, and laser capture microscopy (LCM), and generally includes cells and/or other biological material from the subject. A biological sample can also be obtained from a prokaryote such as a bacterium, e.g., Escherichia coli, Staphylococci or Mycoplasma pneumoniae; an archaeon; a virus such as Hepatitis C virus or human immunodeficiency virus; or a viroid. A biological sample can also be obtained from a eukaryote, such as a patient derived organoid (PDO) or patient derived xenograft (PDX) for example. Biological samples can be from mammals, such as humans, non-human mammals (e.g., mice), etc. Subjects from which biological samples can be obtained can be healthy or asymptomatic individuals, individuals that have or are suspected of having a disease (e.g., a patient with a disease such as cancer) or a predisposition to a disease, and/or individuals that are in need of therapy or suspected of needing therapy.

The biological sample can include any number of macromolecules, for example, cellular macromolecules and organelles (e.g., mitochondria and nuclei). The biological sample can be a nucleic acid sample and/or protein sample. The biological sample can be a carbohydrate sample or a lipid sample. The biological sample can be obtained as a tissue sample, such as a tissue section, biopsy, a core biopsy, needle aspirate, or fine needle aspirate. The sample can be a fluid sample, such as a blood sample, urine sample, or saliva sample. The sample can be a skin sample, a colon sample, a cheek swab, a histology sample, a histopathology sample, a plasma or serum sample, a tumor sample, living cells, cultured cells, a clinical sample such as, for example, whole blood or blood-derived products, blood cells, or cultured tissues or cells, including cell suspensions.

Biological samples can be derived from a homogeneous culture or population of the subjects or organisms mentioned herein or alternatively from a collection of several different organisms, for example, in a community or ecosystem. In some embodiments, the biological sample is a human sample.

Biological samples can include one or more diseased cells. A diseased cell can have altered metabolic properties, gene expression, protein expression, and/or morphologic features. Examples of diseases include inflammatory disorders, metabolic disorders, nervous system disorders, and cancer. Cancer cells can be derived from solid tumors, hematological malignancies, cell lines, or obtained as circulating tumor cells.

Biological samples can also include fetal cells. For example, a procedure such as amniocentesis can be performed to obtain a fetal cell sample from maternal circulation. Sequencing of fetal cells can be used to identify any of a number of genetic disorders, including, e.g., aneuploidy such as Down’s syndrome, Edwards syndrome, and Patau syndrome. Further, cell surface features of fetal cells can be used to identify any of a number of disorders or diseases.

Biological samples can also include immune cells. Sequence analysis of the immune repertoire of such cells, including genomic, proteomic, and cell surface features, can provide a wealth of information to facilitate an understanding of the status and function of the immune system. By way of example, determining the status (e.g., negative or positive) of minimal residual disease (MRD) in a multiple myeloma (MM) patient following autologous stem cell transplantation is considered a predictor of MRD in the MM patient (see, e.g., U.S. Patent Application Publication No. 2018/0156784, the entire contents of which are incorporated herein by reference).

Examples of immune cells in a biological sample include, but are not limited to, B cells, T cells (e.g., cytotoxic T cells, natural killer T cells, regulatory T cells, and T helper cells), natural killer cells, cytokine induced killer (CIK) cells, myeloid cells, such as granulocytes (basophil granulocytes, eosinophil granulocytes, neutrophil granulocytes/hypersegmented neutrophils), monocytes/macrophages, mast cells, thrombocytes/megakaryocytes, and dendritic cells.

In some embodiments, the biological sample is affixed to a slide. In some embodiments, the sample is stained prior to creation of the library of nucleic acids (e.g., plurality of nucleic acids). In some embodiments, the biological sample is stained while the biological sample is on the slide. In some embodiments, the stained biological sample is imaged prior to creation of the library of nucleic acids (e.g., plurality of nucleic acids).

In some embodiments, staining includes biological staining techniques such as H&E staining. In some embodiments, staining includes identifying analytes using fluorescently conjugated antibodies (e.g., immunofluorescence). In some embodiments, a biological sample is stained using two or more different types of stains, or two or more different staining techniques (e.g., IF, IHC, and/or H&E staining). For example, a biological sample can be prepared by staining and imaging using one technique (e.g., H&E staining and bright field imaging), destained (e.g., quenching or photobleaching), followed by staining and imaging using another technique (e.g., IHC/IF staining and fluorescence microscopy) for the same biological sample.

In some embodiments, biological samples can be destained prior to creation of the spatial library of nucleic acids. Methods of destaining or discoloring a biological sample are known in the art, and generally depend on the nature of the stain(s) applied to the biological sample.

In some embodiments, the analyte in the biological sample is a nucleic acid. In some embodiments, the analyte (e.g., nucleic acid) is obtained from the biological sample. In some embodiments, the nucleic acid is DNA (e.g., genomic DNA, mitochondrial DNA, or exosomal DNA). In some embodiments, the nucleic acid is RNA. In some embodiments, the RNA is mRNA. Additional examples of RNA include various types of coding and non-coding RNA. Examples of the different types of RNA analytes include messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), and viral RNA. The RNA can be a transcript (e.g., present in a tissue section). The RNA can be small (e.g., less than 200 nucleic acid bases in length) or large (e.g., RNA greater than 200 nucleic acid bases in length). Small RNAs mainly include 5.8S ribosomal RNA (rRNA), 5S rRNA, transfer RNA (tRNA), microRNA (miRNA), small interfering RNA (siRNA), small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), tRNA-derived small RNA (tsRNA), and small rDNA-derived RNA (srRNA). The RNA can be double-stranded RNA or single-stranded RNA. The RNA can be circular RNA. The RNA can be a bacterial rRNA (e.g., 16S rRNA or 23S rRNA).

In some embodiments, the nucleic acid comprises DNA. Examples of DNA include genomic DNA, methylated DNA, specific methylated DNA sequences, fragmented DNA, mitochondrial DNA, in situ synthesized PCR products, and RNA/DNA hybrids.

(ii) Imaging and Staining

In some instances, biological samples can be stained using a wide variety of stains and staining techniques. In some instances, the biological sample is a section of a tissue (e.g., a 10 µm section). In some instances, the biological sample is dried after placement onto a glass slide. In some instances, the biological sample is dried at 42°C. In some instances, drying occurs for about 1 hour, about 2 hours, about 3 hours, or until the sections become transparent. In some instances, the biological sample can be dried overnight (e.g., in a desiccator at room temperature).

In some embodiments, a sample can be stained using any number of biological stains, including but not limited to, acridine orange, Bismarck brown, carmine, coomassie blue, cresyl violet, DAPI, eosin, ethidium bromide, acid fuchsine, hematoxylin, Hoechst stains, iodine, methyl green, methylene blue, neutral red, Nile blue, Nile red, osmium tetroxide, propidium iodide, rhodamine, or safranin. In some instances, the methods disclosed herein include imaging the biological sample. In some instances, imaging the sample occurs prior to deaminating the biological sample. In some instances, the sample can be stained using known staining techniques, including May-Grünwald, Giemsa, hematoxylin and eosin (H&E), Jenner’s, Leishman, Masson’s trichrome, Papanicolaou, Romanowsky, silver, Sudan, Wright’s, and/or Periodic Acid Schiff (PAS) staining techniques. PAS staining is typically performed after formalin or acetone fixation. In some instances, the stain is an H&E stain.

In some embodiments, the biological sample can be stained using a detectable label (e.g., radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes) as described elsewhere herein. In some embodiments, a biological sample is stained using only one type of stain or one technique. In some embodiments, staining includes biological staining techniques such as H&E staining. In some embodiments, staining includes identifying analytes using fluorescently-conjugated antibodies. In some embodiments, a biological sample is stained using two or more different types of stains, or two or more different staining techniques. For example, a biological sample can be prepared by staining and imaging using one technique (e.g., H&E staining and brightfield imaging), followed by staining and imaging using another technique (e.g., IHC/IF staining and fluorescence microscopy) for the same biological sample.

In some embodiments, biological samples can be destained. Methods of destaining or decoloring a biological sample are known in the art, and generally depend on the nature of the stain(s) applied to the sample. For example, H&E staining can be destained by washing the sample in HCl, or any other acid (e.g., selenic acid, sulfuric acid, hydroiodic acid, benzoic acid, carbonic acid, malic acid, phosphoric acid, oxalic acid, succinic acid, salicylic acid, tartaric acid, sulfurous acid, trichloroacetic acid, hydrobromic acid, hydrochloric acid, nitric acid, orthophosphoric acid, arsenic acid, selenous acid, chromic acid, citric acid, hydrofluoric acid, nitrous acid, isocyanic acid, formic acid, hydrogen selenide, molybdic acid, lactic acid, acetic acid, carbonic acid, hydrogen sulfide, or combinations thereof). In some embodiments, destaining can include 1, 2, 3, 4, 5, or more washes in an acid (e.g., HCl). In some embodiments, destaining can include adding HCl to a downstream solution (e.g., permeabilization solution). In some embodiments, destaining can include dissolving an enzyme used in the disclosed methods (e.g., pepsin) in an acid (e.g., HCl) solution. In some embodiments, after destaining hematoxylin with an acid, other reagents can be added to the destaining solution to raise the pH for use in other applications. For example, SDS can be added to an acid destaining solution in order to raise the pH as compared to the acid destaining solution alone. As another example, in some embodiments, one or more immunofluorescence stains are applied to the sample via antibody coupling. Such stains can be removed using techniques such as cleavage of disulfide linkages via treatment with a reducing agent and detergent washing, chaotropic salt treatment, treatment with antigen retrieval solution, and treatment with an acidic glycine buffer. Methods for multiplexed staining and destaining are described, for example, in Bolognesi et al., J. Histochem. Cytochem. 2017; 65(8): 431-444, Lin et al., Nat Commun. 2015; 6:8390, Pirici et al., J. Histochem. Cytochem. 2009; 57:567-75, and Glass et al., J. Histochem. Cytochem. 2009; 57:899-905, the entire contents of each of which are incorporated herein by reference. In some embodiments, immunofluorescence or immunohistochemistry protocols (direct and indirect staining techniques) can be performed as a part of, or in addition to, the exemplary spatial workflows presented herein. For example, tissue sections can be fixed according to methods described herein. The biological sample can be transferred to an array (e.g., capture probe array), wherein analytes (e.g., proteins) are probed using immunofluorescence protocols. For example, the sample can be rehydrated, blocked, and permeabilized (3X SSC, 2% BSA, 0.1% Triton X, 1 U/µl RNAse inhibitor for 10 minutes at 4°C) before being stained with fluorescent primary antibodies (1:100 in 3X SSC, 2% BSA, 0.1% Triton X, 1 U/µl RNAse inhibitor for 30 minutes at 4°C). The biological sample can be washed, coverslipped (in glycerol + 1 U/µl RNAse inhibitor), imaged (e.g., using a confocal microscope or other apparatus capable of fluorescent detection), washed, and processed according to analyte capture or spatial workflows described herein.

In some instances, a glycerol solution and a cover slip can be added to the sample. In some instances, the glycerol solution can include a counterstain (e.g., DAPI).

As used herein, an antigen retrieval buffer can improve antibody capture in IF/IHC protocols. An exemplary protocol for antigen retrieval can be preheating the antigen retrieval buffer (e.g., to 95°C), immersing the biological sample in the heated antigen retrieval buffer for a predetermined time, and then removing the biological sample from the antigen retrieval buffer and washing the biological sample.

In some embodiments, optimizing permeabilization can be useful for identifying intracellular analytes. Permeabilization optimization can include selection of permeabilization agents, concentration of permeabilization agents, and permeabilization duration. Tissue permeabilization is discussed elsewhere herein.

In some embodiments, blocking an array and/or a biological sample in preparation of labeling the biological sample decreases nonspecific binding of the antibodies to the array and/or biological sample (decreases background). Some embodiments provide for blocking buffers/blocking solutions that can be applied before and/or during application of the label, wherein the blocking buffer can include a blocking agent, and optionally a surfactant and/or a salt solution. In some embodiments, a blocking agent can be bovine serum albumin (BSA), serum, gelatin (e.g., fish gelatin), milk (e.g., non-fat dry milk), casein, polyethylene glycol (PEG), polyvinyl alcohol (PVA), or polyvinylpyrrolidone (PVP), biotin blocking reagent, a peroxidase blocking reagent, levamisole, Carnoy’s solution, glycine, lysine, sodium borohydride, pontamine sky blue, Sudan Black, trypan blue, FITC blocking agent, and/or acetic acid. The blocking buffer/blocking solution can be applied to the array and/or biological sample prior to and/or during labeling (e.g., application of fluorophore-conjugated antibodies) to the biological sample.

(iii) Preparation of Sample for Application of Probes

In some instances, the biological sample is deparaffinized. Deparaffinization can be achieved using any method known in the art. For example, in some instances, the biological sample is treated with a series of washes that include xylene and various concentrations of ethanol. In some instances, methods of deparaffinization include treatment with xylene (e.g., three washes at 5 minutes each). In some instances, the methods further include treatment with ethanol (e.g., 100% ethanol, two washes 10 minutes each; 95% ethanol, two washes 10 minutes each; 70% ethanol, two washes 10 minutes each; 50% ethanol, two washes 10 minutes each). In some instances, after ethanol washes, the biological sample can be washed with deionized water (e.g., two washes for 5 minutes each). It is appreciated that one skilled in the art can adjust these methods to optimize deparaffinization.
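
By way of non-limiting illustration, the exemplary wash series above can be written as a simple schedule in Python, which makes it easy to adjust steps and recompute the total wash time. The step values are the example values given in the text; the names `DEPARAFFINIZATION_STEPS`, `total_minutes`, and `print_schedule` are hypothetical.

```python
# Each step: (reagent, number of washes, minutes per wash), using the
# example values given in the text above.
DEPARAFFINIZATION_STEPS = [
    ("xylene", 3, 5),
    ("100% ethanol", 2, 10),
    ("95% ethanol", 2, 10),
    ("70% ethanol", 2, 10),
    ("50% ethanol", 2, 10),
    ("deionized water", 2, 5),
]


def total_minutes(steps):
    """Total wash time in minutes, ignoring handling time between washes."""
    return sum(washes * minutes for _, washes, minutes in steps)


def print_schedule(steps):
    """Print one line per step followed by the total wash time."""
    for reagent, washes, minutes in steps:
        print(f"{reagent}: {washes} x {minutes} min")
    print(f"total wash time: {total_minutes(steps)} min")


print_schedule(DEPARAFFINIZATION_STEPS)
```

With the example values, the washes alone take 105 minutes (15 minutes of xylene, 80 minutes of ethanol, and 10 minutes of deionized water).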

In some instances, the biological sample is decrosslinked. In some instances, the biological sample is decrosslinked in a solution containing TE buffer (comprising Tris and EDTA). In some instances, the TE buffer is basic (e.g., at a pH of about 9). In some instances, decrosslinking occurs at about 50°C to about 80°C. In some instances, decrosslinking occurs at about 70°C. In some instances, decrosslinking occurs for about 1 hour at 70°C. Just prior to decrosslinking, the biological sample can be treated with an acid (e.g., 0.1 M HCl for about 1 minute). After the decrosslinking step, the biological sample can be washed (e.g., with 1x PBST).

In some instances, the methods of preparing a biological sample for probe application include permeabilizing the sample. In some instances, the biological sample is permeabilized using a phosphate buffer. In some instances, the phosphate buffer is PBS (e.g., 1x PBS). In some instances, the phosphate buffer is PBST (e.g., 1x PBST). In some instances, the permeabilization step is performed multiple times (e.g., 3 times at 5 minutes each).

In some instances, the methods of preparing a biological sample for probe application include steps of equilibrating and blocking the biological sample. In some instances, equilibrating is performed using a pre-hybridization (pre-Hyb) buffer. In some instances, the pre-Hyb buffer is RNase-free. In some instances, the pre-Hyb buffer contains no bovine serum albumin (BSA), solutions like Denhardt’s, or other potentially nuclease-contaminated biological materials. In some instances, the equilibrating step is performed multiple times (e.g., 2 times at 5 minutes each; 3 times at 5 minutes each). In some instances, the biological sample is blocked with a blocking buffer. In some instances, the blocking buffer includes a carrier such as tRNA, for example yeast tRNA such as from brewer’s yeast (e.g., at a final concentration of 10-20 µg/mL). In some instances, blocking can be performed for 5, 10, 15, 20, 25, or 30 minutes.

Any of the foregoing steps can be optimized for performance. For example, one can vary the temperature. In some instances, the pre-hybridization methods are performed at room temperature. In some instances, the pre-hybridization methods are performed at 4°C (in some instances, varying the timeframes provided herein).

(c) Nucleic Acid Library Preparation

(i) Single Cell Library Preparation

Disclosed herein are methods of preparing a library of nucleic acids (e.g., a plurality of nucleic acids) from a cell or population of cells. In some embodiments, the biological sample can be derived from a cell culture grown in vitro. Samples derived from a cell culture can include one or more suspension cells which are anchorage-independent within the cell culture. Examples of such cells include, but are not limited to, cell lines derived from hematopoietic cells, and from the following cell lines: Colo205, CCRF-CEM, HL-60, K562, MOLT-4, RPMI-8226, SR, HOP-92, NCI-H322M, and MALME-3M.

Samples derived from a cell culture can include one or more adherent cells which grow on the surface of the vessel that contains the culture medium. Non-limiting examples of adherent cells include DU145 (prostate cancer) cells, H295R (adrenocortical cancer) cells, HeLa (cervical cancer) cells, KBM-7 (chronic myelogenous leukemia) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-468 (breast cancer) cells, PC3 (prostate cancer) cells, SaOS-2 (bone cancer) cells, SH-SY5Y (neuroblastoma, cloned from a myeloma) cells, T-47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, National Cancer Institute’s 60 cancer cell line panel (NCI60), vero (African green monkey Chlorocebus kidney epithelial cell line) cells, MC3T3 (embryonic calvarium) cells, GH3 (pituitary tumor) cells, PC12 (pheochromocytoma) cells, dog MDCK kidney epithelial cells, Xenopus A6 kidney epithelial cells, zebrafish AB9 cells, and Sf9 insect epithelial cells.

Additional examples of samples considered as a cell or population of cells include, without limitation, those listed in Table 1 in priority documents U.S. Provisional Patent Application Nos. 62/979,652; 62/980,124; and 63/077,019, each of which is incorporated by reference in its entirety. It is appreciated that a cell population used in single-cell library preparation can be from any of the cells or cell culture populations disclosed herein.

In some embodiments, the library of nucleic acids (e.g., a plurality of nucleic acids) includes one or more nucleic acids of interest, whose detection can be enhanced using hybridization methods. Analytes can be isolated, amplified, and/or otherwise processed for subsequent analysis, such as fragmentation and sequencing library preparation.

In some embodiments, the biological sample is permeabilized using any method described herein. A biological sample is permeabilized to allow analytes to be released from one or more cells within the biological sample. In some embodiments, obtaining the pool of analytes (e.g., mRNA) from the biological sample comprises any method of nucleic acid extraction and/or isolation known in the art and/or described herein. In some embodiments, analytes released and obtained from the one or more cells can be amplified. In some embodiments, the pool of poly-adenylated mRNA is obtained from the biological sample in preparation for a sequencing method (e.g., by any method encompassed in a sequencing library preparation workflow).

Additional reagents can be added to a biological sample to perform various functions prior to analysis of the sample. In some embodiments, DNase and RNase inactivating agents or inhibitors, and/or chelating agents such as EDTA, can be added to the sample. In some embodiments, the sample can be treated with one or more enzymes. For example, one or more endonucleases to fragment DNA or RNA, and DNA polymerase enzymes used to amplify nucleic acids, can be added. Enzymes that can be added to the sample include, but are not limited to, polymerases, transposases, ligases, DNases, and RNases.

In some embodiments, reverse transcriptase enzymes can be added to the sample, including enzymes with terminal transferase activity, primers, and switch oligonucleotides. Template switching can be used to increase the length of a cDNA, e.g., by appending a predefined nucleic acid sequence to the cDNA from which nucleic acid extension can proceed.

In some embodiments, obtaining the plurality of cDNA sequences comprises performing first-strand cDNA synthesis via reverse transcription of the corresponding pool of analytes (e.g., poly-adenylated mRNA). In some such embodiments, the cDNA is generated using a poly(T) containing primer. In some embodiments, the generated cDNA is barcoded using a capture probe, featuring a barcode sequence (and optionally, a UMI sequence) that hybridizes with at least a portion of the generated cDNA. In some embodiments, the generated cDNA is appended with a unique molecular identifier (UMI) using a capture probe featuring a UMI that hybridizes with at least a portion of the generated cDNA. In some embodiments, a template switching oligonucleotide hybridizes to a poly(C) tail added to a 3’ end of the cDNA by a reverse transcriptase enzyme. In some such embodiments, the original mRNA template and template switching oligonucleotide are denatured from the cDNA, and the barcoded capture probe with optional UMI hybridizes with the cDNA to generate a complement of the cDNA.

In some embodiments, obtaining the plurality of cDNA sequences further comprises amplification (e.g., PCR amplification) and/or adaptor extension of each cDNA sequence in the plurality of cDNA sequences. In some embodiments, the adaptor extension occurs prior to cDNA amplification. In some such embodiments, the adaptor extension occurs during the first-strand cDNA synthesis. In some such embodiments, the adaptor extension occurs by hybridization of the RNA molecule to a capture probe. In some other embodiments, the adaptor extension occurs by hybridization of a cDNA molecule to a capture probe.

In some embodiments, the plurality of cDNA sequences is not fragmented, such that each cDNA sequence in the plurality of cDNA sequences is a full-length cDNA sequence. In some such embodiments, the plurality of cDNA sequences comprises intermediate products of a gene expression library preparation workflow (e.g., unfragmented cDNA sequences generated using a method for 3’ gene expression library preparation).

In some embodiments, the plurality of cDNA sequences comprises a first subset of cDNA sequences, where each respective cDNA sequence in the first subset of cDNA sequences maps to a respective gene in a plurality of genes. The plurality of genes can be a targeted gene panel (e.g., a panel of genes of interest). In some embodiments, the plurality of genes is between five genes and twenty thousand genes. In some embodiments, the plurality of genes is between one hundred genes and ten thousand genes. In some embodiments, the plurality of genes is between five hundred genes and two thousand genes. In some embodiments, the plurality of genes is more than 10, more than 50, more than 100, more than 500, more than 1000, more than 2000, more than 5000, more than 10000, more than 15000, or more than 20000 genes.

In some embodiments, the plurality of cDNA sequences comprises a second subset of cDNA sequences, where each respective cDNA sequence in the second subset of cDNA sequences maps to a portion of a reference genome not represented by the plurality of genes to which the first subset of cDNA sequences is mapped. The second subset of cDNA sequences can include cDNA sequences that map to off-target regions of the reference genome and/or other genes not included in a targeted gene panel.

In some embodiments, the plurality of cDNA sequences consists of the first subset of cDNA sequences and the second subset of cDNA sequences. In some such embodiments, each cDNA sequence in the plurality of cDNA sequences maps to either a gene of interest or to an off-target region of a reference genome.
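
By way of non-limiting illustration, the partition of a plurality of cDNA sequences into a first (on-target) subset and a second (off-target) subset can be sketched in Python as follows. This is a minimal sketch and not a description of any particular implementation; the record layout, the gene names, and the function name `partition_cdna` are hypothetical and were chosen only for this example.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical record: (cDNA identifier, gene the cDNA maps to). The gene is
# None when the cDNA maps to a region of the reference genome with no gene.
CdnaRecord = Tuple[str, Optional[str]]


def partition_cdna(
    records: Iterable[CdnaRecord], panel: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split cDNA identifiers into a first subset (maps to a panel gene)
    and a second subset (maps anywhere else in the reference genome)."""
    first_subset: List[str] = []
    second_subset: List[str] = []
    for cdna_id, gene in records:
        if gene is not None and gene in panel:
            first_subset.append(cdna_id)
        else:
            second_subset.append(cdna_id)
    return first_subset, second_subset


# Illustrative targeted gene panel and mapped cDNA records (made-up names).
panel = {"GENE_A", "GENE_B"}
records = [
    ("cdna1", "GENE_A"),  # on-target
    ("cdna2", "GENE_X"),  # a gene outside the panel
    ("cdna3", None),      # off-target region with no annotated gene
    ("cdna4", "GENE_B"),  # on-target
]
on_target, off_target = partition_cdna(records, panel)
```

Because every record falls into exactly one of the two lists, the sketch mirrors the embodiment in which the plurality consists of the first subset and the second subset.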

Each respective gene in the plurality of genes can be characterized by any number of transcripts. For example, in some embodiments, three or more transcripts correspond to each respective gene. In some embodiments, five or more transcripts correspond to each respective gene. In some embodiments, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more transcripts correspond to each respective gene. In some embodiments, the corresponding plurality of transcripts of a respective gene in the plurality of genes comprises a plurality of isoforms of the respective gene.

Isoforms of a gene refer to mRNA molecules (and the corresponding cDNA molecules) that originate from the same genomic locus but comprise different nucleic acid sequences, including but not limited to transcription start sites (TSSs), protein coding DNA sequences (CDSs), and/or untranslated regions (UTRs). These differences are caused by alternative splicing, variable promoter usage, gene fusions or deletions, single nucleotide polymorphisms (SNPs), and/or other mutations or post-transcriptional genetic modifications. Isoforms of a gene may have different functional capacities due to the differences in mRNA sequence of the coding sequences and/or the cis-regulatory elements in the promoter sequences.

The identification of different isoforms of a gene is an essential step in accurately enriching for cDNA sequences of the respective gene. For example, where the diverging nucleic acid sequences of two or more isoforms of a target gene include different transcription start sites and/or untranslated regions, bait probes complementary to the 3’ or 5’ annotated ends of the gene may fail to capture all of the possible isoforms. This can occur if a nucleic acid bait is designed to hybridize to a region of a first isoform that extends beyond either of the terminal ends of a second isoform. While minimal bait probe sets are desirable for cost-effective and streamlined targeted sequencing analysis, in some cases where two or more isoforms have such drastically different lengths that they do not overlap, it may be necessary to design a plurality of nucleic acid baits such that each isoform can be bound and enriched. In some such cases, each non-overlapping isoform hybridizes to a different, unique bait probe. Therefore, proper annotation of the genomic regions spanned by each isoform (e.g., the coding and/or non-coding sequences) is necessary to ensure that bait probe design generates a plurality of bait probes in which every transcript for the target gene is represented (e.g., hybridizable).
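
The coverage requirement described above can be illustrated with a short Python sketch that reports which isoforms of a target gene are not hybridizable by any bait in a candidate bait set. The sketch makes simplifying assumptions that are not taken from this disclosure: isoforms are represented as lists of exon intervals in half-open genomic coordinates, a bait is treated as hybridizable to an isoform only if it lies entirely within one exon of that isoform (baits spanning splice junctions are ignored), and all names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (assumed)


def bait_fits_isoform(bait: Interval, exons: List[Interval]) -> bool:
    """Assumption: a bait can hybridize only if it lies fully inside one exon."""
    return any(start <= bait[0] and bait[1] <= end for start, end in exons)


def uncovered_isoforms(
    isoforms: Dict[str, List[Interval]], baits: List[Interval]
) -> List[str]:
    """Return the names of isoforms that no bait in the set can hybridize to."""
    return sorted(
        name
        for name, exons in isoforms.items()
        if not any(bait_fits_isoform(bait, exons) for bait in baits)
    )


# Made-up isoforms of one gene: isoform_2 ends before isoform_1 does, and
# isoform_3 does not overlap the other two at all.
isoforms = {
    "isoform_1": [(100, 400), (900, 1200)],
    "isoform_2": [(100, 300)],
    "isoform_3": [(2000, 2300)],
}

# A single bait placed beyond the terminal end of isoform_2.
minimal_baits = [(320, 400)]
missing = uncovered_isoforms(isoforms, minimal_baits)

# Adding a bait in the shared region and a bait unique to isoform_3.
expanded_baits = minimal_baits + [(150, 270), (2100, 2220)]
still_missing = uncovered_isoforms(isoforms, expanded_baits)
```

In this example the single bait reaches only `isoform_1`, whereas the expanded set leaves no isoform unrepresented, which is the property the bait design is meant to guarantee.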

In some embodiments, each transcript in the plurality of transcripts is protein coding. Alternatively, one or more transcripts in the plurality of transcripts can comprise a non-coding sequence (e.g., a 3’ or 5’ untranslated region). For example, one or more transcripts in the plurality of transcripts can comprise a 3’ UTR sequence downstream of a stop codon, or a 5’ UTR upstream of a start codon. In some embodiments, one or more transcripts in the plurality of transcripts is protein coding but comprises an incomplete coding sequence. For example, in some such embodiments, a transcript in the plurality of transcripts corresponding to the respective gene is coding sequence (CDS) 3’ incomplete, CDS 5’ incomplete, or both CDS 3’ and 5’ incomplete. As used herein, CDS 3’ incomplete refers to a protein-coding transcript that does not include the stop codon due to incomplete evidence. As used herein, CDS 5’ incomplete refers to a protein-coding transcript that does not include a start codon due to incomplete evidence.
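
As a non-limiting sketch, the CDS completeness categories defined above can be assigned from annotation tags. The example assumes GENCODE-style tags in which `cds_start_NF` marks a coding sequence whose start could not be confirmed and `cds_end_NF` marks one whose end could not be confirmed; mapping these tags onto the categories used herein is an assumption of this example, and the function name `classify_cds` is hypothetical.

```python
from typing import Iterable

# Assumed tag names (GENCODE-style); their mapping onto the categories
# defined in the text is an assumption made for this illustration.
TAG_5P_INCOMPLETE = "cds_start_NF"  # start codon not found
TAG_3P_INCOMPLETE = "cds_end_NF"    # stop codon not found


def classify_cds(tags: Iterable[str]) -> str:
    """Classify a protein-coding transcript by CDS completeness."""
    tag_set = set(tags)
    five_prime_incomplete = TAG_5P_INCOMPLETE in tag_set
    three_prime_incomplete = TAG_3P_INCOMPLETE in tag_set
    if five_prime_incomplete and three_prime_incomplete:
        return "CDS 3' and 5' incomplete"
    if five_prime_incomplete:
        return "CDS 5' incomplete"
    if three_prime_incomplete:
        return "CDS 3' incomplete"
    return "CDS complete"


# Unrelated tags are ignored; only the two completeness tags matter here.
complete = classify_cds(["basic"])
missing_stop = classify_cds(["cds_end_NF"])
missing_start = classify_cds(["cds_start_NF", "basic"])
missing_both = classify_cds(["cds_start_NF", "cds_end_NF"])
```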

In some embodiments, reference databases are useful for aligning transcripts corresponding to the respective gene, such as the reference genome GENCODE Release 33 (GRCh38.p13). For instance, annotations for a respective gene are available as a reference database within the GENCODE Consortium. In other instances, annotations for a respective gene are available through the Ensembl project. See Harrow et al., 2012, “GENCODE: The reference human genome annotation for The ENCODE Project,” Genome Res. 22(9):1760-1774, doi:10.1101/gr.135350.111; and Flicek et al., 2014, “Ensembl 2014,” Nucleic Acids Res. 42(Database issue):D749-D755, doi:10.1093/nar/gkt1196, the entire contents of which are incorporated herein by reference.

(ii) Spatial Library Preparation

Disclosed herein are methods of generating a plurality of extended nucleic acids (e.g., any of the extended nucleic acids described herein) for detection of analytes that include proteins and nucleic acids. In some instances, the methods include detection of nucleic acids. In some embodiments, the methods include detection of proteins.

As used herein, an “extended nucleic acid” refers to a nucleic acid having additional nucleotides added to the terminus (e.g., 3’ or 5’ end) of the nucleic acid thereby extending the overall length of the nucleic acid. Disclosed herein are methods of preparing a library of target nucleic acids using bait oligonucleotides, wherein the library of target nucleic acids is initially prepared from nucleic acids that were hybridized to capture probes on a spatial array (i.e., a substrate). For example, captured nucleic acid sequences on a substrate can be put through a spatial workflow providing resultant nucleic acids that can be isolated, amplified, and/or otherwise processed for subsequent analysis, such as fragmentation and sequencing library preparation. Capture probes, substrates, and arrays have been described in previous sections of this application above and are incorporated into this section.

In some instances, embodiments of any of the methods described herein can include contacting a biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality comprises (i) a spatial barcode and (ii) a capture domain that binds specifically to a sequence present in an analyte (e.g., a nucleic acid); extending a 3’ end of the capture probe using the analyte that is specifically bound to the capture domain as a template to generate an extended capture probe; and amplifying the extended capture probe. In some instances, embodiments of any of the methods described herein can include contacting a plurality of analyte capture agents to a biological sample, wherein an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds to the analyte (e.g., a protein) and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence; contacting the plurality of analyte capture agents to a substrate comprising a plurality of capture probes, wherein a capture probe of the plurality comprises a spatial barcode and a capture domain, wherein the capture domain binds to the analyte capture sequence; extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and amplifying the extended capture probe. Individual method steps and system features can be present in combination in many different embodiments; the specific combinations described herein do not in any way limit other combinations of steps and features.

Analytes can be captured on an array when contacting a biological sample with, e.g., a substrate (e.g., an array) comprising capture probes. Capture probes interact with analytes released from the biological sample through a capture domain, described throughout this application, to capture analytes. For example, a capture domain captures analytes via hybridization to a nucleic acid sequence in a target nucleic acid molecule from a biological sample. In some instances, the sequence complementary to the capture domain is a poly-adenylated (poly(A)) sequence. In some instances, the sequence complementary to the capture domain is designed to be specific to a sequence of interest (i.e., in a target analyte).
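
The complementarity between a capture domain and a target sequence can be illustrated with the following Python sketch, which checks whether a target nucleic acid contains a stretch that is perfectly complementary to a given capture domain. Requiring a perfect, full-length match is a simplifying assumption of this example (hybridization in practice tolerates mismatches and depends on reaction conditions), and the sequences and function names are hypothetical.

```python
# Complement table for DNA/RNA bases; U in RNA pairs with A like T does.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence, expressed as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_capture(capture_domain: str, target: str) -> bool:
    """True if the target contains a site perfectly complementary to the
    capture domain (simplifying assumption: no mismatches allowed)."""
    site = reverse_complement(capture_domain)
    # Compare in the DNA alphabet so RNA targets (with U) are handled too.
    return site in target.upper().replace("U", "T")


# Made-up poly-adenylated mRNA: a short body followed by a poly(A) tail.
mrna = "AUGGCUUACGAUCC" + "A" * 25

# A poly(T) capture domain is complementary to the poly(A) tail.
poly_t_domain = "T" * 20
captures_poly_a = can_capture(poly_t_domain, mrna)

# A capture domain designed against a specific sequence of interest.
specific_domain = reverse_complement("GGCTTACGATCC")
captures_specific = can_capture(specific_domain, mrna)
```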

In the setting of nucleic acid detection, the nucleic acid analyte hybridizes directly to the capture probes on the array (e.g., a substrate). In the setting of protein detection, referring to FIG. 4, an analyte binding moiety 402 comprises a protein-binding moiety 404 and an oligonucleotide 408. In some instances, the analyte binding moieties are added to the biological sample. After association of the protein with the protein-binding moiety 404, the oligonucleotide is captured by a capture domain of the array (e.g., a substrate). Thus, the analyte binding moiety as used herein can be considered an analyte derivative, and its oligonucleotide can be captured and analyzed in a similar way that a nucleic acid analyte can be analyzed. Embodiments for analyte binding moieties have been described previously, e.g., in WO 2020/176788 and/or U.S. Patent Application Publication No. 2020/0277663, each of which is incorporated by reference in its entirety.

Alternatively, a proxy for an analyte can be captured by a capture domain on a capture probe on an array. For example, two probes can hybridize to target nucleic acids such that they can be ligated together to create a ligation product that can serve as a proxy of the target sequence. The ligation product can comprise a sequence that is substantially complementary to the capture domain of the capture probe on the array. The ligation product can be captured on the array and serve as a proxy for the presence of the target nucleic acid.

In some embodiments, the sample is stained prior to creation of the nucleic acid library. In some embodiments, the biological sample is stained while the biological sample is on the slide. In some embodiments, the stained biological sample is imaged prior to creation of the nucleic acid library. In some embodiments, staining includes biological staining techniques such as H&E staining. In some embodiments, staining includes identifying analytes using fluorescently conjugated antibodies (e.g., immunofluorescence). In some embodiments, a biological sample is stained using two or more different types of stains, or two or more different staining techniques (e.g., IF, IHC, and/or H&E staining). For example, a biological sample can be prepared by staining and imaging using one technique (e.g., H&E staining and bright field imaging), destained (e.g., quenching or photobleaching), followed by staining and imaging using another technique (e.g., IHC/IF staining and fluorescence microscopy) for the same biological sample.

In some embodiments, prior to interaction with capture probes on the array, target-specific reactions are performed in the biological sample to enhance detection of one or more targets of interest. Examples of target-specific reactions include, but are not limited to, ligation of target-specific adaptors, probes and/or other oligonucleotides, target-specific amplification using primers specific to one or more nucleic acid analytes, and target-specific detection using in situ hybridization, DNA microscopy, and/or antibody detection. In some embodiments, a capture probe includes capture domains targeted to target-specific products (e.g., amplification or ligation). A target analyte can then be captured by a capture probe (e.g., as described throughout herein).

In some embodiments, the methods provided herein include a permeabilizing step in order to release the analytes from the biological sample. In some embodiments, permeabilization occurs using a protease. In some embodiments, the protease is an endopeptidase. Endopeptidases that can be used include but are not limited to trypsin, chymotrypsin, elastase, thermolysin, pepsin, clostripain, glutamyl endopeptidase (GluC), ArgC, peptidyl-asp endopeptidase (AspN), endopeptidase LysC and endopeptidase LysN. In some embodiments, the endopeptidase is pepsin.

In some embodiments, methods provided herein include permeabilization of the biological sample such that the capture probe can more easily bind to the analyte or analyte derivative (i.e., compared to no permeabilization). In some embodiments, reverse transcription (RT) reagents can be added to permeabilized biological samples. Incubation with the RT reagents can produce spatially-barcoded full-length cDNA from the captured analytes (e.g., polyadenylated mRNA). Second strand reagents (e.g., second strand primers, enzymes) can be added to the biological sample on the substrate to initiate second strand synthesis.

In some instances, the permeabilization step includes application of a permeabilization buffer to the biological sample. In some instances, the permeabilization buffer includes a buffer (e.g., Tris pH 7.5), MgCl2, sarkosyl detergent (e.g., sodium lauroyl sarcosinate) or other detergent, enzyme (e.g., proteinase K, pepsin, collagenase, etc.), and nuclease-free water. In some instances, the permeabilization step is performed at 37°C. In some instances, the permeabilization step is performed for about 20 minutes to 2 hours (e.g., about 20 minutes, about 30 minutes, about 40 minutes, about 50 minutes, about 1 hour, about 1.5 hours, or about 2 hours). In some instances, the releasing step is performed for about 40 minutes.

After permeabilization, in some instances, the analytes are captured by the capture probes and/or analyte capture agents. In some embodiments, such methods of increasing capture efficiency of a spatial array described herein include contacting the spatial array with the biological sample and allowing the analyte to interact with the capture probes and/or analyte capture agents.

In some embodiments, the capture probe hybridized to the analyte can be extended using a polymerase (e.g., a reverse transcriptase) using the hybridized analyte as a template, to generate an extended capture probe. In some embodiments, a 3’ end of the capture probe hybridized to the analyte can be extended using a polymerase (e.g., a reverse transcriptase) using the hybridized analyte as a template, to generate an extended capture probe. In some embodiments, a 5’ end of the capture probe hybridized to the analyte can be extended using a polymerase (e.g., a reverse transcriptase) using the hybridized analyte as a template, to generate an extended capture probe. The extended capture probe can be amplified (e.g., via second strand synthesis) to generate a single-stranded nucleic acid comprising a sequence that is complementary to the extended capture probe. The single-stranded nucleic acid comprising a sequence that is complementary to the extended capture probe can be used to generate or can be a part of a nucleic acid library.

After hybridization of the analyte to the capture probe, the hybridized product is amplified. For example, obtaining the library of cDNA sequences further comprises amplification (e.g., PCR amplification) and/or adaptor extension of the cDNA sequences. In some embodiments, the adaptor extension occurs prior to cDNA amplification. In some such embodiments, the adaptor extension occurs during the first-strand cDNA synthesis.

In some embodiments, the method includes amplifying all or part of the analyte using isothermal amplification or non-isothermal amplification. In some embodiments, the amplifying creates an amplified product that includes (i) all or part of a sequence of the analyte or the analyte derivative bound to the capture probe, or a complement thereof, and (ii) all or a part of the sequence of the spatial barcode, or a complement thereof. In some embodiments, the determining step includes sequencing. A non-limiting example of sequencing that can be used to determine the sequence of the analyte, the analyte derivative, and/or spatial barcodes is in situ sequencing. In some embodiments, in situ sequencing is performed via sequencing-by-synthesis (SBS), sequential fluorescence hybridization, sequencing by ligation, nucleic acid hybridization, or high-throughput digital sequencing techniques. In some embodiments the analyte is RNA or DNA. In some embodiments, the analyte is a protein.
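
Because the amplified product carries both analyte sequence and the spatial barcode, sequencing reads can be assigned back to array locations. The following Python sketch illustrates one way this could be done under assumptions that are not specified in this disclosure: each read begins with a 16-nucleotide spatial barcode followed by a 12-nucleotide UMI, the analyte portion has already been assigned to a gene, and a lookup table from barcode to array coordinates is available. All lengths, names, and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

BARCODE_LEN = 16  # assumed spatial barcode length
UMI_LEN = 12      # assumed UMI length

Coordinate = Tuple[int, int]


def count_molecules(
    reads: Iterable[Tuple[str, str]],
    barcode_to_xy: Dict[str, Coordinate],
) -> Dict[Tuple[Coordinate, str], int]:
    """Count unique UMIs per (array location, gene).

    Each read is (barcode + UMI sequence, gene assigned to the analyte part).
    Reads with an unknown barcode or a truncated UMI are skipped.
    """
    seen: Dict[Tuple[Coordinate, str], Set[str]] = defaultdict(set)
    for tag_sequence, gene in reads:
        barcode = tag_sequence[:BARCODE_LEN]
        umi = tag_sequence[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        location = barcode_to_xy.get(barcode)
        if location is None or len(umi) < UMI_LEN:
            continue
        seen[(location, gene)].add(umi)  # duplicates collapse in the set
    return {key: len(umis) for key, umis in seen.items()}


# Made-up barcode whitelist mapping barcodes to array coordinates.
barcode_to_xy = {"A" * 16: (0, 0), "C" * 16: (0, 1)}

reads = [
    ("A" * 16 + "ACGTACGTACGT", "GENE_A"),
    ("A" * 16 + "ACGTACGTACGT", "GENE_A"),  # amplification duplicate
    ("A" * 16 + "TTTTACGTACGT", "GENE_A"),  # second molecule, same location
    ("C" * 16 + "ACGTACGTACGT", "GENE_B"),
    ("G" * 16 + "ACGTACGTACGT", "GENE_A"),  # barcode not on the array
]
counts = count_molecules(reads, barcode_to_xy)
```

Counting unique UMIs rather than raw reads keeps amplification duplicates from inflating the apparent abundance of an analyte at a given location.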

In some embodiments, after contacting a biological sample with a substrate that includes capture probes, a removal step can optionally be performed to remove all or a portion of the biological sample from the substrate. In some embodiments, the removal step includes enzymatic and/or chemical degradation of cells of the biological sample. For example, the removal step can include treating the biological sample with an enzyme (e.g., a proteinase, e.g., proteinase K) to remove at least a portion of the biological sample from the substrate. In some embodiments, the removal step can include ablation of the tissue (e.g., laser ablation).

In some embodiments, provided herein are methods for spatially detecting an analyte (e.g., detecting the location of an analyte, e.g., a biological analyte) from a biological sample (e.g., present in a biological sample), the method comprising: (a) optionally staining and/or imaging a biological sample on a substrate; (b) permeabilizing (e.g., providing a solution comprising a permeabilization reagent to) the biological sample on the substrate; (c) contacting the biological sample with an array comprising a plurality of capture probes, wherein the capture probes capture the biological analyte; and (d) analyzing the captured biological analyte, thereby spatially detecting the biological analyte; wherein the biological sample is fully or partially removed from the substrate.

In some embodiments, a biological sample is not removed from the substrate. For example, the biological sample is not removed from the substrate prior to releasing a capture probe (e.g., a capture probe bound to an analyte) from the substrate. In some embodiments, such releasing comprises cleavage of the capture probe from the substrate (e.g., via a cleavage domain). In some embodiments, such releasing does not comprise releasing the capture probe from the substrate (e.g., a copy of the capture probe bound to an analyte can be made and the copy can be released from the substrate, e.g., via denaturation). In some embodiments, the biological sample is not removed from the substrate prior to analysis of an analyte bound to a capture probe after it is released from the substrate. In some embodiments, the biological sample remains on the substrate during removal of a capture probe from the substrate and/or analysis of an analyte bound to the capture probe after it is released from the substrate. In some embodiments, the biological sample remains on the substrate during removal (e.g., via denaturation) of a copy of the capture probe (e.g., complement). In some embodiments, analysis of an analyte bound to capture probe from the substrate can be performed without subjecting the biological sample to enzymatic and/or chemical degradation of the cells (e.g., permeabilized cells) or ablation of the tissue (e.g., laser ablation).

In some embodiments, at least a portion of the biological sample is not removed from the substrate. For example, a portion of the biological sample can remain on the substrate prior to releasing a capture probe (e.g., a capture probe bound to an analyte) from the substrate and/or analyzing an analyte bound to a capture probe released from the substrate. In some embodiments, at least a portion of the biological sample is not subjected to enzymatic and/or chemical degradation of the cells (e.g., permeabilized cells) or ablation of the tissue (e.g., laser ablation) prior to analysis of an analyte bound to a capture probe from the substrate.

In some embodiments, provided herein are methods for spatially detecting an analyte (e.g., detecting the location of an analyte, e.g., a biological analyte) from a biological sample (e.g., present in a biological sample) that include: (a) optionally staining and/or imaging a biological sample on a substrate; (b) permeabilizing (e.g., providing a solution comprising a permeabilization reagent to) the biological sample on the substrate; (c) contacting the biological sample with an array comprising a plurality of capture probes, wherein a capture probe of the plurality captures the biological analyte; and (d) analyzing the captured biological analyte, thereby spatially detecting the biological analyte; where the biological sample is not removed from the substrate.

In some embodiments, provided herein are methods for spatially detecting a biological analyte of interest from a biological sample that include: (a) staining and imaging a biological sample on a substrate; (b) providing a solution comprising a permeabilization reagent to the biological sample on the substrate; (c) contacting the biological sample with an array on a substrate, wherein the array comprises one or more capture probe pluralities thereby allowing the one or more pluralities of capture probes to capture the biological analyte of interest; and (d) analyzing the captured biological analyte, thereby spatially detecting the biological analyte of interest; where the biological sample is not removed from the substrate.

In some embodiments, the method further includes subjecting a region of interest in the biological sample to spatial transcriptomic analysis. In some embodiments, one or more of the capture probes includes a capture domain. In some embodiments, one or more of the capture probes comprises a unique molecular identifier (UMI). In some embodiments, one or more of the capture probes comprises a cleavage domain. In some embodiments, the cleavage domain comprises a sequence recognized and cleaved by uracil-DNA glycosylase, apurinic/apyrimidinic (AP) endonuclease (APE1), uracil-specific excision reagent (USER), and/or an endonuclease VIII. In some embodiments, one or more capture probes do not comprise a cleavage domain and are not cleaved from the array.

In some embodiments, a capture probe can be extended (an “extended capture probe,” e.g., as described herein). For example, extending a capture probe can include generating cDNA from a captured (hybridized) RNA. This process involves synthesis of a complementary strand of the hybridized nucleic acid, e.g., generating cDNA based on the captured RNA template (the RNA hybridized to the capture domain of the capture probe). Thus, in an initial step of extending a capture probe, e.g., the cDNA generation, the captured (hybridized) nucleic acid, e.g., RNA, acts as a template for the extension, e.g., reverse transcription, step.

In some embodiments, the capture probe is extended using reverse transcription. For example, reverse transcription includes synthesizing cDNA (complementary or copy DNA) from RNA (e.g., messenger RNA) using a reverse transcriptase. In some embodiments, reverse transcription is performed while the tissue is still in place, generating an analyte library, where the analyte library includes the spatial barcodes from the adjacent capture probes. In some embodiments, the capture probe is extended using one or more DNA polymerases.

In some embodiments, a capture domain of a capture probe includes a primer for producing the complementary strand of a nucleic acid hybridized to the capture probe, e.g., a primer for DNA polymerase and/or reverse transcription. The nucleic acid, e.g., DNA and/or cDNA, molecules generated by the extension reaction incorporate the sequence of the capture probe. The extension of the capture probe, e.g., a DNA polymerase and/or reverse transcription reaction, can be performed using a variety of suitable enzymes and protocols.

In some embodiments, a full-length DNA (e.g., cDNA) molecule is generated. In some embodiments, a “full-length” DNA molecule refers to the whole of the captured nucleic acid molecule. However, if a nucleic acid (e.g., RNA) was partially degraded in the tissue sample, then the captured nucleic acid molecules will not be the same length as the initial RNA in the tissue sample. In some embodiments, the 3’ end of the extended probes, e.g., first strand cDNA molecules, is modified. For example, a linker or adaptor can be ligated to the 3’ end of the extended probes. This can be achieved using single stranded ligation enzymes such as T4 RNA ligase or Circligase™ (available from Lucigen, Middleton, WI). In some embodiments, template switching oligonucleotides are used to extend cDNA in order to generate a full-length cDNA (or as close to a full-length cDNA as possible). In some embodiments, a second strand synthesis helper probe (a partially double stranded DNA molecule capable of hybridizing to the 3’ end of the extended capture probe), can be ligated to the 3’ end of the extended probe, e.g., first strand cDNA, molecule using a double stranded ligation enzyme such as T4 DNA ligase. Other enzymes appropriate for the ligation step are known in the art and include, e.g., Tth DNA ligase, Taq DNA ligase, Thermococcus sp. (strain 9°N) DNA ligase (9°N™ DNA ligase, New England Biolabs), Ampligase™ (available from Lucigen, Middleton, WI), and SplintR (available from New England Biolabs, Ipswich, MA). In some embodiments, a polynucleotide tail, e.g., a poly(A) tail, is incorporated at the 3’ end of the extended probe molecules. In some embodiments, the polynucleotide tail is incorporated using a terminal transferase active enzyme.

In some embodiments, double-stranded extended capture probes are treated to remove any unextended capture probes prior to amplification and/or analysis, e.g., sequence analysis. This can be achieved by a variety of methods, e.g., using an enzyme to degrade the unextended probes, such as an exonuclease enzyme, or purification columns.

In some embodiments, extended capture probes are amplified to yield quantities that are sufficient for analysis, e.g., via DNA sequencing. In some embodiments, the first strand of the extended capture probes (e.g., DNA and/or cDNA molecules) acts as a template for the amplification reaction (e.g., a polymerase chain reaction).

In some embodiments, the amplification reaction incorporates an affinity group onto the extended capture probe (e.g., RNA-cDNA hybrid) using a primer including the affinity group. In some embodiments, the primer includes an affinity group and the extended capture probes include the affinity group. The affinity group can correspond to any of the affinity groups described previously.

In some embodiments, the extended capture probes including the affinity group can be coupled to a substrate specific for the affinity group. In some embodiments, the substrate can include an antibody or antibody fragment. In some embodiments, the substrate includes avidin or streptavidin and the affinity group includes biotin. In some embodiments, the substrate includes maltose and the affinity group includes maltose-binding protein. In some embodiments, the substrate includes maltose-binding protein and the affinity group includes maltose. In some embodiments, amplifying the extended capture probes can function to release the extended probes from the surface of the substrate, insofar as copies of the extended probes are not immobilized on the substrate.

In some embodiments, the extended capture probe or complement or amplicon thereof is released. The step of releasing the extended capture probe or complement or amplicon thereof from the surface of the substrate can be achieved in a number of ways. In some embodiments, an extended capture probe or a complement thereof is released from the array by nucleic acid cleavage and/or by denaturation (e.g., by heating to denature a double-stranded molecule). In some embodiments, the extended capture probe or complement or amplicon thereof is released from the surface of the substrate (e.g., array) by physical means. For example, where the extended capture probe is indirectly immobilized on the array substrate, e.g., via hybridization to a surface probe, it can be sufficient to disrupt the interaction between the extended capture probe and the surface probe. Methods for disrupting the interaction between nucleic acid molecules, including denaturing double stranded nucleic acid molecules, are known in the art. A straightforward method for releasing the DNA molecules (i.e., of stripping the array of extended probes) is to use a solution that interferes with the hydrogen bonds of the double stranded molecules. In some embodiments, the extended capture probe is released by applying a heated solution, such as water or buffer, of at least 85°C, e.g., at least 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99°C. In some embodiments, a solution including salts, surfactants, etc. that can further destabilize the interaction between the nucleic acid molecules is added to release the extended capture probe from the substrate.

In some embodiments, where the extended capture probe includes a cleavage domain, the extended capture probe is released from the surface of the substrate by cleavage. For example, the cleavage domain of the extended capture probe can be cleaved by any of the methods described herein. In some embodiments, the extended capture probe is released from the surface of the substrate, e.g., via cleavage of a cleavage domain in the extended capture probe, prior to the step of amplifying the extended capture probe.

In some embodiments, where a sample is barcoded directly via hybridization with capture probes or analyte capture agents hybridized, bound, or associated with either the cell surface, or introduced into the cell, as described above, sequencing can be performed on the intact sample.

A wide variety of different sequencing methods can be used to analyze the barcoded analyte or moiety. In general, sequenced polynucleotides can be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA or DNA/RNA hybrids, and nucleic acid molecules with a nucleotide analog).

Sequencing of polynucleotides can be performed by various systems. More generally, sequencing can be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR and droplet digital PCR (ddPCR), quantitative PCR, real time PCR, multiplex PCR, PCR-based single plex methods, emulsion PCR), and/or isothermal amplification. Non-limiting examples of methods for sequencing genetic material include, but are not limited to, DNA hybridization methods (e.g., Southern blotting), restriction enzyme digestion methods, Sanger sequencing methods, next-generation sequencing methods (e.g., single-molecule real-time sequencing, sequence by synthesis sequencing, nanopore sequencing, and Polony sequencing), ligation methods, and microarray methods.

(iii) Processing of Nucleic Acid Libraries

In some embodiments, after creation of the library of nucleic acids (e.g., plurality of nucleic acids), one or more bait oligonucleotides (e.g., from one or more panels as described herein) are hybridized to a plurality of nucleic acids. The nucleic acids include all or a portion of a sequence of an analyte of interest or a complement thereof and/or include all or a portion of a spatial barcode of interest or a complement thereof. In some embodiments, one or more bait oligonucleotides are hybridized to the nucleic acid including all or a portion of a sequence of an analyte of interest or a complement thereof.

In some instances, one or more libraries of nucleic acids are pooled. In some instances, one or more libraries are incubated with Cot DNA (e.g., human Cot DNA). In some instances, one or more libraries are incubated with universal blocker nucleic acids, which hybridize to one or more well-expressed nucleic acids to prevent hybridization of unwanted nucleic acids to bait oligonucleotides.

In other embodiments, prior to hybridization of the bait oligonucleotides, the library, the nucleic acid(s), or the enriched nucleic acid(s) can be quantified using quantitative PCR (qPCR). In some embodiments, the library, the nucleic acid(s), or the enriched nucleic acid(s) can be fragmented. In some embodiments, the library, the nucleic acid(s), or the enriched nucleic acid(s) can be fragmented by enzyme-based methods (e.g., by restriction enzymes, nicking enzymes and/or transposases). In some embodiments, the library, the nucleic acid(s), or the enriched nucleic acid(s) can be fragmented by endonucleases. In some embodiments, the library, the nucleic acid(s), or the enriched nucleic acid(s) can be fragmented by mechanical shearing (e.g., acoustic shearing, hydrodynamic shearing, and/or nebulization). In some embodiments, the library, the nucleic acid(s), or the enriched nucleic acid(s) can be fragmented by combined enzyme-based methods and mechanical shearing. In some embodiments, the library, the nucleic acid(s), or the enriched nucleic acid(s) can be further processed via end-repair, poly-A tailing, or a combination thereof. In some embodiments, adaptors are ligated to each nucleic acid or enriched nucleic acid sequence. In some embodiments, the adaptors can be ligated to the 3’ end of the nucleic acid or enriched nucleic acid sequence. In some embodiments, the adaptors can be ligated to the 5’ end of the nucleic acid sequence or enriched nucleic acid sequence. In some embodiments, the adaptors can be nucleic acid sequences that add a function, e.g., spacer sequences, primer sequences/sites, barcode sequences, unique molecular identifier (UMI) sequences, linkers, and/or sequencing adaptors.

In some embodiments, the methods disclosed herein include sample index (SI) PCR, which adds nucleic acid sequences (e.g., barcodes) to the 5’ and/or 3’ ends of a nucleic acid sequence or an enriched nucleic acid sequence. In some instances, a SI-PCR reaction is performed at 67°C. In some embodiments, SI-PCR is a PCR reaction that introduces sample index sequences (e.g., i5 and i7) to the 5’ and/or 3’ ends of a nucleic acid sequence or an enriched nucleic acid sequence. In some embodiments, methods for SI-PCR add the i5 sample index sequence. In some embodiments, methods for SI-PCR add the i7 sample index sequence. In some embodiments, a P5 adapter is added to a nucleic acid sequence or an enriched nucleic acid sequence. In some embodiments, a P7 adapter is added to a nucleic acid sequence or an enriched nucleic acid sequence. In some embodiments, SI-PCR is performed before bait oligonucleotide enrichment of a nucleic acid of interest. In some embodiments, SI-PCR is performed after bait oligonucleotide enrichment of a nucleic acid of interest.

In some embodiments, the nucleic acid of interest (before or after enrichment using a bait oligonucleotide) or a library generated from the same can be dried. In some embodiments, drying includes a dehydrating process such as heat, a vacuum, lyophilization, desiccation, filtration, and air-drying. In some instances, a vacuum centrifuge is used to dry the sample. In some instances, drying is performed at about 50°C, about 55°C, about 60°C, about 63°C, about 65°C, about 67°C, about 70°C, or about 75°C. In some embodiments, drying can be performed for at least 1 hour, at least 2 hours, at least 3 hours or at least 4 hours. A sample can be stored (e.g., at -20°C) if the sample is not used immediately. In some embodiments, the nucleic acid of interest (before or after enrichment using a bait oligonucleotide) or a library generated from the same is not dried.

(d) Targeted Capture of Analytes using Hybridization of Bait Oligonucleotides

After preparation of a nucleic acid library from a biological sample, the library can be incubated with a plurality of bait oligonucleotides in order to selectively enrich the library for targets of interest. Target analytes can be isolated from the library, creating an enriched population of target analytes. While whole transcriptome spatial analysis is very informative, a more targeted gene enrichment allows for the spatial localization of a subset of targets that are of particular interest, for example, cancer- or disease-related genes and gene expression for the detection of cancer or disease. As such, a researcher can focus on a subset of genes of interest and maximize spatial knowledge of those genes while minimizing costs and reagents associated with spatial whole transcriptomic workflows.

(i) Design of Bait Oligonucleotides

In some embodiments, bait oligonucleotide sets are designed to target and hybridize to a plurality of nucleic acids (e.g., to prepared spatial libraries, e.g., to prepared cDNA libraries). In some embodiments, bait oligonucleotide sets hybridize to targeted nucleic acids (e.g., cDNA) from a more expansive set of library nucleic acids. In some embodiments, the hybridized product (e.g., the bait oligonucleotide and hybridized nucleic acid) is then captured by streptavidin beads. In some embodiments, the hybridized product (e.g., the bait oligonucleotide and hybridized nucleic acid) is then captured by avidin beads. Unhybridized nucleic acids are washed away. The targeted product is reamplified and sequenced. In some embodiments, the reamplified targeted product can be fragmented, ligated with adaptor sequences, and amplified by SI-PCR.

Disclosed herein are methods to design and test candidate bait oligonucleotide sequences. Candidate bait oligonucleotide sequences are designed so that each bait oligonucleotide sequence theoretically hybridizes to a unique target of interest. Accordingly, the designed bait oligonucleotides are at least 40 nucleotides in length. To identify a bait oligonucleotide of interest, unique 40 nucleotide sections of the human transcriptome are identified and aligned to the genome using an aligner designed for aligning RNA-seq data. In some embodiments, bait oligonucleotides are designed to hybridize to a particular exon. In some embodiments, bait oligonucleotides are designed to span an exon-exon junction. In some embodiments, bait oligonucleotides can hybridize to a target, allowing for identification of splicing and alternative splicing transcripts in the transcriptome. Using the alignments, sequences that align to the genome one or more times are identified and cataloged. Each bait designed can be tested against (i.e., compared to) a sequence identified in the genome. If the sequences in the bait oligonucleotide and in the genome do not match, then the bait can be tested in one or more panels as disclosed herein.

The present disclosure provides a method to design nucleic acid baits for full-length cDNA comprising obtaining a coding sequence for each transcript (e.g., isoform) of each target gene in a targeted gene panel. Where the coding sequence is less than a threshold length (e.g., 120 base pairs), a full mRNA sequence can be used instead. The method further comprises, for each 120 base pair sub-sequence in each coding sequence, obtaining a count of the number of transcripts in which the 120 base pair sub-sequence occurs. Each sub-sequence is ranked by the obtained count and a first sub-sequence is selected from the sub-sequence that occurs in the maximal number of transcripts for the respective gene. The first sub-sequence is further subjected to filtering criteria (e.g., uniqueness, mappability, absence of repetitive subsequences, and/or overall GC content). In some embodiments, if the first sub-sequence fails to satisfy one or more filtering criteria, the first sub-sequence is rejected and a new first sub-sequence is selected from the sub-sequence that occurs in the maximal number of transcripts for the respective gene that further satisfies the filtering criteria. In some embodiments, if the first sub-sequence fails to satisfy one or more filtering criteria, the first sub-sequence is modified (e.g., by truncating the first sub-sequence such that it satisfies the one or more filtering criteria and/or by shifting the first sub-sequence along the reference genome such that it satisfies the one or more filtering criteria).

The method further comprises selecting a second sub-sequence from the sub-sequence that occurs in the maximal number of remaining transcripts for the respective gene (e.g., other than the transcripts in which the first sub-sequence occurred). The method is iterated for all remaining transcripts until no transcripts remain (e.g., at least one sub-sequence in the plurality of selected sub-sequence occurs in each transcript in the plurality of transcripts for the target gene).
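The iterative selection described above can be illustrated with a minimal Python sketch of a greedy cover. This is not the disclosed implementation: the function names (`select_baits`, `passes_filters`, `gc_fraction`), the GC window of 0.3 to 0.7, and the homopolymer limit are hypothetical stand-ins for the filtering criteria, and the sketch does not model the fallback to a full mRNA sequence for short coding sequences.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_filters(seq, gc_min=0.3, gc_max=0.7, max_homopolymer=8):
    """Hypothetical filter: GC content window and no long single-base runs."""
    if not gc_min <= gc_fraction(seq) <= gc_max:
        return False
    return not any(base * max_homopolymer in seq for base in "ACGT")


def select_baits(transcripts, k=120):
    """Greedily pick k-length sub-sequences until every transcript is covered.

    transcripts: dict mapping a transcript name to its coding sequence.
    Returns (baits, uncovered), where uncovered holds any transcripts that
    no acceptable sub-sequence could cover.
    """
    # Record, for each k-length sub-sequence, the transcripts containing it.
    occurrences = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            occurrences[seq[i:i + k]].add(name)

    # Keep only the sub-sequences that satisfy the filtering criteria.
    candidates = {s: t for s, t in occurrences.items() if passes_filters(s)}

    remaining = set(transcripts)
    baits = []
    while remaining:
        # Choose the sub-sequence found in the most remaining transcripts;
        # ties are broken by sequence so the result is deterministic.
        best = max(
            candidates,
            key=lambda s: (len(candidates[s] & remaining), s),
            default=None,
        )
        if best is None or not candidates[best] & remaining:
            break  # nothing left that covers a remaining transcript
        baits.append(best)
        remaining -= candidates[best]
    return baits, remaining
```

With a toy input of three short transcripts and `k=6`, two of which share a sub-sequence, the sketch returns two baits: one shared by the first two transcripts and one specific to the third.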

The method provided in the present disclosure improves upon the current technology by utilizing full-length cDNA sequences rather than 3’ fragmented sequencing libraries, allowing access to larger regions of reliably annotated sequences for nucleic acid bait design. For example, full-length cDNA sequences comprise coding sequences, which are generally well-annotated, ensuring better targeted hybridization results and on-target rates.

Full-length cDNA sequences include common sequences shared across transcripts (e.g., isoforms), allowing the design of nucleic acid baits that can target multiple transcripts and reduce the number of nucleic acid baits required by 10-fold. The resulting plurality of nucleic acid baits can thus be downsized and streamlined, leading to higher efficiency and reduced costs for users desiring nucleic acid baits for large and/or custom targeted gene panels. As an example, whereas a plurality of nucleic acid baits conventionally “tiled” for 1X coverage of a 1000-gene panel would comprise between 38,000 and 68,000 nucleic acid baits, a plurality of nucleic acid baits designed using the presently disclosed method for the same 1000-gene panel would comprise between 2,500 and 4,000 nucleic acid baits. Furthermore, the presently disclosed method can be performed for any application in which targeted analysis is desired (e.g., single cell RNA sequencing and/or spatial gene expression profiling). In some instances, the plurality of nucleic acid baits comprises at least 500, at least 1000, at least 2000, at least 3000, at least 4000, or at least 5000 nucleic acid baits.

In some instances, each respective nucleic acid bait in the plurality of nucleic acid baits that hybridizes to a cDNA sequence mapping to a respective gene in the plurality of genes (i) selectively hybridizes to a first subset of transcripts, in a corresponding plurality of subsets of transcripts in the plurality of transcripts corresponding to the respective gene, or (ii) selectively hybridizes to another subset of transcripts, other than the first subset of transcripts, in the corresponding plurality of subsets of transcripts in the plurality of transcripts corresponding to the respective gene. Each respective transcript in the corresponding plurality of transcripts of each respective gene in the plurality of genes is hybridizable to a nucleic acid bait in the plurality of nucleic acid baits.

For example, a nucleic acid bait can hybridize to (e.g., can comprise nucleic acid sequences complementary to) one or more nucleic acid sequences corresponding to a target gene. In some cases, the one or more nucleic acid sequences that hybridize to the nucleic acid bait represent a corresponding one or more transcripts, or isoforms, of the target gene. As used herein, a subset of transcripts (e.g., a subset of isoforms) is defined as the group of transcripts (e.g., isoforms) for the target gene that hybridize to a respective nucleic acid bait.

In some embodiments, a nucleic acid bait that hybridizes to a cDNA sequence for a target gene selectively hybridizes to each transcript in the plurality of transcripts for the respective gene. For example, a single nucleic acid bait can be hybridized to all isoforms for a target gene, such that the first subset of isoforms consists of the plurality of isoforms for the target gene. In some such instances, no other subset of isoforms is defined (e.g., the plurality of isoforms can be grouped into only a single or first subset of isoforms).

In some instances, the plurality of transcripts for a target gene can be subdivided into a plurality of subsets of transcripts. The plurality of subsets of transcripts can include a first and a second subset of transcripts.

In some embodiments, each subset of transcripts in the plurality of subsets of transcripts is defined as the group of transcripts to which a respective nucleic acid bait hybridizes. In such cases, the first subset of transcripts is defined as the group of transcripts to which at least a first nucleic acid bait hybridizes, and the second subset of transcripts is defined as the group of transcripts to which at least a second nucleic acid bait hybridizes. For example, a first group of transcripts (e.g., isoforms) for the target gene that hybridize to a first nucleic acid bait is defined as a first subset of transcripts (e.g., a first subset of isoforms). Furthermore, a second group of transcripts (e.g., isoforms) for the target gene that hybridize to a second nucleic acid bait is defined as a second subset of transcripts (e.g., a second subset of isoforms).

In some embodiments, the second subset of transcripts consists of transcripts not included in the first subset of transcripts (e.g., a first subset and a second subset of isoforms can comprise mutually exclusive groups of isoforms).

In some instances, a first subset of isoforms can be selected by counting matches for each possible sub-sequence, of a given length, of a candidate bait sequence (e.g., sub-sequences of a target gene coding sequence or mRNA sequence).

Hybridization matching between each possible bait sub-sequence and each position across each isoform is performed iteratively for every isoform for the respective gene. The number of isoforms that match (e.g., comprise a complementary sequence to) each bait sub-sequence is tallied, the bait sub-sequence with the highest number of matches is selected as the first nucleic acid bait, and the subset of isoforms that match the first nucleic acid bait is defined as the first subset of isoforms. In some instances, when the corresponding first subset of isoforms fails to account for all the isoforms for the target gene, the process is repeated for all remaining isoforms that failed to match with the first nucleic acid bait. Thus, the bait sub-sequence with the highest number of matches to the remaining isoforms (e.g., the isoforms not included in the first subset of isoforms) is selected as the second nucleic acid bait, and the second subset of isoforms that match the second nucleic acid bait is defined as the second subset of isoforms.

In some instances, the process can be repeated as many times as necessary until a plurality of nucleic acid baits is identified such that each transcript in the plurality of transcripts for the respective gene is hybridizable to at least one nucleic acid bait in the plurality of nucleic acid baits.

In some cases, the plurality of subsets of transcripts includes at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten subsets of transcripts. In some embodiments, each subset of transcripts corresponds to a single nucleic acid bait. In some alternative embodiments, each subset of transcripts corresponds to a plurality of nucleic acid baits.

In some embodiments, a subset of transcripts other than the first subset of transcripts consists of one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more transcripts. In some embodiments, the plurality of nucleic acid baits comprises at least 2 x 10^3, at least 3 x 10^3, at least 4 x 10^3, at least 5 x 10^3, at least 1 x 10^4, at least 2 x 10^4, at least 3 x 10^4, at least 4 x 10^4, at least 5 x 10^4, at least 6 x 10^4, at least 7 x 10^4, or at least 1 x 10^5 nucleic acid baits.

In some embodiments, the plurality of nucleic acid baits includes a minimum number of baits necessary to selectively hybridize to each respective transcript in the corresponding plurality of transcripts for a respective gene in the plurality of genes. For example, each respective transcript in the corresponding plurality of transcripts of each respective gene in the plurality of genes is hybridizable to at least one nucleic acid bait in the plurality of nucleic acid baits.

In some embodiments, each nucleic acid bait in the plurality of nucleic acid baits is hybridizable to a single transcript in the plurality of transcripts, and the number of nucleic acid baits in the plurality of nucleic acid baits is equal to the number of transcripts in the plurality of transcripts for each respective gene in the plurality of genes. In some embodiments, each nucleic acid bait in the plurality of nucleic acid baits is hybridizable to a plurality of transcripts, and the number of nucleic acid baits in the plurality of nucleic acid baits is less than the number of transcripts in the plurality of transcripts for each respective gene in the plurality of genes.

In some such embodiments, the bait coverage for each respective transcript in the plurality of transcripts for the respective gene is less than 1X. In some such embodiments, the bait coverage for each respective isoform in the plurality of isoforms for the first genetic target is less than 0.8X, less than 0.6X, less than 0.4X, less than 0.2X, or less than 0.1X.

In some embodiments, each respective nucleic acid bait in the plurality of nucleic acid baits shares less than a threshold percentage of sequence identity to any other nucleic acid bait in the plurality of nucleic acid baits. For instance, in some embodiments, each respective nucleic acid bait in the plurality of nucleic acid baits shares less than 100 percent, less than 98 percent, less than 96 percent, less than 94 percent, less than 92 percent, less than 90 percent, less than 88 percent, less than 86 percent, less than 84 percent, less than 82 percent, less than 80 percent, less than 70 percent, less than 60 percent, less than 50 percent, or less than 40 percent identity to any other nucleic acid bait in the plurality of nucleic acid baits. In some embodiments, the threshold percentage of sequence identity is ten percent, twenty percent, thirty percent, or between five and fifty percent. In some embodiments, the threshold of shared sequence identity between a respective nucleic acid bait in the plurality of nucleic acid baits to any other nucleic acid bait in the plurality of nucleic acid baits determines the level of cross-hybridization of each respective nucleic acid bait to off-target sequence reads. In some embodiments, each respective nucleic acid bait in the plurality of nucleic acid baits comprises a nucleic acid sequence that has a minimal identity to the reference genome of at least 90%.
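As an illustration of the bait-to-bait identity threshold, the following Python sketch keeps a bait only if it shares less than a chosen percentage of identity with every bait already kept. It rests on simplifying assumptions: identity is computed position by position without gaps (a real design would more likely use a sequence aligner), inputs are non-empty, and the names `percent_identity` and `filter_by_identity` and the 90 percent default are hypothetical.

```python
def percent_identity(a, b):
    """Ungapped, position-by-position identity over the shorter sequence.

    A simplification: no alignment gaps or offsets are considered.
    """
    n = min(len(a), len(b))
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def filter_by_identity(baits, threshold=90.0):
    """Keep baits that share less than `threshold` percent identity
    with every bait that has already been kept."""
    kept = []
    for bait in baits:
        if all(percent_identity(bait, other) < threshold for other in kept):
            kept.append(bait)
    return kept
```

Because earlier baits take precedence, the order of the input list matters in this sketch; sorting candidates by priority before filtering is one way to control which member of a similar pair survives.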

In some embodiments, each respective nucleic acid bait in the plurality of nucleic acid baits comprises a nucleic acid sequence that has a minimal identity to the reference genome of at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%.

In some embodiments, each respective nucleic acid bait in the plurality of nucleic acid baits that hybridizes to a transcript in the plurality of transcripts of the respective gene has a Tm with respect to the transcript that is between a first threshold temperature and a second threshold temperature. In some such embodiments, the first threshold temperature is between 55°C and 85°C and the second threshold temperature is between 90°C and 110°C. In some instances, hybridization occurs at 65°C. In some instances, hybridization occurs at 60°C.
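A Tm window of this kind can be sketched in Python as follows. The estimate uses a basic GC-content formula for oligonucleotides longer than about 13 nucleotides, Tm = 64.9 + 41 x (G + C - 16.4) / N, which ignores salt concentration and nearest-neighbor effects; it is chosen here only for illustration and is not stated in the disclosure. The function names and the 70°C and 100°C defaults are hypothetical values picked from within the ranges given above.

```python
def basic_tm(seq):
    """Rough Tm estimate (degrees C) from GC count and length.

    Assumes an oligonucleotide longer than about 13 nt; ignores salt
    concentration and nearest-neighbor thermodynamics.
    """
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def in_tm_window(seq, low=70.0, high=100.0):
    """True if the estimated Tm lies between the two threshold temperatures."""
    return low <= basic_tm(seq) <= high
```

Under this formula, a 120-nucleotide bait with 50% GC content has an estimated Tm of about 79.8°C and falls inside the illustrative window, whereas a 120-nucleotide bait with no G or C bases (about 59.3°C) does not.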

In some embodiments, each respective nucleic acid bait in the plurality of nucleic acid baits hybridizes to a region of the respective gene that is at least a minimum threshold distance away from any annotated start and/or stop sites of the respective gene. In some embodiments, a respective nucleic acid bait in a plurality of nucleic acid baits for a respective sequence read in a plurality of sequence reads mapping to a respective gene is located at a position that is at least a minimum threshold distance from the 3’ end of the respective sequence read. In some such embodiments, off-target hybridization of a respective nucleic acid bait to a respective cDNA sequence can occur where there are unannotated poly-A sites or poly-A sequences present in the genomic exon and/or mRNA sequence that cause oligo-dT mispriming. As a result, in some such embodiments, the optimal position for nucleic acid bait hybridization to a corresponding cDNA sequence is located at a position that is at least a minimum threshold distance from the 3’ end. Non-limiting examples of a minimum threshold distance are from 100 to 200 base pairs (bp), from 200 to 300 bp, from 300 to 400 bp, from 400 to 500 bp, from 500 to 600 bp, from 600 to 700 bp, from 700 to 800 bp, from 800 to 900 bp, from 900 to 1000 bp, or more than 1000 bp. In some embodiments, the percentage of cDNA sequences that preferentially hybridize to nucleic acid baits at positions at least a minimum threshold distance away from the 3’ end is between 0% and 10%, between 10% and 20%, or between 20% and 30%. In addition, in some embodiments, nucleic acid baits comprising unannotated poly-A sites or poly-A sequences in the mRNA sequence are removed from the plurality of nucleic acid baits. In some instances, if the sequences in the bait oligonucleotide and in the genome match, then a modified bait is designed. 
To prepare a modified bait, one can slide the initial sequence +/- 40bp from the original position to identify a potentially new bait oligonucleotide. With each design, the new bait oligonucleotide is tested against the genome. After all such candidates are cataloged, the bait oligonucleotides that are ultimately included in one or more panels described herein are ordered (i.e., ranked) based on the bait oligonucleotide length (i.e., a longer bait is prioritized) and then by distance to the original intended position (with priority to sequences closer to the original intended position). However, if no bait meets the required criteria, that bait is dropped and no bait is designed at that position.
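The redesign and ranking steps above can be sketched in Python as follows. The sketch assumes the caller supplies an acceptance test (`is_acceptable`, a hypothetical callback standing in for the comparison against the genome) and that shorter candidates arise by truncating the bait at its 3' end; `redesign_bait` and its parameters are illustrative names, not part of the disclosure.

```python
def redesign_bait(transcript, start, length, is_acceptable,
                  max_shift=40, min_length=40):
    """Slide a bait window up to +/- max_shift positions and rank candidates.

    Candidates are ranked first by length (longer is preferred) and then by
    distance from the original start position (closer is preferred).
    Returns (sequence, new_start), or None if no candidate is acceptable,
    in which case no bait is designed at this position.
    """
    candidates = []
    for shift in range(-max_shift, max_shift + 1):
        new_start = start + shift
        if new_start < 0:
            continue
        # Try the full length first, then progressively truncated baits.
        for new_len in range(length, min_length - 1, -1):
            end = new_start + new_len
            if end > len(transcript):
                continue
            seq = transcript[new_start:end]
            if is_acceptable(seq):
                candidates.append((seq, new_start))
    if not candidates:
        return None
    candidates.sort(key=lambda c: (-len(c[0]), abs(c[1] - start)))
    return candidates[0]
```

For example, with an acceptance test that rejects any sequence containing a run of five or more A bases, a 60-nucleotide bait whose original window overlaps a poly-A stretch is moved to the nearest position at which a full-length window avoids such a run.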

In some embodiments, the bait oligonucleotide sequence is 40 nucleotides long. In some embodiments, the bait oligonucleotide sequence is between 40 and 160 nucleotides long. In some embodiments, the bait oligonucleotide sequence is between 40 and 120 nucleotides long. In some embodiments, the bait oligonucleotide sequence is about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, about 66, about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, about 75, about 76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, about 84, about 85, about 86, about 87, about 88, about 89, about 90, about 91, about 92, about 93, about 94, about 95, about 96, about 97, about 98, about 99, about 100, about 101, about 102, about 103, about 104, about 105, about 106, about 107, about 108, about 109, about 110, about 111, about 112, about 113, about 114, about 115, about 116, about 117, about 118, about 119, about 120, about 121, about 122, about 123, about 124, about 125, about 126, about 127, about 128, about 129, about 130, about 131, about 132, about 133, about 134, about 135, about 136, about 137, about 138, about 139, about 140, about 141, about 142, about 143, about 144, about 145, about 146, about 147, about 148, about 149, about 150, about 151, about 152, about 153, about 154, about 155, about 156, about 157, about 158, about 159, or about 160 nucleotides long. In some instances, bait oligonucleotides are single stranded, 120-nucleotide-long DNA oligonucleotides with a 5’ biotin modification. In some instances, each bait targets a unique library molecule. Bait oligonucleotides can span all mature mRNA sequences, including UTRs and all annotated isoforms.

In some embodiments, a bait oligonucleotide of the plurality of bait oligonucleotides includes a domain that binds specifically to all or a portion of the spatial barcode or a complement thereof. In some embodiments, a bait oligonucleotide of the plurality of bait oligonucleotides includes a domain that binds specifically to all or a portion of the sequence of the analyte from the biological sample, or a complement thereof. In some embodiments, a bait oligonucleotide of the plurality of bait oligonucleotides includes a domain that binds specifically to all or a portion of the spatial barcode or a complement thereof and all or a portion of the sequence of the analyte from the biological sample, or a complement thereof.

In some embodiments, the domain of the bait oligonucleotide hybridizes to an analyte of interest. In some embodiments, the domain of the bait oligonucleotide binds specifically to an analyte of interest. In some embodiments, the domain of the bait oligonucleotide binds specifically to all or a portion of the spatial barcode or a complement thereof. In some embodiments, the domain of the bait oligonucleotide binds specifically to all or a portion of the sequence of the analyte from the biological sample. In some embodiments, the domain of the bait oligonucleotide binds specifically to a 3’ portion of the sequence of the analyte from the biological sample or a complement thereof. In some embodiments, the domain of the bait oligonucleotide binds specifically to a 5’ portion of the sequence of the analyte from the biological sample or a complement thereof. In some embodiments, the domain of the bait oligonucleotide binds specifically to an intron in the sequence of the analyte from the biological sample or a complement thereof. In some embodiments, the domain of the bait oligonucleotide binds specifically to an exon in the sequence of the analyte from the biological sample or a complement thereof. In some embodiments, the domain of the bait oligonucleotide binds specifically to an untranslated 3’ region of the analyte from the biological sample or a complement thereof. In some embodiments, the domain of the bait oligonucleotide binds specifically to an untranslated 5’ region of the analyte from the biological sample or a complement thereof.

In some embodiments, the domain of the bait oligonucleotide sequence is 40 nucleotides long. In some embodiments, the domain of the bait oligonucleotide sequence is between 40 and 160 nucleotides long. In some embodiments, the domain of the bait oligonucleotide sequence is between 40 and 120 nucleotides long. In some embodiments, the domain of the bait oligonucleotide sequence is about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, about 66, about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, about 75, about 76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, about 84, about 85, about 86, about 87, about 88, about 89, about 90, about 91, about 92, about 93, about 94, about 95, about 96, about 97, about 98, about 99, about 100, about 101, about 102, about 103, about 104, about 105, about 106, about 107, about 108, about 109, about 110, about 111, about 112, about 113, about 114, about 115, about 116, about 117, about 118, about 119, about 120, about 121, about 122, about 123, about 124, about 125, about 126, about 127, about 128, about 129, about 130, about 131, about 132, about 133, about 134, about 135, about 136, about 137, about 138, about 139, about 140, about 141, about 142, about 143, about 144, about 145, about 146, about 147, about 148, about 149, about 150, about 151, about 152, about 153, about 154, about 155, about 156, about 157, about 158, about 159, or about 160 nucleotides long.

In some embodiments, the analyte from the biological sample is associated with a disease or condition. In some embodiments, the analyte from the biological sample comprises a mutation. In some embodiments, the analyte from the biological sample comprises a single nucleotide polymorphism (SNP). In some embodiments, the analyte from the biological sample comprises a trinucleotide repeat.

In some embodiments, the domain of the bait oligonucleotide hybridizes to a particular exon of a transcript (i.e., an mRNA molecule). For example, a transcript can be processed such that an exon that would otherwise be excised in a normal setting is included in the mature mRNA product in a different setting (e.g., a pathological setting such as cancer). In some embodiments, the domain of the bait oligonucleotide identifies (e.g., hybridizes to) one or more isoforms of an analyte, but not others. In some embodiments, for example, a bait oligonucleotide hybridizes to a particular exon that is detected in a pathological setting (e.g., cancer).

In some embodiments, there is more than one bait oligonucleotide for a particular analyte of interest in a panel. For example, an analyte can undergo alternative splicing (e.g., as in an mRNA molecule). In some embodiments, different baits could detect and enhance particular transcripts, thereby detecting whether a particular exon is included in an analyte. Thus, in some embodiments, a panel will include at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more bait oligonucleotides for one analyte.

In some embodiments, a bait oligonucleotide is fully complementary (i.e., 100% complementary) to a portion of a target analyte. In some embodiments, a bait oligonucleotide is partially complementary (i.e., less than 100% complementary) to a portion of a target analyte. In some embodiments, a bait oligonucleotide has at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a portion of a target analyte.

In some embodiments, a bait oligonucleotide is partially complementary (i.e., less than 100% complementary) to a portion of a target analyte. In some embodiments, part of the bait oligonucleotide hybridizes to a portion of a target analyte. In some embodiments, part of the bait oligonucleotide has at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a portion of a target analyte.
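By way of illustration only, the degree of complementarity between a bait oligonucleotide and a portion of a target analyte could be estimated as in the following minimal sketch. The function names (`reverse_complement`, `percent_complementarity`) are hypothetical and are not part of the disclosure; the sketch assumes DNA sequences of equal length and an ungapped, position-by-position comparison, which is a simplification of how hybridization would be assessed in practice.

```python
# Hypothetical helper names; DNA alphabet only, ungapped comparison (assumptions).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(bait: str, target_region: str) -> float:
    """Percentage of bait positions that base-pair with the target region.

    The bait is compared, position by position, with the reverse complement
    of an equal-length target region.
    """
    if not bait or len(bait) != len(target_region):
        raise ValueError("bait and target region must be non-empty and equal in length")
    expected = reverse_complement(target_region)
    matches = sum(b == e for b, e in zip(bait.upper(), expected))
    return 100.0 * matches / len(bait)


target = "ACGTACGTAC"
fully_complementary_bait = reverse_complement(target)
print(percent_complementarity(fully_complementary_bait, target))
```

Under these assumptions, a bait that scores 100 would correspond to a fully complementary bait oligonucleotide, while a score below 100 would correspond to a partially complementary bait oligonucleotide.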

Bait oligonucleotides are included in one or more panels (e.g., cohorts). Panels include groups of bait oligonucleotides that target a particular cohort of analytes. For example, a panel can include a set of bait oligonucleotides that is particular to a pathological setting. In some embodiments, more than one panel (e.g., set of bait oligonucleotides) can be used to enhance detection of analytes. For example, in some embodiments, at least 1, at least 2, at least 3, at least 4, at least 5, or more panels (e.g., sets of bait oligonucleotides) can be used to enhance detection of analytes. The bait oligonucleotides and target analytes for all panels disclosed are provided in the accompanying sequence listing disclosed herein.

In some embodiments, a panel includes bait oligonucleotides that hybridize and enhance detection of analytes dysregulated in cancer (e.g., a cancer panel). In some embodiments, the cancer panel allows for quantitative analysis of analytes aberrantly expressed in the cancer transcriptome while avoiding the excess costs and time associated with sequencing an entire exome. The bait oligonucleotides and target analytes for an exemplary cancer panel are provided in Table 1 and in the accompanying sequence listing disclosed herein. In some embodiments, the bait oligonucleotides in the cancer panel can include detection of dysregulated analytes associated with biological processes such as apoptosis, metabolism, cell cycle (e.g., checkpoint analytes), DNA damage and repair, hypoxia, and stress toxicity. More specifically, the bait oligonucleotides in the cancer panel target cancer-specific analytes such as analytes that function in pathways that include but are not limited to the Myc pathway, Hippo pathway, RTK/RAS pathway, TP53 and TP53-associated pathway, TGFβ pathway, and Wnt pathway. The bait oligonucleotides in the cancer panel also can detect cancer hot spots, tumor suppressor analytes, and immune-dysregulated analytes. The bait oligonucleotides and target analytes for an exemplary cancer panel were disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety.

In some embodiments, a panel includes bait oligonucleotides that hybridize and enhance detection of analytes associated with immunological dysregulation (e.g., an immunology panel). In some embodiments, the immunology panel allows for quantitative analysis of analytes associated with immune-dysregulation while avoiding the excess costs and time associated with sequencing an entire exome. The bait oligonucleotides and target analytes for an exemplary immunology panel are provided in Table 2 and in the accompanying sequence listing disclosed herein. In some embodiments, the bait oligonucleotides in the immunology panel can include detection of dysregulated analytes associated with biological processes such as B cell function, T cell function, cell cycle, cell signaling, interleukin signaling, and metabolism. More specifically, bait oligonucleotides target analytes that include but are not limited to transcription factors, T cell activation markers, antigen presentation genes, metabolic genes, and SIRP family members. In some embodiments, the immunology panel has applications to detect analytes associated with immunological dysregulation in order to examine innate immunity, adaptive immunity, one or more inflammatory responses, one or more infectious diseases (e.g., a bacterial infection; a viral infection), and immune response in a transplant recipient. More specifically, the bait oligonucleotides in the immunology panel target immune-specific biomarkers that include but are not limited to lineage markers, tissue markers, and cancer markers. In some embodiments, the bait oligonucleotides enhance detection of analytes expressed in bone marrow, gut, lung, salivary gland, intestine, lymph node, stem cells, or a combination thereof. The bait oligonucleotides and target analytes for an exemplary immunology panel were disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety.

In some embodiments, a panel includes bait oligonucleotides that hybridize and enhance detection of analytes that detect pathway dysregulation (e.g., a pathway panel or a gene signature panel). In some embodiments, the pathway panel allows for quantitative analysis of analytes associated with pathway dysregulation while avoiding the excess costs and time associated with sequencing an entire exome. The bait oligonucleotides and target analytes for an exemplary pathway panel are provided in Table 3 and in the accompanying sequence listing disclosed herein. In some embodiments, the bait oligonucleotides in the pathway panel can include detection of dysregulated analytes associated with complex signal transduction pathways. The target analytes include analytes specific to disease or drug targets; one or more G-protein coupled receptors (GPCRs); one or more kinases; one or more epigenetic markers; or one or more checkpoint analytes. Analytes detected by bait oligonucleotides in the pathway panel include but are not limited to tissue markers of cancer dysregulation, central nervous system dysregulation, inflammation dysregulation, metabolic dysregulation, cardiovascular dysregulation, respiratory dysregulation, and reproductive dysregulation. The bait oligonucleotides and target analytes for an exemplary pathway panel were disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety.

In some embodiments, a panel includes bait oligonucleotides that hybridize and enhance detection of analytes that detect neurological development and/or dysregulation. In some embodiments, a panel includes bait oligonucleotides that hybridize and enhance detection of analytes that detect neurological development. In some embodiments, a panel includes bait oligonucleotides that hybridize and enhance detection of analytes that detect neurological dysregulation. In some embodiments, a panel includes bait oligonucleotides that hybridize and enhance detection of analytes that detect neurological development and dysregulation. In some embodiments, the neurological panel allows for quantitative analysis of analytes associated with pathway dysregulation while avoiding the excess costs and time associated with sequencing an entire transcriptome. The bait oligonucleotides and target analytes for an exemplary neurological panel are provided in Table 4 and in the accompanying sequence listing disclosed herein. In some embodiments, the bait oligonucleotides in the neurological panel can include detection of dysregulated analytes associated with axonal targeting, hypoxia, and glioblastoma. In some embodiments, the bait oligonucleotides in the neurological panel can include detection of dysregulated analytes associated with axonal targeting. In some embodiments, the bait oligonucleotides in the neurological panel can include detection of dysregulated analytes associated with hypoxia. In some embodiments, the bait oligonucleotides in the neurological panel can include detection of dysregulated analytes associated with glioblastoma. In some embodiments, the bait oligonucleotides in the neurological panel can include detection of mitochondrial protein-coding genes. In some embodiments, the bait oligonucleotides in the neurological panel can include detection of mitochondrial protein-coding genes in order to evaluate energy metabolism.

In some instances, additional (i.e., custom) baits can be added to each panel. Further, in some instances, a fully-custom panel can be prepared.

In some embodiments, any one of the panels disclosed herein includes bait oligonucleotides that can enhance detection of at least about 100 analytes, about 200 analytes, about 300 analytes, about 400 analytes, about 500 analytes, about 600 analytes, about 700 analytes, about 800 analytes, about 900 analytes, about 1000 analytes, about 1100 analytes, about 1200 analytes, about 1300 analytes, about 1400 analytes, about 1500 analytes, about 1600 analytes, about 1700 analytes, about 1800 analytes, about 1900 analytes, about 2000 analytes or more.

In some embodiments, a bait oligonucleotide enhances detection of an analyte of interest by about 1.5 fold, about 2 fold, about 3 fold, about 4 fold, about 5 fold, about 6 fold, about 7 fold, about 8 fold, about 9 fold, about 10 fold, about 20 fold, about 50 fold, about 100 fold, about 500 fold, about 1000 fold, or greater in comparison to analytes that are not detected using a bait oligonucleotide.
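By way of illustration only, one way a fold enhancement of this kind might be quantified is as the ratio of the fraction of sequencing reads assigned to an analyte of interest in a bait-enriched library to the corresponding fraction in a library prepared without bait oligonucleotides. This read-fraction metric, the function name `fold_enrichment`, and the example read counts are assumptions made for the sketch and are not taken from the disclosure.

```python
def fold_enrichment(
    target_reads_enriched: int,
    total_reads_enriched: int,
    target_reads_control: int,
    total_reads_control: int,
) -> float:
    """Ratio of on-target read fractions (enriched library / control library).

    Assumed metric for illustration; the counts are hypothetical inputs.
    """
    if total_reads_enriched <= 0 or total_reads_control <= 0:
        raise ValueError("total read counts must be positive")
    if target_reads_control <= 0:
        raise ValueError("control library must contain at least one target read")
    fraction_enriched = target_reads_enriched / total_reads_enriched
    fraction_control = target_reads_control / total_reads_control
    return fraction_enriched / fraction_control


# Hypothetical counts: 50% of reads on target after enrichment vs. 5% before.
print(fold_enrichment(500, 1000, 50, 1000))
```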

In some embodiments, a bait oligonucleotide includes a molecular tag. As disclosed herein, a molecular tag of a bait oligonucleotide is affixed to (e.g., conjugated to) the nucleic acid sequence of the bait oligonucleotide. In some embodiments, the molecular tag includes one or more moieties. In some embodiments, the moiety includes a label as described herein. The label can allow detection of the hybridization of a bait oligonucleotide to an analyte. In some embodiments, the label is directly associated with (i.e., conjugated to) the bait oligonucleotide. The detectable label can be directly detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, can be indirectly detectable, e.g., by catalyzing chemical alterations of a chemical substrate compound or composition, which chemical substrate compound or composition is directly detectable. Detectable labels can be suitable for small scale detection and/or suitable for high-throughput screening. As such, suitable detectable labels include, but are not limited to, radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes.

In some embodiments, the molecular tag includes a small molecule. In some embodiments, the molecular tag includes a nucleic acid. In some embodiments, the nucleic acid is single-stranded. In some embodiments, the nucleic acid is double-stranded. In some embodiments, the nucleic acid is RNA. In some embodiments, the nucleic acid is DNA. In some embodiments, the molecular tag includes a carbohydrate. In some embodiments, the molecular tag is positioned 5’ to the domain in the bait oligonucleotide. In some embodiments, the molecular tag is positioned 3’ to the domain in the bait oligonucleotide. In some embodiments, the agent that binds specifically to the molecular tag includes a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that binds specifically to the molecular tag comprises a nucleic acid. In some embodiments, the agent that binds specifically to the molecular tag comprises a small molecule. In some embodiments, the agent that binds specifically to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the moiety is biotin. In some embodiments, a biotin molecule is directly associated with (i.e., conjugated to) the bait oligonucleotide at the 3’ end. In some embodiments, a biotin molecule is directly associated with (i.e., conjugated to) the bait oligonucleotide at the 5’ end. In some embodiments, and as disclosed below, the biotin molecule can be associated with (e.g., conjugated to) an avidin molecule, allowing pulldown of an analyte. In some embodiments, and as disclosed below, the biotin molecule can be associated with (e.g., conjugated to) a streptavidin molecule, allowing pulldown of an analyte.

In some embodiments, the bait oligonucleotide does not include a moiety affixed to the sequence (i.e., the bait oligonucleotide is a naked bait oligonucleotide).

(ii) Hybridization Methods

In some instances, nucleic acids from a single library of nucleic acids (i.e., a single sample) hybridize to the oligonucleotide baits. In some instances, prior to hybridization, the library of nucleic acids can be pooled from multiple samples. In some instances, at least 2, 3, 4, 5, 6, 7, 8, or more samples are pooled. In some instances, each library is multiplexed with barcodes that are specific to a certain pool of nucleic acids. In some instances, pooled libraries are from the same cell or tissue type. In some instances, pooled libraries are from different cell or tissue types.

In the setting of pooling multiple samples from different spots on one or more arrays, pooling calculations take into account the number of spots on an array covered by a tissue. In some instances, estimating the number of gene expression spots covered by tissue can be performed visually, or by using automated computation methods for a more accurate measurement. In the setting of an array having about 5000 spots, the number of spots covered by the sample on the array can be calculated by multiplying the percent coverage by the total number of gene expression spots (e.g., about 5000).
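By way of illustration only, the spot calculation described above can be sketched as follows. The function name `spots_covered` is hypothetical. The second function, `pooling_fractions`, assumes that each library is pooled in proportion to its number of tissue-covered spots; that proportional scheme is an assumption made for the sketch and is not stated in the disclosure.

```python
def spots_covered(percent_coverage: float, total_spots: int = 5000) -> int:
    """Number of spots under tissue: percent coverage x total spots."""
    if not 0 <= percent_coverage <= 100:
        raise ValueError("percent_coverage must be between 0 and 100")
    return round(total_spots * percent_coverage / 100)


def pooling_fractions(covered_spots: list[int]) -> list[float]:
    """Assumed scheme: pool each library in proportion to its covered spots."""
    total = sum(covered_spots)
    if total <= 0:
        raise ValueError("at least one library must cover one or more spots")
    return [count / total for count in covered_spots]


# Two hypothetical arrays of about 5000 spots, 40% and 60% covered by tissue.
covered = [spots_covered(40), spots_covered(60)]
print(covered)
print(pooling_fractions(covered))
```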

In some instances, during the step of pre-hybridization pooling of nucleic acids, universal blockers are added to the pool. In some instances, human Cot DNA is added to the pool. Human Cot DNA is enriched for repetitive, non-coding elements commonly found in genomic DNA. These repetitive sequences often lead to non-specific binding during hybridization reactions. Adding Cot DNA to these reactions reduces non-specific binding associated with these repetitive sequences to improve accuracy. In some instances, the methods of bait oligonucleotide hybridization disclosed herein include various buffers and components to aid in nucleic acid capture. In some instances, the methods include a hybridization enhancer, a wash buffer, an equilibration buffer, a hybridization buffer or any combination thereof. Any of the buffers can be in a concentrated form that can be diluted to an appropriate concentration by one skilled in the art using, e.g., nuclease-free water. In some instances, the components include streptavidin beads. In some instances, the components include avidin beads.

In some embodiments, where RNA is the nucleic acid analyte, one or more RNA analyte species of interest can be selectively enriched. For example, one or more species of RNA of interest can be selected by addition of one or more oligonucleotides to the sample. In some embodiments, the additional oligonucleotide is a sequence used for priming a reaction by a polymerase. For example, one or more primer sequences with sequence complementarity to one or more RNAs of interest can be used to amplify the one or more RNAs of interest, thereby selectively enriching these RNAs. In some embodiments, an oligonucleotide with sequence complementarity (e.g., a probe) to the complementary strand of captured RNA (e.g., cDNA) can bind to the cDNA. For example, biotinylated oligonucleotides with sequence complementary to one or more cDNA of interest bind to the cDNA and can be selected using biotinylation-streptavidin affinity using any of a variety of methods known to the field (e.g., streptavidin beads). Other non-nucleic acid affinity moieties are known in the art, for example, 2-(4-Hydroxyphenylazo)benzoic acid (HABA) or a compound listed in Table 5.

Alternatively, one or more species of RNA can be down-selected (e.g., removed) using any of a variety of methods. For example, probes can be administered to a sample that selectively hybridize to ribosomal RNA (rRNA), thereby reducing the pool and concentration of rRNA in the sample. Subsequent application of the capture probes to the sample can result in improved capture of other types of RNA due to the reduction in non-specific RNA present in the sample.

(1) Hybridization of Bait Oligonucleotides Directly

After creation of a plurality of nucleic acids or a library generated using the same, one or more nucleic acids of interest can be enriched using one or more bait oligonucleotides.

In some embodiments, bait oligonucleotides are added to a plurality of nucleic acids (e.g., a library of nucleic acids). In some embodiments, the plurality of nucleic acids includes nucleic acids that have a spatial barcode or a complement thereof, and a portion of a sequence of an analyte from a biological sample, or a complement thereof. In some embodiments, the spatial barcode includes a sequence that corresponds to a region of interest in the biological sample. In some embodiments, the spatial barcode allows detection of and association with a particular region of interest in the biological sample. In some embodiments, the methods disclosed herein include identifying the region of interest. In some embodiments, the spatial barcode provides information about the location of an analyte in a biological sample. In some embodiments, the methods disclosed herein include identifying the location and/or abundance of an analyte in the biological sample.

In some embodiments, the bait oligonucleotides are added to a plurality of nucleic acids. In some embodiments, bait oligonucleotides include a domain that binds specifically to all or a portion of a spatial barcode or a complement thereof, and/or all or a portion of a sequence of an analyte or a complement thereof. In some embodiments, a complex of a bait oligonucleotide specifically bound to the nucleic acid can be enriched. For example, a bait oligonucleotide can include a molecular tag, and a complex of a bait oligonucleotide specifically bound to the nucleic acid can be enriched using an agent that binds specifically to the molecular tag. In some embodiments, the molecular tag can be attached (directly or indirectly) to a substrate (e.g., a slide, a well, or a bead). In some embodiments, a molecular tag can include a protein, a nucleic acid, a carbohydrate, a small molecule, or any combination thereof. In some embodiments, an agent that binds specifically to a molecular tag can be a protein, a nucleic acid, a carbohydrate, a small molecule, or any combination thereof. In some embodiments, a molecular tag can be avidin and an agent that binds specifically to the molecular tag can be biotin. In some embodiments, a molecular tag can be streptavidin and an agent that binds specifically to the molecular tag can be biotin.

In some embodiments, a molecular tag can be biotin and an agent that binds specifically to the molecular tag can be avidin or streptavidin (e.g., streptavidin attached to a bead). In some instances, beads are prepared using methods known in the art. In some instances, a buffer (e.g., equilibration buffer) is used to suspend the beads. In some instances, the beads are separated from the buffer using methods known in the art (e.g., using a magnetic separator; centrifugation). In some instances, beads are resuspended in a buffer to be used to capture bait oligonucleotides.

In some embodiments, where the molecular tag is biotin and the method includes the use of avidin or streptavidin beads to enrich a nucleic acid complexed with a bait oligonucleotide, the streptavidin beads can be washed using any method known in the art. In some embodiments, the streptavidin beads can be washed 1 time, 2 times, 3 times, 4 times, 5 times, 6 times, 7 times or more. In some embodiments, the streptavidin beads can be stringently washed 1 time, 2 times, 3 times, 4 times, 5 times, 6 times, 7 times, or more. In some embodiments, the streptavidin beads can be washed at about 15 °C, about 20 °C, about 25 °C, about 30 °C, about 35 °C, about 40 °C, about 45 °C, about 48 °C, about 50 °C, about 55 °C, about 60 °C, about 62 °C, about 65 °C, about 67 °C, about 70 °C, about 75 °C or more. In some embodiments, the streptavidin beads can be washed at about 67 °C. In some instances, the temperature of the avidin or streptavidin bead wash is the same temperature (e.g., 60 °C or 65 °C) as that of the bait oligonucleotide hybridization step. In some embodiments, after one or more wash steps, the nucleic acid(s) hybridized to the bait oligonucleotide(s) can be recovered and are enriched.

In some embodiments, the recovered nucleic acids can be released from the bait oligonucleotide(s) and purified to remove avidin or streptavidin and biotin (or any other molecular tag and agent that binds specifically to the molecular tag).

In some embodiments, a molecular tag can be biotin and an agent that binds specifically to the molecular tag can be avidin (e.g., avidin attached to a bead). In some embodiments, where the molecular tag is biotin and the method includes the use of avidin beads to enrich a nucleic acid complexed with a bait oligonucleotide, the avidin beads can be washed using any method known in the art. In some embodiments, the avidin beads can be washed 1 time, 2 times, 3 times, 4 times, 5 times, 6 times, 7 times or more. In some embodiments, the avidin beads can be stringently washed 1 time, 2 times, 3 times, 4 times, 5 times, 6 times, 7 times, or more. In some embodiments, the avidin beads can be washed at about 15 °C, about 20 °C, about 25 °C, about 30 °C, about 35 °C, about 40 °C, about 45 °C, about 48 °C, about 50 °C, or more. In some embodiments, after one or more wash steps, the nucleic acid(s) hybridized to the bait oligonucleotide(s) can be recovered and are enriched.

In some embodiments, a plurality of bait oligonucleotides can be used in any of the methods described herein to enrich one or more nucleic acids of interest from a plurality of nucleic acids. In some embodiments, the plurality of bait oligonucleotides are designed to enrich one or more nucleic acids that include all or a part of a sequence of an analyte of interest (e.g., one or more genes that function or are aberrantly expressed in a particular cellular state or pathway), or a complement thereof. For example, in some embodiments, a plurality of bait oligonucleotides can be used to enrich nucleic acids that include all or a part of a sequence of cancer-related transcripts, or complements thereof (e.g., as disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety). In some embodiments, a plurality of bait oligonucleotides can be used to enrich nucleic acids that include all or a part of a sequence of immune-related transcripts, or complements thereof (e.g., as disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety). In some embodiments, a plurality of bait oligonucleotides can be used to enrich nucleic acids that include all or a part of a sequence of pathway-specific transcripts, or complements thereof (e.g., as disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety). In some embodiments, a plurality of bait oligonucleotides can be used to enrich nucleic acids that include all or a part of a sequence of neurological-specific transcripts, or complements thereof, as listed in Table 4 and in the accompanying sequence listing.

In some embodiments, after hybridization of one or more bait oligonucleotides to their target nucleic acids, the hybridized nucleic acids are enriched, creating a pool of nucleic acids that are enriched for particular nucleic acids of interest. In some instances, after hybridization and binding to the streptavidin or avidin beads, unhybridized nucleic acids are washed from the sample, thus enriching nucleic acids of interest. In some embodiments, after hybridization of one or more bait oligonucleotides to their target nucleic acids, the unhybridized nucleic acids are degraded (e.g., by a nuclease), thus enriching the hybridized nucleic acids. In some embodiments, after hybridization of one or more bait oligonucleotides, the hybridized nucleic acids are degraded (e.g., by a nuclease), thus enriching the unhybridized nucleic acids; for example, this technique may be useful to decrease the amount of a high-abundance nucleic acid that is not of interest. In some embodiments, the bait oligonucleotides may not include a detectable moiety. In some embodiments, the enriched nucleic acids are purified. In some embodiments, the enriched nucleic acids are sample indexed (e.g., prior to sequencing).

In some embodiments, a bait oligonucleotide is hybridized with a nucleic acid at about 40 °C, about 45 °C, about 50 °C, about 55 °C, about 60 °C, about 65 °C, about 70 °C, about 75 °C, about 80 °C or higher. In some embodiments, the bait oligonucleotides are hybridized with a nucleic acid for at least 15 minutes, at least 30 minutes, at least 45 minutes, at least 1 hour, at least 90 minutes, at least 2 hours, at least 3 hours, at least 4 hours, at least 5 hours, or longer.

In some instances, the bait oligonucleotides are hybridized with a nucleic acid for about 2 hours. In some instances, the bait oligonucleotides are hybridized with a nucleic acid overnight (e.g., at least about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, or longer). In some instances, the bait oligonucleotides are hybridized with a nucleic acid for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, or longer. In some instances, after hybridization, the sample comprising the hybridized bait oligonucleotides with the nucleic acids of interest is washed.

In some instances, washing occurs using any wash solution described herein. In some instances, washing occurs at 65°C. In some instances, washing occurs at 60°C. In some instances, hybridization occurs overnight at 60°C followed by subsequent wash steps at 60°C. In instances where the average length of a nucleic acid molecule is less than 700 base pairs, hybridization should be extended to overnight at 60°C. In this setting, subsequent washes are performed at 60°C. In instances where a library pool includes nucleic acid molecules of varying length (e.g., if the pool contains both short (e.g., <700 bp) and long (e.g., >700 bp) nucleic acid molecules), hybridization can be extended to overnight at 60°C with subsequent washes performed at 60°C. For example, a percentage of the pool of nucleic acid molecules (e.g., about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%) can be less than 700 base pairs, and thus this alternative method of overnight hybridization at 60°C with subsequent washes performed at 60°C can be used.
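By way of illustration only, the selection between these hybridization and wash conditions could be sketched as follows. The names `HybridizationPlan` and `plan_hybridization` are hypothetical. The 700 base pair threshold and the overnight 60°C conditions follow the text above; the 10% short-fragment cutoff (taken from the lowest example percentage) and the fallback of a shorter hybridization with 65°C washes for pools of long molecules are assumptions made for the sketch.

```python
from dataclasses import dataclass

SHORT_FRAGMENT_BP = 700  # threshold stated in the text above


@dataclass(frozen=True)
class HybridizationPlan:
    hybridization_temp_c: int
    wash_temp_c: int
    overnight: bool


def plan_hybridization(
    fragment_lengths_bp: list[int],
    min_short_fraction: float = 0.10,  # assumed cutoff, not from the disclosure
) -> HybridizationPlan:
    """Pick hybridization and wash conditions from library fragment lengths."""
    if not fragment_lengths_bp:
        raise ValueError("at least one fragment length is required")
    mean_length = sum(fragment_lengths_bp) / len(fragment_lengths_bp)
    short = sum(1 for length in fragment_lengths_bp if length < SHORT_FRAGMENT_BP)
    short_fraction = short / len(fragment_lengths_bp)
    if mean_length < SHORT_FRAGMENT_BP or short_fraction >= min_short_fraction:
        # Short or mixed-length pool: overnight at 60 C, washes at 60 C.
        return HybridizationPlan(60, 60, True)
    # Assumed default for pools of long molecules: shorter run, 65 C washes.
    return HybridizationPlan(65, 65, False)


print(plan_hybridization([400, 500]))
print(plan_hybridization([300, 2000]))
print(plan_hybridization([900, 1200]))
```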

In some embodiments, one or more detectable moieties can be associated (e.g., attached directly or indirectly) with a bait oligonucleotide. In some embodiments, the one or more detectable moieties can be used to detect (or enhance detection) of a bait oligonucleotide (e.g., a bait oligonucleotide hybridized to a nucleic acid).

In some embodiments, the enriched nucleic acid can be amplified. After amplification, the enriched and amplified nucleic acid can be used to generate a library of nucleic acids and sequenced using any method known in the art, including the exemplary sequencing methods described herein. In some embodiments, sequencing can include determining all or a portion of the sequence of the spatial barcode or the complement thereof in the nucleic acid. In some embodiments, sequencing includes determining all or a portion of the sequence of the analyte from the biological sample or a complement thereof in the nucleic acid. In some embodiments, sequencing includes high throughput sequencing. In some instances, the methods of amplification comprise steps that include compositions including a buffer and a set of library primers. A thermocycler can be used to amplify the enriched nucleic acid library (e.g., with cycling steps at 98°C, 67°C, and 72°C). It is appreciated that one skilled in the art can determine the temperature and run time parameters in order to amplify the nucleic acid library. In some instances, one can vary the number of total cycles of amplification in order to optimize detection of one or more bait oligonucleotides. In some instances, the number of PCR cycles run to amplify the library(ies) is at least 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more cycles. It is appreciated that if the expected expression of a given target is low, then more cycles will be necessary to detect the target. In some instances, amplification can be performed while the bead (e.g., avidin or streptavidin) is in the mixture. In some instances, the bead(s) can be removed using methods known in the art (e.g., using a magnet).

In some instances, the average fragment size of a nucleic acid in the library can be determined after amplification. In some instances, an automated electrophoresis system (e.g., TapeStation from Agilent) can be run as quality control of the nucleic acid library samples. In some instances, one can analyze size, quantity, and integrity of samples before downstream analysis (e.g., sequencing).

In some embodiments, targeted spatial gene expression profiling using one or more panels as described herein (e.g., cancer panel, immune panel, pathway panel, or neuro panel) can enrich the target nucleic acids as compared to unbiased spatial profiling by about 1 fold, about 2 fold, about 3 fold, about 4 fold, about 5 fold, about 6 fold, about 7 fold, about 8 fold, about 9 fold, about 10 fold, about 20 fold, about 50 fold, about 100 fold or greater. In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) can enrich the target nucleic acids per gene as compared to unbiased spatial profiling by about 1 fold, about 2 fold, about 3 fold, about 4 fold, about 5 fold, about 6 fold, about 7 fold, about 8 fold, about 9 fold, about 10 fold, about 20 fold, about 50 fold, about 100 fold or greater.

In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) can increase on-target read percentage as compared to unbiased spatial profiling by about 1 fold, about 2 fold, about 3 fold, about 4 fold, about 5 fold, about 6 fold, about 7 fold, about 8 fold, about 9 fold, about 10 fold, about 20 fold, about 50 fold, about 100 fold or greater. In some embodiments, the number of on-target reads by targeted spatial gene expression profiling using one or more panels as described herein is at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% as compared to that using unbiased spatial profiling. In some embodiments, the targeted spatial gene expression profiling using the panel as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) can decrease off-target read percentage as compared to unbiased spatial profiling by about 1 fold, about 2 fold, about 3 fold, about 4 fold, about 5 fold, about 6 fold, about 7 fold, about 8 fold, about 9 fold, about 10 fold, about 20 fold, about 50 fold, about 100 fold or greater.
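The on-target comparison above can be illustrated with a short Python sketch. It assumes that each sequencing read has already been assigned a gene name and that "on-target" means the gene belongs to the panel; the function names and this representation of reads are hypothetical and chosen only for illustration.

```python
def on_target_fraction(read_genes, panel_genes):
    """Fraction of reads whose assigned gene is in the panel (hypothetical helper)."""
    reads = list(read_genes)
    if not reads:
        raise ValueError("at least one read is required")
    panel = set(panel_genes)
    return sum(1 for gene in reads if gene in panel) / len(reads)


def fold_change(targeted_value, unbiased_value):
    """Ratio of a targeted metric to the same metric from unbiased profiling."""
    if unbiased_value == 0:
        raise ValueError("unbiased value must be non-zero")
    return targeted_value / unbiased_value
```

For example, if 2 of 10 reads are on-target in unbiased profiling and 8 of 10 are on-target after enrichment, the on-target read percentage rises from 20% to 80%, which is an increase of about 4 fold.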

In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) enriches detection of a target analyte. In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) enriches detection of one or more target analytes. In some embodiments, enrichment includes an increase in the number of sequencing reads that are detected when the plurality of nucleic acids and/or probes are sequenced. In some embodiments, enrichment includes an increase in the number of sequencing reads that are detected when the nucleic acids are sequenced after avidin-biotin pulldown described herein. In some embodiments, enrichment includes an increase in the number of sequencing reads that are detected when the nucleic acids are sequenced after streptavidin-biotin pulldown described herein. In some embodiments, the increase is about 50%, about 55%, about 60%, about 62%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 1 fold, about 2 fold, about 3 fold, about 4 fold, about 5 fold, about 6 fold, about 7 fold, about 8 fold, about 9 fold, about 10 fold, about 20 fold, about 50 fold, about 100 fold or greater enrichment of target analyte sequence reads compared to the number of sequence reads in the same target analyte when performing whole genome sequencing.

In some embodiments, enriched target analyte sequence reads obtained from the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) are highly correlated with the sequence reads in the same biological sample when performing an unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the UMIs detected from the enriched target analyte sequence reads are matched to the sequence reads in the same biological sample when performing an unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, the R² of the matched UMI counts is at least 0.95, at least 0.96, at least 0.97, at least 0.98, or at least 0.99.
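The two concordance measures above can be sketched in Python as follows. The sketch assumes that each UMI observation is represented as a hashable key (for example, a tuple of spot barcode, gene, and UMI sequence) and that R² is taken as the squared Pearson correlation of paired UMI counts. Both choices are assumptions made for illustration, since the text does not specify how the match or the R² is computed.

```python
def matched_umi_fraction(targeted_umis, unbiased_umis):
    """Fraction of distinct targeted UMI keys also seen in the unbiased run."""
    targeted = set(targeted_umis)
    if not targeted:
        raise ValueError("at least one targeted UMI is required")
    return len(targeted & set(unbiased_umis)) / len(targeted)


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length sequences of counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("counts must not be constant")
    return (sxy * sxy) / (sxx * syy)
```

Here `x` and `y` would hold, for the same genes or spots in the same order, the UMI counts from the targeted and the unbiased profiling respectively.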

In some embodiments, the number of reads per spot (reads/spot) obtained from the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) is less than 20%, less than 15%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or less than 1% as compared to that obtained from an unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, the total reads obtained from the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) is less than 20%, less than 15%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or less than 1% as compared to that obtained from an unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, even though the total reads and/or reads/spot are decreased by the targeted spatial gene expression profiling using one or more panels as described herein, the biological pattern (e.g., the spatial pattern of a particular gene expression in a biological sample) is maintained as compared to that obtained from an unbiased (e.g., non-targeted) spatial gene expression profiling.

In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) can have about 50%, about 55%, about 60%, about 62%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or more of panel gene recovery relative to an unbiased (e.g., non-targeted) spatial gene expression profiling.
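A brief Python sketch of the reads/spot comparison and the panel gene recovery follows. It assumes that panel gene recovery means the share of panel genes detected by unbiased profiling that are also detected by targeted profiling; that definition, like the function names, is an assumption for illustration rather than a definition given in the text.

```python
def reads_per_spot_ratio(targeted_reads, unbiased_reads, n_spots):
    """Targeted reads/spot as a fraction of unbiased reads/spot (same spot count)."""
    if n_spots <= 0 or unbiased_reads <= 0:
        raise ValueError("spot count and unbiased reads must be positive")
    return (targeted_reads / n_spots) / (unbiased_reads / n_spots)


def panel_gene_recovery(targeted_detected, unbiased_detected, panel_genes):
    """Share of panel genes seen in the unbiased run that the targeted run also sees."""
    panel = set(panel_genes)
    baseline = set(unbiased_detected) & panel
    if not baseline:
        raise ValueError("no panel genes detected in the unbiased run")
    return len(set(targeted_detected) & baseline) / len(baseline)
```

With 5,000 targeted reads and 100,000 unbiased reads over the same 100 spots, the reads/spot ratio is 0.05, i.e., 5% of the unbiased value.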

In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) can have about 50%, about 55%, about 60%, about 62%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or more of panel UMI recovery and/or UMI retention relative to an unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel) can reduce gene or UMI targeting complexity (or complexity) to about 90%, about 80%, about 70%, about 60%, about 50%, about 40%, about 30%, about 20% or less of an unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, dual-indexed libraries can be used for targeted spatial gene expression profiling. In some embodiments, individually indexed libraries can be mixed before the enrichment step by hybridization/capture to generate a targeted library for downstream high-throughput sequencing. In some embodiments, individually indexed libraries can be mixed after the enrichment step by hybridization/capture to generate a targeted library for downstream high-throughput sequencing. In some embodiments, the targeted spatial gene expression profiling using one or more panels as described herein provides comparable, even more detailed spatial gene profiling results as compared to those obtained from a pathologist’s annotation, and/or an unbiased (e.g., non-targeted) spatial gene expression profiling.

(2) Hybridization of Bait Oligonucleotides Directly to Analytes of Interest.

In some embodiments, a bait oligonucleotide (e.g., a set of bait oligonucleotides from one or more panels as disclosed herein) can be added directly to an analyte. In some embodiments, the bait oligonucleotide is directly associated with (i.e., conjugated to) a detectable moiety or a molecular tag at the 3’ end. In some embodiments, the bait oligonucleotide is directly associated with (i.e., conjugated to) a detectable moiety or a molecular tag at the 5’ end. In some embodiments, the detectable moiety is a fluorophore or a radioisotope.

In some embodiments, target-specific reactions are performed in the biological sample. In some embodiments, a nucleic acid analyte can be denatured to create single-stranded analytes before contact with a bait oligonucleotide. In some embodiments, a bait oligonucleotide binds (e.g., hybridizes) directly to an analyte (e.g., a single-stranded analyte), and the detectable moiety (e.g., the fluorophore) can be identified using one or more imaging techniques. In some embodiments, target-specific reactions include in situ hybridization of the bait oligonucleotide to an analyte (e.g., a single-stranded analyte). In some embodiments, the in situ hybridization is fluorescence in situ hybridization (FISH). In some embodiments, the fluorophore or a molecular tag can serve as a marker to identify (e.g., enrich) an analyte of interest. In some embodiments, fluorescence microscopy can be used to locate a fluorophore-labelled bait oligonucleotide in a biological sample or elsewhere. In some embodiments, analytes not hybridized to a bait oligonucleotide can be washed away. In some embodiments, the wash methods include isolating only those analytes that fluoresce. In some embodiments, the wash steps include any of the wash steps provided herein. In some embodiments, a fluorophore (e.g., any of the fluorophores disclosed herein) can be directly associated with (e.g., conjugated to) a bait oligonucleotide. In some embodiments, a bait oligonucleotide can include a non-fluorescent moiety that is directly associated with (e.g., conjugated to) the bait oligonucleotide. In some embodiments, a fluorophore can bind to the non-fluorescent moiety, thereby enhancing detection of the analyte.

In some embodiments, the fluorescence in situ hybridization methods disclosed herein detect a particular set of transcripts. In some embodiments, fluorescence in situ hybridization methods disclosed herein detect cancer-related transcripts (e.g., as disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686, (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety). In some embodiments, fluorescence in situ hybridization methods disclosed herein detect immune-related transcripts (e.g., as disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686, (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety). In some embodiments, fluorescence in situ hybridization methods disclosed herein detect pathway-specific transcripts (e.g., as disclosed in U.S. Appl. No. 62/970,066 (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”) and U.S. Appl. No. 62/929,686, (Titled “Capturing Targeted Genetic Targets Using A Hybridization/Capture Approach”), each of which is incorporated by reference in its entirety). In some embodiments, fluorescence in situ hybridization methods disclosed herein detect neurological-specific transcripts as described herein.

In some embodiments, a hybridized analyte/bait oligonucleotide complex can be cleaved from a substrate. In some embodiments, a spatially-barcoded array populated with a plurality of capture probes can be contacted with a biological sample. The spatially-barcoded capture probes can be cleaved and then interact with cells within a biological sample. In some embodiments, a capture probe of the plurality of capture probes can optionally include at least one cleavage domain. The cleavage domain represents the portion of the capture probe that is used to reversibly attach the capture probe to an array feature. Further, one or more segments or regions of a capture probe can optionally be released from the array feature by cleavage of the cleavage domain. As an example, spatial barcodes can be released by cleavage of a cleavage domain. In some embodiments, a capture probe can also include a cleavage site (e.g., a cleavage recognition site of a restriction endonuclease), a photolabile bond, a thermosensitive bond, or a chemical-sensitive bond. In some embodiments, once a spatially-barcoded capture probe is associated with a particular cell or analytes thereof, the biological sample can be optionally removed.

In some embodiments, a cleaved or enriched analyte/bait oligonucleotide complex can be purified prior to downstream steps (e.g., sequencing). Sequencing can be performed using any method known in the art, including the exemplary sequencing methods described herein. In some instances, the methods disclosed herein include sequencing of single indexed libraries. In some instances, the methods disclosed herein include sequencing of dual indexed libraries. In some instances, the libraries comprise standard Illumina paired-end constructs which include P5 and P7 sequences. For single indexed libraries, an 8 bp sample index sequence is incorporated as the i7 sample index sequence. For dual indexed libraries, 10 bp sample index sequences are incorporated as the i7 and i5 sample index sequences.
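The sample index lengths stated above can be checked with a small Python sketch. The function name, the dictionary layout, and the restriction to the letters A, C, G, and T are illustrative assumptions; only the 8 bp i7 index for single indexed libraries and the 10 bp i7 and i5 indexes for dual indexed libraries come from the text.

```python
import re

# Expected sample index lengths (in bp) per library type, as stated in the text.
EXPECTED_INDEX_LENGTHS = {
    "single": {"i7": 8},
    "dual": {"i7": 10, "i5": 10},
}


def validate_sample_indexes(library_type, indexes):
    """Return True if the given indexes match the expected names and lengths.

    Hypothetical helper: `indexes` maps an index name ("i7", "i5") to its sequence.
    """
    if library_type not in EXPECTED_INDEX_LENGTHS:
        raise ValueError("library_type must be 'single' or 'dual'")
    expected = EXPECTED_INDEX_LENGTHS[library_type]
    if set(indexes) != set(expected):
        return False
    return all(
        len(indexes[name]) == length
        and re.fullmatch(r"[ACGT]+", indexes[name]) is not None
        for name, length in expected.items()
    )
```

Any index sequences passed to this helper are placeholders supplied by the user; no specific index sequences are given in the text.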

In some embodiments, sequencing of a nucleic acid includes determining all or a portion of a sequence of a spatial barcode or the complement thereof. In some embodiments, sequencing includes determining all or a portion of the sequence of the analyte from the biological sample or a complement thereof. In some embodiments, sequencing includes high throughput sequencing. In some embodiments, sequencing includes ligating or adding one or more of an adapter sequence or a complement thereof, a poly (GI) sequence or a complement thereof, a template switching oligonucleotide sequence or a complement thereof, a primer binding site or a complement thereof, to the nucleic acid.

In some embodiments, multiple bait oligonucleotides (e.g., two or more bait oligonucleotides) can be used to interrogate spatial gene expression in a biological sample via RNA-templated ligation.

(3) Downstream Application after Hybridization of Bait Oligonucleotides

In some embodiments, the methods disclosed herein include post-capture amplification. In some instances, the hybridized library molecules that are bound to streptavidin or avidin beads can be amplified prior to downstream sequencing events. In some instances, primers (e.g., a P5 adapter and/or a P7 adapter) are added to a nucleic acid sequence or an enriched nucleic acid sequence. In some embodiments, amplification is performed before bait oligonucleotide enrichment of a nucleic acid of interest. In some embodiments, amplification is performed after bait oligonucleotide enrichment of a nucleic acid of interest.

In some embodiments of methods provided herein, RNA-templated ligation is used to interrogate spatial gene expression in a biological sample (e.g., an FFPE tissue section). In some aspects, the steps of RNA-templated ligation include: (1) hybridization of pairs of bait oligonucleotides to a nucleic acid (e.g., a single-stranded cDNA or RNA molecule); (2) ligation of adjacently annealed probe pairs in situ; (3) RNase H treatment that (i) releases RNA-templated ligation products from the tissue (e.g., into solution) and (ii) destroys unwanted DNA-templated ligation products; and optionally, (4) amplification of RNA-templated ligation products (e.g., by multiplex PCR).

In some aspects, RNA-templated ligation is used for detection of a target analyte, determination of sequence identity, and/or expression monitoring and transcript analysis. In some aspects, RNA-templated ligation allows for detection of a particular change in a nucleic acid (e.g., a mutation or single nucleotide polymorphism (SNP)), detection or expression of a particular nucleic acid, or detection or expression of a particular set of nucleic acids (e.g., in a similar cellular pathway or expressed in a particular pathology). In some embodiments, the methods that include RNA-templated ligation are used to analyze nucleic acids, e.g., by genotyping, quantitation of DNA copy number or RNA transcripts, localization of particular transcripts within samples, and the like. In some aspects, systems and methods provided herein that include RNA-templated ligation identify single nucleotide polymorphisms (SNPs). In some aspects, such systems and methods identify mutations.

In some aspects, disclosed herein are methods of detecting RNA expression that include bringing into contact a first bait oligonucleotide, a second bait oligonucleotide, and a ligase (e.g., T4 RNA ligase). In some embodiments, the first bait oligonucleotide and the second bait oligonucleotide are designed to hybridize to a target sequence such that the 5’ end of the first bait oligonucleotide and the 3’ end of the second bait oligonucleotide are adjacent and can be ligated. After hybridization, the ligase (e.g., T4 RNA ligase) ligates the first bait oligonucleotide and the second bait oligonucleotide if the target sequence is present in the target sample, but does not ligate the first bait oligonucleotide and the second bait oligonucleotide if the target sequence is not present in the target sample. The presence or absence of the target sequence in the biological sample can be determined by determining whether the first and second bait oligonucleotides were ligated in the presence of ligase.
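The adjacency requirement described above can be illustrated with a simplified Python model. It assumes perfect complementarity, a single binding site per probe, and sequences written 5’ to 3’; real hybridization tolerates mismatches and depends on reaction conditions, so this is a sketch of the geometry only. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement in the DNA alphabet; U in the input is read as T."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def footprint(probe, target):
    """Return (start, end) of the probe's perfectly complementary site, or None."""
    site = reverse_complement(probe)
    start = target.upper().replace("U", "T").find(site)
    return None if start < 0 else (start, start + len(site))


def ligation_product(first_probe, second_probe, target):
    """Return the ligated probe sequence (5' to 3'), or None if no ligation.

    Probes anneal antiparallel to the target, so the 5' end of the first probe
    sits at the 3' edge of its footprint and the 3' end of the second probe
    sits at the 5' edge of its footprint. The ends are adjacent when the
    second footprint starts exactly where the first one ends.
    """
    first = footprint(first_probe, target)
    second = footprint(second_probe, target)
    if first is None or second is None:
        return None  # target sequence absent: nothing to template the ligation
    if second[0] != first[1]:
        return None  # both probes bind, but their ends are not adjacent
    return second_probe.upper() + first_probe.upper()
```

In this model a non-`None` result corresponds to "target sequence present", and the returned product is the reverse complement of the combined footprint on the target.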

In some aspects, two or more RNA analytes are analyzed using methods that include RNA-templated ligation. In some aspects, when two or more analytes are analyzed, a first and a second bait oligonucleotide that are specific for (e.g., specifically hybridize to) each RNA analyte are used.

In some aspects, three or more bait oligonucleotides are used in RNA-templated ligation methods provided herein. In some embodiments, the three or more bait oligonucleotides are designed to hybridize to a target sequence such that the three or more probes hybridize adjacent to each other such that the 5’ and 3’ ends of adjacent probes can be ligated. In some embodiments, the presence or absence of the target sequence in the biological sample can be determined by determining whether the three or more bait oligonucleotides were ligated in the presence of ligase.

(e) Compositions and Kits

Also provided herein are kits that include parts and instructions for performing the methods described herein. In some instances, the kits include a set of bait oligonucleotides. In some instances, the set of bait oligonucleotides is specific to a particular set of transcripts (e.g., a cancer panel, an immunology panel, a cellular pathway panel, or a neuroscience panel as disclosed herein). In some instances, the kit includes bait oligonucleotides specific for the cancer panel described herein. In some instances, the kit includes bait oligonucleotides specific for the immune panel described herein. In some instances, the kit includes bait oligonucleotides specific for the cellular pathway panel described herein. In some instances, the kit includes bait oligonucleotides specific for the neuro panel described herein. It is further appreciated that a custom panel can be designed using methods of designing bait oligonucleotides as described herein. In some instances, a kit includes multiple sets of baits for the same panel so that multiple (e.g., duplicate, triplicate) reactions can be run at the same time on the same slide. In some instances, additional bait oligonucleotides can be added to one of the panels described herein (e.g., the cancer panel, the immune panel, the pathway panel, or the neuro panel as disclosed herein). In some instances, the kit includes at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or more reactions, wherein each reaction refers to the testing of one sample.

In some instances, a moiety (e.g., biotin) is adhered to each bait in the set of bait oligonucleotides. In some instances, the kit further includes streptavidin beads. In some instances, the kit further includes avidin beads. In some instances, the kit further includes an array comprising a plurality of capture probes (i.e., as described herein). The capture probes on the array each include a spatial barcode and a capture domain. The kit can also include reagents for carrying out the methods disclosed herein. For example, in some instances, the kit includes reagents for carrying out reverse transcription reactions (e.g., a reverse transcriptase); nucleic acid quantification as described herein; addition of fragments to nucleic acid sequences; and nucleic acid clean up. In some instances, the kit includes reagents necessary to fix, stain, and destain the biological sample.

In some instances, the kit includes components that allow for multiple reactions. In some instances, the kit includes components that allow for target hybridization. In some instances, the components for target hybridization include one or more of universal blockers, hybridization buffer, hybridization enhancer, equilibrium buffer (concentrated or 1x), wash buffer (concentrated or 1x), primers, control samples, or any combination thereof. In some instances, the kit includes components for amplifying the library of nucleic acids. In some instances, the components for amplifying the library of nucleic acids include a mixture that aids in amplification, primers, or any combination thereof.

In some instances, the kit further includes instructions for carrying out library preparation and nucleic acid detection using the methods described herein.

Also provided herein are systems for analyzing one or more biological analytes present in a biological sample comprising an array comprising a plurality of capture probes. Any of the capture probes described herein can be included as part of the system. In certain instances, the capture probes each comprise a spatial barcode and a capture domain. In some instances, the system includes a composition (e.g., an enzyme) that can cleave the probe from the array. In some instances, cleavage of the probe occurs after hybridization of the probe to the analyte.

In some instances, the system includes a set of bait oligonucleotides that are specific for targets of interest. In some instances, the system includes bait oligonucleotides specific for the cancer panel described herein. In some instances, the system includes bait oligonucleotides specific for the immune panel described herein. In some instances, the system includes bait oligonucleotides specific for the pathway panel described herein. In some instances, the system includes bait oligonucleotides specific for the neuro panel described herein.

The system can also include reagents for carrying out the methods disclosed herein. For example, in some instances, the system includes reagents for carrying out reverse transcription reactions (e.g., a reverse transcriptase); nucleic acid quantification as described herein; addition of fragments to nucleic acid sequences; and nucleic acid clean up. In some instances, the system includes reagents necessary to fix, stain, and destain the biological sample.

In some instances, the system includes a reagent delivery system. The reagent delivery system includes instrumentation that allows the delivery of reagents to discrete portions of the biological sample, maintaining the integrity of the spatial patterns of the addressing scheme. In some instances, reagent delivery systems of the present assay systems comprise optional imaging means, reagent delivery hardware and control software. Reagent delivery can be achieved in a number of different ways. It should be noted that reagents may be delivered to many different biological samples at one time. A single tissue section has been exemplified herein; however, multiple biological samples may be manipulated and analyzed simultaneously. For example, serial sections of a tissue sample can be analyzed in parallel and the data combined to build a 3D map. In one exemplary aspect, the reagent delivery system may be a flow-based system. The flow-based systems for reagent delivery in the present invention can include instrumentation such as one or more pumps, valves, fluid reservoirs, channels, and/or reagent storage cells.

Also provided herein are compositions that include a bait oligonucleotide bound to a nucleic acid, wherein the nucleic acid comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of an analyte binding moiety barcode, or a complement thereof. In some instances, the composition is an intermediate composition that is produced by the methods disclosed herein.

The bait oligonucleotide can be any bait oligonucleotide described herein. In some instances, the bait oligonucleotide is bound to the nucleic acid by a capture domain that binds specifically to (i) all or a portion of the nucleic acid, and/or (ii) all or a portion of the analyte binding moiety barcode, or a complement thereof.

In some instances, the composition described herein further includes a molecular tag. In some instances, the composition further includes an agent that binds specifically to the molecular tag. The molecular tag can be any suitable molecular tag described herein. In some instances, the molecular tag comprises a protein. In some instances, the protein is streptavidin, avidin, or biotin. In some instances, the molecular tag includes a small molecule. In some instances, the molecular tag includes a nucleic acid. In some instances, the molecular tag includes a carbohydrate.

In some instances, the molecular tag is positioned 5’ to the domain in the bait oligonucleotide. In some instances, the molecular tag is positioned 3’ to the domain in the bait oligonucleotide. The agent that binds specifically to the molecular tag can be any suitable agent described herein. In some instances, the agent includes streptavidin, avidin, or biotin. In some instances, the molecular tag includes a biotin, and the agent that binds specifically to the molecular tag includes an avidin. In some instances, the molecular tag includes a biotin, and the agent that binds specifically to the molecular tag includes a streptavidin. In some instances, the molecular tag includes an avidin, and the agent that binds specifically to the molecular tag includes a biotin. In some instances, the molecular tag includes a streptavidin, and the agent that binds specifically to the molecular tag includes a biotin.

In some instances, the agent that binds specifically to the molecular tag includes a protein. In some instances, the protein is an antibody. In some instances, the agent that binds specifically to the molecular tag includes a nucleic acid. In some instances, the agent that binds specifically to the molecular tag includes a small molecule.

In some instances, the composition described herein further includes a substrate. The substrate can be any suitable substrate described herein. In some embodiments, the agent that binds specifically to the molecular tag is immobilized on the substrate. In some embodiments, the molecular tag is immobilized on the substrate. In some instances, the substrate is a bead. In some instances, the substrate is a well. In some instances, the substrate is a slide.

Exemplary Embodiment

In some embodiments, disclosed herein is a method for identifying the location of a target analyte, e.g., a nucleic acid, in a biological sample, the method comprising (a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, wherein a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of an analyte binding moiety barcode, or a complement thereof; with a bait oligonucleotide of the plurality of bait oligonucleotides comprising a domain that binds specifically to (i) all or a portion of a target sequence in the nucleic acid or a complement thereof, and a molecular tag; (b) capturing, and/or isolating a complex of the bait oligonucleotide specifically bound to the nucleic acid using a substrate comprising an agent that binds specifically to the molecular tag; and (c) determining (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the analyte binding moiety barcode, and using the determined sequences of (i) and (ii) to identify the location of the analyte in the biological sample.

In some embodiments, disclosed herein is a method for enriching a target analyte, e.g., a nucleic acid, in a biological sample, the method comprising (a) contacting a plurality of nucleic acids with a plurality of bait oligonucleotides, wherein a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof, and (ii) a portion of an analyte binding moiety barcode, or a complement thereof; with a bait oligonucleotide of the plurality of bait oligonucleotides comprising a domain that binds specifically to (i) all or a portion of a target sequence in the nucleic acid or a complement thereof, and a molecular tag; (b) capturing, and/or isolating a complex of the bait oligonucleotide specifically bound to the nucleic acid using a substrate comprising an agent that binds specifically to the molecular tag, thereby enriching the analyte, e.g., the nucleic acid, in the biological sample.

In some embodiments, the analyte from the biological sample is associated with a disease or condition. In some embodiments, the domain of the bait oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides. In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is positioned 5’ to the domain in the bait oligonucleotide. In some embodiments, the molecular tag is positioned 3’ to the domain in the bait oligonucleotide. In some embodiments, the agent that binds specifically to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that binds specifically to the molecular tag comprises streptavidin or avidin. In some embodiments, the agent that binds specifically to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead, a slide, or a well. In some embodiments, the nucleic acid is DNA. In some embodiments, the nucleic acid further comprises a primer binding sequence or a complement thereof. In some embodiments, the nucleic acid further comprises a unique molecular sequence or a complement thereof. In some embodiments, the nucleic acid further comprises an additional primer binding sequence or a complement thereof.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample was previously stained using a detectable label. In some embodiments, the biological sample was previously stained using hematoxylin and eosin (H&E). In some embodiments, the biological sample is a permeabilized biological sample. In some embodiments, the permeabilized biological sample has been permeabilized with a permeabilization agent. In some embodiments, the permeabilization agent is selected from an organic solvent, a cross-linking agent, a detergent, and an enzyme, or a combination thereof.

In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule. In some embodiments, the determining in step (c) comprises sequencing (i) all or a portion of the sequence of the spatial barcode or the complement thereof, and (ii) all or a portion of the sequence of the analyte binding moiety barcode. In some embodiments, the sequencing is high throughput sequencing. In some embodiments, the sequencing comprises ligating an adapter to the nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids, wherein the generating comprises: (a) contacting a plurality of analyte capture agents to the biological sample disposed on a substrate, wherein: an analyte capture agent of the plurality of the analyte capture agents comprises (i) an analyte binding moiety that binds specifically to the analyte, (ii) the analyte binding moiety barcode; and (iii) an analyte capture sequence; and the substrate comprises a plurality of capture probes, wherein a capture probe of the plurality comprises a spatial barcode and a capture domain, wherein the capture domain binds specifically to the analyte capture sequence; and (b) extending a 3’ end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce the nucleic acid. In some embodiments, the amplifying is isothermal. In some embodiments, the produced nucleic acid is released from the extended capture probe. In some embodiments, the analyte is an analyte that is dysregulated or differentially expressed in a cancer cell. In some embodiments, the analyte is an analyte that is dysregulated or differentially expressed in an immune cell. In some embodiments, the analyte is an analyte that is dysregulated in a cellular signaling pathway. In some embodiments, the analyte is an analyte that is dysregulated or differentially expressed in a neurological cell.

EXAMPLES

Example 1: Targeted Spatial Gene Expression

An exemplary targeted spatial gene expression workflow is shown in FIG. 7. Generally, a slide with a tissue affixed thereon is stored at -80 °C for less than a week. After thawing, the biological sample is fixed in methanol, stained using H&E (hematoxylin and eosin), imaged using bright-field microscopy, and permeabilized. After gaining access to intracellular analytes (e.g., mRNA) via permeabilization, the analytes are captured via the capture probes affixed to the slide and, in this case, first and second strand cDNA synthesis occurs, wherein the second strand cDNA can be used as a proxy for the captured analyte (e.g., mRNA). After cDNA synthesis, the double-stranded analytes (e.g., cDNA molecules) are denatured, transferred from the slide, and quantified by qPCR. After quantifying the cDNA, the cDNA analytes are fragmented and end-repaired. Adapters are ligated to the cDNA molecules and sample index (SI) PCR is performed. After SI-PCR, the library as prepared above can be dried down.

Panels that include bait oligonucleotides specific for cancer targets, immune targets, pathway targets, or neurological targets are hybridized to the cDNA analytes. The bait oligonucleotides are associated with biotin. After hybridization of the cDNA analytes to the biotinylated bait oligonucleotides, the biotinylated baits are captured with avidin or streptavidin beads. After a wash step to remove unbound analytes, the retained library is re-amplified as a new nucleic acid library. The purified targeted library pool is then sequenced.
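The capture step described above can be pictured in silico as keeping only those library molecules that contain a site complementary to one of the baits. The following is a minimal Python sketch under strong simplifying assumptions: exact-match complementarity stands in for hybridization (which in practice tolerates mismatches and depends on temperature and time), and the bait and library sequences, as well as the names `reverse_complement` and `capture`, are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def capture(library, baits):
    """Keep library molecules containing a site complementary to any bait.

    Exact substring matching is a simplification of hybridization.
    """
    sites = [reverse_complement(bait) for bait in baits]
    return [mol for mol in library if any(site in mol for site in sites)]


# Hypothetical bait and library sequences, for illustration only.
baits = ["AACCGGTA"]
library = [
    "GGGTACCGGTTCCC",  # contains TACCGGTT, the complement of the bait
    "TTTTTTTTTTTTTT",  # no complementary site
    "AATACCGGTTAA",    # contains the complementary site
]
enriched = capture(library, baits)
```

Molecules without a complementary site are dropped, which mirrors the wash step that removes unbound analytes before re-amplification.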

As shown in FIGs. 8A-8B, using two different panels (a 200-gene cancer panel (i.e., hllk.200) and a 400-gene immune panel), spatial libraries from multiple samples (i.e., heart, breast, breast cancer, and lymph) showed very high on-target and mapped read percentages, enrichment, and UMI complexity (relative to the parent sample). These data demonstrate that the panels disclosed herein for targeted hybridization enrichment methods are compatible with prepared spatial libraries.

Example 2: Targeted Gene Capture

In another example, the steps of targeted oligonucleotide hybridization can be performed prior to the adapter ligation and SI-PCR steps.

An exemplary workflow is shown in FIG. 9. The workflow utilizes a plurality of spatially generated library molecules obtained from a gene expression library. Optionally, the library is amplified by PCR to increase the number of library molecules for targeted enrichment. In this example, up to 8 different libraries are pooled together prior to enrichment. Optionally, blocking oligos are hybridized to the libraries to prevent the modification of the targeted sequences, thus preventing downstream interference during sequencing applications. The pooled libraries are dried down for 2-4 hours in a SpeedVac, which allows samples to be dried under low pressure without heat. Biotinylated bait oligonucleotides are added and hybridized with the pooled library samples at 65 °C for 2 hours. Avidin- or streptavidin-coated beads are provided to capture the biotinylated oligonucleotide/targeted capture complexes at 65 °C for 5 minutes. The avidin or streptavidin beads are washed to remove any unbound sequence reads, and post-capture PCR is performed to reamplify the targeted library (e.g., to amplify the target sequences hybridized to the biotinylated oligonucleotide baits and subsequently captured by the avidin or streptavidin beads), followed by SPRI cleanup for 15 minutes. The purified targeted library pool is then sequenced.

Example 3: Hybridization and Capture using Targeted Panel

An exemplary workflow of the hybridization and capture approach using a spatial gene expression library is shown in FIG. 10.

Specifically, a biological sample (e.g., a tissue section) is obtained using a suitable method described herein, e.g., by cryo-sectioning of a tissue sample. The tissue section is then stained, e.g., by H&E staining, and imaged using any suitable method described herein (e.g., brightfield microscopy). The stained tissue section is then destained using an appropriate method described herein.

After the tissue section is imaged and destained, it is permeabilized and then contacted with an array comprising a plurality of capture probes. Each capture probe comprises a spatial barcode and a poly(T) sequence that hybridizes to the poly(A) tail of an mRNA analyte in the tissue section, thereby capturing multiple mRNA analytes on the array. The captured mRNAs are then reverse transcribed into cDNAs and amplified using a suitable method described herein. As a result, a cDNA library for spatial gene expression analysis is constructed (final library from GEX spatial array).

Subsequently, hybridization and capture using bait sets are performed. Blocking oligos are hybridized to the analytes in the library to prevent the modification of the sequencing related locations, thus preventing downstream interference during sequencing applications. The library is then dried down for 2-4 hours in a SpeedVac, which allows samples to be dried under low pressure without heat. The biotinylated bait sets are added to hybridize with the library samples at 65 °C for 2 hours. The avidin- or streptavidin-coated beads are provided to capture the biotinylated bait set-sequence read complexes at 65 °C for 5 minutes. The avidin or streptavidin beads are washed at 65 °C to remove any unbound sequence reads. In the alternative, biotinylated bait sets are added to hybridize with the library samples at 65 °C overnight (e.g., for at least 8 hours), and the avidin or streptavidin beads are washed at 60 °C to remove any unbound sequence reads. PCR is performed for 15 minutes on the avidin or streptavidin beads to reamplify the targeted library (e.g., to amplify the sequence reads hybridized to the biotinylated baits and subsequently captured by the avidin or streptavidin beads), followed by SPRI cleanup for 15 minutes. The purified targeted library pool is then sequenced.

Example 4: Mouse tissues and neuro panel spatial transcriptomics

A mouse glia panel of bait oligonucleotides was generated and targeted gene expression was performed on mouse brain tissue sections. Experiments were performed as described in the 10x Genomics Targeted Gene Expression-Spatial User Guide protocol, except that mouse brain tissues were used in lieu of human tissue and the mouse glia panel was used.

A whole transcriptome panel (control) versus a targeted 65 neuro gene mouse glia panel was run on a Visium array. As seen in FIG. 11, the targeted gene panel shows highly efficient UMI recovery and retains the spatial distribution seen in the control (Visium whole transcriptome assay). FIGs. 12A-12B show pictorially the OLIG2 spatial gene distribution in the control library (FIG. 12A) as compared to the OLIG2 gene in the targeted gene panel library (FIG. 12B). For the control, there were approximately 72k reads/spot with 2211 UMI counts, whereas the targeted gene panel had approximately 3.5k reads/spot with 3315 UMI counts. The OLIG2 gene encodes the oligodendrocyte transcription factor 2, which is expressed in oligodendroglial brain cells; in humans, the OLIG2 gene is expressed in oligodendroglial tumors of the brain. The results demonstrate that the same methods for targeted gene expression enrichment can also be performed using mouse bait oligonucleotides in mouse tissue samples.
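One way to read the OLIG2 figures reported above is as UMI counts recovered per thousand sequencing reads per spot. The following Python sketch performs that arithmetic using only the numbers stated in this example. It assumes the two UMI counts were measured on a comparable basis, which the text does not spell out, and the function name `umis_per_thousand_reads` is hypothetical.

```python
def umis_per_thousand_reads(umi_count, reads_per_spot):
    """UMI counts recovered per 1,000 sequencing reads per spot."""
    if reads_per_spot <= 0:
        raise ValueError("reads_per_spot must be positive")
    return umi_count / (reads_per_spot / 1000)


# Figures taken from the example text (approximate read depths).
control = umis_per_thousand_reads(2211, 72_000)   # whole transcriptome control
targeted = umis_per_thousand_reads(3315, 3_500)   # targeted mouse glia panel

# Relative sequencing efficiency of the targeted panel for this gene.
ratio = targeted / control
```

Under that assumption, the targeted library recovers on the order of thirty times more OLIG2 UMI counts per read than the control.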

Example 5: Targeted gene panels as applied to human tissue sections

Targeted gene panels were used to target capture gene expression in different human tissues from Visium arrays. Targeted gene expression methods were followed as described in the 10x Genomics Targeted Gene Expression Spatial User Guide protocol.

FIGs. 13A-13B show general results comparing enrichment of targets using four different gene panels on human glioblastoma tissue sections with a normal whole transcriptome control Visium (non-targeted) spatial array. For all four panels (i.e., Pan-Cancer, Immunology, Gene Signature and Neuroscience enrichment gene panels), on-target reads for the different genes were between 70%-90% regardless of the targeted panel used (FIG. 13A). As shown by the Visium control, roughly 3%-10% of the whole transcriptome mapped reads corresponded to the genes that were in the targeted panels. Further, when comparing the correlation between the Visium control and the Pan-Cancer targeted panel (as a representative targeted panel, but evidenced for all panels), it was observed that the expression profiles were highly reproducible and correlated (R-squared of 0.98) (FIG. 13B).
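The on-target percentages above imply a fold enrichment, computed as the on-target read fraction after enrichment divided by the fraction of control reads that map to the same panel genes. The following Python sketch shows the arithmetic. Pairing the extremes of the two reported ranges is an assumption made for illustration, since per-panel values are given only in FIG. 13A, and the function name `fold_enrichment` is hypothetical.

```python
def fold_enrichment(on_target_after, on_target_before):
    """Ratio of on-target read fractions after and before enrichment."""
    if not 0 < on_target_before <= 1 or not 0 <= on_target_after <= 1:
        raise ValueError("fractions must lie in [0, 1], with a nonzero baseline")
    return on_target_after / on_target_before


# Extremes of the ranges reported in the text (assumed pairings).
lowest = fold_enrichment(0.70, 0.10)   # least favorable pairing
highest = fold_enrichment(0.90, 0.03)  # most favorable pairing
```

With these pairings, the implied enrichment spans roughly 7-fold to 30-fold.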

To demonstrate that multiple tissue types are compatible with the targeted gene enrichment workflow, twelve different tissue types were evaluated for each of the four targeted gene panels (FIGs. 14A-14B) against a Visium control whole transcriptome spatial assay. The twelve tissue types tested were all from human: breast IDC, breast ILC, triple negative breast cancer, cerebellum, colorectal cancer, cortex, glioblastoma, heart, diseased kidney, lung, ovarian tumor, and spinal cord. Regardless of tissue type or targeted gene enrichment panel, greater than 90% of the targeted genes were recovered (FIG. 14A) with over 80% matched UMIs (FIG. 14B) relative to the control Visium whole transcriptome assay. As such, multiple tissue types are compatible with the targeted gene enrichment workflow described herein.

To demonstrate the preservation of clustering when practicing the targeted gene enrichment methods described herein, a Visium whole transcriptome spatial array on human cerebral cortex tissue was compared to a neuroscience panel targeted gene enrichment spatial array. As seen in FIG. 15A (Visium whole transcriptome control) and FIG. 15B (neuroscience panel), clustering was comparable. In fact, for the neuroscience panel, only 5k raw sequencing reads/spot were needed to recapitulate the major biological pattern seen with the whole transcriptome assay (50k reads/spot for the whole transcriptome control). FIG. 15C confirms this data, demonstrating excellent correlation between the two data sets for UMIs plotted per gene.
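The per-gene correlation reported for FIG. 13B and FIG. 15C can be illustrated by computing the square of the Pearson correlation between control and targeted UMI counts for the same genes. The following is a minimal Python sketch. The gene names and counts are hypothetical, and the log10(count + 1) transform is an assumption, since the text does not state the scale on which the correlation was computed.

```python
import math


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Hypothetical per-gene UMI counts for the two libraries.
control_umis = {"GENE_A": 120, "GENE_B": 15, "GENE_C": 980, "GENE_D": 47}
targeted_umis = {"GENE_A": 110, "GENE_B": 18, "GENE_C": 1010, "GENE_D": 40}

# Compare the same genes in the same order, on a log scale (assumed).
genes = sorted(control_umis)
log_control = [math.log10(control_umis[g] + 1) for g in genes]
log_targeted = [math.log10(targeted_umis[g] + 1) for g in genes]
r2 = r_squared(log_control, log_targeted)
```

A value near 1 indicates that the targeted library preserves the relative expression levels observed in the whole transcriptome control.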

The next question asked was whether the targeted gene panel enrichment protocols could home in on sub-categories of relevant genes in a tissue, and how that might compare to a pathologist’s annotation of a sample. As such, experiments were performed as described herein on human breast cancer tissue sections. FIG. 16A shows a pathologist’s annotation for different cells in a breast cancer tissue: DCIS, fat, fibrous tissue, immune cells, invasive carcinoma cells and normal gland cells were identified. Specifically, regions representing invasive carcinoma cells, fibrous tissue, and immune cells are circled by dashed lines, dotted lines, and solid lines, respectively. FIG. 16B shows the pathologist’s annotations converted to Visium whole transcriptome data. FIG. 16C shows 196 breast cancer genes from the Pan-Cancer gene enrichment panel, narrowed down to the expression of the ERBB2 gene from the Pan-Cancer enrichment panel (FIG. 16D). As seen in the figures, the targeted Pan-Cancer panel, following the 196 breast cancer genes from that panel, demonstrates that targeted gene expression is comparable to a pathologist’s annotation and, further, that ERBB2 was most abundantly expressed in the invasive carcinoma compartment, as expected. Further, all data were obtained using only 5k reads/spot, as compared to 50k reads/spot for a Visium whole transcriptome. Such efficiencies require a fraction of the sequencing cost of performing whole transcriptome spatial analysis, therefore not only benefitting targeted gene expression but also providing cost savings.

As such, the data demonstrate, for the different targeted gene panels individually and for all targeted panels collectively, the power of using targeted gene panels with Visium spatial gene expression to identify gene expression information for specific gene targets related to cancer and other diseases important for human health.

Example 6. The Cancer Group

FIGs. 17A-17D detail the sequence listing of each probe that is in the Cancer Group. The Cancer Group represents a subset of the sequence reads in SEQ ID NO: 1 through SEQ ID NO: 1041963. Each entry in the Cancer Group is a SEQ ID NO of a probe in the Cancer Group as found in the Sequence Listing. Thus, the entry 843 for the Cancer Group is SEQ ID NO: 843 in the Sequence Listing. Turning to SEQ ID NO: 843 in the Sequence Listing, it is seen that the name of this sequence is probe ENSG00000002016_0. The base portion of probe ENSG00000002016_0, ENSG00000002016, stands for gene ENSG00000002016, or RAD52, in the Ensembl release 101 database, the gene which the probe ENSG00000002016_0 is designed to pull down during targeted sequencing. See, Cunningham et al., Ensembl 2019, PubMed PMID: 30407521, doi:10.1093/nar/gky1113, which is hereby incorporated by reference, for details of the Ensembl database. Each sequence in the Sequence Listing provides the name of an entry of the corresponding gene in the Ensembl database for which the probe is designed. For conciseness, in FIGs. 17A-17D, when the Cancer Group includes consecutive ranges of sequences, such as SEQ ID NOS: 843-887, they are listed as 843-887 rather than listing out each sequence listing (e.g., 843, 844, 845, ... , 887). Nevertheless, the Cancer Group includes each of the probes in each of the ranges provided in FIGs. 17A-17D. The genetic targets in the Cancer Group are related to cancer. See, for example, Kandoth et al., 2013, “Mutational landscape and significance across 12 major cancer types,” Nature 502(7471), pp. 333-339; Hoadley et al., 2018, “Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer,” Cell 173(2), pp. 291-304; Bailey et al., 2018, “Comprehensive Characterization of Cancer Driver Genes and Mutations,” Cell 173(2), pp. 371-385, each of which is hereby incorporated by reference. See also the Cancer Genome Atlas Research Network, Illumina TruSight 500, Nanostring nCounter Pan-Cancer Pathway, IDT xGen Pan-Cancer, and the Agilent ClearSeq Comprehensive Cancer.
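The two conventions described in this example, compact ranges of SEQ ID NOs such as 843-887 and probe names such as ENSG00000002016_0, are straightforward to handle programmatically. The following Python sketch expands range entries into individual SEQ ID NOs and recovers the Ensembl gene identifier from a probe name. The function names `expand_ranges` and `gene_id_from_probe` are hypothetical, as is the standalone entry "901" used for illustration.

```python
def expand_ranges(entries):
    """Expand entries such as "843-887" or "901" into sorted SEQ ID NOs."""
    seq_ids = set()
    for entry in entries:
        entry = entry.strip()
        if "-" in entry:
            start, end = (int(part) for part in entry.split("-"))
            seq_ids.update(range(start, end + 1))  # ranges are inclusive
        else:
            seq_ids.add(int(entry))
    return sorted(seq_ids)


def gene_id_from_probe(probe_name):
    """Return the base portion of a probe name, e.g. the Ensembl gene ID."""
    gene_id, _, _probe_index = probe_name.partition("_")
    return gene_id


# "843-887" is the range cited in the text; "901" is a hypothetical single entry.
cancer_group_ids = expand_ranges(["843-887", "901"])
rad52_gene_id = gene_id_from_probe("ENSG00000002016_0")
```

Expanding the ranges this way yields the full membership of a probe group, consistent with the statement that each group includes every probe in each listed range.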

Example 7. The Immunology Probe Group

FIGs. 18A-18D detail the Sequence Listing of each probe that is in the Immunology Probe Group. The Immunology Probe Group represents a subset of the sequence reads in SEQ ID NO: 1 through SEQ ID NO: 1041963. Each entry in the Immunology Probe Group is a SEQ ID NO of a probe in the Immunology Probe Group as found in the Sequence Listing. Thus, the entry 888 for the Immunology Probe Group is SEQ ID NO: 888 in the Sequence Listing. Turning to SEQ ID NO: 888 in the Sequence Listing, it is seen that the name of this sequence is probe ENSG00000002549_0. The base portion of probe ENSG00000002549_0, ENSG00000002549, stands for gene ENSG00000002549, or LAP3, in the Ensembl release 101 database, the gene which the probe ENSG00000002549_0 is designed to pull down during targeted sequencing. See, Cunningham et al., Ensembl 2019, PubMed PMID: 30407521, doi:10.1093/nar/gky1113, which is hereby incorporated by reference, for details of the Ensembl database. Each sequence in the Sequence Listing provides the name of an entry of the corresponding gene in the Ensembl database for which the probe is designed. For conciseness, in FIGs. 18A-18D, when the Immunology Probe Group includes consecutive ranges of sequences, such as SEQ ID NOS: 888-894, they are listed as 888-894 rather than listing out each sequence listing (e.g., 888, 889, 890, ... , 894). Nevertheless, the Immunology Probe Group includes each of the probes in each of the ranges provided in FIGs. 18A-18D. The genetic targets in the Immunology Probe Group are related to immunology. See, for example, Thorsson et al., 2018, “The Immune Landscape of Cancer,” Immunity 48(4), pp. 812-830, which is hereby incorporated by reference. See also BD (T Cell, Immune Response), HTG (HTG EdgeSeq Immuno-Oncology Assay), and Nanostring (Pan Cancer Immune Profiling, Adaptive Immunity, Innate Immunity).

Example 8. The Pathway Group

FIGs. 19A-19D detail the sequence listing of each probe that is in the Pathway Group. The Pathway Group represents a subset of the sequence reads in SEQ ID NO: 1 through SEQ ID NO: 1041963. Each entry in the Pathway Group is a SEQ ID NO of a probe in the Pathway Group as found in the Sequence Listing. Thus, the entry 1 for the Pathway Group is SEQ ID NO: 1 in the Sequence Listing. Turning to SEQ ID NO: 1 in the Sequence Listing, it is seen that the name of this sequence is probe ENSG00000000003_0. The base portion of probe ENSG00000000003_0, ENSG00000000003, stands for gene ENSG00000000003, or TSPAN6, in the Ensembl release 101 database, the gene which the probe ENSG00000000003_0 is designed to pull down during targeted sequencing. See, Cunningham et al., Ensembl 2019, PubMed PMID: 30407521, doi:10.1093/nar/gky1113, which is hereby incorporated by reference, for details of the Ensembl database. Each sequence in the Sequence Listing provides the name of an entry of the corresponding gene in the Ensembl database for which the probe is designed. For conciseness, in FIGs. 19A-19D, when the Pathway Group includes consecutive ranges of sequences, such as SEQ ID NOS: 1-37, they are listed as 1-37 rather than listing out each sequence listing (e.g., 1, 2, 3, ... , 37). Nevertheless, the Pathway Group includes each of the probes in each of the ranges provided in FIGs. 19A-19D. The genetic targets for the Pathway Group are related to biological pathways. See, for example, Behan et al., 2019, “Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens,” Nature 568, pp. 511-516; Sanchez-Vega et al., 2018, “Oncogenic Signaling Pathways in The Cancer Genome Atlas,” Cell 173(2), pp. 321-337; and Fang et al., 2019, “A genetics-led approach defines the drug target landscape of 30 immune-related traits,” Nature Genetics 51(7), pp. 1082-1091, each of which is hereby incorporated by reference.

Example 9. The Neuro Group

FIGs. 20A-20D detail the sequence listing of each probe that is in the Neuro Group. The Neuro Group represents a subset of the sequence reads in SEQ ID NO: 1 through SEQ ID NO: 1041963. Each entry in the Neuro Group is a SEQ ID NO of a probe in the Neuro Group as found in the Sequence Listing. Thus, the entry 632 for the Neuro Group is SEQ ID NO: 632 in the Sequence Listing. Turning to SEQ ID NO: 632 in the Sequence Listing, it is seen that the name of this sequence is probe ENSG00000001626_0. The base portion of probe ENSG00000001626_0, ENSG00000001626, stands for gene ENSG00000001626, or CFTR, in the Ensembl release 101 database, the gene which the probe ENSG00000001626_0 is designed to pull down during targeted sequencing. See, Cunningham et al., Ensembl 2019, PubMed PMID: 30407521, doi:10.1093/nar/gky1113, which is hereby incorporated by reference, for details of the Ensembl database. Each sequence in the Sequence Listing provides the name of an entry of the corresponding gene in the Ensembl database for which the probe is designed. For conciseness, in FIGs. 20A-20D, when the Neuro Group includes consecutive ranges of sequences, such as SEQ ID NOS: 632-705, they are listed as 632-705 rather than listing out each sequence listing (e.g., 632, 633, 634, ... , 705). Nevertheless, the Neuro Group includes each of the probes in each of the ranges provided in FIGs. 20A-20D. The genetic targets for the Neuro Group are related to neurobiology.

The genes that are represented in the Neuro Group are ABAT, ABCA1, ABCA7, ABCB1, ABCG2, ABI3, ABL1, ACAA1, ACE, ACHE, ACIN1, ACTG1, ACTN1, ACVRL1, ADAM10, ADAM15, ADAM17, ADARB1, ADCY5, ADCY8, ADCY9, ADCYAP1, ADGRG1, ADM, ADORA1, ADORA2A, ADRA2A, ADRA2C, ADRB1, ADRB2, AGER, AGT, AGTR1, AIF1, AK3, AKT1, AKT1S1, AKT2, AKT3, ALB, ALDH1A1, ALDH1L1, ALDH7A1, ALK, ALOX12, ALOX12B, ALOX15, ALS2, AMIGO1, AMPH, ANG, ANGPT2, ANO3, ANXA11, AP1S1, AP2A2, AP2B1, AP3M2, AP3S1, AP4S1, APAF1, APBB1, APC, APEX1, APOA1, APOD, APOE, APP, APTX, AQP1, AQP4, AR, ARC, ARHGAP44, ARHGEF10, ARMC10, ARRB2, ARSA, ARX, ASCL1, ASCL2, ASPM, ATCAY, ATF4, ATF6, ATG5, ATM, ATP13A2, ATP1A2, ATP1A3, ATP2B3, ATP6AP2, ATP6V0C, ATP6V0D1, ATP6V0E1, ATP6V0E2, ATP6V1A, ATP6V1D, ATP6V1E1, ATP6V1G2, ATP6V1H, ATP7A, ATP8A2, ATRN, ATRX, ATXN2, ATXN3, ATXN7, AVP, B4GALT6, BACE1, BACE2, BAD, BAI1, BAX, BCAS1, BCAS2, BCHE, BCL2, BCL2L1, BDNF, BECN1, BID, BIN1, BIRC5, BMI1, BNIP3, BRAF, BRMS1L, BTNL2, C21ORF2, C3, C5, C6, C9ORF72, CA1, CA2, CAB39, CACNA1A, CACNA1B, CACNA1C, CACNA1D, CACNA1F, CACNA1S, CACNB2, CACNB4, CADM3, CADPS, CALB1, CALB2, CALCA, CALM1, CALML5, CAMK2B, CAMK2D, CAMK2G, CAMK4, CAPN1, CASK, CASP1, CASP3, CASP6, CASP7, CASP8, CASP9, CASS4, CAST, CAV1, CBS, CCK, CCKBR, CCL2, CCL5, CCND1, CCNH, CCR2, CCR5, CCS, CD11B, CD14, CD163, CD2AP, CD31, CD33, CD34, CD39, CD4, CD40, CD44, CD68, CD8A, CD9, CDC27, CDC40, CDC6, CDH1, CDH2, CDK2, CDK5, CDK5R1, CDK5RAP2, CDK5RAP3, CDK7, CDKL5, CDKN1A, CDKN1B, CDKN2A, CDS1, CELF1, CEND1, CENPJ, CEP135, CEP41, CER1, CERS1, CERS2, CERS4, CERS6, CFTR, CHAT, CHCHD10, CHCHD2, CHD4, CHGA, CHI3L1, CHL1, CHMP1A, CHMP2B, CHRM1, CHRM2, CHRM3, CHRM5, CHRNA1, CHRNA3, CHRNA4, CHRNA5, CHRNA7, CHRNB2, CKB, CLCN6, CLDN15, CLDN5, CLN3, CLN5, CLN8, CLU, CNKSR2, CNP, CNR1, CNR2, CNTF, CNTN1, CNTN4, CNTNAP1, CNTNAP2, COL4A1, COL4A2, COMT, COQ2, CP, CPLX1, CPT1B, CR1, CREB1, CREBBP, CRH, CRHR1, CRP, CRTC2, CRYAB, CSF1, CSF1R, CSF2RB, CSNK1A1, CSPG4, CSTB, CTNNB1, CTNS, CTSC, CTSD, CTSE, CUL1, CUL2, CUL3, CX3CL1, CX3CR1, CXCL10, CXCL11, CXCL12, CXCL16, CXCL8, CXCR4, CXXC1, CYBB, CYCS, CYP19A1, CYP27A1, CYP46A1, CYP4X1, DAGLA, DAO, DBH, DBNL, DCC, DCTN1, DCX, DDC, DDIT3, DDX23, DES, DGKB, DGKE, DHCR7, DISC1, DKK1, DLAT, DLD, DLG3, DLG4, DLGAP1, DLL4, DLX1, DLX2, DMBT1, DMD, DMPK, DNAH1, DNAJA2, DNAJC5, DNAJC6, DNM1L, DNM2, DNMT1, DOT1L, DRD1, DRD2, DRD4, EDN1, EDNRB, EEF1A1, EFHC1, EFNA1, EFNA5, EFNB3, EFR3A, EGF, EGFL7, EGFR, EGLN1, EGLN3, EGR1, EGR2, EHMT1, EIF2S1, EIF4G1, ELAVL1, EMCN, EMP2, EN2, ENG, ENO2, ENTPD2, ENTPD4, EP300, EPAS1, EPHA1, EPHA2, EPHA3, EPHA4, EPHA5, EPHA6, EPHA7, EPHB2, EPO, ERBB2, ERBB3, ERBB4, ERCC1, ERG, ERLEC1, ERO1L, ESAM, ESR1, ESR2, F2, F2RL1, FA2H, FAM126A, FAM174A, FANCF, FAS, FASLG, FBXO7, FERMT2, FGF12, FGF14, FGF2, FGFR1, FGFR3, FIG4, FKTN, FLNA, FLT1, FLT4, FMR1, FN1, FOLH1, FOLR1, FOS, FOXG1, FOXP2, FRMPD4, FUS, FXN, FYN, GAA, GABRA1, GABRA2, GABRA4, GABRA5, GABRB2, GABRB3, GABRG1, GABRG2, GABRP, GABRR1, GABRR3, GAD1, GAD2, GAK, GAL, GAL3ST1, GALC, GATA2, GBA, GCG, GCH1, GDNF, GDPD2, GFAP, GFPT1, GGA1, ENSG00000100031, ENSG00000286070, GHRL, GIGYF2, GJA1, GJB1, GLRB, GLS, GLUD1, GNAI1, GNAI2, GNAI3, GNAO1, GNAQ, GNAS, GNB5, GNG2, GNGT1, GNPTAB, GNPTG, GNRH1, GNRHR, GPD1L, GPHN, GPR37, GPR4, GPR84, GPRASP1, GRIA1, GRIA2, GRIA3, GRIA4, GRIK2, GRIN1, GRIN2A, GRIN2B, GRIN2C, GRIN2D, GRIN3B, GRM1, GRM2, GRM3, GRM5, GRM8, GRN, GSK3B, GSN, GSR, GSS, GSTP1, GTF2A1, GTF2B, GTF2H1, GTF2H3, GTF2IRD1, GUCY1B3, GUSB, HAP1, HCN1, HDAC1, HDAC2, HDAC6, HDAC7, HEXB, HGF, HIF1A, HIF3A, HLA-DRA1, HLA-DRB1, HLA-B, HLA-A, HMGB1, HMOX1, HNRNPA1, HNRNPA2B1, HNRNPM, HOMER1, HOXA2, HPCAL1, HPGDS, HRAS, HRH3, HSPA6, HSPB1, HTR1A, HTR1B, HTR2A, HTR2C, HTR3A, HTR4, HTR5A, HTRA2, HTT, ICAM1, ICAM2, IDE, IDH1, IDH2, IDO1, IFNG, IGF1, IGF1R, IGF2, IGFBP2, IKBKB, IL10, IL10RA, IL13RA1, IL15RA, IL18, IL1A, IL1B, IL1R1, IL1RN, IL2, IL4, IL4R, IL6, IL6R, INA, INHBB, INPP4A, INPP5D, INPP5F, INS, INSR, IPCEF1, IRF8, ISLR2, ITGA5, ITGA7, ITGAL, ITGAM, ITGAX, ITGB3, ITM2B, ITPR1, ITPR2, ITPR3, JAM3, JUN, KATNA1, KCNA1, KCNB1, KCNC3, KCND3, KCNJ10, KCNK9, KCNMA1, KCNN3, KCNQ2, KCNQ3, KDELR2, KDR, KEAP1, KEL, KIAA1161, KIF3A, KIF5A, KIFBP, KIT, KLK6, KNL1, KRAS, L1CAM, LAMA2, LAMB2, LAMP1, LARGE1, LCLAT1, LDHC, LDLR, LEP, LEPR, LGALS3, LGI1, LIF, LINGO1, LMNA, LMNB1, LOX, LPAR1, LPL, LRP1, LRRC25, LRRC4, LRRC61, LRRK2, LSM2, LSM7, LSR, LTBR, LYPLA1, MAG, MAGEE1, MAL, MAN2B1, MAP2, MAP2K1, MAP2K2, MAPK1, MAPK10, MAPK3, MAPK8, MAPK9, MAPKAPK2, MAPT, MARCO, MARK2, MARK4, ENSG00000015479, ENSG00000280987, MBP, MDM2, MEAF6, MECP2, MED10, MEF2C, MET, MFN2, MFSD8, MGMT, MIF, MKI67, MKS1, MME, MMP12, MMP14, MMP16, MMP19, MMP2, MMP24, MMP3, MMP9, MMRN2, MNAT1, MOG, MPZ, MS4A4A, MS4A6E, MSH6, MSI1, MSN, MT-ATP6, MT-ND1, MT1H, MTA1, MTA2, MTHFR, MTOR, MUTYH, MYC, MYCN, MYCT1, MYD88, MYH10, MYRF, NAGLU, NAMPT, NAPSA, NCAM1, NCF1, NCL, NEFH, NEFL, NEGR1, NEK1, NELFA, NELL2, NEO1, NES, NEUN, NF1, NFE2L2, NFKB1, NFKBIA, NFKBIB, NGF, NGFR, NINJ2, NKX2-1, NKX6-2, NLGN4X, NLRP3, NMB, NME5, NME8, NMNAT2, NOG, NOL3, NOS1, NOS2, NOS3, NOSTRIN, NOTCH1, NOTCH2, NOTCH3, NOTCH4, NOVA1, NPAS4, NPC1, NPC2, NPHP1, NPPB, NPR1, NPTN, NPY, NQO1, NR3C1, NR3C2, NR4A2, NRG1, NRP1, NRP2, NRXN1, NSF, NTF3, NTN1, NTNG1, NTRK1, NTRK2, NTRK3, NTS, NWD1, OCLN, OLFM3, OLIG2, OLR1, ONECUT2, OPA1, OPHN1, OPRK1, OPRM1, OPTN, ORC4, ORC6, OSMR, OTUD4, OXR1, OXTR, P2RX4, P2RX7, P2RY12, P4HA1, PADI2, PAFAH1B1, PAH, PAK1, PALM, PANK2, PARK2, PARK5, PARK7, PARP1, PAX2, PAX3, PAX6, PCDH19, PCNA, PCNT, PCSK2, PDE1B, PDE4D, PDGFRA, PDGFRB, PDPK1, PECAM1, PFN1, PGAM1, PGK1, PHF19, PHF2, PHF21A, PICALM, PIK3CA, PIK3CB, PIK3R1, PINK1, PKN1, PLA2G16, PLA2G2A, PLA2G2E, PLA2G2F, PLA2G4A, PLA2G4B, PLA2G4C, PLA2G4D, PLA2G4E, PLA2G4F, PLA2G6, PLAU, PLAUR, PLCB1, PLCB2, PLCB3, PLCB4, PLCG2, PLCL2, PLEKHG4, PLEKHO2, PLLP, PLOD2, PLP1, PLS1, PLXNB3, PLXNC1, PMCH, PMP22, PNKD, POC1A, PODNL1, POLG, POLR2B, POLR2H, POLR2J, POLR2K, POLR2L, POMC, POMGNT1, POU5F1, PPARG, PPARGC1A, PPM1L, PPP1R1B, PPP2CA, PPP2R5C, PPP2R5E, PPP3CA, PPP3CB, PPP3CC, PPT1, PQBP1, PRF1, PRKAA2, PRKACA, PRKACB, PRKACG, PRKAR1B, PRKCA, PRKCB, PRKCE, PRKCG, PRKCQ, PRKCSH, PRKN, PRKRA, PRL, PRNP, PROM1, PRPF3, PRPF31, PRPH, PRRT2, PSEN1, PSEN2, PSMB8, PSMB9, PTDSS1, PTDSS2, PTEN, PTGDS, PTGS2, PTK2, PTK2B, PTPRN, PTPRN2, PTPRR, PVALB, PVRL2, QKI, RAB2A, RAB38, RAB39B, RAB3A, RAB3C, RAB7L1, RABGEF1, RAC1, RAD23B, RAF1, RAN, RAPGEF2, RARS2, RASGRP1, RB1, RBBP8, RBFOX3, RCAN1, RDX, RELA, RELN, REST, RET, RHEB, RHOA, RIMS1, RIN3, RING1, RIT2, RNF216, ROBO1, RPL39L, RPP25, RRAS, RTN4, RYR1, RYR2, RYR3, S100G, S100B, SARM1, SART1, SCAMP2, SCARB2, SCN1A, SCN2A, SCN5A, SCN8A, SCN9A, SEC23A, SEMA3A, SERPINB6, SERPINE1, SERPINI1, SETX, SF3A2, SF3B2, SF3B4, SGK1, SGPL1, SGTA, SH3TC2, SHANK2, SHH, SIGMAR1, SIRT1, SIRT2, SIRT7, SIX3, SLA, SLC11A1, SLC12A5, SLC17A6, SLC17A7, SLC18A2, SLC18A3, SLC1A1, SLC1A2, SLC1A3, SLC24A4, SLC25A19, SLC2A1, SLC2A3, SLC32A1, SLC4A10, SLC6A3, SLC6A4, SLC6A6, SLC8A1, SLC9A6, SLIT1, SLIT2, SLIT3, SLU7, SMARCB1, SMN1, SMPD4, SMYD1, SNAP91, SNCA, SNCAIP, SNCB, SNRPA, SOD1, SOD2, SORCS3, SORL1, SOX10, SOX2, SOX9, SP1, SP100, SPAG4, SPARC, SPAST, SPG11, SPI1, SPP1, SPTBN2, SQSTM1, SRC, SRGAP1, SRGAP2, SRGAP3, SRI, SRSF4, SRY, SST, ST3GAL3, ST3GAL5, STAB1, STAMBPL1, STAT1, STAT3, STC1, STC2, STIL, STX1A, STX1B, STX2, SUCLA2, SYN1, SYNJ1, SYP, SYT1, SYT13, SYT4, SYT7, TAC1, TACR1, TAF1, TAF10, TAF4, TAF4B, TAF6L, TAF9, TARDBP, TAU, TAZ, TBC1D24, TBK1, TBP, TBPL1, TBR1, TCERG1, TCIRG1, TENM2, TERT, TF, TFAM, TGFB1, TGFB2, TGFBR2, TGIF1, TGM2, TH, THAP1, THY1, TIA1, TIE1, TLR2, TLR3, TLR4, TMEM106B, TMEM119, TMEM216, TMEM230, TMEM237, TMEM67, TNC, TNF, TNFRSF10A, TNFRSF10B, TNFRSF10D, TNFRSF11B, TNFRSF12A, TNFRSF1A, TNFRSF1B, TNFSF10, TNNI3, TNR, TOMM40, TOR1A, TP53, TP53INP1, TP73, TPH1, TPH2, TPM1, TPP1, TRADD, TREM1, TREM2, TRIM28, TRIM37, TRIP4, TRPA1, TRPM2, TRPV1, TRPV4, TSC1, TSC2, TSEN34, TSPAN13, TSPO, TUBA4A, TUBA8, TUBB3, TWISTNB, TWNK, TXNL1, TYROBP, U2AF2, UBB, UBE2K, UBE2N, UBE3A, UBQLN1, UBQLN2, UCHL1, UGCG, UGT8, UNC13A, USP21, USP30, VAPB, VCP, VEGFA, VEGFB, VHL, VIP, VPS13C, VPS13D, VPS35, WDR62, WFDC2, WFS1, WNT10A, WNT6, WT1, XAB2, XBP1, XIAP, XK, ZBED6CL, ZCWPW1, ZEB2, ZIC2, and ZNF24.