Title:
NUCLEIC ACID PROBES
Document Type and Number:
WIPO Patent Application WO/2024/006361
Kind Code:
A1
Abstract:
The technology relates in part to compositions that include oligonucleotide probes useful for enriching nucleic acid preparation products for analysis, kits and related methods. Described herein are compositions comprising: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample, for use in capture-C technologies and methods associated therewith.

Inventors:
BELTON JON-MATTHEW (US)
SCHMITT ANTHONY (US)
ZHOU XIANG (US)
AL-BASSAM MAHMOUD (US)
JIVANJEE IBRAHIM (US)
Application Number:
PCT/US2023/026458
Publication Date:
January 04, 2024
Filing Date:
June 28, 2023
Assignee:
ARIMA GENOMICS INC (US)
International Classes:
C12Q1/6874; C12N15/10; C12N5/07; C12Q1/6806; C12Q1/6832; C12Q1/6869; G16B20/00; G16B20/20; G16B30/00
Domestic Patent References:
WO2020236851A1 2020-11-26
Foreign References:
US11365446B2 2022-06-21
Other References:
BLANC VALERIE, RIORDAN JESSE D., SOLEYMANJAHI SAEED, NADEAU JOSEPH H., NALBANTOGLU ILKE, XIE YAN, MOLITOR ELIZABETH A., MADISON BL: "Apobec1 complementation factor overexpression promotes hepatic steatosis, fibrosis, and hepatocellular cancer", JOURNAL OF CLINICAL INVESTIGATION, vol. 131, no. 1, 4 January 2021 (2021-01-04), XP093127541, ISSN: 0021-9738, DOI: 10.1172/JCI138699
Attorney, Agent or Firm:
WEEKS, Anne, E. et al. (US)
Claims:
What is claimed is:

1. A composition, comprising: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample, wherein: a plurality of intron-directed oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns comprising (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the oligonucleotide probe pairs in the set of oligonucleotide probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each oligonucleotide probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

2. The composition of claim 1, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from a cancer gene selected from the group of cancer genes listed in Appendix 1.

3. The composition of claim 1, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 110 to about 130 consecutive nucleotides.

4. The composition of claim 1, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns capable of hybridizing to a target nucleic acid fragment are not capable of hybridizing to contiguous, non-overlapping regions of the nucleic acid fragment.

5. A method for nucleic acid enrichment, comprising: subjecting target nucleic acid from a nucleic acid sample to nucleic acid cleavage conditions in which nucleic acid fragments are generated; subjecting the target nucleic acid fragments to linking conditions in which proximity ligated nucleic acid molecules are generated; contacting the proximity ligated nucleic acid molecules with a composition comprising a plurality of oligonucleotide probes under hybridization conditions in which hybridization complexes comprising proximity ligated nucleic acid hybridized to oligonucleotide probes are generated; isolating the complexes; and analyzing nucleic acid in the complexes; and wherein the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample, wherein: a plurality of intron-directed oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns comprises (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the oligonucleotide probe pairs in the set of oligonucleotide probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each oligonucleotide probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

6. The method of claim 5, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from a cancer gene selected from the group of cancer genes listed in Appendix 1.

7. The method of claim 5, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 110 to about 130 consecutive nucleotides.

8. The method of claim 5, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns capable of hybridizing to a target nucleic acid fragment are not capable of hybridizing to contiguous, nonoverlapping regions of the nucleic acid fragment.

9. A method for designing a plurality of oligonucleotide probes, comprising: identifying a plurality of target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample; designing a plurality of intron-directed oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns, comprising (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer in length, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; wherein: the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the oligonucleotide probe pairs in the set of oligonucleotide probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each oligonucleotide probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

10. The method of claim 9, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from a cancer gene selected from the group of cancer genes listed in Appendix 1.

11. The method of claim 9, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 110 to about 130 consecutive nucleotides.

12. The method of claim 9, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns capable of hybridizing to a target nucleic acid fragment are not capable of hybridizing to contiguous, nonoverlapping regions of the nucleic acid fragment.

13. A kit, comprising: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample, wherein: a plurality of oligonucleotide probes capable of hybridizing to introns comprises (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; the probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) 120 consecutive nucleotides in length, (ii) 100% complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the probe pairs in the set of probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end 5 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end 5 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each probe in the set of single oligonucleotide probes comprises a 5’ or 3’ end 5 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

14. The kit of claim 13, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from a cancer gene selected from the group of cancer genes listed in Appendix 1.

15. The kit of claim 13, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 110 to about 130 consecutive nucleotides.

16. The kit of claim 13, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns capable of hybridizing to a target nucleic acid fragment are not capable of hybridizing to contiguous, nonoverlapping regions of the nucleic acid fragment.

Description:
NUCLEIC ACID PROBES

Field

The technology relates in part to compositions that include oligonucleotide probes useful for enriching nucleic acid preparation products for analysis, kits and related methods.

Cross Reference to Related Applications

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application number 63/356,878 filed June 29, 2022, and U.S. provisional application number 63/402,043, filed August 29, 2022. The entire contents of each of these referenced applications is incorporated by reference herein.

Chromosome conformation capture (3C) and similar technologies (HiC, 4C, 5C and the like) have been used to map long-range interactions and to probe the three-dimensional architecture of whole genomes.

However, these technologies have limitations. They require very deep sequencing, which makes them time consuming and expensive. There is therefore a need for new techniques that address these problems, that make it possible to evaluate and detect direct intra- and inter-chromosomal interactions of interest, and that potentially allow the resulting information to be used to diagnose specific medical and/or biological conditions.

Sahlen et al. (WO2014168575) disclose a method (“capture Hi-C”) of targeted chromosome conformation capture that combines HiC technology with hybridization of probes to targeted regions, which allows the targeted regions to be enriched for HiC analysis.

Disclosed herein is a composition of nucleic acid probes for use in capture-C technologies and methods associated therewith.

Described herein are compositions comprising: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample, wherein: a plurality of intron-directed oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns comprising (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the oligonucleotide probe pairs in the set of oligonucleotide probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each oligonucleotide probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

Also described herein are methods for designing a plurality of oligonucleotide probes, comprising: identifying a plurality of target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample; designing a plurality of intron-directed oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns, comprising (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer in length, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; wherein: the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the oligonucleotide probe pairs in the set of oligonucleotide probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each oligonucleotide probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

Also described herein is a kit, comprising: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample, wherein: a plurality of oligonucleotide probes capable of hybridizing to introns comprises (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; the probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) 120 consecutive nucleotides in length, (ii) 100% complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the probe pairs in the set of probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end 5 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end 5 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each probe in the set of single oligonucleotide probes comprises a 5’ or 3’ end 5 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

In other aspects described herein, the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from a cancer gene selected from the group of cancer genes listed in Appendix 1; the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 110 to about 130 consecutive nucleotides; and the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns capable of hybridizing to a target nucleic acid fragment are not capable of hybridizing to contiguous, non-overlapping regions of the nucleic acid fragment.

Brief Description of the Drawings

The drawings illustrate certain implementations of the technology and are not limiting. For clarity and ease of illustration, the drawings are not made to scale and, in some instances, various aspects may be shown exaggerated or enlarged to facilitate an understanding of particular implementations.

FIGs. 1A-1B show a schematic of probes covering the entire breakpoint cluster region (BCR) gene.

FIGs. 2A-2B show a schematic of probes covering the first exon and part of the first intron of the BCR gene.

FIGs. 3A-3B illustrate simulated capture HiC data for the oligonucleotide probe panel.

FIGs. 4A-4C illustrate probe performance, as shown by read coverage, when probes are selected for size and percent GC content.

FIGs. 5A-5D illustrate probe performance comparing the sparse design improvements described herein, which are also selected for size and percent GC content, as compared to standard dense design probes.

FIGs. 6A-6C illustrate probe performance comparing the sparse design improvements described herein, which are also selected for size and percent GC content, as compared to standard dense design probes.

FIGs. 7A-7B illustrate exemplary capture HiC data for the oligonucleotide probe panel as compared to whole genome HiC data.

FIGs. 8A-8B illustrate capture loops that are associated with the MYC gene in alignment with reported epigenetic data.

Appendix

Appendix 1 presents a list of cancer genes containing polynucleotide regions to which oligonucleotide probes can hybridize, and/or to which oligonucleotide probes can be designed to hybridize, in certain implementations. For each cancer gene, Appendix 1 shows the name of the gene, the chromosome on which it is located, its start and end positions according to coordinate positions from the Genome Reference Consortium Human Build 38 (GRCh38), and the strand, Watson (+) or Crick (-), on which the gene is oriented in the sense direction.

Detailed Description

Oligonucleotide probe compositions described herein can be utilized to (i) detect structural variant (SV) breakpoints in exons in the genes in the panel; (ii) detect SV breakpoints in introns in the genes in the panel; (iii) detect SV breakpoints upstream or downstream of the genes in the panel (e.g., neighborhood SVs); (iv) identify novel looping interactions with each of the genes, referred to as “neoloops,” associated with breakpoints in or near the genes in the panel; (v) detect SVs in formalin-fixed paraffin-embedded (FFPE) tissues; or (vi) a combination of two or more of any of (i), (ii), (iii), (iv) and (v). Thus, oligonucleotide probe compositions described herein provide advantages of (i) detecting SVs in exons as well as in introns and outside of genes; (ii) detecting SVs in FFPE tissues; (iii) detecting SVs in solid tumor samples; and (iv) detecting SVs in undiagnosed tumor samples.

Oligonucleotide probe compositions (e.g., a panel of oligonucleotide probes that hybridize to or target one or more of the cancer genes presented in Appendix 1) may be generated using one or more design methodologies as described herein. In some embodiments, a panel of oligonucleotide probes is generated using a sparse design. In some embodiments, a sparse design for generating a panel of oligonucleotide probes involves constraining the panel to consist of oligonucleotide probes having a %GC content of 40% to 60%, constraining the panel such that discrete oligonucleotide probes do not comprise overlapping cut sites with other probes within the panel, and/or reducing the number of predicted low-performance (e.g., high off-target) oligonucleotide probes. In some embodiments, a sparse design for generating a panel of oligonucleotide probes involves constraining the panel to consist of oligonucleotide probes having a %GC content of 40% to 60%, constraining the panel such that discrete oligonucleotide probes do not comprise a restriction cut site or hybridize to a region of the target that comprises a restriction cut site, and/or reducing the number of predicted low-performance (e.g., high off-target) oligonucleotide probes. In some embodiments, a panel of oligonucleotide probes that target intronic regions (e.g., of a cancer gene) is generated using a sparse design.
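
The two sequence-level constraints named above (a %GC content of 40% to 60%, and no restriction cut site within a probe) can be sketched as a simple filter. This is a minimal illustration and not the design procedure of the disclosure: the function names are hypothetical, the cut site `GATC` is an assumed example recognition sequence rather than one specified in this passage, and the step of reducing predicted low-performance (e.g., high off-target) probes is not modeled.

```python
from typing import Iterable, List


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_sparse_filters(
    probe: str,
    cut_site: str = "GATC",  # assumed example recognition sequence
    gc_min: float = 0.40,
    gc_max: float = 0.60,
) -> bool:
    """Keep a probe only if its GC fraction is in range and it has no cut site."""
    probe = probe.upper()
    in_gc_range = gc_min <= gc_fraction(probe) <= gc_max
    has_cut_site = cut_site.upper() in probe
    return in_gc_range and not has_cut_site


def sparse_filter(candidates: Iterable[str], cut_site: str = "GATC") -> List[str]:
    """Return the candidate probes that satisfy both constraints, in input order."""
    return [p for p in candidates if passes_sparse_filters(p, cut_site)]
```

In this sketch, a candidate with 50% GC and no `GATC` motif is kept, while an AT-only candidate or one containing `GATC` is dropped. A real design would also need an off-target prediction step, which depends on the reference genome and is outside the scope of this example.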

Sparse design (e.g., sparse intronic design) can, in some embodiments, add more complexity to the subsequent HiC capture data than traditional approaches. For example, sparse design enables the capture of long-range genomic information when using a panel of oligonucleotide probes that target a limited region of the genome, making it easier for bioinformatics pipelines to identify SVs and making the collected data more similar to data from a genome-wide HiC experiment. Thus, in some embodiments, sparse design provides the low sequencing cost of capture while retaining the useful information derived from genome-wide sequencing approaches.

Traditional approaches for HiC capture have relied on panels of oligonucleotide probes that are designed to target exons. This has been done primarily because targeting exonic regions provides low off-target effects (e.g., due to specificity of probe hybridization to target exonic regions relative to off-target exonic regions), while targeting intronic regions has historically produced higher off-target effects (e.g., because there is a higher degree of sequence similarity among disparate intronic regions) and other low-performance issues (e.g., low-performance issues relating to low %GC, e.g., lower than 40% GC, or high %GC, e.g., higher than 60% GC). Traditional approaches for HiC capture would have required the use of a large number of low-performance oligonucleotide probes in order to overcome the higher off-target effects and other low-performance issues of oligonucleotide probes. Thus, in part because of performance and in part because of cost concerns, panels of oligonucleotide probes have avoided including intronic targeting probes altogether. The inventors of the present disclosure have found that sparse intronic design approaches allow for the inclusion of intronic targeting probes and greatly improve the overall sequencing data obtained using a relatively smaller panel of oligonucleotide probes (e.g., saving sequencing costs and instrument time).

Oligonucleotide probe compositions

Provided in certain aspects is a composition that includes: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from a process comprising nucleic acid restriction enzyme cleavage of a nucleic acid sample, where: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns comprises (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; the probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the probe pairs in the set of probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

The term “target nucleic acid fragments of introns” generally refers to target nucleic acid fragments generated by fragmentation of gene intron regions. The term “probe pair” generally refers to two probes designed to specifically hybridize to a fragment, where one probe is designed to hybridize near the 5’ end of the fragment and the other probe is designed to hybridize near the 3’ end of the fragment. The term “intron-directed” oligonucleotide probes generally refers to oligonucleotide probes capable of hybridizing to, and/or designed to hybridize to, target nucleic acid fragments of introns. The term “single probe” generally refers to a probe designed to specifically hybridize near the 5’ end of a fragment, where the fragment typically hybridizes to a single probe, and not a probe pair, in the plurality of oligonucleotide probes.
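
The relationship between fragment length and probe placement described above can be illustrated with a short sketch. It assumes a fixed probe length of 120 nucleotides and a fixed end offset of 5 nucleotides, values that fall within the "about 110 to about 130" and "about 2 to about 15" ranges recited herein. The function name, the 0-based half-open coordinates, the assignment of a fragment of exactly 260 nucleotides to the probe-pair case, and the return of no probes for fragments shorter than 130 nucleotides are all assumptions made for illustration.

```python
from typing import List, Tuple

PROBE_LEN = 120  # assumed; within the "about 110 to about 130" range
END_OFFSET = 5   # assumed; within the "about 2 to about 15" range


def place_probes(
    fragment_len: int,
    probe_len: int = PROBE_LEN,
    offset: int = END_OFFSET,
) -> List[Tuple[int, int]]:
    """Return the (start, end) intervals of a fragment covered by its probes.

    Coordinates are 0-based, half-open, and run 5' to 3' along the fragment.
    Fragments of 260 nt or longer get a probe pair, fragments of 130 nt up to
    (but not including) 260 nt get a single probe near the 5' end, and shorter
    fragments get no probe in this sketch.
    """
    if fragment_len >= 260:
        first = (offset, offset + probe_len)
        second = (fragment_len - offset - probe_len, fragment_len - offset)
        return [first, second]
    if fragment_len >= 130:
        return [(offset, offset + probe_len)]
    return []
```

For a 300-nucleotide fragment this sketch yields two intervals, one starting 5 nucleotides from the 5’ end and one ending 5 nucleotides from the 3’ end; for a 200-nucleotide fragment it yields only the 5’ interval.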

In certain implementations, the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or oligonucleotide probes in the composition, are capable of hybridizing to nucleic acid fragments from 100 or more cancer genes. In certain instances, the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or oligonucleotide probes in the composition, are capable of hybridizing to nucleic acid fragments from about 500 or more cancer genes, about 800 or more cancer genes, about 1000 or more cancer genes, about 1200 or more cancer genes, or about 1400 or more cancer genes. In certain implementations, the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or oligonucleotide probes in the composition, are capable of hybridizing to nucleic acid fragments from 100 or more cancer genes listed in Appendix 1. In certain instances, the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or oligonucleotide probes in the composition, are capable of hybridizing to nucleic acid fragments from about 500 or more cancer genes listed in Appendix 1, about 800 or more cancer genes listed in Appendix 1, about 1000 or more cancer genes listed in Appendix 1, about 1200 or more cancer genes listed in Appendix 1, or about 1400 or more cancer genes listed in Appendix 1.

In certain implementations, the untranslated regions surrounding cancer genes comprise (i) a nucleic acid region extending in the 5’ direction from the 5’ end of a cancer gene coding region and (ii) a nucleic acid region extending in the 3’ direction from the 3’ end of a cancer gene coding region. In certain instances, a nucleic acid region is within 10,000 consecutive nucleotides from a cancer gene coding region of at least one cancer gene, is within 5,000 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene, is within 2,000 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene or is within 1,500 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene. In certain implementations, the untranslated regions surrounding cancer genes include promoter regions, and sometimes the promoter regions comprise a 5’ end about 500 consecutive nucleotides to about 1500 consecutive nucleotides from a 5’ end or a 3’ end of each cancer gene coding region.
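
As one way to picture the surrounding regions described above, the following sketch computes strand-aware flanking windows around a gene. It is an illustration under stated assumptions rather than the procedure of the disclosure: the function name is hypothetical, coordinates are treated as 0-based half-open intervals on the plus strand, the default window of 10,000 nucleotides is taken from the largest distance recited above, and the downstream window is not clamped to the chromosome length.

```python
from typing import Dict, Tuple


def flanking_regions(
    start: int,
    end: int,
    strand: str,
    flank: int = 10_000,
) -> Dict[str, Tuple[int, int]]:
    """Return the 5' and 3' flanking windows of a gene as (start, end) tuples.

    start and end are plus-strand coordinates of the gene (0-based, half-open).
    For a "+" gene the 5' flank lies at lower coordinates. For a "-" gene the
    5' flank lies at higher coordinates.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    left = (max(0, start - flank), start)  # clamp at the chromosome start
    right = (end, end + flank)             # not clamped to chromosome length
    if strand == "+":
        return {"five_prime": left, "three_prime": right}
    return {"five_prime": right, "three_prime": left}
```

The same function can be called with `flank=5_000`, `flank=2_000` or `flank=1_500` to reflect the narrower windows recited above.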

In certain instances, the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or in the composition, is about 120 consecutive nucleotides. In certain implementations, the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or in the composition, is complementary to a subsequence of a nucleic acid fragment.

In certain instances, the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or the polynucleotide of each of the oligonucleotide probes in the composition, is 100% complementary to a corresponding portion of the minus strand of Genome Reference Consortium Human Build 38 (GRCh38). In certain instances, the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or the polynucleotide of each of the oligonucleotide probes in the composition, is 100% complementary to a corresponding portion of the plus strand of Genome Reference Consortium Human Build 38 (GRCh38). In certain instances, the composition comprises a mixture of (i) polynucleotides of oligonucleotide probes that are 100% complementary to a corresponding portion of the minus strand of Genome Reference Consortium Human Build 38 (GRCh38) and (ii) polynucleotides of oligonucleotide probes that are 100% complementary to a corresponding portion of the plus strand of Genome Reference Consortium Human Build 38 (GRCh38). In certain implementations, each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or each of the oligonucleotide probes in the composition, is capable of hybridizing to a fragment under hybridization conditions of moderate stringency and/or high stringency. Hybridization and hybridization conditions of different stringency are described herein.
In certain instances, the polynucleotides of the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, and/or of each of the oligonucleotide probes in the composition, are capable of hybridizing to target nucleic acid fragments having one or more of the following features: (i) an average fragment size of about 180 consecutive nucleotides to about 200 consecutive nucleotides (e.g., about 190 or 191 consecutive nucleotides); (ii) a GC content of about 1% (e.g., about 1.53%) to about 97% (e.g., about 96.6%); and (iii) an average GC content of about 40% to about 45% (e.g., an average GC content of about 43%). In certain instances, the polynucleotides of the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and/or the set of single oligonucleotide probes do not comprise polynucleotides having a GC content of less than about 40% GC content or higher than about 60% GC content.

In certain implementations, target nucleic acid fragments result from nucleic acid restriction enzyme cleavage. Restriction enzyme cleavage sometimes is performed by one or more restriction endonucleases, such as by restriction enzymes cutting at ^GATC and G^ANTC, where “^” represents the cut site on the positive DNA strand (i.e., Watson strand), for example. In certain instances, target nucleic acid fragments result from a process that includes digestion of sample nucleic acid by two, three, four, five or six restriction enzymes. In certain implementations, target nucleic acid fragments result from a process that includes digestion of nucleic acid by one or more restriction enzymes chosen from one or more of HpyCH4IV, HinfI, HinP1I and MseI. In certain instances, target nucleic acid fragments result from a process that includes digestion of nucleic acid by one or more restriction enzymes chosen from one or more of HindIII, DpnII, MboI and NlaIII. In certain implementations, target nucleic acid fragments result from a process that includes a combination of (i) digestion of nucleic acid by one or more restriction enzymes and (ii) physical fragmentation of nucleic acid (e.g., sonication, shearing). Polynucleotides of oligonucleotide probes can be designed by generating a set of target nucleic acid fragments by an in silico process that includes in silico cleavage of nucleic acid, and analyzing polynucleotide targets within the target nucleic acid fragments generated in silico for the design of complementary polynucleotides of oligonucleotide probes capable of hybridizing to the polynucleotide targets. In some embodiments, the oligonucleotide probes are designed such that the probes do not hybridize to a target sequence that contains a restriction cut site. A polynucleotide of each of the oligonucleotide probes in a composition often is synthetic.
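
The in silico cleavage step described above can be illustrated with a minimal Python sketch. It assumes only the two cut motifs named in the text (^GATC and G^ANTC) and tracks cut positions on the plus strand alone; the function name `digest_in_silico` and the half-open coordinate convention are illustrative assumptions, not part of the disclosed method.

```python
import re

# Motifs taken from the text; "^" marks the plus-strand cut position.
# Each entry: (zero-width lookahead regex for the site, offset of the cut within the site).
# Lookaheads are used so that overlapping sites are all reported.
CUT_SITES = [
    (re.compile(r"(?=GATC)"), 0),        # ^GATC: cut immediately before the G
    (re.compile(r"(?=GA[ACGT]TC)"), 1),  # G^ANTC: cut immediately after the first G
]

def digest_in_silico(sequence):
    """Return plus-strand fragments as half-open (start, end) intervals.

    Hypothetical helper: overhangs, minus-strand cuts and star activity
    are deliberately ignored in this sketch.
    """
    seq = sequence.upper()
    cuts = {0, len(seq)}  # sequence ends always delimit fragments
    for pattern, offset in CUT_SITES:
        for match in pattern.finditer(seq):
            cuts.add(match.start() + offset)
    ordered = sorted(cuts)
    return [(start, end) for start, end in zip(ordered, ordered[1:])]

fragments = digest_in_silico("AAAGATCTTTGAATCCC")
print(fragments)
```

Each interval can then be screened for length and sequence content before any probe is designed against it.
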
A polynucleotide of an oligonucleotide probe can include any backbone and base combination suitable for the oligonucleotide probe to specifically hybridize to a target nucleic acid (e.g., a target nucleic acid fragment). In certain instances, a polynucleotide of each of the oligonucleotide probes includes RNA or DNA, a modified backbone of RNA or DNA, one or more modified DNA or RNA bases, or a combination of two or more of the foregoing. In certain implementations, oligonucleotide probes in a composition include a capture agent. In such implementations, a polynucleotide of an oligonucleotide probe often is modified by the capture agent and is not naturally occurring. Any suitable capture agent that can specifically bind to a capture agent counterpart (e.g., a capture agent counterpart linked to a solid phase) can be associated with an oligonucleotide probe. A capture agent and capture agent counterpart sometimes are members of a binding pair. Non-limiting examples of binding pairs include an antigenic epitope and an antibody or immunologically reactive fragment thereof; an antibody and a hapten; a digoxigenin moiety and an anti-digoxigenin antibody; a fluorescein moiety and an anti-fluorescein antibody; an operator and a repressor; a nuclease and a nucleotide; a lectin and a polysaccharide; a steroid and a steroid-binding protein; an active compound and an active compound receptor; a hormone and a hormone receptor; an enzyme and a substrate; an immunoglobulin and protein A; an oligonucleotide or polynucleotide and its corresponding complement; biotin and avidin; biotin and streptavidin; the like or combinations thereof. A capture agent and a capture agent counterpart sometimes include, independently, biotin, avidin or streptavidin. 
A capture agent sometimes is covalently attached to an oligonucleotide probe, sometimes is linked to a 5’ end and/or 3’ end of an oligonucleotide probe, and/or sometimes is linked to one or more nucleotide bases of an oligonucleotide probe (e.g., linked to one or more bases chosen from adenine, cytosine, guanine, thymine or uracil of an oligonucleotide probe). Methods for associating a capture agent with an oligonucleotide probe are known in the art.

In certain implementations, oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns are not capable of hybridizing to tiled subsequences of a target nucleic acid fragment. In certain instances, oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns are not capable of hybridizing to contiguous regions of the nucleic acid fragment. The term “contiguous regions of a nucleic acid fragment” is synonymous with the term “tiled” and generally refers to target polynucleotide regions of a target nucleic acid fragment, to which oligonucleotide probes are capable of hybridizing, that are disposed end to end in the target nucleic acid fragment with no gap, or a short gap of 1 nucleotide to about 5 consecutive nucleotides, between the ends of adjacent target polynucleotide regions in the fragment. There sometimes are no gaps between the ends of adjacent (tiled) oligonucleotide probes hybridized to contiguous regions of a nucleic acid fragment. In certain instances, the contiguous (tiled) target polynucleotide regions of a target nucleic acid fragment, to which oligonucleotide probes are capable of hybridizing, are disposed end to end in the target nucleic acid fragment with a gap between the ends of adjacent target polynucleotide regions in the fragment, where the gap can be 1 nucleotide, 2 consecutive nucleotides, 3 consecutive nucleotides, 4 consecutive nucleotides, or 5 consecutive nucleotides or any combination thereof.
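
As an illustration of the “tiled” definition above, the following Python sketch tests whether a set of probe target regions on one fragment is disposed end to end with no gap or a gap of up to 5 nucleotides. The function name `is_tiled`, the half-open coordinates and the requirement of at least two regions are assumptions made for the example.

```python
def is_tiled(regions, max_gap=5):
    """Return True if adjacent target regions are end to end with a gap of 0..max_gap nt.

    `regions` is a list of half-open (start, end) coordinates on one fragment.
    Hypothetical helper for illustration; overlapping regions are not treated as tiled.
    """
    if len(regions) < 2:
        return False  # a single region has no adjacent neighbor to be tiled with
    ordered = sorted(regions)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if gap < 0 or gap > max_gap:
            return False
    return True

print(is_tiled([(0, 120), (123, 243)]))   # adjacent regions separated by 3 nt
print(is_tiled([(5, 125), (175, 295)]))   # end-anchored pair separated by 50 nt
```

Under this reading, an end-anchored probe pair on a long intron fragment is not tiled because the interior of the fragment is left uncovered.
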

In certain instances, the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns does not contain a probe capable of hybridizing to an end of a first target nucleic acid fragment and to an end of a second target nucleic acid fragment (e.g., intron-directed probes are not designed to a region overlapping a restriction enzyme cleavage site, and therefore are not designed to hybridize to, under hybridization conditions, two target nucleic acid fragments containing adjacent polynucleotides in non-cleaved target nucleic acid). In certain instances, the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns does not contain an oligonucleotide probe capable of hybridizing to, or does not contain an oligonucleotide probe designed to hybridize to, a target nucleic acid fragment having a length of less than 130 consecutive nucleotides (e.g., intron-directed probes are not designed to hybridize to, under hybridization conditions, a target nucleic acid fragment of a length of less than 130 consecutive nucleotides). In certain instances, the plurality of oligonucleotide probes in the composition does not contain an oligonucleotide probe capable of hybridizing to, or does not contain an oligonucleotide probe designed to hybridize to, a target nucleic acid fragment having a length of less than 130 consecutive nucleotides (e.g., oligonucleotide probes in the composition are not designed to hybridize to, under hybridization conditions, a target nucleic acid fragment of a length of less than 130 consecutive nucleotides). In certain implementations, the first oligonucleotide probe and the second oligonucleotide probe in each of the oligonucleotide probe pairs do not hybridize to regions of a target nucleic acid fragment that overlap. In certain implementations, oligonucleotide probes in the composition do not hybridize to regions of a target nucleic acid fragment that overlap.

When the introns are fragmented, some resulting nucleic acid fragments will be of a length sufficient to hybridize at least a pair of oligonucleotide probes, such length being at least about 220 consecutive nucleotides, at least about 240 consecutive nucleotides, at least about 260 consecutive nucleotides, at least about 280 consecutive nucleotides or at least about 300 consecutive nucleotides. When the introns are fragmented, some resulting nucleic acid fragments will be of a length sufficient to hybridize only a single oligonucleotide probe, such length being about 120 consecutive nucleotides to about 140 consecutive nucleotides in length (e.g., about 130 consecutive nucleotides in length), or no more than about 220 consecutive nucleotides to about 280 consecutive nucleotides in length (e.g., no more than about 230, 240, 250, 260, 270 or 280 consecutive nucleotides in length). When introns are fragmented, the resulting nucleic acid fragments typically are a plurality of sizes, often including (i) nucleic acid fragments of sizes sufficient to hybridize at least a pair of oligonucleotide probes and (ii) nucleic acid fragments of a length sufficient to hybridize only a single oligonucleotide probe, and sometimes including (iii) nucleic acid fragments of a length insufficient to hybridize to any oligonucleotide probe. Accordingly, in certain embodiments, the composition may include a plurality of oligonucleotide probes capable of hybridizing to introns comprising (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length.
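
The three fragment size classes described above can be sketched as a simple length classifier. The thresholds of 130 and 260 nucleotides come from the text, where they are stated as approximate; treating them as hard cutoffs, assigning a fragment of exactly 260 nucleotides to the pair class, and the name `probe_strategy` are assumptions of this example.

```python
SINGLE_MIN = 130  # "about 130 consecutive nucleotides" (treated here as a hard cutoff)
PAIR_MIN = 260    # "about 260 consecutive nucleotides or longer" (treated as a hard cutoff)

def probe_strategy(fragment_length):
    """Classify an intron fragment by how many probes it can accommodate."""
    if fragment_length >= PAIR_MIN:
        return "pair"    # long enough for a probe near each end
    if fragment_length >= SINGLE_MIN:
        return "single"  # long enough for one probe only
    return "none"        # too short for any probe

for length in (100, 190, 320):
    print(length, probe_strategy(length))
```
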

In certain implementations, each oligonucleotide probe of a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters is capable of hybridizing to a region spanning about 100 to about 500 consecutive nucleotides (e.g., about 350 consecutive nucleotides) from the 5’ end of each target nucleic acid fragment or to a region spanning about 100 to about 500 consecutive nucleotides (e.g., about 350 consecutive nucleotides) from the 3’ end of each target nucleic acid fragment. In certain instances, the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters that are capable of hybridizing to a target nucleic acid fragment are capable of hybridizing to contiguous, non-overlapping regions of the nucleic acid fragment. In certain instances, the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters are not restricted to a GC content range (e.g., not restricted to a GC percentage range) or to a GC content threshold (e.g., not restricted to a GC percentage threshold).

The term “target nucleic acid fragments of exons and promoters” generally refers to target nucleic acid fragments generated by fragmentation of gene exon regions and to target nucleic acid fragments generated by fragmentation of gene promoter regions. The terms “exon-directed” and “promoter-directed” oligonucleotide probes generally refer to oligonucleotide probes capable of hybridizing to, and/or designed to hybridize to, target nucleic acid fragments of exons and promoters, respectively.

The term “abundance” of each oligonucleotide probe in a composition generally refers to the amount of the oligonucleotide probe in the composition, which sometimes is a molar concentration for example. The abundance of one or more oligonucleotide probes in a composition sometimes is a relative abundance, relative to the abundance of one or more other oligonucleotide probes, and sometimes is expressed as a molar ratio for example. The abundance of at least one oligonucleotide probe in a composition sometimes is the same or about the same as the abundance of another oligonucleotide probe in the composition. The abundance of each oligonucleotide probe in a composition sometimes is the same or about the same as the abundance of each other oligonucleotide probe in the composition. In certain implementations, the abundance of at least one intron-directed oligonucleotide probe in a composition is the same or about the same as the abundance of another intron-directed oligonucleotide probe in the composition. In certain instances, the abundance of each intron-directed oligonucleotide probe in the composition is the same or about the same as the abundance of each other intron-directed oligonucleotide probe.

In certain implementations, the abundance of one or more oligonucleotide probes in a composition is different than the abundance of one or more other oligonucleotide probes in the composition. For example, the abundance of a first oligonucleotide probe in a composition can be lower or higher than the abundance of a second oligonucleotide probe in the composition. The abundance of one or more exon-directed oligonucleotide probes sometimes is different than the abundance of one or more other exon-directed oligonucleotide probes. The abundance of one or more promoter-directed oligonucleotide probes sometimes is different than the abundance of one or more other promoter-directed oligonucleotide probes. The abundance of an oligonucleotide probe can be different than the abundance of another oligonucleotide probe in a composition to normalize the amount of target nucleic acid hybridized to each of the probes. For example, where a first target nucleic acid is present in a sample at an abundance higher than the abundance of a second target nucleic acid in the sample, the abundance of a first oligonucleotide probe that specifically hybridizes to the first target nucleic acid fragment can be lower than the abundance of a second oligonucleotide probe that specifically hybridizes to the second target nucleic acid fragment, to normalize the amount of target nucleic acid hybridized to each of the oligonucleotide probes. The abundance of at least a subset of oligonucleotide probes in a composition can be selected, for the purpose of normalizing the amount of target nucleic acid hybridized for example, by methods known in the art (e.g., by utilizing Agilent SureDesign software or other software products).

In certain implementations, the abundance of one or more designed oligonucleotide probes in the composition, or the relative abundance of one or more designed oligonucleotide probes relative to one or more other oligonucleotide probes, can be selected. Oligonucleotide probe abundance is described herein.

Methods for designing and manufacturing probe compositions

Provided in certain aspects is a method for designing a plurality of oligonucleotide probes, the method including: identifying a plurality of target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample; and designing a plurality of oligonucleotide probes described herein. In the designed oligonucleotide probes, oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns often include (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer in length, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length.

In the designed oligonucleotide probes, the intron-directed oligonucleotide probes, in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes, often include a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments. In certain implementations, a design process includes removing from a first set of designed intron-directed oligonucleotide probes (i) oligonucleotide probes having a GC content less than about 40% or greater than about 60%, and (ii) oligonucleotide probes complementary to a subsequence of a nucleic acid that repeats in the nucleic acid fragments, thereby generating a second set of designed intron-directed oligonucleotide probes.
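
The removal step described above (GC content outside about 40% to about 60%, or a target subsequence that repeats in the fragments) can be sketched in Python as follows. This is a simplified illustration: repeats are detected by exact, non-overlapping string matching on the plus strand only, and the names `gc_fraction` and `filter_targets` are hypothetical. A production design would more likely rely on genome-wide alignment or repeat annotation.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def filter_targets(targets, fragments, gc_min=0.40, gc_max=0.60):
    """Keep target subsequences that pass the GC window and occur exactly once.

    `targets` are candidate probe target subsequences; `fragments` are the
    in silico fragment sequences they were drawn from. Exact matching is an
    assumption of this sketch, not the disclosed repeat criterion.
    """
    kept = []
    for target in targets:
        if not gc_min <= gc_fraction(target) <= gc_max:
            continue  # (i) GC content below about 40% or above about 60%
        occurrences = sum(frag.upper().count(target.upper()) for frag in fragments)
        if occurrences != 1:
            continue  # (ii) subsequence repeats (or is absent) in the fragments
        kept.append(target)
    return kept

fragments = ["ACGTACGTTTTT", "GGGGCCCCAAAA"]
print(filter_targets(["ACGT", "GGCC", "CGTT", "TTTT"], fragments))
```

The short sequences are toy inputs chosen for readability; real targets would be about 110 to about 130 nucleotides long.
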

In the designed oligonucleotide probes, each of the probe pairs in the set of probe pairs often includes (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing. In the designed oligonucleotide probes, each probe in the set of single oligonucleotide probes often includes a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.
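
The end-anchored placement described above can be sketched as coordinate arithmetic on a fragment. The sketch assumes a probe length of 120 nucleotides and an end offset of 5 nucleotides (one value within the stated range of about 2 to about 15); it reports target regions in fragment coordinates and does not model probe strand or orientation. The name `place_probes` and the hard 130 and 260 nucleotide cutoffs are assumptions of the example.

```python
PROBE_LEN = 120   # "about 120 consecutive nucleotides"
END_OFFSET = 5    # hypothetical choice within the 2 to 15 nt range
SINGLE_MIN = 130  # approximate thresholds from the text, used as hard cutoffs
PAIR_MIN = 260

def place_probes(fragment_length, probe_len=PROBE_LEN, offset=END_OFFSET):
    """Return probe target regions as half-open (start, end) fragment coordinates."""
    if not 2 <= offset <= 15:
        raise ValueError("offset must lie within the 2 to 15 nt range stated in the text")
    first = (offset, offset + probe_len)  # anchored near the 5' end of the fragment
    second = (fragment_length - offset - probe_len, fragment_length - offset)  # near the 3' end
    if fragment_length >= PAIR_MIN and second[0] >= first[1]:
        return [first, second]            # probe pair, non-overlapping regions
    if fragment_length >= SINGLE_MIN and first[1] <= fragment_length:
        return [first]                    # single probe near the 5' end
    return []                             # fragment too short for any probe

print(place_probes(300))
print(place_probes(200))
```

The overlap check keeps the two members of a pair from covering the same bases when a longer probe or larger offset is chosen.
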

Methodologies for fragmenting sample nucleic acid in silico, identifying polynucleotides in the fragmented sample nucleic acid in silico, and designing oligonucleotide probes that are complementary to the polynucleotides of the fragmented sample nucleic acid in silico, are known. Agilent SureDesign software and other commercially available software can be utilized to design oligonucleotide probes in silico that are complementary to the polynucleotides of target nucleic acid, for example. Methodologies for synthesizing oligonucleotide probes from in silico oligonucleotide probe designs are known, including without limitation phosphite triester synthesis, phosphotriester synthesis and phosphodiester synthesis methodologies.

Preparation and analysis of enriched nucleic acid

Provided in certain aspects is a method for nucleic acid enrichment, the method including: subjecting target nucleic acid from a nucleic acid sample to nucleic acid cleavage conditions in which nucleic acid fragments are generated; subjecting the target nucleic acid fragments to linking conditions in which proximity ligated nucleic acid molecules are generated; contacting the proximity ligated nucleic acid molecules with a composition comprising a plurality of oligonucleotide probes described herein under hybridization conditions in which hybridization complexes comprising proximity ligated nucleic acid hybridized to oligonucleotide probes are generated; isolating the complexes; and analyzing nucleic acid in the complexes.

The terms nucleic acid(s), nucleic acid molecule(s), nucleic acid fragment(s), target nucleic acid(s), nucleic acid template(s), template nucleic acid(s), nucleic acid target(s), polynucleotide(s), polynucleotide fragment(s), target polynucleotide(s), polynucleotide target(s), and the like may be used interchangeably throughout the disclosure. The terms refer to nucleic acids of any composition, such as DNA (e.g., complementary DNA (cDNA; synthesized from any RNA or DNA of interest), genomic DNA (gDNA), genomic DNA fragments, mitochondrial DNA (mtDNA), recombinant DNA (e.g., plasmid DNA), and the like), RNA (e.g., messenger RNA (mRNA), small interfering RNA (siRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA, transacting small interfering RNA (ta-siRNA), natural small interfering RNA (nat-siRNA), small nucleolar RNA (snoRNA), small nuclear RNA (snRNA), long non-coding RNA (lncRNA), non-coding RNA (ncRNA), transfer-messenger RNA (tmRNA), precursor messenger RNA (pre-mRNA), small Cajal body-specific RNA (scaRNA), piwi-interacting RNA (piRNA), endoribonuclease-prepared siRNA (esiRNA), small temporal RNA (stRNA), signal recognition RNA, telomere RNA, RNA highly expressed by a fetus or placenta, and the like), and/or DNA or RNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like), RNA/DNA hybrids and polyamide nucleic acids (PNAs), all of which can be in single- or double-stranded form, and unless otherwise limited, can encompass known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides.

A nucleic acid may be, or may be from, a plasmid, phage, virus, bacterium, autonomously replicating sequence (ARS), mitochondria, centromere, artificial chromosome, chromosome, or other nucleic acid able to replicate or be replicated in vitro or in a host cell, a cell, a cell nucleus or cytoplasm of a cell in certain embodiments. A template nucleic acid in some embodiments can be from a single chromosome (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism). Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, single nucleotide polymorphisms (SNPs), and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. The term nucleic acid is used interchangeably with locus, gene, cDNA, and mRNA encoded by a gene. The term also may include, as equivalents, derivatives, variants and analogs of RNA or DNA synthesized from nucleotide analogs, single-stranded ("sense" or "antisense," "plus" strand or "minus" strand, "forward" reading frame or "reverse" reading frame) and double-stranded polynucleotides.
The term "gene" refers to a section of DNA involved in producing a polypeptide chain; and generally includes regions preceding and following the coding region (leader and trailer) involved in the transcription/translation of the gene product and the regulation of the transcription/translation, as well as intervening sequences (introns) between individual coding regions (exons). A nucleotide or base generally refers to the purine and pyrimidine molecular units of nucleic acid (e.g., adenine (A), thymine (T), guanine (G), and cytosine (C)). For RNA, the base thymine is replaced with uracil (U). Nucleic acid length or size may be expressed as a number of bases.

Target nucleic acids may be any nucleic acids of interest. Nucleic acids may be polymers of any length composed of deoxyribonucleotides (i.e., DNA bases), ribonucleotides (i.e., RNA bases), or combinations thereof, e.g., 10 bases or longer, 20 bases or longer, 50 bases or longer, 100 bases or longer, 200 bases or longer, 300 bases or longer, 400 bases or longer, 500 bases or longer, 1000 bases or longer, 2000 bases or longer, 3000 bases or longer, 4000 bases or longer, 5000 bases or longer. In certain aspects, nucleic acids are polymers composed of deoxyribonucleotides (i.e., DNA bases), ribonucleotides (i.e., RNA bases), or combinations thereof, e.g., 10 bases or less, 20 bases or less, 50 bases or less, 100 bases or less, 200 bases or less, 300 bases or less, 400 bases or less, 500 bases or less, 1000 bases or less, 2000 bases or less, 3000 bases or less, 4000 bases or less, or 5000 bases or less.

Nucleic acid may be single-stranded or double-stranded. Single-stranded DNA (ssDNA) can be generated by denaturing double-stranded DNA (dsDNA) by heating or by treatment with alkali, for example. Accordingly, in some embodiments, ssDNA is derived from dsDNA.

Nucleic acid (e.g., genomic DNA, nucleic acid targets, oligonucleotides, probes, primers) may be described herein as being complementary to another nucleic acid, having a complementarity region, being capable of hybridizing to another nucleic acid, or having a hybridization region. The terms “complementary” or “complementarity” or “hybridization” generally refer to a nucleotide sequence that base-pairs by non-covalent bonds to a region of a nucleic acid. In the canonical Watson-Crick base pairing, adenine (A) forms a base pair with thymine (T), and guanine (G) pairs with cytosine (C) in DNA. In RNA, thymine (T) is replaced by uracil (U). As such, A is complementary to T and G is complementary to C. In RNA, A is complementary to U and vice versa. In a DNA-RNA duplex, A (in a DNA strand) is complementary to U (in an RNA strand). Typically, “complementary” or “complementarity” or “capable of hybridizing” refer to a nucleotide sequence that is at least partially complementary. These terms may also encompass duplexes that are fully complementary such that every nucleotide in one strand is complementary or hybridizes to every nucleotide in the other strand in corresponding positions. In certain instances, a nucleotide sequence may be partially complementary to a target, in which not all nucleotides are complementary to every nucleotide in the target nucleic acid in all the corresponding positions.
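
The base-pairing rules above can be expressed as a short Python sketch for DNA sequences. It covers only the canonical A/T and G/C pairs; the function names and the equal-length requirement in `complementarity_fraction` are assumptions made for illustration.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, G<->C)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]

def complementarity_fraction(probe, target):
    """Fraction of positions at which `probe` pairs with `target` in an antiparallel duplex.

    Both sequences are written 5' to 3' and must have equal length in this sketch.
    A value of 1.0 corresponds to full complementarity; lower values are partial.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must have equal length")
    if not probe:
        return 0.0
    ideal = reverse_complement(target)
    matches = sum(1 for a, b in zip(probe.upper(), ideal) if a == b)
    return matches / len(probe)

print(reverse_complement("AAGC"))
print(complementarity_fraction("GCTT", "AAGC"))
```
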

The percent identity of two nucleotide sequences can be determined by aligning the sequences for optimal comparison purposes. When the total number of positions is different between the two nucleotide sequences, gaps may be introduced in the sequence of one or both sequences for optimal alignment. The nucleotides at corresponding positions are then compared, and the percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = (# of identical positions / total # of positions) × 100). When a position in one sequence is occupied by the same nucleotide as the corresponding position in the other sequence, then the molecules are identical at that position. In certain instances, extra or missing bases within a sequence are expressed as gaps in an alignment and may or may not be factored into a percent identity calculation. For example, a percent identity calculation may include a number of mismatches and gaps or may include a number of mismatches only.
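
The percent identity calculation above, including the option of counting or ignoring gap positions, can be sketched in Python. The sketch assumes the two sequences are already aligned to equal length with “-” marking gaps; the function name and the `count_gaps` flag are illustrative.

```python
def percent_identity(aligned_a, aligned_b, count_gaps=True):
    """Percent identity of two pre-aligned sequences ('-' denotes a gap).

    With count_gaps=True, gap positions count toward the total (mismatches and gaps).
    With count_gaps=False, gap positions are skipped (mismatches only).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    total = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        is_gap = a == "-" or b == "-"
        if is_gap and not count_gaps:
            continue
        total += 1
        if a == b and not is_gap:
            identical += 1
    return 100.0 * identical / total if total else 0.0

print(percent_identity("ACGT-A", "ACCTTA"))                    # gaps counted in the total
print(percent_identity("ACGT-A", "ACCTTA", count_gaps=False))  # gaps excluded from the total
```

For this toy alignment, four of six positions are identical when the gap is counted and four of five when it is not.
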

As used herein, the phrase “hybridizing” or grammatical variations thereof, refers to binding of a first nucleic acid molecule to a second nucleic acid molecule under low, medium or high stringency conditions, or under nucleic acid synthesis conditions. Hybridizing can include instances where a first nucleic acid molecule binds to a second nucleic acid molecule, where the first and second nucleic acid molecules are complementary. As used herein, “specifically hybridizes” refers to preferential hybridization under nucleic acid synthesis conditions of a primer, oligonucleotide, or probe, to a nucleic acid molecule having a sequence complementary to the primer, oligonucleotide, or probe compared to hybridization to a nucleic acid molecule not having a complementary sequence. For example, specific hybridization includes the hybridization of a primer, oligonucleotide, or probe to a target nucleic acid sequence that is complementary to the primer, oligonucleotide, or probe.

Primer, oligonucleotide, or probe sequences and length can affect hybridization to target nucleic acid sequences. Depending on the degree of mismatch between the primer, oligonucleotide, or probe and target nucleic acid, low, medium or high stringency conditions may be used to effect primer/target, oligonucleotide/target, or probe/target annealing. As used herein, the term “stringent conditions” refers to conditions for hybridization and washing. Methods for hybridization reaction temperature condition optimization are known, and can be found, e.g., in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6 (1989). Aqueous and non-aqueous methods are described in the aforementioned reference and either can be used. Non-limiting examples of stringent hybridization conditions include, for example, hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 50°C. Another example of stringent hybridization conditions includes hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 55°C. A further example of stringent hybridization conditions includes hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 60°C. Often, stringent hybridization conditions are hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 65°C. More often, stringency conditions can include 0.5 M sodium phosphate, 7% SDS at 65°C, followed by one or more washes at 0.2X SSC, 1% SDS at 65°C. Stringent hybridization temperatures also can be altered (generally, lowered) with the addition of certain organic solvents, such as formamide for example. 
Organic solvents such as formamide can reduce the thermal stability of double-stranded polynucleotides, so that hybridization can be performed at lower temperatures, while still maintaining stringent conditions and extending the useful life of heat labile nucleic acids. In some embodiments, target nucleic acids comprise degraded DNA. Degraded DNA may be referred to as low-quality DNA or highly degraded DNA. Degraded DNA may be highly fragmented and may include damage such as base analogs and abasic sites subject to miscoding lesions and/or intermolecular crosslinking. For example, sequencing errors resulting from deamination of cytosine residues may be present in certain sequences obtained from degraded DNA (e.g., miscoding of C to T and G to A).

Nucleic acid may be derived from one or more sources (e.g., a biological sample described herein) by methods known in the art. Any suitable method can be used for isolating, extracting and/or purifying DNA from a biological sample (e.g., from blood or a blood product, tissue, tumor), non-limiting examples of which include methods of DNA preparation, various commercially available reagents or kits, such as DNeasy®, RNeasy®, QIAprep®, QIAquick®, and QIAamp® (e.g., QIAamp® Circulating Nucleic Acid Kit, QIAamp® DNA Mini Kit or QIAamp® DNA Blood Mini Kit) nucleic acid isolation/purification kits by Qiagen, Inc. (Germantown, Md); GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.); GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.); DNAzol®, ChargeSwitch®, Purelink®, GeneCatcher® nucleic acid isolation/purification kits by Life Technologies, Inc. (Carlsbad, CA); NucleoMag®, NucleoSpin®, and NucleoBond® nucleic acid isolation/purification kits by Clontech Laboratories, Inc. (Mountain View, CA); the like or combinations thereof. In certain aspects, nucleic acid is isolated from a fixed biological sample, e.g., formalin-fixed, paraffin-embedded (FFPE) tissue. Genomic DNA from FFPE tissue may be isolated using commercially available kits, such as the AllPrep® DNA/RNA FFPE kit by Qiagen, Inc. (Germantown, Md), the RecoverAll® Total Nucleic Acid Isolation kit for FFPE by Life Technologies, Inc. (Carlsbad, CA), and the NucleoSpin® FFPE kits by Clontech Laboratories, Inc. (Mountain View, CA).

In some embodiments, nucleic acid is extracted from cells using a cell lysis procedure. Cell lysis procedures and reagents are known in the art and may generally be performed by chemical (e.g., detergent, hypotonic solutions, enzymatic procedures, and the like, or combination thereof), physical (e.g., French press, sonication, and the like), or electrolytic lysis methods. Any suitable lysis procedure can be utilized. For example, chemical methods generally employ lysing agents to disrupt cells and extract the nucleic acids from the cells, followed by treatment with chaotropic salts. Physical methods such as freeze/thaw followed by grinding, the use of cell presses and the like also are useful. In some instances, a high salt and/or an alkaline lysis procedure may be utilized. In some instances, a lysis procedure may include a lysis step with EDTA/Proteinase K, a binding buffer step with a high amount of salts (e.g., guanidinium chloride (GuHCl), sodium acetate) and isopropanol, and binding DNA in this solution to a silica-based column.

Nucleic acids can include extracellular nucleic acid in certain embodiments. The term "extracellular nucleic acid" as used herein can refer to nucleic acid isolated from a source having substantially no cells and also is referred to as “cell-free” nucleic acid (cell-free DNA, cell-free RNA, or both), “circulating cell-free nucleic acid” (e.g., CCF fragments, ccfDNA) and/or “cell-free circulating nucleic acid.” Extracellular nucleic acid can be present in and obtained from blood (e.g., from the blood of a human subject). Extracellular nucleic acid often includes no detectable cells and may contain cellular elements or cellular remnants. Non-limiting examples of acellular sources for extracellular nucleic acid are blood, blood plasma, blood serum and urine. In certain aspects, cell-free nucleic acid is obtained from a body fluid sample chosen from whole blood, blood plasma, blood serum, amniotic fluid, saliva, urine, pleural effusion, bronchial lavage, bronchial aspirates, breast milk, colostrum, tears, seminal fluid, peritoneal fluid, and stool. As used herein, the term “obtain cell-free circulating sample nucleic acid” includes obtaining a sample directly (e.g., collecting a sample, e.g., a test sample) or obtaining a sample from another who has collected a sample. Extracellular nucleic acid may be a product of cellular secretion and/or nucleic acid release (e.g., DNA release). Extracellular nucleic acid may be a product of any form of cell death, for example. In some instances, extracellular nucleic acid is a product of any form of type I or type II cell death, including mitotic, oncotic, toxic, ischemic, and the like and combinations thereof. Without being limited by theory, extracellular nucleic acid may be a product of cell apoptosis and cell breakdown, which provides basis for extracellular nucleic acid often having a series of lengths across a spectrum (e.g., a "ladder"). 
In some instances, extracellular nucleic acid is a product of cell necrosis, necroptosis, oncosis, entosis, pyroptosis, and the like and combinations thereof. In some embodiments, sample nucleic acid from a test subject is circulating cell-free nucleic acid. In some embodiments, circulating cell-free nucleic acid is from blood plasma or blood serum from a test subject. In some aspects, cell-free nucleic acid is degraded. In certain aspects, cell-free nucleic acid comprises circulating cancer nucleic acid (e.g., cancer DNA). In certain aspects, cell-free nucleic acid comprises circulating tumor nucleic acid (e.g., tumor DNA).

Extracellular nucleic acid can include different nucleic acid species, and therefore is referred to herein as "heterogeneous" in certain embodiments. For example, blood serum or plasma from a person having a tumor or cancer can include nucleic acid from tumor cells or cancer cells (e.g., neoplasia) and nucleic acid from non-tumor cells or non-cancer cells. In some instances, cancer nucleic acid and/or tumor nucleic acid sometimes is about 5% to about 50% of the overall nucleic acid (e.g., about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, or 49% of the total nucleic acid is cancer, or tumor nucleic acid).

Nucleic acid may be provided for conducting methods described herein with or without processing of the sample(s) containing the nucleic acid. In some embodiments, nucleic acid is provided for conducting methods described herein after processing of the sample(s) containing the nucleic acid. For example, a nucleic acid can be extracted, isolated, purified, partially purified or amplified from the sample(s). The term “isolated” as used herein refers to nucleic acid removed from its original environment (e.g., the natural environment if it is naturally occurring, or a host cell if expressed exogenously), and thus is altered by human intervention (e.g., "by the hand of man") from its original environment. The term “isolated nucleic acid” as used herein can refer to a nucleic acid removed from a subject (e.g., a human subject). An isolated nucleic acid can be provided with fewer non-nucleic acid components (e.g., protein, lipid) than the amount of components present in a source sample. A composition comprising isolated nucleic acid can be about 50% to greater than 99% free of non-nucleic acid components. A composition comprising isolated nucleic acid can be about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% free of non-nucleic acid components. The term “purified” as used herein can refer to a nucleic acid provided that contains fewer non-nucleic acid components (e.g., protein, lipid, carbohydrate) than the amount of non-nucleic acid components present prior to subjecting the nucleic acid to a purification procedure. A composition comprising purified nucleic acid may be about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% free of other non-nucleic acid components. The term “purified” as used herein can refer to a nucleic acid provided that contains fewer nucleic acid species than in the sample source from which the nucleic acid is derived. 
A composition comprising purified nucleic acid may be about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% free of other nucleic acid species. In certain examples, small fragments of nucleic acid (e.g., 30 to 500 bp fragments) can be purified, or partially purified, from a mixture comprising nucleic acid fragments of different lengths. In certain examples, nucleosomes comprising smaller fragments of nucleic acid can be purified from a mixture of larger nucleosome complexes comprising larger fragments of nucleic acid. In certain examples, larger nucleosome complexes comprising larger fragments of nucleic acid can be purified from nucleosomes comprising smaller fragments of nucleic acid. In certain examples, cancer cell nucleic acid can be purified from a mixture comprising cancer cell and non-cancer cell nucleic acid. In certain examples, nucleosomes comprising small fragments of cancer cell nucleic acid can be purified from a mixture of larger nucleosome complexes comprising larger fragments of non-cancer nucleic acid. In some embodiments, nucleic acid is provided for conducting methods described herein without prior processing of the sample(s) containing the nucleic acid. For example, nucleic acid may be analyzed directly from a sample without prior extraction, purification, partial purification, and/or amplification.

In certain instances, cleavage conditions include contacting the sample nucleic acid with a restriction endonuclease, and sometimes the restriction endonuclease cleaves the sample nucleic acid at ^GATC and G^ANTC, where “^” represents the cut site. In certain instances, sample nucleic acid is contacted with two or more restriction enzyme types. In certain implementations, linking conditions comprise contacting the nucleic acid fragments with a ligase under conditions in which ends of fragments in proximity are joined. Methodology for preparing proximity ligated nucleic acid is known in the art, and non-limiting examples of such methodology are referred to as Hi-C, 3C, 4C, ChIA-PET and variants thereof (e.g., capture Hi-C), as described additionally herein. In certain instances, oligonucleotide probes include a capture agent and the complexes are isolated by contacting the complexes with a solid phase comprising a capture agent counterpart that specifically binds to the capture agent under binding conditions. A solid phase sometimes is a plurality of beads, such as magnetic or SEPHAROSE (TM) beads for example. In certain implementations, (i) a capture agent is selected from biotin, avidin and streptavidin, and (ii) the capture agent counterpart is a molecule that specifically binds to the capture agent and independently is selected from biotin, avidin and streptavidin. Hybridization complexes can be isolated by contacting the complexes with a solid phase that includes a capture agent counterpart under conditions in which the capture agent counterpart of the solid phase specifically binds to a capture agent associated with the oligonucleotide probes of the hybridization complexes, and separating the complexes bound to the solid phase from complexes not bound to the solid phase.
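As a non-limiting illustration of the cleavage step, the following minimal sketch performs an in silico digestion at the two recognition sites named above (GATC, cut before the G; GANTC, cut after the G, where N is any base). The function names, the data layout and the example sequence are hypothetical and are provided only to show how restriction fragments of a reference sequence might be enumerated. Because both recognition sites are palindromic, scanning a single strand is sufficient in this sketch.

```python
import re

# Recognition site -> offset of the cut within the site (top strand).
# GATC is cut before the G (offset 0); GANTC is cut after the G (offset 1).
SITES = {
    "GATC": 0,
    "GANTC": 1,
}


def site_to_regex(site):
    """Translate a recognition site to a regex; N matches any base."""
    return site.replace("N", "[ACGT]")


def cut_positions(seq, sites=SITES):
    """Return sorted 0-based coordinates at which the sequence is cut."""
    upper = seq.upper()
    cuts = set()
    for site, offset in sites.items():
        # A lookahead is used so that overlapping sites are all reported.
        for match in re.finditer("(?=" + site_to_regex(site) + ")", upper):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq, sites=SITES):
    """Split seq at every cut position and return the fragments in order."""
    inner = [c for c in cut_positions(seq, sites) if 0 < c < len(seq)]
    bounds = [0] + inner + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical example sequence containing one GATC and one GANTC site.
for fragment in digest("AAGATCTTGAATCCC"):
    print(len(fragment), fragment)
```

Fragment lengths obtained in this manner could, for example, be compared against length thresholds when deciding how many probes to direct to a given fragment.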

Any suitable method for carrying out proximity ligation may be used. For example, a Hi-C method typically includes the following steps: (1) digestion of a chromatin sample with a restriction enzyme (or fragmentation), where a non-limiting example of a chromatin sample is chromatin obtained from solubilized and decompacted FFPE (formalin-fixed paraffin embedded) tissue; (2) labelling the digested ends by filling in the 5’-overhangs with biotinylated nucleotides; and (3) ligating the spatially proximal digested ends, thus preserving spatial-proximal contiguity information. Once spatial-proximal contiguity information is preserved, further steps in a Hi-C method may include: purifying and enriching biotin-labelled ligation junction fragments, preparing a library from the enriched fragments and sequencing the library. Another example of a proximity ligation method may include the following steps: (1) digestion of a chromatin sample with a restriction enzyme (or fragmentation), where a non-limiting example of a chromatin sample is chromatin obtained from solubilized and decompacted FFPE (formalin-fixed paraffin embedded) tissue; (2) blunting the digested or fragmented ends or omission of the blunting procedure; and (3) ligating the spatially proximal ends, thus preserving spatial-proximal contiguity information. Once spatial-proximal contiguity information is preserved, further steps can include: using size selection to purify and enrich ligated fragments, which represent ligation junction fragments, preparing a library from the enriched fragments and sequencing the library. In some embodiments, proximity ligated nucleic acid molecules are generated in situ (i.e., within a nucleus). For methods that include Capture Hi-C, a further step is included where ligation products containing certain nucleic acid sequences are enriched using one or more capture probes (see e.g., International Patent Application Publication No. WO 2014/168575), such as a set of oligonucleotide probes described herein.

Processes that include preparing hybridization complexes comprising proximity ligated nucleic acid hybridized to oligonucleotide probes and then isolating the complexes can enrich the relative abundance of polynucleotides in sample nucleic acid complementary to the oligonucleotide probe polynucleotides, which are referred to as “target polynucleotides” herein. As oligonucleotide probes described herein include polynucleotides complementary to cancer gene introns, exons and promoters, such probes are useful for enriching cancer gene polynucleotides in a sample. A subset of hybridization complexes containing proximity ligated nucleic acid hybridized to probes described herein typically is enriched for cancer gene target polynucleotides. Target polynucleotides (e.g., cancer gene oligonucleotides) generally are enriched in a subset of hybridization complexes containing proximity ligated nucleic acid hybridized, or that was hybridized, to the probes, relative to all proximity ligated nucleic acid prepared from a nucleic acid sample. Stated another way, the abundance (e.g., percentage) of target polynucleotides (e.g., cancer gene polynucleotides) generally is greater in the subset of hybridization complexes containing proximity ligated nucleic acid hybridized, or that was hybridized, to the probes, relative to the abundance of target polynucleotides (e.g., percentage) in all proximity ligated nucleic acid prepared from a nucleic acid sample.
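The comparison of abundances described above can be expressed as a simple fold-enrichment calculation. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the counts are illustrative placeholders (e.g., numbers of sequence reads or molecules classified as target or non-target) rather than measured results.

```python
def target_fraction(n_target, n_total):
    """Fraction of molecules (or reads) that are target polynucleotides."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_target <= n_total:
        raise ValueError("n_target must be between 0 and n_total")
    return n_target / n_total


def fold_enrichment(captured_target, captured_total, input_target, input_total):
    """Ratio of target abundance after capture to target abundance before.

    A value greater than 1 indicates that target polynucleotides are more
    abundant in the probe-isolated subset than in all proximity ligated
    nucleic acid prepared from the sample.
    """
    before = target_fraction(input_target, input_total)
    after = target_fraction(captured_target, captured_total)
    if before == 0:
        raise ValueError("no target in the input; fold enrichment is undefined")
    return after / before


# Placeholder counts: 125 of 1000 input molecules are target, and 500 of
# 1000 probe-isolated molecules are target.
print(fold_enrichment(500, 1000, 125, 1000))
```

Reporting the result as a ratio of fractions, rather than of raw counts, keeps the value comparable between captured and input material sequenced to different depths.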

Target nucleic acid sometimes is modified as part of a nucleic acid analysis process. A target nucleic acid can be modified to include an identifier (e.g., a tag, an indexing tag), a capture sequence, a label, an adapter, a restriction enzyme site, a promoter, an enhancer, an origin of replication, a stem loop, a complementary sequence (e.g., a primer binding site, an annealing site), a suitable integration site (e.g., a transposon, a viral integration site), a modified nucleotide, a unique molecular identifier (UMI), the like or combinations thereof. In some embodiments, a nucleic acid or isolated nucleic acid comprises one or more adapters (e.g., sequencing adapters, also known as sequencing adapter oligonucleotides). Sequencing adapters may comprise sequences complementary to flow-cell anchors, and sometimes are utilized to immobilize a nucleic acid to a solid support, such as the inside surface of a flow cell, for example. Adapters and other polynucleotide components described typically are not associated with the nucleic acid in vivo and thereby do not naturally occur with the nucleic acid. In certain instances, analyzing the nucleic acid in the complexes includes sequencing the proximity ligated nucleic acid of the isolated complexes. Target nucleic acid sometimes is modified as part of a sequencing process. In certain implementations, non-naturally occurring oligonucleotides that facilitate sequencing, known as “sequencing adapter oligonucleotides,” are joined to proximity ligated nucleic acid in the oligonucleotide probe-isolated proximity ligated nucleic acid, thereby forming adapter-modified nucleic acid. Adapter-modified nucleic acid is optionally amplified by an amplification process known in the art, and the adapter-modified nucleic acid (or amplified adapter-modified nucleic acid) is then subjected to sequencing conditions to identify the polynucleotide sequence of proximity ligated nucleic acid. 
Additional aspects of nucleic acid analytical methodology are described herein.

Analysis of nucleic acid from a sample can identify one or more structural variants in the nucleic acid relative to nucleic acid from a reference genome or from another sample in certain instances. Various types of structural variants that can be identified are described herein.

Samples

Provided herein are methods and compositions for processing and/or analyzing nucleic acid. Nucleic acid utilized in methods and compositions described herein may be isolated from a sample obtained from a subject (e.g., a test subject). A subject can be any living or non-living organism, including but not limited to a human and a non-human animal. Any human or non-human animal can be selected, and may include, for example, mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, dolphin, whale and shark. In some embodiments, a subject is a human. A subject may be a male or female. A subject may be any age (e.g., an embryo, a fetus, an infant, a child, an adult). A subject may be a cancer patient, a patient suspected of having cancer, a patient in remission, a patient with a family history of cancer, and/or a subject obtaining a cancer screen. In some embodiments, a subject is an adult patient. In some embodiments, a subject is a pediatric patient.

A nucleic acid sample may be isolated or obtained from any type of suitable biological specimen or sample (e.g., a test sample). A nucleic acid sample may be isolated or obtained from a single cell, a plurality of cells (e.g., cultured cells), cell culture media, conditioned media, a tissue, an organ, or an organism. In some embodiments, a nucleic acid sample is isolated or obtained from a cell(s), tissue, organ, and/or the like of an animal (e.g., an animal subject). In some instances, a nucleic acid sample may be obtained as part of a diagnostic analysis.

A sample or test sample may be any specimen that is isolated or obtained from a subject or part thereof (e.g., a human subject, a cancer patient, a tumor). Non-limiting examples of specimens include fluid or tissue from a subject, including, without limitation, blood or a blood product (e.g., serum, plasma, or the like), umbilical cord blood, chorionic villi, amniotic fluid, cerebrospinal fluid, spinal fluid, lavage fluid (e.g., bronchoalveolar, gastric, peritoneal, ductal, ear, arthroscopic), biopsy sample (e.g., from pre-implantation embryo; cancer biopsy), celocentesis sample, cells (blood cells, placental cells, embryo or fetal cells, fetal nucleated cells or fetal cellular remnants, normal cells, abnormal cells (e.g., cancer cells)) or parts thereof (e.g., mitochondria, nucleus, extracts, or the like), washings of female reproductive tract, urine, feces, sputum, saliva, nasal mucus, prostate fluid, lavage, semen, lymphatic fluid, bile, tears, sweat, breast milk, breast fluid, the like or combinations thereof. In some embodiments, a biological sample is a cervical swab from a subject. A fluid or tissue sample from which nucleic acid is extracted may be acellular (e.g., cell-free). In some embodiments, a fluid or tissue sample may contain cellular elements or cellular remnants. In some embodiments, cancer cells may be included in the sample.

A sample can be a liquid sample. A liquid sample can comprise extracellular nucleic acid (e.g., circulating cell-free DNA). Examples of liquid samples include, but are not limited to, blood or a blood product (e.g., serum, plasma, or the like), urine, cerebrospinal fluid, saliva, sputum, biopsy sample (e.g., liquid biopsy for the detection of cancer), a liquid sample described above, the like or combinations thereof. In certain embodiments, a sample is a liquid biopsy, which generally refers to an assessment of a liquid sample from a subject for the presence, absence, progression or remission of a disease (e.g., cancer). A liquid biopsy can be used in conjunction with, or as an alternative to, a solid biopsy (e.g., tumor biopsy). In certain instances, extracellular nucleic acid is analyzed in a liquid biopsy.

In some embodiments, a biological sample may be blood, plasma or serum. The term “blood” encompasses whole blood, blood product or any fraction of blood, such as serum, plasma, buffy coat, or the like as conventionally defined. Blood or fractions thereof often comprise nucleosomes. Nucleosomes comprise nucleic acids and are sometimes cell-free or intracellular. Blood also comprises buffy coats. Buffy coats are sometimes isolated by utilizing a ficoll gradient. Buffy coats can comprise white blood cells (e.g., leukocytes, T-cells, B-cells, platelets, and the like). Blood plasma refers to the fraction of whole blood resulting from centrifugation of blood treated with anticoagulants. Blood serum refers to the watery portion of fluid remaining after a blood sample has coagulated. Fluid or tissue samples often are collected in accordance with standard protocols hospitals or clinics generally follow. For blood, an appropriate amount of peripheral blood (e.g., between 3 and 40 milliliters, between 5 and 50 milliliters) often is collected and can be stored according to standard procedures prior to or after preparation.

An analysis of nucleic acid found in a subject’s blood may be performed using, e.g., whole blood, serum, or plasma. An analysis of tumor or cancer DNA found in a patient’s blood, for example, may be performed using, e.g., whole blood, serum, or plasma. Methods for preparing serum or plasma from blood obtained from a subject (e.g., patient; cancer patient) are known. For example, a subject’s blood (e.g., patient’s blood; cancer patient’s blood) can be placed in a tube containing EDTA or a specialized commercial product such as Cell-Free DNA BCT (Streck, Omaha, NE) or Vacutainer SST (Becton Dickinson, Franklin Lakes, N.J.) to prevent blood clotting, and plasma can then be obtained from whole blood through centrifugation. Serum may be obtained with or without centrifugation following blood clotting. If centrifugation is used, it is typically, though not exclusively, conducted at an appropriate speed, e.g., 1,500-3,000 times g. Plasma or serum may be subjected to additional centrifugation steps before being transferred to a fresh tube for nucleic acid extraction. In addition to the acellular portion of the whole blood, nucleic acid may also be recovered from the cellular fraction, enriched in the buffy coat portion, which can be obtained following centrifugation of a whole blood sample from the subject and removal of the plasma.
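Because the centrifugation speed above is stated as a relative centrifugal force (times g) rather than a rotor speed, a conversion to revolutions per minute may be useful in practice. The following minimal sketch uses the standard relationship RCF = 1.118 x 10^-5 x r x rpm^2, with the rotor radius r in centimeters. The function names and the 10 cm rotor radius are assumptions chosen for illustration only; the radius of the rotor actually used would be substituted.

```python
import math

# Standard conversion constant for a radius in cm and a speed in rpm:
# RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (times g) at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Assumed rotor radius of 10 cm (hypothetical; depends on the instrument).
for rcf in (1500, 3000):
    print(f"{rcf} x g is approximately {rpm_for_rcf(rcf, 10.0):.0f} rpm")
```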

A sample may be a tumor nucleic acid sample (i.e., a nucleic acid sample isolated from a tumor). The term “tumor” generally refers to neoplastic cell growth and proliferation, whether malignant or benign, and may include pre-cancerous and cancerous cells and tissues. The terms “cancer” and “cancerous” generally refer to the physiological condition in mammals that is typically characterized by unregulated cell growth/proliferation. In some embodiments, a sample is a tissue sample, a cell sample, a blood sample, or a urine sample. In some embodiments, a sample comprises formalin-fixed, paraffin-embedded (FFPE) tissue. In some embodiments, a sample comprises frozen tissue. In some embodiments, a sample comprises peripheral blood. In some embodiments, a sample comprises blood obtained from bone marrow. In some embodiments, a sample comprises cells obtained from urine. In some embodiments, a sample comprises cell-free nucleic acid. In some embodiments, a sample comprises one or more tumor cells. In some embodiments, a sample comprises one or more circulating tumor cells. In some embodiments, a sample comprises a solid tumor. In some embodiments, a sample comprises a blood tumor.

Nucleic acid analysis methodology

Non-limiting examples of processes for analyzing nucleic acid include amplification (e.g., polymerase chain reaction (PCR)), targeted sequencing, microarray, fluorescence in situ hybridization (FISH), methods that preserve spatial-proximal contiguity information, and methods that generate proximity ligated nucleic acid molecules.

In some embodiments, a nucleic acid analysis comprises nucleic acid amplification. For example, nucleic acids may be amplified under amplification conditions. The term “amplified” or “amplification” or “amplification conditions” generally refer to subjecting a target nucleic acid in a sample to a process that linearly or exponentially generates amplicon nucleic acids having the same or substantially the same nucleotide sequence as the target nucleic acid, or part thereof. In certain embodiments, the term “amplified” or “amplification” or “amplification conditions” refers to a method that comprises a polymerase chain reaction (PCR). Detecting a structural variant (SV) described herein using amplification (e.g., PCR) may include use of primers designed to hybridize to a region upstream (e.g., 5’) of one or more SV breakpoints, hybridize to a region downstream (e.g., 3’) of one or more SV breakpoints, hybridize to a region adjacent to one or more SV breakpoints, and/or hybridize to a region spanning one or more SV breakpoints. Examples of PCR primers useful for identifying a structural variant are provided herein.

In some embodiments, a nucleic acid analysis comprises fluorescence in situ hybridization (FISH). Fluorescence in situ hybridization (FISH) is a technique that uses fluorescent probes that bind to a nucleic acid sequence with a high degree of sequence complementarity. In certain configurations, fluorescence microscopy may be used to observe where the fluorescent probe is bound to a chromosome. Detecting a structural variant (SV) described herein using fluorescence in situ hybridization (FISH) may include use of probes designed to hybridize to a region upstream (e.g., 5’) of one or more SV breakpoints, hybridize to a region downstream (e.g., 3’) of one or more SV breakpoints, hybridize to a region adjacent to one or more SV breakpoints, and/or hybridize to a region spanning one or more SV breakpoints. Examples of probes useful for identifying a structural variant are provided herein. In some embodiments, a nucleic acid analysis comprises a microarray (e.g., a DNA microarray, DNA chip, biochip). A DNA microarray is a collection of DNA probes attached to a solid surface. Probes can be short sections of a gene or other genomic DNA element that can hybridize to target nucleic acids in a sample (e.g., under high-stringency conditions). Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine presence, absence, and/or relative abundance of target nucleic acid sequences in the sample. Detecting a structural variant (SV) described herein using DNA microarrays may include use of array probes designed to hybridize to a region upstream (e.g., 5’) of one or more SV breakpoints, hybridize to a region downstream (e.g., 3’) of one or more SV breakpoints, hybridize to a region adjacent to one or more SV breakpoints, and/or hybridize to a region spanning one or more SV breakpoints. Examples of array probes useful for identifying a structural variant are provided herein.

In some embodiments, a nucleic acid analysis comprises sequencing (e.g., genome-wide sequencing, targeted sequencing). For targeted sequencing, a target nucleic acid may be amplified (e.g., by PCR with primers specific to the target), enriched using a probe-based approach, where one or more probes hybridize to a target nucleic acid prior to sequencing, or enriched using Cas9-mediated approaches, such as Cas9-guided adapter ligation, as described in Gilpatrick, T. et al., Targeted nanopore sequencing with Cas9-guided adapter ligation, Nature Biotechnology, volume 38, pages 433-438 (2020). Nucleic acid may be sequenced using any suitable sequencing platform including a Sanger sequencing platform, a high throughput or massively parallel sequencing (next generation sequencing (NGS)) platform, or the like, such as, for example, a sequencing platform provided by Illumina® (e.g., HiSeq™, MiSeq™ and/or Genome Analyzer™ sequencing systems); Oxford Nanopore™ Technologies (e.g., MinION sequencing system); Ion Torrent™ (e.g., Ion PGM™ and/or Ion Proton™ sequencing systems); Pacific Biosciences (e.g., PACBIO RS II sequencing system); Life Technologies™ (e.g., SOLiD sequencing system); Roche (e.g., 454 GS FLX+ and/or GS Junior sequencing systems); or any other suitable sequencing platform. In some embodiments, the sequencing process is a highly multiplexed sequencing process. In certain instances, a full or substantially full sequence is obtained and sometimes a partial sequence is obtained. Nucleic acid sequencing generally produces a collection of sequence reads. As used herein, “reads” (e.g., “a read,” “a sequence read”) are short sequences of nucleotides produced by any sequencing process described herein or known in the art. Reads can be generated from one end of nucleic acid fragments (single-end reads), and sometimes are generated from both ends of nucleic acid fragments (e.g., paired-end reads, double-end reads). 
In some embodiments, a sequencing process generates short sequencing reads or “short reads.” In some embodiments, the nominal, average, mean or absolute length of short reads sometimes is about 10 contiguous nucleotides to about 250 or more contiguous nucleotides. In some embodiments, the nominal, average, mean or absolute length of short reads sometimes is about 50 contiguous nucleotides to about 150 or more contiguous nucleotides.

In some embodiments, a nucleic acid analysis comprises a method that preserves spatial-proximal relationships and/or spatial-proximal contiguity information (see e.g., International PCT Application Publication No. WO2019/104034; International PCT Application Publication No. WO2020/106776; International PCT Application Publication No. WO2020236851; Kempfer, R., & Pombo, A. (2019). Methods for mapping 3D chromosome architecture. Nature Reviews Genetics. doi:10.1038/s41576-019-0195-2; and Schmitt, Anthony D.; Hu, Ming; Ren, Bing (2016). Genome-wide mapping and analysis of chromosome architecture. Nature Reviews Molecular Cell Biology. doi:10.1038/nrm.2016.104; each of which is incorporated by reference in its entirety, to the extent permitted by law). Methods that preserve spatial-proximal relationships and/or spatial-proximal contiguity information generally refer to methods that capture and preserve the native spatial conformation exhibited by nucleic acids when associated with proteins as in chromatin and/or as part of a nuclear matrix. Spatial-proximal contiguity information can be preserved by proximity ligation, by solid substrate-mediated proximity capture (SSPC), by compartmentalization with or without a solid substrate or by use of a Tn5 tetramer. Methods that preserve spatial-proximal contiguity information may be based on proximity ligation or may be based on a different principle where spatial proximity is inferred. Methods based on proximity ligation may include, for example, 3C, 4C, 5C, Hi-C, TCC, GCC, TLA, PLAC-seq, HiChIP, ChIA-PET, Capture-C, Capture Hi-C, single-cell Hi-C, sciHi-C, single-cell 3C, single-cell methyl-3C, DNase Hi-C, Micro-C, Tiled-C, and Low-C. 
Methods where spatial proximity is inferred based on a principle other than proximity ligation may include, for example, SPRITE, scSPRITE, Genome Architecture Mapping (GAM), ChIA-Drop, imaging-based approaches using labeled probes and visualization of DNA, and plus/minus sequencing of an imaged sample (e.g., in situ Genome Sequencing (IGS)). In some embodiments, a nucleic acid analysis comprises generating proximity ligated nucleic acid molecules (e.g., using a method described herein). In some embodiments, a nucleic acid analysis comprises sequencing the proximity ligated nucleic acid molecules, e.g., by a suitable sequencing process known in the art or described herein.

In some embodiments, a nucleic acid analysis comprises a method for preparing nucleic acids from particular types of samples that preserves spatial-proximal contiguity information in the sequence of the nucleic acids. Nucleic acid molecules that preserve spatial-proximal contiguity information can be fragmented and sequenced using short-read sequencing methods (e.g., Illumina, nucleic acid fragments of lengths approximately 500 bp) or intact molecules that preserve spatial-proximal contiguity information can be sequenced using long-read sequencing (e.g., Illumina, Oxford Nanopore, or others, nucleic acid fragments of lengths approximately 30 kbp or greater). In certain embodiments, a sample can be a fixed sample that is embedded in a material such as paraffin (wax). In some embodiments, a sample can be a formalin fixed sample. In certain embodiments, a sample is a formalin-fixed paraffin-embedded (FFPE) sample. In some embodiments, a formalin-fixed paraffin-embedded sample can be a tissue sample or a cell culture sample. In some embodiments, a tissue sample has been excised from a patient and can be diseased or damaged. In some embodiments, a tissue sample is not known to be diseased or damaged. In certain embodiments, a formalin-fixed paraffin-embedded sample can be a formalin-fixed paraffin-embedded section, block, scroll or slide. In certain embodiments, a sample can be a deeply formalin-fixed sample, as described below.

In certain embodiments, a formalin-fixed paraffin-embedded sample is provided on a solid surface and a method of preparing nucleic acid that preserves spatial-proximal contiguity information is performed on the solid surface. In some embodiments, a solid surface is a pathology slide. In some embodiments, additional downstream reactions are also performed on the solid surface.

Those of skill in the art are familiar with methods that can be substituted for steps requiring centrifugation and that achieve a comparable result but are performed on a solid surface. In some embodiments, methods that preserve spatial-proximal contiguity information comprise methods that generate proximity ligated nucleic acid molecules (e.g., using proximity ligation). A proximity ligation method is one in which natively occurring spatially proximal nucleic acid molecules are captured by ligation to generate ligated products. Proximity ligation methods generally capture spatial-proximal contiguity information in the form of ligation products, whereby a ligation junction is formed between two natively spatially proximal nucleic acids. Once the ligation products are formed, the spatial-proximal contiguity information is detected using next generation sequencing, whereby one or more ligation junctions (either from an entire ligation product or fragment of a ligation product) are sequenced (as described herein). With this sequence information, one is informed that the nucleic acid molecules from a given ligation product (or ligation junction) are natively spatially proximal nucleic acids. In some embodiments, reagents that generate proximity ligated nucleic acid molecules can include a restriction endonuclease, a DNA polymerase, a plurality of nucleotides comprising at least one biotinylated nucleotide, and a ligase. In certain embodiments, two or more restriction endonucleases are used. Any suitable method for carrying out proximity ligation may be used, as described herein.

Structural variants

Provided herein are methods for detecting the presence or absence of a structural variant in a sample. In certain aspects, provided is a method for detecting the presence or absence of a structural variant in a sample, the method comprising analyzing sample nucleic acid from a subject, wherein the analyzing comprises: generating proximity ligated nucleic acid molecules; contacting the proximity ligated nucleic acid molecules with one or more oligonucleotide probes described herein, thereby generating enriched proximity ligated nucleic acid molecules; sequencing the enriched proximity ligated nucleic acid molecules, thereby generating sequences of the sample nucleic acid; and determining the presence or absence of a structural variant in the sample nucleic acid from the sequences. In certain instances, the method includes comparing sequences of the sample nucleic acid to sequences of a reference genome to determine the presence or absence of a structural variant.

A structural variant may be referred to as a structural variation and/or a chromosomal rearrangement. A structural variant may comprise one or more of a translocation, inversion, insertion, deletion, and duplication. In some embodiments, a structural variant comprises a microduplication and/or a microdeletion. In some embodiments, a structural variant comprises a fusion (e.g., a gene fusion where a portion of a first gene is inserted into a portion of a second gene). Any type of structural variant, whether it be translocation, inversion, insertion, deletion, and/or duplication as described below, can be of any length, and in some embodiments, is about 1 base or base pair (bp) to about 250 megabases (Mb) in length. In some embodiments, a structural variation is about 1 base or base pair (bp) to about 50,000 kilobases (kb) in length (e.g., about 10 bp, 50 bp, 100 bp, 500 bp, 1 kb, 5 kb, 10 kb, 50 kb, 100 kb, 500 kb, 1000 kb, 5000 kb or 10,000 kb in length). A structural variant may be intra-chromosomal (rearrangement of genomic material within a chromosome) or inter-chromosomal (rearrangement of genomic material between two or more chromosomes).

A structural variant may comprise a translocation. A translocation is a genetic event that results in a rearrangement of chromosomal material. Translocations may include reciprocal translocations and Robertsonian translocations. A reciprocal translocation is a chromosome abnormality caused by exchange of parts between non-homologous chromosomes - two detached fragments of two different chromosomes are switched. A Robertsonian translocation occurs when two non-homologous chromosomes become attached, meaning that given two healthy pairs of chromosomes, one of each pair sticks and blends together homogeneously. A gene fusion may be created when a translocation joins two genes that are normally separate. Translocations may be balanced (i.e., in an even exchange of material with no genetic information extra or missing, sometimes with full functionality) or unbalanced (i.e., where the exchange of chromosome material is unequal resulting in extra or missing genes or fragments thereof).

A structural variant may comprise an inversion. An inversion is a chromosome rearrangement in which a segment of a chromosome is reversed end-to-end. An inversion may occur when a single chromosome undergoes breakage and rearrangement within itself. Inversions may be of two types: paracentric and pericentric. Paracentric inversions do not include the centromere, and both breaks occur in one arm of the chromosome. Pericentric inversions include the centromere, and there is a break point in each arm.
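The distinction between paracentric and pericentric inversions described above can be illustrated with a minimal sketch. The class name `Inversion`, the function name `classify_inversion`, and the coordinates used in the usage note are hypothetical and are not part of the technology described herein. The sketch assumes the centromere is given as a single coordinate interval on the same chromosome as the inversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inversion:
    """Hypothetical record for an inverted segment on one chromosome."""
    chrom: str
    start: int  # coordinate of the first break
    end: int    # coordinate of the second break


def classify_inversion(inv: Inversion, centromere_start: int, centromere_end: int) -> str:
    """Classify an inversion relative to an assumed centromere interval.

    Both breaks on the same side of the centromere -> "paracentric".
    One break on each side of the centromere -> "pericentric".
    """
    if inv.start > inv.end:
        raise ValueError("start must not exceed end")
    if centromere_start > centromere_end:
        raise ValueError("centromere_start must not exceed centromere_end")

    # Both breaks in the same arm: the centromere is not included.
    if inv.end < centromere_start or inv.start > centromere_end:
        return "paracentric"
    # One break in each arm: the centromere is included.
    if inv.start < centromere_start and inv.end > centromere_end:
        return "pericentric"
    # A break inside the centromere interval is outside this simple model.
    raise ValueError("a break falls inside the centromere interval")
```

For example, with an assumed centromere interval of 500 to 600, an inversion from 100 to 200 is classified as paracentric and an inversion from 100 to 900 is classified as pericentric.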

A structural variant may comprise an insertion. An insertion may be the addition of one or more nucleotide base pairs into a nucleic acid sequence. An insertion may be a microinsertion (generally a submicroscopic insertion of any length ranging from 1 base to about 10 megabases (e.g., about 1 megabase to about 3 megabases)). In certain embodiments, an insertion comprises the addition of a segment of a chromosome into a genome, chromosome, or segment thereof. In certain embodiments an insertion comprises the addition of an allele, a gene, an intron, an exon, any non-coding region, any coding region, segment thereof or combination thereof into a genome or segment thereof. In certain embodiments an insertion comprises the addition (e.g., insertion) of nucleic acid of unknown origin into a genome, chromosome, or segment thereof. In certain embodiments an insertion comprises the addition (e.g., insertion) of a single base.

A structural variant may comprise a deletion. In certain embodiments, a deletion is a genetic aberration in which a part of a chromosome or a sequence of DNA is missing. A deletion can, in certain embodiments, result in the loss of genetic material. In embodiments, a deletion can be translocated to another portion of the genome (balanced translocation or unbalanced translocation), such as on the same chromosome (same arm of the chromosome or other arm of the chromosome) or on a different chromosome. Any number of nucleotides can be deleted. A deletion can comprise the deletion of one or more entire chromosomes, a segment of a chromosome, an allele, a gene, an intron, an exon, any non-coding region, any coding region, a segment thereof or combination thereof. A deletion can comprise a microdeletion (generally a submicroscopic deletion of any length ranging from 1 base to about 10 megabases (e.g., about 1 megabase to about 3 megabases)). A deletion can comprise the deletion of a single base.

A structural variant may comprise a duplication. In certain embodiments, a duplication is a genetic aberration in which a part of a chromosome or a sequence of DNA is copied and inserted back into the genome. In certain embodiments, a duplication is any duplication of a region of DNA. In some embodiments, a duplication is a nucleic acid sequence that is repeated, often in tandem, within a genome or chromosome. In some embodiments a duplication can comprise a copy of one or more entire chromosomes, a segment of a chromosome, an allele, a gene, an intron, an exon, any non-coding region, any coding region, segment thereof or combination thereof. A duplication can comprise a microduplication (generally a submicroscopic duplication of any length ranging from 1 base to about 10 megabases (e.g., about 1 megabase to about 3 megabases)). A duplication sometimes comprises one or more copies of a duplicated nucleic acid. A duplication may be characterized as a genetic region repeated one or more times (e.g., repeated 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 times). Duplications can range from small regions (thousands of base pairs) to whole chromosomes in some instances. Duplications may occur as the result of an error in homologous recombination or due to a retrotransposon event.

A structural variant may include a plurality of chromosomal rearrangements (e.g., translocations, inversions, insertions, deletions, duplications). For example, a structural variant may include a plurality of intra-chromosomal rearrangements. In certain instances, a structural variant may include a plurality of inter-chromosomal rearrangements. In certain instances, a structural variant may include a plurality of intra-chromosomal rearrangements and inter-chromosomal rearrangements.

A structural variant may be defined according to one or more breakpoints. A breakpoint generally refers to a genomic position (i.e., genomic coordinate) where a structural variant occurs (e.g., translocation, inversion, insertion, deletion, or duplication). A breakpoint may refer to a genomic position where an ectopic portion of genomic material is inserted (e.g., a recipient site for an insertion or a translocation). A breakpoint may refer to a genomic position where a portion of genomic material is deleted (e.g., a donor site for an insertion or a translocation). A breakpoint may refer to a pair of genomic positions (i.e., genomic coordinates) that have become flanking (i.e., adjacent) to one another as a result of a structural variant (e.g., translocation, inversion, insertion, deletion, or duplication). A breakpoint may be defined in terms of a position or positions in a reference genome. A breakpoint may be defined in terms of a position or positions in a human reference genome (e.g., HG38 human reference genome). Generally, genomic positions discussed herein are in reference to an HG38 human reference genome, and corresponding and/or equivalent positions in any other human reference genome are contemplated herein.
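One possible way to represent a breakpoint defined as a pair of genomic positions is sketched below. The names `GenomicPosition` and `BreakpointPair`, and the coordinates in the usage note, are hypothetical and chosen only for illustration. The sketch assumes both positions are expressed against the same reference genome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenomicPosition:
    """A coordinate in an assumed reference genome."""
    chrom: str
    coord: int


@dataclass(frozen=True)
class BreakpointPair:
    """Two positions that have become adjacent as a result of a structural variant."""
    first: GenomicPosition
    second: GenomicPosition

    def is_inter_chromosomal(self) -> bool:
        # Positions on different chromosomes indicate an inter-chromosomal rearrangement.
        return self.first.chrom != self.second.chrom

    def linear_span(self) -> Optional[int]:
        # The reference distance between the two positions is defined only
        # when both lie on the same chromosome.
        if self.is_inter_chromosomal():
            return None
        return abs(self.second.coord - self.first.coord)
```

For instance, `BreakpointPair(GenomicPosition("chrA", 1_000), GenomicPosition("chrA", 51_000))` would be treated as intra-chromosomal with a span of 50,000 base pairs, whereas a pair spanning "chrA" and "chrB" would be treated as inter-chromosomal with no defined span.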

A breakpoint may be defined in terms of mapping to a position or positions in a reference genome. A breakpoint may be defined in terms of mapping to a position or positions in a human reference genome (e.g., HG38 human reference genome). A breakpoint may map to a position in a reference genome when a nucleic acid sequence located upstream, downstream, or spanning the breakpoint aligns with a corresponding sequence in a reference genome. Any suitable mapping method (e.g., process, algorithm, program, software, module, the like or combination thereof) can be used and certain aspects of mapping processes are described hereafter.

Mapping a nucleic acid sequence may comprise mapping one or more nucleic acid sequence reads (e.g., sequence information from a fragment whose physical genomic position is unknown), which can be performed in a number of ways, and often comprises alignment of the obtained sequence reads with a matching sequence in a reference genome. In such alignments, sequence reads generally are aligned to a reference sequence and those that align are designated as being "mapped", "a mapped sequence read" or “a mapped read”.

The terms “aligned”, “alignment”, or “aligning” generally refer to two or more nucleic acid sequences that can be identified as a match (e.g., 100% identity) or partial match. Alignments can be done manually or by a computer (e.g., a software, program, module, or algorithm), non-limiting examples of which include the Efficient Local Alignment of Nucleotide Data (ELAND) computer program distributed as part of the Illumina Genomics Analysis pipeline. Alignment of a sequence read can be a 100% sequence match. In some cases, an alignment is less than a 100% sequence match (e.g., non-perfect match, partial match, partial alignment). In some embodiments an alignment is about a 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 79%, 78%, 77%, 76% or 75% match. In some embodiments, an alignment comprises a mismatch (i.e., a base not correctly paired with its canonical Watson-Crick base partner (e.g., A or T incorrectly paired with C or G)). In some embodiments, an alignment comprises 1, 2, 3, 4 or 5 mismatches. Two or more sequences can be aligned using either strand. In certain embodiments a nucleic acid sequence is aligned with the reverse complement of another nucleic acid sequence. In certain instances, extra or missing bases within a sequence are expressed as gaps in an alignment and may or may not be factored into a percent identity calculation. For example, a percent identity calculation may include a number of mismatches and gaps or may include a number of mismatches only.
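The two percent identity conventions noted above (counting mismatches and gaps, or counting mismatches only) can be illustrated with a minimal sketch. The function names `percent_identity` and `reverse_complement`, the `count_gaps` parameter, and the use of "-" as the gap character are assumptions made for illustration and do not correspond to any particular alignment program. The sketch operates on two already-aligned sequences of equal length.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = True) -> float:
    """Percent identity of two aligned sequences, with "-" marking a gap.

    count_gaps=True  -> denominator is matches + mismatches + gap columns
    count_gaps=False -> denominator is matches + mismatches only
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = mismatches = gaps = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            gaps += 1          # extra or missing base in one sequence
        elif a == b:
            matches += 1
        else:
            mismatches += 1

    denominator = matches + mismatches + (gaps if count_gaps else 0)
    if denominator == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / denominator
```

As a usage example, the aligned pair `"ACGT-ACGTA"` and `"ACGTTACGAA"` contains 8 matches, 1 mismatch and 1 gap column, so the identity is 80% when gaps are counted and about 88.9% when only mismatches are counted. A sequence can be passed through `reverse_complement` before alignment when the opposite strand is to be compared.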

Various computational methods can be used to map and/or align sequence reads to a reference genome. Non-limiting examples of computer algorithms that can be used to align sequences include BLAST, BLITZ, FASTA, BOWTIE 1, BOWTIE 2, ELAND, MAQ, PROBEMATCH, SOAP or SEQMAP, or variations thereof or combinations thereof. In some embodiments, sequence reads can be aligned with reference sequences and/or sequences in a reference genome. In some embodiments, the sequence reads can be found and/or aligned with sequences in nucleic acid databases known in the art including, for example, GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Databank of Japan). BLAST or similar tools can be used to search the identified sequences against a sequence database.

A structural variant may be defined in terms of a receiving site and a donor site. A receiving site may be referred to as a first partner or “partner 1” and a donor site may be referred to as a second partner or “partner 2.” In some embodiments, a structural variant may be defined in terms of comprising an ectopic portion of genomic DNA (i.e., a portion of genomic DNA at a receiving site from a different region of a chromosome or from a different chromosome). The ectopic portion may be referred to as a donor portion.

In some embodiments, a structural variant may comprise an ectopic portion of genomic DNA (i.e., a portion of genomic DNA at a receiving site from a different region of a chromosome or from a different chromosome). The ectopic portion may be referred to as a donor portion. If the ectopic portion (donor portion) is from the same chromosome as the structural variant, the ectopic portion may be from a location outside of the position ranges provided herein for certain structural variants. The ectopic portion may comprise genomic DNA from a genomic coordinate window provided herein, or part thereof. The ectopic portion may comprise genomic DNA from a genomic coordinate window provided herein, or part thereof, and may further comprise genomic DNA from a region outside of a genomic coordinate window provided herein.

In some embodiments, an ectopic portion of genomic DNA is characterized by its location (e.g., observed location for a given sample or samples) at a receiving site (e.g., at a structural variant site). In some embodiments, an ectopic portion is characterized by its location (e.g., observed location for a given sample or samples) relative to a coding region of a gene and/or cancer gene. A coding region of a gene and/or cancer gene generally refers to a part of the gene and/or cancer gene that is transcribed and translated into protein (i.e., the sum total of its exons). In some embodiments, an ectopic portion is within a coding region of a gene and/or cancer gene. In some embodiments, an ectopic portion is not within a coding region of a gene and/or cancer gene. For example, an ectopic portion may be located in an intronic region, an intergenic region, or within another gene. In some embodiments, an ectopic portion is located at a position in proximity to a coding region for a gene and/or cancer gene. The term “in proximity” may refer to spatial proximity and/or linear proximity.

Spatial proximity generally refers to 3-dimensional chromatin proximity, which may be assessed according to a method that preserves spatial-proximal relationships, such as a method described herein or any suitable method known in the art. An ectopic portion may be located at a position in spatial proximity to a coding region for a gene and/or cancer gene when an ectopic portion and a gene and/or cancer gene (or a fragment thereof) are ligated in a proximity ligation assay or are bound by a common solid phase in a solid substrate-mediated proximity capture (SSPC) assay, for example.

Linear proximity generally refers to a linear base-pair distance, which may be assessed according to mapped distances in a reference genome, for example. Linear proximity distance may be provided as a distance between a 5’ or 3’ end of an ectopic portion and a 5’ or 3’ end of a gene and/or exon. An ectopic portion may be located at a position in linear proximity to a coding region of a gene and/or cancer gene when the ectopic portion is within about 1,000 base pairs, about 2,000 base pairs, about 3,000 base pairs, about 4,000 base pairs, about 5,000 base pairs, about 10,000 base pairs, about 20,000 base pairs, about 30,000 base pairs, about 40,000 base pairs, about 50,000 base pairs, about 60,000 base pairs, about 70,000 base pairs, about 80,000 base pairs, about 90,000 base pairs, about 100,000 base pairs, about 200,000 base pairs, about 300,000 base pairs, about 400,000 base pairs, about 500,000 base pairs, about 600,000 base pairs, about 700,000 base pairs, about 800,000 base pairs, about 900,000 base pairs, or about 1,000,000 base pairs of a coding region of a gene and/or cancer gene.

A structural variant may be associated with one or more genes. For example, a structural variant may be associated with one or more cancer genes. A cancer gene is a gene that, when altered, is associated with cancer. Alterations may include mutations, structural variants, copy number variations, and the like and combinations thereof. Alterations may be located within a gene and/or cancer gene (i.e., intragenic) or outside of/adjacent to a gene and/or cancer gene (i.e., intergenic, extragenic). For structural variants, the terms “outside of” and “adjacent to,” as used herein in reference to a structural variant being outside of or adjacent to a gene, generally mean that a breakpoint of a structural variant is not within the gene. The structural variant can contain the gene, such as an inversion of the gene, an insertion of the gene, a duplication of the gene, or the like, or can contain a portion of the gene. In certain aspects, the structural variant may not include the gene, i.e., the structural variant does not contain the gene, insertion, inversion, duplication or any portion thereof.

In certain instances, alterations may be located within a different gene. Alterations may be located in a portion of genomic DNA that is proximal to a gene and/or cancer gene (e.g., within a certain linear proximity and/or within a certain spatial proximity). Alterations may affect expression of a gene and/or cancer gene (e.g., increased expression, decreased expression, no expression, constitutive expression). Alterations may affect the function of a protein encoded by a gene and/or cancer gene (e.g., increased function, decreased function, loss-of-function, gain-of-function, constitutive function, change in function).

In some embodiments, a structural variant and/or breakpoint of a structural variant is within a gene (e.g., within an intron and/or exon of a gene (e.g., a cancer gene)). In some embodiments, a structural variant and/or breakpoint of a structural variant is outside of a gene (e.g., within an intergenic region or within a different nearby gene). In some embodiments, a structural variant and/or breakpoint of a structural variant is adjacent to a gene (e.g., within an intergenic region or within a different nearby gene). Thus, in some embodiments, a structural variant and/or a breakpoint for a structural variant is not within a gene (e.g., a cancer gene). In certain instances, a structural variant and/or breakpoint of a structural variant (e.g., an intergenic structural variant) may be defined in terms of linear distance to a gene (e.g., a cancer gene). Linear distance may be measured from the 5’ end of a gene and/or a 3’ end of a gene. In some embodiments, a structural variant and/or a breakpoint for a structural variant may be located at least about 1 kb to about 700 kb from the 5’ end or 3’ end of a gene. For example, a structural variant and/or a breakpoint for a structural variant may be located at least about 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, or 700 kb from the 5’ end or 3’ end of a gene.
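The linear distance between a breakpoint and the 5’ or 3’ end of a gene can be computed from reference genome coordinates, as in the following minimal sketch. The function name `nearest_gene_end`, its parameters, and the coordinates in the usage note are hypothetical. The sketch assumes the breakpoint and the gene are on the same chromosome, that `gene_start` is the lower coordinate, and that the strand determines which coordinate is the 5’ end.

```python
from typing import Tuple


def nearest_gene_end(breakpoint: int, gene_start: int, gene_end: int, strand: str) -> Tuple[int, str]:
    """Return (distance in base pairs, nearest end) for a breakpoint and a gene.

    The nearest end is "5'" or "3'", or "within" when the breakpoint lies
    inside the gene interval (distance 0).
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    if breakpoint < gene_start:
        # The lower coordinate is the 5' end on the plus strand, the 3' end on the minus strand.
        return gene_start - breakpoint, "5'" if strand == "+" else "3'"
    if breakpoint > gene_end:
        # The higher coordinate is the 3' end on the plus strand, the 5' end on the minus strand.
        return breakpoint - gene_end, "3'" if strand == "+" else "5'"
    return 0, "within"
```

For example, for an assumed plus-strand gene spanning coordinates 5,000 to 9,000, a breakpoint at coordinate 1,000 is 4,000 base pairs from the 5’ end. The resulting distance can then be compared against whichever distance (e.g., about 1 kb to about 700 kb) is of interest.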

Kits

Provided in certain embodiments are kits. A kit may include any components and compositions described herein (e.g., oligonucleotide probes, nucleic acids, primers, vectors, enzymes) useful for performing any of the methods described herein, in any suitable combination. Kits may further include any reagents, buffers, or other components useful for carrying out any of the methods described herein. A kit sometimes includes one or more isolated enzymes. In certain instances, a kit can include one or more isolated restriction enzymes suitable for cleaving sample nucleic acid into target sample nucleic acid fragments. A kit sometimes includes an isolated ligase suitable for ligating cleaved sample nucleic acid fragments that are in proximity to one another after cleavage. In certain implementations, a kit includes an isolated polymerase, such as a polymerase useful for conducting an amplification process. A kit sometimes includes one or more oligonucleotide primers useful for conducting an amplification process, and sometimes includes one or more adapter oligonucleotides useful for conducting a sequencing process. Certain enzymes (e.g., isolated enzymes) typically are not associated with polynucleotides of oligonucleotide probes and/or sample nucleic acid in vivo and do not naturally occur together.

A kit in certain implementations includes a solid phase comprising a capture agent suitable for specifically binding to a capture agent counterpart incorporated in oligonucleotide probes and thereby suitable for capturing, isolating, purifying and/or enriching probe hybridization complexes. A solid phase sometimes is a plurality of beads. In certain implementations, a capture agent is selected from biotin, avidin and streptavidin, and the capture agent counterpart is a molecule that specifically binds to the capture agent and independently is selected from biotin, avidin and streptavidin.

Components of a kit may be present in separate containers, or multiple components may be present in a single container. Suitable containers include a single tube (e.g., vial), one or more wells of a plate (e.g., a 96-well plate, a 384-well plate, and the like), and the like.

Kits may also comprise instructions for performing one or more methods described herein and/or a description of one or more components described herein. For example, a kit may include instructions for using oligonucleotide probes and other components described herein. Instructions and/or descriptions may be in printed form and may be included in a kit insert. In some embodiments, instructions and/or descriptions are provided as an electronic storage data file present on a suitable computer readable storage medium, e.g., portable flash drive, DVD, CD-ROM, diskette, and the like. A kit also may include a written description of an internet location that provides such instructions or descriptions.

Certain Implementations

Following are non-limiting examples of certain implementations of the technology.

A1. A composition, comprising: a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample, wherein: a plurality of intron-directed oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns comprises (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the oligonucleotide probe pairs in the set of oligonucleotide probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each oligonucleotide probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

A2. The composition of embodiment A1, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from 100 or more cancer genes.

A3. The composition of embodiment A1, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from 500 or more cancer genes.

A4. The composition of embodiment A1, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from 800 or more cancer genes, 1000 or more cancer genes, 1200 or more cancer genes, or 1400 or more cancer genes.

A4.1. The composition of any of embodiments A1-A4, wherein any one cancer gene, or the 100 or more, 500 or more, 800 or more, 1000 or more, 1200 or more or 1400 or more cancer genes or subset thereof, is/are selected from the group of cancer genes listed in Appendix 1.

A5. The composition of any one of embodiments A1-A4.1, wherein the untranslated regions surrounding cancer genes comprise (i) a nucleic acid region extending in the 5’ direction from the 5’ end of a cancer gene coding region and (ii) a nucleic acid region extending in the 3’ direction from the 3’ end of a cancer gene coding region.

A6. The composition of embodiment A5, wherein a nucleic acid region is within 10,000 consecutive nucleotides from a cancer gene coding region end of a plurality of the cancer genes.

A7. The composition of embodiment A5, wherein a nucleic acid region is within 5,000 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene.

A8. The composition of embodiment A5, wherein a nucleic acid region is within 2,000 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene.

A9. The composition of embodiment A5, wherein a nucleic acid region is within 1,500 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene.

A10. The composition of any one of embodiments A5-A9, wherein the untranslated regions surrounding cancer genes comprise promoter regions.

A11. The composition of embodiment A10, wherein the untranslated regions surrounding cancer genes consist of promoter regions.

A12. The composition of embodiment A10 or A11, wherein the promoter regions comprise a 5’ end about 500 consecutive nucleotides to about 1500 consecutive nucleotides from a 5’ end of each cancer gene coding region.

A13. The composition of any one of embodiments A1-A12, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 110 to about 130 consecutive nucleotides.

A14. The composition of any one of embodiments A1-A13, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 120 consecutive nucleotides.

A15. The composition of any one of embodiments A1-A13, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is complementary to a subsequence of a nucleic acid fragment.

A16. The composition of embodiment A15, wherein the polynucleotide or each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is 100% complementary to the minus strand of Genome Reference Consortium Human Build 38 (GRCH38).

A16.1. The composition of embodiment A15, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is 100% complementary to a corresponding portion of the plus strand of Genome Reference Consortium Human Build 38 (GRCH38).

A16.2. The composition of embodiment A15, wherein the composition comprises a mixture of (i) polynucleotides of oligonucleotide probes that are 100% complementary to a corresponding portion of the minus strand of Genome Reference Consortium Human Build 38 (GRCH38) and (ii) polynucleotides of oligonucleotide probes that are 100% complementary to a corresponding portion of the plus strand of Genome Reference Consortium Human Build 38 (GRCH38).

A17. The composition of any one of embodiments A1-A16.2, wherein each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is capable of hybridizing to a fragment under hybridization conditions of moderate stringency and/or high stringency.

A18. The composition of any one of embodiments A1-A17, wherein the polynucleotides of the oligonucleotide probes in the composition are capable of hybridizing to sample nucleic acid fragments having an average fragment size of about 180 consecutive nucleotides to about 200 consecutive nucleotides.

A19. The composition of any one of embodiments A1-A18, wherein the polynucleotides of the oligonucleotide probes in the composition are capable of hybridizing to target nucleic acid fragments having an average GC content of about 40% to about 45%.

A20. The composition of embodiment A19, wherein the average GC content is about 43%.

A21. The composition of any one of embodiments A1-A20, wherein the polynucleotides of the oligonucleotide probes in the composition are capable of hybridizing to target nucleic acid fragments having a GC content of about 1% to about 97%.

A22. The composition of any one of embodiments A1-A21, wherein the oligonucleotide probes in the composition comprise a biotin molecule.

A23. The composition of any one of embodiments A1-A22, wherein the polynucleotide of each of the oligonucleotide probes comprises RNA.

A24. The composition of any one of embodiments A1-A22, wherein the polynucleotide of each of the oligonucleotide probes consists of RNA.

A25. The composition of any one of embodiments A1-A24, wherein the target nucleic acid fragments result from nucleic acid restriction enzyme cleavage by restriction enzymes cutting at ^GATC and G^ANTC, where “^” represents the cut site on the positive DNA strand.

A26. The composition of embodiment A25, wherein the restriction enzyme cleavage is by two restriction enzymes.

A27. The composition of any one of embodiments A1-A26, wherein the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns are not capable of hybridizing to tiled subsequences of a target nucleic acid fragment.

A28. The composition of any one of embodiments A1-A27, wherein the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns does not contain a probe capable of hybridizing to an end of a first target nucleic acid fragment and to an end of a second target nucleic acid fragment.

A29. The composition of any one of embodiments A1-A28, wherein the first oligonucleotide probe and the second oligonucleotide probe in each of the oligonucleotide probe pairs do not hybridize to regions of a target nucleic acid fragment that overlap.

A30. The composition of any one of embodiments A1-A29, wherein the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns does not contain a probe capable of hybridizing to a target nucleic acid fragment having a length of less than 130 consecutive nucleotides.

A31. The composition of any one of embodiments A1-A30, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments from introns capable of hybridizing to a target nucleic acid fragment are not capable of hybridizing to contiguous, non-overlapping regions of the nucleic acid fragment.

A32. The composition of any one of embodiments A1-A31, wherein each oligonucleotide probe of a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters is capable of hybridizing to a region spanning about 100 to about 500 consecutive nucleotides from the 5’ end of each target nucleic acid fragment or to a region spanning about 100 to about 500 consecutive nucleotides from the 3’ end of each target nucleic acid fragment.

A33. The composition of any one of embodiments A1-A32, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters are capable of hybridizing to a region at a first region spanning about 350 consecutive nucleotides from the 5’ end of each target nucleic acid fragment or to a region spanning about 350 consecutive nucleotides from the 3’ end of each target nucleic acid fragment.

A34. The composition of any one of embodiments A1-A33, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters capable of hybridizing to a target nucleic acid fragment are capable of hybridizing to contiguous, non-overlapping regions of the nucleic acid fragment.

A35. The composition of any one of embodiments A1-A34, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters are not restricted to a GC content percentage range or threshold.

A36. The composition of any one of embodiments A1-A35, comprising about 500,000 or more oligonucleotide probes.

A37. The composition of embodiment A36, comprising about 600,000 or more oligonucleotide probes.

A38. The composition of any one of embodiments A1-A37, comprising about 400,000 to about 800,000 oligonucleotide probes.

A39. The composition of any one of embodiments A1-A37, comprising about 500,000 to about 700,000 oligonucleotide probes.

A40. The composition of any one of embodiments A1-A37, comprising about 500,000 to about 700,000 oligonucleotide probes.

A41. The composition of any one of embodiments A1-A37, comprising about 550,000 to about 650,000 oligonucleotide probes.

A42. The composition of any one of embodiments A1-A41, comprising about 150,000 to about 300,000 unique oligonucleotide probes.

A43. The composition of any one of embodiments A1-A41, comprising about 175,000 to about 275,000 unique oligonucleotide probes.

A44. The composition of any one of embodiments A1-A41, comprising about 200,000 to about 250,000 unique oligonucleotide probes.

A45. The composition of any one of embodiments A1-A41, comprising about 241,000 unique oligonucleotide probes.

B1. A method for nucleic acid enrichment, comprising: subjecting target nucleic acid from a nucleic acid sample to nucleic acid cleavage conditions in which nucleic acid fragments are generated; subjecting the target nucleic acid fragments to linking conditions in which proximity ligated nucleic acid molecules are generated; contacting the proximity ligated nucleic acid molecules with a composition comprising a plurality of oligonucleotide probes of any one of embodiments A1-A45 under hybridization conditions in which hybridization complexes comprising proximity ligated nucleic acid hybridized to oligonucleotide probes are generated; isolating the complexes; and analyzing nucleic acid in the complexes.

B2. The method of embodiment B1, wherein the cleavage conditions comprise contacting the sample nucleic acid with a restriction endonuclease.

B3. The method of embodiment B2, wherein the restriction endonuclease cleaves the sample nucleic acid at ^GATC and G^ANTC, where “^” represents the cut site.

B4. The method of embodiment B2 or B3, wherein the sample nucleic acid is contacted with two or more restriction enzyme types.

B5. The method of any one of embodiments B1-B4, wherein the linking conditions comprise contacting the nucleic acid fragments with a ligase under conditions in which ends of fragments in proximity are joined.

B6. The method of any one of embodiments B1-B5, wherein the oligonucleotide probes comprise biotin and the complexes are isolated by contacting the complexes with a solid phase comprising avidin or streptavidin.

B7. The method of any one of embodiments B1-B6, wherein the analyzing the nucleic acid in the complexes comprises sequencing the proximity ligated nucleic acid.

C1. A kit comprising a composition of any one of embodiments A1-A45.

C2. A kit comprising instructions or a link to instructions describing the method of any one of embodiments B1-B7.

C3. A kit of embodiment C1 or C2, comprising one or more isolated restriction enzymes.

C4. The kit of embodiment C3, wherein the one or more isolated restriction enzymes are suitable for cleaving sample nucleic acid into target sample nucleic acid fragments.

C5. The kit of any one of embodiments C1-C4, comprising an isolated ligase enzyme.

C6. The kit of embodiment C5, wherein the isolated ligase enzyme is suitable for ligating cleaved sample nucleic acid fragments that are in proximity to one another.

C7. The kit of any one of embodiments C1-C6, comprising a solid phase.

C8. The kit of embodiment C7, wherein the oligonucleotide probes comprise a capture agent and the solid phase comprises a capture agent counterpart suitable for binding to a capture agent of the oligonucleotide probes.

C9. The kit of embodiment C8, wherein the capture agent and the capture agent counterpart independently are selected from biotin, avidin and streptavidin.

C10. The kit of any one of embodiments C7-C9, wherein the solid phase is beads.

D1. A method for designing a plurality of oligonucleotide probes, comprising: identifying a plurality of target nucleic acid fragments of cancer gene exons, introns, and untranslated regions surrounding cancer genes resulting from nucleic acid restriction enzyme cleavage of a nucleic acid sample; designing a plurality of intron-directed oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns, comprising (i) a set of probe pairs each capable of hybridizing to a target nucleic acid fragment of about 260 consecutive nucleotides or longer in length, and (ii) a set of single oligonucleotide probes each capable of hybridizing to a nucleic acid fragment of about 130 consecutive nucleotides to about 260 consecutive nucleotides in length; wherein: the intron-directed oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes comprise a polynucleotide (i) about 110 to about 130 consecutive nucleotides in length, (ii) substantially complementary to a subsequence of a nucleic acid fragment, (iii) containing an average GC content of about 40 percent to about 60 percent, and (iv) complementary to a subsequence of a nucleic acid that does not repeat in the nucleic acid fragments; each of the oligonucleotide probe pairs in the set of oligonucleotide probe pairs comprises (i) a first oligonucleotide probe comprising a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of the fragment, and (ii) a second oligonucleotide probe comprising a 3’ end about 2 to about 15 consecutive nucleotides from the 3’ end of the nucleic acid fragment to which the first oligonucleotide probe of the probe pair is capable of hybridizing; and each oligonucleotide probe in the set of single oligonucleotide probes comprises a 5’ end about 2 to about 15 consecutive nucleotides from the 5’ end of a nucleic acid fragment.

D2. The method of embodiment D1, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from 100 or more cancer genes.

D3. The method of embodiment D1, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from 500 or more cancer genes.

D4. The method of embodiment D1, wherein the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes are capable of hybridizing to nucleic acid fragments from 800 or more cancer genes, 1000 or more cancer genes, 1200 or more cancer genes, or 1400 or more cancer genes.

D4.1. The method of any one of embodiments D1-D4, wherein any one cancer gene, or 100 or more, 500 or more, 800 or more, 1000 or more, 1200 or more or 1400 or more cancer genes or a subset thereof, is/are selected from the group of cancer genes listed in Appendix 1.

D5. The method of any one of embodiments D1-D4, wherein the untranslated regions surrounding cancer genes comprise (i) a nucleic acid region extending in the 5’ direction from the 5’ end of a cancer gene coding region and (ii) a nucleic acid region extending in the 3’ direction from the 3’ end of a cancer gene coding region.

D6. The method of embodiment D5, wherein a nucleic acid region is within 10,000 consecutive nucleotides from a cancer gene coding region end of a plurality of the cancer genes.

D7. The method of embodiment D5, wherein a nucleic acid region is within 5,000 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene.

D8. The method of embodiment D5, wherein a nucleic acid region is within 2,000 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene.

D9. The method of embodiment D5, wherein a nucleic acid region is within 1,500 consecutive nucleotides from a cancer gene coding region end of at least one cancer gene.

D10. The method of any one of embodiments D5-D9, wherein the untranslated regions surrounding cancer genes comprise promoter regions.

D11. The method of embodiment D10, wherein the untranslated regions surrounding cancer genes consist of promoter regions.

D12. The method of embodiment D10 or D11, wherein the promoter regions comprise a 5’ end about 500 consecutive nucleotides to about 1500 consecutive nucleotides from a 5’ end of each cancer gene coding region.

D13. The method of any one of embodiments D1-D12, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 110 to about 130 consecutive nucleotides.

D14. The method of any one of embodiments D1-D13, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes consists of about 120 consecutive nucleotides.

D15. The method of any one of embodiments D1-D13, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is complementary to a subsequence of a nucleic acid fragment.

D16. The method of embodiment D15, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is 100% complementary to the minus strand of Genome Reference Consortium Human Build 38 (GRCH38).

D16.1. The method of embodiment D15, wherein the polynucleotide of each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is 100% complementary to a corresponding portion of the plus strand of Genome Reference Consortium Human Build 38 (GRCH38).

D16.2. The method of embodiment D15, wherein the plurality of oligonucleotide probes comprises a mixture of (i) polynucleotides of oligonucleotide probes that are 100% complementary to a corresponding portion of the minus strand of Genome Reference Consortium Human Build 38 (GRCH38) and (ii) polynucleotides of oligonucleotide probes that are 100% complementary to a corresponding portion of the plus strand of Genome Reference Consortium Human Build 38 (GRCH38).

D17. The method of any one of embodiments D1-D16.2, wherein each of the oligonucleotide probes in the set of oligonucleotide probe pairs and the set of single oligonucleotide probes is capable of hybridizing to a fragment under hybridization conditions of moderate stringency and/or high stringency.

D18. The method of any one of embodiments D1-D17, wherein the polynucleotides of the oligonucleotide probes in the composition are capable of hybridizing to sample nucleic acid fragments having an average fragment size of about 180 consecutive nucleotides to about 200 consecutive nucleotides.

D19. The method of any one of embodiments D1-D18, wherein the polynucleotides of the oligonucleotide probes in the composition are capable of hybridizing to target nucleic acid fragments having an average GC content of about 40% to about 45%.

D20. The method of embodiment D19, wherein the average GC content is about 43%.

D20.1. The method of any one of embodiments D1-D20, comprising removing from a first set of designed intron-directed oligonucleotide probes (i) oligonucleotide probes having a GC content less than about 40% or greater than about 60%, and (ii) oligonucleotide probes complementary to a subsequence of a nucleic acid that repeats in the nucleic acid fragments, thereby generating a second set of designed intron-directed oligonucleotide probes.

D21. The method of any one of embodiments D1-D20, wherein the polynucleotides of the oligonucleotide probes in the composition are capable of hybridizing to target nucleic acid fragments having a GC content of about 1% to about 97%.

D22. The method of any one of embodiments D1-D21, wherein the oligonucleotide probes in the plurality of oligonucleotide probes comprise a biotin molecule.

D23. The method of any one of embodiments D1-D22, wherein the polynucleotide of each of the oligonucleotide probes comprises RNA.

D24. The method of any one of embodiments D1-D22, wherein the polynucleotide of each of the oligonucleotide probes consists of RNA.

D25. The method of any one of embodiments D1-D24, wherein the target nucleic acid fragments result from nucleic acid restriction enzyme cleavage by restriction enzymes cutting at ^GATC and G^ANTC, where “^” represents the cut site on the positive DNA strand.

D26. The method of embodiment D25, wherein the restriction enzyme cleavage is by two restriction enzymes.

D27. The method of any one of embodiments D1-D26, wherein the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns are not capable of hybridizing to tiled subsequences of a target nucleic acid fragment.

D28. The method of any one of embodiments D1-D27, wherein the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns does not contain a probe capable of hybridizing to an end of a first target nucleic acid fragment and to an end of a second target nucleic acid fragment.

D29. The method of any one of embodiments D1-D28, wherein the first oligonucleotide probe and the second oligonucleotide probe in each of the oligonucleotide probe pairs do not hybridize to regions of a target nucleic acid fragment that overlap.

D30. The method of any one of embodiments D1-D29, wherein the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns does not contain a probe capable of hybridizing to a target nucleic acid fragment having a length of less than 130 consecutive nucleotides.

D31. The method of any one of embodiments D1-D30, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of introns capable of hybridizing to a target nucleic acid fragment are not capable of hybridizing to contiguous, non-overlapping regions of the nucleic acid fragment.

D32. The method of any one of embodiments D1-D31, wherein each oligonucleotide probe of a plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters is capable of hybridizing to a region spanning about 100 to about 500 consecutive nucleotides from the 5’ end of each target nucleic acid fragment or to a region spanning about 100 to about 500 consecutive nucleotides from the 3’ end of each target nucleic acid fragment.

D33. The method of any one of embodiments D1-D32, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters are capable of hybridizing to a region at a first region spanning about 350 consecutive nucleotides from the 5’ end of each target nucleic acid fragment or to a region spanning about 350 consecutive nucleotides from the 3’ end of each target nucleic acid fragment.

D34. The method of any one of embodiments D1-D33, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters capable of hybridizing to a target nucleic acid fragment are capable of hybridizing to contiguous, non-overlapping regions of the nucleic acid fragment.

D35. The method of any one of embodiments D1-D34, wherein the oligonucleotide probes of the plurality of oligonucleotide probes capable of hybridizing to target nucleic acid fragments of exons and promoters are not restricted to a GC content percentage range or threshold.

D36. The method of any one of embodiments D1-D35, comprising about 500,000 or more oligonucleotide probes.

D37. The method of embodiment D36, comprising about 600,000 or more oligonucleotide probes.

D38. The method of any one of embodiments D1-D37, comprising about 400,000 to about 800,000 oligonucleotide probes.

D39. The method of any one of embodiments D1-D37, comprising about 500,000 to about 700,000 oligonucleotide probes.

D40. The method of any one of embodiments D1-D37, comprising about 500,000 to about 700,000 oligonucleotide probes.

D41. The method of any one of embodiments D1-D37, comprising about 550,000 to about 650,000 oligonucleotide probes.

D42. The method of any one of embodiments D1-D41, comprising about 150,000 to about 300,000 unique oligonucleotide probes.

D43. The method of any one of embodiments D1-D41, comprising about 175,000 to about 275,000 unique oligonucleotide probes.

D44. The method of any one of embodiments D1-D41, comprising about 200,000 to about 250,000 unique oligonucleotide probes.

D45. The method of any one of embodiments D1-D41, comprising about 241,000 unique oligonucleotide probes.

E1. A method for preparing a composition comprising a plurality of oligonucleotide probes, the method comprising synthesizing oligonucleotide probes designed by the method of any one of embodiments D1-D45.

E2. The method of embodiment E1, comprising synthesizing the second set of designed intron-directed oligonucleotide probes according to embodiment D20.1.

F1. A method for detecting the presence or absence of a structural variant in a sample, the method comprising analyzing sample nucleic acid from a subject, wherein the analyzing comprises: generating proximity ligated nucleic acid molecules, contacting the proximity ligated nucleic acid molecules with one or more oligonucleotide probes of any one of embodiments A1-A45, or designed by a method of any one of embodiments D1-D45, or prepared by a method of embodiment E1 or E2, thereby generating enriched proximity ligated nucleic acid molecules, sequencing the enriched proximity ligated nucleic acid molecules, thereby generating sequences of the sample nucleic acid, and determining the presence or absence of a structural variant in the sample nucleic acid from the sequences.

F2. The method of embodiment F1, comprising comparing sequences of the sample nucleic acid to sequences of a reference genome to determine the presence or absence of a structural variant.

Example

The example set forth below illustrates certain implementations and does not limit the technology.

Example 1: Preparation of oligonucleotide probe composition

A panel of oligonucleotide probes was prepared for use in capturing a subset of target nucleic acids in proximally-ligated HiC libraries generated from biological sample nucleic acid preparations. Such a panel of capture oligonucleotide probes reduces the cost of sequencing per sample as compared to genome-wide sequencing of nucleic acid prepared from a sample. The long-range information encoded in the proximally-ligated HiC libraries, which is the material enriched by the panel of oligonucleotide probes, enables detection of structural variants (SVs) outside of a gene body, referred to as “neighborhood SVs.” The oligonucleotide probes in the panel are RNA probes useful for capturing DNA target nucleic acid in proximally-ligated HiC libraries generated from samples (i.e., archived FFPE tissues); DNA is more stable than RNA in such samples.

Oligonucleotide probes that hybridize to 1404 genes involved in heme and/or solid tumors, presented in Appendix 1, were prepared and included in a comprehensive panel of oligonucleotide probes useful for capturing library nucleic acid and for identifying a wide variety of structural variants (SVs). Capture probes were designed to the exons, promoters, and introns of the cancer genes. Promoters were defined as 1500 bp upstream to 500 bp downstream of the start of the gene. Each of the probes was 120 base pairs (bp) long and was designed to have a full 120 bp of complementarity to a fragment of the minus strand of the reference genome assembly, GRCH38 (also referred to as HG38, hg38, and GRCh38). The average fragment size of the in silico digested fragments was 191 bp, and the %GC averaged 43.0% and ranged from 1.53% to 96.6%. There were approximately 610,000 individual probe oligonucleotide polynucleotides designed for the panel, comprising approximately 241,000 unique sequences. Target sequences were defined using scripts to identify regions around cut sites. The target regions for the exon and promoter probes were input into the Agilent SureDesign software. All probes were manufactured as biotinylated RNA molecules.
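
As a non-limiting illustration, the promoter definition above (1500 bp upstream to 500 bp downstream of the start of the gene) can be sketched in Python as follows. The function name `promoter_interval`, the 0-based half-open coordinate convention, and the strand-aware handling of minus-strand genes are assumptions made for this sketch and are not taken from the scripts used to prepare the panel.

```python
def promoter_interval(gene_start, gene_end, strand, upstream=1500, downstream=500):
    """Return a (start, end) half-open promoter window for one gene.

    Assumes 0-based, half-open plus-strand coordinates. For a minus-strand
    gene the "start of the gene" is taken to be its highest coordinate, so
    "upstream" points toward larger coordinates (an assumption of this sketch).
    """
    if strand == "+":
        start, end = gene_start - upstream, gene_start + downstream
    elif strand == "-":
        start, end = gene_end - downstream, gene_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the beginning of the chromosome.
    return max(0, start), end
```

For a plus-strand gene spanning 10000 to 20000, this sketch returns the window 8500 to 10500.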

For exons and promoters, probes were generated via tiling at a 1x tiling density for 350 bp around each HiC restriction cut site that was contained in the exon and promoter sequences for the genes. The polynucleotide sequence of each of the probes was determined by in silico genome digestion using restriction enzymes cutting at ^GATC and G^ANTC, where “^” represents the cut site on the positive DNA strand, and then using only the 350 bp of sequence around the in silico digested cut sites. Probe sequences designed to repetitive sequences in the genome were filtered out and removed from the panel of oligonucleotide probes using the “Moderate Stringency” feature in the Agilent SureDesign software. For these probes, no restriction of the %GC content was made. Performance of the probes was normalized using the “Boosting” feature in Agilent SureDesign, in which certain probes were replicated in the design to increase their relative abundance in the probe pool.
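
As a non-limiting illustration, an in silico digestion at the ^GATC and G^ANTC recognition sites can be sketched in Python as follows. The function names `cut_positions` and `digest` and the 0-based half-open coordinate convention are assumptions of this sketch; it is not the script used to prepare the panel, and it scans only the plus strand of a single sequence.

```python
import re

# Recognition sites named in the text; the offset is the position of the
# plus-strand cut ("^") relative to the first base of the site.
SITES = [
    ("GATC", 0),        # ^GATC: cut immediately before the G
    ("GA[ACGT]TC", 1),  # G^ANTC: cut immediately after the first G
]

def cut_positions(seq):
    """Return the sorted 0-based plus-strand cut positions in seq."""
    seq = seq.upper()
    cuts = set()
    for pattern, offset in SITES:
        # A lookahead is used so that overlapping sites are all reported.
        for match in re.finditer(f"(?={pattern})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)

def digest(seq):
    """Split seq at every cut position; return (start, end) half-open fragments."""
    inner = [c for c in cut_positions(seq) if 0 < c < len(seq)]
    bounds = [0] + inner + [len(seq)]
    return list(zip(bounds, bounds[1:]))
```

Fragment lengths computed from such intervals could then be used to select the regions around each cut site for probe design.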

For introns, probes were designed using a sparse design that generates probes that are exactly 5 bp away from the restriction cut site (i.e., 5 bp away from the cleavage site on the positive strand, i.e., the first 5 double-stranded bases after digestion), using the same in silico digestion as above but with some differences in how the probes were placed. Probes were not designed for restriction fragments that are less than 130 bp. A single probe was designed to the 5’ end cut site in restriction fragments that are between 130 bp and 259 bp in size. Two probes were designed to restriction fragments greater than or equal to 260 bp. Probes were removed if they had less than 40% GC or more than 60% GC. Probes were removed if they were designed to repetitive regions in the genome. The “Boosting” feature in Agilent SureDesign was not utilized for these probes. The intron probe sequences and replication rate in the probe pool (1 for all probes) were defined without using Agilent SureDesign. Agilent SureDesign was utilized only for purchasing intron probes and not for designing the probes.
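
As a non-limiting illustration, the sparse intron placement rules described above can be sketched in Python as follows. The names `place_intron_probes` and `passes_gc_filter`, the half-open fragment coordinates, and the placement of the second probe 5 bp from the 3’ end cut site are assumptions of this sketch. Removal of probes designed to repetitive regions is not shown.

```python
PROBE_LEN = 120   # probe length in bp
OFFSET = 5        # distance of the probe from the restriction cut site, in bp
MIN_SINGLE = 130  # fragments shorter than this receive no probe
MIN_PAIR = 260    # fragments of at least this length receive two probes

def place_intron_probes(frag_start, frag_end):
    """Return (start, end) half-open probe intervals for one restriction fragment."""
    length = frag_end - frag_start
    if length < MIN_SINGLE:
        return []
    five_prime = (frag_start + OFFSET, frag_start + OFFSET + PROBE_LEN)
    if length < MIN_PAIR:
        return [five_prime]
    # Second probe mirrored at the 3' end (assumed placement for this sketch).
    three_prime = (frag_end - OFFSET - PROBE_LEN, frag_end - OFFSET)
    return [five_prime, three_prime]

def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def passes_gc_filter(probe_seq, low=0.40, high=0.60):
    """Keep a probe only if its GC fraction lies within [low, high]."""
    return low <= gc_fraction(probe_seq) <= high
```

With these constants, the two probes placed on a 260 bp fragment occupy positions 5 to 125 and 135 to 255 and therefore do not overlap.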

The spacing of the intron probes when hybridized to target nucleic acid fragments was significantly sparser than the spacing of hybridized exon and promoter probes, due to the size restrictions of the fragments to which the intron probes were designed. As a result, many fragments in introns had no probes designed to them, and only a few had two probes designed to them.

The probes were designed with higher density tiling in the exons (i.e., 1x tiling) to allow for high resolution capture of structural variant (SV) break points involving the exons. Oligonucleotide probes that hybridize to promoters were included in the panel to capture novel looping interactions with each of the genes, referred to as “neoloops”, which may occur as a result of nearby SVs.

Oligonucleotide probes with sparser coverage of introns were included to provide resolution of SVs that occur in a gene body but outside of exon sequences. Sparser coverage of introns, as opposed to the full 1x tiling for promoters and exons, was included in the oligonucleotide probe design in part to reduce the overall cost of the oligonucleotide probe panel and in part to improve the overall performance and quality of the sequencing data obtained when using the panel. Probe oligonucleotide performance was assessed in silico. Constraining intron-directed probe oligonucleotide polynucleotides to a %GC content of 40% to 60%, and designing probe sequences so that they do not overlap cut sites, reduced the number of predicted underperforming probe oligonucleotides, demonstrating the value of the sparse intron design.

Figure 1 (panels A and B) is a schematic of probes covering the entire breakpoint cluster region (BCR) gene. The top track is the Gencode v29 gene track. The second track from the top is the RepeatMasked regions in the genome. The third track from the top is the HiC+ restriction cut site locations. The fourth track, in blue, is the annotation of the gene sequence used in the design (1). The third track from the bottom, in red, is the exon sequences used in the design (2). The second track from the bottom, in green, is the annotation of the promoter sequence used for the design (3). The bottom track, in sea foam, is the probe sequences for the design (4).

Figure 2 (panels A and B) is a schematic of probes covering the first exon and part of the first intron of the BCR gene. The top track is the Gencode v29 gene track. The second track from the top is the RepeatMasked regions in the genome. The third track from the top is the HiC+ restriction cut site locations. The fourth track, in blue, is the annotation of the gene sequence used in the design (1). The third track from the bottom, in red, is the exon sequences used in the design (2). The second track from the bottom, in green, is the annotation of the promoter sequence used for the design (3). The bottom track, in sea foam, is the probe sequences for the design (4).

Figure 3 illustrates simulated capture HiC data for the oligonucleotide probe panel. Fig. 3A shows a HiC heatmap of the BCR-ABL1 gene fusion in the K562 cell line using genome-wide HiC sequenced with 175M raw paired-end reads. The RefSeq genes are on the topmost track and the positions of the panel probes are on the second track down, above the heatmap. The tracks are replicated, transposed, on the leftmost axis. The BCR and ABL1 genes are indicated with text on the two axes. High frequency counts of proximally ligated HiC fragments are represented by darker points, whereas grey represents no counts detected. The cyan box (1, 2) indicates the breakpoint in the HiC heatmap between BCR and ABL1. Fig. 3B shows the same view as Fig. 3A except with simulated Capture HiC (CHiC) data with 3.6M raw paired-end reads and a 90% on-target rate. The break point is detected in the simulated CHiC data.

Figure 4 illustrates coverage uniformity of probes. Fig. 4A shows distributions of read coverage, a measure of probe performance. The solid line is the distribution for a representative probe design with no fragment size or %GC filtering. The dotted line is the distribution for a representative probe design with fragment size and %GC filtering. A tighter distribution indicates more uniform probe performance. Fig. 4B shows a histogram of %GC of probes designed without a %GC filter. Fig. 4C shows a histogram of %GC of probes designed with %GC filtering to remove probes with greater than 60% GC or less than 40% GC from the design.

Example 2: Performance Characteristics of Dense vs. Sparse Probe Designs

To evaluate probe performance of both the ‘dense’ (Exons and Promoters) and the ‘sparse’ (sparse intronic design) design methodologies, Capture HiC was performed in GM12878 cells using the panel of probes in Example 1. Because the sparse and dense probes were designed to the same genes and the experiment was carried out in the same cells, this study controls for sources of variation that could otherwise bias comparisons of this sort. Figures 5A and 5B show the effect that restriction enzyme cut sites have on the performance of probes, measured by the number of sequencing reads aligning to each probe. The Dense portion of the design was not filtered to remove probes whose target sequences can be cut by the HiC restriction enzyme chemistry. Probes that overlap restriction cut sites hybridize to fragments that do not contain the complete hybridization sequence, which results in low performance of those probes. Figure 5A shows the effect of the number of cut sites on the performance of probes in a Dense design. The number of reads (performance) decreases when probes overlap cut sites, and the degree of reduction depends on the number of cut sites that the probes overlap. The Sparse design was filtered to remove probes that overlap cut sites and, therefore, the performance of these probes was not affected by cut sites (Figure 5B).
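
By way of a non-limiting illustration, a cut-site filter of the kind described above can be sketched in Python as follows. The function names, the half-open coordinate convention, and the example coordinates are assumptions made for this sketch and are not taken from the design tooling used in this Example.

```python
from bisect import bisect_left


def count_cut_sites(probe_start, probe_end, sorted_cut_sites):
    """Count cut sites inside the half-open interval [probe_start, probe_end).

    sorted_cut_sites must be an ascending list of positions on the same
    chromosome as the probe.
    """
    return bisect_left(sorted_cut_sites, probe_end) - bisect_left(
        sorted_cut_sites, probe_start
    )


def drop_probes_overlapping_cut_sites(probes, cut_sites):
    """Keep only probes whose target interval contains no cut site."""
    sites = sorted(cut_sites)
    return [p for p in probes if count_cut_sites(p[0], p[1], sites) == 0]


# Hypothetical coordinates, for illustration only.
cut_sites = [100, 480, 900]
probes = [(110, 230), (400, 520), (880, 1000), (600, 720)]
kept = drop_probes_overlapping_cut_sites(probes, cut_sites)
```

In this sketch, the probes at (400, 520) and (880, 1000) each span one cut site and are removed, while the other two are retained. The same counting function could be used to group Dense probes by the number of cut sites they overlap.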

The %GC content in a probe affects probe performance. High percent GC resulted in probes that were prone to forming secondary structures, and low percent GC probes had a lower binding affinity for their target and thus would be lost during high stringency wash conditions in a high specificity hybridization assay. No filtering for %GC was applied to probes in the Dense portion of the design, so the %GC for these probes is dependent exclusively on the genomic sequence they are designed to hybridize to (Figure 5C). In contrast, the Sparse design was filtered to remove probe sequences that were less than 40% GC or higher than 60% GC (Figure 5D).
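
As a minimal sketch of the %GC filter described above, assuming probe sequences are available as plain strings, the 40% to 60% window could be applied as follows. The function names and the example sequences are hypothetical, and the bounds are treated as inclusive because only probes below 40% or above 60% GC are removed.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(seq, low=40.0, high=60.0):
    """True when %GC lies within [low, high], inclusive at both ends."""
    return low <= gc_percent(seq) <= high


# Hypothetical short sequences, for illustration only; real probes in
# this design are about 110 to about 130 nucleotides long.
candidates = ["ATGCATGCAT", "GGGCCCGGAT", "ATATATATGC"]
retained = [s for s in candidates if passes_gc_filter(s)]
```

Here only the first sequence (40% GC) is retained. The second (80% GC) and third (20% GC) fall outside the window.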

Figure 5 shows that the new Sparse Design provides improvements relative to standard Dense Designs. The Dense Design included probes that aligned to DNA regions that were cut by the restriction enzymes, which led to a reduction in performance, evident from the reduction in total counts. In this design, 45.3% of probes hybridize to DNA that has at least one cut site. The Sparse Design probes were designed to avoid cut sites, so their performance is not affected by cut sites.

The combined effect of filtering out probe sequences that overlap restriction cut sites and filtering out probe sequences with percent GC higher than 60% or lower than 40% also improved two other key performance metrics in the Sparse probes (Introns) vs. the Dense probes (Exons and Promoters): the percent dropout rate and the probe uniformity (measured as the coefficient of variation of probe counts). Percent dropout was defined as the percentage of probes, at a given read depth, that did not have any reads aligning to them. This is an indication of the percentage of probes in a design that were low performance probes because these probes did not capture their target sequence in the experiment. The Dense design probes had 7.9 times more dropouts than the Sparse design probes, with 7.1% of probes dropping out in the Dense design and 0.9% dropping out in the Sparse design (Figure 6B). The probe uniformity is a measure of how tight the distribution of probe performance was in the experiment, as measured by sequencing read counts. A tighter distribution of probe performance was preferable because more reads need to be sequenced for samples captured with broader probe performance distributions (e.g., in order to compensate for dropouts and low performance probes). Therefore, the sequencing costs are lower for probes with better uniformity. Probe uniformity was measured by calculating the coefficient of variation (i.e., the standard deviation of the probe sequencing counts divided by the mean probe sequencing counts). The Dense probes were less uniform than the Sparse probes: the Dense probes had a coefficient of variation of 1.8, which is 50% higher than the coefficient of variation of 1.2 for the Sparse probes (Figure 6C).

The effect of having a lower dropout rate and a tighter distribution of probe performance is that the Sparse probes performed 2.1 times better than the Dense probes in terms of mean sequencing read counts per probe, resulting in fewer total reads needed to sequence a design composed of Sparse probes compared to a design composed of Dense probes (Figure 6A).
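
The two metrics defined above can be illustrated with a short Python sketch. The function names and the example counts are hypothetical. The source does not state whether a population or sample standard deviation was used, so the population form is an assumption of this sketch.

```python
from statistics import mean, pstdev


def percent_dropout(counts):
    """Percentage of probes with zero aligned reads."""
    if not counts:
        raise ValueError("counts must not be empty")
    return 100.0 * sum(1 for c in counts if c == 0) / len(counts)


def cv_of_counts(counts):
    """Standard deviation of probe counts divided by their mean.

    Uses the population standard deviation (an assumption of this sketch).
    """
    return pstdev(counts) / mean(counts)


# Hypothetical per-probe read counts, for illustration only.
counts = [0, 10, 20, 30, 40]
dropout = percent_dropout(counts)   # one of five probes has no reads
uniformity = cv_of_counts(counts)

# Ratios implied by the values reported above for Figures 6B and 6C.
dropout_ratio = 7.1 / 0.9           # Dense vs. Sparse dropout, about 7.9
cv_ratio = 1.8 / 1.2                # Dense vs. Sparse ratio, about 1.5
```

A lower value of either metric indicates better probe performance, which is consistent with the comparison of the Sparse and Dense probes reported above.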

Figure 6 shows a comparison of probe performance of Dense vs. Sparse probe designs. Fig. 6A shows a comparison in which the dashed line is the distribution of read coverage, a measure of probe performance, for Dense probes (Exons and Promoters), and the solid line is the distribution of read coverage for a Sparse probe (Introns) design. The counts were normalized by the relative abundance of probes in the probe pool. In other words, the counts for probes that were replicated more than once, due to boosting from Agilent SureDesign, were divided by their replication number. This ensures that differences in probe performance were not biased by differences in the concentration of the probe molecules in the assay. Fig. 6B shows the percent dropout, a measure of the percentage of probes that failed to hybridize to DNA, wherein a lower value indicates better performance. Fig. 6C shows uniformity measured by the coefficient of variation, wherein a smaller value indicates higher probe performance.
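
The normalization by replication number described for Fig. 6A can be sketched as follows. The probe identifiers, the dictionary layout, and the default of one copy for probes that were not boosted are assumptions made for illustration. This is not the output format of any particular design tool.

```python
def normalize_counts(raw_counts, replication):
    """Divide each probe's read count by its copy number in the probe pool.

    raw_counts:  dict mapping probe id -> raw read count
    replication: dict mapping probe id -> number of copies in the pool;
                 probes missing from this dict are assumed to have one copy
    """
    normalized = {}
    for probe_id, count in raw_counts.items():
        copies = replication.get(probe_id, 1)
        if copies < 1:
            raise ValueError(f"invalid replication number for {probe_id}")
        normalized[probe_id] = count / copies
    return normalized


# Hypothetical values, for illustration only.
raw_counts = {"probe_1": 120, "probe_2": 300, "probe_3": 45}
replication = {"probe_2": 3}
normalized = normalize_counts(raw_counts, replication)
```

In this sketch, probe_2 is present in three copies, so its 300 raw reads are reported as 100 per copy, which makes it comparable to the single-copy probes.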

Figure 7 compares whole-genome HiC data with Capture HiC data obtained using the panel of probes comprising the Sparse Design targeting introns. Fig. 7A shows a HiC heatmap of the TBL1XR1 gene fusion in the MCF7 breast cancer cell line using whole-genome HiC sequenced with ~250 M raw paired-end reads. The RefSeq genes are on the topmost track and the positions of the panel probes are in the second track down, above the heatmap. Darker shading indicates high frequency counts of proximally ligated HiC fragments, whereas lighter shading represents no counts detected. The box indicates the breakpoint in the HiC heatmap for the TBL1XR1 gene. Fig. 7B shows the same view as Fig. 7A except with capture HiC (CHiC) data obtained using the panel of probes comprising the Sparse Design targeting introns, with ~250 M raw paired-end reads and a 92% on-target rate. The breakpoint in the TBL1XR1 gene is indicated by the box in the CHiC data.

FIGs. 8A-8B show that the panel of probes comprising the Sparse Design targeting introns captured loops that are associated with the MYC gene (block arrow), in alignment with reported epigenetic data. Two independent CHiC experiments were performed and sequenced to 270-300M reads. The top track shows the annotated genes. Epigenetic ChIP-seq data from ENCODE for proteins associated with chromatin modification are shown in the middle tracks, including CTCF, H3K4me4, and H3K27ac. The probe enrichment is shown as peaks for both replicates. The bottom two tracks show the reproducible loops associated with the MYC gene. The majority of the MYC-specific loops are associated with CTCF or regions with histone modifications, which is in line with the scientific literature.

The entirety of each patent, patent application, publication and document referenced herein is incorporated by reference. Citation of patents, patent applications, publications and documents is not an admission that any of the foregoing is pertinent prior art, nor does it constitute any admission as to the contents or date of these publications or documents. Their citation is not an indication of a search for relevant disclosures. All statements regarding the date(s) or contents of the documents are based on available information and are not an admission as to their accuracy or correctness.

The technology has been described with reference to specific implementations. The terms and expressions that have been utilized herein to describe the technology are descriptive and not necessarily limiting. Certain modifications made to the disclosed implementations can be considered within the scope of the technology. Certain aspects of the disclosed implementations suitably may be practiced in the presence or absence of certain elements not specifically disclosed herein.

Each of the terms “comprising,” “consisting essentially of,” and “consisting of” may be replaced with either of the other two terms. The term “a” or “an” can refer to one of or a plurality of the elements it modifies (e.g., “a reagent” can mean one or more reagents) unless it is contextually clear either one of the elements or more than one of the elements is described. The term “about” as used herein refers to a value within 10% of the underlying parameter (i.e., plus or minus 10%; e.g., a weight of “about 100 grams” can include a weight between 90 grams and 110 grams). Use of the term “about” at the beginning of a listing of values modifies each of the values (e.g., “about 1, 2 and 3” refers to "about 1, about 2 and about 3"). When a listing of values is described, the listing includes all intermediate values and all fractional values thereof (e.g., the listing of values "80%, 85% or 90%" includes the intermediate value 86% and the fractional value 86.4%). When a listing of values is followed by the term "or more," the term "or more" applies to each of the values listed (e.g., the listing of "80%, 90%, 95%, or more" or "80%, 90%, 95% or more" or "80%, 90%, or 95% or more" refers to "80% or more, 90% or more, or 95% or more"). When a listing of values is described, the listing includes all ranges between any two of the values listed (e.g., the listing of "80%, 90% or 95%" includes ranges of "80% to 90%," "80% to 95%" and "90% to 95%").

Certain implementations of the technology are set forth in the claim(s) that follow(s).

Appendix 1