Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
COMPOSITIONS, METHODS, DEVICES, AND KITS FOR DETECTING THE NUMBER AND GENOMIC LOCATIONS OF POLYMORPHIC LINE-1 ELEMENTS IN AN INDIVIDUAL
Document Type and Number:
WIPO Patent Application WO/2021/174217
Kind Code:
A1
Abstract:
The invention provides compositions, methods, kits, and devices for detecting the number and locations of polymorphic LINE-1 (pL1s) elements present in the genome of an individual and for detecting previously unknown pL1s. The inventive compositions, methods, kits, and devices permit the identification of numbers and patterns of pL1 insertions that render a person with such numbers and patterns at higher risk of developing cancer or cognitive disorders compared to persons without such numbers and patterns.

Inventors:
PEREPELITSA VICTORIA (US)
DEININGER PRESCOTT (US)
Application Number:
PCT/US2021/020346
Publication Date:
September 02, 2021
Filing Date:
March 01, 2021
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
ADMINISTRATORS OF THE TULANE EDUCATIONAL FUND (US)
International Classes:
A61K31/713; A61K35/12; A61K48/00; A61P25/00; C12N15/85; C12Q1/68
Foreign References:
US20140038896A12014-02-06
US20030099930A12003-05-29
US20140287468A12014-09-25
US20110229887A12011-09-22
US20170121770A12017-05-04
US20130315886A12013-11-28
Other References:
VINCENT A STREVA;VALLMER E JORDAN;SARA LINKER;DALE J HEDGES;MARK A BATZER;PRESCOTT L DEININGER: "Sequencing, identification and mapping of primed L1 elements (SIMPLE) reveals significant variation in full length L1 elements between individuals", BMC GENOMICS, vol. 16, no. 1, 21 March 2015 (2015-03-21), London, UK, pages 220, XP021220171, ISSN: 1471-2164, DOI: 10.1186/s12864-015-1374-y
SHEEN ET AL.: "Reading between the LINEs: Human Genomic Variation Induced by LINE-1 Retrotransposition", GENOME RESEARCH, vol. 10, no. 10, 1 October 2000 (2000-10-01), pages 1496 - 1508, XP055850227
HELMAN ET AL.: "Somatic Retrotransposition in Human Cancer Revealed by Whole-Genome and Exome Sequencing", GENOME RESEARCH, vol. 24, no. 7, 1 July 2014 (2014-07-01), pages 1053 - 1063, XP055850229
ARDELJAN ET AL.: "The Human LINE-1 Retrotransposon: an Emerging Biomarker of Neoplasia", CLIN CHEM, vol. 63, no. 4, 10 February 2017 (2017-02-10), pages 816 - 822, XP055844382, DOI: 10.1373/clinchem.2016.257444
BELANCIO ET AL.: "LINE Dancing in the Human Genome: Transposable Elements and Disease", GENOME MEDICINE, vol. 1, 27 October 2009 (2009-10-27), pages 1 - 8, XP055850231
ROY-ENGEL A.M, EL-SAWY M, FAROOQ L, ODOM G.L, PEREPELITSA-BELANCIO V, BRUCH H, OYENIRAN O.O, DEININGER P.L: "Human Retroelements may Introduce Intragenic Polyadenylation Signals", CYTOGENETIC AND GENOME RESEARCH, vol. 110, no. 1-4, 1 July 2005 (2005-07-01), US, pages 365 - 371, XP009530567, ISSN: 1424-8581, DOI: 10.1159/000084968
Attorney, Agent or Firm:
HYMAN, Laurence (US)
Download PDF:
Claims:
CLAIMS

1. A composition for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, are present in genomic DNA of a subject, and at which of the sites at which pLls are known to insert said pLls are present in said genomic DNA of said subject, said composition comprising

(a) a substrate or a plurality of substrates,

(b) a plurality of first DNA probes, RNA probes, or both, attached to said substrate or said plurality of substrates, each of said first DNA probes, RNA probes, or both, comprising a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one particular known pLl insertion site, for each of the pLl insertion points shown on Table 2, and

(c) a plurality of second DNA probes, RNA probes, or both, which second DNA probes,

RNA probes, or both, are complementary to said beginning contiguous sequence of said 300 bases of said 5’UTR of said pLl or to said 3’UTR contiguous sequence of at least 300 bases.

2. The composition of claim 1, further wherein said first DNA probes, RNA probes, or both, comprise a contiguous sequence of about 200 to about 700 bases.

3. The composition of claim 1, further wherein said first DNA probes, RNA probes, or both, comprise a contiguous sequence of about 250 to about 500 bases.

4. The composition of claim 1, further wherein said first DNA probes, RNA probes, or both comprise a contiguous sequence of about 300 to about 400 bases.

5. The composition of claim 1, wherein said substrate is a slide.

6. The composition of claim 1, wherein said substrate is a well of a multi- well plate.

7. The composition of claim 1, wherein said substrate is a wall of a microfluidic device.

8. The composition of claim 1, further wherein some or all of said solid substrates is in the form of beads.

9. The composition of claim 1, further wherein said solid surface is of quartz.

10. The composition of claim 1, further wherein said solid surface is of glass.

11. The composition of claim 1, further wherein said plurality of solid surfaces is of plastic.

12. The composition of claim 1, further wherein said attachment of said first DNA probes or said second DNA probe, or both, to said solid support or said plurality of solid supports is covalent.

13. The composition of claim 1, further comprising (d) a plurality of third DNA probes, RNA probes, or both, attached to said substrate or said plurality of substrates, each of said third DNA probes, RNA probes, or both, comprising a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one or more particular fixed LI insertion points associated with cancer.

14. A method for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, may be full-length pLls in genomic DNA of a subject who has both (a) pLls, and (b) LINE-1 elements that occur at known genomic locations in all individuals (“fixed Lis”) with known genomic sequences upstream and downstream of said known genomic locations, and, with regard to the sites at which pLls are known to insert in a human genome as shown in Table 2, at which of said sites at which said sites at which pLls are known to occur pLls are present in said subject, said method comprising the following steps, in the following order:

(a) obtaining genomic DNA from said subject, which genomic DNA is fragmented into lengths of choice, and

(b) contacting said fragmented genomic DNA with

(1) a plurality of first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, each of which said first DNA probes and first RNA probes (i) comprises a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one particular known pLl insertion site, wherein said plurality of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes taken together comprises human genomic sequence surrounding and including each of said pLl insertion points shown in Table 2, and (ii) wherein each of said first DNA probes and said first RNA probes is (A) attached to an solid support or (B) are tagged with a tag which allows said probes to be specifically captured on a solid support when desired, and

(2) a plurality of second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, wherein said second DNA probes and said second RNA are complementary to said beginning contiguous sequence of said 300 bases of said 5’UTR of said pLl, further wherein each of said second DNA probe and second RNA probe is (A) attached to a solid support or (B) are tagged to allow said probes to be specifically captured on a support when desired, under conditions allowing said fragmented genomic DNA complementary to any of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes or to said second DNA probes, second RNA probes, or a mixture of both second DNA probes and second RNA probes to hybridize to said probes, thereby creating a mixture of unhybridized fragmented genomic DNA, and fragmented genomic DNA that has hybridized to one of said probes,

(c) if probes have been used in step (b) that are tagged to allow said tagged probes to be specifically captured on a solid support when desired, capturing said tagged probes on said solid support,

(d) eluting any fragmented genomic DNA that has not hybridized to either one of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes, or one of said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes,

(e) eluting from said supports and collecting for sequencing any fragmented genomic DNA that hybridized to one of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, or to said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, thereby obtaining a plurality of previously-hybridized genomic DNA fragments,

(f) sequencing said plurality of previously-hybridized genomic DNA fragments, thereby obtaining a DNA sequence for each fragment contained within said plurality of previously- hybridized genomic DNA fragments,

(g) comparing said DNA sequence for each fragment contained within plurality of previously-hybridized genomic DNA fragments to consensus human genomic sequences including each of said pLl insertion sites set forth in Table 2, and determining for each of said pLl insertion sites set forth in Table 2 whether:

(1) said genomic sequence upstream for each of said pLl insertion sites is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ sequence, indicating that for those insertion sites, there is a pLl present that may be full length, and (2) whether said genomic sequence downstream for each of said pLl insertion sites set forth in Table 2 is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ UTR sequence, indicating that for those pLl insertion sites, there is a pLl present that may be full length.

15. The method of claim 14, further comprising step (g)(3), compiling a list of how many pLls that have said beginning of said LI 5’UTR and said end of said LI 3’UTR are present in said genome from said individual, thereby determining how many pLls in said individual may be full-length.

16. The method of claim 15, further comprising step (g)(4), identifying in said list the locations of each of said pLls.

17. The method of claim 16, further comprising step (g)(5), for each location in which a pLl has been identified in step (g)(4), determining whether (A) said plurality of sequenced DNA sequences also contains a normal genomic sequence uninterrupted by a pLl at said location, thereby determining that there is a copy of pLl and a normal genomic sequence at that location, indicating that said genome of said individual has one copy of genomic sequence with said pLl at said genomic location and one copy that does not have a pLl at said location, or (B) said plurality of sequenced DNA sequences do not also contain a normal genomic sequence uninterrupted by a pLl at said location, indicating that the genome of said individual has two copies of genomic sequence with said pLl at said genomic location.

18. The method of claim 17, further comprising steps:

(h)(1), comparing the genomic sequences upstream and downstream of all LI sequences in said plurality of sequenced DNA sequences to the genomic sequence upstream and downstream of said fixed Lis in said individual,

(h)(2), determining how many fixed Lis have been detected compared to the number known to exist in the human genome, and

(h)(3) reporting whether the number of fixed Lis detected in said individual is the same or different from the number of fixed Lis known to exist in said human genome.

19. The method of claim 14, wherein said tag to allow said tagged probes to be specifically captured on a solid support is biotin or streptavidin.

20. The method of claim 14, wherein said tag to allow said tagged probes to be specifically captured on a solid support is an antigen which is specifically bound by an antibody attached to said solid support.

21. The method of claim 20, wherein said antigen is digoxigenin and said antibody is an anti-digoxigenin antibody.

22. A method for determining if an individual has a risk of developing cancer or Alzheimer’s Disease due to polymorphic LINE-1 elements (“pLls”) related to risk of cancer or Alzheimer’s Disease in said individual’s genome, said method comprising, determining if said individual carries one of more pLls and, if so, how many, selected from the following groups:

(a) pLls identified in Table 2 as found by WGS, SCORE, or both, only in individuals diagnosed with breast cancer,

(b) pLls identified in Table 2 as found by WGS, SCORE, or both, only in individuals diagnosed with prostate cancer,

(c) pLls identified in Table 2 as found by WGS, SCORE, or both, in genomes of both individuals diagnosed with breast cancer and in genomes of individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS,”

(d) pLls identified in Table 2 as found only in individuals diagnosed with Alzheimer’s Disease,

(e) pLls identified in Table 2 as found by WGS, SCORE, or both, in individuals diagnosed with Alzheimer’s Disease, in individuals diagnosed with breast cancer, and in individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS,” wherein, if said individual has one or more pLls identified in groups (a)-(e), said individual is at risk of developing cancer or Alzheimer’s Disease.

23. The method of claim 22, wherein said pLls are of group (a), and the individual’s risk is of breast cancer.

24. The method of claim 22, wherein said pLls are of group (b), and the individual’s risk is of prostate cancer.

25. The method of claim 22, wherein said pLls are of group (c), and the individual’s risk is of cancer in general, (if female) breast cancer in particular, or, (if male) prostate cancer in particular.

26. The method of claim 22, wherein said pLls are of group (d), and the individual’s risk is of Alzheimer’s Disease.

27. The method of claim 22, wherein said pLls are of group (e), and the individual’s risk is of cancer or Alzheimer’s Disease.

28. A method for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, may be full-length pLls in genomic DNA of a subject who has both (a) pLls, and (b) LINE-1 elements that occur at known genomic locations in all individuals (“fixed Lis”) with known genomic sequences upstream and downstream of said known genomic locations, with regard to pLl insertions sites at which pLls are shown in Table 2 to be:

(group 1) found to be inserted at said sites only in persons diagnosed with breast cancer,

(group 2) found to be inserted at said sites only in persons diagnosed with prostate cancer,

(group 3) found to be inserted at said sites in both persons diagnosed with breast cancer and in persons diagnosed with prostate cancer,

(group 4) found to be inserted at said sites only in individuals diagnosed with Alzheimer’s Disease, or,

(group 5) found to be inserted at said sites in individuals diagnosed with Alzheimer’s Disease, in individuals diagnosed with breast cancer, and in individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS”, said method comprising the following steps, in the following order:

(a) obtaining genomic DNA from said subject, which genomic DNA is fragmented into lengths of choice, and

(b) contacting said fragmented genomic DNA with

(1) a plurality of first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, each of which said first DNA probes and first RNA probes (A) comprises a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one particular known pLl insertion site, wherein said plurality of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes taken together comprises human genomic sequence surrounding and including each of said pLl insertion points in at least one of said groups (1) to (5), and (ii) wherein each of said first DNA probes and said first RNA probes is (A) attached to an solid support or (B) are tagged with a tag which allows said probes to be specifically captured on a solid support when desired, and

(2) a plurality of second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, wherein said second DNA probes and said second RNA are complementary to said beginning contiguous sequence of said 300 bases of said 5’UTR of said pLl, further wherein each of said second DNA probe and second RNA probe is (A) attached to a solid support or (B) are tagged to allow said probes to be specifically captured on a support when desired, under conditions allowing said fragmented genomic DNA complementary to any of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes or to said second DNA probes, second RNA probes, or a mixture of both second DNA probes and second RNA probes to hybridize to said probes, thereby creating a mixture of unhybridized fragmented genomic DNA, and fragmented genomic DNA that has hybridized to one of said probes,

(c) if probes have been used in step (b) that are tagged to allow said tagged probes to be specifically captured on a solid support when desired, capturing said tagged probes on said solid support, or, if said probes were already attached to a solid support, proceeding to step

(d),

(d) eluting any fragmented genomic DNA that has not hybridized to either one of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes, or one of said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes,

(e) eluting from said supports and collecting for sequencing any fragmented genomic DNA that hybridized to one of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, or to said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, thereby obtaining a plurality of previously-hybridized genomic DNA fragments,

(f) sequencing said plurality of previously-hybridized genomic DNA fragments, thereby obtaining a DNA sequence for each fragment contained within said plurality of previously- hybridized genomic DNA fragments,

(g) comparing said DNA sequence for each fragment contained within plurality of previously-hybridized genomic DNA fragments to consensus human genomic sequences including each of said pLl insertion sites for said in at least one of said groups (1) to (5), and determining for each of said pLl insertion sites in said at least one of said groups (1) to (5) whether:

(1) said genomic sequence upstream for each of said pLl insertion sites is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ sequence, indicating that for those insertion sites, there is a pLl present that may be full length, and

(2) whether said genomic sequence downstream for each of said pLl insertion sites set forth in Table 2 is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ UTR sequence, indicating that for those pLl insertion sites, there is a pLl present that may be full length.

29. The method of claim 28, further comprising step (g)(3), compiling a list of how many pLls that have said beginning of said LI 5’UTR and said end of said LI 3’UTR are present in said genome from said individual, thereby determining how many pLls in said at least one of said groups (1) to (5) may be full-length.

30. The method of claim 29, further comprising step (g)(4), identifying in said list the locations of each of said pLls in said at least one of said groups (1) to (5) present in said individual.

31. The method of claim 30, further comprising step (g)(5), for each location in which a pLl has been identified in step (g)(4), determining whether (A) said plurality of sequenced DNA sequences also contains a normal genomic sequence uninterrupted by a pLl at said location, thereby determining that there is a copy of pLl and a normal genomic sequence at that location, indicating that said genome of said individual has one copy of genomic sequence with said pLl at said genomic location and one copy that does not have a pLl at said location, or (B) said plurality of sequenced DNA sequences do not also contain a normal genomic sequence uninterrupted by a pLl at said location, indicating that the genome of said individual has two copies of genomic sequence with said pLl at said genomic location.

32. The method of claim 31 , further comprising steps :

(h)(1), comparing the genomic sequences upstream and downstream of all LI sequences in said plurality of sequenced DNA sequences to the genomic sequence upstream and downstream of said fixed Lis in said individual,

(h)(2), determining how many fixed Lis have been detected compared to the number known to exist in the human genome, and

(h)(3) reporting whether the number of fixed Lis detected in said individual is the same or different from the number of fixed Lis known to exist in said human genome.

33. An electronic device configured for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, are present in genomic DNA of a subject, and at which of the sites at which pLls are known to insert said pLls are present in said genomic DNA of said subject, said device comprising a processor and memory, said memory storing computer executable instmctions for performing the method of claim 1.

34. An electronic device configured for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, are present in genomic DNA of a subject, and at which of the sites at which pLls are known to insert said pLls are present in said genomic DNA of said subject, said device comprising a processor and memory, said memory storing computer executable instructions for performing the method of claim 14.

35. An electronic device configured for performing the method of claim 22, said device comprising a processor and memory, said memory storing computer executable instructions for performing the method of claim 22.

36. An electronic device configured for performing the method of claim 28, said device comprising a processor and memory, said memory storing computer executable instructions for performing the method of claim 28.

37. A kit for determining with regard to a human genome having a genomic sequence proceeding in direction from 5’ to 3’, which genome has 826 known potential insertion points at which a full-length polymorphic LINE-1 element (“pLl”) may be inserted which of said insertion point has had a pLl inserted, said full-length pLls having a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence, said kit comprising

(a) a set of probes for all or substantially of said 826 potential insertion points listed in Table 2, each member of which set of probes comprises (i) a sequence complementary to genomic sequence contiguous to one of said insertion points at which pLl inserts into said genome, attached directly to a sequence complementary to at least the first 100 bases of said beginning of said 5’UTR of said pLl, and,

(b) probes consisting essentially of 100-600 contiguous bases of said pLl 5’UTR.

38. A kit for determining with regard to a human genome having a genomic sequence proceeding in direction from 5’ to 3’, which genome has 826 known potential insertion points at which a full-length polymorphic LINE-1 element (“pLl”) may be inserted which of said insertion point has had a pLl inserted, said full-length pLls having a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence, said kit comprising (a) a set of probes for a subset of said 826 potential insertion points listed in Table 2, said subset consisting of one or more of said following groups: group 1: pLl insertions sites at which pLls are shown in Table 2 to be found to be inserted at said sites only in persons diagnosed with breast cancer, group 2: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites only in persons diagnosed with prostate cancer, group 3: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites in both persons diagnosed with breast cancer and in persons diagnosed with prostate cancer, group 4: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites only in individuals diagnosed with Alzheimer’s Disease, and, group 5: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites in individuals diagnosed with Alzheimer’s Disease, in individuals diagnosed with breast cancer, and in individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS”, each member of which set of probes comprises (i) a sequence complementary to genomic sequence contiguous to one of said insertion points at which pLl inserts into said genome, attached directly to a sequence complementary to at least the first 100 bases of said beginning of said 5’UTR of said pLl.

39. The kit of claim 38, further comprising

(b) probes consisting essentially of 100-600 contiguous bases of said pLl 5’UTR.

40. The kit of claim 38, wherein said subset is the pLl insertion sites of group 1.

41. The kit of claim 38, wherein said subset is the pLl insertion sites of group 2. 42. The kit of claim 38, wherein said subset is the pLl insertion sites of group 3.

Description:
COMPOSITIONS, METHODS, DEVICES, AND KITS FOR DETECTING THE NUMBER AND GENOMIC LOCATIONS OF POLYMORPHIC LINE-1 ELEMENTS

IN AN INDIVIDUAL

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 62/982,596, filed February 27, 2020, the contents of which are incorporated herein by reference.

STATEMENT OF FEDERAL FUNDING

[0002] This invention was made with government support under grant R01 833AG057597 awarded by the National Institute on Aging of the National Institutes of Health. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

[0003] Long Interspersed Element- 1 (“LI”) retroelements are the only family of mobile genetic elements currently active in the human genome. See , e.g., Deininger et al., Nuc. Acids Res., 2017, 45(5):e31.doi:10.1093/nar/gkwl067 (hereafter, “Deininger”). About 500,000 LI elements have accumulated in the genome over time and now comprise approximately 17% of human genomic content. See, e.g., Belancio et al., Nuc. Acids Res., 2010, 38(12):3909-3922; Lander et al., Nature, 2001, 409, 860-921, doi.org/10.1038/35057062. The majority of LI elements in the genome are inactive, due either to tmncation of their 5’ ends, mutations, or to internal rearrangements. There are, however, also a number of functional LI elements which have both 5’ - and 3’- untranslated regions (“UTRs”) and which do not contain inactivating rearrangements. Functional LI elements continue to generate additional new copies in the genome of the individuals who carry them; the new LI copies can then contribute to genetic instability during the individual’s life, potentially increasing the individual’s risk of diseases such as cancer or increasing the possibility that a cancer in the individual will be more aggressive than might otherwise be the case. [0004] Many non-functional and some functional Lis are present in the genome of every individual. Since these Lis do not vary among individuals, they are sometimes referred to as “fixed Lis;” some fixed Lis are of a type identified as “PA2s,” or “LlPA2s.” As the fixed Lis do not vary in number between individuals, they are less likely to change the risk of genetic instability in any one individual compared to any other individual. “Polymorphic Lis,” on the other hand, vary in number from individual to individual. As polymorphic Lis, by definition, vary in number from one individual to another and they also vary in genomic position. Both the number and position of specific pLls can place an individual at higher risk of genetic instability, and of diseases related to that genetic instability, than that of individuals with lower numbers of polymorphic Lis or with pLls in other positions in their genome.

[0005] Unfortunately, there is currently no convenient, affordable method of screening patients to determine the number and genomic positions of polymorphic Lis they carry and to assess the individual’s consequent risk of genetic instability. Surprisingly, the present invention fulfills these and other needs.

BRIEF SUMMARY OF THE INVENTION

[0006] In a first group of embodiments, the invention provides compositions for determining how many polymorphic LINE-1 elements (“pLls”) are present in genomic DNA of an individual subject, and at which sites within the individual’s genome the pLls are inserted. The pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases. In some embodiments, the composition comprises (a) a substrate or a plurality of substrates, (b) a plurality of first DNA probes, RNA probes, or both, attached to the substrate or the plurality of substrates, each of the DNA probes, RNA probes, or both, comprising a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one particular known pLl insertion site, for each of the pLl insertion points shown on Table 2, and (c) a plurality of second DNA probes, RNA probes, or both, which second DNA probes, RNA probes, or both, are complementary to the beginning contiguous sequence of the 300 bases of said 5’UTR of said pLl or to said 3’UTR contiguous sequence of at least 300 bases. In some embodiments, the first DNA probes, RNA probes, or both, comprise a contiguous sequence of about 200 to about 700 bases. In some embodiments, the first DNA probes, RNA probes, or both, comprise a contiguous sequence of about 250 to about 500 bases. In some embodiments, the first DNA probes, RNA probes, or both comprise a contiguous sequence of about 300 to about 400 bases. In some embodiments, the substrate is a slide. In some embodiments, the substrate is a well of a multi-well plate. In some embodiments, the substrate is a wall of a microfluidic device. In some embodiments, some or all of the solid substrates are in the form of beads. In some embodiments, the solid surface is of quartz. In some embodiments, the solid surface is of glass. In some embodiments, the plurality of solid surfaces is of plastic. In some embodiments, the attachment of the first DNA probes or the second DNA probe, or both, to the solid support or the plurality of solid supports is covalent. In some embodiments, the composition further comprises (d) a plurality of third DNA probes, RNA probes, or both, attached to the substrate or the plurality of substrates, each of the third DNA probes, RNA probes, or both, comprising a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one or more particular fixed LI insertion points associated with cancer.

[0007] In a further group of embodiments, the invention provides methods for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, may be full-length pLls in genomic DNA of a subject who has both (a) pLls, and (b) LINE-1 elements that occur at known genomic locations in all individuals (“fixed Lis”) with known genomic sequences upstream and downstream of said known genomic locations, and, with regard to the sites at which pLls are known to insert in a human genome as shown in Table 2, at which of said sites at which said sites at which pLls are known to occur pLls are present in said subject, said method comprising the following steps, in the following order: (a) obtaining genomic DNA from said subject, which genomic DNA is fragmented into lengths of choice, and (b) contacting said fragmented genomic DNA with (1) a plurality of first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, each of which said first DNA probes and first RNA probes (i) comprises a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one particular known pLl insertion site, wherein said plurality of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes taken together comprises human genomic sequence surrounding and including each of said pLl insertion points shown in Table 2, and (ii) wherein each of said first DNA probes and said first RNA probes is (A) attached to an solid support or (B) are tagged with a tag which allows said probes to be specifically captured on a solid support when desired, and (2) a plurality of second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, wherein said second DNA probes and said second RNA are complementary to said beginning contiguous sequence of said 300 bases of said 5’UTR of said pLl, further wherein each of said second DNA probe and second RNA probe is (A) attached to a solid support or (B) are tagged to allow said probes to be specifically captured on a support when desired, under conditions allowing said fragmented genomic DNA complementary to any of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes or to said second DNA probes, second RNA probes, or a mixture of both second DNA probes and second RNA probes to hybridize to said probes, thereby creating a mixture of unhybridized fragmented genomic DNA, and fragmented genomic DNA that has hybridized to one of said probes, (c) if probes have been used in step (b) that are tagged to allow said tagged probes to be specifically captured on a solid support when desired, capturing said tagged probes on said solid support, (d) eluting any fragmented genomic DNA that has not hybridized to either one of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes, or one of said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, (e) eluting from said supports and collecting for sequencing any fragmented genomic DNA that hybridized to one of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, or to said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, thereby obtaining a plurality of previously-hybridized genomic DNA fragments,

(f) sequencing said plurality of previously-hybridized genomic DNA fragments, thereby obtaining a DNA sequence for each fragment contained within said plurality of previously- hybridized genomic DNA fragments, (g) comparing said DNA sequence for each fragment contained within plurality of previously-hybridized genomic DNA fragments to consensus human genomic sequences including each of said pLl insertion sites set forth in Table 2, and determining for each of said pLl insertion sites set forth in Table 2 whether:

(1) said genomic sequence upstream for each of said pLl insertion sites is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ sequence, indicating that for those insertion sites, there is a pLl present that may be full length, and (2) whether said genomic sequence downstream for each of said pLl insertion sites set forth in Table 2 is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ UTR sequence, indicating that for those pLl insertion sites, there is a pLl present that may be full length. In some embodiments, the method comprises step (g)(3), compiling a list of how many pLls that have said beginning of said LI 5’UTR and said end of said LI 3’UTR are present in said genome from said individual, thereby determining how many pLls in said individual may be full-length. In some embodiments, the method further comprises step (g)(4), identifying in said list the locations of each of said pLls. In some embodiments, the method further comprises step

(g)(5), for each location in which a pLl has been identified in step (g)(4), determining whether (A) said plurality of sequenced DNA sequences also contains a normal genomic sequence uninterrupted by a pLl at said location, thereby determining that there is a copy of pLl and a normal genomic sequence at that location, indicating that said genome of said individual has one copy of genomic sequence with said pLl at said genomic location and one copy that does not have a pLl at said location, or (B) said plurality of sequenced DNA sequences do not also contain a normal genomic sequence uninterrupted by a pLl at said location, indicating that the genome of said individual has two copies of genomic sequence with said pLl at said genomic location. In some embodiments, the method further comprises steps:

(h)(1), comparing the genomic sequences upstream and downstream of all LI sequences in said plurality of sequenced DNA sequences to the genomic sequence upstream and downstream of said fixed Lis in said individual, (h)(2), determining how many fixed Lis have been detected compared to the number known to exist in the human genome, and (h)(3) reporting whether the number of fixed Lis detected in said individual is the same or different from the number of fixed Lis known to exist in said human genome. In some embodiments, the tag allowing the tagged probes to be specifically captured on a solid support is biotin or streptavidin. In some embodiments, the tag to allow said tagged probes to be specifically captured on a solid support is an antigen which is specifically bound by an antibody attached to said solid support. In some embodiments, the antigen is digoxigenin and the antibody is an anti-digoxigenin antibody.

[0008] In another group of embodiments, the invention provides methods for determining if an individual has a risk of developing cancer or Alzheimer’s Disease due to polymorphic LINE-1 elements (“pLls”) related to risk of cancer or Alzheimer’s Disease in said individual’s genome, said method comprising, determining if said individual carries one of more pLls and, if so, how many, selected from the following groups: (a) pLls identified in Table 2 as found by WGS, SCORE, or both, only in individuals diagnosed with breast cancer,

(b) pLls identified in Table 2 as found by WGS, SCORE, or both, only in individuals diagnosed with prostate cancer,

(c) pLls identified in Table 2 as found by WGS, SCORE, or both, in genomes of both individuals diagnosed with breast cancer and in genomes of individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS,”

(d) pLls identified in Table 2 as found only in individuals diagnosed with Alzheimer’s Disease,

(e) pLls identified in Table 2 as found by WGS, SCORE, or both, in individuals diagnosed with Alzheimer’s Disease, in individuals diagnosed with breast cancer, and in individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS,” wherein, if said individual has one or more pLls identified in groups (a)-(e), said individual is at risk of developing cancer or Alzheimer’s Disease. In some embodiments, the pLls are of group (a), and the individual’s risk is of breast cancer. In some embodiments, the pLls are of group (b), and the individual’s risk is of prostate cancer. In some embodiments, the pLls are of group (c), and the individual’s risk is of cancer in general, (if female) breast cancer in particular, or, (if male) prostate cancer in particular. In some embodiments, the pLls are of group (d), and the individual’s risk is of Alzheimer’s Disease. In some embodiments, the pLls are of group (e), and the individual’s risk is of cancer or Alzheimer’s Disease.

[0009] In yet another group of embodiments, the invention provides methods for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, may be full-length pLls in genomic DNA of a subject who has both (a) pLls, and (b) LINE-1 elements that occur at known genomic locations in all individuals (“fixed Lis”) with known genomic sequences upstream and downstream of said known genomic locations, with regard to pLl insertions sites at which pLls are shown in Table 2 to be: (group 1) found to be inserted at said sites only in persons diagnosed with breast cancer, (group 2) found to be inserted at said sites only in persons diagnosed with prostate cancer, (group 3) found to be inserted at said sites in both persons diagnosed with breast cancer and in persons diagnosed with prostate cancer, (group 4) found to be inserted at said sites only in individuals diagnosed with Alzheimer’s Disease, or, (group 5) found to be inserted at said sites in individuals diagnosed with Alzheimer’s Disease, in individuals diagnosed with breast cancer, and in individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS”, said method comprising the following steps, in the following order: (a) obtaining genomic DNA from said subject, which genomic DNA is fragmented into lengths of choice, and (b) contacting said fragmented genomic DNA with (1) a plurality of first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, each of which said first DNA probes and first RNA probes (A) comprises a contiguous sequence of about 200 to about 1000 bases complementary to a consensus human genomic sequence surrounding and including one particular known pLl insertion site, wherein said plurality of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes taken together comprises human genomic sequence surrounding and including each of said pLl insertion points in at least one of said groups (1) to (5), and (ii) wherein each of said first DNA probes and said first RNA probes is (A) attached to an solid support or (B) are tagged with a tag which allows said probes to be specifically captured on a solid support when desired, and (2) a plurality of second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, wherein said second DNA probes and said second RNA are complementary to said beginning contiguous sequence of said 300 bases of said 5’UTR of said pLl, further wherein each of said second DNA probe and second RNA probe is (A) attached to a solid support or (B) are tagged to allow said probes to be specifically captured on a support when desired, under conditions allowing said fragmented genomic DNA complementary to any of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes or to said second DNA probes, second RNA probes, or a mixture of both second DNA probes and second RNA probes to hybridize to said probes, thereby creating a mixture of unhybridized fragmented genomic DNA, and fragmented genomic DNA that has hybridized to one of said probes, (c) if probes have been used in step (b) that are tagged to allow said tagged probes to be specifically captured on a solid support when desired, capturing said tagged probes on said solid support, or, if said probes were already attached to a solid support, proceeding to step (d), (d) eluting any fragmented genomic DNA that has not hybridized to either one of said first DNA probes, first RNA probes, or mixture of both first DNA probes and first RNA probes, or one of said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, (e) eluting from said supports and collecting for sequencing any fragmented genomic DNA that hybridized to one of said first DNA probes, first RNA probes, or a mixture of both first DNA probes and first RNA probes, or to said second DNA probes, second RNA probes, or mixture of both second DNA probes and second RNA probes, thereby obtaining a plurality of previously -hybridized genomic DNA fragments, (f) sequencing said plurality of previously-hybridized genomic DNA fragments, thereby obtaining a DNA sequence for each fragment contained within said plurality of previously -hybridized genomic DNA fragments, (g) comparing said DNA sequence for each fragment contained within plurality of previously-hybridized genomic DNA fragments to consensus human genomic sequences including each of said pLl insertion sites for said in at least one of said groups (1) to (5), and determining for each of said pLl insertion sites in said at least one of said groups (1) to (5) whether: (1) said genomic sequence upstream for each of said pLl insertion sites is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ sequence, indicating that for those insertion sites, there is a pLl present that may be full length, and (2) whether said genomic sequence downstream for each of said pLl insertion sites set forth in Table 2 is followed by (i) some or all of beginning of said LI 5’UTR sequence or (ii) some or all of said end of said LI 3’ UTR sequence, indicating that for those pLl insertion sites, there is a pLl present that may be full length. In some embodiments, the method further comprises step (g)(3), compiling a list of how many pLls that have said beginning of said LI 5’UTR and said end of said LI 3’UTR are present in said genome from said individual, thereby determining how many pLls in said at least one of said groups (1) to (5) may be full-length. In some embodiments, the methods further comprise step (g)(4), identifying in said list the locations of each of said pLls in said at least one of said groups (1) to (5) present in said individual. In some embodiments, the methods further comprise step (g)(5), for each location in which a pLl has been identified in step (g)(4), determining whether (A) said plurality of sequenced DNA sequences also contains a normal genomic sequence uninterrupted by a pLl at said location, thereby determining that there is a copy of pLl and a normal genomic sequence at that location, indicating that said genome of said individual has one copy of genomic sequence with said pLl at said genomic location and one copy that does not have a pLl at said location, or (B) said plurality of sequenced DNA sequences do not also contain a normal genomic sequence uninterrupted by a pLl at said location, indicating that the genome of said individual has two copies of genomic sequence with said pLl at said genomic location. In some embodiments, the methods further comprise steps: (h)(1), comparing the genomic sequences upstream and downstream of all LI sequences in said plurality of sequenced DNA sequences to the genomic sequence upstream and downstream of said fixed Lis in said individual, (h)(2), determining how many fixed Lis have been detected compared to the number known to exist in the human genome, and (h)(3) reporting whether the number of fixed Lis detected in said individual is the same or different from the number of fixed Lis known to exist in said human genome.

[0010] In still another group of embodiments, the invention provides electronic devices configured for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, are present in genomic DNA of a subject, and at which of the sites at which pLls are known to insert said pLls are present in said genomic DNA of said subject, said device comprising a processor and memory, said memory storing computer executable instructions for performing the methods of one or more the groups of embodiments set forth above.

[0011] In yet a further group of embodiments, the invention provides kits for determining, with regard to a human genome having a genomic sequence proceeding in direction from 5’ to 3’, which genome has known potential insertion points at which a full-length polymorphic LINE-1 element (“pLl”) may be inserted as set forth in Table 2, which of said insertion point has had a pLl inserted, said full-length pLls having a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence, said kit comprising (a) a set of probes for all or substantially of said potential insertion points listed in Table 2, each member of which set of probes comprises (i) a sequence complementary to genomic sequence contiguous to one of said insertion points at which pLl inserts into said genome, attached directly to a sequence complementary to at least the first 100 bases of said beginning of said 5’UTR of said pLl, and, (b) probes consisting essentially of 100-600 contiguous bases of said pLl 5’UTR.

[0012] In another group of embodiments, the invention provides kits for determining with regard to a human genome having a genomic sequence proceeding in direction from 5’ to 3’, which genome has 826 known potential insertion points at which a full-length polymorphic LINE-1 element (“pLl”) may be inserted which of said insertion point has had a pLl inserted, said full-length pLls having a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence, said kit comprising (a) a set of probes for a subset of said 826 potential insertion points listed in Table 2, said subset consisting of one or more of said following groups: group 1: pLl insertions sites at which pLls are shown in Table 2 to be found to be inserted at said sites only in persons diagnosed with breast cancer, group 2: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites only in persons diagnosed with prostate cancer, group 3: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites in both persons diagnosed with breast cancer and in persons diagnosed with prostate cancer, group 4: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites only in individuals diagnosed with Alzheimer’s Disease, and, group 5: pLl insertions sites at which pLls are shown in Table 2 found to be inserted at said sites in individuals diagnosed with Alzheimer’s Disease, in individuals diagnosed with breast cancer, and in individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS”, each member of which set of probes comprises (i) a sequence complementary to genomic sequence contiguous to one of said insertion points at which pLl inserts into said genome, attached directly to a sequence complementary to at least the first 100 bases of said beginning of said 5’UTR of said pLl. In some embodiments, the kit further comprises (b) probes consisting essentially of 100-600 contiguous bases of said pLl 5’UTR. In some embodiments, the subset is the pLl insertion sites of group 1. In some embodiments, the subset is the pLl insertion sites of group 2. In some embodiments, the subset is the pLl insertion sites of group 3.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] Figs. 1A-C. Fig. 1A. Fig. 1A is a schematic diagram illustrating aspects of designing probes for use in the inventive methods. The top thick line show a genomic sequence into which a first pLl, designated as pLl#l in Fig. 1A, has inserted at a location annotated in the human genome. The original genomic sequence thus has been divided, with a section upstream of the pLl and then continuing downstream of the inserted pLl. The thinner line immediately under the first thick line represents a probe designed to detect if a pLl has inserted at this genomic location. The probe consists of has a first portion having the genomic sequence immediately upstream of the site at which the pLl inserts, joined to the beginning of the pLl 5’UTR. The second thick line depicts the insertion of a pLl at a second genomic location annotated in the human genome, and the design of a similar probe based on this second genomic location at which a pLl inserts. The third thick line represents a design for a portion of genomic sequence at which a pLl has been found to insert, but which has not yet been annotated in the consensus human genome as having a pLl. This design accommodates any pLl which orientation in the genome is not known, as a pLl may insert into the genome in either orientation. Fig. IB. Fig. IB shows the design of a probe to detect the presence of pLls at sites at which pLls are not known to insert, and to detect the presence of PA2s. Fig. 1C. Fig. 1C is a schematic diagram showing the overall flow of some embodiments of the inventive methods. The top diagram shows genomic DNA containing pLls (shaded boxes) and a fixed Lis of the type referred to as “PA2” (horizontally striped box) present in the genome. After the DNA is fragmented (in this hypothetical example, by being sheared), it is hybridized to the probes of the various types described for Figs. 1A and IB. The DNA that hybridized to a probe is then amplified, sequenced, and analyzed by bioinformatics to determine the sites at which pLls and PA2s have been detected as present in the genome.

[0014] Figure 2. Figure 2 is a graph showing the detection of PA2 fixed LI elements in genomes in publicly available data sets analyzed by others by whole genome sequencing (“WGS”), or analyzed in the studies reported below using the inventive methods (“SCORE,” an acronym for the phrase “Screen for Content Of Retro Elements”). Controls: persons who have not been diagnosed with cancer or Alzheimer’s Disease prior to genome analysis. Alzheimer’s Disease: genomes from individuals diagnosed with Alzheimer’s Disease prior to genome analysis. Breast cancer: genomes from individuals diagnosed with breast carcinoma prior to genome analysis. Prostate cancer: genomes from individuals diagnosed with prostate adenocarcinoma prior to genome analysis. Each circle represents a single genome.

[0015] Figure 3. Figure 3 is a graph showing the detection of pLls in the same genomes as described for Figure 2. Figure 3 shows that the inventive methods (“SCORE”) detect considerably more pLls than are detected by conventional analysis by whole genome sequencing (“WGS”).

[0016] Figure 4. Figure 4 is a graph showing pLls that are present in the same genomes as described for Figure 2, after subtracting pLls at locations at which a pLl was present in at least one genome in each group. Use of the inventive methods (“SCORE”) results in detecting numerous pLls that were not detected by conventional whole genome sequencing (“WGS”). Data are presented as mean with standard deviation bars. Data were analyzed by two-tailed Student’s ί-test for n= 2 groups. Significant P-values are indicated as follows: **** P O.0001,*** P< 0.005, ** P< 0.01, * P< 0.05. Statistical analysis was performed with GraphPad Prism.

[0017] Figure 5. Figure 5 is a graphical depiction of data regarding pLls presented in Table 2. Th graph shows the number of pLls detected in genomes in persons who had been diagnosed with Alzheimer’s Disease, Breast Cancer, or Prostate Cancer, or who had not been diagnosed with one of these conditions (“Control”), after excluding pLls present in at least one genome in each group. WGS and SCORE results were combined for breast cancer and for prostate cancer.

DETAILED DESCRIPTION

Introduction

[0018] As discussed in the Background, there are some 500,000 Long Interspersed Element- 1 (“LI”) retroelements in the human genome. Of these approximately 500,000 LI elements, only some 5000 are full length elements that contain the internal promoter, see, Deininger, supra, Lander et ah, supra, and fewer have been identified as being active; that is, they have both 5’- and 3’- UTRs and no inactivating rearrangements and are capable of introducing new copies of themselves into the genome. Many functional LI elements have inserted themselves at known positions in the human genome and are present in all human genomes in the same number. Functional and non-functional LI elements present in all human genomes in the same number are sometimes referred to herein as “fixed Lis,” “PA2s,” or “LlPA2s” (fixed Lis and PA2s will be discussed in more detail in a later section). In addition to the fixed LI elements, however, there are also full-length Lis retroelements that vary in number between individuals. Any one individual may have a different number of these LI elements compared to others. Moreover, any one individual may have a different subset of locations at which these active LI elements have inserted in their genome compared even to another individual with the same overall number of such Lis. The active LI elements that vary in number and location among individuals are sometimes called “polymorphic” or “hot” LI elements, as they can generate new integration events. See, e.g., Deininger, supra. Lis elements that vary in number among individuals will sometimes be referred to herein as “polymorphic Lis” or as “pLls.”

[0019] By definition, functional LI elements can continue to insert additional copies into the genome of individuals who carry them, and can contribute to genetic instability during the individual’s life. Such new insertions potentially increase the individual’s risk of developing diseases such as cancer. Further, pLls insertions in a portion of the genome specifically expressed in a particular tissue or organ can increase the possibility that a cancer in that tissue or organ will be more aggressive than might otherwise be the case, and require more aggressive or different treatment than might otherwise be the case.

[0020] As the number of fixed Lis is the same among all individuals, the risk of genetic instability posed by fixed Lis is likely to similar for all individuals. But the number and specific locations of polymorphic Lis by definition varies from individual to individual, and a higher number of pLls places carriers at a higher risk of genetic instability compared to those with lower numbers. For example, a person with a small number of pLls would be considered at lower risk of genetic instability due to LI -associated mutations, while someone with a higher number of pLls would be considered at higher risk of genetic instability. Further, persons with a low number of pLls in their genomes, but with inherited defects in DNA repair pathways (especially those already known to increase the risk of developing cancer), would be considered at a higher risk for genetic instability from LI than those without such genetic defects, because various DNA repair pathways guard against Ll- induced genomic alterations. Moreover, the particular locations at which a pLl has inserted is also important. As discussed further below, we have discovered that some pLls inserted at some genomic locations are more likely to be associated with cancers or with Alzheimer’s Disease than others, and that persons with pLls inserted at the particular subsets of sites identified below therefore should be monitored more closely for development of cancer or Alzheimer’s Disease than persons without pLl insertions at these locations in the genome.

[0021] Persons with higher risk of genetic instability from pLls might therefore benefit from being having more frequent medical checkups, starting from a younger age. Those with a high number of pLls should also be considered in a number of pathologies other than cancer or Alzheimer’s Disease. Among these are neurodegenerative diseases, infertility with unknown etiology, spontaneous abortions with unknown etiology, and sporadic genetic diseases with unknown etiology. Similarly, a person newly diagnosed with cancer who is determined to have a low number of pLls in their genome might respond better to treatment than a person newly diagnosed with cancer who has a high number of pLls in their genome.

[0022] Unfortunately, there is currently no convenient, affordable, and reproducible means for identifying the number of pLls present in a particular individual, or for identifying patients with pLl elements in a particular tissue type. Currently, the most direct method is to sequence the individual’s entire genome in a procedure called whole genome sequencing, and to use bioinformatics programs to, first, search for each copy of full-length LI and, second, determine which are fixed and which are in genomic positions at which the presence of an LI retrotransposon is variable. While the price of whole genome sequencing has been dropping rapidly, sequencing the 3 billion + bases of the entire human genome to an informative depth is too expensive and time consuming for wide scale or routine screening, and render whole genome sequencing unsuitable for high throughput screening.

[0023] Surprisingly, the present invention solves these problems. In various embodiments, the invention provides methods and devices for determining which of the genomic locations at which pLl elements that have found to insert to date are occupied by a pLl in a subject, without the need for whole genome sequencing. Further, the methods, devices, and kits can not only identify which genomic locations at which pLl elements are known to insert are in fact occupied by a pLl element in a given subject, but also can determine whether the subject is heterozygous or homozygous with respect to a particular genomic location (that is, for the diploid chromosomes, whether a pLl has inserted at a particular genomic location on both of the copies of the individual’s chromosome, or just one). Moreover, the methods, devices and kits allow determining if the individual has any pLls present that have not been previously identified. And, the methods, devices and kits include internal controls that allow the practitioner to determine if the assay is valid or whether the information provided is suspect due to, for example, a problem with the reagents or with storage of the DNA used as a patient sample. And, because the methods, devices and kits do not require whole genome sequencing, they are not only cheaper and faster than whole genome sequencing-based techniques, but they are also more sensitive and can also be used for high-throughput screening. In sum, the inventive methods, devices, systems, and kits provide a surprising combination of advantages that have not previously been available in the art.

[0024] One problem with whole genome sequencing, or “WGS,” is that cost considerations usually constrain the number of cycles that the practitioner has ran on a genomic sample. This depth of sequencing is set by the practitioner at the beginning and is often not sufficient to detect all pLls present in a sample, particularly those present in low allelic frequency. The studies reported below show that the inventive techniques uncovered pLls at sites at which they were not located by WGS performed on other individuals with the same general diagnosis.

[0025] Further, our studies show that genomes of persons with breast cancer and prostate cancer had more pLls present than did available data regarding the genomes of persons diagnosed with Alzheimer’s Disease (“AD”) or who had not been diagnosed with either AD or with breast or prostate cancer (persons in this latter group will sometimes be referred to below as “controls”). Further, we found that persons with breast cancer or with prostate cancer had pLls present in genomic locations at which pLls had not inserted themselves in persons with AD or in controls. Figure 5 presents the results of our analysis of pLls present in genomes from persons with AD, breast cancer, prostate cancer, or controls. The Figure shows that genomes from persons with prostate cancer had pLls present at 166 locations that were not shared with controls or with persons in the other groups studied, while genomes from persons diagnosed with breast cancer had 10 pLls at positions in which they were not found in any of the genomes from persons in the other groups, and 80 pLls at positions that were also locations of pLls in genomes from persons who had been diagnosed with prostate cancer. Moreover, while the genomes from persons diagnosed with AD had pLls at only 2 locations unique to that disorder in these datasets, the genomes had 1 pLl at a location at which a pLl was also found in persons with breast cancer, 17 with persons who had either breast cancer and prostate cancer, and 20 more with persons who had prostate cancer that were not also found in genomes from individuals diagnosed with breast cancer. Further, genomes from persons with prostate or breast cancer had more polymorphic Lis than did persons with AD or controls. And, reviewing the ages of the individuals, it was noted that persons 70-80 years old had fewer pLls than did persons 40-70 years old who had been diagnosed with breast cancer or prostate cancer. On the other hand, no difference in the number of pLls was noted between controls and persons with late-onset AD.

[0026] The results of the studies reported here show that the inventive methods make possible determining the number and distribution of pLls in the genome of individuals, and that differences in the number and distribution of pLls in the individual’s genome can be used to determine whether they are more or less likely to develop cancer. In particular, as shown in Figure 5, the finding of pLls at certain genomic locations was found to be correlated with breast cancer or with prostate cancer, respectively. Thus, individual’s whose genomes have pLls at some or all of the genomic locations noted should be monitored for cancer in general and, depending on the sex of the individual and the particular locations at which the pLls are present, for breast cancer or prostate cancer, respectively. The full list of genomic locations at which pLls are known to insert into the genome is set forth in Table 2, as are the particular genomic locations at to which the presence of a pLl was associated with breast cancer, prostate cancer, or AD.

LI, the Human Genome, and pLl Insertion Sites

[0027] As noted, active Lis have an internal promoter, both a 5-UTR and a 3’-UTR, and no inactivating rearrangements; they are therefore capable of introducing new copies of themselves into the genome. The full-length sequence of LI is known in the art and available in references such as Scott et ah, Genomics. 1987; 1(2): 113-25 and Boissinot et ah,

Molecular Biol and Evolution, 2000; 17(6):915— 928. Studies of the evolution of LI elements in the human genome have resulted in further categorization of the elements as belonging in the PA subfamilies or in the subfamily HS. L1PA elements are fixed, while L1HS elements are considered to be younger, with some being fixed and some being polymorphic. Thus, the majority of fixed LI elements are members of the PA subfamily, while all polymorphic Lis are in the HS subfamily. Lixed Lis are considered to be older in terms of the length of time they have been present in the human genome, and therefore have had more time in which to develop mutations. Some of these mutations can result in frame shifts or other changes that render the LI incapable of introducing new copies of themselves. Lixed Lis with such mutations are, by definition, inactive. The sequence of any particular fixed or polymorphic LI can be readily reviewed to determine if it is active or if mutations have rendered it inactive.

[0028] The sequence of the human genome was first published by Lander et ak, Nature,

2001, 409:860-921 (references herein to the “human genome” or to the “genome” refer to the nuclear genome, not to the mitochondrial genome). The Genome Reference Consortium (“GRC”) currently maintains on GenBank a curated, publicly accessible, consensus reference genome sequence. As of this writing, the consensus sequence is Human Build 38 patch release 13 (GRCh38.pl3), GenBank assembly accession: GCA_000001405.28; RefSeq assembly accession: GCL_000001405.39. The reference genomic sequence for each chromosome is available on GenBank, as set forth in Table 1. Table 1

While the genome of individuals differs from that of the reference genome, for example, by the presence of single nucleotide polymorphisms and the genetic variation that causes differences between individuals, that variation is not expected to significantly affect the conduct or performance of the inventive methods or devices. [0029] As of this writing, over 800 sites in the human genome have been identified as positions at which a pLl has been found. The sites can be referred to by their positions in the respective chromosomes and can conveniently be identified by reference to the genome sequence set forth in GenBank. Table 2, below, identifies the insertion points with respect to HumanBuild assembly 19 of the over 800 known pLl insertion points, as well as a few fixed LI locations (some of which are designated by being preceded by asterisks) reported to be active in certain cancers. As noted above, the current build is 38, patch release 13.

[0030] As the current build of the human genome in GenBank changes over time, persons of skill are accustomed to translating positions in any previous assembly of the genome to the current assembly, and various tools have been created to facilitate translating information from previous assemblies to more current ones. For example, the University of California, Santa Cruz (“UCSC”) maintains an on-line tool suite which it calls the Genome Browser.

See, e.g, Kent et ak, “The human genome browser at UCSC,” Genome Res. 2002, 12(6):996- 1006; Karolchik et ah, “The UCSC Table Browser data retrieval tool,” Nucleic Acids Res. 2004, l;32(Database issue) :D493-6. In particular, the UCSC Lift Genome Annotations tool (hi.tps://genome.ucsc.edi.i/cgi-bin/hgLiftOv&r) converts genome coordinates and genome annotation files between assemblies and can be used to convert the coordinates set forth in Table 2 (assembly 19) to newer assemblies as they are developed.

[0031] As persons of skill appreciate, each human nucleated somatic cell carries two copies of each chromosome. Further, even if a pLl is present in an individual’s genome, to be capable of replication, it must be full length, which means it must have a full LI 5’ UTR.

[0032] The genome of an individual can exist in one of several states (with respect to pLl) with respect to each site on each chromosome at which polymorphic Lis have been found to date. First, the genome may be the normal sequence of the chromosome on both copies of the chromosome carried by that individual at the particular site a pLl can insert: in that case, a pLl is not present at that site in either copy of that individual’s chromosome. Second, one copy of the chromosome at that site can have the normal genomic sequence, but the other copy of the chromosome can a genomic sequence interrupted by the presence of the sequence of a pLl, which shows that one copy of the individual’s chromosome carries a pLl at that location. Third, the normal sequence of both copies of the chromosome at that site may be interrupted by the presence of the sequence of a pLl, in which case the sequence shows that a pLl is present in both copies. The fact that a pLl sequence is present in one or both copies of the chromosome at that site does not necessarily mean it is active. If the pLl sequence at the site does not commence with the start of the LI 5’ UTR, it is not a full-length sequence, and cannot be active.

[0033] As noted above, as of this writing, there are over 800 sites in the human genome at which pLls have been reported to have been found to be inserted. Analyzing the sequence of an individual’s genome on either side of a point in the genome at which a pLl is known to insert therefore allows the practitioner to determine whether or not a pLl is present in that individual at that genomic location on at least one of the individual’s two copies of the chromosome on which that genomic location occurs. In some embodiments, the inventive methods and devices allow that determination to be made conveniently for each of the over 800 pLls that have been identified to date, to identify if the individual has a pLl copy on one copy of a given chromosome or a pLl present on both copies of the chromosome, and to identify if the individual bears any pLls that have not been identified to date. Additional pLl insertions are still being found from time to time as research into the genome and LINE-1 elements continues. It is anticipated that any such new pLl insertion points will be added to the list of known pLl insertion points so that it can be also be determined whether a subject has a pLl at the newly -known insertion point on one or both copies of the chromosome on which the new pLl insertion point has been identified. It is further anticipated that in embodiments of the inventive methods using a probe set to detect the full set of sites at which pLls have inserted into an individual’s genome, probes for such newly identified sites will be added to the probe set to improve the diagnostic power of the methods and of devices using them to analyze the resulting genomic information.

[0034] This section will first present a brief overall of some embodiments of the inventive methods, followed by a more detailed discussion of some aspects. The entire human genome was first sequenced in 2001, and almost all of the sequences of all 22 numbered chromosomes and of the X and the Y chromosomes have been identified. As noted above, the over 800 currently-identified sites at which pLls have been found to insert into the genome are also known, as is the normal genomic sequence at each site if a pLl has not inserted itself at that site. For convenience of reference, the position in the sequence of a chromosome at which a particular pLl element inserts into the normal genomic sequence is sometimes interchangeably referred to herein as the “pLl insertion point” or the “pLl insertion site.” The portions of the genomic sequence adjacent to the presence of a pLl can be referred to as being upstream (“5””) or downstream (“3”’) of the insertion point, respectively. For clarity with respect to later portions of the discussion, it is noted that a pLl may be inserted into any particular insertion site in a subject’s genome in either orientation (that is, 5’ to 3’ or 3’ to 5’). Thus, the 5’ portion of the subject’s genomic sequence adjacent to the inserted pLl may be adjacent to the 3’ end of the pLl, while the 3’ portion of the genomic sequence adjacent to the inserted pLl may be adjacent to the pLl’s 5’UTR, or vice versa. Further, as noted elsewhere herein, while the sequence of all pLls is the same or closely the same as all others, there are over 800 locations in the genome at which a pLl has been found to insert. For clarity, references to an individual having a given number of pLls refer not to different types of, or variations between, the pLls, but to the number of locations in that individual’s genome at which a pLl was found to have been inserted.

[0035] In some embodiments of the inventive methods, a sample containing genomic DNA from the subject is obtained (references to a “subject,” “individual” or “patient” herein refers to a human subject). The sample can be obtained from any part of the subject’s body, and can be taken prenatally (in the case of genetic testing of a fetal genome), shortly after birth, or at any time thereafter during the subject’s life.

[0036] Collection of DNA from an individual is routine. Fetal DNA can be collected by, for example, amniocentesis. Post-natally, it is conveniently performed by swabbing the inside of the subject’s cheek with a cotton swab, collection of a blood sample, or by taking a biopsy of any part of the individual (including, for neonates or for persons whose umbilical cord has been preserved, their umbilical cord). The genomic DNA is then isolated and fragmented, typically by shearing. The technique is generally selected and performed so as to result in randomly-generated segments of genomic DNA (for convenience of reference, hereafter referred to “sheared DNA”) which have been sheared to a length falling within maximum and minimum limits chosen by the practitioner. Typically, the practitioner will choose a maximum length that is convenient and cost effective to sequence using the techniques available at the time the DNA fragments are being sequenced, and a minimum length that is sufficient to identify target portions of the genome, as discussed further below. In current practice, the segments of sheared DNA are preferably between 100-600 base pairs (“bps”) in length, more preferably 150-500 bps in length, still more preferably 200-500 bps in length, even more preferably 250-450 bps in length, most preferably about 300 - about 400 bps in length, where “about” means ±25 bps. [0037] In preferred embodiments, the sheared DNA fragments are then tagged in a manner that allows sequences of interest to be captured or otherwise enriched, while allowing non target sequences to be eluted or otherwise removed from the sequences to be sequenced. The capturing is typically conducted using DNA or RNA sequences complementary to the sequence or sequences of interest (sometimes referred to herein as the “target” sequences). For example, the sheared fragments can be contacted with SURESELECT® probes (Agilent Technologies, Inc., Santa Clara, CA), which are complementary to one or more sequences of interest, such as the 5’UTR of LI. Agilent’s “SureDesign” system allows constructing probes customized for target sequences of interest. SURESELECT® probes are biotinylated. The probes are placed in contact with the sheared DNA fragments under conditions allowing them to hybridize to the complementary target sheared DNA sequences with the desired degree of stringency. The sequences hybridized to the probes are then captured by contacting the hybridized sequences with the biotinylated probes to streptavidin-coated magnetic beads. The streptavidin-coated magnetic beads, with the captured sheared DNA sequences hybridized to the probes, can then be retained, while sheared DNA not having a target sequence complementary to that of the probes can be eluted or otherwise removed.

[0038] In some embodiments, the streptavidin-coated magnetic beads may be disposed on a solid support. A variety of such solid supports are used in the art but, by way of example, the solid support may be in the form of beads, of a microwell in a multi-well plate, or a slide. In these embodiments, the captured sequences will be retained on the solid support, while sheared DNA that has not hybridized is washed away.

[0039] The sheared DNA that hybridized to the probes is then released from the probes, eluted, and subjected to next generation sequencing protocols. The sequences of the sheared DNA are then entered into and analyzed by bioinformatics software programmed to make the determinations discussed below.

Probes, Sequences, and Methods of Detecting the Number and Locations of pLls in a Individual Subject

[0040] As noted above, over 800 genomic locations have been identified at which a pLl has been found to insert. The over 800 known pLl insertion sites known as of this writing are set forth in Table 2, along with some insertion points of fixed Lis that are known to cause mutations in certain cancer types (the fixed LI insertion points are designated by being preceded by an asterisk). In some embodiments, the inventive methods and devices comprise two types of DNA or RNA probes, which will be discussed in turn.

[0041] The following discussion sets forth the design of probes to determine, first, which of the over 800 sites at which pLls are known to insert are occupied by a functional pLl in an individual, and, second, to determine if that individual also has a pLl present at a site which has not previously been identified as one in which a pLl may insert into the genome. Figures 1A and B are schematic diagrams which set forth the design plan for probes for use in some embodiments of the inventive methods to pull pLls and PA2s out of an individual’s genome for analysis. Fig. 1C is a schematic diagram of methods used in various embodiments to use probes to detect the locations at which pLls have inserted into the genome of an individual. For example, if the probes used include probes specific for each one of the over 800 sites at which pLls have been found to insert in the human genome, and probes that will detect the presence of any pLls that have inserted at one or more sites at which pLls are not known to insert as of the time the probe set is used, the methods should result in detecting all of the pLls in the individual’s genome. If the probes used are limited to those for a specific subset of locations at which pLls are known to insert, such as a probe set designed to detect the presence of pLls at the specific locations shown in Table 2 to be present only in persons who have been diagnosed with specific conditions, then those embodiments will detect only the presence of pLls at those positions as opposed to all of the positions in the individual’s genome at which a pLl has inserted. For clarity, it is noted that an assay using probes for all of the sites at which pLls are known to insert will also detect pLls that have inserted at sites identified in Table 2 as present only in persons with specific conditions. Using a probe set specific to one or more chosen subsets, however, such as to detect whether an individual has pLls inserted at one or more sites that are occupied by a pLl only in individuals that have been diagnosed with prostate cancer, allows gathering desired information while sharply reducing the number of genomic fragments that have to be sequenced, and the resulting number of sequences that have to be analyzed by bioinformatics, and thus can reduce the cost of reagents used and time to analyze and report the information of interest.

[0042] Fig. 1A, top two lines, shows the design of probes to detect the presence of pLls that insert at known, annotated locations in the genome (pLl#l and pLl#2 in Fig. 1 A). Probes designed to detect whether the genome of an individual does or does not have a pLl inserted at one of the over 800 sites at which pLls are known to insert consist of a DNA sequence with two components: (1) a sequence complementary to the genomic sequence immediately before the site at which a pLl is known to insert, followed without interruption by (2) a sequence complementary to the pLl 5’ UTR. Design of such probes is discussed in more detail later in this section.

[0043] Fig. 1 A also shows the design of a probe for a site at which a pLl is known to insert, but which has not yet been annotated in the consensus human genome. The presence of such known, but unannotated pLls, can be detected using probes similar to those just described to detect pLl# and pLl#2. Probes such as those shown for pLl#l and 2 for pLls at sites that have not yet been annotated, however, are more costly than probes for sites that have been annotated. The probe shown for pLl#3 avoids the cost concern by being comprised of the genomic sequence at the point the unannotated pLl has been found to insert. For clarity, it is noted that all functional pLls have the same or closely similar sequences. The designations of the pLls that insert at the particular hypothetical sites shown in Fig. 1A as pLls “#1,”

“#2,” and “#3” is for the reader’s convenience of reference regarding the sites at which a pLl is known to insert into the genome, and in determining whether a pLl has or has not inserted at any particular site in the genome of a particular individual or a particular population of individuals.

[0044] Turning now to designing probes for use in various embodiments of the inventive methods, the first type of probes consists of sequences designed to be complementary to the sequences surrounding and including a plurality of, and preferably all, of the known pLl insertion points (that is, a first probe has a sequence designed to be complementary to the first pLl insertion site known on chromosome 1, a second probe has a sequence designed to be complementary to the second pLl insertion site known on chromosome 1, and so on). The sequences are preferably short enough to be readily made, but long enough to hybridize to and thereby capture segments of sheared DNA that are complementary to the probe under the selected hybridization conditions. Analyzing the fragmented DNA captured by the probes then reveals whether the subject has a pLl inserted at each of the insertion points for which a probe is provided, whether the pLl is inserted in each of the two copies of the chromosome in the subject’s genome or just one, and whether the pLl is likely to be full-length and therefore capable of being active, in which case the practitioner can optionally choose it as a candidate for further sequencing to verify its sequence and if the pLl is indeed full-length.

[0045] For each of the pLls annotated in the human genome, the selection of the coordinates should account for the presence of the respective Lis in the genome. For such pLls, the human genome sequence immediately upstream or immediately downstream, or both, of the location of the pLl insertion point will be used to determine the complementary sequences to be used for the probes. (Probes to the genomic sequence either upstream or downstream of an insertion point are expected to capture sequence from an inserted pLl; having probes to the genomic sequence both upstream and downstream of an insertion point provides redundancy and can be used in some embodiments.)

[0046] To illustrate how the probes allow determining which pLls are present in an individual and whether the individual has a pLl on one copy or of both of a particular chromosome at a particular insertion point, the discussion below uses as an example the first pLl insertion point in chromosome 1. Referring to Table 2, the first pLl insertion point shown on chromosome 1 is at position 32004332. If the practitioner elects, as an example, to use DNA probes of 300 bp 5’ and 300 bp 3’ of the insertion point, the probes will therefore be made to have a sequence complementary to that of chromosome 1 from position 32004032 to position 32004632. (For the reader’s convenience in focusing on the positions of the genomic sequence of interest, some of the discussion below omits the leading numbers 32004, which are indicated by an apostrophe.)

[0047] When fragmented DNA from whom the DNA sample is captured, eluted, and sequenced, if the individual has no pLl present at this site on chromosome 1, all the DNA captured by the probes for this pLl insertion point on chromosome 1 will have the normal genomic sequence at positions Ό32 to ‘632 (sites at which apLl has not inserted in a subject’s genome are sometimes referred to as an “empty site”). If a pLl is present on one of the two copies of the chromosome at position ‘332, sequencing of the sheared DNA captured by the DNA probe will reveal (1) some sequences that have the normal genomic sequence at positions ‘032 to ‘632 and some that have the normal genomic sequence at positions Ό32 to ‘332 and then a portion of sequence from pLl and (2) some sequences that have pLl sequence, followed by the normal genomic sequence of positions ‘333 to ‘632. If the subject has a pLl present on both copies of chromosome 1, sequencing of the sheared DNA captured by the DNA probe will determine that all the sequences have the normal genomic sequence at positions ‘032 to ‘332 and then a sequence from pLl, and other sequences having a portion of pLl sequence, followed by the normal genomic sequence of positions ‘333 to ‘632.

[0048] As persons of skill are aware, pLl can insert into the genome in either 5’ to 3’ orientation or 3’ to 5’ orientation. As only full-length pLl can be active, if the 5’ sequence of the pLl does not commence with the start of the pLl 5’ UTR, in whichever orientation the pLl has inserted, the pLl cannot be full length, and cannot be active. Similarly, for sequences having a 3’ portion of pLl, if the pLl sequence does not terminate in the end of the pLl 3 ’UTR, the pLl cannot be full length, and cannot be active. The genomic sequences at locations in which insertions of pLl have occurred that are less than full length can optionally be reviewed to determine whether the insertion of LI sequence has disrupted a coding sequence, has dismpted a promoter, or might otherwise be causative of a disease or contribute to disease progression. Only pLls that commence with the beginning of the pLl 5’ UTR and end with the end of the 3’ UTR can be full length and are likely to have the capacity to be functional. Thus, the sequencing allows a ready determination of whether the pLl present is likely to be full-length, and therefore has the capability to generate de novo inserts or other types of genomic instability associated with LI enzymatic function.

[0049] A second type of probe, a sequence complementary to the beginning of the LI 5’ UTR is also present on the solid support or supports. Preferably, this second type of probe comprises a sequence of about 300 bp to about 400 bp of the beginning of the 5 ’UTR, with about here meaning ± 25 bp. The LI 5 ’UTR is approximately 900 bp in length, but the probes use a sequence complementary to that of the beginning of the 5 ’UTR sequence as, once again, only full-length Lis that might be active are of interest. These probes, which for convenience may be referred to as the “LI probes” will hybridize to the 5 ’UTR of any full- length LI present in the sample, including known Lis and any unknown Lis that are present in the sheared DNA, along with any genomic sequence upstream of the pLl that is on the segment of sheared DNA.

[0050] Sequencing of the DNA sequences upstream of the Lis captured from the subject’s sample and comparing those sequences to the sequences upstream and downstream of the 826 sites at which pLl is known to insert, and all full-length Lis annotated in the human genome will reveal whether each sequence captured by the LI probe is (1) from one of the already identified pLl insertion points, (2) from the site of a previously annotated fixed LI or, (3) a site not previously identified as a LI insertion point and therefore a previously unknown pLl.

[0051] Further, the LI probe acts as an internal control to confirm that all components of the method worked as intended. If the methods and devices are working as intended, the pLl probe will capture all the full-length Lis present in the individual’s genome, including not only the polymorphic Lis, which by definition can vary in number from individual to individual, but also the fixed Lis, which by definition are the same in every individual. Specifically, as graphically depicted in Fig. 1C, the probe detects the presence of the fixed LI type referred to as “PA2.” While in some embodiments, the inventive methods can seek to detect the presence of all fixed Lis in the genome, the detection of PA2s is preferred. First, this reduces the number of fixed Lis to be sequenced and analyzed to a little under 1000 per genome. Second, the sequence of PA2 is closer to that of pLls than the sequence of other fixed Lis, and thus serves as a better control for determining if the sample has been handled correctly such that the number of pLls detected is reliable, as discussed in the next paragraph.

[0052] A number of factors can affect whether the inventive methods work as intended, or whether they are providing inaccurate results due to mishandling of the sample or other procedural problems. For example, assume the DNA in the sample has degraded due to improper storage prior to the hybridization step or the wash buffers have been prepared with incorrect salt concentrations. In such cases, LI sequences in the sample may not hybridize to the LI probes or may wash off the LI probes prior to the elution step. Since the genomic sequence upstream and downstream of each fixed LI is known, a comparison of the readout of sequences of genomic DNA around the fixed Lis to the sequences of genomic DNA around the Lis in the sample allows the practitioner to determine the percentage of the annotated fixed Lis detected in each sample compared to the number known to be present in the human genome. As persons of skill are aware, some of the fixed Lis are located in regions of the genome with repetitive sequences and in some cases, the repetitive nature of the sequences surrounding the Lis makes it difficult to distinguish one of these fixed Lis from another. Accordingly, it is expected that, when the methods work as intended, the presence of approximately 97% of the almost 1000 annotated PA2s should be detected. Detection of less than 95% of these annotated fixed Lis indicates that there has been a problem with the assay. In such cases, the practitioner can review the sample to determine if the problem is with the quality of the DNA, in which case a fresh DNA preparation should be used, or if there was a problem with preparation of the reagents, in which case fresh reagents should be prepared and the test rerun using the fresh reagents.

[0053] The sections below discuss various embodiments of the inventive methods and devices. Isolation of Genomic DNA and Shearing

[0054] As mentioned, in some embodiments, the inventive methods involve isolating DNA from a subject and hybridizing it to probes. Obtaining DNA from a subject is well known, as evidenced by the kits provided at modest cost by companies which offer DNA analysis to members of the public. Isolating DNA and sequencing it has been well known in the art for decades, as exemplified by Roe, Crabtree, and Khan, DNA ISOLATION AND SEQUENCING, John Wiley & Sons, New York, 1996. Kits and equipment for isolation of research-ready genomic DNA are commercially available, as exemplified by the GenFind V3 Blood and Serum DNA isolation Kit (Beckman Coulter Life Sciences, Indianapolis, IN), which can be performed using a 96-well plate configuration to increase sample throughput. A Biomek i7 Hybrid Genomics workstation (Beckman Coulter Life Sciences) can be used for automated processing of 96 samples at a time. It is assumed that the practitioner is familiar with methods for isolating genomic DNA suitable for use in the inventive methods and systems.

[0055] Shearing and other methods for randomly fragmenting DNA have been used since the 1970s, and one of the present inventors was one of the originators of DNA shearing in the preparation of DNA sequencing libraries. See, Deininger, Anal Biochem, 1983, 129(1):216- 223. Low pressure shearing as a technique for obtaining randomly fragmented DNA was investigated as early as 1990 (see, e.g., Schriefer et ah, Nucleic Acids Res. 1990; 18(24):7455-7456). Hydrodynamic shearing of DNA was widely adopted in the 1990s and 2000s, as discussed in, e.g., Thorstenson et ah, Genome Res., 1998; 8:848-855; doi:10.1101/gr.8.8.848; Oefner et ah, Nucleic Acids Res., 1996, 24:3879-3886; Hengen, Trends Biochem Sci, 1997, 22(7):273-274; and Joneja and Huang, Biotechniques, 2009, 46(7):553-556. More recent techniques for fragmenting DNA include lateral cavity acoustic transducers (LCATs) designed by Okabe and Lee (J Laboratory Automation, 2014,

19(2): 163-170) that can be integrated into microfluidic platforms to automate DNA processing. Okabe and Lee note that it is desirable to fragment the DNA to about the size of the probes to improve both hybridization and sensitivity. Id. It is assumed that the practitioner is familiar with the various methods known in the art for fragmenting DNA, whether by shearing or another method, to sizes desired by the practitioner for use in the methods disclosed herein. Probes

[0056] DNA or RNA probes are used to capture complementary DNA from the subject. As discussed above, the compositions and methods comprise two types of DNA or RNA probes: a first set of probes which are complementary to the genomic DNA at the sites in the genome at which pLls are known to insert, and a second probe which is complementary to 200 or more bases of LI sequence, preferably the first 200 or more bases of the beginning of the 5’UTR. Current technology makes it relatively convenient to make probes of about 300- about 400 bases, with “about” meaning ±25 bases, and to sequence DNA of about that length that hybridizes to those probes. Table 2 sets forth the insertion points of the over 800 sites at which pLls are known as of this writing to insert in the genome. A probe consisting of a sequence complementary to the 300 bases upstream of the pLl insertion point for any given known pLl insertion point is expected to hybridize uniquely to sheared DNA from the subject from that genomic position which, depending on where the subject’s DNA sheared randomly, may also carry with it LI sequence from the 3’ end of the LI or the beginning of the LI 5’UTR, if a full-length pLl is present in the subject at that site. Similarly, a probe consisting of a sequence complementary to the 300 bases downstream of the pLl insertion point for any given known pLl insertion point is expected to hybridize uniquely to sheared DNA from the subject from that genomic position which, depending on where the subject’s DNA sheared randomly, may also carry with it LI sequence from the end of the LI 3’UTR, if a full-length pLl is present in the subject at that position.

[0057] As practitioners will recognize, DNA and RNA synthesis and DNA sequencing technologies are continually improving and the cost and difficulty of synthesizing longer probes is expected to come down. The use of longer probes, such as probes between about 400 and about 500 bases in length, between about 500 and about 600 bases in length, between about 600 and about 700 bases in length, between about 700 and about 800 bases in length, between about 800 and about 900 bases in length, or between about 900 and about 1000 bases in length are expected to be useful in the compositions and methods as the cost and ease of sequencing makes them cost effective, with “about” meaning ±25 bases. Probes longer than 1000 bases could be used if price and synthesis difficulty come down enough to justify their use, but are believed to be unnecessary, as they are not expected to improve the ability of the compositions and methods to identify the presence of pLls in the subject over probes of between about 200 to about 1000 bases in length. [0058] As noted in the Okabe and Lee reference cited in the preceding section, the lengths of the probes and of the sheared DNA from the subject are preferably about the same length. Thus, if the practitioner uses a longer probe, the DNA of the subject is preferably sheared to a similar length. It is expected that it is within the skill of the practitioner to adjust the shearing techniques used to shear DNA samples to desired lengths, such as those mentioned above.

Modification of probes, hybridization, and capture of targeted DNA solid supports

[0059] The inventive methods, systems, and apparatuses can use DNA or RNA probes attached to supports to capture for analysis DNA from the subject. Synthesizing DNA or RNA sequences for use as probes and attaching them to supports, or synthesizing DNA or RNA probes directly on a solid support has been known in the art for at least two decades.

For example, the Affymetrix, Inc. GENECHIP®, has been sold commercially since 1994.

[0060] DNA or RNA probes can be synthesized with terminal modifications that allow them to attach to glass or other surfaces, while still being able to hybridize to target sequences. Various options are available in the art for capture and enrichment of the target DNA sequences using probes attached to solid supports. One example is the Agilent SureSelect XT HS target enrichment system discussed above, in which the probes are biotinylated and captured by magnetic beads coated with streptavidin. Another technique attaches DNA to a glass surface by attaching a digoxigenin (dig) molecule to the DNA and attaching an anti-dig antibody to the glass surface by non-specific adsorption. The DNA molecule is then tethered to the glass surface by allowing the dig to be bound by the anti-dig antibody. See, e.g„ Kruithof et ah, Nat Struct Mol Biol. 2009; 16(5):534-40; Smith et ah, Science. 1992; 258:1122-1126. For convenience of reference, modifications of DNA or RNA probes that allow the probes to specifically bind to a capture molecule disposed on a solid support may be referred to herein as being “tags” and probes bearing such modifications as being “tagged.” When targeted DNA hybridizes to the tagged probes, the hybridized DNA can then be captured on the solid supports, allowing the DNA which has not hybridized to the probes to be eluted, thereby enriching the targeted DNA.

[0061] Glass or silica can be treated with amino silane reagent to coat their surfaces with amines or epoxides, which can then react with modified nucleotides to bind DNA to the surface. Schlingman et ah, Colloids Surf B Biointerfaces. 2011; 83(1): 91-95, disclose a method to attach DNA to a glass surface using N-hydroxysuccinimide (NHS) modified PEG. The glass surface is coated with silane-PEG-NHS and DNA of interest is modified with a single terminal amine group that allows covalent linkage through a reaction between the NHS group and the amine. Adessi et al., Nuc Acids Res, 2000; 28(20) p. e87, doi.org/10.1093/nar/28.20.e87, review a variety of chemistries that have been used to covalently attach DNA to glass or other surfaces, including 5'-succinylated target oligonucleotides immobilized on amino-derivatised glass slides, 5 '-disulfide modified oligonucleotides bound via disulfide bonds onto thiol-derivatised glass slides, the use of cross-linkers, such as phenyldiisothiocyanate or maleic anhydride, and the use of l-ethyl-3- (3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC). Adessi et ak, also note that carbodiimide chemistry has been used with supports such as amino controlled-pore glass, latex beads, dextran supports, and polystyrene microwells. It is assumed that the practitioner is familiar with these and other methods of using DNA and RNA probes to capture and enrich from a genome target DNA sequences for sequencing.

[0062] Many conventional chips for capturing DNA are microarrays, in which the positions of the probes are registered and the information desired by the practitioner is the presence or absence of DNA hybridized to the probe at a particular location or plurality of locations. In embodiments of the inventive methods and systems, however, the information of interest is the sequence of the segments of sheared DNA from the subject. Thus, it is unnecessary to have the probes at particular positions on the solid surface. Accordingly, while the solid support can be a planar surface, such as a slide or a chip, it can alternatively be a bead or a well in a multi-well plate.

[0063] In some embodiments, fragmented or sheared DNA from the subject is hybridized to complementary DNA or RNA probes to capture DNA of interest. Protocols and conditions for hybridizing fragmented or sheared DNA to probes are well known in the art and it is expected that practitioners are familiar with the guidance already available in this area.

[0064] As practitioners will appreciate, the sequences used as probes for the subject’s genomic DNA are based on consensus sequences in GenBank. The sequences of subjects are expected to contain variations from the consensus sequence, due to single nucleotide polymorphisms (SNPs) or to other genetic variations. It is expected that the hybridization conditions will not be so stringent as to prevent hybridization due to these variations. Similarly, it is expected that the 5’UTR of functional pLls will contain occasional SNPs or other genetic variations. It is expected that the practitioner can readily select hybridization conditions that will not be so stringent as to prevent hybridization due to these variations. It is noted that adjusting stringency conditions to allow desired hybridization is routine in the art.

Release and hybridization

[0065] Once the targeted DNA has hybridized to the probes and captured on the solid supports, non-hybridized DNA is typically eluted, as in standard protocols for enrichment of target DNA. The targeted DNA that has hybridized to the probes is then released, eluted, amplified, and provided to conventional DNA sequencing. The amplification can be by any convenient means deemed suitable by the practitioner, including conventional PCR or droplet digital PCR (see, e.g. Olmedillas-Lopez et al., Mol Diagn Ther. 2017 Oct;21(5):493-510. doi: 10.1007/s40291-017-0278-8. As tens of thousands to millions of fragments of targeted DNA are typically captured in such protocols, the sequencing typically results in a like number of sequences. The sequences of the targeted DNA are typically then entered into a bioinformatics program which analyzes the sequences.

Bioinformatics programs

[0066] In some embodiments, DNA from is sequenced and analyzed to detect with respect to each of the over 800 sites at which pLl is known to insert into the genome, to determine which sites in an individual’s genome have a pLl present, to detect the presence of fixed Lis in the subject’s genome, and to detect the presence of any pLls in the subject at sites at which a pLl was not previously known to occur. The detection of these LI elements is developed from analyzing normal, non-Ll element genomic sequence that hybridizes to the DNA or RNA probes. Given the large number of genomic sequences (over 800 just for the pLl elements identified as of this writing), to be analyzed to determine the particular sites a pLl has inserted in the subject’s genome, plus the determination of whether all the approximately 1000 fixed L1PA2 elements have been detected, to verify that the hybridization and other conditions worked as intended, it is not possible for these analyses to be performed by hand calculation. Accordingly, performing the methods of the invention requires the use of bioinformatics software to analyze the sequence information.

[0067] Dozens of free and paid bioinformatics programs are available for comparing and analyzing nucleotide sequences. To list just a few, the free software programs include the European Molecular Biology Open Software Suite (“EMBOSS”), Integrated Genome Browser (IGB), GENtle, and Jalview. Paid DNA bioinformatics software includes CLC Genomics Workbench (QIAGEN Aarhus, Aarhus, Denmark), Partek® Genomics Suite, and Vector NTI Advance® (Invitrogen). Practitioners typically have preferences based on their prior use of and familiarity with particular software packages and compatibility with their computer system. It is anticipated that the practitioner is capable of choosing and using a software package suitable for use in the inventive methods and systems.

Systems, Devices, and Kits

[0068] As discussed above, in some embodiments, two sets of DNA or RNA probes, or combinations of DNA and RNA probes, are used, a first set which is complementary to genomic sections in which pLls have previously been found in the human genome, and a second set, which is complementary to several hundred bases of the sequence of LI, preferably of the beginning of the LI 5‘UTR or the end of the 3’ UTR (these sets of probes are sometimes referred to herein as the “first set” and the “second set” of probes, respectively, or together simply as “the probes”). The probes are preferably disposed on solid supports. In some embodiments, the probes are covalently attached to the supports so they do not wash off the supports in later wash and elution steps. In some embodiments, the probes are conjugated or fused to provide a terminal modification, or tag, which allow the probes to specifically bind to a solid support, either directly, or through a linker that specifically binds the tag. For example, the probes may be modified by biotinylation or by containing digoxigenin, as discussed above in the section on probes. When targeted DNA hybridizes to the tagged probes, the hybridized DNA can then be captured on the solid supports, allowing the DNA which has not hybridized to the probes to be eluted, thereby enriching the targeted DNA.

[0069] Isolated DNA from an individual is obtained (the DNA may be obtained in the form of cells which are lysed and from which the DNA is isolated, or may be obtained as already- isolated DNA) and is fragmented into a size selected by the practitioner, typically by shearing (as the DNA is preferably fragmented by shearing, for convenience of reference, the DNA fragments will be referred to below as having been “sheared,” even if another technique has been used to fragment the DNA). The sheared DNA is in lengths of at least 100 bases in length, more preferably about 200 bases, more preferably about 250 bases, still more preferably about 300 to about 400, in some embodiments about 400 to about 500, in some embodiments about 500 to about 600, in some embodiments about 600 to about 700, with “about” in this context meaning ±25 bases. The fragmented DNA from the subject is then placed in contact with the DNA or RNA probes under conditions which allow fragmented DNA from the subject that is complementary to that of the probes to hybridize. The fragmented DNA from the subject which has not hybridized to the DNA or RNA probes is washed away, after which the DNA which has hybridized is eluted and sequenced.

[0070] In some preferred embodiments, the process of hybridizing and sequencing the DNA fragments is conducted in an automated device configured for the purpose. In some embodiments, the automated device is a microfluidic device. In some embodiments, the automated device is configured to allow high-throughput of samples. This can be accomplished by, for example, using a multi-well plate system or by other apparatuses allowing multiple runs of samples undergoing the same procedures, such as parallel microfluidic chambers.

[0071] The sequencing of the DNA from the subject that has hybridized to the DNA or RNA probes typically results in tens of thousands to millions of sequences from each individual. The sequences are provided to a bioinformatics program which is programmed to compare the sequences from the first set of probes to the sequences of the genome upstream and downstream of the insertion points of the 826 sites at which pLl elements are known to insert into the genome, as listed in Table 2, below, as well as the LI 5’UTR and ‘3 UTR sequences and to identify and record (a) whether for each of the known pLl insertion sites, the individual shows the genomic sequence present at each of the potential pLl insertion points, (b) whether the sequence shows that the genomic sequence normally at each potential pLl insertion point has a LI 5 ’UTR sequence commencing from the beginning of the 5’UTR, (c) whether, for each potential pLl site that does have a LI sequence commencing from the beginning of the LI 5’UTR, the site also has a sequence with the ending of the LI ‘3UTR, and (d)(i) the total number of Lis sequences that have bound to the second probe set and the surrounding genomic sequence, (ii) determine from comparing the genomic sequences upstream or downstream, or both, of each of the LI sequences detected to the genomic sequences surrounding each of the fixed Lis in the genome how many of the fixed Lis have been detected, whereby detecting less than 95% of the number of fixed Lis indicates that there was a problem with the detection and that the subject’s DNA should be rescreened, and (iii) determine by detecting any LI elements that are surrounded by genomic sequence not previously identified as a point at which an LI element inserts that the individual has a previously unidentified pLl. As noted in a previous section, Table 2 includes both the approximately 800 currently known pLl insertion points and, as a positive control, a small number of insertion locations of fixed pLls known to be active in particular cancers.

[0072] In some embodiments, the invention further provides electronic devices configured for determining how many polymorphic LINE-1 elements (“pLls”) which pLls have a 5’ untranslated region (“5’UTR”) and a 3’UTR, which 5’UTR begins with a contiguous sequence of at least 300 bases and which 3’UTR terminates in a contiguous sequence of at least 300 bases, are present in genomic DNA of a subject, and at which of the sites at which pLls are known to insert the pLls are present in the genomic DNA of the subject. The devices comprise a processor and memory, in which the memory stores computer executable instructions for performing the methods set forth above.

[0073] In some embodiments, the invention provides kits. The kits provide sets of probes, which can be for detecting with regard to each of the over 800 insertion sites whether or not a pLl has inserted in that site with respect to an individual’s genome, or for detecting with regard to a subset of sites at which a pLl is known to insert, such as pLls identified in Table 2 as found by WGS, SCORE, or both, only in individuals diagnosed with breast cancer, pLls identified in Table 2 as found by WGS, SCORE, or both, only in individuals diagnosed with prostate cancer, pLls identified in Table 2 as found by WGS, SCORE, or both, in genomes of both individuals diagnosed with breast cancer and in genomes of individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS,” pLls identified in Table 2 as found only in individuals diagnosed with Alzheimer’s Disease, or, pLls identified in Table 2 as found by WGS, SCORE, or both, in individuals diagnosed with Alzheimer’s Disease, in individuals diagnosed with breast cancer, and in individuals diagnosed with prostate cancer, but not in genomes of individuals listed in Table 2, column “Cont-WGS.

EXAMPLES

Example 1

[0074] This Example sets forth materials and methods that were used for finding polymorphic Lis in studies reported herein.

Materials and Methods

Human whole genome sequencing samples: [0075] Human Prostate adenocarcinoma and Breast invasive carcinoma WGS (whole genome sequencing) samples were downloaded from the National Cancer Institute Genomic Data Commons (“GDC”) Data Portal. Human Cognitively normal (control) and Alzheimer’s Disease WGS samples were downloaded from the Alzheimer’s Disease Neuroimaging Initiative (“ADNI”) data archive. Cognitively normal patients with cancer were excluded from the control group.

Prostate cancer patient samples

[0076] Buffy coats and patient metadata from prostate cancer patients were obtained through the Tulane Urology Department Biospecimen Bank. Genomic DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen N.V., Germantown, MD) and submitted for SCORE analysis.

Cell culture

[0077] MCF7 cells (American Type Culture Collection (ATCC), Manassas, VA, #HTB-22) were maintained in Minimum Essential Media (“MEM”) (Gibco™, Thermo Fisher Scientific, Waltham, MA) supplemented with 10% bovine serum (Gibco), sodium pyruvate, essential and nonessential amino acids, and L-glutamine.

Targeted sequencing

[0078] Targeted sequencing probes were designed by Agilent Technologies, Inc. (Santa Clara, CA) for all known polymorphic LI insertion sites and for fixed PA2 loci. Fragmenting of DNA and paired-end sequencing was performed by BGI Americas (San Jose, CA).

Bioinformatic analysis

[0079] Paired-end sequencing files were obtained through SCORE targeted sequencing or extracted from WGS alignment files. The paired alignment files for each sample were aligned separately to the human LI consensus sequence using STAR v2.3.0e alignment software and allowing one alignment per read (-outFilterMultimapNmax 1) and a maximum of 25 mismatches (-outFilterMismatchNmax 25). Alignments that occurred in the first 700bp of the LI consensus sequence and were in the reverse orientation to LI were extracted. These reads were then used to find their pair based on matching read IDs. The opposite read pair was then aligned to the human genome using bowtie vO.12.8, requiring unique alignments (-m 1), the tryhard setting (-y), and allowing 3 mismatches (-v 3). Alignments in the resulting file were then parsed for read alignments that occurred within the 5’ upstream region of known polymorphic LI loci and LI PA2s. This was done using bedtools v2.22.0.

Example 2

[0080] This Example compares the ability of the inventive methods to detect PA2 fixed Lis to that of whole genome sequencing. PA2s are present in approximately 1000 fixed positions in the genome of all individuals and the number found should therefore be the same regardless of which method is used to detect them.

[0081] A study was conducted using information obtained from whole genome sequencing of patient samples diagnosed with Alzheimer’s Disease, breast cancer, or prostate cancer, or individuals who had not been diagnosed with any of these conditions (“controls”), and information developed by analyzing the genome of a breast cancer patient and of prostate cancer patients by the inventive methods.

[0082] The results of this study are presented as a bar graph in Figure 2, with each genome analyzed represented as a circle. As shown in Figure 2, all the genomes analyzed (except for those discussed further below) showed the same number of PA2s. Those genomes analyzed by whole genome sequencing are labeled by the acronym “WGS,” while those analyzed by the inventive methods are labeled with the acronym “SCORE,” which stands for the phrase “Screen for Content Of Retro Elements,” used by the present inventors to refer to some embodiments of the inventive methods.

[0083] As noted, two of the genomes in the bar presenting the results for Alzheimer Disease patients whose genomes were analyzed by WGS show just over 500 PA2s in the genomes of those individuals, little more than half the number expected. The results for these individuals show that the whole genome sequencing conducted on the genomes of those two individuals failed to detect hundreds of the fixed Lis actually present, and that there was a problem in the sample preparation or subsequent analysis. Thus, determining the number of PA2s present in a sample acts as a control, in which a result showing the presence of a lower number of fixed Lis than are known to exist indicates that there was a problem with either the sample preparation or the analysis.

[0084] Figure IB shows a probe that is complementary to the 5’UTR of polymorphic Lis and fixed Lis, but without having a sequence from the genome. DNA fragments from the LI 5’UTR will bind to the probe, and some fragments will include genomic sequence upstream of the 5’UTR. That genomic DNA upstream of the 5’UTR is then sequenced and compared to the consensus genomic DNA and other sources to determine where in the genome the particular 5’UTR is located and to see if the 5’UTR at the genomic location corresponding to the genomic sequence attached to the 5’UTR fragment is the location of a known fixed LI or a pLl. If a LI is found to always be present in the consensus genomic DNA or other genome sources at a location that does not correspond to a previously known fixed LI, it can be classified as a previously unknown fixed LI. If a complete LI sequence is found to sometimes occur at the newly identified location but not always, that would indicate that the probe has detected a pLl not previously known to occur at the genomic location in question.

Example 3

[0085] This Example shows that the inventive methods detect more pLls in patient samples than does whole genome sequencing, and is therefore a more sensitive detection method.

[0086] Figure 3 presents bars representing the number of pLls found in individuals who had either been diagnosed with one of four conditions, Alzheimer’s disease, breast cancer, or prostate cancer, or, as controls, who had not been diagnosed with one of the three conditions mentioned. The genomes indicated in the Figure as having been analyzed by WGS are from the GDS Data Portal and ADNI data archives, as described in Example 1. The genomes of individuals with prostate cancer analyzed by the inventive methods, shown in the bar labeled “Prostate Cancer SCORE,” are from the Tulane Urology Department, as described in Example 1.

[0087] The Y axis of Figure 3 shows the number of polymorphic Lis determined to be present in an individual in one of the four patient or control samples by either whole genome sequencing (“WGS”) or by the inventive methods, referred to on the Figure as “SCORE.” As noted above, SCORE is an acronym for “Screen for Content Of Retro Elements.” The circles at the top of each bar show the number of pLls found in a sample from a single individual, either with Alzheimer’s disease, breast cancer, prostate cancer, or not known to have any of these conditions (“Control”). A sample for one breast cancer patient was available for analysis by the inventive methods and the number of different pLls found in that patient sample is shown in the bar labeled “Breast Cancer SCORE.” As can be seen from the height of the respective bars, analysis of the genomes of the breast cancer patient and of prostate cancer patients by the inventive methods resulted in detection of notably more pLls than were detected in patients’ cancers of the same organs, but analyzed by conventional by whole genome sequencing. These results demonstrate the increased sensitivity of the inventive methods over the use of a conventional method for finding how many pLls exist in an individual. Finally, two genomes in the Alzheimer’s Disease WGS group showed unusually low numbers of pLls. These are likely the same two genomes shown in Figure 2 as not having detected the expected number of PA2s present in the genome, and therefore indicate that there was a problem with those two samples.

Example 4

[0088] This Example describes the results of a study using information developed from genomes of individuals in the GDC Data Portal and the ADNI data archives analyzed by WGS, or from the Tulane Urology Department analyzed by SCORE, as described in Example 1.

[0089] As described elsewhere in this disclosure, genomes differ not only in how many of pLls have inserted into them, but also in the subsets of the over 800 sites identified to date as to which pLls insert into the human genome. The locations of each of those individual pLl insertion sites is set forth in Table 2, below. Individual genomes from individuals that had been analyzed by WGS or by SCORE were examined for the presence of a pLl at each of the sites identified in Table 2, in an attempt to find pLls or patterns of pLls that were markers of either breast or prostate cancer or of Alzheimer’s Disease, as an exemplary cognitive disorder. pLls that were common to at least one individual in every group examined (Alzheimer’s, prostate cancer, and breast cancer) were excluded as unlikely to be useful as a marker for any of the particular conditions included in the study. Figure 4 shows in graphical form the number of pLls found in each group after excluding pLls that inserted at sites at which a pLl was found at least once in each group. As can be seen, the bars reflecting pLls analyzed by the inventive methods, shown by the label “SCORE” on the Figure, found many more pLls present in individuals with the conditions in question than did analysis by conventional WGS analysis.

[0090] Table 2 lists all of the pLls identified as of this writing, and identifies the pLls which were found in genomes of individuals who had developed one of the conditions listed, as well as those present in cognitively normal individuals who had not been diagnosed with cancer (the group labeled as “Control WGS” in Table 2). As can be seen by referring to Table 2, many more pLls were identified in genomes from prostate cancer patients analyzed by the inventive methods compared to those identified by analysis by WGS.

Example 5

[0091] This Example describes the results of a study using information developed from genomes of individuals in the GDC Data Portal and the ADNI data archives analyzed by WGS, or from the Tulane Urology Department analyzed by SCORE, as described in Example

1.

[0092] As noted in the preceding Example, individual genomes from individuals that had been analyzed by WGS or by SCORE were examined for the presence of each of the 826 pLls identified in Table 2 in an attempt to find pLls or patterns of pLls that were markers of either breast or prostate cancer or of Alzheimer’s Disease (sometimes abbreviated herein as “AD”), as an exemplary cognitive disorder. pLls that were common to at least one individual in every group examined (AD, prostate cancer, etc.), were excluded as unlikely to be useful as indicative of increased risk for one of the conditions included in the study. This Example reports the results of the analysis. Table 2 lists 826 known pLls, and identifies the pLls which were found in the studies reported here to be present in genomes of individuals who had been diagnosed with AD, breast cancer, or prostate cancer, as well as those pLls present in individuals who had not been diagnosed with AD, breast cancer, or prostate cancer (for purposes of this study, the last group of individuals were considered to be controls; the column listing the pLls noted in their genomes is labeled in Table 2 as “Cont WGS.”)· Table 2 sets forth for each of the 826 pLls listed by chromosome and insertion point within that chromosome whether the pLl was found in the genome of an individual who had been diagnosed with AD, with breast cancer, or with prostate cancer, or who had not been diagnosed with any of these conditions as of the time their genome was analyzed for the presence of pLls.

[0093] Figure 5 presents the results in graph form. One hundred and sixty-six pLls were found only in patient with prostate cancer. Referring to Figure 5, it can be seen that, while 166 pLls identified were found only in prostate cancer patients in the groups analyzed, additional pLls were found that were also present in breast cancer patients, but not in AD patients and not in persons that had not been diagnosed with cancer (“Control”). Similarly, additional pLls were found that were also present in Alzheimer’s patients, but not in breast cancer patients or in cognitively normal patients that had not been diagnosed with cancer. Adding these groups together results in 166+80+17+20=283 pLl loci that were found in persons who had been diagnosed with prostate cancer, but not in the control group. The presence of one or more of the 166 pLls unique to prostate cancer in an individual’s genome is expected to indicate that that individual is at an increased risk of developing prostate cancer compared to the general population.

[0094] Referring again to Figure 5, 80 pLls were found to be common to both by individuals who had been diagnosed with breast cancer and individuals who had been diagnosed with prostate cancer. Breast cancer is unfortunately common in women. Prostate cancer, of course, occurs only in men. Males do develop breast cancer, but rarely compared to women. Breast cancer and prostate cancer were chosen for analysis in this study in part because the populations affected by the two cancer types rarely overlap. Thus, the finding of 80 pLls shared by patients diagnosed with one of these cancers suggests that the presence of one or more of these pLls in an individual’s genome indicates that the individual is at heightened risk of developing cancer during their lifetime and should be monitored more closely than might otherwise be the case. Without wishing to be bound by theory, it is believed that the more of these 80 pLls is present in an individual’s genome, the higher the risk that individual will develop cancer during their lifetime compared to those who do not carry those pLls. If the individual is female, she should be monitored for cancer, and in particular for breast cancer, earlier and more often than if she had no other risk factors. If the individual is a male, he should be monitored more closely for prostate cancer and, with respect to any prostate cancer detected, should be monitored more closely for progression of that cancer, than might be practiced for men without these risk factors.

[0095] Ten pLl loci, identified in Table 2, were found to be unique to breast cancer patients. The presence of one or more of these 10 pLls in an individual’s genome indicates that the individual is at elevated risk of developing breast cancer during their lifetime and should be monitored with breast exams and mammograms earlier than patients without one or more of these pLls being present. Further, 80 pLls were found in breast cancer patients that were also present in prostate cancer patients, but not in AD patients or in cognitively normal patients that had not been diagnosed with cancer. Similarly, 17 additional pLls were found that were also present in Alzheimer’s patients and in prostate cancer patients, but not in cognitively normal patients that had not been diagnosed with cancer. Adding these groups together results in 10+80+17+1=108 pLl loci are unique to breast cancer compared to the control group. The presence of one or more of the 18 pLls that breast cancer patients share with AD patients indicate that individuals with one or more of those pLls have an elevated risk of developing breast cancer, Alzheimer’s Disease, or both.

[0096] Alzheimer’s Disease patients were found to have 2 pLl loci that were not shared with the cognitively normal individuals or individuals with either of the two cancers. The presence of one or both of these 2 pLls in an individual’s genome indicates that the individual is at elevated risk of developing AD during their lifetime and should be monitored for cognitive impairment starting in their early 60s. A number of pLls found in AD patients were also found in patients with breast cancer, or with prostate cancer. The presence of one or more of these pLls indicate that those individuals have an elevated risk of developing AD or cancer. For example, 17 pLls are shared by AD, breast cancer, and prostate cancer patients, thus their presence in a genome would indicate a risk of developing any of these three diseases.

[0097] Combined, these findings demonstrate that age and gender should be considered when interpreting pLl content relevant to the risk of developing disease. This is because females will not develop prostate cancer, while males can, although rarely develop breast cancer. Similarly, defects in DNA repair genes detected by genetic tests combined with pLl content are expected to have a better predictive power of a disease risk than they do alone. [0098] 481 pLls were found to be shared among the four groups analyzed. A pLl was considered to be shared if it was found in at least one of the samples within each group. Some of these pLls could remain important for a specific disease included in this analysis or in other diseases because their allelic frequencies may differ between different groups. For example, some of these pLls may be found more frequently in breast cancer patients than in controls, which would indicate that they may carry some risk of association with breast cancer. The same is true for any pLls that are shared between controls and any individual diseases. Table 2 also includes 6 fixed Lis, each set off by an asterisk, which have been found to be active (that is, able to cause mutations) in persons with cancer. As noted earlier, by definition, a fixed Lis is present in every human genome, these six fixed Lis do not by themselves indicate that a person carrying them is at greater risk for cancer than any other member of the population.

TABLE 2

Legend for Table 2:

[0099] “Chrom” and “chr”: chromosome.

[0100] “WGS”: acronym for “Whole genome sequencing.” [0101] “SCORE”: acronym for “Screen for Content Of Retro Elements.”

[0102] “Cont WGS”: genomes from individuals who did not have Alzheimer’s Disease (“AD”) and who had not been diagnosed with cancer, as analyzed by whole genome sequencing. The presence of the number “1” in a row corresponding to a particular chromosome and insertion site within that chromosome indicates that a LI element was found in the genome of at least one person in the data set analyzed who had not been diagnosed with either AD or cancer. For example, in chromomsome Y, position 3311594 is known to be an insertion site at which a pLl has been found to insert. The number “1” in the “Cont WGS” column in the row listed for chromosome Y, insertion site 3311594, indicates that the analysis of the data set from individuals who had not been diagnosed with either AD or cancer found that the genome of at least one individual in the data set had a pLl located at that insertion site. For clarity, it is noted that the appearance of the digit “1” in this column, or in the other columns showing whether a pLl was present at an insertion site, indicates only that at least one genome of a person in the group labeled at the top of the column was found to have a pLl at the indicated location; it does not indicate that only one pLl was found at that location among the genomes of persons in the group indicated.

[0103] “AD WGS”: genomes from individuals who were diagnosed with Alzheimer’s Disease (“AD”), as analyzed by whole genome sequencing. As stated in the preceding paragraph, the number “1” in this column with respect to a row designating a particular LI element indicates that the LI element whose insertion point is named in the two left columns was found in the genome of at least one person in the data set used who had been diagnosed with AD, not the number of genomes in that data set which had a pLl element present at that position..

[0104] “BC WGS”: genomes from individuals who were diagnosed with breast cancer, as analyzed by whole genome sequencing. The number “1” in this column with respect to a row designating a particular LI element indicates that the LI element whose insertion point is named in the two left columns was found in the genome of at least one person in the data set used who had been diagnosed with breast cancer.

[0105] “BC SCORE”: genome(s) from individual(s) who were diagnosed with breast cancer and analyzed by the inventive methods. The number “1” in this column with respect to a row designating a particular LI element indicates that the LI element whose insertion point is named in the two left columns was found in the genome of at least one person in the data set used who had been diagnosed with breast cancer.

[0106] “PC WGS”: genomes from individuals who were diagnosed with prostate cancer, as analyzed by whole genome sequencing. The number “1” in this column with respect to a row designating a particular LI element indicates that the LI element whose insertion point is named in the two left columns was found in the genome of at least one person in the data set used who had been diagnosed with prostate cancer.

[0107] “PC SCORE”: genomes from individuals who were diagnosed with prostate cancer and analyzed by the inventive methods. The number “1” in this column with respect to a row designating a particular LI element indicates that the LI element whose insertion point is named in the two left columns was found in the genome of at least one person in the data set used who had been diagnosed with prostate cancer.

[0108] Chromosome positions marked with an asterisk designate the positions in the genome of fixed Lis that have been found to be active in causing mutations that can lead to cancers. As they are in every human genome, they are not themselves indicative of an increased risk of cancer compared to anyone else in the population.

[0109] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.