Title:
DISEASE RISK ESTIMATING METHOD USING SEQUENCE POLYMORPHISMS IN A SPECIFIC REGION OF CHROMOSOME 19
Document Type and Number:
WIPO Patent Application WO/2006/136170
Kind Code:
A2
Abstract:
The present invention provides methods and compositions for identifying human subjects with an increased risk of having or developing cancer. In particular, this invention relates to the identification and characterization of polymorphisms in the human chromosome 19q, the region r located approximately 19q 13.2-3 correlated with increased risk of developing cancer and the responsiveness of a subject to various treatments for cancer. An allele in the r region can be identified as correlated with an increased risk of developing cancer, the prognosis of developed cancer, and responsiveness to cancer treatment, on the basis of statistical analyses of the incidence of a particular allele in individuals diagnosed with cancer. The invention further relates to probes and kits comprising the probes useful in the diagnostic.

Inventors:
NEXOE BJOERN ANDERSEN (DK)
VOGEL ULLA BIRGITTE (DK)
BOERGLUM ANDERS (DK)
Application Number:
PCT/DK2006/000367
Publication Date:
December 28, 2006
Filing Date:
June 22, 2006
Assignee:
UNIV AARHUS (DK)
ARBEJDSMILJOEINSTITUTTET (DK)
NEXOE BJOERN ANDERSEN (DK)
VOGEL ULLA BIRGITTE (DK)
BOERGLUM ANDERS (DK)
International Classes:
C12Q1/68
Domestic Patent References:
WO2004003229A2 2004-01-08
WO2006018023A2 2006-02-23
Other References:
NEXO BJORN A ET AL: "A specific haplotype of single nucleotide polymorphisms on chromosome 19q13.2-3 encompassing the gene RAI is indicative of post-menopausal breast cancer before age 55." CARCINOGENESIS (OXFORD), vol. 24, no. 5, May 2003 (2003-05), pages 899-904, XP002261349 ISSN: 0143-3334
YIN JIAOYANG ET AL: "Multiple single nucleotide polymorphisms on human chromosome 19q13.2-3 associate with risk of basal cell carcinoma." CANCER EPIDEMIOLOGY BIOMARKERS AND PREVENTION, vol. 11, no. 11, November 2002 (2002-11), pages 1449-1453, XP002261348 ISSN: 1055-9965 (ISSN print)
ROCKENBAUER ESZTER ET AL: "Association of chromosome 19q13.2-3 haplotypes with basal cell carcinoma: Tentative delineation of an involved region using data for single nucleotide polymorphisms in two cohorts" CARCINOGENESIS (OXFORD), vol. 23, no. 7, July 2002 (2002-07), pages 1149-1153, XP002261350 ISSN: 0143-3334
YIN JIAOYANG ET AL: "Twelve single nucleotide polymorphisms on chromosome 19q13.2-13.3: Linkage disequilibria and associations with basal cell carcinoma in Danish psoriatic patients." BIOCHEMICAL GENETICS, vol. 41, no. 1-2, February 2003 (2003-02), pages 27-37, XP002261351 ISSN: 0006-2928
CHEN PENGCHIN ET AL: "Association of an ERCC1 polymorphism with adult-onset glioma" CANCER EPIDEMIOLOGY BIOMARKERS AND PREVENTION, vol. 9, no. 8, August 2000 (2000-08), pages 843-847, XP002261355 ISSN: 1055-9965
VOGEL ULLA ET AL: "Two regions in chromosome 19q13.2-3 are associated with risk of lung cancer." MUTATION RESEARCH, vol. 546, no. 1-2, 26 February 2004 (2004-02-26), pages 65-74, XP002407438 ISSN: 0027-5107
Attorney, Agent or Firm:
HØIBERG A/S (Copenhagen K, DK)
Claims:

1. A method for estimating the disease risk of an individual comprising

in a sample from said individual assessing in the genetic material at least one sequence polymorphism

- in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a region complementary to SEQ ID NO: 132, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof,

and optionally a further sequence polymorphism

- in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a region complementary to SEQ ID NO: 133, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof,

- with the proviso that when using SEQ ID NO: 132 only the at least one sequence polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO: 19) and rs2377329 (SEQ ID NO: 22), or

in a region flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), or

the sequence polymorphism positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), or

the sequence polymorphism positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6)

- obtaining a sequence polymorphism response,

- estimating the disease risk of said individual based on the sequence polymorphism response.

2. The method according to claim 1, wherein the cell sample is a blood sample, a tissue sample, a sample of secretion, semen, ovum, a washing of a body surface, such as a buccal swab, a clipping of a body surface, including hairs and nails.

3. The method according to any of the preceding claims, wherein the cell is selected from white blood cells and tumor tissue.

4. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least one mutation base change.

5. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least two base changes.

6. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least one single nucleotide polymorphism.

7. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least two single nucleotide polymorphisms.

8. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least one tandem repeat polymorphism.

9. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least two tandem repeat polymorphisms.

10. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least one deletion polymorphism.

11. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least one insertion polymorphism.

12. The method according to any of the preceding claims, wherein the sequence polymorphism comprises at least one dinucleotide polymorphism.

13. The method according to any of the preceding claims, wherein the disease is cancer.

14. The method according to any of the preceding claims, wherein the cancer is selected from skin carcinoma including malignant melanoma, breast cancer, lung cancer, colon cancer and other cancers in the gastro-intestinal tract, prostate cancer, lymphoma, leukemia, multiple myeloma, pancreas cancer, head and neck cancer, ovary cancer and other gynecological cancers.

15. The method according to any of the preceding claims, wherein the cancer is selected from skin cancer, lung cancer, colon cancer, breast cancer, and multiple myeloma.

16. The method according to any of the preceding claims, wherein the cancer is selected from skin cancer, lung cancer, colon cancer and breast cancer.

17. The method according to any of the preceding claims, wherein the cancer is selected from skin cancer, lung cancer, and breast cancer.

18. The method according to any of the preceding claims, wherein the cancer is lung cancer.

19. The method according to any of the preceding claims, wherein the cancer is lung cancer in an individual with marker XPD exon 23 M .

20. The method according to any of the preceding claims, wherein the cancer is selected from skin cancer and breast cancer.

21. The method according to any of the preceding claims 13-16 and 19, wherein the skin cancer is basal cell carcinoma.

22. The method according to any of the preceding claims, wherein the cancer is multiple myeloma.

23. The method according to any of the preceding claims, wherein the at least one sequence polymorphism is assessed in a region of SEQ ID NO: 132 flanked by and including the markers RAI 37 (SEQ ID NO: 21) and RAI 3'4 (SEQ ID NO: 26).

24. The method according to any of the preceding claims, wherein at least one sequence polymorphism is assessed in a region of SEQ ID NO: 132 flanked by and including the markers RAI 3'3 (SEQ ID NO: 23) and the marker sequence RAI 3' d1 GGTTTAT[ATTTT]Ntgagatggatttt (SEQ ID NO: 19).

25. The method according to any of the preceding claims, wherein at least one sequence polymorphism is the marker sequence RAI 3' d1 GGTTTAT[ATTTT]Ntgagatggatttt (SEQ ID NO: 19).

26. The method according to any of the preceding claims, wherein at least one sequence polymorphism is RAI 3'3 (SEQ ID NO: 23).

27. The method according to any of the preceding claims, wherein at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4 and ASE 1 exon 3-2 (SEQ ID NO: 142).

28. The method according to any of the preceding claims, wherein at least one sequence polymorphism is assessed in a region of SEQ ID NO: 132 flanked by and including the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO: 19) and rs2377329 (SEQ ID NO: 22), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4 and ASE 1 exon 3-2 (SEQ ID NO: 142).

29. The method according to any of the preceding claims, wherein at least one sequence polymorphism is assessed in a region of SEQ ID NO: 132 flanked by and including the regions flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4 and ASE 1 exon 3-2 (SEQ ID NO: 142).

30. The method according to any of the preceding claims, wherein the sequence polymorphism is assessed in SEQ ID NO: 132 said polymorphism is positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4 and ASE 1 exon 3-2 (SEQ ID NO: 142).

31. The method according to any of the preceding claims, wherein the sequence polymorphism is assessed in SEQ ID NO: 132 said polymorphism is positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4 and ASE 1 exon 3-2 (SEQ ID NO: 142).

32. The method according to any of the preceding claims, wherein the sequence polymorphism assessed in SEQ ID NO: 132 is RAI3' d1 (SEQ ID NO: 19), and the sequence polymorphism assessed in SEQ ID NO: 133 is ASE 1 exon 3-2 (SEQ ID NO: 142).

33. The method according to any of the preceding claims, wherein the sequence polymorphism assessed in SEQ ID NO: 132 is RAI3' d1 (SEQ ID NO: 19), and the sequence polymorphism assessed in SEQ ID NO: 133 is ASE 1 exon 1 (SEQ ID NO: 43).

34. The method according to any of the preceding claims, wherein the assessment is conducted by means of at least one nucleic acid primer or probe, such as a primer or probe of DNA, RNA or a nucleic acid analogue such as peptide nucleic acid (PNA) or locked nucleic acid (LNA).

35. The method according to claim 34, wherein the nucleotide primer or probe is capable of hybridising to a subsequence of the region corresponding to SEQ ID NO: 132, or a part thereof, or a region complementary to SEQ ID NO: 132 and optionally further a subsequence of the region corresponding to SEQ ID NO: 133, or a part thereof, or a region complementary to SEQ ID NO: 133.

36. The method according to claim 34 or 35, wherein the primer or probe has a length of at least 9 nucleotide or peptide monomers.

37. The method according to any of the preceding claims 34-36, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 132 selected from the group of subsequences

1. TTTTAGTAGAGACATGGTTCCGCCA[C/T]GTTGCCCAGGCTGGTCTTGAACTCC SEQ ID NO: 21

2. GGTTTAT[ATTTT]Ntgagatggatttt SEQ ID NO: 19

3. ctggggaggctgaggcaggagaatc[A/G]cttgaaaccgggaggcggaggttgt SEQ ID NO: 22

4. actaaaaataaaaaaataaaaaaaa[-/AA]atagccgagcatggtggtgggtgcc SEQ ID NO: 23

5. GATTGTCATGT[G/T]ACATCAGCCAATACT SEQ ID NO: 2

6. AAAAAACTAAAGTGGGGTTTGCGGG[G/T]AGTGGGAGGGCCCTTCCTGCTAGGT SEQ ID NO: 24

7. caggcggatcacaaggtcaggagtt[C/T]gagaccagcctggccaacacagtga SEQ ID NO: 25

8. CACAGTGAAAC[C/T]CCATCTCTACTAAA SEQ ID NO: 3

9. AAAATTAGCCGG[A/G]CGCCATGGCGGGAG SEQ ID NO: 4

10. CCCTATGTTGTCCAAGCTGGCAGAG[A/G]TTTTTGTTTGTTTGTTTGAGAGGGA SEQ ID NO: 26

11. AGCCTGGCCAACATG[C/G]TGAAACCCCGTCTCT SEQ ID NO: 5

12. ctcgggaggctgaggcaggagaatc[A/G]cttgaactcaggaggcagaggttgc SEQ ID NO: 27

13. AAGTTTCTCTATT[G/T]TGTTTATAAACA SEQ ID NO: 6

or to a sequence complementary to any of the subsequences.

38. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 132 selected from the group of subsequences

GGTTTAT[ATTTT]Ntgagatggatttt SEQ ID NO: 19

actaaaaataaaaaaataaaaaaaa[-/AA]atagccgagcatggtggtgggtgcc SEQ ID NO: 23

AAAAAACTAAAGTGGGGTTTGCGGG[G/T]AGTGGGAGGGCCCTTCCTGCTAGGT SEQ ID NO: 24

AAAATTAGCCGG[A/G]CGCCATGGCGGGAG SEQ ID NO: 4

CCCTATGTTGTCCAAGCTGGCAGAG[A/G]TTTTTGTTTGTTTGTTTGAGAGGGA SEQ ID NO: 26

or to a sequence complementary to any of the subsequences.

39. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 132 selected from the group of subsequences

GGTTTAT[ATTTT]Ntgagatggatttt SEQ ID NO: 19

actaaaaataaaaaaataaaaaaaa[-/AA]atagccgagcatggtggtgggtgcc SEQ ID NO: 23

AAAAAACTAAAGTGGGGTTTGCGGG[G/T]AGTGGGAGGGCCCTTCCTGCTAGGT SEQ ID NO: 24

AAAATTAGCCGG[A/G]CGCCATGGCGGGAG SEQ ID NO: 4

CCCTATGTTGTCCAAGCTGGCAGAG[A/G]TTTTTGTTTGTTTGTTTGAGAGGGA SEQ ID NO: 26

or to a sequence complementary to any of the subsequences.

40. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 132 selected from the group of subsequences

GGTTTAT[ATTTT]Ntgagatggatttt SEQ ID NO: 19

ctggggaggctgaggcaggagaatc[A/G]cttgaaaccgggaggcggaggttgt SEQ ID NO: 22

or to a sequence complementary to any of the subsequences.

41. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 132 selected from the group of subsequences

CACAGTGAAAC[C/T]CCATCTCTACTAAA SEQ ID NO: 3

AAAATTAGCCGG[A/G]CGCCATGGCGGGAG SEQ ID NO: 4

or to a sequence complementary to any of the subsequences.

42. The method according to claim 34, wherein the at least one nucleotide primer or probe capable of hybridising to a subsequence of SEQ ID NO: 132 is

AGCCTGGCCAACATG[C/G]TGAAACCCCGTCTCT SEQ ID NO: 5

or to a sequence complementary to the subsequence.

43. The method according to claim 34, wherein the at least one nucleotide primer or probe capable of hybridising to a subsequence of SEQ ID NO: 132 is

AAGTTTCTCTATT[G/T]TGTTTATAAACA SEQ ID NO: 6

or to a sequence complementary to the subsequence.

44. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 133 selected from the group of subsequences

AGAACCTGTTCAGGCTGGCGGCTCA[C/T]TTGGATGAACAGGGAGTGTGTGAC SEQ ID NO: 41

CCCCCTTCTTAGGACGCATGGGGGT[G/T]GAGAGAACGGGGAGATAGACAGAG SEQ ID NO: 42

TGCGAGCAGCCCGGGCTACAGGGTT[A/G]CCTGAGGTGTGGGTCCCAGGATGG SEQ ID NO: 43

GGCGCCTCAACAGCCAGAAGGAGCG[A/G]AGCCTCAGGCCCAGGCAGCTCTGG SEQ ID NO: 44

AGAAAGAAAAACAGCAA[A/G]ATGCCACAGTGGAGCCAGAG SEQ ID NO: 142

or to a sequence complementary to any of the subsequences.

45. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 133 selected from the group of subsequences

CCCCCTTCTTAGGACGCATGGGGGT[G/T]GAGAGAACGGGGAGATAGACAGAG SEQ ID NO: 42

TGCGAGCAGCCCGGGCTACAGGGTT[A/G]CCTGAGGTGTGGGTCCCAGGATGG SEQ ID NO: 43

GGCGCCTCAACAGCCAGAAGGAGCG[A/G]AGCCTCAGGCCCAGGCAGCTCTGG SEQ ID NO: 44

AGAAAGAAAAACAGCAA[A/G]ATGCCACAGTGGAGCCAGAG SEQ ID NO: 142

or to a sequence complementary to any of the subsequences.

46. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 133 selected from the group of subsequences

AGAACCTGTTCAGGCTGGCGGCTCA[C/T]TTGGATGAACAGGGAGTGTGTGAC SEQ ID NO: 41

TGCGAGCAGCCCGGGCTACAGGGTT[A/G]CCTGAGGTGTGGGTCCCAGGATGG SEQ ID NO: 43

GGCGCCTCAACAGCCAGAAGGAGCG[A/G]AGCCTCAGGCCCAGGCAGCTCTGG SEQ ID NO: 44

AGAAAGAAAAACAGCAA[A/G]ATGCCACAGTGGAGCCAGAG SEQ ID NO: 142

or to a sequence complementary to any of the subsequences.

47. The method according to claim 34, wherein at least one nucleotide primer or probe is capable of hybridising to a subsequence of SEQ ID NO: 133 selected from the group of subsequences

TGCGAGCAGCCCGGGCTACAGGGTT[A/G]CCTGAGGTGTGGGTCCCAGGATGG SEQ ID NO: 43

AGAAAGAAAAACAGCAA[A/G]ATGCCACAGTGGAGCCAGAG SEQ ID NO: 142

or to a sequence complementary to any of the subsequences.

48. The method according to claim 34, wherein the at least one nucleotide primer or probe capable of hybridising to the subsequence of SEQ ID NO: 133 is

GGCGCCTCAACAGCCAGAAGGAGCG[A/G]AGCCTCAGGCCCAGGCAGCTCTGG SEQ ID NO: 44

or to a sequence complementary to the subsequence.

49. The method according to claim 34, wherein the at least one nucleotide primer or probe capable of hybridising to the subsequence of SEQ ID NO: 133 is

AGAAAGAAAAACAGCAA[A/G]ATGCCACAGTGGAGCCAGAG SEQ ID NO: 142

or to a sequence complementary to the subsequence.

50. The method according to any of the preceding claims, wherein at least two different probes are used, one or more probes being selected from the probes as defined in any of claims 34-49, and one or more probes being capable of hybridising to a sequence different from SEQ ID NO: 132 and optionally SEQ ID NO: 133, or a part thereof, or to a sequence complementary to a region different from SEQ ID NO: 132 and optionally SEQ ID NO: 133, or a part thereof.

51. The method according to claim 1, wherein the translational product from a sequence in a region corresponding to SEQ ID NO: 132 and optionally SEQ ID NO: 133, or a part thereof, is an antibody, such as a monoclonal or polyclonal antibody.

52. A method for estimating the disease prognosis of an individual comprising

in a sample from said individual assessing in the genetic material at least one sequence polymorphism

- in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a region complementary to SEQ ID NO: 132, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof,

and optionally a further sequence polymorphism

- in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a region complementary to SEQ ID NO: 133, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof,

- with the proviso that when using SEQ ID NO: 132 only the at least one sequence polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO: 19) and rs2377329 (SEQ ID NO: 22), or

in a region flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), or

the sequence polymorphism positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), or

the sequence polymorphism positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6)

- obtaining a sequence polymorphism response,

- estimating the disease prognosis of said individual based on the sequence polymorphism response.

53. The method according to claim 52, wherein the method has any of the features as defined in any of the claims 2-51.

54. A method for estimating a treatment response of an individual suffering from cancer to a disease treatment, comprising

in a sample from said individual assessing in the genetic material at least one sequence polymorphism

- in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a region complementary to SEQ ID NO: 132, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof,

and optionally a further sequence polymorphism

- in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a region complementary to SEQ ID NO: 133, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof,

- with the proviso that when using SEQ ID NO:132 only the at least one sequence polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO: 19) and rs2377329 (SEQ ID NO: 22), or

in a region flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), or

the sequence polymorphism positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), or

the sequence polymorphism positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6)

- obtaining a sequence polymorphism response,

- estimating the individual's response to the disease treatment based on the sequence polymorphism response.

55. The method according to claim 54, wherein the method has any of the features as defined in any of the claims 2-51.

56. A primer or probe for use in a method as defined in any of the claims above, said primer or probe being selected from the group of nucleotides consisting of

agt gca gcc tca act tcc (SEQ ID NO: 96)
cca gtc caa aca ata tga tcc (SEQ ID NO: 97)
AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)
TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)
cat gat tca ctg cac cca acc (SEQ ID NO: 98)
ttt cac tct tgt tgc cca agc (SEQ ID NO: 99)
ttt tca cac agg tcc aat cc (SEQ ID NO: 104)
act gca acc tcc atc tcc (SEQ ID NO: 105)
atg ttg ggg aga ctg agg (SEQ ID NO: 124)
ccg cat cta act tat tct gg (SEQ ID NO: 125)
aac tac ctc tgc aaa ccc agc (SEQ ID NO: 126)
ttg gaa tgg agg gat tct acc (SEQ ID NO: 127)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
gga cag atg gca atg atg g (SEQ ID NO: 130)
tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)
acc atg gcg cct caa ca (SEQ ID NO: 136)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

55. The primer or probe for use in a method as defined in any of the claims above, said primer or probe being selected from the group of nucleotides consisting of

agt gca gcc tca act tcc (SEQ ID NO: 96)
cca gtc caa aca ata tga tcc (SEQ ID NO: 97)
AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)
TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)
cat gat tca ctg cac cca acc (SEQ ID NO: 98)
ttt cac tct tgt tgc cca agc (SEQ ID NO: 99)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
gga cag atg gca atg atg g (SEQ ID NO: 130)
tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)
acc atg gcg cct caa ca (SEQ ID NO: 136)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

57. The primer or probe for use in a method as defined in any of the claims above, said primer or probe being selected from the group of nucleotides consisting of

agt gca gcc tca act tcc (SEQ ID NO: 96)
cca gtc caa aca ata tga tcc (SEQ ID NO: 97)
AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)
TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)
cat gat tca ctg cac cca acc (SEQ ID NO: 98)
ttt cac tct tgt tgc cca agc (SEQ ID NO: 99)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
gga cag atg gca atg atg g (SEQ ID NO: 130)
tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)
acc atg gcg cct caa ca (SEQ ID NO: 136)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

58. The primer or probe for use in a method as defined in any of the claims above, said primer or probe being selected from the group of nucleotides consisting of

agt gca gcc tca act tcc (SEQ ID NO: 96)
cca gtc caa aca ata tga tcc (SEQ ID NO: 97)
AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)
TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
acc atg gcg cct caa ca (SEQ ID NO: 136)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

59. The primer or probe for use in a method as defined in any of the claims above, said primer or probe being selected from the group of nucleotides consisting of

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)
TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

60. The primer or probe for use in a method as defined in any of the claims above, said primer or probe being selected from the group of nucleotides consisting of

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)
TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)
acc atg gcg cct caa ca (SEQ ID NO: 136)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

61. The primer or probe for use in a method as defined in any of the claims above, said primer or probe being selected from the group of nucleotides consisting of

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)
TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
acc atg gcg cct caa ca (SEQ ID NO: 136)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

62. A primer or probe for use in a method as defined in any of the claims above as the other probe, said primer or probe being selected from the group consisting of

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)
ggt tcc gcc acg ttg cc (SEQ ID NO: 54)
tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)
att aca ggc acc cac cac cat g (SEQ ID NO: 56)
cct gga caa cat agg gag acc ctg tgt (SEQ ID NO: 138)
caa aca aac aaa aac ctc tgc ca (SEQ ID NO: 139)
atg ttg ggg aga ctg agg (SEQ ID NO: 124)
ccg cat cta act tat tct gg (SEQ ID NO: 125)
aac tac ctc tgc aaa ccc agc (SEQ ID NO: 126)
ttg gaa tgg agg gat tct acc (SEQ ID NO: 127)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
gga cag atg gca atg atg g (SEQ ID NO: 130)
tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)
acc atg gcg cct caa ca (SEQ ID NO: 140)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

63. The primer or probe according to claim 61, said primer or probe being selected from the group consisting of

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)
ggt tcc gcc acg ttg cc (SEQ ID NO: 54)
tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)
att aca ggc acc cac cac cat g (SEQ ID NO: 56)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
acc atg gcg cct caa ca (SEQ ID NO: 140)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

64. The primer or probe according to claim 61, said primer or probe being selected from the group consisting of

tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)
att aca ggc acc cac cac cat g (SEQ ID NO: 56)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
gga cag atg gca atg atg g (SEQ ID NO: 130)
tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)
acc atg gcg cct caa ca (SEQ ID NO: 140)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

65. The primer or probe according to claim 61, said primer or probe being selected from the group consisting of

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)
ggt tcc gcc acg ttg cc (SEQ ID NO: 54)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)
acc atg gcg cct caa ca (SEQ ID NO: 140)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

66. The primer or probe according to claim 61, said primer or probe being selected from the group consisting of

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)
ggt tcc gcc acg ttg cc (SEQ ID NO: 54)
acc atg gcg cct caa ca (SEQ ID NO: 140)
gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

67. The primer or probe according to claim 61, said primer or probe being selected from the group consisting of

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)
ggt tcc gcc acg ttg cc (SEQ ID NO: 54)
ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)
cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

68. The primer or probe according to any of claims 55 or 62, wherein the probe is operably linked to at least one label, such as operably linked to two different labels.

69. The primer or probe according to claim 68, wherein the label is selected from TEX, TET, TAM, ROX, R6G, ORG, HEX, FLU, FAM, DABSYL, Cy7, Cy5, Cy3, BOFL, BOF, BO-X, BO-TRX, BO-TMR, JOE, 6JOE, VIC, 6FAM, LCRed640, LCRed705, TAMRA, Biotin, Digoxigenin, DuO-family, Daq-family.

70. The primer or probe according to claim 68, wherein the primer or probe is operably linked to a surface.

71. The primer or probe according to claim 70, wherein the surface is the surface of microbeads or a DNA chip.

72. An antibody directed to an epitope of a RAI gene product.
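The marker subsequences in the claims above are written with the two alleles of a polymorphism in square brackets, for example [C/T], and a hyphen for an absent base, as in [-/AA]. As a minimal sketch of how that notation can be read, the following expands such a marker into one plain sequence per allele and forms the complementary strand. The function names are hypothetical, and markers written in a different form, such as [ATTTT]N, are deliberately not handled.

```python
import re

# Complement table for upper- and lower-case DNA letters.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# flank [allele1/allele2] flank, e.g. CACAGTGAAAC[C/T]CCATCTCTACTAAA
_MARKER = re.compile(r"([ACGTacgt]*)\[([^\]/]+)/([^\]/]+)\]([ACGTacgt]*)")


def expand_alleles(marker):
    """Return the two plain sequences encoded by a bracketed two-allele marker."""
    match = _MARKER.fullmatch(marker)
    if match is None:
        raise ValueError("marker is not of the form flank[allele1/allele2]flank")
    left, first, second, right = match.groups()
    # A hyphen denotes the allele in which the bases are absent.
    return [left + ("" if allele == "-" else allele) + right
            for allele in (first, second)]


def reverse_complement(sequence):
    """Return the complementary strand, read 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for allele_sequence in expand_alleles("CACAGTGAAAC[C/T]CCATCTCTACTAAA"):
        print(allele_sequence, reverse_complement(allele_sequence))
```

A probe "capable of hybridising to a sequence complementary to any of the subsequences" would, in this sketch, be compared against the output of `reverse_complement`.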

Description:

Disease risk estimating method using sequence polymorphisms in a specific region of chromosome 19

The present invention provides methods and compositions for identifying human subjects with an increased risk of having or developing disease. In particular, this invention relates to the identification and characterization of polymorphisms in the human chromosome 19q, the region r located approximately 19q 13.2-3 correlated with increased risk of developing disease, in particular cancer and the responsiveness of a subject to various treatments for cancer.

Background

DNA polymorphisms provide an efficient way to study the association of genes and diseases by analysis of linkage and linkage disequilibrium. With the sequencing of the human genome a myriad of hitherto unknown genetic polymorphisms among people have been detected. Most common among these are the single nucleotide polymorphisms, also called SNPs, of which several millions are known. Other examples are variable number of tandem repeat polymorphisms, insertions, deletions and block modifications 7. Tandem repeats often have multiple different alleles (variants), whereas the other groups of polymorphisms usually just have two alleles. Some of these genetic polymorphisms probably play a direct role in the biology of the individuals, including their risk of developing disease, but the virtue of the majority is that they can serve as markers for the surrounding DNA, and thus serve as leads during a search for a causative gene polymorphism, as substitutes in the evaluation of its role in health and disease, and as substitutes in the evaluation of the genetic constitution of individuals.

The association of an allele of one sequence polymorphism with particular alleles of other sequence polymorphisms in the surrounding DNA has two origins, known in the genetic field as linkage and linkage disequilibrium, respectively. Linkage arises because large parts of chromosomes are passed unchanged from parents to offspring, so that minor regions of a chromosome tend to flow unchanged from one generation to the next and also to be similar in different branches of the same family. Linkage is gradually eroded by recombination occurring in the cells of the germline, but typically operates over multiple generations and distances of a number of million bases in the DNA.

Linkage disequilibrium deals with whole populations and has its origin in the (distant) forefather in whose DNA a new sequence polymorphism arose. The immediate surroundings in the DNA of the forefather will tend to stay with the new allele for many generations. Recombination and changes in the composition of the population will again erode the association, but the new allele and the alleles of any other polymorphism nearby will often be partly associated among unrelated humans even today. A crude estimate suggests that alleles of sequence polymorphisms with distances less than 10000 bases in the DNA will have tended to stay together since modern man arose. Linkage disequilibrium in limited populations, for instance Europeans, often extends over longer distances. This can be the result of newer mutations, but can also be a consequence of one or more "bottlenecks" with small population sizes and considerable inbreeding in the history of the current population. Two obvious possibilities for "bottlenecks" in Europeans are the exodus from Africa and the repopulation of Europe after the last ice age.

Linkage disequilibrium is the result of many stochastic events and is as such subject to statistical variation, occasionally resulting in discontinuities, lack of a monotonic relationship between association and distance, and differences between people of different ethnicity. Therefore, it is often advantageous to study more than one sequence polymorphism in a given region. This also allows for further definition of the genetic surroundings of the biologically relevant polymorphism by combining the associated alleles of the different markers into a so-called haplotype.
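As an illustration of how the association between the alleles of two markers can be quantified, the following minimal sketch computes the commonly used linkage disequilibrium measures D, |D'| and r² for two biallelic markers. It is not taken from the present disclosure: the function name `pairwise_ld` and its inputs are hypothetical, and it assumes that phased two-marker haplotype counts are already available.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic markers.

    n_AB, n_Ab, n_aB, n_ab are counts of the four possible two-marker
    haplotypes (hypothetical input; phased haplotypes are assumed known).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total <= 0:
        raise ValueError("at least one haplotype must be observed")

    p_AB = n_AB / total            # frequency of haplotype A-B
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at marker 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at marker 2

    # D: deviation of the observed haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # D' normalises D by its largest attainable magnitude given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the two markers.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

With complete association, for example `pairwise_ld(50, 0, 0, 50)`, both |D'| and r² equal 1, whereas equal counts of all four haplotypes give 0 for all three measures.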

Humans in general carry two copies of each human chromosome in each cell. There are exceptions to this rule, not relevant to this application. We therefore speak about genotypes i.e. the combined analysis of both chromosomes at a given sequence polymorphism. The resulting genotypes of a person, analysed for instance on DNA from peripheral blood leukocytes, are inherently very stable over time. Therefore, this type of analysis can be performed any time in the life of a person and will be applicable to this person for his or her entire life. By the same token such genetic analyses are ideally suited to predict future risks of disease.

A variety of investigations suggest that many diseases in part are determined by the genetic constitution of the individual. One group of genes in particular has been associated with rare genetic predispositions to cancer. These are the genes involved in maintaining the integrity of a person's DNA, the so-called DNA repair genes. One set of such genes are the XP genes which participate in nucleotide excision repair, and, when mutated, give rise to a 1000-fold increased risk of getting skin cancer. For this reason we have previously investigated single nucleotide polymorphisms in one DNA repair gene XPD for association with risk of skin cancer in a cohort of Caucasian Americans, and found that one allele of the sequence polymorphism called XPDe6 was associated with a moderately increased risk of getting basal cell carcinoma, the most common form of skin cancer. Later other groups have studied the association between sequence polymorphisms in this and other DNA repair genes and various forms of cancer. Some have reported positive results.

Very little is known about the function of the gene RAI. It was cloned because its protein product binds to and inhibits RelA of the transcription regulator NF-kappaB. Other studies suggest that it may interact with the product p53 of a tumor suppressor gene.

Summary of the invention

The present invention relates in a first aspect to a group of nucleic acid sequences found to be associated with disease, in particular cancer. The invention further relates to transcriptional and translational products of said sequence. An allele in the r region can be identified as correlated with an increased risk of developing disease, in particular cancer, the prognosis of developed disease, in particular cancer, and responsiveness to disease treatment, in particular cancer treatment on the basis of statistical analyses of the incidence of a particular allele in individuals diagnosed with disease, in particular cancer.

Thus, in a first aspect the invention relates to a method for estimating the disease risk of an individual comprising

- in a sample from said individual assessing in the genetic material at least one sequence polymorphism

- in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a region complementary to SEQ ID NO: 132, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, and optionally a further sequence polymorphism

- in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a region complementary to SEQ ID NO: 133, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof,

- with the proviso that when using SEQ ID NO: 132 only the at least one sequence polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO: 19) and rs2377329 (SEQ ID NO: 22), or

in a region flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), or

the sequence polymorphism positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), or

the sequence polymorphism positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6)

- obtaining a sequence polymorphism response,

- estimating the disease risk of said individual based on the sequence polymorphism response.

The estimation of the disease risk of an individual can involve the comparison of the number and/or kind of polymorphic sequences identified with a predetermined disease risk profile. Such a profile can be based on statistical data obtained for a relevant reference group of individuals. In particular the disease is a proliferative disease, such as cancer.

The sequence of the r region is set forth as SEQ ID NO: 1, originating from the cloning of human chromosome 19q published as part of the contig NT_011109 in the database of human sequences established by the National Center for Biotechnology Information and located on the internet at http://www.ncbi.nlm.nih.gov/genome/guide/human/.

The presence of an allele is determined by determining the nucleic acid sequence of all or part of the region according to standard molecular biology protocols well known in the art as described for example in Sambrook et al. (1989) and as set forth in the Examples provided herein or products of the nucleic acid sequences.

In particular, the nucleic acid molecules of the present invention represent in a first aspect nucleic acid sequences forming part of the region r corresponding to SEQ ID NO: 132 and optionally a further region corresponding to SEQ ID NO: 133. The nucleic acid sequences of the present invention represent sequences within the gene, or controlling the gene, referred to herein as RAI. As demonstrated in the Examples presented below, the RAI gene is in particular associated with human cancer diseases.

Furthermore, the invention relates to a method for estimating the disease prognosis of an individual comprising

- in a sample from said individual assessing in the genetic material at least one sequence polymorphism in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a region complementary to SEQ ID NO: 132, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, and optionally a further sequence polymorphism

- in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a region complementary to SEQ ID NO: 133, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof,

- with the proviso that when using SEQ ID NO: 132 only the at least one sequence polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO: 19) and rs2377329 (SEQ ID NO: 22), or

in a region flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), or

the sequence polymorphism positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), or

the sequence polymorphism positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6)

- obtaining a sequence polymorphism response,

- estimating the disease prognosis of said individual based on the sequence polymorphism response.

The estimation of the disease prognosis of an individual can involve the comparison of the number and/or kind of polymorphic sequences identified with a predetermined disease prognosis profile. Such a profile can be based on statistical data obtained for a relevant reference group of individuals.

Additionally provided is a method of identifying a human subject as having an increased likelihood of responding to a treatment, comprising a) correlating the presence of an r region allele genotype with an increased likelihood of responding to treatment; and b) determining the r region allele genotype of the subject, whereby a subject having an r region allele genotype correlated with an increased likelihood of responding to treatment is identified as having an increased likelihood of responding to treatment.

Thus, the present invention also relates to a method for estimating a treatment response of an individual suffering from disease to a disease treatment, comprising

- in a sample from said individual assessing in the genetic material at least one sequence polymorphism in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a region complementary to SEQ ID NO: 132, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 132, or a part thereof, and optionally a further sequence polymorphism

- in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a region complementary to SEQ ID NO: 133, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID NO: 133, or a part thereof,

- with the proviso that when using SEQ ID NO: 132 only the at least one sequence polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO:19) and rs2377329 (SEQ ID NO: 22), or

in a region flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), or

the sequence polymorphism positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), or

the sequence polymorphism positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6)

- obtaining a sequence polymorphism response,

- estimating the individual's response to the disease treatment based on the sequence polymorphism response.

The estimation of the individual's response to disease treatment can involve the comparison of the number and/or kind of polymorphic sequences identified with a predetermined cancer treatment response profile. Such a profile can be based on statistical data obtained for a relevant reference group of individuals. In particular the disease is a proliferative disease, such as cancer.

The invention also comprises primers or probes for use in the invention, as well as kits including these. The primers and/or probes are preferably capable of hybridising to SEQ ID NO: 132 and optionally to SEQ ID NO: 133, or a part thereof, in particular the regions relevant to this invention, or a part thereof, under stringent conditions, as well as to a sequence complementary thereto.

In particular, the present invention is based on the discovery of a correlation between disease and single nucleotide polymorphisms (SNPs), deletion polymorphisms, insertion polymorphisms, dinucleotide polymorphisms and/or tandem repeats in the region. Thus, polymorphisms have been found in the r region as shown in table 1. However, the present invention is not limited to the polymorphisms shown in table 1, but does include any polymorphism in the region. Deletions and insertions have been found in the r region as shown herein. However, the present invention is not limited to the deletions shown in the tables, but does include any deletion and insertion in the region.

The term human includes both a human having or suspected of having a disease and an asymptomatic human who may be tested for predisposition or susceptibility to disease. At each position the human may be homozygous for an allele or the human may be a heterozygote.

Drawings

Fig. 1 shows an overview of a subregion of chromosome 19q in which the exon-intron structure of RAI, XPD and ASE-1 is shown. Letters correspond to the specific marker position. The position in NT_011109 is shown. Marker abbreviations are as follows: a=XPD exon10, b=XPD exon6, c=XPD-4bp, d=XPD-81bp, e=XPD-5'2, f=XPD-5'3, g=RAI-3'7, h=RAI-3'3, i=RAI-3'6, j=RAI-3'5, k=RAI-3'4, l=RAI-3'8, m=RAI exon6, n=RAI intron5-2, o=RAI intron3, p=RAI intron1, q=RAI intron1-2, r=RAI intron1-3, s=RAI-5'2, t=RAI-5'3, u=RAI-5', v=ASE1-5'2, w=ASE1 exon1, x=ERCC1-3'.

Fig. 2 shows a pair-wise linkage disequilibrium between all markers in breast cancer controls.

Fig. 3 shows an association of single polymorphisms with early breast cancer.

Fig. 4 shows an association of sets of neighboring SNPs with breast cancer.

Fig. 5 shows a maximal OR for breast cancer for haplotypes formed by two SNPs.

Fig. 6 shows a Lazzeroni estimation of the position of the causative variant using young breast cancer.

Fig. 7 shows the overall distribution of p-values for sets of markers plotted against the position on the chromosome for basal cell carcinoma and lung cancer. As position on the abscissa we used the median marker position in a given cluster of markers. The ordinate values are the negative logarithms of the overall p-values for a difference between cases and controls associated with a given set of markers. Each curve corresponds to a given size of marker sets, i.e. the length of the haplotypes.

Fig. 8 shows odds ratio for cancer versus control between the two homozygotes of each SNP in relation to location on chromosome 19.

Fig. 9 shows event free survival for patients who are wild-type carriers of ASE-1 (0) or homozygous or heterozygous carriers of the variant allele (1).

Fig. 10 shows overall survival subdivided by ASE-1 genotype. 0 = homozygous carrier of the wild-type allele of ASE-1, 1 = carrier of the variant allele of ASE-1.

Fig. 11 shows event free survival for women subdivided by ASE-1 genotype. 0 = homozygous carrier of the wild-type allele of ASE-1, 1 = carrier of the variant allele of ASE-1.

Fig. 12 shows overall survival for women subdivided by ASE-1 genotype. 0 = homozygous carrier of the wild-type allele of ASE-1, 1 = carrier of the variant allele of ASE-1.

Fig. 13 shows event free survival for men subdivided by ASE-1 genotype. 0 = homozygous carrier of the wild-type allele of ASE-1, 1 = carrier of the variant allele of ASE-1.

Fig. 14 shows overall survival for men subdivided by ASE-1 genotype. 0 = homozygous carrier of the wild-type allele of ASE-1, 1 = carrier of the variant allele of ASE-1.

Fig. 15 shows a Kaplan-Meier plot of survival of lung cancer patients in relation to the high-risk haplotype.

Fig. 16 shows a Kaplan-Meier plot of survival in relation to XPD K751Q among those lung cancer patients homozygous for the high-risk haplotype.

Fig. 17 shows a Kaplan-Meier plot of survival in relation to XPD K751Q among those lung cancer patients not homozygous for the high-risk haplotype.

Detailed description of the invention

The present invention relates to a characterization of a person's present and/or future risk of getting certain forms of disease, in particular a proliferative disease, such as cancer. The characterization is based on the analysis of sequence polymorphisms in a region of chromosome 19q in the person.

A number of polymorphisms in the chromosomal region 19q13.2-3 have been identified and characterised. Surprisingly, the sequence polymorphisms with strongest association to disease appeared to be located outside the gene XPD. More specifically, the sequences were located in a sub-region harboring the gene RAI and immediately downstream of the RAI gene. The nature of the association between haplotype and disease was examined together with the p-values associated with the individual haplotypes. The odds ratios for each haplotype of each set of three neighboring SNPs were also determined. The odds ratio for test of homozygotes of individual markers against cancer status was likewise determined. The most likely location of a single causative gene variant for individuals of the breast cancer cases was evaluated.
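For readers unfamiliar with the odds ratio used above, the following minimal sketch shows one conventional way of computing it from a 2x2 table of carriers and non-carriers among cases and controls, together with an approximate 95 % confidence interval based on the standard error of the log odds ratio. The function name, its parameters and the interval method are assumptions made for illustration; the statistical procedures actually used are those described in the Examples.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound).

    The four counts form a 2x2 table (hypothetical input). The interval
    uses the log-odds standard error sqrt(1/a + 1/b + 1/c + 1/d), and
    z = 1.96 corresponds to an approximate 95 % confidence interval.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")

    # Odds of carrying the allele among cases divided by the odds
    # of carrying it among controls.
    ratio = (a * d) / (b * c)

    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper
```

An odds ratio above 1 whose interval excludes 1 would indicate that the allele or haplotype is more frequent among cases than among controls.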

The region of chromosome 19q, more precisely the region located in 19q13.2-3, with which the present invention is concerned, is depicted in Fig. 1 as it is presently known together with the presently known or suspected genes. The arrows indicate the directions of transcription of the genes. The absolute chromosome positions shown are from the particular build of NCBI's map of chromosome 19, and will probably change with time. The position of markers used throughout the experiments is indicated. The intron-exon structure of the RAI gene is shown together with the position of the inter gene region between RAI and XPD genes.

The region r stretches from the beginning of, but not including the XPD gene, to approximately the end of ERCC1 and includes the genes RAI, LOC162978, and ASE-1. More specifically r is bounded by and includes the following two sequences: AGAACCCCCG CCCCTCCACC TCGTCTCAAA and TCCCTCCCCA GAGACTGCAC CAGCGCAGCC, and is defined by SEQ ID NO: 1.

In the present context the region r means SEQ ID NO: 1 and complementary sequence as well as transcriptional products and translational products thereof.

In one preferred embodiment, the gene RAI is defined in the claims as including transcribed sequences of the gene plus a 1500 base upstream promoter region. More specifically RAI is bounded by and includes the following sequences: CATAACCACA ATGATGAGCA TGTATTGAGT and ATGTTGTCCA GGCTGGTCTT GAACTCCTGA. In the present context this section of the region relates to SEQ ID NO: 1 bases 7761-22885 and complementary sequence as well as transcriptional products and translational products thereof.

In another embodiment, one preferred section of the region stretches approximately from the beginning of, but not including the XPD gene, to approximately the end of the RAI gene. In the present context the region means SEQ ID NO: 1 bases 1 to 25550 and complementary sequence as well as transcriptional products and translational products thereof.

In an even more preferred embodiment, one preferred section of the region stretches approximately from the beginning of, but not including the XPD gene, to approximately within the RAI gene. In the present context the region means SEQ ID NO: 1 bases 1 to 15698 and complementary sequence as well as transcriptional products and translational products thereof.

In a most preferred embodiment, one preferred section of the region r stretches approximately from outside the XPD gene, to approximately within the RAI gene. In the present context the region means SEQ ID NO: 1 bases 4528 to 15698 and complementary sequence as well as transcriptional products and translational products thereof.

In a third embodiment, one preferred section of the region r stretches approximately from the beginning of, but not including the XPD gene, into the inter gene region between the RAI and XPD gene. In the present context the region means SEQ ID NO: 1 bases 1 to 1510 and complementary sequence as well as transcriptional products and translational products thereof.

In a further embodiment, one preferred section of the region r stretches approximately from the beginning of, but not including the XPD gene, throughout the inter gene region between the RAI and XPD gene and into the 3' part of the RAI gene. In the present context the region means SEQ ID NO: 1 bases 1710 to 8685 and complementary sequence as well as transcriptional products and translational products thereof.

In yet a further embodiment, one preferred section of the region r stretches approximately over the central part of the RAI gene. In the present context the region means SEQ ID NO: 1 bases 8987 to 12090 and complementary sequence as well as transcriptional products and translational products thereof.

In yet another embodiment, one preferred section of the region r stretches approximately over the middle and 5' part of the RAI gene. In the present context the region means SEQ ID NO: 1 bases 15898 to 25550 and complementary sequence as well as transcriptional products and translational products thereof.

In a further embodiment, one preferred section of the region r stretches over the regions corresponding to SEQ ID NO: 132 and SEQ ID NO: 133 as described in detail elsewhere herein.
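The sub-regions defined above by base ranges in SEQ ID NO: 1 can be handled programmatically as 1-based, inclusive coordinate pairs. The sketch below is illustrative only: the dictionary labels and function names are hypothetical, and it assumes that the coordinates quoted above are 1-based and inclusive and that the sequence is supplied as a plain string.

```python
# Base ranges in SEQ ID NO: 1 as quoted in the embodiments above.
# The labels are hypothetical shorthand, not terms used in the claims.
SUBREGIONS = {
    "RAI gene incl. promoter": (7761, 22885),
    "XPD boundary to end of RAI": (1, 25550),
    "XPD boundary to within RAI": (1, 15698),
    "outside XPD to within RAI": (4528, 15698),
    "XPD boundary into inter gene region": (1, 1510),
    "inter gene region into 3' part of RAI": (1710, 8685),
    "central part of RAI": (8987, 12090),
    "middle and 5' part of RAI": (15898, 25550),
}


def region_length(start, end):
    """Length of a 1-based, inclusive range."""
    return end - start + 1


def extract(sequence, start, end):
    """Return bases start..end (1-based, inclusive) of sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range lies outside the sequence")
    return sequence[start - 1:end]


def regions_containing(position):
    """Labels of all sub-regions that include the given position."""
    return [name for name, (start, end) in SUBREGIONS.items()
            if start <= position <= end]
```

For example, `regions_containing(8785)` lists the sub-regions that include position 8785, the position given for RAI exon 6 in Table 1a.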

Modifications to the human genome map are known to occur from time to time. It is therefore possible that the defining sequences quoted above will change slightly in future maps.

Fragments or parts of the region r as used herein relate to any fragment of at least 5 nucleic acid residues in length, or multiples of 5 nucleic acid residues in length, starting from SEQ ID NO: 1 position 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100. For example at least 21, such as at least 22, for example at least 23, such as at least 24, for example at least 26, such as at least 27, for example at least 28, such as at least 29, for example at least 31, such as at least 32, for example at least 33, such as at least 34, for example at least 36, such as at least 37, for example at least 38, such as at least 39, for example at least 41, such as at least 42, for example at least 43, such as at least 44, for example at least 46, such as at least 47, for example at least 48, or at least 100 nucleic acid residues in length, or multiples of 100 nucleic acid residues in length, starting from SEQ ID NO: 1 position 1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, and so forth, each fragment starting position having an increment of 100 nucleic acid residues. Multiples are preferably multiples of e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 and 50.

For fragments starting at position 1, the length of said fragments will thus be e.g. 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, and so forth, using suitable multipliers as listed herein above.

For fragments starting at position 100, the length of said fragments will thus be e.g. 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, and so forth, using suitable multipliers as listed herein above.

For fragments starting at position 7700, the length of said fragments will thus be e.g. 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500, 14000, 14500, 15000, and so forth, using suitable multipliers such as e.g. the ones listed herein above.
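The fragment definition above amounts to combining a starting position with a length that is a multiple of a unit length. A minimal sketch of that enumeration follows. The function names are hypothetical, and `region_length` is a caller-supplied assumption, since the total length of SEQ ID NO: 1 is not stated in this passage.

```python
def fragment_lengths(unit=100, multipliers=range(1, 51)):
    """Fragment lengths as multiples of a unit length (e.g. 100..5000)."""
    return [unit * m for m in multipliers]


def fragment_bounds(start, length, region_length):
    """Return (start, end), 1-based inclusive, or None if the fragment
    would extend beyond the region."""
    end = start + length - 1
    if start < 1 or end > region_length:
        return None
    return (start, end)


def fragments_from(start, region_length, unit=100, multipliers=range(1, 51)):
    """All fragments beginning at start that fit inside the region."""
    fragments = []
    for length in fragment_lengths(unit, multipliers):
        bounds = fragment_bounds(start, length, region_length)
        if bounds is not None:
            fragments.append(bounds)
    return fragments
```

Calling `fragment_lengths(unit=5, multipliers=range(1, 4))` yields the lengths 5, 10 and 15, corresponding to the multiples of 5 residues mentioned above.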

The nucleic acid sequences according to the present invention make it possible to estimate cancer risk in an individual by using sequence polymorphisms originating from a specific region of chromosome 19.

Estimation of disease risks has a number of important applications, which in the following are exemplified with respect to cancer, but also apply to other diseases, as described herein:

(1) Individuals with reasons to suspect that they are at risk for getting cancer would be able to clarify their situation and, if possible, take protective action. Alternatively, anti-cancer campaigns, companies, hospitals or other institutions could offer a service to help people clarify their situation. It would for instance be possible to test persons when they got their first basal cell carcinoma, which is often recurrent and also is a moderate predictor for other cancers. If the persons were in a high-risk group, one could then advise them about, or they could of their own accord choose, risk-reducing behaviour, such as avoidance of excessive sun-exposure, abstaining from smoking etc. About 5 percent of the Danish population will at some point in their life get a basal cell carcinoma.

(2) Anti-cancer campaigns, companies, hospitals or other institutions would be able to define relevant target subpopulations and focus information on risk-reducing behaviour on these persons. They might perhaps also be in a position to inform the remainder of the population that they need not worry. Lung cancer affects approximately 10-15 percent of smokers and thus approximately 5 percent of the population, somewhat varying from country to country. Malignant melanoma, a sun-induced, often lethal form of skin cancer, affects approximately 700 persons a year in Denmark or about 1 percent of the Danish population.

(3) The drugs used in cancer treatment are often carcinogenic themselves and individual responses to them vary considerably, both with respect to tolerance to the treatment and with respect to efficacy of the treatment. It is an obvious possibility that the region of chromosome 19 here dealt with, which contains DNA repair genes known to modulate carcinogen responses, also modulates response to anti-cancer agents. Hence, analysis of the region may facilitate better choices of treatment for cancer, and/or help predict the future course of disease.

By sequence polymorphism is understood any single nucleotide, tandem repeat, insertion, deletion or block polymorphism, which varies among humans, whether it is of known biological importance or not.

Position of sequence polymorphism in the region r

In one embodiment of the methods of the invention, preferably the method for diagnosis as described herein, one or more single nucleotide polymorphism(s) at a predetermined position in the region (SEQ ID NO:1) are identified and used for e.g. cancer risk profiling and/or cancer treatment response profiling. Presently preferred polymorphism(s) are listed in Tables 1a, 1b and 1c, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected. However, the present invention relates to any polymorphism in the region.

Table 1a

| Trivial name | Kind | dbSNP # | Sequence | Position in sequence | Relative position | Position in SEQ ID NO:1 | SEQ ID NO: |
|---|---|---|---|---|---|---|---|
| XPD-81 bp | 81 bp/- | rs3916787 | NT_011109 | 18143027 | 19890 | 632 | 33 |
| XPD-5'2 | A/G | rs2097215 | NT_011109 | 18144005 | 20868 | 1610 | 34 |
| XPD-5'3 | C/T | rs11878644 | NT_011109 | 18145185 | 22048 | 2790 | 35 |
| RAI-3'7 | C/T | rs7252567 | NT_011109 | 18146823 | 23686 | 4428 | 21 |
| RAI-3'3 | AA/- | rs3047560 | NT_011109 | 18147192 | 24055 | 4797 | 23 |
| RAI-3'6 | A/C | rs10422489 | NT_011109 | 18147886 | 24749 | 5491 | 24 |
| RAI-3'5 | C/T | rs10426701 | NT_011109 | 18148193 | 25056 | 5798 | 25 |
| RAI-3'4 | A/G | rs4544343 | NT_011109 | 18150199 | 27062 | 7804 | 26 |
| RAI-3'8 | A/G | rs8101662 | NT_011109 | 18150911 | 27774 | 8516 | 27 |
| RAI exon 6 | A/T | rs6966 | NT_011109 | 18151180 | 28043 | 8785 | 36 |
| RAI intron5-2 | A/T | rs8112723 | NT_011109 | 18153497 | 30360 | 10204 | 37 |
| RAI intron 3 | A/G | rs2017104 | NT_011109 | 18155483 | 32346 | 12190 | 38 |
| RAI intron 1 | A/G | rs1970764 | NT_011109 | 18159091 | 35954 | 15798 | 28 |
| RAI intron 1-2 | A/G | | AC092309 | 24351 | 39069 | 18913 | 45 |
| RAI intron 1-3 | A/C | | AC092309 | 25115 | 39833 | 19677 | 46 |
| RAI-5'2 | C/T | rs4803814 | NT_011109 | 18168944 | 45807 | 25650 | 39 |
| RAI-5'3 | C/T | rs4803815 | NT_011109 | 18168948 | 45811 | 25564 | 40 |
| RAI-5' | C/T | rs4572514 | NT_011109 | 18171984 | 48847 | 28691 | 41 |
| ASE1-5'2 | G/T | rs2226949 | NT_011109 | 18175327 | 52190 | 32035 | 42 |
| ASE1 exon 1 | A/G | rs967591 | NT_011109 | 18178152 | 55015 | 34854 | 43 |
| ERCC1-3' | C/T | rs762562 | NT_011109 | 18180561 | 57424 | 37267 | 44 |

rs numbers were derived from the NCBI's database dbSNP.

Table 1b

| dbSNP # | Sequence context | Position in SEQ ID NO: 1 | SEQ ID NO: |
|---|---|---|---|
| rs3916787 | gctgcagtga gctgt (-/ACACCTGTGGTCCCAGCTACTCTGGAAGCTGAGGTGGGAGGATCGCTTGAGCCCAAGAGGTGGAGGCTGCAGTGAGCTGT) gactgtgcca ctgcactcca | 632 | 33 |
| rs2097215 | TGACAGTAGA CATCCTGTCA T (A/G) ATAAGTCttt ttttttt | 1610 | 34 |
| rs11878644 | CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct | 2790 | 35 |
| rs7252567 | ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc | 4428 | 21 |
| rs3047560 | ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC | 4797 | 23 |
| rs10422489 | AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG | 5491 | 24 |
| rs10426701 | ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac | 5798 | 25 |
| rs4544343 | TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG | 7804 | 26 |
| rs8101662 | CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG | 8516 | 27 |
| rs6966 | ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA | 8785 | 36 |
| rs8112723 | ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa | 10204 | 37 |
| rs2017104 | gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt | 12190 | 38 |
| rs1970764 | tgcagtgagc tgagatcgc (A/G) ccactgcact ccagcctggg | 15798 | 28 |
| AC092309 | CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA | 18913 | 45 |
| AC092309 | CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG | 19677 | 46 |
| rs4803814 | CCCTCCCTGCTTGCTTGCTTTCTCT [C/T] TCTCTCTTTCTTTCTTTCTTTCTT | 25650 | 39 |
| rs4803815 | CCCTGCTTGCTTGCTTTCTCTCTCT [C/T] TCTTTCTTTCTTTCTTTCTTTCTT | 25564 | 40 |
| rs4572514 | AGAACCTGTTCAGGCTGGCGGCTCA [C/T] TTGGATGAACAGGGAGTGTGTGAC | 28691 | 41 |
| rs2226949 | CCCCCTTCTTAGGACGCATGGGGGT [G/T] GAGAGAACGGGGAGATAGACAGAG | 32035 | 42 |
| rs967591 | TGCGAGCAGCCCGGGCTACAGGGTT [A/G] CCTGAGGTGTGGGTCCCAGGATGG | 34854 | 43 |
| rs762562 | GGCGCCTCAACAGCCAGAAGGAGCG [A/G] AGCCTCAGGCCCAGGCAGCTCTGG | 37267 | 44 |
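The sequence contexts in Table 1b give the two alleles of each polymorphism in parentheses or square brackets between the flanking sequences, with "-" denoting the absence of bases. The following minimal sketch shows how such an entry could be split into its flanks and alleles and expanded into the two allelic sequences. The function names and the regular expression are hypothetical and only cover the notation as it appears in the table.

```python
import re

# left flank, "(" or "[", allele 1, "/", allele 2, ")" or "]", right flank
_CONTEXT = re.compile(
    r"^(?P<left>[ACGTacgt\s]*)[\(\[]"
    r"(?P<a1>[^/\)\]]+)/(?P<a2>[^/\)\]]+)"
    r"[\)\]](?P<right>[ACGTacgt\s]*)$"
)


def _clean(text):
    """Remove whitespace and convert to upper case."""
    return re.sub(r"\s+", "", text).upper()


def parse_context(text):
    """Split 'flank (X/Y) flank' into (left, (allele1, allele2), right)."""
    match = _CONTEXT.match(text.strip())
    if match is None:
        raise ValueError("unrecognised sequence context: %r" % text)
    alleles = (_clean(match.group("a1")), _clean(match.group("a2")))
    return _clean(match.group("left")), alleles, _clean(match.group("right"))


def allele_sequences(text):
    """Return the two full sequences, one per allele ('-' means no bases)."""
    left, alleles, right = parse_context(text)
    return [left + ("" if allele == "-" else allele) + right
            for allele in alleles]
```

Applied to the entry for rs6966, `parse_context` separates the flanks `ATTAAGTGCCTTCACACAGC` and `CTGGTTTAATGTTTATAA` from the alleles A and T.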

Table 1c

Trivial name Kind dbSNP # Sequence Position Relative SEQ in sequence position ID NO:

XPD exon 23 A/C Rs1052559 NT 011109 18123137 0

XPD exon 10 A/G Rs1799793 NT 011109 18135477 12340

XPD exon 6 A/C Rs238406 NT 011109 18136527 13390

XPD intron 3 A/G Rs1799783 NT 011109 18140254 17117

XPD-4bp GACA/- RS3916791 NT 011109 18142345 19208

XPD-81 bp 81 bp/- Rs3916787 NT 011109 18143027 19890 33

XPD-5'2 A/G Rs2097215 NT 011109 18144005 20868 34

XPD-5'3 C/T Rs11878644 NT 011109 18145185 22048 35

RAI-3'7 C/T Rs7252567 NT 011109 18146823 23686 21

RAI-3'3 AA/- Rs3047560 NT 011109 18147192 24055 23

RAI-3'4 A/G RS4544343 NT 011109 18150199 27062 26

RAI exon 6 A/T Rs6966 NT 011109 18151180 28043 36

RAI intron5-3 C/T Rs10417235 NT 011109 18152171 29034

RAI intron5-2 A/T Rs8112723 NT 011109 18153497 30360 37

RAI intron 3 A/G RS2017104 NT 011109 18155483 32346

RAI intron 1 A/G Rs1970764 NT 011109 18159091 35954 28

RAI intron 1-2 C/T RS6509210 NT 011109 18162206 39069 45

RAI intron 1-3 A/C NT 011109 18162970 39833 46

RAI-5'2 C/T RS4803814 NT 011109 18168944 45807 39

RAI-5'3 C/T Rs4803815 NT 011109 18168948 45811 40

RAI-5' C/T Rs4572514 NT 011109 18171984 48847 41

ASE1-5'2 G/T RS2226949 NT 011109 18175327 52190 42

ASE1 exon 1 A/G Rs967591 NT 011109 18178152 55015 43

ERCC1-3'2 A/G RS735482 NT 011109 18180220 57083 44

ERCC1-3' C/T RS762562 NT 011109 18180561 57424

ERCC1-3'3 A/G Rs2336219 NT 011109 18180624 57487

ERCC1 exon 4 I- C/T Rs3177700 NT 011109 18191871 68734
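The "Relative position" column of Table 1c appears to be the contig position minus the contig position of XPD exon 23 (18123137), which the table lists with relative position 0. The following minimal Python sketch checks that reading against a few rows copied from the table; the function and variable names are illustrative assumptions and are not part of the specification.

```python
# Contig position of XPD exon 23 (rs1052559) in Table 1c, listed with relative position 0.
# Treating it as the origin of the "Relative position" column is an assumption
# inferred from the table values, not a statement made in the specification.
ANCHOR_CONTIG_POSITION = 18123137


def relative_position(contig_position: int) -> int:
    """Return the offset of a contig position from the XPD exon 23 anchor."""
    return contig_position - ANCHOR_CONTIG_POSITION


# (contig position, relative position) pairs copied from Table 1c.
TABLE_1C_SAMPLE = {
    "rs1799793": (18135477, 12340),
    "rs3047560": (18147192, 24055),
    "rs6966": (18151180, 28043),
    "rs2017104": (18155483, 32346),
}


def rows_are_consistent(rows: dict) -> bool:
    """Check that every listed relative position matches the computed offset."""
    return all(relative_position(contig) == rel for contig, rel in rows.values())
```

The same subtraction does not give the "Position in SEQ ID NO:1" column of the later tables with a single constant, so that column is better read from the tables than computed.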

In one embodiment of the methods of the invention, preferably the method for diagnosis as described herein, one or more single nucleotide polymorphism(s) at a predetermined position in the region (SEQ ID NO:1) are identified and used for e.g. cancer risk profiling and/or cancer treatment response profiling. Presently preferred polymorphism(s) are listed in Tables 2a and 2b, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected. However, the present invention relates to any polymorphism in the region.

Table 2a

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

RAI-3'4 A/G rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G rs8101662 NT 011109 18150911 27774 8516 27

RAI exon 6 A/T rs6966 NT 011109 18151180 28043 8785 36

RAI-i5-2 A/T rs8112723 NT 011109 18153497 30360 10204 37

RAI intron 3 A/G rs2017104 NT 011109 18155483 32346 12190 38

RAI intron 1 A/G rs1970764 NT 011109 18159091 35954 15798 28

RAI intron 1-2 A/G AC092309 24351 39069 18913 45

RAI intron 1-3 A/C AC092309 25115 39833 19677 46

rs numbers were derived from the NCBI's database dbSNP.

Table 2b

dbSNP # Sequence Position in SEQ ID NO: 1 SEQ ID NO:

rs4544343 TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG 7804 26

rs8101662 CTCGGGAGGCTGAGGCAGGAGAATC(A/G)CTTGAACTCAGGCAGAGGTTG 8516 27

rs6966 ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA 8785 36

rs8112723 ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa 10204 37

rs2017104 gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt 12190 38

rs1970764 tgcagtgagc tgagatcgc (A/G) ccactgcact ccagcctggg 15798 28

AC092309 CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA 18913 45

AC092309 CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG 19677 46
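The sequence entries in Table 2b give the two alleles in brackets between the 5' and 3' flanking sequence, with "-" marking a deletion. As a rough illustration of how such an entry can be read, the Python sketch below splits an entry into its flanks and alleles and writes out the full sequence for each allele. The helper names are hypothetical, and the parser only assumes the bracket notation used in the tables.

```python
import re

# Matches the bracketed allele list, e.g. "(A/G)" or "[C/T]" or "(-/AA)".
POLYMORPHISM = re.compile(r"[\(\[]([^\)\]]+)[\)\]]")


def parse_flanked(entry: str):
    """Split a table entry into (left flank, list of alleles, right flank).

    Whitespace inside the entry is ignored; "-" is read as an empty allele.
    """
    compact = re.sub(r"\s+", "", entry)
    match = POLYMORPHISM.search(compact)
    if match is None:
        raise ValueError("no bracketed polymorphism found in entry")
    alleles = ["" if a == "-" else a for a in match.group(1).split("/")]
    return compact[:match.start()], alleles, compact[match.end():]


def allele_sequences(entry: str):
    """Return one upper-case sequence per allele, flanks included."""
    left, alleles, right = parse_flanked(entry)
    return [(left + allele + right).upper() for allele in alleles]
```

For example, `allele_sequences("CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG")` yields the sequence without and with the inserted A, which is the form a probe or primer design step would typically start from.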

In one embodiment of the methods of the invention, preferably the method for diagnosis as described herein, one or more single nucleotide polymorphism(s) at a predetermined position in the region (SEQ ID NO: 1) are identified and used for e.g. cancer risk profiling and/or cancer treatment response profiling. Presently preferred polymorphism(s) are listed in Tables 3a, 3b and 3c, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected. However, the present invention relates to any polymorphism in the region.

Table 3a

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

XPD-81 bp 81 bp/- Rs3916787 NT 011109 18143027 19890 632 33

XPD-5'2 A/G Rs2097215 NT 011109 18144005 20868 1610 34

XPD-5'3 C/T Rs11878644 NT 011109 18145185 22048 2790 35

RAI-3'7 C/T Rs7252567 NT 011109 18146823 23686 4428 21

RAI-3'3 AA/- Rs3047560 NT 011109 18147192 24055 4797 23

RAI-3'6 A/C 10422489 NT 011109 18147886 24749 5491 24

RAI-3'5 C/T 10426701 NT 011109 18148193 25056 5798 25

RAI-3'4 A/G Rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G Rs8101662 NT 011109 18150911 27774 8516 27

RAI exon 6 A/T Rs6966 NT 011109 18151180 28043 8785 36

RAI-i5-2 A/T Rs8112723 NT 011109 18153497 30360 10204 37

RAI intron 3 A/G Rs2017104 NT 011109 18155483 32346 12190 38

RAI intron 1 A/G Rs1970764 NT 011109 18159091 35954 15798 28

RAI intron 1-2 A/G AC092309 24351 39069 18913 45

RAI intron 1-3 A/C AC092309 25115 39833 19677 46

rs numbers were derived from the NCBI's database dbSNP.

Table 3b

dbSNP # Sequence Position in SEQ ID NO: 1 SEQ ID NO:

rs3916787 gctgcagtga gctgt (-/ACACCTGTGGTCCCAGCTACTCTGGAAGCTGAGGTGGGAGGATCGCTTGAGCCCAAGAGGTGGAGGCTGCAGTGAGCTGT) gactgtgcca ctgcactcca 632 33

rs2097215 TGACAGTAGA CATCCTGTCA T (A/G) ATAAGTCttt ttttttt 1610 34

rs11878644 CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct 2790 35

rs7252567 ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc 4428 21

rs3047560 ACTAAAAATAAAAAATAAAAAAAA(-/AA) ATAGCCGAGCATGGTGGTGGGGTGC 4797 23

rs10422489 AAAAAACTAAAGTGGGGTTTGCGGG(G/T)AGTGGGAGGGCCCTTCCTGCTAGG 5491 24

rs10426701 ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac 5798 25

rs4544343 TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG 7804 26

rs8101662 CTCGGGAGGCTGAGGCAGGAGAATC(A/G)CTTGAACTCAGGCAGAGGTTG 8516 27

rs6966 ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA 8785 36

rs8112723 ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa 10204 37

rs2017104 gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt 12190 38

rs1970764 tgcagtgagc tgagatcgc (A/G) ccactgcact ccagcctggg 15798 28

AC092309 CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA 18913 45

AC092309 CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG 19677 46
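Table 3a lists RAI-3'6 (rs10422489) as A/C, while the flanking sequence in Table 3b shows (G/T) at the same locus. One reading, offered here only as an assumption, is that the two tables report the alleles on complementary strands. The short Python sketch below shows the complement and reverse-complement operations involved; the function names are illustrative.

```python
# Translation table for complementing DNA bases, preserving letter case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def complement_alleles(alleles):
    """Return the alleles as they would appear on the opposite strand, sorted."""
    return sorted(allele.translate(COMPLEMENT) for allele in alleles)
```

Under that reading, `complement_alleles(["G", "T"])` gives `["A", "C"]`, which matches the "Kind" column for RAI-3'6.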

More preferably, the polymorphism(s) are those listed below in Tables 3c and 3d; more preferably at least two polymorphism(s) are selected, and most preferably at least three polymorphism(s) are selected from SEQ ID NO:1 below.

Table 3c

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

XPD-81 bp 81 bp/- Rs3916787 NT 011109 18143027 19890 632 33

XPD-5'2 A/G Rs2097215 NT 011109 18144005 20868 1610 34

XPD-5'3 C/T Rs11878644 NT 011109 18145185 22048 2790 35

RAI-3'7 C/T rs7252567 NT 011109 18146823 23686 4428 21

RAI-3'3 AA/- rs3047560 NT 011109 18147192 24055 4797 23

RAI-3'6 A/C 10422489 NT 011109 18147886 24749 5491 24

RAI-3'5 C/T 10426701 NT 011109 18148193 25056 5798 25

RAI-3'4 A/G rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G rs8101662 NT 011109 18150911 27774 8516 27

RAI exon 6 A/T rs6966 NT 011109 18151180 28043 8785 36

RAI-i5-2 A/T rs8112723 NT 011109 18153497 30360 10204 37

RAI intron 3 A/G rs2017104 NT 011109 18155483 32346 12190 38

rs numbers were derived from the NCBI's database dbSNP.

Table 3d

dbSNP # Sequence Position in SEQ ID NO: 1 SEQ ID NO:

rs3916787 gctgcagtga gctgt (-/ACACCTGTGGTCCCAGCTACTCTGGAAGCTGAGGTGGGAGGATCGCTTGAGCCCAAGAGGTGGAGGCTGCAGTGAGCTGT) gactgtgcca ctgcactcca 632 33

rs2097215 TGACAGTAGA CATCCTGTCA T (A/G) ATAAGTCttt ttttttt 1610 34

rs11878644 CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct 2790 35

rs7252567 ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc 4428 21

rs3047560 ACTAAAAATAAAAAATAAAAAAAA(-/AA) ATAGCCGAGCATGGTGGTGGGGTGC 4797 23

rs10422489 AAAAAACTAAAGTGGGGTTTGCGGG(G/T)AGTGGGAGGGCCCTTCCTGCTAGG 5491 24

rs10426701 ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac 5798 25

rs4544343 TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG 7804 26

rs8101662 CTCGGGAGGCTGAGGCAGGAGAATC(A/G)CTTGAACTCAGGCAGAGGTTG 8516 27

rs6966 ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA 8785 36

rs8112723 ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa 10204 37

rs2017104 gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt 12190 38

Even more preferably polymorphism(s) are those listed below in tables 3e and 3f, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected from SEQ ID NO: 1 below:

Table 3e

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

RAI-3'3 AA/- rs3047560 NT 011109 18147192 24055 4797 23

RAI-3'6 A/C 10422489 NT 011109 18147886 24749 5491 24

RAI-3'5 C/T 10426701 NT 011109 18148193 25056 5798 25

RAI-3'4 A/G rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G rs8101662 NT 011109 18150911 27774 8516 27

RAI exon 6 A/T rs6966 NT 011109 18151180 28043 8785 36

RAI-i5-2 A/T rs8112723 NT 011109 18153497 30360 10204 37

RAI intron 3 A/G rs2017104 NT 011109 18155483 32346 12190 38

rs numbers were derived from the NCBI's database dbSNP.

Table 3f

dbSNP # Sequence Position in SEQ ID NO: 1 SEQ ID NO:

rs3047560 ACTAAAAATAAAAAATAAAAAAAA(-/AA) ATAGCCGAGCATGGTGGTGGGGTGC 4797 23

rs10422489 AAAAAACTAAAGTGGGGTTTGCGGG(G/T)AGTGGGAGGGCCCTTCCTGCTAGG 5491 24

rs10426701 ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac 5798 25

rs4544343 TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG 7804 26

rs8101662 CTCGGGAGGCTGAGGCAGGAGAATC(A/G)CTTGAACTCAGGCAGAGGTTG 8516 27

rs6966 ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA 8785 36

rs8112723 ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa 10204 37

rs2017104 gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt 12190 38

Most preferably polymorphism(s) are those listed in tables 4a and 4b below, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected from the polymorphisms shown below:

Table 4a

Trivial name Kind dbSNP # Sequence Position in SEQ ID 1 Relative position SEQ ID NO:

RAI-3'3 AA/- rs3047560 NT 011109 24055 23

RAI exon 6 A/T rs6966 NT 011109 28043 36

RAI intron 3 A/G rs2017104 NT 011109 32346 38

rs numbers were derived from the NCBI's database dbSNP.

Table 4b

dbSNP # Sequence Position in SEQ ID NO: 1 SEQ ID NO:

rs3047560 ACTAAAAATAAAAAATAAAAAAAA(-/AA) ATAGCCGAGCATGGTGGTGGGGTGC 4797 23

rs6966 ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA 8785 36

rs2017104 gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt 12190 38

In yet another embodiment according to the methods of the invention, preferably the method for diagnosis as described herein, one or more single nucleotide polymorphism(s) at a predetermined position in the region (SEQ ID NO: 1) are identified and used for e.g. cancer risk profiling and/or cancer treatment response profiling. In this embodiment the preferred polymorphism(s) are listed in Table 5a, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected. However, the present invention relates to any polymorphism in the region.

Table 5a

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

XPD-81 bp 81 bp/- rs3916787 NT 011109 18143027 19890 632 33

XPD-5'3 C/T rs11878644 NT 011109 18145185 22048 2790 35

RAI-3'7 C/T rs7252567 NT 011109 18146823 23686 4428 21

RAI-3'3 AA/- rs3047560 NT 011109 18147192 24055 4797 23

RAI-3'6 A/C 10422489 NT 011109 18147886 24749 5491 24

RAI-3'5 C/T 10426701 NT 011109 18148193 25056 5798 25

RAI-3'4 A/G rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G rs8101662 NT 011109 18150911 27774 8516 27

RAI-i5-2 A/T rs8112723 NT 011109 18153497 30360 10204 37

RAI intron 1-2 A/G AC092309 24351 39069 18913 45

RAI intron 1-3 A/C AC092309 25115 39833 19677 46

rs numbers were derived from the NCBI's database dbSNP.

More preferably, the polymorphism(s) are those listed in Table 5b below, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected from the polymorphisms shown below:

Table 5b

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

XPD-5'3 C/T rs11878644 NT 011109 18145185 22048 2790 35

RAI-3'7 C/T rs7252567 NT 011109 18146823 23686 4428 21

RAI-3'3 AA/- rs3047560 NT 011109 18147192 24055 4797 23

RAI-3'6 A/C 10422489 NT 011109 18147886 24749 5491 24

RAI-3'5 C/T 10426701 NT 011109 18148193 25056 5798 25

RAI-3'4 A/G rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G rs8101662 NT 011109 18150911 27774 8516 27

RAI-i5-2 A/T rs8112723 NT 011109 18153497 30360 10204 37

RAI intron 1-2 A/G AC092309 24351 39069 18913 45

RAI intron 1-3 A/C AC092309 25115 39833 19677 46

rs numbers were derived from the NCBI's database dbSNP.

In an even more preferred embodiment polymorphism(s) are those listed in table 5c below, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected from the polymorphisms shown below:

Table 5c

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

XPD-5'3 C/T rs11878644 NT 011109 18145185 22048 2790 35

RAI-3'7 C/T rs7252567 NT 011109 18146823 23686 4428 21

RAI-3'3 AA/- rs3047560 NT 011109 18147192 24055 4797 23

RAI-3'6 A/C 10422489 NT 011109 18147886 24749 5491 24

RAI-3'5 C/T 10426701 NT 011109 18148193 25056 5798 25

RAI-3'4 A/G rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G rs8101662 NT 011109 18150911 27774 8516 27

RAI-i5-2 A/T rs8112723 NT 011109 18153497 30360 10204 37

rs numbers were derived from the NCBI's database dbSNP.

In a most preferred embodiment polymorphism(s) are those listed in table 5d below, more preferably at least two polymorphism(s) are selected, most preferably at least three polymorphism(s) are selected from the polymorphisms shown below:

Table 5d

Trivial name Kind dbSNP # Sequence Position in sequence Relative position Position in SEQ ID NO:1 SEQ ID NO:

XPD-5'3 C/T rs11878644 NT 011109 18145185 22048 2790 35

RAI-3'7 C/T rs7252567 NT 011109 18146823 23686 4428 21

RAI-3'3 AA/- rs3047560 NT 011109 18147192 24055 4797 23

RAI-3'6 A/C 10422489 NT 011109 18147886 24749 5491 24

RAI-3'5 C/T 10426701 NT 011109 18148193 25056 5798 25

RAI-3'4 A/G rs4544343 NT 011109 18150199 27062 7804 26

RAI-3'8 A/G rs8101662 NT 011109 18150911 27774 8516 27

rs numbers were derived from the NCBI's database dbSNP.

In a preferred embodiment at least one of the following combinations of polymorphisms is included in the methods:

In another embodiment of the invention preferably the method described herein is one in which the tandem repeat is at a position as described in Table 6:

Table 6

Identification in uniSTS 2

D19S908 STS-W67936 D19S543 D19S393 STS-R48186 GDB:181915 RH47033 GDB:190019

2 UniSTS is a database of unique sequence-tagged sites established by the National Center for Biotechnology Information and located on the internet at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists

In another embodiment of the invention, the method for diagnosis described herein is preferably one in which the sequence polymorphism is in region r. Without wishing to be bound by theoretical considerations, testing for the presence of the RAI gene allele is especially preferred because of its association with an increased risk of cancer (as explained herein).

In one embodiment of the methods of the invention, preferably the method for diagnosis as described herein, one or more polymorphism(s) at a predetermined position in the region r (SEQ ID NO:1) are identified and used for e.g. cancer risk profiling and/or cancer treatment response profiling. Presently preferred polymorphism(s) are the dinucleotide polymorphism RAI-3'3 in which AA or a deletion is present, the single nucleotide polymorphism RAI exon 6 in which A or T is present, and RAI intron 3 in which A or G is present. However, the present invention relates to any polymorphism and SNP in the r region.

The sequence polymorphism of the invention comprises at least one base difference, such as at least two base differences, such as at least three base differences, such as at least four base differences, such as eighty-one base pair differences. As described above, the sequence polymorphism(s) comprise at least one polymorphism, such as at least two polymorphisms, such as at least three polymorphisms, such as at least four polymorphisms. Also, the sequence polymorphism comprises at least one polymorphism, such as at least two tandem repeat polymorphisms.

Also, the sequence polymorphism may be a combination of single nucleotide polymorphism and dinucleotide polymorphism, such as one single nucleotide polymorphism and one dinucleotide polymorphism.

The status of the individual may be determined by reference to allelic variation at one, two, three, four or more of the above loci.
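As a loose illustration of assessing an individual at "one, two, three, four or more" loci, the Python sketch below checks how many loci from a chosen panel have been genotyped and whether a minimum count is met. The panel shown is the three polymorphisms of Table 4a; the genotype dictionary format and the function names are assumptions made for this example only, and the sketch performs no risk calculation.

```python
# Panel taken from Table 4a (RAI-3'3, RAI exon 6, RAI intron 3).
PREFERRED_PANEL = ("rs3047560", "rs6966", "rs2017104")


def typed_panel_loci(genotypes: dict, panel=PREFERRED_PANEL) -> list:
    """Return the panel loci for which a genotype call is present.

    `genotypes` maps a dbSNP identifier to a pair of allele strings,
    e.g. {"rs6966": ("A", "T")}; missing or empty entries are skipped.
    """
    return [locus for locus in panel if genotypes.get(locus)]


def meets_minimum(genotypes: dict, minimum: int = 2, panel=PREFERRED_PANEL) -> bool:
    """True if at least `minimum` loci of the panel have been genotyped."""
    return len(typed_panel_loci(genotypes, panel)) >= minimum
```

A sample typed at rs6966 and rs2017104 would satisfy the "at least two" preference but not the "at least three" preference.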

The primary effectors, i.e. the mutations causing cancer according to the present invention, may be found within the region of SEQ ID NO: 132 including and flanked by the marker RAI-3'7 and the polymorphism having the sequence AAGTTTCTCTATT[G/T]TGTTTATAAACA corresponding to position 18151158 of contig nt_011109 (SEQ ID NO.: 6). The region and markers herein are shown to be important in relation to cancer according to the present invention. The sequence polymorphisms considered to be primary effectors may be selected from the group consisting of

* novel polymorphisms identified in the present invention

In yet another embodiment of the present invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group consisting of

In a preferred embodiment the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group consisting of

In another embodiment the primary effectors i.e. the mutations causing cancer according to the present invention may further be selected from the group consisting of

In yet another embodiment the primary effectors i.e. the mutations causing cancer according to the present invention may further be selected from the group consisting of

In another embodiment the primary effectors i.e. the mutations causing cancer according to the present invention may further be selected from the group consisting of

In another preferred embodiment the primary effectors i.e. the mutations causing cancer according to the present invention may further be selected from the group consisting of

In yet another preferred embodiment the primary effectors i.e. the mutations causing cancer according to the present invention may further be selected from the group consisting of

In yet another preferred embodiment the primary effectors i.e. the mutations causing cancer according to the present invention may further be selected from the group consisting of

However, in another embodiment of the present invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group of polymorphisms consisting of

In yet another embodiment of the present invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group of polymorphisms consisting of

In a further embodiment of the present invention the polymorphism according to the present invention is

In yet a further embodiment of the invention the polymorphism present in the region SEQ ID NO: 132 is

The primary effector may be the mutation corresponding to the sequence TTTTAGTAGAGACATGGTTCCGCCA[C/T]GTTGCCCAGGCTGGTCTTGAACTCC positioned at 18146823 of Contig nt_011109 (SEQ ID NO.: 21). In another embodiment the primary effector may be the mutation corresponding to the sequence ctggggaggctgaggcaggagaatc[A/G]cttgaaaccgggaggcggaggttgt positioned at 18147126 of Contig nt_011109 (SEQ ID NO.: 22). However, in another preferred embodiment the primary effector may be the mutation corresponding to the sequence GATTGTCATGT[G/T]ACATCAGCCAATACT positioned at 18146233 of Contig nt_011109 (SEQ ID NO.: 2). In yet another embodiment the primary effector may be the mutation corresponding to the sequence caggcggatcacaaggtcaggagtt[C/T]gagaccagcctggccaacacagtga positioned at 18148193 of Contig nt_011109 (SEQ ID NO.: 25). In a further embodiment the primary effector may be the mutation corresponding to the sequence CACAGTGAAAC[C/T]CCATCTCTACTAAA positioned at 18149120 of Contig nt_011109 (SEQ ID NO.: 3). In another preferred embodiment the primary effector may be the mutation corresponding to the sequence AGCCTGGCCAACATG[C/G]TGAAACCCCGTCTCT positioned at 18150815 of Contig nt_011109 (SEQ ID NO.: 5). In yet another embodiment the primary effector may be the mutation corresponding to the sequence ctcgggaggctgaggcaggagaatc[A/G]cttgaactcaggaggcagaggttgc positioned at 18150911 of Contig nt_011109 (SEQ ID NO.: 27). In a further embodiment the primary effector may be the mutation corresponding to the sequence AAGTTTCTCTATT[G/T]TGTTTATAAACA positioned at 18151158 of Contig nt_011109 (SEQ ID NO.: 6). However, in another preferred embodiment the primary effector may be the mutation corresponding to the sequence CCCTATGTTGTCCAAGCTGGCAGAG[A/G]TTTTTGTTTGTTTGTTTGAGAGGGA positioned at 18150199 of Contig nt_011109 (SEQ ID NO.: 26). Similarly in yet a further embodiment, the primary effector may be the mutation corresponding to the sequence actaaaaataaaaaaataaaaaaaa[-/AA]atagccgagcatggtggtgggtgcc positioned at 18147192 of Contig nt_011109 (SEQ ID NO.: 23). In another embodiment the primary effector may be the mutation corresponding to the sequence AAAAAACTAAAGTGGGGTTTGCGGG[G/T]AGTGGGAGGGCCCTTCCTGCTAGGT positioned at 18147886 of Contig nt_011109 (SEQ ID NO.: 24). In a further embodiment the primary effector may be the mutation corresponding to the sequence

AAAATTAGCCGG[A/G]CGCCATGGCGGGAG positioned at 18149154 of Contig nt_011109 (SEQ ID NO.: 4). In an especially preferred embodiment the primary effector may be the mutation RAI3'd1 corresponding to the sequence GGTTTAT[ATTTT]Ntgagatggatttt positioned at 18147012 of Contig nt_011109 (SEQ ID NO.: 19), wherein N denotes the number of repeats of the ATTTT sequence. It is appreciated that the sequences may be disclosed as the complementary sequence. N is for example at least 2 repeats, such as at least 3 repeats, for example at least 4 repeats, such as at least 5 repeats, for example at least 6 repeats, such as at least 7 repeats, for example at least 8 repeats, such as at least 9 repeats, for example at least 10 repeats, such as at least 11 repeats, for example at least 12 repeats, such as at least 13 repeats, for example at least 14 repeats, such as at least 15 repeats, for example at least 16 repeats, such as at least 17 repeats, for example at least 18 repeats, such as at least 19 repeats, for example at least 20 repeats, such as at least 21 repeats, for example at least 22 repeats, such as at least 23 repeats, for example at least 24 repeats, such as at least 25 repeats, for example at least 26 repeats, such as at least 27 repeats, for example at least 28 repeats, such as at least 29 repeats, for example at least 30 repeats. In one embodiment of the invention the number of repeats ranging from at least 2 to at least 7 is correlated to an increased risk of developing breast cancer, for example ranging from at least 2 to at least 6 repeats, such as ranging from at least 2 to at least 5 repeats, for example ranging from at least 2 to at least 4 repeats, such as ranging from at least 3 to at least 5 repeats, for example ranging from at least 3 to at least 7 repeats, such as ranging from at least 3 to at least 6 repeats, for example ranging from at least 3 to at least 5 repeats, such as ranging from at least 3 to at least 4 repeats, for example ranging from at least 4 to at least 7 repeats, such as ranging from at least 4 to at least 6 repeats, for example ranging from at least 4 to at least 5 repeats. In general, the lower the number of repeats the higher the risk of developing breast cancer.
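To make the meaning of N concrete, the following Python sketch counts consecutive ATTTT units directly after the GGTTTAT prefix given for RAI3'd1. It is only an illustration on a plain sequence string: the function name and default arguments are assumptions, and real genotyping of a tandem repeat would normally rely on fragment sizing or sequencing rather than string matching.

```python
import re


def count_repeats(sequence: str, prefix: str = "GGTTTAT", unit: str = "ATTTT"):
    """Count consecutive copies of `unit` immediately following `prefix`.

    Returns None if the prefix is not found; matching is case-insensitive
    because the specification mixes upper- and lower-case sequence.
    """
    pattern = re.compile(
        re.escape(prefix) + "((?:" + re.escape(unit) + ")*)", re.IGNORECASE
    )
    match = pattern.search(sequence)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)
```

For instance, a read built as `"GGTTTAT" + "ATTTT" * 4 + "tgagatggatttt"` gives a count of 4, i.e. N = 4 in the notation above.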

Information on the number of repeats may be combined with information on the age of the woman concerned. This combined information correlates to the risk of developing breast cancer. Furthermore, the number of repeats may be combined with information on the homozygosity or heterozygosity in the woman concerned. This combined information correlates to the risk of developing breast cancer. Additionally, information on the number of repeats may be combined with information on the age of the woman concerned and the homozygosity or heterozygosity status of the woman. This combined information correlates to the risk of developing breast cancer.

In one embodiment of the present invention homozygosity with regard to the presence of N repeats of ATTTT (i.e. the presence of N repeats of ATTTT on both alleles) is correlated to an increased risk of developing breast cancer. In another embodiment the heterozygosity with regard to the presence of N repeats of ATTTT (i.e. the presence of N repeats of ATTTT on one allele) is correlated to an increased risk of developing breast cancer.

Thus, in one embodiment the presence of N repeats of ATTTT on both alleles is correlated to an increased risk of developing breast cancer. In another embodiment of the present invention the presence of N repeats of ATTTT on one allele is correlated to an increased risk of developing breast cancer. The increased risk of developing breast cancer is correlated to all age groups. However, the increased risk of developing breast cancer is correlated to the age group under 55 years of age. In a further embodiment the increased risk of developing breast cancer is correlated to the age group between 55-60 years of age. Furthermore, for women aged over 60 years the risk of developing breast cancer is increased.

In one embodiment the presence of RAI3'd1 on both alleles wherein N is at least 4 in women in all age groups combined is correlated with an increased risk of developing breast cancer. In another embodiment of the present invention the presence of RAI3'd1 on both alleles wherein N is at least 4 in women under 55 years of age is correlated with an increased risk of developing breast cancer. In yet another embodiment the presence of RAI3'd1 on one allele wherein N is at least 4 in women in all age groups combined, and/or women aged under 55 years, and/or women aged 55-60 years, and/or women over 60 years is correlated with an increased risk of developing breast cancer.
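The embodiments above distinguish carriers on one allele from carriers on both alleles, and group women by age (under 55, 55-60, over 60). The Python sketch below only performs that bookkeeping for a pair of per-allele repeat counts and an age; it does not estimate risk. The default threshold of 4 follows the "N is at least 4" wording, while the function names, the label strings and the treatment of the boundary ages 55 and 60 are assumptions.

```python
from typing import Tuple


def carrier_status(allele_repeats: Tuple[int, int], min_repeats: int = 4) -> str:
    """Classify a genotype by how many alleles carry at least `min_repeats` repeats."""
    carrying = sum(1 for n in allele_repeats if n >= min_repeats)
    return {0: "neither allele", 1: "one allele", 2: "both alleles"}[carrying]


def age_group(age_years: int) -> str:
    """Assign the age groups named in the text; 55 and 60 are placed in '55-60'."""
    if age_years < 55:
        return "under 55"
    if age_years <= 60:
        return "55-60"
    return "over 60"
```

A genotype of `(4, 6)` in a 52-year-old woman would be labelled "both alleles" and "under 55", the combination named in the second embodiment of the paragraph above.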

The primary effectors may however be any combination of the polymorphisms mentioned herein.

Among the polymorphisms employed herein for providing a method and/or compositions for identifying human subjects with an increased risk of having or developing disease, a number of polymorphisms are novel and have been identified by the present inventors. For such polymorphisms the position according to Contig nt_011109 and the nucleotide sequence of the identified polymorphism are provided, whereas the novel polymorphisms cannot be assigned trivial names or identification numbers according to the dbSNP database.

The primary effectors i.e. the mutations causing cancer according to the present invention may be found within the region around position 39000 which is shown to be important in relation to cancer according to the present invention. The region includes and is flanked by the marker RAI intron 1 and the polymorphism having the sequence TCCAGCCTGGGCAAGAA[C/G]AGTGAAACTCCAGCTT corresponding to position 18165052 of contig nt_011109. The primary effector may be selected from the group consisting of

In another embodiment of the invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group consisting of

In yet another embodiment of the invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group consisting of

In a further embodiment of the invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group consisting of

In another embodiment of the invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group consisting of

In yet a further embodiment of the invention the primary effectors i.e. the mutations causing cancer according to the present invention may be selected from the group consisting of

In a further embodiment the primary effector may be selected from the group consisting of

In one embodiment of the present invention the primary effector may be the mutation corresponding to the sequence ggagcttgcagtgagctgagatcgc[A/G]ccactgcactccagcctgggcgaca positioned at 18159091 of Contig nt_011109 (SEQ ID NO.: 28). In another embodiment the primary effector may be the mutation corresponding to the sequence TTCTCCTGACCTC[A/G]TGATCCGCCCACCTCGG positioned at 18159263 of Contig nt_011109 (SEQ ID NO.: 7). In yet another embodiment the primary effector may be the mutation corresponding to the sequence GGGATTACAGGCATGC[A/G]CCACCAGGCCCAGCTAATTTTTGT positioned at 18160363 of Contig nt_011109 (SEQ ID NO.: 8). In a further embodiment the primary effector may be the mutation corresponding to the sequence TCCAATGGTGACA[A/C]CAGTAAGAGCAGTTAACAG positioned at 18160936 of Contig nt_011109 (SEQ ID NO.: 9). In a preferred embodiment the primary effector may be the mutation corresponding to the sequence TCCAATGGTGACAA[C/G]AGTAAGAGCAGTTAACAG positioned at 18160937 of Contig nt_011109 (SEQ ID NO.: 10). In a further preferred embodiment the primary effector may be the mutation corresponding to the sequence tacaggcgcccgccaccacccccag[A/C]taatttttgtatttttagtagagac positioned at 18161433 of Contig nt_011109 (SEQ ID NO.: 29). In a further preferred embodiment the primary effector may be the mutation corresponding to the sequence TTGCCTCAGCCTCCTGA[G/T]TAGCTGGGATTGGAATGAGA positioned at 18161694 of Contig nt_011109 (SEQ ID NO.: 11). However, in another embodiment the primary effector may be the mutation corresponding to the sequence TACGATAMTAGCTAGA[C/GACCTTGGCGCCACCATCTT] positioned at 18161841 of Contig nt_011109 (SEQ ID NO.: 20). In an especially preferred embodiment the primary effector may be the mutation corresponding to the sequence AAAATAATAATAATAATATTAA[C/T]CCTGACCTTGGCGCCACCATCT positioned at 18161896 of Contig nt_011109 (SEQ ID NO.: 12). In a further preferred embodiment the primary effector may be the mutation corresponding to the sequence tcgtcctgctacagaattacaggca[C/T]gcgccaccgctccgggctaattttt positioned at 18162206 of Contig nt_011109 (SEQ ID NO.: 30). In another embodiment the primary effector may be the mutation corresponding to the sequence CCTCATGAGCCACCCAC[C/T]TCGGCCTCCCAAAGTGCT positioned at 18162309 of Contig nt_011109 (SEQ ID NO.: 13). In yet another embodiment the primary effector may be the mutation corresponding to the sequence TGAGCCACCGCGCCC[A/G]GCCGAGACTCACTATTT positioned at 18162356 of Contig nt_011109 (SEQ ID NO.: 14). In yet a further embodiment the primary effector may be the mutation corresponding to the sequence taaagcgggaggatggcttgaacct[A/G]ggaggcggaggttgcagtgagccga positioned at 18162599 of Contig nt_011109 (SEQ ID NO.: 31). In an additional embodiment the primary effector may be the mutation corresponding to the sequence GGAGAGAAGGAGCAGAGAAC[A/C]TCTCTATGTGGCCA positioned at 18162903 of Contig nt_011109 (SEQ ID NO.: 15). However, in another embodiment the primary effector may be the mutation corresponding to the sequence ATCCTAAAGACTAC[A/C]TTTCCCAGCATCCCA positioned at 18162970 of Contig nt_011109 (SEQ ID NO.: 32). Furthermore, in a further embodiment the primary effector may be the mutation corresponding to the sequence TTTCCCAGCATCCCA[C/T]TGCAATGAGGCTCCTGGCC positioned at 18162986 of Contig nt_011109 (SEQ ID NO.: 16). In yet a further embodiment the primary effector may be the mutation corresponding to the sequence TCCTGACTCCAGTG[A/C]GGTGCCTACAGTCCTG positioned at 18163200 of Contig nt_011109 (SEQ ID NO.: 17). In yet a preferred embodiment the primary effector may be the mutation corresponding to the sequence TTCCAGCCTGGGCAAGAA[C/G]AGTGAAACTCCAGCTT positioned at 18165052 of Contig nt_011109 (SEQ ID NO.: 18).

The primary effectors may also be found within the region corresponding to SEQ ID NO: 133, wherein the polymorphisms are shown to be important in relation to the present invention. The region includes and is flanked by the polymorphisms RAI 5' (SEQ ID NO: 4) and ASE 1 exon 3-2 also named ERCC1 3'3 or ASE1e1b (SEQ ID NO: 142). The polymorphisms may be selected from the group consisting of

In another embodiment of the present invention the polymorphisms are selected from the group consisting of

In yet another embodiment of the present invention the polymorphisms are selected from the group consisting of

In a further embodiment of the present invention the polymorphisms are selected from the group consisting of

In yet a further embodiment of the present invention the polymorphisms are selected from the group consisting of

Thus in one embodiment of the present invention the at least one polymorphism is RAI5' (SEQ ID NO: 41). However, in another embodiment the at least one polymorphism is ASE 1- 5'2 (SEQ ID NO:42). A further embodiment is ERCC1-3' (SEQ ID NO: 44). A preferred embodiment of the invention is the at least one polymorphism ASE 1 exon 3-2 (SEQ ID NO: 142). In yet a further preferred embodiment the at least one polymorphism is ASE 1 exon 1 (SEQ ID NO:43).

In a further preferred embodiment the primary effectors may however be any combination of the polymorphisms.

One aspect of the present invention relates to a method for estimating the disease risk of an individual comprising assessing at least one sequence polymorphism in a region corresponding to SEQ ID NO: 132 and optionally at least one further sequence polymorphism in a region corresponding to SEQ ID NO: 133 with the proviso that when using SEQ ID NO: 132 only, the at least one polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism RAI3'd1 (SEQ ID NO: 19) and rs 2377329 (SEQ ID NO: 22). In one embodiment the at least one polymorphism is RAI3'd1 (SEQ ID NO: 19). In another embodiment the at least one polymorphism is rs 2377329 (SEQ ID NO: 22).

However, another embodiment of the present invention relates to a method for estimating the disease risk of an individual comprising assessing at least one sequence polymorphism in a region corresponding to SEQ ID NO: 132 and optionally at least one further sequence polymorphism in a region corresponding to SEQ ID NO: 133 with the proviso that when using SEQ ID NO: 132 only, the at least one polymorphism is in a region corresponding to at least one region within SEQ ID NO: 132 selected from the regions flanked by and including the sequence polymorphism positioned at 18149120 (SEQ ID NO: 3) and the sequence polymorphism positioned at 18149154 (SEQ ID NO: 4). Thus, in one embodiment of the present invention the sequence polymorphism is the sequence polymorphism positioned at 18149120 (SEQ ID NO: 3). In another embodiment the sequence polymorphism is the sequence polymorphism positioned at 18149154 (SEQ ID NO: 4).

In yet other embodiments under said proviso, when at least one polymorphism is assessed in the region corresponding to SEQ ID NO: 132, the sequence polymorphism is positioned at 18450815 (SEQ ID NO: 22), or for example at position 18151158 (SEQ ID NO: 6).

The at least one sequence polymorphism residing in the region corresponding to SEQ ID NO: 132 may optionally further be combined with at least one sequence polymorphism in the region corresponding to SEQ ID NO: 133 in order to estimate the disease risk of an individual. For example the at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4) and ASE 1 exon 3-2 (SEQ ID NO: 142). For example at least one sequence polymorphism is assessed in a region of SEQ ID NO: 132 flanked by and including the sequence polymorphisms RAI3'd1 (SEQ ID NO: 19) and rs2377329 (SEQ ID NO: 22), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4) and ASE 1 exon 3-2 (SEQ ID NO: 142).

In another embodiment at least one sequence polymorphism is assessed in a region of SEQ ID NO: 132 flanked by and including the sequence polymorphism positioned at 18149120 in Contig NT_011109 (SEQ ID NO: 3) and the sequence polymorphism positioned at 18149154 in Contig NT_011109 (SEQ ID NO: 4), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4) and ASE 1 exon 3-2 (SEQ ID NO: 142).

In yet another embodiment according to the present invention the at least one sequence polymorphism is assessed in SEQ ID NO: 133, said polymorphism being positioned at 18150815 in Contig NT_011109 (SEQ ID NO: 5), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4) and ASE 1 exon 3-2 (SEQ ID NO: 142).

Further embodiments include assessing at least one sequence polymorphism in SEQ ID NO: 132, said polymorphism being positioned at 18151158 in Contig NT_011109 (SEQ ID NO: 6), and at least one sequence polymorphism is assessed in a region of SEQ ID NO: 133 flanked by and including the polymorphisms RAI 5' (SEQ ID NO: 4) and ASE 1 exon 3-2 (SEQ ID NO: 142).

Yet other embodiments include a method for estimating the disease risk by assessing for example the sequence polymorphism RAI3' d1 (SEQ ID NO: 19) in SEQ ID NO: 132, the sequence polymorphism assessed in SEQ ID NO: 133 being ASE 1 exon 3-2 (SEQ ID NO: 142).

Cell sample

The cell sample used in the present invention may be any suitable cell sample capable of providing the genetic material for use in the method. In a preferred embodiment, the cell sample is a blood sample, a tissue sample, a sample of secretion, semen, ovum, a washing of a body surface (e.g. a buccal swab), a clipping of a body surface (hairs, or nails), such as wherein the cell is selected from white blood cells and tumour tissue.

It will be appreciated that the test sample may equally be a nucleic acid sequence corresponding to the sequence in the test sample, that is to say that all or a part of the region in the sample nucleic acid may firstly be amplified using any convenient technique e.g. PCR, before use in the analysis of variation in the region.

Detection methods

Detection may be conducted on the sequence of SEQ ID NO: 1 or a complementary sequence as well as on transcriptional (mRNA and/or cDNA) and translational products (polypeptides, proteins) therefrom. Similarly detection may thus be conducted on the sequence of SEQ ID NO: 132 and optionally further on SEQ ID NO: 133 or complementary sequences as well as on transcriptional (mRNA and/or cDNA) and translational products (polypeptides, proteins) therefrom.

It will be apparent to the person skilled in the art that there are a large number of analytical procedures which may be used to detect the presence or absence of variant nucleotides at one or more of positions mentioned herein in the r region. Mutations or polymorphisms within or flanking the r region can be detected by utilizing a number of techniques. Nucleic acid from any nucleated cell can be used as the starting point for such assay techniques, and may be isolated according to standard nucleic acid preparation procedures that are well known to those of skill in the art. In general, the detection of allelic variation requires a mutation discrimination technique, optionally an amplification reaction and a signal generation system. Table 7 lists a number of mutation detection techniques, some based on the PCR. These may be used in combination with a number of signal generation systems, a selection of which is listed in Table 8. Further amplification techniques are listed in Table 9. Many current methods for the detection of allelic variation are reviewed by Nollau et al., Clin. Chem. 43, 1114-1120, 1997; and in standard textbooks, for example "Laboratory Protocols for Mutation Detection", Ed. by U. Landegren, Oxford University Press, 1996 and "PCR", 2nd Edition by Newton & Graham, BIOS Scientific Publishers Limited, 1997.

Table 7

Abbreviations:

ALEX Amplification refractory mutation system linear extension

APEX Arrayed primer extension

ARMS Amplification refractory mutation system

b-DNA Branched DNA

CMC Chemical mismatch cleavage

bp base pair

COPS Competitive oligonucleotide priming system

DGGE Denaturing gradient gel electrophoresis

FRET Fluorescence resonance energy transfer

LCR Ligase chain reaction

MASDA Multiple allele specific diagnostic assay

NASBA Nucleic acid sequence based amplification

OLA Oligonucleotide ligation assay

PCR Polymerase chain reaction

PTT Protein truncation test

RFLP Restriction fragment length polymorphism

SDA Strand displacement amplification

SNP Single nucleotide polymorphism

SSCP Single-strand conformation polymorphism analysis

SSR Self sustained replication

TGGE Temperature gradient gel electrophoresis

Table 8 illustrates various mutation detection techniques capable of being used for SNP detection.

Table 8

General techniques: DNA sequencing, Sequencing by hybridisation, SNAPshot.

Scanning techniques: PTT*, SSCP, DGGE, TGGE, Cleavase, Heteroduplex analysis, CMC, Enzymatic mismatch cleavage

Hybridisation Based techniques

Solid phase hybridisation: Dot blots, MASDA, Reverse dot blots, Oligonucleotide arrays (DNA Chips)

Solution phase hybridisation: Taqman - U.S. Pat. No. 5,210,015 & 5,487,972 (Hoffmann-La Roche), Molecular Beacons - Tyagi et al (1996), Nature Biotechnology, 14, 303; WO 95/13399 (Public Health Inst., New York), Lightcycler, optionally in combination with FRET.

Extension Based: ARMS, ALEX - European Patent No. EP 332435 B1 (Zeneca Limited), COPS - Gibbs et al (1989), Nucleic Acids Research, 17, 2347.

Incorporation Based: Mini-sequencing, APEX

Restriction Enzyme Based: RFLP, Restriction site generating PCR

Ligation Based: OLA

Other: Invader assay

Various signal generation or detection systems are listed below:

Fluorescence: FRET, Fluorescence quenching, Fluorescence polarisation - United Kingdom Patent No. 2228998 (Zeneca Limited)

Other: Chemiluminescence, Electrochemiluminescence, Raman, Radioactivity, Colorimetric, Hybridisation protection assay, Mass spectrometry

Table 9 illustrates examples of further amplification techniques.

Table 9

SSR, NASBA, LCR, SDA, b-DNA

Preferred mutation detection techniques include ARMS, ALEX, COPS, Taqman, Molecular Beacons, RFLP, and restriction site based PCR and FRET techniques.

Particularly preferred methods include FRET, Taqman, ARMS and RFLP based methods.

In a preferred embodiment, mutations or polymorphisms can be detected by using a microassay of nucleic acid sequences immobilized to a substrate or "gene chip" (see, e.g. Cronin, et al., 1996, Human Mutation 7:244-255).

Further, improved methods for analyzing DNA polymorphisms, which can be utilized for the identification of region r specific mutations, have been described that capitalize on the presence of variable numbers of short, tandemly repeated DNA sequences between the restriction enzyme sites. For example, Weber (U.S. Pat. No. 5,075,217) describes a DNA marker based on length polymorphisms in blocks of (dC-dA)n-(dG-dT)n short tandem repeats. The average separation of (dC-dA)n-(dG-dT)n blocks is estimated to be 30,000-60,000 bp. Markers that are so closely spaced exhibit a high frequency of co-inheritance, and are extremely useful in the identification of genetic mutations, such as, for example, mutations within the RAI gene, and the diagnosis of diseases and disorders related to RAI mutations.

Also Caskey et al. (U.S. Pat. No. 5,364,759) describe a DNA profiling assay for detecting short tri and tetra nucleotide repeat sequences. The process includes extracting the DNA of interest, such as the RAI gene, amplifying the extracted DNA, and labelling the repeat sequences to form a genotypic map of the individual's DNA.

The level of RAI gene expression can also be assayed. For example, RNA from a cell type or tissue known, or suspected, to express the RAI gene may be isolated and tested utilizing hybridization or PCR techniques such as are described, above.

The isolated cells can be derived from cell culture or from a patient. The analysis of cells taken from culture may be a necessary step in the assessment of cells to be used as part of a cell-based gene therapy technique or, alternatively, to test the effect of compounds on the expression of the RAI gene. Such analyses may reveal both quantitative and qualitative aspects of the expression pattern of the RAI gene, including activation or inactivation of RAI gene expression.

In one embodiment of such a detection scheme, a cDNA molecule is synthesized from an RNA molecule of interest (e.g., by reverse transcription of the RNA molecule into cDNA). A sequence within the cDNA is then used as the template for a nucleic acid amplification reaction, such as a PCR amplification reaction, or the like. The nucleic acid reagents used as synthesis initiation reagents (e.g., primers) in the reverse transcription and nucleic acid amplification steps of this method are chosen from among the RAI gene nucleic acid reagents described above. The preferred lengths of such nucleic acid reagents are at least 9-30 nucleotides. For detection of the amplified product, the nucleic acid amplification may be performed using radioactively or non-radioactively labeled nucleotides. Alternatively, enough amplified product may be made such that the product may be visualized by standard ethidium bromide staining or by utilizing any other suitable nucleic acid staining method.

Additionally, it is possible to perform such RAI gene expression assays "in situ", i.e., directly upon tissue sections (fixed and/or frozen) of patient tissue obtained from biopsies or resections, such that no nucleic acid purification is necessary. Nucleic acid reagents such as those described above may be used as probes and/or primers for such in situ procedures (see, for example, Nuovo, G. J., 1992, "PCR In Situ Hybridization: Protocols And Applications", Raven Press, NY).

Alternatively, if a sufficient quantity of the appropriate cells can be obtained, standard Northern analysis can be performed to determine the level of mRNA expression of the RAI gene.

Activity of the gene

Another method for detecting sequence polymorphism is by analysing the activity of gene products resulting from the sequences. Accordingly, in one embodiment the detection uses the activity of the RAI gene product as compared to a reference in the method. In particular, if the activity of the gene is decreased or increased by at least or about 50%, such as at least or about 40%, for example at least or about 30%, such as at least or about 20%, for example at least or about 10%, such as at least or about 5%, for example at least or about 2%, it indicates a sequence polymorphism in the gene.

Mutations outside the region

The present invention may combine the result of sequence polymorphism within the region r with sequence polymorphism outside the region in order to increase the probability of the correlation. The sequence polymorphisms outside the region which may be indicative of diseases according to the method of the present invention are linked to the primary effectors.

Primers

The primer nucleotide sequences of the invention further include: (a) any nucleotide sequence that hybridizes to a nucleic acid molecule of the region r or its complementary sequence or RNA products under stringent conditions, e.g., hybridization to filter-bound DNA in 6x sodium chloride/sodium citrate (SSC) at about 45°C followed by one or more washes in 0.2x SSC/0.1% SDS at about 50-65°C, or (b) under highly stringent conditions, e.g., hybridization to filter-bound nucleic acid in 6x SSC at about 45°C followed by one or more washes in 0.1x SSC/0.2% SDS at about 68°C, or under other hybridization conditions which are apparent to those of skill in the art (see, for example, Ausubel F.M. et al., eds., 1989, Current Protocols in Molecular Biology, Vol. I, Green Publishing Associates, Inc., and John Wiley & Sons, Inc., New York, at pp. 6.3.1-6.3.6 and 2.10.3). Preferably the nucleic acid molecule that hybridizes to the nucleotide sequence of (a) and (b), above, is one that comprises the complement of a nucleic acid molecule of the region s or r or a complementary sequence or RNA product thereof. In a preferred embodiment, nucleic acid molecules comprising the nucleotide sequences of (a) and (b) comprise a nucleic acid molecule of RAI or a complementary sequence or RNA product thereof.

Among the nucleic acid molecules of the invention are deoxyoligonucleotides ("oligos") which hybridize under highly stringent or stringent conditions to the nucleic acid molecules described above. In general, for probes between 14 and 70 nucleotides in length the melting temperature (Tm) is calculated using the formula:

Tm(°C)=81.5+16.6(log [monovalent cations (molar)])+0.41(% G+C)-(500/N)

where N is the length of the probe. If the hybridization is carried out in a solution containing formamide, the melting temperature is calculated using the equation Tm(°C)=81.5+16.6(log[monovalent cations (molar)])+0.41(% G+C)-0.61(% formamide)-(500/N) where N is the length of the probe. In general, hybridization is carried out at about 20-25 degrees below Tm (for DNA-DNA hybrids) or 10-15 degrees below Tm (for RNA-DNA hybrids).
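The two melting temperature formulas above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `melting_temperature` and its parameters are hypothetical, the logarithm is assumed to be base 10 (the text writes only "log"), and % G+C is computed by simply counting G and C in the probe.

```python
import math


def melting_temperature(probe, monovalent_molar, formamide_percent=0.0):
    """Estimate Tm in degrees C with the formula quoted in the text.

    probe             -- probe sequence (A, C, G, T; case-insensitive)
    monovalent_molar  -- monovalent cation concentration in mol/l
    formamide_percent -- % formamide in the hybridization solution
                         (0.0 gives the formula without formamide)
    Assumption: "log" in the text is the base-10 logarithm.
    """
    seq = probe.replace(" ", "").upper()
    n = len(seq)
    if n == 0:
        raise ValueError("probe must not be empty")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / n)


# Example: a 20-base probe with 50% G+C at 1 M monovalent cations.
tm = melting_temperature("ACGT" * 5, 1.0)
# Hybridization window for DNA-DNA hybrids, 20-25 degrees below Tm.
dna_dna_window = (tm - 25.0, tm - 20.0)
```

Setting `formamide_percent` to a non-zero value applies the second formula; the hybridization window is then derived from the returned Tm in the same way.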

Exemplary highly stringent conditions may refer, e.g., to washing in 6x SSC/0.05% sodium pyrophosphate at 37°C (for about 14-base oligos), 48°C (for about 17-base oligos), 55°C (for about 20-base oligos), and 60°C (for about 23-base oligos).

Accordingly, the invention further provides nucleotide primers or probes which detect the r region polymorphisms of the invention. The assessment may be conducted by means of at least one nucleic acid primer or probe, such as a primer or probe of DNA, RNA or a nucleic acid analogue such as peptide nucleic acid (PNA) or locked nucleic acid (LNA). The nucleotide primer or probe is preferably capable of hybridising to a subsequence of the region corresponding to SEQ ID NO: 1 , or a part thereof, or a region complementary to SEQ ID NO: 1.

According to one aspect of the present invention there is provided an allele-specific oligonucleotide probe capable of detecting a r region polymorphism at one or more of positions in the r region as defined by the positions in SEQ ID NO: 1.

The allele-specific oligonucleotide probe is preferably 5-50 nucleotides, more preferably about 5-35 nucleotides, more preferably about 5-30 nucleotides, more preferably at least 9 nucleotides.

The design of such probes will be apparent to the molecular biologist of ordinary skill. Such probes are of any convenient length such as up to 50 bases, up to 40 bases, more conveniently up to 30 bases in length, such as for example 8-25 or 8-15 bases in length. In general such probes will comprise base sequences entirely complementary to the corresponding wild type or variant locus in the region. However, if required one or more mismatches may be introduced, provided that the discriminatory power of the oligonucleotide probe is not unduly affected. The probes of the invention may carry one or more labels to facilitate detection.

In one embodiment, the primers and/or probes are capable of hybridizing to and/or amplifying a subsequence hybridizing to a single nucleotide polymorphism containing the sequence shown herein selected from the group of subsequences below or a sequence complementary thereto, wherein the polymorphism is denoted as for example T/C:

1. gctgcagtga gctgt (-/ACACCTGTGGTCCCAGCTACTCTGGAAGCTGAGGTGGGAGGATCGCTTGAGCCCAAGAGGTGGAGGCTGCAGTGAGCTGT) gactgtgcca ctgcactcca (SEQ ID NO: 33)

2. TGACAGTAGA CATCCTGTCA T (A/G) ATAAGTCttt ttttttt (SEQ ID NO: 34)

3. CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct (SEQ ID NO: 35)

4. ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc (SEQ ID NO: 21)

5. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

6. AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG (SEQ ID NO: 24)

7. ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac (SEQ ID NO: 25)

8. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG

9. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG

10. ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA (SEQ ID NO: 36)

11. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

12. gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt (SEQ ID NO: 38)

13. tgcagtgagc tgagatcgc (A/G) ccactgcact ccagcctggg (SEQ ID NO: 28)

14. CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA (SEQ ID NO: 45)

15. CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG (SEQ ID NO: 46)

16. CCCTCCCTGCTTGCTTGCTTTCTCT [C/T] TCTCTCTTTCTTTCTTTCTTTCTT

17. CCCTGCTTGCTTGCTTTCTCTCTCT [C/T] TCTTTCTTTCTTTCTTTCTTTCTT (SEQ ID NO: 40)

18. AGAACCTGTTCAGGCTGGCGGCTCA [C/T] TTGGATGAACAGGGAGTGTGTGAC (SEQ ID NO: 41)

19. CCCCCTTCTTAGGACGCATGGGGGT [G/T] GAGAGAACGGGGAGATAGACAGAG (SEQ ID NO: 42)

20. TGCGAGCAGCCCGGGCTACAGGGTT [A/G] CCTGAGGTGTGGGTCCCAGGATGG (SEQ ID NO: 43)

21. GGCGCCTCAACAGCCAGAAGGAGCG [A/G] AGCCTCAGGCCCAGGCAGCTCTGG (SEQ ID NO: 44)
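The notation used in the list above, a 5' flank, the two alleles in parentheses or brackets separated by a slash, and a 3' flank, can be expanded into the two full allele sequences mechanically. The following Python sketch illustrates one way to do so; the function name `expand_alleles` is hypothetical, "-" is assumed to denote the absence of the inserted bases, and spaces inside the flanks are assumed to be layout only.

```python
import re

# 5' flank, then (allele1/allele2) or [allele1/allele2], then 3' flank.
_ENTRY = re.compile(
    r"^([ACGTacgt\s]*)[\(\[]\s*([ACGTacgt-]+)\s*/\s*([ACGTacgt-]+)\s*[\)\]]"
    r"([ACGTacgt\s]*)$"
)


def expand_alleles(entry):
    """Return the two allele sequences encoded by one list entry.

    Assumptions: "-" means that no base is present at the polymorphic
    site, and whitespace in the flanks carries no meaning.
    """
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError("entry is not in 'flank (X/Y) flank' notation")
    left, first, second, right = match.groups()
    left = re.sub(r"\s+", "", left).upper()
    right = re.sub(r"\s+", "", right).upper()

    def build(allele):
        middle = "" if allele == "-" else allele.upper()
        return left + middle + right

    return build(first), build(second)


# Example with the A/G entry and the single-base insertion entry.
snp = expand_alleles("TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG")
indel = expand_alleles("CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG")
```

Each returned pair can then be used as the target sequences against which candidate allele-specific probes are compared.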

In another embodiment of the methods of the invention the primers and/or probes are capable of hybridizing to and/or amplifying a subsequence hybridizing to a single nucleotide polymorphism containing the sequence shown herein selected from the group of subsequences below or a sequence complementary thereto, wherein the polymorphism is denoted as for example T/C:

1. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG

2. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG

3. ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA (SEQ ID NO: 36)

4. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

5. gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt (SEQ ID NO: 38)

6. tgcagtgagc tgagatcgc (A/G) ccactgcact ccagcctggg (SEQ ID NO: 28)

7. CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA (SEQ ID NO: 45)

8. CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG (SEQ ID NO: 46)

In a preferred embodiment of the methods of the invention the primers and/or probes are capable of hybridizing to and/or amplifying a subsequence hybridizing to a single nucleotide polymorphism containing the sequence shown herein selected from the group of subsequences below or a sequence complementary thereto, wherein the polymorphism is denoted as for example T/C:

1. gctgcagtga gctgt (-/ACACCTGTGGTCCCAGCTACTCTGGAAGCTGAGGTGGGAGGATCGCTTGAGCCCAAGAGGTGGAGGCTGCAGTGAGCTGT) gactgtgcca ctgcactcca (SEQ ID NO: 33)

2. TGACAGTAGA CATCCTGTCA T (A/G) ATAAGTCttt ttttttt (SEQ ID NO: 34)

3. CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct (SEQ ID NO: 35)

4. ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc (SEQ ID NO: 21)

5. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

6. AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG (SEQ ID NO: 24)

7. ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac (SEQ ID NO: 25)

8. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG

9. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG

10. ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA (SEQ ID NO: 36)

11. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

12. gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt (SEQ ID NO: 38)

13. tgcagtgagc tgagatcgc (A/G) ccactgcact ccagcctggg (SEQ ID NO: 28)

14. CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA (SEQ ID NO: 45)

15. CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG (SEQ ID NO: 46)

In a preferred embodiment, the primers and/or probes are capable of hybridizing to a subsequence selected from the group of subsequences below:

1. gctgcagtga gctgt (-/ACACCTGTGGTCCCAGCTACTCTGGAAGCTGAGGTGGGAGGATCGCTTGAGCCCAAGAGGTGGAGGCTGCAGTGAGCTGT) gactgtgcca ctgcactcca (SEQ ID NO: 33)

2. TGACAGTAGA CATCCTGTCA T (A/G) ATAAGTCttt ttttttt (SEQ ID NO: 34)

3. CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct (SEQ ID NO: 35)

4. ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc (SEQ ID NO: 21)

5. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

6. AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG (SEQ ID NO: 24)

7. ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac (SEQ ID NO: 25)

8. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG

9. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG

10. ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA (SEQ ID NO: 36)

11. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

12. gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt (SEQ ID NO: 38)

In an even more preferred embodiment, the primers and/or probes are capable of hybridizing to a subsequence selected from the group of subsequences below:

1. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

2. AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG (SEQ ID NO: 24)

3. ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac (SEQ ID NO: 25)

4. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG

5. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG

6. ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA (SEQ ID NO: 36)

7. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

8. gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt (SEQ ID NO: 38)

Most preferred are the primers and/or probes capable of hybridizing to a subsequence selected from the group of subsequences below:

1. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

2. ATTAAGTGCCTTCACACAGC (A/T) CTGGTTTAAT GTTTATAA (SEQ ID NO: 36)

3. gggaggctcg aggcgggc (A/G) gattgcatga gctcaggatt (SEQ ID NO: 38)

In a second embodiment according to the methods of the invention, the primers and/or probes are capable of hybridizing to a subsequence selected from the group of subsequences below:

1. gctgcagtga gctgt (-/ACACCTGTGGTCCCAGCTACTCTGGAAGCTGAGGTGGGAGGATCGCTTGAGCCCAAGAGGTGGAGGCTGCAGTGAGCTGT) gactgtgcca ctgcactcca (SEQ ID NO: 33)

2. CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct (SEQ ID NO: 35)

3. ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc (SEQ ID NO: 21)

4. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

5. AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG (SEQ ID NO: 24)

6. ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac (SEQ ID NO: 25)

7. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG (SEQ ID NO: 26)

8. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG (SEQ ID NO: 22)

9. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

10. CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA (SEQ ID NO: 45)

11. CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG (SEQ ID NO: 46)

In a preferred embodiment, the primers and/or probes are capable of hybridizing to a subsequence selected from the group of subsequences below:

1. CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct (SEQ ID NO: 35)

2. ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc (SEQ ID NO: 21)

3. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

4. AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG (SEQ ID NO: 24)

5. ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac (SEQ ID NO: 25)

6. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG

7. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG

8. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

In an even more preferred embodiment the primers and/or probes are capable of hybridizing to a subsequence selected from the group of subsequences below:

1. CATCCCCATA CCAAcccacc (c/t) tactgctctg atctcctcct (SEQ ID NO: 35)

2. ttttagtagagacatggttccgcca (C/T) gttgcccaggctggtcttgaactc (SEQ ID NO: 21)

3. ACTAAAAATAAAAAATAAAAAAAA (-/AA) ATAGCCGAGCATGGTGGTGGGGTGC (SEQ ID NO: 23)

4. AAAAAACTAAAGTGGGGTTTGCGGG (G/T) AGTGGGAGGGCCCTTCCTGCTAGG (SEQ ID NO: 24)

5. ggatcacaag gtcaggagtt (c/t) gagaccagcc tggccaacac (SEQ ID NO: 25)

6. TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG (SEQ ID NO: 26)

7. CTCGGGAGGCTGAGGCAGGAGAATC (A/G) CTTGAACTCAGGCAGAGGTTG (SEQ ID NO: 22)

Even more preferred are primers and/or probes capable of hybridizing to a subsequence selected from the group of subsequences below:

1. CTTGCTACAGAATTACAGGCA (T/C) GCGCCACCGCTCCGGGCTAA (SEQ ID NO: 45)

2. CTAAAGACTACA (-/A) TTTCCCAGCATCCCATTG (SEQ ID NO: 46)

3. ttgggagacc aaggcaggtg gatc (a/t) tttgaggtca gtagatcaaa (SEQ ID NO: 37)

In one embodiment the primers or probes able to detect subsequences as described above are selected from one or more of the following:

1. agt cac agc tca ctg cag cct c (SEQ ID NO: 47)

2. acc tct tgg gct caa gcg atc ctc (SEQ ID NO: 48)

3. aaa aaa aga ctt atc atg aca gga tgt ct (SEQ ID NO: 49)

4. gca aga ctc cgt ccc aga aaa aga aaa (SEQ ID NO: 50)

5. tcctctctctcccccagctcattttg (SEQ ID NO: 51)

6. ACCCCACCCTACTGCTCTGATCTC (SEQ ID NO: 52)

7. agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)

8. ggt tcc gcc acg ttg cc (SEQ ID NO: 54)

9. tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)

10. att aca ggc acc cac cac cat g (SEQ ID NO: 56)

11. acc cca ctt tag ttt ttt ttt cct cta gtg atc gcc (SEQ ID NO: 57)

12. gcc ctc cca cta ccc gca (SEQ ID NO: 58)

13. aga cgg ggt ttc act gtg ttg gc (SEQ ID NO: 59)

14. agg ctg gtc tca aac tcc tga c (SEQ ID NO: 60)

15. CAAACAAACAAAAACCTCTGCCA (SEQ ID NO: 61)

16. CTTGGACAACATAGGGAGACCCTGTGT (SEQ ID NO: 62)

17. tgc ctc agc ctc ccg agt agc t (SEQ ID NO: 63)

18. cct cct gag ttc aag cga ttc tc (SEQ ID NO: 64)

19. tgc ctt cac aca gct ctg gtt taa tg (SEQ ID NO: 65)

20. tgc ctt cac aca gca ctg gtt taa tg (SEQ ID NO: 66)

21. ACCATGTTGGCCAGGCTGGTTTT (SEQ ID NO: 67)

22. ATCTACTGACCTCAAATGATCCACCT (SEQ ID NO: 68)

23. tgc aat ccg ccc gcc (SEQ ID NO: 69)

24. cca ggc tgg ttt gga aat cct gag ctc (SEQ ID NO: 70)

25. ctg aga tcg cac cac tgc ac (SEQ ID NO: 71)

26. ggg agg cgg agc ttg cag tga (SEQ ID NO: 72)

27. gcg cat gcc tgt aat tct gta (SEQ ID NO: 73)

28. cag gac gag cca cag aca aaa ctc c (SEQ ID NO: 74)

29. tgc aat gag gct cct ggc c (SEQ ID NO: 75)

30. act aca att tcc cag cat ccc a (SEQ ID NO: 76)

31. cct ccc tcc ctc cct gc (SEQ ID NO: 77)

32. tgc ttg ctt tct tct tct (SEQ ID NO: 78)

33. tcc ctg ctt gct tgx ttt ctc t (SEQ ID NO: 79)

34. tct ctc ttt ctt tct ttc ttt c (SEQ ID NO: 80)

35. tgt tca tcc aaa tga gcc gc (SEQ ID NO: 81)

36. agc ctg aac agg ttc tgt tcc ttc gac tt (SEQ ID NO: 82)

37. caa gct gct atc tcg acc gat ctt (SEQ ID NO: 83)

38. ggg tga cca ccc tgc cag cc (SEQ ID NO: 84)

39. cgg gct aca ggg tta cct gag (SEQ ID NO: 85)

40. tct gca acc tgg tgc gag cag c (SEQ ID NO: 86)

41. tga ggc tcc gct cct tct gg (SEQ ID NO: 87)

42. agc tgc cag agc tgc ctg ggc (SEQ ID NO: 88)
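Whether a listed primer can anneal to one allele of a listed subsequence can be checked by searching the template for the primer itself or for its reverse complement. The Python sketch below illustrates this with the primer of SEQ ID NO: 61 and the two alleles of the (A/G) subsequence TGTTGTCCAA GCTGGCAGAG (A/G) TTTTTGTTTG TTTGTTTGAG. The function names and the "+"/"-" strand labels are hypothetical conventions introduced here, only exact matches are considered, and all sequences are assumed to be written 5' to 3'.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer_site(primer, template):
    """Locate an exact primer site on a template written 5' to 3'.

    Returns ("+", index) when the primer sequence itself occurs in the
    template (the primer anneals to the complementary strand),
    ("-", index) when its reverse complement occurs (the primer anneals
    to the strand as written), or None when neither is found.
    """
    p = primer.replace(" ", "").upper()
    t = template.replace(" ", "").upper()
    position = t.find(p)
    if position != -1:
        return ("+", position)
    position = t.find(reverse_complement(p))
    if position != -1:
        return ("-", position)
    return None


primer_61 = "CAAACAAACAAAAACCTCTGCCA"  # SEQ ID NO: 61
allele_a = "TGTTGTCCAAGCTGGCAGAG" + "A" + "TTTTTGTTTGTTTGTTTGAG"
allele_g = "TGTTGTCCAAGCTGGCAGAG" + "G" + "TTTTTGTTTGTTTGTTTGAG"

site_on_g = find_primer_site(primer_61, allele_g)
site_on_a = find_primer_site(primer_61, allele_a)
```

In this sketch an exact site is found only on the G allele, which is the behaviour expected of an allele-specific primer; real primer design would also have to consider partial matches and hybridization conditions.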

One or more primers able to detect subsequences by hybridisation as described may in a particularly preferred embodiment of the method of the invention be selected from

1. tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)

2. att aca ggc acc cac cac cat g (SEQ ID NO: 56)

3. tgc ctt cac aca gct ctg gtt taa tg (SEQ ID NO: 65)

4. tgc ctt cac aca gca ctg gtt taa tg (SEQ ID NO: 66)

5. tgc aat ccg ccc gcc (SEQ ID NO: 69)

6. cca ggc tgg ttt gga aat cct gag ctc (SEQ ID NO: 70)

According to another aspect of the present invention there is provided a diagnostic nucleic acid primer capable of detecting an r region polymorphism at one or more positions in the r region as defined by the positions in SEQ ID NO: 1.

The primer or probe may be a diagnostic nucleic acid primer defined as an allele specific primer, used, generally together with a constant primer, in an amplification reaction such as a PCR reaction, which provides the discrimination between alleles through selective amplification of one allele at a particular sequence position. The diagnostic primer is preferably 5-50 nucleotides, more preferably about 5-35 nucleotides, more preferably about 5-30 nucleotides, more preferably at least 9 nucleotides.

In accordance with the present invention diagnostic primers are provided, comprising the sequences set out below as well as derivatives thereof wherein about 6-8 of the nucleotides at the 3' terminus are identical to the sequences given below and wherein up to 10, such as up to 8, 6, 4, 2, or 1 of the remaining nucleotides may be varied without significantly affecting the properties of the diagnostic primer. Conveniently, the sequence of the diagnostic primer is as written below.
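The derivative rule above (a conserved 3' terminus, a bounded number of changes elsewhere, and a preferred overall length) can be illustrated with a minimal sketch. The function name `is_acceptable_derivative`, its parameters, and the restriction to same-length substitutions are assumptions made for illustration only; the specification does not prescribe any particular check.

```python
def is_acceptable_derivative(reference: str, candidate: str,
                             conserved_3prime: int = 8,
                             max_changes: int = 10) -> bool:
    """Hypothetical check of a candidate primer against a reference primer.

    Assumptions (not from the specification): only substitutions are
    considered, so both sequences must have the same length, and the
    preferred 5-50 nucleotide length range is enforced.
    """
    ref = reference.replace(" ", "").lower()
    cand = candidate.replace(" ", "").lower()
    if len(ref) != len(cand):
        return False  # insertions and deletions are outside this sketch
    if not 5 <= len(cand) <= 50:
        return False  # outside the preferred primer length
    # The last `conserved_3prime` nucleotides must be identical.
    if ref[-conserved_3prime:] != cand[-conserved_3prime:]:
        return False
    # Count substitutions in the remaining 5' portion.
    changes = sum(a != b for a, b in zip(ref[:-conserved_3prime],
                                         cand[:-conserved_3prime]))
    return changes <= max_changes


# Example with SEQ ID NO: 134 as the reference primer.
reference = "AAA AAA ATA GCC GAG CAT GG"
five_prime_variant = "CAA AAA ATA GCC GAG CAT GG"   # one change at the 5' end
three_prime_variant = "AAA AAA ATA GCC GAG CAT GC"  # change inside the 3' terminus
print(is_acceptable_derivative(reference, five_prime_variant))
print(is_acceptable_derivative(reference, three_prime_variant))
```

Setting `conserved_3prime` to 6 and lowering `max_changes` reproduces the other bounds named in the paragraph.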

Furthermore, as described above at least two sets of primer(s) and/or probe(s), such as at least three sets of primer(s) and/or probe(s), for example four sets of primer(s) and/or probes may be combined in the method thereby increasing the correlation probability. This second or other set of primer(s) and/or probe(s) may be a nucleotide or nucleotide analogues hybridising to a region within the region r or to a sequence different from the region r. Said sequence different from the region r is preferably a region in chromosome 19, preferably in chromosome 19q. In particular such second or other primer or probe may be selected from one or more of the sequences below, or the complementary strands:

1. tgc ctc acc cct gta atc c (SEQ ID NO: 90)

2. gct tgt aat ccc agc tac tcg (SEQ ID NO: 91 and SEQ ID NO: 89)

3. caa cac tca cac ccc aca g (SEQ ID NO: 92)

4. aga tca cgc cac tgc act c (SEQ ID NO: 93)

5. ttg aca att gag caa aga gc (SEQ ID NO: 94)

6. ttg gat tac aga cgt gag c (SEQ ID NO: 95)

7. agt gca gcc tca act tcc (SEQ ID NO: 96)

8. cca gtc caa aca ata tga tcc (SEQ ID NO: 97)

9. cat gat tca ctg cac cca acc (SEQ ID NO: 98)

10. ttt cac tct tgt tgc cca agc (SEQ ID NO: 99)

11. cac cat gcc tgg ctc caa tgt (SEQ ID NO: 100)

12. agc cca gga att caa gg (SEQ ID NO: 101)

13. aga aca ttg gag cca gg (SEQ ID NO: 102)

14. aga atc act gga atc cag g (SEQ ID NO: 103)

15. ttt tca cac agg tcc aat cc (SEQ ID NO: 104)

16. act gca acc tcc atc tcc (SEQ ID NO: 105)

17. caa tta agt gcc ttc aca cag ca (SEQ ID NO: 106)

18. ggt caa gag ttc aag acc agc (SEQ ID NO: 107)

19. ccc tgc ccc acc tct cc (SEQ ID NO: 108)

20. agt caa ttt ctg tgc aaa cta ctt tta ttt (SEQ ID NO: 109)

21. gag gca aca gga aca aac c (SEQ ID NO: 110)

22. cat tgg aat gag cag aaa cc (SEQ ID NO: 111)

23. taa cat aaa gaa tca gga gga ggc (SEQ ID NO: 112)

24. agt tgg ctc atc tgc ctc tt (SEQ ID NO: 113)

25. tgg cta aca cgg tga aac c (SEQ ID NO: 114)

26. gga atc caa aga ttc tat gat gg (SEQ ID NO: 115)

27. act cct gac ttc aaa tga tcc (SEQ ID NO: 116)

28. tag ccc cca gtc acg ttc c (SEQ ID NO: 117)

29. aga agt cca aga gtt tgc agc (SEQ ID NO: 118)

30. ttc tca gtc cca gaa tga acc (SEQ ID NO: 119)

31. cca ctt agg taa aca cct ctt (SEQ ID NO: 120)

32. ctg caa tga gcc gag ata gaa (SEQ ID NO: 121)

33. cca ctt agg taa aca cct ctt (SEQ ID NO: 122)

34. ctg caa tga gcc gag ata gaa (SEQ ID NO: 123)

35. atg ttg ggg aga ctg agg (SEQ ID NO: 124)

36. ccg cat cta act tat tct gg (SEQ ID NO: 125)

37. aac tac ctc tgc aaa ccc agc (SEQ ID NO: 126)

38. ttg gaa tgg agg gat tct acc (SEQ ID NO: 127)

39. ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

40. cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

41. gga cag atg gca atg atg g (SEQ ID NO: 130)

42. tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)
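Because the primers or probes listed above may also be used as their complementary strands, a short sketch of how a complementary strand could be derived from a listed sequence may be useful. The helper name `reverse_complement` is an assumption for illustration; only the four unambiguous bases are translated, and any other symbol is passed through unchanged.

```python
# Translation table for the four unambiguous bases, in both cases.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand written 5' to 3'.

    Spaces used to group the listed sequences into triplets are removed.
    """
    bases = sequence.replace(" ", "")
    return bases.translate(COMPLEMENT)[::-1]


# Example using SEQ ID NO: 130 as listed above.
print(reverse_complement("gga cag atg gca atg atg g"))
```

Applying the function twice returns the original sequence without spaces, which is a convenient sanity check when transcribing primers.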

However, in another embodiment a primer or probe for use in the methods of the present invention may be selected from the group of nucleotides consisting of:

agt gca gcc tca act tcc (SEQ ID NO: 96)

cca gtc caa aca ata tga tcc (SEQ ID NO: 97)

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)

TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)

cat gat tca ctg cac cca acc (SEQ ID NO: 98)

ttt cac tct tgt tgc cca agc (SEQ ID NO: 99)

ttt tca cac agg tcc aat cc (SEQ ID NO: 104)

act gca acc tcc atc tcc (SEQ ID NO: 105)

atg ttg ggg aga ctg agg (SEQ ID NO: 124)

ccg cat cta act tat tct gg (SEQ ID NO: 125)

aac tac ctc tgc aaa ccc agc (SEQ ID NO: 126)

ttg gaa tgg agg gat tct acc (SEQ ID NO: 127)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

gga cag atg gca atg atg g (SEQ ID NO: 130)

tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)

acc atg gcg cct caa ca (SEQ ID NO: 136)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

In yet another embodiment of the present invention the primer or probe is selected from the group of nucleotides consisting of:

agt gca gcc tca act tcc (SEQ ID NO: 96)

cca gtc caa aca ata tga tcc (SEQ ID NO: 97)

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)

TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)

cat gat tca ctg cac cca acc (SEQ ID NO: 98)

ttt cac tct tgt tgc cca agc (SEQ ID NO: 99)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

gga cag atg gca atg atg g (SEQ ID NO: 130)

tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)

acc atg gcg cct caa ca (SEQ ID NO: 136)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

In a further embodiment the primer or probe for use in the methods described herein is selected from the group of nucleotides consisting of:

agt gca gcc tca act tcc (SEQ ID NO: 96)

cca gtc caa aca ata tga tcc (SEQ ID NO: 97)

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)

TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)

cat gat tca ctg cac cca acc (SEQ ID NO: 98)

ttt cac tct tgt tgc cca agc (SEQ ID NO: 99)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

gga cag atg gca atg atg g (SEQ ID NO: 130)

tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)

acc atg gcg cct caa ca (SEQ ID NO: 136)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

In yet a further embodiment the primer or probe is selected from the group of nucleotides consisting of:

agt gca gcc tca act tcc (SEQ ID NO: 96)

cca gtc caa aca ata tga tcc (SEQ ID NO: 97)

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)

TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

acc atg gcg cct caa ca (SEQ ID NO: 136)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

In a particular embodiment of the present invention the primer or probe is selected from the group of nucleotides consisting of:

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)

TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

In yet another embodiment the primer or probe is selected from the group of nucleotides consisting of:

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)

TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)

acc atg gcg cct caa ca (SEQ ID NO: 136)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

However, the primer or probe may be selected from the group of nucleotides consisting of:

AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134)

TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

acc atg gcg cct caa ca (SEQ ID NO: 136)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 137)

In one embodiment of the present invention the primer or probe is selected from the group consisting of:

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)

ggt tcc gcc acg ttg cc (SEQ ID NO: 54)

tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)

att aca ggc acc cac cac cat g (SEQ ID NO: 56)

cct gga caa cat agg gag acc ctg tgt (SEQ ID NO: 138)

caa aca aac aaa aac ctc tgc ca (SEQ ID NO: 139)

atg ttg ggg aga ctg agg (SEQ ID NO: 124)

ccg cat cta act tat tct gg (SEQ ID NO: 125)

aac tac ctc tgc aaa ccc agc (SEQ ID NO: 126)

ttg gaa tgg agg gat tct acc (SEQ ID NO: 127)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

gga cag atg gca atg atg g (SEQ ID NO: 130)

tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)

acc atg gcg cct caa ca (SEQ ID NO: 140)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

The primer or probe may, however, be selected from the group consisting of:

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)

ggt tcc gcc acg ttg cc (SEQ ID NO: 54)

tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)

att aca ggc acc cac cac cat g (SEQ ID NO: 56)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

acc atg gcg cct caa ca (SEQ ID NO: 140)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

Or the primer or probe is selected from the group consisting of:

tcg gct att ttt ttt ttt att ttt tta tt (SEQ ID NO: 55)

att aca ggc acc cac cac cat g (SEQ ID NO: 56)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

gga cag atg gca atg atg g (SEQ ID NO: 130)

tct tct tct tgg tgg atg tgg (SEQ ID NO: 131)

acc atg gcg cct caa ca (SEQ ID NO: 140)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

In another embodiment the primer or probe is selected from the group consisting of:

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)

ggt tcc gcc acg ttg cc (SEQ ID NO: 54)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

acc atg gcg cct caa ca (SEQ ID NO: 140)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

In another embodiment of the present invention the primer or probe used in the methods described herein is selected from the group consisting of:

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)

ggt tcc gcc acg ttg cc (SEQ ID NO: 54)

acc atg gcg cct caa ca (SEQ ID NO: 140)

gaa ttg gct cag tca ctg tgt ga (SEQ ID NO: 141)

In a further embodiment the primer or probe is selected from the group consisting of:

agg ctg gtc ttg aac tcc tgg gct taa g (SEQ ID NO: 53)

ggt tcc gcc acg ttg cc (SEQ ID NO: 54)

ggt ttt ctg ctc tgc aca cg (SEQ ID NO: 128)

cct ttc tcc ttc cac caa cg (SEQ ID NO: 129)

It is understood that the primers may be selected individually or for example in pairs. However, it is also within the scope of the present application to combine any one of the primers in any combination.

In one embodiment the polymorphism observed at position 18147012 in contig NT_011109, RAI3'd1, is detected using one fluorescent primer and one non-fluorescent primer. The primers are: RAI3'd1-f1: 5'-AAA AAA ATA GCC GAG CAT GG (SEQ ID NO: 134) and RAI3'd1-r1: 5'-6FAM-TT TGG ACT GGG TAA GAA TTT CC (SEQ ID NO: 135).

The primers and probes may be manufactured using any convenient method of synthesis. Examples of such methods may be found in standard textbooks, for example "Protocols for Oligonucleotides and Analogues; Synthesis and Properties," Methods in Molecular Biology Series; Volume 20; Ed. Sudhir Agrawal, Humana ISBN: 0-89603-247-7; 1993; 1st Edition. If required the primer(s) and probe(s) may be labelled to facilitate detection.

Kit

According to another aspect of the present invention, there is provided a diagnostic kit comprising at least one diagnostic primer of the invention and/or at least one allele-specific oligonucleotide primer of the invention.

The diagnostic kits may comprise appropriate packaging and instructions for use in the methods of the invention. Such kits may further comprise appropriate buffer(s) and polymerase(s) such as thermostable polymerases, for example taq polymerase.

Preferred kits can comprise means for amplifying the relevant sequence such as primers, polymerase, deoxynucleotides, buffer, metal ions; and/or means for discriminating the polymorphism, such as one or a set of probes hybridising to the polymorphic site, a sequence reaction covering the polymorphic site, an enzyme or an antibody; and/or a secondary amplification system, such as enzyme-conjugated antibodies, or fluorescent antibodies. The kit-of-parts preferably also comprises a detection system, such as a fluorometer, a film, an enzyme reagent or another highly sensitive detection device.

The methods described herein may be performed, for example, by utilizing prepackaged diagnostic kits. The invention therefore also encompasses kits for detecting the presence of a polypeptide or nucleic acid of the invention in a biological sample (i.e., a test sample). Such kits can be used, e.g., to determine if a subject is suffering from or is at increased risk of developing a disorder associated with a disorder-causing allele, or aberrant expression or activity of a polypeptide of the invention. For example, the kit can comprise a labeled compound or agent capable of detecting the polypeptide or mRNA or DNA or RAI gene sequences, e.g., encoding the polypeptide in a biological sample. The kit can further comprise a means for determining the amount of the polypeptide or mRNA in the sample (e.g., an antibody which binds the polypeptide or an oligonucleotide probe which binds to DNA or mRNA encoding the polypeptide). Kits can also include instructions for observing that the tested subject is suffering from or is at risk of developing a disorder associated with aberrant expression of the polypeptide if the amount of the polypeptide or mRNA encoding the polypeptide is above or below a normal level, or if the DNA correlates with presence of an RAI allele that causes a disorder.

One embodiment of the present invention is a diagnostic kit comprising at least one diagnostic primer of the invention directed against the RAI3'd1 polymorphic marker and/or at least one allele-specific oligonucleotide primer of the invention. However, the kit may comprise for example two diagnostic primers, such as three, for example four, such as five, for example six, such as seven, for example eight diagnostic primers and/or probes as described herein. For example the kit may further comprise at least one diagnostic primer of the invention directed against the ASE 1 exon 1 polymorphism. However, the kit may also comprise at least one diagnostic primer of the invention directed against the ASE 1 exon 3-2. In one embodiment the at least one diagnostic primer and/or at least one allele-specific oligonucleotide primer is the RAI3'd1-f2 primer, such as the RAI3'd1-r2 primer or for example SEQ ID NO: 134, such as SEQ ID NO: 135. However, it is appreciated that the present invention is not limited to the four above named primers, but that any primer capable of hybridizing or capable of amplifying the RAI3'd1 polymorphic marker is within the scope of the present application. Similarly, primers or probes directed against the ASE 1 exon 1 and/or ASE 1 exon 3-2 as described in the present invention may be used. However, it is understood that any primer or probe capable of hybridizing or capable of amplifying ASE 1 exon 1 and/or ASE 1 exon 3-2 is within the scope of the present invention.

For antibody-based kits, the kit can comprise, for example: (1) a first antibody (e.g., attached to a solid support) which binds to a polypeptide of the invention; and, optionally, (2) a second, different antibody which binds to either the polypeptide or to the first antibody and is conjugated to a detectable agent.

Identification of an allele as having implication for risk of cancer

An allele in the r region can be identified as correlated with an increased risk of developing cancer on the basis of statistical analyses of the incidence of a particular allele in two groups of individuals with and without cancer, respectively, according to the χ² test, which is well known in the art. Furthermore, an allele in the region can be identified as an allele correlated with prognosis of cancer on the basis of statistical analyses of the incidence of a particular allele in individuals demonstrating different prognostic characteristics.
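As an illustration of the statistical comparison just described, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table of allele carriers versus non-carriers in individuals with and without cancer. The function names and the counts in the example are hypothetical and are not data from this specification; no continuity correction is applied, and the p-value formula holds only for one degree of freedom.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table

                   allele present   allele absent
        cases            a                b
        controls         c                d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one count")
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts, for illustration only.
statistic = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {statistic:.3f}, p = {p_value_1df(statistic):.4f}")
```

For small expected cell counts an exact test would usually be preferred over this approximation.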

Identification of humans having increased likelihood of responding to treatment

It is further contemplated that the present invention provides a method for identifying a human subject as having an increased likelihood of responding positively to a cancer treatment, comprising determining the presence in the subject of a s or r region allele genotype correlated with an increased likelihood of positive response to treatment, whereby the presence of the genotype identifies the subject as having an increased likelihood of responding to cancer treatment.

The treatment mentioned herein may be any cancer treatment, such as conventional cancer treatment, for example X-ray, chemotherapeutics, surgical excision or combinations thereof.

Protein Products of the Gene(s)

Gene products of the region r, or peptide fragments thereof, can be prepared for a variety of uses. For example, such gene products, or peptide fragments thereof, can be used for the generation of antibodies or in diagnostic assays.

The gene products of the invention include, but are not limited to, human RAI gene products. In the following the invention is described in relation to RAI gene product.

Gene product, sometimes referred to herein as a "protein" or "polypeptide", includes those gene products encoded by the RAI gene sequences shown as position 7760-22885 in SEQ ID NO: 1. Among gene product variants are gene products comprising amino acid residues encoded by the polymorphisms. Such gene product variants also include a variant of the RAI gene product.

In addition, RAI gene products may include proteins that represent functionally equivalent gene products. In preferred embodiments, such functionally equivalent RAI gene products are naturally occurring gene products. Functionally equivalent RAI gene products also include gene products that retain at least one of the biological activities of the RAI gene products described above, and/or which are recognized by and bind to antibodies (polyclonal or monoclonal) directed against RAI gene products.

Antibodies to Gene Products

Described herein are methods for the production of antibodies capable of specifically recognizing one or more gene product epitopes or epitopes of conserved variants or peptide fragments of the gene products. Furthermore, antibodies that specifically recognize mutant forms are encompassed by the invention. The terms "specifically bind" and "specifically recognize" refer to antibodies that bind to RAI gene product epitopes at a higher affinity than they bind to non-RAI (e.g., random) epitopes.

Such antibodies may include, but are not limited to, polyclonal antibodies, monoclonal antibodies (mAbs), humanized or chimeric antibodies, single chain antibodies, Fab fragments, F(ab')2 fragments, fragments produced by a Fab expression library, anti-idiotypic (anti-Id) antibodies, and epitope-binding fragments of any of the above, including the polyclonal and monoclonal antibodies described below. Such antibodies may be used, for example, in the detection of a gene product in a biological sample and may, therefore, be utilized as part of a diagnostic or prognostic technique whereby patients may be tested for abnormal levels of gene products, and/or for the presence of abnormal forms of such gene products. Such antibodies may also be utilized in conjunction with, for example, compound screening schemes, as described below, for the evaluation of the effect of test compounds on gene product levels and/or activity.

For the production of antibodies against a gene product, various host animals may be immunized by injection with a RAI gene product, or a portion thereof. Such host animals may include, but are not limited to rabbits, mice, and rats, to name but a few. Various adjuvants may be used to increase the immunological response, depending on the host species, including but not limited to Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and Corynebacterium parvum.

Polyclonal antibodies are heterogeneous populations of antibody molecules derived from the sera of animals immunized with an antigen, such as a gene product, or an antigenic functional derivative thereof. For the production of polyclonal antibodies, host animals such as those described above, may be immunized by injection with gene product supplemented with adjuvants as also described above.

Monoclonal antibodies, which are homogeneous populations of antibodies to a particular antigen, may be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler and Milstein (1975, Nature 256:495-497; and U.S. Pat. No. 4,376,110), the human B-cell hybridoma technique (Kosbor et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. U.S.A. 80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, Monoclonal Antibodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Such antibodies may be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any subclass thereof. The hybridoma producing the mAb of this invention may be cultivated in vitro or in vivo. Production of high titers of mAbs in vivo makes this the presently preferred method of production.

In addition, techniques developed for the production of "chimeric antibodies" (Morrison, et al., 1984, Proc. Natl. Acad. Sci., 81:6851-6855; Neuberger, et al., 1984, Nature 312:604-608; Takeda, et al., 1985, Nature, 314:452-454) by splicing the genes from a mouse antibody molecule of appropriate antigen specificity together with genes from a human antibody molecule of appropriate biological activity can be used. A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a murine mAb and a human immunoglobulin constant region. (See, e.g., Cabilly et al., U.S. Pat. No. 4,816,567; and Boss et al., U.S. Pat. No. 4,816,397, which are incorporated herein by reference in their entirety.)

In addition, techniques have been developed for the production of humanized antibodies. (See, e.g., Queen, U.S. Pat. No. 5,585,089, which is incorporated herein by reference in its entirety.) An immunoglobulin light or heavy chain variable region consists of a "framework" region interrupted by three hypervariable regions, referred to as complementarity determining regions (CDRs). The extent of the framework region and CDRs has been precisely defined (see "Sequences of Proteins of Immunological Interest", Kabat, E. et al., U.S. Department of Health and Human Services (1983)). Briefly, humanized antibodies are antibody molecules from non-human species having one or more CDRs from the non-human species and a framework region from a human immunoglobulin molecule.

Alternatively, techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988, Science 242:423-426; Huston, et al., 1988, Proc. Natl. Acad. Sci. U.S.A. 85:5879-5883; and Ward, et al., 1989, Nature 334:544-546) can be adapted to produce single chain antibodies against gene products. Single chain antibodies are formed by linking the heavy and light chain fragments of the Fv region via an amino acid bridge, resulting in a single chain polypeptide.

Antibody fragments that recognize specific epitopes may be generated by known techniques. For example, such fragments include but are not limited to: the F(ab')2 fragments, which can be produced by pepsin digestion of the antibody molecule and the Fab fragments, which can be generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, Fab expression libraries may be constructed (Huse, et al., 1989, Science 246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity.

Immunoassays for gene products, conserved variants, or peptide fragments thereof will typically comprise incubating a sample, such as a biological fluid, a tissue extract, freshly harvested cells, or lysates of cells in the presence of a detectably labeled antibody capable of identifying gene product, conserved variants or peptide fragments thereof, and detecting the bound antibody by any of a number of techniques well-known in the art.

The biological sample may be brought in contact with and immobilized onto a solid phase support or carrier, such as nitrocellulose, that is capable of immobilizing cells, cell particles or soluble proteins. The support may then be washed with suitable buffers followed by treatment with the detectably labeled gene product specific antibody. The solid phase support may then be washed with the buffer a second time to remove unbound antibody. The amount of bound label on the solid support may then be detected by conventional means.

By "solid phase support or carrier" is intended any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros, and magnetite. The nature of the carrier can be either soluble to some extent or insoluble for the purposes of the present invention. The support material may have virtually any possible structural configuration so long as the coupled molecule is capable of binding to an antigen or antibody. Thus, the support configuration may be spherical, as in a bead, or cylindrical, as in the inside surface of a test tube, or the external surface of a rod. Alternatively, the surface may be flat such as a sheet, test strip, etc. Preferred supports include polystyrene beads. Those skilled in the art will know many other suitable carriers for binding antibody or antigen, or will be able to ascertain the same by use of routine experimentation.

One of the ways in which the RAI gene product-specific antibody can be detectably labeled is by linking the same to an enzyme, such as malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase, α-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, β-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase or acetylcholinesterase. The detection can be accomplished by colorimetric methods that employ a chromogenic substrate for the enzyme. Detection may also be accomplished by visual comparison of the extent of enzymatic reaction of a substrate in comparison with similarly prepared standards.

Detection may also be accomplished using any of a variety of other immunoassays, for example by radioactively labeling the antibodies or antibody fragments, or by labeling the antibody with a fluorescent compound. Among the most commonly used fluorescent labeling compounds are fluorescein isothiocyanate, rhodamine, phycoerythrin, phycocyanin, allophycocyanin, o-phthaldehyde and fluorescamine.

The antibody can also be detectably labeled using fluorescence emitting metals such as ¹⁵²Eu, or others of the lanthanide series, or by coupling it to a chemiluminescent compound.

Diseases

Described herein are various applications of gene sequences, gene products, including peptide fragments and fusion proteins thereof, and of antibodies directed against gene products and peptide fragments thereof. Such applications include, for example, prognostic and diagnostic evaluation of a disease, such as cancer, and the identification of subjects with a predisposition to such disorders, as described above.

The method according to the invention may be used in relation to any cancer form, such as, but not limited to, skin carcinoma including malignant melanoma, breast cancer, lung cancer, colon cancer and other cancers in the gastro-intestinal tract, prostate cancer, lymphoma, leukemia, multiple myeloma, pancreas cancer, head and neck cancer, ovary cancer and other gynecological cancers. In particular the method is relevant for skin cancer, lung cancer, colon cancer, multiple myeloma, and breast cancer, such as skin cancer, breast cancer, multiple myeloma, and lung cancer, such as skin cancer, breast cancer and lung cancer, such as skin cancer and breast cancer, preferably wherein the skin cancer is basal cell carcinoma, such as lung cancer. For example the cancer is multiple myeloma. In another embodiment the cancer is breast cancer. The cancer may also be skin cancer, preferably basal cell carcinoma, for example early age basal cell carcinoma.

In particular, the method is relevant for both early age cancer and later age cancer, such as early age breast cancer, and such as later age breast cancer.

The method is of particular relevance for lung cancer, such as in patients with XPD exon 23 AA. The method is also in particular relevant for early age skin cancer, such as early age basal cell carcinoma.

Gene nucleic acid sequences, described above, can be utilized for transferring recombinant nucleic acid sequences to cells and expressing said sequences in recipient cells. Such techniques can be used, for example, in marking cells or for the treatment of cancer. Such treatment can be in the form of gene replacement therapy. Specifically, one or more copies of a normal RAI gene or a portion of the RAI gene that directs the production of an RAI gene product exhibiting normal RAI gene function, may be inserted into the appropriate cells within a patient, using vectors that include, but are not limited to, adenovirus, adeno-associated virus, and retrovirus vectors, in addition to other particles that introduce DNA into cells, such as liposomes.

In another embodiment, the invention may be used in relation to inflammatory diseases, such as, but not limited thereto, rheumatoid arthritis, colitis ulcerosa, Crohn's disease, thyroiditis, neural inflammation as in Alzheimer's disease, and Guillain-Barré syndrome.

RAI function

Presumably, the primary effector influences activity or expression of RAI or XPD. Possibly, one of the markers analyzed is the effector itself. However, there are several alternative possibilities. First, the effector could modify the sequence of RAI protein, through amino acid substitution, splicing or termination. Secondly, one likely position of the effector is in the 3' portion of the gene, where it would modify mRNA expression and stability (22). Thirdly, one could imagine that the effector was a modification of an enhancer situated between RAI and XPD, again modulating the expression of RAI, and possibly modulating other nearby genes as well. Finally, the primary effector may be located in the promoter or first intron of RAI and influence the transcriptional activity of the gene.

If a primary result of the effector is modified RAI action, then one result may be modified apoptosis. RAI protein is an inhibitor of RelA, a subunit of the transcription factor NF-κB (23, 24). NF-κB has long been implicated in both cell proliferation and apoptosis. Modulation of NF-κB may well be part of a "crunch-time scenario", invoked when the cell has to muster its forces and make life-and-death decisions. A recent scientific paper suggests that the choice between cell survival and death is regulated by the relative activity of the two subunits encoded by RelA and c-rel, respectively (25). By neutralising the RelA product, RAI protein would presumably shift this balance towards apoptosis. RAI protein may also influence p53 activity (26).

Predictor of RAI action

The present invention relates to a method of estimating the disease risk of an individual, further comprising a predictor of RAI action. In one embodiment the length of the RAI transcript may be used as a predictor. In another embodiment the quantity of RAI transcripts may be used as a predictor. The present invention comprises the combined RAI transcript characteristics described above as a predictor of RAI action and the estimation of the disease risk of an individual.

Examples

Materials and methods

Study groups. Diet, Cancer and Health (DCH) is a Danish prospective follow-up study. Individuals eligible for invitation were born in Denmark, were living in the Copenhagen or Aarhus areas, and were not at the time of invitation registered with a previous diagnosis of cancer (including non-melanoma skin cancer) in the Danish Cancer Registry. Invited to participate were 160,725 individuals aged 50-64 years, of whom 57,053 were recruited (22). Among these, 542 were later notified to the Danish Cancer Registry with a cancer diagnosed before the date of enrolment and were therefore excluded. At enrolment (1993-1997), detailed information on diet, smoking habits, lifestyle, weight, height, reproduction, medical treatment, and other socio-economic characteristics and environmental exposures was collected. For women, hormone replacement therapy (HRT) and menopausal status were also recorded. Moreover, blood, urine, fat tissue and other biological material were sampled and stored in a biobank at -150 °C. Cohort members were identified by a unique identification number, which is allocated to every Danish citizen by the Central Population Registry. Cohort members were linked to the Central Population Registry for information on vital status and immigration. Information on cancer occurrence among cohort members was obtained through record linkage to the Danish Cancer Registry, which collects information on all inhabitants in Denmark who develop cancer (7). Linkage was performed by use of the personal identification number.

Study groups for breast cancer. In "Diet Health and Cancer" 79,729 women aged 50-64 years were invited to participate in the study and 29,875 accepted the invitation. Of these a total of 326 women that later were reported to the Danish Cancer Registry with a cancer before the visit to the study clinic were excluded from the study. In addition, 8 women were excluded from the study because they did not fill in the lifestyle questionnaire. Because the present analysis aimed at the subgroup of women, who were postmenopausal at study entry, we further excluded 4,844 supposedly pre-menopausal women, including 4,798 women who had reported at least one menstruation no more than 12 months prior to entry and no use of hormone replacement therapy (HRT), nine women who gave a lifetime history of no menstruations and 37 women who did not answer the questions about current or previous use of HRT, leaving 24,697 postmenopausal women.
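The exclusion counts in the preceding paragraph can be checked with a few lines of arithmetic. The following is only a bookkeeping sketch; the variable names are illustrative and do not appear in the study.

```python
# Counts quoted in the text for the breast cancer study base.
accepted = 29_875            # women who accepted the invitation
prior_cancer = 326           # cancer reported before the clinic visit
no_questionnaire = 8         # lifestyle questionnaire not filled in
premenopausal_groups = {
    "menstruation within 12 months, no HRT": 4_798,
    "lifetime history of no menstruations": 9,
    "HRT questions unanswered": 37,
}

premenopausal = sum(premenopausal_groups.values())
postmenopausal = accepted - prior_cancer - no_questionnaire - premenopausal

print(premenopausal)   # 4844
print(postmenopausal)  # 24697
```

Both totals agree with the figures stated in the text (4,844 excluded as supposedly pre-menopausal, 24,697 postmenopausal women remaining).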

Each cohort member was followed-up for breast cancer occurrence from date of entry, i.e. date of visit to the study centre until the date of diagnosis of any cancer (except for non-melanoma skin cancer), date of death, date of emigration, or 31 December 2000, whichever came first. A total of 434 women were diagnosed with incident breast cancer during the follow up period (Vogel et al. In press), and a similar number was selected as controls by individual matching for age, HRT use and menopausal status.

Study groups for lung cancer. These were also recruited from "Diet, Health and Cancer". Among the 56,511 individuals with no previous cancer diagnosis, 2,291 with incomplete, inconsistent or missing information on smoking habits were excluded. Among the remaining 54,220 cohort members included in this study, 265 cases of lung cancer, diagnosed between 1994 and 2001, were identified in the files of the Danish Cancer Registry (23). Also from among the cohort members, a sub-cohort of 272 persons (including 4 of the cases) was selected at random and weighted in numbers similarly to cases within strata defined by sex, year of birth (5-year intervals) and duration of smoking (10-year intervals). Blood samples were available in the biobank for 521 of the 533 selected individuals (Vogel et al. In press).
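As a consistency check on the lung cancer counts, the figures quoted above can be combined as follows. This is a minimal sketch with illustrative variable names; it assumes that the 4 cases drawn into the sub-cohort are counted only once among the selected individuals, which is the reading that reproduces the stated total of 533.

```python
recruited = 57_053
cancer_before_enrolment = 542
no_previous_cancer = recruited - cancer_before_enrolment    # 56511

incomplete_smoking_data = 2_291
study_base = no_previous_cancer - incomplete_smoking_data   # 54220

cases = 265
sub_cohort = 272
cases_in_sub_cohort = 4
selected = cases + sub_cohort - cases_in_sub_cohort         # 533

with_blood_sample = 521
without_blood_sample = selected - with_blood_sample         # 12
```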

Study groups for basal cell carcinoma (basocellular carcinoma, BCC). The groups of Caucasian Americans with and without BCC have been described previously (Athas et al, Cancer Res. 51:5786-5793, 1991; Wei et al, Proc. Natl. Acad. Sci USA, 90: 1614-8, 1994). Briefly, the study was a clinic-based case-control study at the Johns Hopkins Hospital, which serves multiple participating dermatologists in Maryland. Cases were histopathologically confirmed primary BCCs diagnosed between 1987 and 1990. The controls were patients from the same physician practices and had a diagnosis of mild skin disorders. All participants were Caucasians living near Baltimore and were between 20 and 60 years of age. The controls were frequency matched to the cases by age and sex. Cases and controls with any other forms of cancer were excluded. In the questionnaire, the study subjects were asked if they had any blood relatives with skin cancer, and were asked to specify the type of cancer. Study subjects with relatives with basal cell carcinoma, squamous cell carcinoma or 'skin cancer' were included in the group of subjects with a family history of skin cancer. Subjects with relatives with melanoma were not included. At the clinic visit the subjects gave informed consent, were examined by dermatologists, completed a structured questionnaire and provided blood. Available frozen lymphocytes were genotyped. Initially, 71 cases and 118 controls were included in this study. However, the number of persons varied between analyses, as the supply of DNA was gradually depleted. In the case of the SNP RAI intron 1, only 133 persons could be genotyped reliably.

The examples relate to the prediction of cancer from sequence polymorphisms in and around the region r; see Fig. 1 for an overview of a subregion of chromosome 19q. In the present study 27 markers were investigated within this 69 kb stretch for association with breast, lung, skin cancer, and multiple myeloma, using linkage disequilibrium mapping based on single markers as well as on haplotypes combining several neighboring markers. We report that haplotypes with maximal association to all three cancers are centered around the gene RAI.

Typing of SNPs

Table 10 lists the polymorphisms used in this study, their nature, their numbers in the NCBI database dbSNP, the sequence defining the SNPs and their position therein, the currently estimated chromosome positions, and their relative position within the region of interest (which can readily be calculated from the information provided in the table). Table 11 lists the primers used for the PCR reactions. Table 12 lists the probes used for detection and typing. Table 13 lists the PCR regimen for the polymorphisms using Lightcycler, Taqman and ABI 2720.

The particular sequence polymorphisms analysed in these examples are listed in Table 10, together with their sources of information and their definition as sequences.

Table 10

Trivial name | Kind | dbSNP # | Sequence | Position in sequence | Relative position
XPD exon 23 | A/C | rs1052559 | NT_011109 | 18123137 | 0
XPD exon 10 | A/G | rs1799793 | NT_011109 | 18135477 | 12340
XPD exon 6 | A/C | rs238406 | NT_011109 | 18136527 | 13390
XPD intron 3 | A/G | rs1799783 | NT_011109 | 18140254 | 17117
XPD-4bp | GACA/- | rs3916791 | NT_011109 | 18142345 | 19208
XPD-81bp | 81 bp/- | rs3916787 | NT_011109 | 18143027 | 19890
XPD-5'2 | A/G | rs2097215 | NT_011109 | 18144005 | 20868
XPD-5'3 | C/T | rs11878644 | NT_011109 | 18145185 | 22048
RAI-3'7 | C/T | rs7252567 | NT_011109 | 18146823 | 23686
None 1) | ATTTT/- | | | 18147012 | 23875
None 2) | A/G | rs2377329 | | 18147126 | 23989
RAI-3'3 | AA/- | rs3047560 | NT_011109 | 18147192 | 24055
None 3) | G/T | | | 18146233 |
None 4) | G/T | rs10422489 | | 18147886 |
None 5) | C/T | rs10426701 | | 18148193 |
None 6) | C/T | | | 18149120 |
None 7) | A/G | | | 18149154 |
RAI-3'4 | A/G | rs4544343 | NT_011109 | 18150199 | 27062
None 8) | C/G | | | 18150815 |
None 9) | A/G | rs8101662 | | 18150911 |
None 10) | G/T | | | 18151158 |
RAI exon 6 | A/T | rs6966 | NT_011109 | 18151180 | 28043
RAI intron5-3 | C/T | rs10417235 | NT_011109 | 18152171 | 29034
RAI intron5-2 | A/T | rs8112723 | NT_011109 | 18153497 | 30360
RAI intron 3 | A/G | rs2017104 | NT_011109 | 18155483 | 32346
RAI intron 1 | A/G | rs1970764 | NT_011109 | 18159091 | 35954
None 11) | A/G | | | 18159263 |
None 12) | A/G | | | 18160363 |
None 13) | A/C | | | 18160936 |
None 14) | C/G | | | 18160937 |
None 15) | A/C | rs10419090 | | 18161433 |
None 16) | G/T | | | 18161694 |
None 17) | C/GACCTTGGCGCCACCATCTT | | | 18161841 |
None 18) | C/T | | | 18161896 |
RAI intron 1-2 | C/T | | AC092309 | 26729 | 39249
 | | | NT_011109 | 18162206 |
None 19) | C/T | | | 18162309 |
None 20) | A/G | | | 18162356 |
None 21) | A/G | rs12986272 | | 18162599 |
None 22) | A/C | | | 18162903 |
RAI intron1-3 | A/C | | AC092309 | 27493 | 39833
 | | | NT_011109 | 18162970 |
None 23) | C/T | | | 18162986 |
None 24) | A/C | | | 18163200 |
None 25) | C/G | | | 18165052 |
RAI-5'2 | C/T | rs4803814 | NT_011109 | 18168944 | 45807
RAI-5'3 | C/T | rs4803815 | NT_011109 | 18168948 | 45811
RAI-5' | C/T | rs4572514 | NT_011109 | 18171984 | 48847
ASE1-5'2 | G/T | rs2226949 | NT_011109 | 18175327 | 52190
ASE1 exon 1 | A/G | rs967591 | NT_011109 | 18178152 | 55015
ERCC1-3'2 | A/G | rs735482 | NT_011109 | 18180220 | 57083
ERCC1-3' | C/T | rs762562 | NT_011109 | 18180561 | 57424
ERCC1-3'3 | A/G | rs2336219 | NT_011109 | 18180624 | 57487
ERCC1 exon 4 | C/T | rs3177700 | NT_011109 | 18191871 | 68734

rs numbers were derived from the NCBI database dbSNP. Nucleotide sequences of polymorphisms identified in the present invention with no trivial name as yet:

1) GGTTTAT[ATTTT]Ntgagatggatttt (SEQ ID NO: 19)
2) ctggggaggctgaggcaggagaatc[A/G]cttgaaaccgggaggcggaggttgt (SEQ ID NO: 22)
3) GATTGTCATGT[G/T]ACATCAGCCAATACT (SEQ ID NO: 2)
4) AAAAAACTAAAGTGGGGTTTGCGGG[G/T]AGTGGGAGGGCCCTTCCTGCTAGGT (SEQ ID NO: 24)
5) caggcggatcacaaggtcaggagtt[C/T]gagaccagcctggccaacacagtga (SEQ ID NO: 25)
6) CACAGTGAAAC[C/T]CCATCTCTACTAAA (SEQ ID NO: 3)
7) AAAATTAGCCGG[A/G]CGCCATGGCGGGAG (SEQ ID NO: 4)
8) AGCCTGGCCAACATG[C/G]TGAAACCCCGTCTCT (SEQ ID NO: 5)
9) ctcgggaggctgaggcaggagaatc[A/G]cttgaactcaggaggcagaggttgc (SEQ ID NO: 27)
10) AAGTTTCTCTATT[G/T]TGTTTATAAACA (SEQ ID NO: 6)
11) TTCTCCTGACCTC[A/G]TGATCCGCCCACCTCGG (SEQ ID NO: 7)
12) GGGATTACAGGCATGC[A/G]CCACCAGGCCCAGCTAATTTTTGT (SEQ ID NO: 8)
13) TCCAATGGTGACA[A/C]CAGTAAGAGCAGTTAACAG (SEQ ID NO: 9)
14) TCCAATGGTGACAA[C/G]AGTAAGAGCAGTTAACAG (SEQ ID NO: 10)
15) tacaggcgcccgccaccacccccag[A/C]taatttttgtatttttagtagagac (SEQ ID NO: 29)
16) TTGCCTCAGCCTCCTGA[G/T]TAGCTGGGATTGGAATGAGA (SEQ ID NO: 11)
17) TACGATAAATAGCTAGA[C/GACCTTGGCGCCACCATCTT] (SEQ ID NO: 20)
18) AAAATAATAATAATAATATTAA[C/T]CCTGACCTTGGCGCCACCATCT (SEQ ID NO: 12)
19) CCTCATGAGCCACCCAC[C/T]TCGGCCTCCCAAAGTGCT (SEQ ID NO: 13)
20) TGAGCCACCGCGCCC[A/G]GCCGAGACTCACTATTT (SEQ ID NO: 14)
21) taaagcgggaggatggcttgaacct[A/G]ggaggcggaggttgcagtgagccga (SEQ ID NO: 31)
22) GGAGAGAAGGAGCAGAGAAC[A/C]TCTCTATGTGGCCA (SEQ ID NO: 15)
23) TTTCCCAGCATCCCA[C/T]TGCAATGAGGCTCCTGGCC (SEQ ID NO: 16)
24) TCCTGACTCCAGTG[A/C]GGTGCCTACAGTCCTG (SEQ ID NO: 17)
25) TTCCAGCCTGGGCAAGAA[C/G]AGTGAAACTCCAGCTT (SEQ ID NO: 18)
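The relative positions in Table 10 appear to be the NT_011109 position of each marker minus that of XPD exon 23, which is given relative position 0. The sketch below reproduces a few of the tabulated values under that assumption; the function name and the selection of markers are illustrative only.

```python
# NT_011109 position of XPD exon 23, which has relative position 0 in Table 10.
REFERENCE_POSITION = 18_123_137

def relative_position(position, reference=REFERENCE_POSITION):
    """Offset of a marker from the reference marker, in base pairs."""
    return position - reference

# A few named markers with positions copied from Table 10.
markers = {
    "XPD exon 10": 18_135_477,
    "RAI-3'3": 18_147_192,
    "RAI intron 1": 18_159_091,
    "ERCC1 exon 4": 18_191_871,
}

for name, position in markers.items():
    print(name, relative_position(position))
```

The same subtraction can be applied to the markers without a trivial name, for which Table 10 lists only the position in the sequence.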

Statistics and haplotype assignment

Data recording and calculations and tests of allele frequencies were performed in SPSS and Excel. Calculation of the relative risk and confidence intervals for the single polymorphisms was performed in SAS (SAS Institute, Cary, NC, USA).

Simultaneous analysis of multiple SNPs employing haplotype trend regression (17) was performed with HelixTree (Golden Helix, Bozeman, MT, USA). Haplotype trend regression is basically a two-stage procedure. First, genotype results in combination with population assumptions such as Hardy-Weinberg equilibrium are used to construct all haplotype probabilities corresponding to a given set of markers for each individual. Secondly, the disease state (1 for cases, 0 for controls) is regressed on the haplotype probabilities of all individuals, resulting in a p-value for the overall association of the set of markers with disease, and parameters for the association of each specific haplotype with disease. The HelixTree program uses sets of markers of defined size, typically 2 to 4 neighboring markers at a time, to scan the entire region. It can then be used to derive frequencies of the individual haplotypes, to calculate an overall p-value for the distribution of haplotypes covering a given set of markers among cases and controls, and to calculate p-values for the distribution of each haplotype derived from a given set of markers.
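The two-stage procedure described above can be illustrated with a small sketch for a set of two biallelic markers. This is not the HelixTree implementation: the EM estimation of haplotype probabilities, the least-squares regression, the permutation p-value, and the toy genotypes are all assumptions made for illustration, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement
import numpy as np

# The four possible haplotypes of two biallelic markers (0/1 = allele at each marker).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def compatible_pairs(genotype):
    """Haplotype pairs whose allele counts add up to the unphased genotype."""
    return [(i, j) for i, j in combinations_with_replacement(range(4), 2)
            if HAPLOTYPES[i][0] + HAPLOTYPES[j][0] == genotype[0]
            and HAPLOTYPES[i][1] + HAPLOTYPES[j][1] == genotype[1]]

def haplotype_dosages(genotypes, n_iter=50):
    """Stage 1: EM under Hardy-Weinberg; expected haplotype counts per person."""
    freqs = np.full(4, 0.25)
    pairs = [compatible_pairs(g) for g in genotypes]
    for _ in range(n_iter):
        dosages = np.zeros((len(genotypes), 4))
        for n, candidates in enumerate(pairs):
            weights = np.array([(1 if i == j else 2) * freqs[i] * freqs[j]
                                for i, j in candidates])
            weights /= weights.sum()
            for (i, j), w in zip(candidates, weights):
                dosages[n, i] += w
                dosages[n, j] += w
        freqs = dosages.sum(axis=0) / (2 * len(genotypes))
    return dosages, freqs

def trend_regression(dosages, status, n_perm=1000, seed=0):
    """Stage 2: regress disease state on haplotype dosages; permutation p-value."""
    # The first haplotype is dropped and serves as the reference category.
    X = np.column_stack([np.ones(len(status)), dosages[:, 1:]])

    def f_statistic(y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = ((y - X @ beta) ** 2).sum()
        tss = ((y - y.mean()) ** 2).sum()
        df1, df2 = X.shape[1] - 1, len(y) - X.shape[1]
        return ((tss - rss) / df1) / (rss / df2)

    observed = f_statistic(status)
    rng = np.random.default_rng(seed)
    permuted = [f_statistic(rng.permutation(status)) for _ in range(n_perm)]
    p_value = (1 + sum(f >= observed for f in permuted)) / (n_perm + 1)
    return observed, p_value

# Toy data (hypothetical): allele counts at two markers, and disease state.
genotypes = [(0, 0), (0, 0), (1, 1), (2, 2), (1, 0),
             (0, 1), (2, 1), (1, 2), (1, 1), (0, 0)]
status = np.array([0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=float)

dosages, freqs = haplotype_dosages(genotypes)
f_value, p_value = trend_regression(dosages, status, n_perm=200)
```

Only the double heterozygote is ambiguous for two markers, so the EM step mainly decides how to split such individuals between the two possible haplotype pairs. Scanning a region would repeat both stages for each window of neighboring markers.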

The program RASCAL for performing gene localization according to Lazzeroni (16) was implemented in Delphi (B.A. Nexø, unpublished). We used a bootstrap set of 10000 sets of haplotypes selected from the original set with replacement, and to avoid a Q-form with negative values we used a relaxation value of 0.25. A 95% confidence interval for the location of the causative gene variant was derived from the Q-form. Places where it took on a value less than 3.85 above the minimum were considered inside the confidence interval.
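The confidence interval rule stated above (positions where the Q-form lies less than 3.85 above its minimum) can be sketched as follows. The curve used here is a made-up parabola, not output from RASCAL, and the function name is hypothetical.

```python
import numpy as np

def confidence_interval(positions, q_values, threshold=3.85):
    """Return (lower, upper, best) for positions where Q < min(Q) + threshold.

    The interval is reported as the outermost positions satisfying the rule;
    this sketch does not check that the region in between is contiguous.
    """
    positions = np.asarray(positions, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    inside = q_values < q_values.min() + threshold
    best = positions[q_values.argmin()]
    return positions[inside].min(), positions[inside].max(), best

# Hypothetical Q-form sampled every kb across a 70 kb region, minimum at 20 kb.
positions_kb = np.arange(0, 70)
q_form = 0.01 * (positions_kb - 20) ** 2 + 5.0

lower, upper, best = confidence_interval(positions_kb, q_form)
print(lower, upper, best)  # 1.0 39.0 20.0
```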

Three programs were used for assigning haplotypes to individuals on the basis of genotype data: HelixTree, Arlequin (18) and Phase (19). Arlequin, like HelixTree, is a maximum likelihood algorithm, while Phase in addition includes a penalty for each new haplotype that is brought into play. Furthermore, the program Arlequin includes missing values in its table of frequencies. To compensate for the latter, we normalised the values corresponding to fully defined haplotypes before including them in the analysis. All three were run under Windows 2000. The figure of the linkage disequilibrium among the controls (Fig. 2) was also derived with the program HelixTree.
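The normalisation of fully defined haplotypes mentioned above amounts to dropping entries with missing alleles and rescaling the rest to sum to one. The sketch below assumes a simple dictionary of frequencies in which a missing allele is written as "?"; this representation and the numbers are hypothetical and do not reflect Arlequin's actual output format.

```python
def normalise_defined(frequencies, missing="?"):
    """Keep only fully defined haplotypes and rescale their frequencies to sum to 1."""
    defined = {hap: f for hap, f in frequencies.items() if missing not in hap}
    total = sum(defined.values())
    return {hap: f / total for hap, f in defined.items()}

# Hypothetical frequency table for a two-marker set, including incomplete haplotypes.
raw = {"AC": 0.45, "AT": 0.27, "GC": 0.18, "?C": 0.06, "A?": 0.04}
normalised = normalise_defined(raw)
```

With these example values the three fully defined haplotypes, which together account for 0.90 of the table, are rescaled to approximately 0.5, 0.3 and 0.2.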

Primers and probes

The primers used for typing the polymorphisms are listed in Table 11. Table 12 lists the probes used for the polymorphisms.

Polymorphisms listed in Table 10 as having no trivial name have been deduced by sequencing in the present study.

A person skilled in the art will appreciate that primers and probes can be designed based on the information provided herein, in order to detect polymorphisms by means other than sequencing, for example by PCR amplification methods as also used herein.

Primers were purchased from DNA Technology (Aarhus, Denmark), probes for the Lightcycler were obtained from MolBiol (Tempelhofer Weg 11-12, Berlin) and the probes for Taqman were supplied by Applied Biosystems.

Table 11 Primers for the polymorphisms

Trivial name | Primers
XPD exon 23 | 5' atg cac cag gaa ccg ttt atg g; 5' tct gtt ctc tgc agg agg atc
XPD exon 10 | 5' gat caa aga gac aga cga gc; 5' gaa gcc cag gaa atg c
XPD exon 6 | 5' gta cca gca tga cac cag cct; 5' tcc ctc cct gag ccc tg
XPD intron 3 | 5' aag gca gac aaa gga agg; 5' gca agg aga agg aac agg
XPD-4bp | 5' caa tca aaa aga aaa cat gg; 5' tga gac gag gtg gag g
XPD-81bp | 5' tgc ctc acc cct gta atc c (SEQ ID NO.: 47); 5' gct tgt aat ccc agc tac tcg
XPD-5'2 | 5' caa cac tca cac ccc aca g; 5' aga tca cgc cac tgc act c
XPD-5'3 | 5' ttg aca att gag caa aga gc; 5' tgg gat tac aga cgt gag c
RAI-3'7 | 5' cca gtc caa aca ata tga tcc; 5' agt gca gcc tca act tcc
RAI-3'3 | 5' cat gat tca ctg cac cca acc; 5' ttt cac tct tgt tgc cca agc
RAI-3'4 | 5' ttt tca cac aag tcc aat cc; 5' act gca acc tcc atc tcc
RAI exon 6 | 5' ccc tgc ccc acc tct cc; 5' agt caa ttt ctg tgc aaa cta ctt tta ttt
RAI intron 5-3 | 5' cat gac gag acc ctg tct cta cta aa; 5' cac ctc ccg gat tca agt ga
RAI intron 5-2 | 5' gag gca aca gga aca aac c; 5' cat tgg att gag cag aaa cc
RAI intron 3 | 5' taa cat aaa gaa tca gga gga ggc; 5' agt tgg ctc atc tgc ctc tt
RAI intron 1 | 5' tgg cta aca cgg tga aac c; 5' gga atc caa aga ttc tat gat gg
RAI intron 1-2 | 5' act cct gac ttc aaa tga tcc; 5' tag ccc cca gtc acg ttc c
RAI intron 1-3 | 5' aga agt cca aga gtt tgc agc; 5' ttc tca gtc cca gaa tga acc
RAI-5'2 | 5' cca ctt agg taa aca cct ctt; 5' ctg caa tga gcc gag ata gaa
RAI-5'3 | 5' cca ctt agg taa aca cct ctt; 5' ctg caa tga gcc gag ata gaa
RAI-5' | 5' atg ttg ggg aga ctg agg; 5' ccg cat cta act tat tct gg
ASE1-5'2 | 5' aac tac ctc tgc aaa ccc agc; 5' ttg gaa tgg agg gat tct acc
ASE1 exon 1 | 5' ggt ttt ctg ctc tgc aca cg; 5' cct ttc tcc ttc cac caa cg
ERCC1-3'2 | 5'-aca gag ccc aca gtg gag aca; 5'-cta gag gct cag tgt taa tct gtt cct
ERCC1-3' | 5' gga cag atg gca atg atg g; 5' tct tct tct tgg tgg atg tgg
ERCC1-3'3 | 5'-acc atg gcg cct caa ca; 5'-gaa ttg gct cag tca ctg tgt ga
ERCC1 exon 4 | 5' ggc cct gtg gtt atc aag g; 5' tct cat aga aca gtc cag aac act g
RAI-3'3 nested 1 | 5' cat gat tca ctg cac cca acc; 5' ttt cac tct tgt tgc cca agc
RAI-3'3 nested 2 | 5' 6FAM-ctt gca cag tgg ctc atg c; 5' tct tgt tgc cca agc tgg


Table 12 Probes for the polymorphisms

Trivial name | Probes
XPD exon 23 | 5' FAM-ctc tat cct ctg cag cg-MGB; 5' VIC-tat cct ctt gag cgt ct-MGB
XPD exon 10 | 5' LC Red640- cgt gct gcc caa cga agt g -p; 5' gga cgc cca cct ggc caa cc -fluorescein
XPD exon 6 | 5' FAM-ccc cac tgc cgc ttc tat gag gt-TAMRA; 5' VIC-ccc cac tgc cga ttc tat gag gtt-TAMRA
XPD intron 3 | 5' LC Red640- ccc tgc ccc cca act ttg ga -p; 5' gcc tcc aat gaa cac aag ctc -fluorescein
XPD-4bp | 5' LC Red640- cct ggg ttc gat caa tac tca gac a -p; 5' ctc gct atc ttg ctc aag ctg atc tcg aac -fluorescein
XPD-81bp | 5' agt cac agc tca ctg cag cct c -fluorescein; 5' LC Red640- acc tct tgg gct caa gcg atc ctc -p (SEQ ID NO.: 48)
XPD-5'2 | 5' LC Red640- aaa aaa aga ctt atc atg aca gga tgt ct -p (SEQ ID NO.: 49); 5' gca aga ctc cgt ccc aga aaa aga aaa -fluorescein
XPD-5'3 | 5' LC Red640- tcc tct ctc tcc ccc agc tca ttt tg -p; 5' aac cca ccc tac tgc tct gat ctc -fluorescein
RAI-3'7 | 5' LC Red640- agg ctg gtc ttg aac tcc tgg gct taa g -p; 5' ggt tcc gcc acg ttg cc -fluorescein
RAI-3'3 | 5' LC Red640- tcg gct att ttt ttt ttt att ttt tta tt -p; 5' att aca ggc acc cac cac cat g -fluorescein
RAI-3'4 | 5' LC Red640- cct gga caa cat agg gag acc ctg tgt -p; 5' caa aca aac aaa aac ctc tgc ca -fluorescein
RAI exon 6 | 5' 6-FAM-tgc ctt cac aca gct ctg gtt taa tg -TAMRA; 5' VIC-tgc ctt cac aca gca ctg gtt taa tg -TAMRA
RAI intron 5-3 | 5' FAM-tgg tgg tgc atg cct gta atc cc-BHQ; 5' Yakima Yellow-tgg tgg tgc atg ccc gta atc-BHQ
RAI intron 5-2 | 5' acc atg ttg gcc agg ctg gtt tt -fluorescein; 5' LC Red640- atc tac tga cct caa atg atc cac ct -p
RAI intron 3 | 5' LC Red640- tgc aat ccg ccc gcc -p; 5' cca ggc tgg ttt gga aat cct gag ctc -fluorescein
RAI intron 1 | 5' LC Red640- ctg aga tcg cac cac tgc ac -p; 5' ggg agg cgg agc ttg cag tga -fluorescein
RAI intron 1-2 | 5' gcg cat gcc tgt aat tct gta -fluorescein; 5' LC Red640- cag gac gag cca cag aca aaa ctc c -p
RAI intron 1-3 | 5' LC Red640- tgc aat gag gct cct ggc c -p; 5' act aca ttt ccc agc atc cca -fluorescein
RAI-5'2 | 5' cct ccc tcc ctc cct gc -fluorescein; 5' LC Red640- tgc ttg ctt tct ctc tct -p
RAI-5'3 | 5' tcc ctg ctt gct tgc ttt ctc t -fluorescein; 5' LC Red640- tct ctc ttt ctt tct ttc ttt c -p
RAI-5' | 5' tgt tca tcc aaa tga gcc gc -fluorescein; 5' LC Red640- agc ctg aac agg ttc tgt tcc ttc gac tt -p
ASE1-5'2 | 5' LC Red640- caa gct gct atc tcg acc gat ctt -p; 5' ggg tga cca ccc tgc cag cc -fluorescein
ASE1 exon 1 | 5' LC Red640- cgg gct aca ggg tta cct gag -p; 5' tct gca acc tgg tgc gag cag c -fluorescein
ERCC1-3'2 | 5' FAM-aaa ggg aaa gaa acc t-MGB; 5' VIC-cca aag gga cag aaa-MGB
ERCC1-3' | 5' LC Red640- tga ggc tcc gct cct tct gg -p; 5' agc tgc cag agc tgc ctg ggc -fluorescein
ERCC1-3'3 | 5' VIC-aca gca aaa tgc cac agt-MGB; 5' FAM-aca gca aga tgc cac ag-MGB
ERCC1 exon 4 | 5' cgc aac gtg ccc tgg gaa t -fluorescein; 5' LC Red640- tgg cga cgt aat tcc cga cta tgt gct g -p

Table 13. PCR regimens used

Trivial name | Tech | Buffer | Buffer modification | Denaturation | Annealing | Elongation
XPD exon 23 | T | M | - | 15 sec 94 °C | 60 sec 60 °C |
XPD exon 10 | L | R | 5% DMSO | 10 sec 95 °C | 15 sec 53 °C | 30 sec 72 °C
XPD exon 6 | T | M | - | 15 sec 94 °C | 60 sec 63 °C |
XPD intron 3 | L | H | - | 2 sec 95 °C | 15 sec 55 °C | 30 sec 72 °C
XPD-4bp | L | H | 2x Buffer | 10 sec 95 °C | 15 sec 60 °C | 30 sec 72 °C
XPD-81bp | L | H | 5% DMSO | 10 sec 95 °C | 15 sec 66 °C | 30 sec 72 °C
XPD-5'2 | L | H | 2x dNTP + 2x Buffer | 10 sec 95 °C | 15 sec 64 °C | 30 sec 72 °C
XPD-5'3 | L | H | - | 2 sec 95 °C | 15 sec 71 °C | 30 sec 72 °C
RAI-3'7 | L | H | - | 2 sec 95 °C | 15 sec 67 °C | 30 sec 72 °C
RAI-3'3 | L | H | 2x dNTP + 2x Buffer | 2 sec 95 °C | 15 sec 70 °C | 30 sec 72 °C
RAI-3'4 | L | H | - | 2 sec 95 °C | 15 sec 67 °C | 30 sec 72 °C
RAI exon 6 | T | M | - | 15 sec 94 °C | 60 sec 60 °C |
RAI intron 5-3 | T | M | - | 15 sec 94 °C | 60 sec 63 °C |
RAI intron 5-2 | L | H | - | 2 sec 95 °C | 15 sec 60 °C | 40 sec 72 °C
RAI intron 3 | L | H | 2x dNTP | 2 sec 95 °C | 15 sec 63 °C | 30 sec 72 °C
RAI intron 1 | L | H | 2x Buffer | 0 sec 95 °C | 10 sec 57 °C | 15 sec 72 °C
RAI intron 1-2 | L | H | - | 2 sec 95 °C | 15 sec 66 °C | 30 sec 72 °C
RAI intron 1-3 | L | H | - | 2 sec 95 °C | 15 sec 65 °C | 30 sec 72 °C
RAI-5'2 | L | R | - | 2 sec 95 °C | 15 sec 61 °C | 30 sec 72 °C
RAI-5'3 | L | R | - | 2 sec 95 °C | 15 sec 61 °C | 30 sec 72 °C
RAI-5' | L | H | 1M Betain | 2 sec 95 °C | 15 sec 63 °C | 30 sec 72 °C
ASE1-5'2 | L | H | 5% DMSO + VAx Primer | 2 sec 95 °C | 15 sec 63 °C | 30 sec 72 °C
ASE1 exon 1 | L | H | - | 2 sec 95 °C | 15 sec 60 °C | 30 sec 72 °C
ERCC1-3'2 | T | M | | 15 sec 94 °C | 60 sec 60 °C |
ERCC1-3' | L | H | - | 2 sec 95 °C | 15 sec 62 °C | 30 sec 72 °C
ERCC1-3'3 | T | M | | 15 sec 94 °C | 60 sec 62 °C |
ERCC1 exon 4 | L | H | 5% DMSO | 0 sec 95 °C | 15 sec 57 °C | 25 sec 72 °C
RAI-3'3 nested 1 | A | H | - | 30 sec 94 °C | 30 sec 64 °C | 30 sec 72 °C
RAI-3'3 nested 2 | A | H | - | 30 sec 94 °C | 30 sec 62 °C | 30 sec 72 °C

Where R is Roche buffer, H is homebrew buffer and M is mastermix, respectively, and where T denotes the Taqman technique, L is Lightcycler and A is ABI2720 sequencing, respectively.

R is Roche buffer:
10 pmole primers
1 pmole probes
1.3 nmole dNTP
1 μl Hybridization Mix for Lightcycler (Roche)
1 μl MgCl2 for Lightcycler (Roche)
1 μl DNA
water to 10 μl

H is Homebrew:
10 pmole primers
1 pmole probes
1.3 nmole dNTP
0.25 μl Titanium Taq (BD Biosciences, Palo Alto, CA)
1 μl Titanium Taq Buffer (BD Biosciences, Palo Alto, CA)
1 μl DNA
water to 10 μl

Determination of polymorphisms by real-time PCR using Taqman probes. Polymorphisms analyzed on Taqman were scored on the basis of the relative reaction of the 2 probes. Some polymorphisms were analysed using the ABI Prism 7700 sequence detection system (Applied Biosystems, Foster City, CA, USA). PCR primers and Taqman probes were designed using Primer Express v 1.0 (Applied Biosystems). The reactions were performed in MicroAmp optical tubes sealed with MicroAmp optical caps (Applied Biosystems) containing a 10 μl reaction volume (Mastermix): 1x Taqman buffer A, 2.5 mM MgCl2, 200 μM each of dATP, dCTP and dGTP, 400 μM dUTP, 800 nM each primer, 200 nM each probe, 0.01 U/μl AmpErase UNG and 0.025 U/μl AmpliTaq Gold Polymerase. Tubes were incubated at 50 °C for 2 min followed by 10 min at 95 °C. The incubation was followed by 45 cycles with thermal cycler conditions as shown in Table 13.

Determination of polymorphisms by Lightcycler. Genotypes of the American persons and the Danish persons for polymorphisms in XPD exon 10, XPD-4bp, XPD-81bp, XPD-5'2, RAI-3'3, RAI-3'7, RAI intron 3, RAI intron 1, RAI intron 1-2, RAI-5', RAI-5'2, RAI-5'3, ASE1-5'2, ASE1 exon 1, ERCC1-3' and ERCC1 exon 4 were detected using LightCycler™ (Roche Molecular Biochemicals, Mannheim, Germany) and scored on the basis of the temperature dependency ("melting profile") of the fluorescence. For XPD exon 10, PCR was performed by rapid cycling in a reaction volume of 20 μl with 0.5 μM of each primer, 0.045 μM of anchor and sensor probe, 3.5 mM MgCl2, approximately 7-25 ng genomic DNA, and 2 μl LightCycler DNA Master Hybridization probe buffer (Roche Molecular Biochemicals, Cat. No 2158 825). This buffer contains Taq DNA polymerase, dNTP mix, and 10 mM MgCl2 and 5% DMSO. The temperature cycling consisted of denaturation at 95 °C for 2 sec, followed by 46 cycles consisting of 10 sec at 95 °C, 15 sec at 53 °C, and 30 sec at 72 °C; see table 4a for the PCR regimen for the polymorphisms using LightCycler™. For detection of the polymorphisms XPD-4bp, XPD-81bp, XPD-5'2, RAI-3'3, RAI intron 3, RAI intron 1, RAI intron 1-2, RAI-5', ASE1-5'2, ASE1 exon 1, ERCC1-3' and ERCC1 exon 4 using LightCycler™, PCR was performed using Homebrew buffer, the composition of which is listed in table 13a. XPD exon 10, RAI-5'2 and RAI-5'3 were run in Roche buffer, the composition of which is also listed in table 13. Buffer modifications for the individual assays are shown in table 13. The temperature cycling parameters are shown for each of the reactions in table 13. The last annealing period at 72 °C was extended to 120 sec, followed by denaturation for 10 sec at 95 °C. The melting profile was determined by a temperature ramp from 50 °C to 95 °C at a rate of 0.1 degree/sec.

EXAMPLE 1 Pair-wise linkage disequilibrium of all markers of controls

To get an initial overview of the many markers, the pair-wise linkage disequilibrium of all markers in the controls was calculated, see Fig. 2. It is clear that the region consists of 2 major haplotype blocks with a few interspersed markers in between. One haplotype block spans roughly 16 markers and 26 kb from XPD exon 6 to RAI intron 1-3, while the other haplotype block spans 6 markers and 9 kb from RAI-5' to ERCC1-3'3.
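For readers unfamiliar with pair-wise linkage disequilibrium, the sketch below computes the standard measures D, D' and r² for two biallelic markers from a haplotype frequency and the two allele frequencies. The text does not state which measure underlies Fig. 2, so the choice of measures, the function name and the example frequencies are assumptions for illustration only.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and B at marker 2
    p_a, p_b: frequencies of allele A and allele B (both markers assumed polymorphic)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared

# Hypothetical frequencies: haplotype AB at 0.40, alleles A and B at 0.50 each.
d, d_prime, r_squared = pairwise_ld(0.40, 0.50, 0.50)
```

With these example values D is 0.15, D' is 0.6 and r² is 0.36. Markers within one haplotype block would be expected to show high values for most pairs.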

EXAMPLE 2 Relative risk of breast cancer

Blood samples were collected and frozen from a large number of Danish postmenopausal women who had suffered from breast cancer. An age limit of 55 years was used to separate early from late cases. The cut-off was forced by a previous decision to use 50 years of age as an entrance criterion for inclusion in the cohort. The salient features of this cohort are shown in Table 14. DNAs were purified from the blood samples of these persons and the polymorphisms shown in Table 10 were typed.

Table 14 Features of the investigated breast cancer cases and controls.

Breast cancer

Background population Danes

Recruitment Population based

Epidemiological design Cohort

No. of cases 428

No. of controls 433

Age at inclusion 50-64 yrs of age

Cutoff for young cases 55 yrs of age

No. of young cases 62

The relative risk of breast cancer, together with its confidence intervals, was calculated for the two homozygous forms of all the markers. This relatively simple analysis has the advantage that no a priori assumptions about the distribution of genotypes are needed. Fig. 9 depicts the relative risks and the lower confidence limits of the relative risks for postmenopausal breast cancer below age 55 as a function of the location of the markers. The values were initially calculated with the wild-type allele as reference; however, if the calculation gave an RR value of less than 1, we used the reciprocal value of the RR and the reciprocal value of the upper confidence limit instead. It is obvious that a significant association of markers with cancer exists in the region from 13,000 to 39,000 bases, corresponding to the first mentioned haplotype block and covering most of the gene RAI and the 5' region of the gene XPD. The corresponding curves for the other age brackets did not show significant regions of association (results not shown).
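The reciprocal convention described above can be made concrete with a short sketch. It is not the SAS procedure used in the study: the odds ratio from a 2x2 table is used here as a stand-in estimate of the relative risk, the interval is the common log-based (Woolf) approximation, and the counts are invented for illustration.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a, b: cases and controls homozygous for the variant allele
    c, d: cases and controls homozygous for the wild-type allele (reference)
    """
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper

def fold(ratio, lower, upper):
    """If the ratio is below 1, report its reciprocal and the reciprocal of the upper limit."""
    if ratio < 1:
        return 1 / ratio, 1 / upper
    return ratio, lower

# Hypothetical counts: 10 cases and 30 controls homozygous for the variant,
# 40 cases and 30 controls homozygous for the wild type.
ratio, lower, upper = odds_ratio_ci(10, 30, 40, 30)
folded_ratio, folded_lower = fold(ratio, lower, upper)
```

Here the ratio of 0.25 is reported as 4.0, and the folded lower limit equals the lower limit that would be obtained by taking the variant homozygotes as the reference instead. Folding therefore changes the choice of reference allele but not the strength of the evidence.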

EXAMPLE 3

Distribution of p-values for sets of markers in relation to breast cancer

As previous experience (Vogel et al. In press; Nexø et al. In press) had indicated to us that combined analyses involving multiple markers were superior to analysis of individual markers, we used the analytical technique known as haplotype trend regression (Zaykin et al. 2002), more specifically the program HelixTree, to associate sets of markers with the individual diseases. Haplotype trend regression is basically a two-stage procedure. First, genotype results in combination with population assumptions such as Hardy-Weinberg equilibrium are used to construct all haplotype probabilities corresponding to a given set of markers for each individual. Secondly, the disease state (1 for cases, 0 for controls) is regressed on the haplotype probabilities of all individuals, resulting in a p-value for the overall association of the set of markers with disease, and parameters for the association of each specific haplotype with disease. The HelixTree program uses sets of markers of defined size, typically 2 to 4 neighboring markers at a time, to scan the entire region. It can then be used to derive frequencies of the individual haplotypes, to calculate an overall p-value for the distribution of haplotypes covering a given set of markers among cases and controls, and to calculate p-values for the distribution of each haplotype derived from a given set of markers.

The entire sets of controls were used for the analyses. This broke the matching of the data sets, but matching on other criteria than ethnicity and age may well be irrelevant for genetic association studies (Hemminki & Forsti 2001) (21).

Fig. 4 shows the overall distribution of p-values for sets of markers plotted against the position on the chromosome for breast cancer. As position on the abscissa we used the median marker position in a given set of markers. The ordinate values are the negative logarithms of the overall p-values for a difference between cases and controls associated with a given set of markers. Each curve corresponds to a given size of marker sets, i.e. the number of markers in the haplotypes.
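The axes of such a plot can be sketched as follows: windows of neighboring markers slide along the region, each window is placed at the median of its marker positions, and the p-value is transformed to a negative logarithm. The base of the logarithm is not stated in the text, so base 10 is assumed here; the function names are illustrative, and the positions are the relative positions of the first five named markers in Table 10.

```python
import math
from statistics import median

def window_positions(marker_positions, size):
    """Median position of every run of `size` neighboring markers."""
    return [median(marker_positions[i:i + size])
            for i in range(len(marker_positions) - size + 1)]

def neg_log10(p_value):
    """Ordinate value for a p-value (base 10 is an assumption)."""
    return -math.log10(p_value)

# Relative positions of XPD exon 23, exon 10, exon 6, intron 3 and XPD-4bp (Table 10).
positions = [0, 12340, 13390, 17117, 19208]

pairs = window_positions(positions, 2)
triples = window_positions(positions, 3)
```

Each curve in a plot of this kind would correspond to one window size, with one point per window.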

The results in Fig. 4 suggest that the association with breast cancer has two peaks. One was located at roughly 24 kb and one at 39 kb. When the cases were broken down into early and late breast cancer the peak at 24 kb was present in both groups, while the peak at 39 kb was only present in the older group. The peaks were clearly present in curves for haplotypes of 2, 3 and 4 neighboring markers. Larger sets were less informative, presumably due to increased degrees of freedom, but they essentially corroborated the results (results not shown).

Remarkably, this association at 24 kb was present for breast cancer both in young and in older female cases. This is the first time an association to RAI has been reported for cancers in older populations. The position of the optimal set of markers corresponds to the 3' part of the gene RAI and the inter-gene region between RAI and XPD and was identical for young and older cancer cases. The peak at 39 kb, which is only present in the older breast cancers, corresponds to the 5' part of RAI.

EXAMPLE 4 Frequency of individual haplotypes made up from 2 neighbouring markers in relation to breast cancer.

To better understand the nature of the association, we tabulated the frequencies of individual haplotypes made up of 2 neighboring markers centring on the position at 24 kb with maximal association, in various groups of cases and controls (RAI-3'3 RAI-3'4; Table 15). We also tabulated p-values associated with the individual haplotypes. The same haplotypes differed when young and older breast cancers were compared to controls (RAI-3'3 S RAI-3'4 C down; RAI-3'3 1 RAI-3'4 1 up). A similarity between patterns in the different cancer forms was not obvious; however, the frequencies of the haplotypes in the control groups were similar.

Similarly, we calculated haplotypes for the set of 2 markers, located at 39 kb. Again the deviation of young and old breast cancer cases from the controls had similarity (RAI intron1-2 9 RAI intron1-3 a up; RAI intron1-2 9 RAI intron1-3 c down). The same pattern was present in the lung cancer data. This is remarkable, as neither the young breast cancers nor the lung cancers showed a peak in this position, and suggests an underlying similarity of haplotypes in the different groups.

Table 15. Frequency of 2-haplotypes of cancer cases and controls and their association with cancer

1) p-values in parentheses relative to breast cancer controls, all ages

2) p-values in parentheses relative to skin cancer controls, all ages

3) p-value could not be calculated

The assignment of frequencies to individual haplotypes also made it possible to derive a measure of association which was stochastically independent of group sizes. Under the assumption that a putative causative variant must be at least equally as associated with disease as any surrogate haplotype we calculated the odds ratios for each haplotype of each set of two neighbouring markers versus the three other haplotypes of the set and chose the maximum value. This value we then plotted against the median position of the set (Fig. 5). The resulting curves indicate a maximal association of the set RAI-3'3 RAI-3'4 at 25 kb with a value of approximately 4.4 for the young women, with rapid fall-off at either side. The curve for older women as expected had two maxima, one at the same place as the young women with a maximum of 7.3, and one associated with the set RAI intron1-2 RAI intron1-3 with a maximum of the same height.
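The measure described above, the largest odds ratio of any one haplotype versus the other haplotypes of its set, can be sketched from haplotype frequencies alone. The frequencies and haplotype labels below are invented for illustration and are not taken from Table 15; the function name is hypothetical.

```python
def max_haplotype_odds_ratio(case_freqs, control_freqs):
    """Return (haplotype, odds ratio) with the largest odds ratio versus all others.

    Each odds ratio compares one haplotype against the remaining haplotypes
    of the same marker set, in cases relative to controls.
    """
    best = None
    for hap, p in case_freqs.items():
        q = control_freqs[hap]
        if not (0 < p < 1 and 0 < q < 1):
            continue  # odds ratio undefined when a frequency is 0 or 1
        ratio = (p / (1 - p)) / (q / (1 - q))
        if best is None or ratio > best[1]:
            best = (hap, ratio)
    return best

# Hypothetical frequencies of the four haplotypes of a two-marker set.
cases = {"h1": 0.50, "h2": 0.20, "h3": 0.20, "h4": 0.10}
controls = {"h1": 0.20, "h2": 0.30, "h3": 0.30, "h4": 0.20}

best_haplotype, best_ratio = max_haplotype_odds_ratio(cases, controls)
```

In this example haplotype h1 gives the maximum, an odds ratio of approximately 4.0. Because the calculation uses frequencies rather than counts, the result does not depend on the sizes of the case and control groups.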

To make sure that the derived haplotype frequencies did not reflect a peculiarity of the particular program used, we also calculated the frequencies of individual haplotypes in cases from breast cancer and basal cell carcinoma combined and in the corresponding controls, using 2 other computer programs, Arlequin and Phase. Arlequin (18) like HelixTree is a maximum likelihood algorithm, while Phase in addition includes a penalty for each new haplotype that is brought into play. Furthermore, the program Arlequin includes missing values in its table of frequencies. To compensate for this, we normalised the values corresponding to fully defined haplotypes before including them in the analysis. It was evident that results from all 3 programs were similar. In particular, HTR and Arlequin produced essentially identical results (0 to 2 percent difference, except when the values were essentially zero). The results obtained with Phase occasionally diverged 5 to 10 percent of the value, presumably a reflection of the different optimization criterion, but the change appeared to affect both cases and controls. Therefore, we are confident that our biological conclusions are independent of which computer program was used (results not shown).

We also assigned long haplotypes to all individuals of the breast cancer cases and controls using the program Phase (Stephens et al. 2001). To avoid overloading this program and to remove SNPs of little importance we reduced the set to 22 SNPs, by eliminating those with lowest heterogeneity. The resulting haplotypes were used as input to a program implementing Lazzeroni's technique (Lazzeroni 1998) (16) for locating gene variation (RASCAL, BA Nexø, unpublished). We used a bootstrap set of 10000 sets of haplotypes selected from the original set with replacement, and to avoid a Q-form with negative values we used a relaxation value of 0.25. With these values we found the most likely location to be at 20 kb, very close to the marker XPD_81 bp, for the early breast cancers (Fig. 6). Changing the relaxation factor to 0.5 moved the most likely location to 23 kb. A 95% confidence interval for the location of the causative gene variant was derived from the quadratic form for the residual variance. Positions where it took on a value less than 3.85 above the minimum were considered to lie inside the confidence interval. The curve suggested that the CI stretches from approximately 10 kb to approximately 57 kb. Using the same parameters, we found an almost identical position for the gene variant among the late breast cancers; however, in this case no part of the region could be statistically excluded as a possible location for the variant.
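The rule used to read a confidence interval off the residual-variance curve can be illustrated with a short sketch. It is not the unpublished RASCAL program: the function name and the curve values are hypothetical, and the sketch assumes that the curve has been evaluated on a grid of positions and that the positions inside the interval form one contiguous stretch.

```python
def confidence_region(positions_kb, q_values, threshold=3.85):
    """Return (best position, lower bound, upper bound) from a residual-variance curve.

    A position is inside the interval when its value lies less than
    `threshold` above the minimum of the curve.
    """
    q_min = min(q_values)
    best = positions_kb[q_values.index(q_min)]
    inside = [p for p, q in zip(positions_kb, q_values) if q < q_min + threshold]
    return best, min(inside), max(inside)


# Hypothetical curve evaluated every 10 kb (illustrative values only).
positions = [0, 10, 20, 30, 40, 50, 60, 70]
q = [9.0, 4.5, 1.0, 1.8, 3.0, 4.2, 5.5, 8.0]
best, lower, upper = confidence_region(positions, q)
print(best, lower, upper)
```

A flat curve, for which every grid point stays below the threshold, would return the whole region, which corresponds to the situation described for the late breast cancers.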

EXAMPLE 5

Distribution of p-values for sets of markers in relation to early basal cell carcinoma

DNA was isolated from lymphocytes from humans from the American cohort of patients with basal cell carcinoma and controls, described in Materials and Methods, and was typed with respect to a number of sequence polymorphisms located in and around the claimed region r. The salient features of the American cohort are shown in Table 16.

Table 16. Features of the investigated basal cell carcinoma cases and controls.

Basal cell carcinoma

Background population Caucasian Americans

Recruitment Hospital based

Epidemiological design Case-Control

No. of cases 71

No. of controls 118

Age at inclusion 29 - 60

Cutoff for young cases 50 yrs of age

No. of young cases 45

Figure 7 shows the same kind of result for early basal cell carcinoma. Again we found a strong association with the region around RAI, with the curve for sets of 3 markers possibly shifted slightly to the right, and the curve for 4 markers showing suggestions of an extra association at higher positions.

EXAMPLE 6

Distribution of p-values for sets of markers in relation to lung cancer

DNA from humans from the Danish cohort of patients with lung cancer and controls was typed with respect to a number of sequence polymorphisms located in and around the claimed region r.

Evidence exists that a separate effector is associated with the marker XPD exon 23 which influences the lung cancer risk and thus may interfere with accurate mapping.

Therefore the Danish men and women who had suffered from lung cancer were stratified according to the value of the marker XPD exon 23. The salient features of the Danish cohort are shown in Table 17.

Table 17. Features of the investigated lung cancer cases and controls.

                         Lung cancer          Lung cancer w/ XPD exon23 AA
Background population    Danes                Danes
Recruitment              Population based     Population based
Epidemiological design   Cohort               Subcohort
No. of cases             249                  83
No. of controls          259                  115
Age at inclusion         50 - 64 yrs of age   50 - 64 yrs of age
Cutoff for young cases   56 yrs of age        56 yrs of age
No. of young cases       33                   9

To minimize the interference of this second effector, we stratified the young population for XPD exon 23, and calculated overall p-values for each value of XPD exon 23. Fig. 7 shows that this disease is also associated with sets of markers spanning the distal end of RAI and the inter-gene region in the population with the XPD exon 23 AA genotype. This is the group of patients free of influence of the XPD gene. No association was evident in the two other groups (results not shown).

EXAMPLE 7

The results obtained in Examples 1 and 3 to 6 rely on the use of the HelixTree program. In an attempt to control for this dependency, the odds ratios for a test of the two respective homozygotes of each individual marker against cancer status were plotted against the marker position on chromosome 19, see Fig. 8. Where one of the four homozygotic groups was empty, heterozygotes were included in the analysis. This relatively simple analysis has the advantage that no assumptions about the phase of different markers are necessary.

The data for early cancers show a sharp maximum in the marker RAI intron1, and breast cancer in addition a minor peak in XPD_4bp. Among the late cancers there was no striking association with disease, but mapping with single SNPs is also not expected to be particularly sensitive (results not shown).

EXAMPLE 8

The ASE-1 genotype influences relapse-free survival and survival in Multiple Myeloma

391 patients diagnosed with multiple myeloma were found eligible for transplantation in Denmark in the period 1993-2004.

304 patients were genotyped for the polymorphisms ERCC1 exon4, RAI intron1, ASE-1 e1, XPD exon23, XPD exon10. Clinical parameters were available for 360 patients.

The multiple myeloma patients were treated with high dose alkylating chemotherapy followed by autologous bone marrow transplantation. The overall survival was significantly shorter for patients who experienced relapse than for patients who did not.

ASE-1 genotype was found to strongly influence the event-free survival (EFS) and overall survival in patients with multiple myeloma who were auto-transplanted. Thus, homozygous carriers of the wild-type allele had a mean EFS of 1160 days, whereas carriers of the variant allele had a mean EFS of 1718 days (p=0.0018). Variant allele carriers thus had a 1.5 year longer event-free survival, see Fig. 9. The EFS was similar for men and women, although the difference was only statistically significant for women (Table 18). This means that the ASE-1 genotype can predict who will benefit most from the present treatment regimen.

The overall survival was also found to be influenced by the ASE-1 genotype. Thus, homozygous carriers of the wild-type allele had an overall survival of 2117 days whereas carriers of the variant allele lived 2727 days, which is 1.67 years longer (p=0.029), see Fig. 10. There was no gender effect but the difference was only statistically significant for women.

This means that the ASE-1 genotype can predict who will benefit the most from the present treatment regimen.
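The gains in survival quoted above in years follow from the differences in mean survival in days. A short arithmetic check is given below; the conversion factor of 365.25 days per year and the function name are assumptions, since the text does not state how the conversion was made.

```python
DAYS_PER_YEAR = 365.25  # assumption: the conversion factor is not stated in the text


def gain_in_years(variant_days, wild_type_days):
    """Difference in mean survival between the two genotype groups, in years."""
    return (variant_days - wild_type_days) / DAYS_PER_YEAR


efs_gain = gain_in_years(1718, 1160)  # event-free survival, variant vs. wild-type
os_gain = gain_in_years(2727, 2117)   # overall survival, variant vs. wild-type
print(round(efs_gain, 2), round(os_gain, 2))  # prints: 1.53 1.67
```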

Table 18

Event-free survival and overall survival for patients with different ASE-1 genotypes

Group    Median event-free survival, EFS, days (5-95% CI)   p        Median overall survival, days (5-95% CI)   p
GG       736 (634, 838)                                     0.0018   1902 (1613, 2191)                          0.0293
AA+AG    1397 (901, 1893)                                            3015 (2093, 3937)
Men
GG       736 (616, 856)                                     0.106    2030 (1741, 2404)                          0.2297
AA+AG    963 (693, 1233)                                             2747 (2377, 3077)
Women
GG       714 (426, 1002)                                    0.0075   1897 (1384, 2410)                          0.0368
AA+AG    1479 (1019, 2939)                                           3015 (2261, 3764)

As Fig. 11 illustrates, female carriers of the variant allele of the ASE-1 polymorphism had a longer relapse-free period than homozygous carriers of the wild-type allele (median 1479 days vs. 714 days), while Fig. 12 illustrates that they lived longer (median 3015 vs. 1897 days). Similar analyses for the men revealed the same tendencies but no statistically significant differences (median 963 vs. 736 days for the relapse-free period; median 2747 vs. 2030 days for the survival), see Fig. 13 and Fig. 14, respectively.

EXAMPLE 9

Survival from time of diagnosis and genotypes of relevant genes were determined for 432 lung cancer patients who had been enrolled in the prospective study 'Diet, Cancer and Health'.

High-risk haplotype carriers (defined as ERCC1 exon 4 AA, ASE1 exon1 GG, RAI Intron 1 AA) had a mean survival of 359 days compared to a mean survival of 244 days for non-carriers (Table 19 and Fig. 15). The genotype XPD K751Q was also determined.

Table 19.

Genotype            N     Survival (days)   P a)
High Risk carrier
  Yes               105   359.3 +/- 37.1
  No                319   244.5 +/- 17.5    0.002
  Missing
XPD K751Q
  AA                146   330.8 +/- 33.8    0.053
  AC                218   248.9 +/- 20.7
  CC                64    255.8 +/- 31.6
  Missing           4

a) p for Cox regression

When the two genotypes were combined, there were significant differences in survival between carriers of different genotype combinations, see Table 20. Thus, homozygous carriers of both favorable genotypes had a mean survival of 415 days, whereas carriers of an unfavorable genotype had a lower mean survival time, see Fig. 16 and Fig. 17, respectively.

Table 20.

High-risk carrier   XPD K751Q   N     Survival (days)   P
No                  AA          101   296 +/- 37        0.001 a)
No                  AC          164   222 +/- 24
No                  CC          51    217 +/- 29
Yes                 AA          44    415 +/- 74
Yes                 AC          51    318 +/- 42
Yes                 CC          10    346 +/- 83
Missing                         11

a) The p-value is for a Cox regression. P for high-risk is 0.004 and for XPD K751Q p=0.035.

EXAMPLE 10

Odds ratios for alleles in young cases vs controls

In analogy with the methods employed and results obtained in Examples 1-9, the alleles of the markers listed below in Table 20 have been found to have the indicated odds ratios for breast cancer in young cases versus controls when determined by sequencing of the region in approximately 10 cases and 10 controls. Furthermore, to make sure that an odds ratio could be calculated in all instances (i.e. that the divisor was never 0), we added a value of 0.5 to each of the four allele counts.

Table 20

TTCCAGCCTGGGCAAGAA[C/G]AGTGAAACTCCAGCTT 1 ,5 18
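The correction of 0.5 added to each of the four allele counts can be written out as a short sketch. The function name and the example counts are hypothetical and are not taken from the sequencing data; the sketch only shows why the divisor can never become 0.

```python
def corrected_odds_ratio(case_allele, case_other, ctrl_allele, ctrl_other, correction=0.5):
    """Odds ratio for one allele after adding `correction` to each of the four counts."""
    a = case_allele + correction
    b = case_other + correction
    c = ctrl_allele + correction
    d = ctrl_other + correction
    return (a * d) / (b * c)


# Hypothetical counts: 10 cases and 10 controls give 20 chromosomes per group.
# The allele is absent among the controls, so an uncorrected odds ratio is undefined.
or_value = corrected_odds_ratio(case_allele=3, case_other=17, ctrl_allele=0, ctrl_other=20)
print(round(or_value, 2))
```

With only about 10 cases and 10 controls, odds ratios obtained in this way are strongly influenced by single observations, which is consistent with their use here as an indication only.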

EXAMPLE 11.

Superior association of the marker RAI3'd1 with breast cancer.

To investigate the association of the length polymorphism observed at position 18147012 in contig NT_011109, here called RAI3'd1, with breast cancer, we developed a PCR reaction using one fluorescent primer and one non-fluorescent primer. The primers were:

RAI3'd1-f1: 5'-AAA AAA ATA GCC GAG CAT GG

RAI3'd1-r1: 5'-6FAM-TT TGG ACT GGG TAA GAA TTT CC

The PCR reaction contained

DNA (ca 20 ng/μl)            2.5 μl
Primers (100 μM)             0.075 μl each
dNTP (25 mM)                 0.1 μl
Titanium taq polymerase *)   0.065 μl
Titanium taq buffer *)       1.25 μl
Water                        8.45 μl

*) BD Biosciences, Brøndby, Denmark
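As a plausibility check of the reaction mixture, the total volume and the resulting final concentrations can be computed from the listed volumes. The sketch assumes that "each" refers to the two primers f1 and r1 and that the stock concentrations are as printed; the variable and function names are illustrative only.

```python
# Volumes per reaction in microlitres, as listed above; two primers are added.
components_ul = {
    "DNA": 2.5,
    "primer f1": 0.075,
    "primer r1": 0.075,
    "dNTP": 0.1,
    "Titanium taq polymerase": 0.065,
    "Titanium taq buffer": 1.25,
    "water": 8.45,
}
total_ul = sum(components_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Concentration after dilution into the reaction, in the unit of `stock`."""
    return stock * volume_ul / total_volume_ul


primer_um = final_concentration(100.0, components_ul["primer f1"], total_ul)  # stock in uM
dntp_mm = final_concentration(25.0, components_ul["dNTP"], total_ul)          # stock in mM
print(round(total_ul, 3), round(primer_um, 2), round(dntp_mm, 2))
```

Under these assumptions the reaction volume is about 12.5 μl, with each primer at roughly 0.6 μM and dNTP at roughly 0.2 mM.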

The PCR regimen consisted of

Denaturation at 94°C for 5 min, followed by 30 cycles consisting of a denaturation step (at 94°C for 30 sec), an annealing step (at 62°C for 30 sec) and an elongation step (at 72°C for 36 sec). A polishing step was included in the reaction (at 72°C for 6 min).

After the PCR, 1 μl of each reaction was diluted 1:20 in water, and 1 μl of the dilution was mixed with 12.5 μl formamide and 0.5 μl of fluorescent molecular length standards (Genescan 500, ROX, Applied Biosystems, Naerum, Denmark). The mixtures were denatured for 2 min at 94°C and put on ice. The resulting mixtures were analyzed on an ABI3730 unit (Applied Biosystems) with registration of both fluorophores.

DNAs from peripheral leukocytes from 434 breast cancer patients and a similar number of healthy controls, all part of the prospective study "Diet, Cancer and Health" (see previous examples), were analyzed. The following sizes of PCR fragments were found:

Length   Number of chromosomes
230      1
238      993
239      7
259      2
263      1
264      409
269      49
274      28
279      2

We consider the difference between PCR fragments of length 238 and 239 nonsignificant. These are probably the same molecular entities, and were treated as such in the further analyses. The same goes for the fragments of length 263 and 264.

For sequencing, the same PCR reactions were performed with non-fluorescent primers on select DNAs, preferentially from homozygotes. The primers RAI3'd1-f2: 5'- CCA -AGA TCG TGC CACRGT and RAI3'd1-r2: 5'- TTT GGA CTG GGT AAG AAT TTC C were used for priming a big-dye dideoxynucleotide sequencing reaction, which was then analysed on an ABI 3130.

Examples of sequences observed, wherein the repeats are underlined/double underlined:

Sequence corresponding to 4 repeats:

238

GAAAATCATCTCAAAAATAATAAAATAAAATAAAATAAAATATAAACCAGTCAGTT

Sequences corresponding to 8 repeats:

259

GAAAATCCATCTCAAAAATAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATATAAACCAGTCAGTT

264

GAAAATCCATCTCAAAAATAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATATAAACCAGTCAGTT

Sequences corresponding to 9 repeats:

269

GAAAATCCATCTCAAAAATAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATATAAACCAGTCAGTT

274

GAAAATCCATCTCAAAAATAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATATAAACCAGTCAGTT

We conclude that the difference between the major forms (length 238 and 264) is an increase in the number of repeats of the sequence TAAAA (complementary sequence ATTTT). Of the minor forms length 259 also seems to fit into this system, while length 269 and 274 seem identical to length 264 in the region investigated and may have differences elsewhere in the PCR fragment.
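Counting the TAAAA repeats in a sequencing read can be done mechanically, as in the sketch below. The function name is hypothetical, and the test sequence is assembled from the flanking sequence of the length-238 fragment as read above plus four copies of the repeat unit; it is an illustration and not the procedure used to score the reads.

```python
import re


def longest_repeat_run(sequence, unit="TAAAA"):
    """Number of units in the longest uninterrupted tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Flanks as read from the length-238 fragment; the repeat block is inserted between them.
FLANK_5 = "GAAAATCATCTCAAAAATAA"
FLANK_3 = "TATAAACCAGTCAGTT"
short_allele = FLANK_5 + "TAAAA" * 4 + FLANK_3
print(longest_repeat_run(short_allele))  # prints: 4
```

Each additional repeat adds 5 bp, so a difference of 26 bp between the major forms corresponds to about five additional repeats.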

For the analysis of the association with breast cancer we categorized the fragments into short fragments (S: fragments 230 and 238) and long (L: all other sizes). We then calculated the relative risk of breast cancer as well as 95% confidence interval for homozygous SS and heterozygous SL individuals relative to the individuals with the LL genotype. This calculation was performed both on the material as a whole and after age-stratification. For each age group we finally calculated a p-value for trend in the relative risk among the three genotypes. The results are shown in table 21.
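The categorization of fragment lengths into S and L alleles and of individuals into the three genotypes can be sketched as follows. The names are hypothetical, and the sketch assumes that length 239 is pooled with 238, in line with the statement that these were treated as the same molecular entity.

```python
SHORT_LENGTHS = {230, 238, 239}  # assumption: 239 is pooled with 238, as stated in the text


def allele_class(length_bp):
    """Classify one PCR fragment as short (S) or long (L)."""
    return "S" if length_bp in SHORT_LENGTHS else "L"


def genotype(length_a, length_b):
    """Return SS, SL or LL for the two fragments observed in one individual."""
    classes = sorted(allele_class(n) for n in (length_a, length_b))
    return {"LL": "LL", "LS": "SL", "SS": "SS"}["".join(classes)]


print(genotype(238, 238), genotype(264, 239), genotype(269, 274))  # prints: SS SL LL
```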

Table 21. Association of the polymorphism RAI3'd1 with risk of breast cancer.

Age group/Genotype   N cases   N controls   RR (95% CI)         P trend
All                                                             0.0008
  LL                 38        59           1
  SL                 127       165          1.39 (0.81-2.38)
  SS                 200       150          2.44 (1.41-4.23)
<=55 yr                                                         0.01
  LL                 5         15           1
  SL                 18        25           1.46 (0.33-6.47)
  SS                 33        14           6.29 (1.49-26.61)
55-60 yr                                                        0.19
  LL                 12        17           1
  SL                 45        63           1.72 (0.63-4.73)
  SS                 61        49           2.57 (0.88-7.48)
>60 yr                                                          0.12
  LL                 21        27           1
  SL                 64        77           1.09 (0.52-2.60)
  SS                 106       87           1.73 (0.83-3.64)

It is obvious that persons homozygous SS are at greater risk of developing breast cancer.

This is apparent in the combined group as well as in the younger persons. In the two older age groups the tendency is suggestive. The relative risks in the combined group and among the younger patients are higher than we have observed with any other polymorphism. We conclude that RAI3'd1 is a superior marker for risk of breast cancer.

EXAMPLE 12

Interaction between RAI3'd1 and markers in the second haplotype block.

We have previously described that the markers investigated form two haplotype blocks, one ranging from XPD exon 23 to RAI intron1-3 and the other ranging from RAI5' to ERCC1 exon4. The marker RAI3'd1 is located in the first block. We found it of interest to look for interaction between RAI3'd1 and markers of the second haplotype block, because this might indicate a possibility for strengthening the association with breast cancer risk further.

Tables 22 to 25 show testing for interaction between RAI3'd1 and four markers located in the second haplotype block. We pooled the heterozygotes with the low risk homozygote for all markers. Using pooled groups as references tends to reduce the relative risks, but the larger group sizes improve the statistical power. Indicated in each quadrant of the tables are the number of individuals in each subgroup, the relative risk for breast cancer and the corresponding confidence intervals. The p-value for interaction between the markers and the magnitude of the interaction are indicated below each table. It is clear that the middle two markers ASE1 exon1 and ASE1 exon3-2 show a statistically significant interaction with RAI3'd1, while the outer markers RAI5' and ERCC1 exon4 show tendencies in the same direction, but do not achieve statistical significance.

Table 22. Interaction between markers RAI3'd1 and RAI5' in relation to breast cancer

RAI3'd1/RAI5'   AA                 AG+GG
LL+SL           n=279              n=107
                1                  1.24 (0.74-2.05)
SS              n=241              n=104
                2.23 (1.47-3.37)   1.39 (0.82-2.35)

P(interaction) = 0.18. Magnitude of interaction = 1.39 - 2.23 - 1.24 + 1 = -1.08

Table 23. Interaction between markers RAI3'd1 and ASE1 exon1 in relation to breast cancer

RAI3'd1/ASE1 exon1   AA+AG              GG
LL+SL                n=109              n=277
                     1                  0.70 (0.42-1.16)
SS                   n=105              n=239
                     1.08 (0.58-1.99)   1.78 (1.08-2.04)

P(interaction) = 0.02. Magnitude of interaction = 1.78 - 1.08 - 0.70 + 1 = 1

Table 24. Interaction between markers RAI3'd1 and ASE1 exon3-2 in relation to breast cancer

RAI3'd1/ASE1 exon3-2   GG                 AG+AA
LL+SL                  n=311              n=76
                       1                  1.44 (0.84-2.49)
SS                     n=272              n=77
                       2.31 (1.56-3.43)   1.22 (0.67-2.22)

P(interaction) = 0.02. Magnitude of interaction = 1.22 - 2.31 - 1.44 + 1 = -1.53

Table 25. Interaction between markers RAI3'd1 and ERCC1 exon4 in relation to breast cancer

RAI3'd1/ERCC1 exon4   GG+AG              AA
LL+SL                 n=257              n=127
                      1                  0.89 (0.54-1.46)
SS                    n=160              n=184
                      1.55 (0.98-2.45)   2.16 (1.36-3.43)

P(interaction) = 0.18. Magnitude of interaction = 2.16 - 1.55 - 0.89 + 1 = 0.72
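The magnitude of interaction reported below Tables 22 to 25 is the relative risk in the doubly exposed quadrant minus the two singly exposed relative risks plus 1, which has the same form as the quantity often called the relative excess risk due to interaction. The sketch below recomputes the four values from the relative risks in the tables; the function and variable names are illustrative.

```python
def interaction_magnitude(rr_both, rr_ss_only, rr_other_only):
    """Additive interaction: RR(both) - RR(SS only) - RR(other marker only) + 1."""
    return rr_both - rr_ss_only - rr_other_only + 1


# (RR for SS combined with the second marker, RR for SS alone, RR for the second marker alone),
# using the arrangement of terms printed below each table.
relative_risks = {
    "Table 22 (RAI5')": (1.39, 2.23, 1.24),
    "Table 23 (ASE1 exon1)": (1.78, 1.08, 0.70),
    "Table 24 (ASE1 exon3-2)": (1.22, 2.31, 1.44),
    "Table 25 (ERCC1 exon4)": (2.16, 1.55, 0.89),
}
magnitudes = {name: round(interaction_magnitude(*rr), 2) for name, rr in relative_risks.items()}
for name, value in magnitudes.items():
    print(name, value)
```

A value of 0 would correspond to purely additive effects of the two markers on the relative risk scale; the sign of the value depends on which genotype group is taken as the reference.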

Inspired by these results, we subdivided the patients and controls into three groups: those that were homozygous for the haplotype RAI3'd1 s ASE1 exon1 G, those that were sure or possible heterozygotes for the same haplotype, and those that were definite non-carriers of the haplotype.

Table 26 lists the relative risk with confidence interval and the P value for each group.

Table 26. Risk of breast cancer in relation to presence of the haplotype RAI3'd1 s ASE1 exon1 G

                      RR (CI)            P
Non-carriers (117)    1
Heterozygotes (374)   1.31 (0.81-2.12)   0.26
Homozygotes (239)     2.51 (1.50-4.20)   0.0005

We made a similar calculation using only those breast cancer cases occurring before age 55 and the corresponding controls. The results are shown in table 27.

Table 27. Risk of breast cancer before age 55 in relation to presence of the haplotype RAI3'd1 s ASE1 exon1 G

                     RR (CI)             P
Non-carriers (22)    1
Heterozygotes (55)   1.20 (0.37-3.88)    0.77
Homozygotes (32)     6.78 (1.56-29.48)   0.01

It is apparent that the combined use of the two markers improves discrimination and increases the relative risk in the homozygote group.