Title:
METHOD FOR DETERMINING A HLA-DQ HAPLOTYPE IN A SUBJECT
Document Type and Number:
WIPO Patent Application WO/2008/110206
Kind Code:
A1
Abstract:
The present invention relates to a method for determining a HLA-DQ haplotype of a subject, said method comprising: (a) obtaining a sample of genetic material from said subject; (b) providing the identity of a nucleotide for one or more single nucleotide polymorphisms (SNPs) correlating to said HLA-DQ haplotype in said sample of genetic material; and (c) determining the HLA-DQ haplotype based on the identity of said nucleotide, wherein the one or more SNPs are selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108. The invention provides a novel, simple and reliable method for determining the HLA-DQ haplotype of a subject.

Inventors:
MONSUUR ALIENKE J (NL)
WIJMENGA CISCA (NL)
Application Number:
PCT/EP2007/052343
Publication Date:
September 18, 2008
Filing Date:
March 13, 2007
Assignee:
GENOME DIAGNOSTICS B V (NL)
MONSUUR ALIENKE J (NL)
WIJMENGA CISCA (NL)
International Classes:
C12Q1/68
Domestic Patent References:
WO2005123951A2 2005-12-29
WO2005085323A2 2005-09-15
Other References:
DE BAKKER PAUL I W ET AL: "A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC.", NATURE GENETICS OCT 2006, vol. 38, no. 10, October 2006 (2006-10-01), pages 1166 - 1172, XP002456619, ISSN: 1061-4036
MONSUUR ALIENKE J ET AL: "Understanding the molecular basis of celiac disease: what genetic studies reveal.", ANNALS OF MEDICINE 2006, vol. 38, no. 8, 2006, pages 578 - 591, XP008085237, ISSN: 0785-3890
DE BAKKER PAUL I W ET AL: "Efficiency and power in genetic association studies.", NATURE GENETICS NOV 2005, vol. 37, no. 11, November 2005 (2005-11-01), pages 1217 - 1223, XP002456764, ISSN: 1061-4036
LOUKA A S ET AL: "HLA in coeliac disease: unravelling the complex genetics of a complex disorder.", TISSUE ANTIGENS FEB 2003, vol. 61, no. 2, February 2003 (2003-02-01), pages 105 - 117, XP002456765, ISSN: 0001-2815
Attorney, Agent or Firm:
MANTEN, Annemieke (Sweelinckplein 1, GK The Hague, NL)
Claims:
CLAIMS

1. Method for determining a HLA-DQ haplotype of a subject, said method comprising:

(a) obtaining a sample of genetic material from said subject;

(b) providing the identity of a nucleotide for one or more single nucleotide polymorphisms (SNPs) correlating to said HLA-DQ haplotype in said sample of genetic material; and

(c) determining the HLA-DQ haplotype based on the identity of said nucleotide, wherein the one or more SNPs are selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108.

2. Method as claimed in claim 1, wherein the HLA-DQ haplotype is selected from the group consisting of DQ2.2, DQ2.5, DQ7 and DQ8.

3. Method as claimed in claim 2, wherein said haplotype is the DQ2.2 haplotype when the nucleotide for the SNP rs2395182 is T, the nucleotide for the SNP rs7775228 is G and the nucleotide for SNP rs4713586 is not G.

4. Method as claimed in claim 2, wherein said haplotype is the DQ2.5 haplotype when the nucleotide for the SNP rs2187668 is T.

5. Method as claimed in claim 2, wherein said haplotype is the DQ7 haplotype when the nucleotide for the SNP rs4639334 is A.

6. Method as claimed in claim 2, wherein said haplotype is the DQ8 haplotype when the nucleotide for the SNP rs7454108 is G.

7. Method for identifying a subject at risk of having or developing a disease which is associated with one or more HLA-DQ haplotypes; said method comprising:

(a) obtaining a sample of genetic material from said subject;

(b) providing the identity of a nucleotide for one or more single nucleotide polymorphisms (SNPs) correlating to one or more HLA-DQ haplotypes in said sample of genetic material;

(c) determining the HLA-DQ haplotypes based on the identity of said nucleotide; and

(d) identifying said subject based on the determined HLA-DQ haplotypes, wherein the one or more SNPs are selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108.

8. Method as claimed in claim 7, wherein the one or more HLA-DQ haplotypes are selected from the group consisting of DQ2.2, DQ2.5, DQ7 and DQ8.

9. Method as claimed in claim 8, wherein the HLA-DQ haplotype is the DQ2.2 haplotype when the nucleotide for the SNP rs2395182 is T, the nucleotide for the SNP rs7775228 is G and the nucleotide for SNP rs4713586 is not G.

10. Method as claimed in claim 8, wherein the HLA-DQ haplotype is the DQ2.5 haplotype when the nucleotide for the SNP rs2187668 is T.

11. Method as claimed in claim 8, wherein the HLA-DQ haplotype is the DQ7 haplotype when the nucleotide for the SNP rs4639334 is A.

12. Method as claimed in claim 8, wherein the HLA-DQ haplotype is the DQ8 haplotype when the nucleotide for the SNP rs7454108 is G.

13. Method as claimed in any of the claims 7-12, wherein the disease is celiac disease.

14. Method as claimed in claim 13, wherein the subject is at risk when said subject carries the DQ2.5 haplotype.

15. Method as claimed in claim 14, wherein the subject is at risk when said subject is homozygous for said DQ2.5 haplotype.

16. Method as claimed in claim 13, wherein the subject is at risk when said subject carries the DQ8 haplotype.

17. Method as claimed in claim 16, wherein the subject is at risk when said subject is homozygous for said DQ8 haplotype.

18. Method as claimed in claim 13, wherein the subject is at risk when said subject carries the DQ2.2 haplotype in combination with the DQ2.5 or the DQ7 haplotype.

19. Method as claimed in claim 13, wherein the subject is at risk when said subject carries the DQ7 haplotype in combination with the DQ2.5 haplotype.

20. Method as claimed in any of the preceding claims 1-19, wherein the subject is of Caucasian origin.

21. Method as claimed in claim 20, wherein the subject is of Dutch Caucasian origin.

22. Kit for performing the method as claimed in any of the claims 1-21, comprising means for providing the identity of the nucleotide for one or more single nucleotide polymorphisms (SNPs) correlating to said HLA-DQ haplotype in a sample of genetic material, wherein the one or more SNPs are selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108.

23. Use of one or more SNPs selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108 for identifying a subject at risk of having or developing a disease which is associated with one or more HLA-DQ haplotypes.

24. Use as claimed in claim 23, wherein said disease is celiac disease.

Description:

METHOD FOR DETERMINING A HLA-DQ HAPLOTYPE IN A SUBJECT

The present invention relates to methods for determining a HLA-DQ haplotype in a subject. The invention further relates to methods for identifying a subject at risk of having or developing a disease which is associated with one or more HLA-DQ haplotypes. The invention also relates to kits for performing said methods.

The HLA genes are located in the major histocompatibility complex (MHC) region on human chromosome 6p21.3. The MHC is a large genomic region that contains a series of linked HLA genes encoding the HLA proteins, which have critical roles in the presentation of antigens to T-cells and in the recognition of and discrimination between "self" and "non-self" molecules by T-cells. The HLA proteins are encoded by two classes of genes, HLA class I and HLA class II. HLA class I genes include HLA-A, HLA-B and HLA-C, and HLA class II genes include HLA-DR, HLA-DQ and HLA-DP. Class I gene products combine with β2-microglobulin protein to form a functional receptor on most nucleated cells of the body.

Class II molecules are heterodimeric (αβ) protein receptors consisting of a low-polymorphic alpha (α) chain and a highly polymorphic beta (β) chain. Class II MHC molecules are typically expressed on the cell surface of antigen presenting cells, such as macrophages, B-cells and activated T-cells.

Different types of HLA proteins are expressed in different individuals, and these different HLA proteins also bind to different peptide antigens. This gives rise to the differences in the immune response among individuals. For this reason, determination of the types of HLA-encoding genes in the HLA region (HLA typing) is very important for determining the compatibility between a donor and a recipient in organ transplantation, and for e.g. evaluating the susceptibility of an individual to particular diseases.

Different approaches for HLA typing are known. In serological typing, HLA serotypes are identified based on the antigen/antibody interactions between anti-HLA monoclonal antibodies and lymphocytes. However, this approach has the disadvantages that the accuracy of the typing is relatively low and that it only allows determination of limited types of HLA. HLA genotyping by techniques based on the polymerase chain reaction (PCR) has become a widely used alternative to serological methods in clinical practice. While the known DNA genotyping methods have been effective in routine laboratory practice, these methods generally are laborious and/or expensive. Thus, testing for specific HLA risk alleles is routinely performed using specialized kits, but these tests often require as many as 24-60 reactions, multiple steps such as amplification and hybridization to a membrane, special software and/or expertise in analyzing the results (e.g. DNA PCR-single-strand conformation polymorphism (PCR-SSCP) (Carrington et al., 1992), PCR and sequence specific oligonucleotide probing (PCR-SSOP) (Ronningen et al., 1991), PCR-sequence specific primer kits (PCR-SSP) (Olerup et al., 1992), and PCR-reverse lineblot (PCR-RLB) (Buyse et al., 1993)). Directly typing the genetic variants that encode specific HLA alleles is usually very difficult, since most of these variants are surrounded by too many other variants that interfere with primer annealing.

The object of the present invention is to provide a simple HLA genotyping method by which the above-mentioned drawbacks are obviated.

This object has been achieved by the invention, by providing a method for determining a HLA-DQ haplotype of a subject, said method comprising:

(a) obtaining a sample of genetic material from said subject;

(b) providing the identity of a nucleotide for one or more single nucleotide polymorphisms (SNPs) correlating to said HLA-DQ haplotype in said sample of genetic material; and

(c) determining the HLA-DQ haplotype based on the identity of said nucleotide, wherein the one or more SNPs are selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108.

The invention thus provides a novel, simple and reliable method for determining the HLA-DQ haplotype of a subject, by providing the identity of one or more SNPs in a sample of genetic material derived from the individual, wherein the SNPs correlate with a specific HLA-DQ haplotype. The SNPs according to the present invention are located in the chromosome 6p21.3 region.

A single nucleotide polymorphism (SNP) is a single base pair position in genomic DNA at which different sequence alternatives (alleles) exist. Thus, a SNP is a DNA sequence variation that occurs when one nucleotide, e.g. adenine (A), thymine (T), cytosine (C) or guanine (G), in the genome sequence is altered to another nucleotide. Known SNPs are catalogued and referenced in different, freely accessible databanks of for example NCBI (National Center for Biotechnology Information, Bethesda, U.S.A.), The International HapMap Project (www.hapmap.org), Ensembl (www.ensembl.org), or others. SNPs are identified by their unique rs number, by which a SNP and its flanking sequences (usually up to a few hundred base pairs) can be found in said databases. The sequence information for the specific SNPs of the invention is given in Figure 2.

A haplotype generally is a combination of alleles found at linked polymorphic sites (such as within a gene) on a single chromosome. According to the present invention, the term HLA-DQ haplotype refers to a specific set of HLA-DQA1 and -DQB1 alleles. A HLA-DQ haplotype can include one or more SNPs.

In the research that has led to the present invention, specific SNPs have been identified that are in linkage disequilibrium (LD) with specific HLA-DQ haplotypes of interest, and it has been demonstrated that the SNPs of the present invention can be used to predict the HLA-DQ haplotypes of a subject, in particular one or more of the HLA-DQ haplotypes selected from the group consisting of the DQ2.2, DQ2.5, DQ7 and DQ8 haplotype, with a high sensitivity and specificity.

In a preferred embodiment of the method of the invention, the HLA-DQ haplotype is the DQ2.2 haplotype when the nucleotide for the SNP rs2395182 is T, the nucleotide for the SNP rs7775228 is G and the nucleotide for SNP rs4713586 is not G.

In a further preferred embodiment, said haplotype is the DQ2.5 haplotype when the nucleotide for the SNP rs2187668 is T.

In yet another preferred embodiment of the invention, said haplotype is the DQ7 haplotype when the nucleotide for the SNP rs4639334 is A.

In a further preferred embodiment, the HLA-DQ haplotype is the DQ8 haplotype when the nucleotide for the SNP rs7454108 is G.

The sample of genetic material conveniently is a sample of blood, saliva, or any other body fluid or tissue containing one or more nucleic acids derived from a subject to be tested. It will be appreciated that these samples may be processed using well-known techniques before being used in the method according to the present invention. For example, all or part of a sample nucleic acid may first be amplified using convenient techniques, such as PCR, before analysis of the SNPs.
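The allele rules of the preferred embodiments above (DQ2.2, DQ2.5, DQ7 and DQ8) can be sketched as a small lookup. This is only an illustration: the function name `call_dq_haplotypes` and the dictionary input are hypothetical, the input is assumed to hold the nucleotides observed on a single chromosome, and the requirement that rs4713586 be typed before "not G" counts is an added assumption.

```python
# Hypothetical helper, not part of the patent text: applies the allele rules
# of the preferred embodiments to the alleles observed on ONE chromosome.
def call_dq_haplotypes(alleles):
    """Return the DQ haplotypes indicated by a dict {rs number: nucleotide}.

    Assumption: `alleles` holds phased, single-chromosome calls; SNPs that
    were not typed are simply absent from the dict.
    """
    calls = []
    # DQ2.2: rs2395182 is T, rs7775228 is G, and rs4713586 is not G.
    # Assumption: rs4713586 must actually be typed before "not G" counts.
    if (alleles.get("rs2395182") == "T"
            and alleles.get("rs7775228") == "G"
            and alleles.get("rs4713586") not in (None, "G")):
        calls.append("DQ2.2")
    # DQ2.5: rs2187668 is T.
    if alleles.get("rs2187668") == "T":
        calls.append("DQ2.5")
    # DQ7: rs4639334 is A.
    if alleles.get("rs4639334") == "A":
        calls.append("DQ7")
    # DQ8: rs7454108 is G.
    if alleles.get("rs7454108") == "G":
        calls.append("DQ8")
    return calls
```

For instance, a chromosome typed T at rs2187668 and nothing else would be returned as `["DQ2.5"]` under these assumptions.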

It will be apparent to the person skilled in the art that different well-known analytical methods may be used to detect the SNPs, i.e. to detect the presence of variant nucleotides. For example, use can be made of specific unlabeled forward and reverse primers to amplify the part of the DNA that contains the SNP. Subsequently, two specific probes, one for each allele, are used to detect the specific alleles, each with its own fluorescent label, e.g. TaqMan® probes (VIC® dye - MGB and 6FAM™ dye - MGB, Applied Biosystems, Foster City, U.S.A.). The probe that is specific for the allele present anneals to the DNA. Subsequently the fluorescence is released and can be detected.

Examples of the specific primer pairs which are preferably used in order to identify the nucleotides for one or more SNPs according to the invention are listed in Table 1.

Allelic variations in HLA genes have been shown to have associative or causal relationships with a relative risk or predisposition to a large number of diseases, especially diseases with (auto)immunological or inflammatory origin. For example, HLA-DQA1 and HLA-DQB1 genes have specific known risk alleles that confer risk to specific diseases, such as e.g. celiac disease and type 1 diabetes.

It is therefore a further object of the invention to provide methods, which are based on the identification of the HLA-DQ haplotype of an individual, for predicting the individual's susceptibility to certain diseases and can also be used in disease diagnosis.

This object is achieved by providing a method for identifying a subject at risk of having or developing a disease which is associated with one or more HLA-DQ haplotypes comprising:

(a) obtaining a sample of genetic material from said subject;

(b) providing the identity of a nucleotide for one or more single nucleotide polymorphisms (SNPs) correlating to one or more HLA-DQ haplotypes in said sample of genetic material;

(c) determining the one or more HLA-DQ haplotypes based on the identity of said nucleotide; and

(d) identifying said subject based on the determined HLA-DQ haplotypes, wherein the one or more SNPs are selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108.

According to the invention a method is provided for determining whether a subject is at risk of having or developing a disease which is associated with one or more HLA-DQ haplotypes, by providing the identity of a plurality of SNPs in one or more nucleic acids from the individual that correlate with one or more HLA-DQ haplotypes, and identifying the subject at risk based on the predicted HLA-DQ haplotypes. As used in the present application, the wording "a subject at risk of having or developing a disease which is associated with one or more HLA-DQ haplotypes" refers both to subjects having or suspected of having a disease and to asymptomatic subjects who may be tested for a predisposition or susceptibility to such a disease.

The HLA-DQ associated disease preferably is celiac disease (CD). Celiac disease, the most common intolerance to a dietary component in Western society, is sustained by an abnormal T cell response to gluten as an environmental factor and is strongly associated with HLA class II genes. It is known that almost 95% of the CD patients carry at least one of the two risk molecules DQA1*0501/DQB1*0201 (i.e. haplotype DQ2.5) and DQA1*0301/DQB1*0302 (i.e. haplotype DQ8). The molecules encoded by the CD-associated HLA-DQA1 and HLA-DQB1 genes form DQα and DQβ heterodimers, which can lead to several functional molecules of which one to four copies can be made. A few variants of these genes predispose to CD (either alone or in combination).

In the research that forms the basis of the invention, specific tag SNPs were selected to predict DQ2.2, DQ2.5, DQ7 and DQ8 haplotypes in a collection of CD patients, non-CD trio families and blood bank controls. The sensitivity, specificity, predictive value and the correlation between the SNP-based test and the true HLA variant (r²) were examined (as described in Example 1). It was thus found that only 6 SNPs are needed to predict the DQ2.2, DQ2.5, DQ7 and DQ8 risk types carried by >95% of CD patients, with a sensitivity >0.968, specificity >0.994 and a predictive value >0.940.

It is known that most of the patients without DQ2.5 and DQ8 carry half of the DQ2.5 or DQ2.2 molecule (either HLA-DQA1*0501 or -DQB1*0202), suggesting that carrying part of the risk molecules has functional implications on the risk of CD (as described by Karell et al., 2003; and Louka et al., 2003). According to the present invention, it has been found that of the patient group 98.4% carry one of the risk groups (DQ2.2, DQ2.5, DQ7, DQ8 and the DQ types that have half of the risk haplotypes) and that 96.1% of all patients are correctly predicted using the method of the invention. Overall, the specificity is >0.969 and predictive value is >0.995 when taking into account that some of the false predictions included an allele that is part of a risk haplotype (e.g. the HLA-DQA1*0501 allele which is part of the DQ2.5 haplotype).

With the methods of the invention it is also possible to determine whether a person is homozygous or heterozygous for a specific HLA-DQ haplotype. Reinton et al. (2006) have described a real-time PCR for the detection of the CD-associated HLA risk alleles. This method requires 11 reactions, and even more when persons homozygous for the HLA risk alleles need to be distinguished from heterozygous persons. De Bakker et al. (2006) furthermore showed two examples that used a tagging method for CD and systemic lupus erythematosus (SLE). In that study, two SNPs were chosen to capture DQ2.2 and DQ2.5 in the same CD cohort (N=330) as used for the present invention. The rs4988889(T), rs2858331(C) haplotype was used to determine the presence of DQ2.2 and the rs4988889(T), rs2858331(T) haplotype was used to determine DQ2.5. Although the SNPs looked promising in determining DQ2.5 homozygosity or DQ2.2/DQ2.5 heterozygosity, it often appeared difficult to distinguish DQ2.2/X heterozygous from DQ2.5/X heterozygous persons (X is any other allele excluding DQ2.2 or DQ2.5), due to phase uncertainty of the alleles at the two SNPs. A person who is heterozygous for rs4988889 (G/T) has one copy of DQ2.2 or DQ2.5. When this person is also heterozygous for rs2858331 (C/T), it is uncertain which of these alleles (either C or T) is on the same chromosome as the T allele of rs4988889, and therefore whether it forms DQ2.2 or DQ2.5.
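The phase uncertainty described above for the two-SNP approach can be made concrete by enumerating the possible phasings of an unphased genotype. The sketch below is illustrative only: the names `TAG_HAPLOTYPES` and `possible_calls` are hypothetical, and any allele combination other than the two tagging haplotypes named in the text is simply labeled "X".

```python
# Two-SNP tagging haplotypes from the text: (rs4988889 allele, rs2858331 allele).
TAG_HAPLOTYPES = {("T", "C"): "DQ2.2", ("T", "T"): "DQ2.5"}

def possible_calls(gt_rs4988889, gt_rs2858331):
    """Return every DQ call consistent with two unphased genotypes.

    Each genotype is a pair of alleles, e.g. ("G", "T"). Both ways of pairing
    the alleles across the two SNPs onto chromosomes are tried.
    """
    a1, a2 = gt_rs4988889
    calls = set()
    for b1, b2 in (gt_rs2858331, gt_rs2858331[::-1]):
        chromosomes = [(a1, b1), (a2, b2)]
        labels = sorted(TAG_HAPLOTYPES.get(c, "X") for c in chromosomes)
        calls.add(tuple(labels))
    return calls

# Double heterozygote: two different calls remain possible.
ambiguous = possible_calls(("G", "T"), ("C", "T"))
# Homozygous T at rs4988889: the call is unique whatever the phase.
resolved = possible_calls(("T", "T"), ("C", "T"))
```

Under these assumptions `ambiguous` contains both a DQ2.2/X and a DQ2.5/X call, whereas `resolved` contains only DQ2.2/DQ2.5, which matches the behaviour described in the paragraph above.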

In contrast to these examples, the SNPs according to the present invention are well capable of determining whether a person is homozygous for DQ2.2 or DQ2.5, heterozygous for DQ2.2 or DQ2.5 or does not possess the DQ2.2 or DQ2.5 haplotype at all.

The methods of the invention are cost-effective and the experimental procedures are straightforward, using only routine genotyping equipment. Thus, the methods can e.g. be used for SNP-based population screening for CD. Family-based or population-based screening for the CD risk variants has important diagnostic value in supporting the diagnosis of CD when the specific risk alleles are present, and minimizing the possibility of CD when they are not present (high negative predictive value). CD affects almost 1% of the population, although it is estimated that the majority of cases remain undiagnosed. Since untreated CD can cause long-term health problems, targeted screening for CD can potentially identify such undiagnosed individuals and prevent lifelong symptoms and complications. The methods of the present invention provide easy, cheap and quick means for determining which part of the population (~25%) needs to be screened extensively for CD. As this test requires very little DNA and is somewhat insensitive to DNA quality, this test will also be applicable to DNA material isolated from e.g. biopsies, whole-genome amplified DNA, and DNA isolated from FTA cards. Furthermore, the methods also determine which individuals are not at risk for developing CD and therefore do not need further serology tests.

As described above, according to the invention simple and reliable methods are described that predict the presence of risk DQ types involved in CD with high accuracy. This method can, however, also be applied to type 1 diabetes (T1D), where the DQ2.5 and DQ8 haplotypes are known risk factors, or more generally to other immune-related diseases with known HLA-DQ risk alleles.

It has been demonstrated that the most important risk factor for CD is the DQ2.5 haplotype (see Figure 1 and Table 2), with the highest risk in individuals who are homozygous for this haplotype, or who have a single copy of DQ2.5 and one copy of DQA1*0201/DQB1*0202 (i.e. haplotype DQ2.2), haplotype DQ8, or DQA1*0505/DQB1*0301 (i.e. haplotype DQ7). The frequency of these alleles in the general population is substantial (> 20%), suggesting that these variants are necessary for disease development but not sufficient. Thus, according to a preferred embodiment of the invention, a subject is at risk when said subject carries the DQ2.5 haplotype. A subject is particularly at risk when said subject is homozygous for said DQ2.5 haplotype.

According to another preferred embodiment, a subject is at risk when said subject carries the DQ8 haplotype. A subject is particularly at risk when said subject is homozygous for said DQ8 haplotype. According to another preferred embodiment, a subject is at risk when said subject carries the DQ2.2 haplotype in combination with the DQ2.5 or the DQ7 haplotype.

According to a further preferred embodiment, a subject is at risk when said subject carries the DQ7 haplotype in combination with the DQ2.5 haplotype.
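The risk rules of these preferred embodiments can be summarized in a short classifier. This is a minimal sketch, not the inventors' implementation: the function name `celiac_risk`, the returned labels and the use of "X" for any other haplotype are all hypothetical.

```python
def celiac_risk(hap_a, hap_b):
    """Classify a subject from the two DQ haplotypes carried (one per chromosome).

    Hypothetical labels: "X" stands for any haplotype other than DQ2.2, DQ2.5,
    DQ7 or DQ8; the returned strings are illustrative only.
    """
    carried = {hap_a, hap_b}
    # Homozygous DQ2.5 or homozygous DQ8: particularly at risk.
    if hap_a == hap_b and hap_a in ("DQ2.5", "DQ8"):
        return "particularly at risk"
    # Any carrier of DQ2.5 or DQ8 is at risk. This also covers DQ2.2 or DQ7
    # in combination with DQ2.5.
    if "DQ2.5" in carried or "DQ8" in carried:
        return "at risk"
    # DQ2.2 in combination with DQ7.
    if carried == {"DQ2.2", "DQ7"}:
        return "at risk"
    return "not flagged"
```

A subject carrying DQ2.2 or DQ7 together with an unrelated haplotype is deliberately left unflagged here, since the embodiments above only name these two haplotypes in combination with each other or with DQ2.5.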

The present invention further provides kits for performing the methods of the invention, as described above. The invention preferably provides kits comprising means for providing the identity of the nucleotide for one or more single nucleotide polymorphisms (SNPs) corresponding to said HLA-DQ haplotype in a sample of genetic material, wherein the one or more SNPs are selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108. According to a preferred embodiment, the kits comprise at least one or more allele-specific primer pairs. The kits preferably comprise one or more of the specific primer pairs as listed in Table 1. The kits further may comprise suitable buffer(s), enzymes, such as polymerases, and/or dNTPs, optionally provided in combination with suitable carriers, as well as appropriate packaging and instructions for use. Preferably, one or more of said reagents are present on the carrier (such as a multiwell plate), such that only the genetic material (e.g. the DNA) of the subjects to be tested needs to be added.

The invention further also relates to the use of at least one of the SNPs selected from the group consisting of rs2395182, rs7775228, rs4713586, rs2187668, rs4639334 and rs7454108 for identifying a subject at risk of having or developing a disease which is associated with one or more specific HLA-DQ haplotypes. Preferably, the disease is celiac disease.

The present invention is illustrated by the following Figures and Examples.

Figure 1: HLA-DQA1* and -DQB1* together form heterodimers, of which DQ2.5 and DQ8, either in homozygous or heterozygous state, confer risk to CD due to their ability to present gluten to T cells. DQ2.2 and DQ7 can only confer risk to CD when present in combination or when present with DQ2.5 (trans effect, see dashed lines). See Table 2b for the possible combinations, the number of risk molecules and the associated risk.

Figure 2 shows the sequence information for the SNPs (underlined) according to the present invention and their flanking sequence.

EXAMPLE 1

Effective detection of HLA risk alleles using tag SNPs

MATERIAL AND METHODS

DNA samples

DNA was available from three cohorts. The three different cohorts were used to study different aspects of the tag SNP method. The control trio cohort enabled the inventors to check for Mendelian errors (which were not observed). The case cohort has a high frequency of the HLA-DQ2 risk variants, which is useful for testing the positive predictive value, and the blood bank cohort gives a better view of the robustness of the method in the general population. The first cohort consists of 330 unrelated celiac disease (CD) patients of Dutch Caucasian origin, as described earlier (Monsuur et al., 2005). Only CD patients diagnosed according to revised ESPGHAN criteria and with a Marsh III lesion confirmed by duodenal biopsy sampling were selected for this study, as described previously by Van Belzen et al. (2003) and Walker-Smith et al. (1990). A cohort of population-based control trios was derived from families without a history of CD (Van Belzen et al., 2004). The 86 control trios were selected for the presence of at least one parent carrying haplotype DQ2.5 and were all of Dutch Caucasian origin. For 207 of the 264 persons in the 86 trios HLA typing data was available (see below). The blood bank cohort is part of the ITItwo panel (the ITI panel is a DNA panel from the Section Immunogenetics and Transplantation Immunology of the LUMC, Leiden, the Netherlands) and consists of 219 unrelated, randomly selected Dutch blood donors. The study was approved by the Medical Ethics Committee of the University Medical Centre Utrecht, and informed consent was obtained from the participants.

HLA typing

The CD cohort and the control trio cohort were typed for HLA-DQA1 and -DQB1 genes using a classical PCR-SSCP/heteroduplex method in an official HLA typing laboratory as described earlier (Carrington et al., 1992). Full HLA-DQA1 and -DQB1 typing was available for the entire CD cohort. For the control trio cohort full HLA-DQA1 and -DQB1 typing was available for the child and both parents in 35 trios and for the child and one of the parents in 51 trios, leading to a total of 207 persons available for analyses. For the blood bank cohort full (four digit) HLA-DRB1, -DQA1 and -DQB1 typing was performed by PCR-SSCP using locally produced and slightly modified primer mixes (Verduyn et al., 1996). The typing of this cohort was performed in the European Foundation of Immunogenetics (EFI) accredited HLA laboratory of the Department of IHB, LUMC, Leiden.

Tag SNP selection

Tag SNPs were selected that capture the following HLA types: DQ2.2 (two SNPs for DQ2.2 and one SNP to exclude DQ4 from the DQ2.2 group), DQ2.5 (1 SNP), DQ7 (1 SNP), and DQ8 (1 SNP) (see Table 1). DQ2.5 and DQ8 are risk factors for CD and are carried by ~95% of the CD patients.

The HLA-DQA1*0505 allele of DQ7 and the HLA-DQA1*0501 allele of DQ2.5 differ by only one or a few base pairs and are thought to have the same functional properties. This also holds true for the HLA-DQB1*0202 allele of DQ2.2 and the HLA-DQB1*0201 allele of DQ2.5. Most of the CD patients that do not carry DQ2.5 or DQ8 carry half of the DQ2.5 or DQ2.2 molecule (that is, either HLA-DQA1*0501 or -DQB1*0202), suggesting that carrying part of the risk molecules has functional implications on the risk of CD, as described by Karell et al. (2003).

Tag SNP selection was based on genotype data collected in the classical HLA genes and >7,500 common SNP and deletion-insertion polymorphisms across the extended human MHC region (De Bakker et al., 2006). "Tagger" (De Bakker, 2005) was used to derive SNP-based tests to capture each DQ type in the extended CEU analysis panel (Utah residents with northern and western European ancestry). First the SNPs that have the highest r² to a DQ type were found. Subsequently, multi-SNP (haplotype) tests were performed to achieve a higher r² with which a DQ type is captured (if r² < 1). For DQ2.2 multiple SNPs were needed that in combination capture this HLA allele. Since there is a lot of variation in the MHC region that can interfere with primer annealing, only SNPs that could be typed using conventional methods (e.g. TaqMan) were selected.

Tag SNP typing

The tag SNPs were obtained as Assay on Demand (rs2395182, rs4713586, rs4639334) or Assay by Design (rs7454108, rs7775228, rs2187668) from Applied Biosystems (Applied Biosystems, Foster City, California, USA) (see Table 1 for assay numbers or primer sequences and their allele labeling). Samples were genotyped using the manufacturer's instructions and analyzed on an ABI PRISM 7900 HT system (Applied Biosystems). All SNPs have been typed using the standard amplification protocol as supplied by Applied Biosystems. For analysis, end-point measurements were obtained. Dropout rates were below 3.57%. No Mendelian errors were observed for the SNPs in the control trio cohort.

Analyses

The HLA-DQA1 and -DQB1 genotypes as determined at the HLA-typing centers were used to establish the corresponding DQ types (see Figure 1). Due to the high linkage disequilibrium in the MHC region only a limited set of DQA1*-DQB1* haplotypes (DQ types) are observed in the general population, resulting in only a few instances that did not correspond to canonical DQ types. For the prediction method DQ types from the tag SNPs were inferred. DQ types were determined according to the predicting alleles (see Table 2, e.g. a person was called homozygous DQ8 if rs7454108 was homozygous G, heterozygous DQ8 if rs7454108 was heterozygous G/A). Only individuals with non-missing data were used for the comparison of the official typing and the prediction method.
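For the three DQ types tagged by a single SNP, the zygosity call described above amounts to counting copies of the predicting allele in the genotype. The sketch below assumes the predicting alleles given in the preferred embodiments; the names `PREDICTING_ALLELE` and `dq_copies` are hypothetical, and DQ2.2 is left out because it requires a multi-SNP test.

```python
# Predicting allele per single-SNP tag, as stated in the preferred embodiments.
PREDICTING_ALLELE = {
    "rs2187668": ("DQ2.5", "T"),
    "rs4639334": ("DQ7", "A"),
    "rs7454108": ("DQ8", "G"),
}

def dq_copies(rs_id, genotype):
    """Return (DQ type, number of copies) for an unphased two-allele genotype.

    2 copies corresponds to a homozygous call, 1 to a heterozygous call and
    0 to absence of the DQ type.
    """
    dq_type, allele = PREDICTING_ALLELE[rs_id]
    return dq_type, list(genotype).count(allele)

homozygous_dq8 = dq_copies("rs7454108", ("G", "G"))    # ("DQ8", 2)
heterozygous_dq8 = dq_copies("rs7454108", ("G", "A"))  # ("DQ8", 1)
```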

DQ types based on the official typing and those from the tag SNP typing method were compared to examine the sensitivity, specificity, positive predictive value (PPV) and r².
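These four measures can all be derived from a per-chromosome 2x2 comparison of predicted versus officially typed DQ status. The following is a sketch under stated assumptions: the function name `tag_snp_metrics` is hypothetical, and r² is taken here to be the squared correlation between the two binary indicators, which is an assumption about how the value was computed. The counts in the usage line are invented for illustration and are not data from the study.

```python
def tag_snp_metrics(tp, fp, fn, tn):
    """Compute comparison metrics from chromosome counts.

    tp: predicted positive, typed positive    fp: predicted positive, typed negative
    fn: predicted negative, typed positive    tn: predicted negative, typed negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    # Assumption: r^2 is the squared correlation of the two binary indicators.
    r2 = (tp * tn - fp * fn) ** 2 / (
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "r2": r2}

# Invented toy counts, for illustration only.
example = tag_snp_metrics(tp=9, fp=1, fn=1, tn=89)
```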

RESULTS

A total of 6 SNPs were needed to accurately predict the DQ2.2, DQ2.5, DQ7 and DQ8 risk types for celiac disease (CD). Typing was done in three different cohorts comprising a total of 756 persons (1512 alleles). Dropout rates for these SNPs were <3.57%. All SNPs were in Hardy-Weinberg equilibrium and no Mendelian errors were observed in the trios.
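As a quick consistency check, the totals above follow from the cohort sizes given under Material and Methods (330 CD patients, 207 trio members with HLA typing, 219 blood donors). The variable names below are hypothetical.

```python
# Cohort sizes as stated under "DNA samples" and "HLA typing".
cohort_sizes = {
    "CD patients": 330,
    "control trio members with HLA typing": 207,
    "blood bank donors": 219,
}

total_persons = sum(cohort_sizes.values())
total_alleles = 2 * total_persons  # two chromosomes per person

print(total_persons, total_alleles)  # prints "756 1512"
```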

A high correlation was observed between the three groups for the sensitivity, specificity, PPV and r². The results of the three cohorts have been combined in Table 3; the separate results of each group are shown in Table 4. For each DQ type, all persons with non-missing data for the relevant SNPs were used. A person with missing data for e.g. DQ2.5 is excluded from the DQ2.5 analysis, but can be used for the other analyses if the genotypes relevant for the other DQ types are present.

Initially the sensitivity and specificity for DQ2.2 were high, but the predictive value was low. The SNPs for DQ2.2 (rs2395182, rs7775228) tagged not only DQ2.2 but also the relatively infrequent DQ4 allele. It was therefore decided to tag DQ4 as well (rs4713586), making it possible to call a person DQ2.2 when the alleles were positive for DQ2.2 and negative for DQ4. As a result, three tag SNPs were needed for the prediction of DQ2.2, with an overall sensitivity of 0.992, a specificity of 0.995 and a PPV of 0.954. Only seven out of the 1448 tested chromosomes gave false results (0.5%).
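The three-SNP rule for DQ2.2 can be illustrated per chromosome as follows. The sketch reads Table 2 as meaning that T at rs2395182 together with G at rs7775228 is the positive signal, and that G at rs4713586 marks DQ4 and therefore vetoes the call. Both that reading and the phased-haplotype input are assumptions of the example; the text does not describe how phase was resolved.

```python
def is_dq22_haplotype(haplotype):
    """Return True when one chromosome is predicted to carry DQ2.2.

    `haplotype` maps an rs number to the allele on that chromosome
    (hypothetical phased input).
    """
    positive = haplotype["rs2395182"] == "T" and haplotype["rs7775228"] == "G"
    # Assumption: G at rs4713586 tags DQ4, so it rules out a DQ2.2 call.
    tags_dq4 = haplotype["rs4713586"] == "G"
    return positive and not tags_dq4


dq22_like = {"rs2395182": "T", "rs7775228": "G", "rs4713586": "A"}
dq4_like = {"rs2395182": "T", "rs7775228": "G", "rs4713586": "G"}
print(is_dq22_haplotype(dq22_like), is_dq22_haplotype(dq4_like))
```

Without the third SNP both example chromosomes would be called DQ2.2, which is the source of the low predictive value described above.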

The tag SNP selected for prediction of DQ2.5 (rs2187668) showed an overall sensitivity of 0.996, a specificity of 0.994 and a PPV of 0.991. Only seven out of the 1460 tested chromosomes gave false results (0.5%). One of these seven chromosomes carried half of the DQ2.5 haplotype (DQA1*0501).

The tag SNP for DQ7 (rs4639334) showed an overall sensitivity of 0.968, a specificity of 0.996 and a PPV of 0.939. Nine out of the 1468 tested chromosomes gave false results (0.6%). Three of these nine carried a rare haplotype consisting of half of the DQ7 haplotype, and two of these three carried DQB1*0302 of DQ8 as the other half (see also the results for DQ8).

The tag SNP for DQ8 (rs7454108) showed an overall sensitivity of 0.991, a specificity of 0.995 and a PPV of 0.940. Eight out of the 1486 tested chromosomes gave false results (0.5%). Three of these eight chromosomes carried half of the DQ8 haplotype (DQB1*0302); two of these three carried the DQ7 allele DQA1*0505 as the other half and were predicted to be both DQ7 and DQ8.

Accepting the prediction of these half haplotypes as good predictions of the risk alleles increases the sensitivity, specificity and PPV slightly.
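The combined figures can be cross-checked against the per-cohort counts of Table 4. The sketch below sums the DQ7 counts of the three cohorts and recovers the number of false results quoted above. The names `Counts` and `pool` are illustrative, and the simple summation is an assumption about how the cohorts were combined.

```python
from collections import namedtuple

# tp/fp/fn/tn are chromosome counts: SNP prediction versus official typing.
Counts = namedtuple("Counts", "tp fp fn tn")

# DQ7 counts per cohort, taken from Table 4.
DQ7_BY_COHORT = {
    "cases": Counts(27, 4, 1, 610),
    "control trios": Counts(30, 1, 2, 373),
    "blood bank": Counts(35, 1, 0, 384),
}


def pool(tables):
    """Sum 2x2 counts cell by cell across cohorts."""
    return Counts(*(sum(cells) for cells in zip(*tables)))


pooled = pool(DQ7_BY_COHORT.values())
false_results = pooled.fp + pooled.fn
total = sum(pooled)
print(pooled, false_results, total)
```

The pooled table gives nine false results among 1468 chromosomes, in line with the 0.6% reported for DQ7.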

CONCLUSION

In the research that led to the present invention, a tagging approach was applied that exploits linkage disequilibrium between single nucleotide polymorphisms (SNPs) and the CD-associated HLA risk factors: DQ2.5 and DQ8, which confer direct risk, and DQA1*0201/DQB1*0202 (DQ2.2) and DQA1*0505/DQB1*0301 (DQ7), which contribute to the risk for CD conferred by DQ2.5. In order to evaluate the predictive power of this approach, the DQ types predicted from six tag SNPs were compared with those obtained with current validated laboratory typing methods for the HLA-DQA1 and HLA-DQB1 genes in three large cohorts. The results show that this tag SNP method is very accurate and provides a good basis for e.g. population screening for CD.

References:

Carrington M, Miller T, White M, Gerrard B, Stewart C, Dean M, et al. Typing of HLA-DQA1 and DQB1 using DNA single-strand conformation polymorphism. Hum Immunol. 1992 Mar;33(3):208-12.

Ronningen KS, Spurkland A, Iwe T, Vartdal F, Thorsby E. Distribution of HLA-DRB1, -DQA1 and -DQB1 alleles and DQA1-DQB1 genotypes among Norwegian patients with insulin-dependent diabetes mellitus. Tissue Antigens. 1991 Mar;37(3):105-11.

Olerup O, Zetterquist H. HLA-DR typing by PCR amplification with sequence-specific primers (PCR-SSP) in 2 hours: an alternative to serological DR typing in clinical practice including donor-recipient matching in cadaveric transplantation. Tissue Antigens. 1992 May;39(5):225-35.

Buyse I, Decorte R, Baens M, Cuppens H, Semana G, Emonds MP, et al. Rapid DNA typing of class II HLA antigens using the polymerase chain reaction and reverse dot blot hybridization. Tissue Antigens. 1993 Jan;41(1):1-14.

Karell K, Louka AS, Moodie SJ, Ascher H, Clot F, Greco L, et al. HLA types in celiac disease patients not carrying the DQA1*05-DQB1*02 (DQ2) heterodimer: results from the European Genetics Cluster on Celiac Disease. Hum Immunol. 2003 Apr;64(4):469-77.

Louka AS, Sollid LM. HLA in coeliac disease: unravelling the complex genetics of a complex disorder. Tissue Antigens. 2003 Feb;61(2):105-17.

Reinton N, Helgheim A, Shegarfi H, Moghaddam A. A one-step real-time PCR assay for detection of DQA1*05, DQB1*02 and DQB1*0302 to aid diagnosis of celiac disease. J Immunol Methods. 2006 Oct 30;316(1-2):125-32.

De Bakker PI, McVean G, Sabeti PC, Miretti MM, Green T, Marchini J, et al. A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet. 2006 Oct;38(10):1166-72.

Monsuur AJ, de Bakker PI, Alizadeh BZ, Zhernakova A, Bevova MR, Strengman E, et al. Myosin IXB variant increases the risk of celiac disease and points toward a primary intestinal barrier defect. Nat Genet. 2005 Dec;37(12):1341-4.

Van Belzen MJ, Meijer JW, Sandkuijl LA, Bardoel AF, Mulder CJ, Pearson PL, et al. A major non-HLA locus in celiac disease maps to chromosome 19. Gastroenterology. 2003 Oct;125(4):1032-41.

Walker-Smith J, Guandalini S, Schmitz J, Shmerling D, Visakorpi JK. Revised criteria for diagnosis of coeliac disease. Report of Working Group of European Society of Paediatric Gastroenterology and Nutrition. Arch Dis Child. 1990 Aug;65(8):909-11.

Van Belzen MJ, Koeleman BP, Crusius JB, Meijer JW, Bardoel AF, Pearson PL, et al. Defining the contribution of the HLA region to cis DQ2-positive coeliac disease patients. Genes Immun. 2004 May;5(3):215-20.

Verduyn W, Anholts JD, Versluis LF, Parlevliet J, Drabbels J, De Meester J, et al. Six newly identified HLA-DRB alleles: DRB1*1121, *1419, *1420, *1421, DRB3*0203 and DRB5*0103. Tissue Antigens. 1996 Aug;48(2):80-6.

De Bakker PI, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet. 2005 Nov;37(11):1217-23.

Table 1.

rs number   Allele call   Assay type   Assay number    Basepair location   Drop out rate (%)
            VIC   FAM
rs2395182   G     T       On Demand    C 11409965 10   32521295            1.32
rs4713586   A     G       On Demand    C 27950246 10   32767560            2.90
rs4639334   A     G       On Demand    C 42975350 10   32710192            2.90
rs7454108   G     A       By Design    -               32789461            1.85
rs7775228   A     G       By Design    -               32766057            0.93
rs2187668   T     C       By Design    -               32713862            3.57

rs number   Primer or reporter   Sequence
rs7454108   Forward primer       ACTATTATTTCTCCAAGTTCTGACTTCCCT
            Reverse primer       GCCAAGTTGGAATAAGCCCACTATA
            VIC reporter         CAAAATAGCATGAGTATTAG
            FAM reporter         AAAATAGCATGAATATTAG
rs7775228   Forward primer       AGGAAAGGAACTATCTGGGTATGGA
            Reverse primer       TGCAAAGCCCCTTTATCATTATCCT
            VIC reporter         TTCAATCACAATCTTGC
            FAM reporter         TCAATCACAGTCTTGC
rs2187668   Forward primer       GTGAGGTGACACATATGAGGCAG
            Reverse primer       GGCTGAATGCCTTCAACAATCATTT
            VIC reporter         CTGAGAGTAAATGAGGACC
            FAM reporter         TGAGAGTAAGTGAGGACC

Table 2. DQ molecules and tested tag SNPs

a) DQ molecules, the corresponding HLA-DQA1* and -DQB1* alleles, with the DR type and the tag SNPs

DQ type   DQA1   DQB1   DR   Tag SNP(s)             Positive predicting allele(s)   Tag SNP     Negative predicting allele
DQ2.2     0201   0202   7    rs2395182, rs7775228   T, G                            rs4713586   G
DQ2.5     0501   0201   3    rs2187668              T
DQ7       0505   0301   5    rs4639334              A
DQ8       0301   0302   4    rs7454108              G

b) Combinations of the DQ molecules on the two chromosomes, the number of functional copies and the genetic risk associated with celiac disease (calculated using the case cohort and the blood bank cohort). * This risk increases to 4.1 in the DQ2.5-negative group.

DQ molecule 1    DQ molecule 2            Number of functional copies   Genetic risk
DQ2.5            rest                     ≥1                            5.5
DQ2.5            DQ2.5                    4                             13.1
DQ2.5            no DQ2.2, DQ2.5, DQ7     1                             1.3
DQ2.5            no DQ2.5                 1-2                           2.5
DQ2.5            DQ2.2                    2                             10.1
DQ2.2 or DQ2.5   rest                     1-4                           24.4
DQ2.2            DQ7                      1                             1.8*
DQ2.2            no DQ2.5, DQ7            0                             -
DQ7              no DQ2.2, DQ2.5          0                             -
DQ2.5            DQ7                      2                             -
DQ8              rest                     1                             -
DQ8              DQ8                      4                             -

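Table 3. Prediction results; combined cohorts. Rows give the SNP prediction, columns the official typing; counts are chromosomes.

DQ2.2 (3 SNPs)
                   Official +   Official -   Total
SNP prediction +        124            6       130
SNP prediction -          1         1317      1318
Total                   125         1323      1448
Sensitivity 0.992; specificity 0.995; positive predictive value 0.954; r-squared 0.941

DQ2.5 (1 SNP)
                   Official +   Official -   Total
SNP prediction +        565            5       570
SNP prediction -          2          888       890
Total                   567          893      1460
Sensitivity 0.996; specificity 0.994; positive predictive value 0.991; r-squared 0.980

DQ7 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         92            6        98
SNP prediction -          3         1367      1370
Total                    95         1373      1468
Sensitivity 0.968; specificity 0.996; positive predictive value 0.939; r-squared 0.903

DQ8 (1 SNP)
                   Official +   Official -   Total
SNP prediction +        109            7       116
SNP prediction -          1         1369      1370
Total                   110         1376      1486
Sensitivity 0.991; specificity 0.995; positive predictive value 0.940; r-squared 0.926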

Table 4. Predictive results: a) DQ2.2 per cohort, b) DQ2.5 per cohort, c) DQ7 per cohort, d) DQ8 per cohort. Rows give the SNP prediction, columns the official typing; counts are chromosomes.

Cases, DQ2.2 (3 SNPs)
                   Official +   Official -   Total
SNP prediction +         75            3        78
SNP prediction -          1          557       558
Total                    76          560       636
Sensitivity 0.987; specificity 0.995; positive predictive value 0.962; r-squared 0.942

Control trios, DQ2.2 (3 SNPs)
                   Official +   Official -   Total
SNP prediction +         15            0        15
SNP prediction -          0          377       377
Total                    15          377       392
Sensitivity 1.000; specificity 1.000; positive predictive value 1.000; r-squared 1.000

Blood bank cohort, DQ2.2 (3 SNPs)
                   Official +   Official -   Total
SNP prediction +         34            3        37
SNP prediction -          0          383       383
Total                    34          386       420
Sensitivity 1.000; specificity 0.992; positive predictive value 0.919; r-squared 0.911

Cases, DQ2.5 (1 SNP)
                   Official +   Official -   Total
SNP prediction +        351            4       355
SNP prediction -          2          281       283
Total                   353          285       638
Sensitivity 0.994; specificity 0.986; positive predictive value 0.989; r-squared 0.962

Control trios, DQ2.5 (1 SNP)
                   Official +   Official -   Total
SNP prediction +        157            1       158
SNP prediction -          0          238       238
Total                   157          239       396
Sensitivity 1.000; specificity 0.996; positive predictive value 0.994; r-squared 0.989

Blood bank cohort, DQ2.5 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         57            0        57
SNP prediction -          0          369       369
Total                    57          369       426
Sensitivity 1.000; specificity 1.000; positive predictive value 1.000; r-squared 1.000

Cases, DQ7 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         27            4        31
SNP prediction -          1          610       611
Total                    28          614       642
Sensitivity 0.964; specificity 0.993; positive predictive value 0.871; r-squared 0.833

Control trios, DQ7 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         30            1        31
SNP prediction -          2          373       375
Total                    32          374       406
Sensitivity 0.938; specificity 0.997; positive predictive value 0.968; r-squared 0.900

Blood bank cohort, DQ7 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         35            1        36
SNP prediction -          0          384       384
Total                    35          385       420
Sensitivity 1.000; specificity 0.997; positive predictive value 0.972; r-squared 0.970

Cases, DQ8 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         35            4        39
SNP prediction -          1          612       613
Total                    36          616       652
Sensitivity 0.972; specificity 0.994; positive predictive value 0.897; r-squared 0.865

Control trios, DQ8 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         31            2        33
SNP prediction -          0          369       369
Total                    31          371       402
Sensitivity 1.000; specificity 0.995; positive predictive value 0.939; r-squared 0.934

Blood bank cohort, DQ8 (1 SNP)
                   Official +   Official -   Total
SNP prediction +         43            1        44
SNP prediction -          0          388       388
Total                    43          389       432
Sensitivity 1.000; specificity 0.997; positive predictive value 0.977; r-squared 0.975