Title:
METHOD FOR BLOOD GROUP GENOTYPING
Document Type and Number:
WIPO Patent Application WO/2017/194973
Kind Code:
A1
Abstract:
The present invention provides a method for determining a Kidd (JK) blood group phenotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the JK reference sequences of SEQ ID NO: 19-22; and determining the level of homology between the patient's genetic sequence and the reference sequence(s), wherein homology of at least 80% indicates one of the patient's JK phenotypes as identified by the reference sequence and wherein homology of about 40% to about 60% indicates a hybrid phenotype. The present invention additionally provides a method for determining the RHD haplotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the RHD reference sequences of SEQ ID NOs:23-25; and determining the level of homology between the patient's genetic sequence and the reference sequence(s), wherein homology of at least 80% indicates one of the patient's RHD haplotypes as identified by the reference sequence.

Inventors:
AVENT NEIL DAVID (GB)
MADGETT TRACEY ELIZABETH (GB)
HALAWANI AMR JAMAL (GB)
ALTAYAR MALIK (GB)
Application Number:
PCT/GB2017/051347
Publication Date:
November 16, 2017
Filing Date:
May 15, 2017
Assignee:
UNIV PLYMOUTH (GB)
International Classes:
C12Q1/68
Other References:
RHIANNON S. MCBEAN ET AL: "Approaches to Determination of a Full Profile of Blood Group Genotypes: Single Nucleotide Variant Mapping and Massively Parallel Sequencing", COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, vol. 11, no. 19, 1 September 2014 (2014-09-01), Sweden, pages 147 - 151, XP055388761, ISSN: 2001-0370, DOI: 10.1016/j.csbj.2014.09.009
NEIL D. AVENT: "Large-scale blood group genotyping - clinical implications", BRITISH JOURNAL OF HAEMATOLOGY, vol. 144, no. 1, 1 January 2009 (2009-01-01), pages 3 - 13, XP055130465, ISSN: 0007-1048, DOI: 10.1111/j.1365-2141.2008.07285.x
LOUISE TILLEY ET AL: "Is Next Generation Sequencing the future of blood group testing?", TRANSFUSION AND APHERESIS SCIENCE, vol. 50, no. 2, 1 April 2014 (2014-04-01), GB, pages 183 - 188, XP055388764, ISSN: 1473-0502, DOI: 10.1016/j.transci.2014.02.013
Attorney, Agent or Firm:
WITHERS & ROGERS LLP et al. (GB)
Claims:
1. A method for determining a Kidd (JK) blood group phenotype of a patient, the method comprising:

analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the JK reference sequences of SEQ ID NO: 19-22;

and

determining the level of homology between the patient's genetic sequence and the reference sequence(s),

wherein homology of at least 80% indicates one of the patient's JK phenotypes as identified by the reference sequence and wherein homology of about 40% to about 60% indicates a hybrid phenotype.

2. A method according to claim 1, wherein homology of at least 90% indicates the patient's JK phenotype as identified by the reference sequence.

3. A method according to claim 1 or claim 2, wherein homology of at least 95% indicates the patient's JK phenotype as identified by the reference sequence.

4. A method according to any of claims 1 to 3, wherein the genetic sequence of the patient's blood group defining alleles is obtained from a sample.

5. A method according to claim 4, wherein the sample is a whole blood sample, or any sample from which DNA can be derived.

6. A method according to any of claims 1 to 5, wherein the genetic sequence of the patient's blood group defining alleles is obtained by next generation sequencing (NGS) of the sample.

7. A method for determining the RHD haplotype of a patient, the method comprising: analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the RHD reference sequences of SEQ ID NOs:23-25;

and

determining the level of homology between the patient's genetic sequence and the reference sequence(s),

wherein homology of at least 80% indicates one of the patient's RHD haplotypes as identified by the reference sequence.

8. A method according to claim 7, wherein homology of at least 90% indicates the patient's RHD haplotype as identified by the reference sequence.

9. A method according to claim 7 or claim 8, wherein homology of at least 95% indicates the patient's RHD haplotype as identified by the reference sequence.

10. A method according to any of claims 7 to 9, wherein the genetic sequence of the patient's blood group defining alleles is obtained from a sample.

11. A method according to claim 10, wherein the sample is a whole blood sample.

12. A method according to any of claims 7 to 11, wherein the genetic sequence of the patient's blood group defining alleles is obtained by next generation sequencing (NGS) of the sample.

13. A method for diagnosing hemolytic disease of a newborn baby, the method comprising determining the JK blood group phenotype and/or RHD haplotype of a mother and newborn baby according to the method of any of claims 1 to 12, and comparing the JK blood group phenotype and/or RHD haplotype of the mother with that of the newborn baby, wherein a difference in JK phenotype and/or RHD haplotype indicates hemolytic disease of the newborn.

14. The method of claim 13, further comprising treating the newborn with one or more of temperature stabilization, phototherapy, transfusion with compatible packed red blood cells, exchange transfusion with a blood type compatible with both the newborn baby and the mother, sodium bicarbonate for correction of acidosis and/or assisted ventilation.

Description:
METHOD FOR BLOOD GROUP GENOTYPING

Field of Invention

The present invention relates to methods for determining the predicted blood group phenotype of a patient using reference sequences comprising single nucleotide polymorphism (SNP) profiles. Methods of determining RHD haplotype using reference sequences comprising SNP profiles are also disclosed.

Background to the Invention

Human blood group polymorphisms have been almost completely defined and have been applied in DNA-based determination of blood groups in clinical situations, such as multi-transfused patients. Blood group genotyping (BGG) has impacted significantly on routine clinical assessment of both patients and blood donors. This has been very significant in the management of sickle cell disease (SCD) (and indeed other hemoglobinopathies), where individuals of African ancestry have different frequencies and complexities of blood group alleles, resulting in high alloimmunization frequencies. For SCD, high degrees of variation within the RH genes account for most alloimmunization, but many other blood group polymorphisms (in multiple systems) have been shown to be causative, including antigens from the Duffy (FY) and Kidd (JK) systems. The gene encoding JK is found on chromosome 18 and three JK alleles include Jk(a) (JK*01), Jk(b) (JK*02) and JK*01W (JK*Aweak). These alleles result in three common phenotypes: Jk(a+b-), Jk(a-b+) and Jk(a+weak). A fourth phenotype is the JK-null phenotype Jk(a-b-), but this is rare in most populations. The Jk(a) antigen is important in transfusion medicine. People with two Jk(a) antigens, for instance, may form antibodies against donated blood containing two Jk(b) antigens (and thus no Jk(a)). This can lead to hemolytic anemia, in which the body destroys the transfused blood, leading to low red blood cell counts. Hemolytic disease of the newborn is also associated with the JK antigen.

JK has been well characterized at the genomic level, and there are currently 36 JK (SLC14A1) alleles as described in the Blood Group Antigen gene database (http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.cgi?cmd=bgmut/systems). A significant number of JK alleles cause null phenotypes, as the mutation ablates the expression of the Jk protein, a urea transporter on the red cell. Furthermore, mutations causing certain amino acid substitutions cause weakened expression of either Jk(a) or Jk(b) antigens.

The Rhesus (Rh) blood group system is the second most important blood group after ABO. The principal antigens are D, C, E, c and e. The RHD gene encodes the D antigen and variants, and the RHCE gene encodes the C, E, c and e antigens and variants. Recombination, deletion, and point mutations in these genes generate Rh allelic diversity, which makes the Rh blood group the most polymorphic blood group system. In the past decade, different DNA microarray-based tests were introduced that enable genotyping of variant blood group specific single nucleotide polymorphisms (SNPs). RHD antigens are particularly associated with hemolytic disease of the newborn.

All commercially available BGG platforms utilize PCR amplification of polymorphic regions of genes responsible for blood group antigen expression, followed by their detection using bead or glass arrays. This methodology has the inherent weakness that the platform requires previous knowledge of the defined blood group allele, and consequently the array needs constant reassessment with additional new probes as deemed necessary. Next generation sequencing (NGS) of DNA has revolutionized medicine, and the present inventors have started to use this technology to assess blood group antigens. The inventors have assessed the clinically significant RHD and JK blood groups using an NGS protocol that involves long-range PCR (LR-PCR), followed by fragmentation of the PCR products and Ion Torrent PGM™ sequencing. The inventors used a small cohort of samples of genomic DNA (gDNA) derived from blood donors of known Jk phenotypes and RhD haplotypes. The inventors have identified the following novel findings: (1) The previously described weakening mutation Glu44Lys, reported to cause weakened Jk(a) antigen expression, is common amongst the cohort sequenced, and these individuals demonstrate no apparent alteration in expression of these antigens. (2) There are distinct intronic polymorphisms that closely map with various JK genotypes and RHD haplotypes. (3) A previously described purported JK*B null allele is common (10/67 samples), and shows no effect on expression of the Jk(b) antigen. Critically, despite sequencing a relatively small cohort of individuals, the inventors have been able to establish reference alleles for the clinically significant JK and RHD blood group systems. This will be of crucial importance for the routine implementation of NGS in BGG for all other blood group systems.

Summary of the Invention

The present inventors have determined DNA reference sequences for two of the common Jk phenotypes, namely Jk(a-b+) (SEQ ID NO:20) and Jk(a+b-) (SEQ ID NOs: 19, 21 and 22). These reference sequences can be used to quickly and accurately determine a patient's Jk phenotype.

Accordingly, in a first aspect the present invention provides a method for determining a Kidd (JK) blood group phenotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the Jk phenotype reference sequences of SEQ ID NOs: 19-22; and determining the level of homology between the patient's genetic sequence and the reference sequence(s), wherein homology of at least 80% indicates one of the patient's Jk phenotypes as identified by the reference sequence and wherein homology of about 40% to about 60% indicates a hybrid phenotype.

The present inventors have also determined RHD reference sequences for three common RHD haplotypes, namely R1/R0 (SEQ ID NO:23), R2 (SEQ ID NO:24) and R2 weak D type 2 (SEQ ID NO:25). These reference sequences can be used to quickly and accurately determine a patient's RHD haplotype. They were defined by sequencing hemizygous individuals (as defined by zygosity testing) and thus by definition are unique RHD sequences possessing just one copy of the RHD gene.

Accordingly, in a second aspect the present invention provides a method for determining the RHD haplotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the RHD reference sequences of SEQ ID NOs:23-25; and determining the level of homology between the patient's genetic sequence and the reference sequence(s), wherein homology of at least 80% indicates one of the patient's RHD haplotypes as identified by the reference sequence.

In alternative embodiments, the present invention provides a method for determining a blood group phenotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles, comparing that sequence to at least one of the reference single nucleotide polymorphism (SNP) profiles of Table A and/or Table B, and determining the level of homology between the patient's genetic sequence and the reference profile(s), wherein homology of at least 80% indicates one of the patient's blood groups as identified by the reference SNP profile and wherein homology of about 40% to about 60% indicates a hybrid gene.

Table A:

43310180 (intron 3) A A G

43310187 (intron 3) A/g* G/a* A

43310415 (exon 4) G G A

43310845 (intron 4) G/c* C/g* G

43310851 (intron 4) A/g* G/a* G

43311380 (intron 5) C/g* G/c* C

43311566 (intron 5) A/g* G/a* A

43311696 (intron 5) T/c* C/t* T

43313128 (intron 5) C/g*/t G/c* C

43314962 (intron 6) C C G

43315027 (intron 6) T/c* C/t* T

43316110 (intron 6) T/c* C/t* T

43316270 (intron 6) C/t* T/c* T

43316538 (exon 7) A/g* G/a* G

43317005 (intron 7) T/c T c

43317249 (intron 7) A/g* G/a* G

43317252 (intron 7) G A/g* A

43317528 (intron 7) C/t* T T

43318694 (intron 7) A/t* T A

43318746 (intron 7) G A G

43319060 (intron 7) C/t* T C

43319274 (exon 8) G G/A G

43319519 (exon 9) G A G

43320438 (intron 9) A G A

43320714 (intron 9) C/g* G G

43320771 (intron 9) T/g* G G

43321412 (intron 9) A G A

43321558 (intron 9) G DEL G/g* G

43321563 (intron 9) A G/a A

43322410 (intron 9) T/c* C/t* C

43323624 (intron 9) G/a* A/g* A

43325159 (intron 9) G/a* A/g* A

43327294 (intron 9) C/t* T/c* T

43329198 (intron 10) A/g* G/a* G

43329310 (intron 10) A/g* G/a* G

43329590 (intron 10) T/g* G/t* T

Table B:

Columns, in order: position within human genome (Chr18, Hg19); reference SNP profile JK*01.X; reference SNP profile JK*02.X; reference SNP profile JK*01W.X.

In all three profile columns, lowercase nucleotides indicate a low frequency allele. In the JK*01.X and JK*02.X columns, * indicates a variable SNP due to a hybrid or chimeric gene. BOLD indicates a JK*B identifier SNP indicating a hybrid gene.

43303687 (promoter) Cl2i (JK*01.04) C CI& (JK*02.07) c

43303834 (promoter) T/c (JK*01.02 and 03) T c

43303932 (promoter) C/g (JK*01.02 and 03) C c c

43304182 (exon 1) G G A

43304783 (exon 1) G G G (A JK*01W.02)

43305785 (intron 2) G/a* (JK*01.05) A/g* (JK*02.08) G

43306681 (intron 2) A/g* (JK*01.05) G/a* (JK*02.08) A

43306793 (intron 2) G/a* (JK*01.05) A/g* (JK*02.08) G

43306780 (intron 2) C C/g (JK*02.08) C

43306891 (intron 2) G/a (JK*01.02 and 03) G G

43307072 (intron 2) T/c* (JK*01.05) C/t* (JK*02.08) T

43307246 (intron 2) C/t* (JK*01.05) T/c* (JK*02.08) c

43307338 (exon 3) C/a* (JK*01.05) A/c* (JK*02.08) c

43307455 (intron 3) T/c (JK*01.02 and 03) T T

43308083 (intron 3) T T c

43308188 (intron 3) C/t* (JK*01.05) T/c* (JK*02.08) c

43308198 (intron 3) G/t* (JK*01.05) T/g* (JK*02.08) G

43308230 (intron 3) T T/g (JK*02.02) T

43308315 (intron 3) A/g* (JK*01.05) G/a*(JK*02.08) A

43308810 (intron 3) A/t* (JK*01.05) T/a* (JK*02.08) A

43308889 (intron 3) C/g* (JK*01.05) G/c* (JK*02.08) C

43308955 (intron 3) C/t* (JK*01.05) T/c* (JK*02.08) C

43309135 (intron 3) G/t* (JK*01.05) T/g* (JK*02.08) G

43309355 (intron 3) C/g* (JK*01.05) G/c* (JK*02.08) C

43309703 (intron 3) G/a (JK*01.05) A/g* (JK*02.08) G

43309809 (intron 3) T/c* (JK*01.05) C/t* (JK*02.08) T

43309911 (intron 3) T/c* (JK*01.02/03/05*) C/t* (JK*02.08) C

43310180 (intron 3) A A G

43310187 (intron 3) A/g* (JK*01.05) G/a* (JK*02.08) A

43310415 (exon 4) G G A

43310845 (intron 4) G/c* (JK*01.05) C/g* (JK*02.08) G

43310851 (intron 4) A/g* (JK*01.02/03/05) G/a* (JK*02.08) G

43311380 (intron 5) C/g* (JK*01.05) G/c*(JK*02.08) C

43311399 (intron 5) C/t* (JK*01.05) T/c* (JK*02.08) C

43311483 (intron 5) A/g (JK*01.02/03) A A

43311566 (intron 5) A/g* (JK*01.05) G/a* (JK*02.08) A

43311696 (intron 5) T/c (JK*01.05)* C/t* (JK*02.08) T

43313128 (intron 5) C/g*(JK*01.05)/t(JK*01.03) G/c*(JK*02.08) C

43313367 (intron 5) C/t* (JK*01.05) T/c*(JK*02.08) c

43313916 (intron 5) A A A/g (JK*01W.03)

43314473 (intron 6) T/a* (JK*01.05) A/t* (JK*02.08) T

43314545 (intron 6) C/t* (JK*01.05) T/c* (JK*02.08) c

43314962 (intron 6) c C G

43315027 (intron 6) T/c* (JK*01.05) C/t*(JK*02.08) T

43315591 (intron 6) A/c* (JK*01.05) c A

43316110 (intron 6) T/c* (JK*01.05) c T

43316270 (intron 6) C/t* (JK*01.02/03/05) T T

43316538 (exon 7) A/g* (JK*01.05) G G

43316901 (intron 7) G G T

43316966 (intron 7) T C c

43317005 (intron 7) T T c

43317222 (intron 7) G G A

43317249 (intron 7) A/g (JK*01.02/03) G G

43317252 (intron 7) G A/g (JK*02.05) A

43317282 (intron 7) T C C

43317528 (intron 7) c T T

43317547 (intron 7) c T T

43317866 (intron 7) c G c

43329590 (intron 10) T/g (JK*01.02/03) G/t* (JK*02.08/09) T

43329719 (intron 10) G G A

The capital letters used in Tables A and B denote that in >90% of cases the SNP will be that nucleotide. A lower case letter indicates that the SNP occurs only rarely or is variable as indicated in a specific allele. Specific alleles for low frequency mutations are shown in brackets. Asterisks refer to hybrid genes where the sequence switches from JK*A (JK*01) to JK*B (JK*02) or vice versa. When determining homology the patient's genetic sequence may match high or low frequency alleles of a reference SNP.
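The table notation described above can be read mechanically. The following is a minimal sketch, not the inventors' implementation: it assumes one observed nucleotide per position, treats "homology" as the share of profiled positions at which the patient's nucleotide is listed (as either the high or the low frequency allele), and assumes the unlabeled Table A columns follow the JK*01.X, JK*02.X, JK*01W.X order. The function names and the six-position subset are illustrative choices.

```python
def parse_cell(cell: str):
    """Split a profile cell such as 'A/g*' into (common, rare) allele sets.

    Uppercase tokens are read as high frequency alleles, lowercase tokens as
    low frequency alleles; the hybrid marker '*' is dropped.
    """
    common, rare = set(), set()
    for token in cell.replace("*", "").split("/"):
        token = token.strip()
        if token:
            (common if token.isupper() else rare).add(token.upper())
    return common, rare


def profile_match(patient: dict, profile: dict) -> float:
    """Percentage of shared positions where the patient allele is listed."""
    shared = [pos for pos in profile if pos in patient]
    if not shared:
        raise ValueError("no overlapping positions")
    hits = 0
    for pos in shared:
        common, rare = parse_cell(profile[pos])
        if patient[pos].upper() in common | rare:
            hits += 1
    return 100.0 * hits / len(shared)


# Six rows copied from Table A (assumed column order: JK*01.X, JK*02.X).
JK01 = {43310180: "A", 43310187: "A/g*", 43310415: "G",
        43318746: "G", 43319519: "G", 43320438: "A"}
JK02 = {43310180: "A", 43310187: "G/a*", 43310415: "G",
        43318746: "A", 43319519: "A", 43320438: "G"}

# Hypothetical patient calls, one nucleotide per position.
patient = {43310180: "A", 43310187: "G", 43310415: "G",
           43318746: "A", 43319519: "A", 43320438: "G"}

print(profile_match(patient, JK02))
print(profile_match(patient, JK01))
```

With this toy subset the hypothetical patient matches every JK*02.X position and half of the JK*01.X positions. A real comparison would use the full profiles and diploid genotype calls.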

In a further embodiment the present invention provides a method for determining the RHD haplotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles, comparing that sequence to at least one of the reference single nucleotide polymorphism (SNP) profiles of Table C and/or Table D; and determining the level of homology between the patient's genetic sequence and the reference profile(s), wherein homology of at least 80% indicates one of the patient's RHD haplotypes as identified by the reference SNP profile.

Table C:

Table D:

Columns, in order: SNP position (Hg38) in RHD gene; intron; sequence in R2 haplotype; sequence in R1/R0 haplotype; sequence in weak D type 2 (DcE).

25,273,231 Intron 1 T T G

25,277,881/2 Intron 1 TG TG CC

25,277,954 Intron 1 G G T

25,277,983/4 Intron 1 TA TA Del-Del

25,280,027 Intron 1 C C Del

25,282,654 Intron 1 G A G

25,284,544 Intron 1 G C G

25,285,089 Intron 2 A G A

25,287,909 Intron 2 G C G

25,292,953 Intron 3 G A G

25,295,072 Intron 3 A G A

25,295,317 Intron 3 G A G

25,295,354 Intron 3 T C T

25,295,489 Intron 3 T C T

25,295,708 Intron 3 A G A

25,295,731 Intron 3 G A G

25,295,739 Intron 3 A G A

25,295,753 Intron 3 G A G

25,295,797 Intron 3 T A T

25,295,800 Intron 3 G A G

25,296,764 Intron 3 A C A

25,297,476 Intron 3 A G A

25,298,410 Intron 3 G C G

25,298,980 Intron 3 C T C

25,304,945 Intron 6 A T A

25,305,898 Intron 6 G A G

25,307,040 Intron 7 G C G

25,307,714 Intron 7 A G A

25,308,845 Intron 7 C G C

25,311,416 Intron 7 C C T

25,311,439-442 Intron 7 GGCA GGCA CTTT

25,311,447 Intron 7 C C T

25,311,450 Intron 7 G G A

25,311,453 Intron 7 A A T

25,311,456/7 Intron 7 CA CA TG

25,311,461 Intron 7 C C G

25,311,504 Intron 7 G G A

25,311,516 Intron 7 C C G

25,311,527/8 Intron 7 AA AA TT

25,311,530 Intron 7 G G A

25,311,722/3 Intron 7 TG GT TG

25,316,269 Intron 7 G A G

25,320,257 Intron 8 A C A

25,320,442 Intron 8 G T G

25,321,858 Intron 8 C T C

25,323,618 Intron 9 C G C

25,324,869 Intron 9 C Del C

25,329,839 Intron 10 A T A

The present invention also provides an oligonucleotide primer, wherein the primer is complementary to at least ten consecutive nucleotides from one of the following regions:

The present invention also provides a method for determining whether a patient is JK*A and/or JK*B positive, the method comprising contacting a sample obtained from the patient with first and second oligonucleotide primers complementary to at least one JK*A and/or at least one JK*B SNP of Table A and determining the presence or absence of an amplicon, wherein the presence of the amplicon indicates that the patient is JK*A and/or JK*B positive.

Description

Most current blood group genotyping methods require the blood group polymorphism under investigation to be known, and almost without exception involve the analysis of exonic (and causative) mutations. This can provide a weakness in the genotyping methods because new mutations remain invisible to diagnosis and intronic changes are ignored. The present inventors have made use of novel genomic sequence variation, mainly within the intronic regions of blood group active genes. The inventors have shown that distinctive patterns (genetic fingerprints) of single nucleotide polymorphisms exactly correlate with JK blood group haplotypes and phenotypes, and in the case of the RHD system, with haplotype. The method of the present invention operates in "discovery mode" in that the actual sequences of the genes are defined, rather than predefined as in currently applied commercially available genotyping platforms. The methods of the present invention can be applied to any blood group or platelet gene polymorphic variation once the reference sequence (or fingerprint) has been established.

The inventors have identified patterns of intronic (and some silent exonic) SNPs that have revealed fundamental information regarding the molecular basis of blood group polymorphisms that were unexpected and have not been described elsewhere. For example, different SNPs identify the JK*A and JK*B alleles, and may have a fundamental impact on JK genotyping. Haplotype specific intronic SNPs have been identified for RHD haplotypes such as cDE (R2), which can be found in at least 8 locations in the RHD gene and were completely unexpected. These will greatly aid zygosity testing and assist in RH genotype assignment. Using these patterns of SNPs the present inventors have derived reference sequences that will be of critical assistance both in future projects where blood group genotyping is required and in the development of bioinformatic tools that handle NGS data from blood group genotyping analyses.

Accordingly, the present invention provides a method for determining a Kidd (JK) blood group phenotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the JK reference sequences of SEQ ID NO: 19-22; and determining the level of homology between the patient's genetic sequence and the reference sequence(s), wherein homology of at least 80% indicates one of the patient's JK phenotypes as identified by the reference sequence and wherein homology of about 40% to about 60% indicates a hybrid phenotype.

Homology of at least 90% or at least 95% may indicate the patient's JK phenotype as identified by the reference sequence. In embodiments of the invention the homology may be at least 98% or at least 99%. The patient's genetic sequence may be homologous to within one or two SNPs of a reference sequence. In other words, homology between the patient's genetic sequence and a reference sequence may be close to 100%.
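The thresholds in this passage can be illustrated with a short sketch. It assumes that "homology" can be approximated as percent identity between two sequences that have already been aligned to equal length; the specification does not define the calculation, so this reading, the function names, and the hard cut-offs used for "about 40% to about 60%" are all assumptions.

```python
def percent_identity(patient: str, reference: str) -> float:
    """Percentage of aligned positions at which the two sequences agree."""
    if not reference or len(patient) != len(reference):
        raise ValueError("sequences must be pre-aligned, equal length and non-empty")
    matches = sum(p == r for p, r in zip(patient.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def classify(identity: float, match_threshold: float = 80.0,
             hybrid_range: tuple = (40.0, 60.0)) -> str:
    """Map an identity percentage onto the categories named in the text."""
    if identity >= match_threshold:
        return "match"
    if hybrid_range[0] <= identity <= hybrid_range[1]:
        return "possible hybrid"
    return "no call"


# Hypothetical 10-base toy sequences differing at the last position.
identity = percent_identity("ACGTACGTAA", "ACGTACGTAC")
print(identity, classify(identity))
```

Raising `match_threshold` to 90, 95, 98 or 99 reproduces the stricter embodiments. Values falling between the hybrid range and the match threshold are reported here as "no call" because the text does not say how they should be interpreted.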

Preferably the genetic sequence of the patient's blood group defining alleles is obtained by single molecule sequencing or next generation sequencing (NGS) of a sample obtained from the patient. NGS sequencing examines alleles derived from both maternally and paternally inherited chromosomes.

Interaction between maternal and paternal alleles or closely related genes can lead to hybrid or small segments of chimeric sequence within a gene. This interaction can lead to weakening of blood group antigenicity or in some cases expression of new blood group antigens or loss of epitopes (parts of antigens) and antigens. In the context of the present application such genes are referred to as hybrid genes.

The presence of hybrid genes can be determined by the detection of homozygote sequences against the reference sequences, for example hybrid RHD-RHCE-RHD (partial D and D-negative) or hybrid JK*A/JK*B alleles by reference sequencing to show the extent of the hybrid gene. For instance, in a hybrid gene comprising the 5' end resembling JK*A and 3' end resembling JK*B when sequenced in the presence of a normal JK*A allele, the 5' reference SNPs will be homozygous JK*A and 3' reference SNPs heterozygous JK*A/JK*B.
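The 5'-homozygous, 3'-heterozygous pattern described above suggests a simple scan for the point at which zygosity changes. The sketch below is one possible reading, not a method given in the specification. It assumes each reference SNP has already been summarised as "A" (homozygous JK*A), "B" (homozygous JK*B) or "AB" (heterozygous), and that increasing chromosome 18 coordinate runs 5' to 3' along the gene, as the promoter-to-intron-10 ordering of Table B implies. The calls in the example are hypothetical.

```python
def find_switches(calls):
    """Return (left_pos, right_pos, left_call, right_call) for every change.

    `calls` is a list of (position, call) tuples; call is "A", "B" or "AB".
    Positions are sorted so the scan runs 5' to 3'.
    """
    ordered = sorted(calls)
    switches = []
    for (pos1, call1), (pos2, call2) in zip(ordered, ordered[1:]):
        if call1 != call2:
            switches.append((pos1, pos2, call1, call2))
    return switches


# Hypothetical sample: JK*A-like 5' end, heterozygous 3' end.
sample = [
    (43310180, "A"), (43310415, "A"), (43317252, "A"),
    (43318746, "AB"), (43319519, "AB"), (43320438, "AB"),
]
print(find_switches(sample))
```

A single switch from "A" to "AB" is consistent with the example in the text, and the two flanking positions bracket where the hybrid sequence begins. Multiple switches might point to a small chimeric segment or to noisy calls.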

The genetic sequence of the patient's blood group defining alleles may be obtained from a sample, preferably a biological sample comprising genetic material obtained from the patient. The sample may be a blood sample, such as a whole blood sample. Alternatively, the sample may be a cell scraping, a biopsy tissue or other body fluid such as bone marrow, plasma, serum, cerebrospinal fluid, saliva, sperm, sputum, urine or stool.

The patient is preferably a human and may be a pregnant woman, a post-partum woman, a newborn baby or a fetus.

The reference sequence may be a JK (Kidd) blood group sequence as defined in any of SEQ ID NOs: 19-22. The Kidd glycoprotein is the red blood cell urea transporter, which maintains osmotic stability and shape of red blood cells. Antibodies formed against the Kidd antigens are a significant cause of delayed haemolytic transfusion reactions and are also a cause of haemolytic disease of the newborn and fetus.

The Kidd glycoprotein is encoded by the JK gene of which there are two main alleles: JK*A and JK*B. These alleles are codominant, meaning that it is possible to express both Jk(a) and Jk(b) antigens on RBCs from a single patient. A variant form of the JK gene, dubbed JK*Aweak, is thought to give rise to weakened Jk(a) antigen. However, the present inventors have shown it to be highly abundant (in one third of Jk(a+b-) phenotype individuals) with no effect on Jk(a) antigenicity. The JK gene is found on chromosome 18 at position 18q12-q21 and consists of 11 exons, of which exons 4-11 encode the mature protein. In embodiments of the invention the reference sequence may be the related reference SNP profiles of Tables A or B. Alternatively, the reference sequence may identify the Jk phenotype as Jk(a-b+) (SEQ ID NO:20) or Jk(a+b-) (SEQ ID NOs: 19, 21 and 22).

RHD zygosity can be defined by quantitative methods, and is helpful in the determination of paternal RHD zygosity to RhD negative mothers. An RHD positive individual can be homozygous, having two copies of RHD (e.g. CDe/CDe; cDE/CDe; cDE/cDE; cDe/cDe etc.), or hemizygous with just one RHD copy (e.g. CDe/cde; cDE/cdE; cDe/cde etc.). In embodiments of the invention the reference sequence may be R1/R0 (DCe/Dce) (SEQ ID NO:23), R2 (DcE) (SEQ ID NO:24), or R2 weak D type 2 (DcE) (SEQ ID NO:25) or related reference SNP profiles of Tables C or D.

The present invention additionally provides a method for determining the RHD haplotype of a patient, the method comprising analysing the genetic sequence of the patient's blood group defining alleles; comparing that sequence to at least one of the RHD reference sequences of SEQ ID NOs:23-25; and determining the level of homology between the patient's genetic sequence and the reference sequence(s), wherein homology of at least 80% indicates one of the patient's RHD haplotypes as identified by the reference sequence.

Homology of at least 90% or at least 95% may indicate the patient's RHD haplotype as identified by the reference sequence. In embodiments of the invention the homology may be at least 98% or at least 99%. The patient's genetic sequence may be homologous to within one or two SNPs of a reference sequence. In other words, homology between the patient's genetic sequence and a reference sequence may be close to 100%.
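For the RHD case, the same idea can be sketched against a handful of Table D positions. This is an illustrative sketch under several assumptions: the sample is hemizygous (one nucleotide per position), homology is scored as the fraction of matching SNP positions, and only seven of the Table D rows are used. The names `TABLE_D_SUBSET`, `score_haplotypes` and `call_haplotype` are hypothetical.

```python
# Subset of Table D (Hg38 positions): values are (R2, R1/R0, weak D type 2).
TABLE_D_SUBSET = {
    25311447: ("C", "C", "T"),
    25316269: ("G", "A", "G"),
    25320257: ("A", "C", "A"),
    25320442: ("G", "T", "G"),
    25321858: ("C", "T", "C"),
    25323618: ("C", "G", "C"),
    25329839: ("A", "T", "A"),
}
HAPLOTYPES = ("R2", "R1/R0", "weak D type 2")


def score_haplotypes(sample: dict) -> dict:
    """Percentage of shared positions matching each reference haplotype."""
    shared = [pos for pos in TABLE_D_SUBSET if pos in sample]
    if not shared:
        raise ValueError("sample covers none of the reference positions")
    scores = {}
    for idx, name in enumerate(HAPLOTYPES):
        hits = sum(sample[pos] == TABLE_D_SUBSET[pos][idx] for pos in shared)
        scores[name] = 100.0 * hits / len(shared)
    return scores


def call_haplotype(sample: dict, threshold: float = 80.0):
    """Best-scoring haplotype if it reaches the threshold, otherwise None."""
    scores = score_haplotypes(sample)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical hemizygous samples.
r1_like = {25311447: "C", 25316269: "A", 25320257: "C", 25320442: "T",
           25321858: "T", 25323618: "G", 25329839: "T"}
r2_like = {25311447: "C", 25316269: "G", 25320257: "A", 25320442: "G",
           25321858: "C", 25323618: "C", 25329839: "A"}
print(call_haplotype(r1_like), call_haplotype(r2_like))
```

In this subset R2 and weak D type 2 differ at only one of the seven positions, so an R2-like sample also scores above 80% against weak D type 2. Taking the best score, or applying one of the stricter thresholds, is one way to separate such closely related references.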

Preferably the genetic sequence of the patient's blood group defining alleles is obtained by single molecule sequencing or next generation sequencing (NGS) of a sample obtained from the patient. NGS sequencing examines alleles derived from both maternally and paternally inherited chromosomes.

The patient is preferably a human and may be a pregnant woman, a post-partum woman, a newborn baby or a fetus.

The genetic sequence of the patient's blood group defining alleles may be obtained from a sample; preferably the sample is a biological sample comprising genetic material and is obtained from the patient. The sample may be a blood sample, such as a whole blood sample. Alternatively, the sample may be a cell scraping, a biopsy tissue or other body fluid such as bone marrow, plasma, serum, cerebrospinal fluid, saliva, sperm, sputum, urine or stool.

The present invention additionally provides an oligonucleotide primer comprising at least 10 nucleotides being complementary to at least ten consecutive nucleotides of the JK gene that do not include nucleotides at positions 43306891, 43307455, 43311483, 43313309, 43319359 or 43329031 of chromosome 18 of the human genome sequence.

Oligonucleotide primers of the invention are preferably capable of hybridising to the polynucleotide of the JK portion of chromosome 18 or the RHD portion of chromosome 1 under high stringency conditions.

The present invention additionally provides a method for manufacturing a primer to the JK gene, the method comprising: determining a sequence of at least 10 consecutive nucleotides complementary to at least 10 consecutive nucleotides of the JK portion of chromosome 18 of the human genome sequence, wherein said at least 10 consecutive nucleotides do not include positions 43306891, 43307455, 43311483, 43313309, 43319359 or 43329031 of chromosome 18 of the human genome sequence, designing 5' and 3' primers based on said consecutive nucleotides and manufacturing the primers.
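The positional constraint in this passage can be checked in a few lines. The sketch below is illustrative only: it assumes 1-based inclusive genomic coordinates, treats a primer as a contiguous footprint starting at `start`, and uses a hypothetical function name. It covers only the length and excluded-position conditions, not melting temperature or specificity.

```python
# Variable positions on chromosome 18 that the primer footprint must avoid.
EXCLUDED = (43306891, 43307455, 43311483, 43313309, 43319359, 43329031)


def primer_site_ok(start: int, length: int,
                   excluded=EXCLUDED, min_length: int = 10) -> bool:
    """True if [start, start + length - 1] is long enough and avoids every
    excluded position."""
    if length < min_length:
        return False
    end = start + length - 1
    return not any(start <= pos <= end for pos in excluded)


print(primer_site_ok(43306880, 25))  # footprint spans 43306891
print(primer_site_ok(43306900, 25))  # footprint lies clear of all six positions
```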

The present invention further provides an oligonucleotide primer, wherein the primer is complementary to at least ten consecutive nucleotides from one of the following regions:

Columns, in order: chromosome; position within human genome.

18 (JK) 43301432

18 43312443

18 43310959

18 43322011

18 43318623

18 43333287

Preferably the position within the human genome as identified above refers to the finishing position of the primer, i.e. the 3' end.

The present invention also provides an oligonucleotide primer wherein the primer sequence is at least 80% homologous to one of the following nucleotide sequences:

JK:

GAAGCCCACTGCGAAATCCAAATAG

GAGGGCAAATGGGAGGTGATACAA

GCTTTACCTCATCCCTTCCAGACAA

GCTTCTGCCCTCTATTGTAACACTC

GCTTTGGGTCTCTGGCTTTAGTGTA

TTCCGTGCTAATCCTGTATCATGGG

RHD:

ATCCACTTTCCACCTCCCTGC

TCTTTGCACTTCTTCTGACAACA

CTGGGAGAGTGAAGCTGGGTGTGA

TTCATACACATCTCTACCCCCCCTC

GTTTGAGCCCAGGAGTTAGGGACCGAG

CCCACTGTGACCACCCAGCATTCTA

CATACCTTTGAATTAAGCACTTCAC

CAGAATGGCCTTTACCAGCCAT

GTTCAAGCTGTCAAGGAGACATCACTATACA

CCAGTTTTAAGAATTTGTCGGCCGGTCG

ATACATTCCATCCAGAACTGTTCACC

AGGCCAAGAGATCCTGGTGAAACTATCC

The oligonucleotide primer may be at least 90% or at least 95% or at least 98% homologous to the above sequences. In embodiments of the invention the oligonucleotide primer is 99% or 100% homologous to the above sequences.
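The homology tiers for primers can be illustrated as follows. This sketch assumes homology means position-by-position identity between a candidate and a listed primer of the same length, which is a simplification since no alignment is performed. Three of the JK primers listed above are used, and the candidate sequence is hypothetical.

```python
LISTED_JK_PRIMERS = [
    "GAGGGCAAATGGGAGGTGATACAA",
    "GCTTTACCTCATCCCTTCCAGACAA",
    "GCTTCTGCCCTCTATTGTAACACTC",
]


def identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def highest_tier(candidate: str, listed=LISTED_JK_PRIMERS,
                 tiers=(98.0, 95.0, 90.0, 80.0)):
    """Highest homology tier reached against any same-length listed primer,
    or None if the candidate falls below every tier."""
    best = max((identity(candidate, p) for p in listed
                if len(p) == len(candidate)), default=0.0)
    for tier in tiers:
        if best >= tier:
            return tier
    return None


# Hypothetical candidate: first listed primer with its final base changed.
print(highest_tier("GAGGGCAAATGGGAGGTGATACAT"))
```

One mismatch in a 24-base primer gives roughly 95.8% identity, so the candidate meets the 95% tier but not the 98% tier.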

Oligonucleotide primers of the invention are preferably capable of hybridising to the polynucleotide of the JK portion of chromosome 18 or the RHD portion of chromosome 1 under high stringency conditions.

The present invention also provides a method for determining whether a patient is JK*A and/or JK*B positive, the method comprising contacting a sample obtained from the patient with first and second oligonucleotide primers complimentary to at least one JK*A and/or at least one JK*B SNP of Table A or Table B and determining the presence or absence of an amplicon, wherein the presence of the amplicon indicates that the patient is JK*A and/or JK*B positive.

Preferably the JK*A SNP is 43317252 (intron 7) G. The JK*B SNP may include one or more of 43318746 (intron 7) A, 43319519 (exon 9) A, 43320438 (intron 9) G, or 43321412 (intron 9) G.

Presence of an amplicon may be determined, e.g. by gel electrophoresis or other conventional methods which will be familiar to the skilled person.
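The interpretation of the amplicon result can be sketched as follows. The sketch assumes two separate allele-specific reactions, one with primers targeting a JK*A SNP and one with primers targeting a JK*B SNP, each scored as amplicon present or absent; the function name and the wording of the returned calls are hypothetical.

```python
def interpret_amplicons(jka_amplicon, jkb_amplicon):
    """Translate presence/absence of allele-specific amplicons into a call."""
    calls = []
    if jka_amplicon:
        calls.append("JK*A")
    if jkb_amplicon:
        calls.append("JK*B")
    if not calls:
        # No amplicon in either reaction: no allele detected (or the reaction failed).
        return "neither JK*A nor JK*B detected"
    return " and ".join(calls) + " positive"


if __name__ == "__main__":
    print(interpret_amplicons(True, False))
    print(interpret_amplicons(True, True))
```

In practice an absent amplicon would normally be checked against an internal amplification control before being read as a negative result.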

The present invention additionally provides a method for diagnosing hemolytic disease of a newborn baby, the method comprising determining the JK blood group phenotype and/or RHD haplotype of a mother and newborn baby according to the methods of the invention, comparing the JK blood group phenotype and/or RHD haplotype of the mother with that of the newborn baby, wherein a difference in JK phenotype and/or RHD haplotype indicates hemolytic disease of the newborn. In the event that the newborn is diagnosed with hemolytic disease the method may further comprise treating the newborn baby with one or more of temperature stabilization, phototherapy, transfusion with compatible packed red blood cells, exchange transfusion with a blood type compatible with both the newborn baby and the mother, sodium bicarbonate for correction of acidosis and/or assisted ventilation.

Brief Description of the Drawings

The present invention will now be described in relation to specific embodiments in which:

Figure 1 shows a graphic representation of the JK gene and assignment of reference JK*A, JK*B and JK*Aweak (JK*01W) SNPs.

Figure 1A shows differences between the 67 complete assembled JK gene sequences and the reference human genome sequence Chr18: hg19. Each SNP is denoted by a vertical line, and numbered in accordance with its position on the human genome sequence.

Figures 1B and 1C show the identities of variant SNPs defined from NGS sequencing of JK*A, JK*A and JK*B, JK*B genotype samples. As the majority of individuals matched these "sequencing fingerprints" we have dubbed these the JK*A and JK*B reference sequences, and we have shown JK*A reference SNPs as red, JK*B reference SNPs as blue.

Figure 1D shows the defined sequence differences between JK*Aweak (JK*01W) allele containing samples and hg19. The derivation of each SNP from either JK*A or JK*B is indicated by red or blue lines respectively, and JK*Aweak (JK*01W) defining SNPs are shown in green.

In Figure 1A the identified SNPs were homozygous for the change identified in the figure (for example, A>G 43306891) in all samples except the following: A>G 43306891 (samples JK009.04/11/13/15/60 and 65 were heterozygous while JK009.05 was homozygous A/A); C>T 43307455 (samples JK009.04/11/13/15/60 and 65 were heterozygous while JK009.05 was homozygous C/C); G>A 43311483 (samples JK009.04/11/13/15/23/60 and 65 were heterozygous while JK009.05 was homozygous G/G); A>G 43313309 (samples JK009.04/11/13/15 and 60 were heterozygous while JK009.05 was homozygous A/A); C>T 43319359 (samples JK009.04/11/13/15/60 and 65 were heterozygous while JK009.05 was homozygous C/C); A>G 43329031 (sample JK009.65 was heterozygous).

Figure 2 shows the cDNA sequence analysis of the intron 8/exon 9 region of one Jk(a+b+) sample heterozygous for the G810A mutation. PCR products were generated by RT-PCR using JK exon 8F and exon 9R primers and cDNA prepared from red cells of known Jk phenotype, then purified and subjected to Sanger sequence analysis as described. It was found that the exon 8/intron 8 boundary was spliced normally, and that transcripts encoding both G838 and A838 nucleotides (JK*A and JK*B defining mutations respectively), and both G810 and A810, were defined (data not shown).

Example 1

Methods

Blood samples and serological phenotyping

Blood samples (pilot EDTA tubes) were obtained with appropriate ethical consent from the UK National Health Service Blood and Transplant (NHSBT), Filton, Bristol, UK. Standard serological phenotyping methods were performed by the NHSBT as part of the donor testing process and phenotype information was supplied with the randomly selected samples.

Genomic DNA extraction

Blood samples were centrifuged at 3500 x g for 5 minutes in order to obtain the leukocyte-rich buffy coat. gDNA was extracted from the buffy coat fraction using a QIAamp DNA Blood Mini Kit (QIAGEN, Hilden, Germany), according to the manufacturer's instructions. Subsequently, the DNA was quantified using a Qubit® 2.0 Fluorometer and the Qubit® dsDNA Broad Range (BR) Assay Kit (Life Technologies, Paisley, UK), before being stored at -20°C.

Long-Range PCR amplification of JK genes

Oligonucleotide primers (Table 1) were designed using Primer 3 software (http://frodo.wi.mit.edu/primer3/). NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to confirm the specificity of primers and the UCSC Genome Browser (http://genome.ucsc.edu) was used to check specificity and visualize the amplicons.

The LR-PCR reactions were performed in a 50 μL volume containing 1x LongAmp® Hot Start Taq Master Mix (New England BioLabs, Hitchin, UK), 500 nM forward and reverse primers and gDNA template (100 ng for FY and 200 ng for JK). Cycling was carried out on a Veriti Thermal Cycler (Life Technologies, Paisley, UK) following optimized conditions for JK reactions: 94°C for 5 min, 30 cycles of 94°C for 30 s, 60°C for 30 s and 65°C for 10 min, followed by a final extension of 65°C for 10 min. For the FY reactions, the cycling conditions were: 95°C for 30 s, 30 cycles of 95°C for 30 s, 62°C for 30 s and 72°C for 3 min, followed by a final extension at 72°C for 5 min. After amplification, the amplicons were visualized by agarose gel electrophoresis.
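For planning purposes, the two cycling programs can be written down as data and their total programmed hold time summed. This is a sketch only: the class and field names are hypothetical, and ramp times between temperatures are ignored, so real run times on the instrument will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """A PCR program: initial hold, repeated cycle steps, final extension (seconds)."""
    initial_s: int
    cycles: int
    cycle_steps_s: tuple
    final_s: int

    def total_seconds(self):
        # Programmed hold time only; temperature ramping is not modelled.
        return self.initial_s + self.cycles * sum(self.cycle_steps_s) + self.final_s


# JK: 94°C 5 min; 30 x (94°C 30 s, 60°C 30 s, 65°C 10 min); 65°C 10 min.
JK_PROGRAM = CyclingProgram(initial_s=5 * 60, cycles=30,
                            cycle_steps_s=(30, 30, 10 * 60), final_s=10 * 60)

# FY: 95°C 30 s; 30 x (95°C 30 s, 62°C 30 s, 72°C 3 min); 72°C 5 min.
FY_PROGRAM = CyclingProgram(initial_s=30, cycles=30,
                            cycle_steps_s=(30, 30, 3 * 60), final_s=5 * 60)

if __name__ == "__main__":
    for name, program in (("JK", JK_PROGRAM), ("FY", FY_PROGRAM)):
        print(name, program.total_seconds() / 60, "min")
```

On these figures the JK program holds for 345 minutes and the FY program for 125.5 minutes, the difference being almost entirely the 10 minute extension step used for the long JK amplicons.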

Table 1: LR-PCR primers for amplification of JK genes

Preparation of LR-PCR Library

The long amplicons were purified by Agencourt® AMPure® XP beads (Beckman Coulter, High Wycombe, UK) to ensure removal of primer dimers and free nucleotides. Purified amplicons were then quantified by Qubit® dsDNA BR assay kit before conducting enzymatic fragmentation by Ion Xpress™ Plus Fragment Library Kit (Life Technologies, Paisley, UK) to result in fragments of an average size of 200 bp. Next, the fragments were ligated with barcoded adapters, which add about 70 bp to the fragments. P1 and Ion Xpress Barcode X adapters from the Ion Xpress™ Barcode Adapters Kit (Life Technologies, Paisley, UK) were used to distinguish the samples when pooled prior to sequencing. Then, the adapter-ligated library was size selected by Pippin Prep™ instrument (Sage Science, Inc., Beverly, USA) and Pippin Prep™ Kit 2010 with Ethidium Bromide cassettes or SPRIselect® reagent kit (Beckman Coulter, High Wycombe, UK). After each step (fragmentation, ligation and size selection), purification was conducted using magnetic beads and the integrity, size distribution, concentration and quality of the library in those steps was checked using the Agilent® 2100 Bioanalyzer® instrument and Agilent High Sensitivity DNA Kit (Agilent Technologies UK Limited, Stockport, UK).

NGS sequencing of amplicons by Ion Torrent PGM™

Template-positive ion sphere particles (ISPs) containing clonally amplified DNA were prepared by the Ion PGM™ Template OT2 200 Kit (for 200 base-read libraries) (Life Technologies, Paisley, UK) with the Ion OneTouch™ 2 System (which is based on emulsion PCR). Then the percentage of template-positive ISPs was checked by the Ion Sphere™ Quality Control assay (Life Technologies, Paisley, UK) on the Qubit® 2.0 Fluorometer (Life Technologies, Paisley, UK) and then enriched by the Ion OneTouch™ ES Instrument before loading onto a 316™ chip. Sequencing was carried out using the Ion PGM™ Sequencing 200 Kit v2 (Life Technologies, Paisley, UK) and the Ion Torrent PGM™.

Bioinformatics

The sequence data was obtained on the Ion Torrent server, with the generation of Variant Caller Files (VCF), FASTQ files and BAM files, which were used for the bioinformatics analysis. Software was used to align the reads with the reference sequence of the JK gene (NM_001146036.2) and the human genome (hg19). Alleles and variants were analyzed and visualized by software packages including CLC Genomics Workbench 6.5, Ion Torrent Suite™ plugins (such as coverage analysis (v3.6.63324), VariantCaller (v3.6.59049), Alignment and FastQC (v3.4.1.1)) and Integrative Genomics Viewer (IGV). In addition, the SeattleSeq Annotation 137 website was used (http://snp.gs.washington.edu/SeattleSeqAnnotation137/index.jsp). The software packages are linked to databases such as 1000 Genomes and dbSNP, in order to identify variants and single nucleotide polymorphisms (SNPs). Sequence coverage during the NGS experiments averaged 1000X for JK sequencing.
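The kind of filtering applied to the variant call files can be sketched with plain string handling. This is not the Ion Torrent or CLC workflow itself: it assumes tab-separated VCF records with the standard leading columns (CHROM, POS, ID, REF, ALT), a chromosome label of "chr18", and an illustrative coordinate window taken from the lowest and highest primer positions listed earlier rather than from annotated gene boundaries. The example records are made up for the demonstration, although two of them use positions and base changes mentioned in this document.

```python
def variants_in_region(vcf_text, chrom, start, end):
    """Return (position, ref, alt) for records on chrom within [start, end]."""
    hits = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        record_chrom, pos, _id, ref, alt = fields[:5]
        pos = int(pos)
        if record_chrom == chrom and start <= pos <= end:
            hits.append((pos, ref, alt))
    return hits


# Illustrative records only; QUAL, FILTER and INFO are placeholders.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["chr18", "1000", ".", "C", "T", ".", "PASS", "."]),
    "\t".join(["chr18", "43306891", ".", "A", "G", ".", "PASS", "."]),
    "\t".join(["chr18", "43319519", ".", "G", "A", ".", "PASS", "."]),
    "\t".join(["chr1", "43319519", ".", "G", "A", ".", "PASS", "."]),
])

if __name__ == "__main__":
    for hit in variants_in_region(EXAMPLE_VCF, "chr18", 43301432, 43333287):
        print(hit)
```

Only the two chromosome 18 records inside the window are reported; the record at position 1000 and the chromosome 1 record are dropped.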

RNA extraction

RNA was extracted from 1 mL of the reticulocyte rich uppermost layer of the red cells. The RNA was extracted using the QIAamp RNA Blood Mini Kit (QIAGEN, Hilden, Germany) with addition of the RNase Free DNase Set (QIAGEN, Hilden, Germany) as provided by the manufacturer to successfully extract RNA from reticulocytes. Subsequently, the extracted RNA was quantified by the NanoDrop 2000™ (Thermo Scientific, USA).

Reverse transcription coupled PCR of JK cDNAs

First strand cDNA synthesis was performed using the SuperScript® III First-Strand Synthesis System (Invitrogen™, Paisley, UK), according to the manufacturer's instructions. Subsequently, this first strand cDNA was used for PCR amplification, in which the JK exon 8-9 specific primers (for exon 8 splice site mutation analysis) and exon 4-9 primers (for JK*Aweak transcript analysis) were utilized. For JK exon 8-9 amplification, the Q5® Hot Start High-Fidelity 2X Master Mix (New England BioLabs, Hitchin, UK) was used with a final primer concentration of 500 nM. Cycling was carried out on a Veriti Thermal Cycler (Life Technologies, Paisley, UK) following optimized conditions: 98°C for 30 s, 35 cycles of 98°C for 5 s, 64°C for 20 s and 72°C for 30 s, followed by a final extension of 72°C for 2 min. The production of the JK exon 4-9 amplicon involved the use of BioMix™ 2x reaction mix (BioLine, UK) with a final primer concentration of 500 nM mixed with 80 ng cDNA. Optimized thermocycling conditions were as follows: 94°C for 30 s, 35 cycles of 94°C for 5 s, 60°C for 20 s and 72°C for 30 s, followed by a final extension of 72°C for 20 min. After amplification, amplicons were visualized by agarose gel electrophoresis, and then subjected to cloning (only JK exons 4-9 amplicons) and Sanger sequencing. The cloning process was completed according to the manufacturer's instructions using the TOPO® TA Cloning® Kit for Sequencing (Invitrogen™, Paisley, UK) in which the pCR™ 4-TOPO® vector (TOPO® vector) and TOP10 One Shot® chemically competent cells were utilized. Inserts were sequenced by Sanger sequencing using T3 and T7 primers.

Results

NGS analysis of the complete JK gene

The complete sequence of the JK (SLC14A1) gene was defined in 67 different samples, most of which (59/67) express known serological Jk phenotypes (see Table A for summarised results; for complete results see Table 1 of Altayer et al.). The JK gene was amplified from the samples using overlapping LR-PCRs. The amplicons were then fragmented and sequenced on the Ion Torrent PGM™ as described in the Methods. First, we compared each of the defined DNA sequences to a reference sequence derived from the human genome (hg19), and all nucleotide reference numbers in this application are derived from this sequence. Next, we compared the sequences of all JK*A, JK*A homozygotes, which allowed us to define a reference JK*A sequence. A similar comparison was carried out for all the JK*B, JK*B homozygotes. This process resulted in novel allele-defining SNPs (and in the case of JK*B, a single nucleotide deletion) which have not been previously described. Table A describes the reference SNPs that define JK*A, JK*B and JK*Aweak alleles, and which show a high degree of concordance to JK genotype.

Figure 1 provides a graphical representation of these SNPs and their distribution along the JK gene. We found that in all 59 samples of defined Jk serotype, there was complete correlation with genotype at the allele defining SNP, position 43319519 in exon 9 (838G>A, encoding Asp280Asn). Interestingly we found that 10/67 of our sequenced cohort were heterozygous for an 810G>A mutation located at the exon 8/intron 8 boundary, and this mutation was only found in samples carrying the JK*B allele. This was also confirmed by sequencing cDNA derived from one such individual, with both G and A alleles being spliced normally (see cDNA analysis of the 810G>A mutation). Furthermore, we found that 10/67 randomly selected samples that expressed normal Jk(a+) and Jk(b+) phenotypes were heterozygous, and one sample was homozygous, for a 130G>A mutation (encoding Glu44Lys), which has previously been assigned as a JK*Aweak allele. We found no apparent weakening of the Jk a antigen in any samples that carried this allele.

Reference JK*A, JK*B and JK*Aweak SNPs and their role in JK genotyping

Described above is the manner by which we determined reference JK*A or JK*B and JK*Aweak alleles. Figure 1 and Table A demonstrate these identifier SNPs when compared to the reference human genome sequence (hg19). Several nucleotide changes were found in most samples sequenced when compared to the reference sequence. Samples were homozygous for the changes, except where noted (see Figure 1 legend). The reference sequence (hg19) encodes a JK*A allele at codon 280 but has multiple intronic SNPs consistent with either JK*A, JK*B or JK*Aweak alleles. Therefore, we assume that the reference human genome sequence (hg19) is a consensus sequence obtained from a number of individuals of different JK genotypes. Whilst the majority of samples fitted the intronic SNP "fingerprint" for JK*A or JK*B alleles, there were some exceptions, and these are presented in Table 1 and Figure 1. Interestingly, we found that in all of the JK*Aweak samples investigated, their inferred sequence strongly suggests that the JK*Aweak allele is composed of a hybrid JK*A-JK*B gene (Table A, Figure 1).

cDNA analysis of the JK (810G>A and 588A>G) mutations:

810G>A

A significant proportion of our sequenced cohort carry the JK (810G>A) mutation, which has been described recently as a novel JK*B silencing mutation occurring at the exon 8/intron 8 boundary. In contrast to these findings, we found that this mutation actually restores a splice site at the exon 8/intron 8 boundary whilst the wild type JK gene actually has an AG-intron boundary. We studied the mRNA expression of 15 samples carrying both alleles of the splice site boundary and found that sequenced cDNAs encoding both Jk a and Jk b polymorphisms had correctly spliced exons in this key position (Figure 2). These samples were obtained from donors of known phenotype (2 Jk(a+b-), 6 Jk(a-b+), 5 Jk(a+b+), 1 JK*Aweak, JK*B (Jk(a+b+)) and 1 JK*Aweak, JK*A (Jk(a+b-))) and were analyzed by RT-PCR and direct sequencing, after synthesizing cDNA from RNA extracted from peripheral blood. Only one (Jk(a+b+), JK*A, JK*B) of these 15 samples was heterozygous for both 810G>A and 838G>A and was spliced as expected.

588A>G

We sequenced cDNA clones obtained by RT-PCR of JK transcripts using primers located in exons 4-9 (see Methods). This was done to ascertain whether the G588 synonymous mutation was carried by JK*B transcripts and JK*Aweak transcripts as found by genomic sequencing (Table 1), because the NGS approach used is unable to unequivocally define which allele this mutation arises from. We found that all JK*B and JK*Aweak transcripts carried the G588 allele, in combination with A838 in JK*B and G838 in JK*Aweak. However, genomic analysis identified one Jk(a+b+) sample that is homozygous for G588, suggesting that it is not involved in the weakening of the Jk a antigen.

Discussion

We describe here a novel approach to using NGS to sequence blood group active genes. We have chosen the technically more difficult method of coupling LR-PCR with NGS rather than a more targeted approach (for example, AmpliSeq™) for several reasons. First, the AmpliSeq™ method amplifies only predefined target exons, and we decided that the complete gene, including splice sites and introns, was to be analyzed. We found that with this approach the JK (810G>A) splice site mutation could be readily identified in 10/67 samples (Table A). Secondly, we wanted a protocol to be adaptable for single molecule sequencing methods under development. This will be critical to establish whether identified mutant alleles lie in cis or trans chromosomal positions to blood group allele defining SNPs (for example, JK*A/JK*B 838G>A) so that silencing mutations can be identified. Our LR-PCR approach is also amenable to cloning individual genomic fragments using plasmids/cosmids and bacterial vectors, which can then be sequenced using our current Ion Torrent PGM™ method to establish whether a novel mutation sits in cis or trans to key blood group alleles. We have shown that our NGS method has effectively reassigned a purported JK*0 allele (810G>A) carried on a JK*B background, to be one of normal variation within the JK gene, as 10/67 normal phenotype Jk(b+) samples carried this mutation. This mutation actually restores an AG splice site at the exon 8/intron 8 boundary compared to the reference sequence (Figure 2, Table A). It had already been previously noted that a number of JK (SLC14A1) exon/intron boundaries do not possess the 5' exonic consensus sequence AG-intron. All 10 heterozygous 810G>A samples were of phenotype Jk(a+b+) or Jk(a-b+), suggesting no ablation of Jk b antigen expression (Table A). Transcript sequencing confirmed that intron 8 was correctly spliced in one sample containing both the G and A nucleotides at position 810 (Figure 2).
In addition, we have also shown that a weakening JK*A allele (130G>A) is more frequent than previously thought (found in 10/67 of the randomly selected cohort analyzed). These ten samples were not defined by serology as having weakened Jk a antigen expression in 8 heterozygous JK*A, JK*Aweak samples and one homozygous JK*Aweak, JK*Aweak sample (see Table A).

JK polymorphism patterns and assignment of JK*A and JK*B reference sequences

Our analysis of the 67 fully sequenced JK individuals revealed a significant number of SNPs within both coding and intronic regions of the JK (SLC14A1) gene. The average number of intronic SNPs within the JK gene was around 30 in JK*A,JK*A samples, 50 in JK*A,JK*Aweak samples, 40 in JK*Aweak,JK*Aweak samples, 80 in JK*B,JK*Aweak samples, 70 in JK*B,JK*B samples and 80 in JK*A,JK*B samples. Our initial analysis focused on SNPs that closely correlated with JK*A, JK*B and JK*Aweak alleles. As the NGS chemistry we adopted results in the sequencing of millions of overlapping 200-300 bp fragments it is not possible to ascertain which chromosome carries which SNP. However, we were able to show that certain SNPs were closely associated with each of the common three JK alleles we have studied. By initial analysis of JK*A/JK*A, JK*B/JK*B and JK*Aweak/JK*Aweak homozygote samples we generated what can be called JK*A, JK*B and JK*Aweak reference sequences (Figures 1B, 1C and 1D). This analysis also revealed in a number of instances that all 134 sequenced haplotypes differed from the Human Genome Project derived reference sequence (hg19) (Figure 1A). We assume that these are uncommon sequence variations found when the human genome sequence was originally assembled. As all current BGG depends on primer design using the human genome sequence, it is noteworthy that one of these SNPs (43319359 C>T) is located very close (160 bp) upstream of the JK*A/JK*B critical polymorphism (43319519) 838G>A. Accordingly, if this position was utilised for a genotyping primer, allelic dropout may occur.

It is clear from our analysis that a number of SNPs show complete concordance with JK*A, JK*B and JK*Aweak genotypes, whilst others show a high degree of concordance. It is quite apparent however that there are multiple JK*A and JK*B alleles within the cohort, as illustrated by the SNP zygosity patterns. We feel that these discrete patterns can be used in the future to identify in evolutionary terms from which allele JK variants (silencing or weakening) arose.

We have utilized the high frequency JK*A and JK*B reference sequences to define the most probable sequence of the low frequency JK*Aweak allele (Figure 1D), of which we have found 11 among the 134 JK haplotypes we have sequenced. In samples carrying the previously described JK*Aweak defining SNP 43310415 (G>A) (which corresponds to a codon 44 Glu>Lys change) it is possible to predict the sequence of the JK*Aweak haplotype, as NGS clearly defines each SNP as heterozygous or homozygous. Of the ten sequenced samples harbouring the JK*Aweak SNPs, all exhibited an almost identical pattern. Two samples (JK009.27 & JK009.28) carried a JK*Aweak haplotype and a JK*B haplotype, and one was defined serologically as Jk(a+b+). This questions whether a weakened Jk a antigen is expressed on the red cells, as routine serological methods were used to detect it. Whilst we realize our data does not prove categorically the sequences of these various JK variant gene structures (which could be confirmed by long read single molecule sequencing), we are confident to state that the JK*Aweak allele resembles a hybrid or "patched repair" JK*A/JK*B gene, but with the addition of specific allele-defining SNPs (shown in green in Table 1 and in Figure 1D). The patched repair segments indicate that at least intron 7, exon 7, the 3' end of intron 9, exon 10 and the 5' end of intron 10 are derived from the JK*B haplotype, whereas the remainder of the JK*Aweak allele is JK*A derived. We are astonished by the level of interactions between the JK*A and JK*B alleles causing the genomic variants (Figure 1).

In all of our methods to analyze blood group genes by NGS we chose to amplify the complete gene. This does in some cases include substantial intronic and promoter regions, but the sequence information obtained is useful in determination of the genomic evolution of blood group alleles. Zygosity for each intronic SNP is easy to discern as NGS sequence coverage in our hands is routinely between 1000-2000X, and in these circumstances homozygosity for a SNP (compared to the reference sequences we have defined) is 100% and heterozygosity about 50%. It is not exactly 50% as a population of DNA molecules is sequenced. We note that in all samples carrying the JK*Aweak allele, a SNP located near the transcript start site in exon 1 is found (Table A, 43304182 G>A) which may affect the efficiency of transcription, but we have not found any difference in abundance between JK*Aweak, JK*A and JK*B transcripts during our RT-PCR studies.
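The zygosity reasoning above (variant reads near 100% of coverage for a homozygous SNP, near 50% for a heterozygous one) can be sketched as a simple classifier. The numeric cut-offs, the minimum depth and the function name are illustrative assumptions, not values stated in this document.

```python
def call_zygosity(variant_reads, total_reads,
                  het_window=(0.35, 0.65), hom_min=0.90, ref_max=0.10, min_depth=30):
    """Classify a SNP from read counts; all thresholds are illustrative assumptions."""
    if variant_reads < 0 or variant_reads > total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    if total_reads < min_depth:
        return "low coverage"
    fraction = variant_reads / total_reads
    if fraction >= hom_min:
        return "homozygous variant"
    if het_window[0] <= fraction <= het_window[1]:
        return "heterozygous"
    if fraction <= ref_max:
        return "homozygous reference"
    # Fractions between the windows are left for manual review.
    return "ambiguous"


if __name__ == "__main__":
    # Illustrative read counts at roughly 1000X coverage.
    for variant, total in ((1000, 1000), (520, 1000), (3, 1000), (800, 1000)):
        print(variant, total, call_zygosity(variant, total))
```

Using a window around 50% rather than an exact value reflects the point made in the text that the heterozygous fraction is not exactly 50% because a population of DNA molecules is sequenced.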

Application of NGS to BGG

We are of the opinion that NGS will rapidly supplant array-based methods for routine BGG as NGS methods are vastly superior in terms of the information obtained, whilst being technically very similar and similar in cost. The prime weakness of arrays is that they are totally dependent on the population of probes carried on them, and therefore require that the mutation to be investigated is previously known. NGS methods operate in a discovery mode that we have termed Gen2Phen, where it may be possible to predict phenotype from defined genotype, for example where a silencing or weakening mutation is present within the coding gene of an identified blood group allele. We have shown, in particular for the JK samples, that certain defined alleles that differ from our defined JK*A and JK*B reference sequences may warrant further serological analysis to assess weakening or silencing mutations. It is somewhat astonishing to us to note such degrees of interactions between two allelic genes, especially between JK*A and JK*B. It is possible that templated repair occurs at a much higher frequency than previously thought to generate hybrid JK*A-JK*B genes, and that new mutations occur when such repair is not entirely possible. Such situations have been described previously for other blood group genes, notably ABO, RHCE, RHD, GYPA, GYPB and GYPE, but this is the first description within JK, and only when the analysis of intronic polymorphism has been conducted is this revealed.

Example 2

Methods

Genomic DNA samples from blood donors of different phenotypes including 6 R1R1 (DCe/DCe), 6 R2R2 (DcE/DcE), 7 R1R2 (DCe/DcE), 6 R1r (DCe/dce), 6 R2r (DcE/dce), and 6 R0r (Dce/dce) were sequenced using the Ion Personal Genome Machine™ (PGM™). All samples were tested for RHD zygosity using digital PCR. The RHD gene was amplified in 6 overlapping amplicons using RHD-specific primers. 200-base pair read sequencing libraries were prepared and then sequenced on the Ion PGM™ using a 316 chip. Data was then mapped to the hg38 human genome reference sequence and analyzed using the CLC Workbench 9.5.

Table 2: Sequence, annealing temperature (Ta), and exons covered for primers used for RHD LR-PCR

Results

In one R2R2 sample, one exon 9 SNP 25321889 G>C was detected resulting in the amino acid change Gly385Ala, which is linked to weak D type 2. Multiple intronic SNPs were detected in all samples, of which 15 were present as homozygous SNPs in all 37 samples; these may represent SNP variants of the DAU*0 allele which the hg38 reference sequence encodes. Another 19 SNPs were present in all R2R2 and R2r samples as homozygous SNPs and in R1R2 samples as heterozygous SNPs. 14 intronic SNPs were present in all R1R1, R1r and R0r samples as homozygous SNPs and in all R1R2 samples as heterozygous SNPs. 16 heterozygous intronic SNPs were only present in the R2R2 weak D type 2 sample. Intronic SNPs are suspected to be linked to a specific haplotype, which could be used in the future to establish an assay to genotype Rh antigens without the need to fully sequence the Rh genes.

Summary / Conclusions

37 samples were sequenced on the Ion PGM™ to study RHD mutations and assess RHD variations present in the population. Further samples are currently being sequenced using identical techniques. Intronic SNPs were used to determine their relation to specific haplotypes. These SNPs represent novel diagnostic approaches to investigate known and novel variants of RHD and RHCE. The sequencing of multiple hemizygous samples results in the determination of reference RHD genes (SEQ ID NOs: 23-25).

References

Huh et al., A Rapid Long PCR-Direct Sequencing Analysis for ABO Genotyping. Annal. Clin & Lab. Sci. 41:340