Title:
A GENE MUTATED IN FANCONI ANEMIA COMPLEMENTATION GROUP I
Document Type and Number:
WIPO Patent Application WO/2008/108640
Kind Code:
A1
Abstract:
The current invention relates to an isolated human DNA molecule on chromosome 15, the FANCI gene, whose nucleotide sequence is mutated in both alleles of individuals of complementation group I of Fanconi anemia (FA). It also relates to monoallelic mutations in individuals predisposed to cancer. It further relates to cDNA, fragments, polypeptides and siRNAs derived therefrom, and to methods for determining a genetic defect in a patient, the defect being a mutation in the Fanconi anemia gene of complementation group I, or for complementing such a genetic defect in an isolated cell. It also relates to determining defects in individuals predisposed to cancer. Furthermore, the invention relates to the use of the gene, cDNA, polypeptides and corresponding fragments and siRNAs for diagnosis of sporadic (common) cancers, treatment or drug development.

Inventors:
DORSMAN JOSEPHINE CHRISTINE (NL)
DE WINTER JOHANNES PETRUS (NL)
JOENJE HANS (NL)
Application Number:
PCT/NL2008/000077
Publication Date:
September 12, 2008
Filing Date:
March 07, 2008
Assignee:
VERENIGING VOOR CHRISTELIJK HOGER ONDERWIJS WETENSCHAPPELIJK ONDERZOEK EN PATIENTENZORG (NL)
DORSMAN JOSEPHINE CHRISTINE (NL)
DE WINTER JOHANNES PETRUS (NL)
JOENJE HANS (NL)
International Classes:
C07K14/47; C12N15/10; C12Q1/68
Foreign References:
US5681942A 1997-10-28
Other References:
DATABASE EMBL [online] 11 June 2002 (2002-06-11), "Homo sapiens chromosome 15, clone RP11-217B1, complete sequence.", XP002486637, retrieved from EBI accession no. EMBL:AC124068 Database accession no. AC124068
DATABASE EMBL [online] 13 August 2003 (2003-08-13), "Diagnostic and screening method.", XP002486636, retrieved from EBI accession no. EMBL:BD249887 Database accession no. BD249887
DORSMAN JOSEPHINE C ET AL: "Identification of the Fanconi anemia complementation group I gene, FANCI", CELLULAR ONCOLOGY, vol. 29, no. 3, 2007, pages 211 - 218, XP009102622, ISSN: 1570-5870
SIMS ASHLEY E ET AL: "FANCI is a second monoubiquitinated member of the Fanconi anemia pathway", NATURE STRUCTURAL & MOLECULAR BIOLOGY, vol. 14, no. 6, June 2007 (2007-06-01), pages 564 - 567, XP002486632, ISSN: 1545-9985
SMOGORZEWSKA AGATA ET AL: "Identification of the FANCI protein, a monoubiquitinated FANCD2 paralog required for DNA repair", CELL, vol. 129, no. 2, April 2007 (2007-04-01), pages 289 - 301, XP002486633, ISSN: 0092-8674
LEVITUS MARIEKE ET AL: "Heterogeneity in Fanconi anemia: evidence for 2 new genetic subtypes", BLOOD, vol. 103, no. 7, 1 April 2004 (2004-04-01), pages 2498 - 2503, XP002486634, ISSN: 0006-4971
STRATHDEE C A ET AL: "CLONING OF CDNAS FOR FANCONI'S ANAEMIA BY FUNCTIONAL COMPLEMENTATION", NATURE (LONDON), vol. 356, no. 6372, 1992, pages 763 - 767, XP002486635, ISSN: 0028-0836
Attorney, Agent or Firm:
WITTOP KONING, T., H. (P.O. Box 3241, GE Rijswijk, NL)
Claims:
CLAIMS

1. An isolated human DNA molecule derived from chromosome 15 wherein said DNA molecule has a nucleotide sequence which sequence is mutated in FA complementation group I.

2. DNA molecule according to claim 1 which is localized to locus 15q25-26.

3. DNA molecule according to claim 2 containing a gene, wherein the gene encoded by said DNA molecule has 38 exons with a translation start in exon 2, and encodes a 1328 amino acid protein with 3 nuclear localization and 3 ATM/ATR phosphorylation motifs.

4. DNA molecule according to any of the previous claims selected from the group consisting of: a. a DNA molecule having the nucleotide sequence shown in SEQ ID No. 1, or the complementary strand of said nucleotide sequence; b. a DNA molecule that shows at least 80%, more preferably 90%, even more preferably 95%, most preferably 98% homology with the nucleotide sequence shown in SEQ ID No. 1, or the complementary strand of said nucleotide sequence.

5. DNA molecule according to any of the previous claims, that is a cDNA molecule selected from the group consisting of: a. a DNA molecule having the nucleotide sequence shown in SEQ ID No. 2, or the complementary strand of said nucleotide sequence; b. a DNA molecule that shows at least 80%, more preferably 90%, even more preferably 95%, most preferably 98% homology with the nucleotide sequence shown in SEQ ID No. 2, or the complementary strand of said nucleotide sequence.

6. An oligonucleotide comprising at least 150, preferably at least 100, more preferably at least 50, even more preferably at least 20, most preferably at least 15 consecutive nucleotides of a DNA molecule according to any of the claims 1-5.

7. A DNA molecule according to any of the previous claims, wherein a mutation has been introduced by means of insertion, deletion, and/or replacement of one or more nucleotides, and wherein said molecule encodes for a protein that, when introduced into cells from patients with Fanconi anemia of complementation group I, does not reduce the sensitivity of those cells to mitomycin C.

8. A DNA molecule according to any of the previous claims, wherein a mutation has been introduced chosen from the group consisting of partial or complete exon deletion, inserted exon, protein truncation, and amino acid substitution.

9. A DNA molecule according to any of the previous claims, wherein at least one mutation or polymorphism is selected from the group consisting of mutations c.2T>C, c.670-2A>G, c.3854G>A, c.3006+3A>G, c.3853C>T, c.3437_3455deletion, c.3895C>T, c.1264G>C, c.3350-88A>G, c.2572C>T, c.2509G>T, and c.2248T>G and polymorphism c.164C>T of the nucleotide sequence as shown in SEQ ID No. 2 is present.

10. Polypeptide encoded by a DNA molecule according to any of the claims 1-9.

11. Method for determining a genetic defect in a patient, the defect being a mutation in the Fanconi anemia gene of complementation group I, the method comprises determination of the sequence of the FANCI gene of said patient.

12. Method for determining whether a subject carries a mutant FANCI gene, comprising the steps of: i. providing a biological sample from the subject, which sample includes DNA and/or RNA, ii. determining the sequence of the FANCI gene or FANCI mRNA, or a portion thereof, iii. comparing the determined sequence with that of SEQ ID No 1 or SEQ ID No 2.

13. Method according to claim 12, wherein step b) comprises determining the sequence of a portion of the FANCI gene encompassing one or more of the mutations as defined in claim 8, or a corresponding portion of the FANCI mRNA.

14. Method for classification of tumors, comprising the steps of i. providing a biological sample from the subject, which sample includes DNA and/or RNA and/or protein, ii. determining whether the FANCI gene is expressed at the RNA and/or protein level, and iii. determining the sequence of the FANCI gene or FANCI mRNA, or a portion thereof, iv. comparing the determined sequence with that of SEQ ID No 1 or SEQ ID No 2 to establish the differences therebetween.

15. Method according to claim 14, wherein step b) comprises determination of the presence of the FANCI protein, in particular by binding to specific antibodies, or by determination of the presence of FANCI mRNA, in particular by quantitative RT-PCR, or by determination of methylation of the promoter of the FANCI gene, e.g. by methylation-specific PCR, or determination of DNA changes of the promoter of the FANCI gene, e.g. by DNA sequencing.

16. Method for drug testing, in particular for the suitability for use as an antitumorigenic drug in patients having a tumor wherein FANCI is inactive or less active, comprising the steps of: i. providing a biological sample of the patient comprising DNA and/or RNA and/or protein, ii. determining whether the FANCI gene is expressed at the RNA and/or protein level, and iii. determining whether a FANCI mutation is present in the tumor DNA of the patient, iv. identifying the said mutation, v. incubating tumor cells from the patient ex vivo with an antitumorigenic drug, vi. determining whether the antitumorigenic drug is capable of inhibiting growth of the tumor cells, vii. correlating the mutation with the effectivity of the antitumorigenic drug.

17. Method according to claim 17, wherein the antitumorigenic drug is a polyfunctional alkylating agent, preferably a bifunctional agent, most preferably cis-platin.

18. Method according to any of the claims 14 - 17, wherein the tumor is chosen from the group, consisting of FA, breast cancer, ovarian cancer, head-and-neck cancer, solid childhood cancer or prostate cancer.

19. Method of complementing a genetic defect in an isolated cell, the defect being a mutation in the Fanconi anemia gene of complementation group I, the method comprising introducing into the cell one or more DNA molecules according to claims 1-6 or the polypeptide according to claim 10.

20. Use of a DNA molecule according to claims 1-9 and/or a polypeptide according to claim 10 for methods of diagnosis, treatment, drug development, molecular diagnostics of FA patients carrying biallelic FANCI mutations, counseling and surveillance of subjects carrying a biallelic FANCI mutation for a highly increased tumor risk and bone marrow failure, counseling and surveillance of subjects carrying a monoallelic FANCI mutation for a moderately increased tumor risk, tumor classification by determining FANCI defects, as target for therapy or as lead in drug development schemes.

21. Use according to claim 20, wherein said methods of diagnosis, treatment or drug development are directed to the diagnosis, treatment or drug testing of FA, cancer, bone marrow failure.
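As an illustration of the comparison step in claim 12 (step iii) and of the homology thresholds in claims 4 and 5, the following Python sketch lists single-nucleotide differences between a determined sequence and a reference and computes a percent identity. It is only a sketch under stated assumptions: the two sequences are taken to be already aligned and of equal length, positions are numbered from the A of the start codon, "homology" is read as simple percent identity, and the function names are hypothetical. A real comparison against SEQ ID No. 1 or SEQ ID No. 2 would need a gapped alignment, which is not handled here.

```python
def list_substitutions(reference, determined, first_position=1):
    """List single-nucleotide differences in 'c.<pos><ref>><alt>' notation.

    Assumes both sequences are already aligned and of equal length
    (no insertions or deletions). first_position is the coding position
    of the first base supplied.
    """
    if len(reference) != len(determined):
        raise ValueError("sequences must be aligned and of equal length")
    variants = []
    pairs = zip(reference.upper(), determined.upper())
    for offset, (ref_base, alt_base) in enumerate(pairs):
        if ref_base != alt_base:
            variants.append(f"c.{first_position + offset}{ref_base}>{alt_base}")
    return variants


def percent_identity(reference, candidate):
    """Percentage of identical positions between two equal-length sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(r == c for r, c in zip(reference.upper(), candidate.upper()))
    return 100.0 * matches / len(reference)


# First 30 coding nucleotides of SEQ ID No. 2, and the same stretch
# carrying the c.2T>C change listed in claim 9.
reference = "ATGGACCAGAAGATTTTATCTCTAGCAGCA"
determined = "ACGGACCAGAAGATTTTATCTCTAGCAGCA"

print(list_substitutions(reference, determined))
print(round(percent_identity(reference, determined), 1))
```

Reporting differences by coding position keeps the output in the same notation as the mutations listed in claim 9.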

Figures and Tables

Table 2 Sequence variants at the KIAA1794 locus in FA-I patients (a)

Patient ID    Origin    Maternal allele                              Paternal allele
                        DNA                  Effect                  DNA                 Effect
EUFA592 (b)   Turkey    c.2T>C               p.Met1?                 c.2T>C              p.Met1?
BD952 (b)     India     c.164C>T (a)         p.Pro55Leu              c.164C>T            p.Pro55Leu
                        c.3854G>A            p.Arg1285Gln            c.3854G>A           p.Arg1285Gln
EUFA695       U.S.A.    c.3006+3A>G (c)      p.Arg964_Gln1002del     c.1264G>C           p.Gly422Arg
EUFA816       Hungary   c.3853C>T            p.Arg1285X              c.3350-88A>G (d)    p.Glu1117fs
EUFA961       Austria   c.3437_3455del (e)   p.Glu1117fs             c.2572C>T (f)       p.His858_Arg879del
EUFA1399      Germany   c.3895C>T            p.Arg1299X              c.3895C>T (g)       p.Arg1299X
VU1301 (h)    NL        c.2509G>T            p.E837X
VU1466 (h)    U.S.A.    c.2248T>G            p.C750G                 c.2509G>T           p.E837X

a Description of variants refers to the sequence in Supplementary Figure 2 online. Mutations were found by cDNA sequencing, followed by sequencing of the genomic DNA. Variant c.164C>T was observed in 9/96 healthy controls (from The Netherlands) and was therefore considered a polymorphism. None of the other sequence variants were observed in the control panel, nor in the public databases that list common polymorphisms, and they were therefore considered pathogenic.
b Consanguineous marriages (parents are first cousins).
c Changes splice donor site score from 0.92 to 0.43, which results in in-frame deletion of exon 27.
d Generates a new splice donor site resulting in an additional exon (see Fig. 1d).
e This 19-base pair deletion leads to skipping of the entire exon 32.
f Creates splice donor site in exon 24 leading to an in-frame deletion of base pairs 2571 to 2636 from the cDNA.
g The maternal mutation appeared homozygous in the patient, but hemizygosity cannot be excluded, since DNA from the father was not available for analysis.
h Patients VU1301 and VU1466 were not used for identification of the FANCI gene. DNA sequence mutations were obtained afterwards. The mutations were not detected in a panel of 96 control individuals.
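The protein-level effects listed in Table 2 for the coding substitutions can be checked against the reference codons in SEQ ID No. 2. The Python sketch below is an illustration only: the function name is hypothetical, the standard genetic code is assumed, single-letter amino acid codes are used with "*" for a stop (written "X" in the table), and splice-site variants and deletions are outside its scope.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_effect(reference_codon, cdna_position, ref_base, alt_base):
    """Predict the protein change for one coding substitution.

    reference_codon is the reference codon that contains cdna_position
    (1-based, counted from the A of the ATG start codon).
    """
    offset = (cdna_position - 1) % 3  # position of the base within its codon
    if reference_codon[offset] != ref_base:
        raise ValueError("reference base does not match the supplied codon")
    mutated = reference_codon[:offset] + alt_base + reference_codon[offset + 1:]
    codon_number = (cdna_position - 1) // 3 + 1
    return f"p.{CODON_TABLE[reference_codon]}{codon_number}{CODON_TABLE[mutated]}"


# Reference codons read from SEQ ID No. 2.
print(substitution_effect("CGA", 3853, "C", "T"))  # codon 1285
print(substitution_effect("CGA", 3854, "G", "A"))  # codon 1285
print(substitution_effect("CCC", 164, "C", "T"))   # codon 55
print(substitution_effect("GGA", 1264, "G", "C"))  # codon 422
```

For c.2T>C the same calculation gives a threonine codon in place of the start codon; Table 2 records this variant as p.Met1?.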

Figure 2 Compensatory sequence alteration in KIAA1794 associated with phenotypic reversion in FA-I patient-derived lymphoblasts. (a) A subline of lymphoblasts derived from patient EUFA816 was phenotypically reverted to mitomycin C (MMC) resistance (EUFA816R), as shown by MMC-induced growth inhibition curves; HSC93, wild-type lymphoblasts. (b) Monoubiquitination of FANCD2 (formation of D2-L) is absent in EUFA816 lymphoblasts, but restored in the reverted cells. D2-S, non-ubiquitinated form of FANCD2. (c) Amplification of base pairs 3019 to 3765 of the KIAA1794 cDNA, encompassing exons 31 and 32, generated an extra (larger) fragment in EUFA816, which appeared weaker in the reverted cells, EUFA816R. Patient BD952 (who carries no mutations in the amplified region) served as a control. The lower band in both EUFA816 cells (wt) represents the other allele containing the premature stop mutation c.3853C>T in exon 37 (Table 1). (d) Genomic sequence showing exons 31 and 32 (red) and the additional exon (blue) resulting from the pathogenic mutation c.3350-88A>G (lower arrow), which creates a splice donor site with a score of 0.85 according to http://www.fruitfly.org/cgi-bin/seq_tools/splice.html. In EUFA816R cells a second mutation is observed (c.3349+97T>G, upper arrow), which reduces the splice acceptor score for the additional exon from 0.45 to 0.33, allowing normal splicing to take place with a probability that appears sufficient to correct the cellular phenotype.

[Figure 2 graphic: panel (a) x-axis, MMC (nM); panel (d) annotated with c.3350-88A>G.]

Figure 1 FA-I families and patients analyzed to identify the gene defect in individuals of FA complementation group I.

[Pedigree graphic: families 1 to 4, with individuals labelled 595, 592, 961, 962, 965, 816, 480 and 1355, and patients EUFA695 and EUFA1399.]

Filled-in symbols denote affected individuals. Families 1-4 were used to delineate the candidate regions for FANCI by homozygosity mapping (patients EUFA592, BD952, and 1428 in consanguineous families 1 and 2) and linkage analysis (families 3 and 4). DNA was not available from the parents of family 2 and from the patients EUFA695 and EUFA1399.

Table 1 Clinical features of the FA-I individuals studied

Columns: Patient ID; Retarded growth; Thumb/radius anomalies; Kidney/heart anomalies; Onset marrow failure (yr); Age at death (yr); Comments

EUFA592 Yes Yes Yes 2.5 6.5 Consanguineous
BD952 Yes No No 7.3 23 Consanguineous
1428 Yes No No 7.3 15 Consanguineous
EUFA695 12 No details known (a)
EUFA816 Yes No No 12 Died after transplant
EUFA480 Yes No No 4.8 Age 24, transplanted at age 8.5 yr
EUFA961 Yes Yes Yes > 1 Died after transplant
EUFA1399 Yes Yes Yes Age 30.5 yr
EUFA1301 Yes No No Alive at 8 years
EUFA1466 Yes Yes Yes 1.5

a Clinical details for this patient, including cause of death, could not be filed. Lymphoblasts were hypersensitive to mitomycin C and used for cell fusion analysis leading to classification as FA-I.

SEQ ID No. 2: KIAA1794/FANCI coding sequence (human cDNA), including the amino acid sequence of the polypeptide derived therefrom. This sequence represents a splicing variant in the database, which differs from the reference sequence by the presence of an additional exon (exon 24, underlined). The reading frame starts in exon 2. All (varying) splicing variants are also contemplated, including the one without exon 24.

2

-90 TCTTGTTGTTACGGGTAACGGAAGTGTGGCGGCGTTGGGTTGAGCGGGCTTTTTGGAAGT

2 -30 TTGTGGCGGAGTTCTGTGATATGAGCAACA

1 2 ->

1 ATGGACCAGAAGATTTTATCTCTAGCAGCAGAAAAAACAGCAGACAAACTGCAAGAATTT 60 1 M D Q K I L S L A A E K T A D K L Q E F 20

3

61 CTTCAAACCCTGAGAGAAGGTGATTTGACTAATCTCCTTCAGAATCAAGCAGTGAAAGGA 120 21 L Q T L R E G D L T N L L Q N Q A V K G 40

4

121 AAAGTTGCTGGAGCACTCCTGAGAGCCATCTTCAAAGGTTCCCCCTGCTCTGAGGAAGCT 180 41 K V A G A L L R A I F K G S P C S E E A 60

181 GGAACACTTAGGAGACGTAAGATATACACTTGTTGTATCCAGTTGGTGGAATCGGGGGAT 240

61 G T L R R R K I Y T C C I Q L V E S G D 80

5

241 TTGCAGAAAGAAATAGCGTCTGAGATCATAGGATTACTGATGCTGGAGGCTCACCATTTT 300

81 L Q K E I A S E I I G L L M L E A H H F 100

301 CCAGGACCATTATTGGTTGAATTAGCCAATGAGTTTATTAGTGCTGTCAGAGAAGGCAGC 360 101 P G P L L V E L A N E F I S A V R E G S 120

361 CTAGTGAATGGAAAATCTTTGGAGTTACTACCTATCATTCTCACTGCCCTGGCTACGAAA 420 121 L V N G K S L E L L P I I L T A L A T K 140

6

421 AAGGAAAATCTGGCTTATGGAAAAGGTGTACTGAGTGGGGAAGAATGTAAGAAACAGTTG 480

141 K E N L A Y G K G V L S G E E C K K Q L 160

7

481 ATTAACACCCTGTGTTCTGGCAGGTGGGATCAGCAATATGTAATCCAACTCACCTCCATG 540

161 I N T L C S G R W D Q Q Y V I Q L T S M 180

8

541 TTCAAGGATGTCCCTCTGACTGCAGAAGAGGTGGAATTTGTGGTGGAAAAAGCATTGAGC 600

181 F K D V P L T A E E V E F V V E K A L S 200

601 ATGTTCTCCAAGATGAATCTTCAAGAAATACCACCTTTGGTCTATCAGCTTCTGGTTCTC 660 201 M F S K M N L Q E I P P L V Y Q L L V L 220

9

661 TCCTCCAAGGGAAGCAGAAAGAGTGTTTTGGAAGGAATCATAGCCTTCTTCAGTGCACTA 720 221 S S K G S R K S V L E G I I A F F S A L 240

10 721 GATAAGCAGCACAATGAGGAACAGAGTGGTGACGAGCTATTGGATGTTGTCACTGTGCCA 780

241 D K Q H N E E Q S G D E L L D V V T V P 260

781 TCAGGTGAACTTCGTCATGTGGAAGGCACCATTATTCTACACATTGTGTTTGCCATCAAA 840

261 S G E L R H V E G T I I L H I V F A I K 280

11

841 TTGGACTATGAACTAGGCAGAGAACTCGTGAAACACTTAAAGGTAGGACAGCAAGGAGAT 900 281 L D Y E L G R E L V K H L K V G Q Q G D 300

901 TCCAATAATAACTTAAGTCCCTTCAGCATTGCTCTTCTTCTGTCTGTAACAAGAATACAA 960

301 S N N N L S P F S I A L L L S V T R I Q 320

12

961 AGATTTCAGGACCAGGTGCTTGATCTTTTAAAGACTTCGGTTGTAAAGAGCTTTAAGGAT 1020 321 R F Q D Q V L D L L K T S V V K S F K D 340

1021 CTTCAACTCCTCCAAGGCTCAAAATTTCTTCAGAATCTAGTTCCTCATAGATCTTATGTT 1080 341 L Q L L Q G S K F L Q N L V P H R S Y V 360

13

1081 TCAACCATGATCTTGGAAGTAGTGAAGAATAGCGTTCATAGCTGGGACCATGTTACTCAG 1140 361 S T M I L E V V K N S V H S W D H V T Q 380

1141 GGCCTCGTAGAACTTGGTTTCATTTTGATGGATTCATATGGGCCAAAGAAGGTTCTTGAT 1200 381 G L V E L G F I L M D S Y G P K K V L D 400

1201 GGAAAAACTATTGAAACCAGCCCAAGTCTTTCTAGAATGCCAAACCAGCATGCATGTAAG 1260

401 G K T I E T S P S L S R M P N Q H A C K 420

14

1261 CTCGGAGCTAATATCCTGTTGGAAACTTTTAAGATCCATGAGATGATCAGACAAGAAATT 1320 421 L G A N I L L E T F K I H E M I R Q E I 440

1321 TTGGAGCAGGTCCTCAACAGGGTTGTTACCAGAGCATCTTCTCCCATCAGTCATTTCTTA 1380 441 L E Q V L N R V V T R A S S P I S H F L 460

15

1381 GACCTGCTTTCAAATATCGTCATGTATGCACCCTTAGTTCTTCAAAGTTGTTCTTCTAAA 1440 461 D L L S N I V M Y A P L V L Q S C S S K 480

1441 GTCACAGAAGCTTTTGACTATTTGTCCTTTCTGCCCCTTCAGACTGTACAAAGGCTGCTT 1500 481 V T E A F D Y L S F L P L Q T V Q R L L 500

16

1501 AAGGCAGTGCAGCCCCTTCTCAAAGTCAGCATGTCAATGAGAGACTGCTTGATACTTGTC 1560 501 K A V Q P L L K V S M S M R D C L I L V 520

17

1561 CTTCGGAAAGCTATGTTTGCCAACCAGCTTGATGCCCGAAAATCTGCAGTTGCTGGGTTT 1620 521 L R K A M F A N Q L D A R K S A V A G F 540

1621 TTGCTGCTCCTGAAGAACTTTAAAGTTTTAGGCAGCCTGTCATCCTCTCAGTGCAGTCAG 1680 541 L L L L K N F K V L G S L S S S Q C S Q 560

18

1681 TCTCTCAGTGTCAGTCAGGTTCATGTGGATGTTCACAGCCATTACAATTCTGTCGCCAAT 1740

561 S L S V S Q V H V D V H S H Y N S V A N 580

1741 GAAACTTTTTGCCTTGAGATCATGGATAGTTTGAGGAGATGCTTAAGCCAGCAAGCTGAT 1800 581 E T F C L E I M D S L R R C L S Q Q A D 600

19

1801 GTTCGACTCATGCTTTATGAGGGGTTTTATGATGTTCTTCGAAGGAACTCTCAGCTGGCT 1860 601 V R L M L Y E G F Y D V L R R N S Q L A 620

20

1861 AATTCAGTCATGCAAACTCTGCTCTCACAGTTAAAACAGTTCTATGAGCCAAAACCTGAT 1920 621 N S V M Q T L L S Q L K Q F Y E P K P D 640

1921 CTGCTGCCTCCTCTGAAATTAGAAGCTTGTATTCTGACCCAAGGAGATAAGATCTCTCTA 1980 641 L L P P L K L E A C I L T Q G D K I S L 660

21

1981 CAAGAACCACTGGATTATCTGCTGTGTTGTATTCAGCATTGTTTGGCCTGGTATAAGAAT 2040 661 Q E P L D Y L L C C I Q H C L A W Y K N 680

2041 ACAGTCATACCCTTACAGCAGGGAGAGGAGGAAGAGGAGGAGGAAGAGGCATTCTACGAA 2100 681 T V I P L Q Q G E E E E E E E E A F Y E 700

2101 GACCTAGATGATATATTGGAGTCCATTACTAATAGAATGATTAAGAGTGAGCTGGAAGAC 2160 701 D L D D I L E S I T N R M I K S E L E D 720

22

2161 TTTGAACTGGATAAATCAGCAGATTTTTCTCAGAGCACCAGTATTGGCATAAAAAATAAT 2220 721 F E L D K S A D F S Q S T S I G I K N N 740

2221 ATCTGTGCTTTTCTTGTGATGGGAGTTTGTGAGGTTTTAATAGAATACAATTTCTCCATA 2280 741 I C A F L V M G V C E V L I E Y N F S I 760

23 2281 AGTAGTTTCAGTAAGAATAGGTTTGAGGACATTCTGAGCTTATTTATGTGTTACAAAAAA 2340

761 S S F S K N R F E D I L S L F M C Y K K 780

2341 CTCTCTGACATTCTTAATGAAAAAGCGGGTAAAGCCAAAACTAAAATGGCCAACAAGACA 2400 781 L S D I L N E K A G K A K T K M A N K T 800

24

2401 AGTGATAGTCTTTTGTCCATGAAATTTGTGTCCAGTCTTCTCACTGCTCTTTTCAGGGAT 2460 801 S D S L L S M K F V S S L L T A L F R D 820

2461 AGTATCCAAAGCCACCAAGAAAGCCTTTCTGTTCTCAGGTCCAGCAATGAGTTTATGCGC 2520 821 S I Q S H Q E S L S V L R S S N E F M R 840

2521 TATGCAGTGAATGTAGCTCTGCAGAAGGTACAGCAGCTAAAGGAAACAGGGCATGTGAGT 2580 841 Y A V N V A L Q K V Q Q L K E T G H V S 860

25 2581 GGCCCTGATGGCCAAAACCCAGAAAAGATCTTTCAGAACCTCTGTGACATAACTCGAGTC 2640

861 G P D G Q N P E K I F Q N L C D I T R V 880

2641 TTGCTATGGAGATACACTTCAATTCCTACTTCAGTGGAAGAGTCGGGAAAGAAAGAGAAA 2700 881 L L W R Y T S I P T S V E E S G K K E K 900

2701 GGAAAGAGCATCTCACTGCTGTGCTTGGAGGGTTTACAGAAAATATTCAGTGCTGTGCAA 2760 901 G K S I S L L C L E G L Q K I F S A V Q 920

26

2761 CAGTTCTATCAGCCCAAGATTCAGCAGTTTCTCAGAGCTCTGGATGTCACAGATAAGGAA 2820 921 Q F Y Q P K I Q Q F L R A L D V T D K E 940

2821 GGAGAAGAGAGAGAAGATGCAGATGTCAGTGTCACTCAGAGAACAGCATTCCAGATCCGG 2880 941 G E E R E D A D V S V T Q R T A F Q I R 960

27 2881 CAATTTCAGAGGTCCTTGTTGAATTTACTTAGCAGTCAAGAGGAAGATTTTAATAGCAAA 2940

961 Q F Q R S L L N L L S S Q E E D F N S K 980

2941 GAAGCCCTCCTGCTAGTCACGGTTCTTACCAGTTTGTCCAAGTTACTGGAGCCCTCCTCT 3000

981 E A L L L V T V L T S L S K L L E P S S 1000

28 29

3001 CCTCAGTTTGTGCAGATGTTATCCTGGACATCAAAGATTTGCAAGGAAAACAGCCGGGAG 3060

1001 P Q F V Q M L S W T S K I C K E N S R E 1020

3061 GATGCCTTGTTTTGCAAGAGCTTGATGAACTTGCTCTTCAGCCTGCATGTTTCGTATAAG 3120 1021 D A L F C K S L M N L L F S L H V S Y K 1040

3121 AGTCCTGTCATTCTGCTGCGTGACTTGTCCCAGGATATCCACGGGCATCTGGGAGATATA 3180 1041 S P V I L L R D L S Q D I H G H L G D I 1060

30

3181 GACCAGGATGTAGAGGTGGAGAAAACAAACCACTTTGCAATAGTGAATTTGAGAACGGCT 3240 1061 D Q D V E V E K T N H F A I V N L R T A 1080

31

3241 GCCCCCACTGTCTGTTTACTTGTTCTGAGTCAGGCCGAGAAGGTTCTAGAAGAAGTGGAC 3300 1081 A P T V C L L V L S Q A E K V L E E V D 1100

32

3301 TGGCTAATCACCAAGCTTAAGGGACAAGTGAGCCAAGAAACCTTATCAGAAGAGGCCTCT 3360 1101 W L I T K L K G Q V S Q E T L S E E A S 1120

3361 TCTCAGGCAACCCTACCAAATCAGCCTGTTGAGAAAGCTATCATCATGCAACTGGGAACT 3420 1121 S Q A T L P N Q P V E K A I I M Q L G T 1140

3421 CTGCTTACATTTTTCCACGAGCTGGTGCAGACAGCTCTGCCATCAGGCAGCTGTGTGGAC 3480 1141 L L T F F H E L V Q T A L P S G S C V D 1160

33

3481 ACCTTGTTAAAGGACTTGTGCAAAATGTACACCACACTTACAGCCCTTGTCAGATATTAT 3540 1161 T L L K D L C K M Y T T L T A L V R Y Y 1180

34

3541 CTCCAGGTGTGTCAGAGCTCCGGAGGAATTCCAAAAAATATGGAAAAGCTGGTGAAGCTG 3600 1181 L Q V C Q S S G G I P K N M E K L V K L 1200

35

3601 TCTGGTTCTCATCTGACCCCCCTGTGTTATTCTTTCATTTCTTACGTACAGAATAAGAGT 3660 1201 S G S H L T P L C Y S F I S Y V Q N K S 1220

3661 AAGAGCCTGAACTATACGGGAGAGAAAAAGGAGAAACCTGCTGCCGTTGCCACAGCCATG 3720

1221 K S L N Y T G E K K E K P A A V A T A M 1240

36

3721 GCCAGAGTTCTTCGGGAAACCAAGCCAATCCCTAACCTCATCTTTGCCATAGAACAGTAT 3780 1241 A R V L R E T K P I P N L I F A I E Q Y 1260

37

3781 GAAAAATTTCTCATCCACCTTTCTAAGAAGTCCAAGGTGAACCTGATGCAGCACATGAAG 3840 1261 E K F L I H L S K K S K V N L M Q H M K 1280

3841 CTCAGCACCTCACGAGACTTCAAGATCAAAGGAAACATCCTAGACATGGTTCTTCGAGAG 3900 1281 L S T S R D F K I K G N I L D M V L R E 1300

38

3901 GATGGTGAAGATGAAAATGAAGAGGGCACTGCATCAGAGCATGGGGGACAGAACAAAGAA 3960 1301 D G E D E N E E G T A S E H G G Q N K E 1320

3961 CCAGCCAAGAAGAAAAGGAAAAAATAAATGAAATGCCTGAGTTAATGTGAACTTTGGGGC 4020 1321 P A K K K R K K X
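The pairing of nucleotide and amino acid lines in the listing above can be reproduced with a short translation routine. The following Python sketch assumes the standard genetic code and reads from the first base supplied; the function name is hypothetical and the example uses only the first 60 coding nucleotides and the final codons of SEQ ID No. 2.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cdna = cdna.upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for start in range(0, len(cdna) - len(cdna) % 3, 3):
        amino_acid = CODON_TABLE[cdna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Coding positions 1-60 of SEQ ID No. 2 (codons 1-20).
first_line = "ATGGACCAGAAGATTTTATCTCTAGCAGCAGAAAAAACAGCAGACAAACTGCAAGAATTT"
print(translate(first_line))

# Coding positions 3961-3987: the last eight codons followed by the stop codon.
last_codons = "CCAGCCAAGAAGAAAAGGAAAAAATAA"
print(translate(last_codons))
```

Translating from position 1 reflects the statement that the reading frame starts at the ATG in exon 2; the untranslated bases at positions -90 to -1 are left out.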

SEQ ID No. 1

Human genomic sequence of FANCI gene, including exons and introns.

1 CCTGTCGGGACACCGCGCTGGGCACGCTCAGGTCCCACCGCGCGTGCTCATCGGCGGTCC 60

61 CTAGCGCGAGTTCGGACAGGCAGCGCCCCCCTCCGACTGTGAGCTGGGACGCTCCCGCAC 120 121 GCTTCCCGGCACCCCTTCAGTCTTCATGGTACACCCCGCCCGCAGGTACCCGGACGGCGG 180 181 AAGTGAGCCGCGGGGGCGGATCTTGTTGTTACGGGTAACGGAAGTGTGGCGGCGTTGGGT 240 241 TGAGCGGGCTTTTTGGAAGTTTGTGGCGGAGGTGAGGCCGAGGTGACTGCAGAGCGGCTC 300

301 GCGAGGTGCTCGGGCTGTGGGACTGGGCCCCTGGGAGGGAGCGGTTCTGTGGGGGAAAGG 360
361 AGGCTCCTGTCCTGACTTGGCGTTCTGGTATTTTCTCTGGCGTGGAAAGTAGGCGCGCTC 420
421 CGCTGTCTCCCGCCGCCTGCCTCAGCTTTCGCGGTTCCCTCCTAGGGGTGTGCCTCAGTC 480
481 GGAGCCCCTTCTGTTAAATTCTTCCCCCTTGGCCGAGGGTGAGCTCTTACTTCTGAGAAA 540
541 TTTCCCTTAACCTTGCTTTGCATTTGTTGTCCGGTCGCGGTCTCCATCACCAGCTGCTTC 600

15 601 GTGTTATAGTTTTTGTATTTGCGCTCACGGCTTGTTAACTGAAGAACCCGGAAGGGAAGG 660 661 CCGCGGCGTTTCCCGCCGGCGGGGCTCTGCCATTCTCGGATCTTGGCTTCAGATCTCTGA 720 721 TGCTCCCCAGCGTCTCTTGGCAAGGAATCTACTTCGCGTTCAGGGAATTCACAGCCACCC 780 781 GCCTCTGGTCTTACCTCTGGTTTTAGGATGATTTTAGCAAATGTGCCTAGGTTGATTTAA 840 841 ACGTATCCCCCTCCGAAGCCGCCTGCCCCTTCTGGCTGGACACCTTTTTAGGTTGGTGTG 900

20 901 TCCTGTTTGCATCCGGTAGTTCAAAGCATGGGATCTTGGAGTCCGGAGCTGGACAGACTC 960 961 GGGTTTCACTTCTCTCTCTGCCACTTGGTAGCCTTCTAATCTTGGGGGAAATTCCCTAAT 1020

1021 CTCTGAATCCCAAAAGTTGTTCATCTAAAATGGGGCTAATATTACACCTACCTAGTAGGT 1080

25 1081 GGTGTTGTGGGAGTCATGTTAATGAATGTAAAGTGCTGGGTTCAGTGTTTTCCCTATTAT 1140

1141 AGGGGTTCAATAATTTGTTCAACAAACATTTGTTACCTAGTTTGAGCTAGGTACTGTGCT 1200

1201 AAGTGCTGGAGATACAATGATAAACAAAACCAGCTACGGCCCGTGTCCTTAGGGGACTTA 1260

1261 TAGTCTAGTGGGAGATAACATTTGTTAATCTGATAATTACACATGTAATATTAAAAATTG 1320

1321 GTGACAGTATGTAAGAGAGTTTACTGGTGCCATGAGAGGATGTACTAGGGGAACGTTTAA 1380

1381 CAACCTATACTGAGCTTTTCTTTTCTTTTCTTTTCTTTTTTTTTTGAGACAGAGTCTCCC 1440

1441 TCTGTTGCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAACCTCCGCCTCCTG 1500

1501 AGTTCAAGTGATTCTCCTGCCTCAGCCTCACAAGTAGCTGGGACTACAGGTGCGTGCCGT 1560

1561 CACGCCCAGCTATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTATTTTTAGTA 1620

1621 GAGACGGGGTTTCACCATCTTGGCCAGGAAGGTCTGGATGTCTTGACCTAGTGATCCGCC 1680

45 1681 CGCCTTGGCCTCCCAAAGTGCTGAGATTACAGGCGTGAGCTACTGCGCCTGGCCTAGACT 1740

1741 GAGCTTTTGAGGAGATGTGGATTGAGCTAAGATCTGAAGGATTTGTAGGAATTAGTAATA 1800

1801 AAAGATTGGGGGGCAGTGGATGGAAGCAGGAATATTTATTTAAGTTCCAGATGGAAACAT 1860

1861 GTGGAAGGGCCTGTGGCAGAAAGAACTTGGAGATTTGAAGGACTGGAAGCAATTCTGCAT 1920

1921 GGCTGGAGCAGAGTGATGAAAGGGAGGTAGGCATGGGTCAGAACTTGCAGGACTTATTTT 1980

55 1981 ATCCTTACTTAAGGGTTTTGGACTGATAAAGCAATGGAAGCCACTGAAGAGCTTTAAGAT 2040

2041 GAGGATGGCATGGAGGGAAGATAAGTCACATAATCAGGTTTGTGTTTCAAAAACATCACT 2100

2101 TTAGTTGCTGCGTAGAAGGGAACAAGTTAGGAAACGATTGCAGTAGTTCAGGTGAGAGAT 2160

5 2161 GGTGGTGGCTTGGGGTAATGTGTCAGTAGTGGAGATGGAGAAAAGTAGACAGATTTGGGA 2220

2221 GATATTTAGTAGGTAACATTGACAGGACTTGGTAATGGATGTGGTGAAGGATGAAGGCAA 2280

2281 AGTTGTTAAGAATGTTACCTAGGTTTCTGGTCTGAGCAGCTGAGTGGATGATGGTACCAT 10 2340

2341 TTATGGAGACGGGAAACACAGGAGGAAGAAGAGGAGGAACAGATTTTCTTTCTACTAGGC 2400

2401 TGTACTTCCCCCTTCTATTTTCATAGCTACTCACTGAATTCTATCTGCTTCATGTGCCTC 2460

15 2461 CATTTCATCCCATCTTTTCTGTTCTCCAGTCACCACCCTAGTACAGGTCTTTATCATTTC 2520

2521 CACTAAAATAACTTCCAAATGGTCTCTTAGCATTTGCTGTTTCTTCTTACCTTCTCCCCA 2580

2581 CTTTAATCTGCACAAAGATGCTAGGTTCTGTTTCCTCAAACACTTGCAATCTTGTGAGAT 20 2640

2641 TGTCTTTTGCTCAAATTTTTTAATGGTTCTCAGTAATTTACGAAATAGAATCTAAATTCC 2700

2701 TTAACCTGTTTACTTCCAGCCTATCTTTCCAGACTTCCTACTCACTATTTTCCTACTTGG 2760

25 2761 ATTCTACCTCAGCCATTTTACTCATTTTCCCCCAAAGAAATTTCTCATTTCCCGTATGAA 2820

2821 TGCCTTTGTTCATGATACCCTTCTCCATAATGTATACAACCTGGTTGTCTGGACTACCTA 2880

2881 TATTTAATCTCTAAGTTCCAGTTCAAGCTTTCCTCTGCTGCGAAACCTTGACATTTAAGA 30 2940

2941 ACACAGTTACATCTTTTACCTTGTTAACTTCTCAGCTTTTCTTTCTTTCTTTCTTCCTTT 3000

3001 CTTTTTTTTTTTTTTTTTTTTGAGACAGAGCCTCGCTCTGTCGCCCAGGCTGGAGTGCAG 3060

35 3061 TGGCGTGATCTCCGCTCACTGCAAGCTCCTCCTCCCGGGTTCACGCCATTCTCCTGCCTC 3120

3121 AGCCTCCCGAGTAGCTGGGACTATAGGCGCCCGCCACCACGCCCAGCTAATTTTTTTTTG 3180

3181 TATTTTTGGTAGAGACGGGGGTTTCACCGTGTTAGCCAAAATGGTCTCGATCTCCTGACT 40 3240

3241 TCGTGATCTGCCCTCCTCAGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACGGTGC 3300

3301 CGGCCTCTTTTTTTTTTTTTTTTTAAGAGACAGGGTCTTACCTTGTTGCCTAGGCTGGAG 3360

45 3361 TACTGTGGCATGATCATAGCTCACTGCAGCCTCACACTCTTGGGCTCCCCTTCTCACTTG 3420

3421 TTTTTATACCATTTAATTGGCACATAGCATTTATAGTGCTATTGGTTTATTCATTTATTT 3480

3481 GTCAAATACTTTATTACATGCTTATGGTAATATATAATGTATTACATACTTGATATGCTG 50 3540

3541 TTAAATGGAAGCTTTGTTCTAGGTAGTATGGTGAGTAGATGGTAACAAGTGTGGGCAAAA 3600

3601 ATTATGAAACCACTTGATTAGAATACAGATTCCTAATAAGGCAATTCCAGGGAATTTTCA 3660

55 3661 GGATTATTTTGGTTAGGTTAAGAATCATCTTGCCACTGAGCTTAAGAAGAAGAAGAAAAA 3720

3721 AGAATCAGCTACTTAAAGATAGTCCTAAGTTTGTTGAGCACCCATTGCATAATAGGTATG 3780

3781 AAATATTCAGTTAACAGGGAAGGAAAACAAATCAAGTTTGTTTATTTCTGGAATCTTCAC 3840

3841 CCACCTCTGACGTTTTTCCCTTGTAGTTCTGTGATATGAGCAACAATGGACCAGAAGATT 3900 3901 TTATCTCTAGCAGCAGAAAAAACAGCAGACAAACTGCAAGAATTTCTTCAAACCCTGAGA 3960

3961 GAAGGTGATGTGAGTATTAGGAAGCATGTTCTGCTAAAAGTAAATGTCAGGCATGATGAC 4020

4021 ACATTCAAAGGACATGTGAGAAAGAAAAATTACGCTGCTTCTTCATCTACTCCCCATTCA 4080

4081 CTGTAGGAGACAAGGATTTATTTAATAATAATTTATAGATGTAAAGAAATTGGGGTGAGG 4140

4141 GAATGAGAGAAATAGACTACTTGAGGGCATTTGTGACCAAGTTTAAACCTGAAATAGTTT 4200 4201 AAGAATAGAATGCTTTTACAACTGTATTTATAAAAGCAATTAGTATACTTTTATGGTCCT 4260

4261 CAGAACAATCCTATTTGATGATAATATCCAGACTATAGTTACTGATTGAGACATTAGTTT 4320

4321 TGCATTTTGATGCTTCTCTGTATTTTTCCGTAAAGAAATTAAGTTTATAAGCGGCAAAAT 4380

4381 ATTTTGAAATTATTAGTAAAAATTAGGGCCTCTGTTTTCACTATAAGTGTAGGATTATAA 4440

4441 AATGGGATAGTAATTGCCTTATATGTTAGGCTGTTTCTTCTCCCTAATGTTCTTTCTCAG 4500 4501 CTGATATACTGCTCCTTTTTCCATAAGTGATCTCATCCTATCCAGGTCCCTGCCTGGAAA 4560

4561 AGAATCCCTCTTCAAGGTATATGTACAAACAAGTAAAGACATTGAGTATCATGGGATTTT 4620

4621 TTTCCTTTTTCTGTCAATTGATGAGTATTATGTTTTTAAGGGAATTTCTTTGTGTTAATG 4680

4681 TTGTAGTTATTGAAGGAAGGTGCATCTTAGTGTGTGTTATCTGACATACTAACTAGGAAT 4740

4741 TGATGCAGTGTTGCTCTTTGATTTTTTTTTCTTTGAAATTATCCAAGATTGCCTTTATAG 4800 4801 CTATTTTTTGAGATGACTTGTTGCATGAGTTGTAACTTGAATCATTCTATAGACAAGATT 4860

4861 TGATTGAAACTAGCATTTAAATTATTTATTTAAAGTTTCATGCCTCAAATTGAAGGCGTG 4920

4921 AACACTGTCAAATTGAAGACAGTGTTCCAGATGTTTTCTGACTCTAGGATAGAGTAATGC 4980

4981 TAATATTTCCCTCATTTTTCCATTAATGCAAAATCAGCTTTTTGGCAGCAACATTATATT 5040

5041 GTAGTCTCATTGCTCTTATATTTATTTTTGTATATGTTGATAAGTTGCTAAGTTTCCTTT 5100 5101 ATCTCTGAAGTCCTTTTAGAATCCATGTTAAAAACTATATATAGGCCAGGCATGGTGACT 5160

5161 TACACCTGTAGTCCCAGCACTTTGGGAGGATTCCTTTAGGCCAGGAGTTCGAGATCAGCT 5220

5221 TAGGCAACATAGTGAGACCCAGTTTCTAAAAACAACTACCAAAACAACAAACTTTATATC 5280

5281 TGTTAACTGTCACCTTTTTTAGATTCATCTTGTTTTACTCCATCAAGATCATTTTGTATG 5340

5341 ATTCTGCTGACCGTATTAATCAGCTTTTTATCCTCTACAAATTTGTTAAGCATGTCAGTA 5400 5401 ATATATGCAGGCAGTTCATTTGATAAAATGTCCCACAGAAGATGAACTCAGAGTAAAAAA 5460

5461 TTATAAAACACATAAAGAAGTCCTGTGGCAAGAAAGACTCAAACAGAAATGGGAAAATTC 5520

5521 ATATTCAAGGATTTAGAGCAATCTGAAAAAAATCTTTAATTATTTAACACAAATACTTAA 5580

5581 TACAGTATTTAACTTTATTTATTTAAAGGTGGGATCTCACCCTTGCACAGGCTGGAGTGC 5640 5641 AGTGGGGCGATCTCGGCACACTGCAGCCTCAACCTCCTGGGCTCATGCGATCCTCCCACC 5700

5701 TCAGCCTCCTGAGTAGCTGGGAGTACAAGCATGTGCCACCATGTCCGACTAATTTTTGTA 5760

5761 TTTTTAGCAGAGATGGGGTTTCGCCATGCTTTCCAGGCTGGTCTTGAACTCCTGAGCTCA 5820

5821 AGCAATCCACCCACCTCAGCCTCCCACAGTGCCGGGATTACACGTGTGAGCCACTCTGCC 5880

5881 TGACCTAGTATTTAAAATATTTGAAGATGTAAATGAAGGAATAGAACTCACAAAATAAGG 5940
5941 GATGGATTTAGAAAACAATGAAAGACAACTTCTTGAAATGAAAACCTCAGTAGGTGGTTT 6000

6001 AACAGTAGACAAAACATTGCTAAAGAGAAAATTAGTAAAGTGGAATACAGAGCTGAGGAT 6060

6061 AACTAAATAAAGCACAGATTGTAAAGAGGTGGAAAATATAAAGAATATTTAAAATATATG 6120

6121 CAGAGAACTTTGATAAGAATATATTATCTTCTTAAACACATAGAATGTTCATAAAAATTA 6180

6181 ATCACATGTCATATACTAGAGCACAAAGAAAACATGGGTAGGTCCCATAAGGTAGAATTA 6240 6241 TTACAAACTATTCTCTGTGATCACAATGCAATAAAACTGGAAATGAAAACCGAAACTAAA 6300

6301 AGTCAAAAAGCAAAGACACTTTTACTTGGAAATTAAAATACCTTCTATTCAACCATTGTG 6360

6361 GAAGACAGTGTGGCGATTTCTCAGGGATCTAGAACTAGAAATACCATTTGACCCAGTGAT 6420

6421 CCCATTACTGGGTATATACCCAAAGGACTGTAAATCATGCTGCTATAAAGACACATTCAC 6480

6481 ACGTATGTTTATTGCGGCACTATTCACAATAGCAAAGACTTGGAACCAACCCAAGTGTCC 6540 6541 AACAATGATAGACTGGATTAAGAAAATGTGGCACATCTACACCATGGAATACTATGCAGC 6600

6601 CATAAAAAATGATGAGTTCATGTCCTTTGTAGGGACATGGATGAAGCTGGAAGCCATCAT 6660

6661 TCTCAGCAAACTATCGCAAGGACAAAAAAACCAAACCCCACATGTTCTCACTCATAGGTG 6720

6721 GGAATTGAACAGTGAGAACACTTGGACACGGGAATGGGAACATCACACACCGGGGCCTGT 6780

6781 TGTGGAGTGAGGGGAGGGGGGAGGGATAGCATTAGGAGATACACCTAATGTAAATGACGA 6840 6841 GTTAATGGGTGCAGCACACCAACATGGCACATGTACGCATATGTAACAAACCTGCATGTT 6900

6901 GTGCACATGTACCCTAGAACTTAAAGTATAATAAAAATATATATATATATAAAAATAAAA 6960

6961 AAATACCTTCTAAATAACTCTTGGAGGAAAAGAGATACAAACAAATTACAGGATTTTTGA 7020

7021 AGAATAATTACAGATAACAAAAATACTACATGTTAAAAATCCATGGAATAATGTTAAAAA 7080

7081 CAAGTGTTTAACAAATGAAATAATGAAATCAATACAAATGAAATAATGAAAATAAATAAA 7140

7141 TTCCCAACTCAAAAACCTAGAAAAAAGAGAAAGTAAACAACAACAAAAAAACACAAGAAA 7200

7201 CTACTAAAGATAAAAGTGGAAATTAATAAAGTAGAGAACAGAAAAATAGTGGATCAAATT 7260

7261 AATCAAAATCTTGGTTCTTTGTGGGGGAAAAAAACACCATTAATTAATATAATAATCAAA 7320

7321 ACAGTGGAGTAAAAACATGCACAAAATGAGAAATGACAAGGGAGAAATACTGTTAATACA 7380 7381 GAGAAAATCTCAAAAAATACTGTGAGACTACAGTAGATTCCTGTGCAAATAAATTTGAAA 7440

7441 ACCTAGAGGAAATGGATAATTTCTTAGGCACGTACAGTTTAATAAAATTGAGCCAATTTG 7500

7501 AGATGGAAAGCTTAAGCAGACCAGTTTCCATAGAAGAAAAGAGAAAGTTATGAAGGAACT 7560

7561 ACTTCGTACAAAAACACCAGGCCCAGATAGATTGGCAGGAAACTTCTGTCAAACCTTTAG 7620

7621 AGATCAGGTAGATAGTCCTAATGCTACATAAATTGTTTTAGAGCATAAATAATAAAGGAA 7680 7681 AACTTCCAAGTTCTTTGAATGAAGCAAGTATAACATTGAATTAAACAAATATAACATTGA 7740

7741 TTTTTTTATGCTATCTTTATCAGGTATAACCTTAAACCTCATAAAGATAGCATAGAAAAA 7800

7801 AGAGGATATAGACTAATGTAACTTACGAATATTGATATAAAAGTCTTAAATATTAGGGGA 7860

7861 CAGATTGCAACACCACATTTAAAAAAAGTACACCATGATCAAGTGGGATTTAATAGCAGG 7920

7921 AATACAAGGTTGGTTCAGTATTTGAAAGTATATTACTATAAAGATTATACGATTATCTTC 7980 7981 ATGGATCAGAGAAAGCCTTTGACAAAATTCAACACTCATTCCTAATAAAAGCTTGAGAAA 8040

8041 AATCAGAATGGGTGGCTGTTTCCTTTACATCATAAAATGTATATATCTAAATTCTAGAGT 8100

8101 CAGCATCTTAGTGGGAAACCATAGAGACATTTCCTCTAAGGCCGGGAACAAAGCAATGAT 8160

8161 GCCTACTGTCTCCATTACTATATAATGTTCTGATTGGCTAGCCTATGCATTTAACCTAGA 8220

8221 GAAAACAATCACAGGCATAATAATGGAAAAATAAAGGCTGAGTGCAGTGGCTCACACCCA 8280

8281 TAATCCCAACGCTTTGGGAGGCCAAGGTGGGCAGATCACTTGAGGTCAGGAGTTCGAAAC 8340

8341 CAGCCTGGCCAACATGGTGAAACCCCTTCTCTACTAAAAATATAAAAATTAGCTGGGCGT 8400

8401 GGTGGTGTGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAAGAATCACTTGAAC 8460

8461 CTGGGAGGTGGAGGTTGCAGTGAGCCCAGATTGCACCATTGCACTTCAACCTGGGTAAAC 8520

8521 AAAGTGAGATTCCATCTAAAAAAAAAAAAAAGAAAATATAAAACGTCTCTCTTTGTAGAT 8580

8581 GGCATGATAGTATACCTGGAAAACCCTGGAGAATCTGTGATAAAACCATCTATAAACGAA 8640

8641 TTCATCAAGGTAGCAGAATATAAAATTAATATGCAAAAATCAGTAGCATTCATACATAAT 8700

8701 AGCCAGTTAGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCAAGGCAGGCGGATCACC 8760

8761 TGAGGTGAGGAGTTCTAAACTAGCCTGGCCAACATGGTGAAACCTCAACTCTACTAAAAA 8820

8821 TAGAAAAATTAGCTGGGCGTGGTGGTGCACGCCTGTAGTCTCAGCTACTCAGGAGGCTGA 8880 8881 GGCAGGAGAATTGCTTGAACCCGGGAGGTGGAAGTTGCAGTGAGCTGAGATCACGCCACT 8940

8941 GTACTCCAGCCTGGGCCCCATTTACAATAGCAACAGAAAAGTTAAAACAGGAATAAACCT 9000

9001 GAGAAATGTCCAAAACCCATGCGAAAAAAAACTATGAAACACTTCTGAAGGACAAAGTAG 9060

9061 AAAACTAAGAAGATATCCCCTGTTCTTGATTGGGGAGTCTCAACATCACGAAAATGTCAA 9120 9121 TTCTTCCAAAGTTAATTTATCAATTTAATACCAGTAAAAATCCCATTGAGCTATATATAT 9180

9181 AAGTTGACTGCAAAATTCATATAGCAAATTGTACACACAGGGATAGTTAGGGAAAAGCTG 9240

9241 AAAAGAGCTCTGAGAGAGGATTAACCATAGCAGAGACTAAACACCACAAAAACTTTTATA 9300

9301 AATAAGAGTGCAGTACTGGTACATGAGTAGATAAACAGAAACGGAATAGGATACAAAGAA 9360

9361 TAGAATAGGTAATGCATAAGAATGGAAAGTCTCAAAATAGACCCAACTACAAGTAAAATC 9420 9421 TAGTATATAATAAAATTACATCTTAAAACACTGAAATAAAGATAATCTTTGACAAATGAT 9480

9481 ATTGGAACAACTGGATGGACATTTGGAAAAAATAGATGAAATTAGATGTGTTTCTCATAT 9540

9541 TCAAGAGAAAGCTCCAAATGAATCAGAAATTTAAATGTAAAAAACAAATCTAAATAAATG 9600

9601 TTAGGGCTGGGCACAGGGGCTTATCCCTGTAATCCCAGCACTTTGGGAGGCTGAGGTGGG 9660

9661 AAGATCGCTTGAGCCACGAGGTTCAGATTACAGTGAGCTATGATTGTGCCACTGCACTTC 9720

9721 AACGGGTGAAAGTGAGACTCTGTCTCTTAAAAATAAATAAATAAATAAATAAATAAATAA 9780

9781 ATAAGTATATGTTAGAAGAAAACATGGATGAATTTCTCTTTAAGCTCAGTGCAGTGAAAA 9840

9841 GCTTTCTAACTGTCATACAAAGTATAGATGCAAGAAAAGATAAGTTTGTCTACATATAAG 9900

9901 TTAAAAAGTTTGTGTGGCCAGAAATACCATAAGCAGAGTCAAAAGACAAATGGCAAACTG

9960 9961 GGAAAAAATATTTGTAACGTATCACAGAAAAAGGATTAATATCCCTAATTCATAAATAAC

10020 10021 TTGTGGGGGGGGGGGGGAAGATAAACTGTATTTTAAAAAGGCAAAAAACATGAGAATAAT

10080

10081 CACTAAAAGATATATAAAAATGGCCTTAAACATAAGAAAATATGCTCAACTTCACTCATA

10140

10141 ATAAGAGAAAAATAAAATAAAAGATAGTGATCCACCTGCCTTGGCCTCACAAAGTGCTGG 10200

10201 GATTACAGGCAGGAGCCACCATGCCTGGCCCCATGTATATTTTTATGTGTATAAACATGC

10260

10261 ATACATTTCCTGGCTCTCTCCAATGAAAGAACCATGAAGCAAAGATGCCATAGTTGCTCG

10320 10321 GTGCGGTGGCTCACGCCTGTAATCCCAACACTTTGGGAAACTGCGTGGGTGGATCACTTG

10380

10381 AGTCCGGGAGTTCAAGACCAGTTTGGGCAACGTGGTCAAATCCTGTCTCTATAAAAAAAT

10440

10441 AAATAAATAAATTTTAAAAAGATGCCTTAGTAACAACAAACACACCCATCACCTAGATCT 10500

10501 TTGCTGTTTCATAACCATTCCTCACTCAAGGGAACCAGGGCTCTTCAGAGAAACGGATAA 10560

10561 GTTTATGCGTGGGCAGGATGGATACAAGATGAACCTGGAACATCTTGCTATACCAGAAAG 10620 10621 TAAAGAAGTGCTCAAAAAAGACAGTGGGGGCGTGTCACAAAGACAGAAGAGCCAGCTTGA 10680

10681 AGGGTTTCCCTTTGGCCAAATCTGGGACTATTTGAGCATCAAAATAAGTTCAATAATGAA 10740

10741 CCATTGAAAAATACAGAGATTTATAAGTCCATACTCATAATTAATAAATAAATTACTTTG

10800

10801 GGAGTTCGAGGACGGAGGATCACTTGAGTCCAGGAGTTTGAGACCAGCTTGGACAACATA

10860 10861 GTGAGAACCCACCTCTACAAAAAAATAATTTAAAAAATTAGCATGGCATATCAGCACACG

10920

10921 CTTGTATTCCTAGCTAACTGGAGGGTTGCTGAGGCAGGAGAATTGCTTGAGCCCAGGAGT

10980

10981 TTCAAGTTATAGTGAGCTATGACTGTGCCACTGCAGTCCAGCCAGGGTGACAAAGAAGAC 11040

11041 TCTGTCTCTAAGGTAAGTAAAGTAAGTAAATGAATACATGGGTAGAAAGGAGTTCTCTTG

11100

11101 CTTACAGTGCAGGAGTTGGAAAATCATTCCTTTGCAATTGTAATGGTAAAGATTGAACCA

11160 11161 GACAAGAAATACCAATGGATGTTAAATCCAGGGGGACTTTAAAAACATTGATACGATACA

11220

11221 ATTAATGAAGCTAAGAACCTTATTCCAACTCTCCCCAGTAATTATTTCACTAATGTTCTC

11280

11281 TTTATGTTCCAGGATCCCACTCAGCATCCTACATTGCTCTTAGTTGTCATTTCCCCCTGA 11340

11341 TTTCCTACACTCTGCAATAGTTCTCAGTCTTTCCTTATCTCTGTTGATCATGATAGCCCT

11400

11401 GAGGAGTATTGATTAGTTGTTTTGCTGAATGTGTCTCATTTAGGGCCTGTTTGGTGGTGT

11460 11461 CTCATGATTGGACCAATGTCTTGTATCCCTGGCATAAGGACCACAGAAATGGTACCATGC

11520

11521 CCTTTTTAGTGTATCAAATCACTGAGCTCTTGATGTCAGTGTTATTCCTGGTGATACTGA

11580

11581 CTTTGATAACCTGGTAACAGTGGTTTCTGCTGGGCTTTTCCATTTTACAGGAAGAAAATT 11640

11641 AGATGAGGAGCAAGATATTTGCATGATCTTAAAGTGTCTCCCTATAAATTGCTTACTGGT

11700

11701 GGCAAAGGGACAGGGTGGGGGAAGTAATTATAAAGTGGGATCAGAATATCACCAATGAGG

11760 11761 GACAGATGGACATCATGTGCTTCCTGATGTGATACTCCGAGAATGACATGATATCACCTG

11820

11821 TGTTCTTGTGAGACCGCATAAACTCAATCTAGCCACAAGGAAACATCAGACAACACAAAA

11880

11881 TGAGGAATGTTCTATTTTTTAAAAACTAAAATTTATTTGGATTATGCTGTTAACTTGGTA 11940

11941 GTTACAGTCTGCCAATTAATTTTAAAGCCTTCTGAGGTAACAGGGATACTTAAATCTTAC

12000

12001 TTGCTCTATTTCTGAGTTTTAAAATGTAGTAAAAAGCATAACTGAACATGGTCAAAAGAA 12060

12061 AGACTGAAGATCACATAAGTGGAAGGAGGAAGAAGGGGAGTAGAGGACAGAGTCAATAGA 12120

12121 AGAAAATTAACATTCGTTGAGTACTTGTCGTGTGTTAGGCTCTGTGCAAATTCTTCTCCC

12180

12181 CTTTATCTCAGTCCTTTGAGGTGGGGAAATGGACACACATAACAGATTAAGTTGCCTGAG 12240

12241 GTTTTATAGCTAGTAAATGGCATAGCTAGGATTTACATCCAGGGCTTCTGATTCTGGATC

12300

12301 CAGCTTTTTTTCCCTCTTCCTATGCCACACTGCTTTCTGTTGAAACTGAACTATTTCAAA

12360 12361 AAGAGTCAACAGCTTCTCAGCCTTGGCAGTGCTCACGTTTTGGCTCGGATAAATCTTTGT

12420

12421 TGTGGTGGGCTGTCCAGTGCATTGCAGAATGTTTGATAGCATCTCTTACCTCTACCCACT

12480

12481 AGATTCCAACAGCATCCTTCCAGTTGTGACAACCAGAAATGTTTCTAGACATTGTCTAGT

12540

12541 GTCCCCTGGGGGACAAAAATCACTGGTTGAGAACTGCTGCTCTAAGGGTTATGTCAAGGA

12600 12601 CCAAGTACTCTGTTGGAAGGTATAGGCAAGAGTAGTCAGATACTTCTTTATTATATATTT

12660

12661 AATAGAGAGATGGACTCCAGCTAGAAACACCAGCCTCATGTACTTGTGGAAGAGAGAGTT

12720

12721 TATTGTTTTATTAGTCTTGGCTTCTGTGACACCTCTCTCTCGAGATTATCCTCCAATGTT 12780

12781 ATTGACTTCACTTTCTCCATTGTCTTTGCTGGCAGCTGCTTCTCTTTTTAATGACTGACT

12840

12841 CTTGGTGTTTCTCTTCTTTATATTCTTTCCTGGATGGTCTCATCTGTTTCCATTGATGCT

12900 12901 CCACTTGGCATATTTCATTGTCATCTATTTGGCACTTCCACTTGGATATCTCAGAGTAGT

12960

12961 GGTTCTGAGCCTCACAGCAGACCCGAATCATAAACTGTGGGTAGATCCTGGAAATTTAGT

13020

13021 AAGCTCTCAGGGTGATTTAGATGCTTCCTTAATCCCAAATTTTCAGCACGTAAACAGATG 13080

13081 GTGCCACTATCCACCAGCTAGCTGCTTAAGCCAGAAATTTAGGAATCCTCCTTGATTTCT

13140

13141 TTCTTTTTTGTTGTTTTTTTTTGATATGGAGTCTCGCTCTGTCACCCAGGCTAGAGTGCA

13200 13201 GGGGCGTGAGCTCGGCTCACTGCAAGCTCCGCTTCCCGGGTTCATGCCATTCTCCTGCCT

13260

13261 CAGCCTCCCAAGTAGCTGGGACTACAGGTGCCCGCCACCACGCCCGGCTAATTTTTTGTA

13320

13321 TTTTTAGTAGAGACGGGGTTTCACCGTGTTAGCCAGGATGGTCTTGATCTCCTGACCTTG 13380

13381 TGATCTGCCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCACGCTTG

13440

13441 GCCAGATTTCTTTTCCTTCATCACCACCCCCTACTTCCCTTCCAAACTAAGGAATTTCTA

13500 13501 TTCTTAGTTCTCAAATATTTATCTTAGATCTATCTACTTTTCTTCATCGCCTGTATTAGT

13560

13561 GTGCTAGGGCTGCTGTTACAATGTATCACAAATTGGGTAGCTTAAATAGCAGCAATGTAA

13620

13621 TGTTTCACAGTTCTGGAGACTAGAAGTGCAAGATCAAGTGTTGTCAGGGTTGGTTCCTTT 13680

13681 TGAGGGCTATGAGAATTTGTTCTGTGTCTCTCACCTAGCTTCTGGTGGTTTGTTGGCAAC

13740

13741 CTTTAGCATTCCTTGGCTTGTATCACCTGATATCTGCTTTCATCTTCACCTGACATTCCC

13800 13801 CCTGTTTGTGTTTCTGTGTCCAGAGTTCCCCTTTTTATAAGGGCATTAGTCTTTGGATTA

13860

13861 GGACCCATCCTAATGACCTCAACTAATCTCACCTACAAGAAGAGCCCAATTTCCAAATAA

13920

13921 GGTCACATTCTGAAATACTGGGGGTTAGAGCTTCAACATATGAAATTTGGGGTTGTGGGG 13980

13981 AGGACACAATTCAATTCATAATGCCTCTATTACTTTGGTCTAGGCCACTGTCATCTCTCT

14040

14041 CTTAAACATGGTAGCAGTCTACTAACAGGTATCCCCATTTCACTCTTAAACCCCCATAAC

14100 14101 TCATTCTCCACACAAGGAGCAAGGATGATTTTTGAAAAATCTATCATCCCATTTCCTTCT

14160

14161 TTTCTCTTTAGTAACTTCCGTTGCACTTAGAGAAAAGCTAAACTTCCTACCATGGCCTGT

14220

14221 AAGATTCTTCATAATCTGGTCCTTTCCCGTTTCTCTTCAAACTCATCTTCTACTACTCTT

14280

14281 TTCTTTATTATTCTCCAGCCATGTTGATTTTTTTTTTTTTAAGCAAGGCTTTGGCCTCAA

14340 14341 ATATGCCAAGTTCTTTGCTGCTTCTTGGACATTGCACGTGATCATACTTCCATCTGGAAT

14400

14401 ATCCTTCTCTAACGCCTTTGAATAATTAGCTCCTTCTTGTCCGCCAGGTCTCTACTTAAA

14460

14461 TGTCACCTACTTCTGAGAAGCTTCCCCTTACTACCCTAGAGGAAATTTCCTTCCTTTCCT 14520

14521 CACTTATTGTCTATCTCAGCCTCTTATTTGGTTCTTTTACAATTTTTTTCTGCCATCCTT

14580

14581 TCCCCTCCTGCTCTGTTATGAACTCTGAGTCAAATTACTATGTCTGTTTTGTTCATCTTT

14640 14641 ATAACATGCCTAAGGAAAATAACATGCTCAAAAAGTATTTGTTGGATGAATGTATGTTCC

14700

14701 CAACTTAGTGCTAAGTTATAAACATAAATGTTTATAGAATGAGTAAGAATAGATTTTAGT 14760

14761 CTTGACAATCCAGGATCATATTAAGATATACCCTCCTCTGTCTAAAAATTATGGTGTTTG 14820

14821 TAATACTTGGCGTCTCTGAATGATCTGAACGTGACAGAAGAGTACTTTTTCATATTGAGT

14880

14881 GTTTAGGTCATTGGGGTAAAAGACTGTTGCCAGAGACTTGTACCTTTTTCTTTCTTTGCA

14940 14941 GTTGACTAATCTCCTTCAGAATCAAGCAGTGAAAGGAAAAGTTGCTGGAGCACTCCTGAG

15000

15001 AGCCATCTTCAAAGGTAATAATAATTAATGTCACTTCTGTCTGTCTCCTGCATTCATTGG

15060

15061 TAGATTTGTTTAGTTTTCGTACTTTTCTTACGGTTTGAAGCCCCACTGGAACATTTTCCA 15120

15121 TGTTTTGAGGATTCTACAGAATCAACTACAGCAGCAGTTTCCTGTCTAGTGGACTTAAGT

15180

15181 AATTTTACTGAAGGGAGTTTGAAGTGGCTACCTGAGATTGCGTTGCTGATTTGGGGGCCT

15240 15241 ATAATTACAGAATGTGCTTTTCGTAAAATCTGCTTTGGTTTACTTGTGCTTTTAAGATTG

15300

15301 TGGAGAAGAGAACAGCAATGTTTAATACATAATCCATGCTTAGCAAGGGGCAAGTAACCT

15360

15361 CTGGGGTTTAATGAAATGGGGTATCTTAAGGTGCCTTATATAGAGATTCTTCATGCCCGC 15420

15421 TGAGACGTATTTGTTTTCTTGTCTTGAACAAGGGCTAGTTTAAATTGTTGCAGATTGTGA

15480

15481 TTTATAGTGGTTTTATAAATTCTGGTAGGATAAAATTTTACAAGGCAAAAATTAAAAGTG

15540 15541 TCTTTGGACTCTTAATTAGGTATGTTAACTTGGTTTATTTTTCTTCCCCCAAGTAAATTA

15600

15601 GATCTCTGCTGCAAGGGTACAAAAACACATCTTCCTATTCATGAAAGAACTTTCAAGATT

15660

15661 GGTTGAGTCAACCTGTATGGAAACTTGCTAAATATTTTATAGAAATATCAGAGAAAAGGA 15720

15721 AATAATTAAAGATCAATAGCAACTGCCTACGTACACTACTACTAAATGGAAGAGCAATTC

15780

15781 TATGGCTTTTAGTATATTCAGAGTTGTGTATCCATTACCACAATTGATTTTAAAACAATT

15840 15841 TCCTTACCCCTGAAAGAAGGTTTGCATCTCTTAGATGTCACCTCTCCAAACCCCTAGTCC

15900

15901 ATTCCAGTCCTAAGAAACCACGAATCTACTTTTTGTCTCTACAGATTTGTCTATTCTGGA

15960

15961 CATTTCATTTAGTGGAATCATTCTATATATGGTCCTTTGTGACTGACTTCTTTCGTTTAT

16020

16021 AATGTTACAAAGGTTCCTCTATGTTGTAATATGTATCAGTACTTTATTTTTACTGCTGAA

16080 16081 TAATATTCCATTATGTGGATATACAACATTTTTCCATTCATGAGTTGATGAAGACTTAAG

16140

16141 TTGTATCTATTTTCTGGCCATTATAATACTACTATGAACATTTGTGTATAAATTTTTGTG

16200

16201 CGGACATGTTTTCCTTTCTCTTGGATATATATCTAGGAGTGGAATTGCCGGATCATGTGG 16260

16261 AAACTCTTATGTTTAACTGAGAAACCGCCAAACTGTTTTCCAATGTGGCTGGATTAGGAT

16320

16321 GCCAATTTTTTCACATCCTCATCAACACTTGTTATTATCTTTTTTTTTCAGTTTTGATTT

16380 16381 ATTTTCATCACTTTTTCTACATGATCCAGATATTTTAAAATGCAAAGAAAATTAACTTTA

16440

16441 ATGATATGTTCCAGGATCGGCACTAAAAAAAAATTTTCAGACTGCAAATGAATTATACAA 16500

16501 ATGAAAATATCAAATGGAGATCCCCTTATCCAAATGAAAGCACTCAACTTATTAAAAGTT 16560

16561 CACAAGTATTTGTATAGAGCACATTAAAAAAGTCAGCTTGCTAAATGTTTTGATTTTAAA 16620

16621 GAACGATTGCAGAAGTCTGAAGAAAATAGATTAGTTATTAAATTTGGGTTACTGGACTTC 16680

16681 TCAAAAGCTGTAAGACCTATTAGAAGGTTACTTCATCCTGTAATTATTAAAATAATAGGT 16740

16741 AGATGAAGAAAAGATGACATTTTAGTCCCTTTATTTTGGCTAAATTAAGCACTTTTTCAA

16800

16801 AGCCCTTAACCATTGCTGTTCTAAGCACCGTAGTAATCAGTCGTTTGATAATTCTGTTTT 16860

16861 TTTGTATTTAACTACAAACCTGTTCGTTTTTCCTATTTACCTGTCAATGTTGTAAGACTT 16920

16921 GTTTCTGAACCCCCTGTTTAAAACAATAAGGTTCCCCCTGCTCTGAGGAAGCTGGAACAC 16980

16981 TTAGGAGACGTAAGATATACACTTGTTGTATCCAGTTGGTGGAATCGGGGGATTTGCAGA 17040

17041 AAGAAATAGCGTCTGAGATCATAGGATTACTGATGCTGGAGGTAAGATGGCAAACAAAAA

17100

17101 CTTTTATTTGGGGGTAGGTTTTTGAGGGTTTGTAATTTGTTGAGGGAAGGAAAACTTAAG 17160

17161 TGCCCAACCTAGCATAGTCCTTATTTATGAGTAAGACCTGTTGTTAAAAAACAAAGAAGC

17220

17221 AGGCCGGGCATGGTGACTCACACCTGTAATCCCAGCACTTTGGGAGGCTGAGGTGAGGAT

17280 17281 CGCTTGAGCTCAGGAGTTTGAGACCACCCTCTGCAACATGGTGAAACCCCATCTCTACCA

17340

17341 AAAATACAAAAGAAAAAAAATTAGCCGGGCTCGTGGCGCATGTCAGTGGTCACAGCTACT

17400

17401 TGGGAGGCTGAGGCAGGAAGATCGCCTGAGCCAGGGAGGCGGAGGTTGCAGTGAGCCGAG 17460

17461 ATTGCACCGCTGCAGTCCAGCCTGGGCAGCAGAGTGAGACCCCGTCTCAAAAAAAAAGCA

17520

17521 GATGAATGGGTTATAATCCACTTAGGTGTATATATTACATTTCTGGTACCATGGACATAT

17580 17581 ATCTCTTTGAAGAGCTGGCATATCTGAAAACATCAGGGAACCGATACAGGATTCTAACTT

17640

17641 TTGAAGCCCTTCACAATTATAACTACGTACCAGACAGAGCAGAATGATTCCATAAATCCC 17700

17701 TTATGGGACCAGTTCTGGATCTCGGTCAATTCATTTATTTCAGCAAAAGACTTTATTTCT 17760

17761 TAGAAAGATTTATCAAACATTGGATTTCTGCTTGAGATTTCCTTTATCCTGTGAACTTTT 17820

17821 AGGCTGSLCCATTTTCCAGGACCATTATTGGTTGAATTAGCCAATGAGTTTATTAGTGCTG 17880

17881 TCAGAGAAGGCAGCCTAGTGAATGGAAAATCTTTGGAGTTACTACCTATCATTCTCACTG 17940

17941 CCCTGGCTACGAAAAAGGAAAATCTGGCTTATGGAAAAGGTAATTTTCTTCCGACTTTAG 18000

18001 TGGCTTTTTCTCTATGCACATAATCAAATTCATTGCTCAGTATATTTTGCATTTCTAGGT

18060

18061 GTACTGAGTGGGGAAGAATGTAAGAAACAGTTGATTAACACCCTGTGTTCTGGCAGGTGA

18120

18121 GTCTTGTTAATATGTATAACTTTCTTAGGAATACAAGTGGCGGAAAAAAAACCACTTTAT 18180

18181 TTCAGGAATTTATGTCTGGAGGTTTTAAGTCCTTCTTTTTTGTTGTTTTTAATATAACAC

18240

18241 TATAAAACTATTTAGTCTGCAATGAATTAGTGAGAAAGAAATGATTTACTATATTCTGTC 18300

18301 CTCTAAGACTGTTCATATAGTTTTATTTGCTTGACTGTATCTTCTTACAGCAATTTAGTG

18360

18361 TGTTAAGATACACATAATTAAATGTTAATTTAGTGTGTTTAATGCTCCAATCTAAGCATT

18420

18421 ATTGGAGTCTCAAGGTCGTGAAGGCTGGGAAAGTCACCCTGCTGCTGCATCTTTTCTTTT 18480

18481 TTTCTTTTTTCTTTTTTTTTTGAGGGAGTAGAATATACAAGTGATGTATCCTTTAACCAG

18540

18541 TAATTGTATTTCTGGGAATCTACATTAAAGAATGACTGCAACATACAAGAAAATCCTCAC 18600

18601 ATATGAACATGTTCAACACAGAAATTTGTGACCAGTATAAGCATACAGTAATAGAGACAT

18660

18661 GTAAACTATGTAAGTTAAGGTAGAGCCACTCCATGACATTTATACCTATTAAAAGTGAAA 18720

18721 TGTATACATCAAAGATGAAAAATTCTAACCATATAATTAACAAAGAGATTACTATGTTAT 18780

18781 ATAATACTGTGATTGCGTGTAGGTTTGTTAAAAAGAACAAGAAGGGAATACCCAAGATTA 18840

18841 TCTAAACCAGAGATAATTGCTTTTATGGTTCCTTACATAAGATCCCTTACATAAATCATT 18900

18901 TTAAGCATGTATTTGTCAATTAATTTGTTTTTAATTACACAAGAGTTTATTCAATACATA

18960

18961 CTGTAATATACCTTGCATTTTTTATTTAATACAACTGTCCATGTCAGCATATACAACTCT

19020

19021 ACATCATTCTTTTTAATGGTTTGCAGTAATCTATTATTTGGCTAATCTACAATTGTCAGA 19080

19081 TTATTTGGTTATTTCCAGTTTTTGTGCTATTACAAAGAGTGTTGCAATGAATGTCTTTTT

19140

19141 TAATTTTAAATAGAGATGGGAATCTTGCTTTATTGGCCAGGCTGGTCTCTTGAACTCCTG 19200

19201 GCCTCAAGGGATCCTCCTCCTTTGGCCTCCCAAAGTGTTGGGATAATAAGCATGAGCCAC

19260

19261 TGCACCTGGCCTGAACATCTTTATATATCTAGATGACCTTTTATAAGCATGTACATGGCA

19320

19321 TATATTTCAAATAGTAAATTGGGTAGATCTGTGATTATGTGCATTTTTAAAGTTGATAAC 19380

19381 TACTGCCAGCTTGTCATACATTCCCTAAAACGGTATGTGAATAGAGTTTGTTTTCCACAA

19440

19441 TCTCACCAACATTGGGTATCATCAGAATTTTGATTTTGCGAGTCGGGTAGGTTAACAACT

19500

19501 GCTATTTCATTGTTTTAAATGGTATTTATTTGATCTTGTTTCTCGGTATTGCAAAATCAA

19560

19561 ACTTGAATTGGCCCTGTTTTTTTGTCCTCATTAATTTTACAAAGTTATATCTAAATAAGT 19620

19621 TGTAAAGAAATAAACTTTGTCATTTTCTTCTACCAGGTGGGATCAGCAATATGTAATCCA

19680

19681 ACTCACCTCCATGTTCAAGTAAGCATCATCTTTTCCCTTTTCTTTGTGTATCCTGCTTTG 19740

19741 TGAACTTACTTGCTAGAAATTAAACTATAGCAAACGTAAACATCACTCTCTCCTGATTGT

19800

19801 AAGAATATTAGTAGTTATGTTTTCTGTTACAGATTCACAGATTCCCTCCTCACTCCTGCC

19860

19861 TCCTTTTTGGCTTTACTTACTTATTTAAAACATAACTTTATTTTTTTTATAAGAATGACT 19920

19921 TTGCTTAGGTAAGGTTATTGGTTCCCTTAAAGGTTCTTCATTAGAGAAAAATCAAAAGGG 19980

19981 ACAGTTATAATAATCTATTATCTCTTTAATTCTCTGCTCCCAAGTTTCATTTCTTAAAAT 20040

20041 ATGTTTAATGGAATTTTAACCACTGTAAAGCCCTGAATTGAATTTCTGAAAACAAGGCAG

20100

20101 TTAGACACTGTCTATAGCCTTTAGAATCTTTGATCCACAGGGATGTCCCTCTGACTGCAG

20160

20161 AAGAGGTGGAATTTGTGGTGGAAAAAGCATTGAGCATGTTCTCCAAGATGAATCTTCAAG 20220

20221 AAATACCACCTTTGGTCTATCAGCTTCTGGTTCTCTCCTCCAAGGTACAAATGGAAAATT

20280

20281 GTTTCTCCTTAGTTCTGGTGGTATGACCAGTTATGACCATTCAACTTATTCATACCCTTG 20340

20341 GTTTATATAGCAGAGCAAACATAGCCGAACATAGATGCTTTCATTTCACGTGAGCAAACA

20400

20401 TAGCCGAACATAGATGCTTTCATTTCACATATAGTGGTTTGGTCTGGTAAAAGGATCATA

20460

20461 CTTCTCATCCAAGGAAAACATGCATTATCTCTTGTATTGCCACATTTTCTTTGCGGAGGT 20520

20521 AAAAGGGATAAAGATCTCAGAAAGGGTAGACTATAGGCCCCATAAATTCTTACAGTAAGA

20580

20581 GGTATTTCAAATCTTGTTTCAACATGAAAGTGTTCTATACATTCTTAGTTTGAGTCTAAG 20640

20641 TCATAGATTATTTATTTAAATTTGTACTGGAGAATTCTTTAAGCTGAAAACTACTTTGAA

20700

20701 TTCAAATTGGTTATTTTGCTGTTAATTGGGAGACCTTACCAATTTTGTATTGTTTTCAGG

20760

20761 GAAGCAGAAAGAGTGTTTTGGAAGGAATCATAGCCTTCTTCAGTGCACTAGATAAGCAGC 20820

20821 ACAATGAGGAACAGAGTGGTGACGAGTGAGTAATATAGTGTAGAAATAAAGATCATTTTT

20880

20881 ACAAATTCATCTTCTCATTGTGTATTTTAGCTATTTCATTTTCTTCTCTCTCACCGCCTA 20940

20941 CCTTCACCTCAGCACAAAACTTTTCTAATCTTCTCTTAGCTACTCAGTAATAAAGAGAAT

21000

21001 AACATGATCTCTTCAGCTCTTTCATTTTCACCTTTCCAGGAAAATAATTGTCCTATAAGC

21060

21061 TGCAAGTTTAAAACCTATTTATTAGGTGTCAGCTCTCTGCCAGGAACTGTTTAGCCTTGG 21120

21121 GGCTACCGTGCTGAACCAGTCAGACAATGTCCCTGCTCTTGTAGAACTTCAATTTTAGTG

21180

21181 AGAAGAGGAAAGAAAATAAATAACAAGAAAAATGTTATAGTAAAAGTGCTATGCAGGTAA

21240

21241 TTCAAATGGATGATCTACCACTGACAGTATAGCTACTTTAGATTGGGTCAGGGAAAACCT

21300

21301 CTCTCAGATGTGCTATTTTAACTGAAGCCTGAATGATAAGATCCATAATATATAGATCAG 21360

21361 AGAACATTCTCAGGTAGGGAAATAATACAAAGGCAGAAGAACAGTGTGACATGTCTGAGA

21420

21421 AAGAGGATTTTGGGTTTTTTTTGTTTCATTTTGTTTTGTTTTTGAGACAGAGTGTAGCTC 21480

21481 TGTGGCCCAGGCTGGAGTACATTGGCATGATCTCAGCTCACTGCATCTTCCACCTCCCAG

21540

21541 GTTCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTATCTGGTACTACAGGTGCGCACCACT

21600

21601 ACGCCCAGCTAATTTTTGTATTTTTAGCAGAGACAGGGTTTGCCATGTTGGCCAGGCTGG 21660

21661 TCTCAAACTCCTGGCCTGAAGTGATTCACCCTCCTCGGTCTCTCAAAGTGCTGGGATTAC

21720

21721 AGACATGAGCCACTGTGCCTGGCCAAGGATTTTTGTTTTTTCTTTCTTTCTTTTTTTTTG 21780

21781 AGATGGGGTCTCACTCTGTTGCCCAGACTGGAGTGCAGTGGTACGATCTCGGCTCAGTGC

21840

21841 AACCTCCACCTCCTGGGTTCAAGAGATTCTCCTCCCTCAGCCTCCTGAGTAGCTGAGATT

21900

21901 GCAGGCATGCGCCACCACGCCCAGCTAACTTTTTGTATTTTTAGTAGCGATGGGGTTTCA 21960

21961 CCATGTTGGCCAGGCTGGTCTTGAACTGACCTCAGGTGATCCACCTGCCTCGGCCTCCCA

22020

22021 AAGTGCTGGGATCACAGGCGTGAGCCACCGTGCCTGGCCGGATTTTTTGTTTTTTAAGCA 22080

22081 AACATTTGTAGGATGTTCATTCTGTGCCAGGGGCTATTCTAAGCACTTTAAACCTCACAA

22140

22141 CCCTATTTTACCTCTACTTTATGGATATGGAAATCGAGAGACAAAGAATTTAGTTAGTTT

22200

22201 GGCTTGAGCAATTTATTAGAGGTGGTTTCTTATTTCTTTTTTTTTTTTTTTTTTAAGAGT 22260

22261 GTCTTGCCCTGTTGTCCAGGTTGGAGTACAGTGGCACCATCATAGCTCACTACAGCCTCC

22320

22321 AATTCCTGGGCTCAAGTGACCCTCCAGCCTCTGCTTCCCAAGTAGCTGGGACTATAGGCA 22380

22381 CGTGACACCAAGCCTGGCTACTGTTTTTTTTTTTTTTTTTTGAGATGGAGTCTCTGTCAC

22440

22441 CCAGGCTGGAGTACAGTGGTGTGATCTCGGCTCACTGCAACCTCTGCTTCCCAAGTTCAA

22500

22501 GTGATTCTCTTGCCTCAGCCTCCCAAGTGGCTGGCATTACAGGCACCCACCACTATGCCT 22560

22561 GGCTAATTTTTTTTTTTTTTTTTTGAGAGGAAGTCTCACTGTGTTGCCCAGGCTGGAGTA

22620

22621 CATTGGCATGATGTCAGGTCACTGCAAACTCTGCCTCCTGGGTTCAAGCGATCCTCCTGC 22680

22681 CTCAGCCTCCCTAGTAGCTGGGGTTACAGGTGCGTGCCACCACGCCCGGCTACTTTTTGT

22740

22741 ATTTTTAGTAGAGACGGGCTTTCACCATATTGGCCAGGCTGGTCTCGAACTCCTGACCTC

22800

22801 AAGTGATCCGCCTACCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACTGCAC 22860

22861 CTGGCCCAAGCCTGGCTAATTTTAAAAATTTTTTCTAGAGATGGTCTCACTGTGTTGCCC

22920

22921 AGGCTGGTCTTGAACTCCTAGCTTCAGGCGATCTCCTGCCTCAGCCTCCCAAAGTGCTGG

22980

22981 AATTACAGGAGCAAGCTGCCACACCCAGCAGAGAAGGTGGTCTTAGATACTAAATTAGAA

23040 23041 TAAGCTGACAGAGAATCAGATTTTGGAGAAAATAATTAGCTCACTTTTTTTTTTTTAATT

23100

23101 AGCTCACTTTTGACCACACTAGGTTTGAGATATATATTAGGGTCCCATAAGGAGATATTA

23160

23161 AGCAGGTACTCTGGAGTAGTAGGGAGAGGGCTGAACTGGGGATAAGACTGGAAGGGACTA 23220

23221 AAGCCAAGAGACTGGATGGGATTATCAAGGGAAAGCATGAAGAAAAACCTAAGACTGAGT

23280

23281 TCAGAGGCATTCCACCATTTCAAGGTCAAGCACTTTGGGAGGCCGAGGCAGGCAGATTGC

23340 23341 TTGAGCTCAGGAGTTCAAGACCAGCCTGGATGACATGGTAAAAACCTATCTCTACAAAAA

23400

23401 GTACAAAAATAGCCAGGCGCAGTGGCTCATGCCTGTAGTCCCAACTACTTGGGAGGCTGA

23460

23461 GGCAGGAGAATCACTTGAGCTCAGGAGGCGGAGGTTGCAGTGAGCTGAGATTGCACCTTT 23520

23521 GCATTCCATCCTGGGCAATGGGAGTGAATCCTTGTCTCAAAAAAAAAAAATTTTTTTTTT 23580

23581 TTTTTTTGAGAAGAGGCCAGTGATGATGCCCTGGAAGCTAAGAGAAGAAAATTTTTCAAG 23640 23641 AAAGAAGTAGTTAATTGTGTCAGATCTGAGATGTTAATTAAGATGGGACTTTATTGTTTT 23700

23701 TGACAACTTGGAGGTCATTTGTGACCCTTGAAAAAATAATCTAATGTTATTGTGGGGGTG 23760

23761 AGGGTAGAAAGAGGATAGAAGCCTGATTGAATTGGATTGTACTGGGAAGAGATTGTGAGA 23820

23821 TAAAAAAGGAGAAATTATGACTATACACATTTATTTTACGAAATCTTGTAAAAAGAAGCA 23880

23881 GAGTCTGGGCACAGTGAATCGTGCCTGTGATCCCAACGTTTTGGGAGGCTGAGGCATGAG 23940 23941 AATCACTTGAGCCCAGGAGTTCAAAACCAGCTGGGCAATATAGTGAAACTCCATCTCTAC

24000

24001 CAAAAAATAGAAAAATTAGCCGGGCATAGTGGCACATGCCTGTAGTCCCAGCTACTCAGG

24060

24061 AGGCTGAGGTGGGAGGATCGTTTGAGCCTGGGAGGTTGAGGCCACAGTGAGCTATGATCA 24120

24121 TGCCATTGCACACCAGCCTGAGTGACAGAGCGAGACCCTGTCTCAGATAAATAGATGGAT

24180

24181 AGATAGATAGACAGACAGACAGGAGCAGAAAATTGGGCAGTAGAATTTCTAGTAAACACC

24240 24241 TGCCTTCAGTGATTGTATGAGTTTAGCTTGAGTTGGTCTACTTGCTCATCGTAGCACCTG

24300

24301 TCCTTGTCTCTCACTCAAAAACCATAGCTCATACCTTTTTTTTGTTGTTGTTGAGACAGG

24360

24361 GTCTCTCTGTTGCCTAGGCTGAAGTGCAATGGCACAATCTTGGCTCACGGCAACGTCCAC 24420

24421 CTTCCGGGCTGAAGCAATCCTCCCACCTCAGCCTCCCAAGTAGCTGGGACTATAGGAATG

24480

24481 CCACCACATCGGGCTGATTTTTTTGTATTTTTAGTAGAGGCTGGTCTTGAACTCCTGGGA

24540 24541 TCAAGTGATGCGCCCGCCTTGGCCTCCCAAAGTGCTGGCATTACAGGCATGAGCCACTGC

24600

24601 ACCTTATAGCTCATAACTTTCTGTTGAATCTTTTAGGCTATTGGATGTTGTCACTGTGCC

24660

24661 ATCAGGTGAACTTCGTCATGTGGAAGGCACCATTATTCTACACATTGTGTTTGCCATCAA

24720

24721 ATTGGACTATGAACTAGGCAGAGAACTCGTGAAACACTTAAAGGTAGCATCAAACTTGTA 24780

24781 AGGTGATCTGGGTCTCTTTTGAATGAAAGTGTTTGAACTTAAGCCACTGTTATGCCAGTT 24840

24841 AATGACAGGAAATAAATACTGTAAAATGACCCTGGGTGTGGAGGATCTCTTTTTTTTTTT 24900

24901 TTACTTTAAAAGTGGAAAATACATGATGTTACCACACTCCCTTTTTTTTAATGTCCTCAC 24960

24961 TTTAGCAGTATACCACATCCTTAAGCAAGCCAGAAAGTTCTATTTAAAAAAACCCTCAAA

25020

25021 AGTTCTAACGCAAAAAAGTGGGAAAAAAGGTGAGAACTATTCAGCAGGAAACTCAAAATC

25080

25081 TAATTCTAAGTAGATTATGTGTCACTGTCTTATGTGACCTTCAGCTATGGCCTCTGCTTT 25140

25141 CTCATTTATGAACTAGGATGTTGCATCTAGTAGTATATCGTGGAGTTGTGTGATGCTAAG

25200

25201 TCATTCATCATGAAGTATTTGTAAAGCAGTAAAATAAAACTCAGAAGTGTTTTAAACTAT 25260

25261 ATTAAAAGTAGTGTACAGATATATTTGTTCCTCCAACTTTTTAGTTTGAAAATTCTAACT

25320

25321 CAGAAAAGTTAAAACACTACATAGCGTGAAGAACACCTATGTTCCCTTCACCTAGATTGA

25380

25381 CCAATTATTAGACACTTTGCCACATTTACCTCACTTTTTCTCTCATACATATTCTTTTTC 25440

25441 TGAACATTTAAAAGTAAGATGCAGAAACCCAAACCCTAACACTTCAATGCCTGAATATTT

25500

25501 CAGCATGCTAAGAATAAGGACACATTATCACAATACCATCATCACATCTAAGAAAATTAA 25560

25561 CAATAATTCCATAATATTATTTAATTTATATTCCCTATTCACATTTTCACAGTTGTCTCT

25620

25621 AAAATGTCTTTTGCAGCTGTGTTTTTTTTTTCCTCTGCTCCAGGCTCTCATCAGGCTTCA

25680

25681 TGCATTGCATTTGGTTGTTCTTTCTTTCTAGTTTTAGACTGATTTTTTTGAAGCATCAGC 25740

25741 CCACTTGTCTTGTGTAATGTCCCACATTTTTGGATTTATCTGATTGTTTCCTCACGATTG

25800

25801 GGTTCAAGTTAAACATTTTTGGCAAGAATACTTCGTAAGTGATGTGTACATTTTCTTGCA 25860

25861 AGGGCCAGATACTAATATTTTACACTTTGCAGGCCATGCAGGTTCTGTCACAACTACTCA

25920

25921 GCTTTACGGTTCCAGCAAGAAGGCAGCCATAGACAAAATGTACTAACGAGTGTGGCTGTT

25980

25981 TTCCCATTTATGGACACTGAAATCTGAATTTTATTCCGTTTTTATATTGTGGGATATTAT 26040

26041 TCTTCCTTTGATTTGTTTTTTTTTTTTGGACAGAGTCTCACTGTGTCACCCAAGCTAGAG

26100

26101 TGCCGTGGCCCAATCTTGGCTCACTGCAACCTCCACCTCCCAGGTTCAAGTGATTCTCAT 26160

26161 GCCTCAGCCTCCCAAGTAGCTGGGACTACAAGTGCGTGCCACCACGCCTGGCTAATTTTC

26220

26221 GTATTTTTAATAGAGGTGGGGTTTCGCCATATTGGCCAGGCTGGTCTTCAACTCCTGGCC

26280

26281 TCAAGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGAGATTACAGGCGTGAGCCACTGC 26340

26341 ACTGGCCTTCCTTTGATTTTTTTTTTCCCCCAACCATTTAAAACCATAAAAACCAGTGTT

26400

26401 AGCTCATGGGTTGCACAAAAACAGGTAGTGGGTTGGATTTGGCCAACAGGCTGTATGTAG

26460

26461 TTTGCCAACCACTGTTCTGTGCATTACATCAGGAGGAACATAGTATTAGCTACCTTACTA

26520

26521 TTTGTGAGGCTGAATCATTTGCTTTTAAGAAAATGACCATTGATTTTCTTCATTTATAGG 26580

26581 TACATATGTATCCCTTTGTATTAGAAAGAAATCTATAGGATCATAGTTCGAGATAGTATT

26640

26641 CAAAATCCTTTTCCCAACAACATTTCACAGAAGGTTTTCGTTTTCCCATCTATTGATGAT 26700

26701 TCTTGGCCTAAATCAGTAATTATACCAGGAATTCAAATTTTATATCTTATTTTTTTTTTC

26760

26761 TTTTTTTTGTCCTCACAATCTGTTCCCAAGGAATTCTAATTTTAATTAGAAAATTTTAAT

26820

26821 TGCATCTTTCTTTCGGCATTATTAGCTGGCATTCTTCTGTAAAAAGAGCCTCTCCTTTCC 26880

26881 CTATTGTTTTGAATATCATTATGCACTCATGAACCTTTTTAAATTCAATGTACTTATTAT

26940

26941 TCATTACTGTCATTATTTGTTTTGAGCACAAAATTACCCAAATTTAGCCAGTTGGATCCT 27000

27001 TCAAGTCAGCTTATGTGTTCTTTTGATGTGACCCCATTTAGTCTTTGAGCTTGTGGCATG

27060

27061 ATATGCCCAAGGCTCACCATAGTACTTTCTTTGCCCCATCCTGATGCTAGCCATGTCTCT

27120

27121 AAAAAGCCCTGGGTCCCTTCTGTAGGAAAGGATATTTAAAGACCATAATCTGAGTAGGCT 27180

27181 TTAAAACTTCTTTATCTTCTCTGTATCTGTTCACTCTTTTTTTAAATTTAGGTTCGTAGT

27240

27241 AACTATATTGCTGGCCTTTTTACCAGCTTTATGTATTTTGATCCAGTCTGTTGTTTTCTT 27300

27301 TTTGTTTTCCATTTGATTGCTGTTTTATTTTACATTTTTTTTGTTTTACTAGTATCTTTT

27360

27361 CTAGGCCTTGAAAATGTATTCTTTAAACTTTTTGAGGCACAATTTACATAGCACAAAATT

27420

27421 ATTTTATTCTAAGCATGCAATCCAGTGATTTTTTTTCATAAATTTACCTAGTTTTGTAAC 27480

27481 TGTCACTATAATCCAATTTTAGAACATTTTTTATCACCCTGTAAGATCTTTTATGCCCAT

27540

27541 TTAATCTGCATTAGCACTCCCAGTCCCAGGTAACCACTAATCTACTTTCTGTCTCTATAG 27600

27601 ATTTACCTTTCCTAGATATTTTGTATAAATGGAATTATATGATATGTGGTCTTTTGTGTA

27660

27661 TGTCTTCTTTCATTAAGTGTAATGTTTTCAGGATTCACCTGTGTTGTAGCGTATGTCAGT

27720

27721 ATTTCATTTCTTTTTATTGTTTTGTTTGTTTTAGGAGACATGGTCTCACTCTGTCACCTA 27780

27781 GGGTGGAGTGCAATATCATGATCATGGCTCACTGCACCCTCAAACTCCTGGCCTCAAGTG

27840

27841 ATCCTCCCACTGTAGAGTTCTGACTAGCTAGGACCGCAGGCGCACACCACCATGCCTGGC 27900

27901 TAATTTGTAAACAATTTTTTTAAGAGATGGAGGTCTTGCTGTGAACTCCTGACCTCAAGT

27960

27961 GATCCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTACAGGTGTGAGCTACCATATCAGG

28020

28021 CTCATTTCTTTTTATTGTTGAATAGTATTCCATTGTGTGGATATATATGAGTTTTGTTTA 28080

28081 GCAGTCTCCCAGCTGGTGGACATTTGAATTGTTTCTACTGTTTGGCTATTACAAATAATG

28140

28141 CTGCCACTAACATTCAAATGTAAGTCTTTGTGTGGACATACATTTTATTTCTTTTGTGCA

28200

28201 TATATCCAGGAGTAGAACTGCTGGCTTATATGAGAAATTTATGTTTATATTTTTAAGAAA

28260

28261 CTGCCTGTTTTCCAAAACGGTATCATTTTACATTCCCACTATCACTATGTGAGGGTTCCT 28320

28321 ATTTCTCTGTGTCTTGGCCAACATTTGTTATTGTCTGTCTTTTTTATTATATCCATTCTA

28380

28381 ATGGGTATGAAACAGTACCTCATTGAGGTTTAACTTTGCATTTTCTTAATGACTGATGAT 28440

28441 AATTGAGCATCTTTTCATGTGCTCATTAACTGTTCTAATACCTTCTTTGGTGAAATGTCT

28500

28501 ATTCAAATCTTTTGCCCGTTTTTTGAGTGTGTTGTTTGTCTTCTTAATTAAAATTCTTTG

28560

28561 TATATTCTACTGACAAGCTCTTTTTTATTTTTTCAATAACTTTGTTATGAATGGTGACAA 28620

28621 GTTCTTTATCAGGTATATAATTAGCAAATATTTTTTCTCAGTCTGTGACTTGTCTTTTTA

28680

28681 TTTTCTTAATGGTATCTTTTGAAGTACAAAAGTTTTAAATTTTGGTGAGGTCCAGTTTAT 28740

28741 CAATTTTTTCTTGTAGAAATTATGCTTTTGGTATTGTATGTAAGCAATCTTTACTTAATC

28800

28801 CAAGGTCTTGAAATTTTTCTATATGTTTTCCTCTAAAAATATTACAATTTTTGCTCTTAC

28860

28861 ATTTGGATTTCTGATCCATTCAAGGACCTATTTTTATTTTTTATTACTATTAAAAAATGT 28920

28921 TTTTTATGTCTCTCGAGTCATAAGGACTTTTTTGGGGGATGCAGTGGCGCAATCTGAGCT

28980

28981 CACTGCAACCTTCGCCTCCCAGATTCAAGTGATTCTCCTGCCTCAACCTCCCGAGTAGCT 29040

29041 GGGATTACAGGTGTGTGCCACCACTCCCAGCTAATTTTTGTACTTTACAAGACAGGGTTT

29100

29101 CACCATGTTGGCCAGACTGATCTTGAACTCCTGACCTCAAGTGACCCACCTGCCTCAGTC

29160

29161 TCCCAAAGTGCTGGCATTACAGGCATAAGCCACCACACCCGGCCAAGGGACTTACTGTTA 29220

29221 AATAGGCTGTTATACTTTTAAGATGTATGTAATGGGACCAGGTGTGGTGGTTCACACCTG

29280

29281 TAATCCCAGCACTTTGAGAGACCAAGGTGGGAGGATCACTTGAGGCCAGGAGTTTGAGAC 29340

29341 TCTCCTGGGCAACATAGTGAGACTTCATCTCTCCAAAAAAATTAAAAATTAAAAATTAAA

29400

29401 AAATTAGCCAGGCATGGTATAGTCCCAGCTACTTGGGAGGCTAAGCCAGGAGGATCCGTT

29460

29461 GAGCCTGGGAAATCAAGGCTGCAGTGAGCTATGATTGCATCACTGCACTCTAGCCTGGGC 29520

29521 AACAGAGAGACCCAATCTTTTTTTTTTTAAAAAAAAAAAAAAAAGAAAAAAGAAAATGTA 29580

29581 AGTAAATGACTTCCTTTTGGTTGCTCTCTTCTAGGTAGGACAGCAAGGAGATTCCAATAA 29640

29641 TAACTTAAGTCCCTTCAGCATTGCTCTTCTTCTGTCTGTAACAAGAATACAAAGATTTCA

29700

29701 GGACCAGGTATTTTTTTAAAATGCCATTTTGTTTCTTTCTGTAGTTGGTAAAAAAAAAAA 29760

29761 AAAAAAAAAAAAATCACAGTAATCTGTTTTTGTTGTATCCGTTCCTAAACAATTTTATCC 29820

29821 CCTTCCTGCCAGCAGCAAAGCTGCAATAAGACATGTATGTCCCCTCTGTTCTCTGGGGAT 29880

29881 TGATGGAAAACTTTTACAGGTTTTTGGTTTATTCCTCCGTTTCCTGTTGTGTGGTAAGCT

29940

29941 AGGTAGTGAGGTGGGAATAAGAGTGGAAGGATACCACACAGCCCCAGTTGGAGGAAGGGA

30000

30001 CAATTTAGCTGGATATTCACCAGCAGTTTTACTATGTTGTAGCTTTCCAAGAGTAGGTTA 30060

30061 GTTGCAAGGATGAAATTTACAATTTTTTTCTTATATGTGTTCCAGGTTGTGGGTAATATT

30120

30121 TATATTGACTTTTAGCCTTTTTTCCCCCTTTTTTCTTAAGGAAGGCTTGTCTTCAGCTCC 30180

30181 TTAAAGTTCCCACTGAATGTCATAATTTTTGTAACATCTTCATCTAGTTAAAATAGTTGT

30240

30241 CATAGGGATATATGTGAATGGCAGCTCATAGGAACAGTTATAAGGTTAAGATTTATAGCA

30300

30301 TGAGCTATATAATATTTGCTTTATTTTCTTATTTCTTTCATTCTGTTAGGTGCTTGATCT 30360

30361 TTTATTTATTTATTTTTTCATTTATTTTTATTAACTATACTCAAGGTGCTTGATCTTTTA

30420

30421 AAGACTTCGGTTGTAAAGAGCTTTAAGGATCTTCAACTCCTCCAAGGCTCAAAATTTCTT 30480

30481 CAGAATCTAGTTCCTCATAGATCTTATGTTTCAACCATGATCTTGGAAGTAGTGAAGAAT

30540

30541 AGGTAAGTTGGTGAACCATATCTCAAAAGGTTTATAAGGTGTTTAATGTTAGAAAATGCA

30600

30601 ATGTGATATCATACTTGCAGCCTGTGAGGGGAAACTGATGACTTAACAGCTGTTGACGTC 30660

30661 ACACATCGAGGTTCATTTTCTTAAAACTGGTAGAGGATACTTTTTCCCTATGGAAATATC

30720

30721 CTTGAATCTCAGGCCCTGGTTTCTTGGCCACCTTGACAGCAAGCATACAAAACTGAATTA 30780

30781 GTTGCTTTTGTTTATTCCATGATATTACAATTCTCTATTTTCCTTTTCTTTCTTTCCTTT

30840

30841 TTTTTTTTTTTTTTTTTTTTGAGACAGAGTCTCACTCTGTCACCGAGGCTGGAGTGTAGT

30900

30901 GCCACAATCTTGGCTCACTGCACCCTTGACCTCCTGGGCTCAAGCAATCCTCCCACCTGA 30960

30961 ACCTCTCGAGTACCTGGTATTACAGGCATGCTCCACCATGCCCAAGTAATTTTTGCGTTT

31020

31021 TTTGTAGAGACAGAGTTTCGCCATATTGCCCAGGCTGGTCTTGAACTTCTGAGCTCAAGC 31080

31081 AATCCACCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCTTGAGCCACCGGTCCTGG

31140

31141 CCGTTTCTTTTCCTTTTTAAAGGAAACAGTCTCACTATGTTGCCCAGGCTGGAGTGCAAT

31200

31201 GGCTATTCACAGGCTCTATTATAGCACGGTAAAGCCTCAAACTCCTGGCCTCAAGTGATG 31260

31261 CTCCCACCTCAGTAGCTGGGACTATAGGCATGTAGCCCTGTGCCTGGCTTGGCCTTTCTT

31320

31321 CTTTTTTTTTTTTTTTTTTTTCTGAAGACGTAGGAACTTTTCACTATGTTGCCCAGGCTA 31380

31381 GACTTGATCTTCTGGGCTCAACTGATCCTCCTGCCTCAGCCTCTCAAGTAGCTGGGACAA

31440

31441 TAATAGGTGCATGCCACTGCATCTGACTCATGGCCTCTTTTTAAAGTGAGTAAAAAACAT

31500

31501 TAAGAGTTTTATTTACTTTGTTGAAATTTGGAGCGTACAGTTGGAATTCAATTATTGATT 31560

31561 AGCCAGTTTAGAGAAATTCATTTAAAACCACATGGAAAACATCAGTAACTGTCCTGGTTT 31620

31621 TTATTTTAATTTCAGGATATACATTAGCTGTTGTAACAGTCTTTACTATGTCATGCTTTC

31680

31681 ACTGTGTATTAAATATATTTTAAAGTTTCATAATTTAAGCATAATTTATGGTAGGCCTCT

31740 31741 TCTAGGCCTGGAATTCTTGTTTATTAATAGATTGAATTTACATGAGCTTTGCAGTAGGGC

31800

31801 AAATTTGAAATTAGAGTTTATTGTTTGAGTGTTTTACCCATTGTTAGGCCTTAAAGCTTT

31860

31861 TTAATCTTTGTATCTCATTATTCTTTTTCCTGAGGTTTATACTTGTGTCTCCAGTATTTT 31920

31921 CCCAACCCAATCTCTTAAAGTTGGGCTAAAGAAGGAGGTACTCACTAACCAGGAAATTCC

31980

31981 TGGAAAGCTTCCCATCTTCTGGAAATCTCAACTGTCATAGGACTTGAAAGCATCTCCATG

32040 32041 TATAATTAGCCTGGAGTGGGTACTTGCCACTGGGTGGGACCAAACCTCAGCTGGGAGAGA

32100

32101 TTTGGATAGCCTTATTAGATCCTCTAAAATAACTGGCTGGTTAGTTACACATGCTTGCCC

32160

32161 CAGTCAGTTAATGAACCTTTAAATAATGCATTCTCCATTTTATGTAATTGAATTTTTGGC 32220

32221 TCTTCAGAGTATGAATATTGTTAACTCTGTTAAAGTCTTATAATTTATCCATAACTGTCT

32280

32281 ACTTCATTCTGTAGTGCTGCATTATAATTTGTTGCAATTTAACAGGTAGATATAATGGAT

32340 32341 AAAGGTTATATTCACATTATTTCCACTGTGGGGTATAACATTAAGTGGTATAAAACCACC

32400

32401 AATGATTCTACTCTTTTACAAAAGATTATTTGTAGTTTTAGCTTATGCTGTTGATACCAG

32460

32461 ATCACTCATCCATTTATAGTATAAATATCTACTGAGTGTCTTTTATATGCCACCCTTGTG 32520

32521 CTAAGCAATGTGGGTAAAAATATGACTATACTATGACCTCTCTCCTAGAGAATCTCTGTC

32580

32581 TTTGGAGGGGAGCTAATTTGTAAGCCTCTACATAAGCGTTGTCAGTGCTGTAGGGTAATA

32640 32641 GAAGAGGCAGCAATGAACTTCTCATCATGAAATCTTGAATAGGAATCTTTAATTTTTTTA

32700

32701 ATACTCTCTAGCTTCTGGATTTTCCCAGTATTCTGTGTGTCATTGGAGACCACTTGCCTG

32760

32761 CTGTTCCCATTATGGTACATTTAGGAATTCAGAATTCTGTCAGACGATGTGTGTAAAGAA 32820

32821 AAGGTTTATATGCAGAGTACGTTGATTATCTAGGTCAAGTTTGGGAAATGTATAACAGGG

32880

32881 CAATGAGTATAAAGGTAAGAATATCTCACTAAGTTTTTCTTTTTTAACTAAGCTTTGTGT

32940 32941 TCTTGTAGCGTTCATAGCTGGGACCATGTTACTCAGGGCCTCGTAGAACTTGGTTTCATT

33000

33001 TTGATGGATTCATATGGGCCAAAGAAGGTTCTTGATGGAAAAACTATTGAAACCAGCCCA

33060

33061 AGTCTTTCTAGAATGCCAAACCAGCATGCATGTAAGCTCGGAGCTAATATCCTGTTGGAA 33120

33121 ACTTTTAAGGTGAGACACTATTTTGGCAACCACTCCCTAATTTTGTTTAAATAATCCAAG

33180

33181 TAGGGCTTGGCATTTGCCTGGGAGTTTGGTGGACAAGGAAATTTGAGTATGTATTAGTTT

33240 33241 GACAGAGGATATATGACAGAGATTAGGATAATATTGAAACATTATAGATCCAGTTTGATG

33300

33301 GAATACTTACTTACGTATTCTTCTATGTAAGTTTTTCATTTATTCCACTTAATCCAATTG

33360

33361 TTTCTGACCTAGAAGTTTGAATATAAGACAGCTTTTTCATGATTTTTCCTTCTTCTTCCA

33420

33421 GAAGTAAACCATTTCTAGTTTTTGTTTTGTTTTGTTTTAATGCTGTTCCTTTACACAGAA

33480 33481 GAAATTTTTTTAGGTCACTGTAGAGCATCAAGCTTAAAGAATCATGCTTAAGAATTTTTA

33540

33541 CTGAAAATAAAACAAAAAAAGAATCATGCTTGGTCGGGCATGGTGGCTCAAGCCTATAAT

33600

33601 CCCAACACTTTGGGAGGCTGAGGTGGGCAGATCACTTGAGCTCAGGAGTTTGAGACCAGC 33660

33661 CTGGGCAACGTGGCGAAACCTCATCTCTACAAAAAACACAAAAATTAGCCAGGTGTGGTG

33720

33721 GCACGCACCTGCAGTCCCAGCTACTTAGGTGACTGAGGTGGATCACTTGAGCCCAGGAAG

33780 33781 TCAAGGCTACAGTGAGCTGTGTTTGCACCACTGCACTCCAGCCTGGGTAACAAAGTGAGA

33840

33841 TCCTATCTCAAAAAAAAAAAAAAAAAAAAAGAATCATACTTATTGGTTCATTGTTTCTGA

33900

33901 AGAAACCAAAATGGGAAATGTACATTTTTTTGACATTATTACCAAATTATCACTGACATC 33960

33961 TCAATCTTAATTACTGAAACAGTTTTCAAAAATTTATTTTCTCTTGAATAAAAAAATGCT

34020

34021 AGATCATATCTGAGGTTAAACAACTTTTCTAGTAATTCCTCTAAGTCTTCTTGATGTTCA

34080 34081 ATTGTTTAATGTAGCTACACACCAAGAGTGTCTGCTGAAAATGTTTATTTGGGAGGCTGA

34140

34141 TCTAAATGTAAATCTAGTGTTCTAATCCAGAACTATGAAGTAAATGTCAGAATAGTTCTT

34200

34201 GAGAGTTATCAGTGTATTTGATGATATAAATTACCCTGCTTAACAGTGATATTGTTAAAG 34260

34261 ACATTATCTTAGAGTATGTTTTTATATTTAAATTATTATTTTTCAAAACTAAAGAGGCTC

34320

34321 CATTTGATTTCACATGGATGTCGGCTTTATCAAGGCATCTTTATAAATTGCTACTAAGTC

34380 34381 ACTAACAGTCTGAGGTCTGAACGAATTAGTCTGTACCATAAAGTAATCAAGTTGAAGTAA

34440

34441 GAATTTTAACTACAATTCAAGTGAAAATTCAGAATGGGTCTTTTTAACAGCAACTGCTGT

34500

34501 GCTTCTAGAAACTGAGAAAGGGGTTAGGGAAGATCTTTTAAGTTTCTAGAAATCTTCATG 34560

34561 GGCTTTTACATTTTAATTTTTAGAATAAAAGTTATGTATACTGTCAGGAATATGGATATT

34620

34621 GTCAGGTAATGGATACCTAAGTAGAATTGATCCTGTTTGATGCTGAACCTTATATGTAGC

34680 34681 AATAGTTTAATGTGATCTGTACTTTTAGGCTAATAAACATGGTATTTATCCTACTAATGA

34740

34741 ATCATCTAAGCACAGAACAAGGGGAAAGTTGAGTGATTCTGTCTTTCCAGATGATTCACA

34800

34801 TTATATTCTGATTTTTTGAGTTTTACTCATATCCAAACGGTTCTTCATGCTGCCTGACAT 34860

34861 GCATTGATATACTTAAAAGCTGAAGGGTCACCCAGGAATATTTTATTTATTCTGTTCTTT

34920

34921 ϊ'TAGATCCATGAGATGATCAGACAAGAAATTTTGGAGCAGGTCCTCAACAGGGTTGTTAC

34980 34981 CAGAGCATCTTCTCCCATCAGTCATTTCTTAGGTATTCAACTTTGAAAGAATGAATAAAG

35040

35041 TTTTTAGAAATATTTTCATTTCAATGTCATAATCATTTGGTCCTGGGAAAAAAATTTTAA

35100

35101 TGAGCATTTAAATGCAGTATCTTTGATAGGATGCCAATAAAGGAAATCCATTTATGTTGT

35160

35161 CTGGGATTGTCAGTTGTCAAAAAGGCTGTAGCTGTTAGTAAGGAAAGCAACAATTAACCA

35220 35221 ATAAGTAAAACTTCCATTTTGCTCAACATTTTCTCTTTTTGTTAATACCTCTCAAATGAG

35280

35281 TGCTGTCCTGCTGCTGCCACTCATTGACTTCCTAGTCCTTTTAGCCTTGACATTTGGACT

35340

35341 GCACAGTGTTTCTCTCTGGTAACCTCTGAATTTGCTGTTTTGTTGTTGTTGTTGTTGTTG 35400

35401 TTGTTGTTGTTTCTTTTTTAGACACAGTTTCACTCTGTTGCCTGGAGTGCAGTGGCGCAA

35460

35461 TCTCAGCTCACTGCAACCTCTGCCTCCTGGGTTCACGCAGTTCTCGTGTCTCAGCCTCCT

35520 35521 CAGTAGCTGGGATTACAGGCGCACGCTACCACACCCAGTTAATTTTTGTATTATTAGTAG

35580

35581 AGACAGGGTTTCGCCATGCTGGCCAGGCTGGTCTCGAACTCCTGACTTCAGGTGATCCAC

35640

35641 CCACCTTAGCTTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCATGCTTGGCCTGAAT 35700

35701 TTGCTGTTTTTAATACATAACTCCTAATGTCTAAATTTACTAGCCTTCATCACAAATTTT

35760

35761 AAAAATTTCTCCGTGTCATTCATCACTGTTTATAACTCCCCTGAAACTTTTTAGGTAAGT

35820 35821 TCTCTGATTACTGTAAGAGTCTGCTTCTTTTCCTGCCTGTGACCAGTTTTCCTTTTACCT

35880

35881 CAAATATAATACAGTTCCCTTCAAACAAATTTTCCTTTTCATTGCTCCTATTTCCATTAT

35940

35941 TTGAAGCATGGTTCTCCAGGTCCTGGAGGCTGGAATTACCTTTTATTTTCTCAAACACAA 36000

36001 GTTCTGTCATTTTTCTCCCTCATACTGTCATTGTTTCTATCTTTTTATTCCCTCCTCCTC

36060

36061 CCAACCTCACATCCAGTGTCTAATTCCTATTCTTACTACCTTGAATTTAGACTACTGCCA

36120 36121 CAGCTTTCTAACTGGTTTCCTGGCTTTTTCTTCTCCCTGCAGCAATTTAACCTGCATATT

36180

36181 ACCACACATTAGTATTCCTAAAATACCACCTTATGACTACAGAATCATCAATATGTTCTT

36240

36241 TAGTAAATTGTCAGCTCTTTTTGTTTGTTTGTTTTTTGAGATGGAGTCTCACTCTGTCAC 36300

36301 CCAGGCTGGAGGGCAGTGGTGTGATCTCAGCTCACTGCAACCTCCGCCTCCCGGGTTCAA

36360

36361 GCAATTCTCCTGCCTCAGCCTTGAGAGTAGCTGGGACTGCAGGCACGTGCCACCACACCC

36420 36421 AGCTAATTTTTGTATTTTTAGTAGGGATGGTGTTTCACCATGTTGGCCAGGATGGTCTCA

36480

36481 ATCTCTTGACCTCGTGATCCACCTGCCTCAGGCTTCCAAAGTGCTGGGATTACAGGCGTG

36540

36541 AGCCACCACACCTGGCCAAATTGTCAGCTCTTTATCCTGGCACTGAAGGTGCTTGACAAG 36600

36601 TTGATCCCACCCTACCTTTCCATTGTTAAAGTCAGTTATTTACTTTCCTGGATTCTTCAC

36660

36661 CCCAGCATCATTGATTTCCCTACCATCTTCTGAATATGTCTTAAACTTACCTTTCTCATA

36720 36721 CCTTTACTCATACTTTTCCCTTTATATATGTAGGCCTTTCTGGTGCTTTGTGCATTTAAA

36780

36781 TGTTCCCCATCTTTTAGAACCTAGTTTAAGTTCCTTCTCCTCATTGCTTTCCTAATATAT

36840

36841 ACCAACTCATAATGAGCTCTTCTTCCTTTGAACTCCTATTTAAACTCTCCCTTTACCATT

36900

36901 TGGTAGGCAATAAACTATGTTTGGAATTAATAGATATTTGATATTGCTCTGTGATTTACC

36960 36961 TCTTTAATTCTTTAAAAAAATTTTTTTACTTCCAACTAAATTATTAAAATTTTTGAAGAA

37020

37021 AGACTATGTTTTATACTTTTATGTTCCTAATAGGACCATATTACGCTTGTGCATAGTAAA

37080

37081 AATTCAGAAATATTTATTGATTTATTTTAGCTTAGAAGTAAGAAGATGTGTGCTACTACT 37140

37141 TATGTCTAGGCAGTGTAATCGGTTTCTTCCTTTCTTTTAGGCAGTTTAATTTTTATAGAA

37200

37201 ATTGAAACTATCATAGAAAAAAGCTAAAATTATCCTGGTTACATTGGAGAAAAAGTATAT

37260 37261 ATAGAATCAGTAGAAATTTGTATTGCAGTATACTTGACTTATTTGAGAAAGGAAACAAAA

37320

37321 GACTTCACATTTCTTTCTCCGTATTTTTCTTTTATTTCAGTTATCTTTAGTTATTTGAGA

37380

37381 CAACTGATTATAGATTCTGTTTTTCAGACCTGCTTTCAAATATCGTCATGTATGCACCCT 37440

37441 TAGTTCTTCAAAGTTGTTCTTCTAAAGTCACAGAAGCTTTTGACTATTTGTCCTTTCTGC

37500

37501 CCCTTCAGACTGTACAAAGGCTGCTTAAGGCAGTGCAGGTAAGTCTTCAGATTCCCAAGT

37560 37561 AACTTGCCAAAACTGAGGTTTAACTGTCTAGTGGAAGTCTGATGCATTTTTGTATAAATA

37620

37621 TATATGTCTAAGAAATTCTATTCCACTTTAGTCTGAAACATAAAATTTGCATTAAAAGGG

37680

37681 CAAGAGCATTGAGCAAGACTGTTTTCCAGGTTTATTTCCTATCCATAACACTATTCTCTC 37740

37741 AACTCATTTTCAAGTTAATGAAATAATTAAAACTCTCTTAGAACATTATTGACTACAATA

37800

37801 GATTTACAGAGGCATTTGCAACCAGTTTCATTTTAGTGTTTCAGAGTATACTATTGTCCT

37860 37861 TGTTAACTGTAATAAACATTAAAACGTATATAAAACAGAAGTAAGCTTCCTTATAGAAGC

37920

37921 CATATGGTAACAGTATTGGGTAGAAATGACCTAAGGCTAATAAGCAAACTTGTTCTGTTT

37980

37981 TTACCCACTGATTCTTTTTCAGCCCCTTCTCAAAGTCAGCATGTCAATGAGAGACTGCTT 38040

38041 GATACTTGTCCTTCGGAAAGCTATGTTTGCCAAGTATGTAGCATCTTTTTCTATCATAGG

38100

38101 AAGACGTTGTCTTCTAATGTTGGAGCTAAAGTTATCTCTGCCATCTCCTAGTACCACCTG

38160 38161 TTCACAGTGATCATGAGAATTCCTAGCCCCGCAGAGCAATTTCATCAAATAAGGCATTTT

38220

38221 GCTAAGTGCTTTATGTGCTTTTTTCATTTAGTCTTCAGAGCAACCCTATGAAGTAAGAAT

38280

38281 TATTGCAATTCCCATTTTACAGATAAGAAAAGTTAGGCTTAGAGAAGATGAGTGACTTTC 38340

38341 CAAAAGTAAAACAGTAGTTGGTGGTGGAGCAAGGATTCAAAGTCAGGTCTAACATCAGAG

38400

38401 CTTGTGATGTCAAGTACTTTGCTCTACTACATTAGAATAATCTTCAGGGTAATGTTCTAG

38460 38461 TGTCAAATACTCTGAAACCTGAGCTGGCAGCAGTTCCAAAGGGGAAAGTACCTTAGTCAA

38520

38521 GAAGATTATACTTAAGAATCTCATTTTAGTGAACATCTGAATTATGATCTAGAGGCTAGT

38580

38581 AGATCAAATTAATTGTTGAGGGGAATGAGTTTTATTTGTAGCTTAGCCAGTCCCTCCTAA

38640

38641 CATCTCTTCCTTTATTTCTTAGTATATAAGCTAGAGATGATCTTGCCATCTCCTAAAAAT

38700 38701 ATAAGACAGTAATATGTGAAAAGCATTTGGTACTTTGAGGATAAGATGCCCAATTAATAT

38760

38761 CACATTGGCCATATTATAATATATTCATGGGTGTCCAGAGGAACAACTAAGCCATATGCT

38820

38821 ATTATCTTTGTTCTAGAATTAAAAGTTTTTGTTACTCCTCACCCAAAGGCATAGCGAATA 38880

38881 GACTGAGATATTACGTTGTAACATCTTTAACATGAATTAATCATTGCGCCTCAGTGAGAA

38940

38941 GACAAGTGTGACATTTTCTGGATGGACAGTTTACATGGCATCAAATATTACATTATCCTA

39000 39001 TGATGAAAGAGAGCATATGAAGGGAAGCACTCCTTCAAACATGGCAAAATCAGCAATTTC

39060

39061 TTACATTTGAGTAACCAGCCAATTTTACAGTCTCTCAGCAGTCTTCTTATTCCTCACATA

39120

39121 GACTTGAGAGCAGGATAGAAGGAAGTAGAGTATCATGAGTTAAGTATTGAAAAAAGTCAG 39180

39181 TTGAATTTCTGTACTATAAATTGCTTTGAACAGCTGATACCTTATTTTGAGTACATAAAT

39240

39241 TCACTTTGTTTTGCTCTACGCTTCATTGTTTATACATATGCCTTCTCTGTATGCAACTAG

39300 39301 CTGGATTTTTCTGACCCAGAAGCATATTCCTGTGAAATAGTACTGTTTGTTAACTTCTCT

39360

39361 ATTTCTGAGCTAGCCAGCTTGATGCCCGAAAATCTGCAGTTGCTGGGTTTTTGCTGCTCC

39420

39421 TGAAGAACTTTAAAGTTTTAGGCAGCCTGTCATCCTCTCAGTGCAGTCAGTCTCTCAGTG 39480

39481 TCAGTCAGGTAAGGATTATTTACGTTAACTTGCAGTGTGTGCGTATCATTCATGTGCATC

39540

39541 GATGAAATGCACGCTGTTTTCTTTTATTCCTGTTATGGTTGTAAGAATGATCCTAGAACT

39600 39601 AAAGAGTCTGATGATGATGATGGTGCTGATGATGATGATGATGATGATATTGGCTAACAC

39660

39661 TCATTAATGGCAGATTTACTGTATGCTAAACTGTGCTATCCAGTATGGTAGCCACTGTCC

39720

39721 TCAGTGTGGCTATTGAATGGAACATTTGATTGAAATGTGGCTAGTCCACATTGTGTTGTG 39780

39781 CTATACATGCAAAATTCATACCAGATTTTGAAATAAAAGAATGTAAAAATCTAATAATTT

39840

39841 TTAATATTGATTACATCTTCAGAGAGAATTTTGTATGTGTTGGGTTACAGAAAACATTAA

39900 39901 AATTTGACTTTATCTCTTTCTCTTTTTTAATGTGGCTACTAGAAAATATAAAATTACATA

39960

39961 CATGGCTTGCATTTGTGGTTCATATTATATTTCTTTCTTTTTTTTTTTTTGAGACGGAGT

40020

40021 CAGTCTGTCACCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAACTCACTGCA 40080

40081 AGCTCTGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGACTA

40140

40141 CAGGCGCCTGCTACCGTGCCTGGCTAATTTTTTGTATTTTTAGTAGAGATGGGGTTTCAT

40200 40201 TGTGTTAGTCAGGAGGTCTCGATCTCCTGACCTCGTGATCCACCCGCCTCGGCCTCCCAA

40260

40261 AGTGCTGGGATTACAGGCGTGAACCACCACACCCGGCCCATATTATATTTCTTTTGGATA

40320

40321 GCACGGTGCTAGATATTGTGCCAAAATGCTTTACATTTATTATCTCATTTAATCTTCACA

40380 40381 GCAGTCCTCTGAGGCTAGATACTATTATTATCCCATTTTCTTACGTCATTGTATTTACTT

40440

40441 TGCACAGGAACATATATTACACCTTGTAATCTGGGAAAACAAAACTTTTTATAATAATCT

40500

40501 GGTAATCTAGGCTTTGGAGCAATAAAATTGTCTTCATAATTTAAATGATGTACTATAAGA 40560

40561 TTTATAAATAGGAGACATTTATATATAACAGTTTGACTAGGAATCCTCACTTTATTGCTT

40620

40621 TGGTATAACTACATAAAGCTTTGTCTATAACATATAGCATAGATTAAATATTAGGAACAC

40680 40681 TAGATATCATTTGTGTGTTATTATCCCCATTTTAAAGACAGGGAAACTGAGGCTGAGCCA

40740

40741 GGTTAAATAACTTGCCCAAGGTTATATAACCAAGAAAGGGTGGAAGAAGGATTTGGCCTC

40800

40801 AGGCAGTATGAACTACATTACTGTGATGCCTGGAGTAAGGATTAGGAATTCTTGGTCTAT 40860

40861 AATTGCTGATTTTATTATGTAATTTACACAAAGGATGTCAGTTCTTTGTCATGTCCATTT

40920

40921 CTAAACAAAAGACTGTCTTCCTCTCAACCTTTCTAGAATTTGTGGAATAGTGAGATTCTT

40980 40981 CTAGCTATTGAAGAAGTCTTAGACCTTTTAAATAAAATTATGTATATTGTTACTACTGGA

41040

41041 TTCTAGTGGCTTGAAATAAGCAGTGTACCTTCCTAAGCAATTGATATTGTCAAGGAAACC

41100

41101 TTAAGGAATATGAACAGTTAATTTTGTTAAAATGCTTAAAAATTCTCTTATTCCAAACCA 41160

41161 GAATAAGAGTGACACTTACTACTGCATGGTAATGATTATACTGTGTGTCTCACCCCTGGG

41220

41221 GTTTGGCATGTGGAGGAAGCTAAGTTTTTTCGACCAAGAACATAGGCTCATTTAGCTCCT

41280 41281 TATTGGAAACAGCTTTGGTAAGATAGACGTGAATTGGCCTGTCTTCCTTTCAGGTTCATG

41340

41341 TGGATGTTCACAGCCATTACAATTCTGTCGCCAATGAAACTTTTTGCCTTGAGATCATGG

41400

41401 ATAGTTTGAGGAGATGCTTAAGCCAGCAAGCTGATGTTCGACTCATGCTTTATGAGGTAA 41460

41461 GTCCGTAGAATGGAAAGAATGTAGCAAAACCCCAACTAATAATTTTTATTTTAGTAATTT

41520

41521 TTTTCCTGATTATAAAAGTACAATAGCTATGCTCTTACTTGATGTAGTCAGCACCAGTAG

41580 41581 CTAGTGATGTTTAGTCAAAAAGTTGTTTAAGGCAGGGCACAGTGGCTCACACCTGTAATC

41640

41641 CCAACACCTTGTGAGGCCAAGGCAGGAGGATCAAGCAAGACTCTGTCTTTACAAAAAATA

41700

41701 AGAAAATAAAAAACATTAGCTGGGTGTGGTGGCATGCGCCCGTGGTCCCAGCTACTCAGG 41760

41761 AGGCTGAGGCAGGAGGATCCCAGCAGTGAGGCTGCAGTGAGCTATGATCACACCACTGCC

41820

41821 CTCCAACCCAGGCAACAGTGAGACCTTGTCTCTCAAAAAAATTTTTTTAAGGTGAGAAAT

41880 41881 CATAAAACACAGTTGATAGACCTTAAATATTTTATTTTAATTTGAAATATTTAAATATAC

41940

41941 TATAAATTTGCCAGTCTTAAGACTAAGTCCTTTTTTTTGCTTTGTTAATGATTACAAGTT

42000

42001 TACTTCCTGCCTCTCATACAAACCCATACCCTTAATTTATTGCCTATAGTTTCAAATTTG

42060

42061 TGTTTACCTAAAGTGGTATGAAAACTTTAAGCTACCTCAGATACAGTTTAGCACAAAAAT

42120 42121 CTTTTAGATAGCTTCAGTTTTTTCTCTTTTTACAATTGTGTTACCTCTAATTTATCAGAA

42180

42181 ATTGGTTTGAGAGACTGACTTTCCAAACATTGTTCCAAACTTAGTTTTTTTTTTTCTTTG

42240

42241 GGACAGAGTCTCGCTCTGTCACCCAGGCTGGAGTGCAGTGGCACAATCTCGGTTCACTGC 42300

42301 AGCCTCCATCCTCCATCTCCTGGGTTCAAGCAGTTCTCCTGCTTCTCAACCTCCCAAGTA

42360

42361 CCTGGGATTACAGCCATGCACCACCATGCCCAGCTGATTTTTGTATTTTTAGTAGAGATG

42420 42421 GGATTTCACCATGTTGGCCAGGCTGGTCTTGTACTCCTGGCCTCAAGTGATCCACCTGTC

42480

42481 TCGGCATCCCAAAGTATTGGGATTACAGGCGTGAGCCACCATGCCTGGCCTAACTTAGTT

42540

42541 TTTATAAAAGCAACTTCTTTTTCTTTTTGCACTGGGTTTTCAACAGTTTACTGAACGTCT 42600

42601 GATTAAATCACTAGTTATTATGACAGTACAGCTAAGAGAAATACAACTACTGACTTGTGA

42660

42661 TTAGCATATAACTCACCCAGAGGATATTGTGGTAGCATATGTGAAAACAACATTCATCTT

42720 42721 TCTGTACATTTCCATCAGAGCTCTTAGATGCCTAGGTGCACTATCAATGAGCAGTAATAT

42780

42781 TTTGAGAAGAATCTTTTTTTTTCTGAGCAGTAGGTCTCAACAGTGGGCTTAAAATATTCA

42840

42841 GCAAACCATGCTATAAACAGATGTGTTATCATGCAGGCTTTGCTTTTCCATTTATAAAGC 42900

42901 ACAGGCAGAGTAGATGTAGCATAATTCTTAAGGGCCCTAGGATTTTTGGAATAGTAAATG

42960

42961 AGCATTGATTTCAACTTAAAGTCAGCAGCTACATTAACCCCTAACAAGACAGTCAGCCTG

43020 43021 TCCTTTGAAGCCAGGCATTGACTTCTCCTCCCTAGCTAGGCAAGTCCCAGATAAAGTCTT

43080

43081 CTTCCAATATAAGGCTATTTCATCTACATTGAAAATCTGTTGTTTAATATAACCACCTCG

43140

43141 ATCAATTATGTTAGCTATATCTTCTGGATAATTTGCCTCAGCTTCTCCATCAGCATTTGC 43200

43201 TGCTTCACCTTGCACTTTTTTTTTTTTTTTTTTTTTGAGTCTCACTCTGTCACAAGGCTG

43260

43261 TAGTGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCACCCAGGTTCAAGTGATTCT

43320 43321 CCTGCCTCAGCCTCCCAAGTAGCTGGGACTATAGGCTCACGCCACCATACCCAGCTAATT

43380

43381 TTTGTATTTTTAGTAGAGACTGGGGTTTCACCATGTTGGCCAGGATGGTCTCAATCTCTT

43440

43441 GACCTCGTGATCTGCCCGCCTCGGCCTCCTAAAGTGCTGGGATTACAGGCGTGAGCCACT 43500

43501 GCGCCCAGCCCCTTGCACTTTTATGTTATGGAGATGGCTTCTTTCCTTAAACCTCATGAA

43560

43561 CCAACCTCTGCTCGCTTCAGCTTTTCTTTGGCAGCTTCCTCATCTCTTTCAGGCTTTATA

43620 43621 GAATTGAAGAGTTTGGTCCTTGCTCTGGATTAGGCTTTGACTTAAGGGAATGTTGTGGCT

43680

43681 GGCTTGGTCTTTTACCAGACCACTAAAACTTTCTTCACATCAGCAATAAGGCTGTTTTGC

43740

43741 TTTCTAATCATTCATATGTTCACTGGAATAGCACTTTAAATTTCCTTTAAGAGCATTTTC

43800

43801 TTTGCATTCACAACTTGGCTCTTTGGTGCAAGAAGCCTAGCTTTTGACTTGCCTCAGCTT

43860 43861 GTAACATGCCTTACTAAACTGGATCATTTCTAGCTTTTGATTTAAAGTGAGAGACGTGCG

43920

43921 ACTCTTCCTTTCACTTGAATACTTATAGGTCATTGTAGGGGTTATTAATTGACCTAATTT

43980

43981 CAATATTGTTTTGTCTCAGTGAATAGGGAGGCCTGAGGAGAGGTGGAGAGATGGGATGAT 44040

44041 GGCCCAGTCAGTGGAACCAGTCAGAACATACACAGCATTTATCAATTAAGTTCACTATCT

44100

44101 TATATAGACACAATTCGTGACTCTTCAAACCAATTGCAATAGTAATGTCAAAGATCATTT

44160 44161 ATCATAGATCCCCATAACACATACAATAATAAAGAGAAAGTTTGAAATATTATGAGAATT

44220

44221 ACCAAAATGTGACACAGAGACATGCAGTGAGCACATGCTTTTGGAAAAATGGTACTGAGA

44280

44281 GACTTGCTTGATGTAGGGTTGCCACAAACCTTTAATTTGTAAACGTAATATCTGCGAAGT 44340

44341 GCAATCAAGCAACTTGCAATAAAATTAGGTACGCCCATATTCTGTATAAACCACATTGTT

44400

44401 TAAACAGTTAAGGCACAGTGAGCCACTCTTCTTTAGGGTGAGCTTTGTGTCAGTGTAAGT

44460 44461 AACTGTTTACCAGCTAAATTCTCAGCCACCAGCTGGCAAAGGTGCACTAAGGATAGGCAG

44520

44521 TCTTAGACTTCATGTGTTAACTCTTGTGCAGTCCTTGATATGCCCCTTCCCCCCCAAAAA

44580

44581 AAATCTTCACCCTTATTTAGCTCTAAAGTAAAAAAAATTAACATGAAGTGGATCATAGAC 44640

44641 TTAAACCTATTAGAATATTGCTGTTCATTTATTCATCAGTCTTAGTATATCATTGGTATA

44700

44701 TAAAAATACCTTATGTAGTGATAAAACTTGTCATATTTGTTGCAAGTGTATGCCAGATTG

44760 44761 CCATTTGCTTTTTTTTTAATATGATCCTTTTACACTTAAATAATAATTTTTGGTTACTAA

44820

44821 ATCTATTGATTTTCTTCCTCTGAGATGTCTTCTATTACTTTTTTTTTTTTTTTAAGAGAC

44880

44881 AGAGTCTTCCTCTGTTATCCAGGCTGGAGTGCACTAGTGCAGTCTAACTCACTGCAGCCT 44940

44941 CCAGCTCCTAGGCTCAAATAATCCTCTTGCCTCAGCCCCATCCCCAAACGATTGGGACCA

45000

45001 TAGGCACAAGCCACTGCACCCAGCTAATTTTTCTGTTTGTTTTTTTTTTTAACCTTTTGT

45060 45061 AGAGACAGGGTCTCACTTTGTTGCCCAGGATGGTCTCAAATGCCCTGGCTTCAAGCGATC

45120

45121 CTCCTGCCTCAGTCTTCCAAAGTGCTGGGATTAAAAGCATGAGCTACCGTGCCCGGCCCA

45180

45181 TCTTTATTATTATTTTATCTGTAGAAAGTCCAAATCCTTCCAATGAATATTCACTAACAT 45240

45241 TTTCTTTGAGGTTTTTTTTTTTATGCATTTGTTTTTGACGTTTAGCTACTTAATTCATCC

45300

45301 TGAATTTGTATGATAATATATATGAGGAGCTTTTCCCCCAAGCTAACCATTTGTTCAGAC

45360 45361 ATCATTGCCTCCTTAATGTGTACTATGTATCTCTGTATGCCATTTTAAAATTTTGCCTGT

45420

45421 AGATTTTTGTGCCAAGACAAAGTGTCTTTTTATTATTATTACACATTCATAAAATATTTT

45480

45481 AATTGCTGTTAGGATTAATTTTCCTTTGTTCCGTTCCTTTCCCCTTTCCCCTTTTTCCGT

45540

45541 TTTCCTTTCCTTTCCGATGGAGTCTCACTCTGTTGCCCAGACTGGAATGCAGTGGTGCGG

45600 45601 TCTCGGGTCTCAGCTCACCACAACCTCCCCATCCCCAGTTTAAGCGATTCTCCTGCCTCA

45660

45661 GCCTCCCAAGTAGCTGGGATCACAGGCGTGCATCACCATGCCTGGCTAATTTTTATATTT

45720

45721 TTAGTAGAGATGGGGTTTCACCATGTTGACCAGGCTGGTCTCGACTTCCTGACCTCAGGT 45780

45781 GATCCCTCCTGCCTTGGCCTTCCAAAGTGCTGGGATTACAGGCATGAGCCACCGTGTCTG

45840

45841 GCCTGTTTCTTTTCTTTAAACAAAAAAATTTTTTTGATCAATTCTCATTTGCTTTTACTC

45900 45901 ACAGATGAACTGTATAACCTCTTTTCCAGGTTCCAGAATGAAAGTTCTCCCCAGGATTTT

45960

45961 TTTTTTTTTTAAGACAGAGTTTTGCTCTATTGCCCAGGCTGGAGTGCACTGGTGTGATCT

46020

46021 CAGCTCACTGCAACCTTCATACCTCAGCCTCCCGAGTAGCTGTGATTACAGGCATGTGCC 46080

46081 ACCACACCTGGCTAATTTTTGTATTTTTTAGTAGAGACAGGGTTTTGCTGTGTTGGCCAG

46140

46141 GCTGGTCTCGAGCTCTTGGCCTCAAGTGATCTGCCTGTCTCAGCCTCCCACAGTGCTGGG

46200 46201 ATTACAGGTGTGAGTTACTTCACCTGGCTTCCCCAGGATTTTACTCTGAGCAGTTCTTAA

46260

46261 CTAAGACTAGGACATTTTGGGTGGGAAAAAAATATGGTCATTTATATTAGGGCATTTAGG

46320

46321 ACTGTCAAATATGGTCTCAATAGGAAAAAACTGAGAAGAGGATAAAGAAAGTGTTATTTT 46380

46381 TAAGTTATGATTGTCCAGATCACTAGTATCTCTGTTAAAGTGCTTATTTCTTCTCTTTGA

46440

46441 TTCCTCTTAGGGGTTTTATGATGTTCTTCGAAGGAACTCTCAGCTGGCTAATTCAGTCAT

46500 46501 GCAAACTCTGCTCTCACAGGTAAAATACATTTTTATGGATATATGGAAAACAGACCATCA

46560

46561 AGGATCGAGAGACAGTTGATGGTTTTCAGACCTTTCAAAATATAAAATAATTCCCAACTA

46620

46621 GTACATTAGTACTTTGTGAACATGCTGTGGGTATGTTATGCTGTTCTGGGTTGTCTTATC 46680

46681 TCGGTCACCACTTACCTGTCTGCCATAATACCATCATCTAGCAGTTATCCTTGTGAATTA

46740

46741 ACAGTCTGTATCCTAAGTCAGAAGGACATTTTGAAGAACCATTTATATAAGGACCATTGC

46800 46801 AACTGTATCCATTATGTTATATTTCTGAAAGAAAAAAGAAATCCAAAACAAATATAGCAA

46860

46861 AATATTATTTCTTAAATCTGGGTAATGGATACATGGATGTCTTTTACATTCCTTTCTGCA

46920

46921 CTTTTCAGTTAACAAATTTTTAAATTAGAAGATTAAATTCTTCTAATCTTCAAGACTAGA 46980

46981 AGAATTCATATAGAGACTCTCTACCAGGAGGGTTCCTTTAGACCCCTAGTCTATCATTTT

47040

47041 GTTTGCTGTCTCTCAATGCCATTGCTTTGTGATAGTTATGAAATTTCATTTAAAATTAAG

47100 47101 GTCTCAGATGTGTGAAATTATTGGAAGTTCCCATACTGTCCATCAAAAGCTTTTCATCTG

47160

47161 GCCTTAGGCTTGATCCTATATAAAATTTATATTCTGTCAGCTATATTGTATCTTTCAAAC

47220

47221 TGTAATCAACACATCTACATTCACACGTTTAACAAAAATAGAAAATTATGTTTGAAGGCA

47280

47281 TAGAATAGCGCTTTCTGACTCAGGTTAATATGCAGGACAGTAAAATTCTGCAGACTCCAA

47340 47341 AAGTTAAAAGGAGAAAAATAATAACTCATATTTTTTTTGGCTGTCTTATATGCCAAATAC

47400

47401 TGTTTGAAGTACTTTACATGTATTGTCATGTTTAATCCTTATACAATGATTATTAATAAT

47460

47461 ATCCTCATTTTTAAGATGAGGAATCACGGTACAGATACTTTTATAAAAACCAGTAATAAG 47520

47521 TAGTGGTACTGGGATCTTGGACTCAGCCAGTCTGACTCTGGATCCTGTATTTTTAACCTC

47580

47581 CAAGTTAACTGAAAATGTTGATTGCATGTACAAGTTTAGGAATAGAAGGAATTGGAATAG

47640 47641 GAAAGATGATCCTAATCTGTACAATATCCAAAGCAAGAGCACAGGATGTGGGTGAGCCAA

47700

47701 AGAAATAAAAACTTGGCTGCATTGTTCTTTGAACAGTTGGGAAATGTGTGTTAGAAGGCA

47760

47761 TTAGACATTAAAGATACCTTTCACCTGAGAAGTTAAAAAAGTAAAAAGCATTTATGAGCC 47820

47821 AAGATGTCTTTTTTTTCTTACTATACACAGTTAAAACAGTTCTATGAGCCAAAACCTGAT

47880

47881 CTGCTGCCTCCTCTGAAATTAGAAGCTTGTATTCTGACCCAAGGAGATAAGATCTCTCTA

47940 47941 CAAGAACCACTGGTGAGACTTTTATTCTTCCTTCAACCATTATTTTTAGTATTAAGGATA

48000

48001 GGGTTGAACTTTGCAAAGAGATACCTTTCCCTGTGAAACTCTCTTCCTGGTTCTTCCAAT

48060

48061 GAATGTTTACTTTTAAGTCTTCCCTGCTTATCATTTCTGAACCATTATCATTTTGGATCT 48120

48121 ATTGGTAAGTTATCTCCATGTTAGCCCTTTACTAAATCAAGGCTTCCATCACTTCCCCCA

48180

48181 GAGAATATTTTATCTCATTTATGAATCTTTGATTACATTCACTAAATAAATGTACTTTAG

48240 48241 AATGGGGAGAGGGATGATGTCACCTACGAACAGATGAACCTAGGGGACCTGCTTCATTAG

48300

48301 TCTAGAGGAAGTATATGATTTAGGCTGAACTAGGATTGGCACTGTTGAACCTTCAGAGGA

48360

48361 GCTAGGCCTGGTTGACTTTCTGTTCTTATCTAGCCTCAGAAGGACAGCGAAGAGGATAAA 48420

48421 AATGGAAACTCCCACCCCAAGTGACTTATAACTCATGGCAGAAGAAAGGAGGAAGTCTTT

48480

48481 CAGTTGTCATATCTTGCCTTCATTTTTGTGCCTTCTGTGACATGAGATGTCCTCAGCCTA

48540 48541 GATCTAAGATGCTGTGTTGGTCTCCTAAAGATTCCCTCAACATAAACCTCTATCTTTAGA

48600

48601 ATAGCCTCTGGGACTCAAATGAACTAGAGAAGATTTAACTTCACCACCTAGCTCCAGGGT

48660

48661 TATTTATCAGACCAGAGATAGCACATCTAATGTTGCATTTTGAGGTGTCAGCCACTTAGG 48720

48721 CCAATTTTCGGGGGGAAGCATTAGAAAAGGATGCTGCTGTTCAATTAATTTTCCCAGTTT

48780

48781 TGTTAATTTAGATGGACTTAATTTGTAACGCCAATGAATCATTTTCTTTAGCCTTATAGA

48840 48841 TAAGAATTATTTGCTGGTTATGAACAACTTTATAAGTTATCCATCAACACTCAAGAGTAT

48900

48901 TTAATTTACTTATTTTCTCCTACAGGATTATCTGCTGTGTTGTATTCAGCATTGTTTGGC

48960

48961 CTGGTATAAGAATACAGTCATACCCTTACAGCAGGGAGAGGAGGAAGAGGAGGAGGAAGA 49020

49021 GGCATTCTACGAAGACCTAGATGATATATTGGAGTCCATTACTAATAGAATGATTAAGAG 49080 49081 TGAGCTGGAAGACTTTGAACTGGTAATTGCTAAGTCCTCAGCTGTATTGAATGATGGAGT

49140

49141 TCTTTAGTAGCTTGTTAATTTTTATCTTGTGTCTTTTAGGATAAATCAGCAGATTTTTCT

49200

49201 CAGAGCACCAGTATTGGCATAAAAAATAATATCTGTGCTTTTCTTGTGATGGGAGTTTGT 49260

49261 GAGGTTTTAATAGAATACAATTTCTCCATAAGTAGTTTCAGGTAAGGTTTTGCTATAACT

49320

49321 CCATTTGTAATTTGATGAATTCTCCATTTTATTTACATATTTTTACTGTAGCAGTATTCT

49380 49381 AGGCAGTAAACAGACCACAGACGATGACACTGTCTCTTAAATGAGCACCTAGTCTAAGAC

49440

49441 TAAACTGTGTTCTTTCCACTAGAGAGCGCTGAAGTATATGCTTACCAGGCCTATAAAATG

49500

49501 AAAGCCTTCCAGCAGGGTTTCACTTACTCAGAATTTCCCAGACCCTGAATTCAAGAACTC 49560

49561 CTAATTTTAACATGGAAATTTTGATTGGTGGTAGTATTGGACTTTAATACTCTGCAGGGA

49620

49621 CTGGTATAAATAAGGGAGGGGAGGCCAGGTGCGGTGGCTCACGCCTGTAATCCCAGAACT

49680 49681 TTGGGAGGCCAAGCTGGGTGTATCACTTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAAC

49740

49741 ATGGTGAAACCCCATCTCTACTAATAATGTAAAACTTAGCCAGGTGTTAGTGCCTGGGAC

49800

49801 CTGTAGTCCTAGCTACTCGGGAGGCTGAGGCAGGGGAATCACTTGAGCCTGGGAGGCGGA 49860

49861 GGTTGCAGTGAGCCAAGGTCACACCAGGGGAAACAAAACCCAGATAGATTGCTGTGACCT

49920

49921 GGGAGTATTATATTAATGAAGTTCTATTCATTTATAATGTTACGTCTAAAATATGACATT

49980 49981 TAAAGAAGCATTGTGATTATTATCATATGTAAAAGTAAACCACAATATTCTGATGTTTGT

50040

50041 CCTTAGCGGTCTCTTTTTTGTTTTTTACAGTAAGAATAGGTTTGAGGACATTCTGAGCTT

50100

50101 ATTTATGTGTTACAAAAAACTCTCTGACATTCTTAATGAAAAAGCGGGTAAAGCCAAAAC 50160

50161 TAAAATGGCCAACAAGACAAGTGATAGTCTTTTGTCCATGAAATTTGTGTCCAGTCTTCT

50220

50221 CACTGCTCTTTTCAGGTAAGGTTCTGCTAGAGTGCTTAAAGACAGCCACTCCCTGAGGAT

50280 50281 CGTATACCTACCTAGGCTCACTTGTTTCACCTCGTTTGGATTGTTTACAGTAAGAAATAA

50340

50341 GTCAGGAGGTTTTATTTTTCTTGAAATAAAAGCTGGACAATCCTAGGTAGGTACTTTAAG

50400

50401 TAGGTACCTTGAATTATTAAAGGACAAATGAGAAACAGTGGGCCAATAAATGAAAAGATA 50460

50461 GGGATGAATTCCAGCAGGGTGTTTTAAGAAAATCTATGTAAGTAGCTTGTACAAGATAAC

50520

50521 AGATATAAAAAGGTAGGGGGTGCCAGGCATGGTGCCTCACTCCTGTAATCCCAGTACTTT 50580 50581 GGTAGACTGAGGTGGGTGGATCACCTGAGGTCAAAAGTTCAAGATCAGCCATGACCGATA 50640

50641 TGGTGCGACCTCATCTCTACTAAAAATACAAAATTAGCCGGGCATGGTGGTGCGCACCTG 50700

50701 TAGTCCCGGCTACTTGGGAGGCCGAGGCAGGAGACTCACTTGAACCCAGGAGGCAGAGGT

50760

50761 TGCAGTGAGCCGAGATCATGCCATTGCACTCCAGCCTGGGCAACAAGAACAAAATTCTGC

50820 50821 CTAAATAAATAAAAAGGTAGGGGGGGTGACTGAATTAAAAATACAAAGGGACAAAAAAAA

50880

50881 ACCTAAAAGACTAAGACTGACATTTAAATTTTATCAAAGTAAAGGATTTGGGGAGGGAGA

50940

50941 GAAAATGAGGAACCACTAAGAATAACGATTATAAAAGGGCAGCCATAGCAGTGACTATAG 51000

51001 CTTGGGCTGCCTAGAGAGTCTGCCAGTCGGAACTTACTGGCAAGCTCTGTTGGGTGCTGC

51060

51061 TGTTCTGAACAATTTCTAGAAAGCTTAATTCCATATACCAATAGCAGTAAGGGAATCTTC

51120 51121 CTTTTTCTTTCTCTCTCTCTGTCTCTCTCTAGGGATAGTATCCTkAAGCCACCAAGAAAGC

51180

51181 CTTTCTGTTCTCAGGTCCAGCAATGAGTTTATGCGCTATGCAGTGAATGTAGCTCTGCAG

51240

51241 AAGGTACAGCAGCTAAAGGAAACAGGGCATGTGAGTGGCCCTGATGGCCAAAACCCAGAA 51300

51301 AAGATCTTTCAGAACCTCTGTGACATAACTCGGTAAGCCACTCCCACCCCTTAGAAACTT

51360

51361 ATTCCACTTGGCTGTGGTGTCTCCAAGAGAACAAACTGGGAACAGAGGATGAAGGTAATT

51420 51421 TGAGGTTGGACATATTTTAGGAGTGAAAGAATATAAAACATTAGCCAGGTGGCAGGCACT

51480

51481 TGTAATCCCAGCTACTCAGGAAGCTGAGGCAAGAGAATCACTTGAACCAGGAGGTGGAGG

51540

51541 TCGCAGTGAGCTGAGTTCATACCACTGCACTCTAGCCTGGGTGACAGGGCAAGACTCTCA 51600

51601 AAAGAAAAAAAAAAAAAAAAAAGCCAGACATGGCAGCTCATGCCTGTAATCCCAGCACTT

51660

51661 TGGGAGGCTGAGCCGGGTGGATTACCTGAGGTTGGGAGTTCGAGACCAGTCTGACCAACA

51720 51721 TGGAGAAACCCTATCTCTACTAAAAATACAAAAATTAGCCTGGCATGGTGGCACATGCCT

51780

51781 GTCATACCAGCTACTCGGGAGGCTGAGGCAGGAGAATTGCTCGAACCTGGGAGGCGGAGG

51840

51841 TTGCAGTGAGCCTAGATTGTGCCATTGCACTCTAGCCTGGGCAACAAGAGCGAAACTCCA 51900

51901 TCTTAAAAGAGTGAAAAAAAGTGCAGTCATTGACATAGGGTCTTTTTTCTAGTCTTGCCT

51960

51961 TCCCCTGGCGCCCCCCCCACCTTTTTTTTTTTGTTGTTGTTGTTGTTTTGGCTTTTTTTT

52020 52021 TGGCTTTTTTTGAGAGAGTCTCGCTCTATTGCCCAGGGTGGAATGCCGTGGTGCAATCTC

52080

52081 GGCTCACTACAACCCCTTCCTCCGAGGTTCAAGCAGTCCTTCCACTGCAGCTTTCTGAGT

52140

52141 AGCTGGGACTACAGGTGCACACCACCATGCCCAACTACTTTTTAATTTTTAATTTTAGTT 52200

52201 TATTTTATTTACTACTTTTTTTGGAGACAGGGTCTTGCTCTGTCACCCAGGCTAGAGTGC

52260

52261 AGTGGCGTGATCTTGGCTTACTGCAACCTCGGCCTCCCCGGTTCAAGTGATCCTCCTGCC

52320 52321 TCAGTCTCCCAAGTAGCTGGGATTACAGGGCACCACCATGCCCTGCTAATTTTTGTATTT

52380

52381 TTAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAGGT

52440

52441 GATCCACCACCCTGGCCTCCCAAAGTGCTGGGTTTACAGGCGTGAGCCACTGCACCTGGC

52500

52501 CCTTATTTTTTTTTTTTTAATTTTCTGTAGAGATAGGATTTCACCATATTGCCCAGGCTG

52560 52561 GTCTTGAACTCCTGGGCTCAAGCGATACGCCCATCTCAGCCTCCCAAAGTGCTGGGATTA

52620

52621 CAGGCCTGAGCCACTGTGGCTTTTGTTTTTTCGTTTTTTTTGTTTTGTTTTGTTTTTGTT

52680

52681 TTTGTTTTTGTTTTTCCATTTAACCCTGAGTGGACACAGCACATGTTTCAGAGAGCACGG 52740

52741 GGTTGCGGGTAAGGTCACAGATCAACAGGATCCCAAGGCAGAAGAATTTTTCTTAGTACA

52800

52801 GAACAAAATGAAAAGTCTCCCATGTCTACTTCTTTCTACATAGACATGGCAACCATCCGA

52860 52861 TTTCTCAATCTTTTCCCCACCTTTCCCCCCTTTCTATTCCACAAAACCGCCATTGTCATC

52920

52921 GTGGCCCGTTCTCAATGAGCTGTTGGGTACACCTCCCAGACGGGGTGGTGGCTGGGCAGA

52980

52981 GGGGCTCCTCACTTCCCAGTAGGGGCCCCTCACCTCCCAGACGGGGCGGCTGGCCAGGTG 53040

53041 GGGGGCTGACCCCCCCACCTCCCTCCCGGACGGGGCGGCTGGCCGGGCAGAGGGGCTCCT

53100

53101 CACTTCCCAGTAGGGGCCCCTCACCTCCCGGACGGGGCGGCTGGCCGGGCGGGGGGCTGA

53160 53161 CCCCCCCACCTCCCTCCCGGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCACC

53220

53221 TCCCTCCCGGACGGGGCGGCTGGCCTGGCGGGGGCTGACCCCCACCTCCCTCCTGGACGG

53280

53281 GGTGGCTGCCGGGTGGAGACGCTCCTCACTTCCCAGACGGGGTGGCTGCCGGGCGGAGGG 53340

53341 GCTCCTCACTTCTCAGACGGGGCGGCTGCCGGGCGGCAGGGCTCCTCACTTCTCAGACGG

53400

53401 GGCGGCCGGGCAGAGGTGCTCCTCACATCCCAGACGGGGCGGCGGGGCAGAGGCGCTCCC

53460 53461 CACATCTCAGGCGATGGGCGGCCAGGCAGAGACGCTCCTCACTTCCCAGATGGGATGGCG

53520

53521 GCCGGGAAGAGGCGCTCCTCACCTCCTAGATGGGATTGCGGCCGGGCAGAGACGCTCCTC

53580

53581 ACTTCCCAGACGGGGTGGCGGCTGGGCAGAGGCTGCAATCTCGGCACTTTGGGAGGCCAA 53640

53641 GGCAGGCGGCTGGGAGGTGGAGGTTGTAGCGAGCCGAGATCACGCCACTGCACTCCAGCC

53700

53701 TGGGCACCATTGAGCACTGAGTGAACGAGACTCCGTCTGCAATCCCGGCACCTCGGGAGG

53760 53761 CCGAGGCTGGCGGATCACTCGTGGTTAGGAGCTGGAGACCAACCCGGCCAACACAGCGAA

53820

53821 ACCCCGTCTCCATCAAAAAAATACGAAAACCAGTCAGGTGTGGCGGCAATTGCAGGCACT

53880

53881 CGGCAGGCTGAGGCAGGAGAATCAGGCAGGGAGGTTGCAGTGAGCCGAGATGGCAGCAGT 53940

53941 ACAGTCCAGCTTTGGCTCGGCATCAGAGGGAGACCGTGGAAAGAGAGGGAGAGGGAGACC

54000

54001 GTGGGGAGAGGGAGGGGGAGGGGGAGGGGGCTCGTTTGTTTTTTTGTTTTTGTTTTTTTT

54060 54061 TCATTTAAAGAGATGGGGTTTGCAGTGTTGCCCAGGCTGGCCTTGAGCTCTTGGGTTCCC

54120

54121 AGGGCTTCTAGTCTTGCTTTGATATGGATTGTTAGTCACATATAAAATGTTATGCTATTA

54180

54181 TTGGGGAAAAAAAGAGGGATATTACATAATGGTTAAGTGATTCAAAATACTTAAAGCAAA

54240

54241 AACTGATGGAACTGAAGGAAAAATAGACAAATCCTTAATTAATGTAATCAGCCCTCTCTC

54300 54301 CATAGTCAAAAAACAGAAAACAAAACAAAAAAAAACCAGTAGATAGAAAATCTACTGAAA

54360

54361 GGAGGTAGAAGAAGTGAACAACACTGTCACCCAGCTTGATTTAATTGACACTTATAAGAA

54420

54421 CACTCCACAGCACCTTCAAAATACACACTTCTTTTTAGGTGCACATGGTGTATTCACCAA 54480

54481 GATGAGCCATATTTTGGGCCATAAAACTCAGCAAATTAAAAAAAAGTAGAAATTCTCTGA

54540

54541 CCAAACAGAATTAAAGAAGAAAGCAGTAACAGAAAGATATCTAGATAATCCCCATATATT

54600 54601 AGAAATTAAATGCATTTTTAAATAGCCCATAGTCAAAAAGTCGTCAGGGAAATTAGAAAA

54660

54661 CATCTTGAATAGAATGGAAACAAACAGACAACATGTCAAAAGATATGATCCAGCCAAAGT

54720

54721 GCTATTGAGGAAATTAAAGCAGTTAGTGATTTAAAAATCTCAGATCTCCCATATAAACTT 54780

54781 CCACATTAAGAAAATAGAAAAAGAAGAGCAAATTATAATGAAAGTAATCAGAAGGAAGGA

54840

54841 AATAGACAAAAGCACAACTATGAAGAAGCAGAAAATTAAAATAGAAAACATTAAAAGAAA

54900 54901 GCTGGTTCTTTGAGAAGATCAATAAATTTAATGAACTTTAGCCAGACTAATCAGGAAAAG

54960

54961 AGAAGACACATTACAAATTGCAGAATTGAGGAAGAAGGGACATCACTACTGATCCTACAG

55020

55021 ATGCTAAAAAATAGATAATAGAATATTAACGATTTTATGCCAGTAAACTTTTAAGTTTAA 55080

55081 TGGACAGATTTCTTGAAAGCTCACTCAACAAAAAATAGGCAACCTGGCCGGGCGTGGTAG

55140

55141 CTCACATCTGTAATCCCAGCACTTTTGGGAGGCCGAAGCAGGTGGATCACCTGAGGTCAG

55200 55201 CAGTTCAAGACCAGTCTGGCCAACATGGTGAAACCCCAGCTCTACTAAAAAAAATACAAA

55260

55261 AATTAGCTGGGCGTGGTGGTATGCACTTGTAGTCCCAGCTACTCAGGAAGCTGAGGCAGG

55320

55321 AAGACCACTTGAATCTGGGAGGTGTGAGTTTGCAGTGAGATCACCCCACTGCACTCCAGC 55380

55381 CTGGGTGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAACCTGAGTAATCTTATCT

55440

55441 CTATTCAAAAAAACCTGAGTAATCTTATCTCTATTCAAAGAAATTGAATTTGTGGTTCAA

55500 55501 ACTTTTTTCACAAAACTTAAGATCTAGGTCTATTCCGTGGTAAATTCATCCAAACATTTA

55560

55561 AGGGAGAAGTAATGTAAATTCTGTGCAAATTCTTCCAGGAAATAGGAGAGAACATATTCC

55620

55621 ACCTCATTTAATTAATGAGGTCAGCATTACCCTGATACCAAACCAGACCCAGACCATTGC 55680

55681 AAGAAAACTGCATAGCAATATCCCTGATGAATATAGAGTCTGCTGTGTTAAAGGCAGACT

55740

55741 CAGAAGGCTATATACTGAATGATAACCATTTTTGTATGATGTTTCGTAGAAAGTAAAACC

55800 55801 ATAGGGACAGAAATTAGATCAGTAGTTGCTTGGAAATGGGAGGAGAGAATTGACTATCAG

55860

55861 TGAGCACAAGGGAACTTTTAGAGTGATAGAAATACCTTGATTGTGGGGAGATTACACAAC

55920

55921 CATGTAAGTTTTTTTAAACTTCTAGAACTGTTCACCTACAGGGTAAGAATTTTACTGTTT

55980

55981 GTTATTTTATCTCGGTGAACCTGTCTTTAAAAACAATACCACTTTCTCCTGCTTCAGAGT

56040 56041 CTTGCTATGGAGATACACTTCAATTCCTACTTCAGTGGAAGAGTCGGGAAAGAAAGAGAA

56100

56101 AGGAAAGAGCATCTCACTGCTGTGCTTGGAGGGTTTACAGAAAATATTCAGTGCTGTGCA

56160

56161 ACAGTTCTATCAGCCCAAGATTCAGCAGTTTCTCAGAGCTCTGGGTAAGCATTGCAGTAT 56220

56221 CAATAAATAGGCTTGATTTAGGTTTCTCCTTAGCTCAACCCCTTAATTCCATTTGTTAAC

56280

56281 CTACCAGAAGACACTTGAGAACAGTCCTAATGTTGCTAAGGAGTGACATCTAGTGGATTC

56340 56341 AGGATTGGTACCTACTGGATTACATTCAACCACAAAGAGTGGCTGGAAAATGGAATATAG

56400

56401 TATTAAGAAATCACAAAATTTTAGAAATTTAAGTCTGCCTTTAGACTTTTTTTTGGCTTT

56460

56461 CAAAAATTTTAGCTATTAAAAGGCCAAAAAGTATGAGTTTATATCAAAGAAGACAAGAGT 56520

56521 TTCTTTTCCATTCCTAGATGTCACAGATAAGGAAGGAGAAGAGAGAGAAGATGCAGATGT

56580

56581 CAGTGTCACTCAGAGAACAGCATTCCAGATCCGGCAATTTCAGGTGAGAAGCCTTGCAAA

56640 56641 GCTGCTGTACTGGCCTGAGAGGCTTTGCAAAGGAACTGTTAGCAGGGGCAGCTGGTAAGA

56700

56701 ATGGGCAGACAGTGACCAAGTAGGAGAAGAATTTTTTCTGGGTCATACAGGCAACAGGAC

56760

56761 ATAGCTCTCAGAATGGAAGATAATGTATTTAAAAGGACCCTAGTGAGGATAGGTATTTAG 56820

56821 CCACCCCAACCTGCAATACTCATTACTTCTAAGGACCACTGTTTGAGTTGATGTAAACAG

56880

56881 AAAGGAGTCTTCAAGGCATAGAGTCTGTGTTTTTGTTTTCTGAGTCTGTGGTATAATTTG

56940 56941 GCTTGGTCTGACAGACAACGTAAACACTGGACAATAAAGTTTGTTAGTTTCTAACCAATA

57000

57001 GTTGTCAAAATATCAGGATGTTCTATCACAGGAGTTTTTTATTTAACAAGTAATTCTGAA

57060

57061 GTTCCTACCATGTGCCAGAAACAGTTCTAGGTGCTGGGGATGTAGCAGGGAACAAAAGAG 57120

57121 ACAAAATATTCCTATCCTTATGGAGCTTACATTATAGTAGAAAGACACTGACAAATAATC

57180

57181 AAATACATAGTTATTTCAGGTAACAGGGAAGGTATATAAGCATTGCTGGGTTTGGTGCCT

57240 57241 TGTCTAAGCCTGAGATCAATGGATAGGCTCCAGGGAGCTGTGAGCCACCAGAAATTGGGT

57300

57301 ACATAGTTCATTATCTACATTTTGCATGGGGTTGGAACGGAAAGAGTCAGTTTTCAGGAG

57360

57361 CATTCTCAGAGAGGTTTATGGTCAAAAACAGCTTAGGAAACTACTGCTCAATTCAGGTGT 57420

57421 ATTTTCAGATTTTATCCTCTTAGAATTTTGGTAGCTTTGAATTCTTTCTGTCCTAGTCTT

57480

57481 AGGAGTCTCTGACTAGATGCCTCTTCTTCCTGATAGGAACGTGTCTGCTAACATTGCTTG

57540 57541 CTGTGTGTGCCTTCCTTTCTCAGAGGTCCTTGTTGAATTTACTTAGCAGTCAAGAGGAAG

57600

57601 ATTTTAATAGCAAAGAAGCCCTCCTGCTAGTCACGGTTCTTACCAGTTTGTCCAAGTTAC

57660

57661 TGGAGCCCTCCTCTCCTCAGGTACTAGTACCGCTAACTTAATCCCATTTAGCATTCCTCA

57720

57721 GAAGGCAAGGATTATTGCATCATAAAAGTGTATATCTAAATGTCCTCTTGAAATTGAGCT

57780 57781 TATTTTACATAATGTGTTTAATTATACACCTGAGTTAATATAAGACCTATAAGATGTATT

57840

57841 CTTTCCTGTGTAATGCGTCTACCACAGCATACTTACATGCAGCCATACAGGGCAAGAACA

57900

57901 TACTGCTTTTGTTTTGTTTGTTCAGGGTAGAGTGAAACACCACAGAAAGAATCCTGTGTT 57960

57961 GATCTTTGGTGCCTTCAACCTTCTTAAACCTGTGTTCTTTGCCACTCTTGGTGTTTACTT

58020

58021 CTTTTCTAAGAAGAAAATTTGTTAATTATGAAGAAGGCAGTTGACATTCAGCAGTTGACT

58080 58081 TTCTCAGCACAGTTACCTGGTACTGACTTTGGAGGAAATTCAACCAAAGATGATTAATTC

58140

58141 AGAAAGTGTAACTATAGTCCCAAAGCAGAAACACCTCTGCAGAGATAGGTTGTATGTATG

58200

58201 TTTGTGTAAGTGTGCACACGTAGATGTAGAGCTGTGTACCCACATGAGCGTTCACACACA 58260

58261 GAAGAATAAGAGAAGGACCACTTGTTGGGGGTCTGTACCATTCAGGAGTAATTTTCCTGT

58320

58321 AGGATGATAGAGTACCTTCAGGAGCCCAGGACTGTGTGCTCCTACCACATCTACTTCCTG

58380 25 58381 GGCTTGAGAAAGACAATGAAGCCCTGCGGTATACTCCTCTTTATGTGAGCCCTTGCCTCA

58440

58441 GAAGACTGTAGCCTGGGAGTATGAAGGCGTTTGGGCACTAGAAAGCAGAGCTGCTTCCTG

58500

58501 TCGTCCTACTTCCAGGTTTGCCCAAAGAAAGTCCCTTTGGTTACTTCATCCCAGTTCCAA 30 58560

58561 ACAGCTCACCCCCACTTTGCTCTCATCATCGTTCTTTCATACTGCTGTTTTTGCTGCAAA

58620

58621 GAGGAAAGGACTGAGAAAGAAATAGCTAATTGGGGCAGGGGGAAATTTGAGAGAAAATAA

58680 35 58681 ATGGATTTAGGAAGGCTAAAATGTTCAGGAAAATTTAAAAATTTTTCTTTTTTAAACTTT

58740

58741 TAAACTTTTAGATTTTTAAACGTTTTACCACCTACCCCTTCTCCCTTAACTTCAGGTAGT

58800

58801 CTCTTTAAAAAAAAATTTTTTTTTTTTTTGAGACAGAGTCTCGCTCTGTCTCCCAGGCTG 40 58860

58861 GAGTGTAGTGGCGCCATCTCAGCTCACTGCAGGCGCCGCCTTCCGGGTTCACGCCATTCT

58920

58921 CCTGCCTCAGCCTCCCAAGTAGCTGGGACTACAGGCGCCTGCCACAACACCCGGCTAATT

58980 45 58981 TTTTGTATTTTTAATAGAGACGGGGTTTCACCGTGTTAGCCAGGATGATCTCGATCTCCT

59040

59041 GACCTCGTGATTCACCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGCCACC

59100

59101 GCGCCTGGCCAAAAATTTTTAATTTTAATTTTGAATGGGTAATAAAGTCCTATGATTTGA 50 59160

59161 AAACCAGAGATTATCAAGAGATTTATGGTAAATGGTGTCCCCCACCCATGTCCTCCATCT

59220

59221 GTTCAGTTCCTCCTCTCCCCAAATAAGTAAACACCTTTAGTTTTTTGTGTCCTTCAACCT

59280 55 59281 TCTTTATTCCCTTCCAAGTAAATATAGAGTTTTATTCCCAATCCTACCCCTCTTCACACA

59340

59341 AAAACTTGTTCTGCATCCTGCTTATTTCATTGAGCACGGTATTCTGAAGATCTTTTAATA

59400

59401 TCAGTTTGTAGAGAGCTTTCAGTTAGCCTCTTAATTTACCTTTTCCTCAACAGATCAAAA

59460

59461 CTGCATTTTTGTTAGTCATTTACACTTAAGTATAACGGCATTTCGGTGCCTTTGTAGCTG

59520 59521 AAAGAACTGCTGAAGCAGAAAGTTGACTGTGTGTCTGGTGATGCAGATGCTTGCTATCTG

59580

59581 AAGACCTTAAGTCTCCTTCACTTAGCTAAGAAAATGCTGTCTGGTTTCTTGCCTCTTTTA

59640

59641 TTTCTCTTTTTTCTGCCCTTCTCTCTTCTAACCTCCTTTGTTCTTTTGGCTTCTAAATGC 59700

59701 TAATTCTACTGTTCTTATAGCTAGGTTTTCTCTTTCCCACACCTTTCTGATAACCTAATA

59760

59761 TCCTCAAGTGGAGAGTCTGGGGGCATGGGAAAGAAAAGAGGGCGCTTTAGCCCTAGACCA

59820 59821 CTATGGTAGAAGCAGAGGAAGTTCAGGTGTAAGGACCCTAAAACTTTTGGACCTCAGTAA

59880

59881 GGACATAGACATTGAGAATTAAATTATATATTATTCTCATAATTCTTAGAATTTTGGACT

59940

59941 TATTTTTATGCCCTTCATCATATATTACCACTTGAGATCTAAGGACATGTTTTGGCAGCT 60000

60001 GATTTGCTTAGATATGGAAAGGTCTAGAACTCTTCTGTTCTGACAACACCTAGGTCTATC

60060

60061 TCTGGCATGTTTCTTTAATATCTGAATGATCTCTAATTTAGTTTGTGCAGATGTTATCCT

60120 60121 GGACATCAAAGATTTGCAAGGAAAACAGCCGGGGTAAGTTTACTGCCATGTTTTCCTAAA

60180

60181 GGCTTTATATAAAATCACTATCCTCCAGTGGCATTTGGAAAAGAAAAGGTATTCCCCATT

60240

60241 CATTACACATTCACCAGAGTGGCCAAAATGAAACAGACACCTAATACCAAGTCGAATTGT 60300

60301 TGGCAAGAATGTAGAGTAGTTGGACCTTTCATTCACTGGGGGTGGAGGTAGAAATTGGTA 60360

60361 TAACCACTATGGAAAACTGTTCAGCAATATCTACTAAAATTAAACATATGCCTATGACTC

60420 60421 AACAATTCGAATCACAACAGAAAAATGTGGAAGAATTCTCTTAGCAGCACTATTTTAGTA

60480

60481 GGTAAAACAAACCTAAGTGTTCATCAGCAGTAGAAAGGATAAAAGAATTATGATATATTC

60540

60541 ACACAGTGGAATCCTATACAGCAAATAAGCATGAACAAACTACATGTACACACAACTTGG 60600

60601 AGAGGATCTCACAAATGTACTGTAAAACAAAAGAAGCCAGACACAAGAAAGGGTACTTAC

60660

60661 TGTAGAATTCCGTTTAGATGAAGTTCAAAATTAGACAAAATTTATCTGTGATGTTAGAAG

60720 60721 TCAGGATAGCAGTTTCCTTGGCGGGGCAAGTTAGGAGGGAGTCCCTAGAATACTCGTAAT

60780

60781 GTACTTGATCTGGATGCTAGTTACACAGTAATCTTAACTAATGATGGGAAATGTCTGCCT

60840

60841 TCTTTGGGCATGTGGATAAAGAGGCATCATTGCAGACTAGCTGCTTGGCTGTAGTATGTG 60900

60901 TAACACTGGGATGTTGTGCCTGAGACACGACAAGCTACCTCCTGTTCTGGTTAGGCACTC

60960

60961 CACCCCACACACCCTGTACACATACCTGAGGGACCTGACTGGGTTATAACTTAACTTTTT 61020 61021 TTTTTTTTTTTGAGACAGAGTTTCACTCTTGTTGCCCAGGCAGGAATGCAATGGCACGGT 61080

61081 CTTGGCTGACCGCAGCCTCCGCCTCCTGGGTTCAAGCAATTCTCGTGCCTCAGCCTCCCA 61140

61141 AGTAGCTGGGATTACAGGCATGCGCCACCATGCCCGGCTAATTTTATATTTTTAGTAGAG

61200

61201 ACGGGGTTTCTCCATGTTGGTGAGGCTGGTCTTGAATTCCTGACCTCAGGTGATCCACCC

61260 61261 GCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACTGCACTTGGCCACAACTT

61320

61321 ACCTTTTAATTTGCTTTTCTTCCTATTCCTAGAGGATGCCTTGTTTTGCAAGAGCTTGAT

61380

61381 GAACTTGCTCTTCAGCCTGCATGTTTCGTATAAGAGTCCTGTCATTCTGCTGCGTGACTT 61440

61441 GTCCCAGGATATCCACGGGCATCTGGGAGATATAGACCAGGTACTATAATGAGCCTTCAG

61500

61501 TACAATACCCTGTGTGGGGATGGGGGTCAAGGGAACAACCTTACTGTCATTAGGTCTCAC

61560 61561 CTCTCTTCTTTTCCCCAGGATGTAGAGGTGGAGAAAACAAACCACTTTGCAATAGTGAAT

61620

61621 TTGAGAACGGCTGCCCCCACTGTCTGTGTAAGTGTTGTACCTGAGCCATGGGGAATAGCT

61680

61681 TTGTCATTCTTTCCATTCTACTCCATGTGAAGTGACTTGTGTCCTATAGCGGCTTGTGTC 61740

61741 CTGAGGCCTGTAACCCACCTGTAGGTGGCCAGGTGACCACAGTTATTATATTACCCCCAC

61800

61801 AGCTGTGTGTAGTGAATCACACCAGGCACTCTTTTCTTTCAGTTACTTGTTCTGAGTCAG

61860 61861 GCCGAGAAGGTTCTAGAAGAAGTGGACTGGCTAATCACCAAGCTTAAGGGACAAGTGAGC

61920

61921 CAAGAAACCTTATCAGGTAAGATAAGTTCAACTGGGATTCCAGGAATTGACATGAGCAAG

61980

61981 GTCAAAATTCATATTGGGTCGGTGCATAAAAATGGAGCCCCTGAGCTAAATTTCTTATTT 62040

62041 GCCTTTAAGCTAGATTTTTTCCTATGTGATGAAAGAAGTAGAAGATCATATGTCTTATCA

62100

62101 CAGCACGATTAATTCACTCTGCATATTGAATGTTCGTTTTTCCATAAAGACTTTCTAGTT

62160 62161 AGTAGTAATCAATTTCTAAATCTCCTTTACTCTTAATTAGAAAACGAAGTTGGGTTTGAA

62220

62221 TGTGCCAATTTTATTTCCCTTTAGAAGAGGCCTCTTCTCAGGCAACCCTACCAAATCAGC

62280

62281 CTGTTGAGAAAGCTATCATCATGCAACTGGGAACTCTGCTTACATTTTTCCACGAGCTGG 62340

62341 TGCAGACAGCTCTGCCATCAGGCAGCTGTGTGGACACCTTGTTAAAGGACTTGTGCAAAA

62400

62401 TGTACACCACACTTACAGCCCTTGTCAGATATGTGAGTATTTGAGACAAGCAGATTCGCC

62460 62461 CCACCATTCTACCCCAGTGAGCCAGGAGATGAGGATGGGGCAGCAGCCCACTGCTGCAGA

62520

62521 ATGCCCCTAAACAAGAGAGCATTTGCCTCCATCTTGGCAGGTCATGGTTTCATTTTAGTT

62580

62581 TGGTCAGTAGGTGGCAGGCATCTCCACAGGCCAGGGAAAAACAGTTTATAATTCTAATCT 62640

62641 GCTTTTGGGCCTGTCACGGTGGTTCATGCCTGTAATCCCAGCACTTTGGTAGGCTGAGAC

62700

62701 GGGCGGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGT

62760 62761 CTCTACTAAAAATACAAAAAGTAGCTGGGCGTGGTGGTGGGCACCTGTAATCCCAGCTAC

62820

62821 TCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCGGAGGTTGCAGTGAGCTGA

62880

62881 GATTGTGCCACTGTACTCCAGCGTTGGCAATAGAGTGAGACTCCATCTCAAAAATTAAAA

62940

62941 AAATAAAGTAAAAATCTGCTTTTGGAGGCCCTTAATTCTTTGGCATGTATGTGGCACCTG

63000 63001 ATTGAGAATGAATGCCTCTTTTAAATACCTTCTAAGGTTTAATCTAAACTATAACTATCC

63060

63061 TAGAATTAAATCTGGGCCCATAATGCATAGAAATTATACCATTTCTAACCAGTCTGGACA

63120

63121 CTTAAAGAGATGTTTATTCCTTTTTTCATTCCTTTAACATGTAGGTGTTAAATGCCTTAT 63180

63181 AGATCAAATACTAAGGTGGGGAAGAAAATTATTTTTTATAAAAATACAAGGCATACCCTG

63240

63241 CCCTTACTGTTTATGTTCCTAGCAATGATCCATATTAAATACTTAGAAAGGCACAGATAA

63300 63301 TCCAAGAAAGATGTGGACCACCCCGCATCACAGCAGTGTGGCTCTCTCAGAATGTGTAGC

63360

63361 TCTAGCTACATGTGGCTAAGCTGCTGTAGTGGAAATACTGAGAGCCTCTTCTCTTTATTG

63420

63421 GCATCTTTAAAAGGAAAAACAGCAAAGTTGTTGGTTTTTTACATGCAGGGACTACCAGCC 63480

63481 TGAGGAATTTTTAAAGAAATGAGCAAATCAGAAACAAACCCTTCATTACTTATTGTCTGG

63540

63541 GAAGCTTAGGAGCTTTGGAGGAGAGGAATAAGGAAGCTTGTGTCTTGGCTTCTCTGTGAA

63600 63601 TTTTGTTTTTCCTCTTCAGTCAGAATGATGAGTCACACTCCATAGGCTCACTGCAGCAGT

63660

63661 CACTTGTAGTTTCATAGGACAGTCTACTAAATCTAGGAATCTTTTTTTATTAGTATCTCC

63720

63721 AGGTGTGTCAGAGCTCCGGAGGAATTCCAAAAAATATGGAAAAGCTGGTGAGTTGAGAAT 63780

63781 GCCTTTCCTAGGAATGGGGGAAGCACTTTTACTGCTGGTTACATTGGTTTCCTTCTCCCT

63840

63841 TGTTGTGCAGGTGAAGCTGTCTGGTTCTCATCTGACCCCCCTGTGTTATTCTTTCATTTC

63900 63901 TTACGTACAGGTAAGAGATTCAGAGGCAGTACCCAATAGGTCTTCAAGAAAGGATTTCTT

63960

63961 ACATGCCCAGGGGACTATTGATCACCTGAACTGAGCATATATAAACTTTTTTCCCTGCTT

64020

64021 ACCACAAGCTGGAGGCACCCATCAGAGACCAAGCCAAGGCTTGAGGGCTTTGCAGCAAGG 64080

64081 ACATCTTTATGTAAAGAAAACTGTTTCACTTAATTTGGAGAAAGGAAGCATATATGTTTG

64140

64141 CATTTGGAGCAAGGAAGGAACAAGCATGCTCAGGCTCAAGGACTTGGTTTATTCCTGGGT

64200 64201 GACTGAAAGTCCATGCCATGAGGCATCCTCTGGGAATGTGAGCAGAGTAGCAGCAAGCAC

64260

64261 ATGAAGAAGGGTTGTGATGAGACGGCAGTGCAGTGAGGGCACTTTCAGGGAAGTGGAAGG

64320

64321 ACGTGGCTTTGTAAGTGTCATTTGGTCCTAAATACCATCAGATTCCTGCAGTACTTGAAT 64380

64381 CTCCTTTACTGTTTGCTTAAACCAGTTTTTTAACCTCACCACTATTGACATGTTGCGCTA

64440

64441 GATAATTCTTTGTTGTGGGACCTGTTCTGTGCATTTTGGGATGTTGAGTAGCATCCCTGG

64500 64501 CCTCCACCTACCAGATGCCAGTAGCAGCCCCCTAGTTTTAACAGCCAAAAATGTCTCCAG

64560

64561 AAATTGCCAGTTGTCCTTAAGGGACAGAATCACCCCTCGTTGGAAAGAGGAAGAAACGAG

64620

64621 AGATCTACTGTCTCCTAAGGCCACTTAGTCATGCAGTAAATGTATCACAGGGCTGTTACT 64680

64681 GATCTTTGAGGAATCAAAGAGAATGGGTCCACTAAGTGTCAATCTCAGCCAAGATTCTAG 64740 64741 AACCAATTATTAAAGCAATGGATCGTGACCACTTTGGTAGTGATGCTTCACTTGTTTTCT 64800

64801 GCCTCTGCATCACTTACAGGCACAATATCATTCTATCAGTAACTTGCCTCATTATGTGTA 64860

64861 TTCAGGGTGCCTCTTCCCCTCCCTTTAAGAATCAGCAATCAGCTGGGCACAGTGGCTCAT 64920

64921 GCCTGTAATTGCAGCACTTTGGGAGGCCGAGGCGGGTGGATCACCTGAGGTCAGGACTTC 64980

64981 AAGACCAGCCTGGCCAACATGGTGAAACCCCATCTCTACAAAAAATACAAAAAATTAGCT

65040 65041 GAGCATGCTGGCGGGCACCTGTAATCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATCGC 65100

65101 CTGAACCTGGAGGGCGGAGGTTGCAGTGAGCAAAGATCGCACTGTTGTGCTCCAGCCTAG 65160

65161 CAACAAGAGCGAAACTCCGTCTCAAAAAAAAAAAAAATAAAGAATCAGCCATCAAGTGGG 65220

65221 CTGTCATCTGGGTTTAGCTGAAGGAGTTCTTAGTAATGGGCTGATAGTTGAGACTCTCTG

65280

65281 TCACCTTGCCTTTGTATCAGTTTTGTGATCTGGGTCAGGAATGCATCATCCATTTTCTAT

65340 65341 ATCCAGCCATCCAGCCAGTAGTCAACTGTCATAAAGCTATTACTTCTGGTCCTCTCCTTT

65400

65401 TATCTCACCCAAATGCTTGCAGGAATGCTTTCTCAAACCTCATGCTTGTAAATTTTCATG

65460

65461 TGGGTGCCATTCGTCAAACAAATCTGTTGTCTATATATAGGATATATATCTGTCCATTCC 65520

65521 CATATTTGGGTTTTGAGTTTCCTCATCTCCAAGTCTTAATGATACCTATAAAGACACATT

65580

65581 TCACAAAATATGCTATATAGACTGAGACAGTCCTCAAAAAAAAAAGGAATTCAGGGAGAG

65640 65641 ATGCTCATACCGTATATCTCTCACCTTAGAGATTCGTGGTGTCCATTTGCATATTAAAGG

65700

65701 TGCTGAGACCAGTCATGGTGGTTAATGGAGGCTGAGGTAAGAAGATCACTTGAGGCCAGG

65760

65761 AGTTCAAGACCAGCCTGGGCAACATAGTGAGACCCCGTCTCTACAAAAAATTTAAAATTT 65820

65821 AAGCCAGGTGTTTGCCTATAGTCCCAGTTACTTGGGAGGCTGAGGCAGGAAGATCACTTG 65880

65881 AGCCTAGGACTTGAAGGCTGCAGTGAGCCATGATCATGCCACTGTACTCCAAGGCAACAG 65940 65941 CGTGAGACCCTGTCTCAAGAAAAATAAAAACAGTGCTGAGAGTTCCAACAGTATGTTCAT

66000

66001 TAGTAGTTTTCAGACCTATTCAAGAACATTTTTAAAAAAATAACAAATGTTACTATTTGG

66060

66061 CAGAATAATGTTCCATAGAGTACAGTTTGGGAAATACTGCTGTTAAATGCTATTTACTAC 66120

66121 CCTTGTTATGATTCAAAAGTCACATTGTCTAATAATTTTATTAGAACAGGTAATGCTTAG

66180

66181 AGAACCTCTACCTTGTGTCTGATGTGTTAAGCTCTTTTCATACATCATCTCATTCATCCT

66240 66241 AGCTATAGGTTATGATTAGTAGTTATCCCCATATGCATGATGAGAAAGGTTAATCAGAAT

66300

66301 GCCCAAGAACAGACCATGAGTAAAAGGCAGATCTGAGACTTGAACCCAGCTAGTTAGACT

66360

66361 CGATAGCAGGAATTGGCACACTTTTTCTGTAAAGGGCCAGATAGTAAATGTGGGTCATAC

66420

66421 AGTCTCTATCACACCTACTCAACTCTACTATTAGAACATAAAAGCAGCCATAGATAACTT

66480 66481 ACAAATAAACGAGCATGGCTATGTTCCAATGAAATTTTCTTATCAAAAACAGGCAGTGGG

66540

66541 CCAGATTTATACTATGGGCTATAGTCAGTCAACCATTGCTCTAGAGTCATACCTGGTACT

66600

66601 TGTTGACATGTAAAATGTTGGAAGGTTTTGATTTTCGTGTTTCTGACCCCATTTTCAAGC 66660

66661 ATTTTCTTCTGGGAATCACTACTACTTCCTAGACTGCTATTATTTTCCCCTCATTACACA

66720

66721 GTAATCAGTTTCATTTCCTCCCTAGGCATTTTCCTCCTCCTCCATCCATATTTGACTTAA

66780 66781 AGCCTTACTGACAAGTCAGCTGATCTGCAGGCAAACATTTTACATTTTTCTGTTTTTGTG

66840

66841 AGATGTATTTAATGAGTGCCCCCTGGGCAAGAGCCCATGCATGGTTCCATTAACTCCAGC

66900

66901 TCTTAACTAGAAATGGGGATTGCATTTTCCACACCTCTCAGTGAACTGTTAATTCTCTTG 66960

66961 TAGCCCCCTCTTCATTTCCCCAGTTCCTTAGATCTCAATGGGAAGAAAGCAACGAAAACC

67020

67021 ATTGCTGCAACTCTTAGCATCCTGCCCAGAACTGTATACCTTTAAAAGATGCTTCTCAAG

67080 67081 GGTCGGACATGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGAGAA

67140

67141 TCACAAGGTCTGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAATCCCGTCTCAACTAA

67200

67201 AGATACAAAAAATAGCAGGGCACGGTGGCTCACACCTGTAATCTCAGCATTTTGGGAGGC 67260

67261 TGAGGCAGGCGGATCACAAGGTCAGGAGTTCGAAACTAGCCCGGCCAACATGGTGAAACC

67320

67321 CCATTTCTACTAAAAAAAATACAAAAATTAGCTGGGCATGGTGGCAGGCGCCTGTAATCC

67380 67381 CAGCTACTCGGGAGGCTGAGGCAGGAGAATTGCTTGAACCCGGGAGGTGGAGGTTGCAGT

67440

67441 GAGATGAGATCACACCACTGCACTCCAGCCTGGGCAACAGAGCATGACTCTGTCTCAAAA

67500

67501 AAAAAAATTAGCCGGGCATGGTAGTGCGTGCCTGTAATCCCAGCTACTCGGGAAGCTGAG 67560

67561 GCAGGAGAATCGCTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCTGAGATCGCACCATTG

67620

67621 TACTCCAGCCTGGGCGACAGAGTGAAACTCCACCTCAAAAAAAGATGCTTCTCTTCTGGC

67680 67681 TCTAATTTTAGTCCTGCCCGCCCGGATCAGTTGGAATGGGCAGGGAACAGGGTTGATCTA

67740

67741 AGGAGTTCCAACAGGGCCCCATCTTGGACACAACCTCTGGGAAGATGGTTCACTTCCCAC

67800

67801 TTGGTCAGTGAGGTCGATCCGTGACTGCTTCCTTGCCCTCTTCTGCTGCCTTCACCCTCT 67860

67861 TCCATGCTCTACCCTGCCTTCCAGGGCCAAAAACAGATTACTGCATACCAGCCCCTCTGG

67920

67921 AATGTCTGCAGTTCCTTATTTTCCAGAACGTGGCATGTGGAACAAGTCTCCTGTTTCTCT

67980 67981 CATGGCTCCAGTTGCAGAGCCTCTCTGTATTTTCAGGTCTGTTTCTTTATTCTGTTGCCC

68040

68041 TCGTGGAGCATGGTATGTTCCCTTCCTAAGAAGGTTTAGAACTCCCCTTGAGGTAAAAAG

68100

68101 CTGCCTACATTTCTGTACCGTACCTTACATTTGGGTGGTTTTTCTCCCCAGTCTCTCCTG

68160

68161 ACTCAAGTCTTGATTTTCTTTTGCTTTATCAGAATCTTACACTTTTTGGTTCCTTTTATT

68220 68221 AGTTATTCATCCCTCTCCAGGAGCTCTTTATTACCCAATAATTTACAACACAGGGAAGTG

68280

68281 TTCTGTGACCCTTTTAACCCTTCTTGAGCATGATAACCCATTGCACATAAAATATATCTG

68340

68341 GTTTTGGCTTTGCTGAGTAACGAAAATAAAATATATCTGGTGATTGCCTGTAAGTGGGAA 68400

68401 GACTGCAGGGCCCTGCAGCAGCCTCCTCAGCTTCCTCTCCTTTATGGGATCCTTCTAGCC

68460

68461 CACTGGCTAAAATAGAAATAGAAATGAATATTCCTGCTACCACCTTGACAGTCCCCATTT

68520 68521 TAACCATTCTTTAGTACCTCACCAGCCCCTTGGCTGACCCACTGCAGTAATAGTCCTGCC

68580

68581 TTCAACTTCCATGTGCCAGCTCTAAGAACAACAGATTGTAAACCTCTCAACAACTAAAAA

68640

68641 AGAAAAAATATCTGAGGATTACAAAGCATCCATTTTTTCCCTATCTTCTAACATCTGTTG 68700

68701 CACAGATAGTCCTGTTTACTCCAGACTTGGTCCTCATACCCTGTGTAATTACCTTCTAAG

68760

68761 AAAATTGCCACTAGTTGGTGGGCATGGTGGCTTAGGCCTGTAATCCCAGCACTTTGGGAG 68820 68821 GCTGAGGCCGGCAGATCACCTGAGGCCAGGAGTTCGAGACCAGCCTGGCCAACATGGCAA 68880

68881 AACACCGTCTCTACTAAAAATACAAAAATTAGCTGAGTGTGGTGGCACGTGCCTGTAGTC 68940

68941 CCAGCTACTCGGGAGGCTGAGGCACAAGAATCACTTAACTCGGGGCGCAGAGGTTACAGT 69000

69001 GGGTCGACATCACGCCACTGCACACCAGCCTGGGTGACAGCAAAGCTCTGTCTTAAAAAA

69060

69061 AAAAAAAAAAAAATTAGCACTAGCATGCTAGCTCCACAGAAAAATCATCAGGAATAAGAG

69120 69121 AATGTGTTTCTATTTCTTTAGAATAAGAGTAAGAGCCTGAACTATACGGGAGAGAAAAAG

69180

69181 GAGAAACCTGCTGCCGTTGCCAGAGCCATGGTAAGCTGCTGACACTGACCGTTAAAATAT

69240

69241 GCTTGTTGGTGAACCCGAAAACATGAAGCGGAAGAAGCCAGTGACAAAAAGACCACGTGT 69300

69301 TGTATGATTCCATTTTCATGAAGTGCCCAGAATAAATCATCTAAAAAGGCAAACTAGATT

69360

69361 AGTGGTAGCGTAGAGCTAGGTGGGTTGGAAGATGGGGAGACTGGGGTGACAGCTAAAAAC

69420 69421 GGTATGGAATTTCTTTTGGGGGTGATGAAAATGTTTTAGAATTGACTGTGATGATGGTTG

69480

69481 CACAACTCTGCAAATACTGAAAAAAAAGTACATTGTGAGTGAATTGTATGGTATATGAAT

69540

69541 TACAGCTCTATCAAGCTGTTAGAAAAAATTGTGTGTATTATATACATGGCTGCTTCTCAA 69600

69601 CTCAGGGATTTTATGATCAGATGGAGCATCTTATTTTTCCTCTGAACCAAAAACATGTTT

69660

69661 TATTCACGAAGCACCAGAAAATCTTGAGTTGAAATGGAAATAGTAAAATAACTTACGATA

69720 69721 CTCAATTGGGAGAAATTATTTTTAGGTAAAGTTGGTGAATGCAAAAGGTGCTCCCTAAAA

69780

69781 GATGTGTAAGAGTCAGCGGGGGTACAGGAAAGGAGGCACTTAGTACATTGCTGGCAAAGG

69840

69841 TGAAAATTAGTACAGCCTCTCCAGAGGCTTAGCAATGACTACTGAATCCCATTCTGATGT

69900

69901 AGCAGTTTCACTTTTAGGATATACAGGCCTGTGTACAGGATGGTCCTCACAACATTATTT

69960 5 69961 ATAAAAGTGAAATATTAGAAACAAAGATATCCATTAATAGAGGACTGGCTAAATAGGTTA

70020

70021 TAATATATTTATACTGGAGAGATATGAAAAAAATAAACATCTAAATGTTCAGCTCTCTGG

70080

70081 AATTATCTCCAAGATAATGTTAAAAAAAATTTGTACAGTATCCTCATGGGTTAAAAATAC 10 70140

70141 ACACACACACACACACACACACACGTAGGACCTACAAATTACAATTCACGTTAGGGGGAA

70200

70201 AGATATATAATCACACACACACACACACACACACACACACAAATGTAGGACCTACAAATT

70260 15 70261 ACAGTTCACGTTAGGGGGAAAGATATATAATCACAGACACACACACACACACACAAATGT

70320

70321 AGGACCTACAAATTACAATTCACGTTAGGAGGAAAGACACACACACAAAAACGTAGGACC

70380

70381 ACACACAAAAATGTAGGACCTTTCGTTAGGGGGAAAGACACACACACACACACGTAGAAC 20 70440

70441 CTACAAATTACAATTCACGTTAGGGGGAAAGATATATAATCCCAGCAGTGAAAATTCATA

70500

70501 CCCAGACCAGAACAAGGAAATAGAGCTAGGGCCAAAATAGGCTTAAGAATTTCAAGTCAG 70560

25 70561 AAAAGCAACTAAATTTCTACAAGTGATCCTGCCCTCCTCAGCCATTTTCTAGATAACAAC 70620

70621 AAATACTGGAATTATAAATGAAATGGATCAGTGCAATTATCTCTACTGGCATTTGGGTCC

70680

70681 TTCCAGATTGAACTGCCAAAAATTTTAGATTACTGGTATACAACTGCATTTGATTGGGAG 30 70740

70741 ACAGACTAGTTTTCCCCTAAAGCCATACAGTTTTGTAAGTGACAGCTTCAGAGGAAAACT 70800

70801 TCAAAAACCATTGAACCTGAAATTTAAGTCTTATGTTCTTTGCCCTTAGGCCAGAGTTCT

70860 35 70861 TCGGGA2y^CCi^GCCαATCCeTαA^

70920

70921 C^TCCACCTTTCTi^GiyiLGTCCaAGGTAAACATTCTCTTATTATGTGCTACCATTCCCAT

70980

70981 TTACCTTCTTGACAGATCTGGCAAACTGAAGCAGCACAACTACAATGGAGGAAAAGAGAC 40 71040

71041 TCTGGGCCATTTGTGTGTTTGCCATCCCCCCTCTCCCCCCCCCCCCCTTTTTTTTTTTGA

71100

71101 GACTGTGTCTCATTCTGTCACCCAGGCTGGAGTGCACATTCAAGTGATCCTCTGGCCTCA

71160 45 71161 GCCTCCAGAGTAGCTGGGACTACAGTCATATGCCATCATACCCCACTAGTTTTTTGGGTT

71220

71221 TGTTTTTGTAGAGATGGGGTCTCACTATGCTGCCCAGGCTGGTCTCAAACTCCTGGCCTG

71280

71281 AAGTGATCCTTCCACCTTGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCACAC 50 71340

71341 TCAGCCTCTTTGCTGTTCTTGATCTGATGACCTGAACCCCAGCATACAGAAGGAAGTGGG

71400

71401 TGCGTGCTTGCTTTAGGTAGAAATGGAAAGGTTTCTCAGAGGTGAACTAAAAATGGCTTA

71460 55 71461 AGCTGTTCTGGCAGGACCTATGAGTAGGGAGATGTCCCATGCTTACAATCTTGTCATAGG

71520

71521 TGλαCCTGATGCAGCACATGAAGCTCAGCACCTCACGAGACTTCAaGATCAAAGGAαACA

71580

71581 TCCTAGACATGGTTCTTCGAGAGGATGGTGAAGATGAZkAATGSAGAGGTCAGTGCTGGCT

71640

71641 TCTGTCTGGAGCCCAGCCACTCTTCCTAGCTGGTTAGCAGCCTATCTGGGAGCAGTCACA

71700 5 71701 GATTAGTGGAAGGGCAACAGAATGGGAAAGGGCTAAAAGGTGATCAGGCAAAGATGAGAG

71760

71761 CAAAGGACTCTCAGAAGGGTTGTAAACTTTGGACCACACAAAATCCTATGAAGTTGCCCC

71820

71821 GACTTGTTTGATAAACACTTCCAGGAAGCAGGTGGTAACAAAACAATGTCCTGCTGTGGG 10 71880

71881 TCCTCTTCCTGTTAATCTTTTAAACTACTGGATGATGACCAGCTATAGCAGTCTGGCTCA

71940

71941 GAAACTAGTTTACCTCTCATGTCATGCCCTCATTCAAAGGCCACTCATTAGTTTGGGTTA

72000 15 72001 AGGCTTCTTCCTTCAATTCATGCCAATGTCTTGTCCTTTTCCCTTGTGGTTTGTTCCTCT

72060

72061 CCCTTTACTCTTAGCAAGTGACAGTACCGCATCCCCATCCCTGCACCACCTTTGAAACAG

72120

72121 AACCCAATGTTTCCTTACTTCTAGCACCTAAAATTTTCATCTTAGGTTTGTCTAATAAGT 20 72180

72181 AGCAAATCAGTAGCCCACTCCCATTTCAAAACTGGAAGTTTACATACAAGTGCCCTGGAT

72240

72241 TTCAGAATTTTGAACATGCATTCTTCAGATGATCACACTCCAGGCTCTAATGTACCTGGC

72300 25 72301 CCCATCCACTCTCTTTACCCACTTGGCAGCTGCACTCAGGAGAGTCATGTAGCATTTTGA

72360

72361 TGTCAATGGCTTCTTCTCTTGTCATTTTGTCCCATAAGTATCCTCAGTATAGCδAAAGTC

72420

72421 CTAAAAGGAGGTGCCACTGTCTTATTACTCTACACTTTGGAGGACTCCTTCCTCTTCTTC 30 72480

72481 AACAACATAATTACCTCCTGTGAGTGGGAAAGTGGGGAAAGCATGACTAACAAAGGCTTT

72540

72541 GATGAGAAGATAGAGTCTTTTTTTTCCTTCTTTTTATTTCCACTGCCTTGGAGCAGGTTT

72600 35 72601 ATCACGTTAGAGCATTAATTCTTTCCCCTTCTAGGGCACTGCATCAGAGCATGGGGGACA

72660

72661 GAACAAAGAACCAGCCAAGAAGAAAAGGAAAAAATAAATGAAATGCCTGAGTTAATGTGA

72720

72721 ACTTTGGGGCTTCTGCTTCATTTTTACCCAACAAGCAACAATGCCCCTTGTCCTGTAGTC 40 72780

72781 CACACCGATGTTGGCATCTTGGTTCTGAACCCACTGAATTCAACTGCACCTTCAGTTAGA

72840

72841 AGGAATCTTCTTGGCAGGTCCTGCTACTGAAAAATGGCTGGCCTTAGGCAAGCCCTTTTG

72900 45 72901 CAAAAAGCACAGCTGAAAGCCTGAGTTTGGGAGCCTGCACCACCCCGATGAAGCTCCACG

72960

72961 GGAGCAAATACAGAGCCTCCAGGCAGTGCTATGGTCCAGGCTGGCTTCGTTTTTCCAAGG

73020

73021 AGCCTTTGGTGAGTTCAATTATCTGGTAAATATCCAGCGCTTCACCTGAAAGATAGTGCA 50 73080

73081 AATTGGTTAGGATGCCACCTCAAGAACTGTAACTGAGAGCTCAGAAGTGAGCAAAGGAGC

73140

73141 TTAATGCTAAGGTCAAAAGGAGAGTGAAAGGTTGAGAACAATTGCCACGAACGGTAATGT

73200

55 73201 TACATGTTAGGAGGGTCTGTTTTCTTTTTATATAAGTGTGTCTTAGATATATTTTAAATA 73260 73261 GAAAATAAGCTTTCTGATTTACTTGTTTGGTATTTAAAGCACAGTTTGTTTTTCTGTCAC

73320

73321 CTATAGAGTGCAAGAATGCACTCTATAGAATAi^ATTATCTTTA2\ACATTTCTTCTGTGGT 73380

73381 TGAAGTAGGGGACAGGTACAGGTAGAATATTTGAAGCTCTGCTGCCTTCATTTCTGAAAC 73440

5 73441 ATCATATCACATTCACTCTGGACACAGGGCACCTTATAAACTGAAATTAGCCTAGAATAT 73500

73501 AGCCTGAGTCAAGAGTGGATTCTCTGGGGCCCCAAGTTTCCTGTTCTCCAAGACCCACTT 73560

73561 TCTAGTCCA 10 73569

Description:

Title: A gene mutated in Fanconi anemia complementation group I

The current invention relates to an isolated human genomic DNA molecule on chromosome 15 wherein said DNA molecule has a nucleotide sequence which sequence is mutated in complementation group I of Fanconi anemia (FA), the FANCI gene. It further relates to methods for determining a genetic defect in a patient, the defect being a mutation in the Fanconi anemia gene of complementation group I, or for complementing a genetic defect in an isolated cell, the defect being a mutation in the Fanconi anemia gene of complementation group I. Furthermore the invention relates to the use of the gene and fragments, cDNA and fragments, polypeptides and fragments, and antibodies against FANCI and siRNAS against FANCI, for diagnosis of FA, determining hereditary predisposition, tumor classification, for treatment or for drug development.

Fanconi anemia (FA) is a genetically heterogeneous chromosomal instability disorder with both autosomal and X-linked recessive inheritance, characterized by developmental abnormalities, retarded growth, bone marrow failure, and a high risk of cancer (Joenje, H. & Patel, K.J. Nat. Rev. Genet. 2, 446-457 (2001); Levitus, M. et al. Cell. Oncol. 28, 3-29 (2006); Taniguchi, T. & D'Andrea, A. D. Blood 107, 4223-4233 (2006)).

Cells from FA patients are hypersensitive to cross-linking agents such as mitomycin C (MMC) or diepoxybutane. Thirteen complementation groups are currently distinguished, all of which, except group I, have been linked to distinct disease genes (Levitus, M. et al. Cell. Oncol. 28, 3-29 (2006); Taniguchi, T. & D'Andrea, A. D. Blood 107, 4223-4233 (2006); Xia, B. et al. Nat. Genet. 39, 159-161 (2007)).

All FA proteins are thought to function in the FA pathway of genomic maintenance. Most of these proteins assemble into a multiprotein core complex which functions as an E3 ubiquitin ligase to modify FANCD2 by monoubiquitination (Garcia-Higuera, I. et al. Mol. Cell 7, 249-262 (2001)). FANCJ/BRIP1, FANCD1/BRCA2, and FANCN/PALB2 are thought to act downstream of this modification step, because FANCD2 ubiquitination appears normal in cells lacking these proteins. The ubiquitinated form of FANCD2 is called FANCD2-L, whereas the non-ubiquitinated form is called FANCD2-S.

Patient cell lines of complementation group I (FA-I cells) are deficient in FANCD2 ubiquitination and are characterized by a defect in the association of FANCD2 with chromatin (Levitus, M. et al. Cell. Oncol. 28, 3-29 (2006)).

Recently, a cDNA for the FA gene corresponding to complementation group C (FA-C) was cloned and located to position q22.3 on chromosome 9 (WO 93/22435), and genetic map positions of the FA-A and FA-D genes were reported. Such progress brings the possibility of DNA-based diagnosis and therapy for Fanconi anemia significantly closer.

However, the cause of the deficiency in FANCD2 ubiquitination and of the defect in the association of FANCD2 with chromatin in complementation group I has until now been unknown, which hampers further research and the development of new drugs that might be helpful in treating a wide variety of disorders.

It is therefore the object of the present invention to provide a human DNA gene sequence for the FA-I complementation group.

The identification, cloning and sequencing of such a DNA molecule should facilitate new and improved methods of diagnosis and treatment of Fanconi anemia, and also cancer.

The goal is achieved by the subject matter disclosed in the various claims and in the description and examples provided herewith.

In order to facilitate review of the various embodiments of the invention and an understanding of various embodiments and constituents used in making the invention, the following definition of terms is provided:

DNA: deoxyribonucleic acid. DNA is a polymer which comprises the genetic material of most living organisms (some viruses have genes comprising ribonucleic acid (RNA)). The repeating units in DNA polymers are four different nucleotides, each of which comprises one of the four bases, adenine, guanine, cytosine and thymine, bound to a deoxyribose sugar to which a phosphate group is attached. Triplets of nucleotides, referred to as codons, in DNA molecules code for amino acids in a polypeptide. The term codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.

cDNA (complementary DNA): a piece of DNA lacking internal, non-coding segments (see introns) and regulatory sequences which determine transcription. cDNA is synthesized in the laboratory by reverse transcription from messenger RNA extracted from cells.

FA: Fanconi anemia.

FA carrier or FA heterozygote: a person who does not exhibit apparent signs and symptoms of FA but whose chromosomes contain a mutant FA gene that may be transmitted to that person's offspring.

FA gene: a gene, the mutant forms of which are associated with the disease Fanconi anemia.

This definition is understood to include the various sequence polymorphisms that exist, wherein nucleotide substitutions in the gene sequence do not affect the essential functions of the gene product.

This term relates primarily to an isolated coding sequence but can also include some or all of the flanking 5' and 3' regulatory elements and/or intron sequences, and the so-called UTR regions (untranslated regions).

FA patient: a person who carries a mutant FA gene, such that the person exhibits clinical signs and/or symptoms of FA.

FA-I: Fanconi anemia of complementation group I.

FA-I carrier or FA-I heterozygote: a person who does not exhibit signs or symptoms of FA but whose chromosomes contain a mutant FA-I gene that may be transmitted to that person's offspring.

FA-I gene or FANCI: the gene, present in the human genome, mutant forms of which are associated with Fanconi anemia of complementation group I. This definition is understood to include the various sequence polymorphisms that exist, wherein nucleotide substitutions in the gene sequence do not affect the essential functions of the gene product. This term relates primarily to an isolated coding sequence, but can also include some or all of the flanking regulatory elements and/or intron sequences.

FA-I cDNA: a human cDNA molecule which, when transfected into FA-I cells, is able to complement the hypersensitivity of those cells to DNA crosslinking agents. The FA-I cDNA is derived by reverse transcription from the mRNA encoded by the FA-I gene and lacks internal noncoding segments present in the FA-I gene.

FA-I protein or polypeptide: the protein encoded by a human FA-I cDNA. This definition is understood to include the various sequence polymorphisms that exist, wherein amino acid substitutions in the protein sequence do not affect the essential functions of the protein.

Intron: The DNA base sequence interrupting the protein coding sequence of a gene; this sequence is transcribed into RNA but is cut out of the message before it is translated into protein.

Mutant FA-I gene: a mutant form of the FA-I gene which is associated with Fanconi anemia of complementation group I.

Mutant FA-I RNA: the RNA transcribed from a mutant FA-I gene.

Mutant FA-I protein: the protein encoded by a mutant FA-I gene.

ORF: open reading frame. Contains a series of nucleotide triplets (codons) coding for amino acids without any termination codons. These sequences are usually translatable into protein.

PCR: polymerase chain reaction. Describes a technique in which cycles of denaturation, annealing with primer, and then extension with DNA polymerase are used to amplify the number of copies of a target DNA sequence.

Purified: the term "purified" does not require absolute purity; rather, it is intended as a relative term. Thus, for example, a purified protein preparation is one in which the protein referred to is more pure than the protein in its natural environment within a cell.

siRNA: Abbreviation for small inhibitory RNA, a short sequence of RNA which can be used to silence gene expression.

Additional definitions of common terms in molecular biology may be found in Lewin, B. "Genes IV" published by Oxford University Press.
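The definitions of codon and ORF given above can be illustrated with a minimal sketch. It assumes the standard start codon ATG and the standard stop codons TAA, TAG and TGA; the function name `find_orfs`, the `min_codons` parameter and the toy sequence are hypothetical and do not come from the sequence listing.

```python
# Standard start and stop codons on the coding strand (DNA alphabet).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ORFs on the given strand.

    Each ORF begins at an ATG and runs, in steps of three nucleotides,
    up to but not including the first in-frame stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START:
                j = i
                # Walk codon by codon until a stop codon or the end.
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                    j += 3
                # Keep the ORF only if a stop codon actually terminates it.
                if j + 3 <= len(dna) and (j - i) // 3 >= min_codons:
                    orfs.append((i, j))
                i = j + 3
            else:
                i += 3
    return orfs

toy = "CCATGGCTTGGTAAGG"  # hypothetical toy sequence, not from the listing
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

The sketch scans only the three forward reading frames; a fuller search would repeat the scan on the complementary strand.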

In particular, there is provided an isolated human DNA molecule derived from chromosome 15 wherein said DNA molecule has a nucleotide sequence which sequence is mutated in FA complementation group I and in individuals predisposed to cancer.

As will be exemplified in the examples, it has surprisingly been found that chromosome 15 comprises at least one gene having a nucleotide sequence that is mutated in FA complementation group I. In other words, in a normal, healthy person the isolated DNA molecule does not contain such a mutation, whereas in FA complementation group I the mutation may be present.

In a preferred embodiment said mutation found in FA complementation group I contributes to the phenotype or molecular basis of FA.

According to another embodiment the DNA molecule is localized to locus 15q25-26. As will be exemplified in the examples, it has been found that in particular from the locus 15q25-26 DNA molecules can be isolated that can be advantageously utilized to solve the problem of the current invention. In other words, there is provided an isolated DNA molecule from locus 15q25-26 of the human genome, which molecule does not contain such a mutation in a normal, healthy person, whereas in FA complementation group I the mutation may be present.

In a preferred embodiment said mutation found in a DNA molecule isolated from locus 15q25-26 of DNA from FA complementation group I contributes to the phenotype or molecular basis of FA.

According to another embodiment of the invention, the DNA molecule contains a gene, wherein the (reference) gene has 38 exons with a translation start in exon 2, and encodes a 1328 amino acid protein with 3 nuclear localization and 3 ATM/ATR phosphorylation motifs. Other splice variants do exist and are contemplated.

The person skilled in the art knows what ATM/ATR phosphorylation motifs are, namely phosphorylation sites for the phosphatidylinositol 3-kinase-like kinases ataxia-telangiectasia mutated (ATM) and ATM- and Rad3-related (ATR).

It has surprisingly been found that in particular a DNA molecule that contains a gene, wherein the gene has 38 exons with a translation start in exon 2, and encodes a 1328 amino acid protein with 3 nuclear localization and 3 ATM/ATR phosphorylation motifs, provides a solution to the problem of the current invention. In other words, it was found that, in contrast to healthy humans or cells not showing any phenomena related to FA (in particular those observed in FA complementation group I), mutations may be present in this gene in FA complementation group I, and these mutations appear to contribute to the phenotype or molecular basis of FA.

It is contemplated that alternative splicing forms of the genomic gene are also comprehended by the current invention.

The skilled person knows that there are at least four known modes of alternative splicing:

Alternative selection of promoters: this is the only method of splicing which can produce an alternative N-terminus domain in proteins.

Alternative selection of cleavage/polyadenylation sites: this is the only method of splicing which can produce an alternative C-terminus domain in proteins. In this case, different sets of polyadenylation sites can be spliced with the other exons.

Intron retaining mode: in this case, instead of splicing out an intron, the intron is retained in the mRNA transcript. However, the intron must properly encode amino acids: its sequence must be expressible in frame, otherwise a stop codon or a shift in the reading frame is likely to render the protein nonfunctional.

Exon cassette mode: in this case, certain exons are spliced out to alter the sequence of amino acids in the expressed protein.

Said splicing variants of the gene are in particular contemplated/comprehended in the current invention.
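The exon cassette and intron retaining modes described above can be sketched as string operations on a transcript. This is a minimal illustration under stated assumptions: the helper `splice`, the toy transcript and its exon coordinates are hypothetical and are not taken from the FANCI gene.

```python
def splice(pre_mrna, exons, skip=()):
    """Join exon segments of a transcript, optionally leaving some out.

    exons: list of (start, end) half-open index pairs, in transcript order.
    skip:  0-based indices of exons to omit (exon cassette mode).
    """
    return "".join(
        pre_mrna[start:end]
        for idx, (start, end) in enumerate(exons)
        if idx not in skip
    )

# Hypothetical toy transcript: exons in upper case, introns in lower case.
pre = "ATGGCC" + "gtaaag" + "AAACCC" + "gtttag" + "GGGTGA"
exons = [(0, 6), (12, 18), (24, 30)]

full = splice(pre, exons)                    # all three exons joined
skipped = splice(pre, exons, skip={1})       # exon cassette: middle exon removed
retained = splice(pre, [(0, 18), (24, 30)])  # intron retention: first intron kept

print(full)
print(skipped)
print(retained)
```

In the toy example the skipped exon and the retained intron are each six nucleotides long, so the reading frame is preserved; a length that is not a multiple of three would shift the frame, as noted for the intron retaining mode.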

According to a next embodiment of the current invention, there is provided a DNA molecule according to any of the previous claims selected from the group consisting of

-a DNA molecule having the nucleotide sequence shown in SEQ ID No. 1, or the complementary strand of said nucleotide sequence;

-a DNA molecule that shows at least 80%, more preferable 90%, even more preferable 95%, most preferable 98% homology with the nucleotide sequence shown in SEQ ID No. 1, or the complementary strand of said nucleotide sequence.

It has surprisingly been found, as exemplified in the examples, that in particular said sequence, comprising at least several introns and exons, but also including the 3' and 5' UTR (untranslated region) and being/comprising the genomic DNA molecule that contains the FANCI gene and encodes the protein, provides for the solution of the problem of the current invention. Thus said genomic gene, from which for example a cDNA can be derived, is comprehended by the current invention. In other words, it has been found that in the sequence disclosed in SEQ ID No. 1 mutations can occur (both in exons and/or introns) that might be observed in FA complementation group I and in individuals predisposed to cancer. Also comprehended is a DNA molecule that shows at least 80%, more preferable 90%, even more preferable 95%, most preferable 98% homology with the nucleotide sequence shown in SEQ ID No. 1, or the complementary strand of said nucleotide sequence.

Having herein provided the nucleotide sequence of the genomic gene of FANCI (Seq Id No. 1), correspondingly provided are the complementary DNA strands, which for example can be used as basis for primers and the like, useful in the polymerase chain reaction or as hybridization probes. Such probes and primers are particularly useful in diagnosis of FA-I carriers and sufferers.
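As a minimal sketch of how a complementary strand can be derived from a given strand, the Python snippet below computes the reverse complement of a short DNA sequence and its G+C fraction. The function names `reverse_complement` and `gc_fraction` are assumptions made for this illustration and are not part of the disclosure; the example sequence is the primer listed as LOC55215.for01 in Example 5.

```python
# Illustrative sketch only; helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    primer = "CTTGTTGTTACGGGTAACGGAAG"  # LOC55215.for01 from Example 5
    print(reverse_complement(primer))
    print(round(gc_fraction(primer), 2))
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when designing probe and primer pairs.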

Having provided the isolated human FA-I genomic gene sequence, also comprehended by this invention is the (reference) cDNA derived therefrom and disclosed in SEQ ID No 2. The present invention also provides for the use of the FA-I cDNA and derivatives thereof, the corresponding genomic gene and derivatives thereof and of the FA-I protein and derivatives thereof, in aspects of diagnosis and treatment of FA-I.

More in particular, in another embodiment a DNA molecule according to the invention is provided, the DNA molecule being a cDNA molecule selected from the group consisting of: a. a DNA molecule having the nucleotide sequence shown in SEQ ID No. 2, or the complementary strand of said nucleotide sequence; b. a DNA molecule that shows at least 80%, more preferably 90%, even more preferably 95%, most preferably 98% homology with the nucleotide sequence shown in SEQ ID No. 2, or the complementary strand of said nucleotide sequence.

Said cDNA allows for efficient studying and utilization of the FANCI gene in the development of, for example, methods for diagnosis of FA, for counseling of subjects carrying FANCI mutations in cancer predisposed families, for tumor classification, as target for therapy, or as a lead for drug development. For example said cDNA can be introduced in a vector like a plasmid and brought to expression in bacteria, yeast and/or other organisms. Obviously it can also be used to produce protein/polypeptides, which for example can be further used for developing antibodies, or finding drugs that interact with the FA pathway, or in particular with the FA genes and/or proteins.

Also comprehended is a DNA molecule that shows at least 80%, more preferably 90%, even more preferably 95%, most preferably 98% homology with the nucleotide sequence shown in SEQ ID No. 1, or the complementary strand of said nucleotide sequence.

Having herein provided the nucleotide sequence of the reference cDNA of FANCI (Seq Id No. 2), correspondingly provided are the complementary DNA strands, which for example can be used as basis for primers and the like, useful in the polymerase chain reaction or as hybridization probes. Such probes and primers are particularly useful in diagnosis of FA-I carriers and sufferers.

DNA molecules which differ in minor ways from those disclosed. DNA molecules and nucleotide sequences which are derivatives of those specifically disclosed herein and which differ from those disclosed by the deletion, addition or substitution of nucleotides while still encoding a protein which possesses the functional characteristic of the FANCI protein are comprehended by this invention. Also within the scope of this invention are small DNA molecules which are derived from the disclosed DNA molecules. Such small DNA molecules include oligonucleotides suitable for use as hybridization probes or polymerase chain reaction (PCR) primers.

The degeneracy of the genetic code further widens the scope of the present invention as it enables major variations in the nucleotide sequence of a DNA molecule while maintaining the amino acid sequence of the encoded protein. Thus, the nucleotide sequence of the FANCI DNA could be changed at certain positions to any of those codons that encode the same amino acid, without affecting the amino acid composition of the encoded protein or the characteristics of the protein. Based upon the degeneracy of the genetic code, variant DNA molecules may be derived from the cDNA molecules disclosed herein using standard DNA mutagenesis techniques as described above, or by synthesis of DNA sequences.
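The effect of codon degeneracy can be illustrated with a minimal Python sketch that uses the standard genetic code. The short sequences below are hypothetical and are not taken from SEQ ID No. 1 or SEQ ID No. 2; the helper names `translate` and `synonymous_codons` are likewise assumptions made for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(codon):
    """List the other codons that encode the same amino acid."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)


if __name__ == "__main__":
    original = "ATGCTTGGA"  # hypothetical fragment
    variant = "ATGCTCGGG"   # two codons exchanged for synonymous ones
    print(translate(original), translate(variant))
    print(synonymous_codons("CTT"))
```

Both hypothetical fragments translate to the same peptide, although they differ at two nucleotide positions.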

According to another embodiment of the invention, there is provided an oligonucleotide comprising at least 150, preferably at least 100, more preferably at least 50, even more preferably at least 20, most preferably at least 15 consecutive nucleotides of a DNA molecule according to the invention.

As described above, such small DNA molecules can comprise at least a segment of the genomic gene (SEQ ID 1) molecule and, for the purposes of PCR, will comprise at least a 15 nucleotide sequence and, more preferably, a 15-20 nucleotide sequence of the genomic DNA. DNA molecules and nucleotide sequences which are derived from the disclosed DNA molecules as described above may also be defined as DNA sequences which hybridize under stringent conditions to the DNA sequences disclosed, or fragments thereof.

In another embodiment of the invention, there is provided a DNA molecule according to any of the previous claims, wherein a mutation has been introduced by means of insertion, deletion, and/or replacement of one or more nucleotides, and wherein said molecule encodes for a protein that, when introduced into cells from patients with Fanconi anemia of complementation group I, does not reduce the sensitivity of those cells to mitomycin C.

Such DNA molecule thus provides a DNA molecule that carries a mutation that contributes to the FA phenotype, in particular relates to the molecular mechanism of phenotype related to FA complementation group I. It will thus be understood by the skilled person that, for example, by random mutagenesis of DNA according to the invention and introduction thereof in cells from patients with FA, it can be easily studied whether said mutagenesis led to the formation of a DNA molecule and/or protein derived thereof that can either reduce the sensitivity to mitomycin C or not, and thus that might contain mutations that either contribute to FA or not, allowing to quickly identify such mutations. Mutations found in a manner like described above are therefore also clearly contemplated.

According to another embodiment, there is provided a DNA molecule according to the invention, wherein a mutation has been introduced chosen from the group consisting of partial or complete exon deletion, inserted exon, protein truncation, and amino acid substitution. In addition, mutations or modifications in the promoter region, for example leading to silencing of the promoter, and mutations in the 3' flank are also contemplated.

From intensive study it has been found that in particular these types of mutation might contribute to the FA phenotype (see examples), in particular to the FA complementation group I. Such mutations can thus provide information with respect to the FA phenotype, and are helpful in the study, diagnosis, treatment, and drug development, for example for FA, but also for other diseases influenced by said mutations.

In particular there is provided a DNA molecule, wherein at least one mutation or polymorphism selected from the group consisting of mutations c.2T>C, c.670-2A>G, c.3854G>A, c.3006+3A>G, c.3853C>T, c.3437_3455deletion, c.3895C>T, c.1264G>C, c.3350-88A>G, c.2572C>T, c.2509G>T, and c.2248T>G and polymorphism c.164C>T of the nucleotide sequence as shown in SEQ ID No. 2 is present. Mutation c.670-2A>G was found in a breast cancer family negative for BRCA1 and BRCA2 mutations.

It is to be understood that the established nomenclature for mutations is used herein, which nomenclature is e.g. also practiced and prescribed by the scientific journal Nature Genetics. Accordingly, 'c.' refers to cDNA sequences, and '+3' refers to the third nucleotide of the intron beyond an intron boundary. Accordingly, '-3' refers to a position in the intron, 3 nucleotides before the intron boundary. The c.670-2A>G mutation has been found by sequencing FANCI cDNA in 90 index patients that were screened for BRCA1/BRCA2 mutations and found negative, i.e. subjects predisposed for inherited breast cancer. FANCI therefore also seems to play a role in a variety of cancers, in addition to the above-identified, such as breast cancer and ovarian cancer.
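As an illustration of the nomenclature described above, the following minimal Python sketch splits the name of a substitution into its parts. The regular expression and the function name `parse_substitution` are assumptions made for this illustration only; the sketch covers simple substitutions and does not handle deletions such as c.3437_3455deletion.

```python
import re

# 'c.' prefix, cDNA position, optional intron offset (+n or -n), then REF>ALT.
SUBSTITUTION = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def parse_substitution(name):
    """Split a substitution name into position, intron offset and bases."""
    match = SUBSTITUTION.match(name)
    if match is None:
        raise ValueError("not a simple substitution: " + name)
    position, offset, ref, alt = match.groups()
    return {
        "position": int(position),
        # 0 means the change lies in the coding sequence itself.
        "intron_offset": int(offset) if offset else 0,
        "ref": ref,
        "alt": alt,
    }


if __name__ == "__main__":
    for name in ("c.2T>C", "c.670-2A>G", "c.3006+3A>G"):
        print(name, parse_substitution(name))
```

A negative offset denotes an intronic position before the named cDNA position, and a positive offset an intronic position after it.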

As will be exemplified in the examples, in a first assessment, in particular these mutations (and the consequent changes in the polypeptide) provide for mutants that contribute to FA, in particular FA complementation group I, or predispose to cancer in mono-allelic form. In other words, these mutations, when present in the DNA, can lead to a FA phenotype, in particular FA complementation group I phenotype, when both alleles are affected by mutation. Likewise, mutations in monoallelic form of the contemplated gene can increase cancer risk. As set out above, other mutants that can be identified as described above are also contemplated.

In another aspect of the invention, there is provided a polypeptide encoded by a DNA molecule according to the invention. Such proteins, either in mutant or normal form, are very useful in developing new drugs and antibodies, and in gaining understanding and knowledge with respect to diagnosis, treatment and the like of FA, in particular as shown in FA complementation group I patients. Antibodies, both monoclonal and polyclonal, and antibody fragments, specifically recognizing FANCI protein or mutated protein as identified herein on DNA level are also encompassed by the present invention.

With respect to mutant polypeptides, the following is noted: while the site for introducing an amino acid sequence variation is predetermined, the mutation per se need not be predetermined. For example, in order to optimize the performance of a mutation at a given site, random mutagenesis may be conducted at the target codon or region and the expressed protein variants screened for the optimal combination of desired activity. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence as described above are well known.

Amino acid substitutions are typically of single residues; insertions usually will be in the order of about from 1 to 10 amino acid residues; and deletions will usually range about from 1 to 30 residues.

Deletions or insertions preferably are made in adjacent pairs, i.e., a deletion of 2 residues or insertion of 2 residues. Substitutions, deletions, insertions or any combination thereof may be combined to arrive at a final construct.

In another aspect of the invention, there is provided a method for determining a genetic defect in a patient, the defect being a mutation in the Fanconi anemia gene of complementation group I, the method comprises determination of the sequence of the FANCI gene of said patient.

The invention further provides a method for determining whether a subject carries a mutant FANCI gene, comprising the steps of: a) providing a biological sample from the subject, which sample includes DNA and/or RNA, b) determining the sequence of the FANCI gene or FANCI mRNA, or a portion thereof, c) comparing the determined sequence with that of SEQ ID No 1 or SEQ ID No 2.

The biological sample can be any suitable sample from the (human) subject, as long as it contains representative DNA or RNA of the subject that can be sequenced by techniques known in the art, e.g. by PCR mediated sequencing. The FANCI sequence can be determined by using specific oligonucleotide probes. The sequenced FANCI sequence can be compared with that of wild-type, such as that of SEQ ID No 1. It is also possible to use the corresponding wild-type cDNA, e.g. as given in SEQ ID No 2. Any genetic difference is indicative of a mutation in the FANCI gene.
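The comparison step can be sketched in Python as follows. This is a minimal illustration that assumes the determined sequence has already been aligned to the reference and has the same length, so that only substitutions are reported; insertions and deletions would require an alignment step that is not shown. The function name `find_substitutions` and the example sequences are hypothetical.

```python
def find_substitutions(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(i + 1, r, s) for i, (r, s) in enumerate(pairs) if r != s]


if __name__ == "__main__":
    reference = "ATGGCC"  # hypothetical wild-type fragment
    sample = "ACGGCC"     # hypothetical patient fragment
    for position, ref_base, sample_base in find_substitutions(reference, sample):
        print("c.%d%s>%s" % (position, ref_base, sample_base))
```

An empty result means that no difference from the reference was found in the compared portion.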

It is to be noted that in case both alleles of the subject are mutated, any tumor formation is hereditary; in case of the presence of a single mutated gene (heterozygous), there is predisposition for tumor formation.

Preferably, step b) comprises determining the sequence of a portion of the FANCI gene encompassing one or more of the mutations as defined above, or a corresponding portion of the FANCI mRNA.

In another embodiment, the invention relates to a method for classification of tumors, comprising the steps of: a) providing a biological sample from the subject, which sample includes DNA and/or RNA, and/or protein, b) determining whether the FANCI gene is expressed at the RNA and/or protein level, c) determining the sequence of the FANCI gene or FANCI mRNA, or a portion thereof, and d) comparing the determined sequence with that of SEQ ID No 1 or SEQ ID No 2, and establishing the differences therebetween.

The skilled person will be aware of suitable methods to assess whether the FANCI gene is expressed. He can determine the presence of FANCI protein, e.g. by checking specific binding with an anti-FANCI antibody, or he can determine the presence of FANCI mRNA, e.g. by RT-PCR, a well established method in the art, or he can check the methylation condition of the promoter of the FANCI gene. If the promoter is methylated, it can be concluded that the gene is not expressed. If the gene is not expressed, or is expressed to a lower extent than under normal conditions, the tumor can be classified as a FANCI defective tumor. On the other hand, if the tumor does express FANCI, the next step is to establish whether the FANCI gene is mutated and to identify the said mutation(s), as explained above. The sequence of the FANCI gene can be determined before, after or simultaneously with the determination of the expression of the FANCI gene, i.e., steps b) and c) can be performed in any order, or simultaneously. The classification can be made based on the mutation(s) identified.
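The order of checks described above can be sketched as a small Python function. This is an illustrative assumption about how the outcome might be recorded, not a prescribed implementation; the function name `classify_tumor` and the returned labels are hypothetical.

```python
def classify_tumor(fanci_expressed, mutations):
    """Return a coarse label from expression status and identified mutations.

    fanci_expressed: False when FANCI is absent or clearly reduced.
    mutations: iterable of mutation names found by sequencing (may be empty).
    """
    if not fanci_expressed:
        return "FANCI defective (not or weakly expressed)"
    found = sorted(set(mutations))
    if found:
        return "FANCI expressed, mutated: " + ", ".join(found)
    return "FANCI expressed, no mutation identified"


if __name__ == "__main__":
    print(classify_tumor(False, []))
    print(classify_tumor(True, ["c.670-2A>G"]))
    print(classify_tumor(True, []))
```

Because expression and sequence can be determined in any order, both inputs are passed together and the function only fixes the order in which they are interpreted.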

In another embodiment, the invention relates to a method for drug testing, in particular for the suitability for use as an antitumorigenic drug in patients having a tumor wherein FANCI is inactive or less active, comprising the steps of: a) providing a biological sample of the patient comprising DNA and/or RNA and/or protein, b) determining whether the FANCI gene is expressed at the RNA and/or protein level, c) determining whether a FANCI mutation is present in the tumor DNA of the patient, d) identifying the said mutation, e) treating tumor cells from the patient ex vivo with an antitumorigenic drug, f) determining whether the antitumorigenic drug is capable of inhibiting growth of the tumor cells, and g) correlating the mutation with the effectivity of the antitumorigenic drug.

The determination of FANCI expression as well as determination of the presence and identification of a FANCI mutation can be established as explained above, wherein again both determinations can be performed in any order or simultaneously. Then, the tumor cells from the patient can be subjected to treatment with an antitumorigenic drug, such as a polyfunctional alkylating agent, preferably a bifunctional agent, most preferably cis-platin. However, any other possible antitumorigenic drug can be used to check whether it is suitable to treat the tumor. If the drug is capable of inhibiting the growth of the tumor cells, the drug is effective for the particular tumor. So, a correlation can be made between the effectivity of the drug and the particular mutation, which is helpful for future treatments of tumors wherein the same mutations are identified. In such a case, the effectivity of one or more antitumorigenic drugs is known, and based thereon, a suitable drug therapy can be designed.

The tumor is preferably chosen from the group consisting of FA, breast cancer, ovarian cancer, head-and-neck cancer, solid childhood cancer or squamous cell carcinoma. However, at least any tumor wherein the FANCI gene is possibly affected is subject of the present invention.

As described, the sequence shown in Seq ID No 1 contains the FANCI gene. An embodiment of the present invention is thus a method for screening a subject to determine if said subject carries a mutant FANCI gene. The method comprises the steps of: providing a biological sample obtained from the subject, which sample includes DNA or RNA, and providing an assay for detecting in the biological sample the presence of a mutant FANCI gene or a mutant FANCI mRNA. This assay preferably comprises either: hybridization with oligonucleotides; PCR amplification of the FANCI gene or a part thereof using oligonucleotide primers; RT-PCR amplification of the FANCI RNA or a part thereof using oligonucleotide primers; or direct sequencing of the FANCI gene of the subject's genome using oligonucleotide primers.

The efficiency of these molecular genetic methods should permit a more accurate and more rapid classification of FA patients than is possible with the labor intensive method of classical complementation analysis. All said approaches are to be understood to be included in the term "determination of the sequence of the FANCI gene of said patient". According to another embodiment the invention relates to a method of complementing a genetic defect in an isolated cell, the defect being a mutation in the Fanconi anemia gene of complementation group I, the method comprising introducing into the cell one or more DNA molecules according to the invention, said DNA molecules having no mutations that contribute to the FA phenotype, or more particular to the FA complementation group I phenotype, or the polypeptide derived from such DNA.

It is now for the first time possible to complement a genetic defect in the Fanconi anemia gene of complementation group I, by introducing a DNA molecule as disclosed herein.

According to another aspect of the invention, there is provided the use of a DNA molecule according to the invention, cDNA, siRNA or polypeptides according to the invention for methods of diagnosis, treatment or drug development, in particular for methods of diagnosis, treatment or drug development directed to the diagnosis, treatment or drug testing of FA, cancer, and bone marrow failure.

The following further underline the importance of the current finding:

Although the protein defective in the FA-I complementation group was unknown so far, a clue about its function and position within the FA pathway was deduced from the characteristics of FA-I cell lines. By focusing on the interactions of the FA proteins within these cell lines, it has become evident that in FA-I cells the core complex is properly formed. This indicates that FANCI, like FANCD2, FANCD1/BRCA2 and FANCJ, does not belong to this core complex and must function downstream or independently. However, FANCD2 is not or barely monoubiquitinated in FA-I cells, suggesting that its function is upstream of FANCD2 activation, but downstream of the FA core complex. FANCI may assist FANCL in the activation of FANCD2. The reduced presence of FANCD2-S in nuclear extracts of FA-I cells suggests a function for FANCI in binding FANCD2-S to the chromatin. As both forms of FANCD2 seem to associate with the chromatin, it is possible that FANCD2 is monoubiquitinated here. However, the faint presence of FANCD2-L in these extracts might point to a downstream function for FANCI, in binding of FANCD2 to the chromatin, but also in the stabilization of FANCD2-L (see Levitus, M. et al. Cell. Oncol. 28, 3-29 (2006)).

In addition, cancer results from alterations in the genome of somatic cells. These alterations can accumulate due to a genomic instability that is thought to exist in the premalignant phase of tumor development. This state of genomic instability is caused by a somatically acquired defect in a genomic maintenance mechanism. Many different maintenance mechanisms have been described. A mechanism that has recently been identified is the "Fanconi anemia pathway of genomic maintenance". This pathway is controlled by at least 13 different proteins, each of which is equally essential for the pathway to function. Twelve genes/proteins have already been identified: FANCA, FANCB, FANCC, FANCD1/BRCA2, FANCD2, FANCE, FANCF, FANCG, FANCJ/BRIP1, FANCL, FANCM, and FANCN/PALB2. A thirteenth, FANCI, has now been identified by us, which is subject of this claim.

Two important features are associated with the FA pathway (and with defects thereof), which lead to applicability in cancer therapy. First, cells with a FA pathway defect possess an unstable genome and as a result will accumulate genomic alterations at accelerated speed, leading to full-blown tumor cells with high probability. Second, cells with a FA pathway defect are highly sensitive to a specific class of chemotherapeutic agents known as "polyfunctional alkylating" or "cross-linking" agents (examples: cisplatinum, mitomycin C, cyclophosphamide). Tumors that have resulted from a FA defective premalignant cell are thus likely to be responsive to treatment with cross-linking agents. There is evidence for a proportion of common cancers to show such a FA pathway defect as a result of mutations in or silencing of one of the FA genes, for example by promoter methylation.

Applications of the current finding thus include:

- With the discovery of a new FA gene, FANCI, we are now in a position to classify tumors as being defective in this gene or not. In case a defect is found, this result may become an indication to choose a cross-linking agent for chemotherapy that most probably will be effective to cure the patient. Such crosslinking agents are preferably poly or bifunctional alkylating agents, most preferably bifunctional, such as cis-platin.

- A high-throughput screen for small molecule inhibitors of FANCI may result in the identification of agents that specifically block FANCI. Such agents, when specifically targeted to tumor cells, may result in substantial improvement of cure rates by cross-linking agents by sensitizing otherwise non-responsive cells.

- Molecular diagnostics for FA patients. When new FA patients are screened the current protocol will be extended to the DNA sequencing of 1) the FANCI cDNA and 2) the genomic FANCI sequence.

- Classification of tumors. Tumor samples can be analysed via immunohistochemistry or Western blotting using antibodies against FANCI. The levels of mRNA can be determined via techniques such as quantitative RT-PCR. DNA mutations can be analysed by techniques such as DNA sequencing, DNA copy number analysis of gene or exons, and LOH analysis. When these analyses point to defects, tumors may be susceptible to treatment with cross-linking reagents.

- Counseling of subjects carrying FANCI mutations in e.g. cancer predisposed families, in particular FA, breast, and ovary and orphan cancer. A distinction should be made between counseling of individuals carrying two mutations, thus being FA patients, and individuals with monoallelic mutations, who can be at increased risk for cancer. In the case of the mono-allelic mutations, counseling will take place using the guidelines for counseling individuals carrying low-risk cancer predisposition genes. In the case of the bi-allelic mutations, the patients are at high risk for cancer and will be counseled accordingly.

- FANCI as target for therapy, e.g. by sensitizing cells for drugs such as MMC and cisplatin. In-vitro systems have indicated that knocking down FA genes, including FANCI, via siRNA in human cancer cell lines, such as HeLa, increases the sensitivity towards polyfunctional alkylating agents. This feature can be exploited to identify small molecules specifically inhibiting FANCI in drug screening protocols.

- FANCI as target in other drug development schemes, such as synthetic lethality. The concept of synthetic lethality is defined as follows. A (partial) defect in gene A (herein FANCI) function on its own is not lethal. A (partial) defect in gene B function on its own is not lethal. Combined defects are lethal. Thus a deliberate disruption of FANCI (gene A) in a cancer cell with a specific defect to be identified (gene B), or a deliberate disruption of a to be identified gene (gene B) in a cancer cell with a defect in FANCI (gene A) may confer synthetic lethality. The identification of genes of group B is contemplated.
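The definition of synthetic lethality given in the last item can be written out as a small truth table. The Python sketch below is purely illustrative; it treats each defect as a yes/no property, which simplifies the partial defects mentioned in the text, and the function name `is_viable` is hypothetical.

```python
def is_viable(defect_in_a, defect_in_b):
    """A cell survives unless both gene A (here FANCI) and gene B are defective."""
    return not (defect_in_a and defect_in_b)


if __name__ == "__main__":
    print("defect A  defect B  viable")
    for defect_a in (False, True):
        for defect_b in (False, True):
            print(defect_a, defect_b, is_viable(defect_a, defect_b))
```

Only the combination of both defects is non-viable, which is why a tumor cell already carrying one defect may be selectively targeted by disrupting the other gene.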

EXAMPLES

The current invention is further disclosed by means of the following non-limiting examples. It is to be understood by the skilled person that conditions, chemicals and parameters described in the following examples are typical and can with respect to concentrations of chemicals, temperatures and other numerical values be varied within a range of approximately 30%, preferably 20%, more preferably 10% without falling outside the scope of the current invention. Indeed these conditions, chemicals and parameters used in the example and the range set-out above, are each and independently to be considered as preferred embodiments of the current invention.

Example 1. Patients, cell lines, and controls.

The 4 FA-I cell lines, EUFA592, EUFA816, BD952, and EUFA961, which were all hypersensitive to growth inhibition by mitomycin C, have previously been assigned to complementation group I (Levitus et al 2004). Following the same methods (see for example Dorsman et al) patients EUFA695 and EUFA1399 were subsequently classified as FA-I based on the lack of complementation after hybridization with FA-I lymphoblasts; hybrids were checked for ploidy to exclude lack of complementation due to loss of complementing chromosomes. All patients and families analyzed so far in this study are summarized in Figure 1.

Clinical features of the patients are summarized in Table 1. Control DNAs were isolated from blood samples obtained from The Netherlands Blood Transfusion Service; the donors were healthy and unselected for ethnic background.

Example 2. Genome-wide scan.

A genome-wide scan for genetic linkage was performed using the Applied Biosystems microsatellite polymorphism linkage mapping kit MD10 and the Weber 6B Screening set, in accordance with the manufacturer's protocols and performed with the GeneAmp PCR system 9700 (Applied Biosystems, Foster City, CA, USA). Combination of both sets, each of which is composed of approximately 400 markers (10 cM apart, on average), results in an average marker spacing of 5 cM. Samples were analyzed on an ABI 3730 DNA Analyzer (Applied Biosystems). Genomic DNA was isolated from whole blood or lymphoblastoid cell lines from patients and family members, using a Qiagen Blood mini kit (Qiagen, Venlo, The Netherlands). The genomic DNA of patient 480 was isolated from hair follicles using a 2 h incubation with proteinase K and a Qiagen Blood mini kit. Due to the lack of sufficient DNA, whole genome amplification was carried out on DNA from patients 480 and 1428, using the GenomiPhi DNA amplification kit (Amersham Biosciences, Buckinghamshire, UK).
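The marker spacing stated above follows from simple arithmetic, sketched below in Python. The sketch assumes that both marker sets cover the same genetic map, share no markers and interleave evenly, so that marker densities add up; the function name `combined_spacing` is hypothetical.

```python
def combined_spacing(spacings_cm):
    """Average spacing (cM) after pooling marker sets that cover the same map.

    Each set contributes a density of 1/spacing markers per cM; densities add.
    """
    density = sum(1.0 / spacing for spacing in spacings_cm)
    return 1.0 / density


if __name__ == "__main__":
    # Two sets, each with markers about 10 cM apart on average.
    print(combined_spacing([10.0, 10.0]))
```

With two sets of about 10 cM spacing each, the pooled density is 0.2 markers per cM, which corresponds to the average spacing of 5 cM given in the text.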

Example 3. Determination of candidate regions.

The initial genome-wide genetic linkage analysis with the patients EUFA592 and BD952 from consanguineous families 1 and 2 (Figure 1) and the multiplex family 4 yielded candidate regions on chromosomes 2, 4, 6, 7, 8, 15, 16, 17 and 18, which were further analyzed using patient 1428 from family 2, patient EUFA1355 from family 4, and family 3. The following potential candidate genes in these regions were sequenced: the kinase DBF4/ASK (on chromosome 7q21.3), for its role in replication initiation and S-phase progression; the putative E2 ubiquitin conjugating enzyme FLJ11011 (on chromosome 8q21.11), for its interaction with FANCD2 in Drosophila (FlyGrid); the aprataxin-like HIT domain containing hydrolase LOC390637 (on chromosome 15q26.1), for its putative role in DNA repair; the RING finger Nse1 (on chromosome 16p12.1), for its role in DNA damage response as part of the SMC5/6 complex; the RING finger RNF40 (on chromosome 16p11.2), for its putative function as E3 ubiquitin ligase; and the vitamin K epoxide reductase complex subunit 1 (VKORC1, on chromosome 16p11.2), for its presence in a cDNA expression library-'complemented' FA-I cell line.

In DBF4/ASK and VKORC1 heterozygous polymorphisms were found in the consanguineous patient EUFA592, which narrowed down the candidate regions in these parts of the chromosomes. Polymorphisms described in LOC390637 were found to be homozygous in both consanguineous patients BD952 and EUFA592, strengthening the idea that this was a candidate region.

Because of the degree of consanguinity in the parents (first cousins) relatively large candidate regions were to be expected in the single consanguineous patients. Relatively large regions in at least one of the consanguineous patients that were compatible with the additional families and family members helped to define the candidate regions.

Thus the regions on chromosomes 7q, 15q, 16q, and 17q were identified as best candidate regions for further analysis.

Example 4. Bioinformatics and data mining.

The positions of DNA markers were identified via NCBI map viewer (option STS) (http://www.ncbi.nlm.nih.gov/mapview/maps) followed by gene identification in the relevant regions (option Gene). In these regions, first known genes were selected and excluded (see Determination of candidate regions). For the novel genes, the following strategy was used. Proteins were firstly selected for which mouse proteins exist with a 50 to 85% identity with the human amino acid sequence (http://www.ncbi.nlm.nih.gov/sutils/blink.cgi). Pseudogenes were not further analysed. This yielded 11 proteins, 3 of which were excluded based on unlikely properties. The remaining 8 proteins (KIAA1794 [NP_060663], C15orf42/NP_689472, NP_064597.1, NP_859058.1, NP_001013679.2, NP_073581.1, XP_933746.1, and XP_934096) were subjected to a WoLF PSORT and NUCDISC search (wolfpsort.org) and those were selected for which the nucleus was the most likely location and which contained at least 1 putative nuclear localization signal (NLS: pat4, pat7 or bipartite): KIAA1794 [NP_060663], C15orf42/NP_689472, NP_064597.1 and NP_859058.1. These 4 genes/proteins - with a focus on the two highest rankers of the pSORT analysis - were compared for 1) degree of evolutionary conservation, 2) orphan status, 3) mRNA expression patterns in normal tissues (e.g. http://bioinfo2.weizmann.ac.il/cgi-bin/genenote/home_page.pl), 4) protein modification motifs (phosphorylation motifs: http://www.cbs.dtu.dk/services/NetPhos/), and 5) protein motifs/domains (http://smart.embl-heidelberg.de/).

KIAA1794 is an orphan protein displaying a similar conservation as FANCD2 (human versus mouse: ~75%; both genes are present in Drosophila) . It showed an expected expression pattern for a FA gene (low and ubiquitous, but relatively prevalent in bone marrow and thymus; same pattern also found for FANCM) and contained 3 ATM/ATR motifs. C15orf42 was less conserved in the mouse than KIAA1794 (68% versus 75%) , displayed a higher level of expression than usually found for FA genes and contained 1 ATM/ATR motif. Thus in total, KIAA1794 was considered the prime candidate.

Example 5. Amplification of FANCI sequences.

Primer sequences for amplification of FANCI cDNA and genomic DNA and PCR were:

Primer set 1 for cDNA/mRNA FANCI (NCBI site)

CTTGTTGTTACGGGTAACGGAAG LOC55215.for01
CCACAAATTCCACCTCTTCTGC LOC55215.rev01
GATCAGCAATATGTAATCCAACTCAC LOC55215.for02
GAATCCATCAAAATGAAACCAAG LOC55215.rev02
GTTCCTCATAGATCTTATGTTTCAACC LOC55215.for03
AATTGTAATGGCTGTGAACATCC LOC55215.rev03
GTTTTTGCTGCTCCTGAAGAAC LOC55215.for04
TAAGCTCAGAATGTCCTCAAACC LOC55215.rev04
TCTGTGCTTTTCTTGTGATGGG LOC55215.for05
TGCAGGCTGAAGAGCAAGTTC LOC55215.rev05
TTATCCTGGACATCAAAGATTTGC LOC55215.for06
AAAGATGAGGTTAGGGATTGGC LOC55215.rev06
GGAGAGAAAAAGGAGAAACCTGC LOC55215.for07
AAAAACGAAGCCAGCCTGG LOC55215.rev07

Primer set 2 based on FANCI/KIAA1794 consensus coding sequence (including exon 24)

CTTTTTGGAAGTTTGTGGCG cFANCI.for01
TTTTTCGTAGCCAGGGCAG cFANCI.rev01
CCATTTTCCAGGACCATTATTG cFANCI.for02
GGCACAGTGACAACATCCAATAG cFANCI.rev02
TCAAGAAATACCACCTTTGGTCTATC cFANCI.for03
TTCTTCACTACTTCCAAGATCATGG cFANCI.rev03
TCTGTCTGTAACAAGAATACAAAGATTTC cFANCI.for04
GAAGAACAACTTTGAAGAACTAAGGG cFANCI.rev04
CGGAGCTAATATCCTGTTGGAAAC cFANCI.for05
GGCAAAAAGTTTCATTGGCG cFANCI.rev05
GGGTTTTTGCTGCTCCTGAAG cFANCI.for06
GGTCTTCGTAGAATGCCTCTTCC cFANCI.rev06
TTGTATTCTGACCCAAGGAGATAAG cFANCI.for07
TTCATGGACAAAAGACTATCACTTG cFANCI.rev07
ATAATATCTGTGCTTTTCTTGTGATGG cFANCI.for08
CACTGAAGTAGGAATTGAAGTGTATCTC cFANCI.rev08
CCATGAAATTTGTGTCCAGTCTTC cFANCI.for09
TGGAATGCTGTTCTCTGAGTGAC cFANCI.rev09
ACAGTTCTATCAGCCCAAGATTCAG cFANCI.for10
CACTATTGCAAAGTGGTTTGTTTTC cFANCI.rev10
TGTTTCGTATAAGAGTCCTGTCATTC cFANCI.for11
CATATTTTTTGGAATTCCTCCG cFANCI.rev11
AGACAGCTCTGCCATCAGGC cFANCI.for12
GCAGAAGCCCCAAAGTTCAC cFANCI.rev12

Primer sequences of genomic DNA based on genomic sequence FANCI/KIAA1794

K1794_ex1.for CGCCCGCAGGTACCCG
K1794_ex1.rev ATTTAACAGAAGGGGCTCCGA
K1794_ex2.for GTTGAGCACCCATTGCATAAT
K1794_ex2.rev AGTAGATGAAGAAGCAGCGTA
K1794_ex3.for GCGTCTCTGAATGATCTGA
K1794_ex3.rev ACTGCTGCTGTAGTTGATTCT
K1794_ex4.for GCACTTTTTCAAAGCCCTTA
K1794_ex4.rev CTATGCTAGGTTGGGCACTTA
K1794_ex5n6.for TTGGATTTCTGCTTGAGATT
K1794_ex5n6.rev GACTTAAAACCTCCAGACATA
K1794_ex7.for CACCAACATTGGGTATCATCA
K1794_ex7.rev AGGAGTGAGGAGGGAATCTGT
K1794_ex8.for TTCTCTGCTCCCAAGTTTC
K1794_ex8.rev ATGATCCTTTTACCAGACCAA
K1794_ex9.for GCCACATTTTCTTTGCGGAGG
K1794_ex9.rev GAGGTGAAGGTAGGCGGTGAG
K1794_ex10.for GAATGCCACCACATCG
K1794_ex10.rev CCAGGGTCATTTTACAGTATT
K1794_ex11.for(a) CTAAGCCAGGAGGATCCG
K1794_ex11.rev(a) CCAGAGAACAGAGGGGACATA
K1794_ex11.for(b) AAATGACTTCCTTTTGGTTGC
K1794_ex11.rev(b) AGAAAGAAACAAAATGGCA
K1794_ex12.for TGAATGGCAGCTCATAGGAA
K1794_ex12.rev AAGAAAATGAACCTCGATGTG
K1794_ex13.for GCCTGCTGTTCCCATTAT
K1794_ex13.rev ATACTCAAATTTCCTTGTCCA
K1794_ex14.for CAAGGGGAAAGTTGAGTGA
K1794_ex14.rev TCCTTTATTGGCATCCTATCA
K1794_ex15.for ACATTTCTTTCTCCGTAT
K1794_ex15.rev TGCCTCTGTAAATCTATTGTA
K1794_ex16_1.for CCTAAGGCTAATAAGCAAAC
K1794_ex16_1.rev TGCTCCACCACCAACTACTGT
K1794_ex16_2.for GCAGGATAGAAGGAAGTAGAG
K1794_ex16_2.rev ATGCACATGAATGATACGC
K1794_ex17.for CATGTGGAGGAAGCTAAGT
K1794_ex17.rev GCTACTGGTGCTGACTACATC
K1794_ex18.for TTATATTAGGGCATTTAGG
K1794_ex18.rev CAGACAGGTAAGTGGTGACCG
K1794_ex19.for GTGTGTTAGAAGGCATTAGAC
K1794_ex19.rev TAGTAAAGGGCTAACAT
K1794_ex20.for TGCTGGTTATGAACAACTTTA
K1794_ex20.rev TGCCAATACTGGTGCTC
K1794_ex21.for CTGGAAGACTTTGAACTGGT
K1794_ex21.rev CTGGGAAATTCTGAGTAAGTG
K1794_ex22.for GCTGTGACCTGGGAGTATTA
K1794_ex22.rev TACTGTAAACAATCCAAACGA
K1794_ex23.for GGCAGCCATAGCAGTG
K1794_ex23.rev TCCTAAAATATGTCCAACCTC
K1794_ex24.for GGGAGATTACACAACCATGT
K1794_ex24.rev GTACCAATCCTGAATCCACTA
K1794_ex25.for TGGCTGGAAAATGGAATATAG
K1794_ex25.rev ATGTCCTGTTGCCTGTATGAC
K1794_ex26.for TGGGTACATAGTTCATTATCT
K1794_ex26.rev CACTTTTATGATGCAATAATC
K1794_ex27.for GGACCTCAGTAAGGACATAGA
K1794_ex27.rev TTGCCAACAATTCGAC
K1794_ex28n29.for CACTGCACTTGGCCACAACTT
K1794_ex28n29.rev CAAGCCGCTATAGGACAC
K1794_ex30.for GGCCAGGTGACCACAGTTATT
K1794_ex30.rev CACATTCAAACCCAACTTCGT
K1794_in30n31.for GAAGTGGACTGGCTAATCACC
K1794_in30n31.rev CTCGTGGAAAAATGTAAGCAG
K1794_ex31.for TCTGCATATTGAATGTTCGTT
K1794_ex31.rev TGCCTGCCACCTACTGAC
K1794_ex32n33.for GGCTCACTGCAGCAGTCACTT
K1794_ex32n33.rev GTCCCCTGGGCATGTA
K1794_ex34.for CACTAGCATGCTAGCTCCACA
K1794_ex34.rev CCATACCGTTTTTAGCTGTC
K1794_ex35.for CCTCAGCCATTTTCTAGATA
K1794_ex35.rev AGTCTCTTTTCCTCCATTGTA
K1794_ex36.for GCGTGCTTGCTTTAGGTAGA
K1794_ex36.rev GACCCACAGCAGGACATTGTT
K1794_ex37.for TGGAGCAGGTTTATCACGTTA
K1794_ex37.rev CTTGGAAAAACGAAGCC

Additional primers for exon24 FANCI

FANCI_ex24.for.a GAGAGTCTGCCAGTCGGAAC
FANCI_ex24.rev.a CTGCCACCTGGCTAATGTTT
FANCI_ex24.for.b GTCGGAACTTACTGGCAAGC
FANCI_ex24.rev.b GAGAGTCTTGCCCTGTCACC

PCRs were performed under conditions known to the skilled person, using standard methodology.

Total RNA was isolated from lymphoblasts (after 4.5 hours of cycloheximide (5 mg/ml) treatment for EUFA592, BD952, EUFA816, EUFA961, and EUFA1399) using the High Pure RNA Isolation Kit (Roche Applied Science, Almere, The Netherlands), from which cDNA was prepared using the iScript™ cDNA Synthesis Kit (BioRad Laboratories, Veenendaal, The Netherlands). The PCR reactions for amplification of FANCI were performed on cDNA using Platinum Taq polymerase (Invitrogen), and the products were sequenced as described below.

Example 6. Sequencing of FANCI.

PCR products were purified using a SAP/EXO treatment (Amersham Biosciences, Uppsala, Sweden) according to the manufacturer's instructions. Sequencing reactions were prepared using specific primers and the Big Dye terminator cycle sequencing kit (Applied Biosystems, Foster City, CA, USA). Samples were analyzed on an ABI 3730 DNA Analyzer (Applied Biosystems).

Example 7. A gene mutated in Fanconi anemia complementation group I

Using the methods described in the above-mentioned Examples, a genome-wide linkage study involving 4 genetically informative families, including two first-cousin marriages (see Fig. 1 and Table 1), resulted in 4 candidate regions that were considered to harbour the gene: on chromosome 7q between markers D7S2204 and D7S820 (5.6 Mb, 8.6 cM, 12 genes), on 15q between D15S653 and D15S652 (7.1 Mb, 10.5 cM, 79 genes), on 16q between VKORC1 and D16S3105 (14.4 Mb, 1.5 cM, 102 genes), and on 17q between D17S1290 and D17S2059 (12.3 Mb, 15.3 cM, 158 genes), together encompassing 39.4 Mb of genomic DNA and 351 genes.
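The stated totals can be checked against the four per-region figures. The following is a minimal sketch that only re-adds the numbers quoted in the paragraph above; the variable names are illustrative and nothing beyond the quoted values is assumed.

```python
# Candidate regions exactly as stated in the text: (region, Mb, cM, genes).
regions = [
    ("7q", 5.6, 8.6, 12),
    ("15q", 7.1, 10.5, 79),
    ("16q", 14.4, 1.5, 102),
    ("17q", 12.3, 15.3, 158),
]

# Round the physical size to one decimal to avoid floating-point noise.
total_mb = round(sum(mb for _, mb, _, _ in regions), 1)
total_genes = sum(genes for _, _, _, genes in regions)

print(total_mb, total_genes)
```

Both sums agree with the totals given in the text (39.4 Mb and 351 genes).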

Next, we identified in those regions known genes connected with DNA repair/chromatin and with cellular roles compatible with the FA cellular phenotype, as described above. After excluding those genes by DNA sequencing, we selected novel genes via data mining and bioinformatics incorporating known features of already identified FA genes/proteins.

It has been proposed that some human FA genes encode orphan proteins, whose mouse orthologs display 50 to ~80% amino acid identity. We first selected genes according to evolutionary conservation, which resulted in 8 candidates. We next selected on the basis of predicted nuclear localization. This screen resulted in 4 candidates, the two highest ranking being KIAA1794/NP_060663 and C15orf42/NP_689472, both on chromosome 15.

KIAA1794 was considered the prime candidate based on its orphan status, tissue distribution, and the presence of multiple ATR/ATM motifs.

KIAA1794, which is localized to 15q25-26, has 38 exons with a translation start in exon 2, encoding a 1328 amino acid protein with 3 nuclear localization signals and 3 ATM/ATR phosphorylation motifs (see Seq Id No 2).

Example 8. Mutations in a gene mutated in Fanconi anemia complementation group I

Using the methods and results described in the above-mentioned Examples, sequence analysis of this new gene in 8 FA individuals assigned to complementation group FA-I revealed mutations in all these affected individuals (Table 2).

Mutations appeared homozygous in the patients from consanguineous marriages.

Patient BD952 was homozygous for two missense mutations, c.164C>T (in exon 4) and c.3854G>A (in exon 36), resulting in a Proline to Leucine substitution at position 55 and an Arginine to Glutamine substitution at position 1285, respectively (Table 2). We tested 96 healthy individuals for the occurrence of these variants and found c.164C>T heterozygously present in 9 individuals, whereas c.3854G>A was not detected. This indicates that c.164C>T is a polymorphism, whereas c.3854G>A is thought to represent a pathogenic mutation. The latter mutation creates an additional ATM/ATR phosphorylation motif, which might disturb the protein's proper regulatory response.
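The relation between the coding-sequence positions and the amino acid positions quoted above, and the allele frequency implied by the control panel, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, c. numbering is assumed to start at the A of the ATG start codon, and the control panel is assumed to contain no homozygous carriers, since only heterozygous carriers are reported.

```python
def codon_number(cdna_pos: int) -> int:
    """1-based codon index for a 1-based coding-sequence (c.) position."""
    if cdna_pos < 1:
        raise ValueError("coding position must be >= 1")
    return (cdna_pos + 2) // 3


def allele_frequency(heterozygotes: int, homozygotes: int, individuals: int) -> float:
    """Variant allele frequency in a panel of diploid individuals."""
    return (heterozygotes + 2 * homozygotes) / (2 * individuals)


if __name__ == "__main__":
    print(codon_number(164))    # codon hit by c.164C>T
    print(codon_number(3854))   # codon hit by c.3854G>A
    # 9 heterozygous carriers among 96 controls, no homozygotes assumed
    print(allele_frequency(9, 0, 96))
```

Under these assumptions, c.164 falls in codon 55 and c.3854 in codon 1285, matching the positions given in the text, and the c.164C>T allele accounts for 9 of 192 control chromosomes (about 4.7%), which is consistent with its classification as a polymorphism.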

In the affected individual from the second consanguineous family, patient EUFA592, a homozygous mutation c.2T>C was found, which eliminates the translation initiation site of the gene; the unaffected sib was homozygous for the normal allele. This mutation was not detected in the panel of 96 healthy control individuals.

This also applied to the additional mutations encountered in the remaining affected individuals, which included three (partial) exon deletions, one inserted exon, three protein truncations, and one amino acid substitution (Table 2).

Example 9. Phenotypic Reversion

In EUFA816 the maternal allele contained a premature stop (c.3853C>T) in exon 37, whereas the paternal allele carried a mutation (c.3350-88A>G) in intron 31 resulting in aberrant splicing. From this individual a lymphoblastoid subline had been obtained that was phenotypically reverted to MMC resistance and had regained the capacity to monoubiquitinate FANCD2 (Fig. 2a and b).

When investigating cDNA-amplified fragments from MMC-sensitive EUFA816 cells, we noted that amplification of the sequence encompassing exons 31 and 32 generated an additional larger fragment, which appeared weaker in the reverted cells (Fig. 2c). Sequencing of the cDNA from the reverted cells showed partial restoration of normal splicing from one of the alleles and the presence of an additional mutation in intron 31 (c.3349+97T>G), which reduced the splice acceptor score for the aberrant exon (Fig. 2d). This indicated that phenotypic reversion was associated with a secondary DNA alteration at the locus under study, thus confirming the identity of KIAA1794 as the disease gene for this individual. Taken together with the mutational data presented for the other FA-I individuals, we conclude that KIAA1794 is the disease-causing gene in FA complementation group I, FANCI.

A striking feature of FA-I cells is their apparent deficiency in the association of FANCD2 with chromatin. FANCI possesses several strong SQD/SQE motifs for ATM- or ATR-induced phosphorylation in its C-terminal domain, a feature that suggests a role in a DNA damage response. Interestingly, the splice site mutation in patient EUFA695 results in an in-frame deletion of exon 27, encoding one of the SQE motifs, while the missense mutation in BD952 creates an additional SQD motif. FANCI could thus be a signal-regulated localizer of FANCD2.
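A scan for SQD/SQE tripeptides of the kind discussed above can be sketched with a simple pattern search. The function name and the toy peptide below are hypothetical and serve only to illustrate the idea; the peptide is not a fragment of FANCI, and a plain tripeptide match is a much cruder criterion than a dedicated phosphorylation-site predictor.

```python
import re


def find_sq_motifs(protein: str):
    """Return (1-based position of the serine, motif) for each SQD or SQE."""
    # A lookahead is used so that every start position is examined.
    return [
        (match.start() + 1, match.group(1))
        for match in re.finditer(r"(?=(SQ[DE]))", protein.upper())
    ]


if __name__ == "__main__":
    toy_peptide = "MASQEKLLSQDTTSQA"  # made-up sequence for illustration
    for position, motif in find_sq_motifs(toy_peptide):
        print(position, motif)
```

On the toy peptide this reports SQE at position 3 and SQD at position 9, while the trailing SQA is ignored because it lacks the acidic residue.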