

Title:
WATERMELON WITH PALE MICROSEEDS
Document Type and Number:
WIPO Patent Application WO/2021/105408
Kind Code:
A1
Abstract:
The present invention relates to a modified watermelon PPO gene, the wild type of which is identified as SEQ ID NO: 1, encoding the protein of SEQ ID NO: 5, or the wild type of which encodes a protein that has at least 90% sequence identity to SEQ ID NO: 5, wherein the modified PPO gene comprises one or more nucleotides replaced, inserted and/or deleted relative to the wild type, and wherein said one or more replaced, inserted and/or deleted nucleotides result in an absence of functional PPO protein. The present invention further relates to a watermelon plant comprising the modified PPO gene, wherein the homozygous presence of the modified PPO gene confers a pale seed color to the plant. The present invention also relates to methods for selecting, producing or the use of the watermelon plant of the invention.

Inventors:
VAN HOF RUDOLPH ADRIANUS (NL)
SANCHEZ SILVA GUADALUPE (NL)
SARRIA VILLADA EMILIO (NL)
Application Number:
PCT/EP2020/083706
Publication Date:
June 03, 2021
Filing Date:
November 27, 2020
Assignee:
RIJK ZWAAN ZAADTEELT EN ZAADHANDEL BV (NL)
International Classes:
A01H5/02; A01H5/08; A01H5/10; A01H6/34; C07K14/415; C12N9/02
Foreign References:
CN109722486A 2019-05-07
US20140020127A1 2014-01-16
KR20170077082A 2017-07-05
KR20170045848A 2017-04-28
Other References:
LUCKY PAUDEL ET AL: "Chromosomal Locations and Interactions of Four Loci Associated With Seed Coat Color in Watermelon", FRONTIERS IN PLANT SCIENCE, vol. 10, 1 January 2019 (2019-01-01), CH, XP055681310, ISSN: 1664-462X, DOI: 10.3389/fpls.2019.00788
CHAO CHEN ET AL: "Fine-mapping and candidate gene analysis of BLACK HULL1 in rice (Oryza sativa L.)", PLANT OMICS, 1 January 2014 (2014-01-01), pages 12 - 18, XP055681461, Retrieved from the Internet [retrieved on 20200331]
DATABASE UniProt [online] 31 July 2019 (2019-07-31), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:PLY85797.1};", XP002798563, retrieved from EBI accession no. UNIPROT:A0A2J6LEP1 Database accession no. A0A2J6LEP1
GUO ET AL.: "The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions", NATURE GENETICS, vol. 45, no. 1, 2013, pages 51 - 58
ROCHA ET AL.: "Characterisation of 'Starking' apple polyphenoloxidase", JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE, vol. 77, no. 4, 1998, pages 527 - 534
CHEN ET AL.: "Fine-mapping and candidate gene analysis of BLACK HULL1 in rice (Oryza sativa L.)", PLANT OMICS, vol. 7, no. 1, 2014, pages 12 - 18
LIAO ET AL.: "Arabidopsis HOOKLESS1 Regulates Responses to Pathogens and Abscisic Acid through Interaction with MED18 and Acetylation of WRKY33 and ABI5 Chromatin", THE PLANT CELL, vol. 28, no. 7, 2016, pages 1662 - 1681
URNOV ET AL., NAT. REV. GENET., vol. 11, 2010, pages 636 - 46
CARROLL, GENETICS, vol. 188, 2011, pages 773 - 82
JINEK ET AL., SCIENCE, vol. 337, 2012, pages 816 - 821
CHO ET AL., NAT. BIOTECHNOL., vol. 31, 2013, pages 230 - 232
MALI ET AL., SCIENCE, vol. 339, 2013, pages 823 - 826
FENG ET AL., CELL RES, vol. 23, 2013, pages 1229 - 1232
WOO ET AL., NAT. BIOTECH., vol. 33, 2015, pages 1162 - 1164
MCCALLUM, NATURE BIOTECHNOLOGY, vol. 18, 2000, pages 455 - 457
Attorney, Agent or Firm:
VAN SOMEREN, Petronella Francisca Hendrika Maria (NL)
Claims:
CLAIMS

1. A modified watermelon POLYPHENOL OXIDASE (PPO) gene, the wild type of which is identified as SEQ ID NO: 1, encoding the protein of SEQ ID NO: 5, or the wild type of which encodes a protein that has at least 90% sequence identity to SEQ ID NO: 5, wherein the modified PPO gene comprises one or more nucleotides replaced, inserted and/or deleted relative to the wild type, and wherein said one or more replaced, inserted and/or deleted nucleotides result in an absence of functional PPO protein.

2. The modified PPO gene as claimed in claim 1, wherein the modified PPO gene confers a pale seed color to a watermelon plant when present homozygously.

3. The modified PPO gene as claimed in claim 1 or 2, wherein the modified PPO gene, which when homozygously present in a watermelon plant causes the production of pale seeds, comprises a premature stop codon that leads to an absence of functional PPO protein.

4. The modified PPO gene as claimed in any of the claims 1 to 3 wherein one or more nucleotides are replaced, inserted and/or deleted relative to the wild type gene at position 1 to 712 of SEQ ID NO: 1 resulting in a premature stop codon, which modified PPO gene when homozygously present in a watermelon plant causes the production of pale seeds.

5. The modified PPO gene as claimed in claim 4 wherein the modified PPO gene comprises an insertion of a T between nucleotides 711 and 712 of SEQ ID NO: 1.

6. A watermelon plant comprising the modified PPO gene as claimed in any of the claims 1 to 5, wherein the homozygous presence of the modified PPO gene causes the production of pale seeds.

7. The watermelon plant as claimed in claim 6, wherein the modified PPO gene that confers a pale seed color to the plant when present homozygously is as comprised in the genome of a Citrullus lanatus var. lanatus plant representative seed of which was deposited under accession number NCIMB 43364.

8. The watermelon plant as claimed in claim 6 or 7, wherein the modified PPO gene is homozygously present and the plant produces seeds with a pale seed color.

9. The watermelon plant as claimed in any of the claims 6 to 8, wherein the plant comprises a non-functional HLS1 gene, the wild type of which HLS1 gene is identified as SEQ ID NO: 7 encoding the protein of SEQ ID NO: 9, or the wild type of which HLS1 gene encodes a protein that has at least 90% sequence identity to SEQ ID NO: 9, and/or the plant comprises a non-functional BAG4 gene, the wild type of which BAG4 gene is identified as SEQ ID NO: 10 encoding the protein of SEQ ID NO: 12, or the wild type of which BAG4 gene encodes a protein that has at least 90% sequence identity to SEQ ID NO: 12, wherein the absence of functional HLS1 protein and/or the absence of functional BAG4 protein confers a microseed size to the plant.

10. The watermelon plant as claimed in claim 9, wherein the non-functional HLS1 gene comprises one or more nucleotides replaced, inserted and/or deleted relative to the wild type, and wherein said one or more replaced, inserted and/or deleted nucleotides result in an absence of functional HLS1 protein, and/or wherein the non-functional BAG4 gene comprises one or more nucleotides replaced, inserted and/or deleted relative to the wild type, and wherein said one or more replaced, inserted and/or deleted nucleotides result in an absence of functional BAG4 protein.

11. The watermelon plant as claimed in claim 9, wherein the HLS1 gene and/or the BAG4 gene is non-functional because it is absent from the genome.

12. The watermelon plant as claimed in any of the claims 9 to 11, wherein the non-functional HLS1 gene is homozygously present and/or the non-functional BAG4 gene is homozygously present or the HLS1 gene and/or the BAG4 gene are homozygously absent resulting in the plant producing seeds with a microseed size.

13. The watermelon plant as claimed in any of the claims 6 to 12, wherein the plant comprises a deletion on chromosome 2 corresponding to 13962 bp being deleted between base pair position 29902114 and 29916077 on the Citrullus lanatus 97103_v1 genome, wherein said deletion confers a microseed size to the plant when present homozygously.

14. The watermelon plant as claimed in any of the claims 6 to 13, wherein the plant comprises a deletion on chromosome 2, wherein said deletion is as comprised in the genome of a Citrullus lanatus var. lanatus plant representative seed of which was deposited under accession number NCIMB 43364, and wherein said deletion confers a microseed size to the plant when present homozygously.

15. The watermelon plant as claimed in claim 13 or 14, wherein the deletion is present homozygously and the plant produces seeds with a microseed size.

16. A watermelon seed, comprising the modified PPO gene as claimed in any of the claims 1 to 5, wherein the plant grown from said seed produces seeds with a pale seed color as a result of the homozygous presence of the modified PPO gene, and optionally further comprising the non-functional HLS1 gene as in any of the claims 9 to 15, and/or further comprising the non-functional BAG4 gene as in any of the claims 9 to 15, wherein the absence of functional HLS1 protein and/or the absence of functional BAG4 protein confers a microseed size to the plant grown from said seed.

17. A watermelon fruit produced by the watermelon plant as claimed in any of the claims 6 to 15, wherein the watermelon fruit has seeds that have a pale seed color and optionally a microseed size.

18. Food product, comprising the watermelon fruit of claim 17, or a part thereof, optionally in processed form.

19. Propagation material capable of developing into and/or being derived from a plant as claimed in any of the claims 6 to 15, wherein the propagation material comprises the modified PPO gene as claimed in any of the claims 1 to 5, and optionally the non-functional HLS1 gene and/or the non-functional BAG4 gene as defined in any of the claims 9 to 15, and wherein the propagation material is selected from the group consisting of a microspore, a pollen, an ovary, an ovule, an embryo, an embryo sac, an egg cell, a cutting, a root, a root tip, a hypocotyl, a cotyledon, a stem, a leaf, a flower, an anther, a seed, a meristematic cell, a protoplast and a cell, or a tissue culture thereof.

20. Use of the modified PPO gene as claimed in any of the claims 1 to 5 for producing a plant that produces seeds with a pale seed color.

21. Use as claimed in claim 20, wherein the plant that produces seeds with a pale seed color is produced by introducing the modified PPO gene into its genome, in particular by means of mutagenesis or introgression, or combinations thereof.

22. Use of the plant as claimed in any of the claims 6 to 15 for the production of a watermelon fruit having seeds that have a pale seed color and optionally a microseed size.

23. Marker for the identification of a modified PPO gene, wherein the marker sequence detects an insertion of a T between nucleotides 711 and 712 of SEQ ID NO: 1.

24. Use of the marker of claim 23 for identification and/or selection of a watermelon plant that produces seeds with a pale seed color.

25. Method for selecting a watermelon plant that produces seeds with a pale seed color, comprising identifying the presence of a modification in the PPO gene, optionally checking the color of the seeds the plant produces, and selecting a plant that homozygously comprises said modification as a plant that produces seeds with a pale seed color.

26. Method as claimed in claim 25, wherein the identification is performed by using the marker as defined in claim 23.

27. Marker for the identification of a deletion on chromosome 2, wherein the marker sequence detects the presence or absence of a deletion corresponding to 13962 bp being deleted between base pair position 4930 and 18893 of SEQ ID NO: 13.

28. Use of the molecular marker of claim 27 for identification and/or selection of a watermelon plant producing seeds with a microseed size.

29. Method for selecting a watermelon plant that produces seeds with a microseed size, comprising identifying the presence of the deletion on chromosome 2 using the marker of claim 27, and selecting a plant that homozygously comprises said deletion as a plant that produces seeds with a microseed size.

30. Method for producing a watermelon plant that produces seeds that have a pale seed color, comprising modifying the PPO gene of SEQ ID NO: 1, wherein the modification results in an absence of functional PPO protein, and the absence of functional PPO protein leads to the seeds of the produced plant having a pale seed color.

31. The method of claim 30, wherein the plant in which the PPO gene is modified has seeds with a microseed size.

32. Method for producing a watermelon plant that produces seeds that have a microseed size, comprising modifying the HLS1 gene of SEQ ID NO: 7 and/or the BAG4 gene of SEQ ID NO: 10, wherein the modification results in an absence of functional HLS1 protein and/or an absence of functional BAG4 protein in the plant, which leads to the seeds produced by said plant having a microseed size.

33. The method of claim 32, wherein the plant in which the HLS1 gene and/or the BAG4 gene is modified has seeds with a pale seed color.

34. A modified nucleic acid molecule, the wild type of which is identified as SEQ ID NO: 13, or the wild type of which has at least 90% sequence identity to SEQ ID NO: 13, wherein the modified nucleic acid does not comprise SEQ ID NO: 7 and/or SEQ ID NO: 10, wherein the modified nucleic acid confers a microseed size to the watermelon plant when present homozygously.

35. The nucleic acid molecule of claim 34, comprising a deletion corresponding to 13962 bp being deleted between base pair position 4930 and 18893 of SEQ ID NO: 13.

36. Use of the modified nucleic acid molecule as claimed in claim 34 or 35 for producing a watermelon plant that produces seeds with a microseed size.

37. Use as claimed in claim 36, wherein the watermelon plant that produces seeds with a microseed size is produced by introduction of the modified nucleic acid molecule into its genome, in particular by means of mutagenesis or introgression, or combinations thereof.

Description:
WATERMELON WITH PALE MICROSEEDS

The present invention relates to genes that together impart a pale microseed phenotype to a watermelon plant. Additionally, the invention relates to the use of these genes for producing watermelon plants with pale-colored seeds, optionally of microseed size. It further relates to methods for identifying and selecting a watermelon plant having a pale seed color, and to methods for identifying and selecting a watermelon plant having a microseed size.

Watermelon belongs to the genus Citrullus which is part of the Cucurbit family ( Cucurbitaceae ). The modern cultivated watermelon is known as Citrullus lanatus var. lanatus (Thunb.) Matsum. & Nakai. Watermelon is grown throughout the tropical and sub-tropical regions of the world, predominantly for consumption of its sweet flesh. The Southern part of the USA, China, the Middle East, Africa, India, Japan and Southern Europe are the most important watermelon producing areas.

Cultivated watermelon plants are large annual plants with a vine-like growth habit. The fruit flesh of mature watermelon fruits of cultivated watermelon is usually red and sweet. The seeds of mature fruits of cultivated watermelons are normally dark (brown to black) and big, making the seeds stand out in the red fruit flesh.

Consumers prefer seedless watermelon fruits. For this reason, cultivated watermelon varieties are often triploid. Pollination of a triploid watermelon plant triggers fruit development. However, the three sets of chromosomes make successful meiosis very unlikely and cause the ovules or embryos to abort without producing mature seeds, an example of stenospermocarpy. Although the fruits of triploid watermelon plants are considered seedless, they do contain such aborted, incompletely developed seeds. Triploid hybrid varieties are produced by crossing a tetraploid mother line with a diploid father line. Seed production and breeding of triploid watermelon varieties are complicated and expensive. Because a triploid plant has no viable pollen, the watermelon grower must plant a diploid (pollenizer) variety in the production field to provide the pollen that stimulates fruit set. Usually, one row of the diploid pollenizer variety is planted for every two to three rows of triploid watermelon. The pollenizer variety and the triploid variety need to be synchronized, so that the pollenizer sheds pollen at the time the triploid mother plant can accept it for induction of fruit set. Good combinations are difficult to make, especially since environmental conditions can affect the pollenizer and the triploid differently, leading to asynchrony and a lower watermelon fruit yield. Usually, varieties are chosen that can be distinguished easily, so that the seeded diploid fruits can be separated from the seedless triploid fruits for harvesting and marketing. Triploid watermelons germinate weakly, are more difficult to grow than diploid watermelons, and usually produce fewer fruits. All this makes triploid watermelon fruit production very expensive and complex for watermelon growers. For growers and consumers, the remains of the undeveloped seeds in the fruit can also be a problem. Especially under stress conditions, fruits are produced with clearly noticeable remains of incompletely developed seeds, or even normally developed seeds, that are objectionable to consumers.

It is therefore an object of the present invention to avoid the need for triploid watermelon fruit production.

In angiosperms, seed development begins with double fertilization. One of the two sperm cells fuses with the egg cell to form the diploid zygote, which then develops into an embryo with a shoot meristem, cotyledons, hypocotyl, root and a root meristem. The other sperm cell fertilizes the diploid central cell to generate the triploid endosperm. In most dicots such as Arabidopsis thaliana, the endosperm grows rapidly initially, but is consumed at later developmental stages. The embryo therefore occupies most of the mature seed. After fertilization the maternal integuments surrounding the developing embryo and endosperm undergo cell differentiation, may accumulate pigments, mucilage and starch granules, and eventually form the mature seed coat.

The seed coat in many species contains dark (brown to black) pigments. Seed coat coloring has been studied best in Arabidopsis thaliana. In Arabidopsis seeds, pigmentation of the seed coat is observed at late stages of seed development. The actual synthesis of the pigments, which are called proanthocyanidins (PA) or condensed tannins, starts during early stages of embryo development (1-2 days after fertilization). These flavonoids initially accumulate as colorless compounds in vacuoles of the endothelium, the innermost cell layer of the integuments, and are oxidized during seed desiccation, thereby conferring the brown color to mature seeds. Several Arabidopsis seed coat pigmentation mutants are known. In these so-called transparent testa mutants, the Arabidopsis seed coat exhibits a white to pale yellow color. Many TRANSPARENT TESTA genes encode enzymes of the flavonoid biosynthesis pathway, while others encode regulatory proteins acting at several points of the pathway.

The size of a seed is determined by the coordinated growth of the embryo, endosperm and maternal tissue. Growth of plant seeds up to their species-specific size is predominantly determined by internal developmental signals from maternal and zygotic tissues. Several genes that promote endosperm growth have been identified in Arabidopsis. Loss-of-function mutants of such genes form small seeds, and the phenotype of these mutants is determined by the genotype of the zygotic tissues. In contrast, other genes have been identified that act maternally to regulate seed size. These genes are involved in regulating cell proliferation and/or expansion in the maternal integuments. The integuments surrounding the ovule form the seed coat after fertilization and are thought to set an upper limit to seed size, as they provide the cavity in which the embryo and the endosperm grow.

In the research that led to the present invention, a modification in a POLYPHENOL OXIDASE gene of watermelon, abbreviated herein as PPO, was found to cause a plant comprising the modified PPO gene to produce seeds with a pale seed color. Moreover, a non-functional HOOKLESS1 (HLS1) gene and/or a non-functional BCL-2-ASSOCIATED ATHANOGENE 4 (BAG4) gene was found to cause a plant comprising said non-functional gene(s) to produce seeds with a microseed size. Combining the modified PPO gene with the non-functional HLS1 gene and/or the non-functional BAG4 gene in a watermelon plant resulted in a novel watermelon plant producing seeds with a pale seed color and a microseed size. Such a watermelon plant produces fruits that seem seedless to a consumer, without the disadvantages of triploid watermelon fruit production and breeding.

Polyphenol oxidase (PPO) is an enzyme that catalyzes the hydroxylation of monophenols into o-diphenols (cresolase activity) and the oxidation of o-diphenols into o-quinones (catecholase activity). While the biochemical reactions catalyzed by PPOs are well known, data on the physiological functions of the enzyme are scarce. The enzyme is present in nearly all plants, and is also found in fungi, bacteria and animals. Most plants and fungi carry multiple PPO-type gene copies, whose expression is thought to be tissue-specific and developmentally controlled or stress-induced. Different copies within a plant have different expression profiles, and even their cellular localization may differ. Plant PPO proteins are best known for causing the rapid polymerization of o-quinones into black, brown or red pigments (polyphenols), which cause e.g. browning of fruits or vegetables when the tissue is damaged by bruising or cutting. A function of PPOs in resistance to pathogens and herbivores has also been proposed in some plants. Several assays exist to measure PPO enzyme activity.
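For reference, the two activities named above can be written as overall reactions. These are simplified, textbook-level stoichiometries added for illustration and are not taken from this application. The two electrons needed for the cresolase (monooxygenase) step are written generically; in practice they are often supplied by an o-diphenol acting as reductant.

```latex
\begin{align}
\text{monophenol} + \mathrm{O_2} + 2\,\mathrm{H^+} + 2\,e^- &\xrightarrow{\ \text{PPO (cresolase)}\ } \text{o-diphenol} + \mathrm{H_2O} \\
2\ \text{o-diphenol} + \mathrm{O_2} &\xrightarrow{\ \text{PPO (catecholase)}\ } 2\ \text{o-quinone} + 2\,\mathrm{H_2O}
\end{align}
```

The o-quinones formed in the second reaction are the reactive products that go on to polymerize into the colored pigments mentioned above.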

The watermelon genome comprises eight PPO-type gene copies, all arranged in tandem on chromosome 3 (Citrullus lanatus 97103 Chr3:5634000-5814000; see Guo et al., 2013, The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nature Genetics 45(1):51-58).

The present invention provides a modified watermelon PPO gene, which is one of the above-mentioned eight gene copies, the wild type of which is identified as SEQ ID NO: 1, encoding the protein of SEQ ID NO: 5, or the wild type of which encodes a protein that has at least 90% sequence identity to SEQ ID NO: 5, wherein the modified PPO gene comprises one or more nucleotides replaced, inserted and/or deleted relative to the wild type, and wherein said one or more replaced, inserted and/or deleted nucleotides result in an absence of functional PPO protein.

Suitably, sequence identity is calculated using the Sequence Identities and Similarities (SIAS) tool, which can be accessed at imed.med.ucm.es/Tools/sias.html. SIAS calculates pairwise sequence identity and similarity percentages between each pair of sequences from a multiple sequence alignment. Sequence identity is calculated using a method taking the gaps into account; sequence similarity is calculated based on grouping of amino acids having similar properties. For calculations, default settings for SIM percentage, similarity amino acid grouping, sequence length, normalized similarity score, matrix and gap penalties are used.
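SIAS reports identity values derived from a multiple sequence alignment. To illustrate what such a percentage means, the hypothetical sketch below computes percent identity from two already-aligned sequences, with gaps written as '-'. The denominator is an assumption: identity tools offer several normalizations, and this sketch divides by the length of the shorter ungapped sequence. It is not SIAS's implementation and will not necessarily reproduce SIAS output.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Identical, non-gap columns are counted and divided by the length of the
    shorter ungapped sequence (an assumed normalization, one of several in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * identical / shorter


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if identity reaches the threshold, e.g. the 90% used in the claims."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy peptides for illustration only (not SEQ ID NO: 5).
ref = "MASTKLLVGG"
var = "MASTRLLV-G"
print(round(percent_identity(ref, var), 1), meets_threshold(ref, var))
```

In the toy pair, 8 of the 9 residues of the shorter sequence are identical, which falls below a 90% threshold.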

The DNA sequence of a gene may be altered in a number of ways, with varying effects depending on where the modification(s) occur and whether they alter the expression level and/or function of the encoded protein. Examples of DNA modifications include insertions, deletions and base substitutions (also called nucleotide replacements). These may, e.g., result in a frameshift mutation, a nonsense mutation, a null mutation, a knockout mutation, a premature stop codon and/or an amino acid substitution.

An insertion changes the number of DNA bases in a gene by adding a piece of DNA. A deletion changes the number of DNA bases by removing one or more base pairs, or even an entire gene or neighboring genes. These types of modifications may alter the function of the resulting protein.

Frameshift mutations are caused by the insertion or deletion of one or more base pairs in a DNA sequence encoding a protein. When the number of inserted or deleted base pairs at a certain position within the coding sequence is not a multiple of 3, the triplet codons encoding the individual amino acids downstream of that position become shifted relative to the original open reading frame, and the encoded protein sequence changes dramatically. From that point onward, translation yields an entirely different amino acid sequence from that of the originally encoded protein, and very often a frameshift leads to a premature stop codon in the open reading frame. The overall result is that the encoded protein no longer has the same biological function as the originally encoded protein.
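The effect of a single-base insertion on the reading frame can be illustrated with a toy coding sequence. The sequence is hypothetical and not taken from SEQ ID NO: 1. The sketch builds the standard genetic code table, inserts one T, and translates both sequences up to the first stop codon. The helper names are illustrative only.

```python
from itertools import product

# Standard genetic code with codons ordered TCAG x TCAG x TCAG; '*' = stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate_to_stop(cds: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_after(seq: str, position: int, base: str) -> str:
    """Insert `base` between 1-based nucleotides `position` and `position + 1`."""
    return seq[:position] + base + seq[position:]


wild_type = "ATGGCCAGATTAGCTGGTAAATAA"   # toy CDS: M A R L A G K, then TAA
mutant = insert_after(wild_type, 3, "T")  # 3_4insT in the same notation as 711_712insT
print(translate_to_stop(wild_type))       # MARLAGK
print(translate_to_stop(mutant))          # MCQISW (shifted frame, early TAA)
```

After the insertion, every downstream codon is read in the +1 frame. The toy protein diverges immediately after the start methionine and terminates at a TAA that was not in frame before.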

An amino acid substitution in an encoded protein sequence arises when the mutation or base substitution of one or more base pairs in the coding sequence results in an altered triplet codon, often encoding a different amino acid. Mutations resulting in an amino acid substitution are called non-synonymous or missense mutations. Due to the redundancy of the genetic code not all point mutations lead to amino acid changes. Such mutations are termed silent mutations. Some amino acid changes are conservative, i.e. they lead to the replacement of one amino acid by another amino acid with comparable properties, such that the mutation is unlikely to dramatically change the folding of the mature protein, or influence its function. Other amino acid changes are more likely to affect protein function: non-conservative amino acid changes in domains that play a role in substrate recognition, the active site of enzymes, interaction domains or in major structural domains (such as transmembrane helices) may partly or completely destroy the functionality of an encoded protein, without thereby necessarily affecting the expression level of the encoding gene. Whether an amino acid substitution is conservative or non-conservative may be predicted on the basis of chemical properties, for example similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity or amphipathic nature of the amino acids.
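Following the paragraph's criterion of chemical similarity, a substitution can be roughly classified by checking whether both residues fall in the same property group. The grouping below is an assumption chosen for illustration, one common textbook-style grouping. It is not the grouping used by SIAS or by the applicant. It also ignores where in the protein the change falls, which the paragraph notes is equally important.

```python
# Assumed, simplified property groups; each of the 20 residues is in exactly one group.
PROPERTY_GROUPS = ["AVLIM", "FWY", "STNQ", "DE", "KRH", "C", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(PROPERTY_GROUPS) for aa in group}


def classify_substitution(ref_aa: str, alt_aa: str) -> str:
    """Classify a single-residue change; '*' denotes a stop codon."""
    ref_aa, alt_aa = ref_aa.upper(), alt_aa.upper()
    if ref_aa == alt_aa:
        return "synonymous"        # same amino acid, e.g. from a silent mutation
    if alt_aa == "*":
        return "nonsense"          # codon changed into a premature stop
    if GROUP_OF[ref_aa] == GROUP_OF[alt_aa]:
        return "conservative"
    return "non-conservative"


print(classify_substitution("L", "I"))  # conservative
print(classify_substitution("D", "K"))  # non-conservative
```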

A deletion, insertion, frameshift mutation and/or amino acid substitution may result in a nonsense mutation. A nonsense mutation is a mutation in a nucleic acid molecule encoding a protein whereby a codon is changed into a premature stop codon. Converting a codon that encodes an amino acid into a premature stop codon results in a truncated protein. How much of the protein is lost determines whether or not the protein is still functional. Especially when all or part of the conserved functional domains are lacking from the truncated protein, protein function is likely to be affected. Premature stop codons may also lead to nonsense-mediated decay, in which mRNAs transcribed from an allele carrying a nonsense mutation are eliminated, leading to low RNA levels and no or very little protein.

A deletion, insertion, frameshift mutation and/or amino acid substitution may result in a null mutation or knockout mutation. A null mutation or knockout mutation is a mutation that eliminates the function of the affected gene. For example, a null mutation in a gene that usually encodes a specific enzyme leads to the production of a nonfunctional enzyme or no enzyme at all.

The wild type of the PPO gene of this invention comprises SEQ ID NO: 1. In the publicly available genome assembly of Citrullus lanatus cv. 97103 (version 1; see Guo et al., 2013, The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nature Genetics 45(1):51-58), said wild type of the modified PPO gene of this invention is located on chromosome 3 at positions 5704673..5707416 on the minus strand. SEQ ID NO: 20 provides the reverse complement of the PPO gene, i.e. its sequence as present on the positive strand. Also encompassed by the term “wild type of the PPO gene of this invention” is a gene that has, in order of increasing preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 1.
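Because the PPO gene lies on the minus strand, positions counted along the gene run opposite to the chromosome coordinates, and the plus-strand sequence is the reverse complement. The sketch below assumes, for illustration only, that nucleotide 1 of SEQ ID NO: 1 corresponds to Chr3:5707416 and its last nucleotide to 5704673, i.e. that the sequence spans exactly the annotated interval. It also checks that this interval falls inside the PPO cluster region given for the same assembly. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (ambiguity codes other than N not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


# Annotated interval of the PPO gene on Chr3 of the 97103 v1 assembly (minus strand).
GENE_START, GENE_END = 5_704_673, 5_707_416
# PPO gene cluster on Chr3 of the same assembly.
CLUSTER_START, CLUSTER_END = 5_634_000, 5_814_000


def gene_pos_to_chrom(pos: int) -> int:
    """Map a 1-based position along the minus-strand gene to a Chr3 coordinate.

    Assumes gene position 1 corresponds to GENE_END (hypothetical mapping).
    """
    length = GENE_END - GENE_START + 1
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} outside 1..{length}")
    return GENE_END - pos + 1


gene_length = GENE_END - GENE_START + 1
in_cluster = CLUSTER_START <= GENE_START and GENE_END <= CLUSTER_END
print(gene_length)                   # 2744
print(gene_pos_to_chrom(1))          # 5707416
print(reverse_complement("ATGGCC"))  # GGCCAT
```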

The wild type of the PPO gene of the invention encodes the protein of SEQ ID NO: 5. This wild-type PPO protein comprises the following conserved domains: a Tyrosinase domain (aa 171-378 of SEQ ID NO: 5, Pfam domain PF00264), a PPO1-DWL domain (aa 384-432 of SEQ ID NO: 5, Pfam domain PF12142) and a PPO1-KFDV domain (aa 458-585 of SEQ ID NO: 5, Pfam domain PF12143). Also encompassed by the term “wild type of the PPO gene of this invention” is a gene that encodes a protein that has, in order of increasing preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 5.

The modified PPO gene of the invention comprises one or more nucleotides replaced, inserted and/or deleted relative to the wild type, and said one or more replaced, inserted and/or deleted nucleotides result in an absence of functional PPO protein. In the context of this invention, the term “absence of functional PPO protein” means that either no PPO protein is expressed, or that PPO protein is expressed that is non-functional and does not have PPO enzyme activity. The modification to the PPO gene can lead to the absence of PPO RNA or a significantly decreased PPO RNA level, resulting in an absence of PPO protein. Alternatively, the modified PPO protein is expressed but is non-functional: the absence of one or more of the functional domains of the PPO protein results in a modified PPO protein that cannot perform its function as a polyphenol oxidase enzyme. The absence of functional PPO protein can, e.g., be determined using a PPO enzyme activity assay. In short, a protein extract is made from seed coat tissue, different phenolic substrates are added to this extract, and a significant reduction of PPO activity is detected by measuring the resulting color change with a spectrophotometer (see in general e.g. Rocha et al., 1998, Characterisation of 'Starking' apple polyphenoloxidase. Journal of the Science of Food and Agriculture 77(4):527-534). An even simpler method is to place the entire seed coat or seed in a phenolic substrate and check for a color change, as described in Chen et al. (2014, Fine-mapping and candidate gene analysis of BLACK HULL1 in rice (Oryza sativa L.). Plant Omics 7(1):12-18).
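A spectrophotometric readout of this kind is usually reduced to an initial rate. The sketch below fits the slope of absorbance against time by ordinary least squares. It then expresses a mutant extract's rate as a percentage of a wild-type extract's rate. The readings are invented for illustration and are not data from this application. Normalizing per mg of protein is an assumed convention; wavelength, substrate and unit definitions depend on the protocol actually used.

```python
def initial_rate(times_min, absorbances):
    """Least-squares slope of absorbance versus time (change in A per minute)."""
    n = len(times_min)
    if n != len(absorbances) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def specific_activity(rate, protein_mg):
    """Rate normalized to the amount of protein in the assay (assumed convention)."""
    return rate / protein_mg


# Hypothetical readings at 0-3 min, 0.5 mg protein per assay.
t = [0.0, 1.0, 2.0, 3.0]
wt_activity = specific_activity(initial_rate(t, [0.10, 0.30, 0.50, 0.70]), 0.5)
mutant_activity = specific_activity(initial_rate(t, [0.10, 0.11, 0.12, 0.13]), 0.5)
relative = 100 * mutant_activity / wt_activity
print(f"mutant extract: {relative:.1f}% of wild-type activity")
```

Using a fitted slope rather than two end points makes the estimate less sensitive to a single noisy reading. Only the early, linear part of the curve should be fitted.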

In one embodiment, the modified PPO gene of the invention comprises a premature stop codon that leads to an absence of functional PPO protein. In another embodiment, the modified PPO gene of the invention comprises a premature stop codon resulting in the absence, from the encoded modified PPO protein, of the PPO1-KFDV domain; of the PPO1-KFDV and PPO1-DWL domains; or of the PPO1-KFDV, PPO1-DWL and Tyrosinase domains. In a preferred embodiment, the one or more nucleotides that are replaced, inserted and/or deleted in the modified PPO gene of the invention relative to the wild type are at positions 1 to 712 of SEQ ID NO: 1, resulting in a premature stop codon that leads to an absence of functional PPO protein. In a most preferred embodiment, the modified PPO gene comprises an insertion of a T between nucleotides 711 and 712 (711_712insT) of SEQ ID NO: 1.
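The domains that survive a truncation can be read off the domain coordinates listed for SEQ ID NO: 5. The sketch below is a hypothetical helper. It takes the number of N-terminal residues that still match the wild type and reports each Pfam domain as complete, partial or absent. It assumes the retained residues are exactly residues 1 to n of SEQ ID NO: 5; residues encoded in a shifted frame are not counted as retained. For the NCIMB 43364 allele, 250 residues minus the 13 altered C-terminal residues leaves 237 wild-type residues.

```python
# Pfam domain spans on the wild-type PPO protein (SEQ ID NO: 5), 1-based inclusive.
DOMAINS = {
    "Tyrosinase (PF00264)": (171, 378),
    "PPO1-DWL (PF12142)": (384, 432),
    "PPO1-KFDV (PF12143)": (458, 585),
}


def domain_status(retained_residues: int) -> dict[str, str]:
    """Classify each domain given how many N-terminal wild-type residues remain."""
    status = {}
    for name, (start, end) in DOMAINS.items():
        if retained_residues >= end:
            status[name] = "complete"
        elif retained_residues >= start:
            status[name] = "partial"
        else:
            status[name] = "absent"
    return status


for name, state in domain_status(237).items():
    print(f"{name}: {state}")
```

With 237 retained residues, the helper reports the Tyrosinase domain as partial and both PPO1 domains as absent.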

In the genome of a watermelon plant, representative seed of which was deposited under accession number NCIMB 43364, there is an insertion of an A at the genomic position corresponding to cl_97103_v1_Chr3:5706705 (Guo et al., supra). This insertion causes a frameshift, which introduces a premature stop codon into the PPO gene of SEQ ID NO: 1 (a gene located on the reverse strand of the genome). SEQ ID NO: 20 provides the reverse complement of the PPO gene, i.e. its sequence as present on the positive strand. Relative to the gene sequence, the modified PPO gene therefore comprises an insertion of a T between nucleotides 711 and 712 of SEQ ID NO: 1. This one-base-pair insertion causes a frameshift, which leads to 13 amino acids being encoded in the wrong frame, followed by a premature stop codon at positions 751-753 of the modified PPO gene (SEQ ID NO: 2). Whereas the wild-type PPO protein is 587 amino acids long (SEQ ID NO: 5), the modified PPO protein (SEQ ID NO: 6), if produced at all, is only 250 amino acids long. It comprises only a small part of its Tyrosinase domain, lacks the conserved PPO1-DWL and PPO1-KFDV domains completely, and carries 13 altered amino acids at its C-terminus. The mutant protein is thus non-functional.
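The numbers in this paragraph are mutually consistent, which can be checked with simple codon arithmetic. The sketch relies on three assumptions not stated in the source. First, nucleotide 1 of SEQ ID NO: 1 and SEQ ID NO: 2 is the first base of the ATG start codon. Second, no intron interrupts the coding sequence upstream of position 753. Third, nucleotide 1 of SEQ ID NO: 1 corresponds to Chr3:5707416, the upper end of the annotated minus-strand interval.

```python
# Values stated in the text; numbering assumptions as described in the lead-in.
INSERTION_AFTER = 711     # 711_712insT in SEQ ID NO: 1
STOP_CODON_START = 751    # premature stop at positions 751-753 of SEQ ID NO: 2
GENE_END = 5_707_416      # Chr3 coordinate assumed to match gene nucleotide 1

# 711 is a multiple of 3, so codons 1..237 lie entirely upstream of the insertion.
unchanged_codons = INSERTION_AFTER // 3
# The stop codon starts on a codon boundary of the modified sequence.
in_frame_stop = (STOP_CODON_START - 1) % 3 == 0
stop_codon_number = (STOP_CODON_START - 1) // 3 + 1
truncated_length = stop_codon_number - 1          # residues before the stop
wrong_frame_residues = truncated_length - unchanged_codons

# Chr3 coordinates of gene nucleotides 711 and 712, which flank the insertion.
flanks = (GENE_END - INSERTION_AFTER + 1, GENE_END - (INSERTION_AFTER + 1) + 1)

print(unchanged_codons, stop_codon_number, truncated_length, wrong_frame_residues)
# 237 251 250 13
print(flanks)
# (5706706, 5706705)
```

Under these assumptions, the 13 altered residues are codons 238 to 250 and the 250-residue length follows directly. The insertion falls between Chr3:5706705 and 5706706, in line with the reported position 5706705. An A inserted on the plus strand reads as a T on the minus-strand gene.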

The modified PPO gene of this invention confers a pale seed color to the plant when present homozygously.

In one embodiment, the modified PPO gene of this invention is a nucleic acid, in particular a nucleic acid molecule, more in particular an isolated nucleic acid molecule.

Seed color can be determined visually. Fully developed and mature dried seeds of cultivated watermelon plants not carrying the modified PPO gene of the invention normally vary in color from middle brown to black, depending on the variety, whereas fully developed and mature dried seeds of cultivated watermelon plants carrying the modified PPO gene of the invention homozygously may be described as beige, light yellow, pale yellow, wheat, or light khaki. Seed color hardly changes when the fresh, wet seeds present in the mature watermelon fruit are dried. The color of fully developed and mature fresh seeds of cultivated watermelon plants carrying the modified PPO gene of the invention homozygously may thus also be described as beige, light yellow, pale yellow, wheat, or light khaki. When comparing the color of seeds produced by plants of the invention carrying the modified PPO gene homozygously with the color of seeds of isogenic plants carrying the modified PPO gene heterozygously or not at all, all seeds must be at the same developmental stage, and they must be either all fresh or all dried.

An RHS color chart (The Royal Horticultural Society, London, UK) is often used by plant breeders and growers to determine plant colors visually; however, color may also be determined using other color charts or systems. Colors may, for example, be specified as RGB color codes or in the Munsell color system, or may be determined using a colorimeter or image analysis. The skilled person knows how to use these different color systems and how to convert color codes between them.

The color of seeds can also be determined using a colorimeter or image analysis, e.g. as described in Example 1. Color should be determined on an appropriate number of seeds from each seed lot, such as at least 10 seeds, so that average color values can be calculated. For image analysis, photographs need to be taken in a standardized set-up. The approximately 10 seeds to be photographed must be clearly separated from each other, and a color chart, such as the X-Rite ColorChecker Passport, should be included in each picture to allow later color correction of the photographs. Image analysis of the color-corrected photographs, using a CellProfiler pipeline or a comparable program, generates calibrated RGB values. These can then be translated into, for example, CIELAB L*a*b* color values using a color calibration algorithm. The CIELAB color scale is widely used to measure colors, for instance with a colorimeter or by image analysis. The scale comprises three variables: L*, a* and b*. L* indicates lightness on a scale of 0 to 100, where 0 is black and 100 is white. The variables a* and b* indicate the amounts of red, green, blue and yellow: a* indicates the color change from green (negative values) to red (positive values), while b* indicates the color change from blue (negative values) to yellow (positive values). Differences in color between two samples can be expressed as changes in L*, a* and/or b*.
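The description does not specify the color calibration algorithm used to go from calibrated RGB values to L*a*b* values. As a minimal sketch, assuming the calibrated values are 8-bit sRGB, the standard sRGB-to-CIELAB conversion (via CIE XYZ, D65 reference white) could be used to compute per-seed values, a per-lot average, and a simple color difference. All function names are hypothetical; a production pipeline would more likely rely on an established color library.

```python
from statistics import mean

# D65 reference white (2 degree observer), scaled so that Yn = 1.
D65_2DEG = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c8: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB companding function (cube root with a linear segment near 0)."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(rgb, white=D65_2DEG):
    """Convert one color-corrected 8-bit sRGB triple to CIELAB (L*, a*, b*)."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    # Linear sRGB -> CIE XYZ using the standard sRGB (D65) matrix.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def mean_lab(seed_rgbs):
    """Average L*, a*, b* over per-seed RGB values (e.g. at least 10 seeds per lot)."""
    labs = [srgb_to_lab(rgb) for rgb in seed_rgbs]
    return tuple(mean(channel) for channel in zip(*labs))


def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in L*a*b* space."""
    return sum((p - q) ** 2 for p, q in zip(lab1, lab2)) ** 0.5


lot = [(222, 200, 160)] * 10  # hypothetical calibrated RGB values, one per seed
print(mean_lab(lot))
```

L* depends only on Y relative to the white point's Y, so it is unaffected by the choice of observer for a D65 white normalized to Yn = 1; a* and b* do depend on the white point passed in.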

Seeds produced by plants carrying the modified PPO gene of the invention homozygously have a pale seed color. As used herein, the term “pale seed color” is intended to refer to a color of fully developed and mature dry seeds that is beige, light yellow, pale yellow or light khaki, and/or to fully developed and mature dry seeds having an L* (10°/D65) score, when determined using image analysis, e.g. as described above or in Example 1, of at least, in order of increased preference, 55, 60, 62, 64, 66, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 85 or 90. The L* (10°/D65) score, when determined using image analysis on dry mature seeds of said plant, is suitably not higher than 99.
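The L* criterion above can be expressed as a simple check on a seed lot. The sketch below is hypothetical: the function name and parameters are illustrative, it assumes per-seed L* values have already been obtained by image analysis, and it averages them over at least 10 seeds as recommended in the text. The default lower bound of 55 is the least preferred threshold; a stricter preferred threshold can be passed instead.

```python
from statistics import mean


def is_pale(l_star_values, threshold=55.0, upper=99.0, min_seeds=10):
    """Return True if the mean L* of a seed lot lies in [threshold, upper].

    l_star_values: per-seed L* scores of fully developed, mature dry seeds.
    """
    if len(l_star_values) < min_seeds:
        raise ValueError(f"need at least {min_seeds} seeds, got {len(l_star_values)}")
    avg = mean(l_star_values)
    return threshold <= avg <= upper


# Hypothetical L* scores for ten seeds from one lot.
lot = [71.5, 72.0, 70.8, 73.1, 72.4, 71.9, 70.5, 72.8, 71.2, 72.6]
print(is_pale(lot))                 # least preferred threshold (55)
print(is_pale(lot, threshold=75))   # a more preferred threshold
```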

The pale seed color phenotype of seeds of the invention is due to the reduction or absence of brown pigments in the seed coat, also called the testa. The seed coats of the pale colored seeds of the invention have a reduced amount of the brown pigments that are normally present in the seed coats of seeds at the mature seed stage produced by plants not comprising the modified PPO gene of the invention homozygously. In particular, the seed coats of seeds of this invention have such a low amount of the brown pigments that the brown color in the seed coats of said seeds of plants of the invention is not detectable by the eye. More in particular, seed coats of seeds of this invention completely lack the brown pigments that are normally present in the seed coats of brown or black seeds.

The seed coat is the outer protective layer of the seed and is derived from the integuments of the ovule. The seed coat is thus of maternal origin. The color of the seeds therefore is determined by the genotype of the plant that produces the seeds (the mother plant that receives pollen in a cross). Since this trait is recessive, the watermelon plant producing the seeds (mother plant) needs to comprise the modified PPO gene of the invention homozygously to produce pale seeds. The genotype of the father plant providing the pollen in the cross has no impact on the color of the seeds produced by the mother plant after this pollination.
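Because seed coat color is a maternal, recessive trait, the pale phenotype appears one generation later than the plant genotype suggests. The minimal sketch below assumes simple Mendelian inheritance at a single locus and uses a hypothetical notation ("P" for the functional allele, "p" for the modified PPO allele). It shows that an F1 heterozygote bears dark seeds whatever the pollen donor, while one F2 plant in four is homozygous and bears pale seeds.

```python
from fractions import Fraction
from itertools import product

# Hypothetical notation: "P" = functional PPO allele, "p" = modified PPO allele (recessive).


def seed_coat_color(mother: str) -> str:
    """Seed coat color depends only on the mother plant's genotype (maternal tissue)."""
    return "pale" if mother == "pp" else "dark"


def self_progeny(genotype: str) -> dict:
    """Expected genotype frequencies after selfing a single-locus genotype such as 'Pp'."""
    counts = {}
    for a, b in product(genotype, repeat=2):
        child = "".join(sorted(a + b))  # 'P' sorts before 'p', so 'pP' becomes 'Pp'
        counts[child] = counts.get(child, 0) + Fraction(1, 4)
    return counts


# Seeds borne on an F1 (Pp) mother are dark, regardless of the father plant.
print(seed_coat_color("Pp"))
# Selfing the F1: the pp class (one in four F2 plants) bears pale seeds.
print(self_progeny("Pp"))
```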

The invention relates to a watermelon plant comprising the modified PPO gene of the invention, wherein the homozygous presence of the modified PPO gene confers a pale seed color to the plant. The modified PPO gene of the invention can be as comprised in the genome of a Citrullus lanatus var. lanatus plant representative seed of which was deposited under accession number NCIMB 43364. The plant can comprise the modified PPO gene of the invention heterozygously, in which case the seeds produced by the plant do not have the pale seed color trait but the plant is useful for transferring the modified PPO gene of the invention to another plant. The plant can also comprise the modified PPO gene of the invention homozygously, in which case said plant produces seeds with a pale seed color.

This invention further relates to a watermelon plant comprising the modified PPO gene of the invention, wherein the plant further comprises a non-functional HLS1 gene, the wild type of which is identified as SEQ ID NO: 7 encoding the protein of SEQ ID NO: 9, or the wild type of which encodes a protein that has at least 90% sequence identity to SEQ ID NO: 9, and/or a non-functional BAG4 gene, the wild type of which is identified as SEQ ID NO: 10 encoding the protein of SEQ ID NO: 12, or the wild type of which encodes a protein that has at least 90% sequence identity to SEQ ID NO: 12, wherein the absence of functional HLS1 protein and/or the absence of functional BAG4 protein confers a microseed size to the plant.

The wild type of the watermelon HLS1 gene of this invention comprises SEQ ID NO: 7. In the publicly available genome assembly of Citrullus lanatus cv. 97103 (version 1, see Guo et al, supra) said wild type HLS1 gene is located on chromosome 2 at position 29904246 .. 29906227 (-). Also encompassed by the term wild type of the HLS1 gene of this invention is a gene that has, in order of increased preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 7.
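The description does not state which alignment method or identity definition underlies the “at least 90% sequence identity” wording used here and elsewhere. As a minimal sketch, assuming a simple global (Needleman-Wunsch) alignment with match +1, mismatch -1 and gap -1, and identity taken as identical aligned positions divided by alignment length, percent identity between two sequences could be estimated as follows. The function names are hypothetical; in practice an established tool such as EMBOSS Needle or BLAST would normally be used, and their scoring schemes differ.

```python
def global_align(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell along one optimal path.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    aln_a, aln_b = global_align(a, b)
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


print(percent_identity("ACGTACGT", "ACGACGT"))  # toy sequences, one base deleted
```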

The wild type of the watermelon HLS1 gene of this invention encodes the protein of SEQ ID NO: 9. This wild type HLS1 protein comprises the following conserved domain: Acetyltransferase (GNAT) family domain (aa 38-146 of SEQ ID NO: 9, Pfam domain pfam00583). This HLS1 gene is an N-acetyltransferase family gene which encodes an enzyme that catalyzes the transfer of an acetyl group to a substrate. The Arabidopsis HLS1 gene was linked to regulation of apical hook formation under etiolation and ethylene treatment, and was shown to be involved in sugar and auxin signaling. The Arabidopsis HLS1 gene was shown to function through histone acetylation (Liao et al, 2016, Arabidopsis HOOKLESS1 Regulates Responses to Pathogens and Abscisic Acid through Interaction with MED18 and Acetylation of WRKY33 and ABI5 Chromatin. The Plant Cell, 28 (7): 1662-1681). Also encompassed by the term “wild type of the HLS1 gene of this invention” is a gene that encodes a protein that has, in order of increased preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 9.

The non-functional HLS1 gene of the invention can comprise one or more nucleotides replaced, inserted and/or deleted relative to the wild type resulting in an absence of functional HLS1 protein. In this context, the absence of functional HLS1 protein can be due to the absence of HLS1 RNA resulting in an absence of HLS1 protein. The absence of functional HLS1 protein can also mean an absence of the functional domain of the HLS1 protein, resulting in a modified HLS1 protein that cannot perform its function. The HLS1 gene of the invention can also be non-functional because it is absent from the genome.

In one embodiment, the non-functional HLS1 gene of this invention is a nucleic acid, in particular a nucleic acid molecule, more in particular an isolated nucleic acid molecule.

The wild type of the watermelon BAG4 gene of this invention comprises SEQ ID NO: 10. In the publicly available genome assembly of Citrullus lanatus cv. 97103 (version 1, see Guo et al, supra) said wild type BAG4 gene is located on chromosome 2 at position 29911929 .. 29915565 (+). Also encompassed by the term “wild type of the BAG4 gene of this invention” is a gene that has, in order of increased preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 10.

The wild type of the watermelon BAG4 gene of this invention encodes the protein of SEQ ID NO: 12. This wild type BAG4 protein comprises the following conserved domains: ubiquitin-like domain (aa 49-117 of SEQ ID NO: 12, InterPro domain IPR000626) and BAG domain (aa 141-219 of SEQ ID NO: 12, InterPro domain IPR003103). The protein encoded by the BAG4 gene is a member of the BAG1-related protein family. BAG1 is an anti-apoptotic protein that functions through interactions with a variety of cell apoptosis and growth related proteins.

Also encompassed by the term “wild type of the BAG4 gene of this invention” is a gene that encodes a protein that has, in order of increased preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 12.

The non-functional BAG4 gene of the invention can comprise one or more nucleotides replaced, inserted and/or deleted relative to the wild type resulting in an absence of functional BAG4 protein. In this context, the absence of functional BAG4 protein can be due to the absence of BAG4 RNA resulting in an absence of BAG4 protein. The absence of functional BAG4 protein can also mean an absence of one or more or all of the functional domains of the BAG4 protein, resulting in a modified BAG4 protein that cannot perform its function. The BAG4 gene of the invention can also be non-functional because it is absent from the genome.

In one embodiment the non-functional BAG4 gene of this invention is a nucleic acid, in particular a nucleic acid molecule, more in particular an isolated nucleic acid molecule.

The watermelon plant of the invention can comprise the non-functional HLS1 gene and/or the non-functional BAG4 gene heterozygously. Preferably, a watermelon plant of the invention homozygously comprises the non-functional HLS1 gene and/or homozygously comprises the non-functional BAG4 gene and the plant produces seeds with a microseed size. If the HLS1 gene and/or the BAG4 gene are absent from the genome, this absence is preferably also homozygous, which means that both copies are absent.

The invention further relates to a watermelon plant comprising the modified PPO gene of the invention, further comprising a deletion on chromosome 2 corresponding to 13962 bp being deleted between base pair positions 29902114 and 29916077 of the Citrullus lanatus 97103_v1 genome, wherein this deletion confers a microseed size to the plant when present homozygously. By this deletion, all nucleotides from the G at position 29902115 to the A at position 29916076 on chromosome 2 of the Citrullus lanatus 97103_v1 genome have been deleted. SEQ ID NO: 13 provides the cl_97103_v1 genomic sequence from position 29897185 to 29920517 of chromosome 2. The genomic deletion conferring microseed size corresponds to a deletion of all nucleotides between base pair positions 4930 and 18893 of SEQ ID NO: 13. This genomic deletion removes two genes: the HLS1 gene of SEQ ID NO: 7 and the BAG4 gene of SEQ ID NO: 10. Preferably, this deletion is as comprised in the genome of a Citrullus lanatus var. lanatus plant, representative seed of which was deposited under accession number NCIMB 43364. This deletion can be present heterozygously, in which case the seeds produced by the plant do not have the microseed size trait but the plant is useful for transferring this deletion of the invention to another plant. Preferably, this deletion is present homozygously and the plant produces seeds with a microseed size.
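The coordinates given for the deletion can be cross-checked against each other. The sketch below uses only numbers stated in the description (1-based, inclusive coordinates on chromosome 2 of the 97103 v1 assembly); the constant and function names are illustrative. It confirms the 13962 bp length, the mapping of the flanking positions to positions 4930 and 18893 of SEQ ID NO: 13, and that both the HLS1 and BAG4 gene intervals lie entirely within the deleted region.

```python
# Coordinates (1-based, inclusive) taken from the text; helper names are illustrative.
SEQ13_START, SEQ13_END = 29897185, 29920517   # chromosome 2 span covered by SEQ ID NO: 13
LEFT_FLANK, RIGHT_FLANK = 29902114, 29916077  # last retained base on each side of the deletion
HLS1 = (29904246, 29906227)                   # HLS1 gene, reverse strand
BAG4 = (29911929, 29915565)                   # BAG4 gene, forward strand


def to_seq13(pos: int) -> int:
    """Convert a chromosome 2 coordinate to a 1-based position in SEQ ID NO: 13."""
    return pos - SEQ13_START + 1


def inside(interval, region) -> bool:
    """True if `interval` lies entirely within `region` (both inclusive)."""
    return region[0] <= interval[0] and interval[1] <= region[1]


deleted = (LEFT_FLANK + 1, RIGHT_FLANK - 1)  # first and last deleted base
deletion_length = deleted[1] - deleted[0] + 1

print(deletion_length)
print(to_seq13(LEFT_FLANK), to_seq13(RIGHT_FLANK))
print(inside(HLS1, deleted), inside(BAG4, deleted))
```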

The absence of functional HLS1 protein and/or the absence of functional BAG4 protein confers a microseed size to the watermelon plant.

Seed size can be estimated visually by a skilled person, but is better measured using image analysis or a caliper, as described in Example 1. Seed size has to be determined on an appropriate number of fully developed and mature dry seeds from each seed lot, such as at least 10 seeds, so that the average size can be calculated. With a caliper, seed length, seed width and seed thickness can be measured. Seed length is the best measure of watermelon seed size.

Seeds as deposited at the NCIMB under deposit Accession number 43364 with a pale color and a microseed size have an average length of 4.0 mm, an average width of 2.5 mm and an average thickness of 1.5 mm. The average 100 seed weight (100SDW, in g) of seeds as deposited is 0.7 g. In general there is a strong correlation between seed length and 100SDW.

As used herein, the term “microseed size” is intended to refer to fully developed and mature dry seeds having an average length, when determined on about 10 seeds, of at most, in order of increased preference, 6.0 mm, 5.9 mm, 5.8 mm, 5.7 mm, 5.5 mm, 5.4 mm, 5.3 mm, 5.2 mm, 5.1 mm, 5.0 mm, 4.9 mm, 4.8 mm, 4.7 mm, 4.6 mm, 4.5 mm, 4.4 mm, 4.3 mm, 4.2 mm, 4.1 mm, 4.0 mm, 3.9 mm, 3.8 mm, 3.7 mm, 3.6 mm, 3.5 mm, 3.0 mm, 2.5 mm, or 2.1 mm. The seed length is suitably not lower than 2.0 mm.
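The length criterion can be applied to caliper measurements as a simple range check on the lot average. This is a hypothetical sketch: the function name and parameters are illustrative, the default upper limit is the least preferred value (6.0 mm), and a stricter preferred limit can be passed in. The deposited seeds, with an average length of 4.0 mm, fall within the default range.

```python
from statistics import mean


def is_microseed(lengths_mm, max_mean_mm=6.0, min_mean_mm=2.0, min_seeds=10):
    """Return True if the average length of dry mature seeds lies in the range (mm)."""
    if len(lengths_mm) < min_seeds:
        raise ValueError(f"need at least {min_seeds} seeds, got {len(lengths_mm)}")
    avg = mean(lengths_mm)
    return min_mean_mm <= avg <= max_mean_mm


# Ten hypothetical caliper measurements (mm) from one seed lot.
lot = [3.9, 4.1, 4.0, 3.8, 4.2, 4.0, 4.1, 3.9, 4.0, 4.0]
print(is_microseed(lot))
print(is_microseed(lot, max_mean_mm=3.5))  # a more preferred, stricter limit
```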

As used herein, the term “watermelon plant of the invention” or “plant of the invention” is intended to refer to a watermelon (Citrullus lanatus var. lanatus) plant comprising the modified PPO gene of the invention and optionally further comprising the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention. A watermelon plant comprising the modified PPO gene of the invention and further comprising a deletion on chromosome 2 corresponding to 13962 bp being deleted between base pair positions 29902114 and 29916077 of the Citrullus lanatus 97103_v1 genome is also a plant of the invention, as in this plant both the HLS1 gene and the BAG4 gene are absent from the genome. Preferably, in the plant of the invention said deletion is as comprised in the genome of a Citrullus lanatus var. lanatus plant, representative seed of which was deposited under accession number NCIMB 43364.

The watermelon plant of the invention can be a watermelon plant of any type, any fruit form or fruit color, and is preferably an agronomically elite watermelon plant. In one embodiment, the mature fruits of the watermelon plant of the invention have red, orange or yellow flesh. In another embodiment, the mature fruits of said plant have flesh with soluble solids of at least, in order of increased preference, 5.0 degrees Brix, 6.0 degrees Brix, 7.0 degrees Brix, 8.0 degrees Brix, 9.0 degrees Brix, 9.5 degrees Brix, 10.0 degrees Brix, 10.5 degrees Brix, 11.0 degrees Brix, 11.5 degrees Brix, 12.0 degrees Brix, 12.5 degrees Brix, 13.0 degrees Brix, 13.5 degrees Brix, 14.0 degrees Brix, 14.5 degrees Brix, 15.0 degrees Brix, 15.5 degrees Brix, 16.0 degrees Brix, or 17.0 degrees Brix. The soluble solids of the mature fruits of said plant are suitably not higher than 18 degrees Brix. In another embodiment the watermelon plant of the invention is a plant of an inbred line or a hybrid plant. In yet another embodiment the watermelon plant of the invention is a diploid, tetraploid or triploid plant.

In case triploid watermelon plants homozygously comprise the modified PPO gene of the invention and optionally further homozygously comprise the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, the fruits these plants produce after being pollinated by a diploid pollenizer are improved over triploid watermelon plants not containing the gene(s) of this invention. The incompletely developed seeds or occasional normally developed seeds that can be present in such fruits are less noticeable than in normal triploid fruits because of the pale color and optionally the smaller size.

In the context of this invention an “agronomically elite watermelon” plant is a plant having a genotype that results in an accumulation of distinguishable and desirable agronomic traits which allow a producer to harvest a product of commercial significance.

As used herein, a “plant of an inbred line” is a plant of a population of plants that is the result of three or more rounds of selfing, or backcrossing, or which plant is a doubled haploid. An inbred line may e.g. be a parent line used for the production of a commercial hybrid.

As used herein, a “hybrid plant” is a plant which is the result of a cross between two different plants having different genotypes. More in particular, a hybrid plant is the result of a cross between plants of two different inbred lines, such that a hybrid plant may e.g. be a plant of an F1 hybrid variety.

The invention also encompasses a watermelon seed comprising the modified PPO gene of the invention, wherein the plant grown from said seed produces seeds with a pale seed color as a result of the homozygous presence of the modified PPO gene, and optionally further comprising the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, wherein the absence of functional HLS1 protein and/or the absence of functional BAG4 protein confers a microseed size to the plant grown from said seed.

The invention further relates to a part of the watermelon plant of the invention, which comprises a fruit of the plant of the invention or a seed of the plant of the invention, wherein the plant part comprises the modified PPO gene of the invention and optionally further comprises the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention.

The invention further relates to a watermelon fruit produced by the watermelon plant of the invention, wherein the watermelon fruit has seeds that have a pale seed color and optionally a microseed size. This watermelon fruit is a fruit of the invention.

Moreover, the invention also relates to a food product or a processed food product comprising the fruit of the invention or a part thereof. The food product may have undergone one or more processing steps. Such a processing step may comprise, but is not limited to, any one of the following treatments or combinations thereof: peeling, cutting, washing, juicing, cooking, cooling or preparing a salad mixture comprising the fruit of the invention. The processed form that is obtained is also part of this invention since it comprises DNA in which the modified PPO gene and/or a non-functional HLS1 gene and/or a non-functional BAG4 gene are present.

The invention further relates to a cell of a plant of the invention. Such a cell may either be in isolated form or a part of the complete plant or parts thereof and still constitutes a cell of the invention because such a cell harbors the genetic information that imparts the pale seed color and optionally the microseed size to a plant of the invention. Each cell of a plant of the invention carries the genetic information that leads to the pale seed color and optionally the microseed size of the invention. A cell of the invention may also be a regenerable cell that can regenerate into a new plant of the invention. The presence of genetic information as used herein is the presence of the modified PPO gene of the invention and optionally the presence of the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, or the presence of the deletion on chromosome 2 as defined herein.

The invention further relates to plant tissue of a plant of the invention, which comprises the modified PPO gene of the invention, and optionally further comprises the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention. The tissue can be undifferentiated tissue or already differentiated tissue. Undifferentiated tissue is for example a stem tip, an anther, a petal, or pollen, and can be used in micropropagation to obtain new plantlets that are grown into new plants of the invention. The tissue can also be grown from a cell of the invention.

The invention moreover relates to progeny of a plant, a cell, a tissue, or a seed of the invention, which progeny comprises the modified PPO gene of the invention, and optionally further comprises the non-functional HLS1 gene of the invention, and/or the non-functional BAG4 gene of the invention. Such progeny can in itself be a plant, a cell, a tissue, or a seed. The progeny can in particular be progeny of a plant of the invention deposited under NCIMB Accession number 43364. As used herein “progeny” is intended to mean the first and all further descendants from a cross with a plant of the invention, wherein a cross comprises a cross with itself or a cross with another plant, and wherein a descendant that is determined to be progeny comprises the modified PPO gene of the invention, and optionally further comprises the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention. Progeny also encompasses material that is obtained by vegetative propagation or another form of multiplication. Preferably, the progeny plant produces seeds that have a pale seed color as a result of the homozygous presence of the modified PPO gene of the invention, and optionally a microseed size as a result of the presence of the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, or the presence of the deletion on chromosome 2 as defined herein.

The invention also relates to propagation material capable of developing into and/or being derived from a plant of the invention, wherein the propagation material comprises the modified PPO gene of the invention, and optionally further comprises the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, and wherein the propagation material is selected from a group consisting of a microspore, a pollen, an ovary, an ovule, an embryo, an embryo sac, an egg cell, a cutting, a root, a root tip, a hypocotyl, a cotyledon, a stem, a leaf, a flower, an anther, a seed, a meristematic cell, a protoplast and a cell, or a tissue culture thereof.

The invention further relates to use of the modified PPO gene of the invention for producing a plant that produces seeds with a pale seed color. The plant that produces seeds with a pale seed color may be produced by introduction of the modified PPO gene into its genome, in particular by means of mutagenesis or introgression, or combinations thereof. The seeds of said plant may have a microseed size.

The invention further relates to use of the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention for producing a plant that produces seeds with microseed size. The plant that produces seeds with a microseed size may be produced by introduction of the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention into its genome, in particular by means of mutagenesis or introgression, or combinations thereof. Deleting the HLS1 gene and/or BAG4 gene from the genome can also lead to the HLS1 gene of the invention and/or the BAG4 gene of the invention being non-functional. The seeds of said plant may have a pale color.

The invention also relates to use of the plant of the invention for the production of a watermelon fruit having seeds that have a pale seed color and optionally a microseed size.

The invention further relates to a marker for the identification of a modified PPO gene, wherein the marker sequence detects an insertion of a T between nucleotides 711 and 712 of SEQ ID NO: 1. This insertion corresponds to a single nucleotide insertion of an A at position cl_97103_v1_Chr3:5706705. An example of such a marker is marker CL08381 (SEQ ID NO: 14 and SEQ ID NO: 15). SEQ ID NO: 15 represents the allele of marker CL08381 as it is present in the genome of a plant comprising the modified PPO gene of this invention. SEQ ID NO: 14 represents the wild type allele of this same marker, as present in the genomes of plants that do not comprise the modified PPO gene of this invention. The nucleotide that differs between the two marker alleles of marker CL08381 is underlined and in bold in Table 4 below. The marker allele for the modified PPO gene (SEQ ID NO: 15) has a single nucleotide insertion of an A that is underlined and in bold in Table 2 (position 101 of SEQ ID NO: 15).
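The correspondence between an inserted T in the PPO gene and an inserted A on the forward genomic strand follows from the gene lying on the reverse strand. The minimal sketch below illustrates this on a short hypothetical toy sequence (not SEQ ID NO: 1); the helper names are illustrative. A base inserted after gene position k appears, as its complement, after position n - k of the forward-strand copy of a gene of length n.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_after(seq: str, pos: int, base: str) -> str:
    """Insert `base` after 1-based position `pos` (pos = 0 inserts at the start)."""
    return seq[:pos] + base + seq[pos:]


def forward_strand_position(gene_len: int, k: int) -> int:
    """Position on the forward-strand copy after which the complementary base lands,
    for a base inserted after gene position k of a reverse-strand gene."""
    return gene_len - k


# Hypothetical toy gene as read on its own (reverse) strand.
gene = "ATGGCTTCAAAGCTTTAA"
mutant_gene = insert_after(gene, 9, "T")   # analogous to an "insT" in the gene
forward = revcomp(gene)                    # the same region read on the forward strand
mutant_forward = insert_after(forward, forward_strand_position(len(gene), 9), "A")
print(revcomp(mutant_gene) == mutant_forward)
```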

Use of this marker for identification and/or selection of a watermelon plant producing seeds with a pale seed color is also part of this invention. The invention further relates to a method for selecting a watermelon plant that produces seeds with a pale seed color, comprising identifying the presence of a modification in the PPO gene, optionally checking the color of the seeds the plant produces, and selecting a plant that homozygously comprises said modification as a plant that produces seeds with a pale seed color. The identification of the presence of a modification in the PPO gene may be performed by using the marker as defined above.

The invention further relates to a marker for the identification of a deletion on chromosome 2, wherein the marker sequence detects the presence or absence of a deletion corresponding to 13962 bp being deleted between base pair position 4930 and 18893 of SEQ ID NO: 13. Use of this marker for identification and/or selection of a watermelon plant producing seeds with a microseed size is also part of this invention. Also encompassed in this invention is a method for selecting a watermelon plant that produces seeds with a microseed size, comprising identifying the presence of the deletion on chromosome 2 using the marker as defined above, optionally checking the size of the seeds the plant produces, and selecting a plant that homozygously comprises said deletion as a plant that produces seeds with a microseed size.

An example of a marker for detecting the presence of the deletion is marker CL_chr2_gap1 with primers SEQ ID NO: 16 plus SEQ ID NO: 17 (see Table 4). In material comprising the genome deletion said primers amplify a PCR product of 446 bp, in material without said genome deletion no PCR product is amplified by these primers. An example of a marker for detecting the absence of the deletion is marker CL_chr2_gap2 with primers SEQ ID NO: 18 plus SEQ ID NO: 19 (see Table 4). In material not comprising the genome deletion said primers amplify a PCR product of 945 bp, in material comprising said genome deletion no PCR product is amplified by these primers.

Also encompassed in this invention is a method for identifying the presence of the genomic deletion leading to the microseed size, wherein the method comprises the steps of: a) running an assay with the primers represented by SEQ ID NO: 16 plus SEQ ID NO: 17 and/or SEQ ID NO: 18 plus SEQ ID NO: 19 to determine the presence of an amplification product of SEQ ID NO:16 and SEQ ID NO:17 and/or an amplification product of SEQ ID NO:18 and SEQ ID NO: 19; b) determining the presence of the deletion by assigning: presence of the deletion when the product of the primer represented by SEQ ID NO: 16 and the primer represented by SEQ ID NO: 17 is produced, and absence of the deletion when the product of the primer represented by SEQ ID NO: 18 and the primer represented by SEQ ID NO: 19 is produced.
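The scoring rules of step b) can be combined into a genotype call for a single plant. The sketch below is hypothetical: it assumes fragment sizes from the two assays have already been estimated (e.g. by gel or capillary electrophoresis), uses an assumed sizing tolerance of 10 bp, and interprets the presence of both products as a heterozygous plant, which is an inference from step b) rather than a rule stated in the text.

```python
def band_present(fragment_sizes_bp, expected_bp, tolerance_bp=10):
    """True if any observed fragment lies within tolerance of the expected size."""
    return any(abs(size - expected_bp) <= tolerance_bp for size in fragment_sizes_bp)


def call_deletion_genotype(fragment_sizes_bp) -> str:
    """Call the chromosome 2 deletion from fragments observed across both assays.

    446 bp: SEQ ID NO: 16 + 17 product (deletion present).
    945 bp: SEQ ID NO: 18 + 19 product (deletion absent).
    """
    deletion = band_present(fragment_sizes_bp, 446)
    intact = band_present(fragment_sizes_bp, 945)
    if deletion and intact:
        return "heterozygous"          # assumed: both alleles detected
    if deletion:
        return "homozygous deletion"   # expected to produce microseeds
    if intact:
        return "homozygous intact"
    return "no call"                   # e.g. failed PCR; repeat the assay


print(call_deletion_genotype([446]))
print(call_deletion_genotype([448, 941]))
```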

The invention further relates to a method for producing a watermelon plant that produces seeds that have a pale seed color, comprising modifying the wild type of the PPO gene of this invention, wherein the modification results in an absence of functional PPO protein, and the absence of functional PPO protein leads to the seeds of the produced plant having a pale seed color. The wild type of the PPO gene of this invention is a gene that has, in order of increased preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1. Optionally, the plant in which the PPO gene is modified has seeds with a microseed size.

The invention further relates to a method for producing a watermelon plant that produces seeds that have a microseed size, comprising modifying the wild type of the HLS1 gene of this invention and/or the wild type of the BAG4 gene of this invention, wherein the modification results in an absence of functional HLS1 protein and/or an absence of functional BAG4 protein in the plant, which leads to the seeds produced by said plant having a microseed size. The wild type of the HLS1 gene of this invention is a gene that has, in order of increased preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:7. The wild type of the BAG4 gene of this invention is a gene that has, in order of increased preference, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 10. In one embodiment, the modification is a deletion of the HLS1 gene and/or the BAG4 gene. Optionally, the plant in which the HLS1 gene and/or the BAG4 gene is modified has seeds with a pale seed color.

This invention also relates to a modified nucleic acid molecule, the wild type of which is identified as SEQ ID NO: 13, or the wild type of which has at least 90% sequence identity to SEQ ID NO: 13, wherein the modified nucleic acid does not comprise SEQ ID NO: 7 and/or SEQ ID NO: 10, wherein the modified nucleic acid confers a microseed size to the watermelon plant when present homozygously. In one embodiment, this nucleic acid molecule comprises a deletion corresponding to 13962 bp being deleted between base pair position 4930 and 18893 of SEQ ID NO: 13.

Moreover, this invention relates to use of said modified nucleic acid molecule for producing a watermelon plant that produces seeds with a microseed size. The watermelon plant that produces seeds with a microseed size may be produced by introduction of the modified nucleic acid molecule into its genome, in particular by means of mutagenesis or introgression, or combinations thereof.

The present invention relates to a method for the production of a watermelon plant that produces seeds that have a pale seed color, said method comprising: a) crossing a plant comprising the modified PPO gene of the invention with a plant not comprising said modified PPO gene; b) optionally performing one or more rounds of selfing and/or crossing a plant resulting from step a) to obtain a further generation population; c) selecting from the population a plant that homozygously comprises the modified PPO gene that produces seeds that have a pale seed color.

The present invention also relates to a method for the production of a watermelon plant that produces seeds that have a pale seed color and a microseed size, said method comprising: a) crossing a plant comprising the modified PPO gene of the invention and comprising the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, with a plant not comprising said modified PPO gene, non-functional HLS1 gene and non-functional BAG4 gene; b) optionally performing one or more rounds of selfing and/or crossing a plant resulting from step a) to obtain a further generation population; c) selecting from the population a plant that homozygously comprises the modified PPO gene and homozygously comprises the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, that produces seeds that have a pale seed color and a microseed size.

The present invention also relates to a method for the production of a watermelon plant that produces seeds that have a pale seed color and a microseed size, said method comprising: a) crossing a plant comprising the modified PPO gene of the invention with a plant comprising the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention; b) optionally performing one or more rounds of selfing and/or crossing a plant resulting from step a) to obtain a further generation population; c) selecting from the population a plant that homozygously comprises the modified PPO gene and homozygously comprises the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, that produces seeds that have a pale seed color and a microseed size.
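The methods above end by selecting plants that are homozygous both for the modified PPO gene and for the non-functional HLS1 and/or BAG4 gene, which in the Examples is the single chromosome 2 deletion removing both genes. Because Example 3 places the PPO gene on chromosome 3 and the deletion on chromosome 2, it seems reasonable to assume independent assortment; under that assumption, and assuming an F1 heterozygous at both loci with no selection or distorted segregation, about 1 in 16 F2 plants is expected to carry both modifications homozygously. The sketch enumerates the gamete combinations; the allele symbols and the population size in the example are hypothetical.

```python
from fractions import Fraction
from itertools import product


def f2_fraction_double_homozygous() -> Fraction:
    """Expected fraction of F2 plants homozygous for both recessive alleles,
    assuming two unlinked loci and an F1 that is heterozygous at both.

    Hypothetical allele labels:
      'P' / 'p' = wild type / modified PPO allele (chromosome 3)
      'D' / 'd' = wild type / deletion allele (chromosome 2)
    """
    # Independent assortment: the four gamete types of PpDd are equally likely.
    gametes = [ppo + chr2 for ppo, chr2 in product("Pp", "Dd")]
    combinations = list(product(gametes, repeat=2))  # 16 equally likely zygotes
    hits = sum(
        1 for g1, g2 in combinations if g1[0] == g2[0] == "p" and g1[1] == g2[1] == "d"
    )
    return Fraction(hits, len(combinations))


fraction = f2_fraction_double_homozygous()
print(fraction)  # 1/16
print(float(200 * fraction))  # expected count in a hypothetical F2 of 200 plants: 12.5
```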

The present invention relates to a method for the production of a watermelon plant that produces seeds that have a pale seed color, said method comprising: a) crossing a plant comprising the modified PPO gene of the invention with a plant not comprising said modified PPO gene; b) backcrossing the plant resulting from step a) with the parent not comprising the modified PPO gene for at least three generations; c) selecting from the third or higher backcross population a plant that homozygously comprises the modified PPO gene and produces seeds that have a pale seed color.
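Repeated backcrossing is what recovers the genetic background of the parent not comprising the modified PPO gene. The textbook expectation, sketched below, is that the share of the recurrent parent genome is 1/2 in the F1 and that each further backcross halves the remaining donor share, giving 1 - (1/2)^(n+1) after n backcrosses, or 93.75% after three. This is a genome-wide average that ignores linkage drag around the selected PPO locus and random variation between individual plants; the function name is hypothetical.

```python
from fractions import Fraction


def expected_recurrent_parent_share(n_backcrosses: int) -> Fraction:
    """Expected genome share of the recurrent parent after n backcrosses.

    The F1 (n = 0) carries 1/2; every backcross halves the donor share,
    so the recurrent share is 1 - (1/2) ** (n + 1). Genome-wide average only.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 1 - Fraction(1, 2) ** (n_backcrosses + 1)


for n in range(1, 5):
    print(f"BC{n}: {float(expected_recurrent_parent_share(n)):.2%}")
```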

The present invention also relates to a method for the production of a watermelon plant that produces seeds that have a pale seed color and a microseed size, said method comprising: a) crossing a plant comprising the modified PPO gene of the invention and comprising the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, with a plant not comprising said modified PPO gene, non-functional HLS1 gene and non-functional BAG4 gene; b) backcrossing the plant resulting from step a) with a plant not comprising said modified PPO gene, non-functional HLS1 gene and non-functional BAG4 gene for at least three generations; c) selecting from the third or higher backcross population a plant that homozygously comprises the modified PPO gene and homozygously comprises the non-functional HLS1 gene of the invention and/or the non-functional BAG4 gene of the invention, and that produces seeds that have a pale seed color and a microseed size.

The presence of a modified PPO gene and/or modified PPO protein leading to a pale seed color may be detected using routine methods known to the skilled person, such as RT-PCR, PCR, antibody-based assays, sequencing and genotyping assays, or combinations thereof. Such methods may be used to determine, for example, a reduction of the expression of the wild type PPO gene, a reduction of the expression of wild type PPO protein, the presence of a modified mRNA, cDNA or genomic DNA encoding a modified PPO protein, or the presence of a modified PPO protein, in plant material or plant parts, or in DNA, RNA or protein derived therefrom. Using the same routine methods, the presence of a non-functional BAG4 and/or HLS1 gene and/or a modified BAG4 and/or HLS1 protein leading to a microseed size may be detected.

Modifications or mutations of the wild type PPO gene, the wild type BAG4 gene and/or the wild type HLS1 gene can be introduced randomly by means of one or more chemical compounds, such as ethyl methane sulphonate (EMS), nitrosomethylurea, hydroxylamine, proflavine, N-methyl-N-nitrosoguanidine, N-ethyl-N-nitrosourea, N-methyl-N-nitro-nitrosoguanidine, diethyl sulphate, ethylene imine, sodium azide, formalin, urethane, phenol and ethylene oxide, and/or by physical means, such as UV-irradiation, fast neutron exposure, X-rays or gamma irradiation, and/or by insertion of genetic elements, such as transposons, T-DNA or retroviral elements.

Mutagenesis also comprises the more specific, targeted introduction of at least one modification by means of homologous recombination, oligonucleotide-based mutation introduction, zinc-finger nucleases (ZFN), transcription activator-like effector nucleases (TALENs) or Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) systems.

Modifying the wild type PPO gene, the wild type BAG4 and/or the wild type HLS1 gene could also comprise the step of targeted genome editing, wherein the sequence of the wild type PPO gene, the wild type BAG4 and/or the wild type HLS1 gene is modified, or wherein the wild type PPO gene, the wild type BAG4 and/or the wild type HLS1 gene is replaced by, respectively, another PPO, BAG4 or HLS1 gene that is modified. This can be achieved by means of any method known in the art for modifying DNA in the genome of a plant, or by means of methods for gene replacement. Such methods include genome editing techniques and homologous recombination.

Homologous recombination allows the targeted insertion of a nucleic acid construct into a genome, and the targeting is based on the presence of unique sequences that flank the targeted integration site. For example, the wild type locus of a PPO gene could be replaced by a nucleic acid construct comprising a modified PPO gene, the wild type locus of the BAG4 gene could be replaced by a nucleic acid construct comprising a modified BAG4 gene, and/or the wild type locus of the HLS1 gene could be replaced by a nucleic acid construct comprising a modified HLS1 gene.

Modifying the wild type PPO gene, the wild type BAG4 gene and/or the wild type HLS1 gene can involve inducing double-strand breaks in DNA using zinc-finger nucleases (ZFN), TAL (transcription activator-like) effector nucleases (TALEN), Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated nuclease (CRISPR/Cas nuclease), or homing endonucleases that have been engineered to make double-strand breaks at specific recognition sequences in the genome of a plant, another organism, or a host cell.

TAL effector nucleases (TALENs) can be used to make double-strand breaks at specific recognition sequences in the genome of a plant for gene modification or gene replacement through homologous recombination. TAL effector nucleases are a class of sequence-specific nucleases that can be used to make double-strand breaks at specific target sequences in the genome of a plant or other organism. TAL effector nucleases are created by fusing a native or engineered transcription activator-like (TAL) effector, or a functional part thereof, to the catalytic domain of an endonuclease, such as, for example, Fok I. The unique, modular TAL effector DNA binding domain allows for the design of proteins with potentially any given DNA recognition specificity. Thus, the DNA binding domains of TAL effector nucleases can be engineered to recognise specific DNA target sites and can thus be used to make double-strand breaks at desired target sequences.

ZFNs can be used to make double-strand breaks at specific recognition sequences in the genome of a plant for gene modification or gene replacement through homologous recombination. The zinc finger nuclease (ZFN) is a fusion protein comprising the part of the Fok I restriction endonuclease protein responsible for DNA cleavage and a zinc finger protein that recognizes specific, designed genomic sequences; the fusion protein cleaves the double-stranded DNA at those sequences, thereby producing free DNA ends (Urnov et al, 2010, Nat. Rev. Genet. 11:636-46; Carroll, 2011, Genetics 188:773-82).

The CRISPR/Cas nuclease system can also be used to make double-strand breaks at specific recognition sequences in the genome of a plant for gene modification or gene replacement through homologous recombination. The CRISPR/Cas nuclease system is an RNA-guided DNA endonuclease system that makes sequence-specific double-strand breaks in a DNA segment homologous to the designed RNA, so that its sequence specificity can be designed (Jinek et al, 2012, Science 337: 816-821; Cho et al, 2013, Nat. Biotechnol. 31:230-232; Cong et al, 2013, Science 339:819-823; Mali et al., 2013, Science 339:823-826; Feng et al, 2013, Cell Res. 23:1229-1232). Cas9 is an RNA-guided endonuclease that can create double-stranded breaks in DNA in vitro and in vivo, including in eukaryotic cells. It is part of an RNA-mediated adaptive defence system known as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) in bacteria and archaea. Cas9 acquires sequence specificity when it associates with a guide RNA molecule, which can target sequences present in an organism’s DNA on the basis of their sequence. Cas9 requires the presence of a Protospacer Adjacent Motif (PAM) immediately following the DNA sequence that is targeted by the guide RNA. The Cas9 enzyme was first isolated from Streptococcus pyogenes (SpCas9), but functional homologues from many other bacterial species have been reported, such as Neisseria meningitidis, Treponema denticola, Streptococcus thermophilus, Francisella novicida, Staphylococcus aureus, etcetera. For SpCas9, the PAM sequence is 5’-NGG-3’, whereas various Cas9 proteins from other bacteria have been shown to recognise different PAM sequences. In nature, the guide RNA is a duplex between crRNA and tracrRNA, but a single guide RNA (sgRNA) molecule comprising both crRNA and tracrRNA has been shown to work equally well (Jinek et al, 2012, Science 337: 816-821).
The advantage of using an sgRNA is that it reduces the CRISPR-Cas9 system to two components instead of three. For use in an experimental setup (in vitro or in vivo) this is an important simplification. An alternative to Cas9 is, for example, Cpf1, which does not need a tracrRNA to function, recognises a different PAM sequence, and creates staggered (sticky-end) cuts in the DNA, whereas Cas9 creates blunt ends.
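Because SpCas9 requires a 5’-NGG-3’ PAM immediately 3’ of the sequence matched by the guide RNA, candidate target sites in a gene such as PPO, HLS1 or BAG4 can be listed by scanning both strands for 20-nucleotide stretches followed by NGG. The sketch below assumes a 20-nt spacer, the usual length for SpCas9 guides, and ignores off-target scoring and any further guide design rules; the function names and the toy sequences are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """Return (strand, start, spacer, pam) tuples for every protospacer of
    `spacer_len` nt that is immediately followed by an NGG PAM.

    `start` is the 0-based index of the spacer on the scanned strand; for the
    '-' strand this is an index into the reverse complement of `seq`.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Toy sequence: one site on the '+' strand (20 A's followed by the PAM TGG).
print(find_spcas9_sites("A" * 20 + "TGG"))
```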

On the one hand, genetic modification techniques can be applied to express a site-specific nuclease, such as an RNA-guided endonuclease, and/or guide RNAs in eukaryotic cells. One or more DNA constructs encoding an RNA-guided endonuclease and at least one guide RNA can be introduced into a cell or organism by means of stable transformation (wherein the DNA construct is integrated into the genome) or by means of transient expression (wherein the DNA construct is not integrated into the genome, but expresses an RNA-guided endonuclease and at least one guide RNA in a transient manner). This approach requires the use of a transformation vector and a suitable promoter for expression in said cell or organism. Organisms into which foreign DNA has been introduced are considered to be Genetically Modified Organisms (GMOs), and the same applies to cells derived therefrom and to offspring of these organisms. In important parts of the worldwide food market, transgenic food is not allowed for human consumption and is not appreciated by the public. There is, however, also an alternative, “DNA-free” delivery method of CRISPR-Cas components into intact plants that does not involve the introduction of DNA constructs into the cell or organism.

For example, the introduction into a cell or organism of an mRNA encoding Cas9, obtained by in vitro transcription from a DNA construct encoding an RNA-guided endonuclease, together with at least one guide RNA, has been described. This approach does not require the use of a transformation vector and a suitable promoter for expression in said cell or organism.

Another known approach is the in vitro assembly of ribonucleoprotein (RNP) complexes, comprising an RNA-guided endonuclease protein (for example Cas9) and at least one guide RNA, and subsequently introducing the RNP complex into a cell or organism. In plants, the use of RNPs has been demonstrated in protoplasts, for example with polyethylene glycol (PEG) transfection (Woo et al, 2015, Nat. Biotech. 33: 1162-1164). After said modification of a genomic sequence has taken place, the protoplasts or cells can be used to produce plants that harbour said modification in their genome, using any plant regeneration method known in the art (such as in vitro tissue culture).

Breaking DNA using site-specific nucleases, such as those described above, can increase the rate of homologous recombination in the region of the break. Thus, coupling such effectors as described above with nucleases enables the generation of targeted changes in genomes, including additions, deletions and other modifications.

The present invention will be further illustrated in the following Examples, which are for illustration purposes only and are not intended to limit the invention in any way. In the description and the Examples, reference is made to the figures below.

FIGURES

Figure 1: Mature fruit of a watermelon (Citrullus lanatus var. lanatus) plant that is homozygous for both the modified PPO gene and the deletion on Chromosome 2 corresponding to the 13962 bp between base pair positions 29902114 and 29916077 of the Citrullus lanatus 97103_vl genome. The fruit has red flesh and pale colored microseeds.

Figure 2: Seeds as deposited at the NCIMB under deposit Accession number 43364 with a pale color and a microseed size (left), and seeds of a wild type watermelon variety that are black and have a big seed size (right). The scale bar represents 1 cm.

DEPOSIT INFORMATION

Seeds of watermelon (Citrullus lanatus var. lanatus) that are homozygous for both the modified PPO gene comprising an insertion of a T between nucleotides 711 and 712 (711_712insT) of SEQ ID NO: 1, and the deletion on Chromosome 2 corresponding to the 13962 bp between base pair positions 29902114 and 29916077 of the Citrullus lanatus 97103_vl genome, were deposited with NCIMB Ltd, Ferguson Building, Craibstone Estate, Bucksburn, Aberdeen AB21 9YA, UK on 27 February 2019 under accession number NCIMB 43364.

The deposited seeds do not meet the DUS criteria which are required for obtaining plant variety protection, and can therefore not be considered to be plant varieties.

EXAMPLES

EXAMPLE 1

Phenotypic analysis of seed color

In a screening of the internal Rijk Zwaan germplasm collection for watermelon plants with seeds with a light seed color, two Citrullus mucosospermus accessions (RZ907-03 and RZ907-04) were selected that produce pale colored seeds. The color of the seeds was examined visually on fully developed, mature and dry seeds. The color of fully developed, mature and dry seeds of the two selected Citrullus mucosospermus accessions could be indicated as pale, beige, light yellow, pale yellow or light khaki.

Seed color of the dried seeds of the two selected accessions was also examined using image analysis. For this, photographs were taken in a standardized set-up in a darkened room using a Nikon D7000 camera with a Nikon AF-S DX 35mm f/1.8G lens fitted with a circular B+W polarization filter. The standardized set-up used daylight fluorescent lamps (4x36 watts, 5400 K, CRI 98, 40 kHz) with a polarization filter. Lamp heads were angled at 45 degrees to the sample platform. Before photographs were taken, the lamps were turned on and allowed to warm up for at least 30 minutes. The camera was mounted on a stand with the lens pointing down and positioned over the sample platform. About 10 seeds that were clearly separated from each other were photographed for each sample. An X-rite color-checker passport color chart was included in each photograph.

Color correction of the photographs was performed using an ImageJ (1.48u) macro and the X-rite color-checker passport color chart. Image analysis and generation of calibrated RGB values were performed using a CellProfiler pipeline. The calibrated RGB color values were then translated into CIELAB L*a*b* color values (D65 illuminant, 10° standard observer) using a color calibration algorithm.
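The color calibration algorithm itself is not specified. As a rough illustration only, the standard conversion from 8-bit sRGB to CIELAB is sketched below: linearize the sRGB values, convert them to CIE XYZ with the sRGB matrix, and normalize by a reference white. The reference white defaults to the commonly tabulated D65 / 10° observer values (Xn, Yn, Zn) = (94.811, 100.0, 107.304); the sRGB matrix itself is defined for D65 with the 2° observer, so combining the two is a simplification. The function name and the example pixel are hypothetical, and the sketch does not reproduce the calibration against the color chart.

```python
def srgb_to_lab(r: int, g: int, b: int, white=(94.811, 100.0, 107.304)):
    """Convert an 8-bit sRGB triplet to CIE L*a*b*.

    `white` is the reference white (Xn, Yn, Zn); the default is the D65 / 10°
    observer white point. Returns (L*, a*, b*).
    """
    def linearize(channel: int) -> float:
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = (linearize(v) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ, scaled so that Y of sRGB white is 100.
    x = 100 * (0.4124 * rl + 0.3576 * gl + 0.1805 * bl)
    y = 100 * (0.2126 * rl + 0.7152 * gl + 0.0722 * bl)
    z = 100 * (0.0193 * rl + 0.1192 * gl + 0.9505 * bl)

    delta = 6 / 29

    def f(t: float) -> float:
        # Cube root above the CIE threshold, linear segment below it.
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / n) for v, n in zip((x, y, z), white))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


L, a, b = srgb_to_lab(230, 215, 170)  # hypothetical pale-beige pixel value
print(round(L, 1), round(a, 1), round(b, 1))
```

Only L* was used in the text to separate pale from dark seeds, and it depends on the Y channel alone, so the choice of observer affects a* and b* but not L*.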

The average calibrated RGB values and calibrated CIE L*a*b* values (10°/D65) determined for fully developed, mature and dry seeds of the selected Citrullus mucosospermus accessions with a pale seed color and of two Citrullus lanatus var. lanatus accessions with dark seeds are presented in Table 2.

The L* (10°/D65) value indicates lightness on a 0 to 100 scale, where 0 is black and 100 is white. As is clear from Table 2, the L* values most clearly show the color difference between the pale and dark seeds: for the pale seeds of the two selected Citrullus mucosospermus accessions the average L* (10°/D65) values were 66.10 and 72.20, whereas the average L* (10°/D65) value was 46.50 for the brown Citrullus lanatus var. lanatus accession and 34.70 for the black Citrullus lanatus var. lanatus accession.

Table 2: Average color in calibrated RGB values and in calibrated CIE L*a*b* values (10°/D65) of fully developed, mature and dry seeds of the selected Citrullus mucosospermus accessions with a pale seed color and two Citrullus lanatus var. lanatus accessions with dark seeds, as determined by image analysis.

EXAMPLE 2

Phenotypic analysis of seed size

In a screening for watermelon plants with a small seed size, two Citrullus lanatus var. lanatus accessions were selected that produce seeds with a microseed size. The size of the fully developed, mature and dry seeds was initially assessed by visual inspection and by determining the 100SDW (weight of a batch of 100 seeds). The 100SDW of Citrullus lanatus var. lanatus accession RZ907-01 was 0.8 g on average (standard deviation 0.3), while the 100SDW of Citrullus lanatus var. lanatus accession RZ907-02 was 0.6 g on average (standard deviation 0.1). The 100SDW for the seeds of the two Citrullus lanatus var. lanatus accessions with microseeds and that of three control accessions with medium or big seeds are presented in Table 3.

Seed size of the fully developed, mature and dry seeds of the two selected accessions and of control accessions with medium or big seeds was examined by weighing with a precision balance and measuring with a caliper. For each accession, 3 to 10 batches of seeds were weighed and measured, and the average 100SDW and seed sizes were calculated. The average 100SDW and seed sizes determined are presented in Table 3.
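The 100SDW values are averages over several weighed batches. A minimal sketch of that calculation follows, assuming each batch is recorded as a total weight and a seed count, that batches not of exactly 100 seeds are scaled linearly to 100 seeds, and that the reported standard deviation is the sample standard deviation across batches; the document states none of these details. The function names and the batch values are hypothetical.

```python
from statistics import mean, stdev


def hundred_seed_weight(batch_weight_g: float, n_seeds: int) -> float:
    """Scale a batch weight in grams to the weight of 100 seeds (100SDW)."""
    if n_seeds <= 0:
        raise ValueError("n_seeds must be positive")
    return batch_weight_g * 100 / n_seeds


def summarize_batches(batches):
    """`batches` is a list of (weight_g, n_seeds) tuples.

    Returns (mean 100SDW, sample standard deviation); the SD is 0.0 for one batch.
    """
    values = [hundred_seed_weight(weight, n) for weight, n in batches]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Hypothetical batches for one accession (not data from this document).
avg, sd = summarize_batches([(0.62, 100), (0.55, 100), (0.71, 100)])
print(f"100SDW = {avg:.2f} g (SD {sd:.2f})")
```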

Table 3: Average weight of 100 seeds (100SDW) and average seed size of fully developed, mature, dried seeds of the selected Citrullus lanatus var. lanatus accessions with a microseed size and three accessions with medium or big seeds, as determined visually, by weighing with a precision balance and by measuring with a caliper.

EXAMPLE 3

QTL mapping and gene identification for pale seed color and microseed size

Four mapping populations were developed in order to map both the genomic region responsible for the pale seed trait and the genomic region responsible for the microseed size trait.

A first mapping population for mapping the genomic region responsible for the pale seed trait resulted from a cross between a Citrullus mucosospermus plant (RZ907-04) with pale colored seeds of normal size and a Citrullus lanatus var. lanatus plant (Giong) with brown seeds of medium size. To map the genomic region responsible for the pale seed color, 182 recombinant inbred lines (RILs) were developed up to the F5 generation. The color of the seeds produced by these F5 plants was phenotyped by image analysis, while the F5 plants were genotyped using 168 markers, of which 104 were informative for the map construction.

Two mapping populations resulted from a cross between a Citrullus mucosospermus plant (RZ907-03) with pale colored seeds of normal size and a Citrullus lanatus var. lanatus plant (RZ907-01) with dark colored microseeds. The F2 population was used for initial QTL mapping of both the pale seed color and microseed size traits. For fine-mapping and validation of the QTLs, near isogenic lines (NILs) were developed from the F2 plants using Citrullus lanatus var. lanatus plant RZ907-01 as the recurrent backcross (BC) parent. The parent lines, the F2 population (155 individuals genotyped with 137 informative markers) and the BC3F3 NIL population (182 informative individuals genotyped with 136 informative markers) were phenotyped for seed color and seed size and genotyped for genetic map construction.

For fine-mapping and validation of the QTLs, a pale seeded NIL (BC3F2) from the cross between a Citrullus mucosospermus plant (RZ907-04) and a Citrullus lanatus var. lanatus plant (Giong) was crossed with a Citrullus lanatus var. lanatus plant (RZ907-02) with dark colored microseeds. All three parent lines and the F2 population from this cross (181 individuals genotyped with 104 informative markers) were phenotyped for seed color and seed size and genotyped for genetic map construction.

In the development of these mapping populations, both the pale seed color trait and the microseed size trait showed monogenic recessive inheritance. It also became clear that both the pale seed color and the microseed size phenotypes are determined by the genotype of the plant that produces the seed.
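For a single recessive locus, an F2 from a wild type x mutant cross is expected to segregate 3:1 into plants producing wild type seeds and plants producing pale seeds (or microseeds); since the seed phenotype is determined by the genotype of the plant that produces the seed, F2 plants are classified by the seeds they produce. Fit to 3:1 is commonly checked with a chi-square goodness-of-fit test, sketched below without continuity correction; for one degree of freedom the p-value equals erfc(sqrt(chi2 / 2)). The function name and the counts in the example are hypothetical; the document does not report segregation counts.

```python
import math


def chi_square_3_to_1(n_wild_type: int, n_recessive: int):
    """Chi-square goodness-of-fit test of an F2 split against 3:1 (df = 1).

    Returns (chi2, p_value). No Yates continuity correction is applied.
    """
    total = n_wild_type + n_recessive
    if total == 0:
        raise ValueError("no plants scored")
    expected = (0.75 * total, 0.25 * total)
    observed = (n_wild_type, n_recessive)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi_square_3_to_1(76, 24)  # hypothetical counts for 100 F2 plants
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```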

Before genetic map construction, non-polymorphic and uninformative markers and strongly deviating individuals were removed. Two linkage groups were constructed. Numbering and orientation of the linkage groups were determined using the physical assembly cl_97103_vl as reference, publicly available at http://cucurbitgenomics.org. Marker phase correction was performed using the marker information from the parental lines and the grouping structure. Map integration was performed using the physical assembly cl_97103_vl as reference. QTL analysis was performed using the R software environment.

A quantitative trait locus (QTL) for pale seed color on Chromosome 3 was identified in all four mapping populations. By fine mapping, the QTL region could be narrowed to a region of approximately 110 kb, which comprises 5 PPO genes and one unknown predicted gene.

By comparing whole genome sequence data of both sources of the pale seed color trait with whole genome sequence data of material with dark colored seeds, a C/CA indel was identified at position cl_97103_vl_Chr3:5706705 in the sources with pale colored seeds. The insertion of an A at position cl_97103_vl_Chr3:5706705 causes a frameshift that introduces a premature stop codon in the PPO gene of SEQ ID NO: 1. Because this gene lies on the reverse strand of the genome, the A inserted on the forward strand corresponds to a T in the gene sequence: the modified PPO gene comprises an insertion of a T between nucleotides 711 and 712 of SEQ ID NO: 1. This one base pair insertion causes a frameshift, which results in 13 amino acids being encoded in the wrong frame, followed by a premature stop codon at positions 751-753 of the modified PPO gene (SEQ ID NO: 2).
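The consequences of the 711_712insT insertion can be checked with simple reading-frame arithmetic. If SEQ ID NO: 1 is read as a coding sequence starting at its ATG (an assumption here), a stop codon ending at position 753 of the modified gene leaves (753 - 3) / 3 = 250 encoded amino acids: 237 unchanged ones from the 711 nucleotides upstream of the insertion plus the 13 altered ones, which matches the 250-residue length stated for SEQ ID NO: 6. The hypothetical helpers below apply an insertion in the sense of the 711_712insT notation (the base goes between positions 711 and 712) and locate the first in-frame stop codon. They are shown on a toy sequence because SEQ ID NO: 1 is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_between(cds: str, position: int, bases: str) -> str:
    """Insert `bases` between 1-based positions `position` and `position + 1`,
    i.e. the change written as {position}_{position + 1}ins{bases}."""
    if not 1 <= position < len(cds):
        raise ValueError("insertion point must lie inside the sequence")
    return cds[:position] + bases + cds[position:]


def first_stop_end(cds: str):
    """1-based position of the last base of the first in-frame stop codon,
    reading from position 1, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in STOP_CODONS:
            return i + 3
    return None


def encoded_length(cds: str):
    """Number of amino acids encoded before the first in-frame stop codon."""
    end = first_stop_end(cds)
    return None if end is None else (end - 3) // 3


# Toy coding sequence (not SEQ ID NO: 1): ATG AAA CCC GGG TTT TAA -> 5 residues.
wild_type = "ATGAAACCCGGGTTTTAA"
mutant = insert_between(wild_type, 3, "T")  # ATG TAA ACC ...: the frameshift hits a stop at once
print(encoded_length(wild_type), encoded_length(mutant))  # 5 1

# The same arithmetic for the modified PPO gene: stop codon at positions 751-753.
print((753 - 3) // 3)  # 250
```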

Whereas the wild type PPO protein is 587 amino acids long (SEQ ID NO: 5), the modified PPO protein (SEQ ID NO: 6), if produced at all, is only 250 amino acids long, comprises only a small part of its tyrosinase domain, completely lacks its conserved PPO1-DWL and PPO1-KFDV domains, and comprises 13 altered amino acids at its C-terminus. The mutant protein is thus not functional.

A marker (named CL08381) was designed on the C/CA indel at position cl_97103_vl_Chr3:5706705. SEQ ID NO: 15 represents the allele of marker CL08381 as present in the genome of a plant comprising the modified PPO gene of this invention and producing pale colored seeds. SEQ ID NO: 14 represents the wild type allele of this same marker, as present in the genomes of plants that do not comprise the modified PPO gene of this invention. The nucleotide that differs between the two alleles of marker CL08381 is underlined and in bold in Table 4 below: the marker allele for the modified PPO gene (SEQ ID NO: 15) has a single nucleotide insertion of an A (position 101 of SEQ ID NO: 15). In all four mapping populations this marker showed a 100% correlation with the pale seed phenotype.

A quantitative trait locus (QTL) for microseed size on Chromosome 2 was identified in three mapping populations. In two of these populations the QTL interval was only 1 cM. By comparing whole genome sequence data of both sources of the microseed size trait with whole genome sequence data of material with big or medium sized seeds, a genomic deletion was found only in material with a microseed size (about 30 kb from the peak markers of two of the QTL populations), corresponding to the 13962 bp between base pair positions 29902114 and 29916077 on chromosome 2 of the Citrullus lanatus 97103_vl genome. In other words, all nucleotides from the G at position 29902115 to the A at position 29916076 on chromosome 2 of the Citrullus lanatus 97103_vl genome have been deleted. SEQ ID NO: 13 provides the cl_97103_vl genomic sequence from position 29897185 to 29920517 of chromosome 2. The genomic deletion conferring microseed size corresponds to a deletion of all nucleotides between base pair positions 4930 and 18893 of SEQ ID NO: 13.
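The chromosome 2 coordinates and the SEQ ID NO: 13 coordinates given above differ by a fixed offset, since SEQ ID NO: 13 starts at chromosome 2 position 29897185. The short sketch below (hypothetical helper names, 1-based inclusive coordinates assumed on both scales) reproduces that conversion and the 13962 bp deletion size as a consistency check.

```python
SEQ13_CHR2_START = 29897185  # chromosome 2 position of base 1 of SEQ ID NO: 13
SEQ13_CHR2_END = 29920517    # chromosome 2 position of the last base of SEQ ID NO: 13


def chr2_to_seq13(chr2_pos: int) -> int:
    """Map a 1-based chromosome 2 position onto 1-based SEQ ID NO: 13 numbering."""
    if not SEQ13_CHR2_START <= chr2_pos <= SEQ13_CHR2_END:
        raise ValueError("position lies outside SEQ ID NO: 13")
    return chr2_pos - SEQ13_CHR2_START + 1


def bases_strictly_between(left: int, right: int) -> int:
    """Number of bases removed when everything between two retained positions is deleted."""
    return right - left - 1


print(chr2_to_seq13(29902114), chr2_to_seq13(29916077))  # 4930 18893
print(chr2_to_seq13(29902115), chr2_to_seq13(29916076))  # 4931 18892 (first/last deleted base)
print(bases_strictly_between(29902114, 29916077))        # 13962
print(SEQ13_CHR2_END - SEQ13_CHR2_START + 1)              # 23333 (length of SEQ ID NO: 13)
```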

This genomic deletion leads to two genes being deleted: the HLS1 gene of SEQ ID NO: 7 and the BAG4 gene of SEQ ID NO: 10.

Markers were designed to detect the presence or absence of the genomic deletion on chromosome 2 (CL_chr2_gapl and CL_chr2_gap2; see Table 4 below). In material comprising the genome deletion, the primers of marker CL_chr2_gapl (SEQ ID NO: 16 and 17) amplify a PCR product of 446 bp; in material without said genome deletion, no PCR product is amplified by these primers. In material not comprising said genome deletion, the primers of marker CL_chr2_gap2 (SEQ ID NO: 18 and 19) amplify a PCR product of 945 bp; in material comprising said genome deletion, no PCR product is amplified by these primers.

A PCR on DNA isolated from a plant producing microseeds gave a PCR product of 446 bp in a PCR with the primers of marker CL_chr2_gapl (SEQ ID NO: 16 and 17), and no PCR product in a PCR with the primers of marker CL_chr2_gap2 (SEQ ID NO: 18 and 19). A PCR on DNA isolated from a plant producing big seeds gave no PCR product in a PCR with the primers of marker CL_chr2_gapl (SEQ ID NO: 16 and 17), and a PCR product of 945 bp in a PCR with the primers of marker CL_chr2_gap2 (SEQ ID NO: 18 and 19). A PCR on DNA isolated from an F1 plant resulting from a cross between a plant producing big seeds and a plant producing microseeds gave a PCR product of 446 bp in a PCR with the primers of marker CL_chr2_gapl (SEQ ID NO: 16 and 17), and a PCR product of 945 bp in a PCR with the primers of marker CL_chr2_gap2 (SEQ ID NO: 18 and 19).
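The band patterns above amount to a simple decision rule: CL_chr2_gapl gives its 446 bp product only from the deletion allele and CL_chr2_gap2 gives its 945 bp product only from the wild type allele, so scoring both markers together distinguishes the three genotypes. A minimal sketch follows; the function name is hypothetical, and treating the absence of both products as a failed reaction is an assumption rather than something the document states.

```python
from typing import Optional


def call_chr2_deletion(gapl_446bp_present: bool, gap2_945bp_present: bool) -> Optional[str]:
    """Genotype call at the chromosome 2 deletion from the two PCR markers.

    CL_chr2_gapl (446 bp) amplifies only from the deletion allele;
    CL_chr2_gap2 (945 bp) amplifies only from the wild type allele.
    """
    if gapl_446bp_present and gap2_945bp_present:
        return "heterozygous"  # e.g. the F1 of a big-seeded x microseeded cross
    if gapl_446bp_present:
        return "homozygous deletion"  # microseed-producing plant
    if gap2_945bp_present:
        return "homozygous wild type"  # big-seeded plant
    return None  # no product from either marker: assumed failed reaction, repeat the PCR


for bands in [(True, False), (False, True), (True, True)]:
    print(bands, call_chr2_deletion(*bands))
```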

Table 4: Marker information.

EXAMPLE 4

Combining the pale seed color and microseed size traits in one plant

Out of the cross between a Citrullus mucosospermus plant (RZ907-03) with pale colored seeds of normal size and a Citrullus lanatus var. lanatus plant (RZ907-01) with dark colored microseeds, plants in which both the microseed trait and the pale seed trait were fixed were selected using markers SEQ ID NO: 14-19. Besides selecting for these two traits, plants were also selected for having red fruit flesh with a high brix level. These selected plants were homozygous for both the modified PPO gene and the deletion of the 13962 bp between base pair positions 29902114 and 29916077 on Chromosome 2 of the Citrullus lanatus 97103_vl genome. Figure 1 shows a picture of a mature fruit of such a plant. The mature fruits of the selected plants have red fruit flesh with pale colored microseeds. The brix levels of mature fruits of these plants vary between 9.0 and 14.4 degrees Brix. Seeds resulting from a self-pollination of such a plant, with an average brix level of mature fruits of 10.0 degrees Brix (standard deviation 0.87), were deposited at the NCIMB under deposit Accession number 43364. Table 5 and Table 6 present the color and seed size data gathered on fully developed, mature and dried seeds as deposited and on seeds of control varieties with big or medium sized seeds and/or seeds with a dark color.

Table 5: Average seed color of fully developed, mature, dried seeds in calibrated RGB values and in calibrated CIE L*a*b* values (10°/D65) for seeds of the deposit NCIMB 43364 with a pale seed color and microseed size and for brown (variety Giong) and black seeds (RZ907-1) not comprising the modified PPO gene of the invention, as determined by image analysis. Photographs were taken on a black background.

Table 6: Average seed size of fully developed mature dried seeds for seeds of the deposit NCIMB 43364 with a pale seed color and microseed size and for medium (variety Giong) and big (RZ907-04) sized seeds not comprising the genomic deletion on chromosome 2 of this invention, as determined visually, by weighing with a precision balance and by measuring with a caliper.

Figure 2 shows seeds as deposited at the NCIMB under deposit Accession number 43364 with a pale color and a microseed size (left), and seeds of a wild type publicly available watermelon variety that are black and have a big seed size (right). The scale bar represents 1 cm.

EXAMPLE 5

Modifying the PPO gene to produce the pale seed color trait

Seeds of the watermelon plants of interest with dark colored seeds are mutagenized in order to introduce mutations into the genome. Mutagenesis is achieved using chemical means, such as EMS treatment, physical means, such as fast neutron (FN) radiation, or specific targeted means, such as CRISPR.

The skilled person is familiar with chemical, radiation and targeted means for introducing mutations into a genome.

Mutagenized seed is then germinated, and the resultant plants are selfed or crossed to produce M2 seed. A TILLING screen is performed for PPO gene modifications responsible for the pale seed color trait. PPO gene modifications are identified by comparison with the wild type PPO DNA sequences listed in SEQ ID NO: 1 and SEQ ID NO: 3. The skilled person is also familiar with TILLING (McCallum et al. (2000) Nature Biotechnology, 18: 455-457) and with techniques for identifying nucleotide changes, such as DNA sequencing, amongst others.
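EMS predominantly induces G/C to A/T transitions, which on the coding strand appear as G>A or C>T changes. One way to judge which parts of a gene are most likely to yield knockout alleles in an EMS TILLING population is to list the codons that a single such change turns into a stop codon (CAA, CAG and CGA via C>T; TGG via G>A). The sketch below does this for a toy sequence; it is a hypothetical helper, not part of the described screen, and the PPO sequences of SEQ ID NO: 1 and SEQ ID NO: 3 are not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
EMS_CHANGES = {"G": "A", "C": "T"}  # G/C -> A/T transitions as seen on the coding strand


def ems_nonsense_sites(cds: str):
    """List (codon_number, codon, mutant_codon, nt_position) for every single
    G>A or C>T change that creates an in-frame stop codon (1-based numbering).
    Scanning ends at the first existing in-frame stop codon."""
    cds = cds.upper()
    sites = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            break
        for j, base in enumerate(codon):
            if base in EMS_CHANGES:
                mutant = codon[:j] + EMS_CHANGES[base] + codon[j + 1:]
                if mutant in STOP_CODONS:
                    sites.append((i // 3 + 1, codon, mutant, i + j + 1))
    return sites


# Toy coding sequence (not a PPO sequence): ATG CAA TGG CGA TAA
for site in ems_nonsense_sites("ATGCAATGGCGATAA"):
    print(site)
```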

Watermelon plants with a modified PPO gene can be identified and selected on the basis of modifications to the PPO gene. Preferably, PPO gene knockout mutants (encoding a premature stop codon) are selected, but PPO amino acid change mutants can also result in a pale seed color. Amino acid change mutants that are most likely to be deleterious to the function of the protein, and thus most likely to result in a pale seed color, can be selected using a predictive tool such as SIFT or PROVEAN. Mutants are homozygous or are made homozygous by selfing, crossing or doubled haploid techniques, which are familiar to the skilled person. Seed color of said homozygous plants can then be analyzed visually, with a colorimeter or by image analysis, to confirm that they have a pale seed color.
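The selection logic described above (knockout alleles first, then amino acid changes predicted to be deleterious) can be sketched as a simple filter over candidate mutants. The cutoffs used here are the commonly cited defaults, a SIFT score below 0.05 and a PROVEAN score at or below -2.5, and are assumptions to be checked against the tool versions actually used; the record layout, effect labels and example entries are hypothetical, and the scores themselves would come from running SIFT or PROVEAN separately.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateMutant:
    plant_id: str
    effect: str                        # hypothetical labels: "stop_gained" or "missense"
    sift_score: Optional[float] = None
    provean_score: Optional[float] = None


def predicted_deleterious(m: CandidateMutant, sift_cutoff: float = 0.05,
                          provean_cutoff: float = -2.5) -> bool:
    """True if either tool's score passes its (assumed default) cutoff."""
    sift_hit = m.sift_score is not None and m.sift_score < sift_cutoff
    provean_hit = m.provean_score is not None and m.provean_score <= provean_cutoff
    return sift_hit or provean_hit


def prioritize(candidates: List[CandidateMutant]) -> List[CandidateMutant]:
    """Knockout (premature stop) mutants first, then predicted-deleterious missense mutants."""
    knockouts = [m for m in candidates if m.effect == "stop_gained"]
    missense = [m for m in candidates if m.effect == "missense" and predicted_deleterious(m)]
    return knockouts + missense


candidates = [
    CandidateMutant("M2-001", "missense", sift_score=0.40, provean_score=-0.8),
    CandidateMutant("M2-002", "stop_gained"),
    CandidateMutant("M2-003", "missense", sift_score=0.01),
    CandidateMutant("M2-004", "missense", provean_score=-4.2),
]
print([m.plant_id for m in prioritize(candidates)])  # ['M2-002', 'M2-003', 'M2-004']
```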

EXAMPLE 6

Modifying the HLS1 gene and/or the BAG4 gene to produce the microseed trait

Seeds of the watermelon plants of interest with medium or big seeds are mutagenized in order to introduce mutations into the genome. Mutagenesis is achieved using chemical means, such as EMS treatment, physical means, such as fast neutron (FN) radiation, or specific targeted means, such as CRISPR. The skilled person is familiar with chemical, radiation and targeted means for introducing mutations into a genome.

Mutagenized seed is then germinated, and the resulting plants are selfed or crossed to produce M2 seed. A TILLING screen is performed for BAG4 and/or HLS1 gene modifications responsible for the microseed trait. HLS1 gene modifications are identified by comparison with the wild-type HLS1 DNA sequences listed in SEQ ID NO: 7 and SEQ ID NO: 8. BAG4 gene modifications are identified by comparison with the wild-type BAG4 DNA sequences listed in SEQ ID NO: 10 and SEQ ID NO: 11. The skilled person is also familiar with TILLING (McCallum et al. (2000) Nature Biotechnology, 18: 455-457) and with techniques for identifying nucleotide changes, such as DNA sequencing, amongst others.

Watermelon plants with a non-functional HLS1 gene can be identified and selected on the basis of deleterious mutations in the HLS1 gene. Preferably, HLS1 knockout mutants (carrying a premature stop codon) are selected, but HLS1 amino acid substitution mutants can also result in a microseed size. The amino acid substitutions most likely to be deleterious to protein function, and thus to result in a microseed size, can be selected using a tool such as SIFT or PROVEAN.
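Whether a candidate allele is a knockout of the preferred kind can be checked by translating its coding sequence and comparing where the open reading frame ends. The sketch below uses the standard genetic code and assumes the input is a spliced coding sequence (such as one derived from SEQ ID NO: 8) that starts at the ATG. Frameshifting indels are also caught, because translation continues in the shifted frame until it meets a stop. The function names and the short example sequences are illustrative; the sketch does not replace SIFT/PROVEAN-type scoring of missense changes.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate_to_stop(cds: str) -> str:
    """Translate from the first codon up to (not including) the first stop codon.

    Raises KeyError on codons containing characters other than A, C, G, T.
    """
    seq = cds.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_premature_stop(wt_cds: str, mutant_cds: str) -> bool:
    """True if the mutant open reading frame is shorter than the wild-type one."""
    return len(translate_to_stop(mutant_cds)) < len(translate_to_stop(wt_cds))


if __name__ == "__main__":
    wt = "ATGTGGCAGCTATAA"   # M W Q L stop
    mut = "ATGTGACAGCTATAA"  # TGG -> TGA, a nonsense change typical of EMS
    print(translate_to_stop(wt), translate_to_stop(mut), is_premature_stop(wt, mut))
    # prints: MWQL M True
```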

Watermelon plants with a non-functional BAG4 gene can be identified and selected on the basis of deleterious mutations in the BAG4 gene. Preferably, BAG4 knockout mutants (carrying a premature stop codon) are selected, but BAG4 amino acid substitution mutants can also result in a microseed size. The amino acid substitutions most likely to be deleterious to protein function, and thus to result in a microseed size, can be selected using a tool such as SIFT or PROVEAN. Mutants are homozygous or are made homozygous by selfing, crossing or doubled haploid techniques familiar to the skilled person. The seed size of said homozygous plants can then be analyzed visually, by measurement or by image analysis, to confirm that the microseed size results from one or more modifications to the HLS1 gene and/or the BAG4 gene.
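Measuring seed size by image analysis typically reduces to segmenting the seeds in a photograph and measuring each object. As a hedged sketch, assuming a thresholded binary mask (1 = seed pixel) and a known image scale in millimetres per pixel, the Python below treats each 4-connected region as one seed and converts its area to mm². It is not the applicant's image-analysis procedure; in practice a library routine (for example scikit-image's `measure.label`) would normally do the labelling, and the mask in the example is synthetic.

```python
from collections import deque
from typing import List


def component_areas(mask: List[List[int]]) -> List[int]:
    """Pixel areas of 4-connected foreground (value 1) regions, in scan order."""
    rows, cols = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill of one seed.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def seed_areas_mm2(mask: List[List[int]], mm_per_pixel: float) -> List[float]:
    """Convert each seed's pixel area to mm^2 (one pixel = mm_per_pixel squared)."""
    return [a * mm_per_pixel ** 2 for a in component_areas(mask)]


if __name__ == "__main__":
    mask = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 1, 0],
    ]
    print(component_areas(mask))      # [4, 3]
    print(seed_areas_mm2(mask, 0.5))  # [1.0, 0.75]
```

Comparing the resulting area distribution of homozygous mutants with that of the medium- or large-seeded starting material is one way to support the visual assessment; any size cut-off for "microseed" would have to come from such controls.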

SEQUENCE INFORMATION:

Table 1: PPO, HLS1 and BAG4 gene and protein sequences and their corresponding SEQ ID NOs (“CDS”: Coding sequence).

SEQ ID NO: 1 > PPO_WT_gDNA

ATGGCCTCTCTATCTCCTTCCATGCCACTAGCACTTTCCTCCGCCGCAATAACCACG GC

C ACC ACCGGCGGCGCCT CCTTT GGTCT GTTTT AT CGT A A A A A A A A AG AT CC AT CTT CCA

CCATTCATAGACTCAATAACTTGGTTGTGTGTAGCGGCTCCAATGGCAGTGGTGAAG A

AAGTAATAACTCATTATGGCCAGGCAAGTTTGTTGACCGGAGAGAAGCGCTTATCGG G

CTCGGCGGT CTGT AT GGCTC AGCTT C A AGT GCTTTTGGAGTT GAT CCCTT CGCTTT GGC

AGCTCCAGTCACAACCCCCGACCCTTCCAAGTGTGGATCAAGCACGGACTTGGCAGA T

GGCGTCAAAGATTTGGTTTGCTGCCCACCATCCACCAATAACGTAAAACCCTTCCTC A

A ACC ACGCGTT AGGA A AGCGGC AC A ATC ATT AG AT A A AG A AT AT ATTGA A A AGT AT A

AGGAAGCCGTAGCGCTTATGAAAGCGCTTCCTGATGATGATCCACGTAGTTTTAAAC A

GCAAGCACTTGTTCACTGTGCTTATTGTACTGGGGGTTACGATCAATTGGGTCTTCC AG GATAAAAGTTTATCGCGGTTTATCACGGATAGACAATGAAATATTTGCAAATAATTTG ACTTTTTTTGCT AT ATTTGA A A AC A ACCCT A A A AT GTT A AT ATGT A A A ATTTTT GTT GA ATT ATTTTT ATATATTGAATTAAAATT AACAACTC AATTTC AATTT AT ATT ATGAAGAT TTTCTTTTCAAAATGCTAAAATATTATTGTTTGAAGTTTCTGGAGAATAAACTTTCCAA AAAATTAT G A A ATT A A AT AAAATTAAAAT A A AT AT AT GT ATATATAATT GT ACC AT A A GTAAATCTTTT AT AAAAGAATT AT AACAAGTTGGGGTTATTTGAAGAGAAC ACAT AAA

GTGGACGGGTGACCCGAACCAATCGAACCGAATTGACATGGGAACCTTCTTCTCAGC G

GCTAGAGATCCCATCTTCTACGCCCACCACGCGAACGTGGACCGTTTTTGGTCCATA T

GGA A AT CCTT AGGCGA A A AGCGAC A AGAC ATT A A AG AC A A AG ATTT CCT A A ACGCTT

CCTTTGTATTCTACGATGAGAATGGTGAAGCTGTCCGAGTTTATGTCAAAGACTGTC TA

GAT ACC AGAGCCTT AGGCT AT GT CT ACGATGAC ACCGT ACC A ATT CC AT GGCT C A A A A

CACCTCCAACCCCACGAGTACCACGCACACCCAACAAAACCAAGAAGAAATCTACCA

AGAAGACCGGGAAGCTACCGTCGAGTGTTGACAAGATCGTCAGCTTCGAAGTCAAGA

GGCCGAAGAAATCGAGGGGTACGAAGGAGAAAGACGATGAAGAGGAGATTTTGGTG

ATT GAT GGG ATT G AGTT CG ACGG A A AC A AGGCT ATT A AGTTT GAT GTTTTT ATC A ATG

ACGAGGATGATAGGGAAATTAGAGCGGATAATTCTGAGTTTGCAGGGAGCTTTGTGA

ATGTGCCTCATATGAAAGGTAGCAGCAGCATGAACATAAAAACATGCCTTAGGTTAG

GG AT A ACT G A ACT GCTT GAG AGTTT GG AT GCGG AT A ACG ACG AT AGC ATT ATT GTT AC

ATTGGTCCCTAGGTTTGGAGATGGGTCCGCCACCGTTAAGGACATTAGAATTGAATA T

GATGCATAA

SEQ ID NO: 2 > PPO_MUT_gDNA_with INDEL

ATGGCCTCTCTATCTCCTTCCATGCCACTAGCACTTTCCTCCGCCGCAATAACCACG GC C ACC ACCGGCGGCGCCT CCTTT GGTCT GTTTT AT CGT A A A A A A A A AG AT CC AT CTT CCA CCATTCATAGACTCAATAACTTGGTTGTGTGTAGCGGCTCCAATGGCAGTGGTGAAGA AAGTAATAACTCATTATGGCCAGGCAAGTTTGTTGACCGGAGAGAAGCGCTTATCGGG

CTCGGCGGT CTGT AT GGCTC AGCTT C A AGT GCTTTTGGAGTT GAT CCCTT CGCTTT GGC

AGCTCCAGTCACAACCCCCGACCCTTCCAAGTGTGGATCAAGCACGGACTTGGCAGA T

GGCGTCAAAGATTTGGTTTGCTGCCCACCATCCACCAATAACGTAAAACCCTTCCTC A

A ACC ACGCGTT AGGA A AGCGGC AC A ATC ATT AG AT A A AG A AT AT ATTGA A A AGT AT A

AGGAAGCCGTAGCGCTTATGAAAGCGCTTCCTGATGATGATCCACGTAGTTTTAAAC A

GCAAGCACTTGTTCACTGTGCTTATTGTACTGGGGGTTACGATCAATTGGGTCTTCC AG

ATGGAAATCCTTAGGCGAAAAGCGACAAGACATTAAAGACAAAGATTTCCTAAACGC TTCCTTTGTATTCTACGATGAGAATGGTGAAGCTGTCCGAGTTTATGTCAAAGACTGTC T AGAT ACC AGAGCCTT AGGCT AT GT CT ACGATGAC ACCGT ACC A ATT CC AT GGCT C A A AACACCTCCAACCCCACGAGTACCACGCACACCCAACAAAACCAAGAAGAAATCTAC C A AGA AGACCGGG A AGCT ACCGT CGAGT GTTGAC A AGATCGTC AGCTT CGA AGT C A A GAGGCCGAAGAAATCGAGGGGTACGAAGGAGAAAGACGATGAAGAGGAGATTTTGG T GATT GAT GGG ATT G AGTT CG ACGG A A AC A AGGCT ATT A AGTTT GAT GTTTTT AT C A A TGACGAGGATGATAGGGAAATTAGAGCGGATAATTCTGAGTTTGCAGGGAGCTTTGTG AATGTGCCTCATATGAAAGGTAGCAGCAGCATGAACATAAAAACATGCCTTAGGTTA GGG AT A ACT G A ACT GCTT GAG AGTTT GG AT GCGG AT A ACG ACG AT AGCATTATT GTT A C ATTGGT CCCT AGGTTT GGAGATGGGT CCGCC ACCGTT A AGGAC ATT AGA ATTGA AT A TGATGCATAA

SEQ ID NO: 3 > PPO_WT_CDS

ATGGCCTCTCTATCTCCTTCCATGCCACTAGCACTTTCCTCCGCCGCAATAACCACG GC

C ACC ACCGGCGGCGCCT CCTTT GGTCT GTTTT AT CGT A A A A A A A A AG AT CC ATCTT CCA

CCATTCATAGACTCAATAACTTGGTTGTGTGTAGCGGCTCCAATGGCAGTGGTGAAG A

AAGTAATAACTCATTATGGCCAGGCAAGTTTGTTGACCGGAGAGAAGCGCTTATCGG G

CTCGGCGGT CTGT AT GGCTC AGCTT C A AGT GCTTTTGGAGTT GAT CCCTT CGCTTT GGC

AGCTCCAGTCACAACCCCCGACCCTTCCAAGTGTGGATCAAGCACGGACTTGGCAGA T

GGCGTCAAAGATTTGGTTTGCTGCCCACCATCCACCAATAACGTAAAACCCTTCCTC A

A ACC ACGCGTT AGGA A AGCGGC AC A ATC ATT AG AT A A AGA AT AT ATTGA A A AGT AT A

AGGAAGCCGTAGCGCTTATGAAAGCGCTTCCTGATGATGATCCACGTAGTTTTAAAC A

GCAAGCACTTGTTCACTGTGCTTATTGTACTGGGGGTTACGATCAATTGGGTCTTCC AG

TTCATGAGAGAATATTGGGGTCTTTGATTAAGGATCCTGATTTTGCATTGCCGTTTT GG AATT ATGACGCTCCGCAAGGGATGGAAAT ACCAAAAATCT AC ACGG AT AAAAGTTCG TC ACT CT AT GAT GC ATTT CGTGACGGACGT CAT C AGCCGCCGAC ATT GGTT GATTTGGA TT AC A AT GACGTT GAGCC A AC A AT A AGC AGAGA A A AGAT A AT CC A ATGC A ATCT A AG TGTTATGTATCGCCAGGTCGTGTCCGGCGCCCGTACGCCCTTGCTCTTTTTCGGCCAGC CTTATCGAAGTGGCAGCAACCCAAGTCCAGGAATGGGGACGGTGGAAAACCTTCCTC ACAATTCAATTCATTTGTGGACGGGTGACCCGAACCAATCGAACCGAATTGACATGGG AACCTTCTTCTCAGCGGCTAGAGATCCCATCTTCTACGCCCACCACGCGAACGTGGAC

GATTTCCTAAACGCTTCCTTTGTATTCTACGATGAGAATGGTGAAGCTGTCCGAGTT TA T GT C A A AGACTGT CT AGAT ACC AGAGCCTT AGGCT ATGTCT ACG AT GAC ACCGT ACC A ATTCCATGGCTCAAAACACCTCCAACCCCACGAGTACCACGCACACCCAACAAAACCA AGAAGAAATCTACCAAGAAGACCGGGAAGCTACCGTCGAGTGTTGACAAGATCGTCA GCTTCGAAGTCAAGAGGCCGAAGAAATCGAGGGGTACGAAGGAGAAAGACGATGAA GAGGAGATTTTGGT GATT GAT GGGATTGAGTTCGACGGA A AC A AGGCT ATT A AGTTT G AT GTTTTT ATC A AT G ACG AGG AT GAT AGGG A A ATT AG AGCGG AT A ATTCT G AGTTT GC AGGGAGCTTTGTGAATGTGCCTCATATGAAAGGTAGCAGCAGCATGAACATAAAAAC AT GCCTT AGGTT AGGG AT A ACT G A ACT GCTT GAG AGTTT GG AT GCGG AT A ACG ACG AT AGC ATT ATT GTT AC ATT GGT CCCT AGGTTT GGAGATGGGT CCGCC ACCGTT A AGG AC A TT AG A ATT G A AT AT GAT GC AT A A

SEQ ID NO: 4 >PPO_MUT_CDS

ATGGCCTCTCTATCTCCTTCCATGCCACTAGCACTTTCCTCCGCCGCAATAACCACG GC

C ACC ACCGGCGGCGCCT CCTTT GGTCT GTTTT AT CGT A A A A A A A A AG AT CC ATCTT CCA

CCATTCATAGACTCAATAACTTGGTTGTGTGTAGCGGCTCCAATGGCAGTGGTGAAG A

AAGTAATAACTCATTATGGCCAGGCAAGTTTGTTGACCGGAGAGAAGCGCTTATCGG G

CTCGGCGGT CTGT AT GGCTC AGCTT C A AGT GCTTTTGGAGTT GAT CCCTT CGCTTT GGC

AGCTCCAGTCACAACCCCCGACCCTTCCAAGTGTGGATCAAGCACGGACTTGGCAGA T

GGCGTCAAAGATTTGGTTTGCTGCCCACCATCCACCAATAACGTAAAACCCTTCCTC A

A ACC ACGCGTT AGGA A AGCGGC AC A ATC ATT AG AT A A AG A AT AT ATTGA A A AGT AT A

AGGAAGCCGTAGCGCTTATGAAAGCGCTTCCTGATGATGATCCACGTAGTTTTAAAC A

GCAAGCACTTGTTCACTGTGCTTATTGTACTGGGGGTTACGATCAATTGGGTCTTCC AG

TTCATGAGAGAATATTGGGGTCTTTGATTAAGGATCCTGATTTTGCATTGCCGTTTT GG A ATT ATGACTGCTCCGC A AGGGAT GGA A AT ACC A A A A ATCT AC ACGG AT A A A AGTT C GTCACTCTATGATGCATTTCGTGACGGACGTCATCAGCCGCCGACATTGGTTGATTTGG ATT AC A AT GACGTT GAGCC A AC A AT A AGC AG AGA A A AGAT A ATCC A AT GCA ATCT A A GTGTTATGTATCGCCAGGTCGTGTCCGGCGCCCGTACGCCCTTGCTCTTTTTCGGCCAG CCTTATCGAAGTGGCAGCAACCCAAGTCCAGGAATGGGGACGGTGGAAAACCTTCCT C AC A ATTC A ATT C ATTT GT GGACGGGT GACCCGA ACC A AT CGA ACCGA ATT GAC AT GG GAACCTTCTTCTCAGCGGCTAGAGATCCCATCTTCTACGCCCACCACGCGAACGTGGA CCGTTTTT GGTCC AT AT GGA A ATCCTT AGGCGA A A AGCG AC A AG AC ATT A A AGAC A A A GATTTCCTAAACGCTTCCTTTGT ATTCT ACGATGAGAATGGTGAAGCTGTCCGAGTTTA TGT C A A AGACTGT CT AGAT ACC AGAGCCTT AGGCT ATGTCT ACG AT GAC ACCGT ACC A ATTCCATGGCTCAAAACACCTCCAACCCCACGAGTACCACGCACACCCAACAAAACCA AGAAGAAATCTACCAAGAAGACCGGGAAGCTACCGTCGAGTGTTGACAAGATCGTCA GCTTCGAAGTCAAGAGGCCGAAGAAATCGAGGGGTACGAAGGAGAAAGACGATGAA GAGGAGATTTTGGT GATT GAT GGGATTGAGTTCGACGGA A AC A AGGCT ATT A AGTTT G AT GTTTTT ATC A AT G ACG AGG AT GAT AGGG A A ATT AG AGCGG AT A ATTCT G AGTTT GC AGGGAGCTTTGTGAATGTGCCTCATATGAAAGGTAGCAGCAGCATGAACATAAAAAC AT GCCTT AGGTT AGGG AT A ACT G A ACT GCTT GAG AGTTT GG AT GCGG AT A ACG ACG AT AGC ATT ATT GTT AC ATT GGT CCCT AGGTTT GGAGATGGGT CCGCC ACCGTT A AGG AC A TT AG A ATT G A AT AT GAT GC AT A A

SEQ ID NO: 5 > PPO_WT_protein

MASLSPSMPLALSSAAITTATTGGASFGLFYRKKKDPSSTIHRLNNLVVCSGSNGSGEESN

NSLWPGKFVDRREALIGLGGLYGSASSAFGVDPFALAAPVTTPDPSKCGSSTDLADGVKD

LVCCPPSTNNVKPFLKPRVRKAAQSLDKEYIEKYKEAVALMKALPDDDPRSFKQQALVHC

AYCTGGYDQLGLPVELQVHFSWLFFPFHRFYLYFHERILGSLIKDPDFALPFWNYDAPQG

MEIPKIYTDKSSSLYDAFRDGRHQPPTLVDLDYNDVEPTISREKIIQCNLSVMYRQVVSGA

RTPLLFFGQPYRSGSNPSPGMGTVENLPHNSIHLWTGDPNQSNRIDMGTFFSAARDPIFYA

HHANVDRFWSIWKSLGEKRQDIKDKDFLNASFVFYDENGEAVRVYVKDCLDTRALGYV

YDDTVPIPWLKTPPTPRVPRTPNKTKKKSTKKTGKLPSSVDKIVSFEVKRPKKSRGTKEKD

DEEEILVIDGIEFDGNKAIKFDVFINDEDDREIRADNSEFAGSFVNVPHMKGSSSMNIKTCL

RLGITELLESLDADNDDSIIVTLVPRFGDGSATVKDIRIEYDA

SEQ ID NO: 6 > PPO_MUT_protein

MASLSPSMPLALSSAAITTATTGGASFGLFYRKKKDPSSTIHRLNNLVVCSGSNGSGEESN

NSLWPGKFVDRREALIGLGGLYGSASSAFGVDPFALAAPVTTPDPSKCGSSTDLADGVKD

LVCCPPSTNNVKPFLKPRVRKAAQSLDKEYIEKYKEAVALMKALPDDDPRSFKQQALVHC

AYCTGGYDQLGLPVELQVHFSWLFFPFHRFYLYFHERILGSLIKDPDFALPFWNYDCSARD

GNTKNLHG
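Because SEQ ID NO: 6 is the product of the INDEL allele (SEQ ID NO: 2), comparing it with SEQ ID NO: 5 shows where the mutation takes effect at the protein level. The sketch below embeds both protein listings with the spaces introduced during text extraction removed (an assumption about the intended sequences) and reports the first differing residue. After that point the mutant continues with a short run of unrelated residues before terminating, the pattern expected from a frameshift.

```python
# SEQ ID NO: 5 (wild-type PPO) with extraction spaces removed.
PPO_WT = (
    "MASLSPSMPLALSSAAITTATTGGASFGLFYRKKKDPSSTIHRLNNLVVCSGSNGSGEESN"
    "NSLWPGKFVDRREALIGLGGLYGSASSAFGVDPFALAAPVTTPDPSKCGSSTDLADGVKD"
    "LVCCPPSTNNVKPFLKPRVRKAAQSLDKEYIEKYKEAVALMKALPDDDPRSFKQQALVHC"
    "AYCTGGYDQLGLPVELQVHFSWLFFPFHRFYLYFHERILGSLIKDPDFALPFWNYDAPQG"
    "MEIPKIYTDKSSSLYDAFRDGRHQPPTLVDLDYNDVEPTISREKIIQCNLSVMYRQVVSGA"
    "RTPLLFFGQPYRSGSNPSPGMGTVENLPHNSIHLWTGDPNQSNRIDMGTFFSAARDPIFYA"
    "HHANVDRFWSIWKSLGEKRQDIKDKDFLNASFVFYDENGEAVRVYVKDCLDTRALGYV"
    "YDDTVPIPWLKTPPTPRVPRTPNKTKKKSTKKTGKLPSSVDKIVSFEVKRPKKSRGTKEKD"
    "DEEEILVIDGIEFDGNKAIKFDVFINDEDDREIRADNSEFAGSFVNVPHMKGSSSMNIKTCL"
    "RLGITELLESLDADNDDSIIVTLVPRFGDGSATVKDIRIEYDA"
)

# SEQ ID NO: 6 (mutant PPO) with extraction spaces removed.
PPO_MUT = (
    "MASLSPSMPLALSSAAITTATTGGASFGLFYRKKKDPSSTIHRLNNLVVCSGSNGSGEESN"
    "NSLWPGKFVDRREALIGLGGLYGSASSAFGVDPFALAAPVTTPDPSKCGSSTDLADGVKD"
    "LVCCPPSTNNVKPFLKPRVRKAAQSLDKEYIEKYKEAVALMKALPDDDPRSFKQQALVHC"
    "AYCTGGYDQLGLPVELQVHFSWLFFPFHRFYLYFHERILGSLIKDPDFALPFWNYDCSARD"
    "GNTKNLHG"
)


def first_difference(a: str, b: str) -> int:
    """1-based position of the first differing residue.

    If one sequence is a prefix of the other, returns the position just past
    the end of the shorter one.
    """
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return i
    return min(len(a), len(b)) + 1


if __name__ == "__main__":
    pos = first_difference(PPO_WT, PPO_MUT)
    print(f"first difference at residue {pos}: "
          f"{PPO_WT[pos - 1]} (WT) vs {PPO_MUT[pos - 1]} (mutant)")
    print(f"WT length {len(PPO_WT)}, mutant length {len(PPO_MUT)}")
```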

SEQ ID NO: 7 >HLS1_WT_gDNA

ATGGGGTTTAAAGGCTTTGTTATTCGAAGCTACGAAGAGAGTCAATTATCAGATAAA G

CTCAAGTTATGGATCTTGAACGAAGATGTGAAATTGGCCAATCAAAACGTGTGTTTC T

CTTCACTGACACTTTGGGTGACCCCATTTGTAGGATACGTAACAGTCCCATGTATAA A

AATTTGTGAATTGAAAATTTTTTATTATTAAGGTTGCTGAGCGGGACAAGGAAGTGG T TGGTGTT ATTC A AGGCT CT AT A A A ACCGGTTTTTTTT ACT GCT CAT A A ACCGCCGCCCG GTTTGGTGGTTAAACTGGGCTACATTCTTGGCCTGAGAGTGGCACCGCCGTATCGCCG CCGTGGAATTGGCTCTAGCCTCGTCCGCCGTTTGGAAGATTGGTTCCTTTCTAATGATG

TTGATTACTGTTGTATGGCCACTGAGAAAGATAATCATGCCTCTCTTAATCTCTTCA TC

C A ACT CGT A A ATGA AGTT A A ACTTGA A AGTTT AGAGGC AT ATT AGA A ATTT CTTT AAA T ATT CTTTTTCCC A AC AGGT AC AT A A AGTTT AGA AC AGGA AGGAT CTT GGT A A ACCC A GT A AG A A ATC ATCC AT AC A AT AT G A ATT CAT C AG A A AT C A AC ATT C A A A AGCT A A A A AT AGA AGA AGC AG A AGC A AT AT AC A A A A A AC AC ATGGCTTC A AC AGAGTT CTT CCCC AAAGACATAAAAAACATATTGAAAAACAAGTTGAGTTTAGGGACATGGGTGGCAAAT TTCAAACAACCGCCATGGTCGTCGTCGAACTCTGTTGGAGGAAACGGGCAGACTATGG CGAGT AGCTGGGCC ATT GT A AGTCT AT GGA AC AGT GGGGA AGTTTT C A AGCT A AGGCT AGGAAAAGCACCATTTCCATGGCTTATCTACACAAAGAGTTTAAAAATTATGGATAAA

TTT GTTT AT GG ATT GC ACC AT G A AGGCCCTTTTT CT GAG AG ATT GGTT GG AGCTTT GT G CAAATTTGTGCACAATATGGCATTGAATAATTCAAAGGATAATTGTAAAGCTATTGTT ACTGAGATTGGAGGTGATGAGGATGATGGGCTGAAAATGGAGATTCCTCATTGGAAA TT GCT ATC AT GTT AT G A AG ATTTTT GGT GC AT A A AGT CCTT G A A A AGT A AG AG AT AT A AT A AT ATT AGT A AT GAT A AT GAT A ACGAT A ACGAT C ACGATC ATC AT AT ATTGGA ATG GAC A A ATGCCT C ACCT A AT AG A ACT CT CTTTGT AGACCC A AGAGAGGT AT A A

SEQ ID NO: 8 >HLS1_WT_CDS

ATGGGGTTTAAAGGCTTTGTTATTCGAAGCTACGAAGAGAGTCAATTATCAGATAAA G

CTCAAGTTATGGATCTTGAACGAAGATGTGAAATTGGCCAATCAAAACGTGTGTTTC T

CTTCACTGACACTTTGGGTGACCCCATTTGTAGGATACGTAACAGTCCCATGTATAA A

ATGCTGGTTGCTGAGCGGGACAAGGAAGTGGTTGGTGTTATTCAAGGCTCTATAAAAC

CTTGGCCTGAGAGTGGCACCGCCGTATCGCCGCCGTGGAATTGGCTCTAGCCTCGTCCGCCGTTTGGAAGATTGGTTCCTTTCTAATGATGTTGATTACTGTTGTATGGCCACTGAGAAAGATAATCATGCCTCTCTTAATCTCTTCATCAATAATTTGAGGTACATAAAGTTTAGAACAGGAAGGATCTTGGTAAACCCAGTAAGAAATCATCCATACAATATGAATTCATCAGAAATCAACATTCAAAAGCTAAAAATAGAAGAAGCAGAAGCAATATACAAAAAACACATGGCTTCAACAGAGTTCTTCCCCAAAGACATAAAAAACATATTGAAAAACAAGTTGAGTTTAGGGACATGGGTGGCAAATTTCAAACAACCGCCATGGTCGTCGTCGAACTCTGTTGGAGGAAACGGGCAGACTATGGCGAGTAGCTGGGCCATTGTAAGTCTATGGAACAGTGGGGAAGTTTTCAAGCTAAGGCTAGGAAAAGCACCATTTCCATGGCTTATCTACAC

AGTCCTTGAAAAGTAAGAGATATAATAATATTAGTAATGATAATGATAACGATAACGATCACGATCATCATATATTGGAATGGACAAATGCCTCACCTAATAGAACTCTCTTTGTAGACCCAAGAGAGGTATAA

SEQ ID NO: 9 > HLS1_WT_protein

MGFKGFVIRSYEESQLSDKAQVMDLERRCEIGQSKRVFLFTDTLGDPICRIRNSPMYKMLV

AERDKEVVGVIQGSIKPVFFTAHKPPPGLVVKLGYILGLRVAPPYRRRGIGSSLVRRLEDW

FLSNDVDYCCMATEKDNHASLNLFINNLRYIKFRTGRILVNPVRNHPYNMNSSEINIQKLKI

EEAEAIYKKHMASTEFFPKDIKNILKNKLSLGTWVANFKQPPWSSSNSVGGNGQTMASSW

AIVSLWNSGEVFKLRLGKAPFPWLIYTKSLKIMDKIFPCFKVVLVPNFFKPFGFYFVYGLH

HEGPFSERLVGALCKFVHNMALNNSKDNCKAIVTEIGGDEDDGLKMEIPHWKLLSCYEDF

WCIKSLKSKRYNNISNDNDNDNDHDHHILEWTNASPNRTLFVDPREV

SEQ ID NO: 10 >BAG4_WT_gDNA

ATGAAAAAATGGTGTTCAAAAGGAAGCCAAATTAGGAGCGAAGAGTATGGAAGAGG AGACGTAGATTGGGAGCTCCGACCAGGTGGAATGATTGTTCAGAAACGACATGTCGG GT CGGGTT CGGGTT C A A ATTCGG AGCGTTTC ATT AC A AT C A ACGT ATCTC AT GGGT CTT ATCGTCATCAAATCACCGTCGATTCTCATTCCACATTTGGTATGTTATCATTTCAATTT

TTTT GT GTTT GGGAC AGGGA ATTT A A AGAC AGTTTT ACGAC AGC AGAC AGGGTT AG AG CCGAGGG A AC AGAGATT GTT GTTT A A AGGGA AGGAGA AGGAGA ACGACGAGTGGTT G CAT ATGGCCGGTGTGAACGACATGTCGAAACTCATACTCATGGAAGATCCTGCT ACTA AAGAGAGGAAGCTTCAAGAGATGAAGAAGAAGAATACCACTGCTGCAGGCGAAGCA CTGGCGGGGATC AGAGCGGAGGT CGAT A A ACTCT CCGA A A AGGTT CGTT A A AT CGTT A A ATT AC A ACTTT AGT CGA A A A AT AT ATTTGAGA A A ATT GT A A A A ACC ACTCGTGGT A A TT AC AGTT AT ACCTT C A A ACTTTT A AT ATT A A A A ATT A AGT CTTT A A ATTT AT ATT ATT GTTAAAATTGGACTCTTAAATTTTGTTTAATTGTAGAATTGAAGCTCAAAATGATAAA

TTTAAATTCTT A ATTT CTCTAACATTTT GT GTTT A AT A A AT GGGT A AT A ATTT ATTT CAT A AATTTTTT A AGTTC AC ACCT A A A ATTC A ATTTT AT A ACT A A A A A ATT A ATT A A ATTTT ACTTATTT ATTT ATT AT GAT ATTCACATACTTTT A AG AT ATTT G A ATT CT C A AGT G A ATT TTTTTTTAAACAACAAGTTTTTCTGGAAATTGACAAAAAAAAGAAAAAAGTAGTT TTA A AT GCCTT GCTTTT ATTTT ATTTT ATTT A AT G A AGTTTT GAT A AT GAT AC A A AT GTTT AT GT AACAAAAAT G A A A AC ATTT G A A AG A A A A A A AT GGTT ATT AGGT AACATTTCTAAA GTTTAAAAACCTATTTGAGACAAATGTGAAAGTTCAATAACTCATTGAATTGCTTTAG A AGGTTT A A A A A AC A A AT AGTT AC A A AC A AGCTT AG AG ACT A A ACTTCT A ATTGA ACC

TAAAAAAAAAAATT GG A A AGGGGC A A A A A AGT GT G AGT GT ATAAAATT AGGGTTTTT AAGTGCAGGTTGCGGCAGTGGAAGGTAGTGTTAATGACGGGAAGAGGGTGGAAGAGA A AG A AGTT A ATTT ATT GAT AG AGTT GTT GAT GAT GCA ATT GTT G A A ATT AG AT GCA AT T G AG ACGG AT GGGG ATT CCAAACTT C A A AG ACG A ACT C AGGT ATCT ATT GG ACT AT AT GT C A ATT AT CATTAAAAATAAATTT ACTTT GGCTT AT CT ATTT AT A AT A ATT AGGGT AT ACAAATTT A AGGT G ATTT A A AT CGTTTT CTTTCATTAATCTAACTAT GC A A A ACCGTT A C A AC A AT ACGT C ACCTTT A A AT A ACTTT A ACT ATTT ACC A A A ACTTT ATGA AGAGGAA

C ATTT GG AT G A ATT ATATATTTAAAT ATT AT ATTTCT AT GGT AG A A A ATT ATTT ACT AT T ATTT AATCAATTTTAATAATGATGGATTAAATTTAAGTTTT ATT AAATGAATACTTGA A A AT AT G A ATT AAAATTAAATATATATATTTTAATTTTTCAATTTT GGT ATTTATAATA A A A AGT ACGAT AGTTT A ATC ATT A ATGGGTT AGGT GGCT GGT GCT CCCCT AGGGCC AT CAAACTTAAAATAATTAAAAATAATGAAAGTCTCCTAAATTGTATGAAAATTCAATGA ATATAAATT GT G A A A A AT GAT A AT GGGT ATTTT AT CT ATTT ATTT ATT A ACT C A A A A A A A A A A ATT A AT A A AT AT AG ACT A A A A A A ATTGC AGA A AT AGGAC A A A ATGATTTT A AT TCTTTCCCTTGATATGACATTTTTATGTGGGACATTATGAAACCAAGAACTTATCAAGA AGG ATT CT ATT CAAAATAAATAAATAATT GATT A A AG A AG A A A ATT CC ATT A AT GT CC CT A A AGT CTTAATCACACCTCT ATTT AGCGT CT ATC AT G A AT A A A AT A A AT AG A A AT C AT AGG A AT GCTG AGGT GGC AT G A AC ACT AG AT AAAAATTTT AGGTTT A A AT ACT ACTT

TT AT A A ATTT AGTCC AT AT A ATTT G A AG A A AGTT AAAATTTAATCCT AT AGTTT AT A AT TAGAATTTAATCTCTATGATCTGATAAAATCCTCATAAATAATCTCACTACTGTAGAGA CT A A ACT AT AGGGA AC ATTT AT A AGGTTTT ATC A A ACC AT AGG A ACT A ATT CT AG ATT TT A A A ACC A A ATGGACC AGATTTT A ATTTTCT CCA A ACT AC AGGGGCC A A ATTCT A AT TTTTT CT A A ATT AT AGGAGAC A A ATTTGC A ATTT A ACCTTT A ACT AT AGTT A ATTTT GG TCCACTTACTTTCAAAATATCAATTTTAGTCCCGTGGTTTTAAAAAGTCTCCATTTTGG CCCCTT A AC A ATGA AC A A A A AT A AG AT A A A A AT AGT A ATT A A ATTTT A ATTTTGA ACT

ATT A ATTT G A AT A ACGTT AG A ATT GT A ATTTT ATAATTTT GGG A AT A A A AC AGGTT GTT

AGGGTACAGAAATTAGTGGACAGAATTGACAAGTTGAAGGTTAGAATCTCAAATCCT T

TAAACCAAACAACAATGAAAAGAGGCAAATGGGAGGAATTTGAATCTGGATTTGGCA

GCCTTATTCCTCCAACTTCAAAACTCACCATCAGCTCTACAAAAATAACTCATGATT GG

GAACTCTTTGATTAG

SEQ ID NO: 11 >BAG4_WT_CDS

ATGAAAAAATGGTGTTCAAAAGGAAGCCAAATTAGGAGCGAAGAGTATGGAAGAGG

AGACGTAGATTGGGAGCTCCGACCAGGTGGAATGATTGTTCAGAAACGACATGTCGG

GTCGGGTTCGGGTTCAAATTCGGAGCGTTTCATTACAATCAACGTATCTCATGGGTCTT

ATCGTCATCAAATCACCGTCGATTCTCATTCCACATTTGGGAATTTAAAGACAGTTTTA

CGACAGCAGACAGGGTTAGAGCCGAGGGAACAGAGATTGTTGTTTAAAGGGAAGGAG

AAGGAGAACGACGAGTGGTTGCATATGGCCGGTGTGAACGACATGTCGAAACTCATA

CTCATGGAAGATCCTGCTACTAAAGAGAGGAAGCTTCAAGAGATGAAGAAGAAGAAT

ACCACTGCTGCAGGCGAAGCACTGGCGGGGATCAGAGCGGAGGTCGATAAACTCTCC

GAAAAGGTTGCGGCAGTGGAAGGTAGTGTTAATGACGGGAAGAGGGTGGAAGAGAA

AGAAGTTAATTTATTGATAGAGTTGTTGATGATGCAATTGTTGAAATTAGATGCAATT

GAGACGGATGGGGATTCCAAACTTCAAAGACGAACTCAGGTTGTTAGGGTACAGAAA

TTAGTGGACAGAATTGACAAGTTGAAGGTTAGAATCTCAAATCCTTTAAACCAAACAA

CAATGAAAAGAGGCAAATGGGAGGAATTTGAATCTGGATTTGGCAGCCTTATTCCTCCAACTTCAAAACTCACCATCAGCTCTACAAAAATAACTCATGATTGGGAACTCTTTGAT

TAG

SEQ ID NO: 12 >BAG4_protein

MKKWCSKGSQIRSEEYGRGDVDWELRPGGMIVQKRHVGSGSGSNSERFITINVSHGSYRH

QITVDSHSTFGNLKTVLRQQTGLEPREQRLLFKGKEKENDEWLHMAGVNDMSKLILMED

PATKERKLQEMKKKNTTAAGEALAGIRAEVDKLSEKVAAVEGSVNDGKRVEEKEVNLLI

ELLMMQLLKLDAIETDGDSKLQRRTQVVRVQKLVDRIDKLKVRISNPLNQTTMKRGKWE

EFESGFGSLIPPTSKLTISSTKITHDWELFD
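The sequence listing can be sanity-checked by confirming that a coding sequence translates into its listed protein. As a sketch, the Python below translates the opening 231 nucleotides of SEQ ID NO: 11 (BAG4 CDS) with the standard genetic code and checks that the result is a prefix of SEQ ID NO: 12. Both strings have had the spaces introduced during text extraction removed, and only an opening stretch is used to keep the example short; the function names are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Opening 231 nt of SEQ ID NO: 11 (BAG4 CDS), extraction spaces removed.
BAG4_CDS_START = (
    "ATGAAAAAATGGTGTTCAAAAGGAAGCCAAATTAGGAGCGAAGAGTATGGAAGAGG"
    "AGACGTAGATTGGGAGCTCCGACCAGGTGGAATGATTGTTCAGAAACGACATGTCGG"
    "GTCGGGTTCGGGTTCAAATTCGGAGCGTTTCATTACAATCAACGTATCTCATGGGTCTT"
    "ATCGTCATCAAATCACCGTCGATTCTCATTCCACATTTGGGAATTTAAAGACAGTTTTA"
)

# Opening of SEQ ID NO: 12 (BAG4 protein), extraction spaces removed.
BAG4_PROTEIN_START = (
    "MKKWCSKGSQIRSEEYGRGDVDWELRPGGMIVQKRHVGSGSGSNSERFITINVSHGSYRH"
    "QITVDSHSTFGNLKTVLRQQTGLEPREQRLLFKGKEKENDEWLHMAGVNDMSKLILMED"
)


def translate(cds: str) -> str:
    """Translate all complete codons; stop codons are kept as '*'."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def cds_consistent_with_protein(cds: str, protein: str) -> bool:
    """True if the translated CDS stretch is a prefix of the listed protein."""
    return protein.startswith(translate(cds))


if __name__ == "__main__":
    print(translate(BAG4_CDS_START))
    print(cds_consistent_with_protein(BAG4_CDS_START, BAG4_PROTEIN_START))
```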

SEQ ID NO: 13 > cl_97103_v1_Chr2:29897185-29920517

TT GTT ACCT A AT A AG A A ATTT AG ACTTT AGT ATTT AGTTTTT G A A A ATT A AGTTT AT A A

TACCTATTGGTTTCTTTGTTTTGAAGTAAGTTTTGAAAACTAAAAACTTAAAAAAAG TC ATTTCTAAT A ATTTTTTTTT CT GG A A ATT GGTT A AG A ATTT A AGT GTT C AGT A A AGG A A GACGA A A ACC A AGT A ACC AT GAT A AGA A AG A ATGT A AGA A A AT A A AC AT A ATTTTT A A A A ACTC A A A ACT A A AT AGTT AT C A A ACC A A A A ACT AGT A ATTC A A AT A A AT AT AT AT AT A AGT ACTT GA ACT AGCCCC A ACC A A AGT ACTT ATTT GATTGAGT A A ACT AC A ATT A A ATT A A A AGGTT A A A AT ATTT A ATT A ATCCTT A ATT A ACT A A A A AT AT C A AT ATT GAC AACTCATTATGGACCATATTACTCTCTCTCTCTCTCTTAATAAAAGAAACTTAATATAA GTT GA AGGTTGGT CC ATT GGTTT ATTTT ATTTT A A ATCTTTTT AGCCCTTCTT A A ATTTT T A ATTTT ATAAAAACAAATTTATT AGT CACCAAAAAAT GGTT GCTT GT AGG AT G A AT G A AGACGA A AT ACT A A AT A ATTTT ATTTT ATT ACC ACGCGT GGCC AC AGA A AT GTT ATT

TCAGCTTTGCGTGAGTTGGAACTTGGAATACACAACATATTATTTGACCATTGAAAA T CCAAAATCT G A ATC AC A ACCCT CT A ATTTT ATTCT GTT ATT ATT ATTT ATTT GG ATT G A AGT A ATCC AGCC ATT CCTTT ACTT ATTT ATTT ATTT A AT AT A AGGT A AGT GTT AC A A A A TCGAT A A AC A A AGA AT AC A A A A A AT AT A A A A AG AT AG AG A AT CGAC AT AT AG ATTT A CATGATTTACTAACAGTGTGTTAGTTACGTTCACAGAACAGATGAAACACAATTTTAT T AG AG AT A ATGTT GC AGA AT AC A AT AC AGTGAC ACCTCT ATCTTT AG AG A ATTT AT AT AGTGC ACTC ATTT A A ACCTT AGGGAT CAT A ATCGT A ATT A ACC AT A A AT A ATT A A AT A T AT C A A AT ACGA AT GCCCCC A A ATCCCC ACC AGT A AG AT CCC AT ACTT AGT C ATTTGA AGT GCC AGAT AGC AT AC A AC ATT ATT C A A AC A AT ATGTT AGTT GACC ACCTTGACTTT ATT GA A A A AGGATT C A A ATCGA ACCT AGT A AGATT GC ATTT AC A AT AGATT A AGA A A GTT CAT GA A AGA AT A AC AC A ATT ATTTT CTTTCC A A A A A AC AC AC AC A AT A ATTTT AC GTGGA A ACCCT CT A A AC A ATTT A AGGC A A A A ACT AT GGAT A A AG AT A AA AG AATTTC ACTATATAAAATAGGTATTACAATTTGTTTACAGACTCTCTCGTAAGATAAAATCTCTC T ATTTT AT ATCTCAATCTTTTCTCACTTAATTCTGGTTTGTTAAGCAACCATGGGTTCTG ACCTCGTTATAACTAAAAGAAATTTAGCCATCAAACAAATTCAATGGTCCCAAAATGA CTTT G A AT A AT GT G A A AGTTTT A AGGGT CACAAATT G ACCTT G A AT GG AGT GT A A A AT CT A ATT A ACC AC A A ATT GGCCT AC A AGA A A ATGT ATTGA AT A AT CTT GA AG AT A AT AT T ATTT AACTAACTTCTCATAAGAAAAATTGTT ATTTT AATTGTTGAGTCGCTCACGTTG

A AGAAT A AT A AT A A ACTC AT GATTT AGA ATTT A ATTT C AGCGT A AC ACT ACGAGCGAT TT GATTT GA A ATGT A ATT AT A ACCGAGGGGT A A ATTT CGCGGCCC ATTT A AC AG ACC A TTTACAAAACTTGAGCCGGGCTGCCACCATGTGGGCTGGGTGTCAAATGCAACTGGTG AAGTGGCCTGCTGATGGGCCGCTCCATCCAACCCATTCAAACCTTAAAAAAGAGTTAA AAA AT ATT ACT ACGGTT AGTTTT GGACGA A AT CCT ATGA ACTTTC A A A ATT AT A A A A A AT ATT CTT C A ACTT A A A A A A A A A A A A A A A A A A ACT AT CTTTCCT ATT AAT AT AT A A AT AGA A ACT ATTT AT ACGT CGTTGC A A A A AT A A AT AT GAT ATTTGAGATTTTTTTTTTT A A

TGAAATTACCTCAACATGCATATAAACACTAGTTTTGAGCCACAAATCACTATGGAT C AT A AGT ATCTT GACTTC AT AT GACTC A AGT A A A ACTTCGGATT C A ACCCT AC A AC ATTT C ACT AGTTGT ATTT AGGGGCT AGCTT AC ATTTTGA A AT A AGT AGGT AAT A ATT A ATT GC TT G ACT AT GT AT AGT GATT GT AAT A AGGCTT AGC A AT AAT A A A AG AT GGTTT AT GATT TCT ATTTTTCCTTT CT A A AGT AT GGCCGAT ACC ATTTTC AT A A ACGTT A ACC ATT AC A A GAGATTGAT AT AAT A AC ATTGCT A ACGTT GTT A AC A A AT GT ATT AT CT AT AC A A A AT A ATGTTTTCAACAATTACAATAGGGAAGAAAACATGGAAAAGTAGTGCTTCACACTTTG GGC A A ATCTTGTTT A ATT A ATT A ATCTT GTTT CGA AT AT AAT A ATGC A AT ATTTT CTTTT

TTTAAAATT AG AT ACTT G ATC A AGTT GTT AG A AT CTTTCTT GTTT ACT AG ATTTT CTCC A ACTTCTATGATTTGTTTGGTTTTCAATATTCATGGCTTTTAGATTTGATACGAGGGTATT TCTGTCCATCAATTTTCTAGTTGCTATATAATTTACTTTTCAAATATTTGAACTAAATAC TGATCTATTTGATATATTAATTATTTTGTTTCTATCATGATGTTTTTTTTTTTTTTTTTT TT TATAATTTTTAATGGCTTGGTATTCTCCAAGTGCAATTAACCATTTAAATGAAAATTTT GGT AAAAATTTACTATTT GATT AAAATAATTT G A A A ATC A AGTTT ATTTT GAG AT AT A A CCTC AT GCTT ATCTAACAT GGT ATC AT AGTT AT A A AGTT GAT ATT CAATCTCAATAAAA A A A A A A A A A A A A A A A A A AG A A AC ATT GAGATCC A ATT C A AT A AGAGT GA ACC A A A A GATCCACCATCTCAAAGGGCATGTTGAGTGGTCATACTTTGAAAAAATTGAGAGACTT C ACGT G A AG AT ACTTAAATTAATT A ATT AAT ATTTT AAT GAT AACAAATCTT ATTTT AT ATTTAATGTTTTGAAGAGTTTATTGTTTTGAGTTCAGAATAATGATTCCAACGTAATAA TT ACTC A A AT AGT AGTT C A A A AGA AGA A A AT AT A AC A A AC ATT CT A AGTT A ACC A ATT

ATACAAAGCTTCCATTGTTACTTTCTTCAAGCTTCATTTTTTTTTTCATCTTATTTT TAA

GAG AG ATTT AT GT ATT GT GAG A ATT G ATTT GTTTT AT A AGCT C A A ATTT GT G A AT CCT A

CTTCTTTAAAGAGTTTTTCTCATCTTGTTCTTCATTTTCAAATTCATATTGTAAGTG GTT

ACTCTTGAACCTGTGAGAAGAGTGTGGTAACAATTTGATCATCAATGGTGGAAAACA T

CGAGTTTATTCTTGTGCCTGGAAAAAGAACGTTGTAGCAGTTCACCTCAAGCTGTGG A

ATGAATCGAGTTTGAGTGACCATATCCAAGAGAAACTTAGGGAGTGGATGTAGGTCG

GGT AGT GTC A A ACT ACTATAAAAT GT GT CAATTTCCTCTCT ACGT CTTCTAATTT ATTTT TGC A ATT A ATTT GTT ATT AT ATTTT ATT CC AT GA A ATT A ATTGC A AT ATTT ATTCCTT A A ATTAAGTGCTTACACATTGATTTATTTAGATGGGTTGAATTGATCTTTACATATCCTTT ACCAATCTTCTT G A ATC A A AT AGGGTTCT AT A A A AT ATT AGT GTT AATTTATT AT CT A A TT C A AG AT G A ATTT ATTT GC AT G A A ATT ATT GTTT A AGTT ACTT GCT CGC A A A A AGT AT ATT G ATTT GTTT A ATT GGGCCC ACTGT AGT A A A A AGTTTT A ATT A ATTT AATAAACTCT ATTCACCCCTTTAAGGTTGCCATACCAGTCCTACAATGAGATGGAACTCATGTACGTTT ATCTAATACCACCCTAATGCCTGCAAAATTTATAACGAGGGGTGAGAAATGAGAAAG AGGCGATCTCCCATTTCCGTCCCCGCCCCATCTCGTAAACATCCCTAATATATAATCCA

TGTTTT C A A AT A ACGT A AT C A ACGC A AGCTGA A AT A A A AC AGC AGCC ATTT GAGA ATC A AAC AC A A A A A ATGG A A A ATTT GT AT GT GT AT A ATT A ATT A ACC ATC A A A ATTTC AT A AACAAATAAACACTTCAAATCAACTTCACATTGTCCATAAATTGCAAAGACATATTTC C ATTTT AGG A AG A AT A A AGTTTT GGCCT AACTCATCCAT GGCT GT GTTTTTT A A A AGCC CTAATAATAATATATTCCAAAACTCTCCTCTTTATCTTTTTTCTCAAATCTCTCCCCTTT TTGCCCTTTGCCCTTCCCCCACCTGGCCTTCCTTTAAACCCTAATAATTCCAAACTCTCA

ATTTGTCCATTCCAATATATGATGATCGTGATCGTTATCGTTATCATTATCATTACT AA T ATT ATT ATATCTCTTACTTTT C A AGG ACTTT AT GC ACC A A A A AT CTTCATAACAT GAT AGCAATTTCCAATGAGGAATCTCCATTTTCAGCCCATCATCCTCATCACCTCCAATCTC AGT A AC A AT AGCTTT AC A ATT AT CCTTT GA ATT ATTC A ATGCC AT ATTGT GC AC A A ATT TGCACAAAGCTCCAACCAATCTCTCAGAAAAAGGGCCTTCATGGTGCAATCCATAAAC A A A AT AGA ACCC A A AT GGCTT GA A A A A ATT AGGC ACC A A A ACC ACTTT A A AGC A AGG TAGCTTTTGAATGTTGATTTCTGATGAATTCATATTGTATGGATGATTTCTTACTGGGTT T ACC A AG ATCCTT CCT GTT CTAAACTTTAT GT ACCTGTT GGG A A A A AG A AT ATTTAAAG AAATTTCTAATATGCCTCTAAACTTTCAAGTTTAACTTCATTTACGAGTTGCATGTGAA TC AT AT GTT A ATTT CCAAAAAT GT G ATTTT GTT A ATT A AT AGC A A AGT G ACGT CCC A A A CTACAAATAT G AG AGGCGTT AGTTT G A A ATT GTT AAAATCCCTTTT GCC AG ATT CT A A ACCACTTCAAACTAAACAT G A AT CTAATCCTTCAAAATCAT ACGT A A ATT AT ATTT A A AAGTGCAAAACTCAACATTAAAATTACTCTCGAAAAAGTTTGATTTGACCATTAAATT TTT A ATTTT AT ATCT ATT A AGTT GTT GGT A AGT AT GAT GCT CCT ACTT GCC AC A AT AAA A A AT A A AT A A A A A A AT A A A A A A A AT A A A A A ATTT CAT CAT AT A AT AAT ATT ATC ACG TGTTT GGT A AGT A ACT A AGGA A A A A A A A ACT AGTT AGAC AC A A ATTTGA A AGTT CTC A A AAT ACT GAC A A A ACCT AC ATT AT A AG AT AG AGTT AAT AGCT A ATCT GAGT AT A AC AC ATAAGATAAAATGACACTAATTACCATTTTAAAAATGATTGTTCGATCCGTGCTCTCTC CC A ATTC AT A ATT GTTGACTT A A A AGA A A A AGA A A A A A AT GGA A A AT ACCT C A A ATT ATTGATGAAGAGATTAAGAGAGGCATGATTATCTTTCTCAGTGGCCATACAACAGTAA TCAACATCATTAGAAAGGAACCAATCTTCCAAACGGCGGACGAGGCTAGAGCCAATT CC ACGGCGGCGAT ACGGCGGT GCC ACT CT C AGGCC A AGA AT GT AGCCC AGTTT A ACC A CCAAACCGGGCGGCGGTTTATGAGCAGTAAAAAAAACCGGTTTTATAGAGCCTTGAAT A AC ACC A ACC ACTT CCTT GT CCCGCTC AGC A ACCTT AAT AAT A A A A A ATTTTC A ATT C A C A A ATT AAT A ATT A ATT A ACCT AT A A A A A A AC AC A ATT A ATT A A A ATT A A ATT AG ATT ACC AGC ATTTT AT AC AT GGG ACT GTT ACGT ATCCT AC A A AT GGGGTC ACCC A A AGTGT CAGTGAAGAGAAACACACGTTTTGATTGGCCAATTTCACATCTTCGTTCAAGATCCAT 
AACTTGAGCTTTATCTGATAATTGACTCTCTTCGTAGCTTCGAATAACAAAGCCTTTAA ACCCCATTAATTAGAAAAAACAAAATTAAAAGATAAAGATTGAGAGTGAGTGAGTGA T A A A A ATGAGAGGAG A A ATTT ATTT AT AT AAT A AGGGGAGAGAG A AGA AGGGTTTGT TGTAAATATGGTATTTGATATCAACATCAAGATGGTAGGATAAATTTTGAAAAGGTGT TT AAT ACGAGACAAAAAGTGATTAATGATAGAACGTGTTCCTTGTCTATGGTTT ACCC TTT CTT A ACTTTTCTTTT ACTT ATT AT AGGC AGACC AT GTT AGGAT A ACT AT ATGCCCTC ACAAAATTCTCTCTCTCTCTCTCTCTCTTGTTTATTCAATTAACATCTCTTTTCTATTTT A ATTT AAT A A A AT AC A ATT A ACTT ATGATTC ACT C A ATTTT CTC ACTC A AC AC ATT GA A A ATGACCATCTTAATCCCTGTTTGCAAAACGTTGATATAAAATTTTGAAAAAAAAAAAG ACTAAAAATGAAGATTGATGGACCAAAATTGTCAACCAAAATATATTTTGAAAGTTCA AG A AT C A A ATTGA AC ATTTT GG ATT A A A AT AG ATT A A ATT GCA A ATTT GGT CGTT AT G

GACC ATTGA A AT ACC ATTTT A AT CT ACGT ATTTTGAGT AT A ATT CC ATTTT A AT AC AT A T ACTTTC ATTTGCGT AAATTT AATCTT AAATTTTCTTTT ATGAAAATAT ATT AAAAATA AT A AT AT ATT ATC A A A ATTT AGA AC AT A AGGCT A AT AC A A AGC AGA AT A AGT A A ATG G A AGCGT AT AT G A A A AT A AT ATTT A A AT AAAATAAAATAAAATAATAAT AG AG AT G A A A AT G A A AGCT A AG A A AG A AT GG A ATT GG A A A AG ATTGT ATT AC A A AG AC AT A AT G A

TT AT G A AGT AAAATTTTAAAATT AATTT AAAT AGT A AT GG AC A A AT AGT G A A ACTT AT TCAATTTT A ATT CTTTTAAATAAATTAAT AG AT AATTAACAAT A A AGGCT A A A AGT G A GTATTTTGTGATAAATCTAAGTTAAAAATATAAGTCTTCAAACATAGGAACCAAATTG A A AC A A AGTTT A A ATCCC A AT GGT A A ACTTGT A AT ATTTT GA A ATTT AGGGACT A A AC TGAAATCAAACTCTTTAAAATCTATGGACCAAATGAAAACTAAACGTAAAACTTAAGG TCC A A A A A AG AGT AT ACG A AT GT GTT AGT G ACC A A A A ACT AGT CT GT ATTTTG A ATTT TTCTATCTTTATT GT ATT G A A A A A A A A A A A ATT CCTTCTATT G AGG A ACT ATTT ATTTT T AATTT GTT A AGTGGA A A A AT GAT ATGAT CGT A AGGGGGCCCCT CTTTC AC ACC AT GT CTTT GGC A A ACGT GT GGT AGT G A AT GAT A AG A A AGCC A AT AGGT GCA AT A AT AT GCCT

ATT A AGGT ATT ATTT A AGG AGCTCGGT GCA AT AT AGTT A ACG AT AT CAT ATT GAT GG A GAC A A A ATT ATTGCTTT AT CAT ACGTTT A AT AATTT AGA AG AT A ACGTTT AC A AT AT A A CAT C AC AGT CCATATAAATAATT GTT ATT ACGT ACTTTTTCTTTTCAAATATTACAATA GTT ATGGAGAGA AT AC A A ACT GA ACC ATTTTTTT AGT CT A AC A ATT GCA AGG A AT GGG A ACT A A ATT GTT GACCTTTGATGT A AT C A ATT A A AGA ATT A A A AGAG AGA A A ACC A A AAAGAGGAAGGAGTGTACAATGTGTCGAGTAATAGTTTCGATTGAATGAATGATTAGT

TCCT C A A A AC AC A ATTTC AT GA AT AT AGCGATTTTGTTTGAT GA AT GA A ATTT GA A A A T ATTTT ATTTTTTTT CTAAAAAT G A A ATT GTT CCATTAATAATTT GCTCTT C A AT GT AT A GATGAAAAAGTCTAACTTCATTGATTGTCAAATCATCCTTTTTTATTTTCAATGTTATA TTAATGTTTAAAATTTTTCGATAAATGTCAAGTTGATCAAGTGATATATATCTAATCTC ATTT GAGCC ATT AGA A AT CTTCTT AT A A ACC A AT AGTGTC A AT C A AGCT ATT AA A AGA GATGTTATTGATGATGATGTTTGACAATTTAATTTTTTATGTCTAGAAATCTTCTTTAG

CCTCCATTTTCCTCCAATTTTATTCTCTCATCCCTTTCATTAATAACCCAACAAAAT ATT C A A AGT A ATTT GTT AC A AT ATCCTTTTC A A AGT A ACT AGA AGA ACT AGA A AC A ACT A A ATT GC ACTT A A ACTGT CT AT G ACT G A AGCTT A A A AGT ACATAATAAAAATCTAT A AGT AGT CTT AT AT GT A AGT GAT AG ATTTT A ATTT GCT ATT GGT GAT ACCTCATATCATT GAT AGGGT CTACTCAT GAT AG AG AGT A A AT GAT A A ACTCT AT C ATT GAT ACC A AT CT AT C A AT GAT ATACTTCTATT GTT GAT AT ATTT C A AGT A AT AT A AGCCT AC A AGTC AT AG ACTT CT ATT ACCGAT AG ACTCT ATC ACT A AT A A ACTT CGAT AGTT A A ATCT AAGTTTT GGT AT ATCT AT A A ATT CTATTATATT GAT AT ATT AT ATATATTAATACTTT G A AC ATT GTT GT AT TT AC A AC A AT A AT AT ACTTC A A ATT ACT A AG AT AGC AGCCGA AC AT C AGA ACCC AT GG GCTTGGGCCC A AT A AT AT CAT GGC ACGA AGT AC A ACCCC AT GGAC AT CGC ATGGGTT A TTAAGCCCACCAAGAGCCTAAATCACATCAAGTTCAAGCCATGATCAGAGGCCTCAAG A AGCCC ACGGGT AT ATGT GGGCC AGC A A AGCCC A A A A A ACTT GGGCC A A AGGCCC A A TT A A AGG A ATT GT CGT ACG AT GT GGAC AGTT GGAC AT GT ACAATTCCCTTTTT A AT C AC GAACTTTAAAGTTGTGACCGACCAACCGCTGTTCGTAATGACTGTTTGAAGGTCCCTTT CATCAATCCACTGACACTTGAAAAGTCATTAAAATCTCCGTCGCTATTATGTAGCTGTC T CAT ACT A ATTT A ACCT A ACTTTT A A ACTTT AGGCT A AGTTT AC ATGTT ATT A ACTTTC A A ATTTT AC AT CT ATT AGGTGT CCT C A ACTTTT A ATT A AGT GT CTT A AT GAG ATTTT AG

AAAAGCTTGTTTTTGGAATTTGGAAATTGAGGGATGCTTAATTTCAAAAAAAAAAAA A AAAAATGAAATGGTTATCGAACAAAACTTGACCTTTTAAATTCATGT ATCT ATT AAAC ATAATGTGAACTTTTACCGTTTCTTTAGATATAAGATCCAATTTTATGTCAAATAGGTT A ATT AC A A AT GA ATT A AGGGT AGT AGTTTT A A AT AGC A A A ACT ACT AGA A AT ATTT AC A AGT AT AG A A A A AT GTCTCT GTTT ATT AGT A AT AG AC A AT ATT GAT AG AC AT GT ACC A GT GT CT ATT ACT A AT AG AC A ATT AT AG AT GTT AGT A ATCT ATC AGT GAT A A AT AT GAT

AT ACTT ATT AT AT AC A A A ATT AT A AGTTT A A A A ATTT ATT AGAC AC A A A ATTT A A AGTT TAACTTATTAGACACATCTATCTATTAGTGTCCGGTTGGGTTTGGATTCAACAGTTTCG TT AT AT CT A A A A AT ATTT AT A ATT A A A A A A ATT A A A AT AC ACGCCGTT ATT A A ATT A A TTT GA A AGTTT AG AG ACT A A AT A A A AGGGA A A A A A A A AGGA A AGTTGGA ACGATT A A TTGT C A ACC ACGT A A A A AGGACCT GAT AGGA AT A ATT CTTC A AT GAC AC ACTCT CCCC CTCTTCTTTT GTT AT A ATTT CGCTTT CATTTCACTCACACTCTCACATCATCCAACCAAC

AAGACTGAATTATTAATTTTTGTGTTTGGGACAGGGAATTTAAAGACAGTTTTACGA C AGCAGACAGGGTTAGAGCCGAGGGAACAGAGATTGTTGTTTAAAGGGAAGGAGAAG GAGAACGACGAGTGGTTGCATATGGCCGGTGTGAACGACATGTCGAAACTCATACTC ATGGA AGATCCT GCT ACT A A AGAGAGGA AGCTT C A AG AG AT GA AG A AG A AG A AT ACC ACTGCTGCAGGCGAAGCACTGGCGGGGATCAGAGCGGAGGTCGATAAACTCTCCGAA A AGGTTCGTT A A ATCGTT AAATTACAACTTT AGT CG A A A A AT AT ATTT GAG A A A ATT G T A A A A ACC ACT CGT GGT A ATT AC AGTT AT ACCTTC A A ACTTTT A AT ATT A A A A ATT A A GTCTTTAAATTTATATTATTGTTAAAATTGGACTCTTAAATTTTGTTTAATTGTAGAATT GAAGCTCAAAATGATAAAAATTGAACTCTCAAACTTATACAATTTTTACCATTTCTATT ATT ACTT A AGTTT G AGGT CTCAATTTTACCATAAAAAAATTT A AG AGGT GG A ATT GCA

A A A A ATT A ATT A A ATTTT ACTT ATTT ATTT ATT AT GAT ATTC AC AT ACTTTT A AG AT ATT TGAATTCTCAAGTGAATTTTTTTTTAAACAACAAGTTTTTCTGGAAATTGACAAAAAAA AGA A A A A AGT AGTTTT A A AT GCCTTGCTTTT ATTTT ATTTT ATTT A AT GA AGTTTTGAT AATGATACAAATGTTTATGTAACAAAAATGAAAACATTTGAAAGAAAAAAATGGTTA TT AGGT A AC ATTT CT AA AGTTT A A A A ACCT ATTT GAGAC A A AT GTGA A AGTTC A AT A A CTC ATT GA ATTGCTTT AGA AGGTTT A A A A A AC A A AT AGTT AC A A AC A AGCTT AG AG AC TAAACTTCTAATT G A ACCT A ATTCT A A AT GATT G A A AT G A ATT G ACC A AT GGAT A ACT

GGAAGAGGGTGGAAGAGAAAGAAGTTAATTTATTGATAGAGTTGTTGATGATGCAATTGTTGAAATTAGATGCAATTGAGACGGATGGGGATTCCAAACTTCAAAGACGAACTCAGGTATCTATTGGACTATATGTCAATTATCATTAAAAATAAATTTACTTTGGCTTATCTATTTATAATAATTAGGGTATACAAATTTAAGGTGATTTAAATCGTTTTCTTTCATTAATCTAACTATGCAAAACCGTTACAACAATACGTCACCTTTAAATAACTTTAACTATTTACCAAAACTTTATGAAGAGGAATTATAAATTTACTTACCGCCTAATTTCTCTTTTAAAACTCTTTTTGTTAACTCTTAATGTCGGGTATGTTTGCATTAGTCATATTTAATATCCATTAAATGATATAACTTTTCAAACAATAATAATTAACATATATCTTTATTATTATTATTAGTTATTAGATTTGTATAGTTTTCTAAAAAAAAGAATGGATTTTATGTAAGTTTGGATTAACTTAAA

AATAAAAACACTAAATTCCATTTGGATGAATTATATATTTAAATATTATATTTCTATGGTAGAAAATTATTTACTATTATTTAATCAATTTTAATAATGATGGATTAAATTTAAGTTTTATTAAATGAATACTTGAAAATATGAATTAAAATTAAATATATATATTTTAATTTTTCAATTTTGGTATTTATAATAAAAAGTACGATAGTTTAATCATTAATGGGTTAGGTGGCTGGTGCTCCCCTAGGGCCATCAAACTTAAAATAATTAAAAATAATGAAAGTCTCCTAAATTGTATGAAAATTCAATGAATATAAATTGTGAAAAATGATAATGGGTATTTTATCTATTTATTTATTAACTCAAAAAAAAAAATTAATAAATATAGACTAAAAAAATTGCAGAAATAGGACAAAATGATTTTAATTCTTTCCCTTGATATGACATTTTTATGTGGGACATTATGAAACCAAGAACTTATCAAGAAGGATTCTATTCAAAATAAATAAATAATTGATTAAAGAAGAAAATTCCATTAATGTCCCTAAAGTCTTAATCACACCTCTATTTAGCGTCTATCATGAATAAAATAAATAGAAATCATAGGAATGCTGAGGTGGCATGAACACTAGATAAAAATT

AACTATAGTTAATTTTGGTCCACTTACTTTCAAAATATCAATTTTAGTCCCGTGGTTTTAAAAAGTCTCCATTTTGGCCCCTTAACAATGAACAAAAATAAGATAAAAATAGTAATTAAATTTTAATTTTGAACTATGTAATTTTTTTTTTGAAGTACAAATAGTAGAGTAGGGAAATTGAGAGAAAGAGTATACGTTAATTATCATTGAACTATGTTTATTTTGGTGGTGATAAGTTTTTACGCAATTTCAATTAATTTGAATAACGTTAGAATTGTAATTTTATAATTTTGGGAATAAAACAGGTTGTTAGGGTACAGAAATTAGTGGACAGAATTGACAAGTTGAAGGTTAGAATCTCAAATCCTTTAAACCAAACAACAATGAAAAGAGGCAAATGGGAGGAATTTGAATCTGGATTTGGCAGCCTTATTCCTCCAACTTCAAAACTCACCATCAGCTCTACAAAAATAACTCATGATTGGGAACTCTTTGATTAGTTCATTCTCTTTCTTCCCATTTTTTTGCATTAGAACCGAACCGAATCGAATTAAACTATTTTGGCATTTCTGTACATATTGCTTTATGTGGGCTTCCCAATTGATATTGGACCCAAATGGGCTCTGTTATAAGCCCAATAAGATGTCTGTGCAGTGTGATGTTGGGTTAAGTGGAATATTATTACTCTCTCTTTTTATCAAA

AATCATTGTTTGCCTTTAATATTTTCAATTATACATAGAAAGTTGGCTCATACATTACCTTTCTCAAACATGTTATTTATGGCAACTTCTTAGTTTACTCTCTTCCTCTCTATTTCTTTGCCTTTCCTCTACCATTAAGACTCCTCTTGTTATTTTCAAAGACTATTTAATTTAATTAAATAACGCTAATGAGTTTTAATAATTATCTAATTAATATTATAACGTTTTCGTTTTACTGATCTCTTAATTTTAGAAGAATAAGGACTTCAATCAATAGTTATATATTTGTTAAAAATCTATTGATCCAATCTTTTATAAATAAAACAAGTCCAAAATTAAACAAGAAAGATCGATGATATTATCGAACACTTTGAAATAATTATGAACTTTTTAATAAGTTAATGTAGATATGTTTTAAATATAAGAAGGGCCATGCTTTACATGGTATCAACTTTAAGTCTCATGCAAATATTGCAACTCATGGGTGTACAAAGATAGATCAACAAGCATATTTATCAATTTTTTAAATTTAAAAACAAAGTTCATTTCTTTAATTTTCATAATCATAGGGTTTATAAAAAGGCTACATAGTCCCTACCAATTCATTCATTATTTCTTCCCTTTGGCTAAGGTACGTACATACATTTA

GATGAACCAAGGCGGAGGAATTCCGACGATGGCAGCAGCACTGCCACCATTGCCACCGTCATGTTTAGGAAAACTGACAACTAGCGGTGAAAAAAAGCTACCGTTCTTTCAATCCAACATGAATTTAAGTATGTATGGAAATGACAAAAGTATCCTTTCTCAAAGAGAAGCTACCATAACACCACCGCCGAAGCAACACCAATCACAACTTCTAGACTCCGACAAAGATCTCACTGTCGAAGCCAAGCGACTAAGAAGGTTCACTCTCTTGAAACTAATCTAGAACTTAAAAGTAAAATTTAAAAAGTCATTTCAAACAACCCTAACGTTCTTTTATTCATGTACGTA

TTGTTCAATTTGTAATTCAATTTCTGATTTTTCTTTCTATATTAAATTATGTCACATTTTAATGTTTAGGGCTGGCATGAGATGATGAAAGGCTATATATGCATGTATATTCATAAATTTGTTTCCTTAAGAAATTTGATGATGCTGGACATTGGATAAGACTAATTGGCAGCCTGATCATATGCTTCAATCAATATTTCTATAATGGAATAAGCAAATTGGTAAGTGTGTGGCCTCCTGGCATGGTTGTGGACGTGATTGGCGAAATCAAAAGGTGGGGAGACACATCTCATACTCCATTTGCCAAAGGTTGAGCAATTAGCTCGTTACTCACTGCCTTACCCATCAACCATGCTTTGGTGTGAGCTTTCAGCTTTCAGCTTTCAGCTTTGTTATTTACAATATATATTTCCTCTCTTTCAACTGCTCCATCTTCTTCTGTGATCCTTACTTTCCTTTATGATTGTATAATGAGAATGTTTGGAAAATCGTAATAAGATAGACTTGTAATGTAATGTAATCCAAAATTAATGTTTGGATTGAACGTTTTGGACCCGATTTGTAATACGAAAGTCATTCTGTTCCGAC

GATTCTTACACCGTACTAATTTTCTAAACAACCAAAAAGGAATCCAAAACTTTATATTAAAATATAAAAATCTTCAAAATTTCCGCTTAAAACTCAGCAAAACAATTAATAGCAAAATATAAAAAAAATGCCTACCAAAAAATATCATATTTGATCCTGATAATTTTTTTAATTGATCATAGCAAGCAAACTAATTTAAATTGTAAAAATGATCAACAAAGTCTCCTCATCGAAAAGTGTTGGGTCATCTATTAAATTAGAGGGAGAGAGGAAATAAAAGATTGAGGTGAAATGGGAGGGTAGATAGCAGCTTTCATCTATATACTATGCTAAGGACATATTTTAATT

AAAGTTGTTTAATTTAGCTAAATTAAACTTATCAAAATCAAACGTTATTTAAATTTCACTATTCTTTTTATAATATGTGTGATAGGAAAATAGAACTTCTCACCAAAATGTTGATGTACAAATTTGATGAGTTTGAAAAATTTAACTAATTACAACTAATGGTAAGATTCAACTTCTTACCCTAGCTTCTTACTTCTTTGAAAGTATGAAATTATATACATAAAAAGACAAACTAATTTGCTAAGTCTTCCAAAATAAACCATAATATTTTAATTTATTTCATCTCAATTTA

SEQ ID NO: 14 >CL08381_WT_allele

MACCAATGTCGGCGGCTGATGACGTCCGTCACGAAATGCATCATARAGTGACGAACTTTTATCCGTGTAGATTTTTGGTATTTCCATCCCTTGCGGAGCCGTCATAATTCCAAAACGGCAATGCAAAATCAGGATCCTTAATCAAAGACCCCAATATTCTCTCATGAAAGTAAAGATAAAAACGATGGAATGGGAAGAAC

SEQ ID NO: 15 >CL08381_MUT_allele

MACCAATGTCGGCGGCTGATGACGTCCGTCACGAAATGCATCATARAGTGACGAACTTTTATCCGTGTAGATTTTTGGTATTTCCATCCCTTGCGGAGCCAGTCATAATTCCAAAACGGCAATGCAAAATCAGGATCCTTAATCAAAGACCCCAATATTCTCTCATGAAAGTAAAGATAAAAACGATGGAATGGGAAGAAC
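SEQ ID NO: 14 and SEQ ID NO: 15 list the wild-type and mutant versions of the same CL08381 marker region. The short Python sketch below is illustrative only. It locates the difference between the two alleles by trimming their shared prefix and shared suffix. It works on a 31-base window from SEQ ID NO: 14 and the matching 32-base window from SEQ ID NO: 15, both copied from the listings around the point where they diverge. The name `trim_indel` is hypothetical, and the prefix/suffix trim is a simplification rather than a full aligner: inside a repeat it places the indel as far to the right as possible.

```python
# Hypothetical helper: find where two sequences differ by trimming their
# common prefix and common suffix (a simplification, not a full aligner).

# 31-base window from SEQ ID NO: 14 (WT allele) and the matching
# 32-base window from SEQ ID NO: 15 (MUT allele), copied from the listing.
WT_WINDOW = "CCCTTGCGGAGCCGTCATAATTCCAAAACGG"
MUT_WINDOW = "CCCTTGCGGAGCCAGTCATAATTCCAAAACGG"


def trim_indel(ref: str, alt: str) -> tuple[int, str, str]:
    """Return (offset, ref_part, alt_part) for the region where ref and alt differ.

    offset is the 0-based index in ref where the difference starts;
    an empty ref_part means an insertion in alt, an empty alt_part a deletion.
    """
    limit = min(len(ref), len(alt))
    # Length of the common prefix.
    prefix = 0
    while prefix < limit and ref[prefix] == alt[prefix]:
        prefix += 1
    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while suffix < limit - prefix and ref[-1 - suffix] == alt[-1 - suffix]:
        suffix += 1
    return prefix, ref[prefix:len(ref) - suffix], alt[prefix:len(alt) - suffix]


offset, ref_part, alt_part = trim_indel(WT_WINDOW, MUT_WINDOW)
print(f"offset {offset}: WT '{ref_part}' -> MUT '{alt_part}'")
```

For these windows the sketch prints `offset 13: WT '' -> MUT 'A'`. The mutant window therefore carries one extra A directly after `CCCTTGCGGAGCC`.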

SEQ ID NO: 16 >CL_chr2_gap_F1_primer AGAGTGAACCAAAAGATCC

SEQ ID NO: 17 >CL_chr2_gap_R3_primer CCCAAAACCAAATAGTTACC

SEQ ID NO: 18 >CL_chr2_gap_F2_primer GAACCAAAAGATCCACCA

SEQ ID NO: 19 >CL_chr2_gap_R1_primer ACCTACATCCACTCCCTAA

SEQ ID NO: 20 >PPO_WT_gDNA reverse complementary sequence

TTATGCATCATATTCAATTCTAATGTCCTTAACGGTGGCGGACCCATCTCCAAACCTAG

GGACCAATGTAACAATAATGCTATCGTCGTTATCCGCATCCAAACTCTCAAGCAGTTC

AGTTATCCCTAACCTAAGGCATGTTTTTATGTTCATGCTGCTGCTACCTTTCATATGAG

GCACATTCACAAAGCTCCCTGCAAACTCAGAATTATCCGCTCTAATTTCCCTATCATCC

TCGTCATTGATAAAAACATCAAACTTAATAGCCTTGTTTCCGTCGAACTCAATCCCATC

AATCACCAAAATCTCCTCTTCATCGTCTTTCTCCTTCGTACCCCTCGATTTCTTCGGCCT

CTTGACTTCGAAGCTGACGATCTTGTCAACACTCGACGGTAGCTTCCCGGTCTTCTTGG

TAGATTTCTTCTTGGTTTTGTTGGGTGTGCGTGGTACTCGTGGGGTTGGAGGTGTTTTG

AGCCATGGAATTGGTACGGTGTCATCGTAGACATAGCCTAAGGCTCTGGTATCTAGAC

AGTCTTTGACATAAACTCGGACAGCTTCACCATTCTCATCGTAGAATACAAAGGAAGC

GTTTAGGAAATCTTTGTCTTTAATGTCTTGTCGCTTTTCGCCTAAGGATTTCCATATGG

ACCAAAAACGGTCCACGTTCGCGTGGTGGGCGTAGAAGATGGGATCTCTAGCCGCTG

AGAAGAAGGTTCCCATGTCAATTCGGTTCGATTGGTTCGGGTCACCCGTCCACAAATG

AATTGAATTGTGAGGAAGGTTTTCCACCGTCCCCATTCCTTTCAAAAAATTAATCAATT

CATCAAAATTACATCATTTTACTTTCCATACTAATATCCTTAATTATTATCTCTAAACAT

CATTATCTAATTTACAAATGGGCATGTTTAGAATACATTTTCAAATGATTAAATTAAAA

AAACAAGTCAATTTGATGACCAGCATAACTTATCAACACAACCCAATCATCACTGTTT

CGATTGGGTTAGGTTGAGTTCAAATAAATTAAAAAGTTATTGGTTGAGTTGTTTATGTGATTGAAAAAACTTATCAATCCGACAATTGAGTTAGATCTATAAACTGCTCTAATCCAA

TGAAAAAAAAAATCACCTGGACTTGGGTTGCTGCCACTTCGATAAGGCTGGCCGAAAAAGAGCAAGGGCGTACGGGCGCCGGACACGACCTGGCGATACATAACACTTAGATTGCATTGGATTATCTTTTCTCTGCTTATTGTTGGCTCAACGTCATTGTAATCCAAATCAACCAATGTCGGCGGCTGATGACGTCCGTCACGAAATGCATCATAGAGTGACGAACTTTTATCCGTGTAGATTTTTGGTATTTCCATCCCTTGCGGAGCGTCATAATTCCAAAACGGCAATGCAAAATCAGGATCCTTAATCAAAGACCCCAATATTCTCTCATGAAAGTAAAGATAAAAACGATGGAATGGGAAGAACAGCCACGAGAAATGAACTTGTAATTCAACTGGAAGACCCAATTGATCGTAACCCCCAGTACAATAAGCACAGTGAACAAGTGCTTGCTGTTTAAAACTACGTGGATCATCATCAGGAAGCGCTTTCATAAGCGCTACGGCTTCCTTATACTTTTCAATATATTCTTTATCTAATGATTGTGCCGCTTTCCTAACGCGTGGTTTGAGGAAGGGTTTTACGTTATTGGTGGATGGTGGGCAGCAAACCAAATCTTTGACGCCATCTGCCAAGTCCGTGCTTGATCCACACTTGGAAGGGTCGGGGGTTGTGACTGGAGCTGCCAAAGCGAAGGGATCAACTCCAAAAGCACTTGAAGCTGAGCCATACAGACCGCCGAGCCCGATAAGCGCTTCTCTCCGGTCAACAAACTTGCCTGGCCATAATGAGTTATTACTTTCTTCACCACTGCCATTGGAGCCGCTACACACAACCAAGTTATTGAGTCTATGAATGGTGGAAGATGGATCTTTTTTTTTACGATAAAACAGACCAAAGGAGGCGCCGCCGGTGGTGGCCGTGGTTATTGCGGCGGAGGAAAGTGCTAGTGGCATGGAAGGAGATAGAGAGGCCAT
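SEQ ID NO: 20 gives the reverse complement of the wild-type PPO genomic DNA. To read it on the opposite strand, reverse the string and replace each base with its pairing partner. Below is a minimal Python sketch of that conversion. The function name is hypothetical. The IUPAC ambiguity-code table is an added assumption, included so that codes such as the `M` and `R` in SEQ ID NO: 14 and 15 are handled as well. Whitespace is stripped first, because these sequences are wrapped across lines.

```python
# Hypothetical helper: reverse complement of a DNA string, including IUPAC
# ambiguity codes (R<->Y, M<->K, B<->V, D<->H; S, W and N map to themselves).
_COMPLEMENT = str.maketrans("ACGTRYMKSWBDHVN", "TGCAYRKMSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, ignoring whitespace and case."""
    cleaned = "".join(seq.split()).upper()  # drop spaces/newlines from line wrapping
    return cleaned.translate(_COMPLEMENT)[::-1]


# Final ten bases shown above for SEQ ID NO: 20.
tail = "GAGAGGCCAT"
print(reverse_complement(tail))  # prints ATGGCCTCTC
```

On the opposite strand, the final ten bases shown for SEQ ID NO: 20 read 5'-ATGGCCTCTC-3'. Applying the function twice returns the input (after upper-casing and whitespace removal), which makes a quick sanity check.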