Title:
METHOD FOR IDENTIFICATION OF MOLECULAR MARKERS LINKED TO HEIGHT INCREMENT
Document Type and Number:
WIPO Patent Application WO/2014/129885
Kind Code:
A1
Abstract:
The present invention relates to a method for identifying a genetic marker linked to a trait locus of a plant comprising the steps of measuring phenotypic parameter of plant, developing and screening markers, and conducting data analyses to identify genetic marker. The marker relates to gene that suppresses level of auxin and associated with height increment.

Inventors:
ONG PEI WEN (MY)
OOI CHENG LI LESLIE (MY)
LOW ENG TI LESLIE (MY)
Application Number:
PCT/MY2014/000020
Publication Date:
August 28, 2014
Filing Date:
February 19, 2014
Assignee:
MALAYSIAN PALM OIL BOARD (MY)
International Classes:
C12Q1/68; C12N15/29
Domestic Patent References:
WO2010056107A2 2010-05-20
WO2010146357A1 2010-12-23
WO2008114000A1 2008-09-25
Foreign References:
US20090075829A1 2009-03-19
Other References:
KIM ET AL.: "CAPS marker linked to tomato hypocotyl pigmentation", KOREAN JOURNAL OF HORTICULTURAL SCIENCE & TECHNOLOGY, vol. 30, no. 1, 2012, pages 56 - 63
Attorney, Agent or Firm:
MOHAN, K. (A-28-10 Menara UOA Bangsar,No., Jalan Bangsar Utama 1 Kuala Lumpur, MY)
Claims:
CLAIMS

1. A method for identifying a genetic marker linked to a trait locus of a plant, said method comprising the steps of: measuring phenotypic parameter of said plant;

extracting genomic deoxyribonucleic acid (DNA) from said plant;

developing single nucleotide polymorphism (SNP) based cleaved amplified polymorphic sequence (CAPS) marker; and conducting data analyses from said method to identify said genetic marker.

2. The method as claimed in Claim 1 wherein said plant is oil palm.

3. The method as claimed in Claim 1 wherein the trait locus contributes to the height increment of said oil palm fruit.

4. The method as claimed in Claim 1 wherein said phenotypic parameter includes but not limited to fruit bunch yield, bunch weight, bunch number, bunch quality component, rachis length, girth width, girth height or combination thereof.

5. The method as claimed in Claim 1 wherein said step of developing SNP based CAPS markers further comprising the step of screening SNP markers.

6. The method as claimed in Claim 1 wherein said SNP based CAPS markers are co-dominant.

7. The method as claimed in Claim 1 wherein said step of extracting genomic DNA further comprising the steps of amplifying said genomic DNA with polymerase chain reaction (PCR) and preparing restriction enzyme for digesting said amplified DNA.

8. The method as claimed in Claim 1 wherein said step of conducting CAPS further comprising the steps of conducting agarose gel electrophoresis to separate said digested DNA and visualizing said agarose gel for data analysis.

9. The method as claimed in Claim 3 wherein said restriction enzyme is EcoRI and/or HaeIII.

10. A selection markers obtained and identified with the method as claimed in Claim 1.

11. The selection markers as claimed in Claim 10 wherein said markers include but not limited to 653_AciI, 3064_TaqI, 5962_AluI, SNPG00002_Hpy188I, SNPG00004_AciI, SNPG00005_BcgI, SNPG00006_FatI, SNPG00014_HpyCH4III and SNPG00014_SspI.

12. The selection markers as claimed in Claim 11 wherein said markers are described in SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17 and SEQ ID NO:18 or complement thereof.

13. The selection markers as claimed in Claim 10 wherein said marker is SNPG00006_FatI.

14. The selection markers as claimed in Claim 13 wherein said markers are described in SEQ ID NO: 13 and SEQ ID NO: 14 or complement thereof.

15. The selection markers as claimed in Claim 13 wherein said marker SNPG00006_FatI is from Indole-3-acetic acid- amido synthetase (IAA-amido synthetase) .

16. The selection markers as claimed in Claim 15 wherein said IAA-amido synthetase gene suppresses level of auxin.

17. The selection markers as claimed in Claim 16 wherein said suppressed level of auxin reduces the length of internodes in oil palm.

18. The selection markers as claimed in Claim 10 wherein said markers are associated with height increment.

Description:
Method for Identification of Molecular Markers Linked to Height Increment

FIELD OF INVENTION

The present invention relates to the field of marker identification in plant biotechnology, more particularly isolating Single Nucleotide Polymorphism (SNP) marker linked to the height gene of plant.

BACKGROUND OF INVENTION

Agriculture plays an important role in generating income for a country's economy. Oil palm (Elaeis guineensis Jacq.) is one of the most important crops due to its importance as fuel and food. In Malaysia, oil palm has contributed RM 80.4 billion in income from exporting oil palm products to other countries. The total area for oil palm plantation increased by 3.0% to 5 million hectares in 2011. The high revenue has attracted developers' interest in planting oil palm and in research to improve traits of the crop.

It is important to have oil palms to be short enough to be reached by workers for harvesting and determining ripeness. This is a major concern of oil palm developers as oil palms of age 25 years can reach 10 meters in height. In order to overcome this problem, oil palm breeders have initiated short palm breeding programs. A benchmark for height increment has been set to below 30cm per year to ease bunch harvesting and to lengthen the economic life span of oil palm. Dwarf varieties have been developed for some crops such as wheat, barley, rye and oat. In fact, reduction in plant height among these cereal crops created lodging resistance and increased yield production. However, similar study has not been disclosed for oil palm. Thus, there remains a need to conduct selective breeding using markers for height increment in oil palm.

The conventional method of breeding oil palm can take as long as 30 years because oil palm, being a perennial crop, takes 10 years to complete a single breeding cycle. After cross-breeding two parent plants, it takes a long time to cultivate the progeny and find out whether it has the traits of interest. Due to the long breeding cycle, oil palm breeders face a great challenge in improving polygenic traits that have complex patterns of inheritance.

It is an object of present invention to provide molecular marker to improve the selection efficiency as well as reduce the time spent to develop new planting material through conventional breeding program. The types of molecular markers useful for such application include restriction fragment length polymorphisms (RFLPs), randomly amplified polymorphic DNAs (RAPDs), amplified fragment length polymorphisms (AFLPs), simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) .

The Malaysian Palm Oil Board (MPOB) has assembled the largest oil palm germplasm collection for genetic conservation and enrichment of breeding population. These genetic materials are evaluated in field plots where phenotypic parameters such as oil yield, height increment, stalk length, iodine value, carotene, kernel, vitamin E, oleic acid contents and lipase activity are measured. Some high yielding palms with low height increment (20 to 30cm per year) were found in the Angola germplasm collection and are being incorporated into breeding schemes to develop low height increment planting material.

SNPs are co-dominant molecular markers and reveal single nucleotide changes in deoxyribonucleic acid (DNA) sequences. SNPs are the most abundant sequence variations in plant genomes. Newly automated, high-throughput systems for SNP detection have been established, which allow wide application of the markers in genetic studies such as genetic diversity, population structure determination, linkage disequilibrium analysis, marker-assisted selection, cultivar identification and association mapping.

A candidate gene approach to study the association between allelic variant and phenotypic traits has been widely applied in plants. Candidate genes are identified based on the understanding of the biochemical pathway, mutational analysis and linkage analysis. The first candidate gene association between allelic variants in Dwarf gene and the development traits (plant height and flowering time) has been demonstrated in maize. Later, similar approaches have been described in rice, wheat, alfalfa and grapevine. However, similar study has not been disclosed for oil palm. Thus, there remains a need to provide molecular marker for height increment in oil palm.

Some mining of SNPs in oil palm genomic sequence and EST sequence databases has been conducted. An agarose gel based technique known as cleaved amplified polymorphic sequence (CAPS) or polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) is used to amplify a target DNA region containing a SNP, followed by digestion of the amplified product and fragment analysis on agarose gel. Using the SNP based CAPS assay, polymorphism in natural oil palm germplasm and differentiation of the two oil palm species (E. guineensis and E. oleifera) have been reported. Moreover, it is yet another object of the present invention to provide SNPs for use in marker-trait association for the germplasm collection from Angola, which shows variation in height increment.

SUMMARY OF INVENTION

The present invention relates to a method for identifying a genetic marker linked to a trait locus of a plant comprising the steps of measuring phenotypic parameter of plant, extracting genomic deoxyribonucleic acid (DNA) of plant, developing single nucleotide polymorphism (SNP) based cleaved amplified polymorphic sequence (CAPS) marker, screening SNP markers, amplifying genomic DNA with polymerase chain reaction (PCR), digesting amplified DNA with selected restriction enzymes, conducting agarose gel electrophoresis to separate digested DNA, visualizing agarose gel for data analysis and conducting data analyses to identify informative genetic marker.

The method refers to the plant of oil palm where the trait locus contributes to the height increment of oil palm. The phenotypic parameter measured includes but is not limited to height increment, fruit bunch yield, bunch weight, bunch number, bunch quality component, rachis length, girth width and girth height. The SNP based CAPS markers are co-dominant.

The selection markers obtained and identified include but not limited to 653_AciI, 3064_TaqI, 5962_AluI, SNPG00002_Hpy188I, SNPG00004_AciI, SNPG00005_BcgI, SNPG00006_FatI, SNPG00014_HpyCH4III and SNPG00014_SspI wherein the markers are described in SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17 and SEQ ID NO:18 or complement thereof. In yet another embodiment of the present invention there is provided a selection marker SNPG00006_FatI obtained with the subject method comprising the DNA sequence of SEQ ID NO:13 and SEQ ID NO:14 or complement thereof. The marker SNPG00006_FatI is from Indole-3-acetic acid-amido synthetase (IAA-amido synthetase) which suppresses level of auxin and is speculated to reduce the length of internodes in oil palm. The markers identified are associated with height increment.

The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by references to specific embodiments thereof, which are illustrated, in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings in which:

Figure 1 depicts the graph of distribution of height increment for AGO01 and AGO08 populations in accordance with an embodiment of the present invention;

Figure 2 shows the agarose gel electrophoresis of PCR products digested by random SNP marker 3064_TaqI in accordance with an embodiment of the present invention;

Figure 3 shows the banding profiles for SNP based CAPS marker;

Figure 4 illustrates unweighted pair group method with arithmetic mean (UPGMA) dendrogram at family level based on the Nei (1983) genetic distance in accordance with an embodiment of the present invention;

Figure 5 depicts the graph of ad hoc quantity (ΔK) values with its modal value to determine the true K of two groups in accordance with an embodiment of the present invention;

Figure 6 depicts the graph of population structure of 219 palms based on nine SNP markers (Plot in single line) in accordance with an embodiment of the present invention; and

Figure 7 depicts the banding profiles of SNPG00006_FatI marker across a validation panel containing the Nigerian Elaeis guineensis palms in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

For further understanding of the object, construction, characteristics and functions of the invention, a detailed description with reference to the embodiments is given in the following.

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known methods, procedures and/or components have not been described in detail so as not to obscure the invention. Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings .

It is therefore a purpose of this invention to present molecular markers associated with height increment in oil palm. In one preferred embodiment, the markers are used to genotype selected Angola germplasm populations showing variation in height increment. This is due to a relatively high level of genetic diversity observed in MPOB-Angola populations using SNP analysis.

BEST MODE FOR CARRYING OUT THE INVENTION

The preparation of the present invention is described in detail by referring to the experimental examples. It should be understood that these experimental examples, while indicating preferred embodiments of the invention, are given by way of better elucidation only. A person skilled in the art can ascertain the essential characteristics and embodiments of this invention, therefore various changes may be provided to adapt to various usages and conditions.

Materials and Methods

A total of 54 open-pollinated bunches were harvested and collected from eight sites (populations) in Angola. For each bunch (defined as a family), sixteen seedlings were laid down in a trial with randomized complete block design (RCBD) to evaluate their phenotypic performance. For each palm, data on fresh fruit bunch yield, bunch weight, bunch number, bunch quality components (oil yield, fruit weight, ratios of oil to bunch, mesocarp to fruit, shell to fruit), rachis length, girth width and height were collected. These phenotypic measurements recorded for the Angola trial were subjected to clustering analysis using Statistical Analysis Software (SAS). Two populations, namely AGO01 and AGO08, which showed high similarity, were chosen for use in this study. A total of 219 palms (61 from AGO01 and 158 from AGO08) were sampled from the experimental plot as shown in Table 1.

Country of origin   Population   Family    Block     No. of palms available   Total
Angola              AGO01        AGO0104   Block 1   16                       61
                                           Block 2   15
                                 AGO0105   Block 1   14
                                           Block 2   16
                    AGO08        AGO0801   Block 1   16                       158
                                           Block 2   16
                                 AGO0808   Block 1   14
                                           Block 2   16
                                 AGO0810   Block 1   16
                                           Block 2   16
                                 AGO0811   Block 1   16
                                           Block 2   16
                                 AGO0812   Block 1   16
                                           Block 2   16
Total                                                                         219

Table 1 The number of palms from Angola populations AGO01 and AGO08 used in the study

In one preferred embodiment, the palm height was measured at the 8th year of planting. Measurement was done from the ground level to the base of frond number 41 of the palm. The height increment rate was calculated based on the formula of Bruere and Powell (1987) as below:

Height increment = (Height at year t) / […]

where t is the age of palm. Descriptive analysis of this trait was computed using SAS.

Development of SNP based CAPS markers

A set of random gene SNP markers and a set of candidate gene SNP markers were utilized in this study. For development of random SNPs, oil palm GeneThresher sequences were aligned based on binning using Basic Local Alignment Search Tool (BLAST) (95% identity over a minimum of 50 bp) and PHRAP assembly (minimum match of 15, minimum score of 60). The unique sequences identified by clustering and assembly were further collapsed into unique clusters and singletons. The sequence clusters were further analysed to identify SNPs. CAPS assays were identified by analyzing the SNP sites for modification or introduction of a restriction enzyme recognition site.
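The check for whether a SNP modifies or introduces a recognition site can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the study: the two 20-bp allele sequences and the function names are hypothetical, and only the top strand is scanned, which is adequate for a palindromic site such as TCGA (TaqI) but would need a reverse-complement scan for non-palindromic sites.

```python
def site_positions(seq, site):
    """Return 0-based start positions of `site` on the top strand of `seq`."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def is_caps_candidate(allele_a, allele_b, site):
    """True when the two alleles carry a different number of recognition sites,
    so that digestion would give different fragment patterns."""
    return len(site_positions(allele_a, site)) != len(site_positions(allele_b, site))


# Hypothetical alleles differing by a single C/T substitution at index 9.
ALLELE_A = "GATTACAGTCGATTGACCTA"  # carries TCGA at index 8
ALLELE_B = "GATTACAGTTGATTGACCTA"  # the SNP abolishes the site
TAQI_SITE = "TCGA"

if __name__ == "__main__":
    print(site_positions(ALLELE_A, TAQI_SITE))
    print(site_positions(ALLELE_B, TAQI_SITE))
    print(is_caps_candidate(ALLELE_A, ALLELE_B, TAQI_SITE))
```

A SNP that leaves the site count unchanged in both alleles is not convertible to a CAPS assay with that enzyme, which is the filtering step the paragraph describes.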

Auxin-responsive GRETCHEN HAGEN 3 (GH3) family genes related to plant growth and development were selected from previously published literature. Four members of the GH3 gene family shown in Table 2 were used for development of SNP markers from candidate genes.

Gene                                                        Synonym                                         Description                                                                                   Reference
Dwarf in light 1 (DFL1)                                     GH3-6 (Arabidopsis thaliana)                    Involved in shoot elongation, lateral root formation and light response of hypocotyl length   Nakazawa et al., 2001
Dwarf in light 2 (DFL2)                                     GH3-10 (Arabidopsis thaliana)                   Involved in red light specific hypocotyl elongation                                           Takase et al., 2003
Indole-3-acetic acid-amidosynthetase (IAA-amidosynthetase)  GH3-8 (Oryza sativa)                            Involved in cell wall loosening and expansion                                                 Ding et al., 2008
Jasmonate resistant 1 (JAR1)                                GH3-11 (Oryza sativa and Arabidopsis thaliana)  Involved in coleoptile elongation and root growth                                             Reimann et al., 2008; Staswick et al., 2002

Table 2 Candidate genes (GH3 family genes) related to plant growth and development

The sequences of these genes were obtained from the National Center for Biotechnology Information (NCBI) database and searched with BLAST against the MPOB oil palm genomic sequence database. Oil palm sequences with homology to the candidate genes were identified. These sequences were then aligned and assembled into contigs using BioEdit. SNP2CAPS was used to analyse the potential number of SNPs that can be converted into CAPS markers. The SNP primers were designed by Primer3 for the consensus sequence of contigs which contained SNPs with a restriction enzyme recognition site.

DNA extraction

Genomic DNA was extracted from frozen spear leaves using the modified Cetyl Trimethyl Ammonium Bromide (CTAB) method. DNA concentration was measured and the quality of each DNA sample was examined by digestion with restriction enzyme. In one preferred embodiment, the restriction enzymes used are EcoRI and HaeIII. Both digested and undigested DNA samples were subjected to agarose gel electrophoresis and visualized under ultraviolet (UV) light. The preferred embodiment for agarose gel electrophoresis is 0.9% LE agarose gel at 100 volts for 90 minutes. Prior to visualization under UV light, the DNA sample is stained; in one preferred embodiment, the staining is done with ethidium bromide.

SNP screening and genotyping

A screening panel comprising 23 samples from AGO01 and AGO08 populations was established to evaluate the 62 SNP markers developed in this study. There are twenty-nine random markers and thirty-three markers designed from candidate genes related with plant growth and development.

In one preferred embodiment, PCR amplification for oil palm genomic DNA comprises 50 ng of template DNA, 1X PCR buffer (20 mM Tris-HCl [pH 8.4] and 50 mM KCl), 0.1 mM of deoxyribonucleotide triphosphates (dNTPs), 0.2 μM of each primer, 1 U of Taq DNA polymerase and 0.1% Triton X-100. The setting for amplification is preferably initial denaturation at 94°C for 4 min, followed by 35 cycles of 94°C (40 s), 60 to 62°C depending on the primer (40 s) and 72°C (40 s), with final extension at 72°C for 20 min.

The PCR products were digested with 5 U of restriction enzyme in a final volume of 15 μl, in accordance with the manufacturer's protocol. The digested PCR products were analyzed on a 3% agarose gel and visualized.

Statistical analysis

Genetic diversity and population structure

The molecular data scored at each SNP locus of the SNP based CAPS assay were analysed to compute gene diversity, heterozygosity, polymorphism information content (PIC) and inbreeding coefficient. Nei (1983) genetic distance was estimated at individual and family levels. Unweighted pair group method with arithmetic mean (UPGMA) clustering was performed using PowerMarker and viewed using Molecular Evolutionary Genetics Analysis (MEGA). Analysis of molecular variance (AMOVA) was conducted with Genetic Analysis in Excel (GenAlEx) to partition the total variation into among-population and within-population components. The threshold for statistical significance was determined by running 999 permutations.
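For a bi-allelic co-dominant marker, the four per-locus statistics named above can be computed directly from genotype counts. The sketch below uses the textbook definitions (expected heterozygosity 1 − Σp², PIC as He minus 2p²q² for two alleles, and f = 1 − Ho/He); it is an assumption that these match what PowerMarker reports, since the software may apply sample-size corrections, and the function name and example counts are hypothetical.

```python
def diversity_stats(n_aa, n_ab, n_bb):
    """Per-locus statistics for a bi-allelic co-dominant marker.

    Inputs are counts of palms scored AA, AB and BB.
    Uses uncorrected textbook formulas (an assumption, see lead-in).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    he = 1.0 - (p * p + q * q)        # gene diversity / expected heterozygosity
    ho = n_ab / n                     # observed heterozygosity
    pic = he - 2.0 * p * p * q * q    # polymorphism information content
    f = 1.0 - ho / he if he > 0 else 0.0  # inbreeding coefficient
    return {"He": he, "Ho": ho, "PIC": pic, "f": f}


if __name__ == "__main__":
    # Hypothetical counts with equal allele frequencies.
    print(diversity_stats(50, 100, 50))
```

With equal allele frequencies the sketch gives He = 0.5 and PIC = 0.375, the same pair of values listed for SNPG00002_Hpy188I in Table 6.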

Analysis of population structure was performed using STRUCTURE. A model-based clustering method was applied to infer population structure and assign individuals to populations using multilocus genotype data. The optimum number of clusters was identified after five independent runs for each K value ranging from 2 to 10 using the admixture model and correlated allele frequencies. The length of the burn-in period and the number of Markov chain Monte Carlo (MCMC) iterations were set to 100,000 during analysis.

The most likely number of clusters (K) is often determined using the maximal value of the log probability of data (LnP(D)) in the STRUCTURE output. The average of LnP(D) for each K across replications was calculated and plotted on a graph. The K at which LnP(D) plateaus or shows a slight continuous increase was identified as the optimum K. If the distribution of LnP(D) did not show a clear mode for the optimum K, an ad hoc statistic (ΔK) based on the rate of change in LnP(D) between successive K values was used. Once the optimum number of clusters (K) was determined, the inferred ancestry of individuals (Q matrix) was estimated. K-1 columns of the Q matrix were then used as the population structure term in the linear models for association analysis.

Marker-trait association

Marker-trait association analysis was performed using Trait Analysis by association, Evolution and Linkage (TASSEL) software. Table 3 shows the four models applied to associate SNP markers with height increment via the general linear model (GLM) and mixed linear model (MLM).
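As an aside on the choice of K described earlier in this section, one common way to compute a ΔK statistic from replicate STRUCTURE runs is to divide the absolute second-order change in mean LnP(D) by the standard deviation of LnP(D) at that K. The sketch below follows that convention as an assumption; the text does not state the exact formula used, and the LnP(D) values and function name are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnpd_by_k):
    """Compute ΔK for each interior K.

    lnpd_by_k maps K to a list of LnP(D) values from replicate runs.
    ΔK(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    K values without both neighbours, or with zero spread, are skipped.
    """
    ks = sorted(lnpd_by_k)
    result = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if k - prev_k != 1 or next_k - k != 1:
            continue  # need consecutive K values
        spread = stdev(lnpd_by_k[k])
        if spread == 0:
            continue  # ΔK undefined when all replicates agree exactly
        second_diff = (mean(lnpd_by_k[next_k])
                       - 2 * mean(lnpd_by_k[k])
                       + mean(lnpd_by_k[prev_k]))
        result[k] = abs(second_diff) / spread
    return result


# Hypothetical LnP(D) values, three replicates per K.
EXAMPLE_RUNS = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-885, -880, -890],
}

if __name__ == "__main__":
    scores = delta_k(EXAMPLE_RUNS)
    print(scores, "modal K:", max(scores, key=scores.get))
```

The K with the largest ΔK is taken as the modal value. Note that ΔK is only defined for K values that have a tested neighbour on each side.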

The GLM without Q was the simplest model to explain variation in height increment, taking into account the marker effect only. The GLM with Q and MLM with K models include the population structure (Q matrix) and kinship (K matrix), respectively. The MLM with Q + K model takes into account both population structure and kinship. The kinship coefficient (K matrix) was calculated by TASSEL. For all models, the marker effect and population structure were defined as fixed effects. The kinship was considered to be a random effect. Markers were considered as significantly associated with height increment when P < 0.05.
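The simplest of these models, with a marker effect and an error term only, is equivalent to a one-way analysis of variance of height increment across the genotype classes. The sketch below computes the F statistic for that case in plain Python. It is an illustration under that assumption and does not reproduce TASSEL; the height values are hypothetical, and the P value (which needs the F distribution) is left out.

```python
def marker_f_statistic(groups):
    """One-way ANOVA F statistic for height increment grouped by genotype.

    groups maps a genotype label (e.g. "AA") to a list of height increments.
    Returns (F, df_between, df_within).
    """
    values = [v for obs in groups.values() for v in obs]
    grand_mean = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for obs in groups.values():
        group_mean = sum(obs) / len(obs)
        ss_between += len(obs) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in obs)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical height increments (cm per year) for three genotype classes.
EXAMPLE_HEIGHTS = {
    "AA": [30, 32, 34],
    "AB": [40, 42, 44],
    "BB": [50, 52, 54],
}

if __name__ == "__main__":
    print(marker_f_statistic(EXAMPLE_HEIGHTS))
```

Adding Q as a fixed covariate or K as a random effect changes the error structure, which is why those models need a dedicated linear-model package rather than this shortcut.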

Model          Description
GLM without Q  Height increment = marker effect + error
GLM with Q     Height increment = marker effect + Q + error
MLM with K     Height increment = marker effect + K + error
MLM with Q+K   Height increment = marker effect + Q + K + error

Table 3 Statistical models applied for marker-trait association

In addition, association analysis between marker genotypes and height increment was carried out using Statistical Analysis Software (SAS). In the analysis, block and population were considered as random effects and markers as fixed effect. The least squares mean method of the GLM procedure was applied according to the mixed linear model as below:

$$Y_{ijklm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma(\beta)_{k(j)} + \alpha\gamma(\beta)_{ik(j)} + \delta_l + \varepsilon_{ijklm}$$

where $Y_{ijklm}$ is the observed value of height increment; $\mu$ is the overall mean; $\alpha_i$ is the random effect of the i-th block (i = 1 and 2); $\beta_j$ is the random effect of the j-th population (j = 1 and 8); $(\alpha\beta)_{ij}$ is the interaction between the i-th block and the j-th population; $\gamma(\beta)_{k(j)}$ is the k-th family nested within the j-th population; $\alpha\gamma(\beta)_{ik(j)}$ is the interaction between the i-th block and the k-th family nested within the j-th population; $\delta_l$ is the fixed effect of the l-th SNP genotype (l = 1, 2 and 3) and $\varepsilon_{ijklm}$ is the random residual.

Value of P< 0.05 was regarded as significant. Some parameters such as interaction between genotypes were not included in this model.

Phenotype data

The two MPOB Angola germplasm populations (AGO01 and AGO08) showed mean height increment of 42.6 cm per year, with maximum and minimum height increment of 73.0 cm per year and 20.0 cm per year respectively (Table 4) . Previous analysis indicated that palms from AGO01 recorded height increment of 50 to 60 cm per year while those from population AGO08 were relatively shorter (height increment of 20 to 30 cm per year) .

Table 4 shows that the coefficient of variation (CV) for height increment was 27.9%, thus indicating that the selected MPOB Angola germplasm populations have high genetic variation for height increment. In Figure 1, the distribution of palm height increment for AGO01 and AGO08 is normal, with the highest number of palms having an increment of 40 cm per year.

Trait             N(a)  Mean (cm)  Min(b) (cm)  Max(c) (cm)  SD(d)  CV(e) (%)
Height increment  219   42.69      20.00        73.00        11.94  27.96

Table 4 Descriptive statistics for height increment

In Table 4, the superscripted side notes denote the following: a is the number of palms, b is the minimum, c is the maximum, d is the standard deviation and e is the coefficient of variation.

Development of SNP based CAPS markers

In selecting random SNP markers, sequences generated from the gene-rich hypomethylated regions of oil palm were used. A total of 29 randomly selected SNP containing sequences were shortlisted for use in this study. By using the candidate gene approach, 97 homologous genes related to plant growth and development were identified from the MPOB oil palm genome database. Cluster analysis was carried out and 16 clusters were attained to reduce redundancy. Of these, eleven groups contained sequences from both the Elaeis species (E. guineensis and E. oleifera) as well as sequences from dura, tenera and pisifera (different fruit forms of E. guineensis) palms. The eleven groups were assembled into 17 contigs and 90 SNPs were identified in eight of the contigs. However, only 41 SNPs had restriction enzyme sites identified for development of the CAPS assays. Eight of the 41 SNPs were excluded due to unknown enzymes or small size of digested PCR product (less than 20 bp). Hence, only 33 SNP based markers were suitable for screening.

SNP marker screening

A total of 62 oil palm SNP markers (29 random and 33 candidate gene SNPs) were successfully developed. All primers generated amplifiable products, with sizes of around 300 base pairs (bp) for random SNPs and 200-500 bp for candidate gene SNPs. After digestion with restriction enzymes, nine SNP markers were found to be informative. As shown in Table 5, the nine informative SNP markers comprise three random markers (653_AciI, 3064_TaqI and 5962_AluI) and six candidate gene markers (SNPG00002_Hpy188I, SNPG00004_AciI, SNPG00005_BcgI, SNPG00006_FatI, SNPG00014_HpyCH4III and SNPG00014_SspI). These markers were used to genotype the 219 palms representing the two Angola populations selected for analysis in this study. Figure 2 shows an example of informative banding profiles from digestion of PCR products generated using random SNP marker 3064_TaqI.

SNP based CAPS marker  Forward primer (5'-3')                   Reverse primer (5'-3')                   Ta(a) (°C)  EPS(b) (bp)  RE(c)      OFSAD(d) (bp)
653_AciI               GCTGAGACATGAAATGTGCGTAG (SEQ ID NO:1)    ATGAACAACAACTCGGAGTCACC (SEQ ID NO:2)    65.7/66.2   272          AciI       175, 97
3064_TaqI              CACCCTCTCAGGCATATTGTTG (SEQ ID NO:3)     AAAGGGAGAAAGACACAGAACCC (SEQ ID NO:4)    65.4/65.4   260          TaqI       188, 72
5962_AluI              CTGCGTGACTACGTGAGAGGG (SEQ ID NO:5)      ACTTGCATTAGCCACCAACAAAC (SEQ ID NO:6)    67.1/65.    269          AluI       182, 87
SNPG00002_Hpy188I      TAAGGGCTGGAGGAAGGATT (SEQ ID NO:7)       CGAAGTGATCTTGGTGCTGA (SEQ ID NO:8)       60.0/59.9   356          Hpy188I    186, 170
SNPG00004_AciI         GGTCATCATTGACGGTCATC (SEQ ID NO:9)       TCACCAACCTAAACGCAAGA (SEQ ID NO:10)      58.7/59.3   221          AciI       162, 59
SNPG00005_BcgI         CGAAGCAAACACTTCAGACG (SEQ ID NO:11)      GCTCCCGTCATAATGCCATA (SEQ ID NO:12)      59.6/60.    413          BcgI       294, 119
SNPG00006_FatI         CAGGAAGCTTGCCACTGATA (SEQ ID NO:13)      AGCATTTCATTGGCTCGAAG (SEQ ID NO:14)      59.0/60.3   334          FatI       267, 67
SNPG00014_HpyCH4III    TCGACCTCGTTGATGTGAAG (SEQ ID NO:15)      CTTGCCGGTACACGCTATTC (SEQ ID NO:16)      59.8/60.6   501          HpyCH4III  439, 102
SNPG00014_SspI         TCGACCTCGTTGATGTGAAG (SEQ ID NO:17)      CTTGCCGGTACACGCTATTC (SEQ ID NO:18)      59.8/60.6   501          SspI       451, 90

Table 5 Primer sequences for the nine informative oil palm SNP markers.

In Table 5, the superscripted side notes denote the following: a is the annealing temperature, b is the expected PCR fragment size, c is the restriction enzyme and d is the observed fragment size after digestion.
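A quick consistency check on Table 5 is to compare each expected PCR fragment size with the sum of the two observed digested fragments. The sketch below assumes a single cut site per amplicon, so that the two fragments should add up to the uncut product; the dictionary simply transcribes the sizes printed in the table, and the function name is hypothetical.

```python
# Expected PCR size (bp) and observed digested fragment sizes (bp), as printed in Table 5.
TABLE5_SIZES = {
    "653_AciI": (272, (175, 97)),
    "3064_TaqI": (260, (188, 72)),
    "5962_AluI": (269, (182, 87)),
    "SNPG00002_Hpy188I": (356, (186, 170)),
    "SNPG00004_AciI": (221, (162, 59)),
    "SNPG00005_BcgI": (413, (294, 119)),
    "SNPG00006_FatI": (334, (267, 67)),
    "SNPG00014_HpyCH4III": (501, (439, 102)),
    "SNPG00014_SspI": (501, (451, 90)),
}


def fragment_sum_mismatches(table):
    """Return {marker: fragment_sum - expected_size} for rows that do not add up.

    Assumes one cut site per amplicon (an assumption of this sketch).
    """
    return {
        marker: sum(fragments) - expected
        for marker, (expected, fragments) in table.items()
        if sum(fragments) != expected
    }


if __name__ == "__main__":
    print(fragment_sum_mismatches(TABLE5_SIZES))
```

Seven rows add up exactly. For the two SNPG00014 rows the printed fragments sum to 541 bp against a printed product size of 501 bp; the sketch only flags the difference and does not indicate which figure is the intended one.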

The SNP based CAPS markers are co-dominant and can be used to differentiate homozygous and heterozygous individuals. Figure 3 shows the three types of banding profiles for a SNP based CAPS marker that can be observed and scored. For homozygote AA, only one band is observed as the restriction enzyme recognition site is absent in both alleles. For homozygote BB, two bands are observed due to the presence of the restriction enzyme recognition site in both alleles. In the case of heterozygote AB, three bands are observed, contributed by the combination of allele A and allele B.
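The three banding profiles translate into a simple scoring rule. The sketch below is a minimal illustration with hypothetical names and a hypothetical size tolerance of 10 bp; the example sizes are those listed for marker 3064_TaqI in Table 5 (260 bp uncut, 188 bp and 72 bp after digestion).

```python
def score_caps(bands, uncut, fragments, tol=10):
    """Score a CAPS lane as "AA", "AB" or "BB" from observed band sizes (bp).

    bands:     sizes of bands read off the gel
    uncut:     expected size of the undigested PCR product
    fragments: expected sizes of the digested fragments
    tol:       size tolerance in bp (hypothetical value)
    Returns None when neither pattern is present.
    """
    def seen(size):
        return any(abs(band - size) <= tol for band in bands)

    has_uncut = seen(uncut)
    has_cut = all(seen(size) for size in fragments)
    if has_uncut and has_cut:
        return "AB"  # three bands: one uncut allele plus one cut allele
    if has_uncut:
        return "AA"  # site absent in both alleles
    if has_cut:
        return "BB"  # site present in both alleles
    return None


if __name__ == "__main__":
    for lane in ([260], [188, 72], [260, 188, 72], []):
        print(lane, score_caps(lane, 260, (188, 72)))
```

In practice a faint small fragment or a partial digest can blur the AA and AB calls, so a rule like this is a starting point for scoring rather than a replacement for inspecting the gel.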

Genetic diversity

Table 6 shows the summary statistics calculated for genetic diversity of the two selected MPOB-Angola germplasm populations (N = 219). The average gene diversity, also known as expected heterozygosity (He), observed for the populations analysed was 0.394. This suggests that the selected populations from Angola oil palm germplasm are relatively diverse. The gene diversity of the two selected MPOB-Angola germplasm populations was lower than that recorded for oil palm germplasm samples analysed using SSR markers (He = 0.5188; He = 0.537). However, the values were higher than those observed for isozyme (He = 0.194) and RFLP (He = 0.211) markers. The high He detected by SSR markers is due to the multi-allelic nature of SSRs. This observation is consistent with the observed heterozygosity (Ho) values, where Ho is the proportion of heterozygous individuals at a given locus in a population. The present study recorded a mean Ho of 0.400 (0.219 to 0.553), which is lower than that shown by SSR analysis (Ho = 0.458).

                        GD/He (a)   Ho (b)    PIC (c)   f (d)

Population
AGO01                   0.398       0.403     0.315     -0.002
AGO08                   0.389       0.399     0.309     -0.023
Mean                    0.394       0.401     0.312     -0.012

Marker
653_AciI                0.256       0.219     0.223      0.146
3064_TagI               0.327       0.365     0.273     -0.117
5962_AluI               0.402       0.347     0.321      0.139
SNPG00002_Hpy188I       0.5         0.553     0.375     -0.103
SNPG00004_AciI          0.438       0.384     0.342      0.127
SNPG00005_BcgI          0.49        0.502     0.370     -0.023
SNPG00006_FatI          0.293       0.338     0.250     -0.152
SNPG00014_HpyCH4III     0.494       0.516     0.372     -0.042
SNPG00014_SspI          0.385       0.374     0.311      0.030
Mean                    0.398       0.400     0.315     -0.002

Table 6 Summary of statistics calculated for genetic diversity of selected two MPOB-Angola germplasm populations (N = 219)

In Table 6, the superscripts denote the following: a, genetic diversity/expected heterozygosity; b, observed heterozygosity; c, polymorphism information content; and d, inbreeding coefficient.

Several previous genetic diversity studies using various molecular markers were conducted on an oil palm advanced breeding population, namely the Deli dura. The He values obtained were 0.147 using isozyme, 0.085 using RFLP and 0.234 using SSR markers. This shows that the selected MPOB-Angola oil palm germplasm populations have a relatively higher level of He than that reported for the Deli dura population. In general, the germplasm populations have higher genetic variability than the advanced breeding population, which indicates the usefulness of germplasm for oil palm improvement.

The polymorphism information content (PIC) value is commonly used in genetic studies as a measure of polymorphism for a marker locus. The PIC values ranged from 0.223 (653_AciI) to 0.375 (SNPG00002_Hpy188I), with a mean of 0.315. Yan et al. (2009) classified the PIC value into three different classes: slightly informative (PIC < 0.25), reasonably informative (0.5 > PIC > 0.25) and highly informative (PIC > 0.5). Based on this classification, two SNPs applied in the present study are categorised as slightly informative and the other seven SNP markers as reasonably informative. None of the SNP markers was highly informative.

Generally, bi-allelic SNP markers reveal PIC values of less than 0.50. The mean PIC values reported for oil palm germplasm analysed using SSR markers were 0.53 and 0.65. The average PIC values for the SSR markers were about two-fold higher than that recorded for the SNP markers (0.315). SSR markers are expected to have higher PIC values than SNP markers due to their multi-allelic pattern of inheritance. Nevertheless, SNP markers are more abundant in the plant genome and thus have potentially wider genome coverage than SSRs.
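The ceiling on PIC for bi-allelic markers follows from the standard formulas, as the short sketch below illustrates. For two alleles with frequencies p and q = 1 - p, He = 2pq and PIC = 2pq - 2p²q² = He - He²/2. The function names are illustrative, and the treatment of the exact class boundaries (PIC equal to 0.25 or 0.5) is an assumption, since the classification quoted above leaves them undefined.

```python
def expected_heterozygosity(p):
    """Gene diversity (He) of a bi-allelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def pic_from_he(he):
    """PIC of a bi-allelic locus expressed through He: PIC = He - He^2 / 2."""
    return he - he ** 2 / 2


def classify_pic(pic):
    """Three-class scheme quoted in the text (boundary handling is assumed)."""
    if pic < 0.25:
        return "slightly informative"
    if pic <= 0.5:
        return "reasonably informative"
    return "highly informative"


# The maximum for a bi-allelic marker occurs at p = 0.5.
he_max = expected_heterozygosity(0.5)
pic_max = pic_from_he(he_max)
print(he_max, pic_max, classify_pic(pic_max))

# He reported in Table 6 for marker 653_AciI.
pic_653 = pic_from_he(0.256)
print(round(pic_653, 3), classify_pic(pic_653))
```

At p = 0.5 the sketch gives He = 0.5 and PIC = 0.375, the values listed in Table 6 for SNPG00002_Hpy188I, so no bi-allelic marker can reach the highly informative class.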

The mean value of the inbreeding coefficient (f) for the populations analysed in this study was -0.012. The negative f value indicates an excess of heterozygotes relative to homozygotes. This might be due to the out-crossing behaviour of the oil palm. The same was observed in some of the germplasm collected from Gambia, Cameroon, Nigeria and Senegal.
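The sign of f can be illustrated with the common textbook definition f = 1 - Ho/He. This is a sketch under that assumption only: the estimator behind the values in Table 6 is not stated here, so the number below is not expected to reproduce the tabulated f exactly, only its direction.

```python
def inbreeding_coefficient(ho, he):
    """Fixation index f = 1 - Ho / He (textbook definition, assumed here)."""
    return 1 - ho / he


# Population means of Ho and He taken from Table 6.
f_mean = inbreeding_coefficient(ho=0.401, he=0.394)
print(round(f_mean, 3))
```

Whenever observed heterozygosity exceeds expected heterozygosity, as in the population means of Table 6, f is negative.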

Cluster analysis

Cluster analysis of the 219 oil palms selected from populations AGO01 and AGO08 was carried out at the individual level using the UPGMA method. Figure 4 shows the dendrogram revealing three clusters at the family level based on the Nei (1983) genetic distance. Cluster 1 consisted of AGO0105 and AGO0812. Three families (AGO0811, AGO0801 and AGO0808) were classified into cluster 2, whereas cluster 3 contained AGO0104 and AGO0810. The dendrogram shows that certain families within population AGO08 (AGO0811, AGO0801 and AGO0808) cluster together in cluster 2. However, families AGO0105, AGO0812, AGO0104 and AGO0810 were not grouped according to their respective populations. The dendrogram therefore did not clearly separate the families according to their population.
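A minimal sketch of a family-level distance calculation is given below. It assumes that the Nei (1983) genetic distance refers to the D_A distance of Nei, Tajima and Tateno, D_A = 1 - (1/r) Σ_loci Σ_alleles sqrt(x·y), where x and y are the frequencies of the same allele in the two groups and r is the number of loci. The allele frequencies and the function name are hypothetical.

```python
import math


def nei_da_distance(freqs_x, freqs_y):
    """D_A distance between two groups.

    Each argument is a list with one tuple of allele frequencies per locus,
    with alleles listed in the same order in both groups.
    """
    if len(freqs_x) != len(freqs_y):
        raise ValueError("both groups need the same loci")
    shared = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        shared += sum(math.sqrt(x * y) for x, y in zip(locus_x, locus_y))
    return 1.0 - shared / len(freqs_x)


# Hypothetical allele frequencies (allele A, allele B) at three SNP loci.
family_1 = [(0.7, 0.3), (0.5, 0.5), (0.2, 0.8)]
family_2 = [(0.6, 0.4), (0.5, 0.5), (0.4, 0.6)]
print(nei_da_distance(family_1, family_2))
```

A pairwise matrix of such distances is the input that a UPGMA procedure would cluster into a dendrogram.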

The genetic distance values calculated among families from populations AGO01 and AGO08 within the Angola germplasm indicated a close genetic relationship. The dendrograms showed that the individual palms and families from the two populations are mixed in the clusters. These results signify considerable genetic similarity between populations AGO01 and AGO08. It is not surprising that exchange of genetic material could have occurred actively between the different regions, which would explain the genetic similarity observed.

AMOVA analysis

The analysis of molecular variance (AMOVA) revealed that only 1% of the total genetic variation was explained by the variation between populations (AGO01 and AGO08), while the remaining 99% was contributed by the variation within the populations. Table 7 shows the AMOVA among the two populations based on nine SNP markers.

Source                df     Sum of      Mean       Estimate   Percentage
                             squares     squares               variation (%)
Among Populations     1      5.183       5.183      0.031      1%
Within Populations    217    530.457     2.445      2.445      99%
Total                 218    535.639                2.476      100%

Table 7 AMOVA among two populations based on nine SNP markers
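The percentages in Table 7 can be checked from the "Estimate" column, interpreted here (as an assumption) as the estimated variance component for each source. The sketch below only re-derives the percentage split from the tabulated values; it does not reproduce the AMOVA itself.

```python
def variance_percentages(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {source: 100 * value / total for source, value in components.items()}


# "Estimate" column of Table 7.
estimates = {"Among Populations": 0.031, "Within Populations": 2.445}
percentages = variance_percentages(estimates)
for source, pct in percentages.items():
    print(f"{source}: {pct:.2f}%")
```

The two components sum to the total of 2.476 given in the table, and the among-population share rounds to the reported 1%.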

Levels of genetic variation are directly associated with the breeding system in plants. Cross-pollinated and long-lived perennial species such as oil palm are known to have high genetic variation within a population. This information can be applied in drawing up a sampling strategy for genetic conservation work.

Population structure

The model-based approach described by Pritchard et al. (2000) was applied in the population structure analysis. This approach identifies the K value based on the estimated log-likelihood values. Since the estimated log-likelihood values decreased with K, the true K may not have been represented in this model. An ad hoc quantity (ΔK) was therefore calculated according to Evanno et al. (2005) to determine the K value. Figure 5 shows that the highest value of ΔK was at K = 2, which suggested that the samples were made up of two main genetic groups.
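The ΔK calculation can be sketched as follows, assuming the commonly used form ΔK = |L(K+1) - 2L(K) + L(K-1)| / s[L(K)], where L(K) is the mean log-likelihood over replicate runs at K and s[L(K)] is its standard deviation. The log-likelihood values below are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_runs maps K to a list of log-likelihoods from replicate runs.
    """
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in lnp_runs and k + 1 in lnp_runs:
            second_diff = abs(
                mean(lnp_runs[k + 1]) - 2 * mean(lnp_runs[k]) + mean(lnp_runs[k - 1])
            )
            result[k] = second_diff / stdev(lnp_runs[k])
    return result


# Hypothetical log-likelihoods from three replicate runs per K.
runs = {
    1: [-2400, -2401, -2399],
    2: [-2410, -2411, -2409],
    3: [-2500, -2520, -2480],
    4: [-2600, -2630, -2570],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

ΔK is undefined for the smallest and largest K tested, because both need a neighbour on each side. The statistic therefore cannot evaluate K = 1, which is one reason a peak at K = 2 does not by itself prove substructure.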

The model-based simulation of population structure using nine informative SNP markers (K = 2) is shown in Figure 6. Each of the 219 palms, sorted by membership probability (Q), is represented by a thin vertical line partitioned into two coloured segments that indicate the estimated Q of the individual for each of the two genetic groups.

The selected two populations (AGO01 and AGO08) from MPOB-Angola germplasm do not show strong evidence of subpopulation structure although the ad hoc quantity (ΔΚ) suggested that subpopulation number might be two. Populations with low differentiation levels would have an optimum number of two clusters. Clear population structure was not present possibly due to the relatively narrow genetic background. The results presented in this study suggest that the genetic base of the two selected populations using the SNP markers is narrow. However, the descriptive analysis of height increment showed high genetic variation in the two selected populations, thereby creating a favourable situation for marker-trait association. The selection of a minimum number of samples with maximum variation has a normalizing effect that is expected to minimize population structure.

Furthermore, the normal distribution observed for the trait suggests that alleles linked to height increment are likely present at an appropriate frequency in the populations analyzed. In fact, the analysis revealed that only common alleles with frequency > 0.05 were observed and no rare alleles with frequency < 0.05 were detected. Rare alleles tend to bias the covariance between markers and population structure and to increase the chance of type I error in marker-trait association studies. The appropriate population structure with no rare alleles and the use of a combination of random and candidate gene SNP markers provide a favourable platform for marker-trait association even with a relatively small SNP panel.

Marker-trait association

Using the GLM with Q, MLM with K and MLM with K+Q models, which take into account the population structure (Q matrix) or kinship (K matrix), the number of significant markers associated with height increment was reduced after multiple correction steps. Results are presented in Table 8. Using the GLM without Q model, three markers (SNPG00002_Hpy188I, SNPG00006_FatI and SNPG00014_HpyCH4III) were associated with height increment (P < 0.05). For the GLM with Q model, two markers, namely SNPG00006_FatI and SNPG00014_HpyCH4III, showed significant association with height increment (P < 0.05). Further association analysis using another two models, MLM with K and MLM with K+Q, revealed that only marker SNPG00006_FatI was significantly associated with height increment at P < 0.05.

SNP-based CAPS          TASSEL       TASSEL      TASSEL      TASSEL      SAS
marker                  GLM          GLM         MLM         MLM         MLM
                        without Q    with Q      with K      with Q+K
                        (P-value)    (P-value)   (P-value)   (P-value)   (P-value)

653_AciI                0.4631       0.2797      0.8979      0.8177      0.8018
3064_TagI               0.6659       0.622       0.4601      0.897       0.4025
5962_AluI               0.4135       0.3333      0.4366      0.6082      0.4227
SNPG00002_Hpy188I       0.0352*      0.2331      0.7541      0.7744      0.4065
SNPG00004_AciI          0.9494       0.0595      0.8664      0.3565      0.4751
SNPG00005_BcgI          0.9821       0.9602      0.1519      0.753       0.6326
SNPG00006_FatI          1.12E-06*    5.39E-06*   1.19E-04*   3.22E-04*   0.0046*
SNPG00014_HpyCH4III     0.0109*      0.0415*     0.4194      0.6136      0.6785
SNPG00014_SspI          0.9108       0.7006      0.8029      0.8115      0.1956

* Significant at P < 0.05

Table 8 Marker-trait associations using TASSEL (GLM and MLM) and SAS (MLM)

The number of significant markers decreased after Q and K matrices were introduced as a correction factor in the models. More importantly in this study, marker SNPG00006_FatI consistently showed significant association with height increment in all models.

The inclusion of population structure and kinship reduces type I error, thus eliminating false positive associations. Correction factors for population structure and kinship are crucial in marker-trait association studies. The level of association can be classified into four groups: strong (P < 0.005), moderate (0.005 < P < 0.01), weak (0.01 < P < 0.05) and no association (P > 0.05). Marker SNPG00006_FatI consistently displayed strong association with height increment in all four models tested, thus increasing the confidence in the association of the marker with height increment.
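The four-level classification can be applied to the P-values in Table 8 with a small helper such as the one sketched below. The function name is illustrative, and the assignment of values falling exactly on a threshold is an assumption, since the classification leaves the boundaries undefined.

```python
def association_level(p_value):
    """Classify a marker-trait association by its P-value."""
    if p_value < 0.005:
        return "strong"
    if p_value < 0.01:
        return "moderate"
    if p_value < 0.05:
        return "weak"
    return "no association"


# P-values for SNPG00006_FatI under the four TASSEL models in Table 8.
fat1_p_values = {
    "GLM without Q": 1.12e-06,
    "GLM with Q": 5.39e-06,
    "MLM with K": 1.19e-04,
    "MLM with Q+K": 3.22e-04,
}
for model, p in fat1_p_values.items():
    print(model, association_level(p))
```

Under the same rule, the GLM without Q results for SNPG00002_Hpy188I (P = 0.0352) and SNPG00014_HpyCH4III (P = 0.0109) fall in the weak class.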

The level of linkage disequilibrium (LD) in oil palm has not been documented. However, as oil palm is an outbreeding species, LD is not expected to be very large, perhaps extending only a few hundred base pairs, similar to some forest species. As such, it is not surprising that none of the limited number of randomly selected SNP markers showed significant association with height increment. A more comprehensive collection of SNPs may be required to identify association with the trait. Fortunately, SNP markers developed from candidate genes had significant association with height increment. Significant association between a candidate gene SNP marker and height increment was also obtained when the analysis was carried out using the MLM with Q+K model. This observation highlights the greater potential of candidate gene SNP markers to be significantly associated with a trait compared to random SNP markers. The candidate genes may be selected based on prior information obtained from previous genetic, biochemical or physiological studies in other plant species. In addition, fewer markers are required for candidate gene association studies. Thus, the candidate gene association method may be preferred for crops with limited resources or an incomplete genome sequence.

Marker-trait association using a mixed linear model was also carried out using the SAS GLM procedure. The correlation between the nine SNP markers (653_AciI, 3064_TagI, 5962_AluI, SNPG00002_Hpy188I, SNPG00004_AciI, SNPG00005_BcgI, SNPG00006_FatI, SNPG00014_HpyCH4III and SNPG00014_SspI) and height increment was analyzed and evaluated by the least squares means (LSD) method. Table 8 shows that SNP marker SNPG00006_FatI was again revealed to have significant association with height increment (P < 0.05). The rest of the markers did not show significant association with height increment.

In Table 8, least squares means with the same superscript are not significantly different at P < 0.05, and the asterisk (*) indicates that the result is significant at P < 0.05.

Validation of marker in additional populations

In a candidate gene association mapping study in diverse maize inbred lines, the association findings were further verified through linkage mapping, gene expression and mutagenesis studies. It is therefore important to determine the reliability of the marker in predicting low height increment in oil palm. Only markers that have been validated across different genotypes would increase selection efficiency and reduce the time required to develop new and improved planting materials in oil palm.

As such, prior to application in oil palm breeding, the SNPG00006_FatI marker needs to be further validated in other independent populations. The SNPG00006_FatI marker was further tested on a validation panel containing Elaeis guineensis palms from Nigeria (Table 9) with variable height. Polymorphic profiles were detected among the samples, indicating the marker's ability to detect variation in height in the samples tested (Figure 7). The molecular data generated from the validation panel were subjected to one-way ANOVA (Analysis of Variance) (P < 0.05) to evaluate the marker effect on the height of the samples. The simplest model was used for the ANOVA, as shown below:

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

where $Y_{ij}$ is the observed value of height increment;

$\mu$ is the overall mean;

$\alpha_i$ is the fixed effect of marker;

$\varepsilon_{ij}$ is the random residual.

The results obtained indicated that there was a significant effect of the marker on the height trait, F (2, 19) =5.81, P=0.0107 (Table 10) .

No    Palm No      Progeny Code    Height (meter)

1     150/499      NGA 12.05       1.9
2     150/501      NGA 12.05       2.08
3     150/2333     NGA 12.04       1.8
4     150/2360     NGA 12.04       1.53
5     150/4280     NGA 12.06       2.09
6     150/5275     NGA 12.05       1.75
7     152/341      NGA 44.10       3.55
8     152/389      NGA 44.12       3.76
9     152/400      NGA 44.12       3.75
10    152/427      NGA 44.10       2.72
11    152/496      NGA 44.12       3.06
12    152/554      NGA 44.10       3.21
13    149/11526    NGA 12.01       2.59
14    150/500      NGA 12.05       2.4
15    150/5115     NGA 12.04       2.45
16    150/5376     NGA 12.05       2.14
17    150/5974     NGA 12.04       2.5
18    152/354      NGA 44.12       3.3
19    152/396      NGA 44.12       3.05
20    152/462      NGA 44.10       2.98
21    152/537      NGA 44.12       2.8
22    152/576      NGA 44.10       2.71

Table 9 List of the Nigerian palms included in the validation panel

Table 10 shows the ANOVA obtained from the GLM procedure for the Nigerian palms included in the validation panel, indicating the effect of the SNPG00006_FatI marker on the height trait. The dependent variable for this analysis is the height of the Nigerian palms.

Source             DF    Sum of Squares    Mean Square    F value    Pr > F

Model              2     3.35311856        1.67655928     5.81       0.0107*
Error              19    5.47820871        0.28832677
Corrected Total    21    8.83132727

* Significant at P < 0.05

Table 10 ANOVA for the Nigerian palms included in the validation panel indicating the effect of the SNPG00006_FatI marker on height trait
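The entries of Table 10 can be cross-checked with a few lines of arithmetic, as sketched below. The heights are those listed in Table 9. The genotype of each palm is not given, so the model and error sums of squares are taken from Table 10 rather than recomputed. The closed-form P-value used here, P = (1 + 2F/df2)^(-df2/2), holds only when the numerator degrees of freedom equal 2, as they do in this table.

```python
# Heights (meter) of the 22 Nigerian palms, in the order listed in Table 9.
heights = [
    1.9, 2.08, 1.8, 1.53, 2.09, 1.75, 3.55, 3.76, 3.75, 2.72, 3.06,
    3.21, 2.59, 2.4, 2.45, 2.14, 2.5, 3.3, 3.05, 2.98, 2.8, 2.71,
]


def corrected_total_ss(values):
    """Sum of squared deviations from the overall mean."""
    overall_mean = sum(values) / len(values)
    return sum((v - overall_mean) ** 2 for v in values)


# Sums of squares and degrees of freedom as reported in Table 10.
ss_model, df_model = 3.35311856, 2
ss_error, df_error = 5.47820871, 19

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

# Upper-tail probability of an F(2, df_error) variable (valid for df1 == 2 only).
p_value = (1 + 2 * f_value / df_error) ** (-df_error / 2)

print(round(corrected_total_ss(heights), 6))
print(round(f_value, 2), round(p_value, 4))
```

The corrected total sum of squares computed from the Table 9 heights agrees with the value in Table 10, and the mean squares reproduce the reported F value and P-value.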

Another point to note is that plant height is a quantitative trait that may be influenced by several genes. Among the four members of the GH3 gene family possibly involved in height increment, only one SNP marker (SNPG00006_FatI), from IAA-amido synthetase, was associated with height increment. IAA-amido synthetase belongs to the family of GH3 proteins that regulate auxin levels in plants. More specifically, IAA-amido synthetase suppresses the level of auxin by catalyzing the reaction that conjugates auxin to amino acids. In rice, overexpression of GH3-8 and GH3-13 resulted in a dwarf phenotype. The expression of GH3 in rice can be triggered by pathogens or abiotic stress, where the focus is switched from growth to stress adaptation. Adapting to stress comes at the expense of plant architecture, and the dwarf phenotype can be one of the resulting characteristics. The dwarfness in rice was caused by a reduction in internode length. Notably, the SNP marker in the IAA-amido synthetase gene is also associated with dwarfness in oil palm, where the lower height increment is also a result of a reduction in the length of the internodes.

Admittedly, it may not be appropriate to rely on only one marker to select for palms with low height increment. The study will be expanded in the future to include a larger set of SNP markers on a wider range of natural populations collected from Nigeria (E. guineensis) and Colombia (E. oleifera), which also show a wide variation for height increment, to identify additional marker genes linked to this trait.

The number of bi-allelic SNP markers required for estimating population structure and relatedness accurately is much higher than the number of multi-allelic SSR markers. However, despite their lower information content, SNPs are still popular. The abundance of SNPs, their lower mutation rate and their amenability to high throughput assays make SNP markers the currently preferred choice for complex marker-trait association studies. However, both SSR and SNP markers may be required to dissect more complex traits (with strong environmental influence) in oil palm, such as yield.

The present work was initiated to evaluate the usefulness of SNP markers and develop appropriate analysis methods and models for marker-trait association study. The preliminary study has identified nine informative SNP markers for genotyping oil palm natural populations collected from Angola.

These results also demonstrated the potential use of the SNP markers in identifying marker-trait association. The current finding admittedly is still far from adoption in oil palm breeding, and further validation of the marker in other genotypes is required. Furthermore, additional marker genes linked to height increment need to be identified for effective prediction of the trait. However, the study does provide a promising start to applying association mapping techniques with SNP markers to identify linkages with complex traits in oil palm.

Although the present invention has been described with reference to the preferred embodiments and examples thereof, it is apparent to those skilled in the art that a variety of modifications and changes may be made without departing from the scope of the present invention which is intended to be defined by the appended claims.