Title:
METHODS AND COMPOSITIONS FOR SELECTING SOYBEAN PLANTS HAVING FAVORABLE ALLELIC COMBINATIONS OF STEM TERMINATION AND MATURITY
Document Type and Number:
WIPO Patent Application WO/2024/011056
Kind Code:
A2
Abstract:
The present disclosure relates to compositions and methods for identifying, selecting and/or producing soybean plants having determinate stem termination and late maturity. The present disclosure also relates to compositions and methods for identifying, and deselecting soybean plants having indeterminate stem termination and/or early maturity.

Inventors:
HANCOCK WESLEY GRAHAM (US)
PERUMAL AZHAGUVEL (US)
Application Number:
PCT/US2023/069335
Publication Date:
January 11, 2024
Filing Date:
June 29, 2023
Assignee:
SYNGENTA CROP PROTECTION AG (CH)
SYNGENTA CROP PROTECTION LLC (US)
International Classes:
C12Q1/6895; C12N15/82
Attorney, Agent or Firm:
DAVENPORT, Destiny (US)
Claims:
WHAT IS CLAIMED IS:

1. A method of identifying and/or selecting a soybean plant comprising an allelic combination associated with a late maturity determinate stem growth phenotype, comprising: a) isolating a nucleic acid from a soybean plant part; b) detecting in the nucleic acid of a), a first molecular marker that is associated with determinate stem growth and a second molecular marker that is associated with late maturity, wherein said first molecular marker associated with determinate stem growth comprises a G at position 335 of SEQ ID NO: 1, and wherein said second molecular marker associated with late maturity comprises a T at position 260 of SEQ ID NO: 2; and c) selecting or identifying a soybean plant comprising the allelic combination associated with a late maturity determinate stem growth phenotype on the basis of the presence of said first and second molecular markers of b).

2. The method of claim 1, wherein the detecting comprises amplifying each of the first and the second marker, or a portion thereof, and detecting the resulting amplified marker amplicons.

3. The method of claim 2, wherein the amplifying comprises: i) admixing an amplification primer pair with a nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the primer pair is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template; ii) admixing another amplification primer or amplification primer pair with the nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the primer pair is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template and, iii) extending each primer pair in a DNA polymerization reaction comprising a DNA polymerase and a template nucleic acid to generate corresponding amplicons.

4. The method of claim 3, wherein the primer pair that is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth comprises SEQ ID NOS: 3 and 4, and wherein the primer pair that is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity comprises SEQ ID NOS: 5 and 6.

5. The method of claim 2, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm thereof as a template in the PCR or LCR.

6. The method of claim 2, wherein detecting the resulting amplified marker amplicons comprises detecting using a marker probe, wherein the marker amplicon generated by amplifying the first marker associated with determinate stem growth comprises SEQ ID NO: 7 or 9, and wherein the marker amplicon generated by amplifying the second marker associated with late maturity comprises SEQ ID NO: 8 or 10.

7. A method of producing a soybean plant having a favorable allelic combination associated with a late maturity determinate stem growth phenotype, comprising: a) isolating a nucleic acid from a soybean plant part; b) detecting in the nucleic acid of a), a first molecular marker that is associated with determinate stem growth and a second molecular marker that is associated with late maturity, wherein said first molecular marker associated with determinate stem growth comprises a G at position 335 of SEQ ID NO: 1, and wherein said second molecular marker associated with late maturity comprises a T at position 260 of SEQ ID NO: 2; c) selecting a first soybean plant comprising the favorable allelic combination on the basis of the presence of said first and second molecular markers of b); d) crossing the first soybean plant of c) with a second soybean plant not comprising the molecular marker of b); and e) producing a progeny plant from the cross of d), thereby producing a soybean plant having the favorable allelic combination associated with the late maturity determinate stem growth phenotype.

8. The method of claim 7, wherein the progeny plant comprises each of the first and second molecular markers of b).

9. The method of claim 7, wherein either the first or second soybean plant is an elite soybean plant.

10. The method of claim 7, wherein the progeny plant is backcrossed by one or more generations.

11. The method of claim 7, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.

12. The method of claim 7, wherein the detecting comprises amplifying each of the first and the second marker, or a portion thereof, and detecting the resulting amplified marker amplicons.

13. The method of claim 12, wherein the amplifying comprises: i) admixing an amplification primer pair with a nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the primer pair is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template; ii) admixing another amplification primer or amplification primer pair with the nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the another primer pair is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template; and iii) extending each primer pair in a DNA polymerization reaction comprising a DNA polymerase and a template nucleic acid to generate corresponding amplicons.

14. The method of claim 13, wherein the primer pair that is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth comprises SEQ ID NOS: 3 and 4, and wherein the another primer pair that is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity comprises SEQ ID NOS: 5 and 6.

15. The method of claim 12, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm thereof as a template in the PCR or LCR.

16. The method of claim 12, wherein the detecting the resulting amplified marker amplicons comprises detecting using a marker probe, wherein the marker amplicon generated by amplifying the first marker associated with determinate stem growth comprises SEQ ID NO: 7 or 9, and wherein the marker amplicon generated by amplifying the second marker associated with late maturity comprises SEQ ID NO: 8 or 10.

17. A composition comprising at least two amplification primer pairs capable of initiating DNA polymerization by a DNA polymerase on a Glycine max nucleic acid template to generate at least two Glycine max marker amplicons for detecting presence of an allelic combination associated with a late maturity determinate stem growth phenotype, wherein the at least two Glycine max marker amplicons comprise SEQ ID NO: 1 and SEQ ID NO: 2.

18. The composition of claim 17, wherein the at least two amplification primer pairs comprise (i) the primer pair of SEQ ID NO: 3 and SEQ ID NO: 4 and (ii) the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6.

19. The composition of claim 17 or 18, further comprising at least two marker probes for identification of the at least two amplicons, where the at least two marker probes are: (i) SEQ ID NO: 7 and 9; or (ii) SEQ ID NO: 8 and 10.

20. The composition of claim 19, wherein the amplification primer pair comprises SEQ ID NOS: 3 and 4 and the marker probes comprise SEQ ID NOS: 7 and 9 and/or wherein the amplification primer pair comprises SEQ ID NOS: 5 and 6 and the marker probes comprise SEQ ID NOS: 8 and 10.

Description:
METHODS AND COMPOSITIONS FOR SELECTING SOYBEAN PLANTS HAVING FAVORABLE ALLELIC COMBINATIONS OF STEM TERMINATION AND MATURITY

FIELD OF THE DISCLOSURE

[0001] The present disclosure relates to compositions and methods for identifying, selecting and/or producing soybean plants having a desired allelic combination for late maturity and determinate stem termination.

SEQUENCE LISTING

[0002] A Sequence Listing in XML format, submitted under 37 C.F.R. § 1.831-1.835, entitled 82466-US-L-ORG-NAT-l.xml, generated on June 30, 2022 and approximately 64kb in size, is submitted herewith. This Sequence Listing is hereby incorporated by reference into the specification for its disclosures.

BACKGROUND

[0003] Soybean (Glycine max (L.) Merr.) is a major cash crop and investment commodity in North America and elsewhere. Soybean oil is one of the most widely used edible oils, and soybeans are used worldwide both in animal feed and in human food production. Soybean cultivars can be classified based on a variety of phenotypic traits. One such trait is stem termination, which affects plant stem growth habits. Another such trait is soybean maturity, which affects the number of days to flowering.

[0004] Most soybean cultivars can be classified into two main categories of stem termination: determinate and indeterminate types. In indeterminate cultivars, the apical meristems at the stem and branch apices maintain vegetative activity until photosynthate demand by developing seeds causes a cessation in the production of vegetative dry matter. In contrast, for determinate cultivars, the apical meristems cease vegetative activity at or soon after photoperiod-induced floral induction, and then the meristems become reproductive inflorescences. It is known that stem termination has an impact on plant height, node production, water-use efficiency, and soybean yield. One of the genes that plays a role in stem termination is the Dt1 gene on chromosome 19 of the soybean genome. Classical genetic analysis of the stem termination habit of soybean revealed a recessive allele at the determinate stem (Dt1) locus. Using the pea orthologs to extract the coding sequence, expression data, and gene silencing, Dt1 has been identified as being an ortholog of Arabidopsis TERMINAL FLOWER1 (GmTFL1b) (Liu et al., Plant Physiology, 153: 198-210 (2010)). The indeterminate cultivars are more widely adapted and more stable to adverse environmental conditions. Molecular markers associated with indeterminate alleles at the Dt1 locus have been disclosed in U.S. Patent No. 10,517,242, the contents of which are incorporated by reference herein in their entirety.

[0005] Soybean maturity refers to a growth trait involving the control of flowering time (that is, photoperiod-induced floral induction) and plant maturity, with early maturity associated with a smaller number of days to flowering and later maturity associated with a larger number of days to flowering. At least eleven major E-genes have been reported as being involved in the control of flowering time and maturity in soybean, with E1-E4 being the primary genes whose dominant alleles can delay maturity (Tsubokura et al., Annals of Botany, 113(3): 429-441 (2014)). The E1 locus was limited to a 17.4 kb region containing an intron-free gene and was identified by positional cloning (Xia et al., PNAS, 109:E2155-E2164, (2012)). This gene (E1 gene) encodes a protein that contains a putative bipartite nuclear localization signal and a region distantly related to the B3 domain, suggesting that this protein is a novel transcription factor. The E1 gene is considered to be a suppressor of the expression of GmFT2a and GmFT5a, which are both orthologues of Arabidopsis FLOWERING LOCUS T (FT) (Kong et al., Plant Physiology, 154: 1220-1231 (2010)), under the regulation of the E3 and E4 loci (Xia et al., PNAS, 109:E2155-E2164, (2012)). Xia et al. have also disclosed allelic variations in the E1 gene. The E1 maturity gene regulates the number of days required for photoperiod-induced floral induction.

[0006] Stem termination and soybean maturity are typically assessed through phenotyping, which can be time consuming, subjective, and can only be done at late stages of breeding programs. When assessing segregating populations developed from breeding crosses involving parent lines having different allelic combinations of stem termination and maturity genes, errors in selecting for a desired phenotype (due to variations in growth habit and flowering traits) can result in breeding selection errors.

SUMMARY OF THE INVENTION

[0007] Disclosed herein are methods for the use of a combination of molecular markers associated with the E1 maturity locus and the Dt1 stem termination locus to provide more consistent and reliable selection of plants having a favorable or desired phenotype. In some examples, the methods disclosed herein can be used to identify and/or select for soybean plants having a favorable allelic combination of a dominant allele at the E1 locus (E1/E1) and a recessive allele at the Dt1 locus (dt1/dt1), thereby resulting in a plant having the favorable phenotypic combination of determinate stem growth and delayed maturity (or larger number of days to flowering). Plants identified and/or selected by such a method can be advanced at earlier stages of a breeding pipeline. In other examples, the methods disclosed herein can be used to identify and/or deselect for soybean plants having an unfavorable allelic combination, such as the unfavorable allelic combination of a recessive allele at the E1 locus (e1/e1) and a recessive allele at the Dt1 locus (dt1/dt1), thereby resulting in a plant having the unfavorable phenotypic combination, such as the combination of determinate stem growth and early maturity (or smaller number of days to flowering). Still other unfavorable phenotypic combinations can be identified and deselected. Plants identified and/or deselected by such a method can be removed at earlier stages of a breeding pipeline.

[0008] Also disclosed herein are compositions comprising nucleic acid based molecular markers associated with a favorable allelic combination that can also be used to select for plants having a favorable phenotype resulting from the favorable allelic combination. In some examples, the compositions comprise markers associated with the favorable allelic combination of a dominant allele at the E1 locus (E1/E1) and a recessive allele at the Dt1 locus (dt1/dt1), allowing for the selection of soybean plants having the favorable phenotypic combination of determinate stem growth and delayed maturity (or larger number of days to flowering). The compositions may be used to select for plants having the favorable allelic combination and/or for deselecting plants having unfavorable allelic combinations.

[0009] Use of molecular markers associated with stem termination and maturity in soybean will support selections by soybean breeding programs at earlier breeding stages and will help ensure that only plants with desirable alleles are advanced to late stage testing. Also, there is an expected improvement in cost efficiency from eliminating the need for expensive, low-throughput phenotyping at early breeding stages, thus increasing accuracy of the selection and maximizing the value of investments in field testing. From a trait introgression perspective, the molecular marker combination can be deployed for rapid introgression of both GM and native traits, and for expansion of allele frequency for native traits in a broad germplasm base.

[0010] Embodiments of the method comprise identifying and/or selecting a soybean plant with a genotype associated with a late maturity-determinate growth phenotype, the method comprising obtaining a nucleic acid (e.g., DNA or RNA) sample from a soybean plant or part thereof; detecting in the obtained nucleic acid a molecular marker combination comprising: (1) a G at position 335 of SEQ ID NO: 1; and (2) a T at position 260 of SEQ ID NO: 2; and selecting at least one soybean plant on the basis of presence of said molecular marker combination, wherein said molecular marker combination is associated with the desired late maturity-determinate growth phenotype.
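By way of illustration only, the detection step described above can be sketched in Python as a check of the base found at each marker position. The function names are hypothetical, the sketch assumes that each input string is already aligned so that its first character corresponds to position 1 of the respective SEQ ID NO (1-based numbering is assumed), and the strings used with it would be sequence reads supplied by the user, not the actual sequences of the Sequence Listing.

```python
def has_marker_allele(sequence: str, position: int, expected_base: str) -> bool:
    """Return True if the 1-based `position` of `sequence` holds `expected_base`.

    Assumption: `sequence` is aligned to the reference so that index 0
    corresponds to position 1 of the relevant SEQ ID NO.
    """
    if position < 1 or position > len(sequence):
        raise ValueError("position lies outside the supplied sequence")
    return sequence[position - 1].upper() == expected_base.upper()


def has_marker_combination(seq_id_1_read: str, seq_id_2_read: str) -> bool:
    """Check for the marker combination named in the text:
    a G at position 335 of SEQ ID NO: 1 and a T at position 260 of SEQ ID NO: 2.
    """
    return (
        has_marker_allele(seq_id_1_read, 335, "G")
        and has_marker_allele(seq_id_2_read, 260, "T")
    )
```

A plant would be retained for selection only when `has_marker_combination` returns `True` for its two reads. A real genotyping assay reports allele calls from probe signals, so this string comparison is a simplification of the detection step, not a description of the assay itself.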

[0011] The disclosure also encompasses methods of introgression of a favorable allelic combination of a dominant allele at an E1 locus and a recessive allele at a Dt1 locus into the germplasm of a soybean plant not comprising said allelic combination by crossing a donor parental soybean plant comprising the favorable allelic combination with a recurrent parental soybean plant not comprising the favorable allelic combination to produce progeny plants; selecting a progeny plant for backcrossing with the recurrent parental soybean plant based on the presence, in their genome, of a first marker associated with the dominant allele at the E1 locus and a second marker associated with the recessive allele at the Dt1 locus, wherein the first marker comprises a G at position 335 as in SEQ ID NO: 1 and the second marker comprises a T at position 260 of SEQ ID NO: 2; and backcrossing the selected progeny plant with the recurrent parental soybean plant to generate further progeny plants having the desired allelic combination.

[0012] Compositions and/or kits are also disclosed comprising at least two amplification primer pairs capable of initiating DNA polymerization by a DNA polymerase on a Glycine max nucleic acid template and amplifying nucleic acid from the template to generate a Glycine max marker amplicon, wherein the Glycine max amplicon can be used to identify the presence of a molecular marker combination associated with a dominant allele at an E1 locus and a recessive allele at a Dt1 locus. Such compositions comprise (i) a first primer pair comprising a nucleotide sequence as set forth in SEQ ID NOs: 1 and 2; and (ii) a second primer pair comprising a nucleotide sequence as set forth in SEQ ID NOs: 3 and 4. In additional embodiments, the composition further comprises a marker probe pair for identification of the marker amplicons, where the marker probe comprises SEQ ID NO: 7 and SEQ ID NO: 9 or SEQ ID NO: 8 and SEQ ID NO: 10.

[0013] These and other aspects of the invention are set forth in more detail in the description of the invention below.

BRIEF DESCRIPTIONS OF THE SEQUENCES

[0014] SEQ ID NO: 1 is the DNA sequence for molecular marker SY3970 associated with an allele for stem termination at the Dt1 locus of the soybean genome.

[0015] SEQ ID NO: 2 is the DNA sequence for molecular marker SY0768BQ associated with an allele for early maturity at the E1 locus of the soybean genome.

[0016] SEQ ID NO: 3 is a forward primer for the SY3970 marker.

[0017] SEQ ID NO: 4 is a reverse primer for the SY3970 marker.

[0018] SEQ ID NO: 5 is a forward primer for the SY0768BQ marker.

[0019] SEQ ID NO: 6 is a reverse primer for the SY0768BQ marker.

[0020] SEQ ID NO: 7 is a first marker probe for the SY3970 marker.

[0021] SEQ ID NO: 8 is a second marker probe for the SY3970 marker.

[0022] SEQ ID NO: 9 is a first marker probe for the SY0768BQ marker.

[0023] SEQ ID NO: 10 is a second marker probe for the SY0768BQ marker.

BRIEF DESCRIPTION OF THE FIGURE

[0024] Figure 1 shows the various developmental phases of a soybean plant.

DEFINITIONS

[0025] Although the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate understanding of the presently disclosed subject matter.

[0026] Compositions and methods are provided for identifying, selecting, and producing soybean plants having a favorable allelic combination of determinate stem termination and late maturity.

[0027] As used herein, the term “determinate” refers to a favorable phenotype wherein the apical meristems of a soybean plant cease vegetative activity at or soon after photoperiod-induced floral induction, and then the meristems become reproductive inflorescences. Thus, “determinate growth habit” or “determinate stem termination phenotype” refers to ceasing of vegetative growth after the main stem terminates in a cluster of flowers. In comparison, the term “indeterminate” refers to the unfavorable phenotype wherein the apical meristems of a soybean plant at the stem and branch apices maintain vegetative activity until photosynthate demand by developing seeds causes a cessation in the production of vegetative dry matter. Thus, “indeterminate growth habit” or “indeterminate stem termination phenotype” refers to the development of leaves and flowers simultaneously throughout a portion of the reproductive period, with one to three pods at the terminal apex. In embodiments of the invention, the markers disclosed herein can be used to select, identify or produce soybean plants having a determinate phenotype, or a plant breeder could use the markers as disclosed herein to eliminate plant lines having the indeterminate phenotype.

[0028] As used herein, the “Dt1 gene”, “Dt1 locus”, or an allele thereof is in reference to GmDt1 or Glyma.19G194300 (www(.)soybase(.)org) associated with stem termination and located on chromosome 19. Dt1 affects stem termination, which in turn has great effects on plant height, flowering period, node production, maturity, water-use efficiency, and soybean yield. Usually the indeterminate cultivars are more widely adapted and more stable to adverse environmental conditions.

[0029] As used herein, the “E1 gene”, “E1 locus” or an allele thereof, is in reference to GmE1 or Glyma.06G207800 (www(.)soybase(.)org) associated with flowering time and maturity and located at the pericentromeric region of chromosome 6. The E1 gene is intron-free and encodes an E1 protein that contains a putative bipartite nuclear localization signal (NLS) and a domain distantly related to the plant-specific B3 domain (B3-like domain) (Xia, Z. J. et al. Proc Natl Acad Sci USA. 109, E2155-E2164 (2012)). E1 is a transcription factor. E1 is considered to be a contributor to the variation in flowering time among soybean cultivars. E1 also has an impact on pre-flowering development in addition to post-flowering response. E1 is expressed in a bimodal pattern, with higher expression in long-day (LD) conditions than in short-day (SD) conditions. E1 is a putative transcription factor (TF) that negatively controls GmFT2a and GmFT5a to delay flowering under the background with functional PHYA genes (E3, E4) and LD conditions.

[0030] In embodiments of the invention, a marker associated with the Dt1 locus of the soybean genome, and particularly with the recessive allele at the Dt1 locus, may be used in combination with a marker associated with the E1 locus, and particularly with the dominant allele at the E1 locus, to identify or select for plants having the favorable late maturity determinate phenotype. In particular embodiments, the marker associated with the recessive allele at the Dt1 locus and responsible for the favorable determinate phenotype is the marker of SEQ ID NO: 1 with a G at position 335, herein also referred to as molecular marker SY3970, and the marker associated with the dominant allele at the E1 locus and responsible for the favorable late maturity phenotype is the marker of SEQ ID NO: 2 with a T at position 260, herein also referred to as molecular marker SY0768BQ. Use of the molecular marker associated with the Dt1 locus in combination with a molecular marker associated with the E1 maturity locus enables a person to select, identify or produce soybean plants having a favorable late maturity determinate phenotype. Additionally or optionally, use of the molecular marker associated with the Dt1 locus in combination with a molecular marker associated with the E1 maturity locus enables a person to select, identify and eliminate soybean plants having an unfavorable early maturity determinate phenotype, an unfavorable early maturity indeterminate phenotype, or an unfavorable late maturity indeterminate phenotype.

[0031] As used herein, generally the phrase “late maturity” refers to a phenotype wherein a soybean plant has an increased “flowering time” or number of days to flower compared to a soybean plant not having that phenotype, whereas “early maturity” refers to a phenotype wherein a soybean plant has a decreased flowering time or number of days to flower compared to a soybean plant not having that phenotype. The exact number of increased or decreased days may vary, relative to a control plant comprising wild-type alleles at the relevant loci.

[0032] For example, the markers disclosed herein can be used to select, identify or produce soybean plants having a late maturity phenotype or a plant breeder could use the markers as disclosed herein to eliminate plant lines having the early maturity phenotype as compared to wild-type plants. Beyond flowering time, other aspects of maturity phenotypes can be described using post-flowering time, relative maturity, maturity time, maturity group, and number of days from flowering of the soybean plant to beginning of maturity. An illustration of the various growth stages is provided in Figure 1.

[0033] As used herein, “flowering time” or “days to flowering” is an estimate of a duration (e.g., in terms of hours, days, weeks, etc.) elapsed between seed emergence and initiation of first flowering. In particular embodiments, flowering time is defined as a number of days elapsed for a soybean plant to transition from a VE stage (e.g., seed emergence wherein cotyledons have been pulled through the soil surface for at least 50% of the seeds) to an R1 stage (e.g., beginning of flowering wherein at least 50% of the plants have at least one flower on any node). In particular embodiments, “maturity time” or post-flowering time is defined as a number of days elapsed for a soybean plant to transition from the R1 stage (e.g., beginning of bloom wherein there is one open flower at any node on the main stem) to an R7 stage (wherein any pod has reached a mature pod color) or from the R1 stage to an R8 stage (wherein 95% of the pods have reached their mature pod color). A description of the various development stages of a soybean plant and mature soy pod coloration is provided at FIG. 1 as reference.

[0034] In embodiments, a plant having an allele associated with a late maturity phenotype has a flowering time that is between 1 and 10 days longer or later than the control plant (e.g., longer than that of the control plant by at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days or 10 days). In embodiments, a plant having an allele associated with an early maturity phenotype has a flowering time that is between 1 and 10 days shorter or earlier than the control plant (e.g., shorter than that of the control plant by at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days or 10 days).
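As a non-limiting illustration, the day counts defined above can be computed from recorded stage dates as sketched below in Python. The function names and the example dates are hypothetical. The one-day minimum difference used to label a plant as late or early relative to a control is an assumption drawn from the "at least 1 day" wording above, and the sketch does not enforce the 10-day upper bound of the exemplary range.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of whole days from `start` to `end`; `end` must not precede `start`."""
    if end < start:
        raise ValueError("end date precedes start date")
    return (end - start).days


def flowering_time(ve_date: date, r1_date: date) -> int:
    """Days elapsed from the VE stage (emergence) to the R1 stage (first flower)."""
    return days_between(ve_date, r1_date)


def maturity_time(r1_date: date, r7_or_r8_date: date) -> int:
    """Days elapsed from the R1 stage to the R7 or R8 stage (post-flowering time)."""
    return days_between(r1_date, r7_or_r8_date)


def maturity_relative_to_control(plant_days: int, control_days: int) -> str:
    """Label a plant by its flowering time relative to a control plant.

    Assumption: a difference of at least 1 day counts as late or early.
    """
    difference = plant_days - control_days
    if difference >= 1:
        return "late"
    if difference <= -1:
        return "early"
    return "same"
```

For example, a plant recorded at VE on 10 May 2023 and at R1 on 20 June 2023 has `flowering_time(date(2023, 5, 10), date(2023, 6, 20))` equal to 41 days, and against a control with a flowering time of 36 days it would be labeled "late".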

[0035] As used herein, the terms “a” or “an” or “the” may refer to one or more than one. For example, “a” marker (e.g., SNP, QTL, haplotype) can mean one marker or a plurality of markers (e.g., 2, 3, 4, 5, 6, and the like).

[0036] As used herein, the term “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).

[0037] As used herein, the term “about,” when used in reference to a measurable value such as an amount of mass, dose, time, temperature, and the like, is meant to encompass variations of 20%, 10%, 5%, 1%, 0.5%, or even 0.1% of the specified amount.

[0038] As used herein, the transitional phrase “consisting essentially of” means that the scope of a claim is to be interpreted to encompass the specified materials or steps recited in the claim and those that do not materially affect the basic and novel characteristic(s) of the claimed invention. Thus, the term “consisting essentially of” when used in a claim of this invention is not intended to be interpreted to be equivalent to “comprising.”

[0039] As used herein, the term “allele” refers to a variant or an alternative nucleotide sequence of a gene or at a particular genetic locus. Such an allele can be considered (i) wild-type or (ii) mutant if one or more mutations or edits are present in the nucleic acid sequence of the mutant allele relative to the wild-type allele. In diploids, a single allele is inherited by a progeny individual separately from each parent at each locus. The two alleles of a given locus present in a diploid organism occupy corresponding places on a pair of homologous chromosomes, although one of ordinary skill in the art understands that the alleles in any particular individual do not necessarily represent all of the alleles that are present in the species.

[0040] As used herein, the terms “desired allele”, “favorable allele”, “target allele” and/or “allele of interest” are used interchangeably to refer to an allele associated with a desired trait. In some embodiments, a desired allele may be associated with either an increase or a decrease (relative to a control) of or in a given trait, depending on the nature of the desired phenotype. In some embodiments of this invention, the phrase “desired allele,” “target allele” or “allele of interest” refers to an allele(s) that is associated with late maturity and/or determinate stem traits in soybean plants relative to a control soybean plant not having the target allele or alleles.

[0041] A mutant allele for a gene may have a reduced or eliminated activity or expression level for the gene relative to the wild-type allele. For diploid organisms such as corn and soy, a first allele can occur on one chromosome, and a second allele can occur at the same locus on a second homologous chromosome. If one allele at a locus on one chromosome of a plant is a mutant allele and the other corresponding allele on the homologous chromosome of the plant is wild-type, then the plant is described as being heterozygous for the mutant allele. However, if both alleles at a locus are mutant alleles, then the plant is described as being homozygous for the mutant alleles. A plant homozygous for mutant alleles at a locus may comprise the same mutant allele or different mutant alleles if heteroallelic or biallelic.

[0042] “Allelic variation” refers to the phenomenon of variation in the sequence form of an allele at a given genetic locus. Allelic variation results in the creation of two or more allelic variants. The variants may be naturally occurring and reflective of genetic differences among individuals of the same species. Such natural variations can occur as a result of natural breeding patterns. Alternatively, the variants may be non-naturally occurring, and artificially created (e.g., by a breeder or a scientist), such as using mutagenesis and/or gene editing techniques. A “dominant allele” is an allele that, when present either in single copy (heterozygous) or two copies (homozygous), affects the corresponding trait. Thus, a “dominant allele at the E1 locus” is an E1 allele that, when present either in a single or double copy, results in a late maturity phenotype. A “recessive maturity allele at the E1 locus” is an allele that affects the maturity of the plant only when present in two copies (homozygous), to result in an early maturity phenotype, and does not cause an early maturity phenotype when present in a single copy (heterozygous). A “dominant allele at the Dt1 locus” is a Dt1 allele that, when present either in a single or double copy, results in an indeterminate stem termination phenotype. A “recessive allele at the Dt1 locus” is an allele that affects the stem termination of the plant only when present in two copies (homozygous), to result in a determinate stem termination phenotype.

[0043] As used herein, an “allelic combination” refers to the specific combination of alleles present at more than one characterized location or loci. Embodiments of the invention include methods of using molecular markers to identify, in the genome of a soybean plant, the presence of a favorable or desired allelic combination at the E1 and Dt1 loci. Such a desirable allelic combination includes a recessive allele at the Dt1 locus resulting in a determinate stem termination phenotype, and a dominant allele at the E1 locus resulting in a late maturity phenotype. Embodiments of the invention also include methods of using molecular markers to identify, in the genome of a soybean plant, the absence of the favorable allelic combinations and/or the presence of an unfavorable or undesired allelic combination at the E1 and Dt1 loci. Non-limiting examples of undesired allelic combinations at the E1 and Dt1 loci include: (i) a recessive allele at the Dt1 locus resulting in a determinate stem termination phenotype, and a recessive allele at the E1 locus resulting in an early maturity phenotype; (ii) a dominant allele at the Dt1 locus resulting in an indeterminate stem termination phenotype, and a dominant allele at the E1 locus resulting in a late maturity phenotype; and (iii) a dominant allele at the Dt1 locus resulting in an indeterminate stem termination phenotype, and a recessive allele at the E1 locus resulting in an early maturity phenotype.

[0044] In embodiments of the invention, the allelic combination of a plant at the E1 and Dt1 loci may be determined via molecular marker-based assays, such as a first assay of the DNA of the plant indicative of the presence of a recessive allele at the Dt1 locus and a second assay of the DNA indicative of the presence of a dominant allele at the E1 locus. The first assay may be performed using a primer pair directed to a molecular marker associated with the recessive allele at the Dt1 locus, such as the primer pair of SEQ ID NOS: 3-4 used for detection of the molecular marker SY3970 (SEQ ID NO: 1 with a G at position 335). Alternatively, the first assay may be performed using a probe directed to a molecular marker associated with the recessive allele at the Dt1 locus, such as the probe of SEQ ID NO: 7 or 8 used for detection of the molecular marker SY3970 (SEQ ID NO: 1 with a G at position 335). The second assay may be performed using a primer pair directed to a molecular marker associated with the dominant allele at the E1 locus, such as the primer pair of SEQ ID NOS: 5-6 used for detection of the molecular marker SY0768BQ (SEQ ID NO: 2 with a T at position 260). Alternatively, the second assay may be performed using a probe directed to a molecular marker associated with the dominant allele at the E1 locus, such as the probe of SEQ ID NO: 9 or 10 used for detection of the molecular marker SY0768BQ (SEQ ID NO: 2 with a T at position 260).
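By way of illustration only, the decision logic of the preceding paragraphs can be sketched in Python. The sketch assumes diploid genotype calls written as two-letter strings, assumes that each marker tracks its allele perfectly, and uses hypothetical alternative bases (“A” and “C”) that are not taken from this disclosure. The class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerCall:
    """Diploid genotype call at one SNP marker, e.g. "GG" or "GA" (assumed format)."""
    alleles: str

    def copies(self, base: str) -> int:
        # Number of copies (0, 1 or 2) of the given base in the call.
        return self.alleles.upper().count(base.upper())


def predict_phenotypes(sy3970: MarkerCall, sy0768bq: MarkerCall) -> tuple[str, str]:
    """Return (stem termination, maturity) predicted from the two marker calls."""
    # Recessive allele (tracked here by G at SY3970): determinate only when homozygous.
    stem = "determinate" if sy3970.copies("G") == 2 else "indeterminate"
    # Dominant allele (tracked here by T at SY0768BQ): late maturity with one or two copies.
    maturity = "late" if sy0768bq.copies("T") >= 1 else "early"
    return stem, maturity


def is_favorable(sy3970: MarkerCall, sy0768bq: MarkerCall) -> bool:
    """True for the desired late maturity, determinate allelic combination."""
    return predict_phenotypes(sy3970, sy0768bq) == ("determinate", "late")


# Hypothetical calls: homozygous G at SY3970, heterozygous T at SY0768BQ.
print(is_favorable(MarkerCall("GG"), MarkerCall("TC")))
```

The three undesired combinations listed above correspond to the remaining outputs of `predict_phenotypes`.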

[0045] A “locus” is a position on a chromosome where a gene or marker or allele is located. In some embodiments, a locus may encompass one or more nucleotides. In particular embodiments of the invention, the locus is defined with reference to the 8X public build of the Williams82 soybean genome at the SoyBase internet resource (soybase.org/SequenceIntro.php) or USDA at (bfgl.anri.barc.usda.gov/cgi-bin/soybean/Linkage.pl).

[0046] A marker is “associated with” a trait when said trait is linked to it and when the presence of the marker is an indicator of whether and/or to what extent the desired trait or trait form will occur in a plant/germplasm comprising the marker. Similarly, a marker is “associated with” an allele or chromosome interval when it is linked to it and when the presence of the marker is an indicator of whether the allele or chromosome interval is present in a plant/germplasm comprising the marker. For example, “a marker associated with late maturity” refers to a marker whose presence or absence can be used to predict whether a soybean plant will have a late maturity phenotype with an increased number of days until flowering. Similarly, “a marker associated with the determinate trait” refers to a marker whose presence or absence can be used to predict whether a soybean plant will have a determinate stem phenotype wherein the apical meristems will cease vegetative activity at or soon after photoperiod-induced floral induction, and then become reproductive inflorescences.

[0047] As used herein, the terms “backcross” and “backcrossing” refer to the process whereby a progeny plant is crossed back to one of its parents one or more times (e.g., 1, 2, 3, 4, 5, 6, 7, 8, etc.). In a backcrossing scheme, the “donor” parent refers to the parental plant with the desired gene or locus to be introgressed. The “recipient” parent (used one or more times) or “recurrent” parent (used two or more times) refers to the parental plant into which the gene or locus is being introgressed. For example, see Ragot, M. et al. Marker-assisted Backcrossing: A Practical Example, in TECHNIQUES ET UTILISATIONS DES MARQUEURS MOLECULAIRES LES COLLOQUES, Vol. 72, pp. 45-56 (1995); and Openshaw et al., Marker-assisted Selection in Backcross Breeding, in PROCEEDINGS OF THE SYMPOSIUM “ANALYSIS OF MOLECULAR MARKER DATA,” pp. 41-43 (1994). The initial cross gives rise to the F1 generation. The term “BC1” refers to the second use of the recurrent parent, “BC2” refers to the third use of the recurrent parent, and so on.

[0048] As used herein, the terms “cross” or “crossed” refer to the fusion of gametes via pollination to produce progeny (e.g., cells, seeds or plants). The term encompasses both sexual crosses (the pollination of one plant by another) and selfing (self-pollination, e.g., when the pollen and ovule are from the same plant). The term “crossing” refers to the act of fusing gametes via pollination to produce progeny.

[0049] As used herein, the terms “cultivar” and “variety” refer to a group of similar plants that by structural or genetic features and/or performance can be distinguished from other varieties within the same species.

[0050] As used herein, the terms “elite” and/or “elite line” refer to any line that is substantially homozygous and has resulted from breeding and selection for desirable agronomic performance.

[0051] As used herein, the terms “exotic,” “exotic line” and “exotic germplasm” refer to any plant, line or germplasm that is not elite. In general, exotic plants/germplasms are not derived from any known elite plant or germplasm, but rather are selected to introduce one or more desired genetic elements into a breeding program (e.g., to introduce novel alleles into a breeding program).

[0052] As used herein, the term “dominant” refers to a variant or allele of a gene that masks or overrides the phenotypic effect of a different variant (allele) of the gene on the other copy of the chromosome. This definition can also specifically encompass incomplete dominance or co-dominance, as generally understood in the art.

[0053] A “genetic map” is a description of genetic linkage relationships among loci on one or more chromosomes within a given species, generally depicted in a diagrammatic or tabular form. For each genetic map, distances between loci are measured by the recombination frequencies between them. Recombination between loci can be detected using a variety of markers. A genetic map is a product of the mapping population, types of markers used, and the polymorphic potential of each marker between different populations. The order and genetic distances between loci can differ from one genetic map to another.

[0054] As used herein, the term “genotype” refers to the genetic constitution of an individual (or group of individuals) at one or more genetic loci, as contrasted with the observable and/or detectable and/or manifested trait (the phenotype). Genotype is defined by the combination of allele(s) of one or more known loci that the individual has inherited from its parents. The term genotype can be used to refer to an individual's genetic constitution at a single locus, at multiple loci, or more generally, the term genotype can be used to refer to an individual's genetic make-up for all the genes in its genome. Genotypes can be indirectly characterized, e.g., using markers and/or directly characterized by nucleic acid sequencing.

[0055] As used herein, the term “germplasm” refers to genetic material of or from an individual (e.g., a plant), a group of individuals (e.g., a plant line, variety or family), or a clone derived from a line, variety, species, or culture. The germplasm can be part of an organism or cell, or can be separate from the organism or cell. In general, germplasm provides genetic material with a specific genetic makeup that provides a foundation for some or all of the hereditary qualities of an organism or cell culture. As used herein, germplasm includes cells, seed or tissues from which new plants may be grown, as well as plant parts that can be cultured into a whole plant (e.g., leaves, stems, buds, roots, pollen, cells, etc.).

[0056] A “haplotype” is the genotype of an individual at a plurality of genetic loci, i.e., a combination of alleles. In some embodiments, the genetic loci that define a haplotype are physically and genetically linked, i.e., on the same chromosome segment, however this is not necessarily the case. If the various combination of alleles are not physically and genetically linked, then the haplotype combination is generally determined by the ability of the haplotype to provide a desired phenotype, such as presently described for the late maturity determinate phenotype. The term “haplotype” can refer to polymorphisms at a particular locus, such as a single marker locus, or polymorphisms at multiple loci along one or more chromosomal segments.

[0057] As used herein, the term “heterozygous” refers to a genetic status wherein different alleles reside at corresponding loci on homologous chromosomes.

[0058] As used herein, the term “homozygous” refers to a genetic status wherein identical alleles reside at corresponding loci on homologous chromosomes.

[0059] As used herein, the term “hybrid” in the context of plant breeding refers to a plant that is the offspring of genetically dissimilar parents produced by crossing plants of different lines or breeds or species, including but not limited to the cross between two inbred lines.

[0060] As used herein, the term “inbred” refers to a substantially homozygous plant or variety. The term may refer to a plant or plant variety that is substantially homozygous throughout the entire genome or that is substantially homozygous with respect to a portion of the genome that is of particular interest.

[0061] As used herein, the term “indel” refers to an insertion or deletion in a pair of nucleotide sequences, wherein a first sequence may be referred to as having an insertion relative to a second sequence or the second sequence may be referred to as having a deletion relative to the first sequence.

[0062] As used herein, the terms “introgression,” “introgressing” and “introgressed” refer to both the natural and artificial transmission of a desired allele or combination of desired alleles of a genetic locus or genetic loci from one genetic background to another. For example, a desired allele at a specified locus can be transmitted to at least one progeny via a sexual cross between two parents of the same species, where at least one of the parents has the desired allele in its genome. Alternatively, for example, transmission of an allele can occur by recombination between two donor genomes, e.g., in a fused protoplast, where at least one of the donor protoplasts has the desired allele in its genome. The desired allele may be a selected allele of a marker, a QTL, a transgene, or the like. Offspring comprising the desired allele can be backcrossed one or more times (e.g., 1, 2, 3, 4, or more times) to a line having a desired genetic background, selecting for the desired allele, with the result being that the desired allele becomes fixed in the desired genetic background. For example, a marker associated with late maturity or a marker associated with determinate stem growth may be introgressed from a donor into a recurrent parent that displays early maturity, average maturity, or indeterminate stem traits. The resulting offspring could then be backcrossed one or more times and selected until the progeny possess the genetic marker(s) associated with late maturity and/or determinate stem growth in the recurrent parent background.

[0063] As used herein, the term “linkage” refers to the degree with which one marker locus is associated with another marker locus or some other locus. The linkage relationship between a genetic marker and a phenotype may be given as a “probability” or “adjusted probability.” Linkage can be expressed as a desired limit or range. For example, in some embodiments, any marker is linked (genetically and physically) to any other marker when the markers are separated by less than about 50, 40, 30, 25, 20, or 15 map units (or cM).

[0064] A centimorgan (“cM”) or a genetic map unit (m.u.) is a unit of measure of recombination frequency and is defined as the distance between genes for which one product of meiosis in 100 is recombinant. One cM is equal to a 1% chance that a marker at one genetic locus will be separated from a marker at a second locus due to crossing over in a single generation. Thus, a recombinant frequency (RF) of 1% is equivalent to 1 m.u.
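As a minimal sketch of the arithmetic in the preceding paragraph, the following Python functions convert hypothetical progeny counts into a recombination frequency and then into map units, using the 1% RF = 1 m.u. equivalence stated above. The function names and counts are illustrative assumptions. The direct conversion is generally taken to hold only for short distances, and mapping functions such as those of Haldane or Kosambi are commonly applied over longer intervals.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Fraction of meiotic products that are recombinant between two loci."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("counts must satisfy 0 <= recombinant <= total and total > 0")
    return recombinant / total


def map_units(rf: float) -> float:
    """Convert a recombination frequency (0 to 0.5) to map units (cM), 1% RF = 1 m.u."""
    if not 0.0 <= rf <= 0.5:
        raise ValueError("recombination frequency is expected to lie between 0 and 0.5")
    return rf * 100.0


# Hypothetical example: 12 recombinant progeny out of 400 scored.
rf = recombination_frequency(12, 400)
print(f"RF = {rf:.3f}, distance = {map_units(rf):.1f} cM")
```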

[0065] As used herein, the phrase “linkage group” refers to all of the genes or genetic traits that are located on the same chromosome. Within the linkage group, those loci that are close enough together can exhibit linkage in genetic crosses. Since the probability of crossover increases with the physical distance between loci on a chromosome, loci for which the locations are far removed from each other within a linkage group might not exhibit any detectable linkage in direct genetic tests. The term “linkage group” is mostly used to refer to genetic loci that exhibit linked behavior in genetic systems where chromosomal assignments have not yet been made. Thus, the term “linkage group” is synonymous with the physical entity of a chromosome, although one of ordinary skill in the art will understand that a linkage group can also be defined as corresponding to a region of (i.e., less than the entirety of) a given chromosome.

[0066] As used herein, the term “linkage disequilibrium” refers to a non-random segregation of genetic loci or traits (or both). In either case, linkage disequilibrium implies that the relevant loci are within sufficient physical proximity along a length of a chromosome so that they segregate together with greater than random (i.e., non-random) frequency (in the case of co-segregating traits, the loci that underlie the traits are in sufficient proximity to each other). Markers that show linkage disequilibrium are considered linked. Linked loci cosegregate more than 50% of the time, e.g., from about 51% to about 100% of the time. In other words, two markers that co-segregate have a recombination frequency of less than 50% (and, by definition, are separated by less than 50 cM on the same chromosome). As used herein, linkage can be between two markers, or alternatively between a marker and a phenotype. A marker locus can be “associated with” (linked to) a trait, e.g., late maturity and/or determinate. The degree of linkage of a genetic marker to a phenotypic trait is measured, e.g., as a statistical probability of co-segregation of that marker with the phenotype.

[0067] Linkage disequilibrium is most commonly assessed using the measure r², which is calculated using the formula described by Hill and Robertson, Theor. Appl. Genet. 38:226 (1968). When r² = 1, complete linkage disequilibrium exists between the two marker loci, meaning that the markers have not been separated by recombination and have the same allele frequency. Values for r² above 1/3 indicate sufficiently strong linkage disequilibrium to be useful for mapping. Ardlie et al., Nature Reviews Genetics 3:299 (2002). Hence, alleles are in linkage disequilibrium when r² values between pairwise marker loci are greater than or equal to about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0.
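For illustration, the r² measure is commonly written as r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB, pA and pB are the frequencies of one allele at each of the two loci, and pAB is the frequency of the haplotype carrying both. The Python sketch below computes this value from a list of phased two-locus haplotypes. The phased input format, the function name and the example alleles are assumptions made for the example only.

```python
def r_squared(haplotypes: list[tuple[str, str]], allele_a: str, allele_b: str) -> float:
    """Pairwise r^2 between two biallelic loci from phased haplotypes.

    Each haplotype is a (allele at locus 1, allele at locus 2) pair.
    allele_a and allele_b name the reference allele counted at each locus.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denominator


# Hypothetical sample in which the two alleles always travel together.
sample = [("G", "T")] * 5 + [("A", "C")] * 5
print(r_squared(sample, "G", "T"))
```

In the hypothetical sample the alleles are never separated, so the value sits at the upper bound described above; a sample in which all four haplotypes are equally frequent gives zero.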

[0068] As used herein, the term “linkage equilibrium” describes a situation where two markers independently segregate, i.e., sort among progeny randomly. Markers that show linkage equilibrium are considered unlinked (whether or not they lie on the same chromosome).

[0069] As used herein, the terms “marker” and “genetic marker” are used interchangeably to refer to a nucleotide and/or a nucleotide sequence that has been associated with a phenotype and/or trait. A marker may be, but is not limited to, an allele, a gene, a haplotype, a chromosome interval, a restriction fragment length polymorphism (RFLP), a simple sequence repeat (SSR), a random amplified polymorphic DNA (RAPD), a cleaved amplified polymorphic sequence (CAPS) (Rafalski and Tingey, Trends in Genetics 9:275 (1993)), an amplified fragment length polymorphism (AFLP) (Vos et al., Nucleic Acids Res. 23:4407 (1995)), a single nucleotide polymorphism (SNP) (Brookes, Gene 234:177 (1993)), a sequence-characterized amplified region (SCAR) (Paran and Michelmore, Theor. Appl. Genet. 85:985 (1993)), a sequence-tagged site (STS) (Onozaki et al., Euphytica 138:255 (2004)), a single-stranded conformation polymorphism (SSCP) (Orita et al., Proc. Natl. Acad. Sci. USA 86:2766 (1989)), an inter-simple sequence repeat (ISSR) (Blair et al., Theor. Appl. Genet. 98:780 (1999)), an inter-retrotransposon amplified polymorphism (IRAP), a retrotransposon-microsatellite amplified polymorphism (REMAP) (Kalendar et al., Theor. Appl. Genet. 98:704 (1999)), an isozyme marker, an RNA cleavage product (such as a Lynx tag) or any combination of the markers described herein. A marker may be present in genomic or expressed nucleic acids (e.g., ESTs). A large number of soybean genetic markers are known in the art, and are published or available from various sources, such as the SoyBase internet resource (www.soybase.org). In some embodiments, a genetic marker of this invention is a SNP allele, a SNP allele located in a chromosome interval and/or a haplotype (combination of SNP alleles) that is associated with a late maturity determinate phenotype.

[0070] Markers corresponding to genetic polymorphisms between members of a population can be detected by methods well-established in the art. These include, but are not limited to, nucleic acid sequencing, hybridization methods, amplification methods (e.g., PCR-based sequence specific amplification methods), detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), detection of amplified variable sequences of the plant genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of randomly amplified polymorphic DNA (RAPD), detection of single nucleotide polymorphisms (SNPs), and/or detection of amplified fragment length polymorphisms (AFLPs). Thus, in some embodiments of this invention, such well known methods can be used to detect the SNP alleles as defined herein (see, e.g., Tables 1-3).

[0071] Accordingly, in some embodiments of this invention, a marker is detected by amplifying a Glycine sp. nucleic acid with two oligonucleotide primers by, for example, the polymerase chain reaction (PCR).

[0072] A “marker allele,” also described as an “allele of a marker locus,” can refer to one of a plurality of polymorphic nucleotide sequences found at a marker locus in a population that is polymorphic for the marker locus.

[0073] “Marker-assisted selection” (MAS) is a process by which phenotypes are selected based on marker genotypes. Marker assisted selection includes the use of marker genotypes for identifying plants for inclusion in and/or removal from a breeding program or planting.

[0074] As used herein, the terms “marker locus” and “marker loci” refer to a specific chromosome location or locations in the genome of an organism where a specific marker or markers can be found. A marker locus can be used to track the presence of a second linked locus, e.g., a linked locus that encodes or contributes to expression of a phenotypic trait. For example, a marker locus can be used to monitor segregation of alleles at a locus, such as a QTL or single gene, that are genetically or physically linked to the marker locus.

[0075] As used herein, the terms “marker probe” and “probe” refer to a nucleotide sequence or nucleic acid molecule that can be used to detect the presence of one or more particular alleles within a marker locus (e.g., a nucleic acid probe that is complementary to all of or a portion of the marker or marker locus, through nucleic acid hybridization). Marker probes comprising about 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more contiguous nucleotides may be used for nucleic acid hybridization. Alternatively, in some aspects, a marker probe refers to a probe of any type that is able to distinguish (i.e., genotype) the particular allele that is present at a marker locus. Typically, probes are a single-stranded oligonucleotide sequence that will form a hydrogen-bonded duplex with a complementary sequence in a target nucleic acid sequence analyte or its cDNA derivative. Non-limiting examples of probes of this disclosure include those listed in Table 4.

[0076] As used herein, the term “molecular marker” may be used to refer to a genetic marker, as defined above, or an encoded product thereof (e.g., a protein) used as a point of reference when identifying a linked locus. A molecular marker can be derived from genomic nucleotide sequences or from expressed nucleotide sequences (e.g., from a spliced RNA, a cDNA, etc.). The term also refers to nucleotide sequences complementary to or flanking the marker sequences, such as nucleotide sequences used as probes and/or primers capable of amplifying the marker sequence. Nucleotide sequences are “complementary” when they specifically hybridize in solution, e.g., according to Watson-Crick base pairing rules. Some of the markers described herein can also be referred to as hybridization markers when located on an indel region. This is because the insertion region is, by definition, a polymorphism vis-à-vis a plant without the insertion. Thus, the marker need only indicate whether the indel region is present or absent. Any suitable marker detection technology may be used to identify such a hybridization marker, e.g., SNP technology.

[0077] As used herein, the term “primer” refers to an oligonucleotide which is capable of annealing to a nucleic acid target and serving as a point of initiation of DNA synthesis when placed under conditions in which synthesis of a primer extension product is induced (e.g., in the presence of nucleotides and an agent for polymerization such as DNA polymerase and at a suitable temperature and pH). A primer (in some embodiments an extension primer and in some embodiments an amplification primer) is in some embodiments single stranded for maximum efficiency in extension and/or amplification. In some embodiments, the primer is an oligodeoxyribonucleotide. A primer is typically sufficiently long to prime the synthesis of extension and/or amplification products in the presence of the agent for polymerization. The minimum lengths of the primers can depend on many factors, including, but not limited to temperature and composition (A/T vs. G/C content) of the primer. In the context of amplification primers, these are typically provided as a pair of bi-directional primers consisting of one forward and one reverse primer or provided as a pair of forward primers as commonly used in the art of DNA amplification such as in PCR amplification. As such, it will be understood that the term “primer”, as used herein, can refer to more than one primer, particularly in the case where there is some ambiguity in the information regarding the terminal sequence(s) of the target region to be amplified. Hence, a “primer” can include a collection of primer oligonucleotides containing sequences representing the possible variations in the sequence or includes nucleotides which allow a typical base pairing. Non-limiting examples of primers used in this disclosure to amplify a marker locus and generate an amplicon include those listed in Table 3.

[0078] Primers can be prepared by any suitable method. Methods for preparing oligonucleotides of specific sequence are known in the art, and include, for example, cloning and restriction of appropriate sequences and direct chemical synthesis. Chemical synthesis methods can include, for example, the phospho di- or tri-ester method, the diethylphosphoramidate method and the solid support method disclosed in U.S. Patent No. 4,458,066. Primers can be labeled, if desired, by incorporating detectable moieties by for instance spectroscopic, fluorescence, photochemical, biochemical, immunochemical, or chemical moieties.

[0079] The PCR method is well described in handbooks and known to the skilled person. After amplification by PCR, target polynucleotides can be detected by hybridization with a probe polynucleotide which forms a stable hybrid with that of the target sequence under stringent to moderately stringent hybridization and wash conditions. If it is expected that the probes are essentially completely complementary (i.e., about 99% or greater) to the target sequence, stringent conditions can be used. If some mismatching is expected, for example if variant strains are expected with the result that the probe will not be completely complementary, the stringency of hybridization can be reduced. In some embodiments, conditions are chosen to rule out non-specific/adventitious binding. Conditions that affect hybridization, and that select against non-specific binding are known in the art, and are described in, for example, Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America. Generally, lower salt concentration and higher temperature hybridization and/or washes increase the stringency of hybridization conditions.

[0080] Different nucleotide sequences or polypeptide sequences having homology are referred to herein as “homologues.” The term homologue includes homologous sequences from the same and other species and orthologous sequences from the same and other species. “Homology” refers to the level of similarity between two or more nucleotide sequences and/or amino acid sequences in terms of percent of positional identity (i.e., sequence similarity or identity). Homology also refers to the concept of similar functional properties among different nucleic acids, amino acids, and/or proteins.

[0081] As used herein, the phrase “nucleotide sequence homology” refers to the presence of homology between two polynucleotides. Polynucleotides have “homologous” sequences if the sequence of nucleotides in the two sequences is the same when aligned for maximum correspondence. The “percentage of sequence homology” for polynucleotides, such as 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100 percent sequence homology, can be determined by comparing two optimally aligned sequences over a comparison window (e.g., about 20-200 contiguous nucleotides), wherein the portion of the polynucleotide sequence in the comparison window can include additions or deletions (i.e., gaps) as compared to a reference sequence for optimal alignment of the two sequences. Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms, or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST; Altschul et al. (1990) J Mol Biol 215:403-10; Altschul et al. (1997) Nucleic Acids Res 25:3389-3402) and ClustalX (Chenna et al. (2003) Nucleic Acids Res 31:3497-3500) programs, both available on the Internet. Other suitable programs include, but are not limited to, GAP, BestFit, Plotsimilarity, and FASTA, which are part of the Accelrys GCG Package available from Accelrys Software, Inc. of San Diego, California, United States of America.

[0082] As used herein “sequence identity” refers to the extent to which two optimally aligned polynucleotide or polypeptide sequences are invariant throughout a window of alignment of components, e.g., nucleotides or amino acids. “Identity” can be readily calculated by known methods including, but not limited to, those described in: Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., eds.) Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology (von Heijne, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, New York (1991).

[0083] As used herein, the term “substantially identical” or “corresponding to” means that two nucleotide sequences have at least 50%, 60%, 70%, 75%, 80%, 85%, 90% or 95% sequence identity. In some embodiments, the two nucleotide sequences can have at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity.

[0084] An “identity fraction” for aligned segments of a test sequence and a reference sequence is the number of identical components which are shared by the two aligned sequences divided by the total number of components in the reference sequence segment, i.e., the entire reference sequence or a smaller defined part of the reference sequence. Percent sequence identity is represented as the identity fraction multiplied by 100. As used herein, the term “percent sequence identity” or “percent identity” refers to the percentage of identical nucleotides in a linear polynucleotide sequence of a reference (“query”) polynucleotide molecule (or its complementary strand) as compared to a test (“subject”) polynucleotide molecule (or its complementary strand) when the two sequences are optimally aligned (with appropriate nucleotide insertions, deletions, or gaps totaling less than 20 percent of the reference sequence over the window of comparison). In some embodiments, “percent identity” can refer to the percentage of identical amino acids in an amino acid sequence.
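As a hedged illustration of the identity fraction defined above, the Python sketch below takes two sequences that have already been aligned (equal length, with “-” as an assumed gap character) and divides the number of identical positions by the number of components in the reference segment. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_test: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent identity of a test sequence against a reference, given an alignment.

    Both strings must come from the same alignment and therefore have equal length.
    The denominator is the number of non-gap components in the reference segment.
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    ref_length = sum(1 for base in aligned_ref if base != gap)
    if ref_length == 0:
        raise ValueError("reference segment contains no components")
    matches = sum(
        1
        for test_base, ref_base in zip(aligned_test, aligned_ref)
        if ref_base != gap and test_base.upper() == ref_base.upper()
    )
    identity_fraction = matches / ref_length
    return 100.0 * identity_fraction


# Hypothetical alignment: one gap in the test sequence relative to the reference.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```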

[0085] Optimal alignment of sequences for aligning a comparison window is well known to those skilled in the art and may be conducted by tools such as the local homology algorithm of Smith and Waterman, the homology alignment algorithm of Needleman and Wunsch, the search for similarity method of Pearson and Lipman, and optionally by computerized implementations of these algorithms such as GAP, BESTFIT, FASTA, and TFASTA available as part of the GCG® Wisconsin Package® (Accelrys Inc., Burlington, Mass.). The comparison of one or more polynucleotide sequences may be to a full-length polynucleotide sequence or a portion thereof, or to a longer polynucleotide sequence. For purposes of this invention “percent identity” may also be determined using BLASTX version 2.0 for translated nucleotide sequences and BLASTN version 2.0 for polynucleotide sequences.

[0086] The percent of sequence identity can be determined using the “Best Fit” or “Gap” program of the Sequence Analysis Software Package™ (Version 10; Genetics Computer Group, Inc., Madison, Wis.). “Gap” utilizes the algorithm of Needleman and Wunsch (Needleman and Wunsch, J Mol. Biol. 48:443-453, 1970) to find the alignment of two sequences that maximizes the number of matches and minimizes the number of gaps. “BestFit” performs an optimal alignment of the best segment of similarity between two sequences and inserts gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman (Smith and Waterman, Adv. Appl. Math., 2:482-489, 1981, Smith et al., Nucleic Acids Res. 11:2205-2220, 1983).
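The global alignment approach of Needleman and Wunsch can be sketched as a small dynamic-programming routine. The Python example below is a simplified illustration and is not the “Gap” program itself: the scoring values (match +1, mismatch −1, linear gap penalty −1) and the function name are assumptions chosen for the example, and production tools typically use more elaborate scoring schemes.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table with the best of: substitution, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


# Hypothetical sequences differing by a single deleted base.
print(needleman_wunsch("ACGT", "AGT"))
```

The two aligned strings returned by this routine have equal length, so they are in the form required for a position-by-position identity count.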

[0087] Useful methods for determining sequence identity are also disclosed in Guide to Huge Computers (Martin J. Bishop, ed., Academic Press, San Diego (1994)), and Carillo et al. (Applied Math 48:1073 (1988)). More particularly, preferred computer programs for determining sequence identity include but are not limited to the Basic Local Alignment Search Tool (BLAST) programs which are publicly available from the National Center for Biotechnology Information (NCBI) at the National Library of Medicine, National Institutes of Health, Bethesda, Md. 20894; see BLAST Manual, Altschul et al., NCBI, NLM, NIH, (Altschul et al., J. Mol. Biol. 215:403-410 (1990)); version 2.0 or higher of BLAST programs allows the introduction of gaps (deletions and insertions) into alignments; for peptide sequence BLASTX can be used to determine sequence identity; and for polynucleotide sequence BLASTN can be used to determine sequence identity.

[0088] As used herein, the terms “phenotype,” “phenotypic trait” or “trait” refer to one or more traits of an organism. The phenotype can be observable to the naked eye, or by any other means of evaluation known in the art, e.g., microscopy, biochemical analysis, or an electromechanical assay. In some cases, a phenotype is directly controlled by a single gene or genetic locus, i.e., a “single gene trait.” In other cases, a phenotype is the result of several genes.

[0089] As used herein, the term “polymorphism” refers to a variation in the nucleotide sequence at a locus, where said variation is too common to be due merely to a spontaneous mutation. A polymorphism must have a frequency of at least about 1% in a population. A polymorphism can be a single nucleotide polymorphism (SNP), or an insertion/deletion polymorphism, also referred to herein as an “indel.” Additionally, the variation can be in a transcriptional profile or a methylation pattern. The polymorphic site or sites of a nucleotide sequence can be determined by comparing the nucleotide sequences at one or more loci in two or more germplasm entries.
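A minimal sketch of how polymorphic sites might be found by comparing aligned sequences from two or more germplasm entries is given below. The input format (equal-length aligned strings), the function name, and the use of the second most common allele as the "minor" allele are assumptions made for illustration; the 1% threshold follows the definition above.

```python
from collections import Counter


def polymorphic_sites(aligned_seqs, min_minor_freq=0.01):
    """Return [(1-based position, {allele: frequency})] for polymorphic columns.

    A column is reported when it carries at least two alleles and the second
    most common allele reaches min_minor_freq. A gap character ("-") is
    counted as an allele, so indels are reported as well.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in aligned_seqs)
        if len(counts) < 2:
            continue  # monomorphic column
        total = sum(counts.values())
        ranked = counts.most_common()
        if ranked[1][1] / total >= min_minor_freq:
            sites.append((pos + 1, {allele: n / total for allele, n in ranked}))
    return sites


# Hypothetical aligned entries: one SNP (G/T) at position 3.
entries = ["ACGTA", "ACGTA", "ACTTA", "ACGTA"]
print(polymorphic_sites(entries))
```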

[0090] As used herein, the term “plant” can refer to a whole plant, any part thereof, or a cell or tissue culture derived from a plant. Thus, the term “plant” can refer to a whole plant, a plant component or a plant organ (e.g., leaves, stems, roots, etc.), a plant tissue, a seed and/or a plant cell. A plant cell is a cell of a plant, taken from a plant, or derived through culture from a cell taken from a plant.

[0091] As used herein, the term “soybean” refers to a plant, and any part thereof, of the genus Glycine including, but not limited to, Glycine max.

[0092] As used herein, the term "plant part" includes but is not limited to embryos, pollen, seeds, leaves, flowers (including but not limited to anthers, ovules and the like), fruit, stems or branches, roots, root tips, cells including cells that are intact in plants and/or parts of plants, protoplasts, plant cell tissue cultures, plant calli, plant clumps, and the like. Thus, a plant part includes soybean tissue culture from which soybean plants can be regenerated. Further, as used herein, "plant cell" refers to a structural and physiological unit of the plant, which comprises a cell wall and also may refer to a protoplast. A plant cell of the present invention can be in the form of an isolated single cell or can be a cultured cell or can be a part of a higher-organized unit such as, for example, a plant tissue or a plant organ.

[0093] As used herein, the term “population” refers to a genetically heterogeneous collection of plants sharing a common genetic derivation.

[0094] As used herein, the terms “progeny”, “progeny plant,” and/or “offspring” refer to a plant generated from a vegetative or sexual reproduction from one or more parent plants. A progeny plant may be obtained by cloning or selfing a single parent plant, or by crossing two parental plants and includes selfings as well as the F1 or F2 or still further generations. An F1 is a first-generation offspring produced from parents at least one of which is used for the first time as donor of a trait, while offspring of second generation (F2) or subsequent generations (F3, F4, and the like) are specimens produced from selfings or crossings of F1s, F2s and the like. An F1 can thus be (and in some embodiments is) a hybrid resulting from a cross between two true breeding parents (the phrase “true-breeding” refers to an individual that is homozygous for one or more traits), while an F2 can be (and in some embodiments is) an offspring resulting from self-pollination of the F1 hybrids.
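The relationship between a hybrid F1 and its selfed F2 can be sketched with a simple Mendelian calculation. The example assumes a single locus with two alleles written as one upper-case and one lower-case letter; the genotype notation and function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_cross(genotype: str) -> dict:
    """Expected offspring genotype frequencies from selfing a diploid plant.

    genotype is a two-character string such as "Aa". Each offspring receives
    one allele from each of the two gametes, all combinations equally likely.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(genotype, repeat=2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


f2 = self_cross("Aa")  # selfing a heterozygous F1
print(f2)

# For two unlinked loci, expected frequencies multiply: the share of F2 plants
# homozygous for one chosen allele at each locus is 1/4 * 1/4.
print(f2["aa"] * f2["AA"])
```

Under these assumptions only 1 in 16 F2 plants is expected to be fixed for a specific allele at each of two independently assorting loci, which is why early genotypic screening of a segregating population can save phenotyping effort.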

[0095] As used herein, the term “recessive” refers to a variant or allele of a gene that is masked or overridden by the phenotypic effect of a different variant (allele) of the gene on the other copy of the chromosome, which is generally described as “dominant.” This definition can also encompass incomplete recessiveness, as generally understood in the art.

[0096] As used herein, the term “reference sequence” refers to a defined nucleotide sequence used as a basis for nucleotide sequence comparison. The reference sequence for a marker, for example, can be obtained by genotyping a number of lines at the locus or loci of interest, aligning the nucleotide sequences in a sequence alignment program, and then obtaining the consensus sequence of the alignment. Hence, a reference sequence identifies the polymorphisms in alleles at a locus. A reference sequence may not be a copy of an actual nucleic acid sequence from any particular organism; however, it is useful for designing primers and probes for actual polymorphisms in the actual nucleic acid sequence.
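The derivation of a reference sequence as the consensus of an alignment can be sketched as follows. The sketch assumes the genotyped lines have already been aligned to equal length and uses a simple majority rule with "N" for ties; both choices are illustrative assumptions rather than the procedure of any particular alignment program.

```python
from collections import Counter


def consensus_sequence(aligned_seqs, ambiguous="N"):
    """Majority-rule consensus of equal-length aligned sequences."""
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned_seqs):
        ranked = Counter(base.upper() for base in column).most_common()
        top_base, top_count = ranked[0]
        # A tie for the most common base is reported as ambiguous.
        if len(ranked) > 1 and ranked[1][1] == top_count:
            consensus.append(ambiguous)
        else:
            consensus.append(top_base)
    return "".join(consensus)


# Hypothetical aligned reads from three lines; the last column is polymorphic.
print(consensus_sequence(["ACGT", "ACGA", "ACGT"]))
```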

DETAILED DESCRIPTION

[0097] All technical and scientific terms used herein, unless otherwise defined below, are intended to have the same meaning as commonly understood by one of ordinary skill in the art. References to techniques employed herein are intended to refer to the techniques as commonly understood in the art, including variations on those techniques or substitutions of equivalent techniques that would be apparent to one of skill in the art.

[0098] All patents, patent publications, non-patent publications and sequences referenced herein are incorporated by reference in their entireties.

[0099] Disclosed herein are compositions for, and methods of using, a combination of genetic markers (e.g., combinations of SNPs) to identify soybean plants having a desired or favorable combination of alleles at the E1 and Dt1 loci, the favorable allelic combination associated with a late maturity determinate phenotype in soybean plants. The combinations of alleles for these two genes are disclosed in specific haplotypes that provide the desired phenotype of late maturity determinate growth. By using molecular markers to identify plants having the favorable allelic combination, plants expected to have a late maturity determinate phenotype can be identified and promoted at an earlier stage of a breeding pipeline. Also disclosed are methods of using the combination of genetic markers (e.g., combinations of SNPs) to identify plants having an undesired or unfavorable combination of alleles at the E1 and Dt1 loci, the unfavorable allelic combination associated with an early maturity and/or an indeterminate phenotype in soybean plants. By using molecular markers to identify plants having the unfavorable allelic combination, plants expected to have an early maturity and/or indeterminate phenotype can be identified and removed at an earlier stage of a breeding pipeline. The methods support soybean germplasm assessment, correct progeny selection during population development, and aid soybean product development. By selecting the correct plant type required for later maturity determinate soybeans based on genetic information provided by the markers disclosed in this invention, there is an improvement in cost efficiency of developing late maturity determinate soybean products due to identifying the correct plant type earlier in the development pipeline. Overall, the time and labor involved in phenotyping each plant of a segregating population is significantly reduced.

Markers Associated with Late Maturity and Determinate Stem Growth

[0100] Genetic loci correlating with a particular phenotype, such as late maturity and/or determinate stem growth, can be mapped in an organism's genome. By identifying a marker or cluster of markers that co-segregate with a trait of interest, or a combination of traits of interest, a breeder is able to rapidly select a desired phenotype, or combination of phenotypes, by selecting for the proper marker or markers (a process called marker-assisted selection, or MAS). Such markers may also be used by breeders to design genotypes in silico and to practice whole genome selection.

[0101] Molecular markers are used for the visualization of differences in nucleic acid sequences. This visualization can be due to DNA-DNA hybridization techniques after digestion with a restriction enzyme (e.g., an RFLP) and/or due to techniques using the polymerase chain reaction (e.g., SNP, STS, SSR/microsatellites, AFLP, and the like). In some embodiments, all differences between two parental genotypes segregate in a mapping population based on the cross of these parental genotypes. The segregation of the different markers can be compared, and recombination frequencies can be calculated. Methods for mapping markers in plants are disclosed in, for example, Glick & Thompson (1993) Methods in Plant Molecular Biology and Biotechnology, CRC Press, Boca Raton, Florida, United States of America; Zietkiewicz et al. (1994) Genomics 20:176-183. Methods for identifying markers in soybean is specifically disclosed in Yuan et al., Plant Genetics, Genomics, and Biotechnology, 2(1): 90-94 (2014).

[0102] The recombination frequencies of genetic markers on different chromosomes and/or in different linkage groups are generally 50%. Between genetic markers located on the same chromosome or in the same linkage group, the recombination frequency generally depends on the physical distance between the markers on a chromosome. A low recombination frequency typically corresponds to a low genetic distance between markers on a chromosome. Comparison of all recombination frequencies among a set of genetic markers results in the most logical order of the genetic markers on the chromosomes or in the linkage groups. This most logical order can be depicted in a linkage map. A group of adjacent or contiguous markers on the linkage map that is associated with late maturity and/or indeterminate phenotype can provide the position of a locus associated with those phenotypes.
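The recombination frequency described above is the share of recombinant gametes among all gametes scored. A minimal sketch, assuming each gamete is recorded as a two-character haplotype over two markers and that the parental (non-recombinant) haplotypes are known; the haplotype labels and counts are hypothetical:

```python
def recombination_frequency(haplotypes, parental=("AB", "ab")):
    """Fraction of scored haplotypes that are not one of the parental types."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    recombinants = sum(1 for h in haplotypes if h not in parental)
    return recombinants / len(haplotypes)


# Hypothetical linked markers: 10 recombinants among 100 gametes.
linked = ["AB"] * 45 + ["ab"] * 45 + ["Ab"] * 5 + ["aB"] * 5
print(recombination_frequency(linked))

# Hypothetical unlinked markers: all four haplotypes equally frequent.
unlinked = ["AB"] * 25 + ["ab"] * 25 + ["Ab"] * 25 + ["aB"] * 25
print(recombination_frequency(unlinked))
```

A value near 0.5 is consistent with markers on different chromosomes or linkage groups, while smaller values suggest a shorter genetic distance.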

[0103] The present invention provides a combination of SNP markers that can be used in various aspects of the presently disclosed subject matter as set forth herein. The SNP markers provided herein can be used for detecting the presence of a combination of alleles in a soybean plant or germplasm that result in a desired late maturity determinate phenotype and can therefore be used in methods involving marker-assisted breeding and selection of soybean plants having the allelic combination that results in the desired late maturity determinate phenotype.

[0104] In particular, methods are disclosed that rely on a combination of markers associated with two independent genes, one marker (e.g., one SNP marker) associated with the stem termination Dt1 gene on Gm19 (LG L) and the other marker (e.g., other SNP marker) associated with the E1 maturity gene on Gm06 (LG C2). This marker combination is used to select the correct plant type consisting of the determinate growth habit (dt1/dt1) and the correct E1 genotype (E1/E1), while also de-selecting the incorrect plant type (dt1/dt1, e1/e1) that results in the deleterious early maturing determinate growth habit that consists of reduced plant height, earlier maturity, and lower yields. The association of the markers used herein, such as the association of marker SY3970 with the Dt1 stem termination locus and the association of marker SY0768BQ with the E1 maturity locus, has been determined through a genome-wide association mapping study. Thus, in particular embodiments, the combination of the E1 maturity locus marker SY0768BQ and stem termination marker SY3970 is used to identify the correct plants in segregating populations developed from breeding crosses involving parent lines contrasting for stem termination and the E1 maturity gene (Dt1/Dt1 e1/e1 x dt1/dt1 E1/E1), thereby allowing breeding pipelines to quickly identify and select for progenies according to the need to fix either allele (Dt1/e1 or dt1/E1).
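The two-marker selection logic can be sketched as a small classifier. The favorable alleles (G for SY3970, T for SY0768BQ) are taken from claim 1; the genotype-call format (two-letter strings), the alternative alleles used in the examples, and the three-way "select / segregating / deselect" labels are assumptions introduced only for illustration.

```python
# Favorable allele per marker, per claim 1 (G at SY3970, T at SY0768BQ).
FAVORABLE = {"SY3970": "G", "SY0768BQ": "T"}


def classify_plant(calls: dict) -> str:
    """Classify a plant from diploid genotype calls such as {"SY3970": "GG", ...}.

    "select":      homozygous favorable at both markers
    "segregating": at least one favorable copy at each marker (hypothetical label)
    "deselect":    no favorable copy at one or both markers
    """
    copies = {
        marker: calls[marker].upper().count(allele)
        for marker, allele in FAVORABLE.items()
    }
    if all(n == 2 for n in copies.values()):
        return "select"
    if all(n >= 1 for n in copies.values()):
        return "segregating"
    return "deselect"


# Hypothetical calls; "A" and "C" stand in for the unfavorable alleles.
print(classify_plant({"SY3970": "GG", "SY0768BQ": "TT"}))
print(classify_plant({"SY3970": "GA", "SY0768BQ": "TT"}))
print(classify_plant({"SY3970": "GG", "SY0768BQ": "CC"}))
```

How heterozygous plants are handled would depend on the breeding objective; the intermediate label is a placeholder for plants that may still yield the favorable combination after selfing.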

[0105] In embodiments, the present invention provides a composition comprising a combination of (1) at least one marker associated with stem termination alleles at the Dt1 locus of the soybean genome and associated with determinate stem growth habits; and (2) at least one marker associated with the E1 locus of the soybean genome and associated with late maturity. In embodiments, detection of the combination of markers and/or other linked markers can be used in a method to identify, select and/or produce plants having a combination of determinate stem termination alleles and late maturity alleles. Detection of the combination of these markers and/or other linked markers can also be used to identify and deselect plants having indeterminate stem termination alleles and/or early maturity alleles, thereby enabling early elimination of such plants from breeding programs.

[0106] In some embodiments, methods for detecting the presence of a SNP combination in a soybean plant or germplasm can comprise providing two, or at least two, oligonucleotides or polynucleotides capable of hybridizing under stringent hybridization conditions to a nucleotide sequence of two SNPs disclosed herein, contacting the oligonucleotides or polynucleotides with genomic nucleic acid (or a fragment thereof, including, but not limited to a restriction fragment thereof) of the soybean plant or germplasm, and determining the presence of each of the SNPs by the specific hybridization of the oligonucleotides or polynucleotides to the soybean genomic nucleic acid (or the fragment thereof) at the corresponding loci. Herein, the two SNPs comprise a first SNP associated with the late maturity allele and a second SNP associated with the determinate stem termination allele.

[0107] Table 1 provides information about a combination of a first molecular marker associated with a late maturity phenotype and a second molecular marker associated with a determinate stem termination phenotype. Table 1 includes, for each marker, the corresponding soybean chromosome and linkage group, the SY Identifier, the locus location, the favorable allele in a favored allelic combination that is associated with the desired late maturity determinate phenotype, and the nucleotide position of the favored allele within the isolated sequence of the SNP. Markers of the present invention can be described herein with respect to the positions of marker loci in the 8X public build of the Williams82 soybean genome (Williams82_V3 reference genome) at the SoyBase internet resource (www.soybase.org/SequenceIntro.php) or USDA at (bfgl.anri.barc.usda.gov/cgi-bin/soybean/Linkage.pl).

Table 1: Description of molecular markers associated with desired stem termination and maturity phenotype

[0108] In particular embodiments, the marker associated with late maturity comprises SY0768BQ (SEQ ID NO: 2) and the marker associated with determinate stem termination comprises SY3970 (SEQ ID NO: 1). In further embodiments, the marker associated with late maturity can include any marker linked to marker SY0768BQ (SEQ ID NO: 2). Similarly, the marker associated with determinate stem termination can include any marker linked to marker SY3970 (SEQ ID NO: 1). Linked markers may be determined, for example, by using resources available on the SoyBase internet resource (soybase(.)org). Thus, in embodiments, the compositions and methods for detecting the presence of the favorable allele combination in a soybean plant or germplasm can comprise detecting

[0109] In the case of soybean maturity genes such as E1, the phenotypic trait includes one or more or a combination of flowering time, post-flowering time, relative maturity, maturity time, maturity group and number of days from flowering of the soybean plant to beginning of maturity. In particular embodiments, the phenotypic trait measured is a flowering time and includes a measure of time elapsed between the VE and R1 phase of a modified soybean plant (see FIG. 1 for the various stages) relative to a control plant. In another embodiment, the phenotypic trait measured is a maturity time and includes a measure of time elapsed between the R1 and R7 phase, or R1 and R8 phase, of the modified soybean plant (see FIG. 1) relative to a control plant. The number of days may vary, relative to a control plant comprising wild-type alleles at both loci, based on the specific allelic combination of the plant. For more descriptions of the various growth stages of the soybean plant, see Endres, G., An overview of soybean plant growth stages, Soybean Growth and Management Quick Guide, publication A1174, North Dakota State University, revised November 2021.

[0110] As used herein, a modified plant having a flowering time and/or maturity time that is later than a control plant has a flowering time and/or maturity time that is between 1 and 10 days later than the control plant (e.g., later than that of the control plant by at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days or 10 days). In embodiments, the flowering time of a modified plant is later than the control plant if the flowering time is later by 1-2 days, 1-3 days, 1-4 days, 1-5 days, 1-6 days, 1-7 days, 1-8 days, 1-9 days or 1-10 days.

[0111] In embodiments, the progeny plant has a later flowering time and/or maturity time relative to the control plant, wherein the control plant comprises a wild-type allele at the E1 locus. In some examples, assigning a change in flowering time and/or maturity time comprises assigning a number of days by which the flowering time and/or maturity time is shortened for the progeny plant relative to the control plant. In particular embodiments, assigning a change in the flowering time comprises increasing a number of days between a VE stage and an R1 stage of the progeny plant relative to the control plant, and/or wherein assigning a change in the maturity time comprises increasing a number of days between an R1 stage and an R7 (or R8) stage of the progeny plant relative to the control plant.
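The stage-interval comparison described above can be sketched as follows. The sketch assumes that the dates on which each plant reaches a given growth stage have been recorded; the function names, the example dates, and the use of a 1 to 10 day window as a boolean test are illustrative assumptions.

```python
from datetime import date


def stage_interval_days(start: date, end: date) -> int:
    """Days elapsed between two recorded growth stages (e.g., VE to R1)."""
    return (end - start).days


def delay_days(progeny_interval: int, control_interval: int) -> int:
    """Positive when the progeny plant takes longer than the control plant."""
    return progeny_interval - control_interval


def is_later(progeny_interval: int, control_interval: int,
             min_delay: int = 1, max_delay: int = 10) -> bool:
    """True when the progeny is later than the control by 1 to 10 days."""
    return min_delay <= delay_days(progeny_interval, control_interval) <= max_delay


# Hypothetical field records: emergence (VE) and first flower (R1).
control = stage_interval_days(date(2023, 5, 10), date(2023, 6, 20))
progeny = stage_interval_days(date(2023, 5, 10), date(2023, 6, 25))
print(control, progeny, delay_days(progeny, control), is_later(progeny, control))
```

The same functions apply to maturity time by passing the R1 and R7 (or R8) dates instead.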

[0112] A “dominant determinate allele” is an allele that, when present either in a single copy (heterozygous) or two copies (homozygous) affects the determinate phenotype, namely whether a plant ceases vegetative activity at or soon after photoperiod-induced floral induction, and whether the meristems become reproductive inflorescences. A “recessive determinate allele” is an allele that affects the determinate phenotype of the plant only when present in two copies (homozygous) and does not affect the maturity of a plant when present in a single copy (heterozygous).

[0113] The present disclosure provides a particular recessive determinate allele that, when present in a homozygous state, provides the desired growth habit of the stem termination that results in a determinate phenotype, namely the cessation of vegetative activity at or soon after photoperiod-induced floral induction, after which the meristems become reproductive inflorescences. In other words, determinate plants essentially halt vegetative growth once reproductive growth (flowering) begins while indeterminate plants continue to grow after flowering has begun. For more information concerning the determinate phenotype, see Pyle, A Comparison of Determinate and Indeterminate Soybean Lines for Double Cropping in Virginia, Dissertation, Virginia Polytechnic Institute and State University, 1982.

[0114] The presently disclosed subject matter thus also relates to methods for identifying, selecting, and/or producing soybean plants having one or more late maturity determinate allele(s) comprising detecting in a donor soybean plant the presence of a genetic marker associated with a late maturity determinate allele(s) and/or a genetic marker(s) associated with late maturity determinate phenotypes as described herein and transferring the nucleotide sequence comprising the at least one genetic marker thus detected from the donor soybean plant to a recipient soybean plant. It is noted that the recipient soybean plant can be late or early maturity or display the determinate or indeterminate stem phenotype. Typically, the recipient soybean plant carries at least one of the undesirable traits of early maturity or indeterminate stem phenotype. In other embodiments, the recipient soybean plant can have both of these undesirable phenotypes. The transfer of the nucleotide sequence can be performed by any of the methods described herein.

[0115] Thus, methods for identifying, selecting and/or producing a soybean plant or germplasm comprising a late maturity determinate phenotype can comprise detecting the presence of a genetic marker associated with late maturity and/or determinate phenotype. The SNP marker can be detected in any sample taken from the soybean plant or germplasm, including, but not limited to, the whole plant or germplasm, a portion of said plant or germplasm (e.g., a cell, leaf, seed, etc., from said plant or germplasm) or a nucleotide sequence from said plant or germplasm.

[0116] As discussed herein, in some embodiments of this invention, a marker can be identified using amplification products generated by amplifying a Glycine sp. nucleic acid with two oligonucleotide primers. In some embodiments, the amplification is by PCR, and the primers are PCR primers that are designed to hybridize to opposite strands of the Glycine sp. genomic DNA (e.g., Chromosome 19) in order to amplify a Glycine sp. genomic DNA sequence present between the sequences to which the PCR primers hybridize. Methods of amplifying nucleic acids are well known in the art.

[0117] Accordingly, in some embodiments of the present invention, a method of identifying and/or selecting a soybean plant or germplasm having a combination of late maturity and determinate phenotype is provided, the method comprising: detecting, in said soybean plant or germplasm, the presence of one or more genetic markers associated with each of a late maturity and a determinate stem termination allele, wherein said markers are detected in amplification products from a nucleic acid sample isolated from said soybean plant or germplasm using a probe, said amplification products having been produced using pairs of amplification primers, wherein said amplification primers and probes are as described in the provided Sequence Listing.

Marker-Assisted Selection

[0118] The subject matter disclosed herein also relates to methods for producing pathogen-resistant soybean plants comprising detecting the presence of a genetic marker associated with late maturity indeterminate in a donor soybean plant according to the methods as described herein and transferring a nucleic acid sequence comprising at least one genetic marker thus detected from the donor plant to a recipient soybean plant. The transfer of the nucleic acid sequence can be performed by any method known in the art.

[0119] Thus, the present invention encompasses methods of plant breeding and methods of selecting/identifying plants, in particular soybean plants, particularly cultivated soybean plants as breeder plants for use in breeding programs or cultivated soybean plants having desired genotypic or potential phenotypic properties, in particular related to producing valuable soybeans, also referred to herein as commercially valuable plants. Herein, a cultivated plant is defined as a plant being purposely selected or having been derived from a plant having been purposely selected in agricultural or horticultural practice for having desired genotypic or potential phenotypic properties, for example a plant obtained by inbreeding.

[0120] The presently disclosed subject matter thus also provides methods for selecting a plant of the genus Glycine having late maturity and the determinate phenotype comprising detecting in the plant the presence of one or more late maturity and/or determinate alleles as defined herein. In an exemplary embodiment of the presently disclosed methods for selecting such a plant, the method comprises (a) providing a sample of genomic DNA from a soybean plant; and (b) detecting in the sample of genomic DNA at least one genetic marker associated with late maturity and/or determinate phenotype. In some embodiments, the detecting comprises detecting one or more SNPs that are associated with late maturity or determinate phenotype.

[0121] The providing of a sample of genomic DNA from a soybean plant can be performed by standard DNA isolation methods well known in the art.

[0122] The detecting of a genetic marker (e.g., SNP, combination of SNPs) can in some embodiments comprise the use of one or more sets of primer pairs (SNP assays) that can be used to produce one or more amplification products that can be used in the detection of genetic markers (SNPs). Such a set of primers can comprise, in some embodiments, nucleotide sequences as set forth in the Sequence Listing.

[0123] In some embodiments, the detecting of a genetic marker can comprise the use of a nucleic acid probe having a nucleotide base sequence that is substantially complementary to the nucleic acid sequence defining the genetic marker and which nucleic acid probe specifically hybridizes under stringent conditions with a nucleic acid sequence defining the genetic marker. A suitable nucleic acid probe can for instance be a single strand of the amplification product corresponding to the marker. In some embodiments, the detecting of a genetic marker is designed to determine whether a particular allele of a SNP is present or absent in a particular plant.
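Determining whether a particular SNP allele is present can be mimicked in silico once a marker sequence has been obtained. The sketch below checks the base at a 1-based position, using the position and allele recited in claim 1 for SEQ ID NO: 1 (G at position 335). The sequence in the example is a synthetic placeholder, not SEQ ID NO: 1, and the function names are hypothetical.

```python
def allele_at(sequence: str, position: int) -> str:
    """Return the base at a 1-based position of a marker sequence."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    return sequence[position - 1].upper()


def has_allele(sequence: str, position: int, allele: str) -> bool:
    """True when the base at the 1-based position equals the given allele."""
    return allele_at(sequence, position) == allele.upper()


# Synthetic placeholder sequence with a G at position 335.
placeholder = "A" * 334 + "G" + "A" * 20
print(allele_at(placeholder, 335), has_allele(placeholder, 335, "G"))
```

A laboratory SNP assay reports allele calls directly; this check is only useful when working from sequence reads or reference sequences.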

[0124] The presently disclosed subject matter thus also relates to methods for producing soybean plants comprising detecting the presence of a genetic marker associated with late maturity and/or determinate stem trait in a donor soybean plant according to the presently disclosed subject matter as described herein and transferring a nucleotide sequence comprising at least one genetic marker thus detected, or a late maturity and/or determinate conferring part thereof, from the donor plant to a recipient soybean plant. In particular embodiments, the recipient soybean plant maturity and/or stem determinate trait is changed by said transferred nucleotide sequence. The transfer of the nucleic acid sequence can be performed by any of the methods described herein.

[0125] An exemplary embodiment of such a method comprises the transfer of the nucleic acid sequence from a late maturity determinate donor soybean plant into a recipient soybean plant by crossing the plants by introgression. This transfer can be accomplished by using traditional breeding techniques. Late maturity and/or determinate loci are introgressed in some embodiments into commercial soybean varieties using marker-assisted selection (MAS) or marker-assisted breeding (MAB). MAS and MAB involve the use of one or more of the molecular markers, identified as having a significant association with a desired trait, and used for the identification and selection of those offspring plants that contain one or more of the genes that encode for the desired trait. As disclosed herein, such identification and selection is based on selection of SNP alleles of this invention or markers associated therewith. MAB can also be used to develop near-isogenic lines (NIL) comprising one or more pathogen resistant alleles of interest, allowing a more detailed study of an effect of such allele(s). MAB is also an effective method for development of backcross inbred line (BIL) populations. Soybean plants developed according to these embodiments can in some embodiments derive a majority of their traits from the recipient plant, and derive late maturity and/or the determinate trait from the donor plant. MAB/MAS techniques increase the efficiency of backcrossing and introgressing genes.

[0126] Thus, traditional breeding techniques can be used to introgress a nucleic acid sequence associated with late maturity and/or the determinate trait into a recipient soybean plant. In some embodiments of the present invention, the recipient soybean does not have the late maturity or the determinate traits and these traits are conferred by transferring said nucleic acid sequence associated with late maturity determinate. Thus, for example, inbred late maturity determinate soybean plant lines can be developed using the techniques of recurrent selection and backcrossing, selfing, and/or di-haploids, or any other technique used to make parental lines. In a method of recurrent selection and backcrossing, late maturity determinate can be introgressed into a target recipient plant (the recurrent parent) by crossing the recurrent parent with a first donor plant, which differs from the recurrent parent (i.e., nonrecurrent parent). The recurrent parent is a plant that, in some embodiments, possesses commercially desirable characteristics, such as, but not limited to (additional) disease and/or insect resistance, valuable nutritional characteristics, valuable abiotic stress tolerance (including, but not limited to, drought tolerance, salt tolerance), and the like. In some embodiments, the non-recurrent parent exhibits late maturity indeterminate phenotypes and comprises a nucleic acid sequence that is associated with late maturity indeterminate phenotypes. The non-recurrent parent can be any plant variety or inbred line that is cross-fertile with the recurrent parent.

[0127] In some embodiments, the progeny resulting from a cross between the recurrent parent and non-recurrent parent are backcrossed to the recurrent parent. The resulting plant population is then screened for the desired characteristics, which screening can occur in a number of different ways. For instance, the population can be screened using phenotypic pathology screens or quantitative bioassays as known in the art. Alternatively, instead of using bioassays, MAB can be performed using one or more of the hereinbefore described molecular markers, hybridization probes, or polynucleotides to identify those progeny that comprise a nucleic acid sequence encoding, for example, for late maturity and/or determinate combination phenotypes (e.g., SNPs and SNP combinations described herein). Also, MAB can be used to confirm the results obtained from the quantitative bioassays. In some embodiments, the markers defined herein are suitable to select proper offspring plants by genotypic screening.

[0128] Following screening, F1 hybrid plants that exhibit a pathogen-resistant phenotype or, in some embodiments, the genotype, and thus comprise the requisite nucleic acid sequence associated with late maturity indeterminate, are then selected and backcrossed to the recurrent parent in order to allow for the soybean plant to become increasingly inbred. The process of selecting and backcrossing can be repeated for a number of generations (e.g., for one, two, three, four, five, six, seven, eight, or more generations).
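The effect of repeating the backcross for several generations can be illustrated with the standard expectation that, in the absence of selection and ignoring linkage around the introgressed locus, each backcross halves the remaining donor genome. A minimal sketch under those assumptions (the function name is hypothetical):

```python
def expected_recurrent_fraction(backcrosses: int) -> float:
    """Expected recurrent-parent genome share after a number of backcrosses.

    0 backcrosses corresponds to the F1 (50%); each further backcross to the
    recurrent parent halves the expected donor share: 1 - (1/2) ** (n + 1).
    """
    if backcrosses < 0:
        raise ValueError("number of backcrosses must be non-negative")
    return 1 - 0.5 ** (backcrosses + 1)


for n in range(0, 7):
    print(f"BC{n}: {expected_recurrent_fraction(n):.4f}")
```

Marker-assisted selection can recover the recurrent parent background faster than this average by choosing, in each generation, the progeny that carry the target alleles together with the most recurrent-parent markers.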

[0129] The availability of integrated linkage maps of the soybean genome containing increasing densities of public soybean markers has facilitated soybean genetic mapping and MAS. See, e.g., soybeanbreederstoolbox.org, which can be found on the SoyBase internet resource (www.soybase.org).

[0130] Of the types of genetic marker available, SNPs are some of the most abundant and have the potential to provide the highest genetic map resolution (Bhattramakki et al., Plant Molec. Biol. 48:539 (2002)). SNPs can be assayed in a so-called “ultra-high-throughput” fashion because they do not require large amounts of nucleic acid and automation of the assay is straightforward. SNPs also have the benefit of being relatively low-cost systems. These three factors together make SNPs highly attractive for use in MAS. Several methods are available for SNP genotyping, including but not limited to, hybridization, primer extension, oligonucleotide ligation, nuclease cleavage, mini-sequencing and coded spheres. Such methods have been reviewed in various publications: Gut, Hum. Mutat. 17:475 (2001); Shi, Clin. Chem. 47:164 (2001); Kwok, Pharmacogenomics 1:95 (2000); Bhattramakki and Rafalski, Discovery and application of single nucleotide polymorphism markers in plants, in PLANT GENOTYPING: THE DNA FINGERPRINTING OF PLANTS, CABI Publishing, Wallingford (2001). A wide range of commercially available technologies utilize these and other methods to interrogate SNPs, including Masscode™ (Qiagen, Germantown, MD), Invader® (Hologic, Madison, WI), Snapshot® (Applied Biosystems, Foster City, CA), Taqman® (Applied Biosystems, Foster City, CA) and Beadarrays™ (Illumina, San Diego, CA).

Soybean Plants, Parts Thereof, and Germplasms Having Late Maturity Determinate Haplotypes

[0131] The present disclosure provides soybean plants and germplasms having the late maturity determinate haplotypes. As discussed above, the methods of the present disclosure can be utilized to identify, produce and/or select a soybean plant or germplasm having a late maturity and/or determinate allele. In addition to the methods described above, a soybean plant or germplasm having a late maturity and/or determinate allele may be produced by any method whereby a late maturity and/or determinate allele is introduced into the soybean plant or germplasm by such methods that include, but are not limited to, transformation (including, but not limited to, bacterial-mediated nucleic acid delivery (e.g., via Agrobacteria), viral-mediated nucleic acid delivery, silicon carbide or nucleic acid whisker-mediated nucleic acid delivery, liposome-mediated nucleic acid delivery, microinjection, micro-particle bombardment, electroporation, sonication, infiltration, PEG-mediated nucleic acid uptake, as well as any other electrical, chemical, physical (mechanical) and/or biological mechanism that results in the introduction of nucleic acid into the plant cell, or any combination thereof), protoplast transformation or fusion, a double haploid technique, embryo rescue, or by any other nucleic acid transfer system.

[0132] “Introducing” in the context of a plant cell, plant and/or plant part means contacting a nucleic acid molecule with the plant, plant part, and/or plant cell in such a manner that the nucleic acid molecule gains access to the interior of the plant cell and/or a cell of the plant and/or plant part. Where more than one nucleic acid molecule is to be introduced these nucleic acid molecules can be assembled as part of a single polynucleotide or nucleic acid construct, or as separate polynucleotide or nucleic acid constructs, and can be located on the same or different nucleic acid constructs. Accordingly, these polynucleotides can be introduced into plant cells in a single transformation event, in separate transformation events, or, e.g., as part of a breeding protocol. Thus, the term “transformation” as used herein refers to the introduction of a heterologous nucleic acid into a cell.

[0133] Thus, a soybean plant, or part thereof, having a haplotype providing late maturity and/or the determinate trait (i.e., soybean plant or part thereof), obtainable by the methods of the presently disclosed subject matter, are aspects of the presently disclosed subject matter. In some embodiments, the soybean plant of the present invention has more than one haplotype providing late maturity determinate traits as described herein.

[0134] The soybean plant, or part thereof, of this description having a haplotype providing late maturity and/or determinate can be homozygous for the various allele combinations. In some embodiments of this invention, the plant can be heterozygous.

[0135] The soybean plant or germplasm may be the progeny of a cross between a variety of soybean and a second variety of soybean that comprises a haplotype providing late maturity and/or determinate trait.

[0136] The soybean plant or germplasm may be the progeny of an introgression wherein the recurrent parent is a variety of soybean and the donor comprises a haplotype providing late maturity and/or determinate traits.

[0137] The soybean plant or germplasm may be the progeny of a cross between a first variety of soybean (e.g., a tester line) and the progeny of a cross between a second variety of soybean (e.g., a recurrent parent) and a variety of soybean that comprises a haplotype providing late maturity and/or determinate traits (e.g., a donor).

[0138] The soybean plant or germplasm may be the progeny of a cross between a first variety of soybean and the progeny of an introgression wherein the recurrent parent is a second variety of soybean and the donor comprises a haplotype providing late maturity and/or determinate traits.

[0139] Another aspect of the presently disclosed subject matter relates to a method of producing seeds that can be grown into soybean plants having a haplotype that provides late maturity and/or determinate traits. In some embodiments, the method comprises providing a soybean plant of this description, crossing that plant with another soybean plant, and collecting seeds resulting from the cross, which when planted, produce plants that have late maturity and/or determinate traits.

[0140] Accordingly, the present invention provides improved soybean plants, seeds, and/or tissue cultures produced by the methods described herein. In further embodiments, the present invention provides introgressed Glycine max plants and/or germplasm produced by the methods described herein.

Compositions for Analysis of a Soybean Genome

[0141] In some embodiments, the presently disclosed subject matter provides methods for analyzing the genomes of soybean plants/germplasms to identify those that include desired markers and combinations of markers associated with haplotypes providing late maturity and/or determinate traits. In some embodiments, the methods of analysis comprise amplifying subsequences of the genomes of the soybean plants/germplasms and determining the nucleotides present in one, some, or all positions of the amplified subsequences.

[0142] Thus, in some embodiments, the present invention provides compositions comprising one or more amplification primer pairs capable of initiating DNA polymerization by a DNA polymerase on a Glycine max nucleic acid template to generate a Glycine max marker amplicon. In some embodiments, the Glycine max amplicon can be used to identify the Glycine max marker comprising a nucleotide sequence of any of SEQ ID NOs: 1-2. In view of the disclosure of SEQ ID NOs: 1-2 as being able to be combined to provide haplotypes that result in the desired late maturity determinate phenotype, one of ordinary skill in the art would be aware of various techniques that could be employed to analyze the sequences of the corresponding soybean nucleic acids. Representative sequences involved are provided below in Table 2.

Table 2: Sequences

[0143] Representative amplification primer pairs can comprise the nucleotide sequences of a forward primer and corresponding reverse primer as set forth in the sequence listing provided at Table 3. Representative probes can comprise the nucleotide sequences as set forth in the sequence listing provided at Table 4.

Table 3: Example Amplification Primer

Table 4: Example Marker Probes

NON-LIMITING EMBODIMENTS

[0144] Non-limiting example embodiments of the invention are provided below. Embodiment 1. A method of identifying and/or selecting a soybean plant comprising an allelic combination associated with a late maturity determinate stem growth phenotype, comprising: a) isolating a nucleic acid from a soybean plant part; b) detecting in the nucleic acid of a), a first molecular marker that is associated with determinate stem growth and a second molecular marker that is associated with late maturity, wherein said first molecular marker associated with determinate stem growth comprises a G at position 335 of SEQ ID NO: 1, and wherein said second molecular marker associated with late maturity comprises a T at position 260 of SEQ ID NO: 2; and c) selecting or identifying a soybean plant comprising the allelic combination associated with a late maturity determinate stem growth phenotype on the basis of the presence of said first and second molecular markers of b).

[0145] Embodiment 2. The method of embodiment 1, wherein the detecting comprises amplifying each of the first and the second marker, or a portion thereof, and detecting the resulting amplified marker amplicons.

[0146] Embodiment 3. The method of embodiment 2, wherein the amplifying comprises: a) admixing an amplification primer pair with a nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the primer pair is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template; b) admixing another amplification primer or amplification primer pair with the nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the primer pair is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template and, c) extending each primer pair in a DNA polymerization reaction comprising a DNA polymerase and a template nucleic acid to generate corresponding amplicons.

[0147] Embodiment 4. The method of embodiment 3, wherein the primer pair that is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth comprises SEQ ID NOS: 3 and 4, and wherein the primer pair that is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity comprises SEQ ID NOS: 5 and 6.

[0148] Embodiment 5. The method of embodiment 2, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm thereof as a template in the PCR or LCR.

[0149] Embodiment 6. The method of embodiment 2, wherein the detecting the resulting amplified marker amplicons comprises detecting using a marker probe, wherein the marker amplicon generated by amplifying the first marker associated with determinate stem growth comprises SEQ ID NO: 7 or 9, and wherein the marker amplicon generated by amplifying the second marker associated with late maturity comprises SEQ ID NO: 8 or 10.

[0150] Embodiment 7. A method of producing a soybean plant having a favorable allelic combination associated with a late maturity determinate stem growth phenotype, comprising: a) isolating a nucleic acid from a soybean plant part; b) detecting in the nucleic acid of a), a first molecular marker that is associated with determinate stem growth and a second molecular marker that is associated with late maturity, wherein said first molecular marker associated with determinate stem growth comprises a G at position 335 of SEQ ID NO: 1, and wherein said second molecular marker associated with late maturity comprises a T at position 260 of SEQ ID NO: 2; c) selecting a first soybean plant comprising the favorable allelic combination on the basis of the presence of said first and second molecular markers of b); d) crossing the first soybean plant of c) with a second soybean plant not comprising the molecular markers of b); and e) producing a progeny plant from the cross of d), thereby producing a soybean plant having the favorable allelic combination associated with a late maturity determinate stem growth phenotype.

[0151] Embodiment 8. The method of embodiment 7, wherein the progeny plant comprises each of the molecular markers of b).

[0152] Embodiment 9. The method of embodiment 7, wherein either the first or second soybean plant is an elite soybean plant.

[0153] Embodiment 10. The method of embodiment 7, wherein the progeny plant is backcrossed by one or more generations.

[0154] Embodiment 11. The method of embodiment 7, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.

[0155] Embodiment 12. The method of embodiment 7, wherein the detecting comprises amplifying each of the first and the second marker, or a portion thereof, and detecting the resulting amplified marker amplicons.

[0156] Embodiment 13. The method of embodiment 12, wherein the amplifying comprises: a) admixing an amplification primer pair with a nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the primer pair is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template; b) admixing another amplification primer or amplification primer pair with the nucleic acid isolated from the first soybean plant or germplasm thereof, wherein the primer pair is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity, and is capable of initiating DNA polymerization by a DNA polymerase using the soybean nucleic acid as a template and, c) extending each primer pair in a DNA polymerization reaction comprising a DNA polymerase and a template nucleic acid to generate corresponding amplicons.

[0157] Embodiment 14. The method of embodiment 13, wherein the primer pair that is complementary or partially complementary to at least a portion of the first molecular marker associated with determinate stem growth comprises SEQ ID NOS: 3 and 4, and wherein the primer pair that is complementary or partially complementary to at least a portion of the second molecular marker associated with late maturity comprises SEQ ID NOS: 5 and 6.

[0158] Embodiment 15. The method of embodiment 12, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm thereof as a template in the PCR or LCR.

[0159] Embodiment 16. The method of embodiment 12, wherein the detecting the resulting amplified marker amplicons comprises detecting using a marker probe, wherein the marker amplicon generated by amplifying the first marker associated with determinate stem growth comprises SEQ ID NO: 7 or 9, and wherein the marker amplicon generated by amplifying the second marker associated with late maturity comprises SEQ ID NO: 8 or 10.

[0160] Embodiment 17. A composition comprising at least two amplification primer pairs capable of initiating DNA polymerization by a DNA polymerase on a Glycine max nucleic acid template to generate at least two Glycine max marker amplicons for detecting presence of an allelic combination associated with a late maturity determinate stem growth phenotype, wherein the at least two Glycine max marker amplicons comprise SEQ ID NO: 1 and SEQ ID NO: 2.

[0161] Embodiment 18. The composition of embodiment 17, wherein the at least two amplification primer pairs comprise (i) the primer pair of SEQ ID NO: 3 and SEQ ID NO: 4 and (ii) the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6.

[0162] Embodiment 19. The composition of embodiment 17, further comprising at least two marker probes for identification of the at least two amplicons, where the at least two marker probes are: (i) SEQ ID NO: 7 and 9; or (ii) SEQ ID NO: 8 and 10.

[0163] Embodiment 20. The composition of embodiment 19, wherein the amplification primer pair comprises SEQ ID NOS: 3 and 4 and the marker probes comprise SEQ ID NOS: 7 and 9 or the amplification primer pair comprises SEQ ID NOS: 5 and 6 and the marker probes comprise SEQ ID NOS: 8 and 10.
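By way of illustration only, the detecting and selecting steps of Embodiments 1 and 7 can be sketched computationally once sequence reads spanning each marker are available. The sketch assumes that positions in SEQ ID NOs: 1 and 2 are numbered from 1 and that each read is already aligned to the full length of its reference sequence. All function and variable names are hypothetical, and the toy reads are placeholders rather than the sequences of the sequence listing.

```python
# Marker definitions taken from Embodiment 1; 1-based numbering is assumed.
STEM_MARKER = (335, "G")      # SEQ ID NO: 1, determinate stem growth
MATURITY_MARKER = (260, "T")  # SEQ ID NO: 2, late maturity


def base_at(sequence: str, position: int) -> str:
    """Return the upper-case base at a 1-based position of an aligned read."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    return sequence[position - 1].upper()


def has_allelic_combination(seq1_read: str, seq2_read: str) -> bool:
    """True when both marker alleles of Embodiment 1 are present."""
    stem_pos, stem_base = STEM_MARKER
    maturity_pos, maturity_base = MATURITY_MARKER
    return (
        base_at(seq1_read, stem_pos) == stem_base
        and base_at(seq2_read, maturity_pos) == maturity_base
    )


# Toy placeholder reads: "N" everywhere except at the marker position.
toy_seq1 = "N" * 334 + "G" + "N" * 10
toy_seq2 = "N" * 259 + "T" + "N" * 10
print(has_allelic_combination(toy_seq1, toy_seq2))
```

In practice the allele calls would more typically come from a probe-based assay than from full reads; the sketch only makes the position-and-base logic of step b) explicit.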

EXAMPLE

[0164] Average plant height in centimeters was used as a proxy to determine the impact of different allelic combinations of stem termination (SY3970) and E1 maturity gene (SY0768BQ) on soybean plants. Data from 62 individuals from 5 separate populations are recorded in Table 5 below. The results indicate that TTGG is the desirable haplotype for the correct plant type for a late maturity determinate phenotype, with AAGG being highly undesirable. AACC is the correct haplotype for mid-maturity indeterminates and TTCC is the correct haplotype for late maturity indeterminates.
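By way of illustration only, the four haplotype interpretations stated above can be captured in a simple lookup. The sketch assumes that the first two letters of each four-letter code are the genotype call at the maturity marker and the last two letters the call at the stem termination marker; this reading is inferred from Embodiment 1 (T for late maturity, G for determinate stem growth) and is not stated explicitly. The names used are hypothetical, and combinations not reported in this example are deliberately left uninterpreted.

```python
# Interpretations copied from the example; no other combinations are inferred.
HAPLOTYPE_NOTES = {
    "TTGG": "desirable: late maturity determinate",
    "AAGG": "highly undesirable",
    "AACC": "mid-maturity indeterminate",
    "TTCC": "late maturity indeterminate",
}


def describe_haplotype(maturity_call: str, stem_call: str) -> str:
    """Look up a plant's combined genotype in the reported haplotypes.

    Assumed code order: maturity call first, stem termination call second.
    """
    code = (maturity_call + stem_call).upper()
    return HAPLOTYPE_NOTES.get(code, "not reported in the example")


print(describe_haplotype("TT", "GG"))
print(describe_haplotype("AA", "GG"))
```

A breeder could apply such a lookup to each genotyped individual to flag plants for selection or deselection.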

Table 5