Title:
LRR-RLKII RECEPTOR KINASE INTERACTION DOMAINS
Document Type and Number:
WIPO Patent Application WO/2021/015616
Kind Code:
A1
Abstract:
The invention relates to a plant comprising a gene encoding an altered leucine-rich repeat, receptor-like kinase of a somatic embryogenesis receptor kinase (SERK) subfamily, or an altered extracellular like SERK receptor (ELS) whereby said alteration is in a region of the gene that encodes a conserved, extracellular domain of said receptor. The invention further relates to a part of the plant of the invention, and to a food product prepared from the plant or plant part according to the invention. The invention further relates to means, such as a recombinant nucleic acid molecule, a vector, a plant protoplast, cell, or callus, and method for the production of a plant comprising in its genome at least one copy of a gene encoding a leucine-rich repeat, receptor-like kinase of a somatic embryogenesis receptor kinase (SERK) subfamily, or an extracellular like SERK receptor (ELS), in which a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered.

Inventors:
BAKKER FREDERIK THEODOOR (NL)
HOSSEINIFARHANGI SAMINALSADAT (NL)
Application Number:
PCT/NL2020/050479
Publication Date:
January 28, 2021
Filing Date:
July 22, 2020
Assignee:
UNIV WAGENINGEN (NL)
International Classes:
C12N9/12; C12N15/82
Domestic Patent References:
WO2009080743A22009-07-02
WO2006131547A12006-12-14
WO2003010319A22003-02-06
Foreign References:
US7838728B22010-11-23
US4945050A1990-07-31
US5591616A1997-01-07
US20090126041A12009-05-14
Other References:
NCBI: "BRASSINOSTEROID INSENSITIVE 1-associated receptor kinase 1 [Brassica napus]", 4 October 2017 (2017-10-04), XP055729339, Retrieved from the Internet [retrieved on 20200910]
WILLING: "KFK29958", 13 August 2014 (2014-08-13), XP055729342, Retrieved from the Internet [retrieved on 20200910]
SAMIN HOSSEINI ET AL: "Leucine-rich repeat receptor-like kinase II phylogenetics reveals five main clades throughout the plant kingdom", THE PLANT JOURNAL, vol. 103, no. 2, 8 April 2020 (2020-04-08), GB, pages 547 - 560, XP055728329, ISSN: 0960-7412, DOI: 10.1111/tpj.14749
DE SMET ET AL., NAT CELL BIOL, vol. 11, 2009, pages 1166 - 1173
HOHMANN ET AL., ANNU REV PLANT BIOL, vol. 68, 2017, pages 109 - 137
SHIU AND BLEEKER, PROC NATL ACAD SCI, vol. 98, 2001, pages 10763 - 10768
TORII, INT REV CYTOL, vol. 234, 2004, pages 1 - 46
AAN DEN TOORN ET AL., MOL PLANT, vol. 8, 2015, pages 762 - 782
SHIU ET AL., PLANT CELL, vol. 16, 2004, pages 1220 - 1234
WANG ET AL., CRC CRIT REV PLANT SCI, vol. 29, 2010, pages 285 - 299
HE ET AL., J CELL SCI, vol. 131, 2018, pages jcs209353
BUTENKO ET AL., TRENDS PLANT SCI, vol. 14, 2009, pages 255 - 263
LI, CURR OPIN PLANT BIOL, vol. 13, 2010, pages 509 - 514
NEWMAN ET AL., FRONT PLANT SCI, vol. 4, 2013, pages 1 - 14
ROUX ET AL., PLANT CELL, vol. 23, 2011, pages 2440 - 2455
HALTER ET AL., CURR BIOL, vol. 24, 2014, pages 134 - 43
TANG ET AL., CELL RES, vol. 25, 2015, pages 110 - 20
CHAE ET AL., MOL PLANT, vol. 2, 2009, pages 84 - 107
SAKAMOTO ET AL., BMC PLANT BIOL, vol. 12, 2012, pages 229
LIU ET AL., BMC EVOL BIOL, vol. 17, 2017, pages 47
SHIMODAIRA AND HASEGAWA, MOL BIOL EVOL, vol. 17, 1999, pages 1114 - 1116
"UniProt", Database accession no. Q94F62
SKOOG AND MILLER, SYMP SOC EXP BIOL, vol. 54, 1957, pages 118 - 130
MOTTE ET AL., PROC NATL ACAD SCI USA, vol. 111, 2014, pages 8305 - 8310
HECHT ET AL., PLANT PHYSIOL, vol. 127, 2001, pages 803 - 816
KUMAR AND VAN STADEN, ACTA PHYSIOLOGIAE PLANTARUM, vol. 41, 2019, pages 31
WANG ET AL., ANN REV GENET, vol. 46, 2012, pages 701 - 724
OKAMURO AND GOLDBERG, BIOCHEMISTRY OF PLANTS, vol. 15, 1989, pages 1 - 82
POUWELS ET AL., CLONING VECTORS: A LABORATORY MANUAL, vol. I, II, 1985
DE BLAERE ET AL., METH. ENZYMOL., vol. 143, 1987, pages 277
KLEIN, NATURE, vol. 327, 1987, pages 70 - 73
TAVAZZA ET AL., PLANT SCIENCE, vol. 59, 1989, pages 175 - 181
FLEVIN ET AL.: "Plant Molecular Biology Manual", 1990, KLUWER ACADEMIC PUBLISHERS
NAKADE ET AL., BIOENGINEERED, vol. 8, 2017, pages 265 - 273
MIKI AND MCHUGH, J BIOTECH, vol. 107, 2004, pages 193 - 232
HORSCH ET AL., SCIENCE, vol. 227, 1985, pages 1229 - 1231
GRUBER AND CROSBY: "Methods in Plant Molecular Biology and Biotechnology", 1993, CRC, article "Vectors for plant transformation"
MIKI ET AL.: "Techniques in plant molecular biology and biotechnology", 1993, CRC PRESS INC.
SAMBROOK J, RUSSELL DW: "Molecular cloning: a laboratory manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
HAMILTON, GENE, vol. 200, 1997, pages 107 - 16
GAJ ET AL., TRENDS BIOTECHNOL, vol. 31, 2013, pages 397 - 405
BOETTCHER AND MCMANUS, MOL CELL, vol. 58, 2015, pages 575 - 585
KATOH AND STANDLEY, MOL BIOL EVOL, vol. 30, 2013, pages 772 - 780
ASHKENAZY ET AL., NUCLEIC ACIDS RES, vol. 44, 2016, pages W344 - W350
NGUYEN ET AL., MOL BIOL EVOL, vol. 32, 2015, pages 268 - 274
CROOKS ET AL., GENOME RES, vol. 14, 2004, pages 1188 - 1190
FARRIS, CLADISTICS, vol. 5, 1989, pages 417 - 419
GRANTHAM, SCIENCE, vol. 185, 1974, pages 862 - 864
ABKEVICH, J MED GENET, vol. 41, 2004, pages 492 - 507
BALASUBRAMANIAN ET AL., NUCLEIC ACIDS RES, vol. 33, 2005, pages 1710 - 1721
BERMAN ET AL., NUCLEIC ACIDS RES, vol. 28, 2000, pages 235 - 242
SANTIAGO ET AL., SCIENCE, vol. 341, 2013, pages 889 - 892
CHENG ET AL., CELL HOST MICROBE, vol. 10, 2011, pages 616 - 26
ASHKENAZY ET AL., NUCLEIC ACIDS RES, vol. 38, 2010, pages W529 - W533
CELNIKER ET AL., ISR J CHEM, vol. 53, 2013, pages 199 - 206
JONES ET AL., BIOINFORMATICS, vol. 8, 1992, pages 275 - 282
Attorney, Agent or Firm:
WITMANS, H.A. (NL)
Claims:
1. A plant comprising a gene encoding an altered leucine-rich repeat, receptor like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), whereby said alteration is in a region of the gene that encodes a conserved, extracellular domain of said receptor.

2. The plant of claim 1, whereby said gene is involved in apomixis, regeneration, resistance and/or steroid signal transduction.

3. The plant of claim 1 or claim 2, whereby said alteration is in a domain that is involved in protein-protein, protein-ligand interactions, or both protein-protein and protein-ligand interactions.

4. The plant of any one of claims 1-3, wherein said alteration is at one or more amino acid positions in a domain comprising a conserved pair of cysteines, a leucine-rich region (LRR), or both.

5. The plant of any one of claims 1-4, wherein said alteration is in a domain having the consensus sequence XXVXPCSWXXXXCXXXXXXXXXXL, corresponding to a region from amino acid residue 52 to 75 of SEQ ID NO:1, a domain having the consensus sequence XXXLX, corresponding to a region from amino acid residue 96 to 100 of SEQ ID NO:1, a domain having the consensus sequence LXLX, corresponding to a region from amino acid residue 121 to 124 of SEQ ID NO:1, and/or a domain having the consensus sequence LXYLXL, corresponding to a region from amino acid residue 142 to 147 of SEQ ID NO:1.

6. The plant of any one of claims 1-5, wherein said alteration has altered the amino acid sequence of one or more conserved, extracellular domains from a Clade X amino acid sequence into a Clade Y amino acid sequence, whereby Clade X and Clade Y each refers to a different Clade of the identified Clades 1-5 as depicted in Figure 2.

7. The plant of any one of claims 1-6, wherein said alteration is not an alteration of one or more of the conserved amino acid residues at positions 3, 5, 6, 7, 8, 13 and 24 of the first consensus sequence, at positions 2 and 4 of the second consensus sequence, at positions 1 and 3 of the third consensus sequence and at positions 2 and 5 of the fourth consensus sequence.

8. The plant of any one of claims 1-7, wherein said plant is selected from the group consisting of maize, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley, millet, sugar cane, and switchgrass.

9. A part of the plant of any one of claims 1-8, selected from pollen, ovules, leaves, embryos, roots, root tips, anthers, flowers, fruits, stems, shoots, scions, rootstocks, seeds, protoplasts and calli, including single cells, cell clumps and tissue cultures therefrom.

10. A food product prepared from the plant of any one of claims 1-8, or the part thereof according to claim 9.

11. A recombinant nucleic acid molecule comprising at least an extracellular part of a leucine-rich repeat, receptor-like kinase of a somatic embryogenesis receptor kinase (SERK) subfamily, or of an extracellular like SERK (ELS) receptor, wherein at least a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered.

12. A vector comprising the recombinant nucleic acid molecule of claim 11.

13. A plant protoplast, cell, or callus that is transformed with the recombinant nucleic acid construct of claim 11, or with the vector of claim 12.

14. A method for the production of a plant comprising in its genome at least one copy of a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor (ELS), in which a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered, said method comprising the steps of:

(a) introducing the recombinant nucleic acid construct of claim 11, or the vector of claim 12, into a plant protoplast, cell, or callus;

(b) regenerating a transgenic plant from the plant protoplast, cell, or callus, wherein the transgenic plant comprises in its genome the recombinant nucleic acid construct; and

(c) obtaining a progeny plant derived from the transgenic plant of step (b), wherein said progeny plant comprises in its genome the recombinant nucleic acid construct.

15. The method of claim 14, wherein said progeny plant exhibits an alteration in apomixis, regeneration, resistance and/or steroid signal transduction, when compared to a control plant not comprising the recombinant nucleic acid construct.

Description:
Title: LRR-RLKII receptor kinase interaction domains

FIELD: The invention relates to genetically transformed plants. In particular the invention relates to plants exhibiting an alteration in apomixis, regeneration, resistance and/or steroid signal transduction, and to methods for producing such plants.

INTRODUCTION

Plant cells sense signals from their environment and communicate with each other by perceiving and processing information through so-called cell surface receptors. During plant development and growth, as well as during cell specification, proper organisation and communication are of obvious importance. Whereas plant hormones and transcription factors were long understood to be important in such regulation, cell surface receptors too are now considered potentially crucial (e.g. De Smet et al., 2009. Nat Cell Biol 11: 1166-1173). Receptor-like kinases (RLKs), for which the structural basis of ligand perception and signal activation was reviewed by Hohmann et al. (2017. Annu Rev Plant Biol 68: 109-137), represent the largest group of cell surface receptors in plants, and are considered the largest plant gene family, with >600 members representing approximately 2.5% of protein-coding genes in Arabidopsis (Shiu and Bleeker, 2001. Proc Natl Acad Sci 98: 10763-10768; Torii, 2004. Int Rev Cytol 234: 1-46; Aan den Toorn et al., 2015. Mol Plant 8: 762-782). They were found to be monophyletic, based on A. thaliana kinase sequence comparisons using Neighbor Joining (Shiu and Bleeker, 2001. Ibid), and appear to form one of the clades in the kinase superfamily. Within the RLKs, up to 50 different kinase clades or subfamilies were found by Shiu et al. (2004. Plant Cell 16: 1220-1234), based on Arabidopsis and Oryza sequence comparisons, confirming Shiu and Bleeker’s classification of 2001. One of the largest RLK subfamilies is formed by the Leucine-Rich Repeat (LRR)-RLKs, which are considered to comprise 235 of the 610 known RLKs (Aan den Toorn et al., 2015. Ibid) and which combine an extracellular leucine-rich repeat with an intracellular kinase domain. Another important class of surface receptors, apart from the RLKs, is formed by the receptor-like proteins (RLPs), which include an LRR but lack a cytoplasmic kinase domain (Wang et al., 2010. CRC Crit Rev Plant Sci 29: 285-299). One such RLP is the “extracellular like SERK receptor” (ELS) (US granted patent 7838728).

RLKs and RLPs are involved either in plant development or in plant immunity (He et al., 2018. J Cell Sci 131: jcs209353). For RLKs, a ligand-receptor interaction is often required to activate the kinase. Most reported plant receptor-like kinases have serine/threonine kinase specificity (Butenko et al., 2009. Trends Plant Sci 14: 255-263), whereas animal receptor kinases are mostly tyrosine kinases (Shiu and Bleeker, 2001. Ibid).

A subgroup of the LRR-RLKs contains the somatic embryogenesis receptor kinases (SERKs), known to be involved in developmental processes such as stomatal patterning, root meristem development, floral organ abscission, plant growth, xylem differentiation and male gametophyte development, as well as in cellular immunity (Li, 2010. Curr Opin Plant Biol 13: 509-514; He et al., 2018. J Cell Sci 131: jcs209353). In the latter process, LRR-RLK family members take part in the first phase of the plant immune system as the elicitors. They detect conserved protein structures of micro-organisms, so-called microbe-associated molecular patterns (MAMPs), such as the conserved 22 amino acid bacterial flagellin peptide (flg22) (Newman et al., 2013. Front Plant Sci 4: 1-14). Several RLKs, for instance flagellin-sensing 2 (FLS2), Botrytis-induced kinase 1 (BIK1), elongation factor-Tu receptor (EFR), DAMP peptide receptor 1 (AtPEPR1) and Brassinosteroid insensitive 1-associated receptor kinase 1 (BAK1), interacting with RLK1-3 (BIR1-3), activate plant immune systems after a pathogen attack by forming heterodimers with the kinase domain of SERK proteins through a phosphorylation event between the SERK protein and the receptor (Wang et al., 2010. CRC Crit Rev Plant Sci 29: 285-299; Roux et al., 2011. Plant Cell 23: 2440-2455; Halter et al., 2014. Curr Biol 24: 134-43; Tang et al., 2015. Cell Res 25: 110-20; He et al., 2018. Ibid).

Several reports have described the clustering of the plant LRR-RLK subfamily (e.g. Shiu and Bleeker, 2001. Ibid; Shiu et al., 2004. Ibid; Chae et al., 2009. Mol Plant 2: 84-107; Sakamoto et al., 2012. BMC Plant Biol 12: 229; Liu et al., 2017. BMC Evol Biol 17: 47), but none of these has been widely accepted. Most of these LRR-RLK clades appear to be fairly well-supported, based on Shimodaira-Hasegawa tests for ‘local support’ (Shimodaira and Hasegawa, 1999. Mol Biol Evol 17: 1114-1116), but as CLUSTAL was used for the alignment and given the nature of FastTree, it is not known how robust this pattern actually is.

Based on studies of the phylogenetic relationships within the LRR-RLK family, conserved sequence motifs and associated functions were identified, especially within the LRR-RLKII family. Alteration of these conserved domains makes it possible to adapt the role of these proteins in developmental processes and in cellular immunity.

BRIEF DESCRIPTION OF THE INVENTION

We have refined phylogenetic patterns among members of the leucine-rich repeat (LRR)-receptor-like kinase (RLK) family of receptor proteins, especially of LRR-RLKII receptor proteins, using all available sequence data to date. Trends in LRR-RLKII structural and functional evolution were identified, and amino acid residues were identified that are involved in extracellular interactions of the LRR-RLKII proteins.

Liu et al. (2017. BMC Evol Biol 17: 47) have divided LRR-RLK genes into 19 different subfamilies, one of which is termed the LRR-RLKII proteins. Members within a subfamily have conserved intron/exon boundaries. In addition, the LRR-RLKII family would be characterized by a Q_M4 motif GxxVAV/iKrLxxxxxx in the predicted kinase domain (Liu et al., 2017. Ibid). Said LRR-RLKII subfamily is sometimes termed the “LRRII-RLK” subfamily.

LRR-RLKII family members act as co-receptors for plant receptor kinases. For example, brassinosteroid binding to the brassinosteroid LRR receptor kinase BRI1 generates a binding site for a somatic embryogenesis receptor kinase, termed somatic embryogenesis receptor kinase 1 (SERK1). SERK1 interacts both with the brassinosteroid, and with a part of the extracellular domain of BRI1 (Hohmann et al., 2017. Annu Rev Plant Biol 68: 109-137). The interaction of a SERK-related LRR-RLKII family member with both a ligand and its LRR-RK receptor is mediated by conserved domains in the extracellular domain of the LRR-RLKII family member. Alteration of these interaction domains may provide ways to adapt processes involved in apomixis, regeneration, resistance and/or steroid signal transduction in a plant by enabling interactions of the LRR-RLKII family member with one or more other LRR-RKs and/or other ligands.

The invention therefore provides a plant comprising a gene encoding an altered leucine-rich repeat, receptor-like kinase of the LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), whereby said alteration is in a region of the gene that encodes a conserved, extracellular domain of said receptor. Said gene preferably is involved in apomixis, regeneration, resistance and/or steroid signal transduction.

Said alteration preferably is in a conserved extracellular domain that is involved in protein-protein, protein-ligand interactions, or both protein-protein and protein-ligand interactions. Said alteration preferably is at one or more amino acid positions in a domain comprising a conserved pair of cysteines, a domain comprising a leucine-rich region (LRR), or both domains. Said alteration preferably is in a domain having the consensus sequence XXVXPCSWXXXXCXXXXXXXXXXL, corresponding to a region from amino acid residue 52 to 75 of SEQ ID NO:1, a domain having the consensus sequence XXXLX, corresponding to a region from amino acid residue 96 to 100 of SEQ ID NO:1, a domain having the consensus sequence LXLX, corresponding to a region from amino acid residue 121 to 124 of SEQ ID NO:1, and/or a domain having the consensus sequence LXYLXL, corresponding to a region from amino acid residue 142 to 147 of SEQ ID NO:1.

In an embodiment, said alteration has altered the amino acid sequence of one or more conserved, extracellular domains from a Clade X amino acid sequence into a Clade Y amino acid sequence, whereby Clade X and Clade Y each refers to a different Clade of the identified Clades 1-5 as depicted in Figure 2.
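As an illustration of such a Clade X to Clade Y conversion, the following minimal sketch lists the substitutions needed to turn one aligned domain sequence into another. The function name and the two toy sequences are hypothetical and are not taken from Figure 2; the sketch also assumes that both sequences are already aligned to equal length without gaps.

```python
from typing import List, Tuple


def substitutions_needed(clade_x: str, clade_y: str) -> List[Tuple[int, str, str]]:
    """List (position, residue_in_x, residue_in_y) for every differing position.

    Positions are 1-based. Both inputs are assumed to be pre-aligned,
    gap-free sequences of the same conserved domain.
    """
    if len(clade_x) != len(clade_y):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(clade_x, clade_y), start=1)
        if a != b
    ]


# Toy, made-up domain sequences (NOT real Clade consensus sequences).
toy_clade_x = "ASTLK"
toy_clade_y = "GSTLR"
print(substitutions_needed(toy_clade_x, toy_clade_y))
```

With real data, the two inputs would be the corresponding domain sequences of the source and target Clades.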

Said alteration preferably is not an alteration of one or more of the conserved amino acid residues at positions 3, 5, 6, 7, 8, 13 and 24 of the first consensus sequence, at positions 2 and 4 of the second consensus sequence, at positions 1 and 3 of the third consensus sequence and at positions 2 and 5 of the fourth consensus sequence.
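The relation between residue numbers in SEQ ID NO:1, positions within each consensus sequence, and the positions that preferably remain unaltered can be sketched as follows. The consensus strings, region start residues and position lists are copied from the text above; the dictionary layout and function names are illustrative assumptions, and "X" is taken to mean any residue.

```python
import re

# Consensus sequence and first residue in SEQ ID NO:1 for each domain (from the text).
DOMAINS = {
    "first": ("XXVXPCSWXXXXCXXXXXXXXXXL", 52),
    "second": ("XXXLX", 96),
    "third": ("LXLX", 121),
    "fourth": ("LXYLXL", 142),
}

# 1-based positions within each consensus that preferably stay unaltered (from the text).
PROTECTED = {
    "first": {3, 5, 6, 7, 8, 13, 24},
    "second": {2, 4},
    "third": {1, 3},
    "fourth": {2, 5},
}


def consensus_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus into a regex, treating X as 'any single residue'."""
    return re.compile("".join("." if c == "X" else c for c in consensus))


def consensus_position(domain: str, residue: int) -> int:
    """Map a SEQ ID NO:1 residue number to its 1-based consensus position."""
    consensus, start = DOMAINS[domain]
    position = residue - start + 1
    if not 1 <= position <= len(consensus):
        raise ValueError(f"residue {residue} lies outside the {domain} domain")
    return position


def alteration_preferred(domain: str, residue: int) -> bool:
    """True if the residue is not one of the positions to be left unaltered."""
    return consensus_position(domain, residue) not in PROTECTED[domain]


# Residue 57 is position 6 of the first consensus (the first conserved cysteine).
print(consensus_position("first", 57), alteration_preferred("first", 57))
# Residue 53 is position 2 of the first consensus.
print(consensus_position("first", 53), alteration_preferred("first", 53))
```

The position lists are deliberately kept as separate data rather than derived from the non-X letters of each consensus, so that the sketch reflects the positions exactly as they are stated.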

A consensus LRR domain sequence for the receptor proteins of each of the 5 Clades is provided in Figure 2. Although the LRR domain of receptor proteins of all Clades follows the consensus sequence LxxLxxLxLxxNxxSGxIPxxLgx, subtle differences do exist between the receptor proteins of each of the 5 Clades of the LRRII-RLK subfamily.
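A sequence can be scanned for stretches that fit this LRR consensus with a simple pattern search, sketched below. The sketch assumes that "x" denotes any residue and that other lowercase letters (here "g") mark less strictly conserved positions; that reading of the notation, the function names and the toy sequence are assumptions. Exact pattern matching is also much stricter than the frequency-based WebLogo analysis, so real repeats that deviate from the consensus would be missed.

```python
import re
from typing import List, Tuple

LRR_CONSENSUS = "LxxLxxLxLxxNxxSGxIPxxLgx"


def lrr_regex(consensus: str, strict_lowercase: bool = False) -> "re.Pattern[str]":
    """Build a lookahead regex so that overlapping matches are also reported."""
    parts = []
    for c in consensus:
        if c == "x":
            parts.append(".")  # any residue
        elif c.islower():
            # Assumed 'weakly conserved' position: enforce only in strict mode.
            parts.append(c.upper() if strict_lowercase else ".")
        else:
            parts.append(c)  # strictly conserved residue
    return re.compile("(?=(" + "".join(parts) + "))")


def find_lrr_repeats(seq: str, strict_lowercase: bool = False) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for every consensus-like stretch."""
    pattern = lrr_regex(LRR_CONSENSUS, strict_lowercase)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


# Toy repeat built from the consensus itself (x -> A); not a real protein sequence.
toy_repeat = "".join("A" if c == "x" else c.upper() for c in LRR_CONSENSUS)
print(find_lrr_repeats("MK" + toy_repeat))
```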

The class of plants that can be used in the methods of the invention is generally as broad as the class of higher plants amenable to transformation techniques, including both monocotyledonous and dicotyledonous plants, and gymnosperm plant species. Said plant preferably is selected from maize, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley, millet, sugar cane, and switchgrass.

The invention further provides a part of the plant of the invention, selected from pollen, ovules, leaves, embryos, roots, root tips, anthers, flowers, fruits, stems, shoots, scions, rootstocks, seeds, protoplasts and calli, including single cells, cell clumps and tissue cultures therefrom.

The invention further provides a food product that is prepared from the plant or plant part according to the invention.

The invention further provides a recombinant nucleic acid molecule comprising at least an extracellular part of a leucine-rich repeat, receptor-like kinase of the LRR-RLKII subfamily, or of an extracellular like SERK (ELS) receptor, wherein at least a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered. Said recombinant nucleic acid molecule preferably is comprised in a vector.

The invention further provides a plant protoplast, cell, or callus that is transformed with the recombinant nucleic acid construct or vector according to the invention.

The invention further provides a method for the production of a plant comprising in its genome at least one copy of a gene encoding a leucine-rich repeat, receptor-like kinase of the LRR-RLKII subfamily, or an extracellular like SERK receptor (ELS), in which a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered, said method comprising the steps of: (a) introducing the recombinant nucleic acid construct or vector according to the invention into a plant protoplast, cell, or callus; (b) regenerating a transgenic plant from the plant protoplast, cell, or callus, wherein the transgenic plant comprises in its genome the recombinant nucleic acid construct; and (c) obtaining a progeny plant derived from the transgenic plant of step (b), wherein said progeny plant comprises in its genome the recombinant nucleic acid construct.

Said method preferably produces a plant or progeny plant that exhibits an alteration in apomixis, regeneration, resistance and/or steroid signal transduction, when compared to a control plant not comprising the recombinant nucleic acid.

LEGENDS TO THE FIGURES

Figure 1. Amino acid sequence of brassinosteroid insensitive 1-associated receptor kinase 1 (BAK1) of Arabidopsis thaliana. UniProt reference code Q94F62, a Clade 5 SERK-related LRR-RLKII receptor (A). Conserved domains in LRR-RLKII receptors (B).

Figure 2. Amino acid sequences of the conserved extracellular domains of Clades 1-5 LRR-RLKII receptors.

Figure 3. Motif structure of LRR-RLKII proteins from each Clade and ELS. Structures are based on observations in the WebLogo analyses of all available sequences in this study.

Figure 4. Exon and intron structure of the proteins. Exon and intron boundaries of HDR Clades and ELS genes of six selected Arabidopsis thaliana and six selected Oryza sativa genes. Each gene belongs to a different HDR clade.

Figure 5. Motif structure located in the N-capping residue and the leucine-rich repeat domains of associated SERK proteins of LRR-RLKII proteins for each main clade inferred here, as well as for ELS (included in Clade 5). (A) Structures are based on observations in the WebLogo analyses of all available sequences. WebLogo frequency plots indicate the conservation pattern across entire clades. (B) Amino acid residues involved in SERK (ID 4MN8)-FLS2, SERK (ID 5IYX)-HAESA, SERK (ID 4Z64)-PSKR and SERK (ID 4LSC)-BRI1 complex interactions.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

The term “regeneration”, as is used herein, refers to the reproduction of plants from small tissues or single cells in vitro. Two hormones, auxin and cytokinin, determine the fate of regenerating tissue: high ratios of auxin to cytokinin generally lead to root regeneration, while high ratios of cytokinin to auxin tend to promote shoot regeneration (Skoog and Miller, 1957. Symp Soc Exp Biol 54: 118-130). Shoot regeneration in Arabidopsis seems to require a leucine-rich repeat receptor-like kinase and abscisic acid-receptor binding (Motte et al., 2014. Proc Natl Acad Sci USA 111: 8305-8310).

The term “apomixis”, as is used herein, refers to vegetative, non-sexual reproduction of plants through seeds. Apomixis is a genetically controlled reproductive mechanism found in some polyploid non-cultivated plant species. Apomixis is mediated by heterodimeric interactions between receptor kinases (RKs) and receptor-like kinases (RLKs) (Hecht et al., 2001. Plant Physiol 127: 803-816; Kumar and van Staden, 2019. Acta Physiologiae Plantarum 41: 31).

The term “resistance”, as is used herein, refers to RKs and RLKs that act as pattern recognition receptors to detect pathogen- or microbe-associated molecular patterns. An individual RK is thought to interact with a specific RLK upon binding of a ligand to the RK. Both local and systemic resistance and/or immunity are mediated by RKs/RLKs.

The term “steroid signal transduction”, as is used herein, refers to the activation of one or more signalling pathways in a plant cell after binding of a steroid, such as a steroidal alkaloid, a cardiac glycoside, a phytosterol or a brassinosteroid, to its receptor. Common steroid receptors are cell-surface receptor serine/threonine kinases that activate a signal transduction cascade that regulates transcription (Wang et al., 2012. Ann Rev Genet 46: 701-724). Brassinosteroid binding activates its cognate receptor kinase activity and involves recruitment of a co-receptor kinase which is a leucine-rich repeat (LRR)-receptor-like kinase.

The terms "nucleic acid molecule", "nucleic acid sequence", "polynucleotide", "polynucleotide sequence", "nucleic acid fragment" and "isolated nucleic acid fragment" are used interchangeably herein. These terms encompass nucleotide sequences and the like. A polynucleotide may be a polymer of RNA or DNA that is single- or double-stranded and that optionally contains synthetic, non-natural or altered nucleotide bases. A polynucleotide in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA, synthetic DNA, or mixtures thereof.

The term "isolated polynucleotide", as is used herein, refers to a polynucleotide that is substantially free from other nucleic acid sequences, such as other chromosomal and extrachromosomal DNA and RNA, that normally accompany or interact with it as found in its naturally occurring environment. However, isolated polynucleotides may contain polynucleotide sequences which may have originally existed as extrachromosomal DNA but exist as a nucleotide insertion within the isolated polynucleotide. Isolated polynucleotides may be purified from a host cell in which they naturally occur. Conventional nucleic acid purification methods known to skilled artisans may be used to obtain isolated polynucleotides. The term also embraces recombinant polynucleotides and chemically synthesized polynucleotides.

The term "recombinant", as is used herein refers to a nucleic acid molecule which has been obtained by manipulation of genetic material using restriction enzymes, ligases, and similar genetic engineering techniques as described by, for example, Sambrook et al. 1989. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY or DNA Cloning: A Practical Approach, Vol. I and II (Ed. D. N. Glover), IRL Press, Oxford, 1985. The term "recombinant," as used herein, does not refer to naturally occurring genetic recombinations.

The term "express" or "expression", as is used herein refers to transcription alone. The regulatory element(s) are operably linked to the coding sequence of a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor, such that the regulatory element(s) is capable of controlling expression of said gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor.

The terms "encoding", "coding", or "encoded", as used herein in the context of a specified nucleic acid, mean that the nucleic acid comprises the requisite information to guide transcription and translation of the nucleotide sequence into a specified protein. The information by which a protein is encoded is specified by the use of codons. A nucleic acid encoding a protein may comprise non-translated sequences (e.g., introns) within translated regions of the nucleic acid or may lack such intervening non-translated sequences (e.g., as in cDNA).

The term "operably linked" refers to the association of two or more nucleic acid fragments on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operably linked to a coding sequence when it is capable of affecting the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in sense or antisense orientation.

The terms "regulatory elements" or "regulatory sequences", which terms can be used interchangeably herein, refer to nucleotide sequences located upstream, within, or downstream of a coding sequence, and which may influence transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, translation leader sequences, introns, and polyadenylation recognition sequences.

In addition to regulatory elements, the construct of the invention may comprise a promoter. The term "promoter" refers to a nucleotide sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located downstream to a promoter sequence. The promoter sequence may comprise proximal and more distal upstream elements, the latter elements often referred to as enhancers.

The term "enhancer", as is used herein, refers to a nucleotide sequence that can stimulate promoter activity. An enhancer may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Promoters may be derived in their entirety from a native gene, as for example, a promoter which specifically induces expression of a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic nucleotide segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. The tissue-specificity of a promoter, for example, is exemplified by the promoter sequence (described above) which induces expression of a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor in a specific cell or tissue. Promoters that cause a nucleic acid fragment to be expressed in most cell types at most times are commonly referred to as "constitutive promoters". New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro and Goldberg, 1989. Biochemistry of Plants 15: 1-82. It is further recognized that since the exact boundaries of regulatory sequences have not been completely defined in all cases, nucleic acid fragments of different lengths may have identical promoter activity.

In addition to regulatory elements, a construct of the invention may comprise a translation leader sequence. The term "translation leader sequence" refers to a nucleotide sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the fully processed mRNA upstream of the translation start sequence ATG. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.

The term "messenger RNA (mRNA)", as is used herein, refers to polyadenylated RNA that is without introns and that can be translated into a polypeptide by the cell. "cDNA" refers to a DNA that is complementary to and derived from an mRNA template. cDNA can be single-stranded or converted to double stranded form using, for example, the Klenow fragment of DNA polymerase I.

The term "sense RNA", as is used herein, refers to an RNA transcript that includes the mRNA and can be translated into a polypeptide by the cell.

The term "antisense", as is used herein, refers to the complementary strand of the reference transcription product. Expression of an RNA molecule that is antisense to a part of a target mRNA may reduce or even block expression of the target gene. The complementarity of an antisense RNA may be with any part of a nucleotide sequence, i.e., with all or part of the 5' non-coding sequence, 3' non-coding sequence, intron, or the coding sequence.

As used herein, the terms "introgression", "introgressed" and "introgressing" refer to both a natural and artificial process, and the resulting events, whereby genes of one species, variety or cultivar are moved into the genome of another species, variety or cultivar, by crossing those species. The process may optionally be completed by backcrossing to the recurrent parent.

"Transformation" refers to the transfer of a nucleic acid fragment into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as "transgenic" organisms. Examples of methods of plant transformation include Agrobacterium-mediated transformation (De Blaere et al. 1987. Meth. Enzymol. 143: 277) and particle-accelerated or "gene gun" transformation technology (Klein et al. 1987. Nature 327: 70-73; U.S. Pat. No. 4,945,050, incorporated herein by reference).

Isolated nucleic acid molecules of the invention can be incorporated into recombinant constructs, typically DNA constructs, capable of introduction into a host cell. Such a construct can be a vector that includes a replication system and sequences that are capable of transcription and translation of a polypeptide encoding sequence in a given host cell. A number of vectors suitable for stable transfection of plant cells or for the establishment of transgenic plants have been described in, e.g., Pouwels et al., 1985. Supp. 1987. Cloning Vectors: A Laboratory Manual; Weissbach and Weissbach. 1989. Methods for Plant Molecular Biology, Academic Press, New York; Gelvin et al., 1990. Plant Molecular Biology Manual, Kluwer Academic Publishers, Boston. Typically, plant expression vectors include, for example, an altered plant gene encoding a leucine-rich repeat, receptor-like kinase of a somatic embryogenesis receptor kinase (SERK) subfamily, or an extracellular like SERK receptor, under the transcriptional control of 5' and 3' regulatory sequences and a dominant selectable marker. Such plant expression vectors may comprise a promoter regulatory region, e.g., a regulatory region controlling inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific expression, a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.

The term "protein" or "polypeptide", as is used herein, refers to a chain of amino acids arranged in a specific order determined by the coding sequence in a polynucleotide encoding the polypeptide. Each protein or polypeptide may have a unique function.

Described herein are a functional leucine-rich repeat, receptor-like kinase of a somatic embryogenesis receptor kinase (SERK) subfamily, or an extracellular like SERK receptor polypeptide, and functional fragments thereof, mutants and variants having the same or a similar biological function or activity, as well as mutants of which the functional expression is reduced or absent. As used herein, the terms "functional fragment", "mutant" and "variant" refer to a polypeptide which possesses biological function or activity identified through a defined functional assay and associated with a particular biologic, morphologic, or phenotypic alteration in the cell.

Genes encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor, can be cloned using a variety of techniques according to the invention. The simplest procedure for the cloning of a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor, requires the cloning of genomic DNA from an organism identified as producing a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor, and the transfer of the cloned DNA on a suitable plasmid or vector to a host organism, followed by the identification of transformed hosts. Techniques suitable for cloning by homology include standard library screening by DNA hybridization or polymerase chain reaction (PCR) amplification using primers derived from conserved sequences.

As defined herein, two DNA sequences are substantially identical when at least 80% (preferably at least 85% and most preferably 90%) of the nucleotides match over a defined length of the sequences, preferably the complete length of the sequences, using algorithms such as CLUSTAL W, CLUSTAL OMEGA, or EMBOSS NEEDLE. Sequences that are substantially identical can also be identified in a hybridization experiment such as a Southern blotting experiment, under stringent conditions as is known in the art. See, for example, Sambrook et al., supra. Sambrook et al. describe highly stringent conditions as a hybridization temperature 5-10 °C below the Tm of a perfectly matched target and probe; thus, sequences that are "substantially identical" would hybridize under such conditions. Substantially identical nucleic acid sequences may encode identical or substantially identical proteins. "Substantially identical proteins" refers to proteins in which at least 80%, preferably at least 85%, at least 90%, at least 95%, such as at least 99%, of the amino acid residues are identical over a defined length of the sequences, preferably the complete length of the sequences, using algorithms such as CLUSTAL W, CLUSTAL OMEGA, or EMBOSS NEEDLE.

The term "guide RNA (gRNA) molecule", or "single gRNA molecule (sgRNA)", as is used herein, refers to a specific single RNA sequence that recognizes the target DNA region of interest and directs an associated nuclease there for editing. Said gRNA preferably comprises a 17-20 nucleotide sequence complementary to the target DNA, and a binding scaffold for the associated nuclease.
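The identity thresholds used above to define "substantially identical" sequences can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (for example with CLUSTAL W, CLUSTAL OMEGA or EMBOSS NEEDLE) so that they have equal length, with "-" marking gaps. The function names and the choice of the alignment length as denominator are assumptions made for illustration; alignment tools may report identity with a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def substantially_identical(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """Apply the 'at least 80%' criterion; pass 85.0 or 90.0 for the preferred thresholds."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The same function applies to nucleotide and amino acid alignments, since it only compares characters column by column.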

The term "CRISPR associated endonuclease" (Cas), as is used herein, refers to an endonuclease that is guided by gRNA or CRISPR to a target DNA. Said target DNA is subsequently cut by the endonuclease. Said CRISPR associated endonuclease may be a Cas9, for example isolated from Streptococcus pyogenes, a Cpf1, for example isolated from Francisella novicida, C2c1, C2c2 and C2c3, or variants thereof (Nakade et al., 2017. Bioengineered 8: 265-273).
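As an illustration of how a 17-20 nucleotide gRNA target might be located, the following minimal sketch lists candidate target sites on one DNA strand. It assumes a Streptococcus pyogenes Cas9-style requirement for an NGG motif (PAM) immediately 3' of the target, which is not stated in this description; the function name is hypothetical and only the strand given as input is scanned.

```python
import re


def find_spacer_candidates(dna: str, spacer_len: int = 20):
    """Return (start, spacer, pam) tuples for every site followed by an NGG motif."""
    if not 17 <= spacer_len <= 20:
        raise ValueError("spacer length is expected to be 17-20 nucleotides")
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]
```

Candidates found this way would still need to be checked for uniqueness in the genome before use.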

Nucleic acid constructs

In a first embodiment, the invention provides a recombinant nucleic acid molecule, preferably an isolated nucleic acid molecule, comprising a gene encoding an altered leucine-rich repeat (LRR), receptor-like kinase (RLK) of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), whereby said alteration is in a region of the gene that encodes a conserved, extracellular domain of said receptor.

SEQ ID NO:1 in Figure 1 depicts a reference sequence from Arabidopsis thaliana. A preferred isolated nucleic acid molecule that encodes an altered LRR-RLK, or a functional part thereof, preferably comprises one or more amino acid alterations in a domain that is involved in protein-protein interactions, protein-ligand interactions, or both.

A preferred alteration is at one or more amino acid positions in a domain comprising a conserved pair of cysteines, a leucine-rich region (LRR), or both. Said alteration of one or more amino acid residues preferably is in a domain having the consensus sequence XXVXPCSWXXXXCXXXXXXXXXXL, corresponding to a region from amino acid residue 52 to 75 of SEQ ID NO:1, a domain having the consensus sequence XXXLX, corresponding to a region from amino acid residue 96 to 100 of SEQ ID NO:1, a domain having the consensus sequence LXLX, corresponding to a region from amino acid residue 121 to 124 of SEQ ID NO:1, and/or a domain having the consensus sequence LXYLXL, corresponding to a region from amino acid residue 142 to 147 of SEQ ID NO:1 (see Figures 1A, B).
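The four consensus sequences above can be read as patterns in which X stands for any residue. The following minimal sketch shows how a protein sequence might be tested against such a pattern at a given position. The helper names are hypothetical, and because SEQ ID NO:1 is not reproduced here, any test sequence used with it is a placeholder rather than the real reference.

```python
import re

# Consensus sequences as given in the text; X denotes any amino acid residue.
CONSENSUS = {
    "domain1": "XXVXPCSWXXXXCXXXXXXXXXXL",  # residues 52 to 75 of SEQ ID NO:1
    "domain2": "XXXLX",                     # residues 96 to 100
    "domain3": "LXLX",                      # residues 121 to 124
    "domain4": "LXYLXL",                    # residues 142 to 147
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Translate a consensus string into a regular expression (X becomes a wildcard)."""
    return re.compile("".join("." if c == "X" else re.escape(c) for c in consensus))


def matches_at(protein: str, consensus: str, start_1based: int) -> bool:
    """Check whether the consensus fits the protein at a 1-based start position."""
    window = protein[start_1based - 1:start_1based - 1 + len(consensus)]
    if len(window) != len(consensus):
        return False
    return consensus_to_regex(consensus).fullmatch(window) is not None
```

For example, `matches_at(seq, CONSENSUS["domain1"], 52)` would test the first conserved domain at the position reported for SEQ ID NO:1.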

Said alteration of one or more amino acids has altered the amino acid sequence of one or more of the conserved, extracellular domains corresponding to a region from amino acid residue 52 to 75 of SEQ ID NO:1, corresponding to a region from amino acid residue 96 to 100 of SEQ ID NO:1, corresponding to a region from amino acid residue 121 to 124 of SEQ ID NO:1, and/or corresponding to a region from amino acid residue 142 to 147 of SEQ ID NO:1, from a Clade X amino acid sequence into a Clade Y amino acid sequence, whereby Clade X and Clade Y each refer to a different Clade of the identified Clades 1-5 as depicted in Figure 2.

Said alteration of one or more amino acids preferably has altered the amino acid sequence of one or more of the conserved, extracellular domains corresponding to a region from amino acid residue 52 to 75 of SEQ ID NO:1 (SEQ ID NO:2), corresponding to a region from amino acid residue 96 to 100 of SEQ ID NO:1 (SEQ ID NO:3), corresponding to a region from amino acid residue 121 to 124 of SEQ ID NO:1 (SEQ ID NO:4), and/or corresponding to a region from amino acid residue 142 to 147 of SEQ ID NO:1 (SEQ ID NO:5), such that the alteration is not an alteration of one or more of the conserved amino acid residues at positions 3, 5, 6, 7, 8, 13 and 24 of the first conserved domain, at positions 2 and 4 of the second conserved domain, at positions 1 and 3 of the third conserved domain, and at positions 2 and 5 of the fourth conserved domain, as depicted in Figure 2.
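The restriction on conserved positions can be expressed as a simple check on a proposed variant of a conserved domain. The sketch below is illustrative only: it handles substitutions (reference and variant of equal length), the function names are hypothetical, and the position sets are copied from the paragraph above.

```python
# 1-based conserved positions per conserved domain, as listed in the text.
CONSERVED_POSITIONS = {
    1: {3, 5, 6, 7, 8, 13, 24},
    2: {2, 4},
    3: {1, 3},
    4: {2, 5},
}


def altered_positions(reference: str, variant: str) -> set:
    """Return the 1-based positions at which the variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("only substitutions are handled; lengths must match")
    return {i for i, (r, v) in enumerate(zip(reference, variant), start=1) if r != v}


def alteration_allowed(domain: int, reference: str, variant: str) -> bool:
    """True if the variant is altered, and no altered position is a conserved one."""
    changed = altered_positions(reference, variant)
    return bool(changed) and not (changed & CONSERVED_POSITIONS[domain])
```

A variant that leaves the domain unchanged is reported as not allowed, because it contains no alteration at all.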

Said alteration preferably is of at least one amino acid residue, at least two amino acid residues, at least three amino acid residues, at least four amino acid residues, at least five amino acid residues, at least ten amino acid residues, and preferably less than twenty-five amino acid residues, such as less than twenty amino acid residues, and less than fifteen amino acid residues.

Said alteration preferably is within the first conserved domain, within the second conserved domain, within the third conserved domain and/or within the fourth conserved domain, preferably within the first and second conserved domains, the first and third conserved domains, the first and fourth conserved domains, the second and third conserved domains, the second and fourth conserved domains, the third and fourth conserved domains, the first, second and third conserved domains, the first, second and fourth conserved domains, the first, third and fourth conserved domains or, most preferably, in all four conserved domains.

In an embodiment, a library of nucleic acid molecules is generated, of which each member encodes a gene with a differently altered leucine-rich repeat (LRR), receptor-like kinase (RLK) of a LRR-RLKII subfamily, or a differently altered extracellular like SERK receptor (ELS), whereby said alteration is in a region of the gene that encodes a conserved, extracellular domain of said receptor. Said library preferably comprises substantially all possible permutations of amino acid alterations in one or more of the conserved domains, preferably in all of the four conserved domains depicted in Figure 2.
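One possible reading of such a library is the set of all substitution variants at the non-conserved positions of a domain. The sketch below enumerates these variants for a short domain; it is an illustration under that assumption, with a hypothetical function name and the 20 standard amino acids as the substitution alphabet. The number of variants grows as 20 to the power of the number of variable positions, so exhaustive enumeration is practical only for a small number of positions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def variant_library(reference: str, variable_positions):
    """Yield every variant that differs from the reference only at the given 1-based positions."""
    positions = sorted(variable_positions)
    for combo in product(AMINO_ACIDS, repeat=len(positions)):
        residues = list(reference)
        for pos, aa in zip(positions, combo):
            residues[pos - 1] = aa
        variant = "".join(residues)
        if variant != reference:  # skip the unaltered sequence
            yield variant
```

For the third conserved domain (consensus LXLX), varying positions 2 and 4 gives 20 x 20 - 1 = 399 altered sequences.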

A member of the library of nucleic acid molecules may be selected by expressing the altered leucine-rich repeat (LRR), receptor-like kinase (RLK) of a LRR-RLKII subfamily, or the altered extracellular like SERK receptor (ELS) in a suitable cell or cell line, and selecting a cell that responds differently to a certain stimulus, when compared to a cell or cell line not comprising an altered leucine-rich repeat (LRR), receptor-like kinase (RLK) of a LRR-RLKII subfamily or an altered extracellular like SERK receptor (ELS), or when compared to cells or cell lines that express other members of the library of nucleic acid molecules. Said different response preferably is a modulation of a serine/threonine kinase activity, such as an enhanced serine/threonine kinase activity, or a reduced serine/threonine kinase activity.

A recombinant nucleic acid molecule according to the invention preferably is present in an expression construct in which the nucleic acid molecule is operably linked to a promoter that is functional in plants and/or in plant cells and/or in one or more plant cell lines.

A preferred recombinant nucleic acid molecule is present in a vector. The invention therefore also provides a vector comprising the recombinant nucleic acid construct of the invention. More particularly, the invention provides a vector comprising an isolated, synthetic or recombinant nucleic acid sequence encoding a LRR-RLKII or ELS receptor protein that comprises at least one alteration in a region that encodes a conserved, extracellular domain of said receptor, or a functional fragment or a functional highly homologous sequence thereof. Examples of a suitable vector are bacterial artificial chromosome (BAC) vectors such as BeloBAC11, pBINplus, pKGW-MG, or any other commercially available cloning vector.

As will be outlined below, there are multiple ways in which a nucleic acid molecule of the invention can be transferred to a plant. One suitable means of transfer is mediated by Agrobacterium, in which the nucleic acid to be transferred is part of a binary vector; hence it is preferred that the above described vector is a binary vector. Another suitable means is by crossing a plant which expresses a protein that comprises at least one of the altered amino acid sequences in a LRR-RLKII or ELS receptor, or a functional fragment or a functional highly homologous sequence thereof, to a plant that does not express said altered protein, and identifying progeny plants of the cross that have inherited the gene encoding the altered protein that comprises at least one of the altered amino acid residues, or a functional fragment or a functional highly homologous sequence thereof.

The invention further provides a host cell comprising a nucleic acid as described herein or a vector as described herein. Examples of a preferred host cell are an E. coli cell suitable for BAC clones (e.g. DH10B) or an Agrobacterium cell. In another embodiment, said host cell comprises a plant cell. Suitable cells or cell cultures may be obtained from sources such as the plant cell culture library at UMass Amherst, including plant cell cultures of monocot, dicot and gymnosperm plant species. From such a cell, a transgenic or genetically modified plant can be obtained by methods known by the skilled person including, for example, regeneration protocols.

Methods

The invention further provides a method for the production of a plant comprising in its genome at least one copy of a gene encoding a leucine-rich repeat, receptor-like kinase of a somatic embryogenesis receptor kinase (SERK) subfamily, or an extracellular like SERK receptor (ELS), in which a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered, said method comprising the steps of (a) introducing a recombinant nucleic acid molecule encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), into a plant protoplast, cell, or callus; (b) regenerating a transgenic plant from the plant protoplast, cell, or callus, wherein the transgenic plant comprises in its genome the recombinant nucleic acid construct; and (c) obtaining a progeny plant derived from the transgenic plant of step (b), wherein said progeny plant comprises in its genome the recombinant nucleic acid construct.

For transgenic methods of transfer, a nucleic acid molecule comprising an altered leucine-rich repeat, receptor-like kinase of a somatic embryogenesis receptor kinase (SERK) subfamily, or an altered extracellular like SERK receptor (ELS) may be isolated from a donor plant by using methods known in the art, and the thus isolated nucleic acid molecule may be transferred to a recipient plant by transgenic methods, for instance by means of a vector, in a gamete, or in any other suitable transfer element, such as a bombardment with a particle coated with said nucleic acid sequence.

Plant transformation generally involves the construction of a vector with an expression cassette that will function in plant cells. In the present invention, such a vector consists of a nucleic acid sequence that comprises a gene encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), which gene may be under control of or is operably linked to one or more regulatory elements such as a promoter. The expression vector may contain one or more such operably linked gene/regulatory element combinations. The vector(s) may be in the form of a plasmid, and can be used, alone or in combination with other plasmids, to provide transgenic plants that exhibit altered apomixis, regeneration, resistance and/or steroid signal transduction, when compared to a control plant not comprising the altered LRR-RLKII or altered ELS nucleic acid construct, using transformation methods known in the art, such as the Agrobacterium transformation system.

Expression vectors can include at least one marker gene, operably linked to a regulatory element (such as a promoter) that allows transformed cells containing the marker to be either recovered by negative selection (by inhibiting the growth of cells that do not contain the selectable marker gene), or by positive selection (by screening for the product encoded by the marker gene). Many commonly used selectable marker genes for plant transformation are known in the art, and include, for example, genes that code for enzymes that metabolically detoxify a selective chemical agent which may be an antibiotic or a herbicide, or genes that encode an altered target which is insensitive to the inhibitor. Several positive selection methods are known in the art, such as mannose selection. Alternatively, marker-less transformation can be used to obtain plants without marker genes, the techniques for which are known in the art (e.g. WO 03/010319). Suitable marker genes are described in Miki and McHugh, 2004 (Miki and McHugh, 2004. J Biotech 107: 193-232).

One method for introducing an expression vector into a plant is based on the natural transformation system of Agrobacterium (See e.g. Horsch et al., 1985. Science 227:1229-1231). A. tumefaciens and A. rhizogenes are plant pathogenic soil bacteria that can genetically transform plant cells. The Ti and Ri plasmids of A. tumefaciens and A. rhizogenes, respectively, carry genes responsible for genetic transformation of a plant. Methods of introducing expression vectors into plant tissue include the direct infection or co-cultivation of plant cells with Agrobacterium tumefaciens. Descriptions of Agrobacterium vector systems and methods for Agrobacterium-mediated gene transfer are provided in US Pat. No. 5,591,616. General descriptions of plant expression vectors and reporter genes and transformation protocols and descriptions of Agrobacterium vector systems and methods for Agrobacterium-mediated gene transfer can be found in Gruber and Crosby, 1993 (Gruber and Crosby, 1993. Vectors for plant transformation, in Methods in Plant Molecular Biology and Biotechnology (Glick, B. R. and Thompson, J. E., eds.), CRC, Boca Raton, FL). General methods of culturing plant tissues are provided for example by Miki et al., 1993 (Miki et al., 1993. In: B.R. Glick and J.E. Thompson, eds. Techniques in plant molecular biology and biotechnology. CRC Press Inc.), and by Tavazza et al., 1989 (Tavazza et al., 1989. Plant Science 59: 175-181). A reference handbook for molecular cloning techniques and suitable expression vectors is Sambrook and Russell, 2001 (Sambrook J and Russell DW (2001) Molecular cloning: a laboratory manual. 3rd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York).

Another method for introducing an expression vector into a plant is based on microprojectile-mediated transformation (particle bombardment) wherein DNA is carried on the surface of microprojectiles. The expression vector is introduced into plant tissues with a biolistic device that accelerates the microprojectiles to speeds of 300 to 600 m/s which is sufficient to penetrate plant cell walls and membranes. Another method for introducing DNA to plants is via sonication of target cells. Alternatively, liposome or spheroplast fusion has been used to introduce expression vectors into plants. Direct uptake of DNA into protoplasts using CaCl2 precipitation, polyvinyl alcohol or poly-L-ornithine has also been reported. Electroporation of protoplasts and whole cells and tissues has also been described.

Other well-known techniques, such as the use of BACs, wherein a part of a plant genome comprising the gene encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), is introduced into bacterial artificial chromosomes (BACs), i.e. vectors used to clone large DNA fragments (100- to 300-kb insert size) in Escherichia coli cells, based on the naturally occurring F-factor plasmid found in the bacterium E. coli, may for instance be employed in combination with the BIBAC system (Hamilton, 1997. Gene 200: 107-16) to produce transgenic plants.

The invention further provides a plant that is obtainable or obtained by the method for production of a plant comprising in its genome at least one copy of a gene encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS).

Plants

The invention further provides a plant protoplast, cell, or callus transformed with a recombinant nucleic acid molecule according to the invention, preferably a recombinant nucleic acid construct or a vector according to the invention.

A nucleic acid molecule that comprises a gene encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), may be transferred to a suitable recipient plant by any method available. For instance, said nucleic acid molecule may be transferred by crossing a plant comprising at least one allele of an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or of an altered extracellular like SERK receptor (ELS), with a selected breeding line, i.e. by introgression, by transformation, by protoplast fusion, by a doubled haploid technique or by embryo rescue or by any other nucleic acid transfer system, optionally followed by selection of offspring plants comprising the gene encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), as assessed by markers, and/or by altered apomixis, regeneration, resistance and/or steroid signal transduction, when compared to a control plant not comprising the altered LRR-RLKII or altered ELS nucleic acid construct.

The introgression of a nucleic acid molecule comprising an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), as described herein may suitably be accomplished by using traditional breeding techniques. The gene is preferably introgressed into plants by using marker-assisted selection (MAS) or marker-assisted breeding (MAB). MAS and MAB involve the use of one or more of the molecular markers for the identification and selection of those offspring plants that contain one or more of the genes that encode an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS). In the present instance, such identification and selection is based on selection of the gene of the present invention or markers associated therewith. MAS can also be used to develop near-isogenic lines (NIL) harboring the gene of interest, or the generation of gene isogenic recombinants (QIRs), allowing a more detailed study of each gene effect and is also an effective method for development of backcross inbred line (BIL) populations. Plants developed according to this embodiment can advantageously derive a majority of their traits from the recipient plant, and derive an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), from a donor plant.

Based on the herein described nucleic acid sequences, the invention also provides probes and primers, i.e. oligonucleotide sequences complementary to the DNA strand as described herein, or complementary to the complementing strand. Said primers and probes are for example useful in PCR analysis. Primers based on coding sequences for the herein described altered amino acid sequences are very useful to assist plant breeders active in the field of classical breeding and/or breeding by genetic modification of the nucleic acid content of a plant and in selecting a plant that is capable of expressing an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), or a functional fragment or functional highly homologous sequence thereof.

Preferably, the nucleic acid of a plant to be tested is isolated from said plant and the obtained isolated nucleic acid is brought in contact with one or more of the primers and/or probes. One can for example use a PCR analysis to test plants for the presence or absence of an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), in the plant genome. Such a method would be especially preferable in marker-free transformation protocols, such as described in WO 03/010319.

The invention further provides a transformed plant regenerated from a protoplast, cell, or callus according to the invention. Said transformed plant comprises the recombinant nucleic acid molecule of the invention, or a recombinant nucleic acid construct or vector comprising the recombinant nucleic acid molecule integrated somewhere within the genome such that it is or will be expressed in suitable cells and/or at a suitable time in development.

In a preferred embodiment, the transformed plant comprises the recombinant nucleic acid molecule of the invention in the form of a replacement of at least one allele of the endogenous gene encoding the leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or the extracellular like SERK receptor (ELS).

In this embodiment, a plant comprising at least one functional allele of a gene encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), is generated for example by using any one of CRISPR-CAS, TALEN, and CRE-LOX.

Alteration of at least one allele of a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor (ELS), may be accomplished by homologous recombination. Said homologous recombination preferably is supported by a DNA recognition site-specific recombinase, as is known to a person skilled in the art.

Said DNA recognition site-specific recombinase preferably is selected from a Zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), a topoisomerase I like recombinase such as Cre recombinase from the P1 bacteriophage, a Saccharomyces cerevisiae-derived flippase (Flp recombinase), a lambda integrase, a gamma-delta resolvase, Tn3 resolvase, φC31 integrase and/or a clustered regularly interspaced short palindromic repeats (CRISPR)-guided nuclease. Preferred site-specific recombinases are a Zinc finger nuclease, a transcription activator-like effector nuclease (TALEN) and/or a clustered regularly interspaced short palindromic repeats (CRISPR)-guided nuclease.

TALEN, Zinc finger nuclease or CRISPR-CAS mediated alteration of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor (ELS), preferably is mediated by targeting a nuclease to at least one specific position on said leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or said extracellular like SERK receptor (ELS), preferably at least at two specific positions. Said targeting may be mediated by the TALE DNA binding domains, or by CRISPR single chimeric guide RNA sequences. The nuclease, a FokI nuclease in the case of a TALEN, and a CAS protein or CAS-related protein, preferably a CAS9 protein, in the case of CRISPR, mediates double stranded breaks in the genomic DNA of the gene encoding the leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or the extracellular like SERK receptor (ELS). The introduction of DNA double stranded breaks increases the efficiency of gene editing via homologous recombination, in the presence of suitable donor DNA to alter at least one amino acid residue in a region that encodes a conserved, extracellular domain of a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or encoding an altered extracellular like SERK receptor (ELS) (Gaj et al., 2013. Trends Biotechnol 31: 397-405).

Zinc finger proteins are DNA-binding motifs and consist of modular zinc finger domains that are coupled to a nuclease. Each domain can be engineered to recognize a specific DNA triplet in the region of the gene that encodes a conserved, extracellular domain of said receptor. A combination of three or more domains results in the recognition of a sequence that is specific for a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor (ELS). Expressing said coupled zinc finger protein-nuclease in a relevant plant cell, in the presence of a recombinant nucleic acid molecule comprising at least an extracellular part of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or of an extracellular like SERK (ELS) receptor, wherein at least a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered, will result in alteration of the gene encoding the LRR-RLKII or ELS.

Similarly, synthetic transcription factor DNA binding domains (DBDs) can be programmed to recognize specific DNA motifs. Such transcription activator-like effector (TALE) DNA binding domains (DBD) preferably contain a number, from 7 to 34, of highly homologous direct repeats, each consisting of 33-35 amino acids. Specificity is contained in the two amino acid residues in positions 12 and 13 of each repeat. Since the DNA:protein binding code of said two amino acid residues has been deciphered, it is possible to design TALEs that bind any desired target DNA sequence by engineering an appropriate DBD. Typically, the TALEs are designed to recognize 15 to 20 DNA base-pairs, balancing specificity with potential off targeting (Boettcher and McManus, 2015. Mol Cell 58: 575-585). A TALE, preferably at least two TALEs specific for a gene encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK receptor (ELS), may then be coupled to a nuclease, for example Cas9. Expressing said coupled TALE-nucleases in a relevant plant cell, in the presence of a recombinant nucleic acid molecule comprising at least an extracellular part of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or of an extracellular like SERK (ELS) receptor, wherein at least a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered, will result in alteration of the gene encoding the LRR-RLKII or ELS.
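The design of a TALE DNA binding domain from a target sequence can be sketched as a lookup of one repeat per base. The mapping below (NI for A, HD for C, NN for G, NG for T) is the repeat-variable di-residue code as commonly reported in the literature, not a code taken from this description; NN is also reported to bind A, so it is a simplification. The function name is hypothetical.

```python
# Commonly reported residues at positions 12 and 13 of a TALE repeat, per target base.
# Assumption for illustration: one preferred di-residue per base.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str) -> list:
    """Return one di-residue per base for a 15 to 20 base-pair target."""
    target = target.upper()
    if not 15 <= len(target) <= 20:
        raise ValueError("target is expected to be 15 to 20 base-pairs long")
    try:
        return [RVD_FOR_BASE[base] for base in target]
    except KeyError as exc:
        raise ValueError(f"unsupported base in target: {exc.args[0]!r}") from exc
```

The returned list gives the order of repeats in the DNA binding domain, read 5' to 3' along the target.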

A preferred site-specific recombinase is CRISPR-associated protein 9 (Cas9). Cas9 is an RNA-guided DNA endonuclease that can cleave any sequence that is complementary to the nucleotide sequence in a CRISPR-comprising guide RNA (gRNA). The target specificity of this system originates from the gRNA:DNA complementarity and does not depend on modifications to the protein itself, as it does for TALE and zinc finger proteins.
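As an illustration of how guide RNA complementarity defines the target, the following minimal sketch lists candidate 20-nucleotide target sites on the forward strand of a DNA sequence. It assumes the commonly used Streptococcus pyogenes Cas9, which additionally requires an NGG protospacer-adjacent motif (PAM) directly 3' of the target. The PAM requirement, the function name and the example sequence are assumptions of this sketch and are not taken from the description above.

```python
def find_cas9_targets(dna, protospacer_len=20):
    """Return (start, protospacer, pam) for each forward-strand site followed by an NGG PAM.

    Assumption: S. pyogenes Cas9 with an NGG PAM; only the forward strand is scanned.
    """
    dna = dna.upper()
    hits = []
    # The last valid start leaves room for the protospacer plus a 3-nt PAM.
    for start in range(len(dna) - protospacer_len - 2):
        pam = dna[start + protospacer_len:start + protospacer_len + 3]
        if pam[1:] == "GG":
            hits.append((start, dna[start:start + protospacer_len], pam))
    return hits


# Hypothetical sequence: a 20-nt target followed by a TGG PAM and three extra bases.
example = "ATGCATGCATGCATGCATGC" + "TGG" + "AAA"
targets = find_cas9_targets(example)
```

A complete design tool would also scan the reverse complement and score potential off-target sites; both are left out here for brevity.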

As is indicated herein above, DNA recognition site-specific recombinases can be used to perform targeted genome editing in cells. Targeted replacement employing targeting modules at two positions within one or more conserved extracellular domains of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or of an extracellular like SERK (ELS) receptor, in the presence of a recombinant nucleic acid molecule comprising at least an extracellular part of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or of an extracellular like SERK (ELS) receptor, wherein at least a region of the gene that encodes a conserved, extracellular domain of said receptor has been altered, will effectively generate targeted alterations in the conserved extracellular domains of the genes encoding a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an extracellular like SERK (ELS) receptor.

In the presence of a homology repair donor, this system can guide precise gene replacement by exchanging the extracellular part of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or of an extracellular like SERK (ELS) receptor, or a relevant part thereof, for the corresponding part of an altered sequence, for example wherein said one or more amino acid alterations is not an alteration of one or more of the conserved amino acid residues at positions 3, 5, 6, 7, 8, 13 and 24 of the first consensus sequence, at positions 2 and 4 of the second consensus sequence, at positions 1 and 3 of the third consensus sequence and at positions 2 and 5 of the fourth consensus sequence, as depicted in Figure 2.

The invention further provides a part of the transformed plant, wherein said part preferably is pollen, ovules, leaves, embryos, tuber, roots, root tips, anthers, flowers, fruits, stems, shoots, scions, rootstocks, seeds, protoplasts and calli, including single cells, cell clumps and tissue cultures therefrom. A preferred part is a fruit, tuber or seed.

Food products

The invention further provides a food product prepared from a plant part of a plant according to the invention, preferably a genetically modified plant according to the invention. Said plant part preferably is a fruit, a seed and/or a tuber. The alteration of the extracellular part of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or of an extracellular like SERK (ELS) receptor, will be demonstrable in said food product, for example by amplification reactions such as polymerase chain reaction.

Remnants of the plant or plant part according to the invention will be present in said food product, such as traces of the altered genomic region encoding the extracellular part of a leucine-rich repeat, receptor-like kinase of a LRR-RLKII (SERK) subfamily, or of an extracellular like SERK (ELS) receptor. Said remnants can be visualized, for example by amplification of the genomic region comprising the gene encoding an altered leucine-rich repeat, receptor-like kinase of a LRR-RLKII subfamily, or an altered extracellular like SERK receptor (ELS), as is known to a person skilled in the art.

Having now generally described this invention, the same will be better understood by reference to certain specific examples, which are included herein only to further illustrate the invention and are not intended to limit the scope of the invention as defined by the claims.

EXAMPLES

Example 1

Materials and methods

Sequence mining and compilation

A multiple sequence alignment (MSA) was compiled comprising the amino acid sequences from land plants that had been submitted in the literature as well as to GenBank (National Center for Biotechnology Information) as ‘SERK protein’.

We performed extensive sequence similarity searches using the pBLAST tool with default settings in order to harvest sequences as thoroughly as possible and to avoid missing sequences because of misspelled or misnamed sequence names. The Arabidopsis thaliana RKS proteins (0-16), which have been described in US patent application US 2009/0126041A, were used as query sequences for the pBLAST search.

A total of 1528 sequences were considered to represent all publicly available sequences to date. The set of 1528 sequences represented land plants and comprised 317 Asterids, 897 Rosids, 39 lower Eudicots, 259 Monocots, and 16 Magnoliids. In order to represent land plants we included Amborella trichopoda, Picea sitchensis, Adiantum capillus-veneris, Selaginella moellendorffii, Physcomitrella patens and Marchantia polymorpha. We also included the alga Closterium ehrenbergii and three animal kinase sequences which had the highest similarity with SERK proteins: two kinase proteins of the nematode Trichuris suis and a kinase Pelle protein from the nematode Trichinella pseudospiralis as outgroup candidates. We also added RLK sequences named ‘NIK1’, ‘NIK2’ and ‘LRRII non-SERKs’ used as ‘outliers’ in Aan Den Toorn et al., 2015 (Aan Den Toorn et al., 2015. Ibid), following the suggestion by Sakamoto et al., 2012 (Sakamoto et al., 2012. BMC Plant Biol 12: 229) that there are three LRR-RLKII clades.

Moreover, we included three ‘extracellular like SERK receptor’ sequences (labelled ‘ELS protein’ in US patent application US 2009/0126041A), which have the LRR domains only.

The same proteins are often given different names by authors, including ‘SERK’ (types 1-6), ‘RKS’ (0-16) and ‘BAK1’. We therefore started out by creating a Rosetta stone of RLK nomenclature in the form of a Venn diagram (constructed using the tool at bioinformatics.psb.ugent.be/webtools/Venn/) to explore overlap and redundancy in RLK classification and nomenclature. RKS types were assigned using US patent application US 2009/0126041A; SERK types were assigned as in the GenBank accession information.

Sequence alignment and phylogenetic analysis

Multiple sequence alignment (MSA) was performed using a Linux version of MAFFT v.7 (Katoh and Standley, 2013. Mol Biol Evol 30: 772-780) with ‘Auto’ settings in effect. The resulting MSA was inspected visually and adjusted manually when needed using Mesquite v. 3.10 (Maddison and Maddison, 2016. Mesquite, Version 3.10. Version available at mesquiteproject.org) but no columns/residues were removed. However, redundant sequences (with identical names or sequences) and sequences that did not align properly were discarded from the MSA. Some terminals that contain only an LRR or a kinase domain were also removed. As indicated above, the serine/threonine-protein kinase Pelle of the nematode Trichinella pseudospiralis was used as outgroup sequence. To maintain homology with the ingroup MSA, parts of this sequence had to be excluded. Positions 1-156 of the Pelle sequence were excluded, corresponding to the LRR part of the protein (not homologous to plant LRRs of receptor-like kinase receptors). As a final matrix, 1328 terminals were kept for further analyses.

According to the result of motif searching using PFAM 32.0 (Finn et al., 2016. Nucleic Acids Res 44: D279-85; available at pfam.xfam.org/), the MSA was divided in two parts. The first part (named ‘Ext’, with alignment positions 1-573) contained the extracellular LRR up to the transmembrane part. The second part (named ‘Int’, with alignment positions 574-1319) contained the intracellular kinase, with position 574 detected as the first position of a kinase by PFAM.
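The partitioning described above can be sketched as a simple column split. This is a minimal illustration that assumes the alignment is held as a dictionary of equal-length strings; the function name and the toy alignment are hypothetical, and only the boundary (alignment position 573) is taken from the text.

```python
EXT_END = 573  # last alignment position of the 'Ext' partition (1-based, from the text)


def split_alignment(msa, ext_end=EXT_END):
    """Split every aligned sequence into an 'Ext' and an 'Int' part.

    msa maps sequence names to aligned strings of equal length.
    Positions 1..ext_end go to Ext, the remainder to Int.
    """
    ext = {name: seq[:ext_end] for name, seq in msa.items()}
    int_part = {name: seq[ext_end:] for name, seq in msa.items()}
    return ext, int_part


# Toy alignment of the same width as the final MSA (1319 positions).
toy_msa = {"seq1": "L" * 573 + "K" * 746, "seq2": "-" * 573 + "K" * 746}
ext_matrix, int_matrix = split_alignment(toy_msa)
```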

Final MSAs were subjected to phylogenetic reconstruction using a Linux version of IQ-TREE (Nguyen et al., 2015. Mol Biol Evol 32: 268-274) which is maximum likelihood-based and includes automatic amino acid substitution model selection (ModelFinder) as well as 1000 replicates of ‘ultrafast’ bootstrapping (UFBoot). Tree searches were performed on both the full alignment and separate Ext and Int matrices. Resulting trees were visualized using Figtree v.1.4.3 (available at tree.bio.ed.ac.uk/software/figtree/).

To see if there was a significant difference between substitution rate of different main clades, we used the IQ-TREE ML tree topology and calculated the number of amino acid substitutions (parsimony branch lengths) of each main clade (see below) divided by the number of terminals in that clade, and for Int and Ext partitions separately, using PAUP*4.0. (Swofford, 2002. Phylogenetic analysis using parsimony. Sinauer Associates, Sunderland, Massachusetts). Subsequently, a G-test (goodness of fit) was applied to test for significance of differences in the number of amino acid substitutions among main clades and between the Ext and Int partitions.
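The G-test mentioned above can be sketched as follows. The exact layout of the test is not given in the text, so this sketch assumes that the expected number of substitutions in a clade is proportional to its number of terminals; the function names are hypothetical and the counts are made-up placeholders, not results from this study.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); categories with O == 0 contribute nothing."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def expected_from_terminals(substitutions, terminals):
    """Expected substitutions per clade if every terminal contributed equally (assumption)."""
    total_subs = sum(substitutions)
    total_terminals = sum(terminals)
    return [total_subs * t / total_terminals for t in terminals]


# Chi-square critical value for 4 degrees of freedom (five clades) at alpha = 0.05.
CHI2_CRIT_DF4_P05 = 9.488

# Placeholder counts for five clades (hypothetical, for illustration only).
subs = [400, 350, 300, 250, 700]
terms = [200, 180, 150, 100, 370]
g = g_statistic(subs, expected_from_terminals(subs, terms))
significant = g > CHI2_CRIT_DF4_P05
```

With real data, `subs` would hold the parsimony branch lengths per clade and `terms` the number of terminals per clade, computed for the Ext and Int partitions separately.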

Amino acid conservation and protein structure analysis

A Linux version of WebLogo 3 (Crooks et al., 2004. Genome Res 14:1188-1190) was used to check and visualize the amino acid residue conservation patterns in the MSA for separate main clades. Furthermore, for each clade one A. thaliana and one O. sativa sequence were selected, as well as one A. thaliana and one O. sativa ELS sequence, and their exon and intron boundaries were compared as presented in ARAPORT (Arabidopsis Information Portal; available at araport.org) and NCBI (available at ncbi.nlm.nih.gov/).
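Sequence logos of this kind scale each alignment column by its information content. The following minimal sketch computes that quantity for a single column as log2(20) minus the Shannon entropy of the observed residues. It ignores gaps and omits the small-sample correction that WebLogo applies, so it approximates the idea rather than reproducing the WebLogo calculation; the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, gaps ignored.

    Computed as log2(alphabet_size) minus the Shannon entropy of the residues.
    No small-sample correction is applied (simplifying assumption).
    """
    residues = [c for c in column if c not in "-."]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return math.log2(alphabet_size) - entropy


fully_conserved = column_information("LLLLLL")   # maximal: log2(20) bits
two_residues = column_information("LLLIII")      # one bit lower
```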

Apomorphy detection for the clades incongruently placed between Ext and Int

In order to identify synapomorphic amino acid substitutions for each clade that could lead us to functionally relevant residues, we used the IQ-TREE ML tree and optimised the amino acid MSA onto it using PAUP*4.0 (Swofford, 2002. Ibid) for Linux, using accelerated transformation (ACCTRAN). An apomorphy list (from ‘DescribeTree’) was compiled and, in addition, the consistency index (CI) for individual changes was recorded in order to measure the fit of these sites to the tree (Farris, 1989. Cladistics 5: 417-419), and hence to establish whether they represent unique changes.
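For a single character, the consistency index is conventionally defined as the minimum number of changes the character could require divided by the number of changes it shows on the tree. The sketch below uses that standard definition, with the minimum taken as the number of observed states minus one. It is not taken from PAUP* itself, and the function name is hypothetical.

```python
def consistency_index(n_states, observed_steps):
    """CI = m / s for one character.

    m: minimum conceivable number of steps (observed states - 1).
    s: number of steps the character requires on the given tree.
    CI = 1 means every change is unique (no homoplasy).
    """
    if n_states < 2:
        raise ValueError("a constant character has no defined CI")
    if observed_steps < n_states - 1:
        raise ValueError("observed steps cannot be below the minimum")
    return (n_states - 1) / observed_steps


unique_change = consistency_index(n_states=2, observed_steps=1)      # 1.0
homoplastic_change = consistency_index(n_states=2, observed_steps=4)  # 0.25
```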

To evaluate the evolutionary distance between two amino acids we calculated Grantham scores (Grantham, 1974. Science 185:862-864) between original and replaced amino acids for each node using a Python script (available upon request). These scores range from 5 to 215 and are based on side chain atomic composition, polarity and volume properties of all amino acids (Grantham, 1974. Ibid). Higher Grantham scores therefore show more physico-chemical and hence functional distance between two amino acids (Grantham, 1974. Ibid). Amino acid substitutions involving Grantham scores of 5-60, 60-100 and more than 100 have been considered ‘conservative’, ‘non-conservative’ and ‘radical’, respectively (Abkevich, 2004. J Med Genet 41: 492-507; Balasubramanian et al., 2005. Nucleic Acids Res 33: 1710-1721). Grantham scores were plotted against CI to see if there is any correlation between the level of homoplasy and the level of Grantham similarity between replaced amino acids.
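The classification described above can be sketched as follows. This is not the script referred to in the text but a minimal illustration. Because the stated ranges share the value 60, the sketch assumes that a score of exactly 60 counts as conservative and a score of exactly 100 as non-conservative. Only three well-known entries of the Grantham matrix are included as an excerpt, and the names are hypothetical.

```python
# Small excerpt of the Grantham (1974) matrix; a full analysis needs all 190 pairs.
GRANTHAM_EXCERPT = {
    frozenset("IL"): 5,    # Ile <-> Leu, the smallest distance
    frozenset("GW"): 184,  # Gly <-> Trp
    frozenset("CW"): 215,  # Cys <-> Trp, the largest distance
}


def grantham_class(score):
    """Map a Grantham score to the three categories used in the text.

    Assumption: boundary values 60 and 100 fall in the lower category.
    """
    if score <= 60:
        return "conservative"
    if score <= 100:
        return "non-conservative"
    return "radical"


def classify_substitution(original, replacement, table=GRANTHAM_EXCERPT):
    """Look up an amino acid pair (order-independent) and classify it."""
    return grantham_class(table[frozenset((original, replacement))])


example_class = classify_substitution("G", "W")  # a Gly-to-Trp replacement
```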

Finally, we selected two SERK 3D protein structures from PDB (Berman et al., 2000. Nucleic Acids Res 28: 235-242; available at rcsb.org) including the extracellular part (PDB ID: 4LSC) and intracellular part (PDB ID: 2TL8) of the protein, determined using X-ray diffraction by Santiago et al., 2013 (Santiago et al., 2013. Science 341: 889-892) and Cheng et al., 2011 (Cheng et al., 2011. Cell Host Microbe 10: 616-26), respectively. We mapped the MSA of our matrices ‘Ext’ and ‘Int’ onto the corresponding selected 3D protein structures using the ConSurf server, available at consurf.tau.ac.il (Ashkenazy et al., 2010. Nucleic Acids Res 38: W529-W533; Celniker et al., 2013. Isr J Chem 53: 199-206; Ashkenazy et al., 2016. Nucleic Acids Res 44: W344-W350), in order to calculate the evolutionary conservation of amino acid positions in relation to the SERK structure. The ‘empirical Bayesian’ method was used to compute the evolutionary rate.

Results

Phylogenetic analysis of SERK proteins of plants

Our final amino acid MSA including 1328 available SERK proteins comprised 1319 alignment positions and showed contrasting patterns of sequence conservation. Overall, alignment quality was good and we found that MAFFT gave reproducible, consistent results (not shown). As the Ext matrix (extracellular partition of SERK proteins, see Methods) did not include a kinase Pelle protein sequence, which was used as outgroup for the Int+Ext and Int matrices, rooting for Ext was inferred from the full matrix-based tree.

The first half of the MSA, comprising the LRR regions (Ext), was found to be slightly less conserved than the second half (Int), containing the kinase domains. Using PAUP* (Swofford, 2002. Ibid), 58% of the characters in Ext were estimated to be ‘parsimony informative’, 16% uninformative and 26% constant, whereas in Int these proportions were 54%, 14% and 32%, respectively. A maximum likelihood tree was constructed based on the entire MSA of 1319 characters using IQ-TREE (Nguyen et al., 2015. Ibid), which selected the JTT model (Jones et al., 1992. Bioinformatics 8: 275-282) with 4 categories of Gamma rate distribution modelling as best-fitting. Using Trichinella Pelle as outgroup, five main clades could be identified in our IQ-TREE ML tree topology, four of which had bootstrap support of 100%, and one of which had bootstrap support of 92%. These clades were labeled Clades I-V in order to reflect general structure and function of these proteins, irrespective of their multiple specific functions. Each clade comprised multiple types of both ‘RKS’ and ‘SERK’; the occurrence and distribution of both types across the five main clades are given in Table 1. In almost all clades, land plant and APG relationships were presented, indicating that gene duplication and subsequent clade proliferation would have occurred before the split among land plants, as also outlined by Shiu and Bleeker (Shiu and Bleeker, 2001. Proc Natl Acad Sci 98: 10763-10768; Liu et al., 2017. BMC Evol Biol 17: 47). Amborella trichopoda and Marchantia polymorpha are distributed over the five clades depending on their RKS types, which would indicate that the RKS genes of these plants are paralogous and probably resulted from gene duplication events. Moreover, ELS proteins, lacking a cellular kinase component, are all located in Clade V. NIK1, NIK2 and other non-SERK proteins from the LRR-RLKII family did not form a separate clade as has been suggested in previous studies (Sakamoto et al., 2012. BMC Plant Biol 12: 229; Aan Den Toorn et al., 2015. Mol Plant 8: 762-782), but were distributed among four out of five clades in this study (see Table 1).
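The three character categories reported above for the Ext and Int partitions follow the usual parsimony definitions: a column is constant if it shows one state, and parsimony informative if at least two states each occur in at least two sequences. A minimal sketch of that classification is given below. It is not the PAUP* implementation, the handling of gaps and missing data as ignorable is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def classify_column(column):
    """Classify one alignment column as 'constant', 'uninformative' or 'informative'.

    Gap ('-') and missing ('?') symbols are ignored (simplifying assumption).
    """
    counts = Counter(c for c in column if c not in "-?")
    if len(counts) <= 1:
        return "constant"
    # States that occur in at least two sequences can support a grouping.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


labels = [classify_column(col) for col in ("AAAA", "AAAT", "AATT", "A-TT")]
```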

Trees based on extracellular (Ext) and intracellular (Int) partitions of the MSA showed topological differences. In the Ext-based tree topology, Clades I and II are sister groups, followed by Clade III. In the Int-based tree topology, however, Clades II and III are sister groups, followed by Clade I. The Int-based tree did not differ in topology from the Ext+Int-based tree, indicating that, despite the higher percentage of informative sites in Ext compared with Int (see above), overall topology was driven by Int.

While Clade I is a composition of 7 different RKS types, Clade III predominantly consists of RKS type 5, which is represented by 4 and 5 terminals in Clades I and II, respectively. The difference among Clades I and II appears to be related to their SERK types: almost all SERK genes in Clade I are determined as type 1 or 2, while Clade II has only type 3. Clades IV and V comprise 3 and 6 RKS types, respectively. RKS type distribution over five clades appears independent from whether the Ext or Int partition was used.

Table 1. Occurrence and distribution of sequences annotated as similar to RKS (receptor kinase-like SERK) and SERK (somatic embryogenesis receptor kinase) across the main leucine-rich repeat receptor-like kinase II (LRR-RLKII) clades found in this study. RKS types are designated according to similarity to types in Schmidt et al. (2009). The SERK and NSP interacting kinase (NIK) types are based on NCBI annotation files.

To explore possible overlap among the different names used for ‘SERK’, ‘RKS’ and ‘BAK1’, we created a Venn diagram including all types, as far as they are available for all individual accession numbers. RKS types of 4 individuals (annotated as ‘SERK5’ or ‘SERK6’) were unknown. We infer that the relationship between SERK and RKS types is complicated and that SERK type number is in conflict with RKS type number. Eleven BAK1 sequences, which refer to BRI1-associated kinase, were actually considered SERK3 in previous studies (e.g. Li, 2010. Curr Opin Plant Biol 13: 509-514; Roux et al., 2011. Plant Cell 23: 2440-2455). All detected BAK1 sequences are in Clade V, and their RKS types are 10, 12, 13 or 16.

Furthermore, in order to investigate possible paralogue-specific rate changes following gene duplications, the number of amino acid substitutions was calculated for each of the five Clades, divided by its number of terminals, and based on the Ext and Int partitions separately (Table 2). A G-test was applied in order to test for significance of differences. According to our results, the main Clades do not differ significantly. However, the largest difference appears to occur in Clade IV LRR (‘Ext’).

Motif structure in five main clades

We drew the motif structure of the proteins of each Clade based on the multiple sequence alignment of the LRR+kinase matrix visualized by Mesquite v. 3.10 (Maddison and Maddison, 2016. Ibid) and the sequence logos generated by WebLogo (Crooks et al., 2004. Genome Res 14: 1188-1190) (Figure 5A,B). All structures have a signal sequence at the N-terminus of their protein. A conserved motif with unknown function (LSPxY/FExAL) was present in Clades II and III proteins, but not in proteins of other Clades. Immediately after this motif, Clades II and V have two leucine zipper domains, each formed by two leucines with 6 other amino acids in between. Clades I and IV have only one leucine zipper domain, whereas Clade III does not have this motif (Figure 3). A putative disulphide bridge, composed of two conserved cysteine residues with 4-7 spacer amino acids in between, is present in proteins of all five Clades and is located in between the leucine zipper domains and the leucine-rich repeat (LRR) domain. The extracellular region of SERK proteins that includes the leucine zipper domains and disulphide bridge is sometimes also called the N-capping residues (Aan den Toorn et al., 2015. Ibid). In Clades I, II and III, a second disulphide bridge was detected after the LRR domain. Additionally, based on the multiple sequence alignment of all proteins, we found a region enriched for serine and proline amino acids. The latter region, including the SP-rich stretch and the disulphide bridge, is termed the ‘SPP’ motif (Aan den Toorn et al., 2015. Mol Plant 8: 762-782). The extracellular part of the SERK protein is terminated by a transmembrane domain and a semi-conserved motif (xGxL/TK/RxF/YxxxEL/Tx) with unknown function.

The intracellular structure showed no variation among clades and is characterized by a kinase domain, ending with a semi-conserved C-terminal sequence (xxE/YLSG/xP/xR).

Analysis of the exon boundaries of detected domains (using ARAPORT (at araport.org) and NCBI (at ncbi.nlm.nih.gov/)) within selected annotated sequences of each of the five clades revealed that there are 11-12 exons. It was found that the size and the order of exons are highly conserved among different Clades, but not the length of the introns. The first exon harbors a signal peptide domain, followed by an exon with leucine zipper and disulphide bridge (N-capping residue). Next, there are 4 exons with a total of 5 LRR domains, with 24 amino acids each, followed by two exons with one SPP domain including disulphide bridge and SP rich region, and one transmembrane domain. The SPP domain in sequences 9, 11 and 12, which are located in Clade V, does not have a disulphide bridge. The first eight domains that were identified are located in the extracellular part of the proteins. The last two or three detected exons are associated with the intracellular part of the proteins, and are recognized as kinase domain by PFAM. The three kinase domains in sequence 7 (A. thaliana, Clade IV) are present in two exons. Moreover, as illustrated in Figure 4, the exon-intron boundaries of ELS proteins are very similar to those of Clade V proteins, apart from the first exon of ELS, which is a combination of the first and second exons of Clade V proteins.

Apomorphy detection for clades II and III in Ext and Int

Detection of synapomorphic amino acid substitutions showed that in the Ext-based tree, the number of apomorphies between node 2 and node 1 is 25, seven of which are considered radical changes according to calculated Grantham scores (Grantham, 1974. Ibid); none of the characters with radical changes is a unique change with CI = 1. Similarly, from node 2 to node 5, we counted 68 apomorphies, 17 of which can be considered radical changes, with a single one, G116>W, having consistency index CI = 1. Additionally, 8 of 34 detected apomorphies for node 1 to node 3 and 8 out of 39 for node 1 to node 4 have been grouped as radical changes, none of which has CI = 1.

In the Int-based tree, 30, 33, 15 and 44 substitutions were recorded between node 1 and node 3, node 1 and node 2, node 2 and node 4, and node 2 and node 5, respectively. Of these apomorphies, 9 (one of them, P645>W, having CI = 1), 12 (three of them, A574>N, A646>D and N653>I, having CI = 1), 6 (one of them, N567>L, having CI = 1) and 8, respectively, were detected as radical changes. All in all, roughly a quarter of all amino acid substitutions along these branches were radical, both in the Ext and Int parts of the SERK protein.
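The "roughly a quarter" figure can be checked directly from the counts reported above for the four branches in each tree. The short calculation below uses only those reported numbers; the variable and function names are hypothetical.

```python
# Apomorphy counts per branch as reported in the text (total and radical).
EXT_COUNTS = {"total": [25, 68, 34, 39], "radical": [7, 17, 8, 8]}
INT_COUNTS = {"total": [30, 33, 15, 44], "radical": [9, 12, 6, 8]}


def radical_fraction(counts):
    """Fraction of all apomorphies on the listed branches that are radical."""
    return sum(counts["radical"]) / sum(counts["total"])


ext_fraction = radical_fraction(EXT_COUNTS)  # 40 of 166 apomorphies
int_fraction = radical_fraction(INT_COUNTS)  # 35 of 122 apomorphies
```

This gives about 24% for Ext and about 29% for Int, in line with the statement that roughly a quarter of the substitutions were radical.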

Furthermore, when plotting the CI of residues against the corresponding Grantham scores, no direct relationship between these two criteria appears to exist; most of the changes had a CI of less than 0.4 and a Grantham score < 150, indicating that the majority of the changes are not specific and not unique to just one clade but occurred independently several times in different clades. Of the unique changes, on the other hand, around 23% were radical changes.

Conservation in 3D structure of the extra- and intra-cellular domain of plant SERK proteins

Several disulphide bridges were observed in the N-terminal part of the protein. Of 40 detected apomorphies with radical changes associated with the first three Clades, 7 are present in the sequence of PDB ID: 4LSC. Most detected apomorphies are located in the signal and transmembrane sequences (12 and 10 apomorphies, respectively), followed by the leucine zipper domains (7 apomorphies). Others are located in the LRR domain (5 apomorphies), the disulphide domain (1 apomorphy) and the second unknown domain (5 apomorphies). Thirty-five radical apomorphies are located in the kinase domain, 12 of which are available in the sequence of the 2TL8 protein.