Title:
ALGAL COMPONENTS OF THE PYRENOID'S CARBON CONCENTRATING MECHANISM
Document Type and Number:
WIPO Patent Application WO/2017/196790
Kind Code:
A1
Abstract:
Eukaryotic algae, which play a fundamental role in global CO2 fixation, enhance the performance of the carbon-fixing enzyme RuBisCO by placing it into an organelle called the pyrenoid where the enzyme's activity is enhanced by the carbon concentrating mechanism. Despite the ubiquitous presence and biogeochemical importance of this organelle, it was unknown how RuBisCO assembles to form the pyrenoid. Here, molecular components of the pyrenoid, their transport to the pyrenoid as essential components thereof, and their function in the carbon concentrating mechanism (CCM) are described. Algal pyrenoid proteins and variants thereof may be transferred into cells of bacteria (e.g., cyanobacterium), algae (e.g., green algae like Chlamydomonas), and embryophytes (e.g., C3 and C4 plants) for stable or transient (especially in leaves) expression to manipulate the assembly of the pyrenoid and its biological functions.

Inventors:
MACKINDER LUKE C M (GB)
MEYER MORITZ T (GB)
METTLER-ALTMANN TABEA (DE)
PALLESEN LEIF (US)
STITT MARK (DE)
GRIFFITHS HOWARD (GB)
JONIKAS MARTIN C (US)
Application Number:
PCT/US2017/031669
Publication Date:
November 16, 2017
Filing Date:
May 09, 2017
Assignee:
MACKINDER LUKE C M (GB)
MEYER MORITZ T (GB)
METTLER-ALTMANN TABEA (DE)
PALLESEN LEIF (US)
STITT MARK (DE)
GRIFFITHS HOWARD (GB)
JONIKAS MARTIN C (US)
International Classes:
C12N1/12; C07K14/405; C12N1/13; C12N15/82; C12R1/89
Domestic Patent References:
WO2015086798A2 2015-06-18
WO2010033229A2 2010-03-25
Other References:
DATABASE NCBI 23 August 2007 (2007-08-23), MERCHANT SS ET AL.: "Low-CO2-Inducible Protein", XP055442029, Database accession no. XP_001690584.1
DATABASE GenBank 22 June 2001 (2001-06-22), LAVIGNE, AC ET AL.: "LCI5", XP055442031, Database accession no. AAK77552.1
MACKINDER, LCM ET AL.: "A Repeat Protein Links Rubisco To Form The Eukaryotic Carbon-Concentrating Organelle", PNAS, vol. 113, 24 May 2016 (2016-05-24), pages 5958 - 5963, XP055442033
Attorney, Agent or Firm:
TANIGAWA, Gary, R. (US)
Claims:
We Claim:

1. A variant protein comprising a polypeptide domain having (i) a length of at least 56-63 contiguous amino acids and (ii) one or more variations from the EPYC1 amino acid sequence (SEQ ID NO: 1); wherein said variations are 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof as compared to SEQ ID NO: 1.

2. The variant protein of claim 1, wherein said polypeptide domain comprises at least residues 52-114, residues 115-174, residues 175-235, or residues 236-291 of SEQ ID NO: 1 and has been mutated from the EPYC1 amino acid sequence by 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof.

3. The variant protein of claim 1, wherein said polypeptide domain comprises at least residues 52-114 of SEQ ID NO: 1 and has been mutated from the EPYC1 amino acid sequence by 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof.

4. The variant protein of claim 1, wherein said polypeptide domain comprises at least residues 115-174 of SEQ ID NO: 1 and has been mutated from the EPYC1 amino acid sequence by 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof.

5. The variant protein of claim 1, wherein said polypeptide domain comprises at least residues 175-235 of SEQ ID NO: 1 and has been mutated from the EPYC1 amino acid sequence by 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof.

6. The variant protein of claim 1, wherein said polypeptide domain comprises at least residues 236-291 of SEQ ID NO: 1 and has been mutated from the EPYC1 amino acid sequence by 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof.

7. The variant protein of claim 1, wherein said polypeptide domain does not comprise residues 1-51, residues 52-114, residues 115-174, residues 175-235, residues 236-291, or residues 292-317 of SEQ ID NO: 1.

8. Use of EPYC1 or the variant protein of any one of claims 1-7 to bind RuBisCO in a cell, wherein said cell does not contain a native copy of EPYC1 or an ortholog thereof.

9. The use according to claim 8, wherein said cell is a cyanobacterium, red alga, green alga, or embryophyte.

10. The use according to claim 9, wherein said embryophyte is a C3 or C4 plant.

11. The use according to claim 10, wherein said embryophyte is a C3 or C4 plant, such as Arabidopsis, soybean, a rice, tobacco, a wheat, maize, or a grass.

12. The use according to claim 8, wherein said EPYC1 or variant protein is expressed transiently or stably in the cell.

13. Use of EPYC1 or the variant protein of any one of claims 1-7 to bind RuBisCO in a green alga, wherein said green alga contains at least one native copy of EPYC1 or an ortholog thereof.

14. Use of EPYC1 or the variant protein of any one of claims 1-7 to bind RuBisCO in a green alga, wherein said green alga contains an inactivated or silenced native copy of EPYC1 or an ortholog thereof.

15. The use according to claim 14, wherein said green alga is a wild-type or mutant Chlamydomonas.

16. The use according to claim 14, wherein expression of said EPYC1 or variant protein at least increases the rate of production of biomass, reduces the requirement for fertilizer and/or irrigation, increases CO2 assimilation per unit of RuBisCO, decreases the rate of oxygenation of ribulose-1,5-bisphosphate catalyzed by RuBisCO, enhances RuBisCO catalytic rate, or any combination thereof in the cell.

17. The use according to claim 14, wherein expression of said EPYC1 or variant protein causes at least some of the RuBisCO to cluster such that the mean center-to-center distance between molecules of RuBisCO is decreased as compared to the absence of EPYC1 or variant protein expression in the cell.

18. Use of EPYC1 or the variant protein of any one of claims 1-7 to bind RuBisCO in a cell containing a pyrenoid, wherein expression of said EPYC1 or variant protein at least increases EPYC1 expression to promote RuBisCO localization to the pyrenoid or increases the size of the pyrenoid.

19. Use of EPYC1 or the variant protein of any one of claims 1-7 to make a fusion protein, which comprises said EPYC1 or variant protein linked to a heterologous domain that is not present in EPYC1 in either order.

20. A fusion protein comprising an N-terminal EPYC1 transit peptide linked to a heterologous domain that is not present in EPYC1, wherein said transit peptide targets the fusion protein to the chloroplast.

21. The fusion protein of claim 20, wherein said transit peptide is derived from at least 10, 20, 30, or 40 contiguous amino acids at the N-terminus of SEQ ID NO: 1, for example residues 1-51 of SEQ ID NO: 1.

22. The fusion protein of claim 20, wherein said heterologous domain is derived from a protein that is not imported into the chloroplast.

23. The fusion protein of claim 20, wherein the transit peptide is cleaved from the heterologous domain after transport into the chloroplast.

24. The fusion protein of claim 20, wherein the heterologous domain is targeted to the pyrenoid by the transit peptide.

Description:
ALGAL COMPONENTS OF THE PYRENOID'S CARBON CONCENTRATING MECHANISM

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/333,807, filed May 9, 2016, which is incorporated by reference herein.

FEDERALLY-SPONSORED RESEARCH OR DEVELOPMENT

The U.S. Government has certain rights in this invention as provided in contracts EF-1105617 and IOS-1359682 awarded by the National Science Foundation.

BACKGROUND

The invention relates to molecules (e.g., algal polypeptides and the nucleic acids encoding them), their transport into the pyrenoid, and their involvement in metabolic reactions such as photosynthesis. In many algae, photosynthesis in chloroplasts is improved by a carbon concentrating mechanism that includes a pyrenoid. Components of the pyrenoid's carbon concentrating mechanism are identified. They are used, inter alia, to manipulate metabolic processes, thereby providing improvements in photosynthesis, increased biomass production and crop yield in photosynthetic organisms, and novel processes and products resulting therefrom.

Carbon fixation or carbon assimilation, which converts inorganic carbon (CO2) into organic compounds, can occur through photosynthesis by cyanobacteria, algae, and land plants. They grow by fixing inorganic carbon and synthesizing organic compounds using energy from the sun. Carboxysomes in cyanobacteria and the pyrenoid in algae are packed with ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), which is the enzyme (EC 4.1.1.39) catalyzing the rate-limiting step in the Calvin-Benson cycle. Carboxysomes and the pyrenoid increase the local concentration of carbon dioxide around RuBisCO to fix carbon dioxide at a higher rate. Annually, there is net conversion by photosynthesis of approximately 400 billion metric tons of carbon dioxide. But the gross amount of carbon dioxide fixed is much larger since about 40% is consumed by respiration following photosynthesis. Most carbon fixation occurs in the marine environment, where there are high levels of nutrients and photosynthetic microbes. Carbon fixation, converting carbon dioxide into a three-carbon sugar, is an endothermic redox reaction. Thus, photosynthesis needs to supply both a source of energy to drive this process, and electrons to convert carbon dioxide into a carbohydrate via a reduction reaction. In effect, photosynthesis is the opposite of cellular respiration in which glucose and other compounds are oxidized to produce carbon dioxide and water, and to release chemical energy that drives metabolism.
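As a rough worked example of the net-versus-gross relationship stated above, the following sketch assumes that "about 40%" means the fraction of gross fixation that is lost to respiration. That reading, and the function name gross_from_net, are assumptions made for illustration and are not stated in the application.

```python
def gross_from_net(net_fixed, respired_fraction):
    """Return gross fixation, given net fixation and the fraction of
    gross fixation assumed to be lost to respiration."""
    if not 0 <= respired_fraction < 1:
        raise ValueError("respired_fraction must be in [0, 1)")
    return net_fixed / (1.0 - respired_fraction)


# Net figure quoted in the text, in billion metric tons of CO2 per year.
net = 400.0
gross = gross_from_net(net, 0.40)
print(f"gross fixation is roughly {gross:.0f} billion metric tons per year")
```

Under this reading, a net figure of 400 corresponds to a gross figure of roughly 667; a different reading of the 40% would give a different number.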

Photosynthesis occurs in two stages. In the first, light-dependent (or light) reactions capture the energy of light and store the energy as ATP and NADPH. During the second, the light-independent (or dark) reactions use these products to capture and reduce carbon dioxide. Since water is used as the electron donor in oxygenic photosynthesis, CO2 + 2H2O + photons → [CH2O] + O2 + H2O. Water is a reactant in the first stage and a product in the second stage. Thus, canceling one water molecule on each side of the equation gives CO2 + H2O + photons → [CH2O] + O2. For the second stage, the light-independent reactions use RuBisCO to convert carbon dioxide into a three-carbon sugar in a process called the Calvin-Benson cycle. This simple sugar is an intermediate in the synthesis of carbohydrates and other organic compounds, a precursor for lipid and amino acid biosynthesis, and a source of energy for cellular respiration.
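The two equations above can be checked for atom balance with a short script. This is only an illustrative sketch: the helper names (atoms, side_total, is_balanced) are hypothetical, photons are omitted because they carry no atoms, and the parser handles only simple formulas without parentheses.

```python
import re
from collections import Counter


def atoms(formula):
    """Count atoms in a simple formula such as 'CO2' or 'H2O'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def side_total(terms):
    """Sum atoms over (coefficient, formula) pairs on one side."""
    total = Counter()
    for coeff, formula in terms:
        for element, n in atoms(formula).items():
            total[element] += coeff * n
    return total


def is_balanced(reactants, products):
    return side_total(reactants) == side_total(products)


# CO2 + 2 H2O -> [CH2O] + O2 + H2O
full = is_balanced([(1, "CO2"), (2, "H2O")],
                   [(1, "CH2O"), (1, "O2"), (1, "H2O")])
# CO2 + H2O -> [CH2O] + O2 (one water cancelled on each side)
net = is_balanced([(1, "CO2"), (1, "H2O")],
                  [(1, "CH2O"), (1, "O2")])
print(full, net)
```

Both forms balance in carbon, hydrogen, and oxygen, which is why cancelling one water molecule from each side is legitimate.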

RuBisCO is one of many enzymes in the Calvin-Benson cycle. During carbon fixation, ribulose-1,5-bisphosphate and carbon dioxide are this enzyme's substrates. RuBisCO also catalyzes a reaction between ribulose-1,5-bisphosphate and molecular oxygen instead of carbon dioxide. Only 3-10 carbon dioxide molecules are fixed per molecule of enzyme each second. Under most conditions, the speed of RuBisCO responds positively to an increase in the concentration of carbon dioxide when the amount of light is not limiting. Enzyme side activities can lead to useless or inhibitory byproducts (e.g., xylulose-1,5-bisphosphate). When molecular oxygen is the substrate, the products of the oxygenase reaction are phosphoglycolate and 3-phosphoglycerate. Phosphoglycolate is recycled through a sequence of reactions called photorespiration, in which two molecules of phosphoglycolate are converted into one molecule of 3-phosphoglycerate, which can reenter the Calvin-Benson cycle, with the release (loss) of carbon dioxide and some nitrogen into the environment. Photorespiration uses energy but does not produce sugars. Thus, RuBisCO's slow enzymatic activity and its inability to prevent reacting with oxygen greatly reduce the photosynthetic capacity of many land plants. Some plants, many algae, and cyanobacteria have addressed this problem by increasing the concentration of CO2 around RuBisCO through carbon concentrating mechanisms, which include C4 carbon fixation, crassulacean acid metabolism, and use of carboxysomes and the pyrenoid.
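The benefit of raising CO2 around RuBisCO can be illustrated with the competing-substrate Michaelis-Menten form that is commonly used to describe this enzyme. The sketch below is not taken from the application: the rate equations are the standard textbook form, and every parameter value (vcmax, vomax, kc, ko, and the O2 concentration) is an illustrative placeholder rather than a measured constant.

```python
def carboxylation_rate(c, o, vcmax=3.0, kc=30.0, ko=500.0):
    """Carboxylation rate with O2 acting as a competitive inhibitor.
    c and o are CO2 and O2 concentrations in the same units as kc and ko."""
    return vcmax * c / (c + kc * (1.0 + o / ko))


def oxygenation_rate(c, o, vomax=1.0, kc=30.0, ko=500.0):
    """Oxygenation rate with CO2 acting as a competitive inhibitor."""
    return vomax * o / (o + ko * (1.0 + c / kc))


O2 = 250.0  # placeholder O2 concentration
for co2 in (10.0, 100.0, 1000.0):
    vc = carboxylation_rate(co2, O2)
    vo = oxygenation_rate(co2, O2)
    print(f"CO2={co2:7.1f}  vc={vc:.2f}/s  vo={vo:.3f}/s  vc/vo={vc / vo:.1f}")
```

In this model the ratio of carboxylation to oxygenation is proportional to the CO2:O2 ratio, so concentrating CO2 both speeds carboxylation and suppresses the oxygenase side reaction.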

RuBisCO is the most abundant protein in leaves, accounting for up to 50% of soluble leaf protein in C3 plants (20-30% of total leaf nitrogen) and 30% of soluble leaf protein in C4 plants (5-9% of total leaf nitrogen). Thus, it would be difficult to increase production of this rate-limiting enzyme. Given its importance in the biosphere, increasing RuBisCO activity by genetic engineering of pyrenoid components and transfer of its carbon concentrating mechanism addresses long-felt needs in the fields of synthetic biology, food security, energy independence, adaptation to climate change, and mitigation of its long-term effects.

The inventors describe below their identification of components of the pyrenoid, essential proteins transported into the pyrenoid, and manipulation of metabolic processes with such proteins or variants thereof. Products, compositions, and processes for using and making them are described as different aspects of the invention. They provide alternatives to the transfer of C4 carbon fixation, crassulacean acid metabolism, or carboxysomes into land plants. Further advantages and improvements are described below or would be apparent from the disclosure herein.

SUMMARY

Discovery of an abundant repeat protein that binds RuBisCO represents an important advance in understanding the biogenesis of the pyrenoid. The repeat sequence of this protein and our proposed structure may provide an elegant model to explain the structural arrangement of RuBisCO enzymes in the pyrenoid.

In a first embodiment, EPYC1 or a variant protein thereof is provided.

The protein comprises a polypeptide domain having (i) a length of at least 56-63 contiguous amino acids and (ii) one or more variations from the EPYC1 amino acid sequence (SEQ ID NO: 1); wherein said variations are 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof as compared to SEQ ID NO: 1. The latter can be used to identify a similar sequence (SEQ ID NO: 2) in another green alga. The polypeptide domain may comprise at least residues 1-51, residues 52-114, residues 115-174, residues 175-235, residues 236-291, or residues 292-317 of SEQ ID NO: 1, which may or may not have been mutated from the amino acid sequence of SEQ ID NO: 1 by 1-15 amino acid substitutions, 1-7 amino acid deletions, 1-7 amino acid insertions, or a combination thereof. The polypeptide domain may or may not comprise residues 1-51, residues 52-114, residues 115-174, residues 175-235, residues 236-291, or residues 292-317 of SEQ ID NO: 1. Fusion proteins may comprise the polypeptide domains and heterologous domains.
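The residue ranges and variation limits recited above lend themselves to a small bookkeeping sketch. The code below tabulates the length of each listed region and counts substitutions, deletions, and insertions between two pre-aligned sequences. It rests on several assumptions: the alignment is supplied by the user with '-' marking gaps, variations are counted one residue at a time, and the example strings are hypothetical toy sequences rather than SEQ ID NO: 1, which is not reproduced in this section.

```python
# Residue ranges (1-based, inclusive) of SEQ ID NO: 1 as listed in the text.
REGIONS = [(1, 51), (52, 114), (115, 174), (175, 235), (236, 291), (292, 317)]

# Upper limits recited for each kind of variation.
LIMITS = {"substitutions": 15, "deletions": 7, "insertions": 7}


def region_length(start, end):
    return end - start + 1


def extract(seq, start, end):
    """Slice a 1-based inclusive residue range out of a sequence string."""
    return seq[start - 1:end]


def count_variations(ref_aln, var_aln):
    """Count variations between two pre-aligned, equal-length strings."""
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = {"substitutions": 0, "deletions": 0, "insertions": 0}
    for r, v in zip(ref_aln, var_aln):
        if r == "-" and v == "-":
            continue
        if r == "-":
            counts["insertions"] += 1   # residue present only in the variant
        elif v == "-":
            counts["deletions"] += 1    # residue missing from the variant
        elif r != v:
            counts["substitutions"] += 1
    return counts


def within_limits(counts):
    """True if there is at least one variation and no limit is exceeded."""
    return sum(counts.values()) >= 1 and all(
        counts[kind] <= LIMITS[kind] for kind in LIMITS
    )


lengths = [region_length(s, e) for s, e in REGIONS]
print(lengths, sum(lengths))

toy_counts = count_variations("MKTAY-LV", "MRTA-QLV")  # hypothetical toy pair
print(toy_counts, within_limits(toy_counts))
```

The four middle regions come out at 63, 60, 61, and 56 residues, which is consistent with the recited domain length of "at least 56-63 contiguous amino acids", and the six regions together span all 317 residues.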

In a second embodiment, EPYC1 or a variant protein may be used to bind RuBisCO in a cell, wherein the cell does not contain a native copy of EPYC1 or an ortholog thereof. The cell or an organism from which the cell is obtained may be transformed with a nucleic acid (e.g., expression vector) encoding EPYC1 or the variant protein. In one or more optional steps, the nucleic acid is manipulated to enable transformation and/or expression before transformation, then a functional characteristic is assessed (e.g., quantitative measurement or qualitative determination) for a change in RuBisCO enzymatic activity or other cellular function after transformation.

In a third embodiment, EPYC1 or a variant protein may be used to bind RuBisCO in a green alga, wherein the green alga contains an inactivated or silenced native copy of EPYC1 or an ortholog thereof. The green alga may be transformed with a nucleic acid (e.g., expression vector) encoding EPYC1 or the variant protein. In one or more optional steps, the nucleic acid is manipulated to enable transformation and/or expression before transformation, then a functional characteristic is assessed (e.g., quantitative measurement or qualitative determination) for a change in RuBisCO enzymatic activity or other cellular function after transformation.

In a fourth embodiment, EPYC1 or a variant protein may be used to bind RuBisCO in a green alga, wherein the green alga contains at least one native copy of EPYC1 or an ortholog thereof. The green alga may be transformed with a nucleic acid (e.g., expression vector) encoding EPYC1 or the variant protein. In one or more optional steps, the nucleic acid is manipulated to enable transformation and/or expression before transformation, then a functional characteristic is assessed (e.g., quantitative measurement or qualitative determination) for a change in RuBisCO enzymatic activity or other cellular function after transformation.

In a fifth embodiment, EPYC1 or a variant protein may be used to bind RuBisCO in a cell containing a pyrenoid, wherein expression of said EPYC1 or variant protein at least increases EPYC1 expression to promote RuBisCO localization to the pyrenoid or increases the size of the pyrenoid.

As used herein, a "heterologous domain" is at least ten, at least 20, at least 40, at least 80, at least 160, at least 320, at least 640, or at least 1280 contiguous amino acids that are not in the amino acid sequence of EPYC1. The heterologous domain may have an enzymatic activity, or another cellular function such as a substrate or binding partner for another protein. For example, the heterologous domain may be identical or similar in amino acid sequence to another molecular component of the carbon concentrating mechanism, or be a functional equivalent thereof. In some cases, the fusion protein is expressed in a host cell that does not express a host protein containing the heterologous domain, EPYC1, and/or the variant protein. Alternatively, the host cell expressing the fusion protein may also express a host protein containing the heterologous domain, EPYC1, and/or the variant protein. The fusion protein could comprise (i) EPYC1 or a variant protein linked to (ii) a heterologous domain that is not present in EPYC1; the heterologous domain is translated before or after the EPYC1 or variant protein (i.e., one is N- or C-terminal to the other in the fusion protein).

The host cell expressing the fusion protein may or may not have an active, mutated, inactivated, or silenced copy of a host protein containing the heterologous domain, EPYC1, and/or the variant protein. Although not critical for this definition, the heterologous domain has a finite length (e.g., not more than 10,000, not more than 5000, or not more than 2000 contiguous amino acids).

A nucleic acid or expression vector encoding the aforementioned protein may be transformed into a cell or organism. Alternatively, the cell may be obtained from the transformed organism. The cell or organism may or may not be photosynthetic; the cell or organism may or may not have a C3 metabolic pathway, C4 metabolic pathway, crassulacean acid metabolic pathway, carbon concentrating mechanism, or carboxysome. The cell may be derived from or the organism may be a cyanobacterium, red or green alga, or embryophyte. The embryophyte may be a C3 or C4 plant, such as Arabidopsis, soybean, a rice, tobacco, a wheat, maize, or a grass (e.g., sugar cane). The cell or organism may or may not produce or be a source for biomass, dietary protein, oils, and/or grains (e.g., a cereal or legume).

Expression of EPYC1 or a variant protein in a cell may at least increase the rate of production of biomass, reduce the requirement for fertilizer and/or irrigation, increase CO2 assimilation per unit of RuBisCO, decrease the rate of oxygenation of ribulose-1,5-bisphosphate catalyzed by RuBisCO, enhance RuBisCO's catalytic rate, or any combination thereof. Expression of EPYC1 or a variant protein in a cell may cause at least some of the RuBisCO to cluster such that the mean center-to-center distance between molecules of RuBisCO is decreased as compared to the absence of EPYC1 or variant protein expression in the cell. This distance has been measured with cryo-electron tomography as ~15 nm when RuBisCO is hexagonally packed within the pyrenoid of Chlamydomonas reinhardtii. Expression of EPYC1 or a variant protein in a cell containing a pyrenoid may at least increase EPYC1 expression to promote RuBisCO localization to the pyrenoid or increase the size of the pyrenoid.
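One way to put a number on RuBisCO clustering is the mean nearest-neighbor distance between particle centers. The sketch below is only an illustration: the application does not specify how the mean center-to-center distance is to be computed, the choice of the nearest-neighbor metric is an assumption, and the coordinates are hypothetical toy values in arbitrary units.

```python
import math


def mean_nearest_neighbor_distance(centers):
    """Average, over all particles, of the distance to the closest other particle."""
    if len(centers) < 2:
        raise ValueError("need at least two particle centers")
    total = 0.0
    for i, p in enumerate(centers):
        total += min(math.dist(p, q) for j, q in enumerate(centers) if j != i)
    return total / len(centers)


# Hypothetical particle centers (x, y, z) in arbitrary units.
dispersed = [(0, 0, 0), (60, 0, 0), (0, 60, 0), (60, 60, 0)]
clustered = [(0, 0, 0), (15, 0, 0), (0, 15, 0), (15, 15, 0)]

print(mean_nearest_neighbor_distance(dispersed))
print(mean_nearest_neighbor_distance(clustered))
```

A lower value for the clustered set than for the dispersed set corresponds to the decrease in center-to-center distance described above. For real tomograms, the pairwise loop would likely be replaced by a spatial index, since it scales quadratically with particle count.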

Also provided are processes for using and making these products. It should be noted, however, that a claim directed to the product is not necessarily limited to these processes unless the particular steps of the process are recited in the product claim.

Further aspects will be apparent to a person skilled in the art from the following description of specific embodiments and the claims, and generalizations thereto.

BRIEF DESCRIPTION OF DRAWINGS

Figure 1: EPYC1 is an abundant pyrenoid protein. (Fig. 1A) TEM images of Chlamydomonas reinhardtii whole cells and pyrenoid-enriched pellet fraction from cells grown at low CO2. Arrow in the panel labeled "Whole cell" indicates the pyrenoid; three arrows in the panel labeled "Pellet" indicate pyrenoid-like structures. Scale bar: 2 µm. (Fig. 1B) Mass-spectrometric analysis of 366 proteins in pyrenoid-enriched pellet fractions from low- and high-CO2 grown cells (mean of 4 biological replicates). RbcL, RBCS, EPYC1 and RCA1 (dark) are abundant in low CO2 pellets (determined by intensity-based absolute quantification (iBAQ); y-axis). Additionally, these proteins showed increased abundance in low CO2 compared to high CO2 pellets (determined by label-free quantification (LFQ); x-axis). (Fig. 1C) Confocal microscopy of EPYC1-Venus and RBCS1-mCherry co-expressed in wild-type cells. Scale bar: 5 µm.

Figure 2: EPYC1 is an essential component of the carbon concentrating mechanism. (Fig. 2A) EPYC1 protein levels in WT and epyc1 mutant cells grown at low and high CO2 were probed by Western blotting with anti-EPYC1 antibodies. Anti-tubulin is shown as a loading control. (Fig. 2B) Growth phenotypes of WT, epyc1 and three lines complemented with EPYC1. Serial 1:10 dilutions of WT, epyc1, epyc1::EPYC1, epyc1::EPYC1-mCherry and epyc1::EPYC1-Venus lines were spotted on TP minimal medium and grown at low and high CO2 under 500 µmol photons m⁻² s⁻¹ illumination. (Fig. 2C) Inorganic carbon affinity of WT (left) and epyc1 (right) cells. Cells were pre-grown at low or high CO2, and whole-cell inorganic carbon affinity was measured as the concentration of inorganic carbon at half maximal O2 evolution (data are the mean of 5 low CO2 or 3 high CO2 biological replicates; error bars: SEM; asterisk: P=0.0055, Student's t-test).
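The affinity measure used in Fig. 2C, the inorganic carbon concentration at half-maximal O2 evolution, can be estimated from a dose-response series by interpolation. The sketch below is a simplified illustration and not the procedure used for the figure: it takes the largest observed rate as the maximum, interpolates linearly between neighboring points, and uses synthetic data generated from a hypothetical saturation curve.

```python
def half_saturation(concentrations, rates):
    """Estimate the concentration at half of the maximum observed rate
    by linear interpolation between bracketing data points."""
    half = max(rates) / 2.0
    points = sorted(zip(concentrations, rates))
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 <= half <= r1 and r1 != r0:
            return c0 + (half - r0) * (c1 - c0) / (r1 - r0)
    raise ValueError("half-maximal rate is not bracketed by the data")


# Synthetic data: a saturating curve with a true half-saturation of 20
# (arbitrary concentration units) and a plateau of 100 (arbitrary rate units).
ci = [0, 5, 10, 20, 50, 100, 500, 2000]
o2_evolution = [100.0 * c / (c + 20.0) for c in ci]

k_half = half_saturation(ci, o2_evolution)
print(f"estimated half-saturation: {k_half:.1f}")
```

Because the highest measured rate slightly underestimates the true plateau, this estimate falls a little below 20. Fitting a saturation model to the whole curve would be the more robust choice for real data.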

Figure 3: EPYC1 is essential for RuBisCO aggregation in the pyrenoid. (Fig. 3A) Representative TEMs of WT and epyc1 cells grown at low CO2. (Fig. 3B) Quantification of pyrenoid area as percentage of cell area of WT (solid line) and epyc1 (dashed line) cells grown at low CO2 (data are from TEM images as represented in Fig. 3A; epyc1: n=37, WT: n=79, P < 10⁻¹⁹, Welch's t-test). (Fig. 3C) Quick-freeze deep-etch electron microscopy (QFDEEM) of the pyrenoid of WT and epyc1 cells grown at low CO2. M, pyrenoid matrix; St, stroma; Th, thylakoids; SS, starch sheath. Insets are a 400% zoom of the pyrenoid matrix. (Fig. 3D) RuBisCO protein levels in WT and epyc1 cells grown at low and high CO2 were probed by Western blotting. (Fig. 3E) The localization of RuBisCO was determined by microscopy of WT and epyc1 mutants containing RBCS1-mCherry. The sum of fluorescence signal from Z stacks is shown and was used for quantitation. (Fig. 3F) The fraction of RBCS1-mCherry signal from outside the pyrenoid region (inner dotted line, Fig. 3E) was quantified in WT and epyc1 cells at low CO2 (epyc1: n=27, WT: n=27, *** represents P < 10⁻¹⁵, Student's t-test). (Fig. 3G) Representative images of anti-RuBisCO immunogold labeling of WT and epyc1 cells grown at low CO2. Gold particles were enlarged 10x for visibility. (Fig. 3H) The fraction of immunogold particles outside the pyrenoid was quantified (WT: n=26 cells, 8123 gold particles; epyc1: n=27 cells, 2708 gold particles; *** represents P < 10⁻¹⁵, Student's t-test). Figs. 3F and 3H show mean values with error bars indicating SEM. Arrow indicates pyrenoid. Scale bars: 1 µm.
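For readers unfamiliar with the unequal-variance test named in Fig. 3B, the sketch below computes the Welch t statistic and its Welch-Satterthwaite degrees of freedom. The two samples are hypothetical toy values, not data from the figure, and the function name welch_t is an assumption. Converting t and the degrees of freedom into a P value requires the t-distribution, which is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of the mean of a
    vb = variance(b) / nb  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements for two groups of unequal size.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 2.0, 3.0, 4.0, 5.0]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student's t-test, this form does not pool the two variances, which makes it the safer choice when group sizes and spreads differ, as with the n=37 and n=79 groups in Fig. 3B.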

Figure 4: EPYC1 forms a complex with RuBisCO. (Fig. 4A) Anti-FLAG co-immunoprecipitations (co-IPs) of WT cells expressing Venus-3xFLAG, EPYC1-Venus-3xFLAG and RBCS1-Venus-3xFLAG are shown. For each co-IP, the input, flow-through (FT), 4th wash (wash), 3xFLAG elution (FLAG Elu.) and boiling elution (Boil. Elu.) were probed with anti-FLAG, anti-RuBisCO or anti-EPYC1. Right hand side labels show the expected sizes of proteins. (Fig. 4B) Analysis of the EPYC1 protein sequence shows that EPYC1 consists of four nearly identical repeats. (Fig. 4C) Each repeat has a highly disordered domain (light shading) and less disordered domain (dark shading) containing a predicted alpha-helix (thick line) rich in charged residues. (Fig. 4D) Amino acid alignments of the four repeats (SEQ ID NOS: 3-6) are shown. Asterisks indicate residues that are identical in all four repeats. N, S, T and W are polar residues. K and R are basic residues; D and E are acidic residues. A, G, L, P and V are nonpolar or hydrophobic residues. A conservative substitution replaces a residue with another in the same class. NH2- and -COOH termini (SEQ ID NOS: 7-8, respectively) are not shown. (Figs. 4E-4F) Two models illustrate how EPYC1 could bind the RuBisCO holoenzyme in a manner that is compatible with the observed packing of RuBisCO in the pyrenoid. (Fig. 4E) EPYC1 and RuBisCO could form a co-dependent network. If each EPYC1 can bind four RuBisCO holoenzymes, and each RuBisCO holoenzyme can bind eight EPYC1s, eight EPYC1 proteins could connect each RuBisCO to twelve neighboring RuBisCOs. (Fig. 4F) EPYC1 could form a scaffold onto which RuBisCO binds. Both arrangements could expand indefinitely in every direction. For clarity, the spacing between RuBisCO holoenzymes was increased and EPYC1 is depicted in light and dark shading for contrast.
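The residue classes given for Fig. 4D define what counts as a conservative substitution, and that rule can be written down directly. In the sketch below the class membership is taken from the legend above; the function names are hypothetical, and residues that the legend does not classify are treated as belonging to no class, which is an assumption.

```python
# Residue classes as listed in the legend for Fig. 4D (one-letter codes).
RESIDUE_CLASSES = {
    "polar": set("NSTW"),
    "basic": set("KR"),
    "acidic": set("DE"),
    "nonpolar": set("AGLPV"),
}


def residue_class(residue):
    """Return the class name for a residue, or None if the legend omits it."""
    for name, members in RESIDUE_CLASSES.items():
        if residue in members:
            return name
    return None


def is_conservative(original, replacement):
    """True if the replacement is a different residue from the same class."""
    cls = residue_class(original)
    return (
        cls is not None
        and original != replacement
        and cls == residue_class(replacement)
    )


for pair in [("K", "R"), ("K", "D"), ("S", "T"), ("C", "S")]:
    print(pair, is_conservative(*pair))
```

Here K to R and S to T are conservative, K to D is not, and C to S is reported as non-conservative only because cysteine is absent from the legend's classes.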

Figure 5: An illustration of Chlamydomonas reinhardtii in cross section drawn by Ninghui Shi.

Figure 6: A schematic of the carbon concentrating mechanism (CCM) in Chlamydomonas reinhardtii drawn by Moritz Meyer.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

Photosynthesis occurs in chloroplasts of algae and land plants. Since the genes for many chloroplast proteins have been moved from the chloroplast genome to the nuclear genome, many proteins that would have been translated in the chloroplast immediately after endosymbiosis are now synthesized in the cytoplasm. In land plants, some 11-14% of the DNA in the nucleus can be traced back to the chloroplast. Among land plants, the contents of the chloroplast genome are fairly similar and imply they were already present early in the evolution of land plants. Curiously, around half of the proteins encoded by the genes transferred to the nucleus are not transported back to the chloroplast, but are localized in other parts of the cells and have acquired new functions.

Chloroplast proteins encoded in the nuclear genome must be transported back to the chloroplast, and imported through at least two chloroplast membranes. In most, but not all cases, nuclear-encoded chloroplast proteins are translated with a transit peptide that is at the N-terminus of the protein precursor and cleaved after transport across the chloroplast membranes. Sometimes the transit sequence is found on the C-terminus of the protein, or within the functional part of the protein.

After a polypeptide bound for the chloroplast is synthesized on a cytosolic ribosome, an enzyme specific to chloroplast proteins phosphorylates many (but not all) of them in their transit peptides. Phosphorylation helps many proteins bind the polypeptide, keeping it from folding prematurely. This is important because it prevents chloroplast proteins from assuming their active form and carrying out their chloroplast functions in the wrong place, the cytosol. At the same time, they have to keep just enough shape so that they can be recognized by the chloroplast. Such chaperones also help import the polypeptide into the chloroplast. From the cytosol, chloroplast proteins bound for the stroma must pass through two protein complexes: the TOC (translocon on the outer chloroplast membrane) complex and the TIC (translocon on the inner chloroplast membrane) complex. Chloroplast polypeptide chains probably often travel through the two complexes at the same time, but the TIC complex can also retrieve preproteins lost in the intermembrane space.

Land plants contain chloroplasts that are generally lens-shaped, 5-8 µm in diameter and 1-3 µm thick. Greater diversity in chloroplast morphology exists among algae, which often contain a single chloroplast that can be shaped like a cup (e.g., Chlamydomonas). In some algae, the chloroplast takes up most of the cell, with pockets for the nucleus and other organelles. All chloroplasts have at least three membranes: the outer chloroplast membrane, the inner chloroplast membrane, and the thylakoid system. Chloroplasts that are the product of secondary endosymbiosis may have additional membranes surrounding these three. Between outer and inner chloroplast membranes is the intermembrane space. Within the double membrane is the chloroplast stroma, a semi-gel-like fluid that makes up much of a chloroplast's volume, and in which the thylakoid system floats. Nucleoids of chloroplast DNA, chloroplast ribosomes, the thylakoid system, and many proteins (including RuBisCO) can be found floating around in the stroma of land plants. The Calvin-Benson cycle, which fixes carbon dioxide into sugar, takes place in the stroma. Thylakoids are membranous sacs where there is chlorophyll and the light reactions of photosynthesis occur. In most vascular plant chloroplasts, the thylakoids are arranged in stacks called grana, though in certain C4 plant chloroplasts and some algal chloroplasts, the thylakoids are free floating.

Algae contain pyrenoids. They are not found in higher plants. The pyrenoid is a roughly spherical and highly refractive body, which in many algae is a site of starch accumulation. In algae with carbon concentrating mechanisms, the enzyme RuBisCO is found in the pyrenoid. The pyrenoid is associated with the carbon concentrating mechanism that improves the operating efficiency of carbon assimilation by increasing the CO2 concentration around RuBisCO, and overcomes diffusive limitations in aquatic photosynthesis with inorganic carbon transporters at the plasma membrane and chloroplast envelope and carbonic anhydrases. Thus, the carbon concentrating mechanism actively suppresses RuBisCO oxygenase activity and associated photorespiration.

RuBisCO usually consists of two protein subunits: the large chain and the small chain. The large-chain gene is encoded by the chloroplast genome in plants. There are typically several small-chain genes in the nuclear genome of plants, and the small chains are imported to the stromal compartment of chloroplasts from the cytosol by crossing the outer chloroplast membrane. Binding sites for RuBisCO's substrate, ribulose-1,5-bisphosphate, are located in dimerized large chains where amino acid residues from each large chain contribute to the binding sites. A total of eight large chains (or four dimers) and eight small chains assemble into a larger complex. The pyrenoid requires a specific amino acid sequence at two surface-exposed α-helices of the RuBisCO small subunit. Higher plant-like helices knock out the pyrenoid, whereas native algal helices establish a pyrenoid. In general, Chlamydomonas proteins expressed in a land plant (e.g., transient expression in Nicotiana tabacum leaves and stable expression in Arabidopsis thaliana) will localize to the same compartments. See Atkinson et al., Plant Biotechnol. J. 14(5): 1302-1315 (2016).

Chlamydomonas proteins that are putative components of the carbon concentrating system can be expressed transiently or stably in bacteria (e.g., cyanobacterium), algae (e.g., Chlorophyta such as Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus tauri, and Ulva lactuca), and land plants (e.g., C4 or C3 plants such as rice, wheat, and soybean). Binary expression vectors carrying the genes that encode each protein component of the carbon concentrating system of algae can be generated by PCR amplification of cDNA or genomic DNA, and subsequent Gateway cloning. Gene expression may be under the control of a constitutive or inducible promoter, splicing signals, 5'- and 3'-untranslated regions and a terminator that are derived from the cauliflower mosaic virus 35S gene, nopaline synthase (NOS), T-DNA, other viruses, and microbes. Stop codons are removed to allow in-frame fusion at the N- or C-terminus with a transit peptide for transport into chloroplasts or other subcellular compartments, an affinity tag (e.g., calmodulin binding protein, FLAG, glutathione-S-transferase, polyhistidine, hemagglutinin, maltose binding protein, c-Myc, streptavidin, SUMO, and thioredoxin) for detection or purification, a fluorescent protein to localize the fusion within the chloroplast or pyrenoid, specific protease cleavage sites, and other functional protein domains. An expression vector can be introduced into a cell by Agrobacterium-mediated gene transfer, agroinfiltration, chemical transfection, electroporation, lipofection, microinjection, or particle gun. See, for example, Barampuram & Zhang, Plant Chromosome Engineering 701: 1-35 (2010).
Such genetic engineering reagents and techniques facilitate identification of components of the carbon concentrating mechanism, transfer of one or more of those components into a host cell, modification of the RuBisCO small subunit in a chloroplast-containing cell, inactivation or silencing of carbonic anhydrases in the chloroplast stroma and/or thylakoid lumen, addition or subtraction of certain host proteins, mutation of host gene functions, and improvements of plant functional traits by genetic modification and transfer.

Fusion of a heterologous protein domain with EPYC1 (or its transit peptide) can target the heterologous domain to the chloroplast. Fusion between a heterologous protein domain and the RuBisCO small subunit (or its transit peptide) could also be used to target the chloroplast. Localization to the pyrenoid through association with RuBisCO may be obtained by fusing one or more of EPYC1's four repeats with a heterologous protein domain.

The genetics of Chlamydomonas reinhardtii (see Mussgnug, Appl. Microbiol. Biotechnol. 99(13): 5407-5418, 2015) and other algae (see Hallman, Transgenic Plant J. 1(1): 81-98, 2007) are highly developed. Selectable markers, promoters, reporters, transformation techniques, and other genetic tools and methods are already available for various species, and currently ~25 species are accessible to genetic transformation. Large-scale sequencing projects are ongoing or completed for several algal species: e.g., Chlamydomonas reinhardtii, Volvox carteri, and Ostreococcus tauri. A driving force in algal transgenics is the prospect of using genetically modified algae as bioreactors. Having identified and isolated algal genes involved in the carbon concentrating mechanism, one or more of the components can be transferred (with or without genetic modification) into algae and other photosynthesizing organisms: Chlorophyta (green algae) such as Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus tauri, and Ulva lactuca; Rhodophyta (red algae) like Palmaria palmata and Porphyra umbilicalis; Cyanophyta (cyanobacteria) such as Synechocystis sp., Pelagibacter ubique, and Prochlorococcus marinus; Glaucophyta (glaucophyte algae); cereals, especially maize (Zea mays), rices (e.g., Oryza sativa and Oryza glaberrima), and wheats (e.g., Triticum aestivum and Triticum durum); legumes, especially soybean (Glycine max), peas (e.g., Pisum sativum), beans (e.g., Phaseolus vulgaris), and lentil (Lens culinaris); oil-producing plants such as palm (e.g., Elaeis guineensis and Elaeis oleifera), coconut (Cocos nucifera), rapeseed (Brassica napus), and canola (Brassica rapa); C4 plants (e.g., sugar cane, maize, and sorghum); and C3 plants (e.g., rices, wheats, and soybean).

Molecular components of the carbon concentrating mechanism, such as EPYC1, may be identified and isolated in green algae generally (e.g., C. globosa and V. globator), but not in cyanobacteria, red algae, and embryophytes. Other components of the algal carbon concentrating mechanism may also need to be transferred along with EPYC1 such as, for example, the large subunit of RuBisCO, small subunits of RuBisCO, inorganic carbon transporters, carbonic anhydrases, etc. The transfer of the carbon concentrating machinery into a cell from photosynthesizing organisms, such as those above (especially Oryza sativa, Oryza glaberrima, Triticum aestivum, Triticum durum, and Glycine max), may be confirmed by measuring photosynthetic efficiency of the genetically modified organism with growth in an artificially low concentration of carbon dioxide. Increased production of biomass, and more efficient use of nitrogen and/or water (i.e., a reduced need to fertilize and irrigate crops) are some improvements that might be expected even under atmospheric concentration of CO2.

EXAMPLES

Biological carbon-fixation is a key step in the global carbon cycle that regulates the atmosphere's composition while producing the food we eat and the fuels we burn. Approximately one-third of global carbon-fixation occurs in an overlooked algal organelle called the pyrenoid. The pyrenoid contains the CO2-fixing enzyme RuBisCO, and enhances carbon-fixation by supplying RuBisCO with a high concentration of CO2. Since the discovery of the pyrenoid over 130 years ago, the molecular structure and biogenesis of this ecologically fundamental organelle have remained enigmatic. Here, we use the model green alga Chlamydomonas reinhardtii to discover that a low complexity repeat protein, Essential Pyrenoid Component 1 (EPYC1), links RuBisCO to form the pyrenoid. We find that EPYC1 is of comparable abundance to RuBisCO and colocalizes with RuBisCO throughout the pyrenoid. We show that EPYC1 is essential for normal pyrenoid size, number, morphology, RuBisCO content and efficient carbon-fixation at low CO2. We explain the central role of EPYC1 in pyrenoid biogenesis by finding that EPYC1 binds RuBisCO to form the pyrenoid matrix. We propose two models where EPYC1's four repeats could produce the observed lattice arrangement of RuBisCO in the Chlamydomonas pyrenoid. Our results suggest a surprisingly simple molecular mechanism for how RuBisCO can be packaged to form the pyrenoid matrix, potentially explaining how RuBisCO packaging into a pyrenoid could have evolved across a broad range of photosynthetic eukaryotes through convergent evolution. Additionally, our findings represent a key step towards engineering the pyrenoid into crops to enhance their carbon-fixation efficiency.

Many eukaryotic green algae possess carbon concentrating mechanisms (CCMs) that enhance photosynthetic efficiency and thus permit high growth rates at low CO2 concentrations. CCMs are therefore an attractive option for improving productivity in higher plants.

RuBisCO, the most abundant enzyme in the biosphere, fixes CO2 into organic carbon that supports nearly all life on Earth. Over the past three billion years, the enzyme became a victim of its own success, as it drew down the atmospheric CO2 concentration to trace levels and as the oxygen-producing reactions of photosynthesis filled our atmosphere with O2. In today's atmosphere, O2 competes with CO2 at RuBisCO's catalytic active site. This competition results in the production of the toxic compound phosphoglycolate, which must be metabolized at the expense of energy and the loss of fixed carbon and nitrogen. Carboxylation of ribulose 1,5-bisphosphate (RuBP) initiates the rate-limiting step of photosynthetic carbon fixation. However, O2 and CO2 are mutually competitive, and oxygenation of RuBP is a nonessential side reaction that ultimately leads to the loss of CO2 in the photorespiratory pathway. Thus, net carbon fixation is determined by the difference between the rates of carboxylation and oxygenation, which are ultimately determined by the Vmax values for carboxylation (Vc) and oxygenation (Vo), the Km values for CO2 (Kc) and O2 (Ko), and the concentrations of CO2 and O2 at RuBisCO's catalytic active site. Another kinetic constant referred to as the CO2/O2 specificity factor (Ω) is equal to the catalytic efficiency of carboxylation (Vc/Kc) relative to the catalytic efficiency of oxygenation (Vo/Ko). To overcome RuBisCO's limitations, many photosynthetic organisms have evolved carbon concentrating mechanisms (CCMs). CCMs increase the CO2 concentration around RuBisCO, decreasing O2 competition and enhancing carbon fixation.
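The kinetic relationships described above can be illustrated with a short numerical sketch. It assumes simple Michaelis-Menten kinetics with mutual competitive inhibition between CO2 and O2, under which the ratio of carboxylation to oxygenation rates reduces to Ω multiplied by the CO2/O2 concentration ratio. The function names and all parameter values are hypothetical and chosen only for illustration; they are not measurements from this disclosure.

```python
def specificity_factor(vc_max, kc, vo_max, ko):
    """CO2/O2 specificity factor: Omega = (Vc/Kc) / (Vo/Ko)."""
    return (vc_max / kc) / (vo_max / ko)


def carboxylation_rate(vc_max, kc, ko, co2, o2):
    """Carboxylation rate with O2 acting as a competitive inhibitor."""
    return vc_max * co2 / (co2 + kc * (1.0 + o2 / ko))


def oxygenation_rate(vo_max, kc, ko, co2, o2):
    """Oxygenation rate with CO2 acting as a competitive inhibitor."""
    return vo_max * o2 / (o2 + ko * (1.0 + co2 / kc))


def rate_ratio(vc_max, kc, vo_max, ko, co2, o2):
    """Ratio of carboxylation to oxygenation at given gas concentrations."""
    vc = carboxylation_rate(vc_max, kc, ko, co2, o2)
    vo = oxygenation_rate(vo_max, kc, ko, co2, o2)
    return vc / vo


# Hypothetical kinetic constants (arbitrary but consistent units).
VC_MAX, KC, VO_MAX, KO = 3.0, 30.0, 1.0, 500.0
O2 = 250.0

omega = specificity_factor(VC_MAX, KC, VO_MAX, KO)
for co2 in (10.0, 100.0):  # e.g. without and with a tenfold CO2 enrichment
    ratio = rate_ratio(VC_MAX, KC, VO_MAX, KO, co2, O2)
    print(f"CO2={co2:6.1f}  vc/vo={ratio:.2f}  Omega*CO2/O2={omega * co2 / O2:.2f}")
```

Under these assumptions, raising the CO2 concentration around the enzyme tenfold raises the carboxylation-to-oxygenation ratio tenfold, which is the effect a CCM exploits.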

At the heart of the CCM of eukaryotic algae is an organelle called the pyrenoid. The pyrenoid is a spherical structure in the chloroplast stroma, discovered over 130 years ago. Pyrenoids have been found in nearly all of the major oceanic eukaryotic primary producers, and mediate approximately 28-44% of global carbon fixation. The pyrenoid typically consists of a matrix surrounded by a starch sheath and traversed by membrane tubules continuous with the photosynthetic thylakoid membranes. The matrix is thought to primarily consist of tightly packed RuBisCO and its chaperone, RuBisCO activase. In higher plants and non-pyrenoid-containing photosynthetic eukaryotes, RuBisCO is instead found in soluble form throughout the chloroplast stroma. The molecular mechanism by which RuBisCO aggregates to form the pyrenoid matrix was previously unknown.

Two mechanisms for RuBisCO accumulation in the pyrenoid have been proposed: (a) RuBisCO holoenzymes could bind each other directly through hydrophobic residues or (b) a linker protein may link RuBisCO holoenzymes together. The second model is based on analogy to the well-characterized prokaryotic carbon concentrating organelle, the β-carboxysome, where RuBisCO aggregation is mediated by a linker protein consisting of repeats of a domain resembling the RuBisCO small subunit. Here, we find that RuBisCO accumulation in the pyrenoid of the model alga Chlamydomonas reinhardtii is mediated by a disordered repeat protein EPYC1. Without being bound to any theory, our findings may suggest a mechanism for aggregation of RuBisCO in the pyrenoid matrix, and highlight similarities and differences between the mechanisms of assembly of the eukaryotic and prokaryotic organelles.

EPYC1 is an abundant pyrenoid component. We hypothesized that the pyrenoid contains unidentified components that are important for its biogenesis. We therefore used mass spectrometry to analyze the protein composition of the pyrenoid of the model green alga Chlamydomonas reinhardtii, before and after applying a stimulus that induces pyrenoid growth. When cells are transferred from high CO2 (2-5% CO2 in air) to low CO2 (air; ~0.04% CO2), the CCM is induced and the pyrenoid increases in size. We developed a protocol for isolating largely intact pyrenoids by cell lysis and centrifugation, and applied this protocol to cells before and after a shift from high to low CO2 (Fig. 1A). Mass spectrometry indicated that the most abundant proteins in the low CO2 pyrenoid fraction included the RuBisCO large (rbcL) and small (RBCS) subunits, as well as RuBisCO activase (RCA1).

Strikingly, a fourth protein, previously identified as a low CO2 induced, nuclear encoded protein (LCI5, Cre10.g436550), was found in the low CO2 pyrenoid fraction with comparable abundance to RuBisCO (Fig. 1B). Based on the data presented herein, we propose to name this protein Essential Pyrenoid Component 1 (EPYC1). Under low CO2, the stoichiometry of EPYC1 was ~1:6 with rbcL and ~1:1 with RBCS (estimated by intensity-based absolute quantification or iBAQ). Consistent with EPYC1 being a component of the pyrenoid, the abundance of EPYC1 in the pyrenoid fraction increased ~12-fold after the shift from high to low CO2 (Fig. 1B), a similar increase to that of rbcL (7-fold), RBCS (7-fold), and RCA1 (19-fold). To confirm the pyrenoid localization of EPYC1, we co-expressed fluorescently tagged EPYC1 and RBCS. Venus-tagged EPYC1 showed clear co-localization with mCherry-tagged RBCS in the pyrenoid (Fig. 1C).
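As a rough illustration of how an iBAQ-based stoichiometry and a fold change of this kind can be derived, the following sketch divides a protein's summed peptide intensity by its number of theoretically observable peptides and then compares proteins. In the work described here these values were calculated by MaxQuant; the helper names and every number below are hypothetical placeholders, not data from this disclosure.

```python
def ibaq(summed_intensity, n_theoretical_peptides):
    """iBAQ value: summed peptide intensity / theoretically observable peptides."""
    if n_theoretical_peptides <= 0:
        raise ValueError("number of theoretical peptides must be positive")
    return summed_intensity / n_theoretical_peptides


def molar_ratio(ibaq_a, ibaq_b):
    """Approximate molar ratio of protein A to protein B within one sample."""
    return ibaq_a / ibaq_b


def fold_change(value_after, value_before):
    """Fold change of a protein's abundance between two conditions."""
    return value_after / value_before


# Hypothetical intensities and peptide counts, for illustration only.
protein_a = ibaq(summed_intensity=2.0e9, n_theoretical_peptides=10)
protein_b = ibaq(summed_intensity=3.0e10, n_theoretical_peptides=25)

print(f"A:B molar ratio ~ 1:{1 / molar_ratio(protein_a, protein_b):.0f}")
print(f"fold change = {fold_change(1.2e9, 1.0e8):.0f}x")
```

Normalizing by the number of theoretical peptides is what allows proteins of different lengths to be compared within one sample; comparisons between samples would instead rely on the label-free quantification values.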

EPYC1 is essential for a functional CCM. The high abundance of EPYC1 in the pyrenoid led us to ask whether EPYC1 is required for the CCM. We isolated a mutant in the 5' UTR of the EPYC1 gene, which contains markedly reduced levels of EPYC1 mRNA and EPYC1 protein (Fig. 2A), and lacks transcriptional regulation in response to CO2. Similarly to previously described mutants in other components of the CCM, the epyc1 mutant showed defective photoautotrophic growth in low CO2, which was rescued by high CO2 and by reintroducing the EPYC1 gene (Fig. 2B). We further tested the CCM activity in epyc1 mutants by measuring whole-cell affinity for inorganic carbon, inferred from photosynthetic O2 evolution. When grown under low CO2, the epyc1 mutant showed a reduced affinity for inorganic carbon (increased K0.5) relative to wild-type (WT; P=0.0055, n=5, Student's t-test; Fig. 2C). The affinity of the epyc1 mutant under low CO2 was slightly greater than that of WT at high CO2, indicating a residual level of CCM activity. This activity may be due to trace levels of EPYC1 in the epyc1 mutant, or normal concentration of CO2 followed by inefficient capture by RuBisCO.

EPYC1 is required for normal pyrenoid size, number, and matrix density. Knowing that EPYC1 is in the pyrenoid and is required for the CCM, we explored whether the epyc1 mutant shows any visible defects in pyrenoid structure. Thin-section transmission electron micrographs (TEM) revealed that epyc1 mutants had smaller pyrenoids than WT at both low and high CO2 (low CO2: n=37-79, P < 10^-19, Welch's t-test; high CO2: n=18-22, P < 10^-5, Welch's t-test; Figs. 3A-3B). Chlamydomonas typically has one pyrenoid per cell. The epyc1 mutant showed a higher frequency of multiple pyrenoids: 13% of non-dividing epyc1 cells (n=231) showed multiple pyrenoids, compared with 3% of WT cells (n=252, P=0.00048, Fisher's exact test of independence).
Higher resolution quick-freeze deep-etch electron microscopy indicated a lower packing density of granular material in the pyrenoid matrix of the epyc1 mutant (Fig. 3C). This defect was most noticeable when cells were grown in low CO2, but was also visible at high CO2. Interestingly, the epyc1 mutant retains a number of canonical pyrenoid characteristics, including correct localization in the chloroplast, the presence of a starch sheath under low CO2, and traversing membrane tubules, suggesting that these characteristics are regulated by mechanisms other than EPYC1. Additionally, the epyc1 mutant shows normal levels of the carbonic anhydrase CAH3, thought to be central to delivering CO2 to RuBisCO in the pyrenoid.
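The comparison of multiple-pyrenoid frequencies between mutant and WT cells used Fisher's exact test. A minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table is given below. The cell counts are an assumption: they are reconstructed by rounding the reported percentages (13% of 231 and 3% of 252), so the result is illustrative and need not reproduce the reported P value. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x "successes" in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Assumed counts reconstructed from rounded percentages (not reported directly).
mutant_multiple, mutant_total = 30, 231  # ~13% of non-dividing mutant cells
wt_multiple, wt_total = 8, 252           # ~3% of WT cells

p_value = fisher_exact_two_sided(
    mutant_multiple, mutant_total - mutant_multiple,
    wt_multiple, wt_total - wt_multiple,
)
print(f"two-sided P = {p_value:.2e}")
```

An exact test is a reasonable choice here because one of the four cells (WT cells with multiple pyrenoids) has a small count, where large-sample approximations are less reliable.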

EPYC1 is required for RuBisCO assembly into the pyrenoid. Our observations of decreased pyrenoid size and apparent matrix density in epyc1 mutants could be explained by decreased whole-cell levels of RuBisCO. However, Western blotting revealed no detectable difference in rbcL and RBCS abundance in epyc1 relative to WT cells or between cells grown at low and high CO2 levels (Fig. 3D). This result led us to hypothesize that the localization of RuBisCO was perturbed in epyc1 mutants. To test this hypothesis, we generated WT and epyc1 cell lines expressing RuBisCO tagged with mCherry, and determined the distribution of fluorescence signal by microscopy. Remarkably, a large fraction of RuBisCO was found outside the pyrenoid in the epyc1 mutant. In epyc1 cells grown in low CO2, 68% of fluorescence from RuBisCO tagged with mCherry was found outside the pyrenoid region, compared with 21% in WT cells (n=27, P < 10^-15, Student's t-test; Figs. 3E-3F). Immunogold-EM confirmed the mislocalization of RuBisCO in epyc1. In pyrenoid-containing sections of low-CO2-grown epyc1 cells, 42% of anti-RuBisCO immunogold particles were found outside the pyrenoid, whereas only 6% were found outside the pyrenoid in WT (WT: n=26 cells, 8123 gold particles; epyc1: n=27 cells, 2708 gold particles; P < 10^-15, Student's t-test; Figs. 3G-3H).

If EPYC1 functions in the recruitment of RuBisCO to the pyrenoid solely at low CO2, the epyc1 mutant could be trapped in a "high-CO2" state of RuBisCO localization. However, the epyc1 mutant showed a defect in RuBisCO localization even under high CO2, where the fraction of RuBisCO-mCherry fluorescence outside the pyrenoid region increased to 80% in epyc1, compared with 68% in the WT (WT: n=20, epyc1: n=20, P=10^-6, Student's t-test). We conclude that EPYC1 is required for RuBisCO localization to the pyrenoid not only at low CO2, but also at high CO2.

EPYC1 and RuBisCO are part of the same complex. EPYC1 could promote RuBisCO's localization to the pyrenoid by a physical interaction. We therefore immunoprecipitated EPYC1 and RuBisCO, and probed the eluates by Western blotting (Fig. 4A). Immunoprecipitation of tagged EPYC1 pulled down the RuBisCO holoenzyme; and reciprocally, tagged RBCS1 pulled down EPYC1. We conclude that EPYC1 and RuBisCO are part of the same supramolecular complex in the pyrenoid. The high abundance of EPYC1 in the pyrenoid, EPYC1's physical interaction with RuBisCO, and the dependence of RuBisCO on EPYC1 for localization to the pyrenoid, all suggest that EPYC1 plays a structural role in pyrenoid biogenesis.

EPYC1 protein consists of four nearly identical repeats. To gain insights into how EPYC1 might contribute to pyrenoid biogenesis, we performed a detailed analysis of the EPYC1 protein sequence. This analysis indicated that EPYC1 consists of four nearly identical ~60 amino acid repeats (Figs. 4B-4D), flanked by short N- and C-termini (in contrast to a suggestion in Turkina et al., Proteomics 6(9): 2693-2704 (2006) of only three repeats). We found that each repeat consists of a predicted disordered domain and a shorter, less disordered domain containing a predicted alpha helix (Fig. 4C). Given that these repeats cover >80% of the EPYC1 protein, it is likely that the RuBisCO binding site(s) are contained within the repeats.

Two models are proposed for RuBisCO assembly into the pyrenoid matrix by EPYC1. If each repeat of EPYC1 binds RuBisCO, EPYC1 could link multiple RuBisCO holoenzymes together to form the pyrenoid matrix. Multiple RuBisCO binding sites on EPYC1 could arrange RuBisCO into the hexagonal close packed or cubic close packed arrangement observed in recent cryoelectron tomography studies of the Chlamydomonas pyrenoid. EPYC1 and RuBisCO could interact in one of two fundamental ways: (a) they could form a co-dependent network (Fig. 4E) or (b) EPYC1 could form a scaffold onto which RuBisCO binds (Fig. 4F). Importantly, the 60 amino acid repeat length of EPYC1 is sufficient to span the observed 2-4.5 nm gap between RuBisCO holoenzymes in the pyrenoid, and a stretched out repeat could potentially span the 15 nm observed RuBisCO center-to-center distance. A promising candidate for an EPYC1 binding site on RuBisCO would be the two alpha-helices of the small RuBisCO subunit (cf. Meyer et al., Proc. Natl. Acad. Sci. USA 109(47): 19474-19479, 2012). When these helices are exchanged for higher-plant alpha-helices, pyrenoids fail to form and the CCM does not function, but holoenzyme assembly is normal.

Proteins with similar physicochemical properties to EPYC1 are present in a diverse range of eukaryotic algae. The primary sequences of disordered proteins like EPYC1 are known to evolve rapidly compared to structured proteins, but their physicochemical properties are under selective pressure and are evolutionarily maintained. We therefore searched for proteins with similar physicochemical properties (repeat number, length, high isoelectric point, disorder profile, and absence of transmembrane domains) across a broad range of algae.
Orthologs of Chlamydomonas reinhardtii EPYC1 having consensus repeat sequence NH2-VTPSRSALPSNWKQELESLRSSSPAPASSAPAPARSSSASWRDAAPASSAPARSSSASKKA-COOH, SEQ ID NO: 9, were found in the following species (followed by their phylum, Uniprot or phytozome protein ID, the consensus repeat sequence, and sequence identifier): Thalassiosira pseudonana (Heterokontophyta) B8CF53_THAPS (LSSKPSSAPFVRSEKPSSAPSDSPSASVAPTLETSFSPSSSGQPSPMTSESPS, SEQ ID NO: 10); Phaeodactylum tricornutum (Heterokontophyta) B7GDW7_PHATC (TGPSMTGPSDSDDRRLRSPSSTGPSLTGPSMTGPSATGPSMTGPSM, SEQ ID NO: 11); Emiliania huxleyi (Haptophyta) R1G412_EMIHU (PYLPISPARLARGSTSPHLSPSLPISPHISRTARSRFHIAPSLPISPHISPTAPHGFHEAPHLPISPHLS, SEQ ID NO: 12) and R1D601_EMIHU (WTAADDALVKAGQEAGESWVDIAKRLPGRSADSVKSRSNRLKRQPDTSVKHEPVKRELVR, SEQ ID NO: 13); and Ostreococcus tauri (Chlorophyta) A0A096PAN3_OSTTA (MAASKLGSKNASTRPTVGSTLDASALTPPSLRFTTENNIHSVPTAFGVADRPASRRVLRREDA, SEQ ID NO: 14) and A0A090M8K8_OSTTA (MAASKLGSKNASTRPTVGSTLDASALTPPSLRFTTENNIHSVPTAFGVADRPASRRVLRREDA, SEQ ID NO: 15). Excitingly, proteins with similar properties were found in most pyrenoid-containing algae (e.g., Micromonas pusilla and Chlorella variabilis were exceptions), and appear to be absent in pyrenoid-less algae (e.g., Chlorella protothecoides, Cyanidioschyzon merolae, Galdieria sulphuraria, and Nannochloropsis gaditana), suggesting that EPYC1-like proteins may play similar roles in pyrenoids across eukaryotic algae.

Discussion

Our data provide strong support to the concept that RuBisCO clustering into the pyrenoid is required for an efficient CCM in eukaryotic algae. Current models of the CCM suggest that CO2 is released at a high concentration from the thylakoid tubules traversing the pyrenoid matrix. The mislocalization of RuBisCO to the stroma of the epyc1 mutant could decrease the efficiency of capture of CO2 by RuBisCO, explaining the severe CCM defect observed in this mutant.

The observations presented here suggest that RuBisCO packaging to form the matrix of the eukaryotic pyrenoid is achieved by a different mechanism from that used in the well-characterized prokaryotic β-carboxysome. In the β-carboxysome, aggregation of RuBisCO is mediated by the protein CcmM. CcmM contains multiple repeats of a domain resembling the RuBisCO small subunit, and incorporation of these domains into separate RuBisCO holoenzymes is thought to produce a link between RuBisCO holoenzymes. Given that the EPYC1 repeats show no homology to RuBisCO and are highly disordered, it is likely that they bind to the surface of RuBisCO holoenzymes rather than becoming incorporated in the place of small subunits. The simplicity of such a surface binding mechanism potentially explains how RuBisCO packaging into a pyrenoid could have evolved across a broad range of photosynthetic eukaryotes through convergent evolution, leading to the dominant role of pyrenoids in aquatic CO2 fixation. Such a surface binding mechanism may even organize RuBisCO in prokaryotic α-carboxysomes, where the intrinsically disordered RuBisCO-binding repeat protein CsoS2 plays a poorly understood role in assembly.

In addition to being a key structural component, EPYC1 could regulate RuBisCO partitioning to the pyrenoid or RuBisCO kinetic properties. The RuBisCO content of the pyrenoid changes in response to CO2 while total cellular RuBisCO stays constant (Fig. 3D). Given that EPYC1 is required for RuBisCO localization to the pyrenoid, changes in EPYC1 abundance and/or RuBisCO binding affinity could affect RuBisCO partitioning to the pyrenoid. Consistent with this hypothesis, EPYC1 was previously found to be upregulated at both the transcript and protein levels in response to light and low CO2, and our data further support this finding (Fig. 2A). Moreover, previous studies have shown that EPYC1 becomes phosphorylated at multiple sites in response to low CO2, potentially affecting its binding affinity for RuBisCO. Another mode of regulation of EPYC1-RuBisCO binding could be by methylation of RuBisCO: RuBisCO is methylated at multiple residues, and in Chlamydomonas the predicted methyltransferase CIA6 is required for RuBisCO localization to the pyrenoid. It is also possible that EPYC1 binding to RuBisCO alters the kinetic properties of RuBisCO to fine-tune its performance in the pyrenoid.

Further to advancing our understanding of the molecular mechanisms underlying global carbon-fixation, our findings may help to enable the future engineering of crops with enhanced photosynthesis. There is great interest in introducing a CCM into C3 plants, as this enhancement is predicted to increase yields by up to 60% and improve nitrogen and water use efficiency (Long et al., Cell 161(1) : 56-66, 2015). While much remains to be done in understanding the algal CCM, recent work suggests that algal components may be relatively easy to engineer into higher plants (Atkinson et al., Plant Biotechnol. J. 10.1111/pbi.12497, 2015). Our discovery of a possible mechanism for RuBisCO assembly to form the pyrenoid is a key step towards engineering an algal CCM into crops.

Strains and culture conditions. Wild-type (WT) Chlamydomonas reinhardtii CC-1690 was maintained at 22°C with 55 µmol photons m^-2 s^-1 light on Tris-acetate-phosphate (TAP) agar (1.4%) plates containing 0.4% Bacto-Yeast extract. It was used for pyrenoid enrichment and proteomics. Chlamydomonas reinhardtii WT strain CMJ030 (CC-4533) and epyc1 mutant were maintained in the dark or low light (~10 µmol photons m^-2 s^-1) on 1.5% agar plates containing TAP with revised or traditional Hutner's trace elements.

The epyc1 mutant was isolated from a collection of high CO2 requiring mutants by a pooled screening approach. A collection of approximately 7500 mutants on 79 plates, each with 96 colonies, was grown in liquid TAP in 96-well plates then pooled by well row, well column, whole plate row and whole plate column to give a total of 38 pools. Pooled cells were pelleted, and DNA was extracted by phenol:chloroform:isoamyl alcohol (Phenol:CIA, 25:24:1; Sigma-Aldrich) then screened by PCR for an epyc1 mutant using a primer in the pMJ016c mutagenesis cassette (a modified pMJ013c cassette) and a primer in the EPYC1 gene. The identified epyc1 mutant has an insertion of the pMJ016c resistance cassette in the 5'-UTR; the resistance cassette is 11 bp upstream of the ATG start codon, with the cassette having a 10 bp deletion at the 3' end. The upstream gDNA-cassette junction cannot be PCR amplified. However, PCR shows the full cassette is intact and that >397 bp upstream of the insertion site is also intact. All experiments were performed under photoautotrophic conditions supplemented with high CO2 (3% or 5% v/v CO2 enriched air) or low CO2 (air, ~0.04% v/v CO2).
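The pooling scheme above can be sketched as a small combinatorial decoder. The sketch assumes that the 79 plates were laid out on a 9 x 9 grid; this layout is not stated in the text and is inferred only because 8 well rows + 12 well columns + 9 plate rows + 9 plate columns gives the stated 38 pools. All names are hypothetical, and the decoder assumes a single mutant of interest in the collection.

```python
WELL_ROWS, WELL_COLS = 8, 12   # standard 96-well plate
GRID_ROWS, GRID_COLS = 9, 9    # ASSUMED plate layout (81 slots for 79 plates)
N_PLATES = 79

TOTAL_POOLS = WELL_ROWS + WELL_COLS + GRID_ROWS + GRID_COLS


def pools_for(plate, well_row, well_col):
    """Return the four pools that contain the colony at this position."""
    grid_row, grid_col = divmod(plate, GRID_COLS)
    return {
        ("well_row", well_row),
        ("well_col", well_col),
        ("plate_row", grid_row),
        ("plate_col", grid_col),
    }


def decode(positive_pools):
    """Recover (plate, well_row, well_col) from four PCR-positive pools."""
    hits = dict(positive_pools)
    if len(positive_pools) != 4 or len(hits) != 4:
        raise ValueError("expected exactly one positive pool per dimension")
    plate = hits["plate_row"] * GRID_COLS + hits["plate_col"]
    if plate >= N_PLATES:
        raise ValueError("decoded plate position is an empty grid slot")
    return plate, hits["well_row"], hits["well_col"]


# Example: a colony on plate index 42, well row 3, well column 7.
positives = pools_for(42, 3, 7)
print(TOTAL_POOLS, decode(positives))
```

With this design, 38 PCR reactions are enough to locate one colony among 79 x 96 = 7584 positions, rather than screening each colony individually.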

For proteomics analysis, a 50 mL pre-culture was grown mixotrophically in TAP on a rotatory shaker at 124 rpm and 22°C, under an illumination of 55 µmol photons m^-2 s^-1 for three days. In brief, a second pre-culture of 500 mL was used to inoculate a 5-liter bioreactor (BIOSTAT®B-DCU, Sartorius Stedim). The absence of contamination was monitored. Cultures with a cell density of 3-5 x 10^6 cells mL^-1 were grown photoautotrophically at 46 µmol photons m^-2 s^-1 light in air enriched with high CO2 (5% CO2) under constant turbidity for two days before the culture was aerated with low CO2 (ambient air; 0.039% CO2). The CO2 level in the outlet air of the bioreactor was measured by an on-line multi-valve gas chromatograph (3000A MicroGC run by EZChromElute software, Agilent Technologies). After switching from high to low CO2, the CO2 dropped from 4.5% to a constant 0.02% after 12 hours. Cells were harvested at 30 hours after the shift to low CO2.

For mRNA levels, O2 evolution, RuBisCO content Western blotting, pyrenoid size analysis by transmission electron microscopy (TEM) and RuBisCO subcellular localization by immunogold labelling, strains were grown photoautotrophically in 50 mL Tris-minimal medium under constant aeration, shaking and illumination (150 rpm, 21°C, 50-65 µmol photons m^-2 s^-1; Infors HT Multitron Pro). Briefly, starter cultures were inoculated from freshly replated cultures on TAP plates, to 0.3 µg chlorophyll (a+b) mL^-1, and aerated with high CO2 (5% v/v CO2 enriched air). When cell density reached mid-log (~3 µg chlorophyll (a+b) mL^-1), half of the cultures were harvested and analyzed. The remaining half of the cultures were then switched to aeration with low CO2 (air, 0.04% v/v CO2) for induction of the CCM. For gene expression analysis and affinity for inorganic carbon, cells were air-adapted for 3 hours, corresponding to peak induction of CO2-inducible genes. The state of CCM induction was checked by measuring the mRNA accumulation of a highly CO2-responsive gene LCI1 (Cre03.g162800). For TEM analysis of pyrenoid size and immunogold labelling of RuBisCO, cells were adapted to low CO2 for 24 hours.

For EPYC1 protein abundance determination and freeze fracture cryoelectron microscopy of WT and mutant cells, cultures were propagated continuously in Tris-phosphate (TP) medium with 50 µmol photons m^-2 s^-1 light for ~1 week in a Multi-Cultivator (Photon Systems Instruments) with bubbling of high CO2 (3% v/v CO2). Cells were diluted every 24 hours to ensure they were kept in the log phase. Six hours before sampling, half of the cultures were switched from high CO2 to low CO2 (air, ~0.04% v/v CO2). The chlorophyll concentration at harvesting was ~3 µg chlorophyll (a+b) mL^-1.

For fluorescence microscopy and RbcS1-mCherry localization experiments, cells were grown in TP medium containing antibiotics used for selection of expression of the fluorescently labeled gene (Venus, paromomycin at 2 µg mL^-1; mCherry, hygromycin 6.25 µg mL^-1), bubbled with high CO2 (3% v/v CO2) at a 150 µmol photons m^-2 s^-1 light intensity. At ~2 x 10^6 cells mL^-1, after >6 doublings, cells were transferred to low CO2 for 14 hours. For the RbcS1-mCherry localization experiments, samples were taken and imaged immediately before the switch to low CO2 and after 14 hours at low CO2. For co-immunoprecipitation experiments, cells were grown in 50 mL of TAP at 150 µmol photons m^-2 s^-1 light until ~2-4 x 10^6 cells mL^-1, centrifuged at 1000g for 4 min, resuspended in TP and used to inoculate 800 mL of TP. Cells were then bubbled with low CO2 (air, ~0.04% v/v CO2) at 150 µmol photons m^-2 s^-1 until ~2-4 x 10^6 cells mL^-1 and harvested as indicated below. All liquid media contained 2 µg mL^-1 paromomycin. Cell concentrations were measured using a Z2 Coulter Counter (Beckman Coulter).

Proteomics

Pyrenoid enrichment. Ten mL algal material (3-5 x 10^6 cells mL^-1) were harvested by centrifugation for 2 min (4000 rpm, 4°C), immediately frozen in liquid nitrogen and extracted with extraction buffer (EB; 50 mM HEPES, 20 µM leupeptin, 1 mM PMSF, 17.4% glycerol, 2% Triton). The samples were sonicated 6 x 15 sec (6 cycles, 50% intensity, Sonoplus Bandelin Electronics) and kept on ice between cycles for 90 sec. The samples were centrifuged at 500g for 3 min at 4°C to obtain a soluble and pellet fraction. This procedure resembled the first steps of previously used protocols. The pellet was washed three times with 1 mL, 500 µL, and 300 µL EB before resuspension in 100 µL of 50 mM ammonium bicarbonate. Protein concentrations were measured by Lowry assay using BSA as a standard.

SDS-PAGE. Samples were resuspended in a buffer containing 50 mM dithiothreitol (DTT), 50 mM sodium carbonate, 15% sucrose (w/v) and 2.5% SDS (w/v), heated 45 sec at 95°C and spun down at 14,000 rpm before applying 22 µg total protein to the polyacrylamide gel. The 14% separating gel was stained with Coomassie Brilliant Blue.

Protein digestion and mass spectrometric analysis. For shotgun proteomics, samples were prepared by precipitating 20 µg protein per sample in 80% acetone at -20°C overnight. The precipitated proteins were resuspended in 6 M urea and 2 M thiourea (in 50 mM ammonium hydrogen carbonate), reduced by DTT, carbamidomethylated with iodoacetamide, digested with endoproteinase LysC (Roche) and immobilized trypsin (Applied Biosystems, Thermo Fisher Scientific), and subsequently desalted. The resuspended peptides were acidified with 1% acetic acid. Peptides were chromatographically separated by reverse phase separation with a nanoUPLC (nanoACQUITY UPLC, Waters) using a 10 cm × 75 µm BEH130 C18 column with 1.7 µm particles (Waters) for separation and a 2 cm × 180 µm Symmetry C18 column with 5 µm particles (Waters) for trapping. Peptides were analyzed by a linear trap quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific).

Data processing and data analysis. Raw MS files were processed with MaxQuant (ver. 1.5.2.8). Peak list files were searched against Chlamydomonas reinhardtii gene model JGIv4 from Phytozome 10.2 (phytozome.jgi.doe.gov) including the organelle genome sequences. Maximum precursor and fragment mass deviations were set to 20 ppm and 0.5 Da, respectively. Peptides with at least six amino acids were considered for identification. The search included carbamidomethylation as a fixed modification and variable modifications for oxidation of methionine and protein N-terminal acetylation. The false discovery rate, determined by searching a reverse database, was set at 0.01 for both peptides and proteins. Identification across different replicates and treatments was achieved by enabling the "match between runs" option in MaxQuant within a time window of 2 min. For comparison of protein levels between samples, the label-free quantification (LFQ) intensity-based method was used. For the estimation of protein stoichiometries within a sample, the intensity-based absolute quantification (iBAQ) method was applied. Both values were calculated by the MaxQuant software. All statistical analyses were performed using Microsoft Excel.

Cloning of EPYC1 and RbcS1. EPYC1 (Cre10.g436550) and RBCS1 (Cre02.g120100) ORFs were amplified from gDNA using Phusion Hotstart II polymerase (Thermo Scientific) with the respective EPYC1_ORF_F/R or RBCS1_ORF_F/R primer pairs. Gel-purified PCR products, containing vector overlap regions, were cloned into pLM005 or pLM006 by Gibson assembly. Final pLM005 constructs are in frame with Venus-3xFLAG and contain the AphVIII gene for paromomycin resistance; final pLM006 constructs are in frame with mCherry-6xHIS and contain the AphVII gene for hygromycin resistance. Both pLM005 and pLM006 confer ampicillin resistance for bacterial selection. For complementation with untagged EPYC1, mCherry-6xHIS was removed from pLM006_EPYC1-mCherry-6xHIS by BglII restriction digestion, and the vector was gel purified and then re-ligated. All constructs were verified by Sanger sequencing.

Transformation of Chlamydomonas for complementation and fluorescence localization of proteins. For each transformation, 14.5 ng kbp⁻¹ of EcoRV-cut plasmid was mixed with 250 µL of 2 × 10⁸ cells mL⁻¹ at 16°C and transformed immediately into WT or epyc1 strains by electroporation. Cells were selected on TAP paromomycin (20 µg mL⁻¹) or hygromycin (25 µg mL⁻¹) plates and kept in low light (5-10 µmol photons m⁻² s⁻¹) until picking or screening for fluorescent lines. Nuclear transformation in Chlamydomonas occurs by non-homologous insertion of DNA into the genome, resulting in random integration. In addition, for the complementation of the epyc1 mutant, a second transformation was selected on TP plates without antibiotics at low CO2 (~0.04% v/v CO2) under 500 µmol photons m⁻² s⁻¹ light. To screen for Venus and mCherry expressing lines, transformations were spread on rectangular plates (Singer Instruments) containing 86 mL of TAP plus antibiotics. Once colonies were ~2-3 mm in diameter, plates were transferred to ~100 µmol photons m⁻² s⁻¹ light for 24-36 hours and then screened for colony fluorescence on a Typhoon TRIO fluorescence scanner (GE Healthcare). Excitation and emission settings were: Venus, 532 nm excitation with 555/20 nm emission; mCherry, 532 nm excitation with 610/30 nm emission; and chlorophyll autofluorescence, 633 nm excitation with 670/30 nm emission. Dual-tag lines were generated sequentially by expressing pLM005_EPYC1-Venus-3xFLAG in WT then adding pLM006_RbcS1-mCherry-6xHIS. To confirm expression of both Venus and mCherry in dual-tag strains and to select for strains with equal fluorescence intensity for the analysis of RbcS1-mCherry localization in WT and epyc1, strains were also screened on a Tecan Infinite M1000 PRO.

Fluorescence microscopy and RuBisCO-mCherry mislocalization in the epyc1 mutant. All fluorescence microscopy was performed using a spinning disk confocal microscope (custom adapted Leica DMI6000) with samples imaged on poly-L-lysine coated plates. The following excitation and emission settings were used: Venus, 514 nm excitation with 543/22 nm emission; mCherry, 561 nm excitation with 590/20 nm emission; and chlorophyll, 561 nm excitation with 685/40 nm emission. Images were analyzed using Fiji software. For RbcS1-mCherry localization in WT and the epyc1 mutant, lines showing equal RbcS1-mCherry fluorescence intensity were chosen for analysis. WT and epyc1 lines were imaged using the above mCherry and chlorophyll settings. A Z-stack composed of 40 slices 0.3 µm apart was obtained for each field of view. To quantify the percentage of fluorescence signal from outside the pyrenoid region, raw images were analysed as follows: pixel intensity in the mCherry channel was summed across the 40 Z-sections for cells that were fully sectioned. Using the chlorophyll channel as a reference, a cell outline region of interest (ROI; varying between cells) and a pyrenoid ROI (set at 2.8 µm in diameter for WT and mutant) were drawn. For each cell, background fluorescence was subtracted by taking the average of 4 measurements surrounding the cell, and autofluorescence was subtracted separately from the pyrenoid and whole cell ROIs by taking the average of 22 WT cells not expressing mCherry. Finally, the percentage of RbcS1-mCherry signal from outside of the pyrenoid region was calculated as (total cell signal - pyrenoid signal) / total cell signal × 100%.
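The final calculation above can be sketched in a few lines of Python. This is only an illustration: the function name and the example intensities are hypothetical, and the inputs are assumed to be Z-summed mCherry intensities from which background and autofluorescence have already been subtracted.

```python
def percent_outside_pyrenoid(cell_signal, pyrenoid_signal):
    """Percentage of mCherry signal that lies outside the pyrenoid ROI.

    Both arguments are assumed to be Z-summed intensities that have already
    been corrected for background and autofluorescence (hypothetical inputs).
    """
    if cell_signal <= 0:
        raise ValueError("total cell signal must be positive")
    return (cell_signal - pyrenoid_signal) / cell_signal * 100.0


# Hypothetical corrected intensities for one cell (arbitrary units).
print(percent_outside_pyrenoid(1000.0, 750.0))  # 25.0
```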

Analysis of gene expression by quantitative real-time PCR (qt-PCR).

Total RNA was extracted from 30 µg chlorophyll a+b (~10 mL mid-log cell suspension), using TRIzol Reagent, as per manufacturer's instructions (Life Technologies). Complementary DNA was synthesised from 500 ng of total RNA using Superscript III Reverse Transcriptase (Life Technologies), RNaseOUT (Life Technologies), and oligo(dT)18 primers (Thermo Scientific). Relative gene expression was measured in real time in a Rotor-Gene Q thermocycler (Qiagen). Reactions (10 µL) used SYBR Green JumpStart Taq ReadyMix (Sigma-Aldrich). Gene expression was calculated relative to the Chlamydomonas gene coding for the Receptor of Activated Protein Kinase C1 (RCK1, Cre06.g278222), which is not significantly induced by low CO2.

Protein extraction and Western blotting. For EPYC1 and CAH3 protein quantification in WT and the epyc1 mutant, protein was extracted from unfrozen cells, normalized to chlorophyll, separated by SDS-PAGE and Western blotted. Both the primary anti-EPYC1 and anti-CAH3 (Agrisera) antibodies were used at a 1:2,000 dilution and the secondary horseradish-peroxidase (HRP) conjugated goat anti-rabbit (Life Technologies) at a 1:10,000 dilution. To ensure even loading, membranes were stripped (Restore PLUS Western blot stripping buffer, Thermo Scientific) and re-probed with anti-tubulin (1:25,000; Sigma) followed by HRP conjugated goat anti-mouse (1:10,000; Life Technologies). The anti-EPYC1 antibody was raised in rabbit to the C-terminal region of EPYC1 (KSKPEIKRTALPADWRKGL-COOH, SEQ ID NO: 16) by Yenzym Antibodies.

For RuBisCO quantification in WT and the epyc1 mutant, total soluble proteins were extracted from 300 µg chlorophyll (a+b) (~100 mL mid-log cell suspension). Cells were harvested by centrifugation (13,000 g, 10 min, 4°C), resuspended in 1.5 mL ice-cold extraction buffer (50 mM Bicine, pH 8.0, 10 mM NaHCO3, 10 mM MgCl2, and 1 mM DTT), and lysed by sonication (6 × 30 second bursts of 20 microns amplitude, with 15 sec on ice between bursts; Soniprep 150, MSE UK, London, UK). Lysis was checked by inspecting samples under a light microscope. Lysate was clarified by centrifugation (13,000 g, 20 min, 4°C). Protein content was determined using the Bradford method (Sigma Aldrich). Soluble proteins were separated on a 12% (w/v) denaturing polyacrylamide gel. Sample loading was normalized by protein amount (10 µg per lane), and even loading was controlled by staining a gel with identical protein load (GelCode Blue, Life Technologies). After transfer onto a polyvinylidene difluoride membrane (Amersham), RuBisCO was immunodetected with a polyclonal primary antibody raised against RuBisCO (1:10,000) followed by a HRP conjugated goat anti-rabbit (1:20,000; GE Healthcare).

Chlorophyll concentration. Total pigments were extracted in 100% methanol, and the absorbance of the clarified supernatant (13,000 g, 1 min, 4°C) was measured at 470, 652, 665, and 750 nm (UV 300 Unicam, Thermo Spectronic). Concentration of chlorophyll (a+b) was calculated using the equation of Wellburn.

Spot tests. WT, epyc1, and complemented cell lines were grown in TAP until ~2 × 10⁶ cells mL⁻¹, washed once with Tris-phosphate (TP), resuspended in TP to a concentration of 6.6 × 10⁵ cells mL⁻¹, then serially diluted 1:10 three times. 15 µL of each dilution was spotted onto four TP plates and incubated in low or high CO2 with 100 or 500 µmol photons m⁻² s⁻¹ of light for seven days before imaging.
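As a rough check on the spot-test arithmetic, the following Python sketch estimates how many cells land in each 15 µL spot. It assumes, for illustration only, that the starting suspension is spotted alongside the three 1:10 dilutions; the function and parameter names are hypothetical.

```python
def cells_per_spot(start_cells_per_ml=6.6e5, spot_volume_ul=15.0,
                   dilution_factor=10, n_dilutions=3):
    """Approximate cell number per spot for a serial dilution series.

    Index 0 is the undiluted suspension (an assumption); indices 1..n are
    the successive dilutions.
    """
    densities = [start_cells_per_ml / dilution_factor ** i
                 for i in range(n_dilutions + 1)]
    # Convert microlitres to millilitres before multiplying by cells per mL.
    return [d * spot_volume_ul / 1000.0 for d in densities]


for step, cells in enumerate(cells_per_spot()):
    print(f"dilution step {step}: ~{cells:.1f} cells per spot")
```

Under these assumptions the series runs from roughly 9,900 cells per spot down to about 10 cells per spot.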

Oxygen evolution measurements. Apparent affinity for inorganic carbon was determined by oxygen evolution. Photoautotrophically grown liquid cultures were harvested by centrifugation (2,000 g, 5 min, 4°C) and resuspended in 25 mM HEPES-KOH (pH 7.3) to a density of ~1.5 × 10⁸ cells mL⁻¹, as determined by hemocytometer count. Aliquots of cells (1 mL) were added to a Clark-type oxygen electrode chamber (Rank Brothers, Bottisham, UK) attached to a circulating water bath set to 25°C. The chamber was closed for a light pre-treatment (200-300 µmol photons m⁻² s⁻¹ illumination for 10-25 min), to allow cells to deplete any internal inorganic carbon pool. When net oxygen evolution ceased, 10 µL of increasingly concentrated NaHCO3 solution was added to the algal suspension at 30 second intervals, and the rate of oxygen evolution was recorded every second using a PicoLog 1216 data logger (Pico Technologies). Cumulative concentrations of sodium bicarbonate after each addition were as follows: 2.5, 5, 10, 25, 50, 100, 250, 500, 1,000 and 2,000 µM. Michaelis-Menten curves were fitted to plots of external inorganic carbon concentration versus the rate of O2 evolution. The concentration of inorganic carbon required for half maximal rates of photosynthesis (K0.5) was calculated from this curve.

Pyrenoid area analysis by transmission electron microscopy. To minimize the loss of biological signal during harvesting, fixative (glutaraldehyde, final 2.5%) was added to cell cultures immediately before harvesting. Cell suspensions containing ~5 × 10⁷ cells in mid-log were pelleted (4,000 g, 5 min, 4°C) and fixed in 1 mL tris-minimal medium containing 2.5% glutaraldehyde and 1% H2O2 (30% w/v) for 1 hour on a tube rotator at 4°C. Unless otherwise specified, all following steps were performed at room temperature on a tube rotator. Cells were pelleted (4,000 g for 5 min) and washed with ddH2O (3X, 5 min). Cells were osmicated for 1 hour in 1 mL 1% (v/v) OsO4 containing 1.5% (w/v) K3[Fe(CN)6] and 2 mM CaCl2. Cells were pelleted and washed with ddH2O (4X, as above). Cells were stained for 1 hour in 1 mL 2% (w/v) uranyl acetate. After pelleting and washing with ddH2O (3X), cells were dehydrated in 70%, 95%, and 100% ethanol, and 100% acetonitrile (2X). Cells were embedded in epoxy resin mix, containing Quetol 651, nonenyl succinic anhydride, methyl-5-norbornene-2,3-dicarboxylic anhydride, and dimethyl-benzylamine (all reagents from Agar Scientific), in the following proportions: 35%, 46%, 17%, 2%. Resin was refreshed 4 times over two days. Thin sections (50 nm) were prepared by the Cambridge Advanced Imaging Centre on a Leica Ultracut UCT Ultramicrotome and mounted onto 300 mesh copper grids. Samples were imaged with a Tecnai G2 transmission electron microscope (FEI) at 200 kV. Image analysis (area measurements) was performed using ImageJ. Ten 54 µm² areas were randomly selected and all pyrenoid-positive cells were imaged (WT low CO2, 79 out of 271 cells displayed a pyrenoid; epyc1 low CO2, 37 out of 139 cells displayed a pyrenoid; WT high CO2, 18 out of 196 cells displayed a pyrenoid; epyc1 high CO2, 22 out of 255 cells displayed a pyrenoid). Cell area was determined by outlining the plasma membrane. Pyrenoid area was taken as the area inside the starch sheath (generally visible in CCM-induced cells) or the electron dense area inside the chloroplast when no starch sheath was visible. Control immunogold labeling experiments using a high concentration of primary antibody (1:20) confirmed that these areas had dense concentrations of RuBisCO. Pyrenoid area was expressed as a percentage of cell area, and data were ordered into classes of 0.5% increments.
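To illustrate how K0.5 can be estimated from the oxygen evolution measurements described above, the following Python sketch fits a Michaelis-Menten curve by Hanes-Woolf linearization. This is a stand-in technique chosen so that the example needs only the standard library; it is not necessarily the fitting procedure that was used. The rate data are synthetic, and the function name is hypothetical.

```python
def fit_michaelis_menten(concentrations, rates):
    """Estimate (Vmax, K0.5) by Hanes-Woolf linear regression.

    Rearranging v = Vmax * S / (K0.5 + S) gives S/v = S/Vmax + K0.5/Vmax,
    so a straight-line fit of S/v against S has slope 1/Vmax and
    intercept K0.5/Vmax.
    """
    xs = list(concentrations)
    ys = [s / v for s, v in zip(concentrations, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    k_half = intercept * vmax
    return vmax, k_half


# Cumulative inorganic carbon series from the text; rates are synthetic,
# generated from an assumed Vmax of 100 and K0.5 of 30 (arbitrary units).
series = [2.5, 5, 10, 25, 50, 100, 250, 500, 1000, 2000]
synthetic_rates = [100.0 * s / (30.0 + s) for s in series]
vmax, k_half = fit_michaelis_menten(series, synthetic_rates)
print(f"Vmax ~ {vmax:.2f}, K0.5 ~ {k_half:.2f}")
```

With noisy real data, a nonlinear least-squares fit of the untransformed curve is usually preferred, because linearization distorts the error structure.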

Quick-freeze deep-etch EM (QFDEEM)

Sampling and fixation

It was ascertained in pilot experiments that pyrenoids fixed by the following procedure are indistinguishable in QFDEEM ultrastructure from unfixed controls. 150 mL of each air-bubbled culture and 75 mL of each high CO2-bubbled culture were pelleted at 1,000 g for 10 min at RT to produce pellets of ~200 µL. The pellets were resuspended in 6 mL of ice-cold 10 mM HEPES buffer (pH 7) and transferred to a cold 25 mL glass flask. A freshly prepared solution of 4% glutaraldehyde (Sigma-Aldrich G7651) in 10 mM HEPES (pH 7) was added 100 µL at a time, swirling between drops, until 1.5 mL in total had been added. The mixture was then left on ice for 1 hour, with agitation every 10 min. The mixture was pelleted (1,000 g, 5 min, 4°C), washed in cold HEPES buffer, pelleted again, and finally resuspended in 6 mL fresh HEPES. Samples were shipped overnight to St. Louis in 15 mL conical screw cap tubes maintained at 0-4°C.

Microscopy

QFDEEM was performed. Briefly, small samples of pelleted cells were placed on a cushioning material and dropped onto a liquid helium-cooled copper block; the frozen material was transferred to liquid nitrogen and then to an evacuated Balzers apparatus, fractured, etched at -80°C for 2 min, and platinum/carbon rotary-replicated. The replicas were examined with a JEOL electron microscope, model JEM 1400, equipped with an AMT V601 digital camera. The images are photographic negatives; hence, protuberant elements of the fractured/etched surface are more heavily coated with platinum and appear whiter.

Immunogold-localization of RuBisCO. Resin-embedded material previously used for ultrastructural characterization of the pyrenoid was re-cut and thin sections were mounted on nickel grids. Osmium removal and unmasking of epitopes was done by acid treatment. Grids were gently floated face down on a droplet (~30 µL) of 4% sodium meta-periodate (w/v in ddH2O) for 15 min, and 1% periodic acid (w/v in ddH2O) for 5 min. Each acid treatment was followed by several short washes in ddH2O. Non-specific sites were blocked for 5 min in 1% BSA (w/v) dissolved in high-salt Tris-buffered saline containing 500 mM NaCl, 0.05% Triton X-100 and 0.05% Tween 20 (hereafter abbreviated HSTBSTT). Salt, detergent, and surfactant concentrations were determined empirically to minimize background signal. Binding to primary antibody was done by incubating grids overnight in 1% BSA in HSTBSTT, with a 1:1,000 dilution of the RuBisCO antibody. Excess antibody was removed by 15 min washes (2X) in HSTBSTT and 15 min washes (2X) in ddH2O. Incubation with secondary antibody (15 nm gold particle-conjugated goat anti-rabbit secondary antibody in 1% BSA in HSTBSTT, 1:250) was done at RT for 1 hr. Excess secondary antibody was removed by washing as above. Thin sections were prepared and imaged as for Pyrenoid area analysis by transmission electron microscopy, above. Randomization was done as above (see TEM) with scoring capped at ~25 cells for each treatment. Nonspecific labelling was taken as any particle on a free resin area, i.e. outside a cell. Non-specific density was subtracted from pyrenoid and chloroplast particle density. The fraction of particles in the pyrenoid was calculated as background-adjusted n_pyrenoid / (n_pyrenoid + n_stroma), where n_stroma is the number of particles in the stroma to the exclusion of the pyrenoid and the starch sheath. To improve the clarity of gold particles in Fig. 3G, particles were enlarged 10× using the image analysis software Fiji. Briefly, images were thresholded to isolate individual gold particles, these were then enlarged 10×, and the new image was overlaid on the original image with an opacity of 50%.
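The background-adjusted fraction of gold particles in the pyrenoid can be sketched as follows. The text does not spell out how the non-specific density is applied, so this example assumes that the density measured on free resin is multiplied by each compartment's area and subtracted from its particle count. All names and numbers are hypothetical.

```python
def pyrenoid_fraction(n_pyrenoid, area_pyrenoid, n_stroma, area_stroma,
                      n_background, area_background):
    """Fraction of background-adjusted gold particles found in the pyrenoid.

    Assumption: non-specific labelling is uniform, so the expected number of
    non-specific particles in a region is background density times area.
    """
    background_density = n_background / area_background
    adj_pyrenoid = max(n_pyrenoid - background_density * area_pyrenoid, 0.0)
    adj_stroma = max(n_stroma - background_density * area_stroma, 0.0)
    total = adj_pyrenoid + adj_stroma
    if total == 0:
        raise ValueError("no specific labelling left after adjustment")
    return adj_pyrenoid / total


# Hypothetical counts and areas (areas in square micrometres).
fraction = pyrenoid_fraction(n_pyrenoid=100, area_pyrenoid=2.0,
                             n_stroma=30, area_stroma=10.0,
                             n_background=5, area_background=10.0)
print(f"fraction of particles in pyrenoid: {fraction:.3f}")
```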

Co-Immunoprecipitations

WT cells expressing pLM005_Venus-3xFLAG, pLM005_EPYC1-Venus-3xFLAG or pLM005_RbcS1-Venus-3xFLAG were grown in 800 mL of TP plus 2 µg mL⁻¹ paromomycin with continual bubbling at low CO2 (0.04% CO2) under 150 µmol photons m⁻² s⁻¹ of light until a cell density of ~2-4 × 10⁶ cells mL⁻¹. Cells were then spun out (2,000 g, 4 min, 4°C), washed in 40 mL of ice-cold TP, centrifuged, then resuspended in a 1:1 (v/w) ratio of ice-cold 2×IP buffer (400 mM sorbitol, 100 mM HEPES, 100 mM KOAc, 4 mM Mg(OAc)2·4H2O, 2 mM CaCl2, 2 mM NaF, 0.6 mM Na3VO4 and 1 Roche complete EDTA-free protease inhibitor per 25 mL) to cell pellet. This cell slurry was then added dropwise to liquid nitrogen to form small Chlamydomonas "popcorn" balls approximately 5 mm in diameter. These were stored at -70°C until needed. Cells were lysed by grinding 1 g (~500 mg of original cell pellet) of Chlamydomonas popcorn balls by mortar and pestle at liquid nitrogen temperatures for 10 min. The ground cells were defrosted on ice, then dounced 20 times on ice with a Kontes Glass Duall #21 homogenizer. Membranes were solubilized by incrementally adding an equal volume of ice-cold 1×IP buffer plus 2% digitonin (final concentration 1%), then incubating at 4°C for 40 min with nutation. The lysate was then clarified by spinning for 30 min at full speed in a table-top centrifuge at 4°C. The supernatant (Input) was then transferred to 225 µL of protein G Dynabeads (Life Technologies) that had been incubated with anti-FLAG M2 antibody (Sigma) according to the manufacturer's instructions, except that 1×IP buffer was used for the wash steps. The Dynabead-cell lysate was incubated for 2.5 hours on a rotating platform at 4°C, then the supernatant was removed (flow-through). The Dynabeads were washed four times with 1×IP buffer plus 0.1% digitonin, followed by a 30 min elution with 50 µL of 1×IP buffer plus 0.25% digitonin and 2 µg µL⁻¹ 3xFLAG peptide (Sigma; 3xFLAG peptide elution) and a 10 min elution in 1× Laemmli buffer with 50 mM beta-mercaptoethanol at 70°C (Boiling elution). Samples were run on 10% SDS-PAGE gels, then silver stained or transferred to PVDF membrane and probed with anti-FLAG (1:2,000; secondary: 1:10,000 HRP goat anti-mouse), anti-RuBisCO (1:10,000; secondary: 1:20,000 HRP goat anti-rabbit) or anti-EPYC1 (1:2,000; secondary: 1:10,000 HRP goat anti-rabbit).

Co-Immunoprecipitations. Cell lysate from WT cells expressing the bait proteins Venus-3xFLAG, EPYC1-Venus-3xFLAG or RbcS1-Venus-3xFLAG was incubated with anti-FLAG M2 antibody (Sigma) bound to protein G Dynabeads (Life Technologies). Bait proteins with interaction partners were eluted by 3xFLAG competition followed by boiling in 1× Laemmli buffer.

EPYC1 sequence analysis. To understand the intrinsic disorder of EPYC1, the full-length amino acid sequence was run through several structural disorder prediction programs including VL3, VLXT, and GlobPlot 2. To look for regions of secondary structure, the full-length sequence and the repeat region of the EPYC1 amino acid sequence were analysed by PSIPRED v3.3 and Phyre2.

EPYC1-RuBisCO interaction model. We built a model of the EPYC1-RuBisCO interaction using Blender (blender dot org) based on the following rationale: If each of the four EPYC1 repeats can bind a holoenzyme, the two internal repeats would have different linking properties from the two terminal repeats. If bound to an internal repeat, a holoenzyme would be directly linked through this EPYC1 protein to two other holoenzymes. In contrast, if bound to a terminal repeat, the holoenzyme would only be directly linked through this EPYC1 protein to one other holoenzyme. Therefore, on average, each EPYC1 repeat would link one RuBisCO holoenzyme to 1.5 other holoenzymes. Given the octameric structure of the RuBisCO holoenzyme, a holoenzyme likely has 8 binding sites for EPYC1. Taken together, on average each holoenzyme would be bound to 12 other holoenzymes by 8 EPYC1 proteins, in an arrangement that could expand indefinitely in all directions. A perfect arrangement of this nature would require a stoichiometry of one EPYC1 polypeptide for every four RuBisCO small or large subunits.
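The arithmetic behind this rationale can be reproduced with a short Python sketch. The function and variable names are hypothetical; the inputs (four repeats per EPYC1, eight binding sites and eight small subunits per holoenzyme) are taken from the reasoning above.

```python
from fractions import Fraction


def linker_model(repeats_per_epyc1=4, sites_per_holoenzyme=8,
                 small_subunits_per_holoenzyme=8):
    """Return (mean links per repeat, linked neighbours, subunits per EPYC1)."""
    if repeats_per_epyc1 < 2:
        raise ValueError("a linker needs at least two repeats")
    last = repeats_per_epyc1 - 1
    # Terminal repeats connect to one other holoenzyme, internal repeats to two.
    links = [1 if i in (0, last) else 2 for i in range(repeats_per_epyc1)]
    mean_links = Fraction(sum(links), repeats_per_epyc1)
    # Each binding site on a holoenzyme is occupied by one repeat.
    neighbours = sites_per_holoenzyme * mean_links
    # Each EPYC1 contributes all of its repeats, so fewer EPYC1s are needed.
    epyc1_per_holoenzyme = Fraction(sites_per_holoenzyme, repeats_per_epyc1)
    subunits_per_epyc1 = small_subunits_per_holoenzyme / epyc1_per_holoenzyme
    return mean_links, neighbours, subunits_per_epyc1


mean_links, neighbours, subunits_per_epyc1 = linker_model()
print(mean_links, neighbours, subunits_per_epyc1)  # 3/2 12 4
```

The defaults return 1.5 links per repeat, 12 linked holoenzymes, and one EPYC1 per four small subunits, matching the figures stated in the text.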

Analysis of other algal proteomes for EPYC1-like physicochemical properties. Complete translated genomic sequences from pyrenoid and non-pyrenoid algae were downloaded from Uniprot or Phytozome. Protein sequences were then analyzed for tandem repeats using Xstream with default settings except for the following: Min Period 40; Max Period 80; Min Copy No 3.0; Min TR Domain 75; Min Seq Content 0.7. The pI values of the Xstream hits were then batch calculated using the Gene Infinity Protein Isoelectric Point calculator (geneinfinity dot org/sms/sms_proteiniep dot html), the disorder profile was calculated using VLXT, and the presence of transmembrane domains was predicted using TMHMM v.2.0. Proteins with >3 repeats, a pI >8, an oscillating disorder profile with a frequency between 40 and 80 amino acids, and no transmembrane domains were classified as potential EPYC1-like RuBisCO linker proteins. By applying stringent parameters we have tried to reduce the number of false positive hits, but we realize that our approach has several limitations, including: 1) Missing true linker proteins due to not all of the physicochemical properties of EPYC1 being essential for linker function. 2) Incomplete genome assembly of the investigated algae. 3) Incorrect gene models resulting in truncated, misspliced and frame-shifted proteins.
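The classification step above amounts to a conjunction of four criteria, which can be sketched in Python as below. The record layout, field names, and candidate values are hypothetical; in practice the fields would be filled from the outputs of Xstream, the isoelectric point calculator, VLXT, and TMHMM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    repeat_count: float               # tandem repeat copy number
    isoelectric_point: float          # calculated pI
    disorder_period: Optional[float]  # oscillation period in residues, if any
    has_tm_domain: bool               # any predicted transmembrane domain


def is_epyc1_like(c: Candidate) -> bool:
    """Apply the four criteria stated in the text to one candidate protein."""
    return (c.repeat_count > 3
            and c.isoelectric_point > 8
            and c.disorder_period is not None
            and 40 <= c.disorder_period <= 80
            and not c.has_tm_domain)


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("protein_A", 4.0, 11.0, 60.0, False),
    Candidate("protein_B", 5.0, 6.5, 55.0, False),   # pI too low
    Candidate("protein_C", 4.2, 9.3, 62.0, True),    # transmembrane domain
    Candidate("protein_D", 3.0, 10.1, 45.0, False),  # not more than 3 repeats
]
hits = [c.name for c in candidates if is_epyc1_like(c)]
print(hits)  # ['protein_A']
```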

Statistical methods. When growing algal material in liquid medium, flasks were placed randomly throughout the orbital shaker/incubator. Placement was randomized after each sub-culturing to offset any differences in illumination quality. The manifold for air/CO2 delivery had standardized tubing length and internal diameter for even aeration. Cell lysis by sonication required samples to be processed sequentially; the order of processing was randomized. Sample size of O2 evolution measurements was aligned to previously published work from the Griffiths lab. Sample size of electron microscopy related experiments (scoring of TEM thin sections and immunogold experiments) was validated by jackknife resampling.

Pre-established exclusion criteria for TEM image scoring were: (i) only grid areas fully covered with material (54 µm²) were considered; (ii) sections through broken cells and cell sections with a cross area < 12.5 µm² (a circle with 2 µm radius served as a guide) were not scored. Scoring of electron micrographs: image files were renamed with a random number (RANDBETWEEN function in Microsoft Excel), sorted from high to low, and scored blindly. The original filename appearing on the bottom left of each micrograph was masked during the on-screen processing in ImageJ. Randomly selected images were scored by a second experimenter for independent validation. No systematic bias (over- or under-estimation) was measured, and measurements deviated on average only by a couple of percentage points. Two-tailed Student's t-test was used to compare affinities for inorganic carbon of WT and epyc1, as well as the mislocalization of RuBisCO by fluorescence microscopy and EM, because this test is robust to non-normal distributions. Welch's t-test was used to compare pyrenoid sizes, because the WT and mutant groups had substantially different standard deviations. Fisher's exact test of independence was used to compare the number of pyrenoids in WT and epyc1, as this test is appropriate when there are two nominal variables.
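For illustration, a two-sided Fisher's exact test on a 2 × 2 table can be written with the Python standard library alone, as sketched below. The function name is hypothetical, and the choice of example input (the low CO2 pyrenoid counts reported in the TEM analysis: 79 of 271 WT cells and 37 of 139 epyc1 cells) is an assumption about which comparison such a test would be applied to. The sketch uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables at most as probable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: WT, epyc1; columns: cells with a pyrenoid, cells without one.
p_value = fisher_exact_two_sided(79, 271 - 79, 37, 139 - 37)
print(f"two-sided p = {p_value:.3f}")
```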

Patents, patent applications, books, and other publications cited herein are incorporated by reference in their entirety.

In stating a numerical range, it should be understood that all values within the range are also described (e.g., one to ten also includes every integer value between one and ten as well as all intermediate ranges such as two to ten, one to five, and three to eight). The term "about" may refer to the statistical uncertainty associated with a measurement or the variability in a numerical quantity that a person skilled in the art would understand does not affect operation of the invention or its patentability.

All modifications and substitutions that come within the meaning of the claims and the range of their legal equivalents are to be embraced within their scope. A claim which recites "comprising" allows the inclusion of other elements to be within the scope of the claim; the invention is also described by such claims reciting the transitional phrases "consisting essentially of" (i.e., allowing the inclusion of other elements to be within the scope of the claim if they do not materially affect operation of the invention) or "consisting of" (i.e., allowing only the elements listed in the claim other than impurities or inconsequential activities which are ordinarily associated with the invention) instead of the "comprising" term. Any of these three transitions can be used to claim the invention.

It should be understood that an element described in this specification should not be construed as a limitation of the claimed invention unless it is explicitly recited in the claims. Thus, the granted claims are the basis for determining the scope of legal protection instead of a limitation from the specification which is read into the claims. In contradistinction, the prior art is explicitly excluded from the invention to the extent of specific embodiments that would anticipate the claimed invention or destroy novelty.

Moreover, no particular relationship between or among limitations of a claim is intended unless such relationship is explicitly recited in the claim (e.g., the arrangement of components in a product claim or order of steps in a method claim is not a limitation of the claim unless explicitly stated to be so). All possible combinations and permutations of individual elements disclosed herein are considered to be aspects of the invention. Similarly, generalizations of the invention's description are considered to be part of the invention. From the foregoing, it would be apparent to a person of skill in this art that the invention can be embodied in other specific embodiments without departing from its spirit or essential character. The described embodiments should be considered only as illustrative, not restrictive, because the scope of the legal protection provided for the invention will be indicated by the appended claims rather than by this specification.