


Title:
INHIBITORS OF CRISPR-CAS ASSOCIATED ACTIVITY
Document Type and Number:
WIPO Patent Application WO/2019/185751
Kind Code:
A1
Abstract:
The invention relates to Acr proteins capable of regulating CRISPR-Cas associated activity, and their use in regulating CRISPR-Cas associated activity within a biological cell. The invention further provides nucleic acid molecules and self-replicating genetic elements adapted for expression of said Acr proteins, or for their co-expression with a cognate CRISPR-Cas system in a biological cell.

Inventors:
VAZQUEZ-URIBE RUBEN (DK)
VAN DER HELM ERIC (DK)
SOMMER MORTEN OTTO ALEXANDER (DK)
MISIAKOU MARIA-ANNA (DK)
LEE SANG WOO (DK)
Application Number:
PCT/EP2019/057786
Publication Date:
October 03, 2019
Filing Date:
March 27, 2019
Assignee:
UNIV DANMARKS TEKNISKE (DK)
International Classes:
C07K14/33; C12N15/10; C12N15/90
Domestic Patent References:
WO2017160689A1 (2017-09-21)
WO2018093990A1 (2018-05-24)
Other References:
MAXWELL KAREN L: "The Anti-CRISPR Story: A Battle for Survival", MOLECULAR CELL, vol. 68, no. 1, 5 October 2017 (2017-10-05), pages 8 - 14, XP085207631, ISSN: 1097-2765, DOI: 10.1016/J.MOLCEL.2017.09.002
ADAIR L. BORGES ET AL: "The Discovery, Mechanisms, and Evolutionary Impact of Anti-CRISPRs", ANNUAL REVIEW OF VIROLOGY, vol. 4, no. 1, 29 September 2017 (2017-09-29), USA, pages 37 - 59, XP055483877, ISSN: 2327-056X, DOI: 10.1146/annurev-virology-101416-041616
RAUCH BENJAMIN J ET AL: "Inhibition of CRISPR-Cas9 with Bacteriophage Proteins", CELL, CELL PRESS, AMSTERDAM, NL, vol. 168, no. 1, 29 December 2016 (2016-12-29), pages 150, XP029882136, ISSN: 0092-8674, DOI: 10.1016/J.CELL.2016.12.009
PAWLUK APRIL ET AL: "Naturally Occurring Off-Switches for CRISPR-Cas9", CELL, CELL PRESS, AMSTERDAM, NL, vol. 167, no. 7, 8 December 2016 (2016-12-08), pages 1829, XP029850707, ISSN: 0092-8674, DOI: 10.1016/J.CELL.2016.11.017
DATABASE UniProt [online] 24 July 2013 (2013-07-24), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:CDC86037.1};", XP002782243, retrieved from EBI accession no. UNIPROT:R6VUW4 Database accession no. R6VUW4
DATABASE UniProt [online] 21 March 2012 (2012-03-21), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:EHO80923.1};", XP002782244, retrieved from EBI accession no. UNIPROT:H1BQC5 Database accession no. H1BQC5
DATABASE UniProt [online] 7 September 2016 (2016-09-07), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:KEI89326.1};", XP002782245, retrieved from EBI accession no. UNIPROT:A0A175M4B1 Database accession no. A0A175M4B1
DATABASE EMBL-WGS [online] EBI; 28 November 2014 (2014-11-28), "Clostridium acetobutylicum hypothetical protein", XP002782246, Database accession no. KHD34419
DATABASE PROTEIN [online] 23 December 2014 (2014-12-23), "hypothetical protein [Clostridium sp. KNHs214]", XP002782247, retrieved from NCBI Database accession no. WP_035290260
DATABASE UniProt [online] 8 April 2008 (2008-04-08), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:EDS08511.1};", XP002782388, retrieved from EBI accession no. UNIPROT:B0NA18 Database accession no. B0NA18
DATABASE UniProt [online] 29 April 2008 (2008-04-29), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:EDT14069.1};", XP002782389, retrieved from EBI accession no. UNIPROT:B1BW36 Database accession no. B1BW36
DATABASE EMBL [online] 28 October 2016 (2016-10-28), "Clostridium perfringens hypothetical protein", XP002782390, retrieved from EBI Database accession no. AOY53692
DATABASE PROTEIN [online] 16 March 2017 (2017-03-16), "hypothetical protein [Clostridium baratii]", XP002782391, retrieved from NCBI Database accession no. WP_079286774
DATABASE PROTEIN [online] 26 October 2015 (2015-10-26), "hypothetical protein [Clostridium baratii]", XP002782392, retrieved from NCBI Database accession no. WP_055225583
DATABASE PROTEIN [online] 15 February 2017 (2017-02-15), "hypothetical protein [Clostridium baratii]", XP002782393, retrieved from NCBI Database accession no. WP_077229201
DATABASE UniProt [online] 8 March 2011 (2011-03-08), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:EFV20804.1};", XP002782394, retrieved from EBI accession no. UNIPROT:E5XD46 Database accession no. E5XD46
DATABASE UniProt [online] 14 October 2015 (2015-10-14), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:KMT21593.1};", XP002782395, retrieved from EBI accession no. UNIPROT:A0A0J8DBC9 Database accession no. A0A0J8DBC9
DATABASE UniProt [online] 15 March 2017 (2017-03-15), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:SHI90130.1};", XP002782396, retrieved from EBI accession no. UNIPROT:A0A1M6EXJ7 Database accession no. A0A1M6EXJ7
DATABASE PROTEIN [online] 21 July 2016 (2016-07-21), "hypothetical protein [Clostridium sporogenes]", XP002782397, retrieved from NCBI Database accession no. WP_049041135
DATABASE PROTEIN [online] 24 March 2015 (2015-03-24), "hypothetical protein [Clostridium sporogenes]", XP002782398, retrieved from NCBI Database accession no. WP_045518524
DATABASE PROTEIN [online] 27 February 2016 (2016-02-27), "hypothetical protein [Clostridium botulinum]", XP002782399, retrieved from NCBI Database accession no. WP_061327540
DATABASE PROTEIN [online] 13 October 2014 (2014-10-13), "hypothetical protein [Clostridium botulinum]", XP002782400, retrieved from NCBI Database accession no. WP_033066342
DATABASE PROTEIN [online] 11 January 2018 (2018-01-11), "hypothetical protein [Clostridium botulinum]", XP002782401, retrieved from NCBI Database accession no. WP_071647430
DATABASE PROTEIN 28 July 2017 (2017-07-28), "hypothetical protein [Clostridium gasigenes]", XP002782402, retrieved from NCBI Database accession no. WP_089966494
DATABASE PROTEIN [online] 30 January 2015 (2015-01-30), "phage head-tail adapter protein [Clostridium botulinum]", XP002782403, retrieved from NCBI Database accession no. WP_041082932
DATABASE UniProt [online] 18 January 2017 (2017-01-18), "SubName: Full=Phage head-tail adapter protein {ECO:0000313|EMBL:AOR24305.1};", XP002782404, retrieved from EBI accession no. UNIPROT:A0A1D7XLS5 Database accession no. A0A1D7XLS5
DATABASE PROTEIN [online] 27 January 2016 (2016-01-27), "hypothetical protein [Clostridium botulinum]", XP002782405, retrieved from NCBI Database accession no. WP_061301923
DATABASE UniProt [online] 10 May 2017 (2017-05-10), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:SLK19661.1};", XP002782406, retrieved from EBI accession no. UNIPROT:A0A1U6JH80 Database accession no. A0A1U6JH80
DATABASE PROTEIN [online] 29 October 2017 (2017-10-29), "phage head-tail adapter protein [Clostridium paraputrificum]", XP002782407, retrieved from NCBI Database accession no. WP_099313953
AUSUBEL F.M. ET AL., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, 1987
ESVELT, K. M.; MALI, P.; BRAFF, J. L.; MOOSBURNER, M.; YAUNG, S. J.; CHURCH, G. M.: "Orthogonal Cas9 proteins for RNA-guided gene regulation and editing", NATURE METHODS, vol. 10, no. 11, 2013, pages 1116 - 21, XP055128928, Retrieved from the Internet DOI: doi:10.1038/nmeth.2681
GENEE, H. J.; BONDE, M. T.; BAGGER, F. O.; JESPERSEN, J. B.; SOMMER, M. O.; WERNERSSON, R.; OLSEN, L. R.: "Software-supported USER cloning strategies for site-directed mutagenesis and DNA assembly", ACS SYNTHETIC BIOLOGY, vol. 4, no. 3, 2014, pages 342 - 349, Retrieved from the Internet
GENEE H J; BALI A P; PETERSEN SD; SIEDLER, S; BONDE MT; GRONENBERG LS; KRISTENSEN, M; HARRISON SJ; SOMMER MOA: "Functional mining of transporters using synthetic selections", NATURE CHEMICAL BIOLOGY, vol. 12, 2016, pages 1015 - 1022
HUG, L. A.; B.J. BAKER; K. ANANTHARAMAN; C.T. BROWN; A.J. PROBST; C.J. CASTELLE; C.N. BUTTERFIELD; A.W. HERNSDORF; Y. AMANO; K. IS: "A new view of the tree of life", NAT. MICROBIOL., vol. 1, 2016, pages 1 - 6
KOONIN, E. V.; MAKAROVA, K. S.; ZHANG, F.: "Diversity, classification and evolution of CRISPR-Cas systems", CURRENT OPINION IN MICROBIOLOGY, vol. 37, 2017, pages 67 - 78, XP085276922, Retrieved from the Internet DOI: doi:10.1016/j.mib.2017.05.008
LUTZ, R.; BUJARD, H.: "Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements", NUCLEIC ACIDS RESEARCH, vol. 25, 1997, pages 1203 - 1210, XP001084137, DOI: doi:10.1093/nar/25.6.1203
MARTINEZ-GARCIA, E.; APARICIO, T.; GONI-MORENO, A.; FRAILE, S.; DE LORENZO, V.: "SEVA 2.0: An update of the Standard European Vector Architecture for de-/re-construction of bacterial functionalities", NUCLEIC ACIDS RESEARCH, vol. 43, no. D1, 2015, pages D1183 - D1189, Retrieved from the Internet
MEDEMA, M. H.; E. TAKANO; R. BREITLING: "Detecting sequence homology at the gene cluster level with multigeneblast", MOL. BIOL. EVOL., vol. 30, 2013, pages 1218 - 1223
PAWLUK, A.; N. AMRANI; Y. ZHANG; B. GARCIA; Y. HIDALGO-REYES; J. LEE; A. EDRAKI; M. SHAH; E.J. SONTHEIMER; K.L. MAXWELL: "Naturally Occurring Off-Switches for CRISPR-Cas9", CELL, vol. 167, 2016, pages 1829 - 1834
SAVITSKY, P.; J. BRAY; C.D.O. COOPER; B.D. MARSDEN; P. MAHAJAN; N.A. BURGESS-BROWN; O. GILEADI: "High-throughput production of human proteins for crystallization: The SGC experience", J. STRUCT. BIOL., vol. 172, 2010, pages 3 - 13, XP027249552
SOMMER, M.O.; DANTAS, G.; CHURCH, G.M.: "Functional characterization of the antibiotic resistance reservoir in the human microflora", SCIENCE, vol. 325, 2009, pages 1128 - 1131, XP055130442, DOI: doi:10.1126/science.1176950
ZHU, W.; A. LOMSADZE; M. BORODOVSKY: "Ab initio gene identification in metagenomic sequences", NUCLEIC ACIDS RES., vol. 38, 2010, pages 1 - 15
Attorney, Agent or Firm:
GUARDIAN IP CONSULTING I/S (DK)
Claims:

1. A nucleic acid molecule encoding an anti-CRISPR protein selected from the group consisting of:

a. AcrIIA8 protein,

b. AcrIIA6 protein,

c. AcrIIA9 protein and

d. AcrIIA7 protein.

2. A self-replicating genetic element comprising the nucleic acid molecule according to claim 1, wherein

a. the amino acid sequence of said AcrIIA8 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, and 219;

b. the amino acid sequence of said AcrIIA6 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 221, 223 and 225;

c. the amino acid sequence of said AcrIIA9 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, and 247;

d. the amino acid sequence of said AcrIIA7 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 249, 251, 253, 255 and 257, wherein said nucleic acid molecule is operably linked to a promoter.
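Claim 2 defines each Acr family by at least 80% amino acid sequence identity to one of the listed SEQ ID sequences, but the text shown here does not say how identity is calculated. The sketch below assumes one common convention: identical, non-gap aligned positions divided by the full alignment length, given an alignment produced beforehand by whatever aligner the reader prefers. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical non-gap positions / alignment length * 100.
    Other conventions (e.g. dividing by the shorter ungapped length) give
    different numbers for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_query: str, aligned_ref: str,
                             threshold: float = 80.0) -> bool:
    """True if the query reaches the 'at least 80%' identity of claim 2."""
    return percent_identity(aligned_query, aligned_ref) >= threshold


# Toy 10-position alignment (hypothetical sequences, not SEQ ID entries):
ref = "MKTAYIAKQR"
query = "MKTAYIGKQ-"
print(percent_identity(query, ref))          # 80.0 (8 of 10 positions match)
print(meets_identity_threshold(query, ref))  # True
```

Because the threshold is inclusive ("at least 80%"), the comparison uses `>=`; an alignment with 7 matching positions out of 10 would fall below it.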

3. The self-replicating genetic element according to claim 2, and a CRISPR-Cas expression system comprising:

a. a first gene encoding a polypeptide or a first gene cluster encoding a polypeptide complex, said polypeptide or polypeptide complex having RNA-guided endonuclease activity, wherein said gene or gene cluster is operably linked to a promoter; and

b. a nucleic acid molecule operably linked to a promoter, said nucleic acid molecule encoding one or more RNA molecules, wherein said CRISPR-Cas expression system is comprised on said self-replicating genetic element or on a further self-replicating genetic element, wherein said one or more RNA molecules are capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to a target gene in a cell; and

wherein expression of said anti-CRISPR protein is capable of regulating said CRISPR-Cas system when co-expressed in the cell.

4. The self-replicating genetic element according to claim 3, wherein

a. said first gene encodes a Cas9 polypeptide having RNA-guided endonuclease activity, and wherein

b. one of said RNA molecules is a mat-crRNA and one of said RNA molecules is a cognate trans-activating crRNA, or said one RNA molecule is a fusion of a mat-crRNA and a cognate trans-activating crRNA.

5. A biological cell comprising:

a. a nucleic acid molecule according to claim 1;

or

b. a self-replicating genetic element according to any one of claims 2 to 4.

6. An isolated anti-CRISPR protein encoded by the nucleic acid molecule according to claim 1 or encoded by the self-replicating genetic element according to claim 2, wherein said protein is selected from the group consisting of:

a. AcrIIA8 protein,

b. AcrIIA6 protein,

c. AcrIIA9 protein and

d. AcrIIA7 protein.

7. Use of a nucleic acid molecule according to claim 1, or an anti-CRISPR protein according to claim 6, to regulate CRISPR-Cas associated activity.

8. The use of a nucleic acid molecule or an anti-CRISPR protein to regulate the CRISPR-Cas associated activity according to claim 7, wherein said activity is provided by a CRISPR-Cas system selected from the group consisting of Type II and Type V and wherein said system comprises:

a. a polypeptide or polypeptide complex having RNA-guided endonuclease activity, and

b. one or more RNA molecules capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to a target gene or its transcript,

wherein the amino acid sequence(s) of said polypeptide or each member of said polypeptide complex having RNA-guided endonuclease activity has at least 80% amino acid sequence identity to an amino acid sequence or a corresponding member of a set of amino acid sequences selected from among Group A; and

wherein one of said RNA molecules comprises at least one cognate palindromic repeat sequence and one of said RNA molecules comprises a cognate trans-activating crRNA, or one of said RNA molecules comprises a fusion of at least one cognate palindromic repeat sequence and one cognate trans-activating crRNA, wherein said repeat sequence is encoded by a nucleotide sequence at a corresponding position in the list of Group B; and wherein said cognate trans-activating crRNA is encoded by a nucleotide sequence at a corresponding position in the list of Group C.

9. The use of a nucleic acid molecule or an anti-CRISPR protein to regulate the CRISPR-Cas associated activity according to claim 7, wherein said activity is provided by a CRISPR-Cas system selected from the group consisting of Type I, Type III, Type IV, Type V, and Type VI, and wherein said CRISPR-Cas system comprises:

a. a polypeptide or polypeptide complex having RNA-guided endonuclease activity, and

b. one or more RNA molecules capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to a target gene or its transcript,

wherein the amino acid sequence of each member of said polypeptide complex having RNA-guided endonuclease activity has at least 80% amino acid sequence identity to a corresponding member of a set of amino acid sequences selected from among Group A;

wherein said one or more RNA molecules comprises at least one cognate palindromic repeat sequence, wherein said repeat sequence is encoded by a nucleotide sequence at a corresponding position in the list of Group B.
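Claims 8 and 9 tie the nuclease, the repeat and (for systems that need one) the trans-activating crRNA together by position: entry i of Group A is cognate with entry i of Group B and entry i of Group C. Groups A to C are not reproduced in this section, so the sketch below uses placeholder sequences. It checks only the RNA side, assuming that "comprises" can be tested as substring containment after writing RNA and DNA in the same alphabet (U read as T); the protein side would additionally be compared with position i of Group A at 80% identity, which is omitted here. All names are hypothetical.

```python
from typing import Optional, Sequence


def _as_dna(seq: str) -> str:
    """Normalise RNA or DNA text to upper-case DNA letters (U -> T)."""
    return seq.upper().replace("U", "T")


def rnas_are_cognate(rna_molecules: Sequence[str], index: int,
                     group_b: Sequence[str],
                     group_c: Sequence[Optional[str]],
                     requires_tracr: bool) -> bool:
    """Check the RNA components against position `index` of Groups B and C.

    Groups B and C are treated as lists parallel to Group A. Separate
    crRNA/tracrRNA molecules and a single fused guide both pass, because each
    component only has to occur in some molecule.
    """
    molecules = [_as_dna(rna) for rna in rna_molecules]
    if not any(_as_dna(group_b[index]) in m for m in molecules):
        return False                      # no cognate repeat present
    if not requires_tracr:
        return True                       # claim 9 style: repeat alone suffices
    tracr = group_c[index]
    if tracr is None:
        raise ValueError(f"no Group C entry at position {index}")
    return any(_as_dna(tracr) in m for m in molecules)


# Placeholder sequences, not entries from Groups B or C:
group_b = ["GTTTTAGAGCTA", "CCGGAATTCC"]
group_c = ["TAGCAAGTTAAAAT", None]
crrna = "ACGUUGCAGUUUUAGAGCUA"            # spacer followed by repeat-derived part
tracr = "UAGCAAGUUAAAAU"
print(rnas_are_cognate([crrna, tracr], 0, group_b, group_c, True))           # True
print(rnas_are_cognate([crrna + "GAAA" + tracr], 0, group_b, group_c, True))  # True
```

The `requires_tracr` flag reflects the split between the two claims: claim 8 (Type II and Type V) requires a cognate tracrRNA, while claim 9 requires only the cognate repeat.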

10. The use of a nucleic acid molecule or an anti-CRISPR protein to regulate CRISPR-Cas associated activity according to any one of claims 7 to 9, wherein said activity is expressed in a biological cell selected from the group consisting of a bacterial, yeast, fungus, insect, and plant cell.

11. A method for in vivo regulation of a CRISPR-Cas system, comprising

a. providing a biological cell comprising a target gene;

b. contacting said cell with a molecule, wherein said molecule is selected from the group consisting of:

i. an anti-CRISPR protein according to claim 6,

ii. a nucleic acid molecule encoding and capable of expressing an anti-CRISPR protein according to claim 1, and

iii. a self-replicating genetic element comprising the nucleic acid molecule encoding and capable of expressing an anti-CRISPR protein according to any one of claims 2 to 4,

wherein said cell either comprises or is additionally contacted with a CRISPR-Cas expression system comprising:

iv. a first gene encoding a polypeptide or a first gene cluster encoding a polypeptide complex, said polypeptide or polypeptide complex having RNA-guided endonuclease activity, wherein said gene or gene cluster is operably linked to a promoter; and

v. a nucleic acid molecule operably linked to a promoter, said nucleic acid molecule encoding one or more RNA molecules capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to said target gene; wherein expression of said anti-CRISPR protein in said cell is capable of regulating said CRISPR-Cas system.

12. The method for in vivo regulation of a CRISPR-Cas system according to claim 11, wherein said polypeptide or polypeptide complex having RNA-guided endonuclease activity is selected from the group of CRISPR-Cas systems consisting of Type I, Type II, Type III, Type IV, Type V, and Type VI.

13. The method for in vivo regulation of a CRISPR-Cas system according to claim 11 or 12, wherein

i. said first gene encodes a Cas9 polypeptide having RNA-guided endonuclease activity, and wherein

ii. one of said RNA molecules is a mat-crRNA and one of said RNA molecules is a cognate trans-activating crRNA, or said one RNA molecule is a fusion of a mat-crRNA and a cognate trans-activating crRNA.

14. The method for in vivo regulation of a CRISPR-Cas system according to any one of claims 11 to 13, wherein the amino acid sequence(s) of said polypeptide or each member of said polypeptide complex having RNA-guided endonuclease activity has at least 80% amino acid sequence identity to an amino acid sequence or a corresponding member of a set of amino acid sequences selected from among Group A; and

wherein one of said RNA molecules comprises at least one cognate palindromic repeat sequence and one of said RNA molecules comprises a cognate trans-activating crRNA, or one of said RNA molecules comprises a fusion of at least one cognate palindromic repeat sequence and one cognate trans-activating crRNA, wherein said repeat sequence is encoded by a nucleotide sequence at a corresponding position in the list of Group B; and wherein said cognate trans-activating crRNA is encoded by a nucleotide sequence at a corresponding position in the list of Group C.

15. The method for in vivo regulation of a CRISPR-Cas system according to any one of claims 11 to 13, wherein the amino acid sequence of each member of said polypeptide complex having RNA-guided endonuclease activity has at least 80% amino acid sequence identity to a corresponding member of a set of amino acid sequences selected from among Group A;

wherein said one or more RNA molecules comprises at least one cognate palindromic repeat sequence, wherein said repeat sequence is encoded by a nucleotide sequence at a corresponding position in the list of Group B.

16. The method for in vivo regulation of a CRISPR-Cas system according to any one of claims 11 to 15, wherein said biological cell is selected from the group consisting of a bacterial, yeast, fungus, insect, and plant cell.

17. A composition comprising an anti-CRISPR protein according to claim 6, or a nucleic acid molecule encoding and capable of expressing an anti-CRISPR protein according to claim 1, for use as a medicament.

18. The composition according to claim 17 for use as a medicament, wherein said composition is for use in regulating CRISPR-Cas associated activity in a somatic mammalian cell.

19. A pharmaceutical kit comprising the composition according to claim 17 in combination with a CRISPR-Cas expression system for use as a medicament, wherein said expression system comprises:

a. a first gene encoding a polypeptide or a first gene cluster encoding a polypeptide complex, said polypeptide or polypeptide complex having RNA-guided endonuclease activity, wherein said gene or gene cluster is operably linked to a promoter; and

b. a nucleic acid molecule operably linked to a promoter, said nucleic acid molecule encoding one or more RNA molecules capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to said target gene; and

wherein said anti-CRISPR protein is for use in regulating CRISPR-Cas associated activity in a somatic mammalian cell.

Description:
TITLE: Inhibitors of CRISPR-Cas associated activity

Technical field of the invention

The invention relates to Acr proteins capable of regulating CRISPR-Cas associated activity, and their use in regulating CRISPR-Cas associated activity within a biological cell. The invention further provides nucleic acid molecules and self-replicating genetic elements adapted for expression of said Acr proteins, or for their co-expression with a cognate CRISPR-Cas system in a biological cell.

Background of the invention

CRISPR-Cas systems originate from bacteria, where they function as adaptive immune systems that cleave foreign DNA or RNA in a sequence-specific manner. Components of type II CRISPR-Cas systems, such as Cas9 from Streptococcus pyogenes, have been exploited for precise and programmable gene editing, with a substantial impact on the life sciences in spite of concerns about off-target activity. Given this impact, substantial efforts have been made to discover new systems in cultured and, more recently, uncultured bacterial species. Applications of CRISPR-Cas technologies range from treating human genetic diseases to eliminating pathogens using gene drives. Beyond genome editing, CRISPR-Cas systems have several other applications, including programmable transcriptional regulators and ultra-sensitive detection devices. However, there is increasing concern about the safety and precision of CRISPR-Cas. Indeed, the 2016 Worldwide Threat Assessment of the U.S. intelligence community listed genome editing technologies as a potential weapon of mass destruction and proliferation. Accordingly, more tools are needed to control CRISPR-Cas technology, both to reduce the potential risks of genome editing and to contain its spread and potential harm to other organisms.

A promising method to control CRISPR-Cas activity is via anti-CRISPR (ACR) proteins. These naturally occurring inhibitors can potentially inactivate CRISPR-Cas systems when needed, or decrease off-target effects without significantly affecting on-target efficiency. ACRs were initially discovered for type I CRISPR systems. Subsequent bioinformatic mining identified ACR activity against the type II-C Cas9 from Neisseria meningitidis and the type II-A Cas9 from Streptococcus pyogenes (spCas9). More recently, an additional ACR against spCas9 was identified by cloning and testing multiple genes from a phage that was able to escape CRISPR-based immunity in Streptococcus thermophilus. Given the abundance of CRISPR-Cas systems in bacteria, as well as the abundance of uncharacterized phages, it is likely that only a minute proportion of the ACRs in the environment has been revealed so far. Accordingly, there exists a need to identify further ACRs that can be used to regulate CRISPR-Cas in its diverse applications.

Summary of the invention

A first embodiment of the invention provides a nucleic acid molecule encoding an anti-CRISPR protein selected from the group consisting of:

i. AcrIIA8 protein,

ii. AcrIIA6 protein,

iii. AcrIIA9 protein and

iv. AcrIIA7 protein.

With respect to the first embodiment, the invention further provides a self-replicating genetic element comprising the nucleic acid molecule encoding said anti-CRISPR protein, wherein:

a. the amino acid sequence of said AcrIIA8 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, and 219;

b. the amino acid sequence of said AcrIIA6 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 221, 223 and 225;

c. the amino acid sequence of said AcrIIA9 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, and 247;

d. the amino acid sequence of said AcrIIA7 protein has at least 80% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 249, 251, 253, 255 and 257; and

wherein said nucleic acid molecule is operably linked to a promoter.
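The reference SEQ IDs listed above are consecutive odd numbers, so the four families split the odd numbers from 169 to 257 into contiguous, non-overlapping blocks (26 + 3 + 11 + 5 = 45 entries). A small lookup table makes that structure explicit. This is an illustrative sketch; the table and helper names are hypothetical and not part of the patent.

```python
from typing import Optional

# Odd-numbered reference SEQ IDs listed for each family in items a. to d.
ACR_SEQ_IDS: dict[str, frozenset[int]] = {
    "AcrIIA8": frozenset(range(169, 220, 2)),  # 169, 171, ..., 219
    "AcrIIA6": frozenset(range(221, 226, 2)),  # 221, 223, 225
    "AcrIIA9": frozenset(range(227, 248, 2)),  # 227, 229, ..., 247
    "AcrIIA7": frozenset(range(249, 258, 2)),  # 249, 251, ..., 257
}


def family_for_seq_id(seq_id: int) -> Optional[str]:
    """Return the Acr family whose reference list contains seq_id, if any."""
    for family, ids in ACR_SEQ_IDS.items():
        if seq_id in ids:
            return family
    return None


print(family_for_seq_id(223))  # AcrIIA6
print(family_for_seq_id(170))  # None
print({f: len(ids) for f, ids in ACR_SEQ_IDS.items()})
# {'AcrIIA8': 26, 'AcrIIA6': 3, 'AcrIIA9': 11, 'AcrIIA7': 5}
```

Using `range(start, stop, 2)` mirrors the odd-number pattern of the lists; the stop value is one past the last listed SEQ ID because Python ranges exclude their end point.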

With respect to the first embodiment, the invention further provides a biological cell comprising either said nucleic acid molecule encoding an anti-CRISPR protein, or said self-replicating genetic element comprising the nucleic acid molecule operably linked to a promoter, where the nucleic acid molecule encodes said anti-CRISPR protein.

Furthermore, said first embodiment may further include a CRISPR-Cas expression system comprising: a first gene encoding a polypeptide or a first gene cluster encoding a polypeptide complex, said polypeptide or polypeptide complex having RNA-guided endonuclease activity, wherein said gene or gene cluster is operably linked to a promoter; and a nucleic acid molecule operably linked to a promoter, said nucleic acid molecule encoding one or more RNA molecules,

wherein said CRISPR-Cas expression system is comprised on said self-replicating genetic element or on a further self-replicating genetic element, wherein said one or more RNA molecules are capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to a target gene in a cell; and wherein expression of said anti-CRISPR protein is capable of regulating said CRISPR-Cas system when co-expressed in the cell.

A second embodiment of the invention provides an isolated anti-CRISPR protein selected from the group consisting of:

i. AcrIIA8 protein,

ii. AcrIIA6 protein,

iii. AcrIIA9 protein and

iv. AcrIIA7 protein.

A third embodiment of the invention provides for the use of the nucleic acid molecule of the first embodiment, or an anti-CRISPR protein of the second embodiment, to regulate CRISPR-Cas associated activity. This use may be applied to regulating CRISPR-Cas associated activity expressed in a biological cell selected from the group consisting of a bacterial, yeast, fungus, insect, and plant cell. Alternatively, this use may be applied to a mammalian cell (e.g. a somatic cell).

A fourth embodiment of the invention provides a method for in vivo regulation of a CRISPR-Cas system, comprising

a. providing an isolated biological cell comprising a target gene;

b. contacting said cell with a molecule, wherein said molecule is selected from the group consisting of:

i. an anti-CRISPR protein of the invention,

ii. a nucleic acid molecule encoding and capable of expressing an anti-CRISPR protein of the invention, and

iii. a self-replicating genetic element of the invention comprising said nucleic acid molecule encoding and capable of expressing said anti-CRISPR protein,

wherein said cell either comprises or is additionally contacted with a CRISPR-Cas expression system comprising:

iv. a first gene encoding a polypeptide or a first gene cluster encoding a polypeptide complex, said polypeptide or polypeptide complex having RNA-guided endonuclease activity, wherein said gene or gene cluster is operably linked to a promoter; and

v. a nucleic acid molecule operably linked to a promoter, said nucleic acid molecule encoding one or more RNA molecules capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to said target gene;

wherein expression of said anti-CRISPR protein in said cell is capable of regulating the activity of the CRISPR-Cas system expressed by the Cas expression system. This method may be applied to an isolated biological cell selected from the group consisting of a bacterial, yeast, fungus, insect, and plant cell, or alternatively to a mammalian cell (e.g. a somatic cell).

A fifth embodiment of the invention provides a composition comprising an anti-CRISPR protein of the invention, or a nucleic acid molecule encoding and capable of expressing an anti-CRISPR protein of the invention, for use as a medicament. Said composition is also for use in regulating CRISPR-Cas associated activity in a somatic mammalian cell.

A sixth embodiment of the invention provides a pharmaceutical kit comprising the aforementioned composition in combination with a CRISPR-Cas expression system for use as a medicament, wherein said expression system comprises: a first gene encoding a polypeptide or a first gene cluster encoding a polypeptide complex, said polypeptide or polypeptide complex having RNA-guided endonuclease activity, wherein said gene or gene cluster is operably linked to a promoter; and a nucleic acid molecule operably linked to a promoter, said nucleic acid molecule encoding one or more RNA molecules capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to said target gene; and wherein said anti-CRISPR protein is for use in regulating CRISPR-Cas associated activity in a somatic mammalian cell.

Description of the invention

Figures:

Figure 1 is a cartoon illustrating a host cell comprising a genetic circuit and a member of a self-replicating library of non-host DNA, used in the detection of inhibitors of CRISPR-Cas systems. The host cell comprises a gene or gene cluster encoding a protein or protein complex having RNA-guided endonuclease activity, a nucleic acid molecule encoding a chimeric guide RNA (sgRNA) or a CRISPR array, and a reporter gene encoding a selection marker. Cells of the host are transformed with a self-replicating library of non-host DNA. Polypeptides encoded by the non-host DNA that prevent the RNA-guided endonuclease from inactivating the reporter gene or its transcript are detected by means of the selection marker.

Figure 2 shows a cartoon (A) illustrating the pCasens3 and pDual3 plasmids and their interaction with an anti-CRISPR protein encoded by a positive control plasmid. Plasmid pCasens3 (middle) comprises the Streptococcus pyogenes Cas9 gene (spCas9) [SEQ ID No.: 150] cloned into the backbone of the plasmid pSEVA47. Additionally, a sigma70 constitutive promoter (P23100) [SEQ ID No.: 148] and an adjacent DNA molecule encoding a theophylline translational riboswitch [SEQ ID No.: 149] were inserted upstream of the spCas9 gene in pSEVA47 in order to control its expression. pSEVA47 comprises a low-copy-number origin of replication, SC101 [SEQ ID No.: 155], and the antibiotic resistance gene aadA [SEQ ID No.: 153] under the control of its native (constitutive) promoter and ribosomal binding site, conferring resistance to spectinomycin. Plasmid pDual3 (right) comprises a chloramphenicol resistance gene (CmR) [SEQ ID No.: 164] operably linked to an upstream constitutive promoter and RBS, and a DNA sequence encoding an sgRNA [SEQ ID No.: 161] whose transcription is regulated by the inducible pBAD promoter [SEQ ID No.: 158] located upstream of it. The plasmid further comprises an araC L-arabinose sensor gene [SEQ ID No.: 159] that regulates the pBAD promoter in response to the metabolite L-arabinose. pDual3 comprises a low-copy-number origin of replication, p15A [SEQ ID No.: 163]. The positive control plasmid comprises a gene [SEQ ID No.: 146] encoding the ACRIIA2 protein under the control of a constitutive pTET promoter [SEQ ID No.: 139].

The cartoon in Figure 2B illustrates the negative control plasmid, comprising a gene [SEQ ID No. : 140] encoding Green Fluorescent Protein (GFP) under the control of a constitutive pTET promoter [SEQ ID No. : 139]. Both the negative and positive control plasmids of Figure 2 further comprise a kanamycin resistance gene [SEQ ID No. : 144] under the control of its native (constitutive) promoter and ribosomal binding site, and a ColE1 origin of replication [SEQ ID No. : 143].

Figure 3 shows (A) a cartoon illustrating the molecular mechanism whereby, in E. coli cells comprising the genetic circuit (E. coli-GC), the genetic circuit is induced to express the cas9 gene and the sgRNA, forming a Cas9-sgRNA complex that cleaves the reporter gene (Cat), leading to cell death when the cells are exposed to chloramphenicol. The cartoon further shows that co-expression of an acrIIA2 gene in the E. coli-GC cells can inhibit Cas9-sgRNA inactivation of the Cat gene, leading to cell survival under selective conditions. (B) is a histogram showing the number of colony forming units (CFU.ml⁻¹) of E. coli-GC cells comprising a negative control plasmid (expressing GFP) or the positive control plasmid (expressing AcrIIA2) following cultivation on selective media, where the genetic circuit is either induced or non-induced.

Figure 4 is a cartoon showing the use of host cells comprising the genetic circuit to screen a metagenomics DNA expression library for anti-CRISPR proteins.

Figure 5 is a cartoon illustrating a metagenomic DNA expression plasmid, comprising one member of a library of metagenomic DNA fragments cloned downstream of a constitutive pTET promoter [SEQ ID No. : 139] and a kanamycin resistance gene under the control of its native (constitutive) promoter and ribosomal binding site, together with the genetic circuit comprising plasmids pCasens3 and pDual3.

Figure 6 is a histogram showing the number of colony forming units (CFU.ml⁻¹) of E. coli-GC cells comprising selected metagenomic inserts, designated AC11-AC42, as compared to a negative control plasmid (expressing GFP) and the positive control plasmid (expressing AcrIIA2), following cultivation on selective media where the genetic circuit is induced.

Figure 7 (A) shows a 1% agarose gel of the size-separated cleavage products resulting from in vitro cleavage of a linear double-stranded DNA template by the spCas9-sgRNA complex in the presence or absence of putative inhibitors of spCas9 activity. Said DNA template comprises the same 20-base-pair target sequence, recognised by the spCas9-sgRNA complex, as used in the in vivo assay of Figure 6. (B) is a graphical presentation of the binding strength of each putative inhibitor of spCas9 (ACR) to biotinylated spCas9, shown as the dissociation constant (Kd) determined by biolayer interferometry.

Figure 8 Table showing the sequence identities between the isolated AcrIIA1-9 polypeptides in matrix format.

Figure 9 Diagram showing the computed hierarchical clustering of members of the AcrIIA1-9 protein families.

Figure 10 (A) Diagram showing the distribution of AcrIIA genes across genomic and metagenomic datasets, where the hosts of all identified AcrIIAs are mapped to the tree of life (Hug, L.A. et al., 2016). Tree branches correspond to all distinct bacterial phyla (NCBI Taxonomy) carrying AcrIIAs, where previously identified AcrIIAs are given in italics. (B) Quantification of AcrIIA genes and their homologues in publicly available viromes (MetaVir). The x axis corresponds to the number of raw reads with homology to an AcrIIA, and the y axis to the distinct virome datasets, stacked by habitat and grouped by AcrIIA. Bars are shaded in grey according to the type of habitat the datasets represent, showing localisation of AcrIIA9 in human, mouse, wastewater and freshwater samples, and of AcrIIA7 in human, insect, food, coral, wastewater, freshwater, seawater, hypersaline and soil samples.

Figure 11 Genomic context of AcrIIA8 homologues. The genomic neighbourhood surrounding AcrIIA8 across all relevant reference genomes is shown mapped to the AcrIIA8 protein tree. Homologous genomic regions were identified by MultiGeneBlast (Medema, M.H. et al., 2013) using the genome carrying the closest homologue of AcrIIA8 as the search query, displayed here at the top. Homology to the queried genes is denoted by shapes having a common shading/hatching; shaded/hatched shapes that are not part of the legend correspond to uncharacterized proteins; white shapes correspond to genes below the homology threshold.

Figure 12 Alignment of the amino acid sequences of members of the AcrIIA8 anti-CRISPR protein family. The aligned proteins are all approximately 90-110 amino acids in length, where AcrIIA8 (encoded by the ac27-l gene) shares a sequence identity of 81% with WP_009270720.1 derived from Erysipelotrichaceae; 48% with WP_009270720.1; and between 29% and 40% with the other members of this protein family.

Figure 13 Alignment of the amino acid sequences of members of the AcrIIA6 anti-CRISPR protein family. The aligned proteins are all approximately 109 amino acids in length, wherein AcrIIA6 shares a sequence identity of 99% with WP_058324878.1 derived from Sinorhizobium sp. GL28, as well as with KGM08958.1 derived from Cellulomonas carbonis T26, based on the aligned regions.

Figure 14 Alignment of the amino acid sequences of members of the AcrIIA9 anti-CRISPR protein family. The aligned proteins are all approximately 140-145 amino acids in length, wherein AcrIIA9 shares a sequence identity of at least 78% with the selected members of the family, ranging from 76% with WP_009036714.1 derived from Bacteroides sp. D20 up to 93% with WP_004294380.1 derived from Bacteroides fragilis. The aligned amino acid sequences share a PcfK domain spanning their N-terminal portion.

Figure 15 Alignment of the amino acid sequences of members of the AcrIIA7 anti-CRISPR protein family. AcrIIA7 shares a sequence identity of at least 60% with the selected members of the family. The aligned amino acid sequences share a DUF2829 domain.

Figure 16 Cartoon showing the design of a T7E1 assay for anti-CRISPR protein activity in a human cell line (left) and image of an agarose gel showing DNA fragments following T7 endonuclease I (T7E1) cleavage of Cas9 target DNA amplified from said cell line. The cartoon shows a human cell line (HEK293T) expressing Cas9 which, together with a guide RNA (crRNA), introduces a double-stranded break in a target locus of the genome; the break is then repaired by non-homologous end joining (NHEJ), leaving insertion or deletion mutations (indels). Genomic DNA is extracted from the cells, the target locus is amplified, and the PCR products are denatured and re-annealed to generate mismatched DNA hybrids around the target site. T7 endonuclease I cleaves the mismatched DNA hybrids, yielding cleaved products detectable on an agarose gel. The agarose gel (right) shows that co-expression of the anti-CRISPR proteins AcrIIA6, AcrIIA7, AcrIIA8, AcrIIA9 and AcrIIA2 reduces the proportion of target-locus DNA cleaved in the T7E1 assay, indicating inhibition of CRISPR-Cas9-mediated double-stranded breaks in the HEK293T cell line.

Abbreviations and definitions of terms:

Amino acid sequence identity: The term "sequence identity" as used herein indicates a quantitative measure of the degree of homology between two amino acid sequences of substantially equal length. The two sequences to be compared must be aligned to give the best possible fit, by means of the insertion of gaps or, alternatively, truncation at the ends of the protein sequences. The sequence identity can be calculated as ((Nref - Ndif) × 100)/Nref, wherein Ndif is the total number of non-identical residues in the two sequences when aligned and Nref is the number of residues in one of the sequences. Sequence identity between each AcrIIA and its homologues is calculated by the BLAST program, e.g. the BLASTP program (Pearson W.R. and D.J. Lipman (1988)) (www.ncbi.nlm.nih.gov/cgi-bin/BLAST). The identity matrix between AcrIIA1-9 is determined by the sequence alignment method ClustalW with default parameters, as described by Thompson J. et al. 1994, available at http://www.ebi.ac.uk/clustalw/.
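The identity formula above can be sketched in Python. This is a minimal illustration of the final arithmetic only, not the BLAST or ClustalW scoring itself. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, takes Nref as the number of residues in the reference sequence, and counts as Ndif every reference residue that faces a different residue or a gap (so residues present only in the other sequence are not penalised). The function name `percent_identity` is hypothetical.

```python
def percent_identity(ref_aligned: str, query_aligned: str, gap: str = "-") -> float:
    """Return ((Nref - Ndif) * 100) / Nref for two pre-aligned sequences.

    Assumptions of this sketch (not fixed by the definition):
    - Nref = number of non-gap residues in the reference sequence;
    - Ndif = reference residues facing a different residue or a gap.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    ref, qry = ref_aligned.upper(), query_aligned.upper()
    n_ref = sum(1 for r in ref if r != gap)
    if n_ref == 0:
        raise ValueError("reference sequence contains no residues")
    # Columns where the reference itself has a gap are skipped entirely.
    n_dif = sum(1 for r, q in zip(ref, qry) if r != gap and r != q)
    return (n_ref - n_dif) * 100 / n_ref


# A single substitution in a 10-residue reference gives 90% identity.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

In practice the alignment itself would come from BLASTP or ClustalW, as stated above; the sketch only shows how Nref and Ndif combine once an alignment exists.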

Preferably, the number of substitutions, insertions, additions or deletions of one or more amino acid residues in the polypeptide, as compared to its comparator polypeptide, is limited, i.e. no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 substitutions, no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 insertions, no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 additions, and no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 deletions. Preferably the substitutions are conservative amino acid substitutions, limited to exchanges within members of group 1: Glycine, Alanine, Valine, Leucine, Isoleucine; group 2: Serine, Cysteine, Selenocysteine, Threonine, Methionine; group 3: Proline; group 4: Phenylalanine, Tyrosine, Tryptophan; group 5: Aspartate, Glutamate, Asparagine, Glutamine.
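The preferred substitution rules above can be expressed as a small check. The sketch below is an illustration under stated assumptions: it handles only equal-length, ungapped variants (insertions, additions and deletions are not modelled), uses the loosest listed limit of 10 substitutions as its default, and treats any exchange involving a residue absent from the five groups (for example K, R or H) as non-conservative. The names `is_conservative` and `substitutions_within_limits` are hypothetical.

```python
# Conservative-substitution groups as listed in the definition above;
# "U" is the one-letter code for selenocysteine.
CONSERVATIVE_GROUPS = (
    frozenset("GAVLI"),  # group 1
    frozenset("SCUTM"),  # group 2
    frozenset("P"),      # group 3
    frozenset("FYW"),    # group 4
    frozenset("DENQ"),   # group 5
)


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b belong to the same listed group."""
    a, b = a.upper(), b.upper()
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def substitutions_within_limits(reference: str, variant: str,
                                max_substitutions: int = 10) -> bool:
    """Check an equal-length, ungapped variant against the preferred limits:
    at most max_substitutions substitutions, each of them conservative."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles equal-length, ungapped sequences")
    substitutions = [
        (r, v) for r, v in zip(reference.upper(), variant.upper()) if r != v
    ]
    return len(substitutions) <= max_substitutions and all(
        is_conservative(r, v) for r, v in substitutions
    )


print(substitutions_within_limits("MKLVE", "MKIVD"))  # True: L->I (group 1), E->D (group 5)
print(substitutions_within_limits("MKLVE", "MRLVE"))  # False: K->R is in no listed group
```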

Cis element: is a sequence of DNA that is capable of conferring inducible expression on a gene that is functionally linked to a promoter, where the cis element may induce expression by regulating transcription of the gene or translation of the gene's transcript.

Cognate: is used to refer to interacting pairs of functional entities; more specifically, each protein or protein complex having RNA-guided endonuclease activity has a cognate mat-crRNA (and, in the case of Type II CRISPR-Cas systems, both a cognate mat-crRNA and a cognate tracrRNA) that is required for its activity.

CRISPR system: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a bacterial immune system that cleaves DNA or RNA in a sequence-specific manner, mediated by an RNA-guided nuclease.

E. coli TOP10: is E. coli having the chromosomal genotype mcrA, Δ(mrr-hsdRMS-mcrBC), Φ80lacZΔM15, ΔlacX74, deoR, recA1, araD139, Δ(ara-leu)7697, galU, galK, rpsL(SmR), endA1, nupG.

gi number: (GenInfo identifier) is a unique integer which identifies a particular sequence, independent of the database source, and which is assigned by NCBI to all sequences processed into Entrez, including nucleotide sequences from DDBJ/EMBL/GenBank, protein sequences from SWISS-PROT, PIR and many others.

Nucleic acid molecule: according to the invention, comprises a gene whose nucleic acid sequence encodes an AcrIIA protein of the invention.

Promoter: is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA (towards the 5' region of the sense strand). A promoter that is "functionally linked" to a gene is capable of driving expression of said gene. A promoter may drive expression constitutively or only in response to an inducer.

RBS: Ribosomal Binding Site, is a sequence of nucleotides upstream of the start codon of an mRNA transcript that is responsible for the recruitment of a ribosome during the initiation of protein translation.

Heterologous gene and heterologous DNA molecule: have a different genetic origin from the recombinant cell in which they are expressed; this also applies to the protein or transcript that they encode. The nucleotide sequence of the heterologous gene or heterologous DNA molecule may be optimized (e.g. codon-optimized) with respect to the recombinant cell in which it is expressed. A heterologous gene or heterologous DNA molecule may be located on (and therefore be part of) the chromosome or an episome of the recombinant cell, and may be inserted into this location by recombinant DNA cloning.

Detailed description of the invention

The invention provides isolated nucleic acid molecules encoding anti-CRISPR proteins; their encoded anti-CRISPR proteins; and the use of the isolated nucleic acid molecules and respective anti-CRISPR proteins in methods for regulating CRISPR-Cas associated activity. The regulation conferred by these proteins may provide an enhancement or, more preferably, an inhibition of CRISPR-Cas associated activity. The anti-CRISPR proteins of the invention have broad application, since they can be used to regulate more than one CRISPR-Cas system. Examples of systems having CRISPR-Cas associated activity, their component molecules, and their expression are detailed in section III. The anti-CRISPR proteins and their encoding nucleic acid molecules were identified in soil or human gut fecal metagenomic libraries (Example 3, Figures 4 and 5) by means of a high-throughput screening assay (as described in Examples 1 and 2, Figures 1-3). The identified anti-CRISPR proteins were classified as AcrIIA6, AcrIIA7, AcrIIA8 and AcrIIA9, based on the amino acid sequences encoded by their respective nucleic acid molecules. The nucleic acid molecules encoding anti-CRISPR proteins of the invention may comprise a promoter sequence and be capable of expressing said proteins in vitro or in vivo.

I Nucleic acid molecules encoding anti-CRISPR proteins

The anti-CRISPR proteins encoded by the isolated nucleic acid molecules of the invention, and their homologues, are members of 4 families of anti-CRISPR proteins that are distinct both from each other and from all previously identified anti-CRISPR proteins. The sequence identity of each of the anti-CRISPR proteins AcrIIA6, AcrIIA7, AcrIIA8 and AcrIIA9 to one another, and to the previously identified AcrIIA1-5, is less than 20% (Figure 8). Furthermore, all the members of the four protein families, encoded by genes identified in reference genomes, cluster in discrete groups that do not overlap with each other or with any of AcrIIA1-5, confirming the classification of the identified anti-CRISPR proteins and their family members as 4 new families of anti-CRISPR proteins (Figure 9). A surprising characteristic of the identified genes encoding the anti-CRISPR proteins of the invention is their wide distribution in natural microbial populations and habitats. While genes encoding AcrIIA1-5 are confined to Firmicutes bacteria, and their homologues to a few species, the genes encoding members of the AcrIIA6, AcrIIA7 and AcrIIA9 anti-CRISPR protein families have a wide host range (Figure 10A). Additionally, genes encoding the AcrIIA7 and AcrIIA9 protein families, and their homologues, were found in a wide range of habitats, based on analysis of publicly available virome datasets (Figure 10B).

Ii Structure of the AcrIIA8 anti-CRISPR proteins of the invention

A nucleic acid molecule [SEQ ID No. : 168] encoding AcrIIA8 [SEQ ID No. : 169] was identified in a gut metagenomic library. The genes encoding the AcrIIA8 protein family comprise at least 52 homologues present in 68 reference genomes of 38 unique species of Firmicutes. A common characteristic of the AcrIIA8 protein family is the location of its genes, which all lie within prophage DNA fragments present in the genomes of these Firmicutes species. The highly conserved synteny of these prophage DNA regions, and the co-localisation of the AcrIIA8 genes within them, are seen in the alignment in Figure 11. The identification of the AcrIIA8 homologues is based on their highly conserved genomic context (Figure 11), the clustering of the members of this protein family (Figure 9), and the conservation of their primary protein structure, as seen from the alignment of their sequences (Figure 12).

Accordingly, the invention provides an isolated nucleic acid molecule encoding an anti-CRISPR protein belonging to the AcrIIA8 family, wherein the amino acid sequence of said protein has at least 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% amino acid sequence identity to an amino acid sequence selected from among SEQ ID No. : 169 [AcrIIA8]; 171 [WP_009270720.1]; 173 [WP_008690853.1]; 175 [WP_076236714.1]; 177 [WP_034586516.1]; 179 [WP_035290260.1]; 181 [WP_004605840.1]; 183 [WP_003465204.1]; 185 [WP_003477332.1]; 187 [WP_079286774.1]; 189 [WP_055225583.1]; 191 [WP_077229201.1]; 193 [WP_009242330.1]; 195 [WP_048571017.1]; 197 [WP_073005556.1]; 199 [WP_049041135.1]; 201 [WP_045518524.1]; 203 [WP_061327540.1]; 205 [WP_033066342.1]; 207 [WP_071647430.1]; 209 [WP_089966494.1]; 211 [WP_041082932.1]; 213 [WP_069680440.1]; 215 [WP_061301923.1]; 217 [WP_079481466.1]; and 219 [WP_099313953.1].
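The family criterion above is an "at least X% identity to any one of the listed sequences" test. Assuming the percent identities of a query protein to each listed sequence have already been computed (for example with BLASTP, as in the definitions section), the check can be sketched as below. The identity values and helper names are hypothetical, and the threshold can be raised to any of the listed levels.

```python
def qualifying_references(identities: dict[str, float], threshold: float = 80.0) -> list[str]:
    """Return, sorted, the reference sequences whose percent identity to the
    query protein is at least `threshold`."""
    if not 0.0 <= threshold <= 100.0:
        raise ValueError("threshold must be a percentage between 0 and 100")
    return sorted(ref for ref, pid in identities.items() if pid >= threshold)


def meets_family_criterion(identities: dict[str, float], threshold: float = 80.0) -> bool:
    """The criterion is met if identity to ANY one listed sequence reaches the threshold."""
    return bool(qualifying_references(identities, threshold))


# Hypothetical identity values for an illustrative query protein.
example = {"SEQ ID No. 169": 82.5, "SEQ ID No. 171": 79.9, "SEQ ID No. 173": 45.0}
print(qualifying_references(example))         # ['SEQ ID No. 169']
print(meets_family_criterion(example, 90.0))  # False
```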

In one preferred embodiment, the amino acid sequence of the encoded anti-CRISPR protein belonging to the AcrIIA8 family has at least 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% amino acid sequence identity to an amino acid sequence selected from among SEQ ID No. : 169 [AcrIIA8], 171 [WP_009270720.1] and 173 [WP_008690853.1].

Iii Structure of the AcrIIA6 anti-CRISPR proteins of the invention

A nucleic acid molecule [SEQ ID No. : 220] encoding AcrIIA6 [SEQ ID No. : 221] was identified in a soil metagenomic library. The genes encoding the AcrIIA6 protein family include at least 2 homologues, present in Cellulomonas carbonis T26 (Actinobacteria) and Sinorhizobium sp. GL28 (Proteobacteria). An alignment revealing their highly conserved structure is shown in Figure 13.

Accordingly, the invention provides an isolated nucleic acid molecule encoding an anti-CRISPR protein belonging to the AcrIIA6 family, wherein the amino acid sequence of said protein has at least 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% amino acid sequence identity to an amino acid sequence selected from among SEQ ID No. : 221 [AcrIIA6], 223 [WP_058324878.1] and 225 [KGM08958.1].

Iiii Structure of the AcrIIA9 anti-CRISPR proteins of the invention

A nucleic acid molecule [SEQ ID No. : 226] encoding AcrIIA9 [SEQ ID No. : 227] was identified in a human gut fecal metagenomic library. The AcrIIA9 protein family is very large, encoded by genes distributed across multiple families of Bacteroidetes. Its members are encoded by genes in several Bacteroides species, and the protein members from three Parabacteroides species share 100% amino acid sequence identity. Gene homologues are also found in genomes of the recently defined phylum Balneolaeota, as well as in the phyla Firmicutes, Proteobacteria and Actinobacteria. An amino acid sequence alignment of ten members of the AcrIIA9 protein family is shown in Figure 14, where all members share a highly conserved PcfK domain characterised by completely conserved residues (D and L).

Accordingly, the invention provides an isolated nucleic acid molecule encoding an anti-CRISPR protein belonging to the AcrIIA9 family, wherein the amino acid sequence of said protein has at least 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% sequence identity to an amino acid sequence selected from among SEQ ID No. : 227 [AcrIIA9], 229 [WP_009036714.1], 231 [WP_072067265.1], 233 [WP_074707943.1], 235 [WP_026367493.1], 237 [WP_008152072.1], 239 [WP_009040414.1], 241 [WP_008654998.1], 243 [EKC79239.1], 245 [WP_055173355.1] and 247 [WP_004294380.1].

Iiv Structure of the AcrIIA7 anti-CRISPR proteins of the invention

A nucleic acid molecule [SEQ ID No. : 248] encoding AcrIIA7 [SEQ ID No. : 249] was identified in a human gut fecal metagenomic library. The AcrIIA7 family is very large, being distributed across 6 distinct phyla (Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, Cyanobacteria and Spirochaetes). The AcrIIA7 gene family has homologues in viral reference genomes, including all three tailed bacteriophage families (i.e. Siphoviridae, Myoviridae and Podoviridae), confirming its ubiquity. An amino acid sequence alignment of five selected AcrIIA7 protein family members is shown in Figure 15.

Accordingly, the invention provides an isolated nucleic acid molecule encoding an anti-CRISPR protein belonging to the AcrIIA7 family, wherein the amino acid sequence of said protein has at least 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% amino acid sequence identity to an amino acid sequence selected from among SEQ ID No. : 249 [AcrIIA7], 251 [WP_081033090.1], 253 [WP_006800621.1], 255 [WP_005795395.1] and 257 [EKC44436.1].

Iv Functional properties of the anti-CRISPR proteins of the invention

The anti-CRISPR proteins AcrIIA6, AcrIIA7, AcrIIA8 and AcrIIA9 and their identified family members (Ii - Iiv) are capable of regulating the activity of a CRISPR-Cas system. This ability is illustrated in Examples 3 and 5, where in vivo assays demonstrated regulation of CRISPR-Cas9 editing of a target DNA sequence by each of these anti-CRISPR proteins in a microbial cell (Figure 6) and in a human cell (Figure 16). A regulatory effect of said anti-CRISPR proteins on CRISPR-Cas9 editing was further demonstrated in an in vitro assay, where inhibition of CRISPR-Cas9 nuclease cleavage of a target DNA sequence was detected (Figure 7A). Regulation of CRISPR-Cas systems may employ a range of molecular mechanisms, one of which involves direct binding of the anti-CRISPR protein to components of the CRISPR-Cas system. The binding affinity of the anti-CRISPR proteins of the invention for spCas9 was determined and, surprisingly, AcrIIA8 in particular had a higher binding affinity for spCas9 than AcrIIA2 (Figure 7B).

II A self-replicating genetic element comprising the nucleic acid molecule encoding an anti-CRISPR protein, and cells comprising the genetic element.

IIi Anti-CRISPR protein expression vectors

The invention further provides a self-replicating genetic element comprising a nucleic acid molecule encoding an anti-CRISPR protein of the invention (a member of one of the AcrIIA6, AcrIIA7, AcrIIA8 and AcrIIA9 protein families), where said nucleic acid molecule is operably linked to a promoter. Preferably the promoter is a heterologous promoter capable of directing gene expression in a selected host cell, where the promoter may drive constitutive or inducible expression (e.g. a promoter in combination with a regulatory element such as a riboswitch or an inducible transcription factor). The self-replicating genetic element, comprising said nucleic acid molecule linked to a promoter, facilitates expression of the anti-CRISPR protein either in vitro or in vivo, where the choice and design of the genetic element may be adapted for expression in a given cell type or tissue type. A suitable self-replicating genetic element for expression in bacteria includes a plasmid with an origin of replication (e.g. p15A, as described in Example 1) and either a constitutive or an inducible promoter to control expression of the operably linked nucleic acid molecule encoding the anti-CRISPR protein. A suitable self-replicating genetic element for expression in a mammalian cell typically contains both prokaryotic sequences that facilitate propagation of the vector in bacteria and one or more eukaryotic transcription units that are expressed only in eukaryotic cells, as known to one skilled in the art. Delivery and expression of a nucleic acid molecule encoding the anti-CRISPR protein in mammalian cells may also be facilitated using viral vectors adapted for the purpose, for example a lentivirus or adenovirus, as known to one skilled in the art.

IIii CRISPR-Cas and anti-CRISPR protein co-expression vectors

The invention further provides a self-replicating genetic element (as described in IIi) comprising both a nucleic acid molecule encoding an anti-CRISPR protein of the invention and a CRISPR-Cas expression system, capable of directing the expression of both the anti-CRISPR protein and the components of the CRISPR-Cas system, as detailed in section III. Alternatively, the anti-CRISPR protein of the invention and the CRISPR-Cas expression system may each be encoded by their respective nucleic acid molecules, said respective molecules being comprised on separate self-replicating genetic elements.

IIiii Biological cells expressing an anti-CRISPR protein

The invention further provides a biological cell comprising a nucleic acid molecule encoding an anti-CRISPR protein of the invention, or a self-replicating genetic element comprising said nucleic acid molecule. The cell may be a prokaryotic or eukaryotic cell, and may be present in, or derived from, a bacterium, yeast, fungus, insect, plant or mammal. When the biological cell is a eukaryotic cell, the cell may be an isolated cell or it may be present in a biological tissue, where the tissue may be healthy, diseased, and/or have genetic mutations. The biological tissue may include any single tissue (e.g., a collection of cells that may be interconnected) or a group of tissues making up an organ or part or region of the body of an organism, and can include lung tissue, skeletal tissue and/or muscle tissue. Exemplary tissues include, but are not limited to, those derived from liver, lung, thyroid, skin, pancreas, blood vessels, bladder, kidneys, brain, biliary tree, duodenum, abdominal aorta, iliac vein, heart and intestines, including any combination thereof.

IIiv Isolated anti-CRISPR protein of the invention

The invention further provides an isolated anti-CRISPR protein expressed by the nucleic acid molecule of the invention, wherein the protein is a member of a protein family selected from the AcrIIA6, AcrIIA7, AcrIIA8 and AcrIIA9 protein families. Methods for isolating an expressed anti-CRISPR protein of the invention are known to the skilled person and are illustrated in Example 3.5.

III Use of the nucleic acid molecule or anti-CRISPR protein of the invention to regulate CRISPR-Cas associated activity

The anti-CRISPR proteins encoded by the nucleic acid molecules of the invention find use in a method of regulating CRISPR-Cas associated activity. For example, they may be used in a method to regulate the activity of a CRISPR-Cas system selected from the group consisting of Type I, Type II, Type III, Type IV, Type V and Type VI.

The CRISPR-Cas associated activities that can be regulated by the anti-CRISPR proteins of the invention share the common functional property of modifying or cleaving a target DNA molecule or its transcript, thereby allowing genetic material to be added, removed or altered at particular locations in the genome of a cell or in a DNA molecule. The genome or DNA molecule comprising the target DNA to be modified or cleaved by means of CRISPR-Cas associated activity may be present in a biological cell as defined in section IIiii. In general terms, such a system is encoded by a CRISPR locus consisting of a CRISPR array, comprising short variable DNA sequences (called spacer sequences) interspersed by short palindromic repeat sequences (called repeat sequences), flanked by diverse cognate cas genes. The CRISPR-Cas systems may, for example, be those that have been classified into 2 classes, whereby members of Class 1 comprise a multi-subunit protein complex, each subunit encoded by a gene in the CRISPR locus, while members of Class 2 comprise a single multidomain protein encoded by a single gene. Class 1 includes type I, III and IV CRISPR-Cas systems, while Class 2 includes type II, V and VI CRISPR-Cas systems. Thus, in total there are six types, which have been further classified into over 28 subtypes based on their protein structure, signature Cas genes and operon arrangements (Koonin, E.V. et al., 2017).

The genetic loci encoding Class 1, type I CRISPR-Cas systems share a common signature gene, cas3 (or its variant cas3'), encoding a nuclease that both unwinds and cleaves target double-stranded DNA (dsDNA) and RNA-DNA duplexes. Type I systems comprise seven subtypes, I-A to I-F and I-U. The genetic loci comprising Class 1, type III CRISPR-Cas systems all contain the signature gene cas10, as well as genes encoding Cas5 and several paralogues of Cas7 that co-transcriptionally target RNA and DNA. Type III systems comprise four subtypes, III-A to III-D. The loci comprising Class 1, type IV CRISPR-Cas systems comprise two subtypes, IV-A and IV-B, where Csf1 can serve as a signature gene.

The genetic loci encoding the Class 2, type II CRISPR-Cas systems comprise the signature Cas9 gene, encoding a multidomain protein that combines the functions of a crRNA-effector complex and cleaves target DNA. All type II loci are characterised by a nucleic acid sequence encoding a tracrRNA, which is partially complementary to the repeats within the respective CRISPR array. Type II systems comprise three subtypes, II-A to II-C. Class 2, type V CRISPR-Cas systems comprise the signature gene Cpf1, also encoding a single multidomain protein. Unlike Cas9, the Cpf1 gene is commonly located outside a CRISPR-Cas locus. Type V systems are currently divided into five subtypes, V-A to V-E. For the purpose of expressing members of the Class 2, type II CRISPR-Cas systems in a cell, it is sufficient to express the signature Cas gene encoding the multidomain protein having RNA-guided endonuclease activity. Expression of all other Class 1 and 2 CRISPR-Cas systems can be achieved by co-expressing the several genes encoding the Cas proteins that constitute the multi-subunit protein complex having RNA-guided endonuclease activity.
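The classification described above maps naturally onto a small lookup table. The sketch below records only what this section states (class, signature gene and listed subtypes); the signature gene and subtypes of type VI are not given here and are left empty rather than filled in. The names `CrisprType`, `CRISPR_TYPES` and `effector_architecture` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrisprType:
    name: str
    crispr_class: int          # 1 = multi-subunit effector, 2 = single multidomain effector
    signature_gene: str        # empty where this section does not name one
    subtypes: tuple[str, ...]  # empty where this section does not list them


CRISPR_TYPES = {
    "I": CrisprType("I", 1, "cas3", ("I-A", "I-B", "I-C", "I-D", "I-E", "I-F", "I-U")),
    "III": CrisprType("III", 1, "cas10", ("III-A", "III-B", "III-C", "III-D")),
    "IV": CrisprType("IV", 1, "csf1", ("IV-A", "IV-B")),
    "II": CrisprType("II", 2, "cas9", ("II-A", "II-B", "II-C")),
    "V": CrisprType("V", 2, "cpf1", ("V-A", "V-B", "V-C", "V-D", "V-E")),
    "VI": CrisprType("VI", 2, "", ()),  # details not given in this section
}


def effector_architecture(type_name: str) -> str:
    """Class 1 types use a multi-subunit complex; Class 2 a single multidomain protein."""
    t = CRISPR_TYPES[type_name]
    return "multi-subunit complex" if t.crispr_class == 1 else "single multidomain protein"


print(effector_architecture("II"))  # single multidomain protein
print(sorted(k for k, t in CRISPR_TYPES.items() if t.crispr_class == 1))  # ['I', 'III', 'IV']
```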

When expression of the CRISPR locus is activated, transcription of the CRISPR array yields a pre-crRNA that is processed into individual mature crRNAs (mat-crRNAs). Each mat-crRNA sequence, aided by its cognate single or multi-subunit Cas protein(s) (effector complex), functions as a guide that specifically targets the effector complex to modify or cleave a target nucleic acid molecule (DNA or RNA). Processing of the pre-crRNA is mediated either by an endonuclease subunit of the effector complex or, in the case of type II CRISPR-Cas systems, by a combination of an RNase III protein and a trans-activating CRISPR RNA (tracrRNA), where the tracrRNA serves to recruit the components of the type II effector complex.
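The processing step above, in which a pre-crRNA is cut into individual mat-crRNAs, can be modelled at the sequence level as splitting the array on its repeat. This is a simplified sketch under stated assumptions, not a model of the RNase III or Cas endonuclease chemistry: all repeats are taken to be identical, the array is assumed to start and end with a repeat (no leader), and the repeat-derived handles that real mat-crRNAs retain are discarded. The repeat and spacer sequences and the function name `extract_spacers` are hypothetical, and sequences are written in the DNA alphabet.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Split a CRISPR array (repeat-spacer-repeat-...-repeat) into its spacers.

    Simplifying assumptions of this sketch: identical repeats, no leader
    sequence, and repeat-derived handles on mature crRNAs are discarded.
    """
    array, repeat = array.upper(), repeat.upper()
    if not (array.startswith(repeat) and array.endswith(repeat)):
        raise ValueError("array must begin and end with the repeat")
    # str.split leaves an empty string before the first and after the last
    # repeat; the pieces in between are the spacers.
    return [part for part in array.split(repeat)[1:-1] if part]


REPEAT = "GTTTTAGAGCTA"  # hypothetical repeat, for illustration only
array = REPEAT + "ACGTACGTACGTACGTACGT" + REPEAT + "TTGGCCAATTGGCCAATTGG" + REPEAT
print(extract_spacers(array, REPEAT))  # ['ACGTACGTACGTACGTACGT', 'TTGGCCAATTGGCCAATTGG']
```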

For the purpose of expressing any member of the Class 1 or 2 CRISPR-Cas systems in a cell, an individual mat-crRNA comprising at least one spacer sequence, flanked at least at one end (preferably at both ends) by a palindromic repeat sequence, is sufficient to specifically guide the cognate effector complex to cleave a reporter nucleic acid molecule (DNA or RNA). Alternatively, a complete CRISPR array (pre-crRNA) can be expressed. The target nucleic acid molecule comprises a sequence that is complementary to the spacer sequence. A tracrRNA is additionally necessary for the expression of a type II CRISPR-Cas system. Synthetic DNA molecules can conveniently be engineered to express an RNA transcript comprising a spacer sequence flanked, at least at one end, by a palindromic repeat sequence (i.e. a mat-crRNA), for use in the expression of the Class 1 or 2 CRISPR-Cas systems. In the case of a type II CRISPR-Cas system, however, the RNA transcript of the synthetic DNA molecule can comprise a mat-crRNA fused with a tracrRNA sequence, also called a single-guide RNA (sgRNA).
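The two guide layouts described above (a mat-crRNA, and for type II an sgRNA in which the mat-crRNA is fused to a tracrRNA) can be sketched as simple sequence assembly. This is an illustration under stated assumptions, not a validated guide design: the spacer is flanked by the repeat at both ends in the mat-crRNA case (the preferred option in the text), the sgRNA places the spacer at the 5' end followed by the repeat, a short linker and the tracrRNA, and the "GAAA" default linker is an assumption of this sketch. The toy sequences and the names `to_rna` and `design_guide` are hypothetical.

```python
from typing import Optional


def to_rna(seq: str) -> str:
    """Write a DNA-alphabet sequence as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def design_guide(spacer: str, repeat: str, tracr: Optional[str] = None,
                 linker: str = "GAAA") -> str:
    """Return a guide RNA sequence assembled from the given parts.

    Without a tracrRNA: a mat-crRNA, i.e. the spacer flanked at both ends by
    the cognate repeat.
    With a tracrRNA (type II only): an sgRNA, i.e. spacer + repeat fused via
    a short linker to the tracrRNA. Real sgRNAs often use a truncated repeat;
    the default linker is an assumption of this sketch.
    """
    if tracr is None:
        return to_rna(repeat + spacer + repeat)
    return to_rna(spacer + repeat + linker + tracr)


print(design_guide("ACGT", "GG"))        # GGACGUGG
print(design_guide("ACGT", "GG", "CC"))  # ACGUGGGAAACC
```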

Accordingly, expression of a CRISPR-Cas system in a biological cell can be achieved by providing:

a. a first gene encoding a polypeptide, or a first gene cluster encoding a polypeptide complex, said polypeptide or polypeptide complex having RNA-guided endonuclease activity, wherein said gene or gene cluster is operably linked to a promoter; and

b. a nucleic acid molecule operably linked to a promoter, wherein said nucleic acid molecule encodes one or more RNA molecules capable of guiding said polypeptide or polypeptide complex having RNA-guided endonuclease activity to a target gene or its transcript,

wherein the polypeptide or polypeptide complex having RNA-guided endonuclease activity is selected from the group of CRISPR-Cas systems consisting of Type I, Type II, Type III, Type IV, Type V and Type VI, and is capable of modifying or inactivating said target gene or its transcript.

More specifically, said nucleic acid molecule may encode any one of:

- a pre-crRNA molecule (encoded by a DNA molecule comprising a CRISPR array), wherein said pre-crRNA comprises a spacer sequence complementary to a nucleic acid sequence of at least 15 nucleotides of at least one strand of said target gene or its transcript;

- a mat-crRNA molecule comprising at least one spacer sequence flanked by at least one palindromic repeat sequence, wherein the spacer sequence is complementary to a nucleic acid sequence of at least 15 nucleotides of at least one strand of said target gene or its transcript;

- said pre-crRNA molecule and a trans-activating crRNA; or

- said mat-crRNA molecule and a trans-activating crRNA (or a fusion thereof),

wherein the palindromic repeat sequence and the trans-activating crRNA are cognate with respect to the polypeptide or polypeptide complex having RNA-guided endonuclease activity.

Examples of the components of the polypeptide or polypeptide complex that make up the CRISPR-Cas systems that may be expressed in a cell, and be regulated by the anti-CRISPR proteins of the invention, are listed in Table 1 below, together with their DNA coding sequences. In a preferred embodiment, the amino acid sequence of the polypeptide or polypeptides of a protein complex having RNA-guided endonuclease activity has at least 70, 75, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% amino acid sequence identity to an amino acid sequence selected from Table 1. These polypeptides or polypeptide complexes each selectively associate with a cognate palindromic repeat sequence and, where applicable, also a cognate tracrRNA. Thus, when the polypeptide or polypeptide complex is selected from Table 1, the cognate palindromic repeat sequence and cognate tracrRNA are encoded by the nucleotide sequences listed in the adjacent boxes in Table 1.

N/A = not applicable or required; * non-essential Cas component of Type IB
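The identity thresholds recited above presuppose a way of scoring sequence identity, which the text does not specify. As a minimal sketch, assuming the query has already been aligned to a Table 1 reference (gaps written as "-") and that identity is counted as identical aligned residues divided by the ungapped reference length (one common convention among several), a threshold check could look as follows. The function names and toy sequences are hypothetical and are not taken from Table 1.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of a pre-aligned query, relative to the ungapped reference length.

    Assumed convention (not specified in the source): identical, non-gap aligned
    positions divided by the number of non-gap reference positions.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_ref) if q == r and r != "-"
    )
    ref_length = sum(1 for r in aligned_ref if r != "-")
    return 100.0 * matches / ref_length


def meets_identity_threshold(aligned_query: str, aligned_ref: str,
                             threshold: float = 70.0) -> bool:
    """True if the query reaches the given identity threshold (e.g. 70, 80, ... 100%)."""
    return percent_identity(aligned_query, aligned_ref) >= threshold


# Hypothetical toy alignment: 8 of 10 reference residues are identical.
reference = "MKTAYIAKQR"
query = "MKTAYLAKER"
print(percent_identity(query, reference))                # 80.0
print(meets_identity_threshold(query, reference))        # True
print(meets_identity_threshold(query, reference, 90.0))  # False
```

Other conventions (identity over the alignment length, or over the shorter sequence) would give different numbers for gapped alignments, so the convention should be fixed before comparing against the thresholds.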

The target gene may comprise a protospacer adjacent motif (PAM) located adjacent to the sequence complementary to the mat-crRNA spacer sequence, in particular where the RNA-guided endonuclease polypeptide is a Type II A Cas (e.g. Cas9). The canonical PAM of Streptococcus pyogenes Cas9 is the sequence 5'-NGG-3', where "N" is any nucleobase.
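To make the PAM requirement concrete, the following sketch scans one strand of a DNA sequence for 5'-NGG-3' motifs and reports the 20-nt protospacer immediately 5' of each. It assumes SpCas9-style targeting (a 20-nt protospacer directly upstream of an NGG PAM) and scans only the strand as written; the opposite strand would be scanned by applying the same function to its reverse complement. The function name and example sequence are hypothetical.

```python
import re


def find_ngg_sites(seq: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each 5'-NGG-3' PAM on the given strand.

    start is the 0-based index of the first protospacer nucleotide. PAMs too close
    to the 5' end to have a full-length protospacer upstream are skipped.
    """
    seq = seq.upper()
    sites = []
    # A zero-width lookahead lets overlapping PAMs (e.g. in "AGGG") all be found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start < 0:
            continue
        sites.append((start, seq[start:pam_start], match.group(1)))
    return sites


# Hypothetical sequence: 20 nt of "C" followed by "AGGG" gives two overlapping PAMs.
example = "C" * 20 + "AGGG"
for site in find_ngg_sites(example):
    print(site)
```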

In one embodiment, the CRISPR-Cas system regulated by an anti-CRISPR protein of the invention is a Class 2 type II A Cas9 system.

In one embodiment, the CRISPR-Cas system regulated by an anti-CRISPR protein of the invention is a Class 2 type V A system comprising the signature gene Cpf1 encoding a Cpf1 protein (e.g. WP_014550095.1).

In one embodiment, the CRISPR-Cas system regulated by an anti-CRISPR protein of the invention is a Class 1 type I system comprising the signature gene Cas3 encoding a Cas3 protein. Each of said first gene, first gene cluster, and nucleic acid molecule may independently be a component of a self-replicating genetic element (episome) that is stably replicated in a biological cell, or an integrated component of its chromosome.

IV Applications of the nucleic acid molecule of the invention and its encoded anti-CRISPR protein

The anti-CRISPR proteins and their respective encoding nucleic acid molecules, capable of regulating CRISPR-Cas gene editing activity, provide an advantageous adjunct (e.g. by co-administration) in permitting safe and practical biological therapeutics through spatial or temporal control of CRISPR-Cas activity; controlling CRISPR-Cas-based gene drives in wild populations to reduce the ecological consequences of such forced inheritance schemes; and contributing to general research into various biotechnological, agricultural, and medical applications of gene editing technologies.

IVi. Temporal control to minimize Off-Target Effects

An anti-CRISPR protein of the invention, and/or its respective encoding nucleic acid molecule, finds use in a method of controlling the duration of CRISPR-Cas activity. For example, a CRISPR-Cas expression system delivered into a cell can be expressed and allowed to function for a predetermined duration, shown empirically to yield efficient on-target editing, and then inactivated through the co- or post-delivery of the anti-CRISPR protein or its encoding nucleic acid molecule. In some cases, it may be important to stop Cas9 activity after the initial rounds of mitosis; otherwise, continued activity could give rise to mosaic genotypes.

IVii Control Of Spatial Activity

An anti-CRISPR protein of the invention, and/or its respective encoding nucleic acid molecule, finds use in a method of controlling the spatial activity of a CRISPR-Cas system having two or more tissue- or cell type-specific activities, where the anti-CRISPR protein selectively regulates at least one of said tissue- or cell type-specific activities. The co- or post-delivery of the anti-CRISPR protein or its encoding nucleic acid molecule to a cell with a CRISPR-Cas expression system will result in CRISPR-Cas activity being confined to a selected subset of possible cell types or tissues.

IViii Gene Drives

An anti-CRISPR protein of the invention, and/or its respective encoding nucleic acid molecule, finds use in a method of controlling CRISPR-Cas-mediated gene drives used to modify the genomes of entire populations of organisms for potentially beneficial purposes, for example: i) preventing the spread of genes in insect vectors of infectious disease (malaria, dengue, etc.); ii) preventing the ability of a general population to act as a host for a pathogen; or iii) rendering a specific population sub-fertile or even sterile. By introducing the nucleic acid sequence encoding the anti-CRISPR protein of the invention into a "reserve population", it would be possible to counter the potential negative effects of an uncontrolled gene drive.

IViv Pharmaceutical compositions

A composition comprising an anti-CRISPR protein of the invention, and/or its respective encoding nucleic acid molecule, is provided for use as a medicament, where the composition may optionally comprise ingredients required for administration of said composition to a subject in need thereof. Said composition finds application when provided as an adjunct (e.g. co-administration) to the administration of a CRISPR-Cas expression system to the subject, whereby said anti-CRISPR protein is one that is capable of regulating the CRISPR-Cas gene editing activity of said CRISPR-Cas expression system such that target DNA in the subject is edited while off-target DNA editing is limited or prevented. The DNA in the subject that is the target for DNA editing by the CRISPR-Cas expression system is DNA in a somatic cell (and not a germ-line cell).

EXAMPLES

Example 1 Construction and cloning of a genetic circuit for screening for polypeptide inhibitors of CRISPR-Cas associated activity

1.1 Construction of pCasens3 and pDual3 plasmids

The pCasens3 and pDual3 plasmids, illustrated in Figure 2A (central and right-hand plasmids respectively), were constructed as follows:

Plasmid pCasens3, containing the Streptococcus pyogenes Cas9 gene (spCas9) [SEQ ID No.: ], was constructed in a single step using USER cloning (Genee et al. 2014). A DNA fragment containing the spCas9 gene [SEQ ID No.: 150] and its endogenous terminator [SEQ ID No.: 152] was PCR-amplified from plasmid DS-SPcas (supplied by Addgene: Plasmid #48645) (Esvelt et al. 2013), and the amplified DNA fragment was cloned into the backbone of plasmid pSEVA47 (as described by Martinez-Garcia et al. 2014). Plasmid pSEVA47 contains the low-copy-number origin of replication pSC101 [SEQ ID No.: 155]; the antibiotic resistance gene aadA [SEQ ID No.: 153], which confers resistance to spectinomycin; and a restriction site for insertion of a gene. A DNA molecule [SEQ ID No.: 149] encoding a theophylline translational riboswitch was placed in front of Cas9 using a long forward primer:

5'-AAGTCTAGCGAACCGCACTTAATACGACTCACTATAGGTACCGGTGATACCAGCATCGTCTTGATGCCCTTGGCAGCACCCTGCTAAGGTAACAACAAGATGATGGATAAGAAATACTCAATAGGCTTAGATATCGGCAC-3' [SEQ ID No.: 156].

Additionally, a sigma70 constitutive promoter (J23100; as defined by http://parts.igem.org/Promoters/Catalog/Anderson) was introduced, in place of the endogenous spCas9 promoter, using a reverse primer:

5'-ctctagTagctagcactgtacctaggactgagctagccgtcaaGTTAGCTGTGCTCTAGAAGCTAGCAG-3' [SEQ ID No.: 157]
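As a quick consistency check on the reverse primer, the sketch below confirms that its 5' tail carries the reverse complement of the J23100 promoter, which is how a reverse primer can introduce that promoter into the amplicon. The J23100 sequence used here is assumed to be the entry in the iGEM Anderson promoter collection (verify against the registry before relying on it); the primer is SEQ ID No.: 157 with the extraction spaces removed.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive, returns upper case)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Assumed J23100 sequence (iGEM Anderson collection); verify against the registry.
J23100 = "TTGACGGCTAGCTCAGTCCTAGGTACAGTGCTAGC"

# Reverse primer [SEQ ID No.: 157] with extraction spaces removed.
REVERSE_PRIMER = "ctctagTagctagcactgtacctaggactgagctagccgtcaaGTTAGCTGTGCTCTAGAAGCTAGCAG"

promoter_rc = reverse_complement(J23100)
position = REVERSE_PRIMER.upper().find(promoter_rc)
print(promoter_rc)  # GCTAGCACTGTACCTAGGACTGAGCTAGCCGTCAA
print(position)     # 8 (0-based; -1 would mean the promoter is absent)
```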

Plasmid pDual3, for arabinose-inducible sgRNA expression, was constructed using USER cloning (Genee et al. 2014) to insert the respective genes into the backbone of plasmid pSEVA3610, described by Martinez-Garcia et al., 2015. The chimeric gRNA-encoding sequence [SEQ ID No.: 161] with a TrrnB terminator [SEQ ID No.: 162] and a chloramphenicol resistance gene (CmR) [SEQ ID No.: 164] were cloned downstream of the inducible pBAD promoter [SEQ ID No.: 158] and the araC gene [SEQ ID No.: 159], which encodes an L-arabinose-inducible transcription factor. In the absence of L-arabinose, the AraC protein binds to operator sites within pBAD, effectively repressing transcription. On addition of L-arabinose, AraC binds L-arabinose to form a complex with a DNA-binding conformation that activates pBAD, inducing transcription of its cognate genes, namely the sgRNA and CmR. The plasmid comprises a low-copy-number origin of replication, p15A [SEQ ID No.: 163]. The first twenty nucleotides of the encoded gRNA transcript are complementary to nucleotides 96-115 of the + strand of the CmR gene [SEQ ID No.: 164]:

Positive-strand nucleotides 96-115: CTATAACCAGACCGTTCAGC [SEQ ID No.: 166]
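Two details in the preceding paragraph are easy to get wrong in practice: "nucleotides 96-115" is a 1-based, inclusive range (20 nt), and a guide that is complementary to the + strand pairs antiparallel with it, so its spacer corresponds to the reverse complement of SEQ ID No.: 166 written as RNA. The sketch below shows both steps. The flanking "N" placeholder sequence is hypothetical (the full CmR sequence, SEQ ID No.: 164, is not reproduced here), and the derived spacer is an inference from standard base pairing, not the sequence given in SEQ ID No.: 161.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def slice_1based(seq: str, first: int, last: int) -> str:
    """Nucleotides first..last of seq, using 1-based inclusive coordinates."""
    return seq[first - 1:last]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper case A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def spacer_rna_for_plus_strand(target_plus: str) -> str:
    """RNA spacer that base-pairs (antiparallel) with the given + strand segment."""
    return reverse_complement(target_plus).replace("T", "U")


TARGET_96_115 = "CTATAACCAGACCGTTCAGC"  # SEQ ID No.: 166

# Hypothetical stand-in for the CmR gene: placeholder bases around the real segment.
cmr_placeholder = "N" * 95 + TARGET_96_115 + "N" * 30

segment = slice_1based(cmr_placeholder, 96, 115)
print(len(segment), segment == TARGET_96_115)  # 20 True
print(spacer_rna_for_plus_strand(segment))     # GCUGAACGGUCUGGUUAUAG
```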

1.2 Construction of positive and negative control plasmids for testing the genetic circuit

Control plasmids for testing the genetic circuit include a positive control plasmid (left-hand plasmid in Figure 2A) comprising an acrIIA2 gene [SEQ ID No.: 159] capable of expressing the anti-CRISPR protein AcrIIA2, and a negative control plasmid (Figure 2B) comprising a gfp gene [SEQ ID No.: 140] with an rrnB T1 terminator [SEQ ID No.: 142] capable of expressing GFP; each gene is under the control of a constitutive pTET promoter (lacking the TET repressor) and RBS [SEQ ID No.: 139]. Both plasmids have a ColE1 ori [SEQ ID No.: 143] and a kanamycin resistance gene [SEQ ID No.: 144]. When the control plasmids are transformed into host cells comprising the genetic circuit (i.e. the pCasens3 and pDual3 plasmids described in Example 1.1), expression of the anti-CRISPR protein AcrIIA2 is capable of inhibiting Cas9-mediated inactivation of the CmR gene.

1.3 Host cells transformed with pCasens3 and pDual3 plasmids

E. coli TOP10 cells were made electrocompetent and transformed with the genetic circuit comprising the plasmids pCasens3 and pDual3 (obtained in Example 1.1) as follows:

i. A 40 µl sample of electrocompetent E. coli TOP10 cells was transferred to a microcentrifuge tube, to which a 1 µl sample of pCasens3 and pDual3 DNA was added; this DNA-cell mixture was then transferred to a cold cuvette and electroporated with a pulse of 1.8 kV, 200 ohms and 25 µF.

ii. Immediately thereafter, 975 µl of 37°C SOC [Ausubel F.M. et al. (1987)] was added to the electroporated cells, which were then incubated on rollers at 37°C for 1 h.

iii. The recovered transformed cells were diluted 10^0 to 10^8, plated on selective solid media comprising Luria Broth agar supplemented with 50 µg/mL spectinomycin, 30 µg/mL chloramphenicol and 50 µg/mL kanamycin, and incubated overnight at 37°C.

iv. Transformed cell colonies comprising the pCasens3 and pDual3 plasmids (E. coli genetic circuit [E. coli-GC]) were selected, amplified and stored.

1.4 E. coli-GC cells transformed with control plasmids

E. coli TOP10 cells containing the genetic circuit comprising the plasmids pCasens3 and pDual3 (E. coli-GC) were made electrocompetent and transformed with either the negative control plasmid expressing GFP or the positive control plasmid expressing AcrIIA2, as described in Example 1.2, employing the transformation protocol described in Example 1.3 but with E. coli-GC as the host cells.

Example 2 E. coli-GC host cells provide an effective screening tool for polypeptide inhibitors of CRISPR-Cas associated activity

2.1 E. coli-GC host cells detect proteins having anti-CRISPR activity

E. coli-GC host cells transformed with the positive control plasmid expressing AcrIIA2 (Figure 2A) or the negative control plasmid expressing GFP (Figure 2B) were screened for anti-CRISPR activity as follows:

i. Overnight cultures of said E. coli host cells were prepared by culture in 3 ml 2xYT media supplemented with antibiotics (50 µg/mL spectinomycin, 30 µg/mL chloramphenicol and 50 µg/mL kanamycin) at 37°C and 250 rpm;

ii. said cultures were used to prepare dilutions from 10^0 to 10^7, and 5 µl volumes of each dilution were spotted onto a 1st and a 2nd set of agar plates, to determine cell death and cell viability respectively, having the following agar compositions:

1st set of plates, used to determine cell death: LB agar supplemented with 2 mM theophylline, 1% arabinose and antibiotics (50 µg/mL spectinomycin, 30 µg/mL chloramphenicol and 50 µg/mL kanamycin);

2nd set of plates, used to determine cell viability: LB agar supplemented with 2 mM theophylline, 1% arabinose and antibiotics (50 µg/mL spectinomycin and 50 µg/mL kanamycin).
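The two plate sets allow a survival ratio to be computed: colonies on plates with chloramphenicol represent cells in which the CmR gene escaped Cas9 cleavage, while colonies on plates without chloramphenicol count all viable cells. The sketch below assumes the usual CFU formula (colonies × dilution factor ÷ volume plated) and the 5 µl spot volume of step ii; the colony counts are hypothetical and are not data from Figure 3.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, volume_ul: float) -> float:
    """Colony-forming units per mL of undiluted culture.

    dilution_factor is e.g. 1e5 for the 10^5-fold dilution that was spotted.
    """
    return colonies * dilution_factor / (volume_ul / 1000.0)


def survival_fraction(cfu_with_cm: float, cfu_without_cm: float) -> float:
    """Fraction of viable cells that kept chloramphenicol resistance under induction."""
    return cfu_with_cm / cfu_without_cm


# Hypothetical counts from 5 uL spots (not measured data).
cfu_selective = cfu_per_ml(colonies=12, dilution_factor=1e5, volume_ul=5)  # 1st set (+Cm)
cfu_viable = cfu_per_ml(colonies=15, dilution_factor=1e7, volume_ul=5)     # 2nd set (-Cm)
print(f"{cfu_selective:.2e} CFU/mL with Cm, {cfu_viable:.2e} CFU/mL without Cm")
print(f"survival fraction: {survival_fraction(cfu_selective, cfu_viable):.4f}")
```

A survival fraction near 1 would indicate strong inhibition of Cas9, while a fraction far below 1 would indicate efficient Cas9-mediated loss of chloramphenicol resistance.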

The survival of E. coli-GC host cells comprising the negative control plasmid and grown under conditions that induce expression of the genetic circuit is significantly reduced (see Figure 3B). However, expression of AcrIIA2 enables cell growth and survival, consistent with the known anti-CRISPR properties of this protein (Figure 3A and B).

Example 3 Construction and screening of Metagenomic Libraries for anti-CRISPR proteins

3.1 Construction of expression metagenome libraries from diverse sources

Functional metagenomic libraries from human, cow and pig fecal samples and soil samples were constructed as described previously [Sommer et al. 2009; Genee et al., 2016]. The steps of the procedure were as follows:

i) total DNA was isolated from 5 g of each fecal sample and of a soil sample using the PowerMax Soil DNA Isolation Kit (Mobio Laboratories Inc.), after which the total DNA isolated from each sample was treated as follows:

ii) the extracted DNA, in a total sample volume of 200 µl in minitubes, was fragmented by sonication into fragments of an average size of 4-5 kb using a Covaris E210 (Massachusetts, USA);

iii) the sheared DNA (200 µl) was size-selected by separation on a 1% agarose gel with DNA dye, and DNA migrating in the 3-6 kb region was extracted using a Qiagen gel extraction kit;

iv) the gel-extracted DNA fragments were end-repaired using an End-It end-repair kit (Epicentre) and cleaned up using a PCR clean-up column;

v) the end-repaired fragments were blunt-end cloned into the HincII site, downstream of a constitutive pTET promoter (lacking the tet repressor) and RBS [SEQ ID No.: 139], in 100-200 ng of the pZE21 expression plasmid at a 1:5 ratio [Lutz and Bujard, 1997], using the Fast-Link ligation kit (Epicentre) overnight; the reaction was cleaned with a spin column and eluted twice in 15 µl;

vi) 2 µl of the cleaned ligation mix was transformed into 50 µl E. coli TOP10 cells by electroporation, and the cells in the transformation mix were recovered in 1 ml SOC medium for 2 h; multiple electroporations were used to increase library size;

vii) the transformation mix was plated at 1:100 and 1:1000 dilutions on plates containing 50 µg/ml kanamycin to create a library;

viii) the library was amplified by inoculation into 10 ml LB with 50 µg/ml kanamycin and grown overnight; 5 ml 50% glycerol was added, and the library was aliquoted into 1 ml portions and frozen at -80°C.
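Step v specifies 100-200 ng of vector at a 1:5 ratio. Reading this as a vector:insert molar ratio of 1:5 (an assumption; the text does not say whether the ratio is molar or by mass), the insert mass follows from scaling by relative length, since moles of double-stranded DNA are proportional to mass divided by length. The vector length below is a hypothetical placeholder, not the actual size of pZE21, and the insert length uses the 4-5 kb average fragment size from step ii.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector_molar: float = 5.0) -> float:
    """Insert mass giving the requested insert:vector molar ratio.

    Moles of dsDNA scale with mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * insert_to_vector_molar.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector_molar


VECTOR_BP = 2200  # hypothetical placeholder for the pZE21 backbone length
INSERT_BP = 4500  # mid-point of the 4-5 kb average fragment size from step ii

for vector_ng in (100, 200):
    ng = insert_mass_ng(vector_ng, VECTOR_BP, INSERT_BP)
    print(f"{vector_ng} ng vector -> {ng:.0f} ng insert (1:5 vector:insert, molar)")
```

If the 1:5 ratio was instead intended by mass, no length scaling would apply and the insert mass would simply be five times the vector mass.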

In brief, plasmid DNA comprising the amplified metagenomic library was extracted and recovered using a Macherey-Nagel plasmid DNA purification kit as follows:

1. An overnight culture of bacteria comprising the metagenomic library was grown in 3 ml LB medium supplemented with 50 µg/mL kanamycin;

2. the culture was centrifuged, and the pelleted bacteria were resuspended in buffer, to which a denaturing solution was added;

3. the proteins and genomic DNA were pelleted by centrifugation, and the plasmid-containing supernatant was recovered and purified through the NucleoSpin® Plasmid column of the kit; the bound plasmid DNA was eluted from the column using water or a neutral buffer such as Tris-EDTA, to provide plasmid DNA comprising the metagenomic DNA expression library.

3.2 Transformation and screening of the metagenomic DNA expression library in host cells comprising the genetic circuit

E. coli TOP10 cells containing the genetic circuit comprising the plasmids pCasens3 and pDual3 (E. coli-GC) were made electrocompetent and transformed with plasmids for expression of each metagenomic library (obtained in Example 3.1) as follows:

i. E. coli TOP10 cells (supplied by Thermo Fisher) containing the plasmids pCasens3 and pDual3 were cultured in 3 ml of 2xYT media with 50 µg/mL spectinomycin and 30 µg/mL chloramphenicol for 16-18 hours at 37°C and 250 rpm;

ii. the culture from (i) was inoculated 1:100 into 3 ml of pre-warmed 2xYT media with 50 µg/mL spectinomycin and 30 µg/mL chloramphenicol and cultured at 37°C and 250 rpm until the cell density reached an OD600 of 0.5-0.7;

iii. the cultures from (ii) were cooled on ice, and the cells were harvested by centrifugation at 10,000 × g for 1 min;

iv. the cells from (iii) were washed by resuspension in cold water and harvested again by centrifugation; the cells were then resuspended in 1 ml glycerol and harvested by centrifugation; finally, the cells were resuspended in the residual water and placed on ice, ready for electroporation.

v. A 40 µl sample of the electrocompetent cells from (iv) was transferred to a microcentrifuge tube, to which a 1 µl sample of the metagenomic library was added; this DNA-cell mixture was then transferred to a cold cuvette and electroporated with a pulse of 1.8 kV, 200 ohms and 25 µF.

vi. Immediately thereafter, 975 µl of 37°C SOC [Ausubel F.M. et al. (1987)] was added to the electroporated cells from (v), which were mixed, transferred to a 15 ml tube and incubated on rollers at 37°C for 1 h;

vii. serial dilutions (10^0 to 10^8) of the recovered cells from (vi) were made, and 100 µl of each dilution was plated on selective solid media comprising Luria Broth agar supplemented with 2 mM theophylline, 1% arabinose, 50 µg/mL spectinomycin, 30 µg/mL chloramphenicol and 50 µg/mL kanamycin, and incubated overnight at 37°C.

viii. Colonies growing on the plates were collected by adding 1 ml H2O, after which the colonies were scraped off the plate with a sterile loop. The bacterial cells were then pelleted and washed in water by centrifugation, and the plasmid DNA in the collected bacterial cells was extracted using a Macherey-Nagel plasmid DNA purification kit.

3.3 Detection of metagenomic DNA encoding proteins having anti-CRISPR activity

E. coli host cells containing the genetic circuit, transformed with the library of metagenomic DNA expression plasmids (Figure 5) prepared as described in Example 3.1, were screened for expression of proteins having anti-CRISPR activity (as described in Example 3.2 and Figure 4). Colonies that appeared on the selective growth medium were selected and pooled; their metagenomic inserts were extracted, barcoded per library and sequenced using nanopore technology to obtain full-length inserts (22). The resulting contigs were annotated with blastx and manually curated. The selected metagenomic DNA inserts (39 inserts) were each re-cloned into the metagenomic DNA expression plasmid (Figure 5, left-hand plasmid) and then transformed into cells of the E. coli strain containing the genetic circuit comprising the plasmids pCasens3 and pDual3 (E. coli-GC). Eleven of the individually tested inserts showed anti-CRISPR activity greater than that of the negative control (GFP), and several showed activity similar to that of AcrIIA2 in the selection assay (Figure 6).

3.4 Identification of ORFs for polypeptides having ACR activity

Putative ORFs in inserts with ACR activity were identified using MetaGeneMark v3.25 (Zhu, W., et al., 2010) with the command 'gmhmmp -m MetaGeneMark_v1.mod -f G'. In parallel, the inserts were re-annotated with blastx using the NT database (accessed on 17 May 2017). Based on the combined annotation, 16 potentially biologically active ORFs were manually identified (i.e. >80% of the subject gene present, with no missing N-terminus), and USER primers were designed to clone the 16 identified ORFs into the pNIC28-Bsa4 plasmid (Savitsky, P., et al., 2010).
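The completeness criterion used to shortlist ORFs (>80% of the subject gene present, no missing N-terminus) can be applied mechanically to blastx tabular output. The sketch below assumes BLAST+ was run with -outfmt "6 qseqid sseqid sstart send slen", so that subject coordinates and subject length are in amino acids, and treats "no missing N-terminus" as the alignment starting at subject residue 1 (an assumption; a small tolerance is exposed as a parameter). The names and example hits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlastxHit:
    query_id: str
    subject_id: str
    subject_start: int   # 1-based first aligned residue of the subject protein
    subject_end: int     # 1-based last aligned residue of the subject protein
    subject_length: int  # full length of the subject protein (aa)

    @property
    def subject_coverage(self) -> float:
        return (self.subject_end - self.subject_start + 1) / self.subject_length


def parse_outfmt6(lines):
    """Parse tab-separated lines with columns: qseqid sseqid sstart send slen."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, sstart, send, slen = line.rstrip("\n").split("\t")
        hits.append(BlastxHit(qseqid, sseqid, int(sstart), int(send), int(slen)))
    return hits


def is_plausibly_complete(hit: BlastxHit, min_coverage: float = 0.8,
                          n_term_tolerance: int = 0) -> bool:
    """More than min_coverage of the subject aligned, reaching its N-terminus."""
    return (hit.subject_coverage > min_coverage
            and hit.subject_start <= 1 + n_term_tolerance)


# Hypothetical blastx rows (query ORF, subject protein, sstart, send, slen).
rows = [
    "orf_1\tprotein_A\t1\t180\t200",   # 90% coverage, starts at residue 1 -> kept
    "orf_2\tprotein_B\t15\t200\t200",  # 93% coverage but N-terminus missing -> dropped
    "orf_3\tprotein_C\t1\t120\t200",   # 60% coverage -> dropped
]
kept = [h.query_id for h in parse_outfmt6(rows) if is_plausibly_complete(h)]
print(kept)  # ['orf_1']
```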

3.5 Expression of novel ACR polypeptides

Nine of the ORFs cloned in the pNIC28-Bsa4 plasmid (from 3.4) were expressed in E. coli strain BL21 (AI) grown in 2xYT medium at 18°C for 16 hours following induction with 1% arabinose. Proteins were purified by a combination of affinity, ion exchange and size exclusion chromatography steps.

Briefly, cells were lysed by three passes through an EmulsiFlex-C5 homogenizer (Avestin, Mannheim, Germany) at 10,000-15,000 psi, and debris and unbroken cells were removed by centrifugation at 18,000 × g at 4°C for 30 minutes. The supernatant was loaded onto nickel-nitrilotriacetic acid (Ni2+-NTA) resin columns (HisTrap, GE Healthcare, Chicago, IL, USA) on an ÄKTA pure system connected to an F9-C fraction collector (GE Healthcare). The expressed proteins were eluted by increasing the imidazole concentration stepwise to 25 mM, 50 mM, 75 mM and finally 500 mM. After pooling and concentration, protein samples were buffer-exchanged into IEX start buffer. Anion exchange was performed on a HiTrap Q FF column (GE Healthcare) in 20 mM phosphate buffer pH 7.0 (AC12-1, AC19-2, AC28-1, AC42-1, AcrIIA2 and GFP) or 20 mM Tris-HCl pH 8.0 (AC23-2 and AC27-1). Cation exchange was performed on a HiTrap SP FF column (GE Healthcare) in 20 mM phosphate buffer pH 7.0 (AC23-1 and AC27-2). The fractions containing the protein of interest were pooled, concentrated, flash-frozen and stored at -80°C. Purity was analyzed on a Coomassie-stained SDS-PAGE gel using ImageQuant TL software (GE Healthcare).

MBP-Cas9 was expressed from plasmid pMJ806 and purified essentially as described (41), with some modifications. After expression and His-tag affinity purification as described above, MBP-Cas9 was further purified using an MBPTrap HP column (GE Healthcare). After cleavage with AcTEV protease (ThermoFisher Scientific, Waltham, MA, USA) during overnight dialysis and negative His-tag affinity purification, the sample was loaded onto a Superdex 200 Increase 10/300 GL column (GE Healthcare) equilibrated with 50 mM Tris-HCl pH 7.5, 150 mM NaCl. The fractions containing Cas9 were pooled, concentrated and biotinylated using the EZ-Link™ Sulfo-NHS-LC-Biotinylation Kit (ThermoFisher Scientific). After buffer exchange, samples were flash-frozen and stored at -80°C.

3.6 Functional characterisation of novel ACR polypeptides

The nine expressed proteins were functionally characterized as follows:

An in vitro DNA cleavage assay was carried out as described by Pawluk et al. (2016), with the following modifications. SpyCas9 (New England Biolabs) (100 nM), gRNA (in vitro transcribed) (100 nM) and purified anti-CRISPR protein were mixed together in cleavage buffer (20 mM HEPES-KOH (pH 7.5), 75 mM KCl, 10% glycerol, 1 mM DTT and 10 mM MgCl2) and incubated for 30 min. PCR-amplified target DNA (10 nM) was then added, and the mixture was incubated for 10 min to allow cleavage. The reaction was stopped by adding proteinase K and incubating at 60°C for 15 min. The cleaved and uncleaved fractions of the target DNA were visualized on a 1% agarose gel.
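The gel readout of this assay can be quantified from band intensities: because the two cleavage products together carry the same DNA mass as the uncleaved target, the cleaved fraction is the summed product intensity divided by the total lane intensity. The sketch below assumes background-subtracted band intensities (for example from densitometry) and expresses inhibition relative to a reaction without anti-CRISPR protein; the intensities are hypothetical, as the source reports this result only as a gel image (Figure 7A).

```python
def fraction_cleaved(product_intensities, uncleaved_intensity: float) -> float:
    """Fraction of target DNA cleaved, from background-subtracted band intensities."""
    cleaved = sum(product_intensities)
    return cleaved / (cleaved + uncleaved_intensity)


def percent_inhibition(frac_with_acr: float, frac_without_acr: float) -> float:
    """Reduction in cleavage relative to the Cas9:gRNA reaction without anti-CRISPR."""
    return 100.0 * (1.0 - frac_with_acr / frac_without_acr)


# Hypothetical intensities (two cleavage products, then the uncleaved band).
no_acr = fraction_cleaved([400.0, 300.0], 100.0)  # 700 / 800
with_acr = fraction_cleaved([40.0, 30.0], 730.0)  # 70 / 800
print(f"cleaved without Acr: {no_acr:.3f}, with Acr: {with_acr:.3f}")
print(f"inhibition: {percent_inhibition(with_acr, no_acr):.1f}%")
```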

Four expressed proteins (designated AC19-2, AC23-2, AC27-1 and AC42-1) were identified as ACR polypeptides that inhibited cleavage of a target DNA molecule by the spCas9:gRNA complex in the in vitro assay (Figure 7A).

The in vitro direct binding assay to spCas9 was carried out using biolayer interferometry. Equimolar gRNA was mixed with biotinylated Cas9 and incubated at 25°C for 15 min to form a biotinylated Cas9:gRNA complex. Streptavidin biosensors (Pall ForteBio) were pre-equilibrated in PBS buffer for 600 s, loaded with biotinylated Cas9:gRNA complex at optimal concentrations and times, and brought to baseline in kinetics buffer (1X PBS, 0.02% Tween-20, 0.1% BSA, 75 mM KCl, 10 mM MgCl2) for 300 s. Association with anti-CRISPR proteins was measured in the same kinetics buffer for 600 s, and dissociation was then measured in kinetics buffer without anti-CRISPR proteins for 1000 s. All biolayer interferometry experiments were performed on an Octet RED96 system (Pall ForteBio) in 96-well microplates at 30°C with a 200 µl volume. Binding kinetics were calculated using ForteBio Data Analysis v7.1 software by fitting the association and dissociation data to a 1:1 model.
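The 1:1 model referred to here is the standard Langmuir binding scheme: during association the response approaches an equilibrium level with an observed rate k_obs = k_on·C + k_off, during dissociation it decays as exp(-k_off·t), and Kd = k_off/k_on. The sketch below simulates both phases with the 600 s and 1000 s durations used above and recovers k_off from the dissociation trace by log-linear regression, taking k_on as known for brevity. The rate constants and the 50 nM analyte concentration are hypothetical illustration values, not the measured kinetics, and the log-linear fit is a simplification of the nonlinear curve fitting typically performed by BLI analysis software.

```python
import numpy as np


def association(t, conc, kon, koff, rmax):
    """1:1 association phase: R(t) = Req * (1 - exp(-(kon*C + koff) * t))."""
    kobs = kon * conc + koff
    req = rmax * conc / (conc + koff / kon)  # equilibrium response, Kd = koff/kon
    return req * (1.0 - np.exp(-kobs * t))


def dissociation(t, r0, koff):
    """1:1 dissociation phase in analyte-free buffer: R(t) = R0 * exp(-koff * t)."""
    return r0 * np.exp(-koff * t)


def fit_koff(t, response):
    """Estimate koff as minus the slope of ln(response) versus time."""
    slope, _intercept = np.polyfit(t, np.log(response), 1)
    return -slope


# Hypothetical illustration values (not measured kinetics).
KON = 1.0e5    # M^-1 s^-1
KOFF = 1.0e-3  # s^-1
CONC = 50e-9   # M analyte (anti-CRISPR protein)

t_assoc = np.linspace(0.0, 600.0, 601)     # 600 s association
r_assoc = association(t_assoc, CONC, KON, KOFF, rmax=1.0)
t_dissoc = np.linspace(0.0, 1000.0, 1001)  # 1000 s dissociation
r_dissoc = dissociation(t_dissoc, r_assoc[-1], KOFF)

koff_est = fit_koff(t_dissoc, r_dissoc)
print(f"koff = {koff_est:.2e} s^-1, Kd = {koff_est / KON * 1e9:.1f} nM")
```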

The proteins AC19-2, AC27-1 and AC42-1 showed binding affinity to the Cas9:gRNA complex when Mg2+ was added to the running buffer (Figure 7B). The observed Kd for AC19-2, AC27-1 and AC42-1 was 47, 12 and 810 nM, respectively. Surprisingly, AC27-1 showed a stronger binding affinity to the Cas9:gRNA complex than AcrIIA2. The absence of detectable binding between AC27-1 and spCas9, and the lower binding affinity detected between AC42-1 and the Cas9:gRNA complex using biolayer interferometry, suggest that these two ACR proteins inhibit CRISPR-Cas by a different mode of action. The four proteins that clearly display anti-spCas9 activity in the in vivo and in vitro assays were renamed AcrIIA6 (AC19-2), AcrIIA7 (AC23-2), AcrIIA8 (AC27-1) and AcrIIA9 (AC42-1) (Figure 7B). AcrIIA6 originates from a soil metagenomic library, whereas AcrIIA7, AcrIIA8 and AcrIIA9 are derived from human gut fecal metagenomic libraries.

Example 4 The ACRs of the invention are distinct families of ACR IIA proteins

The four ACR proteins of the invention (AcrIIA6, AcrIIA7, AcrIIA8 and AcrIIA9) were classified as AcrIIA proteins based on their demonstrated ability to inhibit CRISPR-Cas systems belonging to Class 2 subtype IIA. Each of the four ACR proteins is a representative of a new family of AcrIIA proteins; the representatives share less than 25% sequence identity with each other and with any of the AcrIIA2-5 proteins (Figure 8). Genes encoding the four new AcrIIA6-9 families were identified by interrogating metagenomic datasets. The AcrIIA6-9 protein families were found to be structurally distinct families of AcrIIA, the members of each family sharing a high degree of structural homology, as seen from the cluster analysis shown in Figure 9.

Computational analysis of the distribution of the distinct AcrIIA gene families in reference genomes revealed a surprisingly diverse host range, in strong contrast to the known AcrIIA2-5, which are all confined to Firmicutes genomes (Figure 10A). Furthermore, the AcrIIA6-9 genes and their homologues were both more abundant and more diversely distributed across multiple environments than previously characterized AcrIIAs (Figure 10B).

4.1 Structural characteristics of ACRIIA 8

Fifty-two homologous AcrIIA8 genes were identified in 68 reference genomes of 38 unique species of Firmicutes, comprising about 1% of the 3257 Firmicutes species. An AcrIIA8 homologue was also found in a Listeria monocytogenes genome, co-existing with AcrIIA1, AcrIIA2 and AcrIIA4. All of the identified AcrIIA8 homologues share a viral (prophage) origin, since they were all localized within highly conserved prophage regions in the genomes of diverse Firmicutes family members (Figure 11). Based on both the highly conserved synteny of the viral fragments comprising the AcrIIA8 genes and the conserved amino acid sequence of the encoded proteins, it is concluded that the proteins encoded by the AcrIIA8 genes shown in Figure 11 are AcrIIA8 proteins having anti-CRISPR-Cas activity.

4.2 Structural characteristics of ACRIIA 6

Homologues of the soil-derived AcrIIA6 gene were found in the genomes of the soil isolates Cellulomonas carbonis T26 (Actinobacteria) and Sinorhizobium sp. GL28 (Proteobacteria), as seen in Figures 10 and 13. Both genes had a very high degree of homology to the AcrIIA6 gene (94% identity at the nucleotide level), and their encoded proteins to the ACRIIA6 protein (99% sequence identity).

4.3 Structural characteristics of ACRIIA 7

Homologues of the AcrIIA7 gene are widely distributed across 6 distinct phyla (Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, Cyanobacteria and Spirochaetes), but the majority are carried by Firmicutes representatives, predominantly S. pneumoniae strains (56% of all strains). The second most represented phylum is Proteobacteria, including a variety of classes (Alpha-, Beta-, Gamma-, Delta- and Epsilonproteobacteria), as seen in Figure 10. Notably, AcrIIA7 was present on several viral reference genomes, including all three tailed bacteriophage families (i.e. Siphoviridae, Myoviridae and Podoviridae), reflecting its ubiquity. The AcrIIA7 proteins cluster together with Bacteroidetes representatives, B. dorei most likely being the original host (NCBI EL88_22925), while the closest family member is encoded by a gene derived from a human gut fecal virome sample. The AcrIIA7 proteins belong to the uncharacterized DUF2829 domain superfamily (helix-turn-helix XRE-family-like proteins), as seen in Figure 15.

4.4 Structural characteristics of ACRIIA 9

The AcrIIA9 gene and its homologues are widely distributed across two different families of Bacteroidetes, spanning several Bacteroides species, and the respective genes in three Parabacteroides species encode proteins with 100% sequence identity. Homologues were found in genomes from the recently defined phylum Balneolaeota, as well as the phyla Firmicutes, Proteobacteria and Actinobacteria (Figure 10). Representatives from the two latter phyla (Sinorhizobium sp. GL28 and C. carbonis T26, respectively) both carry identical AcrIIA9 homologues together with identical AcrIIA6 homologues, indicating that they provide a functional genetic module to counteract type II CRISPR-Cas immune systems. The AcrIIA9 proteins are members of the uncharacterized PcfK superfamily, and its members show a high degree of sequence conservation, as seen in Figure 14.

Example 5 Detection of ACR activity in human cells

Cells of the human cell line HEK293T stably expressing a Cas9 gene from the AAVS1 site (supplied by GeneCopoeia, MD, USA), with or without transfection with Alt-R crRNA/tracrRNA (supplied by Integrated DNA Technologies, IA, USA) using PolyFect Transfection Reagent (Qiagen, Venlo, Netherlands), were used to observe the effect of Acr proteins on guide RNA-directed, Cas9-mediated gene editing of a target locus in the host cell genome. The candidate Acr proteins were transfected into cells of the human cell line before crRNA transfection, using Xfect protein transfection reagent (Takara, Japan). The cells were then cultured for 24 hours, allowing Cas9-induced double-strand breaks introduced in the targeted locus on the chromosome of the human cell to be repaired by non-homologous end joining (NHEJ), leaving insertion or deletion mutations (indels). A T7E1 assay was then used, according to the manufacturer's protocol (New England Biolabs, MA, USA), to detect indels in the targeted locus in the genomic DNA extracted from the human cells. The assay was performed by amplifying the target locus by PCR, then denaturing and re-annealing the PCR products to create mismatched DNA regions around the target site. T7 endonuclease I was then used to recognize and cleave the mismatched DNA regions, yielding cleaved products that were visualized on an agarose gel.
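T7E1 gels are commonly converted into an approximate indel frequency with the relation indel (%) = 100 × (1 - sqrt(1 - f_cut)), where f_cut is the summed intensity of the cleavage products divided by the total lane intensity. The square root follows from assuming random re-annealing in which only duplexes formed from two unedited strands escape cleavage, so that the uncut fraction is (1 - p)^2 for an indel frequency p. The text does not state how Figure 16 was quantified, so the sketch below is an illustration under that assumption, with hypothetical band intensities.

```python
import math
from typing import Sequence


def t7e1_indel_percent(uncut_intensity: float,
                       cut_intensities: Sequence[float]) -> float:
    """Estimated indel frequency (%) from background-subtracted T7E1 band intensities.

    Assumes random re-annealing in which only unedited:unedited duplexes escape
    cleavage, so the uncut fraction is (1 - p)**2 for an indel frequency p.
    """
    cut = sum(cut_intensities)
    fraction_cut = cut / (cut + uncut_intensity)
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))


# Hypothetical lane: 64 units uncut, cleavage products of 20 and 16 units (f_cut = 0.36).
print(f"{t7e1_indel_percent(64.0, [20.0, 16.0]):.1f}% indels")
```

For this hypothetical lane the estimate is about 20% indels; comparing such estimates between cells with and without a candidate Acr protein would express the reduction in editing numerically.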

A reduction in CRISPR-Cas9-mediated double-strand breaks in the host cells was detected by the T7E1 assay in cells transfected with the Acr proteins AcrIIA6, AcrIIA7, AcrIIA8, AcrIIA9 and AcrIIA2 (Figure 16).
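The result above is reported as a reduction in editing. One way such a reduction could be expressed numerically is to normalize the indel frequency of each Acr-treated sample to a control that received the same crRNA/tracrRNA but no Acr protein. The source does not describe this calculation; the helper below is a hypothetical sketch, and the inputs are assumed to be indel frequencies (for example, T7E1-based estimates in %).

```python
def percent_inhibition(indel_with_acr: float, indel_without_acr: float) -> float:
    """Percent reduction in indel frequency attributable to an Acr protein.

    indel_with_acr    -- indel frequency in cells given crRNA/tracrRNA and the Acr
    indel_without_acr -- indel frequency in cells given crRNA/tracrRNA, no Acr
    (hypothetical control design; both values in the same units, e.g. %)
    """
    if indel_without_acr <= 0:
        raise ValueError("control shows no editing; inhibition is undefined")
    return 100.0 * (1.0 - indel_with_acr / indel_without_acr)


# Placeholder values for illustration, not results from Figure 16.
print(percent_inhibition(10.0, 40.0))
```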

References

Ausubel, F. M., et al. (1987). Current Protocols in Molecular Biology.

Esvelt, K. M., Mali, P., Braff, J. L., Moosburner, M., Yaung, S. J., & Church, G. M. (2013). Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nature Methods, 10(11), 1116-1121. https://doi.org/10.1038/nmeth.2681

Genee, H. J., Bonde, M. T., Bagger, F. O., Jespersen, J. B., Sommer, M. O. A., Wernersson, R., & Olsen, L. R. (2014). Software-supported USER cloning strategies for site-directed mutagenesis and DNA assembly. ACS Synthetic Biology, 4(3), 342-349. https://doi.org/10.1021/sb500194z

Genee, H. J., Bali, A. P., Petersen, S. D., Siedler, S., Bonde, M. T., Gronenberg, L. S., Kristensen, M., Harrison, S. J., & Sommer, M. O. A. (2016). Functional mining of transporters using synthetic selections. Nature Chemical Biology, 12, 1015-1022.

Hug, L. A., Baker, B. J., Anantharaman, K., Brown, C. T., Probst, A. J., Castelle, C. J., Butterfield, C. N., Hernsdorf, A. W., Amano, Y., Ise, K., Suzuki, Y., Dudek, N., Relman, D. A., Finstad, K. M., Amundson, R., Thomas, B. C., & Banfield, J. F. (2016). A new view of the tree of life. Nature Microbiology, 1, 1-6.

Koonin, E. V., Makarova, K. S., & Zhang, F. (2017). Diversity, classification and evolution of CRISPR-Cas systems. Current Opinion in Microbiology, 37, 67-78. https://doi.org/10.1016/j.mib.2017.05.008

Lutz, R., & Bujard, H. (1997). Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Research, 25, 1203-1210.

Martínez-García, E., Aparicio, T., Goñi-Moreno, A., Fraile, S., & de Lorenzo, V. (2015). SEVA 2.0: An update of the Standard European Vector Architecture for de-/re-construction of bacterial functionalities. Nucleic Acids Research, 43(D1), D1183-D1189. https://doi.org/10.1093/nar/gku1114

Medema, M. H., Takano, E., & Breitling, R. (2013). Detecting sequence homology at the gene cluster level with MultiGeneBlast. Molecular Biology and Evolution, 30, 1218-1223.

Pawluk, A., Amrani, N., Zhang, Y., Garcia, B., Hidalgo-Reyes, Y., Lee, J., Edraki, A., Shah, M., Sontheimer, E. J., Maxwell, K. L., & Davidson, A. R. (2016). Naturally occurring off-switches for CRISPR-Cas9. Cell, 167, 1829-1834.e9.

Savitsky, P., Bray, J., Cooper, C. D. O., Marsden, B. D., Mahajan, P., Burgess-Brown, N. A., & Gileadi, O. (2010). High-throughput production of human proteins for crystallization: the SGC experience. Journal of Structural Biology, 172, 3-13.

Sommer, M. O., Dantas, G., & Church, G. M. (2009). Functional characterization of the antibiotic resistance reservoir in the human microflora. Science, 325, 1128-1131.

Zhu, W., Lomsadze, A., & Borodovsky, M. (2010). Ab initio gene identification in metagenomic sequences. Nucleic Acids Research, 38, 1-15.