Title:
POOLS OF MICROBIAL PROTEIN FRAGMENTS
Document Type and Number:
WIPO Patent Application WO/2022/162355
Kind Code:
A1
Abstract:
The disclosure concerns a method for producing a pool of fragments derived from a microbial protein. The disclosure also concerns a pool of fragments derived from a microbial protein, and a method for determining the presence or absence of immune cells targeting a microbe.

Inventors:
COCHRANE DANIEL (GB)
Application Number:
PCT/GB2022/050199
Publication Date:
August 04, 2022
Filing Date:
January 26, 2022
Assignee:
OXFORD IMMUNOTEC LTD (GB)
International Classes:
C07K7/00; G01N33/569; G16B30/10
Domestic Patent References:
WO2001074130A2, 2001-10-11
WO2016024129A1, 2016-02-18
WO2017168135A1, 2017-10-05
Other References:
MEZIERE ET AL., J. IMMUNOL., vol. 159, 1997, pages 3230 - 3237
Attorney, Agent or Firm:
J A KEMP LLP (GB)
Claims:
Claims

1. A method for producing a pool of fragments derived from a microbial protein, comprising:

(a) identifying fragments of the microbial protein that are comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein;

(b) determining for each fragment identified in step (a) whether or not a homolog exists, wherein the homolog is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; and

(c) preparing a pool of fragments in which:

(i) each fragment is a fragment identified in step (a) for which step (b) determines the existence of a homolog; or

(ii) each fragment is a fragment identified in step (a) for which step (b) does not determine the existence of a homolog, and the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein.

2. A pool of fragments derived from a microbial protein, wherein:

(I) each fragment is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and has a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; or (II) the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and each fragment does not have a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived.

3. The pool of claim 2, produced according to the method of claim 1.

4. The method of claim 1, or the pool of claim 2 or 3, wherein the pool comprises fragments whose sequences overlap, optionally wherein the sequences overlap by 11 amino acids.

5. The method of claim 1 or 4, or the pool of any one of claims 2 to 4, wherein the fragments are 15 amino acids in length.

6. The method of claim 1, 4 or 5, or the pool of any one of claims 2 to 5, wherein the microbe from which the microbial protein is derived is an emerging pathogen.

7. The method of any one of claims 1 and 4 to 6, or the pool of any one of claims 2 to 6, wherein one or more of the microbes expressing the homolog is endemic within a population.

8. The method of any one of claims 1 and 4 to 7, or the pool of any one of claims 2 to 7, wherein the microbe from which the microbial protein is derived and the microbe expressing the homolog are each capable of infecting the same species.

9. The method or pool of claim 8, wherein the species is human.

10. The method of any one of claims 1 and 4 to 9, or the pool of any one of claims 2 to 9, wherein the family is Coronaviridae.

11. The method of any one of claims 1 and 4 to 10, or the pool of any one of claims 2 to 10, wherein the microbe from which the microbial protein is derived is a coronavirus.

12. The method or pool of claim 11, wherein the coronavirus is SARS-CoV-2.

13. The method of any one of claims 1 and 4 to 12, or the pool of any one of claims 2 to 12, wherein one or more of the microbes expressing the homolog is a coronavirus.

14. The method or pool of claim 13, wherein one or more of the microbes expressing the homolog is an endemic human coronavirus.

15. The method or pool of claim 14, wherein one or more of the microbes expressing the homolog is selected from HKU1, OC43, 229E and NL63.

16. The method of any one of claims 1 and 4 to 15, or the pool of any one of claims 2 to 15, wherein the microbial protein is selected from SARS-CoV-2 S1 spike domain, SARS-CoV-2 S2 spike domain, SARS-CoV-2 nucleocapsid protein, SARS-CoV-2 membrane protein, and SARS-CoV-2 envelope protein.

17. A consolidated pool of fragments which comprises two or more pools as defined in any one of claims 2 to 16, wherein each of the two or more pools comprises fragments derived from a different microbial protein, optionally wherein the microbial protein is selected from SARS-CoV-2 S1 spike domain, SARS-CoV-2 S2 spike domain, SARS-CoV-2 nucleocapsid protein, SARS-CoV-2 membrane protein, and SARS-CoV-2 envelope protein.

18. The consolidated pool of claim 17, wherein the pool comprises or consists of the fragments set out in Table 3.

19. A method for determining the presence or absence of immune cells targeting a microbe, the method comprising contacting a sample comprising immune cells with one or more pools as defined in any one of claims 2 to 18, and detecting in vitro the presence or absence of an immune response to the one or more pools.

20. The method of claim 19, wherein the sample is contacted with each of the one or more pools in a separate reaction.

21. The method of claim 19 or 20, wherein the one or more pools comprise:

(a) one or more pools as defined in claim 2(I); and/or

(b) one or more pools as defined in claim 2(II); and/or

(c) one or more pools as defined in claim 17 or 18.

22. The method of any one of claims 19 to 21, wherein each of the one or more pools comprises fragments derived from a different microbial protein.

23. The method of any one of claims 19 to 22, wherein the method further comprises contacting the sample with a pool of fragments derived from a protein from the microbe and detecting in vitro the presence or absence of an immune response to the pool, wherein the fragments in the pool form a protein fragment library encompassing at least 80% of the sequence of the protein.

24. The method of any one of claims 19 to 23, wherein the method further comprises, in a separate reaction, contacting the sample with a pool of fragments derived from a protein from a microbe in the same family as the microbe from which the microbial protein is derived and detecting in vitro the presence or absence of an immune response to the pool, wherein the fragments in the pool form a protein fragment library encompassing at least 80% of the sequence of the protein.

Description:
POOLS OF MICROBIAL PROTEIN FRAGMENTS

Field of the disclosure

The disclosure concerns a method for producing a pool of fragments derived from a microbial protein. The disclosure also concerns a pool of fragments derived from a microbial protein, and a method for determining the presence or absence of immune cells targeting a microbe.

Background

Microbes, such as viruses, bacteria, fungi and protozoa, are a common cause of disease in humans and animals. Some microbial infections may cause mild disease symptoms, and others severe disease or even death.

Immune protection to microbial disease may be elicited in both humans and animals. One mechanism of immune protection involves antibody generation. Another mechanism involves the generation and priming of T cells responsive to the microbe. In either case, an initial encounter with a first microbe may elicit immune protection against a further encounter with that microbe. An initial encounter with a first microbe may also elicit immune protection against a second microbe that is different from the first microbe. In other words, the immune protection elicited in response to the first microbe may be cross-protective against infection with a second microbe.

Cross-protective immunity may exist between related microbes, such as microbes belonging to the same family. For example, cross-protective immunity is thought to exist between different human coronaviruses. Animal data and limited human epidemiological data indicate that T cell mediated immune protection to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mediated disease can be elicited. SARS-CoV-2 responsive T cells may be generated in individuals symptomatically or asymptomatically infected with SARS-CoV-2. Additionally, SARS-CoV-2 responsive T cells have been described in a proportion of the SARS-CoV-2 naive population. These cells are likely primed by infection with the endemic common cold Coronaviridae (CCCs). That is, an initial encounter with an endemic common cold coronavirus may provide cross-protection against a subsequent encounter with SARS-CoV-2.

Microbe-specific immune responses may be characterised using a number of methods known in the art. For example, cell mediated immunity to a microbe may be characterised by contacting a sample containing immune cells with one or more antigens from the microbe, and detecting the presence, absence or characteristics of an immune response to the one or more antigens. Each antigen may, for example, comprise one or more peptides or proteins from the microbe.

While cross-protection may be beneficial to the individual encountering the microbe(s), it can complicate the characterisation of microbe-specific immune responses such as cell mediated immune responses. This can pose challenges to research into, and diagnosis of, microbial diseases. There is therefore a need for a toolkit that enables cell mediated immune responses elicited by a microbe of interest to be distinguished from cross-reactive cell mediated immune responses elicited by a different (e.g. related) microbe.

Summary

Some assays for cell mediated immunity to a microbe of interest detect the presence, absence or characteristics of an immune response of immune cells in a sample to a pool of fragments from a protein from the microbe (i.e. a microbial protein). The pool of fragments is essentially used as the test antigen in the assay. Providing the antigen as a pool of fragments may help to account for variations in immune repertoire between individuals, because the number of potential epitopes with which the immune cells are contacted is maximised. In certain cases, the fragments comprised in the pool form a protein fragment library that encompasses some or all of the sequence of the microbial protein. The present inventors have developed a method for producing such a pool of fragments, which pool is optimised for use in an assay for cell mediated immunity.

In more detail, the present inventors have developed a method of producing a pool of fragments that is optimised for assaying (I) cell mediated immunity that is cross-reactive for the microbe of interest, or (II) cell mediated immunity that is specific for the microbe of interest. This allows the nature of cell mediated immunity for a microbe of interest to be better characterised. This may be beneficial in a research or diagnostic context, where it is desirable to distinguish true microbe-specific immunity from immunity that is elicited by a different but related microbe. For example, it may be advantageous to distinguish cell mediated immunity elicited by exposure to the emerging pathogen SARS-CoV-2 from that elicited by exposure to endemic common cold Coronaviridae, as this may improve the specificity of diagnosis and disease surveillance. The same may apply to other emerging and endemic pathogens.

Accordingly, the disclosure provides a method for producing a pool of fragments derived from a microbial protein, comprising: (a) identifying fragments of the microbial protein that are comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein; (b) determining for each fragment identified in step (a) whether or not a homolog exists, wherein the homolog is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; and (c) preparing a pool of fragments in which: (i) each fragment is a fragment identified in step (a) for which step (b) determines the existence of a homolog; or (ii) each fragment is a fragment identified in step (a) for which step (b) does not determine the existence of a homolog, and the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein.

The invention also provides: a pool of fragments derived from a microbial protein, wherein: (I) each fragment is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and has a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; or (II) the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and each fragment does not have a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; a consolidated pool of fragments which comprises two or more pools of the invention, wherein each of the two or more pools comprises fragments derived from a different microbial protein, optionally wherein the microbial protein is selected from SARS-CoV-2 S1 spike domain, SARS-CoV-2 S2 spike domain, SARS-CoV-2 nucleocapsid protein, SARS-CoV-2 membrane protein, and SARS-CoV-2 envelope protein; and a method for determining the presence or absence of immune cells targeting a microbe, the method comprising contacting a sample comprising immune cells with one or more pools of the invention, and detecting in vitro the presence or absence of an immune response to the pool.

Brief description of the Figures

Figure 1: graphical representation of P1-4, P13 and P7-10.

Detailed description

It is to be understood that different applications of the disclosed methods and products may be tailored to the specific needs in the art. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments of the disclosure only, and is not intended to be limiting.

In addition, as used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a cell” includes “cells”, reference to “an image” includes two or more such images, reference to “an antigen” includes two or more such antigens, and the like.

All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.

Method for producing a pool of fragments

Disclosed herein is a method for producing a pool of fragments derived from a microbial protein, comprising: (a) identifying fragments of the microbial protein that are comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein; (b) determining for each fragment identified in step (a) whether or not a homolog exists, wherein the homolog is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; and (c) preparing a pool of fragments in which: (i) each fragment is a fragment identified in step (a) for which step (b) determines the existence of a homolog; or (ii) each fragment is a fragment identified in step (a) for which step (b) does not determine the existence of a homolog, and the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein. The features and advantages of the method are described in detail below.
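By way of illustration only, steps (b) and (c) might be sketched in Python as follows. The sketch assumes that sequence identity is measured by an ungapped, position-by-position comparison of the fragment against every same-length window of each related protein; this scoring scheme, and the names percent_identity, has_homolog and partition_fragments, are assumptions made for the example and are not taken from the disclosure. The sketch does not check the further requirement that the fragments of pool (ii) encompass at least 80% of the sequence of the microbial protein.

```python
from typing import List, Sequence, Tuple


def percent_identity(fragment: str, window: str) -> float:
    """Ungapped percent identity between two equal-length sequences (assumed scoring)."""
    if not fragment or len(fragment) != len(window):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(fragment, window))
    return 100.0 * matches / len(fragment)


def has_homolog(fragment: str, related_proteins: Sequence[str],
                threshold: float = 60.0) -> bool:
    """Step (b): True if any same-length window of a related protein meets the threshold."""
    n = len(fragment)
    for protein in related_proteins:
        for start in range(len(protein) - n + 1):
            if percent_identity(fragment, protein[start:start + n]) >= threshold:
                return True
    return False


def partition_fragments(fragments: Sequence[str], related_proteins: Sequence[str],
                        threshold: float = 60.0) -> Tuple[List[str], List[str]]:
    """Step (c): candidates for pool (i) (homolog found) and pool (ii) (no homolog found)."""
    pool_i: List[str] = []
    pool_ii: List[str] = []
    for fragment in fragments:
        if has_homolog(fragment, related_proteins, threshold):
            pool_i.append(fragment)
        else:
            pool_ii.append(fragment)
    return pool_i, pool_ii


# Toy sequences, invented for the example (not real viral sequences).
fragments = ["ACDEFGHIKL", "MNPQRSTVWY"]
related = ["GGACDEFGAAKLGG"]
pool_i, pool_ii = partition_fragments(fragments, related)
```

In this toy input the first fragment matches a window of the related sequence at 8 of 10 positions (80% identity) and is therefore assigned to pool (i), whereas the second fragment shares no residues with the related sequence and is assigned to pool (ii).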

Fragments and fragment pools

The method produces a pool of fragments derived from a microbial protein. The pool of fragments is a pool in which (i) each fragment is a fragment identified in step (a) for which step (b) determines the existence of a homolog; or (ii) each fragment is a fragment identified in step (a) for which step (b) does not determine the existence of a homolog, and the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein.

In more detail, each fragment comprised in the pool of fragments (i) is a fragment that is identified as being comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein. The fragments comprised in the pool (i) need not themselves form such a protein fragment library. Rather, each fragment comprised in the pool of fragments (i) is a fragment that is notionally comprised in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. In other words, each fragment comprised in the pool of fragments (i) is a fragment that is or would be found in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. Each fragment comprised in the pool of fragments (i) is also a fragment that has a homolog which is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. Homologs are described in detail below.

Accordingly, the pool of fragments (i) essentially comprises fragments that are not unique to the microbe from which the microbial protein is derived. The pool of fragments (i) may thus comprise fragments that may be recognised by a cross-reactive immune response. That is, the pool of fragments (i) may comprise fragments that are recognised by (e.g. bind to antigen receptors on and/or trigger a response by) immune cells that are generated by contact with a microbe other than the microbe from which the microbial protein is derived.

Each fragment comprised in the pool of fragments (ii) is a fragment that is identified as being comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein. In other words, each fragment comprised in the pool of fragments (ii) is a fragment that is notionally comprised in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. That is, each fragment comprised in the pool of fragments (ii) is a fragment that is or would be found in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. In addition, the fragments comprised in the pool (ii) themselves form a protein fragment library encompassing at least 80% of the sequence of the microbial protein. Furthermore, each fragment comprised in the pool of fragments (ii) is a fragment that does not have a homolog which is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. Homologs and protein fragment libraries are described in detail below.

Accordingly, the pool of fragments (ii) essentially comprises fragments that are unique to the microbe from which the microbial protein is derived. In other words, the pool of fragments (ii) essentially comprises only fragments that do not have a homolog in another microbe belonging to the same family as the microbe from which the microbial protein is derived. Thus, the pool of fragments (ii) may exclude fragments that may be recognised by a cross-reactive immune response. That is, the pool of fragments (ii) may exclude fragments that are recognised by (e.g. bind to antigen receptors on and/or trigger a response by) immune cells generated by contact with a microbe other than the microbe from which the microbial protein is derived.

In either case, a fragment derived from a microbial protein may be an amino acid sequence, or a peptide. For example, a fragment derived from a microbial protein may be a sequence comprising five or more amino acids that is derived by truncation at the N-terminus and/or C-terminus of the sequence of the microbial protein (“the parent sequence”). For instance, the fragment may comprise about 5 or more, about 6 or more, about 7 or more, about 8 or more, about 9 or more, about 10 or more, about 11 or more, about 12 or more, about 13 or more, about 14 or more, about 15 or more, about 16 or more, about 17 or more, about 18 or more, about 19 or more, about 20 or more, about 21 or more, about 22 or more, about 23 or more, about 24 or more, about 25 or more, about 26 or more, about 27 or more, about 28 or more, about 29 or more or about 30 or more amino acids. The fragment may be from about 5 to about 30, from about 6 to about 29, from about 7 to about 28, from about 8 to about 27, from about 9 to about 26, from about 10 to about 25, from about 11 to about 24, from about 12 to about 23, from about 13 to about 22, from about 14 to about 21, from about 15 to about 20, from about 16 to about 19, or from about 17 to about 18 amino acids in length. The fragment may, for example, be from about 9 to about 20, about 10 to about 19, about 11 to about 18, about 12 to about 17, about 13 to about 16, or about 15 amino acids in length. Preferably, the fragment is about 15 amino acids in length.

The term "fragment" includes not only molecules in which amino acid residues are joined by peptide (-CO-NH-) linkages but also molecules in which the peptide bond is reversed. Such retro-inverso peptidomimetics may be made using methods known in the art, for example those described in Meziere et al (1997) J. Immunol. 159, 3230-3237. This approach involves making pseudopeptides containing changes involving the backbone, and not the orientation of side chains. Meziere et al (1997) show that, at least for MHC class II and T helper cell responses, these pseudopeptides are useful. Retro-inverso peptides, which contain NH-CO bonds instead of CO-NH peptide bonds, are much more resistant to proteolysis.

Similarly, the peptide bond may be dispensed with altogether provided that an appropriate linker moiety which retains the spacing between the carbon atoms of the amino acid residues is used; it is particularly preferred if the linker moiety has substantially the same charge distribution and substantially the same planarity as a peptide bond. It will also be appreciated that the fragment may conveniently be blocked at its N- or C-terminus so as to help reduce susceptibility to exoproteolytic digestion. For example, the N-terminal amino group of the peptides may be protected by reacting with a carboxylic acid and the C-terminal carboxyl group of the peptide may be protected by reacting with an amine. One or more additional amino acid residues may also be added at the N-terminus and/or C-terminus of the fragment, for example to increase the stability of the fragment. Other examples of modifications include glycosylation and phosphorylation. Another potential modification is that hydrogens on the side chain amines of R or K may be replaced with methylene groups (-NH2 → -NH(Me) or -N(Me)2).

Fragments of the microbial protein may include variants of fragments that increase or decrease the fragments’ longevity in vitro or in vivo. Examples of variants capable of increasing the longevity of fragments according to the invention include peptoid analogues of the fragments, D-amino acid derivatives of the fragments, and peptide-peptoid hybrids. The fragment may also comprise D-amino acid forms of the fragment. The preparation of polypeptides using D-amino acids rather than L-amino acids greatly decreases any unwanted breakdown of such an agent by normal metabolic processes, decreasing the amount of agent that needs to be administered, along with the frequency of its administration. D-amino acid forms of the parent protein may also be used.

The fragments may be derived from splice variants of the parent protein encoded by mRNA generated by alternative splicing of the primary transcripts encoding the parent protein chains. The fragments may also be derived from amino acid mutants, glycosylation variants and other covalent derivatives of the parent proteins which retain at least an MHC-binding or antibody-binding property of the parent protein. Exemplary derivatives include molecules wherein the fragments of the invention are covalently modified by substitution, chemical, enzymatic, or other appropriate means with a moiety other than a naturally occurring amino acid.

A pool of fragments derived from a microbial protein comprises two or more fragments of the microbial protein. Fragments are described above. A pool may, for example, comprise three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, 10 or more, 15 or more, 20 or more, 25 or more, 50 or more, 75 or more, 100 or more, 200 or more, or 250 or more, fragments of the microbial protein.

The fragments comprised in a pool may form a protein fragment library. A protein fragment library comprises a plurality of fragments derived from a parent protein (in the present disclosure, the microbial protein), that together encompass at least 10%, such as at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, of the sequence of the parent protein. In the pool of fragments (ii), the fragments form a protein fragment library encompassing at least 80% of the sequence of the parent protein. For example, the fragments may form a protein fragment library encompassing the entire sequence of the parent protein. In a protein fragment library in which the fragments together encompass at least 10% (such as at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) of the sequence of the parent protein, the fragments are diverse enough that the pool contains epitopes capable of binding to many different MHC alleles. This allows the pool to be used in assays for cell mediated immunity across the global population, despite variation in MHC alleles between subjects. The protein fragment library may comprise fragments that are capable of stimulating CD4+ and/or CD8+ T cells. The protein fragment library may comprise fragments that are capable of stimulating both CD8+ T cells and CD4+ T cells. It is known in the art that the optimal fragment size for stimulation is different for CD4+ and CD8+ T-cells. Fragments consisting of about 9 amino acids (9mers) typically stimulate CD8+ T-cells only, and fragments consisting of about 20 amino acids (20mers) typically stimulate CD4+ T-cells only. 
Broadly speaking, this is because CD8+ T-cells tend to recognise their antigen based on its sequence, whereas CD4+ T-cells tend to recognise their antigen based on its higher-level structure. However, fragments consisting of about 15 amino acids (15mers) may stimulate both CD4+ and CD8+ T cells. The protein fragment library preferably comprises fragments that are about 15 amino acids, such as about 12 amino acids, about 13 amino acids, about 14 amino acids, about 16 amino acids, about 17 amino acids or about 18 amino acids in length.
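For illustration, the proportion of the parent sequence encompassed by a set of fragments might be computed as sketched below. The sketch assumes that a residue of the parent protein counts as encompassed when it lies within at least one exact occurrence of a fragment in the parent sequence; this interpretation, and the name coverage_fraction, are assumptions made for the example rather than definitions taken from the disclosure.

```python
from typing import Iterable


def coverage_fraction(parent: str, fragments: Iterable[str]) -> float:
    """Fraction of parent residues lying within at least one exact fragment match."""
    if not parent:
        return 0.0
    covered = [False] * len(parent)
    for fragment in fragments:
        # Mark every occurrence of the fragment, including overlapping occurrences.
        start = parent.find(fragment)
        while start != -1:
            for i in range(start, start + len(fragment)):
                covered[i] = True
            start = parent.find(fragment, start + 1)
    return sum(covered) / len(parent)


# Toy 20-residue parent sequence, invented for the example.
parent = "ABCDEFGHIJKLMNOPQRST"
library = ["ABCDEFGH", "GHIJKLMN", "MNOPQR"]
fraction = coverage_fraction(parent, library)
meets_80_percent = fraction >= 0.80
```

Here the three fragments together span the first 18 of the 20 residues, so the fraction is 0.9 and the 80% condition is met.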

All of the fragments in the protein fragment library may be the same length. Alternatively, the protein fragment library may comprise fragments of different lengths. Fragment lengths are discussed above.

The protein fragment library may comprise fragments whose sequences overlap. The sequences may overlap by one or more, such as two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 16 or more, 17 or more, 18 or more, 19 or more, or 20 or more, amino acids. Preferably, the sequences overlap by 9 or more amino acids, such as 10 or more, 11 or more or 12 or more amino acids, as this maximises the number of fragments that comprise 9mers capable of stimulating CD8+ T cells. More preferably, the sequences overlap by 11 amino acids. All of the overlapping fragments in the protein fragment library may overlap by the same number of amino acids. Alternatively, the protein fragment library may comprise fragments whose sequences overlap by different numbers of amino acids.

The protein fragment library may, for example, comprise fragments of 12 to 18 (such as 12 to 15, 15 to 18, 13 to 17, or 14 to 16) amino acids in length that overlap by 9 to 12 (such as 9 to 11 or 10 to 12) amino acids. For instance, the protein fragment library may comprise fragments of (a) 14 amino acids in length that overlap by 9, 10, or 11 amino acids, (b) 15 amino acids in length that overlap by 9, 10, or 11 amino acids, or (c) 16 amino acids in length that overlap by 9, 10, or 11 amino acids. The protein fragment library preferably comprises fragments of 15 amino acids in length that overlap by 11 amino acids.
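As a minimal sketch, a library of 15 amino acid fragments overlapping by 11 amino acids might be generated by sliding a window along the parent sequence in steps of 15 - 11 = 4 residues. The names tile_fragments and every_kmer_contained are hypothetical. The sketch also assumes that, where the parent length does not divide evenly, one extra fragment anchored at the C-terminus is added, so that the final fragment may overlap its predecessor by more than 11 amino acids.

```python
from typing import List


def tile_fragments(parent: str, length: int = 15, overlap: int = 11) -> List[str]:
    """Overlapping fragments of a fixed length covering the whole parent sequence."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than the fragment length")
    if len(parent) <= length:
        return [parent]
    step = length - overlap
    last_start = len(parent) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Assumed handling of the tail: add a final fragment ending at the C-terminus.
        starts.append(last_start)
    return [parent[s:s + length] for s in starts]


def every_kmer_contained(parent: str, fragments: List[str], k: int = 9) -> bool:
    """True if each k-mer of the parent occurs as a substring of at least one fragment."""
    return all(
        any(parent[p:p + k] in fragment for fragment in fragments)
        for p in range(len(parent) - k + 1)
    )


# Toy 40-residue parent sequence, invented for the example.
parent = "ACDEFGHIKLMNPQRSTVWY" * 2
library = tile_fragments(parent)            # 15mers overlapping by 11
all_9mers_present = every_kmer_contained(parent, library)
```

For this 40-residue toy sequence the window starts fall at positions 0, 4, 8, 12, 16, 20 and 24, plus one added start at position 25, giving eight fragments, and every 9mer of the parent lies within at least one of them.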

Microbial protein

The fragments comprised in the pool produced by the method of the disclosure are derived from a microbial protein. A microbial protein is a protein that is expressed by a microbe.

Microbes are well-known in the art and include viruses, bacteria, fungi and protozoa. Accordingly, the microbial protein may be expressed by a virus. In this case, the microbial protein is a viral protein. The microbial protein may be expressed by a bacterium. In this case, the microbial protein is a bacterial protein. The microbial protein may be expressed by a fungus. In this case, the microbial protein is a fungal protein. The microbial protein may be expressed by a protozoan. In this case, the microbial protein is a protozoal protein.

The microbe from which the microbial protein is derived may be a pathogenic microbe. That is, the microbe may be capable of causing disease. The microbe from which the microbial protein is derived may be a non-pathogenic microbe. That is, the microbe may be one that does not typically cause disease. For instance, the microbe may be a commensal microbe.

In one aspect of the disclosure, the microbe from which the microbial protein is derived is an emerging pathogen. An emerging pathogen may be defined as the causative microbe of an infectious disease whose incidence is increasing following its appearance in a new host population or whose incidence is increasing in an existing population as a result of long-term changes in its underlying epidemiology. Typically, an emerging pathogen is a virus, a bacterium or a protozoan. Emerging diseases have, in recent years, included respiratory, central nervous system, and enteric infections, viral hemorrhagic fevers, hepatitides, systemic bacterial infections, and human retroviral and novel herpes viral infections. Emerging viruses have included HIV, hepatitis C virus, Ebola virus, Nipah virus, Lassa virus, and West Nile virus, for example. Emerging bacteria have included E. coli O157, Vibrio cholerae O139, Clostridium difficile, Legionella pneumophila, and Campylobacter jejuni/coli, for example. Emerging pathogens of particular note include novel human coronaviruses such as SARS-CoV-2, which is responsible for an ongoing global pandemic.

In a preferred aspect of the disclosure, the microbe is a virus. Preferably, the virus is a virus of the realm Riboviria. Preferably, the virus is a virus of the kingdom Orthornavirae. Preferably, the virus is a virus of the phylum Pisuviricota. Preferably, the virus is a virus of the class Pisoniviricetes. Preferably, the virus is a virus of the order Nidovirales. Preferably, the virus is a virus of the family Coronaviridae. Thus, the microbe is preferably a coronavirus. The coronavirus may, for example, be SARS-CoV-2.

The protein may be expressed on the surface of the microbe. That is, the microbial protein may be a surface microbial protein. The microbial protein may be expressed internally within the microbe. That is, the microbial protein may be an internal microbial protein. If the microbe is a bacterium, fungus, or protozoan, the internal protein may be an intracellular protein. If the microbe is a virus, the internal protein may be an intraviral protein.

The protein may be any type of protein. For example, the protein may be a structural protein. The protein may, for example, be an enzyme. The protein may, for example, be a receptor. The protein may, for example, be a transport molecule. The protein may, for example, be a transcription factor.

The protein may be an antigenic protein. An antigenic protein is a protein that may function as an antigen. In other words, an antigenic protein is a protein that comprises a peptide that is capable of binding to an immune receptor. For instance, an antigenic protein may comprise a peptide that is capable of binding to an antibody. An antigenic protein may comprise a peptide that is capable of binding to a B cell receptor. An antigenic protein may comprise a peptide that is capable of binding to a T cell receptor, such as an alpha-beta T cell receptor or a gamma-delta T cell receptor. In the present disclosure, the antigenic protein is preferably capable of binding to a T cell receptor.

As set out above, the microbe from which the microbial protein is derived is preferably a coronavirus, such as SARS-CoV-2. Accordingly, the microbial protein is preferably a coronavirus protein. The coronavirus protein may, for example, be a SARS-CoV-2 protein. Preferably, the SARS-CoV-2 protein is a structural protein. SARS-CoV-2 structural proteins include SARS-CoV-2 spike glycoprotein (which comprises SARS-CoV-2 S1 spike domain (S1) and SARS-CoV-2 S2 spike domain (S2)), SARS-CoV-2 nucleocapsid protein (N), SARS-CoV-2 membrane protein (M), and SARS-CoV-2 envelope protein (E).

Step (a) - identifying fragments comprised in a protein fragment library

Step (a) of the method comprises identifying fragments of the microbial protein that are comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein. The protein fragment library comprises a plurality of fragments derived from the microbial protein, that together encompass at least 80% (such as at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) of the sequence of the microbial protein.

The protein fragment library may comprise fragments that are capable of stimulating CD4+ and/or CD8+ T cells. The protein fragment library may comprise fragments that are capable of stimulating both CD8+ T cells and CD4+ T cells. As explained above, it is known in the art that the optimal fragment size for stimulation is different for CD4+ and CD8+ T-cells. Fragments consisting of about 9 amino acids (9mers) typically stimulate CD8+ T-cells only, and fragments consisting of about 20 amino acids (20mers) typically stimulate CD4+ T-cells only. Fragments consisting of about 15 amino acids (15mers) may stimulate both CD4+ and CD8+ T cells. The protein fragment library may therefore comprise fragments that are from about 9 to about 20 (such as about 10 to about 19, about 11 to about 18, about 12 to about 17, about 13 to about 16, or about 15) amino acids in length. The protein fragment library preferably comprises fragments that are about 15 amino acids, such as about 12 amino acids, about 13 amino acids, about 14 amino acids, about 16 amino acids, about 17 amino acids or about 18 amino acids in length. All of the fragments in the protein fragment library may be the same length. Alternatively, the protein fragment library may comprise fragments of different lengths. Fragment lengths are discussed above.

The protein fragment library may comprise fragments whose sequences overlap. The sequences may overlap by one or more, such as two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 16 or more, 17 or more, 18 or more, 19 or more, or 20 or more, amino acids. Preferably, the sequences overlap by 9 or more amino acids, such as 10 or more, 11 or more or 12 or more amino acids. More preferably, the sequences overlap by 11 amino acids. All of the overlapping fragments in the protein fragment library may overlap by the same number of amino acids. Alternatively, the protein fragment library may comprise fragments whose sequences overlap by different numbers of amino acids.

The protein fragment library may, for example, comprise fragments of 12 to 18 (such as 12 to 15, 15 to 18, 13 to 17, or 14 to 16) amino acids in length that overlap by 9 to 12 (such as 9 to 11 or 10 to 12) amino acids. For instance, the protein fragment library may comprise fragments of (a) 14 amino acids in length that overlap by 9, 10, or 11 amino acids, (b) 15 amino acids in length that overlap by 9, 10, or 11 amino acids, or (c) 16 amino acids in length that overlap by 9, 10, or 11 amino acids. The protein fragment library preferably comprises fragments of 15 amino acids in length that overlap by 11 amino acids.

Methods for identifying fragments of the microbial protein that are comprised in the protein fragment library are known in the art. For example, the amino acid sequence of the microbial protein may be input into an algorithm that returns a list of fragments comprised in a protein fragment library that encompasses an inputted percentage of the amino acid sequence of the microbial protein, and comprises fragments of an inputted length and overlap. A similar exercise could be performed manually.
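By way of non-limiting illustration, an algorithm of the kind described above might be sketched as follows. This is a minimal sketch only, assuming a fixed fragment length and a fixed overlap (defaulting to the preferred 15 amino acids overlapping by 11). The function names tile_fragments and coverage and the 31-residue input sequence are hypothetical and do not form part of the disclosure.

```python
def tile_fragments(sequence, length=15, overlap=11):
    """Return (start, fragment) pairs tiling `sequence` at a fixed length and overlap."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap  # each fragment starts `step` residues after the previous one
    return [
        (start, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]


def coverage(sequence_length, fragments):
    """Fraction of parent-sequence positions covered by at least one fragment."""
    covered = set()
    for start, fragment in fragments:
        covered.update(range(start, start + len(fragment)))
    return len(covered) / sequence_length if sequence_length else 0.0


# Hypothetical 31-residue sequence, for illustration only (not a real protein).
protein = "ACDEFGHIKLMNPQRSTVWY" + "ACDEFGHIKLM"
library = tile_fragments(protein)            # 15mers overlapping by 11
library_coverage = coverage(len(protein), library)
meets_threshold = library_coverage >= 0.80   # the "at least 80%" requirement of step (a)
```

In this sketch, residues at the C-terminal end that do not fill a complete fragment are simply left uncovered, which is why the coverage is computed separately and compared with the inputted percentage.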

Step (b) - determining the existence of a homolog

Step (b) of the method comprises determining for each fragment identified in step (a) whether or not a homolog exists. In this context, a homolog is defined as an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. As set out above, the pool of fragments (i) produced in step (c) contains only fragments having such a homolog. The pool of fragments (ii) produced in step (c) excludes fragments having such a homolog.

The homolog may, for example, have at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the respective fragment. For the purpose of this disclosure, in order to determine the percent identity of two sequences (such as two amino acid sequences), the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in a first sequence for optimal alignment with a second sequence). The residues at corresponding positions are then compared. When a position in the first sequence is occupied by the same residue as the corresponding position in the second sequence, then the sequences are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = (number of identical positions / total number of positions in the reference sequence) x 100).

Typically the sequence comparison is carried out over the length of the reference sequence. For example, if the user wished to determine whether a given (“test”) sequence has a certain percentage identity to SEQ ID NO: X, SEQ ID NO: X would be the reference sequence. For example, to assess whether a sequence is at least 60% identical to SEQ ID NO: X (an example of a reference sequence), the skilled person would carry out an alignment over the length of SEQ ID NO: X, and identify how many positions in the test sequence were identical to those of SEQ ID NO: X. If at least 60% of the positions are identical, the test sequence is at least 60% identical to SEQ ID NO: X. If the sequence is shorter than SEQ ID NO: X, the gaps or missing positions should be considered to be non-identical positions. SEQ ID NO: X may be taken to represent a fragment identified in step (a) of the method. The “test sequence” may be taken to represent a potential homolog.
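The calculation described above may be illustrated by the following minimal sketch. It assumes that the two sequences have already been aligned, so that both are supplied as rows of equal length in which "-" marks a gap. The function name percent_identity and the example sequences are hypothetical. Identity is computed over the length of the reference sequence, and gaps or missing positions in the test sequence are counted as non-identical.

```python
def percent_identity(reference, test):
    """Percent identity of `test` to `reference`, over the length of the reference.

    Both arguments are rows of one alignment: equal length, '-' marking a gap.
    """
    if len(reference) != len(test):
        raise ValueError("aligned sequences must have equal length")
    # Only positions occupied by a residue in the reference are counted.
    reference_positions = sum(1 for r in reference if r != "-")
    if reference_positions == 0:
        raise ValueError("reference contains no residues")
    # A gap in the test sequence never equals a reference residue, so it is non-identical.
    identical = sum(1 for r, t in zip(reference, test) if r != "-" and r == t)
    return 100.0 * identical / reference_positions


# Hypothetical 15-residue reference fragment and two hypothetical test sequences.
reference = "ACDEFGHIKLMNPQR"
substituted = "ACDEFGHIKAAAAAA"   # six substitutions, nine identical positions
truncated = "ACDEFGHI-------"     # seven missing positions, eight identical positions

identity_substituted = percent_identity(reference, substituted)
identity_truncated = percent_identity(reference, truncated)
```

Under these assumptions the substituted sequence meets the 60% threshold and the truncated sequence does not.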

The skilled person is aware of different computer programs that are available to determine the homology or identity between two sequences. For instance, a comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm.

As set out above, the fragments identified in step (a) of the method are preferably 15 amino acids in length. An amino acid sequence having at least 60% sequence identity to a 15 amino acid fragment may comprise 9 or more (such as 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, or 15) positions that are identical to those in the 15 amino acid fragment. For example, an amino acid sequence having at least 60% sequence identity to a 15 amino acid fragment may comprise 9 to 15 (such as 10 to 14, or 12 to 13) positions that are identical to those in the 15 amino acid fragment.

An amino acid sequence having at least 60% sequence identity to a 15 amino acid fragment may comprise one or more amino acid substitutions with respect to the 15 amino acid fragment. For example, the amino acid sequence may comprise one, two, three, four, five or six amino acid substitutions with respect to the 15 amino acid fragment, providing that the amino acid sequence comprises 9 or more positions that are identical to those in the 15 amino acid fragment. An amino acid sequence having at least 60% sequence identity to a 15 amino acid fragment may comprise one or more amino acid deletions with respect to the 15 amino acid fragment. For example, the amino acid sequence may comprise one, two, three, four, five or six amino acid deletions with respect to the 15 amino acid fragment, providing that the amino acid sequence comprises 9 or more positions that are identical to those in the 15 amino acid fragment. An amino acid sequence having at least 60% sequence identity to a 15 amino acid fragment may comprise any number and combination of amino acid substitutions and amino acid deletions, providing that the amino acid sequence comprises 9 or more positions that are identical to those in the 15 amino acid fragment.
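The figures given above follow directly from the 60% threshold, as the following minimal sketch shows. The function names are hypothetical, and integer arithmetic is used so that the result does not depend on floating-point rounding.

```python
def min_identical_positions(fragment_length, threshold_percent=60):
    """Smallest number of identical positions that satisfies the identity threshold."""
    # Ceiling of fragment_length * threshold_percent / 100, using integer arithmetic.
    return -(-fragment_length * threshold_percent // 100)


def max_changed_positions(fragment_length, threshold_percent=60):
    """Largest number of substituted or deleted positions still meeting the threshold."""
    return fragment_length - min_identical_positions(fragment_length, threshold_percent)


required_for_15mer = min_identical_positions(15)   # 9 identical positions
tolerated_for_15mer = max_changed_positions(15)    # up to 6 substitutions and/or deletions
```

For fragments of other lengths the same calculation applies. For example, a 14 amino acid fragment also requires 9 identical positions, because 60% of 14 is 8.4 and the number of positions must be a whole number.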

The homolog is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. For example, the homolog may be expressed by two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or 10 or more microbes in the same family as the microbe from which the microbial protein is derived. In this context, the term “family” refers to a taxonomic family. By way of non-limiting example, the microbial protein may be expressed by a first virus in the Coronaviridae family, and the homolog may be expressed by a second virus in the Coronaviridae family. That is, the family may be Coronaviridae. The microbe expressing the microbial protein may be a coronavirus. One or more of the microbes expressing the homolog may be a coronavirus. All of the microbes expressing the homolog may be coronaviruses. The microbe expressing the microbial protein may be a coronavirus and one or more of the microbes expressing the homolog may be a coronavirus. The microbe expressing the microbial protein may be a coronavirus and all of the microbes expressing the homolog may be coronaviruses.
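One simplified way of determining whether a homolog exists may be sketched as follows. This sketch makes assumptions that are not part of the disclosure: it compares the fragment only against ungapped windows of the same length, so homologs that differ by deletions would be missed, and a dedicated alignment program would ordinarily be used instead. The function name has_homolog, the dictionary of related proteins and all sequences shown are hypothetical.

```python
def has_homolog(fragment, related_proteins, threshold_percent=60):
    """Return True if any same-length window of any related protein meets the threshold.

    `related_proteins` maps a microbe name to a protein sequence expressed by a
    microbe in the same taxonomic family. Windows are compared without gaps.
    """
    n = len(fragment)
    for protein in related_proteins.values():
        for start in range(len(protein) - n + 1):
            window = protein[start:start + n]
            identical = sum(a == b for a, b in zip(fragment, window))
            # Integer comparison equivalent to: identical / n >= threshold_percent / 100
            if identical * 100 >= threshold_percent * n:
                return True
    return False


# Hypothetical fragment and hypothetical same-family sequences, for illustration only.
fragment = "ACDEFGHIKLMNPQR"
family_with_match = {"related_virus_1": "WWWACDEFGHIKAAAAAAWWW"}
family_without_match = {"related_virus_2": "WWWWWWWWWWWWWWWWWWWW"}

found = has_homolog(fragment, family_with_match)
not_found = has_homolog(fragment, family_without_match)
```

The first hypothetical sequence contains a window sharing nine of fifteen positions with the fragment, so a homolog is reported. The second shares none, so no homolog is reported.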

The microbe from which the microbial protein is derived and one or more microbes expressing the homolog may be different microbes. That is, the microbe from which the microbial protein is derived may be of a different genus from the one or more microbes expressing the homolog. The microbe from which the microbial protein is derived may be of a different species from the one or more microbes expressing the homolog. The microbe from which the microbial protein is derived may be of a different strain from the one or more microbes expressing the homolog. By way of non-limiting example, the microbial protein may be expressed by SARS-CoV-2 and the homolog may be expressed by one or more non-SARS-CoV-2 coronavirus(es). The non-SARS-CoV-2 coronavirus may, for example, be SARS-CoV-1 or a common cold coronavirus such as HKU1, OC43, 229E and/or NL63.

One or more of the microbes that express the homolog may be endemic within a population. Preferably, each of the one or more microbes that express the homolog is endemic within a population. A pathogen may be defined as endemic in a population when infection with the pathogen is constantly maintained at a baseline level in the population without external inputs. For example, chickenpox is endemic in the United Kingdom population, but malaria is not. The population may be a geographical population. In other words, the population may be defined in terms of the area (e.g. region, country, continent) in which its members reside. The population may be defined in terms of attributes of its members, such as health status, vaccination status, age and so on.

The microbe from which the microbial protein is derived and the microbe expressing the homolog may each be capable of infecting the same species. That is, both the microbe from which the microbial protein is derived and the microbe expressing the homolog may be capable of infecting an individual belonging to a given species. The microbe from which the microbial protein is derived and the microbe expressing the homolog may be capable of infecting the same individual. The microbe from which the microbial protein is derived and the microbe expressing the homolog may be capable of infecting different individuals belonging to the same species. The species may, for example, be canine, feline, avian, bovine, ovine, equine, porcine, murine or primate. Preferably, the species is human.

One or more (such as two or more, three or more, or four or more) of the microbes expressing the homolog may be an endemic common cold coronavirus. All of the microbes expressing the homolog may be endemic common cold coronaviruses. For example, the one or more microbes expressing the homolog may comprise (A) HKU1, (B) OC43, (C) 229E and/or (D) NL63. The one or more microbes expressing the homolog may, for example, comprise (A); (B); (C); (D); (A) and (B); (A) and (C); (A) and (D); (B) and (C); (B) and (D); (C) and (D); (A), (B) and (C); (A), (B) and (D); (A), (C) and (D); (B), (C) and (D); or (A), (B), (C) and (D). In any of these cases, the microbe from which the microbial protein is derived may be SARS-CoV-2.

Step (c) - preparing a pool of fragments

Step (c) comprises preparing a pool of fragments in which: (i) each fragment is a fragment identified in step (a) for which step (b) determines the existence of a homolog; or (ii) each fragment is a fragment identified in step (a) for which step (b) does not determine the existence of a homolog, and the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein. Pool of fragments (i) and pool of fragments (ii) are each described in detail in the “Fragments and fragment pools” section above.
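The selection underlying step (c) may be illustrated in silico by the following minimal sketch, which sorts the fragments from step (a) according to the outcome of step (b) and checks whether the fragments without a homolog still encompass at least 80% of the parent sequence. The function name select_pools, the placeholder fragments and the homolog flags are hypothetical. The sketch concerns only the choice of fragments, not their physical preparation.

```python
def select_pools(fragments, homolog_flags, protein_length, min_coverage=0.80):
    """Split step (a) fragments into candidate pools (i) and (ii).

    `fragments` is a list of (start, sequence) pairs and `homolog_flags` is a
    parallel list of booleans giving the outcome of step (b) for each fragment.
    Returns (pool_i, pool_ii, pool_ii_meets_coverage).
    """
    pool_i = [f for f, flag in zip(fragments, homolog_flags) if flag]
    pool_ii = [f for f, flag in zip(fragments, homolog_flags) if not flag]
    # Pool (ii) must itself encompass at least `min_coverage` of the parent sequence.
    covered = set()
    for start, sequence in pool_ii:
        covered.update(range(start, start + len(sequence)))
    pool_ii_meets_coverage = len(covered) / protein_length >= min_coverage
    return pool_i, pool_ii, pool_ii_meets_coverage


# Five hypothetical 15mers tiling a hypothetical 31-residue protein (overlap of 11).
fragments = [(start, "A" * 15) for start in (0, 4, 8, 12, 16)]

# Hypothetical step (b) outcome: only the first fragment has a homolog.
pool_i, pool_ii, pool_ii_ok = select_pools(
    fragments, [True, False, False, False, False], protein_length=31
)
```

In this hypothetical case the four fragments without a homolog cover 27 of 31 positions, so pool (ii) satisfies the coverage requirement. If the first two fragments both had homologs, the remaining fragments would cover only 23 of 31 positions and the requirement would not be met.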

Methods for preparing a pool of fragments are well known in the art. In essence, each fragment to be included in the pool is obtained, and the pool is produced by combining each fragment into a single composition. A fragment comprised in the pool may be chemically derived from the parent protein, for example by proteolytic cleavage. A fragment comprised in the pool may be derived in an intellectual sense from the parent protein, for example by making use of the amino acid sequence of the parent protein and synthesising fragments based on the sequence. Fragments may be synthesised using methods well known in the art.

Pool of fragments

Disclosed herein is a pool of fragments derived from a microbial protein, wherein: (I) each fragment is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and has a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; or (II) the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and each fragment does not have a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. The pool may, for example, be produced according to the method described above.

Fragments and pools of fragments are described in detail in the section “Fragments and fragment pools” above. Any of the aspects described in that section may apply to the pool of fragments disclosed herein. Microbial proteins are described in detail in the section “Microbial protein” above. Any of the aspects described in that section may apply to the pool of fragments disclosed herein. Further features of pool of fragments (I) and pool of fragments (II) are set out below.

Pool of fragments (I)

Each fragment comprised in the pool of fragments (I) is a fragment that is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein. The fragments comprised in the pool of fragments (I) need not themselves form such a protein fragment library. Rather, each fragment comprised in the pool of fragments (I) is a fragment that is notionally comprised in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. In other words, each fragment comprised in the pool of fragments (I) is a fragment that is or would be found in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. Protein fragment libraries that encompass at least 80% of the sequence of the microbial protein are described in detail in the section “Step (a) - identifying fragments comprised in a protein fragment library” above. Any of the aspects described in that section may apply to the pool of fragments (I).

Each fragment comprised in the pool of fragments (I) is also a fragment that has a homolog which is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. Such homologs are described in detail in the section “Step (b) - determining the existence of a homolog” above. Any of the aspects described in that section may apply to the pool of fragments (I).

The pool of fragments (I) essentially comprises fragments that are not unique to the microbe from which the microbial protein is derived. The pool of fragments (I) may thus comprise fragments that may be recognised by a cross-reactive immune response. That is, the pool of fragments (I) may comprise fragments that are recognised by (e.g. bind to antigen receptors on and/or trigger a response by) immune cells that are generated by contact with a microbe other than the microbe from which the microbial protein is derived.

Pool of fragments (II)

Each fragment comprised in the pool of fragments (II) is a fragment that is identified as being comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein. In other words, each fragment comprised in the pool of fragments (II) is a fragment that is notionally comprised in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. That is, each fragment comprised in the pool of fragments (II) is a fragment that is or would be found in a protein fragment library that encompasses at least 80% of the sequence of the microbial protein. Protein fragment libraries that encompass at least 80% of the sequence of the microbial protein are described in detail in the section “Step (a) - identifying fragments comprised in a protein fragment library” above. Any of the aspects described in that section may apply to the pool of fragments (II).

In addition, the fragments comprised in the pool (II) themselves form a protein fragment library encompassing at least 80% of the sequence of the microbial protein. For example, the fragments comprised in the pool (II) may form a protein fragment library encompassing at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% of the sequence of the microbial protein. The fragments comprised in the pool (II) may form a protein fragment library encompassing the entire sequence of the microbial protein. Protein fragment libraries that encompass at least 80% of the sequence of the microbial protein are described in detail in the section “Step (a) - identifying fragments comprised in a protein fragment library” above. Any of the aspects described in that section may apply to the pool of fragments (II). As explained above, in a protein fragment library in which the fragments together encompass at least 80% (such as at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) of the sequence of the microbial protein, the fragments are diverse enough that the pool contains epitopes capable of binding to many different MHC alleles. This allows the pool to be used in assays for cell mediated immunity across the global population, despite variation in MHC alleles between subjects.

In addition, each fragment comprised in the pool of fragments (II) is a fragment that does not have a homolog which is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. Such homologs are described in detail in the section “Step (b) - determining the existence of a homolog” above. Any of the aspects described in that section may apply to the pool of fragments (II).

The pool of fragments (II) essentially comprises fragments that are unique to the microbe from which the microbial protein is derived. In other words, the pool of fragments (II) essentially comprises only fragments that do not have a homolog in another microbe belonging to the same family as the microbe from which the microbial protein is derived. Thus, the pool of fragments (II) may exclude fragments that may be recognised by a cross-reactive immune response. That is, the pool of fragments (II) may exclude fragments that are recognised by (e.g. bind to antigen receptors on and/or trigger a response by) immune cells generated by contact with a microbe other than the microbe from which the microbial protein is derived.

Consolidated pool of fragments

Disclosed herein is a consolidated pool of fragments which comprises two or more pools of the present disclosure. Each of the two or more pools comprises fragments derived from a different microbial protein. Each of the two or more pools may be produced according to a method of the present disclosure.

Fragments and pools of fragments are described in detail in the section “Fragments and fragment pools” above. Any of the aspects described in that section may apply to the consolidated pool of fragments disclosed herein. Microbial proteins are described in detail in the section “Microbial protein” above. Any of the aspects described in that section may apply to the consolidated pool of fragments disclosed herein. Further features of the consolidated pool of fragments are set out below.

Each of the two or more pools comprised in the consolidated pool of fragments may be selected from: (I) a pool of fragments derived from a microbial protein, wherein each fragment is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and has a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; and (II) a pool of fragments derived from a microbial protein, wherein the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and each fragment does not have a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived.

The consolidated pool may comprise both: (I) a pool of fragments derived from a microbial protein, wherein each fragment is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and has a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; and (II) a pool of fragments derived from a microbial protein, wherein the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and each fragment does not have a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived.

The consolidated pool may comprise either: (I) a pool of fragments derived from a microbial protein, wherein each fragment is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and has a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived; or (II) a pool of fragments derived from a microbial protein, wherein the fragments form a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and each fragment does not have a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. Thus, the consolidated pool may comprise two or more pools according to (I) and no pools according to (II). The consolidated pool may comprise two or more pools according to (II) and no pools according to (I).

Each of the two or more pools comprised in the consolidated pool comprises fragments derived from a different microbial protein. Inclusion of pools comprising fragments derived from a different microbial protein increases the likelihood of eliciting a cell mediated immune response when the consolidated pool is used in an assay for cell mediated immunity. Preferably, each of the two or more pools comprises fragments derived from a different microbial protein expressed by the same microbe. For example, each of the two or more pools may comprise fragments derived from a different microbial protein expressed by the same coronavirus. Each of the two or more pools may comprise fragments derived from a different microbial protein expressed by SARS-CoV-2. For instance, each of the two or more pools may comprise fragments derived from a different microbial protein selected from (A) SARS-CoV-2 S1 spike domain, (B) SARS-CoV-2 S2 spike domain, (C) SARS-CoV-2 nucleocapsid protein, (D) SARS-CoV-2 membrane protein and/or (E) SARS-CoV-2 envelope protein. The consolidated pool may, for example, comprise pools of fragments derived from (A) and (B); (A) and (C); (A) and (D); (A) and (E); (B) and (C); (B) and (D); (B) and (E); (C) and (D); (C) and (E); (D) and (E); (A), (B) and (C); (A), (B) and (D); (A), (B) and (E); (A), (C) and (D); (A), (C) and (E); (A), (D) and (E); (B), (C) and (D); (B), (C) and (E); (B), (D) and (E); (C), (D) and (E); (A), (B), (C) and (D); (A), (B), (C) and (E); (A), (B), (D) and (E); (A), (C), (D) and (E); (B), (C), (D) and (E); or (A), (B), (C), (D) and (E).
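The combinations listed above are every selection of two or more of the five proteins (A) to (E). By way of illustration only, they may be enumerated as in the following minimal sketch, in which the function name pool_combinations is hypothetical.

```python
from itertools import combinations

# Labels as used in the passage above.
PROTEINS = {
    "A": "SARS-CoV-2 S1 spike domain",
    "B": "SARS-CoV-2 S2 spike domain",
    "C": "SARS-CoV-2 nucleocapsid protein",
    "D": "SARS-CoV-2 membrane protein",
    "E": "SARS-CoV-2 envelope protein",
}


def pool_combinations(labels, min_size=2):
    """All selections of at least `min_size` labels, smallest selections first."""
    labels = sorted(labels)
    return [
        combo
        for size in range(min_size, len(labels) + 1)
        for combo in combinations(labels, size)
    ]


combos = pool_combinations(PROTEINS)
# 10 pairs + 10 triples + 5 quadruples + 1 quintuple = 26 combinations
```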

For example, the pool may comprise or consist of panel 13 (P13) of the Examples. The fragments comprised in P13 are set out in Table 3 in Example 2. P13 is a consolidated pool that comprises four pools that are each (I) a pool of fragments derived from a microbial protein, wherein each fragment is comprised in a protein fragment library encompassing at least 80% of the sequence of the microbial protein, and has a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more microbes in the same family as the microbe from which the microbial protein is derived. The four pools are derived from (A) SARS-CoV-2 S1 spike domain, (B) SARS-CoV-2 S2 spike domain, (C) SARS-CoV-2 nucleocapsid protein and (D) SARS-CoV-2 membrane protein respectively.

Method for determining the presence or absence of immune cells

Disclosed herein is a method for determining the presence or absence of immune cells targeting a microbe. The method comprises contacting a sample comprising immune cells with one or more fragment pools disclosed herein, and detecting in vitro the presence or absence of an immune response to the one or more pools. The method may comprise an assay for cell-mediated immunity, such as T cell-mediated immunity.

Sample

The method comprises contacting a sample comprising immune cells with one or more fragment pools disclosed herein. The sample may be a sample that has been obtained from a subject. The subject may be canine, feline, avian, bovine, ovine, equine, porcine, murine or primate. Preferably, the subject is human.

The sample may, for example, comprise whole blood. The sample may comprise immune cells isolated from whole blood. For example, the sample may comprise peripheral blood mononuclear cells (PBMCs) isolated from whole blood. The sample may, for example, comprise T cells. The T cells may comprise CD8+ T cells and/or CD4+ T cells.

Accordingly, the immune cells comprised in the sample may comprise PBMCs. The immune cells comprised in the sample may comprise T cells. The immune cells comprised in the sample may comprise CD8+ T cells. The immune cells comprised in the sample may comprise CD4+ T cells. The immune cells comprised in the sample may comprise CD4+ T cells and CD8+ T cells.

Fragment pools

The method comprises contacting a sample comprising immune cells with one or more fragment pools disclosed herein. Such fragment pools are described in detail above.

The sample may, for example, be contacted with two or more fragment pools disclosed herein. For instance, the sample may be contacted with three or more, four or more, or five or more fragment pools disclosed herein.

The one or more fragment pools contacted with the sample may comprise (a) one or more pools of fragments according to pool of fragments (I) described above. For example, the one or more pools contacted with the sample may comprise two or more, three or more, four or more, or five or more pools of fragments according to pool of fragments (I) described above. The one or more pools contacted with the sample may comprise (b) one or more pools of fragments according to pool of fragments (II) described above. For example, the one or more pools contacted with the sample may comprise two or more, three or more, four or more, or five or more pools of fragments according to pool of fragments (II) described above. The one or more pools contacted with the sample may comprise (c) one or more pools of fragments according to the consolidated pool of fragments described above. For example, the one or more pools contacted with the sample may comprise two or more, three or more, four or more, or five or more pools of fragments according to the consolidated pool of fragments described above. The one or more pools contacted with the sample may comprise: (a); (b); (c); (a) and (b); (a) and (c); (b) and (c); or (a), (b) and (c).

When the one or more fragment pools comprises two or more fragment pools, each of the two or more pools may comprise fragments derived from a different microbial protein. That is, the microbial protein from which the fragments in one of the two or more pools are derived may be different from the microbial protein(s) from which the fragments in the other pool(s) are derived. Use of multiple pools each comprising fragments derived from a different microbial protein increases the likelihood of eliciting an immune response by the immune cells comprised in the sample.

Preferably, each of the two or more pools comprises fragments derived from a different microbial protein expressed by the same microbe. For example, each of the two or more pools may comprise fragments derived from a different microbial protein expressed by the same coronavirus. Each of the two or more pools may comprise fragments derived from a different microbial protein expressed by SARS-CoV-2. For instance, each of the two or more pools may comprise fragments derived from a different microbial protein selected from (A) SARS-CoV-2 surface glycoprotein, (B) SARS-CoV-2 nucleocapsid protein, (C) SARS-CoV-2 membrane protein and/or (D) SARS-CoV-2 envelope protein. The two or more pools may, for example, comprise pools of fragments derived from (A) and (B); (A) and (C); (A) and (D); (B) and (C); (B) and (D); (C) and (D); (A), (B) and (C); (A), (B) and (D); (A), (C) and (D); (B), (C) and (D); or (A), (B), (C) and (D). Each of the two or more pools may be contacted with the sample in a separate reaction.
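The combinations of source proteins listed above can be enumerated programmatically. The following is a minimal sketch only; the function name `pool_combinations` and the dictionary `PROTEINS` are illustrative and do not appear in the disclosure.

```python
from itertools import combinations

# Labels follow the text: (A) to (D) are the four SARS-CoV-2 structural proteins.
PROTEINS = {
    "A": "surface glycoprotein",
    "B": "nucleocapsid protein",
    "C": "membrane protein",
    "D": "envelope protein",
}


def pool_combinations(labels, min_size=2):
    """Return every combination of at least `min_size` protein labels."""
    labels = sorted(labels)
    return [
        combo
        for size in range(min_size, len(labels) + 1)
        for combo in combinations(labels, size)
    ]


combos = pool_combinations(PROTEINS)
for combo in combos:
    print(" and ".join(f"({label})" for label in combo))
```

For four proteins this yields six pairs, four triples and one set of all four, matching the eleven combinations recited above.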

The method may further comprise contacting the sample with a pool of fragments derived from a protein from the microbe, and detecting in vitro the presence or absence of an immune response to the pool, wherein the fragments in the pool form a protein fragment library encompassing at least 80% of the sequence of the protein. Protein fragment libraries that encompass at least 80% of the sequence of the microbial protein are described in detail in the section “Step (a) - identifying fragments comprised in a protein fragment library” above. Any of the aspects described in that section may apply to this further pool of fragments. This further pool may comprise fragments capable of stimulating both cell mediated immunity that is cross-reactive for the microbe of interest, and cell mediated immunity that is specific for the microbe of interest. Essentially, this further pool is not specially optimised for use in an assay for cell mediated immunity, and may be used in combination with a pool described herein that is optimised for assaying (I) cell mediated immunity that is cross-reactive for the microbe of interest, or (II) cell mediated immunity that is specific for the microbe of interest. This further contacting step may be conducted in a separate reaction.

The further pool and the one or more pools contacted with the sample may comprise fragments derived from a different microbial protein. Preferably, the further pool and the one or more pools contacted with the sample comprise fragments derived from a different microbial protein expressed by the same microbe. For example, the further pool and the one or more pools contacted with the sample may comprise fragments derived from a different microbial protein expressed by the same coronavirus. Each of the further pool and the one or more pools contacted with the sample may comprise fragments derived from a different microbial protein expressed by SARS-CoV-2. For instance, each of the further pool and the one or more pools contacted with the sample may comprise fragments derived from a different microbial protein selected from (A) SARS-CoV-2 surface glycoprotein, (B) SARS-CoV-2 nucleocapsid protein, (C) SARS-CoV-2 membrane protein and/or (D) SARS-CoV-2 envelope protein. The further pool and the one or more pools contacted with the sample may, for example, comprise pools of fragments derived from (A) and (B); (A) and (C); (A) and (D); (B) and (C); (B) and (D); (C) and (D); (A), (B) and (C); (A), (B) and (D); (A), (C) and (D); (B), (C) and (D); or (A), (B), (C) and (D).

The method may further comprise contacting the sample with a pool of fragments derived from a protein from a microbe in the same family as the microbe from which the microbial protein is derived, and detecting in vitro the presence or absence of an immune response to the pool, wherein the fragments in the pool form a protein fragment library encompassing at least 80% of the sequence of the protein. Protein fragment libraries that encompass at least 80% of the sequence of the microbial protein are described in detail in the section “Step (a) - identifying fragments comprised in a protein fragment library” above. This further contacting step is conducted in a separate reaction. Preferably, the microbe from which the microbial protein is derived is an emerging pathogen, and the microbe in the same family is endemic within a population. In this case, the further contacting and detecting step provides information about prior exposure to endemic pathogens. This information may aid in the interpretation of an immune response detected in connection with the emerging pathogen. For example, absence of an immune response to the endemic pathogen may help to demonstrate that an immune response detected to the emerging pathogen is specific for that emerging pathogen and not the result of cross-protective immunity conferred by prior exposure to the endemic pathogen.

Detecting in vitro the presence or absence of an immune response

The method comprises detecting in vitro the presence or absence of an immune response to the one or more pools. Mechanisms for detecting in vitro the presence or absence of an immune response are well known in the art.

Detecting the presence or absence of an immune response may, for example, comprise one or more of the following, in any combination:

Determining the number or proportion of cells comprised in the cell sample or an aliquot thereof that are responsive to the one or more pools.

Determining the expression or secretion of one or more cytokines by immune cells comprised in the sample in response to the one or more pools. The one or more cytokines may, for example, comprise interferon gamma (IFNγ).

Determining the number or proportion of immune cells comprised in the sample or an aliquot thereof that secrete one or more cytokines in response to the one or more pools. The one or more cytokines may, for example, comprise interferon gamma (IFNγ).

Determining the expression of one or more markers by immune cells comprised in the sample in response to the one or more pools. The immune cells may comprise T cells. The one or more markers may, for example, comprise markers of activation, degranulation, or other T cell functions. T cell markers and their associated functions are well known in the art.

Methods for such determination are known in the art.

Detecting in vitro the presence or absence of an immune response may, for example, comprise determining the number or proportion of immune cells comprised in the cell sample or an aliquot thereof that are responsive to the one or more pools. This may comprise determining the number or proportion of immune cells comprised in the cell sample or an aliquot thereof that secrete one or more cytokines in response to the one or more pools. The cytokine may, for example, be interferon gamma (IFNγ). Methods for such determination are well known in the art and include, for example, flow cytometry and ELISpot assays. Preferably, such determination is by enzyme-linked immunospot (ELISpot) assay.

The method may, for example, comprise an interferon gamma release assay (IGRA). Assays for interferon gamma release are well-known in the art and include, for example, ELISpot assays and enzyme linked immunosorbent assays (ELISA), such as in-tube ELISAs.

Preferably, the method comprises an ELISpot assay. Preferably, the ELISpot assay is an interferon gamma release assay (IGRA). Preferably, the ELISpot assay is an interferon gamma release assay (IGRA) and the immune cells comprise T cells, such as CD8+ T cells and/or CD4+ T cells.

ELISpot assays are well-known in the art. The ELISpot is an immunoassay that measures the frequency of protein-secreting cells in a sample at the single-cell level. Cells from the cell sample are cultured in one or more wells of an assay plate. Cells may be cultured at a density of, for example, 100,000 to 500,000 cells per well. For instance, cells may be cultured at a density of 150,000 to 450,000 cells per well; 200,000 to 400,000 cells per well; or 250,000 to 350,000 cells per well. For example, cells may be cultured at a density of about 100,000, about 150,000, about 200,000, about 250,000, about 300,000, about 350,000, about 400,000, about 450,000 or about 500,000 cells per well. Cells are preferably cultured at a density of about 250,000 cells per well. Each well comprises a surface coated with a capture antibody specific for the secreted protein of interest. A different stimulus regime may be applied to each of the one or more wells, for example to provide test wells and control wells. Proteins that are secreted by the cells are captured by the capture antibody. After an appropriate incubation time, cells are removed and the secreted protein is detected using a detection antibody that is directly or indirectly conjugated with an enzyme. Upon contact of the enzyme with a substrate that forms a precipitating product, visible spots form on the surface. Each spot corresponds to an individual protein-secreting cell. The assay is interpreted based on the number of spots formed in each well. Spot count may be expressed as <number of spots> per <number of cultured cells>, or a multiple thereof. For example, if 250,000 cells are cultured in each well, spot count may be expressed as spots per 250,000 cells or a multiple thereof (e.g. spots per million cells).
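The conversion between spot counts per well and spot counts per a chosen number of cells is simple proportional scaling. A minimal sketch follows; the function name `spots_per_cells` and the example count of 10 spots are illustrative and are not taken from the disclosure.

```python
def spots_per_cells(spot_count, cells_per_well, per_cells=1_000_000):
    """Scale a per-well spot count to spots per `per_cells` cultured cells."""
    if cells_per_well <= 0:
        raise ValueError("cells_per_well must be positive")
    return spot_count * per_cells / cells_per_well


# Illustrative values: 10 spots counted in a well seeded with 250,000 cells.
per_million = spots_per_cells(10, 250_000)
print(per_million)
```

With 250,000 cells per well, expressing the result per million cells multiplies the raw spot count by four.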

The method may comprise conducting one or more separate reactions in order to contact each pool with a different aliquot of the cell sample. Preferably, each of the different aliquots has substantially the same composition. An aliquot is essentially a divided portion of the cell sample. Contacting each pool with a different aliquot of the cell sample allows the sample to be contacted with each of the pools separately. In other words, the sample can be contacted with each pool in a physically separate reaction. A plurality of physically separate reactions may be performed in order to contact each of a plurality of aliquots with a different pool. The physically separate reactions are preferably performed at the same time. When the method comprises an ELISpot assay, the physically separate reactions may, for example, be performed in different wells of an ELISpot plate.

In addition to the separate reactions conducted to contact each pool with a different aliquot of the cell sample, the method may comprise conducting one or more separate reactions in order to provide a negative control reaction or a positive control reaction. A negative control reaction may, for example, comprise an aliquot of the cell sample in the absence of a pool of fragments or other antigen. A positive control reaction may, for example, comprise an aliquot of the cell sample and a known stimulator of cells comprised in the cell sample. When the cell sample comprises T cells, the known stimulator may for example be phytohaemagglutinin (PHA).
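The arrangement of test and control reactions described above might be planned as in the sketch below. It is an illustration only: the function `plan_reactions`, the dictionary keys and the pool names are hypothetical, and the numbering of aliquots is an assumption made for the example.

```python
def plan_reactions(pool_names, positive_stimulator="PHA"):
    """Return one reaction per pool plus a negative and a positive control."""
    # Negative control: an aliquot with no fragment pool or other antigen.
    reactions = [{"role": "negative control", "stimulus": None}]
    # One physically separate test reaction per fragment pool.
    reactions += [{"role": "test", "stimulus": name} for name in pool_names]
    # Positive control: an aliquot with a known stimulator of the cells.
    reactions.append({"role": "positive control", "stimulus": positive_stimulator})
    # Each reaction receives its own aliquot of the cell sample, numbered from 1.
    for aliquot, reaction in enumerate(reactions, start=1):
        reaction["aliquot"] = aliquot
    return reactions


plan = plan_reactions(["pool 1", "pool 2"])
for reaction in plan:
    print(reaction)
```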

It is readily apparent to the skilled person how the presence or absence of an immune response to the one or more pools may be detected based on the various determinations described above. For example:

The presence of cells in the sample that are responsive to the one or more pools may indicate the presence of an immune response to the one or more pools. The absence of cells in the sample that are responsive to the one or more pools may, for example, indicate the absence of an immune response to the one or more pools.

Expression or secretion of one or more cytokines by immune cells comprised in the sample in response to the one or more pools may, for example, indicate the presence of an immune response to the one or more pools. The absence of expression or secretion of one or more cytokines by immune cells comprised in the sample in response to the one or more pools may, for example, indicate the absence of an immune response to the one or more pools.

The number or proportion of immune cells comprised in the sample or an aliquot thereof that secrete one or more cytokines in response to one or more pools may, for example, indicate the presence or absence of an immune response to the one or more pools. That is, the presence or absence of an immune response to the one or more pools may be determined based on the number of immune cells comprised in the sample or an aliquot thereof that secrete one or more cytokines in response to the one or more pools. The presence or absence of an immune response may be determined based on the proportion of immune cells comprised in the sample or an aliquot thereof that secrete one or more cytokines in response to the one or more pools.

The expression of one or more markers by one or more immune cells comprised in the sample in response to one or more pools may indicate the presence of an immune response to the one or more pools. The absence of expression of one or more markers by one or more immune cells comprised in the sample in response to the one or more pools may indicate the absence of an immune response to the one or more pools.

When the method comprises an ELISpot assay, detecting the presence or absence of an immune response to the one or more pools may comprise determining the number of spots formed in each well. Detecting the presence or absence of an immune response to the one or more pools may comprise processing mathematically the number of spots formed in each well (for example by calculating the square root of the number of spots, the cubic root of the number of spots, and/or log(<number of spots> +1)). A cut-off may be applied to the number of spots formed in each well (or the mathematically processed equivalent thereof) in order to determine the presence or absence of an immune response to the one or more pools.
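The mathematical processing and cut-off described above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the natural logarithm is assumed because the text does not state a base for log(<number of spots> + 1), treating a count equal to the cut-off as a response is an assumption, and the cut-off of 6 spots is an arbitrary example value rather than one specified in the disclosure.

```python
import math


def transform_spot_count(spots, method="sqrt"):
    """Apply one of the transformations mentioned in the text to a spot count."""
    if spots < 0:
        raise ValueError("spot count cannot be negative")
    if method == "sqrt":
        return math.sqrt(spots)
    if method == "cbrt":
        return spots ** (1.0 / 3.0)
    if method == "log1p":
        # Assumption: natural logarithm of (spots + 1).
        return math.log1p(spots)
    raise ValueError(f"unknown method: {method}")


def is_response(spots, cutoff, method=None):
    """Compare a spot count (optionally transformed) with a cut-off.

    Assumption: a value equal to the cut-off counts as a response.
    The cut-off is transformed in the same way as the spot count.
    """
    if method is None:
        return spots >= cutoff
    return transform_spot_count(spots, method) >= transform_spot_count(cutoff, method)


print(transform_spot_count(16, "sqrt"))
print(is_response(8, cutoff=6))
print(is_response(5, cutoff=6, method="sqrt"))
```

Because each transformation is monotonic, applying the same transformation to both the count and the cut-off gives the same call as comparing the raw values; the transformations matter mainly when counts are averaged or compared statistically.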

In one aspect disclosed herein, the method may further comprise the step of diagnosing the presence or absence of infection with the microbe in a subject from which the sample is obtained. That is, the method for determining the presence or absence of immune cells targeting a microbe may be a method for determining the presence or absence of infection with the microbe. The method for determining the presence or absence of immune cells targeting a microbe may be a method for diagnosing infection with the microbe. The presence of an immune response to the one or more pools may indicate the presence of infection with the microbe in the subject. The absence of an immune response to the one or more pools may indicate the absence of infection with the microbe in the subject.

The following Examples illustrate the invention.

Example 1 - SARS-CoV-2 peptide pool bioinformatics homology search

Objectives

Analyse peptide sequences generated from the main structural proteins of SARS-CoV-2 for homology to any common human pathogen using a bioinformatics approach.

Summary

Significant homology was detected between SARS-CoV-2 peptides and various human coronaviruses, including SARS-CoV-1 and the endemic common cold coronaviruses. Modified peptide lists can be generated by removing peptides with detected homology.

1. Introduction/background

T-SPOT Discovery SARS-CoV-2 is an assay kit for studying the immune response to SARS-CoV-2, the causative agent of COVID-19. T-SPOT Discovery SARS-CoV-2 consists of pools of overlapping 15-mer peptides which scan the full length of the four major structural proteins of SARS-CoV-2. These proteins are the spike surface glycoprotein (S or spike; which comprises S1 spike domain and S2 spike domain), the nucleocapsid phosphoprotein (N or nuc), the membrane glycoprotein (M or memb) and the envelope protein (env or E).

As SARS-CoV-2 is an emerging human pathogen, the immune response to the virus has not been fully characterised. SARS-CoV-2-specific CD4 and CD8 T-cells have been identified in recovered patients. In these studies, SARS-CoV-2 T-cell responses were also detected in donor samples isolated before the emergence of the virus. This suggests that there is some level of cross-reactive immune response, possibly originally targeting the endemic common cold human coronaviruses.

This study utilised a bioinformatics approach to characterise overlapping peptide panels generated from the main structural proteins of SARS-CoV-2. Homology to other human pathogens was assessed by homology alignment search using the BLAST search engine.

2. Results

2.1. Overlapping peptide generation

The following GenBank accession numbers were used for the reference sequences of the SARS-CoV-2 proteins: surface glycoprotein - QHD43416.1, nucleocapsid - QHD43423.2, membrane - QHD43419.1, and envelope - YP_009724392.1. See appendix 1 for full protein sequences. Amino acids 1 to 643 of QHD43416.1 (SEQ ID NO: 741) represent S1 spike domain. Amino acids 633 to 1273 of QHD43416.1 (SEQ ID NO: 741) represent S2 spike domain.

Four lists of 15-mer peptide sequences with 11-aa overlap were generated (appendix 2).
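A sliding-window generation of 15-mers with an 11-residue overlap (that is, a step of 4 residues) can be sketched as below. The function name `overlapping_peptides` is illustrative, and the handling of a final window that does not reach the C-terminus (anchoring one extra peptide at the end of the sequence) is an assumption, since the text does not describe how the end of each protein was treated. The example uses the envelope protein sequence given in appendix 1.

```python
def overlapping_peptides(sequence, length=15, overlap=11):
    """Return overlapping peptides covering `sequence` from start to end."""
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than peptide length")
    if len(sequence) <= length:
        return [sequence]
    starts = list(range(0, len(sequence) - length + 1, step))
    # Assumption: if the last regular window stops short of the C-terminus,
    # add one final peptide anchored at the end of the sequence.
    if starts[-1] + length < len(sequence):
        starts.append(len(sequence) - length)
    return [sequence[s:s + length] for s in starts]


# SARS-CoV-2 envelope protein, as listed in appendix 1 (SEQ ID NO: 744).
ENVELOPE = (
    "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKP"
    "SFYVYSRVKNLNSSRVPDLLV"
)

peptides = overlapping_peptides(ENVELOPE)
print(len(peptides), peptides[0], peptides[-1])
```

Each peptide shares its last 11 residues with the first 11 residues of the next, so every position in the protein is covered by at least one peptide.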

2.2. Homology search

The 487 peptide sequences generated in section 2.1 were searched for homology using the BLAST search tool. Approximately 50,000 results were retrieved from the searches.

Results were filtered by number of matching amino acids between the peptide sequence and the result sequence, with greater than or equal to 9 matches considered high homology. This method fails to filter out matches consisting of multiple small alignments (e.g. three separate alignments of three residues) but does capture all high homology matches.
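The filtering step can be illustrated with the sketch below. It does not parse real BLAST output: the `Hit` record, its field names and the example hits are hypothetical and serve only to show the threshold. For a 15-mer, 9 matching residues corresponds to 60% identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    peptide_id: str   # hypothetical identifier of the query 15-mer
    organism: str     # organism of the matched sequence
    identities: int   # total number of matching amino acids for the hit


def high_homology(hits, min_identities=9):
    """Keep hits with at least `min_identities` matching residues."""
    return [hit for hit in hits if hit.identities >= min_identities]


def identity_fraction(identities, peptide_length=15):
    """Express a count of matching residues as a fraction of peptide length."""
    return identities / peptide_length


# Purely illustrative hits, not results from the study.
example_hits = [
    Hit("pep_001", "HCoV-OC43", 11),
    Hit("pep_002", "HCoV-229E", 7),
    Hit("pep_003", "HCoV-NL63", 9),
]

kept = high_homology(example_hits)
print([hit.peptide_id for hit in kept], identity_fraction(9))
```

As the text notes, a count-based filter of this kind cannot distinguish one contiguous alignment from several short ones that sum to the same number of matches.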

Five main categories of homology matches were detected:

1. SARS-CoV-2. These results were expected and confirm the correct sequences were used for the search terms.

2. SARS-CoV-1. SARS-CoV-2 shares a very high level of homology with SARS-CoV-1. Approximately 400 peptides from the 487 peptides on the list have detectable homology to SARS-CoV-1.

3. Non-coronavirus human pathogens. No major human pathogens or antigens were detected in the homology search. Several low quality hits (E values >1) were detected against proteins from pathogens such as E. coli and Campylobacter; however, these are unlikely to elicit cross-reactive immune responses as the homology is quite low.

4. Animal coronaviruses. There were over 1000 matches to 130 unique proteins from more than 50 different animal coronaviruses. Table 1 lists the animal coronaviruses detected. Despite the high homology detected between SARS-CoV-2 and the animal coronaviruses these sequences are unlikely to cause cross-reactive immune responses as it is very unlikely that humans would have been exposed to these viruses.

Table 1 - Animal coronaviruses with significant homology to SARS-CoV-2 peptides

5. Endemic human coronaviruses. Multiple matches to all four endemic human coronaviruses (HKU1, OC43, 229E, NL63) were detected. Table 2 lists the proteins and viruses where homology was detected. Homology was detected in 26 peptides from the spike, membrane and nucleocapsid pools. Homology was not detected in any peptides from the envelope pool. Appendix 3(a) lists the sequences of the peptides with high homology to the endemic human coronaviruses. The endemic human coronaviruses are a likely source of any cross-reactive immune response as infection with these viruses is very common. To ensure that all homology with the endemic human coronaviruses was captured, the filtering criterion was removed and all human coronavirus hits were selected from the BLAST results. This gave a list of 46 peptides with homology to the human coronaviruses. Appendix 3(b) lists these sequences.

Table 2 - Human coronaviruses and proteins with significant homology to SARS-CoV-2 peptides

3. Conclusion

Sequences for 487 overlapping peptides were generated from the spike, membrane, nucleocapsid and envelope proteins of SARS-CoV-2. Homology to common human pathogens was detected by performing a BLAST search on the sequences. The pathogens with the highest homology to the SARS-CoV-2 peptides were SARS-CoV-1 and the endemic human coronaviruses. The potential for peptide pools to provoke cross-reactive immune responses could be reduced by removing the identified peptides from the antigen pools used in a SARS-CoV-2 assay, such as an assay for cell mediated immunity to SARS-CoV-2.
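Removing the identified peptides from a pool, and checking how much of the protein the remaining peptides still span, can be sketched as follows. The function names, the toy 27-residue sequence (the first 27 residues of the envelope protein in appendix 1) and the choice of which window to treat as homologous are all illustrative assumptions; they are not results from this study.

```python
def split_by_homology(windows, homologous_starts):
    """Split (start, peptide) windows into specific and cross-reactive lists."""
    homologous_starts = set(homologous_starts)
    specific = [w for w in windows if w[0] not in homologous_starts]
    cross_reactive = [w for w in windows if w[0] in homologous_starts]
    return specific, cross_reactive


def coverage_fraction(protein_length, windows):
    """Fraction of protein positions covered by at least one window."""
    covered = set()
    for start, peptide in windows:
        covered.update(range(start, start + len(peptide)))
    return len(covered) / protein_length


# Toy example: 15-mers with 11-residue overlap over a 27-residue sequence.
PROTEIN = "MYSFVSEETGTLIVNSVLLFLAFVVFL"
windows = [(s, PROTEIN[s:s + 15]) for s in range(0, len(PROTEIN) - 14, 4)]

# Hypothetical: pretend the window starting at position 4 has a homolog.
specific, cross_reactive = split_by_homology(windows, homologous_starts={4})
print(len(specific), len(cross_reactive), coverage_fraction(len(PROTEIN), specific))
```

Because neighbouring 15-mers overlap by 11 residues, removing an isolated peptide leaves its positions covered by the adjacent peptides, which is why a pool with homologous peptides removed can still span most of the protein.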

Appendix 1 - full protein sequences

Full protein sequence of SARS-CoV-2 surface glycoprotein (spike glycoprotein) [QHD43416.1]

MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQD LFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFG TTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRV YSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINL VRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAY YVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNF RVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNS ASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYK LPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGS TPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKK STNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQT LEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWR VYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSV ASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYI CGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIK DFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLI CAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQM AYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQ NAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYV TQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGV VFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEP QIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDL GDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYIWLG FIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHY T (SEQ ID NO: 741)

Full protein sequence of SARS-CoV-2 membrane glycoprotein [QHD43419.1]

MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNRFLYIIKLIFL WLLWPVTLACFVLAAVYRINWITGGIAIAMACLVGLMWLSYFIASFRLFART RSMWSFNPETNILLNVPLHGTILTRPLLESELVIGAVILRGHLRIAGHHLGRCDI KDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYSRYRIGNYKLNTDHSSSS DNIALLVQ (SEQ ID NO: 742)

Full protein sequence of SARS-CoV-2 nucleocapsid phosphoprotein [QHD43423.2]

MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASW FTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDL SPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAA IVLQLPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNSSRNSTPGSSRGTSPARM AGNGGDAALALLLLDRLNQLESKMSGKGQQQQGQTVTKKSAAEASKKPRQ KRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSAS AFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPP TEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDDFSKQLQQSMSSADS TQA (SEQ ID NO: 743)

Full protein sequence of SARS-CoV-2 envelope protein [YP_009724392.1]

MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKP SFYVYSRVKNLNSSRVPDLLV (SEQ ID NO: 744)

Appendix 2 - overlapping peptide sequences

Overlapping peptide sequences derived from SARS-CoV-2 surface glycoprotein (spike glycoprotein) [QHD43416.1]

Overlapping peptide sequences derived from SARS-CoV-2 membrane protein [QHD43419.1]

Overlapping peptide sequences derived from SARS-CoV-2 nucleoprotein [QHD43423.2]

Overlapping peptide sequences derived from SARS-CoV-2 envelope protein [YP_009724392.1]

Appendix 3 - Peptide sequences with identified homology to endemic human coronaviruses

a) High homology cut off

b) Homology detected (no cut off)

Comparative Example 1 - MHC binding predictions

In an alternative approach to panel construction, performed for illustrative purposes only, a list of predicted MHC-binding epitopes was generated using the TepiTool software from the Immune Epitope Database (IEDB.org). MHC class I- and class II-binding peptides were predicted from the spike protein for the 27 most common HLA class I alleles and the 26 most common HLA class II alleles (see appendix 4 for raw TepiTool results). Once duplicate peptides were removed, a list of 117 9-mers and 137 15-mers was generated spanning the spike, envelope and nucleocapsid proteins (appendix 4a).

This list was then examined for homology using the BLAST search tool as described above. 29 peptides were identified as having high homology (>=9 aa matches) to human coronaviruses (appendix 4b), and 90 peptides (appendix 4c) had homology when the lower homology criterion was used.

Appendix 4

a) Predicted MHC class I and class II binding peptides from SARS-CoV-2 genes

b) MHC binding peptides with high homology to endemic human coronaviruses

c) MHC binding peptides with homology to endemic human coronaviruses

Example 2 - Use of optimised pools of fragments derived from SARS-CoV-2 proteins

ELISpot assays were performed using PBMC samples obtained from healthy donors. Various fragment pools were separately contacted with the PBMC samples in order to perform the ELISpot:

“P1-4” comprising one of panel 1, 2, 3 or 4 respectively. Each of panels 1 to 4 is a fragment pool in which the fragments form a protein fragment library encompassing the sequence of a SARS-CoV-2 protein. The fragments are 15 amino acids in length and overlap by 11 amino acids. Fragments having a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more of the endemic common cold coronaviruses are excluded from the protein fragment library. For panel 1, the SARS-CoV-2 protein is SARS-CoV-2 S1 spike domain (S1). For panel 2, the SARS-CoV-2 protein is SARS-CoV-2 S2 spike domain (S2). For panel 3, the SARS-CoV-2 protein is SARS-CoV-2 nucleocapsid protein (N). For panel 4, the SARS-CoV-2 protein is SARS-CoV-2 membrane protein (M).

“P13” comprising the fragments excluded from P1-4. The fragments in P13 each have a homolog that is an amino acid sequence that has at least 60% sequence identity to the fragment and is expressed by one or more of the endemic common cold coronaviruses (HKU1, OC43, 229E and NL63). The fragments comprised in P13 are set out in Table 3 below.

“P7-10” comprising one of panel 7, 8, 9 or 10 respectively. Each of panels 7 to 10 is a fragment pool in which the fragments form a protein fragment library encompassing the sequence of spike glycoprotein from a different endemic human coronavirus (P7 = HKU1, P8 = 229E, P9 = NL63, P10 = OC43). The fragments are 15 amino acids in length and overlap by 11 amino acids.

P1-4, P13 and P7-10 are represented graphically in Figure 1.

Table 3 - fragments comprised in panel 13 (P13)

Results

12% (53/449) were reactive to one of P1, P3 and P4.

76% (219/289) responded to Spike from at least one of the endemic strains, P7-10.

10% (47/449) responded to P13. For those subjects responding, the mean adjusted spot count was 16.5 (sd 13.6), the median was 11, and the range was from 6 to 64. In order to assess the value of P13 in distinguishing SARS-CoV-2 specific immune responses from cross-reactive immune responses primed by endemic coronaviruses, P13 reactive samples were allocated into the following groups:

Based on this dataset, it seems that in most cases P13 responses could be attributed to a prior exposure to endemic strains of coronaviruses (group 2). When individuals react (i.e. raise a T-cell immune response) to endemic strains, only a small proportion also react to SARS-CoV-2 (i.e. Panel 13). Cross-reactivity between common cold coronaviruses (CCCs) and SARS-CoV-2 is not, therefore, common in the population. However, it is possible that such responses provide some protection against COVID-19. P13 may have utility in screening for pre-existing cross-reactive immune responses for SARS-CoV-2 primed by prior exposure to one or more endemic coronaviruses.

P1-4 are optimised for high specificity for SARS-CoV-2. These pools exclude fragments that are potentially cross-reactive with homologs found in endemic coronaviruses. P1-4 may have utility in screening for SARS-CoV-2 specific immune responses.

Summary of immune reactive responses to SARS-CoV-2 peptide pools and to spike peptide pools from CCCs