Title:
METHOD OF IDENTIFYING A SUBJECT HAVING KAWASAKI DISEASE
Document Type and Number:
WIPO Patent Application WO/2020/030609
Kind Code:
A1
Abstract:
A method of identifying a subject having Kawasaki disease (KD), which includes discriminating the subject from a subject having another condition, for example other infectious and inflammatory conditions, such as those that present similar symptoms to KD. Also provided is a minimal gene signature employed in the method, as well as primers, probes and gene chips for use in the method.

Inventors:
HERBERG JETHRO (GB)
WRIGHT VICTORIA (GB)
LEVIN MICHAEL (GB)
HOGGART CLIVE (GB)
KAFOROU MYRSINI (GB)
Application Number:
PCT/EP2019/071052
Publication Date:
February 13, 2020
Filing Date:
August 05, 2019
Assignee:
IMPERIAL COLLEGE SCI TECH & MEDICINE (GB)
International Classes:
C12Q1/6883
Domestic Patent References:
WO2016014761A1 2016-01-28
Other References:
STEPHEN J. POPPER ET AL: "Gene Transcript Abundance Profiles Distinguish Kawasaki Disease from Adenovirus Infection", JOURNAL OF INFECTIOUS DISEASES. JID, vol. 200, no. 4, 15 August 2009 (2009-08-15), US, pages 657 - 666, XP055639469, ISSN: 0022-1899, DOI: 10.1086/603538
PREETI JAGGI ET AL: "Whole blood transcriptional profiles as a prognostic tool in complete and incomplete Kawasaki Disease", PLOS ONE, vol. 13, no. 5, 29 May 2018 (2018-05-29), pages e0197858, XP055639675, DOI: 10.1371/journal.pone.0197858
VICTORIA J. WRIGHT ET AL: "Diagnosis of Kawasaki Disease Using a Minimal Whole-Blood Gene Expression Signature", JAMA PEDIATRICS, vol. 172, no. 10, 6 August 2018 (2018-08-06), USA, pages e182293, XP055639460, ISSN: 2168-6203, DOI: 10.1001/jamapediatrics.2018.2293
EDGAR ET AL., NCBI'S GENE EXPRESSION OMNIBUS, 2002
KAWASAKI TKOSAKI FOKAWA SSHIGEMATSU IYANAGAWA H: "A new infantile acute febrile mucocutaneous lymph node syndrome (MLNS) prevailing in Japan", PEDIATRICS, vol. 54, no. 3, 1974, pages 271 - 6
MAKINO NNAKAMURA YYASHIRO MAE RTSUBOI SAOYAMA Y ET AL.: "Descriptive epidemiology of Kawasaki disease in Japan, 2011-2012: from the results of the 22nd nationwide survey", JOURNAL OF EPIDEMIOLOGY / JAPAN EPIDEMIOLOGICAL ASSOCIATION, vol. 25, no. 3, 2015, pages 239 - 45
DU ZDZHAO DDU JZHANG YLLIN YLIU C ET AL.: "Epidemiologic study on Kawasaki disease in Beijing from 2000 through 2004", THE PEDIATRIC INFECTIOUS DISEASE JOURNAL, vol. 26, no. 5, 2007, pages 449 - 51
KIM GBPARK SEUN LYHAN JWLEE SYYOON KL ET AL.: "Epidemiology and Clinical Features of Kawasaki Disease in South Korea, 2012-2014", THE PEDIATRIC INFECTIOUS DISEASE JOURNAL, vol. 36, no. 5, 2017, pages 482 - 5
LUE HCCHEN LRLIN MTCHANG LYWANG JKLEE CY ET AL.: "Estimation of the incidence of Kawasaki disease in Taiwan. A comparison of two data sources: nationwide hospital survey and national health insurance claims", PEDIATR NEONATOL., vol. 55, no. 2, 2014, pages 97 - 100
HARNDEN AMAYON-WHITE RPERERA RYEATES DGOLDACRE MBURGNER D: "Kawasaki disease in England: ethnicity, deprivation, and respiratory pathogens", THE PEDIATRIC INFECTIOUS DISEASE JOURNAL, vol. 28, no. 1, 2009, pages 21 - 4
HOLMAN RCBELAY EDCHRISTENSEN KYFOLKEMA AMSTEINER CASCHONBERGER LB: "Hospitalizations for Kawasaki syndrome among children in the United States, 1997-2007", THE PEDIATRIC INFECTIOUS DISEASE JOURNAL, vol. 29, no. 6, 2010, pages 483 - 8
KATO HSUGIMURA TAKAGI TSATO NHASHINO KMAENO Y ET AL.: "Long-term consequences of Kawasaki disease. A 10- to 21-year follow-up study of 594 patients", CIRCULATION, vol. 94, no. 6, 1996, pages 1379 - 85
SUDA KIEMURA MNISHIONO HTERAMACHI YKOTEDA YKISHIMOTO S ET AL.: "Long-term prognosis of patients with Kawasaki disease complicated by giant coronary aneurysms: a single-institution experience", CIRCULATION, vol. 123, no. 17, 2011, pages 1836 - 42
DANIELS LBGORDON JBBURNS JC: "Kawasaki disease: late cardiovascular sequelae", CURRENT OPINION IN CARDIOLOGY, vol. 27, no. 6, 2012, pages 572 - 7
YU JJ.: "Use of corticosteroids during acute phase of Kawasaki disease", WORLD J CLIN PEDIATR., vol. 4, no. 4, 2015, pages 135 - 42
TREMOULET AHJAIN SJAGGI PJIMENEZ-FERNANDEZ SPANCHERI JMSUN X ET AL.: "Infliximab for intensification of primary therapy for Kawasaki disease: a phase 3 randomised, double-blind, placebo-controlled trial", LANCET, vol. 383, no. 9930, 2014, pages 1731 - 8
DOMINGUEZ SRANDERSON MSEL-ADAWY MGLODE MP: "Preventing coronary artery abnormalities: a need for earlier diagnosis and treatment of Kawasaki disease", PEDIATR INFECT DIS J., vol. 31, no. 12, 2012, pages 1217 - 20
MCCRINDLE BWROWLEY AHNEWBURGER JWBURNS JCBOLGER AFGEWITZ M ET AL.: "Diagnosis, Treatment, and Long-Term Management of Kawasaki Disease: A Scientific Statement for Health Professionals From the American Heart Association", CIRCULATION, vol. 135, no. 17, 2017, pages e927 - e99
ANDERSON STKAFOROU MBRENT AJWRIGHT VJBANWELL CMCHAGALUKA G ET AL.: "Diagnosis of childhood tuberculosis and host RNA expression in Africa", THE NEW ENGLAND JOURNAL OF MEDICINE, vol. 370, no. 18, 2014, pages 1712 - 23, XP055316396, doi:10.1056/NEJMoa1303657
RAMILO OALLMAN WCHUNG WMEJIAS AARDURA MGLASER C ET AL.: "Gene expression patterns in blood leukocytes discriminate patients with acute infections", BLOOD, vol. 109, no. 5, 2007, pages 2066 - 77, XP002580520, doi:10.1182/BLOOD-2006-02-002477
FRANGOU EABERTSIAS GKBOUMPAS DT: "Gene expression and regulation in systemic lupus erythematosus", EUR J CLIN INVEST., vol. 43, no. 10, 2013, pages 1084 - 96
JIA HLLIU CWZHANG LXU WJGAO XJBAI J ET AL.: "Sets of serum exosomal microRNAs as candidate diagnostic biomarkers for Kawasaki disease", SCIENTIFIC REPORTS, vol. 7, 2017, pages 44706
KUO HCHSIEH KSMING-HUEY GUO MWENG KPGER LPCHAN WC ET AL.: "Next-generation sequencing identifies micro-RNA-based biomarker panel for Kawasaki disease", THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY, vol. 138, no. 4, 2016, pages 1227 - 30
HERBERG JAKAFOROU MWRIGHT VJSHAILES HELEFTHEROHORINOU HHOGGART CJ ET AL.: "Diagnostic Test Accuracy of a 2-Transcript Host RNA Signature for Discriminating Bacterial vs Viral Infection in Febrile Children", JAMA, vol. 316, no. 8, 2016, pages 835 - 45, XP009500864, doi:10.1001/jama.2016.11236
HOANG LT, SHIMIZU CLING LNAIM ANKHOR CCTREMOULET AH ET AL.: "Global gene expression profiling identifies new therapeutic targets in acute Kawasaki disease", GENOME MED., vol. 6, no. 11, 2014, pages 541, XP021207765, doi:10.1186/s13073-014-0102-6
HERBERG JAKAFOROU MGORMLEY SSUMNER ERPATEL SJONES KD ET AL.: "Transcriptomic profiling in childhood H1N1/09 influenza reveals reduced expression of protein synthesis genes", J INFECT DIS., vol. 208, no. 10, 2013, pages 1664 - 8, XP055234869, doi:10.1093/infdis/jit348
WATANABE S: "Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory", JOURNAL OF MACHINE LEARNING RESEARCH, vol. 11, 2010, pages 3571 - 94, XP058336459
JOHNSON WELI CRABINOVIC A: "Adjusting batch effects in microarray expression data using empirical Bayes methods", BIOSTATISTICS (OXFORD, ENGLAND), vol. 8, no. 1, 2007, pages 118 - 27, XP055067729, doi:10.1093/biostatistics/kxj037
ABE JEBATA RJIBIKI TYASUKAWA KSAITO HTERAI M: "Elevated granulocyte colony-stimulating factor levels predict treatment failure in patients with Kawasaki disease", THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY, vol. 122, no. 5, 2008, pages 1008 - 13, XP025641323, doi:10.1016/j.jaci.2008.09.011
ABE JJIBIKI TNOMA SNAKAJIMA TSAITO HTERAI M: "Gene expression profiling of the effect of high-dose intravenous Ig in patients with Kawasaki disease", JOURNAL OF IMMUNOLOGY, vol. 174, no. 9, 2005, pages 5837 - 45, XP002371640
FURY WTREMOULET AHWATSON VEBEST BMSHIMIZU CHAMILTON J ET AL.: "Transcript abundance patterns in Kawasaki disease patients with intravenous immunoglobulin resistance", HUM IMMUNOL., vol. 71, no. 9, 2010, pages 865 - 73, XP055365050, doi:10.1016/j.humimm.2010.06.008
POPPER SJSHIMIZU CSHIKE HKANEGAYE JTNEWBURGER JWSUNDEL RP ET AL.: "Gene-expression patterns reveal underlying biological processes in Kawasaki disease", GENOME BIOL., vol. 8, no. 12, 2007, pages R261, XP021041527
POPPER SJWATSON VESHIMIZU CKANEGAYE JTBURNS JCRELMAN DA: "Gene transcript abundance profiles distinguish Kawasaki disease from adenovirus infection", J INFECT DIS., vol. 200, no. 4, 2009, pages 657 - 66
EBIHARA TENDO RKIKUTA HISHIGURO NMA XSHIMAZU M ET AL.: "Differential gene expression of S100 protein family in leukocytes from patients with Kawasaki disease", EUROPEAN JOURNAL OF PEDIATRICS, vol. 164, no. 7, 2005, pages 427 - 31, XP019345210, doi:10.1007/s00431-005-1664-5
HU XRYU JSCROSBY SDSTORCH GA: "Gene expression profiles in febrile children with defined viral and bacterial infection", P NATL ACAD SCI USA, vol. 110, no. 31, 2013, pages 12792 - 7, XP055379373, doi:10.1073/pnas.1302968110
O'HANLON TPRIDER LGGAN LFANNIN RPAULES RSUMBACH DM ET AL.: "Gene expression profiles from discordant monozygotic twins suggest that molecular pathways are shared among multiple systemic autoimmune diseases", ARTHRITIS RES THER., vol. 13, no. 2, 2011, pages R69, XP021102307, doi:10.1186/ar3330
ISHII TONDA HTANIGAWA AOHSHIMA SFUJIWARA HMIMA T ET AL.: "Isolation and expression profiling of genes upregulated in the peripheral blood cells of systemic lupus erythematosus patients", DNA RES., vol. 12, no. 6, 2005, pages 429 - 39, XP002523876, doi:10.1093/DNARES/DSI020
FABRIEK BOVAN BRUGGEN RDENG DMLIGTENBERG AJNAZMI KSCHORNAGEL K ET AL.: "The macrophage scavenger receptor CD163 functions as an innate immune sensor for bacteri", BLOOD, vol. 113, no. 4, 2009, pages 887 - 92
MINICH LLSLEEPER LAATZ AMMCCRINDLE BWLU MCOLAN SD ET AL.: "Delayed diagnosis of Kawasaki disease: what are the risk factors?", PEDIATRICS, vol. 120, no. 6, 2007, pages e1434 - 40
NEWBURGER JWTAKAHASHI MGERBER MAGEWITZ MHTANI LYBURNS JC ET AL.: "Diagnosis, treatment, and long-term management of Kawasaki disease: a statement for health professionals from the Committee on Rheumatic Fever, Endocarditis and Kawasaki Disease, Council on Cardiovascular Disease in the Young, American Heart Association", CIRCULATION, vol. 110, no. 17, 2004, pages 2747 - 71
PETTY RESOUTHWOOD TRMANNERS PBAUM JGLASS DNGOLDENBERG J ET AL.: "International League of Associations for Rheumatology classification of juvenile idiopathic arthritis: second revision, Edmonton, 2001", J RHEUMATOL., vol. 31, no. 2, 2004, pages 390 - 2
DU PKIBBE WALIN SM: "lumi: a pipeline for processing Illumina microarray", BIOINFORMATICS, vol. 24, no. 13, 2008, pages 1547 - 8
RITCHIE MEPHIPSON BWU DHU YLAW CWSHI W ET AL.: "limma powers differential expression analyses for RNA-sequencing and microarray studies", NUCLEIC ACIDS RES., vol. 43, no. 7, 2015, pages e47
GELMAN AHWANG JVEHTARI A: "Understanding predictive information criteria for Bayesian models", STATISTICS AND COMPUTING, vol. 24, no. 6, 2014, pages 997 - 1016, XP035377382, doi:10.1007/s11222-013-9416-2
ROBIN XTURCK NHAINARD ATIBERTI NLISACEK FSANCHEZ JC ET AL.: "pROC: an open-source package for R and S+ to analyze and compare ROC curves", BMC BIOINFORMATICS, vol. 12, 2011, pages 77, XP021096345, doi:10.1186/1471-2105-12-77
YOUDEN WJ: "Index for rating diagnostic tests", CANCER, vol. 3, no. 1, 1950, pages 32 - 5
BERNARDO JMSMITH AFM: "Chichester, Eng.", vol. xiv, 1994, WILEY, article "Bayesian theory", pages: 586
HOGGART C.J.: "PReMS: Parallel Regularised Regression Model Search for sparse bio-signature discovery", BIORXIV 355479, 2018
Attorney, Agent or Firm:
STERLING IP LTD (GB)
Claims:
CLAIMS

1. A method of identifying a subject having Kawasaki disease (KD) comprising detecting in a subject derived RNA sample the modulation in gene expression levels of a gene signature comprising at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

2. The method according to claim 1, wherein the gene signature comprises 6, 7, 8, 9, 10, 11, 12 or 13 of the genes.

3. The method according to claims 1 or 2, wherein the gene signature comprises at least one of the following genes: PYROXD2, SMOX, CACNA1E, CD163, DDIAS, CLIC3, KLHL2 and HS.553068, in particular at least one of PYROXD2, SMOX, CACNA1E and CD163.

4. The method according to any one of claims 1 to 3, wherein the gene signature comprises PYROXD2.

5. The method according to any one of claims 1 to 4, wherein the gene signature comprises CACNA1E.

6. The method according to any one of claims 1 to 5, wherein the gene signature comprises SMOX.

7. The method according to any one of claims 1 to 6, wherein the gene signature comprises CD163.

8. The method according to any one of claims 1 to 7, wherein the gene signature comprises:

(i) PYROXD2 and CACNA1E;

(ii) PYROXD2 and SMOX; or

(iii) PYROXD2, CACNA1E and SMOX.

9. The method according to any one of claims 1 to 8, wherein the gene signature comprises or consists of any one of the following combinations of genes:

(i) PYROXD2, CACNA1E, CD163, KLHL2 and SMOX;

(ii) PYROXD2, CACNA1E, IFI27, KLHL2 and SMOX;

(iii) PYROXD2, CACNA1E, HS.553068, IFI27 and SMOX;

(iv) PYROXD2, DDIAS, CACNA1E, IFI27 and SMOX;

(v) PYROXD2, CACNA1E, CD163, KLHL2 and ZNF185;

(vi) PYROXD2, DDIAS, CD163, KLHL2 and SMOX;

(vii) PYROXD2, CACNA1E, CD163, IFI27, KLHL2 and SMOX;

(viii) PYROXD2, CACNA1E, CD163, KLHL2, LINC02035 and SMOX;

(ix) PYROXD2, DDIAS, CACNA1E, CD163, IFI27 and SMOX;

(x) PYROXD2, CACNA1E, CD163, HS.553068, IFI27 and SMOX;

(xi) PYROXD2, CACNA1E, CD163, KLHL2, SMOX and ZNF185;

(xii) PYROXD2, CACNA1E, IFI27, KLHL2, RTN1 and SMOX;

(xiii) PYROXD2, CACNA1E, CD163, CLIC3, KLHL2 and SMOX;

(xiv) PYROXD2, CACNA1E, CLIC3, IFI27, KLHL2 and SMOX;

(xv) PYROXD2, DDIAS, CACNA1E, IFI27, RTN1 and SMOX;

(xvi) PYROXD2, DDIAS, CD163, IFI27, KLHL2 and SMOX;

(xvii) PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2 and SMOX;

(xviii) PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX;

(xix) PYROXD2, DDIAS, CACNA1E, CD163, IFI27, KLHL2 and SMOX;

(xx) PYROXD2, CACNA1E, CD163, IFI27, KLHL2, RTN1 and SMOX;

(xxi) PYROXD2, DDIAS, CACNA1E, CD163, HS.553068, IFI27 and SMOX;

(xxii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX;

(xxiii) PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX;

(xxiv) PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX;

(xxv) PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2, S100P and SMOX;

(xxvi) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX;

(xxvii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX;

(xxviii) PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX;

(xxix) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P and SMOX;

(xxx) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX;

(xxxi) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P and SMOX;

(xxxii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX;

(xxxiii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185;

(xxxiv) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185;

(xxxv) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX; or

(xxxvi) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P, SMOX and ZNF185.

10. The method according to any one of claims 1 to 8, wherein the gene signature comprises or consists of CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

11. The method according to any one of claims 1 to 10, wherein the method further incorporates detecting the expression levels of one or more housekeeping genes, such as 1, 2, 3, 4 or 5 housekeeping genes, for example selected from actin, GAPDH, ubiquitin, 18s rRNA, RPII (POLR2A), TBP, PPIA, GUSB, HSPCB, YWHAZ, SDHA, RPS13, HPRT1 and B4GALT6.

12. The method according to any one of claims 1 to 11, wherein the subject with KD can be identified in the presence of or discriminated from a patient with one or more of the following: a bacterial infection, a viral infection and an inflammatory condition.

13. The method according to any one of claims 1 to 12, wherein the subject is a child, for example where the child is in the age range 2 to 59 months.

14. The method according to any one of claims 1 to 13, wherein the subject has a fever.

15. The method according to any one of claims 1 to 14, wherein the analysis of gene expression modulation employs a microarray, a gene chip or PCR, such as RT-PCR, in particular a multiplex PCR.

16. The method according to any one of claims 1 to 15, which comprises the further step of prescribing or administering a treatment for Kawasaki disease (KD) to the subject based on the results of the analysis of the gene signature.

17. A method of treating a subject having Kawasaki disease (KD), comprising administering a treatment for KD to the subject, wherein the subject has been previously identified as having Kawasaki disease by detecting in a subject derived RNA sample the modulation in gene expression levels of a gene signature comprising at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1, for example employing a method according to any one of claims 1 to 15.

18. The method according to claims 16 or 17, wherein the treatment is gamma globulin (IVIg), aspirin, or other anti-inflammatory agents, such as steroids and infliximab, or a combination thereof.

19. A set of primers for use in a method of identifying a subject having Kawasaki disease (KD) comprising primers specific to a polynucleotide gene transcript from at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

20. A gene chip consisting of probes that are specific to at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

21. A gene chip consisting of probes that are specific to at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1; and one or more control probes, for example selected from the group consisting of actin, GAPDH, ubiquitin, 18s rRNA, RPII (POLR2A), TBP, PPIA, GUSB, HSPCB, YWHAZ, SDHA, RPS13, HPRT1 and B4GALT6.

22. A point of care test for identifying a subject having Kawasaki disease (KD) comprising the set of primers according to claim 19 or the gene chip according to claims 20 or 21.

23. Use of the set of primers according to claim 19 or the gene chip according to claims 20 or 21 in an assay to detect Kawasaki disease (KD) in a sample, for example a blood sample.
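
By way of illustration only, the counting condition recited in claims 1 and 2 (a signature comprising at least 5 of the 13 listed genes) can be sketched in Python as follows. The function names `genes_in_signature` and `meets_minimum`, and the case-insensitive matching of gene symbols, are assumptions made for this example and do not form part of the application.

```python
# The 13 genes recited in claim 1.
SIGNATURE_GENES = frozenset({
    "CACNA1E", "DDIAS", "KLHL2", "PYROXD2", "SMOX", "ZNF185", "LINC02035",
    "CLIC3", "S100P", "IFI27", "HS.553068", "CD163", "RTN1",
})

MIN_GENES = 5  # "at least 5" in claim 1


def genes_in_signature(panel):
    """Return the subset of `panel` that belongs to the 13-gene list."""
    # Trimming and upper-casing symbols is a convenience assumed here.
    return {gene.strip().upper() for gene in panel} & SIGNATURE_GENES


def meets_minimum(panel, minimum=MIN_GENES):
    """True if the panel covers at least `minimum` of the listed genes."""
    return len(genes_in_signature(panel)) >= minimum
```

For example, the five-gene combination of claim 9(i) satisfies `meets_minimum`, whereas a panel made up mostly of housekeeping genes does not.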

Description:
METHOD OF IDENTIFYING A SUBJECT HAVING KAWASAKI DISEASE

The present disclosure relates to a method of identifying a subject having Kawasaki disease (KD), which includes discriminating the subject from a subject having another condition, for example other infectious and inflammatory conditions, such as those that present similar symptoms to KD. The disclosure also relates to a minimal gene signature employed in the said method and to a bespoke gene chip for use in the method. The disclosure further extends to probes and/or primers specific to genes in a signature of the present disclosure. The disclosure further relates to use of known gene chips in the methods of the disclosure and kits comprising the elements required for performing the method. The disclosure also relates to use of the method to provide a composite expression score which can be used in the discrimination of a bacterial infection from a viral infection or inflammatory disease, particularly suitable for use in a low resource setting.
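
As a hedged illustration of what a composite expression score might look like, the Python sketch below sums the expression values of genes assumed to be up-regulated and subtracts those assumed to be down-regulated, then compares the result with a threshold. This passage does not define the formula, the direction of regulation of any gene, or any threshold, so the function names, the gene labels `GENE_A` to `GENE_C` and all numerical values are hypothetical.

```python
def composite_score(expression, up_genes, down_genes):
    """Sum of values for up-regulated genes minus sum for down-regulated genes.

    `expression` maps a gene name to a log-scale expression value.
    Which genes count as up or down is supplied by the caller (assumption).
    """
    missing = [g for g in (*up_genes, *down_genes) if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    up_total = sum(expression[g] for g in up_genes)
    down_total = sum(expression[g] for g in down_genes)
    return up_total - down_total


def classify(score, threshold):
    """Call a sample positive when its score exceeds the threshold."""
    return score > threshold


# Hypothetical values, for illustration only.
sample = {"GENE_A": 9.0, "GENE_B": 8.5, "GENE_C": 6.0}
score = composite_score(sample, up_genes=["GENE_A", "GENE_B"], down_genes=["GENE_C"])
```

A single additive score of this kind needs no model-fitting software at the point of use, which is one reason such a score may suit a low resource setting.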

BACKGROUND

Kawasaki disease (KD) is an acute inflammatory disorder predominantly affecting young children. Since its initial description in Japan [1], the disease has emerged as the most common cause of acquired heart disease, with an incidence in children under five of 265/100,000 in Japan [2], 51-194/100,000 in other Asian countries [3-5], and 8-20/100,000 in Europe [6] and the USA [7]. What has made KD of such concern is its association with vasculitis, predominantly affecting the coronary arteries, which results in coronary artery aneurysm (CAA) formation in up to 25% of untreated children [8]. Death from myocardial infarction may occur due to thrombotic occlusion of the aneurysms, or from the later development of stenotic lesions due to vascular remodelling in the damaged artery. Long-term outcome studies of children with giant CAA indicate a worrying prognosis, with over 50% needing revascularization or suffering myocardial infarction within a 30-year period [9, 10].

Treatment with intravenous immunoglobulin (IVIG) and, for those who do not respond, the administration of additional IVIG [11] or other anti-inflammatory agents such as steroids and infliximab, is effective in abrogating the inflammatory process and reduces the risk of CAA to 5-10% [12]. As KD is difficult to distinguish from other common febrile conditions, many children with KD are not diagnosed and treated early enough in the course of the illness to prevent development of CAA [13]. Furthermore, patients who do not fulfil the clinical criteria for diagnosing KD (so-called "incomplete KD") may nonetheless suffer CAA. Delayed diagnosis is a consistent risk factor for development of CAA, and even in centres with considerable experience with KD, treatment is often commenced only when coronary dilatation is already demonstrated on echocardiography. CAA development is clinically silent and may be recognised only years later at the time of sudden death or myocardial infarction.

The symptoms of KD are similar to those of several other childhood febrile illnesses, including staphylococcal and streptococcal toxic shock syndromes, measles and other viral illnesses such as adenovirus infection, Rocky Mountain spotted fever, and childhood inflammatory diseases, leading to diagnostic difficulty and thus delay in diagnosis and treatment. Guidelines have been developed to facilitate clinical diagnosis based on clinical signs and symptoms, echocardiography, and laboratory parameters [14]. However, there is no definitive diagnostic test for the disease. As the global incidence of KD is increasing, there is an urgent need for an accurate test to distinguish KD from other conditions causing prolonged fever in children.

SUMMARY OF THE INVENTION

In the era of precision medicine, diagnosis of many conditions previously based on clinical features alone is being replaced by diagnosis based on molecular pathology. Host blood gene expression signatures have been shown to distinguish a number of specific infectious and inflammatory diseases including tuberculosis [15], bacterial and viral infections [16], and systemic lupus erythematosus [17]. Support for a diagnostic approach for KD based on gene expression signatures comes from identification of microRNA biomarkers in KD [18, 19], though existing studies are limited by the range of comparator patient groups, or by the need to extract RNA from exosomes.

Accordingly, the present inventors have explored the use of whole blood gene expression patterns to distinguish KD from other childhood infectious and inflammatory conditions. The present disclosure provides a gene expression signature, discovered and validated in independent patient groups, that distinguishes KD from a range of bacterial, viral and inflammatory illnesses.

The present disclosure is summarised in the following paragraphs:

1. A method of identifying a subject having Kawasaki disease (KD) comprising detecting in a subject derived RNA sample the modulation in gene expression levels of a gene signature comprising at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

2. The method according to paragraph 1, wherein the gene signature comprises 6, 7, 8, 9, 10, 11, 12 or 13 of the genes.

3. The method according to paragraphs 1 or 2, wherein the gene signature comprises at least one of the following genes: PYROXD2, SMOX, CACNA1E, CD163, DDIAS, CLIC3, KLHL2 and HS.553068, in particular at least one of PYROXD2, SMOX, CACNA1E and CD163.

4. The method according to any one of paragraphs 1 to 3, wherein the gene signature comprises PYROXD2.

5. The method according to any one of paragraphs 1 to 4, wherein the gene signature comprises CACNA1E.

6. The method according to any one of paragraphs 1 to 5, wherein the gene signature comprises SMOX.

7. The method according to any one of paragraphs 1 to 6, wherein the gene signature comprises CD163.

8. The method according to any one of paragraphs 1 to 7, wherein the gene signature comprises:

(i) PYROXD2 and CACNA1E;

(ii) PYROXD2 and SMOX; or

(iii) PYROXD2, CACNA1E and SMOX.

9. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 5 of the genes, for example selected from:

(i) PYROXD2, CACNA1E, CD163, KLHL2 and SMOX;

(ii) PYROXD2, CACNA1E, IFI27, KLHL2 and SMOX;

(iii) PYROXD2, CACNA1E, HS.553068, IFI27 and SMOX;

(iv) PYROXD2, DDIAS, CACNA1E, IFI27 and SMOX;

(v) PYROXD2, CACNA1E, CD163, KLHL2 and ZNF185; or

(vi) PYROXD2, DDIAS, CD163, KLHL2 and SMOX.

10. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 6 of the genes, for example selected from:

(i) PYROXD2, CACNA1E, CD163, IFI27, KLHL2 and SMOX;

(ii) PYROXD2, CACNA1E, CD163, KLHL2, LINC02035 and SMOX;

(iii) PYROXD2, DDIAS, CACNA1E, CD163, IFI27 and SMOX;

(iv) PYROXD2, CACNA1E, CD163, HS.553068, IFI27 and SMOX;

(v) PYROXD2, CACNA1E, CD163, KLHL2, SMOX and ZNF185;

(vi) PYROXD2, CACNA1E, IFI27, KLHL2, RTN1 and SMOX;

(vii) PYROXD2, CACNA1E, CD163, CLIC3, KLHL2 and SMOX;

(viii) PYROXD2, CACNA1E, CLIC3, IFI27, KLHL2 and SMOX;

(ix) PYROXD2, DDIAS, CACNA1E, IFI27, RTN1 and SMOX; or

(x) PYROXD2, DDIAS, CD163, IFI27, KLHL2 and SMOX.

11. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 7 of the genes, for example selected from:

(i) PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2 and SMOX;

(ii) PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX;

(iii) PYROXD2, DDIAS, CACNA1E, CD163, IFI27, KLHL2 and SMOX;

(iv) PYROXD2, CACNA1E, CD163, IFI27, KLHL2, RTN1 and SMOX; or

(v) PYROXD2, DDIAS, CACNA1E, CD163, HS.553068, IFI27 and SMOX.

12. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 8 of the genes, for example selected from:

(i) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX;

(ii) PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX;

(iii) PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX; or

(iv) PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2, S100P and SMOX.

13. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 9 of the genes, for example selected from:

(i) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX;

(ii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX; or

(iii) PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

14. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 10 of the genes, for example selected from:

(i) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P and SMOX; or

(ii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

15. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 11 of the genes, for example selected from:

(i) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P and SMOX;

(ii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX; or

(iii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185.

16. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of at least 12 of the genes, for example selected from:

(i) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185;

(ii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX; or

(iii) PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P, SMOX and ZNF185.

17. The method according to any one of paragraphs 1 to 8, wherein the gene signature comprises or consists of CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

18. The method according to any one of paragraphs 1 to 17, wherein the method further incorporates detecting the expression levels of one or more housekeeping genes, such as 1, 2, 3, 4 or 5 housekeeping genes, for example selected from actin, GAPDH, ubiquitin, 18s rRNA, RPII (POLR2A), TBP, PPIA, GUSB, HSPCB, YWHAZ, SDHA, RPS13, HPRT1 and B4GALT6.

19. The method according to any one of paragraphs 1 to 18, wherein a subject with KD can be identified in the presence of one or more of the following: a bacterial infection, a viral infection and an inflammatory condition.

20. The method according to any one of paragraphs 1 to 19, wherein a subject with KD can be discriminated from a patient with one or more of the following: a bacterial infection, a viral infection and an inflammatory condition.

The method according to paragraphs 19 or 20, wherein the bacterial infection is selected from the group consisting of: Chlamydia pneumoniae, Chlamydia trachomatis, Chlamydophila psittaci, Mycoplasma pneumonia, Corynebactehum diphtheriae, Clostridium botulinum, Clostridium difficile, Clostridium perfringens, Clostridium tetani, Enterococcus faecalis, Enterococcus faecium, Listeria monocytogenes, Staphylococcus aureus, Staphylococcus epidermidis, Staphylococcus saprophyticus, Group B streptococcus, Streptococcus agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, or acid fast bacteria such as Mycobacterium leprae, Mycobaterium tuberculosis, Mycobacterium ulcerans, mycobacterium avium intercellularae, Bordetella pertussis, Borrelia burgdorferi, Brucella abortus, Brucella canis, Brucella melitensis, Brucella suis, Campylobacter jejuni, Escherichia coli, Francisella tularensis, Haemophilus influenzae, Helicobacter pylori, Legionella pneumophila, Leptospira interrogans, Neisseria gonorrhoeae, Neisseria meningitidis, Pseudomonas aeruginosa, Pseudomonas spp, Rickettsia rickettsii, Salmonella typhi, Salmonella typhimurium, Shigella sonnei, Treponema pallidum, Vibrio cholerae, Yersinia pestis, Kingella kingae, Stenotrophomonas, Klebsiella, a gram-positive coccus, a gram-negative bacillus, mycoplasma, pertussis, mycobacteria and staphylococcal and streptococcal toxic shock syndromes, for example a gram-positive coccus, a gram-negative bacillus, mycoplasma or pertussis, and mycobacteria, in particular selected from the group consisting of S.pneumoniae, S.aureus, S.pyogenes, Group B streptococcus, E.coli, N. meningitidis, Enterococcus, Kingella, H.influenzae, Pseudomonas spp, Stenotrophomonas, Klebsiella, staphylococcal and streptococcal toxic shock syndrome, in particular staphylococcal or streptococcal toxic shock syndrome.. 
The method according to any one of paragraphs 19 to 21, wherein viral infection is selected from the group consisting of: selected from the group consisting of: Influenza such as Influenza A, including but not limited to: H1N1, H2N2, H3N2, H5N1, H7N7, H1N2, H9N2, H7N2, H7N3, H10N7, Influenza B and Influenza C, Respiratory Syncytial Virus (RSV), rhinovirus, enterovirus, bocavirus, parainfluenza (such as parainfluenza 1-4), adenovirus, metapneumovirus, herpes simplex virus, Chickenpox virus, Human papillomavirus, Hepatitis, Epstein-Barr virus, Varicella- zoster virus, Human cytomegalovirus, Human herpesvirus, type 8 BK virus, JC virus, Smallpox, Parvovirus B19, Human astrovirus, Norwalk virus, coxsackievirus, poliovirus, Severe acute respiratory syndrome virus, yellow fever virus, dengue virus, West Nile virus, Rubella virus, Human immunodeficiency virus, Guanarito virus, Junin virus, Lassa virus, Machupo virus, Sabia virus, Crimean-Congo haemorrhagic fever virus, Ebola virus, Marburg virus, Measles virus, Mumps virus, Rabies virus, Rotavirus, and Rocky Mountain spotted fever, for example selected from the group consisting of: respiratory syncytial virus (RSV), adenovirus, parainfluenza virus (such as parainfluenza 1-4), influenza (such as influenza A, B or A+B), bocavirus, metapneumovirus, rhinovirus and enterovirus, in particular RSV, influenza A/B and adenovirus, in particular measles, an adenovirus infection and Rocky Mountain spotted fever.

23. The method according to any one of paragraphs 19 to 22, wherein the inflammatory condition is selected from the group consisting of asthma, peptic ulcers, tuberculosis, periodontitis, ulcerative colitis, Crohn’s disease, sinusitis, hepatitis, multiple sclerosis, atherosclerosis, Sjögren’s disease, inflammatory bowel disease, lupus erythematosus (including systemic lupus erythematosus), fibrotic diseases, such as pulmonary fibrosis, Henoch-Schönlein Purpura (HSP) and Juvenile Idiopathic Arthritis (JIA), in particular Henoch-Schönlein Purpura (HSP) or Juvenile Idiopathic Arthritis (JIA).

24. The method according to any one of paragraphs 1 to 23, wherein the subject is a child, for example where the child is in the age range 2 to 59 months.

25. The method according to any one of paragraphs 1 to 23, wherein the subject is an infant in the age range 0 to 59 days.

26. The method according to any one of paragraphs 1 to 25, wherein the subject has a fever.

27. The method according to any one of paragraphs 1 to 26, wherein the analysis of gene expression modulation employs a microarray or a gene chip.

28. The method according to any one of paragraphs 1 to 27, wherein the analysis of gene expression modulation employs PCR, such as RT-PCR, in particular a multiplex PCR.

29. The method according to paragraph 14 or 15, wherein the PCR is quantitative.

30. The method according to any one of paragraphs 28 to 29, wherein primers employed in the PCR comprise a label or a combination of labels, for example wherein the label is fluorescent or coloured, for example coloured beads.

31. The method according to any one of paragraphs 1 to 30, which comprises the further step of prescribing or administering a treatment for Kawasaki disease (KD) to the subject based on the results of the analysis of the gene signature.

32. A method of treating a subject having Kawasaki disease (KD), comprising administering a treatment for KD to the subject, wherein the subject has been previously identified as having Kawasaki disease by detecting in a subject derived RNA sample the modulation in gene expression levels of a gene signature comprising at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

33. The method according to paragraphs 31 or 32, wherein the treatment is gamma globulin (IVIg), aspirin, or other anti-inflammatory agents, such as steroids and infliximab, or a combination thereof.

34. A set of primers for use in a method of identifying a subject having Kawasaki disease (KD) comprising primers specific to a polynucleotide gene transcript from at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

35. The set of primers according to paragraph 34, consisting of primers that are only specific to the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

36. A gene chip consisting of probes that are specific to at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

37. A gene chip consisting of probes that are specific to at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1; and one or more control probes.

38. The gene chip according to paragraph 37, wherein the one or more control probes are specific to a gene selected from the group consisting of actin, GAPDH, ubiquitin, 18s rRNA, RPII (POLR2A), TBP, PPIA, GUSB, HSPCB, YWHAZ, SDHA, RPS13, HPRT1 and B4GALT6.

39. A point of care test for identifying a subject having Kawasaki disease (KD) comprising the set of primers defined in paragraphs 34 or 35 or the gene chip according to any one of paragraphs 36 to 38.

40. Use of the set of primers defined in paragraphs 34 or 35 or the gene chip according to any one of paragraphs 36 to 38 in an assay to detect Kawasaki disease (KD) in a sample, for example a blood sample.

The present disclosure provides a method of identifying a subject having Kawasaki disease (KD) comprising detecting the expression levels of at least 5 of the following genes: CACNA1E, DDIAS (C11ORF82), KLHL2, PYROXD2 (C10ORF33), SMOX, ZNF185, LINC02035 (LOC100129550), CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

Therefore, in one aspect, there is a method of identifying a subject having Kawasaki disease (KD) comprising detecting in a subject derived RNA sample the modulation in gene expression levels of a gene signature comprising at least 5 of the following genes: CACNA1E, DDIAS (C11ORF82), KLHL2, PYROXD2 (C10ORF33), SMOX, ZNF185, LINC02035 (LOC100129550), CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

Advantageously, use of the gene signature in a method according to the present disclosure allows the robust and accurate identification of a subject having KD. Importantly, the method allows accurate discrimination of patients having KD from those who display similar symptoms but instead have bacterial infections, viral infections and/or inflammatory conditions. In other words, the method allows the accurate detection of KD in the presence or absence of bacterial infections, viral infections and/or inflammatory conditions, without the need to rely on clinical criteria and/or laboratory tests such as echocardiography.

Gene signatures often comprise a large number of genes which only in combination show a pattern or marker of biological significance. It is very surprising that the gene signature of the present disclosure can be based on as few as 13 genes and still reliably identify the presence of KD.

A gene signature of the present disclosure comprising at least 5 of the above mentioned 13 genes provides good predictive power. However, additional genes can be included in the signature in order to further augment and increase the discriminatory power of the gene signature. Thus, in one embodiment, the signature comprises at least 5, 6, 7, 8, 9, 10, 11, 12 or 13 of the genes.
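By way of non-limiting illustration, the requirement that a signature comprise at least 5 of the 13 genes can be checked programmatically. The following Python sketch is not part of the disclosed method; the names `SIGNATURE_GENES`, `count_signature_genes` and `meets_minimum` are hypothetical, and the gene symbols are written with their standard spellings.

```python
# The 13 genes of the signature, using standard gene symbols.
SIGNATURE_GENES = frozenset({
    "CACNA1E", "DDIAS", "KLHL2", "PYROXD2", "SMOX", "ZNF185", "LINC02035",
    "CLIC3", "S100P", "IFI27", "HS.553068", "CD163", "RTN1",
})


def count_signature_genes(measured):
    """Count how many of the measured genes belong to the 13-gene signature."""
    return len(SIGNATURE_GENES & set(measured))


def meets_minimum(measured, minimum=5):
    """Return True if the measured panel covers at least `minimum` signature genes."""
    return count_signature_genes(measured) >= minimum
```

A panel may contain further genes (for example housekeeping genes); only those in `SIGNATURE_GENES` count towards the minimum.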

Thus, in one embodiment, the signature comprises at least PYROXD2. In one embodiment, the signature comprises at least SMOX. In one embodiment, the signature comprises at least CACNA1E. In one embodiment, the signature comprises at least CD163. In one embodiment, the signature comprises at least DDIAS. In one embodiment, the signature comprises at least CLIC3. In one embodiment, the signature comprises at least KLHL2. In another embodiment, the signature comprises at least HS.553068. In another embodiment, the signature comprises at least RTN1. In another embodiment, the signature comprises at least ZNF185. In another embodiment, the signature comprises at least IFI27. In another embodiment, the signature comprises at least S100P. In another embodiment, the signature comprises at least LINC02035.

In one embodiment, the gene signature comprises at least one of the following genes: PYROXD2, SMOX, CACNA1E, CD163, DDIAS, CLIC3, KLHL2 and HS.553068. In another embodiment, the gene signature comprises at least one of the following genes: PYROXD2, SMOX, CACNA1E and CD163. The present inventors have discovered that these particular genes have higher discriminatory power and are therefore more likely to be present in the signatures with the best predictive capabilities.

For example, the gene signature may comprise any of the following combinations of genes: PYROXD2, SMOX, CACNA1E and CD163; PYROXD2, SMOX and CACNA1E; PYROXD2, SMOX and CD163; SMOX, CACNA1E and CD163; PYROXD2, CACNA1E and CD163; PYROXD2 and SMOX; PYROXD2 and CACNA1E; PYROXD2 and CD163; SMOX and CACNA1E; SMOX and CD163; or CACNA1E and CD163; or any other combination.
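The combinations listed above are every subset of two or more of the four genes PYROXD2, SMOX, CACNA1E and CD163. As an illustrative sketch only (the function name `multi_gene_subsets` is hypothetical), they can be enumerated with the Python standard library:

```python
from itertools import combinations

# The four genes described as having higher discriminatory power.
TOP_GENES = ("PYROXD2", "SMOX", "CACNA1E", "CD163")


def multi_gene_subsets(genes):
    """Return every subset of two or more genes, largest subsets first."""
    subsets = []
    for size in range(len(genes), 1, -1):
        subsets.extend(combinations(genes, size))
    return subsets


subsets = multi_gene_subsets(TOP_GENES)
```

This yields one four-gene, four three-gene and six two-gene combinations, matching the eleven combinations recited above.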

In one embodiment, the signature comprises PYROXD2 and at least one of CACNA1E and SMOX. Therefore, in one embodiment, the signature comprises PYROXD2 and CACNA1E. In another embodiment, the signature comprises PYROXD2 and SMOX. In yet another embodiment, the signature comprises PYROXD2, CACNA1E and SMOX.

In one embodiment, the signature comprises at least 5 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, HS.553068, IFI27 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, IFI27 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, KLHL2 and ZNF185. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CD163, KLHL2 and SMOX.

In one embodiment, the signature comprises at least 6 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, KLHL2, LINC02035 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, IFI27 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, HS.553068, IFI27 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, KLHL2, SMOX and ZNF185. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, CLIC3, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CLIC3, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, IFI27, RTN1 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CD163, IFI27, KLHL2 and SMOX.

In one embodiment, the signature comprises at least 7 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, HS.553068, IFI27 and SMOX.

In another embodiment, the signature comprises at least 8 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX. In another embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2, S100P and SMOX.

In another embodiment, the signature comprises at least 9 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the signature comprises or consists of PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

In another embodiment, the signature comprises at least 10 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P and SMOX. In another embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

In another embodiment, the signature comprises at least 11 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P and SMOX. In another embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX. In another embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185.

In one embodiment, the signature comprises at least 12 of the 13 genes. Thus, in one embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185. In another embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX. In another embodiment, the signature comprises or consists of PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P, SMOX and ZNF185.

In one embodiment, the signature comprises all 13 genes. Thus, in one embodiment, the gene signature comprises or consists of CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1. Advantageously, the signature comprising all 13 genes has the highest discriminatory power and allows KD to be identified with the highest degree of sensitivity and specificity.

The identification of KD can be particularly critical because of its association with vasculitis, which may result in coronary artery aneurysm (CAA) formation. Death from myocardial infarction may occur due to thrombotic occlusion of the aneurysms, or from the later development of stenotic lesions due to vascular remodelling in the damaged artery. Hence, there is a significant unmet clinical need for proper and reliable identification of KD. The gene signature of the present disclosure is a significant step forward in treating patients, such as febrile patients, because it allows accurate and rapid diagnosis which, in turn, allows patients to be treated appropriately and in a timely manner.

Furthermore, the components employed in the method disclosed herein can be provided in a simple format, which is rapid and cost effective, and can be employed in low resource and/or rural settings.

The present inventors found that the transcript expression levels of CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035 and CLIC3 are increased in subjects having KD compared to subjects that do not have KD, and that the expression levels of S100P, IFI27, HS.553068, CD163, and RTN1 are decreased in subjects having KD compared to subjects that do not have KD.
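Purely as an illustration of how the two directions of modulation might be combined, the sketch below sums the expression values of the genes increased in KD and subtracts those of the genes decreased in KD. This scoring rule, the function name `direction_score` and the example values are assumptions for illustration; the passage above does not prescribe a particular scoring rule or threshold.

```python
# Genes reported as increased in KD.
UP_IN_KD = ("CACNA1E", "DDIAS", "KLHL2", "PYROXD2", "SMOX", "ZNF185",
            "LINC02035", "CLIC3")
# Genes reported as decreased in KD.
DOWN_IN_KD = ("S100P", "IFI27", "HS.553068", "CD163", "RTN1")


def direction_score(expression):
    """Sum of up-regulated values minus sum of down-regulated values.

    `expression` maps gene symbol to a (for example log-scale) expression
    value; genes absent from the mapping are skipped.
    """
    up = sum(expression[g] for g in UP_IN_KD if g in expression)
    down = sum(expression[g] for g in DOWN_IN_KD if g in expression)
    return up - down


# Made-up values for four genes, for illustration only.
example = {"SMOX": 9.0, "PYROXD2": 8.0, "CD163": 7.5, "IFI27": 6.0}
score = direction_score(example)
```

Under this assumed rule, a higher score corresponds to a more KD-like pattern; any decision threshold would need to be established on reference data.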

Advantageously, the present inventors were able to discriminate subjects having KD from subjects that do not have KD with a high AUC of ~96.2%, and a high degree of sensitivity (~81.7%) and specificity (~92.1%) using a gene signature which detects the modulation in gene expression levels of the 13 genes listed above.
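For reference, the performance measures quoted above follow their usual definitions, sketched below in Python. The function names and the counts used in the usage lines are illustrative assumptions and are not data from the present disclosure.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of subjects with KD that are correctly identified as KD."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of subjects without KD that are correctly identified as non-KD."""
    return true_neg / (true_neg + false_pos)


def auc(kd_scores, non_kd_scores):
    """Area under the ROC curve, computed pairwise.

    Equals the probability that a randomly chosen KD subject scores higher
    than a randomly chosen non-KD subject, with ties counted as one half.
    """
    wins = 0.0
    for kd in kd_scores:
        for other in non_kd_scores:
            if kd > other:
                wins += 1.0
            elif kd == other:
                wins += 0.5
    return wins / (len(kd_scores) * len(non_kd_scores))


# Hypothetical counts, chosen only to show the arithmetic.
example_sensitivity = sensitivity(true_pos=80, false_neg=20)
example_specificity = specificity(true_neg=90, false_pos=10)
```

The pairwise form of the AUC is quadratic in the number of subjects and is shown for clarity rather than efficiency.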

Advantageously, the gene signature was developed from a training set including a range of ethnicities. This means that the gene signature and methods of the present disclosure can be applied to samples derived from subjects of different ethnicities. Further advantageously, the gene signature was developed using KD patients that were no more than 7 days into their illness. This means that the signature can facilitate early diagnosis of KD, before 5 days of fever, which can aid in the early identification of KD patients so that appropriate treatment can be given early.

Accordingly, the present inventors have demonstrated that the method is applicable across a wide range of different samples and patient groups which suggests that the method is robust and reliable.

Hence, in one aspect, the present disclosure provides a method of diagnosing a subject having Kawasaki disease comprising detecting in a subject derived RNA sample the modulation in gene expression levels of a gene signature comprising at least 5 of the following genes: CACNA1E, DDIAS (C11ORF82), KLHL2, PYROXD2 (C10ORF33), SMOX, ZNF185, LINC02035 (LOC100129550), CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

In one embodiment, the method of diagnosis is performed in vitro.

In one embodiment the method further employs one or more housekeeping genes, such as 1, 2, 3, 4 or 5 housekeeping genes. Housekeeping genes are not considered part of the signature in the context of the present specification. In one embodiment, the housekeeping gene is selected from the group consisting of actin, GAPDH, ubiquitin, 18s rRNA, RPII (POLR2A), TBP, PPIA, GUSB, HSPCB, YWHAZ, SDHA, RPS13, HPRT1 and B4GALT6.
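Housekeeping genes are commonly used as a reference against which the expression of the signature genes is normalised. The sketch below assumes quantitative PCR cycle threshold (Ct) values and the conventional delta-Ct calculation; this passage does not itself specify a normalisation scheme, and the function name `delta_ct` and the example values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, housekeeping_cts):
    """Normalise a target gene Ct against the mean Ct of the housekeeping genes.

    A lower delta-Ct indicates higher expression of the target gene relative
    to the housekeeping reference.
    """
    if not housekeeping_cts:
        raise ValueError("at least one housekeeping gene is required")
    return target_ct - mean(housekeeping_cts)


# Made-up Ct values: one signature gene against two housekeeping genes.
example_delta = delta_ct(24.0, [18.0, 20.0])
```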

In one embodiment the method of the present disclosure is capable of identifying a subject with KD in the presence of bacterial infection, viral infection and/or an inflammatory condition.

In one embodiment the method of the present disclosure is capable of discriminating a subject with KD from a patient with bacterial infection, viral infection and/or inflammatory condition.

In one embodiment the bacterial infection is selected from the group consisting of: Chlamydia pneumoniae, Chlamydia trachomatis, Chlamydophila psittaci, Mycoplasma pneumoniae, Corynebacterium diphtheriae, Clostridium botulinum, Clostridium difficile, Clostridium perfringens, Clostridium tetani, Enterococcus faecalis, Enterococcus faecium, Listeria monocytogenes, Staphylococcus aureus, Staphylococcus epidermidis, Staphylococcus saprophyticus, Group B streptococcus, Streptococcus agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, or acid fast bacteria such as Mycobacterium leprae, Mycobacterium tuberculosis, Mycobacterium ulcerans, Mycobacterium avium-intracellulare, Bordetella pertussis, Borrelia burgdorferi, Brucella abortus, Brucella canis, Brucella melitensis, Brucella suis, Campylobacter jejuni, Escherichia coli, Francisella tularensis, Haemophilus influenzae, Helicobacter pylori, Legionella pneumophila, Leptospira interrogans, Neisseria gonorrhoeae, Neisseria meningitidis, Pseudomonas aeruginosa, Pseudomonas spp, Rickettsia rickettsii, Salmonella typhi, Salmonella typhimurium, Shigella sonnei, Treponema pallidum, Vibrio cholerae, Yersinia pestis, Kingella kingae, Stenotrophomonas, Klebsiella, a gram-positive coccus, a gram-negative bacillus, mycoplasma, pertussis, mycobacteria and staphylococcal and streptococcal toxic shock syndromes, for example a gram-positive coccus, a gram-negative bacillus, mycoplasma or pertussis, and mycobacteria.

In one embodiment, the bacterial infection is selected from the group consisting of: S.pneumoniae, S.aureus, S.pyogenes, Group B streptococcus, E.coli, N. meningitidis, Enterococcus, Kingella, H.influenzae, Pseudomonas spp, Stenotrophomonas and Klebsiella.

In one embodiment, the bacterial infection is staphylococcal or streptococcal toxic shock syndrome.

In one embodiment the viral infection is selected from the group comprising or consisting of: Influenza such as Influenza A, including but not limited to: H1N1, H2N2, H3N2, H5N1, H7N7, H1N2, H9N2, H7N2, H7N3, H10N7, Influenza B and Influenza C, Respiratory Syncytial Virus (RSV), rhinovirus, enterovirus, bocavirus, parainfluenza, adenovirus, metapneumovirus, herpes simplex virus, Chickenpox virus, Human papillomavirus, Hepatitis, Epstein-Barr virus, Varicella-zoster virus, Human cytomegalovirus, Human herpesvirus type 8, BK virus, JC virus, Smallpox, Parvovirus B19, Human astrovirus, Norwalk virus, coxsackievirus, poliovirus, Severe acute respiratory syndrome virus, yellow fever virus, dengue virus, West Nile virus, Rubella virus, Human immunodeficiency virus, Guanarito virus, Junin virus, Lassa virus, Machupo virus, Sabia virus, Crimean-Congo haemorrhagic fever virus, Ebola virus, Marburg virus, Measles virus, Mumps virus, Rabies virus, Rotavirus and Rocky Mountain spotted fever.

In one embodiment, the viral infection is selected from the group consisting of: respiratory syncytial virus (RSV), adenovirus, parainfluenza virus (such as parainfluenza 1-4), influenza (such as influenza A, B or A+B), bocavirus, metapneumovirus, rhinovirus and enterovirus, in particular RSV, influenza A/B and adenovirus. In one embodiment, the viral infection is selected from the group consisting of measles, an adenovirus infection and Rocky Mountain spotted fever.

In one embodiment the inflammatory condition is selected from the group consisting of asthma, peptic ulcers, tuberculosis, periodontitis, ulcerative colitis, Crohn’s disease, sinusitis, hepatitis, multiple sclerosis, atherosclerosis, Sjögren’s disease, inflammatory bowel disease, lupus erythematosus (including systemic lupus erythematosus), fibrotic diseases, such as pulmonary fibrosis, Henoch-Schönlein Purpura (HSP) and Juvenile Idiopathic Arthritis (JIA).

In one embodiment the inflammatory disease is juvenile idiopathic arthritis (JIA) or Henoch-Schönlein purpura (HSP).

In a further aspect the present disclosure provides a method of treating a subject having KD after diagnosis employing the method herein.

In one embodiment the subject is a child, for example under 17 years of age, such as 2 to 59 months old.

In one embodiment the subject is an infant, for example in the age range 0 to 59 days.

In one embodiment the subject has fever, for example is a febrile patient.

In one embodiment the method of the present disclosure is employed on a patient derived sample, for example a blood sample.

In one embodiment the analysis of gene expression modulation employs a microarray.

In one embodiment the analysis of gene expression modulation employs PCR, such as RT-PCR.

In one embodiment the PCR is multiplex PCR.

In one embodiment the PCR is quantitative.

In one embodiment the primers employed in the PCR comprise a label or a combination of labels.

In one embodiment the label is fluorescent or coloured, for example the label is coloured beads.

In one embodiment the analysis of gene expression modulation employs dual colour reverse transcriptase multiplex ligation dependent probe amplification.

In one embodiment the gene expression modulation is detected by employing fluorescence spectroscopy.

In one embodiment the gene expression modulation is detected by employing colourimetric analysis.

In one embodiment the gene expression modulation is detected by employing impedance spectroscopy.

In one embodiment the method comprises the further step of prescribing or administering a treatment for the subject having KD based on the results of the analysis of the gene signature.

Thus, in one aspect there is provided a method of treating a KD patient by administering a treatment such as gamma globulin (IVIg), aspirin, or other anti-inflammatory agents such as steroids and infliximab, wherein the patient is characterised in that the patient has been identified as positive for KD by the method disclosed herein. Hence, in one aspect, there is provided a method of treating a subject having Kawasaki disease (KD), comprising administering a treatment for KD to the subject, wherein the subject has been previously identified as having Kawasaki disease by detecting in a subject derived RNA sample the modulation in gene expression levels of a gene signature comprising at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1. Suitable treatments for KD will be known to the skilled person, including but not limited to gamma globulin (IVIg), aspirin, or other anti-inflammatory agents, such as steroids and infliximab, or a combination thereof.

In one aspect, there is provided a method of determining whether to administer a treatment for KD, such as gamma globulin (IVIg), aspirin, or other anti-inflammatory agents such as steroids and infliximab, comprising the steps of: performing the method according to the present disclosure, and administering the treatment for KD to the subject if the method indicates that the subject has KD.

Hence, the presently disclosed method can aid in the appropriate treatment of patients, such as febrile patients, for example where it is unclear if the fever is due to Kawasaki disease or due to a bacterial infection, viral infection, inflammatory condition or a combination thereof. This has the advantage of ensuring rapid and appropriate treatment without the need to wait for laboratory test results.

In one aspect of the disclosure, there is provided a set of primers for use in multiplex PCR, wherein the set of primers include nucleic acid sequences specific to a polynucleotide gene transcript from at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1. Thus, in one embodiment, the set of primers are specific to a transcript from at least PYROXD2. In one embodiment, the set of primers are specific to a transcript from at least SMOX. In one embodiment, the set of primers are specific to a transcript from at least CACNA1E. In one embodiment, the set of primers are specific to a transcript from at least CD163. In one embodiment, the set of primers are specific to a transcript from at least DDIAS. In one embodiment, the set of primers are specific to a transcript from at least CLIC3. In one embodiment, the set of primers are specific to a transcript from at least KLHL2. In one embodiment, the set of primers are specific to a transcript from at least HS.553068. In one embodiment, the set of primers are specific to a transcript from at least RTN1. In one embodiment, the set of primers are specific to a transcript from at least ZNF185. In one embodiment, the set of primers are specific to a transcript from at least IFI27. In one embodiment, the set of primers are specific to a transcript from at least S100P. In one embodiment, the set of primers are specific to a transcript from at least LINC02035.

In one embodiment, the set of primers are specific to a transcript from at least one of the following genes: PYROXD2, SMOX, CACNA1E, CD163, DDIAS, CLIC3, KLHL2 and HS.553068. In another embodiment, the set of primers are specific to a transcript from at least one of the following genes: PYROXD2, SMOX, CACNA1E and CD163. For example, the set of primers are specific to transcripts from any of the following combinations of genes: PYROXD2, SMOX, CACNA1E and CD163; PYROXD2, SMOX and CACNA1E; PYROXD2, SMOX and CD163; SMOX, CACNA1E and CD163; PYROXD2, CACNA1E and CD163; PYROXD2 and SMOX; PYROXD2 and CACNA1E; PYROXD2 and CD163; SMOX and CACNA1E; SMOX and CD163; or CACNA1E and CD163; or any other combination.

In one embodiment, the set of primers are specific to a transcript from PYROXD2 and at least one of CACNA1E and SMOX. Therefore, in one embodiment, the set of primers are specific to PYROXD2 and CACNA1E. In another embodiment, the set of primers are specific to PYROXD2 and SMOX. In yet another embodiment, the set of primers are specific to PYROXD2, CACNA1E and SMOX.

In one embodiment, the set of primers are specific to a transcript from at least 5 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, HS.553068, IFI27 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, IFI27 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, KLHL2 and ZNF185. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CD163, KLHL2 and SMOX.

In one embodiment, the set of primers are specific to at least 6 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, KLHL2, LINC02035 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, IFI27 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, HS.553068, IFI27 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, KLHL2, SMOX and ZNF185. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, CLIC3, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CLIC3, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, IFI27, RTN1 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CD163, IFI27, KLHL2 and SMOX.

In one embodiment, the set of primers are specific to at least 7 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, HS.553068, IFI27 and SMOX.

In another embodiment, the set of primers are specific to at least 8 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX. In another embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2, S100P and SMOX.

In another embodiment, the set of primers are specific to at least 9 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the set of primers are specific to PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

In another embodiment, the set of primers are specific to at least 10 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P and SMOX. In another embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

In another embodiment, the set of primers are specific to at least 11 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P and SMOX. In another embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX. In another embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185.

In one embodiment, the set of primers are specific to at least 12 of the 13 genes. Thus, in one embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185. In another embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX. In another embodiment, the set of primers are specific to PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P, SMOX and ZNF185.

In one embodiment, the set of primers are specific to all 13 genes. Thus, in one embodiment, the set of primers are specific to CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

In one embodiment, the gene transcript is RNA, for example mRNA or cRNA. Thus, in one embodiment, the

In one embodiment the primers for each gene are at least a pair of nucleic acid primer sequences.

In one embodiment the primer length is 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 bases.

In one embodiment at least one primer for each gene comprises a label.

In one embodiment the labels on the primers are independently selected from a fluorescent label, a coloured label, an antibody, a strep tag and a His tag.

In one embodiment each primer in a given pair of primers is labelled, for example where one label quenches the fluorescence of the other label when said labels are within proximity of each other.

In another aspect of the disclosure there is provided a gene chip consisting of probes for detecting the modulation in gene expression levels of at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1. In one embodiment, the Illumina probe IDs for the 13 genes are shown in Table 2. Alternatively, the skilled addressee is able to design custom probes based on the nucleic acid sequence of each of the 13 genes.

In one embodiment, the gene chip further comprises control probes. In the context of this disclosure, the control probes are not considered as part of the gene signature. Hence, in one embodiment, the gene chip consists of probes for at least 5 of the following genes: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1; and one or more control probes. In one embodiment, the control probes are specific for transcripts from one or more of the following genes: actin, GAPDH, ubiquitin, 18s rRNA, RPII (POLR2A), TBP, PPIA, GUSB, HSPCB, YWHAZ, SDHA, RPS13, HPRT1 and B4GALT6.

Advantageously, a chip with probes for at least 5 of the 13 genes is able to accurately and reliably differentiate a sample, for example whole blood, derived from a subject having KD from a sample derived from a subject having a bacterial/viral infection and/or inflammatory condition. Such a chip with a small number of probes can be cheaply produced, making the chip particularly suited for use in resource poor settings.
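The "at least 5 of the 13 genes" requirement can be illustrated with a minimal sketch. The function names and the example panel below are hypothetical and are not part of the disclosure; only the 13 gene symbols and the exclusion of control probes from the signature come from the text.

```python
# The 13 signature genes as listed in the disclosure.
SIGNATURE_GENES = frozenset({
    "CACNA1E", "DDIAS", "KLHL2", "PYROXD2", "SMOX", "ZNF185", "LINC02035",
    "CLIC3", "S100P", "IFI27", "HS.553068", "CD163", "RTN1",
})


def signature_coverage(panel_genes):
    """Return the signature genes targeted by a panel (chip or primer set)."""
    return SIGNATURE_GENES & set(panel_genes)


def meets_minimum(panel_genes, minimum=5):
    """True if the panel targets at least `minimum` signature genes.

    Control probes (e.g. GAPDH) are ignored because they are not
    considered part of the gene signature.
    """
    return len(signature_coverage(panel_genes)) >= minimum


# Hypothetical panel: five signature genes plus one control probe.
panel = ["PYROXD2", "CACNA1E", "CD163", "KLHL2", "SMOX", "GAPDH"]
print(sorted(signature_coverage(panel)))
print(meets_minimum(panel))
```

Using set intersection means that control probes, or any other non-signature probes on a commercially available chip, do not count towards the minimum.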

Thus, in one embodiment, the gene chip comprises probes for at least PYROXD2. In one embodiment, the gene chip comprises probes for at least SMOX. In one embodiment, the gene chip comprises probes for at least CACNA1E. In one embodiment, the gene chip comprises probes for at least CD163. In one embodiment, the gene chip comprises probes for at least DDIAS. In one embodiment, the gene chip comprises probes for at least CLIC3. In one embodiment, the gene chip comprises probes for at least KLHL2. In one embodiment, the gene chip comprises probes for at least HS.553068. In one embodiment, the gene chip comprises probes for at least RTN1. In one embodiment, the gene chip comprises probes for at least ZNF185. In one embodiment, the gene chip comprises probes for at least IFI27. In one embodiment, the gene chip comprises probes for at least S100P. In one embodiment, the gene chip comprises probes for at least LINC02035.

In one embodiment, the gene chip comprises probes for at least one of the following genes: PYROXD2, SMOX, CACNA1E, CD163, DDIAS, CLIC3, KLHL2 and HS.553068. In another embodiment, the gene chip comprises probes for at least one of the following genes: PYROXD2, SMOX, CACNA1E and CD163.

For example, the gene chip comprises probes for any of the following combinations of genes: PYROXD2, SMOX, CACNA1E and CD163; PYROXD2, SMOX and CACNA1E; PYROXD2, SMOX and CD163; SMOX, CACNA1E and CD163; PYROXD2, CACNA1E and CD163; PYROXD2 and SMOX; PYROXD2 and CACNA1E; PYROXD2 and CD163; SMOX and CACNA1E; SMOX and CD163; or CACNA1E and CD163; or any other combination.

In one embodiment, the gene chip comprises probes for PYROXD2 and at least one of CACNA1E and SMOX. Therefore, in one embodiment, the gene chip comprises probes for PYROXD2 and CACNA1E. In another embodiment, the gene chip comprises probes for PYROXD2 and SMOX. In yet another embodiment, the gene chip comprises probes for PYROXD2, CACNA1E and SMOX.

In one embodiment, the gene chip comprises or consists of probes for at least 5 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, HS.553068, IFI27 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, IFI27 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, KLHL2 and ZNF185. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CD163, KLHL2 and SMOX.

In one embodiment, the gene chip comprises or consists of probes for at least 6 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, KLHL2, LINC02035 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, IFI27 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, HS.553068, IFI27 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, KLHL2, SMOX and ZNF185. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, CLIC3, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CLIC3, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, IFI27, RTN1 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CD163, IFI27, KLHL2 and SMOX.

In one embodiment, the gene chip comprises or consists of probes for at least 7 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, HS.553068, IFI27 and SMOX.

In another embodiment, the gene chip comprises or consists of probes for at least 8 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2 and SMOX. In another embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, HS.553068, IFI27, KLHL2, S100P and SMOX.

In another embodiment, the gene chip comprises or consists of probes for at least 9 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1 and SMOX. In one embodiment, the gene chip comprises or consists of probes for PYROXD2, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

In another embodiment, the gene chip comprises or consists of probes for at least 10 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P and SMOX. In another embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1 and SMOX.

In another embodiment, the gene chip comprises or consists of probes for at least 11 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P and SMOX. In another embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX. In another embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185.

In one embodiment, the gene chip comprises or consists of probes for at least 12 of the 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, RTN1, S100P, SMOX and ZNF185. In another embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, HS.553068, IFI27, KLHL2, LINC02035, RTN1, S100P and SMOX. In another embodiment, the gene chip comprises or consists of probes for PYROXD2, DDIAS, CACNA1E, CD163, CLIC3, IFI27, KLHL2, LINC02035, RTN1, S100P, SMOX and ZNF185.

In one embodiment, the gene chip comprises or consists of probes for all 13 genes. Thus, in one embodiment, the gene chip comprises or consists of probes for CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035, CLIC3, S100P, IFI27, HS.553068, CD163, and RTN1.

In a further embodiment the present disclosure includes use of a known or commercially available gene chip in the method of the present disclosure.

In one aspect, there is provided a point of care test for identifying a subject having KD comprising the set of primers or gene chip as defined above. Advantageously, the presently disclosed test can be performed rapidly in as little as a couple of hours without the need for complex diagnostic or lab equipment. Accordingly, the presently disclosed method can be easily implemented as part of an existing patient care program in a hospital setting as well as in more resource poor settings, such as in remote villages.

In one aspect, there is provided the use of a set of primers or gene chip as defined above in an assay to detect KD in a sample, for example a blood sample.

DETAILED DESCRIPTION

The 13 genes/gene transcripts shown in Table 2 are useful for identifying a patient having KD or discriminating KD from a bacterial infection. In one embodiment the method of the present disclosure is able to differentiate a subject having KD from a subject having a different condition, disease or infection with similar clinical symptoms, such as a bacterial/viral infection or an inflammatory condition. In another embodiment the 13 genes/gene transcripts are useful for discriminating KD from a viral infection. In yet another embodiment, the 13 genes/gene transcripts are useful for discriminating a patient having KD from a patient having an inflammatory disease, such as juvenile idiopathic arthritis (JIA), Henoch-Schönlein purpura (HSP) or systemic lupus erythematosus (SLE). In one embodiment one probe is employed for detecting the modulation in gene expression of each gene, for example selected from the list of probes shown in Table 2.

In another embodiment, two or more probes are employed for detecting the modulation of each gene. In one embodiment of the present disclosure the gene signature is the minimum set of genes required to optimally detect the infection or discriminate the disease, for example to discriminate KD from a bacterial/viral infection and/or an inflammatory disease.

Optimally is intended to mean the smallest set of genes needed to discriminate between KD and a bacterial/viral infection and/or inflammatory condition without significant loss of specificity and/or sensitivity of the signature’s ability to detect or discriminate.

Detect or detecting as employed herein is intended to refer to the process of identifying KD in a sample, in particular through detecting modulation of the relevant genes in the signature. In one embodiment, a subject may be detected as only having KD. In another embodiment, the subject may have KD and also have a bacterial infection, a viral infection, an inflammatory condition, or a combination thereof.

Discriminate refers to the ability of the signature to differentiate between different disease statuses, for example KD vs a viral/bacterial infection or an inflammatory disease. Detect and discriminate are interchangeable in the context of the gene signature.

Subject as employed herein is a human suspected of having KD or a human having a fever from whom a sample is derived. The term patient may be used interchangeably although in one embodiment a patient has a morbidity.

In one embodiment the method of the present disclosure is performed on a sample derived from a subject having or suspected of having KD, for example wherein the subject exhibits symptoms normally associated with KD.

In one embodiment the method of the present disclosure is performed on a sample derived from a subject having or suspected of having a bacterial/viral infection or an inflammatory condition, but not suspected of having KD, for example wherein the subject exhibits symptoms normally not associated with KD. Testing a sample from such a subject can help to identify an individual who has KD who would normally not be correctly diagnosed.

In one embodiment the subject exhibits symptoms of a viral infection. In another embodiment the subject exhibits symptoms of a bacterial infection. In yet another embodiment the subject exhibits symptoms of both a bacterial and a viral infection. In one embodiment, the subject exhibits symptoms of an inflammatory condition.

In a further embodiment the sample is a sample derived from a febrile subject; that is to say with a temperature above the normal body temperature of 37.5°C.

In yet a further embodiment the analysis is performed to establish if a fever is associated with KD. Establishing the source of the fever/infection advantageously allows the prescription and/or administration of appropriate medication; for example, patients identified as having KD can be given appropriate treatment such as gamma globulin (IVIg) and/or aspirin, whilst patients with bacterial infections can be given antibiotics and those with viral infections can be given antipyretics.

Efficient treatment is advantageous because it minimises hospital stays, ensures that patients obtain appropriate treatment, which may save lives, especially when the patient is an infant or child, and also ensures that resources are used appropriately. In recent years it has become apparent that the over-use of antibiotics should be avoided because it leads to bacteria developing resistance. Therefore, the administration of antibiotics to patients who do not have bacterial infection should be avoided.

In one embodiment the subject is an adult. Adult is defined herein as a person of 18 years of age or older.

In one embodiment the subject is a child. Child as employed herein refers to a person under the age of 18, such as 5 to 17 years of age.

In one embodiment, the subject is an infant. Infant as used herein refers to a person in the age range of 0 to 59 days.

Modulation of gene expression as employed herein means the up-regulation or down-regulation of a gene or genes.

Up-regulated as employed herein is intended to refer to a gene transcript which is expressed at higher levels in a diseased or infected patient sample relative to, for example, a control sample free from a relevant disease or infection, or in a sample with latent disease or infection or a different stage of the disease or infection, as appropriate.

Down-regulated as employed herein is intended to refer to a gene transcript which is expressed at lower levels in a diseased or infected patient sample relative to, for example, a control sample free from a relevant disease or infection or in a sample with latent disease or infection or a different stage of the disease or infection. Thus, a gene that is up-regulated is one that is expressed at a higher level in a subject having KD compared to a subject who does not have KD. Likewise, a gene that is down-regulated is expressed at a lower level in a subject having KD compared to a subject who does not have KD.

The modulation is measured by measuring levels of gene expression by an appropriate

Gene expression as employed herein is the process by which information from a gene is used in the synthesis of a functional gene product. These products are often proteins, but in non-protein coding genes such as ribosomal RNA (rRNA), transfer RNA (tRNA) or small nuclear RNA (snRNA) genes, the product is a functional RNA, that is to say, RNA with a function. In the context of the present disclosure, measuring the expression levels of a gene generally refers to measuring the levels of transcripts associated with that gene.

Gene expression data as employed herein is intended to refer to any data generated from a patient sample that is indicative of the expression of the two or more genes, for example 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50.

In one embodiment one or more, for example 1 to 21, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, genes are replaced by a gene with an equivalent function provided the signature retains the ability to detect/discriminate the relevant clinical status without significant loss in specificity and/or sensitivity.

In one embodiment the genes employed have identity with the 13 genes listed in Table 2.

In one embodiment, one or more of the genes in the 13 gene signature are significantly differentially expressed in a sample derived from a subject having KD compared to a sample derived from a subject who does not have KD. Gene signature as used herein is intended to refer to two or more genes which when tested together are able to detect/discriminate the relevant clinical status. Hence, a gene signature represents a minimal set of genes which have sufficient discriminatory power to identify a subject having KD or to discriminate a subject having KD from a subject having a bacterial/viral infection or inflammatory disease.

Significantly differentially expressed as employed herein means the gene shows a log2 fold change >0.5 or <-0.5 in a sample derived from a subject having KD compared to a sample derived from a subject who does not have KD, for example who has a bacterial/viral infection and/or an inflammatory condition.

In one embodiment, up-regulated as used herein means the gene shows a log2 fold change >0.5.

In one embodiment, down-regulated as used herein means the gene shows a log2 fold change <-0.5.

In one embodiment, one or more of the following genes are down-regulated in a subject having KD: S100P, IFI27, HS.553068, CD163, and RTN1.

In one embodiment, one or more of the following genes are up-regulated in a subject having KD: CACNA1E, DDIAS, KLHL2, PYROXD2, SMOX, ZNF185, LINC02035 and CLIC3.
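The log2 fold change thresholds and the expected directions of regulation stated above can be sketched as follows. This is an illustrative reading of the definitions only: the function names are hypothetical, the example values are invented, and the sketch is not the classifier used in the disclosure.

```python
LOG2_FC_THRESHOLD = 0.5  # threshold stated in the text

# Expected direction of regulation in KD, as listed in the text.
UP_IN_KD = {"CACNA1E", "DDIAS", "KLHL2", "PYROXD2", "SMOX",
            "ZNF185", "LINC02035", "CLIC3"}
DOWN_IN_KD = {"S100P", "IFI27", "HS.553068", "CD163", "RTN1"}


def classify(log2_fc, threshold=LOG2_FC_THRESHOLD):
    """Label a log2 fold change as 'up', 'down' or 'unchanged'."""
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return "unchanged"


def matches_expected_direction(gene, log2_fc):
    """True if the observed change agrees with the direction listed for KD."""
    call = classify(log2_fc)
    if gene in UP_IN_KD:
        return call == "up"
    if gene in DOWN_IN_KD:
        return call == "down"
    raise KeyError(f"{gene} is not one of the 13 signature genes")


# Invented log2 fold changes (KD versus non-KD), for illustration only.
observed = {"SMOX": 1.2, "IFI27": -2.0, "CD163": 0.1}
for gene, value in observed.items():
    print(gene, classify(value), matches_expected_direction(gene, value))
```

The comparisons are strict, so a value of exactly 0.5 or -0.5 is treated as unchanged, which follows the ">0.5" and "<-0.5" wording.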

Presented in the form of as employed herein refers to the laying down of genes from one or more of the signatures in the form of probes on a microarray.

Accurately and robustly as employed herein refers to the fact that the method can be employed in a practical setting or low resource setting, such as Africa, and that the results of performing the method properly give a high level of confidence that a true result is obtained.

High confidence is provided by the method when it provides few results that are false positives (e.g. the result suggests that the subject has a bacterial infection when he/she does not) and also has few false negatives (e.g. the result suggests that the subject does not have a bacterial infection when he/she does).

High confidence would include 90% or greater confidence, such as 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% confidence when an appropriate statistical test is employed.

In one embodiment the method provides a sensitivity of 80% or greater such as 90% or greater in particular 95% or greater, for example where the sensitivity is calculated as below:

$$\text{sensitivity} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}} = \text{probability of a positive test given that the patient is ill}$$

In one embodiment the method provides a high level of specificity, for example 80% or greater such as 90% or greater in particular 95% or greater, for example where specificity is calculated as shown below:

$$\text{specificity} = \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}} = \text{probability of a negative test given that the patient is well}$$

In one embodiment the sensitivity of the method of the gene signature is 90 to 100%, such as 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99%.

In one embodiment the specificity of the method of the gene signature is 85 to 100%, such as 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99%.

In one embodiment the sensitivity of the method of the gene signature is 85 to 100%, such as 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99%.
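The sensitivity and specificity definitions given above translate directly into code. The counts in the example are invented purely to show the arithmetic and are not results from the disclosure.

```python
def sensitivity(true_positives, false_negatives):
    """TP / (TP + FN): probability of a positive test given the patient is ill."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no ill patients in the sample")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """TN / (TN + FP): probability of a negative test given the patient is well."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no well patients in the sample")
    return true_negatives / total


# Invented counts, for illustration only.
print(f"sensitivity = {sensitivity(45, 5):.2f}")
print(f"specificity = {specificity(90, 10):.2f}")
```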

There are a number of ways in which gene expression can be measured including microarrays, tiling arrays, DNA or RNA arrays for example on gene chips, RNA-seq and serial analysis of gene expression.

Any suitable method of measuring gene modulation may be employed in the method of the present disclosure.

In one embodiment the gene expression measured is that of the host (e.g. human), for example the host inflammatory response, i.e. not that of the infectious agent or disease.

In one embodiment DNA or RNA from the subject sample is analysed.

In one embodiment RNA from the subject sample is analysed.

In one embodiment mRNA from the subject sample is analysed.

In one embodiment cRNA from the subject sample is analysed.

In one embodiment the sample is solid or fluid, for example blood or serum or a processed form of any one of the same.

A fluid sample as employed herein refers to liquids originating from inside the bodies of living people. They include fluids that are excreted or secreted from the body as well as body water that normally is not. Examples include amniotic fluid, aqueous humour and vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, endolymph and perilymph, gastric juice, mucus (including nasal drainage and phlegm), sputum, peritoneal fluid, pleural fluid, saliva, sebum (skin oil), semen, sweat, tears, vaginal secretion, vomit and urine, and in particular blood and serum.

Blood as employed herein refers to whole blood, that is serum, blood cells and clotting factors, typically peripheral whole blood.

Serum as employed herein refers to the component of whole blood that is not blood cells or clotting factors. It is plasma with fibrinogens removed.

In one embodiment the subject derived sample is a blood sample.

In one embodiment the sample is whole blood. Hence in one embodiment the RNA sample is derived from whole blood.

The RNA sample may be subjected to further amplification by PCR, such as whole genome amplification in order to increase the amount of starting RNA template available for analysis. Alternatively, the RNA sample may be converted into cDNA by reverse transcriptase, such as HIV-1 reverse transcriptase, Moloney murine leukaemia virus (M-MLV) reverse transcriptase, AMV reverse transcriptase and telomerase reverse transcriptase. Such amplification steps may be necessary for smaller sample volumes, such as blood samples obtained from children.

In one or more embodiments the analysis is ex vivo.

Ex vivo as employed herein means that which takes place outside the body. In one embodiment the gene expression data is generated from a microarray, such as a gene chip.

Microarray as employed herein includes RNA or DNA arrays, such as RNA arrays. Various different forms of microarrays will be known to the skilled person, including but not limited to solid-phase arrays and bead arrays.

Polymerase chain reaction (PCR) as employed herein refers to a widely used molecular technique to make multiple copies of a target DNA sequence. The method relies on thermal cycling, consisting of cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA. Primers containing sequences complementary to the target region along with a DNA polymerase, which the method is named after, are key components to enable selective and repeated amplification. As PCR progresses, the DNA generated is itself used as a template for replication, setting in motion a chain reaction in which the DNA template is exponentially amplified.
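As a rough worked illustration of the exponential amplification described above, the number of copies after a given number of cycles can be written as follows. The symbols $N_0$ (starting copies), $N_n$ (copies after $n$ cycles) and $E$ (per-cycle efficiency) are introduced here for illustration and do not appear in the disclosure; real reactions fall short of perfect doubling, which is what $E$ is meant to capture.

$$N_n = N_0 \, (1 + E)^n, \qquad 0 \le E \le 1$$

With perfect efficiency ($E = 1$) this reduces to $N_n = N_0 \cdot 2^n$, so a single template molecule would yield about $2^{30} \approx 1.07 \times 10^{9}$ copies after 30 cycles.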

Multiplex PCR as employed herein refers to the use of a polymerase chain reaction (PCR) to amplify two or more different DNA sequences simultaneously, i.e. as if performing many separate PCR reactions together in one reaction.

Primer as employed herein is intended to refer to a short strand of nucleic acid sequence, usually a chemically synthesised oligonucleotide, which serves as a starting point for DNA synthesis reactions.

Primers are typically about 15 bases long but can vary from 5 to 100 bases long. A primer is required in processes such as PCR because DNA polymerases can only add new nucleotides to an existing strand of DNA. During a PCR reaction, the primer hybridises to its complementary sequence in a DNA sample. Next, DNA polymerase starts replication at the 3' end of the primer and extends the primer by copying the sequence of the opposite DNA strand.
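The hybridisation step described above, in which a primer binds its complementary sequence, can be sketched with a reverse-complement lookup. The sequences and function names below are invented for illustration; they are not primers from the disclosure, and real primer design also considers melting temperature, mismatches and secondary structure.

```python
# Translation table for Watson-Crick complements of DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_site(template, primer):
    """Return the 0-based start of the exact site on `template` that the
    primer can anneal to, or -1 if there is no perfectly matching site."""
    return template.find(reverse_complement(primer))


# Invented sequences, for illustration only.
template = "ATGCCGTTAGGCTAAC"
primer = "GTTAGCC"  # anneals to the 3' end of the template strand
print(reverse_complement(primer))
print(primer_site(template, primer))
```

Once annealed, the polymerase would extend the primer from its 3' end, copying the template towards the template's 5' end.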

In one embodiment the primers of the present disclosure are specific for RNA, such as mRNA, i.e. they are complementary to RNA sequences. In another embodiment, the primers are specific for cDNA, i.e. they are complementary to cDNA sequences.

In one embodiment the primers of the present disclosure comprise a label which enables the primers to be detected or isolated. Examples of labels include but are not limited to a fluorescent label, a coloured label, an antibody, a strep tag and a His tag.

In another embodiment, each primer in a given pair of primers is labelled, for example where one label (also known as a quencher) quenches the fluorescence of the other label when said labels are within proximity of each other. Such labels are particularly useful in real time PCR reactions for example. Examples of such label pairs include 6-carboxyfluorescein (FAM) and tetrachlorofluorescein, or tetramethylrhodamine and tetrachlorofluorescein.

Point of care test or bedside test as used herein is intended to refer to a medical diagnostic test which is conducted at or near the point of care, i.e. at the time and place of patient care. This is in contrast with a conventional diagnostic test which is typically confined to the medical laboratory and involves sending specimens away from the point of care to the laboratory for testing. Such diagnostic tests often require many hours or days before the results of the test can be received. In the meantime, patient care must continue without knowledge of the test results. In comparison, a point of care test is typically a simple medical test that can be performed rapidly.

A gene chip is essentially a microarray, that is to say an array of discrete regions, typically nucleic acids, which are separate from one another and are, for example, arrayed at a density of between about 100/cm² and 1000/cm², but can be arrayed at greater densities such as 10000/cm². The principle of a microarray experiment is that mRNA from a given cell line or tissue is used to generate a labelled sample, typically labelled cDNA or cRNA, termed the 'target', which is hybridised in parallel to a large number of nucleic acid sequences, typically DNA or RNA sequences, immobilised on a solid surface in an ordered array. Tens of thousands of transcript species can be detected and quantified simultaneously. Although many different microarray systems have been developed, the most commonly used systems today can be divided into two groups.

Using this technique, arrays consisting of more than 30,000 cDNAs can be fitted onto the surface of a conventional microscope slide. For oligonucleotide arrays, short 20-25mers are synthesised in situ, either by photolithography onto silicon wafers (high-density-oligonucleotide arrays from Affymetrix) or by ink-jet technology (developed by Rosetta Inpharmatics and licensed to Agilent Technologies).

Alternatively, pre-synthesised oligonucleotides can be printed onto glass slides. Methods based on synthetic oligonucleotides offer the advantage that because sequence information alone is sufficient to generate the DNA to be arrayed, no time-consuming handling of cDNA resources is required. Also, probes can be designed to represent the most unique part of a given transcript, making the detection of closely related genes or splice variants possible. Although short oligonucleotides may result in less specific hybridization and reduced sensitivity, the arraying of pre-synthesised longer oligonucleotides (50-100mers) has recently been developed to counteract these disadvantages.

In one embodiment the gene chip is an off the shelf, commercially available chip, for example the HumanHT-12 v4 Expression BeadChip Kit available from Illumina, NimbleGen microarrays from Roche, microarrays from Agilent or Eppendorf, and GeneChips from Affymetrix such as HG-U133 Plus 2.0 gene chips.

In an alternate embodiment the gene chip employed in the present invention is a bespoke gene chip, that is to say the chip contains only the target genes which are relevant to the desired profile. Custom made chips can be purchased from companies such as Roche, Affymetrix and the like. In yet a further embodiment the bespoke gene chip comprises a minimal disease specific transcript set.

In one embodiment the chip consists of probes for detecting the expression levels of the 13 genes listed in Table 2.

In one embodiment the following Illumina Probe ID nos. are used to detect the modulation in gene expression levels: 7510647 for CACNA1E, 2570019 for DDIAS, 1070593 for KLHL2, 1684497 for PYROXD2, 270068 or 3710553 for SMOX, 6840674 for ZNF185, 3236239 for LINC02035, 5870136 for CLIC3, 1510424 for S100P, 3990170 for IFI27, 1470450 for HS.553068, 2680092 for CD163, and 6860193 for RTN1.

In one or more embodiments above, the chip may further include 1 or more, such as 1 to 10, control probes such as house-keeping genes.

In one embodiment the gene expression data is generated in solution using appropriate probes for the relevant genes. Probe as employed herein is intended to refer to a hybridisation probe which is a fragment of DNA or RNA of variable length (usually 100-1000 bases long) which is used in DNA or RNA samples to detect the presence of nucleotide sequences (the DNA target) that are complementary to the sequence in the probe. The probe thereby hybridises to single-stranded nucleic acid (DNA or RNA) whose base sequence allows probe-target base pairing due to complementarity between the probe and target.

In one embodiment the method according to the present disclosure and for example chips employed therein may comprise one or more house-keeping genes.

House-keeping genes as employed herein is intended to refer to genes that are not directly relevant to the profile for identifying the disease or infection but are useful for statistical purposes and/or quality control purposes, for example they may assist with normalising the data, in particular a house-keeping gene is a constitutive gene i.e. one that is transcribed at a relatively constant level. The housekeeping gene's products are typically needed for maintenance of the cell.

Examples of housekeeping genes include but are not limited to actin, GAPDH, ubiquitin, 18S rRNA, RPII (POLR2A), TBP, PPIA, GUSB, HSPCB, YWHAZ, SDHA, RPS13, HPRT1 and B4GALT6.

In one embodiment minimal disease specific transcript set as employed herein means the minimum number of genes needed to robustly identify the target disease state.

Minimal discriminatory gene set is interchangeable with minimal disease specific transcript set or minimal gene signature.

Normalising as employed herein is intended to refer to statistically accounting for background noise by comparison of data to control data, such as the level of fluorescence of house-keeping genes. For example, fluorescent scanned data may be normalised using RMA to allow comparisons between individual chips, as described by Irizarry et al 2003.
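The idea of expressing data relative to house-keeping genes can be sketched in a few lines of Python. This is a minimal illustration only: RMA itself combines background correction, quantile normalisation and probe-set summarisation and is not reproduced here, and the function name, transcript labels and intensity values below are hypothetical.

```python
from statistics import mean

def normalise_to_housekeeping(intensities, housekeeping):
    """Divide each target intensity by the mean house-keeping intensity.

    intensities: dict mapping probe/gene label -> raw intensity
    housekeeping: labels in `intensities` to use as the reference
    """
    reference = mean(intensities[g] for g in housekeeping)
    if reference <= 0:
        raise ValueError("house-keeping reference must be positive")
    # Return only the target transcripts, expressed relative to the reference.
    return {g: v / reference
            for g, v in intensities.items()
            if g not in housekeeping}

# Hypothetical raw intensities for one sample.
raw = {"transcript_A": 300.0, "transcript_B": 150.0,
       "GAPDH": 100.0, "TBP": 200.0}
normalised = normalise_to_housekeeping(raw, ["GAPDH", "TBP"])
```

Dividing by a reference makes intensities from different chips more comparable, which is the purpose the definition above assigns to normalising.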

Scaling as employed herein refers to boosting the contribution of specific genes which are expressed at low levels or have a high fold change but still relatively low fluorescence such that their contribution to the diagnostic signature is increased.

Fold change is often used in analysis of gene expression data in microarray and RNA-Seq experiments, for measuring change in the expression level of a gene, and is calculated simply as the ratio of the final value to the initial value, i.e. if the initial value is A and the final value is B, the fold change is B/A (Tusher et al 2001).
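As a minimal illustration of the ratio defined above, the following Python sketch computes the fold change and its log2 transform, which is positive for an increase and negative for a decrease. The function names and the example intensities are hypothetical and are not taken from the disclosure.

```python
import math

def fold_change(initial: float, final: float) -> float:
    """Fold change: final value B divided by initial value A."""
    if initial <= 0:
        raise ValueError("initial value must be positive")
    return final / initial

def log2_fold_change(initial: float, final: float) -> float:
    """Log2 of the fold change; the sign indicates the direction of change."""
    return math.log2(fold_change(initial, final))

# Hypothetical intensities: a transcript rising from 200 to 800 units.
fc = fold_change(200.0, 800.0)
lfc_up = log2_fold_change(200.0, 800.0)
lfc_down = log2_fold_change(800.0, 200.0)
```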

In programs such as Arrayminer, fold change of gene expression can be calculated. The statistical value attached to the fold change is also calculated, and is more significant for genes where the level of expression is less variable between subjects in different groups and, for example, where the difference between groups is larger.

The step of obtaining a suitable sample from the subject is a routine technique, which involves taking a blood sample. This process presents little risk to donors and does not need to be performed by a doctor but can be performed by appropriately trained support staff. In one embodiment the sample derived from the subject is approximately 2.5 ml of blood, however smaller volumes can be used, for example 0.5-1 ml.

Blood or other tissue fluids are immediately placed in an RNA stabilizing buffer such as that included in PAXgene tubes or Tempus tubes.

If storage is required then the sample should usually be frozen at -80°C within 3 hours of collection.

In one embodiment the gene expression data is generated from RNA levels in the sample.

For microarray analysis the blood may be processed using a suitable product, such as PAXgene blood RNA extraction kits (Qiagen).

Total RNA may also be purified using the Tripure method - Tripure extraction (Roche Cat. No. 1 667 165). The manufacturer’s protocols may be followed. This purification may then be followed by the use of an RNeasy Mini kit - clean-up protocol with DNAse treatment (Qiagen Cat. No. 74106).

Quantification of RNA may be completed using optical density at 260 nm and the Quant-IT RiboGreen RNA assay kit (Invitrogen - Molecular Probes R11490). The quality of the 28S and 18S ribosomal RNA peaks can be assessed by use of the Agilent bioanalyser.

In another embodiment the method further comprises the step of amplifying the RNA. Amplification may be performed using a suitable kit, for example TotalPrep RNA Amplification kits (Applied Biosystems).

In one embodiment an amplification method may be used in conjunction with the labelling of the RNA for microarray analysis, for example using the Nugen 3' Ovation biotin kit (Cat: 2300-12, 2300-60).

The RNA derived from the subject sample is then hybridised to the relevant probes, for example which may be located on a chip. After hybridisation and washing, where appropriate, analysis with an appropriate instrument is performed.

In performing an analysis to ascertain whether a subject presents a gene signature indicative of disease or infection according to the present disclosure, the following steps are performed: obtain mRNA from the sample and prepare nucleic acid targets; hybridise to the array under appropriate conditions, typically as suggested by the manufacturers of the microarray (suitably stringent hybridisation conditions such as 3X SSC, 0.1% SDS, at 50 °C), to bind corresponding probes on the array; wash if necessary to remove unbound nucleic acid targets; and analyse the results.

In one embodiment the readout from the analysis is fluorescence.

In one embodiment the readout from the analysis is colorimetric.

In one embodiment physical detection methods, such as changes in electrical impedance, nanowire technology or microfluidics may be used.

In one embodiment there is provided a method which further comprises the step of quantifying RNA from the subject sample.

If a quality control step is desired, software such as Genome Studio software may be employed.

Numeric value as employed herein is intended to refer to a number obtained for each relevant gene, from the analysis or readout of the gene expression, for example the fluorescence or colorimetric analysis. The numeric value obtained from the initial analysis may be manipulated or corrected, and if the result of the processing is still a number then it will continue to be a numeric value.

By converting is meant processing of a negative numeric value to make it into a positive value or processing of a positive numeric value to make it into a negative value by simple conversion of a positive sign to a negative or vice versa.

Analysis of the subject-derived sample will, for the genes analysed, give a range of numeric values, some of which are positive (preceded by + and in mathematical terms considered greater than zero) and some of which are negative (preceded by - and in strict mathematical terms considered to be less than zero). The positive and negative signs in the context of gene expression analysis are a convenient mechanism for representing genes which are up-regulated and genes which are down-regulated.

In the method of the present disclosure either all the numeric values of genes which are down-regulated and represented by a negative number are converted to the corresponding positive number (i.e. by simply changing the sign) for example -1 would be converted to 1 or all the positive numeric values for the up-regulated genes are converted to the corresponding negative number.

The present inventors have established that this step of rendering the numeric values for the gene expressions positive or alternatively all negative allows the summating of the values to obtain a single value that is indicative of the presence of disease or infection or the absence of the same.
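The conversion and summation steps described above can be sketched as follows. This is an illustrative Python fragment under the assumption that each gene contributes one signed numeric value; the function name and the example values are hypothetical.

```python
def convert_and_summate(values):
    """Change the sign of every negative value, then add all values together."""
    return sum(-v if v < 0 else v for v in values)

# Hypothetical signed values: two up-regulated and two down-regulated genes.
signed_values = [1.5, -1.0, 2.0, -0.5]
single_value = convert_and_summate(signed_values)

# Without conversion, up- and down-regulated genes would partly cancel out.
naive_sum = sum(signed_values)
```

Here the converted sum is 5.0 whereas the unconverted sum is only 2.0, which shows why rendering all values positive (or all negative) preserves the contribution of both groups of genes.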

This is a huge simplification of the processing of gene expression data and represents a practical step forward thereby rendering the method suitable for routine use in the clinic.

By discriminatory power is meant the ability to distinguish between a KD sample/subject and a bacterial infected or viral infected sample/subject, and/or between a KD sample/subject and a sample/subject with an inflammatory disease, such as SLE, JIA and HSP.

The discriminatory power of the method according to the present disclosure may, for example, be increased by attaching greater weighting to genes which are more significant in the signature, even if they are expressed at low or lower absolute levels.

As employed herein, raw numeric value is intended to, for example refer to unprocessed fluorescent values from the gene chip, either absolute fluorescence or relative to a house keeping gene or genes.

Summating as employed herein is intended to refer to the act or process of adding numerical values.

Composite expression score as employed herein means the sum (aggregate number) of all the individual numerical values generated for the relevant genes by the analysis, for example the sum of the fluorescence data for all the relevant up and down regulated genes. The score may or may not be normalised and/or scaled and/or weighted.

In one embodiment the composite expression score is normalised.

In one embodiment the composite expression score is scaled.

In one embodiment the composite expression score is weighted.

Weighted or statistically weighted as employed herein is intended to refer to the relevant value being adjusted to more appropriately reflect its contribution to the signature.

In one embodiment the method employs a simplified risk score as employed in the examples herein.

Simplified risk score is also known as disease risk score (DRS).

Control as employed herein is intended to refer to a positive (control) sample and/or a negative (control) sample which, for example is used to compare the subject sample to, and/or a numerical value or numerical range which has been defined to allow the subject sample to be designated as positive or negative for disease/infection by reference thereto.

Positive control sample as employed herein is a sample known to be positive for the pathogen or disease in relation to which the analysis is being performed, such as a bacterial infection.

Negative control sample as employed herein is intended to refer to a sample known to be negative for the pathogen or disease in relation to which the analysis is being performed. In one embodiment the control is a sample, for example a positive control sample or a negative control sample, such as a negative control sample.

In one embodiment the control is a numerical value, such as a numerical range, for example a statistically determined range obtained from an adequate sample size defining the cut-offs for accurate distinction of disease cases from controls.

Conversion of multi-gene transcript disease signatures into a single number disease score

Once the RNA expression signature of the disease has been identified by variable selection, the transcripts are separated based on their up- or down-regulation relative to the comparator group. The two groups of transcripts are selected and collated separately.

Summation of up-regulated and down-regulated RNA transcripts

To identify the single disease risk score for any individual patient, the raw intensities, for example fluorescent intensities (either absolute or relative to housekeeping standards), of all the up-regulated RNA transcripts associated with the disease are summated. Similarly, summation of all down-regulated transcripts for each individual is achieved by combining the raw values (for example fluorescence) for each transcript relative to the unchanged housekeeping gene standards. Since the transcripts have various levels of expression, and their fold changes differ as well, instead of summing the raw expression values, they can be scaled and normalised between 0 and 1. Alternatively they can be weighted to allow important genes to carry greater effect. Then, for every sample the expression values of the signature’s transcripts are summated, separately for the up- and down-regulated transcripts.

The total disease score incorporating the summated fluorescence of up- and down-regulated genes is calculated by adding the summated score of the down-regulated transcripts (after conversion to a positive number) to the summated score of the up-regulated transcripts, to give a single number composite expression score. This score maximally distinguishes the cases and controls and reflects the contribution of the up- and down- regulated transcripts to this distinction.
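The calculation described above can be sketched in Python as follows. The sketch assumes that values for down-regulated transcripts are expressed as negative numbers, so that their summated score needs converting to a positive number before it is added; the function name, the transcript labels and the optional per-transcript weights are hypothetical.

```python
def composite_expression_score(up_values, down_values, weights=None):
    """Single-number score from separately summated up and down pools.

    up_values / down_values: dict mapping transcript label -> numeric value
    weights: optional dict of per-transcript weights (default weight 1.0)
    """
    weights = weights or {}
    up_sum = sum(weights.get(t, 1.0) * v for t, v in up_values.items())
    down_sum = sum(weights.get(t, 1.0) * v for t, v in down_values.items())
    # Convert the summated down-regulated score to a positive number, then add.
    return up_sum + abs(down_sum)

# Hypothetical values for one sample.
up = {"up_1": 2.0, "up_2": 1.5}
down = {"down_1": -1.0, "down_2": -0.5}

unweighted = composite_expression_score(up, down)
weighted = composite_expression_score(up, down, weights={"up_1": 2.0})
```

Keeping the two pools separate until the final addition mirrors the separate summation of up- and down-regulated transcripts, and the optional weights show how a more important transcript can be given greater effect.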

Comparison of the disease risk score in cases and controls

The composite expression scores for patients and the comparator group may be compared, in order to derive the means and variance of the groups, from which statistical cut-offs are defined for accurate distinction of cases from controls. Using the disease subjects and comparator populations, sensitivities and specificities for the disease risk score may be calculated using, for example a Support Vector Machine and internal elastic net classification.
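One simple way to derive a cut-off from case and control scores is sketched below. It is an assumption of this sketch, not a statement of the method used in the examples, that the cut-off is chosen by maximising sensitivity plus specificity (Youden's J statistic) over the observed scores; the Support Vector Machine and elastic net approaches mentioned above are not reproduced. All names and score values are hypothetical.

```python
def sensitivity_specificity(case_scores, control_scores, threshold):
    """Call a sample positive when its score is above the threshold."""
    true_pos = sum(s > threshold for s in case_scores)
    true_neg = sum(s <= threshold for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)

def best_threshold(case_scores, control_scores):
    """Return (J, threshold, sensitivity, specificity) maximising J."""
    best = None
    for t in sorted(set(case_scores) | set(control_scores)):
        sens, spec = sensitivity_specificity(case_scores, control_scores, t)
        j = sens + spec - 1  # Youden's J statistic
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best

# Hypothetical composite expression scores.
cases = [4.0, 5.0, 6.0, 7.0]
controls = [1.0, 2.0, 3.0, 5.5]
j, threshold, sens, spec = best_threshold(cases, controls)
```

In this toy data one control scores higher than some cases, so no cut-off separates the groups perfectly; the selected threshold of 3.0 keeps all cases positive at the cost of one misclassified control.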

Disease risk score as employed herein is an indicator of the likelihood that a patient has a bacterial infection when comparing their composite expression score to the comparator group’s composite expression score.

Development of the disease risk score into a simple clinical test for disease severity or disease risk prediction

The approach outlined above in which complex RNA expression signatures of disease or disease processes are converted into a single score which predicts disease risk can be used to develop simple, cheap and clinically applicable tests for disease diagnosis or risk prediction.

The procedure is as follows: For tests based on differential gene expression between cases and controls (or between different categories of cases such as severity), the up- and down-regulated transcripts identified as relevant may be printed onto a suitable solid surface such as a microarray slide, bead, tube or well.

Up-regulated transcripts may be co-located separately from down-regulated transcripts either in separate wells or separate tubes. A panel of unchanged housekeeping genes may also be printed separately for normalisation of the results.

RNA recovered from individual patients using standard recovery and quantification methods (with or without amplification) is hybridised to the pools of up- and down-regulated transcripts and the unchanged housekeeping transcripts.

Control RNA is hybridised in parallel to the same pools of up- or down-regulated transcripts.

Total value, for example fluorescence for the subject sample and optionally the control sample is then read for up- and down- regulated transcripts and the results combined to give a composite expression score for patients and controls, which is/are then compared with a reference range of a suitable number of healthy controls or comparator subjects.

Correcting the detected signal for the relative abundance of RNA species in the subject sample

The details above explain how a complex signature of many transcripts can be reduced to the minimum set that is maximally able to distinguish between patients and other phenotypes. For example, within the up-regulated transcript set, there will be some transcripts that have a total level of expression many fold lower than that of others. However, these transcripts may be highly discriminatory despite their overall low level of expression. The weighting derived from the elastic net coefficient can be included in the test in a number of different ways. Firstly, the number of copies of individual transcripts included in the assay can be varied. Secondly, in order to ensure that the signal from rare, important transcripts is not swamped by that from transcripts expressed at a higher level, one option would be to select probes for a test that are neither overly strongly nor too weakly expressed, so that the contribution of multiple probes is maximised. Alternatively, it may be possible to adjust the signal from low-abundance transcripts by a scaling factor.

Whilst this can be done at the analysis stage using current transcriptomic technology as each signal is measured separately, in a simple colorimetric test only the total colour change will be measured, and it would not therefore be possible to scale the signal from selected transcripts. This problem can be circumvented by reversing the chemistry usually associated with arrays. In conventional array chemistry, the probes are coupled to a solid surface, and the amount of biotin-labelled, patient-derived target that binds is measured. Instead, we propose coupling the biotin-labelled cRNA derived from the patient to an avidin-coated surface, and then adding DNA probes coupled to a chromogenic enzyme via an adaptor system. At the design and manufacturing stage, probes for low-abundance but important transcripts are coupled to greater numbers, or more potent forms, of the chromogenic enzyme, allowing the signal for these transcripts to be 'scaled-up' within the final single-channel colorimetric readout. This approach would be used to normalise the relative input from each probe in the up-regulated, down-regulated and housekeeping channels of the kit, so that each probe makes an appropriately weighted contribution to the final reading, which may take account of its discriminatory power, suggested by the weights of variable selection methods.

The detection system for measuring multiple up or down regulated genes may also be adapted to use RT-PCR to detect the transcripts comprising the diagnostic signature, with summation of the separate pooled values for up and down regulated transcripts, or physical detection methods such as changes in electrical impedance. In this approach, the transcripts in question are printed on nanowire surfaces or within microfluidic cartridges, and binding of the corresponding ligand for each transcript is detected by changes in impedance or other physical detection system.

In one embodiment the gene chip is a fluorescent gene chip that is to say the readout is fluorescence.

Fluorescence as employed herein refers to the emission of light by a substance that has absorbed light or other electromagnetic radiation.

Thus in an alternate embodiment the gene chip is a colorimetric gene chip, for example a colorimetric gene chip that uses microarray technology wherein avidin is used to attach enzymes such as peroxidase or other chromogenic substrates to the biotin probe currently used to attach fluorescent markers to DNA. The present disclosure extends to a microarray chip adapted to be read by colorimetric analysis and adapted to discriminate a subject having a bacterial infection from a subject having a viral infection or an inflammatory disease. The present disclosure also extends to use of a colorimetric chip to analyse a subject sample for discriminating a subject having a bacterial infection from a subject having a viral infection or an inflammatory disease.

Colorimetric as employed herein refers to an assay wherein the output is in the human visible spectrum.

In an alternative embodiment, a gene set or probe set for discriminating a subject having a bacterial infection from a subject having a viral infection or an inflammatory disease may be detected by physical detection methods including nanowire technology, changes in electrical impedance, or microfluidics.

The readout for the assay can be converted from a fluorescent readout as used in current microarray technology into a simple colorimetric format or one using physical detection methods such as changes in impedance, which can be read with minimal equipment. For example, this is achieved by utilising the Biotin currently used to attach fluorescent markers to DNA. Biotin has high affinity for avidin which can be used to attach enzymes such as peroxidase or other chromogenic substrates. This process will allow the quantity of cRNA binding to the target transcripts to be quantified using a chromogenic process rather than fluorescence. Simplified assays providing yes/no indications of disease status can then be developed by comparison of the colour intensity of the up- and down-regulated pools of transcripts with control colour standards. Similar approaches can enable detection of multiple gene signatures using physical methods such as changes in electrical impedance.

This aspect of the invention is likely to be particularly advantageous for use in remote or under-resourced settings, for example places in Africa, or for rapid diagnosis in "near patient" tests, because the equipment required to read the chip is likely to be simpler.

Multiplex assay as employed herein refers to a type of assay that simultaneously measures several analytes (often dozens or more) in a single run/cycle of the assay. It is distinguished from procedures that measure one analyte at a time.

In one embodiment there is provided a bespoke gene chip for use in the method, in particular as described herein.

In one embodiment there is provided use of a known gene chip for use in the method described herein, in particular to identify one or more gene signatures described herein.

In one aspect there is provided a method of determining whether to administer a treatment for KD to a subject, such as a subject suspected of having KD, for example a subject exhibiting symptoms of having KD, by employing the method disclosed herein, and administering the treatment to the subject if the method indicates that the subject has KD. Examples of suitable treatments for KD include but are not limited to gamma globulin (IVIg), aspirin, or other anti-inflammatory agents such as steroids and infliximab, including combinations thereof.

Gene signature, gene transcript signature, gene set, disease signature, diagnostic signature and gene profile are used interchangeably throughout and should be interpreted to mean gene signature.

In the context of this specification "comprising" is to be interpreted as "including".

Aspects of the invention comprising certain elements are also intended to extend to alternative embodiments "consisting" or "consisting essentially" of the relevant elements.

Where technically appropriate, embodiments of the invention may be combined.

Embodiments are described herein as comprising certain features/elements. The disclosure also extends to separate embodiments consisting or consisting essentially of said features/elements.

Technical references such as patents and applications are incorporated herein by reference.

Any embodiments specifically and explicitly recited herein may form the basis of a disclaimer either alone or in combination with one or more further embodiments.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows the diagnostic algorithm for assigning patients to diagnostic groups. KD = Kawasaki disease; AHA = American Heart Association; CAA = coronary artery aneurysm; JIA = juvenile idiopathic arthritis; HSP = Henoch-Schonlein purpura; CRP = C-reactive protein.

Figure 2 shows the overall study pipeline, showing sample handling, derivation of test and training datasets, data processing and analysis pipeline.

a see methods (Example 1); b Healthy controls were used in model building but were excluded from estimates of model accuracy; c Diagnostic performance assessed on 72 patients (days 2-7 of illness). Abbreviations: KD = Kawasaki disease; DB = definite bacterial; DV = definite viral; U = infections of uncertain bacterial or viral aetiology; JIA = juvenile idiopathic arthritis; HSP = Henoch-Schonlein purpura; HC = healthy controls; PDMS = Parallel Deterministic Model Search; SDE = significantly differentially expressed; FC = fold change.

Figure 3 shows the performance of the 13-transcript signature on the discovery test and validation sets. Classification (A), and receiver operating characteristic (ROC) curve (B) of the 13-transcript signature in the discovery test set, comprising patients with Kawasaki Disease (KD) and patients with other diseases, using the Disease Risk Score (DRS) values. Classification (C), and ROC curve (D) of the 13-transcript signature in the validation set, comprising three KD clinical subgroups of differing diagnostic certainty and patients with other diseases. In box plots, horizontal lines represent the median; lower and upper edges represent interquartile ranges; whiskers represent the range, or 1.5x the interquartile range, whichever is smaller. The horizontal line across the graphs indicates the DRS threshold that separates patients predicted as KD (above the line) or not KD (below), as determined by the point in the ROC curve that maximized sensitivity and specificity in the discovery training group. KD = Kawasaki disease; DB = definite bacterial; DV = definite viral; U = infections of uncertain bacterial or viral aetiology; JIA = juvenile idiopathic arthritis; HSP = Henoch-Schonlein purpura; KD-def = definite KD; KD-HP = highly probable KD; KD-P = possible KD.

Figure 4 shows the performance of the 13-transcript signature by illness day at sample collection in validation set. The X axis shows the collection day of the sample in relation to the first day of illness (i.e. initiation of fever). Black dots = definite KD, grey dots = highly probable KD, black dots with arrows = possible KD clinical subgroups in the validation set.

Figure 5 shows the principal component analysis (PCA) plot of PC1 & PC2 in the discovery cohort after background adjustment and normalisation. A sample from a KD patient was removed (arrow) from subsequent analysis. Each spot is data from an array. KD = Kawasaki Disease, DB = Definite Bacterial, DV = Definite Viral, HC = healthy controls, U = infections of uncertain bacterial or viral aetiology, JIA = juvenile idiopathic arthritis, HSP = Henoch-Schonlein purpura.

Figure 6 shows PCA plots of (A) naive merging of validation cohorts and (B) merging using ComBat. Each spot represents data from an array; KD-acute = Acute Kawasaki Disease, KD-conv = Convalescent Kawasaki Disease, DB = Definite Bacterial, DV = Definite Viral, U = infections of uncertain bacterial or viral aetiology, HC = healthy controls. Panel (B) includes data from 30 KD patients with samples after the 7th day of fever, who were not included in the diagnostic performance calculations.

Figure 7 shows a gene network derived from the 13-transcript signature. The network was generated using Ingenuity Pathways Analysis. 12 of the 13 transcripts were mapped to the database. This network, containing 7 focus molecules, was the top network in the analysis. Each molecule is coloured according to the direction of expression in KD. Unbroken lines indicate direct interaction, dashed lines indicate indirect interaction. The legend to the network is located at: http://ingenuity.force.com/ipa/articles/Feature_Description/Legend.

EXAMPLE 1 - Identification of 13 transcript gene signature

Patient study groups

The differential diagnosis for KD includes multiple infectious and inflammatory conditions, and we therefore established a case-control discovery study group of children with KD and a range of other infectious and inflammatory diseases with clinical signs overlapping KD. Patients were prospectively recruited, at pediatric centres in the UK, the Netherlands, Spain, and USA, if they had febrile illness and required blood testing for clinical investigation, as part of the Immunopathology of Respiratory, Inflammatory and Infectious Disease Study [20], the Spanish GENDRES study, the USA- based Kawasaki Disease Research Center Program or the Dutch Kawasaki study.

Children recruited with KD represented a combination of those presenting directly to the study centre Emergency Department, and patients referred from regional centres. However, our study included only those patients for whom blood sampling had taken place before initiation of IVIG for treatment of KD and in the first 7 days of the illness. Febrile controls were recruited with blood samples collected early, before clinical diagnosis was confirmed, in order to obtain samples as close to presentation as possible, including patients referred for evaluation of possible KD by practitioners in the community, who represented the true population for whom a diagnostic test would be extremely relevant.

Febrile controls were assigned to diagnostic groups using predefined criteria, once the results of all investigations were available (supplementary appendix and Fig 1). Children with comorbidities likely to influence gene expression, such as immunosuppressive treatment or bone marrow transplant, were excluded. We included comparator groups of children presenting with inflammatory illness: Henoch-Schonlein Purpura (HSP) and Juvenile Idiopathic Arthritis (JIA).

Patients in the validation study group were similarly recruited as part of biomarker studies of febrile children presenting to hospital and requiring blood tests, as has been described previously [21, 22]. Patients presenting to hospital within ten days of the onset of a febrile illness were recruited and blood samples for gene expression analysis collected at the same time as routine diagnostic studies to evaluate the cause of the child’s illness. Healthy control children with no recent (2 weeks) history of fever or immunisation were recruited alongside KD and febrile control patients as part of the discovery and validation studies. Data from healthy controls were used to standardise data obtained in different microarray experiments but were not used to evaluate the performance of the signature.

KD Case definition

KD was diagnosed on the basis of the American Heart Association (AHA) criteria [14]. Patients diagnosed with KD underwent 2D echocardiography soon after presentation and at two and six weeks after onset. Patients with fewer than four of the classic criteria were included as incomplete KD if the maximum coronary artery Z score (Zmax) (standard deviation units from the mean internal diameter normalized for body surface area) at any time during the illness for the left anterior descending or right coronary arteries was > 2.5, or if they satisfied the algorithm for incomplete KD in the AHA guidelines. Patients were classified as having normal (Zmax < 2.5) or dilated coronary arteries (Zmax > 2.5 and < 5.0) or CAA (Zmax > 5.0). Because of inter-operator variability in exact coronary artery dimensions, we set a high (Zmax > 5.0) threshold to define patients with aneurysms in order to reduce misclassification.
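The coronary artery categories defined above can be expressed as a small Python function. This is a sketch only: the text does not state how a Zmax of exactly 2.5 or exactly 5.0 is handled, so assigning those boundary values to the higher category is an assumption made here, and the function name is hypothetical.

```python
def classify_coronary_status(zmax: float) -> str:
    """Map a maximum coronary artery Z score to normal, dilated or CAA."""
    if zmax < 2.5:
        return "normal"
    if zmax < 5.0:
        # Assumption: a Zmax of exactly 2.5 is treated as dilated.
        return "dilated"
    # Assumption: a Zmax of exactly 5.0 is treated as CAA.
    return "CAA"

# Hypothetical Zmax values for three patients.
statuses = [classify_coronary_status(z) for z in (1.0, 3.2, 6.1)]
```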

Further classification of KD by diagnostic certainty

As there is no "gold" standard for diagnosis of KD, some patients may meet the criteria for KD but have other conditions such as staphylococcal or streptococcal infection, viral infection or inflammatory diseases. Therefore, we further categorized KD patients in the validation study group based on certainty of clinical diagnosis. All clinical records, laboratory results, echocardiogram reports, response to treatment and follow-up clinic notes were reviewed by an independent pediatric infectious disease specialist and expert on KD (author MPG - blinded to the analysis). Patients with documented CAA (Zmax > 5.0) persisting six weeks after onset were considered to have definite KD, as there is no other self-resolving inflammatory illness in childhood leading to CAA. The remaining patients (all of whom were treated with IVIG by the clinical team for suspected KD) were classified as highly probable, possible or unlikely KD by the expert reviewer. This review identified no "unlikely KD" cases.

Febrile control children with infection or other inflammatory syndromes

Children presenting with febrile illnesses were assigned as having definite bacterial infection, definite viral infection, suspected bacterial or viral infection, HSP or JIA using the criteria shown in Figure 1 and described in the supplementary appendix.

Ethical approval and consent

Patients were recruited under approvals by the Research Ethics Committees of UCSD (Human Research Protection Program #140220), Spain (Ethical Committee of Clinical Investigation of Galicia, CEIC ref 2010/015), Amsterdam (NL41023.018.12 and NL34230.018.10), and the UK (St Mary’s Hospital 09/H0712/58, 13/LO/0026).

Oversight and conduct of the study

Patients were categorized into disease groups (Figure 1) after evaluation of all results by at least two independent clinicians not involved in the patient’s care (authors JAH, JCB, JK, MPG, AMB). All samples were anonymized. The transcriptomic datasets were analyzed only after the clinical assignments were finalized and dispatched for independent verification (supplementary appendix).

Discovery and validation of gene expression signature

The overall study design and signature discovery pipeline are shown in Figure 2. Whole blood was collected at the time of recruitment (before IVIG treatment for KD cases) into PAXgene blood RNA tubes (PreAnalytiX, Germany) and frozen; RNA was then extracted and analysed on Human HT-12 v.4 BeadChip arrays (Illumina). An earlier Illumina BeadChip array (HT-12 v.3) with largely overlapping probes was used in a subset of the validation study group. Details of laboratory methods are provided in the Supplementary Appendix.

Statistical Analysis

Transcript signature discovery

Analysis of the transcriptomic data was conducted with the 'R' Language and Environment for Statistical Computing (R) 3.2.2 (supplementary methods). As shown in Figure 2, the discovery study group was randomly divided into an 80% 'training' set and a 20% 'test' set. The signature was identified in the training set and validated in the test set as well as in a second study group (the validation study group) established using our previously reported acute and convalescent KD patients [21] and acute bacterial and viral patients [22] (supplementary methods). After quality control and filtering (supplementary methods), significantly differentially expressed (SDE) transcripts in KD patients compared to all other diseases were identified in the training set.

Small signature discovery using Parallel Deterministic Model Search (PDMS)

A novel method, PDMS, that identifies and ranks transcript signatures on the basis of the least number of transcripts and highest accuracy in discrimination, was used to identify a parsimonious gene expression signature comprising the smallest number of transcripts that optimally distinguished KD from other diseases. The method first evaluates all possible one and two-gene models distinguishing KD from comparator diseases based on all SDE transcripts, and takes the 100 best-fitting two-gene models to the next round when a further gene is added to the model, and all combinations are again evaluated. The process continues with incremental addition of one further gene at a time to the best 100 models. The optimum signature for a given number of transcripts (model size) was selected after ranking each model by its Watanabe-Akaike Information Criterion, which is a Bayesian estimate of the out-of-sample error [23]. The optimum model size was determined by cross-validation. Further details are in the supplementary statistical methods.

Disease Risk Score and assessment of model accuracy

We applied our previously reported Disease Risk Score (DRS) method that assigns individual disease risk based on the transcripts included in the diagnostic signature [15]. The DRS combines the fluorescence intensity of up-regulated transcripts and subtracts the combined fluorescence intensity of down-regulated transcripts [15] and might facilitate development of tests from complex signatures. Healthy controls were used in model building but were excluded from estimates of model accuracy, assessed by area under the receiver operator curve (AUC), sensitivity and specificity.

Supplementary methods

RNA sample extraction and processing

Whole blood (2.5ml) was collected into PAXgene blood RNA tubes (PreAnalytiX, Germany), incubated for 2 hours and frozen at -20°C within 6 hours of collection, before storage at -80°C. RNA was extracted using PAXgene blood RNA kits (PreAnalytiX, Germany) according to the manufacturer's instructions. The integrity and yield of the total RNA were assessed using an Agilent 2100 Bioanalyser and a NanoDrop 1000 spectrophotometer. The samples used in the discovery cohort came from the USA (UCSD), Spain, The Netherlands and UK. All samples were extracted in the UK except for the samples from the USA. After quantification and quality control, biotin-labeled cRNA was prepared using Illumina TotalPrep RNA Amplification kits (Applied Biosystems) from 500ng RNA. Labeled cRNA was hybridized overnight to Human HT-12 v.4 Expression BeadChip arrays (Illumina). After washing, blocking and staining, the arrays were scanned using an Illumina BeadArray Reader according to the manufacturer's instructions. Using Genome Studio software the microarray images were inspected for artifacts and QC parameters were assessed. No arrays were excluded at this stage.

Pathogen diagnosis

Viral diagnostics were undertaken on nasopharyngeal aspirates using immunofluorescence (RSV, adenovirus, parainfluenza virus, influenza A+B) and nested PCR (RSV, adenovirus, parainfluenza 1-4, influenza A+B, bocavirus, metapneumovirus, rhinovirus/enterovirus). Bacterial cultures included blood, CSF, urine and tissue sites. Pneumococcal antigen was measured in blood and urine, and bacterial DNA was detected by meningococcal and pneumococcal PCR.

Diagnostic process in febrile controls

Patients had a diagnostic work-up as directed by the clinical team, including blood count, blood chemistry, C-reactive protein (CRP), and blood, urine and throat swab cultures; cerebrospinal fluid analysis and chest radiographs were performed where appropriate. Multiplex PCR was used to detect common respiratory viruses in nasopharyngeal aspirates or throat swabs, and common viruses in blood. Once the results of all investigations were available, patients were assigned to diagnostic groups using predefined criteria (Figure 1), as follows.

Bacterial infection:

Patients assigned to the bacterial pathogen group had a bacterial pathogen (gram-positive coccus or gram-negative bacillus) identified by culture or by molecular techniques in a sample from a sterile site (blood, CSF, pleural space, joint, urine), and a clinical syndrome in keeping with the identified bacterial species. This group included patients with and without viral co-infection. Children diagnosed with other bacterial infections (for instance mycoplasma, pertussis, mycobacteria) were not included in this group. No threshold for inflammatory markers was set for this group, as identification of bacteria in a sterile-site sample was taken as conclusive evidence for a confirmed bacterial infection.

Viral infection:

Patients in the viral infection group had an identified virus, a clinical syndrome in keeping with viral infection, and no microbiological or clinical features of bacterial disease. In order to avoid inclusion of children with occult bacterial infection in the viral group, children with raised inflammatory markers were excluded. A maximum threshold was set at CRP of 60 mg/L, and neutrophil count of 12 × 10^9/L. Among the 94 children, the most frequent pathogens were RSV (27 children), influenza A/B and adenovirus (23 children each).

Uncertain bacterial or viral infection:

When children with an acute febrile illness and features of infection could not be assigned confidently to one of the above groups, they were labelled as 'Uncertain Bacterial or Viral'. Children in this group had inconclusive features of bacterial or viral infection, negative microbiological findings or absent virological investigations, a syndrome inconsistent with their microbiological findings, inflammatory markers inconsistent with other clinical features of their illness, or insufficient clinical data for confident coding in another group. Patients in this group did not have bacterial infection detected at a sterile site, and some patients did have detectable virus.

Other inflammatory syndromes:

a) Henoch-Schonlein purpura (HSP) was diagnosed in children presenting with palpable purpura, typically over the buttocks and extensor surfaces, in association with abdominal pain, arthralgia or renal abnormalities (haematuria and proteinuria); b) Juvenile idiopathic arthritis (JIA) was defined according to the International League of Associations for Rheumatology criteria [37]. Patients with JIA included i) treatment-naive and ii) active-exacerbation/smouldering cases.

Statistical Methods

Microarray pre-processing - The Discovery Dataset

Background subtraction and robust spline normalisation (RSN) were applied to the raw expression data using the R package lumi [38]. Sample outliers were assessed by Principal Component Analysis (PCA). One sample from a Kawasaki patient was a clear outlier on PC1 and was removed from the analysis (Fig. 5).

The samples in the discovery dataset were randomly assigned to ten different folds conditional on equal numbers of each comparator group (KD Kawasaki Disease, DB Definite Bacterial, DV Definite Viral, U infections of uncertain bacterial or viral aetiology, JIA juvenile idiopathic arthritis, HSP Henoch-Schonlein purpura, HC healthy controls). Two folds (20%) were reserved as the test set and the remaining eight folds made up the training set. As a diagnostic test for KD would be of most value early in the course of the illness, we developed our signature using only samples from patients at 7 or fewer days of fever in the discovery cohort.
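The fold assignment described above can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names, the round-robin dealing scheme and the group sizes are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Map each sample index to a fold, spreading every group evenly over folds."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, group in enumerate(labels):
        by_group[group].append(idx)
    fold_of = {}
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        # Deal shuffled members round-robin, so per-fold counts of a group
        # differ by at most one.
        for position, idx in enumerate(members):
            fold_of[idx] = position % n_folds
    return fold_of

def train_test_split(labels, n_folds=10, n_test_folds=2, seed=0):
    """Reserve the first n_test_folds folds as the test set (2 of 10 = 20%)."""
    fold_of = stratified_folds(labels, n_folds, seed)
    test = sorted(i for i, f in fold_of.items() if f < n_test_folds)
    train = sorted(i for i, f in fold_of.items() if f >= n_test_folds)
    return train, test

# Made-up group sizes, for illustration only.
labels = ["KD"] * 50 + ["DB"] * 30 + ["DV"] * 20
train, test = train_test_split(labels)
print(len(train), len(test))  # 80 20
```

Stratifying by comparator group keeps the group mix of the test set close to that of the training set, which matters when some groups are small.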

Microarray pre-processing - The Validation Dataset

The validation dataset was constructed by merging two gene-expression datasets: one with acute and convalescent Kawasaki samples [39] and one with bacterial and viral infections [40]. All convalescent samples had ESR (erythrocyte sedimentation rate) levels less than 40mm/hr and all acute samples were taken within ten days of onset of illness. Background subtraction and RSN normalisation were applied to the two datasets separately in the R package lumi [38]. At this stage, there were differences between the cohorts. This is evident from a PCA plot which shows that PC1 clearly distinguishes samples by batch (Fig. 6a). We therefore employed the ComBat [41] method to remove batch effects. Two binary covariates were passed to ComBat which assigned samples to three groups - healthy, KD and other diseases. The Kawasaki convalescent samples were assigned as healthy. A PCA after ComBat shows samples from both batches overlap on a plot of PC1 against PC2 with no significant batch effects (Fig. 6b).

Model estimation

Before model estimation, probes were pre-filtered to identify robustly expressed transcripts with log2 fold change >1 between the relevant disease groups. This was implemented by selecting probes meeting all of the following criteria in the training data:

1. Probes measured on both V3 and V4 Illumina Beadchips

2. Robustly expressed transcripts: for each probe, we calculated the proportion of samples in each comparator group for which the detection threshold p-value<0.01, and selected those probes for which this proportion was > 80% in at least one disease group

3. The majority of Kawasaki patients were recruited in UCSD. To ensure probe selection was not biased by batch effects emanating from UCSD, we excluded probes which showed association with recruitment at UCSD at p<0.05 in a linear model conditional on age in months and all disease groups which also included non-KD patients recruited from UCSD (DV, U, KD and HSP)

4. log2 fold change (conditional on age) was calculated between Kawasaki and each other comparator group; we took forward those probes with |log2 fold change| > 1 for at least one of these comparisons

The functions lmFit and eBayes in the R package limma [42] were used to calculate the probe association statistics used in steps (3) and (4) above.
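The four filtering criteria above can be expressed as a single predicate. The sketch below is illustrative only: it assumes the per-probe statistics (chip membership, detection fractions, the UCSD association p-value and the age-adjusted log2 fold changes) have already been computed, for example with limma, and the dictionary field names and probe values are hypothetical.

```python
def passes_prefilter(probe, min_detected=0.80, site_alpha=0.05, min_abs_log2fc=1.0):
    """Return True if a probe meets all four pre-filtering criteria."""
    # 1. Measured on both the V3 and V4 BeadChips.
    on_both_chips = probe["on_v3"] and probe["on_v4"]
    # 2. Detected (detection p < 0.01) in > 80% of samples of at least one group.
    robust = any(frac > min_detected for frac in probe["detected_fraction"].values())
    # 3. No association with recruitment at UCSD (probes with p < 0.05 are excluded).
    no_site_effect = probe["ucsd_p"] >= site_alpha
    # 4. |log2 fold change| > 1 between KD and at least one comparator group.
    large_change = any(abs(fc) > min_abs_log2fc for fc in probe["log2fc_vs_kd"].values())
    return on_both_chips and robust and no_site_effect and large_change

# Hypothetical probes with made-up statistics.
probes = {
    "probe_A": {"on_v3": True, "on_v4": True, "detected_fraction": {"KD": 0.95, "DB": 0.40},
                "ucsd_p": 0.62, "log2fc_vs_kd": {"DB": 1.4, "DV": 0.3}},
    "probe_B": {"on_v3": False, "on_v4": True, "detected_fraction": {"KD": 0.99},
                "ucsd_p": 0.80, "log2fc_vs_kd": {"DB": 2.0}},
    "probe_C": {"on_v3": True, "on_v4": True, "detected_fraction": {"KD": 0.90},
                "ucsd_p": 0.01, "log2fc_vs_kd": {"DB": 1.8}},
    "probe_D": {"on_v3": True, "on_v4": True, "detected_fraction": {"KD": 0.85},
                "ucsd_p": 0.50, "log2fc_vs_kd": {"DB": 0.8, "DV": -0.9}},
}
kept = [name for name, probe in probes.items() if passes_prefilter(probe)]
print(kept)  # ['probe_A']
```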

Discovery using Parallel Deterministic Model Search (PDMS)

We used an in-house method, PDMS, to derive a parsimonious gene-expression signature, which balances small transcript number with accurate discrimination. The method iteratively estimates logistic regression coefficients for a selected subset of gene-expression levels (covariates). The regression coefficients are assigned zero-centred Gaussian prior distributions, with precision parameter τ (where τ = 1/variance, and is equivalent to the penalty), to induce shrinkage of the coefficients to zero. The method searches as many models as possible and chooses the "best" one, with each model comprising a unique subset of selected covariates with their respective logistic regression coefficients.

In order to find the best prior probability distribution shrinkage parameter, we assessed the precision of each model using LASSO cross-validation of multiple partitions of the discovery data. The R package glmnet was used to determine the LASSO penalty with the minimum out-of-sample cross-validated deviance. We then set τ by equating the penalty induced by LASSO with the penalty induced by a Gaussian prior on the largest regression coefficient (β_max) of the optimum LASSO fit. Denoting the LASSO penalty parameter by λ and the number of samples by n:

$$\underbrace{n\lambda\,|\beta_{\max}|}_{\text{LASSO penalty}} \;=\; \underbrace{\frac{\tau}{2}\,\beta_{\max}^{2}}_{\text{Gaussian penalty}} \quad\Longrightarrow\quad \tau = \frac{2n\lambda}{|\beta_{\max}|}$$

The PDMS method proceeded as follows: using the pre-filtered probes that were robustly expressed with log2 fold change >1 between groups, all possible one and two-transcript models were evaluated and ranked, based on their log-likelihood (the measure of how well the model fits the data), and the top 100 two-transcript models were taken forward. In the next stage the algorithm determined the unique set of three-gene models that could be constructed from the addition of one gene to each of the top 100 two-gene models. The log-likelihood of these models was calculated, and the process continued taking forward the top 100 models to construct models one gene larger.
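The search strategy described above resembles a beam search over covariate subsets. The following is a simplified sketch under stated assumptions, not the PDMS implementation: models are scored by a ridge-penalised logistic log-likelihood fitted by plain gradient ascent, features are assumed to be scaled to roughly [-1, 1], and the function names, the toy data and the beam width of 5 (instead of 100) are all illustrative.

```python
import math
from itertools import combinations

def penalised_loglik(X, y, cols, tau=1.0, steps=200, lr=0.5):
    """Fit a logistic model on columns `cols` by gradient ascent; return the
    log-likelihood minus a Gaussian-prior penalty with precision tau."""
    n = len(y)
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    w = [0.0] * (len(cols) + 1)  # w[0] is the unpenalised intercept
    for _ in range(steps):
        grad = [0.0] * len(w)
        for feats, yi in zip(rows, y):
            z = sum(wj * fj for wj, fj in zip(w, feats))
            p = 0.5 * (1.0 + math.tanh(0.5 * z))  # numerically stable sigmoid
            for j, fj in enumerate(feats):
                grad[j] += (yi - p) * fj
        for j in range(1, len(w)):
            grad[j] -= tau * w[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    loglik = 0.0
    for feats, yi in zip(rows, y):
        z = sum(wj * fj for wj, fj in zip(w, feats))
        loglik += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return loglik - 0.5 * tau * sum(wj * wj for wj in w[1:])

def model_search(X, y, max_size=3, beam=100, tau=1.0):
    """Return the best-scoring covariate subset found for each model size."""
    n_feat = len(X[0])
    max_size = min(max_size, n_feat)

    def score(cols):
        return penalised_loglik(X, y, cols, tau)

    # Evaluate all one- and two-covariate models exhaustively.
    best = {1: max(((c,) for c in range(n_feat)), key=score)}
    frontier = sorted(combinations(range(n_feat), 2), key=score, reverse=True)[:beam]
    best[2] = frontier[0]
    # Grow each retained model by one covariate and keep the top `beam` again.
    for size in range(3, max_size + 1):
        grown = {tuple(sorted(m + (c,))) for m in frontier
                 for c in range(n_feat) if c not in m}
        frontier = sorted(sorted(grown), key=score, reverse=True)[:beam]
        best[size] = frontier[0]
    return best

# Deterministic toy data: the outcome depends on columns 0 and 1 only.
X = [[((7 * i) % 11 - 5) / 5, ((3 * i) % 13 - 6) / 6, ((5 * i) % 7 - 3) / 3,
      ((11 * i) % 17 - 8) / 8, ((2 * i) % 9 - 4) / 4] for i in range(40)]
y = [1 if row[0] - row[1] > 0 else 0 for row in X]
best = model_search(X, y, max_size=3, beam=5)
print(best)
```

Keeping only the top-ranked models at each size avoids evaluating every subset of the candidate transcripts, at the cost of possibly missing a good model whose smaller sub-models ranked poorly.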

For models of a given size (number of transcripts), the models are ranked by the Watanabe-Akaike Information Criterion (WAIC) [43]. The WAIC is a Bayesian information criterion for estimating the out-of-sample expected error, which penalises a model according to its effective number of parameters and assumes inference is made from the posterior distribution. PDMS makes inference from the posterior mean, therefore the WAIC is appropriate for our application. Details of how the WAIC is estimated can be found in Gelman et al [44]. Although WAIC adjusts for over-fitting by adding a correction for the effective number of parameters, it does not account for the increased false positive rate induced by searching over many models. PDMS therefore does not simply choose the model with the lowest WAIC across all model sizes explored, but instead selects the model that minimises a criterion with an additional penalty on model size:

Corrected criterion = WAIC + ak

where k is the model size and we take a = 1.
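As a small worked illustration of the corrected criterion, the sketch below picks the model size that minimises WAIC + ak. The WAIC values are invented for the example and the function name is hypothetical.

```python
def pick_model_size(waic_by_size, a=1.0):
    """Return the model size minimising WAIC + a*k, plus all corrected values."""
    corrected = {k: waic + a * k for k, waic in waic_by_size.items()}
    best_k = min(corrected, key=corrected.get)
    return best_k, corrected

# Made-up WAIC values for the best model of each size.
waic_by_size = {10: 120.4, 11: 118.9, 12: 117.6, 13: 116.1, 14: 115.8, 15: 115.5}
best_k, corrected = pick_model_size(waic_by_size)
print(best_k)  # 13
```

In this toy case the raw WAIC keeps falling up to size 15, but the gain beyond 13 transcripts is smaller than the per-transcript penalty, so the smaller model is preferred.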

Calculating model accuracy

The area under ROC curves, and corresponding confidence intervals of the models’ application to the test and validation datasets were calculated using the R package pROC [45].

Results for each patient were summarised as a Disease Risk Score (DRS) to determine the accuracy of classification by the 13-transcript signature. The optimal threshold for classification as KD or not KD, based on training set data, was determined according to Youden's J statistic, that is, the point on the ROC curve that maximizes the distance to the identity line (the maximum of sensitivity + specificity) [46]. The same threshold was used in accuracy calculations for the validation data.
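The DRS and the Youden threshold can be illustrated with a few lines of code. This is a sketch under assumptions: the gene subset, the expression values and the function names are made up, and the real analysis used the R package pROC rather than this hand-rolled scan over candidate thresholds.

```python
def disease_risk_score(expression, up_genes, down_genes):
    """Sum of up-regulated transcript intensities minus the sum of down-regulated ones."""
    return sum(expression[g] for g in up_genes) - sum(expression[g] for g in down_genes)

def youden_threshold(scores, is_kd):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1,
    where a sample is called KD if its score is >= threshold."""
    n_pos = sum(is_kd)
    n_neg = len(is_kd) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, k in zip(scores, is_kd) if k and s >= t)
        tn = sum(1 for s, k in zip(scores, is_kd) if not k and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Illustrative subset of the signature: two positive-coefficient genes, one negative.
up_genes, down_genes = ["KLHL2", "SMOX"], ["IFI27"]
samples = [  # made-up log2 intensities
    ({"KLHL2": 9.0, "SMOX": 8.0, "IFI27": 6.0}, 1),
    ({"KLHL2": 8.5, "SMOX": 8.0, "IFI27": 7.0}, 1),
    ({"KLHL2": 8.0, "SMOX": 7.5, "IFI27": 7.5}, 1),
    ({"KLHL2": 7.0, "SMOX": 6.0, "IFI27": 8.0}, 1),
    ({"KLHL2": 7.5, "SMOX": 7.0, "IFI27": 7.0}, 0),
    ({"KLHL2": 7.0, "SMOX": 6.5, "IFI27": 9.0}, 0),
    ({"KLHL2": 6.5, "SMOX": 6.0, "IFI27": 10.0}, 0),
]
scores = [disease_risk_score(expr, up_genes, down_genes) for expr, _ in samples]
is_kd = [label for _, label in samples]
threshold, j = youden_threshold(scores, is_kd)
print(scores)        # [11.0, 9.5, 8.0, 5.0, 7.5, 4.5, 2.5]
print(threshold, j)  # 8.0 0.75
```

Because the threshold is fixed on the training data and then reused, the validation sensitivity and specificity are not tuned to the validation samples.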

Confidence intervals (CI) for sensitivity and specificity were calculated using Jeffreys' method. Jeffreys' method is derived from a Bayesian perspective in which the underlying proportion of interest is assigned the non-informative Jeffreys reference prior, Beta(½, ½) [47]. Thus, sensitivity 95% CIs are derived from the 2.5% and 97.5% quantiles of a Beta(p+½, q+½) distribution, where p is the number of true positives and q is the number of false negatives.
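Written out, and extending the same construction to specificity by analogy (the text spells out only the sensitivity case, so the second line is an assumption), the intervals take the following form, where F^{-1}(u; a, b) denotes the u-quantile of a Beta(a, b) distribution and TP, FN, TN and FP are the counts of true positives, false negatives, true negatives and false positives:

```latex
\begin{aligned}
\text{Sensitivity 95\% CI} &= \Bigl[\,F^{-1}\bigl(0.025;\ \mathrm{TP}+\tfrac12,\ \mathrm{FN}+\tfrac12\bigr),\ 
                                   F^{-1}\bigl(0.975;\ \mathrm{TP}+\tfrac12,\ \mathrm{FN}+\tfrac12\bigr)\Bigr] \\
\text{Specificity 95\% CI} &= \Bigl[\,F^{-1}\bigl(0.025;\ \mathrm{TN}+\tfrac12,\ \mathrm{FP}+\tfrac12\bigr),\ 
                                   F^{-1}\bigl(0.975;\ \mathrm{TN}+\tfrac12,\ \mathrm{FP}+\tfrac12\bigr)\Bigr]
\end{aligned}
```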

Results

The numbers of patients in each diagnostic category are shown in Figure 2. Clinical and demographic features of the KD patients are shown in Table 1, and those of patients with other inflammatory syndromes and infections are shown in Tables 3-6. Principal Component Analysis of the normalised gene expression profiles was performed separately on the discovery (training and test) and validation groups; Figures 5 and 6 plot PC1 vs PC2 of these two analyses. Study groups clustered together in the discovery group, and in the validation group after combining KD and case-control data using the ComBat algorithm [24] (see Supplementary Statistical Methods above).

Table 1: Clinical characteristics and laboratory values at acute time point for KD subjects in discovery and validation study groups

| Patient characteristic | Discovery set | Validation set^d |
|---|---|---|
| Age, months | 20.5 (10 - 45) | |
| Asian (includes Far East & Indian subcontinent) | 12 (15) | 12 (17) |

All values shown as median (IQR). There were no significant differences between the discovery and validation patients for the characteristics shown. a: illness day 1 = first day of fever; b: haemoglobin normalized by age; c: >140 is written as 140; d: of 102 patients with KD, 30 patients with illness day at sampling > 8 were excluded and the remaining 72 patients were used for diagnostic performance; e: discovery vs validation P value = 0.051.

Identification of minimal transcript signatures

There were 1600 transcripts passing QC that were significantly differentially expressed between KD and all other diseases and healthy controls (defined as |log2 fold change| > 1 in KD vs at least one of the comparator groups). To identify minimal signatures suitable for developing as a test, we next undertook variable selection using PDMS. This approach identified a 13-transcript signature (Table 2), which when implemented as a DRS had a diagnostic performance as follows: AUC in the test set was 96.2% (95% CI, 92.5%, 99.9%) with sensitivity and specificity of 81.7% (95% CI, 60.0%, 94.8%) and 92.1% (95% CI, 84.0%, 97.0%) respectively (Fig 3A, B).

Table 2: The genes included in the diagnostic signature

| Gene symbol | Gene name | HGNC ID | Probe ID | Location | Logistic regression coefficient |
|---|---|---|---|---|---|
| DDIAS | DNA damage induced apoptosis suppressor | 26351 | 2570019 | 11q14.1 | 0.844 |
| KLHL2 | Kelch-like family member 2 | 6353 | 107059.8 | | 0.789 |
| PYROXD2 | Pyridine nucleotide-disulphide oxidoreductase domain 2 | 23517 | 1684497 | 10q24.2 | 0.727 |
| SMOX | Spermine oxidase | 15862 | 270068 | | 0.675 |
| ZNF185 | Zinc finger protein 185 with LIM domain | 12976 | 6840674 | Xq28 | 0.646 |
| CLIC3 | Chloride intracellular channel 3 | 2064 | 5870136 | 9q34.3 | 0.464 |
| IFI27 | Interferon alpha-inducible protein 27 | 5397 | 3990170 | 14q32.12 | -0.426 |
| CD163 | CD163 molecule | 1631 | 2680092 | 12p13.31 | -0.638 |

The logistic regression coefficient indicates the power of the gene to discriminate KD in the PDMS model; genes with positive values show increased expression in KD relative to other diseases; genes with negative values show decreased expression in KD.

Signature performance in validation set

When the signature was applied to the 72 KD cases in the validation set, the AUC was 96.5% (95% CI, 93.7%, 99.3%) with sensitivity of 90.8% (95% CI, 82.5%, 96.2%) and specificity of 89.1% (95% CI, 83.0%, 93.7%). As clinical features of KD overlap other conditions, and as any KD study group is likely to include patients without KD, we assessed whether the certainty of clinical diagnosis corresponded to the strength of the KD DRS prediction score. When analysed separately, the performance of the 13-transcript PDMS signature in the definite, highly probable and possible KD groups of the validation set (see methods) followed the clinical certainty of diagnosis, with ROC AUCs of 98.1% (95% CI, 94.5%, 100%), 96.3% (95% CI, 93.3%, 99.4%) and 70.0% (95% CI, 53.4%, 86.6%) respectively (Fig 3C, D).

Performance of the signature by illness day

The discovery group included KD patients up to day 7 of their illness (with day 1 as the first day of fever), and the signature was validated on patients up to and including day 7 of illness. The performance of the signature declined when applied to 30 patients on day 8-10 of their illness (Fig 4).

Discussion

We have identified a 13-transcript signature that distinguishes KD from patients with bacterial, viral and inflammatory diseases. The high sensitivity and specificity of this signature for early diagnosis of KD suggests it might form the basis of a diagnostic test. Our findings extend previous gene expression studies in KD, which focused on immunopathogenesis [21, 25-29].

For 5 of the 13 transcripts in the signature, the expression was lower in KD patients compared to the non-KD group (Table 2). Of these 5, the S100 calcium binding protein P (S100P) has previously been reported to show increased expression in KD during the acute phase, in comparison to convalescence [30], or with viral infections [29, 30]. S100P expression was highest in bacterial patients, and selection of this transcript in the PDMS model was driven by KD-bacterial discrimination. The interferon-inducible gene interferon alpha-inducible protein 27 (IFI27), which regulates apoptosis, has been reported to be up-regulated in febrile children with viral infections compared with children with acute bacterial infections [31] and autoimmune diseases [32, 33]. Low transcript abundance of the family of genes induced by Type 1 interferons was previously reported in a comparison of whole blood gene expression in acute KD versus adenovirus infection [29], which is consistent with inclusion of IFI27 in the model as a negative predictor of KD. CD163 is a transmembrane receptor expressed in macrophages and monocytes and involved in bacterial clearance during the acute phase of infection [34]. A network analysis of the signature using Ingenuity Pathways Analysis reveals that 7 of the 13 transcripts in the signature were connected in a network around a central hub of TNF and IL6 (Fig 7).

The diagnosis of KD currently relies on the presence of four of the five characteristic clinical criteria. Fewer criteria are accepted as diagnostic if coronary artery abnormalities (dilatation or aneurysms) are detected on echocardiography. Children with "incomplete KD" who do not fulfil the classical diagnostic criteria, but have prolonged fever and inflammation are at increased risk of developing CAA [35]. One reason for the greater risk of CAA in incomplete KD is the delayed diagnosis that often occurs in patients lacking all clinical features. As the clinical features of KD overlap those of many other common childhood conditions such as staphylococcal and streptococcal toxin diseases, viral exanthems, Stevens Johnson syndrome, systemic juvenile idiopathic arthritis and drug reactions [36], treatment with IVIG may be delayed while awaiting exclusion of other conditions. Conversely, because the diagnosis of KD is considered in the differential of many childhood febrile illnesses and the consequences of delayed treatment may be severe, overtreatment with IVIG or immunosuppressant second-line treatments may occur. A diagnostic test that accurately distinguishes KD from other infectious and inflammatory processes would be a significant advance in management of the disorder, reduce unnecessary investigations and inappropriate treatments, and enable earlier treatment with IVIG and other anti-inflammatory agents. In establishing our discovery and validation study groups we aimed to include a wide range of disorders which have overlapping features with KD, including both infectious and inflammatory diseases. The signature we have identified distinguished KD from a wide range of other conditions. As KD is diagnosed based on a constellation of clinical features, and there is no gold standard for diagnosis, evaluation of biomarkers or tests is difficult. 
In any cohort of children treated with IVIG for presumed KD, it is likely that some patients with non-KD illness with overlapping clinical features will be included. To evaluate the correspondence of the KD DRS with levels of diagnostic certainty, we categorized all patients in the validation set as definite, highly probable or possible KD based on independent review of all the clinical data. We observed a higher sensitivity and specificity of our signature in the definite and highly probable than in the possible group. The diagnostic accuracy of the KD-specific signature is ready for testing in prospective studies.

We recognize both strengths and limitations in the study. Firstly, the epidemiology of KD varies globally by ethnicity, with high rates in East Asia and lower rates in Europe. Further studies are required to investigate whether there are ethnic and geographical variations in gene expression in KD. A strength of our study is that the signature was developed from a training set including a range of ethnicities. Febrile control samples in the discovery set were drawn from all centres, whilst multi-ethnic KD samples came from UCSD. Secondly, in the validation experiment, KD and case-control data from different Illumina microarray versions were combined by applying the ComBat algorithm [24], and normalising with respect to healthy control data from each platform. This normalisation may reduce both experimental and biological sources of variability between datasets and consequently, the accuracy (AUC result) of the diagnostic signature when applied to the validation set may be an underestimate compared to that obtained from a validation dataset drawn from a single microarray experiment. Thirdly, the 13-transcript signature was discovered using KD patients that were no more than 7 days into their illness. The advantage of our signature is that it might facilitate early diagnosis of KD, before 5 days of fever. However, further work is required to establish the optimal signature for diagnosis in late, 'missed' KD patients.

Translation of multi-transcript signatures into a rapid clinical test for use in hospital diagnostic laboratories is challenging, but is made more achievable due to the relatively small number of transcripts in our signature and the rapidly evolving technologies for detecting nucleic acids. Furthermore, the DRS offers a new approach for individual disease risk assignment without the requirement for complex analysis, and provides a platform for development as a test where up- or down-regulated transcripts comprising the KD signature are co-located and their combined signal detected.

Our study suggests that KD can be distinguished from the range of infectious and inflammatory conditions with which it is often clinically confused using a small number of transcripts in blood. Development of a rapid test, based on this gene expression signature would be a major advance allowing earlier treatment, and thus prevention of cardiac complications of this serious childhood disease. Our findings represent a step towards better diagnosis of diseases based on molecular signatures rather than clinical criteria, and thus are relevant to many other clinical syndromes.

Data repository

The data discussed above have been deposited in NCBI's Gene Expression Omnibus (Edgar et al, 2002) and are accessible through GEO Series accession number GSE73464 (http://www.ncbi.nlm.nih.gov/geo/).

Supplementary tables

Table 3: Clinical features of children in the juvenile idiopathic arthritis cohort (Discovery)

| | Treatment-naive | Active-exacerbation/smouldering |
|---|---|---|
| No. children | | |
| Age, months^a | 163.5 (124.0 - 186.8) | 157 (137.8 - 176.5) |

a: All values shown as median (IQR); b: Lab values out of 27 patients for treatment-naive set, 35 patients for active-exacerbation/smouldering set. ESR = erythrocyte sedimentation rate, ANA = antinuclear antibodies, ANCA = anti-neutrophil cytoplasmic antibodies.

Table 4: Clinical features of children in the Henoch-Schonlein purpura group (Discovery)

| | Henoch-Schonlein Purpura |
|---|---|
| No. children | |
| Age, months^a | 55.5 (43.0 - 81.0) |
| Hispanic | 4 (22) |
| Other | 2 (11) |

a: All values shown as median (IQR); b: Hemoglobin normalized by age; c: Lab data available from 15 patients; d: Lab data available from 14 patients; e: Lab data available from 4 HSP; f: Lab data available from 8 patients; ESR = erythrocyte sedimentation rate

Table 5: Clinical features of children with bacterial and viral infection, infections of uncertain bacterial or viral aetiology and healthy controls (Discovery and Validation)

White blood count (x 10 3 /mm 3 ) 1 12.7 (7.7-19.3) 8.5 (6.1-12.0) 8.4 (6.5-14.6) 7.2 (6.4-9.75) 16.6 (10.0-9.3) 8.3 (5.6-10.9) 10.6 (6.5-16.0) 8.0 (5.8-8.9)

C-reactive protei

a: All values shown as median (IQR); b: percentage of those with known ethnicity; c: until research blood sampling; d: maximum value of CRP in illness is reported.

Table 6: Viral and Bacterial causative pathogens in patients in the Definite Bacterial and Viral groups

Definite Viral Definite Bacterial

Discovery Validation Discovery Validation

Viral causative pathogen

Adenovirus 23 2

Influenza

RSV 27 10

Other

Bacterial causative pathogen

Pseudomonas spp 3

Table 7: Summary of performance of models

EXAMPLE 2 - Identification of gene signatures with fewer transcripts

The PReMS software (Hoggart, 2018) was used to generate alternate smaller signatures (fewer transcripts) based on subsets of the original 13 transcripts.

PReMS searches over many logistic regression models constructed from optimal subsets of the biomarkers, iteratively increasing the model size. Zero centred Gaussian prior distributions are assigned to all regression coefficients to induce shrinkage. The method estimates the optimal shrinkage parameter, optimal model for each model size and the optimal model size.

Table 8 below shows examples of smaller 5, 6, 7, 8, 9, 10, 11 and 12 transcript signatures based on combinations of transcripts from the original 13 transcripts. The AUC values for the test and validation data sets (see Example 1) are shown for each signature. Note that, for the sake of brevity, the sample list of gene signatures shown here is not exhaustive and is purely illustrative.

Table 8 - Examples of smaller signatures based on subsets of the original 13 transcripts

Thus, this example demonstrates that smaller gene signatures based on subsets of 5, 6, 7, 8, 9, 10, 11 or 12 of the original 13 transcripts have good discriminatory power and are able to reliably identify individuals with KD vs individuals who do not have KD.

References

1. Kawasaki T, Kosaki F, Okawa S, Shigematsu I, Yanagawa H. A new infantile acute febrile mucocutaneous lymph node syndrome (MLNS) prevailing in Japan. Pediatrics. 1974;54(3):271-6. Epub 1974/09/01. PubMed PMID: 4153258.

2. Makino N, Nakamura Y, Yashiro M, Ae R, Tsuboi S, Aoyama Y, et al. Descriptive epidemiology of Kawasaki disease in Japan, 2011-2012: from the results of the 22nd nationwide survey. Journal of epidemiology / Japan Epidemiological Association. 2015;25(3):239-45. doi: 10.2188/jea.JE20140089. PubMed PMID: 25716368; PubMed Central PMCID: PMCPMC4341001.

3. Du ZD, Zhao D, Du J, Zhang YL, Lin Y, Liu C, et al. Epidemiologic study on Kawasaki disease in Beijing from 2000 through 2004. The Pediatric infectious disease journal. 2007;26(5):449-51. doi: 10.1097/01.inf.0000261196.79223.18. PubMed PMID: 17468660.

4. Kim GB, Park S, Eun LY, Han JW, Lee SY, Yoon KL, et al. Epidemiology and Clinical Features of Kawasaki Disease in South Korea, 2012-2014. The Pediatric infectious disease journal. 2017;36(5):482-5. Epub 2016/12/21. doi: 10.1097/INF.0000000000001474. PubMed PMID: 27997519.

5. Lue HC, Chen LR, Lin MT, Chang LY, Wang JK, Lee CY, et al. Estimation of the incidence of Kawasaki disease in Taiwan. A comparison of two data sources: nationwide hospital survey and national health insurance claims. Pediatr Neonatol. 2014;55(2):97-100. doi: 10.1016/j.pedneo.2013.05.011. PubMed PMID: 23890670.

6. Harnden A, Mayon-White R, Perera R, Yeates D, Goldacre M, Burgner D. Kawasaki disease in England: ethnicity, deprivation, and respiratory pathogens. The Pediatric infectious disease journal. 2009;28(1):21-4. PubMed PMID: 19145710.

7. Holman RC, Belay ED, Christensen KY, Folkema AM, Steiner CA, Schonberger LB. Hospitalizations for Kawasaki syndrome among children in the United States, 1997-2007. The Pediatric infectious disease journal. 2010;29(6):483-8. doi: 10.1097/INF.0b013e3181cf8705. PubMed PMID: 20104198.

8. Kato H, Sugimura T, Akagi T, Sato N, Hashino K, Maeno Y, et al. Long-term consequences of Kawasaki disease. A 10- to 21-year follow-up study of 594 patients. Circulation. 1996;94(6):1379-85. Epub 1996/09/15. PubMed PMID: 8822996.

9. Suda K, Iemura M, Nishiono H, Teramachi Y, Koteda Y, Kishimoto S, et al. Long-term prognosis of patients with Kawasaki disease complicated by giant coronary aneurysms: a single-institution experience. Circulation. 2011;123(17):1836-42. doi: 10.1161/CIRCULATIONAHA.110.978213. PubMed PMID: 21502578.

Daniels LB, Gordon JB, Burns JC. Kawasaki disease: late cardiovascular sequelae. Current opinion in cardiology. 2012;27(6):572-7. doi: 10.1097/HCO.0b013e3283588f06. PubMed PMID: 23075819.

Yu JJ. Use of corticosteroids during acute phase of Kawasaki disease. World J Clin Pediatr. 2015;4(4):135-42. doi: 10.5409/wjcp.v4.i4.135. PubMed PMID: 26566486; PubMed Central PMCID: PMCPMC4637804.

Tremoulet AH, Jain S, Jaggi P, Jimenez-Fernandez S, Pancheri JM, Sun X, et al. Infliximab for intensification of primary therapy for Kawasaki disease: a phase 3 randomised, double-blind, placebo-controlled trial. Lancet. 2014;383(9930):1731-8. doi: 10.1016/S0140-6736(13)62298-9. PubMed PMID: 24572997.

Dominguez SR, Anderson MS, El-Adawy M, Glode MP. Preventing coronary artery abnormalities: a need for earlier diagnosis and treatment of Kawasaki disease. Pediatr Infect Dis J. 2012;31(12):1217-20. Epub 2012/07/05. doi: 10.1097/INF.0b013e318266bcf9. PubMed PMID: 22760536.

McCrindle BW, Rowley AH, Newburger JW, Burns JC, Bolger AF, Gewitz M, et al. Diagnosis, Treatment, and Long-Term Management of Kawasaki Disease: A Scientific Statement for Health Professionals From the American Heart Association. Circulation. 2017;135(17):e927-e99. Epub 2017/03/31. doi: 10.1161/CIR.0000000000000484. PubMed PMID: 28356445.

Anderson ST, Kaforou M, Brent AJ, Wright VJ, Banwell CM, Chagaluka G, et al. Diagnosis of childhood tuberculosis and host RNA expression in Africa. The New England journal of medicine. 2014;370(18):1712-23. doi: 10.1056/NEJMoa1303657. PubMed PMID: 24785206; PubMed Central PMCID: PMC4069985.

Ramilo O, Allman W, Chung W, Mejias A, Ardura M, Glaser C, et al. Gene expression patterns in blood leukocytes discriminate patients with acute infections. Blood. 2007;109(5):2066-77. doi: 10.1182/blood-2006-02-002477. PubMed PMID: 17105821; PubMed Central PMCID: PMCPMC1801073.

Frangou EA, Bertsias GK, Boumpas DT. Gene expression and regulation in systemic lupus erythematosus. Eur J Clin Invest. 2013;43(10):1084-96. doi: 10.1111/eci.12130. PubMed PMID: 23902282.

Jia HL, Liu CW, Zhang L, Xu WJ, Gao XJ, Bai J, et al. Sets of serum exosomal microRNAs as candidate diagnostic biomarkers for Kawasaki disease. Scientific reports. 2017;7:44706. Epub 2017/03/21. doi: 10.1038/srep44706. PubMed PMID: 28317854; PubMed Central PMCID: PMCPMC5357789.

Kuo HC, Hsieh KS, Ming-Huey Guo M, Weng KP, Ger LP, Chan WC, et al. Next-generation sequencing identifies micro-RNA-based biomarker panel for Kawasaki disease. The Journal of allergy and clinical immunology. 2016;138(4):1227-30. Epub 2016/07/28. doi: 10.1016/j.jaci.2016.04.050. PubMed PMID: 27450727.

Herberg JA, Kaforou M, Wright VJ, Shailes H, Eleftherohorinou H, Hoggart CJ, et al. Diagnostic Test Accuracy of a 2-Transcript Host RNA Signature for Discriminating Bacterial vs Viral Infection in Febrile Children. JAMA. 2016;316(8):835-45. Epub 2016/08/24. doi: 10.1001/jama.2016.11236. PubMed PMID: 27552617.

Hoang LT, Shimizu C, Ling L, Naim AN, Khor CC, Tremoulet AH, et al. Global gene expression profiling identifies new therapeutic targets in acute Kawasaki disease. Genome Med. 2014;6(11):541. doi: 10.1186/s13073-014-0102-6. PubMed PMID: 25614765; PubMed Central PMCID: PMCPMC4279699.

Herberg JA, Kaforou M, Gormley S, Sumner ER, Patel S, Jones KD, et al. Transcriptomic profiling in childhood H1N1/09 influenza reveals reduced expression of protein synthesis genes. J Infect Dis. 2013;208(10):1664-8. Epub 2013/08/01. doi: 10.1093/infdis/jit348. PubMed PMID: 23901082; PubMed Central PMCID: PMCPMC3805235.

Watanabe S. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research. 2010;11:3571-94.

Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics (Oxford, England). 2007;8(1):118-27. Epub 2006/04/25. doi: 10.1093/biostatistics/kxj037. PubMed PMID: 16632515.

Abe J, Ebata R, Jibiki T, Yasukawa K, Saito H, Terai M. Elevated granulocyte colony-stimulating factor levels predict treatment failure in patients with Kawasaki disease. The Journal of allergy and clinical immunology. 2008;122(5):1008-13.e8. Epub 2008/10/22. doi: 10.1016/j.jaci.2008.09.011. PubMed PMID: 18930517.

Abe J, Jibiki T, Noma S, Nakajima T, Saito H, Terai M. Gene expression profiling of the effect of high-dose intravenous Ig in patients with Kawasaki disease. Journal of immunology. 2005;174(9):5837-45. PubMed PMID: 15843588.

Fury W, Tremoulet AH, Watson VE, Best BM, Shimizu C, Hamilton J, et al. Transcript abundance patterns in Kawasaki disease patients with intravenous immunoglobulin resistance. Hum Immunol. 2010;71(9):865-73. doi: 10.1016/j.humimm.2010.06.008. PubMed PMID: 20600450; PubMed Central PMCID: PMCPMC2929310.

Popper SJ, Shimizu C, Shike H, Kanegaye JT, Newburger JW, Sundel RP, et al. Gene-expression patterns reveal underlying biological processes in Kawasaki disease. Genome Biol. 2007;8(12):R261. doi: 10.1186/gb-2007-8-12-r261. PubMed PMID: 18067656; PubMed Central PMCID: PMCPMC2246263.

Popper SJ, Watson VE, Shimizu C, Kanegaye JT, Burns JC, Relman DA. Gene transcript abundance profiles distinguish Kawasaki disease from adenovirus infection. J Infect Dis. 2009;200(4):657-66. Epub 2009/07/09. doi: 10.1086/603538. PubMed PMID: 19583510; PubMed Central PMCID: PMC2878183.

Ebihara T, Endo R, Kikuta H, Ishiguro N, Ma X, Shimazu M, et al. Differential gene expression of S100 protein family in leukocytes from patients with Kawasaki disease. European journal of pediatrics. 2005;164(7):427-31. doi: 10.1007/s00431-005-1664-5. PubMed PMID: 15838637.

Hu XR, Yu JS, Crosby SD, Storch GA. Gene expression profiles in febrile children with defined viral and bacterial infection. P Natl Acad Sci USA. 2013;110(31):12792-7. doi: 10.1073/pnas.1302968110. PubMed PMID: WOS:000322441500067.

O'Hanlon TP, Rider LG, Gan L, Fannin R, Paules RS, Umbach DM, et al. Gene expression profiles from discordant monozygotic twins suggest that molecular pathways are shared among multiple systemic autoimmune diseases. Arthritis Res Ther. 2011;13(2):R69. doi: 10.1186/ar3330. PubMed PMID: 21521520; PubMed Central PMCID: PMCPMC3132064.

Ishii T, Onda H, Tanigawa A, Ohshima S, Fujiwara H, Mima T, et al. Isolation and expression profiling of genes upregulated in the peripheral blood cells of systemic lupus erythematosus patients. DNA Res. 2005;12(6):429-39. doi: 10.1093/dnares/dsi020. PubMed PMID: WOS:000242119900004.

Fabriek BO, van Bruggen R, Deng DM, Ligtenberg AJ, Nazmi K, Schornagel K, et al. The macrophage scavenger receptor CD163 functions as an innate immune sensor for bacteria. Blood. 2009;113(4):887-92. doi: 10.1182/blood-2008-07-167064. PubMed PMID: 18849484.

Minich LL, Sleeper LA, Atz AM, McCrindle BW, Lu M, Colan SD, et al. Delayed diagnosis of Kawasaki disease: what are the risk factors? Pediatrics. 2007;120(6):e1434-40. doi: 10.1542/peds.2007-0815. PubMed PMID: 18025079.

Newburger JW, Takahashi M, Gerber MA, Gewitz MH, Tani LY, Burns JC, et al. Diagnosis, treatment, and long-term management of Kawasaki disease: a statement for health professionals from the Committee on Rheumatic Fever, Endocarditis and Kawasaki Disease, Council on Cardiovascular Disease in the Young, American Heart Association. Circulation. 2004;110(17):2747-71. doi: 10.1161/01.CIR.0000145143.19711.78. PubMed PMID: 15505111.

Petty RE, Southwood TR, Manners P, Baum J, Glass DN, Goldenberg J, et al. International League of Associations for Rheumatology classification of juvenile idiopathic arthritis: second revision, Edmonton, 2001. J Rheumatol. 2004;31(2):390-2. PubMed PMID: 14760812.

Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray. Bioinformatics. 2008;24(13):1547-8. doi: 10.1093/bioinformatics/btn224. PubMed PMID: 18467348.

Hoang LT, Shimizu C, Ling L, Naim AN, Khor CC, Tremoulet AH, et al. Global gene expression profiling identifies new therapeutic targets in acute Kawasaki disease. Genome Med. 2014;6(11):541. doi: 10.1186/s13073-014-0102-6. PubMed PMID: 25614765; PubMed Central PMCID: PMCPMC4279699.

Herberg JA, Kaforou M, Gormley S, Sumner ER, Patel S, Jones KD, et al. Transcriptomic profiling in childhood H1N1/09 influenza reveals reduced expression of protein synthesis genes. J Infect Dis. 2013;208(10):1664-8. Epub 2013/08/01. doi: 10.1093/infdis/jit348. PubMed PMID: 23901082; PubMed Central PMCID: PMCPMC3805235.

Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics (Oxford, England). 2007;8(1):118-27. Epub 2006/04/25. doi: 10.1093/biostatistics/kxj037. PubMed PMID: 16632515.

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. doi: 10.1093/nar/gkv007. PubMed PMID: 25605792; PubMed Central PMCID: PMCPMC4402510.

Watanabe S. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research. 2010;11:3571-94.

Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Statistics and Computing. 2014;24(6):997-1016.

Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77. PubMed PMID: 21414208; PubMed Central PMCID: PMC3068975.

Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32-5. Epub 1950/01/01. PubMed PMID: 15405679.

Bernardo JM, Smith AFM. Bayesian theory. Chichester, Eng.; New York: Wiley; 1994. xiv, 586 p.

Hoggart C.J. (2018). PReMS: Parallel Regularised Regression Model Search for sparse bio signature discovery. bioRxiv 355479; doi: https://doi.org/10.1101/355479.