Title:
METHOD FOR THE CHARACTERISATION OF BREAST MICROCALCIFICATIONS USING A RAMAN SPECTROSCOPY IMAGING TECHNIQUE IN THE DIAGNOSIS OF BREAST CANCER
Document Type and Number:
WIPO Patent Application WO/2020/115645
Kind Code:
A1
Abstract:
A method is described for ex vivo classifying breast microcalcifications (MCs) in a breast tissue sample, the method comprising the steps of: a) collecting Raman spectroscopy imaging data of a breast tissue sample containing at least one MC, the imaging data relating to the at least one MC; b) determining the composition of the at least one MC, based on reference spectra of at least one selected typical MC calcified components; c) producing an average Raman spectrum for each at least one MC by averaging at least one Type II MC component signal from the calcified components of the at least one MC; and d) carrying out a multivariate analysis with the average Raman spectrum for each at least one MC, based on a plurality of average Raman spectra averaging the same at least one Type II MC component signal from the calcified components of respective MCs, thus obtaining a classification of the at least one MC as benign MC or malignant MC.

Inventors:
VANNA RENZO (IT)
CORSI FABIO RUGGERO (IT)
MORASSO CARLO FRANCESCO (IT)
TORTI EMANUELE (IT)
LEPORATI FRANCESCO (IT)
Application Number:
PCT/IB2019/060383
Publication Date:
June 11, 2020
Filing Date:
December 03, 2019
Assignee:
ISTITUTI CLINICI SCIENT MAUGERI S P A SB (IT)
International Classes:
G01N21/65
Domestic Patent References:
WO2003087793A1 2003-10-23
Foreign References:
US20040073120A1 2004-04-15
Other References:
LIJIA LIANG ET AL: "Exploring type II microcalcifications in benign and premalignant breast lesions by shell-isolated nanoparticle-enhanced Raman spectroscopy (SHINERS)", SPECTROCHIMICA ACTA. PART A: MOLECULAR AND BIOMOLECULAR SPECTROSCOPY, vol. 132, 5 May 2014 (2014-05-05), NL, pages 397 - 402, XP055663106, ISSN: 1386-1425, DOI: 10.1016/j.saa.2014.04.147
JENNIE A.M.R. KUNITAKE ET AL: "Correlative imaging reveals physiochemical heterogeneity of microcalcifications in human breast carcinomas", JOURNAL OF STRUCTURAL BIOLOGY, vol. 202, no. 1, 6 December 2017 (2017-12-06), United States, pages 25 - 34, XP055663010, ISSN: 1047-8477, DOI: 10.1016/j.jsb.2017.12.002
CAMILLE SCOTTÉ ET AL: "Assessment of Compressive Raman versus Hyperspectral Raman for Microcalcification Chemical Imaging", ANALYTICAL CHEMISTRY, vol. 90, no. 12, 15 May 2018 (2018-05-15), US, pages 7197 - 7203, XP055662803, ISSN: 0003-2700, DOI: 10.1021/acs.analchem.7b05303
R. SATHYAVATHI ET AL: "Raman spectroscopic sensing of carbonate intercalation in breast microcalcifications at stereotactic biopsy", SCIENTIFIC REPORTS, vol. 5, no. 1, 30 April 2015 (2015-04-30), XP055663012, DOI: 10.1038/srep09907
BLEYER AND WELCH, N ENGL J MED, vol. 367, 2012, pages 21
ONG AND MANDL, HEALTH AFFAIRS, vol. 34.4, 2015, pages 576 - 583
BARREAU, B.MASCAREL, I. DEFEUGA, C.MACGROGAN, G.DILHUYDY, M.-H.PICOT, V.DILHUYDY, J.-M.DE LARA, C.T.BUSSIERES, E.SCHREER, I.: "Mammography of ductal carcinoma in situ of the breast: Review of 909 cases with radiographic-pathologic correlations", EUR. J. RADIOL., vol. 54, 2005, pages 55 - 61, XP004813054, DOI: 10.1016/j.ejrad.2004.11.019
LAKHANI, S.R., WHO CLASSIFICATION OF TUMOURS OF THE BREAST (INTERNATIONAL AGENCY FOR RESEARCH ON CANCER, 2012
BENT, C.K.BASSETT, L.W.D'ORSI, C.J.SAYRE, J.W.: "The Positive Predictive Value of BI-RADS Microcalcification Descriptors and Final Assessment Categories", AM. J. ROENTGENOL., vol. 194, 2010, pages 1378 - 1383
HENROT, P.LEROUX, A.BARLIER, C.GENIN, P.: "Breast microcalcifications: The lesions in anatomical pathology", DIAGN. INTERV. IMAGING, vol. 95, 2014, pages 141 - 152
FRAPPART LREMY ILIN HCBREMOND ARAUDRANT DGROUSSON B ET AL.: "Different types of microcalcifications observed in breast pathology", VIRCHOWS ARCHIV A PATHOL ANAT., vol. 410, 1987, pages 179 - 87, XP009155233, DOI: 10.1007/BF00710823
MORGAN, M.P.COOKE, M.M.MCCARTHY, G.M.: "Microcalcifications Associated with Breast Cancer: An Epiphenomenon or Biologically Significant Feature of Selected Tumors?", J MAMMARY GLAND BIOL NEOPLASIA, vol. 10, 2005, pages 181 - 187, XP019283023
VANNA, R.RONCHI, P.LENFERINK, A.T.M.TRESOLDI, C.MORASSO, C.MEHN, D.BEDONI, M.PICCIOLINI, S.TERSTAPPEN, L.W.M.M.CICERI, F. ET AL.: "Label-free imaging and identification of typical cells of acute myeloid leukaemia and myelodysplastic syndrome by Raman microspectroscopy", THE ANALYST, vol. 140, 2015, pages 1054 - 1064
WACHSMANN-HOGIU, S.WEEKS, T.HUSER, T.: "Chemical analysis in vivo and in vitro by Raman spectroscopy-from single cells to humans", CURR. OPIN. BIOTECHNOL., vol. 20, 2009, pages 63 - 73, XP026095192, DOI: 10.1016/j.copbio.2009.02.006
BARMAN, I.DINGARI, N.C.SAHA, A.MCGEE, S.GALINDO, L.H.LIU, W.PLECHA, D.KLEIN, N.DASARI, R.R.FITZMAURICE, M.: "Application of Raman Spectroscopy to Identify Microcalcifications and Underlying Breast Lesions at Stereotactic Core Needle Biopsy", CANCER RES., vol. 73, 2013, pages 3206 - 3215
HAKA, A.S.SHAFER-PELTIER, K.E.FITZMAURICE, M.CROWE, J.DASARI, R.R.FELD, M.S.: "Identifying Microcalcifications in Benign and Malignant Breast Lesions by Probing Differences in Their Chemical Composition Using Raman Spectroscopy", CANCER RES., vol. 62, 2002, pages 5375 - 5380
KUNITAKE, J.A.M.R.CHOI, S.NGUYEN, K.X.LEE, M.M.HE, F.SUDILOVSKY, D.MORRIS, P.G.JOCHELSON, M.S.HUDIS, C.A.MULLER, D.A. ET AL.: "Correlative imaging reveals physiochemical heterogeneity of microcalcifications in human breast carcinomas", J. STRUCT. BIOL., vol. 202, 2018, pages 25 - 34
SAHA, A.BARMAN, I.DINGARI, N.C.MCGEE, S.VOLYNSKAYA, Z.GALINDO, L.H.LIU, W.PLECHA, D.KLEIN, N.DASARI, R.R. ET AL.: "Raman spectroscopy: A real-time tool for identifying microcalcifications during stereotactic breast core needle biopsies", BIOMED. OPT. EXPRESS, vol. 2, 2011, pages 2792 - 2803
BAKER RROGERS KDSHEPHERD NSTONE N: "New relationships between breast microcalcifications and cancer", BR J CANCER, vol. 103, 2010, pages 1034 - 9, XP055285600, DOI: 10.1038/sj.bjc.6605873
SATHYAVATHI RSAHA ASOARES JSSPEGAZZINI NMCGEE SRAO DASARI R ET AL.: "Raman spectroscopic sensing of carbonate intercalation in breast microcalcifications at stereotactic biopsy", SCIENTIFIC REPORTS, vol. 5, 2015, pages 9907
KUNITAKE, J.A.M.R.CHOI, S.NGUYEN, K.X.LEE, M.M.HE, F.SUDILOVSKY, D.MORRIS, P.G.JOCHELSON, M.S.HUDIS, C.A.MULLER, D.A. ET AL.: "Correlative imaging reveals physiochemical heterogeneity of microcalcifications in human breast carcinomas", JOURNAL OF STRUCTURAL BIOLOGY, vol. 202, 2018, pages 25 - 34
FRAPPART, L.REMY, I.LIN, H.C.BREMOND, A.RAUDRANT, D.GROUSSON, B.VAUZELLE, J.L.: "Different types of microcalcifications observed in breast pathology", VIRCHOWS ARCHIV A PATHOL ANAT, vol. 410, 1987, pages 179 - 187, XP009155233, DOI: 10.1007/BF00710823
POLITI, YAEL ET AL.: "Sea urchin spine calcite forms via a transient amorphous calcium carbonate phase", SCIENCE, vol. 306.5699, 2004, pages 1161 - 1164
FANDOS-MORERA, A.PRATS-ESTEVE, M.TURA-SOTERAS, J. M.TRAVERIA-CROS, A.: "Breast tumors: composition of microcalcifications", RADIOLOGY, vol. 169, no. 2, 1988, pages 325 - 327
T. W. ANDERSON: "An Introduction to Multivariate Statistical Analysis", 1958, WILEY
N. J. PERKINSE. F. SCHISTERMAN: "The inconsistency of ''optimal'' cut-points using two ROC based criteria", AMERICAN JOURNAL OF EPIDEMIOLOGY, vol. 163, no. 7, 2006, pages 670 - 675
ELLIS IOHUMPHREYS SMICHELL MPINDER SEWELLS CAZAKHOUR HD ET AL.: "Best Practice No 179. Guidelines for breast needle core biopsy handling and reporting in breast screening assessment", J CLIN PATHOL., vol. 57, 2004, pages 897 - 902
D'ORSI CSICKLES EMENDELSON EMORRIS E: "ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System", 2013, AMERICAN COLLEGE OF RADIOLOGY
LAGIER RBAUD C-A: "Magnesium Whitlockite, a Calcium Phosphate Crystal of Special Interest in Pathology", PATHOLOGY - RESEARCH AND PRACTICE, vol. 199, 2003, pages 329 - 35, XP004958098, DOI: 10.1078/0344-0338-00425
SCOTT RSTONE NKENDALL CGERAKI KROGERS K: "Relationships between pathology and crystal structure in breast calcifications: an in situ X-ray diffraction study in histological sections", NPJ BREAST CANCER, vol. 2, 2016, pages 16029
LOSTE EWILSON RMSESHADRI RMELDRUM FC: "The role of magnesium in stabilising amorphous calcium carbonate and controlling calcite morphologies", JOURNAL OF CRYSTAL GROWTH, vol. 254, 2003, pages 206 - 18, XP004424654, DOI: 10.1016/S0022-0248(03)01153-9
BENIASH EAIZENBERG JADDADI LWEINER S: "Amorphous calcium carbonate transforms into calcite during sea urchin larval spicule growth", PROCEEDINGS OF THE ROYAL SOCIETY OF LONDON SERIES B: BIOLOGICAL SCIENCES, vol. 264, 1997, pages 461 - 5
MUL FFM DEOTTO CGREVE JARENDS JBOSCH JJT: "Calculation of the Raman line broadening on carbonation in synthetic hydroxyapatite", JOURNAL OF RAMAN SPECTROSCOPY, vol. 19, 1988, pages 13 - 21
JOLLIFF BLHUGHES JMFREEMAN JJZEIGLER RA: "Crystal chemistry of lunar merrillite and comparison to other meteoritic and planetary suites of whitlockite and merrillite", AMERICAN MINERALOGIST, vol. 91, 2006, pages 1583 - 95
COX RFHERNANDEZ-SANTANA ARAMDASS SMCMAHON GHARMEY JHMORGAN MP: "Microcalcifications in breast cancer: novel insights into the molecular mechanism and functional consequence of mammary mineralisation", BRITISH JOURNAL OF CANCER, vol. 106, 2012, pages 525 - 37
BELLAHCENE ACASTRONOVO V: "Increased expression of osteonectin and osteopontin, two bone matrix proteins, in human breast cancer", AM J PATHOL., vol. 146, 1995, pages 95 - 100, XP002596215
LEV-TOAFF ASFEIG SASAITAS VLFINKEL GCSCHWARTZ GF: "Stability of malignant breast microcalcifications", RADIOLOGY, 1994, Retrieved from the Internet
RUDIN AVHOSKIN TLFAHY AFARRELL AMNASSAR AGHOSH K ET AL.: "Flat Epithelial Atypia on Core Biopsy and Upgrade to Cancer: a Systematic Review and Meta-Analysis", ANN SURG ONCOL., vol. 24, 2017, pages 3549 - 58, XP036347159, DOI: 10.1245/s10434-017-6059-0
VIDAVSKY NKUNITAKE JAMRCHIOU AENORTHRUP PAPORRI TJLING L ET AL.: "Studying biomineralization pathways in a 3D culture model of breast cancer microcalcifications", BIOMATERIALS, vol. 179, 2018, pages 71 - 82, XP085419784, DOI: 10.1016/j.biomaterials.2018.06.030
FAOLAIN EOHUNTER MBBYRNE JMKELEHAN PLAMBKIN HABYRNE HJ ET AL.: "Raman Spectroscopic Evaluation of Efficacy of Current Paraffin Wax Section Dewaxing Agents", J HISTOCHEM CYTOCHEM., vol. 53, 2005, pages 121 - 9, XP002493778, DOI: 10.1369/jhc.4A6536.2005
TORTI, E.MARCINNO, B.VANNA, R.MORASSO, C.PICOTTI, F.VILLANI, L.LEPORATI, F.: "2019 22nd Euromicro Conference on Digital System Design (DSD)", August 2019, IEEE, article "Automatic and Unsupervised Identification of Specific Biochemical Features from Raman Mapping Data", pages: 464 - 469
TORTI EFLORIMBI GCASTELLI FORTEGA SFABELO HCALLICO GM ET AL.: "Parallel K-Means Clustering for Brain Cancer Detection Using Hyperspectral Images", ELECTRONICS, vol. 7, 2018, pages 283
Attorney, Agent or Firm:
CASCI, Tamara (IT)
Claims:
CLAIMS

1. A method for ex vivo classifying breast microcalcifications (MCs) in a breast tissue sample, the method comprising the steps of:

a) collecting Raman spectroscopy imaging data of a breast tissue sample containing at least one MC, said imaging data relating to said at least one MC;

b) determining the composition of said at least one MC, based on reference spectra of at least one selected typical MC calcified components;

c) producing an average Raman spectrum for each said at least one MC by averaging at least one Type II MC component signal from the calcified components of said at least one MC; and

d) carrying out a multivariate analysis with said average Raman spectrum for each said at least one MC, based on a plurality of average Raman spectra averaging the same said at least one Type II MC component signal from the calcified components of respective MCs, thus obtaining a classification of said at least one MC as benign MC or malignant MC.

2. Method according to claim 1, wherein the step of collecting Raman spectroscopy imaging data comprises acquiring said Raman data with a confocal Raman spectrometer.

3. Method according to claim 2, wherein the light source of said Raman spectrometer is a monochromatic light source, preferably comprising a laser light source.

4. Method according to any of the previous claims, wherein the step of collecting Raman imaging data is carried out by scanning entirely the surface area of said at least one MC.

5. Method according to any of the previous claims, wherein the step of determining the composition of said at least one MC comprises the assessment of the presence and the quantification on said at least one MC of at least one calcified component chosen from hydroxyapatite form (HA) of calcium phosphate, whitlockite (WIT) form of calcium phosphate, amorphous calcium carbonate (aCaCa), crystalline calcium carbonate (calcite) and calcium oxalate (CaO).

6. Method according to any of the previous claims, wherein said step of determining the composition of the at least one MC comprises producing a spectral classification of each map point (pixel) and a spatial classification map and fusing them together by a majority vote system, thus producing a Raman image.

7. Method according to claim 6, wherein the production of said spectral classification comprises the evaluation, for each MC spectrum, of one or more indices, said indices preferably being chosen from: monotonicity of the spectra; correlation with the reference spectrum; and peaks positions.

8. Method according to claim 6, wherein the production of said spatial classification map comprises adopting the K-means clustering algorithm based on a random initialisation of the groups.

9. Method according to any of the previous claims, wherein in the step of producing an average Raman spectrum for said at least one MC, the average Raman spectrum for the at least one MC is produced by averaging the signals of HA and WIT.

10. Method according to any of the previous claims, wherein in the step of carrying out a multivariate analysis, the number of the average Raman spectra is greater than 25, preferably greater than 50, more preferably greater than 100, even more preferably greater than 200, most preferably greater than 400.

11. Method according to any of the previous claims, wherein said multivariate analysis is carried out by performing a principal component analysis (PCA) thus obtaining a plurality of principal components (PCs).

12. Method according to claim 11, wherein, following said PCA analysis, a linear discriminant analysis (LDA) classification model is built using a plurality of said PCs; a leave-one-out cross validation is carried out to validate the classification model thus obtained; a canonical variable 1 score is produced for use as unique classification variable; a receiver operating characteristic (ROC) curve is automatically calculated with relative area under the curve (AUC); an optimal threshold is obtained from the ROC curve, thus enabling a classification of said at least one MC as benign or malignant depending on its canonical variable 1 score in relation to the threshold value.

Description:
METHOD FOR THE CHARACTERISATION OF BREAST MICROCALCIFICATIONS USING A RAMAN SPECTROSCOPY IMAGING TECHNIQUE IN THE DIAGNOSIS OF BREAST CANCER

Field of application

The present invention relates to a method for the diagnosis of breast cancer using Raman spectroscopy, particularly a method for the classification of breast microcalcifications (MCs) as benign or malignant using Raman spectroscopy mapping.

Prior art

Breast cancer is the most commonly occurring cancer in women and the second most common cancer overall.

Currently available methods for diagnosis include mammogram, ultrasound, MRI and biopsy. However, the reliability of the current methods is quite poor, leading to the follow-up of suspected cancers that turn out to be benign.

It has been reported that breast cancer has been overdiagnosed in 1.3 million U.S. women since the 1980s (Bleyer and Welch, N Engl J Med 367; 21, 2012). Moreover, it has been calculated that four billion US dollars per year are spent on average in the US because of the expenditures involved for false-positive mammograms, invasive breast cancer, and ductal carcinoma in situ (Ong and Mandl, Health Affairs, 34.4 (2015):576-583).

Breast tissue microcalcifications (MCs) are common findings on screening mammography and are among the earliest signs of breast cancer (Barreau, B., Mascarel, I. de, Feuga, C., MacGrogan, G., Dilhuydy, M.-H., Picot, V., Dilhuydy, J.-M., de Lara, C.T., Bussieres, E., and Schreer, I. (2005). Mammography of ductal carcinoma in situ of the breast: Review of 909 cases with radiographic-pathologic correlations. Eur. J. Radiol. 54, 55-61; Lakhani, S.R. (2012). WHO Classification of Tumours of the Breast (International Agency for Research on Cancer)).

At the same time, from the use of well-known radiographic risk score systems, which include MC assessment, such as Breast Imaging-Reporting and Data System (BI-RADS), a recent study reported that only 28% of biopsies are further associated with malignancy (Bent, C.K., Bassett, L.W., D'Orsi, C.J., and Sayre, J.W. (2010). The Positive Predictive Value of BI-RADS Microcalcification Descriptors and Final Assessment Categories. Am. J. Roentgenol. 194, 1378-1383).

In parallel, the diagnosis by histological and immunohistological evaluations is still laborious and time-consuming. These evaluations include the morphological assessment of lesions by expert pathologists for the recognition of specific benign or malignant features that in some cases need to be confirmed by further immunostaining procedures. In this context, the histological assessment only includes a general description of the amount of MCs and of their general appearance, without any detail about their biomolecular or biochemical characterization (Henrot, P., Leroux, A., Barlier, C., and Genin, P. (2014). Breast microcalcifications: The lesions in anatomical pathology. Diagn. Interv. Imaging 95, 141-152; Frappart L, Remy I, Lin HC, Bremond A, Raudrant D, Grousson B, et al. Different types of microcalcifications observed in breast pathology. Virchows Archiv A Pathol Anat. 1987;410:179-87; Morgan MP, Cooke MM, McCarthy GM. Microcalcifications Associated with Breast Cancer: An Epiphenomenon or Biologically Significant Feature of Selected Tumors? J Mammary Gland Biol Neoplasia. 2005;10:181-7). This is probably due to the inorganic nature of MCs, which cannot be easily characterized by commonly used histological approaches (i.e. staining and immunostaining) and/or by visual inspection only. This also means that, at present, the biochemical features of MCs are neither considered nor investigated, partly because of the lack of appropriate technologies and methodologies.

Raman Spectroscopy (RS) is a photonic approach capable of providing detailed chemical information of analysed samples without complex tissue preparation or staining (Vanna, R., Ronchi, P., Lenferink, A.T.M., Tresoldi, C., Morasso, C., Mehn, D., Bedoni, M., Picciolini, S., Terstappen, L.W.M.M., Ciceri, F., et al. (2015). Label-free imaging and identification of typical cells of acute myeloid leukaemia and myelodysplastic syndrome by Raman microspectroscopy. The Analyst 140, 1054-1064; Wachsmann-Hogiu, S., Weeks, T., and Huser, T. (2009). Chemical analysis in vivo and in vitro by Raman spectroscopy - from single cells to humans. Curr. Opin. Biotechnol. 20, 63-73). Furthermore, RS has a proven ability to distinguish different inorganic features including those commonly present in MCs. Some studies based on RS have explored MCs chemical features with different approaches and results (Barman, I., Dingari, N.C., Saha, A., McGee, S., Galindo, L.H., Liu, W., Plecha, D., Klein, N., Dasari, R.R., and Fitzmaurice, M. (2013). Application of Raman Spectroscopy to Identify Microcalcifications and Underlying Breast Lesions at Stereotactic Core Needle Biopsy. Cancer Res. 73, 3206-3215; Haka, A.S., Shafer-Peltier, K.E., Fitzmaurice, M., Crowe, J., Dasari, R.R., and Feld, M.S. (2002). Identifying Microcalcifications in Benign and Malignant Breast Lesions by Probing Differences in Their Chemical Composition Using Raman Spectroscopy. Cancer Res. 62, 5375-5380; Kunitake, J.A.M.R., Choi, S., Nguyen, K.X., Lee, M.M., He, F., Sudilovsky, D., Morris, P.G., Jochelson, M.S., Hudis, C.A., Muller, D.A., et al. (2018). Correlative imaging reveals physiochemical heterogeneity of microcalcifications in human breast carcinomas. J. Struct. Biol. 202, 25-34; Saha, A., Barman, I., Dingari, N.C., McGee, S., Volynskaya, Z., Galindo, L.H., Liu, W., Plecha, D., Klein, N., Dasari, R.R., et al. (2011). Raman spectroscopy: A real-time tool for identifying microcalcifications during stereotactic breast core needle biopsies. Biomed. Opt. Express 2, 2792-2803).

Haka et al. (2002) first investigated the chemical composition of MCs by Raman spectroscopy in samples from 11 patients with a diagnosis of benign tumour or ductal carcinoma in situ (DCIS), reporting some spectral features associated with malignancy (Haka et al., 2002, WO2003087793A1 2002, US20040073120A1). In particular, they reported for the first time Raman spectra of Type I microcalcifications (consisting of calcium oxalate dihydrate, CA) and Raman spectra of Type II MCs (consisting of calcium phosphate, in particular hydroxyapatite (HA)). On the other hand, in the mentioned study, only single-point scans with a 2 µm spot size were arbitrarily performed at an unspecified position of the MC lesions, without recording a complete characterization of the MC, whose size can range from a few micrometres to 1 mm and which also presents intrinsic heterogeneities (Henrot et al., 2014). Consequently, this approach enables only a partial description of MCs, thus limiting the detection of potential biomarkers associated with the spatial distribution of certain MC components and/or with the presence of some specific components in isolated MC portions. In addition, Haka et al. investigated a small number of samples (n=11) and a small number of MCs (n=90).

Saha et al. and Barman et al. further attempted to use RS to assess MCs directly on fresh needle biopsy tissue samples from 33 patients (Barman et al., 2013; Saha et al., 2011). Their approaches were able to recognize lesions with or without MCs and, in parallel, to recognize different lesion subtypes (cancer, fibroadenoma, fibrocystic change), but using the whole tissue spectral information and not the information related to MCs embedded in the tissue matrix. Even though this approach was pioneering because it was performed on fresh biopsies, the characterization and the assessment of MCs were partial and confined to the detection of their presence through the detection of typical HA peaks.

In summary, the previously proposed approaches only partially exploited the potentialities of using MCs as biomarkers of breast cancer malignancy.

Clearly, a faster and more reliable method of assessment of the malignancy of a tumour is required to provide higher chances of survival, while saving resources and unnecessary stress.

As mentioned above, a few biophysical approaches, including Raman and infrared spectroscopy, suggested a relationship between breast lesion pathology and MC composition (Haka et al. 2002; Baker R, Rogers KD, Shepherd N, Stone N. New relationships between breast microcalcifications and cancer. Br J Cancer. 2010;103:1034-9; Sathyavathi R, Saha A, Soares JS, Spegazzini N, McGee S, Rao Dasari R, et al. Raman spectroscopic sensing of carbonate intercalation in breast microcalcifications at stereotactic biopsy. Scientific Reports. 2015;5:9907). Raman spectroscopy (RS) is a promising approach, considering its ability to provide highly informative data about the molecular composition of the sampled area, specifically regarding mineralized materials, but a detailed RS characterization on a relevant cohort of patients undergoing screening mammography is still missing and is necessary in order to move to clinical use.

The technical problem underlying the present invention is thus that of making available an automated Raman-based diagnostic method for breast cancer with improved diagnostic capability, which allows the output of a fast and highly accurate assessment of the presence of a malignant cancer in a breast tissue from the assessment of the MCs. In particular, the technical problem underlying the present invention is that of making available such a method which is also practical to use.

Another technical problem underlying the present invention is that of making available such a method that is capable of assessing the presence of a malignant cancer from the surrounding tissue.

Summary of the invention

Such a technical problem has been solved by a method for ex vivo classifying breast microcalcifications (MCs) in a breast tissue sample, the method comprising the steps of:

a) collecting Raman spectroscopy imaging data of a breast tissue sample containing at least one MC, the imaging data relating to the at least one MC;

b) determining the composition of the at least one MC, based on reference spectra of at least one selected typical MC calcified component;

c) producing an average Raman spectrum for each at least one MC by averaging at least one Type II MC component signal from the calcified components of the at least one MC; and

d) carrying out a multivariate analysis with said average Raman spectrum for each at least one MC, based on a plurality of average Raman spectra averaging the same said at least one Type II MC component signal from the calcified components of respective MCs, thus obtaining a classification of said at least one MC as benign MC or malignant MC.
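As a rough illustration of how step d) can end in a benign or malignant call, claim 12 derives the classification from a threshold on the canonical variable 1 score taken from a ROC curve. The sketch below is not the method of the application: it assumes that the Youden index (sensitivity + specificity - 1) is the criterion for the "optimal" threshold and that higher scores indicate malignancy, neither of which is stated in this section. The function names `youden_threshold` and `classify` are hypothetical.

```python
def youden_threshold(scores, labels):
    """Pick the cut-off that maximises sensitivity + specificity - 1.

    scores: one number per MC (e.g. a canonical variable 1 score).
    labels: 1 for malignant, 0 for benign (reference diagnosis).
    Assumption: a score >= threshold is called malignant.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are needed to build a ROC curve")
    best_j, best_cut = -1.0, None
    # Every observed score is a candidate cut-off (one ROC point each).
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j


def classify(score, threshold):
    """Label a single MC from its score and the chosen threshold."""
    return "malignant" if score >= threshold else "benign"
```

A new MC would then be labelled with `classify(score, threshold)`, where `threshold` comes from the training set of average spectra. Other ROC-based criteria exist and may give a different cut-off.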

Microcalcifications (MCs) are small deposits of calcium salts that can be detected by imaging (e.g. mammography, echography, magnetic resonance imaging (MRI) in clinics). They can be scattered throughout the mammary gland or can be clustered. They are very common and although they can be an early sign of malignant breast cancer, they are usually benign. They are usually first detected on a mammogram. Although breast MCs are the most common mammographic feature of early breast cancer, their diagnostic value is currently defined and limited by morphological criteria.

Raman spectroscopy involves directing light at a specimen (in this case a breast tissue sample) which inelastically scatters some of the incident light. Inelastic interactions with the specimen can cause the scattered light to have wavelengths that are shifted relative to the wavelength of the incident light (Raman shift). The wavelength spectrum of the scattered light (the Raman spectrum) contains information about the chemical nature of the portion of specimen illuminated by the incident light.
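The Raman shift is conventionally expressed in wavenumbers (cm^-1) as the difference between the reciprocal wavelengths of the incident and scattered light. The following minimal sketch shows the conversion; the function names are hypothetical, and the 785 nm excitation and 700-1760 cm^-1 range used in the example are the preferred values given elsewhere in this description.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift in cm^-1 from excitation and scattered wavelengths in nm."""
    # 1 nm = 1e-7 cm, so 1e7 / wavelength[nm] is a wavenumber in cm^-1.
    return 1e7 * (1.0 / excitation_nm - 1.0 / scattered_nm)


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Scattered (Stokes) wavelength in nm for a given Raman shift in cm^-1."""
    return 1.0 / (1.0 / excitation_nm - shift_cm1 * 1e-7)


if __name__ == "__main__":
    # Wavelengths at the two ends of the preferred spectral range.
    for shift in (700.0, 1760.0):
        print(shift, round(scattered_wavelength_nm(785.0, shift), 1))
```

With 785 nm excitation, the preferred spectral range corresponds to scattered light at roughly 831 to 911 nm, which lies inside the 400-1060 nm sensitivity window mentioned for the detector.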

In a typical Raman imaging (or mapping) experiment, the specimen is sequentially illuminated according to a predefined area (map) with monochromatic light and the Raman scattered light is collected for each of the points mapped, usually by a raster scan pattern, during the experiment. A computer analysis of the map data is used to produce a composite image (hyperspectral data) highlighting the desired information according to specific selected spectral features, which are used to assign a false colour to each of the points (pixels) mapped, thus producing a false-colour Raman image.

Preferably, in the step of collecting Raman spectroscopy imaging data (also referred to as mapping data), the tissue sample is in the form of a tissue slice having a thickness of 5 to 15 µm, more preferably about 10 µm. The tissue slice can be prepared according to standard preparation protocols for Raman spectroscopy known to the skilled person in the field. Preferably, the tissue slice is generated from formalin fixed paraffin embedded tissue blocks. The slice is then preferably microtomed and mounted on a mirrored stainless steel slide specific for Raman measurements. The slice is then preferably deparaffinised, dewaxed by two baths in hexane, more preferably hexane 95%, two baths of ethanol, preferably absolute, and a final bath of ethanol, preferably ethanol 95%. The final steps can be repeated three times. Preferably, the tissue slice is then air dried, more preferably for 1-3 hours, most preferably around 2 hours.

Preferably, the step of collecting Raman spectroscopy imaging data comprises acquiring the Raman data with a confocal Raman spectrometer, to maximize the lateral and vertical resolution.

The light source of the Raman spectrometer is a monochromatic light source, preferably a laser light source. The light source may for example comprise a single-mode stabilised diode laser operating in the visible/near-infrared range (400-1400 nm), most preferably at 785 nm, preferably with round shape spot, and preferably having a power of 40-200 mW, more preferably around 90 mW.

Preferably, the light source, more preferably a laser, is coupled with an objective lens, more preferably a 100x objective, for example an N-Plan 100x (NA 0.75, WD 0.37) Leica objective, and preferably with a diffraction grating of between 400 and 2500 lines/mm, more preferably a 1200 lines/mm diffraction grating.

It is within the capabilities of the skilled person in the field to determine the most suitable light source equipment for the purpose.

Preferably, the diffraction grating is centred around a certain wavelength in order to be able to cover a spectral range between 400 and 3000 cm⁻¹, more preferably between 700 and 1760 cm⁻¹.

Preferably, the light source is filtered so that the final power of the light reaching the tissue sample is from 10 to 30 mW, more preferably around 20 mW.

Preferably, the method comprises a step of detecting the light by means of a detector, for example a charge-coupled device (CCD) (1024x256 pixels), sensitive between 400 and 1060 nm and cooled at -60 to -80 °C, more preferably around -70 °C.

Preferably, the spectral resolution is from 2 to 6 pm, most preferably around 2 pm.

Preferably, the step of collecting Raman imaging data is carried out by defining a squared region on the MC, by defining a mapping step size (i.e. the distance between each spectrum acquired), more preferably from 1 to 20 µm, and by performing an acquisition, more preferably by raster scan, over the surface area of the region to be mapped. Preferably the acquisition time for each point of the Raman map is from 0.5 to 5 seconds, more preferably around 1 second, with a single repetition.

Depending on the surface area of the region to be mapped, the squared region can be defined accordingly. For example, the squared region can be defined around the MC, or inside the MC.

Preferably, the step of collecting Raman imaging data is carried out by scanning at least 40%, more preferably at least 50%, even more preferably at least 70%, even more preferably at least 90%, even more preferably at least 95%, most preferably at least 99% of the surface area of the at least one MC.

Preferably, the step of collecting Raman imaging data is carried out by scanning entirely (100%) the surface area of the at least one MC.

It is within the capabilities of the skilled person to select the size of the squared region and the step size to be used in view of the aim to capture the surface area of the region to be mapped with a reasonable resolution and in a reasonable acquisition time.

For example, it is possible to increase or lower the amount of data acquired by increasing or lowering the acquisition time; or by lowering or increasing the step size.
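The trade-off between step size, acquisition time and amount of data can be made concrete with a small calculation. The sketch below is illustrative only: the function name `raster_plan` is hypothetical, the 100 µm side length in the usage note is an assumed example value, and overheads such as stage movement and detector read-out are ignored.

```python
import math


def raster_plan(side_um: float, step_um: float, seconds_per_point: float):
    """Return (points per side, total points, total acquisition seconds)
    for a square map sampled on a regular grid."""
    if side_um <= 0 or step_um <= 0 or seconds_per_point <= 0:
        raise ValueError("all arguments must be positive")
    # Points at 0, step, 2*step, ... up to the far edge of the square.
    # The small epsilon guards against floating-point round-off.
    per_side = math.floor(side_um / step_um + 1e-9) + 1
    total = per_side * per_side
    return per_side, total, total * seconds_per_point
```

For an assumed 100 µm square mapped at 1 second per point, a 5 µm step gives 21 x 21 = 441 spectra (441 s of pure acquisition), whereas a 10 µm step gives 11 x 11 = 121 spectra (121 s). Halving the step size therefore roughly quadruples both the data volume and the acquisition time.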

It is an advantage of the present invention that the mapping data are collected from the entire surface area of the MC in order to obtain a Raman image of the entire surface area of the MC.

Preferably, the step of collecting Raman imaging data comprises the removal of cosmic rays, the removal of background signal (baseline) and normalisation.

It is within the abilities of the person skilled in the field to apply suitable techniques known in the field. Cosmic rays can be removed, for example, using WiRe (Renishaw plc, Wotton-under-Edge, UK); the removal of the background signal can be carried out by baseline correction, fitting a polynomial function of the 10th order to each spectrum and subtracting it, using for example MATLAB (MathWorks, Natick, MA, USA); normalisation can be performed through the unit vector method.

Preferably, the normalised data are filtered, more preferably with a moving average filter with a window size of 2 to 6, more preferably around 5.
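By way of a non-limiting sketch, the baseline correction, unit vector normalisation and moving average filtering mentioned above could be arranged as follows. NumPy is used here instead of the software named in the text, and the function names, the rescaling of the axis before fitting and the edge handling of the filter are assumptions of this example.

```python
import numpy as np

def baseline_correct(shift, intensity, order=10):
    """Fit a polynomial of the given order to the spectrum and subtract it."""
    shift = np.asarray(shift, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Rescale the axis to [-1, 1] so the high-order fit stays well conditioned.
    x = 2.0 * (shift - shift.min()) / (shift.max() - shift.min()) - 1.0
    coeffs = np.polynomial.polynomial.polyfit(x, intensity, order)
    return intensity - np.polynomial.polynomial.polyval(x, coeffs)

def unit_vector_normalise(spectrum):
    """Divide the spectrum by its Euclidean norm (unit vector method)."""
    spectrum = np.asarray(spectrum, dtype=float)
    norm = np.linalg.norm(spectrum)
    return spectrum / norm if norm > 0 else spectrum

def moving_average(spectrum, window=5):
    """Smooth with a moving average; the first and last window // 2 points
    are attenuated because the signal is zero-padded at the edges."""
    kernel = np.ones(window) / window
    return np.convolve(spectrum, kernel, mode="same")

def preprocess(shift, intensity):
    """Apply the three steps in the order given in the text."""
    corrected = baseline_correct(shift, intensity)
    return moving_average(unit_vector_normalise(corrected), window=5)
```

Cosmic ray removal is deliberately left out of this sketch, since the text delegates it to the instrument software.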

Advantageously, step a) is performed in a semi-automatic programmed approach comprising a system able to synchronize the acquisition of Raman spectra and the movement of the sample stage in order to automatically acquire map data from a single MC or even to automatically acquire multiple regions containing MCs.

Advantageously, steps b) to d) are carried out in a programmed data analysis unit comprising a data processor executing software instructions.

Preferably, the programmed data analysis unit comprises a programmed data processor such as a personal computer, an embedded computer, a microprocessor, a graphics processor, a digital signal processor or the like executing software, hardware and/or firmware instructions that cause the processor to extract the specific spectral characteristics from the Raman spectra. It is within the capabilities of the skilled person in the field to arrange the spectrum analysis unit according to the specific requirements.

Preferably, the programmed spectrum analysis unit operates in real time or near real time. Preferably, steps b) to d) are completed in 1-60 seconds, more preferably 1-10 seconds, even more preferably 1 second, for each MC.

Preferably, the step of determining the composition of the at least one MC comprises the assessment of the presence and the quantification on the at least one MC of at least one calcified component chosen from the hydroxyapatite (HA) form of calcium phosphate, the whitlockite (WIT) form of calcium phosphate, amorphous calcium carbonate (aCaCa), crystalline calcium carbonate (calcite) and calcium oxalate (CaO).

Calcium phosphate, in the HA form, is a known component of Type II MC. WIT is a rare form of calcium phosphate, seldom reported in breast cancer tissue and described in only one Raman study for descriptive purposes (Kunitake, J.A.M.R., Choi, S., Nguyen, K.X., Lee, M.M., He, F., Sudilovsky, D., Morris, P.G., Jochelson, M.S., Hudis, C.A., Muller, D.A., et al. (2018). Correlative imaging reveals physiochemical heterogeneity of microcalcifications in human breast carcinomas. Journal of Structural Biology 202, 25-34).

The current histological description of MCs classifies them as Type I and Type II MC (Frappart, L., Remy, I., Lin, H.C., Bremond, A., Raudrant, D., Grousson, B., and Vauzelle, J.L. (1987). Different types of microcalcifications observed in breast pathology. Virchows Archiv A Pathol Anat 410, 179-187; Morgan, M.P., Cooke, M.M., and McCarthy, G.M. (2005). Microcalcifications Associated with Breast Cancer: An Epiphenomenon or Biologically Significant Feature of Selected Tumors? J Mammary Gland Biol Neoplasia 10, 181-187). Type I MCs appear amber in colour, are partially transparent and are composed of calcium oxalate (CaO) when observed on histological specimens. They have been mostly associated with benign lesions and are rarely observed. Type II MCs are grey-white, opaque and are composed of calcium phosphate, mainly hydroxyapatite (HA), when observed on histological samples. They are the most common type of breast MCs and are found in both benign and malignant lesions. At present, benign and malignant Type II MCs cannot be distinguished by the approaches currently used in histology. CaO is a known component of Type I MCs, as described above.

Both the amorphous (aCaCa) and the crystalline form (calcite) of calcium carbonate crystals, identified as single and isolated species (calcium carbonate is also present as an intercalating component of calcium phosphate), have been described in detail in biological samples (Politi, Yael, et al. "Sea urchin spine calcite forms via a transient amorphous calcium carbonate phase." Science 306.5699 (2004): 1161-1164). Calcite as an isolated component was previously reported in breast samples, but only using X-ray spectroscopy (Fandos-Morera, A., Prats-Esteve, M., Tura-Soteras, J. M., & Traveria-Cros, A. (1988). Breast tumors: composition of microcalcifications. Radiology, 169(2), 325-327).

aCaCa was never reported as single component in breast MCs by Raman spectroscopy.

In the step of determining the composition of the at least one MC, any number and choice of calcified components may be used.

Preferably, the qualitative assessment is carried out on all five calcified components, namely hydroxyapatite (HA), whitlockite (WIT), amorphous calcium carbonate (aCaCa), crystalline calcium carbonate (calcite) and calcium oxalate (CaO).

According to another embodiment of the invention, the assessment is carried out on two calcified components, namely hydroxyapatite (HA) and whitlockite (WIT), the two forms of calcium phosphate, which is in turn the main component of Type II MCs, the most common type of MC observed in the breast.

In particular, the peak positions (Raman shifts) corresponding to each component are as follows: 955-965 cm⁻¹ for hydroxyapatite (HA), 965-973 cm⁻¹ for whitlockite (WIT), 1070-1083 cm⁻¹ for amorphous calcium carbonate (aCaCa), 1083-1090 cm⁻¹ for crystalline calcium carbonate (calcite) and 1450-1490 cm⁻¹ for calcium oxalate (CaO).
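As an illustration only, the ranges listed above can be encoded as a simple lookup that assigns a detected peak position to a component. The names `PEAK_RANGES` and `component_for_peak` are hypothetical, positions are expressed in the same units as the ranges above, and the treatment of the shared boundaries (965 and 1083) as belonging to the upper range is an assumption of this sketch, since the text does not specify it.

```python
# Peak ranges as listed in the text; each range is treated as half-open
# [low, high) so that a boundary value maps to exactly one component.
PEAK_RANGES = [
    ("HA", 955, 965),
    ("WIT", 965, 973),
    ("aCaCa", 1070, 1083),
    ("calcite", 1083, 1090),
    ("CaO", 1450, 1490),
]

def component_for_peak(position):
    """Return the calcified component whose range contains the peak
    position, or None if the peak falls outside every listed range."""
    for name, low, high in PEAK_RANGES:
        if low <= position < high:
            return name
    return None
```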

It should be stressed that this step includes the automatic extraction of the pixels associated only with the MC (i.e. the calcified components), thus excluding spectral information derived from other components commonly present in the mapped area containing the MC of interest and showing typical Raman features (i.e. non-calcified tissue, necrotic tissue (associated with a high fluorescence signal) and the optical support, usually a stainless steel slide).

Advantageously, the determination of the composition of the at least one MC comprises the determination of the overall spectral features of each at least one MC.

Reference spectra of selected typical MC calcified components can include one or more chosen from hydroxyapatite (HA), whitlockite (WIT), amorphous calcium carbonate (aCaCa), crystalline calcium carbonate (calcite) and calcium oxalate (CaO). The reference spectra also include the spectrum for non-calcified tissue, necrotic tissue (which produces a high fluorescence signal), and the optical support (usually stainless steel).

The term "non-calcified tissue" is used herein to refer to the tissue surrounding a MC that does not exhibit the features of a necrotic tissue nor the features of calcification.

Accordingly, the method of the present invention includes the step of providing, in the programmed data analysis unit, the above reference spectra prior to the step of recognizing Raman features.

Preferably, the reference spectrum of each typical MC calcified component is derived from the average of 15 spectra. This makes it possible to obtain a representative reference spectrum that takes into account any variations.

Preferably, the step of determining the composition of the at least one MC comprises producing a spectral classification of each map point (pixel) and a spatial classification map and fusing them together by a majority voting system, thus producing a Raman image. The Raman image can then be used for qualitative analysis purposes (e.g. to identify the presence and distribution of calcified components) or for diagnostic purposes (e.g. by multivariate analysis of Raman signals from the entire MC), as better specified in this invention.

In particular, the spectral classification contains a classification based only on spectral information. The production of the spectral classification comprises the evaluation, for each MC spectrum, of one or more indices. Preferably, the indices are chosen from: monotonicity of the spectra; correlation with the reference spectrum; and peak positions. Preferably, the chosen indices are monotonicity of the spectra, correlation with the reference spectrum, and peak positions. Preferably, the indices are fused using the majority vote system.
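A possible arrangement of the correlation index and of the majority vote is sketched below, purely by way of example. The text does not specify how each index is converted into a label or how ties are handled, so the helper names, the use of the Pearson correlation and the "unassigned" fallback are assumptions of this sketch; the labels proposed by the monotonicity and peak position indices are taken as given inputs.

```python
from collections import Counter
import numpy as np

def best_reference(spectrum, references):
    """Correlation index: return the label of the reference spectrum with
    the highest Pearson correlation with the measured spectrum."""
    scores = {name: np.corrcoef(spectrum, ref)[0, 1]
              for name, ref in references.items()}
    return max(scores, key=scores.get)

def majority_vote(votes, fallback="unassigned"):
    """Fuse the labels proposed by the individual indices; a tie for the
    first place (or an empty list) yields the fallback label."""
    if not votes:
        return fallback
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]
```

For a given pixel, the label returned by `best_reference` would be passed to `majority_vote` together with the labels proposed by the other two indices.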

The production of the spatial classification map comprises adopting the K-means clustering algorithm (which is suited to detecting the borders) based on a random initialisation of the groups. Preferably, the clustering is carried out 10 times with different initialisations to produce reproducible results. Preferably, the results of the ten clustering runs are then fused together, thus obtaining the spatial classification map.

After fusing the spectral classification of each map point and the spatial classification map, for each K-means cluster, the corresponding area of the map can be evaluated and the groups can be labelled on the basis of a majority vote system, thus giving a Raman map (Raman image) reporting, for each map point, the composition derived from the Raman information, and the final Raman image for each MC.
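The labelling of each K-means cluster by majority vote can be sketched as follows. This is an illustrative assumption of how the fusion might be coded: the function name is hypothetical, the cluster identifiers and per-pixel spectral labels are taken as already computed, and ties are resolved in favour of the label encountered first.

```python
from collections import Counter

def label_clusters(cluster_ids, spectral_labels):
    """Assign to every pixel the most frequent spectral label found in the
    spatial cluster that the pixel belongs to."""
    labels_by_cluster = {}
    for cluster, label in zip(cluster_ids, spectral_labels):
        labels_by_cluster.setdefault(cluster, []).append(label)
    # One winning label per cluster.
    winner = {cluster: Counter(labels).most_common(1)[0][0]
              for cluster, labels in labels_by_cluster.items()}
    return [winner[cluster] for cluster in cluster_ids]
```

In this arrangement an isolated pixel whose spectral label disagrees with its neighbours is overruled by the cluster it falls in.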

Thus, for each Raman image, a detailed description of the MC composition can be provided, reporting: 1) the number of pixels assigned to a specific component and 2) the average spectrum of all pixels assigned to a specific component.
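The two quantities reported for each Raman image can be computed, for instance, as in the following sketch. The function name and the dictionary layout are assumptions introduced for illustration; `spectra` is taken to be an array with one row per map point and `labels` the component assigned to each map point.

```python
import numpy as np

def describe_composition(spectra, labels):
    """For each component, report the number of pixels assigned to it and
    the average spectrum of those pixels."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    report = {}
    for component in np.unique(labels):
        mask = labels == component
        report[str(component)] = {
            "pixels": int(mask.sum()),
            "mean_spectrum": spectra[mask].mean(axis=0),
        }
    return report
```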

Preferably, in the step of producing an average Raman spectrum for the at least one MC, the one or more Type II signals are the signals for HA and WIT.

As mentioned above, Type II MCs are grey-white, opaque and are composed of calcium phosphate, mainly hydroxyapatite (HA) and whitlockite (WIT). Type II MCs represent the most common subtype of MC; they are found in both benign and malignant lesions, and a morphological examination does not indicate malignancy (Frappart et al. (1987); Morgan et al. (2005)).

More preferably, the average Raman spectrum for the at least one MC is produced by averaging the signals of HA and WIT. It has in fact been observed that the observation of these two components allows a reliable distinction between benign and malignant MCs. However, it is within the abilities of the skilled person in the field to adopt more than two components or use only one component, such as for example only HA.
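One reading of the averaging step is sketched below: all map points labelled as one of the selected Type II components are pooled and averaged into a single spectrum per MC. Pooling the pixels (rather than averaging the per-component means) and the name `mc_average_spectrum` are assumptions of this example.

```python
import numpy as np

def mc_average_spectrum(spectra, labels, components=("HA", "WIT")):
    """Average the spectra of all pixels assigned to the selected
    components; return None if the MC contains none of them."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    mask = np.isin(labels, list(components))
    if not mask.any():
        return None
    return spectra[mask].mean(axis=0)
```

Passing `components=("HA",)` corresponds to the single-component variant mentioned above.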

The Raman average spectra of the MCs can be considered the most informative and comprehensive data about MCs, and the most innovative aspect of the present invention.

Preferably, in the step of carrying out a multivariate analysis, it is within the abilities of the skilled person to choose the inorganic components to be averaged, thus obtaining the average Raman spectrum of each MC. In particular, Type II MC components HA and WIT have been shown to give good results.

Preferably, the plurality of average Raman spectra averaging the same said at least one Type II MC component signal from the calcified components of respective MCs have been obtained from respective microcalcifications of breast tissue samples containing MCs, which have undergone steps a) to c) above.

Preferably, in the step of carrying out a multivariate analysis, it is within the abilities of the skilled person to select the number of MCs to be studied, corresponding to the number of average Raman spectra. The accuracy of the method improves with the number of average Raman spectra.

Preferably, the number of the average Raman spectra is greater than 25, more preferably greater than 50, even more preferably greater than 100, even more preferably greater than 200, most preferably greater than 400.

With the expression "multivariate analysis", what is meant here is the statistical tool based on the principle of multivariate statistics, which involves the observation and analysis of more than one statistical outcome variable at a time (T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, New York, 1958).

It is within the abilities of the skilled person to select the protocol for performing the multivariate analysis.

Preferably, the multivariate analysis is carried out by performing a principal component analysis (PCA) thus obtaining a plurality of principal components (PCs) .

Preferably, following the PCA analysis, a linear discriminant analysis (LDA) classification model is built using a plurality of the PCs. The choice of the number of PCs used for the LDA is within the capabilities of the skilled person. Preferably, the number of PCs to be used is selected in order to represent about 90% of dataset variability and in order to exclude PCs associated to noise and/or artefacts.

Preferably, following the LDA, the leave-one-out cross validation is carried out to validate the classification model thus obtained.
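The PCA, LDA and leave-one-out steps can be illustrated with the following NumPy sketch for the two-class case (0 = benign, 1 = malignant). It is not the implementation used in the Examples: the function names are hypothetical, the decision boundary is placed midway between the projected class means (equal priors assumed), and the PCA is refitted inside every leave-one-out fold, which is a choice of this sketch to keep the held-out spectrum out of the model.

```python
import numpy as np

def pca_fit(X, variance=0.90):
    """Return the mean and the leading principal components that together
    explain at least the requested fraction of the variance."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, variance)) + 1
    return mean, vt[:k]

def lda_fit(scores, y):
    """Two-class Fisher discriminant on the PC scores: returns the weight
    vector and the midpoint threshold on the projected axis."""
    s0, s1 = scores[y == 0], scores[y == 1]
    m0, m1 = s0.mean(axis=0), s1.mean(axis=0)
    within = (s0 - m0).T @ (s0 - m0) + (s1 - m1).T @ (s1 - m1)
    w = np.linalg.pinv(within) @ (m1 - m0)
    return w, float(w @ (m0 + m1) / 2.0)

def loocv_predictions(X, y, variance=0.90):
    """Leave-one-out cross validation of the PCA-LDA model."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    predictions = np.empty(len(y), dtype=int)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        mean, components = pca_fit(X[train], variance)
        w, midpoint = lda_fit((X[train] - mean) @ components.T, y[train])
        # Discriminant score of the held-out spectrum.
        score = ((X[i] - mean) @ components.T) @ w
        predictions[i] = int(score > midpoint)
    return predictions
```

Here each row of `X` would be the average Raman spectrum of one MC, and the projected score plays the role of the single discriminant variable used for classification.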

Preferably, the step of carrying out a multivariate analysis produces the output of canonical variable 1 score for use as unique classification variable.

Preferably, a receiver operating characteristic (ROC) curve with relative area under the curve (AUC) is automatically calculated, for example using OriginPro2019 (Originlab Corporation, Wellesley Hills, MA, USA), using as input the canonical variable 1 emerging from the PCA-LDA classification.

Preferably, an optimal threshold (cut-off point) is obtained from the ROC curve.

By the expression "threshold" (or cut-off point) what is meant here is the optimal cut-point providing the best separation of the test results (positive or negative; malignant or benign; diseased or not). Preferably, the optimal cut-point can be calculated with the closest-to-(0,1)-corner criterion in the ROC plane, which defines the optimal cut-point as the point minimizing the Euclidean distance between the ROC curve and the (0,1) point (N. J. Perkins and E. F. Schisterman, "The inconsistency of "optimal" cut-points using two ROC based criteria," American Journal of Epidemiology, vol. 163, no. 7, pp. 670-675, 2006).
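The closest-to-(0,1) criterion amounts to minimising d = sqrt((1 - sensitivity)^2 + (1 - specificity)^2) over the candidate cut-points. A minimal sketch follows, under the assumptions that label 1 denotes malignant, that a score equal to the cut-point counts as malignant, that the candidate cut-points are the observed scores, and that both classes are present; the function name is hypothetical.

```python
import math

def closest_to_corner_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) for the cut-point whose
    ROC point lies closest to the (0, 1) corner."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        sensitivity = tp / positives
        false_positive_rate = fp / negatives
        # Euclidean distance from the ROC point to the (0, 1) corner.
        distance = math.hypot(false_positive_rate, 1.0 - sensitivity)
        if best is None or distance < best[0]:
            best = (distance, t, sensitivity, 1.0 - false_positive_rate)
    return best[1], best[2], best[3]
```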

The optimal threshold can be used to produce confusion matrices and the related diagnostic performances, making it possible to classify the at least one MC as benign or malignant depending on its canonical variable 1 in relation to the threshold value.
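For illustration, the confusion matrix and the usual diagnostic figures can be derived from the scores and a threshold as sketched below. The function name is hypothetical, label 1 is taken to denote malignant, a score equal to the threshold is treated as malignant (the text does not specify this case), and no guard against empty classes is included.

```python
def diagnostic_performance(scores, labels, threshold):
    """Build the confusion matrix for a given threshold and derive
    sensitivity, specificity, negative predictive value and accuracy."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "confusion": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```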

The method of the present invention also makes it possible to classify the at least one MC as belonging to different lesion subtypes according to the B-categories defined by the UK National Health Service Breast Screening Programme (NHSBSP) (Ellis IO, Humphreys S, Michell M, Pinder SE, Wells CA, Zakhour HD, et al. Best Practice No 179. Guidelines for breast needle core biopsy handling and reporting in breast screening assessment. J Clin Pathol. 2004;57:897-902), depending on its canonical variable 1 in relation to the threshold value.

It is within the capabilities of the skilled person to arrange the means of output of the information classifying the MC as malignant or benign, and as belonging to the above-mentioned B-categories. The classification may in fact be associated with a suitable indicator such as for example a number on a screen, a sound, a light, to name a few.

In the present method, any canonical variable 1 above the threshold value is associated with a diagnosis of malignant MC, whereas any canonical variable 1 below the threshold value is associated with a diagnosis of benign MC.

By the expression "canonical variable 1" what is meant here is the linear combination of the covariates (PCs in this case) that maximizes the multiple correlation between the categories (benign and malignant in this case), providing the maximum separation among the groups.

In a preferred embodiment, in the step of collecting Raman spectroscopy imaging data of a breast tissue sample containing at least one MC, the at least one MC is found in the cancerous breast tissue. In other words, the at least one MC is found inside a carcinoma region of the breast tissue.

In an alternative preferred embodiment, in the step of collecting Raman spectroscopy imaging data of a breast tissue sample containing at least one MC, the at least one MC is found in the non-cancerous breast tissue. In other words, the at least one MC is outside a carcinoma region.

An MC outside a carcinoma region is considered to be an MC detected in the tissue surrounding B5a or B5b lesions, that could be categorized with a lower B-category (according to the B-categories defined by the UK National Health Service Breast Screening Programme (NHSBSP), Ellis, 2004). This corresponds in particular to non-cancerous tissue in both in situ carcinoma (B5a) and invasive carcinoma (B5b) tissue samples.

Advantageously, in fact, the method of the present invention is capable of recognising as malignant the biochemical composition and structural features of MCs that are found in a relatively extended tissue region around the malignancy, even if not directly surrounded or in close contact with cancer cells.

This is particularly useful for the early diagnosis of a malignant cancer. In particular, it is advantageous that the method of the invention is capable of detecting a malignant cancer even if the breast tissue under examination is not the tissue closest to the cancer cells. In addition, this can be useful when using optical probes or objectives with a lower spatial resolution than the configuration proposed here.

The processing of the Raman maps is automatically performed on all the selected MCs detected on tissue samples.

Advantageously, the classification model is validated with each further input of average Raman spectra. The method of the present invention shows an accuracy greater than 87%, a sensitivity greater than 93% and a specificity greater than 80%.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a schematic representation of a part of an embodiment of the method of the invention. It shows an example of the Raman mapping and analysis of a single MC.

Figure 2 shows an example of Raman mapping images of a MC from a malignant lesion (DCIS).

Figure 3 shows an example of Raman mapping images of a MC from a benign lesion (fibroadenoma).

Figure 4 shows the average Raman spectra of all MCs from benign and malignant lesions (4a), where a vertical shift has been introduced in the graph to distinguish the two spectra; and their difference (4b).

Figure 5 shows first 10 PCs (5a) from the statistical analysis of Example 1; and a scatter plot of the scores associated to PC3 and PC8 (5b) in Example 1.

Figure 6 shows the confusion matrix produced in Example 1 (6a); and group separation after LDA (* p<0.001) (6b) in Example 1.

Figure 7 shows part of the experimental workflow and components of Example 2: an overview of the experimental workflow (left) and the components detected in each Raman imaging map and considered in this study (right). The Raman map reported as an example was obtained from the analysis of a B5a sample and reports only the presence of HA (green), surrounded by tissue (grey).

Figure 8 shows the detailed composition of representative MC (n=315) in Example 2. A, Number of MCs exhibiting at least one pixel of the components described by the legend, for each diagnostic category. B, Overall composition of MCs, calculated considering altogether all MC belonging to each diagnostic category. Components: calcium phosphate in the hydroxyapatite form (HA); calcium phosphate in the whitlockite form (WIT); amorphous calcium carbonate (aCaCa); crystalline calcium carbonate (calcite); calcium oxalate (CaO); MC not showing specific Raman spectra associated to mineralized components (none).

Figure 9 shows the vibrational features of representative Type II MC (n=264) in Example 2. Average phosphate band (9A(A)), average carbonate band (9A(B)) and average protein band (9A(C)) of each of the diagnostic categories, including standard deviation (shaded area). In 9A(A), the intensity of spectra is shifted for clarity. 9A(D), bar-plot reporting the broadening (FWHM) of the phosphate band. 9A(E), bar-plot reporting the intensity (area) of the carbonate band, between 1070-1090 cm⁻¹. 9A(F), bar-plot reporting the intensity (area) of the protein peak, between 1425-1475 cm⁻¹. Data are shown as box and whiskers. Each data point represents a single MC. Each box represents the 25th to 75th percentiles (interquartile range [IQR]). The dot inside each box is the mean; the line inside each box represents the median. The whiskers represent the lowest and highest values within the boxes ±1.5x the IQR. All p values are reported in Table 9B. "n.s." means that no significant difference was observed (i.e. p > 0.05, two-tailed).

Figure 10 shows for Example 2 A) overlapped mean spectra of MC detected in different diagnostic categories. Three specific regions of interest (phosphate band, carbonate band and protein band) are specified and correspond to the same spectral regions investigated in Fig. 9; B) Comparison (difference spectra) between MC detected in different diagnostic categories; C) mean spectra of MC detected in different diagnostic categories plotted separately; D) Comparison (difference spectra) between MC detected in pure benign samples (B1+B2) versus MC detected in pure malignant samples (B5a+B5b) .

Figure 11 shows the loading of the first sixteen principal components obtained by PCA in Example 2. The first fourteen were used to build the LDA-based model.

Figure 12 shows the results from the LDA classification model in Example 2. A, box plot showing the canonical variable score emerging from the LDA-based classification of MC from pure benign (B1, B2) and pure malignant (B5a, B5b) samples (n=211). B, box plot reporting the canonical variable score of MC found in B3 samples and used as test dataset (n=53). C, box plot reporting the canonical variable score of MC found outside carcinoma lesions in B5a (n=28) and B5b samples (n=88), used as test dataset. Colors of dots in C indicate the locally defined category of the tissue surrounding each MC. Data are shown as box and whiskers. Each data point represents a single MC. Each box represents the 25th to 75th percentiles (interquartile range [IQR]). The dot inside each box is the mean; the line inside each box represents the median. The whiskers represent the lowest and highest values within the boxes ±1.5x the IQR. The dashed lines refer to the optimal cut-off value (-0.445) defined by the ROC.

Figure 13 shows a scatter plot of the association between age and canonical variable 1 in Example 2, calculated by averaging the canonical variable value of all MC found in each of the 380 representative samples.

Figure 14 shows an ROC curve obtained from the post probability assignment produced by the LDA-based classification model in Example 2 and considering only pure benign (B1, B2) and pure malignant MC (B5a, B5b) (n=211).

Figure 15 shows a scatter plot of the association between BI-RADS and canonical variable 1, calculated by averaging the canonical variable value of all MC found in each of the 380 representative samples of Example 2.

DETAILED DESCRIPTION OF THE INVENTION

A new label-free and practical Raman imaging approach has been developed that is able to extract and analyse all the vibrational information contained in each MC, thus extracting more informative data and, as a consequence, providing a more accurate characterization and classification of breast lesions.

The invention concerns a method for breast cancer diagnosis by Raman-based mapping of mammary MCs, in particular, by collecting Raman mapping datasets, produced by scanning each MC entirely. Raman mapping datasets have been processed to extract only spectra related to calcified components and then analyzed by multivariate analysis. In the Examples, the variables extracted by principal component analysis have then been analyzed by Linear Discriminant Analysis (LDA), in order to discriminate MCs between benign and malignant lesions and to classify them into different histopathological subtypes.

The rapid and accurate characterization of breast MCs is currently an unmet clinical need. MCs are considered as suspicious signs of a breast lesion, and histopathology is mandatory to assess the nature of such lesions. Up to now, MCs have been traditionally seen by mammography as simple bystanders of cancer and only characterized by descriptive criteria (i.e. morphology and spatial distribution), according to the current guidelines (BI-RADS) (D'Orsi C, Sickles E, Mendelson E, Morris E. ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System. Reston, VA, American College of Radiology; 2013).

In the present invention, breast MCs from different diagnostic categories have been investigated by studying the entire area of each MC using high resolution Raman imaging. This allows for the first time 1) to describe all inorganic components contained in MC and 2) to evaluate whether a specific (and complete) MC composition correlates with the pathological state. Previously reported RS characterizations of MC have been made by performing single acquisitions at selected sites inside the calcified lesion (Haka et al. 2002) or single spectra on fresh biopsies (Sathyavathi et al. 2015), thus obtaining partial information. Only recently has the spatial composition of MCs been investigated by Raman imaging, but on a single DCIS specimen, with explorative aims (Kunitake et al. 2018).

The first evidence emerging from this study is that MCs from pure benign lesions (i.e. B1 and B2) are largely heterogeneous if compared both with lesions of uncertain malignant potential (B3) and with carcinoma lesions (B5a, B5b). The identification of the lesions from B1 to B5 is in accordance with the B-categories defined by the UK National Health Service Breast Screening Programme (NHSBSP) (Ellis IO, Humphreys S, Michell M, Pinder SE, Wells CA, Zakhour HD, et al. Best Practice No 179. Guidelines for breast needle core biopsy handling and reporting in breast screening assessment. J Clin Pathol. 2004;57:897-902). First, in benign samples both Type I (calcium oxalate) and Type II (hydroxyapatite (HA)) MCs were observed, confirming what was previously reported, and also confirming that Type I MCs are uncommon findings (Haka et al. 2002). Second, in most of the Type II MCs identified in benign samples, hydroxyapatite is not the only component: whitlockite and amorphous calcium carbonate (aCaCa) were also observed (in 55% and 24% of benign MCs, respectively). Whitlockite is a crystal phase of calcium phosphate, sporadically detected in human tissues, in which magnesium partially substitutes calcium if compared with hydroxyapatite (Lagier R, Baud C-A. Magnesium Whitlockite, a Calcium Phosphate Crystal of Special Interest in Pathology. Pathology - Research and Practice. 2003;199:329-35). As reported in the study of Example 2, investigating 52 samples (315 representative MC), whitlockite overall represents 4.5% of benign calcifications (B1, B2), 0.32% of B3 samples, and 0.22% of cancerous calcifications (B5a, B5b). These results were further confirmed when the mean spectrum of each MC was considered. Recently, whitlockite was identified by X-ray diffraction, with more whitlockite reported in malignant (1.18%) than in benign (0.46%) samples (Scott R, Stone N, Kendall C, Geraki K, Rogers K. Relationships between pathology and crystal structure in breast calcifications: an in situ X-ray diffraction study in histological sections. npj Breast Cancer. 2016;2:16029). The different results could be explained by the inclusion of a different number of samples (57 MC from 15 biopsies) and by the lower spatial resolution of X-ray approaches.

In parallel, the finding of both amorphous (aCaCa) and crystalline calcium carbonate (calcite) as isolated components in MCs is new evidence. aCaCa was mostly found in benign samples, always localized with whitlockite, and this could be due to the fact that magnesium, contained in whitlockite, is known to stabilize aCaCa (Loste E, Wilson RM, Seshadri R, Meldrum FC. The role of magnesium in stabilising amorphous calcium carbonate and controlling calcite morphologies. Journal of Crystal Growth. 2003;254:206-18). Calcite is the stable and crystalline form of CaCa that can naturally form if not stabilized by active processes (Beniash E, Aizenberg J, Addadi L, Weiner S. Amorphous calcium carbonate transforms into calcite during sea urchin larval spicule growth. Proceedings of the Royal Society of London Series B: Biological Sciences. 1997;264:461-5) and has been found in both benign and malignant samples only as single very small crystals of pure calcite. The amount of data is not sufficient to explain an association with pathology.

When the overall Raman signals originating from single Type II MCs were extracted to verify their correlation with diagnosis, the first evident feature was the right-shift and the broadening of the 960 cm⁻¹ phosphate band of calcium phosphate in B1 and B2 samples. The broadening of the phosphate band is normally explained by the increase of carbonate content in hydroxyapatite, resulting in the alteration of the symmetry of the crystal structure (Mul FFM de, Otto C, Greve J, Arends J, Bosch JJT. Calculation of the Raman line broadening on carbonation in synthetic hydroxyapatite. Journal of Raman Spectroscopy. 1988;19:13-21). The shift of the phosphate band is due to the change of the crystal composition, as in the case of whitlockite, where it is due to the presence of magnesium (Jolliff BL, Hughes JM, Freeman JJ, Zeigler RA. Crystal chemistry of lunar merrillite and comparison to other meteoritic and planetary suites of whitlockite and merrillite. American Mineralogist. 2006;91:1583-95). In our data, both contributions have been seen, and this is explained by the co-presence of (carbonated) apatite and of whitlockite in benign MCs. Previous RS studies only mentioned peak broadening, and this could be due to single-point acquisition, which is not able to recapitulate the whole MC composition (Haka et al. 2002; Sathyavathi et al., 2015). Notably, when the phosphate features are compared among diagnostic categories, B3, B5a and B5b samples cluster together with only minor differences (p = 0.032). In parallel, the carbonate band also changes among diagnostic categories, decreasing in its overall intensity passing from benign to malignant samples. Also in this case, there is a contribution derived from the presence of whitlockite in benign samples, producing a band broadening, and this was not revealed in previous studies describing carbonate as an indicator of benignity (Haka et al. 2002; Sathyavathi et al., 2015). As a final point, proteins have been described as MC components, but their contribution did not show any significant correlation with pathology.

A general overview of the results coming from the detailed characterization of breast MCs suggests that malignant microcalcifications are more homogeneous, more crystalline and less substituted if compared with benign ones. Studies performed on tissue samples and in vitro on cultured cells demonstrated that MC formation is a cell-active process influenced by the microenvironment and by the overexpression of bone matrix proteins (i.e. osteonectin (OSN) and osteopontin (OPN)) (Cox RF, Hernandez-Santana A, Ramdass S, McMahon G, Harmey JH, Morgan MP. Microcalcifications in breast cancer: novel insights into the molecular mechanism and functional consequence of mammary mineralisation. British Journal of Cancer. 2012;106:525-37; Bellahcene A, Castronovo V. Increased expression of osteonectin and osteopontin, two bone matrix proteins, in human breast cancer. Am J Pathol. 1995;146:95-100). In particular, both these studies reported that active processes of MC formation are significantly more represented in case of malignancy. In parallel, a retrospective study on patients referred for needle-guided biopsy reported that the increase in size and the formation of new MCs significantly correlate with a high probability of ductal invasive carcinoma (Lev-Toaff AS, Feig SA, Saitas VL, Finkel GC, Schwartz GF. Stability of malignant breast microcalcifications. Radiology [Internet]. 1994 [cited 2019 Oct 3]; Available from: https://pubs.rsna.org/doi/abs/10.1148/radiology.192.1.8208928). These data suggest that malignant lesions are more active in the formation of MCs. Starting from this assumption, the crystallinity and homogeneity of malignant MCs could originate from a faster and active process stimulated by cancer and the cancer microenvironment. On the contrary, the heterogeneity and low crystallinity of benign MC could be explained by a slower and less regulated mechanism of mineralization, thus permitting both loss of crystal stability and/or the intercalation of external components (i.e. carbonate, magnesium) from the surrounding tissue.

In order to verify the diagnostic potential and the transferability of Raman-based MC characterization to clinics, a multivariate approach was applied, including the use of an LDA-based model to automatically verify the diagnostic performance of the proposed approach. The model was built using pure benign (B1, B2) and pure malignant (B5a, B5b) categories, including only MCs detected inside a carcinoma region in the case of malignant MCs. The results were promising, showing 93.5% sensitivity and 80.6% specificity with a negative predictive value (NPV) of 92.2% after cross-validation. The same classification model was used to investigate the grade of malignancy of B3, according to the biochemical composition, revealing that 44 (83%) of the 53 B3 MCs detected showed malignant features, mostly in the flat epithelial atypia (FEA) subtype. This is in agreement with the fact that FEA represents a direct pre-cancerous lesion possibly leading to ductal carcinoma in situ (DCIS) in up to 18.6% of cases (Rudin AV, Hoskin TL, Fahy A, Farrell AM, Nassar A, Ghosh K, et al. Flat Epithelial Atypia on Core Biopsy and Upgrade to Cancer: a Systematic Review and Meta-Analysis. Ann Surg Oncol. 2017;24:3549-58).

In addition, it was investigated whether MCs detected in malignant bioptic samples, but outside the specific cancer lesion, exhibit the features of the local "benign" (or B3 or DCIS) surrounding tissue, or the malignant features of the nearby carcinoma. As described above, 64% of locally benign (or B3) MCs found around DCIS samples showed malignant features. In invasive carcinoma samples, 86 (98%) of 88 MCs found outside the invasive region (including 34 (94%) of 36 locally classified as B1) were classified as malignant. These data show that the biochemical composition and structural features of MCs are influenced by the tumour even if the MCs are not directly surrounded by or in close contact with cancer cells. As a consequence, we can assume that the tumour environment, and probably the metabolism of breast tissue in the presence of malignancy (especially invasive carcinoma), influence a relatively extended tissue region around the malignancy (Vidavsky N, Kunitake JAMR, Chiou AE, Northrup PA, Porri TJ, Ling L, et al. Studying biomineralization pathways in a 3D culture model of breast cancer microcalcifications. Biomaterials. 2018;179:71-82).

In particular, these data suggest that the investigation of a relatively large region (500-2000 µm) of suspected tissue containing MCs could correctly inform about the malignancy even if some locally benign MC regions surrounding the lesion are inevitably probed due to the low resolution of an in vivo configuration.

In conclusion, the study of the present invention reports new detailed information about MC composition and demonstrates that a Raman-based approach could provide a direct and reliable description of breast lesions through the study of MCs, which are scarcely described in histology in current practice. Indeed, Raman spectroscopy allows benign lesions to be accurately discriminated from malignant ones and different malignant features to be reported when comparing uncertain, in situ and invasive lesions. Interestingly, Raman characterization of microcalcifications in the tissue surrounding malignant lesions also revealed specific features of malignancy, suggesting that not only the lesion but also its context is influenced by the tumour.

Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a medical Raman spectrometer may implement methods as described herein by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes and hard disk drives, optical data storage media including CD ROMs and DVDs, electronic data storage media including ROMs and flash RAM, or the like, or transmission-type media such as digital or analog communication links. The computer-readable signals on the program product may optionally be compressed or encrypted.

Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a "means") should be interpreted as including as equivalents of that component, any component which performs the function of the described component (i.e., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which perform the function in the illustrated exemplary embodiments of the invention.

The present invention will now be further illustrated by means of the following examples, which are illustrative only and are not intended to limit in any sense the scope of the invention.

EXAMPLE 1

Sample preparation

Breast biopsies from 29 patients affected with different benign and malignant tumours were used for this study (see Fig. 1 for the schematic representation of the concept underlying the study). Formalin fixed paraffin embedded (FFPE) tissue blocks, previously used to prepare stained slices for diagnostic purposes, were cut into 10 µm slices, mounted onto stainless steel slides and deparaffinized in order to avoid the contribution from paraffin Raman signals in the measurements. The deparaffination procedure included the immersion of the tissue slices in hexane 100% for 5 min (2 times), in ethanol 100% for 5 min (1 time) and in ethanol 98% (1 time), followed by air drying.

Raman mapping measurement

A commercial instrument (InVia Reflex, Renishaw (UK)) was used to perform Raman mapping acquisitions. For this methodology, a 785 nm laser source (Renishaw (UK)) with round shape spot, powered with around 90 mW, was coupled with an N-Plan 100x (NA 0.75, WD 0.37) Leica objective and with a 1200 l/mm grating. The CCD detector (1024x256 pixels, Renishaw (UK)) used here is sensitive between 400 and 1060 nm and cooled to -70°C. Raman mapping measurements were performed using a motorized stage with a minimum step size of 0.1 µm.

Raman maps were acquired on all MCs previously identified on the contiguous haematoxylin-eosin (HE) slice by the pathologist, and/or visible on the deparaffinized tissue slice. The mapped area was defined by selecting the entire region around the MC and positioning the objective focus between the tissue surface and the MC surface, in order to collect the largest amount of signal. The laser power on the sample was set to around 20 mW. Each Raman mapping dataset was produced by scanning the entire MC with a step size between 4 and 10 µm, depending on the MC size, and setting an acquisition time of 3 s for each map point. The resulting Raman map sizes were between 200 and 15000 points, depending on the MC size. For biopsies containing more than 3 MCs, Raman maps were automatically acquired following a pre-programmed acquisition queue.

Data pre-processing and automatic extraction of Raman spectra from the calcified region

All data processing was performed with custom MATLAB scripts, with the exception of cosmic ray removal, which was performed using the commercial software WiRE (Renishaw (UK)).

After the Raman map acquisition, the following operations are applied to each spectrum:

1. Spectra normalization through the unit vector method.

2. Peak search algorithm in the ranges 955-965 cm⁻¹ for apatite, 965-973 cm⁻¹ for whitlockite and 1070-1087 cm⁻¹ for carbonate.

3. Evaluation of the monotonicity of each spectrum.

4. Spectra filtering through an eighth-order moving average.

5. Evaluation of the correlation between each spectrum and each reference spectrum. A reference spectrum is required for each class (i.e. for each component).

6. Spectra classification through peak positions, correlation with reference spectra and monotonicity (referred to in the following as spectral classification).

7. K-means agglomerative clustering on the whole Raman map, repeated several times in order to achieve reliable results.

8. The final classification is obtained by fusing the K-means spatial classification with the previous spectral classification through a majority voting system.
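As an illustration only, steps 1, 2 and 4 can be sketched in plain Python. This is not the MATLAB code used in the study: the function names are hypothetical, the windows of step 2 are assumed to be Raman shifts in cm⁻¹, the shared boundary at 965 is assigned to whitlockite by assumption, and "eighth order" is read as an eight-point window.

```python
import math

def unit_vector_normalize(spectrum):
    """Step 1: scale the spectrum so that its Euclidean norm equals 1."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm > 0 else list(spectrum)

def moving_average(spectrum, window):
    """Step 4: moving average; the window is truncated at both ends of the spectrum."""
    offset = window // 2
    out = []
    for i in range(len(spectrum)):
        lo = max(0, i - offset)
        hi = min(len(spectrum), i - offset + window)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out

# Step 2 windows, assumed to be Raman shifts in cm^-1 (lower bound inclusive).
PEAK_WINDOWS = {
    "apatite": (955, 965),
    "whitlockite": (965, 973),
    "carbonate": (1070, 1087),
}

def peak_component(shifts, spectrum, windows=PEAK_WINDOWS):
    """Return the component whose window contains the strongest peak, else None."""
    i_max = max(range(len(spectrum)), key=spectrum.__getitem__)
    for name, (lo, hi) in windows.items():
        if lo <= shifts[i_max] < hi:
            return name
    return None

# Synthetic spectrum: flat background plus one band centred at 960 cm^-1.
shifts = list(range(900, 1100))
raw = [1.0 + 10.0 * math.exp(-((s - 960) ** 2) / 50.0) for s in shifts]
normalized = unit_vector_normalize(raw)
smoothed = moving_average(normalized, window=8)
label = peak_component(shifts, normalized)
```

On this synthetic input the strongest peak falls in the apatite window. A real implementation would also need the monotonicity and correlation indices of steps 3 and 5, which are not sketched here.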

Further information about the distribution of calcified components within the MC (called here "MC compactness") was evaluated through the following operations:

1. Density-based clustering algorithm (DBSCAN). In this phase, the number of clusters and the number of noise pixels are obtained. The minimum cluster area is set to 15 pixels, while the search range is set to 2.5 pixels.

2. Compactness evaluation through the clustered area, the noise pixels and the micro-calcification area.

The MC compactness is evaluated independently for the Apatite, the Whitlockite and the Carbonate.
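The clustering step can be sketched with a small pure-Python DBSCAN. This is a sketch under stated assumptions rather than the implementation used in the study: the "minimum cluster area" of 15 pixels is mapped to the usual DBSCAN minimum-points parameter, the "search range" of 2.5 pixels to the neighbourhood radius, and the compactness formula itself is not reproduced because the text does not state it.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:   # only core points expand the cluster
                queue.extend(nb)
    return labels

# Hypothetical pixel coordinates of one component: a 5x5 block plus two stray pixels.
pixels = [(x, y) for x in range(5) for y in range(5)] + [(20, 20), (30, 5)]
labels = dbscan(pixels, eps=2.5, min_pts=15)
n_clusters = len({lab for lab in labels if lab >= 0})
n_noise = labels.count(-1)
```

Here the block forms a single cluster and the two stray pixels are reported as noise. These two counts, together with the MC area, are the inputs the text names for the compactness evaluation.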

Statistical analysis and classification

For each MC, a single Raman spectrum, calculated by averaging only the spectra containing calcification signals as described above, was used for further statistical analyses and classification. First, principal component analysis (PCA) was performed on the whole dataset to reduce the complexity of the Raman spectra to a lower number of variables, thus obtaining >100 principal components (PCs) representing 100% of dataset variability. The first 8-20 PCs, representing 85-92% of dataset variability, were further considered, thereby excluding PCs associated with noise and/or small artefacts. Linear Discriminant Analysis (LDA) was then performed in order to use the selected PCs for discrimination and classification purposes. In particular, LDA was used to discriminate MCs from benign/malignant lesions or from lesions classified with different histopathological subtypes.

Application examples and results

To demonstrate the efficacy of the proposed methodology, we report here some results obtained from the analysis of breast biopsy samples.

The method of the invention made it possible to perform a detailed Raman mapping of all the regions containing MCs in the selected biopsies and to automatically extract only the spectral information associated with calcified material. In Fig. 2 and Fig. 3 we report the automatic extraction of pixels associated with microcalcification (i.e. calcium phosphate (HA and whitlockite)), thus excluding spectral information derived from other components (i.e. surrounding non-calcified tissue, necrotic tissue and the stainless steel slide).

The Raman map processing was then automatically performed on all 146 MCs detected on tissue samples from 29 patients (14 affected by benign lesions, 15 affected by malignant lesions). In general, much information can be extracted as a result of this step; here are some examples:

1) Raman average spectra of the calcified part (HA, whitlockite and calcium carbonate) can be easily obtained for each MC and then used to perform statistical analyses and to build a classification model (see below) ;

2) the percentage of different inorganic components (HA, whitlockite and calcium carbonate) can be automatically obtained to establish the composition of each MC;

3) the spatial distribution of different components can be measured thus obtaining information about the MC compactness .

Among the mentioned information, the Raman average spectra of the calcified part can be considered the most informative and comprehensive data about MCs, and the most innovative aspect of the new proposed diagnostic methodology .

Using these data, we report in Fig. 4 the average Raman spectra of all MCs from benign and malignant lesions and the main spectral differences between these two groups.

Considering the small differences between the benign and malignant lesions, we used PCA to determine which are the main components associated with the variability in the dataset. In Fig. 5 we report the first 10 PCs, representing 87.4% of total dataset variance. These PCs show that the main spectral features representing the variability in the dataset are similar to those identified in the difference spectra in Fig. 4. The first 6 PCs show spectral variances around 960 cm⁻¹, which corresponds to the region of calcium phosphate. In detail, the main peaks in these PCs are associated with a shift of the 960 cm⁻¹ band, associated with HA, or with the presence of features around 970 cm⁻¹, associated with whitlockite.

Among all PCs, the coupling of PC3 and PC8 (representing 11% of total dataset variance) gives a discrete separation between benign and malignant lesions. Benign lesions are associated with positive values of PC3, related both to the broadening (940 cm⁻¹) of the HA band and to the presence of whitlockite (970 cm⁻¹). Malignant lesions are more associated with negative values of PC3, associated with higher protein features (1002 and 1440 cm⁻¹). In parallel, malignant lesions are more associated with negative values of PC8, associated with calcium carbonate (1073 cm⁻¹).

Considering that the evaluation of PCs is only qualitative and that only 2 (or 3) PCs can be visualized on a scatter plot, thus limiting the variance that can be considered, we have taken advantage of LDA to consider many PCs at once and to build a classification model. Figure 6 reports the results of the classification model, resulting in an overall accuracy of around 90% after leave-one-out cross-validation.

EXAMPLE 2

Here, we report what is currently the most extensive biochemical characterization of breast MCs, made possible by a Raman imaging approach. The aim of the present study is to clarify the association between MC composition and the pathological features of breast lesions and the surrounding tissue, validating some specific evidence by X-ray scattering approaches. Moreover, we verified the diagnostic accuracy of Raman Spectroscopy (RS) on breast MCs to specifically identify malignant breast lesions. Finally, features of MCs detected in malignant samples but outside carcinoma lesions were investigated to understand the influence of nearby cancerous tissue.

Materials and methods

Samples

Fifty-two patients affected by suspicious breast MC on screening mammography, undergoing a core biopsy and treated at the Breast Unit of Istituti Clinici Scientifici (ICS) Maugeri (Pavia, Italy) from 2018 to 2019, were included. Patients with mass-like lesions or previous breast surgery were excluded from the study. All patients signed an informed consent before inclusion in the study, which was authorized by the Ethical Committee of the Institution (protocol 2281 CE), which approved the study in compliance with the Declaration of Helsinki. In detail, according to the B-categories defined by the UK National Health Service Breast Screening Programme (NHSBSP) (Ellis IO, Humphreys S, Michell M, Pinder SE, Wells CA, Zakhour HD, et al. Best Practice No 179. Guidelines for breast needle core biopsy handling and reporting in breast screening assessment. J Clin Pathol. 2004;57:897-902), 6 patients reported normal tissue or minimal changes (B1), 9 patients reported benign lesions (B2), 8 patients reported lesions of uncertain malignancy (B3), 17 reported in-situ carcinoma (B5a) and 16 reported invasive carcinoma (B5b) (Table 1). Detailed characteristics of patients are reported in Table 2.

Table 1. Number of MCs detected from different subjects. Samples were classified according to the B-categories defined by the UK National Health Service Breast Screening Programme (NHSBSP) and according to histological classification. "MC outside the lesion" were MC detected in the tissue surrounding B5a or B5b lesions and categorized with a lower diagnostic malignancy. "Representative MC" were MC detected inside the malignant lesion (cancerous lesion) specified by the diagnostic category (B5a or B5b).

Table 2. Patients' characteristics. Classified according to the B-categories defined by the UK National Health Service Breast Screening Programme (NHSBSP). "Representative" MC are those identified inside the cancerous lesion in B5a and B5b samples, thus excluding MC locally surrounded by tissue categorised with a lower B-category.

Abbreviations: normal tissue (NOR); fibroadenoma (FAD); fibrocystic change (FIB); fat necrosis (FNE); usual ductal hyperplasia (UDH); papillary lesion (PAP); flat epithelial atypia (FEA); atypical ductal hyperplasia (ADH); ductal carcinoma in situ (DCIS); invasive ductal carcinoma (IDC); invasive lobular carcinoma (ILC); invasive mucinous carcinoma (IMC).

Tissue preparation

Tissue slices were generated from formalin fixed paraffin embedded (FFPE) tissue blocks. For each patient, a 10 µm slice was microtomed and mounted on mirrored stainless steel slides specific for Raman measurements (Renishaw plc, Wotton-under-Edge, UK). In parallel, for each tissue sample, a contiguous 6 µm tissue slice was stained using standard hematoxylin eosin stain for standard diagnostic evaluation. To correctly assess the concordance of histopathology with specific Raman features, MCs located inside in-situ or invasive carcinoma were primarily considered in B5a and B5b samples, respectively, and defined as "representative" MC. MC detected in cancerous tissue but not embedded in the cancerous lesions were analyzed separately. The 10 µm slices selected for Raman analyses were then deparaffinized using a simple protocol optimized starting from a previously reported method (Faolain EO, Hunter MB, Byrne JM, Kelehan P, Lambkin HA, Byrne HJ, et al. Raman Spectroscopic Evaluation of Efficacy of Current Paraffin Wax Section Dewaxing Agents. J Histochem Cytochem. 2005;53:121-9). Briefly, tissue slices mounted onto mirrored steel slides were dewaxed by two baths of hexane 95% (Merck KGaA, Darmstadt, Germany), two baths of ethanol absolute (Merck KGaA, Darmstadt, Germany) and a final bath of ethanol 95%, repeating these steps three times, followed by air drying for 2 h.

Raman spectroscopy

The experimental and data analysis workflow is illustrated in Figure 7. A commercial confocal Raman microscope (InVia Reflex, Renishaw plc, Wotton-under-Edge, UK) was used to perform Raman mapping acquisitions. A 785 nm laser source with round shape spot, powered with around 90 mW, was coupled with an N-Plan 100x (NA 0.75, WD 0.37) Leica objective and with a 1200 l/mm grating, centred around 1250 cm⁻¹, thus obtaining a spectral range between 700 and 1760 cm⁻¹. The final power was then filtered to reach around 20 mW on the sample. The detector is a CCD (1024x256 pixels) sensitive between 400 and 1060 nm and cooled to -70 °C. The instrument was aligned and intensity calibrated daily using automated procedures implemented in the instrument start-up process. The wavelength shift calibration was periodically performed using multiple standards (polystyrene, paracetamol, silica) and checked daily by an automatic procedure using the silica band at 520 cm⁻¹.

Raman mapping measurements of each MC identified by the pathologist on the contiguous haematoxylin-eosin slice were performed. All MCs with a diameter between 15 and 1200 µm were selected. A square region was centred on each MC so as to comprise the whole MC, defining a step size between 1 and 15 µm, depending on MC size. The focus was pre-set for each MC in the middle of the calcified part. For each step the acquisition time was 3 s with a single repetition.

Data processing of Raman data

Data analysis and statistical analyses were performed using the commercial software WiRE (Renishaw plc, Wotton-under-Edge, UK), MATLAB (MathWorks, Natick, MA, USA) or OriginPro 2019 (OriginLab Corporation, Wellesley Hills, MA, USA). Cosmic ray removal was performed using WiRE (nearest neighbour and width of features algorithms). The subsequent steps were performed using MATLAB. Background signal was removed by baseline correction, fitting and subtracting a 10th-order polynomial function from each spectrum, while normalization was performed through the unit vector method. The normalized data were then filtered with a moving average filter with a window size of five. For each spectrum, three indices were evaluated, namely the monotonicity of the spectrum, the correlation with the reference spectrum and the peak positions, as preliminarily reported (Torti, E., Marcinno, B., Vanna, R., Morasso, C., Picotti, F., Villani, L., & Leporati, F. (2019, August). Automatic and Unsupervised Identification of Specific Biochemical Features from Raman Mapping Data. In 2019 22nd Euromicro Conference on Digital System Design (DSD) (pp. 464-469). IEEE). The reference spectra (i.e. tissue, hydroxyapatite (HA), whitlockite (WIT), amorphous calcium carbonate (aCaCa), crystalline calcium carbonate (calcite) and calcium oxalate (CaO)) are reported in Figure 7. The three indices were fused with a majority vote system in order to produce a thematic map containing a classification based only on spectral information, referred to in the following as spectral classification.
Spatial classification was taken into account by adopting the K-means clustering algorithm, which is suited to detecting borders (Torti E, Florimbi G, Castelli F, Ortega S, Fabelo H, Callicó GM, et al. Parallel K-Means Clustering for Brain Cancer Detection Using Hyperspectral Images. Electronics. 2018;7:283). The algorithm is based on a random initialization of the groups; different executions of this algorithm produce different clusters, depending on the initial centroids. Therefore, in order to produce reproducible results, the clustering is repeated ten times with different initializations. The results of the ten clustering phases are then fused together in order to obtain a classification map. The classification map produced by the K-means clustering has no biochemical meaning, since K-means is an unsupervised classification method. The biochemical information was obtained by fusing the K-means clustering with the spectral classification map produced before. For each K-means cluster, the corresponding area of the spectral classification map was evaluated and the groups were labelled on the basis of a majority vote system.
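The final fusion step, in which each K-means cluster inherits the most frequent spectral class found in its area, can be sketched as follows. The function name, the flattened map layout and the toy labels are hypothetical, and ties are resolved in favour of the class encountered first, which is an assumption not stated in the text.

```python
from collections import Counter

def label_clusters(cluster_map, spectral_map):
    """Give every pixel the majority spectral class of its K-means cluster."""
    votes = {}
    for cluster_id, spectral_class in zip(cluster_map, spectral_map):
        votes.setdefault(cluster_id, Counter())[spectral_class] += 1
    # most_common(1) returns the class with the highest count in each cluster
    cluster_label = {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}
    return [cluster_label[cid] for cid in cluster_map]

# Flattened toy maps: one entry per pixel of the Raman map.
cluster_map = [0, 0, 0, 1, 1, 1, 1, 2, 2]
spectral_map = ["HA", "HA", "tissue", "tissue", "tissue", "WIT", "tissue", "WIT", "WIT"]
fused = label_clusters(cluster_map, spectral_map)
```

In this toy case the isolated "tissue" pixel inside cluster 0 and the isolated "WIT" pixel inside cluster 1 are overruled by their neighbours, which is the smoothing effect that the spatial step adds to the purely spectral classification.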

For each MC, a single Raman spectrum, calculated by averaging only signals from Type II MC (i.e. HA and WIT components), was used for further statistical analyses and classification. According to studies reported in the literature and according to the studies herein described on more than 400 MCs, both calcite and aCaCa are very rare components. Therefore, they were studied from a qualitative point of view but were not primarily considered for diagnostic purposes.
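A minimal sketch of this averaging step is given below, assuming that each map pixel already carries a class label from the classification described above. The function name and the label strings "HA" and "WIT" are illustrative choices, not identifiers from the original scripts.

```python
def average_type2_spectrum(spectra, labels, keep=("HA", "WIT")):
    """Average, channel by channel, only the spectra labelled as Type II components."""
    selected = [s for s, lab in zip(spectra, labels) if lab in keep]
    if not selected:
        return None  # no Type II signal found in this MC
    n = len(selected)
    return [sum(channel) / n for channel in zip(*selected)]

# Three toy spectra with three channels each; the tissue pixel is ignored.
spectra = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 10.0, 10.0]]
labels = ["HA", "WIT", "tissue"]
mean_spectrum = average_type2_spectrum(spectra, labels)
```

Returning `None` for an MC without any Type II pixel keeps such cases explicit instead of silently producing an empty average.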

Principal component analysis (PCA) was performed, obtaining 380 principal components (PCs). A linear discriminant analysis (LDA) classification model was built using the first fourteen PCs as training data, using pure benign (i.e. B1 and B2 MC) and pure malignant (i.e. representative B5a and B5b MC) samples and setting prior probabilities proportional to the group size. The number of PCs to be utilized was selected in order to represent about 90% of dataset variability and in order to exclude PCs associated with noise and/or small artefacts. The PCA-LDA classification model was validated by leave-one-out cross-validation. B3 MC, B5a MC detected outside in-situ carcinoma and B5b MC detected outside invasive carcinoma were used as test data. The ROC curve, with the relative AUC, was automatically calculated by OriginLab using as input the canonical variable 1 emerging from the PCA-LDA classification. From the ROC curve the optimal threshold (cut-off point) was obtained. This was used to produce confusion matrices and the relative diagnostic performances. If not specified, variables were reported as means (± standard deviations), as medians with range of values, or as absolute numbers and percentages. Continuous variables were compared using the non-parametric Wilcoxon-Mann-Whitney/Kruskal-Wallis test for variables with non-normal distribution. The statistical significance level was set at p<0.05 (two-tailed).
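The rule for choosing the number of PCs can be illustrated with a short sketch that keeps the smallest number of components whose cumulative variance reaches a target fraction. The eigenvalues below are invented for illustration and are not taken from the study; the function name is likewise hypothetical.

```python
def n_components_for_variance(eigenvalues, target=0.90):
    """Smallest number of leading PCs whose cumulative variance fraction reaches target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(eigenvalues)

# Illustrative PCA eigenvalues (variance carried by each PC), not study data.
eigenvalues = [50.0, 20.0, 12.0, 9.0, 5.0, 3.0, 1.0]
n_pcs = n_components_for_variance(eigenvalues, target=0.90)
```

With these made-up values, four components carry 91% of the variance, so four are retained. In the study the same kind of criterion led to fourteen PCs.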

The following Table 3 summarises the main steps of the statistical analysis of the data.

Table 3. main steps of the statistical analysis of the data

RESULTS

A total of 474 MCs from 52 patients were mapped by Raman imaging as described above. Of these, 65 MC were detected in normal tissue (B1), 67 in samples with benign lesions (B2), 61 in samples with uncertain malignant features (B3), 97 in samples with in situ carcinoma (B5a) and 184 in samples with invasive carcinoma (B5b) (Table 1 and Table 2). Only 59 and 63 MCs from B5a and B5b, respectively, were found inside the corresponding diagnostic category (in situ or invasive carcinoma, respectively) and primarily considered as representative MCs. The remaining MCs were locally surrounded by tissue categorized within a lower diagnostic category and were studied separately.

Benign MCs contain specific components and are more heterogeneous than malignant ones

A high-resolution Raman imaging approach was optimized in order to characterize and automatically identify all inorganic components contained in MC at the micrometric scale (Figure 7). Out of 315 representative MC, 273 (86%) contain HA, the most common form of calcium phosphate defining Type II MC (Figure 8A). Overall, HA also represents the most abundant component (>74%) of MC (Figure 8B). Considering the different diagnoses, benign calcifications (B1, B2) are more heterogeneous than uncertain malignant (B3) or malignant MC (B5a, B5b). In particular, 33 and 40 MC found in B1 and B2 samples, respectively, reported spectral features of WIT (24) (Figure 8A,B), a particular magnesium-containing crystal phase of calcium phosphate. In RS data, WIT was not found as the sole inorganic component but was always co-localized with HA, and it constitutes, overall, 2.7% and 6.2% of B1 and B2 MC, respectively. Some benign MC (i.e. 7 and 25 MC in B1 and B2 lesions, respectively) also show the presence of a few signals (<2%) of aCaC. In addition, 14 MC from B1 samples contain CaO as the sole component, and these calcifications correspond to Type I MC, an uncommon type of benign calcification.

Conversely, malignant (B5a, B5b) and B3 lesions, generally exhibit homogeneous MC, containing almost only HA, representing >97% of their overall composition (Figure 8A,B) . A single B3 MC presented Type I MC containing pure CaO. Among the whole dataset, a few MC from both benign and malignant samples reported a highly crystalline form of calcium carbonate (calcite) , never reported before in breast MC and apparently not associated with pathology from our data. Finally, 17 MC identified by the pathologists did not show any specific signal of mineralized components and were likely amorphous material attributable to necrotic regions.

The chemical features of MC are associated with the diagnostic classification

The mean Raman spectra of Type II MC from different diagnostic categories revealed some differences around the major signals; in particular, around the 960 cm⁻¹ band, related to calcium phosphate vibrational modes, around the 1070-1090 cm⁻¹ band, related to calcium carbonate content, and around the 1450 cm⁻¹ band, related to protein content (Figure 9, Figure 10). When passing from B1/B2 to B3/B5a/B5b samples, a right-shift and broadening of the phosphate band is observed, and this is mainly associated both with the presence of WIT, which has maximum intensity at 970 cm⁻¹ (24), and with a more disordered phosphate crystal lattice (25) (Figure 7). The right-shift and broadening of the phosphate peak is similar in normal (B1) and benign (B2) MC (p = 0.43) but significantly higher (p = 7.90 × 10⁻¹⁰) when compared with MC found in lesions of uncertain malignancy (B3), in-situ (B5a) and invasive lesions (B5b). In turn, B3, B5a and B5b exhibit only minor variation of broadening values (p = 0.032), with the main contribution coming from differences between B3 and invasive carcinoma samples (p = 0.011) (Figure 9A,D).

The carbonate band also reveals a right-shift and broadening in benign samples, showing significant differences when comparing B1 and B2 with the B3, B5a and B5b subtypes (p = 2.86 × 10⁻⁹) (Figure 9B,E). No significant differences can be observed when comparing B1 and B2 (p = 0.51), and a certain variance characterizes B3, B5a and B5b (p = 0.001), especially due to differences between B5a and B5b (Figure 9B,E). Finally, the intensity of protein signals also changes among diagnostic categories, but the differences do not correlate with the pathological status (Figure 9C,F).

The multivariate analysis of MC Raman maps allows to accurately classify different histological subtypes

Multivariate analysis was applied to extract all potential pathological biomarkers from the complexity of Raman data, thus improving on the limited and subjective analysis of single spectrometric variables. First, principal component analysis (PCA) was performed and the first fourteen principal components (Figure 11), representing 90% of the variability of the entire dataset, were extracted and used as variables for further classification. A classification model was made by LDA, firstly using only representative MC from pure benign samples (i.e. B1-B2) and pure malignant samples (B5a-B5b), thus producing a Raman canonical correlation coefficient (canonical variable 1) further used as the unique classification variable (Figure 12). After excluding a correlation between age and the biochemical composition of MC (R² = 0.020, slope p = 0.33) (Figure 13), a receiver operating characteristic (ROC) curve (Figure 14) was calculated and used to find the optimal threshold point (i.e. -0.445 of canonical variable) for the classification. Out of 108 malignant MC, 103 were correctly classified, giving 95.4% sensitivity and 94.6% negative predictive value (NPV) (Table 4).

Table 4. Confusion matrix resulting from the LDA-based classification of representative MC.

Among the MC incorrectly assigned as benign, 4 were from DCIS and 1 from IDC. Out of 103 benign MC, 15 were wrongly assigned as malignant, giving 85.4% specificity and 87.3% PPV. Of the benign MC wrongly classified as malignant, 10 were from normal tissue, 1 from fibroadenoma, 2 from fibrocystic change and 2 from UDH. Overall, the model gives 90.5% accuracy. These results were validated through leave-one-out cross-validation, which treats each MC in turn as having an unknown diagnosis before classifying it. This gave 93.5% sensitivity and 80.6% specificity with 87.2% overall accuracy (Table 5).
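The figures reported for the model before cross-validation follow directly from the four confusion-matrix counts given in the text (103 of 108 malignant MC correctly classified, 15 of 103 benign MC wrongly assigned as malignant). The short check below recomputes them; the function name is illustrative.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Counts stated in the text: malignant is the positive class.
metrics = diagnostic_metrics(tp=103, fn=5, fp=15, tn=88)
percent = {name: round(100 * value, 1) for name, value in metrics.items()}
# percent == {"sensitivity": 95.4, "specificity": 85.4, "ppv": 87.3,
#             "npv": 94.6, "accuracy": 90.5}
```

The recomputed values match the sensitivity, specificity, PPV, NPV and accuracy quoted above.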

Table 5. Confusion matrix resulting from the leave-one-out cross-validation of the LDA-based classification of representative MC reported in Table S3.

The classification model was then used to investigate the pathological fingerprint of MC found in lesions of uncertain malignancy (B3). Out of 53 B3 MC, 44 (29 from flat epithelial atypia tissue and 15 from ADH) were assigned as malignant and 9 MC (7 from ADH, 1 from flat epithelial atypia and 1 from a papillary lesion) were recognized as having benign features (Figure 12B). Moreover, a correlation between the Raman data and the mammographic evaluation according to the Breast Imaging-Reporting and Data System (BI-RADS) classification (D'Orsi et al., 2013) was also observed (Figure 15).

Most of MC found in locally healthy tissue (i.e. either normal tissue or a benign lesion) neighbouring carcinoma lesions show malignant features

116 MC (28 and 88 for B5a and B5b, respectively) were excluded from the main data analysis because they were locally surrounded by non-cancerous tissue in B5a samples, or by non-cancerous tissue or in-situ carcinoma in B5b samples. When data from B5a samples were used as the test dataset, out of 28 MC found locally benign, 18 showed malignant features and were classified as malignant, while the remaining 10 MC were classified as benign (Figure 12C, Table 5). Out of 88 MC from B5b detected outside the invasive carcinoma, only 2 MC (locally B1) were classified as benign, while the remaining 86 were classified as malignant. These include 29 MC situated in in-situ carcinoma (B5a) and 23 MC surrounded by B3 tissue, but also 34 MC surrounded by B1 tissue (Figure 12C, Table 6).

Table 6. Membership prediction of different test data by the LDA-based classification approach.