
Title:
METHODS AND SYSTEMS FOR PHOTOACOUSTIC VISUAL SERVOING
Document Type and Number:
WIPO Patent Application WO/2023/235250
Kind Code:
A1
Abstract:
Provided herein are methods of tracking the positions of medical devices using photoacoustic visual servoing. Additional methods as well as related systems and computer readable media are also provided. In some embodiments, the instructions comprise an electronic neural network and the deep learning-based target segmentation algorithm is implemented using the electronic neural network. In some embodiments, the instructions, when executed on the processor, further perform tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms. In some embodiments, the medical device comprises a needle, a catheter, or a surgical implement. In some embodiments, the image data comprises beamformed image data and/or raw channel data. In some embodiments, the subject comprises a human subject.

Inventors:
BELL MUYINATU (US)
GUBBI MARDAVA (US)
Application Number:
PCT/US2023/023695
Publication Date:
December 07, 2023
Filing Date:
May 26, 2023
Assignee:
UNIV JOHNS HOPKINS (US)
International Classes:
G01S15/02; A61B5/00; A61B34/20; G01N29/06; G01N29/24; G01N33/483
Foreign References:
US 2015/0150464 A1 (2015-06-04)
US 2015/0098305 A1 (2015-04-09)
Other References:
MUYINATU A. LEDIJU BELL, JOSHUA SHUBERT: "Photoacoustic-based visual servoing of a needle tip", SCIENTIFIC REPORTS, vol. 8, no. 1, 1 December 2018 (2018-12-01), XP055672492, DOI: 10.1038/s41598-018-33931-9
Attorney, Agent or Firm:
SAPPENFIELD, Christopher, C. (US)
Claims:
WHAT IS CLAIMED IS:

1. A system, comprising: an electromagnetic radiation source configured to produce electromagnetic waves; an optical fiber operably connected to the electromagnetic radiation source, which optical fiber is configured to transmit the electromagnetic waves from the electromagnetic radiation source to one or more selected sites in and/or on a subject to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject; a medical device operably connected to the optical fiber; a robotic device comprising an acoustic sensor, which robotic device is configured to position the acoustic sensor to receive the acoustic waves; and, a controller operably connected at least to the robotic device, which controller comprises a processor and a memory communicatively coupled to the processor, which memory stores instructions which, when executed on the processor, perform operations comprising: positioning the acoustic sensor within sensory communication of the acoustic waves using the robotic device such that the acoustic sensor receives the acoustic waves to produce image data; and, tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm.

2. The system of any one of the preceding claims, wherein the instructions comprise an electronic neural network and wherein the deep learning-based target segmentation algorithm is implemented using the electronic neural network.

3. The system of any one of the preceding claims, wherein the instructions, when executed on the processor, further perform tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms.

4. The system of any one of the preceding claims, wherein the medical device comprises a needle, a catheter, or a surgical implement.

5. The system of any one of the preceding claims, wherein the image data comprises beamformed image data and/or raw channel data.

6. The system of any one of the preceding claims, wherein the subject comprises a human subject.

7. A method of tracking a position of a medical device, the method comprising: moving an optical fiber that is operably connected to a medical device to one or more selected sites in and/or on a subject, which optical fiber is operably connected to an electromagnetic radiation source; transmitting electromagnetic waves from the electromagnetic radiation source to the one or more selected sites in and/or on the subject through the optical fiber to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject; positioning an acoustic sensor within sensory communication of the acoustic waves using a robotic device that is operably connected to the acoustic sensor such that the acoustic sensor receives the acoustic waves to produce image data; and, tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm.

8. The method of any one of the preceding claims, wherein the deep learning-based target segmentation algorithm is implemented using an electronic neural network.

9. The method of any one of the preceding claims, comprising tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms.

10. The method of any one of the preceding claims, wherein the medical device comprises a needle, a catheter, or a surgical implement.

11. The method of any one of the preceding claims, wherein the image data comprises beamformed image data and/or raw channel data.

12. The method of any one of the preceding claims, comprising tracking the position of the medical device in substantially real-time.

13. The method of any one of the preceding claims, wherein the subject comprises a human subject.

14. A computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: moving an optical fiber that is operably connected to a medical device to one or more selected sites in and/or on a subject, which optical fiber is operably connected to an electromagnetic radiation source; transmitting electromagnetic waves from the electromagnetic radiation source to the one or more selected sites in and/or on the subject through the optical fiber to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject; positioning an acoustic sensor within sensory communication of the acoustic waves using a robotic device that is operably connected to the acoustic sensor such that the acoustic sensor receives the acoustic waves to produce image data; and, tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm.

15. The computer readable media of any one of the preceding claims, wherein the deep learning-based target segmentation algorithm is implemented using an electronic neural network.

16. The computer readable media of any one of the preceding claims, wherein the non-transitory computer-executable instructions, when executed by the electronic processor, further perform at least: tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms.

17. The computer readable media of any one of the preceding claims, wherein the medical device comprises a needle, a catheter, or a surgical implement.

18. The computer readable media of any one of the preceding claims, wherein the image data comprises beamformed image data and/or raw channel data.

19. The computer readable media of any one of the preceding claims, wherein the non-transitory computer-executable instructions, when executed by the electronic processor, further perform at least: tracking the position of the medical device in substantially real-time.

20. The computer readable media of any one of the preceding claims, wherein the subject comprises a human subject.

Description:
METHODS AND SYSTEMS FOR PHOTOACOUSTIC VISUAL SERVOING

CROSS-REFERENCE TO RELATED APPLICATIONS

[001] This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 63/346,855, filed May 28, 2022, the disclosure of which is incorporated herein by reference.

STATEMENT OF GOVERNMENT SUPPORT

[002] This invention was made with government support under R21 EB025621 awarded by the National Institutes of Health and under NSF ECCS-1751522 and IIS-2014088 awarded by the National Science Foundation. The government has certain rights in the invention.

BACKGROUND

[003] The integration of computer vision with medical imaging is an important subfield of modern healthcare. The dual tasks of visualizing and tracking needle tips, catheter tips, and other surgical tool tips form a significant component of numerous surgical and interventional procedures, such as percutaneous biopsies. Ultrasound imaging is commonly used for this task due to its low cost, high frame rates, portability, and the absence of harmful ionizing radiation associated with other imaging modalities such as fluoroscopy. However, ultrasound fails in imaging environments characterized by acoustic clutter, sound scattering, and signal attenuation. These limitations may be overcome by replacing the acoustic transmission component of an ultrasound imaging system with optical energy transmission to create a photoacoustic imaging system, then further integrating deep learning to overcome computer vision challenges.

SUMMARY

[004] In one aspect, the present disclosure relates to a system that includes an electromagnetic radiation source configured to produce electromagnetic waves, and an optical fiber operably connected to the electromagnetic radiation source, which optical fiber is configured to transmit the electromagnetic waves from the electromagnetic radiation source to one or more selected sites in and/or on a subject to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject. The system also includes a medical device operably connected to the optical fiber, and a robotic device comprising an acoustic sensor, which robotic device is configured to position the acoustic sensor to receive the acoustic waves. In addition, the system also includes a controller operably connected at least to the robotic device, which controller comprises a processor and a memory communicatively coupled to the processor, which memory stores instructions which, when executed on the processor, perform operations comprising: positioning the acoustic sensor within sensory communication of the acoustic waves using the robotic device such that the acoustic sensor receives the acoustic waves to produce image data; and tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm.

[005] In some embodiments, the instructions comprise an electronic neural network and the deep learning-based target segmentation algorithm is implemented using the electronic neural network. In some embodiments, the instructions, when executed on the processor, further perform tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms. In some embodiments, the medical device comprises a needle, a catheter, or a surgical implement. In some embodiments, the image data comprises beamformed image data and/or raw channel data. In some embodiments, the subject comprises a human subject.

[006] In another aspect, the present disclosure provides a method of tracking a position of a medical device. The method includes moving an optical fiber that is operably connected to a medical device to one or more selected sites in and/or on a subject, which optical fiber is operably connected to an electromagnetic radiation source, and transmitting electromagnetic waves from the electromagnetic radiation source to the one or more selected sites in and/or on the subject through the optical fiber to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject. The method also includes positioning an acoustic sensor within sensory communication of the acoustic waves using a robotic device that is operably connected to the acoustic sensor such that the acoustic sensor receives the acoustic waves to produce image data, and tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm.

[007] In some embodiments, the deep learning-based target segmentation algorithm is implemented using an electronic neural network. In some embodiments, the method includes tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms. In some embodiments, the medical device comprises a needle, a catheter, or a surgical implement. In some embodiments, the image data comprises beamformed image data and/or raw channel data. In some embodiments, the method includes tracking the position of the medical device in substantially real-time. In some embodiments, the subject comprises a human subject.

[008] In another aspect, the present disclosure provides a computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: moving an optical fiber that is operably connected to a medical device to one or more selected sites in and/or on a subject, which optical fiber is operably connected to an electromagnetic radiation source; transmitting electromagnetic waves from the electromagnetic radiation source to the one or more selected sites in and/or on the subject through the optical fiber to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject; positioning an acoustic sensor within sensory communication of the acoustic waves using a robotic device that is operably connected to the acoustic sensor such that the acoustic sensor receives the acoustic waves to produce image data; and, tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm.

[009] In some embodiments, the deep learning-based target segmentation algorithm is implemented using an electronic neural network. In some embodiments, the non-transitory computer-executable instructions, when executed by the electronic processor, further perform at least: tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms. In some embodiments, the medical device comprises a needle, a catheter, or a surgical implement. In some embodiments, the image data comprises beamformed image data and/or raw channel data. In some embodiments, the non-transitory computer-executable instructions, when executed by the electronic processor, further perform at least: tracking the position of the medical device in substantially real-time. In some embodiments, the subject comprises a human subject.

BRIEF DESCRIPTION OF DRAWINGS

[010] FIG. 1 is a flow chart that schematically shows exemplary method steps of tracking a position of a medical device according to some aspects disclosed herein.

[011] FIG. 2 is a schematic diagram of an exemplary system suitable for use with certain aspects disclosed herein.

[012] FIG. 3 is a block diagram illustrating the photoacoustic visual servoing system. Process A is our previously introduced segmentation-based approach to visual servoing beamformed photoacoustic signals, after the acquisition of raw photoacoustic sensor data (also known as channel data). Process B is our newly introduced deep learning-based approach to visual servoing raw photoacoustic channel data. In each case, the red rectangular overlay indicates position coordinates that are input to the robot controller.

[013] FIG. 4 shows the finite state machine component of the visual servoing system (illustrated with validity checks, d(n), corresponding to Process A).

[014] FIGS. 5A and 5B show examples of (a) acquired experimental data and (b) simulated images with reverberations directly under the source. The reverberations are observed up to 5 mm deeper than the source, and laterally centered underneath the source.

[015] FIG. 6 is a photograph of the setup for needle tracking and probe centering experiments.

[016] FIG. 7 is a plot of mean needle tip tracking errors as functions of the lateral shift for Processes A and B in the phantom and ex vivo tissue. The black error bars represent the standard deviation of each set of measured errors.

[017] FIG. 8 is a directed graph illustrating relationships among the system parameters, signal power distributions, image quality metrics, and performance of computer vision tasks. The solid black lines denote relationships reported in this example. The solid gray lines denote relationships that can be mathematically derived, yet lie outside the scope of this article. The dashed gray lines denote empirically observed relationships that confound direct mathematical descriptions. k = shape of target power distribution; θ = scale of target power distribution; μ = mean of exponentially distributed background power.

[018] FIG. 9 is a plot of target and background power distributions extracted from a photoacoustic image, with dashed lines showing two points of intersection. The shaded regions denote the overlap between the two distributions, which is used to compute the gCNR.

[019] FIG. 10 is a plot of target and background power distributions extracted from a photoacoustic image after applying a 0.5 threshold, with black dashed lines showing the decision boundaries of the optimal classifier for the underlying image and shaded regions denoting the overlap between the two distributions. The solid lines ending in circles at x = 0 denote the Dirac delta function.

[020] FIGS. 11A-11E are photographs of experimental setups showing various imaging environments investigated throughout this manuscript: (a) 5-mm-diameter optical fiber bundle immersed in a water bath and imaged with an Alpinion L3-8 ultrasound probe, (b) catheter equipped with an optical fiber inserted into an in vivo swine and imaged with an Alpinion SP1-5 probe, (c) 2-mm-diameter optical fiber bundle inserted into a black plastisol phantom and imaged with the SP1-5 probe, (d) 2-mm-diameter optical fiber bundle inserted into an ex vivo caprine heart and imaged with the SP1-5 probe, and (e) 1-mm-diameter optical fiber inserted into a plastisol phantom and imaged with the L3-8 probe.

[021] FIG. 12 is a plot of mean ± one standard deviation of p as a function of channel SNR measured from photoacoustic images of a simulated 6-mm-diameter target, an experimental setup of a 5-mm-diameter optical fiber bundle in a water bath, and a 2.5-mm-diameter catheter in an in vivo porcine heart.

[022] FIGS. 13A and 13B are plots of mean ± one standard deviation of (a) θ and (b) k as functions of laser energy measured from photoacoustic images acquired with a 2-mm-diameter optical fiber bundle and a plastisol phantom or ex vivo caprine heart. The maximum energy in these plots was limited to focus on the increase in k over the 0.07-7 μJ range, rather than the saturation of similar k values observed at higher laser energies.

[023] FIG. 14 is a plot of possible gCNR values as a function of the target scale parameter θ for independent selections of the target shape parameter k and mean background power μ. The upper and lower bounds of gCNR are separated into Regions 1 and 2 and Regions 3-7, respectively, denoted by the circled numbers on the plot.

[024] FIGS. 15A-15I are photoacoustic images of (a)-(c) a 6-mm-diameter simulated photoacoustic target, (d)-(f) a 5-mm-diameter optical fiber bundle in a water bath, and (g)-(i) a 2.5-mm-diameter catheter in an in vivo porcine heart, at channel SNR values of -10 dB for (a), (d), and (g), 0 dB for (b), (e), and (h), and 10 dB for (c), (f), and (i), respectively. The circles in each image denote the target and background ROIs. The measured gCNR value is printed at the bottom of each image.

[025] FIGS. 16A-16C are plots of predicted and measured gCNR as functions of channel SNR computed on subsets of 810 photoacoustic images each of (a) a 6-mm-diameter simulated target, (b) a 5-mm-diameter optical fiber bundle in a water bath, and (c) a 2.5-mm-diameter catheter in an in vivo porcine heart. The black lines and gray shaded regions denote the mean ± one standard deviation of our theoretical gCNR predictions. The dots and error bars denote the mean ± one standard deviation of the gCNR measured using the procedure described herein.

[026] FIGS. 17A-17C show DAS beamformed photoacoustic images of a 2-mm-diameter optical fiber bundle in (a) a plastisol phantom and (b) an ex vivo caprine heart, each acquired with a laser energy of 2.9 μJ. The target and background ROIs are marked by circles. (c) Predicted and measured gCNR as functions of laser energy, with the maximum energy limited to 7 μJ to focus on the increase at lower energies, rather than the asymptote at higher energies. The black lines and gray shaded regions denote the mean ± one standard deviation of our theoretical gCNR predictions. The dots and error bars denote the mean ± one standard deviation of the gCNR measurements in the phantom and ex vivo environments.

[027] FIGS. 18A-18C are plots showing segmentation accuracy as a function of (a) gCNR, (b) SNR, and (c) CNR measured from images acquired using a 1-mm-diameter optical fiber in a plastisol phantom and in vivo tissue. The sigmoid fits shown in each plot were created using the phantom data.

[028] FIGS. 19A-19D are plots showing minimum and maximum values of (a) gCNR, (b) SNR, and (c) CNR measurements as functions of the threshold t0 applied to photoacoustic images of a 1-mm-diameter optical fiber in phantom and in vivo tissue, and (d) the percentage of finite values within these measurements.

[029] FIG. 20 is a plot comparing photoacoustic gCNR values measured using the histogram-based approach with the number of bins either fixed at 256 or selected using a data-based method. The solid black line denotes the ideal case of gCNR measurements matching predicted values.

[030] FIG. 21 is a plot showing predicted gCNR values using the ultrasound-based gCNR framework and our photoacoustic-based gCNR prediction framework, plotted as functions of measured gCNR values (with data-based histogram bin widths) for the five datasets described herein. The solid black line denotes the ideal case of gCNR predictions matching measured values.

DEFINITIONS

[031] In order for the present disclosure to be more readily understood, certain terms are first defined below. Additional definitions for the following terms and other terms may be set forth throughout the specification. If a definition of a term set forth below is inconsistent with a definition in an application or patent that is incorporated by reference, the definition set forth in this application should be used to understand the meaning of the term.

[032] As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, a reference to “a method” includes one or more methods, and/or steps of the type described herein and/or which will become apparent to those persons skilled in the art upon reading this disclosure and so forth.

[033] It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. Further, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In describing and claiming the methods, systems, and computer readable media, the following terminology, and grammatical variants thereof, will be used in accordance with the definitions set forth below.

[034] About. As used herein, “about” or “approximately” or “substantially” as applied to one or more values or elements of interest, refers to a value or element that is similar to a stated reference value or element. In certain embodiments, the term “about” or “approximately” or “substantially” refers to a range of values or elements that falls within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value or element unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value or element).

[035] Classifier. As used herein, “classifier” generally refers to an algorithm or computer code that receives, as input, test data and produces, as output, a classification of the input data as belonging to one or another class.

[036] Data set. As used herein, “data set” refers to a group or collection of information, values, or data points related to or associated with one or more objects, records, and/or variables. In some embodiments, a given data set is organized as, or included as part of, a matrix or tabular data structure. In some embodiments, a data set is encoded as a feature vector corresponding to a given object, record, and/or variable, such as a given test or reference subject. For example, a medical data set for a given subject can include one or more observed values of one or more variables associated with that subject.

[037] Electronic neural network. As used herein, “electronic neural network” or “neural network” refers to a machine learning algorithm or model that includes layers of at least partially interconnected artificial neurons (e.g., perceptrons or nodes) organized as input and output layers with one or more intervening hidden layers that together form a network that is or can be trained to classify data, such as test subject medical data sets (e.g., peptide sequence and binding value pair data sets or the like).

[038] Machine Learning Algorithm. As used herein, "machine learning algorithm" generally refers to an algorithm, executed by computer, that automates analytical model building, e.g., for clustering, classification, or pattern recognition. Machine learning algorithms may be supervised or unsupervised. Learning algorithms include, for example, artificial or electronic neural networks (e.g., back propagation networks), discriminant analyses (e.g., Bayesian classifier or Fisher’s analysis), multiple-instance learning (MIL), support vector machines, decision trees (e.g., recursive partitioning processes such as CART (classification and regression trees) or random forests), linear classifiers (e.g., multiple linear regression (MLR), partial least squares (PLS) regression, and principal components regression), hierarchical clustering, and cluster analysis. A dataset on which a machine learning algorithm learns can be referred to as "training data." A model produced using a machine learning algorithm is generally referred to herein as a “machine learning model.”

[039] Subject. As used herein, “subject” or “test subject” refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian, or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual that has or is suspected of having a disease or pathology or a predisposition to the disease or pathology, or an individual that is in need of therapy or suspected of needing therapy. The terms “individual” or “patient” are intended to be interchangeable with “subject.” A “reference subject” refers to a subject known to have or lack specific properties.

[040] System. As used herein, "system" in the context of analytical instrumentation refers to a group of objects and/or devices that form a network for performing a desired objective.

DETAILED DESCRIPTION

[041] The present disclosure provides photoacoustic visual servoing methods, systems, and related aspects for tracking the positions of various types of medical devices as medical procedures are performed using, for example, beamformed image data and/or raw channel data.

[042] By way of overview, FIG. 1 is a flow chart that schematically shows exemplary method steps of tracking a position of a medical device according to some aspects disclosed herein. As shown, method 100 includes moving an optical fiber that is operably connected to a medical device to one or more selected sites in and/or on a subject, which optical fiber is operably connected to an electromagnetic radiation source (step 102), and transmitting electromagnetic waves from the electromagnetic radiation source to the one or more selected sites in and/or on the subject through the optical fiber to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject (step 104). Method 100 also includes positioning an acoustic sensor within sensory communication of the acoustic waves using a robotic device that is operably connected to the acoustic sensor such that the acoustic sensor receives the acoustic waves to produce image data (step 106). In addition, method 100 also includes tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm (step 108).
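To make the workflow of method 100 concrete, the following is a minimal Python sketch of one possible tracking loop. It assumes that hardware access is wrapped in caller-supplied callables (fire_laser, acquire_image_data, move_probe) and that each localization algorithm is a callable returning a position estimate; these names and the median-based fusion of estimates are illustrative assumptions rather than elements of the disclosure.

```python
# Illustrative sketch only; every callable and the median fusion step are assumptions.
import numpy as np

def track_medical_device(fire_laser, acquire_image_data, localizers, move_probe, n_frames=100):
    """localizers: one or more photoacoustic point source localization algorithms
    (amplitude-based, coherence-based, and/or deep learning-based segmentation)."""
    tip_position = None
    for _ in range(n_frames):
        fire_laser()                               # transmit electromagnetic waves through the optical fiber
        image_data = acquire_image_data()          # beamformed image data and/or raw channel data
        estimates = [np.asarray(loc(image_data)) for loc in localizers]
        tip_position = np.median(np.stack(estimates), axis=0)   # combine one or more estimates
        move_probe(tip_position)                   # keep the acoustic sensor within sensory communication
    return tip_position
```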

[043] In some embodiments, the deep learning-based target segmentation algorithm is implemented using an electronic neural network. In some embodiments, the method includes tracking the position of the medical device using the image data and two or more of the photoacoustic point source localization algorithms. In some embodiments, the medical device comprises a needle, a catheter, or a surgical implement. In some embodiments, the image data comprises beamformed image data and/or raw channel data. In some embodiments, the method includes tracking the position of the medical device in substantially real-time. In some embodiments, the subject comprises a human subject.

[044] The present disclosure also provides various systems and computer program products or machine readable media. In some aspects, for example, the methods described herein are optionally performed or facilitated at least in part using systems, distributed computing hardware and applications (e.g., cloud computing services), electronic communication networks, communication interfaces, computer program products, machine readable media, electronic storage media, software (e.g., machine-executable code or logic instructions) and/or the like. To illustrate, FIG. 2 provides a schematic diagram of an exemplary system suitable for use with implementing at least aspects of the methods disclosed in this application. As shown, system 200 includes at least one controller or computer, e.g., server 202 (e.g., a search engine server), which includes processor 204 and memory, storage device, or memory component 206, and one or more other communication devices 214, 216, (e.g., client-side computer terminals, telephones, tablets, laptops, other mobile devices, etc. (e.g., for receiving molecular interaction data sets or results, etc.) in communication with the remote server 202, through electronic communication network 212, such as the Internet or other internetwork. Communication devices 214, 216 typically include an electronic display (e.g., an internet enabled computer or the like) in communication with, e.g., server 202 computer over network 212 in which the electronic display comprises a user interface (e.g., a graphical user interface (GUI), a web-based user interface, and/or the like) for displaying results upon implementing the methods described herein. In certain aspects, communication networks also encompass the physical transfer of data from one location to another, for example, using a hard drive, thumb drive, or other data storage mechanism. System 200 also includes program product 208 (e.g., for tracking a position of a medical device as described herein) stored on a computer or machine readable medium, such as, for example, one or more of various types of memory, such as memory 206 of server 202, that is readable by the server 202, to facilitate, for example, a guided search application or other executable by one or more other communication devices, such as 214 (schematically shown as a desktop or personal computer). In some aspects, system 200 optionally also includes at least one database server, such as, for example, server 210 associated with an online website having data stored thereon (e.g., entries corresponding to data sets, etc.) searchable either directly or through search engine server 202. System 200 optionally also includes one or more other servers positioned remotely from server 202, each of which are optionally associated with one or more database servers 210 located remotely or located local to each of the other servers. The other servers can beneficially provide service to geographically remote users and enhance geographically distributed operations.

[045] As understood by those of ordinary skill in the art, memory 206 of the server 202 optionally includes volatile and/or nonvolatile memory including, for example, RAM, ROM, and magnetic or optical disks, among others. It is also understood by those of ordinary skill in the art that although illustrated as a single server, the illustrated configuration of server 202 is given only by way of example and that other types of servers or computers configured according to various other methodologies or architectures can also be used. Server 202 shown schematically in FIG. 2, represents a server or server cluster or server farm and is not limited to any individual physical server. The server site may be deployed as a server farm or server cluster managed by a server hosting provider. The number of servers and their architecture and configuration may be increased based on usage, demand and capacity requirements for the system 200. As also understood by those of ordinary skill in the art, other user communication devices 214, 216 in these aspects, for example, can be a laptop, desktop, tablet, personal digital assistant (PDA), cell phone, server, or other types of computers. As known and understood by those of ordinary skill in the art, network 212 can include an internet, intranet, a telecommunication network, an extranet, or world wide web of a plurality of computers/servers in communication with one or more other computers through a communication network, and/or portions of a local or other area network.

[046] As further understood by those of ordinary skill in the art, exemplary program product or machine readable medium 208 is optionally in the form of microcode, programs, cloud computing format, routines, and/or symbolic languages that provide one or more sets of ordered operations that control the functioning of the hardware and direct its operation. Program product 208, according to an exemplary aspect, also need not reside in its entirety in volatile memory, but can be selectively loaded, as necessary, according to various methodologies as known and understood by those of ordinary skill in the art.

[047] As further understood by those of ordinary skill in the art, the term "computer-readable medium" or “machine-readable medium” refers to any medium that participates in providing instructions to a processor for execution. To illustrate, the term "computer-readable medium" or “machine-readable medium” encompasses distribution media, cloud computing formats, intermediate storage media, execution memory of a computer, and any other medium or device capable of storing program product 208 implementing the functionality or processes of various aspects of the present disclosure, for example, for reading by a computer. A "computer-readable medium" or “machine-readable medium” may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks. Volatile media includes dynamic memory, such as the main memory of a given system. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications, among others. Exemplary forms of computer-readable media include a floppy disk, a flexible disk, hard disk, magnetic tape, a flash drive, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.

[048] Program product 208 is optionally copied from the computer-readable medium to a hard disk or a similar intermediate storage medium. When program product 208, or portions thereof, are to be run, it is optionally loaded from their distribution medium, their intermediate storage medium, or the like into the execution memory of one or more computers, configuring the computer(s) to act in accordance with the functionality or method of various aspects disclosed herein. All such operations are well known to those of ordinary skill in the art of, for example, computer systems.

[049] In some aspects, program product 208 includes non-transitory computer-executable instructions which, when executed by electronic processor 204, perform at least: moving an optical fiber that is operably connected to a medical device to one or more selected sites in and/or on a subject, which optical fiber is operably connected to an electromagnetic radiation source; transmitting electromagnetic waves from the electromagnetic radiation source to the one or more selected sites in and/or on the subject through the optical fiber to generate acoustic waves at least proximal to the one or more selected sites in and/or on the subject; positioning an acoustic sensor within sensory communication of the acoustic waves using a robotic device that is operably connected to the acoustic sensor such that the acoustic sensor receives the acoustic waves to produce image data; and, tracking a position of the medical device using the image data and one or more photoacoustic point source localization algorithms selected from the group consisting of: an amplitude-based algorithm, a coherence-based algorithm, and a deep learning-based target segmentation algorithm.

[050] As also shown in this exemplary embodiment, system 200 also includes additional system components 218, including a laser, an optical fiber, a needle, a probe, and an ultrasound scanner.

EXAMPLES

[051] EXAMPLE 1: Deep Learning-Based Photoacoustic Visual Servoing: Using Outputs from Raw Sensor Data as Inputs to a Robot Controller

[052] I. INTRODUCTION

[053] The ability to visualize and track surgical tool tips is paramount to the success of multiple surgeries and procedures. Ultrasound is one of the most commonly used imaging modalities to track tool tips due to its low cost, high frame rates, portability, and absence of harmful ionizing radiation. The combination of ultrasound imaging with either traditional techniques of visual servoing or recent advances in deep learning introduces additional layers of automation for this important task. For example, ultrasound-based visual servoing may assist with percutaneous needle insertions, and deep learning has the potential to improve the performance and speed of ultrasound image-based needle detection systems. However, both of these automation gains rely on the ultrasound imaging process, which tends to fail in acoustically challenging environments characterized by significant acoustic clutter, sound scattering, and sound attenuation. Specific examples of challenging acoustic environments include transcranial imaging, abdominal imaging, spinal imaging, or imaging of obese patients.

[054] One option to address known limitations with ultrasound imaging is to combine ultrasound imaging systems with a miniature laser system to perform intraoperative photoacoustic imaging, which has provided clear images of needle tips and other structures when ultrasound imaging fails. Unlike ultrasound imaging, which requires the transmission and reception of sound to make images, photoacoustic imaging is implemented by transmitting light to generate an acoustic response that is received by the same ultrasound detectors used for ultrasound imaging. Photoacoustic imaging tends to be advantageous over ultrasound imaging in acoustically challenging environments because it only requires one way (as opposed to round-trip) acoustic travel from the transmission source to the ultrasound receiver.

[055] Previous work from our group demonstrated the success of using photoacoustic imaging as the computer vision component of a visual servoing system, enabling continuous monitoring of needle and catheter tips. The needle or catheter tip each housed an internal optical fiber as one of the key enabling modifications to the interventional setup. This optical fiber can potentially be coupled with any surgical tool tip to enable photoacoustic-based visual servoing of the tool tip. Therefore, this approach was also demonstrated with a fiber that was independent of any tool, catheter, or needle tip.

[056] To achieve photoacoustic-based visual servoing, raw data is typically beamformed to present a photoacoustic image that is interpretable to the human eye, followed by image segmentation to determine coordinates of interest for robot path planning. However, beamforming and other image formation approaches rely on mathematical models that do not consider all possible photoacoustic image artifact sources. Artifacts that cannot be removed with traditional amplitude-based or coherence-based photoacoustic visual servoing approaches (e.g., reflection artifacts or coherent artifacts, respectively) are confusing for both human and robot interpretation, resulting in unreliable segmentation for photoacoustic visual servoing tasks.

[057] In order to better discriminate sources from artifacts, we turn our attention to novel input sources for the robotic system (which may not necessarily need to operate on an image that is interpretable to humans). In particular, our recent photoacoustic-based deep learning approaches for photoacoustic source detection suggest that deep learning is a viable solution to address current challenges with amplitude- or coherence-based photoacoustic visual servoing. The novel concept of using deep learning to detect interventional structures of interest in raw sensor data before the application of traditional image formation techniques was previously implemented to detect needle and catheter tips. In summary, recent work from our group independently demonstrated two key advances with regard to interventional tool tip tracking: (1) photoacoustic-based visual servoing to enhance tool tip tracking and centering within the image plane and (2) deep learning-based photoacoustic image formation from raw sensor data to improve tool tip visibility.

[058] The independent demonstrations of feasibility described above suggest that the integration of deep learning with photoacoustic-based visual servoing is a superior approach to address well-known challenges with tool tip tracking. This paper presents the first known deep learning-based photoacoustic visual servoing system to address these challenges. The novelty of this contribution includes the creation and implementation of a direct pathway from the photoacoustic raw sensor data (i.e., before any image has been formed) to the robot controller, enabled by recent advances using deep learning to extract information directly from raw acoustic sensor data.

[059] The remainder of this example is organized as follows. Section II introduces our deep learning-based approach to visual servoing raw photoacoustic sensor data (also known as channel data), followed by a description of our network training process. This deep learning approach is contrasted with our previously introduced segmentation-based approach to visual servoing beamformed photoacoustic signals. Section III describes our experiments to test both approaches. Section IV presents our experimental results. Section V discusses our findings in the context of prior work.

[060] II. VISUAL SERVOING SYSTEM

[061] A. System Components

[062] Fig. 3 shows a block diagram of the photoacoustic visual servoing system used in this work. The system components include a Sawyer robot (Rethink Robotics, Boston, MA, USA), a Vantage 128 ultrasound scanner (Verasonics Inc., Kirkland, WA, USA), a Verasonics P4-2v phased array ultrasound probe, a Phocus Mobile laser (Opotek, Carlsbad, CA, USA), and a 600 μm core diameter optical fiber. One end of the optical fiber was coupled to the laser. The other end of the optical fiber was inserted into a hollow core needle, ensuring coincident fiber and needle tips to form a fiber-needle pair. The probe was attached to the end effector of the robot using a 3D-printed holder. Nanosecond laser pulses were transmitted at a rate of 10 Hz with a wavelength of 750 nm. The software components of the visual servoing system were implemented using the Robot Operating System (ROS).

[063] The frame U was assigned to coincide with the Verasonics P4-2v probe, with the x-, y-, and z-dimensions corresponding to the lateral, elevation, and axial dimensions of the probe, respectively. The imaging plane of the probe corresponded to the x-z plane of the frame U. The raw channel data frames acquired with the probe were processed to obtain an estimate p(n) of the needle tip position in the ultrasound probe frame U and a confidence measure d(n) ∈ (0, 1) of the estimate. We refer to this confidence measure as the validity of the estimate.

[064] Process A used the amplitude-based approach previously developed to estimate the needle tip position and assess its validity. A photoacoustic image was recreated from the acquired channel data using delay-and-sum beamforming. The beamformed image was normalized and a binary threshold of 0.7 was applied to the normalized image. Binary erosion and dilation were performed with a 3x3 kernel to remove single pixel regions and connect segments which became disconnected during the binary threshold application. The erosion and dilation filters helped to ensure that the segmented needle tip was displayed as a single large component, rather than as multiple smaller components. Connected components were then labeled and their corresponding pixel areas were computed. If only one region was larger than 3 times the average area, then that region was assumed to be the needle tip and the centroid of that region was output as the needle tip position. Otherwise, the needle tip was assumed to be outside the field of view of the probe. For robustness, the estimated needle tip position was compared across five consecutive frames. If the needle tip was visible in each frame (i.e., d(n) = 1) and the estimated position of the needle tip did not change by more than 1 cm across the 5 frames, then the needle tip position was labeled as valid.
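For illustration, a minimal sketch of this amplitude-based segmentation step is shown below, assuming the beamformed image is supplied as a 2-D NumPy array; the function name and return convention are assumptions, and the five-frame validity check is handled separately.

```python
# Sketch of the Process A single-frame segmentation; not the system's actual implementation.
import numpy as np
from scipy import ndimage

def segment_needle_tip(beamformed_img, threshold=0.7, area_factor=3.0):
    """Return (tip_row, tip_col, detected) from a beamformed photoacoustic image."""
    # Normalize the image and apply the 0.7 binary threshold.
    img = np.abs(beamformed_img) / (np.max(np.abs(beamformed_img)) + 1e-12)
    binary = img >= threshold

    # 3x3 erosion then dilation removes single-pixel regions and reconnects
    # segments that were split by thresholding.
    kernel = np.ones((3, 3), dtype=bool)
    binary = ndimage.binary_erosion(binary, structure=kernel)
    binary = ndimage.binary_dilation(binary, structure=kernel)

    # Label connected components and compute their pixel areas.
    labels, n_regions = ndimage.label(binary)
    if n_regions == 0:
        return None, None, False
    areas = np.atleast_1d(ndimage.sum(binary, labels, index=np.arange(1, n_regions + 1)))

    # Accept the frame only if exactly one region exceeds 3x the average area;
    # otherwise the needle tip is assumed to lie outside the field of view.
    large = np.flatnonzero(areas > area_factor * np.mean(areas)) + 1
    if large.size != 1:
        return None, None, False
    tip_row, tip_col = ndimage.center_of_mass(binary, labels, index=int(large[0]))
    return tip_row, tip_col, True
```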

[065] Process B used a convolutional neural network (CNN) to provide estimates of the needle tip position and corresponding confidence levels in the range 0 to 1. With a focus on proving the feasibility of integrating deep learning-based approaches with real-time photoacoustic visual servoing systems, we used the ResNet-101 architecture and the Faster-RCNN detection method, which were previously demonstrated as an offline technique applied to photoacoustic channel data obtained with an E-CUBE 12R ultrasound scanner (Alpinion Medical Systems, Seoul, South Korea). For robustness, the estimated needle tip position was compared across 5 consecutive frames as described above. If the needle tip was visible with a confidence level d(n) > 0.7 in each frame and the estimated position of the needle tip did not change by more than 1 cm across the 5 frames, then the needle tip position was labeled as valid.
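The five-frame validity check shared by both processes can be sketched as follows, assuming tip positions are expressed in millimeters in frame U; the class interface is an illustrative assumption, and handling of equality at the confidence threshold is simplified relative to the exact d(n) = 1 and d(n) > 0.7 conditions stated above.

```python
# Hedged sketch of the five-frame validity logic; names and interface are assumptions.
from collections import deque
import numpy as np

class ValidityChecker:
    """Accept a needle tip estimate only after five consecutive confident frames
    whose positions stay within 1 cm of each other."""

    def __init__(self, conf_threshold, n_frames=5, max_drift_mm=10.0):
        self.conf_threshold = conf_threshold   # approximately 1.0 for Process A, 0.7 for Process B
        self.max_drift_mm = max_drift_mm
        self.history = deque(maxlen=n_frames)

    def update(self, position_mm, confidence):
        """position_mm: estimated tip position in frame U; confidence: d(n)."""
        self.history.append((np.asarray(position_mm, dtype=float), float(confidence)))
        if len(self.history) < self.history.maxlen:
            return False
        if any(c < self.conf_threshold for _, c in self.history):
            return False
        pts = np.stack([p for p, _ in self.history])
        drift = float(np.linalg.norm(pts.max(axis=0) - pts.min(axis=0)))
        return drift <= self.max_drift_mm
```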

[066] Fig. 4 shows the finite state machine (FSM) used to control the translational degrees of freedom corresponding to the lateral and elevation dimensions of the probe. Two-dimensional (2D) photoacoustic images do not contain elevation displacement information. As a result, both Process A and Process B output zeros in the y-dimension of the estimate p(n). In the nominal "Center" state of the FSM, the error e(n) in the frame U was computed using the equation:

e(n) = p_cmd(n) − p(n),

where p_cmd(n) = [0, p_y(n), p_z(n)]^T is the desired position of the needle tip in the probe frame U. This computation of e(n) ensures that the visual servoing system will center the probe laterally above the needle tip without changing the axial or elevation displacement between the probe and the needle tip. If the FSM was in the "Center" state and the needle tip position estimate was marked as valid (i.e., d(n) = 1 for Process A and d(n) > 0.7 for Process B), then the end effector of the robot was commanded to move along the x-axis of the probe frame U with the velocity v_pid(n) given by the equation

v_pid(n) = K_p e_x(n) + K_i ΔT Σ_{k=0}^{n} e_x(k) + K_d [e_x(n) − e_x(n−1)] / ΔT,

where K_p, K_i, and K_d are the gains of the PID controller (with values of 0.1, 0.01, and 0.001, respectively), and ΔT is the sampling time of the PID controller. The controller was executed every 0.1 s to match the pulse repetition rate of the laser.
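A minimal discrete PID controller with the stated gains and sampling time might look like the following sketch; the class name is an assumption, and anti-windup or velocity saturation handling used in a real robot controller is omitted.

```python
# Sketch of the discrete PID velocity command; gains and sampling time from the text above.
class LateralPID:
    """Discrete PID velocity command along the x-axis of the probe frame U."""

    def __init__(self, kp=0.1, ki=0.01, kd=0.001, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def command(self, error):
        """error: lateral component of e(n); returns v_pid(n)."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```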

[067] The validity (i.e., d(n)) was used to indicate movement of the needle tip outside of the imaging plane of the probe. If the estimated needle tip position was marked as invalid, the FSM entered the "Wait" state. In this state, the end effector was held stationary until up to 5 frames of channel data were acquired by the photoacoustic imaging system. If a valid estimate of the needle tip position was obtained during that time, the FSM returned to the "Center" state. Otherwise, the FSM entered the "Search" state. In this state, the robot end effector was moved in a 2D spiral search pattern parameterized by an amplitude A and an angular frequency ω.
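A simplified sketch of this finite state machine is given below; the spiral search velocity is supplied externally, and the class interface, transition bookkeeping, and naming are assumptions rather than the system's actual ROS implementation.

```python
# Hedged sketch of the Center/Wait/Search state machine of Fig. 4.
class ServoingFSM:
    """Return a commanded (x, y, z) velocity in frame U for each channel-data frame."""

    CENTER, WAIT, SEARCH = "Center", "Wait", "Search"

    def __init__(self, pid, max_wait_frames=5):
        self.pid = pid                         # e.g., the LateralPID sketch above
        self.state = self.CENTER
        self.max_wait_frames = max_wait_frames
        self.wait_count = 0

    def step(self, valid, lateral_error, spiral_velocity):
        if valid:                              # valid estimate: center the probe laterally above the tip
            self.state, self.wait_count = self.CENTER, 0
            return (self.pid.command(lateral_error), 0.0, 0.0)
        if self.state == self.CENTER:          # estimate just lost: hold the end effector and wait
            self.state, self.wait_count = self.WAIT, 1
            return (0.0, 0.0, 0.0)
        if self.state == self.WAIT:            # wait for up to five frames
            self.wait_count += 1
            if self.wait_count > self.max_wait_frames:
                self.state = self.SEARCH
            return (0.0, 0.0, 0.0)
        # Search: follow an externally computed 2D spiral velocity (lateral, elevation).
        return (spiral_velocity[0], spiral_velocity[1], 0.0)
```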

[068] The commanded velocity was then converted to the base frame B of the robot using the equation:

v_B(n) = T_E(n) T_U v_U(n),

where T_U is the transformation from the ultrasound probe frame U to the robot end effector frame E and T_E(n) is the instantaneous transformation from the frame E to the robot base frame B. The commanded velocity v_B(n) was then transmitted to the internal velocity controller of the robot over the ROS topic for velocity commands to which the controller subscribed.

TABLE I

RANGE AND INCREMENT SIZES OF SIMULATION VARIABLES

[069] B. Training the Convolutional Neural Network

[070] Simulations that mimic the physics of photoacoustic wave propagation offer the ability to generate training data without the time-intensive process of experimentally gathering and hand-labeling the large datasets. This ease of data generation makes simulations a powerful tool in the context of deep learning. To train the CNN for Process B, 20,000 frames of photoacoustic channel data were generated using the k-Wave toolbox in MATLAB. We simulated a single source of diameter 0.1 mm and 4-6 artifacts in each image. One of the artifacts could be anywhere in the image to simulate a reflection artifact and maintain consistency with previous implementations. The remaining artifacts were constrained to the range 1 mm to 10 mm below the source to simulate the reverberation artifacts, as observed in Fig. 5(a), which shows one of the acquired channel data frames used as a reference to generate our training dataset. The ranges and increment values of our simulation variables are listed in Table I. We simulated a discrete ultrasound probe model with a sampling frequency of 11.88 MHz, an aperture of 128 elements, an element width of 0.25 mm, and an inter-element spacing of 0.05 mm. These parameters were selected to match the specifications of the Verasonics P4-2v probe to improve network performance. An example of our simulated training data is shown in Fig. 5(b).
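The geometric constraints described above can be sketched as a sampling routine like the one below; the field-of-view limits are placeholder assumptions (Table I is not reproduced here), and the k-Wave acoustic propagation step that converts this geometry into channel data is omitted.

```python
# Illustrative geometry sampling for one simulated training frame; limits are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample_frame_geometry(x_lim_mm=(-19.0, 19.0), z_lim_mm=(5.0, 60.0)):
    """Sample one 0.1-mm source and 4-6 artifact positions (x, z) in mm."""
    source = np.array([rng.uniform(*x_lim_mm), rng.uniform(*z_lim_mm)])
    n_artifacts = int(rng.integers(4, 7))      # 4-6 artifacts per frame
    # One reflection-like artifact may appear anywhere in the field of view.
    artifacts = [np.array([rng.uniform(*x_lim_mm), rng.uniform(*z_lim_mm)])]
    # The remaining reverberation-like artifacts lie 1-10 mm below the source.
    for _ in range(n_artifacts - 1):
        artifacts.append(source + np.array([0.0, rng.uniform(1.0, 10.0)]))
    return source, artifacts
```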

[071] The Detectron platform was utilized for training and validation. The network was initialized with pre-trained ImageNet weights and trained on 80% of the simulated images. The remaining 20% of the images were used for network validation. Finally, the Detectron-ROS package was utilized to incorporate the trained network into Process B of the visual servoing system.

[072] III. EXPERIMENTAL METHODS

[073] A. Probe Centering Experiment

[074] The experimental setup for the probe centering experiments is shown in Fig. 6. These experiments were implemented to estimate the probe centering and needle tracking errors of the two processes (i.e., A and B) for needle tip detection, similar to previous experiments implemented with a segmentation-based photoacoustic visual servoing system. The choices for each experimental trial included laser fluence (18.4 μJ/cm2 or 49.5 μJ/cm2), needle tip detection process (Process A or B), and imaging environment (plastisol phantom or ex vivo chicken breast). There were 9 probe centering trials per fluence, per process, per imaging environment.

[075] At the start of each experimental trial, the translation stage was reset to 0 mm. The fiber-needle pair was inserted into the chosen imaging environment. The ultrasound probe was placed on the surface of the imaging environment, with the imaging plane of the probe placed to contain as much of the intended trajectory of the needle tip as possible. The probe was then manually displaced distances of 2-10 mm from the needle tip in 2 mm increments in the lateral probe dimension, followed by initiation of visual servoing with Process A or B.

[076] The visual servoing system was executed to center the probe above the needle tip. If the needle tip detection process output 5 consecutive valid estimates of the needle tip position (i.e., d(n) = 1 for Process A and d(n) > 0.7 for Process B), the trial was marked as a success. The mean of the lateral components of those 5 valid estimates p(n) was computed, and the magnitude of this mean was output as the probe centering error. If 5 consecutive valid readings could not be obtained (i.e., d(n) = 0 for Process A and d(n) < 0.7 for Process B), the trial was marked as a failure. The mean and standard deviation of the probe centering errors were computed for each process and imaging environment.

[077] B. Needle Tip Tracking Experiment

[078] The same setup shown in Fig. 6 and described in Section III-A was used for the needle tip tracking experiments, using the same choices for each experimental trial. There were similarly 9 needle tracking trials per fluence, per process, per imaging environment. After successfully centering the probe on the needle tip (as defined in Section III-A), the translation stage was used to move the needle tip in 2 mm increments along the lateral dimension of the probe. At each position, the output of the needle tip detection process was observed. If the process output five consecutive valid estimates of the needle tip position (i.e., d(n) = 1 for Process A, and d(n) > 0.7 for Process B), then the position was marked as a success. The needle tracking error was then computed using the equation:

e = ||(B_f − B_i) − s_n||,

where e, B_i, B_f, and s_n are the needle tracking error, the initial robot end effector position, the final robot end effector position, and the measured displacement of the needle tip, respectively, in the robot base frame B.
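Under the reconstruction above, the error computation reduces to a single vector norm, sketched below with illustrative names; all inputs are 3-vectors expressed in the robot base frame B.

```python
# Sketch of the needle tracking error e = ||(B_f - B_i) - s_n|| as reconstructed above.
import numpy as np

def needle_tracking_error(b_initial, b_final, needle_displacement):
    """b_initial, b_final: end effector positions; needle_displacement: measured tip displacement."""
    return float(np.linalg.norm((np.asarray(b_final) - np.asarray(b_initial))
                                - np.asarray(needle_displacement)))
```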

[079] If five consecutive valid readings could not be obtained (i.e., d(n) = 0 for Process A and d(n) < 0.7 for Process B), the position was marked as a failure. The failure rates of Processes A and B were compared to assess the robustness of each algorithm.

[080] IV. RESULTS

[081] Table II summarizes the mean and standard deviation of probe centering errors for 90 trials per Process A or B, implemented with either the plastisol phantom or the ex vivo tissue. For each imaging environment, the probe centering errors of Processes A and B were within 0.1 mm of each other. The probe centering errors were similarly within 0.1 mm across the two imaging environments.

TABLE II

PROBE CENTERING ERRORS

[082] Fig. 7 shows the mean and standard deviation of the needle tracking errors for 18 trials per process, imaging environment, and lateral shift value. Process A produced needle tracking errors ranging 0.59-5.36 mm with a mean of 2.63 mm across all phantom trials and ranging 1.47-2.34 mm with a mean of 1.96 mm across all ex vivo tissue trials. Process B generally produced lower needle tracking errors than Process A, ranging 0.65-1.03 mm with a mean of 0.85 mm across all phantom trials and ranging 0.46-1.39 mm with a mean of 0.88 mm across all ex vivo tissue trials.

[083] Table III lists the failure rates during needle tip tracking. For multiple trials with Process A, a reflection artifact that formed a larger bright region than the needle tip was incorrectly labeled as the needle tip. This mislabeling caused a majority of the observed failures of Process A in both the phantom and the ex vivo tissue. Process B generally produced lower failure rates than Process A, with a mean improvement of 60.6% across all lateral shifts and both imaging environments. The highest failure rates of 1.85% and 3.70% for Process B were observed at a lateral offset of 10 mm in the phantom and ex vivo tissue environments, respectively. These failure locations are consistent with previous reports demonstrating increased CNN failure rates as lateral offset from the center of an image increases, due to a reduced number of source examples on the image periphery.

TABLE III

NEEDLE TRACKING FAILURE RATES

[084] We additionally note the <100 ms execution time requirement for Process B in order to achieve the same 10 Hz frame rate previously demonstrated with visual servoing systems using iterations of Process A. This requirement is dictated by the 10 Hz laser pulse repetition frequency of the photoacoustic imaging system. The mean ± one standard deviation of execution times from 36 trials of both experiments described above with Process B were 75.2 ± 12.8 ms and 73.9 ± 13.2 ms per channel data frame with the phantom and ex vivo tissue, respectively.

[085] V. DISCUSSION

[086] The results presented in this example highlight the potential of a CNN to provide an alternative input to command robotic visual servoing systems. This potential was demonstrated with a system composed of a Verasonics ultrasound engine, which was not used in any previous work testing similar CNN architectures. It is promising that the presented needle tracking errors are comparable to the 0.40 ± 0.22 mm point source location errors obtained by others with an Alpinion E-CUBE 12R ultrasound scanner and an L3-8 probe. This similar success indicates that the previously proposed deep learning methods for photoacoustic point source detection are generalizable across multiple imaging system platforms.

[087] We identified three advantages of integrating this novel deep learning approach with a robotic photoacoustic visual servoing system, when compared to the amplitude-based image segmentation approach: (1) lower tool tip tracking failure rates in the presence of reflection artifacts, (2) reduced needle tracking errors across different imaging environments, and (3) maintenance of 10 Hz frame rates despite increased algorithmic complexity. Regarding the first advantage, the sensitivity of the beamforming techniques to reflection artifacts (e.g., caused by bone) can lead to uncertain and potentially hazardous robot arm movements during surgical procedures, which is a major concern for the steps required to complete the segmentation-based visual servoing approach (i.e., Process A). Instead of adding successive layers of complexity to the beamformer or segmentation algorithm to account for these artifacts and features, the deep learning approach (i.e., Process B) trains a CNN to distinguish between true sources and reflection artifacts in the raw channel data, thus mitigating the introduction of misclassification errors.

[088] To appreciate the second advantage, the reduced mean needle tracking errors with the deep learning approach (i.e., 0.85 mm and 0.88 mm in phantom and ex vivo tissue, respectively) can be compared to the needle tracking errors obtained with the segmentation approach (i.e., 2.63 mm and 1.96 mm in the phantom and ex vivo tissue, respectively). The overall mean improvement with Process B translates to 67.7% and 55.3% reductions in needle tracking errors in the phantom and ex vivo tissue, respectively.

[089] With regard to the third advantage of achieving 10 Hz frame rates, the deep learning approach has a higher algorithmic complexity compared to the image segmentation approach (i.e., O(MNS) for Process A vs. O(N_conv D² N_ch² MN) for Process B, where M, N, S, and N_conv are the numbers of receiving elements, samples acquired per frame, scan lines in the beamformed image, and convolutional layers in the CNN, respectively, D is the size of the largest convolutional kernel, and N_ch is the maximum filter dimension). This constraint would ordinarily compromise the maximum achievable frame rate. However, the use of GPUs combined with recent deep learning advances allows us to benefit from the deep learning approach without compromising the desired frame rate dictated by the 10 Hz laser pulse repetition frequency.

[090] Future possible improvements to the proposed deep learning visual servoing system include mitigating tracking errors obtained with larger lateral displacements from the center of the probe and increasing the number of degrees of freedom for the motion of the robot end effector. Regarding tracking error mitigation, an increase in the lateral displacement of the source from the center of the probe during the ex vivo experiments resulted in needle tracking errors increasing from 0.46 mm to 1.39 mm (Fig. 7) and needle tracking failure rates increasing to a maximum of 3.70% (Table III). This increase in error may potentially be resolved by improving the training process and by increasing the number of training images containing sources with large lateral offsets. Regarding the tracking degrees of freedom, the nominal motion of the robot end effector is limited to 1 dimension in our visual servoing system, and a second dimension is used to search for and find the tool tip when it is not in the imaging plane of the probe. While these two degrees of freedom sufficiently achieve the desired end result, future work will determine the extent to which additional degrees of freedom are necessary to achieve more complicated path planning outcomes with the proposed deep learning-based photoacoustic visual servoing system.

[091 ] VI. CONCLUSION

[092] This example is the first to demonstrate the integration of deep learning-based techniques with photoacoustic-based robotic visual servoing of needle tips. The deep learning-based needle tip detection process is more accurate (e.g., 0.46-1.39 mm needle tracking errors) and produces lower failure rates (e.g., 0-3.70%) when compared to the alternative photoacoustic image segmentation-based visual servoing system (which produced tracking errors and failure rates of 0.59-5.36 mm and 0-7.02%, respectively). The deep learning-based system additionally maintains the frame rates achieved with the segmentation-based approach. Overall, these results demonstrate the promise of a robotic photoacoustic visual servoing system that bypasses traditional image formation and segmentation steps, instead supplying robot controller input based on details contained within raw photoacoustic sensor data. While this work focuses on tracking needle tips, the system described herein can be extended to track the tips of catheters and a multitude of other surgical tools that are critical to automated surgeries and interventional procedures.

[093] EXAMPLE 2: Theoretical Framework to Predict Generalized Contrast-to-Noise Ratios of Photoacoustic Images With Applications to Computer Vision

[094] I. INTRODUCTION

[095] The integration of computer vision with medical imaging is a critical subfield of modern healthcare. Applications of computer vision algorithms in healthcare include classifying disease in medical images, segmenting image features of interest to alert surgeons during an operation, and tracking targets of interest in robot-assisted procedures, which has the potential to reduce infection rates, shorten hospital stays, and improve postoperative function. These advantages are applicable across multiple medical imaging modalities (e.g., magnetic resonance, X-ray, ultrasound, and photoacoustic imaging), and performance characterization is important for overall system design.

[096] Amplitude-based segmentation of a target from the background of an image is an example computer vision-based task with widespread applications. Others have used this technique to track needle tips with an ultrasound-based robotic visual servoing system. Ultrasound has numerous advantages over other imaging modalities, including its low cost, high frame rates, and the lack of ionizing radiation associated with other modalities, such as fluoroscopy. However, ultrasound imaging fails in acoustically challenging environments characterized by significant acoustic clutter, sound scattering, and sound attenuation. Under these conditions, the performance of computer vision algorithms (e.g., target segmentation) inherently suffers.

[097] Photoacoustic imaging is a more recent imaging modality that combines optical transmission with ultrasound reception to overcome the traditional ultrasound imaging limitations noted above. The photoacoustic imaging process is initiated when a light source (typically a nanosecond-pulsed laser) illuminates a region of interest. The illuminated tissue absorbs the light, undergoes thermal expansion, and produces a pressure gradient that is received by an ultrasound transducer, then reconstructed into an image. Recent advances in photoacoustic imaging demonstrate promise to mitigate the risks associated with complex surgical procedures, including liver resections, spinal fusion surgeries, and gynecological surgeries. Previous work from our group also demonstrates successful tracking of needle tips and catheter tips using a photoacoustic-based robotic visual servoing system. The performance of our visual servoing system directly depends on the performance of the target segmentation algorithm, which, in turn, depends on the detectability of targets in the acquired photoacoustic images. The detectability of targets in photoacoustic images has traditionally been quantified using image quality metrics, such as signal-to-noise ratio (SNR), contrast, and contrast-to-noise ratio (CNR). However, these metrics are not bounded, are sensitive to common image manipulation techniques (e.g., dynamic range adjustment and thresholding), and have substantial variability when comparing reported numbers to the detectability of photoacoustic image targets.

[098] Others developed the generalized contrast-to-noise ratio (gCNR) to assess the probability of lesion detection in ultrasound images and presented a theoretical framework to predict the gCNR of ultrasound images. Subsequent contributions from our group demonstrated the applicability of gCNR to photoacoustic images created with delay-and-sum (DAS), short-lag spatial coherence (SLSC), generalized coherence factor weighting applied to DAS, and minimum variance beamformers. However, these contributions used histogram-based approximations of the ground truth. These approximations, which we refer to as measured gCNR, are sensitive to the selection of bin widths. To overcome this limitation, we require a theoretical framework to predict the true gCNR in photoacoustic images. The ultrasound gCNR theory is insufficient for photoacoustic images for two primary reasons. First, photoacoustic target characteristics differ from those of ultrasound targets. The exponential probability distribution used in the ultrasound framework does not sufficiently model the target power distribution in photoacoustic images. Second, the theoretical ultrasound framework is not equipped to handle the multiple decision boundaries that exist when classifying photoacoustic images.

[099] In particular, the gCNR metric depends on target and background distributions, which depend on photoacoustic system parameters such as laser energy, receiver characteristics, and image processing algorithms implemented to improve image quality. Our previous conference papers individually demonstrated a subset of these dependencies, including the relationships among gCNR, laser energy, and frame averaging, relationships between gCNR and channel SNR, and relationships between image quality metrics (i.e., gCNR, CNR, contrast, and SNR) and target segmentation. Considering that gCNR was developed to assess the probability of detecting targets of interest and to overcome limitations of traditional image quality metrics, we hypothesize that gCNR has superior potential to predict the performance of a photoacoustic-based visual servoing system in comparison to more traditional alternatives.

[0100] To investigate this hypothesis, we first developed a general framework to relate system parameters (e.g., laser energy, channel SNR, and target and background power distributions) to computer vision-based task performance. The three additional contributions that follow from this general framework include: 1) a theoretical derivation to predict the gCNR of photoacoustic images using models of the target and background power distributions; 2) characterization of the connections among system parameter settings, image characteristics, and the computer vision goal of performing automated target segmentation; and 3) implementation of theory and characterizations to predict target segmentation performance from experimental photoacoustic images.

[0101] The remainder of this example is organized as follows: Section II presents our general framework to describe the relationships among the components of a photoacoustic visual servoing system, followed by our theoretical derivation of gCNR predictions in photoacoustic images and our theoretical description of the thresholding step typically implemented prior to initiating a computer vision task. Section III details the simulation of photoacoustic targets and acquisition of experimental and in vivo data, followed by methods to implement the theoretical gCNR predictions and validate these predictions with experimental measurements. Section IV details the results of the presented methods. Section V discusses the implications of this work and its future potential. Finally, Section VI concludes this article with a summary of our findings.

[0102] II. THEORY

[0103] A. General Framework

[0104] The major components of a photoacoustic-based computer vision system can be represented as a directed graph highlighting multiple relationships among system properties, as illustrated in Fig. 8. There are four color-coded layers in the graph representing different properties of a photoacoustic-based computer vision system. The first layer represents baseline system parameters and imaging environment details, such as receiver characteristics, laser parameters, target dimensions, and target depth. The second layer corresponds to the signal characteristics observed in the acquired photoacoustic image. The third layer represents possible image quality metrics for photoacoustic images. The fourth layer represents the performance of a computer vision-based task (e.g., target segmentation) on the given photoacoustic image.

[0105] The nodes in each layer of Fig. 8 describe properties that are directly impacted by one or more nodes in the preceding layer. These dependencies are indicated by the directed lines (i.e., edges) which connect the related pairs of nodes. The solid black lines denote relationships reported in this example, including: 1) the relationship between channel SNR and background power; 2) the relationship between laser energy and target power; 3) the relationships among target power, background power, and gCNR (with mathematical descriptions for these relationships reported in Section II-B); and 4) the relationship between image quality metrics (i.e., SNR, CNR, or gCNR) and photoacoustic target segmentation (with mathematical descriptions for the impact of thresholding prior to segmentation reported in Section II-C). The third and fourth relationships are foundational to predicting the performance of photoacoustic-based visual servoing of needle tips, catheter tips, and surgical tool tips. In addition to providing mathematical descriptions associated with these foundational relationships, all relationships indicated by the black lines in Fig. 8 will be demonstrated empirically in Section IV.

[0106] The gray lines in Fig. 8 denote connections that are expected to exist, with solid gray indicating known mathematical expressions that are not reported in this article and dashed gray indicating relationships which were empirically observed in this article, yet difficult to isolate and quantify due to multiple confounding effects on the resulting photoacoustic images.

[0107] B. Modeling gCNR Predictions as a Classification Task

[0108] 1) Approach Overview: The gCNR metric has been introduced and described as a normalized measurement of the highest achievable probability of success of an optimal two-class classifier operating on a given image. The associated expression is

gCNR = 1 − 2Pe,     (1)

where Pe is the probability of error of the classifier. Determining Pe requires characterization of this classifier, based on decision boundaries, which are computed from the points of intersection between the probability distribution functions (PDFs) of target and background powers. Considering that histogram-independent gCNR predictions depend on these power distribution models, an accurate gCNR prediction will demonstrate the accuracy of our chosen target and background power distribution models. Considering the additional dependencies on baseline system parameters and imaging environment details (located in the first layer of Fig. 8), accurate gCNR predictions will additionally establish the validity of our overall framework.

[0109] To predict the gCNR of a photoacoustic image, a four-step approach is used: 1) modeling the target power distribution (Section II-B2); 2) modeling the background power distribution (Section II-B3); 3) determining decision boundaries computed from the points of intersection between the target and background power distributions (Section II-B4); and 4) computing the gCNR predictions based on the decision boundaries (Section II-B5).

[0110] 2) Target Power Distribution: Others demonstrated that as a consequence of photoacoustic speckle, the envelope of photoacoustic radio frequency (RF) signals can be modeled by the Nakagami-m distribution. In this model, the signal power then follows the gamma distribution, which is a generalization of the exponential distribution employed to characterize ultrasound image targets in [24]. The PDF of the photoacoustic target power distribution, p1, can be written as

p1(x) = x^(k−1) e^(−x/θ) / (Γ(k) θ^k),  x ≥ 0,     (2)

where x is the target power, and k and θ are scalar parameters denoting the shape and scale of the gamma distribution, respectively. The mean of the target power distribution, μx, is

μx = kθ.     (3)

Note that when k = 1, the gamma distribution reduces to an exponential distribution.
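
A minimal sketch illustrating the gamma target power model in (2) and its mean in (3) with SciPy is shown below; the parameter values follow the example used later for Fig. 9 (k = 1.3, θ = 0.5).

    import numpy as np
    from scipy import stats

    k, theta = 1.3, 0.5                       # gamma shape and scale of the target power
    target_power = stats.gamma(a=k, scale=theta)

    # Mean of the gamma target power distribution equals k * theta, per (3).
    print(target_power.mean(), k * theta)     # 0.65 0.65

    # With k = 1 the gamma distribution reduces to an exponential distribution.
    x = np.linspace(0.0, 5.0, 6)
    print(np.allclose(stats.gamma(a=1.0, scale=theta).pdf(x),
                      stats.expon(scale=theta).pdf(x)))   # True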

[0111] 3) Background Power Distribution: We assume that the background of a photoacoustic image primarily contains thermal and electronic noise. Others demonstrated that thermal noise can be modeled using the complex normal distribution in ultrasound images created using DAS beamforming. Given the similar receiver hardware and software, it is reasonable to expect the thermal noise in DAS photoacoustic images to also follow the complex normal distribution. Therefore, the combined sum of the thermal and electronic noise can be modeled using a complex normal distribution. The real and imaginary (i.e., in-phase and quadrature, or IQ) components of the image noise are assumed to be independent and identically distributed with mean zero and variance σ². In this case, the amplitude of the background signal follows the Rayleigh distribution, and the background power can be modeled using the exponential distribution, which yields the following PDF:

p0(x) = (1/μ) e^(−x/μ),  x ≥ 0,     (4)

where p0 is the PDF of the background power distribution, x is the background power, and μ is the mean noise power given by the following expression:

μ = 2σ².     (5)

As an aside, to mathematically relate this background power distribution to the receiver characteristic of channel SNR (see Fig. 8), particularly when the photoacoustic imaging system acquires raw RF channel data, the associated channel noise is assumed to be Gaussian distributed with a mean of zero and a variance of σ²ch. The DAS beamforming process converts the channel data to an image by delaying the channel data and summing the delayed channel data across the received elements. Therefore, the following linear relationship between σ² and σ²ch is expected:

σ² = N σ²ch / 2,     (6)

where N is the number of receive elements, and the factor of 2 is introduced because of the conversion from RF to IQ data. Substituting (5) into (6) yields

μ = N σ²ch.     (7)

[0112] 4) Computing Decision Boundaries: The PDFs of the target and background power of a DAS photoacoustic image are represented by the orange and blue lines, respectively, in Fig. 9. The parameters corresponding to these target and background power distributions are k = 1.3, θ = 0.5, and μ = 0.8. The two points of intersection between the PDFs are highlighted by the black dashed lines in Fig. 9. The abscissae of these points of intersection form the set of decision boundaries, D, of an ideal two-class classifier for the sample image, defined as

D = {Em},     (8)

where Em denotes the abscissae of the points of intersection.

[0113] The solutions for Em can be computed by equating the target power distribution, p1, to the background power distribution, p0, as follows:

p1(Em) = p0(Em).     (9)

[0114] Substituting the expressions for the target and background probability distributions from (2) and (4) into (9) yields the following expression for the decision boundaries:

Em^(k−1) e^(−Em/θ) / (Γ(k) θ^k) = (1/μ) e^(−Em/μ).     (10)

[0115] Equation (10) can be satisfied with four possible cases of the decision boundary points, depending on the values of k, θ, and μ.

[0116] Case 1 (k ≠ 1 and θ ≠ μ): In this first case, there are up to two decision boundaries, as shown in Fig. 9. To compute these decision boundaries, (10) may be simplified as

Em^(k−1) e^(−aEm) = Γ(k) θ^k / μ,     (11)

where

a = 1/θ − 1/μ.

[0117] For nonzero values of a (i.e., θ ≠ μ), (11) is satisfied by the Lambert W function as

Em = −((k − 1)/a) W( −(a/(k − 1)) (Γ(k) θ^k / μ)^(1/(k−1)) ).     (12)

[0118] Considering that (12) does not yield a simple closed-form expression for Em, numerical values for Em may be computed using the lambertw function in MATLAB. The complex outputs of this function can then be discarded to obtain up to two decision boundaries for the ideal two-class classifier when k ≠ 1 and θ ≠ μ.
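
A minimal sketch of this Case 1 computation using scipy.special.lambertw in place of MATLAB's lambertw is shown below. The algebraic rearrangement of (10) into the Lambert W form used here is our own and may differ from the parametrization of (11) and (12); it is, however, checked against the Fig. 9 example (k = 1.3, θ = 0.5, μ = 0.8).

    import numpy as np
    from scipy.special import gamma as gamma_fn, lambertw

    def case1_decision_boundaries(k, theta, mu):
        """Real, nonnegative intersections of the gamma target power PDF (2) and
        the exponential background power PDF (4) for k != 1 and theta != mu,
        obtained by rewriting (10) as x * exp(-b*x) = C**(1/(k-1)) and solving
        with the two real branches of the Lambert W function."""
        a = 1.0 / theta - 1.0 / mu          # exponent difference appearing in (10)
        C = gamma_fn(k) * theta**k / mu     # constant side of (10)
        b = a / (k - 1.0)
        arg = -b * C ** (1.0 / (k - 1.0))
        boundaries = []
        for branch in (0, -1):              # both real branches of Lambert W
            w = lambertw(arg, branch)
            if abs(w.imag) < 1e-12:         # keep real-valued solutions only
                x = float(-w.real / b)
                if x >= 0.0:
                    boundaries.append(x)
        return sorted(boundaries)

    # Fig. 9 example: k = 1.3, theta = 0.5, mu = 0.8.
    print(case1_decision_boundaries(1.3, 0.5, 0.8))   # approximately [0.09, 1.08]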

[0119] Case 2 (k ≠ 1 and θ = μ): In this second case, (10) can be simplified to the following expression:

Em^(k−1) = Γ(k) θ^(k−1).     (13)

[0120] Because Em is a decision boundary corresponding to a power value sampled from the image, we are interested in nonnegative, real-valued solutions to (13). The parameters of the target power distribution have the constraints k > 0 and θ > 0. From the constraint on the target shape parameter k, we obtain the related constraint Γ(k) > 0. With these constraints, (13) can be rearranged to obtain the following expression for the single decision boundary E0 when k ≠ 1 and θ = μ:

E0 = θ Γ(k)^(1/(k−1)).     (14)

[0121] Case 3 (k = 1 and θ ≠ μ): This third case is observed in images with low channel SNR. A decrease in the channel SNR of a photoacoustic image results in the background noise overwriting the target region, causing a similarity that allows both the target and background power distributions to be modeled using the exponential distribution. Therefore, the PDF described by (2) reduces to

p1(x) = (1/θ) e^(−x/θ),  x ≥ 0,     (15)

which is the same as the target model for ultrasound images. Substituting (15) and (4) into (9) yields the following closed-form expression for the single decision boundary E0 when k = 1 and θ ≠ μ:

E0 = (θμ / (μ − θ)) ln(μ/θ).     (16)

[0122] Case 4 (k = 1 and θ = μ): The fourth case occurs in photoacoustic images with sufficiently low channel SNR such that the target and background regions are indistinguishable. Similar to Case 3, the target power distribution is modeled as an exponential distribution with mean θ. In this trivial case, the target and background power distributions coincide completely, resulting in infinitely many points of intersection between the distributions. As a result, no selection of decision boundaries can distinguish between the target and background signals under the conditions k = 1 and θ = μ.

[0123] 5) Predicting gCNR Using Decision Boundaries: Predictions of gCNR can be computed for the four cases described above. In each case, we assume that the target and background regions have equivalent areas.

[0124] For Case 1, Pe in (1) expands to

Pe(E0, E1) = (1/2) [PF(E0, E1) + PM(E0, E1)],     (17)

where Pe(E0, E1) is the probability of error, and PF(E0, E1) and PM(E0, E1) are the probabilities of false positives and missed detections, respectively, for the given pair of decision boundaries (E0, E1). The probabilities PF(E0, E1) and PM(E0, E1) are represented by the blue and orange shaded regions, respectively, in Fig. 9. Expressions for these probabilities are derived based on the decisions of the optimal two-class classifier in the intervals [0, E0], (E0, E1), and [E1, ∞). In the interval (E0, E1), the optimal decision is to assign the target class to samples. As a result, the probability of false positives PF(E0, E1) is

PF(E0, E1) = ∫_{E0}^{E1} p0(x) dx.     (18)

[0125] In the intervals [0, E0] and [E1, ∞), the optimal decision is to assign the background class to samples. Therefore, the probability of missed detections PM(E0, E1) is

PM(E0, E1) = ∫_{0}^{E0} p1(x) dx + ∫_{E1}^{∞} p1(x) dx.     (19)

[0126] Substituting (18) and (19) into (17) yields

Pe(E0, E1) = (1/2) [ ∫_{E0}^{E1} p0(x) dx + ∫_{0}^{E0} p1(x) dx + ∫_{E1}^{∞} p1(x) dx ].     (20)

[0127] The resulting expression for the gCNR of a photoacoustic target (after substitution into (1)) is

gCNR_PA1 = 1 − ∫_{E0}^{E1} p0(x) dx − ∫_{0}^{E0} p1(x) dx − ∫_{E1}^{∞} p1(x) dx.     (21)
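
With the decision boundaries from Case 1, the integrals in (18), (19), and (21) reduce to differences of the gamma and exponential cumulative distribution functions. A minimal sketch follows, assuming (as stated above) equal target and background areas; the numerical decision boundaries are taken from the Fig. 9 example.

    from scipy import stats

    def predict_gcnr_case1(k, theta, mu, E0, E1):
        """gCNR prediction for Case 1 via (21): the integrals in (18) and (19)
        are evaluated with the exponential (background) and gamma (target) CDFs."""
        p0 = stats.expon(scale=mu)          # background power model, per (4)
        p1 = stats.gamma(a=k, scale=theta)  # target power model, per (2)
        P_F = p0.cdf(E1) - p0.cdf(E0)       # false positives on (E0, E1), per (18)
        P_M = p1.cdf(E0) + p1.sf(E1)        # missed detections outside (E0, E1), per (19)
        return 1.0 - P_F - P_M              # gCNR = 1 - 2*Pe with Pe = (P_F + P_M)/2

    # Using the Fig. 9 parameters and decision boundaries of roughly 0.09 and 1.08:
    print(predict_gcnr_case1(1.3, 0.5, 0.8, 0.0915, 1.081))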

[0128] For Case 2, with the assumption that photoacoustic targets have a higher amplitude than the background, we classify samples in the intervals [0, E0) and [E0, ∞) as belonging to the background and target, respectively. Therefore, the probabilities of error are

PF(E0) = ∫_{E0}^{∞} p0(x) dx,
PM(E0) = ∫_{0}^{E0} p1(x) dx,
Pe(E0) = (1/2) [PF(E0) + PM(E0)],

and the resulting expression for the gCNR of a photoacoustic target is

gCNR_PA2 = 1 − ∫_{E0}^{∞} p0(x) dx − ∫_{0}^{E0} p1(x) dx.     (25)

[0129] The precise value of the integral in (25) depends on the values of k, θ, and μ.

[0130] For Case 3, (4), (15), and (16) are substituted into (25) to obtain the following closed-form expression for gCNR:

gCNR_PA3 = e^(−E0/θ) − e^(−E0/μ),     (26)

where E0 is given by (16).

Comparing (4) and (15) with (28) and (27) in [24], we observe that photoacoustic images with low channel SNR values have statistical properties similar to ultrasound images. As a result, the final expression for gCNR obtained in (26) has a similar format to (35) in [A. Rodriguez-Molares et al., "The generalized contrast-to-noise ratio: A formal definition for lesion detectability," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 67, no. 4, pp. 745-759, Apr. 2020], which was derived for ultrasound images of hypoechoic targets. However, photoacoustic signals tend to be brighter than their surrounding environments, which more closely aligns with the hyperechoic ultrasound target case described in the same reference.

[0131] For Case 4, no optimal decision boundaries can be obtained for a two-class classifier, which achieves an accuracy of at most 50% on these images using any available (nonoptimal) decision boundary. Under these conditions, it is trivial to prove that the gCNR of the underlying photoacoustic image is zero.

[0132] C. Effect of Thresholding on gCNR

[0133] As reported elsewhere, applying a threshold of t0 to a photoacoustic image results in pixels with power values less than t0 in both the target and background regions of the image being set to zero. Fig. 10 shows the effect of the thresholding operation on the sample target and background distributions described in Section II-B4 and plotted in Fig. 9 with t0 = 0.5. For target and background pixel power values x < t0, the original target and background distributions in Fig. 9 are replaced by the Dirac delta functions (∫_{0}^{t0} p1(y) dy) δ(x) and (∫_{0}^{t0} p0(y) dy) δ(x), respectively, in Fig. 10. The target and background power distributions remain unchanged for x > t0. The set of decision boundaries Dt of the optimal classifier for the thresholded image is formulated from t0 and the set D defined in (8).

As the threshold value t0 increases, the separability between the target and background power distributions is expected to decrease, resulting in an expected decrease in gCNR.

[0134] III. METHODS

[0135] A. Validation Experiments

[0136] 1) Datasets and Initial Imaging Environments: We acquired three datasets (i.e., simulated, experimental, and in vivo) to validate the theoretical derivations of gCNR predictions with gCNR measurements across a range of channel SNR values. The simulated dataset, generated using the k-Wave toolbox, consisted of 6-mm-diameter photoacoustic targets residing at a depth of 15 mm. Optical absorbers of size 6 μm were placed in the target region with a spatial density of 135 absorbers/mm². This absorber density lies between the moderately dense and dense cases of 3.18 and 318.31 absorbers/mm², respectively. A transducer was defined in the simulation with a 0.3-mm pitch, 0.06-mm kerf, 128 elements, a center frequency of 5 MHz, and a bandwidth of 2-8 MHz. These parameters were chosen to match the Alpinion L3-8 transducer (Seoul, South Korea). Ten channel data frames were simulated with absorber positions varied within the target region.

[0137] The experimental dataset was acquired using the setup shown in Fig. 11(a). The photoacoustic imaging system used for this dataset consisted of an Alpinion E-CUBE 12R ultrasound system connected to an Alpinion L3-8 linear array transducer, and a Phocus Mobile laser (OPOTEK, Carlsbad, CA, USA) connected to a 5-mm-diameter optical fiber bundle. The free end of the fiber bundle was submerged in a water bath. The laser was pulsed at a fixed repetition rate of 10 Hz, with a fixed wavelength of 760 nm. The laser energy at the fiber bundle tip was varied from 15 to 68 mJ, as measured with an Ophir PE50BF-DIFH-C energy meter (North Logan, UT, USA), which was used for all reported energy measurements unless otherwise stated. The laser output a trigger signal with each pulse to synchronize the acquisition of data by the ultrasound system. Ten frames of photoacoustic data were acquired with the tip of the fiber bundle at a depth of 35 mm in the imaging plane of the transducer.

[0138] The in vivo dataset was acquired during an in vivo cardiac catheterization procedure conducted on a domestic swine (Sus domesticus) weighing 78 lbs, as shown in Fig. 11(b). All in vivo procedures were approved by the Johns Hopkins Institutional Animal Care and Use Committee. The swine was anesthetized and subsequently intubated to provide gas anesthesia throughout the surgery. Respiration was controlled with a ventilator that supplied medical air. The blood pressure and heart rate of the swine were monitored throughout the surgery and imaging procedures. The photoacoustic imaging system (consisting of the Alpinion E-CUBE 12R ultrasound scanner and Opotek Phocus Mobile laser) was placed next to the operating table. The ultrasound system was connected to a 64-element Alpinion SP1-5 phased array transducer with a center frequency of 3 MHz and a sampling frequency of 40 MHz. One end of a 1-mm-diameter optical fiber was inserted into a 2.5-mm-diameter steerable cardiac catheter. The other end of the fiber was interfaced with the laser. An incision was created to gain access to the femoral vein. The catheter was then guided through the inferior vena cava toward the heart with the ventricular septum as the final destination. A fixed laser wavelength of 750 nm was chosen to visualize the catheter tip inside the heart. Photoacoustic data were acquired with laser energies at the fiber tip varied from 2.9 to 608.5 μJ. Data from this animal study were previously described.

[0139] Each of the simulated, experimental, and in vivo datasets consisted of ten frames of raw RF channel data. Gaussian-distributed noise was added to each channel data frame in these three datasets to obtain channel SNR values in the range -40 to 40 dB with increments of 1 dB. Ten noisy channel data frames were generated per channel SNR level for each of the simulated or acquired channel data frames, resulting in 8100 noisy channel data frames per dataset.

[0140] The k-Wave toolbox was utilized to add noise to raw RF channel data frames with standard deviation σch defined using the following equation:

σch = RMS(sch) · 10^(−SNRch/20),     (28)

where sch is the raw channel data signal prior to the addition of noise, and SNRch is the desired channel SNR value in dB.
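
A minimal NumPy sketch of this noise-addition step is shown below, assuming the RMS-based channel SNR definition in (28); the k-Wave utility actually used may differ in its details, and the example frame is synthetic.

    import numpy as np

    def add_channel_noise(channel_data, snr_db, rng=None):
        """Add zero-mean Gaussian noise to raw RF channel data so that the ratio
        of mean signal power to noise variance equals the requested channel SNR
        in dB (RMS-based definition assumed, consistent with (28))."""
        if rng is None:
            rng = np.random.default_rng()
        signal_power = np.mean(channel_data.astype(np.float64) ** 2)
        sigma_ch = np.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
        return channel_data + rng.normal(0.0, sigma_ch, size=channel_data.shape)

    # Example: one noisy realization per SNR level from -40 to 40 dB in 1-dB steps.
    frame = np.random.default_rng(0).normal(size=(128, 2048))   # stand-in RF frame
    noisy = [add_channel_noise(frame, snr) for snr in range(-40, 41)]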

[0141] 2) Laser Energy Variations: The relationship between gCNR and laser energy was investigated with two additional imaging environments (i.e., a plastisol phantom and an ex vivo caprine heart shown in Fig. 11(c) and (d), respectively). The photoacoustic imaging system used for these experiments consisted of an LS-series pulsed laser diode (PLD) (Laser Components, Bedford, NH, USA) and an Alpinion E-CUBE 12R ultrasound scanner connected to an Alpinion SP1-5 transducer. One end of a 2-mm-diameter fiber bundle was interfaced with the laser. The other end was inserted into each imaging environment. A function generator transmitted a trigger signal to both the ultrasound scanner and the PLD at a frequency of 20 Hz. Two 5-V channels from a regulated power supply unit (GW Instek, New Taipei City, Taiwan) were connected to the PLD to control the pulsewidth and peak power of the laser. The laser energy settings were varied to maximize the range of achievable gCNR in each environment, resulting in 99 unique pairs of pulsewidth and peak energy for the phantom environment and 88 unique pairs for the ex vivo environment. For each laser energy setting, 100 frames of photoacoustic data were acquired by the ultrasound system. The corresponding laser energies at the fiber bundle tip were then measured using a J-10MT-10KHZ energy meter (Coherent, Inc., Santa Clara, CA, USA), resulting in ranges of 0.07-10.95 and 0.37-11.36 μJ for the phantom and ex vivo environments, respectively. Unless otherwise stated, this entire energy range is reported in our results for this experiment. The photoacoustic imaging system consisting of the PLD and Alpinion ultrasound scanner was previously used by others to demonstrate a photoacoustic-based, teleoperated, robotic surgical system.

[0142] 3) Segmentation Accuracy Experiments: The relationships between the accuracy of the target segmentation process and selected image quality metrics (i.e., gCNR, SNR, and CNR) were investigated with the in vivo dataset described in Section III-A1 as well as with an alternative plastisol phantom shown in Fig. 11(e). The photoacoustic imaging system for this phantom experiment consisted of the Phocus Mobile laser and Alpinion E-CUBE 12R scanner connected to the Alpinion L3-8 transducer described in Section III-A1. A 1-mm-diameter optical fiber was inserted into the phantom. The other end of the optical fiber was interfaced with the laser. Photoacoustic channel data were acquired using a fixed wavelength of 750 nm, a pulse repetition rate of 10 Hz, and laser energies at the fiber tip ranging from 13.5 μJ to 2.3 mJ.

[0143] B. Image Formation, Analysis, and Segmentation

[0144] For each dataset described in Section III-A, photoacoustic images were reconstructed with DAS beamforming, followed by envelope detection, normalization, log compression, and thresholding prior to image display. For each beamformed photoacoustic image, circular regions of interest (ROIs) were selected within the target and background using either a manual or automated approach. The manual ROI approach was implemented for experiments demonstrating relationships between nodes in the first, second, and third layers of Fig. 8. Automated ROI selection was used to demonstrate relationships between nodes in the third and fourth layers of Fig. 8.

[0145] To create manual ROIs, two circles of equal area and depth were placed inside and outside the envelope-detected photoacoustic target. For the simulated, experimental, and in vivo datasets described in Section III-A1, the ROI diameters were 6, 4, and 3 mm, respectively, with the center of the background ROI laterally shifted by 10, 8.5, and 10 mm, respectively, from the center of the target ROI. For the phantom and ex vivo datasets described in Section III-A2, the ROI diameters were 1.6 and 0.8 mm, respectively, with the center of the background ROI laterally shifted 5 mm from the center of the target ROI. In each dataset, the diameter of the ROIs was selected to match the axial dimension of the photoacoustic target being imaged, while the lateral shift between the ROIs was selected to accommodate deviations in the target shape from the ideal circle while minimizing the distance between the ROIs.

[0146] To demonstrate relationships between nodes in the first and second layers of Fig. 8, the datasets described in Sections III-A1 and III-A2 were employed. For each image in these datasets, the target shape and scale parameters, k and θ, respectively, were estimated within the target ROI of the envelope-detected data, using closed-form expressions for estimating the parameters of a gamma distribution from random samples derived by Ye and Chen [43]. The mean background power (i.e., μ) was estimated as the mean of the squares of the samples in the background ROI of the envelope-detected data. The simulated, experimental, and in vivo datasets described in Section III-A1 were utilized to quantify the relationship between μ and channel SNR, which is included in the receiver characteristics node in Fig. 8. The phantom and ex vivo datasets described in Section III-A2 were utilized to quantify relationships between laser energy and the target parameters k and θ. A quadratic fit was implemented to describe the relationship between θ and laser energy in each imaging environment. The R² and root-mean-square error (RMSE) values were computed between the quadratic curve and the mean value of θ for each laser energy.
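
The closed-form estimators of Ye and Chen are not reproduced here; the sketch below instead fits the gamma shape and scale by maximum likelihood with SciPy, which serves the same purpose of estimating k, θ, and μ from the ROI samples. Function names and the synthetic self-check are ours.

    import numpy as np
    from scipy import stats

    def estimate_target_parameters(target_roi_envelope):
        """Estimate the gamma shape k and scale theta of the target power
        distribution from envelope-detected samples in the target ROI
        (maximum-likelihood fit with the location fixed at zero)."""
        power = np.asarray(target_roi_envelope, dtype=np.float64) ** 2
        k_hat, _, theta_hat = stats.gamma.fit(power, floc=0.0)
        return k_hat, theta_hat

    def estimate_background_mean_power(background_roi_envelope):
        """Estimate mu as the mean of the squared samples in the background ROI."""
        return float(np.mean(np.asarray(background_roi_envelope, dtype=np.float64) ** 2))

    # Self-check on synthetic data drawn from the assumed target model:
    env_t = np.sqrt(stats.gamma(a=1.3, scale=0.5).rvs(5000, random_state=1))
    print(estimate_target_parameters(env_t))   # close to (1.3, 0.5)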

[0147] To demonstrate relationships between gCNR and the preceding layers in Fig. 8, there are two possible options. First, the nodes in the second layer (i.e., target and background parameters) and the gCNR metric in the third layer can be connected with values of k, θ, and μ measured from experimental data, by inserting these parameters into the mathematical expressions derived in Sections II-B4 and II-B5 to predict gCNR. This method was used to predict the gCNR of the data described in Sections III-A1 and III-A2. This method was additionally leveraged to predict gCNR limits as a function of θ for minimum and maximum values of k and μ. Alternatively, to demonstrate the impact of system parameters in the first layer of Fig. 8 (e.g., channel SNR and laser energy) on the gCNR metric located in the third layer of Fig. 8, these indirect relationships were empirically characterized for different imaging environments using gCNR measurements.

[0148] To measure gCNR and validate gCNR predictions, true gCNR values were approximated using the histogram approach described by others. Histograms of the target and background signal powers within ROIs located in normalized DAS beamformed images were created, with bin widths computed as described elsewhere (see Appendix A for justification).

[0149] The histogram-based gCNR was then measured as

gCNRmeas = 1 − Σ_{k=1}^{Nb} min{ h1(xk), h0(xk) },     (29)

where xk is the mean value of the kth bin constructed from the input target and background signal power values, Nb is the number of bins of the histogram computed using a data-based method described elsewhere, and h1 and h0 are the normalized histogram values for the target and background, respectively.
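
A minimal sketch of this histogram-based measurement is shown below. The data-driven bin-width rule cited in the text is not reproduced; a fixed number of shared bins over the pooled samples is assumed instead.

    import numpy as np

    def measure_gcnr(target_power, background_power, n_bins=256):
        """Histogram-based gCNR per (29): one minus the overlap of the normalized
        target and background power histograms computed over shared bin edges."""
        target_power = np.asarray(target_power, dtype=np.float64)
        background_power = np.asarray(background_power, dtype=np.float64)
        pooled = np.concatenate([target_power, background_power])
        edges = np.linspace(pooled.min(), pooled.max(), n_bins + 1)
        h1, _ = np.histogram(target_power, bins=edges)
        h0, _ = np.histogram(background_power, bins=edges)
        h1 = h1 / h1.sum()                  # normalize so each histogram sums to 1
        h0 = h0 / h0.sum()
        return 1.0 - np.minimum(h1, h0).sum()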

[0150] To demonstrate the necessity of a photoacoustic-specific gCNR theory (rather than relying on the existing ultrasound theory for our proposed framework), gCNRmeas was directly compared with both the theory-based gCNR predictions for photoacoustic images presented in Section II-B5 and the ultrasound framework described in [24] (defined as gCNRUS in this article), using the datasets described in Sections III-A1 and III-A2. The resulting errors, eUS and ePA, were computed as follows:

eUS = | gCNRUS − gCNRmeas |,
ePA = | gCNRPA − gCNRmeas |,

where gCNRPA represents gCNRPA1, gCNRPA2, or gCNRPA3, depending on the case being evaluated. The mean absolute errors (MAEs), ēUS and ēPA, were then computed over M photoacoustic images in each dataset as

ēUS = (1/M) Σ_{i=1}^{M} eUS(i),   ēPA = (1/M) Σ_{i=1}^{M} ePA(i),

where i represents the ith photoacoustic image in the dataset. The percent improvement, Δg, introduced by the photoacoustic framework was reported for each dataset based on the following equation:

Δg = 100 × (ēUS − ēPA) / ēUS.

[0151] To demonstrate relationships between nodes in the third layer of Fig. 8 (i.e., SNR, CNR, and gCNR) and the performance of the computer vision-based target segmentation task (i.e., represented by the node in the fourth layer), each image in the phantom and in vivo datasets described in Section III-A3 was utilized. The target segmentation component of our photoacoustic-based visual servoing system was executed on the sequence of images in each dataset. To provide a brief summary of this previously published algorithm, each beamformed image was thresholded, dilated, and eroded. The segmentation algorithm then attempted to locate the fiber tip in each resulting image. If the algorithm successfully segmented the fiber tip in five consecutive beamformed image frames, then a further consistency check was used to prevent reflection artifacts from being misclassified as sources. The segmentation algorithm succeeded on a given image if the fiber tip was successfully located in the image and the output passed the consistency check when applicable.

[0152] The centroid of the region output by the segmentation algorithm was used as the center of the target ROI in each image. The ROI diameters were 3 mm in both datasets to match the axial dimensions of the target being imaged. The center of the background ROIs was laterally shifted 10 mm from the center of the target ROIs to accommodate deviations in the shape of the segmented targets from the ideal circle. These ROIs were then used to measure gCNR, SNR, and CNR.
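
A simplified stand-in for the threshold, dilate, erode, and centroid steps summarized above is sketched below; it is not the published algorithm, and the threshold value, structuring-element size, and omission of the five-frame consistency check are assumptions made for illustration.

    import numpy as np
    from scipy import ndimage

    def segment_target(beamformed_image, threshold=0.7, struct_size=3):
        """Threshold a normalized beamformed image, dilate and erode the binary
        mask, and return the centroid (row, col) of the largest connected
        component, or None if nothing is segmented."""
        structure = np.ones((struct_size, struct_size), dtype=bool)
        mask = beamformed_image >= threshold
        mask = ndimage.binary_dilation(mask, structure=structure)
        mask = ndimage.binary_erosion(mask, structure=structure)
        labels, n = ndimage.label(mask)
        if n == 0:
            return None
        sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
        largest = int(np.argmax(sizes)) + 1
        return ndimage.center_of_mass(mask, labels, largest)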

[0153] While there are multiple definitions of photoacoustic SNR and CNR, we use the following definitions to maintain consistency with our previous work:

SNR = 20 log10( μt / σb ),     (35)
CNR = | μt − μb | / sqrt( σt² + σb² ),     (36)

where μt and μb are the mean amplitudes of the target and background ROIs, respectively, within envelope-detected DAS images, and σt and σb are the standard deviations of the amplitudes of the same data within the same target and background ROIs, respectively.
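
A small sketch mirroring the SNR and CNR definitions as reconstructed in (35) and (36) follows; because those definitions were reconstructed from the variable roles stated above, the exact forms below should be treated as assumptions.

    import numpy as np

    def snr_db(target_roi, background_roi):
        """SNR per the reconstructed (35): 20*log10(mu_t / sigma_b)."""
        return 20.0 * np.log10(np.mean(target_roi) / np.std(background_roi))

    def cnr(target_roi, background_roi):
        """CNR per the reconstructed (36): |mu_t - mu_b| / sqrt(sigma_t^2 + sigma_b^2)."""
        mu_t, mu_b = np.mean(target_roi), np.mean(background_roi)
        s_t, s_b = np.std(target_roi), np.std(background_roi)
        return abs(mu_t - mu_b) / np.sqrt(s_t**2 + s_b**2)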

[0154] To characterize the relationship between segmentation accuracy and measured image quality metrics, a sigmoid curve was fit to datapoints from the phantom dataset described in Section III-A3. The associated R² and RMSE were computed to quantify the accuracy of the fit. To quantify accuracy variations across imaging environments, R² and RMSE values were also computed between the sigmoid fits and datapoints from the in vivo dataset described in Section III-A3. To demonstrate the effect of small changes in gCNR, SNR, and CNR on segmentation accuracy, the peak rate of increase of the sigmoidal fit was reported for each image quality metric.
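
A minimal sketch of such a fit with scipy.optimize.curve_fit is shown below. The four-parameter logistic parametrization is an assumption of ours; the original work does not specify its exact sigmoid form.

    import numpy as np
    from scipy.optimize import curve_fit

    def sigmoid(x, lower, upper, x0, rate):
        """Four-parameter logistic curve (an assumed parametrization)."""
        return lower + (upper - lower) / (1.0 + np.exp(-rate * (x - x0)))

    def fit_accuracy_curve(metric_values, accuracies):
        """Fit segmentation accuracy vs. an image quality metric; returns the fitted
        parameters, the R^2 and RMSE of the fit, and the peak rate of increase."""
        x, y = np.asarray(metric_values, float), np.asarray(accuracies, float)
        p0 = [y.min(), y.max(), float(np.median(x)), 1.0]
        params, _ = curve_fit(sigmoid, x, y, p0=p0, maxfev=10000)
        residuals = y - sigmoid(x, *params)
        rmse = float(np.sqrt(np.mean(residuals**2)))
        r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
        lower, upper, _, rate = params
        peak_rate = rate * (upper - lower) / 4.0   # maximum slope of a logistic curve
        return params, r2, rmse, peak_rate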

[0155] To demonstrate the impact of thresholding on the target segmentation algorithm, each photoacoustic image from the phantom and in vivo datasets acquired in Section III-A3 was utilized. An amplitude-based threshold t0 was applied to each envelope-detected DAS image in the range 0-1, corresponding to no (t0 = 0) and all (t0 = 1) pixels removed. In each thresholded image, the previously described automated ROIs were used to measure gCNR, SNR, and CNR using (29), (35), and (36), respectively. The minimum and maximum values of the measured image quality metrics were reported for each dataset as functions of t0. The SNR and CNR metrics are expected to yield nonfinite values in thresholded images when σb in (35) or σt and σb in (36) are zero. To compare the stability of these image quality metrics, the percentage of gCNR, SNR, and CNR measurements that produced finite values was measured.

[0156] IV. RESULTS

[0157] A. Background Power and Channel SNR Relationship

[0158] Fig. 12 shows the mean background power μ as a function of the channel SNR for the three datasets described in Section III-A1. Initially, μ decreases as channel SNR increases, until an asymptote is reached. This initial relationship between μ and channel SNR can be explained by substituting (28) into (7) to obtain the following expression for μ as a function of the desired channel SNR:

μ = N · RMS(sch)² · 10^(−SNRch/10),     (37)

where N is the number of receive channels. Taking the logarithm of both sides of (37) yields a linear relationship between the logarithm of μ and SNRch, which matches the linear trend observed in the logarithmic plot of μ as a function of channel SNR in Fig. 12 for lower channel SNR values. The asymptote reached by μ in each of the simulated, experimental, and in vivo cases corresponds to the inherent noise in the raw channel data, which is significantly larger than the variance of the added noise at higher channel SNR levels. This result is expected to produce gCNR values that remain fairly constant at these higher values of channel SNR (i.e., low noise and high signal conditions). In addition, the results in Fig. 12 provide an empirical demonstration of the edge connecting the receiver characteristics node to the background power distribution node in Fig. 8.

[0159] B. Target Power and Laser Energy Relationship

[0160] Fig. 13 shows measured values of the target parameters θ and k as functions of the laser energy of photoacoustic images acquired during the experiments described in Section III-A2. The parameter θ generally increases as laser energy increases. Fig. 13(a) shows quadratic curves fit to the mean values of θ with the phantom (R² = 0.97 and RMSE = 5.53 × 10³) and ex vivo tissue (R² = 0.99 and RMSE = 45.0) represented by the blue and orange dotted lines, respectively. The parameter k generally increases with laser energy in the ex vivo dataset. In the phantom dataset, k initially increases at lower laser energies and then remains low as the laser energy increases. These results provide an empirical demonstration of the edge connecting the laser parameters node to the target power distribution node in Fig. 8.

[0161] As shown in Fig. 8, the background parameter μ does not depend on laser energy. However, as noted in Sections II-B2 and II-B4, the decision boundaries for gCNR predictions additionally depend on μ. Therefore, the range of possible gCNR predictions depends on the measured values of θ, k, and μ, which are reported in Table I, along with measured gCNR values for completeness. These values span Cases 1-4 of Section II-B4, and they represent the range of k and μ values that can be used to predict gCNR as a function of θ and thereby predict the gCNR limits of these datasets.

TABLE I

[0162] C. Target Power, Background Power, and gCNR

[0163] Fig. 14 shows the predicted limits of gCNR as a function of θ for independent selections of k and μ obtained from Table I. The upper bound of Fig. 14 can be divided into Regions 1 and 2, marked by the circled numbers, with corresponding values of θ, k, and μ reported in Table II. The values of μ and k in Regions 1 and 2 correspond to the maximum and minimum values of (μ/k), respectively, within the selected datasets. The gCNR predictions saturate at unity when θ ≫ μ/k.

TABLE II

REGIONS FORMING THE UPPER AND LOWER BOUNDS OF THE GCNR ENVELOPE IN FIG. 14

[0164] The lower bound of the range of predicted gCNR values consists of five distinct regions, marked by the circled numbers 3-7 in Fig. 14. In Regions 3 and 7, the minimum possible gCNR either decreases or increases with an increase in θ. For Regions 4-6, the minimum possible gCNRs correspond to values of k that satisfy μx = μ. Therefore, k = μ/θ when the target power is gamma distributed in Regions 4 and 6. In Region 5, where the minimum possible gCNR is zero, the target power is exponentially distributed (i.e., k = 1), and μ = θ. These five regions indicate that it is sufficient to require kθ = μ to predict a minimum gCNR. If corresponding values of k and μ are not available (as in Regions 3 and 7), then the limiting values of k and μ may be selected to maximize the overlap between the target and background power distributions. In addition to fulfilling the primary purpose of establishing expectations for photoacoustic gCNR across a range of target and background parameters, Fig. 14 also provides a theory-based demonstration of relationships between the target and background distribution nodes and the gCNR node in Fig. 8.

[0165] D. gCNR Predictions and Indirect Relationships

[0166] 1) Channel SNR and gCNR: Fig. 15 compares DAS beamformed images of simulated (6-mm-diameter target), experimental (5-mm-diameter target), and in vivo (2.5-mm-diameter target) data for channel SNR values of -10, 0, and 10 dB from the datasets described in Section III-A1. Fig. 15(a) shows a target which is not easily distinguishable from the background. This observation is confirmed by the low gCNR value of 0.30 for the simulated target at a channel SNR of -10 dB. Visibility improvements were obtained when the channel SNR value increased to 0 dB [Fig. 15(b)] and 10 dB [Fig. 15(c)]. Fig. 15(d) shows an image obtained in the water bath environment. Even at a low channel SNR value of -10 dB, the target was clearly visible, with a gCNR measurement of 0.85. When increasing the channel SNR value to 0 dB [Fig. 15(e)] and 10 dB [Fig. 15(f)], the target was more distinguishable from the background, with gCNR values of 0.95 and 0.97, respectively. Fig. 15(g) and (h), obtained by placing a catheter in an in vivo porcine heart, show targets that were difficult to detect with -10 and 0 dB channel SNR, respectively. At the higher channel SNR of 10 dB [Fig. 15(i)], the photoacoustic signal was more distinguishable from the background.

[0167] The results in Fig. 15 point to the indirect, qualitative relationship between channel SNR and gCNR. Target detectability improved as the channel SNR increased, although the gCNR improvement was not the same for each scenario. In addition, the target detectability was influenced by factors such as the presence of side lobes and sound attenuation, even at higher channel SNR values [e.g., 10 dB in Fig. 15(c)]. As a result, images with the same channel SNR value (e.g., -10 dB) exhibited large variations in target visibility. These variations were quantified by the corresponding large variations in gCNR measured on the images [i.e., 0.30, 0.85, and 0.16 for Fig. 15(a), (d), and (g), respectively].

[0168] Fig. 16 quantifies measured and predicted relationships between gCNR and channel SNR for the simulated, experimental, and in vivo photoacoustic image data described in Section III-A1. There is a clear sigmoidal relationship between gCNR and channel SNR, with a low-gCNR region for lower channel SNR values, a linear region, and an asymptote at higher channel SNR values. While there is generally good agreement between the predictions and measurements, the increased difference between predicted and measured gCNR at channel SNRs lower than -20 dB can be explained by modeling the target as a gamma-distributed signal. At lower channel SNR values, the amplitude of the noise in the target ROI was sufficiently large to cause inaccuracies in the model of the target. These inaccuracies manifest as errors in the computation of the parameters k and θ from (2), and subsequently in the predicted gCNR values. Despite these inaccuracies, the gCNR predictions performed well even at low channel SNR values, with a peak mean absolute error of 0.04 for noisy in vivo images with channel SNR values ranging -40 to -20 dB. Overall, the results in Figs. 15 and 16 provide empirical demonstrations of the indirect relationship between channel SNR and gCNR, which can be described by a combination of the edges along the directed path from the receiver characteristics node to the background power distribution node to the gCNR node in Fig. 8.

[0169] 2) Laser Energy and gCNR: Fig. 17 shows DAS beamformed photoacoustic images of the 2-mm-diameter optical fiber bundle in the plastisol phantom [Fig. 17(a)] and ex vivo caprine heart [Fig. 17(b)], taken from the datasets described in Section III-A2. Fig. 17(c) shows the predicted and measured gCNR as a function of laser energy for these datasets, demonstrating an increase in gCNR as energy increases. The asymptote of each curve also highlights that there is a limit to improvements in target detectability that can be achieved by increasing the laser energy beyond a threshold. This laser energy threshold depends on the properties of the acoustic environment (e.g., sound attenuation and acoustic clutter). The results in Fig. 17 provide empirical demonstrations of the indirect relationship between laser energy and gCNR, which can be described as a combination of the edges along the directed path from the laser parameters node to the target power distribution node to the gCNR node in Fig. 8.

TABLE III

MAEs BETWEEN PREDICTED AND MEASURED GCNR VALUES AND CORRESPONDING IMPROVEMENTS WITH THE PROPOSED FRAMEWORK

[0170] E. Necessity of a Photoacoustic-Specific Framework

[0171] Table III reports the values of ēUS, ēPA, and Δg for the results presented in Figs. 16 and 17(c). These MAEs were computed for a total of 8100 images per simulated, experimental, or in vivo dataset described in Section III-A1, 9894 images in the phantom dataset described in Section III-A2, and 8800 images in the ex vivo dataset described in Section III-A2. The photoacoustic gCNR predictions have excellent agreement with the gCNR measurements in each dataset, with MAEs ranging 3.2 × 10⁻³ to 2.0 × 10⁻². In addition, our predictions outperformed the ultrasound-based framework with improvements ranging 37.58%-87.69% across the five datasets. More details about this comparison are available in Appendix B.

TABLE IV

CHARACTERIZATION OF THE SIGMOIDAL FITS FOR PHANTOM AND IN VIVO DATA IN FIG. 18

[0172] F. Target Segmentation and gCNR Relationship

[0173] Fig. 18 shows the accuracy of the target segmentation algorithm as functions of gCNR, SNR, and CNR. The range of gCNR remains the same for both the phantom and in vivo datasets in Fig. 18(a), as expected, because gCNR is bounded to the range [0, 1]. In contrast, SNR values in Fig. 18(b) range 5.2-59.5 dB for the phantom dataset and 5.8-50.2 dB for the in vivo dataset. The CNR values also vary across datasets, ranging 0-1.8 and 0-2.6 for the phantom and in vivo cases, respectively. These variations in the ranges of SNR and CNR demonstrate the sensitivity of these metrics to multiple nodes in Fig. 8 when compared with gCNR measurements.

[0174] Fig. 18 also shows sigmoidal curves fit to the segmentation accuracy as a function of gCNR, SNR, and CNR using the phantom dataset described in Section III-A3. Table IV lists the R² and RMSE between each fit and the corresponding phantom or in vivo datapoints. The SNR results have the highest R² value and lowest RMSE due to the presence of fewer data points in the linear region of the sigmoid fit. The gCNR and CNR results have similar R² and RMSE values. The last column of Table IV reports the maximum rate of increase in segmentation accuracy as a function of the corresponding image quality metric. This peak rate of increase demonstrates that gCNR and CNR are more sensitive than SNR to small changes in segmentation accuracy. Note that a threshold of t0 = 0.7 was applied to improve the accuracy of the segmentation algorithm implemented in Fig. 18, while no thresholds were applied to report the corresponding image quality metrics.

[0175] Fig. 19(a)-(c) shows the minimum and maximum values of gCNR, SNR, and CNR measurements, respectively, as functions of t0. Of these three metrics, only gCNR monotonically decreases with an increase in t0 in both the phantom and in vivo datasets, which agrees with the expectations concluded in Section II-C. In comparison, the CNR measurements achieve peaks of 1.8 and 4.7 at threshold values t0 = 0.08 and t0 = 0.11 in the phantom and in vivo datasets, respectively. The absence of a monotonic decrease is similarly observed with SNR when t0 > 0.5 in Fig. 19(b).
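
The effect of the threshold t0 on each metric can be outlined as follows, assuming a normalized image with values in [0, 1] and boolean ROI masks. The SNR and CNR definitions in the sketch (mean target amplitude over background standard deviation in dB, and the absolute mean difference over the root of the summed variances, respectively) are conventional choices and may differ from the definitions used to generate Fig. 19; sweeping t0 and recording the extrema and the fraction of finite values of each metric reproduces the style of this analysis.

```python
import numpy as np

def thresholded_metrics(image, target_mask, background_mask, t0=0.0, n_bins=256):
    """Apply a threshold t0 to a normalized image (values in [0, 1]) and compute
    SNR (dB), CNR, and gCNR from the target and background ROIs."""
    thresholded = np.where(image >= t0, image, 0.0)
    t = thresholded[target_mask]
    b = thresholded[background_mask]

    with np.errstate(divide="ignore", invalid="ignore"):
        snr_db = 20.0 * np.log10(t.mean() / b.std())  # becomes infinite once the background is zeroed out
        cnr = np.abs(t.mean() - b.mean()) / np.sqrt(t.var() + b.var())

    # Histogram-based gCNR of the thresholded values over shared bins on [0, 1]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    p_t, _ = np.histogram(t, bins=edges)
    p_b, _ = np.histogram(b, bins=edges)
    gcnr = 1.0 - np.minimum(p_t / p_t.sum(), p_b / p_b.sum()).sum()

    return snr_db, cnr, gcnr
```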

[0176] Fig. 19(d) shows the percentage of gCNR, SNR, and CNR measurements yielding finite values as a function of t0 in the phantom and in vivo datasets. The gCNR measurements remain finite across 100% of the images in both datasets, which is expected because this metric will always produce a value that is bounded by [0, 1]. In comparison, the percentage of finite CNR measurements reduces from 97.1% at t0 = 0 to 91.0% at t0 = 0.82. More substantial decreases in the percentage of finite measurements are observed in the SNR metric, reducing from 100% and 97.1% at t0 = 0 in the phantom and in vivo datasets, respectively, to 41.9% and 73.7%, respectively, at t0 = 0.1. These results demonstrate the sensitivity of SNR and CNR measurements to thresholding and the robustness of gCNR to the same image manipulation.

[0177] V. DISCUSSION

[0178] A. Relationships Elucidated by Photoacoustic gCNR Theory

[0179] Given the desired photoacoustic system parameters and the associated target and background distribution parameters, we demonstrated that we can successfully predict gCNR, which may then be used to determine expected segmentation accuracy for visual servoing tasks. Although results are demonstrated in the context of photoacoustic target segmentation, our overall framework is more generally applicable to a variety of related computer vision-based tasks. We used this framework to demonstrate direct and indirect relationships among system parameters, target and background distribution parameters, and image quality metrics, such as gCNR. In addition, the directed graph representation supports the description of indirect relationships in the context of specific direct relationships.

[0180] There are a total of seven direct and two indirect relationships among system parameters, power distribution parameters, image quality metrics, and computer vision-based task performance identified by this framework. The direct relationship between channel SNR and μ (Fig. 12) informs the indirect relationship between channel SNR and gCNR. In particular, there is a sigmoidal relationship between gCNR and channel SNR (Fig. 16), which contains toe, linear, and shoulder regions. When comparing these regions with Fig. 12, it is apparent that the toe and linear regions of Fig. 16 correspond to the linear region of Fig. 12. In the toe region of Fig. 16, μ is sufficiently large such that the target region is overwritten by noise, resulting in consistently low gCNR. As μ decreases in Fig. 12, the corresponding linear region of Fig. 16 demonstrates an increase in gCNR. The shoulder region of Fig. 16 corresponds to the asymptote in Fig. 12, where the corresponding channel SNR values are sufficiently large and the raw channel data remains relatively unchanged after the addition of noise, resulting in constant μ and gCNR values as channel SNR further increases.
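
The channel SNR sweep underlying Fig. 16 relies on corrupting raw channel data with noise at prescribed channel SNR levels. A minimal sketch of one way to do this is shown below, assuming additive white Gaussian noise and a power-ratio definition of channel SNR in dB; the exact noise model and SNR definition used to generate the reported results may differ, and the function name is hypothetical.

```python
import numpy as np

def add_channel_noise(channel_data, channel_snr_db, rng=None):
    """Add zero-mean white Gaussian noise to raw channel data so that the ratio of
    mean signal power to noise power equals the requested channel SNR (in dB).

    channel_data : 2-D array of raw photoacoustic data (samples x receive channels).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(channel_data ** 2)
    noise_power = signal_power / (10.0 ** (channel_snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=channel_data.shape)
    return channel_data + noise
```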

[0181] Similarly, the direct relationship between laser energy and the target power distribution parameters (Fig. 13) informs the indirect relationship between laser energy and gCNR (Fig. 17). In particular, based on (3), an increase in θ or k translates to an increase in the mean of the target power distribution. These increases further translate to increased separation between the target and background distributions and improved gCNRs (assuming no significant change in μ). The larger increase in θ compared with k for the phantom data in Fig. 13 manifests as a large increase and eventual saturation of gCNR to 0.98 in Fig. 17(c). For laser energy levels ranging 4-7 μJ in Fig. 13(a), although the measured θ values are smaller in the ex vivo dataset compared with the phantom dataset, these same datasets have similarly high gCNR values over the same energy range in Fig. 17(c). This similarity can be explained by the higher values of k for the ex vivo dataset shown in Fig. 13(b) over the same range of laser energies. These observations demonstrate relationships between the target parameters and the imaging environment (e.g., phantom and ex vivo tissue) and between the target parameters and the laser parameters (e.g., laser energy), which are both represented in Fig. 8.
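
To make the roles of k, θ, and μ concrete, the following Python sketch computes a predicted gCNR as one minus the overlap of two assumed power distributions: a gamma distribution with shape k and scale θ for the target and an exponential distribution with mean μ for the background. These distribution choices are assumptions made for illustration based on the parameter names used above; the exact models of this example, including (3), are not reproduced here.

```python
import numpy as np
from scipy.stats import expon, gamma

def predict_gcnr(k, theta, mu, n_grid=200_000):
    """Predicted gCNR as one minus the overlap of an assumed gamma(k, theta)
    target-power PDF and an exponential (mean mu) background-power PDF,
    evaluated by numerical integration on a uniform grid."""
    # Integrate far enough into both upper tails
    upper = max(gamma.ppf(0.999999, a=k, scale=theta), expon.ppf(0.999999, scale=mu))
    x = np.linspace(0.0, upper, n_grid)
    f_target = gamma.pdf(x, a=k, scale=theta)
    f_background = expon.pdf(x, scale=mu)

    overlap = np.sum(np.minimum(f_target, f_background)) * (x[1] - x[0])
    return 1.0 - overlap

# Raising the mean target power (k * theta) relative to mu reduces the overlap
# and therefore raises the predicted gCNR.
print(predict_gcnr(k=2.0, theta=1.0, mu=0.5))
print(predict_gcnr(k=4.0, theta=2.0, mu=0.5))
```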

[0182] In addition, Regions 1-7 in Fig. 14 describe the limits of relationships among k, θ, μ, and gCNR. These regions indicate that the range of achievable gCNR can be completely described for any imaging system and environment using descriptions of target and background parameters. In particular, the presence or absence of each region solely depends on the relative values of k, θ, and μ. Therefore, the achievable ranges of gCNR can be analyzed independently of the underlying parameters of the photoacoustic imaging system (e.g., target size, laser energy, and imaging environment), and these ranges can be compared across different photoacoustic imaging systems.

[0183] B. Impact of Proposed Framework

[0184] The proposed framework demonstrates the feasibility of predicting the performance of a computer vision task using measurements of image quality metrics. The results in Figs. 18 and 19 and Table IV highlight three advantages of gCNR over SNR and CNR when predicting performance of the target segmentation algorithm applied to phantom and in vivo datasets. First, segmentation accuracy is more robust to small changes in gCNR compared with SNR (Fig. 18 and Table IV). Second, the gCNR metric has finite upper and lower bounds, whereas SNR and CNR are unbounded (Fig. 18). Therefore, gCNR measurements are easier to interpret than SNR and CNR in the context of computer vision tasks, such as target segmentation, and they do not experience the data dropout observed in Fig. 19(d) as threshold values increase. Third, SNR and CNR may be arbitrarily increased through image manipulation techniques, such as thresholding, while gCNR experiences a monotonic decrease with increasing threshold values, as observed in Fig. 19(a)-(c). In addition, our previous work [22] demonstrated the large variations in SNR and CNR values for small increases in gCNR in simulated, phantom, and in vivo photoacoustic images. Considering that the sigmoidal relationship between segmentation accuracy and gCNR was successfully characterized with phantom data, the expected performance of a computer vision-based system in a surgical setting can be predicted using this type of phantom-based characterization.

[0185] Overall, the presented framework has the potential to improve the design of photoacoustic imaging systems by addressing the challenge of achieving a desired gCNR in beamformed images. The associated relationships identified and characterized in this article improve the ability of design engineers to optimize parameters, such as laser energy and transducer characteristics, when designing systems for specific computer vision tasks. The gCNR metric has the potential to be incorporated into the design process of photoacoustic imaging systems to resize system components and reduce costs, potentially enabling the proliferation of smaller photoacoustic imaging systems better suited for surgical and interventional suites. Thus, employing gCNR as a prediction tool has the potential to improve the performance of tasks such as tracking catheter tips during cardiac catheterization procedures [18], placing pedicle screws during spinal fusion surgeries, visualizing the ureters, uterine arteries, and laparoscopic tool tips during hysterectomies, and visual servoing of features of interest in a photoacoustic image.

[0186] The framework presented in Fig. 8 additionally promises to improve the efficiency of preoperative tasks for photoacoustic-guided surgical and interventional procedures. For example, previous work to improve cardiac catheterization procedures required multiple trial and error attempts to determine the optimal laser energy for successful visual servoing of photoacoustic images. Characterizing a visual servoing system with respect to gCNR and the underlying target and background statistics enables the incorporation of a theory-based approach to selecting desired laser energy levels and reduces dependency on trial and error.

[0187] In particular, a photoacoustic image could be acquired once at low laser energy prior to a surgical procedure to measure gCNR. Using the relationship between gCNR and laser energy measured from an ex vivo setup similar to the surgical environment [e.g., Fig. 17(c)] and the relationship between the segmentation accuracy and gCNR established in Fig. 18(a), the gCNR prediction framework would provide the increase in laser energy required to achieve a desired segmentation accuracy range (e.g., 80%-100%). Acquiring a photoacoustic image at the updated laser energy would enable measurements of gCNR (from the acquired image), predictions of the segmentation accuracy [based on the sigmoidal fit in Fig. 18(a)], and confirmation that the visual servoing process will be conducted within the desired segmentation accuracy range. Based on the selected segmentation accuracy range, this theory-based process inherently allows us to balance the tradeoff between minimizing laser energy and maximizing target segmentation accuracy.
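
A minimal sketch of this preoperative selection step is shown below, assuming that the energy-to-gCNR characterization [e.g., a fit to data such as Fig. 17(c)] and the gCNR-to-accuracy sigmoid [e.g., the fit from Fig. 18(a)] are already available as callables; the function and parameter names are hypothetical.

```python
def select_laser_energy(gcnr_from_energy, accuracy_from_gcnr,
                        candidate_energies, min_accuracy=80.0):
    """Return the lowest candidate laser energy whose predicted gCNR maps to a
    segmentation accuracy at or above min_accuracy (in %), balancing the tradeoff
    between minimizing laser energy and maximizing segmentation accuracy.

    gcnr_from_energy   : callable mapping laser energy -> predicted gCNR.
    accuracy_from_gcnr : callable mapping gCNR -> predicted segmentation accuracy.
    """
    for energy in sorted(candidate_energies):
        predicted_accuracy = accuracy_from_gcnr(gcnr_from_energy(energy))
        if predicted_accuracy >= min_accuracy:
            return energy, predicted_accuracy
    return None, None  # no candidate energy meets the requested accuracy range
```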

[0188] C. Limitations and Future Work

[0189] One limitation of the presented gCNR predictions is their dependence on the PDF of the target and background signal powers. Factors such as the target size, target position, target shape, beamformer selection, and ROI placement cannot be explicitly modeled in these predictions, although we have observed that these factors influence gCNR measurements in experimental data. These and other factors (including the appearance of side lobes and background contributions that cannot be modeled by our gCNR predictions) are nonetheless hypothesized to contribute to the reported empirical observations. In addition, our ability to accurately estimate the parameters of the target and background power distributions depends on the selection of the corresponding ROIs in the photoacoustic image. However, the gCNR prediction framework shares this limitation with both the histogram-based gCNR measurement technique and traditional image quality metrics, such as SNR, CNR, and contrast, which are measured using ROIs.

[0190] Note that the model of the target power distribution presented in Section II-B2 was developed for photoacoustic images created with envelope detection. Without envelope detection, the target and background distributions have modified shapes, with increased probability densities observed at lower power values. If image formation is altered by removing envelope detection, these differences would require the derivation of a new power distribution model to accurately produce the correspondingly lower gCNR predictions.
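
For concreteness, one common implementation of envelope detection is the magnitude of the analytic signal computed along the axial dimension of the beamformed RF data. The sketch below assumes this Hilbert-transform-based approach, which may differ from the specific implementation used to form the images in this example.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_detect(beamformed_rf):
    """Envelope detection of beamformed RF data (axial samples x lateral lines)
    via the magnitude of the analytic signal along the axial dimension."""
    return np.abs(hilbert(beamformed_rf, axis=0))

# Omitting this step and working directly with the signed RF samples shifts the
# target and background power distributions toward lower values, as noted above.
```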

[0191] Future work includes extending the presented framework to other beamformers to demonstrate the improvements in target detectability achieved by different image formation processes. Previous work from our group empirically demonstrated improvements in gCNR measured from images reconstructed using SLSC beamforming compared with images reconstructed from the same raw channel data using DAS beamforming. However, a theoretical analysis of the impact of beamformers on target detectability would require specific models of the target and background power distributions for each beamformer being analyzed.

[0192] Another potential application of our framework is to estimate the parameters of a photoacoustic imaging system using the target and background distribution parameters obtained from a beamformed image. For example, channel SNR is traditionally utilized to estimate noise in photoacoustic images. However, in clinical environments, without access to channel data, it would not be possible to estimate the channel SNR that exists in photoacoustic images to help with troubleshooting steps needed for real-time improvement of poor image quality (e.g., increasing laser energy versus adjusting the dynamic range). Our group recently introduced a method to estimate channel SNR from gCNR without access to raw channel data. This approach can be incorporated into our framework to link the image domain represented by the second and third layers in Fig. 8 to the channel data domain represented by the first layer in Fig. 8. This link has the potential to improve the rapid decision-making that is often necessary in life-or-death situations, enhance the reproducibility of clinical imaging environments, and provide consistently high-quality photoacoustic image acquisitions.

[0193] In addition, the extension of this framework to raw channel data would enable performance benchmarking for a new class of deep learning-based photoacoustic visual servoing systems that segment photoacoustic sources from raw channel data. This framework may also be utilized to compare the performance of amplitude-based and coherence-based visual servoing systems.

[0194] VI. CONCLUSION

[0195] This example is the first to present a novel framework to establish relationships among photoacoustic imaging system parameters, image quality, and computer vision-based task performance. This framework leveraged gCNR to quantify the relationships between system parameters (e.g., channel SNR, laser energy, and frame averaging) and photoacoustic image quality. A critical component of this framework involved predicting the gCNR of photoacoustic images using the statistics of the target and background signal powers. We presented a theoretical derivation of our gCNR predictions and validated these predictions on simulated, experimental, and in vivo data across multiple channel SNRs and laser energies. This framework additionally enabled characterizations of the relationships between system parameter settings and image characteristics, including gCNR as a function of laser energy, the variation of the target and background distribution parameters as functions of laser energy and channel SNR, and the dependence of gCNR on the relative values of the mean target and background powers. Finally, we leveraged our novel framework to quantify the accuracy of a photoacoustic target segmentation algorithm as a function of gCNR and demonstrate the robustness of gCNR to thresholding. The proposed framework has the potential to be extended to other computer vision-based tasks (e.g., target tracking and image classification) and to improve the process of designing photoacoustic imaging systems, specifically the selection of hardware components such as lasers and software components such as image quality improvement techniques.

[0196] APPENDIX A: FIXED VERSUS DATA-BASED HISTOGRAM BIN WIDTHS

[0197] Fig. 20 compares histogram-based gCNR measurements using two different bin selection methods as a function of the values output by the gCNR prediction framework for photoacoustic images, for the five datasets described in Sections III-A1 and III-A2. Existing literature involving gCNR measurements uses a fixed number of bins to compute the histograms of both the target and background signal powers. However, statistics literature suggests that using a data-based bin width selection method would provide better results. In this example, we use a bin width selection method previously developed, where the bin width is a function of the number of samples and standard deviation of the target and background signal powers. Fig. 20 shows a strong agreement between the gCNR values measured using this data-based method, represented by the orange circles, and the gCNR predictions. In contrast, the gCNR values measured by fixing the number of bins to 256 deviate significantly from the predicted gCNR values, as shown by the blue circles in Fig. 20. These results suggest that the data-based method is superior to fixing the number of bins to 256. We expect similar results to be obtained with ultrasound images and the ultrasound framework previously published.
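
The cited bin width rule depends on the number of samples and the standard deviation of the data; Scott's rule (bin width 3.49 x standard deviation x n^(-1/3)) has this general form and is used in the Python sketch below for illustration, although it may not be the specific rule referenced above. The helper applies a shared, data-based bin width to the pooled target and background signal powers before computing gCNR.

```python
import numpy as np

def gcnr_data_based_bins(target_power, background_power):
    """gCNR measured with a data-based bin width (Scott's rule applied to the
    pooled samples) instead of a fixed number of bins."""
    pooled = np.concatenate([target_power, background_power])
    width = 3.49 * pooled.std() * pooled.size ** (-1.0 / 3.0)
    edges = np.arange(pooled.min(), pooled.max() + width, width)

    p_t, _ = np.histogram(target_power, bins=edges)
    p_b, _ = np.histogram(background_power, bins=edges)
    return 1.0 - np.minimum(p_t / p_t.sum(), p_b / p_b.sum()).sum()
```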

[0198] APPENDIX B: ADVANTAGE OF PHOTOACOUSTIC FRAMEWORK OVER ULTRASOUND FRAMEWORK FOR GCNR PREDICTIONS

[0199] Fig. 21 shows a comparison of gCNR prediction frameworks for ultrasound [24] and photoacoustic images as a function of gCNR measurements for the five datasets described in Sections III-A1 and III-A2. The gCNR measurements are obtained from normalized photoacoustic images with histogram bin widths selected using the data-based method previously described. The black line represents the ideal case where the gCNR predictions are exactly equal to the gCNR measurements. The gCNR predictions output by our framework for photoacoustic images are closer to the ideal black line than the gCNR values output by the ultrasound model previously published. This matches our expectation that photoacoustic images require a specific gCNR prediction framework based on the differences in target power statistics between ultrasound and photoacoustic images. The gCNR values predicted with the ultrasound and photoacoustic frameworks are more similar as gCNR decreases because the exponential distribution is sufficient to model the target power at low gCNR values in both cases. However, the observed deviation from the ideal line at these lower gCNR values is likely due to measurement inaccuracies. In particular, the histogram-based method seems to inaccurately represent the target and background distributions at low gCNR values.

[0200] While the foregoing disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be clear to one of ordinary skill in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the disclosure and may be practiced within the scope of the appended claims. For example, all the methods, systems, and/or computer readable media or other aspects thereof can be used in various combinations. All patents, patent applications, websites, other publications or documents, and the like cited herein are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference.