

Title:
SYSTEM AND APPARATUS SUITABLE FOR UTILIZATION OF NEURAL NETWORK BASED APPROACH IN ASSOCIATION WITH ALGORITHM OPTIMIZATION, AND A PROCESSING METHOD IN ASSOCIATION THERETO
Document Type and Number:
WIPO Patent Application WO/2024/056329
Kind Code:
A1
Abstract:
There is provided a processing method (300) which can include a learning step (304). The learning step (304) can be associated with a first optimization loop and/or a second optimization loop. In regard to the first optimization loop, subproblem optimization iterations can be performed and the first optimization loop can be associated with one or more specialized algorithms corresponding to one or more solutions which can be associated with one or more subproblems. In regard to the second optimization loop, one or more coordinate descent iterations can be performed.

Inventors:
HOY MICHAEL COLIN (SG)
Application Number:
PCT/EP2023/073108
Publication Date:
March 21, 2024
Filing Date:
August 23, 2023
Assignee:
CONTINENTAL AUTOMOTIVE TECH GMBH (DE)
International Classes:
G06N5/01; G06N3/045; G06N3/096; G06Q10/047
Foreign References:
US20190147340A12019-05-16
Other References:
S. ZENG ET AL.: "A Reinforcement Learning Approach to Parameter Selection for Distributed Optimization in Power Systems", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 October 2021 (2021-10-22), XP091079247
S. TAIMOOR ET AL.: "Holistic resource management in UAV-assisted wireless networks: An optimization perspective", JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, vol. 205, 11 June 2022 (2022-06-11), XP087124925, ISSN: 1084-8045, [retrieved on 20220611], DOI: 10.1016/J.JNCA.2022.103439
S. KHORRAM ET AL.: "Stochastic Block-ADMM for Training Deep Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 1 May 2021 (2021-05-01), XP081956946
Attorney, Agent or Firm:
CONTINENTAL CORPORATION (DE)
Claims:
Claim(s)

1. An apparatus (102) suitable for use with a neural network (NN) structure, the apparatus (102) comprising: a first module (202) configurable to receive at least one control signal from the NN structure; a second module (204) coupled to the first module (202), the second module (204) being configurable to process the at least one control signal in a manner so as to generate at least one output signal usable for control of a machine, and a third module (206) coupled to at least one of the first module (202) and the second module (204), the third module (206) being configurable to communicate at least one input signal, the at least one input signal being receivable by the NN structure for processing to generate the at least one control signal, wherein the NN structure is configurable to process the at least one input signal via at least one of: a first optimization loop (150b) wherein subproblem optimization iterations are performed and the first optimization loop (150b) is associable with at least one specialized algorithm corresponding to at least one solution associable with at least one subproblem, and a second optimization loop (150c) wherein coordinate descent iterations are performed.

2. The apparatus (102) according to claim 1, wherein the NN structure is associable with a neural network (NN) associable with the second optimization loop (150c) and which is capable of accepting at least one input signal for processing to generate at least one learned additional penalty by manner of at least one coordinate descent iteration.

3. The apparatus (102) according to any of the preceding claims, wherein the first optimization loop (150b) comprises querying the at least one learned additional penalty capable of evaluating the at least one solution.

4. The apparatus (102) according to any of the preceding claims, wherein the second optimization loop (150c) corresponds to a coordinate descent iteration process comprising a plurality of coordinate descent iterations, wherein the plurality of coordinate descent iterations comprise a current coordinate descent iteration and a plurality of subsequent coordinate descent iterations following the current coordinate descent iteration, wherein the at least one solution corresponds to a current solution in association with the current coordinate descent iteration, wherein the current solution is capable of being perturbed during a subsequent coordinate descent iteration based on the learned additional penalty to derive an updated current solution.

5. The apparatus (102) according to any of the preceding claims, wherein the at least one solution is capable of being repeatedly perturbed during the coordinate descent iteration process until a final solution corresponding to the at least one control signal is derived.

6. The apparatus (102) according to any of the preceding claims, wherein the machine corresponds to a vehicle, and wherein the apparatus (102) is capable of being carried by a vehicle and the apparatus (102) is configurable to control the vehicle based on the at least one output signal.

7. The apparatus (102) according to any of the preceding claims, wherein the vehicle corresponds to a delivery robot.

8. A processing method (300) associable with a neural network (NN) structure, the processing method (300) comprising: an input step (302) comprising receiving at least one input signal, communicable from the apparatus (102) of claim 1; a learning step (304) associable with at least one of: a first optimization loop (150b) wherein subproblem optimization iterations are performed and the first optimization loop (150b) is associable with at least one specialized algorithm corresponding to at least one solution associable with at least one subproblem, and a second optimization loop (150c) wherein coordinate descent iterations are performed, wherein the NN structure is associable with a neural network (NN) associable with the second optimization loop (150c) and which is capable of accepting at least one input signal for processing to generate at least one learned additional penalty by manner of at least one coordinate descent iteration.

9. The processing method (300) according to claim 8, wherein the first optimization loop (150b) comprises querying the at least one learned additional penalty capable of evaluating the at least one solution.

10. The processing method (300) according to claim 8 or claim 9, wherein the second optimization loop (150c) corresponds to a coordinate descent iteration process comprising a plurality of coordinate descent iterations, wherein the plurality of coordinate descent iterations comprise a current coordinate descent iteration and a plurality of subsequent coordinate descent iterations following the current coordinate descent iteration, wherein the at least one solution corresponds to a current solution in association with the current coordinate descent iteration, wherein the current solution is capable of being perturbed during a subsequent coordinate descent iteration based on the learned additional penalty to derive an updated current solution.

11. The processing method (300) according to any of claims 8 to 10, wherein the at least one solution is capable of being repeatedly perturbed during the coordinate descent iteration process until a final solution corresponding to the at least one control signal is derived.

12. The processing method (300) according to any of claims 8 to 11, further comprising an output step (306), wherein at least one output signal is derivable based on the at least one control signal.

13. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out at least one of the input step (302), the learning step (304) and the output step (306) according to the processing method (300) of any of claims 8 to 12.

14. A computer readable storage medium having data stored therein representing software executable by a computer, the software including instructions, when executed by the computer, to carry out at least one of the input step (302), the learning step (304) and the output step (306) according to the processing method (300) of any of claims 8 to 12.

Description:
SYSTEM AND APPARATUS SUITABLE FOR UTILIZATION OF NEURAL NETWORK BASED APPROACH IN ASSOCIATION WITH ALGORITHM OPTIMIZATION, AND A PROCESSING METHOD IN ASSOCIATION THERETO

Field Of Invention

The present disclosure generally relates to one or both of a system and at least one apparatus suitable for utilizing one or more neural network-based approaches in association with, for example, algorithm optimization. More specifically, the present disclosure relates to a system and/or at least one apparatus in association with, for example, using neural-based approach(es) for combinatorial optimization, in a manner which can facilitate customized optimization algorithm(s) such that different subproblems can be combined (together) in a way that can allow for tight coupling between the (different) subproblems. The present disclosure further relates to a processing method which can be associated with the system and/or the apparatus(es).

Background

It is foreseeable that system(s) (e.g., transportation or logistic systems) can become increasingly complex, multi-modal and/or diverse. For example, sidewalk robots and autonomous shuttles can become increasingly ubiquitous in smart city settings.

Such systems typically operate autonomously under a large number of scheduling constraints, which can lead to a need for management algorithms for the purpose of, for example, coordination. Examples of constraints can include meeting pickup/drop-off time windows, ensuring that a robot has sufficient space for all cargo, ensuring batteries remain charged, ensuring small robots can rendezvous with motherships efficiently, ensuring the density of robots in any one place remains manageable and/or handling uncertainty in future demand and travel times, etc.

The present disclosure contemplates that such management of algorithms is currently not facilitated in an efficient and/or an optimal manner, and that there is a need to address (or at least mitigate) such issue(s).

Summary of the Invention

In accordance with an aspect of the disclosure, there is provided an apparatus (e.g., which can correspond to an electronic module, in accordance with an embodiment of the disclosure) which can be coupled to at least one device (i.e., associable with a “neural network structure”). The apparatus can be suitable for use with, for example, a neural network (NN) structure.

In one embodiment, the apparatus can, for example, include a first module, a second module and/or a third module. In one example, the first module can be coupled to the second module which can be coupled to the third module, in accordance with an embodiment of the disclosure.

The first module can, for example, be configured to receive one or more control signals from the NN structure. The second module can, for example, be configured to process the control signal(s) in a manner so as to generate at least one output signal usable for controlling a machine (e.g., a vehicle which can, for example, correspond to a delivery robot). The third module can, for example, be configured to communicate one or more input signals which can be received by the NN structure for processing to generate the control signal(s). In one embodiment, the apparatus can, for example, be carried by the machine and the apparatus can, for example, be configured to control the vehicle based on the output signal(s).

In one embodiment, the NN structure can be configured to process the input signal(s) by manner of one or both of:

• a first optimization loop wherein subproblem optimization iterations are performed and the first optimization loop is associable with at least one specialized algorithm (i.e., specialized/customized optimization algorithm(s)) corresponding to at least one solution which can be associated with at least one subproblem

• a second optimization loop wherein coordinate descent iterations are performed

In one embodiment, the NN structure can be associated with a neural network (NN) (e.g., a graph NN) associable with the second optimization loop. Moreover, the NN can be capable of accepting one or more input signals for processing to generate one or more learned additional penalties by manner of one or more coordinate descent iterations.

Furthermore, the first optimization loop can, for example, include querying one or more learned additional penalties capable of evaluating the solution(s), in accordance with an embodiment of the disclosure.

In one embodiment, the second optimization loop can, for example, correspond to a coordinate descent iteration process which can, for example, include one or more coordinate descent iterations. The coordinate descent iteration(s) can, for example, include a current coordinate descent iteration (e.g., an iteration which can be performed at a current point in time) and one or more subsequent coordinate descent iterations (e.g., one or more iterations which can be performed at later point(s) in time after the current coordinate descent iteration). The subsequent coordinate descent iterations can follow (e.g., sequentially) the current coordinate descent iteration. Additionally, the solution(s) can, for example, correspond to a current solution which can be associated with the current coordinate descent iteration. Furthermore, the current solution can be capable of being perturbed during a subsequent coordinate descent iteration based on the learned additional penalty so as to derive an updated current solution. In one embodiment, the solution(s) can be capable of being repeatedly perturbed during the coordinate descent iteration process until a final solution can be derived. The control signal(s) can, for example, correspond to/include/be associated with the final solution.

It is contemplated that, in the above manner, one or more specialized/customized optimization algorithms (e.g., specialized solver(s)) for different subproblems can possibly be combined (together) in a manner that allows tight coupling between the subproblems to be facilitated, in accordance with an embodiment of the disclosure. Generally, for the present disclosure, it is contemplated that the neural network can be responsible for the higher level task of deconfliction (i.e., as opposed to low level planning). Hence, the possibility of improvement in scaling to larger problem(s) can be facilitated. For example, the neural network need only be concerned with learning to deconflict different local search algorithms whereas low level planning can be carried out by one or more specialized search algorithms (e.g., a library of specialized solvers). It is appreciable that, by facilitating the combination of specialized/customized optimization algorithm(s) (e.g., specialized search algorithm(s) associated with/from a library of specialized solvers) such that tight coupling between the subproblems can possibly be addressed, efficiency (e.g., computational efficiency) and/or adaptability/flexibility can possibly be facilitated.

The above-described advantageous aspect(s) of the above apparatus of the present disclosure can also apply analogously to (all) the aspect(s) of the below-described processing method of the present disclosure.

In accordance with an aspect of the disclosure, there is provided a processing method which can be associated with a neural network (NN). In one example application, the processing method can be suitable for use in association with a NN (e.g., a graph NN).

The processing method can, for example, include an input step and/or a learning step, in accordance with an embodiment of the disclosure. In one embodiment, the processing method can, for example, further include an output step wherein at least one output signal can be derived based on one or more control signals.

The input step can include receiving one or more input signal(s).

The learning step can be associated with one or both of:

• a first optimization loop wherein subproblem optimization iterations can be performed and the first optimization loop can be associated with one or more specialized algorithms corresponding to one or more solutions which can be associated with one or more subproblems

• a second optimization loop wherein one or more coordinate descent iterations can be performed

The NN structure can be associable with a neural network (NN) (e.g., a graph NN) which can be associated with the second optimization loop. Moreover, the NN can be capable of accepting the input signal(s) for processing to generate one or more learned additional penalties by manner of one or more coordinate descent iterations.

Additionally, the first optimization loop can, for example, include querying at least one learned additional penalty capable of evaluating the at least one solution, in accordance with an embodiment of the disclosure.

In one embodiment, the second optimization loop can, for example, correspond to a coordinate descent iteration process which can, for example, include one or more coordinate descent iterations. The coordinate descent iteration(s) can, for example, include a current coordinate descent iteration (e.g., an iteration which can be performed at a current point in time) and one or more subsequent coordinate descent iterations (e.g., one or more iterations which can be performed at later point(s) in time after the current coordinate descent iteration). The subsequent coordinate descent iterations can follow (e.g., sequentially) the current coordinate descent iteration. Additionally, the solution(s) can, for example, correspond to a current solution which can be associated with the current coordinate descent iteration. Furthermore, the current solution can be capable of being perturbed during a subsequent coordinate descent iteration based on the learned additional penalty so as to derive an updated current solution. In one embodiment, the solution(s) can be capable of being repeatedly perturbed during the coordinate descent iteration process until a final solution can be derived. The control signal(s) can, for example, correspond to/include/be associated with the final solution.

It is contemplated that, in the above manner, one or more specialized/customized optimization algorithms (e.g., specialized solver(s)) for different subproblems can possibly be combined together in a manner that allows tight coupling between the subproblems to be facilitated, in accordance with an embodiment of the disclosure. Generally, for the present disclosure, it is contemplated that the neural network can be responsible for the higher level task of deconfliction (i.e., as opposed to low level planning). Hence, the possibility of improvement in scaling to larger problem(s) can be facilitated. For example, the neural network need only be concerned with learning to deconflict different local search algorithms whereas low level planning can be carried out by one or more specialized search algorithms (e.g., a library of specialized solvers). It is appreciable that, by facilitating the combination of specialized/customized optimization algorithm(s) (e.g., specialized search algorithm(s) associated with/from a library of specialized solvers) such that tight coupling between the subproblems can possibly be addressed, efficiency (e.g., computational efficiency) and/or adaptability/flexibility can possibly be facilitated.

The present disclosure further contemplates a computer program which can include instructions which, when the program is executed by a computer, cause the computer to carry out any one of the input step, the learning step and the output step, or any combination thereof (i.e., the input step, the learning step and/or the output step).

The present disclosure yet further contemplates a computer readable storage medium having data stored therein representing software executable by a computer, the software including instructions, when executed by the computer, to carry out any one of the input step, the learning step and the output step, or any combination thereof (i.e., the input step, the learning step and/or the output step).

Brief Description of the Drawings

Embodiments of the disclosure are described hereinafter with reference to the following drawings, in which:

Fig. 1a shows a system which can include at least one apparatus, according to an embodiment of the disclosure;

Fig. 1b shows, in association with the system of Fig. 1a, an example scenario which can be associated with a neural network structure, according to an embodiment of the disclosure;

Fig. 1c shows a flow diagram of a training process of a neural network associated with the neural network structure of Fig. 1b, according to an embodiment of the disclosure;

Fig. 2 shows the apparatus of Fig. 1a in further detail, according to an embodiment of the disclosure; and

Fig. 3 shows a processing method in association with the system of Fig. 1a, according to an embodiment of the disclosure.

Detailed Description

The present disclosure contemplates, generally, that Artificial Intelligence (AI) can, for example, be utilized to learn how to combine optimization algorithms (i.e., rather than to solve/address the issue of optimization per se), in accordance with an embodiment of the disclosure.

It is contemplated that, for example, optimization problems (e.g., in the transportation domain) can span multiple subproblems, such as “Low level” route planning (e.g., node by node cooperative planning on a graph with a deadlock avoidance objective), “High level” vehicle density management (e.g., high level deconfliction/balancing between different spatial regions) and/or task assignment over a time horizon (e.g., management of a rideshare vehicle), in accordance with an embodiment of the disclosure. When such subproblems are combined together into a single joint optimization problem, there may be a combinatorial explosion in the number of decision variables to be explored/considered. This can place a limitation on the size or number of constraints that can be considered in regard to each optimization problem. Approximations/heuristics can generally be used to address such limitation(s). It is contemplated that, generally, in optimization (e.g., mathematical optimization), it is common to use a combination of algorithms. In one example, coordinate descent can possibly be utilized. In another example, subroutine-based approaches can possibly be utilized. In yet another example, reinforcement learning can possibly be utilized.

In one example, in regard to coordinate descent, coordinate descent can generally run algorithms in a cycle where each algorithm can optimize its “subset” of decision variable(s) given the result(s) of the other algorithm(s). One approach to stabilizing coordinate descent algorithm(s) can be referred to as proximal gradient descent. Proximal gradient descent can be associated with addition of penalty term(s) so as to prevent a solution from excessive change/deviation during each of the optimization step(s). It is contemplated that the type and/or magnitude of the penalty term(s) can be autotuned/determined, in accordance with an embodiment of the disclosure.
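Purely as an illustration of the penalty idea (and not taken from the present application), the following minimal Python sketch runs block coordinate descent in which each block update is augmented with a quadratic proximal term that discourages the solution from deviating excessively from the previous iterate; the objective, helper names, step size and iteration counts are assumptions made for the example.

```python
import numpy as np

def penalized_coordinate_descent(f, x0, rho=1.0, n_cycles=20, step=1e-2, n_inner=50):
    """Cycle over coordinate blocks; each block minimizes the shared objective f
    plus a proximal penalty (rho / 2) * ||x_i - x_i_prev||^2 that stabilizes the update."""
    x = [np.array(block, dtype=float) for block in x0]   # list of coordinate blocks
    for _ in range(n_cycles):
        for i in range(len(x)):
            x_prev = x[i].copy()
            for _ in range(n_inner):                     # simple gradient steps on block i only
                grad = numerical_grad(
                    lambda b: f(replace_block(x, i, b)) + 0.5 * rho * np.sum((b - x_prev) ** 2),
                    x[i])
                x[i] = x[i] - step * grad
    return x

def replace_block(x, i, b):
    """Return a copy of the block list with block i replaced by b."""
    y = [block.copy() for block in x]
    y[i] = b
    return y

def numerical_grad(fun, b, eps=1e-6):
    """Central-difference gradient, sufficient for a toy illustration."""
    grad = np.zeros_like(b)
    for k in range(b.size):
        e = np.zeros_like(b)
        e[k] = eps
        grad[k] = (fun(b + e) - fun(b - e)) / (2 * eps)
    return grad
```

For instance, calling penalized_coordinate_descent(lambda blocks: sum(float(np.sum(b ** 2)) for b in blocks), [np.ones(3), np.ones(2)]) drives both blocks towards zero while the proximal term limits how far each block moves per cycle; autotuning rho would correspond to determining the type/magnitude of the penalty term as contemplated above.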

In another example, in regard to subroutine-based approaches, subroutine-based approaches can relate to running one algorithm as a sub-step of another algorithm. For example, a task assignment algorithm may use a specialized route planner for evaluating the travel time associated with particular task assignments, in accordance with an embodiment of the disclosure.

In yet another example, in regard to reinforcement learning, reinforcement learning can be utilized for training a neural network (NN) to make planning decisions without any explicit search algorithm. The NN can, for example, be trained based on a large number of simulated experiences.

It is contemplated that the above (e.g., coordinate descent, subroutine-based approaches and/or reinforcement learning etc.) can be associated with one or more issues such as the possible requirement of a very large neural network, the possibility of a very long training time to be able to scale to large problem instances and/or the possibility of severe scaling issues. In this regard, the present disclosure contemplates issue(s) such as efficiency (e.g., computational efficiency) and/or adaptability/flexibility (e.g., ease of customizing an optimization problem), in accordance with an embodiment of the disclosure. Generally, in one example, with regard to efficiency (e.g., computational efficiency), the present disclosure contemplates that combinatorial optimization algorithms can be subject to super-linear scaling as the problem size increases or as more constraints are added, and it would be helpful to address (or at least mitigate) such an issue.

Further generally, in one example, with regard to adaptability/flexibility (e.g., ease of customizing an optimization problem), the present disclosure contemplates that algorithm(s) can be designed to solve very specific/narrow problems, but there may not be a known and/or an obvious way to extend such algorithm(s) to solve problems with custom constraints (e.g., which may possibly be encountered when deploying small robots in diverse environments).

In one specific example, it is contemplated that coordinate descent can possibly be deficient when there is a strong coupling between variables. For example, a possible deficiency can be encountered when a variable such as “task assignment” and another variable such as “route planning” are tightly coupled in the sense that any changes to the “task assignment” would invalidate the routes that were previously planned. Moreover, it is contemplated that proximal gradient descent can be suitable in relation to problems of continuous variables and when the penalty terms are generally quite simple (e.g. a quadratic term), but proximal gradient descent may encounter limitations in relation to problems such as task assignment and routing.

In another specific example, it is contemplated that subroutine-based approaches may possibly require a huge number of calls to the inner optimization algorithms during the execution of the outer optimization algorithms and may thus be inefficient.

In yet another specific example, it is contemplated that reinforcement learning may possibly not scale well to large problems (in terms of both NN size and compute time) and thus may not be optimal.

In view of the above, the present disclosure generally contemplates the possibility of a neural network (NN) based approach in association with the formulation/derivation of a penalty term that can possibly address one or more deficiencies/limitations as discussed in connection with coordinate descent, and possibly facilitate efficiency and/or adaptability/flexibility, in accordance with an embodiment of the disclosure.

In one specific example, the present disclosure contemplates the possibility of processing which can be based generally on application of “proximal gradient descent” in association with combinatorial optimization (e.g., applying a methodology inspired by “proximal gradient descent” to combinatorial optimization), in a way that can facilitate customized optimization algorithms for different subproblems to be combined together in a manner that can allow tight coupling between the subproblems, in accordance with an embodiment of the disclosure. For example, it is contemplated that a NN can be utilized to act/function as a learned cost function (analogous/similar to the penalty term in proximal gradient descent) and the learned cost function can represent preferences for decision variables outside the subproblem and can harmonize the sequential optimization of each subproblem.

It is appreciable that the present disclosure contemplates the possibility that a NN can learn to deconflict different local search algorithms and the low level planning can possibly be carried out by customized/specialized algorithms (e.g., referable to as "specialized/customized optimization algorithms") that can scale well to larger problem instance(s).

The foregoing will be discussed in further detail with reference to Fig. 1 to Fig. 3 hereinafter.

Referring to Fig. 1a, a system 100 is shown, according to an embodiment of the disclosure. The system 100 can, in one example, be suitable for use in association with at least one vehicle (e.g., an autonomous type vehicle or a semi-autonomous type vehicle) and/or at least one vehicle related application, in accordance with an embodiment of the disclosure. The system 100 can, in another example, be suitable for use in association with an automated vehicle computer for implementing functions such as path planning and/or object tracking, in accordance with an embodiment of the disclosure. In yet another example, the system 100 can be suitable for use in association with a server for implementing functions such as mapping, task allocation and/or cooperative routing, in accordance with an embodiment of the disclosure. Other examples such as supply chain optimization, production/factory optimization and/or logistics optimization can possibly be useful, in accordance with embodiment(s) of the disclosure.

The present disclosure contemplates that the system 100 can, for example, be suitable for facilitating the possible utilization of one or more neural network-based approaches in association with, for example, combining a set of algorithms associated with one or more subproblems, in accordance with an embodiment of the disclosure. For example, one or more neural network-based approaches can be utilized in association with conglomeration of an arbitrary set of specialized search algorithms (e.g., optimization algorithms) for individual subproblems. In a more specific example, the system 100 can be associated with a neural network structure which can be capable of enabling the conglomeration of an arbitrary set of specialized search algorithms for individual subproblems. For example, a neural network structure can be based on/correspond to/include/be associated with a coordinate descent type structure where, for example, a single decision variable subset can be optimized at each iteration by each of the subproblem optimization algorithms.

As shown, the system 100 can include one or more apparatuses 102, at least one device 104 and, optionally, a communication network 106, in accordance with an embodiment of the disclosure.

The apparatus(es) 102 can be coupled to the device(s) 104. Specifically, the apparatus(es) 102 can, for example, be coupled to the device(s) 104 via the communication network 106.

In one embodiment, the apparatus(es) 102 can be coupled to the communication network 106 and the device(s) 104 can be coupled to the communication network 106. Coupling can be by manner of one or both of wired coupling and wireless coupling. The apparatus(es) 102 can, in general, be configured to communicate with the device(s) 104 via the communication network 106, according to an embodiment of the disclosure.

The apparatus(es) 102 can, for example, correspond to one or more computers which can be carried, in accordance with an embodiment of the disclosure. For example, the apparatus(es) 102 can correspond to a processor which can be carried by a vehicle (e.g., an autonomous vehicle such as a delivery robot), in accordance with an embodiment of the disclosure.

In general, the apparatus(es) 102 can be configured to perform one or more processing tasks. The processing task(s) can include, for example, one or both of generating and communicating one or more input signals, in accordance with an embodiment of the disclosure. The processing task(s) can further include, for example, receiving one or more control signals, in accordance with an embodiment of the disclosure. The processing task(s) can yet further include, for example, processing the control signal(s) to generate one or more output signal(s), in accordance with an embodiment of the disclosure. In this regard, it is appreciable that the apparatus(es) 102 can be configured to perform one or more processing tasks in association with, for example, any one of the input signal(s), the control signal(s) and the output signal(s), or any combination thereof (i.e. , the input signal(s), the control signal(s) and/or the output signal(s)), in accordance with an embodiment of the disclosure. For example, in one embodiment, the apparatus(es) 102 can be configured to perform one or more processing tasks in association with, for example, the input signal(s), the control signal(s) and the output signal(s). The apparatus(es) 102 will be discussed in further detail with reference to Fig. 2, according to an embodiment of the disclosure.

The device(s) 104 can, for example, carry a deep learning type algorithm/network or can, for example correspond to a deep learning type algorithm/network. It is contemplated that the device(s) 104 can, for example, generally be associated with at least one Neural Network (NN), in accordance with an embodiment of the disclosure. In one example, the device(s) 104 can correspond to one or more host devices (e.g., one or more computers, or a network of computers) which carry a NN. In another example, a device 104 can correspond to a database associated with a NN (e.g., a graph NN). In one embodiment, a device 104 can, for example, correspond to a node in a NN and a number of devices 104 (i.e., corresponding to a plurality of nodes) can be coupled (e.g., interlinked/interconnected) to form a NN. In another embodiment, a device 104 can correspond to a host device carrying a plurality of nodes forming a NN. In yet another embodiment, a device 104 can, for example, correspond to a first host device carrying at least one node (e.g., a plurality of nodes) and another device 104 can, for example, correspond to a second host device carrying another at least one node (e.g., another plurality of nodes), and the first and second host devices can be coupled.

Generally, the device(s) 104 can be configured to receive the input signal(s) from the apparatus(es) 102 for further processing to generate one or more control signals. The control signal(s) can, for example, be communicated from the device(s) 104, in accordance with an embodiment of the disclosure. In a specific example, the control signal(s) can be communicated from the device(s) 104 to the apparatus(es) 102 for further processing to generate the output signal(s).

The communication network 106 can, for example, correspond to an Internet communication network, in accordance with an embodiment of the disclosure. In another example, the communication network 106 can correspond to a telecommunication network. Communication (i.e., between the apparatus(es) 102 and the device(s) 104) via the communication network 106 can be by manner of one or both of wired communication and wireless communication.

The system 100 will now be discussed in further detail in accordance with an example context 150 with reference to Fig. 1 b and Fig. 1c, in accordance with an embodiment of the disclosure, hereinafter.

The example context 150 can, for example, be based on vehicle routing (e.g., routing of an autonomous type vehicle), in accordance with an embodiment of the disclosure. In a more specific example, the example context 150 can be based on vehicle routing in association with a delivery robot, in accordance with an embodiment of the disclosure. Moreover, in the example context 150, the system 100 can include at least one apparatus 102 coupled to at least one device 104, in accordance with an embodiment of the disclosure.

Generally, in the example context 150, a neural network structure which can be capable of facilitating the possibility of combining (e.g., conglomeration) a set of algorithms (e.g., an arbitrary set of specialized search algorithms such as subproblem optimization algorithms) in association with one or more subproblems (e.g., individual subproblems), is contemplated. The contemplated neural network structure can, for example, be based on a coordinate descent type structure where a decision (e.g., a single decision) variable subset can be optimized at an iteration (e.g., at each iteration) by one or more subproblem optimization algorithms (e.g., by each of the subproblem optimization algorithms), in accordance with an embodiment of the disclosure. It is to be noted that the present disclosure does not assume that the other variables are held constant during each coordinate descent - in particular, it is contemplated that decision variables associated with lower or equal levels of a planning hierarchy could possibly change arbitrarily when performing optimization of the higher levels of a planning hierarchy. For example, it is contemplated that any change to the task assignment would possibly invalidate the start/end locations/times of any tasks that have been swapped between two vehicles and replanning could possibly be necessary. As such, it is contemplated, in accordance with an embodiment of the disclosure, that the neural network structure should be capable of learning (or at least facilitate learning) to, for example, amortize computation of one or more other planning algorithms even if significant changes to the solution need to take place at each iteration.

It is generally contemplated that the neural network structure can, for example, be capable of providing (or at least facilitating the provision of) a general/overall/high-level strategy while one or more individual solvers resolve details for one or more aspects of a subproblem (e.g., an individual solver can, for example, be configured to resolve the detail(s) for all aspects of a subproblem), in accordance with an embodiment of the disclosure. In one embodiment, the neural network structure can, for example, include/be associated with one or more optimization loops (e.g., a first optimization loop and a second optimization loop). For example, the neural network structure can be configured to gain (e.g., by manner of learning) an understanding of a current solution as a whole so that a first optimization loop (e.g., referable to as "inner optimization loop"/"inner loop optimization") can be penalized when the inner loop optimization proposes one or more solutions to a subproblem which may not be consistent with the current solution as a whole. For example, where there are a number of vehicles, each vehicle path (i.e., of each vehicle) can be considered as a subproblem and if an alternate path which collides with another vehicle is chosen, it may be either "easy" or "hard" to re-route the other vehicle out of the colliding path. The second optimization loop (e.g., referable to as "outer optimization loop"/"outer loop optimization") can, for example, be configured to internalize the complexity of rerouting in different situations and provide such internalization in a format (e.g., in the form of/corresponding to some type of parameterized function) for which a learned additional penalty can possibly be computed (e.g., computed efficiently) in the inner optimization loop, in accordance with an embodiment of the disclosure.

As shown in Fig. 1b, in the example context 150, a neural network structure 150a can, for example, include one or both of a first optimization loop 150b and a second optimization loop 150c (i.e., a first optimization loop 150b and/or a second optimization loop 150c), in accordance with an embodiment of the disclosure. In one embodiment, the neural network structure 150a can, for example, include a first optimization loop 150b and a second optimization loop 150c. Moreover, the neural network structure 150a can be configured to receive one or more input signals 150d.

The input signal(s) 150d can, for example, be associated with/include/be based on/correspond to one or both of: one or more system states 151a, and one or more predictive models 151b, in accordance with an embodiment of the disclosure. In one embodiment, the input signal(s) 150d can, for example, include at least one system state 151a and at least one predictive model 151b. The system state(s) 151a can be communicable from, for example, the apparatus(es) 102, in accordance with an embodiment of the disclosure. The system state(s) 151a can, for example, be indicative of the current position of each vehicle, a list of all in-progress jobs and/or a list of delivery windows that have been guaranteed to customers, etc. In one specific example, the system state(s) 151a (e.g., which can include upcoming delivery request(s), position/container status/battery status of each vehicle and/or a graph showing possible route(s) the vehicle(s) can take through an environment) can be collated. Collation can, for example, be in the form of a graph having a number of nodes and each node can correspond to a particular/specific location in an environment. It is contemplated that other data structures may possibly be applicable (e.g., different data structures for application(s) which may not be relevant in connection with, for example, vehicle routing), in accordance with an embodiment of the disclosure.
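As a hedged illustration only, a collated system state of the kind described above might be represented as a plain Python structure along the following lines; every field name (neighbors, battery, time_window and so on) and every value is an assumption made for the example rather than part of the disclosure.

```python
# Hypothetical collation of a system state 151a into a graph-like structure.
# Node names, edge costs (e.g. traversal time in seconds) and vehicle/job fields
# are illustrative placeholders only.
system_state = {
    "nodes": {  # each node corresponds to a particular location in the environment
        "depot":      {"neighbors": {"junction_a": 40.0}},
        "junction_a": {"neighbors": {"depot": 40.0, "stop_1": 25.0}},
        "stop_1":     {"neighbors": {"junction_a": 25.0}},
    },
    "vehicles": [
        {"id": "robot_1", "node": "depot", "battery": 0.8, "container_free": True},
    ],
    "jobs": [
        {"id": "job_7", "pickup": "depot", "dropoff": "stop_1", "time_window": (0, 900)},
    ],
}
```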

The predictive model(s) 151b can be communicable from, for example, one or more devices 104 and/or the apparatus(es) 102. Generally, the predictive model(s) 151b can be communicable from one or more external systems/devices (e.g., one or more devices 104) that can be configured to perform one or more processing tasks in relation to prediction of variables/parameters such as, for example, future demand, travel time, fault and/or battery life etc., in accordance with an embodiment of the disclosure.

With regard to the first optimization loop 150b, one or more processing tasks in association with one or more subproblem optimization iterations can be performed, in accordance with an embodiment of the disclosure.

With regard to the second optimization loop 150c, one or more processing tasks in association with one or more coordinate descent iterations can be performed, in accordance with an embodiment of the disclosure.

Generally, in accordance with an embodiment of the disclosure, for each coordinate descent iteration 150c (i.e., which can correspond to one subproblem), the neural network structure 150a can, for example, be configured to perform one or more processing tasks in association with any one of, or any combination of, the following:

• collating one or more provisional values of a set of decision variables 152 (e.g., provisional task assignments and/or full set of routes etc.). It is contemplated that such provisional value(s) can, for example, be expressed in a form of a graph data structure, in accordance with an embodiment of the disclosure.

• receiving one or both of the input signal(s) (e.g., the system state(s) 151a and/or data corresponding to the predictive model(s) 151b) and data indicative of a current solution by a neural network (NN) such as a graph neural network (graph NN) 154. It is contemplated that the graph NN 154 can be configured to predict a set of learned additional penalty parameters 156 which can be usable for evaluating one or more possible solutions to the subproblem associated with the first optimization loop 150b. In one embodiment, a graph NN can, for example, refer to a neural network which can execute once in each coordinate descent iteration and can generate the learned additional penalty parameter(s). It is contemplated that, in one embodiment, a separate graph neural network can possibly be utilized for each subproblem. It is further contemplated that, in one embodiment, one or more layers can possibly be shared (e.g., as between the graph neural networks) to, for example, improve learning.

• running a specialized search algorithm (e.g., an optimization algorithm) such as a local/graph search algorithm in association with the subproblem. It is contemplated that for each solution to the subproblem, a learned additional penalty 156 can be queried. The learned additional penalty 156 can, for example, be useful for evaluation of the subproblem solution. For example, the input signal(s) may be indicative of a particular route and the learned additional penalty can be a scalar quantity. It is contemplated that the learned additional penalty 156 can, for example, correspond to/be a surrogate for the effect(s) the subproblem solution variable(s) can have on the optimization problem as a whole. Moreover, it is contemplated that the (specialized) search algorithm can combine the learned additional penalty 156 with the heuristics used to solve the subproblem.

Generally, the present disclosure contemplates that in the example context 150, one or more subsets of decision variable(s) 158 can be iteratively optimized, for example, in a cyclic manner. For example, if there are three subsets of variables, optimization 160 can be in the following (cyclic) manner:

o Optimize decision variable subset 1 (subproblem 1)
o Optimize decision variable subset 2 (subproblem 2)
o Optimize decision variable subset 3 (subproblem 3)
o Optimize decision variable subset 1 (subproblem 1)
o Optimize decision variable subset 2 (subproblem 2)
o Optimize decision variable subset 3 (subproblem 3)
o ... (i.e., and so on and so forth)
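A minimal Python sketch of the processing described above is given below; the penalty_network callable stands in for the graph NN 154, the solvers dictionary stands in for the specialized search algorithms, and all names, signatures and data structures are illustrative assumptions rather than the disclosed implementation.

```python
def coordinate_descent_step(state, solution, subproblem, penalty_network, solvers):
    """One coordinate descent iteration (one subproblem), sketched: collate the
    provisional decision-variable values, let the (graph) neural network produce
    the learned additional penalty, then run the subproblem's specialized solver,
    which queries that penalty for every candidate sub-solution it evaluates."""
    provisional = dict(solution)                               # provisional values of the decision variables
    penalty = penalty_network(state, provisional, subproblem)  # callable: candidate -> scalar penalty
    new_values = solvers[subproblem["kind"]](state, provisional, subproblem, penalty)
    updated = dict(solution)
    updated.update(new_values)                                 # only this decision-variable subset is rewritten
    return updated


def run_cyclic_coordinate_descent(state, subproblems, penalty_network, solvers, n_cycles=3):
    """Outer loop: optimize the decision-variable subsets in the cyclic order above."""
    solution = {}                                              # some form of "default" solution to start from
    for _ in range(n_cycles):
        for subproblem in subproblems:                         # subset 1, subset 2, subset 3, subset 1, ...
            solution = coordinate_descent_step(state, solution, subproblem,
                                               penalty_network, solvers)
    return solution
```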

Generally, it is contemplated that a decision variable can, for example, be in relation to one or more values that an optimization algorithm should choose, in accordance with an embodiment of the disclosure. Such value(s) can, for example, be expressed as a (large) number of binary values which can be representative of, for example, one or more task assignments and/or one or more paths through a graph, etc., at different points in time, in accordance with an embodiment of the disclosure. Moreover, a set of decision variables can, for example, refer to the decision variables associated with the entire problem, in accordance with an embodiment of the disclosure. Additionally, a decision variable subset can, for example relate to a subset of the decision variables, in accordance with an embodiment of the disclosure and solving a subproblem can, for example, be in relation to determining one or more assignments for a decision variable subset, in accordance with an embodiment of the disclosure. It is contemplated that possible values for the decision variables can, for example, be referred to as provisional values. Moreover, it is contemplated that although each subproblem can, for example, correspond to a subset of the decision variables, there may possibly be overlap where a particular decision variable belongs to more than one subproblem, in certain situations, in accordance with an embodiment of the disclosure. Additionally, it is contemplated that there can also be a possibility that, for example, multiple subproblems could correspond to the exact same subset of decision variables, except a different inner loop algorithm and/or cost function can be used during the inner loop optimization of the subproblem, in accordance with an embodiment of the disclosure.

Further generally, learned additional penalty (e.g., which can possibly be expressed as a function with dynamically computed parameters or as a matrix etc.) can, for example, relate to augmenting a cost function with an additional “penalty” that nudges the optimization algorithm towards optimal/good solution(s), in accordance with an embodiment of the disclosure. For example, in one embodiment, a graph neural network can be configured to predict one or more parameters (i.e., learned additional penalty parameters) that can be used to compute the learned additional penalty. Implementation (e.g., depending on the nature of the subproblems) can be by manner of, for example, a parameterized function, a matrix or a neural network with dynamic weights (also referred to as a “hypernetwork”), in accordance with an embodiment of the disclosure. Additionally, a cost function can, for example, relate to a function that can output an overall scalar score for each decision variable, in accordance with an embodiment of the disclosure. In one example situation (e.g., in a situation where the number of decision variables can be considered to be small for a particular subproblem), a cost function can be expressed as a lookup operation on a matrix (i.e., referable to as a “cost matrix”), in accordance with an embodiment of the disclosure - in such an example situation, it can, for example, be possible to precompute the cost function for each entry of the matrix and make reference (e.g., direct reference) to the cost matrix. It is contemplated that the “learned additional penalty” can, for example, provide one of the components/parameters of the cost function for each subproblem - the present disclosure contemplates that the cost function can include one or more other components/parameters (e.g., package handling time and/or vehicle travel distance minimization).
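For the cost-matrix situation described above, a short hedged numerical illustration follows; the non-penalty components (handling time, travel distance) and all numerical values are invented for the example, and only the learned additional penalty entries would come from the graph NN.

```python
import numpy as np

handling_time   = np.array([[2.0, 3.0], [1.5, 4.0]])  # tasks x vehicles (assumed component)
travel_distance = np.array([[5.0, 1.0], [2.0, 6.0]])  # tasks x vehicles (assumed component)
learned_penalty = np.array([[0.3, 2.0], [1.1, 0.0]])  # tasks x vehicles, predicted by the graph NN

# The cost function reduces to a lookup on a precomputed cost matrix; the learned
# additional penalty is just one additive component of each entry.
cost_matrix = handling_time + travel_distance + learned_penalty
cost_task0_vehicle1 = cost_matrix[0, 1]  # simple lookup operation
```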

Yet further generally, a coordinate descent iteration can, for example, refer to a step of an overall optimization problem where a single subproblem is solved. Moreover, in regard to subproblem optimization iteration, it is contemplated that when a subproblem can be associated with being iterative (in nature), a single iteration of the optimization of the subproblem can be referred to as a "subproblem optimization iteration", in accordance with an embodiment of the disclosure. It is contemplated that initially, there can be some form of a "default" solution (e.g., all the vehicles have no navigation commands) and as each step of the coordinate descent iteration process is executed, a subset of the variables comprising a current solution can be updated. In this regard, the current solution can be repeatedly perturbed during the coordinate descent iteration process until a final solution is derived. Such a final solution may then be executed by the actual vehicles. Appreciably, the present disclosure contemplates the possibility of incremental optimization - if the vehicles are carrying out a solution (e.g., referable to as an initial/default solution), the initial/default solution can be regarded as a current solution, i.e., a starting point on which incremental changes can be based to subsequently derive the (updated) final solution (e.g., which may mean/imply that some vehicles can receive new commands).

Earlier mentioned, a specialized search algorithm can, for example, be run in association with a subproblem, in accordance with an embodiment of the disclosure. The present disclosure contemplates that such a specialized search algorithm can, for example, be based on/be included in a library of specialized solvers (e.g., which can be carried by one or more devices 104) for one or more specific subproblems. Examples of such specialized solvers can, in accordance with an embodiment of the disclosure, include:

• Linear assignment: Given a matrix of assignment costs, determine an assignment of tasks to workers.

• Quadratic assignment: Generalization of linear assignment for when there are dependencies between tasks. There can be requirement(s) for cost matrices to be specified for task-task, task-vehicle, and vehicle-vehicle affinities.

• Route planning: Find the best sequence of waypoints to get from point A to point B, given traversal costs associated with each edge and node on a graph.

• Tour planning: Find the shortest route that passes through every location in some given list, given traversal costs associated with each edge and node on a graph.

• Local search: Randomly perturb a solution, measure the cost function, and show some type of preference for solution(s) with lower cost(s). Examples can include simulated annealing, genetic algorithms, tabu search and particle swarm optimization.

• Integer programming: Formulation of a linear objective function and linear valued constraint(s) over Boolean valued variable(s).

• Constraint programming: Which can be similar to integer programming, but with different computational tradeoff(s) involved.

• Graph/tree search: Search for, for example, route(s), node(s), Nash equilibrium between competing/cooperating agents.
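As an assumption-laden sketch of such a library (not the disclosed implementation), two of the listed solver types can be exposed behind a common dictionary, here reusing the off-the-shelf scipy linear assignment routine and a networkx shortest-path search; the wrapper names and signatures are hypothetical.

```python
import numpy as np
import networkx as nx
from scipy.optimize import linear_sum_assignment

def solve_linear_assignment(cost_matrix):
    """Linear assignment: given a tasks-x-workers cost matrix, return the
    minimum-cost pairing of tasks to workers."""
    rows, cols = linear_sum_assignment(cost_matrix)
    return list(zip(rows.tolist(), cols.tolist()))

def solve_route_planning(graph, start, goal):
    """Route planning: best sequence of waypoints from start to goal, given a
    traversal cost stored on each edge under the (assumed) attribute 'cost'."""
    return nx.dijkstra_path(graph, start, goal, weight="cost")

SPECIALIZED_SOLVERS = {
    "linear_assignment": solve_linear_assignment,
    "route_planning": solve_route_planning,
}

# Example usage of the assignment wrapper:
# solve_linear_assignment(np.array([[4.0, 1.0], [2.0, 3.0]])) -> [(0, 1), (1, 0)]
```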

The above example context 150 will now be discussed in further detail based on a first example scenario, a second example scenario and a third example scenario hereinafter, in accordance with an embodiment of the disclosure.

The first example scenario can be in relation to a Point-to-Point delivery scenario where each vehicle (e.g., each delivery robot) can carry a single item (i.e., combined task assignment and cooperative path planning) where a problem can be (decomposed/broken down) as follows:

• obtain a learned additional penalty matrix for an assignment problem which can incorporate feedback from a path planning process.

• assign one or more jobs to one or more vehicles based on this additional penalty matrix (e.g., using linear assignment).

• For each graph node, derive a “learned additional penalty” based on feedback from the traveling path(s) of one or more other vehicles.

• Plan the traveling path(s) for each vehicle according to the assigned job(s) and the derived “learned additional penalty”.
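A hedged Python sketch of this decomposition is shown below; the two penalty inputs are placeholders standing in for outputs of the learned model, the legs from each vehicle's current position to its assigned pickup are omitted for brevity, and all function names are assumptions.

```python
import networkx as nx
from scipy.optimize import linear_sum_assignment

def assign_and_route(graph, vehicles, jobs, base_cost, assignment_penalty, node_penalty):
    """jobs: list of (pickup, dropoff) node pairs; base_cost and assignment_penalty:
    jobs x vehicles matrices; node_penalty: {node: scalar} from the learned model."""
    # 1) assign jobs to vehicles using the penalty-augmented assignment matrix
    rows, cols = linear_sum_assignment(base_cost + assignment_penalty)
    assignment = {jobs[r]: vehicles[c] for r, c in zip(rows, cols)}

    # 2) plan each vehicle's path with the learned per-node penalty added to edge costs
    def edge_cost(u, v, data):
        return data.get("cost", 1.0) + node_penalty.get(v, 0.0)

    routes = {}
    for (pickup, dropoff), vehicle in assignment.items():
        routes[vehicle] = nx.dijkstra_path(graph, pickup, dropoff, weight=edge_cost)
    return assignment, routes
```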

The second example scenario can be in relation to a Point-to-Point delivery scenario where each vehicle (e.g., each delivery robot) can carry multiple items (e.g., shuttle bus/dial-a-ride problem), in which a problem can be (decomposed/broken down) as follows:

• Obtain a learned additional penalty function which takes task assignment information as input.

• Assign tasks to vehicles at each point in time according to the cost function.

• Obtain a learned additional penalty that takes a vehicle schedule (i.e., a list of pickups and drop-offs) as input.

• Determine a schedule for every vehicle.

• For each graph node, derive a "learned additional penalty".

• Route vehicles according to the schedules and the derived “learned additional penalty”.

The third example scenario can be in relation to Mothership based delivery (e.g., fixed set of deliveries, mothership sets down and picks up small delivery robots with limited range to perform actual deliveries) where a problem can be (decomposed/broken down) as follows:

• Obtain a learned additional penalty function which takes possible groupings of the jobs as input.

• Jobs can be clustered into job groups based on the learned additional penalty function.

• Obtain a learned additional penalty matrix for each possible assignment of the job groups to individual vehicles.

• Job groups can be assigned to delivery robots based on, for example, constraint programming with the learned additional penalty matrix.

• For each graph node to be traversed by the mothership, derive a "learned additional penalty".

• The path of the mothership can be updated using a local search approach (e.g., using a cost function to evaluate each possible perturbation).

• For each graph node to be traversed by the delivery robots, derive a "learned additional penalty".

• Tours can be planned for each delivery robot based on the assigned jobs and current path of the mothership.

Concerning the above example scenario(s), it is contemplated that the graph nodes can, for example, correspond to physical locations that the vehicle(s) may visit/travel, in accordance with an embodiment of the disclosure.

Furthermore, it is generally contemplated that the above example scenario(s) can, for example, be associated with one or more subproblems. The subproblems can, for example, be associated with a hierarchy in which one subproblem can be associated with a higher level within the hierarchy as compared to another subproblem. For example, a first subproblem can be associated with a higher level (i.e., as compared with a second subproblem) within the hierarchy whereas a second subproblem can be associated with a lower level (i.e., as compared with the first subproblem) within the hierarchy. In this regard, the second subproblem (i.e., associated with a lower level within the hierarchy) can have subproblem deference to the first subproblem (i.e., associated with a higher level within the hierarchy). For example, in regard to subproblem deference, it is contemplated that, in one embodiment, for certain pairs of subproblems a lower level subproblem can have a natural dependence on a higher level subproblem (e.g., route planning can be considered to be a lower level subproblem which can have a dependence on task assignment, which can be considered to be a higher level subproblem), in accordance with an embodiment of the disclosure. In this regard, it is contemplated that a lower level subproblem can, for example, possibly take the decision variable subset associated with the higher level subproblem as fixed, in accordance with an embodiment of the disclosure.

Moreover, it is contemplated that such subproblems can be associated with one or more variations, in accordance with an embodiment of the disclosure. Example(s) of such variations can include:

• Online/on-demand route planning with an a priori unknown stream of jobs (e.g., when users may submit requests for rides at any time using a Smartphone app). It is contemplated that, in one embodiment, a number of “possible future tasks” that could plausibly be submitted to the system 100 at some point in the near future (e.g., receiving ride requests from a consumer app carried by a Smartphone device) can be identified, and a “forking” path that can account for each of such “possible future tasks” can be planned. The “forking” path can include a “shared” section up to one junction followed by a fork into separate paths for each “possible future task”. It is further contemplated that forking paths could possibly be internalized by the aforementioned “learned additional penalty”, in accordance with an embodiment of the disclosure.

• Battery charge constraint(s) where a vehicle (e.g., delivery robot) must move to a charging station before it experiences any risk of running out of power (thus resulting in possible immobility). Such a variation can possibly be added to the cost function at the lowest level of the subproblem hierarchy (and can implicitly be propagated upwards to higher levels of the subproblem hierarchy via the learned additional penalties).

• Vehicle breakdown/fault modelling where the rate of vehicle breakdowns can possibly be predicted in advance.

• Physical space constraints in the operating area (environment), i.e., where the space available for vehicle movement could be constrained.

• Management of external communication of delivery windows where certain time windows for pickups and drop-offs can be provided. It is contemplated that the previously communicated time window(s) can possibly be integrated into the cost function to give preference to solutions where the time windows remain consistent.

As mentioned earlier, in the example context 150, it is contemplated that the neural network (e.g., graph NN 154) can, for example, be configured to predict a set of learned additional penalty parameters 156 which can be usable for evaluating one or more possible solutions to a subproblem, in accordance with an embodiment of the disclosure.

In one embodiment, the neural network (e.g., graph NN 154) can be trained in a manner as will be discussed in further detail with reference to Fig. 1c hereinafter.

Specifically, Fig. 1c shows a flow diagram 170 of a training process of a neural network associated with the neural network structure 150a of Fig. 1 b, according to an embodiment of the disclosure. More specifically, referring to Fig. 1c, the neural network (NN) can be trained based on, for example, simulated data, in accordance with an embodiment of the disclosure.

It is contemplated that, in accordance with an embodiment of the disclosure, a NN structure can be based on, for example, Fig. 2 on page 3 of “Heterogeneous Graph Transformer” by Ziniu Hu, Kuansan Wang, Yuxiao Dong and Yizhou Sun (hereinafter referred to as “reference paper”). Moreover, data usable for training the NN can, for example, be based on an OpenAI gym environment (e.g., for control of a ride hailing fleet) such as “pyhailing”, in accordance with an embodiment of the disclosure.

As an example of how an above-described NN may be trained, a training dataset may be provided to the NN (e.g., graph NN) and standard steps for training a general NN may be followed. For the above-mentioned NN, “pyhailing” simulated data can, for example, be used to train the NN. In another example, in the context of a pseudocode, as will be discussed as an example later in further detail in accordance with an embodiment of the disclosure, one or more solutions can be generated by manner of a slow offline method based on a software package which can be related to a vehicle routing problem solver (https://github.com/reinterpretcat/vrp).

Before training the neural network according to, for example, the last paragraph of section 4.2 of the earlier mentioned “reference paper”, the weights are randomly initialized to numbers between 0.01 and 0.1, while the biases are randomly initialized to numbers between 0.1 and 0.9.

Subsequently, the first observations of the dataset are loaded into the input layer of the neural network and the output value(s) is generated by forward-propagation of the input values of the input layers. Afterwards, the following loss function is used to calculate the loss with the output value(s):

$L = \frac{1}{n}\sum_{i=1}^{n}(y_i - Y_i)^2$

wherein n is the number of neurons in the output layer and y is the real value while Y is the calculated output.

The weights and biases are subsequently updated by an AdamOptimizer with a learning rate of 0.001. It is to be noted that the other parameters of the AdamOptimizer should be set to default values, for example: beta_1 = 0.9, beta_2 = 0.999, eps = 1e-08, weight_decay = 0.

The steps described above should be repeated with the next set of observations until all the observations are used for training. This represents the first training epoch.

This can be repeated until 10 epochs are completed.
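By way of illustration only, the following is a minimal PyTorch sketch of the training procedure described above (uniform weight/bias initialization, forward propagation, a squared-error loss, and Adam with a learning rate of 0.001 over 10 epochs). The small feed-forward model and the synthetic observations are hypothetical stand-ins for the graph NN and the simulated training data.

# Illustrative sketch only: the model and the dataset are stand-ins, not the disclosed NN.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Initialize weights in [0.01, 0.1] and biases in [0.1, 0.9], as described above.
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.uniform_(module.weight, 0.01, 0.1)
        nn.init.uniform_(module.bias, 0.1, 0.9)

# Adam with learning rate 0.001 and default parameters
# (beta_1 = 0.9, beta_2 = 0.999, eps = 1e-08, weight_decay = 0).
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
loss_fn = nn.MSELoss()  # squared-error loss over the n output neurons

# Synthetic observations standing in for, e.g., "pyhailing" simulated data.
inputs = torch.randn(100, 16)
targets = torch.randn(100, 4)

for epoch in range(10):                 # 10 training epochs
    for x, y in zip(inputs, targets):   # one set of observations at a time
        prediction = model(x)           # forward propagation
        loss = loss_fn(prediction, y)   # loss between calculated output and real value
        optimizer.zero_grad()
        loss.backward()                 # back-propagate the loss
        optimizer.step()                # update weights and biases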

It is contemplated that the neural network can, for example, be configured to amortize calculation of conditional subproblems. In this regard, it is contemplated that training data for the neural network can, for example, be self-generated by manner of the following example process:

• Generate a solution to a subproblem (e.g., based on the earlier discussion in association with the system 100)

• Generate an improved solution (e.g., by applying a very general optimization algorithm like local search).

• For each subproblem, a possible value of a learned additional penalty can be determined so as to force the solution of the subproblem to match the solution generated in a previous step.

• A new value of the learned additional penalty can be chosen to be close/near to the existing value of the learned additional penalty.

• A loss function can be computed based on the difference between the new and existing values of the learned additional penalty.

In one embodiment, a prediction model 172 can be used to generate a scenario 174 which can be used as input for generating a solution to a subproblem 176. Additionally, one or both of a cost function 178 and data associated with one or more auxiliary predictive models 180 can be used as further input(s) for generating the solution to a subproblem. The cost function 178 and/or data associated with the auxiliary predictive model(s) 180 can be used to generate an improved solution 182. As mentioned earlier, for each subproblem, a possible value of a learned additional penalty 184 can be determined so as to force the solution of the subproblem to match the solution generated in a previous step.
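By way of illustration only, the following minimal Python sketch mirrors the self-generated training data process described above; solve_subproblem, local_search_improve and fit_penalty_to_solution are hypothetical stand-ins for the specialized solver, the general improvement step and the penalty-fitting step, and the numeric values are arbitrary.

# Illustrative sketch only: all helpers and values are hypothetical stand-ins.
import numpy as np

def solve_subproblem(scenario, penalty):
    # stand-in: a "solution" is a vector influenced by the learned additional penalty
    return scenario - penalty

def local_search_improve(solution):
    # stand-in for a very general optimization algorithm such as local search
    return solution * 0.9

def fit_penalty_to_solution(scenario, target_solution):
    # choose a penalty value that would force solve_subproblem to reproduce target_solution
    return scenario - target_solution

rng = np.random.default_rng(0)
existing_penalty = np.zeros(4)
scenario = rng.normal(size=4)                      # scenario 174 from a prediction model 172

solution = solve_subproblem(scenario, existing_penalty)       # solution to a subproblem 176
improved = local_search_improve(solution)                     # improved solution 182
target_penalty = fit_penalty_to_solution(scenario, improved)  # penalty 184 matching the improved solution

# keep the new penalty close to the existing one, then compute a loss on the difference
new_penalty = existing_penalty + 0.1 * (target_penalty - existing_penalty)
loss = float(np.sum((new_penalty - existing_penalty) ** 2))
print(loss)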

It is contemplated that, in the above manner, one or more specialized/customized optimization algorithms (e.g., specialized solver(s)) for different subproblems can possibly be combined (together) in a manner such that tight coupling between the subproblems can be facilitated, in accordance with an embodiment of the disclosure. Generally, for the present disclosure, it is contemplated that the neural network can be responsible for the higher level task of deconfliction (i.e., as opposed to low level planning). Hence, improved scaling to larger problem(s) can possibly be facilitated. For example, the neural network can be concerned only with learning to deconflict different local search algorithms whereas low level planning can be carried out by one or more specialized search algorithms (e.g., a library of specialized solvers). It is appreciable that by facilitating the combination of specialized/customized optimization algorithm(s) (e.g., specialized search algorithm(s) associated with/from a library of specialized solvers) such that tight coupling between the subproblems can possibly be addressed, efficiency (e.g., computational efficiency) and/or adaptability/flexibility can possibly be facilitated.

As mentioned earlier, the apparatus(es) 102 can, for example, be configured to receive one or more control signals, in accordance with an embodiment of the disclosure. Generally, it is contemplated that the device(s) 104 can be configured to generate and communicate one or more control signal(s) corresponding to/associated with/indicative of/which can include the generated solution(s) (e.g., a final solution) as discussed earlier, in accordance with an embodiment of the disclosure.

As further mentioned earlier, the apparatus(es) 102 can, for example, be configured to process the control signal(s) to generate one or more output signal(s), in accordance with an embodiment of the disclosure. The output signal(s) can be usable for, for example, movement control of a vehicle (e.g., a delivery robot), in accordance with an embodiment of the disclosure.

The aforementioned apparatus(es) 102 will be discussed in further detail with reference to Fig. 2 hereinafter.

Referring to Fig. 2, an apparatus 102 is shown in further detail in the context of an example implementation 200, according to an embodiment of the disclosure.

In the example implementation 200, the apparatus 102 can correspond to an electronic module 200a which can be capable of being configured to perform one or more processing task(s) in association with the input signal(s), the control signal(s) and/or the output signal(s), as discussed earlier with regard to the system 100, in accordance with an embodiment of the disclosure.

The electronic module 200a can, for example, correspond to a computing device which can, for example, be carried by a vehicle (e.g., a mobile delivery robot), in accordance with an embodiment of the disclosure. The vehicle (not shown) can, for example, carry at least one sensor and/or at least one communication module which can be configured to provide one or more inputs associated with/indicative of the system state(s) of the vehicle, in accordance with an embodiment of the disclosure. Moreover, one or more inputs associated with/indicative of the predictive model(s) can be communicated from, for example, at least one device 104 (e.g., which can carry a database of predictive models), in accordance with an embodiment of the disclosure.

The electronic module 200a can, for example, include a casing 200b. Moreover, the electronic module 200a can, for example, carry any one of a first module 202, a second module 204, a third module 206, or any combination thereof.

In one embodiment, the electronic module 200a can carry a first module 202, a second module 204 and/or a third module 206. In a specific example, the electronic module 200a can carry a first module 202, a second module 204 and a third module 206, in accordance with an embodiment of the disclosure.

In this regard, it is appreciable that, in one embodiment, the casing 200b can be shaped and dimensioned to carry any one of the first module 202, the second module 204 and the third module 206, or any combination thereof.

The first module 202 can be coupled to one or both of the second module 204 and the third module 206. The second module 204 can be coupled to one or both of the first module 202 and the third module 206. The third module 206 can be coupled to one or both of the first module 202 and the second module 204. In one example, the first module 202 can be coupled to the second module 204 and the second module 204 can be coupled to the third module 206, in accordance with an embodiment of the disclosure. Coupling between the first module 202, the second module 204 and/or the third module 206 can, for example, be by manner of one or both of wired coupling and wireless coupling.

Each of the first module 202, the second module 204 and the third module 206 can correspond to one or both of a hardware-based module and a software-based module, according to an embodiment of the disclosure.

In one example, the first module 202 can correspond to a hardware-based receiver which can be configured to receive the control signal(s). The second module 204 can, for example, correspond to a hardware-based processor which can be configured to perform one or more processing tasks in association with one or both of the following:

• generating one or more input signal(s) (e.g., based on the input(s) associated with/indicative of the system state(s) of the vehicle which can be communicated from the sensor(s) and/or the communication module(s)), and

• processing the received control signal(s) in a manner so as to generate the output signal(s).

The third module 206 can correspond to a hardware-based transmitter which can be configured to communicate one or both of the input signal(s) and the output signal(s) (i.e., the input signal(s) and/or the output signal(s)). The input signal(s) and/or the output signal(s) can, for example, be communicated from the electronic module 200a to the device(s) 104, in accordance with an embodiment of the disclosure.
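By way of illustration only, the following minimal Python sketch mirrors the roles of, and the signal flow between, the three modules described above; the class and method names are hypothetical and only reflect the functional description, not an actual implementation.

# Illustrative sketch only: hypothetical classes mirroring the module roles above.
class FirstModule:   # receiver role
    def receive_control_signal(self, nn_structure_output):
        return nn_structure_output

class SecondModule:  # processor role
    def generate_input_signal(self, system_state, predictive_model):
        return {"state": system_state, "model": predictive_model}
    def process_control_signal(self, control_signal):
        return {"actuation": control_signal}  # output signal usable for machine control

class ThirdModule:   # transmitter role
    def transmit(self, signal):
        return signal  # e.g., communicated towards the device(s) 104

first, second, third = FirstModule(), SecondModule(), ThirdModule()
input_signal = second.generate_input_signal("vehicle pose", "predictive model entry")
third.transmit(input_signal)                                   # input signal to the NN structure
control_signal = first.receive_control_signal("final solution")  # control signal from the NN structure
output_signal = second.process_control_signal(control_signal)
print(output_signal)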

The present disclosure contemplates the possibility that the first and second modules 202/204 can be an integrated software-hardware based module (e.g., an electronic part which can carry a software program/algorithm in association with receiving and processing functions/an electronic module programmed to perform the functions of receiving and processing). The present disclosure further contemplates the possibility that the first and third modules 202/206 can be an integrated software-hardware based module (e.g., an electronic part which can carry a software program/algorithm in association with receiving and transmitting functions/an electronic module programmed to perform the functions of receiving and transmitting). The present disclosure yet further contemplates the possibility that the first and third modules 202/206 can be an integrated hardware module (e.g., a hardware-based transceiver) capable of performing the functions of receiving and transmitting.

In view of the foregoing, it is appreciable that the present disclosure generally contemplates an apparatus 102 (e.g., which can correspond to an electronic module 200a, in accordance with an embodiment of the disclosure) which can be coupled to at least one device 104 (i.e., associable with a “neural network structure”), in accordance with an embodiment of the disclosure.

Specifically, in one embodiment, the present disclosure contemplates an apparatus 102 which can be suitable for use with, for example, a neural network (NN) structure.

In one embodiment, the apparatus 102 can, for example, include a first module 202, a second module 204 and a third module 206. In one example, the first module 202 can be coupled to the second module 204 and the second module 204 can be coupled to the third module 206, in accordance with an embodiment of the disclosure.

The first module 202 can, for example, be configured to receive one or more control signals from the NN structure. The second module 204 can, for example, be configured to process the control signal(s) in a manner so as to generate at least one output signal usable for controlling a machine (e.g., a vehicle which can, for example, correspond to a delivery robot). The third module 206 can, for example, be configured to communicate one or more input signals which can be received by the NN structure for processing to generate the control signal(s). In one embodiment, the apparatus 102 can, for example, be carried by the machine and the apparatus 102 can, for example, be configured to control the machine based on the output signal(s).

In one embodiment, the NN structure can be configured to process the input signal(s) by manner of one or both of:

• a first optimization loop 150b wherein subproblem optimization iterations are performed and the first optimization loop 150b is associable with at least one specialized algorithm (i.e., specialized/customized optimization algorithm(s)) corresponding to at least one solution which can be associated with at least one subproblem, and

• a second optimization loop 150c wherein coordinate descent iterations are performed.

In one embodiment, the NN structure can be associated with a neural network (NN) (e.g., a graph NN) associable with the second optimization loop 150c. Moreover, the NN can be capable of accepting one or more input signals for processing to generate one or more learned additional penalties by manner of one or more coordinate descent iterations.

Furthermore, the first optimization loop 150b can, for example, include querying one or more learned additional penalties capable of evaluating the solution(s), in accordance with an embodiment of the disclosure. For example, the NN can be configured/utilized to generate one or more predictions that can be used to influence the first optimization loop 150b (e.g., inner loop optimization) and “querying” can be used to indicate the manner in which the first optimization loop 150b can utilize information from the prediction(s) associated with the NN, in accordance with an embodiment of the disclosure. In one specific example, in regard to task assignment, the NN can be configured to predict a matrix which can be used as input to a linear assignment algorithm. In another specific example, in regard to routing, the NN can be used to predict one or more weights associated with graph edges and the predicted weight(s) can be used as input(s) to the path planning algorithm. Alternatively, a hypernetwork can, for example, be utilized, in which a function can combine the prediction(s) associated with the NN with one or more candidate solutions to obtain/derive at least one cost, in accordance with an embodiment of the disclosure. For example, for each graph node a short embedding vector can be predicted, a 1D convolution can be applied to the sequence of graph nodes corresponding to the candidate solution and the resulting array can be summed, in accordance with an embodiment of the disclosure.
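By way of illustration only, the following minimal PyTorch sketch mirrors the hypernetwork-style evaluation described above: a short embedding vector is predicted per graph node, a 1D convolution is applied over the node sequence of a candidate solution, and the result is summed to a scalar cost. The random node_embeddings tensor and the candidate node lists are hypothetical stand-ins for actual NN predictions and candidate solutions.

# Illustrative sketch only: the embeddings are random stand-ins for graph NN predictions.
import torch
import torch.nn as nn

n_nodes, embed_dim = 20, 8
node_embeddings = torch.randn(n_nodes, embed_dim)  # per-node short embedding vectors

conv = nn.Conv1d(in_channels=embed_dim, out_channels=1, kernel_size=3, padding=1)

def candidate_cost(candidate_solution):
    """candidate_solution: list of graph node ids visited in order."""
    seq = node_embeddings[torch.tensor(candidate_solution)]  # (seq_len, embed_dim)
    seq = seq.transpose(0, 1).unsqueeze(0)                   # (1, embed_dim, seq_len)
    return conv(seq).sum()                                    # scalar learned additional penalty

candidate_a = [0, 3, 7, 12, 5]
candidate_b = [0, 2, 9, 12, 5]
print(float(candidate_cost(candidate_a)), float(candidate_cost(candidate_b)))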

In one embodiment, the second optimization loop 150c can, for example, correspond to a coordinate descent iteration process which can, for example, include one or more coordinate descent iterations. The coordinate descent iteration(s) can, for example, include a current coordinate descent iteration (e.g., an iteration which can be performed at a current point in time) and one or more subsequent coordinate descent iterations (e.g., one or more iterations which can be performed at later point(s) in time after the current coordinate descent iteration). The subsequent coordinate descent iterations can follow (e.g., sequentially) the current coordinate descent iteration. Additionally, the solution(s) can, for example, correspond to a current solution which can be associated with the current coordinate descent iteration. Furthermore, the current solution can be capable of being perturbed during a subsequent coordinate descent iteration based on the learned additional penalty so as to derive an updated current solution. In one embodiment, the solution(s) can be capable of being repeatedly perturbed during the coordinate descent iteration process until a final solution can be derived. The control signal(s) can, for example, correspond to/include/be associated with the final solution.
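By way of illustration only, the following minimal Python sketch mirrors the coordinate descent iteration process described above, in which the current solution is repeatedly perturbed, one decision variable subset at a time, based on learned additional penalties until a final solution is returned. The solver and penalty-prediction functions are hypothetical stand-ins for the specialized solvers and the NN.

# Illustrative sketch only: all solvers and the penalty predictor are hypothetical stand-ins.
def coordinate_descent(subproblem_solvers, predict_penalties, solution, n_rounds=5):
    for _ in range(n_rounds):                            # second optimization loop 150c
        penalties = predict_penalties(solution)          # learned additional penalties from the NN
        for name, solver in subproblem_solvers.items():  # first optimization loop 150b
            # perturb/update only this subproblem's decision variables; others are held fixed
            solution[name] = solver(solution, penalties.get(name))
    return solution                                      # final solution -> control signal

# toy usage with two scalar "subproblems"
solvers = {
    "assignment": lambda sol, pen: sol["assignment"] - 0.5 * (pen or 0.0),
    "routing":    lambda sol, pen: sol["routing"] - 0.5 * (pen or 0.0),
}
predict = lambda sol: {"assignment": sol["assignment"], "routing": sol["routing"]}
print(coordinate_descent(solvers, predict, {"assignment": 1.0, "routing": 2.0}))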

It is contemplated that, in the above manner, one or more specialized/customized optimization algorithms (e.g., specialized solver(s)) for different subproblems can possibly be combined (together) in a manner such that tight coupling between the subproblems can be facilitated, in accordance with an embodiment of the disclosure. Generally, for the present disclosure, it is contemplated that the neural network can be responsible for the higher level task of deconfliction (i.e., as opposed to low level planning). Hence, improved scaling to larger problem(s) can possibly be facilitated. For example, the neural network can be concerned only with learning to deconflict different local search algorithms whereas low level planning can be carried out by one or more specialized search algorithms (e.g., a library of specialized solvers). It is appreciable that by facilitating the combination of specialized/customized optimization algorithm(s) (e.g., specialized search algorithm(s) associated with/from a library of specialized solvers) such that tight coupling between the subproblems can possibly be addressed, efficiency (e.g., computational efficiency) and/or adaptability/flexibility can possibly be facilitated.

The above-described advantageous aspect(s) of the apparatus(es) 102 of the present disclosure can also apply analogously to (all) the aspect(s) of the below described processing method of the present disclosure. Likewise, all below described advantageous aspect(s) of the processing method of the disclosure can also apply analogously to (all) the aspect(s) of the above described apparatus(es) 102 of the disclosure. It is to be appreciated that these remarks apply analogously to the earlier discussed system 100 of the present disclosure.

Referring to Fig. 3, a processing method in association with the system 100 is shown, according to an embodiment of the disclosure. The processing method 300 can, for example, be suitable for use in association with a neural network (NN) structure (e.g., associable with/corresponding to a device 104 as discussed earlier with reference to Fig. 1), in accordance with an embodiment of the disclosure.

In one example application, the processing method 300 can be suitable for use in association with a NN, in accordance with an embodiment of the disclosure. In one specific example application, the processing method 300 can be suitable for use in association with the utilization of one or more neural network-based approaches in association with, for example, combining a set of algorithms associated with one or more subproblems, in accordance with an embodiment of the disclosure. For example, one or more neural network-based approaches can be utilized in association with conglomeration of an arbitrary set of specialized search algorithms (e.g., optimization algorithms) for individual subproblems. In a more specific example, the processing method 300 can be associated with a neural network structure which can be capable of enabling the conglomeration of an arbitrary set of specialized search algorithms for individual subproblems. For example, a neural network structure can be based on/correspond to/include/be associated with a coordinate descent type structure where, for example, a single decision variable subset can be optimized at each iteration by each of the subproblem optimization algorithms. The NN structure can, for example, be associated with a NN, in accordance with an embodiment of the disclosure.

The processing method 300 can, for example, include any one of an input step 302, a learning step 304 and an output step 306, or any combination thereof, in accordance with an embodiment of the disclosure.

In one embodiment, the processing method 300 can, for example, include an input step 302. In another embodiment, the processing method 300 can, for example, include a learning step 304. In yet another embodiment, the processing method 300 can, for example, include an output step 306. In yet another further embodiment, the processing method 300 can, for example, include one or both of an input step 302 and a learning step 304. In yet another further embodiment, the processing method 300 can, for example, include one or both of a learning step 304 and an output step 306. In yet another further embodiment, the processing method 300 can, for example, include one or both of an input step 302 and an output step 306. In yet another further additional embodiment, the processing method 300 can, for example, include an input step 302, a learning step 304 and an output step 306. In yet another further additional embodiment, the processing method 300 can, for example, include any combination of an input step 302, a learning step 304 and an output step 306.

With regard to the input step 302, one or more input signals can be generated and/or received, which can, for example, be associated with/include/be based on/correspond to one or both of at least one system state and at least one predictive model. The input signal(s) can, for example, be generated by and/or communicated from the apparatus(es) 102, in accordance with an embodiment of the disclosure.

With regard to the learning step 304, based on the input signal(s), one or more processing tasks in association with one or both of the earlier mentioned first optimization loop 150b and the earlier mentioned second optimization loop 150c can be performed in a manner so as to generate one or more control signals. In one embodiment, at least one processing task in association with the first optimization loop 150b can be performed. In another embodiment, at least one processing task in association with the second optimization loop 150c can be performed. In yet another embodiment, at least one processing task in association with the first optimization loop 150b and the second optimization loop 150c can be performed. Concerning the first optimization loop 150b, one or more processing tasks in association with one or more subproblem optimization iterations can be performed, in accordance with an embodiment of the disclosure. Concerning the second optimization loop 150c, one or more processing tasks in association with one or more coordinate descent iterations can be performed, in accordance with an embodiment of the disclosure. Moreover, in one embodiment, the device(s) 104 can, for example, be configured to perform one or more processing tasks in association with the first optimization loop 150b and/or the second optimization loop 150c. Furthermore, the control signal(s) can, for example, be generated by and/or communicated from the device(s) 104, in accordance with an embodiment of the disclosure.

With regard to the output step 306, based on the control signal(s), one or more output signal(s) can, for example, be generated, in accordance with an embodiment of the disclosure. The output signal(s) can, for example, be utilized for machine control (e.g., controlling vehicle movement such as movement of a delivery robot), in accordance with an embodiment of the disclosure. Moreover, the output signal(s) can, for example, be generated by and/or communicated from the apparatus(es) 102, in accordance with an embodiment of the disclosure.
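By way of illustration only, the following minimal Python sketch mirrors the three steps of the processing method 300; the helper functions are hypothetical stand-ins reflecting the step descriptions above, with a placeholder standing in for the NN structure and its optimization loops.

# Illustrative sketch only: all functions are hypothetical stand-ins for the steps of method 300.
def input_step(system_state, predictive_model):
    # input step 302: form the input signal(s)
    return {"state": system_state, "model": predictive_model}

def learning_step(input_signal, nn_structure):
    # learning step 304: processing tasks associated with the first and/or second
    # optimization loops (see the coordinate descent sketch earlier) -> control signal(s)
    return nn_structure(input_signal)

def output_step(control_signal):
    # output step 306: derive output signal(s) usable for machine control
    return {"actuation": control_signal}

control = learning_step(input_step("vehicle pose", "demand model"), nn_structure=lambda s: s)
print(output_step(control))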

The present disclosure further contemplates a computer program (not shown) which can include instructions which, when the program is executed by a computer (not shown), cause the computer to carry out any one of the input step 302, the learning step 304 and the output step 306, or any combination thereof (i.e. , the input step 302, the learning step 304 and/or the output step 306) as discussed with reference to the processing method 300.

The present disclosure yet further contemplates a computer readable storage medium (not shown) having data stored therein representing software executable by a computer (not shown), the software including instructions, when executed by the computer, to carry out any one of the input step 302, the learning step 304 and the output step 306, or any combination thereof (i.e., the input step 302, the learning step 304 and/or the output step 306) as discussed with reference to the processing method 300.

In view of the foregoing, it is appreciable that the present disclosure generally contemplates a processing method 300 which can be associated with a neural network (NN) (e.g., a device 104 as discussed earlier with reference to Fig. 1). In one example application, the processing method 300 can be suitable for use in association with a NN structure. The processing method 300 can, for example, include an input step 302 and a learning step 304, in accordance with an embodiment of the disclosure. In one embodiment, the processing method 300 can, for example, further include an output step 306 wherein at least one output signal can be derived based on the control signal(s).

The input step 302 can include receiving one or more input signal(s) (e.g., communicable from the apparatus(es) 102).

The learning step 304 can be associated with one or both of:

• a first optimization loop 150b wherein subproblem optimization iterations can be performed and the first optimization loop 150b can be associated with one or more specialized algorithms corresponding to one or more solutions which can be associated with one or more subproblems, and

• a second optimization loop 150c wherein one or more coordinate descent iterations can be performed.

The NN structure can be associable with a neural network (NN) (e.g., a graph NN) which can be associated with the second optimization loop 150c. Moreover, the NN can be capable of accepting the input signal(s) for processing to generate one or more learned additional penalties by manner of one or more coordinate descent iterations.

Additionally, the first optimization loop 150b can, for example, include querying at least one learned additional penalty capable of evaluating the at least one solution, in accordance with an embodiment of the disclosure.

In one embodiment, the second optimization loop 150c can, for example, correspond to a coordinate descent iteration process which can, for example, include one or more coordinate descent iterations. The coordinate descent iteration(s) can, for example, include a current coordinate descent iteration (e.g., an iteration which can be performed at a current point in time) and one or more subsequent coordinate descent iterations (e.g., one or more iterations which can be performed at later point(s) in time after the current coordinate descent iteration). The subsequent coordinate descent iterations can follow (e.g., sequentially) the current coordinate descent iteration. Additionally, the solution(s) can, for example, correspond to a current solution which can be associated with the current coordinate descent iteration. Furthermore, the current solution can be capable of being perturbed during a subsequent coordinate descent iteration based on the learned additional penalty so as to derive an updated current solution. In one embodiment, the solution(s) can be capable of being repeatedly perturbed during the coordinate descent iteration process until a final solution can be derived. The control signal(s) can, for example, correspond to/include/be associated with the final solution.

It is contemplated that, in the above manner, one or more specialized/customized optimization algorithms (e.g., specialized solver(s)) for different subproblems can possibly be combined (together) in a manner such that tight coupling between the subproblems can be facilitated, in accordance with an embodiment of the disclosure. Generally, for the present disclosure, it is contemplated that the neural network can be responsible for the higher level task of deconfliction (i.e., as opposed to low level planning). Hence, improved scaling to larger problem(s) can possibly be facilitated. For example, the neural network can be concerned only with learning to deconflict different local search algorithms whereas low level planning can be carried out by one or more specialized search algorithms (e.g., a library of specialized solvers). It is appreciable that by facilitating the combination of specialized/customized optimization algorithm(s) (e.g., specialized search algorithm(s) associated with/from a library of specialized solvers) such that tight coupling between the subproblems can possibly be addressed, efficiency (e.g., computational efficiency) and/or adaptability/flexibility can possibly be facilitated.

It should be appreciated that the embodiments described above can be combined in any manner as appropriate (e.g., one or more embodiments as discussed in the “Detailed Description” section can be combined with one or more embodiments as described in the “Summary of the Invention” section). It should be further appreciated by the person skilled in the art that variations and combinations of embodiments described above, not being alternatives or substitutes, may be combined to form yet further embodiments.

In one example, the communication network 106 can possibly be omitted. Communication (i.e., between the apparatus(es) 102 and the device(s) 104) can be by manner of direct coupling. Such direct coupling can be by manner of one or both of wired coupling and wireless coupling.

In another example, in a situation where the solution space can be compact to a sufficient extent, the learned additional penalty can possibly be precomputed for every possible solution. In such a situation, for example, a cost matrix can possibly be computed without the need for a hypernetwork. In a more specific example in connection with such a situation, for assignment type problems, a graph neural network may possibly directly compute the cost matrix without a hypernetwork.
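By way of illustration only, the following minimal Python sketch mirrors the compact-solution-space case described above: a cost matrix of learned additional penalties is produced directly (here a random matrix stands in for the graph NN output) and consumed by a standard linear assignment solver, without any hypernetwork.

# Illustrative sketch only: the random matrix is a stand-in for the NN-predicted penalties.
import numpy as np
from scipy.optimize import linear_sum_assignment

n_vehicles, n_tasks = 3, 3
cost_matrix = np.random.rand(n_vehicles, n_tasks)  # stand-in for the predicted cost matrix

vehicle_idx, task_idx = linear_sum_assignment(cost_matrix)
for v, t in zip(vehicle_idx, task_idx):
    print(f"vehicle {v} -> task {t} (penalty {cost_matrix[v, t]:.3f})")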

In yet another example, it was earlier discussed that a neural network can possibly be trained based on, for example, simulated data, in accordance with an embodiment of the disclosure. The present disclosure contemplates the possibility of a variant of the training algorithm for, for example, handling stochastic online planning, in accordance with an embodiment of the disclosure. With regard to such a variant, the “learned additional penalty” can, for example, be trained to incorporate the risk of having to respond to unknown jobs that may be received in the future. It is contemplated that doing so may possibly avoid having to explicitly perform scenario-based planning (e.g., which may lead to additional computational cost and/or inefficiency) when the system 100 is deployed, in accordance with an embodiment of the disclosure. Such a variant can be largely analogous to the discussion per Fig. 1c with one or more variations in that, for the variant (see also the illustrative sketch following the list below):

• the scenario 174 generated can be based substantially (or, only) on known jobs

• the auxiliary predictive model(s) 180 can be stochastic (i.e., auxiliary stochastic predictive model(s) which can take into account parameters such as one or more faults)

• one or more scenario samples can be generated based on unknown jobs and/or fault(s), etc.

• the improved solution 182 can be generated based additionally on the scenario sample(s)

• the learned additional penalty 184 can implicitly include mitigations for possible future jobs
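By way of illustration only, the following minimal Python sketch mirrors the training variant listed above: the scenario is built from known jobs, scenario samples stand in for unknown future jobs and/or faults from a stochastic auxiliary predictive model, and the improved solution (and hence the fitted learned additional penalty) accounts for those samples. All helper names and values are hypothetical.

# Illustrative sketch only: all helpers and values are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(1)

def sample_unknown_jobs(n_samples):
    # stochastic auxiliary predictive model: sample plausible future jobs/faults
    return [rng.normal(size=3) for _ in range(n_samples)]

def improve_with_samples(solution, samples):
    # the improved solution additionally hedges against the sampled future jobs
    return solution - 0.1 * np.mean(samples, axis=0)

known_jobs_scenario = rng.normal(size=3)   # scenario 174 based on known jobs only
current_solution = known_jobs_scenario.copy()

scenario_samples = sample_unknown_jobs(n_samples=8)
improved_solution = improve_with_samples(current_solution, scenario_samples)  # improved solution 182

# the learned additional penalty 184 fitted to this improved solution implicitly
# includes mitigations for the possible future jobs represented by the samples
penalty_target = known_jobs_scenario - improved_solution
print(penalty_target)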

In yet another example, the first example scenario, the second example scenario and/or the third example scenario as discussed in connection with the example context 150 can, for example, be associated with vehicle routing. The present disclosure contemplates that other examples can also be useful (i.e., one or more combinatorial optimization problems which may/may not be associated with vehicle routing). One such example can be factory logistics planning (e.g., movement of raw materials to machines and collection of finished products) and the problem can be decomposed/broken down in the following way:

• a sequence of end products to produce can be chosen

• a sequence of intermediate products to produce can be chosen

• the assignment of processing steps to machines can be chosen

• high level aggregate material flow between machines can be chosen

• individual vehicle trajectories to achieve these material flows can be chosen

In yet another further example, the present disclosure contemplates an example pseudocode which can possibly be illustrative in the context of the first example scenario (i.e., which can be in relation to a Point-to-Point delivery scenario where a vehicle such as a delivery robot can carry a single item) as follows:

Pseudocode:

//Start of Pseudocode
struct Node:
    position: float[2]

struct WorldGraph:
    nodes: Node[:]
    edges: NodeId[:][2]

struct Task:
    start_position: NodeId
    end_position: NodeId

struct TaskAssignment:
    task: Task
    vehicle: Vehicle

struct TaskAssignmentWithPath:
    task: Task
    vehicle: Vehicle
    path: NodeId[]

struct Vehicle:
    position: NodeId
    has_item_for_task: Optional[TaskId]

func compute_policy(
    graph_network_routing: GraphNetworkRouting,
    graph_network_assignment: GraphNetworkAssignment,
    vehicles: VehicleState[],
    tasks: Task[],
    world: WorldGraph,
    is_training: bool
) -> (TaskAssignment[], float):
    assignment_learned_additional_penalty_matrix = get_default_penalty_matrix(method="spatial_proximity")
    if is_training:
        loss = 0.0
        // For example, generate a solution to the problem using the proposed method, then generate
        // an improved solution (e.g., by applying a very general optimization algorithm like local search)
        improved_assignments = get_improved_assignments(vehicles, tasks, world)

    for i_coordinate_descent_round in 1..n_coordinate_descent_round:
        // perform assignments based on the current penalty matrix
        assignments: TaskAssignment[] = linear_assignment(assignment_learned_additional_penalty_matrix)

        // first update the routing based on feedback from the assignments
        graph_network_routing_input = construct_bipartite_graph(
            nodes={
                "place_nodes": world.nodes,
                "task_nodes": "create node for each task"
            },
            edges={
                "navigation": world.edges,
                "task_start_position": "from tasks[:].start_position to task_nodes",
                "task_end_position": "from tasks[:].end_position to task_nodes",
                "assignments": "from assignment.vehicle.position to task_nodes"
            }
        )
        graph_network_routing_output = graph_network_routing(graph_network_routing_input)
        predicted_learned_additional_penalties = "extract node and edge predictions from graph_network_routing_output"

        // perform the routing based on feedback from the network
        for i_path_planning_iteration in 1..n_path_planning_iterations:
            for i_vehicle in 1..n_vehicle:
                assignments_with_path[i_vehicle] = assignments[i_vehicle]
                assignments_with_path[i_vehicle].path = perform_weighted_astar_planning(
                    world,
                    assignments[i_vehicle],
                    weights=predicted_learned_additional_penalties
                )
        if is_training:
            // For example, generate a solution to the problem using the proposed method, then generate
            // an improved solution (e.g., by applying a very general optimization algorithm like local search)
            improved_paths = get_improved_paths(world=world, assignments=assignments)
            // solve a global optimization problem, e.g. with simulated annealing
            perturbed_learned_additional_penalties = get_learned_additional_penalties_which_produce_path(
                improved_paths, world, assignments
            )
            loss += (predicted_learned_additional_penalties - perturbed_learned_additional_penalties).pow(2).sum()

        // then update the assignments based on feedback from the routing
        graph_network_assignment_input = construct_bipartite_graph(
            nodes={
                "place_nodes": world.nodes,
                "task_nodes": "create node for each task"
            },
            edges={
                "navigation": world.edges,
                "task_start_position": "from assignments_with_path[:].task.start_position to task_nodes",
                "task_end_position": "from assignments_with_path[:].task.end_position to task_nodes",
                "assignments_with_path": "from assignment.vehicle.position to task_nodes",
                "paths": "from assignments_with_path.path[:] to task_nodes",
                "possible_assignments": "from vehicles[:].position to task_nodes where vehicles[:].has_item_for_task is not set",
            }
        )
        graph_network_assignment_output = graph_network_assignment(graph_network_assignment_input)
        assignment_learned_additional_penalty_matrix = "collate edges of type possible_assignments from graph_network_assignment_output"
        if is_training:
            // solve a global optimization problem, e.g. with simulated annealing
            perturbed_assignment_learned_additional_penalty_matrix = get_assignment_learned_additional_penalty_matrix_which_produce_assignments(
                assignments=improved_assignments,
                starting_point=assignment_learned_additional_penalty_matrix
            )
            loss += (assignment_learned_additional_penalty_matrix - perturbed_assignment_learned_additional_penalty_matrix).pow(2).sum()

    return assignments_with_path, loss

func train_network():
    graph_network_routing = GraphNetworkRouting()
    graph_network_assignment = GraphNetworkAssignment()
    // n_train_iter may be set to 10^6
    for i_train_iter in 0..n_train_iter:
        vehicles, tasks, world = generate_random_scenario()
        assignments, loss = compute_policy(
            graph_network_routing,
            graph_network_assignment,
            vehicles,
            tasks,
            world,
            is_training=True
        )
        // ADAM learning rate may be set to 10^-4, other ADAM parameters may be set to default values
        // update both graph networks based on the accumulated loss
        adam_optim([graph_network_routing, graph_network_assignment], loss)
//End of Pseudocode

In yet another additional example, the input signal(s) can be communicated from and/or generated by one or both of at least one apparatus 102 and at least one device 104 (e.g., a device 104 which can correspond to a host device carrying a database associated with the predictive model(s)).

In the foregoing manner, various embodiments of the disclosure are described for addressing at least one of the foregoing disadvantages. Such embodiments are intended to be encompassed by the following claims, and are not to be limited to specific forms or arrangements of parts so described and it will be apparent to one skilled in the art in view of this disclosure that numerous changes and/or modification can be made, which are also intended to be encompassed by the following claims.