

Title:
METHOD AND SYSTEM FOR MANAGING DRONE PORTS
Document Type and Number:
WIPO Patent Application WO/2023/214173
Kind Code:
A1
Abstract:
A method for managing a plurality of drone ports is disclosed herein. The method comprises operating a trained machine learning model to control a plurality of drone ports. Each of the plurality of drone ports comprises: a plurality of levels connected by a computer-operated lift, wherein at least one of the levels comprises a landing pad for receiving at least one flying drone; a plurality of lockers for receiving goods transported by the plurality of flying drones; and a plurality of rovers configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers. The machine learning model is trained heuristically using a simulation of the plurality of drone ports, and based on a reward given as a function of environment state.

Inventors:
MAJOE DENNIS (GB)
Application Number:
PCT/GB2023/051182
Publication Date:
November 09, 2023
Filing Date:
May 04, 2023
Assignee:
MAJOE DENNIS (GB)
International Classes:
G06Q10/083; B64F1/32; B64U70/90
Foreign References:
US 2018/0155032 A1 (2018-06-07)
EP 3390226 A1 (2018-10-24)
US 2019/0220819 A1 (2019-07-18)
US 2020/0349852 A1 (2020-11-05)
GB 202206541 A (2022-05-05)
GB 202215044 A (2022-10-12)
GB 2306564 A (1964-06-03)
Other References:
VOLODYMYR MNIH ET AL: "Asynchronous Methods for Deep Reinforcement Learning", PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2016), 4 February 2016 (2016-02-04), XP055646567, Retrieved from the Internet
YAN PENG: "Flocking Control of UAV Swarms with Deep Reinforcement Learning Approach", 27 November 2020 (2020-11-27), XP093064082, Retrieved from the Internet [retrieved on 20230714]
Attorney, Agent or Firm:
WHITE, Andrew (GB)
Claims:
CLAIMS:

1. A method for managing a plurality of drone ports, wherein the method comprises operating a trained machine learning model to control at least one drone port, wherein each drone port comprises: at least one level comprising a landing pad for receiving at least one flying drone; a plurality of lockers for receiving goods transported by a plurality of flying drones; and a plurality of rovers configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers; wherein the machine learning model is trained heuristically using a simulation of the plurality of drone ports, and based on a reward given as a function of environment state.

2. The method of claim 1 wherein the machine learning model is configured to meet at least one of a plurality of constraints comprising: (i) correctness, (ii) energy consumption, (iii) charge level of one of the plurality of flying drones, (iv) charge level of at least one of the plurality of rovers, and (v) time.

3. The method of claim 1 or 2 wherein the environment state comprises at least one of (i) the position or status of at least one of the plurality of flying drones, (ii) the position or status of at least one of the plurality of rovers, and (iii) the status of at least one of the plurality of lockers.

4. The method of claim 3 wherein the status of the plurality of lockers comprises at least one of: (i) the locker being occupied with goods, (ii) the locker being open and/or unlocked, and (iii) the locker being closed and/or locked.

5. The method of any of the previous claims wherein the reward comprises a penalty for any of the following environment states: (i) repetitive movements, (ii) delivery of the goods to the wrong destination, and (iii) charge level of at least one of the plurality of flying drones and/or one of the plurality of rovers dropping below a selected threshold.

6. The method of any of the previous claims wherein the machine learning model is a deep reinforcement machine learning model.

7. The method of any of the previous claims wherein the machine learning model is based on an asynchronous variant of proximal policy optimization.

8. The method of any of the previous claims wherein the machine learning model is configured to perform an objective function to maximise the reward.

9. The method of any of the previous claims, further comprising outputting, from the trained machine learning model, a sequence of discrete actions to be performed at each of the drone ports.

10. The method of claim 9 further comprising sending the sequence of discrete actions to at least one of: (i) at least one of the plurality of flying drones, (ii) at least one of the plurality of rovers.

11. The method of any of the previous claims, wherein the machine learning model is configured to provide a plurality of command outputs, and wherein the method comprises performing a cross-check of the command outputs for safety.

12. A method of training a machine learning model for managing a drone port, wherein each drone port comprises: at least one level comprising a landing pad for receiving at least one flying drone; a plurality of lockers for receiving goods transported by a plurality of flying drones; and a plurality of rovers configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers; the method comprising: creating a digital model of the plurality of drone ports; creating a series of constraints to be met by the digital model; creating a series of rewards when particular environment states are met by the digital model; and training the machine learning model using the digital model, constraints and rewards.

13. The method of claim 12 wherein the plurality of constraints comprises at least one of: (i) correctness, (ii) energy consumption, (iii) charge level of one of the plurality of flying drones, (iv) charge level of at least one of the plurality of rovers, and (v) time.

14. The method of claim 12 or 13 wherein the environment states comprise at least one of (i) the position or status of at least one of the plurality of flying drones, (ii) the position or status of at least one of the plurality of rovers, and (iii) the status of at least one of the plurality of lockers.

15. The method of claim 14 wherein the status of the plurality of lockers comprises at least one of: (i) the locker being occupied with goods, (ii) the locker being open and/or unlocked, and (iii) the locker being closed and/or locked.

16. The method of any of claims 12 to 15 wherein the reward comprises a penalty for any of the following environment states: (i) repetitive movements, (ii) delivery of the goods to the wrong destination, and (iii) charge level of at least one of the plurality of flying drones and/or one of the plurality of rovers dropping below a selected threshold.

17. The method of any of claims 12 to 16 wherein the machine learning model is a deep reinforcement machine learning model.

18. The method of any of claims 12 to 17 wherein the machine learning model is based on an asynchronous variant of proximal policy optimization.

19. The method of any of claims 12 to 18 wherein the deep reinforcement machine learning model is configured to perform an objective function to maximise the reward.

20. A system for managing a fleet of a plurality of flying drones operating from at least one drone port, wherein each drone port comprises: at least one level comprising a landing pad for receiving at least one flying drone; a plurality of lockers; and a plurality of rovers configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers; the system comprising a collaborative server, wherein the collaborative server is configured to communicate with a plurality of flying drones, and the at least one drone port, and wherein the collaborative server is configured to run a machine learning model to control operation of the plurality of flying drones and the plurality of rovers.

21. The system of claim 20, wherein the collaborative server is configured to output a sequence of discrete actions to be performed at each of the drone ports.

22. The system of claim 21 wherein the collaborative server is configured to send the sequence of discrete actions to at least one of: (i) at least one of the plurality of flying drones, and (ii) at least one of the plurality of rovers.

23. The system of any of claims 20 to 22 wherein the machine learning model has been trained according to the method of any of claims 12 to 19.

24. The system of any of claims 20 to 23 wherein the collaborative server is configured to send high level commands to any of the plurality of flying drones, the plurality of rovers and the lifts, wherein each high-level command is an instruction to perform a discrete action.

25. The system of claim 24, wherein the system is configured to perform a crosscheck of the high-level commands for safety.

26. The system of any of claims 20 to 25 wherein the collaborative server is configured to handle the scheduling of the flights in association with external applications that provide flight path approval.

27. The system of any of claims 20 to 26 wherein each drone port comprises a minimum of one lift, one parcel locker, one drone, one parcel pickup rover, one drone garaging rover, and one charging station for either a drone or a rover.

28. A computer readable non-transitory storage medium comprising a program for a computer configured to cause a processor to perform the method of any of claims 1 to 11.

29. A computer readable non-transitory storage medium comprising a program for a computer configured to cause a processor to perform the method of any of claims 12 to 19.

Description:
Method and system for managing drone ports

Field of the invention

The present disclosure relates to a method and system for managing drone ports. Specifically, the disclosure relates to automated systems for drone delivery and parcel handling logistics.

This application claims priority to GB2206541.1 filed on 5 May 2022, to GB2215044.5 filed on 12 October 2022, and to GB2306564.2 filed on 3 May 2023.

Background

Recently there has been growing interest in the use of drones to deliver parcels, whether in the medical, e-commerce or military sectors. For this technology to be competitive with vans and other ground transport, efficiency will need to be improved and costs reduced.

Drone technology has focused on the problems of flying and the associated regulations. However, less attention has been given to what happens before the drone takes off with a parcel and after it lands with one. First and foremost, one needs a landing and take-off point which ideally can also handle parcels for drone delivery and handle parcels that have been received by drone. We refer to this site as a drone port.

Drone ports need to be sited strategically at different locations in both rural and urban geographies. In dense urban areas land cost is high and commercially viable drone ports must be designed to account for this.

When drones are used in this context of drone ports, the handling of drones, parcels and flight scheduling could be achieved using human operators. However, the largest costs in drone ports will be associated with the amount of human labor and the amount of land on which the drone port is sited. Drone ports that use more land to garage and interact with the drones will cost more, so compact, small-footprint drone ports are desirable.

Since the revenue per delivery must be kept as low and as affordable as possible given competing modes of transport, commercially viable drone port revenues will most likely require extremely high throughput of drones and parcels on a continuous basis.

With humans carrying out the mundane tasks of loading and unloading drones, as well as charging and garaging the drones, over extended periods of work, the risk of human error will be significant. Moreover, there exists a danger to humans interacting with drones whose protruding propellers and frames are not designed to meet human ergonomic needs, and this represents another risk requiring mitigation.

Summary of the invention

Aspects of the invention are as set out in the independent claims and optional features are set out in the dependent claims. Aspects of the invention may be provided in conjunction with each other and features of one aspect may be applied to other aspects.

In a first aspect of the disclosure there is provided a method for managing at least one drone port. The method comprises operating a trained machine learning model to control each drone port. Each of the drone ports comprises: at least one level comprising a landing pad for receiving at least one flying drone; a plurality of lockers for receiving goods transported by the plurality of flying drones; and a plurality of rovers configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers. The machine learning model is trained heuristically using a simulation of the plurality of drone ports, and based on a reward given as a function of environment state.

In some examples each drone port may comprise a plurality of levels, wherein at least one of the levels comprises a landing pad for receiving at least one flying drone, and wherein the plurality of levels are connected by a computer operated lift. In some examples the method may be for managing a plurality of drone ports. In such examples the method may comprise operating the trained machine learning model to control the plurality of drone ports.

The machine learning model may be a deep reinforcement machine learning model.

The simulation may for example be a digital twin. The simulation may comprise modelling the actions of the plurality of drone ports as a series of discrete actions, wherein the task time to perform each discrete action is simulated using a time-step, the time-step being an arbitrary value, wherein the power level of each flying drone or rover is modelled using an artificial charge and discharge rate, and wherein it is assumed that all flying drones fly at the same speed.

The reward may be calculated when a terminal state is reached in the simulation, the terminal state being for example goods being delivered to their intended destination, or an undesired state being reached, for example as determined by a selected threshold period of time being exceeded.

The simulation may comprise conducting a series of episodes, wherein each episode comprises a selected sequence of actions, such as delivery of goods to their selected destination, and is terminated when the selected sequence of actions is successfully completed or an undesired state is reached, and wherein a reward is calculated for each episode.

When a reward is calculated, this may be fed back to the machine learning model as deep reinforcement machine learning.

The machine learning model may be configured to meet at least one of a plurality of constraints comprising: (i) correctness, (ii) energy consumption, (iii) charge level of one of the plurality of flying drones, (iv) charge level of at least one of the plurality of rovers, and (v) time. The environment state may comprise at least one of (i) the position or status of at least one of the plurality of flying drones, (ii) the position or status of at least one of the plurality of rovers, and (where the drone port comprises a plurality of levels connected by a computer operated lift) (iii) the position or status of at least one of the lifts, and (iv) the status of at least one of the plurality of lockers. The status of the plurality of lockers may comprise at least one of: (i) the locker being occupied with goods, (ii) the locker being open and/or unlocked, and (iii) the locker being closed and/or locked.

The reward may comprise a penalty for any of the following environment states: (i) repetitive movements, (ii) delivery of the goods to the wrong destination, (iii) charge level of at least one of the plurality of flying drones and/or one of the plurality of rovers dropping below a selected threshold, and (where the drone port comprises a plurality of levels connected by a computer operated lift) (iv) movement of at least one of the lifts without anything being moved by the lift.

The machine learning model may be based on an asynchronous variant of proximal policy optimization.

The machine learning model may be configured to perform an objective function to maximise the reward.

The method may further comprise outputting, from the trained machine learning model, a sequence of discrete actions to be performed at each of the drone ports. The sequence of discrete actions may correspond to an episode. The method may further comprise sending the sequence of discrete actions to at least one of: (i) at least one of the plurality of flying drones, (ii) at least one of the plurality of rovers, and (where the drone port comprises a plurality of levels connected by a computer operated lift) (iii) at least one of the plurality of lifts.

The machine learning model may be configured to provide a plurality of command outputs. The method may comprise performing a cross-check of the command outputs for safety. The cross-check may be performed in software or manually by a user. For example, a user may be prompted via a user interface to approve a series or sequence of command outputs.

In another aspect of the disclosure there is provided a method of training a machine learning model for managing at least one drone port. Each drone port comprises: at least one level comprising a landing pad for receiving at least one flying drone; a plurality of lockers for receiving goods transported by the plurality of flying drones; and a plurality of rovers configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers. The method comprises creating a digital model of the plurality of drone ports; creating a series of constraints to be met by the digital model; creating a series of rewards when particular environment states are met by the digital model; and training the machine learning model using the digital model, constraints and rewards. The machine learning model may be a deep reinforcement machine learning model.

It will be understood that in some examples the method may comprise a method of training a machine learning model for managing a plurality of drone ports. It will also be understood that in some examples each drone port may comprise a plurality of levels connected by a computer operated lift.

The plurality of constraints may comprise at least one of: (i) correctness, (ii) energy consumption, (iii) charge level of one of the plurality of flying drones, (iv) charge level of at least one of the plurality of rovers, and (v) time.

The environment states may comprise at least one of (i) the position or status of at least one of the plurality of flying drones, (ii) the position or status of at least one of the plurality of rovers, and (where the drone port comprises a plurality of levels connected by a computer operated lift) (iii) the position or status of at least one of the lifts, and (iv) the status of at least one of the plurality of lockers. The status of the plurality of lockers may comprise at least one of: (i) the locker being occupied with goods, (ii) the locker being open and/or unlocked, and (iii) the locker being closed and/or locked.

The reward may comprise a penalty for any of the following environment states: (i) repetitive movements, (ii) delivery of the goods to the wrong destination, (iii) charge level of at least one of the plurality of flying drones and/or one of the plurality of rovers dropping below a selected threshold, and (where the drone port comprises a plurality of levels connected by a computer operated lift) (iv) movement of at least one of the lifts without anything being moved by the lift.

The machine learning model may be based on an asynchronous variant of proximal policy optimization.

The deep reinforcement machine learning model may be configured to perform an objective function to maximise the reward.

In another aspect of the disclosure there is provided a system for managing a fleet of a plurality of flying drones operating from at least one drone port. Each drone port comprises: at least one level comprising a landing pad for receiving at least one flying drone; a plurality of lockers; and a plurality of rovers configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers. The system comprises a collaborative server, wherein the collaborative server is configured to communicate with the plurality of flying drones and the at least one drone port, and wherein the collaborative server is configured to run a machine learning model to control operation of the plurality of flying drones, the plurality of rovers and the lifts.

It will be understood that in some examples the system may be a system for managing a plurality of drone ports. It will also be understood that in some examples each drone port may comprise a plurality of levels connected by a computer operated lift.

The collaborative server may be configured to output a sequence of discrete actions to be performed at each of the drone ports. The collaborative server may be configured to send the sequence of discrete actions to at least one of: (i) at least one of the plurality of flying drones, (ii) at least one of the plurality of rovers, and (where the drone port comprises a plurality of levels connected by a computer operated lift) (iii) at least one of the plurality of lifts.

The machine learning model may have been trained according to the method of the aspect described above.

The collaborative server may be configured to send high level commands to any of the plurality of flying drones, the plurality of rovers and the lifts, wherein each high-level command is an instruction to perform a discrete action. The system (for example, the collaborative server), may be configured to perform a cross-check of the high-level commands for safety. This cross-check may be performed in software, or by way of a user interface to prompt a user to check that the system is operating safely.

The collaborative server may be configured to handle the scheduling of the flights in association with external applications that provide flight path approval, such as an automated unmanned traffic management system.

Each drone port may comprise a minimum of one parcel locker, one drone, one parcel pickup rover, one drone garaging rover, and one charging station for either a drone or a rover. In some examples each drone port may also comprise at least one lift.

In another aspect of the disclosure there is provided a computer readable non-transitory storage medium comprising a program for a computer configured to cause a processor to perform the method of any of the aspects described above.

Drawings

Embodiments of the disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which: Fig. 1 shows a schematic view of an example drone port system;

Fig. 2a shows a landing pad of an example drone port;

Fig. 2b shows a human parcel porter dropping off a parcel at a drone port such as the drone port of Figs 1 and 2a;

Fig. 3 shows a flow chart of an example method of training a machine learning model for operating a drone port such as that shown in Figs. 1 and 2;

Fig. 4 shows a graphical interface of a digital model simulation;

Fig. 5 shows the digital model simulation after nine parcels have been delivered;

Fig. 6 shows a schematic illustration of an agent's typical network of weights which may be learned during training of a machine learning model such as that trained using the method of Fig. 3;

Fig. 7 shows an example of a reinforcement learning approach that may be used, for example, in the method of Fig. 3; and

Fig. 8 shows a graphic of the learning reward and penalty against steps taken, which may be the result of the reinforcement learning approach of Fig. 7.

Specific description

The automated transportation of parcels and drones requires high level intelligent wheeled robots/rovers that can physically interact with parcels and drones with high accuracy and dexterity. Such rovers should be able to recognize, pick up, handle, and drop off parcels wherever required. Such devices must have local embedded intelligence to perform tasks independently.

Following Civil Aviation Authority regulatory best practice, drones should where possible operate away from the ground so as to avoid contact with humans, animals and property. Drone ports therefore ideally need to have at least one floor, where the roof serves as the landing, drop-off and take-off area.

In any one drone port, therefore, there may be several rovers for drone and parcel handling, as well as lifts, charging systems and location systems. For an optimal solution all these devices must collaborate efficiently in order to spend the least time and energy to perform their required functions.

Multiple drone ports are needed to form a logistics network. Multiple drone ports must collaborate and fully understand the progress of drone and parcel handling at other ports so as to be able to schedule efficient services that take account of the overall status at all drone ports. Thus, a collaborative control system or “collaborative server” is required that integrates the tasks of otherwise independent robots.

An example drone port system 100 is shown in Fig. 1. The system 100 as shown in Fig. 1 comprises a drone port 101 comprising:
a. a plurality of levels with a landing pad 105 on the roof top for drone landing, drone take-off or parcel drop-off;
b. a plurality of autonomous mobile robots or rovers 109 configured to transfer goods from a flying drone landed on the landing pad to one of the plurality of lockers;
c. at least one computer-operated lift or elevator 107 to enable transport between the plurality of levels;
d. a plurality of lockers 111 for incoming and outgoing parcels (the lockers 111 may be operable to be locked/unlocked by a computer); and
e. a collaboration server 150, which is a central processing server configured to communicate with the rovers 109, lifts 107, drones 103 and other processors and sensors that are spread out in each of the networked drone ports 101.

Although in the example shown the drone port has a plurality of levels, it will be understood that in some examples a drone port may only have one level - such as a rooftop drone port or a ground-based drone port (for example adjacent to a warehouse).

In some examples the system also comprises at least one charging station and/or at least one network of multi-level corridors connected to the lift shaft which offer garaging and electric charging to the incoming drones 103.

The autonomous mobile robots or rovers 109 may comprise a set of on-board sensors, processors, software, and other electronics configured to provide them with two-dimensional navigation and travel capabilities that enable them to navigate and travel autonomously along the drone port roof, the drone port floor, the drone port corridors dedicated to parcel lockering and the drone port corridors dedicated to drone garaging and charging.

The plurality of rovers 109 may comprise one or more garaging rovers 109a for garaging one or more drones 103 at a charging station, and one or more parcel rovers 109b for carrying parcels to/from the drones 103 and to/from lockers 111.

The rovers 109 may achieve 3-dimensional movement around the drone port by utilizing their 2D navigation and the lift or lifts 107. Utilizing linear actuators to push up, forward or backward, or forward or backward in a rotation arc, the rovers 109 can flexibly handle the collection, transport and drop-off of drones 103 of various shapes and sizes.

The system 100 may also comprise at least one parcel acceptance station comprising a processor, a user interface terminal screen, a parcel April Tag, QR code or RFID scanner, a weighing scale and software dedicated to interfacing with other processors in the system, in order to allow a human or automated operator to drop off a parcel at the acceptance station for delivery to another drone port, the system then performing the tasks of drone and parcel handling, flight scheduling and final take-off.

The collaborative server, CS, 150 may be configured to control aspects of the drone port 101, such as the tasks of computer-controlled entities such as the lifts 107, drones 103 and rovers 109, in real time at at least one drone port 101 when deliveries are to be made from drone port 101 to any non-drone port destination.

The collaborative server, CS, 150 may be configured to control aspects of the drone port 101, such as the tasks of computer-controlled entities such as the lifts 107, drones 103 and rovers 109, in real time at a plurality of drone ports 101 when deliveries are to be made between drone ports 101 and any non-drone port destination, thereby synchronizing multiple locations so that the parcel transport network works optimally.

The collaborative server, CS, 150 is therefore configured to output a sequence of discrete actions to be performed at each of the drone ports 101. The collaborative server, CS, 150 may therefore be configured to send the sequence of discrete actions to at least one of: (i) at least one of the plurality of flying drones 103, (ii) at least one of the plurality of rovers 109, and (iii) at least one of the plurality of lifts 107. It will be understood that the collaborative server, CS, 150 may be configured to send discrete actions to other entities (such as a charging station) that form part of the drone port 101 where present.

The collaborative server, CS, 150 may be configured to use data from beacons and markers to locate the position and orientation of computer-controlled entities such as the drones 103, parcels, rovers 109 etc., and provide high level path navigation information for broad navigation tasks, while specific local navigation and obstacle avoidance is carried out by each rover 109, drone 103 or lift 107.

The collaborative server, CS, 150 may be a software application running, e.g., on a dedicated computer or in the cloud, which has communications to some or all of the drone ports 101 and associated rovers 109 and lift or lifts 107, location system sensors at the drone ports 101, as well as all drones 103. The communications allow the collaborative server 150 to determine in real time the status of all rovers 109, lifts 107 and drones 103, and enable the collaborative server, CS, 150 to send commands to each lift 107, rover 109 or drone 103 and other computer-controlled entities forming part of the drone port 101. These are high level commands which the rover 109, lift 107 or drone 103 should perform. It will therefore be understood that each lift 107, rover 109 or drone 103 may have a communications interface, such as a wireless communications interface, to communicate with the collaborative server, CS, 150. It will also be understood that each drone port 101 may have its own respective communications interface and may optionally provide a local area network for communication with computer-controlled entities of that drone port 101. Similarly, each of the drone ports 101 may have its own respective collaborative server, CS 150, that communicates with the collaborative server, CS 150, of other drone ports 101, or each drone port 101 may be configured to communicate with a common shared collaborative server, CS 150.

A high-level command that might be an instruction to perform a discrete action for example might be to instruct a rover 109 on the roof top 105 to go from its current position to the lift 107. The CS 150 may also send a command to the lift 107 to go to the roof top level 105. The rover 109 does not need any more commands as it can move to the lift 107 using its own software application and does so until it has arrived at the lift door. Likewise, the lift 107 using its software application performs the move to the top floor 105 automatically. The CS 150, having established the rover 109 and drone 103 are at the correct places, can instruct the lift 107 to open its door and then instruct the rover 109 to enter the lift 107 once the door is open.
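By way of illustration only, the Python sketch below shows how such a sequence of high-level commands for the rover and lift example above might be represented and dispatched by the CS 150. The Command class, entity identifiers, action names and send_command() helper are hypothetical and are not taken from this disclosure.

```python
# Illustrative sketch only: command names, entity identifiers and the
# send_command() helper are hypothetical assumptions.
from dataclasses import dataclass

@dataclass
class Command:
    entity_id: str   # e.g. "rover_1" or "lift_1"
    action: str      # a discrete high-level action name

def send_command(cmd: Command) -> None:
    # In a real deployment this would go over the drone port's communications
    # interface; here the instruction is simply printed.
    print(f"CS -> {cmd.entity_id}: {cmd.action}")

# The rover/lift example described above, expressed as discrete commands.
sequence = [
    Command("rover_1", "GO_TO_LIFT"),        # rover navigates locally on its own
    Command("lift_1",  "GO_TO_ROOF_LEVEL"),  # lift travels to the roof top level
    Command("lift_1",  "OPEN_DOOR"),         # issued once both report arrival
    Command("rover_1", "ENTER_LIFT"),
]
for cmd in sequence:
    send_command(cmd)
```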

For a network of drone ports 101 to collaborate, the CS 150 monitors and commands the robots at all ports 101 simultaneously. When a parcel is to be sent from one port 101 to another, the CS 150 handles the scheduling of the flights in association with external applications that provide flight path approval such as an automated unmanned traffic management system. In some examples, the CS 150 is configured to provide automated unmanned traffic management.

It will therefore be understood that the CS 150 has to handle a much larger degree of real-world environmental impacts than would be the case in other settings, such as a warehouse (for example, a customer fulfilment center). Not only does the CS 150 have to handle multiple drones 103 flying at different points in a very large space, but it also has to take into account issues caused by, e.g., environmental impacts such as rain, or other aircraft (such as other drones, helicopters, planes etc.). In the case where there are four drone ports 101 collaborating to move parcels across a road-traffic-congested city, each drone port 101 would be serviced by a minimum of one lift 107, one parcel locker 111, one rover 109 (which may be a parcel pickup rover and/or a drone garaging rover), and optionally one charging station for either a drone or a rover. In order to coordinate this, a collaborative server, CS 150, acting in common for all the drone ports 101 is faced with approximately 70 status variables and is required to make decisions to signal approximately 20 commands in real time, with constant monitoring between each command being sent.

The collaborative server, CS 150 should be configured to deal with both binary and continuous variables, such as flight distance, battery charge levels, and the position of a rover relative to the required destination. These calculations and decisions must be made both to realize the logic behind the system's function and to optimize the time taken to deliver parcels.

The software coding of the collaborative server, CS 150 may require that developers figure out the sequence of commands that not only correspond to the correct logical reaction to changes in status, but also achieve parcel delivery in an optimal way.

As it is intended that the collaborative server, CS 150, can control many more ports 101, not only does the software coding problem become impossible for human developers to solve, but the number of conditional statements required rises exponentially. For example, a look-up table approach would require many terabytes of memory to store all the states, and processing would not be achieved in real time.

The number of commands and status variables may grow as more drones 103, rovers 109 and lifts 107 are incorporated. At the level of four drone ports 101 it becomes almost impossible for a human developer to recreate the logic and optimization.

Accordingly, to handle the high number of instructions and commands that need to be controlled, the collaborative server, CS, 150 makes use of a trained machine learning model. The training of the machine learning model is described in more detail below, but in summary it is trained heuristically using a simulation of a plurality of drone ports 101, and based on a reward given as a function of environment state. The machine learning model may be a deep reinforcement machine learning model.

As will be described in more detail below, the simulation comprises modelling the actions of the plurality of drone ports 101 as a series of discrete actions. The task time to perform each discrete action may be simulated using a time-step, the time-step being an arbitrary value. The power level of each flying drone 103 or rover 109 may be modelled using an artificial charge and discharge rate. It may be assumed that all flying drones 103 fly at the same speed. In some examples the simulation may be a digital twin.

The reward is calculated when a terminal state (which may be an environment state) is reached in the simulation, the terminal state being for example goods/a parcel being delivered to their intended destination, or an undesired state being reached, for example as determined by a selected threshold period of time being exceeded. The terminal states may be predefined, for example in a lookup table.

The simulation may comprise conducting a series of episodes, wherein each episode comprises a selected sequence of actions, such as delivery of goods/parcels to their selected destination, and is terminated when the selected sequence of actions is successfully completed or an undesired state is reached, and wherein a reward is calculated for each episode.

When a reward is calculated, this is fed back to the machine learning model as deep reinforcement machine learning.

The machine learning model is configured to meet at least one of a plurality of constraints. The constraints may comprise any of: (i) correctness, (ii) energy consumption, (iii) charge level of one of the plurality of flying drones, (iv) charge level of at least one of the plurality of rovers, and (v) time. The environment state comprises at least one of (i) the position or status of at least one of the plurality of flying drones, (ii) the position or status of at least one of the plurality of rovers, and (iii) the position or status of at least one of the lifts, and (iv) the status of at least one of the plurality of lockers. The status of the plurality of lockers may comprise at least one of: (i) the locker being occupied with goods, (ii) the locker being open and/or unlocked, and (iii) the locker being closed and/or locked.

The reward may comprise a penalty for states that result in an undesirable outcome. The undesirable outcomes may be predefined, for example in a lookup table. For example, the reward may comprise a penalty for any of the following environment states: (i) repetitive movements, (ii) delivery of the goods to the wrong destination, (iii) charge level of at least one of the plurality of flying drones and/or one of the plurality of rovers dropping below a selected threshold, and (iv) movement of at least one of the lifts without anything being moved by the lift.
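As a hedged illustration of how such a reward with penalties might be computed, the sketch below uses a simple dictionary-based environment state; the field names and penalty magnitudes are assumptions for illustration only, not values taken from this disclosure.

```python
# Illustrative reward function with penalties, assuming a dict-based environment
# state; the field names and magnitudes are assumptions only.
LOW_CHARGE_THRESHOLD = 0.2   # assumed fraction of full charge

def compute_reward(state: dict) -> float:
    reward = 0.0
    if state.get("parcel_delivered_correctly"):
        reward += 1.0        # reward a correct delivery
    if state.get("parcel_delivered_wrong_destination"):
        reward -= 1.0        # penalty: goods delivered to the wrong destination
    if state.get("repetitive_move"):
        reward -= 0.1        # penalty: repetitive movements
    if state.get("lift_moved_empty"):
        reward -= 0.1        # penalty: lift moved without moving anything
    if any(c < LOW_CHARGE_THRESHOLD for c in state.get("charge_levels", [])):
        reward -= 0.5        # penalty: a drone or rover battery below threshold
    return reward
```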

The machine learning model is configured to perform an objective function to maximise the reward.

Fig. 2a shows a view of the drone port roof top 105. The roof top 105 has at least one post 32 on which is placed, at a set height, at least one sensor 34. The at least one sensor can comprise at least one camera looking down onto the roof 105, or at least one beacon receiver transmitter. The beacon could use ultrasonic energy or radio frequency electromagnetic energy with which to sense, receive and transmit. Such beacons are available and are called ultrawideband beacons.

An item on the roof top 105, such as a parcel or drone 103 or rover 109 can be demarked with at least one visual marker 35 such as an April Tag or Aruco Marker. The marker 35 can be seen by the at least one camera and by processing the video frames data, the location and orientation of the marker can be distinguished in relation to the camera 34. If the camera 34 is calibrated with a known root position on the roof top 105 or other platform, the location of the marker on the item relative to the root position can be inferred from the available information.

The cameras may be supported by at least one local computer such as a Raspberry Pi, and the computations of the April tag pose are made using at least one software application to perform the pose estimation. When the pose estimation is sent to the collaborative server, the pose can be combined with the calibration pose by at least one software application designed for this purpose. This application can therefore use data from any camera, generate multiple estimates of the item marker relative to the calibration pose, and an average estimate of location and orientation can be generated and broadcast for use in several other applications.

In the location process several cameras, for example cameras 1, 2 and 3, can be used.

To define a root or origin coordinate, an April tag is placed at a unique place A in the drone port. The nearest camera uses the April tag pose detection algorithm to calculate a matrix transformation T1-A, denoting the transformation of camera 1 with respect to the origin point A.

The matrix transformation comprises a 4 x 4 matrix with a 3 x 3 rotation matrix in the top left, a 3 x 1 translation column on the right and 0, 0, 0, 1 in the bottom row.

To calibrate camera 2 an April tag is placed at a point B where it can be seen by both camera 1 and camera 2. This provides two transforms T1-B and T2-B. From these we can calculate a new transform T1-2. If an April tag is randomly placed in only the view of camera 1, then we use T1-A and the pose for the randomly placed tag to calculate its position relative to A.

If an April tag is randomly placed in only the view of camera 2, then we use T1-A, T1-2 and the pose transform for the randomly placed tag to calculate its position relative to A. Similarly, to calibrate camera 3 an April tag is placed at a point C where it can be seen by both camera 2 and camera 3. This provides two transforms T2-C and T3-C. From these we can calculate a new transform T2-3.

If an April tag is randomly placed in only the view of camera 3, then we use T1-A, T1-2, T2-3 and the pose transform for the randomly placed tag to calculate its position relative to A.
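A minimal numpy sketch of this transform chaining is given below, assuming each pose is a 4 x 4 homogeneous matrix giving a tag's pose in a camera's frame; the exact composition order depends on the pose convention of the tag-detection library actually used, so this is illustrative rather than a definitive implementation.

```python
# Minimal numpy sketch of chaining April tag pose transforms. Each pose is assumed
# to be a 4x4 homogeneous matrix giving a tag's pose in a camera's frame.
import numpy as np

def invert(T: np.ndarray) -> np.ndarray:
    """Invert a rigid 4x4 transform (rotation plus translation)."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def camera2_in_camera1(T1_B: np.ndarray, T2_B: np.ndarray) -> np.ndarray:
    """T1-2: camera 2 expressed in camera 1's frame, via a tag B seen by both."""
    return T1_B @ invert(T2_B)

def tag_relative_to_origin(T1_A: np.ndarray, T1_2: np.ndarray,
                           T2_X: np.ndarray) -> np.ndarray:
    """Pose of a tag X (seen only by camera 2) relative to the origin tag A."""
    return invert(T1_A) @ T1_2 @ T2_X
```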

An item on the roof top, such as a parcel or drone or rover, can be demarked with at least one ultrawideband marker 36 such that the relative position and orientation of the ultrawideband marker can be calculated. Such off-the-shelf ultrawideband marker systems are available, and they perform the calculations and can send the results to the collaborative server or to any robot in the system.

At least one other visual marker, such as marker 37, can be distributed around the drone port 101. A rover 109 may for example use its at least one camera to see the marker and, since the location and orientation of the at least one other visual marker is defined in a database accessible by the rover 109 computer, the rover 109 can use the pose estimation method to calculate its own location and orientation relative to the at least one other visual marker and thereby locate itself in the drone port 101.

Fig. 2b shows the lift, the corridors and a human parcel porter dropping off an item. The porter is shown placing the parcel onto a rover 109. The porter indicates via a mobile app that the parcel is ready for transfer. After weighing and scanning the parcel the rover 109 takes the parcel into the corridor system, where certain corridor levels allow for parcels incoming and others for lockering.

The collaborative server, CS, 150 may use a trained machine learning model or a deep reinforcement learning Al module, DRAI. Accordingly, embodiments of the disclosure disclose a method of training a machine learning model for managing a plurality of drone ports 101. Each of the plurality of drone ports 101 may be one of those as described above, and may comprise a plurality of levels connected by a computer-operated lift 107, wherein at least one of the levels comprises a landing pad 105 for receiving at least one flying drone 103, a plurality of lockers 111 for receiving goods transported by the plurality of flying drones 103, and a plurality of rovers 109 configured to transfer goods from a flying drone 103 landed on the landing pad 105 to one of the plurality of lockers 111. As shown in Fig. 3, the method 1000 of training the machine learning model comprises: creating 1010 a digital model simulation of the plurality of drone ports; creating 1020 a series of constraints to be met by the digital model; creating 1030 a series of rewards when particular environment states are met by the digital model; and training 1040 the machine learning model using the digital model, constraints and rewards.

To perform this training, the drone port 101, its lift 107, rovers 109, drones 103 and lockers 111 are all simulated to a high level of fidelity in software in a digital model (additionally, if the drone ports comprise other computer-operated features such as charging stations, these may also be simulated in the digital model). The digital model may be a digital twin. Since the method assumes that the current state of the drone ports 101, drones 103, lifts 107 and rovers 109 is independent of any previous state (that is, there is no state memory) and is representable by a Markov decision process, all state data must be captured in single-point measured states. Thus, quantities which would otherwise require past states in order to be derived, such as acceleration and velocity, must be provided explicitly.

Fig. 4 shows a graphical interface of a digital model simulation. In this image the simulation has just started. It can be seen how the simulation progresses from an initial state in the top left corner to a final state in the bottom right corner. Fig. 5 shows the digital model simulation after nine parcels have been delivered.

The simulation allows the DRAI to provide commands to these simulated computer-controlled entities such as rovers 109, lifts 107 and drones 103, and to receive back status information about these computer-controlled entities in simulated real time. In computer science the DRAI is termed the agent. The high-fidelity simulation of the drone ports 101 results in state data that is termed the environment state.

Using unsupervised deep reinforcement learning the DRAI performs exploration in order to learn the correct relationship between the environment states and the commands such that during an exploitation phase the DRAI can accurately operate all drone ports 101 and the computer controlled entities in a collaborative and optimal manner so as to perform parcel delivery in the shortest time.

The DRAI training framework uses a reward and penalty system to achieve this, carrying out many thousands of simulations until the DRAI can operate the drone ports 101 with maximum reward and minimum penalty.

Deep reinforcement learning assumes that the environment state can be modelled as a Markov Decision Process. This means that any command generated by the DRAI is dependent only on the current environment state, which is the state of the drone port 101 simulation. Therefore, environment states include all the necessary values that allow the DRAI to learn without need for memorized states. For example, the environment state comprises at least one of (i) the position or status of at least one of the plurality of flying drones 103, (ii) the position or status of at least one of the plurality of rovers 109, and (iii) the position or status of at least one of the lifts 107, and (iv) the status of at least one of the plurality of lockers 111. It will be understood that the environment state could include the state or position of any computer-controlled entity forming part of a drone port 101 , including for example charging stations.
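As a hedged sketch, the environment state might be flattened into a single fixed-length observation vector so that the current state alone carries everything the agent needs, consistent with the Markov assumption above; the field names and encoding below are illustrative assumptions rather than the actual state representation.

```python
# Illustrative flattening of the drone port status into one fixed-length
# observation vector; the field names and encoding are assumptions only.
import numpy as np

def encode_state(drones, rovers, lifts, lockers) -> np.ndarray:
    obs = []
    for d in drones:
        obs += [d["x"], d["y"], d["charge"], float(d["carrying_parcel"])]
    for r in rovers:
        obs += [r["x"], r["y"], r["charge"], float(r["carrying"])]
    for lift in lifts:
        obs += [float(lift["floor"]), float(lift["door_open"])]
    for locker in lockers:
        obs += [float(locker["occupied"]), float(locker["locked"])]
    return np.asarray(obs, dtype=np.float32)
```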

Fig. 6 shows an agent's typical network of weights which are learned during the DRAI training. Typically, in a four-drone port simulation, there are around 70 inputs to the input layer, coming from robot and parcel status and the environment. There are two middle layers of weights which are modified during the exploration run for four drone ports 101. The weights are the matrix coefficients which multiply the inputs at the input layer, multiply the results again at the middle layers, and finally create the outputs at the output layer. Typically, in the four-drone port simulation there are around 20 outputs corresponding to commands that are given by the DRAI to the simulator.
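For illustration, a network of the general shape described (roughly 70 status inputs, two hidden layers of learned weights and roughly 20 command outputs) could be sketched in PyTorch as below; the hidden layer width and the use of PyTorch are assumptions for the sketch, not taken from this disclosure.

```python
# Illustrative PyTorch policy network: ~70 status inputs, two hidden layers of
# learned weights, ~20 command outputs. The hidden width of 256 is an assumption.
import torch
import torch.nn as nn

class PortPolicy(nn.Module):
    def __init__(self, n_inputs: int = 70, n_commands: int = 20, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_commands),   # logits over the discrete commands
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Sampling one command from the policy output for a dummy observation:
policy = PortPolicy()
logits = policy(torch.zeros(1, 70))
command = torch.distributions.Categorical(logits=logits).sample()
```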

As the learned information is described by the weight values in the two layers, and as the implication of each weight value is difficult to interpret, we cannot explain the decision-making logic in human terms even though the DRAI may accurately deliver the CS function.

Given the safety requirements of such a system operating in the real world, and requiring CAA regulatory approval, safety may be improved by ensuring that at least one high fidelity simulation of the drone ports and included robots is used to train the DRAI, where this simulation is validated against observed data from at least one real life drone port with real-time real-life robots being provided with exactly the same test commands as the simulation robots.

Thus, the real-world sensor data in the real-world environment that results from the 20 or more commands is used to check like for like the simulated sensor data created in the simulation. By ensuring the simulated sensor data is the same as the real-world data the simulation can be validated and any discrepancies can be removed.

To provide added safety verification the DRAI performance can be tested by running the simulation with a very large number of different initial conditions and changes in drone 103, rover 109 or lift 107 performance in order to prove that no unsafe situations occur. The simulation can be run for the equivalent of several years and errors or undesirable states detected. Errors or undesirable states would include the detection of drones 103 or rovers 109 running out of charge, the usage per hour of a drone 103 or rover 109 rising beyond its operating envelope, parcels arriving at the wrong destination, parcels not being delivered, and drones 103 or rovers 109 not being used at a reasonable minimum usage level. Several other tests may additionally be applied beyond these mentioned. In such examples where an undesirable state occurs, a reward comprising a penalty may be fed back to the machine learning model.

In the second means of verification, the pathways that are represented by the weights relating input status to output command are extracted.

For each of the 20 or more command outputs from the DRAI we randomly sample the different sets of robot simulation environment input states that cause the triggering of the commands. These input states can be represented using an English text description and can be coded so as to be human readable.

Thus, a typical result for one command and one set of input states may read:

“IF LIFT AT GROUND FLOOR AND ROVER IS AT LIFT DOOR ON ROOF TOP AND PARCEL MUST BE LOCKERED THEN SEND LIFT TO TOP FLOOR.”

The above description example would in reality be much longer incorporating all relevant status terms.

The sampled set of unique descriptions of the network can be delivered for human validation. Although many hundreds of such descriptions are generated, within a short time a team of humans can check that all are safe and valid.

Due to limitations in the DRAI method, some commands may seem illogical to a human, however as long as they are not unsafe and do not waste time to achieve a correct overall result, they can be acceptable.

If an unsafe decision is identified (which will be very rare, since a long-term simulation should have identified it), the issue can be fixed by modifying the reward or penalty definitions in order to influence this behavior and thereby remove the possible unsafe logic; alternatively, the exploration phase may be run for a longer time. In an associated process one can repeat the DRAI training process but with slightly different reward and penalty definitions as well as random initial conditions. As a result, we can arrive at more than one version of the DRAI. Each of these DRAIs will have very slightly different weights but in theory should provide the same command for a given drone port 101 status.

With multiple DRAI CS 150 decision makers one can run them in parallel. In effect this means that there may be a plurality of collaboration servers, CS, 150 each having a respective machine learning model. This allows the creation of a system with inbuilt redundancy or with majority voting of commands to be used for a given status input.
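A minimal sketch of such majority voting over several independently trained DRAI policies is shown below; the select_command() interface is a hypothetical API used only for illustration, and the behaviour on a tied vote (raising an error for a safety cross-check) is an assumption.

```python
# Sketch of majority voting across several independently trained DRAI policies.
# The select_command() method is a hypothetical interface for illustration only.
from collections import Counter

def vote(policies, observation):
    """Return the command chosen by the majority of the trained models."""
    commands = [p.select_command(observation) for p in policies]
    winner, count = Counter(commands).most_common(1)[0]
    if count <= len(policies) // 2:
        # No clear majority: flag for a safety cross-check rather than acting.
        raise RuntimeError("No majority between DRAI command outputs")
    return winner
```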

The collaborative server, CS, 150 hardware may comprise: (i) at least one high power processing unit, preferably including at least one calculations accelerator hardware support, (ii) at least one communications support to receive and send data to all drone ports 101, (iii) at least one operating system, (iv) at least one user interface, and (v) at least one power unit.

To create the DRAI (the machine learning model used by the collaborative server, CS, 150), at least one deep reinforcement learning framework is required with at least one deep reinforcement trained agent and at least one high fidelity simulation of drone ports 101 and associated drones 103, rovers 109 and lifts 107, the status of which is equivalent to the environment state required by the deep reinforcement learning. When operating so as to coordinate collaborative operations between all drones 103, rovers 109, lifts 107 and drone ports 101 , the collaborative server, CS, 150 software comprises at least one decision making software that accepts as inputs the state of the drone port 101 and calculates the high-level commands to send to each drone 103, rover 109 or lift 107 in each drone port.

The at least one decision making software may comprise any mix of:

(i) at least one deep reinforcement artificial intelligence network that has been trained exhaustively using a near exact digital twin simulation of the drone ports that it will serve;

(ii) at least one deep reinforcement artificial intelligence network that has been trained exhaustively using a near exact digital twin simulation of the drone ports 101 that it will serve, with additional training under many different initial conditions and drone 103/rover 109/lift 107 behaviour;

(iii) at least one deep reinforcement artificial intelligence network that has been trained exhaustively using a near exact digital twin simulation of the drone ports 101 that it will serve, with additional training under many different initial conditions and drone 103/rover 109/lift 107 behavior and which has been safety verified by long term in simulation testing with extra fail detection software;

(iv) at least one deep reinforcement artificial intelligence network that has been trained exhaustively using a near exact digital twin simulation of the drone ports 101 that it will serve, with additional training under many different initial conditions and drone 103/rover 109/lift 107 behavior and which has been safety verified by human verification of a very large sample of decisions made during a long term in simulation testing, where those decisions are exported in a human readable format for further checking; and

(v) at least one decision making software application based on a plurality of deep reinforcement artificial intelligence networks that have been trained independently and exhaustively using a near exact digital twin simulation of the drone ports 101 that it will serve, with additional training under many randomly chosen different initial conditions and drone 103/rover 109/lift 107 behavior, where the final decision is based on a type of majority voting system between the plurality of command outputs, and where the decision making software has been safety verified by long term in-simulation testing with extra fail detection software and by human verification of a very large sample of decisions made during long term in-simulation testing, where those decisions are exported in a human readable format.

Fig. 7 shows an example of a reinforcement learning approach that may be used. Reinforcement learning is a type of unsupervised learning where the algorithm has to find the most optimal solution to its task without any input from the user. Fig. 7 depicts an overview of what a generic reinforcement learning setup looks like. In Fig. 7 the algorithm which has to find the most optimal solution is called an agent. The environment is what the agent lives in and interacts with. For every action that the agent performs, the environment will give a reward and inform the agent of the state of the environment that it is currently in. The reward given can be positive or negative depending on whether the agent has performed an action that will benefit it or set it back. You can think of the reward system as a carrot-and-stick approach.

An objective function may be used to maximise the reward. In the case of controlling the drone port network, a reward is given whenever the agent delivers a parcel from one drone port 101 to another correctly, i.e. to the right address. In addition, the reward obtained at the very end of a learning cycle (or episode) reduces as the episode gets longer. Together these measures encourage the agent to deliver a parcel from one drone port 101 to another in the most efficient way, since the agent will try to maximise its reward.

In the present example, all development was done using Python 3 on Linux and Windows platforms. The libraries used were OpenAI Gym, which provides the structure needed to implement the drone port network environment, and RLlib, which provides many reinforcement learning algorithms that can be easily plugged into the drone port network environment. However, it will be understood that other programming languages and libraries may be used.
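By way of a non-limiting illustration, a skeleton of such a custom Gym environment is sketched below; the class name, observation layout and sizes are assumptions for the purpose of the example only.

```python
# Minimal, illustrative skeleton of a custom OpenAI Gym environment for a
# drone port network. All sizes and the observation layout are assumptions.
import gym
from gym import spaces
import numpy as np

class DronePortNetworkEnv(gym.Env):
    def __init__(self, num_actions=20, obs_size=32):
        super().__init__()
        self.action_space = spaces.Discrete(num_actions)        # one robot task per step
        self.observation_space = spaces.Box(low=0.0, high=1.0,
                                            shape=(obs_size,), dtype=np.float32)
        self.state = np.zeros(obs_size, dtype=np.float32)

    def reset(self):
        self.state = np.zeros_like(self.state)   # e.g. full batteries, empty lockers
        return self.state

    def step(self, action):
        reward, done = 0.0, False
        # ...update robot positions, battery levels and parcel locations here...
        return self.state, reward, done, {}
```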

The robotic systems that may be considered in a drone port 101 are:

1. Lift 107

2. Garaging Rover 109a

3. Parcel Rover 109b

4. Drone 103

Here we list the different possible tasks it is assumed each robot can perform:

• Lift 107
  o Move to floor N (where N is the number of floors in the drone port; if N is 4, there are 4 possible actions which the lift can perform)

• Garaging Rover 109a
  o Idle
  o Pick Up Drone
  o Put Down Drone
  o Go To Lift
  o Go To Charging Station
  o Go To Takeoff Location
  o Enter Lift

• Parcel Rover 109b
  o Idle
  o Pick Up Parcel
  o Put Down Parcel
  o Go To Lift
  o Go To Charging Station
  o Go To Takeoff Location
  o Go To Parcel Locker
  o Enter Lift

• Drone 103
  o Idle
  o Fly

To conform to OpenAI Gym environment standards, the actions of each robotic component must be modelled using one of the following data structures (a sketch instantiating each follows the list):

• Discrete
  o The agent can take one action at each timestep
• Multi-Discrete
  o The agent can take multiple actions at each timestep
• Tuple
  o A data structure to encapsulate simpler actions
• Dictionary
  o A data structure to help group actions together in a dictionary format
• Box
  o A data structure that is like an array but has bounds
• Multi-Binary
  o A data structure which is similar to one-hot encoding
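For reference, the sketch below shows how each of these space types can be instantiated with gym.spaces; all sizes are arbitrary examples.

```python
# Examples of the Gym space types listed above; all sizes are arbitrary.
from gym import spaces
import numpy as np

discrete       = spaces.Discrete(7)                   # one action per timestep, from 7 choices
multi_discrete = spaces.MultiDiscrete([4, 7, 8, 2])   # one action per robot per timestep
tuple_space    = spaces.Tuple((spaces.Discrete(4), spaces.Discrete(7)))
dict_space     = spaces.Dict({"lift": spaces.Discrete(4), "drone": spaces.Discrete(2)})
box            = spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)  # bounded array
multi_binary   = spaces.MultiBinary(5)                # similar to one-hot / multi-hot encoding
```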

At the beginning, the actions were modelled using Multi-Discrete. However, the number of possible actions at each time-step increases exponentially as more robots are introduced, which makes it harder, and slower, for the agent to find an optimal solution. As a result, the actions of the robots were modelled using Discrete.

Task time is simulated using the time-step, the atomic unit of time in the reinforcement learning environment, so each task takes a certain number of time-steps. The time-step is an arbitrary value that can be easily translated into the actual time taken for specific actions.
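By way of illustration, the sketch below shows one way such a translation could be defined; the per-task durations are placeholder values only.

```python
# Illustrative mapping from abstract time-steps to real time; the durations
# below are placeholder values, not measured figures.
SECONDS_PER_TIMESTEP = 5.0

TASK_DURATION_STEPS = {
    "lift_move_one_floor": 2,
    "rover_enter_lift": 1,
    "drone_takeoff": 4,
}

def task_duration_seconds(task):
    return TASK_DURATION_STEPS[task] * SECONDS_PER_TIMESTEP
```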

At the beginning of each simulation or episode, all the robots (i.e., drones 103, rovers 109 and lifts 107) start with a full charge. To simulate real-life scenarios, artificial charge and discharge rates were introduced for all robots. The batteries discharge while idling or while performing an action, with the discharge rate for idling being lower than that for performing an action. The charge and discharge rates depend on the time-step of the environment, which can easily be changed and defined to reflect a real-world situation much more closely.

It is assumed that all drones 103 fly at the same speed, so the varying factor is flight time. As with the ground rovers 109, the power consumption of a drone 103 depends on its flight time, which in turn depends on the time-step of the environment, which can be easily defined by the user.
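An illustrative battery model along these lines is sketched below; the charge and discharge rates are placeholder values chosen only for the example.

```python
# Illustrative battery model: per-time-step discharge, with idling cheaper than
# acting and charging restoring energy. All rates are assumed placeholder values.
IDLE_DISCHARGE_PER_STEP   = 0.001
ACTION_DISCHARGE_PER_STEP = 0.01
CHARGE_PER_STEP           = 0.05

def update_battery(level, acting, charging):
    if charging:
        level += CHARGE_PER_STEP
    elif acting:
        level -= ACTION_DISCHARGE_PER_STEP
    else:
        level -= IDLE_DISCHARGE_PER_STEP
    return min(max(level, 0.0), 1.0)   # clamp charge level to [0, 1]
```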

There are several penalties that may be applied in the environment:

1. Repetitive movements, i.e., the agent moves a rover back and forth

2. Agent delivers parcel to the wrong destination

3. Robot (i.e., drone 103 or rover 109) charge level drops to 0

4. Moving lift 107 without anything inside the lift 107

These penalties are applied to discourage the agent from taking such actions in the future.

The main reward is given when a parcel is delivered from one drone port 101 to another drone port 101 (the drone port 101 to which the parcel is supposed to be delivered). However, to encourage and speed up learning, smaller rewards are given to the agent for performing tasks that help to run the drone port 101 efficiently.

The following is a list of possible rewards that may be given (a sketch combining these rewards with the penalties above follows the list):

1. Delivering a parcel to the correct destination

2. Charging the drone 103

3. Moving lift 107 with something inside the lift 107
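The sketch referred to above combines these rewards with the penalties listed earlier into a single per-time-step reward; the numeric magnitudes and event names are assumptions for illustration only.

```python
# Illustrative reward shaping combining the rewards above with the penalties
# listed earlier. The numeric magnitudes and the event flags are assumptions.
REWARDS = {
    "parcel_delivered_correctly":  100.0,
    "drone_charged":                 1.0,
    "lift_moved_loaded":             0.5,
}
PENALTIES = {
    "repetitive_movement":          -1.0,
    "parcel_wrong_destination":    -50.0,
    "battery_depleted":            -50.0,
    "lift_moved_empty":             -0.5,
}

def step_reward(events, episode_length):
    """Sum rewards/penalties for the events seen this step; longer episodes
    earn slightly less, encouraging efficient deliveries."""
    total = sum(REWARDS.get(e, 0.0) + PENALTIES.get(e, 0.0) for e in events)
    return total - 0.01 * episode_length   # mild length penalty (arbitrary rate)
```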

Described in human terms, the steps required to achieve a single objective, delivering a parcel from one drone port 101 to another, are as follows (see also the sketch after the list):

1. Garaging rover 109a takes a drone 103 from the charging station.

2. Lift 107 goes to the charging station’s floor

3. Garaging rover 109a enters lift 107

4. Lift 107 goes to the roof-top

5. Parcel rover 109b goes to the parcel lockers 111 and collects a parcel

6. Lift 107 goes to the parcel lockers’ floor

7. Parcel rover 109b enters lift 107

8. Lift 107 goes to the roof-top

9. Parcel rover 109b loads the parcel
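The sketch referred to above expresses the same sequence as an ordered list of (robot, high-level command) pairs of the kind the collaborative server, CS, 150 could issue; the command strings are illustrative only.

```python
# The human-readable delivery sequence above, expressed as an ordered list of
# (robot, high-level command) pairs. The command strings are illustrative,
# not the actual command set used by the collaborative server.
DELIVERY_PLAN = [
    ("garaging_rover", "pick_up_drone_from_charging_station"),
    ("lift",           "go_to_charging_station_floor"),
    ("garaging_rover", "enter_lift"),
    ("lift",           "go_to_rooftop"),
    ("parcel_rover",   "collect_parcel_from_lockers"),
    ("lift",           "go_to_parcel_lockers_floor"),
    ("parcel_rover",   "enter_lift"),
    ("lift",           "go_to_rooftop"),
    ("parcel_rover",   "load_parcel"),
]
```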

This shows that the logic sequence is deep and involves multiple robots operating in parallel.

When two drones 103 are operating in parallel, the action space and observation states get larger, which in turn increases training time and complexity.

A new learning cycle may be started as follows: whenever the agent enters a terminal state (either it has delivered all the parcels or it has entered a very undesirable state), the total reward is calculated and the next episode starts. If the episode length becomes too long, the episode ends, the total reward is calculated, and the next episode starts.
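By way of illustration, the termination check could be expressed as follows; the flags and the maximum episode length are assumptions.

```python
# Illustrative episode-termination check: an episode ends when all parcels are
# delivered, when a clearly undesirable state is reached, or when a maximum
# length is exceeded. The flags and the limit are assumptions.
MAX_EPISODE_STEPS = 1000

def episode_done(all_parcels_delivered, undesirable_state, step_count):
    return (all_parcels_delivered
            or undesirable_state
            or step_count >= MAX_EPISODE_STEPS)
```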

Fig. 8 shows a plot of the learning reward and penalty against steps taken, which visualises that the reward is growing.

In one example, the reward received by the agent was recorded for a 2-drone port network, where there is one drone 103 in the entire network and one parcel rover 109b and one garaging rover 109a in each drone port 101.

In another example, the reward received by the agent was recorded for a 3-drone port network, where there is one drone 103 in the entire network and one parcel rover 109b and one garaging rover 109a in each drone port 101. The maximum rewards for the two network configurations are different as there are more parcels to deliver in each training iteration.

As with all reinforcement learning algorithms, there is no single rule for when to stop training. However, a rule of thumb is to stop when the highest and mean rewards obtained by the agent start to plateau. When this happens, it usually means that the agent has learned a policy that maximises the reward.

During this project, the algorithms PPO, APPO, IMPALA and APE-X were all tested. Amongst these, APPO was by far the best performing. PPO also performed well; however, it is not a high-throughput architecture, meaning it took far longer to run than any of the others mentioned. APPO is an asynchronous variant of Proximal Policy Optimization (PPO) based on the IMPALA architecture. It is similar to IMPALA but uses a surrogate policy loss with clipping.
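By way of illustration, a minimal RLlib training sketch for APPO is shown below. It assumes a recent RLlib release; the registered environment name is hypothetical, and the exact configuration methods and result keys vary between RLlib versions.

```python
# Minimal sketch of training APPO with RLlib. "DronePortNetwork-v0" is a
# hypothetical environment name that would need to be registered beforehand
# (e.g. via ray.tune.registry.register_env). Config methods and result keys
# differ between RLlib versions.
import ray
from ray.rllib.algorithms.appo import APPOConfig

ray.init()

config = (
    APPOConfig()
    .environment(env="DronePortNetwork-v0")
    .rollouts(num_rollout_workers=4)     # asynchronous sampling on multiple cores
    .training(lr=5e-4, gamma=0.99)
)
algo = config.build()

for i in range(200):
    result = algo.train()
    mean_reward = result.get("episode_reward_mean")   # key name may vary by version
    print(i, mean_reward)                             # stop training once this plateaus
```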

Other architectures may also prove successful; however, APPO proved a good option because it requires minimal hyperparameter search and trains quickly on multiple cores. It will be appreciated from the discussion above that the embodiments shown in the Figures are merely exemplary, and include features which may be generalised, removed or replaced as described herein and as set out in the claims. In the context of the present disclosure, other examples and variations of the apparatus and methods described herein will be apparent to a person of skill in the art.