

Title:
A SYSTEM AND METHOD FOR REMOTELY PROVIDING ASSESSMENTS OR QUOTATIONS
Document Type and Number:
WIPO Patent Application WO/2022/011415
Kind Code:
A1
Abstract:
Disclosed is a system for facilitating a remote assessment, comprising a first application configured for execution on a device of a remote user, and a second application executable on a device of a local user. The first and second application are configured for communication with each other over a real time communication session, a data feed from said session being displayable within respective user interfaces of the first and second applications. The second application is further configured to control at least one component of the local user device, wherein the first application is configured to receive an input from the remote user and generate a control request to the second application over the communication session to activate the at least one component or control a function thereof.

Inventors:
OAKES PHILIP JAMES (AU)
Application Number:
PCT/AU2021/050747
Publication Date:
January 20, 2022
Filing Date:
July 14, 2021
Assignee:
REMOTEQUOTE PTY LTD (AU)
International Classes:
G06K9/00; G06T19/00
Foreign References:
US 20160103437 A1 (2016-04-14)
US 9674290 B1 (2017-06-06)
US 20190158547 A1 (2019-05-23)
US 9041796 B2 (2015-05-26)
US 20180350145 A1 (2018-12-06)
Other References:
YUE XIANG: "An Augmented Reality Interface for Supporting Remote Insurance Claim Assessment", UNIVERSITY OF CANTERBURY, 1 March 2016 (2016-03-01), pages 1 - 73, XP055899786
Attorney, Agent or Firm:
GRIFFITH HACK (AU)
Claims:
CLAIMS

1. A system for facilitating a remote assessment, comprising: a first application configured for execution on a device of a remote user, and a second application configured for execution on a device of a local user, the first and second application being further configured for communication with each other over a real time communication session, a data feed from said session being displayable within respective user interfaces of the first and second applications; the second application being configured to control at least one component of the local user device, wherein the first application is configured to receive an input from the remote user and generate a control request to the second application over the communication session to activate the at least one component or control a function thereof.

2. The system of claim 1, wherein the at least one component comprises one or more of: a camera, a flashlight, a zoom lens.

3. The system of claim 1 or claim 2, wherein the at least one component comprises one or more of: a photography function, and a data upload function.

4. The system of any preceding claim, wherein the real time communication session is a video session provided by a real time communication service, wherein a negotiation to receive a session access data for the communication session is initiated from the first application.

5. The system of claim 4, wherein the control request is provided as a data object and transmitted over a signalling capability of a protocol for the real time video communication session.

6. The system of claim 4 or claim 5, configured to provide the session access data over a communication network to be accessible from the local user device, wherein an access of said session access data causes the second application to be launched on the local user device.

7. The system of any preceding claim, wherein the first application is a web application hosted by a server having an application program interface.

8. The system of claim 7, said application program interface being configured to functionally interface with a server providing the real time communication service.

9. The system of any preceding claim, wherein the at least one component includes an augmented reality (AR) module executable on a processing unit of the local user device, configured to generate a three dimensional (3D) model of an environment for which images are captured by a camera of the local user device.

10. The system of claim 9, wherein the AR module is configured to receive data from said camera to track the local user device in relation to the 3D model, to create a real time two dimensional (2D) representation of the 3D model from a perspective of the camera, and provide the 2D representation in a video feed for the real time video communication session.

11. The system of claim 10, wherein said tracking includes AR plane detection.

12. The system of any one of claims 9 to 11, wherein the 3D model includes a plurality of node objects, each corresponding to a point of interest in the 3D model.

13. The system of claim 12, wherein the AR module is configured to accept a point selection input, whether it is from the remote or local user, selecting a location on the 2D representation being shown on the corresponding user interface, and determine whether the selected location can be matched to at least one of the points of interest, wherein if there is a match, the point is selected and is associated to the matched point of interest.

14. The system of claim 13, wherein the AR module is configured to enable a series of two or more of said selections, one at a time.

15. The system of claim 13 or claim 14, wherein the AR module is configured to, if a match is determined, provide a 2D data object which is associated with the matched point of interest, on a 2D transparent layer object overlaid onto the 3D model.

16. The system of claim 15, wherein the AR module is configured to add a new one of said 2D data object, before a new selection for a point of interest is matched.

17. The system of claim 15 or claim 16, wherein said 2D data object has a Z-index level which is one of multiple available Z-index levels.

18. The system of any one of claims 14 to 17, wherein the AR module is configured to provide a linkage data to link two consecutively selected points which are each associated with points of interest on the 3D model, thereby linking the two points of interest.

19. The system of claim 18, wherein the linkage data includes a distance measure between said two points of interest, and the first and second applications are configured to display said distance measure on the respective user interfaces.

20. The system of claim 18 or claim 19, wherein when a next point is selected, the linkage data is modified to include information in relation to said next point.

21. The system of claim 20, wherein the information includes a distance measure between a point of interest associated with said next point and a point of interest associated with the immediately preceding selected point.

22. The system of claim 20 or 21, wherein the information includes an angular measure between lines, respectively connecting said next point to the immediately preceding selected point and between said next point and a further next selected point.

23. The system of any one of claims 20 to 22, wherein the information includes an area measure for a shape defined by the next point and two or more immediately preceding selected points.

24. The system of any one of claims 14 to 23, wherein the AR module is configured to: if a new point associated with a new selection input is selected, and is determined to have a match with a previously selected point of interest, and said new point selection is immediately preceded by selection of at least two other points different from said previously selected point of interest; provide a data linkage object to link said new selected point with the at least two other points, and calculate an area of a shape formed with the new selected point and the at least two other points as vertices.

25. The system of claim 24, wherein when a next new point associated with a next new selection input is selected, said selected next new point does not have a data linkage object which links to said points providing said vertices.

26. The system of any one of claims 14 to 25, wherein the AR module is configured to, if a new point associated with a new selection input is determined to have a location in the 3D model which is close to a point of interest, cause a display object showing the new selection input to snap or jump to a 2D location representing that point of interest.

Description:
A SYSTEM AND METHOD FOR REMOTELY PROVIDING ASSESSMENTS OR QUOTATIONS

TECHNICAL FIELD

This disclosure relates to a system and method for remotely providing assessments, utilising an augmented reality tool. The tool can be used where remote access to information or metrics regarding a local object is required. It has applications in the provision of work or site assessments which can be used to make trades quotations or property inspections, but is not limited to these applications.

BACKGROUND ART

In a variety of situations, a person may need to obtain information regarding, or metrics or images of, an object or environment remote to them. Unless there is information already acquired, they require someone onsite to obtain the information, or otherwise utilise technology which can be deployed to obtain the data required.

Depending on the circumstances, there may not be any appropriate technology to obtain this data required to make an assessment remotely. The lack of such a technical tool may logistically complicate or render impossible real-life tasks such as providing trades quotations, emergency work, insurance assessments, repairs, or providing assessment in a healthcare setting, etc. In these and some other circumstances, a suitably skilled person with the required knowledge/experience is required to be onsite, in the absence of the technical tool. This often means a service provider or emergency assessor needs to travel in order to obtain the data required, placing limitations on the service provider’s or assessor’s ability to efficiently plan their pipeline of work or appointments. In emergency situations where immediate assessment is required, the need to send someone onsite also limits the ability of service providers to promptly review or triage the issue. From a service seeker’s point of view, their issues may not be resolved in a timely manner, or they may need to travel to the service provider (e.g., for healthcare assessment) or wait for the service provider (e.g., tradesperson) to show up.

It is to be understood that, if any prior art is referred to herein, such reference does not constitute an admission that the prior art forms a part of the common general knowledge in the art, in Australia or any other country.

SUMMARY

In a first aspect, the present disclosure provides a system for facilitating a remote assessment. The system comprises a first application configured for execution on a device of a remote user, and a second application configured for execution on a device of a local user. The first and second applications are further configured for communication with each other over a real time communication session, a data feed from the session being displayable within respective user interfaces of the first and second applications. The second application is configured to control at least one component of the local user device, wherein the first application is configured to receive an input from the remote user and generate a control request to the second application over the communication session to activate the at least one component or control a function thereof.

The at least one component can comprise one or more of: a camera, a flashlight, a zoom lens.

The at least one component can comprise one or more of: a photography function, and a data upload function.

The real time communication session can be a real time video communication session provided by a real time communication service, wherein a negotiation to receive a session access data for the communication session is initiated from the first application.

The control request can be provided as a data object and transmitted over a signalling capability of a protocol for the real time video communication session.

The system can be configured to provide the session access data over a communication network to be accessible from the local user device, wherein an access of the session access data causes the second application to be launched on the local user device.

The first application can be a web application hosted by a server having an application program interface.

The application program interface can be configured to functionally interface with a server providing the real time communication service.

The at least one component can include an augmented reality (AR) module executable on a processing unit of the local user device, configured to generate a three dimensional (3D) model of an environment for which images are captured by a camera of the local user device.

The AR module can be configured to receive data from the camera to track the local user device in relation to the 3D model, to create a real time two dimensional (2D) representation of the 3D model from a perspective of the camera, and provide the 2D representation in a video feed for the real time video communication session.

The tracking can include AR plane detection.

The 3D model can include a plurality of node objects, each corresponding to a point of interest in the 3D model.

The AR module can be configured to accept a point selection input, whether it is from the remote or local user, selecting a location on the 2D representation being shown on the corresponding user interface, and determine whether the selected location can be matched to at least one of the points of interest, wherein if there is a match, the point is selected and is associated to the matched point of interest.

The module can be configured to enable a series of two or more of the selections, one at a time.

The AR module can be configured to, if a match is determined, provide a 2D data object which is associated with the matched point of interest, on a 2D transparent layer object overlaid onto the 3D model.

The AR module can be configured to add a new one of the 2D data object, before a new selection for a point of interest is matched.

The 2D data object can have a Z-index level which is one of multiple available Z-index levels.

The AR module can be configured to provide a linkage data to link two consecutively selected points which are each associated with points of interest on the 3D model, thereby linking the two points of interest.

The linkage data can include a distance measure between the two points of interest, and the first and second applications are configured to display the distance measure on the respective user interfaces.

When a next point is selected, the linkage data can be modified to include information in relation to the next point.

The information can include a distance measure between a point of interest associated with the next point and a point of interest associated with the immediately preceding selected point.

The information can include an angular measure between lines, respectively connecting the next point to the immediately preceding selected point and between the next point and a further next selected point.

The information can include an area measure for a shape defined by the next point and two or more immediately preceding selected points.

The AR module can be configured to, if a new point associated with a new selection input is selected, and is determined to have a match with a previously selected point of interest, and the new point selection is immediately preceded by selection of at least two other points different from the previously selected point of interest; provide a data linkage object to link the new selected point with the at least two other points, and calculate an area of a shape formed with the new selected point and the at least two other points as vertices.

When a next new point associated with a next new selection input is selected, the selected next new point may not have a data linkage object which links to the points providing the vertices.

The AR module can be configured to, if a new point associated with a new selection input is determined to have a location in the 3D model which is close to a point of interest, cause a display object showing the new selection input to snap or jump to a 2D location representing that point of interest.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments will now be described by way of example only, with reference to the accompanying drawings in which:

Figure 1 is a schematic illustration of a remote assessment system in accordance with one embodiment of the present invention;

Figure 2 is a schematic depiction of an example process to establish the communication between a customer device and a remote user device;

Figure 3 is a schematic illustration of the system shown in Figure 1, overlaid with the process shown in Figure 2;

Figure 4 is an example notes and measurement interface provided in the remote user’s application;

Figure 5 is a flow chart depicting a process to control functions on a customer user device from a remote application;

Figure 6 is a schematic depiction of the process of Figure 5, in relation to the activation or deactivation of a recording feature;

Figure 7 shows an example remote application user interface, including a video stream display portion and a function control portion;

Figure 8 is a schematic depiction of the system operation to take a photograph using the customer device in response to a request issued from the remote user;

Figure 9 shows an example user interface which is shown during a video WebRTC session where the augmented reality (AR) mode is off, which may be shown in either the remote application or the local application;

Figure 10 shows an example user interface of Figure 9, where the augmented reality (AR) mode is on;

Figure 11-1 shows an example user interface of Figure 9, after four points have been selected and measurement data in relation to the four points are displayed;

Figure 11-2 shows another example user interface in the AR mode, showing the measurement tool taking measurements along an eave line, where distance, area, and angle measurements are displayed;

Figure 12 shows another example user interface in the AR mode, showing the measurement tool taking measurements along an eave line;

Figure 13 is a schematic view of a user interface in the AR mode, displayed when the spirit level tool is activated;

Figure 14 shows an example user interface displayed when the point distance tool is activated;

Figure 15 shows an example interface shown on the remote application when the categorisation function is enabled;

Figure 16-1 shows an example photo annotation or photo editor interface;

Figure 16-2 shows an example text annotation interface;

Figure 17 is a schematic depiction of an example system during a video session over a server WebRTC;

Figure 18 is a schematic depiction of an example system during a session, where the system includes both server and peer to peer WebRTC;

Figure 19 schematically depicts a successful acquisition and upload of a photograph, where the photograph satisfies one or more “quality” metrics;

Figure 20 schematically depicts an upload of an image capture of the video stream to a storage location;

Figure 21 schematically depicts an example process whereby the remote user starts the augmented reality mode in the local user’s application;

Figure 22 schematically depicts an example of a process whereby the remote user selects a point in the augmented reality capture in the local user’s application; and

Figure 23 schematically depicts an example of a process whereby the remote user controls a flashlight function on the local user’s device.

DETAILED DESCRIPTION

In the following detailed description, reference is made to accompanying drawings which form a part of the detailed description. The illustrative embodiments described in the detailed description, depicted in the drawings and defined in the claims, are not intended to be limiting. Other embodiments may be utilised and other changes may be made without departing from the spirit or scope of the subject matter presented. It will be readily understood that the aspects of the present disclosure, as generally described herein and illustrated in the drawings can be arranged, substituted, combined, separated and designed in a wide variety of different configurations, all of which are contemplated in this disclosure.

Herein disclosed is a multi-user system which allows the users to communicate with each other, whereby information concerning objects or locations accessible by a local user (customer) is obtainable by a remote user, particularly by enabling the remote user to control functionalities which are fulfilled or provided by the local user’s device. For instance, in applications where a customer needs to obtain a quotation for a trade job or repair, the system provides a remote user (e.g., a tradesperson) with the ability to remotely connect with the local user (e.g., a customer) via a video streaming facility, whereby the customer can communicate with the remote user and is able to show them any issue or problem that they may have. However, it will be appreciated that aspects of the invention disclosed herein will have utility in applications other than the making of trade or repair related assessments. For example, they may have applications for a remotely located health professional to assess a patient, in which case the remotely located person will be the “remote user” and either the patient or another person on-site will be the “local user”. The technical character of the disclosed inventive aspects is not dependent on the particular real-life situation to which the inventive aspects are applied.

Depending on the embodiment, the remote user will have access to a suite of tools which can facilitate the capture of the issue and the obtainment of information required for the task. For example, the remote user can be enabled by the system to initiate, pause or stop a filming (recording) of the communication. They may also be enabled to control a visual or an audio aspect, or both, of the communication. For example, the remote user may be enabled to take individual photographs, mute the conversation, control the flashlight on the local user’s mobile device, as required, etc. The photos, audio and video can be automatically uploaded to a data storage location or a cloud-based storage system, for storage, viewing, editing, or distribution. In addition, the data can be retained for later evidential purposes. The cloud-based storage may be provided by a third party, such as Amazon S3.

As will be described, the photos are uploaded from either the remote or local user’s device. The uploads are automatic in some embodiments. The uploads are made to a cloud-based storage system in embodiments using cloud-based storage. The photo or photos may be temporarily stored in the device memory or other local device location, until the image shown in the photo is no longer needed on the device. The status that the photo is no longer needed is confirmed by, e.g. a successful upload, or completion of any editing, as is implementable in different embodiments. The video recording is captured, in some video-enabled embodiments, via a cloud-based server and stored on a cloud-based storage system for editing, viewing and distribution.

In preferred embodiments, after the session is completed, the system will retain all the relevant information of selected data locations and make it available for the user to be able to utilise it as required, to assist with the presentation of the results so that suitably accurate information (measurements and associated images) can be presented or added to a required report.

As will be described below, the invention integrates the use of Augmented Reality (AR) technology to enable the customer to take measurements (distance, height, width, area and angles) during the video streaming, and to record images (photos or videos). These attributes (photos and measurements) will be able to be incorporated into a report or used to assist with providing accurate information.

The invention described herein allows both the remote user and the customer to manage their respective applications and control the functionalities provided therein, to obtain data or information to be further utilised.

System overview

Figure 1 schematically illustrates an overview of a system in accordance with one embodiment of the present invention. The system 100 includes a server 102 to which a first user device 104 (e.g., of a local user, such as a patient or a customer) and a second user device 106 (e.g., of a remote user, such as a service provider, for example a tradesperson) can be connected. For simplicity, the first user device is hereafter referred to as the local user device 104, and the second user device is hereafter referred to as the remote user device 106. The local user device 104 can be a portable computing device, and the remote user device 106 can be a computer or a mobile device.

The communication server 102 includes a real time communication module 122 such as a web-based real time communication (also known as “WebRTC”) module, so that it can host a real time communication session, between an instance of a local application running on the local user device 104 and a remote application operable from a remote user device 106. In preferred embodiments the real time communication session is a real time video session. The server 102 can be provided by a third party WebRTC service provider. Alternatively, it can be part of or linked to a server 120 of the system 100, and include a proprietary WebRTC module, to allow registered users to communicate with each other. The signalling service may be configured to include a chat protocol. By allowing the local user to authenticate with the proprietary server, session history or scheduled session information may be saved for review or other purposes. For instance, this may enable a system administrator, or a user, to see what sessions occurred in the past or have been scheduled with the user. In the case of remote users which belong to the same company or affiliated group, the session history or future session information may also be searchable on the basis of the group or company. The session history may include or have links for access to the data obtained during the session if the sharing of the data has been enabled, authorised, or both. For simplicity, hereafter references will be made to the WebRTC server 102 and the WebRTC module 122.

In an embodiment, the system server 120 is configured to host one or more instances of a first application 118 (hereafter referred to as “remote application” 118) accessible from the remote user device 106. The remote application 118 may be a web-based application. The system server 120 also includes a server application API (Application Program Interface) module 114 to communicate with the remote application 118, and the WebRTC service 102. The remote application 118 may be developed using asynchronous JavaScript and XML (AJAX), or another language, transfer mechanism, or data interchange format. In preferred embodiments, the or each instance of the remote application 118 which is open on a remote user device will allow the retrieval or delivery of data between the remote user device 106 and the application API 114, while the remote application 118 is running. The system server 120 can also include one or more other application program interfaces (API) so that the applications can interface or interact with other modules within the system server 120, or the programs running on the remote user device 106.

The remote user device 106 can be a computer or a mobile computing device having the capability for video communication and support for the WebRTC and WebSocket protocols, so that it is enabled to communicate with the local user device 104 via video streaming. The output device (e.g., screen) to display the user interface provided by the remote application 118 may also act as an input device (e.g., the screen is a touch screen), or there may be a separate input device, to enable the remote user to use functionalities enabled through the user interface within the remote application 118. The WebRTC session may be configured so that it is initiated from the remote application 118, so that the display for the WebRTC session will be provided as part of the user interface within the remote application 118. The remote application 118 also provides the appropriate interface or interface component so that the remote user is enabled to provide control inputs to control one or more aspects of the WebRTC session. As will be discussed later, the remote application 118 is further configured to enable the remote user to provide control input to remotely initiate, control, and terminate augmented reality (AR) functions or tools enabled on the local user device 104.

The local user device 104 is preferably a mobile device, having an operating system which supports augmented reality functions, and having hardware to support video communication, preferably with high definition camera and video functions. In one embodiment, the local user device 104 has installed therein an instance of a second application (hereafter referred to as “local application”) 108 for execution by a processing unit of the local user device 104. The local user device 104 also includes communication modules for communication over the internet or over data communication networks, or both.

As alluded to previously, the local user device 104 will have the components required to establish video (preferably in high definition) streaming communication over the WebRTC server. Prior to the communication session being established, the system server 120 will have negotiated a session link with the WebRTC server. In particular embodiments, the system server 120 will provide a link to join the session, to the local user device 104.

The link which is provided to a local user may be a web site address (e.g. https://join.remotequote.com.au). Once accessed, the web site then provides a compatible link dependent on the operating system (e.g. remotequote://[parameters]) which provides the session information to the local application 108. The mobile application then parses and validates the parameters from the link to create the connection to the WebRTC service.
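By way of illustration only, the parsing and validation step might resemble the following TypeScript sketch. The remotequote:// parameter names used here (session, token) are assumptions for the example and are not prescribed by this disclosure.

// Minimal sketch of parsing a custom-protocol join link such as
// remotequote://join?session=<id>&token=<value>. Parameter names are
// illustrative assumptions only.
interface SessionParams {
  sessionId: string;
  token: string;
}

function parseJoinLink(link: string): SessionParams | null {
  try {
    const url = new URL(link);
    const sessionId = url.searchParams.get("session");
    const token = url.searchParams.get("token");
    // Validate that the required parameters are present before attempting
    // to connect to the WebRTC service.
    if (!sessionId || !token) {
      return null;
    }
    return { sessionId, token };
  } catch {
    return null; // malformed link
  }
}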

The local application 108 may be configured to open when the provided join link is accessed from a web browser installed on the local user device 104, to negotiate a connection to the WebRTC service. If the local application 108 is not already installed on the local user device 104, accessing the webpage may trigger the local user device 104 to download the local application 108. Alternatively, the webpage may provide the required application download link and prompt the user to download the application 108.

After the download is completed, the user can return to the web page to join the session. Alternatively, where the user is registered with his or her unique user identifier with the system, when the application is opened, the application may be configured to look up the session data associated with the unique user identifier, and gather the required parameters to "continue" the process to join the WebRTC session, without the need to refer back to the join link again. More details regarding establishing the communication session will be described later.

A remote user or local user may decide to use a waiting room system or, in the preferred approach, the customer will be notified at the time of the session when the remote user is ready to begin.

The remote application API may implement a system in which to send notifications to each user (i.e. participant) to participate in the communication session, advising them of the upcoming session. The notifications may provide contextual information, instruction information regarding setting up and the use of the system, or both. These may be predefined. Alternatively, they may be configurable by the users, in relation to the frequency at which the notifications are delivered, the notification content (i.e., what to include in the notification) and the communication mediums, on a per session basis.

The settings may be saved to be applied to one or more later sessions.

The local application 108 contains computer readable instructions executable by a processor of the local user device 104, to implement one or more functions as described herein. The instructions may be implemented as program modules, i.e. functions, objects, program interfaces, memory structures, etc. The functions include the control of audio-video inputs to the local user device 104, communication with an instance of the web-based application running on a remote user device 106, with which the local user device 104 is in a communication session via the WebRTC server 102. The functions may also include augmented reality functions as will be described later in this specification. Other functions may also be provided. The communication session is preferably a video session, but may instead be a voice-only communication session.

Establishment and connection of communication

Figure 17 is an example schematic depiction of the system during a video session over a server WebRTC. Figure 18 shows an example schematic depiction of the system during a session, where the system includes both server and peer to peer WebRTC.

Figure 2 is a schematic depiction of an example process 200 to establish the communication between a local user device 104 and a remote user device 106 (see Figure 1), over the server-based WebRTC service.

As an initial step (step 202), the remote user will need to log into the system, by providing their user credentials via the remote application 118. In Figure 2, the step 202 is located within the graphical representation for the remote application (the rectangle with the reference numeral 118) to denote that this step is initiated from within the remote application 118.

At step 204, from the remote application 118, the remote user will request a communication session. The session may be requested for a date and time to which the remote user and the local user have previously agreed. Alternatively, the session can be created from the remote application 118 at any time that it is required, and the link to join the session can be provided to the local user, so that they can join the created session immediately.

The session request data will be provided to the application API 114 in the system server 120. The session request data may include user authentication data relating to the remote user who has requested the session. At step 206, the application API 114 parses and processes the request from the remote application 118 and will then send a request which uses a different authentication to the WebRTC server 102 for the WebRTC session creation. The API which communicates with the WebRTC server 102 is preferably a REST (Representational State Transfer) API, where the system server 120 supports web-based remote applications 118. At step 208, the WebRTC server 102 responds to the request by sending a session token back to the application API 114.

The token will include authentication data required for accessing the communication session, and the communication link. At step 210, the server application API 114 can now respond to the session request from the remote application 118 by providing a reply access token to the remote application 118 to connect to the WebRTC service.
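As a sketch only, the token negotiation of steps 206 to 210 could be implemented along the following lines; the WebRTC service endpoint and field names here are hypothetical stand-ins, since the actual REST interface depends on the chosen WebRTC provider.

// Hypothetical sketch of steps 206-210: the application API 114 requests a
// session from the WebRTC service and relays an access token back to the
// remote application 118. Endpoint path and field names are assumptions.
interface SessionToken {
  sessionLink: string;  // link for joining the created session
  accessToken: string;  // authentication data for the requesting user
}

async function createWebRtcSession(webRtcApiKey: string): Promise<SessionToken> {
  // Step 206: forward the session creation request, using separate
  // authentication, to the WebRTC service.
  const response = await fetch("https://webrtc.example.com/v1/sessions", {
    method: "POST",
    headers: { Authorization: `Bearer ${webRtcApiKey}` },
  });
  // Step 208: the WebRTC service responds with a session token.
  const body = await response.json();
  return { sessionLink: body.sessionLink, accessToken: body.accessToken };
}

// Step 210: the returned access token is passed back to the remote
// application 118, which uses it at step 212 to connect to the session.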

When the remote user is ready, he or she can then use the authentication data and session link (obtained from the reply to the session request) to connect to the WebRTC server 102 to initiate the communication session. This is done from the remote application 118, and involves the remote application 118 creating the connection to the WebRTC server 102 (step 212). In this step the remote application 118 negotiates with the WebRTC server 102 to establish a persistent connection (i.e., the “session”) with signalling capabilities. The remote user has the ability to publish an audio stream and subscribe to other streams, being the video stream from the local user device. This allows the remote user to control at least one aspect or an operation of the communication session.

Once the connection with the WebRTC server 102 is established, at step 214 the remote application 118 will request the WebRTC server 102 to supply authentication data intended for the local user, with whom communication over the requested session is intended. This will be provided to the local user so that the local user can connect with the current session, which has been established by the remote application.

The WebRTC server 102 responds by issuing the requested authentication data to the remote application (step 216). The remote application 118, on or after receiving the requested local user authentication data, will notify the application API on the server 120 and provide the local user authentication data to the API (step 218). At step 220, the application server 120 sends a communication to the local user by means of one or more available communication channels or contact destinations, such as by email, or by SMS, or by push notification, to forward a link to the local user, to allow them to access the communication session. For example, the application API 114 may communicate with an SMS (short messaging service) API to queue an SMS for transmission, communicate with a SMTP (Simple Mail Transfer Protocol) service to queue the email message for transmission, or it may communicate with a push notification service to queue the push notification for transmission. A combination of the notifications via different methods can be provided to the local user.

The message provided to the local user contains a unique session identifier, and a link (e.g., a web link) to join the session. The link included in the notification also includes data or parameters which will cause the local user device 104 to automatically launch the local application via a custom protocol. If the local user does not have the mobile application installed, they will be presented with a landing page offering the local user the option to download the mobile application for the appropriate operating system (i.e., Android, iOS).

When the local user receives the forwarded link and accesses it (step 222), the action of accessing the link will cause an automatic launching of the local application on the local user device 104, or will enable the local user to download the local application from a web browser, install the downloaded local application, and then launch it (step 224).

Once the local application 108 is open, it will negotiate a connection with the WebRTC server 102 to connect to the communication session (step 226). Once the local application 108 is connected to the session, the WebRTC server 102 can relay any data from the local application 108 to the remote application 118, or vice versa, within the established communication session.

The local user or the local user device 104 may be identifiable, e.g., via a user registration system and the registered user signing in, or by checking a device identifier or unique device characterisation information. In some embodiments, where the user is required to download the application, when the user opens the application 108 and the application detects no session joining parameters, a “fingerprint” - information to uniquely identify either the user or the user’s device - will be generated using the same mechanisms as determined from the browser prior to downloading the application 108. The application will make a call to the WebRTC service 102 to obtain the joining parameters, so that it can join the session. This means the local user does not need to return to the webpage to join the session, or again refer to the notification they received inviting them to join the session.

Figure 3 shows the same connection process 200 which is shown in Figure 2, but with the connection between different system components (application server 120, communication or WebRTC host 102, the instance of the remote application 118 running on the remote user device, the instance of the local application 108 running on the local user device 104) represented by single double-headed arrows. In one example, the various actions enabled over the connections include:

- The remote user initiates a request for a session from the remote application 118

- The application server 120 initialises the requested session with the WebRTC service 102.

- The WebRTC service 102 responds to the application server 120 with authentication detail for joining the session. The application server 120 then responds to the request from the remote application 118 with authentication detail.

- The remote application 118 creates a connection to the WebRTC service 102 using the authentication detail. The WebRTC connection is then established.

- The remote application 118 further requests the WebRTC service to provide authentication detail for the local user to connect to the session.

- The WebRTC service responds to the remote application 118 with the authentication detail for the local user.

- Once the remote application 118 receives the local user authentication detail, the detail is sent to the application server 120, which then communicates that detail to the local user.

- The local user joins the session using the authentication detail, which causes a browser to open or a local instance of the local user application to launch, and then connection to the WebRTC service is negotiated.

At the completion of the process depicted in Figure 2 and in Figure 3, the WebRTC server 102 connects the audio and video data streams from each of the local user and the remote user to the other, acting as a relay for the data sent over the WebRTC server 102 connection. For example, the WebRTC service (as the WebRTC server 102) notifies the remote user that the local user has “arrived” in the session. The remote application 118 open on the remote user’s device, and the local application 108 running on the local user’s device, may each request to subscribe to the WebRTC data provided from the other. The WebRTC service notifies each party of the arrival of the data from the other, to start transmitting the appropriate data and proceeds to relay the requested/received data.

In the above, the launching of the local application 108 may use a custom launching protocol. The custom launching protocol may, but does not need to, be a custom URL scheme. The protocol would include a launch parameter, the specific format and value of which may depend on the operating system of the local user device 104, to provide a “deep linking” function. That is, in addition to launching the application 108, the launch parameters will also direct the connection to the WebRTC session.

When the local application is launched using the custom launch protocol, the local application 108 will read the data provided in the session link, following the custom protocol, to identify the WebRTC session to which to connect. The application 108 will then open a connection to the WebRTC service requesting a persistent connection with signalling, and the ability to publish audio and video streams. The publication of the streams may preferably be performed using the VP8 video codec, but other video codecs may be used - such as one which is determined to be optimal by the application or another logic.

Remote operation of local user device functions from the remote application

In the present invention, a custom messaging protocol, built on top of the WebRTC signalling system, can be provided. For example, Figure 21 schematically depicts an example of a process 2100 whereby the remote user starts the AR mode in the local user’s application 108. At step 2102, the remote user interacts with the user interface in the remote application 118 to toggle the AR mode, and an AR toggle request is sent over the WebRTC signalling service to the local user’s application 108. At step 2104 the mobile application 108 receives a corresponding AR toggle signal. At step 2106 the application 108 attempts to toggle the AR mode. At step 2108 the mobile application 108 responds with an AR status, which would confirm the success or failure of the AR toggle initiated by the remote user. At step 2110, the remote user’s application 118 receives the AR status response, and the remote user interface is updated accordingly.
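A minimal sketch of this request/response exchange, assuming signalling helpers (sendSignal and a message handler) exposed by the WebRTC client library and illustrative message type names, might look as follows.

// Sketch of the AR toggle exchange of Figure 21. The sendSignal helper and
// the message type names ("arToggle", "arStatus") are assumptions.
type SignalMessage = { type: string; data?: string };

// Remote application, step 2102: request the AR toggle over signalling.
function requestArToggle(sendSignal: (m: SignalMessage) => void): void {
  sendSignal({ type: "arToggle" });
}

// Remote application, step 2110: update the interface when the AR status
// response confirming success or failure of the toggle arrives.
function handleSignal(message: SignalMessage, updateArIndicator: (on: boolean) => void): void {
  if (message.type === "arStatus") {
    const status = JSON.parse(message.data ?? "{}");
    updateArIndicator(status.enabled === true);
  }
}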

Figure 22 schematically depicts an example of a process 2200 whereby the remote user implements control functions in the AR mode which has been enabled on the local user’s device, to be fulfilled by the local user’s mobile application 108. The control function in this example is to add a point to the AR capture which is currently being shown. At step 2202, the remote user selects a point on his or her screen and a “press position” signal is sent over the WebRTC signalling protocol. At step 2204 the mobile application 108 receives a corresponding signal. At step 2206 the application 108 attempts to perform the press function at a location determined from the press position signal to mirror the press position. At step 2208 the mobile application 108 responds with a status or reply signal to confirm a success or failure of the press, over the WebRTC signalling protocol. At step 2210, the remote user’s application 118 receives a position press status response and the remote user interface is updated accordingly.
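The coordinate convention for the press position is not fixed by the above; one possibility, sketched here purely as an assumption, is to exchange coordinates normalised to the video frame so that the local application can map them back to its own screen resolution.

// Sketch of mirroring a remote press position (Figure 22), assuming the
// position is exchanged as coordinates normalised to the video frame (0..1).
function toNormalised(x: number, y: number, viewWidth: number, viewHeight: number) {
  // Remote application, step 2202: convert the selected screen position.
  return { nx: x / viewWidth, ny: y / viewHeight };
}

function toLocalPixels(nx: number, ny: number, screenWidth: number, screenHeight: number) {
  // Local application, step 2206: perform the press at the mirrored location.
  return { x: Math.round(nx * screenWidth), y: Math.round(ny * screenHeight) };
}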

Figure 23 schematically depicts an example of a process whereby the remote user controls a flashlight function on the local user device 104 from the remote application 118, by sending control requests over the WebRTC server 102, similar to the process shown in Figure 22. Similar to the aforementioned process, the exemplified process involves, at step 2302, the remote user toggling a “flashlight” control in the remote application, causing a “flashlight toggle” signal to be sent via WebRTC signalling. At step 2304 the mobile application 108 receives a corresponding signal. At step 2306 the application 108 attempts to toggle the flashlight. At step 2308 the mobile application 108 responds with a status or reply signal to confirm a success or failure of the toggle, over the WebRTC signalling protocol. At step 2310, the remote user’s application 118 receives a flashlight toggle status response and the remote user interface is updated accordingly.

Messaging protocol

The custom protocol used over the WebRTC session is a messaging protocol. In an embodiment, the custom messaging protocol enables communication built on top of the JSON data-interchange format. It enables requests for particular functions to be sent over the WebRTC signalling system, from one user to the other - e.g., a request for taking a photograph, sent from the remote application 118 to the local application. The protocol also defines the formats of the responses to the requests. Requests using the protocol may be initiated from either device connected to the WebRTC session.

The messaging protocol enables the sending of a key (or action name), and also a value containing a parameter or a set of parameters for the requested action (if required). For example, if the requested action is to mute or unmute a user, no parameters may be required for this action. The data format for the value may depend on the key specified. If no parameters are required, the value may not be sent, or it may be null or an empty string.

The key and the value may be sent separately, e.g., as separate strings. Alternatively, the key and the value may be sent together as one object. The value may be encoded in a data format that allows the key and corresponding value to be encoded as data objects, for instance, attribute-value pairs or array data. An example is the JavaScript Object Notation (JSON) format. This allows the remote user to control the WebRTC session.

For example, a “request” from A to B (A and B can be either user, such as a tradesperson or a customer) may have the following definition:

Definition: {type: string, data: string (JSON object)}

where the “data” contains the configuration which is required for the requested type, and the relevant user state information. Here the user state information may be information indicating the “state” of a certain function to the users, e.g., AR mode “ON” or “OFF”. Below is an example request meeting the above definition.

{type: "take Photo", data: "{width: 500, height: 500}"}

The above example is a request for the recipient application to take a picture of 500 pixels x 500 pixels. A “response” may include an action completion confirmation, or a success or failure confirmation, and may further include some or all of the information from the request. In one example, a “response” has the following definition:

{type: string, orig_message: JSON object }

Where the “type” will be a direct response to the requested type, and the "orig_message" property will contain the request content allowing state information to be shared. Below is an example response meeting the above definition.

{type: "photoUploadSuccess", orig_message: {type: "takePhoto", data: "{width: 500, height: 500}"}}

The above example is a response back to the requesting application that the requested photo has been successfully taken and uploaded, and the response also includes the original photo-taking request.
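As a sketch of the definitions above (assuming a send function that wraps the WebRTC signalling channel), the request and response objects could be constructed as follows.

// Sketch of the request/response message shapes defined above. The send
// function wrapping the signalling channel is assumed.
type ProtocolRequest = { type: string; data: string };                    // data is a JSON-encoded string
type ProtocolResponse = { type: string; orig_message: ProtocolRequest };  // echoes the original request

function makeTakePhotoRequest(width: number, height: number): ProtocolRequest {
  return { type: "takePhoto", data: JSON.stringify({ width, height }) };
}

function makeUploadSuccessResponse(request: ProtocolRequest): ProtocolResponse {
  // The response type directly answers the request type, and the original
  // request is echoed so that state information can be shared.
  return { type: "photoUploadSuccess", orig_message: request };
}

// Example usage: send(makeTakePhotoRequest(500, 500));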

The remote application 118 provides controls (e.g., via touch screen inputs, keyboards, pointer control via mouse or a stylus pen, etc) which the remote user can operate, to enable and disable features on the local application 108. Examples of such features include, but are not limited to, the flashlight feature, the AR tool, and the camera feature to take a photo (preferably in high definition, or at least 1200 x 800 pixels or 800 x 1200 pixels, depending on whether the photo is in a landscape or portrait orientation).

Preferably, the user interface within the remote application 118 includes a number of controls such as touch screen portions emulating buttons or toggles, each corresponding to a function feature which is performable within the local application 108 or by the local user device 104. When the controls are activated, the corresponding function or interaction is triggered in the local application 108 via the WebRTC signalling service. The states of the available controls (i.e., on/off, enabled/disabled, active/inactive) are also kept in sync between the remote application 118 and the local application 108. The synchronisation of the states is accomplished on the signalling service, via the messaging protocol. For example, the flashlight feature on the local user device 104 can be triggered by either user. If the local user triggers it, then the remote application 118 is also notified in order to update its icon and accurately show the state.

The local application 108 is configured to action the request from the remote application 118, and to respond with a success or failure message to communicate a success or failure of the action requested by the remote user from the remote application 118. Figure 7 depicts an example of a user interface 700 which is presented by the remote application 118. It includes a video stream display portion 702 which will show the video in the WebRTC session once the session commences. The interface 700 further includes a function control portion 704 on which controls, for example, touch screen buttons or touch screen emulating toggles or switches, are provided. The function control portion 704 is presented as a menu of control switches or buttons. In the depicted embodiment it includes a flashlight ON/OFF switch 706, a photo request button 708, a speaker/mute switch 710, and a recording function ON/OFF switch 712. This interface 700 also provides an augmented reality toggle 716 which allows the user to switch to the augmented reality mode, to access tools enabled in the augmented reality mode.

These controls enable the remote user to request the local application to operate the local user device (e.g., to take a photo or to use camera zoom), to request the WebRTC service to start, stop, or pause the session, or to switch on particular functions within the local application (e.g. to turn the augmented reality mode on or off).

Figure 5 depicts a process for controlling a feature or function on the local user device 104 from the remote application 118. The process is initialised by an activation or deactivation of the functions or features using the controls provided within the remote application user interface (step 502). This may cause corresponding notifications or requests to be sent to the application’s API (step 514). When the application’s API is involved (e.g., in the configuration shown in Figure 2) in updating the control “states” in the remote application 118, the API 114 later responds with the appropriate response or any requested information back to the remote application user interface (step 512 mentioned below). The appropriate response or requested information will differ depending on the feature or function being controlled. For example, the response may be a success or failure response to a request to activate or deactivate a function or a tool, or a photo upload URL in response to a photograph request.

Notifications to the local application 108 are sent through the signalling service via the WebRTC connection (step 504). Upon receiving a request (step 506), the local application 108 will process the received request (step 508). Either the local application 108 or the WebRTC service 102 will respond to the application API, with a success or failure response message to indicate whether the requested action has been successfully performed (step 510). The response may be sent via the WebRTC signalling service to the remote application 118 and thus to the application API 114 if required. The application API will update the current button or toggle state in the user interface of the web application 118 in accordance with the response message (step 512).

Figure 9 depicts an example of a user interface 900 which is shown during the video WebRTC session. The main display 902 shows the camera feed from the local user device 104, which is sent via WebRTC for the remote user to see. The user interface 900 also includes a control portion or panel 904, which presents on the interface 900 one or more controls which can be activated by either the remote user or the local user to activate different hardware functions or to control the WebRTC session. In this example, the controls include an “end call” button 906 to stop the WebRTC session, a “light” button 908 to control the flashlight, a “microphone” button 910 to mute or unmute the user (i.e., disable or enable the audio), and a “measure” button 912 to turn on the augmented reality (AR) mode in order to take measurements of objects in the observed scene. In the depicted example, the light button 908 and the microphone button 910 are deactivated, and thus the flashlight is off, and the microphone of the user manipulating the interface is muted.

Video stream recording

Figure 6 schematically depicts an example application of the process shown in Figure 5, in relation to the activation or deactivation of a recording feature. In this example, the system enables the remote user to record the WebRTC session. The recording function can be activated or deactivated from controls provided on the remote user’s GUI.

Multiple activations and deactivations of the recording may be allowed during one continuous WebRTC session. The recorded “sections” may be individually archived, or they may be stored as a single stream.

An activation or deactivation of the recording control sends a request 602 to the application’s server-based API 114 (on server 120), which communicates with the WebRTC service 122 (on server 102) to start or stop recording the specified WebRTC session in accordance with the request sent. The recordings 604 of the WebRTC session may be taken using the native format in which video data is received from the sending local user mobile device 104 (not shown). The native format differs based on which formats the operating system best supports. An example of the native format is the WebM format. In some embodiments, each of these recordings will be composed into a suitable format, such as MP4, for long term storage in a long term memory location 606, and high compatibility playback. A request response 608 indicating a success or failure of the recording and upload may be sent to the application API 114 over the WebRTC service or other mechanism such as a webhook from the WebRTC service to the application API directly. The application API 114 then accordingly updates the state of the “recording” in the remote user’s GUI to reflect the current state of the particular recording control (i.e., recording activated, or recording deactivated) and optionally in the GUI in the local application 108. Messages are also displayed to notify users of the change, which may use a message style referred to as a Toast message.
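As an illustrative sketch only, the application API's handling of a recording request (602) could resemble the following; the recording endpoint shown is a hypothetical stand-in for whatever interface the chosen WebRTC service exposes.

// Hypothetical sketch of starting or stopping a recording of a WebRTC
// session (Figure 6). The endpoint path is an assumption.
async function setRecording(sessionId: string, active: boolean, apiKey: string): Promise<boolean> {
  const action = active ? "start" : "stop";
  const response = await fetch(
    `https://webrtc.example.com/v1/sessions/${sessionId}/recording/${action}`,
    { method: "POST", headers: { Authorization: `Bearer ${apiKey}` } },
  );
  // The success or failure indication (608) is then used to update the
  // recording control state in the remote (and optionally local) GUI.
  return response.ok;
}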

Photograph function

In some embodiments, while the communication (WebRTC) session is active, the remote application 118 interface includes a control which the remote user can activate, to send a request to the local application 108 that a photograph be taken by the local user device 104.

Figure 8 schematically depicts the system operation 800 to take a photograph using the local user device 104 under a request from the remote user (via the remote application 118). Prior to requesting the local application 108 to operate the local user device 104 to take a photo, the remote application 118 may designate or assign a photo upload link for the upload of the requested photo.

The remote application 118 will have requested one or more photo upload links (e.g., URLs) from the application server 120 (802). They are available for sending to the local application 108 when a photo-taking request is sent from the web application GUI (118).

If no upload links have been requested, or if the existing upload links have all been used, the remote application 118 will request another batch of one or more upload links from the application server 120. The application server 120 can then respond with the requested upload URLs (804).

The remote application 118 will then send a signalling message over the WebRTC connection, requesting a photo to be taken (806). The request may specify the desired resolution, the desired dimensions, or both, for the requested photograph. There may be a pre-set resolution at which the photos will be requested, if the resolution is not specified in the request. The assigned upload link is sent along with or included in the request.
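For illustration, the "take photo" signalling message might carry the assigned upload link and the desired resolution as follows; the message type and field names are assumptions made for this sketch.

```typescript
// Illustrative message shape for the photo request; names and fields are assumptions.
interface TakePhotoRequest {
  type: "take-photo";
  requestId: string;
  uploadUrl: string;   // pre-assigned upload link obtained earlier from the application server
  width?: number;      // desired dimensions; a pre-set resolution is used if these are omitted
  height?: number;
}

function requestPhoto(channel: RTCDataChannel, uploadUrl: string, width?: number, height?: number): string {
  const message: TakePhotoRequest = {
    type: "take-photo",
    requestId: crypto.randomUUID(),
    uploadUrl,
    width,
    height,
  };
  channel.send(JSON.stringify(message)); // sent over the WebRTC signalling/data channel
  return message.requestId;              // used to match the eventual reply
}
```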

Preferably, after the request is issued, the remote application 118 will capture an image (step 808) from the video feed. The captured image can at best match the maximum resolution of the video feed currently being received, which will also depend on network conditions. The captured image may thus be of a “low quality” - i.e., not meeting the resolution or size requirement of the “high quality” image. The captured image also does not benefit from other features such as a focusing pass on the camera and lighting adjustments. Compared with this captured image, the high resolution image which is requested at the desired resolution benefits from additional quality improving features including, but not limited to, the previously mentioned focusing pass and lighting corrections.

The remote application 118 then holds the image captured from the video feed in a memory location until a response to the request for the desired photo is received from the mobile application (at step 820, described below), or retains it if the captured image itself meets the high quality image requirements. The memory location in which these images are held is a local memory location within the remote user device 106 (not shown).

The local memory is managed via the web browser itself, and the data may be moved from volatile memory (e.g. random access memory, “RAM”) into non-volatile memory (e.g. a solid state drive, “SSD”, or hard disk drive, “HDD”) temporarily, as required. If the image data is moved to non-volatile memory, it may be removed when the browser session is terminated. In order for this to occur, and to provide a visual indicator that the photo was requested and is uploading, the image data may be base64 encoded into a data uniform resource identifier (URI), with the format "data:[<mime type>][;charset=<charset>][;base64],<encoded data>" (e.g. "data:image/jpeg;base64,[image-content-here]"); this is a standard feature of web browsers. This data URI is then placed into an HTML IMG tag (image tag) as the "src" (source) data. These images stay stored while the user stays on the session page; when the user navigates away from the session page they will be lost if not transferred to the cloud storage service.
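A minimal sketch of this standard browser technique, converting a captured frame Blob into a base64 data URI and displaying it in an IMG element as a placeholder while the requested photo uploads; the element ID is a placeholder assumption.

```typescript
// Convert a captured video-frame Blob to a data URI and show it as a placeholder <img>.
async function showPlaceholderImage(frame: Blob): Promise<void> {
  const dataUri = await new Promise<string>((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string); // "data:image/jpeg;base64,..."
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(frame);
  });
  const img = document.createElement("img");
  img.src = dataUri; // standard browser feature: data URI used as the IMG "src"
  document.getElementById("photo-gallery")?.appendChild(img); // "photo-gallery" is illustrative
}
```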

After the photo request is received by the local application 108 (step 814), the local application 108 will process the request and take a photo as requested (step 816), as close to the specified or pre-set resolution or dimensions as possible. To do so, the local application 108 will attempt to control operation of the camera of the local user device 104 to take the requested image. The local application 108 will then provide the acquired image to the local user device’s communication module to upload the image to the upload URL provided with the photo request (818) from the remote user’s application. The local application 108 will issue a reply message to the remote application 118 (via the WebRTC server 102) if the photo acquisition and upload process is successful (step 820). In other embodiments, different reply messages (e.g., one value signalling success and another value signalling failure) may be sent depending on whether the photo acquisition and upload are successful or have failed.
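As a sketch only (the local application would in practice be a native mobile app using platform camera APIs), the upload to the provided link and the reply over the signalling channel could look like the following; the message shape, the takePhoto callback and the assumption that the pre-signed URL accepts an HTTP PUT are all illustrative.

```typescript
// Upload the acquired photo to the pre-assigned link; success drives the reply message.
async function uploadPhoto(photo: Blob, uploadUrl: string): Promise<boolean> {
  const response = await fetch(uploadUrl, {
    method: "PUT",                       // pre-signed URLs commonly accept PUT uploads (assumption)
    headers: { "Content-Type": "image/jpeg" },
    body: photo,
  });
  return response.ok;
}

// Handle an incoming "take photo" request: capture, upload, then reply over the channel.
async function handleTakePhotoRequest(
  channel: RTCDataChannel,
  request: { requestId: string; uploadUrl: string },
  takePhoto: () => Promise<Blob>         // camera control is platform-specific; assumed here
): Promise<void> {
  try {
    const photo = await takePhoto();
    const ok = await uploadPhoto(photo, request.uploadUrl);
    channel.send(JSON.stringify({ type: "photo-taken", requestId: request.requestId, success: ok }));
  } catch {
    channel.send(JSON.stringify({ type: "photo-taken", requestId: request.requestId, success: false }));
  }
}
```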

Upon receiving the message confirming that the requested operation was successful, the remote application 118 notifies the application’s server-based API of the successful response (step 824), and replaces the gallery/low resolution preview image originally captured with the actual image taken by the local application 108. The remote application 118 then destroys the captured image (step 822), particularly in embodiments where the captured image does not have a size or resolution that meets the definition of a “high quality” photo.

If the local application 108 does not successfully upload a requested photograph (e.g., in the absence of a success reply, or if a reply message indicating the failure is provided to the remote application 118), the remote application 118 will upload the captured image to the upload URL (step 826), i.e., to a remote memory location or cloud-based storage 812 accessible by the application server 120 or remote application 118, and automatically notify the application’s server-based API (on server 120) of the status of the uploaded image.

More on the generation and distribution of the photo upload links, which can enable the above described photo upload process, is provided below.

In the above, the photo upload link is preferably a link to a protected location where public upload or download of data is blocked. The link may include a set of security tokens that are used to authenticate the user - i.e. to verify the user has permission to upload a file to that location. The tokens may be valid for a predefined period of time. They may also be configured to ensure the user can only access the location using particular specified commands, such as upload or download.

The web application server 120 may be configured to obtain or generate publicly accessible URLs which include "presigned" authentication data with a short expiry period (typically less than 60 minutes). This presigned authentication data provides the context for the file storage system to verify that the file upload request is from the authorised user, and subsequently allows the local application to directly upload photos, or any other data deemed necessary, to the web application for review or access by the remote user. The photo upload URL may be a pre-signed authenticated URL.
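One possible way the web application server could generate such pre-signed URLs, assuming an S3-compatible object store and the AWS SDK for JavaScript; the bucket name, key layout, region and 15 minute expiry are illustrative assumptions, as the document does not mandate any particular storage provider.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
import { randomUUID } from "node:crypto";

const s3 = new S3Client({ region: "ap-southeast-2" }); // region is illustrative

export async function createPhotoUploadUrl(sessionId: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: "remote-assessment-uploads",                 // bucket name is illustrative
    Key: `sessions/${sessionId}/photos/${randomUUID()}.jpg`,
    ContentType: "image/jpeg",
  });
  // The signed URL embeds authentication data and expires after a short period
  // (here 15 minutes), so it can only be used to PUT one object to this key.
  return getSignedUrl(s3, command, { expiresIn: 15 * 60 });
}
```

Because the server chooses the bucket and key, the upload destination can be changed per user or per session without any change to the local application, which is the design benefit described below.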

A benefit of this approach is the ability to change the upload destination, which remains under the control of the web application server 120. This ability is useful because the upload destination may be different for each remote user. The upload destination may be defined depending on a number of factors, such as the user’s location, network conditions or any restrictions placed on the network, or any security requirements. This means that the logic and additional data required for the upload do not need to be duplicated in the local application.

Another benefit is the system’s ability to treat the local application 108 like a dumb terminal, as there are no endpoint specific authentication protocols to implement. The amount of code required for handling file transfers in the local application 108 can therefore be reduced.

Another benefit is the ability to fine tune security controls for uploading files from the local application 108, because the optional presigned authentication data only allows access to a predefined context/location for uploading. This is particularly useful for situations where the image data are sensitive or need to be protected for privacy purposes.

The security measures in the local application 108 further improve the protection of access to the remote user’s and the local user’s data. They also make it possible to enforce privacy controls on the data. Therefore, unlike in prior art applications, the local user is not required to log in to the mobile application. In less preferable embodiments requiring authentication from the mobile application 108 to the web application 118 or upload location, there would be a need to store system level access credentials (e.g., username and password), which could lead to inadvertent access of another user's data if there were a fault in the management of the authentication or authorisation controls.

As described above, when the remote user requests a photograph to be taken, the request is sent over the WebRTC service to the local application. In alternative embodiments, the remote application 118 will instead first determine whether the current video feed satisfies the parameters (e.g., image size, image resolution) which are required. If so, the high resolution image is obtained by taking a snapshot from the current video stream being transmitted or received over the WebRTC session (i.e., a screenshot), instead of the remote application 118 remotely requesting a new photo to be taken using the camera of the local user device 104. This is possible if the resolution and quality of the image being received meet the definition of “high quality”, where the high-quality definition may be user defined or system defined. In these embodiments, the snapshot photo may be captured from the remote user application 118 if the video stream meets the “high quality” requirements, in which case the photo request or command is not sent. As will be appreciated, the image frames of the video stream may satisfy the requirements, particularly if the video stream is a high definition video stream and the network conditions allow. If the image quality or the resolution of the image frames of the received video stream does not meet the “high quality” definition, the remote application 118 requests a “high quality” image to be taken from the local user device 104.
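A hedged sketch of this decision: grab a frame from the incoming video element via a canvas and compare its dimensions against a "high quality" threshold; the threshold values and the JPEG quality factor are assumptions.

```typescript
// Decide whether a frame grabbed from the incoming WebRTC video is already "high quality",
// or whether a photo request must be sent to the local device.
const MIN_WIDTH = 1920;   // illustrative thresholds; the real definition may be user or system defined
const MIN_HEIGHT = 1080;

function captureFrame(video: HTMLVideoElement): { blob: Promise<Blob | null>; width: number; height: number } {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;   // resolution of the received stream, which varies with network conditions
  canvas.height = video.videoHeight;
  canvas.getContext("2d")?.drawImage(video, 0, 0);
  const blob = new Promise<Blob | null>((resolve) => canvas.toBlob(resolve, "image/jpeg", 0.92));
  return { blob, width: canvas.width, height: canvas.height };
}

function meetsHighQuality(width: number, height: number): boolean {
  return width >= MIN_WIDTH && height >= MIN_HEIGHT; // if true, no photo request needs to be sent
}
```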

Figure 19 schematically depicts a successful acquisition and upload of a photograph meeting one or more “quality” metrics (e.g., size, definition). In one example, the end-to-end successful acquisition involves:

- From the remote application, the remote user sends a request to the application server for photo upload URL(s), and the application server replies with photo upload URL(s).

- The remote application takes a screenshot of the video stream from the WebRTC session and this screenshot is not of sufficient quality. The remote application then sends a “take photo” signal to the local application for a photo to be taken, over the WebRTC service.

- The local application receives the “take photo” signal and then causes the camera on the local user device to take a photo, and to upload the photo to the photo upload URL 1902. When this is completed, the local application responds to the remote application, over the WebRTC service, with a “photo taken” signal.

- The remote application receives the “photo taken” signal and then sends a notification to the application server that the photo acquisition was successful.

Figure 20 schematically depicts an upload of an image capture of the video stream to a storage location. This may be performed if the acquisition and upload shown in Figure 19 is not done or not successfully completed. In one example, the process involves:

- From the remote application, the remote user sends a request to the application server for photo upload URL(s), and the application server replies with photo upload URL(s).

- The remote application takes a screenshot of the video stream from the WebRTC session and this screenshot is not of sufficient quality. The remote application then sends a “take photo” signal to the local application for a photo to be taken, over the WebRTC service.

- The local application receives the “take photo” signal and then causes the camera on the local user device to attempt to take a photo and upload the photo to the photo upload URL 2002. This process fails, and the local application responds to the remote application, over the WebRTC service, with a “photo failed” signal.

- The remote application receives the “photo failed” signal and then attempts to upload the snapshot to the photo upload URL. If the upload fails, then a failure notification is sent to the server. Otherwise a success notification is sent to the server.

Photograph, Video and Note categorisation

The remote application 118 may provide the functionality to categorise the photos, videos and notes being captured, during or after the video streaming session. During a video feed session, or during a replay of the recorded video feed after the video session has been completed, the remote user is enabled to “tag” a frame in the feed. Multiple “tags” may be added. The name which is used to tag the frame defines a category to describe the segment of the video currently being shown. The user can enter a category name, or select from existing category names, including either or both of predefined and previously used category names. Any photos, videos and notes captured or shown during the segment will then be associated with the defined category. A segment is defined as the portion of the video between two tags, which could be tags for different or the same category names. A remote user may set a category by tagging the video feed at a chosen point with the appropriate category name, may replace the current category using the same mechanism, or may remove any set category by removing the tag.

Therefore, the tags for a video stream are each linked to image data acquired at particular time-points or contained in particular frames in the video stream. When reviewing the categorised photographs, videos, video segments or notes, the user may use the tags to jump or skip to the relevant point in time in the appropriate recorded video, providing quick and easy access to additional context or information of the image content of the video stream. Optionally, when the recorded video is being replayed, when a video segment is being shown, if a category has been set for that segment, the category, and any notes and photos associated with the segment, may be highlighted or displayed. In other words, according to the above, the playback of the video data will trigger the category name(s) and the associated data which are linked to the video data by the tag(s) to be displayed.
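For illustration, a minimal data model for such tags might associate a category name with a timestamp, from which the active segment and the jump-to-tag behaviour follow; the type and function names are assumptions made for this sketch.

```typescript
// Each tag marks a timestamp in the video and names a category; a segment runs from one tag to the next.
interface VideoTag {
  category: string;      // e.g. "bedroom 1"
  timeSeconds: number;   // position in the video where the tag was set
}

// The category active at a given playback time is the most recent tag at or before that time.
function segmentFor(tags: VideoTag[], timeSeconds: number): VideoTag | undefined {
  const sorted = [...tags].sort((a, b) => a.timeSeconds - b.timeSeconds);
  return sorted.filter((t) => t.timeSeconds <= timeSeconds).pop();
}

// Jump or skip playback to the tagged point.
function jumpToTag(video: HTMLVideoElement, tag: VideoTag): void {
  video.currentTime = tag.timeSeconds;
}
```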

Figure 15 shows an example interface 1500 shown on the remote application 118 where the categorisation function is enabled. The interface 1500 includes a video display portion 1502 to display the video stream. The interface 1500 also includes a progress bar or video timeline 1504 for the video, to indicate a progression in time of an incoming video or a recorded video. There is further a category portion or panel 1506 which has an entry field 1516 where the user may enter an alpha-numeric input to type a category name. The panel 1506 may also include a category list portion 1517 which displays predefined category names, previously entered category names, or both.

At any point during the video stream being shown on the display portion 1502, the user can tag that point in time, or the frame at that point in time, by inputting a category name in the entry field 1516 - this may be done by typing, or by selecting from any existing category names 1518, 1520, 1522, 1524 in the list portion 1517. Setting a category name also sets a tag, which is preferably visible on the progress bar 1504. The segment of the video which starts at the frame or point in time at which the tag is set belongs to the category defined by the category name linked or assigned to that tag.

The segment starting from one tag will end at the point that the next tag is set. In Figure 15, four tags 1508, 1510, 1512, 1514 are visible on the progress bar 1504, meaning the user has assigned four categories to segments of the video. Therefore, the segment 1534 which starts from tag 1508 and ends at tag 1510 will belong to the category associated with the category name used to set tag 1508. The interface 1500 may further include a frame display portion 1526, showing individual frames 1528, 1530, 1532 of the video where tags have been added, or frames otherwise picked by the user to be shown on the frame display portion 1526. In one example, when the local user films the rooms while he or she moves around a house while participating in the WebRTC session, the remote user can tag the point at which the video feed shows a bedroom as “bedroom 1”, and then, when the video feed shows that the local user has entered a second bedroom, the remote user can add a tag to mark the point at which the video feed shows the second bedroom as “bedroom 2”. The portion of the video between the two tags “bedroom 1” and “bedroom 2” will be in the category “bedroom 1”.

Photograph annotation

The remote application 118 may be configured to provide a photo annotation function to allow the remote user to mark up or annotate an image. In use, this may be done for a number of reasons, such as highlighting particular issues.

When working on a photo which was taken, in some embodiments, the program will use either the high definition (HD) photo taken, or the lower definition photo if the HD photo acquisition or uploading was unsuccessful. The photo will be shown in the annotation interface after a photo is taken.

The remote user may be provided with a number of photo annotation tools to perform functions including, but not limited to, the following:

1. Crop the image to a user selectable or predefined size

2. Rotate the image to a user-controlled orientation or a predefined orientation

3. Adjust the Brightness, Contrast and Colour settings of the image, to a user controlled or predefined configuration

4. Draw freely on the image using a selection of predefined brushes, for which the user can, among other things, control the size and set the colour

5. Add shapes to the image using a selection of predefined shapes, which the user can, among other things, resize, elongate, shorten, rotate and set the colour of

6. Blur out portions of the image, which may be due to privacy concerns/requests or other reasons

7. Stamp other images or complex shapes onto the image using a selection of predefined stamps, which the user can, among other things, resize, elongate, shorten, rotate and set the colour of

8. Add text, where the user can enter their own message and, among other things, control the size, elongate, shorten, rotate and set the colour of the text

9. Undo/Redo previous annotations

Figure 16-1 shows an example photo annotation or photo editor interface 1600. The interface 1600 includes a header portion 1602 which displays an edit label 1604, which in this example shows “photo editor”; an image display portion 1606 in which the image being annotated is shown; and an editing menu 1608 from which different types of annotation or editing functions or tools can be accessed, either to directly activate the tools, or to access further interfaces specific to those functions or tools. In this example, the menu 1608 includes interactive portions 1610, 1612, 1614, 1616, 1618, 1620, 1622, respectively for accessing the crop function, the rotate function, the brighten function, the drawing tools, the blur tool, the stamp tool(s) and the text editing tools.

Figure 16-2 shows an example text annotation interface 1700, for adding text labels to a photograph, which is reached by selecting the interactive portion labelled “text” 1622 on the photo annotation interface 1600 (see Figure 16-1). The header portion 1602 now displays the function label 1704 “text”. The editing menu 1608 is now replaced by a “text” tools menu 1708 from which various text annotation functions or tools are accessible. Two text labels 1710, 1712 can be seen to have been added to the image. The text boxes also each include a rotation tool 1714, 1716, which allows the text label as displayed to be rotated with respect to the rest of the display. The text label “replace clips” 1710 is shown next to the broken clips on the blinds shown in the image. The broken clips have also been annotated by graphical markings, i.e., the free-hand circles 1718, 1720, which the user had already added to the image. Similar interfaces can be provided for, e.g., graphical annotation, such as to draw the circles 1718, 1720, lines, shapes, etc.

Notes System

The remote application 118 may be configured to provide a notes function for the remote user to make notes on a photo taken, after or during the session. The notes on a particular photograph may be pre-populated with data collected using tools/sensors provided by the applications as controlled by the user. The data may include, but are not limited to, measurements, angles, area calculations, orientation, ordinal/cardinal directions, latitude, longitude and height/elevation. An example of an interface showing a photograph and notes populated with distance measurements, angles and an area calculation is shown in Figure 4.

When working on a photo taken, in some embodiments, the program will use either the high definition photo taken, or the lower definition photo if the HD photo acquisition or uploading was unsuccessful. The photo 402 may be shown in the notes interface 400, after a photo is taken. The notes interface 400 in this example includes an editable interface 404, which has an editable field 406 for entering a name of the picture, and another editable field 408 for entering notes. The notes interface 400 may also provide an information display portion or window 410 to show already entered or acquired information regarding the photo or its content. The acquired information may be obtained using augmented reality (AR) and other functions provided by the system.

Augmented Reality

The system includes an AR mode. When the AR mode is activated, the system provides augmented reality measurement tools for obtaining information (such as dimensional information) pertaining to an object near the local user or an environment local to the local user.

Enabling the AR mode requires the local user device 104 to include AR software in the operating system’s software development kit. The local user device 104 will also need to have suitable hardware so as to enable tracking of the local user device 104 when it is being moved, to maintain the interaction with the AR model which is constructed in the AR mode.

When the AR mode is activated, the camera on the local user device 104 begins to input the video feed into an AR session. This AR session builds a 3-dimensional space data model of the environment as captured by the camera of the local user device 104. The type of freedom of movement provided by the AR system is commonly referred to as “six degrees of freedom”, which means that the camera viewpoint can be moved around in an area, instead of being restricted to a fixed point of view (POV). The AR session provides the video stream, which, along with audio (i.e., voice) data captured by the local user device, is provided over the WebRTC service, and in turn to the remote application 118.

The camera captures the video input via a continuous progressive scanning process. A purpose of the progressive scanning is to determine the location of the camera or view in an area in relation to a starting point, which is assigned a coordinate of (x, y, z = 0, 0, 0). The starting point is typically the camera viewpoint or the camera location at the start of the progressive scanning process. Another purpose of the progressive scanning is detecting surfaces in relation to that fixed point of view. When the camera is moved around, the tracking (scanning) updates the location or viewpoint of the camera in relation to the fixed point of view. During the initialisation of an AR session (i.e., before the 3D model has sufficient data), or in situations where the AR session has lost the ability to reliably track the local user’s environment and needs to reinitialise, the application may make use of the visual instructional system (where provided by the operating system on the local user device 104) to assist the local user to scan the environment until the AR session has gathered enough data to rebuild the 3D data model, and start or continue operation.

When the 3D model within the AR mode has sufficient data, the user can place a target on the screen while pointing the screen at a particular object, to place that target onto a line, or surface of the object as represented in the 3D model. This is done using point clouds.

The system requirements for the local user device 104 may mandate an operating system having the AR software development kit (SDK) so that the AR features may be provided from the local user device 104. The AR SDK provides a framework (such as ARKit for Apple’s iOS, or ARCore for Android systems) in which the AR module(s) provided by the present system are built. The AR module(s) refer to the software, including executable code such as procedures, functions, algorithms, subroutines, etc., for providing the AR mode. The framework is used to map the surroundings seen from the camera on the local user device 104. The system in accordance with the present invention utilises the framework to build this point cloud, and to track objects to maintain the point cloud and extend the point cloud data. By referencing the point cloud data, the AR module can provide the positional information (e.g., coordinate data) of any point with enough distinction to be a trackable point of interest on the screen in the 3D space. It may be a point on an object or in the environment, such as, but not limited to, a corner or an edge, which can be identified and marked by the AR session.

While in the AR session the users have the ability to select one or more “points of interest” in the 3D space. The selection is done by the local user pointing the camera at a targeted object point, to locate it within a reticle provided by the application. The user may be required to confirm his or her selection of the target by tapping the reticle or using a select button provided by the AR session.

The AR module is configured, when the user takes the action to select a targeted point, to perform “hit tests” within the AR session, to identify a point in the point cloud which matches the targeted point which the user is trying to select. That is, the module tries to detect and identify locations in the point cloud that match the targeted point on the screen. If a suitable match is available, the coordinates of that matching point are added to a data container, and the matching point is the selected point of interest in the 3D space. The data container preferably stores data in an agnostic format to improve the ease of serialising and deserialising for communication, saving, and loading (e.g. an undo stack). This may be implemented as a class holding 3 data points, such as a Vector3, or a multi-dimensional array including a Tuple <double, double, double>. The successfully selected point is displayed (i.e., drawn) on a 2D “transparent” overlay layer, e.g., as a circular object (i.e., a dot). The 2D overlay layer will be described later in this disclosure.
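A hedged sketch of what such a point record and data container could look like, following the description above of a class holding three coordinates; the names, the linkage field and the JSON serialisation are assumptions rather than the actual implementation.

```typescript
// Illustrative shapes only; the real implementation may differ.
type Vector3 = [x: number, y: number, z: number];

interface PointOfInterest {
  id: string;
  position: Vector3;   // coordinates of the matched point in the AR session's 3D space
  linkedTo: string[];  // ids of previously placed points this point is joined to
}

class DataContainer {
  private points: PointOfInterest[] = [];

  add(point: PointOfInterest): void {
    this.points.push(point);
  }

  all(): PointOfInterest[] {
    return [...this.points];
  }

  // Plain-data storage keeps serialising and deserialising trivial (communication, saving, undo).
  serialize(): string {
    return JSON.stringify(this.points);
  }

  static deserialize(json: string): DataContainer {
    const container = new DataContainer();
    container.points = JSON.parse(json) as PointOfInterest[];
    return container;
  }
}
```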

In trying to find the “match”, the hit testing provided by the SDK typically provides multiple points which match the selected target in the area specified, although it may also return only a single point. These multiple points are usually still at the exact same point of the ray cast, but are the results from different scanning mechanisms used by the AR. The multiple results may be the match obtained by the ray cast hitting a surface (i.e., the surface detection), or the match obtained from the feature point average, or other scanning mechanisms.

The position(s) of the points are returned to the AR module in the local application 108. The AR module processes these points to select the best match, using a number of factors including, but not limited to, surface/plane detection, density of points, etc. The selected best match is then the selected “point of interest” in the 3D space. In embodiments with particular practical application in measuring object or area dimensions, the result obtained from surface (plane) detection may be prioritised, to select a single match out of the multiple matches. This is because in such applications plane detection may be considered to provide a better anchor point.

The AR module is configured so that when the user places (i.e., selects) a new point which is identified using hit tests, a record associated with the new point is created, to store information on or references to one or more previous points which were placed, if they are connected to this point. The algorithm for determining whether one or more previous points are connected to the new point is called a “join calculation”. In this sense, the record “connects” the points. The record, i.e., the point data, is stored in a Data Container (“dataContainer”).

After a join calculation (to determine whether the “new point” is connected to any previous points) is done, the “new point” becomes the “last point”. Thus, the “new point” variable is a temporary variable, holding information regarding any current “new point” as a temporary value in memory, to facilitate creating the line relationships in a "dot to dot" manner. Alternatively, data for each “new point” can be stored in a stack. However, the dataContainer approach still allows a full record of all of the placed points to be retained (in the dataContainer), and the line relationship can be created by looking up the most recent point in the dataContainer. Having the “most recent point” as a separate value also allows that value, being a temporary value, to be cleared in memory without affecting the integrity of the dataContainer. The next point which is chosen would then not be linked to the most recent point. This is useful, for instance, when the user wishes to select a target point that is part of a new object, surface, etc., rather than the one on which the previously selected points lie.
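Continuing the sketch above, the join calculation and the temporary "last point" value could be handled as follows; again the names and structure are illustrative only.

```typescript
// Temporary value holding the most recently placed point; clearing it starts a new group.
let lastPointId: string | null = null;

function placePoint(container: DataContainer, position: Vector3): PointOfInterest {
  const newPoint: PointOfInterest = {
    id: crypto.randomUUID(),
    position,
    linkedTo: lastPointId ? [lastPointId] : [], // join calculation: link to the most recent point, if any
  };
  container.add(newPoint);
  lastPointId = newPoint.id;                    // the "new point" now becomes the "last point"
  return newPoint;
}

function startNewGroup(): void {
  // Selecting "new" clears the temporary value so the next point is not linked to the
  // previous point; the data container itself is unaffected.
  lastPointId = null;
}
```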

To enable the user to “place” a point of interest which best matches the target onto the AR model, the AR module includes a two-dimensional (2D) layer data object which conceptually provides a 2D transparent overlay, superimposed over the scene generated from the AR session. This provides the AR mode GUI the ability to display the position of selected points or lines on a separate data layer, to increase readability for the human user. If the selected points were marked as 3D objects in the 3D model, the display of these points would then be affected by the distance and angle of the camera to the physical object or environment being modelled. This causes the display to lose readability as the camera distance or angle from the physical locations represented by the 3D objects changes. On a 2D layer, the changing depth does not affect the display clarity. Furthermore, on a 2D layer, more recent elements or elements considered to be more important may be brought in front of others. For instance, this allows the program to display a measurement of a distance between two linked points, over the line connecting the two points, so as to facilitate human readability, e.g. as the measurements may be considered as more critical information.

The transparent 2D layer may be implemented as a type of "view" object that renders data objects called “nodes”. In some embodiments, sprite nodes are heavily used. The AR module is configured to build nodes for the objects which are drawn. From the combined AR model and the camera feed, as a new camera frame is displayed, a new 2D frame is drawn on the screen.

The 2D positions of the data objects are updated, every time a new frame is drawn on screen, so that the selected points (nodes) can continue to be displayed on the screen in their correct positions within the AR session. When the camera is moved, the 2D positions are updated to match the movement in the 3D scene powered by the rendering engine (e.g. SceneKit). The rendering engine provides constant updates to the AR session with new frame data. The AR module of the system provides an algorithm which converts the 3D positions of the objects in the Data Container in the AR module to 2D points displayable on the screen. This is done by comparing the 3D points with the 2D screen dimensions, and tracking the camera viewport in the combined AR SDK + Camera view.
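As an illustration of the conversion from 3D positions to 2D screen coordinates, the sketch below applies a combined view-projection matrix supplied for the current camera frame; obtaining that matrix (and the per-frame callback) from the AR SDK, and the column-major layout, are assumptions of this sketch.

```typescript
type Mat4 = number[]; // 16 values, column-major (assumed layout)

// Project a 3D point from the data container to pixel coordinates for the 2D overlay.
function projectToScreen(p: Vector3, viewProjection: Mat4, screenW: number, screenH: number): { x: number; y: number } | null {
  const [x, y, z] = p;
  // Transform into clip space.
  const cx = viewProjection[0] * x + viewProjection[4] * y + viewProjection[8] * z + viewProjection[12];
  const cy = viewProjection[1] * x + viewProjection[5] * y + viewProjection[9] * z + viewProjection[13];
  const cw = viewProjection[3] * x + viewProjection[7] * y + viewProjection[11] * z + viewProjection[15];
  if (cw <= 0) return null;                    // point is behind the camera; do not draw it this frame
  // Normalised device coordinates (-1..1) mapped to pixel coordinates.
  return {
    x: (cx / cw * 0.5 + 0.5) * screenW,
    y: (1 - (cy / cw * 0.5 + 0.5)) * screenH,  // screen Y grows downwards
  };
}
```

Running this for every stored point on each new camera frame keeps the drawn nodes aligned with the 3D scene as the camera moves.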

As previously mentioned, a reticle is used to facilitate the user to add a target or point of interest. The “reticle” is added to the 2D layer which is superimposed over the frame. The reticle is typically added to a centre location of the screen. The display algorithms or programs may be configured so that, on the display, the reticle will spin, or change in colour, style, or both, to indicate the AR tracking state. For example, when the reticle has a “good” tracking state, the reticle may be shown as a green solid circle and appear static (not spinning). It may be shown to become red, indicating a degradation in the tracking state. The displayed circle may be shown to have slots (e.g. circle now in dashed or dotted line), or to spin, or both, or otherwise become more visually noticeable, drawing the user’s attention to the degrading or bad tracking state, as required.

To add a “point” (e.g., a node object) on the 2D layer object, the user (typically the local user) will first move or orient the device camera so that the reticle is shown over a particular target on the screen (e.g. over one of the corners of a window being measured using the AR tools). The AR view of the camera capture is provided as a video feed to the remote user over the WebRTC service, and is displayed on the screen of the remote user’s device, in the remote application GUI. Either the remote user or local user may then confirm the selection of the targeted point by tapping in the reticle or a control button to “select” the target point in the reticle, when it is shown to be positioned over the desired location on the screen.

The point can be added if it is placed over a selectable location in the AR model - i.e., if a hit test for the reticle finds a point of interest in the AR model (e.g., representing a corner, an end point, a visible connection point, etc). The point can then be added by tapping the screen at the reticle. The AR module (i.e. program) may be constantly performing hit testing to find a match for the current location of the reticle, and the selection action sets the match and adds the matched point to the data container.

If the camera is manipulated so that the reticle generally follows a surface or line in the AR model, a new point may be added at a selectable location. This is powered by data from the AR session environment tracking to provide simple feedback to the users to allow them to know when they can or cannot place a point. For instance, the reticle may be shown with a visual indication as to whether there is a match for the centre, or a feedback (haptic, audio, visual, etc) is provided if the user taps the screen while there is no match available, or both. As the points are being drawn on the 2D overlay, they are provided as the aforementioned “nodes”. Thus, the AR module is configured to calculate the two dimensional positions of the selected points (i.e. “nodes”) in the 3D AR model, so that it can be displayed in the 2D scene.

If two or more points are selected (one at a time), the AR module measures (i.e., calculates or determines) the distance of the virtual line formed between each pair of points, if the points are linked. If further points are taken, the AR module may measure the angle between two lines. If applicable, the AR module may also measure the area which is bound by the virtual shape formed with the points as its vertices. The positional information and metadata comprising the measurements are provided to the users (e.g., to provide the information to be displayed in the UI window 410).
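A hedged sketch of the underlying geometry for these measurements, reusing the Vector3 type from the earlier sketch: pairwise distance, the angle at a shared vertex, and the area of a closed, approximately planar loop of points via the vector (Newell) method.

```typescript
// Distance between two linked points, in the units of the AR session (metres).
function distance(a: Vector3, b: Vector3): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Angle (in degrees) between the two lines meeting at a shared vertex.
function angleAt(vertex: Vector3, p1: Vector3, p2: Vector3): number {
  const u = [p1[0] - vertex[0], p1[1] - vertex[1], p1[2] - vertex[2]];
  const v = [p2[0] - vertex[0], p2[1] - vertex[1], p2[2] - vertex[2]];
  const dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
  const cos = dot / (Math.hypot(...u) * Math.hypot(...v));
  return (Math.acos(Math.min(1, Math.max(-1, cos))) * 180) / Math.PI;
}

// Area of a closed polygon in 3D via the vector (Newell) method; exact for planar loops.
function polygonArea(points: Vector3[]): number {
  let nx = 0, ny = 0, nz = 0;
  for (let i = 0; i < points.length; i++) {
    const [x1, y1, z1] = points[i];
    const [x2, y2, z2] = points[(i + 1) % points.length];
    nx += (y1 - y2) * (z1 + z2);
    ny += (z1 - z2) * (x1 + x2);
    nz += (x1 - x2) * (y1 + y2);
  }
  return Math.hypot(nx, ny, nz) / 2;
}
```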

The addition of each point (“node”) or each virtual line between two linked nodes involves the AR module superimposing another 2D object over the transparent layer. Preferably, the node objects or line objects (linking between two node objects) are drawn in a multi-layered manner, where a 2D object on the top most layer will appear to be on top of or in front of the other 2D objects. Z-indexing (i.e., depth-indexing) may be used to control which objects appear on top of each other, by assigning one of a plurality of Z-index levels to each 2D object. If two or more 2D objects have the same Z-index level, then they will be drawn on the same visual layer. Utilizing Z-indexing will allow components (representing respective 2D objects) with a top Z-index level to have priority for display over other components (i.e., displayed as being in front or on top). Subsequent objects of the same Z-indexing level as the uppermost object are displayed at the same visual display layer.

In preferred embodiments, the 2D layer object is redrawn on each rendered frame from the camera. For example, if the camera has a frame rate of 60 frames per second, the 2D layer object is redrawn at the same rate. The program loops through the already saved data points in the data container, instructing the rendering engine (e.g., SceneKit) to redraw each object until all points in the data container have been drawn. The already existing data points or data linkages will be displayed as a point, line, text, pill box, etc., each type having an assigned Z-index to control its position in relation to the other data points or linkages. This ensures the correct items are visible, such as a text layer being displayed over a pill box, and a point being displayed over a line. The rendering engine manages the drawing of the objects, and handles any additional Z-indexing internally. It will draw each appropriate 2D object as defined by the Z-index for the object being drawn. When an object being drawn has the same Z-index as a previously drawn object, causing the objects to overlap, the newer object may be drawn over the top of the previously drawn objects at the same Z-index, ensuring the latest object is on top. The process that is followed here involves clearing the screen as the camera frame is refreshed, checking that the objects are still in the container and checking for any new objects that may have been added. Next, the program queries the AR SDK, for each object in the container, to determine whether that object is currently included in the camera view, and if it is, obtains the object’s new 2D positions in the camera view. The objects included in the current camera view are sorted for drawing, and then they are drawn at their updated positions, and at any re-determined scale as required.

In some embodiments, to provide the above mentioned 2D layer object, the AR module makes use of scene rendering engines in 3D graphics framework/library, such as “SceneKit” in iOS or ViroCore in Android. A memory structure known as a dictionary may be used to link nodes to objects stored in a data container. As mentioned, the “points” are examples of the nodes. The points can also be considered the vertices as they may form vertices of shapes. A point or a vertex object contains an identifier, and a vector which is used to store 3D space coordinates (x, y, z). The set of point objects are stored in a data container.

AR scanning configuration

The AR module is configured to scan or track the position of the local user device 104 in relation to the 3D model.

In an embodiment, the AR module uses the AR SDK configuration which tracks the position and orientation of the local user device 104 and uses the information to augment the scene of the AR model displayed in the AR session. Position and orientation tracking provides 6 degrees of freedom: (x, y, z) and (roll, pitch, yaw). Typically the input from the rear camera (located behind the screen) is used.

In embodiments intended for practical applications where measurements of lines, planes, and areas are useful, plane detection, rather than image or object detection, may be used in the above tracking (object detection would be better suited to a controlled environment, or where specific items/components are available to be used as tracking references). The AR module also uses the raycasting functions when placing points, as mentioned earlier, in order to find the position of a real-world feature in the 3D model which corresponds to the 2D point selected and “drawn” on the 2D layer data object.

In some embodiments, the AR module does not include any lighting simulation, or will disable the lighting simulation function provided in the AR SDK, particularly in cases where the main interest is in the tracking capabilities rather than 3D rendering. In some embodiments, the AR module does not perform any 3D rendering at all. However, the AR module is configured to place invisible 3D objects in the 3D space, at the location in the 3D model where a new point is placed. If at a later time, the selected point is matched to that 3D location, the presence or detection of the invisible 3D object will indicate that the selection has hit an already chosen point, i.e., an existing point or node. The detection of the same node or point being chosen allows the AR module to “close the loop”.

In some embodiments, the AR module may not support the loading of an existing scene or the scene reconstruction feature, particularly in cases where measurements of only a few objects (e.g., 1 or 2 objects) are normally taken at a time. In those cases, scene reconstruction or loading has little value. The removal of those features assists in reducing unnecessary CPU, memory and battery consumption. However, in other embodiments, the scene reconstruction features may be included, if the practical application calls for the creation of an augmented 3D space with many objects within, or where the AR module is configured to allow a user to save the current state of the AR session and return later (similar to a pause / save feature in games).

Figure 10 depicts the user interface 1000 which is activated after the user selects the “measure” button 912 in the interface 900 (see Figure 9), which activates the AR mode. The AR scene display portion 1002 of the interface 1000 now shows a scene of the 3D AR model representing the environment shown in Figure 9. The AR scene display is provided as the video feed from the local application to the remote application 118 over WebRTC. A reticle 1004 is located at a central portion of the AR feed. A selection button 1006 is provided, in this example located near the bottom of the screen, to allow the user to attempt to select the point in the reticle 1004. One or more AR mode controls are provided along the bottom of the screen, including an “undo” button 1008 to allow the user to undo the last change he or she has made (e.g., the last point drawn in the AR mode), a “clear” button 1010 to clear all of the points that have been selected, a “new” button 1012 to turn off the connection between the next point to be selected and the previously selected point - i.e., to start a new line, and a “back” button 1014 to exit the current interface 1000. It will be appreciated that more or fewer function buttons may be included in various embodiments. The function of the select button may be replaced, e.g., by the action of double tapping or prolonged tapping on or near the reticle.

Figure 11-1 shows an updated AR mode interface 1100, which now has points and lines drawn over the window object shown in Figure 10, with measurements displayed. The reticle 1118 is shown in the centre of the screen. Four “points”, in the order selected, 1102, 1104, 1106, and 1108, each corresponding to a corner of the window pane in the 3D model, are shown as circles. The distance measurements 1110, 1112, 1114, between each pair of consecutively drawn points, are calculated by the AR module using the AR SDK in the local user device’s operating system. As the user again selects the point 1102, the AR module is able to detect that this point has previously been selected. The AR module draws the line 1116 between the point 1108 and the point 1102, and provides a distance measurement 1126 between the points 1108, 1102. The AR module, as it detects that the point 1102 has been selected previously, determines that the four points 1102, 1104, 1106, and 1108 form a closed shape. A measurement 1120 of the area enclosed by this shape is then made.

Optionally, as shown in this embodiment, a line 1122 is drawn between the reticle 1118 and the last drawn point 1102, to help the users see, rather than imagine point to point, where they have last selected a point. This line 1122 may cause similar functions or displays to be provided as the lines between two placed points, such as providing a distance measurement along the line 1122 from the last placed point to the current reticle position, or an angle measurement between multiple points linked to the reticle. This line 1122, if provided, is a temporary line, and will be replaced by a new temporary line when a new point belonging to the same measurement group is placed on the 3D model (and a corresponding 2D node drawn on the transparent 2D layer). Further optionally, the temporary line 1122 may disappear or clear, if the “new” button 1124 is selected to alert or notify the AR module that the next point drawn will belong to a new points group.

Figure 11-2 shows another example AR mode interface 1150 where an AR image of a door is shown. Points 1152, 1154, 1156, 1158 have been added at each of the four corners of the door. The data for these points in the data container will show the linkage data for the point group. The points were added in sequence around the border of the door, so that the joining line between each pair of consecutively added points is shown along one corresponding edge of the door. The distance measurements 1160, 1162, 1164, 1166 are displayed on top of the joining lines to show the respective measured length along the edge. In this example, angle measurements 1168, 1170, 1172, 1174 are also displayed between each pair of joining lines joined respectively at points 1152, 1154, 1156, 1158.

Figure 12 depicts a further example interface 1200 from an AR session. The user interface 1200 in the session shows an image of a roof and eave as presented in the AR environment. The reticle 1202 is shown in a centre portion of the screen. A previously drawn point 1204 is displayed in the AR environment; that point was previously in the centre of the screen, until the camera was manipulated to move the reticle along the eave line. A second point (not shown) along the eave line is chosen and drawn, at which point the virtual line 1206 between the points is also drawn, and the AR module displays the distance measurement 1208 between the two points. The user interface 1200 also includes an “undo” button 1210 which allows the user to undo the previously drawn point. A “select” button 1212 may be provided, so that the user can select the point in the reticle 1202.

Level tool

The application may optionally provide a virtual “spirit level” tool. An example is shown in Figure 13. The interface 1300 provides two virtual spirit levels, a horizontal virtual spirit level 1302 and a vertical virtual spirit level 1304, each shown in their respective horizontal and vertical orientations, and located adjacent to the reticle 1306.

The local user can be prompted by the remote user, in the WebRTC session, to either place their mobile device 104 on an object, or line it up with an object, to obtain the angle of a plane which they need to determine. This can also be done to check if a horizontal surface is indeed horizontal, or if an angled surface is at the expected angle (e.g., at 90 degrees).

Video Annotation

The remote application 118 may be configured to provide a video annotation function to allow the remote user to mark up and annotate a video, for a number of reasons such as, but not limited to, highlighting issues shown by the video, or providing text to sections of the video for noting a categorisation or providing context, or to provide an introduction.

If a video session has been recorded, the video annotation may be available to annotate the recorded video. In some examples, the video will be shown in the annotation interface after the video has finished initial processing and has been stored in the storage system.

The remote user may be provided with a number of video annotation tools to perform functions including, but not limited to, the following:

1. Crop or letterbox the video to a user-selectable or predefined size, which may be changeable for different sections of the video

2. Rotate the video to a user-controlled orientation or a predefined orientation

3. Adjust one or more of the Brightness, Contrast, or Colour settings of the video, which can be further adjusted at different points throughout the video, to a user-controlled or predefined setting

4. Free-draw (draw by freehand) on the video using a selection of predefined brushes, where the user can control one or more parameters of the brush - such as the brush size, the brush colour, or the style of brush stroke - and where the appearance setting may persist for one or more frames

5. Add an arbitrary shape, or add a predefined shape on the video, where the user can control and set one or more aspects of the appearance of the added shape, such as size, length (to elongate, or shorten), orientation (i.e. to rotate), the colour, or another aspect, where the appearance setting may persist for one or more frames

6. Blur out one or more portions of the video, which may be due to privacy concerns/requests or other reasons, each portion including one or more frames

7. Stamp images or complex shapes onto the video using a selection of predefined stamps, where the user may be able to control one or more aspects of the appearance of the images or shapes, such as size, length or width (to elongate or shorten), orientation (to rotate), colour, or another aspect, and the appearance setting may persist for one or more frames

8. Add text by inputting a message, where the user may be able to control one or more aspects of the entered text, such as size, length or width (to elongate or shorten), orientation (to rotate), colour, or another aspect, and the appearance setting may persist for one or more frames. The message or text may be the same as, or included in addition to, the labels for the categories/notes/measurements, which may also be displayed from the text editing interface 1700 (see Figure 16-2).

9. Undo/Redo/Reset previous annotations

Exportable Session Report

Data associated with a session may include data captured before, during or after the session. By way of non-limiting examples, the data may include one or more of: customer (i.e., local user) information, site/job information, business (i.e., remote user) information, photos taken, measurements taken, notes and categories entered or selected. The data can be arranged to be provided in an exportable or printer-friendly report. For example, the report may be in a portable document format (PDF), comma separated values (CSV) format, or JSON format. The information contained in the report may be selectable by the user, allowing the user to fine-tune the report to their needs.

The report setting (to select the data included, and/or another selection for the organisation or appearance) may be applied to an individual report or be reusable for applying to one or more future reports. The setting may be managed via a settings system, or may be set by selecting from one or more predefined templates. The report may be shared within an internal organisation, or sent to an external system.

Image Extraction (from a Recorded Video)

In some situations, the remote user may not have requested or taken all the required photos during the live session. Thus, optionally, the program may be configured to provide an image extraction function, where the user can take a snap shot of a particular video frame, or request that the snap shot be taken, during a play back of a recorded video. When the function is activated or the request is made, a snapshot of the current video frame may be converted into an image and made available to other features of the application (e.g., annotation).

Point distance tool

Optionally, the AR module may provide a tool which measures the distance between two points - replacing the real life analogue of using a tape measure, or the analogue or digital version where a line is drawn between the points and measured. This is useful for measuring depths. Figure 14 shows an example user interface displayed when the point distance tool is activated. When this tool is activated, the user can select the point at which the “depth” is to be taken - conceptually similar to pointing a laser at a point to send a light pulse and measuring how long it takes for the light to return. This point is selected using the reticle 1402, and by then calling the functions in the AR SDK. The interface 1400 may display the measurement next to the selected point 1404, which is shown in red in this example. In some embodiments, optionally, the reticle is shown in different colours depending on the state of the AR session. For instance, the reticle will start off in one colour, e.g., red (the operating system is not ready, or the environment is not ready to have a reticle, etc.). After the commencement of the AR session (i.e., initialisation has finalised because enough data has been gathered for the 3D model), the reticle will turn to a different colour, such as green, to show that a “good” (i.e., ready) state is achieved and that the reticle is ready for use to provide the measurement functions. If a “good” state is not achieved, or if it is achieved but then lost, the reticle will return to the colour (such as red) indicating it is not ready. The reticle may be shown in a “not ready” state due to one or more of the following: AR session not started, AR session still initialising, AR session broken, AR session partially missing information (loses the information needed to track something), AR session unable to detect any points of interest, or AR session unable to detect planes. The reticle can only be green (the colour indicating a ready state) when none of the “not ready” state items are present, i.e., when there are points of interest for hit testing and a minimum of one flat plane has been detected.

In some embodiments, if the reticle is hovering over an area with an existing point, the reticle will snap back to that point. It may further provide a haptic feedback to indicate the snap-back. The snap-back, if confirmed to be correct, would indicate a shape being enclosed. Therefore, the AR module may provide an area calculation done after the user confirms the placement of the point back to an existing point. Every time a point is added, the AR module tries to compute any enclosed area, if possible. In some implementations, this is done by checking the linkage between existing points, to find a path from a most recently added point, back to the earliest linked point without crossing any line connecting any two points. If this path forms a loop, the area surrounded by this path is calculated. This process is repeated for each point in the data container until the first loop is detected.
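As an illustration of the loop check described above, reusing the PointOfInterest shape sketched earlier: when the reticle snaps back to an existing point, the linkage chain can be walked from the most recently added point; reaching the snapped-to point closes the shape. The traversal below assumes the simple "dot to dot" linkage of the earlier sketch and is not the actual algorithm.

```typescript
// Walk the linkage chain from the latest point; if the snapped-to point is reached,
// the group forms a closed shape whose area can then be calculated (e.g. via polygonArea).
function findLoop(points: PointOfInterest[], latestId: string, snappedToId: string): Vector3[] | null {
  const byId = new Map(points.map((p) => [p.id, p]));
  const loop: Vector3[] = [];
  const visited = new Set<string>();
  let currentId: string | undefined = latestId;
  while (currentId && !visited.has(currentId)) {
    visited.add(currentId);
    const current = byId.get(currentId);
    if (!current) break;
    loop.push(current.position);
    if (currentId === snappedToId) return loop;   // loop closed
    currentId = current.linkedTo[0];              // follow the chain back towards earlier points
  }
  return null;                                    // no closed shape found
}
```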

For instance, the AR module may be configured to, when the user has added more than one point to the data container, check for a match between a new point and other points previously selected (added), before performing a hit test for the new point. When the local user is adjusting the position of their camera/device, the AR module is actively scanning for previously selected points. If the camera viewpoint comes close to an existing point in the 3D space, or if the AR module detects a collision between the viewpoint and an existing point in the 3D space, the reticle will snap to the existing point, along with haptic/taptic feedback, making it easier for the users to connect multiple points together. In the above embodiments, the components in the 3D model generated on the local user’s device 104 (e.g., point(s) of interest, detected planes) may be projected onto the screen on the local user’s device 104, for the users to see. The framebuffer output from this layer is also passed to a WebRTC session for encoding and sending to other participants in the WebRTC session. This may be done by constantly taking screenshots from the AR session, and then encoding the screenshot data into a streamable format. This encoding could be done within the local application 108. Alternatively, the screenshot may be sent to the WebRTC server 102 and then encoded by a media encoder in the WebRTC server 102. An additional 2D functionality layer is added afterwards, as part of the user interface on the remote application or local application, providing controls for the users to interact with the AR session.

As the nature of current AR technology imposes limitations on the absolute accuracy of point placement in the 3D space, the user interface receives measurements from the AR session in metres. In order to provide context-aware information to the users, the application adjusts the measurement units and truncates the measurement values being displayed, so that the measurements are not excessively long and do not suggest an unrealistic accuracy. These measurements may be truncated to ensure that the measurement displayed is constrained to fit in the display being provided. This helps to maintain the readability of the displayed measurement value, without the need for complex code. All measurements may be displayed in the imperial or metric system of units, or a system more appropriate for the user/context, which may be user selectable. Typically, a user will see measurements in centimetres (cm) and metres (m) when using the metric system of units, or inches (in) and feet (ft) when using the imperial system of units.
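
A minimal sketch of such context-aware unit selection and truncation is shown below; the thresholds and displayed precision are illustrative assumptions rather than required values.

```typescript
type UnitSystem = 'metric' | 'imperial';

// Given a raw measurement in metres from the AR session, choose a context-appropriate
// unit and truncate the value so it stays readable and does not imply unrealistic accuracy.
function formatMeasurement(metres: number, system: UnitSystem): string {
  if (system === 'metric') {
    // Centimetres below one metre, metres otherwise.
    return metres < 1
      ? `${(metres * 100).toFixed(1)} cm`
      : `${metres.toFixed(2)} m`;
  }
  // Imperial: inches below one foot, feet otherwise.
  const inches = metres * 39.3701;
  return inches < 12
    ? `${inches.toFixed(1)} in`
    : `${(inches / 12).toFixed(2)} ft`;
}
```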

As multiple points are placed in the same measurement group, lines connecting each pair of adjacent points are generated, showing the distance between them. There is a linkage between the sequentially added points in a point group, as well as between the last point and the first point in the point group. The linkage may be formed by each point holding linkage data referring to one or more of the preceding points (in the order added) in the group. The linkage between the points is added to the data container. A user may choose to start a new measurement group, which will cause the next point not to be linked to the previous point, allowing a new object to be drawn.

As depicted in some of the embodiments described above, the users may have the ability to manipulate the data container to push items onto, or pop items from, the data container, allowing for functionality such as undo or redo, or to reset the data container by removing all selected points.
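
The following TypeScript sketch illustrates one possible form of such a data container, combining the point linkage and grouping described above with push/pop based undo, redo and reset; the class and field names are hypothetical.

```typescript
// A measurement point carrying its group id and a link to the previous point in the group.
interface MeasurementPoint { id: number; groupId: number; linkedFromId: number | null; }

class MeasurementContainer {
  private points: MeasurementPoint[] = [];
  private redoStack: MeasurementPoint[] = [];
  private nextId = 1;
  private currentGroup = 1;

  // Adding a point links it to the most recently added point in the current group.
  addPoint(): MeasurementPoint {
    const previous = [...this.points].reverse().find(p => p.groupId === this.currentGroup) ?? null;
    const point: MeasurementPoint = {
      id: this.nextId++,
      groupId: this.currentGroup,
      linkedFromId: previous ? previous.id : null,
    };
    this.points.push(point);
    this.redoStack = []; // a new point invalidates the redo history
    return point;
  }

  // Starting a new group means the next point is not linked to the previous one.
  startNewGroup(): void { this.currentGroup++; }

  undo(): void {
    const popped = this.points.pop();
    if (popped) this.redoStack.push(popped);
  }

  redo(): void {
    const restored = this.redoStack.pop();
    if (restored) this.points.push(restored);
  }

  reset(): void { this.points = []; this.redoStack = []; }
}
```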

The data container may be serialised and sent to the remote user through the signalling system within the WebRTC connection when changes to the data container occur. This data is deserialised by the remote application 118 and then displayed as required. The data streams for transmission and storage may be adjacent streams.

In order to provide meaningful data to the remote user, and to reduce the network overhead of transferring the data container context from the local application 108 back to the remote user’s web application, a reasonably short delay and other controls may be in place. For instance, the data may not be sent when there are 0 or 1 points in the data container context, unless the user action was a reset/undo, which notifies the remote user’s web application to remove any current data.

Also, as the points are being placed and added to the data container context, a time delay may be applied before the data is sent to the remote user(s). Different time delays may be used.

If a photo is taken, the data may be sent immediately and timestamped at the remote user’s end, allowing for a unique reference point in the video (if the session is being recorded).
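
The sending rules described above might be sketched as follows; the snapshot format, the 500 ms delay and the names (ContainerSender, onPhotoTaken) are assumptions made for illustration only.

```typescript
// Hypothetical serialised form of the data container sent over the WebRTC signalling channel.
interface ContainerSnapshot { pointCount: number; payload: string; }

type SendFn = (payload: string) => void;

// Debounced sender: snapshots with fewer than two points are suppressed unless the
// change was a reset/undo (so the remote side can clear its display); a photo being
// taken flushes immediately so the snapshot lines up with the photo's timestamp.
class ContainerSender {
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(private send: SendFn, private delayMs = 500) {}

  onContainerChanged(snapshot: ContainerSnapshot, wasResetOrUndo: boolean): void {
    if (snapshot.pointCount <= 1 && !wasResetOrUndo) {
      return; // too little data to be meaningful to the remote user
    }
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.send(snapshot.payload), this.delayMs);
  }

  onPhotoTaken(snapshot: ContainerSnapshot): void {
    if (this.timer) clearTimeout(this.timer);
    this.send(snapshot.payload); // sent immediately to provide a unique reference point
  }
}
```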

In some embodiments, the remote user may interact with the AR session remotely from a supported web browser.

In some embodiments, the data objects may be provided as video stream canvas objects, particularly in embodiments where the web-based application allows for the marking up / annotation of images - whether on the live video feed, or on static images after the end of the video feed. In one implementation, an HTML Canvas object is layered, with a transparent background, over the image/video feed being displayed.

When the annotation tool(s) are used on a live video feed, the application may be configured to offer the option to take a photo of the annotated frame, freeze the frame, or continue with the live video feed. In embodiments where the HTML Canvas is used, this option will only be active or available while the mark-up/annotation process is occurring.

The HTML Video object, in which the video stream from the local user is actively being displayed, still exists when the HTML Canvas is used. In some embodiments, the AR module may be configured to, while the AR remote control is in operation and the remote user interacts with the HTML Video object (such as by clicking on it), track the position of the “click”, e.g., by measuring the number of pixels from the top (or bottom) edge to the click position and the number of pixels from the left (or right) edge to the click position. This interaction position is captured as a pair of fractions, being the number of pixels from the left (or right) edge as a fraction of the total width in pixels, and the number of pixels from the top (or bottom) edge as a fraction of the total height in pixels (e.g. 500 / 1000). The fractions define the position at which the remote user has interacted with the canvas object (e.g., clicked or touched), enabling the interaction to be reflected at a corresponding location on the AR session video feed on the local user’s device, thereby allowing remote interaction with the AR session running on the local user’s device. This data is sent via the signalling service to the local application to trigger the appropriate AR function (e.g., place a point or connect to an existing point). When the AR remote control is not in operation, clicks on the video object may still be tracked and sent to the local application 108.
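
A minimal sketch of capturing the interaction position as fractions of the video element’s dimensions, using standard browser APIs, is shown below; the function name is hypothetical.

```typescript
// Convert a click/touch on the HTML video element into resolution-independent fractions
// of the element's width and height, so the local device can map the interaction onto
// its own AR view regardless of differing screen sizes.
function interactionFractions(event: MouseEvent, video: HTMLVideoElement): { fx: number; fy: number } {
  const rect = video.getBoundingClientRect();
  const fx = (event.clientX - rect.left) / rect.width;   // e.g. 500 / 1000 = 0.5
  const fy = (event.clientY - rect.top) / rect.height;
  return { fx, fy };
}
```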

The positional data of the interaction is sent to the local application 108 via the signalling system within the WebRTC connection, where the mobile application ingests this data and acts upon the request by interacting with the highest-level layer projected to the screen. This triggers a bubbling process down to lower layers until a valid interaction has been made or the lowest layer is reached. When the layer interacted with is the AR session, the application performs a hit test in the 3D space; the result may be an existing point to connect to, or a location in the 3D room environment, in which case a new point will be added. This is processed by looking for the first successful match among existing placed points, detected flat surfaces on which a point can be placed, or a group of feature points between which a point can be placed. The mobile application 108 responds via the signalling system within the WebRTC connection with a success or failure for the requested interaction.
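
One way the layered bubbling and the hit-test priority described above might be expressed is sketched below; the layer interface, the HitResult type and the ordering logic are illustrative assumptions rather than a definitive implementation.

```typescript
// Each on-screen layer gets a chance to handle the interaction; if it declines,
// the interaction "bubbles" down to the next layer until one handles it or none remain.
interface InteractionLayer {
  handle(fx: number, fy: number): boolean; // true if the layer consumed the interaction
}

function dispatchInteraction(layers: InteractionLayer[], fx: number, fy: number): boolean {
  for (const layer of layers) {            // ordered from highest layer to lowest
    if (layer.handle(fx, fy)) return true;
  }
  return false; // no layer handled it; report failure over the signalling channel
}

// For the AR layer, the hit test looks for, in order: an existing placed point,
// a detected flat surface, then a group of feature points to place a point between.
type HitResult =
  | { kind: 'existingPoint'; id: number }
  | { kind: 'plane' }
  | { kind: 'featurePoints' }
  | null;

function resolveArHit(existingPointHit: number | null, planeHit: boolean, featurePointHit: boolean): HitResult {
  if (existingPointHit !== null) return { kind: 'existingPoint', id: existingPointHit };
  if (planeHit) return { kind: 'plane' };
  if (featurePointHit) return { kind: 'featurePoints' };
  return null;
}
```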

In the embodiments described above, the remote application 118 is web-based. In other embodiments, the remote application may instead be loaded as a program configured for execution on the remote user device. However, these embodiments will impose a higher storage requirement on the remote user device.

In the above embodiments, references have been made to the remote users being tradespeople, using the invention to communicate with the local user, to assess a job / site / location, and to obtain information and data to perform the job. However, the invention may potentially be used in a variety of applications, such as but not limited to:

• Property Quotations - new work, repairs, maintenance, emergency assistance

• Motor Vehicle (cars, boats, trucks etc) quotations or repairs

• Insurance assessing or follow-up confirmation

• Property Inspections - initial and follow-up

• Emergency assessments - anywhere anytime, fallen trees, blocked roads, etc

• Health Professional assessments.

In the above embodiments, reticles are shown in different user interfaces to allow the user to select points of interest. However, this is not an essential requirement. For instance, the user may select a location as a target by tapping, or pointing with a pointer device (e.g., mouse, stylus pen), directly on a desired location without a reticle being shown over it, and a matched point in the 3D model (if available) will be added. The AR module may be configured to trace the user’s touch or follow the pointer and constantly perform hit testing in the area(s) around the traced touch or pointer. The AR module may be configured to perform hit testing at various locations along defined features of interest (such as planes, edges, corners) in the 3D model.

In the above embodiments, the server-based video communication can instead be replaced with peer-to-peer video communication, such as peer-to-peer WebRTC. Peer-to-peer WebRTC does not provide the full suite of functions, such as recording, provided by the server-based WebRTC. Alternatively, the system may provide both the server-based WebRTC service and peer-to-peer WebRTC, where peer-to-peer WebRTC is used when the full functionality of the server-based WebRTC service is not required.

As previously mentioned, functions available on the local user’s application or device, such as AR mode controls, in-app camera control, etc., can be controlled remotely by the remote user. They can also be controlled locally by the local user. This is useful in scenarios where only voice communication is available. The local user can operate the local user application under verbal instruction from the remote user, to obtain the data required. The data, which may include screen captures from an AR mode, recordings, or photographs, can be uploaded by the local user to a storage location (preferably a cloud-based storage), for review by the remote user. Therefore, in the broadest embodiment, the real time communication can be either voice (i.e., audio only) or video (i.e., audio-visual) communication. Variations and modifications may be made to the parts previously described without departing from the spirit or ambit of the disclosure.

In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word “comprise” or variations such as “comprises” or “comprising” is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.