


Title:
VIRTUAL PRESENCE DEVICE WHICH USES TRAINED HUMANS TO REPRESENT THEIR HOSTS USING MAN MACHINE INTERFACE
Document Type and Number:
WIPO Patent Application WO/2022/130414
Kind Code:
A1
Abstract:
The present invention relates to a virtual presence device which uses a virtual host to represent the real host using Man Machine Interface technologies, providing a virtual presence experience in real time. One part of the device is worn by the host in the form of a headgear which comprises of a number of hardware components loaded with the requisite software, through which he guides the attendee for actions like walking, running, turning, bowing, seating, standing, speaking, translating, broadcasting, presenting, training, repairing, diagnosing etc. The attendee receives the guidance from the host through the other part of the device, which is worn by him in the form of a headgear comprising of various hardware components loaded with the requisite software. Both the host and the attendee also wear gyroscopic sensors on their body parts to perform the requisite movements.

Inventors:
PATEL LOKESH (IN)
PATEL REENA (IN)
Application Number:
PCT/IN2021/051186
Publication Date:
June 23, 2022
Filing Date:
December 17, 2021
Assignee:
PATEL LOKESH (IN)
PATEL REENA (IN)
International Classes:
G06F3/01; G02B27/01; G06T13/40; G06T19/00
Foreign References:
US10469546B2 (2019-11-05)
US20180033203A1 (2018-02-01)
Attorney, Agent or Firm:
SHAH, Kavita (IN)
Claims:
We claim,

1. A virtual presence device which uses trained humans to represent their hosts using man machine interface which facilitates conducting meetings, conferences, inspections, audits, servicing, operating, repairing, exploring etc. remotely by a host (user) in any part of the world in real time with the help of a trained person who acts as a virtual host; wherein the device comprises of two main parts, of which one part is worn as a headgear by the host who wishes to display his virtual presence at a remote location while the other part is worn as a headgear by the virtual host who represents the host at the remote location; wherein the facial expressions, audio and video of the host are captured through the headgear and the body movements are captured by gyroscopic sensors, which are transmitted through the internet to the device of the virtual host which displays the 3 dimensional face of the host on the OLED display of the headgear of the virtual host, along with the audio and video of the host, while the body movements are transmitted to the gyroscopic sensors and devices of the virtual host to perform similar movements; wherein both the host and virtual host devices comprise of 3D sensing camera, transparent and flexible OLED display, gesture control devices, microphone array, speakers array, gyroscopic sensors, GPS sensors, motion sensors, proximity sensors, infrared sensors, force sensors, temperature and humidity sensors, communication module (4G/5G/Wi-Fi), battery pack, and active or passive cooling system; and wherein both the devices are connected to cloud servers for data storage and internet connectivity using various available technologies for communication.

2. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the device involves the use of a 3D sensing camera which contains depth sensing hardware along with an image sensor to record real time depth information of the object being recorded and also records the motion data of the user along with facial expressions and other visual cues of the user, thereby recording all live activities performed by the host, which are sent through the internet to the virtual host and are replicated there.

3. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the transparent and flexible OLED display has high pixel density and resolution with a wide viewing angle which enables the host device to display the virtual host device data for continuous monitoring by the host; wherein the curved display on the helmet in the virtual host device also uses organic light emitting diode (OLED) technology and mimics the average curvature of the human face, thereby displaying the host's face on the face of the helmet worn by the virtual host, wherein the imaging technology used reproduces high-quality, real-looking video output of the face of the host; and wherein the display provided inside the helmet of the virtual host may be any among a heads up display, holographic projection on the virtual host's iris or a virtual reality type vertical display with lens setup which conveys the instructions and other details to the virtual host.

4. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the gesture control device of the virtual host acquires the gestures of the real host and processes them, and the gestures are interpreted and replicated by the virtual host.

5. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the microphone array in the host device records the sound input from the host and transmits it to the virtual host device through the internet; wherein the microphone array in the virtual host device consists of multiple microphones placed at different angles throughout the virtual host device which provide audio directional feedback to the system and enable it to locate the source of the recorded voice; wherein the microphone array reduces ambient noise, performs acoustic source localization, increases focus on the voice of the conveyor and helps in binaural recording; and wherein the microphone array is used to transmit the sound received by the virtual host device to the host device through the internet.

6. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the speakers in the host device convert the electrical audio signals received from the virtual host device into corresponding sound.

7. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the gyroscopic sensors transmit the angular velocity data from the host to the virtual host device and vice versa, which gives processed data to the gesture control devices to replicate accurate movements for the virtual host.

8. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the GPS sensors in the real host device and the virtual host device capture the geographical usage of both the devices.

9. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the motion sensors detect and transmit the movement data, the proximity sensors detect and transmit the absence and presence data, the infrared sensors detect and transmit the heat data, the force sensors detect and transmit the movement intensity data and the temperature and humidity sensors detect and transmit the temperature and humidity data from the real host to the virtual host and vice versa.

10. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the communication modules used are 4G/5G and high speed Wi-Fi communication modules for strong data communication.

11. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein one battery pack is installed in the host device so that it can run independently from the docking station while another battery pack with snap in functionality is provided for mobile use and enables the user to swap the batteries without interrupting the device functionality.

12. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the virtual host device has at least 3 cameras attached on or above the curved screen for recording the proceedings wherein the video obtained from multiple cameras produce wide viewing angle image that mimics the viewing angle of the human eye and transmits the clear data to the host device.

13. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the device has a passive cooling system along with an active cooling system to carry away the heat generated from multiple displays and processors thereby protecting the life and efficiency of the device.

14. The virtual presence device which uses trained humans to represent their hosts using man machine interface as claimed in claim 1 wherein the host device and virtual host device hardware are accompanied by their respective software to represent the host at a remote location such that the host can virtually remain present at the remote location and conduct or attend the proceedings remotely.

Description:
TITLE OF THE INVENTION:

Virtual presence device which uses trained humans to represent their hosts using Man Machine Interface.

FIELD OF THE INVENTION:

The present invention relates to a virtual presence device which uses trained human beings (virtual hosts) to represent their hosts (real hosts) using Man Machine Interface (MMI) technologies, due to which virtual presence is experienced in real time.

BACKGROUND OF THE INVENTION:

Virtual presence is the ability of a user to feel that they are actually present in a virtual location using technologies like virtual reality (VR) or augmented reality (AR). The users may also be given the ability to affect the remote location. The user's positions, movements, actions, voice etc. may be sent, transmitted and duplicated at the remote location. Video conferencing is the simplest and most commonly used application to display virtual presence. Videoconferences are generally done on computers, smart televisions, smart phones etc., in which the users at both ends can see each other while communicating. Apart from this, robots can be used for virtual presence, enabling presence and communication of an individual at the desired location. Using the technology of virtual presence, a user can be present at a live real-world location which is remote from his own physical location, without actually travelling to that location. This reduces travel expenses, carbon footprint and environmental impact, along with saving time and improving productivity. A number of technologies are available in the prior art to provide virtual presence facilities to users.

US 10834365 describes an audio-visual monitoring system using a virtual assistant wherein a function of a user-controlled virtual assistant (UCVA) device, such as a smart speaker, can be augmented using video or image information about an environment. In an example, a system for augmenting an UCVA device includes an image sensor configured to monitor an environment, a processor circuit configured to receive image information from the image sensor and use artificial intelligence to discern a presence of one or more known individuals in the environment from one or more other features in the environment. The system can include an interface coupled to the processor circuit and configured to provide identification information to the UCVA device about the one or more known human beings in the environment. The UCVA device can be configured by the identification information to update an operating mode of the UCVA device.

US 10827150 discusses system and methods for facilitating virtual presence which includes a display having a structural matrix configured to arrange a plurality of spaced pixel elements. A plurality of spaced pixel elements collectively form an active visual area wherein an image is displayable. At least one image capture device is disposed within the active visual area for capturing an image. The system is able to sense the environment in front of the display and, in response to what is sensed, is able to change one or more attributes of a displayed image, or, is able to change the displayed image or a portion of the displayed image.

US 10722800 discloses co-presence handling in virtual reality wherein a method for controlling a co-presence virtual environment for a first user and a second user includes: determining a first avatar's restricted space in the co-presence virtual environment, the first avatar corresponding to the first user of the co-presence virtual environment; receiving user position data from a first computing device associated with the first user and determining the first avatar's location within the co-presence virtual environment; when the first avatar's location is within the first avatar's restricted space, communicating first co-presence virtual environment modification data to the first computing device; and communicating second co-presence virtual environment modification data to a second computing device associated with the second user.

US Patent Application 201816203287 relates to cognitive enhancement of communication with tactile stimulation wherein the methods include, for instance: determining a relationship between participants in an electronic communication. An emotion implicating a tactile stimulation is identified and a sender and a receiver of the tactile stimulation are specified. A contact point to which the tactile stimulation is applied on the body of the receiver is determined based on the relationship, according to a mapping between the relationship and the contact point as stored in a tactile stimulation knowledgebase. The tactile stimulation is delivered by use of a virtual presence user device on the side of the receiver.

GB 2567731 discusses low power virtual reality presence monitoring and notification wherein the method notifies a virtual reality (VR) headset user to the presence of a human or animal comprising: using thermal sensors (211, Fig. 3) to detect activity near the user 403; determining whether the thermal activity is moving 405; analysing the thermal activity to determine whether it is characteristic of an animate being 407; and generating a visual alert (505, Fig. 5) via the headset 409. The visual alert to the user may be one of: an outline image; a thermal image; a photographic image; or a VR character. The pose and location of the source of the thermal activity may be continuously monitored and the visual alert updated accordingly. The VR headset may also generate an audio alert. The sensors may be thermopile sensors. The system ensures that the VR headset user is alerted to, but not startled by, the presence of a real world animate being.

WO 2017007179 discloses a method for expressing social presence of virtual avatar by using facial temperature change according to heartbeats, and system employing the same which comprises the steps of: detecting ECG data of an actual user in real time; detecting, from the ECG data, a facial temperature change according to heartbeats; and changing the face of the user's avatar in response to the facial temperature change.

AU 2018427118 provides multi-location virtual collaboration, monitoring and control in which a virtual presence may be established to provide engagement with equipment and operators at a hydrocarbon recovery, exploration, operation, or services environments in order to reduce the expense associated with having knowledge experts, or other personnel, travel to and work at remote hydrocarbon recovery, exploration, operation, or services environments. For example, a wearable device may allow a user, such as a subject matter expert, at a virtual real time operation center to view data from equipment located at the site.

JP 2020095517 gives image processing system, image processing method, imaging device and program which allows a communication band to be allocated to an image dynamically and appropriately, and to improve image quality and enhance position detection accuracy in an MR system. An MR system includes an HMD that photographs a real space and acquires a captured image and a PC (image processing device) that generates a composite image of the captured image and an image in a virtual space. The HMD includes: an index detection unit that detects presence or absence of an index to obtain the composite image in the acquired captured image; and a transmission control unit that selects the captured image to be transmitted to the PC or sets a compression ratio of the captured image based on a result of the index detection unit. The PC includes: an index detection unit that detects the index from the received captured image transmitted from the imaging device; and an image combining unit that combines the captured image with the image of the virtual space based on a result of the index detection unit. Thereby, a communication band can be allocated to the image dynamically and appropriately.

CN 111325124 comprises of a real-time man-machine interaction system in a virtual scene wherein the system comprises a visual attention area prediction module used for mutation detection and a behavior prediction module based on visual attention area characteristics. The visual attention area prediction module receives an input video frame sequence, carries out target information detection, smooth motion detection and mutation information detection in sequence to obtain a visual saliency map, and carries out visual attention area extraction on the visual saliency map to obtain an attention area map; the behavior prediction module is used for predicting user behaviors by utilizing the characteristics after carrying out characteristic extraction on the user visual area and the video content. According to the invention, the feedback behavior of the user after observing the video is predicted by inputting the video content observed by the user, so that the method can operate better and can cope with sudden changes in an otherwise smoothly changing scene.

Although a number of systems like videoconferencing are available in the prior art for providing some kind of virtual presence to a host located at another location, such systems do not have a provision for physical presence of the user at the desired location, which is essential for conducting various meetings, inspections, presentations etc. Thus, a virtual presence device in which trained persons acting as virtual hosts wear one part of the device to represent their remotely located hosts, who wear the other part of the device, is the need of the day.

OBJECT OF THE INVENTION:

The main object of the invention is to provide a virtual presence device which uses trained humans to represent their hosts using Man Machine Interface which facilitates remotely conducting meetings, conferences, inspections etc. by a user (host) in any part of the world in real time, with the help of a trained person using the virtual presence device.

Another object of the invention is to provide a virtual presence device which uses trained humans to represent their hosts using Man Machine Interface in which the host has a device through which he can control the device of the trained person (virtual host) representing him at a remote location.

Still another object of the invention is to provide a virtual presence device which uses trained humans to represent their hosts using Man Machine Interface in which the virtual host device will cast/stream the host data to multiple streams of his device and convey the physical movements and responses received from the host device.

Yet another object of the invention is to provide a virtual presence device which uses trained humans to represent their hosts using Man Machine Interface in which the virtual host in the remote location discusses, presents, moves, examines, repairs, trains and/or behaves according to the host due to the continuous communication between the host device and the virtual host device.

A further object of the invention is to provide a virtual presence device which uses trained humans to represent their hosts using Man Machine Interface which saves both time and money of the user (host).

SUMMARY OF THE INVENTION:

The present invention provides a virtual presence device which uses trained humans to represent their hosts using Man Machine Interface, in which one person (virtual host) represents another person (real host) at a different location, using Man Machine Interface (MMI). The host, through his device, guides the virtual host, on his device, for actions like walking, running, turning, bowing, seating, standing, speaking, translating, broadcasting, presenting, repairing, inspecting, training etc., thereby creating the host's virtual presence in any part of the world in real time. The devices used for creating the virtual presence can be divided into a host device and a virtual host device. The host device comprises of 3D camera, monitor, pointing device, mic array, speakers and position sensor, while the virtual host device comprises of curved display on the exterior face of the helmet, display on the interior face of the helmet, inside speaker, inside mic, mic array, speakers, 180+ degree camera, position sensor, IR sensor, battery pack, in-use charging, dual band Wi-Fi and active or passive cooling system. Both the devices of the host and the virtual host are worn on the head, due to which their hands are free for any kind of work or gestures.

BRIEF DESCRIPTION OF THE DRAWINGS:

Fig. 1 displays the two parts of the device, one worn by the real host and the other worn by the virtual host.

Fig. 2 gives the list of the hardware of both the host device and the virtual host device.

Fig. 3 gives the image of the 3D sensing camera.

Fig. 4 gives the image of the transparent and flexible OLED display.

Fig. 5 gives the image of the hand-gesture device.

Fig. 6 gives the flowchart of the gesture recognition framework.

Fig. 7 gives an image of the gyroscopic sensor.

DETAILED DESCRIPTION:

The nature of the invention and the manner in which it is performed is clearly described in the specification. The invention has various components and they are clearly described in the detailed description of the complete specification.

In this technologically advanced world, virtual meetings, presentations, discussions, teaching etc. have become quite important as they save the travelling time of people as well as the money spent on travel. These virtual meetings, presentations, discussions, teaching etc. take place over the internet using integrated audio and video, chat tools and application sharing. However, such virtual presence lacks the personal touch, as everyone can view and hear others only through the screen. The movements, gestures, inspections, analysis etc., which can be projected during physical presence, are difficult to portray through the screen.

The present invention describes a virtual presence device which uses trained humans to represent their hosts using Man Machine Interface, which is used to provide a real time virtual presence experience. This device helps a human host to create his/her virtual presence in any part of the world in real time with the help of a trained person, who acts as a virtual host. People surrounding the virtual host can interact with the actual host in real time and the host too can interact with them through the virtual host. With the help of this device, the host can attend meetings in real time, remotely carry out audits of any place, offer service support remotely to industries, address an audience remotely, operate equipment/machinery remotely, carry out repairs remotely, explore remote locations/places and perform many such activities with the aid of the virtual host.

This device comprises of two parts, as shown in Fig. 1, in which one part of the device is worn as a headgear (wearable hat/helmet type of device) by the host, who wishes to display his virtual presence at a remote location, wherein the facial expressions, voice and video are captured through the headgear while the body movements are captured by gyroscopic sensors and devices worn on the body by the user. This part of the device has authoritative control of the meeting and will be able to control the hardware of the virtual host, as and when required.

The other part of the device is worn as a headgear (wearable helmet type of device) by the virtual host, as shown in Fig. 1, who represents the host at the remote location, wherein the face of the host is displayed on the front screen of the headgear along with his voice and video. The virtual host also wears gyroscopic sensors and devices on his body to capture the gestures of the host and perform the actions instructed by the host. This device also has external speakers as well as internal speakers and microphones for private as well as public communication. This device casts/streams the host data to multiple streams of devices and also conveys the physical movements and responses received. The virtual host will also be able to act as a translator and interpreter, in the language required.

Both parts of the device are connected to cloud servers for data storage and internet connectivity using various available devices and technologies like Wi-Fi, 2G, 3G, 4G, 5G internet connectivity for communication. A number of hardware components are provided in both the headgears, which work with the related software to transfer the activities performed by the host to the virtual host, such that the virtual host will perform these activities at the remote location as a proxy of the host. The host guides the virtual host for actions like walking, running, turning, bowing, seating, standing, speaking, talking, broadcasting, presenting, inspecting, repairing, healing, training etc., which are replicated by the virtual host present at the remote location.

In the present invention, the host device comprises of hardware like 3D sensing camera, transparent and flexible OLED display, gesture control devices, microphone array, speakers array, gyroscopic sensors, GPS sensors, motion sensors, proximity sensors, infrared sensors, force sensors, temperature/humidity sensors, communication module (4G, 5G, Wi-Fi), battery pack, and active or passive cooling system.

The virtual host device comprises of hardware like 3D sensing 180+ degree camera, transparent and flexible OLED display, inside display device, gesture control devices, microphone array (including inside microphone and outside microphone), speakers array (including inside speakers and outside speakers), gyroscopic sensors, GPS sensors, motion sensors, proximity sensors, infrared sensors, force sensors, temperature/humidity sensors, communication module (4G, 5G, Wi-Fi), battery pack, and active or passive cooling system. The hardware of both the host device and the virtual host device have been listed in Fig. 2. The software features of the host device and the virtual host device comprise of Cloud based Authentication, End-To-End Encryption (E2EE), Video & audio Streaming and manipulation, Gesture recording from devices, Movement control and positioning, Giving live feedback to the operator, Playing pre-recorded content, One to one communication feature, Saving meeting hours on cloud and Privacy mode.

The details of the hardware devices used and their functions are as provided below:

3D sensing camera (Fig. 3) - Three dimensional (3D) technology is a momentous scientific breakthrough. It is a depth-sensing technology that augments camera capabilities for facial and object recognition. It is the process of capturing a real-world object's length, width, and height with more clarity and in-depth detail, and it can be achieved using a number of different technologies. 3D technology delivers unique advancements in the way day-to-day activities are perceived and approached.

3D is a real game-changer as manufacturers scramble to incorporate these new advancements into consumer products such as mobile phones. 3D sensing technology mimics the human visual system using optical technology, which facilitates the emergence and integration of augmented reality, AI (Artificial Intelligence), and the Internet of Things (IoT). This creates unique opportunities in consumer applications.

Many of the key technologies driving the advancement of 3D sensing have their pros and cons. Designing these new systems involves developing high-quality sensors and efficient algorithms that can leverage new and existing technologies. For example, Vertical-Cavity Surface-Emitting Lasers (VCSELs) are becoming the dominant light source technology for 3D sensing and can replace LEDs or edge-emitting laser diodes, as they are simple, have a narrow spectrum, and have a stable temperature. Stereoscopic vision, structured light pattern, and time of flight are three technologies used for 3D sensing.

Each of these technologies has its common use-cases and individual strengths, which we discuss in more detail.

- Stereoscopic Vision: The stereoscopic vision technology derives its structure from the way human eyes capture any image. Two cameras are placed at slightly offset positions (just like human eyes). The two captured images are then united into one picture using the software. Small variances resulting from the different camera positions create the stereoscopic, i.e., 3D picture. In the assisted stereoscopic vision, a laser projection module is deployed, which projects dots on the object or scene to help the camera focus more easily. The captured image is processed to bring out a depth effect. For instance, this technology is used in bullet cameras installed for monitoring people’s movement at door entrances and other places. FLIR Systems (U.S) manufactures Stereo Vision Camera Systems with stereoscopic vision technology.

- Structured Light Pattern: A light pattern made of either line, squares (periodic structures), or dots is projected on to an object or a scene by a laser projection module. A distorted pattern is created by the reflected light. The reflected light from the target is captured by a camera mounted triangularly to the projection module. The pattern distortion achieved by the triangulation between the projection module and the camera helps in the acquisition of 3D coordinates of the object or scene. The most common example is the True Depth Camera used in iPhone X. The front camera with this technology adds an infrared emitter that projects over 30,000 dots in a known pattern onto the user’s face. Those dots are then photographed by a dedicated infrared camera for analysis, and thus, the image analyzed is used for accessing the phone.

- Time of Flight (ToF): Direct short light flashes are emitted by a projection module, captured by a camera module, and integrated with the system. The time taken by the light to travel from the emitter to the object and back to the camera is calculated. The data is then processed with coordinates, and a 3D picture is generated. In some cases, phase differences are used to calculate the depth and motion of the object detected. Wavelength stability over the entire operating temperature range of the optical source is critical to maintaining tracking precision, as filters are typically applied in the receive path to minimize the noise in the received signal. Time-of-Flight camera sensors can be used for object scanning, measuring distance, indoor navigation, obstacle avoidance, gesture recognition, tracking objects, measuring volumes, reactive altimeters, 3D photography, and augmented reality games, among others.
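
As an illustration only (this sketch is not part of the claimed software), the depth calculations behind the stereoscopic and time-of-flight approaches described above can be expressed in a few lines of Python; the focal length, baseline and timing values used are hypothetical.

```python
# Illustrative sketch (not from the patent): converting stereo disparity and
# time-of-flight measurements into depth. All parameter values are hypothetical.

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def stereo_depth(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from stereoscopic vision: Z = f * B / d (pinhole camera model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

def tof_distance(round_trip_time_s: float) -> float:
    """Depth from time of flight: light travels to the object and back,
    so the one-way distance is c * t / 2."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

if __name__ == "__main__":
    # Hypothetical example values
    print(f"Stereo depth: {stereo_depth(700.0, 0.06, 12.0):.2f} m")   # 3.50 m
    print(f"ToF distance: {tof_distance(20e-9):.2f} m")               # 20 ns round trip ~ 3 m
```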

Transparent and flexible OLED display (Fig. 4) - Many transparent and flexible displays are based on the new Organic Light Emitting Diode (OLED) display technology, and such displays are going to be used often in the future. Today, consumer electronic devices use two major display technologies: LCD and OLED displays. Although most LCD TVs are equipped with backlit LED technology, recently LG and Sony have taken the technology one step further and have started manufacturing TVs/displays with OLED technology.

Both LCD and OLED have brighter displays in a sunny environment and provide excellent quality pictures. OLEDs are overall slightly brighter than LCD because of their dynamic range and no backlight function. Contrast ratio is another important aspect of picture quality. A display with high-contrast will look more realistic than a display with low-contrast. The contrast ratio is the difference between the brightest and darkest pixels in a display. OLED displays have a clear advantage in contrast because they have true black pixels.

Both LCD and OLED displays can provide Ultra HD 4K resolution, so there is no difference when it comes to display resolution. Some old LCDs are still available in 1080P resolution, but modern displays are all 4K.

Transparent OLED displays consist only of transparent components, and when they are turned off, they are up to 85% transparent. When turned on, they allow light to pass in both directions.

Currently, OLED (Organic Light Emitting Diode) technology is being used to produce such displays. OLED is used because it can emit light from within the display, whereas other types of display technologies need external lighting sources; thus OLED can keep the same level of brightness irrespective of the angle of curvature.

Using this flexible display technology, we can mimic the average curvature of the human face and using imaging technology we can reproduce hi-quality real looking video output of the face of the host.

Gesture Control Devices - Gesture control allows a human to interact with a device without touch or audio. Instead, the device can detect and decipher movements or actions and translate them into functions.

Gesture control is the ability to recognize and interpret movements of the human body in order to interact with and control a computer system without direct physical contact. The term “natural user interface” is becoming commonly used to describe these interface systems, reflecting the general lack of any intermediate devices between the user and the system.

Gesture control, or gesture recognition, is both a topic of computer science and language technology, where the primary goal is the interpretation of human gestures via algorithms. Gesture control devices have the ability to recognize and interpret movements of the human body, allowing users to interact with and control a system without direct physical contact. Gestures can originate from any bodily motion or state, but normally originate from the hand. There are several different types of touchless gesture control technologies used today to enable devices to recognize and respond to gestures and movements. These range from cameras to radar, and they each come with pros and cons depending on the application.

Electromyography (EMG) is the technique most used for hand-gesture identification (Fig. 5) and for the design of prosthetic hand controllers. EMG measures the electrical signal resulting from muscle activation. The source of the signal is the motor neuron action potentials generated during muscle contraction. Generally, EMG can be detected either directly with electrodes inserted in the muscle tissue, or indirectly with surface electrodes positioned above the skin (surface EMG (sEMG); for simplicity we will refer to it as EMG). EMG is more popular for its accessibility and non-invasive nature. However, using EMG to discriminate between hand gestures is a non-trivial task due to several physiological processes in the skeletal muscles underlying their generation. The Myo armband is a wearable device provided with eight equally spaced non-invasive EMG electrodes and a Bluetooth transmission module. The EMG electrodes detect signals from the forearm muscle activity and afterwards the acquired data is sent to an external electronic device. The sampling rate for Myo data is fixed at 200 Hz and the data is returned as a unitless 8-bit unsigned integer for each sensor representing "activation"; it does not translate to millivolts (mV).
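
As a minimal sketch only (this is not the patent's software), the following shows how a window of 8-channel, 200 Hz armband data of the kind described above could be reduced to one simple feature per electrode; the window length and the mean-absolute-value feature are illustrative assumptions.

```python
# Illustrative sketch (not the patent's software): windowing raw EMG samples
# from an 8-electrode armband sampled at 200 Hz and computing a simple
# mean-absolute-value (MAV) feature per channel, a common input to
# hand-gesture classifiers.
import numpy as np

SAMPLE_RATE_HZ = 200   # Myo-style armbands stream at 200 Hz
NUM_CHANNELS = 8       # eight equally spaced surface electrodes

def mav_features(window: np.ndarray) -> np.ndarray:
    """window: (num_samples, NUM_CHANNELS) of unitless 8-bit activations,
    centred before taking the mean absolute value per channel."""
    centred = window.astype(np.float32) - 128.0
    return np.abs(centred).mean(axis=0)

# Hypothetical 200 ms window (40 samples) of random data standing in for a live stream.
window = np.random.randint(0, 256, size=(SAMPLE_RATE_HZ // 5, NUM_CHANNELS), dtype=np.uint8)
print(mav_features(window))  # one MAV value per electrode
```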

Recognition of human gestures comes within the more general framework of pattern recognition. In this framework, systems consist of two processes: the representation and the decision processes. The representation process converts the raw numerical data into a form adapted to the decision process, which then classifies the data. This recognition process is displayed in Fig. 6.

Gesture recognition systems inherit this structure and have two more processes: the acquisition process, which converts the physical gesture to numerical data, and the interpretation process, which gives the meaning of the symbol series coming from the decision process.
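
Purely as an illustration of the four-stage pipeline named above (acquisition, representation, decision, interpretation), and not as the patent's implementation, a minimal sketch could look as follows; the stage bodies and thresholds are placeholders.

```python
# Minimal sketch of the gesture recognition pipeline described above
# (acquisition -> representation -> decision -> interpretation).
# The stage bodies are placeholders, not the patent's implementation.
from typing import List

def acquisition() -> List[float]:
    """Convert the physical gesture into raw numerical data (e.g. sensor samples)."""
    return [0.1, 0.4, 0.9]  # hypothetical raw readings

def representation(raw: List[float]) -> List[float]:
    """Convert raw data into a form adapted to the decision process (a feature vector)."""
    return [sum(raw) / len(raw), max(raw)]

def decision(features: List[float]) -> str:
    """Classify the feature vector into a gesture symbol."""
    return "wave" if features[1] > 0.5 else "idle"

def interpretation(symbol: str) -> str:
    """Give meaning to the symbol series coming from the decision process."""
    return {"wave": "greet the audience", "idle": "no action"}[symbol]

print(interpretation(decision(representation(acquisition()))))
```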

Microphone Array - A Microphone Array (or array microphone) is a microphone device that functions just like a regular microphone, but instead of having only one microphone to record sound input, it has multiple microphones (2 or more) to record sound.

The microphones in the array device work together to record sound simultaneously. Microphone arrays can be designed to have as many microphones in them as needed or wanted to record sound output. A common microphone array, however, is a 2-microphone array device, with one microphone placed on the left side of the device and the other placed on the right side. With one microphone on each side, sounds can be recorded from both the left and right sides of the room, making for a dynamic stereo recording which mimics surround sound. When played back on a stereo headset, the separate left and right channel recordings are distinctly different and noticeably heard. The most important characteristic that must be present in microphone array devices is microphone matching. All of the microphones in an array must be similar, closely matched and in some aspects exactly the same in order for the array to produce a good recording. Otherwise, you can have a microphone array where one microphone has much higher gain than the other, or the microphones could be out of phase so that one microphone records before the other, or one microphone picks up sounds from all directions while the other picks up sounds from only a certain direction. These are all unwanted in audio applications and are some of the horrible consequences that can ensue when the microphones in an array aren't carefully chosen. Therefore, the microphones must all be similarly matched and meet a number of specifications so that there are no uneven sound recordings.

Three aspects to consider for microphone matching in microphone arrays are directionality, sensitivity, and phase.

Directionality: The directionality of a microphone is the direction from which it can pick up sounds. Microphones are made to pick up sound from certain directions when spoken into. Some microphones are made to only pick up sounds from one direction, unidirectional microphones. Other microphones are made so that they can pick up sounds from all directions, omnidirectional microphones. When building an array microphone, all the microphones must have the same directionality. Having one microphone pick up sounds only from a certain direction and the other pick up sounds from all directions would make for disastrous, imbalanced sound recording. Unless there is some unique situation where this would be the case, this is largely undesired.

Therefore, microphone arrays are always made with microphones of the same directionality.

Sensitivity: Sensitivity is another aspect that must match for microphone arrays. Sensitivity is the gain that a microphone picks up when recording a signal. Sensitivity must be closely matched in microphone array devices, or else one microphone will be louder than the other, producing imbalanced sound recordings. This is why the maximum sensitivity deviation usually allowed in array microphones is ±1.5 dB, so that there is no more than a 3 dB difference in sensitivity between the microphones.

Phase: Phase is the last important aspect that must match for microphone arrays. Phase is the degree line of reference for the time that a microphone begins recording, meaning, it determines the time that all microphones in an array start and stop recording. If microphones have drastically different phases, they will record signals at different times. This will lead to unsynchronized recording. Again, this is largely undesired. It is desired that microphones record signals at the same time so that there is no delay between signals. Just like sensitivity, there must be a maximum allowable tolerance for phase difference between microphones. This difference is usually ±1.5 degrees to ensure that signals record at the same time, leading to harmonized recording.
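
The matching tolerances mentioned above (sensitivity within ±1.5 dB and phase within ±1.5 degrees of nominal, i.e. no pair differing by more than about 3 dB or 3 degrees) can be checked with a short sketch like the one below; this is an illustration only, not part of the claimed device, and the measured values are hypothetical.

```python
# Illustrative check (not from the patent) of the microphone-matching tolerances
# discussed above: no pair of microphones should differ by more than 3 dB in
# sensitivity or 3 degrees in phase.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Mic:
    name: str
    sensitivity_db: float   # measured sensitivity relative to nominal, in dB
    phase_deg: float        # measured phase offset, in degrees

def array_is_matched(mics, max_db_diff=3.0, max_phase_diff=3.0):
    """True if every pair of microphones is within the allowed differences."""
    for a, b in combinations(mics, 2):
        if abs(a.sensitivity_db - b.sensitivity_db) > max_db_diff:
            return False
        if abs(a.phase_deg - b.phase_deg) > max_phase_diff:
            return False
    return True

# Hypothetical two-microphone array
print(array_is_matched([Mic("left", +1.2, -0.5), Mic("right", -1.0, +0.8)]))  # True
```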

The microphone array consists of multiple mics placed at different angles throughout the attendee device; using multiple mics at different angles provides audio direction feedback to the system, which enables it to locate the source of the recorded voice.

The following are the uses of the mic array in the system:

- Reducing ambient noise
- Acoustic source localization
- Focussing on the voice of the conveyor
- Binaural recording

Speakers Array - Vertical line arrays and column speakers tend to provide good control of the vertical coverage, but provide a predetermined horizontal coverage. Depending on the aspect ratio of the room and the horizontal reverberation character of the space, point source speakers can perform better than a line array of speakers in this regard. In air, sound is transmitted by pressure variations from its source to the surroundings.

The sound level decreases as it gets further and further away from its source. While absorption by air is one of the factors contributing to the weakening of a sound during transmission, distance plays a more important role in noise reduction during transmission. The reduction of a sound is called attenuation.

The effect of distance attenuation depends on the type of sound source. Most sounds or noises we encounter in our daily life are from sources which can be characterized as point or line sources. If a sound source produces spherical spreading of sound in all directions, it is a point source.

For a point source, the noise level decreases by 6 dB per doubling of distance from it. If the sound source produces cylindrical spreading of sound, it may be considered as a line source. For a line source, the noise level decreases by 3 dB per doubling of distance from it.
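
To make the attenuation figures above concrete, the 6 dB and 3 dB drops per doubling of distance correspond to 20*log10 and 10*log10 laws respectively; the sketch below is an illustration only, with a hypothetical speaker level.

```python
# Illustrative sketch (not from the patent) of distance attenuation: a point
# source falls off by 20*log10(r2/r1) dB (about 6 dB per doubling of distance),
# a line source by 10*log10(r2/r1) dB (about 3 dB per doubling).
import math

def point_source_level(level_at_ref_db: float, ref_dist_m: float, dist_m: float) -> float:
    return level_at_ref_db - 20.0 * math.log10(dist_m / ref_dist_m)

def line_source_level(level_at_ref_db: float, ref_dist_m: float, dist_m: float) -> float:
    return level_at_ref_db - 10.0 * math.log10(dist_m / ref_dist_m)

# Hypothetical speaker level of 90 dB measured at 1 m
print(point_source_level(90.0, 1.0, 2.0))  # ~83.98 dB: about 6 dB quieter at double distance
print(line_source_level(90.0, 1.0, 2.0))   # ~86.99 dB: about 3 dB quieter at double distance
```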

Since the attendee device has to broadcast the voice from the host, a speaker array has to be installed on the attendee device to carry the sound as far as possible and in all directions.

Gyroscope Sensors - A gyroscope sensor is a device that can measure and maintain the orientation and angular velocity of an object. Gyroscope sensors are more advanced than accelerometers: they can measure the tilt and lateral orientation of the object, whereas an accelerometer can only measure linear motion.

Gyroscope sensors are also called Angular Rate Sensors or Angular Velocity Sensors. These sensors are installed in applications where the orientation of the object is difficult to sense by humans. Measured in degrees per second, angular velocity is the change in the rotational angle of the object per unit of time.

Besides sensing the angular velocity, Gyroscope sensors can also measure the motion of the object. For more robust and accurate motion sensing, in consumer electronics Gyroscope sensors are combined with Accelerometer sensors.

Depending on the direction, there are three types of angular rate measurements: Yaw, the horizontal rotation on a flat surface when the object is seen from above; Pitch, the vertical rotation when the object is seen from the front; and Roll, the horizontal rotation when the object is seen from the front.

The concept of Coriolis force is used in Gyroscope sensors. In this sensor to measure the angular rate, the rotation rate of the sensor is converted into an electrical signal. With the advance in technology highly accurate, reliable and miniature devices are being manufactured. More accurate measurements of orientation and movement in a 3D space became possible with the integration of the Gyroscope sensor. Gyroscopes are also available in different sizes with different performances.

Based on their sizes, Gyroscope sensors are divided into small and large-sized. From large to small, the hierarchy of Gyroscope sensors can be listed as Ring laser gyroscope, Fiber-optic gyroscope, Fluid gyroscope, and Vibration gyroscope. Being small and easier to use, the Vibration gyroscope, displayed in Fig. 7, is the most popular. The accuracy of a vibration gyroscope depends upon the stationary element material used in the sensor and structural differences.

The main functions of the Gyroscope Sensor for all the applications are Angular velocity sensing, angle sensing, and control mechanisms.

Using Gyroscope Sensors, we will be able to accurately transmit angular velocity data from the host to the attendee device and vice versa, which in turn will give processed data to the gesture control devices to replicate accurate movements for the attendee human.
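
As an illustration of the kind of processing mentioned above, and not as the patent's actual software, angular velocity samples can be integrated into orientation angles that the attendee-side device could replicate; the sample rates and values below are hypothetical.

```python
# Illustrative sketch (not the patent's software): integrating angular velocity
# samples (yaw, pitch, roll, in degrees per second) from the host's gyroscope
# into orientation angles that an attendee-side device could replicate.
def integrate_rates(samples, dt_s):
    """samples: iterable of (yaw_rate, pitch_rate, roll_rate) in deg/s,
    dt_s: time step between samples. Returns accumulated angles in degrees."""
    yaw = pitch = roll = 0.0
    for yaw_rate, pitch_rate, roll_rate in samples:
        yaw += yaw_rate * dt_s
        pitch += pitch_rate * dt_s
        roll += roll_rate * dt_s
    return yaw, pitch, roll

# Hypothetical stream: the host turns the head steadily at 30 deg/s yaw for one second
samples = [(30.0, 0.0, 0.0)] * 100
print(integrate_rates(samples, dt_s=0.01))  # -> (30.0, 0.0, 0.0)
```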

GPS Sensor - GPS is the abbreviation of Global Positioning System, which is a satellite navigation system whose satellites orbit about 20,000 km above the Earth. It can provide us with location and time information, and it can work 24 hours a day under any conditions. A complete GPS requires at least 24 satellites. As technology develops, more than 33 satellites now work together in the GPS system.

Now we can use GPS in the navigation of airplanes, cars and trucks, and in GPS trackers, which are terminal devices based on GPS positioning technology.

Using a GPS sensor in the host as well as the attendee device, we will be able to capture the geographical usage of both devices. In future versions of the present invention, we may develop one-to-many communication/broadcasting devices, wherein an advanced level of GPS data usage will play a vital role.

Motion Sensors / Proximity Sensors / Infrared Sensors / Force Sensors / Temperature & Humidity Sensor - Various other types of sensors will be integrated within the Host & Attendee devices to exchange environmental details in a more precise manner. Motion detection, proximity for absence/presence, infrared for heat, force for movement intensity, temperature & humidity sensors, etc. will be added from time to time.

Our motto is to create virtual + augmented reality using humans for humans in such a way that, at both ends, as much environment/surrounding data as possible is exchanged.

Communication Module (4G/5G/Wi-Fi) - We want our devices to communicate on a high speed network, and they should be capable enough to transmit video/audio along with other data related to the sensors. Our devices will be equipped with 4G/5G and high speed Wi-Fi communication modules. As we know, 5G speeds will range from ~50 Mbps to over 1 Gbps. The fastest 5G speeds will be in the mmWave bands and can reach 4 Gb/s with carrier aggregation and MIMO.

To cover the maximum audit trails from both end devices, we need a very strong data communication network, for which 5G will be best suited in the time to come.

Battery Pack - Battery technology has already improved immensely over the nickel-toting cells used in the 80s. The following decade’s switch to lithium-ion/poly batteries has allowed more power to be crammed into smaller spaces, helping kick-off the smartphone revolution. Today, manufacturers are already using innovative solutions to provide more power, and there isn’t a day that goes by without news of a potentially revolutionary new bit of battery tech hitting the news.

There are a variety of options today on the battery technology front, from which we will have to smartly evaluate and select the best suitable solution for longer life and fast charging.

The user helmet will have one battery pack installed in the device, having the capacity to run independently from the docking station. Another battery pack with snap-in functionality will enable the user to carry an extra battery pack while in mobile use and will also enable the user to swap batteries without interrupting the device functionality.

Active or passive cooling system - Having multiple displays and processors on the device will generate heat and may decrease the life and efficiency of the device.

A passive cooling system along with an active cooling system will carry the heat away from the device as silently as possible, thus reducing noise interruption.

The details of the software modules used are as provided below:

Cloud based Authentication - Unlike normal web based applications, cloud computing applications provide many unique services to users, so the authentication process in cloud computing is much different from that of a normal web application. The traditional approach, where each application keeps track of its user names and passwords in a different place, is not feasible in a cloud based approach. For instance, consider a cloud application which provides 10 unique services: if authentication were done as in a normal web application, the user would have to remember 10 different passwords and user names. Therefore, cloud based applications use a different kind of authentication mechanism called Single Sign On (SSO).

Simply put, single sign on means accessing several cloud based applications using one user name and password. Users can just log on to the system once and then access all the services they have registered for. Single sign on has been used in several cloud based applications; Google App Engine and Ping Identity are some applications which provide a single sign on based approach to register users.

End to end encryption - End-to-end encryption (E2EE) is a method of secure communication that prevents third parties from accessing data while it's transferred from one end system or device to another.

In E2EE, the data is encrypted on the sender's system or device, and only the intended recipient can decrypt it. As it travels to its destination, the message cannot be read or tampered with by an internet service provider (ISP), application service provider, hacker or any other entity or service. Many popular messaging service providers use end-to-end encryption, including Facebook, WhatsApp and Zoom. In the present invention, E2EE is used to secure communication.
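
As an illustration of the E2EE principle described above, and not of the patent's specific cryptographic scheme (which is not named), the following sketch uses the PyNaCl library, chosen here purely as an assumption for the example.

```python
# Illustrative E2EE sketch using the PyNaCl library (an assumption for this
# example; the patent does not name a specific cryptographic stack). The message
# is encrypted on the sender's device and only the intended recipient's private
# key can decrypt it.
from nacl.public import PrivateKey, Box

# Each party generates a key pair; only the public keys are exchanged.
host_key = PrivateKey.generate()
virtual_host_key = PrivateKey.generate()

# The sender encrypts with their private key and the recipient's public key.
sender_box = Box(host_key, virtual_host_key.public_key)
ciphertext = sender_box.encrypt(b"turn left and greet the audience")

# The recipient decrypts with their private key and the sender's public key.
receiver_box = Box(virtual_host_key, host_key.public_key)
print(receiver_box.decrypt(ciphertext))  # b'turn left and greet the audience'
```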

Video & audio Streaming and manipulation - Video streaming software allows mixing of multiple camera sources to create a professional-looking HD broadcast. Encoding is another major function of streaming software. Typically, it serves two main purposes: encoding and mixing/production. Live broadcasting software uses video encoding technology to convert your video feed into a suitable format for live streaming.

On one hand, the audio & video stream from the host device will be broadcast live on the front transparent OLED screen to give the audience the live presence of the host. The software will get the input from the 3D camera hardware connected to the host device. The 3D camera will provide depth and distance data along with the video stream; the software will then perform facial recognition and capture the depth information of the facial area.

The software will generate an instant 3D model from the captured data and transmit it to the attendee side to play on the convex flexible display, mimicking the depth information of the human face on the curved display. On the other hand, the system will use the input from the multiple camera array on the attendee device, create a panoramic video stream from the attendee device and, using live stitching and stretching operations, create video having the width of the human eye viewport.
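
Purely as a sketch of the panoramic stitching step mentioned above (not the patent's software), multi-camera frames can be combined with OpenCV's stitching module; the file names below are hypothetical.

```python
# Illustrative sketch (not the patent's software) of stitching frames from
# multiple attendee-side cameras into one panoramic image using OpenCV.
import cv2

paths = ("cam_left.jpg", "cam_center.jpg", "cam_right.jpg")  # hypothetical file names
frames = [cv2.imread(p) for p in paths]
if any(f is None for f in frames):
    raise FileNotFoundError("one or more camera frames could not be read")

stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(frames)

if status == cv2.Stitcher_OK:
    cv2.imwrite("attendee_panorama.jpg", panorama)
else:
    print(f"Stitching failed with status {status}")
```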

The software will also use facial recognition to verify the attendance of the attendee and will store and display it in real time on the screen along with other statistics related to the attendee and the meeting.

Gesture recording from devices - Gesturing is a natural and intuitive way to interact with people and the environment, so it makes perfect sense to use hand gestures as a method of human-computer interaction (HCI).

Hand tracking and gesture recognition are not the same things. Both technologies are supposed to use hands for human-machine interaction (HMI) without touching, switching, or employing controllers. Sometimes, systems for hand tracking and gesture recognition require the use of markers, gloves, or sensors, but the ideal system requires nothing but a human hand.

The software will track and record the hand gestures of the narrator and will convert them into an instruction set for the attendee device operator. Intel has recently released a suite of depth and tracking technologies called RealSense, providing the developer community with open-source tools for a variety of languages and platforms. The Intel RealSense Depth Camera D455 with Lidar, stereo depth, tracking, and coded light capabilities provides a high level of gesture recognition and a longer range for HMI. With the help of a camera like this, dynamic hand gesture recognition systems can be applied to various use cases, from robotics and drones to 3D scanning and people tracking.

We will have to develop software drivers to interact with the gesture devices and transmit real time data between the host and attendee devices for real time actions. The user will have the option to pre-record some gestures that are used frequently in conversations, and these will be synced to the attendee device prior to the meeting for the operator to understand and train on. The gestures can also be saved as pre-recorded steps to perform certain actions recorded at regular intervals.

Movement control and positioning - As the attendee device can run in stationary as well as mobile mode, the host device will have certain control over the attendee device's mobility and in-display signaling capacity.

The host will have the ability to point at an object in the stream of the attendee device video feed, and the attendee can act according to the instructions and verbal communication. The host can also navigate the attendee and send visual signs and notifications along with external data such as images or video, as per need.

The software will transmit bidirectional data between the host and attendee devices to create a real time virtual presence experience. Such data will also add on to enhance augmented reality experiences.

Giving live feedback to the operator - The software will take input from the installed gyroscopic and other positioning sensors to record the movement of the operator and will compare it with the given instructions; doing so creates a feedback loop system for gesture and movement. The software will process the compared data and will give feedback to the host as well as the attendee device for improvement and confirmation of the movement.
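
A minimal sketch of such a feedback comparison, offered only as an illustration and not as the claimed software, could compare the instructed pose angles against the angles recorded from the operator's sensors; the joint names and tolerance are hypothetical.

```python
# Minimal sketch (not the patent's software) of the feedback loop described
# above: comparing instructed pose angles with the angles recorded by the
# operator's sensors and returning a correction. The tolerance is hypothetical.
def movement_feedback(instructed_deg, recorded_deg, tolerance_deg=5.0):
    """instructed_deg / recorded_deg: dicts of joint name -> angle in degrees."""
    corrections = {}
    for joint, target in instructed_deg.items():
        error = target - recorded_deg.get(joint, 0.0)
        if abs(error) > tolerance_deg:
            corrections[joint] = error  # positive: rotate further, negative: rotate back
    return corrections or "movement confirmed"

instructed = {"neck_yaw": 30.0, "elbow": 90.0}
recorded = {"neck_yaw": 22.0, "elbow": 88.0}
print(movement_feedback(instructed, recorded))  # {'neck_yaw': 8.0}
```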

Playing pre-recorded content - In the case of a conference or a repeating session, the host can play pre-recorded video along with the recorded gestures and voice audio, while the other functions of the system, such as attendee recording and logging from the attendee device, will continue to function.

One to one communication feature - This feature will be used when the host side enables the enhanced privacy mode. The operator will be replaced by the attendee, and it will also disable the outside display and speaker system.

Enabling privacy mode will disable the software functions related to storage, data processing and the feedback system. In this mode, the device will not store any video or audio data and will not log the communication other than the connection and disconnection times.

The user will not be able to access any features related to data storage and manipulation while in one to one communication mode.

Saving meeting hours on cloud - We need a cloud-based meeting management system that provides a secure and efficient method for scheduling, executing and archiving meetings. The software should be designed to drive accountability, streamline meeting procedures and align management goals.

The system should have integration with MS Outlook; powerful Agenda, Task and Meeting Minutes modules; and a comprehensive Meeting Analytics system for analyzing meeting data, making it a most effective meeting management tool.

No data during the meeting will be saved on the local disk of any of the devices. The user will have the option to save the up-link and down-link streams on the cloud on demand, and the saved streams can be accessed anytime using the cloud service.

Using this feature, the user will have the option to create minutes of the meeting based on the interaction and speech recognition. The user will also be able to forward and save the meeting for further reference.

Privacy mode - This mode enables the host user to have a private conversation with the virtual host (by replacing the operator with the virtual host); the conversation will not be stored or processed by any of the tools of the software and will not be stored on the cloud.

The virtual presence device of the present invention, which uses trained persons to represent their hosts using Man Machine Interface (MMI), is highly advantageous as it solves a major problem faced in virtual interactions, which is that of providing personal presence at a remote location. It provides a virtual host as a proxy for the host (user), who walks, talks, moves, enacts, discusses, trains, diagnoses, inspects etc. at the remote location, as per the guidance of the host. This saves the time as well as the money of the host which would otherwise have been spent if he had to go personally to the remote location to attend the event.

Although the preferred embodiment as well as the construction and use have been specifically described, it should be understood that variations in the preferred embodiment could be achieved by a person skilled in the art without departing from the spirit of the invention. The invention has been described with reference to specific embodiments which are merely illustrative and not intended to limit the scope of the invention as defined in the claims.