Title:
AUGMENTED REALITY CONTENT RENDERING
Document Type and Number:
WIPO Patent Application WO/2023/009112
Kind Code:
A1
Abstract:
This application is directed to systems and methods of rendering augmented reality (AR) content. A first imaging device and a plurality of second home devices are distributed in a premises. The first imaging device identifies a subset of second home devices based on a device pose of the first imaging device. In accordance with a determination that the first imaging device has access to information captured by the subset of second home devices, the first imaging device collects the information captured by the subset of second home devices and generates virtual content based on the collected information. The virtual content is rendered on a display. In some embodiments, a head-mounted display includes both the first imaging device and the display. In some embodiments, the plurality of second home devices include one or more imaging devices located at different locations of the premises.

Inventors:
TIAN YUAN (US)
LI XIANG (US)
XU YI (US)
Application Number:
PCT/US2021/043463
Publication Date:
February 02, 2023
Filing Date:
July 28, 2021
Assignee:
INNOPEAK TECH INC (US)
International Classes:
G06T19/00; G06T7/73; G06T15/00
Foreign References:
US20200082600A12020-03-12
US20170228878A12017-08-10
US20200097770A12020-03-26
US20200111255A12020-04-09
US20160292918A12016-10-06
Attorney, Agent or Firm:
WANG, Jianbai et al. (US)
Claims:
What is claimed is:

1. A method for rendering augmented reality (AR) content, comprising: obtaining a device pose of a first imaging device; identifying a subset of a plurality of second home devices based on the device pose of the first imaging device, wherein the first imaging device and second home devices are distributed in a premises; in accordance with a determination that the first imaging device has an access to information captured by the subset of second home devices, collecting the information captured by the subset of second home devices, and generating virtual content based on the collected information; and rendering the virtual content on a display.

2. The method of claim 1, the subset of second home devices including a camera device, identifying the subset of second home devices further comprising: determining a first field of view, a device position, and a device orientation of the display based on the device pose, the device pose including a device position and a device orientation of the first imaging device; and identifying the camera device according to the device position and the device orientation of the display, wherein the camera device has a second field of view that intersects with an extension of the first field of view and is separated from the first field of view by at least a structure; wherein the collected information includes video data captured by the camera device.

3. The method of claim 2, the first imaging device located in a first portion of the premises, the camera device located at a second portion of the premises that is at least partially separated from the first portion by the structure, further comprising: identifying a person in the video data captured by the camera device, wherein the virtual content includes an avatar generated based on the person.

4. The method of claim 3, rendering the virtual content on the display further comprising: rendering the avatar of the person on the display without any background content from the second field of view; rendering both the avatar of the person and part of the background content from the second field of view surrounding the person on the display; or rendering the avatar of the person and a layout wireframe associated with the second field of view on the display.

5. The method of claim 3 or 4, wherein the first imaging device includes the display, and the avatar of the person is overlaid on top of the first field of view of the display.

6. The method of any of claims 3-5, further comprising: determining a posture of the person in the video data captured by the camera device; wherein the avatar of the person is rendered on the display with the posture of the person.

7. The method of any of claims 3-6, wherein a first AR head mounted display includes both the first imaging device and the display, and the person carries a second AR head mounted display, further comprising: after identifying the camera device and the person in the second field of view, receiving, by the first AR head mounted display, a message from the second AR head mounted display carried by the person; and displaying the message with the avatar of the person on the display.

8. The method of any of claims 3-6, wherein the first imaging device includes a first AR head mounted display, and the person carries a second AR head mounted display, further comprising: after identifying the camera device and the person in the second field of view, exchanging voice messages between the first and second AR head mounted displays.

9. The method of any of claims 2-8, the camera device including a first camera device, the subset of second home devices including a second camera device, identifying the subset of second home devices further comprising: identifying the second camera device according to the device pose of the first imaging device, wherein the second camera device has a third field of view that intersects with the extension of the first field of view and is separated from the first field of view; identifying a second person in video data captured by the second camera device; and rendering a second avatar of the second person that is captured by the second camera device on the display jointly with the avatar of the person in the video data that is captured by the first camera device.

10. The method of claim 1, further comprising: identifying an interested object based on the collected information captured by the subset of second home devices, wherein the virtual content includes a virtual object generated based on the interested object.

11. The method of any of the preceding claims, wherein: the subset of second home devices includes a thermostat, and the collected information includes a temperature value measured by the thermostat; and the virtual content includes the temperature value that is rendered for display to a user of the display.

12. The method of any of the preceding claims, further comprising: obtaining location information of each of the first imaging device and second home devices, thereby determining a relative location of each second home device with reference to a device location of the first imaging device; wherein the first imaging device is moveable in the premises.

13. The method of any of the preceding claims, further comprising: scanning the premises using a sensor device to obtain a map including a respective location of each fixed home device in the second home devices.

14. The method of any of the preceding claims, further comprising: determining that the first imaging device and the plurality of second home devices are associated with a user account on an online platform hosted by a server, thereby determining that the first imaging device has the access to the collected information; wherein the first imaging device and second home devices are configured to communicate with each other and with the server via one or more communication networks.

15. The method of any of the preceding claims, further comprising: identifying a third electronic device that is also located in a direction associated with the device pose of the first imaging device; determining that the first imaging device does not have access to information collected by the third electronic device; wherein the virtual content is rendered on the display, independently of the information collected by the third electronic device.

16. A computer system, comprising: one or more processors; and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform a method of any of claims 1-15.

17. A non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform a method of any of claims 1-15.

Description:
Augmented Reality Content Rendering

TECHNICAL FIELD

[0001] This application relates generally to virtual object rendering technology including, but not limited to, methods, systems, and non-transitory computer-readable media for presenting virtual content associated with a person in a field of view of a camera of an electronic device.

BACKGROUND

[0002] Augmented Reality (AR) has become increasingly common in industrial settings and everyday life. Mobile devices and AR glasses are developed to enable AR applications that scan the environment, render virtual content, and facilitate user interaction. The AR applications are implemented for local presence, remote presence, or a hybrid of both. In local presence implementations, multiple users are registered to a common coordinate system using the same anchor (e.g., points, an image, a 3D structure) in the real world and interact with the same virtual content at the same time. In remote presence implementations, each individual user can see and interact with other users’ avatars through his or her own camera location and angle. In hybrid implementations, some users may be in the same physical location, while a remote user can join a session as an observer or a player. However, the local, remote, or hybrid presence implementations of the AR applications require a meticulous hardware and software setup for image rendering, and do not provide the flexibility needed for multiple users to interact with one another in a convenient and efficient manner. There is a need for an AR solution that enables multiple users to interact with one another in a flexible, efficient, and cost-effective manner.

SUMMARY

[0003] Various embodiments of this application are directed to methods and systems of rendering augmented reality (AR) content that merges real content captured by a camera and virtual content representing objects that are blocked by physical structures and not exposed in a field of view of the camera. In some embodiments, the methods and systems disclosed herein are used in scenarios that benefit from collaboration, personal interactions, and/or in-person interactions. Particularly, the methods and systems disclosed herein take advantage of environment mapping and user tracking to improve remote communications through rendered AR content. For example, a building structure is mapped and users are tracked by one or more imaging devices, client devices, and/or IOT (Internet of Things) sensors that are distributed in the building structure and communicatively coupled via one or more communication networks. The AR content is rendered for the users, allowing the users to interact as if they were in person, without real-world boundaries or limitations. In some embodiments, walls and/or other obstacles between employee offices become transparent (e.g., “see through”) to allow colleagues to meet from their respective offices when they initiate an interaction (e.g., an AR session or collaboration). Similarly, each user’s pose (e.g., position and orientation) is used to simulate an in-person interaction including, for example, forming directional audio, selecting AR content, and adjusting characteristics of the AR content based on each user’s position. Various embodiments of this application can be implemented in both private and public settings, such as homes, offices, coffee shops, and airports.

[0004] In some embodiments, the methods and systems disclosed herein are performed at or in conjunction with artificial reality systems. Artificial reality systems include, but are not limited to, non-immersive, semi-immersive, and fully-immersive virtual reality (VR) systems; marker-based, markerless, location-based, and projection-based augmented reality systems; hybrid reality systems; and other types of mixed reality systems. For example, in some embodiments, an artificial reality system includes an optical see-through head-mounted display (OST-HMD) or other AR glasses that render virtual objects in real or virtual world environments. As the skilled artisan will appreciate upon reading the descriptions provided herein, the novel wearable devices described herein can be used with any of these types of artificial reality environments.

[0005] In one aspect, a method is provided for rendering augmented reality (AR) content. The method includes obtaining a device pose of a first imaging device and identifying a subset of a plurality of second home devices based on the device pose of the first imaging device. The first imaging device and second home devices are distributed in a premises. The method includes, in accordance with a determination that the first imaging device has access to information captured by the subset of second home devices, collecting the information captured by the subset of second home devices. The method further includes generating virtual content based on the collected information, and rendering the virtual content on a display.
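
By way of illustration and not limitation, the flow of this aspect can be sketched in Python as follows; the data structures and helper callables are hypothetical placeholders rather than elements recited in the claims.

```python
# Illustrative sketch of the flow of this aspect (hypothetical data layout;
# not components recited in the claims).
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class HomeDevice:
    device_id: str
    account: str
    capture: Callable[[], object]      # returns whatever the device senses

def render_ar_content(device_pose, account: str, home_devices: Sequence[HomeDevice],
                      in_view: Callable[[object, HomeDevice], bool],
                      make_content: Callable[[List[object]], object],
                      display_render: Callable[[object], None]) -> None:
    # Identify the subset of second home devices based on the device pose.
    subset = [d for d in home_devices if in_view(device_pose, d)]
    # Determine whether the first imaging device has access to the captured information.
    if all(d.account == account for d in subset):
        collected = [d.capture() for d in subset]          # collect the captured information
        display_render(make_content(collected))            # generate and render virtual content
```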

[0006] In some embodiments, the subset of second home devices includes a camera device. Identifying the subset of second home devices further includes determining a first field of view, a device position, and a device orientation of the display based on the device pose. The device pose includes a device position and a device orientation of the first imaging device. The camera device is identified according to the device position and device orientation of the first imaging device. The camera device has a second field of view that intersects with an extension of the first field of view and is separated from the first field of view by at least a structure. The collected information includes video data captured by the camera device.
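
By way of illustration, the test of whether a camera's second field of view intersects an extension of the first field of view may resemble the simplified 2D check below; a deployed system would use 3D frustums and the premises map, so this sketch is an assumption made for the example only.

```python
# Simplified 2D check of whether a camera's field of view intersects the
# extension of the first device's field of view (illustrative sketch only).
import math

def _in_cone(apex, yaw, half_angle, point, max_range):
    dx, dy = point[0] - apex[0], point[1] - apex[1]
    if math.hypot(dx, dy) > max_range:
        return False
    diff = (math.atan2(dy, dx) - yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_angle

def fov_intersects_extension(first_pos, first_yaw, first_half_angle,
                             cam_pos, cam_yaw, cam_half_angle,
                             max_range=25.0, samples=32):
    # Sample points inside the camera's field of view and test whether any of
    # them falls inside the first device's extended viewing cone.
    for i in range(samples):
        ang = cam_yaw - cam_half_angle + (2 * cam_half_angle) * i / (samples - 1)
        for r in (1.0, 3.0, 6.0):
            p = (cam_pos[0] + r * math.cos(ang), cam_pos[1] + r * math.sin(ang))
            if _in_cone(first_pos, first_yaw, first_half_angle, p, max_range):
                return True
    return False
```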

[0007] Further, in some embodiments, the first imaging device is located in a first portion of the premises, and the camera device is located at a second portion of the premises that is at least partially separated from the first portion by the structure. The method further includes identifying a person in the video data captured by the camera device. The virtual content includes an avatar generated based on the person. In some embodiments, the first imaging device includes the display, and the avatar of the person is overlaid on top of the first field of view of the display. The method optionally includes determining a posture of the person in the video data captured by the camera device. The avatar of the person is rendered on the display with the posture of the person.

[0008] Additionally, in some embodiments, rendering the virtual content on the display further includes rendering the avatar of the person on the display without any background content from the second field of view, rendering both the avatar of the person and part of the background content from the second field of view surrounding the person on the display, or rendering the avatar of the person and a layout wireframe associated with the second field of view on the display.
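
By way of illustration only, these three presentation options can be pictured as a simple mode switch; the enumeration and the overlay callable below are assumptions made for the example and are not part of an actual rendering API.

```python
# Illustrative mode switch for the three avatar presentation options above.
from enum import Enum, auto
from typing import Callable, Optional

class AvatarRenderMode(Enum):
    AVATAR_ONLY = auto()             # avatar without background from the second field of view
    AVATAR_WITH_BACKGROUND = auto()  # avatar plus part of the surrounding background content
    AVATAR_WITH_WIREFRAME = auto()   # avatar plus a layout wireframe of the second field of view

def compose(overlay: Callable[[object], None], avatar, mode: AvatarRenderMode,
            background: Optional[object] = None, wireframe: Optional[object] = None) -> None:
    if mode is AvatarRenderMode.AVATAR_WITH_BACKGROUND and background is not None:
        overlay(background)
    elif mode is AvatarRenderMode.AVATAR_WITH_WIREFRAME and wireframe is not None:
        overlay(wireframe)
    overlay(avatar)   # the avatar itself is drawn in every mode
```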

[0009] Further, in some embodiments, a first AR head mounted display includes both the first imaging device and the display, and the person carries a second AR head mounted display. The method further includes, after identifying the camera device and the person in the second field of view, receiving, by the first AR head mounted display, a message from the second AR head mounted display carried by the person. The method further includes displaying the message with the avatar of the person on the display. Alternatively, in some embodiments, the first imaging device includes a first AR head mounted display, and the person carries a second AR head mounted display. The method further includes, after identifying the camera device and the person in the second field of view, exchanging voice messages between the first and second AR head mounted displays.

[0010] Further, in some embodiments, the camera device includes a first camera device, and the subset of second home devices includes a second camera device. Identifying the subset of second home devices further includes identifying the second camera device according to the device position and orientation of the first imaging device. The second camera device has a third field of view that intersects with the extension of the first field of view and is separated from the first field of view. Identifying the subset of second home devices further includes identifying a second person in video data captured by the second camera device, and rendering a second avatar of the second person that is captured by the second camera device on the display jointly with the avatar of the person in the video data that is captured by the first camera device.

[0011] In some embodiments, the method includes identifying an interested object based on the collected information captured by the subset of second home devices, and the virtual content includes a virtual object generated based on the interested object.

[0012] In some embodiments, the subset of second home devices includes a thermostat, and the collected information includes a temperature value measured by the thermostat. The virtual content includes the temperature value that is rendered for display to a user of the display.

[0013] In some embodiments, the method includes obtaining location information of each of the first imaging device and second home devices, thereby determining a relative location of each second home device with reference to a device location of the first imaging device. The first imaging device is moveable in the premises.

[0014] In some embodiments, the method includes scanning the premises using a sensor device to obtain a map including a respective location of each fixed home device in the second home devices.

[0015] In some embodiments, the method includes determining that the first imaging device and the plurality of second home devices are associated with a user account on an online platform hosted by a server, thereby determining that the first imaging device has the access to the collected information. The first imaging device and second home devices are configured to communicate with each other and with the server via one or more communication networks.

[0016] In some embodiments, the method further includes identifying a third electronic device that is also located in a direction associated with the device pose of the first imaging device and determining that the first imaging device does not have access to information collected by the third electronic device. The virtual content is rendered on the display, independently of the information collected by the third electronic device.

[0017] In another aspect, some embodiments include a computer system having one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods.

[0018] In yet another aspect, some embodiments include a non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the above methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] For a better understanding of the various described embodiments, reference should be made to the Detailed Description below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

[0020] Figure 1 is an example virtual content rendering system having one or more servers communicatively coupled to one or more client devices, imaging devices, and internet of things sensors, in accordance with some embodiments.

[0021] Figure 2 is an overview of a computer system for rendering virtual content between users.

[0022] Figure 3A is a block diagram of a computer system for rendering virtual content, in accordance with some embodiments.

[0023] Figure 3B is a diagram illustrating different fields of view of client devices and imaging devices of a computer system for rendering virtual content, in accordance with some embodiments.

[0024] Figure 4 is a diagram illustrating virtual content overview 400 including virtual content rendered on a display, in accordance with some embodiments.

[0025] Figure 5 is a block diagram of a computer system, in accordance with some embodiments.

[0026] Figures 6A-6D are flowcharts of a method for rendering virtual content, in accordance with some embodiments.

[0027] Like reference numerals refer to corresponding parts throughout the several views of the drawings.

DETAILED DESCRIPTION

[0028] Reference will now be made in detail to specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to assist in understanding the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that various alternatives may be used without departing from the scope of claims and the subject matter may be practiced without these specific details. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein can be implemented on many types of electronic systems with digital video capabilities.

[0029] Figure 1 is an example virtual content rendering system 100 having one or more servers communicatively coupled to one or more client devices, imaging devices, and internet of things sensors, in accordance with some embodiments. Each of the one or more client devices 104 is, for example, a desktop computer 104A, a tablet computer 104B, a mobile phone 104C, or an intelligent, multi-sensing, network-connected home device (e.g., a surveillance camera). In some embodiments, the one or more client devices 104 include a pair of AR glasses 150 (also called a head-mounted display (HMD)). In some embodiments, the HMD 150 is an optical see-through head-mounted display (OST-HMD), in which virtual objects are rendered on top of a field of view of a camera of the OST-HMD reflecting the real world. Each client device 104 is configured to collect data or user inputs, execute user applications, and present outputs on its user interface.

[0030] In some embodiments, the AR glasses 150 include one or more of an image sensor, a microphone, a speaker, one or more inertial sensors (e.g., gyroscope, accelerometer), and a display. The image sensor and microphone are configured to capture video and audio data from a scene of the AR glasses 150, while the one or more inertial sensors are configured to capture inertial sensor data. In some situations, the image sensor captures hand gestures of a user wearing the AR glasses 150. In some situations, the microphone records ambient sound, including a user’s voice commands. In some situations, both the visual data (video or static images) captured by the image sensor and the inertial sensor data measured by the one or more inertial sensors are applied to determine and predict device poses (e.g., device positions and orientations). The video, static image, audio, or inertial sensor data captured by the AR glasses 150 are optionally processed by the AR glasses 150, server(s) 102, or both to recognize the device poses of the AR glasses 150.

[0031] The device poses are used to control the AR glasses 150 themselves or interact with an application (e.g., a gaming application, a conferencing application, etc.) executed by the AR glasses 150. In some embodiments, the AR glasses 150 display a user interface, and the recognized or predicted device poses are used to render or interact with user selectable display items on the user interface. In some embodiments, deep learning techniques are applied in the virtual content rendering system 100 to process video data, static image data, or inertial sensor data captured by the AR glasses 150. Device poses are recognized and predicted based on such video, static image, and/or inertial sensor data using a data processing model. Training of the data processing model is optionally implemented by the server 102 or AR glasses 150. Inference of the device poses is implemented by each of the server 102 and AR glasses 150 independently or by both of the server 102 and AR glasses 150 jointly. Also, a camera and a display of the AR glasses 150 have respective device poses that are equal to or slightly adjusted from the device pose of the AR glasses 150. A field of view of the camera of the AR glasses 150 is consistent with, but larger than, a field of view of the display of the AR glasses 150.
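
The learned data processing model itself is not reproduced here. As one illustrative ingredient of pose prediction, the sketch below shows how inertial measurements can propagate the last visually estimated position between camera frames; this simple dead-reckoning step, and the assumption that gravity has already been removed from the accelerometer readings, are simplifications made for the example.

```python
# Illustrative dead-reckoning step: propagate the last visually estimated
# position forward using inertial measurements between camera frames. This is
# a simplification, not the learned data processing model described above.
import numpy as np

def propagate_position(position: np.ndarray, velocity: np.ndarray,
                       accel_world: np.ndarray, dt: float):
    """One IMU integration step; accel_world has gravity already removed."""
    new_velocity = velocity + accel_world * dt
    new_position = position + velocity * dt + 0.5 * accel_world * dt ** 2
    return new_position, new_velocity

# Example: predict 10 ms ahead from a standstill under 0.5 m/s^2 forward acceleration.
p, v = propagate_position(np.zeros(3), np.zeros(3), np.array([0.5, 0.0, 0.0]), 0.01)
```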

[0032] Each of the one or more imaging devices 160 is, for example, a camera, a depth sensor, a light detection and ranging (LiDAR) sensor, or a laser scanner. In some embodiments, the imaging devices 160 are stationary. In some embodiments, the imaging devices are pan-tilt-zoom (PTZ) imaging devices 160. In some embodiments, the imaging devices 160 include tracking cameras, infrared sensors, CMOS sensors, etc. In some embodiments, the one or more internet of things (IOT) sensors 170 include a thermostat, an alarm system, a smart TV, a smart appliance, a smart lock, a smart speaker, and/or other devices that collect information of a building or premises or user inputs to perform one or more actions in a premises (e.g., a building structure).

[0033] The collected data or user inputs from the client devices 104, AR glasses 150, imaging devices 160, IOT sensors 170, or a combination thereof can be processed locally (e.g., for training and/or for prediction) at each device and/or remotely by the server(s) 102. The one or more servers 102 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 104, AR glasses 150, imaging devices 160, or IOT sensors 170, and in some embodiments, process the data and user inputs received from the client device(s) 104, AR glasses 150, imaging devices 160, or IOT sensors 170 when the user applications are executed on the client devices 104, AR glasses 150, imaging devices 160, or IOT sensors 170. In some embodiments, the virtual content rendering system 100 further includes a storage 106 for storing data related to the servers 102, client devices 104, AR glasses 150, imaging devices 160, or IOT sensors 170 and applications executed on the devices. For example, storage 106 may store video content (including visual and audio content), static visual content, and/or inertial sensor data for training a machine learning model (e.g., a deep learning network). Alternatively, storage 106 may also store video content, static visual content, and/or inertial sensor data obtained by a client device 104, AR glasses 150, imaging device 160, or IOT sensor 170 to which a trained machine learning model can be applied to determine one or more poses associated with the video content, static visual content, and/or inertial sensor data.

[0034] The one or more servers 102 can enable real-time data communication with the client devices 104, AR glasses 150, imaging devices 160, or IOT sensors 170 that are remote from each other or from the one or more servers 102. Further, in some embodiments, the one or more servers 102 can implement data processing tasks that cannot be or are preferably not completed locally by the client devices 104, AR glasses 150, imaging devices 160, or IOT sensors 170. For example, the client devices 104 include a game console (e.g., the head-mounted display 150) that executes an interactive online gaming application. The game console receives a user instruction and sends it to a game server 102 with user data. The game server 102 generates a stream of video data based on the user instruction and user data and provides the stream of video data for display on the game console and other client devices that are engaged in the same game session with the game console. In another example, the client devices 104 include a networked surveillance camera and a mobile phone 104C. The networked surveillance camera collects video data and streams the video data to a surveillance camera server 102 in real time. While the video data is optionally pre-processed on the surveillance camera, the surveillance camera server 102 processes the video data to identify motion or audio events in the video data and share information of these events with the mobile phone 104C, thereby allowing a user of the mobile phone 104C to monitor the events occurring near the networked surveillance camera in real time and remotely.

[0035] The one or more servers 102, one or more client devices 104, AR glasses 150, imaging devices 160, IOT sensors 170, and storage 106 are communicatively coupled to each other via one or more communication networks 108, which are the medium used to provide communications links between these devices and computers connected together within the virtual content rendering system 100. The one or more communication networks 108 may include connections, such as wire, wireless communication links, or fiber optic cables. Examples of the one or more communication networks 108 include local area networks (LAN), wide area networks (WAN) such as the Internet, or a combination thereof. The one or more communication networks 108 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol. A connection to the one or more communication networks 108 may be established either directly (e.g., using 3G/4G/5G connectivity to a wireless carrier), or through a network interface 110 (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof. As such, the one or more communication networks 108 can represent the Internet, a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other electronic systems that route data and messages.

[0036] In some embodiments, deep learning techniques are applied in the virtual content rendering system 100 to process content data (e.g., video data, visual data, audio data) obtained by an application executed at a client device 104 to identify information contained in the content data, match the content data with other data, categorize the content data, or synthesize related content data. The content data may broadly include inertial sensor data captured by inertial sensor(s) of a client device 104. In these deep learning techniques, data processing models are created based on one or more neural networks to process the content data. These data processing models are trained with training data before they are applied to process the content data. In some embodiments, both model training and data processing are implemented locally at each individual client device 104 (e.g., the client device 104C and head-mounted display 150). The client device 104 or head-mounted display 150 obtains the training data from the one or more servers 102 or storage 106 and applies the training data to train the data processing models.

[0037] Subsequent to model training, the client devices 104, AR glasses 150, imaging devices 160, and/or IOT sensors 170 obtain the content data (e.g., capture video data via an internal and/or external image sensor, such as a camera) and process the content data using the trained data processing models locally. Alternatively, in some embodiments, both model training and data processing are implemented remotely at a server 102 (e.g., the server 102A) associated with a client device 104 (e.g., the client device 104A and head-mounted display 150). The server 102A obtains the training data from itself, another server 102, or the storage 106 and applies the training data to train the data processing models. The client devices 104, head-mounted display 150, imaging devices 160, and/or IOT sensors 170 obtain the content data, send the content data to the server 102A (e.g., in an application) for data processing using the trained data processing models, receive data processing results (e.g., recognized or predicted device poses) from the server 102A, present the results on a user interface (e.g., associated with the application), render virtual objects in a field of view based on the poses, or implement some other functions based on the results.

[0038] The client devices 104, HMD 150, imaging devices 160, and/or IOT sensors 170 themselves implement no or little data processing on the content data prior to sending them to the server 102A. Additionally, in some embodiments, data processing is implemented locally at a client device 104, HMD 150, imaging device 160, and/or IOT sensor 170, while model training is implemented remotely at a server 102 (e.g., the server 102B) associated with the client device 104, HMD 150, imaging devices 160, and/or IOT sensors 170. The server 102B obtains the training data from itself, another server 102, or the storage 106 and applies the training data to train the data processing models. The trained data processing models are optionally stored in the server 102B or storage 106. The client devices 104, HMD 150, imaging devices 160, and/or IOT sensors 170 import the trained data processing models from the server 102B or storage 106, process the content data using the data processing models, and generate data processing results to be presented on a user interface or used to initiate some functions (e.g., rendering virtual objects based on device poses) locally.

[0039] Figure 2 is an overview of a computer environment 200 for rendering virtual content, in accordance with some embodiments. The computer environment 200 includes one or more imaging devices 210 (similar to imaging devices 160; Figure 1), one or more client devices (e.g., mobile phone 104C, AR glasses 150, etc.), one or more sensors 217 (similar to IOT sensors 170; Figure 1), and one or more users 215. The imaging devices 210 and sensors 217 are located in a building structure 205. Each imaging device 210 or sensor 217 is optionally fixed at a respective location or moveable within the building structure 205. For example, the imaging device 210-3 is fixed (e.g., a stationary imaging device 210-3), and the pan-tilt-zoom (P/T/Z) imaging device 210-1 is moveable and can be repositioned within the building structure 205. The IOT sensors 217 include one or more of a thermostat, barometer, lights, proximity sensors, light sensors, contact sensor, depth sensor, laser scanner, and/or other sensors. Sensor 217-1 through sensor 217-3 are installed throughout building 205. In some embodiments, the one or more imaging devices 210 and the sensors 217 are communicatively coupled in the computer environment 200 and are both referred to as IOT devices 305 (Figure 3A). Each client device 104 is moveable or fixed in the building structure 205. For example, a smart TV is fixed in a room, and a mobile device 104C and AR glasses 150 are carried by a user and moved around the building structure 205. The client devices 104 are communicatively coupled in the computer environment 200. In some embodiments, a client device 104 (e.g., a mobile device 104C) includes an imaging device 210, an IOT sensor 217, or both. Any two distinct devices of the one or more client devices 104 and the one or more imaging devices 210 have respective fields of view that optionally overlap or do not overlap as described and shown in Figure 3B.

[0040] In some embodiments, the computer environment 200 obtains location information of each of the client devices 104, imaging devices 210, and/or sensors 217, thereby allowing a determination of a relative location of each of the client devices 104, imaging devices 210, and/or sensors 217 with reference to a location of one another. In some embodiments, the computer environment 200 uses the imaging devices 210 and/or sensors 217 to scan and map the environment of the building 205, i.e., to scan the building and obtain a map including a respective location of each fixed and/or moving imaging device 210 and/or sensor 217. Additionally, in some embodiments, the computer environment 200 uses the client devices 104 to scan and map the environment of the building 205. The mapped environment includes a respective location of each imaging device 210 and/or sensor 217. In some embodiments, the mapped environment includes a respective location of each client device 104. Alternatively or additionally, in some embodiments, the mapped environment includes a dense map of each portion of the building 205 (e.g., rooms, patio area, different floors, etc.) that includes a field of view of one or more imaging devices 210, IOT sensors 217, and/or client devices 104. The mapped environment of the building 205 is stored in memory of the computer environment 200. In some embodiments, the computer environment 200 includes mapped environments for one or more buildings, each building being scanned and mapped by its respective imaging devices 210 and/or sensors 217, and, in some embodiments, the client devices 104. In some embodiments, the computer environment 200 uses the mapped environments to allow for interaction between one or more users 215.
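
By way of illustration, the mapped environment described above may be represented as a small registry that stores a location for each device and yields its location relative to a moveable first device; the data layout below is an assumption made for the example.

```python
# Minimal sketch of a mapped environment that stores a respective location
# for each device and computes relative locations with respect to a moveable
# first device (illustrative data layout only).
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class PremisesMap:
    device_locations: Dict[str, Vec3] = field(default_factory=dict)  # filled during scanning/mapping

    def relative_location(self, device_id: str, first_device_location: Vec3) -> Vec3:
        x, y, z = self.device_locations[device_id]
        rx, ry, rz = first_device_location
        return (x - rx, y - ry, z - rz)

# Example: locate imaging device 210-1 relative to AR glasses at (2.0, 1.0, 0.5).
m = PremisesMap({"210-1": (5.0, 4.0, 2.5)})
print(m.relative_location("210-1", (2.0, 1.0, 0.5)))  # (3.0, 3.0, 2.0)
```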

[0041] The computer environment 200 allows users associated with the computer environment 200 to interact with other users associated with the computer environment 200. In particular, the computer environment 200 allows users to audibly and/or visually interact with one another (e.g., via directional sounds, avatars, shared image data, shared video data, and/or other interactions). In some embodiments, the users interact with one another without being in physical proximity to one another. Physical proximity, for purposes of this disclosure, means within eyesight or earshot. In some embodiments, the computer environment 200 facilitates interactions among users located at separate areas, rooms, or floors of the same building. For example, referring to Figure 2, a first user 215-1, second user 215-2, and third user 215-3 are in three different rooms of the same building, and cannot visually see each other because of walls of the building. The second user 215-2 and the third user 215-3 establish a communication session 219 between one another. Triangle 221 represents the different user communications that can be established by the three users 215-1, 215-2, and 215-3. In some embodiments, more than one communication session is established at a time in the building. For example, in some embodiments, the computer environment 200 enables two distinct communication sessions (e.g., a first communication session between the second user 215-2 and third user 215-3, and a second communication session between the first user 215-1 and another user) simultaneously. In some embodiments, a single communication session involves more than two users.

[0042] In some embodiments, during a communication session, a device pose of a first device 250 (e.g., AR glasses 150 worn by the second user 215-2) is determined. A subset of building devices 104, 210, and 217 (e.g., one or more imaging devices 210, sensors 217, and/or client devices 104) is identified based on the device pose of the first device 250. The first device 250 and the building devices 104, 210, and 217 are distributed in a building 205 (or a premises). For example, as shown in Figure 2, the one or more imaging devices 210, sensors 217, AR glasses 150, and mobile phone 104C are all spread around the building 205. In some embodiments, in accordance with a determination that the first device 250 has access to information captured by the subset of building devices 104, 210, and 217, the first device 250 collects the information captured by the subset of building devices 104, 210, and 217, and generates virtual content based on the collected information independently or jointly with the server 102. For example, the HMD 150 worn by the second user 215-2 has a device pose facing the third user 215-3. Based on the device pose, the devices 210-1, 210-2, 210-5, 210-7, 210-8, 210-9, 217-1, and 217-3 are in a field of view of the HMD 150 worn by the second user 215-2. The second user 215-2 optionally has access to a subset or all of the devices in the field of view of the HMD 150 worn by the second user 215-2. The information collected by the subset of these devices is then used to generate the virtual content for display on the HMD 150 worn by the second user 215-2.

[0043] In some embodiments, the first device 250 and the subset of building devices 104, 210, and 217 are associated with a user account on an online platform hosted by a server 102 (Figure 1), thereby allowing the first device 250 to have the access to the collected information of the subset of building devices 104, 210, and 217. In some embodiments, the first device 250 and the building devices 104, 210, and 217 are configured to communicate with each other via one or more communication networks for access control. In some embodiments, the first device 250 does not have access to information collected by a specific electronic device (e.g., sensor 217-3) although the specific electronic device is within a field of view of the first device 250. The first device 250 generates the virtual content based on the collected information, independently of the information collected by the specific electronic device.
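
By way of illustration, the account-based access determination described in this paragraph can be pictured as a simple filter over the devices in the field of view; the dictionary fields below are assumptions made for the example.

```python
# Illustrative account-based access filter for the devices in the field of
# view of the first device 250 (dictionary fields are assumptions).
def accessible_devices(first_device_account: str, devices_in_view):
    return [d for d in devices_in_view
            if d.get("account") == first_device_account and d.get("shared", True)]

# Example: sensor 217-3 is in view but belongs to another account, so the
# virtual content is generated independently of its data.
devices = [
    {"id": "210-1", "account": "user-215-2", "shared": True},
    {"id": "217-3", "account": "other",      "shared": True},
]
print([d["id"] for d in accessible_devices("user-215-2", devices)])  # ['210-1']
```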

[0044] The computer environment 200 allows users 215 located within the same building to initiate an AR session with each other. A user initiates an AR session using AR glasses 150 or another artificial reality device. In some embodiments, the AR glasses 150 are used to select one or more other users to connect with. When an AR session is initiated, each user’s location with respect to the pre-built map of the building 205 is determined. In some embodiments, a user’s location (e.g., a location of the user 215-2 who is wearing the HMD 150) is determined in real time with respect to the pre-built map of the building using the AR glasses 150. The image data and/or video data captured by the HMD 150 are analyzed to identify feature points, and the identified feature points are compared with the pre-built map to determine the location of the HMD 150 (i.e., the first device 250). Alternatively, in some embodiments, a user’s pose (position and orientation) with respect to the pre-built map is tracked using the one or more imaging devices 210 and/or sensors 217 that are external to the user. For example, the user 215-1 does not wear any HMD 150, and a location of the user 215-1 is tracked by the imaging device 210-3. When the user 215-1 moves to a room 225-3, the location of the user 215-1 continues to be monitored by the imaging device 210-6. Computer vision based human tracking is performed at the imaging device 210-3 or 210-6. In some embodiments, the computer environment 200 is configured to track the pose of each device using the plurality of imaging devices 210 and/or the one or more sensors 217 included in the building 205. Alternatively or additionally, in some embodiments, the computer environment 200 is configured to track the pose of each user 215 using the one or more client devices 104 that are optionally carried by the user 215.
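
By way of illustration, the comparison of identified feature points against the pre-built map may resemble the sketch below, which matches ORB descriptors from the current HMD frame to stored map features and estimates a camera pose with a PnP solver; the OpenCV calls are standard, but the overall flow is an assumption made for the example rather than the specific relocalization algorithm of this disclosure.

```python
# Rough sketch of visual relocalization against a pre-built map: match ORB
# descriptors from the current HMD frame to stored map features and estimate
# the device pose from the 2D-3D correspondences (illustrative only).
import cv2
import numpy as np

def relocalize(frame_gray: np.ndarray, map_descriptors: np.ndarray,
               map_points_3d: np.ndarray, camera_matrix: np.ndarray):
    # map_descriptors: uint8 ORB descriptors stored with the map;
    # map_points_3d: the corresponding 3D points in the map frame (row-aligned).
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    if descriptors is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, map_descriptors)
    if len(matches) < 6:
        return None
    image_pts = np.float32([keypoints[m.queryIdx].pt for m in matches])
    object_pts = np.float32([map_points_3d[m.trainIdx] for m in matches])
    ok, rvec, tvec, _ = cv2.solvePnPRansac(object_pts, image_pts, camera_matrix, None)
    return (rvec, tvec) if ok else None
```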

[0045] In some embodiments, the subset of building devices 104, 210, and 217 that is accessible to the first device 250 includes at least one imaging device 210. The respective imaging device 210 is identified according to a device position and orientation of the first device 250. For example, in Figure 2, the second user 215-2 wearing the AR glasses 150 faces towards the third user 215-3, and the AR glasses 150 are oriented towards the subset of building devices 104, 210, and 217 including the first imaging device 210-1. The AR glasses 150 confirm that the user 215-2 is permitted to access the subset of building devices 104, 210, and 217, and initiate a new user interaction 219 or participate in an ongoing user interaction 219 with the user 215-3 via the first imaging device 210-1 of the subset of building devices 104, 210, and 217. In some embodiments, the subset of building devices 104, 210, and 217 includes at least two imaging devices 210 identified according to the device position and orientation of the first device 250. For example, the subset of building devices 104, 210, and 217 includes both of the imaging devices 210-1 and 210-7, when the second user 215-2 faces towards the third user 215-3. Image data captured by both the imaging devices 210-1 and 210-7 are used in the user interaction 219. In some embodiments, based on the device pose of the first device 250, the subset of building devices 104, 210, and 217 accessible to the user 215-2 includes a ninth imaging device 210-9, which is in the field of view of the first device 250 worn by the user 215-2. The collected information includes image data and/or video data captured by the respective imaging devices 210 of the subset of building devices 104, 210, and 217.

[0046] In some embodiments, the user 215-2 (specifically, a user account of the user 215-2) has access to all building devices 104, 210, and 217 that are located in the direction to which the first device 250 faces. In some embodiments, the first device 250 faces a plurality of building devices 104, 210, and 217. The user 215-2 only has access to a subset (e.g., less than all) of the plurality of building devices 104, 210, and 217 faced by the first device 250. The user 215-2 does not have access to data captured by one or more of the building devices 104, 210, and 217 faced by the first device 250.

[0047] In some embodiments, the imaging devices 210 of the subset of building devices 104, 210, and 217 have respective fields of view that intersect with an extension of a field of view of the first device 250. In some embodiments, one or more fields of view of the imaging devices 210 of the subset of building devices 104, 210, and 217 are separated or partially separated from the field of view of the first device 250 by at least a structure. The structure can be a part of the building (e.g., a wall, a door, etc.), furniture, appliances, users, and/or fixtures within the building 205. In some embodiments, a structure is anything that occludes a user’s line of sight. The fields of view are discussed in detail below with reference to Figure 3B.

[0048] In some embodiments, virtual content is rendered on a display of the first device 250, e.g., on top of a first field of view of the display of the first device 250. For example, if the first device 250 is a pair of AR glasses 150, the virtual content is rendered on top of a first field of view of the display of the AR glasses 150. In some embodiments, while a user interaction between at least two users is ongoing, one or more users of the user interaction can see through walls of the building 205 and interact with each other in a personalized and intimate manner enabled by the displayed virtual content.

[0049] In some embodiments, the first device 250 is located in a first portion of the building (e.g., room 225-1), and one or more devices of the subset of building devices 104, 210, and 217 are located at a second portion of the building (e.g., room 225-2) that is at least partially separated from the first portion by the structure. For example, in Figure 2, the user interaction 219 includes the AR glasses 150 and the first imaging device 210-1. The AR glasses 150 are worn by the second user 215-2 and located in a first room 225-1 of the building 205, and the third user 215-3 is inside a second room 225-2 of the building 205. The third user 215-3 is identified in the image data and/or video data captured by the imaging device 210-1 or 210-7, and virtual content (e.g., an avatar of the third user 215-3) is generated based on the user 215-3 and rendered in the field of view of the AR glasses 150. In some embodiments, the generated avatar is predefined, stylized, photorealistic, based on a photo of the user 215, etc. Additional detail on the virtual content and avatars rendered on top of a field of view of a device is provided below with reference to Figure 4.

[0050] The computer environment 200 provides a new interaction paradigm for collaboration among the users 215, and particularly, AR-based collaboration among the users. In some embodiments, two users 215 interact using the AR glasses 150, and the computer environment 200 causes each user’s HMD 150 to render, in a respective field of view of the HMD 150, an avatar of the other user. For example, the first device 250 includes a first HMD 150-1 carried by the second user 215-2 and a second HMD 150-2 carried by the third user 215-3. Based on a respective pose of the first or second HMD 150, one or more imaging devices 210 are identified within the field of view of the first or second HMD 150 to capture the image data of the third user 215-3 or second user 215-2, respectively. For example, the imaging devices 210-1 and 210-7 are identified in an extended field of view of the first HMD 150-1, and the third user 215-3 is identified to be located in the extended field of view of the first HMD 150-1. The imaging device 210-9 is identified to be located in an extended field of view of the second HMD 150-2 worn by the third user 215-3, and the second user 215-2 is identified to be located in the extended field of view of the second HMD 150-2 worn by the third user 215-3. After the imaging devices 210 are identified in the extended fields of view of the first and second HMDs, content (e.g., avatars and messages) is rendered for the first and second HMDs 150 based on the image data captured by these identified imaging devices 210. For example, an avatar of the user 215-3 is created based on the image data captured by the imaging device 210-1 and displayed in the first HMD 150-1 worn by the second user 215-2, and an avatar of the user 215-2 is created based on the image data captured by the imaging device 210-9 and displayed in the second HMD 150-2 worn by the third user 215-3.

[0051] In some embodiments, in a communication session (also called interaction 219), both a message and an avatar of a distinct user 215 are displayed on top of a field of view of the first device 250 associated with a user 215. For example, the first HMD 150-1 carried by the second user 215-2 is configured to display an avatar of the third user 215-3, and the second HMD 150-2 is configured to display an avatar of the second user 215-2. In some embodiments, more than one avatar is presented to the user 215. For example, if the first user 215-1 is part of the user interaction 219, another avatar for the first user 215-1 is rendered for display on the first HMD 150-1 based on image data and/or video data captured by the imaging device 210-3. Specifically, two separate avatars representing the first user 215-1 and third user 215-3 are displayed on top of a field of view of the first HMD 150-1, and two separate avatars representing the first user 215-1 and second user 215-2 are displayed on top of a field of view of the second HMD 150-2.

[0052] It is noted that in some embodiments, the second user 215-2 wearing the first HMD 150-1 is engaged in a communication session with the first user 215-1 who does not wear any HMD 150 and communicates via a mobile device. In some embodiments, the second user 215-2 wearing the first HMD 150-1 is not engaged in any communication session, but can see through the building structure 205 to select a user to initiate a communication session via a client device 104 carried by the selected user or via an IOT device 217 (e.g., a speaker) located in proximity to the selected user. In some embodiments, the second user 215-2 wearing the first HMD 150-1 can see through the building structure 205 to select an IOT device 217 and review data provided by the IOT device. For example, the selected IOT device 217 is a thermostat, and the second user 215-2 reviews temperature values set by the thermostat on the first HMD 150-1.

[0053] In some embodiments, the message is displayed on the first device 250 (e.g., the first HMD 150-1), and includes text, image data, and/or video data. In some embodiments, after the respective imaging devices 210 and users 215 are identified in the extended fields of view of the HMDs 150, the HMDs 150 exchange voice messages and/or voice data. In some embodiments, the message (audio and/or visual data) is dynamically displayed and updated on the first device 250 based on the device pose of the first device 250. Users 215 interact and/or communicate in a number of different ways. In some embodiments, a first user 215-1 appears to see through the wall by seeing a full body avatar of a second user 215-2 that is projected behind the wall corresponding to a real-world position of the second user 215-2. In some embodiments, room layout wireframes are visualized to provide a more immersive experience. Alternatively, in some embodiments, the first user 215-1 can talk with the second user 215-2, and directional audio acts as if it comes from the real-world position of the second user 215-2. For example, if the second user 215-2 is in a southwestern location relative to the first user 215-1, the first user 215-1 receives audio from the southwestern location. In some embodiments, the first user 215-1 transfers digital content to the second user 215-2, and the content is shown virtually moving from a direction associated with the first user 215-1, i.e., coming into view of the first device 250 from a northeastern direction. For example, if the first user 215-1 is on the right of the second user 215-2 and sends a picture to the second user 215-2, the second user 215-2 sees a virtual copy of the picture moving from the right to the left on a display. In some embodiments, both users 215-2 and 215-3 see each other’s avatar and interact with each other in games or other collaborative tasks such as product design. In some embodiments, a first user 215-1 can choose to share his or her own virtualized environment (e.g., room) with the second user 215-2. In some embodiments, access control can be implemented. For example, a user 215 can define permissions to allow other users 215 to see him or her within the computer environment 200. A user 215 can also opt not to share his or her location and/or environment with other users 215. If the imaging devices 210-1 and 210-7 opt not to share their image data with a user account of the second user 215-2, the second user 215-2 cannot see the third user 215-3 in its extended field of view via the imaging devices 210-1 and 210-7. Conversely, if the second HMD 150-2 of the third user 215-3 opts not to share its location, the second user 215-2 sees the third user 215-3 in its extended field of view via the imaging devices 210-1 and 210-7, but cannot initiate a communication session with the third user 215-3 via the second HMD 150-2.
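
By way of illustration, the directional audio described above can be derived from the two users’ positions on the premises map; the stereo panning and distance attenuation in the sketch below are simplifying assumptions made for the example.

```python
# Illustrative derivation of a direction for spatialized audio from the two
# users' positions on the premises map, so the remote voice appears to come
# from the talker's real-world location (stereo pan is a simplification).
import math

def audio_direction(listener_pos, listener_yaw, talker_pos):
    dx = talker_pos[0] - listener_pos[0]
    dy = talker_pos[1] - listener_pos[1]
    # Azimuth of the talker relative to where the listener is facing (radians).
    azimuth = (math.atan2(dy, dx) - listener_yaw + math.pi) % (2 * math.pi) - math.pi
    distance = math.hypot(dx, dy)
    pan = math.sin(azimuth)          # -1.0 = fully left, +1.0 = fully right
    gain = 1.0 / max(distance, 1.0)  # simple distance attenuation
    return azimuth, pan, gain
```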

[0054] In some embodiments, the computer environment 200 identifies an interested object based on the collected information captured by the subset of building devices 104, 210, and 217, and the virtual content includes a virtual object generated based on the interested object. In some embodiments, the interested objects include moving objects (e.g., pets, toys), entry points (e.g., doors, windows), unique objects (e.g., artwork), and/or other objects that a user 215 displays or presents to others. In some embodiments, the collected information includes a temperature value measured by a thermostat (or other sensor information), and the virtual content includes the temperature value (or other sensor data) that is rendered for display to the user 215-2 of the first device 250.

[0055] Figure 3A is a block diagram of a computer system 300 for rendering virtual content, in accordance with some embodiments. In some embodiments, the computer system 300 includes one or more IOT devices 305, an IOT sensor calibration module 307, mapping information 309, a server 102, and client devices 104 (e.g., HMDs 150 in Figures 1 and 2) associated with individual users 315. In some embodiments, the IOT devices 305 are distributed in a premises (e.g., building 205; Figure 2). In some embodiments, one or more users 315 carry one or more client devices 104, such as AR glasses 150 (also called HMDs 150). The one or more client devices 104 are communicatively coupled in the computer system 300 such that the client devices 104 have access to the IOT devices 305, mapping information 309, and the server 102. In some embodiments, the computer system 300 defines a subset of the IOT devices 305 and mapping information 309 to which a client device 104 has access. In some embodiments, the IOT devices 305, client devices 104, server 102, and other components of the computer system 300 are communicatively coupled together via one or more communication networks 108. In some embodiments, the server 102 uses mapping information 309 of the premises to initiate and/or enable one or more user interactions (also called communication sessions) in an AR collaboration environment described herein.

[0056] In some embodiments, the computer system 300 requires a pre-scanned map of the environment as input. Mapping is performed using one or more devices, such as one or more imaging devices (e.g., cameras), one or more sensors (e.g., depth sensors), LiDAR sensors, laser scanners, etc. In some embodiments, mapping is performed using a 3D reconstruction process (i.e., reconstructing a virtual representation of the scanned environment). In some embodiments, the one or more devices distributed in the premises and/or used for mapping the environment are collectively referred to as IOT devices 305. For example, the IOT devices 305 include one or more imaging devices 210 and one or more sensors 217 (Figure 2). Additionally, in some embodiments, the IOT devices 305 include client devices 104 and/or other smart devices (e.g., smart appliances, smart TVs, etc.). The constructed maps are stored in the mapping information 309 for access by the computer system 300 and/or other devices connected to the computer system 300.
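
A minimal sketch of how the constructed map and registered device locations might be organized in the mapping information 309 is shown below; the MappingInformation and DevicePose structures and their fields are assumptions made for illustration, not data formats specified by this application.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class DevicePose:
    """Registered position (meters) and yaw (degrees) of a device with
    respect to the pre-scanned map; the fields are illustrative."""
    position: Tuple[float, float, float]
    yaw_deg: float

@dataclass
class MappingInformation:
    """A minimal stand-in for mapping information 309: a reconstructed
    layout plus the registered locations of the IOT devices."""
    rooms: Dict[str, Tuple[float, float, float, float]] = field(default_factory=dict)  # name -> (x_min, y_min, x_max, y_max)
    device_poses: Dict[str, DevicePose] = field(default_factory=dict)

    def register_device(self, device_id: str, pose: DevicePose) -> None:
        # Record where a calibrated device sits within the pre-built map.
        self.device_poses[device_id] = pose

premises_map = MappingInformation(rooms={"living_room": (0.0, 0.0, 6.0, 4.0),
                                         "office": (6.0, 0.0, 10.0, 4.0)})
premises_map.register_device("camera-210-1", DevicePose((1.0, 3.8, 2.4), yaw_deg=-90.0))
print(premises_map.device_poses["camera-210-1"])
```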

[0057] In some embodiments, the IOT devices 305 are pre-installed, calibrated, and registered to their respective locations with respect to the map using the IOT sensor calibration module 307. For example, imaging devices 210 and/or sensors 217 are registered using a visual relocalization algorithm. The IOT sensor calibration module 307 analyzes image data and/or video data captured by the imaging devices 210 and compares visual similarities with the pre-built map. This is optionally implemented in real time, allowing mobile devices (e.g., cameras mounted on moving platforms) to be used. In some embodiments, one or more sensors are manually added to the 3D digital model or registered using calibration algorithms designed for such sensors. For example, in some embodiments, a thermostat's location is registered with respect to the map manually. The registered locations of the IOT devices 305 are stored with the constructed map information in the mapping information 309.
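
The idea of registering a camera against the pre-built map by comparing visual similarities can be sketched as follows. This is a simplified stand-in for a visual relocalization algorithm: it matches a global image descriptor of the query frame against descriptors of map keyframes with known poses using cosine similarity. The relocalize function, the descriptor representation, and the toy data are illustrative assumptions.

```python
import numpy as np

def relocalize(query_descriptor: np.ndarray,
               keyframe_descriptors: np.ndarray,
               keyframe_poses: list):
    """Return the map pose of the keyframe whose global descriptor is most
    similar (cosine similarity) to the query image's descriptor. A real
    relocalization pipeline would refine this with local feature matching."""
    q = query_descriptor / np.linalg.norm(query_descriptor)
    k = keyframe_descriptors / np.linalg.norm(keyframe_descriptors, axis=1, keepdims=True)
    similarities = k @ q
    best = int(np.argmax(similarities))
    return keyframe_poses[best], float(similarities[best])

# Toy example: three map keyframes with 8-dimensional descriptors and poses (x, y, yaw).
rng = np.random.default_rng(0)
keyframes = rng.normal(size=(3, 8))
poses = [(1.0, 3.8, -90.0), (5.5, 0.5, 180.0), (8.0, 2.0, 90.0)]
query = keyframes[1] + 0.05 * rng.normal(size=8)   # an image taken near keyframe 1
print(relocalize(query, keyframes, poses))
```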

[0058] In some embodiments, each user associated with the computer system 300 is able to select one or more other users with whom to establish communication. A user who has not established a user interaction or initiated a connection with another user has a free role 315-1. For example, in Figure 3A, a first user 215-1 has a free role 315-1. In some embodiments, a user initiates an interaction (e.g., a communication session) with another user associated with the computer system 300. For example, a user uses a client device 104, such as the first HMD 150-1, to initiate a user interaction with other users. Users that have established a user interaction or initiated a connection with another user assume active user roles. For example, an ongoing AR session involves the second user 215-2 having a second user role 315-2 and the third user 215-3 having a third user role 315-3.

[0059] In some embodiments, each HMD 150-1 or 150-2 that implements the AR session includes a relocalization module 317 and/or an interaction module 319. Each user's location with respect to the pre-built map of the premises is determined via the relocalization module 317 of a corresponding HMD 150. Alternatively, in some embodiments, each user's location with respect to the pre-built map of the premises is determined using the HMD 150 worn by the user or an imaging device 210 having a field of view that includes the respective user location. The interaction module 319 facilitates the interaction corresponding to the AR session based on the image data captured by the imaging devices 210 and the HMDs 150.

[0060] Figure 3B is a diagram illustrating different fields of view of client devices 104 and imaging devices 355 of a computer system 300 for rendering virtual content, in accordance with some embodiments. The one or more imaging devices 355 (similar to the imaging devices 160 and 210 described above in Figures 1 and 2, respectively) are distributed in a premises. The one or more client devices 104 (Figure 1; e.g., AR glasses 150) are also distributed in the premises. In some embodiments, one or more client devices 104 are worn by one or more users 360. For example, in Figure 3B, a first user 360-1 is carrying AR glasses 150. In some embodiments, the one or more client devices 104 and the one or more imaging devices 355 have respective fields of view. For example, a first imaging device 355-1 (specifically, a display associated with the first imaging device) has a first field of view 365-1, a second imaging device 355-2 has a second field of view 365-2, and a third imaging device 355-3 has a third field of view 365-3. Similarly, the AR glasses 150 carried by the first user 360-1 have a client device field of view 367. In some embodiments, one or more fields of view 365 of the one or more imaging devices 355 and/or the client devices 104 are separated or partially separated by at least a structure. As described above in reference to Figure 2, a structure is anything that occludes a user's line of sight (e.g., a wall 369) or acts as a physical barrier between users.

[0061] In some embodiments, a user 360 is located in one or more fields of view of one or more imaging devices 355 and/or one or more client devices 104. For example, a second user 360-2 is within a second field of view 365-2 of the second imaging device 355-2 and also within the field of view 367 of the AR glasses 150 carried by the first user 360-1. Similarly, a third user 360-3 is within the third field of view 365-3 of the third imaging device 355-3 and also within the client device field of view 367 of the AR glasses 150 carried by the first user 360-1. In some embodiments, a user 360 is within an intersecting region of the field of view 365 of an imaging device 355 and an extension of a client device field of view 367 of a client device 104, and/or a user 360 is within an intersecting region of the field of view 365 of an imaging device 355 and an extension of another field of view 365 of another imaging device 355.

[0062] In some embodiments, the computer system 300 determines a first field of view, a device orientation, and/or a device position of a display of a client device 104, such as AR glasses 150, based on the pose of the client device 104. The computer system 300 identifies one or more imaging devices 355 and/or other client devices 104 distributed in the premises that have a field of view that intersects with an extension of the field of view 367 of the display of the client device (AR glasses 150) and is separated from the AR glasses 150 field of view 367 by at least a structure. For example, the computer system 300 determines a device orientation and/or position for the AR glasses 150 based on the pose of the AR glasses 150 worn by the first user 360-1, and identifies at least the second imaging device 355-2 and the third imaging device 355-3, which each have a respective field of view (365-2 and 365-3) that intersects with the extension of the field of view 367 of the AR glasses 150 and is separated from the AR glasses 150 field of view 367 by at least the wall 369. The information collected from the identified imaging devices 355 and/or other client devices 104 includes image data and/or video data captured by the one or more imaging devices 355 and/or other client devices 104. Although the above example describes the identification of one or more imaging devices 355 and/or other client devices 104 based on the pose of a client device (e.g., AR glasses 150), in some embodiments, the computer system 300 uses the determined pose of an imaging device 355 to identify one or more other imaging devices 355 and/or client devices 104 distributed in the premises as described above. Alternatively, in some embodiments, the computer system 300 identifies one or more imaging devices 355 and/or client devices 104 based only on intersecting fields of view.
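
A simplified, top-down geometric sketch of this identification step is given below, assuming each field of view is modeled as a 2D sector (position, heading, half-angle, range) and the structure is a wall at a known distance along the viewing direction. The Fov2D class, the identify_devices function, and all parameter values are hypothetical and intended only to illustrate selecting cameras whose field of view intersects the extension of the display's field of view beyond the structure.

```python
import math
from dataclasses import dataclass

@dataclass
class Fov2D:
    """Simplified top-down field of view: position, heading, half-angle, range."""
    x: float
    y: float
    heading_deg: float
    half_angle_deg: float
    range_m: float

    def contains(self, px: float, py: float) -> bool:
        # A point is inside the sector if it is within range and within the
        # angular aperture around the heading direction.
        dx, dy = px - self.x, py - self.y
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > self.range_m:
            return False
        bearing = math.degrees(math.atan2(dy, dx))
        diff = (bearing - self.heading_deg + 180.0) % 360.0 - 180.0
        return abs(diff) <= self.half_angle_deg

def identify_devices(display_fov: Fov2D, candidates: dict, wall_distance_m: float,
                     max_extension_m: float = 15.0, step_m: float = 0.25) -> list:
    """Select cameras whose field of view intersects the *extension* of the
    display's field of view beyond a structure located wall_distance_m away
    along the viewing direction. Geometry and parameters are illustrative."""
    heading = math.radians(display_fov.heading_deg)
    selected = []
    for device_id, cam_fov in candidates.items():
        d = wall_distance_m + step_m
        while d <= max_extension_m:
            # Sample points along the central ray of the extended field of view.
            px = display_fov.x + d * math.cos(heading)
            py = display_fov.y + d * math.sin(heading)
            if cam_fov.contains(px, py):
                selected.append(device_id)
                break
            d += step_m
    return selected

# Example: the display looks east (+x); a wall sits 3 m ahead; camera-355-2
# covers the room behind the wall, while camera-355-3 never sees the ray.
display = Fov2D(0.0, 0.0, heading_deg=0.0, half_angle_deg=30.0, range_m=3.0)
cameras = {
    "camera-355-2": Fov2D(6.0, 2.0, heading_deg=-90.0, half_angle_deg=35.0, range_m=5.0),
    "camera-355-3": Fov2D(6.0, -8.0, heading_deg=90.0, half_angle_deg=35.0, range_m=4.0),
}
print(identify_devices(display, cameras, wall_distance_m=3.0))   # ['camera-355-2']
```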

[0063] In some embodiments, the computer system 300 identifies one or more persons within image data and/or video data captured by the identified imaging devices 355 and/or client devices 104 (identified based on a device pose of a first device as described above). For example, the computer system 300 utilizes a device pose of the AR glasses 150 worn by the first user 360-1 (in a first portion of the premises) to identify the second imaging device 355-2 and the third imaging device 355-3 (which are located at different portions of the premises), and further uses image data and/or video data captured by the second imaging device 355-2 and the third imaging device 355-3 to identify the second user 360-2 and the third user 360-3. In some embodiments, the captured image data and/or video data corresponds to the respective fields of view of the imaging devices 355 and/or client devices 104. In some embodiments, one or more imaging devices 355 and/or other client devices 104 are located at second portions of the premises, and the second portions are entirely separated or partially separated from the first portion of the premises where the first user 360-1 is located.
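
As a hedged sketch of identifying persons in the image data or video data captured by the identified devices, the fragment below uses OpenCV's stock HOG pedestrian detector as a stand-in; the application does not specify a particular detector, and the detect_persons helper and the synthetic frame are assumptions for illustration.

```python
import cv2
import numpy as np

def detect_persons(frame: np.ndarray):
    """Detect people in a frame from a remote imaging device. The stock HOG
    pedestrian detector is used here purely as a placeholder detector."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))
    return [tuple(map(int, box)) for box in boxes]  # (x, y, w, h) per detected person

# Placeholder frame standing in for video data captured by imaging device 355-2;
# a real deployment would read frames from the camera stream instead.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(detect_persons(frame))   # [] for an empty synthetic frame
```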

[0064] In some embodiments, a user 360 initiates (using a client device 104) a communication session with one or more other users 360 within the fields of view of the imaging devices 355 and/or client devices 104 identified based on a device pose of a first device 250. In some embodiments, the user 360 selects one or more other users 360 with whom to interact. For example, the first user 360-1 selects (using the AR glasses 150) the second user 360-2 to initiate an interaction. In some embodiments, the computer system 300 uses the image data and/or video data captured by the identified imaging devices 355 and/or client devices 104 to generate and render virtual content that is overlaid on the field of view of an imaging device 355 and/or client device 104 (e.g., using the AR glasses 150). In some embodiments, the virtual content generated by the computer system 300 includes an avatar generated based on the user engaged in the communication session.

[0065] Figure 4 is a diagram illustrating a virtual content environment 400 including virtual content rendered on a display, in accordance with some embodiments. In some embodiments, a first imaging device includes the display that displays images captured by the first imaging device in a first field of view. An example of the first imaging device is AR glasses 150 (worn by a first person 415-1), located in a first portion of the premises. When the virtual content 410 is displayed, the virtual content 410 is overlaid on top of the first field of view 407 of the display of the first imaging device. In some embodiments, the virtual content 410 is based on the collected information from one or more imaging devices 210 distributed in the premises. In some embodiments, the one or more imaging devices 210 distributed in the premises are located at different portions of the premises. In some embodiments, the one or more imaging devices 210 are separated at least partially from the first portion (where the first imaging device is positioned) by a structure (e.g., a wall, door, window, or curtain). For example, as shown in the virtual content environment 400, the imaging device 210 is in a second portion of the premises separated from the first portion of the premises by at least the wall 402. In some situations, the wall 402 is seen in the field of view of the AR glasses 150, and a scene located behind the wall 402 and captured by the one or more imaging devices 210 is displayed on the wall 402 as if the AR glasses 150 have a see-through capability.
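
One way such a see-through effect could be approximated is to warp the remote camera's view onto the wall region visible to the AR glasses and blend it into the display image. The sketch below, using OpenCV, assumes the wall quadrilateral in display coordinates is already known (e.g., from the premises map and the glasses' pose); the overlay_see_through function, the blending factor, and the synthetic frames are illustrative assumptions rather than the application's method.

```python
import cv2
import numpy as np

def overlay_see_through(display_frame: np.ndarray,
                        remote_frame: np.ndarray,
                        wall_quad: np.ndarray,
                        alpha: float = 0.6) -> np.ndarray:
    """Warp the remote camera's view onto the wall region seen by the AR
    glasses, giving an apparent see-through effect."""
    h, w = remote_frame.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    homography = cv2.getPerspectiveTransform(src, np.float32(wall_quad))
    warped = cv2.warpPerspective(remote_frame, homography,
                                 (display_frame.shape[1], display_frame.shape[0]))
    # Blend the warped remote view only inside the wall quadrilateral.
    mask = np.zeros(display_frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, wall_quad.astype(np.int32), 255)
    blended = display_frame.copy()
    region = mask > 0
    blended[region] = (alpha * warped[region] + (1 - alpha) * display_frame[region]).astype(np.uint8)
    return blended

# Synthetic stand-ins for the AR glasses' view and the remote camera's view.
display_frame = np.full((480, 640, 3), 40, dtype=np.uint8)
remote_frame = np.full((240, 320, 3), 200, dtype=np.uint8)
wall_quad = np.array([[200, 120], [440, 140], [430, 360], [210, 340]], dtype=np.float32)
result = overlay_see_through(display_frame, remote_frame, wall_quad)
print(result.shape)
```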

[0066] In some embodiments, the computer system 300 identifies a second person

415-2 in image data and/or video data captured by the imaging device 210, and renders virtual content 410 including an avatar 413 generated based on the second person 415-2. In some embodiments, the generated avatar 413 is predefined, stylized, or made photorealistic based on image data and/or video data of a user captured in real-time. In some embodiments, the avatar 413 is a predefined 3D avatar by an application developer or a stylized avatar generated by a character customization process. Alternatively or additionally, in some embodiments, the avatar 413 is generated from a photo of the person using a computer algorithm. In some embodiments, the second person 415-2 is located in a second field of view 409, specifically in an intersecting region of the second field of view 409 of the imaging device 210 and the extension of the first field of view 407 of the display of the AR glasses 150. It should be noted that the imaging device 210 does not have to be in the extension of the field of view of the AR glasses 150.

[0067] In some embodiments, the avatar 413 of the second person 415-2 is rendered on the first field of view 407 of the display of the AR glasses 150. In some embodiments, the computer system determines a posture of the second person 415-2 in the image data and/or video data captured by the imaging device 210, and the avatar 413 of the second person 415-2 is rendered in the first field of view 407 of the display of the AR glasses 150 with the posture of the second person 415-2. In some embodiments, the avatar 413 of the second person 415-2 is rendered on top of the first field of view 407 of the display of the AR glasses 150 without any background content 421 from the second field of view 409. Alternatively, in some embodiments, both the avatar 413 of the second person 415-2 and part of the background content from the second field of view 409 surrounding the second person 415-2 are rendered on top of the first field of view 407 of the display of the AR glasses 150. In some embodiments, the avatar 413 of the second person 415-2 and a layout wireframe associated with the second field of view 409 are rendered on top of the first field of view 407 of the display of the AR glasses 150. The layout wireframe associated with the second field of view 409 includes a room layout wireframe that provides a more immersive experience to users.

[0068] In some embodiments, the second person 415-2 carries a second HMD 150-2 (not shown in Figure 4), and the AR glasses 150 worn by the first person 415-1 include first AR glasses 150-1. After the imaging device 210 and the second person 415-2 in the second field of view 409 of the imaging device 210 are identified, the first AR glasses 150-1 receive a message 417 from the second HMD 150-2 carried by the second person 415-2. In some embodiments, the AR glasses 150-1 display the message 417 with the avatar 413 of the second person 415-2 on top of the first field of view 407 of the display of the AR glasses 150-1. In some embodiments, the message 417 is dynamically displayed with a variation of the device pose (i.e., movement of the AR glasses 150). In some embodiments, the second person 415-2 carries an HMD 150-2 (not shown), and, after the imaging device 210 and the second person 415-2 in the second field of view 409 of the imaging device 210 are identified, the first AR glasses 150-1 and the second HMD 150-2 carried by the second person 415-2 exchange voice messages 419 with one another.
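
The three rendering options and the optional message overlay described above can be summarized with a small sketch; the RenderMode enumeration, the compose_overlay function, and the placeholder layer objects are hypothetical names used only for illustration.

```python
from enum import Enum, auto

class RenderMode(Enum):
    AVATAR_ONLY = auto()             # avatar without any remote background
    AVATAR_WITH_BACKGROUND = auto()  # avatar plus surrounding background content
    AVATAR_WITH_WIREFRAME = auto()   # avatar plus a room layout wireframe

def compose_overlay(avatar, background_patch, wireframe, message, mode: RenderMode):
    """Assemble the overlay layers rendered on top of the first field of view.
    The layer objects here are opaque placeholders; a real renderer would
    composite textures or meshes instead of strings."""
    layers = [avatar]
    if mode is RenderMode.AVATAR_WITH_BACKGROUND:
        layers.insert(0, background_patch)
    elif mode is RenderMode.AVATAR_WITH_WIREFRAME:
        layers.insert(0, wireframe)
    if message is not None:
        layers.append(message)   # e.g., a text message displayed next to the avatar
    return layers

print(compose_overlay("avatar-413", "background-421", "room-wireframe",
                      "message-417", RenderMode.AVATAR_WITH_WIREFRAME))
```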

[0069] In some embodiments, one or more IOT sensors 170 or 217 (not shown in Figure 4) are distributed in the premises, and information is collected from the one or more IOT sensors. For example, in some embodiments, the IOT sensors include a thermostat and the collected information includes a temperature value 411 measured by the thermostat. In some embodiments, the virtual content 410 includes the temperature value 411 that is rendered for display to a user (e.g., the first person 415-1).

[0070] Figure 5 is a block diagram of a computer system 500, in accordance with some embodiments. The computer system 500 includes at least one of a client device 104, an imaging device 160, a server 102, an IOT sensor 170, or a combination thereof. An example of the client device 104 is a pair of AR glasses 150. The computer system 500 is optionally used as the computer system 300 or the content rendering system 100. The computer system 500 typically includes one or more processing units (CPUs) 502, one or more network interfaces 504, memory 506, and one or more communication buses 508 for interconnecting these components (sometimes called a chipset). The computer system 500 includes one or more input devices 510 that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture-capturing camera, or other input buttons or controls. The computer system 500 also includes one or more output devices 512 that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays. As described above in reference to Figure 1, in some embodiments, devices of the computer system 500 are communicatively coupled to each other via the one or more network interfaces 504 or communication buses 508.

[0071] In some embodiments, the computer system 500 includes AR glasses 150 having one or more imaging devices (e.g., tracking cameras, infrared sensors, CMOS sensors, etc.), scanners, or photo sensor units for capturing images or video, detecting users or objects of interest, and/or sensing environmental conditions (e.g., background scenery or objects). The AR glasses 150 also include one or more output devices that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays. Optionally, the AR glasses 150 include a location detection device, such as a GPS (global positioning satellite) or other geo-location receiver, for determining the location of the AR glasses 150. Optionally, the AR glasses 150 include an inertial measurement unit (IMU) integrating multi-axis inertial sensors to provide an estimation of a location and an orientation of the AR glasses 150 in space. Examples of the one or more inertial sensors include, but are not limited to, a gyroscope, an accelerometer, a magnetometer, and an inclinometer.

[0072] Memory 506 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 506, optionally, includes one or more storage devices remotely located from one or more processing units 502. Memory 506, or alternatively the non-volatile memory within memory 506, includes a non-transitory computer readable storage medium. In some embodiments, memory 506, or the non-transitory computer readable storage medium of memory 506, stores the following programs, modules, and data structures, or a subset or superset thereof:

• Operating system 515 including procedures for handling various basic system services and for performing hardware dependent tasks;

• Network communication module 516 for connecting the server 102 and other devices (e.g., client devices 104, head-mounted displays 150, imaging devices 160, IOT sensors 170, and/or storage 106) via one or more network interfaces 504 (wired or wireless) and one or more communication networks 108, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;

• User interface module 517 for enabling presentation of information (e.g., a graphical user interface for application(s) 520, widgets, websites and web pages thereof, and/or games, audio and/or video content, text, etc.) at each client device 104 and/or head-mounted display 150 via their respective output devices (e.g., displays, speakers, etc.);

• Input processing module 518 for detecting one or more user inputs or interactions from one of the one or more input devices 510 and interpreting the detected input or interaction;

• Web browser module 519 for navigating, requesting (e.g., via HTTP), and displaying websites and web pages thereof, including a web interface for logging into a user account associated with a client device 104 or another electronic device, controlling the client or electronic device if associated with the user account, and editing and reviewing settings and data that are associated with the user account;

• One or more user applications 520 for execution by the computer system 500 (e.g., games, social network applications, smart home applications, and/or other web or non-web based applications for controlling another electronic device and reviewing data captured by such devices);

• Model training module 521 for receiving training data (e.g., training data 532) and establishing a data processing model (e.g., data processing module 522) for processing content data (e.g., video data, visual data, audio data, sensor data) to be collected or obtained by a client device 104, head-mounted display 150, imaging device 160, or IOT sensor 170;

• Data processing module 522 for processing content data using data processing models 533, thereby identifying information contained in the content data, matching the content data with other data, categorizing the content data, or synthesizing related content data, where in some embodiments, the data processing module 522 is associated with one of the user applications 520 to process the content data in response to a user instruction received from the user application 520, and includes at least:

o FOV processing module 523 for processing a field of view of one or more of the client devices 104, head-mounted displays 150, and imaging devices 160, thereby identifying the coverage area of the fields of view, objects and/or users within the fields of view, and intersecting fields of view of different devices; and

o Mapping module 524 for mapping (i.e., generating one or more maps for) a building or premises, the map including at least a premises layout (e.g., rooms, room type (e.g., office, kitchen, living room, etc.), room size, room location, etc.), device locations (e.g., the location of an imaging device 160, IOT sensor 170, etc.), device field of view information, and/or building or premises information based on the content data obtained by one or more of the client devices 104, head-mounted displays 150, and imaging devices 160;

• Pose determination and prediction module 525 for determining and predicting a pose (position and orientation) of the client device 104 (e.g., AR glasses 150) and/or other devices, which further includes an SLAM (Simultaneous Localization and Mapping) module 526 for mapping a scene where a client device 104 and/or other devices are located and identifying a location of the client device 104 and/or other devices within the scene;

• Content rendering module 527 for generating virtual content based on content data (e.g., video data, visual data, audio data, sensor data) collected or obtained by one or more of a client device 104, head-mounted display 150, imaging device 160, and IOT sensor 170, and rendering the virtual content on top of a field of view of one or more of the client device 104, head-mounted display 150, and imaging device 160;

• One or more databases 528 for storing at least data including one or more of:

o Device settings 529 including common device settings (e.g., service tier, device model, storage capacity, processing capabilities, communication capabilities, etc.) of the one or more servers 102, client devices 104, head-mounted displays 150, imaging devices 160, and IOT sensors 170;

o User account information 530 for the one or more user applications 520, e.g., user names, security questions, account history data, user preferences, and predefined account settings;

o Network parameters 531 for the one or more communication networks 108, e.g., IP address, subnet mask, default gateway, DNS server, and host name;

o Training data 532 for training one or more data processing models 533;

o Data processing model(s) 533 for processing content data (e.g., video data, visual data, audio data, sensor data) using deep learning techniques;

o Content data and results 534 that are obtained by and outputted to the client devices 104, AR glasses 150, imaging devices 160, and IOT sensors 170 of the content rendering system 100, respectively, where the content data is processed by the data processing module 522 locally at the respective devices or remotely at the server 102 to provide the associated results to be presented on the client devices 104, AR glasses 150, and/or other devices; and

o Mapping data 535 that includes mapping information generated by the mapping module 524.

[0073] Optionally, the one or more databases 528 are stored in one of the server 102, client device 104, imaging devices 160, and storage 106 of the content rendering system 100. Optionally, the one or more databases 528 are distributed in more than one of the server 102, client device 104, and storage 106 of the content rendering system 100. In some embodiments, more than one copy of the above data is stored at distinct devices, e.g., two copies of the data processing models 533 are stored at the server 102 and the storage 106, respectively.

[0074] Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules, or data structures, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 506 stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, memory 506 stores additional modules and data structures not described above.

[0075] Figures 6A-6D are flowcharts of a method 600 for rendering virtual content as discussed above in Figures 1-5, in accordance with some embodiments. For convenience, the method 600 is described as being implemented by a computer system (e.g., a client device 104, AR glasses 150, an imaging device 160, a server 102, IOT sensors 170, or a combination thereof; Figures 1-5). Method 600 is, optionally, governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of the computer system. Each of the operations shown in Figures 6A-6D corresponds to instructions stored in a computer memory or non-transitory computer readable storage medium (e.g., memory 506 of the server 102 in Figure 5). The computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices. The instructions stored on the computer readable storage medium may include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in method 600 may be combined and/or the order of some operations may be changed.

[0076] The computer system obtains (602) a device pose of a first imaging device and identifies (604) a subset of a plurality of second home devices based on the device pose of the first imaging device. The first imaging device and the second home devices are distributed in a premises. For example, as described above in reference to Figure 2, the first imaging device is a pair of AR glasses 150-1 and the subset of the plurality of second home devices includes imaging devices 210 distributed throughout the premises. In some embodiments, the device pose is determined using the pose determination and prediction module 525 described above in reference to Figure 5.

[0077] In some embodiments, referring to Figure 2, the subset of second home devices includes (606-a) a camera device (e.g., imaging device 210-1), and identifying the subset of second home devices further includes determining (606-b) a first field of view, a device position, and a device orientation of the display of the first imaging device based on the device pose (e.g., a device position and a device orientation of the first imaging device), and identifying (606-c) the camera device according to the device position and orientation of the display. The camera device has a second field of view that intersects with an extension of the first field of view and is separated from the first field of view by at least a structure, and the collected information includes video data captured by the camera device. Examples include the intersecting fields of view 365-2 and 367 shown in Figure 3B and the intersecting fields of view 407 and 409 shown in Figure 4.

[0078] In some embodiments, the first imaging device is (608-a) located in a first portion of the premises, and the camera device is located at a second portion of the premises that is at least partially separated from the first portion by the structure. For example, the AR glasses 150-1 are separated from the imaging device 210 by the wall 402, as shown in at least Figure 4. In some embodiments, the computer system further identifies (608-b) a person in the video data captured by the camera device. The virtual content rendered by the computer system (as discussed below) includes an avatar 413 generated based on the person. Examples of the avatar are described above in reference to Figure 4.

[0079] The computer system, in accordance with a determination that the first imaging device has access to information captured by the subset of second home devices, collects (610) the information captured by the subset of second home devices, and generates virtual content based on the collected information. The computer system renders (612) the virtual content on a display. An example of the rendered virtual content is shown above in reference to Figure 4.
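
The overall flow of method 600 (obtain pose, identify the subset of second home devices, check access, collect information, generate and render virtual content) can be sketched as follows. Every helper function and data shape below is a placeholder introduced for illustration; the sketch is not an implementation defined by this application.

```python
def obtain_device_pose(device):                       # step 602
    return device["pose"]

def identify_home_devices(pose, home_devices):        # step 604 (simplified: match by the room behind the viewing direction)
    return [d for d in home_devices if d["room"] == pose["facing_room"]]

def has_access(first_device, subset):                 # access determination (e.g., same user account)
    return all(d["account"] == first_device["account"] for d in subset)

def collect_information(subset):                      # step 610: gather captured data
    return [d["captured"] for d in subset]

def generate_virtual_content(collected):              # step 610: build overlay items
    return [f"overlay:{item}" for item in collected]

def render_on_display(display, content):              # step 612
    print(f"rendering on {display}: {content}")

def render_ar_content(first_device, home_devices, display):
    """End-to-end flow of method 600; all data shapes are illustrative."""
    pose = obtain_device_pose(first_device)
    subset = identify_home_devices(pose, home_devices)
    if subset and has_access(first_device, subset):
        collected = collect_information(subset)
        render_on_display(display, generate_virtual_content(collected))

first_device = {"pose": {"facing_room": "office"}, "account": "user-A"}
home_devices = [
    {"id": "camera-210-1", "room": "office", "account": "user-A", "captured": "video (person detected)"},
    {"id": "thermostat-217-1", "room": "kitchen", "account": "user-A", "captured": "21.5 °C"},
]
render_ar_content(first_device, home_devices, "HMD 150-1 display")
```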

[0080] In some embodiments, rendering (614-a) the virtual content on the display further includes rendering (614-b) the avatar of the person on the display without any background content from the second field of view, rendering (614-c) both the avatar of the person and part of the background content from the second field of view surrounding the person on the display, or rendering (614-d) the avatar of the person and a layout wireframe associated with the second field of view on the display. In some embodiments, the first imaging device includes the display, and the avatar of the person is overlaid on top of the first field of view of the display. In some embodiments, the computer system determines (618) a posture of the person in the video data captured by the camera device, and the avatar of the person is rendered on the display with the posture of the person.

[0081] In some embodiments, referring to Figure 2, a first HMD 150-1 includes (620-a) the first imaging device and the display, and the person carries a second HMD 150-2. After the camera device and the person are identified in the second field of view of the camera device, the first HMD 150-1 receives (620-b) a message from the second HMD 150-2 carried by the person 215-3, and displays (620-c) the message with the avatar of the person on the display. In some embodiments, the first imaging device includes (622-a) a first HMD 150-1, and the person carries a second HMD 150-2. After the camera device and the person are identified in the second field of view of the camera device, the first and second HMDs 150 exchange (622-b) voice messages. In some embodiments, referring to Figure 2, the first imaging device includes (622-a) a first HMD 150-1, and the person (e.g., 215-1) carries a mobile device 104C. After the camera device 210-3 and the person 215-1 are identified in the second field of view of the camera device 210-3, the first HMD 150-1 exchanges (622-b) voice messages with the mobile device 104C. More information concerning these examples of different communications between client devices 104 and/or HMDs 150 is provided above with reference to Figures 2 and 4.

[0082] In some embodiments, referring to Figure 3B, the camera device includes a first camera device, e.g., 355-2, and the subset of second home devices includes a second camera device, e.g., 355-3. Identifying (626-a) the subset of second home devices further includes identifying (626-b) the second camera device according to the device position and orientation of the first imaging device. The second camera device has a third field of view that intersects with the extension of the first field of view and is separated from the first field of view. The computer system further identifies (626-c) a second person in video data captured by the second camera device, and renders (626-d), on the display, a second avatar of the second person captured by the second camera device jointly with the avatar of the person captured by the first camera device.

[0083] In some embodiments, the computer system identifies (628) an object of interest based on the collected information captured by the subset of second home devices, and the virtual content includes a virtual object generated based on the object of interest. In some embodiments, the subset of second home devices includes (630-a) a thermostat, and the collected information includes a temperature value measured by the thermostat. The virtual content includes the temperature value that is rendered for display to a user of the display.

[0084] In some embodiments, the computer system obtains (632-a) location information of each of the first imaging device and the second home devices, thereby determining a relative location of each second home device with reference to a device location of the first imaging device. In some embodiments, the first imaging device is (632-b) moveable in the premises. In some embodiments, the computer system scans (634) the premises using a sensor device (e.g., IOT sensors 170 or imaging device 160) to obtain a map including a respective location of each fixed home device in the second home devices. In some embodiments, the computer system determines (636-a) that the first imaging device and the plurality of second home devices are associated with a user account on an online platform hosted by a server, thereby determining that the first imaging device has access to the collected information, and the first imaging device and the second home devices are configured (636-b) to communicate with each other and with the server via one or more communication networks. In some embodiments, once a map is constructed, a server or web service can be deployed to enable the AR content rendering described herein.

[0085] In some embodiments, the computer system identifies (638-a) a third electronic device that is also located in a direction associated with the device pose of the first imaging device, and determines (638-b) that the first imaging device does not have access to information collected by the third electronic device. The computer system renders (638-c) the virtual content on the display independently of the information collected by the third electronic device.

[0086] In summary, the computer system utilizes image and/or video data obtained from one or more devices, together with their respective location information, to render virtual content that facilitates personalized interactions between users without being restricted by the users' locations and/or by obstructions along the users' lines of sight. More specifically, the computer system renders virtual content generated from other users' image and/or video data in a user's field of view such that in-person meetings are simulated. This allows users to have face-to-face interactions without obstructions, enabling more personalized interactions.

[0087] It should be understood that the particular order in which the operations in Figures 6A-6D have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed. One of ordinary skill in the art would recognize various ways to identify device poses as described herein. Additionally, it should be noted that details of other processes described above with respect to Figures 2-5 are also applicable in an analogous manner to the method 600 described above with respect to Figures 6A-6D. For brevity, these details are not repeated here.

[0088] The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Additionally, it will be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.

[0089] As used herein, the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting” or “in accordance with a determination that,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.

[0090] The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain principles of operation and practical applications, to thereby enable others skilled in the art.

[0091] Although various drawings illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages can be implemented in hardware, firmware, software or any combination thereof.