Title:
INTERACTIVE STEREOSCOPIC DISPLAY CALIBRATION
Document Type and Number:
WIPO Patent Application WO/2023/003558
Kind Code:
A1
Abstract:
This application is directed to systems, devices and methods of interactively adjusting a stereoscopic display. The stereoscopic display includes two display portions for displaying media content to two eyes of a user. For each display portion of the stereoscopic display, an electronic device renders a virtual object in a virtual field of view that is overlaid on a real field of view of a camera coupled to the stereoscopic display. The electronic device further generates a convergence adjustment instruction to calibrate a display convergence of the stereoscopic display. In response to the convergence adjustment instruction, the electronic device determines convergence display settings for the stereoscopic display based on a user convergence confirmation that the virtual object is displayed with convergence.

Inventors:
MA YUXIN (US)
XU YI (US)
Application Number:
PCT/US2021/042815
Publication Date:
January 26, 2023
Filing Date:
July 22, 2021
Assignee:
INNOPEAK TECH INC (US)
International Classes:
H04N17/00
Foreign References:
US20150215611A1 (2015-07-30)
US20170099481A1 (2017-04-06)
US20160170208A1 (2016-06-16)
Attorney, Agent or Firm:
WANG, Jianbai et al. (US)
Claims:
What is claimed is:

1. A method for interactively adjusting a stereoscopic display, the method comprising: for each display portion of the stereoscopic display, rendering a virtual object in a virtual field of view that is overlaid on a real field of view of a camera coupled to the stereoscopic display, the stereoscopic display including two display portions for displaying media content to two eyes of a user; generating a convergence adjustment instruction to calibrate a display convergence of the stereoscopic display; and in response to the convergence adjustment instruction, determining convergence display settings for the stereoscopic display based on a user convergence confirmation that the virtual object is displayed with convergence.

2. The method of claim 1, further comprising, prior to generating the convergence adjustment instruction: generating two FOV adjustment instructions to calibrate the virtual field of view in each display portion; and sequentially and in response to the two FOV adjustment instructions, determining FOV display settings of each display portion based on a respective user FOV confirmation that the respective virtual object matches the real field of view displayed in the respective display portion.

3. The method of claim 2, further comprising, in response to each of the two FOV adjustment instructions: receiving a first user input to adjust the virtual object and virtual field of view displayed in the respective display portion; based on the first user input, adjusting display of the virtual object and the virtual field of view overlaid on the real field of view displayed on the respective display portion; and receiving the respective user FOV confirmation that the virtual object matches the real field of view displayed in the respective display portion.

4. The method of claim 3, further comprising, in response to each of the two FOV adjustment instructions, iteratively until the user FOV confirmation: sequentially rendering two first virtual fields of view overlaid on the real field of view in the respective display portion; wherein the first user input selects one of the two first virtual fields of view, and the selected one of the two first virtual fields of view providing a better rendering effect for the virtual object than an unselected one of the two first virtual fields of view.

5. The method of claim 2, further comprising, in response to each of the two FOV adjustment instructions: sequentially rendering two first virtual fields of view overlaid on the real field of view in the respective display portion; wherein the user FOV confirmation selects one of the two first virtual fields of view, and the selected one of the two first virtual fields of view providing a better rendering effect for the virtual object than an unselected one of the two first virtual fields of view.

6. The method of any of claims 2-5, wherein the FOV display settings for each display portion include a horizontal FOV display setting and a vertical FOV display setting, and the horizontal and vertical FOV display settings of each display portion are determined in response to the respective FOV adjustment instruction.

7. The method of any of claims 2-6, wherein: each of the two FOV adjustment instructions identifies one of the two display portions; for each FOV adjustment instruction, the user is instructed to open a first eye that watches the one of the two display portions or close a second eye that watches the other one of the two display portions.

8. The method of any of the preceding claims, further comprising, in response to the convergence adjustment instruction: receiving a second user input to adjust the display convergence; based on the second user input, adjusting display of the virtual object and the virtual field of view overlaid on the real field of view displayed on the two display portions; and receiving the user convergence confirmation that the virtual object is displayed with convergence on the two display portions.

9. The method of claim 8, further comprising, in response to the convergence adjustment instruction: sequentially rendering two second virtual fields of view overlaid on the real field of view in the respective display portion; wherein the second user input selects one of the two second virtual fields of view, the selected one of the two second virtual fields of view providing a better rendering effect for the display convergence of the virtual object than an unselected one of the two second virtual fields of view.

10. The method of any of claims 1-8, further comprising, in response to the convergence adjustment instruction: sequentially rendering two second virtual fields of view overlaid on the real field of view in the respective display portion; wherein the user convergence confirmation selects one of the two second virtual fields of view, the selected one of the two second virtual fields of view providing a better rendering effect for the display convergence of the virtual object than an unselected one of the two second virtual fields of view.

11. The method of any of the preceding claims, wherein the virtual object is rendered according to a real object existing in the real field of view of the camera, and the user convergence confirmation is received when the virtual object matches the real object in the respective display portion, the method further comprising: moving the virtual object in the virtual field of view to align the virtual object with the real object.

12. The method of any of the preceding claims, wherein the virtual object is selected from a plurality of predefined virtual objects, and has a position and an orientation.

13. The method of any of the preceding claims, further comprising: in response to the convergence adjustment instruction, displaying the virtual object in the virtual field of view at a plurality of image depths; and adjusting the convergence display settings for the stereoscopic display for each of the plurality of image depths.

14. The method of any of the preceding claims, wherein: the convergence adjustment instruction corresponds to both of the two display portions; and the user is instructed to open both eyes for the convergence adjustment instruction.

15. The method of any of claims 1-14, wherein the user convergence confirmation includes a respective hand gesture captured by the camera.

16. The method of any of claims 1-14, wherein the user convergence confirmation corresponds to a respective predefined length of time for which no user action is received after generating the corresponding first or convergence adjustment instruction.

17. The method of any of the preceding claims, further comprising at a user interface, moving a virtual pointer to the virtual object, clicking on a button or the virtual object to grab the virtual object, and moving the virtual object in the virtual field of view.

18. The method of any of the preceding claims, wherein the two display portions are physically separated in the stereoscopic display.

19. The method of any of the preceding claims, further comprising: adjusting a six degrees of freedom (6DOF) pose of the virtual object in each display portion based on at least the convergence display settings.

20. An electronic device, comprising: one or more processors; and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform a method of any of claims 1-19.

21. A non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform a method of any of claims 1-19.

Description:
Interactive Stereoscopic Display Calibration

TECHNICAL FIELD

[0001] This application relates generally to display technology including, but not limited to, methods, systems, and non-transitory computer-readable media for calibrating a stereoscopic display of an electronic system and projecting virtual content on real world images accurately.

BACKGROUND

[0002] Augmented reality (AR) applications overlay virtual content on top of a view of real world objects or surfaces to provide enhanced perceptual information to a user. Position, orientation, and scale of virtual objects are adjusted with respect to the real world objects or surfaces in the image, such that the virtual content can be seamlessly merged with the image. Such adjustment is particularly critical to head-mounted displays, in which stereoscopic vision enables depth perception and causes virtual-real alignment errors. Most head-mounted displays are calibrated in the factory and cannot be adjusted or calibrated by individual users. Even for the very few head-mounted displays that allow users to calibrate their stereoscopic vision, calibration processes are complex and hard to manage, making it nearly impossible for individual users to identify optimal settings of these displays. In some situations, the user may need to acquire additional expensive hardware to calibrate the head-mounted displays accurately. Therefore, it would be beneficial to adopt a more convenient and efficient calibration mechanism in an augmented reality device (e.g., a head-mounted display) than the current practice.

SUMMARY

[0003] Various embodiments of this application are directed to methods and systems of interactively adjusting stereoscopic display settings on artificial reality systems. For example, interactive adjustment of the stereoscopic display settings is implemented in an optical see-through head-mounted display (OST-HMD), and virtual objects are rendered to be merged seamlessly with a background image of the real world. The artificial reality systems include, but are not limited to, non-immersive, semi-immersive, and fully immersive virtual reality (VR) systems; marker-based, markerless, location-based, and projection-based augmented reality systems; hybrid reality systems; and other types of mixed reality systems. As the skilled artisan will appreciate upon reading the descriptions provided herein, novel wearable devices described herein can be used with any of these types of artificial reality environments.

[0004] The disclosed methods and systems herein enable users to calibrate their artificial reality systems with a simple one-step or two-step calibration process. The calibration process allows a single artificial reality system to be shared among multiple users, because the calibration process allows display settings of the artificial reality system to be adjusted every time a distinct user uses (e.g., wears) the artificial reality system. Specifically, for each distinct user, the display settings of the artificial reality system are customized according to an Inter-Pupillary Distance (IPD) measured between centers of pupils of the distinct user or according to whether the distinct user wears correction lenses. Additionally, as a result of the calibration process, the display settings of the artificial reality system are stored in memory in association with distinct users or to account for deformation or aging of the artificial reality system itself. The stored display settings are reloaded according to distinct users or when the artificial reality system is restarted. By these means, the user of the artificial reality system can calibrate the system locally and conveniently, while keeping consistent virtual-real alignment without sending the system back to a manufacturer or purchasing a new artificial reality system.

[0005] In one aspect, a method is implemented at an electronic system for interactively adjusting a stereoscopic display. The method includes for each display portion of the stereoscopic display, rendering a virtual object in a virtual field of view that is overlaid on a real field of view of a camera coupled to the stereoscopic display. The stereoscopic display includes two display portions for displaying media content to two eyes of a user. The method further includes generating a convergence adjustment instruction to calibrate a display convergence of the stereoscopic display. The method includes in response to the convergence adjustment instruction, determining convergence display settings for the stereoscopic display based on a user convergence confirmation that the virtual object is displayed with convergence.
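
To make the claimed flow concrete, the following is a minimal Python sketch of the one-step convergence calibration described above, under the assumption of a simple interactive loop. The helpers render_overlay and read_user_input, the ConvergenceSettings structure, and the pixel-offset parameterization are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class ConvergenceSettings:
    # Hypothetical parameterization: a horizontal offset (in pixels) applied with
    # opposite signs to the left/right display portions to shift where the two
    # rendered images converge.
    horizontal_offset_px: float = 0.0

def calibrate_convergence(render_overlay, read_user_input, step_px=2.0):
    """Interactively determine convergence display settings.

    render_overlay(settings) draws the virtual object, overlaid on the camera's
    real field of view, in both display portions using the candidate settings.
    read_user_input() returns 'increase', 'decrease', or 'confirm'; 'confirm'
    plays the role of the user convergence confirmation that the virtual object
    is displayed with convergence.
    """
    settings = ConvergenceSettings()
    while True:
        render_overlay(settings)
        action = read_user_input()
        if action == 'increase':
            settings.horizontal_offset_px += step_px
        elif action == 'decrease':
            settings.horizontal_offset_px -= step_px
        elif action == 'confirm':
            return settings
```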

[0006] In some embodiments, the method includes, prior to generating the convergence adjustment instruction, generating two FOV adjustment instructions to calibrate the virtual field of view in each display portion, and, sequentially and in response to the two FOV adjustment instructions, determining FOV display settings of each display portion based on a respective user FOV confirmation that the respective virtual object matches the real field of view displayed in the respective display portion. In some embodiments, the method further includes, in response to each of the two FOV adjustment instructions, receiving a first user input to adjust the virtual object and virtual field of view displayed in the respective display portion. Based on the first user input, the method includes adjusting display of the virtual object and the virtual field of view overlaid on the real field of view displayed on the respective display portion. The method further includes receiving the respective user FOV confirmation that the virtual object matches the real field of view displayed in the respective display portion. In some embodiments, the method further includes, in response to each of the two FOV adjustment instructions and iteratively until the user FOV confirmation, sequentially rendering two first virtual fields of view overlaid on the real field of view in the respective display portion. The first user input selects one of the two first virtual fields of view, and the selected one of the two first virtual fields of view provides a better rendering effect for the virtual object than an unselected one of the two first virtual fields of view. In some embodiments, the method further includes, in response to each of the two FOV adjustment instructions, sequentially rendering two first virtual fields of view overlaid on the real field of view in the respective display portion. The user FOV confirmation selects one of the two first virtual fields of view, and the selected one of the two first virtual fields of view provides a better rendering effect for the virtual object than an unselected one of the two first virtual fields of view.
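
The paragraph above describes presenting two candidate virtual fields of view sequentially and letting the user pick the better one, repeating until a user FOV confirmation. A rough Python sketch of one way to drive that selection loop for a single display portion is shown below; render_with_fov, read_choice, and the bisection-style narrowing are illustrative assumptions rather than the disclosed implementation.

```python
def calibrate_fov_one_eye(render_with_fov, read_choice, fov_low_deg, fov_high_deg):
    """Two-alternative FOV calibration for one display portion.

    Two candidate virtual fields of view are rendered one after the other, and
    the user reports which one made the virtual object better match the real
    field of view. read_choice() returns 'first', 'second', or 'confirm' (the
    user FOV confirmation).
    """
    low, high = fov_low_deg, fov_high_deg
    while True:
        render_with_fov(low)      # first candidate, shown sequentially
        render_with_fov(high)     # second candidate
        choice = read_choice()
        if choice == 'confirm':
            return (low + high) / 2.0
        mid = (low + high) / 2.0
        if choice == 'first':     # keep searching near the preferred candidate
            high = mid
        else:
            low = mid
```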

[0007] In some embodiments, the FOV display settings for each display portion include a horizontal FOV display setting and a vertical FOV display setting, and the horizontal and vertical FOV display settings of each display portion are determined in response to the respective FOV adjustment instruction. In some embodiments, each of the two FOV adjustment instructions identifies one of the two display portions, and for each FOV adjustment instruction, the user is instructed to open a first eye that watches the one of the two display portions or close a second eye that watches the other one of the two display portions.

[0008] In some embodiments, the method further includes, in response to the convergence adjustment instruction, receiving a second user input to adjust the display convergence. The method includes, based on the second user input, adjusting display of the virtual object and the virtual field of view overlaid on the real field of view displayed on the two display portions. The method includes receiving the user convergence confirmation that the virtual object is displayed with convergence on the two display portions. In some embodiments, the method further includes, in response to the convergence adjustment instruction, sequentially rendering two second virtual fields of view overlaid on the real field of view in the respective display portion. The second user input selects one of the two second virtual fields of view, and the selected one of the two second virtual fields of view provides a better rendering effect for the display convergence of the virtual object than an unselected one of the two second virtual fields of view.

[0009] In some embodiments, the method further includes, in response to the convergence adjustment instruction, sequentially rendering two second virtual fields of view overlaid on the real field of view in the respective display portion. The user convergence confirmation selects one of the two second virtual fields of view, and the selected one of the two second virtual fields of view provides a better rendering effect for the display convergence of the virtual object than an unselected one of the two second virtual fields of view. In some embodiments, the virtual object is rendered according to a real object existing in the real field of view of the camera, the user convergence confirmation is received when the virtual object matches the real object in the respective display portion, and the method further includes moving the virtual object in the virtual field of view to align the virtual object with the real object.

[0010] In some embodiments, the virtual object is selected from a plurality of predefined virtual objects, and has a position and an orientation. In some embodiments, the method includes in response to the convergence adjustment instruction, displaying the virtual object in the virtual field of view at a plurality of image depths. The method further includes adjusting the convergence display settings for the stereoscopic display for each of the plurality of image depths.

[0011] In some embodiments, the convergence adjustment instruction corresponds to both of the two display portions, and the user is instructed to open both eyes for the convergence adjustment instruction. In some embodiments, the user convergence confirmation includes a respective hand gesture captured by the camera. In some embodiments, the user convergence confirmation corresponds to a respective predefined length of time for which no user action is received after generating the corresponding first or convergence adjustment instruction.

[0012] In some embodiments, the method further includes at a user interface, moving a virtual pointer to the virtual object, clicking on a button or the virtual object to grab the virtual object, and moving the virtual object in the virtual field of view. In some embodiments, the two display portions are physically separated in the stereoscopic display. In some embodiments, the method further includes adjusting a six degrees of freedom (6DOF) pose of the virtual object in each display portion based on at least the FOV and/or convergence display settings.

[0013] In another aspect, some implementations include an electronic system that includes one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods.

[0014] In yet another aspect, some implementations include a non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the above methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] For a better understanding of the various described implementations, reference should be made to the Detailed Description below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

[0016] Figure 1A is an example data processing environment having one or more servers communicatively coupled to one or more client devices, in accordance with some embodiments. Figure 1B illustrates a pair of augmented reality (AR) glasses (also called a head-mounted display) that can be communicatively coupled to a data processing environment, in accordance with some embodiments.

[0017] Figure 2 is a block diagram illustrating an interactive stereoscopic display calibration system, in accordance with some embodiments.

[0018] Figure 3 is a diagram illustrating a rendering process configured to render virtual objects on a real world view or an image associated with the real world view, in accordance with some embodiments.

[0019] Figure 4A is a diagram comparing three images that are rendered with different virtual fields of view (FOV), in accordance with some embodiments.

[0020] Figure 4B is a flow diagram of a process of adjusting an FOV of an electronic device having a stereoscopic display, in accordance with some embodiments.

[0021] Figure 5A is a diagram comparing another three images that are rendered with different virtual FOVs, in accordance with some embodiments.

[0022] Figure 5B is a flow diagram of a process of adjusting convergence of an electronic device having a stereoscopic display, in accordance with some embodiments.

[0023] Figure 6 is a flowchart of a method for interactively adjusting a stereoscopic display, in accordance with some embodiments.

[0024] Like reference numerals refer to corresponding parts throughout the several views of the drawings.

DETAILED DESCRIPTION

[0025] Reference will now be made in detail to specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to assist in understanding the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that various alternatives may be used without departing from the scope of claims and the subject matter may be practiced without these specific details. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein can be implemented on many types of electronic systems with digital video capabilities.

[0026] Figure 1A is an example data processing environment 100 having one or more servers 102 communicatively coupled to one or more client devices 104, in accordance with some embodiments. The one or more client devices 104 may be, for example, desktop computers 104A, tablet computers 104B, mobile phones 104C, or intelligent, multi-sensing, network-connected home devices (e.g., a camera). In some implementations, the one or more client devices 104 include a head-mounted display 150. In some embodiments, the head-mounted display 150 is an optical see-through head-mounted display (OST-HMD) that renders virtual objects on a view of the real world. Each client device 104 can collect data or user inputs, execute user applications, and present outputs on its user interface. The collected data or user inputs can be processed locally (e.g., for training and/or for prediction) at the client device 104 and/or remotely by the server(s) 102. The one or more servers 102 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 104, and in some embodiments, process the data and user inputs received from the client device(s) 104 when the user applications are executed on the client devices 104. In some embodiments, the data processing environment 100 further includes a storage 106 for storing data related to the servers 102, client devices 104, and applications executed on the client devices 104. For example, storage 106 may store video content (including visual and audio content), static visual content, and/or inertial sensor data for training a machine learning model (e.g., a deep learning network). Alternatively, storage 106 may also store video content, static visual content, and/or inertial sensor data obtained by a client device 104 to which a trained machine learning model can be applied to determine one or more poses associated with the video content, static visual content, and/or inertial sensor data.

[0027] The one or more servers 102 can enable real-time data communication with the client devices 104 that are remote from each other or from the one or more servers 102. Further, in some embodiments, the one or more servers 102 can implement data processing tasks that cannot be or are preferably not completed locally by the client devices 104. For example, the client devices 104 include a game console (e.g., the head-mounted display 150) that executes an interactive online gaming application. The game console receives a user instruction and sends it to a game server 102 with user data. The game server 102 generates a stream of video data based on the user instruction and user data and provides the stream of video data for display on the game console and other client devices that are engaged in the same game session with the game console. In another example, the client devices 104 include a networked surveillance camera and a mobile phone 104C. The networked surveillance camera collects video data and streams the video data to a surveillance camera server 102 in real time. While the video data is optionally pre-processed on the surveillance camera, the surveillance camera server 102 processes the video data to identify motion or audio events in the video data and share information of these events with the mobile phone 104C, thereby allowing a user of the mobile phone 104C to monitor the events occurring near the networked surveillance camera in real time and remotely.

[0028] The one or more servers 102, one or more client devices 104, and storage 106 are communicatively coupled to each other via one or more communication networks 108, which are the medium used to provide communications links between these devices and computers connected together within the data processing environment 100. The one or more communication networks 108 may include connections, such as wire, wireless communication links, or fiber optic cables. Examples of the one or more communication networks 108 include local area networks (LAN), wide area networks (WAN) such as the Internet, or a combination thereof. The one or more communication networks 108 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol. A connection to the one or more communication networks 108 may be established either directly (e.g., using 3G/4G/5G connectivity to a wireless carrier), or through a network interface 110 (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof. As such, the one or more communication networks 108 can represent the Internet, a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational, and other electronic systems that route data and messages.

[0029] In some embodiments, deep learning techniques are applied in the data processing environment 100 to process content data (e.g., video data, visual data, audio data) obtained by an application executed at a client device 104 to identify information contained in the content data, match the content data with other data, categorize the content data, or synthesize related content data. The content data may broadly include inertial sensor data captured by inertial sensor(s) of a client device 104. In these deep learning techniques, data processing models are created based on one or more neural networks to process the content data. These data processing models are trained with training data before they are applied to process the content data. In some embodiments, both model training and data processing are implemented locally at each individual client device 104 (e.g., the client device 104C and head-mounted display 150). The client device 104C or head-mounted display 150 obtains the training data from the one or more servers 102 or storage 106 and applies the training data to train the data processing models. Subsequent to model training, the client device 104C or head-mounted display 150 obtains the content data (e.g., captures video data via an internal and/or external image sensor, such as a camera) and processes the content data using the trained data processing models locally. Alternatively, in some embodiments, both model training and data processing are implemented remotely at a server 102 (e.g., the server 102A) associated with a client device 104 (e.g., the client device 104A and head-mounted display 150). The server 102A obtains the training data from itself, another server 102, or the storage 106 and applies the training data to train the data processing models. The client device 104A or head-mounted display 150 obtains the content data, sends the content data to the server 102A (e.g., in an application) for data processing using the trained data processing models, receives data processing results (e.g., recognized or predicted device poses) from the server 102A, presents the results on a user interface (e.g., associated with the application), renders virtual objects in a field of view based on the poses, or implements some other functions based on the results. The client device 104A or head-mounted display 150 itself implements no or little data processing on the content data prior to sending them to the server 102A. Additionally, in some embodiments, data processing is implemented locally at a client device 104 (e.g., the client device 104B and head-mounted display 150), while model training is implemented remotely at a server 102 (e.g., the server 102B) associated with the client device 104B or head-mounted display 150. The server 102B obtains the training data from itself, another server 102, or the storage 106 and applies the training data to train the data processing models. The trained data processing models are optionally stored in the server 102B or storage 106. The client device 104B or head-mounted display 150 imports the trained data processing models from the server 102B or storage 106, processes the content data using the data processing models, and generates data processing results to be presented on a user interface or used to initiate some functions (e.g., rendering virtual objects based on device poses) locally.

[0030] Figure 1B illustrates a pair of augmented reality (AR) glasses 150 (also called a head-mounted display) that can be communicatively coupled to a data processing environment 100, in accordance with some embodiments. The AR glasses 150 includes one or more of an image sensor, a microphone, a speaker, one or more inertial sensors (e.g., gyroscope, accelerometer), and a display. The image sensor and microphone are configured to capture video and audio data from a scene of the AR glasses 150, while the one or more inertial sensors are configured to capture inertial sensor data. In some situations, the image sensor captures hand gestures of a user wearing the AR glasses 150. In some situations, the microphone records ambient sound, including the user's voice commands. In some situations, both video or static visual data captured by the image sensor and the inertial sensor data measured by the one or more inertial sensors are applied to determine and predict device poses. The video, static image, audio, or inertial sensor data captured by the AR glasses 150 is processed by the AR glasses 150, server(s) 102, or both to recognize the device poses. Optionally, deep learning techniques are applied by the server(s) 102 and AR glasses 150 jointly to recognize and predict the device poses. The device poses are used to control the AR glasses 150 itself or interact with an application (e.g., a gaming application) executed by the AR glasses 150. In some embodiments, the display of the AR glasses 150 displays a user interface, and the recognized or predicted device poses are used to render or interact with user selectable display items on the user interface.

[0031] In some embodiments, simultaneous localization and mapping (SLAM) techniques are applied in the data processing environment 100 to process video data and static image data captured by the AR glasses 150 together with inertial sensor data. Device poses are recognized and predicted, and a scene in which the AR glasses 150 are located is mapped and updated. The SLAM techniques are optionally implemented by the AR glasses 150 independently or by both the server 102 and the AR glasses 150 jointly. In some embodiments, virtual objects are rendered on top of an image captured by a camera of the AR glasses 150 and displayed on a display of the AR glasses 150. In some embodiments, the AR glasses 150 have see-through portions on which the virtual objects are projected, such that the virtual objects are displayed on top of a view of the real world directly.

[0032] Figure 2 is a block diagram illustrating an electronic system 200, in accordance with some embodiments. The electronic system 200 includes a server 102, a client device 104, a storage 106, or a combination thereof. An example of the electronic system 200 is the AR glasses 150. The electronic system 200 typically includes one or more processing units (CPUs) 202, one or more network interfaces 204, memory 206, and one or more communication buses 208 for interconnecting these components (sometimes called a chipset). The electronic system 200 includes one or more input devices 210 that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing image sensor, or other input buttons or controls. Furthermore, in some embodiments, the client device 104 of the electronic system 200 uses a microphone and voice recognition or an image sensor 260 and gesture recognition to supplement or replace the keyboard. In some embodiments, the client device 104 includes one or more image sensors 260 (e.g., tracking cameras, infrared sensors, CMOS sensors, etc.), scanners, or photo sensor units for capturing images, for example, of graphic serial codes printed on the electronic devices. The electronic system 200 also includes one or more output devices 212 that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays. Optionally, the client device 104 includes a location detection device, such as a GPS (Global Positioning System) or other geo-location receiver, for determining the location of the client device 104. Optionally, the client device 104 includes an inertial measurement unit (IMU) 280 integrating multi-axis inertial sensors to provide estimation of a location and an orientation of the client device 104 in space. Examples of the one or more inertial sensors include, but are not limited to, a gyroscope, an accelerometer, a magnetometer, and an inclinometer.

[0033] Memory 206 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 206, optionally, includes one or more storage devices remotely located from one or more processing units 202. Memory 206, or alternatively the non-volatile memory within memory 206, includes a non-transitory computer readable storage medium. In some embodiments, memory 206, or the non-transitory computer readable storage medium of memory 206, stores the following programs, modules, and data structures, or a subset or superset thereof:

• Operating system 214 including procedures for handling various basic system services and for performing hardware dependent tasks;

• Network communication module 216 for connecting each server 102 or client device 104 to other devices (e.g., server 102, client device 104, or storage 106) via one or more network interfaces 204 (wired or wireless) and one or more communication networks 108, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;

• User interface module 218 for enabling presentation of information (e.g., a graphical user interface for application(s) 224, widgets, websites and web pages thereof, and/or games, audio and/or video content, text, etc.) at each client device 104 via one or more output devices 212 (e.g., displays, speakers, etc.);

• Input processing module 220 for detecting one or more user inputs or interactions from one of the one or more input devices 210 and interpreting the detected input or interaction;

• Web browser module 222 for navigating, requesting (e.g., via HTTP), and displaying websites and web pages thereof, including a web interface for logging into a user account associated with a client device 104 or another electronic device, controlling the client or electronic device if associated with the user account, and editing and reviewing settings and data that are associated with the user account;

• One or more user applications 224 for execution by the data processing system 200 (e.g., games, social network applications, smart home applications, and/or other web or non-web based applications for controlling another electronic device and reviewing data captured by such devices);

• Pose determination and prediction module 226 for determining and predicting a 6 degree of freedom pose (position and orientation) of the client device 104 (e.g., AR glasses 150), and further includes an SLAM (Simultaneous Localization and Mapping) module 227 for mapping a scene where a client device 104 is located and identifying a location of the client device 104 within the scene;

• Interactive field of view (FOV) adjustment module 228 for adjusting display of a rendered virtual object or the virtual field of view overlaid on the real field of view in each display portion, and receiving a user FOV confirmation that the virtual object matches the real field of view displayed in the respective display portion;

• Interactive convergence adjustment module 229 for adjusting display of the virtual object or the virtual field of view overlaid on the real field of view in at least two display portions, and receiving a user convergence confirmation that the virtual object is displayed with convergence on the two display portions;

• One or more databases 230 for storing at least data including one or more of:

o Device settings 232 including common device settings (e.g., service tier, device model, storage capacity, processing capabilities, communication capabilities, etc.) of the one or more servers 102 or client devices 104;

o User account information 234 for the one or more user applications 224, e.g., user names, security questions, account history data, user preferences, and predefined account settings;

o Network parameters 236 for the one or more communication networks 108, e.g., IP address, subnet mask, default gateway, DNS server and host name; and

o Display settings 238 including one or more rendering parameters for each user, such as FOV display settings and convergence display settings, that are calibrated for a stereoscopic display of the client device 104 (see the storage sketch after this list).
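
As a concrete illustration of the display settings 238 entry above, the following is a small Python sketch of storing and reloading calibrated per-user settings. The JSON file layout, the field names (fov_h_deg, convergence_offset_px), and the helper names are assumptions for illustration only, not the format used by the disclosed system.

```python
import json
from pathlib import Path

def save_display_settings(store: Path, user_id: str, settings: dict) -> None:
    """Persist calibrated FOV/convergence display settings keyed by user."""
    data = json.loads(store.read_text()) if store.exists() else {}
    data[user_id] = settings
    store.write_text(json.dumps(data, indent=2))

def load_display_settings(store: Path, user_id: str):
    """Reload a user's settings, e.g., after the device restarts or a
    different user puts on the headset. Returns None if not calibrated yet."""
    if not store.exists():
        return None
    return json.loads(store.read_text()).get(user_id)

# Example layout (hypothetical field names):
# save_display_settings(Path("display_settings.json"), "user_a",
#                       {"fov_h_deg": {"left": 52.0, "right": 51.5},
#                        "convergence_offset_px": 3.0})
```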

[0034] Optionally, the one or more databases 230 are stored in one of the server 102, client device 104, and storage 106 of the electronic system 200. Optionally, the one or more databases 230 are distributed in more than one of the server 102, client device 104, and storage 106 of the electronic system 200. In some embodiments, more than one copy of the above data is stored at distinct devices, e.g., two copies of the display settings 238 are stored at the server 102 and storage 106, respectively.

[0035] Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules or data structures, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 206, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 206, optionally, stores additional modules and data structures not described above.

[0036] Figure 3 is a diagram illustrating a rendering process configured to render virtual objects on a real world view 300 or an image associated with the real world view, in accordance with some embodiments. An electronic device 305, such as AR glasses 150 and/or any other client device 104, is disposed in the real world view 300. The real world view 300 further includes a field of view of an image sensor 260, and one or more real world objects 310A-310C that are located in the field of view of the image sensor 260. The real-world view 300 is mapped into an artificial world view 350 that also includes the one or more real world objects 310A-310C. Each of the image sensor 260 and two display portions 315 and 325 corresponds to a respective device pose (e.g., position and orientation) 370, 365, or 375 in the artificial world view 350. A virtual object 360 that does not exist in the real world view 300 can be displayed in the artificial world view 350 based on the device poses 365-375 of the electronic device 305 in a manner that the virtual object 360 seamlessly merges with the one or more real world objects 310A-310C. In some embodiments, when an image of the real world view 300 is captured by an image sensor 260, feature points are extracted from the image in association with the real world objects 310A-310C. Based on the feature points, the device poses of the image sensor 260 and two display portions 315 and 325 are determined, and the virtual object is rendered.

[0037] In some embodiments, the electronic device 305 includes a stereoscopic display having at least two display portions (e.g., a left display 315 and a right display 325). The at least two display portions 315 and 325 are configured to display media content to two eyes of a user separately and in a synchronous manner. In some embodiments, the at least two display portions 315 and 325 are physically separated in the stereoscopic display. In some embodiments, the at least two display portions 315 and 325 are formed on two distinct substrates. In some embodiments, the at least two display portions 315 and 325 are formed on two distinct regions of the same substrate, and optical paths to the eyes are physically separated. In some embodiments, the at least two display portions 315 and 325 overlap each other in the stereoscopic display. For example, in some embodiments, the at least two display portions 315 and 325 are temporally multiplexed, using two distinct polarizers to view them separately. In some embodiments, the electronic device 305 includes an image sensor 260 (e.g., a tracking camera) to capture real-time image data (e.g., still frames and/or video data) of the real world environment.

[0038] When the stereoscopic display renders the virtual object 360 on the real world view 300 or an image associated with the real world view 300, the artificial world view 350 in which the virtual object 360 is rendered overlaps the real world view 300 such that an accurate perception of size and depth of the virtual object 360 is created for a user of the stereoscopic display. As an example, a virtual cube of 10 cm by 10 cm by 10 cm rendered at 1 meter in front of the user by the electronic device 305 should appear the same size and distance as a real 10 cm by 10 cm by 10 cm cube placed at 1 meter in front of the user. In some embodiments, the electronic device 305 renders, for each display portion of the stereoscopic display, a virtual object 360 in a virtual field of view 350 that is overlaid on a real field of view 300 of the image sensor 260 coupled to the stereoscopic display. In some embodiments, the virtual object 360 is rendered separately for the two display portions 315 and 325 in a synchronous manner. A virtual object rendering process described below ensures that the stereoscopic display produces accurate perception of size and depth of the virtual object 360 in the artificial field of view 350 for users.
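
To illustrate why the rendered size depends on the display FOV setting, here is a small Python sketch under a simple pinhole-camera assumption; the 50/45-degree FOVs and 1920-pixel display width are arbitrary example numbers, not values from the disclosure.

```python
import math

def projected_width_px(object_width_m, distance_m, display_width_px, fov_h_deg):
    """Approximate on-screen width of an object centered in view.

    Pinhole model: focal length in pixels f = (W / 2) / tan(FOV / 2), and the
    projected width is roughly f * object_width / distance near the optical axis.
    """
    f_px = (display_width_px / 2.0) / math.tan(math.radians(fov_h_deg) / 2.0)
    return f_px * object_width_m / distance_m

# A 10 cm cube rendered 1 m in front of the user on a 1920-pixel-wide portion:
print(round(projected_width_px(0.10, 1.0, 1920, 50.0)))  # ~206 px at a 50-degree FOV
print(round(projected_width_px(0.10, 1.0, 1920, 45.0)))  # ~232 px at 45 degrees
```

Note that a smaller virtual FOV makes the same object occupy more pixels, which is consistent with the comparison in Figure 4A, where the too-small FOV setting renders the virtual object larger than its real counterpart.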

[0039] In some embodiments, the virtual object 360 is rendered according to the real objects 310 existing in the real field of view 300 of the image sensor 260. A real object 310 is placed in the scene and used as a reference object during calibration. In some embodiments, the real object 310 is a 2D print (e.g., a picture) or a 3D object. In some embodiments, a digital copy (i.e., the virtual object 360) of the real object 310 (including both shape and texture) is rendered virtually on the display portion of the electronic device 305 (e.g., the left display 315 and right display 325) with a default set of display settings. For a 2D print, the virtual object 360 is a rectangle with the picture itself applied as a texture map. For a 3D object, the virtual object 360 is acquired using any of the 3D scanning methods and/or devices known in the art, or a 3D computer-aided design (CAD) model if the object is manufactured or 3D-printed from the 3D CAD model. In some embodiments, the real object 310 is used as a 2D pattern or 3D object template to compute six degrees of freedom (6DOF) pose (position and orientation) of the electronic device 305.

[0040] In some embodiments, the real object 310 does not have to be predefined. More specifically, in some embodiments, the real object 310 ranges from a predefined object at a predefined relative position/orientation from the electronic device 305, to a predefined object tracked by the electronic device 305, and further to any object that is 3D reconstructed and tracked by the electronic device 305. In some embodiments, the use of a particular real object 310 depends on the capabilities of the electronic device 305 to render a virtual object 360 that is substantially consistent with the real object 310. Alternatively or additionally, in some embodiments, the virtual object 360 is selected from a plurality of predefined virtual objects, and has a position and an orientation (e.g., a 10 cm by 10 cm by 10 cm cube with a predetermined position and orientation). In some embodiments, the real object 310 that is used as a reference object has an entirely different shape from the virtual object 360, as long as a 2D or 3D location of the real object 310 in the artificial world view can be accurately determined and compared with that of the virtual object 360.

[0041] In some embodiments, the 6DOF pose of the electronic device 305 is tracked. The 6DOF pose of the electronic device 305 is tracked using a SLAM module 227 (Figure 2) on the electronic device 305. In some embodiments, the 6DOF pose of the electronic device 305 is tracked by recognizing a pre-defined 2D pattern or 3D object template and tracking the 6DOF pose of the electronic device 305 relative to the 2D pattern or 3D object template. Alternatively or additionally, in some embodiments, the 6DOF pose of the electronic device 305 is tracked using an external tracking device that uses cameras (e.g., image sensor 260) to track the position and orientation of the electronic device 305, such as the tracking camera shown in real world view 300 (e.g., as shown by tracking camera 6DOF 370). In some embodiments, the 6DOF pose of the electronic device 305 is tracked for each display portion of the stereoscopic display (e.g., as shown by left display 6DOF 365 and right display 6DOF 375). In some embodiments, if the SLAM module 227 or external tracking is used for 6DOF pose estimation, the virtual object 360 corresponding to the calibration object (i.e., the real object 310) is rendered at a pre-defined initial position and orientation (e.g., 3 m in front of the user, facing the user). The 6DOF pose is used to adjust alignment of the virtual object 360 with the real object 310 as discussed below.
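
For the template-based tracking option mentioned above, one common way to estimate a pose relative to a known planar print is a perspective-n-point solve. The sketch below uses OpenCV's solvePnP and assumes the four corners of the 2D pattern have already been detected in the camera image; it is an illustrative alternative, not necessarily the tracking method used by the disclosed device.

```python
import numpy as np
import cv2

def pose_from_2d_pattern(corner_px, pattern_w_m, pattern_h_m, camera_matrix, dist_coeffs):
    """Estimate the camera pose relative to a planar reference object.

    corner_px: 4x2 array of detected corner pixels ordered top-left, top-right,
    bottom-right, bottom-left. Returns (rvec, tvec): a Rodrigues rotation vector
    and a translation in meters, expressing the pattern in camera coordinates.
    """
    object_points = np.array([[0.0, 0.0, 0.0],
                              [pattern_w_m, 0.0, 0.0],
                              [pattern_w_m, pattern_h_m, 0.0],
                              [0.0, pattern_h_m, 0.0]], dtype=np.float64)
    image_points = np.asarray(corner_px, dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("pose estimation from the 2D pattern failed")
    return rvec, tvec
```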

[0042] In some embodiments, a set of user interface functions are provided so that the user can roughly align the virtual object 360 with the real object 310. In some embodiments, the electronic device 305 generates a convergence adjustment instruction to calibrate a display convergence of the stereoscopic display. In some embodiments, the user is first guided to adjust a FOV of the display until the size of the rendered virtual object 360 matches that of its real-world counterpart (i.e., the real object 310), as described below in Figures 4A and 4B. Then the user is guided to adjust the FOV or convergence display settings until the rendered virtual object 360 is at the same distance and/or depth as the real object 310, as described below in Figures 5A and 5B.

[0043] In some embodiments, the electronic device allows the user to move the virtual object 360 in the virtual field of view to align the virtual object with the real object 310. In some embodiments, the user moves the virtual object 360 via one or more inputs on a controller associated with the electronic device 305. In some embodiments, the user aims a virtual pointer at the digital copy (i.e., the virtual object 360), provides a user input (e.g., a selection, such as a click on a button of a controller associated with the electronic device 305) to grab the virtual object 360, and moves the virtual object 360 to an intended location using the controller. In some embodiments, a swipe on a touchpad or a move on a joystick of the controller rotates the virtual object 360 along a certain axis. In some embodiments, the electronic device 305 presents, via its display, a user interface including a virtual pointer to the virtual object 360. The user is able to move the virtual pointer to the virtual object 360, click on a button of a controller associated with the electronic device 305 or click the virtual object 360 to grab the virtual object 360, and move the virtual object 360 in the artificial world view 350. Alternatively, in some embodiments, the user can also opt to move and/or re-orient the real object 310 to roughly align the real object 310 with its virtual object 360.
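
A minimal sketch of mapping the controller interactions described above (grab-and-drag to translate, touchpad swipe to rotate about one axis) onto a virtual object pose is given below; the event dictionary format and the scale factors are hypothetical illustrations rather than the disclosed interface.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualObjectPose:
    # Meters in the viewer's coordinate frame; starts 3 m in front of the user.
    position: list = field(default_factory=lambda: [0.0, 0.0, -3.0])
    yaw_deg: float = 0.0

def handle_controller_event(pose: VirtualObjectPose, event: dict, grabbed: bool) -> VirtualObjectPose:
    """Apply one coarse-alignment interaction to the virtual object.

    event examples (hypothetical): {'type': 'drag', 'dx': ..., 'dy': ...} while the
    object is grabbed, or {'type': 'swipe', 'dx': ...} on the touchpad.
    """
    if event['type'] == 'drag' and grabbed:
        pose.position[0] += 0.002 * event['dx']   # drag right moves the object right
        pose.position[1] -= 0.002 * event['dy']   # drag up moves the object up
    elif event['type'] == 'swipe':
        pose.yaw_deg += 0.1 * event['dx']         # swipe rotates about the vertical axis
    return pose
```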

[0044] Although the above examples describe the use of a controller, in some embodiments, hand gestures (captured by the image sensor 260) are used to provide the user inputs for aligning the real object 310 and the virtual object 360 as described above.

[0045] Additional scene understanding capability can be used to reduce the effort required to bring the virtual object 360 and its real-world counterpart (i.e., the real object 310) into initial alignment. For example, if wall detection is used, then the virtual object 360 of a 2D picture can be placed on a virtual wall that is a representation of a real wall where the actual real object 310 is placed. Then the user only needs to move the 2D digital picture on a wall without worrying about the orientation, reducing the effort required to complete the calibration. In some embodiments, the electronic device 305 is fitted with a depth sensing capability. The virtual object 360 corresponding to the real object 310 is created on the fly or beforehand using a depth sensor or multiple image sensors 260. The real object 310 here is not limited to a single object, and a static scene can be used as well. For example, when the calibration process is started, a 3D reconstruction of the scene in front of the user is built in the artificial world view 350. If a virtual object 360 corresponding to the real object 310 is acquired beforehand, object tracking can be used to find the correct position and orientation offset between the real object 310 and the electronic device 305. A one-to-one rendering suggestion of each reconstructed object and/or scene is presented to the user. The user is then tasked to align the virtual object 360 with the real object 310 by adjusting display settings (e.g., FOV display settings or convergence display settings, as described below in reference to Figures 4A-5B).

[0046] In some embodiments, in response to the convergence adjustment instruction, the electronic device 305 determines convergence display settings for the stereoscopic display based on a user convergence confirmation that the virtual object is displayed with convergence. In some embodiments, the user convergence confirmation is received, e.g., from a user input, when the virtual object 360 matches the real object 310 in the respective display portion. The electronic device 305 allows the user to see the outcome of the display calibration (e.g., the virtual object 360 perfectly superimposes over the real object 310). The user is involved in this calibration process to provide feedback on whether the rendered virtual object 360 is consistent with the real object 310, e.g., based on the user's preference. Once calibration is complete, the convergence display settings are saved with the user's profile.

[0047] In some embodiments, the real object 310 is used as a 2D pattern or 3D object template for computing the 6DOF pose of the electronic device 305, and the virtual object 360 is rendered at the exact same location and orientation as the real object 310. Inaccuracies in the display settings cause the real object 310 and the virtual object 360 to misalign. In some embodiments, if the display settings are accurate, then by manipulating the position and orientation of the object, the real object 310 and the virtual object 360 align perfectly on the display of the electronic device 305; however, this is often not the case. The goal of this interactive calibration process is to find a set of display settings that minimizes an alignment error between the real-world object 310 and its virtual object 360 for each user.
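
The alignment-error objective in the paragraph above can be made explicit with a simple per-display metric. The sketch below computes a mean 2D distance between corresponding projected points of the rendered virtual object and detected points of the real object; in the interactive flow the user's judgment replaces such a metric, so this is only one illustrative way to quantify what the user confirmation minimizes.

```python
import numpy as np

def alignment_error_px(virtual_points_px, real_points_px) -> float:
    """Mean pixel distance between corresponding 2D points of the rendered
    virtual object and the detected real object in one display portion.
    Lower values mean the current display settings align the two better."""
    v = np.asarray(virtual_points_px, dtype=np.float64)
    r = np.asarray(real_points_px, dtype=np.float64)
    return float(np.mean(np.linalg.norm(v - r, axis=1)))
```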

[0048] Figure 4A is a diagram comparing three images 400, 430, and 450 that are rendered with different virtual fields of view (FOV), in accordance with some embodiments. The images 400, 430, and 450 correspond to a first FOV display setting resulting in a correct FOV, a second FOV display setting resulting in an FOV that is too small, and a third FOV display setting resulting in an FOV that is too large, respectively. Specifically, the image 400 corresponds to the first FOV display setting and includes a first FOV virtual-real object overlay 410 displayed on screen 415 (e.g., similar to left display 315 and/or right display 325 of a stereoscopic display; Figure 3) as seen through the user's eyes 420. In the first FOV virtual-real object overlay 410, the virtual object 360 (Figure 3) substantially overlaps the real object 310 (Figure 3), and vice versa, indicating that the first FOV display setting is correct (i.e., optimal). The image 430 corresponds to the second FOV display setting and includes a second FOV virtual-real object overlay 440 displayed on the screen 415 as seen through the user's eyes 420. In the second FOV virtual-real object overlay 440, the virtual object 360 is larger than the real object 310. The image 450 corresponds to the third FOV display setting and includes a third FOV virtual-real object overlay 460 displayed on the screen 415 as seen through the user's eyes 420. In the third FOV virtual-real object overlay 460, the virtual object 360 is smaller than the real object 310. As such, when either of the images 430 and 450 is displayed, the respective FOV display setting has to be manually adjusted by the user toward the first FOV display setting so that the virtual object 360 and the real object 310 substantially match each other.

[0049] In some embodiments, FOV adjustment is performed for one eye 420 and one display portion at a time. The electronic device guides the user to close one eye at a time, e.g., using a displayed user interface and user interface elements. The user then adjusts the FOV display settings for the display portion 315 or 325 of the open eye using a user interface provided and displayed by the electronic device 305 (Figure 3). For example, swiping up and down on a touchpad increases and decreases the FOV display settings, respectively. FOV adjustment aims to find a correct FOV such that the size of the virtual object 360 appears to match that of the real object 310 as shown in the image 400. In some embodiments, the optical lenses of the electronic device are highly rigid, and the FOVs for the display portions 315 and 325 are not adjustable and stay constant. In some embodiments, the FOV adjustment is optional.
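
A minimal sketch of the per-eye adjustment loop described above, assuming a hypothetical touchpad event stream; the swipe event names, step size, and confirm gesture are illustrative assumptions, not the device's actual input API.

```python
FOV_STEP_DEG = 0.5  # assumed adjustment granularity

def adjust_fov_for_eye(initial_fov_deg, events):
    """Apply swipe events for the open eye's display portion until confirmation."""
    fov = initial_fov_deg
    for event in events:
        if event == "swipe_up":
            fov += FOV_STEP_DEG      # swiping up increases the FOV display setting
        elif event == "swipe_down":
            fov -= FOV_STEP_DEG      # swiping down decreases the FOV display setting
        elif event == "confirm":
            break                    # user FOV confirmation for this display portion
    return fov

# Calibrate the left display portion while the right eye is closed.
print(adjust_fov_for_eye(52.0, ["swipe_up", "swipe_up", "swipe_down", "confirm"]))  # 52.5
```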

[0050] Figure 4B is a flow diagram of a process 470 of adjusting an FOV of an electronic device having a stereoscopic display, in accordance with some embodiments. For convenience, the method 470 is described as being implemented by the electronic device (e.g., a client device 104, AR glasses 150, electronic device 305, a server 102, or a combination thereof; Figures 1-3). Method 470 is, optionally, governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of the electronic device. Each of the operations shown in Figure 4B corresponds to instructions stored in a computer memory or non-transitory computer readable storage medium (e.g., memory 206 of the system 200 in Figure 2). The computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices. The instructions stored on the computer readable storage medium may include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in method 470 may be combined and/or the order of some operations may be changed.

[0051] In some embodiments, the electronic device renders (475) a virtual object 360 in a virtual field of view for each display portion of an included stereoscopic display. The virtual object 360 in the virtual field of view is overlaid on a real field of view of a camera (e.g., image sensor 260) coupled to the stereoscopic display. In some embodiments, the stereoscopic display includes at least two display portions (e.g., a left display 315 and a right display 325; Figure 3) for displaying media content to two eyes of a user. In some embodiments, the stereoscopic display is physically coupled to the electronic device. Alternatively, in some embodiments, the stereoscopic display is communicatively coupled to the electronic device (e.g., a server 102 communicatively coupled to AR glasses 150).
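
The overlay itself can be pictured as simple alpha compositing of the rendered virtual layer over the camera frame for one display portion; the sketch below assumes NumPy RGB/RGBA arrays of matching resolution and is not the disclosed rendering pipeline.

```python
import numpy as np

def overlay_virtual_on_real(real_rgb, virtual_rgba):
    """Alpha-composite the rendered virtual field of view over the camera's
    real field of view for one display portion (arrays share height x width)."""
    alpha = virtual_rgba[..., 3:4].astype(np.float32) / 255.0
    blended = (virtual_rgba[..., :3].astype(np.float32) * alpha
               + real_rgb.astype(np.float32) * (1.0 - alpha))
    return blended.astype(np.uint8)

# Example with dummy frames: a gray camera frame and a half-transparent red layer.
real = np.full((1080, 1920, 3), 128, dtype=np.uint8)
virt = np.zeros((1080, 1920, 4), dtype=np.uint8)
virt[..., 0] = 255   # red channel
virt[..., 3] = 128   # roughly 50% opacity
print(overlay_virtual_on_real(real, virt)[0, 0])  # a red-gray blend, about [191, 63, 63]
```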

[0052] In some embodiments, prior to generating a convergence adjustment instruction, the electronic device generates two FOV adjustment instructions to calibrate the virtual field of view in the display portions. In some embodiments, sequentially and in response to the two FOV adjustment instructions, the electronic device determines (480) FOV display settings of each display portion based on a respective user FOV confirmation indicating that the respective virtual object 360 matches the real field of view displayed in the respective display portion (e.g., the respective virtual object 360 of the virtual field of view matches or substantially overlaps the real object 310 in the real field of view).

[0053] In response to each of the two FOV adjustment instructions, the electronic device receives (485) a first user input to adjust the virtual object 360 and the virtual field of view displayed in the respective display portion. In some embodiments, the user input is a hand gesture captured by an image sensor 260. In some embodiments, while FOV adjustment appears to be "moving" the virtual object 360, it is the display settings that are adjusted such that the virtual and real objects can be aligned to each other (e.g., the respective virtual object 360 matches the real object 310). Based on the first user input, the electronic device adjusts (490) display of the virtual object 360 and the virtual field of view overlaid on the real field of view displayed on the respective display portion, and receives (495) the respective user FOV confirmation that the virtual object 360 matches the real field of view displayed in the respective display portion (e.g., matches the real object 310).

[0054] In some embodiments, the user FOV confirmation includes a respective hand gesture captured by the image sensor 260. In some embodiments, the user FOV confirmation corresponds to a respective predefined length of time for which no user action is received after generating the corresponding FOV adjustment instruction. In some embodiments, not receiving the user action for the respective predefined length of time indicates that the user approves the FOV display settings. In some embodiments, the user provides inputs using a controller associated with the electronic device as described above with reference to Figure 3.

[0055] In some embodiments, each of the two FOV adjustment instructions identifies one of the two display portions. For each FOV adjustment instruction, the user is instructed to open a first eye that watches the identified one of the two display portions or close a second eye that watches the other one of the two display portions. Alternatively, in some embodiments, the other one of the two display portions is disabled from displaying any content. In some embodiments, the FOV adjustment process is split into adjusting a horizontal FOV and a vertical FOV independently. In some embodiments, the FOV display settings for each display portion include a horizontal FOV display setting and a vertical FOV display setting, and the horizontal and vertical FOV display settings of each display portion are determined in response to the respective FOV adjustment instruction.
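
A minimal sketch, with hypothetical polling callbacks, of the two confirmation paths described above: an explicit confirmation gesture, or a predefined period of inactivity treated as approval. The timeout value and callback names are assumptions.

```python
import time

CONFIRM_TIMEOUT_S = 5.0  # assumed "predefined length of time" with no user action

def wait_for_fov_confirmation(poll_confirm_gesture, poll_adjustment_activity):
    """poll_confirm_gesture() and poll_adjustment_activity() are assumed callbacks
    returning True when a confirmation gesture or any adjustment input is seen."""
    last_action = time.monotonic()
    while True:
        if poll_confirm_gesture():
            return "gesture"                       # explicit user FOV confirmation
        if poll_adjustment_activity():
            last_action = time.monotonic()         # user is still adjusting
        elif time.monotonic() - last_action > CONFIRM_TIMEOUT_S:
            return "timeout"                       # inactivity treated as approval
        time.sleep(0.05)

# Stub example: no gesture and no further adjustment, so it waits out the timeout.
print(wait_for_fov_confirmation(lambda: False, lambda: False))  # "timeout"
```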

[0056] In some embodiments, in response to each of the two FOV adjustment instructions, the electronic device sequentially renders two first virtual fields of view overlaid on the real field of view in the respective display portion. The user FOV confirmation selects one of the two first virtual fields of view. The selected one of the two first virtual fields of view provides a better rendering effect for the virtual object 360 than an unselected one of the two first virtual fields of view. More specifically, at times, it can be difficult for the user to decide on the final FOV display settings when the adjustment granularity is fine. A binary selection method is performed by the electronic device to select between two options and assist the user. The electronic device presents the rendering results of two FOV display settings consecutively to the user. In some situations, one result shows a virtual object (e.g., virtual object 360) to be closer to the user than the corresponding real object (e.g., real object 310), and the other result shows a virtual object to be farther away than the corresponding real object. The user is asked to pick the virtual object that is rendered closer to the real object 310. In some embodiments, in response to each of the two FOV adjustment instructions, iteratively until the user FOV confirmation is received, the electronic device sequentially renders two first virtual fields of view overlaid on the real field of view in the respective display portion. The first user input (i.e., user FOV confirmation) selects one of the two first virtual fields of view, and the selected one of the two first virtual fields of view provides a better rendering effect for the virtual object than an unselected one of the two first virtual fields of view. Similar to the process described above, the user is instructed to choose between two options. In some embodiments, with each iteration, the difference between the two sets of FOV display settings gets narrower, and the selection finally gives the optimal FOV display settings (e.g., when the virtual and real objects substantially overlap as shown in the image 400).
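
Below is a minimal sketch of the iterative two-option selection, framed as interval bisection over the FOV display setting; the interaction callback, interval bounds, and iteration count are assumptions, not the disclosed user interface.

```python
def bisect_fov(low_deg, high_deg, user_prefers_first, iterations=6):
    """user_prefers_first(a, b) -> True if candidate FOV a looks better than b."""
    for _ in range(iterations):
        first, second = low_deg, high_deg          # two candidate FOV settings shown in turn
        if user_prefers_first(first, second):
            high_deg = (low_deg + high_deg) / 2.0  # keep the lower half of the interval
        else:
            low_deg = (low_deg + high_deg) / 2.0   # keep the upper half of the interval
    return (low_deg + high_deg) / 2.0              # final FOV display setting

# Example with a simulated user whose true (best-matching) FOV is 57 degrees.
true_fov = 57.0
prefers_first = lambda a, b: abs(a - true_fov) < abs(b - true_fov)
print(round(bisect_fov(45.0, 75.0, prefers_first), 2))  # close to 57
```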

[0057] Figure 5A is a diagram comparing another three images 500, 530, and 550 that are rendered with different convergence display settings, in accordance with some embodiments. The images 500, 530, and 550 correspond to three distinct convergence display settings including a first convergence display setting resulting in a correct or accurate convergence, a second convergence display setting resulting in too little convergence, and a third convergence display setting resulting in too much convergence. Specifically, the image 500 corresponds to the first convergence display setting and includes a first virtual-real object overlay 505 displayed with a substantial convergence of the virtual and real objects on both a left display 315 and a right display 325 of a stereoscopic display as seen through the user's left eye 520 and right eye 525, respectively. In the first virtual-real object overlay 505, the virtual object 360 (Figure 3) substantially overlaps the real object 310 (Figure 3), and vice versa, i.e., the virtual and real objects 360 and 310 have a substantially identical depth with reference to the stereoscopic display. The image 530 corresponds to the second convergence display setting and includes a second virtual-real object overlay displayed with an insufficient convergence of the virtual and real objects on the left display 315 or right display 325 as seen through the user's left eye 520 or right eye 525, respectively. In the second virtual-real object overlay, the virtual object 360 is displayed with a greater depth than the real object 310, and there is an insufficient convergence between the virtual object 360 and the real object 310. The image 550 corresponds to the third convergence display setting and includes a third virtual-real object overlay 555 displayed with an excessive convergence on the left display 315 or right display 325 as seen through the user's left eye 520 or right eye 525, respectively. In the third virtual-real object overlay 555, the virtual object 360 is displayed with an insufficient depth, and there is an excessive convergence between the virtual object 360 and the real object 310.

[0058] In some embodiments, convergence adjustment is performed with both eyes open (e.g., left eye 520 and right eye 525). The electronic device guides (e.g., via a displayed user interface and user interface elements) the user through adjusting the convergence display settings while observing the perceived depth. This is aided by continuously visualizing the virtual object 360 and the real object 310 on the display of the electronic device 305. In some embodiments, the convergence display settings are simplified to a convergence angle at which two camera portions (i.e., two subsets of image sensors 260) capturing views for the two eyes look towards each other. For example, a first image sensor subset for the left eye looks slightly towards the right, while a second image sensor subset for the right eye looks slightly towards the left. This is done to simplify an interactive convergence adjustment process for the user, given that many first-time users find it overwhelming to tune multiple variables concurrently. When the convergence angle changes, the center axes of the two image sensor subsets converge towards or diverge from each other. The change in the convergence display setting leads to a change in parallax between the rendered left and right eye images. To the user, the virtual object 360 appears to move closer towards the user or farther away as the convergence display setting changes. For example, if a painting on the wall is used as the calibration target (i.e., real object 310), a virtual painting is displayed accordingly. The user is then tasked to adjust the convergence display setting until the virtual painting appears to be at the same depth as the real painting on the wall.
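
For intuition only, the sketch below uses a simplified symmetric toed-in camera model: a larger convergence angle makes the two rendering axes cross nearer, which is why the virtual object appears to move closer as the setting changes. The IPD value and the symmetric toe-in are illustrative assumptions.

```python
import math

def convergence_depth_m(ipd_m, toe_in_deg_per_eye):
    """Depth at which the left/right camera axes intersect (zero-parallax depth)
    for a symmetric per-eye toe-in angle."""
    return (ipd_m / 2.0) / math.tan(math.radians(toe_in_deg_per_eye))

for angle in (0.5, 1.0, 2.0, 4.0):
    # Larger convergence angle -> axes cross nearer -> virtual content appears closer.
    print(f"{angle:4.1f} deg per eye -> {convergence_depth_m(0.063, angle):5.2f} m")
```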

[0059] Figure 5B is a flow diagram of a process 560 of adjusting convergence of an electronic device having a stereoscopic display, in accordance with some embodiments. The method 560 is implemented by an electronic device (e.g., a client device 104, AR glasses 150, electronic device 305, a server 102, or a combination thereof; Figures 1-3). Method 560 is, optionally, governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of the electronic device. Each of the operations shown in Figure 5B corresponds to instructions stored in a computer memory or non-transitory computer readable storage medium (e.g., memory 206 of the system 200 in Figure 2). The computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices. The instructions stored on the computer readable storage medium may include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in method 560 may be combined and/or the order of some operations may be changed.

[0060] In some embodiments, the electronic device renders (560) a virtual object 360 in a virtual field of view for each display portion of a stereoscopic display. The virtual object 360 in the virtual field of view is overlaid on a real field of view of a camera (e.g., image sensor 260) coupled to the stereoscopic display. In some embodiments, the stereoscopic display includes at least two display portions for displaying media content to two eyes of a user (e.g., a left display 315 and a right display 325; Figure 3). In some embodiments, the stereoscopic display is physically coupled to the electronic device. Alternatively, in some embodiments, the stereoscopic display is communicatively coupled to the electronic device (e.g., a server 102 communicatively coupled to AR glasses 150). The electronic device further generates a convergence display setting instruction to calibrate a display convergence of the stereoscopic display. In some embodiments, the convergence display setting instruction corresponds to both of the two display portions (the left display 315 and the right display 325), and the user is instructed to open both eyes for the convergence display setting instruction.

[0061] In response to the convergence display setting instruction, the electronic device receives (565) a second user input to adjust the convergence display settings. In some embodiments, the user input is a hand gesture captured by an image sensor 260. Based on the second user input, the electronic device adjusts (575) display of the virtual object 360 and the virtual field of view overlaid on the real field of view displayed on the two display portions (the left display 315 and the right display 325). In some embodiments, in response to the convergence display setting instruction, the electronic device displays the virtual object 360 in the virtual field of view at a plurality of image depths, and adjusts the convergence display settings for the stereoscopic display for each of the plurality of image depths. In some embodiments, the convergence adjustment process is performed at multiple different distances to increase accuracy of the outcome (i.e., of the convergence display settings).
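
A minimal sketch of repeating the convergence adjustment at several image depths and recording one confirmed setting per depth, as suggested above; calibrate_at_depth stands in for a single interactive pass, and the depth values and IPD are illustrative assumptions.

```python
import math

def calibrate_convergence(depths_m, calibrate_at_depth):
    """Return a {depth: confirmed convergence setting} mapping, one per image depth."""
    settings = {}
    for depth in depths_m:
        # Render the virtual object 360 at this image depth, let the user adjust,
        # and store the convergence display setting they confirm.
        settings[depth] = calibrate_at_depth(depth)
    return settings

# Example with a simulated pass that reports the geometric toe-in angle (degrees)
# for an assumed 63 mm IPD, standing in for the interactive adjustment.
ideal_toe_in = lambda depth: round(math.degrees(math.atan((0.063 / 2.0) / depth)), 3)
print(calibrate_convergence([0.5, 1.0, 2.0], ideal_toe_in))
```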

[0062] In some embodiments, the electronic device receives (580) the user convergence confirmation that the virtual object 360 is displayed with a convergence on the two display portions (i.e., converged with the real object 310 as shown in the image 500). In some embodiments, the user convergence confirmation includes a respective hand gesture captured by the image sensor 260. In some embodiments, the user convergence confirmation corresponds to a respective predefined length of time for which no user action is received after generating the corresponding convergence display setting instruction. Not receiving the user action for the respective predefined length of time indicates that the user approves the convergence display settings. In some embodiments, the user can provide inputs using a controller associated with the electronic device as described above with reference to Figure 3.

[0063] In some embodiments, in response to the convergence display setting instruction, the electronic device sequentially renders two first virtual fields of view overlaid on the real field of view in the respective display portion. The user convergence confirmation selects one of the two first virtual fields of view. The selected one of the two first virtual fields of view provides a better rendering effect for the virtual object 360 than an unselected one of the two first virtual fields of view. At times, it can be difficult for the user to decide on a final or optimal convergence display setting when the adjustment granularity is very fine. To assist the user, a binary selection method is performed by the electronic device. The electronic device presents the rendering results of two convergence display settings consecutively to the user. In an example, one result shows the virtual object 360 to be closer to the user than the real object 310, and the other result shows the virtual object 360 to be farther away than the real object 310. The user is asked to pick the virtual object 360 that is rendered closer to the real object 310. In some embodiments, in response to the convergence display setting instruction, iteratively until the user convergence confirmation is received, the electronic device sequentially renders two first virtual fields of view overlaid on the real field of view in the respective display portion. The second user input (i.e., user convergence confirmation) selects one of the two first virtual fields of view, and the selected one of the two first virtual fields of view provides a better rendering effect for the virtual object than an unselected one of the two first virtual fields of view. Similar to the process described above, the user is instructed to choose between two options. In some embodiments, with each iteration, the difference between the two sets of convergence display settings gets narrower, and the selection finally reaches an optimal convergence display setting that allows the virtual and real objects to substantially overlap with a convergence.
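
As a sketch of this narrowing two-option procedure for the convergence setting, the snippet below presents (conceptually) one slightly nearer and one slightly farther candidate around the current estimate, moves toward the user's pick, and halves the step each round; the function, step sizes, and simulated user are all assumptions.

```python
def refine_convergence(initial_deg, initial_step_deg, user_picks_nearer, rounds=5):
    """user_picks_nearer(near_deg, far_deg) -> True if the nearer-looking candidate
    matched the real object 310 better; returns the refined convergence angle."""
    estimate, step = initial_deg, initial_step_deg
    for _ in range(rounds):
        near, far = estimate + step, estimate - step  # larger angle renders nearer
        estimate = near if user_picks_nearer(near, far) else far
        step /= 2.0                                   # the two options get closer each round
    return estimate

# Simulated user whose best-matching convergence angle is 1.8 degrees per eye.
true_deg = 1.8
picks_nearer = lambda near, far: abs(near - true_deg) < abs(far - true_deg)
print(round(refine_convergence(1.0, 0.8, picks_nearer), 3))  # converges near 1.8
```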

[0064] Figure 6 is a flowchart of a method 600 for interactively adjusting a stereoscopic display, in accordance with some embodiments. The method 600 is implemented by an electronic device (e.g., a client device 104, AR glasses 150, electronic device 305, a server 102, or a combination thereof; Figures 1-3). Method 600 is, optionally, governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of the electronic device. Each of the operations shown in Figure 6 corresponds to instructions stored in a computer memory or non-transitory computer readable storage medium (e.g., memory 206 of the system 200 in Figure 2). The computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices. The instructions stored on the computer readable storage medium may include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in method 600 may be combined and/or the order of some operations may be changed.

[0065] The electronic device, for each display portion of the stereoscopic display, renders (605) a virtual object 360 in a virtual field of view that is overlaid on a real field of view of a camera coupled to the stereoscopic display. In some embodiments, the stereoscopic display includes two display portions for displaying media content to two eyes of a user. In some embodiments, the electronic device generates (610) two FOV adjustment instructions to calibrate the virtual field of view in each display portion. In some embodiments, sequentially and in response to the two FOV adjustment instructions, the electronic device determines (615) FOV display settings of each display portion based on a respective user FOV confirmation that the respective virtual object 360 matches the real field of view displayed in the respective display portion. The determination of the FOV display settings is discussed above in reference to Figures 4A-4B.

[0066] The electronic device generates (620) a convergence adjustment instruction to calibrate a display convergence of the stereoscopic display. In response to the convergence adjustment instruction, the electronic device determines (625) convergence display settings for the stereoscopic display based on a user convergence confirmation that the virtual object 360 is displayed with convergence. The determination of the convergence display settings is discussed above in reference to Figures 5A-5B.
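
Tying the steps of method 600 together, the sketch below shows one possible orchestration: two sequential per-eye FOV calibrations followed by a single binocular convergence calibration, with the confirmed settings saved to the user's profile. The helper callables are hypothetical placeholders for the interactive procedures sketched earlier, not the disclosed implementation.

```python
def calibrate_display(calibrate_fov_for_eye, calibrate_convergence, save_profile):
    """Run the full interactive calibration and persist per-user display settings."""
    settings = {}
    for eye in ("left", "right"):                       # two FOV adjustment instructions
        settings[f"fov_{eye}_deg"] = calibrate_fov_for_eye(eye)
    settings["convergence_deg"] = calibrate_convergence()  # convergence adjustment
    save_profile(settings)                              # saved with the user's profile
    return settings

# Example with stubbed steps standing in for the interactive adjustments.
print(calibrate_display(lambda eye: 57.0, lambda: 1.8, lambda s: None))
```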

[0067] In summary, the application utilizes a user guidance system to calibrate display settings, more specifically, FOV display settings and convergence display settings, of an electronic device (such as AR glasses 150) including or communicatively coupled to a stereoscopic display. The user can provide confirmations of the FOV or convergence display settings during the course of adjusting the FOV or convergence of the display portions. This allows users to calibrate their electronic devices quickly and accurately, improves the overall performance of the electronic device, and personalizes the display settings of the electronic device for each user. Each user sharing the same electronic device can have distinct rest positions of the electronic device (e.g., rest position on the head, rest position on the nose, distance from the eyes to the respective displays of the stereoscopic display, etc.), a distinct IPD, and/or corrective lenses. The processes 470, 560, and 600 disclosed herein can be used to customize the display settings of the electronic device for each user's preferences and/or characteristics.

[0068] Additionally, electronic devices can lose calibration data and/or the electronic device's structure can become deformed due to mechanical impact or aging. The processes 470, 560, and 600 allow the user to recalibrate the electronic device under each of these conditions without having to send the electronic device to a manufacturer or purchase a new one. From a product perspective, the processes 470, 560, and 600 disclosed herein ensure that the electronic devices have consistent virtual-real alignment and alleviate a post-sale support burden of returning the electronic devices to the factory.

[0069] It should be understood that the particular order in which the operations in Figures 4B, 5B, and 6 have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed. One of ordinary skill in the art would recognize various ways to identify device poses as described herein. Additionally, it should be noted that details of other processes described above with respect to Figures 3-5B are also applicable in an analogous manner to method 600 described above with respect to Figure 6. For brevity, these details are not repeated here.

[0070] The terminology used in the description of the various described implementations herein is for the purpose of describing particular implementations only and is not intended to be limiting. As used in the description of the various described implementations and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "includes," "including," "comprises," and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Additionally, it will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.

[0071] As used herein, the term "if" is, optionally, construed to mean "when" or "upon" or "in response to determining" or "in response to detecting" or "in accordance with a determination that," depending on the context. Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" is, optionally, construed to mean "upon determining" or "in response to determining" or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]" or "in accordance with a determination that [a stated condition or event] is detected," depending on the context.

[0072] The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain principles of operation and practical applications, to thereby enable others skilled in the art.

[0073] Although various drawings illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages can be implemented in hardware, firmware, software or any combination thereof.




 