Moving Towards Next Level Immersion with Mixed Reality
White Paper / Apr 2018 / ar/vr, innovation partners

In this white paper, Innovation Partners introduce key terms and technical aspects associated with free viewpoint control for MR content; limitations that are hindering production, distribution and consumption of immersive MR content; and, at the end, introduce solutions that address these shortcomings in novel ways.

Humans are curious creatures that are always looking for new, enthralling experiences. This has led to the development of many forms of entertainment, ongoing since the dawn of communication. Today, enabled by state-of-the-art technologies, we can experience alternate realities from the safety of our living room couches, theater seats or amusement park rides in realistic audiovisual quality.

Over the last few years, we watched head-worn virtual and augmented reality devices move from research laboratories to store shelves as the newest platform enabling novel virtual experiences. The big promise of emerging mixed reality (MR) technology is a far deeper immersion into imaginary worlds than is possible with the flat 2D screens we currently use to consume movies, TV shows and computer games. Increased immersion comes from the fact that with MR, viewers feel completely surrounded by the alternative reality; they are not merely looking at a flat 2D projection on a screen from a single fixed viewpoint. To create an increased level of immersion, MR content needs to look and behave realistically, as if the user is part of the virtual environment even though it only really exists as digital information in a computer’s memory.

Virtual content for which the viewer is in a passive role, observing a pre-recorded story similar to a movie, is called ‘Cinematic VR’. Even in these cases, when only minimal interaction between the viewer and the virtual world is needed, the minimum requirement for realistic behavior in the virtual world is that users can freely control their viewpoint just by moving their head. However, even this low level of interaction between the viewer and virtual content has proven difficult to achieve with photorealistic visual quality using current-generation MR devices. Solving this problem will unlock the full potential of MR and deliver next level immersion to consumers.

In the following chapters, we will introduce key terms and technical aspects associated with free viewpoint control for MR content. We will introduce limitations that are hindering production, distribution and consumption of immersive MR content, and, at the end, introduce solutions that address these shortcomings in novel ways.

Motion parallax and degrees of freedom

When observing our surroundings in everyday life, our brains construct a spatial map that spans three dimensions. To do so, our brain has learned to identify a number of depth cues from the visual stimuli picked up by the receptors in our eyes. Depth cues are aspects of visual information that provide hints about the distance of objects from the viewer. An ideal virtual experience would recreate all depth cues accurately for the observer.

In the fields of cinematography and photography, stereoscopic capture and viewing have been experimented with almost since the invention of photography itself. In stereoscopic imaging, separate images are recorded and displayed for each eye, allowing the viewer to observe binocular depth cues such as stereopsis. But despite a big push from the industry a few years back, 3D content featuring a stereoscopic view has failed to catch on. A major reason is that stereopsis is just one of a number of depth cues that our visual perception system uses to enable three-dimensional perception, and adding it by itself fails to enhance the overall immersion significantly. Many monocular depth cues such as perspective, texture gradients and realistic lighting and shadows can be captured and produced with today’s computer graphics, but motion parallax, which has been considered an even stronger depth cue than stereopsis, is missing from traditional formats of digital entertainment.

Motion parallax is the visual effect in which objects closer to the viewer move larger distances on the image plane than objects further away when the viewpoint is shifted sideways. One often-used example is a landscape observed through the window of a moving train. Trees and grass closer to the train move faster through the field of view than the buildings further away, while clouds even further away seem to remain static in the background. In the same way, motion parallax provides strong depth cues to our brains when we move our head even slightly, as we observe objects at close range.

With emerging MR devices embedded with positional tracking, it is possible to produce virtual experiences that allow viewers to freely control the viewpoint with their head motions. When the 3D view is rendered based on head motion, motion parallax depth cues can be correctly produced. When motion parallax is accurately synchronized with head motion, it creates a strong illusion of virtual content really existing in the same physical space as the viewer.

To enable the effect of motion parallax, the visual information rendered for the viewer needs to convey the distance of each element in the image from the viewer. The element distance determines how much the element needs to translate when the viewpoint shifts. In addition to depth information, visual information is needed for all areas of an image, even those that are occluded by closer objects but can become visible once the viewpoint is shifted. Traditional 2D media loses both depth information and visual information for occluded areas when the scene is recorded with a 2D camera from a single viewpoint. Only 3D content formats that provide both depth information and visual information for occluded areas can enable recreation of motion parallax.
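
The relationship between element distance and on-screen translation can be illustrated with a small sketch. It assumes an idealized pinhole camera that moves sideways, parallel to the image plane, in which case a point at depth Z shifts on the image by f·t/Z for a focal length f and a viewpoint shift t. The function name, the focal length and the example distances are illustrative assumptions, not values taken from this paper.

```python
def image_shift_px(focal_length_px: float, viewpoint_shift_m: float, depth_m: float) -> float:
    """Image-plane displacement (in pixels) of a point at depth_m when an
    idealized pinhole camera translates sideways by viewpoint_shift_m."""
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_length_px * viewpoint_shift_m / depth_m


if __name__ == "__main__":
    focal_length_px = 1000.0   # assumed focal length, in pixels
    head_shift_m = 0.1         # assumed 10 cm sideways head movement
    # Assumed distances, loosely following the train-window example.
    scene = [("grass", 5.0), ("building", 200.0), ("cloud", 5000.0)]
    for name, depth_m in scene:
        shift = image_shift_px(focal_length_px, head_shift_m, depth_m)
        print(f"{name:>8} at {depth_m:>6.0f} m moves {shift:.2f} px")
```

Under these assumptions the nearby element moves by tens of pixels while the most distant one barely moves at all, which is the depth cue a renderer has to reproduce from per-element depth information.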

Among existing 3D formats, fully synthetic 3D scenes created by artists through tedious 3D modelling and animation processes contain complete depth and visual information, because the data is inherently defined in all three dimensions. Other 3D formats that are used to capture a scene in a way that enables motion parallax, such as light-field video, contain multiple views of the scene which can then be used to infer depth information and provide views of occluded areas from various viewpoints, thus enabling motion parallax for at least a limited area. Some other formats, such as stereoscopic video with just two separate views, may have too little information to enable recreation of motion parallax with good quality, as they lose much of the depth information and visual information for occluded areas.

With the rise of consumer MR devices, a number of new terms have become widely used to describe various aspects of the immersive characteristics of virtual experiences. A key term representing how much freedom the viewer has to navigate within the content is degrees of freedom (DoF). DoF is the number of independent axes along or around which motion can take place. In the case of MR, some low-end devices, such as mobile devices used as VR displays, have sensors that can only register rotations of the device. As the motion of the device is not constrained, the device can rotate around all three axes of physical space, thus enabling 3DoF navigation. When the position of the device is tracked in addition to its rotation, free translation along each of the three axes of physical space provides an additional three degrees of freedom. Rotation and translation together give 6DoF, which provides full freedom of navigation.
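
The difference between 3DoF and 6DoF navigation can be sketched as a small data structure. This is a minimal illustration only: the `Pose` type, its field names and the `render_pose` helper are hypothetical and do not correspond to any particular MR SDK.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pose:
    """Head pose: three rotational plus three translational degrees of freedom."""
    # Rotation around the three axes, in degrees.
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
    # Translation along the three axes, in metres.
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


def render_pose(tracked: Pose, tracks_position: bool) -> Pose:
    """Return the pose a renderer can use for the virtual viewpoint.

    A rotation-only (3DoF) device reports no usable translation, so the
    viewpoint stays at the origin and no motion parallax can be produced.
    """
    if tracks_position:
        return tracked                            # 6DoF: rotation and translation
    return replace(tracked, x=0.0, y=0.0, z=0.0)  # 3DoF: rotation only


if __name__ == "__main__":
    head = Pose(yaw=30.0, x=0.1)  # head turned and moved 10 cm sideways
    print("6DoF viewpoint:", render_pose(head, tracks_position=True))
    print("3DoF viewpoint:", render_pose(head, tracks_position=False))
```

With rotation-only tracking the sideways head movement never reaches the renderer, which is why 3DoF content cannot respond to it.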

Challenges with freedom of navigation

The illusion of virtual content surrounding the user comes from virtual elements having consistent spatial anchoring, so that when users move their head, the elements react to the motion naturally, as if they were present in the same space. This requires that user motion is known, i.e., tracked, and that the viewpoint from which the virtual content is being drawn can be changed according to the tracked motion.

Traditionally, tracking has been seen as the main challenge preventing full 6DoF navigation. However, thanks to recent advances in tracking methods and sensor technology integrated within MR platforms, tracking can be largely considered a solved problem. The problems hindering virtual experiences are actually more pressing on the content side. Technologies enabling capture, distribution and display of alternate realities in a format that not only provides photorealistic scene information but also enough depth and “extra” content to deal with depth cues and occluded scene content, thus enabling full 6DoF navigation, are primarily limited by the content itself.

At the moment, only the real-time 3D rendering widely used in computer games enables full 6DoF navigation for MR. However, even with the rapid development of real-time 3D computer graphics over the last several decades, real-time computer graphics has not yet been able to deliver fully photorealistic virtual experiences, especially not on consumer MR devices with limited graphics processing performance. For high quality experiences, only off-line rendering and off-line reality capture can truly deliver.

One downside of both off-line rendering and off-line reality capture is that the amount of data required to enable free navigation exceeds all reasonable means of content delivery. For example, a limited light-field format that includes data from relatively small camera arrays requires several dozen simultaneous high-resolution video streams instead of the one required by normal 2D video. Supporting low-latency operation is a second downside of off-line rendering. Tackling these limitations and enabling the motion parallax needed for 6DoF MR experiences will require novel solutions achieved by innovative approaches.

Recently, some solutions have been emerging. Many of them restrict the region in which content can be navigated freely, from full 6DoF to a smaller area that usually allows only free head motion while the viewer's position is expected to remain stationary. This limitation permits cutting some corners in content optimization. This approach of enabling motion parallax for only a small area is referred to as a 3DoF+ solution: one that allows freedom of navigation that is more than 3DoF, but not full 6DoF. In the following chapter, we introduce examples of solutions that enable increased levels of immersion for MR using such 3DoF+ approaches.

3DoF+

Achieving motion parallax on current-generation MR devices requires a radical reduction in the amount of transmitted data. For pre-rendered and captured content, there are no obvious ways to achieve the level of data reduction required, so innovative approaches are needed. As outlined above, the challenges are significant and the industry has just started to realize that we need to develop novel solutions quickly. As consumer adoption of MR technology has already begun, early experiences are impacted by a lack of realism due to currently-available technology.
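
A back-of-the-envelope calculation makes the scale of the delivery problem concrete. All numbers below are assumptions chosen purely for illustration: 36 streams stands in for "several dozen", and the per-stream bitrate and the link capacity are hypothetical round figures, not measurements from this paper.

```python
def total_bitrate_mbps(num_streams: int, per_stream_mbps: float) -> float:
    """Aggregate bitrate when every camera view is sent as its own video stream."""
    if num_streams < 0 or per_stream_mbps < 0:
        raise ValueError("stream count and bitrate must be non-negative")
    return num_streams * per_stream_mbps


def fits_link(total_mbps: float, link_mbps: float) -> bool:
    """True if the aggregate bitrate fits within the available link capacity."""
    return total_mbps <= link_mbps


if __name__ == "__main__":
    PER_STREAM_MBPS = 25.0  # assumed bitrate of one high-resolution stream
    LINK_MBPS = 100.0       # assumed consumer downlink capacity
    for label, streams in [("2D video", 1), ("light-field array", 36)]:
        total = total_bitrate_mbps(streams, PER_STREAM_MBPS)
        verdict = "fits" if fits_link(total, LINK_MBPS) else "does not fit"
        print(f"{label}: {total:.0f} Mbit/s, {verdict} in a {LINK_MBPS:.0f} Mbit/s link")
```

Whatever the exact figures, the aggregate grows linearly with the number of views, which is why reducing the amount of transmitted data is the central concern of the 3DoF+ approaches.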

Virtual content navigation limited to 3DoF does not deliver a deep enough level of immersion and thus falls short of the realism claimed by manufacturers, and therefore expected by consumers. Using a 3DoF+ approach, where freedom of motion is limited in order to achieve efficient data optimization, several players investing in MR technology are developing solutions for next generation immersive content. Recently, Google announced early hints of Seurat [1], a surface light-field rendering technology they have been working on. Disney Research also published an initial academic paper [2] describing the real-time animated light-field approach they have been investigating. Both approaches limit freedom of motion within the content to a confined area and optimize the needed visual data based on selected viewing areas.

InterDigital’s Innovation Partners research group has also been working in the area of 3DoF+ solutions, developing several enablers for the next level of immersion for MR content, including for mobile devices with relatively low performance characteristics. One approach analyzes and reduces full 3D scenes into a number of discrete spherical video layers that can be efficiently compressed, streamed and rendered with motion parallax on client devices. With this approach, instead of using the more common per-pixel depth values or full 3D reconstructions of the scene, the virtual view is segregated into a number of concentric spherical 360-degree video layers based on the depth values of the scene. The viewpoint can then be shifted around the central point of the spherical video layers according to head tracking, approximating motion parallax by varying the speeds of motion of the video layers at different distances. This is similar to how traditional 2D cartoons create the illusion of motion parallax by moving several image layers at different speeds.
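
The layered idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used by Innovation Partners: the layer radii are hypothetical, each scene depth is assigned to the layer with the nearest radius, and the apparent shift of a layer is computed for content straight ahead of a viewer who moves sideways from the common center of the spheres.

```python
import math
from typing import Sequence


def assign_layer(depth_m: float, layer_radii_m: Sequence[float]) -> int:
    """Index of the spherical layer whose radius is closest to the scene depth."""
    if not layer_radii_m:
        raise ValueError("at least one layer is required")
    return min(range(len(layer_radii_m)), key=lambda i: abs(layer_radii_m[i] - depth_m))


def layer_shift_deg(head_offset_m: float, layer_radius_m: float) -> float:
    """Apparent angular shift of content straight ahead on a layer of the given
    radius when the viewer moves sideways by head_offset_m from the center."""
    if layer_radius_m <= 0:
        raise ValueError("layer radius must be positive")
    return math.degrees(math.atan2(head_offset_m, layer_radius_m))


if __name__ == "__main__":
    radii = [1.0, 3.0, 10.0, 100.0]  # assumed layer radii, in metres
    head_offset = 0.1                # assumed 10 cm sideways head movement
    for depth in (0.8, 2.4, 50.0):
        layer = assign_layer(depth, radii)
        print(f"depth {depth} m -> layer {layer} (radius {radii[layer]} m)")
    for radius in radii:
        print(f"radius {radius} m shifts by {layer_shift_deg(head_offset, radius):.2f} degrees")
```

Near layers shift by several degrees while distant layers hardly move, mirroring the way cartoon backgrounds are scrolled at different speeds.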

Essential to the illusion are the distance and number of spherical video layers quantizing the overall depth of the scene, which can be dynamically adjusted during content delivery based on scene depth variation, available communication bandwidth and rendering performance of the viewing client, thus optimizing the quality of the experience for given resources.

Another approach analyzes the content and intelligently and dynamically packages it into layers of spherical video and 3D assets. This approach takes into account how to optimize the quality of experience while providing motion parallax, by leveraging the 3D rendering and video rendering resources available on both the content server and the client device. It extends the idea of dynamically reducing the 3D information needed to support full 6DoF navigation, towards 3DoF+ with limited navigation, to avoid bottlenecks in the content distribution pipeline. In this approach, 6DoF content is analyzed to estimate how well motion parallax can be approximated for different visual elements with spherical video layers and which elements benefit most from full 3D information. Based on this analysis and the limitations of communication bandwidth and rendering performance, the virtual scene is then distributed to a viewing client as a combination of spherical video layers and full 3D assets, dynamically adjusting the content format of each visual element on a per-client basis.

These solutions adapt data optimization and data amounts to the communication and processing resources of the MR client device and content server, thus maximizing perceived quality given available resources. Both approaches use spatial information that is much simpler than the full 3D scene description needed for 6DoF navigation. The approach of reducing the spatial information needed to recreate motion parallax in these solutions may lead to the ability to recreate motion parallax for existing standard 2D content.
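
The per-element packaging decision described above can be illustrated with a simple greedy sketch. The paper does not specify the actual analysis or selection method, so everything here is an assumption made for illustration: the `Element` type, the `layer_error` score (how poorly spherical layers approximate parallax for an element), the bandwidth cost of sending an element as a full 3D asset, and the greedy benefit-per-cost rule.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Element:
    name: str
    layer_error: float     # hypothetical score: higher means layers approximate parallax worse
    asset_cost_mbps: float  # hypothetical extra bandwidth to send the element as a 3D asset


def choose_formats(elements: Iterable[Element], budget_mbps: float) -> Dict[str, str]:
    """Greedily upgrade elements from 'layer' to '3d', best benefit per cost first,
    for as long as the remaining bandwidth budget allows."""
    elements = list(elements)
    if any(e.asset_cost_mbps <= 0 for e in elements):
        raise ValueError("asset_cost_mbps must be positive")
    formats = {e.name: "layer" for e in elements}  # default: spherical video layer
    remaining = budget_mbps
    ranked = sorted(elements, key=lambda e: e.layer_error / e.asset_cost_mbps, reverse=True)
    for element in ranked:
        if element.asset_cost_mbps <= remaining:
            formats[element.name] = "3d"
            remaining -= element.asset_cost_mbps
    return formats


if __name__ == "__main__":
    scene = [
        Element("character", layer_error=0.9, asset_cost_mbps=4.0),
        Element("table", layer_error=0.5, asset_cost_mbps=5.0),
        Element("skyline", layer_error=0.05, asset_cost_mbps=6.0),
    ]
    for budget in (0.0, 8.0, 10.0):  # assumed per-client budgets in Mbit/s
        print(budget, choose_formats(scene, budget))
```

A client with a larger budget receives more elements as full 3D assets, while distant elements that layers already approximate well stay as spherical video, which is the per-client adaptation the text describes.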

We have already seen some early examples of deep neural networks trained with stereoscopic content inferring depth information for standard 2D video. Estimating depth information at a level that permits the segregation of visual elements into different depth layers, combined with a way to recreate missing (occluded) information, will enable some level of motion parallax for legacy 2D content. Remastering this content would radically alter the dynamics of commercial adoption of new MR devices by allowing older experiences to be consumed with a completely new level of immersion.

Conclusions

In this white paper, we have identified the limitations of current-generation MR content, and the fact that solutions enabling full 6DoF navigation of photorealistic MR content are in high demand, as full 6DoF navigation will unlock the true potential of MR. We have also explained how, rather than aiming for full 6DoF navigation, a similar consumer quality of experience can be achieved on MR devices with limited performance characteristics using a 3DoF+ approach.

As the limitations associated with content navigation become better understood among MR explorers, and the value of solutions approaching full 6DoF navigation in photorealistic virtual experiences is realized, we expect rapid development and adoption of 3DoF+ solutions that move MR immersion to the next level. This new level of immersion will usher in a completely new era of interactive media. In our wildest imagination, this era will not be limited to only new content, but will unlock a new, immersive way of experiencing the traditional 2D content that has been created over many decades.

In addition to the solutions explained briefly in this white paper, InterDigital is continuously working on key enabling technologies that bring new virtual experiences to consumers, pushing the boundaries of MR quality of experience.

References

1. https://blog.google/products/google-vr/bringing-google-earth-expeditions-seurat/
2. https://www.disneyresearch.com/publication/real-time-rendering-with-compressed-animated-light-fields/

WP_201804_009 | Innovation Partners | www.innovation-partners.com