Visual Technologies

Pioneering visual technology innovation delivers rich multimedia experiences and drives global standards.

In 2019, InterDigital incorporated a world-class video and AI research team from Technicolor to expand its work in video technology. Our areas of expertise encompass versatile video coding standards, immersive media representation and delivery, video broadcasting and multicasting, real-time interactive video applications, and optimized video delivery over 5G networks. InterDigital’s visual technologies development and advanced research rank among the leading efforts in the video industry. In recent years, InterDigital’s video standards and platforms team has pioneered new capabilities and made more than 100 contributions to key global standards through participation in international standardization organizations such as ISO/IEC/MPEG, ITU-T/VCEG, JVET and 3GPP SA4. InterDigital was involved in the development of the High Efficiency Video Coding (HEVC) standard and led the standardization of its Scalable HEVC (SHVC) extension.

InterDigital is a lead partner in next-generation video coding standard technology development projects covering standard dynamic range, high dynamic range and wide color gamut, 360-degree video coding, and other areas. The industry-leading adaptive 360-degree video streaming platform developed by our engineers delivered real-time 4K-quality video with very low latency in demonstrations at MWC 2018 and in the 5G-Coral VR trial.

Artificial Intelligence Lab

The InterDigital Artificial Intelligence Lab specializes in adapting and applying the latest advances in Artificial Intelligence (AI) to the creation and delivery of next generation content – and in innovating how these advances can be applied at home for the benefit of consumers everywhere.

Energy Efficient Deep Learning

Emerging applications for AI and deep learning underline the need to optimize energy efficiency, both to extend battery life and to promote sustainability. Energy considerations have become crucial to the success of AI for wireless, and InterDigital continues to research and develop cutting-edge techniques to measure and optimize the energy use of deep neural networks (DNNs), whether the goal is longer battery life or faster inference.

Deep Video and M2M Compression

Efficient video compression is critical for the storage and transmission of growing video content, especially as deep learning-based approaches have quickly exceeded traditional approaches for image compression. InterDigital is a leader in the design of disruptive video codecs based on deep learning techniques, which complement the ongoing augmentation of modules in existing software codec architectures. These codecs provide a natural way to represent video content using latent representations, a solution especially suited to interpretation by other neural networks or by machines such as IoT devices.

AI for Dynamic Wireless Environments

InterDigital is dedicated to developing neural network-based models that achieve better performance and to using neural networks to address the complex optimization problems that arise in wireless system design for 5G and beyond. Our engineers lead research to optimize wireless systems, particularly when the channel state is highly dynamic and channel state information (CSI) varies over time and location. Deep neural networks help approximate the otherwise unknown relationship between the inputs and outputs of wireless optimization problems, enabling inference to be done in quasi-real-time.

Synthetic Content

The promise and rapid growth of cloud gaming has revealed the need to automate the process of generating gaming content. InterDigital addresses the particular technical challenges related to the creation and tuning of 3D assets based on a text description, including the initial creation of an avatar from text and the tuning or personalization of that avatar based on user feedback, enabling users to have more input into the creative process.

Home Experience Lab

InterDigital R&I’s Home Experience Lab is committed to improving the user experience in the home and developing future connected home technologies. Use cases include audio/video/media consumption, entertainment (including cloud gaming), and IoT and Smart City technology. 

To do so, we leverage our best-in-class expertise in communication, networking (5G, WiFi, virtualization and SDR), video and VR streaming, cloud computing/gaming, and ML/AI (machine learning and visualization for better in-home understanding and control, and edge computing).

AI in CE devices

The Home Experience Lab explores ways to offer the most advanced AI-based services in the home while minimizing cloud access, to provide the utmost responsiveness, preserve privacy and seamlessly empower users. To do so, we abstract underlying complexity and work on transparently interfacing CE devices to distribute processing and enhance capabilities.

Future Network for the Home

A main area of focus is home connectivity, supporting user mobility at high throughput and low latency with possible QoS constraints across a large number of devices. Key technologies include local processing capabilities (CPU, video processing, model inference and training) and storage, and innovative AI solutions to steer traffic toward the best connectivity in a multi-radio environment.

Streaming and Communication

With users and applications pushing ever closer to zero latency, especially for VR, InterDigital R&I is developing ultra-low-latency streaming solutions that can meet the most demanding application needs, including volumetric streaming, AR/VR, cloud gaming, unicasting, broadcasting and multicasting, and V2X.

IoE & Sensing

At the crossroads of data fusion, machine learning and multi-modal sensing, we develop technologies that synergistically sense the home and provide a platform for new in-home capabilities and services. We combine off-the-shelf sensors with the newest unobtrusive and privacy-aware technologies like geophone or radar.

Imaging Science Lab

InterDigital R&I’s Imaging Science Lab partners with key players across the film and consumer electronics industries to develop cutting-edge tools to analyze, process, represent, compress and render content – enabling the production and delivery of high quality real and synthetic images. 

Several key areas on which the Imaging Science Lab is focusing include:

  • Color science and applications;
  • Content compression and its associated ecosystems; and
  • Content processing.

These technologies are developed in a dedicated project portfolio, ensuring a balance between advanced research and its direct application to today’s technology needs.

Video Compression

The Imaging Science Lab develops new technologies to improve compression efficiency and makes key contributions to the new ISO MPEG/ITU-T VCEG standard. We are also conducting advanced research into the use of deep learning to develop disruptive video codec solutions.


InterDigital R&I’s Imaging Science Lab is actively involved in the development of a complete joint SDR/HDR solution, ensuring backward compatibility and optimal delivery and rendering on next generation displays.

TV UI and interactive experiences

A focus of research at the Imaging Science Lab is the development of new display functionalities and their associated usage at home.

Point Cloud Compression

Point Cloud technology is one of the most exciting and multi-purpose visual technologies being developed today. InterDigital R&I is developing pioneering compression solutions for Point Cloud technology, leveraging the joint geometry and video nature of point clouds. This includes active participation in the new PCC MPEG standard.

Advanced AI-based Object, Facial and Production Capabilities

InterDigital R&I builds on its heritage of developing cutting-edge solutions for Technicolor, a leader in production services, with a range of technologies that we believe will see broad application beyond specialized services. These include AI-based capture to assist VFX artists, animated face rig extraction (in collaboration with Max Planck Institute), digital make-up, and color management in VFX.

Immersive Lab

InterDigital’s Immersive Lab develops today’s solutions for tomorrow’s interactive media environment. Through the application of innovations in computer graphics, computer vision, video processing and optics, we offer professionals and end-users alike the solutions they need to amplify their immersive experiences. Focus areas include:

  • Virtual Reality;
  • Augmented Reality;
  • Virtual Production;
  • Real-time Visual Effects;
  • Light Field Technologies; and
  • Light Guiding Technologies.

Light Fields

We work on Light Field capture, editing and rendering with a view to defining more powerful image representations. One potential use is on mobile phones with multiple cameras.

Light Guiding

A new enabling technology for guiding light at the nanoscale, covering abrupt light deviation and near-field beam forming. Potential applications include AR glasses, displays and camera sensors.

Digital Double

FACET (Facial Animation Control for Expression Transfer) is a proprietary tool we developed to streamline 3D facial animation for VFX and animation artists. We also developed a fully automatic pipeline that takes the output of a photogrammetric capture rig and creates a fully rigged CG character, set up with a complete facial expression system and ready to be animated.

Mixed Reality Interactive Experiences

Allow users to view and interact with virtual objects inserted in a real-world environment. Characterize lighting parameters to better blend virtual objects within the real-world environment. Use an Augmented Reality shared server (AR Hub) to build, store and share advanced scene descriptions, enabling device localization in the environment, remote interactions with real objects and advanced augmented reality experiences.

Advanced VR Capabilities

The development of Virtual Reality capabilities and applications is a fertile area for new research. InterDigital R&I’s Immersive Lab is pioneering technologies in VR-based advanced collaborative production environments, game engine technologies that enable the creation and viewing of high-end visuals in close to real-time, social VR, and VR interactivity.

Data Sets

In contrast to existing datasets with very few video resources and limited accessibility due to copyright constraints, LIRIS-ACCEDE consists of videos with a large content diversity annotated along affective dimensions. All excerpts are shared under Creative Commons licenses and can thus be freely distributed without copyright issues. The dataset (the video clips, annotations, features and protocols) is publicly available.

The LaFin: Large-scale Flickr interestingness dataset (hereafter “the Dataset”) is a collection of Flickr image IDs corresponding to about 123k Flickr images, equally balanced between interesting and non-interesting images, and their corresponding metadata. In addition to the images, their binary labels, and associated metadata, some precomputed features are provided: CNNs, semantic features derived from image captioning and Word2Vec representations of Flickr tags. It is intended to be used for analyzing socially-driven image interestingness and...

Recent VR/AR applications require some way to evaluate the Quality of Experience (QoE), which can be described in terms of comfort, acceptability, realism, and ease of use. In order to assess all these different dimensions, it is necessary to take into account the user’s senses and in particular, for A/V content, the vision. Understanding how users watch a 360° image, how they scan the content, where they look and when is thus necessary to...

Automatic extraction of face tracks is a key component of systems that analyze people in audio-visual content such as TV programs and movies. Due to the lack of annotated content of this type, popular algorithms for extracting face tracks have not been fully assessed in the literature. To help fill this gap, we introduce a new dataset, based on the full audio-visual person annotation of a feature movie. Thanks to this dataset, state-of-art tracking metrics...

The VSD benchmark is a collection of ground-truth files based on the extraction of violent events in movies and web videos, together with high-level audio and video concepts. It is intended to be used for assessing the quality of methods for the detection of violent scenes and/or the recognition of some high-level, violence-related concepts in movies and web videos. The data was produced by Technicolor for the 2012 subset and by the Fudan University and the Ho Chi Minh University...

The Interestingness Dataset is a collection of movie excerpts and key-frames and their corresponding ground-truth files based on the classification into interesting and non-interesting samples. It is intended to be used for assessing the quality of methods for predicting the interestingness of multimedia content. The data has been produced by the MediaEval 2016 Predicting Interestingness and the MediaEval 2017 Predicting Interestingness Tasks' organizers and was used in the context of this benchmark. A detailed description of the benchmark can...

The automatic recognition of human emotions is of great interest in the context of multimedia applications and brain-computer interfaces. While users’ emotions can be assessed based on questionnaires, the results may be biased because the answers could be influenced by social expectations. More objective measures of emotions can be obtained by studying the users' physiological responses. The present database has been constructed in particular to evaluate the usefulness of electroencephalography (EEG) for emotion recognition in the context...

We provide a set of synchronized Light-Field video sequences captured by a 4x4 camera rig at 30fps. Each camera has a resolution of 2048x1088 pixels and a 12mm lens. The Field Of View (FOV) is 50˚ x 37˚. For each Light-Field video sequence we provide the captured images after the color homogenization and the demosaicking, as well as the pseudo-rectified images. Images are provided in PNG format. Calibration parameters are also provided. Please see [1]...

The Movie Memorability Database and related software is a collection of movie excerpts and corresponding ground-truth files based on the measurement of long-term memory performance when recognizing small movie excerpts from weeks to years after having viewed them. It is accompanied by audio and video features extracted from the movie excerpts. It is intended to be used for assessing the quality of methods for predicting the memorability of multimedia content. A detailed description of the...

The VideoMem or Video Memorability Database is a collection of sound-less video excerpts and their corresponding ground-truth memorability files. The memorability scores are computed based on the measurement of short-term and long-term memory performances when recognizing small video excerpts a few minutes after viewing them for the short-term case, and 24 to 72 hours later for the long-term case. It is accompanied by video features extracted from the video excerpts. It is intended to be...


Viewport Adaptive 360° Video Live Streaming Solution

  • Real-time 4K 360° video acquisition, tile-based HEVC encoding, DASH live streaming
  • Viewport adaptive low-latency live streaming
  • Real-time 360° video decoding, composition and 2D/3D rendering
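
To make the viewport-adaptive, tile-based idea above more concrete, the following is a minimal Python sketch of how a client might decide which tile columns to request at high quality. It is an illustration under stated assumptions, not a description of InterDigital's platform: the function name `tiles_in_viewport`, the equal-width column layout of an equirectangular frame, and the `"high"`/`"low"` quality labels are all hypothetical.

```python
def tiles_in_viewport(yaw_deg, hfov_deg, num_cols):
    """Return the tile columns that overlap the horizontal viewport.

    Assumption: the 360-degree frame is split into num_cols equal-width
    columns, with column 0 starting at longitude 0 degrees.
    """
    tile_width = 360.0 / num_cols
    # Longitude of the viewport's left edge, wrapped to [0, 360).
    left = (yaw_deg - hfov_deg / 2.0) % 360.0
    selected = []
    for col in range(num_cols):
        start = col * tile_width
        # Angular distance from the viewport's left edge to the tile's start.
        offset = (start - left) % 360.0
        # The tile overlaps if it starts inside the viewport, or if the
        # viewport's left edge falls strictly inside the tile.
        if offset < hfov_deg or offset > 360.0 - tile_width:
            selected.append(col)
    return selected


def quality_plan(yaw_deg, hfov_deg, num_cols):
    """Map every tile column to a hypothetical quality label."""
    visible = set(tiles_in_viewport(yaw_deg, hfov_deg, num_cols))
    return {col: ("high" if col in visible else "low") for col in range(num_cols)}


if __name__ == "__main__":
    # A 90-degree viewport looking straight ahead on an 8-column layout.
    print(tiles_in_viewport(0, 90, 8))
    print(quality_plan(100, 90, 8))
```

Tiles outside the viewport are still requested at low quality in this sketch, so a sudden head turn shows a degraded picture rather than nothing while the client catches up.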

Scalable HEVC Video Delivery Platform

  • End-to-end scalable video delivery system with low re-buffering latency and efficient bandwidth utilization
  • Standards supported: MPEG DASH and Scalable HEVC (SHVC)
  • Devices supported: 4K/UHD TV, tablets, smartphones, etc.
  • OS supported: Android and Linux

Power Aware HEVC Streaming Solution for Mobile Communications

  • Power aware HEVC encoder: considers power consumption during the encoding process
  • Complexity aware Media Presentation Description (MPD): conveys complexity level information in the manifest file
  • Power aware streaming client: power aware adaptation with power sensing
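
One way to picture the power-aware adaptation described above is a client that filters the available representations by bandwidth and then, when the battery is low, by a complexity level read from the manifest. The Python sketch below is only an assumption-laden illustration: the `Representation` class, the integer `complexity` field, and the 20% battery threshold are hypothetical and are not taken from the MPD format or from InterDigital's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    bitrate_kbps: int
    complexity: int  # hypothetical decode-complexity level; higher costs more power


def choose_representation(reps, bandwidth_kbps, battery_fraction,
                          low_battery=0.2, max_complexity_when_low=1):
    """Pick a representation using bandwidth first, then battery state."""
    # Keep only the representations the network can sustain.
    feasible = [r for r in reps if r.bitrate_kbps <= bandwidth_kbps]
    if not feasible:
        # Nothing fits: fall back to the lowest bitrate on offer.
        return min(reps, key=lambda r: r.bitrate_kbps)
    if battery_fraction < low_battery:
        # On low battery, prefer representations that are cheap to decode.
        cheap = [r for r in feasible if r.complexity <= max_complexity_when_low]
        if cheap:
            feasible = cheap
    # Among what remains, take the highest bitrate.
    return max(feasible, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    ladder = [Representation(1000, 1), Representation(3000, 2), Representation(6000, 3)]
    print(choose_representation(ladder, bandwidth_kbps=4000, battery_fraction=0.8))
    print(choose_representation(ladder, bandwidth_kbps=4000, battery_fraction=0.1))
```

With the same bandwidth, the second call returns a lower-complexity representation because the simulated battery level is below the threshold.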

Pointing Zoom Solution

  • Illustrates content interaction possibilities
  • Past systems required manual tracking by a human operator
  • Point at area of screen and region is magnified (Try to track a player)
  • Zoom factor may be adjusted via mouse wheel (1x, 2x, 4x options)
  • Alternatively, the zoom area may be placed into a picture-in-picture window
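
The pointing-and-magnifying behavior listed above comes down to computing a crop rectangle around the pointer. The Python sketch below shows one plausible way to do that for the 1x, 2x and 4x options; the function name `zoom_region` and the choice to clamp the rectangle to the frame edges are assumptions made for illustration.

```python
def zoom_region(px, py, frame_w, frame_h, zoom):
    """Return (left, top, width, height) of the region to magnify.

    Assumption: the region is centered on the pointer (px, py) and shifted
    as needed so that it never extends beyond the frame.
    """
    if zoom not in (1, 2, 4):
        raise ValueError("zoom must be 1, 2 or 4")
    crop_w = frame_w // zoom
    crop_h = frame_h // zoom
    # Center on the pointer, then clamp to the frame boundaries.
    left = min(max(px - crop_w // 2, 0), frame_w - crop_w)
    top = min(max(py - crop_h // 2, 0), frame_h - crop_h)
    return left, top, crop_w, crop_h


if __name__ == "__main__":
    # Pointer at the center of a 1920x1080 frame, 2x zoom.
    print(zoom_region(960, 540, 1920, 1080, 2))
    # Pointer in the top-left corner, 4x zoom: the region is clamped.
    print(zoom_region(0, 0, 1920, 1080, 4))
```

The returned rectangle can then be scaled up to fill the screen or, for the picture-in-picture option, drawn into a smaller inset window.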

Personalized VR Player

  • Personalized VR Player enables real-time interaction with real-world media elements in a virtual world
  • Viewport optimized 360° video technology delivers improved media quality with reduced bandwidth consumption

