Papers

A spatio-angular filter for high quality sparse light field refocusing
The ability to render synthetic depth-of-field effects post capture is a flagship application of light field imaging.
However, many existing light field refocusing methods are known to suffer from severe artefacts, known as angular aliasing, when applied to sparse light fields. We propose in this paper a method for high quality sparse light field refocusing based on insights from depth-based bokeh rendering techniques. We first provide an in-depth analysis of the geometry of the defocus blur in light field refocusing by analogy with the defocus geometry in a traditional camera using the thin lens model. Based on this analysis, we propose a filter for removing angular aliasing artefacts in light field refocusing, which consists of modifying the well-known shift-and-sum algorithm to apply a depth-dependent blur to the light field in between the shift and the sum operations. We show that our method achieves significant quality improvements over existing approaches for a reasonable computational cost.
Authors: Martin Alain and Aljosa Smolic
Venue: IEEE ICME Workshop 2021
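
To make the shift-and-sum modification above concrete, here is a minimal sketch of the idea, assuming grayscale sub-aperture views and a simple Gaussian blur whose strength is supplied per view; the function name, the `sigma_for_view` callable and the blur model are illustrative stand-ins, not the paper's actual spatio-angular filter.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift, gaussian_filter

def refocus_with_depth_blur(views, positions, slope, sigma_for_view):
    """Shift-and-sum refocus with an extra blur between the shift and the sum.

    views          : list of (H, W) grayscale sub-aperture images
    positions      : list of (u, v) angular coordinates of each view, centred on 0
    slope          : disparity (pixels per unit of angular coordinate) of the
                     chosen refocus plane
    sigma_for_view : callable (u, v) -> Gaussian sigma; a hypothetical stand-in
                     for the paper's depth-dependent filtering
    """
    acc = np.zeros_like(views[0], dtype=np.float64)
    for img, (u, v) in zip(views, positions):
        # Shift the view towards the refocus plane, then blur, then accumulate.
        shifted = nd_shift(img.astype(np.float64), (slope * v, slope * u), order=1)
        acc += gaussian_filter(shifted, sigma=sigma_for_view(u, v))
    return acc / len(views)
```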


Not yet published
Semantic Crowd Re-targeting: Implementation for Real-time Applications and User Evaluations
Crowd simulation is the act of simulating and controlling the dynamic movement of large groups of virtual characters. It is traditionally a complex and time-consuming process that requires extensive manual effort. On one hand, commercial and liberally licensed tools tend to have many aspects of simulation tightly integrated, which can make them prohibitively difficult to re-configure; on the other, paying extras can be far more costly. In this context, the re-use of existing simulated crowds has been identified as a valuable cost-saving approach to crowd simulation. Previous approaches have investigated the use of environment semantics, but they have not been integrated with a commonly used simulation platform, which limits their usefulness. We present a novel approach to crowd simulation using an emergent system for re-targeting autonomous crowds and report on the findings of a problem discovery study, analyzing and establishing key aspects of functionality, usability, and user experience. Our results provide a breakdown of the crowd simulation process with corresponding time-on-task metrics to provide a reference point for future scientific research into crowd simulation systems. Furthermore, we report on how users react to a system that involves the use of semantic data to facilitate the re-use of existing crowd simulations. We anticipate that other researchers will follow suit, developing tools that are both innovative and usable in crowd simulation practice.
Authors: David L. Smyth; Gareth W. Young; Jan Ondrej; Rogerio da Silva; Alan Cummins; Susheel Nath; Amar Zia Arslaan; Pisut Wisessing; and Aljosa Smolic


Spotlight on EU Project SAUCE
SAUCE is a three-year EU Research and Innovation project between Universitat Pompeu Fabra, Foundry, DNEG, Brno University of Technology, Filmakademie Baden-Württemberg, Saarland University, Trinity College Dublin and Disney Research, whose aim is to create a step-change in allowing creative industry companies to re-use existing digital assets for future productions. The goal of SAUCE is to produce, pilot and demonstrate a set of professional tools and techniques that reduce the costs of producing enhanced digital content for the creative industries, by increasing the potential for re-purposing and re-use of content as well as providing significantly improved technologies for digital content production and management.
The approach is based on research into
- light-field technology
- automated classification and tagging using deep learning and semantic labeling to describe and draw inferences
- the development of tools for automated asset transformation, smart animation, storage and retrieval.
These new technologies and tools will show that a vast reduction of costs and increases in efficiency are possible, facilitating the production of more content, of higher quality and creativity, for the benefit of the competitiveness of the European creative industries.
Authors: All partners


Physiologically Personalized Color Management for Motion Picture Workflows
One of the essential mechanisms employed by the human visual system when interpreting the natural world is that of trichromatic integration of physical scene spectra by cone photoreceptors. By extension of this, different scene spectra can result in the same color sensation in an observer, a phenomenon known as metamerism. This allows imaging systems to produce realistic reproductions of scene content by the same three-channel mechanism. To predict these matches, color matching functions (CMFs) are used, which aim to describe the average spectral integration behavior of observers. However, the use of a single average observer CMF has been shown to result in impactful color rendering errors, as there exists significant variation in the spectral absorption characteristics of the eye within populations of color-normal observers. When this is combined with the growing disparity in the spectral characteristics of emerging display technologies, it becomes evident that this inter-observer variability should be accounted for. Asano and Fairchild present a physiologically based individual observer model, as well as a method for separating a population of observers into a limited number of categorical CMFs. Building on this work, we present a computationally simple metameric match simulation pipeline which uses these categorical functions. With this pipeline, we perform a simulation with real display spectra and natural images to observe the variability which could occur among a population as a result of observer metamerism in a motion picture viewing scenario. The results provide further evidence that inter-observer metameric variability is a relevant problem in the context of natural images. Finally, we outline how this pipeline can be incorporated into one's color management strategy.
Authors: Trevor Canham; David L Long; Mark D Fairchild; Marcelo Bertalmío
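
As a rough illustration of how a metameric match simulation can use categorical CMFs, the sketch below integrates a display emission spectrum against one observer category's colour matching functions; the array names and the rectangular integration are assumptions, not the paper's pipeline.

```python
import numpy as np

def tristimulus(spectrum, cmf, wavelengths):
    """Integrate a display emission spectrum against one observer's colour
    matching functions (CMFs) to obtain tristimulus values.

    spectrum    : (N,) spectral radiance sampled at `wavelengths`
    cmf         : (N, 3) CMFs for one observer category
    wavelengths : (N,) sampling grid in nm
    """
    step = np.gradient(wavelengths)                      # handles uneven grids
    return (spectrum[:, None] * cmf * step[:, None]).sum(axis=0)

# Observer metamerism in a nutshell: the same display spectrum integrated
# against two categorical CMFs can yield different tristimulus values, e.g.
#   t_avg = tristimulus(display_spectrum, cmf_average, wl)
#   t_cat = tristimulus(display_spectrum, cmf_category_3, wl)
# where display_spectrum, cmf_average, cmf_category_3 and wl are placeholders.
```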


SAUCE: Asset Libraries of the Future
Storage and retrieval of production assets is vital for every modern VFX and animation facility. From the volume of assets being stored to the constantly changing variety and richness of the asset data, efficiently storing, indexing, finding and retrieving the assets you want is a growing challenge. This paper discusses some of the requirements of modern asset storage systems for VFX and animation, introducing two systems that were built to address these challenges as part of the collaborative EU funded “SAUCE” project; DNEG’s search and retrieval framework, and Foundry’s back-end asset storage. It also presents example use cases of the asset library from Filmakademie’s experiments in virtual production, demonstrating more artist focused and task centered systems that enable greater asset re-use.
Authors: Jonas Trottnow; William Greenly; Christian Shaw; Sam Hudson; Volker Helzle; Henry Vera; Dan Ring


Vision models fine-tuned by cinema professionals for High Dynamic Range imaging in movies
Many challenges that deal with processing of HDR material remain very much open for the film industry, whose extremely demanding quality standards are not met by existing automatic methods. Therefore, when dealing with HDR content, substantial work by very skilled technicians has to be carried out at every step of the movie production chain. Based on recent findings and models from vision science, we propose in this work effective tone mapping and inverse tone mapping algorithms for production, post-production and exhibition. These methods are automatic and real-time, and they have been both fine-tuned and validated by cinema professionals, with psychophysical tests demonstrating that the proposed algorithms outperform both the academic and industrial state-of-the-art. We believe these methods bring the field closer to having fully automated solutions for important challenges for the cinema industry that are currently solved manually or sub-optimally. Another contribution of our research is to highlight the limitations of existing image quality metrics when applied to the tone mapping problem, as none of them, including two state-of-the-art deep learning metrics for image perception, are able to predict the preferences of the observers.
Authors: Praveen Cyriac; Trevor Canham; David Kane; Marcelo Bertalmío

A Spatio-Angular Binary Descriptor For Fast Light Field Inter View Matching
Light fields are able to capture light rays from a scene arriving at different angles, effectively creating multiple perspective views of the same scene. Thus, one of the flagship applications of light fields is to estimate the captured scene geometry, which can notably be achieved by establishing correspondences between the perspective views, usually in the form of a disparity map. Such correspondence estimation has been a long-standing research topic in computer vision, with application to stereo vision or optical flow. Research in this area has shown the importance of well designed descriptors to enable fast and accurate matching. We propose in this paper a binary descriptor exploiting the light field gradient over both the spatial and the angular dimensions in order to improve inter-view matching. We demonstrate in a disparity estimation application that it achieves accuracy comparable to existing descriptors while being faster to compute.
Authors: Martin Alain and Aljosa Smolic
Venue: IEEE ICIP 2020
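
The following sketch illustrates the general idea of a binary descriptor built from spatio-angular gradients, in the spirit of BRIEF-style tests; the sampling pattern, the use of gradient magnitudes and the Hamming matching are assumptions rather than the descriptor actually proposed in the paper.

```python
import numpy as np

def binary_gradient_descriptor(patch_stack, pairs):
    """Pack pairwise gradient-magnitude comparisons into a compact bit string.

    patch_stack : (A, H, W) stack of co-located patches from A angular views
    pairs       : (B, 2, 3) integer indices (view, row, col) of B comparisons;
                  the sampling pattern is an assumption, not the paper's.
    """
    # Spatial gradients of every angular view, then the gradient magnitude.
    gy, gx = np.gradient(patch_stack.astype(np.float64), axis=(1, 2))
    mag = np.hypot(gx, gy)
    bits = np.empty(len(pairs), dtype=np.uint8)
    for i, (a, b) in enumerate(pairs):
        bits[i] = mag[tuple(a)] > mag[tuple(b)]          # one binary test per pair
    return np.packbits(bits)
```

Two descriptors can then be compared with a Hamming distance, e.g. `np.count_nonzero(np.unpackbits(d1 ^ d2))`, which is what makes this kind of matching fast.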

Matching visual induction effects on screens of different size by regularizing a neural field model of color appearance
In the film industry, the same movie is expected to be watched on displays of vastly different sizes, from cinema screens to mobile phones. But visual induction, the perceptual phenomenon by which the appearance of a scene region is affected by its surroundings, will be different for the same image shown on two displays of different dimensions. This presents a practical challenge for the preservation of the artistic intentions of filmmakers, as it can lead to shifts in image appearance between viewing destinations. In this work we show that a neural field model based on the efficient representation principle is able to predict induction effects, and how by regularizing its associated energy functional the model is still able to represent induction but is now invertible. From this we propose a method to pre-process an image in a screen-size dependent way so that its perception, in terms of visual induction, may remain constant across displays of different size. The potential of the method is demonstrated through psychophysical experiments on synthetic images and qualitative examples on natural images.
Authors: Trevor D. Canham; Javier Vazquez-Corral; Elise Mathieu; Marcelo Bertalmío

Retinal Noise Emulation: A Novel Artistic Tool for Cinema That Also Improves Compression Efficiency
In cinema it is standard practice to improve the appearance of images by adding noise that simulates film grain. This is computationally very costly, so it is only done in post-production and not on the set. It is also limiting because the artists are not able to really experiment with the noise nor introduce novel looks. Furthermore, video compression requires a higher bit rate when the source material has film grain or any other type of high frequency texture. In this work, we introduce a method for adding texture to digital cinema that aims to solve these problems. The proposed algorithm is based on modeling retinal noise, with which the images processed by our method have a natural appearance. This “retinal grain” serves a double purpose. One is aesthetic, as it has parameters that allow the resulting texture appearance to be varied widely, making it an artistic tool for cinematographers. Results are validated through psychophysical experiments in which observers, including cinema professionals, prefer our method over film grain synthesis methods from academia and the industry. The other purpose of the retinal noise emulation method is to improve the quality of compressed video by masking compression artifacts, which makes it possible to lower the encoding bit rate while preserving image quality, or to improve image quality while keeping the bit rate fixed. The effectiveness of our approach for improving coding efficiency, with average bit rate savings of 22.5%, has been validated through psychophysical experiments using professional cinema content shot in 4K, color-graded, and with the amount of retinal noise selected by a motion picture specialist based solely on aesthetic preference.
Authors: Itziar Zabaleta; Mateo Cámara; César Díaz; Trevor Canham; Narciso García; Marcelo Bertalmío
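
For illustration only, the sketch below adds a generic, parameterized, luminance-dependent grain to an image; it conveys the idea of an adjustable texture tool but is not the retinal noise model described in the paper (the `amplitude` and `grain_size` parameters are hypothetical).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def add_grain(image, amplitude=0.04, grain_size=1.5, rng=None):
    """Generic parametric grain: luminance-dependent noise, blurred to control
    grain size. Illustrative only; not the paper's retinal noise model."""
    rng = np.random.default_rng() if rng is None else rng
    noise = gaussian_filter(rng.standard_normal(image.shape), sigma=grain_size)
    noise /= max(noise.std(), 1e-8)                      # unit-variance texture
    lum = np.sqrt(np.clip(image, 0.0, 1.0))              # simple luminance dependence
    return np.clip(image + amplitude * lum * noise, 0.0, 1.0)
```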


Optimized Predictive Coding of 5D Light Fields

With the emergence of Light Field (LF) technology, the number of dimensions representing light has once again increased. 4D light fields captured with additional temporal information per ray, or as assemblies of rays, include a 5th dimension, namely time, and thus produce 5D light fields. This is crucial when there are moving objects in the scene. In recent years, research has paved the way for several ideas on efficient 4D light field compression. However, techniques for compression and storage in higher dimensions are still an open challenge. In this paper we introduce a low-complexity predictive coding scheme for 5D light fields based on the automatic generation of a per-frame customized coding structure exploiting both spatial and temporal neighbors. Evaluations with the HEVC codec show a gain in quality of more than 1.4 dB.
Authors: Harini Priyadarshini Hariharan; Thorsten Herfet


A Versatile 5D Light Field Capture Array
In this paper, we describe a versatile light field capturing device able to generate sparse 4D LF, LF video and 5D LF images. The capturing array has proven to be a ubiquitous tool for the experimental generation of light fields and the development of post-processing algorithms to provide so-called LF assets that can be re-used and re-purposed in creative environments.
Authors: Kelvin Chelli; Tobias Lange; Thorsten Herfet; Marek Solony; Pavel Smrz; Martin Alain; Aljosa Smolic; Jonas Trottnow; Volker Helzle


The persistent influence of viewing environment illumination color on displayed image appearance
Chromatic adaptation under competing influences from emissive displays and ambient illumination is a little-studied topic in the context of color management, in proportion to its influence on displayed image appearance. An experiment was conducted to identify the degree to which observers adapt to the white point of natural images on an emissive display versus the color of ambient illumination in the room. Observers' responses showed no significant difference from those of a previous experiment conducted with roughly the same procedure and conditions on a mobile display with a significantly smaller viewing angle. A model is proposed to predict the degree of adaptation values reported by observers. This model has a form such that it can be re-optimized to fit additional data sets for different viewing scenarios and can be used in conjunction with a number of chromatic adaptation transforms.
Authors: Trevor Canham; Marcelo Bertalmío
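
A common way to express partial adaptation of the kind measured here is to blend the display white point and the ambient white point by a degree of adaptation D; the sketch below shows that blend under the assumption of a simple linear mix in XYZ, which is not necessarily the form of the model proposed in the paper.

```python
import numpy as np

def mixed_adaptation_white(display_white_xyz, ambient_white_xyz, degree):
    """Blend display and ambient whites by the degree of adaptation D in [0, 1].
    A linear mix in XYZ is assumed purely for illustration; the paper's fitted
    model may use a different space or functional form."""
    d = np.clip(degree, 0.0, 1.0)
    return d * np.asarray(display_white_xyz) + (1.0 - d) * np.asarray(ambient_white_xyz)
```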


L2 based colour correction for light field arrays
In recent years, there has been an increase in the popularity of Light Field (LF) imaging technology with the increase in availability of LF camera devices such as the Lytro, as well as an increase in the use of LF camera arrays. However, both camera arrays and LF cameras can create views that exhibit colour discrepancies, and previous work has already tackled similar issues when data is captured using the Lytro camera [4] or multi-view camera systems. In previous work, an L2 based colour transfer method is applied in an iterative approach to recolour the views of a Lytro light field to correct the colour fading that occurs on outer views of the LF. This method was based on earlier L2 based methods proposed by Grogan et al. In this paper, we propose to combine similar aspects of the L2 based cost function proposed in the earlier work with some of the cost function constraints proposed for Lytro light fields, and propose a new propagation scheme so that this L2 based framework can be extended to colour correcting LF arrays. We also take advantage of a colour chart captured in the scene to not only ensure that colours are consistent across the LF, but also match the ground truth colour chart.
Authors: Mairéad Grogan and Aljosa Smolic
Venue: ACM CVMP 2019


Color Stabilization for Multi-Camera Light-Field Imaging
By capturing a more complete rendition of scene light than standard 2D cameras, light-field technology represents an important step towards closing the gap between live action cinematography and computer graphics. Light-field cameras accomplish this by simultaneously capturing the same scene under different angular configurations, providing directional information that allows for a multitude of post-production effects. Among the practical challenges related to capturing multiple images simultaneously, a very important problem is the fact that the different images do not perfectly match in terms of color, which severely complicates all further processing. In this work we adapt and extend to the light-field scenario a color stabilization method previously proposed for standard multi-camera shoots, and demonstrate experimentally that it provides an improvement over the state-of-the-art techniques for light-field imaging.
Authors: Olivier Vu Thanh; Trevor Canham; Javier Vazquez-Corral; Raquel Gil Rodríguez; Marcelo Bertalmío


A reevaluation of Whittle (1986, 1992) reveals the link between detection thresholds, discrimination thresholds, and brightness perception
In 1986, Paul Whittle investigated the ability to discriminate between the luminance of two small patches viewed upon a uniform background. In 1992, Paul Whittle asked subjects to manipulate the luminance of a number of patches on a uniform background until their brightness appeared to vary from black to white with even steps. The data from the discrimination experiment almost perfectly predicted the gradient of the function obtained in the brightness experiment, indicating that the two experimental methodologies were probing the same underlying mechanism. Whittle introduced a model that was able to capture the pattern of discrimination thresholds and, in turn, the brightness data; however, there were a number of features in the data set that the model couldn't capture. In this paper, we demonstrate that the models of Kane and Bertalmío (2017) and Kingdom and Moulden (1991) may be adapted to predict all the data but only by incorporating an accurate model of detection thresholds. Additionally, we show that a divisive gain model may also capture the data but only by considering polarity-dependent, nonlinear inputs following the underlying pattern of detection thresholds. In summary, we conclude that these models provide a simple link between detection thresholds, discrimination thresholds, and brightness perception.
Authors: David Kane; Marcelo Bertalmío


Approaching real-time Character Animation in Virtual Productions
Virtual productions are becoming increasingly common in modern movie productions. The possibility to visualize, edit and explore virtual 3D content directly on a movie set makes it invaluable for VFX-rich productions. Many virtual production scenarios also involve animated characters and motion capturing [4]. But the complexity of animation systems prohibits their usage on a film set. Within the EU funded project SAUCE (Smart Assets for re-Use in Creative Environments), extensive research on available virtual production tools and frameworks has been carried out. Most of them are not publicly available or open source, and none of them offered the possibility to interactively and intuitively animate characters on set.
Authors: Jonas Trottnow, Simon Spielmann


The Potential of Light Fields in Media Productions
One aspect of the EU funded project SAUCE is to explore the possibilities and challenges of integrating light field capturing and processing into media productions. A special light field camera was built by Saarland University [Herfet et al. 2018] and was first tested under production conditions in the test production “Unfolding” as part of the SAUCE project. Filmakademie Baden-Württemberg developed the content framework, executed the post-production and prepared a complete previsualization. Calibration and post-processing algorithms were developed by Trinity College Dublin and the Brno University of Technology. This document describes challenges encountered while building and shooting with the light field camera array, as well as its potential and challenges for post-production.
Authors: Jonas Trottnow; Simon Spielmann; Tobias Lange; Kelvin Chelli; Marek Solony; Pavel Smrž; Pavel Zemčík; Weston Aenchbacher; Mairéad Grogan; Martin Alain; Aljosa Smolic; Trevor Canham; Olivier Vu-Thanh; Javier Vázquez-Corral; Marcelo Bertalmío


Interactive Light Field Tilt-Shift Refocus with Generalized Shift-and-Sum
Since their introduction more than two decades ago, light fields have gained considerable interest in the graphics and vision communities due to their ability to provide the user with interactive visual content. One of the earliest and most common light field operations is digital refocus, enabling the user to choose the focus and depth-of-field for the image after capture. A common interactive method for such an operation utilizes disparity estimates, readily available from the light field, to allow the user to point-and-click on the image to choose the location of the refocus plane.
In this paper, we address the interactivity of a lesser-known light field operation: refocus to a non-frontoparallel plane, simulating the result of traditional tilt-shift photography. For this purpose we introduce a generalized shift-and-sum framework. Further, we show that the inclusion of depth information allows for intuitive interactive methods for placement of the refocus plane. In addition to refocusing, light fields also enable the user to interact with the viewpoint, which can be easily included in the proposed generalized shift-and-sum framework.
Authors: Martin Alain; Weston Aenchbacher; Aljosa Smolic


Vision Models for Wide Color Gamut Imaging in Cinema
Gamut mapping is the problem of transforming the colors of image or video content so as to fully exploit the color palette of the display device where the content will be shown, while preserving the artistic intent of the original content's creator. In particular in the cinema industry, the rapid advancement in display technologies has created a pressing need to develop automatic and fast gamut mapping algorithms. In this paper we propose a novel framework that is based on vision science models, performs both gamut reduction and gamut extension, is of low computational complexity, produces results that are free from artifacts and outperforms state-of-the-art methods according to psychophysical tests. Our experiments also highlight the limitations of existing objective metrics for the gamut mapping problem.
Authors: Syed Waqas Zamir; Javier Vazquez-Corral; Marcelo Bertalmio


DublinCity: Annotated LiDAR Point Cloud and its Applications
Scene understanding of full-scale 3D models of an urban area remains a challenging task. While advanced computer vision techniques offer cost-effective approaches to analyse 3D urban elements, a precise and densely labelled dataset is quintessential. The paper presents the first-ever labelled dataset for a highly dense Aerial Laser Scanning (ALS) point cloud at city-scale. This work introduces a novel benchmark dataset that includes a manually annotated point cloud for over 260 million laser scanning points into 100'000 (approx.) assets from Dublin LiDAR point cloud [12] in 2015. Objects are labelled into 13 classes using hierarchical levels of detail from large (i.e., building, vegetation and ground) to refined (i.e., window, door and tree) elements. To validate the performance of our dataset, two different applications are showcased. Firstly, the labelled point cloud is employed for training Convolutional Neural Networks (CNNs) to classify urban elements. The dataset is tested on the well-known state-of-the-art CNNs (i.e., PointNet, PointNet++ and So-Net). Secondly, the complete ALS dataset is applied as detailed ground truth for city-scale image-based 3D reconstruction.
Authors: S. M. Iman Zolanvari; Susana Ruano; Aakanksha Rana; Alan Cummins; Rogerio Eduardo da Silva; Morteza Rahbar; Aljosa Smolic


Issues with Common Assumptions about the Camera Pipeline and Their Impact in HDR Imaging from Multiple Exposures
Multiple-exposure approaches for high dynamic range (HDR) image generation share a set of building assumptions: that color channels are independent, and that the camera response function (CRF) remains constant while changing the exposure. The first contribution of this paper is to highlight how these assumptions, which were correct for film photography, do not hold in general for digital cameras. As a consequence, results of multi-exposure HDR methods are less accurate, and when tone-mapped they often present problems like hue shifts and color artifacts. The second contribution is to propose a method to stabilize the CRF while coupling all color channels, which can be applied to both static and dynamic scenes, and yields artifact-free results that are more accurate than those obtained with state-of-the-art methods according to several image metrics.
Authors: R. Gil Rodríguez; J. Vazquez-Corral; M. Bertalmío
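
For context, the sketch below spells out a naive multi-exposure merge that relies precisely on the assumptions the paper questions: a single inverse camera response applied per channel and independently of exposure; `inv_crf` and the hat-shaped weighting are placeholders, not part of the proposed method.

```python
import numpy as np

def merge_exposures_linear(images, exposure_times, inv_crf):
    """Classic multi-exposure HDR merge under the questioned assumptions.

    images         : list of (H, W, 3) encoded images with values in [0, 1]
    exposure_times : list of exposure times in seconds
    inv_crf        : callable mapping encoded values to linear irradiance,
                     assumed identical for every exposure and channel
    """
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(images[0], dtype=np.float64)
    for img, t in zip(images, exposure_times):
        w = np.exp(-4.0 * (img - 0.5) ** 2)   # down-weight clipped/noisy pixels
        num += w * inv_crf(img) / t           # per-pixel radiance estimate
        den += w
    return num / np.maximum(den, 1e-8)
```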

Enabling Multiview- and Light Field-Video for Veridical Visual Experiences
With the advent of UHDTV and the inclusion of High Dynamic Range, High Frame Rate and Extended Color Gamut, 2D imagery is able to push technical parameters up to the limits of the human visual sense. Consequently, developments in sensor technology can be used to capture information beyond 2D imagery. In this paper we introduce multiview and light field video as an option to capture (at least parts of) the plenoptic function and therewith drive veridical visual experiences. Our contribution is on tools for capturing and encoding so-called 5D light fields. We have built a multi-camera array producing up to 6 GigaRays/s and a real-time hierarchical H.264 MVC encoder that enables encoding the light fields in the form of a legacy-compliant video stream.
Authors: Thorsten Herfet; Tobias Lange; Harini Priyadarshini Hariharan


Using LSTM for Automatic Classification of Human Motion Capture Data
Creative studios tend to produce an overwhelming amount of content every day, and being able to manage this data and reuse it in new productions represents a way of reducing costs and increasing productivity and profit. This work is part of a project aiming to develop reusable assets for creative productions. This paper describes our first attempt at using deep learning to classify human motion from motion capture files. It relies on a long short-term memory network (LSTM) trained to recognize actions from a simplified ontology of basic actions like walking, running or jumping. Our solution was able to recognize several actions with an accuracy of over 95% in the best cases.
Authors: Rogerio Eduardo Da Silva; Jan Ondrej; Aljosa Smolic
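
A minimal sketch of an LSTM sequence classifier for motion capture clips is given below (in PyTorch); the feature layout, layer sizes and number of classes are assumptions, not the network described in the paper.

```python
import torch
import torch.nn as nn

class MocapLSTMClassifier(nn.Module):
    """LSTM action classifier over per-frame motion capture features.
    Sizes are illustrative: 21 joints x 3 coordinates flattened per frame."""

    def __init__(self, num_features=63, hidden_size=128, num_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):              # x: (batch, frames, num_features)
        _, (h_n, _) = self.lstm(x)     # last hidden state summarizes the clip
        return self.head(h_n[-1])      # logits over basic actions

# Example: a batch of 8 clips, each 120 frames long.
logits = MocapLSTMClassifier()(torch.randn(8, 120, 63))
```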


Convolutional Neural Networks Deceived by Visual Illusions
Visual illusions teach us that what we see is not always what is represented in the physical world. Their special nature makes them a fascinating tool to test and validate any new vision model proposed. In general, current vision models are based on the concatenation of linear and non-linear operations. The similarity of this structure to the operations present in Convolutional Neural Networks (CNNs) has motivated us to study whether CNNs trained for low-level visual tasks are deceived by visual illusions. In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with respect to variation in architecture and spatial pattern size. These results suggest that in order to obtain CNNs that better replicate human behaviour, we may need to start aiming for them to better replicate visual illusions.
Authors: Alexander Gomez-Villa; Adrian Martín; Javier Vazquez-Corral; Marcelo Bertalmío
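
As an example of the kind of stimulus such a study relies on, the sketch below builds a classic simultaneous-contrast image; a denoising or deblurring CNN can then be run on it and the outputs at the two (physically identical) grey patches compared. The stimulus parameters are arbitrary and the choice of illusion is only an example.

```python
import numpy as np

def simultaneous_contrast_stimulus(size=128, patch=24,
                                   left_bg=0.25, right_bg=0.75, target=0.5):
    """Two identical grey patches on a dark and a bright background.
    A human-like response makes the patch on the dark side look brighter."""
    img = np.empty((size, size), dtype=np.float64)
    img[:, : size // 2] = left_bg
    img[:, size // 2 :] = right_bg
    for cx in (size // 4, 3 * size // 4):        # one patch per background
        img[size // 2 - patch // 2 : size // 2 + patch // 2,
            cx - patch // 2 : cx + patch // 2] = target
    return img
```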


Influence of Ambient Chromaticity on Portable Display Color Appearance
The market share of mobile displays in content distribution has grown significantly over the past decade. These displays add new complications to media color management, as they can be viewed across a wide range of environments over a short span of time. There is currently no consensus within the color science community on the extent to which surround adaptation to ambient chromaticity has a significant impact on the color appearance of image content on these displays. Thus, an investigation into this question was conducted at the Dynamic Visual Adaptation Laboratory at the Rochester Institute of Technology in Rochester, NY. The study aimed to quantify the color appearance impact of these surround signals. Observers performed an asymmetric memory matching task for a set of images viewed under SMPTE standardized mastering conditions and under a series of ambient illumination conditions with varying chromaticity and luminance. The results suggest that observers adapt partially to the chromaticity of ambient illumination while viewing images on portable displays, and also that this mixed adaptation ratio varies as a function of ambient luminance and stimulus type (self-luminous solid color versus images).
Authors: Trevor Canham; Michael J. Murdoch; David Long


In-camera, Photorealistic Style Transfer for On-set Automatic Grading
In professional cinema, the intended artistic look of the movie informs the creation of a static 3D LUT that is applied on set, where further manual modifications to the image appearance are registered as 10-parameter transforms in a color decision list (CDL). The original RAW footage and its corresponding LUT and CDL are passed on to the post-production stage where the fine-tuning of the final look is performed during color grading.
In many cases, the director wants to emulate the style and look present in a reference image, e.g. a still from an existing movie, or a photograph, or a painting, or even a frame from a previously shot sequence in the current movie. The manual creation of a LUT and CDL for this purpose may require a significant amount of work from very skilled artists and technicians, while the state of the art in the academic literature offers promising but partial solutions to the photorealistic style transfer problem, with limitations regarding artifacts, speed and manual interaction.
In this paper, we propose a method that automatically transfers the style, in terms of luminance, color palette and contrast, from a reference image to the source raw footage. It consists of three separable operations: global luminance matching, global color transfer and local contrast matching. As it only takes into account the statistics of the source and reference images, no training is required. The total transform is not static but adapts to changes in the source footage. The computational complexity of the procedure is extremely low and allows for real-time implementation in-camera, for on-set monitoring. While the method is proposed as a substitute for the need to specify a LUT and a CDL, it is compatible with further refinements performed via LUTs, CDLs and grading, both on-set and in post-production. The results are free from artifacts and provide an excellent approximation to the intended look, bringing savings in pre-production, shooting and post-production time.
Authors: Itziar Zabaleta; Marcelo Bertalmío
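
To illustrate the statistics-only nature of the global steps, here is a minimal sketch that matches the per-channel mean and standard deviation of the source to the reference. This is a generic global transfer, not the exact luminance and colour matching operations of the paper, and the local contrast step is omitted.

```python
import numpy as np

def match_global_statistics(source, reference, eps=1e-6):
    """Match per-channel mean and standard deviation of `source` to `reference`.

    source, reference : (H, W, C) float images; spatial sizes may differ.
    """
    out = np.empty_like(source, dtype=np.float64)
    for c in range(source.shape[-1]):
        s, r = source[..., c], reference[..., c]
        gain = (r.std() + eps) / (s.std() + eps)
        out[..., c] = (s - s.mean()) * gain + r.mean()
    return out
```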


Color-matching Shots from Different Cameras Having Unknown Gamma or Logarithmic Encoding Curves
In cinema and TV it is quite usual to have to work with footage coming from several cameras, which show noticeable color differences among them even if they are all the same model. In TV broadcasts, technicians work at camera control units so as to ensure color consistency when cutting from one camera to another. In cinema post-production, colorists need to manually color-match images coming from different sources. Aiming to help perform this task automatically, the Academy Color Encoding System (ACES) introduced a color management framework to work within the same color space and be able to use different cameras and displays; however, the ACES pipeline requires the cameras to be characterized in advance, and therefore does not allow working ‘in the wild’, a situation which is very common. We present a color stabilization method that, given two images of the same scene taken by two cameras with unknown settings and unknown internal parameter values, and encoded with unknown non-linear curves (logarithmic or gamma), is able to correct the colors of one of the images, making it look as if it had been captured with the other camera. Our method is based on treating the in-camera color processing pipeline as a combination of a 3x3 matrix followed by a non-linearity, which allows us to model a color stabilization transformation between two shots as a linear-nonlinear function with several parameters. We find corresponding points between the two images, compute the error (color difference) over them, and determine the transformation parameters that minimize this error, all automatically without any user input. The method is fast and the results have no spurious colors or spatio-temporal artifacts of any kind. It outperforms the state of the art both visually and according to several metrics, and can handle very challenging real-life examples.
Authors: Raquel Gil Rodríguez; Javier Vazquez-Corral; Marcelo Bertalmío
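
A small sketch of the linear-nonlinear fitting idea is given below: a 3x3 matrix followed by a single gamma, fitted by least squares over corresponding colours. The parameterization (one global gamma) and the optimizer are assumptions for illustration; the paper's actual model also handles logarithmic encoding curves.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_color_stabilization(src_pts, ref_pts):
    """Fit ref ≈ (src @ M.T) ** gamma over corresponding colours.

    src_pts, ref_pts : (N, 3) arrays of matched colours, values in (0, 1].
    Returns the fitted 3x3 matrix M and scalar gamma.
    """
    def residuals(p):
        M, gamma = p[:9].reshape(3, 3), p[9]
        pred = np.clip(src_pts @ M.T, 1e-6, None) ** gamma
        return (pred - ref_pts).ravel()

    p0 = np.concatenate([np.eye(3).ravel(), [1.0]])   # start at identity
    res = least_squares(residuals, p0)
    return res.x[:9].reshape(3, 3), res.x[9]
```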

Photorealistic Style Transfer for Cinema Shoots
Color grading is the process of subtly mixing and adjusting the color and tonal balance of a movie to achieve a specific visual look. This manual editing task may require a significant amount of work from very skilled artists and technicians. In many cases the director wants to emulate the style and look present in a reference image, e.g. a still from an existing movie, a photograph, or even a previously shot sequence in the current movie. In this paper we propose a method that automatically transfers the style, in terms of tone, color palette and contrast, from a reference image to the source RAW image. It consists of three separable operations: global luminance matching, global color transfer and local contrast matching. The computational complexity of the procedure is extremely low and allows for real-time implementation in-camera. As it just takes into account the statistics of source and reference images, no training is required. The results are free from artifacts and provide an excellent approximation to the intended look, bringing savings in pre-production, shooting and post-production time.
Authors: Itziar Zabaleta; Marcelo Bertalmío

Statistics of natural images as a function of dynamic range
The statistics of real world images have been extensively investigated, in virtually all cases using low dynamic range (LDR) image databases. The few studies that have considered high dynamic range (HDR) images have performed statistical analysis over illumination maps with HDR from different sets (Dror et al. 2001) or have examined the difference between images captured with HDR techniques against those taken with single-exposure LDR photography (Pouli et al. 2010). In contrast, in this study we investigate the impact of dynamic range upon the statistics of equally created natural images. To do so we consider the HDR database SYNS (Adams et al. 2016). For the distribution of intensity, we observe that the standard deviation of the luminance histograms increases noticeably with dynamic range. Concerning the power spectrum and in accordance with previous findings (Dror et al. 2001), we observe that as the dynamic range increases the 1/f power law rule becomes substantially inaccurate, meaning that HDR images are not scale invariant. We show that a second-order polynomial model is a better fit than a linear model for the power spectrum in log-log axis. A model of the point-spread function of the eye (considering light scattering, pupil size, etc.) has been applied to the datasets creating a reduction of the dynamic range, but the statistical differences between HDR and LDR images persist and further study needs to be performed on this subject. Future avenues of research include utilizing computer generated images, with access to the exact reflectance and illumination distributions and the possibility to generate very large databases with ease, that will help performing more significant statistical analysis.
Authors: Antoine Grimaldi; David Kane; Marcelo Bertalmío
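
The following sketch shows how such a power-spectrum analysis can be set up: compute a radially averaged power spectrum and compare a linear (1/f-style) fit with a second-order polynomial fit in log-log axes. The radial binning and the synthetic test image are illustrative choices only.

```python
import numpy as np

def radial_power_spectrum(image):
    """Radially averaged power spectrum of a grayscale image (2D array)."""
    f = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(f) ** 2
    h, w = image.shape
    y, x = np.indices((h, w))
    r = np.hypot(x - w // 2, y - h // 2).astype(int)
    counts = np.bincount(r.ravel())
    radial = np.bincount(r.ravel(), power.ravel()) / np.maximum(counts, 1)
    return radial[1:]                                  # drop the DC bin

spectrum = radial_power_spectrum(np.random.rand(256, 256))  # placeholder image
freqs = np.arange(1, spectrum.size + 1)
valid = spectrum > 0
log_f, log_p = np.log(freqs[valid]), np.log(spectrum[valid])
coeffs_linear = np.polyfit(log_f, log_p, deg=1)        # classic 1/f^alpha fit
coeffs_quadratic = np.polyfit(log_f, log_p, deg=2)     # second-order model
```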


Light Field Compression by Superpixel Based Filtering and Pseudo-Temporal Reordering
In this paper we address the topic of an evolutionary integration of light fields into standard image/video processing chains by pre-processing light fields with superpixel-based, structurally adaptive Gaussian pre-filters and circular pseudo-temporal sequencing, and feeding them into an HEVC codec with a low-delay predictive coding configuration. We show significant bit rate reductions of up to 27% compared to pseudo-temporal sequencing without pre-processing. The paper includes experimental results showing that not only the perceived visual quality, but also the cornucopia of post-processing options, is preserved.
Authors: Harini Priyadarshini Hariharan; Thorsten Herfet
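
As a rough illustration of pseudo-temporal sequencing, the sketch below orders the sub-aperture views of an angular grid by distance and angle around the grid centre, so that consecutive 'frames' handed to a video codec stay angularly close; the actual circular traversal used in the paper may differ.

```python
import numpy as np

def circular_view_order(grid_h, grid_w):
    """Return (row, col) view coordinates in a centre-outward, circular order,
    suitable for feeding sub-aperture views to a video codec as a sequence."""
    cy, cx = (grid_h - 1) / 2.0, (grid_w - 1) / 2.0
    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    return sorted(coords, key=lambda rc: (np.hypot(rc[0] - cy, rc[1] - cx),
                                          np.arctan2(rc[0] - cy, rc[1] - cx)))

# Example: ordering for a 9x9 angular grid.
order = circular_view_order(9, 9)
```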