Automatic Face Smoothing - Setup at the Swiss Camera Museum, Vevey, Switzerland.


Daniel Tamburrino, Clément Fredembach, and Sabine Süsstrunk.

What is near-infrared?

The human eye is sensitive to the "visible" part of the electromagnetic spectrum, corresponding to wavelengths from about 400 (blue) to 700 (red) nanometers. Near-infrared lies just beyond red, at longer wavelengths, and is not perceived by the human visual system. Digital cameras, however, capture light with a silicon sensor, which is sensitive to both visible and near-infrared light.

Figure 1: Electromagnetic spectrum.

How to capture near-infrared?

In principle, every digital camera with a silicon sensor is capable of acquiring images in the near-infrared. However, to avoid contamination of the color signal by infrared light, manufacturers place an infrared-blocking filter in front of the sensor. To capture images in both the visible and the near-infrared, this filter is removed and replaced with transparent glass. Filters are then placed in front of the camera lens to capture alternately in the visible or in the near-infrared.

Portrait enhancement

Skin tones, and portraits in particular, are of tremendous importance in photography and video, but a number of factors, such as pigmentation irregularities (e.g., moles, freckles), irritation, roughness, or wrinkles, can reduce their appeal. Moreover, such "defects" are often accentuated by lighting conditions. We start from two observations: melanin and hemoglobin, the key components of skin color, have little absorption in the near-infrared part of the spectrum, and the depth of light penetration in the epidermis is proportional to the incident light's wavelength. Together, these mean that near-infrared images can provide the information needed to enhance the look of visible images.

Figure 2: Left: "normal" color image. Middle: near-infrared image. Right: enhanced image using near-infrared.


A prototype of a system that performs automatic skin smoothing was built and deployed in the Digital Revolution exhibit at the Swiss Camera Museum in Vevey, Switzerland. The installation is set up to resemble a photo booth of the kind that has long been, and still is, used to take passport and other ID photos. The "normal" (visible) and "enhanced" (fused visible + near-infrared) pictures are displayed on a screen. Visitors are asked which picture they prefer and can then have both pictures sent to their email address.

Figure 3: Setup of our system at the Swiss Camera Museum in Vevey.


We use a JAI AD-080GE camera to acquire visible and near-infrared (NIR) images. This 2CCD multi-spectral camera can simultaneously capture both color and NIR images in one camera housing and through a single lens (Fig. 4). It uses a multi-faceted prism in the optical path, with bandpass filters on each spectral axis, to capture RGB through a Bayer filter array on one sensor and NIR on the other. The camera, primarily intended for industrial quality control, has a resolution of 1024x768 pixels and a maximum exposure time of 1/30 second, which calls for a good light source and little subject motion.

Figure 4: The JAI AD-080GE 2CCD camera can simultaneously capture visible and near-infrared images.

Image processing

Deeper light penetration, combined with the relative transparency of hemoglobin and melanin to near-infrared (NIR) light, allows more unwanted features to be smoothed out in the NIR image. Flushed skin, visible capillaries, rash, and acne are all features that are present on the skin surface. While these small-scale skin features are generally attenuated in the near-infrared, faces also contain other details that should be preserved. High-frequency features, such as the distinction between skin, eye, iris, and pupil, have to be retained in the fused image. Additionally, hair-based features (e.g., hair, eyelashes, beard) have to remain as sharp as in the original image. This result is achieved using a bilateral filter.

The bilateral filter is an edge-aware spatial filter that can decompose an image into base (low-frequency) and detail (high-frequency) layers. The process is illustrated in Fig. 5. The color RGB image is first transformed into a luminance-chrominance color space. The luminance channel, which represents the light intensity, is decomposed into its base and detail layers using the bilateral filter. The same decomposition is applied to the NIR image. A new luminance channel is then reconstructed using the base layer from the visible image and the detail layer from the NIR image. Indeed, the NIR detail layer contains all the details we want to keep (hair, etc.) without the skin imperfections found in the detail layer of the visible image, which we discard. The new luminance channel is finally recombined with the chrominance channels and converted back to an RGB color image.
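The workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the installation's C++ code: the luminance-chrominance transform (BT.601 luma weights with simple color differences), the additive definition of the detail layer (image minus base), the kernel cutoff at two spatial sigmas, and all function names are choices made here for illustration. The bilateral filter is a brute-force version that is only practical on tiny images.

```python
import math

def bilateral_filter(img, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 2-D list of floats in [0, 1]."""
    h, w = len(img), len(img[0])
    radius = max(1, int(2 * sigma_s))  # assumed cutoff for the spatial kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = img[y][x]
            total = weight_sum = 0.0
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    d2 = (i - x) ** 2 + (j - y) ** 2    # squared spatial distance
                    r2 = (img[j][i] - centre) ** 2      # squared intensity difference
                    wgt = math.exp(-d2 / (2 * sigma_s ** 2) - r2 / (2 * sigma_r ** 2))
                    total += wgt * img[j][i]
                    weight_sum += wgt
            out[y][x] = total / weight_sum  # the centre pixel guarantees weight_sum >= 1
    return out

def rgb_to_ycc(rgb):
    """Assumed luminance-chrominance transform: BT.601 luma plus color differences."""
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, b - y, r - y

def ycc_to_rgb(y, cb, cr):
    """Exact inverse of rgb_to_ycc."""
    r, b = cr + y, cb + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

def fuse(rgb_img, nir_img, sigma_s=16.0, sigma_r=0.1):
    """Fuse a visible image (2-D list of (r, g, b)) with an NIR image (2-D list of floats)."""
    ycc = [[rgb_to_ycc(p) for p in row] for row in rgb_img]
    lum = [[p[0] for p in row] for row in ycc]
    base_vis = bilateral_filter(lum, sigma_s, sigma_r)      # keep: visible base layer
    base_nir = bilateral_filter(nir_img, sigma_s, sigma_r)
    out = []
    for y, row in enumerate(ycc):
        out_row = []
        for x, (_, cb, cr) in enumerate(row):
            detail_nir = nir_img[y][x] - base_nir[y][x]     # keep: NIR detail layer (additive, assumed)
            new_lum = base_vis[y][x] + detail_nir           # the visible detail layer is discarded
            rgb = ycc_to_rgb(new_lum, cb, cr)
            out_row.append(tuple(min(1.0, max(0.0, c)) for c in rgb))
        out.append(out_row)
    return out

if __name__ == "__main__":
    # A flat gray "skin" patch with one slightly darker blemish pixel, and a flat NIR image.
    vis = [[(0.5, 0.5, 0.5)] * 5 for _ in range(5)]
    vis[2][2] = (0.45, 0.45, 0.45)
    nir = [[0.6] * 5 for _ in range(5)]
    print(fuse(vis, nir)[2][2])
```

In the toy example, the blemish is absent from the NIR image, so it contributes nothing to the NIR detail layer, and the fused pixel moves toward the surrounding skin value while the chrominance is left untouched.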

The algorithm is written in C++ and takes about 500 ms for a 1024x768 image pair on a 2.4 GHz Core 2 Duo processor using only one core. Bilateral filtering is the most time-consuming task. We use the Fast Bilateral Filter with Truncated Kernel with a spatial sigma of 16.0 and a range sigma of 0.1.
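For reference, the role of the two sigmas can be read from the textbook definition of the bilateral filter, given here as a sketch. The notation (L for the input channel, B for the base layer, p and q for pixel positions, Omega for the neighborhood of p, W for the normalization) is introduced for illustration only, and we assume that the spatial sigma is expressed in pixels and the range sigma on intensities normalized to [0, 1]. The fast truncated-kernel implementation presumably computes an approximation of this expression rather than evaluating it directly.

```latex
B(p) = \frac{1}{W(p)} \sum_{q \in \Omega}
       \exp\!\left(-\frac{\lVert p - q \rVert^{2}}{2\sigma_s^{2}}\right)
       \exp\!\left(-\frac{\bigl(L(p) - L(q)\bigr)^{2}}{2\sigma_r^{2}}\right) L(q),
\qquad
W(p) = \sum_{q \in \Omega}
       \exp\!\left(-\frac{\lVert p - q \rVert^{2}}{2\sigma_s^{2}}\right)
       \exp\!\left(-\frac{\bigl(L(p) - L(q)\bigr)^{2}}{2\sigma_r^{2}}\right),
\qquad
\sigma_s = 16.0,\quad \sigma_r = 0.1
```

Under these assumptions, a neighbor contributes substantially only if it is both spatially close (within a few multiples of 16 pixels) and similar in intensity (within roughly 0.1 of the center value), which is why strong edges such as hair against skin stay out of the base layer's smoothing.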

Figure 5: Image processing workflow.


This work was supported in part by the Swiss National Science Foundation under grant number 200021-124796/1.