The big face-off

Pacific Northwest National Laboratory researchers say their algorithms can analyze millions of video frames, pluck out the faces and quantify them to create searchable databases for facial identification.

Type a person’s name into the Google image search engine and the results are likely to vary wildly. You may find pictures of the person you’re seeking, but you’re also likely to see completely irrelevant images just because their name appears on the same web page.

You might have better luck if your computer could analyze a picture of the person you want, then search through millions of other images — even hours of videotape — to find someone who looks identical or similar. Ideally, the computer could match the faces regardless of whether the subject is in bright or low light, is only partially facing the camera or is near or far.

That’s exactly what two Pacific Northwest National Laboratory (PNNL) researchers have done. Their algorithms analyze millions of video frames, pluck out the faces and quantify them to create searchable databases for facial identification.

“We’re measuring the information content of a face much like Google” analyzes written web material, says Harold Trease, a PNNL computational physicist. “What they do for text searching we’re trying to do for video and image processing.”

A program that picks faces out of streaming or recorded video and identifies them regardless of conditions could be useful in many areas, but for Trease and Rob Farber, a PNNL senior research scientist, it’s just a test case.

“It doesn’t have to be webcams,” Farber adds. “This is ‘a first toe in the water’ work” to prove the concept on massive amounts of unstructured data and high-performance computers. The algorithms could be generalized to work with almost any set of digital images to identify a variety of objects, including hidden roadside bombs and tumors.

Face recognition is especially tricky in conditions where light levels, size and angles change constantly. For instance, humans typically have few problems recognizing people regardless of whether they’re close or somewhat distant, but computers aren’t as adept. So facial recognition algorithms must have “scale invariance” — the ability to pick a face out of video regardless of its distance from the camera.

Likewise, a successful algorithm must have a degree of “rotation invariance” — the ability to distinguish faces that aren’t facing the camera head-on. And it must have “translational invariance” — the ability to extract faces or other target objects in a video even if they’re moving within the frame.

The first part of the algorithm, largely Trease’s work, starts with a raw red-green-blue (RGB) format video frame and transforms it to concentrate on the qualities of hue, saturation and intensity. The intensity parameter is discarded, allowing the algorithm to work regardless of lighting in the image.
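The article doesn’t publish the researchers’ code, but the transform it describes can be sketched with the closely related HSV color model, in which the value (intensity) channel is simply never used. A minimal Python illustration, assuming frames arrive as normalized RGB arrays:

```python
import numpy as np

def hue_saturation(frame_rgb):
    """Convert an RGB frame to hue and saturation, discarding intensity.

    frame_rgb: array of shape (H, W, 3) with values in [0, 1].
    Returns (hue, saturation), each of shape (H, W).
    """
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    cmax = frame_rgb.max(axis=-1)
    cmin = frame_rgb.min(axis=-1)
    delta = cmax - cmin

    # Hue, scaled to [0, 1), computed piecewise by dominant channel.
    hue = np.zeros_like(cmax)
    mask = delta > 0
    rm = mask & (cmax == r)
    gm = mask & (cmax == g) & ~rm
    bm = mask & (cmax == b) & ~rm & ~gm
    hue[rm] = ((g[rm] - b[rm]) / delta[rm]) % 6
    hue[gm] = (b[gm] - r[gm]) / delta[gm] + 2
    hue[bm] = (r[bm] - g[bm]) / delta[bm] + 4
    hue /= 6

    # Saturation; the intensity channel (cmax) is not returned, which
    # is what makes the representation tolerant of lighting changes.
    sat = np.where(cmax > 0, delta / np.maximum(cmax, 1e-12), 0.0)
    return hue, sat
```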

Next the algorithm sifts out facial “blobs” based on hue. “It turns out that skin color occupies a very narrow band in the hue dimension” regardless of race or ethnicity, Trease says. The algorithms identify patches of skin pixels, apply edge-detection filters to separate overlapping faces, and compute the two-dimensional geometry of each blob.

Then successive constraints based on Shannon entropy measures are applied to generate a 20-attribute signature characteristic of each face. The entropy measures quantify each image’s information content.
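The exact hue band and the 20 signature attributes aren’t spelled out in the article, but the flavor of the computation can be sketched. The thresholds and bin count below are illustrative assumptions, not the researchers’ values:

```python
import numpy as np

# Hypothetical hue band for skin pixels; the actual bounds used by the
# PNNL algorithm are not published here, so these are placeholders.
SKIN_HUE_LO, SKIN_HUE_HI = 0.0, 0.11

def skin_mask(hue, sat, min_sat=0.15):
    """Flag pixels whose hue falls in the (assumed) narrow skin band."""
    return (hue >= SKIN_HUE_LO) & (hue <= SKIN_HUE_HI) & (sat >= min_sat)

def shannon_entropy(values, bins=32):
    """Shannon entropy, in bits, of a histogram over the given values."""
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```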

It’s impractical, however, to compare complex 20-dimensional signatures to identify faces or other objects extracted from a host of video frames. So the second part of the algorithm, largely based on Farber’s research, reduces those 20 dimensions to just three.

“We crunch down to low dimensions and are able to do that with high accuracy — we don’t introduce a lot of error between the different points,” Farber says. That makes it possible to use similarity metrics that can distinguish between like faces.
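The article doesn’t say which similarity metric the researchers use; a plain Euclidean nearest-neighbor lookup in the reduced three-dimensional space is one plausible stand-in:

```python
import numpy as np

def nearest_face(query3, database3):
    """Return the index of the database face closest to the query.

    query3: the 3-D reduced signature of the face to identify.
    database3: array of shape (n_faces, 3) of known signatures.
    Euclidean distance is an assumption, not the published metric.
    """
    dists = np.linalg.norm(database3 - query3, axis=1)
    return int(dists.argmin())
```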

To do the job, the researchers start with principal component analysis (PCA), a technique for identifying the main directions and trends among a group of data points. “Principal components allows us to find the smallest set of linear combinations that lets us interpret the data,” Farber says.
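As a concrete illustration, PCA on a batch of 20-dimensional signatures can be computed with a singular value decomposition. This sketch assumes the signatures are stacked row-wise in a NumPy array:

```python
import numpy as np

def pca_reduce(signatures, k=3):
    """Project 20-D face signatures onto their first k principal components.

    signatures: array of shape (n_faces, 20).
    Returns the (n_faces, k) reduced coordinates; k=3 matches the
    three-component reduction described in the article.
    """
    centered = signatures - signatures.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```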

The process forces the 20-dimensional vector through a “bottleneck” of three linear “neurons” to find the three principal components that can accurately reconstruct the original signature. Data passes through the bottleneck to a set of output neurons that reconstruct the data. The process is repeated, reducing the error each time as the bottleneck neurons learn the principal components.
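That description matches a linear autoencoder, and a linear autoencoder trained on squared reconstruction error is known to converge to the span of the leading principal components. A toy NumPy sketch of the bottleneck training loop, with arbitrary placeholder hyperparameters:

```python
import numpy as np

def train_linear_autoencoder(x, k=3, lr=0.01, epochs=500):
    """Squeeze 20-D signatures through k linear bottleneck neurons.

    x: array of shape (n_faces, 20). The learning rate and epoch count
    are illustrative, not tuned values from the PNNL work.
    """
    rng = np.random.default_rng(0)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, k))  # 20 -> 3 bottleneck
    w_dec = rng.normal(scale=0.1, size=(k, d))  # 3 -> 20 reconstruction
    for _ in range(epochs):
        z = x @ w_enc        # bottleneck activations
        x_hat = z @ w_dec    # reconstructed signatures
        err = x_hat - x      # reconstruction error, driven down each pass
        # Gradient steps on mean squared reconstruction error.
        w_dec -= lr * (z.T @ err) / n
        w_enc -= lr * (x.T @ (err @ w_dec.T)) / n
    return w_enc, w_dec
```

The repeated passes mirror the iterative training the researchers describe: each cycle shrinks the reconstruction error as the bottleneck neurons settle onto the principal components.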

The approach has proven to be extraordinarily accurate and efficient. In one test using 2,000 pictures with known identities, the algorithm correctly identified all but two faces — although Trease notes that accuracy varies with image quality, resolution and other factors. And because Farber combined the neural network with a massively parallel mapping technique he pioneered in the 1980s, the program achieves high throughput with near-linear scaling — the amount of work the computer does rises in direct proportion to the number of processors employed.

Just because the algorithms run well on supercomputers doesn’t mean they can’t do as well on smaller machines, Farber notes. They also achieve near-linear scaling on inexpensive commodity hardware like NVIDIA graphics processing units (GPUs). Farber envisions low-power “smart sensors” that capture data, do initial data extraction and then compress the results and transmit them at low bandwidth for processing.

In fact, the potential applications are so wide he seems a little overwhelmed.

“We have many different directions we’re contemplating,” Farber says. “It depends on what’s going to be the best allocation of our available resources.”

About the Author

Thomas R. O’Donnell is a former Krell Institute science writer.
