17 Mar 2011

Face Tracking Research – DroidDoes.com

Last spring one of the guys on the team whipped up a fun little Flash game that used motion tracking (via frame-differencing) to have a boxing glove follow the user’s hand around the screen, batting away hamburger projectiles. It was cute and fluffy, but kept messing up whenever the user moved their head or talked, because that motion was getting picked up, too, skewing the centroid of the motion and ruining the experience. I knew that if we could locate the face, and mask out its effects on the center of motion, it would clean up the whole thing and make it a lot better.
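For reference, the core of that technique is small. Below is a minimal sketch of frame-differencing motion tracking with the face region masked out of the centroid calculation, written in Python with OpenCV rather than the original ActionScript; the webcam index, threshold, and face rectangle are assumptions for illustration, not values from the game.

```python
import cv2

# Sketch: frame-differencing motion tracking with a masked-out face region.
# Illustrative only -- the original game was built in Flash/ActionScript.
cap = cv2.VideoCapture(0)                # default webcam (assumption)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

face_rect = None                         # (x, y, w, h) of the detected face, if any

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Frame differencing: which pixels changed since the last frame.
    diff = cv2.absdiff(gray, prev_gray)
    _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # Mask out the face so head movement and talking don't skew the centroid.
    if face_rect is not None:
        x, y, w, h = face_rect
        motion[y:y + h, x:x + w] = 0

    # Centroid of the remaining motion, via image moments.
    m = cv2.moments(motion, binaryImage=True)
    if m["m00"] > 0:
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        # ...drive the boxing glove toward (cx, cy) here...

    prev_gray = gray
```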

I set off to make it happen, and quickly found the Marilena libraries, based on Haar cascade face detection, which could pick out a face in a frame and mark its location.  It worked really well, and probably could have worked for a low-level execution, maybe a really light banner ad. Armed with this fix in my back pocket, I set up a conversation with our Chief Creative Officer to show it off.
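Marilena is essentially an ActionScript port of OpenCV's Haar cascade object detector, so the equivalent step in OpenCV itself looks roughly like this (a sketch, assuming the stock frontal-face cascade bundled with OpenCV; the webcam capture and tuning parameters are placeholders):

```python
import cv2

# Haar cascade face detection -- the same technique Marilena ports to Flash.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                        # default webcam (assumption)
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns one (x, y, w, h) rectangle per detected face.
faces = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5,
                                 minSize=(60, 60))
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```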

He was interested, but not at the edge of his seat, until I pointed out that by tweaking the method somewhat, we could calculate where the user’s face was in relation to their screen and mimic the now famous head-tracking effect demonstrated by Johnny Chung Lee with a Wiimote and some infrared emitters.
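The math behind that effect is modest: treat the center and size of the tracked face rectangle as a proxy for where the head is and how close it is, then offset a virtual camera by those amounts. A rough sketch follows; the scaling constants are illustrative, not anything from the actual build.

```python
def head_offset(face, frame_w, frame_h, depth_scale=2.0):
    """Map a detected face rect to a normalized head position.

    face: (x, y, w, h) in pixels. Returns (nx, ny, nz), where nx/ny are
    offsets from screen center in roughly [-1, 1] and nz grows as the face
    gets larger (i.e. the user leans in). Constants are illustrative only.
    """
    x, y, w, h = face
    cx = x + w / 2.0
    cy = y + h / 2.0
    nx = (cx - frame_w / 2.0) / (frame_w / 2.0)
    ny = (cy - frame_h / 2.0) / (frame_h / 2.0)
    nz = (w / float(frame_w)) * depth_scale
    return nx, ny, nz
```

Feeding those offsets into the scene camera’s position and zoom is what produces the “looking through a window” illusion from Lee’s demo.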

After showing the video and promising we could pull off a similar effect, the room suddenly filled up with people, all Creative Directors from various accounts that the CCO leaned out and called in.   The conversation became all about this, and burger-swatting fell by the wayside while each CD asked how this head-tracking method could be used in their client work.  We settled on building a prototype for the upcoming Droid summer campaign that the user could navigate around just by moving their head.

Once we got started, though, we quickly hit the limit of what an average machine could process.  The processing power consumed in finding the face meant that we couldn’t add much additional functionality (Papervision 3D was also ruled out due to performance issues), so I resumed my research and finally found the answer in a recently published Danish research paper describing a method called CAMSHIFT as a computationally efficient way to track a face once it had been located by the Haar cascade detector.

With a little further research, I stumbled onto a Flash port of CAMSHIFT, in which you sampled a region of color by drawing a box around it, and it would then track that color region as a blob from frame to frame without losing it.  Best of all, it was lightning fast compared to Marilena.  The only issue with the CAMSHIFT method was that the user would have to either sample their own face by dragging a box across it, or else be tricked into doing it by lining their face up with an on-screen prompt.
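That Flash port follows the standard CAMSHIFT recipe: build a hue histogram from the sampled box, back-project it onto each new frame, and let the search window shift and resize itself to follow the blob. Here is the same loop in OpenCV, again as a sketch; the sample box coordinates and histogram thresholds are placeholders.

```python
import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()

# The color sample the user "drew a box around" -- placeholder coordinates.
track_window = (200, 150, 100, 120)              # x, y, w, h
x, y, w, h = track_window

# Build a hue histogram of the sampled region (ignore dim/unsaturated pixels).
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, (0, 60, 32), (180, 255, 255))
hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-projection: each pixel scores how well it matches the sampled hue.
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    # CAMSHIFT shifts and resizes the window to follow the color blob.
    rot_rect, track_window = cv2.CamShift(backproj, track_window, term)
```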

We chose the latter, and created a “Calibration” stage, just after the loader, that prompted the user to center their face inside an on-screen oval, which would then snap the sample and start tracking right away.  The client looked at the demos we sent over and approved the project, and a few months later DroidDoes.com was launched.  Over time, the effect was severely diminished and at times removed, at the client’s request, so at this point all that’s left is a subtle twist when you lean back and forth.

This past winter we also found a French anti-smoking website/game/anime thing that seems to use the same CAMSHIFT library we used, only with less subtlety about it.  In wrapping up the project, I put together a combined library that used Marilena to identify the face, then passed that rectangle on to the CAMSHIFT side to take over tracking it, which helped reduce some of the performance issues.  I’ve used it from time to time, but I’m still looking for a really killer application of the idea.  Shoot me a note if you want to take a look at the library and offer any thoughts.
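The combined library itself was ActionScript, but the hand-off it performs maps directly onto the two techniques sketched above: run the Haar cascade until it finds a face, then use that rectangle, instead of a user-drawn box, to seed the CAMSHIFT histogram and initial search window. Roughly, reusing the placeholder thresholds from the earlier snippets:

```python
import cv2

def seed_tracker_from_face(frame, cascade):
    """Detect a face with the Haar cascade, then build the CAMSHIFT hue
    histogram and initial search window from that rectangle (sketch only)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None, None                    # no face yet; try the next frame
    x, y, w, h = faces[0]
    hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi, (0, 60, 32), (180, 255, 255))
    hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist, (int(x), int(y), int(w), int(h))   # histogram + initial window
```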
