<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Digitologist &#187; Projects</title>
	<atom:link href="http://digitologist.com/blog/projects/feed/" rel="self" type="application/rss+xml" />
	<link>http://digitologist.com</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Fri, 20 May 2011 03:39:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>fastICA in AS3</title>
		<link>http://digitologist.com/2011/04/fastica-in-as3-2/</link>
		<comments>http://digitologist.com/2011/04/fastica-in-as3-2/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 02:03:01 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://digitologist.com/?p=213</guid>
		<description><![CDATA[[click the image above for a demonstration] I&#8217;m not going to spend a lot of time explaining what fastICA is, or what Independent Component Analysis is in general, but if you need it in Actionscript and you&#8217;ve been looking for it, here it is. I only vaguely understand how it works (for now!), but math [...]]]></description>
			<content:encoded><![CDATA[<p>[click the image above for a demonstration] </p>
<p>I&#8217;m not going to spend a lot of time explaining what <a href="http://en.wikipedia.org/wiki/FastICA" target="_blank">fastICA</a> is, or what <a href="http://en.wikipedia.org/wiki/Independent_component_analysis" target="_blank">Independent Component Analysis</a> is in general, but if you need it in Actionscript and you&#8217;ve been looking for it, here it is.  <span id="more-213"></span>I only vaguely understand how it works (for now!), but math is math and code is code, and once I had my Matrix Math package set up, it was just a matter of effort and optimization.</p>
<p>It&#8217;s ported over from <a href="http://mdp-toolkit.sourceforge.net/" target="_blank">the MDP package in Python</a>.  It&#8217;s not the full implementation, just as much as I needed to make <a href="http://digitologist.com/2011/04/i-can-read-your-pulse-by-webcam-2/" target="_self">the Pulse project</a> work, so at some point I&#8217;m going to finish porting over the rest of the internal methods.  It&#8217;s mad slow, yo!  My next goal is to offload the heavier calculations to Pixel Bender, like I did with the Lomb-Scargle code <a href="http://digitologist.com/2011/04/i-can-read-your-pulse-by-webcam-2/" target="_self">the Pulse project</a> used, but until then, you&#8217;ll want to run it only against reasonably small data sets if you want anything approaching &#8220;speed&#8221;.</p>
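<p>To give a sense of the math involved, here&#8217;s a minimal sketch of the symmetric FastICA fixed-point iteration (with a tanh nonlinearity) in Python with numpy; the AS3 port follows the same steps, just with my own matrix classes in place of numpy, and this sketch skips the extra options the full MDP version supports:</p>

```python
import numpy as np

def fastica(X, n_iter=200, tol=1e-6):
    """Minimal symmetric FastICA with a tanh nonlinearity.

    X: (n_signals, n_samples) array of mixed signals.
    Returns the estimated unmixed source signals.
    """
    # Center each signal, then whiten via the covariance eigendecomposition.
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T   # whitening matrix
    Z = K @ X

    n = Z.shape[0]
    W = np.linalg.qr(np.random.randn(n, n))[0]  # random orthogonal start
    for _ in range(n_iter):
        G = np.tanh(W @ Z)               # nonlinearity g(u)
        Gp = 1.0 - G ** 2                # its derivative g'(u)
        W_new = (G @ Z.T) / Z.shape[1] - np.diag(Gp.mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W, via the SVD.
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt
        # Converged when every row has stopped rotating.
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol:
            W = W_new
            break
        W = W_new
    return W @ Z
```

The fixed-point update plus the symmetric decorrelation is where all the matrix multiplication cost lives, which is exactly the part that chokes in Flash.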
<p>While I work on getting a code repository set up, use the contact form or leave a message in the comments if you want me to send it to you, I want to share!  Click through, or on the image above, to launch a demo.</p>
<p>
<object width="610" height="678">
<param name="movie" value="http://digitologist.com/wp-content/uploads/2011/04/ICAFlex.swf"></param>
<param name="quality" value="high"></param>
<param name="wmode" value="window"></param>
<param name="menu" value="false"></param>
<param name="bgcolor" value="#FFFFFF"></param>
<embed type="application/x-shockwave-flash" width="610" height="678" src="http://digitologist.com/wp-content/uploads/2011/04/ICAFlex.swf" quality="high" bgcolor="#FFFFFF" wmode="window" menu="false" ></embed>
</object>
</p>
]]></content:encoded>
			<wfw:commentRss>http://digitologist.com/2011/04/fastica-in-as3-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I Can Read Your Pulse by Webcam</title>
		<link>http://digitologist.com/2011/04/i-can-read-your-pulse-by-webcam/</link>
		<comments>http://digitologist.com/2011/04/i-can-read-your-pulse-by-webcam/#comments</comments>
		<pubDate>Tue, 05 Apr 2011 04:33:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://digitologist.com/?p=130</guid>
		<description><![CDATA[[click the image above to try it yourself] The above is the latest version of a labor of love I&#8217;ve been working on, based on research coming out of the MIT Media Lab. Turns out the veeeery minute changes in color of your skin tone can, frame-by-frame, be read and decoded to arrive at a [...]]]></description>
			<content:encoded><![CDATA[<p>[click the image above to try it yourself]</p>
<p>The above is the latest version of a labor of love I&#8217;ve been working on, based on <a href="http://web.mit.edu/newsoffice/2010/pulse-camera-1004.html">research coming out of the MIT Media Lab.</a> Turns out the veeeery minute changes in color of your skin tone can, frame-by-frame, be read and decoded to arrive at a just-fine measure of your pulse.</p>
<p><span id="more-130"></span></p>
<p>The general process by which it works is called photoplethysmography, and it&#8217;s the same way normal fingertip pulse readers work.  Those use an LED to light up your skin and then measure how the light absorption changes whenever blood flows through.</p>
<p>I read through the <a href="http://www.opticsinfobase.org/abstract.cfm?uri=oe-18-10-10762">original research paper</a>, and then later got to peek at the code the researchers used (through the sponsor relationship of my parent company with MIT), and challenged myself to rebuild it in Flash so that it could be web-deployable (and so could be thrown into a banner or a microsite as a gimmick for one of our clients).  Forgive the interface I gave it, I honestly don&#8217;t have a design bone in my body.</p>
<p>Want to make your own? Awesome, here&#8217;s what you need to work out:</p>
<ol>
<li>First, you need to identify a region of open skin from the webcam feed, so fire up a face identification method and have it mark out a subregion of the user&#8217;s face.  I used a tweaked version of <a title="Marilena on Libspark" href="http://www.libspark.org/wiki/mash/Marilena" target="_blank">the Marilena library for AS3</a>, and had it strip out only the nose and cheeks area, so that eye and mouth movements wouldn&#8217;t screw things up.</li>
<li>Then you need to <a href="http://digitologist.com/2011/02/tip-use-as3s-histogram-method-for-color-averaging/">average out the red, green, and blue color values for the pixels</a> in that subregion, and store a bunch of consecutive frames of them in a matrix.  Not having found a satisfactory AS3 matrix math class, I went ahead and made my own AND BOY WAS IT A DOOZY.  Turns out things like matrix multiplication, while straightforward, are not very computationally efficient, so I had to research, optimize and reoptimize before I could move very far.</li>
<li>Once you have a sufficient number of frames worth of data (this version gets by with around 70 frames, a little over two seconds&#8217; worth, but the more you have, the more stable your readings will be), the real fun begins.  You should have three signals at this point&#8211;the traces of each of the color channels of the user&#8217;s skin as they change over time&#8211;and the MIT researchers figured out that you could pass them through a <a href="http://en.wikipedia.org/wiki/Blind_signal_separation" target="_blank">blind source separation</a> algorithm (more on that in a later post) and at least one of the outputs would contain a fairly dependable pulse reading.  Again, not having found, really, ANY blind source separation libraries for AS3, I went ahead and made my own AND BOY WAS IT A DOOZY.  My version of FastICA for AS3 is basically just a port of the version in <a href="http://mdp-toolkit.sourceforge.net/" target="_blank">the MDP package for Python</a>, but with a crapload of optimizations to speed it up for Flash.</li>
<li>So by now you should have a clean pulse signal, and it&#8217;s time to extract out the period of the beats.  Fire up your favorite periodogram function and pass your cleaned-up signal through it and you should be able to get a beats-per-minute measure easily.  I went ahead and rolled my own implementation of <a href="http://en.wikipedia.org/wiki/Least-squares_spectral_analysis" target="_blank">the Lomb-Scargle method</a> that was based, again, on the work from the MIT researchers and BOY WAS IT A DOOZY.  Again, not a very computationally efficient process (though I recently found a better method I want to try), so this version depends on a customized Pixel Bender kernel to do all the heavy lifting outside of the Flash thread (I&#8217;ll do a whole post on it eventually).</li>
</ol>
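<p>The four steps above compress into a pretty short pipeline.  Here&#8217;s a sketch in Python, using numpy and scipy in place of the AS3 matrix code and the Pixel Bender kernel; for simplicity it reads the green channel directly instead of running the full ICA separation (green carries the strongest plethysmographic signal), and the heart-rate band and grid resolution are illustrative:</p>

```python
import numpy as np
from scipy.signal import lombscargle

def bpm_from_frames(frames, timestamps):
    """Estimate pulse (beats per minute) from a stack of face-region frames.

    frames: (n_frames, h, w, 3) RGB crops of the nose/cheeks subregion.
    timestamps: (n_frames,) capture times in seconds. Webcam frames are
    rarely evenly spaced, which is why Lomb-Scargle beats a plain FFT here.
    """
    # Step 2: per-frame channel averages (here just green), normalized.
    green = frames[..., 1].reshape(len(frames), -1).mean(axis=1)
    green = (green - green.mean()) / green.std()

    # Step 4: Lomb-Scargle periodogram over the plausible heart-rate band.
    bpm_grid = np.linspace(40, 180, 400)
    freqs = 2 * np.pi * bpm_grid / 60.0          # angular frequencies
    power = lombscargle(timestamps, green, freqs)
    return bpm_grid[np.argmax(power)]
```

The argmax over the periodogram is the beats-per-minute reading; more frames in the buffer sharpens that peak, which is why the reading steadies as the frame count grows.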
<p>So I plan to keep playing with it.  The low frame count is primarily due to the fact that fastICA starts choking when it has to chew on too much data, so I want to get that going faster (maybe by dumping it off to a Pixel Bender kernel, too) so that I can load in more frames for it to process over.  I don&#8217;t like how jumpy and off the reading can be sometimes.  Work in progress!</p>
]]></content:encoded>
			<wfw:commentRss>http://digitologist.com/2011/04/i-can-read-your-pulse-by-webcam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Face Tracking Research &#8211; DroidDoes.com</title>
		<link>http://digitologist.com/2011/03/face-tracking-research-droiddoes-com/</link>
		<comments>http://digitologist.com/2011/03/face-tracking-research-droiddoes-com/#comments</comments>
		<pubDate>Thu, 17 Mar 2011 06:10:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://digitologist.com/?p=172</guid>
		<description><![CDATA[[click above to view a short video (I show up at 00:11)] Last spring one of the guys on the team whipped up a fun little Flash game that used motion tracking (via frame-differencing) to have a boxing glove follow the user&#8217;s hand around the screen, batting away projectiles. It was cute and fluffy, but [...]]]></description>
			<content:encoded><![CDATA[<p>[click above to view a short video (I show up at 00:11)]</p>
<p>Last spring one of the guys on the team whipped up a fun little Flash game that used motion tracking (via frame-differencing) to have a boxing glove follow the user&#8217;s hand around the screen, batting away projectiles.  It was cute and fluffy, but kept messing up whenever the user moved their head or talked, because that motion was getting picked up, too, skewing the centroid of the motion and ruining the experience.  I knew that if we could locate the face, and mask out its effects on the center of motion, it would clean up the whole thing and make it a lot better.</p>
<p><span id="more-172"></span></p>
<p>I set off to make it happen, and quickly found <a href="http://www.quasimondo.com/archives/000687.php">the Marilena libraries</a>, based on <a href="http://en.wikipedia.org/wiki/Haar-like_features" target="_blank">Haar cascade face detection</a>, which could pick out a face in a frame and mark its location.  It worked really well, and probably could have worked for a low-level execution, maybe a really light banner ad. Armed with this fix ready in my back pocket, I set up a conversation with our Chief Creative Officer to show it off.</p>
<p>He was interested, but not at the edge of his seat, until I pointed out that by tweaking the method somewhat, we could use it to do some calculations of where the user&#8217;s face was in relation to their screen and mimic <a href="http://www.youtube.com/watch?v=Jd3-eiid-Uw">the now famous head-tracking effect demonstrated by Johnny Chung Lee</a> with a Wiimote and some infrared emitters.</p>
<p>After showing the video and promising we could pull off a similar effect, the room suddenly filled up with people, all Creative Directors from various accounts that the CCO leaned out and called in.   The conversation became all about this, and burger-swatting fell by the wayside while each CD asked how this head-tracking method could be used in their client work.  We settled on building a prototype for the upcoming Droid summer campaign that the user could navigate around just by moving their head.</p>
<p>Once we got started, though, we quickly hit the limit of what an average machine could process.  The processing power it used in finding the face  meant that we couldn&#8217;t add a lot of additional functionality  (Papervision 3D was also ruled out due to performance issues), so I resumed my research and finally found the answer in a recently published Danish research paper, calling out a method called CAMSHIFT as a computationally efficient method of tracking a face, once identified by the Haar cascade method.</p>
<p>With a little further research, I stumbled onto <a href="http://www.mukimuki.fr/flashblog/2009/06/18/camshift-going-to-the-source/">a Flash port of CAMSHIFT</a>, where you sampled a region of color by drawing a box around it, and it could track that color region as a blob  that moved around frame-by-frame without losing tracking.  Best of all,  it was lightning fast, compared to Marilena.  The only issue with the  CAMSHIFT method was that the user would have to either sample their own  face by dragging a box across it, or else be tricked into doing it by  having them line their face up with an on-screen prompt.</p>
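<p>The heart of CAMSHIFT is a mean-shift step: slide the search window to the centroid of a color-probability map until it settles.  Here&#8217;s a simplified sketch in Python with numpy; the real algorithm also adapts the window size and orientation each frame, which this skips, and the back-projection step (turning a frame into a probability map via the sampled color histogram) is assumed to have already happened:</p>

```python
import numpy as np

def mean_shift(prob, window, n_iter=20):
    """One CAMSHIFT-style tracking step: move a search window to the
    centroid of a probability map until it stops moving.

    prob: (h, w) back-projection, i.e. each pixel's likelihood of
    belonging to the sampled color histogram.
    window: (x, y, w, h) current search window.
    """
    x, y, w, h = window
    for _ in range(n_iter):
        roi = prob[y:y + h, x:x + w]
        m00 = roi.sum()                    # zeroth moment (total mass)
        if m00 == 0:
            break                          # lost the blob entirely
        ys, xs = np.mgrid[0:h, 0:w]
        cx = int((xs * roi).sum() / m00)   # centroid inside the window
        cy = int((ys * roi).sum() / m00)
        # Re-center the window on the centroid, clamped to the frame.
        nx = min(max(x + cx - w // 2, 0), prob.shape[1] - w)
        ny = min(max(y + cy - h // 2, 0), prob.shape[0] - h)
        if (nx, ny) == (x, y):
            break                          # converged
        x, y = nx, ny
    return x, y, w, h
```

Because each frame's window starts where the last one converged, the per-frame cost is a handful of cheap moment sums, which is why it runs so much faster than re-running a Haar cascade every frame.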
<p>We chose the latter, and created a &#8220;Calibration&#8221; stage, just after the loader, that prompted the user to center their face inside an on-screen oval, which would then snap the sample and start tracking right away.  The client looked at the demos we sent over and approved the project, and a few months later DroidDoes.com was launched.  Over time, the effect was severely diminished and at times removed, at the clients&#8217; request, so at this point all that&#8217;s left is a subtle twist when you lean back and forth.</p>
<p>This past winter we also found <a href="http://www.attraction-lemanga.fr/site/index.php">a French anti-smoking website/game/anime thing</a> that seems to use the same CAMSHIFT library we used, only with less subtlety about it.  In wrapping up the project, I put together a combined library that used Marilena to identify the face, then passed that rectangle on to the CAMSHIFT side to take over tracking it, which helped reduce some of the performance issues.  I&#8217;ve used it from time to time, but I&#8217;m still looking for a really killer application of the idea.  Shoot me a note if you want to take a look at the library and offer any thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://digitologist.com/2011/03/face-tracking-research-droiddoes-com/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a Social Media Listening Platform from scratch</title>
		<link>http://digitologist.com/2011/03/building-a-social-media-listening-platform-from-scratch/</link>
		<comments>http://digitologist.com/2011/03/building-a-social-media-listening-platform-from-scratch/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 06:59:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://digitologist.com/?p=116</guid>
		<description><![CDATA[[click above to view a short video] The brief from management was to build a system capable of collecting brand mentions from all over the web, organizing and analyzing them, and then displaying them on an interface that we could distribute to our clients and account managers.  Mcgarrybowen needed its very own Social Media Listening [...]]]></description>
			<content:encoded><![CDATA[<p>[click above to view a short video]</p>
<p>The brief from management was to build a system capable of collecting  brand mentions from all over the web, organizing and analyzing them,  and then displaying them on an interface that we could distribute to our  clients and account managers.  Mcgarrybowen needed its very own Social  Media Listening Platform.</p>
<p><span id="more-116"></span></p>
<p>With some further questioning, more requirements were established.   The system should be a peace-of-mind application for distribution to our  clients, as a monitor of their real-time brand reputation on the web.   It should be optimized for quick-glance reviews, with broad but shallow  content, but with the ability to drill down from top-level reports into  granular metrics reporting.  It should be capable of tracking the  reputation not just of the brand itself, but of its key competition as  well. It would need to be accessible from the web or the iPad, so HTML5  was a must.</p>
<p>With a mammoth assignment like this, our first step was to break it down into manageable, workable chunks.</p>
<p><strong>Finding the data sources:</strong></p>
<p>Our first challenge: Where on the Internet is the brand being mentioned, and how do we collect that data?</p>
<p>We divided our focus into three buckets:</p>
<ol>
<li><em>News &amp; Headlines</em> – What is the mainstream news media saying about the brand?  What press releases have been published that reference the brand?</li>
<li><em>Online Authorities and In-Market sources</em> – What sites are  consumers likely to visit when they’re searching for information about  the brand or the industry?  What ratings and reviews are they likely to  view that mention the brand?</li>
<li><em>Buzz</em> – What are consumers likely to come across on the web  when they are not actively in-market?  What are bloggers, tweeters, and  Diggers saying that might affect a consumer’s opinion of the brand?</li>
</ol>
<p>We knew that a robust solution would eventually take us down the  route of building screen scrapers that could collect and organize data  from any site we pointed it at, but for the initial prototypes, we  decided to focus strictly on sources with well-established APIs.  We  picked 30 sources, checked documentation and ran test queries, organized  the returned data and did a gap analysis to figure out how we would  organize the data across multiple sources.</p>
<p>This is where we ran into our first set of issues:</p>
<ol>
<li><span style="text-decoration: underline;">How often are we querying?</span> Some sources might only need to  update once a day, but others, like Twitter, would require near-constant  monitoring to keep up with the sheer volume of results.</li>
<li><span style="text-decoration: underline;">What format are the returns in? </span>We realized that our pulls would need to be capable of parsing XML, JSON, or CSV depending on the source.</li>
<li><span style="text-decoration: underline;">Are we violating anyone’s Terms of Service? </span> We knew we  wanted to store everything in a centralized database, but several  sources had specific prohibitions against storing their data in external  frameworks.</li>
<li>But the biggest question turned out to be:  <span style="text-decoration: underline;">What, exactly, are we searching for? </span>We  quickly realized that our pulls were going to have to be  keyword-driven, submitting a given term to the API and logging what the  returns were.</li>
</ol>
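<p>To make the format problem concrete, here&#8217;s a sketch in Python of a normalizer that turns a JSON, XML, or CSV payload into common mention records.  The field names (&#8220;text&#8221;, &#8220;date&#8221;) are illustrative; in practice every real source needs its own field mapping on top of this:</p>

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def normalize(fmt, payload):
    """Normalize one API response into a common list of mention records.

    Each source declares its format up front ('json', 'xml', or 'csv'),
    so the puller can store everything in one shared schema.
    """
    if fmt == "json":
        items = json.loads(payload)
    elif fmt == "xml":
        # Assume a flat <item> list; real feeds need per-source XPath.
        root = ET.fromstring(payload)
        items = [{child.tag: child.text for child in item}
                 for item in root.findall("item")]
    elif fmt == "csv":
        items = list(csv.DictReader(io.StringIO(payload)))
    else:
        raise ValueError(f"unknown format: {fmt}")
    return [{"text": it["text"], "date": it["date"]} for it in items]
```

Once every pull funnels through a normalizer like this, the query-frequency question becomes a per-source scheduler setting rather than per-source code.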
<p>A keyword search strategy would need to be established.  We started  out by searching only with the brand’s name, but realized that even  subtle misspellings would be lost in this search, so we created the  “Brand-words” category.   One of our test cases was Marriott, which  meant also including “Marriot”, “Mariott”, and “Mariot”.  Sub-brand  terms like “Courtyard”, “Renaissance”, and “Residence Inn” filled out  this category.</p>
<p>Our second grouping was “Competitor-words”, which included search  terms with the names of the top competition within the brand’s industry  (in the case of Marriott, we used “Hilton”, “Intercontinental”, and  “Four Seasons”).  The final grouping was “Industry-words”, hoping to  capture conversations about more general topics within the industry.</p>
<p>Once we’d worked to satisfy all of these issues, we had a clean and  stable database, pulling each source regularly, and with an API for  getting the data back out.   We now could move on to the next most  pressing concern:</p>
<p><strong>How to analyze the data:</strong></p>
<p>Since reputation management is all about keeping people’s opinions  more positive than negative about your brand, our first priority for the  prototype was to set up a sentiment analysis engine capable of reading  through each of our database items and appending an evaluation of how  positive or negative they were.  Even the smallest amount of research  revealed this to be a huge task, but we were up for the challenge.</p>
<p>We looked at a number of different approaches, both custom and  off-the-shelf, and determined that we’d get the most value out of  building and training our own Naïve Bayesian classifier, a  well-documented method of extracting sentiment from unstructured text.   Given a number of sample text snippets, each with a manually-supplied  categorization, the system should, in time, be able to recognize which  of the categories any new text snippet should belong in.  Anything  that’s noticed as mis-scored can be resubmitted to the system with a  correction, gradually increasing the tool’s accuracy over time.</p>
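<p>As a concrete illustration, here&#8217;s a tiny multinomial Naïve Bayes classifier in Python with add-one smoothing.  The training snippets below are made up, and a production version would also need real tokenization and stop-word handling, but the retraining loop is exactly the correction workflow described above: mis-scored items just get trained in again with the right label:</p>

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Tiny multinomial Naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> document count
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            score = math.log(doc_count / total_docs)   # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing a label.
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

Nothing here is specific to sentiment: the six categories (Positive, Negative, Neutral, NSFW, Non-Applicable, Spam) are just six labels to this class.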
<p>We knew that our system should be capable of recognizing “Positive”  vs. “Negative” vs. “Neutral”, but after looking at the data we were  accumulating, we were again surprised by how much of the data we were  grabbing wasn’t right for the system.  We included a “NSFW” category to  weed out the more colorful entries, a “Non-Applicable” category (hadn’t  realized that most “Hilton” searches would result in the latest Paris  Hilton scandals), and a “Spam” category to filter out the surprising  volume of Tweet-spam featuring bogus vacation offers.</p>
<p>A handful of lucky interns were tasked with poring over 20,000 of our  database entries and manually scoring each one as one of the six categories <em>twice</em>,  each entry being scored again by someone else as confirmation.  If the two  scores differed, the entry was flagged for further administrator review.   After amassing this much data, testing confirmed that new items were  being scored with up to 72% accuracy.</p>
<p><strong>Presenting the Data:</strong></p>
<p>With our back-end in order, our attention shifted to the interface:  How are our clients going to view the data?  Our only limitation was the  desire to have it viewable on the iPad, so a pure HTML5 Canvas  application was needed.</p>
<p>A designer was brought in to prepare the visualization scheme, based  around a radial &#8220;health&#8221; metric that compared the total number of  brand mentions to how many of those were positive or negative.  We ended  up with three data views:</p>
<ul>
<li>Up-to-the-minute, live-streaming brand mentions, monitoring a single day&#8217;s brand health as they are picked up by the system</li>
<li>An at-a-glance historical review of brand performance for recent pre-set time periods</li>
<li>A deep analytics toolkit to monitor the brand&#8217;s sentiment over time, keyword-by-keyword and for any selected date range</li>
</ul>
<p>The combination of these three techniques satisfied all our original  requirements and created a platform for future data viz designers to  pick up where we left off.</p>
<p><strong>What we learned:</strong></p>
<p>Building a Social Media Listening Platform is not easy, but once the  fundamentals are in place, they start to click together like puzzle  pieces.  Broken down by challenge:</p>
<ol>
<li>Intake: for each source you plan to monitor, you will need custom  scripts to call for new data, clean it up, and store it to your  database.  Make sure you know each API&#8217;s Terms of Service  and whether it limits how often and how much data you can fetch.   Your data storage will need to grow as you add new keywords, at a rate that  depends on how general or specific your terms are.</li>
<li>Processing: find a sentiment-analysis algorithm you like and train  the heck out of it.  The smarter you can make your system, the more  accurate and valuable it&#8217;ll be.  Unfortunately, this cannot (in most  cases) be automated, so account for the training time in your planning.   Bribe your interns to do it with pizza and iTunes gift cards and you&#8217;ll  have great success.</li>
<li>Display: compelling data visualization is at least as important as  the data itself.  Get a designer who knows what they&#8217;re doing, and it&#8217;ll  be a simple matter of hooking the right data pipes to the right display  outputs.</li>
</ol>
<p>Get these three areas right and it&#8217;ll be a piece of cake&#8211;or just ask nicely and I&#8217;ll help you make one; I should be pretty good at it by now.</p>
]]></content:encoded>
			<wfw:commentRss>http://digitologist.com/2011/03/building-a-social-media-listening-platform-from-scratch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
