Translator

I was pleased to see David Pogue’s positive review of the new Windows Phone, Nokia’s Lumia 900, a couple of weeks ago in the New York Times.  Windows Phone has made great progress these past couple of years, and has advanced a beautiful and fresh design language, Metro, which we see being adopted all around Microsoft.  (I’ve been a big advocate for Metro language and principles in my own part of the company, Online Services.)  Pogue’s only real complaint is that apps for Windows Phone are still thinner on the ground than on iPhone and Android, though as he points out, what really matters is whether the important, great, and useful apps are there, not whether the total number is 50,000 or 500,000.  Many apps don’t necessarily mean many quality apps, and most of us have gotten over last decade’s “app mania” that inspired one to fill screen after screen with run-once wonders.

What really made me smile was Pogue’s characterization of what those important apps are, in his view.  After reeling off a few of the usual suspects (Yelp, Twitter, Pandora, Facebook, etc.), he added:

Plenty of my less famous favorites are also unavailable: Line2, Hipmunk, Nest, Word Lens, iStopMotion, Glee, Ocarina, Songify This.

Even Microsoft’s own amazing iPhone app, Photosynth, isn’t available for the Lumia 900.

I’ve also been asked (a number of times) about Photosynth for Windows Phone... hang in there.  A nice piece of news we’ve just announced, however, is a new app for Windows Phone that I hope will join Pogue’s pantheon, and that is considerably more advanced than its counterparts on other devices: Translator.  Technically this isn’t a new app, but an update, though the update is far more functional than its predecessor.

Translator has offline language support, meaning that if you install the right language pack you can use it abroad without a data connection (essential for now; I wish international data were a problem of the past).  It also has a nice speech translation mode, but what’s perhaps most interesting is the visual mode.  Visual translation is really helpful when you’re encountering menus, signs, forms, etc., and is especially important when you need to deal with character sets that you not only can’t pronounce, but can’t even write or type (that would be Chinese).

Word Lens, mentioned by Pogue, was one of our inspirations in developing the new Translator.  What’s impressive about Word Lens is its ability to process frames from the camera at near-video speed, reading text, generating word-by-word translations, and overlaying those onto the video feed in place of the original text.  This is quite a feat, near the edge of what’s achievable on current mobile phone hardware.  In my view it’s also one of the first convincing applications of augmented reality on a phone.  However, the approach suffers from some inherent drawbacks.  First, the translation is word-by-word, which often results in nonsensical translated text.  Second, there isn’t quite enough compute time to do the job properly in a single frame, yielding a somewhat sluggish feel; at the same time, the independent processing of each frame is wasteful, and often makes words flicker in and out of their correct translations, just a bit too fast to follow.  For me, these things make Word Lens a good idea, and better than nothing in a pinch, but imperfect.
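To make the trade-off concrete, here is a minimal sketch of the frame-by-frame approach as I’ve described it.  This is Python with hypothetical function names (Word Lens’s actual implementation isn’t public); the point is only the shape of the loop, in which every frame pays for OCR and translation from scratch, and nothing carries over from one frame to the next.

```python
# A sketch of per-frame augmented-reality translation. All names here
# (camera, ocr_words, translate_word, render_overlay) are hypothetical;
# this illustrates the approach described above, not Word Lens's code.

def per_frame_translation_loop(camera, ocr_words, translate_word, render_overlay):
    for frame in camera.frames():
        # Full OCR on every frame: the most expensive step, repeated endlessly.
        words = ocr_words(frame)
        # Word-by-word lookup, with no grammar or phrase-level context,
        # which is what yields nonsensical translated text.
        translations = [(w.bounding_box, translate_word(w.text)) for w in words]
        # Each frame's output is independent of the last, so a marginal OCR
        # reading can flip the displayed translation from frame to frame.
        render_overlay(frame, translations)
```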

The visual translation in Translator takes a different approach.  It exploits the fact that the text one is aiming at is printed on a surface and is generally constant.  What needs to be done frame by frame, then, is to lock onto that surface and track it.  This is done using Photosynth-like computer vision techniques, but in real time, a bit like the video tracking in our TED 2010 demo.  Selected, stabilized frames from that video can then be rectified, and the optical character recognition (OCR) can be done on them asynchronously (that is, on a timescale not coupled to the video framerate).  This lets us do a better job of OCR and translation, using a language model that understands grammar and multi-word phrases.  Then the translated text can be rendered onto the video feed in a way that still tracks the original in 3D.  This solves a number of problems at once: it improves translation quality, avoids flicker, improves the frame rate, and avoids superfluous repeated OCR.  It’s a small step toward building a persistent and meaningful model of the world seen in the video feed and tracking against it, instead of doing a weaker form of frame-by-frame augmented reality.  The team has done a really beautiful job of implementing this approach, and the benefits are palpable in the experience.
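For readers who like to see the structure spelled out, here is a rough sketch of such a decoupled pipeline.  To be clear, this is my own illustration with hypothetical names, not the team’s actual code: a fast loop tracks the surface on every frame, while OCR and phrase-level translation run on a worker thread at their own pace, and the fast loop simply re-renders the cached translation against the current tracked pose.

```python
import queue
import threading

# An illustrative sketch of the decoupled pipeline (hypothetical names;
# not the actual Translator code). Tracking runs every frame; OCR and
# translation run asynchronously, once per surface rather than once per
# frame, and the fast loop just re-renders the cached result.

frames_to_ocr = queue.Queue(maxsize=1)  # holds only the latest stabilized frame
overlay_lock = threading.Lock()
latest_translation = None               # cached result, reused every frame

def ocr_worker(ocr_text, translate_phrase):
    """Slow path: runs on its own timescale, decoupled from the framerate."""
    global latest_translation
    while True:
        rectified = frames_to_ocr.get()
        text = ocr_text(rectified)           # careful OCR on a clean, rectified frame
        translated = translate_phrase(text)  # grammar- and phrase-aware translation
        with overlay_lock:
            latest_translation = translated

def tracking_loop(camera, track_surface, rectify, render_overlay,
                  ocr_text, translate_phrase):
    """Fast path: per-frame work is just tracking and re-rendering."""
    threading.Thread(target=ocr_worker, args=(ocr_text, translate_phrase),
                     daemon=True).start()
    for frame in camera.frames():
        pose = track_surface(frame)          # lock onto the printed surface
        if pose is None:
            continue
        if frames_to_ocr.empty():
            # Hand a selected, stabilized frame to the slow path; no need
            # to re-run OCR while the surface and its text stay the same.
            try:
                frames_to_ocr.put_nowait(rectify(frame, pose))
            except queue.Full:
                pass                         # the worker already has a frame pending
        with overlay_lock:
            if latest_translation is not None:
                # The overlay is redrawn against the current pose, so the
                # translated text tracks the original in 3D without flicker.
                render_overlay(frame, latest_translation, pose)
```

The design choice that matters here is the queue of size one: the slow path always works on the most recent stabilized frame, and the fast path never waits for it.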

Use this app on your next visit to China!  I’d love to read comments and suggestions from anyone trying Translator out in the field.

