This week’s CBC tech column is all about the current state of video search. There’s a version up at cbc.ca/tech, and one below, for posterity. You can also download an MP3. [audio:http://blip.tv/file/get/Dmisener-MisenerTechColumn20110510708.mp3]
When I do these things, I sometimes wonder — as Michael Ridley did in his 2008 talk “Beyond Literacy: Are Reading and Writing Doomed?” — how far we are from a truly post-literate society.
According to recent numbers from internet measurement firm comScore, Canadians are voracious online video watchers. Collectively, 22.5 million Canadian users watched 388 million hours of online video in March. That’s 17.2 hours apiece — higher than any other country in the world.
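For the curious, the per-viewer figure follows directly from the two comScore numbers quoted above (a minimal check, using the rounded totals as reported):

```python
# Quick check of the comScore arithmetic: total hours watched
# divided by the number of Canadian viewers.
total_hours = 388_000_000
viewers = 22_500_000

hours_each = round(total_hours / viewers, 1)
print(hours_each)  # → 17.2
```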
But while our appetite for online video seems to be growing, our ability to search deeply within those videos isn’t keeping pace.
We already have pretty good tools for searching through text on the web. I can get quick, easy and accurate results when I search for encyclopedia articles, news stories and long-lost friends from junior high school. Computers “get” text. Beyond basic keyword searches, computers can now be programmed to understand the relationships between words and the syntactical structures of language, and they can even analyze the sentiments behind the things we type.
But video? Not so much. Computers have a much harder time with moving pictures.
Basically, video search has all the challenges of static image recognition (a classically difficult task for computers), multiplied by 30 frames per second. Sure, computers can process and analyze video with increasing sophistication, but they stop short of truly understanding the content of moving images.
As both online video production and consumption increase, the problem grows.
“Now Dan,” you might be thinking, “doesn’t video search work just fine already? If I want to see a video of a funny cat, I can type in ‘funny cat video’ and spend all afternoon on YouTube.”
Yes, you can. But there’s an important distinction here. The reason you can find those funny cat videos is because someone somewhere named a video “funny cat” or included the tags “funny” or “cat.” Or maybe they linked to the video and the link text said, “funny cat video.” Or they added the video to their “Top 10 Funny Cat Videos of all time” playlist.
The reason you can find that funny cat video is because a human being labeled it as such. It’s not because the computer understands the content of the video, or even has the slightest clue what a “cat” is or how a cat could be “funny.”
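To make the distinction concrete, here’s a toy sketch of how that label-based search works. Everything here — the videos, titles and tags — is made up for illustration; the point is that the index is built entirely from human-supplied words, never from the pictures themselves:

```python
# A toy metadata-only video search: the index knows nothing about the
# moving images, only the words humans attached to them.
from collections import defaultdict

# Hypothetical videos with human-written titles and tags.
videos = {
    "v1": {"title": "funny cat video", "tags": ["funny", "cat"]},
    "v2": {"title": "lecture on mitosis", "tags": ["biology"]},
}

def build_index(videos):
    index = defaultdict(set)
    for vid, meta in videos.items():
        words = meta["title"].lower().split() + [t.lower() for t in meta["tags"]]
        for word in words:
            index[word].add(vid)
    return index

def search(index, query):
    # Return only videos whose labels contain every query word.
    results = None
    for word in query.lower().split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return results or set()

index = build_index(videos)
print(search(index, "funny cat"))  # → {'v1'}
```

Remove the human labels and the search returns nothing — which is exactly the gap in today’s video search.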
The real challenge in video search has to do with searching inside videos — helping computers better catalogue the depth of their content.
This is particularly relevant to the education sector. Many universities, colleges, and in some cases, high schools, post video lectures online. For certain courses, it’s not uncommon to have access to hours and hours of online lecture material. At that scale, the challenge becomes searching deeply for relevant content, not skimming across a shallow layer of metadata.
Last week, I talked to Larry Rowe, president of the multimedia research lab FXPAL (Fuji Xerox Palo Alto Laboratory). His team is working on exactly this problem, and recently launched TalkMiner, a video search tool.
Here’s how it works. TalkMiner analyzes online lecture videos, searching for PowerPoint-style presentation slides. When it finds a slide, it scans the relevant text, notes the video’s timestamp, and adds this information to a searchable database. This allows users to search for text that might not be in the lecture’s title or description, but might be buried 45 minutes in.
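In rough outline, the pipeline described above looks something like this. Note that `detect_slides` is a stand-in for the hard parts — spotting slide-like frames and reading their text — which FXPAL’s real system performs on the video itself; the lecture data here is invented:

```python
# A rough sketch of the slide-indexing idea: find slides, read their
# text, and record where in the video each slide appears.
from collections import defaultdict

def detect_slides(video):
    # Placeholder: a real system would scan video frames for slide-like
    # images and run OCR on them. Here we return pre-extracted
    # (timestamp_in_seconds, slide_text) pairs.
    return video["slides"]

def index_lecture(video, index):
    for timestamp, text in detect_slides(video):
        for word in text.lower().split():
            index[word].append((video["title"], timestamp))

def search(index, word):
    return index.get(word.lower(), [])

# Hypothetical lecture with a slide mentioning meiosis about 5 minutes in.
lecture = {"title": "Intro Biology", "slides": [(290, "Meiosis and mitosis")]}
index = defaultdict(list)
index_lecture(lecture, index)
print(search(index, "meiosis"))  # → [('Intro Biology', 290)]
```

The payoff is the timestamp: a search result can drop you at the exact moment the slide appears, rather than just pointing at a 90-minute video.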
One of my test searches on TalkMiner took me back to Grade 10 biology: “meiosis and mitosis.” Though many of the results had these words in their titles, the first result was a lecture from Berkeley that didn’t mention meiosis until a presentation slide almost five minutes in.
Of course, scanning presentation slides from existing lecture videos is just one technique, most effective for a particular style of online video. But the central idea is there: let’s design technology that helps a computer make long video more searchable, and more useful.
There are other techniques. In late 2009, YouTube announced an experimental feature called automatic captions. Basically, it takes the audio part of a video, runs it through speech recognition software and generates a transcript. Since the transcription is text, it can be added to a searchable database to make that video easier to find. The feature is now available on all English-language YouTube videos.
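The same idea in miniature: once speech becomes timed text, video search reduces to text search. In this sketch, `transcribe` is a stand-in for a real speech-recognition engine — the genuinely hard part that YouTube’s feature solves — and the transcript it returns is invented:

```python
# A minimal sketch of caption-based indexing: run speech recognition
# over the audio, keep the word timings, and index the transcript.

def transcribe(audio_file):
    # Placeholder: pretend a recognizer returned (seconds, word) pairs.
    return [(0.0, "welcome"), (1.2, "to"), (1.5, "biology"), (2.4, "class")]

def index_transcript(audio_file):
    # Map each spoken word to the moment it was said.
    return {word: time for time, word in transcribe(audio_file)}

word_times = index_transcript("lecture.mp3")
print(word_times.get("biology"))  # → 1.5
```

As with the slide approach, the timing is what makes the transcript useful: a player could jump straight to the second a word was spoken.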
Several years before automatic captions made its debut, an early version of Google’s Video Search product scanned existing closed-caption information from television shows to generate a searchable database.
Outside of the academic and consumer space, the U.S. Department of Defense is working on video search, too. It is developing a system that could be used to analyze footage to identify people, vehicles, and certain types of action in a scene.
So why is this important? For me, it’s the scale that makes this such an interesting and relevant problem. According to YouTube, 35 hours of video is uploaded every minute. That’s staggering. And sure, many of those videos are funny cats. But there’s also an enormous amount of knowledge contained in some of these online videos. Just look at the TED Talks series, for instance.
But right now, video search is clunky. It doesn’t always work as well as we’d like. And in most cases, searching deep inside videos is impossible. Reliable direct video search is still the stuff of science fiction.
Online video production is growing. Online video consumption is growing. Without decent search tools, we risk getting lost in a sea of abundance: knowing that what we want is out there, but being unable to find it.