Hi. And thanks for wanting to learn more about Kerosene and a Match technology. We’re still in a pre-release phase, so we can’t detail all the specifics of our technologies just yet. We can, however, answer some frequently asked questions:
What is Kerosene and a Match?
Kerosene and a Match (KaaM) is a totally new style of multimedia discovery and indexing technology that gives standard search systems the ability to “discover” the content of un-indexed and/or untagged images, audio and video and return the results along with text.
“Multimedia Discovery and Indexing?” What’s that?
Right now search engines determine what an image (or audio or video) contains largely from text-based cues such as the filename (e.g., “65mustang.jpg”), text tags (e.g., embedded alt tags, metatags, titles, etc.), and textual content around the image (e.g., an article about the 1965 Ford Mustang next to the image). The problem is that this approach depends entirely on a person correctly tagging the media, because the search engine has no other way of determining the contents. More than 80% of all non-text media on the Internet isn’t correctly tagged (or tagged at all), meaning that tens of billions of photos, audio clips and videos are unsearchable.
KaaM technology dives deep into untagged media, identifies its contents (aka “discovery”), and adds special indexes (aka “indexing”) to make it searchable by standard search engines. So a photo that was once known only as “IMG_6043.jpg” is transformed into “photo of a red 1965 Ford Mustang taken in Los Angeles, California in April 2009.”
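To make the tagging problem concrete, here is a minimal Python sketch of the kind of text-cue matching described above. It is an illustration only, not KaaM code and not how any particular search engine works; the function names `text_cues` and `matches` are hypothetical.

```python
import re

def text_cues(filename, alt_text="", surrounding_text=""):
    """Collect lowercase tokens from the only signals a text-based indexer sees.

    Hypothetical helper for illustration; real indexers are far more elaborate.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the file extension
    raw = " ".join([stem, alt_text, surrounding_text]).lower()
    # Split into runs of letters and runs of digits ("65mustang" -> "65", "mustang").
    return set(re.findall(r"[a-z]+|\d+", raw))

def matches(query, cues):
    """True when every word of the query appears among the cues."""
    return all(term in cues for term in query.lower().split())

# A descriptive filename can be found by a text search...
named = text_cues("65mustang.jpg")
# ...but a camera-default filename with no tags gives the indexer nothing to use.
untagged = text_cues("IMG_6043.jpg")
# The same file becomes findable only once someone supplies text around it.
with_article = text_cues(
    "IMG_6043.jpg",
    surrounding_text="An article about the 1965 Ford Mustang",
)
```

In this sketch a search for “mustang” finds the first and third images but never the untagged one, even if all three show the same car.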
Isn’t human tagging good enough?
Human tagging is great. Unfortunately most media isn’t tagged (either because the person doesn’t bother or because they don’t know what’s in the picture), or it’s tagged with personally identifiable information (think “my uncle Frank’s car” instead of “1965 Ford Mustang”) that makes it impossible to index and search properly. When you search images, audio or video on the web right now, you’re only seeing the tip of the iceberg. KaaM lets you get at the 80% that’s still below the surface.
Moreover, KaaM doesn’t simply let you search for multimedia, it lets you search with multimedia. So if you snap a pic of a car with your cameraphone, you don’t have to know it’s a 1965 Ford Mustang in order to learn more about it. You can simply upload the photo and ask “what kind of car is that?” KaaM will analyze the photo and hand off the information to the search engine, which then delivers everything it has on 1965 Ford Mustangs.
How does KaaM work?
KaaM’s discovery technology actually works much the same way your own ability to recognize things does. Take a tree, for example. All trees have a number of components: leaves, trunk, branches, etc. Each of a tree’s components has unique factors that make it distinguishable from other components of the same tree (leaves are different from branches), as well as from similar components on other trees (fan-shaped leaves vs. spade-shaped leaves). Each of these items is a specific “fingerprint,” and taken collectively they allow you to discern a “palm tree” from a “maple tree”.
KaaM’s technology is very similar. KaaM dives into an image (or audio or video) and breaks it up into as many as 10,000 unique elements within the media — color, intensity, geo-location, orientation, volume, pitch, and a whole bunch of other attributes. Then, using a series of proprietary high speed pattern matching algorithms, KaaM compares all of these elements to other pieces of media. Those with substantially similar “fingerprints” can be assumed to be the same object. These newly discovered objects are then connected back to known items already cataloged within the search engine. So a formerly “unsearchable” photo not only gets cataloged as a picture of a Washingtonia fan palm, but also gets connected to all the search engine’s data on Washingtonia fan palms.
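The general idea of comparing “fingerprints” can be sketched in a few lines of Python. KaaM’s actual algorithms are proprietary and are not shown here; this sketch swaps in plain cosine similarity over tiny four-number vectors as a stand-in, and the catalog values, the 0.95 threshold, and the names `cosine_similarity` and `best_match` are all assumptions made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two equal-length feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(fingerprint, catalog, threshold=0.95):
    """Return the label of the most similar known fingerprint.

    Returns None when nothing in the catalog reaches the threshold,
    i.e. nothing is "substantially similar". The threshold is an
    arbitrary illustrative value.
    """
    best_label, best_score = None, threshold
    for label, known in catalog.items():
        score = cosine_similarity(fingerprint, known)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label

# Made-up fingerprints; a real one would have thousands of elements.
catalog = {
    "Washingtonia fan palm": [0.9, 0.1, 0.8, 0.2],
    "maple tree": [0.2, 0.9, 0.1, 0.7],
}

unknown_photo = [0.85, 0.15, 0.75, 0.25]   # close to the palm fingerprint
unrelated_photo = [0.0, 0.0, 0.0, 1.0]     # not close to anything cataloged

palm_label = best_match(unknown_photo, catalog)
no_label = best_match(unrelated_photo, catalog)
```

Here `palm_label` comes back as the palm entry and `no_label` comes back as `None`. Once a label is found, it is the key that connects the formerly unsearchable photo to everything the search engine already knows about that object.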
What kind of server horsepower does KaaM require?
KaaM’s underlying pattern matching algorithms are pretty complex, but they require surprisingly little horsepower. This is because rather than tying up a server’s CPU with all its transformations and pattern analysis, KaaM uses commodity graphics processing units (GPUs) to perform the heavy computation and leaves the CPU to perform the other operations. Commodity GPUs aren’t just inexpensive, they’re also incredibly powerful when it comes to operations like pattern matching. A single GPU can do hundreds or thousands of operations simultaneously where a CPU can only do a handful. As a result, a server with KaaM’s GPU/CPU hybrid architecture offers upwards of 50 times the performance of a CPU-only server platform.
How difficult is KaaM to implement?
KaaM is actually really simple to implement. Designed to plug into most large-scale distributed computing environments, it’s compatible with cloud/grid architectures such as Hadoop, GridGain, Terracotta, and many others. It can also be run as a self-contained server appliance, as a software-only implementation, or as a hybrid of the two.
Where can I get KaaM Technologies?
Sorry, you can’t. Right now we’re in a closed testing period. However, if you’d like the opportunity to test KaaM once we move into our beta release period later this year, feel free to contact us.
