The new way is already old

Some may think that the emerging semantic web is “just a new representation of data” (RDF, OWL, etc.). But I think it’s a lot more. It is also, and possibly more importantly, a representation of data where none has existed before. I think we sometimes need to be careful not to lose sight of the obvious, especially in academic discussions (which often fling the obvious far into the weeds). The obvious being that vast multitudes of information are pouring onto the web every day and every minute in a form that is unstructured and difficult to reuse, discover, and make sense of.

OK, so as I understand it, the real value of the emerging semantic web (once fully realized) will be the new ability to share, harvest, read, process, and “reason over” the world’s published knowledge programmatically. The alternative is, well, rather disconcerting. How many monkeys can you muster to sit and read the web and make sense of it to get what you need today? Not many, I’ll suggest; monkeys are not cheap and they eat a lot. And yet that is just what businesses are doing today. The number of “knowledge workers” in the modern business office goes up and up each year. Where do all these people get their information in the “information age”? The answer: first scan the web, then maybe pick up the phone. Honestly, think about it: when you need an answer, where does your hand go first? The door knob? The phone? Or the mouse?

Is the new paradigm still new?

Here’s the rapidly emerging issue: the new paradigm of gathering knowledge (and thus power) from the web is also the old paradigm that must become obsolete. It is the old paradigm in the sense that burning up valuable “human eyeball time” scanning for information on the web is just not very efficient. We need a better way.

The consensus on this need shows itself in many ways. The sheer mountainous popularity of Google is (obvious again) a vote in favor of the validity of this need. In this Google Tech Talk, Doug Lenat from Cycorp points out the limitations of Google: just try to ask Google a question like “Which is taller the Space Needle or the Eiffel Tower?” Doug goes on to describe how (in 2006) Google returns no reasonable answer. I tried it today and the first hit led me to an answer. Or did it?

Did Google really answer the question? Sadly, no. We only get lucky (or think we did) because someone created a forum post with the same question in the title, so we get a match. Guess what: if the forum post did not really resolve to an answer, we still fail. How many times has this happened to you? (And by the way, how willing are you to trust an answer from a forum post?)

Good answers are hard to find

So you still can’t really get a reliable answer without some manual work of your own, sorting out bits and pieces from the search results and then doing some verification (or some math). It should be dirt simple for computers to answer these sorts of questions in a reliable fashion. But today it is not.

One step closer is Wolfram Alpha. Doug Lenat is bullish on Wolfram Alpha and so am I. I played around with it the other day and had surprisingly good results. This tool has great potential and sports a reasonable amount of DWIM (Do what I mean, not what I say). However, today I asked the Space Needle/Eiffel Tower question there and, lo and behold, it’s… not understanding me. I type in the exact phrase “Which is taller the space needle or the Eiffel Tower?” and I get back “Wolfram Alpha isn’t sure what to do with your input.” OK, but I’m willing to try again… so I enter “Space Needle Eiffel Tower”. Aha, I get results: a very nicely formatted page with facts and maps about each tower, complete with height values. Very cool.

But the current limitation is that Wolfram Alpha did not understand English grammar well enough. Optimistically speaking, this should be doable today; we can write programs that parse simple sentence structures in question format. It’s not guaranteed to work perfectly every time, but I think the folks at Wolfram Alpha should be able to do better, and I’m sure they will.
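To show what I mean by parsing a simple question, here is a minimal sketch in Python. It only handles one fixed question shape ("Which is taller/shorter X or Y?"), and everything in it is my own illustration: the pattern, the function name answer, and the tiny hard-coded table of approximate heights are assumptions, not how Wolfram Alpha or any real service works.

```python
import re

# Hypothetical fact table: approximate heights in meters, hard-coded
# purely for illustration (a real system would look these up).
HEIGHTS_M = {
    "space needle": 184,
    "eiffel tower": 324,
}

# Matches questions shaped like "Which is taller, the X or the Y?"
# The comma, the word "the", and the question mark are all optional.
QUESTION = re.compile(
    r"^\s*which\s+is\s+(?P<adj>taller|shorter)\s*,?\s+"
    r"(?:the\s+)?(?P<a>.+?)\s+or\s+(?:the\s+)?(?P<b>.+?)\s*\??\s*$",
    re.IGNORECASE,
)


def answer(question):
    """Return the name of the taller/shorter item, or None if not understood."""
    match = QUESTION.match(question)
    if match is None:
        return None  # the sentence does not fit the one shape we know
    a = match.group("a").lower()
    b = match.group("b").lower()
    if a not in HEIGHTS_M or b not in HEIGHTS_M:
        return None  # we parsed the question but lack the facts
    pick = max if match.group("adj").lower() == "taller" else min
    return pick((a, b), key=HEIGHTS_M.get)


print(answer("Which is taller the space needle or the Eiffel Tower?"))
print(answer("Space Needle Eiffel Tower"))
```

The first call parses the exact phrase I typed into Wolfram Alpha; the second is the bare keyword version, which this sketch deliberately refuses because it is not a question at all. The hard part in real life is of course not the pattern but having trustworthy facts behind it.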

The ultimate answering machines cometh

Meanwhile, similar but different solutions are on the way. Google has announced Google Squared. TechCrunch is quick to claim it will “Crush Wolfram Alpha”. And Paul (The Content Guy) is quick to remind us that we should all calm down because Google Squared is not the same as Wolfram Alpha, and therefore the two should not be compared head to head and presented as an either/or choice. Indeed I agree, it isn’t the same. And as those annoying optimists are so quick to say: “It’s all good”. Anyway, I don’t think “really smart” search engines are the end of this story. What is the end game for sites like Ask.com, Wolfram Alpha and Google Squared? Sites whose motivation, whose reason for being, is helping people get to an answer to a question. What might the ideal “well of web wisdom” look like? Well, here’s what I want. Imagine this, if you will: a web site that can answer my questions and:

  • Do a reasonable job at natural language interpretation
  • Provide suggested questions when mine is not understood (i.e., “Did you mean…” on steroids)
  • Truly assemble real quality answers with associated facts and links.
  • Draw material from the vast semantically rich web when needed.
  • Draw on large Linked Open Data sources when needed.
  • Provide links to sources for verification
  • Provide deep linking to other, possibly more relevant, sites that focus on vertical domains for a deep dive. (engineering, health, sciences, etc.).
  • And so on…

Sound unrealistic? I don’t think so. This is not utopian at all at this point. No, not at all utopian. And once we have these kinds of resources, the cost of hunting around for information will be greatly reduced, saving us time, money and monkeys, and we may just find we have more time to pick up the phone again and call a friend, or even (heaven forbid) reach for the door.