Archive | Quantitative Classification

Google Blogging 2007: From Legitimizing Blogs to Wikipedia-Competitor Google Knol

There’s a recent year-in-review entry by the Google blogging team.

Not only does this entry highlight another wonderful year for Google, it also places blogging in quantitative perspective. If you ever had any doubts as to the legitimacy of blogging, just read this post.

Amongst the highlights, I found the announcement of the Knol test project to be of interest. Although I’m a huge fan of knowledge representation and management, especially in the context of the Semantic Web, I must confess to being confused by Knol. At the most basic level, Knol seems to be about knowledge sharing and, more specifically, about providing jumping-off points (from search-engine hits) for those seeking to understand some topic.

Therefore, I can’t help but ask: is there more to Knol than being Google’s competitive answer to Wikipedia?

If you happen to drop by my blog, and this post, please feel free to share your take on Knol.

What am I missing?

Jott Meets the Semantic Web

While walking my husky after work yesterday, I Jott’ed myself:

Another great work out today on the electrical, you had over 3 kilometers and over 550 calories burned in 32 minutes. Nice work and then some good wait listing …

Most human readers would automatically parse this Jott as:

Another great workout today on the elliptical, you had over 3 kilometers and over 550 calories burned in 32 minutes. Nice work and then some good weight lifting …

Even though I don’t know a lot about Jott’s transcription engine, I’ll share my perspective on the identified differences:

  • “work out” vs. “workout” and “wait” vs. “weight” – These are subtle differences. Differences that can only be resolved with an understanding of context. In other words, a human reader knows that I was attempting to capture some data on my lunch-time exercise routine, and re-parses the Jott with contextually correct words. In order to correct such subtle ‘errors of transcription’, Jott will need to develop semantic filters. Filters that can take context into account.
  • “electrical” vs. “elliptical” and “listing” vs. “lifting” – These are glaring differences. I know, from past experience, that Jott has words like “elliptical” and “lifting” in its ‘dictionary’. Therefore, I regard these as errors originating from Jott’s inability to ‘hear’ what I’m saying. And although a context-based filter may also help here, I feel I must share some of the responsibility for not clearly articulating my Jott.
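
To make the idea of a context-based filter a little more concrete, here is a minimal Python sketch. It is not how Jott works (I don’t know the internals of its transcription engine). The confusion sets and the cue words are hypothetical, hand-picked for this one Jott, and the scoring is a simple count of cue words that appear elsewhere in the message.

```python
# Hypothetical confusion sets: words a transcriber might mix up.
CONFUSION_SETS = [
    {"wait", "weight"},
    {"listing", "lifting"},
    {"electrical", "elliptical"},
]

# Hypothetical context cues: words expected near each candidate.
CONTEXT_CUES = {
    "weight": {"calories", "burned", "kilometers", "gym"},
    "wait": {"queue", "line", "hold"},
    "lifting": {"calories", "burned", "kilometers", "gym"},
    "listing": {"directory", "catalog", "sale"},
    "elliptical": {"calories", "burned", "kilometers", "workout"},
    "electrical": {"wiring", "outlet", "voltage"},
}


def correct(text):
    """Swap a confusable word for the candidate best supported by context."""
    # Lowercase and strip simple punctuation; a real filter would keep both.
    words = [t.strip(".,!?") for t in text.lower().split()]
    context = set(words)
    corrected = []
    for word in words:
        best = word
        for group in CONFUSION_SETS:
            if word in group:
                # Score = number of cue words present in the message.
                # On a tie, the originally transcribed word is kept.
                best = max(
                    sorted(group),
                    key=lambda c: (len(CONTEXT_CUES.get(c, set()) & context), c == word),
                )
        corrected.append(best)
    return " ".join(corrected)


jott = ("Another great work out today on the electrical, you had over "
        "3 kilometers and over 550 calories burned in 32 minutes. "
        "Nice work and then some good wait listing")
print(correct(jott))
```

This sketch only handles one-for-one word swaps. Joining “work out” into “workout” would need a filter that also considers neighbouring words together, which is exactly the kind of deeper context I’m arguing for.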

What does all of this mean?

Meaning, indeed, is the root of it all!

What this means is that some future version of Jott will need to do a better job of capturing meaning. What I had intended. The context in which I framed my Jott.

What this means is that in the longer term, a few major releases of Jott down the road, Jott will need to become as interested in the Semantic Web as companies like Google are today.

And as we’re experiencing with search engines like Google, this’ll take some effort and some time!

The Essence of Google

Another quote from Chris Anderson’s The Long Tail:

Likewise for Google, which seems both omniscient and inscrutable. It makes connections that you or I might not, because they naturally emerge from math on a scale we can’t comprehend. Google is arguably the first company to be born with the alien intelligence of the Web’s “massive-scale” statistics hardwired into its DNA. That’s why it’s so successful, and so seemingly unstoppable.

Author Paul Graham puts it like this:

The Web naturally has a certain grain, and Google is aligned with it. That’s why their success seems so effortless. They’re sailing with the wind, instead of sitting becalmed praying for a business model, like print media, or trying to tack upwind by suing their customers, like Microsoft and the record labels. Google doesn’t try to force things to happen their way. They try to figure out what’s going to happen, and arrange to be standing there when it does.

I’ve never read a more concise distillation of the very essence of Google.

Digital Terrain Mapping via LIDAR

From the purely scientific (ozone-column mapping, imaging hydrometeors in clouds) to commercial (on-board detection of clear air turbulence, CAT), my exposure to LIDAR applications has been primarily atmospheric.

Of course, other applications of LIDAR technology exist, and one of these is Digital Terrain Mapping (DTM).

Terra Remote Sensing Inc. is a leader in LIDAR-based DTM. Particularly impressive is their ability to perform surface DTM in areas of dense vegetation. As I learned at a very recent meeting of the Ontario Association of Remote Sensing (OARS), Terra has already found a number of very practical applications for LIDAR-based DTM.

Some additional applications that come to mind are:

  • DTM of urban canopies for atmospheric experiments – Terra has already mapped buildings for various purposes. The same approach could be used to better ground (sorry 😉) atmospheric experiments. For example, the boundary-layer modeling that was conducted for Joint Urban 2003 (JU03) employed a digitization of Oklahoma City. A LIDAR-based DTM would’ve made this an even more realistic effort.
  • Monitoring the progress of Global Change in the Arctic – In addition to LIDAR-based DTM, Terra is also having some success characterizing surfaces based on LIDAR intensity measurements. Because open water and a glacier would be expected to have different DTM and intensity characteristics, Terra should also be able to monitor Global Change as nunataks are progressively transformed into traditional islands (land isolated and surrounded by open water). With the Arctic as a bellwether for Global Change, it’s not surprising that the nunatak-to-island transformation is getting attention.

Although my additional examples are (once again) atmospheric in nature, as Terra is demonstrating, there are numerous applications for LIDAR-based technologies.

Annotation Paper Submitted to HPCS 2007 Event

I’ve blogged and presented recently (locally and at an international scientific event) on the topic of annotation and knowledge representation.

Working with co-authors Jerusha Lederman, Jim Freemantle and Keith Aldridge, a written version of the recent AGU presentation has been prepared and submitted to the HPCS 2007 event. The abstract is as follows:

Semantically Enabling the Global Geodynamics Project:
Incorporating Feature-Based Annotations via XML Pointer Language (XPointer)

Earth Science Markup Language (ESML) is efficient and effective in representing scientific data in an XML-based formalism. However, features of the data being represented are not accounted for in ESML. Such features might derive from events, identifications, or some other source. In order to account for features in an ESML context, they are considered from the perspective of annotation. Although it is possible to extend ESML to incorporate feature-based annotations internally, there are complicating factors identified that apply to ESML and most XML dialects. Rather than pursue the ESML-extension approach, an external representation for feature-based annotations via XML Pointer Language (XPointer) is developed. In previous work, it has been shown that it is possible to extract relationships from ESML-based representations, and capture the results in the Resource Description Framework (RDF). Application of this same requirement to XPointer-based annotations of ESML representations results in a revised semantic framework for the Global Geodynamics Project (GGP).
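
For readers unfamiliar with the idea of an external, pointer-based annotation, the following Python sketch may help. It is an illustration only, not the method from the paper: the XML below is a made-up stand-in (it is not the ESML schema), the feature label and predicate names are hypothetical, and the pointers use the simple XPointer element() child-sequence scheme, resolved by hand, because Python’s standard library has no XPointer processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical data document; element and attribute names are invented.
DATA_XML = """<dataset>
  <record time="2007-01-01T00:00">981.2</record>
  <record time="2007-01-01T01:00">981.9</record>
  <record time="2007-01-01T02:00">981.3</record>
</dataset>"""

# Annotations live outside the data and point into it.
ANNOTATIONS = [
    {"pointer": "element(/1/2)", "feature": "earthquake"},
]


def resolve(root, pointer):
    """Resolve an element() child-sequence pointer such as element(/1/2)."""
    prefix = "element(/"
    if not (pointer.startswith(prefix) and pointer.endswith(")")):
        raise ValueError("unsupported pointer: " + pointer)
    steps = [int(s) for s in pointer[len(prefix):-1].split("/")]
    if steps[0] != 1:
        raise ValueError("first step must select the document element")
    node = root
    for step in steps[1:]:
        node = list(node)[step - 1]  # child sequences are 1-indexed
    return node


def to_triples(root, annotations, doc="data.xml"):
    """Turn annotations into (subject, predicate, object) triples."""
    triples = []
    for ann in annotations:
        target = resolve(root, ann["pointer"])
        subject = doc + "#" + ann["pointer"]
        triples.append((subject, "hasFeature", ann["feature"]))
        triples.append((subject, "observedAt", target.get("time")))
    return triples


root = ET.fromstring(DATA_XML)
for triple in to_triples(root, ANNOTATIONS):
    print(triple)
```

The point of the design is that the data document is never modified: the annotation refers to the second record from the outside, and the resulting triples are the kind of relationship that could then be expressed in RDF.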

Once the paper is accepted, I’ll make a pre-submission version available online.

Because the AGU session I participated in has also issued a call for papers, I’ll be extending the HPCS 2007 submission in various interesting ways.

And finally, thoughts are starting to gel on how annotations may be worked into the emerging notions I’ve been having on knowledge-based heuristics.

Stay tuned.

Knowledge-Based Heuristics: Further Research is Required

Recently, I’ve blogged about:

In both cases, there’s a case to be made for combining heuristic with knowledge-based approaches.

Although I did find “heuristics” and “knowledge” juxtaposed in Googling for “knowledge-based heuristics”, I believe the tightly coupled examples I’ve described above have some degree of novelty.

Further research is required 🙂

A Bayesian-Ontological Approach for Fighting Spam

When it comes to fighting spam, Bayesian and ontological approaches are not mutually exclusive.

They could be used together in a highly complementary fashion.

For example, you could use Bayesian approaches, as they are implemented today, to build a spam ontology. In other words, the Bayesian approach would be extended through the addition of knowledge-representation methods for fighting spam.

This is almost the opposite of the Wikipedia-based approach I blogged about recently.

In the Wikipedia-based approach, the ontology consists of ham-based knowledge.

In the alternative I’ve presented here, the ontology consists of spam-based knowledge.
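
As a rough illustration of the spam-based alternative, here is a minimal Python sketch of how Bayesian word statistics might seed such an ontology. Everything in it is an assumption for illustration: the toy messages, the function name spam_seed_terms, and the choice to rank words by a Laplace-smoothed log-likelihood ratio. A real ontology would also need relationships between the terms, which this sketch does not attempt.

```python
import math
from collections import Counter


def spam_seed_terms(spam_docs, ham_docs, top_n=3):
    """Rank words by how much more likely they are in spam than in ham."""
    spam_counts = Counter(w for d in spam_docs for w in d.lower().split())
    ham_counts = Counter(w for d in ham_docs for w in d.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())

    scores = {}
    for word in vocab:
        # Laplace (add-one) smoothing avoids division by zero.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocab))
        scores[word] = math.log(p_spam / p_ham)

    # Highest ratio first; alphabetical order breaks ties.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return [(w, scores[w]) for w in ranked[:top_n] if scores[w] > 0]


# Toy, made-up messages.
spam = ["cheap pills online", "cheap loans online now", "win cheap pills"]
ham = ["meeting notes attached", "lunch at noon", "notes from the meeting now"]

seed_terms = spam_seed_terms(spam, ham)
# Candidate instances of a hypothetical "SpamIndicator" concept.
ontology_seed = {"SpamIndicator": [word for word, _ in seed_terms]}
print(ontology_seed)
```

The ham-based approach would be the mirror image: swap the two corpora and the same ranking yields candidate terms for a ham ontology.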

Both approaches are technically viable. However, it’d be interesting to see which one actually works better in practice.

Either way, automated approaches for constructing ontologies, as I’ve outlined elsewhere, are likely to be of significant applicability here.

Another point is also clear: Either way, this will be a computationally intensive activity!