RDF-ization: Is That What I’ve Been Up To?
Recently, on his blog, Kingsley Idehen wrote:
RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.
Although Idehen identifies a number of data sources, he does not explicitly identify two data sources I’ve been spending a fair amount of time with over the past few years:
- One source is the data generated by scientific instruments. The semantic framework I’ve built around this data source with various colleagues allows for RDF-ization of scientific data: from semi-structured ASCII to XML (specifically ESML) to RDF via GRDDL. (Please see the illustration.) In principle, it should be possible to further transform the RDF representation into OWL, thus resulting in what I’ve referred to elsewhere as an informal ontology. (According to Morville as well as Shadbolt et al., the RDF-ization of the data sources Idehen identifies results in folksonomies, rather than informal ontologies.) Again with various colleagues, I’ve also made use of RDF to annotate features inherent in the scientific data via XML Pointer Language (XPointer).

- Even more recently, with members of my Network Operations team at York University, I’ve been working with a relational database as a source of data on the topology of IP networks. (Please see the illustration.)
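
To make the relational case a little more concrete, here is a minimal sketch of RDF-izing topology rows into N-Triples. The schema (`device`, `link`), the predicate names (`hasAddress`, `connectsTo`) and the `http://example.org/net#` namespace are all assumptions made for illustration; they are not the schema or vocabulary actually in use at York.

```python
import sqlite3

BASE = "http://example.org/net#"  # hypothetical namespace, not from the post


def build_demo_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE device (name TEXT PRIMARY KEY, ip TEXT);
        CREATE TABLE link (src TEXT, dst TEXT);
        INSERT INTO device VALUES ('core1', '10.0.0.1'), ('edge1', '10.0.1.1');
        INSERT INTO link VALUES ('core1', 'edge1');
        """
    )
    return conn


def rows_to_ntriples(conn):
    """Map each row to one N-Triples statement (subject, predicate, object)."""
    triples = []
    # Each device row becomes a triple with a literal object (its address).
    for name, ip in conn.execute("SELECT name, ip FROM device ORDER BY name"):
        triples.append(f'<{BASE}{name}> <{BASE}hasAddress> "{ip}" .')
    # Each link row becomes a triple relating two device resources.
    for src, dst in conn.execute("SELECT src, dst FROM link ORDER BY src, dst"):
        triples.append(f"<{BASE}{src}> <{BASE}connectsTo> <{BASE}{dst}> .")
    return triples


if __name__ == "__main__":
    for line in rows_to_ntriples(build_demo_db()):
        print(line)
```

The point of the sketch is only that each row maps naturally onto a subject-predicate-object statement; a real mapping would need a considered vocabulary and proper escaping of literal values.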

Of course, whether the motivation is personal/social-networking or scientific/IT related, the attention to RDF-ization is win-win for all stakeholders. Why? Anything that accelerates the RDF-ization of non-RDF data sources brings us that much closer to realizing the true value of the Semantic Web.
Annotation Modeling: In Press
Our manuscript on annotation modeling is one step closer to publication now, as late last night my co-authors and I received sign-off on the copy-editing phase. The journal, Computers and Geosciences, is now preparing proofs.
For the most part then, as authors, we’re essentially done.
However, we may not be able to resist the urge to include a “Note Added in Proof”. At the very least, this note will allude to:
- The work being done to refactor Annozilla for use in a Firefox 3 context; and
- How annotation is figuring in OWL2 (Google “W3C OWL2” for more).
Stay tuned …
CANHEIT 2008: Enhanced Abstract
The program specifics for CANHEIT 2008 are becoming available online.
The enhanced abstract for one of my presentations is as follows:
From the Core to the Edge: Automating Awareness of Network Topology through Knowledge Representation
Ian Lumb – Manager Network Operations, Computing and Network Services (York University)
Abstract
Like many other institutions of higher education, York University makes extensive use of Open Source software. This is especially true in the case of monitoring and managing IP (Internet Protocol) devices. On the monitoring front, extensive manual configuration is currently required to make monitoring solutions (e.g., NAGIOS) aware of the topology of the York network. And with respect to managing, NetDisco automatically discovers assets placed on the network, but is unable to abstract away unnecessary complexity in, e.g., rendering schematics of the network topology. These and other examples suggest that NAGIOS and NetDisco operate in the realm of data, and possibly information, but are unable to envisage network topology from a knowledge-representation perspective. Thus the current focus is on applying a recently developed knowledge-representation platform to such routine requirements in network monitoring and management. The platform is based on Semantic Web standards and implementations and has already been proven effective in various scientific contexts. Ultimately our objective is to extract data automatically discovered by NetDisco, represent it using the knowledge-based platform, and transform a topology-aware representation of the data into configuration data that can be ingested by NAGIOS.
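
As an aside to the abstract, the last step (topology-aware data in, NAGIOS configuration out) can be sketched roughly as follows. This is an illustration only, not the platform described in the abstract: the function name, the input shapes and the `generic-host` template name are assumptions, and the knowledge-representation layer in between is skipped entirely. The `host_name`, `address` and `parents` directives are standard Nagios host-object directives; `parents` is what makes Nagios aware of upstream devices.

```python
def nagios_host_definitions(devices, links):
    """Render Nagios host objects; `parents` encodes upstream topology.

    devices: dict of host name -> IP address (assumed input shape)
    links:   list of (upstream, downstream) pairs (assumed input shape)
    """
    # Collect, for every downstream device, the devices directly upstream.
    parents = {}
    for upstream, downstream in links:
        parents.setdefault(downstream, []).append(upstream)

    blocks = []
    for name in sorted(devices):
        lines = [
            "define host {",
            "    use        generic-host",  # assumed template name
            f"    host_name  {name}",
            f"    address    {devices[name]}",
        ]
        # Only hosts with an upstream neighbour get a parents directive.
        if name in parents:
            lines.append(f"    parents    {','.join(sorted(parents[name]))}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(nagios_host_definitions(
        {"core1": "10.0.0.1", "edge1": "10.0.1.1"},
        [("core1", "edge1")],
    ))
```
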
A visual representation of the approach is illustrated below.
Annotation Modeling: To Appear in Comp & Geosci
What a difference a day makes!
Yesterday I learned that my paper on semantic platforms was rejected.
Today, however, the news was better as a manuscript on annotation modeling was accepted for publication.
It’s been a long road for this paper:
- Its conception dates back to a presentation I gave at the 2006 Fall Meeting of the AGU.
- The paper was submitted as a contribution for Computers & Geosciences Special Issue on Geoscience Knowledge Representation in Cyberinfrastructure.
- The initial reviews called for major revisions. With tremendous support from my co-authors, the paper was significantly revised, and re-submitted.
- After some additional interactions, I just learned that the paper was finally accepted for publication.
The abstract of the paper is as follows:
Annotation Modeling with Formal Ontologies: Implications for Informal Ontologies
L. I. Lumb[1], J. R. Freemantle[2], J. I. Lederman[2] & K. D. Aldridge[2]
[1] Computing and Network Services, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
[2] Earth & Space Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium’s (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Framework (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL’s internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.
I expect the whole paper will be made available in the not-too-distant future …
Evolving Semantic Frameworks into Platforms: Unpublished ms.
I learned yesterday that the manuscript I submitted to HPCS 2008 was not accepted.
It may take my co-authors and me some time before this manuscript is revised and re-submitted.
This anticipated re-submission latency, along with the fact that we believe the content needs to be shared in a timely fashion, provides the motivation for sharing the manuscript online.
To whet your appetite, the abstract is as follows:
Evolving a Semantic Framework into a Network-Enabled Semantic Platform
A data-oriented semantic framework has been developed previously for a project involving a network of globally distributed scientific instruments. Through the use of this framework, the semantic expressivity and richness of the project’s ASCII data is systematically enhanced as it is successively represented in XML (eXtensible Markup Language), RDF (Resource Description Framework) and finally as an informal ontology in OWL (Web Ontology Language). In addition to this representational transformation, there is a corresponding transformation from data into information into knowledge. Because this framework is broadly applicable to ASCII and binary data of any origin, it is appropriate to develop a network-enabled semantic platform that identifies the enabling semantic components and interfaces that already exist, as well as the key gaps that need to be addressed to completely implement the platform. After briefly reviewing the semantic framework, a J2EE (Java 2 Enterprise Edition) based implementation for a network-enabled semantic platform is provided. And although the platform is in principle usable, ongoing adoption suggests that strategies aimed at processing XML via parallel I/O techniques are likely an increasingly pressing requirement.
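
For readers who want a feel for the first two representational steps (ASCII to XML, then XML to RDF-style triples), here is a toy sketch. It is not the framework from the manuscript: the ASCII layout, the element names, the `http://example.org/instruments#` namespace and the predicate names are invented for illustration, and a plain Python function stands in for the XSLT-based transformation that GRDDL would normally point to. The OWL step is not attempted.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/instruments#"  # hypothetical namespace

# Assumed semi-structured layout: "key: value" headers, then data rows.
SAMPLE = """\
station: ST01
instrument: sensor-A
2007-01-01T00:00:00 9.80665
2007-01-01T00:01:00 9.80670
"""


def ascii_to_xml(text):
    """Step 1: wrap header fields and data rows in XML elements."""
    root = ET.Element("dataset")
    for line in text.splitlines():
        if not line.strip():
            continue
        first, rest = line.split(maxsplit=1)
        if first.endswith(":"):
            # Header line such as "station: ST01" becomes <station>ST01</station>.
            ET.SubElement(root, first[:-1]).text = rest.strip()
        else:
            # Data row becomes <sample time="...">value</sample>.
            sample = ET.SubElement(root, "sample", time=first)
            sample.text = rest.strip()
    return root


def xml_to_triples(root):
    """Step 2: lift the XML into (subject, predicate, object) triples."""
    subject = BASE + root.findtext("station")
    triples = [(subject, BASE + "instrument", root.findtext("instrument"))]
    for i, sample in enumerate(root.iter("sample")):
        node = f"{subject}/sample{i}"
        triples.append((subject, BASE + "hasSample", node))
        triples.append((node, BASE + "time", sample.get("time")))
        triples.append((node, BASE + "value", sample.text))
    return triples


if __name__ == "__main__":
    xml_root = ascii_to_xml(SAMPLE)
    print(ET.tostring(xml_root, encoding="unicode"))
    for triple in xml_to_triples(xml_root):
        print(triple)
```

Each step adds explicit structure that the previous representation only implied, which is the sense in which expressivity is "successively" enhanced.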
AGU Poster: Relationship-Centric Ontology Integration
Later today in San Francisco, at the 2007 Fall Meeting of the American Geophysical Union (AGU), one of my co-authors will be presenting our poster entitled “Relationship-Centric Ontology Integration” (abstract).
The poster is part of a session that I co-convened and that is described elsewhere.
A PDF version of the poster is available elsewhere (agu07_the_poster_v2.pdf).
Jott Meets the Semantic Web
While walking my husky after work yesterday, I Jott’ed myself:
Another great work out today on the electrical, you had over 3 kilometers and over 550 calories burned in 32 minutes. Nice work and then some good wait listing …
Most human readers would automatically parse this Jott as:
Another great workout today on the elliptical, you had over 3 kilometers and over 550 calories burned in 32 minutes. Nice work and then some good weight lifting …
Even though I don’t know a lot about Jott’s transcription engine, I’ll share my perspective on the identified differences:
- “work out” vs. “workout” and “wait” vs. “weight” – These are subtle differences. Differences that can only be resolved with an understanding of context. In other words, a human reader knows that I was attempting to capture some data on my lunch-time exercise routine, and re-parses the Jott with contextually correct words. In order to correct such subtle ‘errors of transcription’, Jott will need to develop semantic filters. Filters that can take context into account.
- “electrical” vs. “elliptical” and “listing” vs. “lifting” – These are glaring differences. I know, from past experience, that Jott has words like “elliptical” and “lifting” in its ‘dictionary’. Therefore, I regard these as errors originating from Jott’s inability to ‘hear’ what I’m saying. And although a context-based filter may also help here, I feel I must share some of the responsibility for not clearly articulating my Jott.
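
To illustrate what a context-aware filter might look like in its crudest form, here is a toy sketch. It has nothing to do with how Jott actually works (which I don't know): the confusion table, the cue words and the function name are all made up for the example. A heard word is swapped for its candidate only when a cue word appears somewhere else in the same text.

```python
import re

# Hypothetical confusion table: heard word -> (candidate, context cue words).
CONFUSIONS = {
    "wait": ("weight", {"calories", "kilometers", "workout", "lifting"}),
    "electrical": ("elliptical", {"calories", "kilometers", "workout"}),
}


def context_filter(text):
    """Swap a heard word for its candidate when cue words appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))

    def replace(match):
        heard = match.group(0)
        # Words outside the table map to themselves with no cues, so they
        # are always left unchanged.
        candidate, cues = CONFUSIONS.get(heard.lower(), (heard, set()))
        return candidate if cues & words else heard

    return re.sub(r"[A-Za-z]+", replace, text)


if __name__ == "__main__":
    print(context_filter(
        "over 550 calories burned on the electrical, then some wait listing"
    ))
```

Note that “listing” survives untouched here because it is not in the table, and that a sentence such as “please wait here” is left alone because no cue word is present. A filter that truly captures meaning would need far more than keyword overlap.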
What does all of this mean?
Meaning, indeed, is the root of it all!
What this means is that some future version of Jott will need to do a better job of capturing meaning. What I had intended. The context in which I framed my Jott.
What this means is that in the longer term, a few major releases of Jott down the road, Jott will need to become as interested in the Semantic Web as companies like Google are today.
And as we’re experiencing with search engines like Google, this’ll take some effort and some time!
Earth and Space Science Informatics at the 2007 Fall Meeting of the American Geophysical Union
In a previous post, I referred to Earth Science Informatics as a discipline-in-the-making.
To support this claim, I cited a number of data points. And of these data points, the 2006 Fall Meeting of the American Geophysical Union (AGU) stands out as a key enabler.
With 22 sessions posted, the 2007 Fall Meeting of the AGU is well primed to further enable the development of this discipline.
Because I’m a passionate advocate of this intersection between the Earth Sciences and Informatics, I’m involved in convening three of the 22 Earth and Space Science Informatics sessions:
- Ontology Integration: A Pressing Challenge for Earth and Space Science Informatics
- Grid Technologies and Associated Infrastructures
- Putting Ontologies to Work: Real-World Applications in the Earth and Space Sciences
I encourage you to take a moment to review the calls for participation for these three, as well as the other 19, sessions in Earth and Space Science Informatics at the 2007 Fall Meeting of the AGU.
CANARIE’s Network-Enabled Platforms Workshop
Next week, I’ll be attending CANARIE’s Network-Enabled Platforms Workshop: Convergence of Cyber-Infrastructure and the Next-Generation Internet in Ottawa. Although the workshop is described elsewhere, to provide a little context consider that:
The purpose of CANARIE’s Network-Enabled Platforms Workshop is to explore the development of and participation in network-enabled platforms by Canadian researchers and other interested parties. The workshop will be an important step towards the launch of a CANARIE funding program in this area.
Based on the agenda, I expect this will be a highly worthwhile event, and I am looking forward to it.
My contribution to the workshop will be a short presentation described by the following abstract:
Evolving Semantic Frameworks into Network-Enabled Semantic Platforms
Ian Lumb
Manager Network Operations
Computing and Network Services
York University
A semantic framework has been successfully developed for a project involving a network of globally distributed scientific instruments. Through the use of this framework, the semantic expressivity and richness of the project’s ASCII data is systematically enhanced as it is successively represented in XML (eXtensible Markup Language), RDF (Resource Description Framework) and finally OWL (Web Ontology Language). In addition to this representational transformation, there is a corresponding transformation from data into information into knowledge. Because this framework is broadly applicable to ASCII and binary data of any origin, it is appropriate to develop a network-enabled semantic platform that (i) facilitates integration of the enabling languages, tools and utilities that already exist, and (ii) identifies the key gaps that need to be addressed to completely implement the platform. After briefly reviewing the semantic framework in a generic way, a J2EE (Java 2 Enterprise Edition) based, work-in-progress proposal for a network-enabled semantic platform is put forward.
I expect to be sharing more on this thread as it develops …


