I bumped into a professional acquaintance last week. After I briefly described a presentation I was about to give, he offered to broker introductions to others who might be interested in the work I’ve been doing. To initiate the introductions, I crafted a brief description of what I’ve been up to in this area for the past five years. I’ve decided to share it here as well:
As always, [name deleted], I enjoyed our conversation at the recent AGU meeting in Toronto. Below, I’ve tried to provide some context for the work I’ve been doing in the area of knowledge representations over the past few years. I’m deeply interested in any introductions you might be able to broker with others at York who might have an interest in applications of the same.
Since 2004, I’ve been interested in expressive representations of data. My investigations started with a representation of geophysical data in the eXtensible Markup Language (XML). Although this was successful, using the approach exposed an oversight: metadata (data about data) had not been given the importance it deserves. To address this oversight, a subsequent effort introduced a relationship-centric representation via the Resource Description Framework (RDF). RDF, by the way, forms the underpinnings of the next-generation Web, variously known as the Semantic Web, Web 3.0, etc. In addition to taking care of issues around metadata, RDF paved the way for increasingly expressive representations of the same geophysical data. For example, to represent features in and of the geophysical data, an RDF-based annotation scheme was introduced using the XML Pointer Language (XPointer). Somewhere around this point in my research, I placed all of this into a framework.
In addition to applying my Semantic Framework to use cases in Internet Protocol (IP) networking, I’ve continued to tease out increasingly expressive representations of data. Most recently, these representations have been articulated in RDFS – i.e., RDF Schema. And although I have not reached the final objective of an ontological representation in the Web Ontology Language (OWL), I am indeed progressing in this direction. (Whereas schemas capture the vocabulary of an application domain in geophysics or IT, for example, ontologies allow for knowledge-centric conceptualizations of the same.)
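To make the step from RDF to RDFS a little more concrete, here is a minimal sketch, in plain Python, of two RDFS entailment rules: rdfs11 (`rdfs:subClassOf` is transitive) and rdfs9 (an instance of a subclass is also an instance of its superclass). The class and instance names (`ex:SuperconductingGravimeter`, `ex:station-1`, etc.) are hypothetical illustrations rather than terms from my actual schemas, and a real deployment would rely on an RDF toolkit instead of hand-rolled rules.

```python
# Minimal sketch of two RDFS entailment rules over (subject, predicate, object)
# tuples. All ex: terms are hypothetical illustrations.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until no new triples are inferred."""
    graph = set(triples)
    while True:
        subclass = [(s, o) for (s, p, o) in graph if p == SUBCLASS_OF]
        inferred = set()
        # rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    inferred.add((a, SUBCLASS_OF, d))
        # rdfs9: (x type A) and (A subClassOf B) => (x type B)
        for s, p, o in graph:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if o == a:
                        inferred.add((s, RDF_TYPE, b))
        if inferred <= graph:  # fixed point: nothing new was inferred
            return graph
        graph |= inferred


schema = {
    ("ex:SuperconductingGravimeter", SUBCLASS_OF, "ex:Gravimeter"),
    ("ex:Gravimeter", SUBCLASS_OF, "ex:Instrument"),
}
data = {("ex:station-1", RDF_TYPE, "ex:SuperconductingGravimeter")}

closure = rdfs_closure(schema | data)
for triple in sorted(closure - schema - data):
    print(triple)
```

The schema says nothing directly about `ex:station-1` being an instrument; that fact is entailed, which is the kind of added expressivity RDFS brings over plain RDF.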
From niche areas of geophysics to IP networking, the Semantic Framework is broadly applicable. As a workflow for systematically enhancing the expressivity of data, the Framework is based on open standards emerging largely from the World Wide Web Consortium (W3C). Because numerous parties have a significant interest in this next-generation Web, implementation platforms already support increasingly expressive representations of data. In making data actionable, the ultimate value of the Semantic Framework lies in providing a means for integrating data from seemingly incongruous disciplines. In fact, such representations have already yielded new results, derived by querying them with SPARQL, a query language for RDF that is, loosely, a ‘semantified’ analogue of the Structured Query Language (SQL).
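As a rough illustration of how querying an integrated representation can surface connections, the sketch below mimics a SPARQL basic graph pattern in plain Python: it finds every binding of the `?variables` that makes all triple patterns match, joining left to right as a WHERE clause does. It is not a SPARQL engine, and the `ex:` terms linking a gravity station to a network switch are hypothetical.

```python
# Minimal sketch of SPARQL basic-graph-pattern matching in plain Python.
# It mimics, for illustration only, a query such as:
#   SELECT ?station ?switch WHERE {
#     ?station ex:measures ex:gravity .
#     ?station ex:locatedAt ?site .
#     ?site    ex:servedBy  ?switch .
#   }
# All ex: terms are hypothetical.


def match(pattern, triple, binding):
    """Return `binding` extended so `pattern` matches `triple`, or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable
            if result.get(term, value) != value:
                return None  # already bound to a different value
            result[term] = value
        elif term != value:  # constant that does not match
            return None
    return result


def query(graph, patterns):
    """Join triple patterns left to right, as a WHERE clause does."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Geophysical facts and network facts live in one graph.
graph = {
    ("ex:station-A", "ex:measures", "ex:gravity"),
    ("ex:station-A", "ex:locatedAt", "ex:site-1"),
    ("ex:station-B", "ex:measures", "ex:gravity"),
    ("ex:station-B", "ex:locatedAt", "ex:site-2"),
    ("ex:site-1", "ex:servedBy", "ex:switch-7"),
}
patterns = [
    ("?station", "ex:measures", "ex:gravity"),
    ("?station", "ex:locatedAt", "?site"),
    ("?site", "ex:servedBy", "?switch"),
]
for solution in query(graph, patterns):
    print(solution["?station"], "->", solution["?switch"])
```

Only `ex:station-A` survives the join, because only its site has a recorded network connection; the answer exists solely because both kinds of data share one graph.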
I’ve spoken formally and informally about this research to audiences in the sciences, IT, and elsewhere. With York co-authors spanning academic and non-academic staff, I’ve also published four refereed journal papers on aspects of the Framework, and have an invited book chapter currently under review – interestingly, this chapter has been contributed to a book focusing on data management in the Semantic Web. Of course, I’d be pleased to share any of my publications and discuss aspects of this work with those finding it of interest.
With thanks in advance for any connections you’re able to facilitate, Ian.
If anything comes of this, I’m sure I’ll write about it here – eventually!
In the meantime, feedback is welcome.
RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.
Although Idehen identifies a number of data sources, he does not explicitly identify two data sources I’ve been spending a fair amount of time with over the past few years:
- One source of data is that generated by scientific instruments. With various colleagues, I’ve built a semantic framework around this data source that allows for RDF-ization of scientific data: from semi-structured ASCII to XML (specifically ESML) to RDF via GRDDL. (Please see the illustration.) In principle, it should be possible to further transform the RDF representation into OWL, resulting in what I’ve referred to elsewhere as an informal ontology. (According to Morville as well as Shadbolt et al., the RDF-ization of the data sources Idehen identifies results in folksonomies rather than informal ontologies.) Again with various colleagues, I’ve also made use of RDF to annotate features inherent in the scientific data via the XML Pointer Language (XPointer).
- Even more recently, with members of my Network Operations team at York University, I’ve been working with a relational database as a source of data on the topology of IP networks. (Please see the illustration.)
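GRDDL works by associating an XML dialect with a transformation, typically written in XSLT, that gleans RDF from documents in that dialect. The sketch below substitutes Python's standard `xml.etree.ElementTree` for XSLT purely for illustration, emitting N-Triples from a small XML document. The element and attribute names are hypothetical and are not the actual ESML schema; `http://example.org/ggp/` is a placeholder namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance; element/attribute names are NOT the ESML schema.
XML_DOC = """<dataset id="sample">
  <record time="2000-01-01T00:00:00Z"><gravity units="nm/s^2">-12.5</gravity></record>
  <record time="2000-01-01T00:01:00Z"><gravity units="nm/s^2">-12.7</gravity></record>
</dataset>"""

EX = "http://example.org/ggp/"  # placeholder namespace
XSD = "http://www.w3.org/2001/XMLSchema#"


def xml_to_ntriples(xml_text):
    """Glean N-Triples from the XML, minting one resource per <record>."""
    root = ET.fromstring(xml_text)
    dataset = f"<{EX}{root.get('id')}>"
    lines = []
    for i, record in enumerate(root.findall("record"), start=1):
        subject = f"<{EX}{root.get('id')}/record/{i}>"
        gravity = record.find("gravity")
        lines += [
            f"{dataset} <{EX}hasRecord> {subject} .",
            f'{subject} <{EX}time> "{record.get("time")}"^^<{XSD}dateTime> .',
            f'{subject} <{EX}gravity> "{gravity.text}"^^<{XSD}double> .',
            f'{subject} <{EX}units> "{gravity.get("units")}" .',
        ]
    return lines


print("\n".join(xml_to_ntriples(XML_DOC)))
```

Each record becomes a resource with typed literals, so the relationships that were only implicit in the XML nesting become explicit, queryable statements.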
Of course, whether the motivation is personal/social-networking or scientific/IT related, the attention to RDF-ization is win-win for all stakeholders. Why? Anything that accelerates the RDF-ization of non-RDF data sources brings us that much closer to realizing the true value of the Semantic Web.
Our manuscript on annotation modeling is one step closer to publication now, as late last night my co-authors and I received sign-off on the copy-editing phase. The journal, Computers and Geosciences, is now preparing proofs.
For the most part then, as authors, we’re essentially done.
However, we may not be able to resist the urge to include a “Note Added in Proof”. At the very least, this note will allude to:
- The work being done to refactor Annozilla for use in a Firefox 3 context; and
- How annotation is figuring in OWL2 (Google “W3C OWL2” for more).
Stay tuned …
From the Core to the Edge: Automating Awareness of Network Topology through Knowledge Representation
Ian Lumb – Manager Network Operations, Computing and Network Services (York University)
Like many other institutions of higher education, York University makes extensive use of Open Source software. This is especially true for monitoring and managing IP (Internet Protocol) devices. On the monitoring front, extensive manual configuration is currently required to make monitoring solutions (e.g., NAGIOS) aware of the topology of the York network. On the management front, NetDisco automatically discovers assets placed on the network, but is unable to abstract away unnecessary complexity when, e.g., rendering schematics of the network topology. These and other examples suggest that NAGIOS and NetDisco operate in the realm of data, and possibly information, but are unable to envisage network topology from a knowledge-representation perspective. The current focus is therefore on applying a recently developed knowledge-representation platform to such routine requirements in network monitoring and management. The platform is based on Semantic Web standards and implementations, and has already proven effective in various scientific contexts. Ultimately, our objective is to extract the data NetDisco discovers automatically, represent it using the knowledge-based platform, and transform a topology-aware representation of that data into configuration data that NAGIOS can ingest.
A visual representation of the approach is illustrated below.
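As a complement to the illustration, the extract, represent, and transform steps can be sketched in a few lines of Python. An in-memory SQLite database stands in for NetDisco's database; its table and column names are assumptions, since NetDisco's real schema differs. The output uses the Nagios `define host` object syntax, whose `parents` directive is how Nagios learns topology; the `generic-host` template name is an assumption borrowed from the Nagios sample configuration.

```python
import sqlite3

# Hypothetical stand-in for a NetDisco-style inventory (NetDisco's real
# schema differs). Table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device (name TEXT PRIMARY KEY, ip TEXT);
CREATE TABLE uplink (child TEXT, parent TEXT);
INSERT INTO device VALUES
  ('core-1', '10.0.0.1'), ('dist-1', '10.0.1.1'),
  ('edge-1', '10.0.1.10'), ('edge-2', '10.0.1.11');
INSERT INTO uplink VALUES
  ('dist-1', 'core-1'), ('edge-1', 'dist-1'), ('edge-2', 'dist-1');
""")

# Represent: lift relational rows into (subject, predicate, object) triples.
triples = [(n, "hasAddress", ip) for n, ip in conn.execute("SELECT name, ip FROM device")]
triples += [(c, "uplinkTo", p) for c, p in conn.execute("SELECT child, parent FROM uplink")]


def nagios_hosts(triples):
    """Transform: emit Nagios host objects whose `parents` encode topology."""
    address = {s: o for s, p, o in triples if p == "hasAddress"}
    parents = {}
    for s, p, o in triples:
        if p == "uplinkTo":
            parents.setdefault(s, []).append(o)
    blocks = []
    for host in sorted(address):
        lines = [
            "define host {",
            "    use        generic-host",  # template name is an assumption
            f"    host_name  {host}",
            f"    address    {address[host]}",
        ]
        if host in parents:  # the core device has no parent
            lines.append(f"    parents    {','.join(sorted(parents[host]))}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


print(nagios_hosts(triples))
```

Routing the rows through triples, rather than templating Nagios directly from SQL, keeps the topology in a form that other tools, such as schematic renderers, could also consume.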
What a difference a day makes!
Yesterday I learned that my paper on semantic platforms was rejected.
Today, however, the news was better as a manuscript on annotation modeling was accepted for publication.
It’s been a long road for this paper:
- Its conception dates back to a presentation I gave at the 2006 Fall Meeting of the AGU.
- The paper was submitted as a contribution to the Computers & Geosciences Special Issue on Geoscience Knowledge Representation in
- The initial reviews called for major revisions. With tremendous support from my co-authors, the paper was significantly revised, and re-submitted.
- After some additional interactions, I just learned that the paper was finally accepted for publication.
The abstract of the paper is as follows:
Annotation Modeling with Formal Ontologies: Implications for Informal Ontologies
L. I. Lumb, J. R. Freemantle, J. I. Lederman & K. D.
Computing and Network Services, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
Earth & Space Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium’s (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Framework (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL’s internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.
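To illustrate what an "externally originating" XPointer annotation looks like, the sketch below resolves a pointer written in the XPointer `element()` scheme (for example, `element(/1/2)` selects the second child element of the document element) and then emits RDF statements that attach a note to that feature. The note lives outside any OWL ontology and points into the data, which is the situation the abstract describes. The document URI, the `ex:` predicates, and the note text are hypothetical; tools such as Annozilla use the W3C Annotea vocabulary for this purpose, which is not reproduced here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data document, URI, and predicates (not the Annotea vocabulary).
DOC_URI = "http://example.org/ggp/sample.xml"
EX = "http://example.org/annotation#"
XML_DOC = """<dataset>
  <record time="t1"><gravity>-12.5</gravity></record>
  <record time="t2"><gravity>-48.9</gravity></record>
</dataset>"""


def resolve_element_pointer(root, pointer):
    """Resolve an XPointer element() child sequence, e.g. 'element(/1/2)'.
    Only the absolute form '/1/...' is handled in this sketch."""
    if not re.fullmatch(r"element\((/\d+)+\)", pointer):
        raise ValueError(f"unsupported pointer: {pointer}")
    steps = [int(n) for n in pointer[len("element(/"):-1].split("/")]
    if steps[0] != 1:
        return None  # a document has exactly one document element
    node = root
    for n in steps[1:]:
        children = list(node)  # element children only, in document order
        if not 1 <= n <= len(children):
            return None
        node = children[n - 1]
    return node


def annotate(pointer, note):
    """Emit N-Triples linking a (blank-node) annotation to the pointed-at feature."""
    return [
        f"_:a1 <{EX}annotates> <{DOC_URI}> .",
        f'_:a1 <{EX}context> "{DOC_URI}#{pointer}" .',
        f'_:a1 <{EX}body> "{note}" .',
    ]


root = ET.fromstring(XML_DOC)
pointer = "element(/1/2)"
feature = resolve_element_pointer(root, pointer)
print(feature.get("time"), feature.find("gravity").text)
print("\n".join(annotate(pointer, "candidate feature for review")))
```

Nothing in an ontology constrains what such a pointer may address, which helps explain why external annotations can fall outside the limits OWL places on its own annotation mechanism.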
I expect the whole paper will be made available in the not-too-distant future …
I learned yesterday that the manuscript I submitted to HPCS 2008 was not accepted 😦
It may take my co-authors and me some time before this manuscript is revised and re-submitted.
This anticipated re-submission latency, along with our belief that the content needs to be shared in a timely fashion, provides the motivation for sharing the manuscript online.
To whet your appetite, the abstract is as follows:
Evolving a Semantic Framework into a Network-Enabled Semantic Platform
A data-oriented semantic framework has been developed previously for a project involving a network of globally distributed scientific instruments. Through the use of this framework, the semantic expressivity and richness of the project’s ASCII data is systematically enhanced as it is successively represented in XML (eXtensible Markup Language), RDF (Resource Description Framework) and finally as an informal ontology in OWL (Web Ontology Language). In addition to this representational transformation, there is a corresponding transformation from data into information into knowledge. Because this framework is broadly applicable to ASCII and binary data of any origin, it is appropriate to develop a network-enabled semantic platform that identifies the enabling semantic components and interfaces that already exist, as well as the key gaps that need to be addressed to completely implement the platform. After briefly reviewing the semantic framework, an implementation of a network-enabled semantic platform based on J2EE (Java 2 Enterprise Edition) is provided. And although the platform is in principle usable, ongoing adoption suggests that strategies aimed at processing XML via parallel I/O techniques are likely an increasingly pressing requirement.
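The first stage of the framework, lifting semi-structured ASCII into XML, can be sketched with Python's standard library. The ASCII layout assumed here (`#` lines carrying `key: value` metadata, whitespace-separated data columns named by a `columns` entry) is a hypothetical illustration, not the GGP file format, and the XML element names are likewise invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured ASCII: '#' lines carry "key: value" metadata,
# the rest are whitespace-separated columns. Not the actual GGP format.
ASCII_DATA = """# station: XX
# columns: date time gravity pressure
20000101 000000 -12.5 1013.2
20000101 000100 -12.7 1013.1
"""


def ascii_to_xml(text):
    """Separate metadata from rows, then emit one <record> per data row."""
    meta, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            meta[key.strip()] = value.strip()
        else:
            rows.append(line.split())
    columns = meta["columns"].split()
    root = ET.Element("dataset", station=meta["station"])
    for row in rows:
        record = ET.SubElement(root, "record")
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = value
    return root


root = ascii_to_xml(ASCII_DATA)
print(ET.tostring(root, encoding="unicode"))
```

Even this small step gains something: the column names, which were only a comment in the ASCII file, now label every value, so later XML-to-RDF stages can rely on them.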
Earlier this week, I participated in the Net@EDU Annual Meeting 2008: The Next 10 Years. For me, the key takeaways are:
- The Internet can be improved. IP, the protocols layered on it (RTP, SIP, TCP and UDP), and especially HTTP, are stifling innovation at the edges: everything (device-oriented) on IP and everything (application-oriented) on the Web. There are a number of initiatives that seek to improve the situation. One of these, with tangible outcomes, is the Stanford Clean Slate Internet Design Program.
- Researchers and IT organizations need to be reunited. In the 1970s and 1980s, these groups worked closely together and delivered a number of significant outcomes. Since the 1990s, however, they have remained separate and distinct, and this separation has benefited neither group. As the manager of a team focused on operating a campus network who still manages to conduct a modest amount of research, I find that this takeaway resonates particularly strongly.
- DNSSEC is worth investigating now. DNS is a mission-critical service; in many IT organizations, however, it is an orphaned one. DNSSEC comprises four standards that extend the original concept in security-savvy ways, e.g., by hardening your DNS infrastructure against DNS-targeted attacks. Although production implementation remains in the future, now is the time to get involved.
- The US is lagging behind in broadband. An EDUCAUSE blueprint details the current situation and offers a prescription for rectifying it. As a Canadian, I find it noteworthy that Canada’s progress in this area is exceptional, even though it is regarded as a much more rural nation than the US. The key to the Canadian success, and a key component of the blueprint’s prescription, is a funding model that shares costs evenly among two levels of government (federal and provincial) and the network builder/owner.
- Provisioning communications infrastructures for emergency situations is a sobering task. Virginia Tech experienced 100-3000% increases in the demands on their communications infrastructure as a consequence of their April 16, 2007 event. Such stress factors are exceedingly difficult to estimate and account for. In some cases, responding in real time allowed for adequate provisioning through a tremendous amount of collaboration. Mass notification remains a challenge.
- Today’s and tomorrow’s students are different from yesterday’s. Although this may sound obvious, the details are interesting. Ultimately, this difference derives from the fact that today’s and tomorrow’s students have more intimately integrated technology into their lives from a very young age.
- Cyberinfrastructure remains a focus. EDUCAUSE has a Campus Cyberinfrastructure Working Group. Some of their deliverables are soon to include a CI digest, plus contributions from their Framing and Information Management Focus Groups. In addition to the working-group session, Don Middleton of NCAR discussed the role of CI in the atmospheric sciences. I was particularly pleased that Middleton made a point of showcasing semantic aspects of virtual observatories such as the Virtual Solar-Terrestrial Observatory (VSTO).
- The Tempe Mission Palms Hotel is an outstanding venue for a conference. Net@EDU has themed its annual meetings around this hotel, Tempe, Arizona, and the month of February, and the venue repays this strategic choice in spades: from individual rooms to conference food and logistics to the mini gym and pool, The Tempe Mission Palms Hotel delivers.