Monday, May 18, 2009

Semantic Binary Data

There seem to be quite a few things missing from XML Schema, RDF and OWL, for example, a single unified type for binary octet-streams. The distinction between the two types used for binary octet-streams has caused more than a few issues during the implementation of these technologies. One indication is that OWL 2.0 is much more complex than OWL 1.0, and another is a note in OWL 2 that discusses the consequences of the distinction between xs:base64Binary and xs:hexBinary. Both of these types are defined to have identical value spaces but different lexical spaces. Their shared value space is defined as all finite-length sequences of octets (integers between 0 and 255).
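To make the "same value space, different lexical space" point concrete, here is a minimal Python sketch using only the standard library. The sample strings are arbitrary values chosen for illustration; they do not come from any of the specifications.

```python
import base64
import binascii

# Two different lexical forms (sample values chosen for illustration).
hex_lexical = "48656C6C6F"   # written the way xs:hexBinary would write it
b64_lexical = "SGVsbG8="     # written the way xs:base64Binary would write it

# Both lexical forms map to the same value: a finite sequence of octets.
from_hex = binascii.unhexlify(hex_lexical)
from_b64 = base64.b64decode(b64_lexical)

octets = list(from_hex)            # each member is an integer in 0..255
same_value = from_hex == from_b64  # the two values are indistinguishable
```

Once decoded, nothing in the octet sequence records which lexical form it came from, which is why a single unified type seems natural.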

If the Web metasystem is to understand anything, it must be able to understand octet-streams, since everything else is described in terms of these. Text can be viewed as an octet-stream. Pictures, movies, and music files can all be described as octet-streams, so unless XML, RDF and OWL allow clear expression of the concept of binary data, the application of these technologies will always be limited to vague and abstract knowledge. There are many ways that a unified binary type could be brought to the XML world. One option would be to make an owl:unionOf type that combines the two existing types. Another would be to define an rdf:List type and restrict the members of the list to octets. Since the first option is trivial and does not provide much structural knowledge about octet-streams, we will not consider it. The second approach does add structural knowledge, and so would be more appropriate for allowing the Web to understand more about octet-streams.

Neither RDF nor OWL allows restricting the members of a list to a certain type, so in order to make a class of lists of octets, we must first have the ability to make lists of a type. To this end, we define a new rdf:Property called ex:listType. How would this property be used? To understand this, we must first understand how RDF handles higher-order types. For example, properties in RDF can be thought of as higher-order types with two parameters, the domain and the range. Using a Haskell-like syntax, this would mean (x :: Property a b) would correspond to the triples

_:x rdf:type rdf:Property .
_:x rdfs:domain _:a .
_:x rdfs:range _:b .

which means the appropriate way to encode list types in RDF, such as (x :: List a), would be with the triples

_:x rdf:type rdf:List .
_:x ex:listType _:a .
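The two encodings follow the same pattern: one rdf:type triple naming the type constructor, plus one triple per type parameter. A minimal Python sketch of that pattern, assuming triples are represented as plain tuples of strings; the helper names (property_type, list_type, to_ntriples_like) are hypothetical, and ex:listType is the property proposed in this post, not a standard RDF term.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
# ex:listType is the property proposed in this post, not standard RDF.

def property_type(x, a, b):
    """Triples for (x :: Property a b)."""
    return [
        (x, "rdf:type", "rdf:Property"),
        (x, "rdfs:domain", a),
        (x, "rdfs:range", b),
    ]

def list_type(x, a):
    """Triples for (x :: List a)."""
    return [
        (x, "rdf:type", "rdf:List"),
        (x, "ex:listType", a),
    ]

def to_ntriples_like(triples):
    """Render triples in the abbreviated notation used in this post."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

rendered = to_ntriples_like(list_type("_:x", "_:a"))
```

Here rendered holds the same two lines as the listing above it.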

With this new construct, a unified ex:binary type could easily be defined as (List xs:unsignedByte). Or in simpler terms, the triple (_:x rdf:type ex:binary) would be equivalent to the triples

_:x rdf:type rdf:List .
_:x ex:listType xs:unsignedByte .
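As a rough illustration of what this could look like in practice, the sketch below expands an ex:binary type triple into its two equivalent triples, and encodes a byte string as an RDF collection using the standard rdf:first, rdf:rest and rdf:nil vocabulary. The function names and the blank-node naming scheme (_:b0, _:b1, ...) are assumptions made for this example, and ex:binary and ex:listType are the terms proposed in this post, not part of any standard.

```python
def expand_binary(triples):
    """Rewrite each (s rdf:type ex:binary) into the two equivalent triples."""
    out = []
    for s, p, o in triples:
        if p == "rdf:type" and o == "ex:binary":
            out.append((s, "rdf:type", "rdf:List"))
            out.append((s, "ex:listType", "xs:unsignedByte"))
        else:
            out.append((s, p, o))
    return out

def octets_to_list(data, prefix="_:b"):
    """Encode bytes as an RDF collection; returns (head node, triples)."""
    if not data:
        return "rdf:nil", []  # the empty list is just rdf:nil
    nodes = [f"{prefix}{i}" for i in range(len(data))]
    triples = [(nodes[0], "rdf:type", "ex:binary")]
    for i, octet in enumerate(data):
        rest = nodes[i + 1] if i + 1 < len(data) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", f'"{octet}"^^xs:unsignedByte'))
        triples.append((nodes[i], "rdf:rest", rest))
    return nodes[0], triples

head, encoded = octets_to_list(b"\x00\xff")
expanded = expand_binary(encoded)
```

Only the head node carries the ex:binary type here; whether every tail of the list should carry it as well is a design question this sketch leaves open.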

This would allow OWL ontologies about file formats, stream protocols, multimedia files, and much more. With Property and List as examples, it seems that all higher-order types can be patterned after these two: one rdf:type triple for the type itself, plus one triple per type parameter. Also, since RDF is abstract enough to represent these higher-order types, it seems that Web technologies are on the right track. RDF and OWL seem to be the most abstract way of expressing modeling ideas, even more abstract than systems like CORBA, IDL and UML. The fact that they allow expressing higher-order types such as those found in Haskell is promising. Perhaps the goal of model compilation (the OMG kind) is much closer than we think. However, the bloat of UML does more to complicate model translation than to simplify it.

Perhaps the simplicity of RDF is all we need.
