Paul Miller is right... and so is Ian Davis

20 July 2009

Paul Miller, a good friend and ex-colleague, has been having a tough time arguing that perhaps Linked Data doesn't need RDF. Don't misunderstand that: he thinks RDF is a Good Thing and Best Practice for Linked Data. But he thinks a dogmatic stance is unhelpful.

The problem, I contend, comes when well-meaning and knowledgeable advocates of both Linked Data and RDF conflate the two and infer, imply or assert that ‘Linked Data’ can only be Linked Data if expressed in RDF.

This dogmatism makes me deeply uncomfortable, and I find myself unable to agree with the underlying premise. In the twitter stream that Paul links to there is some comment reminding people that RDF can take many forms, not just RDF/XML.

kidehen: @andypowe11 re. #rdf, it's the data model for #linkeddata based #metadata. Remember #rdf != RDF/XML, no escaping RDF model re. #linkeddata.

Ian Davis (my boss) took a strong stance saying that if things weren't RDF then they weren't linked data. Perhaps the very thing Paul sees as a dogmatic stance. Ironic as Ian is far from dogmatic. But Ian is defending the term Linked Data, not saying that's the only way to publish data on the web...

TallTed: @iand "I think LD better for many cases, but there are times i'd rather hv a spreadsheet." What? Can a spreadsheet not hold #LinkedData?

Well, it seems to me both Paul and Ian are right to a strong degree and are essentially arguing over only one thing - the meaning of the term Linked Data.

Paul quotes Tim Berners-Lee's design note on Linked Data:

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs. so that they can discover more things.

The emphasis is Paul's. I would emphasise a different point:

4. Include links to other URIs. so that they can discover more things.

And in point four lies the reason that Ian is saying a spreadsheet isn't Linked Data, even if it's on the web and even if it's linked to. The only standard for describing how one resource relates to others using URIs is RDF. Sure, you can put URIs into a spreadsheet, but there is no standard interpretation of what the sheets, rows and columns mean. Sure, you can put URIs into a CSV file, but again, there is no standard interpretation of what the fields mean.

The end result of that is data published on the web that can be linked to but not from.
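To make that difference concrete, here is a minimal Python sketch rather than anyone's actual tooling. The example.org URIs and both helper functions are hypothetical, and the FOAF homepage property is used as the predicate purely for illustration. The CSV row holds two URIs but says nothing about how they relate, while the N-Triples statement names the relationship with a URI of its own. The regular expression is an assumption that covers only the simplest URI-only statements, not the full N-Triples grammar.

```python
import csv
import io
import re

# Hypothetical data: the example.org URIs are made up for illustration.
CSV_DATA = "http://example.org/people/alice,http://alice.example.org/\n"

NTRIPLES_DATA = (
    "<http://example.org/people/alice> "
    "<http://xmlns.com/foaf/0.1/homepage> "
    "<http://alice.example.org/> .\n"
)

# Matches only the simplest N-Triples statement: three URIs and a full stop.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def csv_links(text):
    """Return each CSV row as a list of cells.

    The cells may be URIs, but nothing in the format states how
    the first cell relates to the second.
    """
    return [row for row in csv.reader(io.StringIO(text))]


def ntriples_links(text):
    """Return (subject, predicate, object) tuples.

    The predicate is itself a URI, so the relationship between
    subject and object is stated in the data.
    """
    triples = []
    for line in text.splitlines():
        match = TRIPLE.match(line)
        if match:
            triples.append(match.groups())
    return triples


if __name__ == "__main__":
    print(csv_links(CSV_DATA))
    print(ntriples_links(NTRIPLES_DATA))
```

A client reading the CSV has to be told out of band what the second column means; a client reading the triple can look up the predicate URI to find out.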

At this early time, though, Paul argues that what we really want is to get more and more data published and open. We all agree on that, I know. Ian does for sure; he runs Data Incubator for exactly that reason - well, that and helping show those publishing spreadsheets and CSV why they should move to RDF and Linked Data.

In the comments on Paul's post Justin (another senior manager at Talis) says:

Yes the same mistake was made with the rise of the web.

Once you had URIs and HTTP you already had plain text which is a perfectly good way to encode content. By adopting the STANDARD convention of HTML, all sort of existing text based formats with their various mark ups were locked out. That locked out a lot of content that already existed and required anyone who wanted to play to convert existing content into a html format.

Of course it did have the small side effect that to consume web content you only needed a browser that understood one convention i.e. html.

The same is true of RDF. XML is the equivalent of ascii in this regard.

And that's the point. XML is the equivalent of ASCII, as is a spreadsheet or a CSV file, not because they're simple, but because they have no mechanism for embedding the relationships and links necessary to link out from your data. Yes, they can contain URIs and clients can decide to make those into links, but there is no way to describe the meaning.

I agree with both sides of this argument: if it isn't RDF then it isn't Linked Data, but I wouldn't keep pushing that point if someone was willing to publish data yet unable or unwilling to publish RDF (in any of its many forms).

Comments

20 July 2009 Zach Beauvais

Sorry, should have linked to TBL interview: http://www.readwriteweb.com/archives/interview_with_tim_berners-lee_part_1.php

20 July 2009 Jonathan Rochkind

Surely RDF is not the "*only* standard for describing how one resource relates to others using URIs." It might be the only general-purpose abstract standard for that. But there are a couple dozen other standard ways to describe, in certain contexts, how one resource relates to another using URIs. I mean, even just could be described as that. Obviously it's a very special purpose case of limited use. There are probably other standards that neither you nor I have heard of. Of course, one could argue that a standard nobody's heard of isn't very useful, and you should use the general purpose abstract standard that people have heard of. But it seems oddly, yeah, dogmatic, to suggest that there isn't even possibly (now or in the future?) any standard that allows one to "describe how one resource relates to others using URIs." Use what works.

20 July 2009 Jonathan Rochkind

It ate my tag in my comment. I suggested that even just using an html link tag with rel equals 'canonical' could be described as a very limited specific context for 'describing how one resource relates to another using uris'.

20 July 2009 More Linked Data and RDF | Paul Miller - The Cloud of Data

[...] also liked the simplicity with which Alan Dix and Elliot Smith responded to Rob Styles’ ‘Paul Miller is right… and so is Ian Davis,’ writing; “Surely the critical issue is whether the semantics are available, not whether they [...]

20 July 2009 Alan Dix

Surely the critical issue is whether the semantics are available, not whether they are in RDF. If a csv file is published AND suitable semantics are available, then you know which columns are URIs or whatever else. ... but how to give the semantics ... maybe someone needs a standard for meta-data ... hey what was RDF supposed to be for???

20 July 2009 Zach Beauvais

I think the semantics here is problematic. I don't mean the explicit linking to metadata kind of semantics, but the whole problem with using ambiguous and generic words to describe both precise technological expressions and general trends.

You could create your own term to demonstrate. We could try "Published Data". Publishing data, some in the public sector seem to contend, can be done in PDFs. But that's not what anyone in my industry (I also work at Talis) would call "Published Data", because it's in an obtuse format which can't be used without serious investment of time and translation. But it fits the criteria of being available on the web, so it's data that's been published... and you can imagine the twitter battles following that.

To me, "Linked Data" makes use of semantic web technologies (including RDF), when it's used as a proper noun (a noun, usually capitalised in English, which expresses a specific thing or person: like "United Kingdom" or "Tom"). Tim Berners-Lee would seem to agree, sometimes, from his recent ReadWriteWeb interview: "They [Linked Data and the Semantic Web] fit in completely, in that the linked data actually uses a small slice of all the various technologies that people have put together and standardized for the Semantic Web. ... One of the nice things about Linked Data, when they have a pile of it, is that they could run a SPARQL server on it. ... So the message [for government] is to use RDF."

However, his explicit stance on whether Linked Data NEEDS RDF isn't crystal clear; it would seem that his expectation is that Linked Data uses SPARQL (which needs RDF). The message to get data out there, though, doesn't require it to be Linked Data. It still makes most sense to do so, as it's become best practice and standards-based. So, is arguing whether this data is "Linked" or not useful if it's best practice to use RDF and SPARQL?

Seems, to me, that the real debate is whether it is best practice to use SPARQL, not whether linked data is Linked Data.

20 July 2009 Elliot Smith

I'm with Alan - if you publish data on the web and a suitable semantics for interpreting that data and linking it to other data, then why isn't it Linked Data? It just so happens that RDF has a clear(er) semantics describing the interpretation of its data elements (URIs in particular) than a spreadsheet does; it doesn't mean you couldn't apply similar semantics to a spreadsheet if you were so inclined.

20 July 2009 Rob Styles

Of course, it's possible to create a system of machine readable data using CSVs, but how does one get from the CSV to the definition of it? And once one has the definition, it's only practical to describe the same type of data within one file as the definition has to say something like "column one means the person's homepage". It's not that it couldn't, with a lot of work, become Linked Data. But why would you? There are only two reasons to publish something like CSV, Excel or XML. One is that you already have the data in that form, so publishing is simpler. The other is that it needs to be consumed in a specific context where that format is already easily accepted. Either of those may be a good reason to publish something that's not Linked Data, but saying it is isn't quite true.