Stefano Mazzocchi is awesome and his thinking on Web-based data is incredibly nuanced and pragmatic, so it’s not often that I want to publicly disagree with him. But in his latest post, I think he’s off the mark.
Stefano argues:
The difference between RDFa and Microdata (syntactic differences aside) is basically the fact that the proponents of the first believe that once everybody naturally starts reusing existing ID schemes and ontologies a densely connected web of semantically reconciled information will come together naturally. The second just want to focus on immediate values and avoid speculating on what’s going to happen next.
In the same vein, he adds:
The RDFa camp see it as a vector to promote the growth of the web of data, while the Microdata camp focuses on solving practical problems of embedding richer machine-processable information in web pages
That’s not true. RDFa is 100% focused on solving practical problems. Creative Commons search is one example: Google and Yahoo now support Creative Commons image search based on RDFa (check out the Google video on how to add RDFa to your images). RDFa builds on existing technology, RDF, not because it’s the gospel but because it has some nice properties: you can reuse someone’s vocabulary if you want, or you can invent your own if you prefer. If you invent your own, there’s a little bit of overhead (naming your terms with URLs you control) to prevent stepping on toes and to enable others to reuse your vocabulary if they choose. We don’t expect everyone to reuse all the time. We expect duplicate vocabularies to arise. But we do think it’s a good idea to make reuse possible, easy, and scalable across the Web.
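To make the reuse-or-invent choice concrete, here is a minimal Python sketch of the idea behind prefixed terms: each short name expands to a full URL, so a reused term and a home-grown one can sit side by side without colliding. The dc and cc namespaces are the published Dublin Core and Creative Commons ones; the "my" prefix, its example.org namespace, the cameraModel term, and the expand helper are all hypothetical, made up for illustration. This is not how any particular RDFa parser is implemented.

```python
# Prefix-to-namespace table. The dc and cc namespaces are the published ones;
# "my" is a hypothetical vocabulary invented for this example.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "cc": "http://creativecommons.org/ns#",
    "my": "http://example.org/vocab#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dc:creator' into a full URL."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


# Statements about one image, mixing reused terms with a home-grown one.
statements = [
    ("dc:creator", "Alice"),
    ("cc:license", "http://creativecommons.org/licenses/by/3.0/"),
    ("my:cameraModel", "Example 100"),  # hypothetical term
]

# Only the term is expanded; the value is left untouched.
expanded = [(expand(term), value) for term, value in statements]

for term, value in expanded:
    print(term, "->", value)
```

The only cost of inventing my:cameraModel is the extra line in the prefix table. In exchange, anyone else who later wants the same term can point at the same URL.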
We absolutely do not expect a “densely connected web of semantically reconciled information” to “come together naturally.” But we do think that, when users want to build more densely connected semantically reconciled graphs, they should be able to.
This isn’t just a theory, it’s actually happening: Google is reusing Yahoo’s RDFa vocabulary intermixed with other vocabularies. They didn’t have to. Other groups at Google are making up their own vocabularies. And that’s okay: both approaches are part of a healthy Web-data ecosystem.
So, are we speculating too much on what’s going to happen next? I don’t think so. In fact, I think it’s quite the opposite: RDFa is giving users a choice, while other technologies are purposefully reducing choice. I call it overly opinionated software: being so certain that even slight future-proofing is pointless that you deliberately make it harder for your users to future-proof at all. The criticism that Stefano offers actually applies the other way: solutions that de-emphasize Web-scale identifiers are reducing options by deciding that there shall be no meaningful, scalable reuse or distributed innovation. With RDFa, you have a choice. With a number of other technologies, you can’t easily choose to reuse or mash up.
You said something about reconciliation
Now, one of many areas where I’ve learned quite a bit from Stefano, one where everyone in the Linked Data community should stop and listen, is reconciliation. RDF promises reconciliation: one day Google and Yahoo will realize that google:author and yahoo:creator are actually the same thing, and they’ll come together around a campfire somewhere between Mountain View and Santa Clara and sing Kumbaya by mapping each URL to the other. And Stefano is absolutely right to point out that this won’t be trivial when the data is not identically sampled, or when strings contain a lot of embedded structure that hasn’t been normalized. He summarizes this as:
I find it frankly disheartening that purists still believe that the secret to a useful web of data is already there in the guts of the architecture of the web and that by simply turning a URI into a URL will cause enough social pressure to solve the other issues.
I agree. The Linked Data community shouldn’t overpromise what same-as mappings will accomplish.
But take a step back: what is the other option? Having even less information about the vocabularies we use? Not having the ability to map vocabulary terms to one another? Once again, it’s an issue of choice: RDFa and RDF give you the ability to map concepts to one another. You don’t have to. You can ignore those features if you want. But isn’t it still a good idea to let users map concepts when they choose and when they can?
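Here is a rough Python sketch of what an optional term mapping looks like in practice, under some loud assumptions: the two term URLs are made-up stand-ins for the google:author and yahoo:creator of the campfire story, the records are invented, and canonical and merge are hypothetical helpers rather than part of any RDF library.

```python
# Hypothetical term URLs standing in for google:author and yahoo:creator.
GOOGLE_AUTHOR = "http://google.example/vocab#author"
YAHOO_CREATOR = "http://yahoo.example/vocab#creator"


def canonical(term, same_as):
    """Follow same-as links until a term has no further mapping."""
    seen = set()
    while term in same_as and term not in seen:  # 'seen' guards against cycles
        seen.add(term)
        term = same_as[term]
    return term


def merge(records, same_as):
    """Rewrite each record's property names to their canonical term."""
    return [{canonical(k, same_as): v for k, v in r.items()} for r in records]


# Invented sample data from two publishers describing the same person.
records = [
    {GOOGLE_AUTHOR: "Jane Doe"},
    {YAHOO_CREATOR: "J. Doe"},
]

# Choosing not to map: the two vocabularies simply coexist.
unmapped = merge(records, {})

# Choosing to map: one declared equivalence lines the property names up.
mapped = merge(records, {YAHOO_CREATOR: GOOGLE_AUTHOR})
```

With an empty mapping nothing changes, which is the "you don’t have to" case. With the mapping, both records use the same property name, but "Jane Doe" and "J. Doe" are still different strings. That remaining gap is exactly the hard part of reconciliation that Stefano describes: a term mapping is a tool you can pick up, not a solution to messy data.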
RDF and RDFa shouldn’t overpromise what reconciliation can deliver. At the same time, critics shouldn’t argue that, because RDF doesn’t solve all problems, it solves none, especially when the alternative solutions provide zero functionality in that department.
Who you callin’ Purist?
So here’s my take. The folks preventing reuse are the ones over-speculating. They are placing deliberate obstacles in the way of vocabulary reuse, not because they are being cautious about the future, but because they think they know the future exactly, and they think that future doesn’t include vocabulary reuse. Meanwhile, contrary to popular belief, there are no RDFa cops watching over your shoulder, smacking you upside the head when you reinvent a term that FOAF or Dublin Core already specified. But there are RDFa genies, waiting to be invoked to help you reuse and augment FOAF and Dublin Core, if you want to.
With this added context, who’s the purist, really?