Names Names Names
Over at OCLC, Thom and his team are doing work to match names across several international Name Authorities. This comes after the recent announcement about allowing non-Latin characters into the LC/NACO Name Authority.
This is great work and ties up somewhat with threads I've been thinking about following discussions on other lists.
Firstly, Nicole brings together thoughts from Tim Spalding with blog comment on the RDA drafts. This challenges the notion that controlled subject vocabularies serve end-users particularly well. This is covered in David Weinberger's Everything is Miscellaneous, of course. It is one of the key things that changes when an index no longer requires a huge room full of drawers of cards to keep it in.
Secondly, there's a thread about linking to digitized books (login required) going on over on NGC4Lib. In that thread folks are discussing the cataloguing of books digitized by Google Book Search and others.
Jan Szczepanski describes how she is cataloguing GBS books:
You can collect in two ways, Tim's way or my way, or a combination of both.
My way, or what I could call the quality way means that You carefully looks at every title. Who likes "white noise"? I use the same criteria I use for paper books.
Maurice York is interested in that approach:
I'm curious about this trash-or-treasure line of thinking as a reasoned basis for the manual effort of selection of digitized texts. You are quite right that libraries specialize in selection and have been doing it for thousands of years (more in generalities than realities, since I don't believe any library with a currently functioning collection has been around for more than a few hundred). But it seems to me that this is the very reason Google saw libraries as such an attractive proposition for digitization--they have been building high-quality collections of print materials and (presumably) sorting much of the dross according to sustained plans over long periods of time. When you say that the vast majority of texts in Google are "bad quality, bad relevance", that seems more a dig at American libraries and how we collect than at Google, since Google's collection is no more and no less than what librarians have created. Let me expand that a bit....it's something of a criticism of the libraries of Spain, Germany, the Netherlands, Japan, England, and France as well, all of whom are digitizing books with Google.
These bits & pieces are all starting to fit together. I'm not quite sure what the jigsaw is a picture of yet, but it's going to be interesting.
It seems to me that subject classification has to be opened up to everyone, and simplified. It doesn't need to be a hierarchy anymore, and it doesn't need to be controlled. Names, on the other hand, need some really clever work: deciding which names are the same and which are different requires a huge amount of intelligence and knowledge. Authority files have historically helped with this, but we need to make those work much harder.
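To make the easy part of that problem concrete, here is a minimal sketch of my own; it is not how OCLC or any authority file does its matching. It builds a crude comparison key for a name string by stripping diacritics, case, punctuation and word order, then groups strings that share a key. The function names `name_key` and `candidate_groups` are made up for illustration.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Build a crude comparison key for a personal name (illustrative only)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case and turn punctuation into spaces.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped.casefold())
    # Sort the tokens so "Surname, Forename" and "Forename Surname" agree.
    return " ".join(sorted(cleaned.split()))


def candidate_groups(names):
    """Group name strings that share a key; these are candidates, not matches."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    headings = ["Weinberger, David", "David Weinberger", "Tim Spalding"]
    for group in candidate_groups(headings):
        print(group)
```

A key like this only proposes candidates. Two different people who happen to share a name get the same key, and one person writing under two names gets different keys, which is exactly where the intelligence and knowledge come in.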
I should write some more on how I think this stuff fits together. Mental note to self: must write a paper on MARC and RDF just as soon as we have Talis Insight out of the way.
Technorati Tags: library, multi-lingual, open-data
I don't know which thread you're pointing to, but there is an open-access (no login required) archive of NGC4Lib at GMANE: http://dir.gmane.org/gmane.culture.libraries.ngc4lib