Data Sharing: Panacea or Can of Worms?
24/02/2012
Author’s note: My interests within the LIS field are data curation and e-science librarianship. This is a hot topic that is growing every day, and skilled e-science librarians are needed to fill the gap. If you’re interested in learning more about data curation librarianship as a future career, leave a comment here, and I’ll follow up with more information.
Back in the fall, Micah wrote a post about Open Access Week. In it he discussed open journals, open data, and the ALA Code of Ethics. Open data is what today’s post is about. An important ongoing question in the world of data curation is how to get scientists to share their data by placing it in a data repository. Many scientists are unaware that their data has value to anyone but them and their research team. On the other hand, some scientists are very possessive of their data and don’t want to release it for fear that they will lose control of it and not be credited for its creation. There are also those who want to wring every drop of publishing potential out of a data set before releasing it to anyone else.
Last November, the White House Office of Science and Technology Policy (OSTP) put out two requests for information (RFIs). One asked whether peer-reviewed journal articles resulting from federally funded research should be accessible to the public. The other asked whether data from federally funded research should be accessible to the public. OSTP has since released the comments it received in response. I have not read all the responses, but the ones I have read seem to indicate that support for open access is high among those not affiliated with a publisher and cautious, at best, among those who are. The questions, concerns, and issues I see raised generally deal with how journals can remain profitable for the value they add and how researchers can receive due credit for their efforts.
But let’s set aside the questions of whether scientists and researchers should be required to share their data and articles, or even whether it’s a good idea for them to do so. I think an even larger issue is whether our current crop of scientists and researchers has the data management skills necessary to make research data usable to anyone but themselves and their immediate research group. The data management practices of researchers are not exactly stellar. Infrequent or nonexistent backups, inadequate metadata on variables and research background, and loose standards all contribute to a set of data that is basically useless to anyone who was not involved with the project from the beginning.
Do you think that the data generators know how to manage their data properly? What can be done to improve the situation? How can librarians help?