As a researcher you have to make a cost estimate of your project. Wouldn’t it be really nice to have a one-stop shop where you can select your research analytics tools like R (as a service) and your datasets, all in the cloud by the way, and pay at a checkout counter?
This is just an idea: I describe the potential of SURFmarket’s pilot project “Cloud distribution channel” if “Quality Data” were added next to the so-called “Apps”.
In the Netherlands there is an increasing need for an overview of the apps, tools, services and data you can use as a researcher in your research project. At the universities, “Research Service Centers” are popping up to help and support researchers in the different phases of their projects by combining the knowledge and services of the various university departments in a one-stop shop at the institutional level. The University Library of Maastricht is growing its service center together with the researchers and staff departments, trying to evolve and improve its offerings. The Technical University of Delft is also creating a specialised department for Research Support. All these people in the Netherlands who are committed to delivering the support structure that researchers need are united in a special interest group (SIG) from SURF, called the SIG Research Support.
Looking at ‘research data’ services in particular, the university service departments are, for now, concentrating on services where you can store your research data at an optimum balance between cost and availability. When you have your data sitting there, stored, well described, stamped with a quality brand, citable and ready to be reused, it would be great if you could put it on a marketplace to cover the digital curation costs, right? Well, let’s assume we live in a paradigm where this is a great idea! What, then, are the possibilities?
Let me add another ingredient to this conceptual idea. By analogy with the iTunes App Store (trademark, and so forth), the marketplace I describe could offer little corners for scientific disciplines, where small editorial boards put together collections of tools and high-quality datasets for researchers to delve into.
Let’s just focus on some “neutral” databases with factual data, or factoids. To sum up some of the databases that are out there: public geospatial data (NL: Kadaster), commercial publisher data (Web of Science), demographic statistical data (NL: CBS), international bibliographic data (VIAF), etc.
In a potential use case where I am a researcher, I would like to visit the collection for my discipline and search, filter and browse the datasets available. I would like to select a dataset and be shown its cost. Maybe it is even possible to query for a subsection of the data and so reduce the total cost. Then I get two buttons: Try and Buy. The Try button gives me a sample of 10 records to see whether the data is useful enough, or whether I can find values that connect it to other datasets. The Buy button sends me to a licence agreement page that tells me how to cite the dataset or subset, what I can and can’t do with it, whether it has an expiration date, whether it can be reused, combined or made publicly available, etc.
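To make this use case a bit more tangible, here is a minimal sketch in Python of how such a shop could behave. Everything in it is an assumption for illustration only: the `Dataset` class, the per-record pricing, the licence fields and the demo data are all hypothetical and do not describe SURFmarket or any existing data market.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict                      # one row of factual data
Query = Callable[[Record], bool]   # a filter that selects a subset


@dataclass
class Dataset:
    """Hypothetical marketplace dataset, priced per record (an assumption)."""
    name: str
    price_per_record: float
    records: list
    licence: dict = field(default_factory=dict)

    def select(self, query: Optional[Query] = None) -> list:
        # No query means the whole dataset; otherwise only the matching records.
        if query is None:
            return list(self.records)
        return [r for r in self.records if query(r)]

    def cost(self, query: Optional[Query] = None) -> float:
        # A narrower query selects fewer records, so the total cost goes down.
        return round(len(self.select(query)) * self.price_per_record, 2)

    def try_sample(self, query: Optional[Query] = None, size: int = 10) -> list:
        # The "Try" button: a free sample of at most `size` records.
        return self.select(query)[:size]

    def buy(self, query: Optional[Query] = None) -> dict:
        # The "Buy" button: an order summary with the licence terms attached.
        return {
            "dataset": self.name,
            "records": len(self.select(query)),
            "cost": self.cost(query),
            "licence": dict(self.licence),
        }


# Made-up demo data: 100 records, half of them in the "north" region.
demo = Dataset(
    name="demo-statistics",
    price_per_record=0.05,
    records=[{"id": i, "region": "north" if i % 2 == 0 else "south"}
             for i in range(100)],
    licence={"cite_as": "hypothetical citation string",
             "reuse": True, "republish": False, "expires": None},
)


def is_north(record: Record) -> bool:
    return record["region"] == "north"


full_cost = demo.cost()            # price of the whole dataset
subset_cost = demo.cost(is_north)  # price of the queried subsection
sample = demo.try_sample(is_north) # what the Try button would show
order = demo.buy(is_north)         # what the Buy button would lead to
```

In this sketch the subset query halves the number of records and therefore the price, and the order returned by `buy` carries the citation and reuse terms that the licence agreement page would show. A real market would of course have its own pricing model and licence vocabulary.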
And just as you think this might be great for research… it has already been done by the commercial industry: bringing data to apps, iTunes style. Edd Dumbill has written an article for the O’Reilly blog called “Data markets compared”, which gives a pretty good overview of the data markets available.
For the sake of the LOCKSS (Lots of Copies Keep Stuff Safe) curation principle, I will quote a large part of that blog post here.
Data markets compared
| | Azure | Datamarket | Factual | Infochimps |
|---|---|---|---|---|
| Data sources | Broad range | Range, with a focus on country and industry stats | Geo-specialized, some other datasets | Range, with a focus on geo, social and web sources |
| Free data | Yes | Yes | – | Yes |
| Free trials of paid data | Yes | – | Yes, limited free use of APIs | – |
| Delivery | OData API | API, downloads | API, downloads for heavy users | API, downloads |
| Application hosting | Windows Azure | – | – | Infochimps Platform |
| Previewing | Service Explorer | Interactive visualization | Interactive search | – |
| Tool integration | Excel, PowerPivot, Tableau and other OData consumers | – | Developer tool integrations | – |
| Data publishing | Via database connection or web service | Upload or web/database connection. | Via upload or web service. | Upload |
| Data reselling | Yes, 20% commission on non-free datasets | Yes. Fees and commissions vary. Ability to create branded data market | – | Yes. 30% commission on non-free datasets. |
| Launched | 2010 | 2010 | 2007 | 2009 |
Other data suppliers
While this article has focused on the more general purpose marketplaces, several other data suppliers are worthy of note.
Wolfram Alpha — Perhaps the most prolific integrator of diverse databases, Wolfram Alpha recently added a Pro subscription level that permits the end user to download the data resulting from a computation.
Perhaps the research community can learn from the commercial industry and engage in collaboration. What is so bad about storing your research data at a commercial storage facility, as long as it uses DOIs for citing? The bucks acquired can be used to curate your data for generations to come.
What are your thoughts?