David duChemin, a Humanitarian Photographer from Vancouver, wrote a blog post [duC13] at the start of 2013 (in the “New Year Resolution” season) entitled “Planning is just guessing. But with more pie charts and stuff”. He writes:
“Planning is good. Don’t get me wrong. It serves us well when we need a starting point and a string of what ifs. I’m great at planning. Notebooks full of lists and drawings and little check-boxes, and the only thing worse than planning too much is not planning at all. It’s foolish not to do your due-diligence and think things through. Here’s the point it’s taken me 4 paragraphs to get to: you can only plan for what you’ll do, not for what life will do to you.”
OK, he doesn’t really think planning is just guessing; in the post he’s stressing the need for flexibility, but also pointing out that planning (however flawed) is better than not planning.
That blog post is part of what inspired me to write this. Another part is a piece of work that I’m doing that seems to have gone on forever. It seems like a good idea to put this up and see what comments I get that might be helpful.
Planning to manage the data for your research project is not the same thing as filling in a Checklist, or running DMP Online. The planning is about the thinking processes, not about answering the questions. The short summary of what follows is that planning your research data management is really an integral part of planning your research project.
So when planning your research data management, what must you do?
First, find out what data relevant to your planned research exists. You traditionally have to do a literature search; just make sure you do a data search as well. You need to ensure you’re aware of all relevant data resources that you and your colleagues have locally, and data resources that exist elsewhere. Some of these will be tangentially referenced in the literature you’ve reviewed. So the next step is to work out how you can get access to this data and use it if appropriate. It doesn’t have to be open; you can write to authors and data creators requesting permission (offering a citation in return). Several key journals have policies requiring data to be made available, if you need to back up your request.
The next step, clearly, is to determine what data you need to create: what experiments to run, what models, what interviews, what sources to transcribe. This is the exciting bit, the research you want to do. But it should be informed by what exists.
Now, before planning how you are actually going to manage this data, you need to understand the policies and rules under which you must operate, and (perhaps even more important) the services and support that are available to you. Hidden in the policies and rules will be requirements for your data management (data security, privacy, backup, continued availability, etc). Hidden in the services and support will be some that will be very useful to you, and will save you time and resources that would otherwise be diverted (institutional backup services, institutional data repositories, etc). As suggested above, these services and support could come from your group, your institution, your discipline, your scientific society, or your invisible college of colleagues around the world.
So now you can plan to manage your data. You may need to address many issues:
- Identification, provenance and version control: how to connect associated datasets with the experimental events and sources from which they derived, and the conditions and circumstances associated.
- Storage: how and where to store the data, so that you and your colleagues (who may be in other institutions and/or other countries with different data protection regimes) can work on it conveniently but securely. Issues like data size, rate of data creation, rate of data update may all be relevant here. Data backup! Encryption for sensitive data taken off-site. Access control. Annotation. Documentation.
- Processing: how will you analyse and process your data, and how will you store the results? Back to provenance and version control!
- Sharing: How to make data available to others, and under what conditions. Where will you deposit it? With what associated information to make it usable? Depends on the data of course, and issues such as data sensitivity. May also depend on data size etc. Which data to share? Which data to report?
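To make the identification and provenance point a little more concrete, here is a minimal sketch in Python of one way to record where a data file came from. It is only an illustration under my own assumptions: the function names, the manifest file and its fields (`source`, `derived_from` and so on) are hypothetical, not part of any standard or of any tool mentioned here, and your discipline or institution may well have a better-established way of doing this.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(path, source, derived_from=()):
    """Describe one data file: what it is, where it came from, when recorded.

    The field names are illustrative assumptions, not a standard schema.
    """
    path = Path(path)
    return {
        "file": path.name,
        "sha256": sha256_of(path),          # identifies this exact version
        "size_bytes": path.stat().st_size,
        "source": source,                   # e.g. experiment, instrument, interview
        "derived_from": list(derived_from), # names of the files it was computed from
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_manifest(manifest_path, record):
    """Append a record to a JSON manifest file, creating it if necessary."""
    manifest_path = Path(manifest_path)
    if manifest_path.exists():
        records = json.loads(manifest_path.read_text(encoding="utf-8"))
    else:
        records = []
    records.append(record)
    manifest_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

You might call `provenance_record` each time a dataset is created or processed and pass the result to `append_to_manifest`, keeping the manifest alongside the data. The checksum ties each entry to one exact version of a file, which is the link back to version control that the Processing item relies on.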
That’s not everything but it’s the core. When you’ve done the basic planning at this sort of level, you can get down to writing the Plan! At this point the specific requirements of research funder and institution will come into play, and tools like DCC DMP Online will be useful. They may even remind you of key issues you had forgotten or ignored, or local services you (still) didn’t know about.
At this point you don’t know whether your research will be funded, so there is a limit to the amount of effort you should put into this. NERC wants a very much simplified one-page outline data management plan; it may be more sensible to have a 2 or 3-page plan covering the stuff above, and condense down (or up) as required by your funder.
But you’re still only at the first stage of your research data management planning! If you are lucky enough to get your project funded, there will be a project initiation phase, when you gather the resources (budget, staff, equipment, space). Effectively you’re going to build the systems and establish the protocols that will deliver your research project. At this point you should refine your plan, and add detail to some elements you were able to leave rather vague before. Now you’re moving from good intentions to practical realities. And given that life does throw unexpected events at you (staff leaving, IT systems failing, new regulations coming in), you may need to do this re-planning more than once. Keep every version of the plan! They are Records that could be useful to you in the future. In a near-worst case, they could form part of your defence against accusations of research malpractice!
My point is, this isn’t so much good research data management planning, as good planning for your research.
[duC13] duChemin, D. (2013). Planning Is Just Guessing. But With More Pie Charts and Stuff. Vancouver, BC. Retrieved from