The Misjudgements of Theresa May

6 Jun

It’s nearly 4 years since I last put anything on this blog. I haven’t used the blog for anything political before, but these are worrying times and I’m trying to get my head straight, as I’ve not yet made up my mind how to vote. But Mrs May seems to have managed a truly spectacular number of misjudgements over the past few years!

When Home Secretary

I’ve a feeling there were many more blunders and few successes, but these are the two most obvious right now. First were the cuts to police numbers and, most telling in the current context, the cuts to armed police. This was compounded by her speech to the Police Federation: “So please – for your sake and for the thousands of police officers who work so hard every day – this crying wolf has to stop.”

Second, during her time at the Home Office, she failed to get policies and practices in place to get immigration under control, to the “tens of thousands” promised by the Tories. If she had managed this, who knows, perhaps she would have reduced the Leave vote.

She was also nominally a Remain supporter, but I saw only one weak speech from her in favour of Remain.

After the Referendum

Once she became PM, one of her first acts was to appoint Boris Johnson as Foreign Secretary. This really felt like a joke, and a bad one: put the most notorious, undisciplined blabbermouth in charge of the most sensitive relationships we have with foreign governments.

And despite his woeful performance, she re-appointed Jeremy Hunt as Health Secretary, and failed to sack him as he carried on making things worse.

Then, despite being a Remain supporter, she flip-flopped to become a supporter of hard Brexit, without bringing the rest of us along with her. There was no explanation of the process by which it was decided that this was the appropriate way forward. It appears she doesn’t consult: not with us, nor with Parliament!

Along with hard Brexit came the idea of a cosy trade deal with the US. How was that likely to happen, with the President-elect wanting to change the terms of all deals in America’s favour?

In pursuit of this, off she trotted to Washington after his inauguration. I can forgive her the ghastly hand-holding experience (it’s become obvious that P45 is a notorious hand-grabber), but I can’t forgive the extraordinary sucking up to this dangerous man in inviting him, not for a normal visit, but for a State Visit, something never accorded to any previous President so early in their term. I can see that un-inviting him could be difficult, but let’s hope she manages NOT to achieve this visit; diary clashes might be her last resort!

Then we had the whole Article 50 issue. Proclaiming that in a parliamentary democracy there was no need to consult Parliament was extraordinary. She might have saved the day by recognising the problem, but she compounded it by going the whole nine yards… and losing. And she still doesn’t want Parliament to have any meaningful role. This is the most important negotiation of our lifetimes and she doesn’t want Parliament to be involved at all!

Following that, we had the Juncker visit; hard to tell what went on, but it seems like she engaged in the sort of hard-line posturing that had been happening publicly. She’s attempting to win the negotiation by making enemies of the 27 (or 28 if you count the Commission) people she’s to negotiate with!

And then she let Hammond forget the Tory Manifesto and propose raising National Insurance… prompting another U-turn.

Calling the election

Having promised many times that an election was unnecessary and would be divisive, she suddenly U-turns again and announces a snap election. The reasons she gave were transparently specious: every key vote had been won handsomely in the Commons, but she seemed to be worried that Her Majesty’s Loyal Opposition was doing its job. It wasn’t clear how she thought an election would fix this… a huge majority would empower her own Remainers, and a smaller majority would leave things as they are except she’d be weakened!

Calling an election to take place days before the Brexit negotiations were due to start was also a misjudgement, shifting her Government’s focus away from preparations and onto Party matters.

It appears she thought that Corbyn couldn’t win over the electorate, and she might get a better majority now than in 2020 after Brexit had happened and the potentially disastrous terms were clear. It looks like she also wanted to get some of Cameron’s policies off the manifesto, so she could have another go at raising tax and NI. So then we had the major misjudgement of the Social Care funding issue. And after partly back-tracking she doubled down by claiming “nothing has changed, nothing has changed”.

Then there’s the focus of the campaign exclusively on her supposed “strong, stable leadership”. It soon became abundantly clear that she offered anything but that.

And how could she think it was a good idea, being such a “strong and stable leader”, not to turn up for the leadership debates? OK, they are risky, but she’s our current and proposed PM; we should be able to see her in action. It’s another thing that’s made her look weak.

That cringeworthy performance on the One Show sofa with her poor husband was also disastrous. All that stuff about boys’ jobs and girls’ jobs, and then that awful story about the woman who got into politics because of Mrs May’s shoes. I mean, please!

Overall

I might find the misjudgements above evidence of a fallible person I could relate to, except that overall she seems to want to appear as a Robo-politician rather than a human being. Just answer the damn questions for once, please!

You might guess from that that I’m unlikely to vote Tory, and you might be right. My local Tory candidate had a 58% majority last time; his lead over the second-placed candidate was significantly greater than the number of registered voters who didn’t bother to vote, so I’m guessing he won’t care too much about what I do. But I care, and would like to make the right choice!

An inadequate thankyou for UKOLN

1 Aug

I’ve been struggling to find a way to mark the passing of UKOLN, or at least UKOLN as we knew it (I’m not sure whether the remaining rump is still called UKOLN; as of this writing, the website has not been updated with much, if any, information about the changes that occurred on 1 August). I enjoyed the tweet-sized memories yesterday under the #foreverukoln hashtag. The trouble is, any proper marking of UKOLN needs more than a tweet, more than a post, more even than a book. And any less proper marking risks leaving out people who should be thanked.

But I can’t just leave it unmarked. So you will have to accept that this covers just some of the things I’ve appreciated about UKOLN, and names just some of the many people from UKOLN who have helped and supported me. If you’re left out, please blame my memory and not any ill intent, and note that this doesn’t attempt to be comprehensive.

So here’s the first thing. I’ve found in my store of ancient documents the text of the draft brochure for the eLib Programme, written in 1995 or 1996 (some of you will remember its strange square format and over-busy blue logo). Right at the bottom it says:

“The eLib web pages are maintained by UKOLN and can be found at

http://ukoln.bath.ac.uk/elib”

Now (currently at least), if you click on that link it will still work, redirecting you to http://www.ukoln.ac.uk/services/elib/. There have been multiple changes to the UKOLN website over the years, and they have always maintained the working links. I don’t know most of the people who did this (though Andy Powell and Paul Walk both had something to do with it), but my heartfelt thanks to them. Those readers who work anywhere near Library management or Library systems teams: PLEASE demand that prior URIs continue to work when getting your websites updated!
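For anyone who ends up responsible for such a migration, even a tiny link-checking script makes that commitment easy to keep. Here is a minimal Python sketch (my own illustration using the requests library, nothing to do with how UKOLN actually managed it) that follows whatever redirects are in place from the legacy addresses:

    # Minimal sketch: verify that legacy URIs still resolve after a site
    # redesign, following any redirects. Purely illustrative.
    import requests

    LEGACY_URIS = [
        "http://ukoln.bath.ac.uk/elib",           # the 1995/96 brochure address
        "http://www.ukoln.ac.uk/services/elib/",  # where it redirects today
    ]

    for uri in LEGACY_URIS:
        try:
            response = requests.head(uri, allow_redirects=True, timeout=10)
            print(f"{uri} -> {response.url} ({response.status_code})")
        except requests.RequestException as exc:
            print(f"{uri} -> FAILED: {exc}")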

The first phase of the eLib programme had around 60 projects, many of them 3 year projects. As we moved towards the second and third phases, the numbers of projects dropped, and it was clear that the UK’s digital library movement was losing many people with hard-won experience in this new world. (In fact, we were mainly losing them to the academic Libraries, so it was not necessarily a Bad Thing.) I remember trying to persuade JISC that we needed a few organisations with greater continuity, so we wouldn’t always have new project staff trying to learn everything from the ground up. Whether they listened or not, over the years UKOLN provided much of that continuity.

Another backroom group has also been hugely important to me. Over the 15 years I was working with them, UKOLN staff organised countless workshops and conferences for eLib, for JISC and for the DCC. These staff were a little better known publicly, as they staffed the welcome desks and communicated personally with many delegates. They were always professional, courteous, charming, and beyond helpful. I don’t remember all the names; I thank them all, but remember Hazel Gott from earlier times, and Natasha Bishop and Bridget Robinson more recently.

A smaller group with much higher visibility would be the Directors of UKOLN. Lorcan Dempsey was an inspired appointment as Director, and his thoughtful analyses did much to establish UKOLN as a force to be reckoned with. I’d never met anyone who read authors like Manuel Castells for fun. I was a simple-minded, naïve engineer, and being in 4-way conversations with Lorcan, Dan Greenstein of the AHDS, and John Kelleher of the Tavistock Institute, larded with long words and concepts from Social Science and Library Science, sometimes made my brain hurt! But it was always stimulating.

When Lorcan moved on, the role was taken by Liz Lyon, whom I had first met as project coordinator of the PATRON project at the University of Surrey. A very different person, she continued the tradition of thoughtful analyses, and promoted UKOLN and later the DCC tirelessly with her hectic globetrotting presentations. She was always a great supporter of and contributor to the DCC, and I have a lot to thank her for.

One of the interesting aspects of UKOLN was the idea of a “focus” person. Brian Kelly made a huge impact as UK Web Focus until just yesterday, and though our paths didn’t cross that often, I always enjoyed a chat over a pint somewhere with Brian. Paul Miller, if I remember right, was Interoperability Focus (something to do with Z39.50?), before moving on to become yet another high-flying industry guru and consultant!

That reminds me that one of my favourite eLib projects was MODELS (MOving to Distributed Environments for Library Services, we were big on acronyms!), which was project managed by Rosemary Russell, comprising a series of around 11 workshops. The second MODELS workshop was also the second Dublin Core workshop, so you can see it was at the heart of things. Sadly at the next workshop I coined the neologism “clumps” for groups of distributed catalogues, and nobody stopped me! We chased around a Z39.50 rabbit hole for a few years, which was a shame, but probably a necessary trial. Later workshops looked at ideas like the Distributed National Electronic Resource, information architectures, integrated environments for learning and teaching, hybrid environments, rights management and terminologies. And the last workshop was in 2000! Always huge fun, the workshops were often chaired by Richard Heseltine from Hull, who had a great knack for summarising where we’d got to (and who I think was involved directly in UKOLN oversight in some way).

Rachel Heery also joined UKOLN to work on an eLib project, ROADS, looking at resource discovery. She had a huge impact on UKOLN and on many different areas of digital libraries before illness led to her retirement in 2007 and sadly her death in 2009. The UKOLN tribute to her is moving.

UKOLN did most of the groundwork on eLib PR in the early days, and John Kirriemuir was taken on as Information Officer. I particularly remember that he refused to use the first publicity mugshot I sent; he told me over the phone that when it opened on his PC someone in the office screamed, and they decided it would frighten small children! I think John was responsible for most of the still-working eLib website (set up in 1995, nota bene Jeff Rothenberg!).

Ariadne has become strongly identified with UKOLN, but was originally suggested by John MacColl, then at Abertay, Dundee and now St Andrews, and jointly proposed by John and Lorcan as a print/electronic parallel publication. John Kirriemuir worked on the electronic version in the early days, I believe, later followed by Philip Hunter and Richard Waller, both of whom also worked on IJDC (as also did Bridget Robinson).  Ariadne is a major success; I am sure there are many more who worked on making her so, and my thanks and congratulations to all of them.

Most recently I interacted with UKOLN mostly in terms of the DCC. As well as Liz and those working on IJDC, Alex Ball, Michael Day, Manjula Patel and Maureen Pennock made major contributions, and wrote many useful DCC papers.

Last but by no means least, we tend to forget to thank the office staff behind the scenes. I don’t remember most names, my sincere apologies, but you were always so helpful to me and to others, you definitely deserve my thanks.

… and to so many more UKOLN staff over the years, some of whom I should have remembered and acknowledged, and some of whom I didn’t really know: thanks to you from all of us!

Data Management Planning tools: still immature?

26 Apr

I’ve spent the last few months looking at the JISC data management planning projects. It’s been very interesting. Data management planning for research is still comparatively immature, and so are the tools that are available to support it. The research community needs more and better tools at a number of levels. Here are my thoughts… what do you think?

At group or institution level, we need better “maturity assessment” tools. This refers to tools like:

  • DCC CARDIO for assessing institutional readiness,
  • the DCC Digital Asset Framework for understanding the landscape of data resources,
  • repository risk assessment and quality assessment tools like DRAMBORA, Data Seal of Approval, etc
  • security assessment tools including audits based on ISO 27000.

Some of the existing tools seem rather ad hoc, as if they had emerged and developed from somewhat casual beginnings (perhaps not well put; maybe from beginnings unrelated to the scale of the tasks now facing researchers and institutions). It is perhaps now time for a tool assessment process involving some of the stakeholders, to help map the landscape of potential tools and use this to plot the development (or replacement) of existing tools.

For example, CARDIO and DAF, I’m told, are really tools aimed at people acting in the role of consultants, helping to support a group or institutional assessment process. It might be helpful if they could be adjusted to be more self-assessment-oriented. The DAF resource also really needs to be brought up to date and made internally consistent in its terminology.

Perhaps the greatest lack here is a group-oriented research data risk-assessment tool. This could be as simple as a guide-book and a set of spreadsheets. But going through a risk assessment process is a great way to start focusing on the real problems, the issues that could really hurt your data and potentially kill your research, or those that could really help your research and your group’s reputation.
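To make that concrete, here is a minimal sketch (entirely illustrative: the risks and scores below are invented, not a recommended set) of the sort of register I mean, scoring each risk as likelihood times impact so the scariest items float to the top:

    # A toy research-data risk register: score = likelihood x impact, both 1-5.
    # The entries below are invented examples, not a recommended set of risks.
    risks = [
        ("Sole copy of raw data lives on one laptop",                    3, 5),
        ("Key researcher leaves, taking undocumented scripts with them", 4, 4),
        ("Shared drive fills up and 'old' data gets deleted",            3, 3),
        ("Consent forms don't cover the data sharing now planned",       2, 5),
    ]

    for description, likelihood, impact in sorted(
            risks, key=lambda r: r[1] * r[2], reverse=True):
        print(f"score {likelihood * impact:2d}: {description}")

Even something this crude forces the conversation about which risks could really hurt your data, and that conversation is the point.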

We also need better DMP-writing tools, i.e. better versions of DMPonline or DMP Tool. The DCC recognises that DMPonline needs enhancement, and has written in outline about what they want to do, all of which sounds admirable. My only slight concern is that the current approach, with templates for funders, disciplines and institutions to reflect all the different nuances, requirements and advice, sounds like a combinatorial explosion (I may have misunderstood this). It is possible that the DMP Tool approach might reduce this combinatorial explosion, or at least parcel elements of it out to the institutions, making it more manageable.
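To illustrate what I mean by parcelling it out (a toy sketch only: the funder, institution and guidance text are all invented), the alternative to one template per funder-discipline-institution combination is one overlay per source, merged when a plan is created:

    # Toy sketch of layered DMP templates: maintain one overlay per funder and
    # per institution (n + m overlays) rather than one template per combination
    # (n x m templates). All names and guidance text below are invented.
    BASE = {
        "storage": "Describe where the data will be stored and backed up.",
        "sharing": "Describe how and when the data will be shared.",
    }
    FUNDER_OVERLAYS = {
        "Funder A": {"sharing": "State how data will be shared within 12 months of project end."},
    }
    INSTITUTION_OVERLAYS = {
        "University B": {"storage": "Note the institutional research store offers backed-up space."},
    }

    def build_template(funder, institution):
        template = dict(BASE)
        template.update(FUNDER_OVERLAYS.get(funder, {}))
        template.update(INSTITUTION_OVERLAYS.get(institution, {}))
        return template

    print(build_template("Funder A", "University B"))

With n funders and m institutions, that is n + m overlays to maintain rather than n × m templates.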

The other key thing about these tools is that they need better support. This means more resources for development and maintenance. That might mean more money, or it might mean building a better Open Source partnership arrangement. DMPonline does get some codebase contributions already, but the impression is that the DMP Tool partnership model has greater potential to be sustainable in the absence of external funding, which must eventually be the situation for these tools.

It is worth emphasising that this is nevertheless a pretty powerful set of tools, and potentially very valuable both to researchers planning their projects and to institutions, departments etc trying to establish the necessary infrastructure.

Is the PDF format appropriate for preserving documents over the long term?

19 Mar

Paul Wheatley drew attention to this question on Stack Exchange yesterday:

“PDF is almost a de facto standard when it comes to exchanging documents. One of the best things is that always, on each machine, the page numbers stay the same, so it can be easily cited in academic publications etc.

But de facto standard is also opening PDFs with Acrobat Reader. So the single company is making it all functioning fluently.

However, thinking in longer perspective, say 50 years, is it a good idea to store documents as PDFs? Is the PDF format documented good enough to ensure that after 50 years it will be relatively easy to write software that will read such documents, taking into account that PDF may be then completely deprecated and no longer supported?”

I tried to respond, but fell foul of Stack Exchange’s login/password rules, which mean I’ve created a password I can’t remember. And I was grumpy because our boiler isn’t working AFTER it’s just been serviced (yesterday, too), so I was (and am) cold. Anyway, I’ve tried answering on SE before and had trouble, and I thought I needed a bit more space to respond. My short answer was going to be:

“There are many many PDF readers available implemented independently of Adobe. There are so many documents around in PDF, accessed so frequently, that the software is under constant development, and there is NO realistic probability that PDF will be unreadable in 50 years, unless there is a complete catastrophe (in which case, PDF is the least of your worries). This is not to say that all PDF documents will render exactly as now.”

Let’s backtrack. Conscious preservation of artefacts of any kind is about managing risk. So to answer the question about whether a particular preservation tactic (in this case using PDF as an encoding format for information) is appropriate for a 50-year preservation timescale, you MUST think about risks.

Frankly, most of the risks for any arbitrary document (a container for an intellectual creation) have little to do with the format. Risks independent of format include:

  • whether the intellectual creation is captured at all in document form,
  • whether the document itself survives long enough and is regarded as valuable enough to enter any system that intends to preserve it,
  • whether such a system itself can be sustained over 50 years (the economic risks here being high),
  • not to mention whether in 50 years we will still have anything like current computer and internet systems, or electricity, or even any kind of civilisation!

So, if we are thinking about the risks to a document based on its format, we are only thinking about a small part of the total risk picture. What might format-based risks be?

  • whether the format is closed and proprietary
  • whether the format is “standardised”
  • whether the format is aggressively protected by IP laws, eg copyright, trademark, patents etc
  • whether the format requires, or allows DRM
  • whether the format requires (or allows) inclusion of other formats
  • the complexity of the format
  • whether the development of the format generally allows backwards compatibility
  • whether the format is widely used
  • whether tools to access the format are closed and licensed
  • whether tools to access the format are linked to particular computer systems environments
  • whether various independent tools exist
  • how good independent tools are at creating, processing or rendering the format

and no doubt others. By the way, the impacts of these risks all differ; you have to think about them for each case.

So let’s see how PDF does… no, hang on. There are several families within PDF. There’s “bog-standard” PDF. There’s PDF/A up to v2. There’s PDF/A v3. There are a couple of other variants, including one for technical engineering documents. Let’s just think about “bog-standard” PDF: Adobe PDF 1.7, technically equivalent to ISO 32000-1:2008:

  • The format was proprietary but openly documented; it is now an open standard
  • it is the subject of an ISO standard, out of the control of Adobe (this might have its own risks, including the lack of openness of ISO standards, and the future development of the standard)
  • it allows, but does not require DRM
  • it allows, but does not require the inclusion of other formats
  • PDF is very complex and allows the creation of documents in many different ways, not all of which are useful for all future purposes (for example, the characters in a text can be in completely arbitrary order, placed by location on the page rather than textual sequence)
  • PDF has generally had pretty good backwards compatibility
  • the format is extremely widely used, with many billions of documents worldwide, and no sign of usage dropping (so there will be continuing operational pressure for PDF to remain accessible)
  • many PDF creating and reading tools are available from multiple independent tool creators; some tools are open source (so you are not likely to have to write such tools)
  • PDF tools exist on almost all computer systems in wide use today
  • some independent PDF tools have problems with some aspects of PDF documents, so rendering may not be completely accurate (it’s also possible that some Adobe tools will have problems with PDFs created by independent tools). Your mileage may vary.

So the net effect of all that, it seems to me, is that provided you steer clear of a few of the obvious hurdles (particularly DRM), it is reasonable to assume that PDF is perfectly fine for preserving most documents for 50 years or so.
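If you want to check a given file against a couple of the hurdles above, the open-source pypdf library makes the DRM question and the “is there any extractable text” question easy to ask. A rough sketch (my illustration only; it says nothing about rendering fidelity or the other risks):

    # Rough sketch using the open-source pypdf library: flag two of the
    # format-based risks discussed above for a single file.
    from pypdf import PdfReader

    def quick_pdf_checks(path):
        reader = PdfReader(path)
        if reader.is_encrypted:
            print(f"{path}: encrypted, so DRM/passwords are a preservation risk")
            return
        text = reader.pages[0].extract_text() or ""
        if text.strip():
            print(f"{path}: text extractable from page 1 ({len(text)} characters)")
        else:
            print(f"{path}: no extractable text on page 1, possibly scanned images only")

    quick_pdf_checks("example.pdf")  # substitute a file of your own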

What do you think?

Open postcode? That’ll be a “no” then!

14 Mar

A month or so ago I got an email from the Open Rights Group, asking me to write to a minister supporting the idea of retaining the postcode database in public hands as Royal Mail is privatised, and making it Open. The suggested text was as follows:

“Dear [Minister of State for Business and Enterprise]
“We live in an age where location services underpin a great chunk of the economy, public service delivery and reach intimate aspects of our lives through the rise of smartphones and in-car GPS. Every trip from A to B starts and ends in a postcode.
“In this context, a national database of addresses is both a critical national asset and a natural monopoly, which should not be commercially exploited by a single entity. Instead, the Postcode Address File should be made available for free reuse as part of our national public infrastructure. The postcode is now an essential part of daily life for many purposes. Open availability would create re-use and mashup opportunities with an economic value far in excess of what can be realised from a restrictive licence.
“I am writing to you as the minister responsible to ask for a public commitment to:
“1) Keep the Postcode Address File (PAF) under public ownership in the event of the Royal Mail being privatised.
“2) Release the PAF as part of a free and open National Address Dataset.”

A few days ago I got a response. I think it must be from a person, as the writer managed to misspell my name (not likely to endear him or her to me!)

“Dear Mr Rushbridge,

“Thank you for your email of 6 February to the Minister for Business and Enterprise, Michael Fallon MP, regarding the Postcode Address File (PAF).

“I trust you will understand that the Minister receives large amounts of correspondence every day and regretfully is unable to reply to each one personally.  I have been asked to reply.

“The Government’s primary objective in relation to Royal Mail is to secure a sustainable universal postal service.  The postcode was developed by Royal Mail in order to aid delivery of the post and is integral to Royal Mail’s nationwide operations.  However, we recognise that postcode data has now become an important component of many other applications, for example sat-navs.

“In light of PAF’s importance to other users, there is legislation in place to ensure that PAF must be made available to anyone who wishes to use it on terms that are reasonable.  This allows Royal Mail to charge an appropriate fee whilst also ensuring that other users have access to the data.  The requirement is set out in the Postal Services Act 2000 (as amended by the Postal Services Act 2011) and will apply regardless of who owns Royal Mail.  It is this regulatory regime, and not ownership of Royal Mail, that will ensure that PAF continues to be made available on reasonable terms.  Furthermore, Ofcom, the independent Regulator, has the power to direct Royal Mail as to what ‘reasonable’ terms are.  Ofcom are currently consulting on the issue of PAF regulation and more information can be found on their website at: http://www.ofcom.org.uk.

“On the question of a National Address Register, the UK already has one of the most comprehensive addressing data-sets in the world in the form of the National Address Gazetteer (NAG).  The NAG brings together addressing and location data from Ordnance Survey, Local Authorities and Royal Mail; the Government is committed to its continuation as the UK’s definitive addressing register.

“The Government is similarly committed to ensuring that the NAG is used to its full benefit by both public and private sector users, and keeps pricing and licensing arrangements under review with the data owners.  Alongside our commitment to the NAG, the Government is continuing to consider the feasibility of a national address register.

“I trust you will find this information helpful in explaining the position on this subject.

“Yours sincerely,

“BIS MINISTERIAL CORRESPONDENCE UNIT”

So, that’ll be a “No” then. But wait! Maybe there’s a free/open option? No such luck! From Royal Mail’s website, it looks like £4,000 for unlimited use of the entire PAF (for a year?), or £1 per 100 clicks. You can’t build an open mashup on that basis. Plus there’s a bunch of licences to work out and sign.
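To put illustrative numbers on that (the usage figure is invented): a mashup serving a million address lookups a year on the per-click terms would face £1 × 1,000,000 ÷ 100 = £10,000 a year in PAF fees, already well above the flat £4,000 licence; either way it is a recurring cost that a genuinely open, free-to-use service has no revenue to meet.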

What about the wonderful National Address Gazetteer? It’s a bit hard to find out, as there seem to be multiple suppliers, mainly private sector. Ordnance Survey offers AddressBase via their GeoPlace partnership, which appears [pdf] to cost £129,950 per year plus £0.008 per address for the first 5 million addresses! So that’s not exactly an Open alternative, either!

Now I’m all for Royal Mail being sustainable. But overall, I wonder how much better off the whole economy would be with an Open PAF than with a closed one?

Some research data management terminology

22 Feb

Terminology in this area is confusing, and is used differently in different projects. For the purposes of a report I’m writing, unless otherwise specified, we will use terminology in the following way:

  • Data management is the handling and care of data (in our case research data) throughout its lifecycle. Data management will thus potentially involve several different actors.
  • Data management plans refer to formal or informal documents describing the processes and technologies to be deployed in data management, usually for a research project.
  • Data deposit refers to placing the data in a safe location, normally distinct from the environment of first use, where it has a greater chance of persisting and can be accessed for re-use (sometimes under conditions). Often referred to as data archiving.
  • Data re-use refers to use made of existing data either by its creators, or by others. If re-use is by the data creators, the implication is that the purpose or context has changed.
  • Data sharing is the process of making data available for re-use by others, either by data deposit or on a peer-to-peer basis.
  • Data sharing plans refer to the processes and technologies to be used by the project to support data sharing.

Some JISCMRD projects made a finer distinction between data re-use and data re-purposing. I couldn’t quite get that. So I’m balancing on the edge of an upturned Occam’s Razor and choosing the simpler option!

Does this make sense? Comments welcomed!

How to plan your research data management (planning is not writing the plan!)

21 Feb

David duChemin, a Humanitarian Photographer from Vancouver, wrote a blog post (duC13) at the start of 2013 (in the “New Year Resolution” season) entitled “Planning is just guessing. But with more pie charts and stuff”. He writes:

“Planning is good. Don’t get me wrong. It serves us well when we need a starting point and a string of what ifs.  I’m great at planning. Notebooks full of lists and drawings and little check-boxes, and the only thing worse than planning too much is not planning at all. It’s foolish not to do your due-diligence and think things through. Here’s the point it’s taken me 4 paragraphs to get to: you can only plan for what you’ll do, not for what life will do to you.”

OK he doesn’t really think planning is just guessing; in the post he’s stressing the need for flexibility, but also pointing out that planning (however flawed) is better than not planning.

That blog post is part of what inspired me to write this. Another part is a piece of work that I’m doing that seems to have gone on forever. It seems like a good idea to put this up and see what comments I get that might be helpful.

Planning to manage the data for your research project is not the same thing as filling in a Checklist, or running DMP Online. The planning is about the thinking processes, not about answering the questions. The short summary of what follows is that planning your research data management is really an integral part of planning your research project.

So when planning your research data management, what must you do?

First, find out what data relevant to your planned research exists. You traditionally have to do a literature search; just make sure you do a data search as well. You need to ensure you’re aware of all relevant data resources that you and your colleagues have locally, and data resources that exist elsewhere. Some of these will be tangentially referenced in the literature you’ve reviewed. So the next step is to work out how you can get access to this data and use it if appropriate. It doesn’t have to be open; you can write to authors and data creators requesting permission (offering a citation in return). Several key journals have policies requiring data to be made available, if you need to back up your request.

The next step, clearly, is to determine what data you need to create: what experiments to run, what models, what interviews, what sources to transcribe. This is the exciting bit, the research you want to do. But it should be informed by what exists.

Now, before planning how you are actually going to manage this data, you need to understand the policies and rules under which you must operate, and (perhaps even more important) the services and support that are available to you. Hidden in the policies and rules will be requirements for your data management (data security, privacy, backup, continued availability, etc). Hidden in the services and support will be some that will be very useful to you, and will save you time and diverted resources (institutional backup services, institutional data repositories, etc). As suggested above, these services and support could come from your group, your institution, your discipline, your scientific society, or your invisible college of colleagues around the world.

So now you can plan to manage your data. You may need to address many issues:

  • Identification, provenance and version control: how to connect associated datasets with the experimental events and sources from which they derived, and with the associated conditions and circumstances (a minimal sketch of such a record follows this list).
  • Storage: how and where to store the data, so that you and your colleagues (who may be in other institutions and/or other countries with different data protection regimes) can work on it conveniently but securely. Issues like data size, rate of data creation, rate of data update may all be relevant here. Data backup! Encryption for sensitive data taken off-site. Access control. Annotation. Documentation.
  • Processing: how will you analyse and process your data, and how will you store the results? Back to provenance and version control!
  • Sharing: How to make data available to others, and under what conditions. Where will you deposit it? With what associated information to make it usable? Depends on the data of course, and issues such as data sensitivity. May also depend on data size etc. Which data to share? Which data to report?
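As flagged in the first item above, here is a minimal sketch of a per-dataset record covering the identification and provenance points. The field names and values are invented, and most groups would keep something like this in a spreadsheet or a README rather than in code:

    # Minimal sketch of a per-dataset record for identification, provenance and
    # version control. Field names and values are invented examples.
    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class DatasetRecord:
        identifier: str                      # your own ID now; a DOI if deposited later
        version: str
        created: date
        derived_from: list = field(default_factory=list)  # source datasets, instruments, interviews
        processing: str = ""                 # script or protocol used, with its version
        location: str = ""                   # where the master copy lives
        access: str = "project team only"

    record = DatasetRecord(
        identifier="survey-wave1-cleaned",
        version="1.1",
        created=date(2013, 2, 21),
        derived_from=["survey-wave1-raw"],
        processing="clean_survey.py v0.3",
        location="institutional research store (backed up nightly)",
    )
    print(record)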

That’s not everything but it’s the core. When you’ve done the basic planning at this sort of level, you can get down to writing the Plan! At this point the specific requirements of research funder and institution will come into play, and tools like DCC DMP Online will be useful. They may even remind you of key issues you had forgotten or ignored, or local services you (still) didn’t know about.

At this point you don’t know whether your research will be funded, so there is a limit to the amount of effort you should put into this. NERC wants a very much simplified one-page outline data management plan; it may be more sensible to have a 2 or 3-page plan covering the stuff above, and condense down (or up) as required by your funder.

But you’re still only at the first stage of your research data management planning! If you are lucky enough to get your project funded, there will be a project initiation phase, when you gather the resources (budget, staff, equipment, space). Effectively you’re going to build the systems and establish the protocols that will deliver your research project. At this point you should refine your plan, and add detail to some elements you were able to leave rather vague before. Now you’re moving from good intentions to practical realities. And given that life does throw unexpected events at you (staff leaving, IT systems failing, new regulations coming in), you may need to do this re-planning more than once. Keep them all! They are Records that could be useful to you in the future. In a near-worst case, they could form part of your defence against accusations of research malpractice!

My point is, this isn’t so much good research data management planning, as good planning for your research.


duC13 duChemin, D. (2013). Planning Is Just Guessing. But With More Pie Charts and Stuff. Vancouver, BC. Retrieved from http://davidduchemin.com/2013/01/planning-and-guessing/