Sustainability in context: collectively-produced web content

8 Sep

  Draft 0.2

The Blue Ribbon Task Force looked at four different sustainability contexts: Scholarly discourse, Research Data, Commercially-owned cultural content, and Collectively-produced web content. We thought it worth revisiting those contexts in the light of our developing reference model. The next of these is Collectively-produced web content.

This term addresses much of the area sometimes referred to as “Web 2.0” or social networking: content that is produced by members of the public acting for their own benefit or for the public good. It would include major resources such as Wikipedia, Facebook and its ilk, but also more amorphous concepts such as the “blogosphere”, Twitter and other instant-messaging sites, and even perhaps community gaming.

The analysis will address 7 important questions:

  • Who benefits from use?
  • Who selects what is kept?
  • Who owns the resource?
  • Who preserves (or manages) the resource?
  • Who pays?
  • What are key attributes specific to this context?
  • What are the key risks?

Note: this is an extremely broad area, and this analysis must perforce be somewhat superficial. There is a great opportunity for further study here.

We welcome comments on this approach. Is this a sufficient set of questions to consider sustainability in particular (general) contexts? Are there major issues that have been forgotten in this scenario (bearing in mind that it is always a generalisation)?

Who benefits from collectively-produced web content?

Wikipedia is perhaps the most obvious example in this category of collectively-produced web content. It may however be rather misleading as an example, as there are significant differences between it and almost all other examples. It doesn’t just stand at one end of a continuum; it is a far outrider. There can be few reading this who have not at one time or another (or even very often) made use of Wikipedia.

Many other examples are much more complex. It’s probably true that these resources are produced primarily for the benefit of the creator(s). However, that benefit often accrues from features provided by an underlying platform (eg Facebook, WordPress, Twitter), in combination with the social features of the systems. That is, these resources get value not from themselves, but from network effects resulting from linkages and interactions with other elements of their environment (and other environments).

A result of this is that short-term benefit is much clearer than long-term benefit. There are also well-known examples of long-term dis-benefit resulting from earlier contributions. Youthful or hasty indiscretions may reflect badly in certain situations. Issues like this have influenced questions of privacy and also control over deletion of contributions; these are treated differently in different platforms.

Strong cases have been made for the long-term value of some parts of these resources, for example blogs covering major social events like general elections, or upheavals such as the “Arab Spring”. However, for much of the resource there is no clear-cut use case for long-term preservation, beyond vague assertions like “will be of interest to future scholars”.

One problematic characteristic is that those who benefit from long-term access may be completely distinct from those who benefit from short-term access.

From the point of view of the platform provider, the content is a vehicle to attract usage, most often with the intent of generating advertising revenue, premium subscription revenue, or both.

Who selects what is kept?

This question operates at two levels: the platform provider and the content provider.

Decisions by the platform provider are likely to be made in terms of financial or other business benefits that do not relate directly to the benefits to the contributors and readers. There are many examples of community content platforms that have been closed down by their providers for business reasons. One of the most notorious is the closure of GeoCities, but there are many examples of platforms either closed or at risk. There appear to be no commonly accepted protocols for closing down such a resource, and providers generally do not provide mechanisms for wholesale transfer of all or any individual’s content to another resource. Community efforts have occurred to capture some or all of the content in several cases (including GeoCities), but the resulting content may not be easily accessible (it is based on an Options Strategy, capturing the resource in a basic way to leave the option to do more work on it later, rather than losing it entirely).

[For sheer numbers, see the Archive Team’s Deathwatch or TechCrunch’s DeadPool, which gives a bit more background information on some at risk sites.]

The boundaries of these resources are often ill-defined; in fact, it is not clear what the resource is. The possibilities include the platform, the aggregate of contributed content, or the set of content contributed by an individual or group (who may think of their content as being entirely separate from other content on the same platform).

Time is a real factor: the resource is dynamic with additions and subtractions constantly being made. What is the authoritative version of the resource?

So the answers to “who selects?” are various:

  • individual contributors select what is created and (where this facility is available) what is changed, deleted or extracted, and what comments to keep.
  • some memory institutions are selecting content for preservation under various criteria. These include web archiving activities, and potentially specialist blog archiving activities. Another example is the Library of Congress agreement to archive the Twitter stream.
  • where a platform becomes at risk, the platform provider will make business decisions based on balancing costs against revenues and other business benefits when deciding what to keep.

Who owns the resource?

Ownership of a resource comprising community-contributed content is extremely murky, and the IP Rights will be very complex. Rights will exist to the platform and its software, to widgets and other elements used by contributors, to designs and templates used by many contributors, to the individual content items created by contributors, and to comments left against other contributors’ content (sometimes part of a long chain of comments). Few resources make explicit reference to rights (such as declaring that both the content and comments contributed to a particular blog would be covered by a Creative Commons licence). Quotation of potentially significant parts of other people’s posts is another common feature that can result in increasingly complex chains of ownership.

So overall the ownership is potentially diffused among all the contributors.

Most contributors never read the full terms and conditions under which they operate, but may be surprised to learn that they often grant non-exclusive, irrevocable rights to the platform providers. Sometimes these rights are qualified by words intended to make clear that they are for the purposes of the operation of the resource (and such rights are indeed essential), but sometimes the rights appear more extensive [example?].

Who preserves (or manages) the resource?

In the first instance the platform provider (or hosting site) manages the resource. This is potentially a reasonably long-term solution for content of continuing interest. However, platform providers are generally commercial organizations, and certainly have real costs to pay, so it is wise to plan for their interest to be temporary.

Platform providers have very weak incentives to act for the long term, as preservation is generally not part of their institutional mission. Although it would be helpful if platform providers made hand-off plans and post-termination agreements, these are not generally part of the landscape of entrepreneurial businesses. Indeed, attempts to sustain a resource beyond its financially viable life may leave no resources available for orderly transfer or hand-off to any other party, including the contributors themselves. At best, contributors and others can hope for a window during which transfer of material will be possible. Community actions for mass transfer are often explicitly prohibited in terms and conditions, but volunteer or self-organised groups may be determined to extract content anyway. However, these tend to be hastily-organised, post hoc activities, resulting in archives of significantly lower value than the original; they could be described as low-cost option strategies, leaving open a door for future work that would otherwise have closed irrevocably.

It is of course possible that contributors have (some of) their own content on their own computers, anyway. However, these fragmented atoms of content do not have the power or value of the aggregated content in context.

The value proposition is too diffuse, or not compelling enough at this point for many stakeholder groups to see preservation as their problem (although experiments have been run with blog archiving, and the Library of Congress is reported to be ingesting the Twitter stream, and is collecting legal blogs). Despite its clear importance, we know of no activity to preserve Open Source hosting sites such as SourceForge.

[I understand that Carolyn Hank of McGill has been making a presentation on findings of her study on blogs and preservation at the BlogForever project meeting; if possible I will update here with a link.]

Who pays?

The question of “who pays” tends to be quite unclear to the majority of contributors and readers of community-contributed sites. Facebook proudly claims it is “free and always will be”. Some sites (eg flickr, WordPress etc) operate on a “freemium” model: free for basic use, but a subscription will buy enhanced capability, more storage, more services etc.

The “free” model works because the marginal cost of individual contributions, or even individual blogs or pages, is extremely low, whereas value accretes from the aggregation. Large numbers of writers and readers are very attractive to advertisers, and sites such as Facebook have been ruthless in changing their rules to enable them to monetise this value. We reflect this “advertiser pays” approach in the sustainability ecosystem through the “indirect beneficiary” class of stakeholders, who benefit indirectly from use of the resource by its primary users.

Some approaches (Open Source Software and Wikipedia being examples) are driven by a strong volunteer emphasis. Much of the work is contributed gratis, either by individuals, or by employees paid by their organizations to contribute because of perceived value. There are always real costs in such activities however, such as hosting costs, that must be met each year if the activity is not to fold. The high profile example of this is Wikipedia’s annual drive to raise sufficient funds to continue operating. Some of these activities can be operating on very tight budgets, and small miscalculations can make them unable to pay their bills, and may force them to cease operations suddenly.

It is important when thinking of costs, however, to realise that costs are not just financial. Community-contributed content requires the involvement of the community to generate and sustain content. That volunteer effort is part of the “cost” of sustaining the digital assets, just as much as the money to pay hosting and other costs. Many such projects have failed because they could not attract (or retain, in the case of Myspace) sufficient interest from their intended communities.

What are key attributes specific to this context?

This is a very new form of content, and is itself highly varied; there are few useful parallels in the analogue world that can be used as a guide for preservation decision-making.

Unfortunately, there is little awareness among most stakeholders that preservation is an issue, although when it does become an issue (as in the proposed or actual closure of services), some react with anger and a strong sense of betrayal.

There is a great deal of uncertainty about most aspects of this content and its platforms. Most platforms do not succeed in attracting a critical mass for survival (hence the large numbers of casualties), and some that appear to do so may get overtaken by a more agile, more feature-rich, or perhaps just (temporarily) “cooler” competitor, as Myspace was overtaken by Facebook.

Decision-making power may be too diffused among thousands of contributors, making collective action nearly impossible; there are not enough decision-makers with a “big picture” view. Owners of content platforms (eg Yahoo as owners of Delicious, flickr etc) may become too focused on bottom-line problems to worry about preservation, or even exit strategies.

Key risks?

The Gartner Hype Cycle (see Wikipedia article) is widely referenced and sometimes derided, not least for its implication that everything eventually succeeds (clearly not the case). It does however hint at the serious risks that face new technologies:

“’Peak of Inflated Expectations’ — In the next phase, a frenzy of publicity typically generates over-enthusiasm and unrealistic expectations. There may be some successful applications of a technology, but there are typically more failures.”

Much of the financial case for platform providers to host “free” content rests on successfully capturing a sufficiently large chunk of the market to bring in enough advertising revenue. There have been many startups in different niches that have attempted this, and inevitably many have failed or are likely to fail, most often after having attracted some community-contributed content. The circumstances of (impending) failure do not make organizing an orderly exit or handoff strategy likely, as management is driven by decreasing or inadequate revenues, dwindling funding pools, and desperate attempts to obtain both more contributors and more capital, while cutting costs. This inevitably leads towards lost content. The decline of Myspace reminds us that even market leaders can be dislodged.

Given the risks they face and the community-contributed content they hold, it would be advisable for these social sites to have an explicit exit or hand-off strategy.

Chris Rusbridge and Brian Lavoie



Comments always welcome, will be treated as CC-BY
