The New Information Order and the Future of the Archive
The Institute for Advanced Studies in the Humanities,
The University of Edinburgh
20-23 March 2002
ISBN 0 9532713 0 7
Free Access to Research Publications? The Potential of the Open Archives Initiative
John MacColl
SELLIC (Science & Engineering Library,
Learning & Information Centre) Project
University of Edinburgh
Darwin Library
The King’s Buildings
Mayfield Road
Edinburgh EH9 3JU
john.maccoll@ed.ac.uk
Two prominent landmarks of the old information order are currently undergoing potentially enormous change as we enter the new. The first is the scholarly journal, and the second is the academic library. Critical to both of the changes which this paper considers is the concept of the archive. The notion of the archive is of course itself under pressure of enormous change in the world of digital information and publication, with the development of new tools for publishing, storing and accessing any digital content which its authors feel deserves longevity. In the arena of research publication, ‘archive’ has been adopted by the movement which encourages researchers to store copies of their own publications in their own digital stores. This practice is known as ‘self-archiving’. Self-archiving sounds as though it may be selfish, which is ironic, since the primary reason to self-archive is to provide general access to one’s research. The recent development of a new protocol for the management of multiple, distributed digital archives allows their content to be discovered easily and comprehensively. This protocol is known as the Metadata Harvesting Protocol, and it is promulgated by a movement known as the Open Archives Initiative.
We are all familiar with the central role played by the scholarly research journal in the publication of research. Academic librarians, who have the responsibility to collect and make available these journals, have been aware for many years of a growing problem surrounding this responsibility, a problem which is pushing them into a crisis. This problem has been caused by the pricing policies of research journal publishers, who have been increasing the cost of these journals by an average of 10% per annum for many years now. With library budgets increasing by no more than the average rate of inflation (and by less in many cases), it is obvious that the ability of an academic library to sustain its collection role in respect of these journals is in serious doubt.
The problem has arisen in the period since the early 1970s, at which time the pricing policies of academic journal publishers began to change. To illustrate its impact, if we imagine a typical university in 1980, spending £500,000 per annum on a total of 1,000 journals which average £500 per subscription, the situation today, 22 years on, would be that the library budget for academic journals would have grown to £958,000 (an increase of 92%), but the cost of the same set of journals would now be just over £4 million (an increase of 714%). Assuming the library was spending the same proportion of its total budget on journals, then whereas in 1980 it could afford to purchase 1,000 titles, this year it would only be able to afford 235 of these – a reduction in purchasing power of more than three quarters in 22 years. In such a scenario we would have to ask, can an academic library whose ability to collect material for its users has declined so much still call itself an academic library?
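The arithmetic behind this illustration can be checked with a short sketch. The 10% journal price rise and the 1980 starting figures come from the text; the 3% general inflation rate is an assumption, inferred because it reproduces the stated budget of about £958,000, and the names used are purely illustrative.

```python
# Starting figures are taken from the paragraph above.
START_BUDGET = 500_000      # pounds spent on journals in 1980
START_TITLES = 1_000        # titles purchased in 1980
YEARS = 22                  # 1980 to 2002
JOURNAL_INFLATION = 0.10    # average annual journal price rise (from the text)
BUDGET_INFLATION = 0.03     # ASSUMED general inflation rate, not a sourced figure


def compound(amount: float, rate: float, years: int) -> float:
    """Return the amount after compounding at `rate` once per year."""
    return amount * (1 + rate) ** years


# Library budget grows with general inflation; journal costs grow much faster.
budget_now = compound(START_BUDGET, BUDGET_INFLATION, YEARS)
cost_now = compound(START_BUDGET, JOURNAL_INFLATION, YEARS)

# Number of the original titles the grown budget can still buy.
titles_now = int(START_TITLES * budget_now / cost_now)

print(f"Journal budget in 2002: £{budget_now:,.0f}")
print(f"Cost of the 1980 title list in 2002: £{cost_now:,.0f}")
print(f"Titles affordable in 2002: {titles_now}")
```

Under these assumptions the sketch gives a budget of roughly £958,000, a cost of just over £4 million and 235 affordable titles, matching the figures quoted.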
What are the roots of this crisis? A Canadian scholar, Jean-Claude Guédon, provided an excellent analysis in a recent article written for the Association of Research Libraries in the United States. He traces the development of the research journal from its origins in the Philosophical Transactions of the Royal Society of London in the 17th century to the present day. The factors which account for the current crisis are reasonably well known. The problem is most acute in the sciences and medicine, and can largely be accounted for by the increasing specialisation of research – particularly in the years of rapid technological progress since the Second World War. Those changes in society, in conjunction with the gradual advance in the relative importance to academic careers of research rather than teaching, have contributed considerably to the crisis in which most academics and librarians would agree we find ourselves. But Guédon’s article also points to other factors, which he has identified in the practices and policies of journal publishers over the past two or three decades, and which have created the crisis in a way which may be seen to be quite calculated and cynical.
Two attributes of the higher education journal market have permitted certain publishers to exploit it ruthlessly. One is the inelasticity of the market. It cannot grow quickly, and indeed it might be argued that even in the space of several decades it is unlikely either to grow or to shrink by very much. Universities cannot expand their quantity of researchers the way society expands the numbers of people owning digital TVs or drinking a particular brand of beer. One might imagine over a long period of time that growing prosperity in the Third World might lead to better standards of education and a higher per capita involvement in research, but an accompanying reducing birthrate will limit the growth potential of the market at the same time. Publishers are therefore selling their products into a market of an essentially fixed size, and so are not obliged to price these products aggressively in order to win customers away from an existing title produced by a rival publisher. On the other hand, in such a market, brand loyalty is bound to be of very high importance. In the case of academic journals, the ‘branded’ end of the market comprises those titles with high impact factors. The game then, for commercial publishers, is to publish ‘high impact’ journals, and then charge whatever prices they can get away with in an essentially captive market.
Of course, the market does change. Titles will disappear as particular research specialisms fade out of academic interest, and new titles will appear to fill the requirements of new specialisms, but publishers can afford to wait until they can be sure that a market exists for a new title before launching it. In this respect they have the great advantage of knowing that the customers for their product are also its producers, since the content of any new journal will be produced by the same academics who will require that their library subscribes to it. They can therefore put in place an editorial board composed of scholars who can testify to the size of the likely market and whose success in attracting articles will equate closely to numbers of sales. This is not a high-risk venture for a commercial publisher.
The market forces which apply in most other areas of business therefore hardly apply in the field of research journal publication. An interesting recent attempt to use market forces to drive down the cost of journals is that of the Scholarly Publishing and Academic Resources Coalition (SPARC), which is launching low-cost rivals to established ‘market-leading’ titles. These rival journals are published by not-for-profit publishers, typically learned societies, or even university libraries or academic departments. SPARC can point to some success in obliging commercial publishers to drop the annual rate of price increase on particular titles. The best-known example at the present time is Tetrahedron Letters, a hugely expensive chemistry journal published by Elsevier, which now has a SPARC rival title, Organic Letters. Tetrahedron Letters cost subscribers $8,600 per year when Organic Letters was launched in 1999. Organic Letters was launched with a subscription price of $2,400. Had the rival journal not appeared, the rate of price increase per annum would have meant that by now Tetrahedron Letters would be costing $12,000, whereas in fact the annual rate of increase has fallen to 2-3%, so that its current price is merely $9,000. Thus, SPARC can argue that libraries can subscribe to both Tetrahedron Letters and Organic Letters at a price which is no higher than the original journal would be costing. The launch of the rival title has therefore exerted a brake upon the pricing of Tetrahedron Letters, and in due course SPARC hopes that Organic Letters will overtake Tetrahedron Letters in profile, perhaps attracting some of the editors of its high-cost rival, and gradually overtaking it in impact, becoming the journal of choice in its field. In this way, the community transfers its allegiance from one title to the other, and libraries can stop having to buy two separate titles in the same specialist area.
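SPARC's argument here is a simple comparison, sketched below. All figures are treated as US dollar prices, and the Organic Letters price is assumed to have stayed at its 1999 launch level, which the text does not state; the constant names are illustrative only.

```python
# Prices quoted in the paragraph above, all treated as US dollars.
TETRAHEDRON_PROJECTED = 12_000  # projected price had no rival appeared
TETRAHEDRON_ACTUAL = 9_000      # approximate current price
ORGANIC_LETTERS = 2_400         # launch price; ASSUMED unchanged since 1999

# Cost of subscribing to both titles, and the margin against the projection.
combined = TETRAHEDRON_ACTUAL + ORGANIC_LETTERS
margin = TETRAHEDRON_PROJECTED - combined

print(f"Both titles together: ${combined:,}")
print(f"Projected price of Tetrahedron Letters alone: ${TETRAHEDRON_PROJECTED:,}")
print(f"Margin in the library's favour: ${margin:,}")
```

On these figures the two subscriptions together cost $11,400, which is $600 less than the projected price of the single established title.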
This may work, but publishers like Elsevier are of course very good at selling their products, and have been in the business a long time. It will not be easy to dislodge them from their position of dominance. One of the interesting effects of this war on journal pricing policies – itself worthy of its own research specialism perhaps – is to examine the techniques adopted by publishers to keep their market position intact, and in particular the enticements they offer to academics to continue to act as editors for their titles.
The other attribute which has been exploited is the notion of a ‘pecking order’ amongst journals. As Guédon shows, we have the Institute for Scientific Information (ISI) to thank for the arrival in the early 1960s, with the Science Citation Index and citation indexing, of ‘core science’ and, later, ‘impact factors’, as a means of ranking the importance of journals in given research domains. ISI developed citation indexing, by which it became possible to determine on a statistical basis which researchers were most cited by their peers. Computerised production of journal indexes made citation indexing a very simple by-product of the journal publishing process, but one which has had a huge effect on the behaviour of the market. Of course, not only can the relative importance of individual researchers be assessed by means of citation counting, so also can the relative importance of individual journal titles. So arose the ranking of journals, which in turn affected the behaviour of career-anxious academics, who sought to ensure that they published their articles in the top journals in their field, as determined by impact factors. From the point of view of those purchasing these journals – academic libraries – this phenomenon gave rise, dispiritingly, to a vicious circle in pricing terms. The highest-impact journals, being the most attractive in their field, will in an inelastic market become the highest priced, by the laws of market economics.
What Guédon shows in his article is that certain commercial journals have managed to manipulate the game of impact factors for their own benefit, in order to derive a near-monopoly advantage. Elsevier has a market dominance in the field of scientific, technological and medical journal publishing, which it owes to its ability to achieve this near-monopoly position. What Elsevier has done over the years, according to Guédon, is to target the titles which have the highest impact ranking, and buy out the publishers of these in order to ensure that the high-impact corner of the market is occupied almost fully by Elsevier titles. This move towards monopoly has been accelerated by the arrival of digital networking, allowing publishers to offer libraries not merely individual electronic journal titles, but aggregated databases of a range of titles which are in fact databases of a publisher’s entire content over a given period. These aggregated sales he calls the ‘Big Deal’, and he counsels libraries against them because of the way they can distort the research publication landscape, providing academics with the illusion of a comprehensive landscape available to them within the space of a few clicks.
In fact, in order for libraries to be able to afford a Big Deal such as that offered by Elsevier through its ScienceDirect aggregation, they will almost always have to cancel certain non-Elsevier journal titles. By aggregating at the expense of the titles of other publishers, then, a big publisher like Elsevier can ensure that articles from its titles are the most obviously present in the research landscape of scholars. This means that these same scholars, in generating new research, will tend to cite Elsevier titles more often than others. Elsevier titles then appear at the ‘top and tail’ of an increasing number of articles, Elsevier authors citing other Elsevier authors, without any awareness that they are demonstrating loyalty to a particular publisher. In this way, the publisher ensures that its titles become the most-cited, and their impact factor rises, gradually pushing their journals further up the pecking order. From the policy of the ‘Big Deal’, therefore, Guédon demonstrates how a commercial publisher can create a ‘quality pump’ for its own titles.
Without interfering in the journal market in any way which could be considered corrupt, therefore, or inimical to research integrity, it is nevertheless clear that a commercial publisher which is good at its own business (and let us not forget that Elsevier is simply behaving as any sensible commercial company would behave in pursuing its central objective of increasing profits for shareholders) can come to occupy a position of market dominance even in an area of human activity which is considered noble and pure, such as academic research.
We can oppose the distortion of this landscape in two particular ways: by asserting our control of the archive, and by using new ‘open’ tools (by which we mean those not controlled by commercial interests, as much as those for which the code is freely available) to build our own pathways through the publication landscape. These latter are linking tools which allow the client (i.e. the library on behalf of its institution), rather than the publisher, to control the targets of links in the scholarly landscape. Another new protocol has emerged in this area, the OpenURL, which will convert, assuming the compliance of data publishers, any embedded URL into one which directs the user to a source which the library chooses. Both measures are about shifting the locus of control from the publisher to the producer.
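The idea behind the OpenURL can be illustrated with a minimal sketch: the citation travels as metadata in the query string, and the address it is sent to belongs to the library rather than the publisher. The resolver address and the citation below are hypothetical, and the key names follow the style of the early OpenURL drafts as an assumption that should be checked against the specification.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; each library would run or license its own.
RESOLVER_BASE = "https://resolver.example.ac.uk/openurl"


def build_openurl(base: str, citation: dict[str, str]) -> str:
    """Attach citation metadata to a library-chosen resolver address."""
    return f"{base}?{urlencode(citation)}"


# Placeholder citation, invented for illustration only.
citation = {
    "genre": "article",
    "issn": "1234-5678",
    "volume": "12",
    "issue": "3",
    "spage": "45",
    "date": "2001",
}

link = build_openurl(RESOLVER_BASE, citation)
print(link)
```

Because only the base address changes from one institution to another, the same citation metadata can be resolved to whichever copy a given library has chosen to point its users towards.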
And perhaps it is at this point that we need to ask the question ‘why is research publication commercialised at all?’ Where is the sense in it? What do the successful commercial journal publishers do to deserve such high profits? Do they invest large sums in producing the content which they sell to academic libraries at such a high price? Clearly they do not, since academics give them that content for free. Is the quality control expensive then? Again, no, since much of the reviewing and editorial work is done for free. The situation is economically absurd, and clearly unsustainable.
What is the role of the archive in this scenario? I would like to suggest that archiving of research publications essentially has two roles for the research community in the age of digital networking (or ‘skywriting’ in the ‘post-Gutenberg age’, to use Stevan Harnad’s terminology). The first of these roles is techno-economic. The workers (who are knowledge workers in this scenario) can take back the means of production. Paul Ginsparg’s well-known archive of research papers in high-energy physics, which was based at the Los Alamos National Laboratory for many years (but moved last year to Cornell University), proved that researchers at least in one domain of knowledge were happy, as authors, to place their research output on a freely-available host repository on the internet, and were at the same time equally happy to use that same repository as readers. The ‘Los Alamos archive’, which now has the internet name arXiv.org, has revolutionised research publication in high-energy physics and cognate domains, and the model it provides seems on the face of it to be exportable. Copies of the model have appeared in other domains. At the University of Southampton there is an archive known as Cogprints, which serves the cognitive science community worldwide, using the ‘Los Alamos’ software.
The second role is in the provision of a true archive by means of digital preservation as an associated service. This goal will be realised once the content exists in the sort of quantity which will make preservation important. It has never really been in the sights of commercial publishers, and so is less contentious, but it is important in establishing credibility for the Open Archives Initiative as an archive.
However, the success of arXiv.org has not been achieved at the expense of commercial journal publishers, at least not entirely. While the research behaviour of high-energy physicists as a community has meant that many of them are happy working with preprints, some of which never become published papers in the career-advancing sense at all (and therefore the availability of a free internet archive would appear to be the perfect solution for their needs), nevertheless there are many thousands of papers in arXiv.org which are versions of articles published in the established journals in physics, astronomy and associated disciplines. The researchers in the field do not need to wait for these journals to appear in their libraries in printed form, or via their library portals in electronic form from a publisher’s server, because they have already read them in arXiv.org. So are these journal publishers going out of business? Not yet. Libraries are continuing to subscribe for a number of reasons. Having a paper record as an archive is still important for many. Then there are the value-added reasons. Appearing in a bona-fide journal means that a paper is indexed in the regular way, and so has an authentic, demonstrable and documented existence in the world of research. It can be cited, and develop a citation history. Most importantly of all, the journal title carries a branding value which matters to its author’s career.
The journal publishers have been content, for the meantime, to allow these papers to be deposited in arXiv.org where they can be consulted for free, because their revenue has not been affected. In the physics field in any case we are talking largely about learned society publishers, such as the American Physical Society, whose commercial agenda is unlikely to be quite as focussed on shareholder profits as that of a publisher like Elsevier. In a sense, then, in this area of research we have the best of both worlds. Academics can access the most current research in their field easily and comprehensively, and still advance their careers by being published in the high-impact journals.
But there are two problems to be resolved here. The first is that there is some economic absurdity built into the scenario. It makes little sense for academic libraries to be purchasing journals which no one actually reads, and whose main value to the community is their ‘citability’. The logical resolution to this absurdity, as Harnad has advanced in his work, is for the journal titles to continue to exist but primarily to provide the function of quality control, the organisation of peer review. The titles should continue, almost certainly in electronic-only form, and will continue to pick up subscriptions – perhaps from individual subscribers, or members of the learned societies who publish them in many cases, rather than from academic libraries – but will not depend upon library subscriptions for their existence. Their economic basis will change as they disinvest in the machinery of print and electronic distribution, scaling down their production to a much reduced level (and abandoning print distribution entirely), and will be funded through charges levied on academic and research institutions for the provision of peer-review services.
The second is that the model does not appear to be capable of being replicated in all domains. ArXiv.org has been in existence for more than ten years, but the idea that it would serve as a template for the development of such archives in all disciplines has proved unfounded. This is not simply due to the difference in research publishing patterns of scholars in the sciences from those in the humanities and social sciences. The differences in publishing practice between scholars in different domains are more sophisticated. It certainly seems clear that those domains without an established preprint exchange culture often find the jump into the deposit of papers, whether preprint or published, on an open internet archive, too great to make. Indeed, the sophistication may even lie in the way by which these different domains go about preprint exchange. Very selective preprint exchange, in which a given researcher sends a copy of their paper to the four or five of their fellow researchers whose work is closest to their own or which they most value, might also mean that they are made uncomfortable by the idea of thousands of peers having the opportunity to read their work before it has been reviewed.
For this reason, it would appear that the best way forward is to build up open archives initially at least on the basis of work already accepted for publication, and we should do it on an institutional basis. The institution can provide support to its own reticent self-archivers. The researchers with strong publication records can assist those without in standing up to journal publishers who might want to bully them into handing over all rights to their published work, rather than just the right to publish it in their journal, or who might at least seek the right to prevent them from placing a copy on their own institution’s open archive. The institution’s library can support them in the copyright fray by advising on normalised practices in article submission to publishers, and also by making the tools of self-archiving highly visible, with local help in their use available. Both of these objectives are served by developing institutional open archives, and the academic library is likely to be a centrally appropriate agent on the campus to manage these.
The institutional archive approach, now being advocated by the UK’s Joint Information Systems Committee, and the Soros Foundation with its recent Budapest Initiative, can – we hope – lead the academic horses to water (to use another Harnad metaphor) and make them drink, in the way disciplinary archives have so far failed to do. Open source software is now freely available to institutions to use in setting up institutional versions of arXiv.org, and the Open Archives Initiative Metadata Harvesting Protocol allows all such archives, once registered, to function as a virtual single archive, using a Google-style metadata harvesting approach to distributed data.
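How harvesting turns separate archives into a virtual single archive can be sketched as follows. The request shape (`verb=ListRecords`, `metadataPrefix=oai_dc`) and the XML namespaces follow version 2.0 of the protocol, which postdates this paper and is used here as an assumption; the repository address and the trimmed sample response are invented for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical repository address, used only to show the request shape.
BASE_URL = "https://eprints.example.ac.uk/oai"

# Namespaces as defined for version 2.0 of the protocol and for Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented, heavily trimmed response; a real one carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base: str, prefix: str = "oai_dc") -> str:
    """Build a ListRecords request for one repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": prefix})
    return f"{base}?{query}"


def parse_records(xml_text: str) -> list[dict[str, str]]:
    """Extract identifier, title and creator from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creator": rec.findtext(".//dc:creator", default="", namespaces=NS),
        })
    return records


url = list_records_url(BASE_URL)
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A harvesting service would issue the same request to every registered archive and pool the parsed records into one index, which is what lets a searcher treat many institutional archives as one.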
‘Archive’ in the context of the ‘open archive’, then, has its own particular meaning. It may on the surface appear to be an oxymoron to talk of an ‘open’ archive. An open archive is not primarily concerned with the process of preservation, but rather with the process of deposit. Open archives are like pigeon-holes on the internet, into which researchers deposit copies of their latest publications. A machinery awaits these publications, which researchers need not know about, but which involves the campus library, database software and indexing and harvesting protocols, turning their deposited papers into a freely-shared corpus of research. This corpus can be traversed meaningfully by the use of searching tools, developed also by the Open Archives Initiative, to allow the searcher to specify a subject or an author or a title, or to confine their search only to those papers already accepted for publication in professional journals, or only to those papers which have appeared in a particular journal.
To publishers, such a corpus of material might be considered a parasite upon the literature they consider themselves to own. But we must pause to reflect: who is the parasite? The growth of a big business-dominated research literature has always been absurd. An open archive basis to the research corpus opens up questions and opens up minds. We have for too long been victims of an inappropriate and non-ideal research publishing environment, in which the legitimate career development needs of researchers have been exploited by legitimately profit-focussed businesses. The post-Gutenberg world allows us to create an appropriate and ideal environment, in which journal titles and impact factors continue to exist for the sake of driving forward the highest quality research by encouraging the work of the best researchers. That they continue to exist in the hands of commercial publishers may not be ideal, but there is no feasible way to change this except by supporting movements like SPARC which allow scholarly publication to make a gradual transition, based on the behaviour of researchers themselves, to publications whose publishers’ commercial behaviour is more appropriate to the community of authors and readers they serve. There may well be scope for separate initiatives there (and advocacy is already underway), but the more important priority is to begin to create a serious cross-domain corpus of free publications.
The Open Archives Initiative does not of course produce an entirely free corpus. Rather, it is free at the point of use. Its costs – which are unlikely to include profits – are covered by what academic institutions will pay for the service of peer review. They might also be met in part through the sale of value-added products. Print versions of these journals would be an example of these. New markets might be required for these products, since university libraries are unlikely to want to continue to subscribe to much of the material which is available for free on the internet. If the publishers are university presses or libraries, or learned societies, then the profit margins can be substantially less than those sought by Elsevier and company, and those same organisations – at least in the case of societies – will have membership bases which may provide revenue on the basis of reasonable pricing, aimed at the individual subscriber.
But ultimately, this is not our problem, at least not yet. What we want to do is to free the research literature. This can be achieved by the development of an international open archive, freely accessible and searchable, and open to any researcher to deposit in. The scholarly community, so divided along disciplinary lines in its manifold interests, and so divided culturally in its many different behaviours, should unite behind this vision if nothing else. Its research is not performed for commercial gain, and since an open archive on the internet now exists, it should support the idea of that research being deposited in that archive, in order to spread the value of that research to all corners of the world, and to hasten its benefits, rather than merely to reach the minds of researchers in similarly wealthy universities. What is required is for researchers to accept that vision, and for their libraries to stand ready – however ironically – as archivists, to do the rest.
March 2002. © John MacColl. Non-exclusive right of publication granted.