The Broken Scientific Publishing Model and My Attempt to Improve It

October 12, 2016

I’ll begin this post with a rant about the state of scientific publishing, then review the technology “disruption” landscape and offer a partial improvement that I developed (source).

Scientific publishing is quite important – all of science builds on previously confirmed science, so knowing what the rest of the scientific community has done or is doing is essential to research, and allows scientists to “stand on the shoulders of giants”.

The web was basically invented to improve the sharing of scientific information – it was created at CERN and allowed linking from one (research) document to others.

However, scientific publishing at the moment is one of the few industries that haven’t benefited from the web. Well, the industry has – the community hasn’t, at least not as much as one would like.

Elsevier, Thomson Reuters (which recently sold its intellectual property business), Springer and other publishers make huge profits (e.g. a 39% margin on roughly $2 billion in revenue) for doing something that should basically be free in this century – spreading the knowledge that scientists have created. You can see here some facts about their operation, the most striking being that each university has to pay more than a million dollars to get the literature it needs.

That’s because they rely on a centuries-old process: submission to journals, acceptance of the submission, then printing and distribution to university libraries. Publishers have recently put publications online, but behind paywalls, or accessible only after huge subscription fees have been paid.

I’m not a “raging socialist” but sadly, publishers don’t provide (sufficient) value. They simply gather the work of scientists that is already funded by public money, sometimes get the copyright on that, and disseminate it in a pre-Internet way.

They also do not pay for peer review of the submitted publications – they simply “organize” it, which often means “a friend of the editor is a professor and he made his postdocs write the reviews”. Peer review is thus itself broken: it is non-transparent and often of questionable quality. The funny side of the process is captured at “shitmyreviewerssay”.

Oh, and of course authors must format their publications themselves, in a journal-preferred template (and each journal has its own preferences). So the only actual work the journals do is typesetting and editorial filtering.

So we have expensive scientific literature with no added value, and a broken peer review system.

At this point you may argue that if they do not add value, they can be easily replaced. Well, no – because of the impact factor, the metric for determining the most cited journals and, by extension, the reputation of the authors who manage to get published in them. The impact factor is calculated from a big citation database (Web of Science) and assigns a number to each journal. The higher a journal’s impact factor, the better the career opportunities for a scientist who gets accepted for publication there.

You may think the impact factor is objective – it isn’t. It is based on data that only publishers (Thomson Reuters in particular) have, and when others tried to reproduce it, the result was nearly 40% off (citation needed, but I lost the link). Not only that, but it is an impact factor of the journal, not of the scientists themselves.
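For reference, the two-year impact factor is just a ratio: citations received in year Y to items the journal published in the previous two years, divided by the number of citable items it published in those years. A minimal sketch with hypothetical numbers:

```python
def impact_factor(citations_to_prev_two_years: int,
                  citable_items_prev_two_years: int) -> float:
    """Two-year journal impact factor for year Y: citations received
    in Y to items published in Y-1 and Y-2, divided by the number of
    citable items published in Y-1 and Y-2."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Hypothetical journal: 600 citations in 2016 to its 2014-2015 papers,
# of which it published 200 "citable items" in total.
print(impact_factor(600, 200))  # 3.0
```

The formula itself is trivial; the problem discussed above is that the underlying citation counts and the “citable items” classification live in a closed database, so outsiders can’t verify the numbers plugged in.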

So the fact that publishers are judge, jury and executioner means they can make huge profits without adding much value (and yes, they allow searching through the entire collection they hold, but full-text search over a corpus of text isn’t exactly rocket science these days). It means scientists don’t have access to everything they may need, that poorer universities can’t keep up, and that individual researchers are simply left out. In general, science suffers from the inefficient sharing and assessment of research.

The situation is actually even worse – due to the lack of incentive for publishers to change their process (among other things), as a prominent journal editor once said, “much of the scientific literature, perhaps half, may simply be untrue”. So being published in a somewhat impactful journal doesn’t mean your publication has been thoroughly reviewed, nor that the reviewers bear any responsibility for their oversights.

Many discussions have been held about why disruption hasn’t yet happened in this apparently broken field. It’s most likely because of a chicken-and-egg problem: scientists have an incentive to publish in journals because of the impact factor, and in that way the impact factor is reinforced as a reputation metric.

Then comes open access – a movement that requires scientific publications to be publicly accessible. There are multiple organizations/initiatives that support and promote open access, including EU’s OpenAIRE. Open access comes in two forms:

  • “green open access”, or “preprints” (yup, “print” is still an important word) – you push your work to an online repository; it is not checked by editors or reviewers, it just stays there.
  • “gold open access” – the author/library/institution pays a processing fee to publish the work, which then becomes public. Important journals that use this include PLOS, F1000 and others.

“Gold open access” solves almost nothing, as it just shifts the fees (maybe it reduces them, but again – a processing fee to get something published online, really?). “Green open access” doesn’t give you the reputation benefits – preprint repos don’t have an impact factor. Despite that, it’s still good to have the copies available, which some projects (like dissem.in, OABOT, ArchiveLab) try to ensure.

Then there’s Google Scholar, which has agreements with publishers to aggregate their content and provide search results (not the full publications). It also provides some citation metrics on top of that, and forms a researcher profile which can actually be used as a replacement for the impact factor.

Because of that, many attempts have been made to either “revolutionize” scientific publishing or augment it with additional services that might one day become prevalent and take over the process. I’ll try to summarize the various players:

  • preprint repositories – this is where scientists publish their works before submitting them to a journal. The major player is arXiv, but there are others as well (list, map)
  • scientific “social networks” – Academia.edu and ResearchGate offer a way to connect with fellow researchers and share your publications, thus giving you a public researcher profile. Scientists get analytics about how often their publications are read, and notifications about new research they might be interested in. These are similar to preprint repos, as they try to get hold of a lot of publications.
  • services which try to completely replace the process of scientific publishing – they try to be THE service where you publish, get reviewed and get a “score”. These include SJS, The Winnower and possibly science.ai. Academia.edu and ResearchGate may also fit in this category, as they offer some form of feedback (and plan, or already have, peer review) and/or some score (the RG score).
  • tools to support researchers – Mendeley (a personal collection of publications), Authorea (a tool for collaboratively editing publications), Figshare (a place for sharing auxiliary materials like figures, datasets, source code, etc.), Zenodo (data repository), Publons (a system to collect everyone’s peer reviews), labii.com and Open Science Framework (sets of tools for researchers), Altmetric (tool to track the activity around research), ScholarPedia and OpenWetWare (wikis)
  • impact calculation services – in addition to the RG score, there’s ImpactFactory
  • scientist identity – each of the social networks tries to be “the profile page” of a scientist. Additionally, there are identifiers such as ORCID, ResearcherID, and a few others by individual publishers. Perhaps fortunately, all are converging towards ORCID at the moment.
  • search engines – Google Scholar, Microsoft Academic, ScienceDirect (by Elsevier), Papers, PubPeer, CrossRef, PubMed, BASE, CLOCKSS, Iris.ai (AI for analyzing scientific texts) and of course Sci-Hub – all of which, with the exception of Sci-Hub, mostly rely on contracts with publishers
  • journals with a more modern, web-based workflow – F1000Research, Cureus, Frontiers, PLoS

Most of these services are great and created with a real desire to improve the situation. But unfortunately, many have problems. ResearchGate has been accused of too much spamming, and its RG score is questionable; Academia.edu is accused of hosting too many fake accounts for the sake of making investors happy; Publons is a place where peer review should be something you brag about, yet very few reviews are made public by the reviewers (which signifies a cultural problem). SJS and The Winnower have too few users, and the search engines depend on the publishers. Mendeley and others were acquired by publishers, so they no longer pose a threat to the existing broken model.

Special attention has to be paid to Sci-Hub – the “illegal” place where you can find the knowledge you are looking for. Alexandra Elbakyan created Sci-Hub, which automatically collects publications through library and university networks using credentials donated by researchers. That way all of the content is public and searchable by DOI (the digital identifier of an article – itself a broken concept, by the way, because in order to give your article an identifier, you need to pay for a “range”). So Sci-Hub seems like a good solution, but it doesn’t actually fix the underlying workflow. It has been sued and its original domain(s) taken, so it’s something like The Pirate Bay for science – it takes effort and idealistic devotion to stay afloat.

The lawsuits against Sci-Hub, by the way, are an interesting thing – publishers suing someone for giving access to content they themselves got for free from scientists. Sounds fair, and the publishers are totally not “evil”?

I have had discussions with many people, and read a lot of articles discussing the disruption of the publishing market (here, here, here, here, here, here, here). And even though some of the articles are from several years ago, the change isn’t yet here.

Two approaches are often discussed, and I think neither is working:

  • have a single service that is a “mega-journal” – you submit, get reviewed, get searched, get listed in news sections about your area and/or sub-journals. “One service to rule them all”, i.e. a monopoly, is not good in the long term either, even if its founders’ intentions are (initially) good
  • have tools that augment the publishing process in the hope of gaining traction and thus gradually getting scientists to change their behaviour – I think the “augmenting” services start from the premise that the current system cannot be easily disrupted, so they should at least provide some improvement on it, and ease of use for scientists.

On the plus side, it seems that some areas of research almost exclusively rely on preprints (green open access) now, so publishers have a diminishing influence. And occasionally someone boycotts them. But that process is very slow. That’s why I wanted to do something to help make it faster and better.

So I created a WordPress plugin (source). Yes, it’s that trivial. I started with a bigger project in mind and even worked on it for a while, but it was about to end up in the first category above – a “mega-journal” – and that has been tried already, hasn’t been particularly successful, and is risky long term (in terms of centralizing power).

Of course a WordPress plugin isn’t a new idea either. But all the attempts I’ve seen either haven’t been published, or provide just extras and tools, like reference management. My plugin has three important aspects:

  • JSON-LD – it provides semantic annotations for the scientific content, making it more easily discoverable and parseable
  • peer review – it provides a simple, post-publication peer review workflow (which is an overstatement for “comments with extra parameters”)
  • it can be deployed by anyone – both as the personal website of a scientist and as library/university-provided infrastructure. Basically, you can have a WordPress installation + the plugin and get green open access + basic peer review for your institution. For free.
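To illustrate the first point, here is a minimal sketch of the kind of JSON-LD annotation a page could embed. The property names come from schema.org’s ScholarlyArticle type; the title, author, ORCID and URLs are hypothetical, and the plugin’s actual output may differ:

```python
import json

# Hypothetical ScholarlyArticle annotation, serialized for embedding in
# a page as <script type="application/ld+json">...</script>.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "headline": "An Example Paper Title",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "identifier": "https://orcid.org/0000-0000-0000-0000",
    },
    "datePublished": "2016-10-12",
    "identifier": "https://example.org/papers/example-paper",
}

print(json.dumps(article, indent=2))
```

A crawler that understands schema.org can then extract the author, date and identifier without scraping the page’s HTML – which is exactly what makes the content “more easily discoverable and parseable”.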

What is the benefit of the semantic part? I myself have argued that the semantic web won’t succeed anytime soon because of a chicken-and-egg problem – there is no incentive to “semanticize” your page, as there is no service to make use of it; and there are no services because there are no semantic pages. There’s also a lot of complexity in making something “semantic” (RDF and related standards are anything but webmaster-friendly). There are niche success cases, however. The Open Graph protocol, for example, makes a web page “shareable on Facebook”, so webmasters have an incentive to add its tags.

I will soon contact Google Scholar, Microsoft Academic and other search engines to convince them to index semantically-enabled, web-published research. The point is to create an incentive, just like in the Facebook example, to use the semantic options. I’ll also get in contact with ResearchGate/Academia/arXiv/etc. to suggest the inclusion of semantic annotations and/or JSON-LD.

The general idea is to have green open access with online post-publication peer review, which in turn lets services build profile pages and calculate (partial) impact scores without relying on the publishers. It has to be easy, and it has to include libraries as the main contributors – they have the “power” to change the status quo. And supporting a WordPress installation is quite easy – a library, for example, can set one up for all the researchers in its institution and let them publish there.

A few specifics of the plugin:

  • the name “scienation” comes from “science” and either “nation” or the “-ation” suffix.
  • it uses URLs as article identifiers (which is compatible with DOIs, which can also be turned into URLs). There is an alternative identifier – the hash of the article’s text-only content – so the identifier is permanent and doesn’t rely on anyone holding a given domain.
  • it uses ORCID as an identity provider (well, not fully, as the OAuth flow is not yet implemented – it requires a special registration which won’t be feasible). One has to enter their ORCID in a field and the system assumes it’s really them. This may be tricky, and there may be attempts to publish a bad peer review on someone else’s behalf.
  • the hierarchy of science branches is obtained from Wikipedia, combined with other small sources.
  • the JSON-LD properties in use are debatable (sample output). I’ve started a discussion on having additional, more appropriate properties in schema.org’s ScholarlyArticle. I’m aware of ScholarlyHTML (here, here and here – a bit confusing which is “correct”), the codemeta definitions and the scholarly article ontology. They are very good, but their purpose is different – to represent the internal details of a scientific work in a structured way. There is probably no need for that if the purpose is just to make the content searchable and to annotate it with metadata like authors, id, peer reviews and citations. Still, I reuse the standard ScholarlyArticle definition and will gladly accept anything else that suits the use case.
  • I got the scienation.com domain (nothing to be seen there currently), and one can choose to add their website to a catalog that may be used in the future for easier discovery and indexing of semantically-enabled websites.
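The content-hash identifier mentioned above could be computed roughly like this – a minimal sketch assuming SHA-256 and simple whitespace/case normalization (the plugin’s exact normalization rules aren’t specified here):

```python
import hashlib
import re

def content_hash(text: str) -> str:
    """Derive a permanent, domain-independent identifier from the
    article's text-only content: normalize whitespace and case so
    trivial reflowing doesn't change the hash, then take SHA-256."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same content hashes identically, regardless of URL or layout.
a = content_hash("Standing on the  shoulders\nof giants.")
b = content_hash("standing on the shoulders of giants.")
print(a == b)  # True
```

Unlike a URL (which breaks when a domain lapses) or a DOI (which costs money to mint), such a hash can be recomputed by anyone from the text itself, though any change to the text – even a fixed typo – produces a new identifier.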

The plugin is open source, licensed under GPL (as is required by WordPress), and contributions, discussions and suggestions are more than welcome.

I’m well aware that a simple WordPress plugin won’t fix the debacle I’ve described in the first part of this article. But I think the right approach is to follow the principle of decentralization and to rely on libraries and individual researchers, rather than on (centralized) companies. The latter has so far proved inefficient and actually slows science down.


6 Responses to “The Broken Scientific Publishing Model and My Attempt to Improve It”

  1. Great job researching the problem so thoroughly and respect for doing something about it!

    While I agree with most of what you said about the problem, I think you might be going in the wrong direction if you seek to provide an alternative. Let me explain.

    First, you are suggesting that scientists should upload their paper to a web server running WordPress that they or their institution maintain. Further you suggest, if I understand correctly, to replace the peer review process with a “post-publication peer review” in the form of comments to the study that is published on a web server that is under the control of the authors or their institution. How do you make sure that unfavorable reviews to the paper will get published at all in this case? What is more, how do you control the quality of the published research?

    Peer review is there to filter out low quality research, including studies the conclusions of which are not supported by the data presented, or the analysis of which lacks sufficient rigor or is outright false, etc. Another goal of peer review is to actually improve the quality of a paper that is otherwise suitable or nearly-suitable for publication, and so on. In other words, peer review is part of what makes the difference between a scientific paper, and a publication in the local newspaper. As an author and an occasional reviewer, I agree that in its present form peer review is far from being perfect — in fact, it has some serious problems of its own — but it works and does its job more often than not, and for the time being we can’t go without it. So please don’t think you can just scrap it or reduce it to a comment to a blog post.

    Another important function of journals is archiving scientific work. Each and every paper published in a journal needs to be accessible for at least as long as the journal exists, ideally longer than that — for generations to come. Only then can future research “stand on the shoulders of giants”. Can you rely on every scientific group or research institution to keep all of their publications online for that long? Probably not.

    Don’t get me wrong, I agree with you on the problems of the conventional scientific publishing model, especially regarding the tall paywalls and the evil practices of some publishers. This has to change and it will, gradually. While for the reasons pointed out above your tool might not be an alternative to the publication process, I think scientists can make good use of it to disseminate their research either as “pre-print” or after it has passed peer review. Anyway, this is some good work on the difficult task of changing the current system, well done!

  2. Thanks for your comments.

    I fully agree that we should neither scrap the peer review process, nor reduce it to “comments”. This is more of a “proof of concept” (it still doesn’t guarantee you can’t delete a negative review).

    But I think “post-publication peer review” can do a good job of guaranteeing that a publication is good. “Filtering out” belongs to the paper process. On the web you can edit dynamically – you publish, get feedback, then edit and improve.

    And the peer review results can be used to judge the quality of the article. An article can be published and still be crap – something services like Google Scholar will take into account.

    Also, although I cannot say this with certainty, it seems the current peer review process doesn’t actually guarantee any quality. There are countless examples of even computer-generated papers getting published in obscure journals, or even reputable ones. Even the Lancet editor complained about that (saying probably half of the papers are wrong). So if we are to say that peer review guarantees the quality of scientific publications, we should also change it – punish bad or negligent reviewers, and punish scientists for publishing bad articles. A public process is necessary for that.

    As for archiving – that can be an added-value service, like the search engines. Or an NGO, like arXiv. Or it can be the responsibility of libraries, which it has been anyway.

  3. This is a really neat suggestion, thanks for writing it up and developing the plugin! Did you know ScienceOpen basically offer a post-publication peer review ‘overlay’ service for pre-prints too? http://blog.scienceopen.com/2016/04/what-if-you-could-peer-review-the-arxiv/ (Disclosure: I work for them)

  4. Thanks for your very accurate summary of the sad status quo!

    In this post are the references you asked for concerning IF:

    http://blogarchive.brembs.net/comment-n397.html
    (from 2008)

    The newer info is that journal rank as indicated by IF negatively correlates with the reliability of the science (i.e., the highest-ranked journals publish the least reliable science):

    http://bjoern.brembs.net/2016/01/even-without-retractions-top-journals-publish-the-least-reliable-science/

    As you so eloquently demonstrate: a single coder can today offer something that is technologically superior to whatever multi-billion dollar industries have to offer academia. This is why there are currently over 600 solutions around for various issues we are confronted with in our antiquated infrastructure:

    https://innoscholcomm.silk.co/

    full list:

    http://bit.ly/innoscholcomm-list

    Congratulations on providing us with another piece in the puzzle of upgrading our infrastructure. 🙂

    You also correctly point towards journal rank (not the IF! If IF were gone tomorrow, we’d be using a different ranking tool the following day) being the one major obstacle why we’re not using solutions like yours on a broad scale, yet. This obstacle will only be overcome if we let journals go the way of the dinosaurs. Because of the new tools, we now can cancel subscriptions without much serious disruption in access for faculty:

    http://bjoern.brembs.net/2016/09/practical-roads-to-infrastructure-reform

    Hence, if you would like your tool to be picked up, ask your librarian to cancel all their subscriptions and replace them with the “legal sci-hub” we can now piece together from various sources. This will make journals obsolete and free billions for an infrastructure that takes care of our narratives (text, audio, video), data and code.

  5. Thanks a lot for your comment. The puzzle seems even bigger than I imagined.

    I will indeed try to see the libraries’ point of view in this step-by-step replacement of traditional journals.

  6. I’m not sure if you’ve seen it, but I commented on the subject here:

    http://nauka.offnews.bg/news/Novini_1/Traditcionniiat-model-sreshtu-otvoreniia-dostap-v-nauchnoto-publikuvan_54753.html

    Otherwise I’m a big fan of open access, because I think that publicly paid research shouldn’t be hidden behind a paywall.
