Using metrics to assess research quality

As of 23 May 2022 this website is archived and will receive no further updates. It was produced by the Winton Programme for the Public Understanding of Risk, based in the Statistical Laboratory at the University of Cambridge. The aim was to help improve the way that uncertainty and risk are discussed in society, and to show how probability and statistics can be both useful and entertaining.

Many of the animations were produced using Flash and will no longer work.

The Higher Education Funding Council for England (HEFCE) is carrying out an independent review of the role of metrics in research assessment, and is inviting views. I have submitted a (very personal) response, using HEFCE's suggested headings, which is given below in a minimally-edited version.


You will be getting a lot of detailed reasoned arguments about this topic, so I thought I would provide a more personal perspective from someone who has done very well out of metrics.

Identifying useful metrics for research assessment:

I am a statistician, and so I love metrics. I follow my Google Scholar profile with interest. By any metric, I have been extremely successful. On Google Scholar I have 64,000 citations and an h-index of 85, and on Web of Science I have 32,000 citations and an h-index of 63. I follow Altmetrics, and have over 8,000 followers on Twitter. All this has done me very well in my career - I have more letters after my name than you can shake a stick at.
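For readers unfamiliar with the h-index: an author has index h if h of their papers have each been cited at least h times. A minimal sketch of the calculation (the function name and the citation counts in the example are purely illustrative):

```python
def h_index(citations):
    """Return the largest h such that h of the papers
    have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # the rank-th most-cited paper still has >= rank citations
        else:
            break
    return h

# Made-up citation counts for five papers:
print(h_index([10, 8, 5, 4, 3]))  # → 4
```

Note that the h-index ignores how citations are distributed beyond the threshold: a single 14,000-citation paper moves it no more than one step, which is part of why no single number can summarise a record.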

Nevertheless I am strongly against the suggestion that peer review can in any way be replaced by bibliometrics.

How should metrics be used in research assessment?

My own experience shows some of the problems. My highest-cited paper clocks in at over 14,000 citations, and yet it has roughly 150 authors and, to be honest, I have forgotten what, if anything, I contributed. How would those citations be shared out? Or take the WinBUGS user manuals: around 5,000 citations that do not even appear in WoS. Looking at my own record, I can see a correlation between the metrics and the quality and importance of the work, but it is not strong enough to replace judgement.

Clearly metrics should be collected and should be available to peers making judgements about the quality of research work. However they are only ‘indicators’, and not direct ‘measures’ of quality.

‘Gaming’ and strategic use of metrics:

I have done very well out of metrics, and although this is not because of deliberate gaming, I can see that my particular approach to research has paid off. I have tended to go for attractive and novel, even ‘sexy’ areas of statistics (believe it or not, such things do exist). I have got into a field early, not necessarily doing the best work, but reaping citation benefits later, mainly from people who have never read the original paper.

I have spent much of my career working on performance indicators in health and education, where it is finally being recognised that a past move towards apparently ‘simpler’ metrics was accompanied by massive gaming and distortions of practice. The Mid-Staffs scandal could be said to have arisen directly from an obsession with a few indicators, at the cost of reduced attention to the whole system: fortunately judgements about hospitals have now moved away from a few targets and indicators to a more holistic approach.

There has been a disastrous confusion between ‘indicators’ and ‘measures’, and it would be a retrograde step to see this being played out in research assessment.

Making comparisons:

The difficulty with making comparisons is illustrated by the Google Scholar listing for researchers under ‘Statistics’.

I am currently lying 9th in the world, although I am fully aware that some people such as David Cox or Martin Bland do not feature. It is interesting to look at the top scorers – these include people who come from areas that I would not consider ‘statistics’, eg particle physics, or write tutorial articles for doctors, or have published in boundary areas such as machine learning. No doubt all these authors are excellent (although I am unsure about the individual who seems to have other people’s publications included under their own name), but this shows the problems of delineating a ‘subject’ in an automatic way.

To summarise, I feel that metrics should definitely be collected, but only used as additional evidence in a professional judgement as to the quality of research output.


So – I just looked at Google Scholar, and you now have 71,593 citations and an h-index of 88. You had 5,684 citations in 2014. Charles Darwin’s figures are 106,840 citations, an h-index of 95 and 6,092 citations in 2014. It looks to me as though he is sitting pretty for the time being. But Albert Einstein has just 86,874 citations, only 5,255 of them in 2014 (it looks like he may be hitting a declining trend, too), and an h-index of 103. I think you can catch him. (I do worry a bit about the numbers, though - Google tells us that neither of them has a verified email.)