Bias in study design classification: A “retrospective” retrospective study

Background: Authors’ accurate use of terminology about epidemiological methods is important for clearly communicating about research to readers. Users of the medical literature rely on such information to guide them in their use of evidence to support patient care. However, historical changes in the meanings of epidemiologic terms could undermine these goals. For example, the term “retrospective” has historically been used to refer to a case-control study design. More recently, however, the term has come into wide usage as a description of a study’s data source.

Objectives: Using a corpus of human immunodeficiency virus (HIV) studies, we evaluated the association between use of the term retrospective in a study’s title/abstract and the likelihood that it would be indexed as a case-control study in PubMed.

Methods: We conducted a cohort study of PubMed research articles indexed with the Medical Subject Heading (MeSH) term HIV from 1992 to 2017. We identified observational studies using a validated search filter from the National Institute for Health and Care Excellence (NICE). For our primary objective, the exposure of interest was the presence of the term retrospective in the title and/or abstract of an article, and the outcome was indexing with the MeSH term for case-control studies. We estimated risk ratios (RRs) and 95% confidence intervals for the association between the exposure and the outcome overall and by year.
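As an illustration of the estimate described in the Methods, the following is a minimal sketch of a risk ratio with a Wald-type 95% confidence interval computed on the log scale from a 2x2 table. The abstract does not state which interval method was used, so the Wald interval is an assumption, and the function name `risk_ratio` and the counts in the usage line are hypothetical rather than the study's data.

```python
import math


def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio with a Wald confidence interval on the log scale.

    a: exposed articles with the outcome    (e.g. "retrospective" and case-control MeSH term)
    b: exposed articles without the outcome
    c: unexposed articles with the outcome
    d: unexposed articles without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts for illustration only (not the study's data).
rr, lower, upper = risk_ratio(30, 70, 10, 90)
```

Because the interval is built on the log scale, it is asymmetric around the point estimate, which matches the shape of the intervals reported in the Results.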

Results: A total of 88,771 studies in our HIV corpus met the eligibility criteria during the study period. The risk ratio (RR) for the association between the term retrospective and the MeSH term for case-control studies was 29.21 (95% CI 27.8, 30.7) overall and 26.4 (95% CI 25.1, 27.8) when controlling for year. By year, the association was strongest in the first year (RR 72.4, 95% CI 46.0, 113.9), but then declined substantially in each subsequent year, stabilizing at a magnitude below 40% by 1998.
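The abstract does not say how year was controlled for. One common way to pool a risk ratio across strata such as publication years is the Mantel-Haenszel estimator, sketched below as an assumption rather than as the authors' method. The function name `mantel_haenszel_rr` and the per-year counts are hypothetical.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata (e.g. publication years).

    Each stratum is a tuple (a, n1, c, n0):
      a:  exposed articles with the outcome
      n1: all exposed articles
      c:  unexposed articles with the outcome
      n0: all unexposed articles
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


# Hypothetical per-year counts for illustration only (not the study's data).
pooled_rr = mantel_haenszel_rr([(30, 100, 10, 100), (60, 200, 10, 100)])
```

When every stratum has the same risk ratio, as in the hypothetical counts above, the pooled estimate equals that common value. When the stratum-specific ratios differ, it is a weighted average of them.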

Conclusion: Use of the term retrospective is strongly associated with indexing with the MeSH term for case-control studies. This association initially decreased but has remained stable from 2005 onward. This suggests potential bias in study classification if use of the term retrospective is now more commonly associated with data source than with study design. 

Published in College of Pharmacy, Virtual Poster Session Spring 2020


  1. This is really quite interesting. From an outside perspective, one may not really see the importance of precise terminology in something as basic as describing study design, but imprecise terms truly can lead to a great deal of confusion about what the actual design of a study is, especially since so many studies use historical data for their analyses. I am interested to see the results of the ongoing study that correlates actual study designs with their MeSH terms.

    1. Yup, it is definitely food for thought. A major theme in all of this goes back to the rigor with which studies are developed and carried out. It is surprising how many researchers and authors do not have a strong background in epidemiology. There is a lot of variability in how we “speak” about studies, and this reflects, in our opinion, an area in which the scientific community as a whole can potentially improve.

  2. Stephen – interesting work!! I wonder if you can describe your risk ratio more. What are the exposure and the outcome it measures? It was not clear to me in the poster.

    1. Of course! Our outcome was classification with the MeSH term “case-control study” and our exposure was presence of the term “retrospective” in the title and/or abstract. Risk ratios were calculated overall and by year. We went with risk ratios as opposed to odds ratios because odds ratios can overestimate the relative risk when the outcome is common.

  3. Stephen, well done! Your findings intrigue me, and like Spencer, I can’t wait to see the outcome of the retrospective, case-control study! One question I have has to do with how you conclude that the magnitude of the association is greater than you would expect, suggesting that the classification of the “retrospective” studies as “case controls” is erroneous. As you noted, the proper use of “retrospective” to be aligned with a “case control” study would have been strongest earlier on, whereas more recently it has been taken to indicate a historical data source. Given that argument, I find it intriguing that the risk ratio is highest for your earliest data points. Wouldn’t you expect the opposite if this mischaracterization of “retrospective” is creeping in more recently?

    1. I think that is a great point. While we do see a downward trend, we still do see very large risk ratios (~20-30), potentially suggesting a strong association between “retrospective” and case-control classification. I do agree with you though, and I think your observation is definitely worth mentioning in the manuscript. Our statement of “greater than you would expect” mostly comes from the fact that if retrospective really was becoming more associated with historical data sources, we would have expected much lower risk ratios in the past 5-10 years than what we see here. But I love the insight and definitely want to explore in the manuscript why the initial risk ratios are so much higher than later on.

  4. Great work Stephen, complicated issue. I guess I have always considered a case-control study a retrospective study. Old school, Diana

    1. Thank you! Yup, there is a lot of variability in the nomenclature people use.
