The Puzzling Task of Ranking Colleges
Tuesday 20 June, 2023


by Brandon Turner

There’s revolution afoot in the world of higher education rankings. The first shot was fired in November last year with the announcement by Yale Law School and Harvard Law School that they would no longer participate in the U.S. News & World Report (USNWR) rankings due to concerns with the ranking methodology. In the ensuing months, most (but not all) of the top 20 ranked law and medical schools (per USNWR) have also withdrawn.

Despite the enormous attention afforded the USNWR methodology, it is only one of a crowded field of competing international rankings from start-ups, news outlets and academics - each crafted with the promise of a better ranking algorithm that emphasises criteria more relevant to students, parents, and even the higher education institutions themselves.

While it is tempting to dismiss these rankings as the preoccupations of exuberant helicopter parents, their very real impact warrants taking them seriously. Universities regularly report large increases in applications after a jump in the rankings, and the stakes are high enough that some institutions have submitted fabricated or incorrect numbers. Wealthy alumni donate thousands or even millions to bolster their alma mater. These are all precious dollars in a limited economy of philanthropy.

While many rankings focus on entering student characteristics such as - in the United States - average SAT/ACT test results or high school Grade Point Averages, some focus on career outcomes such as salaries or employment rates. Others feature criteria like the number of grants their faculty bring in or the diversity of their staff. Despite the increasing number of approaches to ranking universities, it's difficult to state objectively which one does it "right". Designing rankings is an inherently imperfect task prone to many statistical flaws and logical missteps. Spotting these faults can be tricky, but it is important for understanding why so many rankings are flawed and how best to evaluate their results. Fortunately (or unfortunately), the same five key missteps recur again and again, and they are worth knowing.

The Ecological Fallacy

Rankings commit the ecological fallacy when they use aggregated data to make inferences about a particular individual. This often occurs with metrics like average alumni salary (or average debt, as seen in the USNWR College or Law School rankings). As an illustrative example, consider two hypothetical schools, School A and School B. Alumni from School A might have a higher average salary than alumni from School B overall, even though, within every individual profession, School B alumni earn more than School A alumni. How is this possible? As long as School A graduates disproportionately prefer high-paying professions, School A's overall average salary will appear higher - even if within every profession they earn exactly the same amount as (or even less than) graduates from School B!
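A minimal numeric sketch makes the arithmetic concrete. The professions, headcounts and salary figures below are invented purely for illustration:

```python
# Hypothetical alumni salaries: School B pays more in BOTH professions,
# but most of its graduates choose the lower-paying one.
school_a = {"finance": [150_000] * 70, "public_interest": [50_000] * 30}
school_b = {"finance": [160_000] * 20, "public_interest": [55_000] * 80}

def per_profession_average(school):
    return {p: sum(s) / len(s) for p, s in school.items()}

def overall_average(school):
    salaries = [s for profession in school.values() for s in profession]
    return sum(salaries) / len(salaries)

print(per_profession_average(school_a))  # finance: 150,000 | public interest: 50,000
print(per_profession_average(school_b))  # finance: 160,000 | public interest: 55,000 (higher in both)
print(overall_average(school_a))         # 120,000
print(overall_average(school_b))         # 76,000 (lower overall)
```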

Thus many schools' high performance on such metrics can reflect a greater propensity of their students to seek careers in lucrative industries like finance or consulting, rather than an actual earnings advantage in any specific profession. Rankings may also fail to consider that some schools disproportionately place alumni in regions where salaries differ, or disproportionately enroll students from less wealthy backgrounds who cannot pay full cost without loans and debt. In her letter withdrawing from the USNWR rankings, Yale Law School Dean Heather Gerken notes the perverse disincentive that the debt metrics create against admitting low-income students and those interested in public interest careers.

Circulus in demonstrando (aka "circular reasoning")

Circular reasoning frequently results when rankings rely on indicators that are themselves affected by a university's ranking. This often occurs even when rankings (like the USNWR Law or Medical School Rankings) use objective measures of school selectivity such as acceptance rates, total number of applicants, or the proportion of accepted students who decide to attend (the "yield"). Because students' decisions about which institutions to apply to and ultimately attend are heavily influenced by those institutions' ranks, a self-fulfilling loop is created whereby the top-ranked institutions unsurprisingly feature the lowest acceptance rates and the highest yields.

Many rankings (such as the USNWR College Rankings, the collaborative Wall Street Journal (WSJ) / Times Higher Education (THE) College Rankings, or the QS World University Rankings) also heavily weight "peer" or "expert" opinions obtained through thousands of surveys. The public perception of a university is certainly of relevance to prospective students. However, such reputations are difficult to change and tend to be perpetuated through surveys. Relying on opinion surveys thus simply echoes the very rankings they are supposed to inform.
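A toy simulation, with entirely invented application numbers, illustrates how such a feedback loop preserves whatever ordering it starts with:

```python
import random

# Toy feedback loop (all numbers invented): rank drives application volume,
# application volume drives acceptance rate, and selectivity drives next year's rank.
random.seed(0)
schools = ["A", "B", "C", "D", "E"]
rank = {s: i + 1 for i, s in enumerate(schools)}   # arbitrary starting order
SEATS = 100                                        # identical class size everywhere

for year in range(10):
    # Higher-ranked schools attract more applicants, regardless of any underlying quality.
    applicants = {s: 2000 // rank[s] + random.randint(-50, 50) for s in schools}
    acceptance_rate = {s: SEATS / applicants[s] for s in schools}
    # Next year's rank is simply the ordering by selectivity.
    ordered = sorted(schools, key=lambda s: acceptance_rate[s])
    rank = {s: i + 1 for i, s in enumerate(ordered)}

print(rank)  # the arbitrary starting order reproduces itself year after year
```

No measure of teaching or research ever enters the loop, yet the selectivity metric confidently re-ranks the schools every year.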

The Inspection Paradox

Averages can vary, sometimes surprisingly, depending on which group we "inspect" to measure them. For example, many rankings (such as the USNWR College Rankings or the THE World University Rankings) use average class size or similar metrics like student:faculty ratios.

Imagine an institution with only 10 classes (it's highly selective). One large introductory class has 91 students in it, and the other nine classes are "independent study" with one student each. Most institutions would report their "average class size" as 10 students per class (100 students spread across 10 classes). But the class size experienced by the vast majority of students is much larger than 10.

The inspection paradox is tricky because it all depends on what we inspect. It's true that the vast majority of classes at this institution are small. Using the median (instead of the average) to guard against outliers would actually suggest that a typical class has only 1 student! That's even worse.

To resolve the paradox, you have to inspect students - not classes. If you ask each student how many classmates they have and then average those answers, you arrive at something far more representative: roughly 82 classmates (the 91 students in the large lecture each have 90 classmates, while the nine independent-study students have none).
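A few lines of arithmetic, using the hypothetical 10-class institution above, make the two perspectives explicit:

```python
# One 91-student lecture plus nine 1-student independent studies.
class_sizes = [91] + [1] * 9
total_students = sum(class_sizes)                                          # 100

# The number the institution reports: the average size per class.
per_class_average = sum(class_sizes) / len(class_sizes)                    # 10.0

# What the typical student experiences: average the answers students give
# when asked about their own class.
avg_classmates = sum(n * (n - 1) for n in class_sizes) / total_students    # 81.9
per_student_class_size = sum(n * n for n in class_sizes) / total_students  # 82.9

print(per_class_average, avg_classmates, per_student_class_size)
```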

False Causality

False causality results from attributing a causal relationship between two elements simply because they often occur together. This is commonly seen when rankings measure alumni success on some "outcome" (e.g. scores on a test, career earnings, happiness, number of pets) and attribute it to their university or college. For example, the Forbes Top Colleges 2022 ranking looks not only at alumni salary but also at markers of career success (e.g. Rhodes Scholarships, MacArthur Fellowships, Pulitzers, Fields Medals, etc.). ShanghaiRanking's Academic Ranking of World Universities incorporates both alumni and faculty successes. The "value added" by a school is the question at the heart of false causality: are students from top universities successful because they went to top universities, or do top universities simply do a better job of selecting students who would be successful regardless of where they attend?

Some mixture of both explanations is likely. Talented faculty and peers can elevate one's performance, perhaps through stimulating competition or facilitating fruitful collaborations. How much relative credit a school should earn for its alumni's successes is a statistically hard question to answer (though some, like WSJ/THE, try to model it for salary). A simple takeaway is that rankings which merely correlate alumni outcomes with the institution attended likely overstate the causal effect of the school on a graduate's success.
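A toy simulation, with invented numbers, shows how selection alone can manufacture an apparent "school effect":

```python
import random

# Toy model (all figures invented): latent "talent" drives BOTH admission to the
# selective school and later salary; the school itself adds nothing in this model.
random.seed(1)
talent = [random.gauss(0, 1) for _ in range(10_000)]
attended_top_school = [t > 1.0 for t in talent]                    # admission selects on talent
salary = [60_000 + 20_000 * t + random.gauss(0, 5_000) for t in talent]

def mean(values):
    return sum(values) / len(values)

top = [s for s, went in zip(salary, attended_top_school) if went]
rest = [s for s, went in zip(salary, attended_top_school) if not went]
print(round(mean(top)), round(mean(rest)))  # top-school alumni earn far more despite a zero causal effect
```

The salary gap here is entirely a selection effect, which is exactly what a correlation-based ranking would mistake for value added.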

Confirmation Bias

Confirmation bias represents the tendency for individuals to interpret information in ways that support their prior beliefs and hypotheses. Because there is no inherent "right" way to rank institutions, ranking designers must decide for themselves which variables to measure and how much each variable should be weighted. Each unique combination of variables and weights could produce a completely different ranking order, though ultimately only one can be published. Confirmation bias encourages a designer to choose the variables and weights that produce rankings fitting her subjective expectation of which institutions should generally be near the top and which near the bottom. For example, many observers would be sceptical of a result in which no Ivy League institutions appeared in the top 20, while being more trusting if at least a handful showed up in the top 10. For reports and magazines looking to increase sales, there is thus an obvious incentive to produce rankings that do not stray far from public expectation.
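A small sketch, with made-up metric scores, shows how much leverage the weights alone provide - each weighting below crowns a different "top" school:

```python
# Invented, pre-normalised scores (0-1) on three plausible metrics.
scores = {
    #            reputation, outcomes, affordability
    "School A": (0.9,        0.6,      0.3),
    "School B": (0.6,        0.9,      0.5),
    "School C": (0.5,        0.7,      0.9),
}

def rank(weights):
    totals = {s: sum(w * x for w, x in zip(weights, vals)) for s, vals in scores.items()}
    return sorted(totals, key=totals.get, reverse=True)

print(rank((0.6, 0.3, 0.1)))  # reputation-heavy:    ['School A', 'School B', 'School C']
print(rank((0.2, 0.6, 0.2)))  # outcomes-heavy:      ['School B', 'School C', 'School A']
print(rank((0.1, 0.3, 0.6)))  # affordability-heavy: ['School C', 'School B', 'School A']
```

A designer who expects School A to come out on top can always find a defensible-looking set of weights that delivers it.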

Every ranking result is ultimately vulnerable to this bias. The latest ranking tool from the New York Times creatively shifts this responsibility to parents and students by allowing them to craft their own rankings. Where bias ends and personal taste begins is uncertain.

What is sometimes lost amid the many valid critiques of rankings is that there are in fact legitimate differences between institutions of higher education. What students want in a potential university is ironically not unlike what universities want in potential matriculants: a fast but meaningful way of parsing through a massive number of candidates to identify the best fit. Perhaps the final paradox is expecting a single ranking (or even a handful of rankings) to accurately capture that fit for thousands of unique applicants across hundreds of unique institutions. A good start would be to use logically sound criteria.

Brandon Turner (North Carolina & St Catherine's 2012) is a researcher and resident physician in the Harvard Radiation Oncology Program.