The Trouble with College Rankings, The New Yorker
The Order of Things
Rankings depend on what weight we give to what variables.
Last summer, the editors of Car and Driver conducted a comparison test of three sports cars, the Lotus Evora, the Chevrolet Corvette Grand Sport, and the Porsche Cayman S. The cars were taken on an extended run through mountain passes in Southern California, and from there to a race track north of Los Angeles, for precise measurements of performance and handling. The results of the road tests were then tabulated according to a twenty-one-variable, two-hundred-and-thirty-five-point rating system, based on four categories: vehicle (driver convenience, styling, fit and finish, etc.); power train (transmission, engine, and fuel economy); chassis (steering, brakes, ride, and handling); and “fun to drive.” The magazine concluded, “The range of these three cars’ driving personalities is as various as the pajama sizes of Papa Bear, Mama Bear, and Baby Bear, but a clear winner emerged nonetheless.” This was the final tally:
Porsche Cayman 193
Chevrolet Corvette 186
Lotus Evora 182
Car and Driver is one of the most influential editorial voices in the automotive world. When it says that it likes one car better than another, consumers and carmakers take notice. Yet when you inspect the magazine’s tabulations it is hard to figure out why Car and Driver was so sure that the Cayman is better than the Corvette and the Evora. The trouble starts with the fact that the ranking methodology Car and Driver used was essentially the same one it uses for all the vehicles it tests—from S.U.V.s to economy sedans. It’s not set up for sports cars. Exterior styling, for example, counts for four per cent of the total score. Has anyone buying a sports car ever placed so little value on how it looks? Similarly, the categories of “fun to drive” and “chassis”—which cover the subjective experience of driving the car—count for only eighty-five points out of the total of two hundred and thirty-five. That may make sense for S.U.V. buyers. But, for people interested in Porsches and Corvettes and Lotuses, the subjective experience of driving is surely what matters most. In other words, in attempting to come up with a ranking that is heterogeneous—a methodology that is broad enough to cover all vehicles—Car and Driver ended up with a system that is absurdly ill-suited to some vehicles. Suppose that Car and Driver decided to tailor its grading system just to sports cars. Clearly, styling and the driving experience ought to count for much more. So let’s make exterior styling worth twenty-five per cent, the driving experience worth fifty per cent, and the remaining criteria worth twenty-five per cent. The final tally now looks like this:
Lotus Evora 205
Porsche Cayman 198
Chevrolet Corvette 192
There’s something else odd about the Car and Driver system. Price counts for only twenty points, less than ten per cent of the total. There’s no secret why: Car and Driver is edited by auto enthusiasts. To them, the choice of a car is as significant as the choice of a home or a spouse, and only a philistine would let a few dollars stand between him and the car he wants. (They leave penny-pinching to their frumpy counterparts at Consumer Reports.) But for most of us price matters, especially in a case like this, where the Corvette, as tested, costs $67,565—thirteen thousand dollars less than the Porsche, and eighteen thousand dollars less than the Lotus. Even to a car nut, that’s a lot of money. So let’s imagine that Car and Driver revised its ranking system again, giving a third of the weight to price, a third to the driving experience, and a third split equally between exterior styling and vehicle characteristics. The tally would now be:
Chevrolet Corvette 205
Lotus Evora 195
Porsche Cayman 195
So which is the best car?
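The three tallies above can be checked mechanically. This Python sketch uses only the totals reported in the text (the category scores behind them are not given), and the scheme labels are our own names for the three weightings. Each scheme crowns a different winner.

```python
# Final tallies as reported in the text under three weighting schemes.
# Only totals are available; the scheme labels are our own shorthand.
tallies = {
    "Car and Driver's own weights": {
        "Porsche Cayman": 193, "Chevrolet Corvette": 186, "Lotus Evora": 182},
    "Sports-car weights": {
        "Lotus Evora": 205, "Porsche Cayman": 198, "Chevrolet Corvette": 192},
    "Price-conscious weights": {
        "Chevrolet Corvette": 205, "Lotus Evora": 195, "Porsche Cayman": 195},
}


def winner(scores: dict[str, int]) -> str:
    """Return the highest-scoring car (ties go to the first one listed)."""
    return max(scores, key=scores.get)


for scheme, scores in tallies.items():
    print(f"{scheme}: {winner(scores)}")
# Car and Driver's own weights: Porsche Cayman
# Sports-car weights: Lotus Evora
# Price-conscious weights: Chevrolet Corvette
```

The Evora and the Cayman tie at 195 under the third scheme, but since neither is the winner there, the tie-breaking rule does not affect the result.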
Car and Driver’s ambition to grade every car in the world according to the same methodology would be fine if it limited itself to a single dimension. A heterogeneous ranking system works if it focusses just on, say, how much fun a car is to drive, or how good-looking it is, or how beautifully it handles. The magazine’s ambition to create a comprehensive ranking system—one that considered cars along twenty-one variables, each weighted according to a secret sauce cooked up by the editors—would also be fine, as long as the cars being compared were truly similar. It’s only when one car is thirteen thousand dollars more than another that balancing twenty-one variables starts to break down, because you’re faced with the impossible task of determining how much a difference of that degree ought to matter. A ranking can be heterogeneous, in other words, as long as it doesn’t attempt to be too comprehensive. And it can be comprehensive as long as it doesn’t attempt to measure things that are heterogeneous. But it’s an act of real audacity when a ranking system attempts to be comprehensive and heterogeneous—which is the first thing to keep in mind in any consideration of U.S. News & World Report’s annual “Best Colleges” guide.
The U.S. News rankings are run by Robert Morse, whose six-person team operates out of a small red-brick office building in the Georgetown neighborhood of Washington, D.C. Morse is a middle-aged man with gray hair who looks like the prototypical Beltway wonk: rumpled, self-effacing, mildly preppy and sensibly shod. His office is piled high with the statistical detritus of more than two decades of data collection. When he took on his current job, in the mid-nineteen-eighties, the college guide was little more than an item of service journalism tucked away inside U.S. News magazine. Now the weekly print magazine is defunct, but the rankings have taken on a life of their own. In the month that the 2011 rankings came out, the U.S. News Web site recorded more than ten million visitors. U.S. News has added rankings of graduate programs, law schools, business schools, medical schools, and hospitals—and Morse has become the dean of a burgeoning international rankings industry.
“In the early years, the thing that’s happening now would not have been imaginable,” Morse says. “This idea of using the rankings as a benchmark, college presidents setting a goal of ‘We’re going to rise in the U.S. News ranking,’ as proof of their management, or as proof that they’re a better school, that they’re a good president. That wasn’t on anybody’s radar. It was just for consumers.”
Over the years, Morse’s methodology has steadily evolved. In its current form, it relies on seven weighted variables:
Undergraduate academic reputation, 22.5 per cent
Graduation and freshman retention rates, twenty per cent
Faculty resources, twenty per cent
Student selectivity, fifteen per cent
Financial resources, ten per cent
Graduation rate performance, 7.5 per cent
Alumni giving, five per cent
From these variables, U.S. News generates a score for each institution on a scale of one to a hundred, where Harvard is a 100 and the University of North Carolina-Greensboro is a 22. Here is a list of the schools that finished in positions forty-one through fifty in the 2011 “National University” category:
Case Western Reserve, 60
Rensselaer Polytechnic Institute, 60
University of California-Irvine, 60
University of Washington, 60
University of Texas-Austin, 59
University of Wisconsin-Madison, 59
Penn State University-University Park, 58
University of Illinois, Urbana-Champaign, 58
University of Miami, 58
Yeshiva University, 57
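As a rough illustration of how seven weighted variables collapse into a single number, here is a minimal Python sketch of a weighted average using the weights listed above. It assumes, hypothetically, that each variable has already been rescaled to a 0-to-100 scale; the text does not describe how U.S. News actually normalizes its inputs, so the variable keys and the `composite` function are our own. The last lines confirm that the weights total 100 per cent and that selectivity counts twice as much as graduation rate performance.

```python
# Weights for the seven U.S. News variables, as listed in the text (per cent).
US_NEWS_WEIGHTS = {
    "academic_reputation": 22.5,
    "graduation_and_retention": 20.0,
    "faculty_resources": 20.0,
    "student_selectivity": 15.0,
    "financial_resources": 10.0,
    "graduation_rate_performance": 7.5,
    "alumni_giving": 5.0,
}


def composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-variable scores.

    Hypothetical: assumes every score is already on a 0-100 scale,
    higher is better. This is not U.S. News's published procedure.
    """
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_weight


total = sum(US_NEWS_WEIGHTS.values())
ratio = (US_NEWS_WEIGHTS["student_selectivity"]
         / US_NEWS_WEIGHTS["graduation_rate_performance"])
print(total, ratio)  # 100.0 2.0
```

Whatever the real normalization, the ratio of the two weights is fixed by the list itself.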
This ranking system looks a good deal like the Car and Driver methodology. It is heterogeneous. It doesn’t just compare U.C. Irvine, the University of Washington, the University of Texas-Austin, the University of Wisconsin-Madison, Penn State, and the University of Illinois, Urbana-Champaign—all public institutions of roughly the same size. It aims to compare Penn State—a very large, public, land-grant university with a low tuition and an economically diverse student body, set in a rural valley in central Pennsylvania and famous for its football team—with Yeshiva University, a small, expensive, private Jewish university whose undergraduate program is set on two campuses in Manhattan (one in midtown, for the women, and one far uptown, for the men) and is certainly not famous for its football team.
The system is also comprehensive. It doesn’t simply compare schools along one dimension—the test scores of incoming freshmen, say, or academic reputation. An algorithm takes a slate of statistics on each college and converts them into a single score: it tells us that Penn State is a better school than Yeshiva by one point. It is easy to see why the U.S. News rankings are so popular. A single score allows us to judge between entities (like Yeshiva and Penn State) that would otherwise be impossible to compare. At no point, however, do the college guides acknowledge the extreme difficulty of the task they have set themselves. A comprehensive, heterogeneous ranking system was a stretch for Car and Driver—and all it did was rank inanimate objects operated by a single person. The Penn State campus at University Park is a complex institution with dozens of schools and departments, four thousand faculty members, and forty-five thousand students. How on earth does anyone propose to assign a number to something like that?
The first difficulty with rankings is that it can be remarkably hard to measure the variable you want to rank—even in cases where that variable seems perfectly objective. Consider an extreme example: suicide. Here is a ranking of suicides per hundred thousand people, by country:
South Korea, 31.0
This list looks straightforward. Yet no self-respecting epidemiologist would look at it and conclude that Belarus has the worst suicide rate in the world, and that Hungary belongs in the top ten. Measuring suicide is just too tricky. It requires someone to make a guess about the intentions of the deceased at the time of death. In some cases, that’s easy. Maybe the victim jumped off the Golden Gate Bridge, or left a note. In most cases, however, there’s ambiguity, and different coroners and different cultures vary widely in the way they choose to interpret that ambiguity. In certain places, cause of death is determined by the police, who some believe are more likely to call an ambiguous suicide an accident. In other places, the decision is made by a physician, who may be less likely to do so. In some cultures, suicide is considered so shameful that coroners shy away from that determination, even when it’s obvious. A suicide might be called a suicide, a homicide, an accident, or left undetermined. David Phillips, a sociologist at the University of California-San Diego, has argued persuasively that a significant percentage of single-car crashes are probably suicides, and criminologists suggest that a good percentage of civilians killed by police officers are actually cases of “suicide by cop”—instances where someone deliberately provoked deadly force. The reported suicide rate, then, is almost certainly lower than the actual suicide rate. But no one knows whether the relationship between those two numbers is the same in every country. And no one knows whether the proxies that we use to estimate the real suicide rate are any good.
“Many, many people who commit suicide by poison have something else wrong with them—let’s say the person has cancer—and the death of this person might be listed as primarily associated with cancer, rather than with deliberate poisoning,” Phillips says. “Any suicides in that category would be undetectable. Or it is frequently noted that Orthodox Jews have a low recorded suicide rate, as do Catholics. Well, it could be because they have this very solid community and proscriptions against suicide, or because they are unusually embarrassed by suicide and more willing to hide it. The simple answer is nobody knows whether suicide rankings are real.”
The U.S. News rankings suffer from a serious case of the suicide problem. There’s no direct way to measure the quality of an institution—how well a college manages to educate, inspire, and challenge its students. So the U.S. News algorithm relies instead on proxies for quality—and the proxies for educational quality turn out to be flimsy at best.
Take the category of “faculty resources,” which counts for twenty per cent of an institution’s score. “Research shows that the more satisfied students are about their contact with professors,” the College Guide’s explanation of the category starts, “the more they will learn and the more likely it is they will graduate.” That’s true. According to educational researchers, arguably the most important variable in a successful college education is a vague but crucial concept called student “engagement”—that is, the extent to which students immerse themselves in the intellectual and social life of their college—and a major component of engagement is the quality of a student’s contacts with faculty. As with suicide, the disagreement isn’t about what we want to measure. So what proxies does U.S. News use to measure this elusive dimension of engagement? The explanation goes on:
We use six factors from the 2009-10 academic year to assess a school’s commitment to instruction. Class size has two components, the proportion of classes with fewer than twenty students (30 percent of the faculty resources score) and the proportion with fifty or more students (10 percent of the score). Faculty salary (35 percent) is the average faculty pay, plus benefits, during the 2008-09 and 2009-10 academic years, adjusted for regional differences in the cost of living. . . . We also weigh the proportion of professors with the highest degree in their fields (15 percent), the student-faculty ratio (5 percent), and the proportion of faculty who are full time (5 percent).
This is a puzzling list. Do professors who get paid more money really take their teaching roles more seriously? And why does it matter whether a professor has the highest degree in his or her field? Salaries and degree attainment are known to be predictors of research productivity. But studies show that being oriented toward research has very little to do with being good at teaching. Almost none of the U.S. News variables, in fact, seem to be particularly effective proxies for engagement. As the educational researchers Patrick Terenzini and Ernest Pascarella concluded after analyzing twenty-six hundred reports on the effects of college on students:
After taking into account the characteristics, abilities, and backgrounds students bring with them to college, we found that how much students grow or change has only inconsistent and, perhaps in a practical sense, trivial relationships with such traditional measures of institutional “quality” as educational expenditures per student, student/faculty ratios, faculty salaries, percentage of faculty with the highest degree in their field, faculty research productivity, size of the library, [or] admissions selectivity.
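It helps to translate the faculty-resources sub-weights into shares of the overall score. Assuming, as the quoted explanation implies, that each sub-weight is a percentage of the category's twenty per cent, a short Python calculation gives faculty salary 7 per cent of a college's total score, and salary plus highest-degree attainment (the two factors tied above to research productivity rather than teaching) 10 per cent. The factor keys are our own shorthand.

```python
# Sub-weights within "faculty resources" (per cent of that category),
# taken from the quoted U.S. News explanation. Keys are our own shorthand.
FACULTY_RESOURCES_SUBWEIGHTS = {
    "classes_under_20": 30,
    "classes_50_or_more": 10,
    "faculty_salary": 35,
    "highest_degree": 15,
    "student_faculty_ratio": 5,
    "full_time_faculty": 5,
}
CATEGORY_WEIGHT = 20  # faculty resources as a per cent of the overall score

# The six sub-weights should account for the whole category.
assert sum(FACULTY_RESOURCES_SUBWEIGHTS.values()) == 100

# Share of the *overall* score carried by each factor (assumes the
# sub-weights multiply straight through the category weight).
overall_share = {
    name: CATEGORY_WEIGHT * w / 100
    for name, w in FACULTY_RESOURCES_SUBWEIGHTS.items()
}

for name, share in sorted(overall_share.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1f}% of the total score")
```

On these figures, faculty pay alone counts for nearly as much of the final score as graduation rate performance does (7.5 per cent).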
The reputation score that serves as the most significant variable in the U.S. News methodology—accounting for 22.5 per cent of a college’s final score—isn’t any better. Every year, the magazine sends a survey to the country’s university and college presidents, provosts, and admissions deans (along with a sampling of high-school guidance counsellors) asking them to grade all the schools in their category on a scale of one to five. Those at national universities, for example, are asked to rank all two hundred and sixty-one other national universities—and Morse says that the typical respondent grades about half of the schools in his or her category. But it’s far from clear how any one individual could have insight into that many institutions. In an article published recently in the Annals of Internal Medicine, Ashwini Sehgal analyzed U.S. News’s “Best Hospitals” rankings, which also rely heavily on reputation ratings generated by professional peers. Sehgal put together a list of objective measures of performance—such as a hospital’s mortality rates for various surgical procedures, patient-safety rates, nursing-staffing levels, and key technologies. Then he checked to see how well those measures of performance matched each hospital’s reputation rating. The answer, he discovered, was that they didn’t. Having good outcomes doesn’t translate into being admired by other doctors. Why, after all, should a gastroenterologist at the Ochsner Medical Center, in New Orleans, have any particular insight into the performance of the gastroenterology department at Mass General, in Boston, or even, for that matter, have anything more than an anecdotal impression of the gastroenterology department down the road at some hospital in Baton Rouge?
Some years ago, similarly, a former chief justice of the Michigan supreme court, Thomas Brennan, sent a questionnaire to a hundred or so of his fellow-lawyers, asking them to rank a list of ten law schools in order of quality. “They included a good sample of the big names. Harvard. Yale. University of Michigan. And some lesser-known schools. John Marshall. Thomas Cooley,” Brennan wrote. “As I recall, they ranked Penn State’s law school right about in the middle of the pack. Maybe fifth among the ten schools listed. Of course, Penn State doesn’t have a law school.”
Those lawyers put Penn State in the middle of the pack, even though every fact they thought they knew about Penn State’s law school was an illusion, because in their minds Penn State is a middle-of-the-pack brand. (Penn State does have a law school today, by the way.) Sound judgments of educational quality have to be based on specific, hard-to-observe features. But reputational ratings are simply inferences from broad, readily observable features of an institution’s identity, such as its history, its prominence in the media, or the elegance of its architecture. They are prejudices.
And where do these kinds of reputational prejudices come from? According to Michael Bastedo, an educational sociologist at the University of Michigan who has published widely on the U.S. News methodology, “rankings drive reputation.” In other words, when U.S. News asks a university president to perform the impossible task of assessing the relative merits of dozens of institutions he knows nothing about, he relies on the only source of detailed information at his disposal that assesses the relative merits of dozens of institutions he knows nothing about: U.S. News. A school like Penn State, then, can do little to improve its position. To go higher than forty-seventh, it needs a better reputation score, and to get a better reputation score it needs to be higher than forty-seventh. The U.S. News ratings are a self-fulfilling prophecy.
Bastedo, incidentally, says that reputation ratings can sometimes work very well. It makes sense, for example, to ask professors within a field to rate others in their field: they read one another’s work, attend the same conferences, and hire one another’s graduate students, so they have real expertise on which to base an opinion. Reputation scores can work for one-dimensional rankings, created by people with specialized expertise. For example, the Wall Street Journal has ranked colleges according to the opinions of corporate recruiters. Those opinions are more than a proxy. To the extent that people choose one college over another to enhance their prospects in the corporate job market, the reputation rankings of corporate recruiters are of direct relevance. The No. 1 school in the Wall Street Journal’s corporate-recruiter ranking, by the way, is Penn State.
For several years, Jeffrey Stake, a professor at the Indiana University law school, has run a Web site called the Ranking Game. It contains a spreadsheet loaded with statistics on every law school in the country, and permits users to pick their own criteria, assign their own weights, and construct any ranking system they want.
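Stake's site is not reproduced here, but the basic mechanism of a user-weighted ranking can be sketched in a few lines of Python. The two schools and their values below are entirely hypothetical, and we assume each variable has already been rescaled to 0-1 with higher meaning better (so a cheaper school gets a higher "affordability" value); how the Ranking Game itself normalizes its statistics is not described in the text. Swapping reputation out for affordability is enough to reverse the order.

```python
# Hypothetical schools with made-up values, each already rescaled to 0-1
# (higher is better; for affordability, 1.0 means cheapest). Not real data.
SCHOOLS = {
    "School A": {"reputation": 0.9, "lsat_75th": 0.9,
                 "publishing": 0.8, "affordability": 0.2},
    "School B": {"reputation": 0.6, "lsat_75th": 0.8,
                 "publishing": 0.7, "affordability": 0.9},
}


def rank(schools: dict[str, dict[str, float]],
         weights: dict[str, float]) -> list[str]:
    """Order school names by weighted score, best first."""
    def score(name: str) -> float:
        return sum(w * schools[name][var] for var, w in weights.items())
    return sorted(schools, key=score, reverse=True)


# Two user-chosen weightings (names are ours, for illustration only).
prestige = {"reputation": 1, "lsat_75th": 1, "publishing": 1}
value = {"lsat_75th": 1, "publishing": 1, "affordability": 1}

print(rank(SCHOOLS, prestige))  # ['School A', 'School B']
print(rank(SCHOOLS, value))     # ['School B', 'School A']
```

Nothing about either school changes between the two calls; only the weights do.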
Stake’s intention is to demonstrate just how subjective rankings are, to show how determinations of “quality” turn on relatively arbitrary judgments about how much different variables should be weighted. For example, his site makes it easy to mimic the U.S. News rankings. All you have to do is give equal weight to “academic reputation,” “LSAT scores at the 75th percentile,” “student-faculty ratio,” and “faculty law-review publishing,” and you get a list of élite schools which looks similar to the U.S. News law-school rankings:
University of Chicago
University of Pennsylvania
New York University
University of California, Berkeley
There’s something missing from that list of variables, of course: it doesn’t include price. That is one of the most distinctive features of the U.S. News methodology. Both its college rankings and its law-school rankings reward schools for devoting lots of financial resources to educating their students, but not for being affordable. Why? Morse admitted that there was no formal reason for that position. It was just a feeling. “We’re not saying that we’re measuring educational outcomes,” he explained. “We’re not saying we’re social scientists, or we’re subjecting our rankings to some peer-review process. We’re just saying we’ve made this judgment. We’re saying we’ve interviewed a lot of experts, we’ve developed these academic indicators, and we think these measures measure quality schools.”
As answers go, that’s up there with the parental “Because I said so.” But Morse is simply being honest. If we don’t know what the right proxies for college quality are, let alone how to represent those proxies in a comprehensive, heterogeneous grading system, then our rankings are inherently arbitrary. All Morse was saying was that, on the question of price, he comes down on the Car and Driver side of things, not on the Consumer Reports side. U.S. News thinks that schools that spend a lot of money on their students are nicer than those that don’t, and that this niceness ought to be factored into the equation of desirability. Plenty of Americans agree: the campus of Vanderbilt University or Williams College is packed with students whose families are largely indifferent to the price their school charges but keenly interested in the flower beds and the spacious suites and the architecturally distinguished lecture halls those high prices make possible. Of course, given that the rising cost of college has become a significant social problem in the United States in recent years, you can make a strong case that a school ought to be rewarded for being affordable. So suppose we go back to Stake’s ranking game, and re-rank law schools based on student-faculty ratio, L.S.A.T. scores at the seventy-fifth percentile, faculty publishing, and price, all weighted equally. The list now looks like this:
University of Chicago
Brigham Young University
University of Colorado
University of Pennsylvania
Columbia University
The revised ranking tells us that there are schools—like B.Y.U. and Colorado—that provide a good legal education at a decent price, and that, by choosing not to include tuition as a variable, U.S. News has effectively penalized those schools for trying to provide value for the tuition dollar. But that’s a very subtle tweak. Let’s say that value for the dollar is something we really care about. And so what we want is a three-factor ranking, counting value for the dollar at forty per cent, L.S.A.T. scores at forty per cent of the total, and faculty publishing at twenty per cent. Look at how the top ten changes:
University of Chicago
Brigham Young University
University of Texas
University of Virginia
University of Colorado
University of Alabama
University of Pennsylvania
Welcome to the big time, Alabama!
The U.S. News rankings turn out to be full of these kinds of implicit ideological choices. One common statistic used to evaluate colleges, for example, is called “graduation rate performance,” which compares a school’s actual graduation rate with its predicted graduation rate given the socioeconomic status and the test scores of its incoming freshman class. It is a measure of the school’s efficacy: it quantifies the influence of a school’s culture and teachers and institutional support mechanisms. Tulane, given the qualifications of the students that it admits, ought to have a graduation rate of eighty-seven per cent; its actual 2009 graduation rate was seventy-three per cent. That shortfall suggests that something is amiss at Tulane.
Another common statistic for measuring college quality is “student selectivity.” This reflects variables such as how many of a college’s freshmen were in the top ten per cent of their high-school class, how high their S.A.T. scores were, and what percentage of applicants a college admits. Selectivity quantifies how accomplished students are when they first arrive on campus.
Each of these statistics matters, but for very different reasons. As a society, we probably care more about efficacy: America’s future depends on colleges that make sure the students they admit leave with an education and a degree. If you are a bright high-school senior and you’re thinking about your own future, though, you may well care more about selectivity, because that relates to the prestige of your degree.
But no institution can excel at both. The national university that ranks No. 1 in selectivity is Yale. A crucial part of what it considers its educational function is to assemble the most gifted group of freshmen it can. Because it maximizes selectivity, though, Yale will never do well on an efficacy scale. Its freshmen are so accomplished that they have a predicted graduation rate of ninety-six per cent: the highest Yale’s efficacy score could be is plus four. (It’s actually plus two.) Of the top fifty national universities in the “Best Colleges” ranking, the least selective school is Penn State. Penn State sees its educational function as serving a broad range of students. That gives it the chance to excel at efficacy—and it does so brilliantly. Penn State’s freshmen have an expected graduation rate of seventy-three per cent and an actual graduation rate of eighty-five per cent, for a score of plus twelve: no other school in the U.S. News top fifty comes close.
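The efficacy arithmetic is simple enough to spell out. In this Python sketch the predicted and actual rates are the ones quoted in the text, except Yale's actual rate of 98, which we infer from its predicted ninety-six per cent and its score of plus two. The `ceiling` helper (our own name) captures the point that a school with a predicted rate of 96 can score at most plus four, while Penn State's lower starting point leaves it far more room.

```python
# (predicted, actual) graduation rates in per cent, as quoted in the text.
# Yale's actual rate (98) is inferred from "predicted 96, actually plus two".
RATES = {
    "Yale": (96, 98),
    "Penn State": (73, 85),
    "Tulane": (87, 73),
}


def graduation_rate_performance(predicted: int, actual: int) -> int:
    """Actual minus predicted graduation rate, in percentage points."""
    return actual - predicted


def ceiling(predicted: int) -> int:
    """Best possible score: no school can graduate more than 100 per cent."""
    return 100 - predicted


for school, (predicted, actual) in RATES.items():
    perf = graduation_rate_performance(predicted, actual)
    print(f"{school}: {perf:+d} (ceiling {ceiling(predicted):+d})")
# Yale: +2 (ceiling +4)
# Penn State: +12 (ceiling +27)
# Tulane: -14 (ceiling +13)
```

The more selective a school is, the lower its ceiling on this measure, which is why the two statistics pull against each other.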
There is no right answer to how much weight a ranking system should give to these two competing values. It’s a matter of which educational model you value more—and here, once again, U.S. News makes its position clear. It gives twice as much weight to selectivity as it does to efficacy. It favors the Yale model over the Penn State model, which means that the Yales of the world will always do well in the U.S. News rankings because the U.S. News system is designed to reward Yale-ness. By contrast, to the extent that Penn State succeeds at doing a better job of being Penn State—of attracting a diverse group of students and educating them capably—it will only do worse. Rankings are not benign. They enshrine very particular ideologies, and, at a time when American higher education is facing a crisis of accessibility and affordability, we have adopted a de-facto standard of college quality that is uninterested in both of those factors. And why? Because a group of magazine analysts in an office building in Washington, D.C., decided twenty years ago to value selectivity over efficacy, to use proxies that scarcely relate to what they’re meant to be proxies for, and to pretend that they can compare a large, diverse, low-cost land-grant university in rural Pennsylvania with a small, expensive, private Jewish university on two campuses in Manhattan.
“If you look at the top twenty schools every year, forever, they are all wealthy private universities,” Graham Spanier, the president of Penn State, told me. “Do you mean that even the most prestigious public universities in the United States, and you can take your pick of what you think they are—Berkeley, U.C.L.A., University of Michigan, University of Wisconsin, Illinois, Penn State, U.N.C.—do you mean to say that not one of those is in the top tier of institutions? It doesn’t really make sense, until you drill down into the rankings, and what do you find? What I find more than anything else is a measure of wealth: institutional wealth, how big is your endowment, what percentage of alumni are donating each year, what are your faculty salaries, how much are you spending per student. Penn State may very well be the most popular university in America—we get a hundred and fifteen thousand applications a year for admission. We serve a lot of people. Nearly a third of them are the first people in their entire extended family to come to college. We have seventy-six per cent of our students receiving financial aid. There is no possibility that we could do anything here at this university to get ourselves into the top ten or twenty or thirty—except if some donor gave us billions of dollars.”
In the fall of 1913, the prominent American geographer Ellsworth Huntington sent a letter to two hundred and thirteen scholars from twenty-seven countries. “May I ask your cooperation in the preparation of a map showing the distribution of the higher elements of civilization across the world?” Huntington began, and he continued:
My purpose is to prepare a map which shall show the distribution of those characteristics which are generally recognized as of the highest value. I mean by this the power of initiative, the capacity for formulating new ideas and for carrying them into effect, the power of self-control, high standards of honesty and morality, the power to lead and to control other races, the capacity for disseminating ideas, and other similar qualities which will readily suggest themselves.
Each contributor was given a list of a hundred and eighty-five of the world’s regions—ranging from the Amur district of Siberia to the Kalahari Desert—with instructions to give each region a score of one to ten. The scores would then be summed and converted to a scale of one to a hundred. The rules were rigorous. The past could not be considered: Greece could not be given credit for its ancient glories. “If two races inhabit a given region,” Huntington specified further, “both must be considered, and the rank of the region must depend upon the average of the two.” The reputation of immigrants could count toward the score of their country of origin, but only those of the first generation. And size and commercial importance should be held constant: the Scots should not suffer relative to, say, the English, just because they were less populous. Huntington’s respondents took on the task with the utmost seriousness. “One appreciates what a big world this is and how little one knows about it when he attempts such a task as you have set,” a respondent wrote back to Huntington. “It is a most excellent means of taking the conceit out of one.” England and Wales and the North Atlantic states of America scored a perfect hundred, with central and northwestern Germany and New England coming in at ninety-nine.
Huntington then asked the twenty-five of his correspondents who were Americans for a detailed ranking of the constituent regions of the United States. This time, he proposed a six-point scale. Southern Alaska, in this second reckoning, was last, at 1.5, followed by Arizona and New Mexico, at 1.6. The winners: Massachusetts, at 6.0, followed by Connecticut, Rhode Island, and New York, at 5.8. The citadel of American civilization was New England and New York, Huntington concluded, in his magisterial 1915 work “Civilization and Climate.” In case you are wondering, Ellsworth Huntington was a professor of geography at Yale, in New Haven, Connecticut. “Civilization and Climate” was published by Yale University Press, and the book’s appendix contains a list of Huntington’s American correspondents, of which the following bear special mention:
J. Barrell, geologist, New Haven, Conn.
P. Bigelow, traveler and author, Malden, N.Y.
I. Bowman, geographer, New York City
W. M. Brown, geographer, Providence, R.I.
A. C. Coolidge, historian, Cambridge, Mass.
S. W. Cushing, geographer, Salem, Mass.
L. Farrand, anthropologist, New York City
C. W. Furlong, traveler and author, Boston, Mass.
E. W. Griffis, traveler and author, Ithaca, N.Y.
A. G. Keller, anthropologist, New Haven, Conn.
E. F. Merriam, editor, Boston, Mass.
J. R. Smith, economic geographer, Philadelphia, Pa.
Anonymous, New York City
“In spite of several attempts I was unable to obtain any contributor in the states west of Minnesota or south of the Ohio River,” Huntington explains, as if it were a side issue. It isn’t, of course: not then and not now. Who comes out on top, in any ranking system, is really about who is doing the ranking. ♦
The Trouble with College Rankings, The Fresh Yorker
The Order of Things
Rankings depend on what weight we give to what variables.
Last summer, the editors of Car and Driver conducted a comparison test of three sports cars, the Lotus Evora, the Chevrolet Corvette Grand Sport, and the Porsche Cayman S. The cars were taken on an extended run through mountain passes in Southern California, and from there to a race track north of Los Angeles, for precise measurements of spectacle and treating. The results of the road tests were then tabulated according to a twenty-one-variable, two-hundred-and-thirty-five-point rating system, based on four categories: vehicle (driver convenience, styling, fit and finish, etc.); power train (transmission, engine, and fuel economy); chassis (steering, brakes, rail, and treating); and “fun to drive.” The magazine concluded, “The range of these three cars’ driving personalities is as various as the pajama sizes of Papa Bear, Mama Bear, and Baby Bear, but a clear winner emerged nonetheless.” This was the final tally:
Porsche Cayman 193
Chevrolet Corvette 186
Lotus Evora 182
Car and Driver is one of the most influential editorial voices in the automotive world. When it says that it likes one car better than another, consumers and carmakers take notice. Yet when you inspect the magazine’s tabulations it is hard to figure out why Car and Driver was so sure that the Cayman is better than the Corvette and the Evora. The trouble starts with the fact that the ranking methodology Car and Driver used was essentially the same one it uses for all the vehicles it tests—from S.U.V.s to economy sedans. It’s not set up for sports cars. Exterior styling, for example, counts for four per cent of the total score. Has anyone buying a sports car ever placed so little value on how it looks? Similarly, the categories of “fun to drive” and “chassis”—which cover the subjective practice of driving the car—count for only eighty-five points out of the total of two hundred and thirty-five. That may make sense for S.U.V. buyers. But, for people interested in Porsches and Corvettes and Lotuses, the subjective practice of driving is surely what matters most. In other words, in attempting to come up with a ranking that is heterogeneous—a methodology that is broad enough to cover all vehicles— Car and Driver ended up with a system that is absurdly ill-suited to some vehicles. Suppose that Car and Driver determined to tailor its grading system just to sports cars. Clearly, styling and the driving practice ought to count for much more. So let’s make exterior styling worth twenty-five per cent, the driving practice worth fifty per cent, and the balance of the criteria worth twenty-five per cent. The final tally now looks like this:
Lotus Evora 205
Porsche Cayman 198
Chevrolet Corvette 192There’s another thing funny about the Car and Driver system. Price counts only for twenty points, less than ten per cent of the total. There’s no secret why: Car and Driver is edited by auto enthusiasts. To them, the choice of a car is as significant as the choice of a home or a spouse, and only a philistine would let a few dollars stand inbetween him and the car he wants. (They leave penny-pinching to their frumpy counterparts at Consumer Reports. ) But for most of us price matters, especially in a case like this, where the Corvette, as tested, costs $67,565—thirteen thousand dollars less than the Porsche, and eighteen thousand dollars less than the Lotus. Even to a car nut, that’s a lot of money. So let’s imagine that Car and Driver revised its ranking system again, providing a third of the weight to price, a third to the driving practice, and a third split identically inbetween exterior styling and vehicle characteristics. The tally would now be:
Chevrolet Corvette 205
Lotus Evora 195
Porsche Cayman 195
So which is the best car?
Car and Driver’s ambition to grade every car in the world according to the same methodology would be fine if it limited itself to a single dimension. A heterogeneous ranking system works if it focusses just on, say, how much joy a car is to drive, or how good-looking it is, or how beautifully it treats. The magazine’s ambition to create a comprehensive ranking system—one that considered cars along twenty-one variables, each weighted according to a secret sauce cooked up by the editors—would also be fine, as long as the cars being compared were truly similar. It’s only when one car is thirteen thousand dollars more than another that bouncing twenty-one variables starts to break down, because you’re faced with the unlikely task of determining how much a difference of that degree ought to matter. A ranking can be heterogeneous, in other words, as long as it doesn’t attempt to be too comprehensive. And it can be comprehensive as long as it doesn’t attempt to measure things that are heterogeneous. But it’s an act of real audacity when a ranking system attempts to be comprehensive and heterogeneous—which is the very first thing to keep in mind in any consideration of U.S. News & World Report ’ s annual “Best Colleges” guide.
The U.S. News rankings are run by Robert Morse, whose six-person team operates out of a petite crimson brick office building in the Georgetown neighborhood of Washington, D.C. Morse is a middle-aged man with gray hair who looks like the prototypical Beltway wonk: rumpled, self-effacing, mildly preppy and sensibly shoed. His office is piled high with the statistical detritus of more than two decades of data collection. When he took on his current job, in the mid-nineteen-eighties, the college guide was little more than an item of service journalism tucked away inwards U.S. News magazine. Now the weekly print magazine is defunct, but the rankings have taken on a life of their own. In the month that the two thousand eleven rankings came out, the U.S. News Web site recorded more than ten million visitors. U.S. News has added rankings of graduate programs, law schools, business schools, medical schools, and hospitals—and Morse has become the dean of a burgeoning international rankings industry.
“In the early years, the thing that’s happening now would not have been imaginable,” Morse says. “This idea of using the rankings as a benchmark, college presidents setting a purpose of ‘We’re going to rise in the U.S. News ranking,’ as proof of their management, or as proof that they’re a better school, that they’re a good president. That wasn’t on anybody’s radar. It was just for consumers.”Over the years, Morse’s methodology has steadily evolved. In its current form, it relies on seven weighted variables:
Undergraduate academic reputation, 22.Five per cent
Graduation and freshman retention rates, twenty per cent
Faculty resources, twenty per cent
Student selectivity, fifteen per cent
Financial resources, ten per cent
Graduation rate spectacle, 7.Five per cent
Alumni providing, five per centFrom these variables, U.S. News generates a score for each institution on a scale of one to 100, where Harvard is a one hundred and the University of North Carolina-Greensboro is a 22. Here is a list of the schools that finished in positions forty-one through fifty in the two thousand eleven “National University” category:
Case Western Reserve, 60
Rensselaer Polytechnic Institute, 60
University of California-Irvine, 60
University of Washington, 60
University of Texas-Austin, 59
University of Wisconsin-Madison, 59
Penn State University-University Park, 58
University of Illinois, Urbana-Champaign, 58
University of Miami, 58
Yeshiva University, 57
This ranking system looks a fine deal like the Car and Driver methodology. It is heterogeneous. It doesn’t just compare U.C. Irvine, the University of Washington, the University of Texas-Austin, the University of Wisconsin-Madison, Penn State, and the University of Illinois, Urbana-Champaign—all public institutions of toughly the same size. It aims to compare Penn State—a very large, public, land-grant university with a low tuition and an economically diverse student bod, set in a rural valley in central Pennsylvania and famous for its football team—with Yeshiva University, a puny, expensive, private Jewish university whose undergraduate program is set on two campuses in Manhattan (one in midtown, for the women, and one far uptown, for the guys) and is undoubtedly not famous for its football team.
The system is also comprehensive. It doesn’t simply compare schools along one dimension—the test scores of incoming freshmen, say, or academic reputation. An algorithm takes a slate of statistics on each college and converts them into a single score: it tells us that Penn State is a better school than Yeshiva by one point. It is effortless to see why the U.S. News rankings are so popular. A single score permits us to judge inbetween entities (like Yeshiva and Penn State) that otherwise would be unlikely to compare. At no point, however, do the college guides acknowledge the extreme difficulty of the task they have set themselves. A comprehensive, heterogeneous ranking system was a spread for Car and Driver —and all it did was rank inanimate objects operated by a single person. The Penn State campus at University Park is a sophisticated institution with dozens of schools and departments, four thousand faculty members, and forty-five thousand students. How on earth does anyone propose to assign a number to something like that?The very first difficulty with rankings is that it can be remarkably hard to measure the variable you want to rank—even in cases where that variable seems ideally objective. Consider an extreme example: suicide. Here is a ranking of suicides per hundred thousand people, by country:
South Korea, 31.0
This list looks straightforward. Yet no self-respecting epidemiologist would look at it and conclude that Belarus has the worst suicide rate in the world, and that Hungary belongs in the top ten. Measuring suicide is just too tricky. It requires someone to make a surmise about the intentions of the deceased at the time of death. In some cases, that’s effortless. Maybe the victim leaped off the Golden Gate Bridge, or left a note. In most cases, tho’, there’s ambiguity, and different coroners and different cultures vary widely in the way they choose to interpret that ambiguity. In certain places, cause of death is determined by the police, who some believe are more likely to call an ambiguous suicide an accident. In other places, the decision is made by a physician, who may be less likely to do so. In some cultures, suicide is considered so shameful that coroners bashful away from that determination, even when it’s visible. A suicide might be called a suicide, a homicide, an accident, or left undetermined. David Phillips, a sociologist at the University of California-San Diego, has argued persuasively that a significant percentage of single-car crashes are most likely suicides, and criminologists suggest that a good percentage of civilians killed by police officers are actually cases of “suicide by cop”—instances where someone deliberately provoked deadly force. The reported suicide rate, then, is almost certainly less than the actual suicide rate. But no one knows whether the relationship inbetween those two numbers is the same in every country. And no one knows whether the proxies that we use to estimate the real suicide rate are any good.
“Many, many people who commit suicide by poison have something else wrong with them—let’s say the person has cancer—and the death of this person might be listed as primarily associated with cancer, rather than with deliberate poisoning,” Phillips says. “Any suicides in that category would be undetectable. Or it is frequently noted that Orthodox Jews have a low recorded suicide rate, as do Catholics. Well, it could be because they have this very solid community and proscriptions against suicide, or because they are unusually embarrassed by suicide and more willing to hide it. The ordinary reaction is nobody knows whether suicide rankings are real.”
The U.S. News rankings suffer from a serious case of the suicide problem. There’s no direct way to measure the quality of an institution—how well a college manages to inform, inspire, and challenge its students. So the U.S. News algorithm relies instead on proxies for quality—and the proxies for educational quality turn out to be flimsy at best.Take the category of “faculty resources,” which counts for twenty per cent of an institution’s score. “Research shows that the more pleased students are about their contact with professors,” the College Guide’s explanation of the category commences, “the more they will learn and the more likely it is they will graduate.” That’s true. According to educational researchers, arguably the most significant variable in a successful college education is a vague but crucial concept called student “engagement”—that is, the extent to which students immerse themselves in the intellectual and social life of their college—and a major component of engagement is the quality of a student’s contacts with faculty. As with suicide, the disagreement isn’t about what we want to measure. So what proxies does U.S. News use to measure this elusive dimension of engagement? The explanation goes on:
We use six factors from the 2009-10 academic year to assess a school’s commitment to instruction. Class size has two components, the proportion of classes with fewer than twenty students (30 percent of the faculty resources score) and the proportion with fifty or more students (Ten percent of the score). Faculty salary (35 percent) is the average faculty pay, plus benefits, during the 2008-09 and 2009-10 academic years, adjusted for regional differences in the cost of living. . . . We also weigh the proportion of professors with the highest degree in their fields (15 percent), the student-faculty ratio (Five percent), and the proportion of faculty who are utter time (Five percent). This is a puzzling list. Do professors who get paid more money truly take their training roles more earnestly? And why does it matter whether a professor has the highest degree in his or her field? Salaries and degree attainment are known to be predictors of research productivity. But studies showcase that being oriented toward research has very little to do with being good at training. Almost none of the U.S. News variables, in fact, seem to be particularly effective proxies for engagement. As the educational researchers Patrick Terenzini and Ernest Pascarella concluded after analyzing twenty-six hundred reports on the effects of college on students:
After taking into account the characteristics, abilities, and backgrounds students bring with them to college, we found that how much students grow or switch has only inconsistent and, perhaps in a practical sense, trivial relationships with such traditional measures of institutional “quality” as educational expenditures per student, student/faculty ratios, faculty salaries, percentage of faculty with the highest degree in their field, faculty research productivity, size of the library, [or] admissions selectivity.
The reputation score that serves as the most significant variable in the U.S. News methodology—accounting for 22.Five per cent of a college’s final score—isn’t any better. Every year, the magazine sends a survey to the country’s university and college presidents, provosts, and admissions deans (along with a sampling of high-school guidance counsellors) asking them to grade all the schools in their category on a scale of one to five. Those at national universities, for example, are asked to rank all two hundred and sixty-one other national universities—and Morse says that the typical respondent grades about half of the schools in his or her category. But it’s far from clear how any one individual could have insight into that many institutions. In an article published recently in the Annals of Internal Medicine , Ashwini Sehgal analyzed U.S. News’s “Best Hospitals” rankings, which also rely intensely on reputation ratings generated by professional peers. Sehgal put together a list of objective criteria of performance—such as a hospital’s mortality rates for various surgical procedures, patient-safety rates, nursing-staffing levels, and key technologies. Then he checked to see how well those measures of spectacle matched each hospital’s reputation rating. The reaction, he discovered, was that they didn’t. Having good outcomes doesn’t translate into being admired by other doctors. Why, after all, should a gastroenterologist at the Ochsner Medical Center, in Fresh Orleans, have any specific insight into the spectacle of the gastroenterology department at Mass General, in Boston, or even, for that matter, have anything more than an anecdotal impression of the gastroenterology department down the road at some hospital in Baton Rouge?
Some years ago, similarly, a former chief justice of the Michigan supreme court, Thomas Brennan, sent a questionnaire to a hundred or so of his fellow-lawyers, asking them to rank a list of ten law schools in order of quality. “They included a good sample of the big names. Harvard. Yale. University of Michigan. And some lesser-known schools. John Marshall. Thomas Cooley,” Brennan wrote. “As I recall, they ranked Penn State’s law school right about in the middle of the pack. Maybe fifth among the ten schools listed. Of course, Penn State doesn’t have a law school.”
Those lawyers put Penn State in the middle of the pack, even however every fact they thought they knew about Penn State’s law school was an illusion, because in their minds Penn State is a middle-of-the-pack brand. (Penn State does have a law school today, by the way.) Sound judgments of educational quality have to be based on specific, hard-to-observe features. But reputational ratings are simply inferences from broad, readily observable features of an institution’s identity, such as its history, its prominence in the media, or the elegance of its architecture. They are prejudices.
And where do these kinds of reputational prejudices come from? According to Michael Bastedo, an educational sociologist at the University of Michigan who has published widely on the U.S. News methodology, “rankings drive reputation.” In other words, when U.S. News asks a university president to perform the unlikely task of assessing the relative merits of dozens of institutions he knows nothing about, he relies on the only source of detailed information at his disposition that assesses the relative merits of dozens of institutions he knows nothing about: U.S. News . A school like Penn State, then, can do little to improve its position. To go higher than forty-seventh, it needs a better reputation score, and to get a better reputation score it needs to be higher than forty-seventh. The U.S. News ratings are a self-fulfilling prophecy.
Bastedo, incidentally, says that reputation ratings can sometimes work very well. It makes sense, for example, to ask professors within a field to rate others in their field: they read one another’s work, attend the same conferences, and hire one another’s graduate students, so they have real skill on which to base an opinion. Reputation scores can work for one-dimensional rankings, created by people with specialized skill. For example, the Wall Street Journal has ranked colleges according to the opinions of corporate recruiters. Those opinions are more than a proxy. To the extent that people chose one college over another to enhance their prospects in the corporate job markets, the reputation rankings of corporate recruiters are of direct relevance. The No. One school in the Wall Street Journal ’ s corporate recruiter’s ranking, by the way, is Penn State.
For several years, Jeffrey Stake, a professor at the Indiana University law school, has run a Web site called the Ranking Game. It contains a spreadsheet loaded with statistics on every law school in the country, and permits users to pick their own criteria, assign their own weights, and construct any ranking system they want.
Stake’s intention is to demonstrate just how subjective rankings are, to demonstrate how determinations of “quality” turn on relatively arbitrary judgments about how much different variables should be weighted. For example, his site makes it effortless to mimic the U.S. News rankings. All you have to do is give equal weight to “academic reputation,” “LSAT scores at the 75th percentile,” “student-faculty ratio,” and “faculty law-review publishing,” and you get a list of élite schools which looks similar to the U.S News law-school rankings:
University of Chicago
University of Pennsylvania
Fresh York University
University of California, Berkeley
There’s something missing from that list of variables, of course: it doesn’t include price. That is one of the most distinctive features of the U.S. News methodology. Both its college rankings and its law-school rankings prize schools for dedicating lots of financial resources to educating their students, but not for being affordable. Why? Morse admitted that there was no formal reason for that position. It was just a feeling. “We’re not telling that we’re measuring educational outcomes,” he explained. “We’re not telling we’re social scientists, or we’re subjecting our rankings to some peer-review process. We’re just telling we’ve made this judgment. We’re telling we’ve interviewed a lot of experts, we’ve developed these academic indicators, and we think these measures measure quality schools.”
As answers go, that’s up there with the parental “Because I said so.” But Morse is simply being fair. If we don’t understand what the right proxies for college quality are, let alone how to represent those proxies in a comprehensive, heterogeneous grading system, then our rankings are inherently arbitrary. All Morse was telling was that, on the question of price, he comes down on the Car and Driver side of things, not on the Consumer Reports side. U.S. News thinks that schools that spend a lot of money on their students are nicer than those that don’t, and that this niceness ought to be factored into the equation of desirability. Slew of Americans agree: the campus of Vanderbilt University or Williams College is packed with students whose families are largely indifferent to the price their school charges but keenly interested in the flower beds and the spacious suites and the architecturally distinguished lecture halls those high prices make possible. Of course, given that the rising cost of college has become a significant social problem in the United States in latest years, you can make a strong case that a school ought to be rewarded for being affordable. So suppose we go back to Stake’s ranking game, and re-rank law schools based on student-faculty ratio, L.S.A.T. scores at the seventy-fifth percentile, faculty publishing, and price, all weighted identically. The list now looks like this:
University of Chicago
Brigham Youthful University
University of Colorado
University of Pennsylvania
Columbia UniversityThe revised ranking tells us that there are schools—like B.Y.U. and Colorado—that provide a good legal education at a decent price, and that, by choosing not to include tuition as a variable, U.S. News has effectively penalized those schools for attempting to provide value for the tuition dollar. But that’s a very subtle tweak. Let’s say that value for the dollar is something we indeed care about. And so what we want is a three-factor ranking, counting value for the dollar at forty per cent, L.S.A.T. scores at forty per cent of the total, and faculty publishing at twenty per cent. Look at how the top ten switches:
University of Chicago
Brigham Youthfull University
University of Texas
University of Virginia
University of Colorado
University of Alabama
University of Pennsylvania
Welcome to the big time, Alabama!
The U.S. News rankings turn out to be utter of these kinds of implicit ideological choices. One common statistic used to evaluate colleges, for example, is called “graduation rate spectacle,” which compares a school’s actual graduation rate with its predicted graduation rate given the socioeconomic status and the test scores of its incoming freshman class. It is a measure of the school’s efficacy: it quantifies the influence of a school’s culture and teachers and institutional support mechanisms. Tulane, given the qualifications of the students that it admits, ought to have a graduation rate of eighty-seven per cent; its actual two thousand nine graduation rate was seventy-three per cent. That shortfall suggests that something is amiss at Tulane.
Another common statistic for measuring college quality is “student selectivity.” This reflects variables such as how many of a college’s freshmen were in the top ten per cent of their high-school class, how high their S.A.T. scores were, and what percentage of applicants a college admits. Selectivity quantifies how accomplished students are when they very first arrive on campus.
Each of these statistics matters, but for very different reasons. As a society, we most likely care more about efficacy: America’s future depends on colleges that make sure the students they admit leave with an education and a degree. If you are a bright high-school senior and you’re thinking about your own future, tho’, you may well care more about selectivity, because that relates to the prestige of your degree.
But no institution can excel at both. The national university that ranks No. One in selectivity is Yale. A crucial part of what it considers its educational function is to assemble the most gifted group of freshmen it can. Because it maximizes selectivity, tho’, Yale will never do well on an efficacy scale. Its freshmen are so accomplished that they have a predicted graduation rate of ninety-six per cent: the highest Yale’s efficacy score could be is plus four. (It’s actually plus two.) Of the top fifty national universities in the “Best Colleges” ranking, the least selective school is Penn State. Penn State sees its educational function as serving a broad range of students. That gives it the chance to excel at efficacy—and it does so brilliantly. Penn State’s freshmen have an expected graduation rate of seventy-three per cent and an actual graduation rate of eighty-five per cent, for a score of plus twelve: no other school in the U.S. News top fifty comes close.
There is no right response to how much weight a ranking system should give to these two challenging values. It’s a matter of which educational model you value more—and here, once again, U.S. News makes its position clear. It gives twice as much weight to selectivity as it does to efficacy. It favors the Yale model over the Penn State model, which means that the Yales of the world will always succeed at the U.S. News rankings because the U.S. News system is designed to prize Yale-ness. By contrast, to the extent that Penn State succeeds at doing a better job of being Penn State—of attracting a diverse group of students and educating them capably—it will only do worse. Rankings are not benign. They enshrine very particular ideologies, and, at a time when American higher education is facing a crisis of accessibility and affordability, we have adopted a de-facto standard of college quality that is uninterested in both of those factors. And why? Because a group of magazine analysts in an office building in Washington, D.C., determined twenty years ago to value selectivity over efficacy, to use proxies that scarcely relate to what they’re meant to be proxies for, and to pretend that they can compare a large, diverse, low-cost land-grant university in rural Pennsylvania with a petite, expensive, private Jewish university on two campuses in Manhattan.
“If you look at the top twenty schools every year, forever, they are all wealthy private universities,” Graham Spanier, the president of Penn State, told me. “Do you mean that even the most prestigious public universities in the United States, and you can take your pick of what you think they are—Berkeley, U.C.L.A., University of Michigan, University of Wisconsin, Illinois, Penn State, U.N.C.—do you mean to say that not one of those is in the top tier of institutions? It doesn’t indeed make sense, until you drill down into the rankings, and what do you find? What I find more than anything else is a measure of wealth: institutional wealth, how big is your endowment, what percentage of alumni are donating each year, what are your faculty salaries, how much are you spending per student. Penn State may very well be the most popular university in America—we get a hundred and fifteen thousand applications a year for admission. We serve a lot of people. Almost a third of them are the very first people in their entire family network to come to college. We have seventy-six per cent of our students receiving financial aid. There is no possibility that we could do anything here at this university to get ourselves into the top ten or twenty or thirty—except if some donor gave us billions of dollars.”In the fall of 1913, the prominent American geographer Ellsworth Huntington sent a letter to two hundred and thirteen scholars from twenty-seven countries. “May I ask your cooperation in the prep of a map displaying the distribution of the higher elements of civilization via the world?” Huntington began, and he continued:
My purpose is to prepare a map which shall demonstrate the distribution of those characteristics which are generally recognized as of the highest value. I mean by this the power of initiative, the capacity for formulating fresh ideas and for carrying them into effect, the power of self-control, high standards of honesty and morality, the power to lead and to control other races, the capacity for disseminating ideas, and other similar qualities which will readily suggest themselves.
Each contributor was given a list of a hundred and eighty-five of the world’s regions—ranging from the Amur district of Siberia to the Kalahari Desert—with instructions to give each region a score of one to ten. The scores would then be summed and converted to a scale of one to a hundred. The rules were stringent. The past could not be considered: Greece could not be given credit for its ancient glories. “If two races inhabit a given region,” Huntington specified further, “both must be considered, and the rank of the region must depend upon the average of the two.” The reputation of immigrants could be used toward the score of their country of origin, but only those of the very first generation. And size and commercial significance should be held constant: the Scots should not suffer relative to, say, the English, just because they were less populous. Huntington’s respondents took on the task with the utmost seriousness. “One appreciates what a big world this is and how little one knows about it when he attempts such a task as you have set,” a respondent wrote back to Huntington. “It is a most excellent means of taking the conceit out of one.” England and Wales and the North Atlantic states of America scored a ideal hundred, with central and northwestern Germany and Fresh England coming in at ninety-nine.
Huntington then asked the twenty-five Americans among his correspondents for a more detailed ranking of the constituent regions of the United States. This time, he proposed a six-point scale. Southern Alaska, in this second reckoning, was last, at 1.5, followed by Arizona and New Mexico, at 1.6. The winners: Massachusetts, at 6.0, followed by Connecticut, Rhode Island, and New York, at 5.8. The citadel of American civilization was New England and New York, Huntington concluded, in his magisterial 1915 work “Civilization and Climate.” In case you are wondering, Ellsworth Huntington was a professor of geography at Yale, in New Haven, Connecticut. “Civilization and Climate” was published by Yale University Press, and the book’s appendix contains a list of Huntington’s American correspondents, of which the following bear special mention:
J. Barrell, geologist, New Haven, Conn.
P. Bigelow, traveler and author, Malden, N.Y.
I. Bowman, geographer, New York City
W. M. Brown, geographer, Providence, R.I.
A. C. Coolidge, historian, Cambridge, Mass.
S. W. Cushing, geographer, Salem, Mass.
L. Farrand, anthropologist, New York City
C. W. Furlong, traveler and author, Boston, Mass.
E. W. Griffis, traveler and author, Ithaca, N.Y.
A. G. Keller, anthropologist, New Haven, Conn.
E. F. Merriam, editor, Boston, Mass.
J. R. Smith, economic geographer, Philadelphia, Pa.
Anonymous, New York City
“In spite of several attempts I was unable to obtain any contributor in the states west of Minnesota or south of the Ohio River,” Huntington explains, as if it were a side issue. It isn’t, of course—not then and not now. Who comes out on top, in any ranking system, is really about who is doing the ranking. ♦