Fun With PageRank - College Football

January 12, 2016

I am by no means the first person to try to rank football teams using PageRank. A quick Google search will attest to that. Still, I find the PageRank algorithm really interesting and was curious what kind of rankings I would get when I applied it to scores from this season. I also wanted to compare different techniques for creating links between the nodes representing teams and see if I could optimize the process to give rankings that best predicted the success of teams.

PageRank is an algorithm that was originally created by Larry Page and Sergey Brin and was key to the creation of Google, as it allows webpages to be ranked and shown in a meaningful order in search results. The algorithm works by representing each website as a node, with edges linking it to all the pages that it links to and edges connecting to it from all the pages that link to it. Every node starts with a rank value of 1. The algorithm then repeatedly runs through the list of nodes and sets each node’s rank to the sum, over every site that links to it, of that site’s rank divided by the number of nodes that site links to. That sum is multiplied by a damping value d, and then 1 minus the damping value is added (this extra term means that sites that no others link to are still factored in). This gives us the formula for the rank of a website A, written PR(A), to be

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

or more specifically the value that this approaches when we run through the list of pages many times. Here T1 through Tn are the pages that link to A, C(T) is the number of links going out of page T, and d is the damping value. Basically this is proportional to the probability that a random web surfer who starts on a random page and randomly clicks on links will end up on a given page. For more info I recommend reading this.
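
As a rough illustration of the formula above, here is a minimal Python sketch of the iteration. It is not the code used for this post: the function name `pagerank`, the `(source, target) -> weight` link format, the damping value of 0.85, and the fixed 100 passes are all assumptions made for the example. It also updates every node at once on each pass, which is one common way to do it. With every weight equal to 1 it matches the formula above, and the weights simply let one link count for more than another.

```python
from collections import defaultdict

def pagerank(links, d=0.85, iterations=100):
    """links maps (source, target) -> weight; rank flows from source to target."""
    nodes = {node for edge in links for node in edge}
    out_weight = defaultdict(float)      # C(T): total weight leaving each node
    for (src, _dst), weight in links.items():
        out_weight[src] += weight

    rank = {node: 1.0 for node in nodes}             # every node starts at 1
    for _ in range(iterations):
        new_rank = {node: 1.0 - d for node in nodes}  # the (1 - d) term
        for (src, dst), weight in links.items():
            # Each source hands out d * its rank, split by link weight.
            new_rank[dst] += d * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank

# Three hypothetical pages: A and B both link to C, and C links back to A.
example = {("A", "C"): 1.0, ("B", "C"): 1.0, ("C", "A"): 1.0}
ranks = pagerank(example)
```

In this toy graph nothing links to B, so B ends up with only the (1 - d) baseline, while C, which has two incoming links, ends up on top.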

To apply this algorithm to college football scores, the nodes obviously represent the teams, and the simplest way to link them is to add one link per game from the losing team to the winning team, so that the loser passes rank on to the winner. Now all that is left to do is to write a quick program, plug in the data for the 2015 season from this site, and see what the best teams in college football are:

| Ranking | Team | Rank Value | AP Rank |
|---|---|---|---|
| 1 | Mississippi | 13.152064389794507 | 10 |
| 2 | Alabama | 11.995434228501326 | 1 |
| 3 | Michigan State | 6.028237561703026 | 6 |
| 4 | Arkansas | 5.851229795995238 | |
| 5 | Florida | 5.10865335059102 | 25 |
| 6 | Memphis | 4.669365414640358 | |
| 7 | Clemson | 4.608702627429248 | 2 |
| 8 | Stanford | 4.02385263509375 | 3 |
| 9 | Houston | 3.8161178450930255 | 8 |
| 10 | Oklahoma | 3.6874630785172573 | 5 |
| 11 | Connecticut | 3.6331792831198086 | |
| 12 | Nebraska | 3.331446212217053 | |
| 13 | Northwestern | 3.252909918181448 | 23 |
| 14 | Louisiana State | 3.1765338753239845 | 16 |
| 15 | Oregon | 3.0300277904694557 | 19 |
| 16 | Michigan | 2.971511890859038 | 12 |
| 17 | Ohio State | 2.8522630010866914 | 4 |
| 18 | Utah | 2.7683111471299626 | 17 |
| 19 | Iowa | 2.605393298794179 | 9 |
| 20 | Notre Dame | 2.5808425315625483 | 11 |
| 21 | Texas Christian | 2.527663388590618 | 7 |
| 22 | Texas | 2.4284649059411563 | |
| 23 | Oklahoma State | 2.412125697981591 | 20 |
| 24 | Navy | 2.368147071648376 | 18 |
| 25 | Toledo | 2.2649714913821746 | |

This isn’t quite what I was hoping for. Ole Miss is no doubt a very good team this year, but I don’t know that they should have been in the championship. Also Clemson, who looked really good last night, is the #7 team in this ranking, 6-7 Connecticut and Nebraska are #11 and #12 (6 spots above Ohio State), and, as much as it pains me to admit it, Texas probably doesn’t deserve to be ranked #22 above Oklahoma State (even though we should have won our game against them).

These problems all kind of make sense though. This system only gives teams credit for who they beat, and a win is worth more the fewer other teams beat that opponent, because a beaten team’s rank is split among everyone who beat it. Ole Miss was the only team to beat Alabama, so they gain a huge boost, and Texas, with wins against Oklahoma and Baylor, shoots up to #22 despite our 5-7 record.

The question, though, is how well it works despite these problems. I modified the program to run on the data from 2008-2015 (resetting after each year) and to count the percentage of games after week 4 of each year (the first weeks are needed to build up some links) in which the team with the higher rank won. The answer I got was 66%. That isn’t bad for sure, but this is college football: in at least half of games the winner is obvious.
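
The percentage-correct check described above can be sketched roughly as follows. This is an illustration under assumptions, not the program behind the 66% figure: the game tuples, the `first_week` cutoff parameter, and the function names are made up for the example, and the ranking function is passed in so that any linking scheme can be scored the same way. The stand-in ranking used here is a plain win count, purely to keep the example short.

```python
from collections import Counter

def prediction_accuracy(games, rank_fn, first_week=5):
    """games: list of (week, winner, loser) tuples.
    rank_fn: takes the games played so far and returns {team: rank}."""
    correct = total = 0
    for week in sorted({game[0] for game in games}):
        if week < first_week:
            continue                   # early weeks only build up links
        earlier = [game for game in games if game[0] < week]
        ranks = rank_fn(earlier)       # ranks never see the games being predicted
        for game_week, winner, loser in games:
            if game_week != week:
                continue
            total += 1
            # A tie in rank counts as a miss.
            if ranks.get(winner, 0.0) > ranks.get(loser, 0.0):
                correct += 1
    return correct / total if total else 0.0

def win_count_ranks(earlier_games):
    # Stand-in ranking (NOT PageRank): number of wins so far.
    return Counter(winner for _week, winner, _loser in earlier_games)

# Hypothetical results for four teams over six weeks.
games = [
    (1, "A", "B"), (2, "A", "C"), (3, "B", "C"), (4, "C", "D"),
    (5, "A", "D"), (5, "C", "B"), (6, "B", "A"),
]
accuracy = prediction_accuracy(games, win_count_ranks)
```

The important detail is that each week is predicted using only the games from earlier weeks, so a team never gets credit for a result the ranking is being asked to predict.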

There must be a better way.

What if instead of just giving each team a connection for a win you gave them one weighted based on the number of points they won by? I modified the code and ran it again and got this:

| Ranking | Team | Rank Value | AP Rank |
|---|---|---|---|
| 1 | Mississippi | 16.546109291799667 | 10 |
| 2 | Alabama | 16.221881946583537 | 1 |
| 3 | Florida | 10.116506820015267 | 25 |
| 4 | Michigan State | 7.596623396197893 | 6 |
| 5 | Ohio State | 6.755035090366382 | 4 |
| 6 | Michigan | 6.187586562016839 | 12 |
| 7 | Clemson | 5.314369122578836 | 2 |
| 8 | Memphis | 4.972078578040473 | |
| 9 | Stanford | 4.132866520624508 | 3 |
| 10 | Houston | 4.121665846640544 | 8 |
| 11 | Florida State | 3.9662263933395168 | 14 |
| 12 | Connecticut | 3.8935762094662736 | |
| 13 | Northwestern | 3.558828493675431 | 23 |
| 14 | Utah | 3.3878316726442987 | 17 |
| 15 | Oklahoma | 3.3792875276710337 | 5 |
| 16 | Navy | 3.1106390355524156 | 18 |
| 17 | Louisiana State | 2.970543992031731 | 16 |
| 18 | Southern California | 2.8507316276250267 | |
| 19 | Temple | 2.5541231226403918 | |
| 20 | Notre Dame | 2.540759670189545 | 11 |
| 21 | Oregon | 2.298996291003772 | 19 |
| 22 | Oklahoma State | 2.200994113966649 | 20 |
| 23 | Arkansas | 2.1821347212464777 | |
| 24 | Auburn | 2.001231953036031 | |
| 25 | South Florida | 1.9591946276537273 | |

67.4% Correct

A little better but still not ideal. The top 25 actually looks a lot better with this system, but the overall percentage of games predicted correctly didn’t improve much. This is probably because this system gives large bonuses to teams with blowout wins, which skews the results for lower ranked teams. For example, this system ranks Georgia Tech #51, ahead of Georgia at #54, despite Georgia Tech being 3-9 and Georgia being 10-3, because all three of Georgia Tech’s wins were either blowout victories or against a highly ranked team. Georgia, on the other hand, played mostly very close games except against other low ranked teams, where the victories weren’t worth much.
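
To make the margin-weighted linking concrete, here is a small sketch. The tuple layout and the function name `margin_links` are assumptions for the example, and it assumes the link runs from the loser to the winner so that the loser’s rank flows to the team that beat it.

```python
from collections import defaultdict

def margin_links(games):
    """games: iterable of (winner, winner_points, loser, loser_points).
    Returns {(loser, winner): total margin} so rank flows loser -> winner."""
    links = defaultdict(float)
    for winner, winner_points, loser, loser_points in games:
        links[(loser, winner)] += winner_points - loser_points
    return dict(links)

# Hypothetical scores: A beats B twice, A edges C, and B blows out C.
games = [
    ("A", 35, "B", 7),
    ("A", 21, "C", 20),
    ("B", 24, "C", 10),
    ("A", 14, "B", 10),
]
links = margin_links(games)
```

Here C lost to A by 1 and to B by 14, so 14/15 of the rank C hands out goes to B and only 1/15 goes to A, even though both teams beat C once. That is the blowout bonus in miniature.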

For the next version I made a couple of changes. Links between nodes are now carried over from week to week and season to season, but are reduced in weight each time. This means I only have to exclude the first 4 weeks of the 2008 season, rather than the first 4 weeks of every season, to get meaningful data. I wrote an optimization program to determine the optimal drop-off values between weeks and between seasons. Not surprisingly, the week-to-week drop-off was very small and the one between seasons was very large. I expected this to slightly decrease the percentage of times the higher ranked team beat the lower ranked team, but it actually increased it to 71.74%, although the final order didn’t change much.

Next I played around with different ways of improving the linking algorithm until I found some things that improved the percentage correct significantly. To solve the problem where teams gained huge bonuses for blowout wins, I raised the number of points they won by to a power between 0 and 1. This made all wins more equal but still gave bonuses for greater margins of victory. I used my optimization code to determine the ideal value of the power for my set of data. This gave me these rankings:

| Ranking | Team | Rank Value | AP Rank |
|---|---|---|---|
| 1 | Mississippi | 14.421438903408603 | 10 |
| 2 | Alabama | 14.314009661876144 | 1 |
| 3 | Florida | 7.633905416604498 | 25 |
| 4 | Michigan State | 6.719355971259446 | 6 |
| 5 | Ohio State | 6.386773777257061 | 4 |
| 6 | Clemson | 5.9509254130051594 | |
| 7 | Stanford | 4.5569268651639305 | 3 |
| 8 | Memphis | 4.52878062713223 | |
| 9 | Michigan | 4.439600003768146 | 12 |
| 10 | Oklahoma | 4.283558551295348 | 5 |
| 11 | Utah | 4.052287457722774 | 17 |
| 12 | Oregon | 3.8014638381857115 | 19 |
| 13 | Houston | 3.660993497822507 | 8 |
| 14 | Florida State | 3.6207386210374506 | 14 |
| 15 | Oklahoma State | 3.5819021766017207 | 20 |
| 16 | Arkansas | 3.5163718511428637 | |
| 17 | Texas Christian | 3.434071962028466 | 7 |
| 18 | Southern California | 3.377286250509462 | |
| 19 | Louisiana State | 3.274145917726346 | 16 |
| 20 | Northwestern | 3.156491625105003 | 23 |

72.23% Correct
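
The two changes described before the table above, aging old links and raising the margin to a power, can be sketched as a single weight function. Everything specific here is an assumption for illustration: the post does not give the optimized values, so `power=0.5`, `weekly_keep=0.98` and `season_keep=0.3` are placeholders, and applying the drop-off as a repeated multiplication is just one plausible way to do it.

```python
def link_weight(margin, weeks_ago, seasons_ago,
                power=0.5, weekly_keep=0.98, season_keep=0.3):
    """Weight of a loser -> winner link for a game played some time ago."""
    damped_margin = margin ** power    # a power in (0, 1) flattens big margins
    age_factor = (weekly_keep ** weeks_ago) * (season_keep ** seasons_ago)
    return damped_margin * age_factor

fresh_close = link_weight(4, weeks_ago=0, seasons_ago=0)
fresh_blowout = link_weight(49, weeks_ago=0, seasons_ago=0)
old_blowout = link_weight(49, weeks_ago=0, seasons_ago=1)
```

With a power of 0.5, a 49-point win is worth 3.5 times a 4-point win instead of more than 12 times, and the same blowout from a season ago counts for less than a third of a fresh one.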

Getting better, but Mississippi is still on top. To reward teams for getting more wins, and not just for the teams they beat, I linked each team to itself whenever it won, with a relatively low strength. These were the results:

| Ranking | Team | Rank Value | AP Rank |
|---|---|---|---|
| 1 | Alabama | 21.2424902488574 | 1 |
| 2 | Ohio State | 11.704640297920006 | 4 |
| 3 | Clemson | 10.149979230255482 | 2 |
| 4 | Mississippi | 8.779587390363004 | 10 |
| 5 | Stanford | 6.089518400405526 | 3 |
| 6 | Houston | 5.910931477026027 | 8 |
| 7 | Michigan State | 5.374658656017661 | 6 |
| 8 | Oklahoma | 4.418512016234129 | 5 |
| 9 | Florida | 4.414748653552236 | 25 |
| 10 | Texas Christian | 4.038496765889279 | 7 |
| 11 | Utah | 3.8456477849565687 | 17 |
| 12 | Michigan | 3.5303143903197296 | 12 |
| 13 | Oregon | 3.3555492819953487 | 19 |
| 14 | Florida State | 3.2038651411931167 | 14 |
| 15 | Notre Dame | 3.098458569061836 | 11 |
| 16 | Southern California | 2.838827541666553 | |
| 17 | Oklahoma State | 2.822590318188176 | 20 |
| 18 | Baylor | 2.7911637337455875 | 13 |
| 19 | Memphis | 2.7283381324200637 | |
| 20 | Louisiana State | 2.7133019676801435 | 16 |

73.25% Correct
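
A self-link of this kind might be recorded as in the sketch below. The `self_weight` of 0.25, the `power` of 0.5, and the dictionary layout are placeholders chosen for the example, not the values that produced the table above.

```python
def add_game(links, winner, loser, margin, power=0.5, self_weight=0.25):
    """Record one game in a {(source, target): weight} link dictionary."""
    # Loser -> winner: the usual credit for beating someone.
    links[(loser, winner)] = links.get((loser, winner), 0.0) + margin ** power
    # Winner -> winner: a weak self-link, so part of the winner's own rank
    # flows back to it instead of all going to the teams that beat it.
    links[(winner, winner)] = links.get((winner, winner), 0.0) + self_weight
    return links

# Hypothetical results: A beats B and C, and B beats C.
links = {}
add_game(links, "A", "B", 9)
add_game(links, "A", "C", 16)
add_game(links, "B", "C", 4)
```

Because self-links accumulate with every win, a team with many wins keeps more of its own rank, which is what rewards the win total itself rather than only the quality of the opponents.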

Looks a lot better now. One big thing that could still be improved, though, is to somehow incorporate the idea of a “quality loss”. To accomplish this I added a small link from the winning team to the losing team whose weight is inversely dependent on the difference in score, sort of the opposite of the first score link. Again I optimized the values. Here are the results:

| Ranking | Team | Rank Value | AP Rank |
|---|---|---|---|
| 1 | Alabama | 6.957259047990596 | 1 |
| 2 | Clemson | 5.384768629919646 | 2 |
| 3 | Ohio State | 4.68255449927019 | 4 |
| 4 | Stanford | 3.9686455361007127 | 3 |
| 5 | Mississippi | 3.81290866288281 | 10 |
| 6 | Oklahoma | 3.669840545782093 | 5 |
| 7 | Notre Dame | 3.1313126577286923 | 11 |
| 8 | Michigan State | 3.115711491929591 | 6 |
| 9 | Houston | 3.067196935750496 | 8 |
| 10 | Texas Christian | 3.028064235280795 | 7 |
| 11 | Florida State | 2.980322818652764 | 14 |
| 12 | Michigan | 2.9569599514760334 | 12 |
| 13 | Baylor | 2.857102448017883 | 13 |
| 14 | Oregon | 2.8281970973371484 | 19 |
| 15 | Utah | 2.8265938814412515 | 17 |
| 16 | Tennessee | 2.8224196145180014 | 22 |
| 17 | Southern California | 2.8009842814746286 | |
| 18 | North Carolina | 2.7923511130996412 | 15 |
| 19 | Louisiana State | 2.7669958534412435 | 16 |
| 20 | Florida | 2.757168278193059 | 25 |

73.94% Correct
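
One way to picture the “quality loss” link is as a second, much weaker edge pointing the other way. In this sketch the exact form `loss_scale / margin` and the values of `win_power` and `loss_scale` are assumptions; the post only says the link is small and inversely dependent on the score difference.

```python
def game_links(winner, loser, margin, win_power=0.5, loss_scale=0.2):
    """Return both links created by one game as {(source, target): weight}."""
    return {
        # Main link: the loser passes rank to the winner, more for bigger wins.
        (loser, winner): margin ** win_power,
        # Quality-loss link: the winner passes a little rank back to the loser,
        # and the closer the game, the larger this link is.
        (winner, loser): loss_scale / margin,
    }

close_game = game_links("A", "B", margin=1)
blowout = game_links("A", "B", margin=25)
```

Losing by 1 to a strong team earns B a noticeably bigger share of A’s rank than losing by 25 does, while A’s credit for the win still grows with the margin.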

This obviously helped Clemson and Ohio State, as each had only one loss and it was to a really good team. One final question I had was whether the point rankings should be based on the numerical point difference or the ratio of the scores. Previously I had been using the numerical difference, but I decided to convert it to a ratio to see how that changed things. I had to reoptimize everything, but in the end these were the results I got:

| Ranking | Team | Rank Value | AP Rank |
|---|---|---|---|
| 1 | Alabama | 13.879078935506218 | 1 |
| 2 | Clemson | 9.457881153254105 | 2 |
| 3 | Ohio State | 7.169494418189567 | 4 |
| 4 | Oklahoma | 4.78077705016697 | 5 |
| 5 | Michigan | 4.70056260555836 | 12 |
| 6 | Mississippi | 4.543529339295558 | 10 |
| 7 | Notre Dame | 4.450597882458476 | 11 |
| 8 | Stanford | 4.269519801060248 | 3 |
| 9 | Florida State | 4.242500272796831 | 14 |
| 10 | Houston | 4.129250161175549 | 8 |
| 11 | Texas Christian | 3.7310455510786915 | 7 |
| 12 | Arkansas | 3.124350503877841 | |
| 13 | Tennessee | 2.9032383776386497 | 22 |
| 14 | North Carolina | 2.8152756165448594 | 15 |
| 15 | Michigan State | 2.8087805723467185 | 6 |
| 16 | Florida | 2.7894561159115048 | 25 |
| 17 | Utah | 2.7863658969691913 | 17 |
| 18 | Baylor | 2.6866881933312023 | 13 |
| 19 | Navy | 2.653431762625406 | 18 |
| 20 | Oregon | 2.6466538144421903 | 19 |

74.97% Correct
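
The difference between the two measures is easy to see on a pair of made-up scores. The function names are hypothetical, and the `smoothing` term added to both scores is an assumption introduced only so that a shutout does not divide by zero; the post does not say how that case was handled.

```python
def margin_difference(winner_points, loser_points):
    """Numerical margin of victory."""
    return winner_points - loser_points

def margin_ratio(winner_points, loser_points, smoothing=1.0):
    """Ratio of the scores, smoothed so a shutout does not divide by zero."""
    return (winner_points + smoothing) / (loser_points + smoothing)

# Two hypothetical games with the same 7-point margin.
low_scoring = (10, 3)
shootout = (49, 42)

same_difference = margin_difference(*low_scoring) == margin_difference(*shootout)
low_scoring_ratio = margin_ratio(*low_scoring)
shootout_ratio = margin_ratio(*shootout)
```

Both games count the same under the difference measure, but under the ratio measure the 10-3 win is worth far more than the 49-42 shootout, so the ratio version rewards defense-heavy wins in a way the difference version does not.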

I had mixed feelings regarding this change. It is objectively better on this data, as the percentage of games in which the team with the higher ranking wins is noticeably higher (74.97% versus 73.94%), but I’m not sure that I like the top 20 as much. Michigan is ranked really high and Michigan State is really low. Stanford also sees a big drop when this system is employed.

The final remaining question is how these rankings perform on data they were not optimized on. I first tested the ratio-based algorithm on data from 2000 to 2008, with the first 4 weeks of the 2000 season reserved for linking teams and not predicting. It scored 71.91%, which isn’t bad on data it was not optimized for. The margin-of-victory algorithm, which initially scored over 1 percentage point lower on the optimized data, scored 72.39% on the test data. This makes me think that the ratio algorithm only looked better because it had been fit more closely to the data it was optimized on, and that the margin-of-victory algorithm is the better one overall. This fits nicely with my preferring how it ranks the top teams.
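
The tune-then-test routine described above could look something like the sketch below. The optimization program itself is not shown in this post, so an exhaustive grid search is only an assumption, and the parameter names, grid values, and toy scoring function are all invented for the example. In practice `score_fn` would be the percentage-correct figure on the tuning seasons.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Try every combination in grid ({name: [values]}) and keep the best.
    score_fn takes a {name: value} dict and returns a score to maximize."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Stand-in for "percentage correct on the tuning seasons":
    # peaks at power=0.5, weekly_keep=0.95.
    return -(params["power"] - 0.5) ** 2 - (params["weekly_keep"] - 0.95) ** 2

grid = {"power": [0.25, 0.5, 0.75, 1.0], "weekly_keep": [0.9, 0.95, 1.0]}
best_params, best_score = grid_search(toy_score, grid)
```

The values chosen this way should then be scored once on seasons that played no part in the search. A gap between the two scores, like the one seen with the ratio algorithm, is the sign that the tuning fit the particular seasons rather than football in general.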

In conclusion, I was able to rank college football teams using the PageRank algorithm and improve on the initial results. The advantage of this algorithm is that it doesn’t just rank the top teams; it ranks every college football team, giving you a complete picture of D1 college football. This certainly isn’t the best way to rank college football teams. There is a lot more data you could incorporate, and there are better algorithms that can predict over 80% of games. Still, for a relatively simple algorithm with simple code and very limited input data, I was surprised by how good the rankings it produced looked. Finally, here are the full rankings using the data from 2000-2015:

| Ranking | Team | Rank Value | AP Rank |
|---|---|---|---|
| 1 | Alabama | 7.224160011334014 | 1 |
| 2 | Clemson | 5.541906053956279 | 2 |
| 3 | Ohio State | 4.86248888787741 | 4 |
| 4 | Stanford | 4.048798083941409 | 3 |
| 5 | Mississippi | 3.973252186036484 | 10 |
| 6 | Oklahoma | 3.756654928603376 | 5 |
| 7 | Michigan State | 3.237406657839539 | 6 |
| 8 | Notre Dame | 3.221913319408799 | 11 |
| 9 | Houston | 3.177763500040661 | 8 |
| 10 | Florida State | 3.0934821629792317 | 14 |
| 11 | Texas Christian | 3.092014028511254 | 7 |
| 12 | Michigan | 3.051976231095965 | 12 |
| 13 | Louisiana State | 2.9558910834288925 | 16 |
| 14 | Baylor | 2.9218945398587692 | 13 |
| 15 | Tennessee | 2.907515844194595 | 22 |
| 16 | Oregon | 2.88614416602345 | 19 |
| 17 | Florida | 2.883365790200218 | 25 |
| 18 | Utah | 2.8830021406894355 | 17 |
| 19 | North Carolina | 2.870513934700063 | 15 |
| 20 | Southern California | 2.860817353677727 | |
| 21 | Arkansas | 2.693500813171901 | |
| 22 | Navy | 2.6815302920733544 | 18 |
| 23 | Mississippi State | 2.550840483045118 | |
| 24 | Western Kentucky | 2.427437234983055 | 24 |
| 25 | Iowa | 2.359895079041505 | 9 |
| 26 | Oklahoma State | 2.3493840033376503 | 20 |
| 27 | Washington | 2.345241812521774 | |
| 28 | Georgia | 2.2800274101179046 | |
| 29 | Texas A&M | 2.2799104320365173 | |
| 30 | Auburn | 2.248746145497043 | |
| 31 | Memphis | 2.106909587268683 | |
| 32 | West Virginia | 2.0988069709103474 | |
| 33 | San Diego State | 2.023316897548638 | |
| 34 | California | 1.9998083973800473 | |
| 35 | Northwestern | 1.9837198319074334 | 23 |
| 36 | UCLA | 1.9656033297508748 | |
| 37 | Arizona State | 1.9595482430554303 | |
| 38 | Toledo | 1.9571656509407265 | |
| 39 | Brigham Young | 1.9552740040435235 | |
| 40 | Boise State | 1.9449966931943656 | |
| 41 | Wisconsin | 1.943300723429565 | 21 |
| 42 | Temple | 1.9338290803668383 | |
| 43 | Bowling Green State | 1.914337512174002 | |
| 44 | South Florida | 1.8733294043883326 | |
| 45 | Louisville | 1.8465286317157759 | |
| 46 | Washington State | 1.831973936975508 | |
| 47 | Western Michigan | 1.7770625516483372 | |
| 48 | Georgia Southern | 1.7723632455828442 | |
| 49 | Miami (FL) | 1.7532979228899208 | |
| 50 | Pittsburgh | 1.7384524789672948 | |
| 51 | Air Force | 1.7170629563103361 | |
| 52 | Penn State | 1.7105386708605215 | |
| 53 | North Carolina State | 1.7010018222165484 | |
| 54 | Virginia Tech | 1.6813341796318055 | |
| 55 | Appalachian State | 1.6701452911658001 | |
| 56 | Texas Tech | 1.668766887400046 | |
| 57 | Nebraska | 1.658889391522711 | |
| 58 | Georgia Tech | 1.6212527640771435 | |
| 59 | Texas | 1.560087748177339 | |
| 60 | Arizona | 1.5546103998306493 | |
| 61 | Cincinnati | 1.4879752779616622 | |
| 62 | Connecticut | 1.4720027364068882 | |
| 63 | Utah State | 1.4689319339591713 | |
| 64 | Northern Illinois | 1.4449947491164377 | |
| 65 | Marshall | 1.439102131210634 | |
| 66 | Missouri | 1.4121176098813124 | |
| 67 | Southern Mississippi | 1.4103483277653197 | |
| 68 | Louisiana Tech | 1.4042627179262506 | |
| 69 | Minnesota | 1.401874949493287 | |
| 70 | Arkansas State | 1.398863043137954 | |
| 71 | Indiana | 1.3605801440877276 | |
| 72 | Duke | 1.3602521805821683 | |
| 73 | South Carolina | 1.3533591434975678 | |
| 74 | Kansas State | 1.336563547609276 | |
| 75 | Maryland | 1.3210492183852187 | |
| 76 | Iowa State | 1.2645859117547698 | |
| 77 | Boston College | 1.2592665972458104 | |
| 78 | Central Michigan | 1.256794452939677 | |
| 79 | Middle Tennessee State | 1.1953293648354104 | |
| 80 | East Carolina | 1.19208679260967 | |
| 81 | Illinois | 1.1597159393554373 | |
| 82 | Virginia | 1.1364454151279006 | |
| 83 | Syracuse | 1.1268991092129959 | |
| 84 | Ohio | 1.0948949721127397 | |
| 85 | Army | 1.0898953872771153 | |
| 86 | Kentucky | 1.0760839605482424 | |
| 87 | Vanderbilt | 1.0693634717891984 | |
| 88 | Purdue | 1.0582575603024897 | |
| 89 | Rutgers | 1.0523179868981378 | |
| 90 | Wake Forest | 1.0465479065689278 | |
| 91 | San Jose State | 1.0281746663360987 | |
| 92 | Oregon State | 1.02483602454208 | |
| 93 | Buffalo | 0.9778664454959232 | |
| 94 | Colorado State | 0.9724796739711302 | |
| 95 | Colorado | 0.95521838188361 | |
| 96 | Akron | 0.945023946610994 | |
| 97 | Georgia State | 0.9340434648976754 | |
| 98 | New Mexico | 0.9333227812426348 | |
| 99 | Tulsa | 0.9265686395895174 | |
| 100 | Troy | 0.9225759057018894 | |
| 101 | Nevada | 0.8660562258211573 | |
| 102 | Fresno State | 0.8405874154998786 | |
| 103 | Southern Methodist | 0.8345438627239115 | |
| 104 | Texas-San Antonio | 0.8336273636854139 | |
| 105 | Texas State | 0.8312894958813077 | |
| 106 | Wyoming | 0.8177168233290907 | |
| 107 | South Alabama | 0.8131296610246296 | |
| 108 | Kansas | 0.7628025025590193 | |
| 109 | Florida Atlantic | 0.7446616388347236 | |
| 110 | Ball State | 0.7359578644845043 | |
| 111 | Nevada-Las Vegas | 0.7353214985525114 | |
| 112 | North Texas | 0.7314461864717602 | |
| 113 | Tulane | 0.7159965270025683 | |
| 114 | Hawaii | 0.7087076832398298 | |
| 115 | Central Florida | 0.6952510236890954 | |
| 116 | Rice | 0.6810685798567235 | |
| 117 | Florida International | 0.6757691943756488 | |
| 118 | Old Dominion | 0.6617348410773847 | |
| 119 | Kent State | 0.657665592627139 | |
| 120 | Louisiana-Monroe | 0.6559556904609171 | |
| 121 | Texas-El Paso | 0.6536543700896079 | |
| 122 | New Mexico State | 0.6525230431779202 | |
| 123 | Massachusetts | 0.6259609514481066 | |
| 124 | Eastern Michigan | 0.6177975426182662 | |
| 125 | Louisiana-Lafayette | 0.6034732426264243 | |
| 126 | Miami (OH) | 0.5574022478633163 | |
| 127 | Idaho | 0.5410084836595543 | |
| 128 | Portland State | 0.49430950488609543 | |
| 129 | Charlotte | 0.426612271625464 | |
| 130 | Charleston Southern | 0.3248664329086697 | |
| 131 | Florida A&M | 0.30839512988517637 | |
| 132 | Wofford | 0.2838487833117854 | |
| 133 | Tennessee-Martin | 0.27863686732751586 | |
| 134 | Western Carolina | 0.27743796447851354 | |
| 135 | Liberty | 0.2751060860092764 | |
| 136 | North Dakota | 0.25195493995024537 | |
| 137 | North Dakota State | 0.25066216675454345 | |
| 138 | Alabama-Birmingham | 0.2501699083157457 | |
| 139 | Fordham | 0.23276890053878724 | |
| 140 | Tennessee Tech | 0.23168873120083538 | |
| 141 | Chattanooga | 0.22719931634103432 | |
| 142 | Yale | 0.22530299161852424 | |
| 143 | Holy Cross | 0.22350682276629513 | |
| 144 | Lamar | 0.22242672181406864 | |
| 145 | Citadel | 0.21921072403021805 | |
| 146 | Eastern Washington | 0.21784862985979864 | |
| 147 | Stephen F. Austin | 0.21453120865250358 | |
| 148 | Delaware | 0.2142078232116168 | |
| 149 | Northwestern State | 0.2137953512282769 | |
| 150 | Southern | 0.21175943067922767 | |
| 151 | Jacksonville State | 0.20987003609031457 | |
| 152 | South Dakota State | 0.20794142373153612 | |
| 153 | North Carolina A&T | 0.2058309531827044 | |
| 154 | James Madison | 0.20550494514202322 | |
| 155 | Illinois State | 0.2047194790637865 | |
| 156 | Murray State | 0.2045157703253057 | |
| 157 | Missouri State | 0.20431112389932138 | |
| 158 | Eastern Kentucky | 0.20356946516977809 | |
| 159 | Colgate | 0.2033479562469926 | |
| 160 | Norfolk State | 0.20125784726153623 | |

73.24% Correct