A somewhat surprising Top 100 game list, based on BGG ratings… AND SCIENCE!
Alex Wilson
Canada Waterloo Ontario
Generally speaking, things have gone about as far as they can possibly go, when things have gotten about as bad as they can reasonably get.
-
Another Top 100 list? This one does promise to be a little different, and hopefully, more than a little bit interesting. But before you hit the back button, let me tell you why this one is different:
The rankings listed here are based solely on the BGG rating data, but they have nothing to do with the average or Bayesian average of ratings, or with combining the average rating with the number of ratings, etc. They are based on the relative rankings of games by each user. From every user’s list of rated games, we can infer which games they think are better or worse than other games. From this, we can look at all the possible pairings of games and find the "most preferred games". This is not a list of games determined by just mass popularity or rating, but by how people compared a game to other games they have played.
I think the results might be a little surprising -- and interesting! But before we get to the results, let me show a small example of why there can be so much more to rankings than just averages of ratings...
Let’s look at a tiny subset of games, Agricola, Brass, Caylus, and Dominion. 5 imaginary gamers rank them, relative to each other, from best to worst, with their ratings shown in brackets:
Ellie: Agricola (9) > Brass (6) > Caylus (5) > Dominion (4)
Fred: Dominion (9) > Brass (7) > Caylus (6) > Agricola (5)
Giles: Agricola (9) > Brass (6) > Dominion (5) > Caylus (2)
Hank: Brass (9) > Caylus (8) > Agricola (7) > Dominion (5)
June: Brass (8) > Agricola (7) > Caylus (6) > Dominion (5)
According to this group, which is the best game? Using this system (it’s the Schulze method, for the curious), the answer is Brass. All of them think it is better than Caylus, three-fifths think it is better than Agricola, and four-fifths think it is better than Dominion. We did use the rating to determine the order of preference, but after that, the number is not needed.
But if we went by the ratings, the averages would have been A (7.4), B (7.2), C (5.4), D (5.6) -- Agricola wins even though 60% of the gamers like Brass better than it. By looking at the relative "do I like game X better than game Y" for each gamer instead of the ratings themselves, we got something that is a bit more telling about how the games relate to each other.
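For anyone who wants to check the five-gamer example by hand, here is a minimal Python sketch of it. It is an illustration only, not the code used to build the list; the function names (pairwise_counts, schulze_ranking, average_ratings) are made up for this example, and it assumes every gamer has rated every game.

```python
from itertools import permutations

# Ratings from the five imaginary gamers in the example above.
RATINGS = {
    "Ellie": {"Agricola": 9, "Brass": 6, "Caylus": 5, "Dominion": 4},
    "Fred":  {"Agricola": 5, "Brass": 7, "Caylus": 6, "Dominion": 9},
    "Giles": {"Agricola": 9, "Brass": 6, "Caylus": 2, "Dominion": 5},
    "Hank":  {"Agricola": 7, "Brass": 9, "Caylus": 8, "Dominion": 5},
    "June":  {"Agricola": 7, "Brass": 8, "Caylus": 6, "Dominion": 5},
}

def pairwise_counts(ratings):
    """Count, for each ordered pair (x, y), the users who rated x above y."""
    games = sorted({g for user in ratings.values() for g in user})
    prefer = {pair: 0 for pair in permutations(games, 2)}
    for user in ratings.values():
        for x, y in permutations(user, 2):
            if user[x] > user[y]:
                prefer[(x, y)] += 1
    return games, prefer

def schulze_ranking(games, prefer):
    """Rank games by Schulze strongest paths (widest-path Floyd-Warshall)."""
    # A direct link x -> y exists only if more users prefer x to y than y to x.
    strength = {(x, y): prefer[(x, y)] if prefer[(x, y)] > prefer[(y, x)] else 0
                for x, y in permutations(games, 2)}
    for k in games:            # k is the intermediate game on the path
        for i in games:
            if i == k:
                continue
            for j in games:
                if j in (i, k):
                    continue
                via_k = min(strength[(i, k)], strength[(k, j)])
                strength[(i, j)] = max(strength[(i, j)], via_k)
    # A game's score is the number of rivals it beats on path strength.
    wins = {x: sum(strength[(x, y)] > strength[(y, x)] for y in games if y != x)
            for x in games}
    return sorted(games, key=lambda g: wins[g], reverse=True)

def average_ratings(ratings):
    """Plain mean rating per game, for comparison."""
    games = sorted({g for user in ratings.values() for g in user})
    return {g: sum(u[g] for u in ratings.values()) / len(ratings) for g in games}

games, prefer = pairwise_counts(RATINGS)
ranking = schulze_ranking(games, prefer)
averages = average_ratings(RATINGS)
print(ranking)
print(averages)
```

On this data the pairwise ranking comes out as Brass, Agricola, Caylus, Dominion, while the plain averages put Agricola (7.4) ahead of Brass (7.2).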
The above scenario is a great example of how a bunch of people sitting around a table might decide what game they would like to play. But wouldn’t it be interesting if we could do that on a massive scale, like having all the BGGers sitting around the same giant table and voting on all their favourite games? Well, in a sense, we can -- we can just order the ratings of users already in the BGG database to get their relative preferences.
Since the system is always comparing pairs of games, it’s only looking at the scoring by users who have rated both of the games in question. This leads to a strong transitive property -- if a majority of the gamers think A is better than B, and B is better than C, the system will rank A better than C, even if there are few users directly comparing A and C and the averages work out a different way.
The transitive property really shows up with games that might have a smaller number of ratings, but do consistently better than "popular" titles -- which I believe is why there are a good number of war-games on the list. Or, it might be that the war-games just have a more consistent and agreed-upon "X is better than Y", which strengthens their results. I’m not sure.
The data used to build this list consisted of almost 2 million current ratings for nearly a thousand games, resulting in over 800,000 pairings. From that, the preferences are calculated, and the most preferred game is found. We remove it from the pool, and calculate preferences again to find the second game, and so on. The algorithm’s running time grows with the cube of the number of games being compared, so a run on the full data set takes a few days or longer. To speed it up, my solution was to randomly pick smaller subsets of the data, and take the top few from each to build a pool of 200 games to be compared in the final run-off. More specific details about the data collection and the ranking method are in the comments.
Games with high averages will usually do pretty well since they are more likely to beat other games in their one-to-one pairings of preference, but the resulting list is much different than one just sorted by average or Bayesian average.
Is this supposed to be the definitive list of "what is the best game?" I don’t think so, but I was pretty surprised at a number of the titles that appeared -- not the usual suspects on most Top 100 lists, but now I will be giving a lot of them a closer look. It’s fun to dive into the data and come up with something unexpected.
Are you surprised by the games on the list? Are they hidden gems or niche games?
Edit: I've started a blog (with what will hopefully be a series of posts) about mining and looking at BGG data: http://www.boardgamegeek.com/blog/1006/mining-the-geek
-

77.
Board Game: Yomi
[Average Rating:7.61 Overall Rank:183]

78.
Board Game: Luna
[Average Rating:7.51 Overall Rank:229]

80.
Board Game: TZAAR
[Average Rating:7.70 Overall Rank:147]

86.
Board Game: Goa
[Average Rating:7.75 Overall Rank:30]

87.
Board Game: Tichu
[Average Rating:7.72 Overall Rank:41]

95.
Board Game: Egizia
[Average Rating:7.56 Overall Rank:113]

96.
Board Game: London
[Average Rating:7.54 Overall Rank:106]

-Get a list of the top 1000 games based on BGG rank.
-Get a list of the top 1000 games based on total number of votes.
-Since my scripts were just dumb regex scrapers, I also got whatever was in the Hot 100 at the time (not a problem, "not-yet-shipping" games and such are weeded out in later steps, and they wouldn’t affect later results anyway if left in).
-Merge all those lists together, end up with a list of ~1200 games.
-Grab every rating for each of those games. This took about 8 hours to grab just over 2 million ratings, need to go multi-threaded if I do this again in future.
-Pump the data into a database, where I can fine-tune and do some calculations more easily.
-Remove games with fewer than 100 total ratings and users with fewer than 10 game ratings. Later, when trying to get the total size down a bit, I think I upped the threshold for games to 200 votes. The data set now covers 920 games and has 1.89 million ratings. This was my initial try at reducing the run time; in the future, we can probably just keep all these results, as the real time-optimization step comes later.
-The most elegant part: With a single SQL statement I can generate the pairings for each possible pair of games, X and Y, along with the number of users who rated X>Y. This generates 845,000 rows of data and takes about 2 hours. Not sure if the table/query can be optimized better. Could also be done outside of the database, but a single SQL statement wins for ease of use for now.
-Pump the pairings data out to a file.
-I started by implementing the Ranked Pairs method, which is based on the margin between pairs. I ultimately gave up on this, as testing showed it was going to be too slow with the full data -- both RP and Schulze are O(n^3), but the cycle detection in RP adds a second O(n^3) operation. I switched to the Schulze method, which also appears to be pretty slow (but faster than RP). Running it with all 920 games together would take several days to come up with the top 100.
-Since running the Schulze method on 100 items only takes ~10 seconds to decide on the winner (and less for each spot after that), I decided to run the method on random subsets of games -- take 100 games randomly from the full list, pick the top 10 from each via the Schulze method, and add them to the pool of winners. As the iterations grow, more repeat winners show up, so once a game has been in the winning list 3 times, we exclude it so it isn’t considered again until the final round. Once the pool reaches 200, run the method one last time on that pool (making sure to add in any games that never ran at all in the random rounds) to determine the final top 100 rankings from this "best of the best". This takes a few hours, depending on how zippy the machine in question is.
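The single-statement pairing step from the list above could look roughly like the self-join below. This is a sketch only: it runs against an in-memory SQLite database from Python rather than SQL Anywhere, and the table layout (ratings with username, game, rating columns) and the function name pairing_counts are assumptions made for illustration, not the original schema or query.

```python
import sqlite3

# One row per (user, game, rating); the five-gamer toy data from the main post.
ROWS = [
    ("Ellie", "Agricola", 9), ("Ellie", "Brass", 6), ("Ellie", "Caylus", 5), ("Ellie", "Dominion", 4),
    ("Fred", "Agricola", 5), ("Fred", "Brass", 7), ("Fred", "Caylus", 6), ("Fred", "Dominion", 9),
    ("Giles", "Agricola", 9), ("Giles", "Brass", 6), ("Giles", "Caylus", 2), ("Giles", "Dominion", 5),
    ("Hank", "Agricola", 7), ("Hank", "Brass", 9), ("Hank", "Caylus", 8), ("Hank", "Dominion", 5),
    ("June", "Agricola", 7), ("June", "Brass", 8), ("June", "Caylus", 6), ("June", "Dominion", 5),
]

# Self-join: pair up every two ratings by the same user, keep the pairs where
# the first game is rated strictly higher, and count users per (X, Y).
PAIRING_SQL = """
    SELECT a.game AS game_x, b.game AS game_y, COUNT(*) AS users_prefer_x
    FROM ratings AS a
    JOIN ratings AS b
      ON a.username = b.username AND a.game <> b.game
    WHERE a.rating > b.rating
    GROUP BY a.game, b.game
"""

def pairing_counts(rows):
    """Return {(game_x, game_y): number of users who rated X above Y}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ratings (username TEXT, game TEXT, rating REAL)")
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    counts = {(x, y): n for x, y, n in conn.execute(PAIRING_SQL)}
    conn.close()
    return counts

pairings = pairing_counts(ROWS)
for (x, y), n in sorted(pairings.items()):
    print(f"{x} > {y}: {n}")
```

Only users who rated both games contribute to a pair, and a pair that nobody rated in that order (here, Caylus over Brass) simply produces no row. On real data, an index on the user column would probably matter a lot for the join.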
For the curious, I did everything with Perl and SQL Anywhere.
While the stochastic approach isn’t in strict keeping with the Schulze method, it does serve to narrow the pool in a pretty reliable manner that picks games that will fare well in the final showdown, as games that win in one subset tend to show up as winners when they are in other subsets.
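A rough Python sketch of that random-subset approach, under stated assumptions: the function names (schulze_winner, schulze_top, pooled_top), the parameter names, and the synthetic demo data are all invented for illustration, and the defaults simply mirror the numbers quoted above (subsets of 100, top 10 per round, exclusion after 3 wins, a pool of 200, final top 100). It is not the original Perl.

```python
import random

def schulze_winner(games, prefer):
    """Return a Schulze winner among `games`; `prefer[(x, y)]` = users rating x above y."""
    def count(x, y):
        return prefer.get((x, y), 0)
    strength = {(x, y): count(x, y) if count(x, y) > count(y, x) else 0
                for x in games for y in games if x != y}
    for k in games:                      # strongest paths, O(n^3)
        for i in games:
            if i == k:
                continue
            for j in games:
                if j != i and j != k:
                    strength[i, j] = max(strength[i, j],
                                         min(strength[i, k], strength[k, j]))
    return max(games, key=lambda x: sum(strength[x, y] >= strength[y, x]
                                        for y in games if y != x))

def schulze_top(games, prefer, k):
    """Find the winner, remove it, recompute, and repeat for k places."""
    remaining, ranked = list(games), []
    while remaining and len(ranked) < k:
        winner = schulze_winner(remaining, prefer)
        ranked.append(winner)
        remaining.remove(winner)
    return ranked

def pooled_top(games, prefer, subset_size=100, per_round=10,
               pool_size=200, max_wins=3, final_k=100, seed=0):
    rng = random.Random(seed)
    wins = {}        # game -> number of rounds in which it placed
    sampled = set()  # games that appeared in at least one random round
    while len(wins) < pool_size:
        # Games that have already placed max_wins times sit out until the final.
        eligible = [g for g in games if wins.get(g, 0) < max_wins]
        if not eligible:
            break
        subset = rng.sample(eligible, min(subset_size, len(eligible)))
        sampled.update(subset)
        for g in schulze_top(subset, prefer, per_round):
            wins[g] = wins.get(g, 0) + 1
    # Final run-off: the pool plus any game that never got drawn at all.
    finalists = [g for g in games if g in wins or g not in sampled]
    return schulze_top(finalists, prefer, final_k)

# Synthetic demo: 30 games where a lower-numbered game always wins 60 to 40.
games = [f"G{i:02d}" for i in range(30)]
prefer = {(x, y): (60 if x < y else 40) for x in games for y in games if x != y}
result = pooled_top(games, prefer, subset_size=10, per_round=3,
                    pool_size=12, final_k=5)
print(result)
```

With data this tidy, the three strongest games always reach the final, because each one either places in every round it is drawn into or is never drawn and gets added at the end. Real rating data has cycles and near-ties, so games near the cutoff could plausibly come out differently from run to run.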
I’ve got a few different variations running, so I’m curious how consistent the results are when the methodology is tweaked.
More info on the Schulze method:
http://en.wikipedia.org/wiki/Schulze_method
I’m not using the refined proportional representation variant of the method here:
http://home.versanet.de/~chris1-schulze/schulze2.pdf
(mostly because I haven’t figured it out yet, and I’m not sure the problems it addresses have much bearing in a simulation like this)
Some other fun bits from the data...
Largest pairing (the two games with the most people that rated one higher than the other):
Puerto Rico over Carcassonne, by 7900 users
Largest margin (the two games with the most people that rated A higher than B minus the number of people who rated them the other way):
Puerto Rico over Carcassonne, by a difference of 6231 users
Largest 2-step chain (largest margins for pairs A>B, B>C):
Puerto Rico > Settlers of Catan > Risk
(a margin of over 5000 for each)
But I think it’s interesting that even these huge margins don’t do much for determining final ranking on the list.
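To make the "largest pairing" and "largest margin" definitions concrete, here is a small Python sketch that computes both from a table of pairing counts. The counts are the ones from the five-gamer toy example in the main post, not the real BGG data, and the function names are made up for illustration.

```python
# prefer[(x, y)] = number of users who rated x higher than y (toy data).
PREFER = {
    ("Agricola", "Brass"): 2, ("Brass", "Agricola"): 3,
    ("Agricola", "Caylus"): 3, ("Caylus", "Agricola"): 2,
    ("Agricola", "Dominion"): 4, ("Dominion", "Agricola"): 1,
    ("Brass", "Caylus"): 5, ("Caylus", "Brass"): 0,
    ("Brass", "Dominion"): 4, ("Dominion", "Brass"): 1,
    ("Caylus", "Dominion"): 3, ("Dominion", "Caylus"): 2,
}

def largest_pairing(prefer):
    """Pair (x, y) with the most users rating x above y."""
    pair = max(prefer, key=prefer.get)
    return pair, prefer[pair]

def largest_margin(prefer):
    """Pair (x, y) maximizing (users rating x above y) minus (users rating y above x)."""
    def margin(pair):
        x, y = pair
        return prefer[pair] - prefer.get((y, x), 0)
    pair = max(prefer, key=margin)
    return pair, margin(pair)

top_pairing = largest_pairing(PREFER)
top_margin = largest_margin(PREFER)
print(top_pairing)
print(top_margin)
```

In the toy data both measures pick Brass over Caylus (5 users, margin 5), just as both picked Puerto Rico over Carcassonne in the real data.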
If I get a chance, I’ll add some more data points to the games in the list, so check back.
Edit:
First of all, thanks to all for the great comments. I was really hoping this would spark an interesting discussion, and it certainly has.
Of special note, as
Please do take a look through the comments -- there's been a lot of wonderful discussion and feedback.
I'm generating the pairing data now for the case where unrated games count as a 0. I'm predicting that the most-rated games will fill the top of the list, but we'll see.
Next I'll do 5 -- which does make sense given the BGG rating guidelines, I think. The only problem is that some people use their own scoring criteria... the relative pairing thing handles that without problem for rated games, but it will be a little skewed by those who consistently rate differently from the guideline that 5 is an "Average game... take it or leave it".
What if you gave each unplayed game the average rating for that user? Hopefully that isn't too complicated to put in to the system. That way the unplayed games will be treated as "average" based on each person's rating methodology.
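As a sketch of what these fill-in options could look like in Python: the function name fill_unrated and its fill parameter are hypothetical, and it assumes every user has at least one rating (otherwise there is no personal average to fall back on).

```python
from statistics import mean

def fill_unrated(ratings, games, fill="user_mean"):
    """Give every user a rating for every game.

    fill="user_mean" uses that user's own average for unrated games;
    any number (e.g. 0 or 5) is used as a constant instead.
    """
    filled = {}
    for user, rated in ratings.items():
        default = mean(rated.values()) if fill == "user_mean" else fill
        filled[user] = {g: rated.get(g, default) for g in games}
    return filled

# Tiny made-up sample: Ann has not rated Caylus, Bob has rated only Caylus.
sample = {"Ann": {"Agricola": 8, "Brass": 6}, "Bob": {"Caylus": 4}}
all_games = ["Agricola", "Brass", "Caylus"]

by_mean = fill_unrated(sample, all_games)
by_zero = fill_unrated(sample, all_games, fill=0)
by_five = fill_unrated(sample, all_games, fill=5)
print(by_mean["Ann"])
print(by_zero["Ann"])
```

With the user-mean fill, Ann's unrated Caylus sits at 7, between her two rated games. With a fill of 0 it falls below everything she has rated, which is why the most-rated games would be expected to benefit from that variant.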