Ranked placements & soft resets, ELO hell, and other MMR and rank considerations - A Vulcan perspect

Bademeister·2/22/2016, 7:03:34 PM·1 votes·1,048 views

TL;DR – Using a team win/loss-based system to determine the rank of individual players leads to skewed results when a large player population and large skill variance are combined with a steep but decreasing-slope learning curve and a low total number of games per player. The lower the individual player’s rank, the more skewed the results. In random team settings, counting wins/losses exclusively must lead to toxic player behavior. “Soft resets” at the beginning of competitive seasons amplify both problems and should in all cases be avoided. The system can further be improved by implementing one or more qualification level(s) towards the lower rank tiers, and by offsetting the skewness along the learning curve when matchmaking.

Dear Rioters, dear community,

With the new season’s ranked placements more or less over and their result in (as usual, a devastating “you dropped at least a full tier”), I wanted to take the opportunity to lay out, in a more or less logical fashion (let’s say: half-human, half-Vulcan), some key issues with the current system of ranking and matchmaking in League of Legends. This applies to the classic competitive game version of Summoner’s Rift only.

For the purpose of the following line of argumentation, let me define a couple of terms first:

“ELO” – a generic and highly theoretical term used here for the actual, objective, real, indisputable average skill level of a human player relative to all other human players
Rank – the tier and division rank and LP level as given by Riot (if one plays ranked)
MMR – the skill level of a player relative to all other players as defined by Riot’s algorithms and as used by Riot during ranked matchmaking
Riot – generally refers to the rules of the system “League of Legends”, not to individuals or their opinions or work, and absolutely no offense is meant to anybody who actually works at Riot, or to the company in general, or any other living being whatsoever.

Also, I am making the following assumptions about the game and the ranking system:

The total player population is large – as of patch 6.2 (according to lolsummoners.com), there are around 1.7 million players on the NA server alone
Compared to the total population, an individual player has a low average number of games – an active player will be able to play maybe 50 ranked games per week, over the course of a season that lasts around 40 weeks, which leads to 2,000 games per season. 10 placement matches are used initially to determine starting rank for each player
Each player’s ELO over time increases along a steep, but decreasing slope learning curve – in the beginning, the improvements are fast and large (e.g., realizing that dying after getting a kill is most often worse than not getting the kill and staying alive, that objectives matter more than kills, etc.). Over time, the rate of improvements decreases
Individual game outcomes are determined by the ELO (see above definition) of the individual players only. The impact of chance / random events is marginal – while there is a certain element of chance, we will assume it can be neglected and that the real ELO of all players combined decides which team wins (at least on average, not counting events such as tilt, technical difficulties, etc.)
Matchmaking happens in a way that tries to “match” average ELO of teams – within certain boundaries and making certain tradeoffs (e.g., you don’t want players to wait for hours to have the “perfectly matched ELO” game, but want them to play quickly and accept a slight difference in ELO levels per team), Riot tries to match two teams that have combined or average ELO values (i.e. the sum or average of their individual players’ ELO values) that are similar. For the sake of matchmaking, the argument survives if we assume MMR = ELO
The negative impact on the game of a player who is relatively “worse” than the opposing team’s average ELO is stronger than the positive impact of a player who is relatively “better” than his team’s average ELO – now, this is an important assumption, but should be obvious to all players of League of Legends: eventually, gold and objective control are critical to win a game. If a player “makes mistakes constantly”, gets killed, etc. (because he is worse than the opposing team), the increase in gold and objectives for the opposing team can quickly get to a point where “snowballing” occurs. On the other hand, a significantly higher skill player alone cannot as easily create kills or take objectives for his own team, if the opposing team avoids making mistakes. In addition, a “worse” player boosts the performance of all 5 players on the opposing team, whereas a good player only lifts the performance of the 4 other players on his own team (because his own performance remains unaffected by himself, obviously)

With the above, some consequences of the system should be immediately obvious:

If you consider only the large player population, the number of games per player, and the high variance of skill levels, simple statistics (see “Law of large numbers” and “Variance”) tells you that it will require a sufficiently large number of games for individual players to regress towards a rank that accurately reflects their ELO. 10 games are clearly not sufficient and will lead to more or less random results. In addition, the actual effect of the placement matches for people who have played previous seasons is simply to toss everybody down one full tier and have them climb back up, which is much worse for players after they have already climbed up in the previous season, due to the following…
If you have a decreasing slope learning curve and use average ELO to compose two teams of five randomly selected players each, the best player of the 10 will be closer to the average ELO than the worst player of the 10 (because of the decreasing slope). Therefore, the best player cannot compensate for the worst player, if they are on opposing teams. Now, if you further agree with my above assumption that the impact of the relatively worse player is even stronger than the relatively better player’s, you amplify this problem further. This means that the lower you are along the learning curve, the higher the impact of one randomly selected “worse” player on your team. And, consequently, the higher the probability of a “loss” due to random selection of that worse player.
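
To make the learning-curve point concrete, here is a minimal sketch. It assumes (purely for illustration) that skill grows like the logarithm of ranked games played and that a lobby contains ten players with evenly spread experience. The function names and the numbers are hypothetical, not anything Riot uses.

```python
import math

def skill(games_played):
    # ASSUMPTION: a concave ("steep, then flattening") learning curve.
    # log1p is only a stand-in for the real, unknown curve.
    return math.log1p(games_played)

def gaps_to_average(experience):
    """Return (best player's gap above average, worst player's gap below)."""
    skills = [skill(g) for g in experience]
    average = sum(skills) / len(skills)
    return max(skills) - average, average - min(skills)

# Hypothetical lobby: ten players with 0, 100, ..., 900 ranked games.
lobby = list(range(0, 1000, 100))
best_gap, worst_gap = gaps_to_average(lobby)

print(f"best player is  {best_gap:.2f} above the lobby average")
print(f"worst player is {worst_gap:.2f} below the lobby average")
```

With any concave curve and evenly spread experience, the worst player sits at least as far from the average as the best player does, which is the asymmetry the argument relies on.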

These two things combined lead to a high degree of frustration, especially for players who have improved in their previous season: they experienced in the past that the degree of “randomness” on game outcomes had decreased and they were honing their skills further. However, at the start of the new season, they are thrown back into significantly “increased” levels of randomness (because, again the random “bad player” makes it hard for them to win and play according to their potential). Not necessarily, of course, because the high variance still means they can “be lucky” and make it back into their previous tier in no time. Those are the players who state “ELO hell is a myth”, ignoring the fact that they simply got lucky (or, rather, not unlucky, if you follow my argumentation).

Regardless of that frustration, given the Law of large numbers, the accuracy of the system is significantly reduced by resetting the number of games used for calculating an individual player’s rank. It provides no benefits. Therefore, this mechanism needs to be removed or its impact significantly reduced – OR it should be communicated clearly that the intention of season resets is to “get people to play more”… economically, an understandable (and don’t get me wrong: fully acceptable!) consideration, that, however, should be made explicit to avoid further frustration of players by feeling “cheated on”.

Now, you can take the assumptions above, create a Monte Carlo simulation model (a hint: this will require the use of a Chi distribution function… and will get complicated), and find out exactly what the likelihood is that you get good results over 10 games, or 50 or 100 or 1,000. Or, you can use common sense, and realize that it is very unlikely that the results are “any good” before you have reached the mid-point of a season. By resetting everything at the end of the season, you are re-creating a less accurate representation, where individual players’ rank is determined largely by chance for the majority of the time they are playing, and only for a small window at the end of the season can they hope to play with others at their skill level. AND, the lower people are when they start, the more likely it is that randomness, not their own performance, determines their climb pace.
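
For a rough feel of the numbers, here is a deliberately simplified Monte Carlo sketch. It is not the full model described above: it skips the skill distribution entirely and assumes a single player with a fixed, hypothetical "true" win probability of 55% and independent games. It then measures how much the observed win rate scatters after 10, 50, 100 and 1,000 games.

```python
import random
import statistics

def winrate_spread(true_win_prob, n_games, n_trials=1000, seed=0):
    """Standard deviation of the observed win rate over many simulated
    runs of n_games each.

    ASSUMPTIONS: every game is an independent coin flip with the same
    win probability; team composition, tilt and improvement over time
    are all ignored.
    """
    rng = random.Random(seed)
    observed_rates = []
    for _ in range(n_trials):
        wins = sum(rng.random() < true_win_prob for _ in range(n_games))
        observed_rates.append(wins / n_games)
    return statistics.pstdev(observed_rates)

# Hypothetical player who "deserves" to win 55% of games.
spreads = {n: winrate_spread(0.55, n) for n in (10, 50, 100, 1000)}
for n_games, spread in spreads.items():
    print(f"{n_games:>5} games: observed win rate varies by about ±{spread:.1%}")
```

Under these assumptions the scatter shrinks roughly with the square root of the number of games, so after 10 placement matches the noise in the observed win rate is far larger than the 5-point edge this player actually has.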

Now, let me add a psychological point: Riot only counts “wins and losses of teams” as metrics for individual performance, and individual players (along the lines of the “worse players’ increased impact”, as discussed above) have the ability to throw games for their whole team. An individual can create negative outcomes for the team. However, because the individual is a troll, on tilt, or generally does not care for wins/losses, they do not “emotionally” share the negative consequences with their team. More often than not, they even laugh in their team’s face. This is a special case of something called “Free-rider behavior”, and something the human brain is prone to try to punish. However, in the League of Legends system, while playing ranked, the only means available to punish a team mate is to “flame” them in chat – you cannot even report them for “playing bad”, this is not what the system allows for. Therefore, by avoiding other metrics that are more related to the individual, you are creating toxic behavior (flaming is the only means the other players have left to do what humans want to do, namely punish free-riders).

Especially now, in Season 6, with New Champ Select’s position selection (and therefore tracking) ability, it is more easily possible than ever to develop a certain number of skill predictor variables that exist in addition to how many games are won. For example, how often did people die, how many assists did they have (different by position, obviously, and compared with the total kills of their team), how much CS by what time, etc. One such metric very clearly is NOT(!) “KDA”, which has almost zero objective informational value other than “it is super easy to calculate”… By the way, I strongly suggest it be banned from being broadcast / mentioned by casters in professional play, to not further encourage people to believe in this idiotic number… Need an example? Mid APC and ADC have the same KDA value of 4… who did better? Did they perform equally well? What if the APC had 3-1-1, and the AD had 4-4-12? Can you even tell without more details, matchups, overall kills for the game, etc.? What if the APC was Orianna? And the total number of kills of the AD’s team was 13?
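
The KDA example above can be checked in a few lines. This sketch assumes the usual (kills + assists) / deaths formula, with deaths floored at 1 to avoid dividing by zero; the function name is just for illustration.

```python
def kda(kills, deaths, assists):
    # ASSUMPTION: the common (K + A) / D definition, with deaths
    # floored at 1 so a deathless game does not divide by zero.
    return (kills + assists) / max(deaths, 1)

apc_line = (3, 1, 1)    # mid APC from the example: 3 kills, 1 death, 1 assist
adc_line = (4, 4, 12)   # ADC from the example: 4 kills, 4 deaths, 12 assists

print(kda(*apc_line))  # 4.0
print(kda(*adc_line))  # 4.0
```

Two very different games collapse into the same number, and nothing in that number tells you about role, matchup or how many kills the team had in total.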

Now, here are changes I propose to improve the situation:

Change average MMR for teams to reflect the skewness of the learning curve and the stronger impact of the negative players – at all times, this will result in more evenly matched teams and more fun games. This is a task for a team of really good, smart statisticians and math nerds (or for the community, I am sure we can solve this, too – I personally just do not have the required software to model the probability distribution correctly… and likely also not the statistics skill to model the required components such as the aforementioned distribution correctly)
Establish a qualification layer (or multiple such layers), which is not passable by “wins and losses” alone, but requires additional, other, more individual qualifications – for example, require people to have at least three Mastery-level 4 or 5 champions if they want to queue up for certain roles after Silver (and make only those champs selectable). Develop success metrics that are related to certain roles that people have to consistently beat (on average) in order to qualify for the next tier. Once that qualification is made, people cannot drop below the qualification tier until the next season, when they would have to repeat their qualification (which is based on their own performance, not on random team wins/losses). Beyond the qualification tier (or tiers, if you want, you can of course have multiple of those), the normal system applies, to ensure that really in the end the main focus is on wins and losses most of the time…
…but even if you don’t like this, there are some easy fixes that will help alleviate the “win/loss only” problem: start incorporating number of ranked games played into MMR, and make pick order in each team such that the best player picks last. The number of games is a good indicator about general game understanding. If you have two teams, and one team has 2,500 total ranked games played, and the other has 250 (which happens all the time, by the way), the likelihood of the first team winning is significantly higher (again, on average… there can always be 5 smurfs). They simply understand the game that much better. And in random teams, especially at lowest tiers, where absolutely no team work or communication happens that is taken seriously, the best player needs to be allowed to pick after his team has picked, so he can maximize the little impact he has on the game (e.g., by correctly complementing the team composition)
Establish success metrics for individual players, which make sense, are clearly visible to others, and lead to “individual punishment” (i.e., if you are the player who feeds 15 kills by minute 10 to the enemy mid laner, it shows somewhere… so YOU get punished individually as well, and your team “only” gets the loss) – this will reduce toxicity immensely, as people know that “somewhere, this will mean something bad for that a…hole”
Remove “you drop a full tier after the new season starts” – it has zero benefits, works counter to the system employed, and leads to high and unnecessary levels of frustration.
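
As a starting point for the first proposal (a team rating that reflects the skew), here is one possible sketch. It swaps the plain average for a harmonic mean, which is pulled towards the weakest player. That choice is only an assumption made for illustration; the right weighting would have to come out of the statistical modelling mentioned above. The MMR values are made up.

```python
from statistics import harmonic_mean, mean

def plain_team_rating(mmrs):
    # What the post assumes matchmaking roughly does today: a simple average.
    return mean(mmrs)

def skew_aware_team_rating(mmrs):
    # ASSUMPTION: harmonic mean as a stand-in for "weak players count more".
    # It never exceeds the plain average and drops as the spread grows.
    return harmonic_mean(mmrs)

# Hypothetical teams with the same plain average (1500).
even_team = [1500, 1500, 1500, 1500, 1500]
uneven_team = [1900, 1600, 1500, 1500, 1000]

for name, team in (("even", even_team), ("uneven", uneven_team)):
    print(f"{name:>6}: plain {plain_team_rating(team):.0f}, "
          f"skew-aware {skew_aware_team_rating(team):.0f}")
```

A matchmaker comparing the plain averages would call these two teams equal, while the skew-aware rating marks the team carrying the 1000-MMR player as weaker, in line with the assumption that one bad player hurts more than one good player helps.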

Oh, and lastly, to all those lucky bastards out there who didn’t experience 5 games in a row with the jungler Vi without smite on their own team, complemented by Ashe top: yes, “ELO Hell” does exist, at all levels. It is the worse the further down the learning curve people are. But the variance is high enough, and everybody thinks that “THEY are of course great”, that there will always be lucky people who didn’t have to suffer through it, got out quickly, and are impacted less and less over time. It really depends a hell of a lot on the very early games everybody plays per season… And the lower you are, the more "random" the results of those games will be.
