Total Recall, or: That Time We Disabled Ranked
Pre-season is a time for getting excited about what’s coming next in League, but it also provides us with a moment to reflect on things that happened over the last year. Ranked players, for example, may remember the Riven-related recall bug that popped up in July -- the one that caused a global shutdown of ranked queues.
What follows is the story of that bug and the teams across Riot who worked like hell to get it figured out.
All times in PDT.
Morning - July 15, 2015
A video quickly rises to the front page of the League of Legends subreddit. In it, a player demonstrates a massive bug with Riven: The “right” sequence of button presses allows her to instantly recall back to her Nexus, skipping the ability’s cast time. As teams at Riot are starting their day, reports start to trickle in regarding the bug and its potential impact.
Donna Mason, Release Manager
We got emails, pings, and people in person all at the same time. ‘Oh my god, have you seen this thing on Reddit?’
Scott Hansen, Live Producer
There was something on Reddit where someone posted a video of, ‘Hey, there was this weirdness when I was playing Riven.’
Tim Isenman, Live Producer
The first thing that we saw was that Reddit post, and that’s when we started to investigate. There were a few people on champion team looking at the issue already, as well.
Kacee Granke, Product Manager
The Riven issue immediately threw up caution flags.
Mark Sassenrath, Associate Game Designer
Someone comes by and says, ‘Hey, we have a Riven bug we need to hotfix. Can you fix this Riven bug?’
Donna Mason
So we went to go look at it, and we started trying to reproduce it. Our goal when this stuff comes in is always to figure out if this is a fluke, or if it’s something you can exploit for your personal gain. That’s always the line. If there’s a bug in the game, that’s not good. But if a bug gives you an unnatural advantage, that’s very, very bad. And a big part of that is, ‘Can I do it?’
Kacee Granke
We jumped on that and started attempting to reproduce it in-house. Luckily, the video made it very clear. Sometimes in cases like this it’s like, ‘Oh, shit, that’s definitely a bug,’ but we don’t know how to reproduce it.
Scott Hansen
Release QA was able to reproduce it pretty easily once they got it down. We (Live Production) weren’t – we’re not that good at the game. I’m only Silver.
Matthew Wittrock, Release QA
So I’m messing with it and finally I’m like, ‘Oh, I did it.’ And then I’m like, ‘Oh, now I can do it constantly. Now it’s easy. This is not good.’
Tim Isenman
I asked, ‘Given the information we have, are we going to disable Riven?’
Donna Mason
So we look at the info and we look at how rapidly the post is rising and the visibility on the video and we make the call to disable Riven, which was the only thing we knew about.
2:50 PM - July 15
Riven is disabled globally. As teams work on a fix, new reports surface both internally and externally that indicate the bug might apply to more than one champion.
Tim Isenman
After disabling Riven, I got a few pings from various Rioters saying that more videos are surfacing of people finding the recall bug on other champions. It was Yasuo, then Graves, then more and more.
Mark Sassenrath
People kept coming up, ‘This also happens on Shen. It also happens on X, Y, and Z,’ and the list just grew and grew.
Kacee Granke
Almost immediately afterward, we start getting reports from our QA that they could reproduce the bug with other champions.
Mark Sassenrath
Over the course of the day, as more reports from players come in, we start to realize, ‘Oh, this isn’t a Riven bug. This is an everything bug.’
Tim Isenman
When we were deep diving it we realized that the same exploit could apply to about a quarter of all of our champions. That’s when our hearts dropped into our stomachs.
Matthew Wittrock
Even then, we were still underestimating the scope of the problem. We thought it was just champions with specific abilities. At that point we didn’t realize it was every champion in the game.
Tim Isenman
And then we realized that any champion using Tiamat or Hydra could trigger the same effect. Now it applied to every champion.
Mark Sassenrath
It was like, ‘Oh, every champion can do this. We need to go really hard on not letting this break.’ Testing had to be very thorough.
Scott Hansen
Even without Hydra, it was 40-some-odd champions.
Donna Mason
There’s a sinking feeling you get when you realize it’s every champion. There’s nothing like it, when you’re just like, ‘Oh shit it’s all of them. What are we going to do?’
Matthew Wittrock
We have had examples of abusable bugs that weren’t actually beneficial. So, you’re abusing something but you’re actively losing the game for your team. In this case, it was very clear there was no downside to the exploit.
Donna Mason
We always want more data before we make really sweeping decisions. We looked at ‘How disruptive is it to actually do this?’ So, it’s a quick recall, maybe it’s not that bad. But then you look at how you trigger it and it’s like, ‘I can split push all day. I can be a jerk and just recall.’
Afternoon - July 15
The bug is getting bigger. Within Riot, teams start discussing how best to mitigate the possible damage of the recall bug going nuclear.
Scott Hansen
We started talking about, ‘What can we do to preserve the experience?’ It’s beyond disabling one champion or one item; we had to consider disabling ranked.
Tim Isenman
What we never want to do is cripple League in such a way as disabling like a quarter of the things that people use. That’s probably one of the worst things we could do aside from turning the game off entirely. So in cases like that our next best option is disabling ranked.
Matthew Wittrock
We were making a lot of decisions around this without full data, but there were a lot of things that met the minimum standard. It affects a ton of champions, it’s widespread, it’s damaging. We can’t just turn off one champ or item.
Kacee Granke
The question at that point was, ‘We can’t disable all the champions, so what else can we do?’
Tim Isenman
Weighing the pros and cons -- having everyone potentially exploit the bug in ranked or playing conservatively and theorizing that only a few people are actually aware of the bug -- we could maybe just wait to disable ranked for a while until the exploit got humongous visibility. So far most players only knew that Riven was affected by the issue.
Scott Hansen
When we first started talking about disabling ranked, we had the conversation about, ‘Okay, when do we disable ranked? Can we potentially get a fix out before the bug hits critical mass?’
Kacee Granke
We decided to wait to disable ranked until it becomes a real problem, and leave Riven disabled until the time came to turn off ranked.
Tim Isenman
We didn’t want to make an assumption that everybody knew about the issue and that everybody knew about the issue beyond Riven.
Donna Mason
Luckily because the Riven thing had come in first we had already started looking at it.
In an effort to reduce the potential spread of the bug and gather more information, Rioters reached out to Reddit mods to see if potential new reports could be looped into the existing video thread.
Kacee Granke
We talked about, ‘How can we minimize exposure while still keeping information flowing?’ We don’t control Reddit, we don’t control the forums. Wider exposure creates a higher risk for abuse, ruined games, and bad player experiences, but we want to see what players are seeing.
Tim Isenman
We reached out to the Reddit mods for help with consolidating any new bug posts into the original thread, but in that chain of communication there was a misunderstanding of what we wanted to do. Much to our dismay, we saw the posting of a stickied mega-thread, giving the greatest visibility of every single bug. Every video was posted right in the heart of the post.
Donna Mason
It’s a double-edged sword. The fact that we get very quick information is great, but the visibility the bug gets is unfortunate. People who had never seen it all of a sudden are trying it.
Kacee Granke
That was basically an immediate, ‘We have to disable ranked at this point.’ Not only was it listing all the champions, but it was giving clear reproduction steps.
After the Reddit post explodes, so does the awareness of the bug. It quickly spreads into other LoL regions.
Donna Mason
It’s not something we ever want to do, but the potential benefit of exploiting the bug was really high. We had to assume players would do it, especially in ranked.
Kacee Granke
Not everybody is going to be using the bug, but if it catches on it’s going to be terrible for the player experience.
Tim Isenman
This is one of the larger issues Riot has ever faced on our live environment. Ranked is like the endgame for many players; when that’s taken offline, you lose a tremendous sense of purpose.
Donna Mason
We always ask, ‘What would it feel like if someone did this bug to me?’
Matthew Wittrock
There’s definitely a player understanding. No one is going to be happy, but people aren’t going to be upset when you’re trying to preserve the competitive experience.
Kacee Granke
If players are thinking, ‘I play ranked, I put my heart and soul into this, and I’m just playing against cheaters,’ that ruins people’s desire to play competitive games. So we turned off ranked, and we turned Riven back on.
5:30 PM - July 15
With ranked queues officially disabled, Rioters work frantically to find the cause of the bug, get it solved, and push the fix through the QA testing process.
Scott Hansen
There’s two things happening in parallel here. There’s figuring out how to keep players informed the best we can and communicate with them in the 20-plus languages that they speak, and there’s actually getting the problem fixed.
Tim Isenman
The champion team started working on some scripting rewrites.
Kacee Granke
Once we started to realize the scope, there was a good amount of time where we were just trying to figure out what to even fix. It took a few hours to come up with a first pass.
Matthew Wittrock
I remember getting together with the various teams, and it’s very much, ‘Here’s what we know,’ and ‘What should we do,’ then, ‘Cool, everyone go do things.’ And that’s sort of exciting. This thing might be real bad, but we have a plan and we’re moving quickly.
Kacee Granke
We had a couple of band-aids in place almost right away that we were testing internally. We kept thinking we were there, but then we would find a way to break it or realize it would cause some weird side effect.
Mark Sassenrath
The scope got larger and larger throughout the day.
Donna Mason
We knew we were in trouble when four iterations into the fix we were still finding problems.
Kacee Granke
There was a case where we fixed it, but if you used a health pot it would cancel the recall, whereas in the past using a health pot wouldn’t cancel recall. That type of change in functionality isn’t kosher, because it completely ignores player expectations.
Mark Sassenrath
The detailed steps in player videos helped a lot. Instead of having to do three hotfixes over the course of a few days, we were able to get it fixed much quicker. It was really valuable that players did that.
Donna Mason
It was really interesting to test, because we need to know that testers know how to reproduce the bug. So we go into a custom game and practice, and then test it in the test build. If you’re not good at executing the bug, we can’t trust the test results.
Scott Hansen
If we can get a clear set of, ‘This is how you do this thing, this is how it works,’ that makes our job so much easier in terms of finding out who can go and get that problem fixed.
Tim Isenman
We waited until we had something we felt really good about and sent it off to testing.
Donna Mason
Five iterations in, we think we get the fix. We send it to QA and the test plan involved basically testing every ability in the game, every champion, every item, etc.
Kacee Granke
Because of build and deploy times, we knew we weren’t going to find out if it worked until the next day.
Tim Isenman
When we have any type of change to the live environment, there’s a really rigorous process that every change goes through in order for us to put it out with confidence. It needs to go through preliminary internal testing and peer review, and then we need to push it through destructive testing with our global QA teams. Those types of turnarounds usually take 12 hours, at a minimum.
Scott Hansen
We posted a message saying, ‘Okay, in 12 hours, we’ll let you know where we’re at.’ We assumed around 4-5 a.m. we’d know whether the fix worked or not.
Donna Mason
Around 10-11 p.m., we sent a lot of people to bed. We let the QA team know to wake us all up if the fix failed.
2:00 AM - July 16
The teams wait anxiously for the results of extensive destructive testing. If the bug fix fails, players could be looking at another 12 hours without ranked -- 8.5 hours have already passed since it was initially disabled.
Tim Isenman
Around 2 a.m., we learn that the fix did not work, and that we had to reevaluate and pretty much start from scratch. We called everyone back in to figure out what went wrong with the first fix, make the change, and then re-submit it.
Donna Mason
We all got paged. We woke up the engineers, the design people -- everyone wakes back up. All of the involved teams.
Mark Sassenrath
You know something went wrong if your phone is ringing at 2 a.m. Either you missed something, or your fix broke something else.
Donna Mason
They wouldn’t call us if it were good. They wouldn’t call to say, ‘Hey, the fix is fantastic! How are you?’
Tim Isenman
I actually don’t know if Mark Sassenrath ever left the office…
Mark Sassenrath
We thought we had it fixed. We deployed to the testing environment. Somewhere around 3 a.m., we hear that something is still wrong. By around 4 a.m. I was back in the office working on a new fix. At that point I was basically dead.
Kacee Granke
We got the results back and it was still broken. We submitted another fix and then waited through the same process.
Tim Isenman
The new version of the fix went into the QA cycle from scratch. But in addition to starting over, we also had to go back and make sure that every broken case from the first fix was then fixed with the second try, so our workload widened.
Donna Mason
At that point, we not only test the new issues we’ve found, we re-test the things we have already tested.
Mark Sassenrath
We didn’t want to break recall’s intended uses. We tried not to go too crazy with the fix at first. There are all sorts of actions you’re allowed to take during recall and we didn’t want to break any of them.
Afternoon - July 16
Ranked has been disabled for nearly 19 hours. The involved teams struggle to update players on when ranked will be turned on without a clear timeline on the fix.
Tim Isenman
We had to re-communicate with players that testing was still in progress. We let them know we’d update them within a couple of hours, giving us enough room to screw it up a couple more times. Every time we asked QA for an update the window for completion seemed to extend by two hours. We didn’t know how long it would take.
Scott Hansen
We had conversations about, ‘What do we tell players?’ We didn’t want to set a timeline that’s really far out just to be safe, but we also didn’t want to set unrealistic expectations. We ended up going with ‘soon,’ which honestly isn’t ideal.
Tim Isenman
Around 4 p.m. we hear that about 80-90% of our champions have been successfully tested against the issue. So we’re feeling pretty good about it, but we had a hard time communicating that to players. There’s still a chance the fix won’t work for a few champions, which means we’d end up rolling back the fix again and re-verifying it over another 10-hour period. So we kind of kept quiet.
5:00 PM - July 16
Roughly 24 hours after the initial ranked shutdown, the final bug fix is verified across all champions. At this point, the teams start the process of rolling out the fix globally and making sure ranked is re-enabled across all regions.
Tim Isenman
I think it was 5 p.m., we get the confirmation that we have a 100% success rate with the fix. We had preemptively staged and prepped the new game server package just to have a one-button deployment to live. The deploy train was ready, so region by region we pushed out the fix.
Donna Mason
Because we have to touch game servers in every data center all over the world, it is pretty time consuming. But we were moving fast.
Matthew Wittrock
It was one of the faster turnaround times we’ve had. Everyone was very ready.
Tim Isenman
Right at the end, Vietnam was taking a really long time. We’re all standing around the monitor waiting to hear back on the fix. And this dude pings back, ‘I still have it.’ And we almost lost it.
Matthew Wittrock
That’s the real fear. As much as we’ve tested everything we can think of, we’re still waiting for the post that says, ‘Hey this bug’s still here!’
Tim Isenman
We said, ‘Exit the game and try it again.’ So he does, and we’re waiting. And waiting. And finally he’s like, ‘I can’t do it. I think it’s fixed.’ It was a serious moment of panic, because if it didn’t work in the one region it didn’t work in any.
8:00 PM - July 16
Ranked is fully enabled 28 hours after being taken offline. The bug is officially squashed.
Tim Isenman
We kept ranked disabled for about five minutes so we could test the fix internally and verify that it was good to go per server, then we re-enabled ranked and let players know everything was back online. The demon was defeated.
Matthew Wittrock
At that point, it’s way too late to be worried. If something’s wrong you’re going to find out about it in a couple of hours. You have to move on to the next thing. The train keeps going.
Donna Mason
I’m like, ‘I’m gonna go play some ranked.’ We’re all players, there’s that sense, ‘Thank god, everything is okay again.’
Kacee Granke
And then it’s back to work.
Mark Sassenrath
We can’t rest on our laurels about it. There’s still lots of work to do.
Nobody likes in-game bugs, but eliminating them completely from a game as complex as League is a pretty big challenge (one we work every day to meet). When major bugs do pop up, multiple teams at Riot work together to find a solution as quickly and effectively as possible, with as little game interference as we can manage. In the case of the Great Recall Bug of 2015, dozens of people contributed long hours to solving the problem and getting players back into ranked.