Total Recall, or: That Time We Disabled Ranked

By Rumtumtummers

Pre-season is a time for getting excited about what’s coming next in League, but it also provides us with a moment to reflect on things that happened over the last year. Ranked players, for example, may remember the Riven-related recall bug that popped up in July -- the one that caused a global shutdown of ranked queues.

What follows is the story of that bug and the teams across Riot who worked like hell to get it figured out.

All times in PDT.

Morning - July 15, 2015

A video quickly rises to the front page of the League of Legends subreddit. In it, a player demonstrates a massive bug with Riven: The “right” sequence of button presses allows her to instantly recall back to her Nexus, skipping the ability’s cast time. As teams at Riot are starting their day, reports start to trickle in regarding the bug and its potential impact.

Donna Mason, Release Manager

We got emails, pings, and people in person all at the same time. ‘Oh my god, have you seen this thing on Reddit?’

Scott Hansen, Live Producer

There was something on Reddit where someone posted a video of, ‘Hey, there was this weirdness when I was playing Riven.’

Tim Isenman, Live Producer

The first thing that we saw was that Reddit post, and that’s when we started to investigate. There were a few people on champion team looking at the issue already, as well.

Kacee Granke, Product Manager

The Riven issue immediately threw up caution flags.

Mark Sassenrath, Associate Game Designer

Someone comes by and says, ‘Hey, we have a Riven bug we need to hotfix. Can you fix this Riven bug?’

Donna Mason

So we went to go look at it, and we started trying to reproduce it. Our goal when this stuff comes in is always to figure out if this is a fluke, or if it’s something you can exploit for your personal gain. That’s always the line. If there’s a bug in the game, that’s not good. But if a bug gives you an unnatural advantage, that’s very, very bad. And a big part of that is, ‘Can I do it?’

Kacee Granke

We jumped on that and started attempting to reproduce it in-house. Luckily, the video made it very clear. Sometimes in cases like this it’s like, ‘Oh, shit, that’s definitely a bug,’ but we don’t know how to reproduce it.

Scott Hansen

Release QA was able to reproduce it pretty easily once they got it down. We (Live Production) weren’t – we’re not that good at the game. I’m only Silver.

Matthew Wittrock, Release QA

So I’m messing with it and finally I’m like, ‘Oh, I did it.’ And then I’m like, ‘Oh, now I can do it constantly. Now it’s easy. This is not good.’

Tim Isenman

I asked, ‘Given the information we have, are we going to disable Riven?’

Donna Mason

So we look at the info and we look at how rapidly the post is rising and the visibility on the video and we make the call to disable Riven, which was the only thing we knew about.

2:50 PM - July 15

Riven is disabled globally. As teams work on a fix, new reports surface both internally and externally that indicate the bug might apply to more than one champion.

Tim Isenman

After disabling Riven, I got a few pings from various Rioters saying that more videos are surfacing of people finding the recall bug on other champions. It was Yasuo, then Graves, then more and more.

Mark Sassenrath

People kept coming up, ‘This also happens on Shen. It also happens on X, Y, and Z,’ and the list just grew and grew.

Kacee Granke

Almost immediately afterward, we start getting reports from our QA that they could reproduce the bug with other champions.

Mark Sassenrath

Over the course of the day, as more reports from players come in, we start to realize, ‘Oh, this isn’t a Riven bug. This is an everything bug.’

Tim Isenman

When we were deep diving it we realized that the same exploit could apply to about a quarter of all of our champions. That’s when our hearts dropped into our stomachs.

Matthew Wittrock

Even then, we were still underestimating the scope of the problem. We thought it was just champions with specific abilities. At that point we didn’t realize it was every champion in the game.

Tim Isenman

And then we realized that any champion using Tiamat or Hydra could trigger the same effect. Now it applied to every champion.

Mark Sassenrath

It was like, ‘Oh, every champion can do this. We need to go really hard on not letting this break.’ Testing had to be very thorough.

Scott Hansen

Even without Hydra, it was 40-some-odd champions.

Donna Mason

There’s a sinking feeling you get when you realize it’s every champion. There’s nothing like it, when you’re just like, ‘Oh shit it’s all of them. What are we going to do?’

Matthew Wittrock

We have had examples of abusable bugs that weren’t actually beneficial. So, you’re abusing something but you’re actively losing the game for your team. In this case, it was very clear there was no downside to the exploit.

Donna Mason

We always want more data before we make really sweeping decisions. We looked at ‘How disruptive is it to actually do this?’ So, it’s a quick recall, maybe it’s not that bad. But then you look at how you trigger it and it’s like, ‘I can split push all day. I can be a jerk and just recall.’

Afternoon - July 15

The bug is getting bigger. Within Riot, teams start discussing how best to mitigate the possible damage of the recall bug going nuclear.

Scott Hansen

We started talking about, ‘What can we do to preserve the experience?’ It’s beyond disabling one champion or one item; we had to consider disabling ranked.

Tim Isenman

What we never want to do is cripple League in such a way as disabling like a quarter of the things that people use. That’s probably one of the worst things we could do aside from turning the game off entirely. So in cases like that our next best option is disabling ranked.

Matthew Wittrock

We were making a lot of decisions around this without full data, but there were a lot of things that met the minimum standard. It affects a ton of champions, it’s widespread, it’s damaging. We can’t just turn off one champ or item.

Kacee Granke

The question at that point was, ‘We can’t disable all the champions, so what else can we do?’

Tim Isenman

Weighing the pros and cons -- having everyone potentially exploit the bug in ranked or playing conservatively and theorizing that only a few people are actually aware of the bug -- we could maybe just wait to disable ranked for a while until the exploit got humongous visibility. So far most players only knew that Riven was affected by the issue.

Scott Hansen

When we first started talking about disabling ranked, we had the conversation about, ‘Okay, when do we disable ranked? Can we potentially get a fix out before the bug hits critical mass?’

Kacee Granke

We decided to wait to disable ranked until it becomes a real problem, and leave Riven disabled until the time came to turn off ranked.

Tim Isenman

We didn’t want to make an assumption that everybody knew about the issue and that everybody knew about the issue beyond Riven.

Donna Mason

Luckily because the Riven thing had come in first we had already started looking at it.

In an effort to reduce the potential spread of the bug and gather more information, Rioters reached out to Reddit mods to see if potential new reports could be looped into the existing video thread.

Kacee Granke

We talked about, ‘How can we minimize exposure while still keeping information flowing?’ We don’t control Reddit, we don’t control the forums. Wider exposure creates a higher risk for abuse, ruined games, and bad player experiences, but we want to see what players are seeing.

Tim Isenman

We reached out to the Reddit mods for help with consolidating any new bug posts into the original thread, but in that chain of communication there was a misunderstanding of what we wanted to do. Much to our dismay, we saw the posting of a stickied mega-thread, giving the greatest visibility of every single bug. Every video was posted right in the heart of the post.

Donna Mason

It’s a double-edged sword. The fact that we get very quick information is great, but the visibility the bug gets is unfortunate. People who had never seen it all of a sudden are trying it.

Kacee Granke

That was basically an immediate, ‘We have to disable ranked at this point.’ Not only was it listing all the champions, but it was giving clear reproduction steps.

After the Reddit post explodes, so does the awareness of the bug. It quickly spreads into other LoL regions.

Donna Mason

It’s not something we ever want to do, but the potential benefit of exploiting the bug was really high. We had to assume players would do it, especially in ranked.

Kacee Granke

Not everybody is going to be using the bug, but if it catches on it’s going to be terrible for the player experience.

Tim Isenman

This is one of the larger issues Riot has ever faced on our live environment. Ranked is like the endgame for many players; when that’s taken offline, you lose a tremendous sense of purpose.

Donna Mason

We always ask, ‘What would it feel like if someone did this bug to me?’

Matthew Wittrock

There’s definitely a player understanding. No one is going to be happy, but people aren’t going to be upset when you’re trying to preserve the competitive experience.

Kacee Granke

If players are thinking, ‘I play ranked, I put my heart and soul into this, and I’m just playing against cheaters,’ that ruins people’s desire to play competitive games. So we turned off ranked, and we turned Riven back on.

5:30 PM - July 15

With ranked queues officially disabled, Rioters work frantically to find the cause of the bug, get it solved, and push the fix through the QA testing process.

Scott Hansen

There’s two things happening in parallel here. There’s figuring out how to keep players informed the best we can and communicate with them in the 20-plus languages that they speak, and there’s actually getting the problem fixed.

Tim Isenman

The champion team started working on some scripting rewrites.

Kacee Granke

Once we started to realize the scope, there was a good amount of time where we were just trying to figure out what to even fix. It took a few hours to come up with a first pass.

Matthew Wittrock

I remember getting together with the various teams, and it’s very much, ‘Here’s what we know,’ and ‘What should we do,’ then, ‘Cool, everyone go do things.’ And that’s sort of exciting. This thing might be real bad, but we have a plan and we’re moving quickly.

Kacee Granke

We had a couple of band-aids in place almost right away that we were testing internally. We kept thinking we were there, but then we would find a way to break it or realize it would cause some weird side effect.

Mark Sassenrath

The scope got larger and larger throughout the day.

Donna Mason

We knew we were in trouble when four iterations into the fix we were still finding problems.

Kacee Granke

There was a case where we fixed it, but if you used a health pot it would cancel the recall, whereas in the past using a health pot wouldn’t cancel recall. That type of change in functionality isn’t kosher, because it completely ignores player expectations.

Mark Sassenrath

The detailed steps in player videos helped a lot. Instead of having to do three hotfixes over the course of a few days, we were able to get it fixed much quicker. It was really valuable that players did that.

Donna Mason

It was really interesting to test, because we need to know that testers know how to reproduce the bug. So we go into a custom game and practice, and then test it in the test build. If you’re not good at executing the bug, we can’t trust the test results.

Scott Hansen

If we can get a clear set of, ‘This is how you do this thing, this is how it works,’ that makes our job so much easier in terms of finding out who can go and get that problem fixed.

Tim Isenman

We waited until we had something we felt really good about and sent it off to testing.

Donna Mason

Five iterations in, we think we get the fix. We send it to QA and the test plan involved basically testing every ability in the game, every champion, every item, etc.

Kacee Granke

Because of build and deploy times, we knew we weren’t going to find out if it worked until the next day.

Tim Isenman

When we have any type of change to the live environment, there’s a really rigorous process that every change goes through in order for us to put it out with confidence. It needs to go through preliminary internal testing and peer review, and then we need to push it through destructive testing with our global QA teams. Those types of turnarounds usually take 12 hours, at a minimum.

Scott Hansen

We posted a message saying, ‘Okay, in 12 hours, we’ll let you know where we’re at.’ We assumed around 4-5 a.m. we’d know whether the fix worked or not.

Donna Mason

Around 10-11 p.m., we sent a lot of people to bed. We let the QA team know to wake us all up if the fix failed.

2:00 AM - July 16

The teams wait anxiously for the results of extensive destructive testing. If the bug fix fails, players could be looking at another 12 hours without ranked -- 8.5 hours have already passed since it was initially disabled.

Tim Isenman

Around 2 a.m., we learn that the fix did not work, and that we had to reevaluate and pretty much start from scratch. We called everyone back in to figure out what went wrong with the first fix, make the change, and then re-submit it.

Donna Mason

We all got paged. We woke up the engineers, the design people -- everyone wakes back up. All of the involved teams.

Mark Sassenrath

You know something went wrong if your phone is ringing at 2 a.m. Either you missed something, or your fix broke something else.

Donna Mason

They wouldn’t call us if it were good. They wouldn’t call to say, ‘Hey, the fix is fantastic! How are you?’

Tim Isenman

I actually don’t know if Mark Sassenrath ever left the office…

Mark Sassenrath

We thought we had it fixed. We deployed to the testing environment. Somewhere around 3 a.m., we hear that something is still wrong. By around 4 a.m. I was back in the office working on a new fix. At that point I was basically dead.

Kacee Granke

We got the results back and it was still broken. We submitted another fix and then waited through the same process.

Tim Isenman

The new version of the fix went into the QA cycle from scratch. But in addition to starting over, we also had to go back and make sure that every broken case from the first fix was then fixed with the second try, so our workload widened.

Donna Mason

At that point, we not only test the new issues we’ve found, we re-test the things we have already tested.

Mark Sassenrath

We didn’t want to break recall’s intended uses. We tried not to go too crazy with the fix at first. There are all sorts of actions you’re allowed to take during recall and we didn’t want to break any of them.

Afternoon - July 16

Ranked has been disabled for nearly 19 hours. The involved teams struggle to update players on when ranked will be turned on without a clear timeline on the fix.

Tim Isenman

We had to re-communicate with players that testing was still in progress. We let them know we’d update them within a couple of hours, giving us enough room to screw it up a couple more times. Every time we asked QA for an update the window for completion seemed to extend by two hours. We didn’t know how long it would take.

Scott Hansen

We had conversations about, ‘What do we tell players?’ We didn’t want to set a timeline that’s really far out just to be safe, but we also didn’t want to set unrealistic expectations. We ended up going with ‘soon,’ which honestly isn’t ideal.

Tim Isenman

Around 4 p.m. we hear that about 80-90% of our champions have been successfully tested against the issue. So we’re feeling pretty good about it, but we had a hard time communicating that to players. There’s still a chance the fix won’t work for a few champions, which means we’d end up rolling back the fix again and re-verifying it over another 10-hour period. So we kind of kept quiet.

5:00 PM - July 16

Roughly 24 hours after the initial ranked shutdown, the final bug fix is verified across all champions. At this point, the teams start the process of rolling out the fix globally and making sure ranked is re-enabled across all regions.

Tim Isenman

I think it was 5 p.m., we get the confirmation that we have a 100% success rate with the fix. We had preemptively staged and prepped the new game server package just to have a one-button deployment to live. The deploy train was ready, so region by region we pushed out the fix.

Donna Mason

Because we have to touch game servers in every data center all over the world, it is pretty time consuming. But we were moving fast.

Matthew Wittrock

It was one of the faster turnaround times we’ve had. Everyone was very ready.

Tim Isenman

Right at the end, Vietnam was taking a really long time. We’re all standing around the monitor waiting to hear back on the fix. And this dude pings back, ‘I still have it.’ And we almost lost it.

Matthew Wittrock

That’s the real fear. As much as we’ve tested everything we can think of, we’re still waiting for the post that says, ‘Hey this bug’s still here!’

Tim Isenman

We said, ‘Exit the game and try it again.’ So he does, and we’re waiting. And waiting. And finally he’s like, ‘I can’t do it. I think it’s fixed.’ It was a serious moment of panic, because if it didn’t work in the one region it didn’t work in any.

8:00 PM - July 16

Ranked is fully enabled 28 hours after being taken offline. The bug is officially squashed.

Tim Isenman

We kept ranked disabled for about five minutes so we could test the fix internally and verify that it was good to go per server, then we re-enabled ranked and let players know everything was back online. The demon was defeated.

Matthew Wittrock

At that point, it’s way too late to be worried. If something’s wrong you’re going to find out about it in a couple of hours. You have to move on to the next thing. The train keeps going.

Donna Mason

I’m like, ‘I’m gonna go play some ranked.’ We’re all players, there’s that sense, ‘Thank god, everything is okay again.’

Kacee Granke

And then it’s back to work.

Mark Sassenrath

We can’t rest on our laurels about it. There’s still lots of work to do.

Nobody likes in-game bugs, but eliminating them completely from a game as complex as League is a pretty big challenge (one we work every day to meet). When major bugs do pop up, multiple teams at Riot work together to find a solution as quickly and effectively as possible, with as little game interference as we can manage. In the case of the Great Recall Bug of 2015, dozens of people contributed long hours to solving the problem and getting players back into ranked.

