As you’ve experienced today, we’ve run into a sequence of unfortunate events that has diminished the quality of our service below what you expect from us – and what we expect from ourselves. I wanted to create this thread for a few reasons.
- To apologize – First and foremost, I think it’s important that we acknowledge that this is unacceptable. Even though we’re a fast-moving company experiencing tremendous growth, we don’t consider these outages acceptable. I want to assure you that service stability does in fact remain our #1 priority, and we continue to devote significant resources (both time and money) to improving the stability of our service. I know it might not always seem like it in times like these, but over the last 5 months our overall stability has improved significantly (we measure it by the second!), and we continue to have a large team fully dedicated to service stability.
- To correct misconceptions – We definitely understand how frustrating this experience is, but if it were as easy as spending more money to make all the problems go away, we definitely would have done that by now. The problems that we’re experiencing are large-scale and not simple to solve. We have a bunch of really smart people working on solving these problems, and we’re trying to hire as many more as we can.
- To answer questions – I’m going to use this thread as an opportunity to answer questions that people have been creating individual threads for, and hopefully condense a bunch of knowledge into a single location so we can clean up the discussions.
Feel free to ask your questions in this thread – and I’ll put them into the original post. Trolling, flaming, and unconstructive ranting will be removed from this thread, and posters will lose posting permissions.
Q: What exactly happened that caused the issue(s)?
A: A high level overview of the issues and a timeline can be found in this thread: http://www.leagueoflegends.com/board...d.php?t=654620. Unfortunately I don't really have much more information than this right now.
Q: Will we get some IP boost like after the last server outage?
A: We'll definitely do something - I don't know what yet.
Q: You said you added a server. How many servers are you guys actually running on and how big are they?
A: We have hundreds of servers in multiple international data centers. The actual specs on them vary - but our hardware is top of the line.
Q: Can you give more detail about why a predicted 9-hour patch ended up going on for 15+ hours?
A: The unfortunate reality is there are just unknowns when it comes to stuff like this. We're doing a lot as a company to become more reliable and more predictable on the technology side, but it's a process that takes time.
Q: When do you predict everything will be running smoothly again, if you can make such a prediction at the present time?
A: Personally I'd be uncomfortable making a guess. Our production crew has our best guess, but I'd rather not bug them right now.
Q: As for a question, does Riot have any plans on implementing a small server to run during maintenance times that would run off a previous version of LoL?
A: There are lots of options that we're considering, and lots of ideas floating around. I can't speak to this one specifically.
Q: Are you currently developing a smoother system to ease the strain of the transition into a new patch?
A: We're constantly making improvements to the way we develop, and the way we deploy changes to our technology and architecture, so yes.
Q: What are you going to do to make sure this doesn't happen again?
A: Again - I can't promise that it won't happen again, and I'll never be in a position to make that promise. We're constantly striving to improve, and we've improved by leaps and bounds from this point last year - and will continue to address major pain points as we're able to.
Q: Are the problems the same/similar lately, and is it just a matter of figuring out the trick to stop it from happening?
A: Generalizing - over long timespans the problems are rarely (if ever) similar. Once we see an issue we can correct it pretty quickly. The issues over the past week have a variety of causes and I personally don't know enough about them to speak to them specifically.
Q: It's probably tough for the guys working on it to explain to us in full detail, but maybe they could give us a quick run-by on what the problems were and how to fix them/prevent them from happening?
A: We do this internally for every single period of degraded service (we call it a retrospective). Sometimes we can create player-facing versions, and sometimes we can't, depending on a ton of factors. I'll see if it makes sense for this set of issues - but I definitely can't commit to anything.
Q: It seems as if the introduction of the login queue made everything more... wonky. Is this the case?
A: We're working to improve the login queue experience. The login queue allows us to control the rate at which players enter the system - effectively preventing the equivalent of a DDoS attack on our service. The login queue itself has not caused any outage issues - and it helps us avoid using "busy" on the patcher as much as possible.
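For anyone curious what "controlling the rate at which players enter the system" can look like, here is a minimal sketch in Python. It is only an illustration of the general idea, not how our login queue is actually built: the `LoginQueue` class, the `rate_per_tick` parameter, and the notion of a fixed "tick" are all assumptions made up for this example.

```python
from collections import deque


class LoginQueue:
    """Hypothetical admission queue (illustration only).

    Players wait in arrival order, and at most `rate_per_tick` of them
    are let through each time `tick()` is called.
    """

    def __init__(self, rate_per_tick: int) -> None:
        if rate_per_tick <= 0:
            raise ValueError("rate_per_tick must be positive")
        self.rate_per_tick = rate_per_tick
        self._waiting = deque()

    def join(self, player_id: str) -> int:
        """Add a player to the back of the queue; return their 1-based position."""
        self._waiting.append(player_id)
        return len(self._waiting)

    def tick(self) -> list:
        """Admit up to `rate_per_tick` players, oldest first; the rest keep waiting."""
        admitted = []
        while self._waiting and len(admitted) < self.rate_per_tick:
            admitted.append(self._waiting.popleft())
        return admitted

    def __len__(self) -> int:
        """Number of players still waiting."""
        return len(self._waiting)
```

With `rate_per_tick=2` and five players joining at once, the first call to `tick()` admits the first two, the second call admits the next two, and the third call admits the last one. However many players show up at the same moment, the rest of the system never sees more than two new logins per tick.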
Q: Second - What kind of population numbers are you running into. I don't think people are grasping just how many people are trying to log into your servers when you put them live.
A: We're not quite ready to share that yet. As Tryndamere put it here, people will be "shocked".
Q: Why don't you just overshoot the estimated time for the patch?
A: We do, actually! We've had a great track record for finishing on time over the past year - save a few unfortunate incidents.
Q: I'm sure this may seem like a silly question but is there anything we, the regular players, can do today or future patch days to help ease the workload of the programmers/repair crew, etc?
A: If you know someone who is looking for work and meets the qualifications, refer them! http://www.riotgames.com/careers/job-openings-0