All the talk this morning about last-minute design tweaks to Croc got me thinking about metrics and balancing. It seems to me that with regard to the test realm, Riot's practices are completely backwards and ineffective, and I think this Croc issue, among others, proves my point.
Metrics & Game Design
As a brief primer for those of you who don't work in the game industry, the word "metrics" generally refers to using statistics to make better decisions. This isn't just math-crafting numbers, because any designer worth his salt knows that you can't puzzle out exactly how much fun or even how effective something is going to be just by playing with numbers. Instead, game designers use the statistics from play tests, focus groups, etc. to help identify previously undetected problems and refine designs. How this is done and how the results are applied vary, but major studios all over the world use these kinds of tools to improve their games.
If you want to learn more about the value of metrics (and still be entertained), I suggest reading Moneyball by Michael Lewis ( http://www.amazon.com/Moneyball-Art-.../dp/0393324818 ) . It's about how metrics were used at the Oakland Athletics to turn their franchise into an insanely efficient ball club. These methods have been reproduced at other clubs and in other sports. If you want a more boring, but more thorough look at business metrics, there are a few quality books you can find by searching on Amazon.
The Test Realm is Backwards
One thing I've learned from doing focus testing is that the bigger the group you have to test with (within the bounds of your manpower) the more accurate your numbers are going to be. This is pretty obvious from a Statistics 101 standpoint.
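To put a rough number on that Statistics 101 point, here is a minimal sketch of how the margin of error on a measured win rate shrinks as the test group grows. It assumes a simple normal approximation, a 95% confidence level, and a true win rate near 50%. The function name and the 50-game starting point are made up for illustration and are not anything Riot actually uses.

```python
import math

def margin_of_error(sample_size, observed_rate=0.5, z=1.96):
    """Approximate margin of error for a proportion (normal approximation).

    z=1.96 corresponds to a 95% confidence level; observed_rate=0.5 is an
    assumed win rate and is the worst case for this formula.
    """
    return z * math.sqrt(observed_rate * (1 - observed_rate) / sample_size)

# Hypothetical group sizes: a small insulated pool versus larger open pools.
for n in (50, 1000, 2000, 10000):
    points = margin_of_error(n) * 100
    print(f"{n:>6} games: win rate known to within +/-{points:.1f} points")
```

Under those assumptions, 50 games only pins a win rate down to within about 14 points either way, while 10,000 games gets you to within about 1 point. The error shrinks with the square root of the sample size, so you need roughly four times the games to halve it.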
Another thing I've learned is that you want different players or at least a varied mix of players each time you do a test. Why?
- Players can become biased over time based on testing procedures and exposure to the developers. This reduces their effectiveness as testers.
- Players form habits in the test environment that are not reflective of their behaviors in a live environment.
- Players become experts in a certain genre, role or task and miss problems that less experienced players would have discovered.
But the current iteration of the test realm has prioritized "preventing leaks" over any of these pretty standard best practices. The player pool is small. There don't often seem to be calls for new testers. It certainly doesn't seem like you've got any open doors for newer players to gain access. So you've got a small, insulated testing group that has repeated exposure to the same devs and testing practices. This seems backwards to me.
The Test Realm is Ineffective
Today's issue with Croc proves this point. Why are you doing tweaks until the last second? Probably because you didn't have enough testing resources, whether that's manpower or time, to identify the problems with his numbers. Since I'm not on the test realm I can only speculate, but maybe you tweak the numbers repeatedly and play him until he "feels right."
- Pantheon buffs and nerfs are ALL over the map. He was OP, then he was UP. Then he was balanced but you guys weren't happy with his HSS being useless. Then he was OP again. Now he's getting nerfed again. How much better would it be for the community if you could do all those iterations in a few weeks rather than over the course of a year?
- Eve/Ward changes made normal games all but unplayable at first. That would have been picked up on by a larger body of playtesters.
- LeBlanc's release was a nightmare. It took just hours before the forums were flooded with "OPOPOPOPOPOP!" threads.
Wouldn't it be more effective to have 1,000 or 2,000 or even 10,000 players on the test realm to give you valuable metrics to work from? Maybe it's a manpower or an equipment limitation that keeps you from doing this, but given the option, I hope you would choose this over preventing leaks or maintaining the mystery of a champion until they drop.
Your Community ALWAYS knows more...
... than you do about your own game. If you have 20 full-time testers working 40 hours a week, that gives you 800 manhours of testing from your QA department. Add in the odd hours your own staff get to play the test realm, plus dedicated time from your design team, and maybe you break 1200-1500 manhours of testing. I'm not sure of the population of the test server, but I can't imagine you break more than 2500 manhours total, and that's being pretty generous.
Compare that to the first hour a patch is live. If you have a modest 25,000 players on in the first hour after the patch, they immediately surpass your investment in the patch tenfold! That's just in the first hour. They test interactions you didn't even think of, let alone test against. They find build orders you didn't consider. They build team comps you didn't have time to try out. Then they disseminate this information at breakneck pace via match results, Ventrilo chats and your own forums.
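For anyone who wants to check my back-of-the-envelope math, here it is as a quick sketch. Every number is my own guess from this post, not real Riot data, and it assumes each of those 25,000 players spends the whole first hour in game.

```python
# All figures are the guesses from this post, not real staffing or player data.
QA_TESTERS = 20
HOURS_PER_TESTER_PER_WEEK = 40

# QA department alone: 20 testers x 40 hours.
qa_manhours = QA_TESTERS * HOURS_PER_TESTER_PER_WEEK

# Generous upper bound once staff, designers and the test server are added in.
generous_internal_manhours = 2500

# Live side: assume every player plays for the full first hour.
live_players_first_hour = 25_000
live_manhours_first_hour = live_players_first_hour * 1

ratio = live_manhours_first_hour / generous_internal_manhours

print(f"QA manhours per week:        {qa_manhours}")
print(f"Generous internal total:     {generous_internal_manhours}")
print(f"Live manhours in first hour: {live_manhours_first_hour}")
print(f"Live vs. internal:           {ratio:.0f}x")
```

Even if you cut the live number in half to allow for queue times and lobby waits, the community still out-tests the whole internal effort several times over before the patch is two hours old.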
There is no way for you to keep up with player investment. All you can do is maximize your ability to glean from it and work hard to translate the metrics into more effective game design the next time out the gate.
A list of some other benefits of an open test realm:
- You get to test more things. Imagine if everyone got a turn on the test realm and earned IP for playing there. Now you've got a huge body of interested testers at your fingertips. You could test what a "double gold" game would look like, or what might happen if multiples of the same champ could be picked. Maybe it has no place in the "live version" of the game apart from some giggles, but it may create some new ideas and give you valuable insight into your game that you otherwise would have never received.
- You increase player investment. Like the tribunal, you give players more reasons to care about your product. "You guys should definitely get Renekton. I was on the test realm for his tweaks and he is really balanced and fun."
- You create a better product. Period. A bigger test group generally means fewer problems, better balance and earlier detection of potentially game-breaking interactions. And related to that note...
- You save yourself time fixing problems later. How much time have you wasted rebalancing Pantheon? How many manhours did you burn prepping and pushing the LeBlanc hotfix? How many late nights have you put in testing a champ for last-minute tweaks when you could have been done tweaking days before the patch?
- Your community stops thinking you are trying to trick them. Your champion spotlights are accurate more often. You present your community with more information about what makes a good, balanced champion and why it takes so much work to get it right.
So What's Up?
... with the backwards and inefficient test realm? Is the business case for keeping a champion as secret as possible until launch so strong that you are willing to overlook the design case for more open testing? Is it worth making your community suffer for at least two patches' worth of Eve wrecking faces to push the ward nerfs through sans leak? Are you guys just lacking the manpower required to implement this? I'd love to hear your thoughts.
Previous "Let's Talk" topics:
Risk vs. Reward: http://www.leagueoflegends.com/board....php?p=5112925
Yes, I intend to make a series out of these.