The Good, The Bad, and the League: 2/4 - 2/24
_Your semi-weekly dose of server problem-os, NA League news, and other!_
2020 is in full swing. LoR entered open beta, Sett is STILL punching an awful lot of things, Your Shop is active, and C9 is undefeated...so far. Clash happened this last weekend.
Follow-Ups:
- Mac voice still not working properly: It’s being worked on! There are improvements being made to Riot Voice, and it should be fixed in the nearish future (not a Soon™ type scenario).
Ongoing:
- None
**Server Stuff:**
- MFA host has higher error rate than acceptable (2/7, ~4 hours) Automated alerting reports that error rates for Multi-Factor Auth (MFA) are above acceptable levels. NOC notifies engineers, who investigate and fiddle with config changes. Then additional hardware is scaled up to handle the extra load, and the problem is resolved. An additional code change is scheduled to permanently fix the underlying problem.
- Elevated error rate causing login issues (2/7, ~30 minutes) NOC is notified that error rates are reaching unacceptable levels. NOC techs cycle the boxes running the programs, which solves the problem.
- Player in Game (PIG) loss due to unknown reasons (2/10, ~8 minutes) Automated alerting notifies the NOC that PIG dropped mysteriously. PIG recovers shortly afterward, but investigations into why it happened find no root cause.
- NA players report inability to patch Riot Client (2/11, ~10 hours) NOC is notified by PS that they’ve gotten a large volume of support requests from players unable to patch the Riot Client. Investigation points to an issue with timeouts. Engineers point software towards a different box, which fixes the routing issues causing the problem.
- Issues with game starts (2/14, ~20 minutes) Automated alerting notifies the NOC that game starts are impacted. Before an investigation can be spun up, game starts return to normal levels, and further monitoring doesn’t show any additional problems.
- Signups API failure (2/21, ~6 minutes) Automated alerting notifies the NOC that the Signups API has reported 0 signups for 5 minutes. NOC tech takes a look and notices that one of the backend connection nodes isn’t transmitting data. Routing around the impacted node fixes the problem while the node is rebooted.
- CPU Usage too high - Mitigation steps (2/22, ~89 minutes) CPU usage hits warning levels due to Clash. In response, practice games are set to 10 player minimums and then the practice tool is disabled too. ARAM queue is throttled, and then all steps are undone as soon as the CPU levels lower to acceptable ranges.
- CPU Usage too high - Mitigation steps (2/23, ~95 minutes) CPU usage hits warning levels due to Clash. In response, practice games are set to 10 player minimums and then the practice tool is disabled too. Co-op vs. AI games are throttled, and then all steps are undone as soon as the CPU levels lower to acceptable ranges.
- Practice Games Finishing in Error (2/24, ~11 hours) Automated alerting notifies the NOC that games are ending with unacceptable error rates. Investigation starts, and engineers notice that only practice games are ending with errors. Further checks find that unverified code around Clash trophies in the practice tool seems to be the culprit. A hotfix goes out that fixes the issue.
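For the curious, here is a rough idea of what a "0 signups for 5 minutes" alert like the one in the Signups API item might look like. This is only a minimal sketch, not Riot's actual alerting: the class name `ZeroSignupAlert`, the per-minute sampling, and the `window` parameter are all assumptions made up for illustration.

```python
from collections import deque


class ZeroSignupAlert:
    """Hypothetical alert rule: fire when every one of the last
    `window` per-minute signup counts is zero."""

    def __init__(self, window=5):
        self.window = window
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def record(self, signups_this_minute):
        """Store the latest per-minute count and report whether to alert."""
        self.samples.append(signups_this_minute)
        return self.should_alert()

    def should_alert(self):
        # Require a full window so a freshly started monitor stays quiet.
        full = len(self.samples) == self.window
        return full and all(count == 0 for count in self.samples)


if __name__ == "__main__":
    alert = ZeroSignupAlert(window=5)
    for minute, count in enumerate([12, 0, 0, 0, 0, 0, 3]):
        print(minute, count, alert.record(count))
```

Requiring a full window of zeros, rather than a single empty minute, keeps a brief lull from paging anyone, at the cost of taking at least five minutes to notice a real outage.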
**Game Stuff:**
- Network config for perks service accidentally deletes Runes (2/4, ~38 minutes) Network engineer changes the config for perks, which accidentally disables rune selection for all Riot Shards. Once discovered, the network engineer undoes the network config changes for perks that caused it.
Morgageddon