Retrospective and plans going forward
On 3/21 at 6pm ET, we attempted to open the application to the wider public. We had performed a variety of stress and generative tests leading up to open registration. The only last-minute issue we ran into was the tutorial, where we found edge cases that could brick it; we trimmed it down to prevent user frustration, with the intent to revisit it later. Everything else looked good, and we fixed any issues found through staff testing. We expected to see some bugs cropping up here and there, as is customary in a beta.
However, there was a big bug that our tests didn't reveal: a locking issue with cat ID generation. We believe our tests did not catch this because we did not simulate enough concurrent user processes, which is an error on our part. Our developers have been working all night on tests and on further strengthening the application.
The server itself was not under any significant strain from the user influx, but the locking created a Katamari-like effect: each new cat joined the stuck queue, causing significant lag, strain, and a spiral of compounding problems.
We worked quickly to fix the issue, but it grew just as fast. In the end, we temporarily closed the application to keep the issue from expanding further. In hindsight, we may have needed to make this call sooner, and we regret the frustration this caused users.
We all owe you one heck of an apology.
And we apologize that it took until now to post this writeup. Our staff needed to rest for the night before returning! Users are entirely justified in their frustration and scorn, and here is our official accountability.
Where we need to improve:
We need more complex and thorough testing. Our technical architect has been working nonstop on building and running more tests, rapidly fixing any Achilles' heels like the locking issue.
We were short-sighted and negligent to allow this to happen. We feel any user upset is justified, and from here on our testing process will be more advanced and thorough.
We had our cry, then we picked ourselves up and dusted ourselves off! As sad as we are to see this happen, it's all part of rolling with the punches as a small team with indie resources.
The bug was a nasty bugger, but despite the query issue gunking up the process for everyone, our server held up quite well under the influx of users. At peak traffic, we were using only 30% of our total CPU, and that's great! It means that once we address the bottleneck, the server should hold up under strain.
The community was overwhelmingly positive and kind, even those who were disgruntled, which showed just how much goodness there is in our users. We had over 1,400 new users sign up! That's another huge milestone, and we're excited to give you all a positive experience going forward, which brings us to the next point:
First off, we will be rewarding everyone who created an account, both old and new users, with an exclusive Fauna item. If you got stuck at the verification step, your account was still created and will receive the Fauna. This Fauna will only be available through this event, and we hope it makes up for some of the disappointment and unrest.
Second, we are restoring lost followers. If you had a follower you liked and either have evidence of it (a picture or screenshot) or the follower was actually generated, we can and will restore it!
Before we reopen, we will invite users to send a ticket with their follower evidence or request, and we will work with you through a multi-step process to fix your follower, no matter how we proceed! After we reopen, every submitted ticket will receive a follow-up asking you to clarify which follower should be changed back to the original. Further information on this will be released in the coming days.
We will also be asking users for input on the following:
What we should do with the data
The chief matter on our hands now is how we should handle the data going forward. We understand that users are very disappointed. Many cats were assigned IDs while being invalid (ghost cats), and some users were not able to access cats at all (the malformed cats bug, caused by the bottleneck, in which users could not generate anything). We want to ask which option going forward feels fairest to users.
We will be releasing an official, clearly outlined form on two matters:
When would the userbase most prefer us to reopen?
How do users want the data to be handled?
Whether that is through a full data wipe, a partial data wipe, or an attempt to save and restore everything we already have, we are currently going through our options and assessing what is possible. We first ran a Discord poll to gauge where users stood, but this was ultimately a mistake on our part, made at a time when admins and devs were frantic, fatigued, addressing users, and attempting damage control. We apologize if this caused any further upset or confusion.
All of the options provided remain possibilities, but we need to give a clearer and more detailed breakdown of what each option entails. For this, please allow us a few days!
No one came out of this experience unscathed, and we are doing our best to roll with the punches. In the grand scheme, this will be a little blip on the PawBorough timeline. What matters most is not just that it happened, but the measures and efforts we take to respond appropriately and competently. We hope to have shed a little light on what went down, our accountability, and what we will be doing to apologize and rectify the situation.