I’ve been working for almost a year on a set of side projects built around a MUD called DragonRealms(DR) and it’s given me a chance to solve a lot of interesting problems, both technical and soft in nature. I want to use my blog as a chance to solidify and share some of my lessons learned in both areas. Robert Herbig has been working on this project with me and you should probably read his introductory post for a background on what this side project entails.
A good starting point seems like going over the environment of the Lich project we’re working in. This should provide some insight into the unique problems facing us as we try and develop software in this environment. I won’t fully go into the solutions we’ve found for each problem but they serve to highlight the constraints we’re working in and may serve as touch points for later posts.
At the high level Lich is a scripting engine that runs as an “aggressive proxy” between the game client and the game server. It’s designed to allow you to send commands to the client or game as if they came from the other. To facilitate this it supports the running of Scripts which are interpreted as native ruby code.
There are actually two different versions of the eval/binding that Lich can run scripts can run in. A script can either be Trusted or Untrusted (the default). Scripts running in Untrusted mode are running at $SAFE=3. This is a mostly outdated system that was an attempt to allow safe sandboxing of potentially dangerous code. This setting means no network or disk access, but also limits the access to things created in this mode. Trusted scripts are running like normal full ruby scripts. This leads to a variety of interesting behaviors.
In practice Trusted scripts can’t see classes, modules, and methods created in an Untrusted script. Untrusted scripts can see Trusted items but can’t invoke any methods that would result in the object taking an action requiring trust. This can easily lead to a lot of subtle and hard to trace bugs if a user accidentally trusts a script they shouldn’t or vice versa.
Our eventual solution to this was to declare that all of our scripts should be Untrusted, except for our installer/updater script. Any script wanting to take an action requiring trust passes a message to the Trusted script and the action is handled by that other script in its own thread, upon completion it unblocks the call by the Untrusted script.
This Trusted/Untrusted issue comes back and bites us later though. You can execute code snippets manually while in the game, easily verifying the behavior of some code. However Lich runs all of these snippets as Trusted. This makes it really hard to check on the state of Untrusted objects while debugging since the snippet can’t see them.
The Lich script ecosystem provides a script repository that allows users to upload and share scripts, however it’s quite unwieldy. Scripts don’t update to the latest version by default so when I fix a terrible bug many users may miss the fix for weeks or months. Only one person can upload a script so working on scripts co-operatively is exceedingly painful. In addition it becomes another place you need to remember to check in your code to and you need to remember to upload every single changed file. If people have some of your scripts auto updating but not others it can quickly break when you make dependent changes to both scripts.
In the end we developed our own repository equivalent that feeds off of a Github repo. We develop scripts normally and anyone running our repository script gets the latest versions from our repo when they first start up. This isn’t a perfect solution since someone with multiple characters can update the scripts out from under another character who was already running when the second character starts, but it’s a huge step forward in ease of development and behavior for the users.
Lich itself isn’t maintained in a publicly accessible repository, nor do we have an option to contribute directly. The maintainer plays a different game so focus is often divided. Combined with the fact that there’s no issue tracker means that relying on fixes to the platform isn’t always viable. Due to this problems in Lich are often fixed through bad workarounds and wrappers in the scripting layer leading to duplication and more bugs.
An example is when the
move(direction) command didn’t handle several failure messages specific to DR so we ended up having to write our own movement wrapper that detects failed movements and restarts the move over and over instead of just adding the failure messages in to the method in question.
This second-class nature of DR also affects sharing existing documentation with users, because many of the resources for Lich are often only partially correct for our game. We find ourselves frequently reinventing the wheel; overlooking something in Lich.rbw is easy, it’s 13k lines in one file which often looks like the following.
We ended up making a method that handles lag around command submission by submitting the command repeatedly until the game sends a recognized response, not realizing for several months that a very similar method existed already, undocumented in the Lich code.
Another problematic point is that since our scripts are evaluating inside the Lich environment any odd choices it has made can have disastrous consequences. It took almost 9 months to finally trace down some truly strange behavior to code living in Lich itself.
For anyone not familiar with Ruby, the
method_missing method is an incredibly powerful meta-programming tool. It receives any calls to the object with method names that don’t exist. However with great power comes the possibility to do terrible things. Here Lich makes it so if you ever have a
false value and accidentally call a method on it, nothing will fail and it will return a
false value as if the method had run. This lead to hair pulling debugging since a snippet like this that would throw a runtime error normally runs without error. As an example.
With all this pain, Lich is still the best option out there warts and all. More than that its open ecosystem means that if something causes us enough pain there’s always a viable solution if we we’re willing to take it.
We’ve recently forked Lich and are now maintaining our own version. This has allowed for faster improvements and fixes for DR specific issues and we’ve made it a simple toggle for users to move on or off our custom Lich install. This facilitated extending Lich support to other front ends so it now works with the single largest 3rd party front end for DR. This gives us unparalleled reach, we’re over 100 active users at any given time (about 20% of the player base). In addition I’ve added features like an untrusted equivalent to the in game snippet evaluator to facilitate debugging and testing.