Magic Item Tech, testing – part 2.

Hi everyone!

It’s been a while since my last post. Lately I had to focus on a lot of things (KREEP, Steam, paperwork, framework development, day-job, health issues, new cat 🙂 !!!), and I wound up not being able to actually focus on any of them…
Now I’m trying to catch my breath, doing the best I can to organize my time better, and I’m going to restore the habit of telling the story of my progress in game-development land (getting back to posting every week or two). I’m starting by delivering a long-promised follow-up on the game software testing topic 😉 !

In the last post I summarized the design goals of the (halfway-)developed framework. The main purpose was to create an automatic testing system which provides a trivial way to create and extend high level (integration/functional) test cases for my game projects. Since then, I finalized most of the features and inner workings, and made a regression set for KREEP with a pretty decent code and functionality coverage.

The testing system and work-flow are based on “capture and replay”. This means that to create a test case you actually play the game, trying out various facets of the program, and the engine captures events (e.g.: input events, key and game-pad presses, etc…) while you do so. These events are then saved to a file, and the engine itself can run in “replay” mode to replay these events later on. This alone would not allow things to be validated in an automatic fashion, but a special event type serves the role of replay-based checks. These “Check” events implement the same interface as other common events, so they can be replayed right before a given “Update” call, and they define specific assertions for game objects in the running game. Since components have a string identifier tag, a check can search for given entities (like the player, a map element, or an enemy monster, etc…), and with a little reflection magic, assert on any of the properties of these components. Filling the replay files with these checks, to be done before given “Update” calls, creates the actual validating automatic test cases.
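
To make the idea concrete, here is a minimal sketch of step-based replay with a check event. It is written in Python for brevity and is not the engine’s actual C# code; the names KeyEvent, CheckEvent, Game and replay are all made up for this illustration, and the “reflection” is simply Python’s getattr.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical input event: a key is down for one Update step."""
    step: int
    key: str

    def play(self, game):
        game.pressed.add(self.key)


@dataclass
class CheckEvent:
    """Hypothetical check: asserts on a property of a tagged game object."""
    step: int
    tag: str
    prop: str
    expected: object

    def play(self, game):
        target = game.objects[self.tag]        # look the object up by its string tag
        actual = getattr(target, self.prop)    # stand-in for reflection
        if actual != self.expected:
            raise AssertionError(
                f"{self.tag}.{self.prop}: expected {self.expected!r}, got {actual!r}")


@dataclass
class Player:
    x: int = 0


class Game:
    """Toy game: pressing "Right" moves the player one unit per update."""
    def __init__(self):
        self.objects = {"Player One": Player()}
        self.pressed = set()

    def update(self):
        if "Right" in self.pressed:
            self.objects["Player One"].x += 1
        self.pressed.clear()


def replay(game, events, steps):
    """Play every event scheduled for a step right before that step's update."""
    for step in range(steps):
        for event in events:
            if event.step == step:
                event.play(game)
        game.update()


events = [KeyEvent(0, "Right"), KeyEvent(1, "Right"),
          CheckEvent(2, "Player One", "x", 2)]
game = Game()
replay(game, events, steps=3)   # the check passes, so no exception is raised
```

Because the check goes through the same play call as the input events, the replay loop needs no special case for validation.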

Here is the (simplified) class diagram showing the architecture. It’s clearly visible that the record & replay systems and their building blocks mirror each other (as do their functionality and goals), and it is easy to extend both systems by introducing new leaf implementations (recorders and events):
[Figure: 2016_02_14_UML, simplified UML class diagram of the record and replay systems]

I’m already experimenting with other “Check” event types. A screen-shot compare check compares the state of the back buffer to a given reference image. This approach has great advantages (e.g.: sanity checks for rendering/graphics code + it validates a huge amount of the functionality leading to a given state of the game) but disadvantages too: it is quite unstable (changing a sprite or a model a little can cause the comparison to fail, though smart comparison algorithms, like histogram or color-channel-distance based comparisons, can help), and such checks are not really helpful until the game is (or at least larger parts of it are) in a quasi-finished state. This is why I haven’t based the validation aspect around this approach, and why it is still not a fully fleshed-out part of the test framework. Game-object hash value checks will be a somewhat similar beast. They are just like the standard property checks, but instead of asserting on scalar values/properties, the hash-code of a game-object (Object.GetHashCode) is recorded and checked when replaying. This is also quite fragile, because adding a new component or a new property to a game-object could break a test, so it is a type of check which is more useful when larger parts of the code approach the finished status, but it can validate a huge part of the game state too! At least it is not hard to update broken but not actually failing tests with new hash values and screen-shots…
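
As a rough illustration of a tolerant, color-channel-distance based comparison, here is a small Python sketch. It assumes images are plain nested lists of (r, g, b) tuples; the function names and the tolerance value are invented for the example and say nothing about how the framework actually compares back buffers.

```python
def mean_channel_distance(image_a, image_b):
    """Average absolute per-channel difference between two equally sized images.

    Each image is a list of rows; each row is a list of (r, g, b) tuples (0-255).
    """
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b))
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for channel_a, channel_b in zip(pixel_a, pixel_b):
                total += abs(channel_a - channel_b)
                count += 1
    return total / count if count else 0.0


def screenshots_match(actual, reference, tolerance=2.0):
    """Accept small differences instead of requiring pixel-exact equality."""
    return mean_channel_distance(actual, reference) <= tolerance


reference = [[(10, 10, 10), (200, 200, 200)]]
tweaked = [[(11, 10, 10), (200, 199, 200)]]     # a slightly adjusted sprite
different = [[(255, 0, 0), (0, 0, 255)]]        # a clearly different frame
```

With the default tolerance, the tweaked image still matches the reference while the clearly different one does not, which is the kind of slack that keeps such checks from breaking on every small art change.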

To achieve deterministic playback (or at least to support it in the lower level systems), the events are captured and replayed on a specific “step” of the game-loop instead of using timestamps, so a space-bar press polled on the 15th call of the “Update” function is played back right before the 15th “Update” call. For this to work as intended a “fixed delta time” game-loop is more or less required, but it is not a hard-coded limitation, since both the record and replay systems support extensions (as seen on the UML diagram), and optionally a delta time can be saved for each step and replayed as the delta time since the last “Update” call (voilà, deterministic replay for “variable delta time” game-loops). Another aid for reliably testing stochastic parts of the code is seed events, which capture the seed of a random number generator and reset the given generator to the recorded seed right before the set game-loop step when replaying. Later on, if a game or some parts of a game become non-deterministic, I hope that, since the events are actually a higher level abstraction not tied specifically to input devices and input events, they could be used for replaying non-deterministic game sessions with special game events instead of captured input (e.g.: disabling a non-deterministic physics system upon replay and relying on “PhysicsDiagnosticEvent” instances).
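
The seed event idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: SeedEvent and run_session are hypothetical names, and a dice roll stands in for whatever the real “Update” call would do with the generator.

```python
import random


class SeedEvent:
    """Hypothetical event: resets a random generator to a recorded seed."""
    def __init__(self, step, seed):
        self.step = step
        self.seed = seed

    def play(self, rng):
        rng.seed(self.seed)


def run_session(events, steps):
    """Run a toy simulation in which every update draws one random number."""
    rng = random.Random()                  # unseeded, so it differs between runs
    rolls = []
    for step in range(steps):
        for event in events:               # events fire right before this step's update
            if event.step == step:
                event.play(rng)
        rolls.append(rng.randint(1, 6))    # stands in for the "Update" call
    return rolls


recorded = [SeedEvent(step=0, seed=1234)]
first = run_session(recorded, steps=10)
second = run_session(recorded, steps=10)   # same events, same rolls
```

Both sessions produce the same sequence because the generator is reset on the same step each time, independent of wall-clock time.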

As I mentioned, the events are serialized to a file for later replay. I chose XML (though it could be anything similar) since I already have a lot of code, helpers and tools built for working with this format + I find it quite comfortable (+ it is a natural choice for .NET). Here is a simple replay file containing only a few key press events:



    
        
            
        
        
            
        
        
            
        
    

To be able to better organize test cases (extract common parts), and to aid the creation of test cases by hand instead of capturing game-play footage (really useful for UI/menu tests), I’ve implemented an “include” attribute for the “EventStrip” type, so that the contents of a replay can be defined in multiple files. Event strips are actually specific event implementations containing a list of “relative” events which can be replayed/started at a given frame relative to the starting frame of the strip itself. This way multiple events can be replayed in “parallel”, and it is easy to capture multiple separate pieces of event footage and play them combined simultaneously:



    
        
            
        
        
            
        
    
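
The relative timing of event strips can be illustrated with a short Python sketch. It is an assumption-laden toy: the KeyPress and EventStrip classes and the flatten helper are invented here, and file includes are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class KeyPress:
    frame: int    # frame relative to the containing strip
    key: str


@dataclass
class EventStrip:
    frame: int    # start frame relative to the parent strip
    events: list = field(default_factory=list)


def flatten(strip, base=0):
    """Return sorted (absolute_frame, key) pairs for every key press in the strip tree."""
    start = base + strip.frame
    result = []
    for event in strip.events:
        if isinstance(event, EventStrip):
            result.extend(flatten(event, start))   # nested strips shift their children
        else:
            result.append((start + event.frame, event.key))
    return sorted(result)


# Two separately captured strips, played combined from one root strip.
movement = EventStrip(frame=10, events=[KeyPress(0, "Left"), KeyPress(5, "Up")])
shooting = EventStrip(frame=12, events=[KeyPress(0, "Fire")])
root = EventStrip(frame=0, events=[movement, shooting])
timeline = flatten(root)
```

Moving a whole strip only requires changing its own start frame, since every contained event is resolved relative to it.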

To be as compact as possible, memory, disk-space and mental-health wise :D, the basic building block, the “DiagnosticEvent” class, is not defined and implemented as a “once-only” event like in most event architectures. It has a duration, and any concrete event implementing its interface can decide to span over and be played for multiple “Update” calls. The most common example is a key-press. There are multiple ways to capture and replay a user pressing a key, then later on releasing it. The most common approaches, along with their drawbacks, are:

  1. Save keys in pressed state every single frame as a distinct event. This takes an awful lot of memory and disk-space, and it is close to impossible to edit by hand…
  2. Save two events for each press, an event for pressing and an event for releasing. This is a much, much better approach than the first one, but I still hated its concept, since any time you would like to edit an actual key-press, for example to make it happen a couple of frames earlier, you have to modify two events, and you have to make sure they align well, the frame numbers are correct, the release event is not missing, etc… Otherwise you may accidentally end up with a replay which presses a key and never releases it.

The third approach, which I used, and which I think is the most feasible solution, is one event which can define how many frames it spans over. As an example, a player presses fire (e.g.: left mouse button) and holds it down for 30 frames. That is one event that should be replayed for 30 frames from its defined relative starting frame. This way it is easy to make a press take longer or shorter. Also, to move a button press around within a test-case, e.g.: to make it happen earlier or later on, only one number has to be modified 😉 !
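
A spanning event of this kind could look roughly like the following Python sketch. KeyHold and keys_down are hypothetical names chosen for the illustration, and the frame numbers are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class KeyHold:
    """Hypothetical spanning event: a key held for `duration` consecutive frames."""
    start: int
    duration: int
    key: str

    def is_active(self, frame):
        return self.start <= frame < self.start + self.duration


def keys_down(events, frame):
    """Keys that are held on the given frame."""
    return {event.key for event in events if event.is_active(frame)}


fire = KeyHold(start=100, duration=30, key="Fire")
held_frames = [f for f in range(200) if "Fire" in keys_down([fire], f)]

fire.start = 90   # move the whole press ten frames earlier by editing one number
moved_frames = [f for f in range(200) if "Fire" in keys_down([fire], f)]
```

There is no separate release event to keep in sync, so a press can never be left dangling.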



    
        
            
        
        
            
        
    

Here is the last XML example, a simple check used in the test suite for KREEP, requiring that the first player (Red) is alive. The game-object for this player is tagged “Player One”, the players are contained within a game-object composition tagged “Players”, and the root component of the game is the “ScreenManager”, which doesn’t need more explanation 🙂 .



    
        
            
                
                    
                        
                    
                
            
        
    

If this check is included for a given frame, and while replaying, on that frame the value of the “IsAlive” boolean property of the game-object is false, or the game-object is not found, an exception is generated. That is how I validate things, and hopefully discover early if I mess stuff up with my modifications.
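
The behaviour of such a check (walk the tags, read the property, throw on a mismatch or a missing object) can be sketched in Python as follows. The GameObject class, the CheckFailed exception and the helper functions are assumptions made for this example; only the tags and the “IsAlive” property come from the post.

```python
class GameObject:
    """Hypothetical game object: a tag, child objects, and arbitrary properties."""
    def __init__(self, tag, children=(), **properties):
        self.tag = tag
        self.children = list(children)
        for name, value in properties.items():
            setattr(self, name, value)


class CheckFailed(Exception):
    pass


def find(root, path):
    """Walk a list of tags starting at the root; raise if any tag is missing."""
    if not path or root.tag != path[0]:
        raise CheckFailed(f"root object {path[:1]} not found")
    node = root
    for tag in path[1:]:
        matches = [child for child in node.children if child.tag == tag]
        if not matches:
            raise CheckFailed(f"game object '{tag}' not found under '{node.tag}'")
        node = matches[0]
    return node


def check_property(root, path, name, expected):
    """Raise CheckFailed unless the property on the addressed object equals `expected`."""
    target = find(root, path)
    actual = getattr(target, name)
    if actual != expected:
        raise CheckFailed(
            f"{'/'.join(path)}.{name}: expected {expected!r}, got {actual!r}")


player_one = GameObject("Player One", IsAlive=True)
game = GameObject("ScreenManager", [GameObject("Players", [player_one])])
path = ["ScreenManager", "Players", "Player One"]
check_property(game, path, "IsAlive", True)   # passes silently
```

If the player dies before the checked frame, or the tag path no longer resolves, the call raises instead of returning, which is what lets the surrounding test runner report a failure.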

The last big magic trick I added to this whole mix is a test-case runner system. I’m a big “fan” of one-click stuff ( who isn’t 😛 😉 ? ). I’ve looked around for how to do this, and since I’ve been using NUnit for a while now, it was my first trial/choice. Thankfully NUnit has both a command line and a GUI based test-execution application, proper result reporting, and a built-in way to programmatically generate test cases at runtime! So I’ve built a simple glue application which generates test cases for the NUnit harness from replay files and some meta data ( again in an XML file 😀 ). When these tests are executed by NUnit, the glue app simply launches a pre-defined application linking my engine, e.g.: KREEP, starting it in “replay” mode and feeding it the path of the replay XML file to be loaded and run ( achieved with a huge amount of reflection magic and some extra inter-process naughtiness 😀 ). If no exception occurs during the replay, it shuts down the game, nice and clean, then advances; otherwise the un-handled exception propagates to the application domain border, and the glue app fetches it with some inter-process serialization magic (again) to let NUnit know about the failure and its cause. All in all the glue app is pretty small and has no dependencies at all besides NUnit. It utilizes some tricks ( a.k.a. hacks 😛 ), but nothing out of the ordinary (actually pretty “common” stuff for core .NET programmers), and as the last and best addition, it will work out of the box without any modifications for any game project which is built upon my framework (no special implementation/preparation is required from the game application either!).
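
The general pattern of generating one test case per replay file can be sketched with Python’s standard unittest module. To be clear, this swaps in a different harness than NUnit and is not the glue app itself: the replay file names are hypothetical, and fake_run_replay merely pretends to launch the game and report back an error message (or None on success).

```python
import io
import unittest


def make_replay_test(replay_path, run_replay):
    """Wrap one replay file into a test case that fails if the replay reports an error."""
    def test():
        error = run_replay(replay_path)
        if error is not None:
            raise AssertionError(f"{replay_path}: {error}")
    return unittest.FunctionTestCase(test, description=replay_path)


def fake_run_replay(replay_path):
    """Stand-in for launching the game in replay mode (no process is started here)."""
    if "shield" in replay_path:
        return "ShieldActive: expected True, got False"
    return None


# Hypothetical replay files; a real runner would read these from meta data.
replays = ["menu.xml", "match_basic.xml", "mutator_energy_shield.xml"]
suite = unittest.TestSuite(make_replay_test(path, fake_run_replay) for path in replays)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

Here result.testsRun is 3 and result.failures holds the single failing replay together with its message, which mirrors the idea of forwarding the game’s exception to the test harness.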

I recorded a little footage with Bandicam to show what this looks like in action. In the first part of the video, I execute three selected test-cases, all passing. Then I edit the third case to add a deliberate failure. This modified case checks the “Energy shield” mutator. It expects that when a match starts with this mutator set, all players have an active shield, and a laser shot hitting the second player will not score a kill, but the shield will be disabled right afterwards. This expected boolean property (ShieldActive) is changed to “true”, which is obviously wrong, as the shield does wear off right after the shot, and the test runner signals the failed assertion:

This way, I just have to press a button, and within a couple minutes I know whether a new/modified version of KREEP is ready to be released or not.

Lessons learned, conclusions, plans for the future:
It exceeded my expectations. I know it’s my brain-child and stuff :D, so who else would be pleased if not me, but I do believe it is going to make my life much easier with game releases and patches in the future, and it will probably help a lot in the mid-production phase too. It took approximately two work days of recording and creating test-cases to reach 80% code coverage on the KREEP code base. This framework is a pretty decent result and I’m happy about making and having it 🙂 ! Also, there are a lot of handy utility features already built in, since I upgraded some parts while I was using it to make it more comfy, but this post is already too enormous to talk about all that stuff 😀 …
A “limitation” which I’m going to fix for my next project is the time it takes to run a full test. It is not yet unbearable or anything ( it takes approximately 5 minutes for KREEP, so a coffee and/or a cup of tea 🙂 ), but for a project with a lengthy single player campaign it could take “too” long, and parallel test-case execution (which NUnit supports) would not help too much (though save-games could help here). A simple antidote to this, which I’m already working on, is a special game-loop which “fakes” a fixed 60 times-per-second update rate, passing down 16.66 elapsed milliseconds to game-objects, but actually steps the simulation as fast as possible ( poor CPU 😀 😛 ), achieving a fast-forward speed-up mode, so to speak.
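
The fast-forward idea boils down to decoupling simulated time from wall-clock time. Here is a minimal Python sketch under that assumption; run_loop, the update callback and the movement speed are all invented for the example.

```python
import time

FIXED_DELTA_MS = 1000.0 / 60.0   # about 16.67 ms of simulated time per update


def run_loop(update, steps, real_time):
    """Call update(FIXED_DELTA_MS) `steps` times and return the simulated milliseconds."""
    simulated_ms = 0.0
    for _ in range(steps):
        frame_start = time.perf_counter()
        update(FIXED_DELTA_MS)               # game objects always see the fixed delta
        simulated_ms += FIXED_DELTA_MS
        if real_time:
            # Normal mode: wait out the rest of the 1/60 s frame.
            remaining = FIXED_DELTA_MS / 1000.0 - (time.perf_counter() - frame_start)
            if remaining > 0:
                time.sleep(remaining)
        # Fast-forward mode: no waiting, the next update starts immediately.
    return simulated_ms


positions = []


def update(delta_ms):
    """Toy update: move at a hypothetical 0.06 units per millisecond."""
    last = positions[-1] if positions else 0.0
    positions.append(last + 0.06 * delta_ms)


simulated = run_loop(update, steps=600, real_time=False)   # ten simulated seconds
```

The game logic cannot tell the two modes apart, because it only ever receives the same fixed delta, so a recorded replay should behave identically while finishing much sooner.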

This post became pretty lengthy and heavily technical, but I wanted to share my latest big achievement in detail (yes, I love tech-talk…).
Meanwhile the work on the Steam release for KREEP is ongoing. It is going much slower than I expected, so the early March release is currently in danger, but I’m doing the best I can. Not all is lost yet. The paperwork is done, I’m a Steamworks partner, it’s official :), and I’m working on integrating the SteamApi. I’m also working hard to add extra content for the release (achievements yeah!!!). I hope it’s going to be cool.

Next time I’ll do a more detailed status report on KREEP+Steam…
Stay tuned!