Thanks to the success of Semantle, I decided to become a game developer. I’m going to try it for a year, and see if I can make a living at it. At the end of May, I quit my job, and started working on a new game called Surfwords. Now, at the beginning of August, I’m nearly done. As soon as I get this released, it’s on to the next thing (probably an archaeology-themed solitaire card game, although maybe I’ll do a two-player version too).

I love words, and I love word games. My favorite is Montage, which is a cross between Taboo and a crossword puzzle. You have to figure out a playable word, clue it, and have your partner guess before both opponents do, all within a one-minute timer. That’s the level of intensity I crave.

I was also inspired by Super Hexagon: it’s simple, you lose a lot, and with practice, you can improve. Surviving for a minute is a real achievement. One thing I like about Super Hexagon is that it more-or-less could have been written for the Vectrex in the eighties, but in fact it wasn’t released until 2012. This gives me hope for the world: there are games out there that are destined to become absolute classics, that we could be making right now.

Surfwords is a fast-paced word building game. Letters drop from the top of the screen in two columns. You have to choose which column to take at each row, in order to build up a series of words. It’s dead simple: you can pick it up and immediately start playing it. But it’s not at all easy.

prefix OWNE; incoming left: X H A; incoming right: R R E

I want to talk about some things that went well, and some things that went poorly, in the hopes that I can learn something from the experience.

One of the easiest things was buying music. I had a bit of a false start: I reached out to a band I like to ask them to make music, and didn’t hear back. Then I reached out to another band I liked, and they quoted me a price which, while totally understandable, was outside of my budget. But then a friend pointed me to Patrick Cornelius. I sent him a “mood board” of tracks that I liked, and he wrote me original music with exactly the sound I was going for, at a price I could afford, and within my timeline. Working with a true professional is wonderful. Next time, I will start buying music and images first thing, to give the maximal possible lead time.

A surprisingly annoying thing was the word list. I started with the word list I had used for Semantle, which I had cobbled together from a variety of Internet sources, but it quickly became clear that this was not going to work: it had too many words that no word game player would like: “nd” (as in 2nd, but without the 2) or “nco” (presumably, an abbreviation for non-commissioned officer, except that would be uppercase, and word games don’t usually allow initialisms). So then I tried the venerable “ENABLE2K” word list, but it was also filled with junk. Oh, unscrupulousnesses, sure, unscrupulousness is a noun, it must be pluralizable. No. Also, as the name implies, it hadn’t been updated in a while. So, no “twerk”, or even “blog”. In the end, I did a ton of work to manually remove thousands of bogus “ness” plurals and add in modern words. Of course, the work is never done: while writing this diary, I thought up a few more missing words, and so I had to go in and add them. There’s nothing more frustrating than a word game that won’t allow a word that you know is real. I have some experience with this: I once sent some raffia to Will Shortz to protest the word’s non-inclusion in Spelling Bee.
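As an illustration only (the cleanup described above was done by hand), here is a minimal Python sketch of how candidate “ness” plurals could be flagged for manual review. The function name and the rule it applies (a word ending in “nesses” whose singular is also in the list) are assumptions made for this example, not the process actually used for Surfwords.

```python
def flag_ness_plurals(words):
    """Return words ending in "nesses" whose "-ness" singular is also listed.

    This only produces candidates: a person still has to decide which
    ones are bogus, because perfectly good plurals are flagged too.
    """
    word_set = set(words)
    return sorted(
        word
        for word in word_set
        if word.endswith("nesses") and word[:-2] in word_set
    )


sample = ["unscrupulousness", "unscrupulousnesses", "witness", "witnesses", "blog"]
candidates = flag_ness_plurals(sample)
```

Note that “witnesses” is flagged alongside “unscrupulousnesses”, which is why a rule like this can only narrow down the list and cannot replace the manual pass.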

For Surfwords, it would be equally frustrating for the game to expect you to know an obscure word. When you fail to make a word in Surfwords, the game tells you what you could have done:

OWNE is not a word; you should have gone for OWNER

But if it shows you a word you didn’t know, you’ll get annoyed. Oh, “pseudocholinesterases”, sure, I definitely should have gone for that. So Surfwords has two word lists: words it expects you to know and words you might know. The first is derived from Google n-grams, but n-grams has a lot of flaws: often a word will be present, but its plural won’t. And some short words are supposedly common, but are actually likely common because they’re typos or OCR errors for more-common words. Finally, some words that I think are commonly known aren’t: several of my testers didn’t know the word “fen”. I readily accept that I have a larger-than-average vocabulary, but I think “fen” is a perfectly ordinary word, and I am baffled that it has turned out to be so obscure.

Today, I’m announcing the release of my word lists. They are licensed as permissively as possible, because I would like them to become standard for word games, to replace the Scrabble word list, which is not freely licensed.

So much for the content. Now, on to the technology. I chose to use the Godot Engine. I’ve been supporting them on Patreon for years, and the price is certainly right. Godot has a decent user experience: it’s got a built-in editor, which is fast and responsive. The built-in debugger is usable, although I miss the ability to evaluate arbitrary code at breakpoints (“wait, what did that function return?”).

Godot has its own programming language, GDScript, which I have been describing as “shitty Python”: it’s a colons-and-indentation syntax with optional static typing. But it doesn’t have function pointers, and you can’t always refer to a class from inside itself.

GDScript is also quite slow. Surfwords stores two sets of words, described above. They’re not exactly big data — the lists total under 200k words, under 2 megabytes. In memory, they’re stored in a trie. But building the trie in GDScript took about 20 seconds on mobile processors. So that had to be fixed. Godot has optional C# support, and the interoperability between GDScript and C# worked fine for me. So my first rewrite of the trie code was in C#. Unfortunately, when it came time to build custom binaries for Android and iOS (stripping out all of the 3d stuff that I wasn’t using to save space), I had a really hard time with the C#.

So I rewrote the loading code again: first, I made a preprocessor in Python to convert the word lists directly to tries (sharing common suffixes) and spit out JSON, and then I wrote code to load the JSON and generate GDScript objects, which Godot can save and load natively. Unfortunately, that was still too slow: each trie node was an object, and the overhead of constructing all of those objects was too high. So my third rewrite stored everything as four large integer arrays (Godot has support for a few types of strongly-typed arrays which are memory-efficient and don’t require boxing). The arrays are: for each trie node, a pointer to its list of children, the number of direct children, a count of total descendants (with the sign bit used to track whether the node can end a word). And for the trie node children, pairs of [letter, child index].
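To make the four-array layout concrete, here is a minimal sketch in Python rather than GDScript. The array names, the use of `~count` to set the sign on word-ending nodes, and the choice to count descendant nodes (rather than descendant words) are all assumptions for illustration; the sketch also does not share common suffixes, which the real preprocessor does.

```python
from array import array


def build_flat_trie(words):
    """Build a trie and flatten it into four integer arrays."""
    root = {"children": {}, "terminal": False}
    for word in words:
        node = root
        for letter in word:
            node = node["children"].setdefault(
                letter, {"children": {}, "terminal": False}
            )
        node["terminal"] = True

    child_start = array("i")  # per node: index of its first [letter, child] pair
    child_count = array("i")  # per node: number of direct children
    descendants = array("i")  # per node: descendant count, negative if word-ending
    edges = array("i")        # flattened [letter, child index] pairs

    def flatten(node):
        index = len(child_start)
        child_start.append(0)
        child_count.append(0)
        descendants.append(0)
        kids = []
        total = 0
        for letter, child in sorted(node["children"].items()):
            child_index, child_total = flatten(child)
            kids.append((letter, child_index))
            total += 1 + child_total
        # Children were flattened first, so this node's pairs stay contiguous.
        child_start[index] = len(edges) // 2
        child_count[index] = len(kids)
        for letter, child_index in kids:
            edges.append(ord(letter))
            edges.append(child_index)
        # ~total is always negative, so the sign marks "can end a word".
        descendants[index] = ~total if node["terminal"] else total
        return index, total

    flatten(root)
    return child_start, child_count, descendants, edges


def find_node(trie, prefix):
    """Return the node index reached by following prefix, or -1 if absent."""
    child_start, child_count, _, edges = trie
    node = 0
    for letter in prefix:
        start = child_start[node]
        for k in range(start, start + child_count[node]):
            if edges[2 * k] == ord(letter):
                node = edges[2 * k + 1]
                break
        else:
            return -1
    return node


def is_prefix(trie, text):
    return find_node(trie, text) >= 0


def is_word(trie, text):
    node = find_node(trie, text)
    return node >= 0 and trie[2][node] < 0


trie = build_flat_trie(["own", "owner", "owners", "has", "hash", "hue"])
```

With this toy list, `is_prefix(trie, "owne")` is true while `is_word(trie, "owne")` is false, which is the distinction the game needs when a player stops at OWNE. Because every node is just a row in flat arrays, loading involves no per-node object construction.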

Loading these four arrays could be just a mmap each, although in practice there’s probably a memcpy in there. Finally, it’s fast enough! This has the obvious disadvantage that the code is now incomprehensible gibberish. On reflection, I think I might be able to reduce it to three arrays. But at the beginning of the project, as I wrote my todo list, I wrote a final item: “It doesn’t have to be beautiful code. It has to work.” And the code works now, so I am not going to mess with it to shave a few milliseconds off of the game’s load time.

I know that the code works because I have a suite of unit tests. Godot has a third-party unit testing framework, GUT. GUT has a ton of whizzy features, but I mostly just wrote three dozen manual tests for the parts of the game that were easiest to test and hardest to get right: letter selection and failure explanation. I also wrote one exhaustive test for letter selection, which turned up an interesting case that I’ll describe later. But first, I have to explain why letter selection is so tricky.

In the first draft of the game, what’s now called “hard letter selection” was the only setting. This drove testers crazy, because of the problem that I initially called “bash/hue”, but which I now call “hash/hue”, since “bas” is apparently not a common word (what do you mean you’re unfamiliar with ancient Egyptian concepts of the soul? Didn’t you go through an Egyptomania phase as a kid?).

prefix: HAS; incoming left: H U Z; incoming right: A Q E

Here, you might be tempted to go for “hash”, since longer words are worth more points. But if you do, you’ll be doomed: none of UZ, UE, QZ, and QE can start any words. So you have to end “has” and start “hue”. This is fair: you can see it coming. But it’s super hard.
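The dead end can be checked mechanically. The following Python sketch uses a three-word toy list and a plain set of prefixes instead of the game’s trie; the helper names are hypothetical and this is not the game’s actual code.

```python
from itertools import product


def prefixes_of(words):
    """Every non-empty prefix of every word, including the words themselves."""
    return {word[:i] for word in words for i in range(1, len(word) + 1)}


def can_start_word(prefixes, rows):
    """True if picking one letter from each upcoming row can begin some word."""
    return any("".join(path) in prefixes for path in product(*rows))


PREFIXES = prefixes_of(["has", "hash", "hue"])

# After finishing "hash", the remaining rows are (U, Q) and (Z, E).
after_hash = can_start_word(PREFIXES, [("u", "q"), ("z", "e")])

# After ending at "has", all three rows are still available for a new word.
after_has = can_start_word(PREFIXES, [("h", "a"), ("u", "q"), ("z", "e")])
```

Here `after_hash` is false and `after_has` is true: taking the H to make “hash” leaves no way to begin another word, while stopping at “has” leaves H, U, E for “hue”.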

Instead, in normal mode, when picking a new letter to add, I consider all four possibilities:

  • A: prefix + two next letters, then end-of-word
  • B: prefix + two next letters, continuing
  • C: prefix + one next letter, then end-of-word, then starting a new word
  • D: end immediately, then start a new word

For each of these possibilities, I have to consider each pair of next letters. So, A is really: prefix + left bottom + left middle, prefix + left bottom + right middle, prefix + right bottom + left middle, prefix + right bottom + right middle. Whichever of these choices you make, you need to be able to continue. Case A is easy: any letter can start a word (although there are only three X words in the short list: xenon, xylophone, and xylophones). The other cases are trickier. Here’s one especially tricky one that my exhaustive testing turned up:

prefix: BAI; incoming left: A S F; incoming right: L I X

What letters should we give here? The user might be thinking of “bail”, or “bails”. But what if they’re thinking of “bailiff”? There’s a problem: if they go for “bail”, the only continuation is “ifs”, so we have to give them an “s”. If they go for “bails”, the only next word must start with either F or X, so the next letter must be one of AEIJLORU (J? oh, fjord, sure). That doesn’t leave room for the terminal F of “bailiff”. So we can’t always support all of the user’s possible choices. But we can support “bail” and “bails”, so we’re still OK.
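The four possibilities A through D, checked against each pair of next letters, can be sketched as follows. This is a simplified Python illustration with a toy word list and a hypothetical function name; it only classifies the paths through the next two rows, whereas the real selection code also has to decide which letters to offer after that.

```python
from itertools import product


def classify_paths(prefix, row1, row2, words):
    """Map each two-letter path through the next rows to the cases it satisfies."""
    words = set(words)
    # Every prefix of every word, including the whole words.
    starts = {w[:i] for w in words for i in range(1, len(w) + 1)}
    # Prefixes that are strictly shorter than some word, so they can continue.
    proper = {w[:i] for w in words for i in range(1, len(w))}

    result = {}
    for first, second in product(row1, row2):
        cases = set()
        if prefix + first + second in words:
            cases.add("A")  # both letters extend the prefix, then the word ends
        if prefix + first + second in proper:
            cases.add("B")  # both letters extend the prefix, and it continues
        if prefix + first in words and second in starts:
            cases.add("C")  # one letter ends the word, the next starts a new one
        if prefix in words and first + second in starts:
            cases.add("D")  # the word ends now, both letters start a new one
        result[first + second] = cases
    return result


TOY_WORDS = ["bail", "bails", "bailiff", "if", "ifs", "fjord"]
paths = classify_paths("bai", ("a", "l"), ("s", "i"), TOY_WORDS)
```

With this toy list, the path L then S satisfies only case A (“bails”), while L then I satisfies B (“baili…” can still become “bailiff”) and C (“bail”, then a word starting with I). The paths through A satisfy nothing, so they are dead ends.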

There’s a related problem with explaining what happened after the user fails: given that we sometimes use random letters, and given that those random letters sometimes make unintended words, how do we tell the user what they should have done? The answer is, frankly, the yuckiest code in Surfwords. It’s totally unprincipled. But it does seem to work, and I’m glad to have the unit tests to prove it.

There are a fair number of things in Godot that don’t work well. Some of them are fixed in the next version, 4.0. I decided not to upgrade yet, because 4.0 is not released, and I think it would be a mistake to depend on pre-release software. Among the issues I hit:

  • On an M1 Mac, building for the simulator doesn’t work.
  • Rounded corners on mobile devices are not well supported
  • General polyline badness
  • Slow text rendering (if you notice some choppiness during the splash screen, that’s what’s going on in the background). I wasted like a week trying various ways to get around this, including doing my own text rendering, but in the end, I gave up and accepted that I would have to have jank somewhere and put it in the splash screen.
  • Regular Buttons don’t support different icons for hover. TextureButtons do, but they aren’t styled like buttons.
  • No built-in set data type
  • The Google Play Games Services plugin has three different versions (corresponding to different Godot versions, none of which is the current version), and they’re all based on v1 of the API, which requires full internet permissions. To be fair, Google’s Play Games Services examples are, as of this writing, also based on v1, and Google’s messaging on this could use some help. In the end, I rewrote just the portion that I cared about to use v2.
  • Outline font fading doesn’t work right. I decided to just live with this, but it is ugly.
  • Full screen didn’t work on OS X due to some random setting (possibly window/stretch/aspect="keep") — I just differentially debugged until it started working.
  • Dialogs get weirdly long

Speaking of Google Play Games Services: making that work was a total nightmare. Admittedly, some of the fault is mine: in one case, I typoed a method name. But that should have caused Godot to give me an error, and instead, it just silently did nothing. Another time, I wasted a bunch of time because I used the wrong kind of OAuth token. The docs were clear, but the token type was counterintuitive (of course you have to use a “web” token rather than an “Android” token for your Android app). And the docs were very clear:

Note that the Intent returned from the Task must be invoked with Activity.startActivityForResult(Intent, int), so that the identity of the calling package can be established.

Yep, that’s right, it inspects your call stack to figure out your permissions, rather than doing something sensible, like taking permissions as an argument. And it can only inspect that call stack if you use the method that does a callback (even though the callback is not used for anything). That’s quality engineering.

But actually, the design of Play Games Services is evil: it automatically installs a provider, playgamesinitprovider, which asks my users to log in before they even start the game. This is an atrocious user experience: my users don’t know if they care about leaderboards until they play the game. When I remove the provider (using horrible Gradle hackery), logging into Play Games doesn’t work unless the Play Games app is installed, and sometimes it doesn’t seem to prompt the user to install it (sometimes it does — I’m not sure what the difference is).

I think one possible lesson here is that I should have just written my own Play Games Services plugin from scratch to start. And another lesson is that maybe I should have just used dreamlo. But I didn’t want to deal with the identity management, so I didn’t.

Another thing that didn’t work: I wanted to do a cool flowing wave background. I was using Material Maker. I ran into a few minor issues with Material Maker itself — for instance, a slight annoyance which required me to manually hack my shader after export. But it was pretty easy to use, and the model is easy to understand.

The background was going to be really pretty! And working on it was a lot of fun, if somewhat time-consuming. Honestly, I was never 100% happy with what I had made. But eventually, I hit the “good enough” point. And that’s where things went wrong. When I exported to Android, I was getting like 30 frames per second on my new-ish phone. So I dug into the generated shader code, and found several micro-optimizations with how gradients were handled. After several hours of messing around, I managed to increase the performance… to 32 frames per second! I scrapped the feature. If I were doing a non-phone game, I would try Material Maker again. But honestly, I think maybe next time I should just find a shader artist to make me a shader.

Speaking of my terrible choices: I decided to make the UI adjust to the screen size. This led to a bunch of last-second changes as a playtester requested screen rotation support for tablets. My UI code was generally terrible. Some of this is due to one of the aforementioned Godot bugs. Some of it is just that I didn’t have a good sense of what sorts of UI interactions I would need, and how they would interact with Godot’s scene system. So I have a lot of special cases where I ought to have a general system. I spent a few hours debugging a case where I set rect_size before rect_min_size (which doesn’t work: rect_size is clamped to rect_min_size). And some of it is because making the game portion of the UI adjust is really tricky: what size do users actually want?

But Godot generally doesn’t have great support for fine-grained manipulation of UI. For instance, I wanted a little breathing room around the dialog describing “hard endings mode”. Normally, one would do this by adjusting the margins, or padding, or something. But the margin settings didn’t work. The only setting that worked was the expand_margin_* settings — which move the border outside of the requested area of the dialog (requiring the dialog to be manually shrunk to accommodate it).

Playtesters had several useful suggestions. Next time, I’ll definitely start with a color palette in mind, rather than building one as I go. I’ll also think about the tutorial from the beginning; because I added Surfwords’s tutorial late in development, it’s not as interactive as I would have liked. I am very glad that I got Surfwords in front of an early tester back before I had done any visual design, since that helped uncover the hash/hue problem in time for me to spend a bunch of time fixing it without delaying the release. Special thanks to playtesters Dan Luu, John, Sky, Josh, Praveen, and Rachel for especially useful feedback, but thanks to all playtesters.

Surfwords is now available for Windows, Mac, and Linux, as well as iOS and Android.
