The ultimate guide to conlang word generators

Creating the vocabulary for a conlang is a never-ending task. To make a conlang which is as expressive as a natural language means creating thousands of words.[1] And, because language is mostly arbitrary, these could sound like anything that your conlang's phonology allows. So how do you go about choosing the root for any given word?

Some conlangers like to invent these roots themselves, pairing meaning together with sound by hand, with only their inner vision to guide them. Other conlangers prefer to have a little help: that is where conlang word generators come in. These are programs that generate at least the sound component of words or morphemes for you, allowing you to select forms you like and pair them with whatever meaning you like. Many conlangers, myself included, feel this helps relieve some of the pressure of having to create forms out of thin air.

But which word generator to use? There are several out there. Some of them require technical skill, while others are more point-and-click. Some handle only particular kinds of languages well, while others are more open ended. Some are commercial and others are free. I've tried every one I could find. This article presents the results of my testing, so you can find the one that works best for you.

Common features

All of the tools reviewed here share some core functionality. They all allow you to:

  • Define phoneme classes: This allows you to group, for example, all the consonants in your language under a symbol C and all the vowels under a symbol V.
  • Define possible syllable shapes out of those phoneme classes: This allows you to specify that the language's syllables all have the shape of CV, for example.[2]
  • Control the shape of words to a certain extent: This means that there is some control over how syllables (or in some cases phonemes) are combined to make full words.
  • Filter unwanted combinations of phonemes: This allows you to prevent the generation of words which contain a sequence of phonemes you don't want.[3]
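
To make these four features concrete, here is a minimal sketch in Python of what such a generator does under the hood. It is not the code of any of the tools reviewed below. The phoneme inventory, the syllable shapes, and the names CLASSES, SYLLABLE_SHAPES, REJECT, and generate are all my own assumptions for illustration.

```python
import random

# Hypothetical phoneme classes: each symbol stands for a set of phonemes.
CLASSES = {
    "C": ["p", "t", "k", "m", "n", "s", "j", "w"],
    "V": ["a", "i", "u"],
}

# Possible syllable shapes, written in terms of the class symbols.
SYLLABLE_SHAPES = ["CV", "V", "CVC"]

# Phoneme sequences that should never appear in a generated word.
REJECT = ["ji", "wu"]


def make_syllable(rng):
    """Pick a syllable shape, then fill each slot from its class."""
    shape = rng.choice(SYLLABLE_SHAPES)
    return "".join(rng.choice(CLASSES[symbol]) for symbol in shape)


def make_word(rng, syllables=2):
    """Build a word by stringing together a fixed number of syllables."""
    return "".join(make_syllable(rng) for _ in range(syllables))


def generate(count, seed=None, syllables=2):
    """Generate `count` words, discarding any that contain a rejected sequence."""
    rng = random.Random(seed)
    words = []
    while len(words) < count:
        word = make_word(rng, syllables)
        if not any(bad in word for bad in REJECT):
            words.append(word)
    return words


print(generate(10, seed=1))
```

Note that the filter looks at the whole word, so a rejected sequence that only arises across a syllable boundary is caught too.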

Where these tools differ is in their power beyond these core features, as well as their ease of use and their overall aesthetics. Let's take a look at how six of the most popular word generators compare to one another.

Awkwords

Awkwords is a simple-seeming tool without a lot of flashy visuals. In fact, it has a decidedly old-school internet feel. It is one of the more basic options on the list but it does have some extra features. One of these is the ability to assign weights to different phonemes, making them appear more or less often in the results. Another is the ability to load and save the language settings so you can return to your work later.
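
Phoneme weighting of this kind can be sketched in a few lines of Python. This is not Awkwords' own syntax or implementation. The weights and the names VOWEL_WEIGHTS and pick are assumptions made up for the example.

```python
import random

# Hypothetical weights: /a/ should turn up five times as often as /u/.
VOWEL_WEIGHTS = {"a": 5, "i": 3, "u": 1}


def pick(weights, rng):
    """Choose one phoneme, with probability proportional to its weight."""
    phonemes = list(weights)
    return rng.choices(phonemes, weights=list(weights.values()), k=1)[0]


rng = random.Random(42)
sample = [pick(VOWEL_WEIGHTS, rng) for _ in range(9000)]
counts = {vowel: sample.count(vowel) for vowel in VOWEL_WEIGHTS}
print(counts)  # /a/ should be the most common vowel and /u/ the rarest
```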

You can find Awkwords here.

Gen & GenWord

I've put these together because GenWord is an improved version of Gen. The look and feel of both Gen and GenWord date back to the classic age of the web. Their functionality is similar to that of Awkwords, including probabilistic phoneme weighting. Both also feature rewrite rules, which allow you to transform a particular sequence of sounds in the output. This is useful for enforcing phonotactic rules. For instance, if your language obligatorily palatalizes /s/ before /i/, you could enforce that in the generated words by rewriting all si sequences as ʃi. Both tools also offer limited control over word size, with a setting which lets you influence how many monosyllabic words are generated.
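
The palatalization example amounts to a simple search-and-replace pass over each generated word. Here is a minimal Python sketch of the idea. It does not use Gen's or GenWord's actual rule syntax, and REWRITE_RULES and apply_rewrites are hypothetical names.

```python
# Each rule is a (sequence to find, replacement) pair, applied in order.
REWRITE_RULES = [("si", "ʃi")]


def apply_rewrites(word, rules=REWRITE_RULES):
    """Apply every rewrite rule to the word, one after another."""
    for old, new in rules:
        word = word.replace(old, new)
    return word


print([apply_rewrites(w) for w in ["sika", "masi", "sasu"]])
# prints ['ʃika', 'maʃi', 'sasu']
```

Because the rules run in order, a later rule sees the output of an earlier one, which is worth keeping in mind when you write more than one.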

You can find Gen here and GenWord here.

GenGo

The next tool, GenGo, is very much in the same vein as the previous three. But you can think of GenGo as a more modern equivalent to those classics. Like Gen & GenWord, it has rewrite rules. But it also has more precise control over word length, with the ability to set a minimum and maximum number of syllables. This makes it possible to create languages where all roots are, for example, monosyllabic. One drawback of GenGo, at least as compared to the tools I've mentioned so far, is that it does not support probabilistic phoneme weighting.
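
Minimum and maximum syllable counts are easy to picture in code. The following Python sketch is only an illustration of the idea, not GenGo's implementation. The tiny inventory and the parameter names min_syllables and max_syllables are my own assumptions.

```python
import random

CONSONANTS = "ptkmns"
VOWELS = "aiu"


def make_word(rng, min_syllables=1, max_syllables=3):
    """Build a word of CV syllables whose count lies within the given bounds."""
    count = rng.randint(min_syllables, max_syllables)  # both ends inclusive
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(count))


rng = random.Random(7)
# Setting both bounds to 1 yields a language of monosyllabic roots.
roots = [make_word(rng, 1, 1) for _ in range(5)]
longer = [make_word(rng, 2, 4) for _ in range(5)]
print(roots, longer)
```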

You can find GenGo here.

Vulgar

Next we have Vulgar, a tool that is unique among the ones I have surveyed in that it is a commercial product (requiring a one-time payment of USD 30), albeit one with a feature-limited free version. It does, however, do a great deal more than the other options available. I will outline its features here and let you decide whether it is worth the cost for you.

First, the visuals: Vulgar has an attractive, modern interface. It provides you with a lot of options, and, as a result, the interface can be a little hard to understand. It's not always clear which options affect which other options, or where the next button you'll have to press will be. That said, using Vulgar is largely a pleasant experience.

One major thing that Vulgar does differently from the alternatives is that it generates a more-or-less complete language for you. This includes things like word meanings, morphological patterns, and word order rules. If you are looking for a language in a box, Vulgar will provide it.

Unlike most of the other generators, Vulgar provides many different export options, including to .tex! As for the generator itself, it is full featured, allowing for rewrite rules, fully customizable word patterns, and probabilistic phoneme distribution.

If you'd like something that goes beyond generating phonological forms, and you're willing to pay a bit,[4] Vulgar could be for you.

You can find Vulgar here.

Lexifer

Lexifer, a word generator written by William S. Annis, was originally a command-line tool only but has recently[5] been ported to a web version.

Lexifer is a powerful tool which offers a few features other tools do not. First, it has paid special attention to creating forms in which different phonemes occur with different frequencies, as they do in natural languages. Although some of the other generators give options for controlling the frequency distribution of phonemes, none uses such a naturalistic model for doing so.
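
To give a feel for what a frequency model buys you, here is a small Python sketch in which a phoneme's weight depends only on its rank in the list, so you never have to write the numbers yourself. The 1/rank (Zipf-like) weighting is my own stand-in for illustration. I am not claiming it is the distribution Lexifer uses, and rank_weights is a hypothetical helper.

```python
import random


def rank_weights(phonemes):
    """Give the phoneme at rank r a weight of 1/r (an assumed, Zipf-like curve)."""
    return [1 / rank for rank in range(1, len(phonemes) + 1)]


# Phonemes listed from most to least frequent.
CONSONANTS = ["t", "k", "n", "s", "m", "p", "l", "w"]

rng = random.Random(3)
sample = rng.choices(CONSONANTS, weights=rank_weights(CONSONANTS), k=20000)
print({c: sample.count(c) for c in CONSONANTS})
```

The appeal of this approach is that reordering the list is all it takes to change which sounds dominate.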

Another feature which sets Lexifer apart is the "cluster table" feature, which allows you to represent in a concise way what happens when two segments collide. A cluster table is a table with two axes, the y axis representing the first segment in the cluster and the x axis representing the second segment.

% a  i  u
a +  +  o
i -  +  uu
u -  -  +
A cluster table drawn from the Lexifer documentation.[6]

The same results can be achieved using a variety of filters and rewrite rules, but cluster tables make things a lot easier.
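
To show how a cluster table boils down to filters and rewrite rules, here is a Python sketch that reads the sample table above and applies it to a few words. This is my own simplified reading of the table, not Lexifer's implementation. The function names parse_cluster_table and apply_cluster_table are hypothetical.

```python
TABLE = """\
% a  i  u
a +  +  o
i -  +  uu
u -  -  +
"""


def parse_cluster_table(text):
    """Turn the table into a list of (cluster, action) rules, row by row."""
    lines = [line.split() for line in text.strip().splitlines()]
    columns = lines[0][1:]  # second segments, from the header row
    rules = []
    for row in lines[1:]:
        first, cells = row[0], row[1:]
        for second, action in zip(columns, cells):
            rules.append((first + second, action))
    return rules


def apply_cluster_table(word, rules):
    """Return the repaired word, or None if it contains an illegal cluster."""
    for cluster, action in rules:
        if cluster not in word:
            continue
        if action == "-":  # illegal combination: reject the word
            return None
        if action != "+":  # anything else is the replacement text
            word = word.replace(cluster, action)
    return word


rules = parse_cluster_table(TABLE)
print([apply_cluster_table(w, rules) for w in ["kau", "tiu", "pia", "mai"]])
# prints ['ko', 'tuu', None, 'mai']
```

The nine cells of the table stand in for four filters and two rewrite rules, which is where the time saving comes from.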

The third unique feature Lexifer offers is its built-in rules for common phonological processes, such as nasal place assimilation (e.g. /an + pa/ → /ampa/) and voicing assimilation in stops (e.g. /ab + ta/ → /apta/). Since these processes occur over and over again, it's a nice time-saver that you can direct Lexifer to apply them out of the box.
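
For comparison, here is roughly what you would have to write yourself if these processes were not built in. This Python sketch is not Lexifer's code. The stop and nasal inventories are assumptions, and I have assumed that voicing spreads from the second stop to the first, as in the /ab + ta/ example.

```python
import re

# Assumed inventory: the nasal that matches each stop's place of articulation.
NASAL_FOR_PLACE = {"p": "m", "b": "m", "t": "n", "d": "n", "k": "ŋ", "g": "ŋ"}
# Voiced stops mapped to their voiceless counterparts, and the reverse.
DEVOICE = {"b": "p", "d": "t", "g": "k"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def assimilate_nasals(word):
    """Make a nasal take the place of articulation of a following stop."""
    return re.sub(
        r"[mnŋ]([pbtdkg])",
        lambda m: NASAL_FOR_PLACE[m.group(1)] + m.group(1),
        word,
    )


def assimilate_voicing(word):
    """Make the first stop of a two-stop cluster agree in voicing with the second."""
    def fix(match):
        first, second = match.group(1), match.group(2)
        if second in DEVOICE:  # the second stop is voiced
            first = VOICE.get(first, first)
        else:  # the second stop is voiceless
            first = DEVOICE.get(first, first)
        return first + second

    return re.sub(r"([pbtdkg])([pbtdkg])", fix, word)


print(assimilate_nasals("anpa"), assimilate_voicing("abta"))
# prints ampa apta
```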

The one caveat with Lexifer is that it requires a certain level of technical ability to be able to debug the specifications when things don't come out as you expect.[7] To a certain extent, this is true of all the tools I've described, but it is especially true of Lexifer: because Lexifer does more than these other tools, more can go wrong. That said, you don't need to know how to code to use it: you may just need to spend some time poring over the documentation.

You can find Lexifer here.

Bonus: Logopoeist

Logopoeist is really cool: it is the only tool I've seen to implement a full statistical phonotactics system. In other words, it lets you specify the conditional probability of a phoneme given what has come before. This is extremely powerful: it means that you can model things like vowel harmony with relative ease.
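
Here is a small Python sketch of the idea of conditional probabilities, using front/back vowel harmony as the example. It does not use Logopoeist's configuration format or code. The inventory, the weights, and the names FIRST_VOWEL, NEXT_VOWEL, and make_word are all assumptions for illustration.

```python
import random

CONSONANTS = ["p", "t", "k", "m", "n", "l"]

# Weights for the first vowel of a word, when nothing has come before.
FIRST_VOWEL = {"i": 1, "e": 1, "u": 1, "o": 1}

# Weights for the next vowel given the previous one. A vowel that is missing
# from a row has probability zero, so front and back vowels never mix.
NEXT_VOWEL = {
    "i": {"i": 3, "e": 2},
    "e": {"i": 2, "e": 3},
    "u": {"u": 3, "o": 2},
    "o": {"u": 2, "o": 3},
}


def weighted_pick(weights, rng):
    """Choose one key, with probability proportional to its weight."""
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


def make_word(rng, syllables=3):
    """Build CV syllables, conditioning each vowel on the vowel before it."""
    word = ""
    previous = None
    for _ in range(syllables):
        table = FIRST_VOWEL if previous is None else NEXT_VOWEL[previous]
        vowel = weighted_pick(table, rng)
        word += rng.choice(CONSONANTS) + vowel
        previous = vowel
    return word


rng = random.Random(11)
words = [make_word(rng) for _ in range(10)]
print(words)
```

Every word that comes out contains only front vowels or only back vowels, without a single filter or rewrite rule.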

The increased power, however, does come along with increased complexity. And, since Logopoeist is only available as a command-line utility, if you're not comfortable with using the command line,[8] I recommend checking out one of the web-based options above.

You can find Logopoeist here.

Conclusion

In the end, all of these tools do a good job at their core purpose: generating phonological shapes for words and morphemes. It's in other things, such as ease of use, look and feel, and advanced options that you see the differences between the various word generators.

If you're comfortable with the syntax, I recommend Lexifer, especially if you want to take advantage of its phoneme frequency and cluster resolution rule features. Otherwise, try GenGo or, if you're willing to pay a bit of money to unlock the advanced features, Vulgar. Vulgar is also your best bet if you don't have the time or interest to customize the language a lot.

Footnotes

  1. More specifically, you need to create thousands of roots and other morphemes to build words out of, as well as methods for building words out of these morphemes. Usually there will be thousands and thousands of these morphemes, but if you are making an oligosynthetic conlang, such as Toki Pona, you can get away with a much smaller number.

  2. A CV syllable is a syllable consisting of any one of the language's consonants followed by any one of the language's vowels.

  3. Common sequences on my reject list are /ji/ and /wu/.

  4. If you use Vulgar, you'll probably end up wanting to pay for the premium version. The free version is fairly limited in that it only generates 200 words and does not allow you to save the language specification, among other, more minor restrictions.

  5. As of July 2021.

  6. The + symbols indicate a legal combination and the - symbols an illegal combination. Any other character indicates the result of a change. For example, in the sample table, the combination of /au/ is rewritten as /o/.

  7. In my experience, omitting symbols from the letters directive is a common place for errors to occur.

  8. On a Mac, the command line is accessible through the Terminal app. On Windows, it can be found using the Command Prompt. If you're using Linux, you don't need me to tell you how to access the command line.
