How It Works

The linguistics behind the nonsense

The Problem With Pure Randomness

A truly random string of characters — say, xqzbt or fkrmpl — is unpronounceable and immediately recognisable as machine-generated. For a word to feel like it could be a real English word, it needs to follow the same phonological rules that real English words follow.

Those rules are not arbitrary. Every language constrains which sounds can appear next to each other, in what positions, and in what combinations. This is called phonotactics. Wurbz applies English phonotactics at every stage of word construction.

Syllable Structure: Onset · Nucleus · Coda

The fundamental unit of English phonology is the syllable. Every English syllable is built from three components:

  • Onset — the opening consonant or consonant cluster. Can be empty (as in up), a single consonant (t in top), or a cluster of up to three consonants (str in strength).
  • Nucleus — the vowel core of the syllable. Can be a single vowel (a, e, i, o, u) or a vowel digraph (ea, ou, ai, ee).
  • Coda — the closing consonant or consonant cluster. Can be empty (open syllable), a single consonant (t, n, d), or a cluster (nd, st, ng). The generator also offers common suffixes such as -ing, -tion, -ness and -er in this slot, although strictly speaking these are whole syllables rather than codas.

Each syllable in a generated word is assembled by filling these three slots independently. The finished syllables are then joined together.
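A minimal Python sketch of the slot model follows. The slot tables are small hypothetical samples taken from the lists on this page, and each slot is filled with a uniform random choice for simplicity, leaving out the frequency weighting described in the next section. The names ONSETS, NUCLEI, CODAS, build_syllable and build_word are illustrative, not the generator's own.

```python
import random

# Hypothetical slot tables: small samples from the lists on this page.
# An empty string stands for an empty slot.
ONSETS = ["", "s", "c", "p", "t", "m", "b", "r", "st", "br", "tr"]
NUCLEI = ["a", "e", "i", "o", "er", "ea", "ou"]
CODAS = ["", "t", "n", "d", "s", "ng"]


def build_syllable(rng: random.Random) -> str:
    """Fill onset, nucleus and coda independently, then join them."""
    onset = rng.choice(ONSETS)    # "" gives a vowel-initial syllable, as in "up"
    nucleus = rng.choice(NUCLEI)  # the nucleus is never empty
    coda = rng.choice(CODAS)      # "" gives an open syllable
    return onset + nucleus + coda


def build_word(syllable_count: int, rng: random.Random) -> str:
    """Build each syllable separately and concatenate the results."""
    return "".join(build_syllable(rng) for _ in range(syllable_count))


rng = random.Random(42)
print(build_word(2, rng))
```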

Weighted Phoneme Frequency

Not all sounds are equally common in English. The generator uses frequency tables in which each onset, nucleus and coda option (written as a spelling such as c, qu or ea rather than as a phonetic symbol) is assigned a weight proportional to how often it appears in real English words.

Common onsets (high weight): s, c, p, t, m, b, r, st, br, tr

Rare onsets (low weight): x, z, qu, wh, str, spr

Common nuclei: a, e, i, o, er, ea, ou

Common codas: -e (silent final-e pattern), -t, -n, -d, -s, -ng, -tion, -er, -ing, -ness

A weighted random selection is made at each slot, so common patterns appear more often while rare ones still occur occasionally — just as in real English vocabulary.
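Weighted selection at a single slot can be sketched with Python's random.choices, which accepts a weights argument. The numeric weights below are assumptions: the page says which onsets are common and which are rare, but gives no exact values. ONSET_WEIGHTS and weighted_pick are hypothetical names.

```python
import random
from collections import Counter

# Hypothetical weights: the page names common and rare onsets but gives no
# numbers, so these values only preserve the "common vs rare" split.
ONSET_WEIGHTS = {
    # common onsets (high weight)
    "s": 10, "c": 9, "p": 8, "t": 9, "m": 8,
    "b": 7, "r": 7, "st": 6, "br": 5, "tr": 5,
    # rare onsets (low weight)
    "x": 1, "z": 1, "qu": 1, "wh": 1, "str": 1, "spr": 1,
}


def weighted_pick(table: dict[str, int], rng: random.Random) -> str:
    """Return one key, chosen with probability proportional to its weight."""
    options = list(table)
    weights = list(table.values())
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(7)
counts = Counter(weighted_pick(ONSET_WEIGHTS, rng) for _ in range(10_000))
print(counts.most_common(3))  # all three come from the common group
print(counts["x"] > 0)        # rare onsets still turn up occasionally
```

random.choices treats the weights as relative, so they do not need to sum to 1 or 100; the same helper works unchanged for the nucleus and coda tables.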

Syllable Count Patterns

Words are built from one of five patterns, each chosen with a fixed probability that loosely follows the distribution of English word lengths, with longer words deliberately under-weighted:

Pattern                   Example shape      Chance
Monosyllabic              CVC → Grolt        25%
Disyllabic (CVC + CVC)    Bran · ston        30%
Disyllabic (CV + CVC)     Ve · xmore         25%
Disyllabic + suffix       Crest · er         15%
Trisyllabic               Me · rri · den      5%

The three disyllabic patterns together account for 70% of generated words because two-syllable words are very common in English vocabulary. The trisyllabic pattern is rare by design: longer words are harder to evaluate quickly and less useful as brand names or character names.
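The pattern table can be turned into a weighted draw in the same way. This sketch copies the percentages from the table; the pattern labels and the choose_pattern function are hypothetical. Drawing many patterns confirms that the three disyllabic shapes together make up about 70% of output.

```python
import random

# Percentages copied from the pattern table; the labels are hypothetical.
PATTERN_WEIGHTS = {
    "monosyllabic": 25,
    "disyllabic_cvc_cvc": 30,
    "disyllabic_cv_cvc": 25,
    "disyllabic_suffix": 15,
    "trisyllabic": 5,
}


def choose_pattern(rng: random.Random) -> str:
    """Draw one pattern label with probability proportional to its weight."""
    labels = list(PATTERN_WEIGHTS)
    weights = list(PATTERN_WEIGHTS.values())
    return rng.choices(labels, weights=weights, k=1)[0]


rng = random.Random(2024)
draws = [choose_pattern(rng) for _ in range(100_000)]
disyllabic = sum(d.startswith("disyllabic") for d in draws) / len(draws)
print(f"disyllabic share: {disyllabic:.3f}")  # close to 0.700
```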

A Word Being Built: Step by Step

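Here is an example of the disyllabic (CVC + CVC) pattern producing the word Merriden. Only two syllable units are assembled, but the multi-letter coda den adds a vowel, so the result reads as three spoken syllables and also fits the trisyllabic shape shown in the table above: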

  Pattern selected: Disyllabic CVC + CVC
  Syllable 1 onset: m (common consonant, high weight)
  Syllable 1 nucleus: er (r-coloured vowel spelling, common)
  Syllable 1 coda: r (sonorant, common)
  → First syllable: "merr"
  Syllable 2 onset: (empty; the syllable starts with its vowel)
  Syllable 2 nucleus: i (short vowel)
  Syllable 2 coda: den (multi-letter ending with a place-name feel)
  → Second syllable: "iden"
  Raw result: "merriden"
  Validation: ✓ has a vowel  ✓ 8 characters  ✓ no triple clusters
  Capitalised: Merriden
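To make the concatenation concrete, this sketch replays the Merriden trace with the slot values fixed rather than drawn at random. The syllable helper is hypothetical.

```python
def syllable(onset: str, nucleus: str, coda: str) -> str:
    """Join the three slots of one syllable (onset and coda may be empty)."""
    return onset + nucleus + coda


# Slot values taken from the Merriden trace instead of being drawn at random.
first = syllable("m", "er", "r")     # "merr"
second = syllable("", "i", "den")    # empty onset: "iden"
raw = first + second                 # "merriden"
print(raw, len(raw), raw.capitalize())  # merriden 8 Merriden
```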

Validation Rules

After a candidate word is assembled, it is tested against a set of hard rules. Words that fail are discarded and a new attempt is made (up to 50 times).

  • Must contain at least one vowel. Pure consonant strings are rejected.
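  • No three or more consecutive consonants. Long runs such as rltk are hard to pronounce, so any run of three or more consonants is rejected. This is stricter than English itself, which tolerates runs like the nstr in instrument.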
  • No three or more consecutive vowels. Sequences like aou or eea produce awkward results.
  • Q must be followed by U. The English spelling rule qu is enforced; a lone q is rejected.
  • Length between 3 and 12 characters. Very short candidates are likely to be existing words already; very long ones are unwieldy.
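The hard rules translate naturally into a few regular-expression checks. This is a sketch under two assumptions the page does not state: candidates are lowercase letters only, and y counts as a consonant. The function name is_valid is illustrative.

```python
import re

VOWELS = "aeiou"  # assumption: y is treated as a consonant


def is_valid(word: str) -> bool:
    """Apply the hard validation rules to a lowercase, letters-only candidate."""
    w = word.lower()
    if not 3 <= len(w) <= 12:              # length between 3 and 12
        return False
    if not any(ch in VOWELS for ch in w):  # at least one vowel
        return False
    if re.search(r"[^aeiou]{3}", w):       # three or more consonants in a row
        return False
    if re.search(r"[aeiou]{3}", w):        # three or more vowels in a row
        return False
    if re.search(r"q(?!u)", w):            # every q must be followed by u
        return False
    return True


for candidate in ["merriden", "quest", "fkrmpl", "qat", "strelt"]:
    print(candidate, is_valid(candidate))
```

Applied literally to letters as in this sketch, the consonant-run check also rejects any candidate that uses the three-letter onsets str or spr from the frequency tables; strelt fails for that reason.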

Refinement

Words that pass validation are then refined to remove a few patterns that slip through the statistical model:

  • Triple character runs — sequences of three identical characters in a row (e.g., sss) are collapsed to two.
  • Malformed Q — any q not already followed by u has a u inserted after it.
  • Double j — jj is reduced to a single j, since this pair is virtually unknown in English spelling.

After refinement, the final word is capitalised and returned to the user.
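The three refinement passes and the final capitalisation can be sketched as regular-expression substitutions. Two interpretations here are assumptions: runs longer than three identical characters are also collapsed to two, and any run of j is reduced to a single j. The refine function name is hypothetical.

```python
import re


def refine(word: str) -> str:
    """Apply the three refinement passes, then capitalise the result."""
    # Collapse any run of three or more identical characters to two.
    word = re.sub(r"(.)\1{2,}", r"\1\1", word)
    # Insert a u after any q that is not already followed by one.
    word = re.sub(r"q(?!u)", "qu", word)
    # Reduce a doubled (or longer) j to a single j.
    word = re.sub(r"j{2,}", "j", word)
    return word.capitalize()


for candidate in ["brasss", "qesp", "bajjo", "merriden"]:
    print(refine(candidate))  # Brass, Quesp, Bajo, Merriden (one per line)
```

In this ordering (validate, then refine) the first two passes act purely as a safety net, because the validation rules already reject a lone q and any run of three identical letters (which is necessarily three consonants or three vowels). The jj pass is the one that can still change a validated word.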

Fallback Words

In extremely rare cases, when 50 consecutive generation attempts all fail validation, the generator falls back to a curated list of hand-picked nonsense words. This list was assembled manually to guarantee quality and serves as a safety net, not a primary source.
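The retry-and-fallback loop can be sketched as follows. It takes the build, validation and refinement steps as parameters so it runs on its own; the stand-in functions, the generate name and the three fallback words are hypothetical, since the curated list is not published on this page.

```python
import random
from typing import Callable

MAX_ATTEMPTS = 50  # the page's limit on consecutive failed candidates

# Hypothetical fallback list: the real curated list is not published here.
FALLBACK_WORDS = ["Trundlow", "Vemmick", "Plosker"]


def generate(
    build: Callable[[random.Random], str],
    is_valid: Callable[[str], bool],
    refine: Callable[[str], str],
    rng: random.Random,
) -> str:
    """Try up to MAX_ATTEMPTS candidates, then fall back to the curated list."""
    for _ in range(MAX_ATTEMPTS):
        candidate = build(rng)
        if is_valid(candidate):
            return refine(candidate)
    # Every attempt failed validation, so use the safety net.
    return rng.choice(FALLBACK_WORDS)


# Hypothetical stand-ins so the loop can run on its own.
def always_bad(rng: random.Random) -> str:
    return "xqzbt"  # no vowel, so it always fails


def always_good(rng: random.Random) -> str:
    return "merriden"


def has_vowel(word: str) -> bool:
    return any(ch in "aeiou" for ch in word)


rng = random.Random(1)
print(generate(always_good, has_vowel, str.capitalize, rng))  # Merriden
print(generate(always_bad, has_vowel, str.capitalize, rng) in FALLBACK_WORDS)  # True
```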