Handling Special Characters in URL Slugs

Most slug generators work fine for plain English text. But the real world has accented characters, non-Latin scripts, emojis, and symbols. This guide covers how to handle all of them when generating URL slugs.

The Problem: URLs Have a Limited Character Set

URLs can safely contain only a subset of ASCII: letters (a-z, A-Z), digits (0-9), and a few symbols such as hyphens and underscores. Everything else must be either converted or percent-encoded (like %C3%A9 for é). Percent-encoded URLs are ugly, hard to read, and bad for SEO. The solution is transliteration.
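To make the percent-encoding problem concrete, here is a small illustration using the standard encodeURIComponent function. The sample title is only an example, written with escape sequences so that each é is guaranteed to be the single code point U+00E9.

```javascript
// 'Café Résumé' with each é written as the single code point U+00E9.
const title = 'Caf\u00e9 R\u00e9sum\u00e9';

// encodeURIComponent percent-encodes every UTF-8 byte of a non-ASCII character
// (and the space), which is what makes the result hard to read.
const encoded = encodeURIComponent(title);
console.log(encoded); // Caf%C3%A9%20R%C3%A9sum%C3%A9

// The encoding is lossless: decoding gives back the original text.
console.log(decodeURIComponent(encoded) === title); // true
```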

Transliteration: The Core Technique

Transliteration converts characters from one script to their closest ASCII equivalent. Some common examples:

Input       | Transliterated | Language
é, è, ê, ë  | e              | French, Portuguese
ü, ö, ä     | ue, oe, ae     | German
ñ           | n              | Spanish
ß           | ss             | German
ç           | c              | French, Portuguese

Our Slug Generator performs transliteration automatically when the “Transliterate” option is enabled.
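If you are implementing transliteration yourself, one possible approach is a lookup table. The sketch below is not how any particular tool works; the names RAW_MAP, TRANSLIT_MAP, and transliterate are hypothetical, and the mapping only covers the characters from the table above.

```javascript
// Hypothetical mapping built from the examples in the table; extend as needed.
const RAW_MAP = {
  'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss',
  'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
  'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
  'ñ': 'n', 'ç': 'c',
};

// Normalize the keys to NFC so each one is a single code point,
// however this source file happens to be encoded.
const TRANSLIT_MAP = new Map(
  Object.entries(RAW_MAP).map(([key, value]) => [key.normalize('NFC'), value])
);

function transliterate(text) {
  return text
    .normalize('NFC') // compose accents so characters can match the map keys
    .replace(/[^\x00-\x7F]/gu, (ch) => TRANSLIT_MAP.get(ch) ?? ch);
}

console.log(transliterate('Gr\u00f6\u00dfe'));              // Groesse
console.log(transliterate('jalape\u00f1o fa\u00e7ade'));    // jalapeno facade
```

Characters that are not in the map are left unchanged. Note that the right mapping depends on the language: "ue" for ü follows German convention, while other languages that use ü would usually expect a plain "u".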

Unicode Normalization

Before transliteration, you need to normalize Unicode text. The character é can be stored two ways in Unicode:

  • Composed (NFC): a single code point (U+00E9)
  • Decomposed (NFD): the letter “e” followed by a combining accent mark (U+0065 + U+0301)

Both look identical on screen but are different bytes. Normalizing to NFKD (compatibility decomposition) separates the base letter from its accent mark, making it easy to strip the accents:

// JavaScript
function removeAccents(text) {
  return text
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '');
}

removeAccents('Café Résumé');
// Output: Cafe Resume
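The two representations of é described above can be inspected directly with the standard String.prototype.normalize method. This short illustration also shows a limit of the accent-stripping approach: letters that are not a base letter plus an accent have no decomposition, so NFKD leaves them untouched.

```javascript
const composed = '\u00e9';     // é as one code point (NFC form)
const decomposed = 'e\u0301';  // e followed by a combining acute accent (NFD form)

console.log(composed === decomposed);                  // false
console.log(composed.length, decomposed.length);       // 1 2
console.log(composed.normalize('NFD') === decomposed); // true
console.log(decomposed.normalize('NFC') === composed); // true

// Compatibility decomposition (the K in NFKD) also expands ligatures:
console.log('\ufb01'.normalize('NFKD')); // fi

// ß (U+00DF) and ø (U+00F8) have no decomposition, so they pass through unchanged:
console.log('\u00df'.normalize('NFKD') === '\u00df'); // true
console.log('\u00f8'.normalize('NFKD') === '\u00f8'); // true
```

In practice this means accent stripping turns ü into "u" rather than "ue" and does nothing for ß, so language-specific replacements need their own mapping step before accents are removed.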

CJK Characters (Chinese, Japanese, Korean)

CJK characters don’t transliterate to Latin in a meaningful way. There are two common approaches:

  1. Use romanization (pinyin for Chinese, romaji for Japanese). Libraries like pinyin (npm) or kuroshiro can do this, but the results may not match user expectations.
  2. Keep the characters as-is. Modern browsers and search engines handle Unicode URLs well, and Google can index URLs with CJK characters. The URL is percent-encoded when it is sent over the network or copied as text, but browsers and search results typically display the original characters.

For most use cases, option 2 is safer: romanized CJK text can be unreadable to native speakers.
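A minimal sketch of option 2, assuming a runtime that supports Unicode property escapes in regular expressions (modern browsers and Node.js do). The function name unicodeSlug is hypothetical. It keeps letters, combining marks, and digits from any script and turns everything else into hyphens.

```javascript
function unicodeSlug(text) {
  return text
    .normalize('NFKC')                         // compose and fold compatibility forms
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, '-')     // anything but letters/marks/digits -> hyphen
    .replace(/^-+|-+$/g, '');                  // trim leading and trailing hyphens
}

console.log(unicodeSlug('东京 旅行 Guide')); // 东京-旅行-guide

// What actually travels over the wire is the percent-encoded form:
console.log(encodeURI('/blog/东京')); // /blog/%E4%B8%9C%E4%BA%AC
```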

Emojis in URLs

While technically possible (emojis get percent-encoded), emojis in slugs are a bad idea:

  • They become extremely long percent-encoded strings
  • They break in many systems and APIs
  • Search engines may not index them properly
  • They can’t be typed manually

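Always strip emojis during slug generation. A simple regex pattern removes the most common ones (emoticons, transport symbols, miscellaneous symbols, and dingbats), although it does not cover every emoji block: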

text.replace(/[\u{1F600}-\u{1F6FF}\u{2600}-\u{26FF}\u{2700}-\u{27BF}]/gu, '')
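For broader coverage than fixed code point ranges, one option is to use Unicode property escapes. The sketch below is an alternative technique, not the pattern above; the function name stripEmojis is hypothetical. It also removes the invisible characters that glue emoji sequences together, such as the zero-width joiner and the variation selector.

```javascript
function stripEmojis(text) {
  // Extended_Pictographic: most emoji and pictographic symbols.
  // Emoji_Modifier: skin tone modifiers. Regional_Indicator: flag letters.
  // U+FE0F (variation selector), U+200D (zero-width joiner), U+20E3 (keycap).
  return text.replace(
    /[\p{Extended_Pictographic}\p{Emoji_Modifier}\p{Regional_Indicator}\u{FE0F}\u{200D}\u{20E3}]/gu,
    ''
  );
}

console.log(stripEmojis('Launch \u{1F680} day'));  // "Launch  day" (two spaces remain)
console.log(stripEmojis('Top 10 \u{1F44D}\u{1F3FD}')); // "Top 10 "
```

The broader \p{Emoji} property is deliberately avoided here because it also matches the digits 0-9, # and *. Extended_Pictographic does match a few non-emoji symbols such as © and ®, which is usually acceptable for slugs. Leftover double spaces are collapsed later when whitespace is converted to hyphens.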

Common Special Characters

Here’s how a good slug generator handles common special characters:

Character   | Action                       | Reason
&           | Replace with “and” or remove | Has special meaning in URLs
@, #, ?, =  | Remove                       | Reserved URL characters
“ ” ‘ ’     | Remove                       | Punctuation, no semantic value in slug
— –         | Replace with hyphen          | Similar purpose, normalize to separator
/ \         | Replace with hyphen          | Slashes create path segments
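The rules in the table can be sketched as a chain of replacements. This is one possible implementation, not the behavior of any specific library; the function name replaceSpecialChars is hypothetical, and it picks the “and” option for the ampersand.

```javascript
function replaceSpecialChars(text) {
  return text
    .replace(/&/g, ' and ')                          // ampersand -> the word "and"
    .replace(/[\u2013\u2014\/\\]/g, '-')             // en dash, em dash, slashes -> hyphen
    .replace(/[@#?=]/g, '')                          // reserved URL characters
    .replace(/[\u201C\u201D\u2018\u2019"']/g, '');   // curly and straight quotes
}

// Input: Q&A: “Why?” (em dash) docs/api
console.log(replaceSpecialChars('Q&A: \u201CWhy?\u201D \u2014 docs/api'));
// Q and A: Why - docs-api
```

This step only handles the characters in the table. Spaces, colons, and repeated hyphens are still present afterwards and need a final pass that collapses them into single hyphens.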

Testing Your Slug Generator

Test with these tricky inputs to make sure your slug generator handles edge cases:

  • “Café Menu — Special Édition!” → cafe-menu-special-edition
  • “100% Free & Open Source” → 100-free-and-open-source or 100-free-open-source
  • “  Lots   of   spaces  ” → lots-of-spaces
  • “---triple---hyphens---” → triple-hyphens
  • Empty string → should return empty or a fallback
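These cases can be turned into a small self-checking script. The slugify function below is a minimal ASCII-only sketch included just so the checks have something to run against; the fallback value 'untitled' is an assumption, and a real generator may make different choices (for example, dropping “and”).

```javascript
function slugify(text, fallback = 'untitled') {
  const slug = text
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '')  // strip combining accent marks
    .toLowerCase()
    .replace(/&/g, ' and ')           // ampersand -> the word "and"
    .replace(/[^a-z0-9]+/g, '-')      // every other run of characters -> one hyphen
    .replace(/^-+|-+$/g, '');         // trim leading and trailing hyphens
  return slug || fallback;            // empty result -> fallback
}

// [input, expected] pairs taken from the list above.
const cases = [
  ['Caf\u00e9 Menu \u2014 Special \u00c9dition!', 'cafe-menu-special-edition'],
  ['100% Free & Open Source', '100-free-and-open-source'],
  ['  Lots   of   spaces  ', 'lots-of-spaces'],
  ['---triple---hyphens---', 'triple-hyphens'],
  ['', 'untitled'],
];

for (const [input, expected] of cases) {
  const actual = slugify(input);
  console.log(actual === expected ? 'PASS' : 'FAIL', JSON.stringify(input), '->', actual);
}
```

Because this sketch keeps only a-z and 0-9, it discards CJK and other non-Latin text entirely, so it suits the transliteration strategy rather than the keep-as-is strategy.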

Try these examples directly in our Slugify Online tool and see how it handles each case.

Library Support

For a deeper comparison of slug libraries across languages, check our guide on how to slugify text in JavaScript, Python, and PHP. Each library handles special characters differently, so choose one that fits your use case.