Handling Special Characters in URL Slugs: A Developer Guide

Most slug generators work fine for plain English text. But the real world has accented characters, non-Latin scripts, emojis, and symbols. This guide covers how to handle all of them when generating URL slugs.

The Problem: URLs Have a Limited Character Set

URLs can only safely contain a subset of ASCII: letters (a-z, A-Z), digits (0-9), and a few symbols like hyphens and underscores. Everything else must be either converted or percent-encoded (like %C3%A9 for é). Percent-encoded URLs are ugly, hard to read, and bad for SEO. For accented Latin text, the usual solution is transliteration.
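To make the percent-encoding concrete, here is a minimal sketch using the built-in encodeURIComponent. The sample string is only an illustration, and the accented letters are written as escapes so the code point is unambiguous:

```javascript
// Percent-encoding turns each UTF-8 byte of a non-ASCII character into %XX.
const encoded = encodeURIComponent('caf\u00e9-r\u00e9sum\u00e9'); // "café-résumé"
console.log(encoded); // caf%C3%A9-r%C3%A9sum%C3%A9

// Decoding restores the original text.
const decoded = decodeURIComponent(encoded);
console.log(decoded === 'caf\u00e9-r\u00e9sum\u00e9'); // true
```

Each é costs six characters in the encoded form, which is why slugs usually avoid it.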

Transliteration: The Core Technique

Transliteration converts characters from one script to their closest ASCII equivalent. Some common examples:

Input	Transliterated	Language
é, è, ê, ë	e	French, Portuguese
ü, ö, ä	ue, oe, ae	German
ñ	n	Spanish
ß	ss	German
ç	c	French, Portuguese

Our Slug Generator performs transliteration automatically when the “Transliterate” option is enabled.

Unicode Normalization

Before transliteration, you need to normalize Unicode text. The character é can be stored two ways in Unicode:

Composed (NFC): a single code point (U+00E9)
Decomposed (NFD): the letter “e” followed by a combining accent mark (U+0065 + U+0301)
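The two forms can be compared directly in JavaScript. This is a small sketch; the variable names are just for illustration:

```javascript
const composed = '\u00e9';     // é as a single code point (NFC form)
const decomposed = 'e\u0301';  // e followed by a combining acute accent (NFD form)

console.log(composed === decomposed);                   // false
console.log(composed.length, decomposed.length);        // 1 2
console.log(composed.normalize('NFD') === decomposed);  // true
console.log(decomposed.normalize('NFC') === composed);  // true
```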

Both look identical on screen but are stored as different code point sequences, so a plain string comparison treats them as different. Normalizing to NFKD (compatibility decomposition) separates the base letter from its accent mark, making it easy to strip the accents:

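// JavaScript
function removeAccents(text) {
  return text
    .normalize('NFKD')                 // split each letter from its combining marks
    .replace(/[\u0300-\u036f]/g, '');  // drop combining diacritical marks (U+0300 to U+036F)
}

removeAccents('Café Résumé');
// Output: Cafe Resume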
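Note that accent stripping alone turns ü into "u" and leaves ß untouched, because ß has no Unicode decomposition. To get the German mappings from the transliteration table (ue, oe, ae, ss), one option is to apply a replacement map before stripping accents. The sketch below assumes German conventions are wanted; GERMAN_MAP and transliterate are hypothetical names, and the map would be wrong for languages such as Swedish or Turkish that use the same letters differently:

```javascript
// Hypothetical map covering the German rows of the table; extend per language as needed.
const GERMAN_MAP = {
  '\u00e4': 'ae', '\u00f6': 'oe', '\u00fc': 'ue', // ä ö ü
  '\u00c4': 'Ae', '\u00d6': 'Oe', '\u00dc': 'Ue', // Ä Ö Ü
  '\u00df': 'ss',                                 // ß
};

function transliterate(text) {
  return text
    .normalize('NFC') // compose first so each umlaut is a single code point the map can match
    .replace(/[\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df]/g, (ch) => GERMAN_MAP[ch])
    .normalize('NFKD')                // then decompose whatever is left
    .replace(/[\u0300-\u036f]/g, ''); // and strip the remaining accents
}

// "Straße für Café"
console.log(transliterate('Stra\u00dfe f\u00fcr Caf\u00e9')); // Strasse fuer Cafe
```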

CJK Characters (Chinese, Japanese, Korean)

CJK characters don’t transliterate to Latin in a meaningful way. There are two common approaches:

1. Use romanization (pinyin for Chinese, romaji for Japanese). Libraries like pinyin (npm) or kuroshiro can do this, but the results may not match user expectations.
2. Keep the characters as-is. Modern browsers and search engines handle Unicode URLs well, and Google can index URLs with CJK characters. The URL is percent-encoded when it is sent to the server, and often when it is copied, but modern browsers and search results display the readable characters.

For most use cases, option 2 is safer: romanized CJK text can be unreadable to native speakers.
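Keeping the characters as-is can be sketched with Unicode property escapes, which match letters and digits from any script. This is an illustrative example rather than a complete slugifier; slugifyUnicode is a hypothetical name:

```javascript
// Keep letters, combining marks, and digits from any script; everything else becomes a hyphen.
function slugifyUnicode(text) {
  return text
    .normalize('NFKC')                          // fold compatibility forms such as full-width digits
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, '-')      // runs of other characters collapse to one hyphen
    .replace(/^-+|-+$/g, '');                   // trim hyphens at both ends
}

const slug = slugifyUnicode('東京 旅行 2024');
console.log(slug);                     // 東京-旅行-2024
console.log(encodeURIComponent(slug)); // the percent-encoded form sent over the wire
```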

Emojis in URLs

While technically possible (emojis get percent-encoded), emojis in slugs are a bad idea:

They become extremely long percent-encoded strings
They break in many systems and APIs
Search engines may not index them properly
They can’t be typed manually

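Always strip emojis during slug generation. A simple regex pattern removes many of them. The one below covers only a few common blocks (emoticons, transport symbols, miscellaneous symbols, and dingbats), so it misses others such as flags and the pictographs in the U+1F300 and U+1F900 ranges: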

text.replace(/[\u{1F600}-\u{1F6FF}\u{2600}-\u{26FF}\u{2700}-\u{27BF}]/gu, '')
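For broader coverage, one option is the Extended_Pictographic Unicode property, which JavaScript regexes support with the u flag. The sketch below is an assumption about what "strip emojis" should include: it also removes flag letters, skin-tone modifiers, and the joiner and variation-selector code points that hold emoji sequences together. EMOJI_PATTERN and stripEmojis are hypothetical names:

```javascript
// Pictographic emoji, regional-indicator letters (flags), skin-tone modifiers,
// the zero-width joiner (U+200D), and variation selector 16 (U+FE0F).
const EMOJI_PATTERN =
  /[\p{Extended_Pictographic}\u{1F1E6}-\u{1F1FF}\u{1F3FB}-\u{1F3FF}\u{200D}\u{FE0F}]/gu;

function stripEmojis(text) {
  return text.replace(EMOJI_PATTERN, '');
}

console.log(JSON.stringify(stripEmojis('Launch day \u{1F680}\u{1F389}'))); // "Launch day "
console.log(JSON.stringify(stripEmojis('Top 10 tips \u{1F525}')));         // "Top 10 tips "
```

Digits and the # sign are left alone here, which is why this uses Extended_Pictographic rather than the broader Emoji property. Removing U+200D everywhere is a simplification: some scripts use it for legitimate shaping, so adjust the pattern if your slugs keep non-Latin text.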

Common Special Characters

Here’s how a good slug generator handles common special characters:

Character	Action	Reason
&	Replace with “and” or remove	Has special meaning in URLs
@, #, ?, =	Remove	Reserved URL characters
“ ” ‘ ’	Remove	Punctuation, no semantic value in slug
— –	Replace with hyphen	Similar purpose, normalize to separator
/ \	Replace with hyphen	Slashes create path segments
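The table above can be expressed as a small pre-processing step that runs before separators are collapsed. This is a minimal sketch; replaceSpecialChars is a hypothetical name, and the choice to replace & with "and" rather than remove it is one of the two options in the table:

```javascript
function replaceSpecialChars(text) {
  return text
    .replace(/&/g, ' and ')                             // ampersand becomes the word "and"
    .replace(/[\u2014\u2013\/\\]/g, '-')                // em dash, en dash, and slashes become hyphens
    .replace(/[@#?=\u201C\u201D\u2018\u2019"']/g, '');  // reserved characters and quotes are dropped
}

console.log(replaceSpecialChars('Q&A: \u201CWhy?\u201D')); // Q and A: Why
console.log(replaceSpecialChars('docs/api\\v2'));          // docs-api-v2
```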

Testing Your Slug Generator

Test with these tricky inputs to make sure your slug generator handles edge cases:

"Café Menu — Special Édition!" → cafe-menu-special-edition
"100% Free & Open Source" → 100-free-and-open-source
" Lots of spaces " → lots-of-spaces
"---triple---hyphens---" → triple-hyphens
Empty string → should return empty or a fallback
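These cases can be turned into a quick self-check. The sketch below assumes an ASCII-only slug with a fallback for empty results; slugify and the 'untitled' fallback are hypothetical, not the behavior of any particular tool or library:

```javascript
function slugify(text, fallback = 'untitled') {
  const slug = text
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '')                // strip combining accents
    .toLowerCase()
    .replace(/&/g, ' and ')                         // keep the meaning of the ampersand
    .replace(/['"\u2018\u2019\u201C\u201D]/g, '')   // drop quotes so "don't" becomes "dont"
    .replace(/[^a-z0-9]+/g, '-')                    // any other run becomes one hyphen
    .replace(/^-+|-+$/g, '');                       // trim leading and trailing hyphens
  return slug || fallback;
}

const cases = [
  ['Caf\u00e9 Menu \u2014 Special \u00c9dition!', 'cafe-menu-special-edition'],
  ['100% Free & Open Source', '100-free-and-open-source'],
  ['   Lots   of   spaces   ', 'lots-of-spaces'],
  ['---triple---hyphens---', 'triple-hyphens'],
  ['', 'untitled'], // empty input falls back
];

for (const [input, expected] of cases) {
  const actual = slugify(input);
  console.log(actual === expected ? 'PASS' : 'FAIL', JSON.stringify(input), '->', actual);
}
```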

Try these examples directly in our Slugify Online tool and see how it handles each case.

Library Support

For a deeper comparison of slug libraries across languages, check our guide on how to slugify text in JavaScript, Python, and PHP. Each library handles special characters differently, so choose one that fits your use case.