What Is UTF-8 Encoding? A Developer's Guide
UTF-8 is a variable-length character encoding that represents every Unicode code point in one to four bytes and is backward compatible with ASCII. It is used by more than 98% of websites.
How UTF-8 Works
| Character Range | Bytes | Example |
|---|---|---|
| U+0000 to U+007F (ASCII) | 1 byte | A = 0x41 |
| U+0080 to U+07FF | 2 bytes | é = 0xC3 0xA9 |
| U+0800 to U+FFFF | 3 bytes | 中 = 0xE4 0xB8 0xAD |
| U+10000 to U+10FFFF | 4 bytes | 😀 = 0xF0 0x9F 0x98 0x80 |
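The byte counts in the table follow from how UTF-8 packs a code point's bits into lead and continuation bytes. The sketch below is a minimal illustration of that packing, not a replacement for a library encoder; the function name encode_utf8 is hypothetical, and in practice Python's built-in str.encode("utf-8") does the same job.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper, not a library API)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        # Surrogates and values beyond U+10FFFF are not valid Unicode scalar values.
        raise ValueError(f"not an encodable code point: {code_point:#x}")
    if code_point <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def as_hex(data: bytes) -> str:
    """Format bytes the way the table does, e.g. '0xC3 0xA9'."""
    return " ".join(f"0x{b:02X}" for b in data)


if __name__ == "__main__":
    for ch in ["A", "é", "中", "😀"]:
        print(f"U+{ord(ch):04X}: {as_hex(encode_utf8(ord(ch)))}")
```

Each continuation byte carries six bits of the code point, which is why the range boundaries fall at 7, 11, and 16 bits.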
Common Issues
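- "Mojibake": garbled text produced when bytes are decoded with the wrong encoding, e.g. UTF-8 "é" read as Windows-1252 appears as "Ã©"
- Database encoding mismatch: the column, the connection, and the application must agree on the encoding; in MySQL, the legacy utf8 charset (utf8mb3) stores at most 3 bytes per character, so 4-byte characters such as emoji need utf8mb4
- BOM (Byte Order Mark) in files: the optional bytes 0xEF 0xBB 0xBF at the start of a UTF-8 file can show up as stray characters or break parsers that do not expect them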
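Two of these issues are easy to reproduce and undo in a few lines. The following is a rough sketch, assuming the wrong decoder was Windows-1252 (cp1252); the helper names make_mojibake, repair_mojibake, and strip_bom are hypothetical, and the repair only works when the garbling was a single, lossless mis-decode.

```python
import codecs


def make_mojibake(text: str) -> str:
    """Simulate mojibake: UTF-8 bytes wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Reverse the mis-decode above; assumes cp1252 was the wrong encoding used."""
    return garbled.encode("cp1252").decode("utf-8")


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (0xEF 0xBB 0xBF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    garbled = make_mojibake("é")
    print(garbled, "->", repair_mojibake(garbled))
    print(strip_bom(b"\xef\xbb\xbfhello"))
```

When reading text rather than raw bytes, Python's "utf-8-sig" codec removes a leading BOM during decoding, so an explicit strip is often unnecessary.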
Clean Encoding Issues
Use our plain text converter to strip problematic characters.