What Is UTF-8 Encoding? A Developer's Guide


UTF-8 is a variable-length character encoding that can represent every Unicode character, using one to four bytes per code point. It is used by more than 98% of websites.

How UTF-8 Works

Character Range             Bytes     Example
U+0000 to U+007F (ASCII)    1 byte    A = 0x41
U+0080 to U+07FF            2 bytes   é = 0xC3 0xA9
U+0800 to U+FFFF            3 bytes   中 = 0xE4 0xB8 0xAD
U+10000 to U+10FFFF         4 bytes   emoji
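
To see the byte lengths from the table in practice, here is a minimal Python sketch that encodes one sample character per range. The helper name utf8_hex and the choice of U+1F600 as the emoji sample are assumptions made for illustration; they do not come from the table itself.

```python
# One sample character per row of the table.
# "\U0001F600" is an assumed emoji example; the table only says "emoji".
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated hex (hypothetical helper)."""
    return " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, how many bytes UTF-8 needs, and the bytes themselves.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {utf8_hex(ch)}")
```

The first three lines of output should match the Example column above; the last shows a four-byte sequence for the assumed emoji.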

Common Issues

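  • "Mojibake": garbled text that appears when bytes are decoded with the wrong encoding (for example, UTF-8 bytes read as Latin-1)
  • Database encoding mismatch: the client, the connection, and the stored data do not agree on the character set
  • BOM (Byte Order Mark) in files: the leading bytes 0xEF 0xBB 0xBF can confuse tools that do not expect them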
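
Two of these issues are easy to reproduce. The following Python sketch is illustrative only: the sample string and variable names are assumptions, and Latin-1 stands in for whatever wrong encoding a real system might guess. It shows how mojibake arises and how a BOM survives a plain UTF-8 decode.

```python
import codecs

text = "caf\u00e9"                 # assumed sample string: "café"
raw = text.encode("utf-8")         # the bytes as they would be stored or sent

# Mojibake: the UTF-8 bytes are decoded with the wrong encoding (Latin-1 here).
garbled = raw.decode("latin-1")    # the two bytes of "é" become two separate characters

# If no bytes were lost along the way, reversing the wrong step recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")

# BOM: some editors prepend 0xEF 0xBB 0xBF to UTF-8 files.
with_bom = codecs.BOM_UTF8 + raw
kept = with_bom.decode("utf-8")          # BOM stays in the text as U+FEFF
stripped = with_bom.decode("utf-8-sig")  # the "utf-8-sig" codec drops a leading BOM

print(repr(garbled))
print(repr(repaired))
print(repr(kept[0]), repr(stripped))
```

Decoding with "utf-8-sig" is a reasonable default when reading files of unknown origin, since it also accepts input that has no BOM.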

Clean Encoding Issues

Use our plain text converter to strip problematic characters.