The char Primitive in Java: UTF-16 Code Units

char is a 16-bit unsigned integer that stores a single UTF-16 code unit, not a "character". Most Latin, Cyrillic, and CJK characters live in the Basic Multilingual Plane and fit in one char; emoji and many rare scripts need two chars (a surrogate pair). That distinction causes the most common char bugs.

Declaration

char a = 'A';            // single quotes, not double
char nl = '\n';          // escape sequence
char nbsp = '\u00A0';    // Unicode escape
char zero = 0;           // also valid: char is a 16-bit number

Escape sequences

Escape    Meaning
\n        Newline
\t        Tab
\r        Carriage return
\\        Backslash
\'        Single quote
\"        Double quote
\0        Null (an octal escape for code point 0)
\uXXXX    Unicode code unit (4 hex digits)
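These escapes are ordinary char values the compiler resolves at compile time; a small sketch (class name is illustrative) confirms a few of them:

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // The Unicode escape for 0x41 is the very same code unit as 'A'
        System.out.println('\u0041' == 'A');   // true
        // '\n' is code point 10, '\t' is 9, '\0' is 0
        System.out.println((int) '\n');        // 10
        System.out.println((int) '\u00A0');    // 160, the no-break space
    }
}
```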

Emoji break naive iteration

String s = "A🙂";
System.out.println(s.length());    // 3: the emoji is 2 chars
for (int i = 0; i < s.length(); i++) {
    System.out.println(s.charAt(i));  // 'A', then two lone surrogates: garbage
}

// ✅ Iterate by Unicode code point
s.codePoints().forEach(cp -> System.out.println(new String(Character.toChars(cp))));
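The two chars of the emoji can also be inspected and recombined with the surrogate-pair API; a sketch (string content matches the example above):

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        String s = "A🙂";
        // charAt(1) and charAt(2) are the two halves of U+1F642
        char hi = s.charAt(1);
        char lo = s.charAt(2);
        System.out.println(Character.isHighSurrogate(hi));   // true
        System.out.println(Character.isLowSurrogate(lo));    // true
        // Recombine them into the real code point
        int cp = Character.toCodePoint(hi, lo);
        System.out.println(Integer.toHexString(cp));         // 1f642
        // Length in code points, not code units
        System.out.println(s.codePointCount(0, s.length())); // 2
    }
}
```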

char is a number

char c = 'A';
int  n = c + 1;              // 66 (int promotion)
char next = (char) (c + 1);  // 'B'
boolean isDigit = c >= '0' && c <= '9';
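Because char is numeric, a decimal digit converts to its int value with simple subtraction; a sketch (the helper name is illustrative):

```java
public class DigitDemo {
    // Convert a decimal digit char to its numeric value, or -1 if not a digit
    static int digitValue(char c) {
        return (c >= '0' && c <= '9') ? c - '0' : -1;
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7'));  // 7
        System.out.println(digitValue('x'));  // -1
        // The standard library offers the same idea with wider Unicode coverage
        System.out.println(Character.getNumericValue('7'));  // 7
    }
}
```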

Character utility methods

Character.isLetter('A');      // true
Character.isDigit('7');       // true
Character.isWhitespace(' ');  // true
Character.toUpperCase('a');   // 'A'
Character.isLetterOrDigit('_'); // false
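These predicates combine naturally with a loop over code units; a small counting helper (name is illustrative, and correct only for BMP text, per the emoji caveat above):

```java
public class CountDemo {
    // Count letter chars; fine for BMP-only text, misleading for emoji
    static int countLetters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLetters("ab3 cd!"));  // 4
    }
}
```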

Common mistakes

  • Using char for "a Unicode character": works until an emoji arrives. For text processing, use String and codePoints().
  • Confusing 'A' and "A": single quotes = char, double quotes = String.
  • Reading chars from a stream: always decode with an explicit charset (StandardCharsets.UTF_8) to avoid platform-default surprises.
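The third mistake is avoided by naming the charset at the decode boundary; a minimal sketch using an in-memory byte array (the bytes are the UTF-8 encoding of "héllo"):

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) {
        // UTF-8 bytes for "héllo"; é is the two-byte sequence 0xC3 0xA9
        byte[] bytes = {0x68, (byte) 0xC3, (byte) 0xA9, 0x6C, 0x6C, 0x6F};
        String right = new String(bytes, StandardCharsets.UTF_8);
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(right);  // héllo
        System.out.println(wrong);  // hÃ©llo: each UTF-8 byte decoded alone
    }
}
```

The same rule applies to streams: pass the charset explicitly, e.g. new InputStreamReader(in, StandardCharsets.UTF_8), rather than relying on the platform default.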

Related

Pillar: Java primitives. Tool: Java String Escape handles every \uXXXX variant. See also byte.