The char Primitive in Java: UTF-16 Code Units
char is a 16-bit unsigned integer that stores a single UTF-16 code unit, not a "character". Every code point in the Basic Multilingual Plane (U+0000 to U+FFFF), which covers most Latin, Cyrillic and CJK text, fits in one char. Code points above U+FFFF, including most emoji and many rare scripts, need two chars (a surrogate pair). That distinction causes the most common char bugs.
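To make the surrogate-pair idea concrete, here is a minimal sketch that splits a code point above U+FFFF into two chars by hand and checks the result against the JDK's `Character.highSurrogate`, `Character.lowSurrogate` and `Character.toCodePoint`. The class name `SurrogateDemo` and the choice of U+1F600 (grinning face) are illustrative only.

```java
class SurrogateDemo {
    // Manual UTF-16 encoding of a supplementary code point (U+10000..U+10FFFF):
    // subtract 0x10000, put the top 10 bits in the high surrogate (D800..DBFF)
    // and the bottom 10 bits in the low surrogate (DC00..DFFF).
    static char[] toSurrogatePair(int codePoint) {
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));
        char low = (char) (0xDC00 + (offset & 0x3FF));
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int grinning = 0x1F600; // U+1F600 GRINNING FACE, outside the 16-bit range
        char[] pair = toSurrogatePair(grinning);

        System.out.printf("high=%04X low=%04X%n", (int) pair[0], (int) pair[1]); // high=D83D low=DE00

        // The JDK helpers agree with the manual computation.
        System.out.println(pair[0] == Character.highSurrogate(grinning));        // true
        System.out.println(pair[1] == Character.lowSurrogate(grinning));         // true
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == grinning); // true
        System.out.println(Character.charCount(grinning));                       // 2
    }
}
```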
Declaration
```java
char a = 'A';         // single quotes, not double
char nl = '\n';       // escape sequence
char nbsp = '\u00A0'; // Unicode escape (non-breaking space)
char zero = 0;        // also valid: char is a 16-bit number
```
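The declarations above are different spellings of plain 16-bit values. This sketch (class name `CharLiteralDemo` is illustrative) shows four ways to write the same char and the unsigned 0 to 65535 range, including how a cast wraps around at the top.

```java
class CharLiteralDemo {
    public static void main(String[] args) {
        // Four spellings of the same 16-bit value, 0x0041 (65).
        char fromLiteral = 'A';
        char fromUnicodeEscape = '\u0041';
        char fromInt = 65;          // a constant int in 0..65535 fits without a cast
        char fromHex = (char) 0x41; // a cast is required when the int is not a compile-time constant

        System.out.println(fromLiteral == fromUnicodeEscape);             // true
        System.out.println(fromLiteral == fromInt && fromInt == fromHex); // true

        // char is unsigned: its range is 0..65535, and casting back to char wraps.
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535
        char wrapped = (char) (Character.MAX_VALUE + 1);
        System.out.println((int) wrapped);             // 0
    }
}
```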
Escape sequences
| Escape | Meaning |
|---|---|
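| `\n` | Newline (line feed, U+000A) |
| `\t` | Tab |
| `\r` | Carriage return |
| `\\` | Backslash |
| `\'` | Single quote |
| `\"` | Double quote |
| `\0` | Null character (U+0000), technically an octal escape |
| `\uXXXX` | Unicode escape: one UTF-16 code unit, exactly 4 hex digits. It is translated before the source is parsed, so `'\u000A'` does not compile; use `'\n'` |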
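A short sketch that prints the numeric value of several escapes and shows that each one still counts as a single char inside a String. The class name `EscapeDemo` is illustrative; the comment near the end restates the table's point about Unicode escapes for line terminators.

```java
class EscapeDemo {
    public static void main(String[] args) {
        // Each escape is a single char with a fixed code unit value.
        System.out.println((int) '\n'); // 10
        System.out.println((int) '\t'); // 9
        System.out.println((int) '\r'); // 13
        System.out.println((int) '\0'); // 0, the octal escape for NUL
        System.out.println((int) '\\'); // 92
        System.out.println((int) '\''); // 39
        System.out.println((int) '"');  // 34: a double quote needs no escape inside a char literal

        // Inside a String, each escape still counts as one char.
        String quoted = "say \"hi\"\tnow\n";
        System.out.println(quoted.length()); // 13

        // Use '\n', not a backslash-u escape of 000A: Unicode escapes are
        // translated before tokenizing, so that form would put a real line break
        // inside the literal and fail to compile.
        char newline = '\n';
        System.out.println(newline == 10); // true
    }
}
```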
Emoji break naive iteration
```java
String s = "A\uD83D\uDE00"; // "A" followed by the emoji U+1F600 (grinning face)
System.out.println(s.length()); // 3: the emoji takes 2 chars
for (int i = 0; i < s.length(); i++) {
    System.out.println(s.charAt(i)); // 'A', then two lone surrogates, which print as '?' or garbage
}

// Correct: iterate by Unicode code point
s.codePoints().forEach(cp -> System.out.println(new String(Character.toChars(cp))));
```
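The same code-point walk can be written as a plain loop, which is handy when you also need the index. This is a sketch with an illustrative class name, `CodePointDemo`: `String.codePointAt` joins a surrogate pair into one int, and `Character.charCount` says whether to advance by 1 or 2.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDemo {
    // Walk a string one code point at a time without streams:
    // codePointAt combines a surrogate pair, charCount says how far to advance.
    static List<String> codePoints(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp); // 1 for BMP characters, 2 for a surrogate pair
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00"; // "A" followed by U+1F600 (grinning face)
        System.out.println(s.length());                      // 3 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(codePoints(s).size());            // 2
    }
}
```

Code points are still not always what a user sees as one character: flags and emoji with skin-tone modifiers are sequences of several code points.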
char is a number
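```java
char c = 'A';
int n = c + 1;              // 66: c is promoted to int before the addition
char next = (char) (c + 1); // 'B': the int result must be cast back to char
boolean isDigit = c >= '0' && c <= '9'; // compares code unit values (ASCII digits only)
```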
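Because a char is a number, common conversions are plain arithmetic. This sketch (class and method names are hypothetical) converts an ASCII digit by subtracting `'0'`, applies a wrapping Caesar shift to uppercase letters, and contrasts both with the radix-aware JDK helpers `Character.digit` and `Character.forDigit`.

```java
class CharMathDemo {
    // Convert an ASCII digit char to its int value by subtracting '0' (48).
    static int asciiDigitValue(char c) {
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException("not an ASCII digit: " + c);
        }
        return c - '0';
    }

    // Shift an uppercase ASCII letter by k places, wrapping Z back to A (a Caesar shift).
    // Math.floorMod keeps the result in 0..25 even for negative k.
    static char shiftUpper(char c, int k) {
        return (char) ('A' + Math.floorMod(c - 'A' + k, 26));
    }

    public static void main(String[] args) {
        System.out.println(asciiDigitValue('7'));       // 7
        System.out.println(Character.digit('f', 16));   // 15
        System.out.println(Character.forDigit(11, 16)); // b
        System.out.println(shiftUpper('Y', 3));         // B
    }
}
```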
Character utility methods
```java
Character.isLetter('A');        // true
Character.isDigit('7');         // true
Character.isWhitespace(' ');    // true
Character.toUpperCase('a');     // 'A'
Character.isLetterOrDigit('_'); // false
```
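The `char` overloads above only see one code unit, so they have limits. This sketch (class name `CharacterMethodsDemo` is illustrative) shows three: a letter outside the BMP needs the `int` overload, the non-breaking space from the declarations section is a space character but not Java whitespace, and char-to-char case mapping cannot expand one letter into two.

```java
import java.util.Locale;

class CharacterMethodsDemo {
    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A is a letter outside the BMP.
        String boldA = new String(Character.toChars(0x1D400));
        char firstUnit = boldA.charAt(0); // only the high surrogate

        System.out.println(Character.isLetter(firstUnit));            // false: a lone surrogate is not a letter
        System.out.println(Character.isLetter(boldA.codePointAt(0))); // true: the int overload sees the whole code point

        // Non-breaking space: a space separator, but not whitespace by Java's definition.
        char nbsp = '\u00A0';
        System.out.println(Character.isSpaceChar(nbsp));  // true
        System.out.println(Character.isWhitespace(nbsp)); // false

        // char-to-char case mapping cannot expand: sharp s has no single-char uppercase.
        char sharpS = '\u00DF';
        System.out.println(Character.toUpperCase(sharpS) == sharpS);         // true
        System.out.println(String.valueOf(sharpS).toUpperCase(Locale.ROOT)); // SS
    }
}
```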
Common mistakes
- Using `char` for "a Unicode character": it works until an emoji arrives. For text processing, use `String` and `codePoints()`.
- Confusing `'A'` and `"A"`: single quotes make a `char`, double quotes make a `String`.
- Reading text from a byte stream without naming a charset: always decode with an explicit one (for example `StandardCharsets.UTF_8`) to avoid platform-default surprises.
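A minimal sketch of decoding bytes with an explicit charset, using `InputStreamReader` with `StandardCharsets.UTF_8`, plus what happens when the same bytes are decoded with the wrong charset. The class name `CharsetDemo` and the in-memory `ByteArrayInputStream` stand in for a real file or socket stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode a byte stream into text with an explicit charset instead of the platform default.
    static String readAll(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch); // read() returns one UTF-16 code unit, or -1 at end of stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00E9"; // "café"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // é becomes two bytes: C3 A9

        System.out.println(utf8.length);                    // 5
        System.out.println(readAll(utf8).equals(original)); // true

        // Decoding the same bytes with the wrong charset produces mojibake ("cafÃ©").
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("caf\u00C3\u00A9")); // true
    }
}
```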