String Length in Java: length() Explained
Finding the length of a String in Java is straightforward — until you hit emojis. The length() method returns the number of UTF-16 code units, which is not always the number of characters a human would count.
Basic usage
```java
String name = "Alice";
int n = name.length(); // 5
String empty = "";
int e = empty.length(); // 0
```
length() is a method (parentheses required), unlike the length field of an array (arr.length).
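A common everyday use of length() is as the bound of a charAt loop. The sketch below counts how often a given char appears. The class and method names are illustrative and do not come from any library.

```java
// Illustrative helper: counts occurrences of a char, using length() as the loop bound.
class CharCounter {
    static int countChar(String s, char target) {
        int count = 0;
        // Valid charAt indices run from 0 to length() - 1
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChar("banana", 'a')); // 3
    }
}
```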
What length() really counts
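Conceptually, a Java String is a sequence of 16-bit UTF-16 code units, and length() counts those units. Since Java 9, the JVM may store strings that contain only Latin-1 characters as one byte per character, but length() still reports UTF-16 units. For characters in the Basic Multilingual Plane (most Latin, accented, CJK, and Cyrillic text), one character is one unit, so the count matches intuition.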
```java
"hello".length(); // 5
"café".length(); // 4 (é is a single code unit U+00E9)
"日本語".length(); // 3
```
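length() is also not a byte count. This sketch compares length() with the number of bytes the same text occupies in UTF-8, using String.getBytes(StandardCharsets.UTF_8). The class name is illustrative. The literals use \u escapes so that the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

// Compares UTF-16 code units (length()) with the UTF-8 byte count of the same text.
class UnitsVsBytes {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello",              // ASCII: 1 byte per char in UTF-8
            "caf\u00E9",          // "cafe" with U+00E9: that char takes 2 bytes in UTF-8
            "\u65E5\u672C\u8A9E"  // three CJK ideographs: 3 bytes each in UTF-8
        };
        for (String s : samples) {
            System.out.println("length()=" + s.length() + ", UTF-8 bytes=" + utf8Bytes(s));
        }
        // Prints 5/5, 4/5 and 3/9
    }
}
```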
The emoji trap
Characters above U+FFFF (emojis, rare CJK, ancient scripts) use a surrogate pair — two 16-bit units for one logical character.
```java
String greeting = "Hi 😀";
System.out.println(greeting.length());
// 5, not 4: the smiley is 2 code units (a surrogate pair)
```
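You can see the surrogate pair directly with the standard Character helpers. The sketch below builds the same greeting from code point U+1F600, which keeps the source ASCII-only. It then checks that positions 3 and 4 hold a high surrogate and a low surrogate, and that codePointAt(3) reassembles them into one code point.

```java
// Shows that U+1F600 (grinning face) occupies two chars: a high and a low surrogate.
class SurrogateDemo {
    public static void main(String[] args) {
        String greeting = "Hi " + new String(Character.toChars(0x1F600));
        System.out.println(greeting.length());                           // 5
        System.out.println(Character.isHighSurrogate(greeting.charAt(3))); // true
        System.out.println(Character.isLowSurrogate(greeting.charAt(4)));  // true
        int cp = greeting.codePointAt(3);                                 // combines the pair
        System.out.printf("U+%X%n", cp);                                  // U+1F600
        System.out.println(Character.charCount(cp));                      // 2 code units
    }
}
```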
To count Unicode code points instead, so that each surrogate pair counts as one:
```java
int characters = greeting.codePointCount(0, greeting.length());
System.out.println(characters); // 4
```
If your application handles user input (social media bios, chat messages, names), prefer codePointCount over length() when you need a human-facing character count. Even code points can overcount what a user sees, as the next section shows.
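String.codePoints() (Java 8+) gives the same count as a stream. The flag case below shows why a code point count is still not always a visible character count. A flag emoji such as the Japanese flag is written as two regional-indicator code points, U+1F1EF and U+1F1F5, which makes four UTF-16 units. The class name is illustrative.

```java
// Counting code points two ways, plus a case where code points exceed what a user sees.
class CodePointDemo {
    static long codePoints(String s) {
        // codePoints() streams each code point once, pairing surrogates automatically
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        String greeting = "Hi \uD83D\uDE00";        // "Hi " + U+1F600
        String flag = "\uD83C\uDDEF\uD83C\uDDF5";   // U+1F1EF U+1F1F5 (Japanese flag)
        System.out.println(greeting.codePointCount(0, greeting.length())); // 4
        System.out.println(codePoints(greeting));                          // 4
        System.out.println(flag.length() + " units, " + codePoints(flag) + " code points");
        // 4 units, 2 code points, yet a user sees one flag
    }
}
```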
Combining characters add another twist
Some visible characters are composed of multiple code points. For example, é can be written as U+00E9 (precomposed) or as U+0065 U+0301 (the letter e followed by a combining acute accent). codePointCount returns 1 for the first form and 2 for the second.
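One way to make both spellings count the same is to normalize to NFC with java.text.Normalizer before counting. This only helps where a precomposed character exists, and many combining sequences have none. The sketch below is illustrative and is not a substitute for grapheme counting.

```java
import java.text.Normalizer;

// Precomposed vs decomposed "cafe" with an acute accent: same text to a reader,
// different code point counts until normalized.
class NormalizeDemo {
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00E9";  // accented e as U+00E9
        String decomposed = "cafe\u0301";  // e + U+0301 COMBINING ACUTE ACCENT
        System.out.println(codePointLength(precomposed));   // 4
        System.out.println(codePointLength(decomposed));    // 5
        System.out.println(precomposed.equals(decomposed)); // false

        // NFC composes e + U+0301 into U+00E9 because a precomposed form exists
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(codePointLength(nfc));    // 4
        System.out.println(nfc.equals(precomposed)); // true
    }
}
```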
For visual grapheme clusters (what a user sees as "one character"), use BreakIterator:
```java
import java.text.BreakIterator;

String s = "cafe\u0301"; // café (decomposed: e + combining acute accent)
BreakIterator it = BreakIterator.getCharacterInstance();
it.setText(s);
int count = 0;
// next() advances one boundary at a time and returns DONE past the last one
while (it.next() != BreakIterator.DONE) {
    count++;
}
System.out.println(count); // 4 grapheme clusters
```
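The loop above can be wrapped in a reusable, null-safe method. The graphemeCount name is hypothetical. One caveat applies: how BreakIterator's character instance treats complex emoji sequences (for example, ZWJ sequences) has changed across JDK versions. Check its results for such input on the JDK you deploy.

```java
import java.text.BreakIterator;

// Hypothetical helper: counts user-visible characters (grapheme clusters), 0 for null.
class Graphemes {
    static int graphemeCount(String s) {
        if (s == null) {
            return 0;
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        // Each successful next() marks the end of one more cluster
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(graphemeCount("cafe\u0301")); // 4
        System.out.println(graphemeCount(""));           // 0
    }
}
```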
Null and blank
Calling length() on a null reference throws NullPointerException. Either guard with an explicit check or use a null-safe utility:

```java
int safeLength = (s != null) ? s.length() : 0;

// Or, with Apache Commons Lang (org.apache.commons.lang3.StringUtils)
int len = StringUtils.length(s); // returns 0 for null

// Check blank (null, empty, or whitespace-only); isBlank() requires Java 11+
boolean blank = s == null || s.isBlank();
```
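The same guard works for code point counts. The sketch below shows two null-safe helpers with illustrative names. The first uses Objects.requireNonNullElse (Java 9+) to substitute an empty string for null. The second counts code points and returns 0 for null.

```java
import java.util.Objects;

// Null-safe length helpers; the names are illustrative, not from any library.
class SafeLengths {
    static int safeLength(String s) {
        // requireNonNullElse returns "" when s is null
        return Objects.requireNonNullElse(s, "").length();
    }

    static int safeCodePointLength(String s) {
        return s == null ? 0 : s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(safeLength(null));                       // 0
        System.out.println(safeCodePointLength("Hi \uD83D\uDE00")); // 4
    }
}
```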
Empty string vs null
An empty string has length zero but is a valid object — you can call methods on it. A null reference is not an object at all. Don't confuse the two.
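The difference shows up as soon as you branch on it. This illustrative helper checks null first, because only a non-null reference can answer isEmpty() or isBlank(). isBlank() requires Java 11+.

```java
// Illustrative helper contrasting null, empty, blank, and ordinary strings.
class NullVsEmpty {
    static String describe(String s) {
        if (s == null) {
            return "null";   // no object: calling s.length() here would throw
        }
        if (s.isEmpty()) {
            return "empty";  // a real object whose length() is 0
        }
        if (s.isBlank()) {
            return "blank";  // non-empty but whitespace-only
        }
        return "text(" + s.length() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(null));    // null
        System.out.println(describe(""));      // empty
        System.out.println(describe("  "));    // blank
        System.out.println(describe("Alice")); // text(5)
    }
}
```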
Maximum length
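length() returns an int, so a String can never hold more than Integer.MAX_VALUE (2,147,483,647) UTF-16 code units. In practice the ceiling is lower. On Java 9 and later, a string containing any non-Latin-1 character stores two bytes per code unit in a byte array, which caps it at roughly half that. You will usually exhaust the heap long before either limit.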
length() vs length vs size()
| Type | Usage | Notes |
|---|---|---|
| String | s.length() | Method, UTF-16 code units |
| Array | arr.length | Field (no parentheses) |
| Collection | list.size() | Method, element count |
| Map | map.size() | Method, key count |
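Here are the four spellings from the table side by side. List.of and Map.of assume Java 9+. The class name is illustrative.

```java
import java.util.List;
import java.util.Map;

// The four "how big is it?" spellings from the table, side by side.
class SizeDemo {
    public static void main(String[] args) {
        String s = "Alice";
        int[] arr = {1, 2, 3};
        List<String> list = List.of("a", "b");
        Map<String, Integer> map = Map.of("x", 1, "y", 2, "z", 3);

        System.out.println(s.length());  // 5 (method)
        System.out.println(arr.length);  // 3 (field, no parentheses)
        System.out.println(list.size()); // 2 (elements)
        System.out.println(map.size());  // 3 (key-value mappings)
    }
}
```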
Performance
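length() is O(1): it is derived from the length of the internal array rather than by scanning the text, so you can call it freely as a loop bound without caching it. codePointCount(), by contrast, may have to scan the range, so treat it as O(n) and avoid recomputing it inside a loop.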
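Because length() is constant-time, it also works as the bound of a manual code-point loop. The sketch below walks a string once with codePointAt and Character.charCount, so it never splits a surrogate pair. It counts code points above U+FFFF. The method name is illustrative.

```java
// Walks a string one code point at a time in a single O(n) pass.
class CodePointLoop {
    // Counts code points above U+FFFF, i.e. those stored as surrogate pairs.
    static int countSupplementary(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {           // length() is O(1), fine as a bound
            int cp = s.codePointAt(i);                 // reads a full pair when present
            if (Character.isSupplementaryCodePoint(cp)) {
                count++;
            }
            i += Character.charCount(cp);              // advance by 1 or 2 code units
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSupplementary("Hi \uD83D\uDE00")); // 1
    }
}
```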
Common validations
```java
// Username between 3 and 20 characters (code point count for Unicode safety)
int len = username.codePointCount(0, username.length());
if (len < 3 || len > 20) {
    throw new IllegalArgumentException("Invalid length");
}

// Truncate to 280 code points without splitting a surrogate pair
// (multi-code-point emoji and combining sequences can still be cut)
if (message.codePointCount(0, message.length()) > 280) {
    int cutoff = message.offsetByCodePoints(0, 280);
    message = message.substring(0, cutoff);
}
```
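Truncating with offsetByCodePoints protects surrogate pairs, but it can still cut a combining accent off its letter. One alternative is to truncate by grapheme cluster with BreakIterator. The sketch below assumes that BreakIterator's character instance matches what your users perceive as one character. The class and method names are hypothetical.

```java
import java.text.BreakIterator;

// Hypothetical helper: keeps at most maxClusters user-visible characters without
// splitting a grapheme cluster such as a letter plus its combining accent.
class GraphemeTruncate {
    static String truncate(String s, int maxClusters) {
        if (maxClusters < 0) {
            throw new IllegalArgumentException("maxClusters must be >= 0");
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s); // scan position starts at the beginning of s
        int end = 0;
        for (int i = 0; i < maxClusters; i++) {
            int boundary = it.next();
            if (boundary == BreakIterator.DONE) {
                return s; // fewer than maxClusters clusters: nothing to cut
            }
            end = boundary;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "cafe\u0301s"; // "cafes" with a decomposed accented e
        String byCodePoints = s.substring(0, s.offsetByCodePoints(0, 4)); // drops the accent
        String byClusters = truncate(s, 4);                               // keeps e + accent
        System.out.println(byCodePoints.length()); // 4
        System.out.println(byClusters.length());   // 5
    }
}
```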
Quick takeaways
- Use s.length() for UTF-16 code-unit counts, loop bounds, and charAt/substring indices.
- Use s.codePointCount(0, s.length()) when you need the number of Unicode code points.
- Use BreakIterator for user-visible grapheme clusters.
- Always guard against null.
For most code (loop bounds, substring indices, ASCII-only data), plain length() is enough. The surrogate-pair trap matters most for user-submitted content such as names, bios, and chat messages, where emojis are common.