Java split() Method: How to Split a String

The split method on String breaks text into an array of pieces based on a regular-expression delimiter. It's the go-to for parsing CSV-like lines, paths, URLs and configuration strings.

The basics

String csv = "apple,banana,cherry";
String[] fruits = csv.split(",");
// fruits = {"apple", "banana", "cherry"}

for (String f : fruits) System.out.println(f);

split(regex) takes a regular expression, not a plain substring. This is the number-one source of confusion.

Special characters need escaping

Regex metacharacters (. * + ? ( ) [ ] { } | \ ^ $) must be escaped when you mean them literally.

// "." matches ANY character, so every character is treated as a delimiter:
"a.b.c".split(".");    // ❌ returns an empty array (only empty strings are left, and trailing empties are dropped)

// Escape the dot:
"a.b.c".split("\\.");  // ✅ {"a", "b", "c"}

// Same for pipe:
"a|b|c".split("\\|");  // {"a", "b", "c"}

// Safer: use Pattern.quote() (requires import java.util.regex.Pattern)
String sep = ".";
"a.b.c".split(Pattern.quote(sep)); // {"a", "b", "c"}

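When the separator arrives at runtime (user input, a config value), quoting it every time is easy to forget. A minimal sketch of a wrapper follows; the class name LiteralSplit and the method splitLiteral are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Hypothetical helper: treats sep as plain text, never as a regex.
    static String[] splitLiteral(String input, String sep) {
        return input.split(Pattern.quote(sep));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));    // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("1+1+2", "+")));    // [1, 1, 2]
        System.out.println(Arrays.toString(splitLiteral("x$$y$$z", "$$"))); // [x, y, z]
    }
}
```

Without the quoting, a call like "1+1+2".split("+") throws a PatternSyntaxException, because a bare + is not a valid regex.
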
Splitting on whitespace

String text = "  hello   world  ";
String[] words = text.trim().split("\\s+");
// {"hello", "world"}

\\s+ matches one or more whitespace characters (spaces, tabs, newlines). trim() first removes leading/trailing whitespace, otherwise the first token can be empty.
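
To see the effect of trim(), the sketch below compares both calls on the same input. The helper name words is hypothetical, and the extra check for an all-whitespace input is an addition that the text above does not cover.

```java
import java.util.Arrays;

class WhitespaceSplit {
    // Hypothetical helper: returns the words of text, or an empty array if there are none.
    static String[] words(String text) {
        String trimmed = text.trim();
        // "".split("\\s+") would return one empty token, so handle that case explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "  hello   world  ";
        System.out.println(Arrays.toString(text.split("\\s+"))); // [, hello, world]
        System.out.println(Arrays.toString(words(text)));        // [hello, world]
        System.out.println(words("   ").length);                 // 0
    }
}
```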

Limiting the number of splits

The two-argument form caps how many pieces you get; the last piece keeps the remaining text intact.

"a,b,c,d".split(",", 2);   // {"a", "b,c,d"}
"a,b,c,d".split(",", 3);   // {"a", "b", "c,d"}
"a,b,c,d".split(",", -1);  // {"a", "b", "c", "d"}, no cap on the number of pieces
"a,b,,".split(",", -1);    // {"a", "b", "", ""}, trailing empty strings kept

A negative limit preserves trailing empty strings, which is useful when parsing CSV-like records where "a,b,," should yield 4 fields.
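
The three kinds of limit (positive, zero, negative) are easiest to compare side by side on one input. This is a small sketch; the helper show is hypothetical and only reports the piece count next to the pieces, since empty strings are hard to spot in printed output.

```java
import java.util.Arrays;

class LimitDemo {
    // Hypothetical helper: splits on commas with the given limit and describes the result.
    static String show(String input, int limit) {
        String[] parts = input.split(",", limit);
        return parts.length + " piece(s): " + Arrays.toString(parts);
    }

    public static void main(String[] args) {
        System.out.println(show("a,b,,", 2));  // 2 piece(s): [a, b,,]
        System.out.println(show("a,b,,", 0));  // 2 piece(s): [a, b]      (same as split(","))
        System.out.println(show("a,b,,", 10)); // 4 piece(s): [a, b, , ]
        System.out.println(show("a,b,,", -1)); // 4 piece(s): [a, b, , ]
    }
}
```

Only a limit of 0 discards trailing empty strings; any positive or negative limit keeps them.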

Default behavior drops trailing empty strings

"a,,b,,".split(",");      // {"a", "", "b"}: trailing empty strings dropped
"a,,b,,".split(",", -1);  // {"a", "", "b", "", ""}: all preserved

This surprise bites hand-written CSV parsers constantly. Pass -1 whenever the number or position of fields matters.
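
One way to catch the problem early is to validate the field count right after splitting. The sketch below assumes simple records with no quoting; parseRecord and its expectedFields parameter are hypothetical names.

```java
class RecordParser {
    // Hypothetical helper for simple, unquoted, comma-delimited records.
    static String[] parseRecord(String line, int expectedFields) {
        // -1 keeps trailing empty fields, so the count reflects the real number of columns.
        String[] fields = line.split(",", -1);
        if (fields.length != expectedFields) {
            throw new IllegalArgumentException(
                "expected " + expectedFields + " fields but got " + fields.length + ": " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parseRecord("a,,b,,", 5);
        System.out.println(fields.length); // 5
    }
}
```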

Multiple delimiters

// Split on any of comma, semicolon, or pipe
"a,b;c|d".split("[,;|]");  // {"a", "b", "c", "d"}

// Split on any run of non-word characters (\W is anything except letters, digits and underscore)
"a,  b; c!d".split("\\W+"); // {"a", "b", "c", "d"}

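The two approaches above behave differently once tokens contain punctuation of their own. The sketch below contrasts them on an input with a hyphenated word; the method names and the pattern with optional surrounding whitespace are illustrative choices, not something the text above prescribes.

```java
import java.util.Arrays;

class DelimiterChoice {
    // Splits on every run of non-word characters, including hyphens inside tokens.
    static String[] byNonWord(String s) {
        return s.split("\\W+");
    }

    // Splits only on comma, semicolon or pipe, swallowing whitespace around them.
    static String[] bySeparators(String s) {
        return s.split("\\s*[,;|]\\s*");
    }

    public static void main(String[] args) {
        String input = "red , well-known;blue";
        System.out.println(Arrays.toString(byNonWord(input)));    // [red, well, known, blue]
        System.out.println(Arrays.toString(bySeparators(input))); // [red, well-known, blue]
    }
}
```
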
Split by a fixed character sequence

"one---two---three".split("---"); // {"one", "two", "three"}

When the separator is always the same short string and you want maximum performance, avoid regex entirely:

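// Apache Commons Lang
// Note: StringUtils.split(str, "---") treats its second argument as a set of separator characters, not as one string
StringUtils.splitByWholeSeparator("one---two---three", "---"); // plain string split, no regex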

// Guava
Iterable<String> parts = Splitter.on("---").split("one---two---three");
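
If adding a library is not an option, a regex-free split is short enough to write by hand with indexOf. This is a minimal sketch, not a replacement for the library versions; splitPlain is a hypothetical name, and unlike String.split it keeps every empty piece, including trailing ones.

```java
import java.util.ArrayList;
import java.util.List;

class PlainSplit {
    // Hypothetical helper: splits input on the literal string sep without using regex.
    static List<String> splitPlain(String input, String sep) {
        if (sep.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = input.indexOf(sep, start)) >= 0) {
            parts.add(input.substring(start, idx));
            start = idx + sep.length();
        }
        parts.add(input.substring(start)); // remainder after the last separator, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitPlain("one---two---three", "---")); // [one, two, three]
        System.out.println(splitPlain("a--", "-").size());          // 3
    }
}
```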

Performance

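String.split goes through the regex engine on each call. OpenJDK's implementation has a fast path that skips compilation for single-character literal delimiters such as ",", but any other pattern is recompiled every time. Inside a hot loop, precompile:

import java.util.regex.Pattern;

// Comma with optional surrounding whitespace: a real regex, so String.split would recompile it on every call
static final Pattern FIELD_SEP = Pattern.compile("\\s*,\\s*");

for (String line : lines) {
    String[] fields = FIELD_SEP.split(line);
    // ...
}

The gain depends on the pattern, the input and the JVM, so measure before and after.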
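
A rough way to check the difference on your own machine is a timing loop like the one below. It is a crude sketch, not a proper benchmark (a harness such as JMH gives more trustworthy numbers); the class name, the sample line and the iteration count are all arbitrary choices.

```java
import java.util.regex.Pattern;

class SplitTiming {
    static final String REGEX = "\\s*;\\s*";
    static final Pattern PRECOMPILED = Pattern.compile(REGEX);

    // Splits the line repeatedly with String.split, which compiles REGEX on each call.
    static long countWithStringSplit(String line, int iterations) {
        long pieces = 0;
        for (int i = 0; i < iterations; i++) {
            pieces += line.split(REGEX).length;
        }
        return pieces;
    }

    // Splits the line repeatedly with the pattern compiled once up front.
    static long countWithPattern(String line, int iterations) {
        long pieces = 0;
        for (int i = 0; i < iterations; i++) {
            pieces += PRECOMPILED.split(line).length;
        }
        return pieces;
    }

    public static void main(String[] args) {
        String line = "alpha ; beta;gamma ;delta";
        int iterations = 1_000_000;

        // Warm-up pass so the JIT has a chance to compile both methods.
        countWithStringSplit(line, iterations);
        countWithPattern(line, iterations);

        long t0 = System.nanoTime();
        long a = countWithStringSplit(line, iterations);
        long t1 = System.nanoTime();
        long b = countWithPattern(line, iterations);
        long t2 = System.nanoTime();

        System.out.printf("String.split:  %d ms (%d pieces)%n", (t1 - t0) / 1_000_000, a);
        System.out.printf("Pattern.split: %d ms (%d pieces)%n", (t2 - t1) / 1_000_000, b);
    }
}
```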

Streaming with Pattern.splitAsStream

Pattern.compile(",")
       .splitAsStream("a,b,c")
       .filter(s -> !s.isEmpty())
       .forEach(System.out::println);

Avoids allocating an intermediate array.
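
The stream form also composes with the usual collectors. The sketch below trims each piece and collects the non-empty ones into a list; nonBlankFields is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamSplit {
    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical helper: trimmed, non-empty fields of a comma-separated line.
    static List<String> nonBlankFields(String line) {
        return COMMA.splitAsStream(line)
                    .map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonBlankFields(" a, ,b,,c ")); // [a, b, c]
    }
}
```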

Common patterns

Parse key=value pairs

String[] pair = "user=alice".split("=", 2);
String key = pair[0];
String val = pair.length > 1 ? pair[1] : "";
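
The same idea extends to several pairs in one string. The sketch below assumes pairs are separated by semicolons, which is an illustrative choice and not something the example above specifies; the limit of 2 keeps any further = signs inside the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParser {
    // Hypothetical helper: parses "k1=v1;k2=v2" into an insertion-ordered map.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : input.split(";")) {
            if (entry.isEmpty()) {
                continue; // skip empty entries such as the one produced by "a=1;;b=2"
            }
            String[] pair = entry.split("=", 2);
            String value = pair.length > 1 ? pair[1].trim() : "";
            result.put(pair[0].trim(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("user=alice;token=a=b;debug"));
        // {user=alice, token=a=b, debug=}
    }
}
```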

Lines of text

String[] lines = text.split("\\R"); // matches \n, \r\n, \r and Unicode line terminators
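
A short check of how \R behaves on mixed line endings, blank lines and a trailing newline. The helper name lines is hypothetical.

```java
class LineSplit {
    // Hypothetical helper: splits text into lines on any line terminator.
    static String[] lines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        System.out.println(lines("one\r\ntwo\nthree\rfour").length); // 4 (\r\n counts as one terminator)
        System.out.println(lines("a\n\nb").length);                  // 3 (the blank line in the middle is kept)
        System.out.println(lines("a\nb\n").length);                  // 2 (the trailing empty string is dropped)
    }
}
```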

URL path segments

String[] segments = "/a/b/c".split("/");
// {"", "a", "b", "c"}: leading empty string because of the leading slash
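
If the empty segments are not wanted, filtering them out after the split is usually simpler than adjusting the regex. A minimal sketch, with segments as a hypothetical helper name:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PathSegments {
    // Hypothetical helper: non-empty segments of a slash-separated path.
    static List<String> segments(String path) {
        return Arrays.stream(path.split("/"))
                     .filter(s -> !s.isEmpty())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(segments("/a/b/c"));   // [a, b, c]
        System.out.println(segments("/a//b/c/")); // [a, b, c]
        System.out.println(segments("/"));        // []
    }
}
```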

CSV: don't use split

Real CSV has quoting, embedded commas, escaped quotes, and multiline fields. split(",") handles none of these. Use a proper library: Jackson CSV, Commons CSV, or OpenCSV.
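
The failure is easy to reproduce with a single quoted field. The sketch below only demonstrates the problem and is not a workaround; naiveSplit is a hypothetical name.

```java
class NaiveCsvDemo {
    // Splits on every comma, including commas inside quoted fields.
    static String[] naiveSplit(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // A two-field CSV record: "Smith, John",42
        String line = "\"Smith, John\",42";
        String[] fields = naiveSplit(line);
        System.out.println(fields.length); // 3, although the record has 2 fields
        for (String field : fields) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The quoted name is cut in two, and the quote characters stay attached to the pieces.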

Common mistakes

  • Forgetting that ., |, +, etc. are regex metacharacters.
  • Losing trailing empty fields without -1.
  • Using split for real CSV.
  • Compiling the same regex over and over in a loop.
  • Forgetting the extra empty string when the input starts with the separator.

Used correctly, split handles most everyday text parsing in a single line. For input with quoting, escapes or nesting, reach for a proper parser.