Streams over Collections in Java
The Stream API (Java 8+) is Java's declarative way to process collections. You describe what to do (filter, map, reduce) and the runtime handles how (possibly in parallel). A stream is a one-shot pipeline, not a collection, and it doesn't mutate its source.
The pipeline model
List<String> adultNames = users.stream()
    .filter(u -> u.age() >= 18)   // intermediate
    .map(User::name)              // intermediate
    .sorted()                     // intermediate
    .toList();                    // terminal (Java 16+)
- Source: a Collection, an array, Files.lines, Stream.of.
- Intermediate operations: lazy; each returns a new stream (filter, map, flatMap, sorted, distinct).
- Terminal operation: triggers execution (toList, count, forEach, collect, reduce, anyMatch).
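A minimal runnable sketch of the three stages, assuming Java 16+ and a hypothetical User record with only a name and an age (the article does not show its User type). It also checks that the source list is left unchanged after the pipeline runs.

```java
import java.util.ArrayList;
import java.util.List;

class PipelineDemo {
    // Hypothetical stand-in for the article's User type.
    record User(String name, int age) {}

    // Source -> three lazy intermediate ops -> one terminal op.
    static List<String> adultNames(List<User> users) {
        return users.stream()                 // source: the list
                .filter(u -> u.age() >= 18)   // intermediate: keep adults
                .map(User::name)              // intermediate: User -> name
                .sorted()                     // intermediate: natural String order
                .toList();                    // terminal: runs the pipeline (Java 16+)
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<>(List.of(
                new User("carol", 34),
                new User("bob", 17),
                new User("alice", 21)));

        System.out.println(adultNames(users));   // [alice, carol]
        System.out.println(users.size());        // 3: the source list is not modified
        System.out.println(users.get(0).name()); // carol: original order kept
    }
}
```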
Common collectors
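import java.util.function.Function;
import static java.util.stream.Collectors.*;
users.stream().collect(toList());               // no mutability guarantee (an ArrayList in practice)
users.stream().toList();                        // unmodifiable, allows nulls (Java 16+)
users.stream().collect(toUnmodifiableList());   // unmodifiable, rejects nulls (Java 10+)
users.stream().collect(toSet());
users.stream().collect(toMap(User::id, Function.identity())); // throws IllegalStateException on duplicate ids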
// Group by
users.stream().collect(groupingBy(User::city)); // Map<City, List<User>>
users.stream().collect(groupingBy(User::city, counting())); // Map<City, Long>
users.stream().collect(groupingBy(User::city, mapping(User::name, toList()))); // Map<City, List<String>>
// Partition
users.stream().collect(partitioningBy(u -> u.age() >= 18)); // Map<Boolean, List<User>>
// Joining
names.stream().collect(joining(", ", "[", "]")); // "[alice, bob]"
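The collectors above compose. This sketch assumes a hypothetical User record whose city is a plain String rather than a City type. It passes TreeMap::new as the map factory so the output is sorted and predictable, and it gives toMap a merge function: without one, toMap throws IllegalStateException as soon as two elements map to the same key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.*;

class CollectorsDemo {
    // Hypothetical record; city is a String here instead of a City type.
    record User(String name, String city, int age) {}

    static final List<User> USERS = List.of(
            new User("alice", "Oslo", 21),
            new User("bob", "Lima", 17),
            new User("carol", "Oslo", 34));

    // Count per city; TreeMap::new keeps keys sorted.
    static Map<String, Long> countByCity(List<User> users) {
        return users.stream()
                .collect(groupingBy(User::city, TreeMap::new, counting()));
    }

    // Names per city via a downstream mapping collector.
    static Map<String, List<String>> namesByCity(List<User> users) {
        return users.stream()
                .collect(groupingBy(User::city, TreeMap::new, mapping(User::name, toList())));
    }

    // partitioningBy always has both keys, true and false, even if one side is empty.
    static Map<Boolean, List<String>> adultsSplit(List<User> users) {
        return users.stream()
                .collect(partitioningBy(u -> u.age() >= 18, mapping(User::name, toList())));
    }

    // The merge function keeps the first name per city instead of throwing
    // IllegalStateException on the duplicate "Oslo" key.
    static Map<String, String> firstNameByCity(List<User> users) {
        return users.stream()
                .collect(toMap(User::city, User::name, (first, second) -> first, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(countByCity(USERS));           // {Lima=1, Oslo=2}
        System.out.println(namesByCity(USERS));           // {Lima=[bob], Oslo=[alice, carol]}
        System.out.println(adultsSplit(USERS).get(true)); // [alice, carol]
        System.out.println(firstNameByCity(USERS));       // {Lima=bob, Oslo=alice}
    }
}
```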
Primitive streams
int total = orders.stream().mapToInt(Order::amount).sum();
OptionalInt max = scores.stream().mapToInt(Score::value).max();
IntStream.range(0, 10).forEach(System.out::println);
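Use IntStream, LongStream, and DoubleStream when you would otherwise box primitives. They avoid allocating and unboxing Integer, Long, and Double objects, and they add numeric terminal operations such as sum(), average(), and summaryStatistics().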
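To make the boxing point concrete, this sketch computes the same total through a Stream<Integer> and through an IntStream, then uses summaryStatistics() to get several aggregates in one pass. The Order record with an int amount is hypothetical, and the sketch makes no timing claims.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.IntStream;

class PrimitiveStreamsDemo {
    // Hypothetical order type with an int amount.
    record Order(String id, int amount) {}

    // Boxed: map yields Stream<Integer>, so reduce works on Integer objects.
    static int totalBoxed(List<Order> orders) {
        return orders.stream().map(Order::amount).reduce(0, Integer::sum);
    }

    // Primitive: mapToInt yields an IntStream and sum() works on raw ints.
    static int totalPrimitive(List<Order> orders) {
        return orders.stream().mapToInt(Order::amount).sum();
    }

    // One pass yields count, min, max, sum and average together.
    static IntSummaryStatistics stats(List<Order> orders) {
        return orders.stream().mapToInt(Order::amount).summaryStatistics();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("a", 120), new Order("b", 80), new Order("c", 100));
        System.out.println(totalBoxed(orders));     // 300
        System.out.println(totalPrimitive(orders)); // 300

        IntSummaryStatistics s = stats(orders);
        System.out.println(s.getMin() + " " + s.getMax() + " " + s.getAverage()); // 80 120 100.0

        // average() of an empty IntStream is an empty OptionalDouble, not 0.
        OptionalDouble none = IntStream.empty().average();
        System.out.println(none.isPresent()); // false

        // rangeClosed includes the upper bound: 1 + 2 + ... + 10
        System.out.println(IntStream.rangeClosed(1, 10).sum()); // 55
    }
}
```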
Lazy evaluation
var first = Stream.of(1, 2, 3, 4)
.peek(n -> System.out.println("peek " + n))
.filter(n -> n > 2)
.findFirst();
// Output: peek 1, peek 2, peek 3 (stops as soon as a match is found)
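Laziness can also be observed without peek. This sketch records each element the filter examines. The class name and the SEEN list are illustrative only, and mutating shared state from a lambda is exactly the kind of side effect to avoid outside a demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class LazyDemo {
    // Illustrative only: records every element the filter examines.
    static final List<Integer> SEEN = new ArrayList<>();

    // Builds the pipeline but does not run it: there is no terminal operation yet.
    static Stream<Integer> pipeline() {
        return Stream.of(1, 2, 3, 4, 5)
                .filter(n -> {
                    SEEN.add(n);
                    return n > 2;
                });
    }

    public static void main(String[] args) {
        Stream<Integer> s = pipeline();
        System.out.println(SEEN);                // []: the filter has not run

        Optional<Integer> first = s.findFirst(); // terminal op pulls elements one by one
        System.out.println(first.get());        // 3
        System.out.println(SEEN);                // [1, 2, 3]: 4 and 5 were never examined
    }
}
```

Stateful operations such as sorted() behave differently: they must consume the entire upstream before emitting their first element, so a findFirst() after sorted() still pulls every element through the operations before it.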
Parallel streams
list.parallelStream().filter(...).toList();
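Parallel streams run on the common ForkJoinPool, which is shared across the JVM. They pay off only for large, CPU-bound, stateless operations on sources that split well (arrays, ArrayList, IntStream.range). For small inputs the cost of splitting and merging dominates, and blocking I/O ties up the shared pool's threads, so a plain sequential stream() is usually faster.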
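A sketch of the kind of work that suits parallel streams: a stateless, associative reduction over a range, which gives the same result sequentially and in parallel. It also collects parallel results with toList() rather than appending to a shared ArrayList from forEach, which is not thread-safe. Names are illustrative, and the sketch makes no claim about speedup on any particular machine.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class ParallelDemo {
    // CPU-bound, stateless, associative: safe to split across the common ForkJoinPool.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(x -> x * x).sum();
    }

    // Let a collector gather the results; toList() keeps encounter order even in parallel.
    static List<Integer> evens(int n) {
        return IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .toList();
    }

    public static void main(String[] args) {
        long seq = sumOfSquares(1_000_000, false);
        long par = sumOfSquares(1_000_000, true);
        System.out.println(seq == par); // true: addition order does not change the sum
        System.out.println(evens(10));  // [0, 2, 4, 6, 8]
    }
}
```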
Common mistakes
- Reusing a stream: streams are one-shot, and a second terminal operation throws IllegalStateException. Call stream() again on the source.
- Side effects in map or filter: these lambdas should be pure. Side effects belong in forEach, and even there they are suspect with parallel streams.
- Over-streaming: a simple for-each loop is sometimes clearer than .stream().forEach().
- Returning a Stream from a public API: callers can consume it only once. Return a List unless streaming is essential.
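The reuse pitfall in a runnable sketch: a second terminal operation on the same stream throws IllegalStateException, and the usual fix is to keep the source (here wrapped in a Supplier<Stream<String>>) and request a fresh stream for each pipeline. The class, method, and record names are hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class ReuseDemo {
    // Hypothetical holder for the two counts.
    record Counts(long all, long longer) {}

    // Returns true when a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> s = Stream.of("a", "bb", "ccc");
        s.count();             // first terminal operation consumes the stream
        try {
            s.count();         // second terminal operation on the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;       // "stream has already been operated upon or closed"
        }
    }

    // Fix: hold on to the source and ask it for a fresh stream per pipeline.
    static Counts countTwice(List<String> words) {
        Supplier<Stream<String>> fresh = words::stream;
        long all = fresh.get().count();
        long longer = fresh.get().filter(w -> w.length() > 1).count();
        return new Counts(all, longer);
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());                      // true
        System.out.println(countTwice(List.of("a", "bb", "ccc"))); // Counts[all=3, longer=2]
    }
}
```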
Related
Pillar: Java collections. See also Iterator.