Java Stream API was Broken Before JDK10

Of course, not all of it, but history has shown that the Stream API shipped with a few interesting bugs and deficiencies that can affect anyone still running JDK 8 or JDK 9.

Stream#flatMap

Update: the fix for this issue was backported to JDK 8 (openjdk8u222), but not to JDK 9.


Unfortunately, it turns out that Stream#flatMap was not as lazy as advertised, which made some pretty crazy situations possible.

For example, let’s take this one:

Stream.of(1)
  .flatMap(i -> Stream.generate(() -> 42))
  .findAny()
  .ifPresent(System.out::println);

In JDK8 and JDK9, the above code snippet spins forever waiting for the evaluation of the inner infinite Stream.

One would expect O(1) time complexity from the trivial operation of taking a single element from an infinite sequence, and that is exactly how it works as long as the infinite Stream isn’t processed inside Stream#flatMap:

Stream.generate(() -> 42)
  .findAny()
  .ifPresent(System.out::println);

// completes "immediately" and prints 42
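If you’re not sure which behavior your runtime has, a small probe can tell you: the sketch below counts how many elements the inner stream produces before findFirst() returns. The class name and the 1_000_000 bound are my own arbitrary choices; the inner stream is deliberately finite so the probe terminates even on an affected JDK, where I’d expect the count to equal the full size of the inner stream. On a fixed JDK it should be very small.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class: probes whether Stream#flatMap respects short-circuiting.
final class FlatMapLazinessProbe {

    // Returns how many elements the inner stream produced before findFirst() completed.
    // The inner stream is bounded so the probe terminates even if flatMap drains it eagerly.
    static int pulledElements() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.of(1)
            .flatMap(i -> IntStream.range(0, 1_000_000)
                .boxed()
                .peek(x -> pulled.incrementAndGet())) // counts every element the inner stream emits
            .findFirst();
        return pulled.get();
    }

    public static void main(String[] args) {
        // Expected to be tiny on a fixed JDK, and equal to the full inner size on an affected one
        System.out.println("inner elements pulled: " + pulledElements());
    }
}
```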

What’s more, it gets worse if we add some processing after the Stream#flatMap call in such a short-circuiting pipeline:

Stream.of(1)
  .flatMap(i -> Stream.generate(() -> 42))
  .map(i -> process(i))
  .findAny()
  .ifPresent(System.out::println);

private static <T> T process(T input) {
    System.out.println("Processing...");
    return input;
}

Now, not only are we stuck in an infinite evaluation loop, but we’re also processing every item that comes through:

Processing...
Processing...
Processing...
Processing...
Processing...
Processing...
Processing...
Processing...
Processing...
Processing...
Processing...
Processing...
...

Imagine the consequences if the process() method contained blocking operations or unwanted side effects such as sending emails or logging.

Explanation

The internal implementation of Stream#flatMap is to blame, in particular the following part:

@Override
public void accept(P_OUT u) {
    try (Stream<? extends R> result = mapper.apply(u)) {
        // We can do better that this too; optimize for depth=0 case and just grab spliterator and forEach it
        if (result != null)
            result.sequential().forEach(downstream);
    }
}

As you can see, the inner Stream is consumed eagerly using Stream#forEach (not to mention the lack of curly braces around the conditional statement, ugh!).
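To see why the forEach call is the culprit, here’s a toy model contrasting the two strategies. FirstOnly and the other names are my own simplified stand-ins, not the JDK’s internal Sink types: one strategy pushes the whole inner spliterator with forEachRemaining, the other pulls with tryAdvance while checking a cancellation flag before every step, which is essentially what the JDK 10 fix does.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;

// Toy model (hypothetical classes): eager push vs. cancellation-aware pull.
final class EagerVsLazyDrain {

    // A toy "find first" consumer: keeps the first element, counts every element
    // it is offered, and asks to stop once it has a value (elements must be non-null).
    static final class FirstOnly<T> implements Consumer<T> {
        T value;
        int offered;

        @Override
        public void accept(T t) {
            offered++;
            if (value == null) {
                value = t;
            }
        }

        boolean cancellationRequested() {
            return value != null;
        }
    }

    // Eager strategy (like the pre-JDK 10 code): push everything, ignore cancellation.
    static <T> int eager(List<T> inner) {
        FirstOnly<T> sink = new FirstOnly<>();
        inner.spliterator().forEachRemaining(sink);
        return sink.offered;
    }

    // Lazy strategy (like the JDK 10 fix): check cancellation, then pull one element.
    static <T> int lazy(List<T> inner) {
        FirstOnly<T> sink = new FirstOnly<>();
        Spliterator<T> s = inner.spliterator();
        do { } while (!sink.cancellationRequested() && s.tryAdvance(sink));
        return sink.offered;
    }

    public static void main(String[] args) {
        List<Integer> inner = List.of(42, 43, 44, 45, 46);
        System.out.println("eager: " + eager(inner)); // eager: 5
        System.out.println("lazy:  " + lazy(inner));  // lazy:  1
    }
}
```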

The problem remained unaddressed in JDK 9, but luckily a fix shipped with JDK 10:

@Override
public void accept(P_OUT u) {
    try (Stream<? extends R> result = mapper.apply(u)) {
        if (result != null) {
            if (!cancellationRequestedCalled) {
                result.sequential().forEach(downstream);
            }
            else {
                var s = result.sequential().spliterator();
                do { } while (!downstream.cancellationRequested() && s.tryAdvance(downstream));
            }
        }
    }
}
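If you’re stuck on JDK 9, or on a JDK 8 update older than the backport, one possible workaround is to sidestep Stream#flatMap altogether and flatten by pulling from iterators. Below is a minimal sketch; lazyFlatMap is a hypothetical helper of mine, not a JDK API. It assumes the outer stream’s own iterator is lazy, and unlike Stream#flatMap it never closes the inner streams.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LazyFlatMap {

    // Hypothetical helper (not a JDK API): flattens lazily by pulling one element
    // at a time from the current inner iterator.
    // Caveat: unlike Stream#flatMap, inner streams are never closed here.
    static <T, R> Stream<R> lazyFlatMap(Stream<T> source,
                                        Function<? super T, ? extends Stream<? extends R>> mapper) {
        Iterator<T> outer = source.iterator();
        Iterator<R> flattened = new Iterator<R>() {
            private Iterator<? extends R> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Map the next outer element only once the current inner iterator is exhausted
                while (!inner.hasNext()) {
                    if (!outer.hasNext()) {
                        return false;
                    }
                    Stream<? extends R> next = mapper.apply(outer.next());
                    if (next == null) {
                        inner = Collections.emptyIterator();
                    } else {
                        inner = next.iterator();
                    }
                }
                return true;
            }

            @Override
            public R next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(flattened, Spliterator.ORDERED), false)
            .onClose(source::close);
    }

    public static void main(String[] args) {
        // Terminates even though the inner stream is infinite; prints 42
        lazyFlatMap(Stream.of(1), i -> Stream.generate(() -> 42))
            .findAny()
            .ifPresent(System.out::println);

        List<List<String>> list = List.of(List.of("1", "2"), List.of("3", "4", "5", "6", "7"));
        List<String> taken = lazyFlatMap(list.stream(), List::stream)
            .takeWhile(s -> !s.equals("4"))
            .collect(Collectors.toList());
        System.out.println(taken); // [1, 2, 3]
    }
}
```

The iterator-based design trades some performance (no bulk traversal, no parallel splitting) for predictable laziness.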

Stream#takeWhile/dropWhile

This one is directly connected to the previous issue: Stream#flatMap’s unwanted eager evaluation.

Let’s say we have a list of lists:

List<List<String>> list = List.of(
  List.of("1", "2"),
  List.of("3", "4", "5", "6", "7"));

and we want to flatten them into a single one:

list.stream()
  .flatMap(Collection::stream)
  .forEach(System.out::println);

// 1
// 2
// 3
// 4
// 5
// 6
// 7

Works just as expected.

Now, let’s take the flattened Stream and simply keep taking elements until we encounter “4”:

Stream.of("1", "2", "3", "4", "5", "6", "7")
  .takeWhile(i -> !i.equals("4"))
  .forEach(System.out::println);

// 1
// 2
// 3

Works just as expected.

Let’s now combine the two. What could go wrong?

List<List<String>> list = List.of(
  List.of("1", "2"),
  List.of("3", "4", "5", "6", "7"));

list.stream()
  .flatMap(Collection::stream)
  .takeWhile(i -> !i.equals("4"))
  .forEach(System.out::println);

// 1
// 2
// 3
// 5
// 6
// 7

That’s an unexpected turn of events and can be fully attributed to the original issue with Stream#flatMap.
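To guard against this kind of regression, it can help to collect the result instead of printing it, so it can be compared against the expected list in a unit test. A minimal sketch; TakeWhileAfterFlatMap and takeUntilFour are names I made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical regression check for the flatMap + takeWhile combination.
final class TakeWhileAfterFlatMap {

    // Flattens the nested lists and keeps elements until "4" is encountered.
    static List<String> takeUntilFour(List<List<String>> lists) {
        return lists.stream()
            .flatMap(List::stream)
            .takeWhile(s -> !s.equals("4"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> list = List.of(
            List.of("1", "2"),
            List.of("3", "4", "5", "6", "7"));
        // Expected on a fixed JDK: [1, 2, 3]
        System.out.println(takeUntilFour(list));
    }
}
```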

Some time ago I ran a short poll on Twitter, and most of you were quite surprised by the result.

Parallel Streams on Custom ForkJoinPool Instances

Update: the fix for this issue was backported to JDK 8 (openjdk8u222), but not to JDK 9.


There’s a commonly known hack (which you should not use, since it relies on internal implementation details of the Stream API) that makes it possible to hijack parallel Stream tasks and run them on a custom fork-join pool by launching them from within your own ForkJoinPool instance:

ForkJoinPool customPool = new ForkJoinPool(42);

customPool.submit(() -> list.parallelStream() /*...*/);
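Purely to illustrate the behavior (not as a recommendation), a runnable version of the hack might look like the sketch below. CustomPoolHack, poolsUsed and the 10_000-element range are my own choices. ForkJoinTask.getPool() reports the pool hosting the current thread, which lets us check where the stream’s work actually ran.

```java
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo of running a parallel stream from inside a custom pool.
final class CustomPoolHack {

    // Runs a parallel stream from inside the given pool and records which pool
    // hosted the thread that processed each element.
    // ForkJoinTask.getPool() returns null when called outside any ForkJoinPool.
    static Set<ForkJoinPool> poolsUsed(ForkJoinPool pool) {
        return pool.submit(() -> IntStream.range(0, 10_000)
                .parallel()
                .mapToObj(i -> ForkJoinTask.getPool())
                .collect(Collectors.toSet()))
            .join();
    }

    public static void main(String[] args) {
        ForkJoinPool customPool = new ForkJoinPool(42);
        try {
            Set<ForkJoinPool> pools = poolsUsed(customPool);
            // Expected to print true: no common pool (or null) entry in the set
            System.out.println(pools.equals(Set.of(customPool)));
        } finally {
            customPool.shutdown();
        }
    }
}
```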

If you thought you had already tricked everyone, you were only partially right.

It turns out that even though the tasks ran on the custom pool instance, they were still coupled to the common pool: the computation was split into a number of tasks proportional to the common pool’s parallelism, not the custom pool’s (JDK-8190974).
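To get a feel for what that means, here’s a simplified model of the splitting heuristic as I understand it from java.util.stream.AbstractTask: a parallel stream aims for roughly four leaf tasks per unit of parallelism and derives a target leaf size from that. Before the fix, the parallelism always came from the common pool; after it, from the ForkJoinPool running the computation. LeafSizeModel is a hypothetical name, and the common pool parallelism of 3 (a typical 4-core machine) is an assumption for illustration.

```java
// Hypothetical, simplified model of the leaf-size heuristic (not the JDK's actual code).
final class LeafSizeModel {

    // Aim for about (parallelism << 2) leaves, so each leaf handles roughly
    // sizeEstimate / (parallelism << 2) elements, but never fewer than one.
    static long targetLeafSize(long sizeEstimate, int parallelism) {
        long est = sizeEstimate / (parallelism << 2);
        return est > 0L ? est : 1L;
    }

    public static void main(String[] args) {
        long size = 1_000_000;
        int commonParallelism = 3;  // assumed: a 4-core machine
        int customParallelism = 42; // the custom pool size used above

        // Before the fix, the common pool's parallelism drives the split: 83333 elements per leaf
        System.out.println("before: " + targetLeafSize(size, commonParallelism));
        // After the fix, the custom pool's parallelism drives it: 5952 elements per leaf
        System.out.println("after:  " + targetLeafSize(size, customParallelism));
    }
}
```

With the coarser split, a 42-thread pool receives far fewer leaf tasks than it has workers, so most of them sit idle.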

So if you were relying on this hack even though you shouldn’t, the fix arrived in JDK 10. If you really need to use the Stream API to run parallel computations, you could use parallel-collectors instead.



