The Ultimate Guide to the Java Stream API groupingBy() Collector

groupingBy() is one of the most powerful and customizable Stream API collectors.

If you always find yourself not going beyond:

.collect(groupingBy(...));

…or simply want to discover more of its potential uses, then this article is for you.

If you’re looking for a general Collectors API overview, head here.

Overview

Simply put, groupingBy() provides functionality similar to SQL’s GROUP BY clause, but for the Java Stream API.

To use it, we always need to specify the property by which the grouping is performed. We do this by providing an implementation of a functional interface, usually by passing a lambda expression or a method reference.

For example, if we wanted to group Strings by their lengths, we could do that by passing String::length to the groupingBy():

List<String> strings = List.of("a", "bb", "cc", "ddd"); 

Map<Integer, List<String>> result = strings.stream() 
  .collect(groupingBy(String::length)); 

System.out.println(result); // {1=[a], 2=[bb, cc], 3=[ddd]}
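
The classifier does not have to be a method reference. As a minimal sketch (the class name, method name, and sample words are made up for illustration), here is the same idea with a lambda that groups words by their first character:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByFirstLetter {

    // Groups words by their first character; the classifier is a plain lambda.
    // Assumes no word is empty, otherwise charAt(0) would throw.
    static Map<Character, List<String>> byFirstLetter(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(word -> word.charAt(0)));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "cherry");
        System.out.println(byFirstLetter(words));
    }
}
```

Whatever the classifier returns becomes the key type of the resulting Map, here Character.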

But the collector itself is capable of much more than simple groupings like the one above.

Grouping Into a Custom Map Implementation

If you need to provide a custom Map implementation, you can use the groupingBy() overload that also accepts a map factory:

List<String> strings = List.of("a", "bb", "cc", "ddd");

TreeMap<Integer, List<String>> result = strings.stream()
  .collect(groupingBy(String::length, TreeMap::new, toList()));

System.out.println(result); // {1=[a], 2=[bb, cc], 3=[ddd]}
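
The map factory can be any supplier, not only a constructor reference. The following sketch (class and method names are hypothetical) assumes we want the keys in descending order, so the factory builds a TreeMap with a reversed comparator:

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Supplier;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

class GroupIntoDescendingTreeMap {

    static TreeMap<Integer, List<String>> byLengthDescending(List<String> strings) {
        // The factory decides the Map implementation and, here, the key order.
        Supplier<TreeMap<Integer, List<String>>> factory =
            () -> new TreeMap<>(Comparator.reverseOrder());

        return strings.stream()
            .collect(groupingBy(String::length, factory, toList()));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "bb", "cc", "ddd");
        System.out.println(byLengthDescending(strings)); // {3=[ddd], 2=[bb, cc], 1=[a]}
    }
}
```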

Providing a Custom Downstream Collection

If you need to store grouped elements in a custom collection, this can be achieved by using a toCollection() collector.

For example, if you wanted to group elements in TreeSet instances, this could be as easy as:

groupingBy(String::length, toCollection(TreeSet::new))

and a complete example:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, TreeSet<String>> result = strings.stream()
  .collect(groupingBy(String::length, toCollection(TreeSet::new)));

System.out.println(result); // {1=[a], 2=[bb, cc], 3=[ddd]}

Grouping and Counting Items in Groups

If you simply want to know the number of elements in each group, this can be as easy as passing the counting() collector as the downstream:

groupingBy(String::length, counting())

and a complete example:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, Long> result = strings.stream()
  .collect(groupingBy(String::length, counting()));

System.out.println(result); // {1=1, 2=2, 3=1}

Grouping and Combining Items as Strings

If you need to group elements and create a single String representation of each group, this can be achieved by using the joining() collector:

groupingBy(String::length, joining(",", "[", "]"))

and in action:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, String> result = strings.stream()
  .collect(groupingBy(String::length, joining(",", "[", "]")));

System.out.println(result); // {1=[a], 2=[bb,cc], 3=[ddd]}

Grouping and Filtering Items

Sometimes, there might be a need to exclude some items from grouped results. This can be achieved using the filtering() collector:

groupingBy(String::length, filtering(s -> !s.contains("c"), toList()))

and in action:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, List<String>> result = strings.stream()
  .collect(groupingBy(String::length, filtering(s -> !s.contains("c"), toList())));

System.out.println(result); // {1=[a], 2=[bb], 3=[ddd]}
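
It may help to see how this differs from calling filter() on the stream before collecting. In the sketch below (class and method names are hypothetical, and the predicate is changed to a length check so the difference becomes visible), filtering the stream drops the whole group, while the filtering() collector keeps the key with an empty list:

```java
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.filtering;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

class FilterVersusFiltering {

    // Elements are removed before grouping, so a key with no survivors never appears.
    static Map<Integer, List<String>> filterBeforeGrouping(List<String> strings) {
        return strings.stream()
            .filter(s -> s.length() > 1)
            .collect(groupingBy(String::length));
    }

    // Elements are removed inside each group, so every key is kept.
    static Map<Integer, List<String>> filterInsideGroups(List<String> strings) {
        return strings.stream()
            .collect(groupingBy(String::length,
                filtering(s -> s.length() > 1, toList())));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "bb", "cc", "ddd");
        System.out.println(filterBeforeGrouping(strings)); // {2=[bb, cc], 3=[ddd]}
        System.out.println(filterInsideGroups(strings));   // {1=[], 2=[bb, cc], 3=[ddd]}
    }
}
```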

Grouping and Calculating an Average per Group

If there’s a need to derive an average of properties of grouped items, there are a few handy collectors for that:

averagingInt()
averagingLong()
averagingDouble()

and in action:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, Double> result = strings.stream()
  .collect(groupingBy(String::length, averagingInt(String::hashCode)));

System.out.println(result); // {1=97.0, 2=3152.0, 3=99300.0}

Disclaimer: String::hashCode was used as a placeholder.

Grouping and Calculating a Sum per Group

If you want to derive a sum from properties of grouped elements, there are some options for this as well:

summingInt()
summingLong()
summingDouble()

and in action:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, Integer> result = strings.stream()
  .collect(groupingBy(String::length, summingInt(String::hashCode)));

System.out.println(result); // {1=97, 2=6304, 3=99300}

Disclaimer: String::hashCode was used as a placeholder.
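
Since hashCode is only a placeholder, here is a sketch with a property that is easier to reason about. The class name, method name, and sample words are assumptions made for illustration: words are grouped by their first letter and their lengths are summed per group.

```java
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summingInt;

class SumPerGroup {

    // Total number of characters per first letter; assumes no word is empty.
    static Map<Character, Integer> totalLengthByFirstLetter(List<String> words) {
        return words.stream()
            .collect(groupingBy(word -> word.charAt(0), summingInt(String::length)));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "blueberry", "cherry");
        // a: 5 + 7 = 12, b: 6 + 9 = 15, c: 6
        System.out.println(totalLengthByFirstLetter(words));
    }
}
```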

Grouping and Calculating a Statistical Summary per Group

If you want to group and then derive a statistical summary from properties of grouped items, there are out-of-the-box options for that as well:

summarizingInt()
summarizingLong()
summarizingDouble()

in action:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, IntSummaryStatistics> result = strings.stream()
  .collect(groupingBy(String::length, summarizingInt(String::hashCode)));

System.out.println(result);

The result (reformatted for readability):

{
    1=IntSummaryStatistics{
      count=1, 
      sum=97, 
      min=97, 
      average=97.000000, 
      max=97}, 
    2=IntSummaryStatistics{
      count=2, 
      sum=6304, 
      min=3136, 
      average=3152.000000, 
      max=3168}, 
    3=IntSummaryStatistics{
      count=1, 
      sum=99300, 
      min=99300, 
      average=99300.000000, 
      max=99300}
}

Disclaimer: String::hashCode was used as a placeholder.
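
In practice we rarely print the summary objects; we read individual values from them. A small sketch, again with hypothetical names and sample data, that summarizes word lengths per first letter and shows the accessors of IntSummaryStatistics:

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summarizingInt;

class SummaryPerGroup {

    // Length statistics per first letter; assumes no word is empty.
    static Map<Character, IntSummaryStatistics> lengthStatsByFirstLetter(List<String> words) {
        return words.stream()
            .collect(groupingBy(word -> word.charAt(0), summarizingInt(String::length)));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "blueberry", "cherry");
        IntSummaryStatistics stats = lengthStatsByFirstLetter(words).get('a');

        System.out.println(stats.getCount());   // 2
        System.out.println(stats.getMin());     // 5
        System.out.println(stats.getMax());     // 7
        System.out.println(stats.getAverage()); // 6.0
    }
}
```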

Grouping and Reducing Items

If you want to perform a reduction operation on grouped elements, you can use the reducing() collector:

groupingBy(List::size, reducing(List.of(), (l1, l2) -> ...))

in action:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, List<Character>> result = strings.stream()
  .map(toStringList())
  .collect(groupingBy(List::size, reducing(List.of(), (l1, l2) -> Stream.concat(l1.stream(), l2.stream())
    .collect(Collectors.toList()))));

System.out.println(result); // {1=[a], 2=[b, b, c, c], 3=[d, d, d]}
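
The snippet above relies on a toStringList() helper that is not shown. The sketch below fills it in with a hypothetical implementation (a function turning a String into a list of its characters), which is an assumption rather than the original code, and spells out the generic types to keep type inference simple:

```java
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupAndReduce {

    // Hypothetical helper: the article does not show toStringList(), so this
    // version (String -> list of its characters) is an assumption.
    static Function<String, List<Character>> toStringList() {
        return s -> s.chars()
            .mapToObj(c -> (char) c)
            .collect(Collectors.toList());
    }

    static Map<Integer, List<Character>> charactersByLength(List<String> strings) {
        // Reduction step: concatenate two lists into a new one.
        BinaryOperator<List<Character>> concat = (l1, l2) ->
            Stream.concat(l1.stream(), l2.stream()).collect(Collectors.toList());

        return strings.stream()
            .map(toStringList())
            .collect(Collectors.groupingBy(List::size,
                Collectors.reducing(List.<Character>of(), concat)));
    }

    public static void main(String[] args) {
        System.out.println(charactersByLength(List.of("a", "bb", "cc", "ddd")));
    }
}
```

The identity value (an empty list) is what a group starts from, so the reduction never has to deal with a missing first element.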

Grouping and Calculating Max/Min Item

If you want to derive the max/min element from a group, you can simply use the maxBy()/minBy() collector:

groupingBy(String::length, Collectors.maxBy(Comparator.comparing(String::toUpperCase)))

in action:

List<String> strings = List.of("a", "bb", "cc", "ddd");

Map<Integer, Optional<String>> result = strings.stream()
  .collect(groupingBy(String::length, Collectors.maxBy(Comparator.comparing(String::toUpperCase))));

System.out.println(result); // {1=Optional[a], 2=Optional[cc], 3=Optional[ddd]}

The fact that the collector returns an Optional is a bit inconvenient in this case – there’s always at least a single element in a group, so usage of Optional increases accidental complexity.

Unfortunately, there’s nothing we can do with the collector itself to prevent it. We can recreate the same functionality using the reducing() collector, though.
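
Another option, shown here as a sketch rather than as the approach the article has in mind, is to wrap maxBy() in collectingAndThen() and unwrap the Optional in the finisher. This is safe only because groupingBy() never creates an empty group; the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;

class MaxPerGroupWithoutOptional {

    static Map<Integer, String> maxByLength(List<String> strings) {
        Comparator<String> byUpperCase = Comparator.comparing(String::toUpperCase);

        return strings.stream()
            .collect(groupingBy(String::length,
                // Every group holds at least one element, so Optional::get cannot fail here.
                collectingAndThen(maxBy(byUpperCase), Optional::get)));
    }

    public static void main(String[] args) {
        System.out.println(maxByLength(List.of("a", "bb", "cc", "ddd"))); // {1=a, 2=cc, 3=ddd}
    }
}
```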

Composing Downstream Collectors

The full power of the collector is unleashed once we start combining multiple collectors to define complex downstream grouping operations. These start to resemble standard Stream API pipelines, and the sky’s the limit here.

Example #1

Let’s say we have a list of strings and want to group them by length, uppercase them, keep only the strings longer than one character, and collect each group into a TreeSet instance.

We can do that quite easily:

var result = strings.stream()
  .collect(
    groupingBy(String::length,
      mapping(String::toUpperCase,
        filtering(s -> s.length() > 1,
          toCollection(TreeSet::new)))));

//result
{1=[], 2=[BB, CC], 3=[DDD]}

Example #2

Given a list of strings, group them by length, convert each string into a list of its characters, flatten the resulting lists, keep only distinct elements with non-zero length, uppercase them, and finally reduce them by applying string concatenation.

We can achieve that as well:

var result = strings.stream()
  .collect(
    groupingBy(String::length,
      mapping(toStringList(),
        flatMapping(s -> s.stream().distinct(),
          filtering(s -> s.length() > 0,
            mapping(String::toUpperCase,
              reducing("", (s, s2) -> s + s2)))))
    ));

//result 
{1=A, 2=BC, 3=D}
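
For a version that can be compiled as is, the sketch below supplies a hypothetical toStringList() helper. Because the downstream collectors call s.length() and String::toUpperCase, this assumed version returns a list of one-character strings; the explicit lambda parameter types are only there to help type inference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

import static java.util.stream.Collectors.filtering;
import static java.util.stream.Collectors.flatMapping;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.reducing;

class ComposedDownstream {

    // Hypothetical helper (not shown in the article): splits a string into
    // one-character strings, e.g. "bb" -> ["b", "b"].
    static Function<String, List<String>> toStringList() {
        return s -> Arrays.asList(s.split(""));
    }

    static Map<Integer, String> distinctUppercasedLetters(List<String> strings) {
        return strings.stream()
            .collect(
                groupingBy(String::length,
                    mapping(toStringList(),
                        // distinct() is applied per source string, not per group
                        flatMapping((List<String> letters) -> letters.stream().distinct(),
                            filtering((String s) -> s.length() > 0,
                                mapping(String::toUpperCase,
                                    reducing("", (a, b) -> a + b)))))));
    }

    public static void main(String[] args) {
        System.out.println(distinctUppercasedLetters(List.of("a", "bb", "cc", "ddd")));
    }
}
```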

Bonus: groupingByConcurrent()

groupingByConcurrent() is the concurrent counterpart of groupingBy(). It returns a Collector that stores results in a ConcurrentMap and is designed to be used with parallel streams.

How groupingBy() Works Under the Hood With Parallel Streams

When groupingBy() is used with a parallel stream, the JDK splits the stream into partitions, groups each partition independently into a separate intermediate Map, and then merges all intermediate maps at the end. That merge step has a cost that grows with the number of distinct grouping keys.

The groupingByConcurrent() eliminates the merge step entirely. Instead of each thread maintaining its own map, all threads accumulate directly into a single shared ConcurrentHashMap. This makes it a better fit when you’re working with large datasets and a high number of distinct keys, where merging intermediate maps becomes a bottleneck.
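
The two strategies differ in how they get to the result, not in the result itself, as long as the downstream collector does not depend on order. A minimal sketch (class name, method names, and the size and bucket parameters are made up for illustration) that counts numbers per remainder with both collectors:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.groupingByConcurrent;

class ParallelGroupingComparison {

    // Intermediate maps per partition, merged at the end.
    static Map<Integer, Long> countWithGroupingBy(int size, int buckets) {
        return IntStream.range(0, size).boxed().parallel()
            .collect(groupingBy(i -> i % buckets, counting()));
    }

    // One shared ConcurrentMap that all threads accumulate into.
    static ConcurrentMap<Integer, Long> countWithGroupingByConcurrent(int size, int buckets) {
        return IntStream.range(0, size).boxed().parallel()
            .collect(groupingByConcurrent(i -> i % buckets, counting()));
    }

    public static void main(String[] args) {
        Map<Integer, Long> merged = countWithGroupingBy(100_000, 10);
        Map<Integer, Long> shared = countWithGroupingByConcurrent(100_000, 10);
        System.out.println(merged.equals(shared)); // true
    }
}
```

Which of the two is faster depends on the data and the hardware, so this sketch says nothing about performance.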

Basic Usage

import java.util.List;
import java.util.concurrent.ConcurrentMap;
import static java.util.stream.Collectors.groupingByConcurrent;
import static java.util.stream.Collectors.toList;

List<String> strings = List.of("a", "bb", "cc", "ddd");

ConcurrentMap<Integer, List<String>> result = strings.parallelStream()
  .collect(groupingByConcurrent(String::length));

// Result: {1=[a], 2=[bb, cc], 3=[ddd]}

All three overloads mirror those of groupingBy():

// 1. Classifier only
ConcurrentMap<Integer, List<String>> result = strings.parallelStream()
  .collect(groupingByConcurrent(String::length));

// 2. Classifier + downstream collector
ConcurrentMap<Integer, Long> counts = strings.parallelStream()
  .collect(groupingByConcurrent(String::length, counting()));

// 3. Classifier + custom concurrent map factory + downstream collector
ConcurrentSkipListMap<Integer, Long> sorted = strings.parallelStream()
  .collect(groupingByConcurrent(String::length, ConcurrentSkipListMap::new, counting()));

Note that the custom map factory must produce a ConcurrentMap – unlike groupingBy(), we cannot supply a plain HashMap or TreeMap here.

CONCURRENT and UNORDERED Characteristics

groupingByConcurrent() declares two special collector characteristics that distinguish it from groupingBy():

CONCURRENT – the accumulator may be called concurrently by multiple threads against a single shared result container, removing the need for a merge phase.
UNORDERED – the order of elements within each group is not guaranteed, even if the source stream is ordered.

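The UNORDERED characteristic is the most important gotcha: the collector makes no guarantee about the order of elements within groups, and with a parallel stream that order can differ from run to run. A sequential stream will usually keep the order in practice, but nothing in the contract promises it. If preserving element order within groups matters, stick with groupingBy().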

Common Mistake: Using groupingByConcurrent() With Sequential Streams

With a sequential stream, groupingByConcurrent() adds the overhead of ConcurrentHashMap (volatile reads, CAS operations) with none of the parallelism benefit. Use groupingBy() for sequential streams.

When to Prefer groupingBy() Even With Parallel Streams

The groupingByConcurrent() is not always the better choice for parallel streams:

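If element ordering within groups matters, use groupingBy(): for an ordered source it keeps encounter order within each group, even in parallel. A LinkedHashMap supplier only controls the iteration order of the keys.
If the downstream collector is not itself concurrent (toList(), for example), updates to each group’s container have to be serialized (the OpenJDK implementation synchronizes on the container), so threads contend whenever many elements share a key.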
For small datasets, the overhead of concurrent data structures may outweigh the parallelism gain.
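
The ordering point can be checked with a small experiment. In this sketch (class and method names are hypothetical), groupingBy() with the default toList() downstream produces the same lists for a parallel and a sequential run over an ordered source:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.groupingBy;

class ParallelOrderWithinGroups {

    static Map<Integer, List<Integer>> groupSequentially(int size) {
        return IntStream.range(0, size).boxed()
            .collect(groupingBy(i -> i % 3));
    }

    // Same pipeline on a parallel stream; groupingBy() is an ordered collector,
    // so each group still lists its elements in encounter order.
    static Map<Integer, List<Integer>> groupInParallel(int size) {
        return IntStream.range(0, size).boxed().parallel()
            .collect(groupingBy(i -> i % 3));
    }

    public static void main(String[] args) {
        System.out.println(groupInParallel(10_000).equals(groupSequentially(10_000))); // true
    }
}
```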

In practice, always benchmark before choosing groupingByConcurrent() over the simpler groupingBy() – parallel does not automatically mean faster.

Sources

All of the above examples can be found in my GitHub project.

Make sure to check my OSS project with custom parallel Stream API collectors.
