groupingBy() is one of the most powerful and customizable Stream API collectors.
If you rarely find yourself going beyond:
.collect(groupingBy(...));
…or simply want to discover more of its potential uses, then this article is for you.
If you’re looking for a general Collectors API overview, head here.
Overview
Simply put, groupingBy() provides functionality similar to SQL’s GROUP BY clause, just for the Java Stream API.
To use it, we always need to specify a property by which the grouping is performed. We do this by providing an implementation of a functional interface (the classifier Function), usually by passing a lambda expression or a method reference.
For example, if we wanted to group Strings by their lengths, we could do that by passing String::length to groupingBy():
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, List<String>> result = strings.stream()
  .collect(groupingBy(String::length));
System.out.println(result); // {1=[a], 2=[bb, cc], 3=[ddd]}
But the collector itself is capable of doing much more than simple groupings like the one above.
Grouping Into a Custom Map Implementation
If you need to provide a custom Map implementation, you can do that by using the three-argument groupingBy() overload, which accepts a map factory between the classifier and the downstream collector:
List<String> strings = List.of("a", "bb", "cc", "ddd");
TreeMap<Integer, List<String>> result = strings.stream()
  .collect(groupingBy(String::length, TreeMap::new, toList()));
System.out.println(result); // {1=[a], 2=[bb, cc], 3=[ddd]}
Providing a Custom Downstream Collection
If you need to store grouped elements in a custom collection, this can be achieved by using a toCollection() collector.
For example, if you wanted to group elements in TreeSet instances, this could be as easy as:
groupingBy(String::length, toCollection(TreeSet::new))
and a complete example:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, TreeSet<String>> result = strings.stream()
  .collect(groupingBy(String::length, toCollection(TreeSet::new)));
System.out.println(result); // {1=[a], 2=[bb, cc], 3=[ddd]}
Grouping and Counting Items in Groups
If you simply want to know the number of elements in each group, this can be as easy as passing the counting() collector as the downstream:
groupingBy(String::length, counting())
and a complete example:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, Long> result = strings.stream()
  .collect(groupingBy(String::length, counting()));
System.out.println(result); // {1=1, 2=2, 3=1}
Grouping and Combining Items as Strings
If you need to group elements and create a single String representation of each group, this can be achieved by using the joining() collector:
groupingBy(String::length, joining(",", "[", "]"))
and in action:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, String> result = strings.stream()
  .collect(groupingBy(String::length, joining(",", "[", "]")));
System.out.println(result); // {1=[a], 2=[bb,cc], 3=[ddd]}
Grouping and Filtering Items
Sometimes, there might be a need to exclude some items from grouped results. This can be achieved using the filtering() collector (available since Java 9):
groupingBy(String::length, filtering(s -> !s.contains("c"), toList()))
and in action:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, List<String>> result = strings.stream()
  .collect(groupingBy(String::length, filtering(s -> !s.contains("c"), toList())));
System.out.println(result); // {1=[a], 2=[bb], 3=[ddd]}
Grouping and Calculating an Average per Group
If there’s a need to derive an average of properties of grouped items, there are a few handy collectors for that:
- averagingInt()
- averagingLong()
- averagingDouble()
and in action:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, Double> result = strings.stream()
  .collect(groupingBy(String::length, averagingInt(String::hashCode)));
System.out.println(result); // {1=97.0, 2=3152.0, 3=99300.0}
Disclaimer: String::hashCode was used as a placeholder.
Grouping and Calculating a Sum per Group
If you want to derive a sum from properties of grouped elements, there are some options for this as well:
- summingInt()
- summingLong()
- summingDouble()
and in action:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, Integer> result = strings.stream()
  .collect(groupingBy(String::length, summingInt(String::hashCode)));
System.out.println(result); // {1=97, 2=6304, 3=99300}
Disclaimer: String::hashCode was used as a placeholder.
Grouping and Calculating a Statistical Summary per Group
If you want to group and then derive a statistical summary from properties of grouped items, there are out-of-the-box options for that as well:
- summarizingInt()
- summarizingLong()
- summarizingDouble()
in action:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, IntSummaryStatistics> result = strings.stream()
  .collect(groupingBy(String::length, summarizingInt(String::hashCode)));
System.out.println(result);
the result (user-friendly reformatted):
{
  1=IntSummaryStatistics{
    count=1,
    sum=97,
    min=97,
    average=97.000000,
    max=97},
  2=IntSummaryStatistics{
    count=2,
    sum=6304,
    min=3136,
    average=3152.000000,
    max=3168},
  3=IntSummaryStatistics{
    count=1,
    sum=99300,
    min=99300,
    average=99300.000000,
    max=99300}
}
Disclaimer: String::hashCode was used as a placeholder.
Grouping and Reducing Items
If you want to perform a reduction operation on grouped elements, you can use the reducing() collector:
groupingBy(List::size, reducing(List.of(), (l1, l2) -> ...))
in action:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, List<String>> result = strings.stream()
  .map(toStringList()) // helper not shown here; judging by its use in Example #2, it maps a String to a list of one-character Strings
  .collect(groupingBy(List::size, reducing(List.of(), (l1, l2) -> Stream.concat(l1.stream(), l2.stream())
    .collect(Collectors.toList()))));
System.out.println(result); // {1=[a], 2=[b, b, c, c], 3=[d, d, d]}
Grouping and Calculating Max/Min Item
If you want to derive the max/min element from a group, you can simply use the maxBy()/minBy() collector:
groupingBy(String::length, Collectors.maxBy(Comparator.comparing(String::toUpperCase)))
in action:
List<String> strings = List.of("a", "bb", "cc", "ddd");
Map<Integer, Optional<String>> result = strings.stream()
  .collect(groupingBy(String::length, Collectors.maxBy(Comparator.comparing(String::toUpperCase))));
System.out.println(result); // {1=Optional[a], 2=Optional[cc], 3=Optional[ddd]}
The fact that the collector returns an Optional is a bit inconvenient in this case – there’s always at least a single element in a group, so usage of Optional increases accidental complexity.
Unfortunately, there’s nothing we can do with the collector itself to prevent it. We can recreate the same functionality using the reducing() collector, though.
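Two possible ways of getting rid of the Optional are sketched below. Both rely on the fact that a group placed directly under groupingBy() is never empty. The class and method names are illustrative only, and the reducing() variant additionally assumes that the empty string is a safe identity, that is, that it sorts before every real element under the chosen comparator:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;
import static java.util.stream.Collectors.reducing;

class MaxPerGroup {

    // Same ordering as in the example above: compare by upper-cased value.
    static final Comparator<String> BY_UPPER_CASE = Comparator.comparing(String::toUpperCase);

    // Variant 1: keep maxBy() and unwrap the Optional once a group is complete.
    static Map<Integer, String> withCollectingAndThen(List<String> strings) {
        return strings.stream()
          .collect(groupingBy(String::length,
            collectingAndThen(maxBy(BY_UPPER_CASE), Optional::get)));
    }

    // Variant 2: reducing() with an identity value.
    // Assumption: "" compares lower than any element, so it never wins.
    static Map<Integer, String> withReducing(List<String> strings) {
        return strings.stream()
          .collect(groupingBy(String::length,
            reducing("", BinaryOperator.maxBy(BY_UPPER_CASE))));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "bb", "cc", "ddd");
        System.out.println(withCollectingAndThen(strings)); // {1=a, 2=cc, 3=ddd}
        System.out.println(withReducing(strings));          // {1=a, 2=cc, 3=ddd}
    }
}
```

The first variant makes no assumption about the data, so it is probably the safer default. The second one only works when a neutral identity value exists for the element type.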
Composing Downstream Collectors
The whole power of the collector gets unleashed once we start combining multiple collectors to define complex downstream grouping operations – which start resembling standard Stream API pipelines – the sky’s the limit here.
Example #1
Let’s say we have a list of strings and want to obtain a map of string lengths associated with uppercased strings with a length bigger than 1, and collect them into a TreeSet instance.
We can do that quite easily:
var result = strings.stream()
  .collect(
    groupingBy(String::length,
      mapping(String::toUpperCase,
        filtering(s -> s.length() > 1,
          toCollection(TreeSet::new)))));
//result
{1=[], 2=[BB, CC], 3=[DDD]}
Example #2
Given a list of strings, group them by their matching lengths, convert into a list of characters, flatten the obtained list, keep only distinct elements with non-zero length, and eventually reduce them by applying string concatenation.
We can achieve that as well:
var result = strings.stream()
  .collect(
    groupingBy(String::length,
      mapping(toStringList(),
        flatMapping(s -> s.stream().distinct(),
          filtering(s -> s.length() > 0,
            mapping(String::toUpperCase,
              reducing("", (s, s2) -> s + s2)))))));
//result
{1=A, 2=BC, 3=D}
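Both Example #2 and the earlier reducing() example call a toStringList() helper that is not shown in this article. The sketch below fills that gap with a hypothetical stand-in, assuming the helper splits a String into a list of one-character Strings (which is what the downstream calls to s.length() and String::toUpperCase suggest). The class name and the explicit return types are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

import static java.util.stream.Collectors.filtering;
import static java.util.stream.Collectors.flatMapping;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.reducing;
import static java.util.stream.Collectors.toList;

class ComposedDownstream {

    // Hypothetical stand-in for the toStringList() helper:
    // "bb" becomes the list [b, b] of one-character Strings.
    static Function<String, List<String>> toStringList() {
        return s -> s.chars()
          .mapToObj(c -> String.valueOf((char) c))
          .collect(toList());
    }

    // Example #2: split, de-duplicate per string, upper-case, concatenate per group.
    static Map<Integer, String> example2(List<String> strings) {
        return strings.stream()
          .collect(groupingBy(String::length,
            mapping(toStringList(),
              flatMapping(chars -> chars.stream().distinct(),
                filtering(s -> s.length() > 0,
                  mapping(String::toUpperCase,
                    reducing("", (s1, s2) -> s1 + s2)))))));
    }

    // The reducing() example: concatenate the character lists of each group.
    static Map<Integer, List<String>> reduceGroups(List<String> strings) {
        return strings.stream()
          .map(toStringList())
          .collect(groupingBy(List::size,
            reducing(List.of(), (l1, l2) -> Stream.concat(l1.stream(), l2.stream())
              .collect(toList()))));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "bb", "cc", "ddd");
        System.out.println(example2(strings));     // {1=A, 2=BC, 3=D}
        System.out.println(reduceGroups(strings)); // {1=[a], 2=[b, b, c, c], 3=[d, d, d]}
    }
}
```

Note that distinct() runs inside flatMapping(), so duplicates are removed per source string, not across the whole group.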
Bonus: groupingByConcurrent()
groupingByConcurrent() is the concurrent counterpart of groupingBy(). It returns a Collector that stores results in a ConcurrentMap and is designed to be used with parallel streams.
How groupingBy() Works Under the Hood With Parallel Streams
When groupingBy() is used with a parallel stream, the JDK splits the stream into partitions, groups each partition independently into a separate intermediate Map, and then merges all intermediate maps at the end. That merge step has a cost that grows with the number of distinct grouping keys.
The groupingByConcurrent() eliminates the merge step entirely. Instead of each thread maintaining its own map, all threads accumulate directly into a single shared ConcurrentHashMap. This makes it a better fit when you’re working with large datasets and a high number of distinct keys, where merging intermediate maps becomes a bottleneck.
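To make the difference more tangible, here is a hand-written approximation of the two strategies. It is only an illustration of the idea, not the JDK source: the class and method names are made up, the merge-based variant uses the three-argument Stream.collect(), and the shared-map variant uses a plain forEach() with a lock per group:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class UnderTheHood {

    // Adds one element to the group it belongs to.
    static void accumulate(Map<Integer, List<String>> groups, String s) {
        groups.computeIfAbsent(s.length(), key -> new ArrayList<>()).add(s);
    }

    // Merges the right partial result into the left one, key by key.
    static void merge(Map<Integer, List<String>> left, Map<Integer, List<String>> right) {
        right.forEach((key, values) ->
          left.merge(key, values, (a, b) -> { a.addAll(b); return a; }));
    }

    // groupingBy()-like: one map per partition, merged at the end.
    static Map<Integer, List<String>> mergeBased(List<String> strings) {
        return strings.parallelStream()
          .collect(HashMap::new, UnderTheHood::accumulate, UnderTheHood::merge);
    }

    // groupingByConcurrent()-like: every thread writes into one shared map.
    static Map<Integer, List<String>> sharedMapBased(List<String> strings) {
        ConcurrentMap<Integer, List<String>> shared = new ConcurrentHashMap<>();
        strings.parallelStream().forEach(s -> {
            List<String> group = shared.computeIfAbsent(s.length(), key -> new ArrayList<>());
            synchronized (group) { // ArrayList itself is not thread-safe
                group.add(s);
            }
        });
        return shared;
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "bb", "cc", "ddd");
        System.out.println(mergeBased(strings));     // {1=[a], 2=[bb, cc], 3=[ddd]}
        System.out.println(sharedMapBased(strings)); // same groups, element order may vary
    }
}
```

The merge-based variant touches every key of every partial map during the combine phase, which is where the cost mentioned above comes from. The shared-map variant skips that phase but pays for coordination on every insert.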
Basic Usage
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import static java.util.stream.Collectors.groupingByConcurrent;
List<String> strings = List.of("a", "bb", "cc", "ddd");
ConcurrentMap<Integer, List<String>> result = strings.parallelStream()
  .collect(groupingByConcurrent(String::length));
// Result: {1=[a], 2=[bb, cc], 3=[ddd]} (element order within a group may vary)
All three overloads mirror those of groupingBy():
// 1. Classifier only
ConcurrentMap<Integer, List<String>> result = strings.parallelStream()
  .collect(groupingByConcurrent(String::length));
// 2. Classifier + downstream collector
ConcurrentMap<Integer, Long> counts = strings.parallelStream()
  .collect(groupingByConcurrent(String::length, counting()));
// 3. Classifier + custom concurrent map factory + downstream collector
ConcurrentSkipListMap<Integer, Long> sorted = strings.parallelStream()
  .collect(groupingByConcurrent(String::length, ConcurrentSkipListMap::new, counting()));
Note that the custom map factory must produce a ConcurrentMap – unlike groupingBy(), we cannot supply a plain HashMap or TreeMap here.
CONCURRENT and UNORDERED Characteristics
groupingByConcurrent() declares two special collector characteristics that distinguish it from groupingBy():
- CONCURRENT – the accumulator may be called concurrently by multiple threads against a single shared result container, removing the need for a merge phase.
- UNORDERED – the order of elements within each group is not guaranteed, even if the source stream is ordered.
The UNORDERED characteristic is the most important gotcha: the collector makes no promise about the order of elements within a group. A sequential stream will usually still produce encounter order in practice, but nothing in the contract guarantees it. If preserving element order within groups matters, stick with groupingBy().
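A small experiment can make the ordering difference visible. The sketch below assumes a made-up input (the numbers 0 to 9999, grouped by their last digit) and illustrative class and method names. It compares both collectors on a parallel stream, first directly and then after sorting each group:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.groupingByConcurrent;

class GroupOrderCheck {

    // Hypothetical input: ascending numbers 0..9999.
    static List<Integer> numbers() {
        return IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
    }

    // groupingBy() on a parallel stream: each group keeps encounter order.
    static Map<Integer, List<Integer>> ordered(List<Integer> input) {
        return input.parallelStream().collect(groupingBy(n -> n % 10));
    }

    // groupingByConcurrent(): same group contents, order within a group unspecified.
    static Map<Integer, List<Integer>> concurrent(List<Integer> input) {
        return input.parallelStream().collect(groupingByConcurrent(n -> n % 10));
    }

    // Sorts every group so that two results can be compared regardless of element order.
    static Map<Integer, List<Integer>> normalized(Map<Integer, List<Integer>> groups) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        groups.forEach((key, values) ->
          result.put(key, values.stream().sorted().collect(Collectors.toList())));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = numbers();
        Map<Integer, List<Integer>> a = ordered(input);
        Map<Integer, List<Integer>> b = concurrent(input);

        // The input is ascending, so groupingBy() yields ascending groups.
        System.out.println(a.equals(normalized(a)));
        // May differ from run to run: element order inside groups is unspecified.
        System.out.println(a.equals(b));
        // Both collectors put the same elements into the same groups.
        System.out.println(normalized(a).equals(normalized(b)));
    }
}
```

If code downstream depends on the order inside a group, comparing normalized results like this in a test is one way to catch an accidental switch to the concurrent collector.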
Common Mistake: Using groupingByConcurrent() With Sequential Streams
With a sequential stream, groupingByConcurrent() adds the overhead of ConcurrentHashMap (volatile reads, CAS operations) with none of the parallelism benefit. Use groupingBy() for sequential streams.
When to Prefer groupingBy() Even With Parallel Streams
The groupingByConcurrent() is not always the better choice for parallel streams:
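- If element ordering within groups matters, use groupingBy(): with the default toList() downstream it preserves encounter order within each group, even on a parallel stream. A LinkedHashMap supplier only affects the order of the keys.
- If the downstream collector is not itself concurrent (toList(), for example), the OpenJDK implementation synchronizes on each group's result container during accumulation, so a handful of hot keys can serialize most of the work and cancel out the benefit.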
- For small datasets, the overhead of concurrent data structures may outweigh the parallelism gain.
In practice, always benchmark before choosing groupingByConcurrent() over the simpler groupingBy() – parallel does not automatically mean faster.
Sources
All of the above examples can be found in my GitHub project.
Make sure to check out my OSS project with custom parallel Stream API collectors.



