One of Vavr’s users hit a peculiar bug recently: a serialized map containing enum keys, deserialized in another process, appeared to hold the right entries but the map… just could not find them.
The root cause turns out to be a subtle property of Java enums that most Java developers are not aware of.
Java Native Serialization
Before we get to the core issue, let’s recall how Java’s native serialization works.
To serialize an object, the JVM traverses the object graph and writes each field’s value to a byte stream.
For primitive fields, the raw value is written directly. For object references, the referenced object is serialized recursively.
On deserialization, the object is reconstructed by reading those bytes back and assigning them to fields directly, bypassing the constructor entirely:
private static byte[] serialize(Object obj) throws Exception {
var baos = new ByteArrayOutputStream();
try (var oos = new ObjectOutputStream(baos)) {
oos.writeObject(obj);
}
return baos.toByteArray();
}
private static <T> T deserialize(byte[] bytes) throws Exception {
try (var ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
return (T) ois.readObject();
}
}
Enum’s hashCode()
Let’s start with a simple enum:
enum Type { A, B }
A quick question – what does Type.A.hashCode() return?
If you don’t know the exact answer, I’m willing to bet that you’re going to go for one of these:
0 (the ordinal)
65 (the hash code of "A", or something derived from it)
In both cases, you’d be wrong. Enum doesn’t provide a value-based hashCode(); its final hashCode() simply delegates to Object’s default implementation, which is identity-based (derived from whatever the JVM uses internally to assign identity hash codes).
Within a single JVM, this is perfectly fine since enum constants are singletons. However, across JVM instances, the identity hash code of the same constant will almost certainly be different.
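We can verify this ourselves. The snippet below (class and file names are mine, not from the original article) confirms that within one JVM an enum constant’s hash code is exactly its identity hash code, and shows what the ordinal and name hash would have been instead:

```java
// Within one JVM, an enum constant's hash code is just its identity hash
// code: Enum's final hashCode() delegates to Object's implementation.
enum Type { A, B }

public class EnumHashDemo {
    public static void main(String[] args) {
        System.out.println(Type.A.hashCode() == System.identityHashCode(Type.A)); // prints: true
        // Neither the ordinal nor the name's hash code is involved:
        System.out.println("ordinal  = " + Type.A.ordinal());
        System.out.println("nameHash = " + Type.A.name().hashCode());
    }
}
```

Run it twice and you will almost certainly see a different hash code each time, while ordinal and name hash stay fixed.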
A Minimal Naive Hash Map
To see why this matters, let’s write a trivial map implementation that stores a single key-value pair:
class SingleEntryMap<K, V> implements Serializable {
private final int bucket;
private final K key;
private final V value;
SingleEntryMap(K key, V value) {
this.bucket = key.hashCode();
this.key = key;
this.value = value;
}
public Optional<V> get(K key) {
return key.hashCode() == bucket && key.equals(this.key)
? Optional.of(value)
: Optional.empty();
}
// ...
}
The map captures the key’s hash code at construction time and stores it alongside the key/value pair.
On lookup, it first checks hash equality (the bucket) and only then performs the actual equals() check, which mirrors what real hash-based data structures do.
Within a single JVM, this works correctly every single time:
var map = new SingleEntryMap<>(Type.A, "hello");
assertThat(map.get(Type.A)).contains("hello");
assertThat(map.get(Type.B)).isEmpty();
Even if we serialize and then deserialize it within the same process, it works. The deserialized enum constant resolves back to the same singleton object, which still has the same identity hash code:
var original = new SingleEntryMap<>(Type.A, "hello");
byte[] bytes = serialize(original);
SingleEntryMap<Type, String> deserialized = deserialize(bytes);
assertThat(deserialized.get(Type.A)).contains("hello");
The Multiple Processes Trap
Now let’s serialize the map in one JVM process and deserialize it in another:
void main() throws Exception {
var map = new SingleEntryMap<>(Type.A, "hello");
try (var out = new ObjectOutputStream(new FileOutputStream("/tmp/map.bin"))) {
out.writeObject(map);
IO.println("Type.A.hashCode() = " + Type.A.hashCode());
}
}
void main() throws Exception {
try (var in = new ObjectInputStream(new FileInputStream("/tmp/map.bin"))) {
SingleEntryMap<Type, String> map = (SingleEntryMap<Type, String>) in.readObject();
IO.println("Type.A.hashCode() = " + Type.A.hashCode());
IO.println("stored bucket = " + map.bucket());
IO.println("map contains: " + map.key() + " -> " + map.value());
IO.println("map.get(Type.A) = " + map.get(Type.A));
}
}
The output will look something like:
// Process 1
Type.A.hashCode() = 713338599
// Process 2
Type.A.hashCode() = 1147985808
stored bucket = 713338599
map contains: A -> hello
map.get(Type.A) = Optional.empty
We can clearly see that the key is there, the value is there, and yet get(Type.A) returns empty!
The stored bucket came from the first process. In the second process, Type.A has a different identity hash code. The bucket check fails before we get to the true equality check!
How come JDK’s HashMap doesn’t have this problem?
It implements custom readObject/writeObject methods which serialize only the keys and values, then rehash everything during deserialization.
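The same idea can be sketched on our toy map. This is my illustration, not the JDK’s code: the cached bucket is marked transient so it never hits the stream, and a custom readObject recomputes it with the hash code of the receiving JVM:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.Optional;

// Sketch: never serialize the cached hash; recompute it on arrival.
class RehashingSingleEntryMap<K, V> implements Serializable {
    private transient int bucket; // excluded from serialization
    private final K key;
    private final V value;

    RehashingSingleEntryMap(K key, V value) {
        this.bucket = key.hashCode();
        this.key = key;
        this.value = value;
    }

    public Optional<V> get(K key) {
        return key.hashCode() == bucket && key.equals(this.key)
                ? Optional.of(value)
                : Optional.empty();
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();       // restores key and value
        this.bucket = key.hashCode(); // rehash with the *local* hash code
    }
}
```

With this change, the cross-process scenario above would find Type.A again, because the bucket is always derived from a hash code computed in the current JVM.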
Vavr’s HashMap Had the Same Bug
This isn’t a hypothetical problem. Vavr’s HashMap had the same bug!
Vavr’s HashMap is backed by a Hash Array Mapped Trie, and its serialization mechanism would dump the entire internal tree structure to the stream, including the pre-computed hash values of keys.
For keys with deterministic hash codes (strings, integers), this worked fine. But for enums, or any key type relying on identity hash codes, deserializing in a different process produced a map that contained the same entries but could never find them!
The fix in Vavr 1.0.1 was to adopt the serialization proxy pattern: serialize only the key-value pairs, then rebuild the HAMT from scratch during deserialization, which is essentially the same approach the JDK has used all along (just implemented using a different pattern).
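The pattern can be sketched on our toy map as well. This is my hypothetical illustration of the technique, not Vavr’s actual code: on the wire, the map is replaced by a lightweight proxy holding only the logical content, and the proxy rebuilds a fresh map (recomputing the bucket locally) when it is read back:

```java
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.Optional;

// Sketch of the serialization proxy pattern: only key and value cross
// the process boundary; the bucket is recomputed on the receiving side.
class ProxiedSingleEntryMap<K, V> implements Serializable {
    private final int bucket;
    private final K key;
    private final V value;

    ProxiedSingleEntryMap(K key, V value) {
        this.bucket = key.hashCode(); // always computed in the local JVM
        this.key = key;
        this.value = value;
    }

    public Optional<V> get(K key) {
        return key.hashCode() == bucket && key.equals(this.key)
                ? Optional.of(value)
                : Optional.empty();
    }

    // Substitute the proxy for the map on serialization...
    private Object writeReplace() {
        return new SerializationProxy<>(key, value);
    }

    // ...and reject streams that carry the map directly.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Use the serialization proxy");
    }

    private record SerializationProxy<K, V>(K key, V value) implements Serializable {
        // Rebuild the map from scratch, recomputing the bucket locally.
        private Object readResolve() {
            return new ProxiedSingleEntryMap<>(key, value);
        }
    }
}
```

The appeal of the proxy over hand-written readObject/writeObject is that the real map is always built through its constructor, so all invariants (here, bucket consistency) hold by construction.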
Summary
If you’re working with hash codes, never let them leak outside a single JVM process unless you’re sure they are deterministic. Generally, a much safer approach is to serialize only the logical content and rebuild the physical structure on deserialization, though that can come with an extra performance cost.
The source code is available on GitHub.