Achieving zero-impact GC has been the holy grail of programming for decades. However, as memory capacities have grown, the reality is dawning that there is no free lunch: GC has a real cost. And slowly but surely, we are now seeing a move away from it.
Infrastructure Software platform evolution -
2000s - Java/JVM. GC Issues. JIT optimization issues.
2010s - Go. Stack allocation. Less GC issues. No JIT issues.
2020s - Rust/C++20. No GC issues. No JIT issues.
— Prashant Deva (@pdeva) August 16, 2021
GC doesn't scale
GC really only works well in scenarios where the heap size is at most 4 GB, with JavaScript-based SPAs and mobile apps being prime examples.
Back in the era of 32-bit systems, when GC-based languages became popular, this was considered a lot of memory (after all, 4 GB covers the entire 32-bit address space). However, as RAM sizes have increased, it's clear that GC simply doesn't scale. Can you imagine a 100 GB heap? A 50 GB heap? Nope. Yet these RAM sizes are fairly common: the r6g.16xlarge instance on AWS comes with 512 GB of RAM. It is unfathomable to even consider a heap that size.
Modern 'pauseless' GCs sacrifice much for a gain in a single metric
What about that new pauseless ZGC in the JVM?
So Java's 'next gen' ZGC operates using a Load Barrier, which means your Reads can 'magically' result in Writes as the barrier tries to 'heal' relocated references, completely changing the performance characteristics of your code!
— Prashant Deva (@pdeva) August 17, 2021
Oracle's marketing doesn't mention this part...
Imagine an algorithm that you carefully crafted so it performs no writes. Under ZGC, that read-only algorithm is in fact doing writes under the hood (to 'heal' relocated pointers), which completely changes its performance characteristics: cache lines you meant only to read get dirtied and have to be written back.
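To make the mechanism concrete, here is a minimal, illustrative sketch of what a 'healing' load barrier conceptually does. The names and data structures are invented for the example; ZGC's real barrier is emitted by the JIT inside the JVM and uses colored pointers and relocation metadata. The essential move is the same, though: a reference load may discover the object has moved and write the corrected reference back into the very field it just read.

```rust
// Illustrative sketch only: a "healing" load barrier, not ZGC's actual code.
use std::collections::HashMap;

// Pretend "colored pointer" bit meaning "this reference points at a moved object".
const STALE_BIT: u64 = 1u64 << 63;

struct Heap {
    // Stand-in for the collector's forwarding/relocation table: old address -> new address.
    forwarding: HashMap<u64, u64>,
}

impl Heap {
    // The load barrier: runs on every reference load.
    fn load_ref(&self, slot: &mut u64) -> u64 {
        let val = *slot;
        if val & STALE_BIT != 0 {
            // The object was relocated; look up its new address...
            let healed = self.forwarding[&(val & !STALE_BIT)];
            // ...and WRITE the healed reference back into the field we just read.
            // This is how a "read-only" traversal ends up dirtying memory.
            *slot = healed;
            return healed;
        }
        val
    }
}

fn main() {
    let heap = Heap { forwarding: HashMap::from([(0x1000, 0x2000)]) };
    let mut field: u64 = 0x1000 | STALE_BIT; // a reference to a moved object
    let obj = heap.load_ref(&mut field);     // a read that performs a write
    assert_eq!(obj, 0x2000);
    assert_eq!(field, 0x2000);               // the slot itself was mutated
}
```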
If your app is on Core 10 and GC on Core 18, GC now has massive latency to read data owned by Core 10. If GC moves data during collection, now application thread on Core 10 gets another massive latency hit as it transfers data which is now owned by Core 18. 2/3
— Prashant Deva (@pdeva) August 17, 2021
Your app isn't 'paused' but it massively increased latency to its data. So much for a GC targeting 'low latency'. 3/3
— Prashant Deva (@pdeva) August 17, 2021
The fact is, GC algorithms are moving in the opposite direction from hardware. GC simply does not work with massive core counts and huge amounts of RAM: close one hole and another opens.
Computer hardware is moving in a different direction
Modern infrastructure software like ScyllaDB and Redpanda is moving towards a pinned thread-per-core model with no data sharing across CPU cores, in order to minimize latency. Modern pauseless, 'latency-focused' GCs, however, are moving in the opposite direction: they scan the heap on separate cores, destroying the L3 cache of your worker threads each time they do so.
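The shared-nothing pattern looks roughly like the sketch below. This is a minimal illustration with invented names, not ScyllaDB's or Redpanda's actual code: each worker thread owns one shard of the data outright, other threads only hand it messages, and nothing ever walks its state from another core.

```rust
// Minimal sketch of a shared-nothing, thread-per-core design (illustrative only).
use std::sync::mpsc;
use std::thread;

fn main() {
    let cores = 4; // illustrative; a real system queries the CPU topology and pins threads
    let mut senders = Vec::new();
    let mut workers = Vec::new();

    for core in 0..cores {
        let (tx, rx) = mpsc::channel::<u64>();
        senders.push(tx);
        workers.push(thread::spawn(move || {
            // In a real system this thread would be pinned to `core` and would own
            // one shard of the data. No other thread touches this state, so there is
            // no cross-core cache traffic and no collector scanning it concurrently.
            let mut shard_state: u64 = 0;
            for msg in rx {
                shard_state += msg;
            }
            println!("core {core}: shard_state = {shard_state}");
        }));
    }

    // Requests are routed to the core that owns the relevant shard.
    for i in 0..1_000u64 {
        let shard = (i % cores as u64) as usize;
        senders[shard].send(i).unwrap();
    }
    drop(senders); // close the channels so the workers drain and exit

    for w in workers {
        w.join().unwrap();
    }
}
```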
The more memory your program uses, the more the GC has to scan, resulting in more CPU usage and more cache thrashing. This fundamental property makes GC simply not feasible in a world where systems can have a terabyte of RAM.
Sure, you can add heuristics like generational collection, which is designed around short-lived objects such as those of HTTP's request/response model. But add long-lived objects like a cache to a system using a generational GC, and the collector will thrash your CPU promoting and rescanning them. Maybe you use a different type of collector for your 'caching' app, but then it may not handle the request/response model's short-lived objects as well as a generational GC does. It is impossible to predict all the ways programs will allocate and use memory and to design GC heuristics for each.
GC is the wrong problem to solve
Maybe we have just been focusing on the wrong problem! What if, instead, we made it super easy and error-free to free memory manually?
This is the direction modern programming languages are starting to move in. It started with Go finally acknowledging and supporting stack allocation and struct embedding. Just using those lets Go programs use so much less memory than equivalent Java programs that Go can get away with a very basic GC compared to the 20-year-old beast that is Java's.
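The win comes from keeping values inline instead of scattering them across the heap as individually collected objects. Here is a minimal sketch of that idea, expressed in Rust (types invented for the example; Go expresses the same thing with embedded structs plus escape analysis):

```rust
// Illustrative sketch: embedding structs by value means one flat object on the
// stack instead of a graph of heap objects that a collector would have to trace.

#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Clone, Copy)]
struct Rect {
    // Embedded by value: no pointers, no separate allocations.
    top_left: Point,
    bottom_right: Point,
}

fn area(r: Rect) -> f64 {
    (r.bottom_right.x - r.top_left.x) * (r.bottom_right.y - r.top_left.y)
}

fn main() {
    // One stack value, zero heap allocations. An "everything is a reference"
    // design would instead create three separately managed heap objects.
    let r = Rect {
        top_left: Point { x: 0.0, y: 0.0 },
        bottom_right: Point { x: 4.0, y: 2.0 },
    };
    println!("area = {}", area(r));
}
```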
Rust takes the cake by introducing the borrow checker, getting us back to native-code performance with no GC at all. A clear sign that GC-free is the future is how much modern infrastructure software is being written in Rust instead of Java or Go.
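A minimal sketch of what that buys you in practice (plain standard-library Rust, nothing framework-specific): ownership makes deallocation deterministic, and the borrow checker turns would-be memory bugs into compile errors, so no collector is needed at all.

```rust
// Illustrative sketch of ownership and borrowing replacing a garbage collector.
fn main() {
    let data = vec![1u64, 2, 3]; // heap allocation, owned by `data`

    let sum = {
        let view = &data;            // immutable borrow: no copy, no refcount, no barrier
        view.iter().sum::<u64>()
    };                               // the borrow ends here
    println!("sum = {sum}");

    drop(data); // ownership ends: the Vec is freed right here, not "eventually"

    // Uncommenting the next line fails to compile ("borrow of moved value"),
    // which is the compile-time counterpart of the bugs GC exists to prevent:
    // println!("{:?}", data);
}
```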
Even C++ now discourages the use of naked pointers, and has added move semantics and unique pointers so that memory can be freed easily and deterministically.
Conclusion
It's human nature to look for a one-size-fits-all solution. GC is a great solution for applications with small heaps, like mobile and web apps, and existing GC algorithms are good enough for those. For larger heaps, GC research is fighting a losing battle.
The future is indeed pauseless. It's also GC-free.