Achieving zero-impact GC has been the holy grail of programming for decades. However, as memory capacities have grown, the reality is dawning that there is no free lunch: GC has a real cost. And slowly but surely, we are now seeing a move away from it.
Infrastructure Software platform evolution -
2000s - Java/JVM. GC Issues. JIT optimization issues.
2010s - Go. Stack allocation. Less GC issues. No JIT issues.
2020s - Rust/C++20. No GC issues. No JIT issues.
— Prashant Deva (@pdeva) August 16, 2021
GC doesn't scale
GC really only works well in scenarios where the heap size is at most 4 GB, with JavaScript-based SPAs and mobile apps being prime examples.
Back in the era of 32-bit systems, when GC-based languages became popular, this was considered a lot of memory (after all, 4 GB covers the entire 32-bit address space). However, as RAM sizes have increased, it's clear that GC simply doesn't scale. Can you imagine a 100 GB heap? A 50 GB heap? Nope. Yet these RAM sizes are fairly common: the r6g.16xlarge instance on AWS comes with 512 GB of RAM. It is unfathomable to even consider a heap that size.
Modern 'pauseless' GCs sacrifice much for a gain in a single metric
What about that new pauseless ZGC in the JVM?
So Java's 'next gen' ZGC operates using a Load Barrier, which means your Reads can 'magically' result in Writes as the barrier tries to 'heal' relocated references, completely changing the performance characteristics of your code!
— Prashant Deva (@pdeva) August 17, 2021
Oracle's marketing doesn't mention this part...
Imagine an algorithm that you carefully crafted so it performs no writes. Under ZGC, that read-only algorithm is in fact doing writes under the hood (to 'heal' relocated pointers), which completely changes its performance characteristics: cache lines you meant only to read get dirtied and have to be written back.
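To make the mechanism concrete, here is a minimal, illustrative sketch of what a 'healing' load barrier conceptually does. The names and data structures are invented for the example; ZGC's real barrier is emitted by the JIT inside the JVM and uses colored pointers and relocation metadata. The essential move is the same, though: a reference load may discover the object has moved and write the corrected reference back into the very field it just read.

```rust
// Illustrative sketch only: a "healing" load barrier, not ZGC's actual code.
use std::collections::HashMap;

// Pretend "colored pointer" bit meaning "this reference points at a moved object".
const STALE_BIT: u64 = 1u64 << 63;

struct Heap {
    // Stand-in for the collector's forwarding/relocation table: old address -> new address.
    forwarding: HashMap<u64, u64>,
}

impl Heap {
    // The load barrier: runs on every reference load.
    fn load_ref(&self, slot: &mut u64) -> u64 {
        let val = *slot;
        if val & STALE_BIT != 0 {
            // The object was relocated; look up its new address...
            let healed = self.forwarding[&(val & !STALE_BIT)];
            // ...and WRITE the healed reference back into the field we just read.
            // This is how a "read-only" traversal ends up dirtying memory.
            *slot = healed;
            return healed;
        }
        val
    }
}

fn main() {
    let heap = Heap { forwarding: HashMap::from([(0x1000, 0x2000)]) };
    let mut field: u64 = 0x1000 | STALE_BIT; // a reference to a moved object
    let obj = heap.load_ref(&mut field);     // a read that performs a write
    assert_eq!(obj, 0x2000);
    assert_eq!(field, 0x2000);               // the slot itself was mutated
}
```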
If your app is on Core 10 and GC on Core 18, GC now has massive latency to read data owned by Core 10. If GC moves data during collection, now application thread on Core 10 gets another massive latency hit as it transfers data which is now owned by Core 18. 2/3
— Prashant Deva (@pdeva) August 17, 2021
Your app isn't 'paused' but it massively increased latency to its data. So much for a GC targeting 'low latency'. 3/3
— Prashant Deva (@pdeva) August 17, 2021
The fact is, GC algorithms are moving in the opposite direction from hardware. GC simply does not work with massive core counts and huge amounts of RAM: close one hole and another opens.
Computer hardware is moving in a different direction
Modern infrastructure software like ScyllaDB and Redpanda is moving towards a pinned thread-per-core model with no data sharing across CPU cores, in order to minimize latency. Modern pauseless, 'latency-focused' GCs, however, are moving in the opposite direction: they scan the heap on separate cores, destroying the L3 cache of your worker threads each time they do so.
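The shared-nothing pattern looks roughly like the sketch below. This is a minimal illustration with invented names, not ScyllaDB's or Redpanda's actual code: each worker thread owns one shard of the data outright, other threads only hand it messages, and nothing ever walks its state from another core.

```rust
// Minimal sketch of a shared-nothing, thread-per-core design (illustrative only).
use std::sync::mpsc;
use std::thread;

fn main() {
    let cores = 4; // illustrative; a real system queries the CPU topology and pins threads
    let mut senders = Vec::new();
    let mut workers = Vec::new();

    for core in 0..cores {
        let (tx, rx) = mpsc::channel::<u64>();
        senders.push(tx);
        workers.push(thread::spawn(move || {
            // In a real system this thread would be pinned to `core` and would own
            // one shard of the data. No other thread touches this state, so there is
            // no cross-core cache traffic and no collector scanning it concurrently.
            let mut shard_state: u64 = 0;
            for msg in rx {
                shard_state += msg;
            }
            println!("core {core}: shard_state = {shard_state}");
        }));
    }

    // Requests are routed to the core that owns the relevant shard.
    for i in 0..1_000u64 {
        let shard = (i % cores as u64) as usize;
        senders[shard].send(i).unwrap();
    }
    drop(senders); // close the channels so the workers drain and exit

    for w in workers {
        w.join().unwrap();
    }
}
```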
The more memory your program uses, the more the GC has to scan, resulting in more CPU usage and more cache thrashing. This fundamental property makes GC simply not feasible in a world where systems can have a terabyte of RAM.
Sure, you can add heuristics like generational collection, which is designed around short-lived objects such as those of HTTP's request/response model. But add long-lived objects like a cache to a system using a generational GC, and the collector will thrash your CPU promoting and rescanning them. Maybe you use a different type of collector for your 'caching' app, but then it may not handle the request/response model's short-lived objects as well as a generational GC does. It is impossible to predict all the ways programs will allocate and use memory and to design GC heuristics for each.
GC is the wrong problem to solve
Maybe we have just been focusing on the wrong problem! What if, instead, we made it super easy and error-free to free memory manually?
This is the direction modern programming languages are starting to move in. It started with Go finally acknowledging and supporting stack allocation and struct embedding. Just using those lets Go programs use so much less memory than equivalent Java programs that Go can get away with a very basic GC compared to the 20-year-old beast that is Java's.
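The win comes from keeping values inline instead of scattering them across the heap as individually collected objects. Here is a minimal sketch of that idea, expressed in Rust (types invented for the example; Go expresses the same thing with embedded structs plus escape analysis):

```rust
// Illustrative sketch: embedding structs by value means one flat object on the
// stack instead of a graph of heap objects that a collector would have to trace.

#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Clone, Copy)]
struct Rect {
    // Embedded by value: no pointers, no separate allocations.
    top_left: Point,
    bottom_right: Point,
}

fn area(r: Rect) -> f64 {
    (r.bottom_right.x - r.top_left.x) * (r.bottom_right.y - r.top_left.y)
}

fn main() {
    // One stack value, zero heap allocations. An "everything is a reference"
    // design would instead create three separately managed heap objects.
    let r = Rect {
        top_left: Point { x: 0.0, y: 0.0 },
        bottom_right: Point { x: 4.0, y: 2.0 },
    };
    println!("area = {}", area(r));
}
```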
Rust takes the cake by introducing the borrow checker, getting us back to native-code performance with no GC at all. A clear sign that GC-free is the future is how much modern infrastructure software is being written in Rust instead of Java or Go.
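A minimal sketch of what that buys you in practice (plain standard-library Rust, nothing framework-specific): ownership makes deallocation deterministic, and the borrow checker turns would-be memory bugs into compile errors, so no collector is needed at all.

```rust
// Illustrative sketch of ownership and borrowing replacing a garbage collector.
fn main() {
    let data = vec![1u64, 2, 3]; // heap allocation, owned by `data`

    let sum = {
        let view = &data;            // immutable borrow: no copy, no refcount, no barrier
        view.iter().sum::<u64>()
    };                               // the borrow ends here
    println!("sum = {sum}");

    drop(data); // ownership ends: the Vec is freed right here, not "eventually"

    // Uncommenting the next line fails to compile ("borrow of moved value"),
    // which is the compile-time counterpart of the bugs GC exists to prevent:
    // println!("{:?}", data);
}
```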
Even C++ now discourages the use of naked pointers, and has added move semantics and unique pointers so that memory can be freed easily and deterministically.
Conclusion
It's human nature to look for a one-size-fits-all solution. GC is a great solution for applications with small heaps, like mobile and web apps, and existing GC algorithms are good enough for those. For larger heaps, GC research is fighting a losing battle.
The future is indeed pauseless. It's also GC-free.