GC Fun at Twitch

I recently read an interesting blog entry from a developer at Twitch (blog.twitch.tv), which I recommend everyone to read. It is a real deep dive into the Go garbage collector and provides some really good insight.

To summarise the problem: Twitch faces refresh storms, where the amount of traffic to their API servers increases rapidly for a short while. The effect is a large number of small goroutines serving the individual requests, which creates a lot of CPU-intensive garbage collection work during that period. As described, they run their instances with 64GB of RAM, but the application was only using around 400MB, because garbage was being collected so aggressively.

The garbage collector is generally triggered each time the allocated memory grows by a set percentage. This means that if you generally have a small footprint, and for a short period a much larger one (that can be collected), the GC will work quite intensively to do so. You can, however, tune this percentage, which is one of the few knobs the Go GC offers.
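
To illustrate the knob: the percentage defaults to 100, meaning a collection is triggered roughly every time the heap doubles, and it can be changed through the GOGC environment variable or from code. A minimal sketch, where the value 400 is just an illustrative number, not a recommendation:

package main

import "runtime/debug"

func main() {
    // The default is GOGC=100: a collection runs when the heap has grown
    // by 100% since the previous collection. Raising the percentage trades
    // memory for fewer, less frequent collections.
    debug.SetGCPercent(400) // illustrative value only

    // ... rest of the application
}

Setting GOGC=400 in the environment has the same effect without touching the code.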

Twitch's solution boils down to something they call ballast: when the service starts, it allocates 10GB on the heap. This way, the GC won't be triggered until the application is using more than 10GB, which gives them room to handle large traffic bursts without involving the GC. I assume it would look something like this:

var ballast []byte // package-level reference keeps the ballast alive

func main() {
    // ~10GB of ballast that is never written to, so the OS does not need
    // to back it with physical memory
    ballast = make([]byte, 10_000_000_000)
}

My take

I might be well up on the hill of ignorance on the Dunning-Kruger curve when it comes to the GC in Go, but the ballast concept seems like a rather hacky solution to me. It relies on quite specific behavior of both the GC and the underlying operating system in order to work well. Personally, I have never had an issue with the GC in a production environment, but to me a more straightforward approach would be to define the GC behavior in code. Using the runtime and runtime/debug packages, we can build our own GC triggering strategy. The blogger at Twitch has done some extensive research, which makes it somewhat odd that he does not bring up this possibility.

In the Twitch example we don’t want the GC to trigger until 10GB of memory is used, so an example of this might be the following:

import (
	"runtime"
	"runtime/debug"
	"time"

	"github.com/c2h5oh/datasize"
)

func gc(threshold uint64) {
    off := -1
    on := debug.SetGCPercent(off) // turn off the GC and remember the previous percentage
    state := off
    for {
        time.Sleep(500 * time.Millisecond)

        var mem runtime.MemStats
        runtime.ReadMemStats(&mem) // get memory statistics from the runtime

        // if the GC is turned off and the allocated memory exceeds our threshold,
        // turn the GC back on
        if state == off && mem.Alloc > threshold {
            debug.SetGCPercent(on)
            state = on
        }
        // if the GC is turned on and the allocated memory is below our threshold,
        // turn the GC off again
        if state == on && mem.Alloc < threshold {
            debug.SetGCPercent(off)
            state = off
        }
    }
}

func main() {
    go gc(uint64(10 * datasize.GB))
    // ... rest of the application
}

We can now take this a step further and trigger the GC ourselves, which gives us much greater control over when it is a good time to collect garbage. If the memory the application needs at any one time is never much more than 400MB, but we allow the heap to grow to 10GB before we collect anything, we can expect the GC to bring it back down to around 400MB.

import (
	"runtime"
	"runtime/debug"
	"time"

	"github.com/c2h5oh/datasize"
)

func gc(threshold uint64) {
    debug.SetGCPercent(-1) // turn off the GC
    for {
        time.Sleep(500 * time.Millisecond)

        var mem runtime.MemStats
        runtime.ReadMemStats(&mem) // get memory statistics from the runtime

        // if the allocated memory exceeds our threshold,
        // collect some garbage ourselves
        if mem.Alloc > threshold {
            runtime.GC()
        }
    }
}

func main() {
    go gc(uint64(10 * datasize.GB))
    // ... rest of the application
}

This serves only as a basic example, but we could implement a more suitable heuristic with more knowledge of the context in which the service runs.

Some self-critique: runtime.ReadMemStats needs to stop the world to gather all the memory statistics, and I’m not sure how that impacts performance. There are, however, other ways to do this. One might be to ask the OS directly for an approximation of the needed information, e.g. by reading /proc/self/statm. Another would be a heuristic counting the requests being handled, or one that only runs the GC when the traffic is “low”.
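
As a minimal sketch of the /proc/self/statm idea (Linux only, Go 1.16+ for os.ReadFile; residentBytes is a hypothetical helper, and the resident set size it reports is an OS-level figure that does not map one-to-one to the Go heap):

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// residentBytes approximates the process's resident set size by reading
// /proc/self/statm. The second field is the number of resident pages,
// which we multiply by the page size to get bytes.
func residentBytes() (uint64, error) {
    data, err := os.ReadFile("/proc/self/statm")
    if err != nil {
        return 0, err
    }
    fields := strings.Fields(string(data))
    if len(fields) < 2 {
        return 0, fmt.Errorf("unexpected /proc/self/statm format: %q", data)
    }
    pages, err := strconv.ParseUint(fields[1], 10, 64)
    if err != nil {
        return 0, err
    }
    return pages * uint64(os.Getpagesize()), nil
}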

Further, I have only done some very basic testing of this, so it would be interesting to see a comparison between the ballast concept and this approach in a real-world scenario.


Author:
Rasmus Holm, CTO/Developer