Arena is an experimental package proposed for the Go standard library in #51317. It gives users finer-grained, manual control over memory allocation and deallocation, and thus over GC behavior. The implementation provides an Arena object from which users allocate memory and which they release by hand. Once the package stabilizes, Go applications and libraries can adopt it where appropriate to further reduce GC pressure.
One use case for the arena: suppose there is a tree-like data structure where each node is a heap object. Each node can be allocated from the same Arena, and when the whole tree is no longer needed, all of the nodes can be released at once through the Arena's free interface (since they come from the same Arena, their memory is largely contiguous, making the release relatively cheap).
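To make that concrete, here is a minimal sketch of the tree use case (the node type is hypothetical; requires Go 1.20+ with GOEXPERIMENT=arenas, as described in the next section):

package main

import "arena"

// node is a hypothetical tree node used only for illustration.
type node struct {
	val      int
	children []*node
}

func main() {
	a := arena.NewArena()
	root := arena.New[node](a) // every node is allocated from the same arena
	// Keep the backing array in the arena too, so the whole tree lives in
	// arena memory (append must stay within cap; see the caveat below).
	root.children = arena.MakeSlice[*node](a, 0, 4)
	for i := 0; i < 4; i++ {
		child := arena.New[node](a)
		child.val = i
		root.children = append(root.children, child)
	}
	a.Free() // release every node at once; the tree must not be touched afterwards
}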
I'm not good at drawing diagrams or writing long prose, so the analysis below is somewhat dry: code, with comments.
Arena Usage Example#
Using the arena package requires Go 1.20 or later, with the arenas experiment enabled via GOEXPERIMENT:
$ go env -w GOEXPERIMENT='arenas'
Let's write a test to see how the arena package is used and what to watch out for.
// Requires Go 1.20+ with GOEXPERIMENT=arenas.
package main

import (
	"arena"
	"testing"
)

func Test_arena(t *testing.T) {
	ar := arena.NewArena() // Create an arena object
	// Use arena.MakeSlice to create a slice with the given len and cap,
	// similar to the native make([]T, len, cap)
	s1 := arena.MakeSlice[int](ar, 0, 8)
	s1 = append(s1, 1, 2, 3, 4, 5)
	t.Log(s1) // [1 2 3 4 5]
	// Use arena.Clone to clone an object out of an arena onto the heap;
	// a value that was not allocated from an arena is returned untouched
	s2 := []string{"🐱", "🐭"}
	s3 := arena.Clone(s2)
	t.Log(s3) // [🐱 🐭]
	// Use arena.New to create an object, similar to the native new, returning *T
	s4 := arena.New[map[int]int](ar)
	t.Log((*s4)[0]) // 0
	// Without this append, accessing s1 after Free would crash:
	// the append exceeds cap and goes through growslice,
	// so s1's backing array is reallocated by the native mallocgc
	// and is no longer managed by the arena
	s1 = append(s1, 6, 7, 8, 9)
	// Use Free to release all memory allocated from the arena
	ar.Free()
	// growslice detached s1 from ar, so accessing it does not crash
	t.Log(s1)
	// ar has been freed, so its memory region is now marked inaccessible
	// (see the implementation source below), and this access crashes:
	// accessed data from freed user arena 0xc0047fffc0
	// fatal error: fault
	// [signal SIGSEGV: segmentation violation code=0x2 addr=0xc004000000 pc=0x1133494]
	t.Log((*s4)[0])
}
Arena Implementation Source Code Analysis#
When reading code, I tend to first envision a usage scenario and then read the corresponding implementation along that flow:
- First, create an arena
- The arena requests space up front (a chunk at a time)
- The arena allocates object1, object2, ... until the space is insufficient
- It requests more space to allocate new objects
- The arena is released (in most cases the space should be cached here)
- Another arena is created, or more space is requested (possibly served from the cache)
The following source code analysis is closely related to Go's memory management, requiring readers to have a basic understanding of Go's memory allocator implementation (mspan, mheap, etc.) and the GC process.
From Standard Library to Runtime#
The standard library arena package is really just a thin wrapper: its functions are wired via go:linkname to the runtime implementation in src/runtime/arena.go, so we can read arena.go directly.
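For reference, the wrapper side looks roughly like this (paraphrased and abridged from src/arena/arena.go):

package arena

import "unsafe"

// Arena only holds an opaque handle to the runtime's userArena.
type Arena struct {
	a unsafe.Pointer
}

func NewArena() *Arena {
	a := new(Arena)
	a.a = runtime_arena_newArena()
	return a
}

func (a *Arena) Free() {
	runtime_arena_arena_Free(a.a)
	a.a = nil
}

// Implemented in the runtime package and wired up via go:linkname.
func runtime_arena_newArena() unsafe.Pointer
func runtime_arena_arena_Free(arena unsafe.Pointer)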
func newUserArena() *userArena // Create arena object
func (a *userArena) new(typ *_type) unsafe.Pointer
func (a *userArena) slice(sl any, cap int)
func (a *userArena) free()
// ...
Runtime Structure of Arena#
An arenaChunk is a contiguous segment of heap memory, allocated by mheap and managed by an mspan (with spanClass 0).
type userArena struct {
	fullList *mspan // Linked list (via next) of mspans that have been filled up
	active *mspan // The most recently requested mspan, currently being allocated from
	refs []unsafe.Pointer // Base addresses of each arenaChunk (mspan);
	// the last entry always refers to the active mspan, the rest are on fullList
defunct atomic.Bool // Marks whether this userArena has been freed
}
Creating Arena#
func newUserArena() *userArena {
a := new(userArena)
SetFinalizer(a, func(a *userArena) {
// If not actively freed, free it during GC
a.free()
})
a.refill() // Request a segment of space upon creation
return a
}
Filling Arena Space#
func (a *userArena) refill() *mspan {
	// On the first refill, s is always nil
s := a.active
var x unsafe.Pointer
// Not the first refill
if s != nil {
// If the old active is not empty, put it into fullList
// Then place the newly requested mspan into active
s.next = a.fullList
a.fullList = s
a.active = nil
s = nil
}
// Retrieve arenaChunk from the global reuse list, directly take the last one
// This list is populated when freeing the arena
lock(&userArenaState.lock)
if len(userArenaState.reuse) > 0 {
// Pick off the last arena chunk from the list.
n := len(userArenaState.reuse) - 1
x = userArenaState.reuse[n].x
s = userArenaState.reuse[n].mspan
userArenaState.reuse[n].x = nil
userArenaState.reuse[n].mspan = nil
userArenaState.reuse = userArenaState.reuse[:n]
}
unlock(&userArenaState.lock)
if s == nil {
// Request a new arenaChunk, actually managed by mspan
x, s = newUserArenaChunk()
}
// Store the base address of the new arenaChunk (mspan) in refs
a.refs = append(a.refs, x)
// Place the most recently requested mspan into active
// But do not add it to fullList
a.active = s
return s
}
// Create a new arenaChunk, still using mspan to manage memory
func newUserArenaChunk() (unsafe.Pointer, *mspan) {
	// Charge this goroutine GC assist credit for the chunk, as mallocgc does
deductAssistCredit(userArenaChunkBytes)
var span *mspan
systemstack(func() {
		// Use mheap to obtain a user arena chunk
span = mheap_.allocUserArenaChunk()
})
// The returned x is the base address of a segment of heap space managed by mspan (the address of the first byte)
x := unsafe.Pointer(span.base())
// If requested during GC, directly mark the object as black (since it's all empty, no need to scan)
if gcphase != _GCoff {
gcmarknewobject(span, span.base(), span.elemsize)
}
// Memory sampling related...
	// The heap has grown, so check whether a GC cycle should be triggered
if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
gcStart(t)
}
return x, span
}
// Use mheap to request arenaChunk
func (h *mheap) allocUserArenaChunk() *mspan {
var s *mspan
var base uintptr
lock(&h.lock)
if !h.userArena.readyList.isEmpty() {
		// First check readyList, a doubly linked list of ready-to-reuse chunks
s = h.userArena.readyList.first
h.userArena.readyList.remove(s)
base = s.base()
} else {
		// Otherwise reserve new address space for arena chunks
hintList := &h.userArena.arenaHints
// This part is more complex, aligning sizes, requesting space from the OS,
// and recording arena's metadata on mheap, etc.
v, size := h.sysAlloc(userArenaChunkBytes, hintList, false)
// If the obtained size is larger than requested
// Split the remaining part into readyList for future use
// The returned size is still userArenaChunkBytes
if size > userArenaChunkBytes {
for i := uintptr(userArenaChunkBytes); i < size; i += userArenaChunkBytes {
s := h.allocMSpanLocked()
s.init(uintptr(v)+i, userArenaChunkPages)
h.userArena.readyList.insertBack(s)
}
size = userArenaChunkBytes
}
base = uintptr(v)
}
unlock(&h.lock)
	// sysAlloc returns Reserved address space; sysMap maps it with access permissions (transitions it to Prepared) so it can actually be used
sysMap(unsafe.Pointer(base), userArenaChunkBytes, &gcController.heapReleased)
// Create an mspan to manage the allocated space, here spanclass is 0
spc := makeSpanClass(0, false)
h.initSpan(s, spanAllocHeap, spc, base, userArenaChunkPages)
s.isUserArenaChunk = true
// GC and heap profile data statistics...
	// Put the mspan on mcentral's fullSwept spanSet and mark it as having no
	// free space (this space is usable only by the owning userArena)
h.central[spc].mcentral.fullSwept(h.sweepgen).push(s)
s.limit = s.base() + userArenaChunkBytes
s.freeindex = 1
s.allocCount = 1
// Clear mspan's bitmap on mheap
s.initHeapBits(true)
	// Zero the chunk up front (it's an arena chunk, so assume all of it will be used).
	// Touching the whole region also makes it much more likely that Linux backs
	// the memory with transparent huge pages, which is worth the zeroing latency
memclrNoHeapPointers(unsafe.Pointer(s.base()), s.elemsize)
s.needzero = 0
s.freeIndexForScan = 1
return s
}
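Before moving on, it is worth pinning down the size constants used above (paraphrased from src/runtime/arena.go, Go 1.20):

const (
	// Upper bound on the chunk size; the effective userArenaChunkBytes is
	// min(userArenaChunkBytesMax, heapArenaBytes), i.e. 8M on 64-bit Linux.
	userArenaChunkBytesMax = 8 << 20

	// Number of pages occupied by one chunk.
	userArenaChunkPages = userArenaChunkBytes / pageSize

	// Largest single allocation an arena will serve: a quarter of the chunk
	// size (2M for an 8M chunk), capping worst-case fragmentation at 25%.
	// Larger objects fall back to the regular heap allocator.
	userArenaChunkMaxAllocBytes = userArenaChunkBytes / 4
)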
Creating Objects#
Let's take MakeSlice, which creates a slice, as the example.
func (a *userArena) slice(sl any, cap int) {
// Get the type of the slice itself
i := efaceOf(&sl)
typ := i._type
// Get the type of slice elements
typ = (*slicetype)(unsafe.Pointer(typ)).Elem
// Some typekind checks omitted
	// Use alloc to allocate the backing array, then write a slice header {ptr, len, cap} in place
*((*slice)(i.data)) = slice{a.alloc(typ, cap), cap, cap}
}
func (a *userArena) alloc(typ *_type, cap int) unsafe.Pointer {
s := a.active
var x unsafe.Pointer
for {
// Start trying to allocate space from the current active mspan
x = s.userArenaNextFree(typ, cap)
if x != nil {
break
}
// If mspan is insufficient for allocation, request a new arenaChunk
// If the requested size exceeds an arenaChunk, it will fallback to mallocgc
// A brand new arenaChunk space will definitely be sufficient for allocation
// During the next refill, the previous active will be placed into fullList
// Then the latest mspan will be placed into active
s = a.refill()
}
return x
}
func (s *mspan) userArenaNextFree(typ *_type, cap int) unsafe.Pointer {
// ...
// Calculate the required size based on typesize and cap
	// If size exceeds userArenaChunkMaxAllocBytes, the object can only be created via mallocgc,
	// because arenaChunks are not contiguous with each other, so one oversized object cannot span multiple chunks.
	// userArenaChunkMaxAllocBytes is a quarter of the chunk size (2M for an 8M chunk), capping worst-case fragmentation at 25%
if size > userArenaChunkMaxAllocBytes {
if cap >= 0 {
return newarray(typ, cap)
}
return newobject(typ)
}
// typ.PtrBytes indicates how many bytes of this type may contain pointers
// The calculation uses the offset of the last pointer field plus the size of the pointer
// https://github.com/golang/go/blob/master/src/reflect/type.go#L2618
// This value will be used when GC scans the structure; if PtrBytes is zero, it indicates this type does not contain pointers
// takeFromBack indicates allocating space from the tail end of the arenaChunk
// takeFromFront indicates allocating space from the head of the arenaChunk
	// Pointer-free types are allocated from the back of the chunk, pointer-carrying types from the front.
	// That way, a GC scan working front to back can stop as soon as it reaches the unallocated gap or the pointer-free region
var ptr unsafe.Pointer
if typ.PtrBytes == 0 {
// Allocate pointer-less objects from the tail end of the chunk.
v, ok := s.userArenaChunkFree.takeFromBack(size, typ.Align_)
if ok {
ptr = unsafe.Pointer(v)
}
} else {
v, ok := s.userArenaChunkFree.takeFromFront(size, typ.Align_)
if ok {
ptr = unsafe.Pointer(v)
}
}
if ptr == nil {
// The current active arenaChunk (mspan) has insufficient space, return nil to let the upper layer create a new arenaChunk
return nil
}
// For pointer types, need to mark on the corresponding mheap's bitmap
// If it's a slice type (cap >= 0), need to mark each element
if typ.PtrBytes != 0 {
if cap >= 0 {
userArenaHeapBitsSetSliceType(typ, cap, ptr, s.base())
} else {
userArenaHeapBitsSetType(typ, ptr, s.base())
}
		c := getMCache(mp) // mp was obtained via acquirem() earlier (elided above)
if cap > 0 {
// If it's a slice type, only the last element's last segment of non-pointer fields does not need to be scanned
// [{*int, int}, {*int, int}, {*int, int}]
// In the above case, only the last int does not need to be scanned
c.scanAlloc += size - (typ.Size_ - typ.PtrBytes)
} else {
// If it's a single type, only PtrBytes needs to be counted
// {int, *int, int, int}
// In the above case, the last two ints do not need to be scanned
c.scanAlloc += typ.PtrBytes
}
}
// Ensure the object is initialized first, setting the heap bitmap before being observed by GC
// On some weakly ordered machines, there may be inconsistent behavior due to reordering
// So a store/store barrier is added here
publicationBarrier()
return ptr
}
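This mallocgc fallback is observable from user code: a single allocation larger than the per-chunk cap is silently served from the regular heap, so it survives Free. A small sketch (the size just needs to exceed the 2M cap quoted earlier):

func Test_arena_large_alloc(t *testing.T) {
	a := arena.NewArena()
	// 4M int64 elements = 32M, far above userArenaChunkMaxAllocBytes,
	// so the backing array is allocated by newarray/mallocgc, not the arena.
	big := arena.MakeSlice[int64](a, 4<<20, 4<<20)
	a.Free()
	t.Log(big[0]) // does not fault: this memory was never arena-managed
}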
Releasing Arena#
func (a *userArena) free() {
	// The mspans on fullList correspond one-to-one with the addresses in refs,
	// except for refs[len(refs)-1], which refers to the active mspan
s := a.fullList
i := len(a.refs) - 2
for s != nil {
a.fullList = s.next
s.next = nil
freeUserArenaChunk(s, a.refs[i])
s = a.fullList
i--
}
	// Put the active mspan onto the global reuse list
s = a.active
if s != nil {
lock(&userArenaState.lock)
userArenaState.reuse = append(userArenaState.reuse, liveUserArenaChunk{s, a.refs[len(a.refs)-1]})
unlock(&userArenaState.lock)
}
a.active = nil
a.refs = nil
}
func freeUserArenaChunk(s *mspan, x unsafe.Pointer) {
if gcphase == _GCoff {
		// When GC is off, fault this chunk together with any chunks queued on the global fault list
lock(&userArenaState.lock)
faultList := userArenaState.fault
userArenaState.fault = nil
unlock(&userArenaState.lock)
		// setUserArenaChunkToFault marks the chunk's mapped region inaccessible,
		// so touching arena-allocated memory (e.g. a slice made with arena MakeSlice) after free crashes;
		// it also hands the mspan over to be reclaimed by GC sweep
s.setUserArenaChunkToFault()
for _, lc := range faultList {
lc.mspan.setUserArenaChunkToFault()
}
		// The chunk may still be referenced, so keep x alive until GC has swept it;
		// only after that sweep is the corresponding mspan moved to readyList for reuse
KeepAlive(x)
KeepAlive(faultList)
} else {
		// While GC is running, just queue the chunk's mspan on the fault list;
		// a later freeUserArenaChunk call will fault it once GC is off
lock(&userArenaState.lock)
userArenaState.fault = append(userArenaState.fault, liveUserArenaChunk{s, x})
unlock(&userArenaState.lock)
}
}
Some Thoughts#
- The current arena implementation is still fairly rough: shared state such as userArenaState lives in global variables guarded by a global lock.
- The arenaChunk size is worth keeping in mind: it is currently 8M, and an arena's free does not immediately return memory to the allocator or the OS (the chunks still need a GC sweep). If arenas are created carelessly, each one holds at least one 8M chunk, which can easily blow up the heap.
- Space waste: an allocation must fit entirely within a single chunk, so when a large object does not fit in the current chunk's remaining space, the tail of that chunk is wasted; the per-allocation cap of a quarter chunk (2M) bounds this worst-case fragmentation at about 25%. Whether the chunk size will become customizable is still unknown, so weigh this against your usage scenario.
- Performance: in my tests the improvement was not as large as expected, though that may be down to my benchmark code, which I won't post here. Benchmark your specific scenario when it matters.