SIMD in Golang, part 1 of ?

Nov 24, 2024 by @turgon@dosgame.club

Recently, I've been playing with running assembly directly in Golang. Pretty much nobody recommends doing this, but I felt I had a use case and wanted to see how things would shake out. I may write more in the future specifically about that use case.

A couple of years back, a friend mentioned this was possible in Go and pointed me to a developer talk on YouTube by the Apache Arrow team, which I can't seem to find now. But a quick visit to the Apache Arrow for Go repository gets to the point: Go can run assembly and definitely can do SIMD. The benchmarks in their README.md show something like a 15x improvement in ns/op.

Go's assembler is based on the Plan 9 assemblers, and A Quick Guide to Go's Assembler describes it at a high level. It's probably easiest to think of Go's assembly as an intermediate build step, just before your code compiles to an architecture-specific object. This is nice because the syntax and conventions (like the FP and SB pseudo-registers) are shared across architectures, but also not nice because the instructions themselves are still architecture specific, there's a lot more literature and examples for e.g. Intel's x64 assembly than for Go's, and I found I had to care about x86 instructions quite a lot anyway.

I found Scott Mansfield's post A Foray Into Go Assembly Programming (2017) a really good starting point, but it and others like it were written ten thousand years ago. How much has changed since then?

It appears that there's a lot more instruction support now, while the way the assembler works hasn't changed enough, if at all, to be a nuisance. Go assembly language complementary reference (2017) has a nice overview of the practical ways Go's assembly is special: it covers the quirky register naming, operand ordering, and other things that are likely to trip you up. Scott's post also mentions the textflag.h symbols, such as NOSPLIT, that go in the flags field of a TEXT directive.

I found it really helpful to write absurdly small functions in both Go and assembly, cover them with unit tests, and go to great lengths to make sure the assembly did what I expected.
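A minimal sketch of that workflow, assuming a hypothetical routine sumLanes that stands in for an assembly-backed function. Real assembly would live in a separate .s file next to a body-less Go declaration, which doesn't fit in one runnable snippet, so a pure-Go version with four independent accumulators takes its place here. The part worth copying is the harness: a plain Go reference implementation, checked against the candidate for every slice length from 0 to 9, so the empty slice, slices shorter than one block, and every leftover tail all get exercised.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sumRef is the plain Go reference implementation that the
// optimized version is checked against.
func sumRef(xs []int64) int64 {
	var s int64
	for _, x := range xs {
		s += x
	}
	return s
}

// sumLanes is a hypothetical stand-in for an assembly-backed routine.
// It keeps four independent accumulators, like the four int64 lanes of
// a 256-bit register, then folds them together and handles the
// leftover elements one at a time.
func sumLanes(xs []int64) int64 {
	var a0, a1, a2, a3 int64
	i := 0
	for ; i+4 <= len(xs); i += 4 {
		a0 += xs[i]
		a1 += xs[i+1]
		a2 += xs[i+2]
		a3 += xs[i+3]
	}
	s := a0 + a1 + a2 + a3
	for ; i < len(xs); i++ {
		s += xs[i]
	}
	return s
}

func main() {
	rng := rand.New(rand.NewSource(1))
	// Lengths 0..9 cover the empty slice, slices shorter than one
	// block of four, and every possible tail length.
	for n := 0; n < 10; n++ {
		xs := make([]int64, n)
		for i := range xs {
			xs[i] = rng.Int63n(1000) - 500
		}
		if got, want := sumLanes(xs), sumRef(xs); got != want {
			fmt.Printf("n=%d: got %d, want %d\n", n, got, want)
			return
		}
	}
	fmt.Println("ok")
}
```

In a real project the loop in main would be a TestXxx function in a _test.go file, run with go test.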

After surveying the landscape I was finally ready to start looking at real examples, but this turned out to be its own challenge, because searching for "golang assembly example" turned up a pretty messy set of results. In the end, the best examples I found were from Go itself or from projects like Apache Arrow and segmentio, because those aren't made-up proofs of concept but actively developed production code.

Some examples:

math floor/ceil/trunc from Go's own source
Sum slices of integers using SIMD in Segmentio's Go ASM lib
Bloom filter support functions using SIMD from Segmentio's Parquet package
Take the min and max of slices of integers using SIMD from Apache Arrow (generated assembly)
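SIMD routines over slices, like the sum and min/max examples above, are typically structured the same way: handle a full register's worth of elements per step, reduce the lanes to one value at the end, and finish any leftovers one at a time. Here is a pure-Go sketch of that structure for min/max, assuming int32 values and eight lanes (the width of a 256-bit AVX2 register). minMaxLanes, min32 and max32 are made-up names for illustration and are not taken from Apache Arrow's code; the inner loop over l models what a single packed min or max instruction does at once.

```go
package main

import "fmt"

// lanes is an assumption: eight int32 values fill one 256-bit AVX2 register.
const lanes = 8

func min32(a, b int32) int32 {
	if a < b {
		return a
	}
	return b
}

func max32(a, b int32) int32 {
	if a > b {
		return a
	}
	return b
}

// minMaxLanes is a hypothetical pure-Go model of a SIMD min/max.
// It returns ok=false for an empty slice.
func minMaxLanes(xs []int32) (lo, hi int32, ok bool) {
	if len(xs) == 0 {
		return 0, 0, false
	}
	// Seed every lane with a real element so no sentinel value is needed.
	var mins, maxs [lanes]int32
	for l := range mins {
		mins[l], maxs[l] = xs[0], xs[0]
	}
	i := 0
	// Vertical pass: lane l only ever sees element i+l of each block,
	// which is what a packed min/max does across a whole register.
	for ; i+lanes <= len(xs); i += lanes {
		for l := 0; l < lanes; l++ {
			mins[l] = min32(mins[l], xs[i+l])
			maxs[l] = max32(maxs[l], xs[i+l])
		}
	}
	// Horizontal pass: fold the lanes down to a single min and max.
	lo, hi = mins[0], maxs[0]
	for l := 1; l < lanes; l++ {
		lo = min32(lo, mins[l])
		hi = max32(hi, maxs[l])
	}
	// Scalar tail: elements left over after the last full block.
	for ; i < len(xs); i++ {
		lo = min32(lo, xs[i])
		hi = max32(hi, xs[i])
	}
	return lo, hi, true
}

func main() {
	xs := []int32{7, -3, 12, 0, 5, 5, 9, -8, 4, 30, -1}
	lo, hi, ok := minMaxLanes(xs)
	fmt.Println(lo, hi, ok) // prints -8 30 true
}
```

With 11 elements, one full block of 8 goes through the lanes and the last 3 go through the scalar tail, so both paths get used.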

Felix Cloutier's x86 and amd64 instruction reference was helpful (since I'm working on x86): it's enough documentation to get the gist of how each instruction works, though not every instruction is supported by Go, and some go by slightly different names (Intel's MOVDQU, for example, is MOVOU in Go's assembler).

Examples helped me see how a lot of simpler concepts work in Go's assembly; how to deal with Go slices is a great case in point.

The ultimate way to see examples, though, is to use go tool objdump against a Go binary, which will immediately barf out the entire program in assembly. The main problem with this is that most/all of the assembly is machine generated, so YMMV.
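To keep the objdump output manageable, it helps to build a tiny program and filter by symbol: go tool objdump's -s flag takes a regular expression matched against symbol names. The sketch below is an assumed minimal setup in which sum is a made-up function marked //go:noinline so it survives as its own symbol (main.sum) instead of being inlined into main. Alternatively, go build -gcflags=-S prints the compiler's own assembly listing during the build.

```go
// Build and disassemble just one function with:
//
//	go build -o demo .
//	go tool objdump -s 'main\.sum$' demo
package main

import "fmt"

// sum is a made-up example function. The directive below keeps the
// compiler from inlining it, so it shows up as its own symbol.
//
//go:noinline
func sum(xs []int64) int64 {
	var s int64
	for _, x := range xs {
		s += x
	}
	return s
}

func main() {
	fmt.Println(sum([]int64{1, 2, 3, 4})) // prints 10
}
```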

In terms of finding the instructions, there is a list of them in a CSV file in Go's source code. I often found myself reading an example, finding a Go instruction, then searching for its equivalent in x86, or going in reverse from x86 instruction to Go and playing with it in a test program.

All in all, I had a lot of fun playing with Go's assembly. I would absolutely not want to use it in general projects, but I think there are specific use cases in which it could work well. Even in those cases, though, it adds a maintenance burden that is quite a pain in the ass. For fun projects, or for getting to know Go's internals better, this is good stuff.