This is a prerequisite for upstreaming our LLVM patches, as our current
hack forcing `-ftls-model=initial-exec` in the Clang driver is not
acceptable upstream.
Currently, our kernel-managed TLS implementation limits us to only
having a single block of storage for all thread-local variables that's
initialized at load time. This PR merely implements the dynamic TLS
interface (`__tls_get_addr` and TLSDESC) on top of our static TLS
infrastructure. The current model's limitations still stand:
- a single static TLS block is reserved at load time, `dlopen()`-ing
shared libraries that define thread-local variables might cause us to
run out of space.
- the initial TLS image is not changeable post-load, so `dlopen()`-ing
libraries with non-zero-initialized TLS variables is not supported.
The way we repurpose `ti_module` to mean "offset within static TLS
block" instead of "module index" is not ABI-compliant.