Overview

TLS (Thread Local Storage) gives each thread its own independent storage that is not shared with other threads in the process. For a TLS variable, different threads refer to different storage locations. Implementing it requires support from the high-level programming language, the compiler, and the linker.

Allocation of TLS Variables

In a high-level programming language, a TLS variable is defined like an ordinary variable in a multi-threaded program. Modifications made by one thread do not affect the copies seen by other threads, and the variable can be read and written in assignment statements just like a regular variable.
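The behavior described above can be illustrated with a minimal sketch. It assumes a C11 compiler and POSIX threads; the names tls_counter, worker, and run_tls_demo are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy of this variable, initialized to 0. */
static _Thread_local int tls_counter = 0;

/* Worker thread: modifies only its own copy and reports what it saw. */
static void *worker(void *arg) {
    int *seen = arg;
    tls_counter += 5;   /* starts from 0 in this thread, not from main's value */
    *seen = tls_counter;
    return NULL;
}

/* Sets the main thread's copy to 1, runs one worker, and returns the main
 * thread's copy afterwards. The worker's view is written to *worker_seen. */
int run_tls_demo(int *worker_seen) {
    pthread_t t;
    int rc;

    tls_counter = 1;
    rc = pthread_create(&t, NULL, worker, worker_seen);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return tls_counter; /* unchanged by the worker */
}
```

Calling run_tls_demo from the main thread should return 1 while the worker reports 5, since each thread reads and writes only its own copy.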

Dynamic Management of Threads

Except for the main thread, threads are created and destroyed while the program runs. When a shared object file is loaded, there is no way to know when threads will be created or how many there will be. libc therefore has to manage TLS variables dynamically. Because each thread accesses a TLS variable at a different address, libc must manage the address space used for TLS variables and obtain their addresses at run time.

The compiler and linker place all TLS variables of a dynamic library in a single TLS program segment. The offset of each TLS variable relative to the start of that segment is fixed.
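The fixed-offset property can be observed with a small experiment. This is a sketch that assumes native ELF TLS (not emulated TLS) and POSIX threads; the names tls_a, tls_b, and tls_offsets_match are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Two TLS variables that the linker places in the same TLS segment. */
static _Thread_local int  tls_a = 1;
static _Thread_local long tls_b = 2;

struct probe { uintptr_t a; uintptr_t b; };

/* Record the addresses of both variables as seen by the calling thread. */
static void record(struct probe *p) {
    p->a = (uintptr_t)&tls_a;
    p->b = (uintptr_t)&tls_b;
}

static void *probe_thread(void *arg) {
    record(arg);
    return NULL;
}

/* Returns 1 if two threads see different addresses for tls_a but the
 * same distance between tls_a and tls_b. */
int tls_offsets_match(void) {
    struct probe main_p, other_p;
    pthread_t t;
    int rc;

    record(&main_p);
    rc = pthread_create(&t, NULL, probe_thread, &other_p);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;

    return main_p.a != other_p.a &&
           (main_p.b - main_p.a) == (other_p.b - other_p.a);
}
```

The absolute addresses differ per thread because each thread has its own copy of the segment, while the distance between the two variables stays constant.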

Implementation of TLS Variables

To optimize access performance, TLS variables are implemented with four access models. Each is faster than the one before it but applies in fewer scenarios:

  1. Generic Dynamic (GD): The general approach. Every access to a TLS variable requires a function call to obtain its address. It works for references across dynamic libraries.

  2. Local Dynamic (LD): The local approach, for functions that reference several TLS variables. One function call obtains the address of the TLS program segment, and each variable's address is computed from its offset within that segment. It works only for references within the same dynamic library.

  3. Initial Exec (IE): Indirect addressing using the segment register and the TLS variable's offset. It works for references in dynamic libraries that are loaded statically at main-program startup.

  4. Local Exec (LE): Direct addressing using the segment register and the TLS variable's offset. It works only for references in the main program.

In addition, TLSDESC is an access mode that optimizes GD. Its optimizations include faster access to TLS variables in the static TLS block, less register clobbering from function calls, and fewer atomic operations.
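Besides command-line options, GCC and Clang also accept a per-variable tls_model attribute. The sketch below assumes one of those compilers; the variable and function names are hypothetical, and the model the compiler actually emits may be relaxed further by the linker.

```c
#include <assert.h>

/* Access model left to the compiler (derived from -ftls-model and -fPIC). */
static __thread int counter_default = 10;

/* Request the Initial Exec model for this variable only. */
static __thread int counter_ie __attribute__((tls_model("initial-exec"))) = 20;

/* Bumps both per-thread counters and returns their sum. */
int bump_counters(void) {
    counter_default += 1;
    counter_ie += 1;
    /* Both copies belong to the calling thread and move in lockstep. */
    assert(counter_ie == counter_default + 10);
    return counter_default + counter_ie;
}
```

The attribute changes only how the address is computed, not the value semantics, so the function behaves the same whichever model is selected.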

Compiler Access Modes for TLS Variables

The compiler generally supports these five access modes through two options, tls-dialect and tls-model:

-mtls-dialect=trad -ftls-model=global-dynamic : GD
-mtls-dialect=trad -ftls-model=local-dynamic  : LD
-mtls-dialect=trad -ftls-model=initial-exec   : IE
-mtls-dialect=trad -ftls-model=local-exec     : LE, supports only the main program, not dynamic libraries
-mtls-dialect=desc                            : TLSDESC

These options determine how the compiler generates code to access TLS variables.

Note: The options above are for aarch64. On x86_64, the tls-dialect option takes different values, gnu and gnu2, which correspond to trad and desc respectively.

The TLS variable modifiers in C/C++

...

Specifier

Notes

__thread

  • non-standard, but ubiquitous in GCC and Clang

  • cannot have dynamic initialization or destruction

_Thread_local

  • a keyword standardized in C11

  • cannot have dynamic initialization or destruction

thread_local

  • C11: a macro for _Thread_local via threads.h

  • C++11: a keyword, allows dynamic initialization and/or destruction


Introduction to TLS Data Structure

Drepper provides two memory-layout implementations that cover the different scenarios of statically and dynamically loaded shared objects. Their basic principles are similar.

  1. Define a dynamic thread vector, called dtv. Each element points to the location, in the current thread, of the contents of one dynamic library's TLS program segment. The array can grow dynamically.

  2. Allocate a fixed block of memory called the static TLS block. Once allocated, it cannot grow or shrink. Most of it stores TLS segment contents, and a small part stores the Thread Control Block (TCB), which points to the dtv array.

  3. For each dynamically loaded library, allocate a block of memory for its TLS program segment, called a dynamic TLS block. A dtv element points to this block.

  4. Use a dedicated CPU register to hold the thread pointer (tp), such as the %fs segment register on x86_64 or the tpidr_el0 register on aarch64.
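The relationship between the thread pointer, the TCB, the dtv, and the TLS blocks can be modeled in a few lines. This is a simplified simulation, not the layout of any real libc: the sim_ structures and functions are hypothetical, and heap allocations stand in for per-thread TLS blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified TCB: it only holds a pointer to the dtv. */
struct sim_tcb { void **dtv; };

/* One simulated thread; &thread.tcb plays the role of the thread pointer. */
struct sim_thread {
    struct sim_tcb tcb;
    size_t dtv_len;
};

/* Grows the dtv if needed, allocates a TLS block for `module`, and copies
 * the module's initialization image into it. Returns the block or NULL. */
void *sim_add_module(struct sim_thread *t, size_t module,
                     const void *image, size_t size) {
    if (module >= t->dtv_len) {
        size_t new_len = module + 1;
        void **grown = realloc(t->tcb.dtv, new_len * sizeof *grown);
        if (grown == NULL)
            return NULL;
        for (size_t i = t->dtv_len; i < new_len; i++)
            grown[i] = NULL;
        t->tcb.dtv = grown;
        t->dtv_len = new_len;
    }
    void *block = malloc(size);
    if (block == NULL)
        return NULL;
    memcpy(block, image, size);
    free(t->tcb.dtv[module]);      /* replace any earlier block */
    t->tcb.dtv[module] = block;
    return block;
}

/* Generic-dynamic style lookup: dtv[module] + offset within the segment. */
void *sim_tls_addr(const struct sim_thread *t, size_t module, size_t offset) {
    assert(module < t->dtv_len && t->tcb.dtv[module] != NULL);
    return (char *)t->tcb.dtv[module] + offset;
}
```

Two simulated threads that register the same module get separate blocks, so a write through one thread's dtv leaves the other thread's copy at its initial value.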

...

TLS Data Layout

This section presents the aarch64 TLS data layout for both the GNU and Bionic implementations. The GNU implementation is similar to the structure shown in Figure 1. The Bionic implementation defines its TCB differently but is otherwise similar.

...

Figure 3 shows the physical layout of the Bionic thread stack, with addresses increasing from top to bottom. The static TLS block is allocated on the thread stack. The TCB has the size of the bionic_tcb data structure rather than 16 bytes. The static TLS block is located immediately after the bionic_tcb data structure.

...

Figure 4 shows the physical layout of the GNU thread stack, with addresses increasing from bottom to top. The static TLS block is allocated on the thread stack. The TCB has the size of the tcbhead_t data structure (16 bytes). The static TLS block is located immediately after the tcbhead_t data structure.

Static TLS Space

This area involves the static TLS space, the TCB management space (bionic_tcb), the thread pointer (tp), and the dynamic thread vector (dtv).

  • Static TLS space: A contiguous block of memory, generally allocated on the thread stack, that covers both the static TLS data and the TLS management space. Each dynamic library's tls_offset is calculated relative to this space.

  • Thread pointer: The thread pointer must be kept in a dedicated register. In Variant 1, the space occupied by the TCB must be agreed with the compiler to satisfy the Local Exec (LE) access model. For example, Bionic's TCB occupies 64 bytes starting at the thread pointer, so the compiler must add an offset of 64 when it generates instructions that access main-program TLS variables. Variant 2 fixes the offset at 0, so it has no such requirement.

  • TCB management space: Records the addresses of the thread pointer, the dynamic thread vector, thread-local data management, and so on. On aarch64, Bionic defines it as an array of nine 8-byte elements, 72 bytes in total. That is 8 bytes more than the offset between the thread pointer and the first TLS variable, so the thread pointer points to the second element. The TCB management space is also subject to alignment, so its start address may differ from the start of the static TLS space. This may be specific to Bionic's implementation.

  • Dynamic thread vector (dtv): Records the start of each dynamic library's TLS program segment. Its address is stored in the space that the thread pointer points to.
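The arithmetic behind the 72-byte TCB and the 64-byte offset can be checked with a small model. This sketch only mirrors the numbers given above; the SIM_ constants and the function are hypothetical, the block is a plain byte array rather than a real thread stack, and alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    SIM_SLOT_SIZE  = 8,   /* each TCB slot is 8 bytes on aarch64 */
    SIM_SLOT_COUNT = 9,   /* nine slots, as described above */
    SIM_TP_SLOT    = 1,   /* the thread pointer targets the second slot */
    SIM_LE_BIAS    = 64,  /* offset added for main-program TLS accesses */
    SIM_TCB_SIZE   = SIM_SLOT_SIZE * SIM_SLOT_COUNT,  /* 72 bytes */
    SIM_TP_OFFSET  = SIM_SLOT_SIZE * SIM_TP_SLOT      /* 8 bytes */
};

/* Stores `value` through a Local Exec style address (tp + bias) and reads
 * it back from the first byte after the simulated TCB. */
int sim_store_first_var(int value) {
    unsigned char block[SIM_TCB_SIZE + sizeof(int)] = {0};
    unsigned char *tp = block + SIM_TP_OFFSET;
    int readback;

    /* tp + 64 must land exactly at the end of the 72-byte TCB. */
    assert(SIM_TP_OFFSET + SIM_LE_BIAS == SIM_TCB_SIZE);

    memcpy(tp + SIM_LE_BIAS, &value, sizeof value);
    memcpy(&readback, block + SIM_TCB_SIZE, sizeof readback);
    return readback;
}
```

Because the thread pointer sits 8 bytes into the TCB, adding the 64-byte bias reaches the first main-program TLS variable directly after the TCB.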

Initialization Process of TLS Data Structure in Bionic

The dynamic linker initializes the TLS data structures in two parts. The first runs while the main program is being loaded and covers what are called statically loaded libraries. The second runs when the main program calls dlopen during execution and covers what are called dynamically loaded libraries.

TLS Initialization for Statically Loaded Libraries

During the loading process of the main program, the dynamic linker initializes the TLS data structure using two global variables: StaticTlsLayout and TlsModules.

  • StaticTlsLayout is of type StaticTlsLayout and is used to calculate the offset of each dependency library's TLS program segment in the static TLS block.

  • TlsModules is of type TlsModules and is used to keep track of all the dynamic libraries that have TLS program segments.
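The offset calculation can be pictured as an accumulator that aligns and reserves space for each TLS segment in turn. The following is an illustrative sketch only: sim_layout and sim_layout_reserve are hypothetical names and do not reproduce Bionic's actual StaticTlsLayout interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator for the static TLS block layout. */
struct sim_layout {
    size_t offset;     /* first free byte in the static TLS block */
    size_t alignment;  /* largest alignment requested so far */
};

/* Rounds `value` up to a multiple of `alignment` (a power of two). */
static size_t align_up(size_t value, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Reserves `size` bytes with the given alignment and returns the offset
 * of the reserved region within the static TLS block. */
size_t sim_layout_reserve(struct sim_layout *layout,
                          size_t size, size_t alignment) {
    size_t start = align_up(layout->offset, alignment);
    layout->offset = start + size;
    if (alignment > layout->alignment)
        layout->alignment = alignment;
    return start;
}
```

Each library's TLS segment would be reserved once during loading, and the returned value would serve as that library's offset in every thread's static TLS block.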

During main-program loading, all TLS program segments are stored in the static TLS block. The steps are as follows:

...