Overview
TLS (Thread-Local Storage) gives each thread its own storage that is not shared with other threads. For a TLS variable, each thread refers to a different storage location. Implementing it requires support from the high-level programming language, the compiler, and the linker.
Allocation of TLS Variables
In a high-level programming language, a TLS variable is defined once but allocated separately for each thread, so modifications made by one thread do not affect any other thread's copy. TLS variables can be read and written in assignment statements just like regular variables.
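The independence of the per-thread copies can be seen in a small C sketch. It assumes a C11 compiler and POSIX threads (typically built with -pthread); the names counter, bump, and tls_copies_are_independent are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One independent copy of `counter` exists per thread. */
static _Thread_local int counter = 0;

/* Worker: increment this thread's own copy n times, then report it back. */
static void *bump(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        counter++;
    }
    *(int *)arg = counter;   /* the value seen by this thread only */
    return NULL;
}

/* Runs two workers with different iteration counts. Returns 1 if each worker
 * saw only its own increments and the caller's copy was left untouched. */
int tls_copies_are_independent(void) {
    int a = 3, b = 5;
    pthread_t ta, tb;
    int rc;

    counter = 100;           /* the calling thread's copy */
    rc = pthread_create(&ta, NULL, bump, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, bump, &b);
    assert(rc == 0);
    (void)rc;
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a == 3 && b == 5 && counter == 100;
}
```

Each worker starts from the initial value 0 regardless of what the creating thread stored in its own copy.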
Dynamic Management of Threads
Apart from the main thread, threads are created and destroyed while the program runs. When a shared object file is loaded, it is impossible to know when threads will be created or how many there will be. libc must therefore manage TLS variables dynamically. Because each thread accesses its own copy of a TLS variable at a different address, libc has to manage the memory that holds these copies and obtain their addresses at run time.
The compiler and linker place all of a dynamic library's TLS variables in a single TLS program segment, so the offset of each TLS variable within that segment is fixed.
Implementation of TLS Variables
To optimize access to TLS variables, four implementation approaches are used. In the order listed, performance increases while the range of scenarios in which each approach applies narrows:
...
Additionally, there is an optimized access mode called TLSDESC, which speeds up access to TLS variables in the static TLS block. Among other optimizations, it reduces the register clobbering caused by function calls and minimizes atomic operations.
Compiler Access Modes for TLS Variables
The compiler generally supports five access modes, selected through the tls-dialect and tls-model options (spelled -mtls-dialect and -ftls-model in GCC):
...
Note: The above compilation options are for aarch64 architecture. For x86_64 architecture, the tls-dialect compilation options have different values: gnu and gnu2, which correspond to trad and desc, respectively.
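As a hedged illustration, the access model can also be chosen per variable in source code. The sketch below assumes GCC or Clang, which provide the tls_model attribute; the variable and function names are made up for the example. The file-wide equivalents in GCC are options such as -ftls-model=initial-exec and -mtls-dialect=desc, and the linker may still relax a requested model to a cheaper one.

```c
#include <assert.h>

/* No attribute: the compiler chooses a model from -fPIC and -ftls-model. */
static __thread int tls_default = 1;

/* Per-variable override using the GCC/Clang tls_model attribute. */
static __thread int tls_ie __attribute__((tls_model("initial-exec"))) = 2;
static __thread int tls_gd __attribute__((tls_model("global-dynamic"))) = 3;

/* Source-level access is identical whichever model is in effect. */
int bump_and_sum(int delta) {
    assert(delta >= 0);
    tls_default += delta;
    tls_ie += delta;
    tls_gd += delta;
    return tls_default + tls_ie + tls_gd;
}
```

Only the generated instruction sequence and relocations differ between the models; the C code that reads and writes the variables does not change.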
The TLS variable modifiers in C/C++
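Specifier | Notes |
---|---|
__thread | GNU extension supported by GCC and Clang |
_Thread_local | C11 keyword |
thread_local | C++11 keyword; in C11 a macro from <threads.h>, and a keyword since C23 |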
Introduction to TLS Data Structure
Drepper describes two memory layouts for two scenarios: static loading and dynamic loading of shared objects. The basic principles are similar:
Define a per-thread dynamic array called the dtv (dynamic thread vector). Each element points to the current thread's copy of one dynamic library's TLS program segment. This array can grow dynamically.
Allocate a fixed block of memory called the static TLS block. Once allocated, it cannot grow or shrink. Most of it stores the contents of TLS segments, and a small portion stores the Thread Control Block (TCB), which points to the dtv array.
For dynamically loaded libraries, allocate a memory block called a dynamic TLS block for each library's TLS program segment. The corresponding dtv element points to this block.
Use a dedicated CPU register to hold the thread pointer, such as the %fs segment register on x86_64 or the tpidr_el0 system register on aarch64.
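The relationship between these pieces can be modeled in a few lines of C. This is a simplified sketch, not the layout of any real libc: the type names model_dtv_t and model_tcb_t, their fields, and both functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; real TCB and dtv layouts differ per libc. */
typedef struct {
    size_t generation;   /* version of the module list this dtv reflects */
    size_t count;        /* number of module slots */
    void **modules;      /* modules[id - 1] points to that module's TLS block */
} model_dtv_t;

typedef struct {
    model_dtv_t *dtv;    /* the TCB points to the dtv */
    void *reserved;
} model_tcb_t;

/* General path: locate the module's block through the dtv, then add the
 * variable's fixed offset within the TLS segment. */
void *tls_addr_via_dtv(const model_tcb_t *tcb, size_t module_id, size_t offset) {
    assert(module_id >= 1 && module_id <= tcb->dtv->count);
    return (char *)tcb->dtv->modules[module_id - 1] + offset;
}

/* Static-block path: the variable sits at a fixed distance from the
 * thread pointer, so no dtv lookup is needed. */
void *tls_addr_via_tp(void *thread_pointer, ptrdiff_t tp_offset) {
    return (char *)thread_pointer + tp_offset;
}
```

For a variable in the static TLS block both paths yield the same address; the second one simply skips the indirection through the dtv.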
...
TLS Data Layout
Here, we present the TLS data layout for aarch64 in both the GNU and Bionic implementations. The GNU implementation is similar to the structure shown in Figure 1, while the Bionic implementation differs in its TCB definition. Other aspects remain similar.
...
In the physical layout of the GNU thread stack shown in Figure 4, the address increases from bottom to top. The allocation of the static TLS block memory occurs on the thread stack. The size of the TCB is determined by the tcbhead_t data structure, which is typically 16 bytes. The static TLS block is located immediately after the tcbhead_t data structure.
Static TLS Space:
Static TLS space is a contiguous memory region allocated on the thread stack. It holds both the static TLS block and the TLS management structures. Each dynamic library's tls_offset is calculated relative to this space.
...
The dynamic thread vector records where each dynamic library's TLS program segment starts. Its address is stored in the location that the thread pointer refers to.
Initialization Process of TLS Data Structure in Bionic
The initialization of the TLS data structure in Bionic by the dynamic linker is divided into two parts: static loading of libraries during the main program's loading process and dynamic loading of libraries using dlopen during the main program's execution.
TLS Initialization for Statically Loaded Libraries
While loading the main program, the dynamic linker initializes the TLS data structure using two global variables, one of type StaticTlsLayout and one of type TlsModules.
The StaticTlsLayout variable is used to calculate the offset of each dependency library's TLS program segment in the static TLS block.
The TlsModules variable records all the dynamic libraries that have TLS program segments.
...
During the main program loading process, all TLS program segments are stored in the static TLS block memory. The specific steps are as follows:
Call __libc_init_main_thread_early to initialize bionic_tcb and tpidr_el0.
Call __init_tcb_dtv to initialize TLS_SLOT_DTV in bionic_tcb, with its update flag set to 0.
Call linker_setup_exe_static_tls (assuming the main program contains a TLS program segment):
a. Reserve space for the bionic_tcb object and the TLS segment in the StaticTlsLayout variable and record their positions.
b. Call register_tls_module to obtain the module ID and add the module to the TlsModules variable.
c. Reserve space for the bionic_tls object and record its position.
Load the main program's dependencies. For each one that has a TLS program segment, call soinfo::register_soinfo_tls to reserve TLS space and a position in the StaticTlsLayout variable and to add the library to the TlsModules variable.
Call linker_finalize_static_tls to calculate the total size reserved in the StaticTlsLayout variable, which is the size of the static TLS block.
Call __allocate_thread_mapping to allocate memory for the static TLS block on the main program's stack. The static TLS block contains bionic_tcb, bionic_tls, and the TLS program segments.
Call __init_static_tls to copy the TLS segment contents of the libraries recorded in the TlsModules variable into the static TLS block.
Call bionic_tcb::copy_from_bootstrap to carry over the bionic_tcb contents initialized in the first step.
Call __init_tcb to update bionic_tcb.
Call __init_bionic_tls_ptrs to update the bionic_tls address.
Call __set_tls to store the static TLS block address in tpidr_el0.
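The reserve-then-finalize pattern used to size the static TLS block can be sketched as follows. This is a simplified stand-in written for illustration, not bionic's StaticTlsLayout class: the type tls_layout_t and the functions layout_reserve and layout_finalize are hypothetical, and real layouts must honor additional architecture-specific constraints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout calculator for the static TLS block. */
typedef struct {
    size_t offset;      /* next free byte in the block */
    size_t alignment;   /* strictest alignment reserved so far */
} tls_layout_t;

static size_t align_up(size_t value, size_t align) {
    return (value + align - 1) & ~(align - 1);
}

/* Reserve `size` bytes aligned to `align` (a power of two).
 * Returns the offset of the reservation within the block. */
size_t layout_reserve(tls_layout_t *l, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t at = align_up(l->offset, align);
    l->offset = at + size;
    if (align > l->alignment) {
        l->alignment = align;
    }
    return at;
}

/* Total block size: the running offset rounded up to the strictest
 * alignment that any reservation asked for. */
size_t layout_finalize(const tls_layout_t *l) {
    return align_up(l->offset, l->alignment);
}
```

Starting from an empty layout `{ 0, 1 }`, reserving 16 bytes at alignment 16, then 20 bytes at alignment 8, then 4 bytes at alignment 64 yields offsets 0, 16, and 64, and a final size of 128 bytes.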
Relocation Initialization of TLS Variable's GOT Table Entries
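GD: The relocation types are R_AARCH64_TLS_DTPMOD64 and R_AARCH64_TLS_DTPREL64, which initialize two adjacent GOT entries to the module ID and the variable's offset within its TLS segment, respectively.
LD: On aarch64, LD is implemented the same way as GD. On x86_64 the implementation differs: the relocation type is R_X86_64_DTPMOD64, and the two adjacent GOT entries are initialized to the module ID and an offset of 0.
IE: The relocation type is R_AARCH64_TLS_TPREL64, which initializes the GOT entry to the variable's offset in the static TLS block.
LE: There are no relocations or GOT entries, so nothing needs to be initialized.
TLSDESC: The relocation type is R_AARCH64_TLSDESC, which initializes two adjacent GOT entries to the address of the tlsdesc_resolver_static function and the offset in the static TLS block.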
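To make the difference between the GOT contents concrete, the sketch below models a GD/LD entry pair and an IE entry in C and shows how each leads to an address. The struct names, the block_of lookup table (standing in for the dtv), and both functions are assumptions for illustration; real code performs these steps in compiler-generated instruction sequences.

```c
#include <assert.h>
#include <stddef.h>

/* GD/LD: two adjacent GOT words, filled in by the two relocations. */
typedef struct {
    size_t module_id;   /* written for R_AARCH64_TLS_DTPMOD64 */
    size_t offset;      /* written for R_AARCH64_TLS_DTPREL64 */
} tls_index_t;

/* IE: a single GOT word, written for R_AARCH64_TLS_TPREL64. */
typedef struct {
    ptrdiff_t tp_offset;
} ie_got_t;

/* GD/LD lookup: block_of[id] is a hypothetical stand-in for the dtv and
 * gives the start of module id's TLS block in the current thread. */
char *gd_address(char *const *block_of, const tls_index_t *ti) {
    assert(block_of[ti->module_id] != NULL);
    return block_of[ti->module_id] + ti->offset;
}

/* IE lookup: add the precomputed offset to the thread pointer. */
char *ie_address(char *thread_pointer, const ie_got_t *got) {
    return thread_pointer + got->tp_offset;
}
```

If a module's segment starts 16 bytes past the thread pointer and a variable sits 4 bytes into the segment, the GD pair holds (module ID, 4) while the IE entry holds 20, and both lookups arrive at the same address.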
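In the initialization flow described so far, the dtv array is still empty. It is created only when a TLS variable is accessed. This lazy allocation has the following benefits: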
...
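As the example above shows, TLSDESC offers a significant optimization over GD for TLS variables in the static TLS block: TLSDESC directly returns the static TLS block offset stored in the GOT entry, whereas GD must consult the dtv array and compute the address.
TLS Initialization for Dynamically Loaded Libraries
Bionic loads libraries dynamically through dlopen. TLS is initialized in the following steps:
In the do_dlopen->find_library->soinfo::register_soinfo_tls->register_tls_module flow, obtain the module ID and add the library to the TlsModules variable.
In the soinfo::relocate->plain_relocate->plain_relocate_impl->process_relocation->process_relocation_impl flow, relocate and initialize the GOT entries of TLS variables:
GD: The relocation types are R_AARCH64_TLS_DTPMOD64 and R_AARCH64_TLS_DTPREL64, which initialize two adjacent GOT entries to the module ID and the variable's offset within its TLS segment, respectively.
LD: On aarch64, LD is implemented the same way as GD.
IE/LE: Not supported.
TLSDESC: The relocation type is R_AARCH64_TLSDESC, which initializes two adjacent GOT entries to the address of the tlsdesc_resolver_dynamic function and the address of a TlsDynamicResolverArg variable.
a. Initialize the module ID and offset of the TlsIndex in TlsDynamicResolverArg, where the offset is the TLS variable's offset within its TLS program segment. Also set the update flag to the library's update flag, which indicates whether the dynamic library has been updated.
b. The TlsDynamicResolverArg variables are stored in the soinfo::tlsdescargs array. Because the array may reallocate its memory as it grows, Relocator::deferred_tlsdesc_relocs buffers the relocation information, and the GOT entries of the TLS variables are updated only after all relocations for the library have completed.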
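The reason for deferring the GOT update can be shown with a small model: pointers into a growable array become invalid when it reallocates, so the sketch records an index per GOT slot and patches the slots once the array has stopped growing. All types and functions here (resolver_arg_t, arg_vec_t, deferred_t, arg_vec_push, apply_deferred) are hypothetical and only mirror the idea described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { size_t module_id, offset, generation; } resolver_arg_t;

typedef struct {             /* growable array of resolver arguments */
    resolver_arg_t *data;
    size_t len, cap;
} arg_vec_t;

typedef struct {             /* one deferred fix-up */
    resolver_arg_t **got_slot;   /* GOT word that must point at the argument */
    size_t arg_index;            /* index into the array, stable across growth */
} deferred_t;

/* Append an argument. realloc may move `data`, which is why callers keep
 * the returned index rather than a pointer. Returns 0 on success. */
int arg_vec_push(arg_vec_t *v, resolver_arg_t a, size_t *index_out) {
    if (v->len == v->cap) {
        size_t cap = v->cap ? v->cap * 2 : 1;
        resolver_arg_t *p = realloc(v->data, cap * sizeof *p);
        if (p == NULL) {
            return -1;
        }
        v->data = p;
        v->cap = cap;
    }
    *index_out = v->len;
    v->data[v->len++] = a;
    return 0;
}

/* Once all relocations are processed the array no longer moves,
 * so the GOT slots can safely be pointed at its elements. */
void apply_deferred(const arg_vec_t *v, const deferred_t *d, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(d[i].arg_index < v->len);
        *d[i].got_slot = &v->data[d[i].arg_index];
    }
}
```

Writing the pointer into the GOT immediately after each push would leave earlier slots dangling whenever a later push triggers a reallocation.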
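TLS Initialization During Thread Creation
Creating a thread with pthread_create requires copying all of the main program's TLS data structures (pthread_create->__allocate_thread):
Call __allocate_thread_mapping to allocate the thread stack, which includes the static TLS block. (Allocated in order: stack guard, stack, static TLS, guard page.)
Call __init_static_tls to copy the TLS segment contents of the libraries recorded in the TlsModules variable into the static TLS block.
Call __init_tcb to update bionic_tcb.
Call __init_tcb_dtv to initialize TLS_SLOT_DTV in bionic_tcb, with its update flag set to 0.
Call __init_bionic_tls_ptrs to update the bionic_tls address.
Call clone, passing it the static TLS block address so that the kernel sets the tpidr_el0 register.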
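The copy step can be pictured with the following sketch, which gives a new thread's static TLS block its own copy of each module's TLS data. It is a simplified model rather than bionic's __init_static_tls: the tls_segment_t descriptor and the init_static_tls function are assumptions, loosely following the file-size and memory-size split of an ELF PT_TLS segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one module's TLS segment. */
typedef struct {
    const void *init_image;  /* initialized data to copy */
    size_t init_size;        /* number of bytes in init_image */
    size_t total_size;       /* full size, including the zero-filled tail */
    size_t static_offset;    /* where the segment lives in the static TLS block */
} tls_segment_t;

/* Copy each module's initialization image into the thread's block and
 * zero the remainder of the segment. */
void init_static_tls(char *static_block, const tls_segment_t *mods, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(mods[i].init_size <= mods[i].total_size);
        char *dst = static_block + mods[i].static_offset;
        memcpy(dst, mods[i].init_image, mods[i].init_size);
        memset(dst + mods[i].init_size, 0,
               mods[i].total_size - mods[i].init_size);
    }
}
```

Because every thread receives a fresh copy of the same image, each one starts with the variables' initial values no matter what other threads have written.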
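Implementation of __tls_get_addr
The GD and LD access modes use __tls_get_addr to obtain the absolute address of a TLS variable. __tls_get_addr may update the dtv data, and whether it does is controlled by three update flags (generations):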
...
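Condition for updating the dtv array: the dtv array's update flag differs from the global library update flag, which means libraries have been updated and the dtv may point to outdated libraries.
Reallocate the dtv array if the number of dynamic libraries that have a TLS program segment exceeds the size of the dtv array:
a. Allocate a new dtv array sized for the number of dynamic libraries.
b. Copy the contents of the old dtv array into the new one.
c. Call __set_tcb_dtv to switch to the new dtv array.
d. To stay lock-free, do not free the old dtv array. Instead, insert it into a garbage-collection queue to be reclaimed when the program exits.
Refresh the dtv elements of the libraries that live in the static TLS block.
Refresh the dtv elements of the libraries that use dynamic TLS blocks if the library's update flag is greater than the dtv array's update flag (meaning the dtv element points to an outdated library):
a. Free the old library's dynamic TLS block.
b. Zero the dtv element.
Set the dtv array's update flag to the global library update flag.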
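The generation check and lazy allocation can be sketched in C as follows. This is a simplified single-threaded model, not the bionic implementation: it uses a fixed-capacity dtv instead of a growable one, zero-fills new blocks instead of copying an initialization image, and every name in it (module_t, sketch_dtv_t, load_module, tls_get_addr_sketch) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_MODULES 8   /* fixed capacity keeps the sketch short */

typedef struct {
    size_t size;         /* TLS segment size; 0 means the slot is unused */
    size_t generation;   /* generation at which the module was loaded */
} module_t;

typedef struct {
    size_t generation;           /* generation this dtv was last synced to */
    void *block[MAX_MODULES];    /* per-module dynamic TLS block, or NULL */
} sketch_dtv_t;

static module_t g_modules[MAX_MODULES];
static size_t g_generation;      /* bumped on every load */

/* Register (or replace) a module and advance the global generation. */
void load_module(size_t module_id, size_t size) {
    assert(module_id < MAX_MODULES && size != 0);
    g_modules[module_id].size = size;
    g_modules[module_id].generation = ++g_generation;
}

/* Drop blocks belonging to modules newer than the dtv, then mark the
 * dtv as up to date with the global generation. */
static void dtv_sync(sketch_dtv_t *dtv) {
    for (size_t i = 0; i < MAX_MODULES; i++) {
        if (g_modules[i].generation > dtv->generation) {
            free(dtv->block[i]);     /* free(NULL) is a no-op */
            dtv->block[i] = NULL;
        }
    }
    dtv->generation = g_generation;
}

/* Return the address of the variable at `offset` in module `module_id`,
 * allocating the module's block on first use. Returns NULL on failure. */
void *tls_get_addr_sketch(sketch_dtv_t *dtv, size_t module_id, size_t offset) {
    assert(module_id < MAX_MODULES && g_modules[module_id].size != 0);
    if (dtv->generation != g_generation) {
        dtv_sync(dtv);
    }
    if (dtv->block[module_id] == NULL) {
        dtv->block[module_id] = calloc(1, g_modules[module_id].size);
        if (dtv->block[module_id] == NULL) {
            return NULL;
        }
    }
    return (char *)dtv->block[module_id] + offset;
}
```

A thread that never touches a module's TLS variables never allocates a block for it, and a thread whose dtv is already at the current generation skips the synchronization loop entirely.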
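Implementation of the TLSDESC Access Mode
TLSDESC obtains a TLS variable's offset relative to the static TLS block in one of two ways: tlsdesc_resolver_static handles TLS variables in the static TLS block, and tlsdesc_resolver_dynamic handles TLS variables in dynamic TLS blocks. Both are implemented in assembly and do not follow the C/C++ calling convention for passing arguments in registers. Instead, the argument is passed in the convention's return-value register, such as x0 on aarch64 or rax on x86_64.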
...
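Fast path, taken when TlsDynamicResolverArg::generation <= dtv[0] && dtv[mod_id] != NULL:
a. Return dtv[mod_id] + TlsDynamicResolverArg::TlsIndex::offset, expressed as an offset relative to the static TLS block.
Slow path: call __tls_get_addr to obtain the absolute address of the TLS variable, and return its offset relative to the static TLS block.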
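The fast-path test can be written out in C to make the condition explicit. The real resolvers are assembly routines; this sketch only models their logic, and the types resolver_arg_t and dtv_slot_t, the parameter static_block_base, and the function try_fast_path are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { size_t module_id, offset; } tls_index_t;

typedef struct {
    size_t generation;   /* library generation recorded at relocation time */
    tls_index_t index;
} resolver_arg_t;

/* Simplified dtv: slot 0 holds the generation, and slot n holds the
 * address of module n's TLS block (0 if not yet allocated). */
typedef uintptr_t dtv_slot_t;

/* Returns 1 and stores the offset in *out when the fast path applies.
 * Returns 0 when the caller must fall back to the slow path. */
int try_fast_path(const resolver_arg_t *arg, const dtv_slot_t *dtv,
                  uintptr_t static_block_base, ptrdiff_t *out) {
    assert(arg != NULL && dtv != NULL && out != NULL);
    if (arg->generation <= dtv[0] && dtv[arg->index.module_id] != 0) {
        uintptr_t addr = dtv[arg->index.module_id] + arg->index.offset;
        *out = (ptrdiff_t)(addr - static_block_base);
        return 1;
    }
    return 0;
}
```

The fast path fails in two cases: the library is newer than the thread's dtv, or the thread has not yet allocated a block for the module. Both are handled by the slow path.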
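Initialization Process of TLS Data Structure in GNU
The principle is similar to the Bionic initialization process and is not described again here. The differences between the two include: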
...