__attribute__((noinline)) static void* tls_get_addr_slow_path(const TlsIndex* ti) {
  TlsModules& modules = __libc_shared_globals()->tls_modules;
  bionic_tcb* tcb = __get_bionic_tcb();
  // Block signals, then take the write lock on TlsModules. The slow path may
  // use the shared TLS allocator, so it must not run concurrently with other
  // threads (or with a signal handler on this thread) that touch
  // __libc_shared_globals()->tls_modules.
  ScopedSignalBlocker ssb;
  ScopedWriteLock locker(&modules.rwlock);

  // Bring this thread's dtv up to date, reallocating it if it is too small.
  update_tls_dtv(tcb);

  TlsDtv* dtv = __get_tcb_dtv(tcb);
  const size_t module_idx = __tls_module_id_to_idx(ti->module_id);
  void* mod_ptr = dtv->modules[module_idx];
  if (mod_ptr == nullptr) {

    // This thread has no TLS block for the module yet: allocate one, copy the
    // module's TLS initialization image into it, and record it in the dtv.
    const TlsSegment& segment = modules.module_table[module_idx].segment;
    mod_ptr = __libc_shared_globals()->tls_allocator.memalign(segment.alignment, segment.size);
    if (segment.init_size > 0) {
      memcpy(mod_ptr, segment.init_ptr, segment.init_size);
    }
    dtv->modules[module_idx] = mod_ptr;

    // Reports the allocation to the listener, if any.
    if (modules.on_creation_cb != nullptr) {
      modules.on_creation_cb(mod_ptr, static_cast<void*>(static_cast<char*>(mod_ptr) + segment.size));
    }
  }

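  // The variable sits at ti->offset inside the module's block; TLS_DTV_OFFSET
  // is an architecture-defined bias (0 on most targets).
  return static_cast<char*>(mod_ptr) + ti->offset + TLS_DTV_OFFSET;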
}

update_tls_dtv dynamically allocates the dtv array. Its code is too long to list here. The implementation steps are as follows:

  1. Condition for updating the dtv array: the dtv array's update flag (generation) differs from the global dynamic-library generation. This means libraries have been loaded since the dtv was last updated, so the dtv may still describe an outdated set of libraries.

  2. Reallocate the dtv array if the total number of dynamic libraries with TLS program segments is greater than the size of the dtv array:

  • a. Allocate a new dtv array sized for the current number of dynamic libraries.

  • b. Copy the contents of the old dtv array into the new one.

  • c. Call __set_tcb_dtv to switch the thread's TCB to the new dtv array.

  • d. To keep the operation lock-free, do not free the old dtv array. Instead, insert it into a garbage-collection queue that is reclaimed when the program exits.

  3. Refresh the dtv entries of the dynamic libraries whose TLS lives in the static TLS block.

  4. Refresh the dtv entries of the dynamic libraries whose TLS lives in dynamic TLS blocks. This applies when a library's generation is greater than the dtv array's generation, meaning the entry still refers to an old library:

  • a. Free the old library's dynamic TLS block.

  • b. Clear the dtv entry.

  5. Set the dtv array's generation to the global dynamic-library generation.
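The update_tls_dtv steps can be sketched as a single-threaded C++ model. Several simplifications apply:

- The types ModuleInfo, ModuleTable and Dtv, the function update_dtv, and the kNoStaticOffset marker are hypothetical stand-ins, not bionic's real TlsModules/TlsDtv layout.
- A plain pointer assignment stands in for __set_tcb_dtv.
- Retired dtv arrays are parked in a vector and never released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical simplified stand-ins for bionic's TLS bookkeeping.
constexpr size_t kNoStaticOffset = SIZE_MAX;

struct ModuleInfo {
  size_t first_generation;  // global generation when the library was loaded
  size_t static_offset;     // offset in the static TLS block, or kNoStaticOffset
};

struct ModuleTable {
  size_t generation = 0;            // global generation
  std::vector<ModuleInfo> modules;  // one slot per library with a TLS segment
};

struct Dtv {
  size_t generation = 0;   // generation this dtv was last synced to
  size_t count = 0;        // number of slots
  void** slots = nullptr;  // per-module TLS block pointers
};

// Replaced dtv arrays are parked here instead of being freed, so a reader
// still holding the old pointer never touches freed memory. This sketch never
// releases them.
static std::vector<void**> g_dtv_garbage;

void update_dtv(Dtv& dtv, const ModuleTable& mt, char* static_block) {
  // Nothing to do if no library has been loaded since the last sync.
  if (dtv.generation == mt.generation) return;

  // Grow the dtv when it has fewer slots than there are TLS modules.
  const size_t n = mt.modules.size();
  if (dtv.count < n) {
    void** fresh = static_cast<void**>(calloc(n, sizeof(void*)));
    assert(fresh != nullptr);
    if (dtv.count > 0) memcpy(fresh, dtv.slots, dtv.count * sizeof(void*));
    if (dtv.slots != nullptr) g_dtv_garbage.push_back(dtv.slots);
    dtv.slots = fresh;  // stands in for __set_tcb_dtv
    dtv.count = n;
  }

  for (size_t i = 0; i < n; ++i) {
    const ModuleInfo& m = mt.modules[i];
    if (m.static_offset != kNoStaticOffset) {
      // Static-TLS modules always point into the thread's static block.
      dtv.slots[i] = static_block + m.static_offset;
    } else if (m.first_generation > dtv.generation) {
      // The slot belongs to a library newer than this dtv (possibly one that
      // reused an unloaded library's slot): drop the stale block. The slow
      // path allocates a fresh one on first access.
      free(dtv.slots[i]);
      dtv.slots[i] = nullptr;
    }
  }
  dtv.generation = mt.generation;
}
```

Copying the old slots before retiring the old array means a library that was already set up keeps its block across a resize. Only libraries newer than the dtv lose their entries.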

Implementation of the TLSDESC access method

The TLSDESC access method has two ways to obtain a TLS variable's offset relative to the static TLS block:

- tlsdesc_resolver_static handles TLS variables in the static TLS block.
- tlsdesc_resolver_dynamic handles TLS variables in dynamic TLS blocks.

Both resolvers are written in assembly and do not follow the C/C++ calling convention for passing arguments in registers. Instead, the argument (the descriptor's address) arrives in the register the convention reserves for return values, such as x0 on AArch64 or rax on x86_64. The result is returned in that same register.

/* Type used to represent a TLS descriptor in the GOT.  */
struct TlsDescriptor {
  TlsDescResolverFunc* func;  // resolver entry point, written in assembly
  size_t arg;                 // argument interpreted by the resolver
};

// tlsdesc_resolver_static: TlsDescriptor::arg is the TLS variable's offset
// relative to the static TLS block.
//
// tlsdesc_resolver_dynamic: TlsDescriptor::arg is the address of a
// TlsDynamicResolverArg.

struct TlsIndex {
  size_t module_id;
  size_t offset;
};

struct TlsDynamicResolverArg {
  size_t generation;
  TlsIndex index;
};

The implementation of tlsdesc_resolver_static is straightforward: it simply returns the value of TlsDescriptor::arg.
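A small C++ model makes the TLSDESC protocol concrete. Compiled code calls the resolver stored in the GOT descriptor and adds the returned offset to the thread pointer. The model makes these assumptions:

- The types Descriptor and Resolver and the functions resolver_static and tlsdesc_access are hypothetical.
- The real resolvers are assembly routines with the special register convention described above. Here they are ordinary functions with the thread pointer passed explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Descriptor;
// Simplification: real resolvers receive the descriptor in x0/rax and read the
// thread pointer themselves; here both are explicit parameters.
using Resolver = uintptr_t (*)(const Descriptor* desc, uintptr_t tp);

struct Descriptor {
  Resolver func;  // resolver chosen by the dynamic linker at relocation time
  size_t arg;     // resolver-specific argument
};

// Model of tlsdesc_resolver_static: arg already is the variable's offset from
// the thread pointer, so it is returned unchanged.
uintptr_t resolver_static(const Descriptor* desc, uintptr_t /*tp*/) {
  return desc->arg;
}

// What the compiler emits for each TLSDESC access: call the resolver through
// the descriptor and add its result to the thread pointer. Unsigned
// wrap-around also covers negative offsets (as in x86_64's TLS layout).
void* tlsdesc_access(const Descriptor* desc, uintptr_t tp) {
  return reinterpret_cast<void*>(tp + desc->func(desc, tp));
}
```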

tlsdesc_resolver_dynamic optimizes its update-flag checks. Four generation counters are involved:

  1. Global generation: stored in __libc_tls_generation_copy, a copy of TlsModules::generation. It is incremented each time a dynamic library with a TLS program segment is loaded, signalling that a new library has been added. Library unloading does not need to be handled.

  2. dtv generation: stored in the first element of the dtv array, dtv[0]. It is initialized to 0 and set to the current global generation whenever the dtv array is updated. If it differs from the global generation, new dynamic libraries have been loaded and the dtv contents must be updated.

  3. Library generation: stored in TlsModule::first_generation and initialized to the global generation at the time the library is loaded. It is used to tell whether the library a dtv entry refers to has changed, that is, whether the entry belongs to an old library.

  4. GOT-entry generation: TlsDynamicResolverArg::generation, referenced from the TLS variable's GOT entry and initialized from TlsModule::first_generation. As long as it is not greater than the dtv generation, the dtv array does not need to be reallocated. If it is greater, the library's TLS segment has not yet been set up in this thread's dtv.

The implementation steps of tlsdesc_resolver_dynamic are:

  1. Fast path, taken when TlsDynamicResolverArg::generation <= dtv[0] && dtv[mod_id] != NULL: compute dtv[mod_id] + TlsDynamicResolverArg::index.offset and return its offset relative to the static TLS block (the thread pointer).

  2. Slow path: call __tls_get_addr to obtain the absolute address of the TLS variable, then return its offset relative to the static TLS block.
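The fast/slow path split can be modeled in C++ as follows. TlsIndex and TlsDynamicResolverArg mirror the structures shown earlier. The following parts are hypothetical:

- ThreadState, SlowPath and resolver_dynamic are invented for this model.
- The dtv is indexed directly by module_id.
- TLS_DTV_OFFSET is taken to be 0.
- The slow path is a caller-supplied function that stands in for __tls_get_addr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TlsIndex {
  size_t module_id;
  size_t offset;
};

struct TlsDynamicResolverArg {
  size_t generation;  // first_generation of the variable's library
  TlsIndex index;
};

// Hypothetical per-thread state for the model.
struct ThreadState {
  uintptr_t tp;             // thread pointer
  size_t dtv_generation;    // plays the role of dtv[0]
  std::vector<char*> dtv;   // dtv[mod_id]: the module's TLS block, or nullptr
};

// Stand-in for __tls_get_addr: updates the dtv, allocates the block if
// needed, and returns the variable's absolute address.
using SlowPath = void* (*)(ThreadState& thread, const TlsIndex& ti);

uintptr_t resolver_dynamic(const TlsDynamicResolverArg* arg, ThreadState& t,
                           SlowPath tls_get_addr) {
  const TlsIndex& ti = arg->index;
  // Fast path: the dtv is at least as new as the library, and this thread
  // already has a block for it. The bounds check is only for this model.
  if (arg->generation <= t.dtv_generation && ti.module_id < t.dtv.size() &&
      t.dtv[ti.module_id] != nullptr) {
    return reinterpret_cast<uintptr_t>(t.dtv[ti.module_id] + ti.offset) - t.tp;
  }
  // Slow path: get the absolute address, then turn it into a tp offset.
  return reinterpret_cast<uintptr_t>(tls_get_addr(t, ti)) - t.tp;
}
```

The fast path needs no lock and no call. Only the first access from a thread, or an access after a newer library has taken over the module slot, pays for the slow path.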

GNU (glibc) TLS data structure initialization

The principles are similar to bionic's TLS data structure initialization, so they are not repeated here. The differences are:

  1. When setting up TLS for libraries loaded at program startup, glibc reserves an extra 144 bytes in the static TLS block.

  2. When setting up TLS for libraries loaded at run time, glibc first tries to take space from the reserved part of the static TLS block and falls back to a dynamic TLS block only if that space is insufficient. Taking space from the reserve is what allows dlopen to load libraries whose TLS variables use the IE access mode.

  3. The TLSDESC resolver functions have different names: glibc's _dl_tlsdesc_return, _dl_tlsdesc_dynamic, and _dl_tlsdesc_undefweak correspond to bionic's tlsdesc_resolver_static, tlsdesc_resolver_dynamic, and tlsdesc_resolver_unresolved_weak respectively.
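The "reserve first, then fall back" behavior for dlopen'ed libraries can be sketched as a bump allocator over the reserved space. This is a hypothetical model: StaticTlsReserve and try_allocate are invented names. It ignores the architecture-specific static TLS layout and glibc's real bookkeeping, and it needs C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model of the reserved static TLS area.
struct StaticTlsReserve {
  size_t capacity;  // bytes reserved up front
  size_t used = 0;  // bytes already handed out

  // Returns an offset inside the reserve, or std::nullopt when the segment
  // does not fit; the caller then uses a dynamic TLS block instead.
  std::optional<size_t> try_allocate(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two
    size_t start = (used + align - 1) & ~(align - 1);
    if (start > capacity || size > capacity - start) return std::nullopt;
    used = start + size;
    return start;
  }
};
```

A library whose TLS must be static (IE access) has no fallback. If try_allocate fails for such a library, loading it fails, which is why the reserve must be large enough for the libraries expected to be dlopen'ed.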

Reserving static TLS space serves two purposes:

- It allows libraries that access their TLS variables in IE mode to be loaded with dlopen.
- It speeds up TLSDESC access, because variables placed in static TLS can be resolved by the static resolver (_dl_tlsdesc_return) rather than the dynamic one.

On Linux, graphics acceleration libraries (OpenGL/EGL) are a typical user of the reserved static TLS space. Linux applications usually go through a graphics API dispatch library such as glvnd, which loads the OpenGL/EGL implementation with dlopen. Those OpenGL/EGL libraries typically use IE-mode TLS variables, usually a single pointer to a per-thread data structure. This keeps their share of the reserved static TLS space small.
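The "one IE pointer, state elsewhere" pattern looks roughly like this in C++. The tls_model("initial-exec") variable attribute is a real GCC/Clang attribute. GlContextState, make_current and get_current are hypothetical names, not the glvnd or EGL API.

```cpp
#include <cassert>

// Hypothetical per-thread rendering state; in a real driver this can be large.
struct GlContextState {
  int current_program = 0;
};

// Only this pointer occupies static TLS; the state itself is allocated
// elsewhere. initial-exec gives fast access but requires static TLS space
// when the library is loaded.
static thread_local GlContextState* tls_current_context
    __attribute__((tls_model("initial-exec"))) = nullptr;

void make_current(GlContextState* ctx) { tls_current_context = ctx; }

GlContextState* get_current() { return tls_current_context; }
```

Keeping the IE variable to one pointer means a library needs only a pointer-sized slice of the reserve, however large its per-thread state grows.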

Points to note about the reserved static TLS space:

  • Allocation timing: glibc tries to allocate reserved static TLS space while processing relocations. Two relocation types are supported: R_AARCH64_TLSDESC and R_AARCH64_TLS_TPREL.

  • Data initialization: the newly allocated static TLS space has to be initialized in every existing thread. Each thread's TLS data structures live on its stack. Some threads run on user-supplied stacks, others on stacks allocated by glibc. This initialization is done in _dl_init_static_tls.

  • Concurrency: the reserved static TLS space is allocated in dl_open_worker_begin, which is called with the global dl_load_tls_lock held. While initializing the static TLS data, _dl_init_static_tls also takes dl_stack_cache_lock to keep its walk over the thread stacks thread-safe.