【SequoiaDB】SequoiaDB Tech | The Concurrent malloc Implementation in SequoiaDB

This article was written by a senior database architect at the SequoiaDB North America Lab. It introduces the concurrent malloc implementation and architecture design of the SequoiaDB database engine.

SequoiaDB Concurrent malloc Implementation

Introduction

In a C/C++ application, the dynamic memory allocation function malloc(3) can have a significant impact on the application's performance. For multi-threaded applications such as a database engine, a sub-optimal memory allocator can also limit the application's scalability. In this paper, we will discuss several popular dynamic memory allocators, and how SequoiaDB addresses the dynamic memory allocation problem in its database engine.

dlmalloc/ptmalloc

The GNU C library (glibc) uses ptmalloc, an allocator forked from dlmalloc with thread-related improvements. Memory is allocated as chunks: 8-byte-aligned data structures consisting of a header followed by the usable memory. This means there is at least an 8- or 16-byte overhead per chunk for memory management. Unallocated chunks are grouped by similar sizes and maintained in doubly linked lists.
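To make the overhead concrete, here is a simplified view of a chunk, loosely modeled on dlmalloc/ptmalloc; the exact field names and layout in glibc differ:

```cpp
#include <cstddef>

// Simplified chunk layout, loosely modeled on dlmalloc/ptmalloc (glibc's real
// struct malloc_chunk differs in names and details).
struct Chunk {
    std::size_t prevSize;  // size of the previous chunk, valid only if it is free
    std::size_t size;      // this chunk's size; low bits hold flags such as PREV_INUSE
    // --- the pointer returned by malloc() points here ---
    Chunk* fd;             // forward link in a free list, used only while the chunk is free
    Chunk* bk;             // backward link in a free list, used only while the chunk is free
};

// The two size_t header fields account for the minimum per-chunk overhead of
// 8 bytes on 32-bit platforms and 16 bytes on 64-bit platforms mentioned above.
```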

jemalloc

Originally developed by Jason Evans in 2005, jemalloc has since been adopted by FreeBSD, Facebook, Mozilla Firefox, MariaDB, Android, and others. jemalloc is a general-purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. To avoid lock contention, jemalloc uses separate memory-pool "arenas" (scaled to the number of CPUs), and each thread is assigned to an arena that handles its malloc requests.
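The following sketch illustrates the idea of assigning threads to arenas so that lock contention is confined to the threads sharing an arena; the names, the round-robin policy, and the arena count are illustrative assumptions, not jemalloc's actual implementation:

```cpp
#include <atomic>
#include <mutex>

// Illustrative arena assignment (not jemalloc's real code): each arena has its
// own lock, and a thread keeps using the arena it was assigned on first use.
struct Arena {
    std::mutex lock;   // contention is limited to the threads sharing this arena
    // ... free lists / extents would live here ...
};

constexpr int kNumArenas = 8;          // assumed; jemalloc scales this with the CPU count
Arena g_arenas[kNumArenas];
std::atomic<int> g_nextArena{0};

Arena& currentThreadArena() {
    // Round-robin assignment, remembered per thread via thread_local storage.
    thread_local Arena* arena =
        &g_arenas[g_nextArena.fetch_add(1) % kNumArenas];
    return *arena;
}
```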

tcmalloc

TCMalloc is a malloc implementation developed by Google. It reduces lock contention in multi-threaded programs by using thread-local storage for small allocations. For large allocations, mmap or sbrk is used, along with fine-grained and efficient spinlocks. TCMalloc also garbage-collects the local storage of dead threads. For small-object allocation it is very space-efficient, requiring only about one percent of space overhead for 8-byte objects.
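A rough sketch of the small/large split described above; the 256 KB boundary and the use of plain malloc as a stand-in for the thread cache are assumptions for illustration only:

```cpp
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>

// Rough illustration (not TCMalloc's real code): small requests would be
// served from a lock-free per-thread cache, large ones go to the OS directly.
constexpr std::size_t kSmallLimit = 256 * 1024;   // assumed small/large boundary

void* toyAlloc(std::size_t bytes) {
    if (bytes <= kSmallLimit) {
        // A real thread cache would pop a block from a thread-local free list
        // here without taking any lock; std::malloc is only a stand-in.
        return std::malloc(bytes);
    }
    // Large allocation: page-aligned memory obtained directly from the kernel.
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```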

[Figure 1: performance comparison of jemalloc and tcmalloc]

Figure 1 shows a test comparing the performance of jemalloc and tcmalloc. The test runs 500 iterations, each performing 1,000 memory allocations and then freeing those 1,000 blocks. As shown, the two allocators perform very similarly.
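A sketch of such a micro-benchmark is shown below; the 64-byte request size and the timing details are assumptions, since the original test only specifies the iteration and allocation counts. Linking against jemalloc or tcmalloc (or preloading them with LD_PRELOAD) lets the same binary exercise different allocators:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

// 500 iterations, each doing 1,000 allocations followed by 1,000 frees.
int main() {
    constexpr int kIterations = 500;
    constexpr int kAllocsPerIteration = 1000;
    constexpr std::size_t kRequestSize = 64;   // assumed request size

    std::vector<void*> blocks(kAllocsPerIteration);
    auto start = std::chrono::steady_clock::now();

    for (int i = 0; i < kIterations; ++i) {
        for (int j = 0; j < kAllocsPerIteration; ++j)
            blocks[j] = std::malloc(kRequestSize);
        for (int j = 0; j < kAllocsPerIteration; ++j)
            std::free(blocks[j]);
    }

    auto elapsedMs = std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - start).count();
    std::cout << "total: " << elapsedMs << " ms\n";
    return 0;
}
```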

SequoiaDB Implementation

SequoiaDB 3.4 implements its own proprietary memory allocator, which is highly efficient and tailored to the memory-usage patterns of the SequoiaDB database engine. While jemalloc and tcmalloc are both excellent general-purpose memory allocators, they cannot address all of the challenges encountered within SequoiaDB. For example, the ability to trace memory requests is an important requirement in the SequoiaDB engine, and this feature is missing from existing third-party memory allocators. Figure 2 shows the architecture of the SequoiaDB memory model. There are three layers: thread, pool, and OSS (Operating System Services).

[Figure 2: architecture of the SequoiaDB memory model]

OSS Layer

The OSS layer provides the malloc API that requests memory from the underlying operating system. This is also where the pool layer obtains its memory.
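As a minimal sketch (the function names here are hypothetical, not SequoiaDB's actual API), the OSS layer is the single choke point through which all operating-system memory requests pass:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical OSS-layer entry points: the only place that asks the operating
// system for memory, so accounting or tracing hooks can be centralized here.
void* ossLayerAlloc(std::size_t bytes) {
    return std::malloc(bytes);   // could equally sit on top of mmap(2)/sbrk(2)
}

void ossLayerFree(void* ptr) {
    std::free(ptr);
}
```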

Pool Layer

The pool layer is a global memory pool which contains segments of different sizes. A segment is a contiguous memory block allocated from the OSS layer, and each segment is divided into fixed-size chunks. By default, the chunk sizes are 32 bytes, 64 bytes, 128 bytes, and so on up to 8092 bytes. Requests above the 8092-byte maximum chunk size are serviced directly by the OSS layer.
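The sketch below illustrates how a segment obtained from the OSS layer can be carved into fixed-size chunks and how a request maps to a chunk size class; the structure names and the intermediate size-class values are assumptions (only the 32-byte minimum and 8092-byte maximum come from the article):

```cpp
#include <cstddef>
#include <vector>

// Assumed size-class table; only the 32-byte minimum and the 8092-byte maximum
// are stated in the article, the intermediate values are illustrative.
const std::size_t kChunkSizes[] = {32, 64, 128, 256, 512, 1024, 2048, 4096, 8092};

// A segment: a contiguous block from the OSS layer, carved into equal chunks.
struct Segment {
    char* base = nullptr;
    std::size_t chunkSize = 0;
    std::vector<void*> freeChunks;   // chunks not currently handed out
};

// Carve a freshly allocated segment into fixed-size chunks.
void carveSegment(Segment& seg, std::size_t segmentBytes) {
    for (std::size_t off = 0; off + seg.chunkSize <= segmentBytes; off += seg.chunkSize)
        seg.freeChunks.push_back(seg.base + off);
}

// Smallest chunk size class that fits the request; 0 means the request exceeds
// the maximum chunk size and must be serviced by the OSS layer instead.
std::size_t sizeClassFor(std::size_t bytes) {
    for (std::size_t cls : kChunkSizes)
        if (bytes <= cls) return cls;
    return 0;
}
```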

Thread Layer

The thread layer is a thread-local cache: each thread has its own private cache, so memory allocation can be done in a lock-free manner. Memory chunks are grouped by chunk size and kept in linked lists. Chunks are requested from the pool layer and cached locally, up to a configured threshold; cached memory beyond this threshold is released back to the pool layer, where it can be reused by other threads. This design helps limit the overall memory footprint. In addition, each thread has a single elastic big block, which is used to service requests above the maximum chunk-size threshold. As a result, most requests can be fulfilled within the thread layer, which is efficient and fast.
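Below is an illustrative sketch of such a thread-local cache; the names, the number of size classes, and the one-megabyte release threshold are assumptions rather than SequoiaDB's actual values, and the pool-layer refill/release paths are only indicated in comments:

```cpp
#include <cstddef>
#include <forward_list>

constexpr int kNumSizeClasses = 9;                      // 32 ... 8092 bytes
constexpr std::size_t kCacheThresholdBytes = 1 << 20;   // per-class cache cap (assumed)

struct ThreadCache {
    // One singly linked free list per chunk size class; no lock is needed
    // because the cache is private to the owning thread.
    std::forward_list<void*> freeList[kNumSizeClasses];
    std::size_t cachedBytes[kNumSizeClasses] = {};
    void* elasticBigBlock = nullptr;                    // serves requests above the max chunk size
    std::size_t elasticBigBlockSize = 0;
};

thread_local ThreadCache tlsCache;

void* threadLayerAlloc(int sizeClass, std::size_t classBytes) {
    auto& list = tlsCache.freeList[sizeClass];
    if (!list.empty()) {                                // fast path: no locks at all
        void* chunk = list.front();
        list.pop_front();
        tlsCache.cachedBytes[sizeClass] -= classBytes;
        return chunk;
    }
    // Slow path (not shown): refill this list with chunks from the pool layer.
    return nullptr;
}

void threadLayerFree(void* chunk, int sizeClass, std::size_t classBytes) {
    tlsCache.freeList[sizeClass].push_front(chunk);
    tlsCache.cachedBytes[sizeClass] += classBytes;
    if (tlsCache.cachedBytes[sizeClass] > kCacheThresholdBytes) {
        // Over the threshold: return surplus chunks to the pool layer so that
        // other threads can reuse them (release logic not shown).
    }
}
```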

In addition, the SequoiaDB memory model has built-in memory-debugging capability to detect memory corruption. It also has a trace feature that can track down where memory is being requested from. On top of that, it is fully configurable, allowing deployments to be customized according to each customer's workload and environment.
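The article does not describe the debugging mechanism in detail; the sketch below shows one common way such corruption detection can work, by placing guard patterns around each allocation and verifying them when the memory is freed (all names and the pattern value are illustrative):

```cpp
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kGuardPattern = 0xDEADBEEF;   // illustrative guard value

// Layout: [guard][user bytes][guard]; the caller allocates
// userSize + 2 * sizeof(kGuardPattern) bytes for the block.
void writeGuards(char* block, std::size_t userSize) {
    std::memcpy(block, &kGuardPattern, sizeof kGuardPattern);
    std::memcpy(block + sizeof kGuardPattern + userSize,
                &kGuardPattern, sizeof kGuardPattern);
}

// Verify both guards on free; a mismatch indicates an overrun or underrun.
bool guardsIntact(const char* block, std::size_t userSize) {
    std::uint32_t head, tail;
    std::memcpy(&head, block, sizeof head);
    std::memcpy(&tail, block + sizeof kGuardPattern + userSize, sizeof tail);
    return head == kGuardPattern && tail == kGuardPattern;
}
```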

