Preface
template <class T, class Alloc = alloc>
class vector {
typedef simple_alloc<value_type, Alloc> data_allocator;
//All memory allocation and deallocation in a vector object is delegated to data_allocator, which is in charge of the memory
…
}
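As the code above shows, vector's memory manager, data_allocator, is a simple_alloc. It consists of four functions, all of which merely forward calls, so the real work is done by Alloc. Alloc is the STL memory allocator: all memory allocation and deallocation in the STL is delegated to it. Understanding where Alloc comes from, what it does, and how it works is therefore essential to studying the STL source code.
Header files and basic functions of the STL allocator
There are three main header files: stl_construct.h, stl_alloc.h, and stl_uninitialized.h
Construction and destruction tools: construct() and destroy()
Both functions are implemented in stl_construct.h and perform object construction and destruction.
construct() and destroy() are implemented as global functions.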
#ifndef __SGI_STL_INTERNAL_CONSTRUCT_H
#define __SGI_STL_INTERNAL_CONSTRUCT_H
#include <new.h>
__STL_BEGIN_NAMESPACE
template <class T1, class T2>
inline void construct(T1* p, const T2& value) {
new (p) T1(value);
}
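//Generic version (ForwardIterator first, ForwardIterator last), which takes two iterators
//This function works out the value type of the elements, then uses __type_traits<> to choose the most suitable strategy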
template <class ForwardIterator>
inline void destroy(ForwardIterator first, ForwardIterator last) {
__destroy(first, last, value_type(first));
}
//Determine whether the element value type (value_type) has a trivial destructor
template <class ForwardIterator, class T>
inline void __destroy(ForwardIterator first, ForwardIterator last, T*) {
typedef typename __type_traits<T>::has_trivial_destructor trivial_destructor;
__destroy_aux(first, last, trivial_destructor());
}
//Used when the element value type (value_type) has a trivial destructor: do nothing
template <class ForwardIterator>
inline void __destroy_aux(ForwardIterator, ForwardIterator, __true_type) {}
//Used when the element value type (value_type) has a non-trivial destructor: destroy the elements one by one
template <class ForwardIterator>
inline void
__destroy_aux(ForwardIterator first, ForwardIterator last, __false_type) {
for ( ; first < last; ++first)
destroy(&*first);
}
//Specialized version (char*, char*) for char* iterators
inline void destroy(char*, char*) {}
//Specialized version (wchar_t*, wchar_t*) for wchar_t* iterators
inline void destroy(wchar_t*, wchar_t*) {}
//Single-pointer version (T* pointer), which takes one pointer and calls the destructor directly
template <class T>
inline void destroy(T* pointer) {
pointer->~T();
}
__STL_END_NAMESPACE
#endif /* __SGI_STL_INTERNAL_CONSTRUCT_H */
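How are value_type() and __type_traits<> implemented?
construct() takes a pointer p and an initial value, and constructs an object with that value in the storage p points to. destroy() comes in two versions. The first takes a single pointer and destroys the object it points to by calling the destructor directly. The second takes a pair of iterators and destroys every object in the range [first, last). It first determines the value type (value_type) of the objects the iterators refer to, then uses __type_traits<T> to check whether that type's destructor is trivial. If it is (trivial destructor), the function does nothing, because calling a do-nothing destructor over and over only hurts efficiency. Otherwise it loops over the range and destroys the objects one by one.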
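The dispatch described above can be sketched in a few lines of modern C++. This is only an illustration, not the SGI code: std::is_trivially_destructible stands in for __type_traits<T>::has_trivial_destructor, and the names Tracked, construct_at_raw, destroy_range, destroy_range_aux, and demo_construct_destroy are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical element type that counts destructor calls so the effect is observable.
struct Tracked {
    static int destroyed;
    int value;
    explicit Tracked(int v) : value(v) {}
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Same idea as construct(): placement new into already-allocated raw storage.
template <class T1, class T2>
void construct_at_raw(T1* p, const T2& value) {
    new (static_cast<void*>(p)) T1(value);
}

// Trivial destructor: nothing to do.
template <class T>
void destroy_range_aux(T*, T*, std::true_type) {}

// Non-trivial destructor: call it on every element of [first, last).
template <class T>
void destroy_range_aux(T* first, T* last, std::false_type) {
    for (; first != last; ++first) first->~T();
}

// Tag dispatch on the trait (assumption: std::is_trivially_destructible
// plays the role of __type_traits<T>::has_trivial_destructor).
template <class T>
void destroy_range(T* first, T* last) {
    destroy_range_aux(first, last, std::is_trivially_destructible<T>());
}

// Builds three Tracked objects in malloc'ed storage, destroys them,
// and returns how many destructors ran.
int demo_construct_destroy() {
    Tracked::destroyed = 0;
    void* raw = std::malloc(3 * sizeof(Tracked));
    assert(raw != nullptr);
    Tracked* p = static_cast<Tracked*>(raw);
    for (int i = 0; i < 3; ++i) construct_at_raw(p + i, i);
    destroy_range(p, p + 3);      // non-trivial path: three destructor calls
    std::free(raw);

    int ints[4] = {1, 2, 3, 4};
    destroy_range(ints, ints + 4);  // trivial path: compiles to nothing
    return Tracked::destroyed;
}
```

Allocation and construction are deliberately separate steps here, which is the same split that stl_alloc.h and stl_construct.h make.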
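Memory allocation and deallocation: std::alloc
The functions in this part are implemented in stl_alloc.h and manage the allocation and release of object memory.
SGI's design philosophy for memory allocation and deallocation:
To deal with the memory fragmentation that small blocks can cause, SGI uses a two-level allocator. The first-level allocator calls malloc and free directly. The second-level allocator chooses between two strategies: if the requested block is larger than 128 bytes, it is considered large enough and the request is passed to the first-level allocator; otherwise the block is considered small and, to reduce fragmentation, it is served from a memory pool.
First-level allocator: malloc_alloc
Second-level allocator: __default_alloc_template
A wrapper interface, simple_alloc, sits on top. Its four member functions are plain forwarding calls to the member functions of the allocator passed to it. All SGI STL containers use this simple_alloc interface to allocate and release memory.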
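A minimal sketch of such a forwarding wrapper follows. It is written from the description above rather than copied from stl_alloc.h, so the names simple_alloc_sketch, byte_alloc, and demo_simple_alloc are hypothetical; byte_alloc is an assumed stand-in for whichever allocator is plugged in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an underlying allocator that deals in raw bytes.
struct byte_alloc {
    static void* allocate(std::size_t n) { return std::malloc(n); }
    static void deallocate(void* p, std::size_t /* n */) { std::free(p); }
};

// Typed wrapper: converts element counts into byte counts and forwards.
template <class T, class Alloc>
struct simple_alloc_sketch {
    static T* allocate(std::size_t n) {
        return n == 0 ? nullptr
                      : static_cast<T*>(Alloc::allocate(n * sizeof(T)));
    }
    static T* allocate() {
        return static_cast<T*>(Alloc::allocate(sizeof(T)));
    }
    static void deallocate(T* p, std::size_t n) {
        if (n != 0) Alloc::deallocate(p, n * sizeof(T));
    }
    static void deallocate(T* p) { Alloc::deallocate(p, sizeof(T)); }
};

// A container would name the wrapper once and use it everywhere.
bool demo_simple_alloc() {
    typedef simple_alloc_sketch<int, byte_alloc> data_allocator;
    int* a = data_allocator::allocate(4);   // room for 4 ints, not 4 bytes
    assert(a != nullptr);
    for (int i = 0; i < 4; ++i) a[i] = i * i;
    bool ok = (a[3] == 9);
    data_allocator::deallocate(a, 4);
    return ok;
}
```

The point of the wrapper is that containers think in elements while the allocator underneath thinks in bytes.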
The following sections discuss how the two-level allocator works, leaving thread-related concerns aside for now.
The first-level allocator __malloc_alloc_template
template <int inst>
class __malloc_alloc_template {//first-level allocator
private:
//the two functions and the function pointer below handle out-of-memory conditions
static void *oom_malloc(size_t);
static void *oom_realloc(void *, size_t);
#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
static void (* __malloc_alloc_oom_handler)();
#endif
public:
static void * allocate(size_t n)
{
void *result = malloc(n);
if (0 == result) result = oom_malloc(n);
return result;
}
static void deallocate(void *p, size_t /* n */)
{
free(p);
}
static void * reallocate(void *p, size_t /* old_sz */, size_t new_sz)
{
void * result = realloc(p, new_sz);
if (0 == result) result = oom_realloc(p, new_sz);
return result;
}
static void (* set_malloc_handler(void (*f)()))()
{
void (* old)() = __malloc_alloc_oom_handler;
__malloc_alloc_oom_handler = f;
return(old);
}
};
// malloc_alloc out-of-memory handling
#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
template <int inst>
void (* __malloc_alloc_template<inst>::__malloc_alloc_oom_handler)() = 0;
#endif
template <int inst>
void * __malloc_alloc_template<inst>::oom_malloc(size_t n)
{
void (* my_malloc_handler)();
void *result;
for (;;) {
my_malloc_handler = __malloc_alloc_oom_handler;
if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
(*my_malloc_handler)();
result = malloc(n);
if (result) return(result);
}
}
template <int inst>
void * __malloc_alloc_template<inst>::oom_realloc(void *p, size_t n)
{
void (* my_malloc_handler)();
void *result;
for (;;) {
my_malloc_handler = __malloc_alloc_oom_handler;
if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
(*my_malloc_handler)();
result = realloc(p, n);
if (result) return(result);
}
}
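What the first-level allocator does:
If the memory handed to the client, or returned by the client, is larger than 128 bytes, the first-level allocator manages it.
The new-handler mechanism?
Out-of-memory handling functions: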
private:
//the two functions and the function pointer below handle out-of-memory conditions
static void *oom_malloc(size_t);
static void *oom_realloc(void *, size_t);
#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
static void (* __malloc_alloc_oom_handler)();
#endif
// malloc_alloc out-of-memory handling
#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
template <int inst>
void (* __malloc_alloc_template<inst>::__malloc_alloc_oom_handler)() = 0;
#endif
template <int inst>
void * __malloc_alloc_template<inst>::oom_malloc(size_t n)
{
void (* my_malloc_handler)();
void *result;
for (;;) {
my_malloc_handler = __malloc_alloc_oom_handler;
if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
(*my_malloc_handler)();
result = malloc(n);
if (result) return(result);
}
}
template <int inst>
void * __malloc_alloc_template<inst>::oom_realloc(void *p, size_t n)
{
void (* my_malloc_handler)();
void *result;
for (;;) {
my_malloc_handler = __malloc_alloc_oom_handler;
if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
(*my_malloc_handler)();
result = realloc(p, n);
if (result) return(result);
}
}
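The retry loop in oom_malloc() can be exercised with a small simulation. The sketch below is an assumption-laden model, not the SGI code: fake_malloc pretends memory is unavailable until a handler releases a reserve, and all names (fake_malloc, release_reserve, set_oom_handler, allocate_with_retry, demo_oom_retry) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical test double: allocation "fails" while the reserve is held.
static bool g_reserve_held = true;
static int g_handler_calls = 0;

static void* fake_malloc(std::size_t n) {
    return g_reserve_held ? nullptr : std::malloc(n);
}

// A handler is expected to free something up so the retry can succeed.
static void release_reserve() {
    ++g_handler_calls;
    g_reserve_held = false;
}

typedef void (*oom_handler_t)();
static oom_handler_t g_oom_handler = nullptr;

// Same shape as set_malloc_handler(): install a handler, return the old one.
oom_handler_t set_oom_handler(oom_handler_t f) {
    oom_handler_t old = g_oom_handler;
    g_oom_handler = f;
    return old;
}

// Same shape as allocate() + oom_malloc(): call the handler and retry
// until the allocation succeeds; throw if no handler is installed.
void* allocate_with_retry(std::size_t n) {
    void* result = fake_malloc(n);
    while (result == nullptr) {
        oom_handler_t handler = g_oom_handler;
        if (handler == nullptr) throw std::bad_alloc();
        handler();
        result = fake_malloc(n);
    }
    return result;
}

bool demo_oom_retry() {
    g_reserve_held = true;
    g_handler_calls = 0;

    set_oom_handler(nullptr);
    bool threw = false;
    try {
        allocate_with_retry(16);
    } catch (const std::bad_alloc&) {
        threw = true;               // no handler installed: give up at once
    }

    set_oom_handler(release_reserve);
    void* p = allocate_with_retry(16);  // handler runs once, retry succeeds
    bool ok = threw && p != nullptr && g_handler_calls == 1;
    std::free(p);
    return ok;
}
```

As in the original loop, a handler that never frees anything makes the loop spin forever, so a handler is responsible for either releasing memory or terminating.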
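The second-level allocator __default_alloc_template
Purpose of the second-level allocator:
If the memory handed to the client, or returned by the client, is no larger than 128 bytes, the second-level allocator manages it.
It maintains a table of free lists (the nodes on each list are blocks of 8, 16, ..., 128 bytes respectively) plus a memory pool that feeds the lists. When supply runs short, it takes memory from the heap to replenish the memory pool and the free lists. Blocks on the free lists are handed to clients that need them, and clients return memory to the free lists as well.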
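The size classes follow from two small pieces of arithmetic, which are worth trying on concrete numbers. The sketch below restates ROUND_UP and FREELIST_INDEX as free functions; the names kAlign, kMaxBytes, kNumFreeLists, round_up, freelist_index, and demo_size_classes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kAlign = 8;                           // block sizes are multiples of 8
const std::size_t kMaxBytes = 128;                      // largest pooled block
const std::size_t kNumFreeLists = kMaxBytes / kAlign;   // 16 lists

// Round a request up to the next multiple of 8.
std::size_t round_up(std::size_t bytes) {
    return (bytes + kAlign - 1) & ~(kAlign - 1);
}

// Map a request of 1..128 bytes to the index of the list that serves it.
std::size_t freelist_index(std::size_t bytes) {
    return (bytes + kAlign - 1) / kAlign - 1;
}

bool demo_size_classes() {
    assert(kNumFreeLists == 16);
    return round_up(1) == 8 &&         // smallest request still costs 8 bytes
           round_up(8) == 8 &&
           round_up(9) == 16 &&
           round_up(30) == 32 &&
           freelist_index(1) == 0 &&   // list 0 holds 8-byte blocks
           freelist_index(8) == 0 &&
           freelist_index(9) == 1 &&   // list 1 holds 16-byte blocks
           freelist_index(128) == 15;  // last list holds 128-byte blocks
}
```

Both functions assume a request of at least 1 byte; for 0 the index computation would wrap around, which is why allocate() is documented with "n must be > 0".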
Annotated code:
template <bool threads, int inst>
class __default_alloc_template { //second-level allocator
private:
// Really we should use static const int x = N
// instead of enum { x = N }, but few compilers accept the former.
# ifndef __SUNPRO_CC
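enum {__ALIGN = 8}; //allocation granularity: block sizes are multiples of this
enum {__MAX_BYTES = 128}; //largest block size served by the free lists
enum {__NFREELISTS = __MAX_BYTES/__ALIGN};//number of entries in free_list[]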
# endif
static size_t ROUND_UP(size_t bytes) {//round bytes up to a multiple of 8
return (((bytes) + __ALIGN-1) & ~(__ALIGN - 1));
}
__PRIVATE:
union obj {//a union lets the link pointer and the client data share the same storage, so free blocks need no extra bookkeeping memory
union obj * free_list_link;
char client_data[1]; /* The client sees this. */
};
private:
# ifdef __SUNPRO_CC
static obj * __VOLATILE free_list[]; //the master table: each entry points to a linked list of equally sized blocks waiting to be handed out
// Specifying a size results in duplicate def for 4.1
# else
static obj * __VOLATILE free_list[__NFREELISTS];
# endif
static size_t FREELIST_INDEX(size_t bytes) {//index of the free list that serves a request of this size
return (((bytes) + __ALIGN-1)/__ALIGN - 1);
}
// Returns an object of size n, and optionally adds to size n free list.
static void *refill(size_t n);
// Allocates a chunk for nobjs of size "size". nobjs may be reduced
// if it is inconvenient to allocate the requested number.
static char *chunk_alloc(size_t size, int &nobjs);
// Chunk allocation state. start_free and end_free mark the start and end of the memory pool
static char *start_free;
static char *end_free;
static size_t heap_size;
# ifdef __STL_SGI_THREADS
static volatile unsigned long __node_allocator_lock;
static void __lock(volatile unsigned long *);
static inline void __unlock(volatile unsigned long *);
# endif
# ifdef __STL_PTHREADS
static pthread_mutex_t __node_allocator_lock;
# endif
# ifdef __STL_WIN32THREADS
static CRITICAL_SECTION __node_allocator_lock;
static bool __node_allocator_lock_initialized;
public:
__default_alloc_template() {
// This assumes the first constructor is called before threads
// are started.
if (!__node_allocator_lock_initialized) {
InitializeCriticalSection(&__node_allocator_lock);
__node_allocator_lock_initialized = true;
}
}
private:
# endif
class lock {
public:
lock() { __NODE_ALLOCATOR_LOCK; }
~lock() { __NODE_ALLOCATOR_UNLOCK; }
};
friend class lock;
public:
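//Purpose: maintain a table of free lists (the nodes on each list are blocks of 8, 16, ..., 128 bytes respectively), plus a memory pool that feeds the lists
//When supply runs short, take memory from the heap to replenish the memory pool and the free lists
//Blocks on the free lists are handed to the clients that need them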
/* n must be > 0 */
static void * allocate(size_t n)
{//allocate memory
obj * __VOLATILE * my_free_list;
obj * __RESTRICT result;
if (n > (size_t) __MAX_BYTES) {//requests larger than 128 bytes are allocated by the first-level allocator
return(malloc_alloc::allocate(n));
}
my_free_list = free_list + FREELIST_INDEX(n);//pick the appropriate one of the 16 free lists
// Acquire the lock here with a constructor call.
// This ensures that it is released in exit or during stack
// unwinding.
# ifndef _NOTHREADS
/*REFERENCED*/
lock lock_instance;
# endif
result = *my_free_list;
if (result == 0) {//this free list is empty, so refill it
void *r = refill(ROUND_UP(n));//returns one block for the caller and restocks this free list; the memory pool may grow along the way
return r;
}
*my_free_list = result -> free_list_link;//unlink the first block from the free list
return (result);
};
/* p may not be 0 */
static void deallocate(void *p, size_t n)
{//release memory
obj *q = (obj *)p;
obj * __VOLATILE * my_free_list;
if (n > (size_t) __MAX_BYTES) {//blocks larger than 128 bytes are released by the first-level allocator
malloc_alloc::deallocate(p, n);
return;
}
my_free_list = free_list + FREELIST_INDEX(n);//find the matching free list
// acquire lock
# ifndef _NOTHREADS
/*REFERENCED*/
lock lock_instance;
# endif /* _NOTHREADS */
q -> free_list_link = *my_free_list;//reclaim the block by inserting it at the head of the matching free list
*my_free_list = q;
// lock is released here
}
static void * reallocate(void *p, size_t old_sz, size_t new_sz);
} ;
typedef __default_alloc_template<__NODE_ALLOCATOR_THREADS, 0> alloc;
typedef __default_alloc_template<false, 0> single_client_alloc;
/* We allocate memory in large chunks in order to avoid fragmenting */
/* the malloc heap too much. */
/* We assume that size is properly aligned. */
/* We hold the allocation lock. */
template <bool threads, int inst>
char*
__default_alloc_template<threads, inst>::chunk_alloc(size_t size, int& nobjs)
{//take space from the memory pool for free_list to use
char * result;
size_t total_bytes = size * nobjs;
size_t bytes_left = end_free - start_free;//space remaining in the memory pool
if (bytes_left >= total_bytes) {//the pool can satisfy the whole request
result = start_free;
start_free += total_bytes;
return(result);
} else if (bytes_left >= size) {//the pool cannot satisfy the whole request, but can supply at least one block
nobjs = bytes_left/size;
total_bytes = size * nobjs;
result = start_free;
start_free += total_bytes;
return(result);
} else {//the pool cannot supply even a single block
size_t bytes_to_get = 2 * total_bytes + ROUND_UP(heap_size >> 4);
// Try to make use of the left-over piece.
if (bytes_left > 0) {//the pool still holds a leftover piece: hand it to the free list of the matching size
obj * __VOLATILE * my_free_list =
free_list + FREELIST_INDEX(bytes_left);
((obj *)start_free) -> free_list_link = *my_free_list;
*my_free_list = (obj *)start_free;
}
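//obtain heap space to replenish the memory pool
start_free = (char *)malloc(bytes_to_get);//request bytes_to_get bytes for the memory pool
if (0 == start_free) {//the system cannot supply bytes_to_get bytes for the memory pool
//the heap is exhausted and malloc failed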
obj * __VOLATILE * my_free_list, *p;
// Try to make do with what we have. That can't
// hurt. We do not try smaller requests, since that tends
// to result in disaster on multi-process machines.
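int i;
for (i = size; i <= __MAX_BYTES; i += __ALIGN) {//search the free lists whose block size is at least size and take one unused block from the first non-empty list,
//i.e. release a block that is large enough and has not been handed out, and turn it into the memory pool so that the list for size can be supplied;
//whatever is left over after that stays in the memory pool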
my_free_list = free_list + FREELIST_INDEX(i);
p = *my_free_list;
if (0 != p) {
*my_free_list = p -> free_list_link;
start_free = (char *)p;
end_free = start_free + i;
return(chunk_alloc(size, nobjs));
// Any leftover piece will eventually make it to the
// right free list.
}
}
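end_free = 0; // In case of exception. Reached only when no memory is available anywhere
start_free = (char *)malloc_alloc::allocate(bytes_to_get);//call the first-level allocator and see whether its out-of-memory handling can help
//this either throws an exception or the memory shortage is relieved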
// This should either throw an
// exception or remedy the situation. Thus we assume it
// succeeded.
}
heap_size += bytes_to_get;
end_free = start_free + bytes_to_get;
return(chunk_alloc(size, nobjs));//the memory pool now has enough memory, so recurse to hand it out: part goes to free_list and the pool keeps the rest
}
}
/* Returns an object of size n, and optionally adds to size n free list.*/
/* We assume that n is properly aligned. */
/* We hold the allocation lock. */
template <bool threads, int inst>
void* __default_alloc_template<threads, inst>::refill(size_t n)//returns one object of size n (n is already a multiple of 8) and may add nodes to the matching free_list
{
int nobjs = 20;
char * chunk = chunk_alloc(n, nobjs);//try to obtain nobjs blocks as new free_list nodes; one of them goes to the caller
obj * __VOLATILE * my_free_list;
obj * result;
obj * current_obj, * next_obj;
int i;
if (1 == nobjs) return(chunk);//only one block was obtained: give it directly to the caller, and free_list gains no new nodes
my_free_list = free_list + FREELIST_INDEX(n);//locate the free_list that will receive the new nodes
/* Build free list in chunk */
result = (obj *)chunk;//this block goes to the caller
*my_free_list = next_obj = (obj *)(chunk + n);//link the remaining nobjs-1 blocks together and add them to this free list
for (i = 1; ; i++) {
current_obj = next_obj;
next_obj = (obj *)((char *)next_obj + n);
if (nobjs - 1 == i) {
current_obj -> free_list_link = 0;
break;
} else {
current_obj -> free_list_link = next_obj;
}
}
return(result);
}
template <bool threads, int inst>
void*
__default_alloc_template<threads, inst>::reallocate(void *p,
size_t old_sz,
size_t new_sz)//reallocate the block with a new size
{
void * result;
size_t copy_sz;
if (old_sz > (size_t) __MAX_BYTES && new_sz > (size_t) __MAX_BYTES) {
return(realloc(p, new_sz));
}
if (ROUND_UP(old_sz) == ROUND_UP(new_sz)) return(p);
result = allocate(new_sz);
copy_sz = new_sz > old_sz? old_sz : new_sz;
memcpy(result, p, copy_sz);
deallocate(p, old_sz);
return(result);
}
#ifdef __STL_PTHREADS
template <bool threads, int inst>
pthread_mutex_t
__default_alloc_template<threads, inst>::__node_allocator_lock
= PTHREAD_MUTEX_INITIALIZER;
#endif
#ifdef __STL_WIN32THREADS
template <bool threads, int inst> CRITICAL_SECTION
__default_alloc_template<threads, inst>::__node_allocator_lock;
template <bool threads, int inst> bool
__default_alloc_template<threads, inst>::__node_allocator_lock_initialized
= false;
#endif
#ifdef __STL_SGI_THREADS
__STL_END_NAMESPACE
#include <mutex.h>
#include <time.h>
__STL_BEGIN_NAMESPACE
// Somewhat generic lock implementations. We need only test-and-set
// and some way to sleep. These should work with both SGI pthreads
// and sproc threads. They may be useful on other systems.
template <bool threads, int inst>
volatile unsigned long
__default_alloc_template<threads, inst>::__node_allocator_lock = 0;
#if __mips < 3 || !(defined (_ABIN32) || defined(_ABI64)) || defined(__GNUC__)
# define __test_and_set(l,v) test_and_set(l,v)
#endif
template <bool threads, int inst>
void
__default_alloc_template<threads, inst>::__lock(volatile unsigned long *lock)
{
const unsigned low_spin_max = 30; // spin cycles if we suspect uniprocessor
const unsigned high_spin_max = 1000; // spin cycles for multiprocessor
static unsigned spin_max = low_spin_max;
unsigned my_spin_max;
static unsigned last_spins = 0;
unsigned my_last_spins;
static struct timespec ts = {0, 1000};
unsigned junk;
# define __ALLOC_PAUSE junk *= junk; junk *= junk; junk *= junk; junk *= junk
int i;
if (!__test_and_set((unsigned long *)lock, 1)) {
return;
}
my_spin_max = spin_max;
my_last_spins = last_spins;
for (i = 0; i < my_spin_max; i++) {
if (i < my_last_spins/2 || *lock) {
__ALLOC_PAUSE;
continue;
}
if (!__test_and_set((unsigned long *)lock, 1)) {
// got it!
// Spinning worked. Thus we're probably not being scheduled
// against the other process with which we were contending.
// Thus it makes sense to spin longer the next time.
last_spins = i;
spin_max = high_spin_max;
return;
}
}
// We are probably being scheduled against the other process. Sleep.
spin_max = low_spin_max;
for (;;) {
if (!__test_and_set((unsigned long *)lock, 1)) {
return;
}
nanosleep(&ts, 0);
}
}
template <bool threads, int inst>
inline void
__default_alloc_template<threads, inst>::__unlock(volatile unsigned long *lock)
{
# if defined(__GNUC__) && __mips >= 3
asm("sync");
*lock = 0;
# elif __mips >= 3 && (defined (_ABIN32) || defined(_ABI64))
__lock_release(lock);
# else
*lock = 0;
// This is not sufficient on many multiprocessors, since
// writes to protected variables and the lock may be reordered.
# endif
}
#endif
template <bool threads, int inst>
char *__default_alloc_template<threads, inst>::start_free = 0;
template <bool threads, int inst>
char *__default_alloc_template<threads, inst>::end_free = 0;
template <bool threads, int inst>
size_t __default_alloc_template<threads, inst>::heap_size = 0;
template <bool threads, int inst>
__default_alloc_template<threads, inst>::obj * __VOLATILE
__default_alloc_template<threads, inst> ::free_list[
# ifdef __SUNPRO_CC
__NFREELISTS
# else
__default_alloc_template<threads, inst>::__NFREELISTS
# endif
] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
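To get a feel for how fast the memory pool grows, it helps to evaluate the bytes_to_get formula from chunk_alloc() by hand. The sketch below only reproduces that arithmetic; the free functions round_up and bytes_to_get and the demo name are hypothetical, and the request sizes are made-up examples.

```cpp
#include <cassert>
#include <cstddef>

// Round up to a multiple of 8, as ROUND_UP does.
std::size_t round_up(std::size_t bytes) {
    return (bytes + 7) & ~static_cast<std::size_t>(7);
}

// How much chunk_alloc() asks malloc for when the pool is exhausted:
// twice the current demand plus an increment that grows with heap_size.
std::size_t bytes_to_get(std::size_t size, int nobjs, std::size_t heap_size) {
    std::size_t total_bytes = size * static_cast<std::size_t>(nobjs);
    return 2 * total_bytes + round_up(heap_size >> 4);
}

bool demo_pool_growth() {
    // First ever request, for 32-byte blocks: refill() asks for 20 of them.
    // Demand is 640 bytes, so 1280 are fetched; 640 feed the free list
    // (1 block returned, 19 linked) and 640 stay in the pool.
    std::size_t first = bytes_to_get(32, 20, 0);

    // A later request for 64-byte blocks arriving with an empty pool,
    // after heap_size has grown to 1280: 2 * 1280 + round_up(1280 / 16).
    std::size_t second = bytes_to_get(64, 20, first);

    return first == 1280 && second == 2640;
}
```

The heap_size term means that the more memory the allocator has already taken from the heap, the larger each new chunk becomes.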
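The most basic data structure for managing memory:
union obj {//a union lets the link pointer and the client data share the same storage, so free blocks need no extra bookkeeping memory
union obj * free_list_link;
char client_data[1]; /* The client sees this. */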
};
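The trick is that a free block stores the link to the next free block inside itself. A minimal sketch of the push and pop operations that deallocate() and allocate() perform on one list follows; push_block, pop_block, and demo_free_list are hypothetical names, and the fixed buffer stands in for memory carved out of the pool.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

union obj {
    union obj* free_list_link;  // used while the block sits on a free list
    char client_data[1];        // used while the client owns the block
};

// What deallocate() does: put the block at the head of the list.
void push_block(obj*& head, void* p) {
    obj* q = ::new (p) obj;     // start using the block as a list node
    q->free_list_link = head;
    head = q;
}

// What allocate() does when the list is not empty: take the head block.
// Returns nullptr when the list is empty.
void* pop_block(obj*& head) {
    obj* result = head;
    if (result != nullptr) head = result->free_list_link;
    return result;
}

bool demo_free_list() {
    // Two 16-byte blocks in one suitably aligned buffer (assumed pool memory).
    alignas(obj) static unsigned char buffer[32];
    obj* head = nullptr;
    push_block(head, buffer);
    push_block(head, buffer + 16);

    void* first = pop_block(head);    // last block pushed comes out first
    void* second = pop_block(head);
    void* third = pop_block(head);    // list is empty again
    return first == buffer + 16 && second == buffer && third == nullptr;
}
```

Because the link lives in the block itself, the smallest block has to be at least as large as a pointer, which the 8-byte granularity guarantees on common platforms.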
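Basic memory handling tools
STL defines five global functions that operate on uninitialized memory; they are very helpful when implementing containers.
For example, when implementing a container, its range constructor usually works in two steps:
(1) allocate a block of memory large enough to hold every element in the range;
(2) use uninitialized_copy() to construct the elements in that block.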
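The two steps can be sketched with the standard library's own std::uninitialized_copy. The container below is a deliberately tiny, hypothetical example (string_block and demo_range_constructor are made-up names), and it uses ::operator new for step (1) as an assumption in place of an SGI allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Hypothetical fixed-size container that owns a block of std::string objects.
class string_block {
public:
    typedef std::string string_t;

    template <class ForwardIt>
    string_block(ForwardIt first, ForwardIt last)
        : size_(static_cast<std::size_t>(std::distance(first, last))),
          // Step 1: raw memory only, no string is constructed yet.
          data_(static_cast<string_t*>(::operator new(size_ * sizeof(string_t)))) {
        try {
            // Step 2: copy-construct each element in place.
            std::uninitialized_copy(first, last, data_);
        } catch (...) {
            ::operator delete(data_);  // elements already built were destroyed by the algorithm
            throw;
        }
    }

    ~string_block() {
        for (std::size_t i = 0; i < size_; ++i) data_[i].~string_t();
        ::operator delete(data_);
    }

    string_block(const string_block&) = delete;
    string_block& operator=(const string_block&) = delete;

    std::size_t size() const { return size_; }
    const string_t& operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;
    string_t* data_;
};

bool demo_range_constructor() {
    std::vector<std::string> src;
    src.push_back("a");
    src.push_back("bb");
    src.push_back("ccc");
    string_block block(src.begin(), src.end());
    assert(block.size() == 3);
    return block[0] == "a" && block[2] == "ccc";
}
```

Plain copy() would be wrong in step (2), because it assigns to objects that do not exist yet.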
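construct(), used for construction //discussed at the beginning of this section
destroy(), used for destruction //discussed at the beginning of this section
The following three functions are all implemented in stl_uninitialized.h; the corresponding high-level functions they rely on, copy(), fill(), and fill_n(), are all in stl_algobase.h
Generic and specialized versions of uninitialized_copy():
Generic version of uninitialized_fill():
Generic version of uninitialized_fill_n():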
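As a usage illustration only (not the SGI implementation), the standard library versions of the two fill functions can be called on raw memory like this. The helper name demo_uninitialized_fill is hypothetical, and ::operator new is an assumed source of raw storage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

bool demo_uninitialized_fill() {
    typedef std::string string_t;
    const std::size_t n = 4;

    // Raw storage for four strings; nothing is constructed yet.
    string_t* raw = static_cast<string_t*>(::operator new(n * sizeof(string_t)));

    // Range form: construct copies of "x" in [raw, raw + 2).
    std::uninitialized_fill(raw, raw + 2, string_t("x"));
    // Count form: construct 2 copies of "y" starting at raw + 2.
    std::uninitialized_fill_n(raw + 2, 2, string_t("y"));

    bool ok = raw[0] == "x" && raw[1] == "x" &&
              raw[2] == "y" && raw[3] == "y";

    // The elements were constructed by hand, so destroy them by hand too.
    for (std::size_t i = 0; i < n; ++i) raw[i].~string_t();
    ::operator delete(raw);
    return ok;
}
```

This sketch skips the cleanup that would be needed if the second fill threw after the first one had succeeded.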
References:
The code comes from SGI STL
The screenshots come from Hou Jie's book 《STL源代碼剖析》 (The Annotated STL Sources)