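Byte alignment in malloc memory allocation

I recently read through several open-source C/C++ libraries. Each of them documents its own optimizations around memory allocation, and they all touch on allocation byte alignment and memory paging.

When it comes to byte alignment in memory allocation, I have always known that it happens without understanding why, and I rarely paid attention to the performance impact it can have. In a system that has to handle high concurrency, respond quickly, and make the most of its resources, though, this is something that needs attention, so I took this opportunity to get a rough understanding of it.

Let's start with the relevant definitions in glibc's malloc.c: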

/*
  -----------------------  Chunk representations -----------------------
*/


/*
  This struct declaration is misleading (but accurate and necessary).
  It declares a "view" into memory allowing access to necessary
  fields at known offsets from a given base. See explanation below.
*/

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};


/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
        trailing size field since there is no next contiguous chunk
        that would have to index off it. After initialization, `top'
        is forced to always exist.  If it would become less than
        MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
        bit M (IS_MMAPPED) set in their size fields.  Because they are
        allocated one-by-one, each must contain its own trailing size field.

*/

/*
  ---------- Size and alignment checks and conversions ----------
*/

/* conversion from malloc headers to user pointers, and back */

#define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))

/* The smallest possible chunk */
#define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))

/* The smallest size we can malloc is an aligned minimal chunk */

#define MINSIZE  \
  (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

/* Check if m has acceptable alignment */

#define aligned_OK(m)  (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)

#define misaligned_chunk(p) \
  ((uintptr_t)(MALLOC_ALIGNMENT == 2 * SIZE_SZ ? (p) : chunk2mem (p)) \
   & MALLOC_ALIGN_MASK)


/*
   Check if a request is so large that it would wrap around zero when
   padded and aligned. To simplify some other code, the bound is made
   low enough so that adding MINSIZE will also not wrap around zero.
 */

#define REQUEST_OUT_OF_RANGE(req)                                 \
  ((unsigned long) (req) >=                                                   \
   (unsigned long) (INTERNAL_SIZE_T) (-2 * MINSIZE))

/* pad request bytes into a usable size -- internal version */

#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/*  Same, except also perform argument check */

#define checked_request2size(req, sz)                             \
  if (REQUEST_OUT_OF_RANGE (req)) {                                           \
      __set_errno (ENOMEM);                                                   \
      return 0;                                                               \
    }                                                                         \
  (sz) = request2size (req);

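There are a lot of macros here, so we only look at the most important ones. request2size performs the alignment: it pads a requested size up to an aligned chunk size. MINSIZE is the smallest chunk that malloc will hand out, 16 bytes on a 32-bit system and 32 bytes on a 64-bit system. MALLOC_ALIGNMENT is the alignment in bytes. By default it is 2 * SIZE_SZ, and since size_t is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, MALLOC_ALIGNMENT is 8 and 16 respectively.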

In practice, the alignment parameter (MALLOC_ALIGNMENT) has to satisfy two conditions:

1. It must be a power of two.

2. It must be a multiple of sizeof(void *).

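So from request2size we can see that on a 64-bit system a request of 1 to 24 bytes consumes 32 bytes, while a request of 25 bytes consumes 48 bytes. On a 32-bit system a request of 1 to 12 bytes consumes 16 bytes, while a request of 13 bytes consumes 24 bytes. Note that request2size adds only SIZE_SZ of overhead rather than 2 * SIZE_SZ: while a chunk is in use, the prev_size field of the next chunk is available to it as user data.

Here is a tutorial written by someone else on how to implement a simple malloc function: http://blog.codinglabs.org/articles/a-malloc-tutorial.html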
