Berkeley DB Performance Test

Dear readers, I apologize that this article is in English. I originally took these notes in English so that I could share them with my colleagues, and I really don't have time to translate them now. Thank you for your understanding!



In this article I'd like to talk about the caveats and how-tos of doing a
performance test with Berkeley DB when the data volume is huge.
For legal reasons I cannot publish the results of my test without further
approval, so I have decided not to include them here.

I. Context

I need to insert 10 billion key/data pairs into a btree database. Each key
is 768 bytes, there are no duplicate keys, and keys are inserted in increasing order only.
Each data item varies between 0.5KB and 1KB, so each key/data pair
is roughly 1.25KB to 2KB.


After the insertion, I search for a set of keys, some of which are in the db and
some of which are not, to find the average time to insert a key/data pair and the
average time to find one. I use a very powerful 64-bit Linux machine with 4 processors
(each a 3.2GHz Intel Xeon), 8GB of memory and 8TB of storage.


II. Things to Note

The especially huge data volume matters a lot in this test; there are many things to note:

1. Integer overflow.
Loop variables will overflow if we insert this many key/data pairs in a single loop, so
split the job into pieces, making sure each piece won't overflow a 32-bit integer.
If you use signed loop variables, they can go negative before reaching the limit
value and thus fall into an endless loop. In such a use case, we should be very
careful about every integer variable that could overflow.
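For example, a minimal sketch (the chunk size and names here are illustrative, not from my original test code) that keeps every counter in 64 bits and splits the work into chunks:

    #include <stdint.h>

    #define TOTAL_PAIRS 10000000000ULL   /* 10 billion, far beyond 32-bit range */
    #define CHUNK_SIZE  10000000ULL      /* illustrative: 10 million pairs per chunk */

    void insert_all(void)
    {
        /* 64-bit unsigned counters cannot wrap or go negative here. */
        for (uint64_t done = 0; done < TOTAL_PAIRS; done += CHUNK_SIZE) {
            uint64_t n = TOTAL_PAIRS - done;
            if (n > CHUNK_SIZE)
                n = CHUNK_SIZE;
            for (uint64_t i = 0; i < n; i++) {
                /* insert key/data pair number (done + i) here */
            }
        }
    }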

Another example: if 32-bit integers are used as index keys, they also overflow,
so fewer key/data pairs end up inserted than assumed, because later key/data pairs
whose keys are already present in the db overwrite previous ones.
So we must use DB_SEQUENCE to generate 64-bit sequential keys.
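A minimal sketch of generating such keys with DB_SEQUENCE (the sequence name and cache size are illustrative; error handling is omitted):

    #include <string.h>
    #include <db.h>

    /* Open (or create) a sequence stored in 'dbp' that hands out 64-bit keys. */
    DB_SEQUENCE *open_key_sequence(DB *dbp)
    {
        DB_SEQUENCE *seq;
        DBT name;

        memset(&name, 0, sizeof(name));
        name.data = "key_seq";                   /* illustrative sequence name */
        name.size = (u_int32_t)strlen("key_seq") + 1;

        db_sequence_create(&seq, dbp, 0);
        seq->initial_value(seq, 1);
        seq->set_cachesize(seq, 10000);          /* hand out ids in cheap batches */
        seq->open(seq, NULL, &name, DB_CREATE);
        return (seq);
    }

    /* db_seq_t is a 64-bit integer, so it will not wrap around 2^31 or 2^32. */
    db_seq_t next_key(DB_SEQUENCE *seq)
    {
        db_seq_t id;

        seq->get(seq, NULL, 1, &id, 0);
        return (id);
    }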

2. Random integer generator (the rand() C function) loses randomness.
RAND_MAX is only 32767 (about 32K) on some platforms, so rand() can yield only that
many distinct values; when far more random integers than that are generated, the
randomness fades, and I observed some non-randomness in my test code.
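My test did not do this, but if a larger random range were needed, one option is a simple 64-bit generator such as Marsaglia's xorshift64 instead of rand(); a minimal sketch:

    #include <stdint.h>

    /* A simple 64-bit xorshift generator; the seed must be nonzero. */
    static uint64_t rng_state = 88172645463325252ULL;

    uint64_t xorshift64(void)
    {
        uint64_t x = rng_state;

        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        rng_state = x;
        return (x);
    }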

3. Huge key/data pairs.

Context: Each key/data pair can be as large as 2KB, and there can be millions of pages.
Problem:
A lot of overflow pages can be generated if the default page size is used.
Solution:
Set the page size to 64KB, so that each internal node can hold the largest possible
number of keys; this minimizes internal space waste, the number of extra internal-node
reads during a search, and the number of overflow pages (see the sketch after this item).
Concurrency is not harmed, since each key/data pair is so big.

Problem:
Insufficient stack space.
Solution:
Do not allocate huge buffers (more than several MB) on the stack; allocate them on the
heap, otherwise the stack may not be big enough.
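A minimal sketch of both points above, with illustrative sizes and names (error handling omitted):

    #include <stdlib.h>
    #include <db.h>

    #define WORK_BUF_SIZE (8UL * 1024 * 1024)   /* illustrative: an 8MB working buffer */

    /* 64KB pages keep overflow pages and extra internal-node reads to a
     * minimum for ~2KB records; the page size must be set before DB->open. */
    void open_big_btree(DB *dbp, const char *file)
    {
        dbp->set_pagesize(dbp, 64 * 1024);
        dbp->open(dbp, NULL, file, NULL, DB_BTREE, DB_CREATE, 0644);
    }

    /* Multi-megabyte working buffers go on the heap, never on the stack. */
    char *alloc_work_buffer(void)
    {
        return (malloc(WORK_BUF_SIZE));
    }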

4. Berkeley DB configuration
a. Do not use logging; otherwise we would have to store another copy of the dataset,
   which is too much in this case, as we are inserting 10 billion records that can
   each be 0.75KB to 2KB in size.
b. Use a huge cache; otherwise the internal nodes won't fit into the cache and btree
   search will be too slow.
c. With the cache size set to 6GB on seneca, using DB_PRIVATE is very slow and CPU
   usage stays below 10%, because the cache lives in anonymous virtual memory that is
   paged to disk. When switched to file-backed mapping, CPU usage can exceed 70% and
   the program is much faster. (A later observation showed that CPU usage falls back
   below 10% after several million key/data pairs have been inserted; only when
   key/data pairs are inserted in key order does insertion stay fast. This is practical
   though, since we can always find sequentially increasing keys to use, and rely on
   secondary keys for the various real keys.)
d. We may need to start multiple processes and insert simultaneously, and we may need
   to split the cache into multiple files on platforms that limit the maximum file size.
e. Use a partitioned database, so that no single database file becomes too huge. If
   multiple processes insert at the same time, each into its own database file,
   concurrency can be improved a lot; here, though, I used only one process.
   (A configuration sketch follows this list.)
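Putting (a) through (e) together, here is a minimal configuration sketch. The directory, file name, partition count and partition callback are all illustrative, DB->set_partition requires a Berkeley DB release with partitioned-database support, and error handling is omitted:

    #include <db.h>

    #define NPARTS 16                        /* illustrative partition count */

    /* Illustrative partition callback: spread keys over NPARTS partitions by
     * hashing the first bytes of the key. */
    u_int32_t partition_cb(DB *dbp, DBT *key)
    {
        const unsigned char *p = key->data;
        u_int32_t h = 0, i;

        for (i = 0; i < key->size && i < 8; i++)
            h = h * 31 + p[i];
        return (h % NPARTS);
    }

    void open_env_and_db(DB_ENV **envp, DB **dbpp)
    {
        DB_ENV *env;
        DB *dbp;

        db_env_create(&env, 0);
        /* (b)(c)(d) A huge, file-backed cache: 6GB in one region; raise the
         * last argument to split the cache across several files on platforms
         * that limit the maximum file size. DB_PRIVATE is not used, so the
         * regions are file mappings rather than anonymous memory. */
        env->set_cachesize(env, 6, 0, 1);
        /* (a) Only the memory pool is initialized: no DB_INIT_LOG/DB_INIT_TXN,
         * so no second copy of the data is written to a log. */
        env->open(env, "./dbenv", DB_CREATE | DB_INIT_MPOOL, 0);

        db_create(&dbp, env, 0);
        dbp->set_pagesize(dbp, 64 * 1024);
        /* (e) Partition the database so no single file gets enormous. */
        dbp->set_partition(dbp, NPARTS, NULL, partition_cb);
        dbp->open(dbp, NULL, "test.db", NULL, DB_BTREE, DB_CREATE, 0644);

        *envp = env;
        *dbpp = dbp;
    }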

5. When I changed to ordered (increasing) keys, insertion became much faster, as
   expected, because the search step becomes a very cheap operation in this case.
   With such a huge dataset, always use predictable, sequentially increasing keys,
   such as sequence numbers (1, 2, 3, ...). For example, suppose we want to insert
   many Person objects, each with an 'ID' field, but we cannot obtain the Person
   objects ordered by 'ID'. Then we should use a sequence as the key of the primary
   db containing the Person objects, and create secondary databases, each keyed on
   one of the properties of Person. The 'ID' secondary db is much smaller, because
   the 'data' item of each of its key/data pairs is only a sequence number, so
   finding a Person object by its ID is much easier and faster, and insertion stays
   fast at the same time. (A short sketch follows.)
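As an illustration of that layout (the Person record layout and all names here are my own, not from the original test; error handling omitted), an 'ID' secondary index over a sequence-keyed primary might look like this:

    #include <stdint.h>
    #include <string.h>
    #include <db.h>

    /* Illustrative Person record: the real test data is opaque here. */
    struct person {
        uint64_t id;
        char     name[64];
        /* ... the rest of the ~0.5KB-1KB payload ... */
    };

    /* Secondary key callback: extract the 'ID' field from the primary data. */
    int get_person_id(DB *secondary, const DBT *pkey, const DBT *pdata, DBT *skey)
    {
        const struct person *p = pdata->data;

        memset(skey, 0, sizeof(*skey));
        skey->data = (void *)&p->id;
        skey->size = sizeof(p->id);
        return (0);
    }

    /* 'primary' is keyed by a DB_SEQUENCE number; 'id_index' maps ID to that
     * sequence key, so its data items are tiny and lookups by ID stay cheap. */
    void build_id_index(DB *primary, DB *id_index)
    {
        primary->associate(primary, NULL, id_index, get_person_id, DB_CREATE);
    }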

6. When the search phase begins, the pages are gradually loaded into the cache, but
   since the keys were chosen randomly, the search is not fast: a lot of internal
   btree pages keep getting swapped in and out of the cache.
