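I am currently using the audit facility provided by the Linux auditd daemon, and found that its performance does not hold up under a large volume of system calls.
I used valgrind to profile it.
To profile auditd in stages, I first disabled log writing with write_logs = no in /etc/audit/auditd.conf. Then, after auditd receives and processes the data that kauditd sends over netlink, I stopped it from forwarding events to /sbin/audispd. This is done by commenting out the forwarding code in the distribute_event function in src/auditd.c: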
void distribute_event(struct auditd_event *e)
{
    int attempt = 0, route = 1, proto;

    if (config.log_format == LF_ENRICHED)
        proto = AUDISP_PROTOCOL_VER2;
    else
        proto = AUDISP_PROTOCOL_VER;

    /* If type is 0, then its a network originating event */
    if (e->reply.type == 0) {
        // See if we are distributing network originating events
        if (!dispatch_network_events())
            route = 0;
        else {  // We only need the original type if its being routed
            e->reply.type = extract_type(e->reply.message);
            // Treat everything from the network as VER2
            // because they are already formatted. This is
            // important when it gets to the dispatcher which
            // can strip node= when its VER1.
            proto = AUDISP_PROTOCOL_VER2;
        }
    } else if (e->reply.type != AUDIT_DAEMON_RECONFIG)
        // All other local events need formatting
        format_event(e);
    else
        route = 0; // Don't DAEMON_RECONFIG events until after enqueue

    /* Make first attempt to send to plugins */
    //if (route && dispatch_event(&e->reply, attempt, proto) == 1)
    //    attempt++; /* Failed sending, retry after writing to disk */

    /* End of Event is for realtime interface - skip local logging of it */
    if (e->reply.type != AUDIT_EOE)
        handle_event(e); /* Write to local disk */

    /* Last chance to send...maybe the pipe is empty now. */
    /* Disabled for profiling: do not forward the event to the audispd process */
    // if ((attempt && route) || (e->reply.type == AUDIT_DAEMON_RECONFIG))
    //    dispatch_event(&e->reply, attempt, proto);

    /* Free msg and event memory */
    cleanup_event(e);
}
After building, start auditd under valgrind and let it process data:
Command: valgrind --tool=callgrind --separate-threads=yes /home/admin/auditd/sbin/auditd
Analysis of the resulting callgrind profile shows that malloc, free, and snprintf account for a relatively high share of the cost.
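To make that finding concrete, here is a minimal sketch of the allocate/format/free pattern that tends to push malloc, free, and snprintf to the top of a profile when it runs once per event, next to a variant that reuses a caller-supplied buffer. This is not auditd's code: format_heap, format_reuse, and the "type=... msg=..." layout are hypothetical names chosen purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-event formatter: a sizing snprintf, a malloc and a
 * formatting snprintf for every record. The caller must free() the
 * returned string, which adds one free per record as well. */
char *format_heap(int type, const char *msg)
{
    assert(msg != NULL);
    int len = snprintf(NULL, 0, "type=%d msg=%s", type, msg);
    if (len < 0)
        return NULL;
    char *out = malloc((size_t)len + 1);
    if (out == NULL)
        return NULL;
    snprintf(out, (size_t)len + 1, "type=%d msg=%s", type, msg);
    return out;
}

/* Same output written into a caller-supplied buffer that can be reused
 * across events, so this path makes no allocator calls.
 * Returns the formatted length, or -1 on error or truncation. */
int format_reuse(char *buf, size_t cap, int type, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    int len = snprintf(buf, cap, "type=%d msg=%s", type, msg);
    return (len < 0 || (size_t)len >= cap) ? -1 : len;
}
```

Whether auditd's hot path can be restructured this way is a separate question; the sketch only shows why a per-record allocation shows up so prominently under load.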
Having looked into memory allocators before, I started by testing tcmalloc to compare performance.
git clone https://github.com/gperftools/gperftools.git
Build and install from the latest code.
Run auditd the same way as before, using LD_PRELOAD to replace the libc allocator.
Test result:
Memory allocation consumes almost no resources.
jemalloc:
git clone https://github.com/jemalloc/jemalloc.git
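Test result:
Test scenario: transferring CentOS-7-x86_64-DVD-1810.iso (4.27 GB) with xftp.
Comparison: ptmalloc (the system default) performs worst, while tcmalloc and jemalloc are roughly on par and about 10% better.
No statistics tool was used for this. Judging by CPU usage alone, tcmalloc and jemalloc stayed at about 10%, against 11.2% for ptmalloc, that is (11.2 - 10) / 11.2 ≈ 10.7% less CPU.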
Optimizing auditd further would require wrapping malloc in its own layer. Since the source calls malloc directly, that would be a large change.
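One possible shape for such a wrapper is sketched below: every allocation goes through a small table of function pointers, so the backend can be swapped in one place instead of at each call site. All names here (audit_allocator, audit_set_allocator, audit_malloc, audit_free, counting_alloc) are hypothetical and do not exist in the audit source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table; not part of the audit sources. */
struct audit_allocator {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
};

/* Default backend: the libc allocator (ptmalloc on glibc). */
static struct audit_allocator current_allocator = { malloc, free };

/* Install a different backend. Call once at startup, before any
 * allocation, so memory is never released by a backend that did not
 * allocate it. */
void audit_set_allocator(struct audit_allocator a)
{
    assert(a.alloc != NULL && a.release != NULL);
    current_allocator = a;
}

void *audit_malloc(size_t size)
{
    return current_allocator.alloc(size);
}

void audit_free(void *ptr)
{
    current_allocator.release(ptr);
}

/* Example backend that counts allocations before delegating to malloc,
 * e.g. to estimate the allocation rate without a profiler. */
unsigned long audit_alloc_calls;

void *counting_alloc(size_t size)
{
    audit_alloc_calls++;
    return malloc(size);
}
```

The cost noted above remains: every direct malloc and free call in the source would have to be changed to go through the wrapper, which is why LD_PRELOAD is the cheaper route for an experiment.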