Performance Optimization (2): Watch Out for the Overhead of "Inefficient STL Usage"
Author:stormQ
Sunday, 17. November 2019 1 03:53PM
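Use reserve to pre-allocate memory

- What reserve does

`reserve(n)` raises the container's capacity (the number of elements it can hold without reallocating its storage) to at least `n`. If `n` is not greater than the current capacity, the call has no effect. reserve never shrinks the capacity and never changes the number of elements in the container.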
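A minimal sketch of these rules, assuming a C++11 compiler. The helper `capacities_after_reserves` and the `ReserveResult` struct are hypothetical names used only for illustration: the helper reserves room for `first` elements, then asks for a smaller `second`, and reports what happened to `capacity()` and `size()`.

```cpp
#include <cassert>   // assert, for checking the returned values
#include <cstddef>
#include <vector>

// Hypothetical result type: what reserve() did to the vector.
struct ReserveResult
{
    std::size_t capacity_after_first;   // capacity() after reserve(first)
    std::size_t capacity_after_second;  // capacity() after reserve(second)
    std::size_t size;                   // size() after both calls
};

// Hypothetical helper: reserve(first), then reserve(second) on an empty vector.
ReserveResult capacities_after_reserves(std::size_t first, std::size_t second)
{
    std::vector<int> v;
    v.reserve(first);               // capacity becomes at least `first`
    std::size_t c1 = v.capacity();
    v.reserve(second);              // no effect if second <= capacity(): reserve never shrinks
    ReserveResult r = {c1, v.capacity(), v.size()};
    return r;
}
```

For example, `capacities_after_reserves(100, 10)` should report a first capacity of at least 100, an unchanged second capacity, and a size of 0, because reserve only affects storage, not the elements.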
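- When to use it

  - If you know how many elements will be added to a std::vector, call reserve to pre-allocate the memory the vector needs before adding them (usually right before the loop that adds them). This keeps the vector from relocating its data as it grows, and so avoids the unnecessary costs of copying the elements, allocating a new buffer, and freeing the old one, which improves performance.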
- Code example

Source:

```cpp
// main.cpp
#include <vector>

constexpr int N = 1980 * 1024;  // 2027520 elements

class TestObject
{
public:
    TestObject(int a, int b, int c) : a_(a), b_(b), c_(c) {}
    int a_;
    int b_;
    int c_;
};

std::vector<TestObject> test_data_1;
std::vector<TestObject> test_data_2;

// Appends N elements without reserving: the vector reallocates repeatedly as it grows.
void poor(std::vector<TestObject> &data)
{
    for (int i = 0; i < N; i++)
    {
        data.push_back(TestObject(i, i+1, i+2));
    }
}

// Reserves room for all N elements first, so the loop never reallocates.
void better(std::vector<TestObject> &data)
{
    data.reserve(N);
    for (int i = 0; i < N; i++)
    {
        data.push_back(TestObject(i, i+1, i+2));
    }
}

int main()
{
    poor(test_data_1);
    better(test_data_2);
    return 0;
}
```
Compile:

```shell
$ g++ -std=c++11 -g -Og -o main_Og main.cpp
```
Function execution times:

How the program is launched | 1st run (us) | 2nd run (us) | 3rd run (us) | 4th run (us) | 5th run (us) |
---|---|---|---|---|---|
./main_Og | | | | | |
The results show that in this example `better()` runs at least twice as fast as `poor()`.
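The article does not show how these times were measured. A minimal sketch of one way to do it, assuming `std::chrono::steady_clock` and a hypothetical helper `elapsed_us` that times any callable; `poor` and `better` here are simplified copies of the example above that take the element count `n` as a parameter (an assumption made so the sketch is easy to run at any size). Timings depend on hardware, compiler flags, and the allocator, so only the relative numbers are meaningful.

```cpp
#include <cassert>   // assert, for checking results
#include <chrono>
#include <vector>

struct TestObject
{
    TestObject(int a, int b, int c) : a_(a), b_(b), c_(c) {}
    int a_, b_, c_;
};

// Appends n elements without reserving (the vector may reallocate many times).
void poor(std::vector<TestObject> &data, int n)
{
    for (int i = 0; i < n; i++)
        data.push_back(TestObject(i, i + 1, i + 2));
}

// Reserves capacity for n elements first, so the loop does not reallocate.
void better(std::vector<TestObject> &data, int n)
{
    data.reserve(n);
    for (int i = 0; i < n; i++)
        data.push_back(TestObject(i, i + 1, i + 2));
}

// Hypothetical helper: runs f once and returns the elapsed wall-clock time in microseconds.
template <typename F>
long long elapsed_us(F f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Usage: `std::vector<TestObject> v; long long us = elapsed_us([&] { better(v, 1980 * 1024); });`. Passing a lambda keeps the timing code separate from the code being measured, so the same helper works for every `poor()`/`better()` pair in this article.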
- Debugging the code

First, look at how std::vector relocates its data while `poor()` adds its first 9 elements. The gdb session is as follows:
```text
(gdb) l 25
20
21 void poor(std::vector<TestObject> &data)
22 {
23 for (int i = 0; i < N; i++)
24 {
25 data.push_back(TestObject(i, i+1, i+2));
26 }
27 }
28
29 void better(std::vector<TestObject> &data)
(gdb) b 25
Breakpoint 3 at 0x401065: file m.cpp, line 25.
(gdb) c
Continuing.
Breakpoint 3, poor (data=std::vector of length 0, capacity 0) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
(gdb) c
Continuing.
Breakpoint 3, poor (data=std::vector of length 1, capacity 1 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
(gdb) display data.size()
1: data.size() = 1
(gdb) display &data[0]
2: &data[0] = (TestObject *) 0x614c20
(gdb) c
Continuing.
Breakpoint 3, poor (data=std::vector of length 2, capacity 2 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 2
2: &data[0] = (TestObject *) 0x614c40
(gdb)
Continuing.
Breakpoint 3, poor (data=std::vector of length 3, capacity 4 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 3
2: &data[0] = (TestObject *) 0x614c60
(gdb)
Continuing.
Breakpoint 3, poor (data=std::vector of length 4, capacity 4 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 4
2: &data[0] = (TestObject *) 0x614c60
(gdb)
Continuing.
Breakpoint 3, poor (data=std::vector of length 5, capacity 8 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 5
2: &data[0] = (TestObject *) 0x614ca0
(gdb)
Continuing.
Breakpoint 3, poor (data=std::vector of length 6, capacity 8 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 6
2: &data[0] = (TestObject *) 0x614ca0
(gdb)
Continuing.
Breakpoint 3, poor (data=std::vector of length 7, capacity 8 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 7
2: &data[0] = (TestObject *) 0x614ca0
(gdb)
Continuing.
Breakpoint 3, poor (data=std::vector of length 8, capacity 8 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 8
2: &data[0] = (TestObject *) 0x614ca0
(gdb)
Continuing.
Breakpoint 3, poor (data=std::vector of length 9, capacity 16 = {...}) at m.cpp:25
25 data.push_back(TestObject(i, i+1, i+2));
1: data.size() = 9
2: &data[0] = (TestObject *) 0x614d10
```
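As the output shows, the address of the first element, `&data[0]`, changes when `poor()` adds the 2nd, 3rd, 5th, and 9th elements. In other words, std::vector relocates its data on each of those insertions, because its capacity is exhausted and grows from 1 to 2, 4, 8, and then 16. Some of the later insertions also trigger relocations; they are not covered here.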
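The same observation can be made without a debugger by checking, around each `push_back`, whether the address of the buffer (`data()`, the same as `&data[0]` for a non-empty vector) changed. A minimal sketch; the helper `relocation_points` is hypothetical, and the exact positions depend on the standard library's growth policy, which the standard leaves unspecified.

```cpp
#include <cassert>   // assert, for checking the returned values
#include <cstdint>   // std::uintptr_t
#include <vector>

// Hypothetical helper: appends n ints to a vector (optionally after reserve(n))
// and returns the 1-based positions of the insertions that relocated the
// existing elements into a new buffer.
std::vector<int> relocation_points(int n, bool use_reserve)
{
    std::vector<int> data;
    if (use_reserve)
        data.reserve(n);
    std::vector<int> points;
    for (int i = 0; i < n; i++)
    {
        // Save the buffer address as an integer so it can still be compared
        // after a reallocation has freed the old buffer.
        bool had_elements = !data.empty();
        std::uintptr_t before = reinterpret_cast<std::uintptr_t>(data.data());
        data.push_back(i);
        std::uintptr_t after = reinterpret_cast<std::uintptr_t>(data.data());
        if (had_elements && after != before)
            points.push_back(i + 1);
    }
    return points;
}
```

With libstdc++, whose growth matches the doubling seen in the gdb session, `relocation_points(9, false)` should return {2, 3, 5, 9}, while `relocation_points(9, true)` returns an empty vector because reserve already provided enough room.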
Next, check whether std::vector relocates its data while `better()` adds its first 9 elements. The gdb session is as follows:
```text
(gdb) l 34
29 void better(std::vector<TestObject> &data)
30 {
31 data.reserve(N);
32 for (int i = 0; i < N; i++)
33 {
34 data.push_back(TestObject(i, i+1, i+2));
35 }
36 }
37
38 int main()
(gdb) b 34
Breakpoint 4 at 0x400ffe: file m.cpp, line 34.
(gdb) c
Continuing.
Breakpoint 4, better (data=std::vector of length 0, capacity 2027520) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
(gdb) c
Continuing.
Breakpoint 4, better (data=std::vector of length 1, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
(gdb) display data.size()
3: data.size() = 1
(gdb) display &data[0]
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb) c
Continuing.
Breakpoint 4, better (data=std::vector of length 2, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 2
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb)
Continuing.
Breakpoint 4, better (data=std::vector of length 3, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 3
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb)
Continuing.
Breakpoint 4, better (data=std::vector of length 4, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 4
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb)
Continuing.
Breakpoint 4, better (data=std::vector of length 5, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 5
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb)
Continuing.
Breakpoint 4, better (data=std::vector of length 6, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 6
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb)
Continuing.
Breakpoint 4, better (data=std::vector of length 7, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 7
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb)
Continuing.
Breakpoint 4, better (data=std::vector of length 8, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 8
4: &data[0] = (TestObject *) 0x7ffff2ba6010
(gdb)
Continuing.
Breakpoint 4, better (data=std::vector of length 9, capacity 2027520 = {...}) at m.cpp:34
34 data.push_back(TestObject(i, i+1, i+2));
3: data.size() = 9
4: &data[0] = (TestObject *) 0x7ffff2ba6010
```
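As the output shows, the address of the first element, `&data[0]`, stays the same while `better()` adds its first 9 elements, so std::vector does not relocate its data. Later insertions do not trigger relocations either; if you are interested, you can confirm this with the conditional breakpoint `b 34 if &data[0]!=0x7ffff2ba6010` (0x7ffff2ba6010 is the value of `&data[0]` in this debugging session). This is exactly the effect of pre-allocating memory with reserve.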
The example above still has one more opportunity for optimization, which the next section explains in detail.
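Use emplace_back to avoid unnecessary overhead

- What emplace_back does

Unlike push_back, which takes an object that has already been constructed, emplace_back forwards its arguments to the element's constructor and constructs the element directly in the container's storage. This avoids the cost of constructing and destroying a temporary object.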
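- When to use it

  - If the elements added to a std::vector are constructed from arguments, use emplace_back rather than push_back so that each element is constructed in place, avoiding unnecessary overhead and improving performance.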
- Code example

Source:

```cpp
// main.cpp
#include <vector>

constexpr int N = 1980 * 1024;  // 2027520 elements

class TestObject
{
public:
    TestObject(int a, int b, int c) : a_(a), b_(b), c_(c) {}
    int a_;
    int b_;
    int c_;
};

std::vector<TestObject> test_data_1;
std::vector<TestObject> test_data_2;

// Builds a temporary TestObject, then moves it into the vector
// (for this trivially copyable type the move is just a copy).
void poor(std::vector<TestObject> &data)
{
    data.reserve(N);
    for (int i = 0; i < N; i++)
    {
        data.push_back(TestObject(i, i+1, i+2));
    }
}

// Forwards the arguments so each element is constructed in place.
void better(std::vector<TestObject> &data)
{
    data.reserve(N);
    for (int i = 0; i < N; i++)
    {
        data.emplace_back(i, i+1, i+2);
    }
}

int main()
{
    poor(test_data_1);
    better(test_data_2);
    return 0;
}
```
Compile:

```shell
$ g++ -std=c++11 -g -Og -o main_Og main.cpp
```
Function execution times:

How the program is launched | 1st run (us) | 2nd run (us) | 3rd run (us) | 4th run (us) | 5th run (us) |
---|---|---|---|---|---|
./main_Og | | | | | |
The results show that in this example `better()` runs at least 1.5 times as fast as `poor()`.
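- Debugging the code

To verify what emplace_back does, add three global variables, `g_count_1`, `g_count_2`, and `g_count_3`, which count how many times the constructor, the copy constructor, and the destructor are called. The modified source is: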
```cpp
// main.cpp
#include <vector>

constexpr int N = 1980 * 1024;  // 2027520 elements

int g_count_1 = 0;  // calls to TestObject(int, int, int)
int g_count_2 = 0;  // calls to the copy constructor
int g_count_3 = 0;  // calls to the destructor

class TestObject
{
public:
    TestObject(int a, int b, int c) : a_(a), b_(b), c_(c)
    {
        g_count_1++;
    }
    // A user-declared copy constructor and destructor stop the compiler from
    // generating a move constructor, so push_back(TestObject(...)) copies here.
    TestObject(const TestObject &other) : a_(other.a_), b_(other.b_), c_(other.c_)
    {
        g_count_2++;
    }
    ~TestObject()
    {
        g_count_3++;
    }
    int a_;
    int b_;
    int c_;
};

std::vector<TestObject> test_data_1;
std::vector<TestObject> test_data_2;

void poor(std::vector<TestObject> &data)
{
    data.reserve(N);
    for (int i = 0; i < N; i++)
    {
        data.push_back(TestObject(i, i+1, i+2));
    }
}

void better(std::vector<TestObject> &data)
{
    data.reserve(N);
    for (int i = 0; i < N; i++)
    {
        data.emplace_back(i, i+1, i+2);
    }
}

int main()
{
    g_count_1 = 0;
    g_count_2 = 0;
    g_count_3 = 0;
    poor(test_data_1);

    g_count_1 = 0;  // breakpoint here: counters reflect poor()
    g_count_2 = 0;
    g_count_3 = 0;
    better(test_data_2);
    g_count_1 = 0;  // breakpoint here: counters reflect better()

    return 0;
}
```
The gdb session is as follows:
```text
(gdb) l
61
62 int main()
63 {
64 g_count_1 = 0;
65 g_count_2 = 0;
66 g_count_3 = 0;
67 poor(test_data_1);
68
69 g_count_1 = 0;
70 g_count_2 = 0;
71 g_count_3 = 0;
72 better(test_data_2);
73 g_count_1 = 0;
74
75 return 0;
76 }
(gdb) b 69
Breakpoint 2 at 0x400b3e: file main.cpp, line 69.
(gdb) b 73
Breakpoint 3 at 0x400b66: file main.cpp, line 73.
(gdb) display g_count_1
1: g_count_1 = 0
(gdb) display g_count_2
2: g_count_2 = 0
(gdb) display g_count_3
3: g_count_3 = 0
(gdb) c
Continuing.
Breakpoint 2, main () at main.cpp:69
69 g_count_1 = 0;
1: g_count_1 = 2027520
2: g_count_2 = 2027520
3: g_count_3 = 2027520
(gdb) n
70 g_count_2 = 0;
1: g_count_1 = 0
2: g_count_2 = 2027520
3: g_count_3 = 2027520
(gdb)
71 g_count_3 = 0;
1: g_count_1 = 0
2: g_count_2 = 0
3: g_count_3 = 2027520
(gdb)
72 better(test_data_2);
1: g_count_1 = 0
2: g_count_2 = 0
3: g_count_3 = 0
(gdb) n
Breakpoint 3, main () at main.cpp:73
73 g_count_1 = 0;
1: g_count_1 = 2027520
2: g_count_2 = 0
3: g_count_3 = 0
(gdb) n
76 }
1: g_count_1 = 0
2: g_count_2 = 0
3: g_count_3 = 0
```
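As the output shows, while `poor()` runs, the element's constructor, copy constructor, and destructor are each called 2027520 times (2027520 = 1980 × 1024). This confirms the three costs of push_back in this example: constructing the temporary, copying it into the vector, and destroying the temporary. While `better()` runs, the constructor, copy constructor, and destructor are called 2027520, 0, and 0 times respectively. This confirms that emplace_back incurs only the necessary cost of constructing each element: no temporary is created, so there is no temporary construction, destruction, or copy, which improves performance.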
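The copies counted above happen because the instrumented TestObject declares a copy constructor and a destructor, which stops the compiler from generating a move constructor. A minimal sketch with a hypothetical `Tracked` type that does have a move constructor shows what remains: push_back then moves instead of copying, but it still constructs and destroys a temporary, while emplace_back does neither. `Tracked`, `Counts`, `g_counts`, and the two counting functions are illustrative names, not part of the article's code.

```cpp
#include <cassert>   // assert, for checking results
#include <vector>

// Hypothetical counters for Tracked's special member functions.
struct Counts
{
    int ctor = 0;  // Tracked(int, int, int)
    int copy = 0;  // copy constructor
    int move = 0;  // move constructor
    int dtor = 0;  // destructor
};

Counts g_counts;

struct Tracked
{
    Tracked(int a, int b, int c) : a_(a), b_(b), c_(c) { ++g_counts.ctor; }
    Tracked(const Tracked &o) : a_(o.a_), b_(o.b_), c_(o.c_) { ++g_counts.copy; }
    Tracked(Tracked &&o) noexcept : a_(o.a_), b_(o.b_), c_(o.c_) { ++g_counts.move; }
    ~Tracked() { ++g_counts.dtor; }
    int a_, b_, c_;
};

// Counts special-member calls while appending n elements with push_back.
Counts count_push_back(int n)
{
    std::vector<Tracked> v;
    v.reserve(n);            // rule out reallocation, so only the appends are counted
    g_counts = Counts();
    for (int i = 0; i < n; i++)
        v.push_back(Tracked(i, i + 1, i + 2));  // temporary, move into v, destroy temporary
    return g_counts;         // copied out before v and its elements are destroyed
}

// Same, but emplace_back constructs each element in place.
Counts count_emplace_back(int n)
{
    std::vector<Tracked> v;
    v.reserve(n);
    g_counts = Counts();
    for (int i = 0; i < n; i++)
        v.emplace_back(i, i + 1, i + 2);        // constructor only
    return g_counts;
}
```

For n elements, `count_push_back` should report n constructions, n moves, and n destructions, while `count_emplace_back` reports only n constructions. A cheap move narrows the gap, but emplace_back still saves a constructor call and a destructor call per element.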
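Use std::move to avoid unnecessary copies

- What std::move does

std::move does not move anything by itself: it casts its argument to an rvalue reference, so that overload resolution selects a move constructor or move assignment operator (when one exists) instead of the copying version. For a type such as std::vector, a move takes over the existing buffer instead of copying the data, which avoids unnecessary copy costs and improves performance.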
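A minimal sketch of the difference, using a hypothetical helper `buffer_reused` that reports whether a newly constructed vector ended up owning the source's buffer. The `Mode::Cast` branch spells out `static_cast<std::vector<int> &&>(src)` to show that it selects the same constructor as `std::move(src)`. Taking over the buffer is what mainstream implementations such as libstdc++ (used in this article) do; the standard itself specifies constant complexity for the move constructor rather than pointer identity.

```cpp
#include <cassert>   // assert, for checking results
#include <utility>   // std::move
#include <vector>

// How the hypothetical helper constructs the destination vector from src.
enum class Mode { Copy, Move, Cast };

// Returns true if the new vector owns the buffer that src owned before.
bool buffer_reused(Mode mode)
{
    std::vector<int> src(1000, 7);
    const int *old_buffer = src.data();
    if (mode == Mode::Copy)
    {
        std::vector<int> dst(src);             // copy constructor: new buffer, elements copied
        return dst.data() == old_buffer;
    }
    if (mode == Mode::Move)
    {
        std::vector<int> dst(std::move(src));  // move constructor: buffer taken over
        return dst.data() == old_buffer;
    }
    // The same rvalue cast that std::move performs, written out by hand.
    std::vector<int> dst(static_cast<std::vector<int> &&>(src));
    return dst.data() == old_buffer;
}
```

With libstdc++, `buffer_reused(Mode::Copy)` should return false, while both `Mode::Move` and `Mode::Cast` return true.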
- When to use it

  - A constructor takes a parameter of type std::vector, and the argument is only needed during construction (for example, to initialize a data member).
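In the code example below, Element provides two constructors, one taking `const std::vector<TestObject> &` (copies) and one taking `std::vector<TestObject> &&` (moves). A commonly used alternative, shown here only for comparison and not what the article uses, is the "pass by value, then move" idiom: a single constructor takes the vector by value and moves it into the member. The `Holder` class and `holder_reuses_buffer` helper below are hypothetical and use `int` elements to keep the sketch short.

```cpp
#include <cassert>   // assert, for checking results
#include <utility>   // std::move
#include <vector>

// Hypothetical class using the pass-by-value-then-move ("sink") idiom.
class Holder
{
public:
    // lvalue argument: copied into `data`, then moved into data_.
    // rvalue argument: moved into `data`, then moved into data_.
    explicit Holder(std::vector<int> data) : data_(std::move(data)) {}

    const int *buffer() const { return data_.data(); }

private:
    std::vector<int> data_;
};

// Returns true if the Holder built from v ended up owning v's original buffer.
bool holder_reuses_buffer(bool move_in)
{
    std::vector<int> v(1000, 1);
    const int *old_buffer = v.data();
    if (move_in)
    {
        Holder h(std::move(v));  // no element copies
        return h.buffer() == old_buffer;
    }
    Holder h(v);                 // one copy of the elements; v keeps its buffer
    return h.buffer() == old_buffer;
}
```

One constructor now covers both cases, at the price of one extra (cheap) vector move compared with the two-overload version. The caller still decides whether to copy or move, exactly as with `data.emplace_back(vec)` versus `data.emplace_back(std::move(vec))`.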
- Code example

Source:

```cpp
// main.cpp
#include <list>
#include <utility>  // std::move
#include <vector>

constexpr int N = 1980 * 1024;  // 2027520 elements

class TestObject
{
public:
    TestObject(int a, int b, int c) : a_(a), b_(b), c_(c) {}
    int a_;
    int b_;
    int c_;
};

class Element
{
public:
    // Chosen for lvalue arguments: copies the caller's vector.
    explicit Element(const std::vector<TestObject> &data)
        : data_(data)
    {
    }
    // Chosen for rvalue arguments: takes over the caller's buffer.
    explicit Element(std::vector<TestObject> &&data)
        : data_(std::move(data))
    {
    }
private:
    std::vector<TestObject> data_;
};

std::list<Element> test_data_1;
std::list<Element> test_data_2;
Element *g_ele = nullptr;  // lets the debugger inspect the Element just added

void poor(std::list<Element> &data)
{
    std::vector<TestObject> vec;
    vec.reserve(N);
    for (int i = 0; i < N; i++)
    {
        vec.push_back(TestObject(i, i+1, i+2));
    }
    data.emplace_back(vec);             // vec is an lvalue: Element copies it
    g_ele = &data.front();
}

void better(std::list<Element> &data)
{
    std::vector<TestObject> vec;
    vec.reserve(N);
    for (int i = 0; i < N; i++)
    {
        vec.emplace_back(i, i+1, i+2);
    }
    data.emplace_back(std::move(vec));  // rvalue: Element takes over vec's buffer
    g_ele = &data.front();
}

int main()
{
    poor(test_data_1);
    better(test_data_2);
    return 0;
}
```
Compile:

```shell
$ g++ -std=c++11 -g -Og -o main_Og main.cpp
```
Function execution times:

How the program is launched | 1st run (us) | 2nd run (us) | 3rd run (us) | 4th run (us) | 5th run (us) |
---|---|---|---|---|---|
./main_Og | | | | | |
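The results show that in this example `better()` runs at least twice as fast as `poor()`. Note that `better()` also fills `vec` with emplace_back instead of push_back, so the measured gap includes the effect of the previous section's optimization as well as that of std::move.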
- Debugging the code

First, verify that `data.emplace_back(vec);` incurs a copy. The gdb session is as follows:
```text
(gdb) l 50
35
36 std::list<Element> test_data_1;
37 std::list<Element> test_data_2;
38 Element *g_ele = nullptr;
39
40 void poor(std::list<Element> &data)
41 {
42 std::vector<TestObject> vec;
43 vec.reserve(N);
44 for (int i = 0; i < N; i++)
45 {
46 vec.push_back(TestObject(i, i+1, i+2));
47 }
48
49 data.emplace_back(vec);
50 g_ele = &data.front();
51 }
52
53 void better(std::list<Element> &data)
54 {
55 std::vector<TestObject> vec;
56 vec.reserve(N);
57 for (int i = 0; i < N; i++)
58 {
59 vec.emplace_back(i, i+1, i+2);
60 }
61
62 data.emplace_back(std::move(vec));
63 g_ele = &data.front();
64 }
(gdb) b 50
Breakpoint 3 at 0x40131c: file p.cpp, line 50.
(gdb) c
Continuing.
Breakpoint 3, poor (data=std::__cxx11::list = {...}) at p.cpp:50
50 g_ele = &data.front();
(gdb) display vec._M_impl
1: vec._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x7ffff5a2b010,
_M_finish = 0x7ffff715f010, _M_end_of_storage = 0x7ffff715f010}
(gdb) n
42 std::vector<TestObject> vec;
1: vec._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x7ffff5a2b010,
_M_finish = 0x7ffff715f010, _M_end_of_storage = 0x7ffff715f010}
(gdb) display g_ele->data_._M_impl
2: g_ele->data_._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x7ffff42f6010,
_M_finish = 0x7ffff5a2a010, _M_end_of_storage = 0x7ffff5a2a010}
```
As the output shows, the buffer of `poor()`'s local variable `vec` spans 0x7ffff5a2b010 to 0x7ffff715f010 (`vec._M_impl._M_start` is 0x7ffff5a2b010 and `vec._M_impl._M_end_of_storage` is 0x7ffff715f010), while the buffer of the `data_` member of the first element of the global `test_data_1` spans 0x7ffff42f6010 to 0x7ffff5a2a010 (`g_ele->data_._M_impl._M_start` is 0x7ffff42f6010 and `g_ele->data_._M_impl._M_end_of_storage` is 0x7ffff5a2a010). The two ranges are different, so that `data_` was copied from `vec`, which confirms that `data.emplace_back(vec);` incurs a copy.
Next, verify that `data.emplace_back(std::move(vec));` does not incur a copy. The gdb session is as follows:
```text
(gdb) b 59
Breakpoint 4 at 0x4010ba: file p.cpp, line 59.
(gdb) b 63
Breakpoint 5 at 0x401147: file p.cpp, line 63.
(gdb) c
Continuing.
poor_function elapsed_time: 59091892 us
Breakpoint 4, better (data=empty std::__cxx11::list) at p.cpp:59
59 vec.emplace_back(i, i+1, i+2);
2: g_ele->data_._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x7ffff42f6010,
_M_finish = 0x7ffff5a2a010, _M_end_of_storage = 0x7ffff5a2a010}
(gdb) display vec._M_impl
3: vec._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x616060, _M_finish = 0x616060,
_M_end_of_storage = 0x1d4a060}
(gdb) d 4
(gdb) c
Continuing.
Breakpoint 5, better (data=std::__cxx11::list = {...}) at p.cpp:63
63 g_ele = &data.front();
2: g_ele->data_._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x7ffff42f6010,
_M_finish = 0x7ffff5a2a010, _M_end_of_storage = 0x7ffff5a2a010}
3: vec._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x0, _M_finish = 0x0,
_M_end_of_storage = 0x0}
(gdb) n
55 std::vector<TestObject> vec;
2: g_ele->data_._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x616060, _M_finish = 0x1d4a060,
_M_end_of_storage = 0x1d4a060}
3: vec._M_impl = {<std::allocator<TestObject>> = {<__gnu_cxx::new_allocator<TestObject>> = {<No data fields>}, <No data fields>}, _M_start = 0x0, _M_finish = 0x0,
_M_end_of_storage = 0x0}
```
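As the output shows, the buffer of `better()`'s local variable `vec` spans 0x616060 to 0x1d4a060 (`vec._M_impl._M_start` is 0x616060 and `vec._M_impl._M_end_of_storage` is 0x1d4a060), and the buffer of the `data_` member of the first element of the global `test_data_2` spans exactly the same range (`g_ele->data_._M_impl._M_start` is 0x616060 and `g_ele->data_._M_impl._M_end_of_storage` is 0x1d4a060). Because the ranges are identical, that `data_` took over `vec`'s buffer through a move, which confirms that `data.emplace_back(std::move(vec));` does not incur a copy. Note also that after the move, `vec` is left in a valid but unspecified state, so its contents must not be relied on; in this libstdc++ run it is empty, with all three pointers set to 0x0.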
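Because a moved-from vector is only guaranteed to be in a valid but unspecified state, the safe things to do with it are to destroy it or to give it a known state again (for example with `clear()` or an assignment) before reusing it. A minimal sketch with a hypothetical helper `reuse_after_move`:

```cpp
#include <cassert>   // assert, for checking results
#include <cstddef>
#include <utility>   // std::move, std::pair
#include <vector>

// Hypothetical helper: moves src into dst, then resets src with clear() and
// refills it. Returns {dst.size(), src.size()}.
std::pair<std::size_t, std::size_t> reuse_after_move(std::size_t refill)
{
    std::vector<int> src(100, 42);
    std::vector<int> dst(std::move(src));  // src is now valid but unspecified
    src.clear();                           // clear() has no preconditions: src is now empty
    for (std::size_t i = 0; i < refill; i++)
        src.push_back(static_cast<int>(i));
    return std::make_pair(dst.size(), src.size());
}
```

`reuse_after_move(5)` should return {100, 5}: `dst` keeps the original 100 elements, and `src` holds exactly the 5 elements added after it was cleared.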