Online vacuum with pg_repack

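PostgreSQL implements MVCC by keeping multiple versions of each row. Deleting a row does not physically remove it; the tuple is only marked as deleted. An update is carried out as a delete plus an insert. In a system with frequent updates, table bloat is therefore a real headache, and if nothing is done about it a table can swell to more than ten times the size of its live data.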
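
To make the bloat arithmetic concrete, here is a minimal sketch of an append-only heap in Python. It is a toy model, not PostgreSQL's storage format: the names TupleVersion and Heap are hypothetical, and a single boolean stands in for the deletion marker on a row version.

```python
from dataclasses import dataclass


@dataclass
class TupleVersion:
    key: int
    value: str
    dead: bool = False  # stands in for the "deleted" marker on a row version


class Heap:
    """Toy append-only heap: versions are marked dead, never removed."""

    def __init__(self):
        self.versions = []

    def insert(self, key, value):
        self.versions.append(TupleVersion(key, value))

    def delete(self, key):
        for v in self.versions:
            if v.key == key and not v.dead:
                v.dead = True  # mark only; the slot stays occupied

    def update(self, key, value):
        self.delete(key)         # the old version becomes dead
        self.insert(key, value)  # the new version is appended

    def live_count(self):
        return sum(not v.dead for v in self.versions)

    def bloat_factor(self):
        # Stored versions divided by live versions.
        return len(self.versions) / self.live_count()


heap = Heap()
for k in range(100):
    heap.insert(k, "v0")
for round_no in range(1, 10):  # nine rounds of updating every row
    for k in range(100):
        heap.update(k, f"v{round_no}")

print(heap.live_count(), len(heap.versions), heap.bloat_factor())
# prints: 100 1000 10.0
```

Nine rounds of updates on 100 rows leave 900 dead versions next to 100 live ones, so the toy table is ten times larger than the data it actually holds.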

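To deal with bloat, PostgreSQL provides VACUUM, which comes in two forms: plain VACUUM and VACUUM FULL. Plain VACUUM cleans up dead tuples but does not reorganize the table, so the space on disk is generally not returned to the operating system. The space held by dead tuples is freed inside the file, however, and later inserts consult the free space map (FSM) and fill that free space first. VACUUM FULL does release disk space, but it takes the strongest of the eight table-level lock modes, ACCESS EXCLUSIVE, because it works by creating a new data file for the table and copying the rows from the old file into it. While that runs, even SELECT is blocked.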
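
The difference between the two forms can be sketched with a toy table file in Python. This is an illustration under simplifying assumptions: the Table class is hypothetical, a slot list stands in for the data file, a linear scan stands in for the FSM lookup, and locking is not modelled at all.

```python
class Table:
    """Toy table file: each slot holds a live row, DEAD, or FREE."""

    DEAD = "<dead>"
    FREE = None

    def __init__(self):
        self.slots = []

    def insert(self, row):
        # Stand-in for the FSM: reuse a free slot before extending the file.
        for i, slot in enumerate(self.slots):
            if slot is self.FREE:
                self.slots[i] = row
                return
        self.slots.append(row)

    def delete(self, row):
        self.slots[self.slots.index(row)] = self.DEAD

    def vacuum(self):
        # Plain vacuum: dead slots become reusable, the file keeps its size.
        self.slots = [self.FREE if s == self.DEAD else s for s in self.slots]

    def vacuum_full(self):
        # Full vacuum: copy only live rows into a new, smaller file.
        self.slots = [s for s in self.slots if s not in (self.DEAD, self.FREE)]


t = Table()
for i in range(10):
    t.insert(f"r{i}")
for i in range(0, 10, 2):
    t.delete(f"r{i}")          # five rows become dead

t.vacuum()
size_after_vacuum = len(t.slots)      # still 10 slots on "disk"
t.insert("new")                       # lands in a freed slot
size_after_insert = len(t.slots)      # file did not grow
t.vacuum_full()
size_after_full = len(t.slots)        # only the 6 live rows remain

print(size_after_vacuum, size_after_insert, size_after_full)
# prints: 10 10 6
```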

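Because of this impact on production workloads, the PostgreSQL community developed the pg_repack tool, known in older versions as pg_reorg. pg_repack is delivered as an extension that users install themselves. This article gives a brief introduction to using it.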

The pg_repack source can be downloaded from GitHub or PGXN. Building, installing, and creating the extension are not covered here.

Once it is installed, the pg_repack command is available on the operating system command line. Here are some example invocations:

pg_repack --no-order --table test_1 test
pg_repack --wait-timeout 3600 --jobs 10 --no-order -d test
pg_repack --wait-timeout 3600 --jobs 10 --no-order --schema=test -d test
pg_repack --wait-timeout 3600 --jobs 10 --only-indexes --table test.test_1 --no-order -d test
pg_repack --wait-timeout 3600 --jobs 10 --index test.idx1 --no-order -d test
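
For scripted runs, the invocations above can be assembled programmatically. The helper below is a hypothetical sketch in Python: build_repack_args is not part of pg_repack, and it only emits the flags that appear in the commands above, without validating which combinations pg_repack accepts.

```python
import shlex


def build_repack_args(dbname, *, table=None, schema=None, index=None,
                      only_indexes=False, jobs=None, wait_timeout=None,
                      no_order=True):
    """Build a pg_repack argument list using only flags shown in this article."""
    args = ["pg_repack"]
    if wait_timeout is not None:
        args += ["--wait-timeout", str(wait_timeout)]
    if jobs is not None:
        args += ["--jobs", str(jobs)]
    if only_indexes:
        args.append("--only-indexes")
    if table:
        args += ["--table", table]
    if schema:
        args.append(f"--schema={schema}")
    if index:
        args += ["--index", index]
    if no_order:
        args.append("--no-order")
    args += ["-d", dbname]
    return args


cmd = build_repack_args("test", table="test.test_1", only_indexes=True,
                        jobs=10, wait_timeout=3600)
print(shlex.join(cmd))
# prints: pg_repack --wait-timeout 3600 --jobs 10 --only-indexes --table test.test_1 --no-order -d test
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a host where pg_repack is installed. Passing a list instead of a shell string avoids quoting problems with table or schema names.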

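In testing, running concurrent queries against a table while pg_repack was processing it reduced performance by only about 10% to 20%, and reads continued to work normally. The table's OID did not change. After the repack finished, pg_relation_filepath() showed that the table's data file had changed, and the original data file was deleted, as the before and after queries below show. Running VACUUM FULL changes the data file in the same way.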

test=# select pg_relation_filepath(16475);
 pg_relation_filepath 
----------------------
 base/16387/16580
(1 row)

test=# select pg_relation_filepath(16475);
 pg_relation_filepath 
----------------------
 base/16387/16601
(1 row)

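Now for how repack works. The principle is similar to VACUUM FULL: create a new file, copy the data over from the old file, then switch files. The key to not blocking reads and writes is that the new file is created and filled online. Until the copy is complete, the original file can still be read and written; only the brief moment of the table swap may have an impact.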

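So how does it copy online? The source table keeps changing, so its data is effectively split into two parts: base data and incremental data. The base data is copied with an ordinary copy. The incremental data is captured by a trigger created on the table, which records the write operations (INSERT, UPDATE, DELETE) performed on it. Once the base data has been copied, the changes captured by the trigger are applied to reach the final result.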
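
The base-plus-incremental idea can be sketched in a few lines of Python. This is a simplified model, not pg_repack's implementation: online_repack is a hypothetical function, dictionaries keyed by primary key stand in for tables, and the writes that arrive during the copy are passed in as a list instead of happening concurrently.

```python
def online_repack(source, concurrent_writes):
    """Rebuild `source` (dict: pk -> row) while writes keep arriving.

    `concurrent_writes` is a list of (op, pk, row) tuples that hit the
    source table during the base copy.
    """
    log = []

    def trigger(op, pk, row):
        # Stand-in for the row-level trigger: record every write.
        log.append((op, pk, row))

    # 1. With the trigger in place, take the base copy from a snapshot.
    new_table = dict(source)

    # 2. Writes keep landing on the source table; the trigger logs them.
    for op, pk, row in concurrent_writes:
        if op == "DELETE":
            source.pop(pk, None)
        else:  # INSERT or UPDATE
            source[pk] = row
        trigger(op, pk, row)

    # 3. Replay the captured changes on the new table in arrival order.
    for op, pk, row in log:
        if op == "DELETE":
            new_table.pop(pk, None)
        else:
            new_table[pk] = row

    # 4. The caller swaps the new table in for the old one.
    return new_table


source = {1: "a", 2: "b", 3: "c"}
writes = [("INSERT", 4, "d"), ("UPDATE", 2, "B"), ("DELETE", 1, None)]
rebuilt = online_repack(source, writes)

print(sorted(rebuilt.items()), rebuilt == source)
# prints: [(2, 'B'), (3, 'c'), (4, 'd')] True
```

Replay order matters: applying the log in the order the writes arrived is what makes the rebuilt table converge to the same state as the source.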

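Looking in the database, we can see that pg_repack creates a schema named repack, which contains two views: primary_keys and tables. primary_keys has two columns: the first, indrelid, is the OID of the table, and the second, indexrelid, is the OID of its primary key or unique index. tables records the statements used to create the trigger and to capture and apply changes, with one record per table, as shown below: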

test=# select * from tables;
-[ RECORD 1 ]-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname           | public.products
relid             | 16388
reltoastrelid     | 16391
reltoastidxid     | 16393
schemaname        | public
pkid              | 
ckid              | 
create_pktype     | 
create_log        | CREATE TABLE repack.log_16388 (id bigserial PRIMARY KEY, pk repack.pk_16388, row public.products)
create_trigger    | 
enable_trigger    | ALTER TABLE public.products ENABLE ALWAYS TRIGGER repack_trigger
create_table_1    | CREATE TABLE repack.table_16388 WITH (oids = false) TABLESPACE 
tablespace_orig   | pg_default
create_table_2    |  AS SELECT product_no,name,price FROM ONLY public.products
copy_data         | INSERT INTO repack.table_16388 SELECT product_no,name,price FROM ONLY public.products
alter_col_storage | 
drop_columns      | 
delete_log        | DELETE FROM repack.log_16388
lock_table        | LOCK TABLE public.products IN ACCESS EXCLUSIVE MODE
ckey              | 
sql_peek          | SELECT * FROM repack.log_16388 ORDER BY id LIMIT $1
sql_insert        | INSERT INTO repack.table_16388 VALUES ($1.*)
sql_delete        | 
sql_update        | 
sql_pop           | DELETE FROM repack.log_16388 WHERE id IN (
-[ RECORD 2 ]-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname           | public.test
relid             | 16400
reltoastrelid     | 0
reltoastidxid     | 0
schemaname        | public
pkid              | 
ckid              | 
create_pktype     | 
create_log        | CREATE TABLE repack.log_16400 (id bigserial PRIMARY KEY, pk repack.pk_16400, row public.test)
create_trigger    | 
enable_trigger    | ALTER TABLE public.test ENABLE ALWAYS TRIGGER repack_trigger
create_table_1    | CREATE TABLE repack.table_16400 WITH (oids = false) TABLESPACE 
tablespace_orig   | pg_default
create_table_2    |  AS SELECT id FROM ONLY public.test
copy_data         | INSERT INTO repack.table_16400 SELECT id FROM ONLY public.test
alter_col_storage | 
drop_columns      | 
delete_log        | DELETE FROM repack.log_16400
lock_table        | LOCK TABLE public.test IN ACCESS EXCLUSIVE MODE
ckey              | 
sql_peek          | SELECT * FROM repack.log_16400 ORDER BY id LIMIT $1
sql_insert        | INSERT INTO repack.table_16400 VALUES ($1.*)
sql_delete        | 
sql_update        | 
sql_pop           | DELETE FROM repack.log_16400 WHERE id IN (
-[ RECORD 3 ]-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname           | public.fruits
relid             | 16394
reltoastrelid     | 16397
reltoastidxid     | 16399
schemaname        | public
pkid              | 
ckid              | 
create_pktype     | 
create_log        | CREATE TABLE repack.log_16394 (id bigserial PRIMARY KEY, pk repack.pk_16394, row public.fruits)
create_trigger    | 
enable_trigger    | ALTER TABLE public.fruits ENABLE ALWAYS TRIGGER repack_trigger
create_table_1    | CREATE TABLE repack.table_16394 WITH (oids = false) TABLESPACE 
tablespace_orig   | pg_default
create_table_2    |  AS SELECT number,name,price FROM ONLY public.fruits
copy_data         | INSERT INTO repack.table_16394 SELECT number,name,price FROM ONLY public.fruits
alter_col_storage | 
drop_columns      | 
delete_log        | DELETE FROM repack.log_16394
lock_table        | LOCK TABLE public.fruits IN ACCESS EXCLUSIVE MODE
ckey              | 
sql_peek          | SELECT * FROM repack.log_16394 ORDER BY id LIMIT $1
sql_insert        | INSERT INTO repack.table_16394 VALUES ($1.*)
sql_delete        | 
sql_update        | 
sql_pop           | DELETE FROM repack.log_16394 WHERE id IN (
-[ RECORD 4 ]-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname           | public.test_1
relid             | 16475
reltoastrelid     | 0
reltoastidxid     | 0
schemaname        | public
pkid              | 16493
ckid              | 
create_pktype     | CREATE TYPE repack.pk_16475 AS (c1 integer)
create_log        | CREATE TABLE repack.log_16475 (id bigserial PRIMARY KEY, pk repack.pk_16475, row public.test_1)
create_trigger    | CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.test_1 FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('INSERT INTO repack.log_16475(pk, row) VALUES( CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.c1)::repack.pk_16475) END, $2)')
enable_trigger    | ALTER TABLE public.test_1 ENABLE ALWAYS TRIGGER repack_trigger
create_table_1    | CREATE TABLE repack.table_16475 WITH (oids = false) TABLESPACE 
tablespace_orig   | pg_default
create_table_2    |  AS SELECT c1,c2,c3,c4,c5 FROM ONLY public.test_1
copy_data         | INSERT INTO repack.table_16475 SELECT c1,c2,c3,c4,c5 FROM ONLY public.test_1
alter_col_storage | 
drop_columns      | 
delete_log        | DELETE FROM repack.log_16475
lock_table        | LOCK TABLE public.test_1 IN ACCESS EXCLUSIVE MODE
ckey              | 
sql_peek          | SELECT * FROM repack.log_16475 ORDER BY id LIMIT $1
sql_insert        | INSERT INTO repack.table_16475 VALUES ($1.*)
sql_delete        | DELETE FROM repack.table_16475 WHERE (c1) = ($1.c1)
sql_update        | UPDATE repack.table_16475 SET (c1, c2, c3, c4, c5) = ($2.c1, $2.c2, $2.c3, $2.c4, $2.c5) WHERE (c1) = ($1.c1)
sql_pop           | DELETE FROM repack.log_16475 WHERE id IN (
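
Reading the create_trigger text in record 4, each log row carries the old row's key in pk and the new row in row. The sketch below shows, in Python, how such a log could be replayed through the sql_insert, sql_delete, and sql_update statements in batches bounded by sql_peek and sql_pop. The dispatch rule (pk empty means insert, row empty means delete, both present means update) is an interpretation of that output and not a copy of pg_repack's code; classify and apply_log are hypothetical names, and a dictionary keyed by c1 stands in for repack.table_16475.

```python
def classify(pk, row):
    """Pick the statement column that would replay one log row."""
    if pk is None and row is not None:
        return "sql_insert"   # no old row: the write was an INSERT
    if pk is not None and row is None:
        return "sql_delete"   # no new row: the write was a DELETE
    if pk is not None and row is not None:
        return "sql_update"   # old key and new row: an UPDATE
    raise ValueError("log row with neither pk nor row")


def apply_log(log, table, batch_size=2):
    """Replay log rows in id order, one batch at a time."""
    applied = 0
    while log:
        # sql_peek: oldest rows first, limited to one batch.
        batch = sorted(log, key=lambda r: r["id"])[:batch_size]
        for rec in batch:
            kind = classify(rec["pk"], rec["row"])
            if kind == "sql_insert":
                table[rec["row"]["c1"]] = rec["row"]
            elif kind == "sql_delete":
                del table[rec["pk"]]
            else:
                del table[rec["pk"]]
                table[rec["row"]["c1"]] = rec["row"]
        # sql_pop: remove the rows that were just applied.
        done = {r["id"] for r in batch}
        log[:] = [r for r in log if r["id"] not in done]
        applied += len(batch)
    return applied


table = {1: {"c1": 1, "c2": "a"}, 2: {"c1": 2, "c2": "b"}}
log = [
    {"id": 1, "pk": None, "row": {"c1": 3, "c2": "c"}},  # INSERT
    {"id": 2, "pk": 1, "row": {"c1": 1, "c2": "A"}},     # UPDATE
    {"id": 3, "pk": 2, "row": None},                      # DELETE
]
n = apply_log(log, table)
print(n, sorted(table), table[1]["c2"])
# prints: 3 [1, 3] A
```

Records 1 to 3 above have empty pkid, create_trigger, sql_delete, and sql_update fields, which is consistent with this scheme: without a key to identify the old row, deletes and updates could not be replayed.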

All right, that's it for now. Keep at it.
