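Summary
MySQL 5.7 supports two kinds of Generated Column: Virtual Generated Column and Stored Generated Column. The former keeps only the column definition in the data dictionary (the table's metadata) and does not persist the column's values to disk; the values are computed when rows are read. The latter persists the values to disk instead of computing them on every read. A Stored Generated Column therefore holds data that could be derived from existing columns and takes extra disk space, so in most cases it offers little advantage over a Virtual Generated Column. Accordingly, when the type is not specified, MySQL 5.7 defaults to Virtual Generated Column.
- If you think you need a Stored Generated Column, creating an index on a Virtual Generated Column may be a better fit, because the index itself holds the computed values.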
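The difference between the two kinds can be pictured with a small Python analogy. This is only a sketch: the class names and the price/quantity/total fields are hypothetical and have nothing to do with MySQL internals. A virtual column behaves like a property evaluated on read, a stored column like a field filled in on write.

```python
class VirtualOrder:
    """Analogy for a VIRTUAL generated column: derived on every read."""

    def __init__(self, price: int, quantity: int) -> None:
        self.price = price
        self.quantity = quantity

    @property
    def total(self) -> int:
        # Recomputed on each access; takes no storage of its own.
        return self.price * self.quantity


class StoredOrder:
    """Analogy for a STORED generated column: computed once per write."""

    def __init__(self, price: int, quantity: int) -> None:
        self.write(price, quantity)

    def write(self, price: int, quantity: int) -> None:
        # The derived value is saved next to the base columns,
        # so it costs extra space but needs no work on read.
        self.price = price
        self.quantity = quantity
        self.total = price * quantity


virtual = VirtualOrder(3, 4)
stored = StoredOrder(3, 4)
print(virtual.total, stored.total)
print("total" in vars(virtual), "total" in vars(stored))
```

The last line shows the trade-off: only the stored variant actually keeps a `total` value with the row.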
Syntax
<type> [ GENERATED ALWAYS ] AS ( <expression> ) [ VIRTUAL|STORED ] [ UNIQUE [ KEY ] ] [ NOT NULL ] [ COMMENT <text> ]
Practical application
1. Table structure
mysql> show create table fen_simpic \G
*************************** 1. row ***************************
Table: fen_simpic
Create Table: CREATE TABLE `fen_simpic` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`group` int(11) NOT NULL COMMENT 'Video post ID of the screenshot',
`item` int(2) NOT NULL COMMENT 'Sequence number of the screenshot',
`mh` char(144) DEFAULT NULL COMMENT 'Hamming hash value of the screenshot',
`dct` bigint(20) unsigned DEFAULT NULL COMMENT 'DCT hash value of the screenshot',
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Record creation time',
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`),
KEY `group` (`group`,`item`)
) ENGINE=InnoDB AUTO_INCREMENT=2599837 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql>
2. Slow SQL and its execution plan
mysql> explain select `group`, `item` , dct , mh, bit_count(dct^17228540329887592107) as dist from fen_simpic force index(created_at) where created_at<"2018-05-08 21:44:09" and created_at>"2018-04-09 10:15:50.463238" and `group` not in (120381696,120381705,120381709,120381714,120381718,120381736,120381747,120381753,120381763,120381776,120381787,120381808,120381820,120381837,120381857,120381861,120382022,120381776) and (`item`>=3 and `item`<=5) having dist<=26 order by dist limit 5000;
+----+-------------+------------+------------+-------+---------------+------------+---------+------+---------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+------------+---------+------+---------+----------+----------------------------------------------------+
| 1 | SIMPLE | fen_simpic | NULL | range | created_at | created_at | 4 | NULL | 1071840 | 5.55 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------------+-------+---------------+------------+---------+------+---------+----------+----------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql>
3. Query time profile
mysql> show profile for query 52;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.008504 |
| checking permissions | 0.000009 |
| Opening tables | 0.000028 |
| init | 0.000049 |
| System lock | 0.000012 |
| optimizing | 0.000017 |
| statistics | 0.000107 |
| preparing | 0.000025 |
| Sorting result | 0.000006 |
| executing | 0.000003 |
| Sending data | 0.000010 |
| Creating sort index | 1.088568 |
| end | 0.000011 |
| query end | 0.000013 |
| closing tables | 0.000010 |
| freeing items | 0.000270 |
| logging slow query | 0.000060 |
| cleaning up | 0.000018 |
+----------------------+----------+
18 rows in set, 1 warning (0.00 sec)
4. Creating the virtual column
mysql> alter table fen_simpic add column dist tinyint(1) generated always as (bit_count(dct^17228540329887592107)) virtual;
mysql> alter table fen_simpic add index idx_dist(dist);
mysql> show create table fen_simpic \G
*************************** 1. row ***************************
Table: fen_simpic
Create Table: CREATE TABLE `fen_simpic` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`group` int(11) NOT NULL COMMENT 'Video post ID of the screenshot',
`item` int(2) NOT NULL COMMENT 'Sequence number of the screenshot',
`mh` char(144) DEFAULT NULL COMMENT 'Hamming hash value of the screenshot',
`dct` bigint(20) unsigned DEFAULT NULL COMMENT 'DCT hash value of the screenshot',
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Record creation time',
`dist` tinyint(1) GENERATED ALWAYS AS (bit_count((`dct` ^ 17228540329887592107))) VIRTUAL,
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`),
KEY `group` (`group`,`item`),
KEY `idx_dist` (`dist`)
) ENGINE=InnoDB AUTO_INCREMENT=2599837 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql>
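The expression behind the dist column, bit_count(dct ^ 17228540329887592107), counts the bits in which a row's dct hash differs from the reference hash, which is the Hamming distance between the two 64-bit values. A minimal Python sketch of the same calculation follows; the function names are hypothetical and the code mirrors only the arithmetic, not MySQL's implementation.

```python
REFERENCE_HASH = 17228540329887592107  # constant taken from the queries in this post
MASK_64 = (1 << 64) - 1                # dct is BIGINT UNSIGNED, i.e. 64 bits wide


def bit_count(value: int) -> int:
    """Count the 1 bits of a 64-bit value, like BIT_COUNT() does."""
    return bin(value & MASK_64).count("1")


def dist(dct: int, reference: int = REFERENCE_HASH) -> int:
    """XOR leaves a 1 wherever the two hashes differ; count those bits."""
    return bit_count(dct ^ reference)


print(dist(REFERENCE_HASH))            # identical hashes
print(dist(REFERENCE_HASH ^ 0b1011))   # three flipped bits
print(dist(REFERENCE_HASH ^ MASK_64))  # every bit flipped
```

Because the result always lies between 0 and 64, it fits comfortably in a TINYINT column, whose signed range is -128 to 127.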
5. Running the SQL against the virtual column
mysql> explain select `group`, `item` , dct , mh, dist from fen_simpic force index(idx_dist) where created_at<"2018-05-08 21:44:09" and created_at>"2018-04-09 10:15:50.463238" and `group` not in (120381696,120381705,120381709,120381714,120381718,120381736,120381747,120381753,120381763,120381776,120381787,120381808,120381820,120381837,120381857,120381861,120382022,120381776) and (`item`>=3 and `item`<=5) having dist<=26 order by dist limit 5000;
+----+-------------+------------+------------+-------+---------------+----------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+----------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | fen_simpic | NULL | index | NULL | idx_dist | 2 | NULL | 2502423 | 0.62 | Using where |
+----+-------------+------------+------------+-------+---------------+----------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql>
6. Query time profile
mysql> show profile for query 57;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000133 |
| checking permissions | 0.000009 |
| Opening tables | 0.000029 |
| init | 0.000049 |
| System lock | 0.000012 |
| optimizing | 0.000016 |
| statistics | 0.000029 |
| preparing | 0.000023 |
| Sorting result | 0.000006 |
| executing | 0.000003 |
| Sending data | 0.212587 |
| end | 0.000013 |
| query end | 0.000012 |
| closing tables | 0.000012 |
| freeing items | 0.000279 |
| cleaning up | 0.000018 |
+----------------------+----------+
16 rows in set, 1 warning (0.00 sec)
7. Further improved SQL
mysql> explain select t1.`group`, t1.`item` , t1.dct , t1.dist from fen_simpic t1 inner join (select id,dist from fen_simpic force index(idx_dist) where created_at<"2018-05-08 21:44:09" and created_at>"2018-04-09 10:15:50.463238" and `group` not in (120381696,120381705,120381709,120381714,120381718,120381736,120381747,120381753,120381763,120381776,120381787,120381808,120381820,120381837,120381857,120381861,120382022,
+----+-------------+------------+------------+--------+---------------+----------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+----------+---------+-------+---------+----------+-------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 5000 | 100.00 | NULL |
| 1 | PRIMARY | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | t2.id | 1 | 100.00 | NULL |
| 2 | DERIVED | fen_simpic | NULL | index | NULL | idx_dist | 2 | NULL | 2502423 | 0.62 | Using where |
+----+-------------+------------+------------+--------+---------------+----------+---------+-------+---------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
mysql>
8. Time profile of the further improved SQL
mysql> show profile for query 58;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.005367 |
| checking permissions | 0.000007 |
| checking permissions | 0.000005 |
| Opening tables | 0.000032 |
| init | 0.000081 |
| System lock | 0.000013 |
| optimizing | 0.000015 |
| optimizing | 0.000015 |
| statistics | 0.000031 |
| preparing | 0.000023 |
| Sorting result | 0.000010 |
| statistics | 0.000026 |
| preparing | 0.000011 |
| executing | 0.000009 |
| Sending data | 0.000009 |
| executing | 0.000002 |
| Sending data | 0.201685 |
| end | 0.000012 |
| query end | 0.000013 |
| closing tables | 0.000005 |
| removing tmp table | 0.000008 |
| closing tables | 0.000009 |
| freeing items | 0.000340 |
| cleaning up | 0.000028 |
+----------------------+----------+
24 rows in set, 1 warning (0.00 sec)
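The shape of the rewrite in step 7 (a narrow inner query that returns only id and dist, joined back to the table on the primary key) can be sketched in Python. Everything below is hypothetical: the sample rows, the function names, and the filter are made up for illustration, and the sketch does not describe how InnoDB executes the statement.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


def narrow_scan(rows: Dict[int, Row], keep: Callable[[Row], bool],
                max_dist: int, limit: int) -> List[Tuple[int, int]]:
    """Inner query: keep only (id, dist) pairs, ordered by dist."""
    hits = [(row_id, row["dist"]) for row_id, row in rows.items()
            if row["dist"] <= max_dist and keep(row)]
    hits.sort(key=lambda pair: pair[1])
    return hits[:limit]


def join_back(rows: Dict[int, Row], hits: List[Tuple[int, int]],
              columns: List[str]) -> List[Row]:
    """Outer query: fetch each surviving id by primary key, project columns."""
    return [{name: rows[row_id][name] for name in columns}
            for row_id, _ in hits]


# Made-up sample data standing in for the table, keyed by id.
rows = {
    1: {"group": 10, "item": 3, "dct": 111, "dist": 30},
    2: {"group": 11, "item": 4, "dct": 222, "dist": 5},
    3: {"group": 12, "item": 9, "dct": 333, "dist": 1},
    4: {"group": 13, "item": 5, "dct": 444, "dist": 12},
}
hits = narrow_scan(rows, keep=lambda r: 3 <= r["item"] <= 5,
                   max_dist=26, limit=5000)
result = join_back(rows, hits, ["group", "item", "dct", "dist"])
print(hits)
print(result)
```

Only the small (id, dist) pairs travel through the intermediate result; the wider columns are read just for the rows that survive the filter and the limit.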
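Conclusion
- The original SQL used force index(created_at) because, when a filter matches more than roughly 30% of the table's rows, the optimizer generally chooses a full table scan over the index; forcing the index avoids that. In addition, when several indexes are candidates, the optimizer prefers the one with better selectivity.
- The profile of the original query shows that most of the time goes to Creating sort index. The sort key is dist, which is not a real column in the original table, so the expression has to be evaluated for every candidate row before the result set can be sorted.
- A virtual column helps a great deal when sorting and filtering on a computed value like this one.
- The further rewrite after the optimization is meant to reduce the amount of data returned: the derived table carries only id and dist, and the remaining columns are fetched by primary key for the rows that survive.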
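The second and third points can be illustrated with a toy model in Python. It is a sketch under stated assumptions: DistIndex is a hypothetical stand-in for idx_dist that keeps entries ordered at write time, and it is not a description of InnoDB's B-tree. Without the index, every value is computed and then sorted at query time; with it, rows are read in dist order and the scan stops once the limit is reached.

```python
import bisect
from typing import Iterable, List, Tuple

REFERENCE_HASH = 17228540329887592107  # constant taken from the queries in this post


def dist(dct: int) -> int:
    """Hamming distance between dct and the reference hash."""
    return bin(dct ^ REFERENCE_HASH).count("1")


def compute_then_sort(dcts: Iterable[int], max_dist: int, limit: int) -> List[int]:
    """No index: evaluate the expression for every row, filter, sort, cut."""
    return sorted(d for d in map(dist, dcts) if d <= max_dist)[:limit]


class DistIndex:
    """Toy ordered index: (dist, dct) entries kept sorted as rows are written."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int]] = []

    def insert(self, dct: int) -> None:
        bisect.insort(self.entries, (dist(dct), dct))

    def scan(self, max_dist: int, limit: int) -> List[int]:
        """Read in index order; no sort step, stop once limit rows qualify."""
        out: List[int] = []
        for d, _ in self.entries:
            if d <= max_dist:
                out.append(d)
                if len(out) == limit:
                    break
        return out


dcts = [REFERENCE_HASH ^ mask for mask in (0b111, 0b1, 0, 0b11111)]
index = DistIndex()
for dct in dcts:
    index.insert(dct)

print(compute_then_sort(dcts, max_dist=4, limit=2))
print(index.scan(max_dist=4, limit=2))
```

Both calls return the same rows in the same order; the difference is that the indexed path pays the ordering cost when rows are written rather than on every query.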