MySQL Official Documentation: Range Optimization

        The range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections give descriptions of conditions under which the optimizer uses range access.

1. The Range Access Method for Single-Part Indexes
2. The Range Access Method for Multiple-Part Indexes
3. Equality Range Optimization of Many-Valued Comparisons
4. Limiting Memory Use for Range Optimization
5. Range Optimization of Row Constructor Expressions

The Range Access Method for Single-Part Indexes

        For a single-part index, index value intervals can be conveniently represented by corresponding conditions in the WHERE clause, denoted as range conditions rather than “intervals.”

The definition of a range condition for a single-part index is as follows:

        (1) For both BTREE and HASH indexes, comparison of a key part with a constant value is a range condition when using the =, <=>, IN(), IS NULL, or IS NOT NULL operators.

        (2) Additionally, for BTREE indexes, comparison of a key part with a constant value is a range condition when using the >, <, >=, <=, BETWEEN, !=, or <> operators, or LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character.

        For all index types, multiple range conditions combined with OR or AND form a range condition.

        “Constant value” in the preceding descriptions means one of the following:

        (1) A constant from the query string

        (2) A column of a const or system table from the same join

        (3) The result of an uncorrelated subquery

        (4) Any expression composed entirely from subexpressions of the preceding types

Here are some examples of queries with range conditions in the WHERE clause:

SELECT * FROM t1 WHERE key_col > 1 AND key_col < 10;
SELECT * FROM t1 WHERE key_col = 1 OR key_col IN (15,18,20);
SELECT * FROM t1 WHERE key_col LIKE 'ab%' OR key_col BETWEEN 'bar' AND 'foo';

        Some nonconstant values may be converted to constants during the optimizer constant propagation phase.
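
To check whether the optimizer actually chooses range access for queries like the examples above, EXPLAIN can be run against a test table. The sketch below assumes a hypothetical table t1 with an index named idx_key_col on key_col; the index name, the extra column other, and the column types are illustrative and do not come from the original text. The last statement is a nonconstant comparison that constant propagation can turn into a constant one.

```sql
-- Hypothetical test table; names and types are assumptions for illustration.
CREATE TABLE t1 (
  id      INT PRIMARY KEY,
  key_col INT,
  other   INT,
  KEY idx_key_col (key_col)
);

-- If range access is chosen, the "type" column of EXPLAIN shows "range"
-- and the "key" column shows idx_key_col. The choice is cost-based, so a
-- small or empty table may produce a different plan.
EXPLAIN SELECT * FROM t1 WHERE key_col > 1 AND key_col < 10;
EXPLAIN SELECT * FROM t1 WHERE key_col = 1 OR key_col IN (15, 18, 20);

-- other = 5 is a constant comparison, so key_col > other can be treated
-- as key_col > 5 once the constant has been propagated.
EXPLAIN SELECT * FROM t1 WHERE other = 5 AND key_col > other;
```

Comparing the EXPLAIN rows before and after loading realistic data is usually more informative than looking at an empty table.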

        MySQL tries to extract range conditions from the WHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.

        (Translator's note: in other words, the optimizer uses the index as far as possible to build range conditions and narrow the scanned interval.)

        Consider the following statement, where key1 is an indexed column and nonkey is not indexed:

SELECT * FROM t1 WHERE
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z');

        The extraction process for key key1 is as follows:

1. Start with the original WHERE clause:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z')

2. Remove nonkey = 4 and key1 LIKE '%b' because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE, so that we do not miss any matching rows when doing the range scan. Replacing them with TRUE yields:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR
(key1 < 'bar' AND TRUE) OR
(key1 < 'uux' AND key1 > 'z')

3. Collapse conditions that are always true or false:

• (key1 LIKE 'abcde%' OR TRUE) is always true

• (key1 < 'uux' AND key1 > 'z') is always false

Replacing these conditions with constants yields:

(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)

Removing unnecessary TRUE and FALSE constants yields:

(key1 < 'abc') OR (key1 < 'bar')

4. Combining overlapping intervals into one yields the final condition to be used for the range scan:

(key1 < 'bar')

        In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause.
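
One way to observe the extracted range for the walk-through above is the optimizer trace. This is a minimal sketch that assumes a hypothetical table t_extract with an index idx_key1 on key1 and an unindexed column nonkey; the table and index names are not from the original text, and the exact layout of the trace varies between MySQL versions.

```sql
-- Hypothetical table matching the example: key1 is indexed, nonkey is not.
CREATE TABLE t_extract (
  key1   VARCHAR(20),
  nonkey INT,
  KEY idx_key1 (key1)
);

-- Record an optimizer trace for the next statement in this session.
SET optimizer_trace = 'enabled=on';

SELECT * FROM t_extract WHERE
  (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
  (key1 < 'bar' AND nonkey = 4) OR
  (key1 < 'uux' AND key1 > 'z');

-- Read the trace of the statement just executed. Its range analysis part
-- lists the ranges considered for idx_key1, which should correspond to
-- the final condition key1 < 'bar' derived in the steps above.
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

In the EXPLAIN output for the same statement, an Extra value such as "Using where" or "Using index condition" reflects the additional check against the full WHERE clause.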

        The range condition extraction algorithm can handle nested AND/OR constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in the WHERE clause.

        MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.

        (Translator's note: I have not fully understood this part.)
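
The UNION workaround can be sketched as follows. The table places, its columns, and the two polygons are hypothetical, and the SRID column attribute assumes MySQL 8.0; the point is only that each spatial predicate gets its own SELECT instead of being combined with OR.

```sql
-- Hypothetical table; the SRID attribute syntax assumes MySQL 8.0.
CREATE TABLE places (
  id  INT PRIMARY KEY,
  loc POINT NOT NULL SRID 0,
  SPATIAL INDEX idx_loc (loc)
);

-- Instead of one SELECT whose WHERE clause ORs two spatial predicates,
-- write two otherwise identical SELECT statements and combine them.
SELECT id FROM places
WHERE MBRContains(ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'), loc)
UNION
SELECT id FROM places
WHERE MBRContains(ST_GeomFromText('POLYGON((20 20, 20 30, 30 30, 30 20, 20 20))'), loc);
```

UNION (rather than UNION ALL) also removes rows that happen to match both predicates, which mirrors the semantics of OR.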

The Range Access Method for Multiple-Part Indexes

        Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.          

        For example, consider a multiple-part index defined as key1(key_part1, key_part2, key_part3), and the following set of key tuples listed in key order:

key_part1 key_part2 key_part3
NULL       1         'abc'
NULL       1         'xyz'
NULL       2         'foo'
1          1         'abc'
1          1         'xyz'
1          2         'abc'
2          1         'aaa'

The condition key_part1 = 1 defines this interval:

(1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)

Here, inf denotes infinity.

        The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.

        By contrast, the condition key_part3 = 'abc' does not define a single interval and cannot be used by the range access method.
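
The contrast between the two conditions can be tried out with a small table. This sketch assumes a hypothetical table t2 whose three-part index is named key1, loaded with the seven tuples listed above; the column types are an assumption.

```sql
-- Hypothetical table with a three-part index; column types are assumed.
CREATE TABLE t2 (
  key_part1 INT,
  key_part2 INT,
  key_part3 VARCHAR(10),
  KEY key1 (key_part1, key_part2, key_part3)
);

INSERT INTO t2 VALUES
  (NULL, 1, 'abc'), (NULL, 1, 'xyz'), (NULL, 2, 'foo'),
  (1, 1, 'abc'), (1, 1, 'xyz'), (1, 2, 'abc'), (2, 1, 'aaa');

-- The leading key part is constrained, so the matching rows form one
-- contiguous interval of the index. EXPLAIN may label a plain equality
-- like this as "ref" rather than "range".
EXPLAIN SELECT * FROM t2 WHERE key_part1 = 1;

-- Only the last key part is constrained, so the matching rows are
-- scattered across the index and do not form a single interval.
-- What EXPLAIN shows instead depends on the server version and the data.
EXPLAIN SELECT * FROM t2 WHERE key_part3 = 'abc';
```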

        The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.

        For HASH indexes, each interval containing identical values can be used. This means that the interval can be produced only for conditions in the following form:

key_part1 cmp const1  
AND key_part2 cmp const2
AND ...
AND key_partN cmp constN;

        Here, const1, const2, … are constants, cmp is one of the =, <=>, or IS NULL comparison operators, and the conditions cover all index parts. (That is, there are N conditions, one for each part of an N-part index.)

        For example, the following is a range condition for a three-part HASH index:

key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'

        For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).

        The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

The single interval is:

('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)

        It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0), which does not satisfy the original condition because key_part3 = 0 fails the comparison key_part3 > 10.
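
The effect of a non-equality operator on later key parts can be observed through the key_len column of EXPLAIN. The sketch below assumes a hypothetical table t3 with an index idx_parts; the names and types are illustrative, and the plan actually chosen depends on the data.

```sql
-- Hypothetical table; NOT NULL columns keep key_len easy to interpret.
CREATE TABLE t3 (
  key_part1 VARCHAR(10) NOT NULL,
  key_part2 INT NOT NULL,
  key_part3 INT NOT NULL,
  payload   INT,
  KEY idx_parts (key_part1, key_part2, key_part3)
);

-- key_part2 uses >=, so interval construction stops after the second
-- key part; key_len is expected to cover key_part1 and key_part2 only.
EXPLAIN SELECT * FROM t3
WHERE key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10;

-- With = on key_part2, the third key part can contribute as well, so
-- key_len is expected to be larger than in the previous plan.
EXPLAIN SELECT * FROM t3
WHERE key_part1 = 'foo' AND key_part2 = 10 AND key_part3 > 10;
```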

        If conditions that cover sets of rows contained within intervals are combined with OR, they form a condition that covers a set of rows contained within the union of their intervals. If the conditions are combined with AND, they form a condition that covers a set of rows contained within the intersection of their intervals. For example, for this condition on a two-part index:

(key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)

The intervals are:

(1,-inf) < (key_part1,key_part2) < (1,2)
(5,-inf) < (key_part1,key_part2) 

        In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The key_len column in the EXPLAIN output indicates the maximum length of the key prefix used.

        In some cases, key_len may indicate that a key part was used, but that might be not what you would expect. Suppose that key_part1 and key_part2 can be NULL. Then the key_len column displays two key part lengths for the following condition:

key_part1 >= 1 AND key_part2 < 2

But, in fact, the condition is converted to this:

key_part1 >= 1 AND key_part2 IS NOT NULL

        The section "The Range Access Method for Single-Part Indexes" describes how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index. Analogous steps are performed for range conditions on multiple-part indexes.

Equality Range Optimization of Many-Valued Comparisons

Consider these expressions, where col_name is an indexed column:

col_name IN(val1, ..., valN)
col_name = val1 OR ... OR col_name = valN

        Each expression is true if col_name is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:

1. If there is a unique index on col_name, the row estimate for each range is 1 because at most one row can have the given value.

2. Otherwise, any index on col_name is nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.

        With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.

        Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.

        The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. To disable use of statistics and always use index dives regardless of N, set eq_range_index_dive_limit to 0.

        To update table index statistics for best estimates, use ANALYZE TABLE.
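
As a rough illustration of the N + 1 rule, the statements below change the setting for the current session only. The value 4 and the table name t1 are arbitrary examples, not recommendations.

```sql
-- Permit index dives for comparisons of up to 3 equality ranges
-- (N = 3, so the variable is set to N + 1 = 4). Longer value lists
-- are estimated from index statistics instead.
SET SESSION eq_range_index_dive_limit = 4;
SELECT @@SESSION.eq_range_index_dive_limit;

-- Always use index dives, regardless of the number of values.
SET SESSION eq_range_index_dive_limit = 0;

-- Refresh index statistics so statistics-based estimates stay useful
-- (t1 is a hypothetical table name).
ANALYZE TABLE t1;
```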

        Prior to MySQL 8.0, there is no way of skipping the use of index dives to estimate index usefulness, except by using the eq_range_index_dive_limit system variable. In MySQL 8.0, index dive skipping is possible for queries that satisfy all these conditions:

        (1) The query is for a single table, not a join on multiple tables.

        (2) A single-index FORCE INDEX index hint is present. The idea is that if index use is forced, there is nothing to be gained from the additional overhead of performing dives into the index.

        (3) The index is nonunique and not a FULLTEXT index.

        (4) No subquery is present.

        (5) No DISTINCT, GROUP BY, or ORDER BY clause is present.

        For EXPLAIN FOR CONNECTION, the output changes as follows if index dives are skipped:

        (1) For traditional output, the rows and filtered values are NULL.

        (2) For JSON output, rows_examined_per_scan and rows_produced_per_join do not appear, skip_index_dive_due_to_force is true, and cost calculations are not accurate.

        Without FOR CONNECTION, EXPLAIN output does not change when index dives are skipped.

        After execution of a query for which index dives are skipped, the corresponding row in the INFORMATION_SCHEMA.OPTIMIZER_TRACE table contains an index_dives_for_range_access value of skipped_due_to_force_index.
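
A query that meets the conditions listed above might look like the following sketch, which assumes MySQL 8.0 and a hypothetical table t_dive with a nonunique index idx_key_col. The final SELECT only checks whether the trace text mentions skipped_due_to_force_index.

```sql
-- Hypothetical single table with one nonunique, non-FULLTEXT index.
CREATE TABLE t_dive (
  id      INT PRIMARY KEY,
  key_col INT,
  KEY idx_key_col (key_col)
);

SET optimizer_trace = 'enabled=on';

-- Single table, single-index FORCE INDEX hint, no subquery, and no
-- DISTINCT, GROUP BY, or ORDER BY clause.
SELECT * FROM t_dive FORCE INDEX (idx_key_col)
WHERE key_col IN (10, 20, 30);

-- Returns 1 if the trace of the statement above records that index
-- dives were skipped because of the forced index, 0 otherwise.
SELECT TRACE LIKE '%skipped_due_to_force_index%' AS dives_skipped
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```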

Limiting Memory Use for Range Optimization

        To control the memory available to the range optimizer, use the range_optimizer_max_mem_size system variable:

        (1) A value of 0 means “no limit.”

        (2) With a value greater than 0, the optimizer tracks the memory consumed when considering the range access method. If the specified limit is about to be exceeded, the range access method is abandoned and other methods, including a full table scan, are considered instead. This could be less optimal. If this happens, the following warning occurs (where N is the current range_optimizer_max_mem_size value):

Warning 3170 Memory capacity of N bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query.

        For individual queries that exceed the available range optimization memory and for which the optimizer falls back to less optimal plans, increasing the range_optimizer_max_mem_size value may improve performance.
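
A typical session-level adjustment might look like this. The value of 16 MiB is an arbitrary example, and the placeholder comment stands for whatever query triggered the warning.

```sql
-- Inspect the current limit (in bytes).
SELECT @@range_optimizer_max_mem_size;

-- Raise the limit for this session only; 16777216 bytes = 16 MiB.
SET SESSION range_optimizer_max_mem_size = 16777216;

-- ... rerun the affected query here ...

-- Warning 3170 appears in this list if the limit was still exceeded.
SHOW WARNINGS;

-- Alternatively, remove the limit for this session.
SET SESSION range_optimizer_max_mem_size = 0;
```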

        To estimate the amount of memory needed to process a range expression, use these guidelines:

        (1) For a simple query such as the following, where there is one candidate key for the range access method, each predicate combined with OR uses approximately 230 bytes:

SELECT COUNT(*) FROM t WHERE a=1 OR a=2 OR a=3 OR ... OR a=N;

        (2) Similarly for a query such as the following, each predicate combined with AND uses approximately 125 bytes:

SELECT COUNT(*) FROM t WHERE a=1 AND b=1 AND c=1 ... N

        (3) For a query with IN() predicates:

SELECT COUNT(*) FROM t WHERE a IN (1,2, ..., M) AND b IN (1,2, ..., N);

        Each literal value in an IN() list counts as a predicate combined with OR. If there are two IN() lists, the number of predicates combined with OR is the product of the number of literal values in each list. Thus, the number of predicates combined with OR in the preceding case is M × N.
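
As a worked example of these guidelines, take hypothetical list sizes M = 200 and N = 400 (values chosen only for illustration). The arithmetic can be done directly in SQL and compared with the current limit:

```sql
-- 200 x 400 literal combinations count as 80000 predicates combined
-- with OR; at roughly 230 bytes each that is about 18400000 bytes.
SELECT
  200 * 400                      AS or_predicates,
  200 * 400 * 230                AS approx_bytes,
  @@range_optimizer_max_mem_size AS current_limit_bytes;
```

If approx_bytes is larger than a nonzero current_limit_bytes, the range access method is likely to be abandoned for such a query unless the limit is raised. The 230-byte figure is only the approximation given above.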

Range Optimization of Row Constructor Expressions

        The optimizer is able to apply the range scan access method to queries of this form:

SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));

        Previously, for range scans to be used, it was necessary to write the query as:

SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
OR ( col_1 = 'c' AND col_2 = 'd' );

        For the optimizer to use a range scan, queries must satisfy these conditions:

        (1) Only IN() predicates are used, not NOT IN().

        (2) On the left side of the IN() predicate, the row constructor contains only column references.

        (3) On the right side of the IN() predicate, row constructors contain only runtime constants, which are either literals or local column references that are bound to constants during execution. (Translator's note: I read this as allowing either a constant value or a column value that is effectively constant at execution time.)

        (4) On the right side of the IN() predicate, there is more than one row constructor. (Translator's note: with a single row constructor the comparison could presumably be written with = instead, which would explain why at least two are required.)
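
To compare the two query forms, EXPLAIN can be run on both. This sketch assumes a hypothetical table t_rows with a two-part index idx_cols on (col_1, col_2); the names and types are illustrative.

```sql
-- Hypothetical table with a two-part index on the compared columns.
CREATE TABLE t_rows (
  col_1   VARCHAR(10),
  col_2   VARCHAR(10),
  payload INT,
  KEY idx_cols (col_1, col_2)
);

-- Row constructor form: column references on the left, more than one
-- row of literals on the right, and IN() rather than NOT IN().
EXPLAIN SELECT * FROM t_rows
WHERE (col_1, col_2) IN (('a', 'b'), ('c', 'd'));

-- Equivalent expanded form that was previously required.
EXPLAIN SELECT * FROM t_rows
WHERE (col_1 = 'a' AND col_2 = 'b') OR (col_1 = 'c' AND col_2 = 'd');
```

When the conditions above are met, both statements would be expected to produce the same access type, though the plan still depends on the data in the table.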


        I hope readers and more experienced colleagues will point out anything I have misunderstood. Thank you!

        Previous post: https://blog.csdn.net/qwerdf10010/article/details/80514301
