(Translation) JVM Concurrent Mark Sweep (CMS) Collector 1.8


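We recently upgraded our production JDK to 1.8. Since then, every deployment has triggered a Full GC alert. The GC log showed four consecutive Full GCs while the application restarted, but none for a long time afterwards (I checked the GC log and the jstat statistics again the next day). I looked up the official documentation and translated it along the way for future reference. The translation is far from perfect, so please bear with me; suggestions are welcome and much appreciated.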

Original article: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html

The Concurrent Mark Sweep (CMS) collector is designed for applications that prefer shorter garbage collection pauses and that can afford to share processor resources with the garbage collector while the application is running. Typically applications that have a relatively large set of long-lived data (a large tenured generation) and run on machines with two or more processors tend to benefit from the use of this collector. However, this collector should be considered for any application with a low pause time requirement. The CMS collector is enabled with the command-line option -XX:+UseConcMarkSweepGC.

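Key characteristics of the CMS collector: (1) shorter garbage collection pauses (stop-the-world, STW); (2) the collector shares processor resources with the application while it runs. Applications that hold a relatively large set of long-lived data (a large tenured generation) and run on multiprocessor machines typically benefit from CMS. More generally, any application with a low pause time requirement should consider it. Enable CMS with -XX:+UseConcMarkSweepGC.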
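To confirm which collectors a running JVM actually uses after passing the flag, the standard java.lang.management API can list them. This is a minimal sketch; the class name ActiveCollectors is made up for illustration. On a JDK 8 HotSpot JVM started with -XX:+UseConcMarkSweepGC the reported names are expected to be "ParNew" and "ConcurrentMarkSweep", but that is an assumption not stated in the source, so check the output on your own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: lists the collectors the running JVM reports.
public class ActiveCollectors {

    // Returns the collector names exposed through the platform MXBeans.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one line per collector with its cumulative count and time.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", timeMs=" + bean.getCollectionTime());
        }
    }
}
```

Running it as `java -XX:+UseConcMarkSweepGC ActiveCollectors` on JDK 8 should show whether the option took effect.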


Similar to the other available collectors, the CMS collector is generational; thus both minor and major collections occur. The CMS collector attempts to reduce pause times due to major collections by using separate garbage collector threads to trace the reachable objects concurrently with the execution of the application threads. During each major collection cycle, the CMS collector pauses all the application threads for a brief period at the beginning of the collection and again toward the middle of the collection. The second pause tends to be the longer of the two pauses. Multiple threads are used to do the collection work during both pauses. The remainder of the collection (including most of the tracing of live objects and sweeping of unreachable objects) is done with one or more garbage collector threads that run concurrently with the application. Minor collections can interleave with an ongoing major cycle, and are done in a manner similar to the parallel collector (in particular, the application threads are stopped during minor collections).

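Like the other collectors, CMS is generational, so both minor (young generation) and major (tenured generation) collections occur. CMS tries to reduce the pauses caused by major collections by tracing reachable objects on separate garbage collector threads that run concurrently with the application threads. During a major collection cycle CMS stops the world twice: (1) at the start of the collection, for the initial mark; (2) toward the middle of the collection, for the remark. The second pause tends to be longer than the first. Both pauses use multiple threads. The rest of the collection (including most of the tracing of live objects and the sweeping of unreachable objects) is done by one or more threads running concurrently with the application. Minor collections can interleave with an ongoing major cycle and work much as they do in the parallel collector (application threads are stopped, STW, during a minor collection).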

Concurrent Mode Failure

The CMS collector uses one or more garbage collector threads that run simultaneously with the application threads with the goal of completing the collection of the tenured generation before it becomes full. As described previously, in normal operation, the CMS collector does most of its tracing and sweeping work with the application threads still running, so only brief pauses are seen by the application threads. However, if the CMS collector is unable to finish reclaiming the unreachable objects before the tenured generation fills up, or if an allocation cannot be satisfied with the available free space blocks in the tenured generation, then the application is paused and the collection is completed with all the application threads stopped. The inability to complete a collection concurrently is referred to as concurrent mode failure and indicates the need to adjust the CMS collector parameters. If a concurrent collection is interrupted by an explicit garbage collection (System.gc()) or for a garbage collection needed to provide information for diagnostic tools, then a concurrent mode interruption is reported.

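CMS runs one or more garbage collector threads concurrently with the application threads, aiming to finish collecting the tenured generation before it fills up. As described above, in normal operation CMS does most of its tracing and sweeping while the application threads are still running, so the application sees only brief pauses. However, if the tenured generation fills up before the collection finishes, or if an allocation cannot be satisfied from the free blocks available in the tenured generation, all application threads are paused (STW) until the collection completes. Failing to complete a collection concurrently is called a concurrent mode failure, and it signals that the CMS parameters need adjusting. If a concurrent collection is interrupted by an explicit call (System.gc()) or by a collection needed to provide information to diagnostic tools, a concurrent mode interruption is reported.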

Excessive GC Time and OutOfMemoryError

The CMS collector throws an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, then an OutOfMemoryError is thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.

The policy is the same as that in the parallel collector, except that time spent performing concurrent collections is not counted toward the 98% time limit. In other words, only collections performed while the application is stopped count toward excessive GC time. Such collections are typically due to a concurrent mode failure or an explicit collection request (for example, a call to System.gc).

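CMS throws an OutOfMemoryError when too much time is spent in garbage collection: more than 98% of the total time is spent collecting while less than 2% of the heap is recovered. (Translator's note: total time relative to what, exactly? I have not found an explanation yet.) The feature is meant to keep an application from running for a long time while making little or no progress because the heap is too small. It can be disabled with -XX:-UseGCOverheadLimit.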

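The policy is the same as in the parallel collector, except that time spent in concurrent collection does not count toward the 98% limit. In other words, only collections performed while the application is stopped (STW) count toward excessive GC time. Such collections are typically caused by a concurrent mode failure or an explicit collection request (for example, a call to System.gc).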
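The rule above can be written down as a small predicate. This is only a sketch of the rule as the text states it, not HotSpot's actual implementation: the class and method names are hypothetical, and the measurement window is passed in as a parameter because the source does not say what "total time" is measured against.

```java
// Hypothetical illustration of the "98% of time, under 2% recovered" rule.
public class GcOverheadRule {

    // stoppedGcMillis: time spent in collections performed while the
    //                  application was stopped (concurrent time excluded).
    // windowMillis:    the total time being considered (an assumption).
    // recoveredBytes:  heap space recovered by those collections.
    // heapBytes:       total heap size.
    static boolean exceedsLimit(double stoppedGcMillis, double windowMillis,
                                long recoveredBytes, long heapBytes) {
        double gcTimeShare = stoppedGcMillis / windowMillis;
        double recoveredShare = (double) recoveredBytes / heapBytes;
        return gcTimeShare > 0.98 && recoveredShare < 0.02;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 99% of the window in GC, about 1% of a 1024 MB heap recovered.
        System.out.println(exceedsLimit(9900, 10000, 10 * mb, 1024 * mb));
        // Same GC time, but about 10% of the heap recovered.
        System.out.println(exceedsLimit(9900, 10000, 100 * mb, 1024 * mb));
    }
}
```

Both conditions have to hold at once, so a collector that spends nearly all its time in stopped collections but still frees a meaningful part of the heap does not trip the rule.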

Floating Garbage

The CMS collector, like all the other collectors in Java HotSpot VM, is a tracing collector that identifies at least all the reachable objects in the heap. In the parlance of Richard Jones and Rafael D. Lins in their publication Garbage Collection: Algorithms for Automated Dynamic Memory, it is an incremental update collector. Because application threads and the garbage collector thread run concurrently during a major collection, objects that are traced by the garbage collector thread may subsequently become unreachable by the time the collection process ends. Such unreachable objects that have not yet been reclaimed are referred to as floating garbage. The amount of floating garbage depends on the duration of the concurrent collection cycle and on the frequency of reference updates, also known as mutations, by the application. Furthermore, because the young generation and the tenured generation are collected independently, each acts as a source of roots to the other. As a rough guideline, try increasing the size of the tenured generation by 20% to account for the floating garbage. Floating garbage in the heap at the end of one concurrent collection cycle is collected during the next collection cycle.

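Like every other collector in the Java HotSpot VM, CMS is a tracing collector that identifies at least all the reachable objects in the heap. In the terminology of Garbage Collection: Algorithms for Automated Dynamic Memory, it is an incremental update collector. Because application threads and the garbage collector thread run concurrently during a major collection, objects that the collector has already traced may become unreachable by the time the collection ends. Such unreachable but not yet reclaimed objects are called floating garbage. The amount of floating garbage depends on how long the concurrent collection cycle lasts and on how often the application updates references (also known as mutations). In addition, because the young and tenured generations are collected independently, each acts as a source of roots for the other. As a rule of thumb, increase the tenured generation by 20% to make room for floating garbage. Floating garbage left at the end of one concurrent cycle is collected in the next cycle.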
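As a worked example of the 20% guideline, the arithmetic looks like this. The class name and the 2048 MB starting size are made up for illustration; the result is a starting point for tuning, not a guaranteed correct size.

```java
// Hypothetical helper: applies the rough 20% floating-garbage guideline.
public class TenuredSizing {

    // Returns the tenured size in MB after adding 20% headroom.
    // Integer division rounds the headroom down to a whole MB.
    static long withFloatingGarbageHeadroomMb(long tenuredMb) {
        return tenuredMb + tenuredMb / 5;
    }

    public static void main(String[] args) {
        // 2048 MB plus 20% (409 MB after rounding down) gives 2457 MB.
        System.out.println(withFloatingGarbageHeadroomMb(2048));
    }
}
```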

Pauses

The CMS collector pauses an application twice during a concurrent collection cycle. The first pause is to mark as live the objects directly reachable from the roots (for example, object references from application thread stacks and registers, static objects and so on) and from elsewhere in the heap (for example, the young generation). This first pause is referred to as the initial mark pause. The second pause comes at the end of the concurrent tracing phase and finds objects that were missed by the concurrent tracing due to updates by the application threads of references in an object after the CMS collector had finished tracing that object. This second pause is referred to as the remark pause.

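CMS pauses the application (STW) twice during one concurrent collection cycle. The first pause marks as live the objects directly reachable from the GC roots (for example, object references from application thread stacks and registers, static objects and so on) and from elsewhere in the heap (for example, the young generation). This is the initial mark pause. The second pause comes at the end of the concurrent tracing phase and finds the objects that concurrent tracing missed because application threads updated references in an object after CMS had finished tracing it. This is the remark pause.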

Concurrent Phases

The concurrent tracing of the reachable object graph occurs between the initial mark pause and the remark pause. During this concurrent tracing phase one or more concurrent garbage collector threads may be using processor resources that would otherwise have been available to the application. As a result, compute-bound applications may see a commensurate fall in application throughput during this and other concurrent phases even though the application threads are not paused. After the remark pause, a concurrent sweeping phase collects the objects identified as unreachable. Once a collection cycle completes, the CMS collector waits, consuming almost no computational resources, until the start of the next major collection cycle.

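Concurrent tracing of the reachable object graph happens between the initial mark pause and the remark pause. During this phase, one or more garbage collector threads may use processor resources that the application could otherwise have used. As a result, compute-bound applications may see a corresponding drop in throughput during this and the other concurrent phases, even though the application threads are not paused. After the remark pause, a concurrent sweep reclaims the objects identified as unreachable. Once the cycle completes, CMS waits, consuming almost no computational resources, until the next major collection cycle starts.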

Starting a Concurrent Collection Cycle

With the serial collector a major collection occurs whenever the tenured generation becomes full and all application threads are stopped while the collection is done. In contrast, the start of a concurrent collection must be timed such that the collection can finish before the tenured generation becomes full; otherwise, the application would observe longer pauses due to concurrent mode failure. There are several ways to start a concurrent collection.

Based on recent history, the CMS collector maintains estimates of the time remaining before the tenured generation will be exhausted and of the time needed for a concurrent collection cycle. Using these dynamic estimates, a concurrent collection cycle is started with the aim of completing the collection cycle before the tenured generation is exhausted. These estimates are padded for safety, because concurrent mode failure can be very costly.

A concurrent collection also starts if the occupancy of the tenured generation exceeds an initiating occupancy (a percentage of the tenured generation). The default value for this initiating occupancy threshold is approximately 92%, but the value is subject to change from release to release. This value can be manually adjusted using the command-line option -XX:CMSInitiatingOccupancyFraction=<N>, where <N> is an integral percentage (0 to 100) of the tenured generation size.

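With the serial collector, a major collection happens whenever the tenured generation becomes full, and all application threads are stopped until it finishes. A concurrent collection, by contrast, has to start early enough to finish before the tenured generation fills up; otherwise the application sees longer pauses caused by concurrent mode failure. There are several ways to start a concurrent collection.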

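Based on recent history, CMS maintains two estimates: how long until the tenured generation is exhausted, and how long a concurrent collection cycle takes. Using these dynamic estimates, it starts a cycle in time to finish before the tenured generation is exhausted. The estimates are padded for safety, because a concurrent mode failure can be very costly.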

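A concurrent collection also starts when the occupancy of the tenured generation exceeds the initiating occupancy (a percentage of the tenured generation). The default threshold is approximately 92%, but it may change from release to release. It can be set with -XX:CMSInitiatingOccupancyFraction=<N>, where <N> is an integer percentage (0 to 100) of the tenured generation size.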
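The occupancy trigger is a plain percentage comparison, which the following sketch makes concrete. The class and method names are hypothetical, and treating the committed size of a memory pool as its capacity is an assumption made for this illustration; how HotSpot computes occupancy internally is not described in the source.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: compares pool occupancy with an initiating percentage.
public class InitiatingOccupancy {

    // True when used/capacity is strictly above fractionPercent.
    // Cross-multiplying keeps the comparison in integer arithmetic.
    static boolean exceedsInitiatingOccupancy(long used, long capacity, int fractionPercent) {
        return used * 100 > capacity * (long) fractionPercent;
    }

    public static void main(String[] args) {
        int fractionPercent = 92; // the approximate default quoted above
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getCommitted() <= 0) {
                continue; // pool not valid or nothing committed yet
            }
            System.out.println(pool.getName()
                    + ": used=" + usage.getUsed()
                    + ", committed=" + usage.getCommitted()
                    + ", above " + fractionPercent + "%: "
                    + exceedsInitiatingOccupancy(usage.getUsed(), usage.getCommitted(), fractionPercent));
        }
    }
}
```

The loop prints every heap pool because pool names differ between collectors; pick out the tenured generation's pool by name on your own JVM.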

Scheduling Pauses

The pauses for the young generation collection and the tenured generation collection occur independently. They do not overlap, but may occur in quick succession such that the pause from one collection, immediately followed by one from the other collection, can appear to be a single, longer pause. To avoid this, the CMS collector attempts to schedule the remark pause roughly midway between the previous and next young generation pauses. This scheduling is currently not done for the initial mark pause, which is usually much shorter than the remark pause.

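The pauses caused by young generation and tenured generation collections occur independently. They do not overlap, but they can follow each other so closely that a pause from one collection, immediately followed by a pause from the other, looks like a single longer pause. To avoid this, CMS tries to schedule the remark pause roughly midway between the previous and the next young generation pause. This scheduling is not currently done for the initial mark pause, which is usually much shorter than the remark pause.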

Incremental Mode

Note that the incremental mode is being deprecated in Java SE 8 and may be removed in a future major release.

Note that incremental mode is deprecated in Java SE 8 and may be removed in a future major release, so the parts about i-cms are not translated here.

Measurements

Example 8-1, “Output from the CMS Collector” is the output from the CMS collector with the options -verbose:gc and -XX:+PrintGCDetails, with a few minor details removed. Note that the output for the CMS collector is interspersed with the output from the minor collections; typically many minor collections occur during a concurrent collection cycle. CMS-initial-mark indicates the start of the concurrent collection cycle, CMS-concurrent-mark indicates the end of the concurrent marking phase, and CMS-concurrent-sweep marks the end of the concurrent sweeping phase. Not discussed previously is the precleaning phase indicated by CMS-concurrent-preclean. Precleaning represents work that can be done concurrently in preparation for the remark phase CMS-remark. The final phase is indicated by CMS-concurrent-reset and is in preparation for the next concurrent collection.

Example 8-1 Output from the CMS Collector

   [GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]
   [GC [DefNew: 2112K->64K(2112K), 0.0837052 secs] 16103K->15476K(22400K), 0.0838519 secs]
   ...
   [GC [DefNew: 2077K->63K(2112K), 0.0126205 secs] 17552K->15855K(22400K), 0.0127482 secs]
   [CMS-concurrent-mark: 0.267/0.374 secs]
   [GC [DefNew: 2111K->64K(2112K), 0.0190851 secs] 17903K->16154K(22400K), 0.0191903 secs]
   [CMS-concurrent-preclean: 0.044/0.064 secs]
   [GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]
   [GC [DefNew: 2112K->63K(2112K), 0.0716116 secs] 18177K->17382K(22400K), 0.0718204 secs]
   [GC [DefNew: 2111K->63K(2112K), 0.0830392 secs] 19363K->18757K(22400K), 0.0832943 secs]
   ...
   [GC [DefNew: 2111K->0K(2112K), 0.0035190 secs] 17527K->15479K(22400K), 0.0036052 secs]
   [CMS-concurrent-sweep: 0.291/0.662 secs]
   [GC [DefNew: 2048K->0K(2112K), 0.0013347 secs] 17527K->15479K(27912K), 0.0014231 secs]
   [CMS-concurrent-reset: 0.016/0.016 secs]
   [GC [DefNew: 2048K->1K(2112K), 0.0013936 secs] 17527K->15479K(27912K), 0.0014814 secs]

The initial mark pause is typically short relative to the minor collection pause time. The concurrent phases (concurrent mark, concurrent preclean and concurrent sweep) normally last significantly longer than a minor collection pause, as indicated by Example 8-1, “Output from the CMS Collector”. Note, however, that the application is not paused during these concurrent phases. The remark pause is often comparable in length to a minor collection. The remark pause is affected by certain application characteristics (for example, a high rate of object modification can increase this pause) and the time since the last minor collection (for example, more objects in the young generation may increase this pause).

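Example 8-1 shows the GC log printed with the options -verbose:gc and -XX:+PrintGCDetails, with a few minor details removed. Note that the CMS output is interleaved with the output from minor collections; typically many minor collections occur during one concurrent collection cycle. CMS-initial-mark indicates the start of a concurrent collection cycle, CMS-concurrent-mark indicates the end of the concurrent marking phase, and CMS-concurrent-sweep marks the end of the concurrent sweeping phase. CMS-concurrent-preclean is the precleaning phase: work done concurrently in preparation for the remark phase, CMS-remark. CMS-concurrent-reset is the final phase and prepares for the next concurrent collection.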

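The initial mark pause is typically short compared with a minor collection pause. As Example 8-1 shows, the concurrent phases (concurrent mark, concurrent preclean and concurrent sweep) normally last much longer than a minor collection pause, but the application is not paused during them. The remark pause is often comparable in length to a minor collection. It is affected by certain application characteristics (for example, a high rate of object modification can lengthen it) and by the time since the last minor collection (for example, more objects in the young generation may lengthen it).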
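To compare the two stop-the-world pauses across a whole log, the pause lines can be pulled out with a regular expression. The sketch below assumes the exact line layout of Example 8-1; real JDK 8 logs often carry timestamps or extra annotations, so the pattern would need adjusting. The class name CmsPauseParser and its methods are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the CMS pause lines shown in Example 8-1.
public class CmsPauseParser {

    // Groups: 1 = phase name, 2-5 = sizes in K, 6 = pause in seconds.
    private static final Pattern PAUSE = Pattern.compile(
            "\\[GC \\[1 (CMS-initial-mark|CMS-remark): (\\d+)K\\((\\d+)K\\)\\] "
            + "(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    // Returns the phase name, or null when the line is not a CMS pause line.
    static String phase(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Returns the pause in seconds, or NaN when the line does not match.
    static double pauseSeconds(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(6)) : Double.NaN;
    }

    public static void main(String[] args) {
        String[] lines = {
            "   [GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]",
            "   [CMS-concurrent-mark: 0.267/0.374 secs]",
            "   [GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]"
        };
        for (String line : lines) {
            if (phase(line) != null) {
                System.out.println(phase(line) + " pause: " + pauseSeconds(line) + " s");
            }
        }
    }
}
```

Concurrent phase lines such as CMS-concurrent-mark are skipped on purpose, since the application is not paused during them.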
