cms gc

These are a few esoteric factoids that I never expected users to need, but which have actually come up recently. Most of the text is just background information. If you already recognize the command line flags that I've bold'ed, you probably already know more than is good for you.

ParallelCMSThreads

The low-pause collector (UseConcMarkSweepGC) does parts of the collection of the tenured generation concurrently with the execution of the application (i.e., not during a stop-the-world). There are principally two concurrent phases of the collection: the concurrent marking phase and the concurrent sweeping phase. In JDK 6 the concurrent marking phase can use more than 1 GC threads (uses parallelism as well as concurrency). This use of the parallelism is controlled by the command line flag CMSConcurrentMTEnabled. The number of threads used during a concurrent marking phase is ParallelCMSThreads. If it is not set on the command line it is calculated as

(ParallelGCThreads + 3)/4)

where ParallelGCThreads is the command line flag for setting the number of GC threads to be used in a stop-the-world parallel collection.

Where did this number come from? We added parallelism to the concurrent marking phase because we observed that a single GC thread doing concurrent marking could be overwhelmed by the allocations of many applications threads (i.e., while the concurrent marking was happening, lots of applications threads doing allocations could exhaust the heap before the concurrent marking finished). We could see this with a fewer application threads allocating at a furious rate or many application threads allocating at a more modest rate, but whatever the application we would often seen the concurrent marking thread overwhelmed on platforms with 8 or more processors.

The above policy provides a second concurrent marking threads at ParallelGCThreads=5 and approaches a fourth of ParallelGCThread at the higher processor numbers. Because we still do have the added overheard of parallelism 2 concurrent marking threads provide only a small boost in concurrent marking over a single concurrent marking thread. We expect that to still be adequate up to ParallelGCThreads=8. At ParallelGCThreads=9 we get a third concurrent marking thread and that's when we expect to need it.

CMSMaxAbortablePrecleanTime

Our low-pause collector (UseConcMarkSweepGC) which we are usually careful to call our mostly concurrent collector has several phases, two of which are stop-the-world (STW) phases.

STW initial mark
Concurrent marking
Concurrent precleaning
STW remark
Concurrent sweeping
Concurrent reset

The first STW pause is used to find all the references to objects in the application (i.e., object references on thread stacks and in registers). After this first STW pause is the concurrent marking phase during which the application threads runs while GC is doing additional marking to determine the liveness of objects. After the concurrent marking phase there is a concurrent preclean phase (described more below) and then the second STW pause which is called the remark phase. The remark phase is a catch-up phase in which the GC figures out all the changes that the application threads have made during the previous concurrent phases. The remark phase is the longer of these two pauses. It is also typically the longest of any of the STW pauses (including the minor collection pauses). Because it is typically the longest pause we like to use parallelism where ever we can in the remark phase.

Part of the work in the remark phase involves rescanning objects that have been changed by an application thread (i.e., looking at the object A to see if A has been changed by the application thread so that A now references another object B and B was not previously marked as live). This includes objects in the young generation and here we come to the point of these ramblings. Rescanning the young generation in parallel requires that we divide the young generation into chunks so that we can give chunks out to the parallel GC threads doing the rescanning. A chunk needs to begin on the start of an object and in general we don't have a fast way to find the starts of objects in the young generation.

Given an arbitrary location in the young generation we are likely in the middle of an object, don't know what kind of object it is, and don't know how far we are from the start of the object. We know that the first object in the young generation starts at the beginning of the young generation and so we could start at the beginning and walk from object to object to do the chunking but that would be expensive. Instead we piggy-back the chunking of the young generation on another concurrent phase, the precleaning phase.

During the concurrent marking phase the applications threads are running and changing objects so that we don't have an exact picture of what's alive and what's not. We ultimately fix this up in the remark phase as described above (the object-A-gets-changed-to-point-to-object-B example). But we would like to do as much of the collection as we can concurrently so we have the concurrent precleaning phase. The precleaning phase does work similar to parts of the remark phase but does it concurrently. The details are not needed for this story so let me just say that there is a concurrent precleaning phase. During the latter part of the concurrent precleaning phase the the young generation "top" (the next location to be allocated in the young generation and so at an object start) is sampled at likely intervals and is saved as the start of a chunk. "Likely intervals" just means that we want to create chunks that are not too small and not too large so as to get good load balancing during the parallel remark.

Ok, so here's the punch line for all this. When we're doing the precleaning we do the sampling of the young generation top for a fixed amount of time before starting the remark. That fixed amount of time is CMSMaxAbortablePrecleanTime and its default value is 5 seconds. The best situation is to have a minor collection happen during the sampling. When that happens the sampling is done over the entire region in the young generation from its start to its final top. If a minor collection is not done during that 5 seconds then the region below the first sample is 1 chunk and it might be the majority of the young generation. Such a chunking doesn't spread the work out evenly to the GC threads so reduces the effective parallelism.

If the time between your minor collections is greater than 5 seconds and you're using parallel remark with the low-pause collector (which you are by default), you might not be getting parallel remarking after all. A symptom of this problem is significant variations in your remark pauses. This is not the only cause of variation in remark pauses but take a look at the times between your minor collections and if they are, say, greater than 3-4 seconds, you might need to up CMSMaxAbortablePrecleanTime so that you get a minor collection during the sampling.

And finally, why not just have the remark phase wait for a minor collection so that we get effective chunking? Waiting is often a bad thing to do. While waiting the application is running and changing objects and allocating new objects. The former makes more work for the remark phase when it happens and the latter could cause an out-of-memory before the GC can finish the collection. There is an option CMSScavengeBeforeRemark which is off by default. If turned on, it will cause a minor collection to occur just before the remark. That's good because it will reduce the remark pause. That's bad because there is a minor collection pause followed immediately by the remark pause which looks like 1 big fat pause.l

DontCompileHugeMethods

We got a complaint recently from a user who said that all his GC pauses were too long. I, of course, take such a statement with a grain of salt, but I still try to go forward with an open mind. And this time the user was right, his GC pauses were way too long. So we started asking the usual questions about anything unusual about the application's allocation pattern. Mostly that boils down to asking about very large objects or large arrays of objects. I'm talking GB size objects here. But, no, there were nothing like that. The user was very helpful in terms of trying experiments with his application, but we weren't getting anywhere until the user came back and said that he had commented out part of his code and the GC's got much smaller. Hmmm. Curiouser and curiouser. Not only that, but the code that was commented out was not being executed. At this point the strain on my brain began to be too much and I lost consciousness. Fortunately, another guy in the group persevered and with some further experiments determined that the code that was being commented out was in a method that was not always being JIT'ed.

Methods larger than a certain size will not be JIT'ed in hotspot. Commenting out some code would bring the size of the method below the JIT size limit and the method would get compiled. How did that affect GC you might ask? When a method is compiled, the compilers generate and save information on where object references live (e.g., where on the stack or in which registers). We refer to these as oop maps and oop maps are generated to speed up GC. If the method has not been JIT'ed, the GC has to generate the oop maps itself during the GC. We do that by a very laborious means that we call abstract interpretation. Basically, we simulate the execution of the method with regard to where reference are stored. Large methods mean large abstract interpretation times to generate the oop maps. We do save the oop maps for the next GC, but oop maps are different at different locations in the method. If we generate an oop map for PC=200 this time but stop for a GC at PC=300 next time, we have to generate the oop map for PC=300. Anyway, the method in which code was being commented in and out, was too large to be JIT'ed with the code commented in and that led to the long GC's.

If you have some huge methods and GC's are taking a very long time, you could try -XX:-DontCompileHugeMethods. This will tell the JIT to ignore its size limit on compilation. I'm told by the compiler guy in my carpool that it's not a good idea to use that flag in general. Refactor your methods down to a less than huge size instead. By the way, the huge method was something like 2500 lines so it was what I would call huge.

http://www.cnblogs.com/zhangxiaoguang/p/5792468.html

最后编辑于：2017.12.06 21:47:30

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 196,165评论 5赞 462
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 82,503评论 2赞 373
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 143,295评论 0赞 325
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 52,589评论 1赞 267
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 61,439评论 5赞 358
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 46,342评论 1赞 273
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 36,749评论 3赞 387
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 35,397评论 0赞 255
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 39,700评论 1赞 295
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 34,740评论 2赞 313
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 36,523评论 1赞 326
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 32,364评论 3赞 314
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 37,755评论 3赞 300
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,024评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 30,297评论 1赞 251
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 41,721评论 2赞 342
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 40,918评论 2赞 336

cms gc

ParallelCMSThreads

推荐阅读更多精彩内容