Details
Type: Bug
Status: Open
Priority: Unknown
Resolution: Unresolved
Affects Version/s: 10.12.0
Fix Version/s: None
Component/s: Chef Server
Environment:
Mid-2011 MacBook Pro running OS X Lion, 8 GB RAM, awesome chimpanzee sticker on lid. Tried Chef 10.12.0 & Chef 0.10.8, and RabbitMQ 2.8.4 & 2.8.2. Def Leppard played at various volumes during exploration. Erlang R15B01 (erts-5.9.1), Ruby 1.9.3p125
Description
I'm seeing 100% CPU load from RabbitMQ's beam.smp process on OS X the second and all subsequent times I start RabbitMQ. RabbitMQ is supporting a local Chef server and nothing else. The problem first appeared with Chef 10.12.0 and RabbitMQ 2.8.4, and persisted even after a baseline reinstall with Chef 0.10.8 and RabbitMQ 2.8.2.
tl;dr: if beam.smp is going crazy, stuff these lines into your Chef config files:
module ::Chef::Expander ; VNODES = 16 ; end
module ::Chef::IndexQueue ; class AmqpClient ; VNODES = 16 ; end ; end
-------------------------------
Running `rabbitmqctl report` would hang while trying to list `Queues on /chef`. RabbitMQ was otherwise responsive to rabbitmqctl commands (in particular, stop and status) and was using ~144 MB under no external load, but starting and stopping the server each took more than a minute. Documents continued to be indexed during this time, so the queues themselves were still being serviced.
- Setup: RabbitMQ, CouchDB and Erlang installed using Homebrew via this chef recipe. Erlang version is R15B01 (erts-5.9.1), RabbitMQ is 2.8.4. Nuking it from orbit and re-installing led to the same issue over multiple trials. I also compiled Erlang from scratch and reinstalled RabbitMQ from source at 2.8.2; no change.
- The RabbitMQ management dashboard reported a 3 GB memory high-water mark, so RabbitMQ was under no real memory pressure. The computer is a mid-2011 MacBook Pro with 8 GB RAM.
I then did the following:
- blew away everything: Chef, Erlang, CouchDB, RabbitMQ, and all their installed directories
- installed Erlang and RabbitMQ, brought up RabbitMQ and verified that it was vanilla.
- created vhost '/chef' and user chef, then granted chef and guest full permissions on /chef:
rmq_user=chef; rmq_password=testing ; rmq_vhost='/chef'
rabbitmqctl add_vhost "$rmq_vhost"
rabbitmqctl add_user "$rmq_user" "$rmq_password"
rabbitmqctl set_permissions -p "$rmq_vhost" "$rmq_user" ".*" ".*" ".*"
rabbitmqctl set_permissions -p "$rmq_vhost" "guest" ".*" ".*" ".*"
- ran chef-expanderctl queue-depth, which then enumerated its 1024 queues, fairly slowly.
- After some seconds, all those queues reported as idle, and the RabbitMQ process seemed otherwise quiescent.
- stopped RabbitMQ, which took ~60 sec
- Starting RabbitMQ took more than a minute to come fully active, with the CPU at max the whole way. Tens of minutes later, with no activity, the beam.smp process was still at 300% / fan blasting / space heater warp speed.
- no jobs were ever enqueued; the only activity was the enumeration of the 1024 queues.
The only thing that did help was to turn down the number of vnodes chef server and expander use:
module ::Chef::Expander ; VNODES = 16 ; end
module ::Chef::IndexQueue ; class AmqpClient ; VNODES = 16 ; end ; end
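To make the scale difference concrete, here is a minimal Ruby sketch of how a vnode count translates into a queue count. Only the values 1024 and 16 come from this report. The `vnode-N` queue naming, the MD5-based bucketing, and the helper names `queue_names` and `vnode_queue_for` are assumptions made for illustration, not Chef's actual implementation.

```ruby
require 'digest'

# Hypothetical helper: one queue per vnode.
# The "vnode-N" naming scheme is an assumption for illustration.
def queue_names(vnodes)
  (0...vnodes).map { |n| "vnode-#{n}" }
end

# Hypothetical helper: pick the queue an object would land on.
# Hashing the ID with MD5 is an illustrative choice, not Chef's real algorithm.
def vnode_queue_for(object_id, vnodes)
  bucket = Digest::MD5.hexdigest(object_id).to_i(16) % vnodes
  "vnode-#{bucket}"
end

default_queues = queue_names(1024) # default vnode count described in this report
reduced_queues = queue_names(16)   # value used in the workaround above

puts "default: #{default_queues.size} queues, reduced: #{reduced_queues.size} queues"
puts vnode_queue_for("node-example-1", 16)
```

Under these assumptions, every object still maps to exactly one queue, but the broker has 64 times fewer queues to declare and recover on restart.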
hoover_damm on IRC points out that I had also just adjusted Solr's maxFieldLength in response to #2346, raising it from 10000 to 200200. That seems foolish in retrospect, doesn't it? I don't know for sure that it was the source of the thrashing, but it's worth noting.
Even if that maxFieldLength value is improvidently generous, I still believe VNODES should not default to 1024 queues. The server-api only runs one process out of the box, so that seems significantly overprovisioned.