Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: 0.7.16
-
Fix Version/s: None
-
Component/s: Chef Server
-
Labels: None
-
Environment:
Ubuntu 9.10 x86_64
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]
chef-server 0.7.16-1
libmerb-core-ruby1.8 1.0.12+dfsg-0ubuntu1
Description
RAM usage of the merb worker on port 4000 grows over time without bound (limited only by the RAM + swap available).
Activity
The Chef server is running in a KVM VM which is not hosting any other major applications.
From Ken:
Particulars:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 9.10
Release: 9.10
Codename: karmic
$ uname -m
x86_64
$ COLUMNS=40 dpkg -l | grep chef
ii chef 0.7.16-1 configuration management system written in R
ii chef-indexer 0.7.16-1 Creates search indexes of Chef node attribut
ii chef-server 0.7.16-1 Merb application providing centralized manag
ii chef-server-sl 0.7.16-1 Merb app slice providing centralized managem
ii libchef-ruby 0.7.16-1 Ruby libraries for Chef configuration manage
ii libchef-ruby1. 0.7.16-1 Ruby 1.8 libraries for Chef configuration ma
Looking closer, the increase in RAM usage is not sudden. I was looking at a chart of "free memory", so the "spikes" were really reboots resulting in a sudden increase in free RAM.
However, the merb worker was using an RSS of ~2.3 GiB:
$ ps vax --sort=-rss |head
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
29513 ? R 2192:16 1141997 2 2549253 2352616 79.9 merb : worker (port 4000)
9882 ? S 39:31 31673 2 172461 111188 3.7 ruby /usr/bin/chef-indexer -d -c /etc/chef/indexer.rb
29514 ? S 20:39 55304 2 132805 58284 1.9 merb : worker (port 4001)
29512 ? S 811:15 11118 2 104457 29028 0.9 merb : spawner (ports 4000)
Since the reboot, the RSS of the restarted merb worker has climbed steadily from about 185 MiB to 320 MiB:
$ ps vax --sort=-rss |head -n 5
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
1728 ? S 105:02 1 2 400393 322248 6.6 merb : worker (port 4000)
1613 ? S 0:12 0 2 128161 78968 1.6 /usr/bin/ruby1.8 /usr/bin/stompserver -C /etc/stompserver.conf
1729 ? S 1:08 0 2 109313 36864 0.7 merb : worker (port 4001)
1727 ? S 41:55 3 2 105145 32532 0.6 merb : spawner (ports 4000)
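The RSS column printed by ps is in KiB, which is how the MiB and GiB figures quoted here are derived. A minimal sketch of that conversion, assuming the `ps vax` column layout shown in this ticket (PID in field 1, RSS in field 8); `merb_rss_mib` is a hypothetical helper name, not part of Chef or merb:

```shell
# Hypothetical helper (not part of Chef or merb).
# Reads `ps vax` output on stdin and prints "PID RSS_in_MiB" for each merb worker.
# Assumes the column layout shown in this ticket: PID is field 1, RSS (KiB) is field 8.
# The MiB value is truncated to an integer by printf's %d.
merb_rss_mib() {
  awk '/merb : worker/ { printf "%s %d\n", $1, $8 / 1024 }'
}
```

Typical use would be `ps vax --sort=-rss | merb_rss_mib`. For the port 4000 worker listed above (RSS 322248 KiB) this prints `1728 314`, in line with the "about 320 MiB" figure.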
Does it have a memory leak or is it merely a resource hog?
Thanks,
Ken
Again, from Ken:
Hi,
merb RAM usage (rss) appears to stay flat for a long time and then suddenly shoots up.
After the RAM usage has spiked it looks like this:
$ ps vax --sort=-rss |head
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
29513 ? R 2192:16 1141997 2 2549253 2352616 79.9 merb : worker (port 4000)
9882 ? S 39:31 31673 2 172461 111188 3.7 ruby /usr/bin/chef-indexer -d -c /etc/chef/indexer.rb
29514 ? S 20:39 55304 2 132805 58284 1.9 merb : worker (port 4001)
29512 ? S 811:15 11118 2 104457 29028 0.9 merb : spawner (ports 4000)
10213 ? S 3:37 119973 2 301561 18080 0.6 /usr/bin/ruby1.8 /usr/bin/stompserver -C /etc/stompserver.conf
15666 ? Sl 0:15 1473 1617 111930 10768 0.3 /usr/lib/erlang/erts-5.7.2/bin/beam.smp -Bd -K true -- -root /usr/lib/erlang -progname erl -- -home /var/lib/couchdb -noshell -noinput -smp auto -sasl errlog_type error -pa /usr/lib/couchdb/erlang/lib/couch-0.10.0/ebin /usr/lib/couchdb/erlang/lib/mochiweb-r97/ebin /usr/lib/couchdb/erlang/lib/ibrowse-1.5.2/ebin /usr/lib/couchdb/erlang/lib/erlang-oauth/ebin -eval application:load(ibrowse) -eval application:load(oauth) -eval application:load(crypto) -eval application:load(couch) -eval crypto:start() -eval ssl:start() -eval ibrowse:start() -eval couch_server:start([ "/etc/couchdb/default.ini", "/etc/couchdb/local.ini", "/etc/couchdb/default.ini", "/etc/couchdb/local.ini"]), receive done -> done end. -pidfile /var/run/couchdb/couchdb.pid -heart
Bumped up VM that is running Chef server from 3 GB RAM to 5 GB RAM.
After the reboot merb is only using about 185 MiB rss vs. 2.3 GiB
$ ps vax --sort=-rss |head
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
29513 ? R 2192:16 1141997 2 2549253 2352616 79.9 merb : worker (port 4000)
9882 ? S 39:31 31673 2 172461 111188 3.7 ruby /usr/bin/chef-indexer -d -c /etc/chef/indexer.rb
29514 ? S 20:39 55304 2 132805 58284 1.9 merb : worker (port 4001)
29512 ? S 811:15 11118 2 104457 29028 0.9 merb : spawner (ports 4000)
10213 ? S 3:37 119973 2 301561 18080 0.6 /usr/bin/ruby1.8 /usr/bin/stompserver -C /etc/stompserver.conf
15666 ? Sl 0:15 1473 1617 111930 10768 0.3 /usr/lib/erlang/erts-5.7.2/bin/beam.smp -Bd -K true -- -root /usr/lib/erlang -progname erl -- -home /var/lib/couchdb -noshell -noinput -smp auto -sasl errlog_type error -pa /usr/lib/couchdb/erlang/lib/couch-0.10.0/ebin /usr/lib/couchdb/erlang/lib/mochiweb-r97/ebin /usr/lib/couchdb/erlang/lib/ibrowse-1.5.2/ebin /usr/lib/couchdb/erlang/lib/erlang-oauth/ebin -eval application:load(ibrowse) -eval application:load(oauth) -eval application:load(crypto) -eval application:load(couch) -eval crypto:start() -eval ssl:start() -eval ibrowse:start() -eval couch_server:start([ "/etc/couchdb/default.ini", "/etc/couchdb/local.ini", "/etc/couchdb/default.ini", "/etc/couchdb/local.ini"]), receive done -> done end. -pidfile /var/run/couchdb/couchdb.pid -heart
We are keeping charts of RAM use over time via Nagios/RRD, and they show that the increase in RAM usage happens suddenly, not gradually. It also repeats over time.
Hi Adam,
$ dpkg -l | grep "ii ruby" | awk '{print $2 " " $3 }'
ruby 4.2
ruby1.8 1.8.7.174-1ubuntu1
rubygems 1.3.5-1ubuntu2
rubygems1.8 1.3.5-1ubuntu2
$ ruby --version
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]
Actually, looking at the charts shows a gradual decline in free RAM for a while (hours and hours) and then a sudden decline.
There are 107 chef clients which go off twice an hour, but not all at once, since they use splayed cron jobs. The splay is done by taking a modulo of the last octet of the IP address, not by any Chef splaying mechanism. This yields an average of 2-3 a minute that check in nearly simultaneously, maybe a few seconds apart.
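The IP-based splay described above might look roughly like the following. This is only a sketch: the helper names are hypothetical, and the 30-minute window is an assumption derived from "twice an hour", not the reporter's actual cron configuration.

```shell
# Hypothetical helpers illustrating an IP-based splay; not the reporter's actual setup.

# splay_minute IP [WINDOW]
# Prints the last octet of the IPv4 address modulo WINDOW.
# WINDOW defaults to 30, an assumption based on clients running twice an hour.
splay_minute() {
  splay_octet="${1##*.}"
  echo $(( splay_octet % ${2:-30} ))
}

# splay_cron_minutes IP
# Prints a crontab minute field that fires twice an hour, 30 minutes apart.
splay_cron_minutes() {
  splay_m="$(splay_minute "$1" 30)"
  echo "${splay_m},$(( splay_m + 30 ))"
}
```

For example, `splay_cron_minutes 10.0.0.37` prints `7,37`, which could serve as the minute field of a crontab entry. Hosts whose last octets are congruent modulo the window land in the same minute, which is consistent with several clients checking in at nearly the same time.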
The Chef server is running in a VM which is not hosting any other major applications. It has sshd etc.
Adam wrote:
Next step here is to get this running under something that can give us
some memory profiling - may I recommend bleak_house?
http://blog.evanweaver.com/files/doc/fauna/bleak_house/files/README.html
Is anyone else seeing this kind of growth on 64bit?
It's "normal" to see ruby use roughly double the memory on 64bit
systems, at least in my experience. We tend to see roughly 50-60MB
resident across the board on 32bit, and it's pretty stable... but we
don't have any production systems at 9.10.
Adam
As further data-points, we do have some folks who are seeing similar growth on 64bit Ubuntu with the shipped 1.8.7.
A different end user, also on 64bit ubuntu, but running a slightly older and manually built 1.8.7, appears to not exhibit the leak.
More investigating.
Hi, I'm seeing this issue too.
Ubuntu 9.10 64bit
root@caesar:/home/grantz# dpkg -l | grep chef
ii chef 0.7.16-1 configuration management system written in R
ii chef-indexer 0.7.16-1 Creates search indexes of Chef node attribut
ii chef-server 0.7.16-1 Merb application providing centralized manag
ii chef-server-slice 0.7.16-1 Merb app slice providing centralized managem
ii libchef-ruby 0.7.16-1 Ruby libraries for Chef configuration manage
ii libchef-ruby1.8 0.7.16-1 Ruby 1.8 libraries for Chef configuration ma
root@caesar:/home/grantz# ruby --version
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]
I only have 16 clients connecting every 30 minutes with a 5 minute splay.
RAM usage appears to grow erratically.
We're seeing the same problem in our environment:
Ubuntu 8.04 64bit
Ruby 1.8.6 p111
Chef-server 0.7.16
merb 1.0.12
The Chef server (the merb worker on port 4000) slowly eats up all the memory in the system, so we have to restart the Chef server once in a while, which is very painful.
The merb API server is going away in Chef 11, so this can be considered fixed.