god
Ruby process monitor
God - A Process Monitoring Framework in Ruby
I have a god script that is supposed to keep an eye on two stalker processes.
The problem is that after 24h it starts way too many processes.
This is the god script.
rails_root = File.expand_path("../..", __FILE__)
2.times do |n|
  God.watch do |w|
    w.group = "app-scripts"
    w.name = "run-#{n}"
    w.interval = 30.seconds
    w.dir = File.dirname(__FILE__)
    w.env = {
      "BUNDLE_GEMFILE" => "#{rails_root}/Gemfile",
      "RAILS_ENV" => "production",
      "BEANSTALK_URL" => "beanstalk://127.0.0.1:54132"
    }
    w.start = "bbundle exec stalk #{File.join(rails_root, "config/jobs.rb")}"
    w.start_grace = 5.seconds
    w.stop_grace = 5.seconds
    w.start_if do |start|
      start.condition(:process_running) { |c| c.running = false }
    end
    w.restart_if do |restart|
      restart.condition(:memory_usage) do |c|
        c.above = 200.megabytes
        c.times = [3, 5]
      end
      restart.condition(:cpu_usage) do |c|
        c.above = 95.percent
        c.times = 5
      end
    end
    w.lifecycle do |on|
      on.condition(:flapping) do |c|
        c.to_state = [:start, :restart]
        c.times = 5
        c.within = 5.minute
        c.transition = :unmonitored
        c.retry_in = 10.minutes
        c.retry_times = 5
        c.retry_within = 2.hours
      end
    end
  end
end
Running ps aux | grep stalk returns the following.
root 3178 0.2 2.7 417580 117284 ? Sl Oct28 2:22 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 3179 0.2 3.3 506068 138740 ? Sl Oct28 2:26 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 4588 0.2 2.9 497932 121664 ? Sl Oct25 16:10 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 4794 0.2 3.0 497792 128084 ? Sl Oct25 15:57 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 10391 0.2 2.8 496784 121388 ? Sl Oct25 15:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 10392 0.2 2.8 497624 121528 ? Sl Oct25 15:31 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 18874 75.0 2.0 214116 83948 ? Rl 15:49 0:09 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 18875 75.0 2.0 214944 84868 ? Rl 15:49 0:09 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 20649 0.2 2.6 410636 110012 ? Sl Oct28 2:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 20650 0.2 3.0 439284 128996 ? Sl Oct28 2:47 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 23272 0.2 2.7 414452 115772 ? Sl Oct28 2:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 23273 0.2 2.7 417728 117152 ? Sl Oct28 2:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 25919 0.2 3.1 436276 131876 ? Sl Oct28 2:28 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 25920 0.2 3.3 503236 138676 ? Sl Oct28 2:29 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 28782 0.2 2.8 431836 121108 ? Sl Oct25 16:58 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 30687 0.2 2.7 415908 117008 ? Sl Oct28 2:39 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root 30688 0.2 2.6 476184 111844 ? Sl Oct28 2:37 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
This is the /usr/bin/bbundle script.
#!/usr/bin/env bash
if [[ -s "/home/webmaster/.rvm/environments/ruby-1.9.2-p320@webmaster" ]]
then
  source "/home/webmaster/.rvm/environments/ruby-1.9.2-p320@webmaster"
  bundle "$@"
else
  echo "ERROR: Missing RVM environment file: '/home/webmaster/.rvm/environments/ruby-1.9.2-p320@webmaster'" >&2
  exit 1
fi
Running sudo god stop app-scripts won't kill any processes.
I've tried adding w.uid = "webmaster" to the god script, but the problem remains.
I'm running god version 0.12.1, ruby version 1.9.3p286 and stalker version 0.9.0.
What am I doing wrong?
Source: (StackOverflow)
I serve my software using Passenger, which spawns many Ruby processes.
Sometimes one of these processes becomes bloated and I want it to die.
I was hoping to use god for that. My idea was to monitor all of these processes, and if one consumes more than 500MB of memory for 3 cycles, god should try to kill it gracefully. If it is still alive after 5 minutes, god should kill it forcefully.
It seems to me that god always tries to start the service again, so it forces us to provide a start command. Is it possible to use god only to kill badly behaved processes and let the Passenger spawner bring them back to life when necessary?
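A minimal sketch of the "find the bloated process, ask it to exit, then force it" idea described above, assuming it runs as a standalone Ruby script (for example from cron) rather than inside god. The helper names bloated_pids and terminate, the sample ps lines and the process title pattern are all made up for illustration, and the "3 cycles" check is not implemented.

```ruby
# Hypothetical helper: pick out processes whose resident memory exceeds a limit.
# Input is text in the shape produced by `ps -eo pid=,rss=,args=`
# (RSS is reported in kilobytes on Linux).
def bloated_pids(ps_output, pattern:, limit_kb:)
  ps_output.each_line.map do |line|
    pid, rss, command = line.strip.split(/\s+/, 3)
    next if command.nil? || command !~ pattern
    Integer(pid) if Integer(rss) > limit_kb
  end.compact
end

# Hypothetical helper: ask a process to exit, then force it if it is
# still alive once the grace period (in seconds) has passed.
def terminate(pid, grace: 300)
  Process.kill("TERM", pid)
  deadline = Time.now + grace
  while Time.now < deadline
    Process.kill(0, pid) # raises Errno::ESRCH once the process is gone
    sleep 1
  end
  Process.kill("KILL", pid)
rescue Errno::ESRCH
  # the process has already exited
end

# Made-up sample input; the process titles are assumptions.
sample = <<~PS
  3178 117284 Passenger RackApp: /opt/app
  3179 612000 Passenger RackApp: /opt/app
  4001 700000 postgres: writer
PS

p bloated_pids(sample, pattern: /Passenger/, limit_kb: 500 * 1024)
```

In real use the sample text would be replaced by the output of ps, and each returned pid would be passed to terminate. Since nothing restarts the process, Passenger remains responsible for spawning a replacement.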
Source: (StackOverflow)
I am working on a God script to monitor my Unicorns. I started with GitHub's example script and have been modifying it to match my server configuration. Once God is running, commands such as god stop unicorn and god restart unicorn work just fine.
However, god start unicorn results in WARN: unicorn start command exited with non-zero code = 1. The weird part is that if I copy the start command directly from the config file and run it by hand, it starts right up like a brand new mustang.
This is my start command:
/usr/local/bin/unicorn_rails -c /home/my-linux-user/my-rails-app/config/unicorn.rb -E production -D
I have declared all paths as absolute in the config file. Any ideas what might be preventing this script from working?
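One way to narrow this down, sketched under the assumption that the difference between god and an interactive shell is the environment (PATH, HOME, RVM or gem variables): run the same command from Ruby with a stripped environment and look at its exit code and output. The helper name run_with_clean_env and the default PATH are hypothetical, and the echo command stands in for the real start command.

```ruby
require "open3"

# Hypothetical helper: run a command with only the given environment
# variables set, and return its exit code plus combined stdout/stderr.
def run_with_clean_env(command, env = { "PATH" => "/usr/bin:/bin" })
  output, status = Open3.capture2e(env, command, unsetenv_others: true)
  [status.exitstatus, output]
end

# Stand-in command; substitute the start command from the god config.
code, output = run_with_clean_env("echo started; exit 1")
puts "exit code: #{code}"
puts output
```

If the real start command fails here but works in a login shell, the error text in the output usually points at the missing variable or path.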
Source: (StackOverflow)
I have this file
rails_env = ENV['RAILS_ENV'] || 'development'
rails_root = ENV['RAILS_ROOT'] || "/home/luiz/rails_dev/api"
God.watch do |w|
  w.name = "unicorn"
  w.interval = 30.seconds # default
  # unicorn needs to be run from the rails root
  w.start = "cd #{rails_root} && unicorn_rails -c config/unicorn.rb -E #{rails_env}"
  # QUIT gracefully shuts down workers
  w.stop = "kill -QUIT `cat #{rails_root}/tmp/pids/unicorn.pid`"
  # USR2 causes the master to re-create itself and spawn a new worker pool
  w.restart = "kill -USR2 `cat #{rails_root}/tmp/pids/unicorn.pid`"
  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.pid_file = "#{rails_root}/tmp/pids/unicorn.pid"
  w.behavior(:clean_pid_file)
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running = false
    end
  end
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 300.megabytes
      c.times = [3, 5] # 3 out of 5 intervals
    end
    restart.condition(:cpu_usage) do |c|
      c.above = 50.percent
      c.times = 5
    end
  end
  # lifecycle
  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end
end
I start unicorn with god -c unicorn.god -D -p 8081 and my workers are set up fine. But sometimes I need to stop unicorn (god stop unicorn -p 8081 in another console), and the server keeps running.
What am I missing?
Edit
We're moving from unicorn to puma (not because of this question, it's a performance thing) and are not going to use god anymore. Thanks, everybody, for your help.
Source: (StackOverflow)
I've got a rails website and a small minecraft server running on a linode vps. I'm running minecraft as a custom service off of a ram server based on an init.d file. Since I'm using God to monitor my rails website I thought I'd use it for minecraft as well, but it doesn't seem to be able to recognize the service in any way. The conditions don't detect its presence: :process_running always returns false, whether it's running or not, and god fails to start it when it isn't. To add to the confusion, :memory_usage and :cpu_usage are always zero.
My /etc/init.d/minecraft file is here:
http://pastie.org/2760483
It works perfectly well, and 'service minecraft start' and whatnot gives me pretty much everything I need. My hope was to be able to put it to sleep automatically through god whenever the CPU usage got high, to prioritize the website. However, none of the god conditions are figuring out what's going on with the process.
My /opt/god/minecraft.god file is here:
http://pastie.org/2760498
Obviously the low cpu in that is an attempt to get a rise out of god. Asking for a smiting, if you will.
Trying to run god off the config:
sudo god -c minecraft.god -D
yields:
I [2011-10-26 01:55:55] INFO: Loading minecraft.god
I [2011-10-26 01:55:55] INFO: Syslog enabled.
I [2011-10-26 01:55:55] INFO: Using pid file directory: /var/run/god
I [2011-10-26 01:55:55] INFO: Socket already in use
I [2011-10-26 01:55:55] INFO: Socket is stale, reopening
I [2011-10-26 01:55:55] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-10-26 01:55:55] INFO: minecraft move 'unmonitored' to 'up'
I [2011-10-26 01:55:55] INFO: minecraft moved 'unmonitored' to 'up'
I [2011-10-26 01:55:55] INFO: minecraft [ok] memory within bounds [0kb] (MemoryUsage)
I [2011-10-26 01:55:55] INFO: minecraft [ok] cpu within bounds [0%%] (CpuUsage)
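A small sketch of how a pid-file based liveness check works, on the assumption that the init.d script daemonizes the server itself, so a monitor can only find the process through a pid file the service writes. The helper name running? and the pid file path are hypothetical; nothing here is taken from the pastie configs.

```ruby
require "tempfile"

# Hypothetical helper: true if the pid recorded in pid_file belongs
# to a process that currently exists.
def running?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid) # signal 0 checks for existence without signalling
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # no pid file, stale pid, or unparsable contents
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Demonstration with this script's own pid and with a missing file.
Tempfile.create("sketch.pid") do |file|
  file.write(Process.pid.to_s)
  file.flush
  puts running?(file.path)
end
puts running?("/nonexistent/minecraft.pid")
```

If a check like this reports the server as running while god reports 0kb and 0%, the watch is probably tracking a different pid than the one the service actually runs under.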
Source: (StackOverflow)
I am using Resque on a production website.
When I deploy, I want GOD to stop all of the workers and then restart them since sometimes we change the code of a class and requeue the failed jobs.
The problem is that when I do god stop resque, the rake processes do not actually stop; the workers stay alive and keep working with the older code, which creates all sorts of issues for me.
Even when I do 'god terminate', it won't kill the workers.
Right now I am using a shell script to kill the workers, but since I have more than one server, it's pretty much a pain in the ass doing it on all production servers.
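For illustration, the kill-script approach could look roughly like this in Ruby, assuming each worker writes its own pid to a file (Resque does this when PIDFILE is set) and that the files can be matched by a glob. The helper name signal_workers and the example path are hypothetical. QUIT is used as the default because Resque documents it as the graceful shutdown signal.

```ruby
# Hypothetical helper: send a signal to every process whose pid is stored
# in a file matching the glob. Returns the pids that were signalled.
def signal_workers(glob, signal = "QUIT")
  Dir.glob(glob).map do |path|
    pid = Integer(File.read(path).strip)
    Process.kill(signal, pid)
    pid
  rescue Errno::ESRCH, ArgumentError
    nil # stale or unreadable pid file: nothing to signal
  end.compact
end

# Example call (path is an assumption, not taken from the config):
# signal_workers("/path/to/app/tmp/resque_*.pid")
```

This only helps if the pid files contain the pids of the worker processes themselves rather than of a wrapper such as a shell or rake launcher.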
This is my god config file:
rails_env = ENV['RAILS_ENV'] || "production"
rails_root = ENV['RAILS_ROOT'] || "/mnt/data-store/html/gogobot/current"
num_workers = rails_env == 'production' ? 5 : 2
num_workers.times do |num|
  God.watch do |w|
    w.dir = "#{rails_root}"
    w.name = "resque-#{num}"
    w.group = "resque"
    w.interval = 2.minutes
    w.env = {"QUEUE"=>"duplicate_merging,facebook_wall_posts,generic,mailer,notifications,realtime,scoring_system,signup,social_graph_facebook,social_graph_foursquare,social_graph_twitter,user_info,user_score", "RAILS_ENV"=>rails_env, "PIDFILE" => "#{rails_root}/tmp/resque_#{w}.pid"}
    w.pid_file = "#{rails_root}/tmp/resque_#{w}.pid"
    w.start = "cd #{rails_root}/ && rake environment resque:work QUEUE=duplicate_merging,facebook_wall_posts,generic,mailer,notifications,realtime,scoring_system,signup,social_graph_facebook,social_graph_foursquare,social_graph_twitter,user_info,user_score RAILS_ENV=#{rails_env}"
    w.log = "#{rails_root}/log/resque_god.log"
    w.uid = 'root'
    w.gid = 'root'
    # restart if memory gets too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.above = 350.megabytes
        c.times = 2
      end
    end
    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
      end
    end
    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.interval = 5.seconds
      end
      # failsafe
      on.condition(:tries) do |c|
        c.times = 5
        c.transition = :start
        c.interval = 5.seconds
      end
    end
    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_running) do |c|
        c.running = false
      end
    end
  end
end
1.times do |num|
  God.watch do |w|
    w.name = "dj-#{num}"
    w.group = 'dj'
    w.interval = 30.seconds
    w.start = "cd #{rails_root} && rake jobs:work"
    w.uid = 'root'
    w.gid = 'root'
    # restart if memory gets too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.above = 300.megabytes
        c.times = 2
      end
    end
    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
      end
    end
    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.interval = 5.seconds
      end
      # failsafe
      on.condition(:tries) do |c|
        c.times = 5
        c.transition = :start
        c.interval = 5.seconds
      end
    end
    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_running) do |c|
        c.running = false
      end
    end
  end
end
Would appreciate any help in how I can stop rake jobs using GOD.
Thanks.
Source: (StackOverflow)
I want to ensure that certain processes like Sunspot Solr search and delayed_job are running when my Rails 3 app initializes or loads.
I'm somewhat of a noob and from what I can tell, I could write a custom initializer or use a process monitoring framework like God or Monit.
Can someone please suggest the optimal path to take here?
Source: (StackOverflow)
Here's the description for god's restart command: restart <task or group name>. The builtin init script does a kill, followed by a start. Is there really no built-in way to send a restart command to all watches whether they are grouped or not?
Source: (StackOverflow)
We're using God to monitor our server processes, and were wondering if we should use something like Monit to make sure God gets up if something unexpected happens.
A quis custodiet ipsos custodes? conundrum :)
Googling for it didn't bring any mentions of this being done, which makes me think it's probably pretty rare.
Has anybody here seen a need for it?
Source: (StackOverflow)
I have a few apps running Rails 3 on Ruby 1.9.2, deployed on an Ubuntu 10.04 LTS machine using nginx + Passenger. Now I need to add a new app that runs on Ruby 1.8.7 (REE) and Rails 2. I managed to do that with RVM, Passenger Standalone and a reverse proxy.
The problem is that every time I have to restart the server (to install security updates, for example), I have to start Passenger Standalone manually.
Is there a way to start it automatically? I was told to use Monit or God, but I wasn't able to write a proper recipe that works with Passenger Standalone. I also had a few problems with God and RVM, so if you have a solution that doesn't use God, or if you know how to configure God and RVM properly, even better.
Source: (StackOverflow)
I use the God gem to monitor my delayed_job processes. So far the gem is doing its job as it should, but for some reason I can't get it to send email notifications (I use Google Apps).
Here is my god file configuration:
God::Contacts::Email.defaults do |d|
  d.from_email = 'system@example.com'
  d.from_name = 'Process monitoring'
  d.delivery_method = :smtp
  d.server_host = 'smtp.gmail.com'
  d.server_port = 587
  d.server_auth = true
  d.server_domain = 'example.com'
  d.server_user = 'system@example.com'
  d.server_password = 'myPassword'
end
God.contact(:email) do |c|
  c.name = 'me'
  c.group = 'developers'
  c.to_email = 'me@example.com'
end
w.start_if do |start|
  start.condition(:process_running) do |c|
    c.interval = 20.seconds
    c.running = false
    c.notify = {:contacts => ['me'], :priority => 1, :category => 'staging'}
  end
end
Any thoughts?
Source: (StackOverflow)
I have a god/resque setup that spans a few worker servers. Every so often, the workers get jammed up by long polling connections and won't time out correctly. We have tried coding around it, but regardless of why that doesn't work, the keep-alive packets being sent down the wire won't let us time it out easily.
I would like certain workers (which I already have segmented out in their own watch blocks) to not be allowed to run for longer than a certain amount of time. In pseudocode, I am looking for a watch condition like the following (i.e. restart that worker if it takes longer than 60 sec to complete the task):
w.transition(:up, :restart) do |on|
  on.condition(:process_timer) { |c| c.greater_than = 60.seconds }
end
Any thoughts or pointers on how to accomplish this would be greatly appreciated.
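As a rough sketch of the timing part only: the elapsed run time of a process can be read from ps and compared with a limit. This assumes a ps that supports -o etime= and prints elapsed time as [[dd-]hh:]mm:ss. The helper names etime_to_seconds and running_too_long? are hypothetical, and wiring the check into god (for example through a custom condition) is left out. Note that this measures how long the process has been alive, which matches "time spent on one task" only if a fresh process handles each task.

```ruby
# Hypothetical helper: convert the elapsed-time format printed by
# `ps -o etime=` ([[dd-]hh:]mm:ss) into seconds.
def etime_to_seconds(etime)
  text = etime.strip
  days, clock = text.include?("-") ? text.split("-", 2) : ["0", text]
  parts = clock.split(":").map { |part| Integer(part, 10) }
  parts.unshift(0) while parts.size < 3 # pad missing hours
  hours, minutes, seconds = parts
  ((Integer(days, 10) * 24 + hours) * 60 + minutes) * 60 + seconds
end

# Hypothetical helper: true when the process exists and has been
# alive for more than limit seconds.
def running_too_long?(pid, limit)
  etime = `ps -o etime= -p #{Integer(pid)}`
  !etime.strip.empty? && etime_to_seconds(etime) > limit
end

puts etime_to_seconds("1-02:03:04")
```

A script that finds a worker over the limit could then restart it with god restart followed by the watch name.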
Source: (StackOverflow)
Using god.rb to start and monitor Sidekiq does not work. Below is my god config for Sidekiq.
Running sidekiq -C /srv/books/current/config/sidekiq.yml manually from a terminal on production works fine, but the Sidekiq god.rb config does not. Does anyone have an idea why this could happen? There is nothing much in the logs.
God.watch do |w|
  w.name = "sidekiq"
  w.interval = 30.seconds
  w.start = "cd #{ENV['RAILS_ROOT']}; sidekiq -C /srv/books/current/config/sidekiq.yml"
  w.stop = "cd #{ENV['RAILS_ROOT']}; exec sidekiqctl stop /srv/books/shared/tmp/pids/sidekiq.pid"
  w.restart = "#{w.stop} && #{w.start}"
  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.log = File.join(ENV['RAILS_ROOT'], 'log', 'sidekiq.log')
  # determine the state on startup
  w.transition(:init, {true => :up, false => :start}) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end
  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
      c.interval = 5.seconds
    end
    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
      c.interval = 5.seconds
    end
  end
  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_running) do |c|
      c.running = false
    end
  end
  # Notifications
  # --------------------------------------
  w.transition(:up, :start) do |on|
    on.condition(:process_exits) do |p|
      p.notify = 'ect'
    end
  end
end
Source: (StackOverflow)