EzDevInfo.com

god

Ruby process monitor God - A Process Monitoring Framework in Ruby

How to monitor delayed_job with monit

Are there any examples on the web of how to monitor delayed_job with Monit?

Everything I can find uses God, but I refuse to use God since long running processes in Ruby generally suck. (The most current post in the God mailing list? God Memory Usage Grows Steadily.)

Update: delayed_job now comes with a sample monit config based on this question.


Source: (StackOverflow)

God vs. Monit [closed]

Which one to use for process monitoring and why?


Source: (StackOverflow)

God starts too many processes

I have a god script that is supposed to keep an eye on two stalker processes. The problem is that after 24h it starts way too many processes.

This is the god script.

rails_root = File.expand_path("../..", __FILE__)

2.times do |n|
  God.watch do |w|
    w.group = "app-scripts"
    w.name  = "run-#{n}"
    w.interval = 30.seconds
    w.dir      = File.dirname(__FILE__)

    w.env = {
      "BUNDLE_GEMFILE" => "#{rails_root}/Gemfile",
      "RAILS_ENV" => "production",
      "BEANSTALK_URL" => "beanstalk://127.0.0.1:54132"
    }

    w.start = "bbundle exec stalk #{File.join(rails_root, "config/jobs.rb")}"

    w.start_grace = 5.seconds
    w.stop_grace  = 5.seconds

    w.start_if do |start|
      start.condition(:process_running) { |c| c.running = false }
    end

    w.restart_if do |restart|
      restart.condition(:memory_usage) do |c|
        c.above = 200.megabytes
        c.times = [3, 5]
      end

      restart.condition(:cpu_usage) do |c|
        c.above = 95.percent
        c.times = 5
      end
    end

    w.lifecycle do |on|
      on.condition(:flapping) do |c|
        c.to_state = [:start, :restart]
        c.times = 5
        c.within = 5.minute
        c.transition = :unmonitored
        c.retry_in = 10.minutes
        c.retry_times = 5
        c.retry_within = 2.hours
      end
    end
  end
end

ps aux | grep stalk returns the following.

root      3178  0.2  2.7 417580 117284 ?       Sl   Oct28   2:22 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root      3179  0.2  3.3 506068 138740 ?       Sl   Oct28   2:26 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root      4588  0.2  2.9 497932 121664 ?       Sl   Oct25  16:10 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root      4794  0.2  3.0 497792 128084 ?       Sl   Oct25  15:57 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     10391  0.2  2.8 496784 121388 ?       Sl   Oct25  15:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     10392  0.2  2.8 497624 121528 ?       Sl   Oct25  15:31 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     18874 75.0  2.0 214116 83948 ?        Rl   15:49   0:09 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     18875 75.0  2.0 214944 84868 ?        Rl   15:49   0:09 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     20649  0.2  2.6 410636 110012 ?       Sl   Oct28   2:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     20650  0.2  3.0 439284 128996 ?       Sl   Oct28   2:47 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     23272  0.2  2.7 414452 115772 ?       Sl   Oct28   2:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     23273  0.2  2.7 417728 117152 ?       Sl   Oct28   2:44 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     25919  0.2  3.1 436276 131876 ?       Sl   Oct28   2:28 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     25920  0.2  3.3 503236 138676 ?       Sl   Oct28   2:29 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     28782  0.2  2.8 431836 121108 ?       Sl   Oct25  16:58 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     30687  0.2  2.7 415908 117008 ?       Sl   Oct28   2:39 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb
root     30688  0.2  2.6 476184 111844 ?       Sl   Oct28   2:37 ruby /opt/www/myapp/shared/bundle/ruby/1.9.1/bin/stalk /opt/www/myapp/current/config/jobs.rb

This is the /usr/bin/bbundle script.

#!/usr/bin/env bash

if [[ -s "/home/webmaster/.rvm/environments/ruby-1.9.2-p320@webmaster" ]]
then
  source "/home/webmaster/.rvm/environments/ruby-1.9.2-p320@webmaster"
  bundle  "$@"
else
  echo "ERROR: Missing RVM environment file: '/home/webmaster/.rvm/environments/ruby-1.9.2-p320@webmaster'" >&2
  exit 1
fi

  • Running sudo god stop app-scripts won't kill any processes.

  • I've tried adding w.uid = "webmaster" to the god script, but the problem remains.

  • I'm running god version 0.12.1, ruby version 1.9.3p286 and stalker version 0.9.0.

What am I doing wrong?


Source: (StackOverflow)

Using god only to kill

I serve my software using passenger. It spawns many ruby processes.

Sometimes one of these rubies becomes bloated and I want it to die.

I was hoping to use god for that purpose. My idea was to monitor all these rubies and, if one of them consumes more than 500MB of memory for 3 cycles, have god try to kill it gracefully. If it is still alive after 5 minutes, god should kill it forcefully.

It seems to me that god always tries to start the service again, which forces us to provide a start command. Is it possible to use god only to kill badly behaved processes and let the Passenger spawner bring them back to life when necessary?
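The kill policy described above can be sketched in plain Ruby, independent of god. This is only an illustration of the decision logic; the class name BloatPolicy and its parameters are hypothetical and are not part of god or Passenger.

```ruby
# Hypothetical policy object (not a god or Passenger API).
# It decides what to do with one process based on periodic memory samples.
class BloatPolicy
  def initialize(limit_bytes:, cycles: 3, grace_seconds: 300)
    @limit_bytes   = limit_bytes
    @cycles        = cycles         # consecutive over-limit samples required
    @grace_seconds = grace_seconds  # time allowed between TERM and KILL
    @over          = Hash.new(0)    # pid => consecutive over-limit samples
    @term_sent_at  = {}             # pid => time the graceful stop was requested
  end

  # Returns :none, :term (ask nicely) or :kill (force) for one sample.
  # `now` is a number of seconds from any consistent clock.
  def decide(pid, rss_bytes, now)
    if @term_sent_at.key?(pid)
      return (now - @term_sent_at[pid]) >= @grace_seconds ? :kill : :none
    end

    if rss_bytes > @limit_bytes
      @over[pid] += 1
    else
      @over[pid] = 0
    end

    return :none if @over[pid] < @cycles

    @term_sent_at[pid] = now
    :term
  end

  # Call this once the process is gone so its pid can be reused safely.
  def forget(pid)
    @over.delete(pid)
    @term_sent_at.delete(pid)
  end
end
```

A caller would map :term to Process.kill("TERM", pid) and :kill to Process.kill("KILL", pid), and never start anything itself, leaving respawning to Passenger.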


Source: (StackOverflow)

Using God to monitor Unicorn - Start exited with non-zero code = 1

I am working on a God script to monitor my Unicorns. I started with GitHub's example script and have been modifying it to match my server configuration. Once God is running, commands such as god stop unicorn and god restart unicorn work just fine.

However, god start unicorn results in WARN: unicorn start command exited with non-zero code = 1. The weird part is that if I copy the start script directly from the config file, it starts right up like a brand new mustang.

This is my start command:

/usr/local/bin/unicorn_rails -c /home/my-linux-user/my-rails-app/config/unicorn.rb -E production -D

I have declared all paths as absolute in the config file. Any ideas what might be preventing this script from working?


Source: (StackOverflow)

god doesn't stop unicorn

I have this file

rails_env = ENV['RAILS_ENV'] || 'development'
rails_root = ENV['RAILS_ROOT'] || "/home/luiz/rails_dev/api"

God.watch do |w|
  w.name = "unicorn"
  w.interval = 30.seconds # default

  # unicorn needs to be run from the rails root
  w.start = "cd #{rails_root} && unicorn_rails -c config/unicorn.rb -E #{rails_env}"

  # QUIT gracefully shuts down workers
  w.stop = "kill -QUIT `cat #{rails_root}/tmp/pids/unicorn.pid`"

  # USR2 causes the master to re-create itself and spawn a new worker pool
  w.restart = "kill -USR2 `cat #{rails_root}/tmp/pids/unicorn.pid`"

  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.pid_file = "#{rails_root}/tmp/pids/unicorn.pid"

  w.behavior(:clean_pid_file)

  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running = false
    end
  end

  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 300.megabytes
      c.times = [3, 5] # 3 out of 5 intervals
    end

    restart.condition(:cpu_usage) do |c|
      c.above = 50.percent
      c.times = 5
    end
  end

  # lifecycle
  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end
end

I start unicorn with god -c unicorn.god -D -p 8081 and my workers are set up fine. But sometimes I need to stop unicorn (god stop unicorn -p 8081 in another console), and the server keeps running.
What am I missing?
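As a side note on the config above, the comment on c.times = [3, 5] says "3 out of 5 intervals". That reading can be illustrated with a small sliding window in plain Ruby. This is a sketch of the idea only, not god's internal code, and the class name SlidingThreshold is made up for the example.

```ruby
# Illustration only: fires when at least `required` of the last `window`
# samples exceed the threshold, mirroring the "3 out of 5" comment.
class SlidingThreshold
  def initialize(above:, times:)
    @above = above
    @required, @window = times   # e.g. [3, 5]
    @samples = []
  end

  # Records one sample and reports whether the condition is met.
  def push(value)
    @samples << value
    @samples.shift while @samples.size > @window
    @samples.count { |v| v > @above } >= @required
  end
end

check = SlidingThreshold.new(above: 300, times: [3, 5])
[310, 100, 320, 100, 330].each { |mb| puts check.push(mb) }
```

With these five samples only the last call reports true, because that is the first point at which three of the last five readings are above 300.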

Edit

We're moving from unicorn to puma (not because of this question, it's a performance thing), and we're not going to use god anymore. Thanks, everybody, for your help.


Source: (StackOverflow)

God won't register a running custom service

I've got a rails website and a small minecraft server running on a linode vps. I'm running minecraft as a custom service off of a ram server based on an init.d file. Since I'm using God to monitor my rails website I thought I'd use it for minecraft as well, but it doesn't seem to be able to recognize the service in any way. The conditions don't detect its presence: :process_running always returns false, whether it's running or not, and god fails to start it when it isn't. To add to the confusion, :memory_usage and :cpu_usage are always zero.

My /etc/init.d/minecraft file is here: http://pastie.org/2760483

It works perfectly well, and 'service minecraft start' and whatnot gives me pretty much everything I need. My hope was to be able to put it to sleep automatically through god whenever the CPU usage got high, to prioritize the website. However, none of the god conditions are figuring out what's going on with the process.

My /opt/god/minecraft.god file is here: http://pastie.org/2760498

Obviously the low cpu in that is an attempt to get a rise out of god. Asking for a smiting, if you will.

Trying to run god off the config: sudo god -c minecraft.god -D

yields:

I [2011-10-26 01:55:55]  INFO: Loading minecraft.god
I [2011-10-26 01:55:55]  INFO: Syslog enabled.
I [2011-10-26 01:55:55]  INFO: Using pid file directory: /var/run/god
I [2011-10-26 01:55:55]  INFO: Socket already in use
I [2011-10-26 01:55:55]  INFO: Socket is stale, reopening
I [2011-10-26 01:55:55]  INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-10-26 01:55:55]  INFO: minecraft move 'unmonitored' to 'up'
I [2011-10-26 01:55:55]  INFO: minecraft moved 'unmonitored' to 'up'
I [2011-10-26 01:55:55]  INFO: minecraft [ok] memory within bounds [0kb] (MemoryUsage)
I [2011-10-26 01:55:55]  INFO: minecraft [ok] cpu within bounds [0%%] (CpuUsage)

Source: (StackOverflow)

God stop resque workers rake

I am using Resque on a production website.

When I deploy, I want God to stop all of the workers and then restart them, since sometimes we change the code of a class and requeue the failed jobs.

The problem is that when I do god stop resque, the rake processes do not actually stop; the workers stay alive and keep working with older code, which creates all sorts of issues for me.

Even when I do 'god terminate' it won't kill the workers.

Right now I am using a shell script to kill the workers, but since I have more than one server, it's pretty much a pain in the ass doing it on all production servers.

This is my god config file:

rails_env   = ENV['RAILS_ENV']  || "production"
rails_root  = ENV['RAILS_ROOT'] || "/mnt/data-store/html/gogobot/current"
num_workers = rails_env == 'production' ? 5 : 2

num_workers.times do |num|
  God.watch do |w|
    w.dir      = "#{rails_root}"
    w.name     = "resque-#{num}"
    w.group    = "resque"
    w.interval = 2.minutes
    w.env      = {"QUEUE"=>"duplicate_merging,facebook_wall_posts,generic,mailer,notifications,realtime,scoring_system,signup,social_graph_facebook,social_graph_foursquare,social_graph_twitter,user_info,user_score", "RAILS_ENV"=>rails_env, "PIDFILE" => "#{rails_root}/tmp/resque_#{num}.pid"}
    w.pid_file = "#{rails_root}/tmp/resque_#{num}.pid"
    w.start    = "cd #{rails_root}/ && rake environment resque:work QUEUE=duplicate_merging,facebook_wall_posts,generic,mailer,notifications,realtime,scoring_system,signup,social_graph_facebook,social_graph_foursquare,social_graph_twitter,user_info,user_score RAILS_ENV=#{rails_env}"
    w.log      = "#{rails_root}/log/resque_god.log"

    w.uid = 'root'
    w.gid = 'root'

    # restart if memory gets too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.above = 350.megabytes
        c.times = 2
      end
    end

    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
      end
    end

    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.interval = 5.seconds
      end

      # failsafe
      on.condition(:tries) do |c|
        c.times = 5
        c.transition = :start
        c.interval = 5.seconds
      end
    end

    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_running) do |c|
        c.running = false
      end
    end
  end
end

1.times do |num|
  God.watch do |w|
    w.name     = "dj-#{num}"
    w.group    = 'dj'
    w.interval = 30.seconds
    w.start    = "cd #{rails_root} && rake jobs:work"

    w.uid = 'root'
    w.gid = 'root'

    # restart if memory gets too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.above = 300.megabytes
        c.times = 2
      end
    end

    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
      end
    end

    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.interval = 5.seconds
      end

      # failsafe
      on.condition(:tries) do |c|
        c.times = 5
        c.transition = :start
        c.interval = 5.seconds
      end
    end

    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_running) do |c|
        c.running = false
      end
    end
  end
end

Would appreciate any help in how I can stop rake jobs using GOD.

Thanks.


Source: (StackOverflow)

Ensure that certain processes are running when my Rails app loads

I want to ensure that certain processes like Sunspot Solr search and delayed_job are running when my Rails 3 app initializes or loads.

I'm somewhat of a noob and from what I can tell, I could write a custom initializer or use a process monitoring framework like God or Monit.

Can someone please suggest the optimal path to take here?
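For the custom-initializer route, the core of such a check is usually "does the pid in this pid file belong to a live process?". A minimal sketch in plain Ruby follows; the helper name process_alive? and any pid file paths passed to it are assumptions for illustration, and this only detects a missing process, it does not start or supervise anything the way God or Monit would.

```ruby
# Returns true when the pid stored in pid_file belongs to a live process.
# Hypothetical helper for illustration; pid file locations are up to the app.
def process_alive?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  return false unless pid > 0

  # Signal 0 delivers nothing; it only checks that the process exists.
  Process.kill(0, pid)
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  # No pid file, no such process, or the file does not contain a number.
  false
rescue Errno::EPERM
  # The process exists but belongs to another user.
  true
end
```

An initializer could call this for each pid file it cares about and log a warning when it returns false. Keeping the processes running over time is still a job for a monitor rather than for application boot code.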


Source: (StackOverflow)

restart all god tasks

Here's the description for god's restart command: restart <task or group name>. The builtin init script does a kill, followed by a start. Is there really no built-in way to send a restart command to all watches whether they are grouped or not?


Source: (StackOverflow)

Monit to watch over God?

We're using God to monitor our server processes, and were wondering if we should use something like Monit to make sure God gets up if something unexpected happens.

A quis custodiet ipsos custodes? conundrum :)

Googling for it didn't bring up any mentions of this being done, which makes me think it's probably pretty rare.

Has anybody here seen a need for it?


Source: (StackOverflow)

How can I keep a Passenger Standalone up even after a restart?

I have a few apps running rails 3 on ruby 1.9.2 and deployed on a Ubuntu 10.04 LTS machine using nginx + passenger. Now, I need to add a new app that runs on ruby 1.8.7 (REE) and Rails 2. I managed to do that with RVM, Passenger Standalone and a reverse proxy.

The problem is that, every time I have to restart the server (to install security updates for example), I have to start Passenger Standalone manually.

Is there a way to start it automatically? I was told to use Monit or God, but I wasn't able to write a proper recipe that works with Passenger Standalone. I also had a few problems with God and RVM, so if you have a solution that doesn't use God, or if you know how to configure God/RVM properly, even better.


Source: (StackOverflow)

Emails notifications are not sent from the God gem

I use the God gem to monitor my delayed_job processes. So far the gem is doing its job as it should, but for some reason I can't get it to send email notifications (I use Google Apps). Here is my god configuration file:

God::Contacts::Email.defaults do |d|
  d.from_email = 'system@example.com'
  d.from_name = 'Process monitoring'
  d.delivery_method = :smtp
  d.server_host = 'smtp.gmail.com'
  d.server_port = 587
  d.server_auth = true
  d.server_domain = 'example.com'
  d.server_user = 'system@example.com'
  d.server_password = 'myPassword'
end


God.contact(:email) do |c|
  c.name = 'me'
  c.group = 'developers'
  c.to_email = 'me@example.com'
end

# Inside the God.watch block:
w.start_if do |start|
  start.condition(:process_running) do |c|
    c.interval = 20.seconds
    c.running = false
    c.notify = {:contacts => ['me'], :priority => 1, :category => 'staging'}
  end
end

Any thoughts?


Source: (StackOverflow)

How do I write a Resque condition that says "if a process is running for longer than n seconds, kill it"?

I have a god/resque setup that spans a few worker servers. Every so often, the workers get jammed up by long-polling connections and won't time out correctly. We have tried coding around it, but (regardless of why that doesn't work) the keep-alive packets being sent down the wire won't let us time it out easily.

I would like certain workers (which I already have segmented out in their own watch blocks) to not be allowed to run for longer than a certain amount of time. In pseudocode, I am looking for a watch condition like the following (i.e. restart that worker if it takes longer than 60 seconds to complete the task):

# Pseudocode: :process_timer is a hypothetical condition name.
w.transition(:up, :restart) do |on|
  on.condition(:process_timer) { |c| c.greater_than = 60.seconds }
end

Any thoughts or pointers on how to accomplish this would be greatly appreciated.
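To make the requested behaviour concrete, here is a plain-Ruby sketch of the bookkeeping such a condition would need. It is not a god condition and does not use god's API; the class name JobClock and its methods are hypothetical, and how a worker reports the start and end of a job is left open.

```ruby
# Hypothetical helper (not part of god or Resque): remembers when each
# worker picked up its current job and lists workers over the time limit.
class JobClock
  def initialize(limit_seconds)
    @limit   = limit_seconds
    @started = {}   # worker name => time the current job started
  end

  def job_started(worker, now)
    @started[worker] = now
  end

  def job_finished(worker)
    @started.delete(worker)
  end

  # Names of workers whose current job has run longer than the limit.
  def overdue(now)
    @started.select { |_worker, started_at| now - started_at > @limit }.keys
  end
end
```

A supervising loop could call overdue with a monotonic timestamp, for example Process.clock_gettime(Process::CLOCK_MONOTONIC), and restart whatever it returns. Measuring per job rather than per process avoids restarting a long-lived worker that is merely idle.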


Source: (StackOverflow)

Sidekiq through god.rb never runs my workers, whereas the same command from the terminal does?

Running god.rb to start and monitor Sidekiq does not work. Below is my god config for sidekiq.

Running sidekiq -C /srv/books/current/config/sidekiq.yml manually from the terminal on production works fine, but the sidekiq god.rb config does not. Does anyone have an idea why this could happen? There is nothing much in the logs.

God.watch do |w|

  w.name = "sidekiq"
  w.interval = 30.seconds
  w.start = "cd #{ENV['RAILS_ROOT']}; sidekiq -C /srv/books/current/config/sidekiq.yml"
  w.stop = "cd #{ENV['RAILS_ROOT']}; exec sidekiqctl stop /srv/books/shared/tmp/pids/sidekiq.pid"
  w.restart = "#{w.stop} && #{w.start}"
  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.log = File.join(ENV['RAILS_ROOT'], 'log', 'sidekiq.log')

  # determine the state on startup
  w.transition(:init, {true => :up, false => :start}) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
      c.interval = 5.seconds
    end

    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
      c.interval = 5.seconds
    end
  end

  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_running) do |c|
      c.running = false
    end
  end


  # Notifications
  # --------------------------------------
  w.transition(:up, :start) do |on|
    on.condition(:process_exits) do |p|
      p.notify = 'ect'
    end
  end


end

Source: (StackOverflow)