EzDevInfo.com

statsd

A Ruby Statsd client that isn't a direct port of the Python example code. Because Ruby isn't Python.

How does StatsD store its data?

I've been going through the README at https://github.com/etsy/statsd but I can't figure out how StatsD stores the data it gets.

Does it do any permanent storage or is it one off thing? I was trying to figure out what database (if any) it uses or if it simply uses a file-based storage.


Source: (StackOverflow)

Tracking metrics using StatsD (via etsy) and Graphite, graphite graph doesn't seem to be graphing all the data

We have a metric that we increment every time a user performs a certain action on our website, but the graphs don't seem to be accurate.

So going off this hunch, we inspected the updates.log of carbon and discovered (using grep and wc) that the action had happened over 4 thousand times today, but according to the integral result of the graph it returned only around 220.

What could be the cause of this? Data is being reported to statsd using the statsd PHP library, calling statsd::increment('metric'); as stated above, the log confirms that 4,000+ updates to this key happened today.

We are using:

graphite 0.9.6 with statsD (etsy)


Source: (StackOverflow)


Can writing to a UDP socket ever block?

And if so, under what conditions? Or, phrased alternately, is it safe to run this code inside of twisted:

import socket

class StatsdClient(AbstractStatsdClient):
  def __init__(self, host, port):
    super(StatsdClient, self).__init__()
    self.addr = (host, port)
    self.server_hostname = socket.gethostname()
    # One datagram socket, reused for every send.
    self.udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

  def incr(self, stat, amount=1):
    data = {"%s|c" % stat: amount}
    self._send(data)

  def _send(self, data):
    for stat, value in data.iteritems():
      # Fire-and-forget: one UDP datagram per stat.
      self.udp_sock.sendto("servers.%s.%s:%s" % (self.server_hostname, stat, value), self.addr)
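For what it's worth, here is a small sketch I tried outside twisted (my own experiment, not part of the client above): making the socket non-blocking, so that a hypothetically full send buffer would surface as an exception instead of stalling the caller.

```python
import socket

# Sketch: a non-blocking UDP socket. If the OS send buffer were ever full,
# sendto would raise BlockingIOError rather than block the reactor thread.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setblocking(False)

payload = b"servers.myhost.test:1|c"
try:
    sent = sock.sendto(payload, ("127.0.0.1", 8125))
except BlockingIOError:
    sent = 0  # drop the datagram rather than block the caller
```

In practice a single small datagram essentially never hits this path, but the non-blocking mode documents the intent.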

Source: (StackOverflow)

Which StatsD client should I use for a java/grails project?

I'm looking at adding StatsD data collection to my Grails application, and looking around at existing libraries and code has left me a little confused as to what would be a good scalable solution. To put the question into context: I'm working on an online gaming project where I will be monitoring user interactions with the game engine. These will naturally cluster around particular moments in time, with X users performing interactions within a window of a second or two, then a repeat after a 10-20 second pause.

Here is my analysis of the options that are available today.

Etsy StatsD client example

https://github.com/etsy/statsd/blob/master/examples/StatsdClient.java

The "simplest thing that could possibly work" solution: I could pull this class into my project, instantiate a singleton instance as a Spring bean, and use it directly. However, after noticing that the grails-statsd plugin creates a pool of client instances, I started wondering about the scalability of this approach.

It seems that the doSend method could become a bottleneck if many threads try to send events at the same time; however, as I understand it, due to the fire-and-forget nature of sending UDP packets, this should happen quickly, avoiding the overhead we usually associate with network connections.

grails-statsd plugin

https://github.com/charliek/grails-statsd/

Someone has already created a StatsD plugin for grails that includes some nice features, such as the annotations and withTimer method. However I see that the implementation there is missing some bug fixes from the example implementation such as specifying the locale on calls to String.format. I'm also not a huge fan of pulling in apache commons-pool just for this, when a standard Executor could achieve a similar effect.

java-statsd-client

https://github.com/tim-group/java-statsd-client/

This is an alternative pure Java library that operates asynchronously by maintaining its own ExecutorService. It supports the entire StatsD API, including sets and sampling, but doesn't provide any hooks for configuring the thread pool and queue size. In the case of problems, for non-critical things such as monitoring, I would prefer a finite queue and losing events to an unbounded queue that fills up my heap.

Play statsd plugin

https://github.com/vznet/play-statsd/

Now, I can't use this code directly in my Grails project, but I thought it was worth a look to see how things were implemented. Generally I love the way the code in StatsdClient.scala is built up: very clean and readable. It also appears to have the locale bug, but is otherwise feature-complete with the Etsy sample. Interestingly, unless there is some Scala magic I've not understood, this appears to create a new socket for each data point sent to StatsD. While this approach neatly avoids the need for an object pool or executor thread, I can't imagine it's terribly efficient, potentially performing DNS lookups on the request thread, which should return to the user as soon as possible.

The questions

  1. Judging by the fact that all the other implementations appear to have implemented another strategy for handling concurrency, can I assume that the Etsy example is a little too naïve for production use?
  2. Does my analysis here appear to be correct?
  3. What are other people using for statsd in java/groovy?

So far it looks like the best existing solution is the grails plugin as long as I can accept the commons-pool dependency, but right now I'm seriously considering spending Sunday writing my own version that combines the best parts of each implementation.


Source: (StackOverflow)

Graph old data using graphite and statsd

Can I attach a timestamp when sending data to Graphite via StatsD (the JavaScript statsd)? I need to graph old data.
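For reference, StatsD timestamps data itself at flush time, but Carbon's plaintext protocol does accept an explicit timestamp per datapoint, so one workaround I've considered is bypassing StatsD and writing historical points straight to Carbon. A minimal sketch (the metric name, host, and values here are made up):

```python
def carbon_line(metric, value, timestamp):
    """One datapoint in Carbon's plaintext protocol: 'path value timestamp'."""
    return "%s %s %d\n" % (metric, value, timestamp)

line = carbon_line("stats.old_data.count", 42, 1372952520)

# Sending it would look something like this (2003 is Carbon's default
# plaintext listener port):
#   import socket
#   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
#   sock.connect(("graphite-host", 2003))
#   sock.sendall(line.encode("ascii"))
```

Note that this skips StatsD's aggregation entirely, so the values written should already be the per-interval totals.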


Source: (StackOverflow)

Statsd & Graphite - get data as CSV

I use statsd for measuring stats and Graphite for displaying them. However, I would like to do a more sophisticated analysis in statistical software, to find the relationships between various variables.

In order to do this, I need the "raw" data, which is usually displayed in Graphite as colored lines. Is it possible to get the data in CSV format? Data sampled at 1 entry per 10 seconds would be perfect, and that's statsd's default behavior, I think.
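For what it's worth, graphite-web's render API can return the raw datapoints directly: the same target used to draw a graph can be fetched with format=csv (or format=json). The host and metric name below are placeholders:

```
http://graphite-host/render?target=stats.timers.myapp.mean&from=-1h&format=csv
```

The CSV rows come back as metric path, timestamp, value, which imports cleanly into most statistical software.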


Source: (StackOverflow)

Getting accurate graphite stats_counts

We have the etsy/statsd node application running, flushing stats to carbon/whisper every 10 seconds. If you send 100 increments (counts) in the first 10 seconds, graphite displays them properly, like:

localhost:3000/render?from=-20min&target=stats_counts.test.count&format=json

[{"target": "stats_counts.test.count", "datapoints": [
 [0.0, 1372951380], [0.0, 1372951440], ... 
 [0.0, 1372952460], [100.0, 1372952520]]}]

However, 10 seconds later this number falls to 0, null, or 33.3. Eventually it settles at a value 1/6th of the initial number of increments, in this case 16.6.

/opt/graphite/conf/storage-schemas.conf is:

[sixty_secs_for_1_days_then_15m_for_a_month]
pattern = .*
retentions = 10s:10m,1m:1d,15m:30d

I would like to get accurate counts. Is graphite perhaps averaging the data over the 60-second windows rather than summing it? Using the integral function after some time has passed obviously gives:

localhost:3000/render?from=-20min&target=integral(stats_counts.test.count)&format=json

[{"target": "stats_counts.test.count", "datapoints": [
 [0.0, 1372951380], [16.6, 1372951440], ... 
 [16.6, 1372952460], [16.6, 1372952520]]}]
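For anyone seeing the same symptom: by default whisper averages when rolling the 10s points up into 60s points, which turns 100 counts in one 10-second slot into 100/6 ≈ 16.6 per minute. A storage-aggregation.conf entry along these lines (the section name and pattern are illustrative) makes rollups sum instead:

```
[sum_counts]
pattern = ^stats_counts\.
xFilesFactor = 0
aggregationMethod = sum
```

Note that aggregation settings only apply to newly created whisper files; existing files need to be recreated or adjusted with the whisper utility scripts.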

Source: (StackOverflow)

StatsD/Graphite Naming Conventions for Metrics

I'm beginning the process of instrumenting a web application, and using StatsD to gather as many relevant metrics as possible. For instance, here are a few examples of the high-level metric names I'm currently using:

http.responseTime
http.status.4xx
http.status.5xx
view.renderTime
oauth.begin.facebook
oauth.complete.facebook
oauth.time.facebook
users.active

...and there are many, many more. What I'm grappling with right now is establishing a consistent hierarchy and set of naming conventions for the various metrics, so that the current ones make sense and there are logical buckets within which to add future metrics.
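To make the kind of convention I mean concrete, here is a toy helper I've been playing with: it enforces a fixed subsystem.action.qualifier hierarchy and sanitises components so they can't accidentally create extra levels. The scheme and names are just an example, not an established standard:

```python
def metric_name(*parts):
    """Join name components into a dotted Graphite path, replacing
    characters that would unintentionally add hierarchy levels."""
    cleaned = [str(p).replace(".", "_").replace(" ", "_") for p in parts]
    return ".".join(cleaned)

# group by subsystem, then action, then qualifier:
name = metric_name("oauth", "complete", "facebook")  # "oauth.complete.facebook"
```

Centralizing name construction like this at least keeps one typo from splitting a metric into two trees.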

My question is twofold:

  1. What relevant metrics are you gathering that you have found indispensable?
  2. What naming structure are you using to categorize metrics?

Source: (StackOverflow)

Is there any way to fill in missing data in graphite when using statsD?

I'm using statsd to report counter data to graphite; it sends a tick every time I get a message. This works great, except when statsd has to restart for whatever reason. Then I get huge holes in my graphs, since statsd is no longer sending '0' every 10 seconds for periods when I didn't get any messages.

I'm reporting for various different message types and queues, and sometimes I don't get a message for a particular queue for a long time.

Is there any existing way to 'fill-in' the missing data with a default value I specify (in my case this would be 0)?

I thought about sending a '0' count for a given metric so that statsD starts sending 0's for it, but I don't always know the set of metrics I'll be reporting in advance.
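One render-time workaround I'm aware of (rather than fixing it on the statsd side) is Graphite's transformNull function, which substitutes a default value for missing datapoints. The metric path here is a placeholder:

```
target=transformNull(stats_counts.queue.*.messages, 0)
```

This only changes how gaps are drawn; the underlying whisper data still has nulls for the restart window.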


Source: (StackOverflow)

How do you run utility services on Heroku?

Heroku is fantastic for prototyping ideas and running simple web services; I often use it to run Python web services like Flask and Django and try out ideas. However, I've always struggled to understand how you can use the infrastructure to run those amazingly powerful support or utility services every startup needs in its stack. Four examples of services I can't live without and would recommend to any startup:

  • Jenkins
  • Statsd
  • Graphite
  • Graylog

How would you run these on Heroku? Would it be better just getting dedicated boxes (Rackspace, etc.) with these support services installed?

Has anyone run utility daemons (services) on Heroku?


Source: (StackOverflow)

Kibana 3 Milestone 4 and Graphite Integration

I am having difficulties understanding the integration of Graphite and Kibana 3 to monitor logs and system vitals. I am referring to the figure in the log management system described here.

  1. Considering the new features in Kibana 3 Milestone 4, can we collect system vitals and store them directly into Elasticsearch instead of Graphite, and use a single Kibana dashboard? (What would be the right choice in a distributed system where the emphasis is on performance and a low memory footprint?)
  2. Why must we use StatsD and Graphite, when counts and simple statistics are now supported by the Kibana-Elasticsearch combination?
  3. In case we decide to use both Graphite and Kibana, how do we integrate them into a single dashboard?
  4. Is there a tutorial for integrating dashboards (kibana and graphitos/graph explorer/orion/pencil)?

Thanks in advance.


Source: (StackOverflow)

Making Graphite UI data cumulative by default

I'm setting up Graphite, and hit a problem with how data is represented on the screen when there's not enough pixels.

I found this post whose first answer is very close to what I'm looking for:

No, what is probably happening is that you're looking at a graph with more datapoints than pixels, which forces Graphite to aggregate the datapoints. The default aggregation method is averaging, but you can change it to summing by applying the cumulative() function to your metrics.

Is there any way to get this cumulative() behavior by default?

I've modified my storage-aggregation.conf to use 'aggregationMethod = sum', but I believe this is for historical data and not for data that's displayed in the UI.

When I apply cumulative() everything is perfect, I'm just wondering if there's a way to get this behavior by default.


Source: (StackOverflow)

Correct use of Graphite metric names

I'm building a web analytics tool and am considering using Graphite. This is a very basic tool with just a few interesting dimensions, but there are multiple dimensions associated with each measurement. For example, when a user hits the site I want to keep track of geography, browser, etc. The metric name would probably be:

usa.chrome.windows8.organic...

I can then use wildcards to do interesting queries.

Is this an abuse of metric names (and Graphite in general), or is it a good approach as long as I only care about a small number of metrics?
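For concreteness, with names structured like the above, render queries can slice on any one dimension by wildcarding the rest (the paths here are hypothetical):

```
# all Chrome hits, regardless of geography, OS, or traffic source:
target=sumSeries(*.chrome.*.*)
```

The trade-off is that every combination of dimension values creates its own whisper file, so cardinality grows multiplicatively.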


Source: (StackOverflow)

Does a thread-safe statsd client exist?

I need to use a thread-safe statsd client in a web application to report statistics from user threads. Please suggest a solution that is thread-safe and does not compromise performance.
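For what it's worth, in Python a single shared UDP socket is effectively thread-safe for this use, because each increment is one atomic sendto() of a single datagram and the client keeps no mutable state between calls. A minimal sketch (counter-only; the metric format follows the statsd line protocol, and the names are illustrative):

```python
import socket

class ThreadSafeStatsd(object):
    """Minimal counter-only statsd client, safe to share across threads:
    each increment is a single atomic sendto on one datagram socket."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, stat, amount=1):
        # statsd counter line protocol: "<name>:<value>|c"
        self.sock.sendto(("%s:%d|c" % (stat, amount)).encode("ascii"), self.addr)
```

One instance can be created at startup and shared; there is no lock, queue, or pool to contend on, and the UDP send itself does not wait on the network.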


Source: (StackOverflow)

Having trouble getting accurate numbers from graphite

I have an application that publishes a number of stats to graphite via statsd. One of the stats simply sends an increment to statsd every time a message is received by the service. I need to display a graph that shows the relative traffic over time for this stat. Generally speaking, I should be able to display a graph that refreshes every, say, 10 seconds, and displays how many messages were received in those 10 seconds, as well as the history for a given period of time. However, no matter how I format my API query I cannot seem to get accurate data. I've read a number of articles, including this one:

http://code.hootsuite.com/accurate-counting-with-graphite-and-statsd/

That seems to give some good insight but still doesn't quite give me what I need. This is the closest I have come:

integral(hitcount(stats.recieved, "10seconds"))

However, I don't like the cumulative result of this, and when I run it I get statistics that come nowhere near what I see in my logs for messages received. I am OK with accepting some packet loss, but we're talking about orders of magnitude. I know I am doing something wrong; just hoping someone can give me some insight as to what.


Source: (StackOverflow)