logstash interview questions

Top logstash frequently asked interview questions

How to handle multiple heterogeneous inputs with Logstash?

Let's say you have two very different types of logs, such as technical and business logs, and you want:

  • raw technical logs to be routed to a Graylog2 server using a gelf output,
  • JSON business logs to be stored in an Elasticsearch cluster using the dedicated elasticsearch_http output.

I know that with Syslog-NG, for instance, the configuration file allows you to define several distinct inputs which can then be processed separately before being dispatched, which is something Logstash seems unable to do. Even if one instance is started with two specific configuration files, all logs take the same channel and have the same processing applied to them...

Should I run as many instances as I have different types of logs?
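
For reference, this kind of routing can be done in a single instance with conditionals on the event type (available in newer versions, roughly 1.3 onwards); a minimal sketch, where the inputs, hostnames and ports are placeholder assumptions:

input {
  tcp { port => 5140 type => "technical" }
  tcp { port => 5141 type => "business" codec => "json" }
}

output {
  # raw technical logs go to Graylog2 via GELF
  if [type] == "technical" {
    gelf { host => "graylog.example.org" }
  }
  # JSON business logs go to the Elasticsearch cluster
  if [type] == "business" {
    elasticsearch_http { host => "es.example.org" }
  }
}

Filters can be wrapped in the same kind of if [type] == "..." blocks, so each log type gets its own processing within one pipeline.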


Source: (StackOverflow)

Getting Started With Logstash Filters

Looking for a little help getting started... I have Logstash installed (as well as ElasticSearch) but I'm struggling with my first filter.

As a test I have it configured to read from a trimmed log file that contains 6 lines; each line begins with a timestamp such as [11/5/13 4:09:21:327 PST], followed by a bunch of other data.

For now I have my conf file set to read this file and I'm trying to do a very basic grok filter to match the lines, maybe to grab the timestamp and then the rest of the data (from where I can start splitting it up).

Here is what I have:

input {
  file {
    type => "chris"
    path => "/home/chris/Documents/test.log" 
  }
}
filter {
  grok {
    type => "chris"
    pattern => "%{GREEDYDATA:logline}"
  }
}
output {
  stdout { debug => true debug_format => "json" }
}

I was kind of expecting (hoping) that when I ran Logstash it'd match each line and output it, and then I could start breaking the lines down and filtering by adjusting the pattern, but as I can't get this first basic bit to work I'm a little stumped.

Does anyone have a similar conf file they'd be willing to share? Most of the examples I can find are more advanced, and I seem to be stuck trying to get out of the gate.

Thanks,

Chris.
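
For reference, a minimal sketch in more recent Logstash syntax (the grok type and pattern options were removed in later versions, so a conditional and match are used instead); the path is the one from the question, and the bracketed-timestamp pattern is only a starting point:

input {
  file {
    type => "chris"
    path => "/home/chris/Documents/test.log"
    # read the file from the top instead of tailing it (useful for testing)
    start_position => "beginning"
  }
}

filter {
  if [type] == "chris" {
    grok {
      # capture the bracketed timestamp, then keep the rest of the line as logline
      match => [ "message", "\[%{DATA:log_timestamp}\]%{GREEDYDATA:logline}" ]
    }
  }
}

output {
  stdout { codec => rubydebug }
}

One caveat worth noting: the file input remembers how far it has read (the sincedb mechanism), so a file that has already been consumed may not be re-emitted on restart during testing.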


Source: (StackOverflow)

Logstash with Elasticsearch

I am trying to connect Logstash with Elasticsearch but cannot get it working.

Here is my logstash conf:

input {
  stdin {
    type => "stdin-type"
  }

  file {
    type => "syslog-ng"

    # Wildcards work, here :)
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
  }
}

output {
  stdout { }
  elasticsearch {
    type => "all"
    embedded => false
    host => "192.168.0.23"
    port => "9300"
    cluster => "logstash-cluster"
    node_name => "logstash"
  }
}

And I only changed these details in my elasticsearch.yml

cluster.name: logstash-cluster
node.name: "logstash"
node.master: false
network.bind_host: 192.168.0.23
network.publish_host: 192.168.0.23
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["localhost"]

With these configurations I could not make Logstash connect to ES. Can someone please suggest where I am going wrong?
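
One thing worth checking (a sketch, not a guaranteed fix): with embedded => false and port 9300 the output joins the cluster using the node/transport protocol, which requires the Elasticsearch library bundled with Logstash and the remote cluster to be version-compatible and able to discover each other. Talking to Elasticsearch over HTTP on port 9200 sidesteps both issues; the IP below is the one from the question:

output {
  stdout { }
  elasticsearch {
    host => "192.168.0.23"
    port => "9200"
    protocol => "http"
  }
}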


Source: (StackOverflow)

How to reference environment variables in logstash configuration file?

Is it possible to reference environment variables in logstash configuration?

In my case, I want to make the Elasticsearch address configurable, using a value I have set in the environment.
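
For what it's worth, newer Logstash releases support referencing environment variables directly in the configuration with ${VAR} syntax (introduced around 2.3 behind a command-line flag and enabled by default in later versions); a sketch assuming such a version and a hypothetical ES_HOST variable:

output {
  elasticsearch {
    # ES_HOST is a hypothetical variable, e.g. exported as ES_HOST=es.example.org:9200
    hosts => ["${ES_HOST}"]
  }
}

On older versions the usual workaround is to generate the configuration file from a template (e.g. with sed or a provisioning tool) before starting Logstash.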


Source: (StackOverflow)

Decompose Logstash json message into fields

I have a logfile that stores events with a timestamp and a JSON message. For example:

timestamp {"foo": 12, "bar": 13}

I would like to decompose the keys (foo and bar) in the json part into fields in the Logstash output.

I'm aware that I can set the format option of the Logstash file input to json_event, but in that case I would have to include the timestamp in the JSON. There is also a json filter, but that adds a single field containing the complete JSON data structure instead of splitting out the keys.

Any ideas how this can be done?
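
One approach that is commonly suggested (a sketch, not tested against this exact layout, and assuming the timestamp is something grok can match): split the line with grok first and hand only the JSON part to the json filter, which does expand the keys into separate fields when given a source:

filter {
  grok {
    # separate the leading timestamp from the JSON payload; adjust the timestamp pattern to the real format
    match => [ "message", "%{TIMESTAMP_ISO8601:log_timestamp} %{GREEDYDATA:json_payload}" ]
  }
  json {
    # parse json_payload and merge its keys (foo, bar, ...) into the event as top-level fields
    source => "json_payload"
  }
}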


Source: (StackOverflow)

Logstash date parsing as timestamp using the date filter

Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work but obviously doesn't. I'm using Logstash 1.4.2-1-2-2c0f5a1 on an Ubuntu 14.04 LTS machine, and I am receiving messages such as the following one:

2014-08-05 10:21:13,618 [17] INFO  Class.Type - This is a log message from the class:
  BTW, I am also multiline

In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text into several parts so that it is easier to read.

In the end, I obtain, as seen in Kibana, something like the following (JSON view):

{
  "_index": "logstash-2014.08.06",
  "_type": "customType",
  "_id": "PRtj-EiUTZK3HWAm5RiMwA",
  "_score": null,
  "_source": {
    "@timestamp": "2014-08-06T08:51:21.160Z",
    "@version": "1",
    "tags": [
      "multiline"
    ],
    "type": "utg-su",
    "host": "ubuntu-14",
    "path": "/mnt/folder/thisIsTheLogFile.log",
    "logTimestamp": "2014-08-05;10:21:13.618",
    "logThreadId": "17",
    "logLevel": "INFO",
    "logMessage": "Class.Type - This is a log message from the class:\r\n  BTW, I am also multiline\r"
  },
  "sort": [
    "21",
    1407315081160
  ]
}

You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).

I have unsuccessfully tried to use the date filter in multiple ways; it apparently did not work.

date {
            locale => "en"
            match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
            timezone => "Europe/Vienna"
            target => "@timestamp"
            add_field => { "debug" => "timestampMatched"}
        }

Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.

I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.

In all of these cases, the event goes to Elasticsearch, but the date filter apparently does nothing, as @timestamp and logTimestamp remain different and no debug field is added.

Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.

As you can see below, sorting with timestamps works: OK.

When sorting over @timestamp, Elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather the time at which the Logstash event was read, I (obviously) need to be able to sort over logTimestamp as well. This is what is then output, and it is obviously not that useful:

Sorting with the string: not OK. Any suggestions?

Any help is welcome! Just let me know if I forgot some information that may be useful.

Update:

Here is the filter config file that finally worked:

# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO  Class.Type - This is a log message from the class:
#  BTW, I am also multiline

# Take only type- events (type-componentA, type-componentB, etc)
filter {
    # You cannot write an "if" outside of the filter!
    if "type-" in [type] {
        grok {
            # Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
            patterns_dir => "./patterns"
            match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
        }

        # The timestamp may have commas instead of dots. Convert so as to store everything in the same way
        mutate {
            gsub => [
                # replace all commas with dots
                "logTimestampString", ",", "."
                ]
        }

        mutate {
            gsub => [
                # make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
                # but somehow apparently makes things easier for the date filter
                "logTimestampString", " ", ";"
                ]
        }

        date {
            locale => "en"
            match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
            timezone => "Europe/Vienna"
            target => "logTimestamp"
        }
    }
}

filter {
    if "type-" in [type] {
        # Remove already-parsed data
        mutate {
            remove_field => [ "message" ]
        }
    }
}

Source: (StackOverflow)

Suggestion on Logstash Setup

I've implemented Logstash (in testing) with the architecture mentioned below.

[Architecture diagram: rsyslog clients → Logstash (syslog receiver) → Redis → Logstash (indexer) → Elasticsearch → Kibana]

Component Break Down

  • Rsyslog client: rsyslog is installed by default on most Linux distros; we just need to configure it to send logs to the remote server.
  • Logstash: Logstash receives the logs from the rsyslog clients and stores them in Redis.
  • Redis: Redis works as a broker, holding the log data sent by the agents until Logstash indexes it. Having a broker improves the performance of the Logstash server: Redis acts as a buffer for the log data until Logstash indexes and stores it, and since it lives in RAM it is very fast.
  • Logstash: yes, two instances of Logstash; the first one acts as the syslog server, the second reads data from Redis and sends it on to Elasticsearch.
  • Elasticsearch: The main objective of a central log server is to collect all logs in one place and to provide meaningful data for analysis, e.g. being able to search all log data for a particular application over a specified time period. That requires search and indexing capability on the log server, which is where the open-source tool Elasticsearch comes in: it builds an index and then searches that index to make queries fast. It is essentially a search engine for text data.
  • Kibana: Kibana is a user-friendly way to view, search and visualize your log data.

But I'm a little bit confused about Redis: in this scenario I'll be running three Java processes on the Logstash server plus one Redis instance, which will take a huge amount of RAM.

Question: Can I use only one Logstash instance and Elasticsearch? Or what would be the best way?
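
If volume is modest, the Redis broker and the second Logstash instance are optional; a single Logstash process can receive syslog and write straight to Elasticsearch. A minimal sketch (port and host are placeholder assumptions):

input {
  syslog {
    # rsyslog clients forward to this port
    port => 5514
    type => "syslog"
  }
}

output {
  elasticsearch {
    host => "127.0.0.1"
    protocol => "http"
  }
}

The trade-off is that without the broker there is no buffer if Elasticsearch slows down or goes away, so events can back up or be lost during outages.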


Source: (StackOverflow)

Is it possible to have Centralised Logging for ElasticBeanstalk Docker apps?

We have a custom Docker web app running in an Elastic Beanstalk Docker container environment. We would like the application logs to be available for viewing externally, without downloading them through instance SSH or the AWS console.

So far none of the solutions has been acceptable. Maybe someone has achieved centralised logging for Elastic Beanstalk Dockerized apps?

Solution 1: AWS Console log download

not acceptable - requires to download logs, extract every time. Non real-time.

Solution 2: S3 + Elasticsearch + Fluentd

Fluentd does not have a plugin to retrieve logs from S3. There is an excellent S3 plugin, but it is only for log output to S3, not for reading logs from S3.

Solution 3: S3 + Elasticsearch + Logstash

Cons: can only pull all logs from the entire bucket, or nothing.

The problem lies with the Elastic Beanstalk S3 log storage structure. You cannot specify a file name pattern; it is either all logs or nothing. Elastic Beanstalk saves logs on S3 in a path containing random instance and environment IDs:

s3.bucket/resources/environments/logs/publish/e-<random environment id>/i-<random instance id>/my.log@

The Logstash s3 plugin can only be pointed at resources/environments/logs/publish/. When you try to point it at environments/logs/publish/*/my.log it does not work, which means you cannot pull a particular log and tag/type it so that it can be found in Elasticsearch. Since AWS saves logs from all your environments and instances in the same folder structure, you cannot even choose the instance.

Solution 4: AWS CloudWatch Console log viewer

It is possible to forward your custom logs to the CloudWatch console. To achieve that, put configuration files in the .ebextensions path of your app bundle: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/AWSHowTo.cloudwatchlogs.html

There's a file called cwl-webrequest-metrics.config which allows you to specify log files along with alerts, etc. Great!? Except that the configuration file format is neither YAML, XML nor JSON, and it is not documented. There are absolutely zero mentions of that file or its format, either on the AWS documentation website or anywhere else on the net. And getting one log file to appear in CloudWatch is not simply a matter of adding a configuration line; the only possible way to get this working seems to be trial and error. Great!? Except that for every attempt you need to re-deploy your environment.

There's only one reference to how to make this work with custom log: http://qiita.com/kozayupapa/items/2bb7a6b1f17f4e799a22 I have no idea how that person reverse engineered the file format.

cons:

  • Cloudwatch does not seem to be able to split logs into columns when displaying, so you can't easily filter by priority, etc.
  • AWS Console Log viewer does not have auto-refresh to follow logs.
  • A nightmare of an undocumented configuration file format with no way of testing; trial and error requires re-deploying the whole environment.

Source: (StackOverflow)

Logstash: how to add file name as a field?

I'm using Logstash + Elasticsearch + Kibana to have an overview of my Tomcat log files.

For each log entry I need to know the name of the file from which it came. I'd like to add it as a field. Is there a way to do it? I've googled a little and I've only found this SO question, but the answer is no longer up-to-date.

So far the only solution I see is to specify separate configuration for each possible file name with different "add_field" like so:

input {
  file {
     type => "catalinalog"
     path => [ "/path/to/my/files/catalina**" ]
     add_field => { "server" => "prod1" }
  }
}

But then I need to reconfigure logstash each time there is a new possible file name. Any better ideas?
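
Note that the file input already records the originating file in a path field on every event, so one option (a sketch, with the field name filename being my own choice) is to derive the name from that field instead of hard-coding add_field per file:

filter {
  grok {
    # extract the base file name from the path field set by the file input
    match => [ "path", "(?<filename>[^/]+)$" ]
  }
}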


Source: (StackOverflow)

How to debug the logstash file plugin

Is it possible to log actions of the logstash file plugin? (i.e. what files it tries to send, what errors happen, etc)
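
Two things usually help here (a sketch, assuming a 1.x-era install): running Logstash with verbose or debug logging enabled on the command line, and adding a stdout output so you can see immediately whether events are being read at all:

output {
  # print every event Logstash emits, so missing or unread files show up quickly
  stdout { codec => rubydebug }
}

It is also worth knowing that the file input keeps a sincedb file recording how far each file has been read, which explains many cases of "nothing is being sent".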


Source: (StackOverflow)

Logstash close file descriptors?

BACKGROUND:

We have rsyslog creating log file directories like /var/log/rsyslog/SERVER-NAME/LOG-DATE/LOG-FILE-NAME, so multiple servers are spilling their logs for different dates into a central location.

Now, to read these logs and store them in Elasticsearch for analysis, my Logstash config file looks something like this:

file {
  path => "/var/log/rsyslog/**/*.log"
}

ISSUE :

Now, as the number of log files in the directory increases, Logstash opens file descriptors (FDs) for the new files but does not release the FDs of log files it has already read. Since log files are generated per date, a file is of no further use once it has been read, because it will not be updated after that date.

I have increased the open files limit to 65K in /etc/security/limits.conf.

Can we make Logstash close the handles after some time, so that the number of open file handles does not grow too large?
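
Later versions of the file input plugin added options for exactly this; a sketch assuming a plugin version that supports them (the option names are the ones I recall, and the value format has changed between plugin versions from seconds to duration strings, so check the documentation for your version):

input {
  file {
    path => "/var/log/rsyslog/**/*.log"
    # close the file handle when the file has not changed for an hour
    close_older => 3600
    # skip files whose last modification is more than a day old
    ignore_older => 86400
  }
}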


Source: (StackOverflow)

Importing and updating data in Elasticsearch

We have an existing search function that involves data across multiple tables in SQL Server. This causes a heavy load on our DB, so I'm trying to find a better way to search through this data (it doesn't change very often). I have been working with Logstash and Elasticsearch for about a week using an import containing 1.2 million records. My question is essentially, "how do I update existing documents using my 'primary key'"?

CSV data file (pipe delimited) looks like this:

369|90045|123 ABC ST|LOS ANGELES|CA
368|90045|PVKA0010|LA|CA
367|90012|20000 Venice Boulvd|Los Angeles|CA
365|90045|ABC ST 123|LOS ANGELES|CA
363|90045|ADHOCTESTPROPERTY|DALES|CA

My logstash config looks like this:

input {
  stdin {
    type => "stdin-type"
  }

  file {
    path => ["C:/Data/sample/*"]
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => ["property_id","postal_code","address_1","city","state_code"]
    separator => "|"
  }
}

output {
  elasticsearch {
    embedded => true
    index => "samples4"
    index_type => "sample"
  }
}

A document in elasticsearch, then looks like this:

{
   "_index": "samples4",
   "_type": "sample",
   "_id": "64Dc0_1eQ3uSln_k-4X26A",
   "_score": 1.4054651,
   "_source": {
   "message": [
      "369|90045|123 ABC ST|LOS ANGELES|CA\r"
   ],
   "@version": "1",
   "@timestamp": "2014-02-11T22:58:38.365Z",
   "host": "[host]",
   "path": "C:/Data/sample/sample.csv",
   "property_id": "369",
   "postal_code": "90045",
   "address_1": "123 ABC ST",
   "city": "LOS ANGELES",
   "state_code": "CA"
   }
}

I would like the unique ID in the _id field to be replaced with the value of property_id. The idea is that subsequent data files would contain updates. I don't need to keep previous versions, and there wouldn't be a case where we add or remove keys from a document.

The document_id setting for the elasticsearch output didn't put that field's value into _id (it just put in the literal string "property_id" and only stored/updated one document). I know I'm missing something here. Am I just taking the wrong approach?

EDIT: WORKING!

Using @rutter's suggestion, I've updated the output config to this:

output {
  elasticsearch {
    embedded => true
    index => "samples6"
    index_type => "sample"
    document_id => "%{property_id}"
  }
}

Now documents are updated by dropping new files into the data folder, as expected, and _id and property_id hold the same value.

{
   "_index": "samples6",
   "_type": "sample",
   "_id": "351",
   "_score": 1,
   "_source": {
   "message": [
      "351|90045|Easy as 123 ST|LOS ANGELES|CA\r"
   ],
   "@version": "1",
   "@timestamp": "2014-02-12T16:12:52.102Z",
   "host": "TXDFWL3474",
   "path": "C:/Data/sample/sample_update_3.csv",
   "property_id": "351",
   "postal_code": "90045",
   "address_1": "Easy as 123 ST",
   "city": "LOS ANGELES",
   "state_code": "CA"
   }
}


Source: (StackOverflow)

Logstash agent not indexing anymore

I have a Logstash instance running as a service that reads from Redis and outputs to Elasticsearch. I just noticed there was nothing new in Elasticsearch for the last few days, but the Redis lists were increasing.

The Logstash log was filled with two errors repeated over thousands of lines:

:message=>"Got error to send bulk of actions"
:message=>"Failed to flush outgoing items"

The reason being:

{"error":"IllegalArgumentException[Malformed action/metadata line [107], expected a simple value for field [_type] but found [START_ARRAY]]","status":500}, 

Additionally, trying to stop the service failed repeatedly and I had to kill it. Restarting it emptied the Redis lists and imported everything into Elasticsearch. It seems to work OK now.

But I have no idea how to prevent that from happening again. The mentioned type field is set as a string for each input directive, so I don't understand how it could have become an array.
What am I missing?

I'm using Elasticsearch 1.7.1 and Logstash 1.5.3. The logstash.conf file looks like this:

input {
  redis {
    host => "127.0.0.1"
    port => 6381
    data_type => "list"
    key => "b2c-web"
    type => "b2c-web"
    codec => "json"
  }
  redis {
    host => "127.0.0.1"
    port => 6381
    data_type => "list"
    key => "b2c-web-staging"
    type => "b2c-web-staging"
    codec => "json"
  }

    /* other redis inputs, only key/type variations */
}
filter {
  grok {
    match => ["msg", "Cache hit %{WORD:query} in %{NUMBER:hit_total:int}ms. Network: %{NUMBER:hit_network:int} ms.     Deserialization %{NUMBER:hit_deserial:int}"]
    add_tag => ["cache_hit"]
    tag_on_failure => []
  }
  /* other groks, not related to type field */
}
output {
  elasticsearch {
    host => "[IP]"
    port => "9200"
    protocol=> "http"
    cluster => "logstash-prod-2"
  }
}

Source: (StackOverflow)

Using Log4J with LogStash

I'm new to LogStash. I have some logs written from a Java application in Log4J. I'm in the process of trying to get those logs into ElasticSearch. For the life of me, I can't seem to get it to work consistently. Currently, I'm using the following logstash configuration:

input {
  file {
    type => "log4j"
    path => "/home/ubuntu/logs/application.log"
  }
}
filter {
  grok {
    type => "log4j"
    add_tag => [ "ApplicationName" ]
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp}  %{LOGLEVEL:level}" ]
  }
}
output {
  elasticsearch {
    protocol => "http"
    codec => "plain"
    host => "[myIpAddress]"
    port => "[myPort]"
  }
}

This configuration seems to be hit or miss, and I'm not sure why. For instance, I have two messages: one works, and the other throws a parse failure. Here are the messages and their respective results:

Tags                   Message
------                 -------
["_grokparsefailure"]  2014-04-04 20:14:11,613 TRACE c.g.w.MyJavaClass [pool-2- 
                       thread-6] message was null from https://domain.com/id-1/env-
                       MethodName

["ApplicationName"]    2014-04-04 20:14:11,960 TRACE c.g.w.MyJavaClass [pool-2-
                       thread-4] message was null from https://domain.com/id-1/stable-
                       MethodName

The one with ["ApplicationName"] has my custom fields of timestamp and level. However, the entry with ["_grokparsefailure"] does NOT have my custom fields. The strange part is that the logs are nearly identical, as shown in the message column above. This is really confusing me, and I don't know how to figure out what the problem is or how to get beyond it. Does anyone know how I can import Log4J logs into Logstash and consistently get the following fields (see the pattern sketch after this list):

  • Log Level
  • Timestamp
  • Log message
  • Machine Name
  • Thread
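
For what it's worth, a grok sketch that matches the two sample lines above (the field names are mine, and %{SPACE} is used to absorb the variable spacing that often causes one line to match and a near-identical one to fail):

filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:level}%{SPACE}%{JAVACLASS:class}%{SPACE}\[%{DATA:thread}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
  }
}

The machine name does not need to be parsed out of the message; the file input adds it as the host field automatically.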

Thank you for any help you can provide. Even if I can just get the log level, timestamp, and log message, that would be a HUGE help. I sincerely appreciate it!


Source: (StackOverflow)

Regex query syntax examples in kibana [closed]

I am trying to find out what kinds of syntax I can use in a regex-type query through Kibana, but I was not able to find any information on this. I am running Logstash and Elasticsearch in the backend. Any answer or example would be helpful.


Source: (StackOverflow)