EzDevInfo.com

elasticsearch


How to test ElasticSearch in a Rails application (Rspec)

I was wondering how you test search in your application when using ElasticSearch and Tire.

  • How do you set up a new ElasticSearch test instance? Is there a way to mock it?

  • Are there any gems you know of that might help with that?


Some stuff I found helpful:

I found a great article answering pretty much all my questions :)

http://bitsandbit.es/post/11295134047/unit-testing-with-tire-and-elastic-search#disqus_thread

Plus, there is an answer from Karmi, Tire author.

This is useful as well: https://github.com/karmi/tire/wiki/Integration-Testing-Rails-Models-with-Tire

I can't believe I did not find these before asking...


Source: (StackOverflow)

How to handle multiple heterogeneous inputs with Logstash?

Let's say you have 2 very different types of logs such as technical and business logs and you want:

  • raw technical logs to be routed towards a graylog2 server using a gelf output,
  • json business logs to be stored in an elasticsearch cluster using the dedicated elasticsearch_http output.

I know that with Syslog-NG, for instance, the configuration file allows you to define several distinct inputs which can then be processed separately before being dispatched; this is something Logstash seems unable to do. Even if one instance can be started with two specific configuration files, all logs go through the same channel and get the same processing applied...

Should I run as many instances as I have different types of logs?
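For what it's worth, later Logstash 1.x releases support conditionals, which let a single instance route events by a `type` field set on each input, so separate instances aren't strictly necessary. A rough sketch of the idea — the paths, hosts, and field values below are made up:

```
input {
  file { path => "/var/log/app/technical.log" type => "technical" }
  file { path => "/var/log/app/business.json" type => "business" }
}
output {
  # Route each event based on the type assigned by its input
  if [type] == "technical" {
    gelf { host => "graylog2.example.com" }
  }
  if [type] == "business" {
    elasticsearch_http { host => "es.example.com" }
  }
}
```

Filters can be wrapped in the same kind of conditional, so each stream gets its own processing.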


Source: (StackOverflow)


How to search for a part of a word with ElasticSearch

I've recently started using ElasticSearch and I can't seem to make it search for a part of a word.

Example: I have three documents from my couchdb indexed in ElasticSearch:

{
  "_id" : "1",
  "name" : "John Doeman",
  "function" : "Janitor"
}
{
  "_id" : "2",
  "name" : "Jane Doewoman",
  "function" : "Teacher"
}
{
  "_id" : "3",
  "name" : "Jimmy Jackal",
  "function" : "Student"
} 

So now, I want to search for all documents containing "Doe"

curl http://localhost:9200/my_idx/my_type/_search?q=Doe

That doesn't return any hits. But if I search for

curl http://localhost:9200/my_idx/my_type/_search?q=Doeman

It does return one document (John Doeman).

I've tried setting different analyzers and different filters as properties of my index. I've also tried using a full-blown query (for example:

{
  "query" : {
    "term" : {
      "name" : "Doe"
    }
  }
}

) But nothing seems to work.

How can I make ElasticSearch find both John Doeman and Jane Doewoman when I search for "Doe"?

UPDATE

I tried using the nGram tokenizer and filter, as Igor proposed, like this:

{
  "index" : {
    "index" : "my_idx",
    "type" : "my_type",
    "bulk_size" : "100",
    "bulk_timeout" : "10ms",
    "analysis" : {
      "analyzer" : {
        "my_analyzer" : {
          "type" : "custom",
          "tokenizer" : "my_ngram_tokenizer",
          "filter" : ["my_ngram_filter"]
        }
      },
      "filter" : {
        "my_ngram_filter" : {
          "type" : "nGram",
          "min_gram" : 1,
          "max_gram" : 1
        }
      },
      "tokenizer" : {
        "my_ngram_tokenizer" : {
          "type" : "nGram",
          "min_gram" : 1,
          "max_gram" : 1
        }
      }
    }
  }
}

The problem I'm having now is that each and every query returns ALL documents :-S Any pointers? ElasticSearch's documentation on nGrams isn't great...
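For anyone puzzling over the behavior above: with min_gram and max_gram both set to 1, every term is indexed as individual letters, so almost any query term shares a letter with almost every document. A small Python sketch of what an nGram tokenizer produces (illustrative only — this is not Elasticsearch's actual implementation):

```python
def ngrams(term, min_gram, max_gram):
    """Generate all n-grams of a term, roughly as an nGram tokenizer would."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        for start in range(len(term) - n + 1):
            grams.append(term[start:start + n])
    return grams

# min_gram = max_gram = 1 reduces every term to single letters,
# so nearly any query overlaps nearly every document:
print(ngrams("doeman", 1, 1))          # ['d', 'o', 'e', 'm', 'a', 'n']

# A wider range keeps "doe" as a token of "doeman", so a query
# for "doe" can actually match that document:
print("doe" in ngrams("doeman", 3, 8))  # True
```

A wider range such as min_gram 3 / max_gram 8 (those values are just an example) tends to give partial-word matching without matching everything.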


Source: (StackOverflow)

Why do I need "store":"yes" in elasticsearch?

I really don't understand why the core types documentation says, in the attribute descriptions (for a number, for example):

  1. store - Set to yes to store actual field in the index, no to not store it. Defaults to no (note, the JSON document itself is stored, and it can be retrieved from it)
  2. index - Set to no if the value should not be indexed. In this case, store should be set to yes, since if it’s not indexed and not stored, there is nothing to do with it

The two highlighted parts seem to contradict each other. If "index":"no", "store":"no", I could still get the value from the source. This could be a good use case if I have a field containing a URL, for example. No?

I had a little experiment, where I had two mappings, in one a field was set to "store":"yes" and in the other to "store":"no".

In both cases I could still specify in my query:

{"query":{"match_all":{}}, "fields":["my_test_field"]}

and I got the same answer, returning the field.

I thought that if "store" is set to "no" it would mean I could not retrieve the specific field, but would have to get the whole _source and parse it on the client side.

So, what benefit is there in setting "store" to "yes"? Is it only relevant if I exclude the field from the "_source" field explicitly?
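One case where "store":"yes" clearly matters: if _source is disabled (or the field is excluded from it), a stored field is the only way to get that field back in search results. A hedged mapping sketch — the type and field names are made up:

```json
{
  "my_type" : {
    "_source" : { "enabled" : false },
    "properties" : {
      "my_test_field" : { "type" : "string", "store" : "yes" }
    }
  }
}
```

With _source enabled (the default), requesting a non-stored field simply extracts it from _source behind the scenes, which would explain why both mappings behaved the same in the experiment above.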


Source: (StackOverflow)

How to upgrade a running Elasticsearch older instance to a newer version?

Essentially, I cannot find documents or resources that explain the procedure for upgrading a running Elasticsearch instance to the current version.

Please help me out in a few scenarios:

  1. If I am running an Elasticsearch instance in a single server, how do I upgrade the instance and not lose data?

  2. If I am running multiple Elasticsearch instances in a number of servers, how do I keep my operations running, while I upgrade my Elasticsearch instances without losing data?

If there are proper procedures or explanations on this it will greatly help my understanding and work. Thanks!
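One building block that shows up in rolling-upgrade write-ups for the 1.x line is temporarily disabling shard reallocation via the cluster settings API while each node is restarted in turn. A sketch of the request body (the exact setting name varies across versions, so treat this as an assumption to verify against your release):

```json
{
  "transient" : {
    "cluster.routing.allocation.disable_allocation" : true
  }
}
```

The general idea: disable allocation, upgrade and restart one node, wait for it to rejoin, re-enable allocation, wait for the cluster to go green, then repeat on the next node.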


Source: (StackOverflow)

When do you start additional Elasticsearch nodes?

I'm in the middle of attempting to replace a Solr setup with Elasticsearch. This is a new setup, which has not yet seen production, so I have lots of room to fiddle with things and get them working well.

I have very, very large amounts of data. I'm indexing some live data and holding onto it for 7 days (by using the _ttl field). I do not store any data in the index (and disabled the _source field). I expect my index to stabilize around 20 billion rows. I will be putting this data into 2-3 named indexes. Search performance so far with up to a few billion rows is totally acceptable, but indexing performance is an issue.

I am a bit confused about how ES uses shards internally. I have created two ES nodes, each with a separate data directory, each with 8 indexes and 1 replica. When I look at the cluster status, I only see one shard and one replica for each node. Doesn't each node keep multiple indexes running internally? (Checking the on-disk storage location shows that there is definitely only one Lucene index present). -- Resolved, as my index setting was not picked up properly from the config. Creating the index using the API and specifying the number of shards and replicas has now produced exactly what I would've expected to see.

Also, I tried running multiple copies of the same ES node (from the same configuration), and it recognizes that there is already a copy running and creates its own working area. These new instances of nodes also seem to only have one index on-disk. -- Now that each node is actually using multiple indices, a single node with many indices is more than sufficient to throttle the entire system, so this is a non-issue.

When do you start additional Elasticsearch nodes, for maximum indexing performance? Should I have many nodes each running with 1 index 1 replica, or fewer nodes with tons of indexes? Is there something I'm missing with my configuration in order to have single nodes doing more work?

Also: Is there any metric for knowing when an HTTP-only node is overloaded? Right now I have one node devoted to HTTP only, but aside from CPU usage, I can't tell if it's doing OK or not. When is it time to start additional HTTP nodes and split up your indexing software to point to the various nodes?


Source: (StackOverflow)

Solr vs. ElasticSearch [closed]

What are the core architectural differences between these technologies?

Also, what use cases are generally more appropriate for each?


Source: (StackOverflow)

Queries vs. Filters

I can't see any description of when I should use a query or a filter or some combination of the two. Can anyone please explain or point me to an explanation?
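As a rough shape reference while you look for an explanation: in the 1.x query DSL the two are commonly combined in a filtered query, where the query part computes relevance scores and the filter part just includes or excludes documents (no scoring, and cacheable). The field names below are made up:

```json
{
  "query" : {
    "filtered" : {
      "query" :  { "match" : { "title" : "elasticsearch" } },
      "filter" : { "term" : { "status" : "published" } }
    }
  }
}
```

A common rule of thumb: use filters for exact yes/no criteria (status flags, ranges, dates) and queries for anything that should affect ranking.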


Source: (StackOverflow)

How to set up ES cluster?

Assume I have 5 machines I want to run an elasticsearch cluster on, and they are all connected to a shared drive. I put a single copy of elasticsearch onto that shared drive so all five machines can see it. Do I just start elasticsearch from that shared drive on all of my machines, and the clustering would automatically work its magic? Or would I have to configure specific settings to get elasticsearch to realize that it's running on 5 machines? If so, what are the relevant settings? Should I worry about configuring replicas, or is that handled automatically?
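For reference, clustering happens over the network, not through a shared drive — each node generally needs its own local data directory, and sharing one data path between nodes risks corrupting indices. A minimal elasticsearch.yml sketch for the 1.x line (cluster name, node names, and addresses are made up):

```yaml
cluster.name: my-cluster                    # identical on all 5 machines
node.name: node-1                           # unique per machine
path.data: /var/data/elasticsearch          # local disk, not the shared drive
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
```

Nodes with the same cluster.name that can reach each other will form a cluster on their own; replicas default to 1 per index and are spread across nodes automatically.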


Source: (StackOverflow)

Shards and replicas in Elasticsearch

I am trying to understand what shards and replicas are in Elasticsearch, but I can't quite grasp it. If I download Elasticsearch and run the script, then from what I know I have started a cluster with a single node. Now this node (my PC) has 5 shards (?) and some replicas (?).

What are they? Do I have 5 duplicates of the index? If so, why? I could use some explanation.
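To make the terms concrete: a shard is a slice of the index (a self-contained Lucene index), and a replica is a copy of a shard kept on a different node. Each document is routed to exactly one primary shard by hashing its id, roughly like this toy Python sketch (Elasticsearch uses its own hash function internally; this is illustrative only):

```python
def simple_hash(key):
    # Toy stand-in for Elasticsearch's real routing hash.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % 2**32
    return h

def shard_for(doc_id, num_primary_shards=5):
    # Each document lands on exactly one primary shard; replicas are
    # copies of that shard placed on other nodes for redundancy.
    return simple_hash(doc_id) % num_primary_shards

for doc_id in ["1", "2", "3"]:
    print(doc_id, "-> shard", shard_for(doc_id))
```

So the default of 5 shards means the index is split into 5 pieces, not duplicated 5 times; the duplicates are the replicas, and on a single node the replica copies simply have nowhere to go (which is why a fresh single-node cluster shows as yellow).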


Source: (StackOverflow)

How to retrieve unique count of a field using Kibana + Elastic Search

Is it possible to query for a distinct/unique count of a field using Kibana? I am using elastic search as my backend to Kibana.

If so, what is the syntax of the query? Here's a link to the Kibana interface where I would like to make my query: http://demo.kibana.org/#/dashboard

I am parsing nginx access logs with logstash and storing the data into elastic search. Then, I use Kibana to run queries and visualize my data in charts. Specifically, I want to know the count of unique IP addresses for a specific time frame using Kibana.
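On the Elasticsearch side, version 1.1+ offers a cardinality aggregation for approximate distinct counts; whether your Kibana version can issue it depends on the release, so treat this as the raw query sketch (the field name is an assumption based on typical logstash nginx parsing):

```json
{
  "aggs" : {
    "unique_ips" : {
      "cardinality" : { "field" : "clientip" }
    }
  }
}
```

Combined with a range filter on @timestamp, this yields the approximate number of unique IPs within a time frame.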


Source: (StackOverflow)

how to use elasticsearch with mongoDB

I have gone through many blogs and sites about configuring elastic search for mongoDB to index collections in mongoDB, but none of them were straightforward. Please explain to me a step-by-step process of an elastic search installation -> configuration -> running in the browser. I am using node.js with express.js, so please help accordingly.


Source: (StackOverflow)

elasticsearch query to return all records

I have a small database in elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...

http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}

Can someone give me the URL you would use to accomplish this please?
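For context, the q= parameter expects Lucene query-string syntax rather than JSON, which is why the URL above fails. A hedged sketch of a request body that matches everything — note that by default only 10 hits are returned, so a size parameter (the value here is arbitrary) is usually added to the URL, e.g. ?size=100:

```json
{
  "query" : { "match_all" : {} }
}
```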


Source: (StackOverflow)

How to stop/shut down an elasticsearch node?

I want to restart an elasticsearch node with a new configuration. What is the best way to gracefully shut down a node?

Is killing the process the best way of shutting the server down, or is there some magic URL I can use to shut the node down?
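For what it's worth, the 1.x line ships a dedicated shutdown API (removed in later major versions, so verify it exists in yours). The endpoint sketch below targets the node handling the request:

```
POST /_cluster/nodes/_local/_shutdown
```

A plain kill of the process (not kill -9) is also generally treated as graceful, since the JVM shutdown hooks get a chance to run.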


Source: (StackOverflow)

Installing Elasticsearch on OSX Mavericks

I'm trying to install Elasticsearch 1.1.0 on OSX Mavericks, but I get the following errors when trying to start it:

:> ./elasticsearch
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.Version
at org.elasticsearch.bootstrap.Bootstrap.buildErrorMessage(Bootstrap.java:252)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:236)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)

Also, when executing the same command with the -v arg, I get this error:

:> ./elasticsearch -v
Exception in thread "main" java.lang.NoSuchFieldError: LUCENE_36
at org.elasticsearch.Version.<clinit>(Version.java:42)

Here's my environment:

Java version

>: java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)

Installation path (downloaded the .tar.gz archive from the elasticsearch download page and extracted it here):

/usr/local/elasticsearch-1.1.0

ENV vars:

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home 
CLASSPATH=/usr/local/elasticsearch-1.1.0/lib/*.jar:/usr/local/elasticsearch-1.1.0/lib/sigar/*.jar

UPDATE

I finally got it working; unfortunately, I'm not sure exactly how, because I tried a lot of changes :). But here's a list of the changes I made that may help:

  • ~/Library/Caches

  • /Library/Caches

  • I removed the CLASSPATH env var.

  • ES_PATH and ES_HOME env vars are not set either, but I think this is not so important.

Note: now it also works if I install with brew.

Thanks.


Source: (StackOverflow)