EzDevInfo.com

pyes

Python connector for ElasticSearch - the pythonic way to use ElasticSearch

Language: Python

Storing only selected fields and not storing _all in pyes/elasticsearch

I am trying to use pyes with Elasticsearch as a full-text search engine. I store only UUIDs and indexes of the string fields; the actual data is stored in MongoDB and retrieved by UUID. Unfortunately, I am unable to create a mapping that doesn't store the original data. I've tried various combinations of "store"/"source" settings and disabling "_all", but I can still get back the text of the indexed fields. The documentation seems misleading on this topic, as it's just a copy of the original Elasticsearch docs.

Can anyone please provide an example of mapping that would only store some fields and not the original document JSON?

Source: (StackOverflow)
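
The settings usually involved here are the mapping-level "_source" and "_all" switches plus the per-field "store" option. A minimal sketch that builds such a mapping as a plain Python dict, assuming pre-5.0 Elasticsearch mapping syntax (the "string" type with "store": "yes"/"no"). The type name "item", the field names "title" and "body", and build_uuid_only_mapping are hypothetical placeholders, not part of pyes:

```python
import json


def build_uuid_only_mapping(doc_type, text_fields):
    """Hypothetical helper: mapping that keeps only "uuid" retrievable.

    Assumes pre-5.0 Elasticsearch mapping syntax ("string" type, "store": "yes"/"no").
    """
    properties = {
        # Stored and not analyzed: can be returned via "fields" and matched exactly.
        "uuid": {"type": "string", "index": "not_analyzed", "store": "yes"},
    }
    for name in text_fields:
        # Analyzed for full-text search but not stored as a separate field.
        properties[name] = {"type": "string", "index": "analyzed", "store": "no"}
    return {
        doc_type: {
            "_source": {"enabled": False},  # do not keep the original JSON document
            "_all": {"enabled": False},     # do not build the catch-all _all field
            "properties": properties,
        }
    }


mapping = build_uuid_only_mapping("item", ["title", "body"])
print(json.dumps(mapping, indent=2))
```

With _source disabled, only fields mapped with "store": "yes" (here just uuid) can be returned through a fields request; the text fields stay searchable but cannot be fetched back. Disabling _source also rules out features that need the original document, such as the update API or highlighting on fields that are not stored.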

Elastic Search No server available, list index out of range

I'm trying to get a simple example working with Elasticsearch using pyes, but I'm having trouble getting the getting-started examples to work. I'm following the documentation found here: http://pyes.readthedocs.org/en/latest/manual/usage.html

I'm just trying to run the following function, but it's not working.

from pyes import ES

def index_transcripts():
    # Connect over HTTP to a local node and create an empty index.
    conn = ES('127.0.0.1:9200')
    conn.indices.create_index("test-index")

index_transcripts()

Which in my mind should be very straightforward, but instead I get the following error:

pyes.exceptions.NoServerAvailable: list index out of range

I'm just starting out with Elastic Search and pyes seems like a wonderful library, but I'm clearly uncertain on how exactly I should use it. Any help would be greatly appreciated.

Source: (StackOverflow)
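
The exception name suggests pyes could not get a response from any of the servers it was given, so one thing worth ruling out first is whether Elasticsearch is actually listening on 127.0.0.1:9200. A minimal Python 3 standard-library check; es_is_reachable is a hypothetical helper, not part of pyes:

```python
import urllib.error
import urllib.request


def es_is_reachable(base_url="http://127.0.0.1:9200/", timeout=2.0):
    """Hypothetical helper: True if an HTTP server answers base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


print(es_is_reachable())
```

If es_is_reachable() returns False, the problem most likely lies with the server (not started, or bound to a different address or port) rather than with the pyes call; opening http://127.0.0.1:9200/ with curl or a browser gives the same information.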


elasticsearch BindTransportException at startup

This is the exception I'm getting starting up elasticsearch:

STATUS | wrapper  | 2013/03/21 00:43:42 | Launching a JVM...
INFO   | jvm 1    | 2013/03/21 00:43:42 | WrapperManager: Initializing...
INFO   | jvm 1    | 2013/03/21 00:43:45 | {0.19.4}: Startup Failed ...
INFO   | jvm 1    | 2013/03/21 00:43:45 | - BindTransportException[Failed to bind to [9300]]
INFO   | jvm 1    | 2013/03/21 00:43:45 |       ChannelException[Failed to bind to: /192.168.0.1:9300]
INFO   | jvm 1    | 2013/03/21 00:43:45 |               BindException[Cannot assign requested address]
STATUS | wrapper  | 2013/03/21 00:43:47 | <-- Wrapper Stopped

Does anybody have a clue about what could cause the issue?

Source: (StackOverflow)
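
The log shows the transport module trying to bind port 9300 on 192.168.0.1 and failing with "Cannot assign requested address". That error typically means the address is not assigned to any network interface on the machine (a port that is already taken gives "Address already in use" instead), so the first thing to check is where 192.168.0.1 comes from, for example the network.host or network.bind_host setting in elasticsearch.yml. A minimal sketch for testing an address from Python; can_bind is a hypothetical helper:

```python
import socket


def can_bind(host, port=0):
    """Hypothetical helper: True if a TCP socket can be bound to (host, port).

    Port 0 asks the OS for any free port, so this mainly tests whether the
    address itself is usable on this machine.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


print(can_bind("127.0.0.1"))    # loopback address
print(can_bind("192.168.0.1"))  # the address from the log above
```

If the second call prints False on the Elasticsearch host, pointing the setting at an address the machine actually has (or removing it so the default is used) and restarting should get past this error.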

pyes 'from' keyword can't be set

Since from is a reserved Python keyword, I am not able to pass it to the pyes.es.search function; it gives a SyntaxError: pyes.es.search(maf, "twitter", "tweet", sort="timestamp", size=2, from=3). I also tried passing the keyword arguments as a dict containing from, as below, but from had no effect while the others worked.

keywords = {'sort': 'timestamp', 'size':3, 'from':2}
r = pyes.es.search(maf, "twitter", "reply",**keywords)

The same problem exists in another Python Elasticsearch module. The search function's interface does list a from argument.

Source: (StackOverflow)
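
from is only a problem as a Python identifier; as a string it is an ordinary dict key, both in **kwargs unpacking and in a JSON request body, and "from" is the name Elasticsearch itself uses for the result offset. Whether a particular pyes version forwards such a keyword is version-specific (the question reports it being ignored), so one workaround is to put the paging into the search body itself. A minimal sketch; both helper names are hypothetical:

```python
import json


def paged_search_body(query, offset=0, size=10, sort=None):
    """Hypothetical helper: raw Elasticsearch search body with paging.

    "from" cannot be a Python keyword argument name, but it is a normal dict key.
    """
    body = {"query": query, "from": offset, "size": size}
    if sort is not None:
        body["sort"] = sort
    return body


def echo_params(**params):
    """Stand-in for any function taking **kwargs: shows that "from" arrives intact."""
    return params


body = paged_search_body({"match_all": {}}, offset=3, size=2, sort=[{"timestamp": "desc"}])
print(json.dumps(body))
print(echo_params(**{"from": 3, "size": 2}))
```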

Slow test cases when using elastic search (pyes)

I recently added elastic search to our stack and it is slowing down our test cases.

To keep the tests from trampling each other, I isolated them by index. This seems to be the cause: creating and deleting indexes seems to be slow.

I haven't been able to find an equivalent to "truncate", but I thought I would ask.

Also, a few years ago, when I was using ES with Java, I used an in-memory node for the tests and that was very fast. I don't think pyes has this option.

Source: (StackOverflow)
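
Two ways to reduce this cost, sketched with hypothetical helper names: create one throwaway index per test session or test class instead of per test, and empty it between tests with a match-all delete-by-query instead of dropping and recreating it. The request shape is version-dependent: the sketch assumes Elasticsearch 1.x (DELETE /<index>/_query with a "query" wrapper); 0.x releases took the bare query as the body, 2.x moved the API into a plugin, and 5.0 reintroduced it as POST /<index>/_delete_by_query. Whether this is actually faster than deleting the index depends on how much data the tests write, so it is worth timing both.

```python
import json
import uuid


def session_index_name(prefix="test"):
    """Hypothetical helper: one unique index name per test session or class."""
    return "{}-{}".format(prefix, uuid.uuid4().hex[:12])


def match_all_delete_request(index):
    """Hypothetical helper: (method, path, body) that empties `index` but keeps it.

    Assumes Elasticsearch 1.x delete-by-query syntax.
    """
    body = {"query": {"match_all": {}}}
    return "DELETE", "/{}/_query".format(index), json.dumps(body)


name = session_index_name()
method, path, body = match_all_delete_request(name)
print(method, path, body)
```

After emptying the index, a refresh is needed before the next test can rely on searches no longer seeing the deleted documents.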

Configure a tokenizer with pyes

I'm trying to configure one of my fields to use an edge ngram tokenizer. I'm trying to translate the following gist that I found (https://gist.github.com/1037563):

{
    "mappings": {
        "contact": {
            "properties": {
                "twitter": {
                    "type": "object",
                    "properties": {
                        "profile": {
                            "fields": {
                                "profile": {
                                    "type": "string",
                                    "analyzer": "left"
                                },
                                "reverse_profile": {
                                    "type": "string",
                                    "analyzer": "right"
                                }
                            },
                            "type": "multi_field"
                        }
                    }
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "left": {
                    "filter": [
                        "standard",
                        "lowercase",
                        "stop"
                    ],
                    "type": "custom",
                    "tokenizer": "left_tokenizer"
                },
                "right": {
                    "filter": [
                        "standard",
                        "lowercase",
                        "stop"
                    ],
                    "type": "custom",
                    "tokenizer": "right_tokenizer"
                }
            },
            "tokenizer": {
                "left_tokenizer": {
                    "side": "front",
                    "max_gram": 20,
                    "type": "edgeNGram"
                },
                "right_tokenizer": {
                    "side": "back",
                    "max_gram": 20,
                    "type": "edgeNGram"
                }
            }
        }
    }
}

I can see pyes supports the 'put_mapping' API, but this seems to wrap everything inside 'mappings'. I need to be able to pass the analyzer under a 'settings' key and can't work out how to.

Can anyone help?

Source: (StackOverflow)
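
The analyzer and tokenizer definitions are index settings, which have to be supplied when the index is created (or while it is closed), whereas put_mapping only carries the per-type mapping. The Elasticsearch create-index request accepts both "settings" and "mappings" in one body, so one option is to create the index with the gist's body as-is; another is to create the index with just the settings and then send the "contact" mapping with put_mapping, the way the transcript question further down calls put_mapping(doc_type, {"properties": ...}, index). Whether your pyes version's create_index takes a settings argument is worth checking in its es.py. A sketch that rebuilds the gist as those two separate pieces (helper names are hypothetical; the edgeNGram "side" option is copied from the gist and may not be accepted by newer servers):

```python
import json


def edge_ngram_tokenizer(side, max_gram=20):
    """Hypothetical helper: edgeNGram tokenizer definition as in the gist."""
    return {"type": "edgeNGram", "side": side, "max_gram": max_gram}


def custom_analyzer(tokenizer_name):
    """Hypothetical helper: custom analyzer with the gist's filter chain."""
    return {"type": "custom", "tokenizer": tokenizer_name,
            "filter": ["standard", "lowercase", "stop"]}


# Index-level settings: only accepted at index creation (or on a closed index).
settings = {
    "analysis": {
        "tokenizer": {
            "left_tokenizer": edge_ngram_tokenizer("front"),
            "right_tokenizer": edge_ngram_tokenizer("back"),
        },
        "analyzer": {
            "left": custom_analyzer("left_tokenizer"),
            "right": custom_analyzer("right_tokenizer"),
        },
    }
}

# Type mapping for "contact": the part that goes to the put-mapping API.
contact_mapping = {
    "properties": {
        "twitter": {
            "type": "object",
            "properties": {
                "profile": {
                    "type": "multi_field",
                    "fields": {
                        "profile": {"type": "string", "analyzer": "left"},
                        "reverse_profile": {"type": "string", "analyzer": "right"},
                    },
                }
            },
        }
    }
}

# Equivalent single create-index body, matching the gist's layout.
create_index_body = {"settings": settings, "mappings": {"contact": contact_mapping}}
print(json.dumps(create_index_body, indent=2))
```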

Elastic Search: pyes.exceptions.IndexMissingException exception from search result

This is a question about Elastic-Search python API (pyes).

I run a very simple test case through curl, and everything seems to work as expected.

Here is the description of the curl test-case:

The only document that exists in the ES is:

curl 'http://localhost:9200/test/index1' -d '{"page_text":"This is the text that was found on the page!"}'

Then I search the ES for all documents that the word "found" exists in. The result seems to be OK:

curl 'http://localhost:9200/test/index1/_search?q=page_text:found&pretty=true'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "test",
      "_type" : "index1",
      "_id" : "uaxRHpQZSpuicawk69Ouwg",
      "_score" : 0.15342641, "_source" : {"page_text":"This is the text that was found on the page!"}

    } ]
  }
}

However, when I run the same query through the Python 2.7 API (pyes), something goes wrong:

>>> import pyes
>>> conn = pyes.ES('localhost:9200')
>>> result = conn.search({"page_text":"found"}, index="index1")
>>> print result
<pyes.es.ResultSet object at 0xd43e50>
>>> result.count()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1717, in count
    return self.total
  File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1686, in total
    self._do_search()
  File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1646, in _do_search
    doc_types=self.doc_types, **self.query_params)
  File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1381, in search_raw
    return self._query_call("_search", body, indices, doc_types, **query_params)
  File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 622, in _query_call
    return self._send_request('GET', path, body, params=querystring_args)
  File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 603, in _send_request
    raise_if_error(response.status, decoded)
  File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/convert_errors.py", line 83, in raise_if_error
    raise excClass(msg, status, result, request)
pyes.exceptions.IndexMissingException: [_all] missing

As you can see, pyes returns the result object, but for some reason I can't even get the number of results from it.

Does anyone have any idea what may be wrong here?

Thanks a lot in advance!

Source: (StackOverflow)
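
In the curl example, test is the index and index1 is the document type, but the pyes call passes index="index1", and its query is a bare {"page_text": "found"} dict rather than a query object. A likely fix is to name both explicitly, for example conn.search(query=TermQuery("page_text", "found"), indices=["test"], doc_types=["index1"]), using the TermQuery, indices and doc_types names that appear elsewhere on this page; check them against your pyes version. The equivalent raw request, as a small sketch with a hypothetical helper:

```python
import json


def term_search_request(index, doc_type, field, word):
    """Hypothetical helper: path and JSON body for a term search on index/doc_type.

    The path order is /<index>/<type>/_search, as in the curl example.
    """
    path = "/{}/{}/_search".format(index, doc_type)
    body = {"query": {"term": {field: word}}}
    return path, json.dumps(body)


path, body = term_search_request("test", "index1", "page_text", "found")
print(path)
print(body)
```

A term query is not analyzed; it matches here because "found" is already lowercase, which is how the standard analyzer indexes it.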

How to use ResultSet in PyES

I'm using PyES to use ElasticSearch in Python. Typically, I build my queries in the following format:

from pyes import *
from pyes.query import *

# Create connection to server.
conn = ES('127.0.0.1:9200')

# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")

# Create query.
q = FilteredQuery(MatchAllQuery(), myFilter).search()

# Execute the query.
results = conn.search(query=q, indices=['my-index'])

print type(results)
# > <class 'pyes.es.ResultSet'>

And this works perfectly. My problem begins when the query returns a large list of documents. Converting the results to a list of dictionaries is computationally demanding, so I'm trying to get the query results back already as dictionaries. I came across this documentation:

http://pyes.readthedocs.org/en/latest/faq.html#id3
http://pyes.readthedocs.org/en/latest/references/pyes.es.html#pyes.es.ResultSet
https://github.com/aparo/pyes/blob/master/pyes/es.py (line 1304)

But I can't figure out what exactly I'm supposed to do. Based on the previous links, I've tried this:

from pyes import *
from pyes.query import *
from pyes.es import ResultSet
from pyes.connection import connect

# Create connection to server.
c = connect(servers=['127.0.0.1:9200'])

# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")

# Create query / Search object.
q = FilteredQuery(MatchAllQuery(), myFilter).search()

# (How to) create the model ?
mymodel = lambda x, y: y

# Execute the query.
# class pyes.es.ResultSet(connection, search, indices=None, doc_types=None,
# query_params=None, auto_fix_keys=False, auto_clean_highlight=False, model=None)

resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > TypeError: __init__() got an unexpected keyword argument 'search'

Has anyone been able to get a dict from the ResultSet? Any good suggestions for efficiently converting the ResultSet to a list of dictionaries would be appreciated too.

Source: (StackOverflow)
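
ResultSet builds an object per hit (its constructor takes a model, as the signature quoted in the question shows), so one way to skip that step is to work with the raw JSON response and take the _source dicts out directly. The traceback in the previous question shows pyes calling ES.search_raw with indices and doc_types; if your version exposes it, its return value should presumably have the standard response shape. A minimal sketch of the extraction step, run against the response printed by curl in that question (hits_to_dicts is hypothetical):

```python
def hits_to_dicts(raw_response, include_meta=False):
    """Hypothetical helper: list of plain _source dicts from a raw search response."""
    docs = []
    for hit in raw_response.get("hits", {}).get("hits", []):
        doc = dict(hit.get("_source", {}))  # shallow copy of the stored document
        if include_meta:
            doc["_id"] = hit.get("_id")
            doc["_score"] = hit.get("_score")
        docs.append(doc)
    return docs


# Response shape copied from the curl output in the previous question.
sample = {
    "took": 1,
    "timed_out": False,
    "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [{
            "_index": "test",
            "_type": "index1",
            "_id": "uaxRHpQZSpuicawk69Ouwg",
            "_score": 0.15342641,
            "_source": {"page_text": "This is the text that was found on the page!"},
        }],
    },
}

print(hits_to_dicts(sample, include_meta=True))
```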

Implementing ElasticSearch into Pyramid

After doing some research on search engines, I decided to go with ElasticSearch, and I was wondering what the quickest and most efficient way to integrate it with Pyramid is. I've found the documentation for pyes, but I am not sure if this is the right path to take. Thanks!

Source: (StackOverflow)

ElasticSearch: Finding documents with field value that is in an array

I have some customer documents that I want to be retrieved using ElasticSearch based on where the customers come from (country field is IN an array of countries).

[
  {
    "name": "A1",
    "address": {
      "street": "1 Downing Street",
      "country": {
        "code": "GB",
        "name": "United Kingdom"
      }
    }
  },
  {
    "name": "A2",
    "address": {
      "street": "25 Gormut Street",
      "country": {
        "code": "FR",
        "name": "France"
      }
    }
  },
  {
    "name": "A3",
    "address": {
      "street": "Bonjour Street",
      "country": {
        "code": "FR",
        "name": "France"
      }
    }
  }
]

I also have an array in my Python code:

["DE", "FR", "IT"]

I'd like to obtain the two documents, A2 and A3.

How would I write this in PyES or the Query DSL? Am I supposed to use an ExistsFilter or a TermQuery for this? ExistsFilter seems to only check whether the field exists, not what its value is.

Source: (StackOverflow)
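
Neither ExistsFilter nor a single term query expresses "IN"; the Query DSL construct for it is the terms filter (or terms query), which matches when the field equals any of the listed values. A sketch of the request body as a Python dict, assuming the pre-5.0 "filtered" query syntax used elsewhere on this page and that address is a plain object field (a nested mapping would need a nested filter around it); codes_in_query is hypothetical:

```python
def codes_in_query(codes, field="address.country.code", lowercase=True):
    """Hypothetical helper: filtered query matching docs whose `field` is in `codes`.

    A terms filter is not analyzed. If the field uses the default analyzer, "FR" is
    indexed as "fr", so values are lowercased unless the field is not_analyzed.
    """
    values = [c.lower() for c in codes] if lowercase else list(codes)
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {field: values}},
            }
        }
    }


print(codes_in_query(["DE", "FR", "IT"]))
```

Against the three sample documents, this filter should match A2 and A3, whose country code is FR.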

Rename fields in elasticsearch response

I'm using the PyES library to query Elasticsearch. Let's imagine that my query looks like this:

query = MatchAllQuery()
query = query.search(
    fields=[
        "content.title",
        "content.description",
        "content.timestamp",
        "source.name",
        "source.url"
    ],
    count=10
)

result = es_conn.search(
             query=query,
             indices=['my'],
             sort="content.timestamp:desc"
         )

Each item in the result is a dict with the field names as keys, so item = {"content.title": "bla bla", "content.description": "bla bla bla", ... }

My script only fetches the data and needs to save the results for a third-party script without further processing, but that script requires specific key names: item = { "name": "bla bla", "text": "bla bla bla", ...}

Is there a way to specify a rule in the PyES request for renaming the fields (to "name", "title", "date" etc.) in the returned object?

Of course, I can do that after I get the response from Elasticsearch, but it requires iterating through the result object (which I want to avoid), and that doesn't seem optimal when there are thousands of items in the response.

Source: (StackOverflow)
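
If the renaming ends up happening on the client, it does not need a separate pass: it can be applied lazily, one hit at a time, as results are streamed to the third-party script. A minimal sketch, assuming each item is a plain dict keyed by the dotted field names as described above; RENAMES and rename_keys are hypothetical, and the two source.* target names are placeholders:

```python
# Targets for the content.* fields follow the question's example; the two
# source.* targets are hypothetical placeholders.
RENAMES = {
    "content.title": "name",
    "content.description": "text",
    "content.timestamp": "date",
    "source.name": "source_name",
    "source.url": "source_url",
}


def rename_keys(items, renames):
    """Lazily yield a copy of each hit dict with its keys renamed.

    Keys missing from `renames` are kept unchanged.
    """
    for item in items:
        yield {renames.get(key, key): value for key, value in item.items()}


hits = [{"content.title": "bla bla", "content.description": "bla bla bla"}]
for doc in rename_keys(hits, RENAMES):
    print(doc)
```

The per-hit cost is one small dict comprehension, which is usually minor next to fetching and decoding the response, and because rename_keys is a generator no second list of thousands of items is built.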

Elastic Search [PUT] error

I'm having some trouble getting elastic search integrated with an existing application, but it should be a fairly straightforward issue. I'm able to create and destroy indices but for some reason I'm having trouble getting data into elastic search and querying for it.

I'm using the pyes library and honestly finding the documentation to be less than helpful on this front. This is my current code:

from pyes.query import TermQuery


def initialize_transcripts(database, mapping):
    database.indices.create_index("transcript-index")


def index_course(database, sjson_directory, course_name, mapping):
    database.put_mapping(course_name, {'properties': mapping}, "transcript-index")
    all_transcripts = grab_transcripts(sjson_directory)
    video_counter = 0
    for transcript_tuple in all_transcripts:
        data_map = {"searchable_text": transcript_tuple[0], "uuid": transcript_tuple[1]}
        database.index(data_map, "transcript-index", course_name, video_counter)
        video_counter += 1
    database.indices.refresh("transcript-index")


def search_course(database, query, course_name):
    search_query = TermQuery("searchable_text", query)
    return database.search(query=search_query)

I'm first creating the database and initializing the index, then trying to add data and search it with the other two functions. I'm currently getting the following error:

raise ElasticSearchException(response.body, response.status, response.body)
pyes.exceptions.ElasticSearchException: No handler found for uri [/transcript-index/test-course] and method [PUT]

I'm not quite sure how to approach it, and the only reference I could find to this error suggested creating your index beforehand which I believe I am already doing. Has anyone run into this error before? Alternatively do you know of any good places to look that I might not be aware of?

Any help is appreciated.

Source: (StackOverflow)
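
A PUT to /transcript-index/test-course has no handler because that path names only an index and a type: indexing a document with PUT needs an id in the path (/index/type/id), and a mapping update goes to a _mapping endpoint. One plausible way to end up there is that video_counter starts at 0: if the client builds the URL by dropping falsy path components, the id 0 disappears while the method stays PUT. The sketch below only illustrates that pitfall with two hypothetical path builders; it is not pyes's actual code. Passing the id as a string, starting the counter at 1, or using the transcript's uuid (transcript_tuple[1]) as the id would all sidestep it.

```python
def make_path_truthy(parts):
    """Pitfall: `if p` also drops 0, so /index/type/0 becomes /index/type."""
    return "/" + "/".join(str(p) for p in parts if p)


def make_path_explicit(parts):
    """Skip only components that are genuinely missing (None or empty string)."""
    return "/" + "/".join(str(p) for p in parts if p is not None and p != "")


parts = ["transcript-index", "test-course", 0]
print(make_path_truthy(parts))
print(make_path_explicit(parts))
```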

ElasticSearch GeoDistance Query

I am using a geo_distance query in Python like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "20miles",
          "location": {
            "lat": 51.512497,
            "lon": -0.052098
          }
        }
      }
    }
  }
}

It is working correctly. My problem is how to give "distance" a value taken from the document itself. Each record in my index has a field such as distance: 50, and I want to use it as the distance value in the geo_distance filter. I tried "distance": doc['distance'].value, but it is not working.

Source: (StackOverflow)
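
The distance in a geo_distance filter is a single value for the whole request, and doc['distance'].value is script syntax, so it has no meaning there. One way to compare against a per-document radius is a script filter. The sketch below assumes an Elasticsearch 0.x/1.x server with dynamic scripting enabled, where geo_point values in scripts provide distanceInMiles(lat, lon); both the method name and the scripting setup should be checked against your server version. per_doc_radius_query is a hypothetical helper:

```python
def per_doc_radius_query(lat, lon, location_field="location", radius_field="distance"):
    """Hypothetical helper: match docs whose own radius (in miles) reaches (lat, lon).

    Assumes ES 0.x/1.x script filters with geo_point distanceInMiles(lat, lon).
    """
    script = "doc['{0}'].distanceInMiles(lat, lon) <= doc['{1}'].value".format(
        location_field, radius_field)
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "script": {
                        "script": script,
                        "params": {"lat": lat, "lon": lon},  # passed in, not inlined
                    }
                },
            }
        }
    }


body = per_doc_radius_query(51.512497, -0.052098)
print(body)
```

A script filter runs for every candidate document, so on a large index it can help to also add a geo_distance filter using the largest radius present in the data to narrow the candidates.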

Retrieving specific fields from a nested object returned by Elasticsearch

I am new to Elasticsearch. I am trying to retrieve selected fields from a nested object returned by Elasticsearch. Below is the object stored in the Elasticsearch index:

{"_index":"xxx","_type":"user","_id":"2","_version":1,"exists":true, "_source" : {"user": {"user_auth": {"username": "nickcoolster@gmail.com", "first_name": "", "last_name": "", "is_active": false, "_state": {"adding": false, "db": "default"}, "id": 2, "is_superuser": false, "is_staff": false, "last_login": "2012-07-10 21:11:53", "password": "sha1$a6caa$cba2f821678ccddc4d70c8bf0c8e0655ab5c279b", "email": "nickcoolster@gmail.com", "date_joined": "2012-07-10 21:11:53"}, "user_account": {}, "user_profile": {"username": null, "user_id": 2, "following_count": 0, "sqwag_count": 0, "pwd_reset_key": null, "_state": {"adding": false, "db": "default"}, "personal_message": null, "followed_by_count": 0, "displayname": "nikhil11", "fullname": "nikhil11", "sqwag_image_url": null, "id": 27, "sqwag_cover_image_url": null}}}}

Now I want only certain fields from user.user_auth to be returned (fields like password and is_superuser should not be returned). I am using PyES with Django, and below is the code I tried:

fields = ["user.user_auth.username","user.user_auth.first_name","user.user_auth.last_name","user.user_auth.email"]
result = con.search(query=q,indices="xxx",doc_types="user",fields=fields)

but in the result I get, only the email is retrieved (i.e. only the last field is returned):

{"user.user_auth.email": "nikhiltyagi.eng@gmail.com"}

I want this kind of filtering for both nested objects, i.e. user_auth and user_profile.

How do I do this? Any help is appreciated.

Thanks in advance :)

Source: (StackOverflow)
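
From Elasticsearch 1.0 on, source filtering (a "_source" list of paths in the search body) returns just the chosen parts of _source with their nesting kept, which would cover both user_auth and user_profile. With older servers, or if the client doesn't expose it, the projection can be done on each hit's _source. A minimal sketch of that projection; project is a hypothetical helper, and the sample is a trimmed, made-up document with the same shape as the one in the question:

```python
def project(source, paths):
    """Copy only the dotted `paths` out of nested dict `source`, keeping the nesting.

    Paths that do not exist in `source` are skipped.
    """
    out = {}
    for path in paths:
        keys = path.split(".")
        node = source
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break
            node = node[key]
        else:  # every key was found
            target = out
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = node
    return out


sample = {"user": {
    "user_auth": {"username": "u1", "email": "u1@example.com",
                  "password": "secret", "is_superuser": False},
    "user_profile": {"displayname": "nikhil11", "user_id": 2, "id": 27},
}}
wanted = ["user.user_auth.username", "user.user_auth.email",
          "user.user_profile.displayname", "user.user_profile.user_id"]
print(project(sample, wanted))
```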

Convert the "facet_filter" query into pyes format

I have the following query and I want to convert it into PyES:

{
    "facets": {
        "participating-org.name": {
            "terms": {
                "field": "participating-org.name"
            },
            "facet_filter": {
                "term": {
                    "participating-org.role": "funding"
                }
            },
            "nested": "participating-org"
        }
    }
}

I have searched the PyES documentation for "facet_filter" but couldn't come up with a working query in PyES, so I need some help converting this JSON query into PyES format.

Source: (StackOverflow)
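
If the facet classes in your pyes version don't expose facet_filter or nested, the facet section can be kept as a plain dict and merged into a raw search body. A sketch that reproduces the JSON above from parameters; nested_terms_facet is a hypothetical helper. Facets were deprecated in Elasticsearch 1.0 in favour of aggregations and removed in 2.0, so this only applies to older servers.

```python
def nested_terms_facet(name, field, nested_path, filter_field, filter_value):
    """Hypothetical helper: terms facet on a nested path, limited by a term facet_filter."""
    return {
        "facets": {
            name: {
                "terms": {"field": field},
                "facet_filter": {"term": {filter_field: filter_value}},
                "nested": nested_path,
            }
        }
    }


facets = nested_terms_facet(
    name="participating-org.name",
    field="participating-org.name",
    nested_path="participating-org",
    filter_field="participating-org.role",
    filter_value="funding",
)
body = {"query": {"match_all": {}}}
body.update(facets)  # facets sit at the top level of the search body, next to "query"
print(body)
```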