EzDevInfo.com

elasticutils

[deprecated] A friendly chainable ElasticSearch interface for Python.

ElasticUtils 0.10.3 documentation

Query Multi_Field with Elasticutils

I have defined the following mapping in Elasticsearch for my index, with one field of type multi_field.

{
    'station': {
       "properties" :{
           'id': {'type': 'integer'},
           'call': {'type': 'multi_field', 
                    'fields' : {
                               'call': {'type': 'string', 'analyzer': 'whitespace'},
                               'raw': {'type': 'string', 'index': 'not_analyzed'}
                               }
                    }
        }
    }
}

I like Mozilla's ElasticUtils, but I couldn't find a way to query the multi_field fields.

I would have expected something like:

myQuery = S.query(call_raw__wildcard="value")

Does anyone know how to query a multi_field field with elasticutils?
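
For reference, a minimal sketch of the raw request body that targets the raw sub-field. In Elasticsearch a multi_field sub-field is addressed with a dotted path (call.raw here). The helper name wildcard_query and the pattern "value*" are illustrative assumptions, and the ElasticUtils call in the trailing comment is an untested guess based on the fact that a dotted name cannot be written as a Python keyword argument.

```python
def wildcard_query(field, pattern):
    """Build an Elasticsearch wildcard query body for a single field."""
    return {"query": {"wildcard": {field: pattern}}}


# The not_analyzed sub-field of the mapping above is addressed as "call.raw".
body = wildcard_query("call.raw", "value*")

# Assumption (not verified here): with ElasticUtils the dotted field name
# could be passed through dict unpacking, for example
#   S().query(**{"call.raw__wildcard": "value*"})
```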


Source: (StackOverflow)

Can the accuracy of the counts in an ElasticSearch Terms Aggregation be relied upon in certain conditions?

As detailed in the docs, the terms aggregation does not always give accurate counts.

The docs explain how multiple shards, each with a different view of the data, can produce inaccurate counts when their results are combined, if the size parameter is smaller than the number of unique terms.

They do not, however, make clear whether the document counts can be relied upon when the size parameter is larger than the number of unique terms.

Is this the case?
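
The shard-merging behaviour described above can be illustrated with a toy simulation in plain Python. This is only a sketch under stated assumptions: the per-shard counts are made up, and each shard is assumed to report exactly size terms, whereas Elasticsearch can ask each shard for more via shard_size. It does not talk to Elasticsearch.

```python
from collections import Counter


def merged_top_terms(shards, size):
    """Simulate a terms aggregation: every shard reports only its own
    top `size` terms and the coordinating node sums what it receives."""
    merged = Counter()
    for shard in shards:
        # Terms outside this shard's top `size` are never reported.
        for term, count in shard.most_common(size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical per-shard document counts for three unique terms.
shards = [
    Counter({"a": 10, "b": 8, "c": 7}),
    Counter({"a": 10, "c": 9, "b": 2}),
]

truncated = merged_top_terms(shards, size=2)  # size < number of unique terms
complete = merged_top_terms(shards, size=3)   # size >= number of unique terms
```

In this toy model the truncated run reports 9 documents for "c" although the true total is 16, because the first shard never reports its 7. The complete run matches the true totals, since every shard reports every term it holds.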


Source: (StackOverflow)


Elasticsearch is being really slow when returning all results

I have an index of ~113000 documents. I'm trying to retrieve all of them, and I don't care about the score. Basically, a select * from index;

I'm doing this in Python using elasticutils (I haven't found the time to switch to elasticsearch-dsl yet).

Running

S().indexes('da_userstats').query().count()  

completes in about 0.003 seconds.

Running

S().indexes('da_userstats').query()[0:113595].execute().objects 

is taking about 15 seconds.

From what I understand of the documentation, both should force execution, so I don't see why there is such a huge difference in time.

In the mapping I've tried marking the fields as not_analyzed, but it has had no effect. I really don't get why there is a difference of so many orders of magnitude.

@classmethod
def get_mapping(cls):
    return {
        'properties': {
            'id': {
                'type': 'integer',
                'index': 'not_analyzed',
                "include_in_all": False,
            },
            'email': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'username': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'date_joined': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'last_activity': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'last_activity_web': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'last_activity_ios': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },

Source: (StackOverflow)

ElasticSearch with filter via elasticutils

I'm currently trying to use a filter in an existing ElasticSearch instance via the library elasticutils. I'm getting nowhere, unfortunately. I'm not sure if the problem is because I did something basic wrong or if there's a problem in the library (could well be, AFAICT).

I've got an index with a specific mapping, containing a field (say "A") of type string (no explicit analyzer given). That field always contains a list of strings.

I'd like to filter for documents whose field A contains a given string, so I tried:

import elasticutils as eu
es = eu.S().es(urls=[ URL ]).indexes(INDEX).doctypes(DOCTYPE)
f = eu.F(A="text")
result = es.filter(f)

But that returns an empty result set. I also tried it using f = eu.F(A__in="text") but that resulted in a large error message, the most intriguing part of it being [terms] filter does not support [A].

I'm wondering if I have to configure my index differently, maybe I have to create a facet to be able to use filter? But I didn't find any hint on this in the documentation I read.

My reason for wanting to use filters is that they can be combined freely using and, or, and not. I also found some specs describing that queries can be boolean too, but they typically refer to must, should, and must_not, which I don't think are flexible enough for me. I also found some specs which mention an operator flag for queries that can be set to and or or. Any info on that is welcome.

So, my questions now are:

  • Is it a configuration problem? Do facets have something to do with this?
  • I'd like to test whether this is a library bug by skipping the lib, so how can I perform this filtering action using just, say, curl? Or any other library (maybe pyes)?
  • Is flexible combining (using and, or, not, and groupings of them) of several queries possible, i.e. without using filters at all? How would I do that? (Preferably in elasticutils, but other library syntaxes, e.g. pyes, or plain curl commands are welcome as well.)
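
As a starting point for testing without the library, here is a hedged sketch that builds the raw JSON body for a filtered search, the kind of payload that could be sent to the index's _search endpoint with curl. The helper names (term, terms, bool_filter, filtered_search) and the values "other" and "spam" are illustrative assumptions, not part of any library. The comment about the [terms] error is a guess at the cause, not a confirmed diagnosis.

```python
import json


def term(field, value):
    """Exact-match filter on one indexed term."""
    return {"term": {field: value}}


def terms(field, values):
    """Match any of several terms. The terms filter expects a list, so
    passing a bare string (as in A__in="text") may be what triggers the
    '[terms] filter does not support [A]' error; this is an assumption."""
    return {"terms": {field: list(values)}}


def bool_filter(must=(), should=(), must_not=()):
    """Combine filters: must ~ and, should ~ or, must_not ~ not."""
    clauses = {"must": must, "should": should, "must_not": must_not}
    return {"bool": {key: list(val) for key, val in clauses.items() if val}}


def filtered_search(filter_body):
    """Wrap a filter in a filtered query that matches all documents."""
    return {"query": {"filtered": {"query": {"match_all": {}},
                                   "filter": filter_body}}}


# (A == "text" OR A == "other") AND NOT A == "spam"
either = bool_filter(should=[term("A", "text"), term("A", "other")])
body = filtered_search(bool_filter(must=[either], must_not=[term("A", "spam")]))
payload = json.dumps(body)  # request body for a POST to the _search endpoint
```

Because bool clauses nest, arbitrary groupings of and, or, and not can be expressed this way. Note that a term filter is not analyzed, so the value has to match a token exactly as it was indexed.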

Source: (StackOverflow)

Elasticsearch filter (numeric field) returns nothing

Type mapping

{
  "pois-en": {
    "mappings": {
      "poi": {
        "properties": {
           "address": {
              "type": "string",
              "analyzer": "portuguese"
           },
           "city": {
              "type": "integer"
           },
           (...)
           "type": {
              "type": "integer"
           }
        }
      }
    }
  }
}

Query all:

GET pois-en/_search
{
  "query":{
    "match_all":{}
  },
  "fields": ["city"]
}

returns:

"hits": [
     {
        "_index": "pois-en",
        "_type": "pois_poi",
        "_id": "491",
        "_score": 1,
        "fields": {
           "city": [
              91
           ]
        }
     },
     (...)

But when I filter using:

GET pois-en/_search
{
    "query" : {
        "filtered" : { 
            "query" : {
                "match_all" : {} 
            },
            "filter" : {
                "term" : { 
                    "city" : 91
                }
            }
        }
    }
}

It returns nothing!

I can't figure out what I'm doing wrong. For communication between Django and Elasticsearch I'm using ElasticUtils (https://github.com/mozilla/elasticutils), but I'm using Sense for now to run these queries.

Thanks in advance


Source: (StackOverflow)

elasticutils + django, extract_document can return a nested dictionary?

I have many questions regarding elasticutils and I'm not sure whether I should create a GitHub issue for each one.

Question 1.

When you create a mapping for a Django model, and the model has a foreign key,
can you return a dictionary for the foreign key in extract_document()?

@classmethod
def extract_document(cls, obj_id, obj=None):
    if obj is None:
        obj = cls.get_model().objects.get(pk=obj_id)

    return {
        'id': obj.id,
        'title': obj.title,
        'main_post': {
            'id': obj.main_post.id,
            'raw_html': obj.main_post.raw_html,
            'user_id': obj.main_post.user.id
        },
        'deleted': obj.deleted
    }
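
For context, a sketch of what a matching mapping could look like if the nested dictionary is indexed as an Elasticsearch object field. The field types are assumptions inferred from the names above, and dotted_paths is a hypothetical helper that only shows how the nested fields would be addressed in queries. Whether ElasticUtils needs anything beyond this is not confirmed here.

```python
def get_mapping():
    """Hypothetical mapping for the document returned by extract_document."""
    return {
        "properties": {
            "id": {"type": "integer"},
            "title": {"type": "string"},
            # A nested dictionary maps to an object field with its own properties.
            "main_post": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "raw_html": {"type": "string"},
                    "user_id": {"type": "integer"},
                },
            },
            "deleted": {"type": "boolean"},
        }
    }


def dotted_paths(properties, prefix=""):
    """List the dotted field names under which leaf fields can be queried."""
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            paths.extend(dotted_paths(spec["properties"], path + "."))
        else:
            paths.append(path)
    return paths


paths = dotted_paths(get_mapping()["properties"])
```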

Question 2.

Is there an equivalent of haystack's load_all() in elasticutils?


Source: (StackOverflow)