EzDevInfo.com

sunburnt

Python interface to Solr Python Sunburnt - Google Groups

How to use sunburnt to filter pdf files from a solr collection

How can I use Sunburnt, the Python-based interface for Solr, to return only the PDF files from a Solr collection?


Source: (StackOverflow)

Apply pagination using sunburnt highlighted search

I am using the Sunburnt Python API for Solr search. Highlighted search works fine with the following code:

search_record = solrconn.query(search_text).highlight("content").highlight("title")
records = search_record.execute().highlighting

The problem is that it returns only 10 records. I know the default can be changed in solrconfig.xml, but I want all records.

I want to apply pagination to Sunburnt's highlighted search. Can anyone help?
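One possible approach is to request the results page by page with paginate(start=..., rows=...) and collect the highlighting of each page. The following is a minimal sketch, not sunburnt's own code: iter_highlight_pages, FakeQuery and FakeResponse are hypothetical names, the two Fake classes exist only so the example runs without a Solr server, and the sketch assumes every returned document has an entry in the highlighting dictionary.

```python
def iter_highlight_pages(query, rows=10):
    """Yield the highlighting dict of each page until a short or empty page."""
    start = 0
    while True:
        response = query.paginate(start=start, rows=rows).execute()
        page = response.highlighting
        if not page:
            break
        yield page
        if len(page) < rows:
            break  # a short page means there is nothing left to fetch
        start += rows


# Stand-in objects (hypothetical) so the sketch runs without a Solr server.
class FakeResponse:
    def __init__(self, highlighting):
        self.highlighting = highlighting


class FakeQuery:
    def __init__(self, docs, start=0, rows=10):
        self.docs, self.start, self.rows = docs, start, rows

    def paginate(self, start, rows):
        return FakeQuery(self.docs, start, rows)

    def execute(self):
        ids = sorted(self.docs)[self.start:self.start + self.rows]
        return FakeResponse({i: self.docs[i] for i in ids})


docs = {"doc%02d" % n: {"title": ["<em>hit</em> %d" % n]} for n in range(25)}
pages = list(iter_highlight_pages(FakeQuery(docs), rows=10))
print([len(p) for p in pages])  # prints [10, 10, 5]
```

With a real connection, the query passed in would be the highlighted query from the question, for example solrconn.query(search_text).highlight("content").highlight("title").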


Source: (StackOverflow)

How to do a Sunburnt Query on exact term

I am trying to search for exactly "11000060K2":

    import logging

    from solr import SolrConnection
    from sunburnt import RawString

    term = "11000060K2"
    solr_conn = SolrConnection()
    scoreDocs = solr_conn.si.query(activityemail=RawString(term)).paginate(start=0, rows=1000).execute()
    params_dict = scoreDocs.params
    # Log each request parameter that was sent to Solr.
    for key, keyvalue in params_dict:
        logging.debug("param %s    value %s ", key, keyvalue)

Returns:

param start    value 0 
param q    value activityemail:11000060K2 
param rows    value 1000 

It also returns a bunch of results that match other terms.

I want it to return only documents that match 11000060K2, using a query that looks like:

param q    value activityemail:"11000060K2"

Please tell me what I am doing wrong.
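Since the desired query differs from the generated one only by the double quotes, one thing to try is adding the quotes before wrapping the term in RawString. The helper below is a minimal sketch with a hypothetical name (solr_phrase); whether RawString passes the quotes through unchanged in your sunburnt version is an assumption to verify, and a tokenized field type can still match other terms regardless of quoting.

```python
def solr_phrase(term):
    """Wrap term in double quotes, escaping backslashes and embedded quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


print(solr_phrase("11000060K2"))  # prints "11000060K2"
```

The result would then be passed as RawString(solr_phrase(term)) in place of RawString(term).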


Source: (StackOverflow)

KeyError: 'id' when trying to index documents to Solr using sunburnt

I am trying to index a few text files to Solr using sunburnt. Below is my code

solr_url = "http://localhost:8983/solr"      
h = httplib2.Http(cache="/var/tmp/solr_cache")    
solr_instance = sunburnt.SolrInterface(url=solr_url, http_connection=h)

for url, title, webpage in webpages:
    html_id = hashlib.md5(url).hexdigest()
    doc = {"id": html_id, "content": webpage, "title": title}
    solr_instance.add(doc)

try:
    solr_instance.commit()
except:
    print "Could not Commit Changes to Solr, check the log files."
else:
    print "Successfully committed changes"

But when I run this, I get the error below.

  File "/Users/ananya/Desktop/dbms project/code/extractText/ExtractText.py", line 94, in index_to_Solr
    solr_instance = sunburnt.SolrInterface(url=solr_url, http_connection=h)

  File "/Users/ananya/anaconda/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 166, in __init__
    self.init_schema()

  File "/Users/ananya/anaconda/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 177, in init_schema
    self.schema = SolrSchema(schemadoc, format=self.format)

  File "/Users/ananya/anaconda/lib/python2.7/site-packages/sunburnt/schema.py", line 417, in __init__
    if self.unique_key else None

KeyError: 'id'

I am very new to Solr. Please help me. Do I need to make any changes to the schema file? If yes, please let me know how.

Thanks.


Source: (StackOverflow)

Solr - Error adding document with multivalued field

Schema:

<field name="tags" type="string_ci" indexed="true" stored="true" multiValued="true" />

Document:

document = {
    "id":123, 
    "title":"this is title", 
    "description":"this is desc", 
    "tags":["beach", "luxury", "RTW"]
}

Error:

<title>Error 400 ERROR: [doc=20] Error adding field \'tags\'=\'[beach, luxury, RTW]\'</title>

I tried REST and the Python modules solrpy and sunburnt, but they all give the same error.


Source: (StackOverflow)

Query with relation in Solr

I have this situation:

{"product": {"name": "Name of Product",
             "categories": [{"name": "Category 1"}, {"name": "Category 2"}]}}

This is a summary of the structure of my Solr document. When I search, I will always search by the name of the product and by the category. But if I search for this product with category = 'Category 1', I should return JSON like this:

{"product": {"name": "Name of Product",
             "categories": {"name": "Category 1"}}}

I don't know the best way to do this. For now, my options are:

  1. Build this final structure in the code;
  2. Make two collections in Solr, Product and Category, and simulate a join to assemble this final response.

I am really new to Solr, so I am kind of confused.

By the way, I am using sunburnt in my Flask application.
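Option 1 can be sketched in a few lines of plain Python once the document has been fetched. This is only an illustration under assumptions: keep_matching_categories is a hypothetical helper name, and it keeps "categories" as a list rather than collapsing it to a single dictionary as in the desired output above.

```python
import copy


def keep_matching_categories(doc, category_name):
    """Return a copy of doc whose product lists only categories with that name."""
    result = copy.deepcopy(doc)  # leave the document from Solr untouched
    product = result["product"]
    product["categories"] = [
        c for c in product["categories"] if c["name"] == category_name
    ]
    return result


doc = {"product": {"name": "Name of Product",
                   "categories": [{"name": "Category 1"}, {"name": "Category 2"}]}}
trimmed = keep_matching_categories(doc, "Category 1")
print(trimmed)
```

In a Flask view, the trimmed dictionary could be returned through jsonify.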


Source: (StackOverflow)

Solr-Sunburnt-Nutch. content field missing in results

I'm using solr-sunburnt with Django. I have used Nutch to crawl and index my site. I copied the Nutch schema.xml to Solr.

The problem I'm facing is that when I send a query, the results do not have the content field in them.

Results are the same whether I query from sunburnt or from Solr directly (from the browser, :8983/solr/select).

What do I need to do to get the content field in my results?

P.S. I'm a noob when it comes to searching and solr. :)


Source: (StackOverflow)

Alternative way to use getattr in python?

I am trying to call methods on an object that allows several different methods to be chained on the same object.

For example, sort(), facet() and exclude() are different methods, each with its own arguments, and sort_flag, facet_flag and exclude_flag are conditions set to true or false.

They can be called as:

si = sunburnt.SolrInterface("url","schema.xml")
response = si.query(arg1).facet(arg2).sort(arg3).exclude(arg4)

There are cases when I don't need to call all of these methods at the same time, or I don't have all the arguments needed to call them. In that situation, how can I build a call like si.facet(args).sort(args), something like this:

if sort_flag:
  --append sort function to the query
if exclude_flag:
  -- append exclude function

One alternative is getattr, but it is confusing to use with the methods' arguments, and it may also generate a lot of if checks (for 3 flags, close to 3 factorial statements).
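Because each chained call returns a new query object, the chain can be built one step at a time by reassigning a variable, which needs one check per flag rather than one per combination. The sketch below shows a table-driven version using getattr. FakeQuery and build_query are hypothetical names used only so the example runs on its own, and the method names are taken from the question, so check which names your sunburnt version actually provides.

```python
class FakeQuery:
    """Hypothetical stand-in for a chainable query; it only records calls."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def _chain(self, name, argument):
        # Like a real chainable query, return a new object instead of mutating.
        return FakeQuery(self.calls + ((name, argument),))

    def facet(self, argument):
        return self._chain("facet", argument)

    def sort(self, argument):
        return self._chain("sort", argument)

    def exclude(self, argument):
        return self._chain("exclude", argument)


def build_query(query, steps):
    """Apply each (enabled, method_name, argument) step whose flag is true."""
    for enabled, method_name, argument in steps:
        if enabled:
            query = getattr(query, method_name)(argument)
    return query


facet_flag, sort_flag, exclude_flag = True, False, True
query = build_query(FakeQuery(), [
    (facet_flag, "facet", "category"),
    (sort_flag, "sort", "price"),
    (exclude_flag, "exclude", "draft"),
])
print(query.calls)  # prints (('facet', 'category'), ('exclude', 'draft'))
```

The same loop works with any number of flags, since adding a step means adding one row to the list.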


Source: (StackOverflow)

sunburnt - how to see the generated query URL

I'm using sunburnt, a python library for talking to Solr. I'm getting some unexpected results and it would help me in debugging if I could see what query was being generated by sunburnt. So instead of doing:

result = query.execute()

I want to do something like

url = query.generate_url()

Is anything like this possible? Are there any hacks that can achieve the same effect?
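One possible hack is to rebuild the URL from the query's parameters. A traceback in another question on this page shows execute() calling self.interface.search(**self.options()), so the sketch below assumes query.options() returns the request parameters as a dictionary; that is an assumption to verify against your sunburnt version. build_select_url is a hypothetical helper, written for Python 3, and the /select path is taken from the default Solr handler mentioned elsewhere on this page.

```python
from urllib.parse import urlencode


def build_select_url(base_url, params):
    """Join a Solr base URL and a dict (or list of pairs) of parameters."""
    # doseq=True expands list values into repeated parameters.
    return base_url.rstrip("/") + "/select?" + urlencode(params, doseq=True)


url = build_select_url("http://localhost:8983/solr/",
                       {"q": "title:game", "rows": 10})
print(url)  # prints http://localhost:8983/solr/select?q=title%3Agame&rows=10
```

With a real query object this would be called as build_select_url(solr_url, query.options()), and the printed URL can be pasted into a browser to compare results.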


Source: (StackOverflow)

Solr, sunburnt (python) and highlighting: how-to?

What's the best way to implement sunburnt's highlight response into an application (django based, in this case)?

This link shows how the response is structured.

As they say

The results are shown as a dictionary of dictionaries

which is fairly understandable. What I don't understand is this:

The text is highlighted with HTML, and the fragments should be suitable for dropping straight into a search template

How can I "drop the fragments in the template"? In the example they highlight the word "Game". How can I use those highlighted fragments? Do I have to do a "search-and-replace regex" on my text? Is there another (hopefully smarter) way to deal with this?

I'm really stuck this time, and cannot come up with any solution. Thanks all in advance.
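Since the highlighting is a dictionary keyed by document id, with one inner dictionary of field name to fragment list per document, one way to use it without any regex is to attach each document's fragments to the document before rendering. The following is a minimal sketch: attach_snippets is a hypothetical helper, the sample data is invented for illustration, and "id" is assumed to be the unique key field.

```python
def attach_snippets(docs, highlighting, id_field="id"):
    """Return copies of docs with a 'snippets' dict of highlighted fragments."""
    merged = []
    for doc in docs:
        item = dict(doc)
        # Documents without highlights get an empty dict.
        item["snippets"] = highlighting.get(doc[id_field], {})
        merged.append(item)
    return merged


docs = [{"id": "1", "title": "The Game"}, {"id": "2", "title": "Other"}]
highlighting = {"1": {"title": ["The <em>Game</em>"]}}
results = attach_snippets(docs, highlighting)
for item in results:
    # Fall back to the plain title when there is no highlighted fragment.
    print(item["id"], item["snippets"].get("title", [item["title"]]))
```

In a Django template the fragments would then be output with the safe filter, since they already contain HTML tags that should not be escaped.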


Source: (StackOverflow)

Creating a dynamic sized OR query using Sunburnt+Solr

I'm trying out the Python Solr interface Sunburnt, and I've come across a little problem I can't seem to figure out. From my search field, I want to accept an arbitrary number of words, which I put in an array (e.g. "Music 'Iron Maiden'" -> ['Music', 'Iron Maiden']). This I've figured out (using shlex).

The problem is that Sunburnt syntax for ORing terms is

    response = si.query(si.Q(tag = 'Music') | si.Q(tag = 'Iron Maiden'))

How can I iterate over my searchword list and end up with something like the above? Or is there any other way of doing it that I'm not aware of?
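Because | is an ordinary Python operator on the Q objects, a list of them can be folded into a single expression with functools.reduce and operator.or_. The sketch below uses a hypothetical stand-in Q class that only models the | operator so that the example runs without Solr, and any_of is an illustrative helper name rather than part of sunburnt.

```python
import operator
from functools import reduce


class Q:
    """Hypothetical stand-in for si.Q; it only models the | operator."""

    def __init__(self, expr=None, **fields):
        if expr is None:
            (name, value), = fields.items()  # exactly one field=value pair
            expr = '%s:"%s"' % (name, value)
        self.expr = expr

    def __or__(self, other):
        return Q("(%s OR %s)" % (self.expr, other.expr))


def any_of(make_q, field, words):
    """OR together one make_q(field=word) clause per word (words must be non-empty)."""
    return reduce(operator.or_, (make_q(**{field: w}) for w in words))


words = ["Music", "Iron Maiden"]
combined = any_of(Q, "tag", words)
print(combined.expr)  # prints (tag:"Music" OR tag:"Iron Maiden")
```

With a real interface the call would be si.query(any_of(si.Q, "tag", words)). An empty word list makes reduce raise TypeError, so that case needs its own check.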


Source: (StackOverflow)

Sunburnt solr wildcard *:*

I need a way of using the Solr wildcard *:* in sunburnt, or is there another way of specifying 'all documents' from the index and then refining? Here is the code:

....
si = sunburnt.SolrInterface(url=solr_url,http_connection=h)
search_terms = {SEARCH_TERMS_COMIN_FROM_A_FORM}

#!This is where I need help!
result = si.query(WILDCARD)#I need all the docs from the index

#then I can do this
if search_terms['province']:
    result = result.query(province=search_terms['province'])
if search_terms['town']:
    result = result.query(town=search_terms['town'])
.......#two other similar if statement blocks
#finally
results = result.execute()
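One way to sidestep the explicit wildcard is to collect the non-empty form values first and pass them to a single query call as keyword arguments, the same keyword style the code above already uses. This is a sketch under assumptions: active_filters is a hypothetical helper, the sample values are invented, and whether calling si.query() with no arguments matches all documents is something to verify in your sunburnt version.

```python
def active_filters(search_terms, fields):
    """Keep only the named fields whose form value is non-empty."""
    return {f: search_terms[f] for f in fields if search_terms.get(f)}


search_terms = {"province": "Gauteng", "town": "", "suburb": None}
filters = active_filters(search_terms, ["province", "town", "suburb"])
print(filters)  # prints {'province': 'Gauteng'}
```

The query would then be built as si.query(**filters), replacing the chain of if blocks.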

Source: (StackOverflow)

Sunburnt arbitrary search

I am using the sunburnt Solr API. I want to make a query like this:

solrconn.query(solrconn.Q("disease")|solrconn.Q("heart")).highlight("content").highlight("title")

The above query runs correctly, but I want to make this portion dynamic:

solrconn.Q("disease")|solrconn.Q("heart")

For this I am doing:

search_words=search_text.split(" ")
bitwiseQuery=""
count=0
for word in search_words:
    count=count+1
    if count<len(search_words):
        bitwiseQuery+='solrconn.Q("'+word+'")|'
    if count==len(search_words):
        bitwiseQuery+='solrconn.Q("'+word+'")'

search_record=(solrconn.query(bitwiseQuery)).highlight("content").highlight("title")

But it is not giving me any results. Any idea how I can do this?
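The string built above is passed to query() as plain search text, so the Q(...) calls inside it are never evaluated. A likely fix is to combine real Q objects in the loop instead of concatenating their source code. The sketch below keeps the loop shape of the question; the Q class is a hypothetical stand-in that only models the | operator, and or_query is an illustrative name.

```python
class Q:
    """Hypothetical stand-in for solrconn.Q; it only models the | operator."""

    def __init__(self, expr):
        self.expr = expr

    def __or__(self, other):
        return Q("%s OR %s" % (self.expr, other.expr))


def or_query(make_q, search_text):
    """Combine one make_q(word) object per whitespace-separated word with |."""
    combined = None
    for word in search_text.split():
        clause = make_q(word)
        combined = clause if combined is None else combined | clause
    return combined  # None when search_text has no words


print(or_query(Q, "disease heart").expr)  # prints disease OR heart
```

With the real connection this would be solrconn.query(or_query(solrconn.Q, search_text)).highlight("content").highlight("title"), after checking that the result is not None.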


Source: (StackOverflow)

sunburnt : SolrError while parsing response

I came upon the following error trace while just playing with this interface I plan to use in a Django app:

import sunburnt
si = sunburnt.SolrInterface("http://localhost:8984/solr/sprod/")
si.query(global_attr_article_type='casual shoes').execute()
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/search.py", line 599, in execute
    result = self.interface.search(**self.options())
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/sunburnt.py", line 212, in search
    return self.schema.parse_response(self.conn.select(params))
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 510, in parse_response
    return SolrResponse(self, msg)
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 652, in __init__
    self.result = SolrResult(schema, result_node)
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 691, in __init__
    self.docs = [schema.parse_result_doc(n) for n in node.xpath("doc")]
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 519, in parse_result_doc
    return dict([self.parse_result_doc(n) for n in doc.getchildren()])
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 516, in parse_result_doc
    values = [self.parse_result_doc(n, name) for n in doc.getchildren()]
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 525, in parse_result_doc
    return name, SolrFieldInstance.from_solr(field_class, doc.text or '').to_user_data()
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 326, in from_solr
    self.value = self.field.from_solr(data)
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 161, in from_solr
    return self.normalize(value)
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 219, in normalize
    (value, self.__class__, self.name))
SolrError: is invalid value for class 'sunburnt.schema.SolrFieldType_SolrIntField_indexed_True_omitNorms_True_stored_True' (field designer)

The designer field in the indexed document is indeed empty:
<arr name="designer"> <int/> </arr>
<arr name="discount"> <float>0.0</float> </arr>
<arr name="discount_label"> <str/> </arr>

and here's what the schema's got
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
..
...
....
<field name="designer" type="integer" indexed="true" stored="true"/>

I understand this has to do with the field being empty but since the schema doesn't mention 'required' = true anywhere for this field, I wonder what's really up.


Source: (StackOverflow)