carrot in Python

Sending other data along with a file using Kombu (carrot)

How can I send a file along with other data using Kombu? I'm using rabbitmq as the broker.

e.g. file.pdf along with a dictionary {'author': 'user'}

I'd like to do this in a single message if possible. If you would instead recommend using Kombu's standard serializers, how would you tie the separate messages together?
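One possible way to keep everything in a single message, sketched under assumptions: base64-encode the file bytes and place them in the same JSON document as the metadata. The names `pack_message` and `unpack_message` are hypothetical, and the actual publish call through Kombu is left out; the resulting string would simply be used as the message body. Base64 grows the payload by roughly a third, so this is probably only suitable for small files.

```python
import base64
import json


def pack_message(file_bytes, filename, metadata):
    """Bundle raw file bytes and a metadata dict into one JSON string."""
    return json.dumps({
        "filename": filename,
        "metadata": metadata,
        # JSON cannot carry raw bytes, so the content is base64-encoded.
        "content_b64": base64.b64encode(file_bytes).decode("ascii"),
    })


def unpack_message(body):
    """Reverse pack_message: return (file_bytes, filename, metadata)."""
    payload = json.loads(body)
    file_bytes = base64.b64decode(payload["content_b64"])
    return file_bytes, payload["filename"], payload["metadata"]


if __name__ == "__main__":
    # Placeholder bytes stand in for the real contents of file.pdf.
    body = pack_message(b"%PDF-1.4 example bytes", "file.pdf", {"author": "user"})
    data, name, meta = unpack_message(body)
    print(name, meta, len(data))
```

If the file and the dictionary have to travel as separate messages instead, a shared identifier stored in both (for example a `uuid.uuid4()` string) would let the consumer pair them up.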

Source: (StackOverflow)

Regarding Carrot Gems for ruby

I am using the carrot gem for message publishing in Ruby. My sample code is as follows:

Code:

require 'carrot'

q = Carrot.queue('testqueue', :durable => true)

q.publish("sample data")

Can someone please tell me how to publish the same data using a routing key?

Source: (StackOverflow)

Using the carrot2 workbench without specifying a query

I would like to use the workbench to run some tests, but I could not work out how to run it without specifying a query. I would like to cluster a set of documents in the workbench without having to specify a query. Is that possible?

Thanks

Source: (StackOverflow)

Java heap size error when trying to cluster 15980 documents via carrot2workbench

My environment: notebook with 8 GB RAM, Ubuntu 14.04, Solr 4.3.1, carrot2workbench 3.10.0

My Solr Index: 15980 documents

My Problem: Cluster all documents with the kmeans algorithm

When I run the query in the carrot2workbench (query: *:*), I always get a Java heap size error when using more than ~1000 results. I started Solr with -Xms256m -Xmx6g but the error still occurs.

Is it really a heap size problem or could it be somewhere else?

Source: (StackOverflow)

Where is the best place to initialize a Singleton in Rails?

I am using the Carrot AMQP library in a Ruby on Rails app and I only want to initialize the settings once, not for every task that is generated.

I currently have it in my environment.rb and it seems to work but I am not entirely sure this is the best place.

Is initializing Carrot only once, after Rails has booted, even a good idea, or should I create a new Carrot object for every task that is created?

Source: (StackOverflow)

Python ImportError with Sage

Okay, I am new to Python and have been researching this problem, but I can't find anything like it, so I am not sure what is going on.

I am creating a program that involves Sage and has a message queue. We have this set up on a development machine, so I know it works, but I wanted to set it up on my own computer so I could get a better understanding of how it all works and make development easier for myself.

To start up Sage, we run a script that calls Sage's main binary file and passes it an executable .py file (./sage/sage ./sage_server.py). This raises an error in the sage_server.py file:

Traceback (most recent call last):
  File "./sage_server.py", line 23, in <module>
    from carrot.messaging import Publisher
ImportError: No module named carrot.messaging

But whenever I run that file directly in the terminal (./sage_server), the import works fine, and it isn't until line 27 that an error occurs, when it tries to import something from sage.

Does anyone know what would cause the error when it is being called by something else? I am very lost as to what would be causing this.
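A quick way to narrow this down, assuming the cause is that Sage runs its own bundled Python interpreter with a different module search path: print the interpreter and its search path under both launch methods and compare the two listings. `describe_interpreter` is a hypothetical helper name, not part of Sage or carrot.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's path and its module search path."""
    return {"executable": sys.executable, "path": list(sys.path)}


if __name__ == "__main__":
    info = describe_interpreter()
    print("interpreter:", info["executable"])
    for entry in info["path"]:
        print("  search path entry:", entry)
```

If the directory that contains `carrot` shows up in one listing but not in the other, the package is most likely installed for the system Python only and not for the interpreter that Sage uses.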

Source: (StackOverflow)

Efficiently selecting a title (the center of the cluster) for a cluster of strings

I have (imperfectly) clustered string data, where the items in one cluster might look like this:

[ 
  Yellow ripe banana very tasty,
  Yellow ripe banana with little dots,
  Green apple with little dots,
  Green ripe banana - from the market, 
  Yellow ripe banana,
  Nice yellow ripe banana,
  Cool yellow ripe banana - my favourite,
  Yellow ripe,
  Yellow ripe
],

where the optimal title would be 'Yellow ripe banana'.

Currently, I am using a simple heuristic with the help of SQL GROUP BY: choose the most common name, or the shortest one in case of a tie. My data contains a large number of such clusters, and they change frequently. Every time a new fruit is added to or removed from a cluster, the title for the cluster has to be recalculated.

I would like to improve two things:

(1) Efficiency - e.g., compare the new fruit name to the title of the cluster only, and avoid grouping / phrase clustering of all fruit titles each time.

(2) Precision - instead of looking for the most common complete name, I would like to extract the most common phrase. The current algorithm would choose 'Yellow ripe', which repeats 2 times and is the most common complete name; however, as a phrase, 'Yellow ripe banana' is the most common in the given set.

I am thinking of using Solr + Carrot2 (I have no experience with the latter). At this point, I do not need to cluster the documents - they are already clustered based on other parameters - I only need to choose the central phrase as the center/title of the cluster.

Any input is much appreciated, thanks!
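The two goals above could be sketched without Solr or Carrot2 as follows. This is only an illustration under assumptions: every contiguous word sequence of a name counts as a phrase, a phrase is scored as (number of names containing it) times (its length in words), and only ASCII letters are tokenized. `phrases` and `ClusterTitle` are hypothetical names. Keeping the counts per cluster means that adding or removing one name touches only that name's phrases.

```python
import re
from collections import Counter


def phrases(title):
    """Return the set of contiguous word n-grams in a title (lowercased)."""
    words = re.findall(r"[a-z]+", title.lower())
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }


class ClusterTitle:
    """Keeps phrase counts so adding or removing one name is incremental."""

    def __init__(self):
        self.counts = Counter()

    def add(self, title):
        self.counts.update(phrases(title))

    def remove(self, title):
        self.counts.subtract(phrases(title))

    def best(self):
        # Assumed score: number of names containing the phrase times its
        # length in words, so longer shared phrases can beat single words.
        live = {p: c for p, c in self.counts.items() if c > 0}
        if not live:
            return None
        return max(live, key=lambda p: (live[p] * len(p.split()), p))


if __name__ == "__main__":
    cluster = ClusterTitle()
    for name in [
        "Yellow ripe banana very tasty",
        "Yellow ripe banana with little dots",
        "Green apple with little dots",
        "Green ripe banana - from the market",
        "Yellow ripe banana",
        "Nice yellow ripe banana",
        "Cool yellow ripe banana - my favourite",
        "Yellow ripe",
        "Yellow ripe",
    ]:
        cluster.add(name)
    print(cluster.best())
```

With this scoring, 'yellow ripe banana' (5 names, 3 words) narrowly beats 'yellow ripe' (7 names, 2 words) on the sample cluster; a different weighting would be needed if very long phrases that occur only once should never win.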

Source: (StackOverflow)

How to use an external server with Ruby AMQP Carrot Library

I am using the Ruby AMQP Carrot library and I am trying to talk to a test RabbitMQ server on a virtual machine. The AMQP port is open on the machine but I can't get Carrot to establish an external connection. I have tried the following:

Carrot.queue('message', :durable => true, :server => '192.168.162.176')

Carrot.queue('message', :durable => true, :host => '192.168.162.176')

Source: (StackOverflow)

RabbitMQ: Connecting & publishing to an existing queue in Ruby

I have two process types on Heroku: a web dyno in Ruby and a worker in Node.js. I'm using the RabbitMQ addon (currently beta) to pass a message from Ruby to Node. Node connects and consumes correctly, and Ruby connects and publishes correctly as long as it is the first to connect / create the queue.

Apparently, Carrot throws some funny errors when you try to create a queue that already exists, which is how I discovered that the reason for not being able to get my message across (I could have sworn it worked when I tested last night) was that I started my Node process before my Ruby.

Since I'm on Heroku, I'm going to have more than one Ruby thread and more than one Node thread working concurrently, and each of them needs to support both being the first to create a queue and connecting to an existing queue, without issue.

Which brings me to my question:

How do I connect to an existing RabbitMQ queue, using Ruby, for the purpose of publishing messages to consumers which are already connected and waiting to receive messages?

Source: (StackOverflow)

Celery (Django) Rate limiting

I'm using Celery to process multiple data-mining tasks. One of these tasks connects to a remote service which allows a maximum of 10 simultaneous connections per user (or in other words, it CAN exceed 10 connections globally but it CANNOT exceed 10 connections per individual job).

I THINK Token Bucket (rate limiting) is what I'm looking for, but I can't seem to find any implementation of it.
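For reference, a token bucket itself is small enough to sketch directly. The version below is a minimal, in-process illustration only: `TokenBucket` and `try_consume` are hypothetical names, it is not Celery's own implementation, and it is not shared between worker processes. Note also that a token bucket limits how often something happens over time; the limit described above (10 simultaneous connections) is a concurrency cap, for which a counting semaphore may be the closer fit.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` actions, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def try_consume(self, tokens=1):
        """Take `tokens` from the bucket if available; return True on success."""
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=10, capacity=10)
    allowed = sum(bucket.try_consume() for _ in range(25))
    print("allowed in the initial burst:", allowed)
```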

Source: (StackOverflow)

Can't connect to RabbitMQ with the Ruby gem carrot

I am trying to connect to my rabbitmq server. I am using

require 'carrot'
@client = Carrot.new(:host => '10.xx.xx.xx', :port => 5672)
q = @client.queue("my_queue")

I am getting this error

"#<Carrot::AMQP::Server::ServerDown: Connection reset by peer>"

How do I check whether my server is down, and how do I restart it?

Source: (StackOverflow)

Carrot (Python) [errno 10054] An existing connection was forcibly closed by the remote host

We are using Carrot in our Python project. I wrote a Python script that acts as the consumer of the message queue. I invoke this script from the command-line shell in Windows 7 as

python consumer.py

However, after a while, the running session was aborted and the error is:

[errno 10054] An existing connection was forcibly closed by the remote host

The producer session is still running fine on the Linux server. I am wondering how I can fix this and keep a long-running consumer session on Windows.
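One common workaround is to restart the consumer whenever the connection drops. The sketch below assumes Python 3, where error 10054 surfaces as `ConnectionResetError` (a subclass of `ConnectionError`). `run_with_reconnect` is a hypothetical helper, and `consume` stands for whatever function opens a fresh connection and runs the consumer loop. It does not address why the remote host closes the connection in the first place.

```python
import time


def run_with_reconnect(consume, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `consume()`; on a dropped connection, retry with exponential backoff."""
    attempt = 0
    while True:
        try:
            return consume()
        except ConnectionError:  # includes ConnectionResetError
            attempt += 1
            if attempt > max_retries:
                raise  # give up and surface the last error
            # Wait 1x, 2x, 4x, ... the base delay before reconnecting.
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_consumer():
        # Stand-in for a real consumer: fails twice, then finishes.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionResetError(10054, "connection reset")
        return "finished"

    print(run_with_reconnect(flaky_consumer, base_delay=0.1))
```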

Source: (StackOverflow)