pycassa in Python

Is there a Cassandra query equivalent to the SQL LIKE condition?

The LIKE condition allows us to use wildcards in the WHERE clause of an SQL statement, which lets us perform pattern matching. The LIKE condition can be used in any valid SQL statement: SELECT, INSERT, UPDATE, or DELETE. For example:

SELECT * FROM users
WHERE user_name like 'babu%';

Is there an equivalent query for this operation in the Cassandra CLI?
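Cassandra of this era has no LIKE operator (later 3.x releases added LIKE through SASI indexes). A common workaround for a prefix pattern such as 'babu%' is to store the searchable strings as column names in a UTF8-sorted row and take a column slice from the prefix up to the next possible prefix. The sketch below only computes those bounds and simulates the slice over a sorted Python list; the function names are hypothetical. A real query would feed the bounds to something like pycassa's `cf.get(row_key, column_start=..., column_finish=...)`, keeping in mind that Cassandra's `column_finish` is inclusive while this sketch treats the upper bound as exclusive.

```python
from bisect import bisect_left


def prefix_bounds(prefix):
    """Return (start, stop) such that every string s starting with `prefix`
    satisfies start <= s < stop (a half-open interval)."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    # Incrementing the last code point gives the smallest string that sorts
    # after every string beginning with `prefix`.
    stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, stop


def prefix_slice(sorted_names, prefix):
    """Simulate a column slice over names kept in sorted order."""
    start, stop = prefix_bounds(prefix)
    lo = bisect_left(sorted_names, start)
    hi = bisect_left(sorted_names, stop)
    return sorted_names[lo:hi]


# Hypothetical user names. Python compares str by code point, which matches
# the byte order a UTF8Type comparator uses.
names = sorted(["alice", "babu", "babu123", "babuji", "babv", "bob"])
print(prefix_slice(names, "babu"))  # ['babu', 'babu123', 'babuji']
```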

Source: (StackOverflow)

best Cassandra library/wrapper for Python? [closed]

I found lazyboy and pycassa; maybe there are others too. I've seen many sites recommending lazyboy, but IMHO the project seems dead; see https://www.ohloh.net/p/compare?project_0=pycassa&project_1=lazyboy

So what's the best option for a new project? Thanks.

Source: (StackOverflow)

Getting an unexpected NameError in pycassaShell when invoking one function from another

I'm playing with pycassaShell (as part of the Cassandra and Twissandra tutorial). When I define two functions inside the shell and call one from the other, I get an error saying the name is not defined.

This is probably something very simple, but I did not find how to do this.

The pycassaShell looks like:

In [3]: def aaa(): print 5
In [4]: aaa()
5

In [5]: def bbb(): aaa()

In [6]: bbb()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
...
NameError: global name 'aaa' is not defined
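One likely explanation, offered as an assumption rather than something confirmed from pycassaShell's source: older embedded IPython shells ran user input with separate global and local namespaces. A `def` typed at the prompt then lands in the local namespace, but a function body resolves free names through its globals, so `bbb` cannot see `aaa`. The plain-Python sketch below reproduces the same NameError with `exec` and two dictionaries, and shows that sharing one dictionary avoids it.

```python
SOURCE = """
def aaa():
    return 5

def bbb():
    return aaa()

result = bbb()
"""


def run(source, share_namespace):
    """Execute `source` the way an interactive shell might, returning either
    the computed result or a description of the NameError raised."""
    globals_ns = {}
    # With separate dicts, top-level defs are stored in locals_ns, but each
    # function's __globals__ is globals_ns, so aaa is invisible inside bbb.
    locals_ns = globals_ns if share_namespace else {}
    try:
        exec(source, globals_ns, locals_ns)
    except NameError as exc:
        return "NameError: %s" % exc
    return locals_ns["result"]


print(run(SOURCE, share_namespace=False))  # NameError: name 'aaa' is not defined
print(run(SOURCE, share_namespace=True))   # 5
```

If this is the cause, making `aaa` visible to `bbb` explicitly at the prompt, for example with `globals()['aaa'] = aaa`, or passing the function in as an argument, should work around it.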

Source: (StackOverflow)

Aptana 3 Unresolved import - Python

I'm trying to import the pycassa library in a Python project in Aptana 3, but on the line "import pycassa" it shows the following error: "Unresolved import: pycassa". I installed pycassa with easy_install, and if I run "import pycassa" in a Python shell, it runs without errors. The Aptana project also runs without errors, but the error marker remains. Why?

Sorry for my English.
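Since the import succeeds at runtime, the package is installed; the marker most likely means the IDE's (PyDev's) cached view of the interpreter predates the easy_install, which typically adds an .egg to sys.path. A common remedy is to remove and re-add (or re-scan) the Python interpreter in PyDev's interpreter preferences so the new path is indexed. The hypothetical helper below uses only the Python 3 standard library to report the directory from which the running interpreter imports a module, which is the path the IDE needs to know about (under the Python 2 of that era, `print(pycassa.__file__)` gives the same information).

```python
import importlib.util
import os
import sys


def module_location(name):
    """Return the directory the running interpreter imports `name` from,
    or None if the module cannot be found or has no file location."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    # For a package, origin is .../pkg/__init__.py; report the folder that
    # contains the package, which is what an IDE's library path needs.
    path = os.path.dirname(spec.origin)
    if spec.submodule_search_locations is not None:
        path = os.path.dirname(path)
    return path


if __name__ == "__main__":
    print(sys.executable)               # the interpreter the IDE should use
    print(module_location("pycassa"))   # None if pycassa is not importable
```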

Source: (StackOverflow)

Generate UUID for Cassandra in Python

Heh, I'm using

cf.insert(uuid.uuid1().bytes_le, {'column1': 'val1'}) (pycassa)

to create a TimeUUID for Cassandra, but I get the error

InvalidRequestException: 
InvalidRequestException(why='UUIDs must be exactly 16 bytes')

It doesn't work with

uuid.uuid1()
uuid.uuid1().bytes
str(uuid.uuid1())

either.

What's the best way to create a valid TimeUUID to use with the CompareWith="TimeUUIDType" flag?

Thanks,
Henrik
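With `CompareWith="TimeUUIDType"`, the comparator applies to column *names*, not row keys, so the likely culprit is the column name 'column1', which is not a 16-byte UUID; the row key can be any string. In pycassa the UUID would go on the column side, e.g. `cf.insert('row_key', {uuid.uuid1(): 'val1'})`, since pycassa accepts `uuid.UUID` objects for TimeUUIDType names (also note that `bytes_le` stores the first three fields little-endian, which is not the layout Cassandra expects). The hypothetical helper below roughly mimics the server's size check; the version-1 check is an extra safeguard in this sketch.

```python
import uuid


def check_time_uuid_name(name):
    """Roughly mimic the validation of a TimeUUIDType column name.

    Accepts a uuid.UUID or raw bytes and returns the 16 raw bytes;
    raises ValueError otherwise.
    """
    raw = name.bytes if isinstance(name, uuid.UUID) else name
    if not isinstance(raw, bytes) or len(raw) != 16:
        raise ValueError("UUIDs must be exactly 16 bytes")
    if uuid.UUID(bytes=raw).version != 1:
        raise ValueError("TimeUUIDType expects a version 1 (time-based) UUID")
    return raw


# A plain string column name fails, just like 'column1' in the question.
try:
    check_time_uuid_name(b"column1")
except ValueError as exc:
    print(exc)  # UUIDs must be exactly 16 bytes

# A time-based UUID used as the column name passes.
print(len(check_time_uuid_name(uuid.uuid1())))  # 16
```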

Source: (StackOverflow)

How do I get all the keys that are stored in the Cassandra column family with pycassa?

Does anyone have experience working with pycassa? I have a question about it: how do I get all the keys that are stored in the database?

In this small snippet, we need to supply the keys (here 'foo' and 'bar') to get the associated columns. That is fine, but my requirement is to get all the keys (only the keys) at once, as a Python list or similar data structure.

cf.multiget(['foo', 'bar'])
{'foo': {'column1': 'val2'}, 'bar': {'column1': 'val3', 'column2': 'val4'}}

Thanks.
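pycassa's `ColumnFamily.get_range()` iterates over every row (a full scan, so it can be slow on a large column family) and yields `(key, columns)` pairs. Asking for zero columns and turning off the filter that drops empty rows should return just the keys. The wrapper below is a sketch assuming pycassa's `get_range` keyword arguments `column_count` and `filter_empty`; with the RandomPartitioner, keys come back in token order rather than sorted order.

```python
def all_row_keys(cf):
    """Return every row key of a pycassa ColumnFamily as a list.

    column_count=0 requests no column data, and filter_empty=False stops
    pycassa from discarding the resulting (empty-looking) rows.
    """
    return [key for key, _columns in cf.get_range(column_count=0,
                                                  filter_empty=False)]


# Usage (pool and column family names are hypothetical):
# cf = pycassa.ColumnFamily(pool, 'my_column_family')
# keys = all_row_keys(cf)
```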

Source: (StackOverflow)

Updating TimeUUID columns in cassandra

I'm trying to store some time series data on the following column family:

create column family t_data with comparator=TimeUUIDType and default_validation_class=UTF8Type and key_validation_class=UTF8Type;

I'm successfully inserting data this way:

data={datetime.datetime(2013, 3, 4, 17, 8, 57, 919671):'VALUE'}
key='row_id'
col_fam.insert(key,data)

As you can see, when a datetime object is used as the column name, pycassa correctly converts it to a TimeUUID.

[default@keyspace] get t_data[row_id];

=> (column=f36ad7be-84ed-11e2-af42-ef3ff4aa7c40, value=VALUE, timestamp=1362423749228331)

Sometimes the application needs to update some data. The problem is that when I try to update that column by passing the same datetime object, pycassa creates a different UUID (the time part is the same), so instead of updating the column, it creates another one.

[default@keyspace] get t_data[row_id];

=> (column=f36ad7be-84ed-11e2-af42-ef3ff4aa7c40, value=VALUE, timestamp=1362423749228331)

=> (column=**f36ad7be**-84ed-11e2-b2fa-a6d3e28fea13, value=VALUE, timestamp=1362424025433209)

The question is: how can I update TimeUUID-based columns with pycassa by passing the datetime object? Or, if this is not the correct way to do it, what is the recommended way?
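The two column names share the time fields but differ in the clock-sequence and node fields, which pycassa appears to fill randomly when converting a datetime. To address the same column again, the name must be byte-for-byte identical, so either keep the UUID from the first insert or derive it deterministically; pycassa's `pycassa.util.convert_time_to_uuid(time_arg, lowest_val=True, randomize=False)` is meant for the latter (check the signature in your version). The standalone sketch below shows the idea with only the standard library: it packs a datetime's 100-ns timestamp into a version-1 UUID with fixed, hypothetical `clock_seq` and `node` values.

```python
import datetime
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def deterministic_time_uuid(dt, clock_seq=0, node=0):
    """Build a version-1 UUID whose time fields come from `dt`.

    Naive datetimes are treated as UTC. Because clock_seq and node are fixed
    rather than random, the same datetime always maps to the same UUID.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    delta = dt - UNIX_EPOCH
    # Integer arithmetic keeps microsecond precision exact.
    micros = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
    ts = micros * 10 + UUID_EPOCH_OFFSET
    fields = (
        ts & 0xFFFFFFFF,          # time_low
        (ts >> 32) & 0xFFFF,      # time_mid
        (ts >> 48) & 0x0FFF,      # time_hi (version bits are set below)
        (clock_seq >> 8) & 0x3F,  # clock_seq_hi (variant bits are set below)
        clock_seq & 0xFF,         # clock_seq_low
        node,
    )
    return uuid.UUID(fields=fields, version=1)


when = datetime.datetime(2013, 3, 4, 17, 8, 57, 919671)
print(deterministic_time_uuid(when) == deterministic_time_uuid(when))  # True
```

Reusing that UUID as the column name on every write means an update overwrites the existing column instead of adding a new one.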

Source: (StackOverflow)

batch_mutate for counters

Is it possible to use batch_mutate for counters in PHP? From what I've seen, it should be possible to increment counters in general, but I can't find any working examples in any language.

Thanks
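At the Thrift level, `batch_mutate` takes a mutation map shaped as row key → column family → list of `Mutation`s, and a counter increment is a `Mutation` whose `ColumnOrSuperColumn` carries a `CounterColumn(name, value)` holding the delta. The sketch below is in Python rather than PHP, to match the rest of this page, and only assembles that shape using plain dicts as stand-ins for the Thrift structs; client libraries generally build these objects for you, so check your client's batch API first. Merging repeated increments so each cell carries one delta is a design choice of this sketch.

```python
from collections import defaultdict


def build_counter_mutation_map(increments, column_family):
    """Assemble a batch_mutate-style mutation map for counter increments.

    `increments` is an iterable of (row_key, column_name, delta). Plain dicts
    stand in for the Thrift Mutation / ColumnOrSuperColumn / CounterColumn
    structs; a real client would construct those objects instead.
    """
    # Sum repeated increments to the same cell so each cell appears once.
    totals = defaultdict(int)
    for row_key, column_name, delta in increments:
        totals[(row_key, column_name)] += delta

    mutation_map = {}
    for (row_key, column_name), delta in totals.items():
        mutation = {
            "column_or_supercolumn": {
                "counter_column": {"name": column_name, "value": delta}
            }
        }
        per_row = mutation_map.setdefault(row_key, {})
        per_row.setdefault(column_family, []).append(mutation)
    return mutation_map


batch = build_counter_mutation_map(
    [("page1", "views", 1), ("page1", "views", 2), ("page2", "clicks", 1)],
    "page_counters",  # hypothetical counter column family
)
print(batch["page1"]["page_counters"][0]["column_or_supercolumn"]["counter_column"])
# {'name': 'views', 'value': 3}
```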

Source: (StackOverflow)

How to create RDD object on cassandra data using pyspark

I am using cassandra 2.0.3 and I would like to use pyspark (Apache Spark Python API) to create an RDD object from cassandra data.

PLEASE NOTE: I do not want to import CQL and then run a CQL query from the pyspark API; rather, I would like to create an RDD on which I can do some transformations.

I know this can be done in Scala but I am not able to find out how this could be done from pyspark.

I'd really appreciate it if anyone could guide me on this.

Source: (StackOverflow)

Cassandra (Pycassa/CQL) Return Partial Match

I'm trying to do a partial search through a column family in Cassandra, similar to an SQL query like: SELECT * FROM columnfamily WHERE col = 'val*', where val* means any value that begins with the three characters 'val'.

I've read DataStax's documentation on the SELECT statement, but I can't find any support for partial WHERE criteria. Any ideas?
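Cassandra of this era cannot filter on a prefix of a column *value*; a WHERE condition on a column is an exact match. A common workaround is to maintain your own index: a wide row (or dedicated column family) whose column names are composites of (value, row key), sorted by value, so a prefix query becomes a column slice. The sketch below simulates such an index in memory with a sorted list; `PrefixIndex` is a hypothetical name and the layout an assumption, not a built-in Cassandra feature. Every write to the main data must also update the index, including removing the old entry when a value changes.

```python
from bisect import bisect_left, insort


class PrefixIndex:
    """In-memory stand-in for an index row with composite column names
    (value, row_key), kept sorted the way a composite comparator would."""

    def __init__(self):
        self._entries = []  # sorted list of (value, row_key)

    def add(self, value, row_key):
        insort(self._entries, (value, row_key))

    def remove(self, value, row_key):
        i = bisect_left(self._entries, (value, row_key))
        if i < len(self._entries) and self._entries[i] == (value, row_key):
            del self._entries[i]

    def keys_with_prefix(self, prefix):
        """Row keys whose indexed value starts with `prefix` (non-empty)."""
        # Slice from the prefix up to (not including) the prefix with its
        # last character incremented; (x,) sorts before every (x, key).
        stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        lo = bisect_left(self._entries, (prefix,))
        hi = bisect_left(self._entries, (stop,))
        return [row_key for _value, row_key in self._entries[lo:hi]]


index = PrefixIndex()
for row_key, value in [("r1", "valid"), ("r2", "value"), ("r3", "vanilla"), ("r4", "val")]:
    index.add(value, row_key)
print(index.keys_with_prefix("val"))  # ['r4', 'r1', 'r2']
```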

Source: (StackOverflow)

Pycassa: how to query parts of a Composite Type

Basically I'm asking the same thing as in this question but for the Python Cassandra library, PyCassa.

Let's say you have a composite type storing data like this:

[20120228:finalscore] = '31-17'
[20120228:halftimescore]= '17-17'
[20120221:finalscore] = '3-14'
[20120221:halftimescore]= '3-0'
[20120216:finalscore] = '54-0'
[20120216:halftimescore]= '42-0'

So, I know I can easily slice based off of the first part of the composite type by doing:

>>> cf.get('1234', column_start=('20120216',), column_finish=('20120221',))
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])

But if I only want the finalscore, I would assume I could do:

>>> cf.get('1234', column_start=('20120216', 'finalscore'),
           column_finish=('20120221', 'finalscore'))

To get:

OrderedDict([((u'20120216', u'finalscore'), u'54-0')])

But instead, I get:

OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])

Same as the 1st call.

Am I doing something wrong? Should this work? Or is there some syntax using the cf.get(... columns=[('20120216', 'finalscore')]) ? I tried that too and got an exception.

According to http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1, I should be able to do something like this...

Thanks
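A column slice always returns a *contiguous* run of columns in comparator order, and composite names sort component by component, much like Python tuples. Every 20120216 column that sorts after 'finalscore' lies between ('20120216', 'finalscore') and ('20120221', 'finalscore'), so one slice cannot skip the half-time scores. The simulation below uses tuples as stand-ins for composite names and treats both bounds as inclusive; it illustrates the ordering rather than pycassa's exact end-of-range handling, and `column_slice` is a hypothetical name. It also shows one common alternative, putting the score type first so all final scores are adjacent; filtering the slice client-side is the other option.

```python
from bisect import bisect_left, bisect_right

columns = sorted([
    (("20120228", "finalscore"), "31-17"),
    (("20120228", "halftimescore"), "17-17"),
    (("20120221", "finalscore"), "3-14"),
    (("20120221", "halftimescore"), "3-0"),
    (("20120216", "finalscore"), "54-0"),
    (("20120216", "halftimescore"), "42-0"),
])


def column_slice(sorted_columns, start, finish):
    """Return columns whose name lies in [start, finish], both inclusive,
    mimicking a contiguous slice over tuple-ordered composite names."""
    names = [name for name, _value in sorted_columns]
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return sorted_columns[lo:hi]


# (date, type) layout: the range between two final scores also contains the
# half-time score stored between them.
print([name for name, _ in column_slice(
    columns, ("20120216", "finalscore"), ("20120221", "finalscore"))])
# [('20120216', 'finalscore'), ('20120216', 'halftimescore'), ('20120221', 'finalscore')]

# (type, date) layout: all final scores are adjacent, so one slice suffices.
by_type = sorted(((kind, date), value) for (date, kind), value in columns)
print(column_slice(by_type, ("finalscore", "20120216"), ("finalscore", "20120221")))
# [(('finalscore', '20120216'), '54-0'), (('finalscore', '20120221'), '3-14')]
```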

Source: (StackOverflow)

Cassandra buffered read of millions of columns

I've got a cassandra cluster with a small number of rows (< 100). Each row has about 2 million columns. I need to get a full row (all 2 million columns), but things start failing all over the place before I can finish my read. I'd like to do some kind of buffered read.

Ideally I'd like to do something like this using Pycassa (no, this isn't the proper way to call get; it's just so you can get the idea):

results = {}
start = 0
while True:
    # Fetch blocks of size 500
    buffer = column_family.get(key, column_offset=start, column_count=500)
    if len(buffer) == 0:
        break

    # Merge these results into the main one
    results.update(buffer)

    # Update the offset
    start += len(buffer)

Pycassa (and, by extension, Cassandra) doesn't let you do this. Instead, you need to specify a column name for column_start and column_finish. This is a problem, since I don't actually know what the start or end column names will be. The special value "" can indicate the start or end of the row, but that doesn't work for any of the values in the middle.

So how can I accomplish a buffered read of all the columns in a single row? Thanks.
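Slices are addressed by column name rather than by offset, so the usual pattern is to page by name: fetch a page, remember the last column name, and start the next slice there. Because `column_start` is inclusive, each later page repeats that last column, so the sketch requests one extra column and skips the duplicate. Newer pycassa releases wrap this pattern in `ColumnFamily.xget(key, buffer_size=...)`, a generator over a whole row. The generic version below takes a hypothetical `fetch` callable so it runs without a cluster; with pycassa, `fetch` could wrap `column_family.get(key, column_start=start, column_count=count)` and return an empty dict on the `NotFoundException` that `get` raises when nothing is left.

```python
from bisect import bisect_left


def iter_row_columns(fetch, page_size=500):
    """Yield every (name, value) pair of one row, page by page.

    `fetch(column_start, column_count)` is assumed to behave like a column
    slice: return an ordered mapping of at most column_count columns whose
    names are >= column_start, with "" meaning the beginning of the row.
    """
    start = ""
    last_seen = None  # already yielded; it reappears at the next page's head
    while True:
        count = page_size if last_seen is None else page_size + 1
        page = fetch(start, count)
        for name, value in page.items():
            if name == last_seen:
                continue  # skip the inclusive-start duplicate
            yield name, value
        if len(page) < count:
            return  # a short page means the row is exhausted
        last_seen = start = list(page)[-1]


# Usage with a small in-memory row standing in for a very wide one.
ROW = {"c%07d" % i: i for i in range(1234)}
NAMES = sorted(ROW)


def fake_fetch(column_start, column_count):
    i = bisect_left(NAMES, column_start)
    return {name: ROW[name] for name in NAMES[i:i + column_count]}


columns = list(iter_row_columns(fake_fetch, page_size=100))
print(len(columns))  # 1234
```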

Source: (StackOverflow)

Cassandra multiget performance

I've got a cassandra cluster with a fairly small number of rows (2 million or so, which I would hope is "small" for cassandra). Each row is keyed on a unique UUID, and each row has about 200 columns (give or take a few). All in all these are pretty small rows, no binary data or large amounts of text. Just short strings.

I've just finished the initial import into the Cassandra cluster from our old database. I've tuned the hell out of Cassandra on each machine. There were hundreds of millions of writes, but no reads. Now that it's time to USE this thing, I'm finding that read speeds are absolutely dismal. I'm doing a multiget using pycassa on anywhere from 500 to 10000 rows at a time. Even at 500 rows, the performance is awful, sometimes taking 30+ seconds.

What would cause this type of behavior? What sort of things would you recommend after a large import like this? Thanks.
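Two things worth checking, stated as general pycassa behaviour rather than a diagnosis of this cluster: `get`/`multiget` return at most `column_count` columns per row (100 by default), so rows of about 200 columns need a larger `column_count`; and a very large multiget sends a big request to the coordinator, so splitting the key list into smaller batches often evens out latency. pycassa's `multiget` has a `buffer_size` argument for this purpose (check your version); the sketch below shows the same chunking explicitly, with hypothetical helper names and `cf` assumed to be a pycassa `ColumnFamily`.

```python
from collections import OrderedDict


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def multiget_in_chunks(cf, keys, chunk_size=100, column_count=1000):
    """Fetch many rows with several small multiget calls instead of one big one.

    column_count is raised above pycassa's default of 100 so that rows with
    roughly 200 columns come back whole.
    """
    results = OrderedDict()
    for batch in chunked(list(keys), chunk_size):
        results.update(cf.multiget(batch, column_count=column_count))
    return results
```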

Source: (StackOverflow)

Time UUID type in pycassa

I'm having problems using the time_uuid type as a key in my column family. I want to store my records and have them ordered by when they were inserted, and I figured time_uuid is a good way to go. This is how I've set up my column family:

sys.create_column_family("keyspace", "records", comparator_type=TIME_UUID_TYPE)

When I try to insert, I do this:

q=pycassa.ColumnFamily(pycassa.connect("keyspace"), "records")
myKey=pycassa.util.convert_time_to_uuid(datetime.datetime.utcnow())
q.insert(myKey,{'somedata':'somevalue'})

However, when I insert data, I always get an error:

Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number.

If I change the comparator_type to UTF8_TYPE, it works, but the items are not returned in the order they should be. What am I doing wrong?
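The `comparator_type` governs column names, not row keys, so the TimeUUIDType comparator rejects the plain string column name 'somedata', which is what the error message reports. A common layout for records ordered by insertion time is one (or a few, bucketed) wide rows whose column names are the TimeUUIDs, e.g. with pycassa something like `q.insert('records_row', {myKey: 'somevalue'})`, where `'records_row'` is a hypothetical row key. The sketch below simulates how such columns come back: TimeUUIDType orders primarily by the embedded timestamp, which `uuid.UUID.time` exposes; breaking ties by the raw bytes is only an approximation of the server's tie-break.

```python
import uuid


def time_ordered(columns):
    """Return (time_uuid, value) pairs sorted the way a TimeUUIDType
    comparator would return them: by embedded timestamp first."""
    return sorted(columns.items(), key=lambda item: (item[0].time, item[0].bytes))


# A hypothetical wide row: each record is a column named by a version-1 UUID.
row = {}
for value in ["first", "second", "third"]:
    row[uuid.uuid1()] = value

for column_name, value in time_ordered(row):
    print(column_name.time, value)
```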

Source: (StackOverflow)

pycassa remove specific key from super column

I know how to remove an entire super column, but not an individual key within. My google kung fu has failed me.

So, to remove a super column, I use:

cf_accounts.remove('key', ['super_column'])

Am I barking up the wrong tree?

Help appreciated.
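pycassa's `ColumnFamily.remove` accepts both a `columns` list and a `super_column` argument. On a super column family, naming the super column and listing the subcolumns should delete only those subcolumns, whereas passing just `columns` (as in the question) is read as a list of super column names. The thin wrapper below, with a hypothetical name, makes that intent explicit.

```python
def remove_subcolumns(cf, key, super_column, subcolumns):
    """Delete selected subcolumns from one super column of row `key`.

    `cf` is assumed to be a pycassa ColumnFamily for a super column family;
    the other subcolumns of `super_column` are left untouched.
    """
    return cf.remove(key, columns=list(subcolumns), super_column=super_column)


# e.g. remove_subcolumns(cf_accounts, 'key', 'super_column', ['sub_key'])
# where 'sub_key' is a hypothetical subcolumn name.
```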

Source: (StackOverflow)