pycassa
Python Thrift driver for Apache Cassandra
pycassa 1.11.0 Documentation
The LIKE condition allows us to use wildcards in the WHERE clause of an SQL statement, which lets us perform pattern matching. The LIKE condition can be used in any valid SQL statement - select, insert, update, or delete. For example:
SELECT * FROM users
WHERE user_name like 'babu%';
Is a query like the operation above available for Cassandra in the CLI?
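For reference, Cassandra's Thrift API has no wildcard operator, so there is no direct LIKE equivalent. A minimal sketch of one common approximation, assuming the values to match are stored as column names in an index row (the keyspace, column family, and row key below are placeholders):
import pycassa

pool = pycassa.ConnectionPool('Keyspace1')            # placeholder keyspace
users = pycassa.ColumnFamily(pool, 'users_by_name')   # placeholder CF with user names as column names

# Slice the column names from 'babu' up to 'babv'; because column names are
# kept sorted, this approximates the prefix match user_name LIKE 'babu%'.
matches = users.get('all_users', column_start='babu', column_finish='babv', column_count=100)
print(matches.keys())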
Source: (StackOverflow)
I'm playing with pycassaShell (as part of the Cassandra and Twissandra tutorials).
When I try to define two functions inside the shell and call one from the other, I get an error that the name is not recognized.
This is probably something very simple, but I have not found how to do this.
The pycassaShell session looks like this:
In [3]: def aaa(): print 5
In [4]: aaa()
5
In [5]: def bbb(): aaa()
In [6]: bbb()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
...
NameError: global name 'aaa' is not defined
Source: (StackOverflow)
I'm trying to import the pycassa library in a Python project in Aptana 3, but on the line "import pycassa" it shows the error "Unresolved import: pycassa". I installed pycassa with easy_install, and if I run "import pycassa" in a Python shell it runs with no errors. If I run the Aptana project, it also runs with no errors, but the error marker remains. Why?
Sorry for my English.
Source: (StackOverflow)
Heh,
I'm using (with pycassa)
cf.insert(uuid.uuid1().bytes_le, {'column1': 'val1'})
to create a TimeUUID for Cassandra, but I'm getting the error
InvalidRequestException(why='UUIDs must be exactly 16 bytes')
It doesn't work with
uuid.uuid1()
uuid.uuid1().bytes
str(uuid.uuid1())
either.
What's the best way to create a valid TimeUUID to use with the CompareWith="TimeUUIDType" flag?
Thanks,
Henrik
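A minimal sketch of what usually works, assuming the column family was created with a TimeUUIDType comparator, i.e. the TimeUUIDs are the column names (the keyspace, column family, and row key below are placeholders): pycassa packs a uuid.UUID or a datetime passed as the column name into the 16 bytes the comparator expects, so uuid.uuid1() can be handed over as-is.
import uuid
import pycassa

pool = pycassa.ConnectionPool('Keyspace1')   # placeholder keyspace
cf = pycassa.ColumnFamily(pool, 'events')    # placeholder CF with comparator TimeUUIDType

# The TimeUUID goes in the column name; pycassa serializes the uuid.UUID object.
cf.insert('some_row_key', {uuid.uuid1(): 'val1'})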
Source: (StackOverflow)
Does anyone have experience working with pycassa? I have a question: how do I get all the keys that are stored in the database?
In the small snippet below we need to supply the keys in order to get the associated columns (here the keys are 'foo' and 'bar'). That is fine, but my requirement is to get all the keys (only the keys) at once, as a Python list or similar data structure.
cf.multiget(['foo', 'bar'])
{'foo': {'column1': 'val2'}, 'bar': {'column1': 'val3', 'column2': 'val4'}}
Thanks.
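A minimal sketch of one way to do this, assuming pycassa's ColumnFamily.get_range, which iterates over every row in the column family (the keyspace and column family names are placeholders); asking for zero columns and turning off filter_empty yields just the keys:
import pycassa

pool = pycassa.ConnectionPool('Keyspace1')   # placeholder keyspace
cf = pycassa.ColumnFamily(pool, 'cf_name')   # placeholder column family

# get_range pages through all rows; with column_count=0 each result carries no
# columns, and filter_empty=False keeps those column-less rows in the output.
all_keys = [key for key, columns in cf.get_range(column_count=0, filter_empty=False)]
print(all_keys)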
Source: (StackOverflow)
I'm trying to store some time series data on the following column family:
create column family t_data with comparator=TimeUUIDType and default_validation_class=UTF8Type and key_validation_class=UTF8Type;
I'm successfully inserting data this way:
data={datetime.datetime(2013, 3, 4, 17, 8, 57, 919671):'VALUE'}
key='row_id'
col_fam.insert(key,data)
As you can see, when using a datetime object as the column name, pycassa converts it to a TimeUUID object correctly:
[default@keyspace] get t_data[row_id];
=> (column=f36ad7be-84ed-11e2-af42-ef3ff4aa7c40, value=VALUE, timestamp=1362423749228331)
Sometimes the application needs to update some data. The problem is that when I try to update that column, passing the same datetime object, pycassa creates a different UUID object (the time part is the same), so instead of updating the column, it creates another one:
[default@keyspace] get t_data[row_id];
=> (column=f36ad7be-84ed-11e2-af42-ef3ff4aa7c40, value=VALUE, timestamp=1362423749228331)
=> (column=f36ad7be-84ed-11e2-b2fa-a6d3e28fea13, value=VALUE, timestamp=1362424025433209)
The question is: how can I update TimeUUID-based columns with pycassa, passing the datetime object? Or, if this is not the correct way to do it, what is the recommended way?
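A minimal sketch of one way around this, assuming pycassa.util.convert_time_to_uuid (the keyspace name is a placeholder): a datetime only pins down the time bits of a v1 UUID, so each implicit conversion can yield a different UUID; converting once, keeping the resulting UUID (or reading it back from the row), and reusing it makes the second insert overwrite the same column.
import datetime
import pycassa
from pycassa.util import convert_time_to_uuid

pool = pycassa.ConnectionPool('keyspace')        # placeholder keyspace
col_fam = pycassa.ColumnFamily(pool, 't_data')

ts = datetime.datetime(2013, 3, 4, 17, 8, 57, 919671)
col_id = convert_time_to_uuid(ts)                # convert once and keep this UUID

col_fam.insert('row_id', {col_id: 'VALUE'})      # initial write
col_fam.insert('row_id', {col_id: 'NEW VALUE'})  # same column name, so this updates it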
Source: (StackOverflow)
Is it possible to use batch_mutate for counters in php?
From what I've seen, it should be possible to increment counters in general, but I can't seem to find any working examples in any language.
Thanks
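For what it's worth, in pycassa (this page's library, not PHP) a counter column is incremented with ColumnFamily.add. A minimal sketch, assuming a column family created with default_validation_class=CounterColumnType (the keyspace and column family names are placeholders):
import pycassa

pool = pycassa.ConnectionPool('Keyspace1')          # placeholder keyspace
counters = pycassa.ColumnFamily(pool, 'counters')   # placeholder counter CF

counters.add('page_1', 'views')            # increment the 'views' counter by 1
counters.add('page_1', 'views', value=5)   # increment by 5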
Source: (StackOverflow)
I am using cassandra 2.0.3 and I would like to use pyspark (Apache Spark Python API) to create an RDD object from cassandra data.
PLEASE NOTE: I do not want to import CQL and then run a CQL query from the pyspark API; rather, I would like to create an RDD on which I can do some transformations.
I know this can be done in Scala but I am not able to find out how this could be done from pyspark.
I would really appreciate it if anyone could guide me on this.
Source: (StackOverflow)
I'm trying to do a partial search through a column family in Cassandra similar to an SQL query like: SELECT * FROM columnfamily WHERE col = 'val*' where val* means any value matching at least the first three characters 'val'.
I've read DataStax's documentation on the SELECT statement, but can't seem to find any support for a partial WHERE criterion. Any ideas?
Source: (StackOverflow)
Basically I'm asking the same thing as in this question but for the Python Cassandra library, PyCassa.
Let's say you have a composite type storing data like this:
[20120228:finalscore] = '31-17'
[20120228:halftimescore]= '17-17'
[20120221:finalscore] = '3-14'
[20120221:halftimescore]= '3-0'
[20120216:finalscore] = '54-0'
[20120216:halftimescore]= '42-0'
So, I know I can easily slice based off of the first part of the composite type by doing:
>>> cf.get('1234', column_start=('20120216',), column_finish=('20120221',))
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
But if I only want the finalscore, I would assume I could do:
>>> cf.get('1234', column_start=('20120216', 'finalscore'),
            column_finish=('20120221', 'finalscore'))
To get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0')])
But instead, I get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
Same as the 1st call.
Am I doing something wrong? Should this work? Or is there some syntax using cf.get(..., columns=[('20120216', 'finalscore')])? I tried that too and got an exception.
According to http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1, I should be able to do something like this...
Thanks
Source: (StackOverflow)
I've got a cassandra cluster with a small number of rows (< 100). Each row has about 2 million columns. I need to get a full row (all 2 million columns), but things start failing all over the place before I can finish my read. I'd like to do some kind of buffered read.
Ideally I'd like to do something like this using pycassa (no, this isn't the proper way to call get; it's just so you can get the idea):
results = {}
start = 0
while True:
    # Fetch blocks of size 500
    buffer = column_family.get(key, column_offset=start, column_count=500)
    if len(buffer) == 0:
        break
    # Merge these results into the main one
    results.update(buffer)
    # Update the offset
    start += len(buffer)
Pycassa (and by extension Cassandra) doesn't let you do this. Instead you need to specify a column name for column_start and column_finish. This is a problem since I don't actually know what the start or end column names will be. The special value "" can indicate the start or end of the row, but that doesn't work for any of the values in the middle.
So how can I accomplish a buffered read of all the columns in a single row? Thanks.
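A minimal sketch of one way to do this, assuming pycassa's ColumnFamily.xget, which pages through a wide row automatically and yields (column, value) pairs lazily (the keyspace, column family, and key below are placeholders); alternatively, the same effect can be had by hand by passing the last column name seen as the next column_start.
import pycassa

pool = pycassa.ConnectionPool('Keyspace1')                # placeholder keyspace
column_family = pycassa.ColumnFamily(pool, 'wide_rows')   # placeholder column family

results = {}
# xget fetches the row in chunks of buffer_size columns behind the scenes, so
# the whole multi-million-column row never has to fit in one Thrift response.
for column, value in column_family.xget('some_key', buffer_size=500):
    results[column] = value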
Source: (StackOverflow)
I've got a cassandra cluster with a fairly small number of rows (2 million or so, which I would hope is "small" for cassandra). Each row is keyed on a unique UUID, and each row has about 200 columns (give or take a few). All in all these are pretty small rows, no binary data or large amounts of text. Just short strings.
I've just finished the initial import into the cassandra cluster from our old database. I've tuned the hell out of cassandra on each machine. There were hundreds of millions of writes, but no reads. Now that it's time to USE this thing, I'm finding that read speeds are absolutely dismal. I'm doing a multiget using pycassa on anywhere from 500 to 10000 rows at a time. Even at 500 rows, the performance is awful, sometimes taking 30+ seconds.
What would cause this type of behavior? What sort of things would you recommend after a large import like this? Thanks.
Source: (StackOverflow)
I'm having problems using the time_uuid type as a key in my column family. I want to store my records and have them ordered by when they were inserted, and I figured that time_uuid is a good way to go. This is how I've set up my column family:
sys.create_column_family("keyspace", "records", comparator_type=TIME_UUID_TYPE)
When I try to insert, I do this:
q=pycassa.ColumnFamily(pycassa.connect("keyspace"), "records")
myKey=pycassa.util.convert_time_to_uuid(datetime.datetime.utcnow())
q.insert(myKey,{'somedata':'comevalue'})
However, when I insert data, I always get an error:
Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number.
If I change the comparator_type to UTF8_TYPE, it works, but the order of the items when returned is not as it should be. What am I doing wrong?
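A minimal sketch of the usual insertion-time-ordering pattern, assuming the goal is to have Cassandra keep the records sorted (the keyspace, column family, and row key below are placeholders): a comparator such as TIME_UUID_TYPE sorts column names, not row keys, so the TimeUUIDs go in the column names of a wide row.
import uuid
import pycassa
from pycassa.system_manager import SystemManager, TIME_UUID_TYPE

sys_mgr = SystemManager()
sys_mgr.create_column_family('keyspace', 'records_by_time',
                             comparator_type=TIME_UUID_TYPE)

pool = pycassa.ConnectionPool('keyspace')
cf = pycassa.ColumnFamily(pool, 'records_by_time')

# One wide row holds the timeline; each record is a column whose name is a
# TimeUUID, so the comparator keeps the columns in insertion-time order.
cf.insert('timeline', {uuid.uuid1(): 'somedata'})

print(cf.get('timeline'))  # columns come back sorted by time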
Source: (StackOverflow)
I know how to remove an entire super column, but not an individual key within it. My Google kung fu has failed me.
so, to remove a super column:
cf_accounts.remove('key', ['super_column'])
Am I barking up the wrong tree?
Help appreciated.
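A minimal sketch of what may work here, assuming pycassa's ColumnFamily.remove accepts a super_column parameter (the keyspace and column family names below are placeholders): the sub-columns to delete go in columns=, and the owning super column goes in super_column=.
import pycassa

pool = pycassa.ConnectionPool('Keyspace1')             # placeholder keyspace
cf_accounts = pycassa.ColumnFamily(pool, 'Accounts')   # placeholder super column family

# Remove only 'sub_column' inside 'super_column' for this row, leaving the
# rest of the super column intact.
cf_accounts.remove('key', columns=['sub_column'], super_column='super_column')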
Source: (StackOverflow)