

A Ruby client for the Cassandra distributed database

Cassandra port usage - how are the ports used?

While experimenting with Cassandra, I've observed that it listens on the following ports:

  • TCP *:8080
  • TCP *:8888
  • TCP *:57311
  • TCP *:57312
  • TCP
  • TCP
  • UDP

How does Cassandra use each of the ports listed?

Source: (StackOverflow)

How do I delete all data in a Cassandra column family?

I'm looking for a way to delete all of the rows from a given column family in Cassandra.

This is the equivalent of TRUNCATE TABLE in SQL.

Source: (StackOverflow)


Redis, CouchDB or Cassandra? [closed]

What are the strengths and weaknesses of the various NoSQL databases available?

In particular, it seems like Redis is weak when it comes to distributing write load over multiple servers. Is that the case? Is it a big problem? How big does a service have to grow before that could be a significant problem?

Source: (StackOverflow)

How to use Cassandra in Django framework

Is there a robust way to implement a Cassandra back end for a web application developed with the Django web framework?


Source: (StackOverflow)

MongoDB vs. Redis vs. Cassandra for a fast-write, temporary row storage solution

I'm building a system that tracks and verifies ad impressions and clicks. This means that there are a lot of insert commands (about 90/second average, peaking at 250) and some read operations, but the focus is on performance and making it blazing-fast.
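To put those rates in perspective, here is a minimal Java sketch that converts a per-second insert rate into a daily volume. The class and method names are hypothetical, and treating the 250/second peak as sustained for a full day is only an assumed upper bound, not a measured figure.

```java
public class InsertVolume {
    // Number of inserts in one day at a constant per-second rate.
    public static long insertsPerDay(long insertsPerSecond) {
        return insertsPerSecond * 60L * 60L * 24L;
    }

    public static void main(String[] args) {
        // Average rate quoted in the question: 90 inserts/second.
        System.out.println("average day: " + insertsPerDay(90));
        // Assumed upper bound: the 250/second peak held for a whole day.
        System.out.println("peak-rate day: " + insertsPerDay(250));
    }
}
```

At the average rate this comes to roughly 7.8 million inserts per day.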

The system is currently on MongoDB, but I've been introduced to Cassandra and Redis since then. Would it be a good idea to go to one of these two solutions, rather than stay on MongoDB? Why or why not?

Thank you

Source: (StackOverflow)

Large scale data processing Hbase vs Cassandra [closed]

After researching large-scale data storage solutions, I have nearly settled on Cassandra. But it's generally said that HBase is a better solution for large-scale data processing and analysis.

Both are key/value stores, and both run, or can run (Cassandra only recently), a Hadoop layer, so what makes Hadoop a better candidate when processing and analysis are required on large data sets?

I also found good details about both at http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/, but I'm still looking for concrete advantages of HBase.

I am more convinced by Cassandra because of its simplicity in adding nodes, its seamless replication, and its lack of a single point of failure. It also has a secondary index feature, which is a good plus.

Source: (StackOverflow)

How to choose between Cassandra, Membase, Hadoop, MongoDB, RDBMS etc.? [closed]

Is there a paper or blog post on when to use Cassandra, Membase, Hadoop, or plain old relational databases? Is there a paper discussing the strengths and weaknesses of each, and in which scenarios each of these technologies should be chosen?

I am thinking of writing a new web service that will get about a million hits per day, with data spanning a few terabytes.

Source: (StackOverflow)

What is an SSTable?

In BigTable/GFS and Cassandra terminology, what is the definition of an SSTable?

Source: (StackOverflow)

Difference between Document-based and Key/Value-based databases?

I know there are three different, popular types of NoSQL databases:

  • Key/Value: Redis, Tokyo Cabinet, Memcached
  • ColumnFamily: Cassandra, HBase
  • Document: MongoDB, CouchDB

I have read long blog posts about this without understanding much.

I know relational databases and have gotten the hang of document-based databases like MongoDB/CouchDB.

Could someone tell me what the major differences are between these and the first two types on the list?

Source: (StackOverflow)

When NOT to use Cassandra?

There has been a lot of talk related to Cassandra lately.

Twitter, Digg, Facebook, etc. all use it.

When does it make sense to:

  • use Cassandra,
  • not use Cassandra, and
  • use an RDBMS instead of Cassandra.

Source: (StackOverflow)

Difference between partition key, composite key and clustering key in Cassandra?

I have been reading articles around the net to understand the differences between the following key types, but they are hard for me to grasp. Examples would definitely help.

  • primary key
  • partition key
  • composite key
  • clustering key

Source: (StackOverflow)

Cassandra server throws java.lang.AssertionError: DecoratedKey(...) != DecoratedKey

I'm currently experimenting with Cassandra.

On the client-side (with Hector) I look up a few keys like this:

ColumnFamilyResult<String, String> result = template.queryColumns(Arrays.asList("key1","key2","key3"));

Most of the time it seems to work. But other times I get a timeout exception on the client:

Caused by: me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:35)
    at me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate$1.execute(ThriftColumnFamilyTemplate.java:100)
    at me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate$1.execute(ThriftColumnFamilyTemplate.java:88)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
    at me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate.sliceInternal(ThriftColumnFamilyTemplate.java:88)
    at me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate.doExecuteSlice(ThriftColumnFamilyTemplate.java:46)
    at me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.queryColumns(ColumnFamilyTemplate.java:113)
    at info.gamlor.experiments.Cassandra.readObjectByKey(ComplexCassandra.java:255)

Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:7772)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:570)
    at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:542)
    at me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate$1.execute(ThriftColumnFamilyTemplate.java:95)

And on the server this exception shows up:

ERROR 11:33:55,312 Exception in thread Thread[ReadStage:91,5,main]
java.lang.AssertionError: DecoratedKey(4948402862350542345439897754126541659, 6932) != DecoratedKey(132475956107784875457507977471906551877, 726f6f74) in C:\tem
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:58)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:63)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1331)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1193)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1128)
        at org.apache.cassandra.db.Table.getRow(Table.java:378)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:816)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1250)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

Sometimes the key values in the DecoratedKey(...) part take up pages.

Does anyone have a hint as to what I'm doing wrong, or how to investigate this issue?


Source: (StackOverflow)

What's The Best Practice In Designing A Cassandra Data Model? [closed]

And what are the pitfalls to avoid? Are there any deal breakers for you? E.g., I've heard that exporting/importing Cassandra data is very difficult, which makes me wonder whether that will hinder syncing production data to a development environment.

BTW, it's very hard to find good tutorials on Cassandra; the only one I have, http://arin.me/code/wtf-is-a-supercolumn-cassandra-data-model, is still pretty basic.


Source: (StackOverflow)

What does "Document-oriented" vs. Key-Value mean when talking about MongoDB vs Cassandra?

What does going with a document based NoSQL option buy you over a KV store, and vice-versa?

Source: (StackOverflow)

What should I choose: MongoDB/Cassandra/Redis/CouchDB? [closed]

We're developing a really big project, and I was wondering if anyone can give me some advice about which DB backend we should pick.

Our system is composed of 1100 electronic devices that send a signal to a central server, which then stores the signal info (each signal is about 35 bytes long). However, these devices will each send about 3 signals per minute, so if we do the numbers, that'll be 4,752,000 new records/day in the database, and a total of 142,560,000 new records/month.
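As a quick sanity check on those figures, here is a minimal Java sketch of the arithmetic. The class and method names are hypothetical. The 30-day month and the use of exactly 35 bytes per signal are assumptions for illustration, and any storage overhead is ignored.

```java
public class SignalVolume {
    // Records per day for a fleet of devices sending at a fixed per-minute rate.
    public static long recordsPerDay(long devices, long signalsPerMinute) {
        return devices * signalsPerMinute * 60L * 24L;
    }

    // Raw payload bytes per day, ignoring any per-record storage overhead.
    public static long payloadBytesPerDay(long devices, long signalsPerMinute, long bytesPerSignal) {
        return recordsPerDay(devices, signalsPerMinute) * bytesPerSignal;
    }

    public static void main(String[] args) {
        long perDay = recordsPerDay(1100, 3);
        long perMonth = perDay * 30; // assumes a 30-day month
        System.out.println("records/day:   " + perDay);
        System.out.println("records/month: " + perMonth);
        System.out.println("payload bytes/day: " + payloadBytesPerDay(1100, 3, 35));
        // Average write rate implied by the same numbers.
        System.out.println("writes/second: " + (1100 * 3) / 60);
    }
}
```

The sketch reproduces the daily and monthly record counts quoted above. Under the same assumptions, that is an average of 55 writes per second and about 166 MB of raw signal payload per day.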

We need a DB backend that is lightning fast and reliable. Of course, we also need to do some complex data mining on that DB. We're doing some research on MongoDB/Cassandra/Redis/CouchDB, but their documentation websites are still in their early stages.

Any help? Ideas?

Thanks a lot!

Source: (StackOverflow)