leveldb
I'm wondering if the 'snapshot' facility of the LevelDB library can create a snapshot reference that could be saved even after a close of the open database object (and thus reused on a subsequent open).
I suspect not, which leads to a follow-up: is there a good/recommended way to make a consistent backup of the database as of a snapshot instant, ideally even while other activity continues? (That is, short of iterating the entire snapshot key range via the API?)
(Essentially I'm looking for something analogous to saving aside the append-only JDB log files of BerkeleyDB-JE up through a certain checkpointed place.)
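For context, here is roughly how I'm using snapshots today (a minimal sketch; the path and key are placeholders). My understanding is that the leveldb::Snapshot* is just an in-memory handle owned by the open DB object, with nothing persisted that a later open could reattach to:
#include <cassert>
#include <string>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());

  const leveldb::Snapshot* snap = db->GetSnapshot();
  leveldb::ReadOptions read_options;
  read_options.snapshot = snap;               // reads see the DB as of GetSnapshot()

  std::string value;
  db->Get(read_options, "some-key", &value);  // consistent point-in-time read

  db->ReleaseSnapshot(snap);  // must happen before the DB object goes away
  delete db;                  // ...after which the snapshot handle cannot be reused
  return 0;
}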
Source: (StackOverflow)
Is there a way to access a LevelDB database from several programs? Is there some kind of option to open the database as read-only?
For now, when opening the same database from two programs I get:
/path/to/dir/with/levelDBdatabase/LOCK: Resource temporarily unavailable
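For reference, the second process is doing nothing more exotic than this (a minimal sketch, same path as above); the error comes back as a non-ok Status from Open:
#include <iostream>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  leveldb::Status s =
      leveldb::DB::Open(options, "/path/to/dir/with/levelDBdatabase", &db);
  if (!s.ok()) {
    // s.ToString() includes the LOCK error shown above
    std::cerr << s.ToString() << std::endl;
    return 1;
  }
  delete db;
  return 0;
}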
Cheers!
Source: (StackOverflow)
LevelDB seems to be an interesting new persistent key-value store from Google. How does LevelDB differ from Redis, Riak, or Tokyo Tyrant? In what specific use cases is one better than the other?
Source: (StackOverflow)
I'm a little confused here: would comparison of doubles still work correctly when they're stored as opaque (binary) fields? The problem I'm facing is that a double includes a leading sign bit (i.e. positive or negative), and when doubles are stored as binary data I'm not sure they will be compared correctly.
I want to ensure that the comparison works correctly, because I'm using a double as part of a key tuple in LevelDB and I want to preserve data locality for positive and negative numbers. LevelDB only uses opaque fields as keys, but it does allow the user to specify his/her own comparator. However, I just want to make sure that I don't specify a comparator unless I absolutely need to:
// Three-way comparison function:
//   if a < b: negative result
//   if a > b: positive result
//   else: zero result
inline int Compare(const unsigned char* a, const unsigned char* b) const
{
    // Copy the bytes into doubles instead of casting the pointers directly,
    // to avoid alignment/strict-aliasing problems. (Uses std::memcpy from <cstring>.)
    double da, db;
    std::memcpy(&da, a, sizeof(double));
    std::memcpy(&db, b, sizeof(double));
    if (da < db) return -1;
    if (da > db) return +1;
    return 0;
}
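If I did end up writing one, my understanding is that it would be wired up roughly like this (a sketch only; it assumes the whole key is a raw 8-byte double, whereas my real keys are tuples, and the class name is made up):
#include <cstring>
#include <string>
#include "leveldb/comparator.h"
#include "leveldb/db.h"
#include "leveldb/slice.h"

// Hypothetical comparator that interprets each key as one raw 8-byte double.
class DoubleKeyComparator : public leveldb::Comparator {
 public:
  int Compare(const leveldb::Slice& a, const leveldb::Slice& b) const override {
    double da, db;
    std::memcpy(&da, a.data(), sizeof(double));  // memcpy avoids alignment issues
    std::memcpy(&db, b.data(), sizeof(double));
    if (da < db) return -1;
    if (da > db) return +1;
    return 0;
  }
  const char* Name() const override { return "DoubleKeyComparator"; }
  // No-op implementations are valid; these two are only optimizations.
  void FindShortestSeparator(std::string*, const leveldb::Slice&) const override {}
  void FindShortSuccessor(std::string*) const override {}
};

// Usage: the comparator must outlive the DB and be the same on every open.
//   DoubleKeyComparator cmp;
//   leveldb::Options options;
//   options.comparator = &cmp;
//   leveldb::DB* db;
//   leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);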
Source: (StackOverflow)
Currently we are evaluating several key+value data stores to replace an older ISAM library that has been in use by our main application for some 20 years...
The problem is that our current ISAM doesn't support crash recovery.
So LevelDB seemed OK to us (we are also checking BerkeleyDB, etc.).
But we ran into the question of hot backups, and, given the fact that LevelDB is a library and not a server, it is odd to ask for a 'hot backup', as it would intuitively imply an external backup process.
Perhaps someone would like to propose options (or known solutions)?
For example:
- Hot backup through an inner thread of the main application? (see the sketch after this list)
- Hot backup by merely copying the LevelDB data directory?
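For the first option, I picture something roughly like this (a sketch only, run from a background thread of the application; paths, threading, and error handling are placeholders):
#include <cassert>
#include <string>
#include "leveldb/db.h"

// Copy every key/value as of one instant from the live DB into a fresh
// backup DB while writers keep going against 'live'.
void HotBackup(leveldb::DB* live, const std::string& backup_path) {
  leveldb::DB* backup = nullptr;
  leveldb::Options options;
  options.create_if_missing = true;
  options.error_if_exists = true;          // refuse to reuse an old backup dir
  leveldb::Status s = leveldb::DB::Open(options, backup_path, &backup);
  assert(s.ok());

  const leveldb::Snapshot* snap = live->GetSnapshot();
  leveldb::ReadOptions read_options;
  read_options.snapshot = snap;            // iteration sees a frozen view

  leveldb::Iterator* it = live->NewIterator(read_options);
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    backup->Put(leveldb::WriteOptions(), it->key(), it->value());
  }
  assert(it->status().ok());

  delete it;
  live->ReleaseSnapshot(snap);
  delete backup;                           // backup dir now holds the copy
}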
Thanks in advance
Source: (StackOverflow)
There is this in LevelUP Documentation (https://github.com/rvagg/node-levelup#multi-process-access):
LevelDB is thread-safe but is not suitable for accessing with multiple processes. You should only ever have a LevelDB database open from a single Node.js process. Node.js clusters are made up of multiple processes so a LevelUP instance cannot be shared between them either.
So it seems I cannot use Node Cluster (http://nodejs.org/api/cluster.html).
Is there another option to make a multi-process (or multi-thread) Node.js Application accessing a LevelDB Database?
Source: (StackOverflow)
I'm running a stress test with LevelDB.
In util/env_posix.cc, NewRandomAccessFile() does:
void* base = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
After inserting about 3 million records (each ~100 KB), the call fails and errno reports "Cannot allocate memory".
Why?
More details:
top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19794 root 20 0 290g 4.9g 4.7g S 98.6 7.8 2348:00 ldb
free -m:
total used free shared buffers cached
Mem: 64350 60623 3726 0 179 59353
-/+ buffers/cache: 1090 63259
Swap: 996 0 996
ulimit -a:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 10240
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 530432
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
sysctl:
vm.max_map_count = 300000
kernel.shmmni = 8092
kernel.shmall = 4194304
kernel.shmmax = 2147483648
Source: (StackOverflow)
I'm working on a design where secondary indexes to data will be built with all the information in the key, needing nothing in the value side. Is this likely to cause problems?
I'm not asking whether it's technically possible to have a blank value. Are there any structural consequences? For example, inserting already-sorted keys can unbalance some tree structures. (I'm not saying LevelDB uses trees, just trying to think of an analogy. ;-) )
i.e., say a "primary record" looks like (nulls as separators):
- key = uniqueTableID \0 uniqueRowID
- value = some collection of fields
a secondary index to a typical single-valued field would look like:
- key = uniqueFieldID \0 keyValue \0 uniqueRowID
That allows iteration by the partial key [uniqueFieldID \0 keyValue], and it also makes it easy to find these keys and delete them if the main record is deleted or the key value changes, working back from the main record's uniqueRowID. So there might be several keys ending in the same uniqueRowID, but there can only ever be one key for the particular combination starting with a given uniqueFieldID and ending with a given uniqueRowID.
The only thing is that I don't have any need to put a value in the value side of the pair.
I'm pretty happy with this conceptual design; I'm just checking to see if anyone can spot holes in it, for example, whether it would distort LevelDB internals and cause performance issues.
I expect there would be tens of thousands of such keys in one particular app.
As an example of a value we might want to store, a secondary word index to a text field might look like:
- key = uniqueFieldID \0 keyValue \0 GUID
- value = count of word occurrences or maybe a list of offsets if scanning large blobs was expensive
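In LevelDB terms, I picture the index writes and the partial-key iteration looking roughly like this (a sketch with made-up identifiers, assuming an already-open db):
#include <string>
#include "leveldb/db.h"

void IndexExample(leveldb::DB* db) {
  // Index entry with an empty value: LevelDB stores a zero-length value.
  std::string key = std::string("fieldID") + '\0' + "keyValue" + '\0' + "rowID";
  db->Put(leveldb::WriteOptions(), key, leveldb::Slice());

  // Iterate by the partial key [uniqueFieldID \0 keyValue]:
  std::string prefix = std::string("fieldID") + '\0' + "keyValue" + '\0';
  leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
  int matches = 0;
  for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix); it->Next()) {
    ++matches;  // it->key() ends with the uniqueRowID; it->value() is empty
  }
  delete it;
}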
Source: (StackOverflow)
WebSQL and IndexedDB are both DB APIs for accessing (CRUD) the underlying embedded database in the web browser. Which, if I am correct, is like SQL for accessing (CRUD) any client-server database like Oracle etc. (In many cases support for both WebSQL and IndexedDB is available in the same browser.)
- So, does it mean that both WebSQL and IndexedDB access (CRUD) the same underlying embedded database, and if that is the case, will they have the same performance on all web browsers?
- But I think that is not the case, so does it mean that a web browser will have more than one underlying embedded database? And why should there be two underlying embedded databases in the same browser?
And since WebSQL and IndexedDB are APIs, it is not entirely correct to speak of the performance of WebSQL and IndexedDB (because they are more like query/access languages); it depends significantly on the performance of the underlying embedded database. And, as per Google, LevelDB is faster than SQLite.
- Is it correct to say that the significant factor is not the performance difference between WebSQL and IndexedDB, but the performance of the underlying embedded database?
- What are the underlying embedded databases for IE, Chrome, and the Android browser? I couldn't find this information on the web; has anybody found or compiled it?
Source: (StackOverflow)
In our application we use std::map to store (key, value) data and use serialization to store that data on disk. With this approach we are finding that disk I/O is the performance bottleneck and that finding values by key is not very fast.
I have come across LevelDB and thinking of using it. But I have some questions.
- LevelDB's documentation says it is made for (string, string) key-value pairs. Does that mean I cannot use it for custom key/value types? (See the sketch below.)
- It seems the difference between std::map and LevelDB is that LevelDB is persistent while std::map works in memory. So does that mean the disk I/O bottleneck will be more problematic for LevelDB?
More specifically, can anybody please explain whether LevelDB could be a better choice than std::map?
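Regarding the first question, my understanding is that keys and values are just byte arrays (leveldb::Slice), so a custom type only needs to be serialized to bytes, roughly as in this sketch (the struct and function names are made up, and the raw-byte copy is not portable across architectures):
#include <cstdint>
#include <cstring>
#include <string>
#include "leveldb/db.h"

struct MyValue {            // hypothetical fixed-layout value type
  int32_t id;
  double score;
};

bool PutCustom(leveldb::DB* db, const std::string& key, const MyValue& v) {
  // Naive serialization: copy the raw bytes into a string buffer.
  std::string buf(reinterpret_cast<const char*>(&v), sizeof(v));
  return db->Put(leveldb::WriteOptions(), key, buf).ok();
}

bool GetCustom(leveldb::DB* db, const std::string& key, MyValue* out) {
  std::string buf;
  leveldb::Status s = db->Get(leveldb::ReadOptions(), key, &buf);
  if (!s.ok() || buf.size() != sizeof(*out)) return false;
  std::memcpy(out, buf.data(), sizeof(*out));
  return true;
}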
PS: I tried using hash_map, but it appears to be slower than std::map.
Source: (StackOverflow)
On the official site of LevelDB (http://code.google.com/p/leveldb/), there is a performance report, which I have pasted below.
Below is from the official LevelDB benchmark:
Here is a performance report (with explanations) from the run of the included db_bench program. The results are somewhat noisy, but should be enough to get a ballpark performance estimate.
Setup
We use a database with a million entries. Each entry has a 16 byte key, and a 100 byte value. Values used by the benchmark compress to about half their original size.
LevelDB: version 1.1
CPU: 4 x Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
CPUCache: 4096 KB
Keys: 16 bytes each
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
Raw Size: 110.6 MB (estimated)
File Size: 62.9 MB (estimated)
Write performance
The "fill" benchmarks create a brand new database, in either sequential, or random order.
The "fillsync" benchmark flushes data from the operating system to the disk after every operation; the other write operations leave the data sitting in the operating system buffer cache for a while. The "overwrite" benchmark does random writes that update existing keys in the database.
fillseq : 1.765 micros/op; 62.7 MB/s
fillsync : 268.409 micros/op; 0.4 MB/s (10000 ops)
fillrandom : 2.460 micros/op; 45.0 MB/s
overwrite : 2.380 micros/op; 46.5 MB/s
Each "op" above corresponds to a write of a single key/value pair. I.e., a random write benchmark goes at approximately 400,000 writes per second.
Below is from my LevelDB benchmark:
I did some benchmark for leveldb but got write speed 100 times less than the report.
Here is my experiment settings:
- CPU: Intel Core2 Duo T6670 2.20GHz
- 3.0GB memory
- 32-bit Windows 7
- without compression
- options.write_buffer_size = 100MB
- options.block_cache = 640MB
What I did is very simple: I just put 2 million {key, value} pairs and did no reads at all. The key is a byte array of 20 random bytes and the value is a byte array of 100 random bytes. I kept putting newly generated random {key, value} pairs, 2 million times, with no other operations.
In my experiment, I can see that the write speed decreases from the very beginning. The instantaneous speed (measured over every 1,024 writes) swings between 50/s and 10,000/s. My overall average write speed for the 2 million pairs is around 3,000/s, with a peak of 10,000/s.
Since the report claims a write speed of 400,000/s, my benchmark is 40 to 130 times slower, and I am just wondering what's wrong with it.
I don't need to paste my testing code here as it is super easy: I just have a while loop for 2 million iterations, and inside the loop, for every iteration, I generate a 20-byte key and a 100-byte value and put them into the LevelDB database. I also measured the time spent on {key, value} generation; it costs 0 ms.
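For completeness, the loop is essentially the following (a simplified sketch; the database path is a placeholder and the real code also times every 1,024 writes):
#include <cstdlib>
#include <string>
#include "leveldb/cache.h"
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  options.compression = leveldb::kNoCompression;
  options.write_buffer_size = 100 * 1048576;                  // 100 MB
  options.block_cache = leveldb::NewLRUCache(640 * 1048576);  // 640 MB
  leveldb::Status s = leveldb::DB::Open(options, "bench_db", &db);
  if (!s.ok()) return 1;

  std::string key(20, '\0'), value(100, '\0');
  for (int i = 0; i < 2000000; ++i) {
    for (size_t j = 0; j < key.size(); ++j) key[j] = static_cast<char>(std::rand());
    for (size_t j = 0; j < value.size(); ++j) value[j] = static_cast<char>(std::rand());
    db->Put(leveldb::WriteOptions(), key, value);
  }

  delete db;
  delete options.block_cache;
  return 0;
}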
Can anyone help me with this? How can I achieve 400,000/s write speed with LevelDB? Which settings should I tune?
Thanks
Moreover
I just ran the official db_bench.cc on my machine. It is 28 times slower than the report.
Since I used their own benchmark program, I think the only difference between my benchmark and theirs is the machine.
Source: (StackOverflow)
Given the following requirements of a persistent key/value store:
- Only fetch, insert, and full iteration of all values (for exports) are required
- No deleting values or updating values
- Keys are always the same size
- Code embedded in the host application
And given this usage pattern:
- Fetches are random
- Inserts and fetches are interleaved with no predictability
- Keys are random, and inserted in random order
What is the best on-disk data structure/algorithm given the requirements?
Can a custom implementation exceed the performance of LSM-based (log-structured merge) implementations (e.g. LevelDB, RocksDB)?
Would a high-performance custom implementation for these requirements also be considerably simpler to implement?
Source: (StackOverflow)
I'm writing a script to collect the hashes of all Bitcoin blocks. The program bitcoind, if a certain setting is changed, stores metadata for all blocks in a LevelDB database. The key of each set of metadata is the block's hash, which is typically used as an identifier for it. Essentially, I'm trying to get a specific part of the metadata (the transaction IDs) out of each block. The script I'm writing is in Haskell, although I could always call a shell command if necessary. To put my problem in general terms, I'm not sure whether the simplest way to do this is to find all block hashes (keys) and then call bitcoind to get the metadata for each of them. If there's any way to simply get every value from a LevelDB database directly, that would work as well. What's the simplest and most efficient way to do this?
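For what it's worth, a direct full scan with the C++ API looks roughly like this (illustration only; the path is a placeholder, and bindings for other languages generally expose the same iterator pattern):
#include <cstddef>
#include <iostream>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  leveldb::Status s = leveldb::DB::Open(options, "/path/to/blocks/index", &db);
  if (!s.ok()) { std::cerr << s.ToString() << "\n"; return 1; }

  // Walk every key/value pair in key order; keys and values are raw bytes,
  // and decoding them is up to the caller.
  leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
  size_t count = 0;
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    ++count;  // it->key() and it->value() hold the raw bytes for this record
  }
  std::cout << count << " records\n";

  delete it;
  delete db;
  return 0;
}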
Source: (StackOverflow)