mapdb
MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap memory. It is a fast and easy-to-use embedded Java database engine.
I use MapDB as follows:

val mycache = DBMaker.newFileDB(new File("/data/tmp/cache.db"))
  .transactionDisable()
  .make().getHashSet("")
Then when I do

mycache.put(k1, v1)
assertEquals(v1, mycache.get(k1)) // all is fine
However, if I restart my server, I see that cache.db is still on disk, but the map is empty when it is read back, so

mycache.get(k1) // is null after restart

How can I have it re-read my map from the file after a restart?
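For reference, a minimal persistence round-trip with the MapDB 1.x API usually looks like the sketch below (written in Java; the class name, file path and collection name are placeholders). The relevant points are giving the collection a stable name, committing or cleanly closing the DB before the process exits, and reopening the same named collection after the restart.

import org.mapdb.DB;
import org.mapdb.DBMaker;
import java.io.File;
import java.util.Map;

public class CacheRoundTrip {
    public static void main(String[] args) throws Exception {
        File file = new File("/data/tmp/cache.db"); // placeholder path

        // First run: write a value and flush it to disk before exiting.
        DB db = DBMaker.newFileDB(file).make();
        Map<String, String> cache = db.getHashMap("mycache"); // stable collection name
        cache.put("k1", "v1");
        db.commit(); // persist the pending write (a clean close() also flushes)
        db.close();

        // Second run (e.g. after a restart): reopen the same file and the same name.
        DB db2 = DBMaker.newFileDB(file).make();
        Map<String, String> cache2 = db2.getHashMap("mycache");
        System.out.println(cache2.get("k1")); // expected to print "v1"
        db2.close();
    }
}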
Source: (StackOverflow)
So I created a database that worked like this:
static class Record implements Serializable
{
    final String action;
    final String categoryOfAction;
    final String personWhoPerformedAction;
    final Long timeOfOccurrence;

    public Record(String actn, String cat, String person, Long time)
    {
        action = actn;
        categoryOfAction = cat;
        personWhoPerformedAction = person;
        timeOfOccurrence = time;
    }
}
public static void main(String[] args)
{
    DB thedb = DBMaker.newFileDB(new File("D:\\thedb.db"))
            .compressionEnable()
            .closeOnJvmShutdown()
            .mmapFileEnableIfSupported()
            .transactionDisable()
            .asyncWriteEnable()
            .make();

    // primaryMap maps each record to a unique ID
    BTreeMap<Integer, Record> primaryMap = thedb.createTreeMap("pri")
            .keySerializer(BTreeKeySerializer.INTEGER)
            .makeOrGet();
    // this map holds the unique ID of every record in primaryMap with a common action
    NavigableSet<Object[]> map_commonAction = thedb.createTreeSet("com_a")
            .comparator(Fun.COMPARABLE_ARRAY_COMPARATOR)
            .makeOrGet();

    // this map holds the unique ID of every record in primaryMap with a common person
    NavigableSet<Object[]> map_commonPerson = thedb.createTreeSet("com_p")
            .comparator(Fun.COMPARABLE_ARRAY_COMPARATOR)
            .makeOrGet();

    // binding map_commonAction to primaryMap so it is updated with primary
    Bind.secondaryKey(primaryMap, map_commonAction, new Fun.Function2<String, Integer, Record>() {
        @Override
        public String run(Integer recordID, Record r) {
            return r.action;
        }
    });

    // binding map_commonPerson to primaryMap so it is updated with primary
    Bind.secondaryKey(primaryMap, map_commonPerson, new Fun.Function2<String, Integer, Record>() {
        @Override
        public String run(Integer recordID, Record r) {
            return r.personWhoPerformedAction;
        }
    });
    // method used to obtain all records with some action
    for (Object[] k : Fun.filter(map_commonAction, "someAction"))
    {
        Record obtainedRecord = primaryMap.get(k[1]);
    }

    // method used to obtain all records with some person
    for (Object[] k : Fun.filter(map_commonPerson, "somePerson"))
    {
        Record obtainedRecord = primaryMap.get(k[1]);
    }
}
After I created it, I inserted 19 billion items. The methods for obtaining all records with some action or person worked perfectly. I closed the database and then tried running it again, except this time the database was already built and had all the items inserted, so there was no need to insert the 19 billion items. Once I called one of the methods for obtaining all records with some action or person, I got this error:
Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.lang.Comparable
at org.mapdb.Fun$1.compare(Fun.java:31)
at org.mapdb.BTreeKeySerializer$BasicKeySerializer.compare(BTreeKeySerializer.java:206)
at org.mapdb.BTreeKeySerializer$BasicKeySerializer.compare(BTreeKeySerializer.java:156)
at org.mapdb.BTreeKeySerializer.compareIsSmaller(BTreeKeySerializer.java:48)
at org.mapdb.BTreeKeySerializer.findChildren(BTreeKeySerializer.java:89)
at org.mapdb.BTreeMap.nextDir(BTreeMap.java:843)
at org.mapdb.BTreeMap.findLargerNode(BTreeMap.java:1673)
at org.mapdb.BTreeMap$BTreeIterator.<init>(BTreeMap.java:1068)
at org.mapdb.BTreeMap$BTreeKeyIterator.<init>(BTreeMap.java:1323)
at org.mapdb.BTreeMap$SubMap.keyIterator(BTreeMap.java:2483)
at org.mapdb.BTreeMap$KeySet.iterator(BTreeMap.java:1900)
at org.mapdb.Fun$12.iterator(Fun.java:369)
at test.main(test.java:187)
So then I checked the size of each map:

System.out.println(map_commonAction.size()); // returned correct size: 19 billion
System.out.println(map_commonPerson.size()); // returned correct size: 19 billion
System.out.println(primaryMap.size()); // returned correct size: 19 billion

So then I checked whether primaryMap even worked by looking up a couple of int keys, and it returned records like it should:
Record r1 = primaryMap.get(1);
Record r2 = primaryMap.get(2);
System.out.println(r1.toString());
System.out.println(r2.toString());
It only fails when I try to iterate through what is returned by Fun.filter(map_common*, "something"); the act of calling it does not make it fail, just trying to iterate through the result. I tested it like so:

// this call fails and causes an exception to be thrown
for (Object[] k : Fun.filter(map_commonPerson, "person"))
{
    System.out.println(primaryMap.get(k[1]).toString());
}

// this call doesn't cause an exception to be thrown
Iterable<Object[]> x = Fun.filter(map_commonPerson, "person");
So now I'm stuck, and I have no idea what's wrong with my map. It works perfectly once I've created a new DB and inserted the 19 billion items, but once I close it and try re-opening it for more reading it fails.
Can anyone help? Thanks.
Source: (StackOverflow)
I'm getting an out of memory error when testing MapDB. Given that the whole idea of the project is to serialize data structures to disk and avoid memory problems, I figure I'm doing something wrong. Any ideas what I'm doing wrong? Or is there a bug?
@Test
public void testLarge() throws Exception {
    final HTreeMap<UUID, String> storage = DBMaker.newTempHashMap();
    String string = createDataSize(250);
    ArrayList<UUID> keys = new ArrayList<>();
    for (int i = 0; i < 320000; i++) {
        final UUID key = UUID.randomUUID();
        storage.put(key, string);
        keys.add(key);
    }
    for (UUID key : keys) {
        assertNotNull(storage.get(key));
    }
    for (UUID key : keys) {
        storage.remove(key);
    }
    assertEquals("nothing left", 0, storage.size());
}

/**
 * Creates a message of size @msgSize in KB.
 */
private static String createDataSize(int msgSize) {
    // Java chars are 2 bytes
    msgSize = msgSize / 2;
    msgSize = msgSize * 1024;
    StringBuilder sb = new StringBuilder(msgSize);
    for (int i = 0; i < msgSize; i++) {
        sb.append('a');
    }
    return sb.toString();
}
Stack trace. Line 29 in my function corresponds to the "assertNotNull(storage.get(key));" line.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.String.<init>(String.java:168)
at org.mapdb.SerializerBase.deserializeString(SerializerBase.java:724)
at org.mapdb.SerializerBase.deserialize(SerializerBase.java:932)
at org.mapdb.SerializerBase.deserialize(SerializerBase.java:731)
at org.mapdb.HTreeMap$1.deserialize(HTreeMap.java:134)
at org.mapdb.HTreeMap$1.deserialize(HTreeMap.java:123)
at org.mapdb.StorageDirect.recordGet2(StorageDirect.java:536)
at org.mapdb.StorageDirect.get(StorageDirect.java:201)
at org.mapdb.EngineWrapper.get(EngineWrapper.java:50)
at org.mapdb.AsyncWriteEngine.get(AsyncWriteEngine.java:163)
at org.mapdb.EngineWrapper.get(EngineWrapper.java:50)
at org.mapdb.CacheHashTable.get(CacheHashTable.java:85)
at org.mapdb.HTreeMap.get(HTreeMap.java:387)
at com.sample.StorageTest.testLarge(StorageTest.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
Based on unholysampler's suggestion, I made the following change which fixed the problem.
diff -r 35918e46551a src/test/java/com/sample/StorageTest.java
--- a/src/test/java/com/sample/StorageTest.java Thu Mar 21 13:40:16 2013 -0600
+++ b/src/test/java/com/sample/StorageTest.java Thu Mar 21 13:42:24 2013 -0600
@@ -16,7 +16,9 @@
@Test
public void testLarge() throws Exception {
- final HTreeMap<UUID, String> storage = DBMaker.newTempHashMap();
+ File tmpFile = File.createTempFile("largeTest", null);
+ DB db = DBMaker.newFileDB(tmpFile).deleteFilesAfterClose().journalDisable().make();
+ final HTreeMap<UUID, String> storage = db.getHashMap("name");
String string = createDataSize(250);
@@ -25,6 +27,9 @@
final UUID key = UUID.randomUUID();
storage.put(key, string);
keys.add(key);
+ if (i%100==0) {
+ db.commit();
+ }
}
for (UUID key : keys) {
Source: (StackOverflow)
I'm using MapDB in a project that deals with billions of Objects that need to be mapped/queued. I don't need any kind of persistence after the program finishes (the MapDB databases are all temporary). I want the program to run as fast as possible, but I'm confused about MapDB's commit() function (which I assume is relevant to performance), even after reading the docs. My questions:
What exactly does commit do? My working understanding is that it serializes Objects from the heap to disk, thus freeing heap space. Is this accurate?
What happens to the references to Objects that were just committed? Do they get cleaned up by GC, or do they somehow 'reference' an Object on disk (with MapDB making this transparent?)
Ultimately I want to know how to use MapDB as efficiently as I can, but I can't do that without knowing what commit() is for. I'd appreciate any other advice that you might have for using MapDB efficiently.
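For reference on the second question: values are serialized when they are written into the store, so MapDB keeps a serialized copy rather than holding on to the caller's object reference, and the original reference is an ordinary heap object the GC can reclaim once nothing else points to it. A minimal sketch of that distinction, assuming the MapDB 1.x API, with the instance cache disabled so that get() always deserializes (the class and map names are placeholders):

import org.mapdb.DB;
import org.mapdb.DBMaker;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CommitSketch {
    public static void main(String[] args) {
        DB db = DBMaker.newTempFileDB()
                .cacheDisable() // so get() deserializes instead of returning a cached instance
                .make();
        Map<Integer, List<String>> map = db.getHashMap("demo"); // placeholder name

        List<String> original = new ArrayList<String>();
        original.add("a");
        map.put(1, original); // a serialized copy of the list goes into the store
        db.commit();          // makes the pending writes durable on disk

        original.add("b");    // mutating the heap object does not touch the stored copy
        System.out.println(map.get(1)); // prints [a] - a fresh copy deserialized from the store

        original = null; // the heap object is now GC-eligible like any other
        db.close();
    }
}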
Source: (StackOverflow)
I have implemented a Hazelcast service which stores its data in local MapDB instances via MapStoreFactory and newMapLoader. This way the keys can be loaded if a cluster restart is necessary:
public class HCMapStore<V> implements MapStore<String, V> {

    DB db;
    Map<String, V> map;

    /** specify the mapdb e.g. via
     * DBMaker.newFileDB(new File("mapdb")).closeOnJvmShutdown().make()
     */
    public HCMapStore(DB db) {
        this.db = db;
        this.map = db.createHashMap("someMapName").<String, V>makeOrGet();
    }

    // some other store methods are omitted

    @Override
    public void delete(String k) {
        logger.info("delete, " + k);
        map.remove(k);
        db.commit();
    }

    // MapLoader methods

    @Override
    public V load(String key) {
        logger.info("load, " + key);
        return map.get(key);
    }

    @Override
    public Set<String> loadAllKeys() {
        logger.info("loadAllKeys");
        return map.keySet();
    }

    @Override
    public Map<String, V> loadAll(Collection<String> keys) {
        logger.info("loadAll, " + keys);
        Map<String, V> partialMap = new HashMap<>();
        for (String k : keys) {
            partialMap.put(k, map.get(k));
        }
        return partialMap;
    }
}
The problem I'm now facing is that the loadAllKeys method of the MapLoader interface from Hazelcast requires returning ALL keys of the whole cluster, BUT every node stores ONLY the objects it owns.
Example: I have two nodes and store 8 objects; then e.g. 5 objects are stored in the MapDB of node1 and 3 in the MapDB of node2. Which object is owned by which node is decided by Hazelcast, I think. Now on restart, node1 will return 5 keys for loadAllKeys and node2 will return 3. Hazelcast decides to ignore the 3 items, and the data is 'lost'.
What could be a good solution to this?
Update for bounty: I asked this on the Hazelcast mailing list, mentioning 2 options (I'll add 1 more here), and I would like to know if something like this is already possible with Hazelcast 3.2 or 3.3:
1. Currently the MapStore interface gets only data or updates from the local node. Would it be possible to notify the MapStore interface of every storage action of the full cluster? Or maybe this is already possible with some listener magic?
2. Maybe I can force Hazelcast to put all objects into one partition and have 1 copy on every node.
3. If I restart e.g. 2 nodes, then the MapStore interface gets called correctly with my local databases for node1 and then for node2. But when both nodes join, the data of node2 will be removed, as Hazelcast assumes that only the master node can be correct. Could I teach Hazelcast to accept the data from both nodes?
Source: (StackOverflow)
I'm exploring the MapDB utility to be used as an off-heap Java cache backed by an SSD. Can someone tell me whether it supports the following:
- Is access to the SSD device "flash friendly", i.e. are accesses page-aligned?
- Does it allow inserting keys into the device in batch mode? (All I was wondering is whether I can avoid performing db.commit() after the insert of every single key; see the sketch below.)
Thanks!
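On the second point (batching), a minimal sketch of committing once per batch of inserts rather than once per key, assuming the MapDB 1.x API (newFileDB, getTreeMap, commit, close); the class name, file path, map name and batch size are placeholders:

import org.mapdb.DB;
import org.mapdb.DBMaker;
import java.io.File;
import java.util.Map;

public class BatchInsert {
    public static void main(String[] args) {
        DB db = DBMaker.newFileDB(new File("/tmp/ssd-cache.db")) // placeholder path
                .closeOnJvmShutdown()
                .make();
        Map<String, byte[]> map = db.getTreeMap("cache"); // placeholder name

        int batchSize = 10000;
        for (int i = 0; i < 1000000; i++) {
            map.put("key-" + i, new byte[256]);
            if (i % batchSize == batchSize - 1) {
                db.commit(); // one commit per batch, not one per key
            }
        }
        db.commit(); // flush the final partial batch
        db.close();
    }
}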
Source: (StackOverflow)
I tried to use Clojure's hash map directly with MapDB and ran into weird behaviour. I checked the Clojure and MapDB sources and couldn't understand the problem.
First, everything looks fine:
lein try org.mapdb/mapdb "1.0.6"
; defining a db for the first time
(import [org.mapdb DB DBMaker])
(defonce db (-> (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
.closeOnJvmShutdown
.compressionEnable
.make))
(defonce fruits (.getTreeMap db "fruits-store"))
(do (.put fruits :banana {:qty 2}) (.commit db))
(get fruits :banana)
=> {:qty 2}
(:qty (get fruits :banana))
=> 2
(first (keys (get fruits :banana)))
=> :qty
(= :qty (first (keys (get fruits :banana))))
=> true
CTRL-D
=> Bye for now!
Then I try to access the data again:
lein try org.mapdb/mapdb "1.0.6"
; loading previsously created db
(import [org.mapdb DB DBMaker])
(defonce db (-> (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
.closeOnJvmShutdown
.compressionEnable
.make))
(defonce fruits (.getTreeMap db "fruits-store"))
(get fruits :banana)
=> {:qty 2}
(:qty (get fruits :banana))
=> nil
(first (keys (get fruits :banana)))
=> :qty
(= :qty (first (keys (get fruits :banana))))
=> false
(class (first (keys (get fruits :banana))))
=> clojure.lang.Keyword
How come the same keyword is different with respect to = ?
Is there some weird reference problem happening?
Source: (StackOverflow)
I'm trying to serialize and deserialize an object to store it in MapDB.
I managed to serialize the Object using this snippet:
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream os = new ObjectOutputStream(bos);
os.writeObject(u);
result = bos.toString();
After that I stored "result" in MapDB. Everything seemed to work like a charm.
Unfortunately I ran into some issues while trying to deserialize it.
Here is the snippet:
byte[] b = null;
b = str.getBytes();
InputStream ac = new ByteArrayInputStream(b);
Object a= ac.read();
str is the serialized object coming from MapDB, treated as a String.
After that I "casted" it to a byte array.
I used this approach because I had some issues while fetching data from MapDB as Objects.
So, I'm asking you: how can I fix this problem? Object "a" is an instance of java.lang.Integer instead of the desired class, so deserialization isn't working.
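For comparison, a plain Java round-trip that keeps the serialized data as a byte[] end to end looks like the sketch below. Two plain-Java points it relies on: bos.toString()/String.getBytes() can corrupt binary data, and InputStream.read() returns a single byte as an int, which is why an Integer comes back from ac.read(). The class name and the sample value are placeholders, not the original code:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTrip {

    // Serialize any Serializable object to a byte[] (safe to store as a map value).
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.flush();
        return bos.toByteArray(); // not bos.toString(), which mangles binary data
    }

    // Deserialize the byte[] back into an object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        return ois.readObject(); // readObject(), not InputStream.read()
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = toBytes("hello");             // e.g. the value that would go into MapDB
        String restored = (String) fromBytes(stored);
        System.out.println(restored);                 // prints "hello"
    }
}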
Source: (StackOverflow)
I am creating a circular queue in direct memory. Here is what I have done:
BlockingQueue<String> queue = DBMaker.newMemoryDirectDB().make().getCircularQueue("my-queue");
queue.add("sdfsd");
queue.add("345345");
queue.add("dfgdfg");
queue.add("dfgdgfdgdf");
System.out.println(queue.take());
This is working fine. But when I create the queue like this:

BlockingQueue<String> queue = DBMaker.newMemoryDirectDB().make().createCircularQueue("my-queue", Serializer.STRING, 1000);

it throws a NullPointerException:
Exception in thread "main" java.lang.NullPointerException
at org.mapdb.DataOutput2.writeUTF(DataOutput2.java:147)
at org.mapdb.Serializer$1.serialize(Serializer.java:70)
at org.mapdb.Serializer$1.serialize(Serializer.java:67)
at org.mapdb.Queues$SimpleQueue$NodeSerializer.serialize(Queues.java:63)
at org.mapdb.Queues$SimpleQueue$NodeSerializer.serialize(Queues.java:52)
at org.mapdb.Store.serialize(Store.java:154)
at org.mapdb.StoreWAL.put(StoreWAL.java:232)
at org.mapdb.Caches$HashTable.put(Caches.java:216)
at org.mapdb.DB.createCircularQueue(DB.java:1208)
at com.mycompany.testjoda.Main.main(Main.java:11)
Am I missing something?
Source: (StackOverflow)
So I have a list of around 20 million key-value pairs, and I'm storing the data in several MapDBs in different ways to see how it affects my program's performance, and for experiment's sake.
The thing is, it takes quite a lot of time to insert (in random order) 20 million key-value pairs into a MapDB. So I would like to sort the list of key-value pairs I have so I can insert them faster, and thus build the databases faster.
So, how would I go about this?
I'd like to learn how to do this for MapDB's BTreeSet and BTreeMap, i.e. both for MapDBs that use single key-value pairs and for MapDBs that have multiple values for a single key.
EDIT:
I forgot to mention, the key-value pairs are String objects.
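For the sort-then-insert part, a minimal plain-Java sketch might look like this: collect the pairs, sort them by key, then insert them in ascending key order. The sample pairs and the commented-out MapDB calls are placeholders. (MapDB also ships a bulk-loading "data pump" for pre-sorted input, but that is a separate topic.)

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortedInsert {
    public static void main(String[] args) {
        // Placeholder data: in practice this is the list of 20 million String pairs.
        List<Map.Entry<String, String>> pairs = new ArrayList<Map.Entry<String, String>>();
        pairs.add(new AbstractMap.SimpleEntry<String, String>("zebra", "1"));
        pairs.add(new AbstractMap.SimpleEntry<String, String>("apple", "2"));
        pairs.add(new AbstractMap.SimpleEntry<String, String>("mango", "3"));

        // Sort by key so the BTree receives inserts in ascending key order.
        Collections.sort(pairs, new Comparator<Map.Entry<String, String>>() {
            @Override
            public int compare(Map.Entry<String, String> a, Map.Entry<String, String> b) {
                return a.getKey().compareTo(b.getKey());
            }
        });

        // Then insert in that order into the MapDB collection, e.g.:
        // BTreeMap<String, String> map = db.createTreeMap("pairs").makeOrGet();
        for (Map.Entry<String, String> e : pairs) {
            // map.put(e.getKey(), e.getValue());
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}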
Source: (StackOverflow)
I'm evaluating MapDB for an application. I create the DB the following way:

DB db = DBMaker.fileDB(new File("mapdbtest1")).cacheHashTableEnable()
        .cacheSize(50000).closeOnJvmShutdown()
        .encryptionEnable("password").make();

After putting 50000 objects into the DB and then calling GC explicitly, the used heap size is very low, and it looks like the objects are not in the memory cache any more. I was expecting the cacheSize number of objects to always stay in memory, irrespective of GC or OOM, with the rest of them swapped out.
=======================
Memory : after committing to DB
free memory: 1426 MB
allocated memory: 2860 MB
used memory: 1433 MB
max memory: 3641 MB
total free memory: 2207 MB
=======================
=======================
Memory : after gc
free memory: 2479 MB
allocated memory: 2494 MB
used memory: 14 MB
max memory: 3641 MB
total free memory: 3626 MB
=======================
Is my understanding wrong?
Source: (StackOverflow)
Is there any way to use a byte array as a key in a BTreeMap, like this?

BTreeMap<byte[], Integer> myBTreeMap = db.getTreeMap("myBTreeMap");
Currently this exception is thrown when trying to put a new entry into the map:
Exception in thread "main" java.lang.ClassCastException: [B cannot be cast to java.lang.Comparable ...
What is the proper way to make this work? I would like a solution that does not use wrapper classes.
Any ideas are welcome.
[UPDATE]
I've used the solution proposed by SJuan76:

BTreeMap<byte[], Integer> myBTreeMap = db.createTreeMap("myBTreeMap")
        .comparator(SignedBytes.lexicographicalComparator())
        .makeOrGet();

The comparator used can be found in the Guava library if needed.
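For anyone who would rather not add the Guava dependency just for this, a hand-rolled comparator along the same lines might look like the sketch below (the class name is made up; whatever comparator is chosen should be applied consistently, since it defines the tree's key ordering):

import java.io.Serializable;
import java.util.Comparator;

// Roughly what SignedBytes.lexicographicalComparator() does: compare element by element
// using signed byte comparison, with the shorter array ordering first on ties.
public class SignedByteArrayComparator implements Comparator<byte[]>, Serializable {
    @Override
    public int compare(byte[] left, byte[] right) {
        int minLength = Math.min(left.length, right.length);
        for (int i = 0; i < minLength; i++) {
            int cmp = Byte.compare(left[i], right[i]); // signed, element-wise comparison
            if (cmp != 0) {
                return cmp;
            }
        }
        return left.length - right.length;
    }
}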
Source: (StackOverflow)
I encountered the exception after restarting Jetty, which contains a producer-consumer implementation via MapDB's Queue. I haven't called "DBMaker.transactionDisable()", but why did I still get the above exceptions?
Before I restarted Jetty, I found that the consumer seemed blocked. With the jstack command, I got lots of the following logs:
"qtp1730704097-270" #270 prio=5 os_prio=0 tid=0x00007f7b60006800
nid=0xb44f waiting on condition [0x00007f7986ae9000]
java.lang.Thread.State: TIMED_WAITING (parking) at
sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000054047c420> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
Here are some code fragments:

// Consumer code:
public void run() {
    try {
        while (!stopped) {
            if (System.currentTimeMillis() - lastCheckTime > 60000) {
                localDb.commit();
                localDb.compact();
                lastCheckTime = System.currentTimeMillis();
            }
            ...
            for (...) {
                queue.poll();
            }
            ...
            localDb.commit();
        }
        localDb.commit();
        localDb.compact();
    } catch (InterruptedException ie) {
        LOG.error("", ie);
    }
}

// Other code:
...
DB localDb = DBMaker.newFileDB(new File(path)).closeOnJvmShutdown().make();
localDb.catPut(QUEUE_NAME + ".useLocks", false);
queue = localDb.getQueue(QUEUE_NAME);
...
public void interrupt() {
    stopped = true;
    localDb.close();
}
java -version: java version "1.8.0_11"
Thanks!
Source: (StackOverflow)
I use MapDB to store my data, which is inserted into maps. I followed the instructions on the MapDB site and was able to set up a database and fill it with values. But here is my problem: I insert my data into the maps and then call the database class to insert the map into the DB. The current map is added to the DB, but it automatically overwrites the previous entry, so that the number of entries is always 1.
Here is my code:
for (Element objects : objectInstanceList)
{
    mapID = objects.getName().toString();
    List<Element> listObjects1 = objects.getChildren();
    Multimap<String, Values> mm = HashMultimap.create();

    for (Element objectClasses : listObjects1)
    {
        List<Element> listObjects2 = objectClasses.getChildren();
        for (Element objectAttributes : listObjects2)
        {
            String name = objectAttributes.getAttributeValue("name");
            String type = objectAttributes.getAttributeValue("type");
            String value = objectAttributes.getAttributeValue("value");
            Values v = new Values(name, type, value);
            mm.put(objectClasses.getName(), v);
        }
    }
    DataBase.putHW(mapID, mm);
    System.out.println(mm);
}
Like I said, I fill the Multimap mm with some values and then call the method DataBase.putHW, which looks like this (created like the examples on the MapDB page):
public class DataBase {

    static DB dbHW = DBMaker.newMemoryDB().make();

    static NavigableSet<Fun.Tuple2<String, Multimap<String, Values>>> multimapHW
            = dbHW.getTreeSet("Applications");

    public static void putHW(String mapID, Multimap<String, Values> dbMap) {
        multimapHW = dbHW.createTreeSet("Delta").serializer(BTreeKeySerializer.TUPLE2).make();
        multimapHW.add(Fun.t2(mapID, dbMap)); // Fun means functional; it's the function used to add values to the map
    }
}
So why does the multimapHW in the database contain just one entry instead of many entries?
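For comparison, here is a sketch in which the named tree set is created (or opened) exactly once and putHW only adds entries. It reuses only the APIs already shown in this question (newMemoryDB, createTreeSet/serializer/makeOrGet, getTreeSet, Fun.t2) and keeps the original names; Values is the asker's own class, and this is just one possible arrangement, not a diagnosis of the original problem.

import java.util.NavigableSet;
import org.mapdb.BTreeKeySerializer;
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Fun;
import com.google.common.collect.Multimap;

public class DataBase {

    static final DB dbHW = DBMaker.newMemoryDB().make();

    static final NavigableSet<Fun.Tuple2<String, Multimap<String, Values>>> multimapHW;

    static {
        // Create the named set once (or open it if it already exists)...
        dbHW.createTreeSet("Applications").serializer(BTreeKeySerializer.TUPLE2).makeOrGet();
        // ...and keep a single typed reference to it for the lifetime of the class.
        multimapHW = dbHW.getTreeSet("Applications");
    }

    public static void putHW(String mapID, Multimap<String, Values> dbMap) {
        // Each call adds one tuple; nothing is re-created, so earlier entries remain in the set.
        multimapHW.add(Fun.t2(mapID, dbMap));
    }
}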
Source: (StackOverflow)
When should one use MapDB vs. a regular database through an ORM? Other than having a direct mapping to java.util.Map, which could be implemented with an ORM as well.
Source: (StackOverflow)