EzDevInfo.com

neo4j interview questions

Top neo4j frequently asked interview questions

anybody tried neo4j vs titan - pros and cons [closed]

Can anybody provide or point to a good comparison between Neo4j and Titan? One difference I can see is scale: Titan scales out and requires an underlying scalable datastore like Cassandra, whereas Neo4j only offers HA and has its own embedded database. Are there any other pros and cons, or specific use cases? (Is Titan being used anywhere currently?)

I also have the following link: http://architects.dzone.com/articles/16-graph-databases-compared that gives an objective comparison of graph databases, but not much on the pros and cons of Neo4j versus Titan.

Source: (StackOverflow)

Neo4j - Cypher vs Gremlin query language

I'm starting to develop with Neo4j using the REST API. I saw that there are two options for performing complex queries - Cypher (Neo4j's query language) and Gremlin (the general purpose graph query/traversal language).

Here's what I want to know - is there any query or operation that can be done by using Gremlin and can't be done with Cypher? or vice versa?

Cypher seems much clearer to me than Gremlin, and in general it seems that the guys at Neo4j are going with Cypher. But if Cypher is limited compared to Gremlin, I would really like to know that in advance.

Source: (StackOverflow)

How does FlockDB compare with neo4j?

Both FlockDB and neo4j are open source frameworks for storing large graph datasets. Is anyone familiar enough with both products to write a comparison?

Source: (StackOverflow)

Node identifiers in neo4j

I'm new to Neo4j - just started playing with it yesterday evening.

I've noticed all nodes are identified by an auto-incremented integer that is generated during node creation - is this always the case?

My dataset has natural string keys so I'd like to avoid having to map between the Neo4j assigned ids and my own. Is it possible to use string identifiers instead?

Source: (StackOverflow)

What is the difference between graph-based databases and object-oriented databases?

What is the difference between graph-based databases (http://neo4j.org/) and object-oriented databases (http://www.db4o.com/)?

Source: (StackOverflow)

Which (in_memory) graph DB to use when data modeling is the focus

I am out of ideas and hope to get some useful input. I am using this question to condense my experiences and share them, hoping to inspire some vendors to take the next step and treat graph data modeling as a first-class concern.

I've been evaluating graph database solutions usable from node.js for a few weeks. My use case is storing the interactions of different social network user accounts. I need to use CPU and memory as efficiently as possible.

My most important requirements are:

in_memory (at least for indexing)
open source (and free to use)
JavaScript/Node.js as a first-class citizen, with comparable performance
comfortable query and modeling language

Neo4J

I really like Cypher, so my first choice would be Neo4j. But the major issue with Neo4j is that JavaScript access is non-native: it goes through the REST API, which is about ten times (10x) slower than direct Java access. So I took a look at node-neo4j-embedded, but it has been inactive for more than two years, and its author doesn't seem to be active at all (a bad sign).

ArangoDB

The really nice core developers of ArangoDB answered my question about internals. In the end it means JavaScript is a first-class citizen, because native queries can be issued from JS. Looking at the open source benchmarks, I think they are fair. But I am afraid they didn't use node-neo4j-embedded for their benchmark: the benchmarks compare the REST APIs (edited because of @weinberger's comment). I wish they had compared the native APIs (maybe someone is curious enough to give it a try - let us know!). Update: As I have now noticed, OrientDB has answered the benchmark with a new node.js driver (using the Command Cache by starting the server with -Dcommand.cache.enabled=true -Dcommand.cache.minExecutionTime=3, which isn't fair, because it wasn't a query cache benchmark!).

Because I would like to use ArangoDB as a graph database, I would have three choices (source: FAQ):

traversing JS objects
using AQL's graph functions
using the REST API

In general it isn't as comfortable as Cypher. And I am not sure how to compare them or what the right way to model data is (which Neo4J explains very well). I'd love to have something like this for ArangoDB graphs. It feels like ArangoDB is focused on graph operations, while Neo4J better fits the need to use graphs when you have more relations than rows (the reason to use graphs instead of relations with joins).

MongoDB

The document-based MongoDB isn't optimized for graph operations, but it has recently gained an experimental in_memory storage engine. There are also some projects that are either in_memory or graph related, but nothing is really compelling. And judging by this discussion, MongoDB doesn't look like what I want to use.

OrientDB

Because there is a comparison of OrientDB vs. MongoDB available (from OrientDB), I thought about using this one. "OrientDB has a hybrid Document-Graph engine" using SQL, and I am a former PHP/MySQL expert. But where is the modeling part? Their chapter on working with graphs is not Cypher-like; it is like using SQL for graphs. There is nothing wrong with that, but having used Cypher before, I miss the feeling of modeling. If someone has gone through a modeling process with OrientDB and graphs, maybe you could write a tutorial like the one Neo4J has.

Update: Regarding JavaScript access as a first-class citizen, there is news: "In the next release the speed of this driver will be comparable to the native Java one". The forked node.js driver has been fixed in the last few days.

Update: Before choosing OrientDB, one might want to read the article about some issues and the discussions linked from there. The article touches on a sensitive issue and should be approached with a critical mind. Note from the author of this update: I'm new to editing SO and don't have enough reputation to put this in the comments. I believe this information is a valid point for discussion, but I am not sure how to place it here according to SO rules.

LokiJS

Before I looked at Neo4J, ArangoDB and MongoDB, I played around with a JavaScript-based in_memory database called LokiJS, which seems to follow the strategy of ignoring everything that slows down performance and efficiency. LokiJS is trying to complete the Mongo style (RoadMap). The major issue is that it scales poorly. Of course it isn't a graph database, but it was an interesting solution at the beginning of my project. It also didn't feel great to have to hunt through the scattered documentation (maybe they should reboot with GitBook). All in all, LokiJS is a very interesting project and I hope they will move forward!

LevelDB

Previously, when I wrote my degree paper, I looked at LevelDB. Remembering this while writing this post, I searched for an in_memory LevelDB and got a promising result called MemDown (see also). I haven't tested this find, but maybe someone has experience working and modeling with this solution. It might be the most efficient way if none of the others fit, because I would simply write a lightweight Cypher clone with the goal of staying as lightweight as I can.

Edit: Following a comment, here is a link to LevelGraph. If you wanted to implement a Cypher parser for LevelGraph/LevelDB, a starting point would be to compare the Cypher statement and the LevelGraph call below:

CREATE (subject {name: "a"})-[predicate:B]->(object {name: "c"})
RETURN subject, predicate, object

var triple = { subject: "a", predicate: "b", object: "c" };
db.put(triple, function(err) {
  // handle err here
});
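To make the idea of a lightweight Cypher clone a little more concrete, here is a minimal Python sketch of the first step: turning one very restricted CREATE pattern into a subject/predicate/object triple of the kind a triple store expects. The grammar, the `parse_create` name and the output shape are assumptions made for illustration only; real Cypher is far richer, and nothing here is taken from LevelGraph itself.

```python
import re

# Hypothetical, deliberately tiny grammar: CREATE (subject)-[:PREDICATE]->(object)
CREATE_PATTERN = re.compile(
    r"""
    ^\s*CREATE\s*
    \(\s*(?P<subject>\w+)\s*\)                     # (subject)
    \s*-\s*\[\s*:(?P<predicate>\w+)\s*\]\s*->\s*   # -[:PREDICATE]->
    \(\s*(?P<object>\w+)\s*\)                      # (object)
    \s*;?\s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_create(statement):
    """Parse the restricted CREATE pattern into a triple dictionary."""
    match = CREATE_PATTERN.match(statement)
    if match is None:
        raise ValueError("unsupported statement: %r" % statement)
    # Keys mirror the subject/predicate/object naming used above.
    return match.groupdict()


if __name__ == "__main__":
    print(parse_create("CREATE (a)-[:b]->(c)"))
```

The resulting dictionary could then be handed to whatever storage call the chosen backend offers; anything beyond a single relationship pattern would need a real parser rather than a regular expression.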

Conclusion

As you have likely noticed, I am no superhero when it comes to graphs. But this is my initial dive into the topic and I'm trying to get an overview. I assume there are a lot of people out there who want to ask the same questions but don't have the time. I hope this post will help a lot of people and will evolve through comments and answers into a solid overview of how to model data for graphs.

@editors: You are welcome.

@commenters: This is the result of my personal research - if you have made a similar journey, please answer with a short summary like the ones I have written for each DB I've evaluated (don't forget to address my 4 requirements).

Source: (StackOverflow)

Hype around graph databases... why?

There is some hype around graph databases. I'm wondering why.

What are the possible problems that one can be confronted with in today's web environment that can be solved using graph databases? And are graph databases suitable for classical applications, i.e. can one be used as a drop-in replacement for a Relational Database? So in fact it's two questions in one.

Related: Has anyone used Graph-based Databases (http://neo4j.org/)?

Source: (StackOverflow)

How to delete/create databases in Neo4j?

Is it possible to create/delete different databases in the graph database Neo4j, like in MySQL? Or, at least, how can I delete all nodes and relationships of an existing graph to get a clean setup for tests, e.g., using shell commands similar to rmrel or rm?

Source: (StackOverflow)

Has anyone used Graph-based Databases (http://neo4j.org/)? [closed]

I have used relational DBs a lot and decided to venture out to the other types available.

This particular product looks good and promising: http://neo4j.org/

Has anyone used graph-based databases? What are the pros and cons from a usability perspective?

Have you used these in a production environment? What was the requirement that prompted you to use them?

Source: (StackOverflow)

How to increase maximum file open limit (ulimit) in Ubuntu?

Currently ulimit -n shows 10000. I want to increase it to 40000. I've edited "/etc/sysctl.conf" and put fs.file-max=40000. I've also edited /etc/security/limits.conf and updated the hard and soft values. But ulimit still shows 10000. After making all these changes I rebooted my laptop. I have access to the root password.

usr_name@usr_name-lap:/etc$ /sbin/sysctl fs.file-max
fs.file-max = 500000

I added the following lines to /etc/security/limits.conf:

*     soft    nofile          40000
*     hard    nofile          40000

I also added the following line to /etc/pam.d/su:

session    required   pam_limits.so

I've tried every possible way as given on other forums, but I can reach up to a maximum limit of 10000, not beyond that. What can be the issue?

I'm making this change because neo4j throws a "maximum open file limit reached" error.
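Since ulimit -n only reports the limit of the shell it is run in, it can help to check what a given process actually sees. A minimal sketch using Python's standard `resource` module (Unix only) follows; the `open_file_limits` helper name is made up for this example, and the script merely reports the limits, it does not change them.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # resource.RLIM_INFINITY means "no limit" for that value.
    print("soft nofile limit:", soft)
    print("hard nofile limit:", hard)
```

Running it from the same user and session type that starts neo4j shows whether the values from limits.conf were applied to that session.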

Source: (StackOverflow)

Is it a good idea to use MySQL and Neo4j together?

I will make an application with a lot of similar items (millions), and I would like to store them in a MySQL database, because I would like to do a lot of statistics and search on specific values for specific columns.

But at the same time, I will store relations between all the items, which are related in many connected binary-tree-like structures (transitive closure). Relational databases are not good at that kind of structure, so I would like to store all relations in Neo4j, which has good performance for this kind of data.

My plan is to keep all data except the relations in the MySQL database, and all relations, with item_id, in the Neo4j database. When I want to look up a tree, I first search Neo4j for all the item_ids in the tree, then I search the MySQL database for all the specified items with a query that would look like:

SELECT * FROM items WHERE item_id = 45 OR item_id = 345435 OR item_id = 343 OR item_id = 78 OR item_id = 4522 OR item_id = 676 OR item_id = 443 OR item_id = 4255 OR item_id = 4345

Is this a good idea, or am I very wrong? I haven't used graph-databases before. Are there any better approaches to my problem? How would the MySQL-query perform in this case?
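The second step of the plan above (fetching rows for a list of ids that came back from the graph) can also be written with an IN list and bound parameters instead of a chain of ORs. The sketch below uses Python's built-in sqlite3 with an in-memory table purely as a stand-in for MySQL; the `name` column, the sample rows and the `fetch_items` helper are assumptions for illustration, and a MySQL driver would typically use a different placeholder style.

```python
import sqlite3


def fetch_items(conn, item_ids):
    """Fetch the rows for the given ids with one parameterized IN query."""
    ids = list(item_ids)
    if not ids:
        return []  # an empty IN () list is not valid SQL
    placeholders = ", ".join("?" for _ in ids)
    sql = "SELECT item_id, name FROM items WHERE item_id IN (%s)" % placeholders
    return conn.execute(sql, ids).fetchall()


def demo():
    """Build a throwaway table and look up ids as if they came from the graph."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(45, "a"), (343, "b"), (78, "c"), (999, "d")],
    )
    ids_from_graph = [45, 343, 78]  # hypothetical result of the tree lookup
    return sorted(fetch_items(conn, ids_from_graph))


if __name__ == "__main__":
    print(demo())
```

Only the placeholders are interpolated into the SQL string; the ids themselves are passed as bound parameters.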

Source: (StackOverflow)

Cypher - Return node if relationship is not present

I'm trying to create a query using Cypher that will find the ingredients a chef is missing. My graph is set up like so:

(ingredient_value)-[:is_part_of]->(ingredient)

(ingredient) would have a key/value of name="dye colors". (ingredient_value) could have a key/value of value="red" and "is part of" the (ingredient, name="dye colors").

(chef)-[:has_value]->(ingredient_value)<-[:requires_value]-(recipe)-[:requires_ingredient]->(ingredient)

I'm using this query to get all the ingredients (but not their actual values) that a recipe requires, but I would like to return only the ingredients that the chef does not have, instead of all the ingredients each recipe requires. I tried

(chef)-[:has_value]->(ingredient_value)<-[:requires_value]-(recipe)-[:requires_ingredient]->(ingredient)<-[:has_ingredient*0..0]-chef

but this returned nothing.

Is this something that can be accomplished with Cypher/Neo4j, or is it best handled by returning all ingredients and sorting through them myself?

Bonus: Is there also a way to use Cypher to match all the values that a chef has against all the values that a recipe requires? So far I've only returned all the partial matches produced by chef-[:has_value]->ingredient_value<-[:requires_value]-recipe and aggregated the results myself.
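The client-side fallback mentioned above (return everything and sort through it yourself) comes down to two set operations. The Python sketch below shows that logic on plain dictionaries and sets; the function names and sample data are hypothetical, and it does not show how to express the same thing in Cypher.

```python
def missing_ingredients(required_ingredients, value_to_ingredient, chef_values):
    """Ingredients the recipe requires for which the chef holds no value."""
    covered = {
        value_to_ingredient[value]
        for value in chef_values
        if value in value_to_ingredient
    }
    return set(required_ingredients) - covered


def has_all_values(required_values, chef_values):
    """True if every value the recipe requires is one the chef has."""
    return set(required_values) <= set(chef_values)


if __name__ == "__main__":
    # (ingredient_value)-[:is_part_of]->(ingredient), flattened into a dict
    is_part_of = {"red": "dye colors", "blue": "dye colors", "wheat": "flour"}
    print(missing_ingredients({"dye colors", "flour"}, is_part_of, {"red"}))
    print(has_all_values({"red", "wheat"}, {"red"}))
```

The first function is a set difference (an anti-join), the second a subset test, which is the "all values match" case from the bonus question.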

Source: (StackOverflow)

MongoDB + Neo4J vs OrientDB vs ArangoDB

I am currently in the design phase of an MMO browser game. The game will include tilemaps for some real-time locations (so tile data for each cell) and a general world map. The game engine I prefer uses MongoDB for the persistent world data.

I will also implement a shipping simulation (which I explain in more detail below), which is basically a Dijkstra module. I decided to use a graph database, hoping it would make things easier, and found Neo4j, as it is quite popular.

I was happy with the MongoDB + Neo4J setup, but then I noticed OrientDB, which apparently acts like both MongoDB and Neo4J (best of both worlds?); they even have VS pages for MongoDB and Neo4J.

The point is, I have heard some horror stories of MongoDB losing data (though I am not sure it still does), and I can't afford that. As for Neo4J, I am not a big fan of the 12K€ per year "startup friendly" cost, although I'll probably not have a DB of millions of vertices. OrientDB seems a viable option, as there may also be some advantages to using a single database solution.

In that case, a logical move might be to jump to OrientDB, but it has a small community and, to be honest, I didn't find many reviews of it. MongoDB and Neo4J are popular, widely used tools; I am concerned that OrientDB may be an adventure.

My first question is whether you have any experience with or opinion on these databases.

My second question is which graph database is better for a shipping simulation. The database is expected to calculate the cheapest route from any vertex to any vertex and traverse it (classic Dijkstra). But it also has to change weights depending on situations like "country B has an embargo on country A, so any item originating from country A can't pass through B" or "there is a flood in region XYZ, so no land transport is possible". The database is also expected to cache results. I expect no more than 1000 vertices but many edges.

Thanks in advance, and apologies if the questions are a bit ambiguous.

PS: I added ArangoDB to the title but, to be honest, haven't had much chance to take a look.
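For a sense of scale, the routing part of this requirement (cheapest path, with some edges ruled out by the current situation) can be sketched in a few lines of Python independent of any database. The graph layout, the attribute names `country` and `mode`, and the `is_blocked` callback are assumptions made for this example; caching of results is left out.

```python
import heapq


def cheapest_route(graph, start, goal, is_blocked=lambda u, v, attrs: False):
    """Dijkstra over graph[u] = [(v, cost, attrs), ...], skipping blocked edges.

    Returns (total_cost, path), or None if the goal cannot be reached.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    visited = set()
    while heap:
        cost_so_far, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbor, cost, attrs in graph.get(node, []):
            if is_blocked(node, neighbor, attrs):
                continue  # embargo, flood, etc.
            new_cost = cost_so_far + cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path


# Hypothetical toy network: each edge records the country it enters and its mode.
GRAPH = {
    "A": [("B", 1, {"country": "B", "mode": "land"}),
          ("C", 5, {"country": "C", "mode": "sea"})],
    "B": [("D", 1, {"country": "D", "mode": "land"})],
    "C": [("D", 1, {"country": "D", "mode": "sea"})],
}

if __name__ == "__main__":
    print(cheapest_route(GRAPH, "A", "D"))
    # Embargo: goods may not enter country B.
    print(cheapest_route(GRAPH, "A", "D",
                         is_blocked=lambda u, v, attrs: attrs["country"] == "B"))
```

Passing the restriction as a predicate keeps the stored weights untouched; a flood would be another predicate, for instance one that rejects edges whose `mode` is "land".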

Source: (StackOverflow)

Graph Databases - betting the company on it?

Looking at Neo4j, the 32 billion relationship limit has me worried (imagine 40 million users who upload 500 photos, have 500 friends, make 500 comments, etc., and before you know it you are past 32 billion). So I have some concerns and have to make sure I'm making the best choice of database.

I'm not looking for subjective answers or debate here (i.e. which one is better, etc.). Rather, since I'm betting a startup's future on the graph database it uses, I need to know the risks the different databases present, such as Neo4j not supporting more than 32 billion relationships.

Now, several companies have called their graph databases the "leading graph database", but let's look past the hype: which one has the most financial backing? Which DB enjoys large community support? Which one has a solid company behind it for commercial support?
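The back-of-the-envelope estimate above can be written out explicitly. The sketch below takes the figures from the question and assumes, for simplicity, that every photo, friendship and comment counts as exactly one relationship per user; a real model might use more relationships per item, or fewer (for example, a mutual friendship stored once).

```python
USERS = 40_000_000
PER_USER = {"photos": 500, "friends": 500, "comments": 500}
RELATIONSHIP_LIMIT = 32_000_000_000  # the figure quoted in the question


def estimated_relationships(users, per_user):
    """Total relationships if each listed item is one relationship per user."""
    return users * sum(per_user.values())


TOTAL = estimated_relationships(USERS, PER_USER)

if __name__ == "__main__":
    print("estimated relationships:", TOTAL)
    print("over the limit:", TOTAL > RELATIONSHIP_LIMIT)
```

Under these assumptions the estimate is 40,000,000 x 1,500 = 60,000,000,000, which is well past 32 billion.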

Which one is most likely to be mature enough so if you wanted, you could easily create facebook with minimal effort?

It's easy to choose a graph database on technical features or familiarity, but I'm looking for more than that: I want to make sure that a few years from now the company is still around. I want to make sure I'm not choosing Neo4j based on hype and the momentum it currently (temporarily?) has...

And what other graph databases can contend with Neo4j for building a full-fledged social network similar to Facebook (again, not looking for better, just looking for a solid competitor)?

Please don't let this turn into a subjective Neo vs Dex debate - just facts and solid answers, please.

Source: (StackOverflow)

Graph Database in Java (other than Neo4J)

Greetings,
Is there any open source graph database available other than Neo4J?

NOTE: Why not Neo4J?
Neo4J is open source, but it counts primitives (the number of nodes, relationships and properties) if you are using it commercially, and there is no straightforward pricing information on the official website, so there is potential vendor lock-in. (Also, I have just started my company and don't have the budget to spend money on software anyway.) So it is not an option.

Regards,

Source: (StackOverflow)