py2neo
Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line. The library has no external dependencies and has been carefully designed to be easy and intuitive to use.
The Py2neo 2.0 Handbook — Py2neo 2.0.7 documentation
I have a class that does generic stuff, and a second class that inherits from it to add more functionality, as follows:
class Person(StructuredNode):
    ...
class SpecialPerson(Person):
    pass
Now if I do:
Person.index.get(user_id=id)
it returns my Person object, but:
SpecialPerson.index.get(user_id=id)
gives me a DoesNotExist("Can't find node in index matching query") exception, which is in neomodel.index.NodeIndexManager at line 56.
How can I make it work?
Thank you
Source: (StackOverflow)
This is more of a best-practices question than anything. I am implementing a search back-end for highly structured data that, in essence, consists of ontologies, terms, and a complex set of mappings between them. Neo4j seemed like a natural fit, and after some prototyping I decided to go with py2neo as a way to communicate with Neo4j, mostly because of its nice support for batch operations.
What frustrates me is that I'm having trouble introducing the kind of higher-level abstraction I would like in my code. I can use the objects directly as a mini-ORM, but then I make lots and lots of atomic REST calls, which kills performance (I have a fairly large data set).
What I've been doing instead is getting my query results and using get_properties on them to batch-hydrate my objects. This performs great, which is why I went down this route in the first place, but it makes me pass tuples of (node, properties) around in my code, which gets the job done but isn't pretty at all.
So I guess what I'm asking is whether there's a best practice somewhere for working with a fairly rich object graph in py2neo, getting the niceties of an ORM-like layer while retaining performance (which in my case means doing as much as possible as batch queries).
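One possible way to stop passing raw (node, properties) tuples around is to wrap each pair in a small object immediately after the batch call. The sketch below is only an illustration: the names Hydrated and hydrate_all are hypothetical, and it assumes the batch call returns one property dictionary per node, in the same order as the nodes.

```python
class Hydrated(object):
    """Thin wrapper pairing a node with its already-fetched properties."""

    def __init__(self, node, properties):
        self.node = node
        # Copy so later changes to the source dict do not leak in.
        self.properties = dict(properties)

    def __getitem__(self, key):
        # Read a property without another round trip to the server.
        return self.properties[key]


def hydrate_all(nodes, properties_list, cls=Hydrated):
    """Zip nodes with their property dicts into wrapper objects."""
    return [cls(node, props) for node, props in zip(nodes, properties_list)]


# Assumed usage with py2neo 1.x (not verified here):
#   terms = hydrate_all(nodes, graph_db.get_properties(*nodes))
```

Domain classes can subclass the wrapper and be passed as cls, so the single batch round trip is kept while the rest of the code works with objects.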
Source: (StackOverflow)
How do I perform the functions of shortestPath() and allShortestPaths() in py2neo?
In Cypher, I'd execute something like:
START beginning=node(4), end=node(452)
MATCH p = shortestPath(beginning-[*..500]-end)
RETURN p
I've tried what I thought was the equivalent (below), but this doesn't work (these relationships work in Cypher, and the node_* objects are indeed the correct nodes):
>>> rels = list(graph_db.match(start_node=node_4, end_node=node_452))
>>> rels
[]
Source: (StackOverflow)
I have a variable
name="Rahul"
and I want to pass this variable to a Cypher query in py2neo in the following manner:
line=session.execute("MATCH (person)WHERE person.name=name RETURN person")
but I am getting an error:
"py2neo.cypher.InvalidSyntax: name not defined (line 1, column 33)"
How do I pass the variable in py2neo?
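The usual approach is a Cypher parameter: the query text carries a {name} placeholder and the value travels separately in a dictionary. A minimal sketch follows. The helper name find_people_by_name is hypothetical, and it assumes the execute callable accepts (statement, parameters), as graph.cypher.execute does in py2neo 2.0.

```python
def find_people_by_name(execute, name):
    """Run a parameterised Cypher query through the given execute callable.

    `execute` is assumed to take (statement, parameters); with py2neo 2.0
    that would be `graph.cypher.execute`.
    """
    # {name} is a Cypher parameter placeholder, not Python string formatting.
    statement = "MATCH (person) WHERE person.name = {name} RETURN person"
    parameters = {"name": name}
    return execute(statement, parameters)


# Assumed usage (not verified here):
#   name = "Rahul"
#   result = find_people_by_name(graph.cypher.execute, name)
```

Keeping the value out of the query string also avoids quoting problems and injection through user-supplied names.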
Source: (StackOverflow)
neo4j - localhost:7474 browser - which nodes are coloured? (py2neo)
I am creating a complex neo4j database with py2neo.
So far I have 6 node indices and labels, and 5 relationship indices and labels.
When I look through localhost:7474/browser, some of my node types get coloured and some stay gray.
What is the trigger that colours the nodes in the localhost:7474/browser - or are there only 4 colours in the preset?
Thanks a lot!
Source: (StackOverflow)
I have a query of this form creating a new node in neo4j:
cypher.get_or_create_indexed_node(index="person", key="name", value="Fred", properties={"level" : 1})
However, when I query Fred to inspect his properties, his level = "1" /with quotes/. It appears something is converting his value to a string. This wouldn't matter much---I could convert it on retrieval if necessary---except when I try to do cypher queries like...
start b = node:person("*:*") RETURN b.level, ID(b) ORDER BY b.level desc SKIP 5 LIMIT 5;
...I notice that b.level is not being ordered as expected. I'm seeing something like:
==> +-----------------+
==> | b.level | ID(b) |
==> +-----------------+
==> | "3"     | 42    |
==> | "0"     | 53    |
==> | "2"     | 57    |
==> | "0"     | 63    |
==> | "2"     | 20    |
==> +-----------------+
when I expect something like:
==> +-----------------+
==> | b.level | ID(b) |
==> +-----------------+
==> | 3       | 42    |
==> | 2       | 20    |
==> | 2       | 57    |
==> | 0       | 63    |
==> | 0       | 53    |
==> +-----------------+
I assume this is a data-type issue, since the reference manual shows skip/limit functionality.
Is it the case that all values are strings, or that there's something else I should add to input correctly?
Source: (StackOverflow)
I am trying to find a workaround to the following problem. I have seen it quasi-described in this SO question, yet not really answered.
The following code fails, starting with a fresh graph:
from py2neo import neo4j
def add_test_nodes():
    # Add a test node manually
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345, {"user_id":12345})
def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)
    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}
    # batch requests
    a = batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
    batch.set_properties(a, new_node_data) #<-- I'm the problem
    # execute batch requests and clear
    batch.run()
    batch.clear()
if __name__ == '__main__':
    # Initialize Graph DB service and create a Users node index
    g = neo4j.GraphDatabaseService()
    users_idx = g.get_or_create_index(neo4j.Node, "Users")
    # run the test functions
    add_test_nodes()
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    print alice
    do_batch(g)
    # get alice back and assert additional properties were added
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    assert "name" in alice
In short, I wish to update the properties of existing indexed nodes in a single batch transaction. The failure occurs at the batch.set_properties line, because the BatchRequest object returned by the previous line is not interpreted as a valid node. Though not entirely identical, it feels like I am attempting something like the answer posted here.
Some specifics
>>> import py2neo
>>> py2neo.__version__
'1.6.0'
>>> g = py2neo.neo4j.GraphDatabaseService()
>>> g.neo4j_version
(2, 0, 0, u'M06')
Update
If I split the problem into separate batches, then it can run without error:
def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)
    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}
    # batch request 1
    batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
    # execute batch request, keep the returned node, and clear
    alice = batch.submit()[0]
    batch.clear()
    # batch request 2
    batch.set_properties(alice, new_node_data)
    # execute batch request and clear
    batch.run()
    batch.clear()
This works for many nodes as well. Though I do not love the idea of splitting the batch up, this might be the only way at the moment. Anyone have some comments on this?
Source: (StackOverflow)
Is there a way to iterate through every node in a neo4j database using py2neo?
My first thought was iterating through GraphDatabaseService, but that didn't work. If there isn't a way to do it with py2neo, is there another Python interface that would let me?
Edit: I'm accepting @Nicholas's answer for now, but I'll update it if someone can give me a way that returns a generator.
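For the generator variant, one option under py2neo 2.0 might be to wrap a streamed Cypher query. This is a sketch, not a confirmed answer: the helper name iter_nodes is hypothetical, and it assumes graph.cypher.stream(statement) returns an iterable of records whose first column can be read with record[0].

```python
def iter_nodes(graph, statement="MATCH (n) RETURN n"):
    """Yield nodes one at a time instead of building a full list.

    Assumes a py2neo 2.0 style `graph.cypher.stream`, which is expected
    to return an iterable of records.
    """
    for record in graph.cypher.stream(statement):
        # The query returns a single column, so take the first value.
        yield record[0]


# Assumed usage (not verified here):
#   for n in iter_nodes(graph):
#       print(n)
```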
Source: (StackOverflow)
I am finding Neo4j slow to add nodes and relationships/arcs/edges when using the REST API via py2neo for Python. I understand that this is due to each REST API call executing as a single self-contained transaction.
Specifically, adding a few hundred pairs of nodes with relationships between them takes a number of seconds, running on localhost.
What is the best approach to significantly improve performance whilst staying with Python?
Would using bulbflow and Gremlin be a way of constructing a bulk insert transaction?
Thanks!
Source: (StackOverflow)
I've set up neo4j on server A, and I have an app running on server B which is to connect to it.
If I clone the app on server A and run the unit tests, it works fine. But running them on server B, the setup runs for 30 seconds and fails with an IncompleteRead:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/suite.py", line 208, in run
    self.setUp()
  File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/suite.py", line 291, in setUp
    self.setupContext(ancestor)
  File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/suite.py", line 314, in setupContext
    try_run(context, names)
  File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/comps/comps/webapp/tests/__init__.py", line 19, in setup
    create_graph.import_films(films)
  File "/comps/comps/create_graph.py", line 49, in import_films
    batch.submit()
  File "/usr/local/lib/python2.7/site-packages/py2neo-1.6.3-py2.7-linux-x86_64.egg/py2neo/neo4j.py", line 2643, in submit
    return [BatchResponse(rs).hydrated for rs in responses.json]
  File "/usr/local/lib/python2.7/site-packages/py2neo-1.6.3-py2.7-linux-x86_64.egg/py2neo/packages/httpstream/http.py", line 563, in json
    return json.loads(self.read().decode(self.encoding))
  File "/usr/local/lib/python2.7/site-packages/py2neo-1.6.3-py2.7-linux-x86_64.egg/py2neo/packages/httpstream/http.py", line 634, in read
    data = self._response.read()
  File "/usr/local/lib/python2.7/httplib.py", line 532, in read
    return self._read_chunked(amt)
  File "/usr/local/lib/python2.7/httplib.py", line 575, in _read_chunked
    raise IncompleteRead(''.join(value))
IncompleteRead: IncompleteRead(131072 bytes read)
-------------------- >> begin captured logging << --------------------
py2neo.neo4j.batch: INFO: Executing batch with 2 requests
py2neo.neo4j.batch: INFO: Executing batch with 1800 requests
--------------------- >> end captured logging << ---------------------
The exception happens when I submit a sufficiently large batch. If I reduce the size of the data set, it goes away. It seems to be related to request size rather than the number of requests (if I add properties to the nodes I'm creating, I can have fewer requests).
If I use batch.run() instead of .submit(), I don't get an error, but the tests fail; it seems that the batch is rejected silently. If I use .stream() and don't iterate over the results, the same thing happens as with .run(); if I do iterate over them, I get the same error as with .submit() (except that it's "0 bytes read").
Looking at httplib.py suggests that we'll get this error when an HTTP response has Transfer-Encoding: Chunked and doesn't contain a chunk size where one is expected. So I ran tcpdump over the tests, and indeed, that seems to be what's happening. The final chunk has length 0x8000, and its final bytes are:
"http://10.210.\r\n
0\r\n
\r\n
(Linebreaks added after \n for clarity.) This looks like correct chunking, but the 0x8000th byte is the first "/", rather than the second ".". Eight bytes early. It also isn't a complete response, being invalid JSON.
Interestingly, within this chunk we get the following data:
"all_relatio\r\n
1280\r\n
nships":
That is, it looks like the start of a new chunk, but embedded within the old one. This new chunk would finish in the correct location (the second "." of above), if we noticed it starting. And if the chunk header wasn't there, the old chunk would finish in the correct location (eight bytes later).
I then extracted the POST request of the batch, and ran it using cat batch-request.txt | nc $SERVER_A 7474. The response to that was a valid chunked HTTP response, containing a complete valid JSON object.
I thought maybe netcat was sending the request faster than py2neo, so I introduced some slowdown:
cat batch-request.txt | perl -ne 'BEGIN { $| = 1 } for (split //) { select(undef, undef, undef, 0.1) unless int(rand(50)); print }' | nc $SERVER_A 7474
But it continued to work, despite being much slower now.
I also tried doing tcpdump on server A, but requests to localhost don't go over tcp.
I still have a few avenues that I haven't explored: I haven't worked out how reliably the request fails or under precisely which conditions (I once saw it succeed with a batch that usually fails, but I haven't explored the boundaries). And I haven't tried making the request from Python directly, without going through py2neo. But I don't particularly expect either of these to be very informative. And I haven't looked closely at the TCP dump except for using Wireshark's 'follow TCP stream' to extract the HTTP conversation; I don't really know what I'd be looking for there. There's a large section that Wireshark highlights in black in the failed dump, and only isolated black lines in the successful dump. Maybe that's relevant?
So for now: does anyone know what might be going on? Anything else I should try to diagnose the problem?
The TCP dumps are here: failed and successful.
EDIT: I'm starting to understand the failed TCP dump. The whole conversation takes ~30 seconds, and there's a ~28-second gap in which both servers are sending ZeroWindow TCP frames - these are the black lines I mentioned.
First, py2neo fills up neo4j's window; neo4j sends a frame saying "my window is full", and then another frame which fills up py2neo's window. Then we spend ~28 seconds with each of them just saying "yup, my window is still full". Eventually neo4j opens its window again, py2neo sends a bit more data, and then py2neo opens its window. Both of them send a bit more data, then py2neo finishes sending its request, and neo4j sends more data before also finishing.
So I'm thinking that maybe the problem is something like, both of them are refusing to process more data until they've sent some more, and neither can send some more until the other processes some. Eventually neo4j enters a "something's gone wrong" loop, which py2neo interprets as "go ahead and send more data".
It's interesting, but I'm not sure what it means, that the penultimate TCP frame sent from neo4j to py2neo starts \r\n1280\r\n - the beginning of the fake-chunk. The \r\n8000\r\n that starts the actual chunk just appears part-way through an unremarkable TCP frame. (It was the third frame sent after py2neo finished sending its POST request.)
EDIT 2: I checked to see precisely where Python was hanging. Unsurprisingly, it was while sending the request - so BatchRequestList._execute() doesn't return until after neo4j gives up, which is why neither .run() nor .stream() did any better than .submit().
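Since the failure seems tied to request size, one thing worth trying is capping how much goes into each batch. The helper below is plain Python, and chunked is a hypothetical name. Whether smaller batches actually avoid the stall described above is an assumption, not something the investigation has confirmed.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    # Emit whatever is left over as a final, shorter chunk.
    if chunk:
        yield chunk


# Assumed usage with py2neo 1.6 (not verified here); `films` and
# `film_properties` are hypothetical:
#   for group in chunked(films, 200):
#       batch = neo4j.WriteBatch(graph_db)
#       for film in group:
#           batch.create(film_properties(film))
#       batch.submit()
```

Because the trigger appears to be bytes rather than request count, the chunk size would have to be tuned against the largest property payloads.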
Source: (StackOverflow)
I was wondering if there is a function to remove all uniqueness constraints of a SchemaResource without specifying its labels and property keys.
It may be possible by retrieving Graph.node_labels, then iterating through them to find SchemaResource.get_indexes(), and finally calling SchemaResource.drop_uniqueness_constraint() for each tuple.
It may also be possible with CypherResource.execute() and a Cypher query.
Is there another option?
Source: (StackOverflow)
I execute a long-running (5 minutes) Cypher query with py2neo 2.0:
graph.cypher.run(query)
or result = graph.cypher.execute(query)
The query fails after ~60 sec with a Socket Error from httpstream:
ERROR:httpstream:! SocketError: timed out
The same happens when I use a Cypher transaction. This did not happen with the same query and py2neo 1.6.4. Can I increase the time py2neo waits for a response? I didn't find anything in the docs.
Update
I found a hard-coded socket_timeout in py2neo.packages.httpstream.http. Setting it to a higher value avoids the SocketError:
from py2neo.packages.httpstream import http
http.socket_timeout = 9999
result = graph.cypher.execute("MATCH (g:Gene) RETURN count(g)")
Can I somehow set the timeout for a single query?
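Absent a per-query option, one workaround is to raise the module-level value only for the duration of a single call and restore it afterwards. The sketch below is a generic context manager with a hypothetical name, temporary_attr. It assumes httpstream reads socket_timeout when each request is made, and because it mutates global state it is not thread-safe.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore it."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore the old value even if the query raised.
        setattr(obj, name, previous)


# Assumed usage (not verified here):
#   from py2neo.packages.httpstream import http
#   with temporary_attr(http, "socket_timeout", 9999):
#       result = graph.cypher.execute(query)
```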
Source: (StackOverflow)
I have started working with Neo4j and have been exploring batch processing, but unfortunately I am having some problems creating relations between nodes.
My problem is the following. I have a list of websites and users that I read from a file. I may have repeated websites and users in that file, so I do not want to insert new nodes for those repeated entries. But as the file is big, I want to batch the processing of the nodes and relations.
Basically, I have these two functions to create nodes and relations and add them to the batch.
from py2neo import neo4j, rel
graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)
def create_node(pvalue, svalue, type):
    # Queue a node creation; this returns a batch request reference, not a node
    return batch.create({
        "pkey": pvalue,
        "skey": svalue,
        "type": type
    })
def create_rel(from_node, type_label, to_node, fields):
    # Queue a relationship carrying the account key as a property
    properties = {"ACCT_KEY": fields.ACCT_KEY}
    relation = rel(from_node, type_label, to_node, **properties)
    batch.create(relation)
Then, after using a dictionary to make sure I have not created the nodes before, I do:
node1 = create_node("ATTRIBUTE_1", "ATTRIBUTE_2", "WEBSITE")
node2 = create_node("ATTRIBUTE_3", "ATTRIBUTE_4", "USER")
create_rel(node1, "VISITED_BY", node2, fields)
I save the references to "node1" and "node2" in a dictionary, so when I want to create a relation involving a website or a user that has already been registered, I will not create the node again, but use directly the reference. I do this inside a loop and it works fine, till I decide to do this after a certain number of iterations:
batch.submit()
batch.clear()
When I decide to use those references from previous batches, I get the following error:
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    create_rel(cardholder, fraud_label, merchant,fields)
  File "main.py", line 33, in create_rel
    batch.create(relation)
  File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2775, in create
    "to": self._uri_for(entity.end_node)
  File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2613, in _uri_for
    uri = "{{{0}}}".format(self.find(resource))
  File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2604, in find
    raise ValueError("Request not found")
ValueError: Request not found
I believe that this happens because it somehow loses the references from the previous batches and they are no longer valid. I have tried to collect the IDs from the nodes and use those instead, but I cannot find how to do it. Any help would be appreciated, thanks.
My Neo4j version is "2.0.3 community edition for Unix" and my py2neo version is 1.6.4.
Source: (StackOverflow)
I'm trying to build a date tree in my Neo4j database that will work with the calendar module in Nigel Small's py2neo library.
I used Mark Needham's starter code from here (http://java.dzone.com/articles/neo4j-cypher-creating-time), but that doesn't connect all of the Year nodes to a master calendar node, which is required for the py2neo library. (docs here: http://book.py2neo.org/en/latest/calendar/)
I've modified Mark's code to try and create a master node to connect all of the years like this:
CREATE (x:Calendar {name:'master'})
WITH range(2005, 2014) AS years, range(1,12) as months
FOREACH(year IN years |
  CREATE (y:Year {year: year})
  MERGE (x)-[:YEAR]->(y)
  FOREACH(month IN months |
    CREATE (m:Month {month: month})
    MERGE (y)-[:MONTH]->(m)
    FOREACH(day IN (CASE
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31)
                      WHEN month = 2 THEN
                        CASE
                          WHEN year % 4 <> 0 THEN range(1,28)
                          WHEN year % 100 <> 0 THEN range(1,29)
                          WHEN year % 400 <> 0 THEN range(1,29)
                          ELSE range(1,28)
                        END
                      ELSE range(1,30)
                    END) |
      CREATE (d:Day {day: day})
      MERGE (m)-[:DAY]->(d))))
What's happening is that there's a node (with Calendar label) that's getting created but no relationships attached to it, while there's a node (with no label) that's getting created and attached to each Year node.
I know this is probably a really easy fix, but I'm new to Cypher and am really struggling to figure this out.
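Separately from the relationship problem, the nested CASE in the query above can be cross-checked against Python's standard library. The helper names below are hypothetical. Note that the last two CASE branches differ from the Gregorian rule for years divisible by 100 (such as 1900 or 2000), which does not affect the 2005 to 2014 range used here.

```python
import calendar


def days_in_month(year, month):
    """Number of days in the given month, using the Gregorian leap rule."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    return calendar.monthrange(year, month)[1]


def day_range(year, month):
    """Day numbers for the month, matching Cypher's inclusive range(1, n)."""
    return list(range(1, days_in_month(year, month) + 1))


# Example: February lengths for the years the query creates.
february = {year: days_in_month(year, 2) for year in range(2005, 2015)}
```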
Source: (StackOverflow)
I am creating a lot of nodes in Neo4j using Python's py2neo. I am using Neo4j version 2, which has support for labels.
I would like to add a label to the nodes I have created in batches. Is there a way to do that?
a = batch.create(node(name='Alice'))
Now I would like to add the labels "Female", and "Human" to a.
Source: (StackOverflow)