psycopg2
Python PostgreSQL adapter
PostgreSQL + Python | Psycopg python adapter for postgresql
Trying to set up Postgres with the Postgres Mac app and hit this error, which I haven't been able to solve. Any thoughts?
ImportError: dlopen(/Users/Craig/pyenv/mysite/lib/python2.7/site-packages/psycopg2/_psycopg.so, 2): Library not loaded: @executable_path/../lib/libssl.1.0.0.dylib
Referenced from: /Applications/Postgres.app/Contents/MacOS/lib/libpq.dylib
Reason: image not found
Source: (StackOverflow)
I was trying to install Postgres for a tutorial, but pip gives me an error:
pip install psycopg
A snippet of the error I get:
Error: pg_config executable not found.
Please add the directory containing pg_config to the PATH
or specify the full executable path with the option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
Where is pg_config in my virtualenv? How do I configure it? I'm using virtualenv because I do not want a system-wide installation of Postgres.
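For context, pg_config ships with PostgreSQL itself (its client development files), not with virtualenv or pip, so a fresh virtualenv will not contain it. The following is a minimal sketch for checking whether it is reachable before running pip; the function name find_pg_config and its extra_dirs parameter are hypothetical helpers, not part of psycopg2.

```python
import os
import shutil


def find_pg_config(extra_dirs=()):
    """Return the full path to pg_config, or None if it cannot be found.

    extra_dirs is a hypothetical list of additional directories to search
    first, for example a PostgreSQL bin directory that is not on PATH.
    """
    search_path = os.pathsep.join(
        list(extra_dirs) + [os.environ.get("PATH", "")]
    )
    return shutil.which("pg_config", path=search_path)


# Usage: print where pg_config was found, or None if the build would fail.
# print(find_pg_config())
```

If this returns None, the build step will likely fail with the error above until the directory holding pg_config is added to PATH.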
Source: (StackOverflow)
I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
The only way I know is
args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)
but I want some simpler way.
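One possible approach, sketched here under assumptions, is to generate one placeholder group per row and pass all values as a single flat parameter list, so that psycopg2 does the quoting in one execute() call. The helper name build_multirow_insert is hypothetical, and the table and column names are assumed to be trusted because they are interpolated directly into the SQL.

```python
def build_multirow_insert(table, columns, rows):
    """Build one INSERT statement with a placeholder group per row.

    table and columns are assumed to be trusted identifiers (they are
    interpolated directly); only the row values are passed as parameters.
    """
    if not rows:
        raise ValueError("at least one row is required")
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([group] * len(rows))
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert("t", ["a", "b"], [(1, 2), (3, 4), (5, 6)])
# With an open psycopg2 cursor: cursor.execute(sql, params)
```

Newer psycopg2 releases (2.7 and later) also provide psycopg2.extras.execute_values, which may be simpler if that version is available.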
Source: (StackOverflow)
I got a lot of errors with the message :
"DatabaseError: current transaction is aborted, commands ignored until end of transaction block"
after changing from python-psycopg to python-psycopg2 as the Django project's database engine.
The code remains the same; I just don't know where those errors come from.
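This message generally means that an earlier statement in the same transaction failed, and PostgreSQL refuses further commands until that transaction is rolled back. A minimal sketch of a wrapper that rolls back on failure follows; run_in_transaction and work are hypothetical names, and conn is assumed to be an open DB-API connection such as one returned by psycopg2.connect().

```python
def run_in_transaction(conn, work):
    """Run work(cursor) and commit, rolling back if anything fails.

    conn is assumed to be an open DB-API connection; work is a
    hypothetical callable that receives a cursor.
    """
    cur = conn.cursor()
    try:
        result = work(cur)
        conn.commit()
        return result
    except Exception:
        # After a failed statement PostgreSQL ignores further commands
        # until the transaction ends, so end it before re-raising.
        conn.rollback()
        raise
    finally:
        cur.close()
```

The original exception is re-raised, so the first failing statement (the real cause) stays visible instead of being masked by the later "transaction is aborted" errors.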
Source: (StackOverflow)
Overview
I'm attempting to improve the performance of our database queries for SQLAlchemy. We're using psycopg2. In our production system, we're choosing to go with Java because it is simply faster by at least 50%, if not closer to 100%. So I am hoping someone in the Stack Overflow community has a way to improve my performance.
I think my next step is going to be to end up patching the psycopg2 library to behave like the JDBC driver. If that's the case and someone has already done this, that would be fine, but I am hoping I've still got a settings or refactoring tweak I can do from Python.
Details
I have a simple "SELECT * FROM someLargeDataSetTable" query running. The dataset is GBs in size. A quick performance chart is as follows:
Timing Table
Records | JDBC | SQLAlchemy[1] | SQLAlchemy[2] | Psql
--------------------------------------------------------------------
1 (4kB) | 200ms | 300ms | 250ms | 10ms
10 (8kB) | 200ms | 300ms | 250ms | 10ms
100 (88kB) | 200ms | 300ms | 250ms | 10ms
1,000 (600kB) | 300ms | 300ms | 370ms | 100ms
10,000 (6MB) | 800ms | 830ms | 730ms | 850ms
100,000 (50MB) | 4s | 5s | 4.6s | 8s
1,000,000 (510MB) | 30s | 50s | 50s | 1m32s
10,000,000 (5.1GB) | 4m44s | 7m55s | 6m39s | n/a
--------------------------------------------------------------------
5,000,000 (2.6GB) | 2m30s | 4m45s | 3m52s | 14m22s
--------------------------------------------------------------------
[1] - With the processrow function
[2] - Without the processrow function (direct dump)
I could add more (our data can be as much as terabytes), but I think the changing slope is evident from the data. JDBC just performs significantly better as the dataset size increases. Some notes...
Timing Table Notes:
- The data sizes are approximate, but they should give you an idea of the amount of data.
- I'm using the 'time' tool from a Linux bash commandline.
- The times are the wall clock times (i.e. real).
- I'm using Python 2.6.6 and I'm running with python -u
- Fetch Size is 10,000
- I'm not really worried about the Psql timing, it's there just as a reference point. I may not have properly set fetchsize for it.
- I'm also really not worried about the timing below the fetch size as less than 5 seconds is negligible to my application.
- Java and Psql appear to take about 1GB of memory resources; Python is more like 100MB (yay!!).
- I'm using the [cdecimal] library.
- I noticed a [recent article] discussing something similar to this. It appears that the JDBC driver design is totally different to the psycopg2 design (which I think is rather annoying given the performance difference).
- My use-case is basically that I have to run a daily process (with approximately 20,000 different steps... multiple queries) over very large datasets and I have a very specific window of time where I may finish this process. The Java we use is not simply JDBC, it's a "smart" wrapper on top of the JDBC engine... we don't want to use Java and we'd like to stop using the "smart" part of it.
- I'm using one of our production system's boxes (database and backend process) to run the query. So this is our best-case timing. We have QA and Dev boxes that run much slower and the extra query time can become significant.
testSqlAlchemy.py
#!/usr/bin/env python
# testSqlAlchemy.py
import sys
try:
    import cdecimal
    sys.modules["decimal"]=cdecimal
except ImportError,e:
    print >> sys.stderr, "Error: cdecimal didn't load properly."
    raise SystemExit
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def processrow (row,delimiter="|",null="\N"):
    # Convert one result row into a delimited string, replacing None with null.
    newrow = []
    for x in row:
        if x is None:
            x = null
        newrow.append(str(x))
    return delimiter.join(newrow)

fetchsize = 10000
connectionString = "postgresql+psycopg2://usr:pass@server:port/db"
eng = create_engine(connectionString, server_side_cursors=True)
session = sessionmaker(bind=eng)()

with open("test.sql","r") as queryFD:
    with open("/dev/null","w") as nullDev:
        query = session.execute(queryFD.read())
        cur = query.cursor
        # Keep fetching batches until the server-side cursor is exhausted.
        while cur.statusmessage not in ['FETCH 0','CLOSE CURSOR']:
            for row in query.fetchmany(fetchsize):
                print >> nullDev, processrow(row)
After timing, I also ran a cProfile and this is the dump of worst offenders:
Timing Profile (with processrow)
Fri Mar 4 13:49:45 2011 sqlAlchemy.prof
415757706 function calls (415756424 primitive calls) in 563.923 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 563.924 563.924 {execfile}
1 25.151 25.151 563.924 563.924 testSqlAlchemy.py:2()
1001 0.050 0.000 329.285 0.329 base.py:2679(fetchmany)
1001 5.503 0.005 314.665 0.314 base.py:2804(_fetchmany_impl)
10000003 4.328 0.000 307.843 0.000 base.py:2795(_fetchone_impl)
10011 0.309 0.000 302.743 0.030 base.py:2790(__buffer_rows)
10011 233.620 0.023 302.425 0.030 {method 'fetchmany' of 'psycopg2._psycopg.cursor' objects}
10000000 145.459 0.000 209.147 0.000 testSqlAlchemy.py:13(processrow)
Timing Profile (without processrow)
Fri Mar 4 14:03:06 2011 sqlAlchemy.prof
305460312 function calls (305459030 primitive calls) in 536.368 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 536.370 536.370 {execfile}
1 29.503 29.503 536.369 536.369 testSqlAlchemy.py:2()
1001 0.066 0.000 333.806 0.333 base.py:2679(fetchmany)
1001 5.444 0.005 318.462 0.318 base.py:2804(_fetchmany_impl)
10000003 4.389 0.000 311.647 0.000 base.py:2795(_fetchone_impl)
10011 0.339 0.000 306.452 0.031 base.py:2790(__buffer_rows)
10011 235.664 0.024 306.102 0.031 {method 'fetchmany' of 'psycopg2._psycopg.cursor' objects}
10000000 32.904 0.000 172.802 0.000 base.py:2246(__repr__)
Final Comments
Unfortunately, the processrow function needs to stay unless there is a way within SQLAlchemy to specify null = 'userDefinedValueOrString' and delimiter = 'userDefinedValueOrString' of the output. The Java we are using currently already does this, so the comparison (with processrow) needed to be apples to apples. If there is a way to improve the performance of either processrow or SQLAlchemy with pure Python or a settings tweak, I'm very interested.
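As a hedged sketch of the pure-Python side only, the row formatting can be written as a single join over a generator expression, which avoids building an intermediate list with repeated append calls. It is written for Python 3 (the original script targets Python 2.6), the name process_row is hypothetical, and any speedup is an assumption that would need to be measured on the real data.

```python
def process_row(row, delimiter="|", null="\\N"):
    """Join one row into a delimited line, writing null for None values."""
    return delimiter.join(null if x is None else str(x) for x in row)
```

Separately, PostgreSQL's COPY ... TO STDOUT accepts DELIMITER and NULL options, and psycopg2 cursors expose copy_expert(), so letting the server produce the delimited output may be worth testing as an alternative to formatting rows in Python.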
Source: (StackOverflow)
What fork, or combination of packages, should one use to make PyPy, Django and PostgreSQL play nice together?
I know that PyPy and Django play nice together, but I am less certain about PyPy and PostgreSQL. I do see that Alex Gaynor has made a fork of PyPy called pypy-postgresql. I also know that some people are using psycopg2-ctypes.
Is there a difference between these forks? Or should we use the stable 1.9 PyPy and use psycopg2-ctypes? Using the ctypes options could hurt performance, see the comment below.
Also, has anyone experienced any pitfalls with using PyPy with psycopg2? It seems easy enough to fall back on CPython if something isn't working right, but mostly I'm looking for things a programmer can do ahead of time to prepare.
I looked around, it doesn't seem that psycopg2 works natively with PyPy. Although, psycopg2-ctypes does seem to be working for some people, there was a discussion on pypy-dev. I work on Windows, and I don't think psycopg2-ctypes is ready for Windows yet, sadly.
Source: (StackOverflow)
I am trying to set up a PostgreSQL database for my django project, which I believe I have done now thanks to the replies to my last question, "Problems setting up a postgreSQL database for a django project". I am now trying to run the command 'python manage.py runserver' in Terminal to get my localhost up, but when I run the command, I see this response...
Error: No module named psycopg2.extensions
I'm not sure what this means - I have tried to download psycopg2 but can't seem to find a way to download psycopg2 using homebrew. I have tried easy_install, pip install and sudo but all return errors like this...
Downloading http://www.psycopg.org/psycopg/tarballs/PSYCOPG-2-4/psycopg2-2.4.5.tar.gz
Processing psycopg2-2.4.5.tar.gz
Writing /tmp/easy_install-l7Qi62/psycopg2-2.4.5/setup.cfg
Running psycopg2-2.4.5/setup.py -q bdist_egg --dist-dir /tmp/easy_install-l7Qi62/psycopg2-2.4.5/egg-dist-tmp-PBP5Ds
no previously-included directories found matching 'doc/src/_build'
unable to execute gcc-4.0: No such file or directory
error: Setup script exited with error: command 'gcc-4.0' failed with exit status 1
Any help on how to fix this would be very much appreciated!
Thanks
Jess
(Sorry if this is a really simple problem to solve - this is my first django project)
Source: (StackOverflow)
I'm new to Python and Django.
I'm configuring a Django project using the PostgreSQL database engine backend, but I'm getting errors on every database operation. For example, when I run manage.py syncdb, I'm getting:
C:\xampp\htdocs\djangodir>python manage.py syncdb
Traceback (most recent call last):
  File "manage.py", line 11, in <module>
    execute_manager(settings)
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 438, in execute_manager
    utility.execute()
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 379, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 261, in fetch_command
    klass = load_command_class(app_name, subcommand)
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 67, in load_command_class
    module = import_module('%s.management.commands.%s' % (app_name, name))
  File "C:\Python27\lib\site-packages\django\utils\importlib.py", line 35, in import_module
    __import__(name)
  File "C:\Python27\lib\site-packages\django\core\management\commands\syncdb.py", line 7, in <module>
    from django.core.management.sql import custom_sql_for_model, emit_post_sync_signal
  File "C:\Python27\lib\site-packages\django\core\management\sql.py", line 6, in <module>
    from django.db import models
  File "C:\Python27\lib\site-packages\django\db\__init__.py", line 77, in <module>
    connection = connections[DEFAULT_DB_ALIAS]
  File "C:\Python27\lib\site-packages\django\db\utils.py", line 92, in __getitem__
    backend = load_backend(db['ENGINE'])
  File "C:\Python27\lib\site-packages\django\db\utils.py", line 33, in load_backend
    return import_module('.base', backend_name)
  File "C:\Python27\lib\site-packages\django\utils\importlib.py", line 35, in import_module
    __import__(name)
  File "C:\Python27\lib\site-packages\django\db\backends\postgresql\base.py", line 23, in <module>
    raise ImproperlyConfigured("Error loading psycopg module: %s" % e)
django.core.exceptions.ImproperlyConfigured: Error loading psycopg module: No module named psycopg
Can someone give me a clue on what is going on?
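The traceback shows Django loading its older "postgresql" backend, which imports the psycopg (version 1) module. In Django releases of that era, the backend that imports psycopg2 was named postgresql_psycopg2. A settings fragment along these lines is one thing to check, assuming psycopg2 itself is installed; every database name, user, and password below is a placeholder.

```python
# settings.py (fragment); all values below are placeholders.
DATABASES = {
    "default": {
        # Backend that loads psycopg2 rather than the older psycopg module.
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "mypassword",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```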
Source: (StackOverflow)
Is it possible to install psycopg2 into a virtualenv when PostgreSQL isn't installed on my development system (a MacBook Pro with OS X 10.6)?
When I run pip install psycopg2 from within my virtualenv, I receive the error shown below.
I'm trying to connect to a legacy database on a server using Django, and I'd prefer not to install PostgreSQL on my development system if possible.
Why not install PostgreSQL?
I received an error when installing PostgreSQL using homebrew. I have Xcode4 (and only Xcode4) installed on my MacBook Pro and am thinking it's related to missing gcc 4.0. However, this is a problem for another StackOverflow question.
Update 8:37 AM on April 12, 2011: I'd still like to know if this is possible without installing PostgreSQL on my MacBook Pro. However, I ran brew update and forced a reinstallation of ossp-uuid with brew install --force ossp-uuid and now brew install postgresql works. With PostgreSQL successfully installed, I was able to pip install psycopg2 from within my virtualenv.
Error from pip install psycopg2
$ pip install psycopg2
Downloading/unpacking psycopg2
Running setup.py egg_info for package psycopg2
Error: pg_config executable not found.
Please add the directory containing pg_config to the PATH
or specify the full executable path with the option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
Complete output from command python setup.py egg_info:
running egg_info
writing pip-egg-info/psycopg2.egg-info/PKG-INFO
writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt
writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt
warning: manifest_maker: standard file '-c' not found
Error: pg_config executable not found.
Please add the directory containing pg_config to the PATH
or specify the full executable path with the option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in /Users/matthew/.pip/pip.log
Source: (StackOverflow)
I am currently installing psycopg2 to work with Python in Eclipse.
I am running into a lot of problems:
- The first problem:
sudo pip3.4 install psycopg2
is not working and it shows the following message:
Error: pg_config executable not found.
FIXED WITH: export PATH=/Library/PostgreSQL/9.4/bin/:"$PATH"
- When I import psycopg2 in my project I get:
ImportError:
dlopen(/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/psycopg2/_psycopg.so
Library libssl.1.0.0.dylib
Library libcrypto.1.0.0.dylib
FIXED WITH:
sudo ln -s /Library/PostgreSQL/9.4/lib/libssl.1.0.0.dylib /usr/lib
sudo ln -s /Library/PostgreSQL/9.4/lib/libcrypto.1.0.0.dylib /usr/lib
- Now I am getting:
ImportError:
dlopen(/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/psycopg2/_psycopg.so,
2): Symbol not found: _lo_lseek64 Referenced from:
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/psycopg2/_psycopg.so
Expected in: /usr/lib/libpq.5.dylib in
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/psycopg2/_psycopg.so
After 3 hours I still haven't found the solution. Can you help me?
Thank you so much!
Regards Benja.
Source: (StackOverflow)
Been trying to install psycopg2 with either easy_install or pip, and the terminal gets stuck in a loop between xcrun and lipo.
sidwyn$ sudo easy_install psycopg2
Searching for psycopg2
Reading https://pypi.python.org/simple/psycopg2/
Reading http://initd.org/psycopg/
Reading http://initd.org/projects/psycopg2
Best match: psycopg2 2.5.1
Downloading https://pypi.python.org/packages/source/p/psycopg2/psycopg2-2.5.1.tar.gz#md5=1b433f83d50d1bc61e09026e906d84c7
Processing psycopg2-2.5.1.tar.gz
Writing /tmp/easy_install-dTk7cd/psycopg2-2.5.1/setup.cfg
Running psycopg2-2.5.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-dTk7cd/psycopg2-2.5.1/egg-dist-tmp-4jaXas
clang: warning: argument unused during compilation: '-mno-fused-madd'
It bounces between xcrun and lipo and is stuck forever in this loop. Would appreciate some insights on this.
I'm on OS X Mavericks 10.9, latest build.
Source: (StackOverflow)
How can I determine if a table exists using the Psycopg2 Python library? I want a true or false boolean.
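One way to get a plain boolean, sketched under assumptions, is to ask information_schema.tables through an EXISTS query. The helper name table_exists is hypothetical, cur is assumed to be an open psycopg2 cursor, and the default schema "public" is an assumption about where the table lives.

```python
def table_exists(cur, table_name, schema="public"):
    """Return True if the table exists in the given schema, else False.

    cur is assumed to be an open psycopg2 cursor; the "public" default
    schema is an assumption.
    """
    cur.execute(
        "SELECT EXISTS ("
        "SELECT 1 FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name = %s)",
        (schema, table_name),
    )
    # EXISTS always yields exactly one row with one boolean column.
    return bool(cur.fetchone()[0])
```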
Source: (StackOverflow)
I would like a general way to generate column labels directly from the selected column names, and recall seeing that python's psycopg2 module supports this feature.
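The DB-API attribute cursor.description, which psycopg2 populates after a query, holds one entry per result column with the column name as its first item. A minimal sketch for reading the labels follows; the helper name column_names is hypothetical and cur is assumed to be a cursor on which a SELECT has already been executed.

```python
def column_names(cur):
    """Return the column labels of the last query run on this cursor."""
    # Each entry in cursor.description describes one result column;
    # its first item is the column name.
    return [col[0] for col in cur.description]
```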
Source: (StackOverflow)
I'm using Python and psycopg2 to interface to postgres.
When I insert a row...
sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
sql_string += hundred_name + ", '" + hundred_slug + "', " + status + ");"
cursor.execute(sql_string)
... how do I get the ID of the row I've just inserted? Trying:
hundred = cursor.fetchall()
returns an error, while using RETURNING id:
sql_string = "INSERT INTO domes_hundred (name,name_slug,status) VALUES ("
sql_string += hundred_name + ", '" + hundred_slug + "', " + status + ") RETURNING id;"
hundred = cursor.execute(sql_string)
simply returns None.
UPDATE: So does currval (even though using this command directly into postgres works):
sql_string = "SELECT currval(pg_get_serial_sequence('hundred', 'id'));"
hundred_id = cursor.execute(sql_string)
Can anyone advise?
thanks!
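In psycopg2, cursor.execute() returns None; results, including rows produced by RETURNING, are read with a fetch call afterwards. The sketch below combines that with query parameters instead of string concatenation. The function name insert_hundred is hypothetical, cur is assumed to be an open psycopg2 cursor, and the table and column names are taken from the first snippet in the question.

```python
def insert_hundred(cur, name, name_slug, status):
    """Insert one row and return its generated id.

    cur is assumed to be an open psycopg2 cursor; the table and column
    names are taken from the question.
    """
    cur.execute(
        "INSERT INTO hundred (name, name_slug, status) "
        "VALUES (%s, %s, %s) RETURNING id",
        (name, name_slug, status),
    )
    # execute() itself returns None; the result is read with a fetch call.
    return cur.fetchone()[0]
```

Passing the values as parameters also lets the driver handle quoting, which the concatenated version leaves to the caller.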
Source: (StackOverflow)
I haven't worked with psycopg2 before but I'm trying to change the cursor factory to DictCursor so that fetchall or fetchone will return a dictionary instead of a list.
I created a test script to make things simple and only test this functionality. Here's my little bit of code that I feel should work
import psycopg2
import psycopg2.extras
conn = psycopg2.connect("dbname=%s user=%s password=%s" % (DATABASE, USERNAME, PASSWORD))
cur = conn.cursor(cursor_factory = psycopg2.extras.DictCursor)
cur.execute("SELECT * from review")
res = cur.fetchall()
print type(res)
print res
The res variable is always a list and not a dictionary as I would expect.
A current workaround that I've implemented is to use this function that builds a dictionary and run each row returned by fetchall through it.
def build_dict(cursor, row):
    # Map each column name from cursor.description to the row's value.
    x = {}
    for key,col in enumerate(cursor.description):
        x[col[0]] = row[key]
    return x
Python is version 2.6.7 and psycopg2 is version 2.4.2.
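Note that fetchall() returns a list of rows whichever cursor factory is used; with DictCursor it is the individual rows that should be indexable by column name, for example res[0]["some_column"], where some_column is a placeholder. If plain dict objects are wanted, a sketch like the following converts the whole result in one pass; rows_as_dicts is a hypothetical helper and cur is assumed to be a cursor with an executed query.

```python
def rows_as_dicts(cur):
    """Return the remaining rows of the last query as plain dicts."""
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]
```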
Source: (StackOverflow)