EzDevInfo.com

xhtml2pdf

HTML/CSS to PDF converter written in Python - HTML2PDF script

http://www.xhtml2pdf.com

Convert arabic page using xhtml2pdf.pisa in Python

I'm trying to convert an HTML page to PDF with the pisa utility. Please check the code below. I'm getting an error that I can't figure out.

Traceback (most recent call last):
  File "dewa.py", line 27, in <module>
    html = html.encode(enc, 'replace')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 203: ordinal not in range(128)

Here is the code:

from cStringIO import StringIO
from grab import Grab
from grab.tools.lxml_tools import drop_node, render_html
from grab.tools.text import remove_bom
from lxml import etree
import grab.error
import inspect
import lxml
import os
import sys
import xhtml2pdf.pisa as pisa

enc = 'utf-8'
filePath = '~/Desktop/dewa'
##############################

g = Grab()
g.go('http://www.dewa.gov.ae/arabic/aboutus/dewahistory.aspx')

html = g.response.body

html = html.replace('bgcolor="EDF389"', 'bgcolor="#EDF389"')


''' clear page '''
html = html.encode(enc, 'replace')

print html

f = file(filePath + '.html' , 'wb')
f.write(html)
f.flush()
f.close()

''' Save PDF '''
pdfresult = StringIO()
pdf = pisa.pisaDocument(StringIO(html), pdfresult, encoding = enc)
f = file(filePath + '.pdf', 'wb')
f.write(pdfresult.getvalue())
f.flush()
f.close()
pdfresult.close()

Source: (StackOverflow)
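
The traceback in the question above is consistent with calling .encode() on a value that is already a byte string: Python 2 first decodes it implicitly as ASCII, and that fails on the first non-ASCII byte (0xd9 is a typical lead byte of Arabic text in UTF-8). The following is a minimal sketch in Python 3 syntax, not the asker's code; the helper name to_utf8_bytes is hypothetical, and it assumes the page is served as UTF-8.

```python
def to_utf8_bytes(body, source_encoding="utf-8"):
    """Return `body` as UTF-8 bytes, decoding first if it is a byte string.

    `source_encoding` is an assumption about how the page was served.
    """
    if isinstance(body, bytes):
        # Decode explicitly instead of relying on an implicit ASCII decode.
        text = body.decode(source_encoding, "replace")
    else:
        text = body
    return text.encode("utf-8")


# An Arabic word written with escapes, encoded the way an HTTP body might arrive.
raw = "\u0645\u0631\u062d\u0628\u0627".encode("utf-8")
html_bytes = to_utf8_bytes(raw)
print(html_bytes == raw)
```

The design choice is to decode exactly once, at the boundary where the bytes arrive, and to encode exactly once, just before handing the data to the converter.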

Is there a workaround to get floating divs in Report Lab?

I generate PDFs with the xhtml2pdf Python package. The output is not optimal. I use floating divs to place images and text on the page. In HTML this works, but after PDF rendering, images and text are placed underneath each other, which is not what I want. From surfing the web I learned that the Report Lab package used by xhtml2pdf cannot handle floating divs. Does a workaround exist? I have tried WebKit rendering via Qt, but the resulting PDFs are of low quality, i.e. character spacing is completely wrong.

Source: (StackOverflow)

Why do my images not show up when using XHTML2PDF on Google App Engine?

I have followed exactly the code here: Convert HTML into PDF using Python, but my images are still not showing up. They have absolute URLs, in any case.

xhtml2pdf and reportlab are both placed in my app folder as modules, so no import errors pop up or anything. The PDF renders fine, except that images are not being displayed. I tried to remove HTML and CSS width/height attributes as well to no avail.

Any pointers?

Source: (StackOverflow)

Text inside the table cell improperly aligned

I'm using xhtml2pdf (formerly pisa, or is it vice versa? :)) to generate a PDF from a Django template. The template renders fine, but the PDF I get from it is broken in a very odd way: text in table cells is lifted to the top of the cell, so capital letters touch the cell's upper border:

[screenshot: PDF output]

In the browser it looks like this:

[screenshot: browser rendering]

I've tried:

Applying vertical-align - looks like it's just ignored, at least I didn't notice any changes in pdf, even if they were in generated html
Applying padding-top - it moves the text down, but increases the cell height as well.
Wrapping text into span with margin-top - same effect as padding-top

I think the reason is that text is rendered by xhtml2pdf at the very top of the line, while browsers tend to render it somewhere in the middle of the block. In other words the text block occupies the very same position both in pdf and html, but the text inside the block is shifted. But that's just my speculation.

So, has anyone faced the same issue? Am I doing something wrong? Any workarounds possible?

Pieces of code:

Rendered html: http://pastebin.com/4jMCLrA4
CSS: http://pastebin.com/vAn8HXkY
Code that generates PDF: http://pastebin.com/6wBULrhx

Source: (StackOverflow)

reportlab ValueError: Invalid color value 'initial'

ReportLab/xhtml2pdf worked perfectly until now, when it started crashing on this inline style in the HTML:

<p style="border-style: initial; border-color: initial; border-image: initial; 
 font-family: Ubuntu-R; font-size: small; border-width: 0px; padding: 0px; 
 margin: 0px;">Done:</p>

with this error:

File "/usr/local/lib/python2.7/dist-packages/reportlab/lib/colors.py",
line 850, in __call__
    raise ValueError('Invalid color value %r' % arg)
ValueError: Invalid color value 'initial'

I use it typically like this:

pdf = pisa.pisaDocument(StringIO.StringIO(html.encode('UTF-8')),
                        result, encoding='UTF-8', link_callback=fetch_resources)

Is there a way to overcome this other than patching its original code?

Source: (StackOverflow)
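
One possible workaround for the question above, short of patching the library, is to strip the offending declarations from the HTML before it reaches the converter. The sketch below is an assumption about what such a pre-processing step could look like; the function name drop_initial_declarations is hypothetical, and it only handles double-quoted inline style attributes whose values contain no semicolons.

```python
import re

# Matches a double-quoted inline style attribute and captures its contents.
STYLE_ATTR = re.compile(r'style="([^"]*)"', re.IGNORECASE)


def drop_initial_declarations(html):
    """Remove inline CSS declarations whose value is the keyword `initial`."""
    def clean(match):
        kept = []
        for declaration in match.group(1).split(";"):
            name, sep, value = declaration.partition(":")
            if not sep or not name.strip():
                continue  # empty fragment or malformed declaration
            if value.strip().lower() == "initial":
                continue  # the keyword reported in the ValueError
            kept.append("%s: %s" % (name.strip(), value.strip()))
        return 'style="%s"' % "; ".join(kept)

    return STYLE_ATTR.sub(clean, html)


sample = ('<p style="border-style: initial; border-color: initial; '
          'font-size: small; margin: 0px;">Done:</p>')
print(drop_initial_declarations(sample))
```

The cleaned string could then be encoded and passed to pisa.pisaDocument exactly as in the question. Stylesheets in <style> blocks would need separate handling.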

xhtml2pdf does not detect div width

I am using xhtml2pdf for converting html to pdf.

For some reason it does not detect the width of any div. I have tried setting the width with a style, but it still does not work. What am I doing wrong?

<html>
<head>

</head>
<body>
<style>
    div{
        width:100pt;
        height:100pt;
        border:Solid red 1pt;
    }
</style>
<div>
    WOw a pdf
</div>
</body>

</html>

In the above code the div does not have a width of 100px or 100pt.

def myview(request):
    options1 = ReportPropertyOption.objects.all()
    for option in options1:
        option.exterior_images = ReportExteriorImages.objects.filter(report = option)  
        option.interior_images = ReportInteriorImages.objects.filter(report = option)
        option.floorplan_images = ReportFloorPlanImages.objects.filter(report = option)

    html  = render_to_string('report/export.html', { 'pagesize' : 'A4', }, context_instance=RequestContext(request,{'options1':options1}))
    result = StringIO.StringIO()

    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), dest=result, link_callback=fetch_resources )
    if not pdf.err:
        return HttpResponse(result.getvalue(), mimetype='application/pdf')
    return HttpResponse('Gremlins ate your pdf! %s' % cgi.escape(html))

def fetch_resources(uri, rel):  

    path = os.path.join(settings.MEDIA_ROOT, uri.replace("/media/", ""))

    return path.replace("\\","/")

Source: (StackOverflow)

Django/Python: generate pdf with the proper language

I use Pisa/xhtml2pdf in my Django apps to generate pdf from an HTML source. That is:

I generate the HTML file, formatted with all the "printing" elements (e.g. page breaks, header, footer, etc.)
I convert this HTML into PDF using Pisa

This process works, but it is slow (especially when dealing with long tables), and I must write HTML/CSS that fits Pisa's features and limitations.

The question is: is this the right way to generate PDFs from a web application (i.e. create HTML and then convert it to PDF), or is there a more direct way, that is, "writing" the PDF with a more suitable language?

Source: (StackOverflow)

xhtml2pdf ImportError - Django

I installed xhtml2pdf using pip for use with Django. I am getting the following ImportError:

Reportlab Toolkit Version 2.2 or higher needed

But I have reportlab 3.0

>>> import reportlab
>>> print reportlab.Version
3.0

I found this try/except block in the __init__.py of xhtml2pdf:

REQUIRED_INFO = """
****************************************************
IMPORT ERROR!
%s
****************************************************

The following Python packages are required for PISA:
- Reportlab Toolkit >= 2.2 <http://www.reportlab.org/>
- HTML5lib >= 0.11.1 <http://code.google.com/p/html5lib/>

Optional packages:
- pyPDF <http://pybrary.net/pyPdf/>
- PIL <http://www.pythonware.com/products/pil/>

""".lstrip()

log = logging.getLogger(__name__)

try:
    from xhtml2pdf.util import REPORTLAB22

    if not REPORTLAB22:
        raise ImportError, "Reportlab Toolkit Version 2.2 or higher needed"
except ImportError, e:
    import sys

    sys.stderr.write(REQUIRED_INFO % e)
    log.error(REQUIRED_INFO % e)
    raise

There's also another error in the util.py:

if not (reportlab.Version[0] == "2" and reportlab.Version[2] >= "1"):

Shouldn't that read something like:

if not (reportlab.Version[:3] >="2.1"):

What gives?

Source: (StackOverflow)
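
To illustrate the version check discussed in the question above: the quoted condition requires the first character of the version string to be "2", so "3.0" is rejected. The string comparison proposed in the question accepts "3.0" but would misorder a two-digit major version. A minimal sketch that compares integers instead follows; the helper names are hypothetical and it is not the library's actual fix.

```python
import re


def major_minor(version):
    """Parse the leading 'major.minor' of a version string into two ints."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def at_least(version, minimum=(2, 1)):
    """True if `version` is greater than or equal to the `minimum` tuple."""
    return major_minor(version) >= minimum


def original_check(version):
    """The condition quoted from util.py, reproduced for comparison."""
    return version[0] == "2" and version[2] >= "1"


for candidate in ("2.0", "2.5", "3.0", "10.0"):
    print(candidate, original_check(candidate), at_least(candidate))
```

Tuples of integers compare element by element, which avoids the character-by-character pitfalls of comparing version strings.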

Django - Creating & Store PDF Files using XHTML2PDF

At the moment we use XHTML2PDF to generate PDFs dynamically and send them to the browser whenever required. Our requirements have now changed: the PDF should be generated only once and stored on the server, and the user should be shown a link to view it. Could you please point out any resources or snippets to achieve this?

Source: (StackOverflow)
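
A rough sketch of the "generate once, then serve the stored file" idea from the question above. The function ensure_pdf and the render_pdf callable are hypothetical names; render_pdf stands in for whatever existing code already produces the PDF bytes with XHTML2PDF.

```python
import os
import tempfile


def ensure_pdf(path, render_pdf):
    """Write the PDF to `path` only if it is not there yet; return `path`.

    `render_pdf` is a hypothetical zero-argument callable returning PDF bytes.
    """
    if not os.path.exists(path):
        directory = os.path.dirname(path) or "."
        os.makedirs(directory, exist_ok=True)
        # Write to a temporary file first so a crash never leaves a partial PDF.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "wb") as handle:
            handle.write(render_pdf())
        os.replace(tmp_path, path)
    return path


calls = []


def fake_render():
    """Stand-in renderer that records how often it is invoked."""
    calls.append(1)
    return b"%PDF-1.4 placeholder"


target = os.path.join(tempfile.mkdtemp(), "reports", "42.pdf")
ensure_pdf(target, fake_render)
ensure_pdf(target, fake_render)  # second call reuses the stored file
print(len(calls))
```

In a Django project the target path would typically live under MEDIA_ROOT, so that the stored file can be linked through MEDIA_URL; that layout is an assumption, not something stated in the question.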

Reportlab. Floating Text with two Columns

First of all, I'm new to python, reportlab, xhtml2pdf. I've already done my first pdf files with reportlab, but I ran into the following problem.

I need a large text in two columns.

First I create my canvas, create my story, append my large text as a paragraph to the story, create my Frame and finally add the story to the frame.

c = Canvas("local.pdf")
storyExample = []
textExample = (""" This is a very large text Lorem Ipsum ... """)
storyExample.append(Paragraph(textExample, styleText))
frameExample = Frame(0, 0, 50, 50,showBoundary=0)
frameExample.addFromList(storyExample,c)
c.showPage()
c.save()

Works like a charm. But I need to show the text in a two-column representation.

Now the text just flows through my frame like:

|aaaaaaaaaaaaaaaaaaaa|
|bbbbbbbbbbbbbbbbbbbb|
|cccccccccccccccccccc|
|dddddddddddddddddddd|

But I need it like this:

|aaaaaaaaa  bbbbbbbbbb|
|aaaaaaaaa  cccccccccc|
|bbbbbbbbb  cccccccccc|
|bbbbbbbbb  dddddddddd|

I hope you understood what I am trying to say.

Source: (StackOverflow)
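
For the two-column layout asked about above, one common ReportLab approach is to put two Frame objects side by side in a PageTemplate, so the story fills the left frame and then continues in the right one. The sketch below is an assumption-laden illustration: the helper names, margins and gutter are made up, and the ReportLab part is imported lazily so that the geometry helper can be tried without the library installed.

```python
def column_boxes(page_width, page_height, margin, gutter, columns=2):
    """Return (x, y, width, height) for each column, left to right."""
    usable = page_width - 2 * margin - gutter * (columns - 1)
    col_width = usable / columns
    height = page_height - 2 * margin
    return [(margin + i * (col_width + gutter), margin, col_width, height)
            for i in range(columns)]


def build_two_column_pdf(filename, text):
    """Lay `text` out in two columns on A4 (requires ReportLab)."""
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import (BaseDocTemplate, Frame, PageTemplate,
                                    Paragraph)

    # 36pt margins and an 18pt gutter are arbitrary illustration values.
    boxes = column_boxes(A4[0], A4[1], margin=36, gutter=18)
    frames = [Frame(x, y, w, h, id="col%d" % i)
              for i, (x, y, w, h) in enumerate(boxes)]
    doc = BaseDocTemplate(filename, pagesize=A4)
    doc.addPageTemplates([PageTemplate(id="two_col", frames=frames)])
    doc.build([Paragraph(text, getSampleStyleSheet()["Normal"])])


print(column_boxes(200, 100, margin=10, gutter=20))
```

Calling build_two_column_pdf("local.pdf", long_text) would replace the manual Canvas and addFromList steps from the question; when the left frame is full, the document template moves on to the right frame and then to a new page.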

Show different footers on first and consecutive pages with pisa/xhtml2pdf

I'm having some trouble getting a footer to appear as one frame on the first page of a Pisa document, and as another frame on every other page. I have attempted to adapt the lastPage idea from here, but with no luck.

Is it possible to do this? <pdf:nextpage /> doesn't seem to be the right thing here since the document has a long table that may (or may not) flow over multiple pages. <pdf:nextframe /> plus a first-page-only frame looks promising, though I'm not sure how to use this exactly.

Currently I have (snipped for brevity):

<style type="text/css">
  @page {
    margin: 1cm;
    margin-bottom: 2.5cm;
    @frame footer {
      -pdf-frame-content: footerFirst;
      -pdf-frame-border: 1;
      bottom: 2cm;
      margin-left: 1cm;
      margin-right: 1cm;
      height: 1cm;
   }
   @frame footer {
      -pdf-frame-content: footerOther;
      bottom: 2cm;
      margin-left: 1cm;
      margin-right: 1cm;
      height: 1cm;
}
</style>

<body>
  <table repeat="1">
    <!-- extra long table here -->
  </table>
  <div id="footerContent">This is a footer</div>
  <!-- what goes here to switch frames after the first page? -->
  <div id="footerOther"></div>
</body>

This places the same footer on each page. I need the same space left on each subsequent page, but with no content in the frame.

Source: (StackOverflow)

Is there a current tutorial or howto document for using the xhtml2pdf python module?

I can't seem to find a working tutorial or howto document for this module. Does one exist somewhere?

The "To be completed" section here: https://github.com/chrisglass/xhtml2pdf/blob/master/doc/usage.rst

is buggy, and doesn't seem to contain working code. After corrections, this code sequence:

from xhtml2pdf import pisa as pisa
filename = u'test.pdf'
pdf = pisa.CreatePDF("Hello <strong>World</strong>",file(filename, "wb"))
pisa.startViewer(filename)

produces an empty test.pdf file (well, not exactly empty: it's a PDF file without content).

Source: (StackOverflow)

xhtml2pdf and django, varying document size

I'm trying to build a view that renders itself to PDF. Each time I accessed the view, I had random issues with the structure of the rendered document or table. While tracking down the error, I narrowed it down to rendering completely static HTML, and found that the resulting document size is different on each request.

    template = get_template(self.get_report_template_name())
    html = template.render(Context({}))
    strobj = StringIO.StringIO()
    pisa.CreatePDF(html.encode("UTF-8"), strobj, encoding='UTF-8')
    return HttpResponse('len: %d' % strobj.len);

As you can see, the very same template is rendered each time, with an empty context, to make sure nothing changes. In any case, the template doesn't use the Django template language at all.

The code above returns a slightly different result each time I refresh the page:

len: 2573, len: 2595, len: 2234, len: 2601, len: 2244, len: 2632,

etc. (some of the values are repeated multiple times).

When saved and displayed, these documents contain a "broken" page structure, such as an incorrectly displayed table cell. Only one of them looks correct.

Any suggestions where to find the problem?

Source: (StackOverflow)

Python using xhtml2pdf to print webpage into PDF

Good day. I am trying to use xhtml2pdf to print a web page to a PDF file on local disk. I found the example below.

It runs without returning an error. However, it doesn't convert the web page, only a single sentence: in this case, only 'http://www.yahoo.com/' is written into the PDF file.

How can I actually convert the web page into PDF? Thanks.

from xhtml2pdf import pisa

sourceHtml = 'http://www.yahoo.com/'
outputFilename = "test.pdf"

def convertHtmlToPdf(sourceHtml, outputFilename):
    resultFile = open(outputFilename, "w+b")
    pisaStatus = pisa.CreatePDF(sourceHtml,resultFile)
    resultFile.close()
    return pisaStatus.err

if __name__=="__main__":
    pisa.showLogging()
    convertHtmlToPdf(sourceHtml, outputFilename)

Source: (StackOverflow)
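
In the question above, the string passed to pisa.CreatePDF is the URL itself, so the URL text is what gets rendered. A plausible fix is to download the page first and pass its HTML to the converter. The sketch below assumes Python 3; fetch_html and convert_url_to_pdf are hypothetical helper names, and xhtml2pdf is imported lazily inside the function that needs it.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download `url` and return its body decoded to text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, "replace")


def convert_url_to_pdf(url, output_filename):
    """Fetch a page and hand its HTML (not the URL) to xhtml2pdf."""
    from xhtml2pdf import pisa  # requires xhtml2pdf to be installed

    html = fetch_html(url)
    with open(output_filename, "w+b") as result_file:
        status = pisa.CreatePDF(html, dest=result_file)
    return status.err
```

A call such as convert_url_to_pdf('http://www.yahoo.com/', 'test.pdf') would then render the page's markup. xhtml2pdf does not execute JavaScript and supports only part of CSS, so a complex site may still look quite different from the browser view.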

Does the xhtml2pdf python library support the <img> tag?

I am trying to export an html document to pdf using the xhtml2pdf python library.

I think the <img> tag is supported - however the docs are not clear on this matter - there are a couple of test cases using the tag.

Following the example in the docs, with an image added, I did this:

from xhtml2pdf import pisa
sourceHtml = "<html><body><div><img src ='testimage.jpg'></div><p>Some text output for testing...<p></body></html>"
outputFilename = "test.pdf"
resultFile = open(outputFilename, "w+b")
pisa.CreatePDF(sourceHtml,dest=resultFile)
resultFile.close()

However no image was included in the resulting pdf. Reading around, I see that this might be to do with the PIL package - which appears to be installed OK on my system.

My question is should I be expecting the above code to work with xhtml2pdf or does it ignore the <img> tag?

Source: (StackOverflow)
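
One thing that may be worth checking for the question above is how the relative path 'testimage.jpg' is resolved. The link_callback parameter, used as fetch_resources(uri, rel) in an earlier question on this page, lets the caller map each URI to a file path. The sketch below is only an assumption about a possible cause; make_link_callback is a hypothetical helper, and a missing imaging library would be a separate problem that this does not address.

```python
import os


def make_link_callback(base_dir):
    """Build a callback that maps relative URIs to paths under `base_dir`."""
    def link_callback(uri, rel):
        # Leave full URLs and absolute paths untouched.
        if "://" in uri or os.path.isabs(uri):
            return uri
        return os.path.normpath(os.path.join(base_dir, uri))

    return link_callback


callback = make_link_callback(os.getcwd())
print(callback("testimage.jpg", None))
```

The callback would be passed along with the other arguments, e.g. pisa.CreatePDF(sourceHtml, dest=resultFile, link_callback=callback).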