EzDevInfo.com

python-docx

Reads, queries and modifies Microsoft Word 2007/2008 docx files.

Django Choices Reference in View

Model.py:

from django.db import models

class RiskIssue(models.Model):
    RISK_ISSUE_SEVERITY = (
        ('L', 'Low'),
        ('M', 'Medium'),
        ('H', 'High'),
    )
    projectRiskIssueSeverity = models.CharField("Risk/Issue Severity", max_length=1, choices=RISK_ISSUE_SEVERITY, default='L')

View.py code:

cell = table.rows[2].cells[0]
formatted_status = [astatus.get_risk_issue_severity_display() for astatus in activitylist.values_list('activityStatus', flat=True)]
cell.paragraphs[0].text = ', '.join(formatted_status)

I am getting the error: 'unicode' object has no attribute 'get_risk_issue_severity_display'

What am I doing wrong? I have tried everything except the correct answer. I assume this is a very small type problem.


Source: (StackOverflow)
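A likely cause of the error above: `values_list(..., flat=True)` returns the raw stored values (plain `unicode` strings under Python 2), not model instances, so there is no `get_..._display()` method to call on them. Django also names that method after the field, so on a `RiskIssue` instance it would be `get_projectRiskIssueSeverity_display()`. The sketch below keeps `values_list` and maps the codes to labels with a dict built from the choices tuple; it assumes the queried `activityStatus` field uses the same `RISK_ISSUE_SEVERITY` choices, which the question does not show, and `format_severities` is an illustrative helper name.

```python
# Choices tuple copied from the question's RiskIssue model.
RISK_ISSUE_SEVERITY = (
    ('L', 'Low'),
    ('M', 'Medium'),
    ('H', 'High'),
)

# values_list(..., flat=True) yields raw stored codes such as 'L' or 'H',
# so look the labels up in a dict built from the same choices tuple.
SEVERITY_LABELS = dict(RISK_ISSUE_SEVERITY)


def format_severities(codes):
    """Return a comma-separated string of human-readable labels.

    Unknown codes are passed through unchanged instead of raising KeyError.
    """
    return ', '.join(SEVERITY_LABELS.get(code, code) for code in codes)


# In the view this would become something like (assumption: the queried
# field uses RISK_ISSUE_SEVERITY as its choices):
#   codes = activitylist.values_list('activityStatus', flat=True)
#   cell.paragraphs[0].text = format_severities(codes)
print(format_severities(['L', 'H', 'M']))  # Low, High, Medium
```

If model instances are needed anyway, iterating the queryset itself and calling `obj.get_projectRiskIssueSeverity_display()` on each object gives the same labels.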

python-docx - how to control font and other attributes

I'm using python-docx to generate .docx files. I'd like to be able to control the style of the paragraphs (and of individual words within if that's possible) that I'm appending to the document body.

The business end of generating a paragraph looks like this:

body.append(paragraph("This is a new paragraph"))

Now, I'm not familiar with the complexities of XML and to be honest, learning enough of it to parse and manipulate it using lxml would be overkill for what I have in mind. Can anyone provide a simple example of, say, changing the font of my paragraph above from the default to Courier New?

There are a couple of similar (but unanswered) questions here, so apologies for the breach of protocol if I've got it wrong.


Source: (StackOverflow)


Cell spanning multiple columns in table using python-docx

I'm trying to create a table that looks like this, using the python-docx module.

[Image: example of the table I want to create]

Working from the example code for creating a table in example-makedocument.py and reading through the code in docx.py, I thought something similar to this would work:

tbl_rows = [['A1'],
            ['B1', 'B2'],
            ['C1', 'C2']]
tbl_colw = [[100],
            [25, 75],
            [25, 75]]
tbl_cwunit = 'pct'

body.append(table(tbl_rows, colw=tbl_colw, cwunit=tbl_cwunit))

However, this corrupts the .docx document, and when Word recovers the document, the table is shown like this:

[Image: actual table created]

How can I get a row to properly span multiple columns using python-docx?


Source: (StackOverflow)

How to find a list in docx using python?

I'm trying to pull apart a Word document that looks like this:

1.0 List item
1.1 List item
1.2 List item
2.0 List item

It is stored as a .docx file, and I'm using python-docx to try to parse it. Unfortunately, all the numbering at the start of each item is lost. I'm trying to identify the start of each ordered list item.

The python-docx library also allows me to access styles, but I cannot figure out how to determine whether the style is a list style or not.

So far I've been messing around with a function and checking output, but the standard format is something like:

for p in doc.paragraphs:
    s = p.style
    while s.base_style is not None:
        print(s.name)
        s = s.base_style
    print(s.name)

I've been using this to search up through the custom styles, but they all end at "Normal" as opposed to "ListNumber".

I've tried searching styles under the document, the paragraphs, and the runs without luck. I've also tried searching p.text, but as previously mentioned the numbering does not persist.


Source: (StackOverflow)
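The numbers are missing from `paragraph.text` because Word does not store them as text: a numbered paragraph carries a `w:numPr` element (a `w:numId` plus a `w:ilvl` nesting level) that points into `word/numbering.xml`, and Word computes the label when it renders the page. The sketch below reads `word/document.xml` straight from the .docx zip with the standard library and reports `numId` and level per paragraph. It only sees numbering applied directly to a paragraph, not numbering inherited through a paragraph style, and the helper names are illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_paragraphs(document_xml):
    """Yield (text, num_id, level) for each paragraph in word/document.xml.

    num_id and level are None when the paragraph has no direct w:numPr.
    Numbering inherited from the paragraph's style is not resolved here.
    """
    root = ET.fromstring(document_xml)
    for p in root.iter(W + "p"):
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        num_pr = p.find(W + "pPr/" + W + "numPr")
        if num_pr is None:
            yield text, None, None
            continue
        num_id = num_pr.find(W + "numId")
        ilvl = num_pr.find(W + "ilvl")
        yield (
            text,
            num_id.get(W + "val") if num_id is not None else None,
            # A missing w:ilvl is treated as level 0 here.
            int(ilvl.get(W + "val")) if ilvl is not None else 0,
        )


def read_document_xml(path):
    """Return the raw bytes of word/document.xml; a .docx is a zip archive."""
    with zipfile.ZipFile(path) as docx_file:
        return docx_file.read("word/document.xml")
```

Usage would be `list_paragraphs(read_document_xml("spec.docx"))`, where `spec.docx` is a hypothetical file. Paragraphs sharing a `num_id` belong to the same list, `level` separates nested items when the list is built with levels, and a `num_id` of `"0"` means numbering was explicitly switched off for that paragraph.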

Extract text of certain font face from a docx file

I am using Python 3.4 along with the python-docx library to work on .docx files. I have been able to extract text from the document, but my objective is to extract only the text set in a certain font (and modify it).

I have been searching for this in the library documentation for the past two days with no result.

Does anybody here have experience with this library? If so, could you point me in the right direction?


Source: (StackOverflow)

libxml2 ImportError when importing etree from lxml

I am trying to run python-docx in Enthought Canopy. I'm running Ubuntu 14.04 on VMware Workstation 10. Whenever I try to import docx or import etree from lxml, I get an error that seems to be related to the libxml2 library:

ImportError: /usr/lib/x86_64-linux-gnu/libxslt.so.1: symbol xmlBufUse, version   LIBXML2_2.9.0 not defined in file libxml2.so.2 with link time reference

I searched around, found another person with a similar error, and followed the directions given there. Still no luck. Then I realized that I have libxml2 version 2.9.1 on my computer.

Does this mean that I need to get the older 2.9.0 version of libxml2, or is something else going on? I can find 2.9.1 with apt-get, but not 2.9.0. Before setting off on that path, I figured someone here might have a better idea.

Thanks!


Source: (StackOverflow)

Downloadable docx file in Django

My Django web app creates and saves a .docx file, and I need to make it downloadable. I use a simple render_to_response call as below.

return render_to_response("test.docx", mimetype='application/vnd.ms-word')

However, it raises an error like: 'utf8' codec can't decode byte 0xeb in position 15: invalid continuation byte

I can't serve this file as a static file, so I need to find a way to serve it from the view. I'd really appreciate any help.


Source: (StackOverflow)

How to apply a style to a paragraph with python-docx?

I am trying to apply a style to a paragraph in a document using the python-docx module. I can write new text, but I cannot apply a different style to a previously-written paragraph.

Here is an example:

x = docx.Document('Sample.docx')

x
Out[81]: <docx.api.Document at 0x121cebed0>

x.paragraphs[2].style
Out[82]: 'Normal'

x.paragraphs[2].style = 'Title'

x.paragraphs[2].style
Out[84]: 'Normal'

I cannot tell from the documentation whether this should work or not. It seems to indicate that paragraph style is writable, but there is no example of this use case in the documentation.

Please let me know whether this is my problem, or if the functionality is not implemented yet.


Source: (StackOverflow)

Parsing Doxygen XML and Generating .docx Files

I have been automating the documentation of my C code with Doxygen for a long time, and I find the HTML output great for understanding the code. However, I have a new requirement to generate Word .docx files containing the same information, but with very particular formatting requirements.

Parsing Doxygen's XML output with lxml, writing a .docx file with the python-docx package, and configuring the styles to match my requirements are all pretty straightforward. However, the more complex fields (e.g. detailed descriptions of files and functions) contain fairly complex document-style XML markup.

My understanding is there can be an arbitrary nesting of <para>, <itemizedlist>, <xrefsect> and other tags with free-standing text "tails" trailing them at the same level. My approach to converting from nested XML to flat .docx paragraphs and bulleted lists has been to employ a state-machine, but this seems fragile since there could be an arbitrary number of nesting levels.

My current strategy is in a gist here. I was wondering if there is a more general way of understanding documents with complex nesting. Is there a less fragile tactic than my state-machine? I'm not sure if I'm trying to reinvent the wheel, but the more cases I add to the state-machine, the less it feels like a robust solution.


Source: (StackOverflow)
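One less fragile alternative to a state machine is plain recursion: a function that handles one element, treats a small set of tags as block-level and everything else as inline text, and calls itself for nested blocks. Nesting depth then costs nothing extra, and `tail` text is picked up right after the child it trails. The sketch below uses the standard library's ElementTree (lxml elements expose the same `text`, `tail` and child iteration) and only knows about `para`, `itemizedlist`, `orderedlist` and `listitem`; other tags such as `xrefsect` are flattened to inline text unless added to the hypothetical `BLOCK_TAGS` set.

```python
import xml.etree.ElementTree as ET

# Tags that start a new output block; everything else is treated as inline text.
BLOCK_TAGS = {"para", "itemizedlist", "orderedlist", "listitem"}


def flatten(root):
    """Flatten nested Doxygen description markup into (kind, level, text) tuples.

    kind is "para", "bullet" or "number"; level is the list nesting depth
    (0 for ordinary paragraphs). Recursion handles any nesting depth.
    """
    blocks = []

    def walk(elem, kind, level):
        buffer = [elem.text or ""]

        def flush():
            # Collapse whitespace and emit the pending inline text, if any.
            text = " ".join("".join(buffer).split())
            if text:
                blocks.append((kind, level, text))
            buffer.clear()

        for child in elem:
            if child.tag in BLOCK_TAGS:
                flush()
                if child.tag == "itemizedlist":
                    walk(child, "bullet", level)
                elif child.tag == "orderedlist":
                    walk(child, "number", level)
                elif child.tag == "listitem":
                    walk(child, kind, level + 1)
                else:  # para keeps the kind and level of its container
                    walk(child, kind, level)
            else:
                # Inline markup such as <bold> or <ref>: keep all of its text.
                buffer.append("".join(child.itertext()))
            # Text trailing a child belongs to the current element again.
            buffer.append(child.tail or "")
        flush()

    walk(root, "para", 0)
    return blocks
```

Each `(kind, level, text)` tuple can then be written with python-docx, for example `document.add_paragraph(text, style="List Bullet")` for level-1 bullets; the `List Bullet` style name comes from python-docx's default template and may differ in a custom template.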

Combine Word documents using python-docx

I have a few Word files, each with specific content. I would like a snippet that shows me (or helps me figure out) how to combine the Word files into one file using the python-docx library.

For example, with the pywin32 library I did the following:

rng = self.doc.Range(0, 0)
for i in range(len(data)):
    time.sleep(0.05)

    docstart = data[i].wordDoc.Content.Start
    self.word.Visible = True
    docend = data[i].wordDoc.Content.End - 1
    location = data[i].wordDoc.Range(docstart, docend).Copy()
    rng.Paste()
    rng.Collapse(0)
    rng.InsertBreak(win32.constants.wdPageBreak)

But I need to do it with the python-docx library instead of win32com.client.

Thank you very much,

Omri


Source: (StackOverflow)

How to add a temporary .docx file to a zip archive in django

Here is my code for downloading a zip file containing a .docx file:

import io
from datetime import date
from zipfile import ZipFile

from django.http import HttpResponse
from docx import Document

def reportsdlserien(request):
    selected_sem = request.POST.get("semester","SS 2016")

    docx_title="Report_in_%s.docx" % selected_sem.replace(' ','_')

    document = Document()
    f = io.BytesIO()

    zip_title="Archive_in_%s.zip" % selected_sem.replace(' ','_')
    zip_arch = ZipFile( f, 'a' )

    document.add_heading("Report in "+selected_sem, 0)
    document.add_paragraph(date.today().strftime('%d %B %Y'))

    document.save(docx_title)
    zip_arch.write(docx_title)
    zip_arch.close()
    response = HttpResponse(
        f.getvalue(),
        content_type='application/zip'
    )
    response['Content-Disposition'] = 'attachment; filename=' + zip_title
    return response

The only problem is that it also creates the .docx file on disk, which I don't need. I wanted to use BytesIO for the .docx file too, but I can't add it to the archive: the command zip_arch.write(BytesIOdocxfile) doesn't work. Is there another command to do this? Thank you!


Source: (StackOverflow)
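`ZipFile.write()` expects a path on disk, but `ZipFile.writestr()` accepts an archive member name plus the bytes, so the .docx can stay in memory. python-docx's `Document.save()` accepts a file-like object as well as a path. A sketch with an illustrative helper name:

```python
import io
import zipfile


def zip_with_docx(document, docx_name):
    """Return the bytes of a zip archive that holds document saved as docx_name.

    document is anything with a save(stream) method, such as a python-docx
    Document, so nothing is written to disk.
    """
    docx_buffer = io.BytesIO()
    document.save(docx_buffer)

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # writestr() takes the member name and its bytes directly,
        # unlike write(), which reads a file from a path on disk.
        archive.writestr(docx_name, docx_buffer.getvalue())
    return zip_buffer.getvalue()
```

In the view above, the response would then become something like `HttpResponse(zip_with_docx(document, docx_title), content_type='application/zip')`, and neither the .docx nor the zip touches the disk. Opening the archive in mode `'w'` rather than `'a'` is also the simpler choice for a fresh buffer.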

python-docx cannot be imported to python

I'm trying to install python-docx, so I typed this in cmd:

easy_install python-docx

and got:

Searching for python-docx
Best match: python-docx 0.7.4
Processing python_docx-0.7.4-py2.6.egg
python-docx 0.7.4 is already the active version in easy-install.pth

Using c:\python26\lib\site-packages\python_docx-0.7.4-py2.6.egg
Processing dependencies for python-docx
Finished processing dependencies for python-docx

But when I open Python and type:

import docx

I got:

  File "c:\python26\lib\site-packages\docx-0.2.4-py2.6.egg\docx.py", line 17, in <module>
    from lxml import etree
ImportError: DLL load failed: The specified procedure could not be found.

How can I solve this import error? What is missing?


Source: (StackOverflow)

python-docx insertion point

I'm not sure whether I've missed something obvious, but I have not found any documentation on how to insert Word elements (tables, for example) at a specific place in a document.

I am loading an existing MS Word .docx document by using:

my_document = Document('some/path/to/my/document.docx')

My use case would be to get the 'position' of a bookmark or section in the document and then proceed to insert tables below that point.

I'm thinking about an API that would allow me to do something along those lines:

insertion_point = my_document.bookmarks['bookmark_name'].position
my_document.add_table(rows=10, cols=3, position=insertion_point+1)

I saw that there are plans to implement something akin to the 'range' object of the MS Word API; this would effectively solve that problem. In the meantime, is there a way to tell the document object's methods where to insert the new elements?

Maybe I can glue some lxml code to find a node and pass that to these python-docx methods? Any help on this subject would be much appreciated! Thanks.


Source: (StackOverflow)

Convert Microsoft Word equations to Latex

I have a .docx file containing a few equations on different pages. Using Python and lxml, I was able to extract the content. I now need to convert the equations in Word to LaTeX. Some of the equations are shown as:

- eq \f (sinx,\r(1 - sin 2 x))

Is there any Python library or other tool that I can use to convert the equations to LaTeX format?

Here is a snippet of the XML file which I obtained from docxfile/word/document.xml:

<w:p w:rsidR="00677018" w:rsidRPr="007D05E5" w:rsidRDefault="00677018" w:rsidP="00677018">
            <w:pPr>
                <w:pStyle w:val="w" />
                <w:jc w:val="both" /></w:pPr>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/>
                <w:t>a.</w:t>
            </w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/></w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="begin" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:instrText xml:space="preserve">eq \b\bc\[(\a\co2\hs4(7,-3,-1,2))</w:instrText>
            </w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="end" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/>
                <w:t>b.</w:t>
            </w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/></w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="begin" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:instrText xml:space="preserve">eq \f(5,8)</w:instrText>
            </w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="end" /></w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="begin" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:instrText xml:space="preserve">eq \b\bc\[(\a\co2\hs4(7,-3,-1,2))</w:instrText>
            </w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="end" /></w:r>
        </w:p>

Source: (StackOverflow)
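The equations in this XML are legacy `EQ` field codes (the `w:instrText` between a `begin` and an `end` `w:fldChar`), not Office Math. I'm not aware of a ready-made converter, so the sketch below does two things with the standard library: it collects the field codes from a paragraph, and it converts a small subset of the EQ syntax, `\f(num,den)` fractions and `\r(x)` / `\r(n,x)` radicals, to LaTeX. Other switches such as `\b` or `\a` are passed through unchanged; handling them would mean extending the parser along the same lines. The function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# A supported switch: \f or \r, optional spaces, then its "(" argument list.
_SWITCH = re.compile(r"\\([fr])\s*\(", re.IGNORECASE)


def field_codes(paragraph_xml):
    """Return the instruction text of each complex field in a <w:p> element."""
    codes, parts, inside = [], [], False
    for node in ET.fromstring(paragraph_xml).iter():
        if node.tag == W + "fldChar":
            kind = node.get(W + "fldCharType")
            if kind == "begin":
                inside, parts = True, []
            elif kind == "end" and inside:
                inside = False
                codes.append("".join(parts).strip())
        elif inside and node.tag == W + "instrText":
            parts.append(node.text or "")
    return codes


def eq_to_latex(code):
    r"""Convert the \f and \r parts of an EQ field code to LaTeX."""
    s = re.sub(r"^eq\s+", "", code.strip(), flags=re.IGNORECASE)
    pos = 0

    def parse_args():
        # s[pos] is "("; read comma-separated arguments up to the matching ")".
        nonlocal pos
        pos += 1
        args = [parse(",)")]
        while pos < len(s) and s[pos] == ",":
            pos += 1
            args.append(parse(",)"))
        pos += 1  # skip ")"
        return args

    def parse(stop):
        nonlocal pos
        out = []
        while pos < len(s) and s[pos] not in stop:
            match = _SWITCH.match(s, pos)
            if match:
                switch = match.group(1).lower()
                pos = match.end() - 1  # position of "("
                args = parse_args()
                if switch == "f" and len(args) == 2:
                    out.append(r"\frac{%s}{%s}" % tuple(args))
                elif switch == "r" and len(args) == 1:
                    out.append(r"\sqrt{%s}" % args[0])
                elif switch == "r" and len(args) == 2:
                    out.append(r"\sqrt[%s]{%s}" % tuple(args))
                else:
                    raise ValueError("unsupported arguments in %r" % code)
            elif s[pos] == "\\":
                # Other switches or escapes (\b, \(, ...) are copied verbatim.
                out.append(s[pos:pos + 2])
                pos += 2
            elif s[pos] == "(":
                # Plain parentheses: keep them, including commas inside.
                pos += 1
                out.append("(" + parse(")") + ")")
                pos += 1
            else:
                out.append(s[pos])
                pos += 1
        return "".join(out).strip()

    return parse("")
```

For the example in the question, `eq_to_latex(r"eq \f (sinx,\r(1 - sin 2 x))")` should give `\frac{sinx}{\sqrt{1 - sin 2 x}}`.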

python-docx - how to restart list lettering

I'm automating the creation of a Word document with the python-docx module. Specifically, I'm creating a multiple-choice test where the questions are numbered 1., 2., 3., ... and under each question there are 4 answers that should be labeled A., B., C., and D. I used a style to create the numbered list and the lettered list. However, I don't know how to restart the lettering: for example, the answers to the 2nd question continue as E., F., G., H. Does anyone know how to restart the lettering back at A? I could specify the letters manually in each answer string, but I'm wondering how to do it with the style sheet. Thank you.


Source: (StackOverflow)
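In WordprocessingML, a list restarts when its paragraphs point to a different `w:num` entry that reuses the same `w:abstractNum` (so the A., B., C. format is kept) and overrides the starting value with `w:lvlOverride`/`w:startOverride`. As far as I know python-docx has no public API for this, so the sketch below only shows the `word/numbering.xml` side with the standard library: it adds one such `w:num` per question. Each answer paragraph of that question would then need a `w:numPr` carrying the returned `numId`; the abstract numbering id of the lettered list is an assumption to be read from the actual file, and `add_restarted_num` is an illustrative name.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
# Keep the conventional "w:" prefix when the tree is serialized again.
ET.register_namespace("w", W_NS)


def add_restarted_num(numbering, abstract_num_id, ilvl=0, start=1):
    """Append a <w:num> to a numbering.xml root that reuses an existing
    <w:abstractNum> but restarts level ilvl at start. Return its numId.
    """
    used = [int(n.get(W + "numId")) for n in numbering.findall(W + "num")]
    num_id = str(max(used, default=0) + 1)
    num = ET.SubElement(numbering, W + "num", {W + "numId": num_id})
    ET.SubElement(num, W + "abstractNumId", {W + "val": str(abstract_num_id)})
    override = ET.SubElement(num, W + "lvlOverride", {W + "ilvl": str(ilvl)})
    ET.SubElement(override, W + "startOverride", {W + "val": str(start)})
    return num_id
```

The new `w:num` is appended at the end of `w:numbering`, after the existing `w:num` entries, which is where the schema expects it unless the file ends with a `w:numIdMacAtCleanup` element.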