EzDevInfo.com

xmltodict

Python module that makes working with XML feel like you are working with JSON

Handle 1 to n elements

I'm using xmltodict to parse an XML config. The XML has structures where an element can occur in 1 to n instances, where both are valid:

<items>
    <item-ref>abc</item-ref>
</items>

and

<items>
    <item-ref>abc</item-ref>
    <item-ref>dca</item-ref>
    <item-ref>abb</item-ref>
</items>

I'm parsing this with xmltodict as follows:

document['items']['item-ref']

and it gives back either a single unicode string or a list (depending on how many items were found), so I always need an extra check to determine whether I am handling a list or a string:

if isinstance(document['items']['item-ref'], list):
    my_var = document['items']['item-ref']
else:
    my_var = [document['items']['item-ref']] #create list manually

Is there a better/simpler/more elegant way to handle these?
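One common way to hide the check is a small normalizing helper, so the calling code always sees a list. The sketch below is a minimal illustration, not part of xmltodict: the name as_list and the two stand-in values are assumptions that mirror the two XML samples above.

```python
def as_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    if isinstance(value, list):
        return value
    return [value]


# Stand-ins for what document['items']['item-ref'] holds in the two cases above.
single = "abc"
several = ["abc", "dca", "abb"]

for refs in (single, several):
    for ref in as_list(refs):
        print(ref)
```

With this helper the isinstance check lives in one place, and every access site becomes as_list(document['items']['item-ref']).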


Source: (StackOverflow)

Creating a json file from a xml file in python with xmltodict

I am trying to create a json file from an input xml file using xmltodict with the following code

import io, xmltodict, json
infile = io.open(filename_xml, 'r')
outfile = io.open(filename_json, 'w')
o = xmltodict.parse( infile.read() )
json.dump( o , outfile )

The last line gives me the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 182, in dump
    fp.write(chunk)
TypeError: must be unicode, not str

I guess I need to change the encoding. My initial xml file seems to be ascii. Any idea on how to make this work? Thanks
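The traceback suggests the problem is the file object rather than the XML: under Python 2.7, a file opened with io.open in text mode accepts only unicode, while json.dump writes plain str chunks. A minimal sketch of one way around it, assuming Python 3 (where text files and json.dump both work with str); the helper name dump_json is hypothetical and the data argument stands in for the result of xmltodict.parse:

```python
import json


def dump_json(data, path):
    """Write a parsed structure (dicts, lists, strings) to path as UTF-8 JSON."""
    # In Python 3 the built-in open() returns a text stream that accepts str,
    # which is exactly what json.dump() writes.
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(data, outfile, ensure_ascii=False, indent=2)
```

Under Python 2.7 itself, opening the output with the built-in open(filename_json, 'w') instead of io.open is the usual equivalent.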


Source: (StackOverflow)

Catch ExpatError in xmltodict

I am using xmltodict to parse xml.

If we parse invalid xml, it throws up an ExpatError.

How do I catch this? Here is what I've tried in my ipython shell

>>> import xmltodict
>>> xml_data = """<?xml version="1.0" encoding="UTF-8" ?>
...     <Website>"""

>>> xml_dict = xmltodict.parse(xml_data)
ExpatError: no element found

>>> try:                      
...     xml_dict = xmltodict.parse(xml_data)
... except ExpatError:
...     print "that's right"
NameError: name 'ExpatError' is not defined

>>> try:                      
...     xml_dict = xmltodict.parse(xml_data)
... except xmltodict.ExpatError:
...     print "that's right"
AttributeError: 'module' object has no attribute 'ExpatError'
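Both attempts fail because the exception name is not in scope: ExpatError is defined in the standard library module xml.parsers.expat, which xmltodict is built on, and has to be imported from there. The sketch below uses the expat parser directly so that it runs without xmltodict installed; the helper name is_well_formed is hypothetical, and the same except clause should apply around xmltodict.parse.

```python
from xml.parsers.expat import ExpatError, ParserCreate


def is_well_formed(xml_text):
    """Return True if xml_text parses cleanly, False if expat rejects it."""
    parser = ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(xml_text, True)
    except ExpatError:
        return False
    return True


xml_data = """<?xml version="1.0" encoding="UTF-8" ?>
    <Website>"""
print(is_well_formed(xml_data))
```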

Source: (StackOverflow)

Most efficient way to convert one XML to a different XML file in python xmltodict, elementTree etc

Howdie do,

So I have the following two XML files.

File A:

<?xml version="1.0" encoding="UTF-8"?>
<GetShipmentUpdatesResult>
    <Shipments>
        <Shipment>
            <Container>
                <OrderNumber>5108046</OrderNumber>
                <ContainerNumber>5108046_1</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-12T12:00:00</ShipDate>
                <CarrierName>UPS</CarrierName>
                <TrackingNumber>1ZX20520A803682850</TrackingNumber>
                <StatusCode>InTransit</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T13:53:18</TimeStamp>
                        <City></City>
                        <StateOrProvince></StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T18:47:44</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>Status: AF Recorded</Description>
                        <TrackingStatus>In Transit</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
        <Shipment>
            <Container>
                <OrderNumber>456789</OrderNumber>
                <ContainerNumber>44789</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-03T13:56:27</ShipDate>
                <CarrierName>UP2</CarrierName>
                <TrackingNumber>1Z4561230020</TrackingNumber>
                <StatusCode>IN_TRANSIT</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-07-03T13:56:27</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
    </Shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId></RequestId>
    <RecordsRemaining>0</RecordsRemaining>
</GetShipmentUpdatesResult>

File B:

<?xml version="1.0" encoding="UTF-8"?>
<getShipmentStatusResponse>
    <getShipmentStatusResult>
        <outcome>
            <result>Success</result>
            <error></error>
        </outcome>
        <shipments>
            <shipment>
                <orderID>123456</orderID>
                <containerNo>CD1863663C</containerNo>
                <shipDate>2015-06-29T18:47:44</shipDate>
                <carrier>UPS</carrier>
                <trackingNumber>1Z4561230001</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T13:53:18</timeStamp>
                        <city />
                        <state />
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T18:47:44</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Shipped from warehouse</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
            <shipment>
                <orderID>456789</orderID>
                <containerNo>44789</containerNo>
                <shipDate>2015-07-03T13:56:27</shipDate>
                <carrier>UP2</carrier>
                <trackingNumber>1Z4561230020</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-07-03T13:56:27</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
        </shipments>
        <matchingRecords>2</matchingRecords>
        <requestId></requestId>
        <remainingRecords>0</remainingRecords>
    </getShipmentStatusResult>
</getShipmentStatusResponse>

I basically need to read through File A and change it to look like File B. I've been using xmltodict to parse File A, but it will only read the top element. It seems I would have to write multiple for loops to achieve this with xmltodict: one loop over each parent and then more over the children elements.

Looking at ElementTree, this appears to be the same. Does anyone know another way to do this without multiple for loops?
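One way to avoid hand-written nested loops is a single recursive function that walks whatever structure the parser returns and renames keys from a lookup table. The sketch below is only a partial illustration under stated assumptions: the RENAMES table is inferred by comparing the two files and may be incomplete, and structural differences (the extra outcome block, the missing Container level) are not handled.

```python
# Hypothetical mapping, inferred from comparing File A and File B.
RENAMES = {
    "Shipments": "shipments",
    "Shipment": "shipment",
    "OrderNumber": "orderID",
    "ContainerNumber": "containerNo",
    "ShipDate": "shipDate",
    "CarrierName": "carrier",
    "TrackingNumber": "trackingNumber",
    "StatusCode": "statusCode",
    "TimeStamp": "timeStamp",
    "City": "city",
    "StateOrProvince": "state",
}


def rename_keys(node, renames):
    """Recursively copy nested dicts/lists, renaming dict keys found in renames."""
    if isinstance(node, dict):
        return {renames.get(key, key): rename_keys(value, renames)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item, renames) for item in node]
    return node
```

The renamed structure could then be passed to xmltodict.unparse once it has a single root element.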


Source: (StackOverflow)

parse nil values in xml using xmltodict library

Is there a way to read nil values correctly using xmltodict/xml.sax.xmlreader?

I tried setting up the namespaces parsing but it seems like it just ignores them when parsing this form:

<example xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" />

Using xmltodict I am getting the following value:

OrderedDict([(u'@http://www.w3.org/2001/XMLSchema-instance:nil', u'true')])

I can just ignore it, but it seems like it should be working better
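If the parser itself offers no nil handling, one option is to post-process the parsed structure and replace any element carrying the nil attribute with None. A minimal sketch, assuming the attribute key appears exactly as in the output above; the function name nil_to_none is hypothetical, and plain dicts stand in for the OrderedDict that xmltodict returns.

```python
# Attribute key as it appears in the parsed output shown above.
XSI_NIL = "@http://www.w3.org/2001/XMLSchema-instance:nil"


def nil_to_none(node):
    """Recursively replace elements marked xsi:nil="true" (or "1") with None."""
    if isinstance(node, dict):
        if node.get(XSI_NIL) in ("true", "1"):
            return None
        return {key: nil_to_none(value) for key, value in node.items()}
    if isinstance(node, list):
        return [nil_to_none(item) for item in node]
    return node
```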


Source: (StackOverflow)

Parsing an xml file with an ordered dictionary

I have an xml file of the form:

<NewDataSet>
    <Root>
        <Phonemic>and</Phonemic>
        <Phonetic>nd</Phonetic>
        <Description/>
        <Start>0</Start>
        <End>8262</End>
    </Root>
    <Root>
        <Phonemic>comfortable</Phonemic>
        <Phonetic>comfetebl</Phonetic>
        <Description>adj</Description>
        <Start>61404</Start>
        <End>72624</End>
    </Root>
</NewDataSet>

I need to process it so that, for instance, when the user inputs nd, the program matches it with the <Phonetic> tag and returns and from the <Phonemic> part. I thought maybe if I can convert the xml file to a dictionary, I would be able to iterate over the data and find information when needed.

I searched and found xmltodict which is used for the same purpose:

import xmltodict
with open(r'path\to\1.xml', encoding='utf-8', errors='ignore') as fd:
    obj = xmltodict.parse(fd.read())

Running this gives me an ordered dict:

>>> obj
OrderedDict([('NewDataSet', OrderedDict([('Root', [OrderedDict([('Phonemic', 'and'), ('Phonetic', 'nd'), ('Description', None), ('Start', '0'), ('End', '8262')]), OrderedDict([('Phonemic', 'comfortable'), ('Phonetic', 'comfetebl'), ('Description', 'adj'), ('Start', '61404'), ('End', '72624')])])]))])

Now this unfortunately hasn't made things simpler and I am not sure how to go about implementing the program with the new data structure. For example to access nd I'd have to write:

obj['NewDataSet']['Root'][0]['Phonetic']

which is ridiculously complicated. I tried to make it into a regular dictionary by dict() but as it is nested, the inner layers remain ordered and my data is so big.
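The nested access only has to be written once if the parsed data is turned into a lookup table keyed by the phonetic form. A minimal sketch, assuming the structure shown in the output above; the function name build_index is hypothetical, and plain dicts stand in for the OrderedDict objects (both support the same key access).

```python
def build_index(obj):
    """Map each <Phonetic> value to its <Phonemic> value."""
    roots = obj["NewDataSet"]["Root"]
    # A file with a single <Root> parses to one dict instead of a list.
    if not isinstance(roots, list):
        roots = [roots]
    return {entry["Phonetic"]: entry["Phonemic"] for entry in roots}


obj = {"NewDataSet": {"Root": [
    {"Phonemic": "and", "Phonetic": "nd", "Description": None,
     "Start": "0", "End": "8262"},
    {"Phonemic": "comfortable", "Phonetic": "comfetebl", "Description": "adj",
     "Start": "61404", "End": "72624"},
]}}

index = build_index(obj)
print(index["nd"])
```

Building the index is one pass over the data; each later lookup is a plain dictionary access.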


Source: (StackOverflow)

TypeError: list indices must be integers, not str with xmltodict:

I have got this XML file:

<?xml version="1.0"?>
<toolbox tool_path="/galaxy/main/shed_tools">
<section id="snpeff" name="snpEff" version="">
  <tool file="toolshed.g2.bx.psu.edu/repos/pcingola/snpeff/c052639fa666/snpeff/snpEff_2_1a/snpEff_2_1a/galaxy/snpSift_filter.xml" guid="toolshed.g2.bx.psu.edu/repos/pcingola/snpeff/snpSift_filter/1.0">
      <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
        <repository_name>snpeff</repository_name>
        <repository_owner>pcingola</repository_owner>
        <installed_changeset_revision>c052639fa666</installed_changeset_revision>
        <id>toolshed.g2.bx.psu.edu/repos/pcingola/snpeff/snpSift_filter/1.0</id>
        <version>1.0</version>
    </tool>
    <tool file="toolshed.g2.bx.psu.edu/repos/pcingola/snpeff/c052639fa666/snpeff/snpEff_2_1a/snpEff_2_1a/galaxy/snpEff.xml" guid="toolshed.g2.bx.psu.edu/repos/pcingola/snpeff/snpEff/1.0">
      <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
        <repository_name>snpeff</repository_name>
        <repository_owner>pcingola</repository_owner>
        <installed_changeset_revision>c052639fa666</installed_changeset_revision>
        <id>toolshed.g2.bx.psu.edu/repos/pcingola/snpeff/snpEff/1.0</id>
        <version>1.0</version>
    </tool>
    <tool file="toolshed.g2.bx.psu.edu/repos/gregory-minevich/check_snpeff_candidates/22c8c4f8d11c/check_snpeff_candidates/checkSnpEffCandidates.xml" guid="toolshed.g2.bx.psu.edu/repos/gregory-minevich/check_snpeff_candidates/check_snpeff_candidates/1.0.0">
      <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
        <repository_name>check_snpeff_candidates</repository_name>
        <repository_owner>gregory-minevich</repository_owner>
        <installed_changeset_revision>22c8c4f8d11c</installed_changeset_revision>
        <id>toolshed.g2.bx.psu.edu/repos/gregory-minevich/check_snpeff_candidates/check_snpeff_candidates/1.0.0</id>
        <version>1.0.0</version>
    </tool>
</section>
...

I have tried to parse the above file in the following way:

import xmltodict

# wget -c https://raw.githubusercontent.com/galaxyproject/usegalaxy-playbook/c55aa042825fe02ef4a02d958eb811adba8ea45f/files/galaxy/usegalaxy.org/var/shed_tool_conf.xml

if __name__ == '__main__':

    with open('tests/shed_tool_conf.xml') as fd:
        doc = xmltodict.parse(fd.read())
        tools_section = doc['toolbox']['section']['@name']
        print tools_section

However, I have got the following error:

Traceback (most recent call last):
  File "importTools2Galaxy.py", line 15, in <module>
    tools_section = doc['toolbox']['section']['@name']
TypeError: list indices must be integers, not str

What did I do wrong?


Source: (StackOverflow)

Python import XML into SQLITE (xmltodict)

I'm trying to parse an XML file and import it into an SQLITE database.

the XML looks like this:

<resultset>
    <row>
        <column name="pct_lucru">unit name</column>
        <column name="cod_comercial">00032749</column>
        <column name="denumire_med">stuff name</column>
        <column name="producator">fabri</column>
        <column name="tip_produs">koops</column>
        <column name="tva">24.000000</column>
        <column name="umc">1</column>
        <column name="furnizor">FURNIZORI DIVERSI</column>
        <column name="data_expirarii">2015-12-31</column>
        <column name="lot">80063</column>
        <column name="cant_fl">1</column>
        <column name="fractie">0</column>
        <column name="cantitate">1</column>
        <column name="pret_intr">62.930000</column>
        <column name="val_intr">62.930000</column>
        <column name="pret_fl">82.720000</column>
        <column name="valoare">82.720000</column>
    </row>
</resultset>

And I have the following python code

import xmltodict
import sqlite3

conn = sqlite3.connect("test.sqlite")
c = conn.cursor()

with open("export.xml") as fd:
    obj = xmltodict.parse(fd.read())

for row in obj["resultset"]["row"]:
    for column in row["column"]:
        c.execute("INSERT INTO stocks ? VALUES '?'", [column["@name"], column["#text"]])
    print "item inserted \n"

Which produces the following error

Traceback (most recent call last):
        File "dbimport.py", line 12, in <module>
            c.execute("INSERT INTO stocks ? VALUES '?'", [column["@name"], column["#text"]])
sqlite3.OperationalError: near "?": syntax error

What am I doing wrong here? I've used this method before and it worked just fine, albeit not with XML files.
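The syntax error most likely comes from the placeholders: SQLite's ? can only bind values, not column names, and a quoted '?' is a literal string rather than a placeholder. A minimal sketch of one alternative, which builds a single INSERT per row; the helper name insert_row and the three-column stocks table are assumptions made for illustration. Note also that with a single <row> element, obj["resultset"]["row"] is one dict rather than a list of rows.

```python
import sqlite3


def insert_row(conn, table, columns):
    """Insert one parsed <row>, given its list of <column> dicts."""
    names = [col["@name"] for col in columns]
    values = [col.get("#text") for col in columns]
    # Identifiers cannot be bound with "?", so check them before interpolating.
    for identifier in [table] + names:
        if not identifier.replace("_", "").isalnum():
            raise ValueError("unexpected identifier: %r" % identifier)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(names), ", ".join("?" for _ in names))
    conn.execute(sql, values)


conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real stocks table is not shown in the question.
conn.execute("CREATE TABLE stocks (pct_lucru TEXT, cod_comercial TEXT, tva TEXT)")
row = {"column": [
    {"@name": "pct_lucru", "#text": "unit name"},
    {"@name": "cod_comercial", "#text": "00032749"},
    {"@name": "tva", "#text": "24.000000"},
]}
insert_row(conn, "stocks", row["column"])
conn.commit()
```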


Source: (StackOverflow)

Python xmltodict Force Array with attribute

I've been using the python package xmltodict very successfully to parse my xml string into a python dictionary.

However, I have the following issue:

<child>
  <episode>["a","b"]</episode>
</child>

parses as:

 { 
  child: {
    episode: ["a","b"]
    }
 }

whereas:

<child>
  <episode>["a","b"]</episode>
  <episode>["c","d"]</episode>
</child>

parses as:

{ child: 
   {
    episode: [
     ["a","b"],
     ["c","d"]
     ]
    }
 }

which means that any code I write is going to give me different results depending on which child observation I'm looking at.

What I'd like is a way to specify to parse the episode always as an array - similarly to this .Net package. What would be the best way (or a way) of doing this in Python?
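Newer xmltodict releases document a force_list argument to parse() for this purpose. Where that is not available, the same effect can be approximated after parsing by walking the result and wrapping the chosen keys. The sketch below does not rely on xmltodict at all; the function name force_lists is hypothetical, and the episode values are kept as the strings the parser would return.

```python
def force_lists(node, keys):
    """Recursively copy node, wrapping the value of every key in keys in a list."""
    if isinstance(node, list):
        return [force_lists(item, keys) for item in node]
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            value = force_lists(value, keys)
            if key in keys and not isinstance(value, list):
                value = [value]
            result[key] = value
        return result
    return node


one = {"child": {"episode": '["a","b"]'}}
two = {"child": {"episode": ['["a","b"]', '["c","d"]']}}
print(force_lists(one, {"episode"}))
print(force_lists(two, {"episode"}))
```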


Source: (StackOverflow)

Python xmltodict indices error

I have this XML API I am trying to get information out of for stats. Here's my code and sample XML:

xml.xml

<calls total="1">
    <call id="cc04cd2a-31ff-422e-9f9c-94f41eaa219d">
        <name>John Doe</name>
        <coSpace>324f9508-beca-4829-89ee-927898c5796e</coSpace>
        <callCorrelator>45962153-c0ff-41ef-bf39-e75f54085b4e</callCorrelator>
    </call>
</calls>

temp.py

import xmltodict

totalNumCospaces = 0
totalNumCallers = 0

with open('/root/xml.xml','r') as f:
    xml = xmltodict.parse(f.read())

total = xml["calls"]["@total"]

totalNumCospaces = totalNumCospaces + int(total)
# If we have more than 0 calls active, find the coSpaces and count the callers
if (int(total) > 0):
    callList = xml["calls"]["call"]
    for c in callList:
        id = c["@id"]
        totalNumCallers = totalNumCallers + get_coSpaceCallers(server, id)

When I run the code and there is MORE than 1 <call id="x"> stanza, this works fine. If there is only 1 <call id="x">, then I get the error below.

Traceback (most recent call last):
  File "temp.py", line 17, in <module>
    id = c["@id"]
TypeError: string indices must be integers

When I print the contents of xml, I get this, so I know that xml["calls"]["call"]["@id"] should be there:

OrderedDict([(u'calls', OrderedDict([(u'@total', u'1'), (u'call', OrderedDict([(u'@id', u'cc04cd2a-31ff-422e-9f9c-94f41eaa219d'), (u'name', u'John Doe'), (u'coSpace', u'324f9508-beca-4829-89ee-927898c5796e'), (u'callCorrelator', u'45962153-c0ff-41ef-bf39-e75f54085b4e')]))]))])

Thoughts?
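The printed structure hints at the cause: with a single <call>, xml["calls"]["call"] is one dict rather than a list of dicts, and iterating over a dict yields its keys, which are strings. A minimal sketch of one way to handle both shapes; the function name call_ids is hypothetical and plain dicts stand in for the parsed output.

```python
def call_ids(calls):
    """Return the @id of every call, whether 'call' is absent, a dict, or a list."""
    call_list = calls.get("call", [])
    # A single <call> element parses to a dict; wrap it so the loop sees one item.
    if not isinstance(call_list, list):
        call_list = [call_list]
    return [call["@id"] for call in call_list]


single = {"@total": "1", "call": {"@id": "cc04cd2a-31ff-422e-9f9c-94f41eaa219d",
                                  "name": "John Doe"}}

# Iterating over the dict itself yields its keys, hence the TypeError.
print([c for c in single["call"]])
print(call_ids(single))
```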


Source: (StackOverflow)

ValueError with xmltodict unparse() function - Python 3

I'm having trouble using xmltodict to convert json to xml. It works fine with a single root and a single object, however when I try to convert multiple objects it returns a ValueError "ValueError: document with multiple roots".

Here's my JSON data:

Here's my script thus far:

import json
import xmltodict

y = """{  "markers":[  {  "point":"new GLatLng(40.266044,-74.718479)","awayTeam":"LUGip","markerImage":"images/red.png","fixture":"Wednesday 7pm","information":"Linux users group meets second Wednesday of each month.","previousScore":"","capacity":"","homeTeam":"Lawrence Library"},{  "point":"new GLatLng(40.211600,-74.695702)","awayTeam":"LUGip HW SIG","tv":"","markerImage":"images/white.png","fixture":"Tuesday 7pm","information":"Linux users can meet the first Tuesday of the month to work out harward and configuration issues.","capacity":"","homeTeam":"Hamilton Library"},{  "point":"new GLatLng(40.294535,-74.682012)","awayTeam":"After LUPip Mtg Spot","tv":"","markerImage":"images/newcastle.png","fixture":"Wednesday whenever","information":"Some of us go there after the main LUGip meeting, drink brews, and talk.","capacity":"2 to 4 pints","homeTeam":"Applebees"}]}"""

y2 = json.loads(y)
print(xmltodict.unparse(y2, pretty = True))

Result:

Traceback (most recent call last):

  File "<ipython-input-89-8838ce8b0d7f>", line 1, in <module>
    print(xmltodict.unparse(y2,pretty=True))

  File "/Users/luzazul/anaconda/lib/python3.4/site-packages/xmltodict.py", line 323, in unparse
    raise ValueError('Document must have exactly one root.')

ValueError: Document must have exactly one root.

Any help would be greatly appreciated, thanks!
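Both error messages point at the same constraint: an XML document has exactly one root element, so a dict whose only top-level key holds a list would map to several roots. One way around it is to wrap the data in a single enclosing key before unparsing. A minimal sketch; the function name ensure_single_root and the element names "root" and "item" are hypothetical.

```python
def ensure_single_root(data, root_name="root", item_name="item"):
    """Return a dict with exactly one top-level key whose value is not a list."""
    if isinstance(data, list):
        # A bare list needs both an enclosing root and a name for its items.
        return {root_name: {item_name: data}}
    if isinstance(data, dict) and len(data) == 1:
        (value,) = data.values()
        if not isinstance(value, list):
            return data  # already a valid single-root document
    return {root_name: data}
```

The wrapped result, for example ensure_single_root(y2), can then be passed to xmltodict.unparse in place of y2.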


Source: (StackOverflow)

Improve parsing speed for xmltodict

I have a compressed FIXML ZIP file. I am trying to use xmltodict to parse its uncompressed file (~130MB uncompressed data) as follows, but the parsing takes about 3 mins:

with zipfile.ZipFile(ff, 'r') as fh:
     infile = fh.read(fh.namelist()[0])
o = xmltodict.parse(infile)

I have also tried the latest version of xmltodict (0.9.0), since the release notes for a prior version outline a performance improvement, but this didn't help: there was no speed improvement at all.

Any ideas on how this xmltodict parsing could be accomplished much faster?

Thanks.
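One option often suggested for files of this size is to stream the document and handle one record at a time instead of building the whole structure in memory. xmltodict documents a streaming mode (item_depth and item_callback); the sketch below swaps in the standard library's ElementTree.iterparse instead, so it runs without xmltodict. Whether this is actually faster for this file is an assumption that would need measuring. The function name iter_records and the record tag are hypothetical, and namespaced FIXML tags would need the namespace in the tag name.

```python
import zipfile
import xml.etree.ElementTree as ET


def iter_records(zip_source, tag):
    """Yield {child tag: text} for each `tag` element in the first zip member."""
    with zipfile.ZipFile(zip_source, "r") as archive:
        # Open the member as a stream rather than reading it fully into memory.
        with archive.open(archive.namelist()[0]) as member:
            for _event, elem in ET.iterparse(member, events=("end",)):
                if elem.tag == tag:
                    yield {child.tag: child.text for child in elem}
                    elem.clear()  # drop children that were already processed
```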


Source: (StackOverflow)