Top xml-parsing frequently asked interview questions

An invalid XML character (Unicode: 0xc) was found

Parsing an XML file using the Java DOM parser results in:

[Fatal Error] os__flag_8c.xml:103:135: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

Source: (StackOverflow)
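
U+000C is the form-feed control character. XML 1.0 allows only tab, line feed, carriage return and characters from U+0020 upward (excluding surrogates, U+FFFE and U+FFFF). A well-formed document therefore cannot contain it, not even as the reference &#xC;, and the parser is right to reject the file. If the file cannot be fixed where it is produced, and silently dropping such characters is acceptable (an assumption), one option is to filter the text before parsing. Here is a minimal Java sketch. XmlCharFilter and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XmlCharFilter {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of text without any code point outside the XML 1.0 Char range
    // (unpaired surrogates are dropped too, since they fall in 0xD800-0xDFFF).
    static String stripInvalidXml10Chars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(XmlCharFilter::isXml10Char)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String raw = "<doc>page one\fpage two</doc>"; // \f is U+000C, the character in the error
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(stripInvalidXml10Chars(raw))));
        System.out.println(doc.getDocumentElement().getTextContent()); // page onepage two
    }
}
```

For a file, the text would first be read into a String using the file's real encoding. Dropping characters changes the data, so fixing the producer of the XML is preferable when possible.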

What XML parser should I use in C++?

I have XML documents that I need to parse and/or I need to build XML documents and write them to text (either files or memory). Since the C++ standard library does not have a library for this, what should I use?

Note: This is intended to be a definitive, C++-FAQ-style question for this. So yes, it is a duplicate of others. I did not simply appropriate those other questions because they tended to ask for something slightly more specific. This question is more generic.

Source: (StackOverflow)

How do you parse and process HTML/XML in PHP?

How can one parse HTML/XML and extract information from it?

This is a General Reference question for the php tag.

Source: (StackOverflow)

What is the difference between SAX and DOM?

I read some articles about the XML parsers and came across SAX and DOM.

SAX is event-based and DOM is a tree model, but I don't understand the difference between these concepts.

From what I have understood, event-based means that some kind of event happens to a node. For example, when one clicks a particular node, it gives all its sub-nodes rather than loading all the nodes at the same time. In the case of DOM parsing, by contrast, it loads all the nodes and builds the tree model.

Is my understanding correct?

Please correct me if I am wrong, or explain event-based and tree models to me in a simpler manner.

Source: (StackOverflow)
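
In SAX, "event" does not refer to user clicks. A SAX parser reads the document once, from front to back, and calls your handler methods (start of element, text, end of element) as it meets each piece. It builds no tree, so memory use stays small, but you cannot go back. A DOM parser reads the whole document first and returns a tree you can navigate in any order. Below is a minimal Java sketch using the JDK's built-in parsers. The <library> document and the SaxVsDom class are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxVsDom {

    static final String XML = "<library><book>Dune</book><book>Emma</book></library>";

    static InputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // SAX: the parser pushes events to these callbacks in document order.
    // Nothing is kept unless the handler stores it (here, in a list of strings).
    static List<String> saxEvents() throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("text " + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(input(), handler);
        return events;
    }

    // DOM: the parser builds the whole tree first; we then query it freely.
    static List<String> domTitles() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input());
        NodeList books = doc.getElementsByTagName("book");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            titles.add(books.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        saxEvents().forEach(System.out::println);
        System.out.println(domTitles());
    }
}
```

With SAX the handler must remember whatever it needs itself. Also, characters() may deliver one text node in several chunks. With DOM, getElementsByTagName can be called at any time because the tree is already in memory.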

Convert XML String to Object

I am receiving XML strings over a socket, and would like to convert these to C# objects.

The messages are of the form:

<msg>
   <id>1</id>
   <action>stop</action>
</msg>

I am new to .NET and am not sure of the best practice for doing this. I've used JAXB for Java before, and wasn't sure whether there is something similar or whether this would be handled in a different way.

Source: (StackOverflow)
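
In .NET, the closest counterpart to JAXB is System.Xml.Serialization.XmlSerializer, which maps XML onto a class. The Java sketch below shows the plain, annotation-free alternative using the built-in DOM API: parse the string, then copy the child values into a small class. Msg and MsgParser are hypothetical names. The sketch assumes each message has exactly one integer <id> and one <action>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class MsgParser {

    // Hypothetical target type mirroring <msg><id/><action/></msg>.
    static final class Msg {
        final int id;
        final String action;

        Msg(int id, String action) {
            this.id = id;
            this.action = action;
        }
    }

    // Text of the first descendant element with this name.
    // Throws NullPointerException if the element is missing (kept simple on purpose).
    private static String childText(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    static Msg parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        if (!root.getTagName().equals("msg")) {
            throw new IllegalArgumentException("expected <msg>, got <" + root.getTagName() + ">");
        }
        return new Msg(Integer.parseInt(childText(root, "id")), childText(root, "action"));
    }

    public static void main(String[] args) throws Exception {
        Msg m = parse("<msg>\n   <id>1</id>\n   <action>stop</action>\n</msg>");
        System.out.println(m.id + " " + m.action); // 1 stop
    }
}
```

Mapping by hand like this needs no annotations or extra libraries, at the cost of one line per field. For many message types, a binding library in the JAXB or XmlSerializer style scales better.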

The best node module for XML parsing [closed]

As far as XML parsing is concerned, which is the best Node module I can use?

Source: (StackOverflow)

Get XML only immediate children elements by name

My question is: how can I get only the elements directly under a specific parent element when other elements with the same name also appear as "grandchildren" of that parent?

I'm using the Java DOM library to parse XML elements and I'm running into trouble. Here's a small portion of the XML I'm using:

<notifications>
  <notification>
    <groups>
      <group name="zip-group.zip" zip="true">
        <file location="C:\valid\directory\" />
        <file location="C:\another\valid\file.doc" />
        <file location="C:\valid\file\here.txt" />
      </group>
    </groups>
    <file location="C:\valid\file.txt" />
    <file location="C:\valid\file.xml" />
    <file location="C:\valid\file.doc" />
  </notification>
</notifications>

As you can see, there are two places you can put the <file> element: either inside <groups> or outside it. I really want it structured this way because it's more user-friendly.

Now, whenever I call notificationElement.getElementsByTagName("file"); it gives me all the <file> elements, including those under the <group> element. I handle each of these kinds of files differently, so this functionality is not desirable.

I've thought of two solutions:

Get the parent element of the file element and deal with it accordingly (depending on whether it's <notification> or <group>).
Rename the second <file> element to avoid confusion.

Neither of those solutions is as desirable as just leaving things the way they are and getting only the <file> elements that are direct children of <notification> elements.

I'm open to IMHO comments and answers about the "best" way to do this, but I'm really interested in DOM solutions because that's what the rest of this project uses. Thanks.

Source: (StackOverflow)
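
getElementsByTagName searches the entire subtree, which is why the nested <file> elements are returned too. One DOM-only way to keep the current XML layout is to walk getChildNodes() and keep only element nodes with the wanted name. Here is a minimal Java sketch. DirectChildren, parseRoot and childElementsByTagName are hypothetical helper names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DirectChildren {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Unlike getElementsByTagName, which searches all descendants,
    // this looks only at the immediate child nodes of parent.
    static List<Element> childElementsByTagName(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<notifications><notification>"
                + "<groups><group name=\"zip-group.zip\" zip=\"true\">"
                + "<file location=\"C:\\another\\valid\\file.doc\"/></group></groups>"
                + "<file location=\"C:\\valid\\file.txt\"/>"
                + "<file location=\"C:\\valid\\file.xml\"/>"
                + "</notification></notifications>";
        Element notification = childElementsByTagName(parseRoot(xml), "notification").get(0);
        // Prints only the two <file> elements placed directly under <notification>.
        for (Element file : childElementsByTagName(notification, "file")) {
            System.out.println(file.getAttribute("location"));
        }
    }
}
```

The same helper can be reused for the grouped files: call it on each <group> element to get that group's <file> children.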

xml.LoadData - Data at the root level is invalid. Line 1, position 1

I'm trying to parse some XML inside a WiX installer. The XML contains all the errors returned from a web server. I'm getting the error in the question title with this code:

XmlDocument xml = new XmlDocument();
try
{
    xml.LoadXml(myString);
}
catch (Exception ex)
{
    System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
    throw; // rethrow without resetting the original stack trace
}

myString is this (as seen in the output of text.txt):

<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>

text.txt comes out looking like this:

<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>

Data at the root level is invalid. Line 1, position 1.

I need this XML to parse so I can see if I had any errors.

Edit

This question is not a duplicate as marked. In that question, the asker was using LoadXml to parse an XML file. I'm parsing a string, which is the correct use of LoadXml.

Source: (StackOverflow)
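
When the text looks correct, one common cause of "Data at the root level is invalid. Line 1, position 1" is an invisible byte-order mark (U+FEFF) at the start of the string. This can happen, for example, when the response was decoded from UTF-8 bytes that began with a BOM. A text editor will not display it in text.txt either. That cause is an assumption about the asker's data, not something the question confirms. The fix is to remove the mark before parsing. Here is a Java sketch of the same idea (the question itself uses C#; BomStripper and its methods are hypothetical names).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class BomStripper {

    private static final char BOM = '\uFEFF';

    // Removes one leading byte-order mark. It is invisible when printed,
    // but it still sits in front of "<?xml" and makes the parser fail at 1:1.
    static String stripLeadingBom(String text) {
        return (!text.isEmpty() && text.charAt(0) == BOM) ? text.substring(1) : text;
    }

    // Parses an XML string after removing any leading BOM.
    static Document parseString(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripLeadingBom(xml))));
    }

    public static void main(String[] args) throws Exception {
        String withBom = BOM + "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<Errors></Errors>";
        Document doc = parseString(withBom);
        System.out.println(doc.getDocumentElement().getTagName()); // Errors
    }
}
```

Checking the first character of the string (for example, printing its code point) is a quick way to confirm whether a BOM is really present before adding the workaround.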

DTD prohibited in xml document exception

I'm getting this error when trying to parse through an XML document in a C# application:

"For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method."

For reference, the exception occurred at the second line of the following code:

using (XmlReader reader = XmlReader.Create(uri))
{
    reader.MoveToContent(); // the exception is thrown here

    while (reader.Read())
    {
        // (code to parse the XML document follows)
    }
}

My knowledge of XML is pretty limited, and I have no idea what DTD processing is or how to do what the error message suggests. Can anyone help me understand what may be causing this and how to fix it? Thanks.

Source: (StackOverflow)
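
A DTD is the <!DOCTYPE ...> declaration at the top of a document. It can define entities and reference external files, so .NET's XmlReader refuses it by default as a guard against entity-expansion and external-entity attacks. In current .NET, the setting the message refers to is XmlReaderSettings.DtdProcessing; ProhibitDtd is the older, obsolete property. Only documents you trust should be read with DTD processing enabled. The JDK's built-in Java parser has the opposite default: DTDs are processed unless you turn them off. The sketch below toggles that with the Xerces feature URI "http://apache.org/xml/features/disallow-doctype-decl". DtdToggle is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdToggle {

    // Xerces feature, honoured by the JDK's built-in parser: when true, any <!DOCTYPE> is a fatal error.
    static final String DISALLOW_DOCTYPE = "http://apache.org/xml/features/disallow-doctype-decl";

    static Document parse(String xml, boolean allowDtd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(DISALLOW_DOCTYPE, !allowDtd);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The internal DTD defines an entity &who; that the parser expands.
        String xml = "<!DOCTYPE note [<!ENTITY who \"World\">]><note>Hello &who;</note>";
        System.out.println(parse(xml, true).getDocumentElement().getTextContent()); // Hello World
        try {
            parse(xml, false);
        } catch (SAXParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting DOCTYPE entirely is the strictest option. It suits input that should never carry a DTD, such as data received from a web service.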

Better way to detect XML?

Currently, I have the following C# code to extract a value from text. If the text is XML, I want the value within it; otherwise, the code can just return the text itself.

String data = "...";
try
{
    return XElement.Parse(data).Value;
}
catch (System.Xml.XmlException)
{
    return data;
}

I know exceptions are expensive in C#, so I was wondering whether there is a better way to determine if the text I'm dealing with is XML.

I thought of regex testing, but I don't see that as a cheaper alternative. Note that I'm asking for a less expensive method of doing this.

Source: (StackOverflow)
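
A reliable answer still needs a parser, but a cheap pre-check can avoid the exception for most non-XML input. A well-formed document's first non-whitespace character is always '<' (after an optional byte-order mark), so text that starts with anything else can be returned immediately. This is a heuristic, not a validator: input that passes it still goes through the parser. Here is a Java sketch. XmlOrText, mightBeXml and valueOrSelf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlOrText {

    // Cheap pre-check: skip an optional BOM and XML whitespace, then require '<'.
    static boolean mightBeXml(String data) {
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '\uFEFF' || c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                continue;
            }
            return c == '<';
        }
        return false;
    }

    // Returns the text content if data parses as XML, otherwise data unchanged.
    // The exception path now runs only for input that already looks like markup.
    static String valueOrSelf(String data) {
        if (!mightBeXml(data)) {
            return data;
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keeps "[Fatal Error]" lines off stderr
            return builder.parse(new InputSource(new StringReader(data)))
                    .getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return data; // looked like markup but was not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(valueOrSelf("plain text"));
        System.out.println(valueOrSelf("<a>value</a>"));
    }
}
```

If most inputs are plain text, nearly every call returns after inspecting a few characters. The parser, and any exception it throws, is reached only for markup-like strings.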

Where can I find a detailed comparison of Java XML frameworks?

I'm trying to choose an XML-processing framework for my Java projects, and I'm lost among the names: XOM, JDOM, etc. Where can I find a detailed comparison of all the popular Java XML frameworks?

Source: (StackOverflow)

DOMDocument in php

I have just started reading documentation and examples about DOM in order to crawl and parse documents.

For example, here is part of a document:

    <div id="showContent">
    <table>
    <tr>
        <td>
         Crap
        </td>
    </tr>
<tr>
          <td width="172" valign="top"><a rel='nofollow' href="link"><img height="91" border="0" width="172" class="" src="img"></a></td>
          <td width="10">&nbsp;</td>
          <td valign="top"><table cellspacing="0" cellpadding="0" border="0">
              <tbody><tr>
                <td height="30"><a class="px11" rel='nofollow' href="link">title</a><a><br>
                    <span class="px10"></span>
                </a></td>
              </tr>
              <tr>
                <td><img height="1" width="580" src="crap"></td>
              </tr>
              <tr>
                <td align="right">
                    <a rel='nofollow' href="link"><img height="16" border="0" width="65" src="/buy"></a>
                </td>
              </tr>
              <tr>
                <td valign="top" class="px10">
                    <p style="width: 500px;">description.</p>
                </td>
              </tr>
          </tbody></table></td>
        </tr>
    <tr>
        <td>
Crap
        </td>
    </tr>
    <tr>
        <td>
         Crap
        </td>
    </tr>
    </table>
    </div>

I'm trying to use the following code to get all the tr tags and analyze whether there is crap or information inside them:

$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);


$tags = $xpath->query('.//div[@id="showContent"]');
foreach ($tags as $tag) {
    $string="";
    $string=trim($tag->nodeValue);
    if(strlen($string)>3) {
        echo $string;
        echo '<br>';
    }
}

However, I'm getting just the stripped text without the tags, for example:

Crap

Crap
Title
Description

But I would like to get:

<tr>
   <td>Crap</td>
</tr>
<tr>
   <a rel='nofollow' href="link">title</a>
</tr>

How can I keep the HTML nodes (tags)?

Source: (StackOverflow)
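
nodeValue, like textContent, returns only the concatenated text of a node and its descendants, which is why the tags disappear. To keep the markup, serialize the node itself instead; in PHP, $dom->saveHTML($tag) accepts a node argument (since PHP 5.3.6). The sketch below shows the same distinction with Java's DOM and the JAXP Transformer. It uses a small well-formed fragment because Java's built-in parser accepts XML, not arbitrary HTML. OuterMarkup, parse and outerXml are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class OuterMarkup {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes a node together with its tags, unlike getTextContent(), which keeps only text.
    static String outerXml(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<table><tr><td>Crap</td></tr>"
                + "<tr><td><a href=\"link\">title</a></td></tr></table>");
        Node firstRow = doc.getElementsByTagName("tr").item(0);
        System.out.println(firstRow.getTextContent()); // Crap
        System.out.println(outerXml(firstRow));        // the <tr> element with its tags
    }
}
```

The same pattern applies to each row: select the <tr> nodes, inspect their text to decide whether they hold real content, and serialize only the rows you want to keep.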

Java parsing XML document gives "Content not allowed in prolog." error [duplicate]

This question already has an answer here: Content is not allowed in Prolog SAXParserException

I am writing a program in Java that takes a custom XML file and parses it. I'm using the XML file for storage. I am getting the following error in Eclipse.

[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:208)
    at me.ericso.psusoc.RequirementSatisfier.parseXML(RequirementSatisfier.java:61)
    at me.ericso.psusoc.RequirementSatisfier.getCourses(RequirementSatisfier.java:35)
    at me.ericso.psusoc.programs.RequirementSatisfierProgram.main(RequirementSatisfierProgram.java:23)

Here is the beginning of the XML file:

<?xml version="1.0" ?>
<PSU>
     <Major id="IST">
        <name>Information Science and Technology</name>
        <degree>B.S.</degree>
        <option> Information Systems: Design and Development Option</option>
        <requirements>
            <firstlevel type="General_Education" credits="45">
                <component type="Writing_Speaking">GWS</component>
                <component type="Quantification">GQ</component>

The program is able to read in the XML file but when I call DocumentBuilder.parse(XMLFile) to get a parsed org.w3c.dom.Document, I get the error above.

It doesn't seem to me that I have invalid content in the prolog of my XML file. I can't figure out what is wrong. Please help. Thanks.

Source: (StackOverflow)
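
":1:1: Content is not allowed in prolog" means the parser met something other than markup at the very first position of its input. The culprit is often invisible in an editor, for example stray bytes or an unexpected byte-order mark ahead of "<?xml". Below is a small diagnostic sketch in Java that prints whatever bytes come before the first '<' in the file. PrologCheck and bytesBeforeFirstTag are hypothetical names, and the file path is assumed to be passed on the command line.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

final class PrologCheck {

    // Returns the bytes before the first '<' as hex, e.g. "EF BB BF" for a UTF-8 BOM.
    // An empty result means the data really does start with markup.
    static String bytesBeforeFirstTag(byte[] data) {
        StringBuilder hex = new StringBuilder();
        for (byte b : data) {
            if (b == '<') {
                break;
            }
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PrologCheck <path-to-xml-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String junk = bytesBeforeFirstTag(data);
        System.out.println(junk.isEmpty() ? "file starts with '<'" : "bytes before first '<': " + junk);
    }
}
```

If the check reports nothing, the file itself is probably fine. The next thing to verify is that DocumentBuilder.parse receives that file (for example, a java.io.File object) and not some other input.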

How to create an XML object from a String in Java?

I am trying to write code that helps me create an XML object. For example, I will give a string as input to a function and it will return an XMLObject.

XMLObject convertToXML(String s) {}

When I searched the net, I mostly found examples about creating XML documents: everything I saw was about creating an XML document and writing it to a file. I have also done something like that:

// Document and Element are JDOM classes (org.jdom or org.jdom2, depending on the version)
Document document = new Document();
Element child = new Element("snmp");
child.addContent(new Element("snmpType").setText("snmpget"));
child.addContent(new Element("IpAdress").setText("127.0.0.1"));
child.addContent(new Element("OID").setText("1.3.6.1.2.1.1.3.0"));
document.setContent(child);

Do you think this is enough to create an XML object? Also, can you please help me with how to get data from the XML? For example, how can I get the IpAdress from that XML?

Thanks a lot, everyone.

EDIT 1: Actually, I now think it might be much easier to have a file such as base.xml into which I write all the basic elements, for example:

  <snmp>
    <snmpType></snmpType>
    <OID></OID>
  </snmp>

and then use this file to create an XML object. What do you think about that?

Source: (StackOverflow)
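
The JDOM code in the question does build an in-memory XML object. To go from a String to a tree and then read IpAdress back out, the JDK's built-in DOM API is enough. The sketch below uses it instead of JDOM. convertToXml and firstText are hypothetical names modelled on the question's signature, and the W3C Document plays the role of the "XMLObject".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SnmpXml {

    // Parses an XML string into a W3C DOM Document.
    static Document convertToXml(String s) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(s)));
    }

    // Returns the text of the first element with the given tag name, or null if there is none.
    static String firstText(Document doc, String tag) {
        Node node = doc.getElementsByTagName(tag).item(0); // item() returns null when out of range
        return node == null ? null : node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String s = "<snmp><snmpType>snmpget</snmpType><IpAdress>127.0.0.1</IpAdress>"
                + "<OID>1.3.6.1.2.1.1.3.0</OID></snmp>";
        Document doc = convertToXml(s);
        System.out.println(firstText(doc, "IpAdress")); // 127.0.0.1
    }
}
```

For the base.xml idea, DocumentBuilder also has a parse(java.io.File) overload. The template can be loaded that way and then filled in with setTextContent on the relevant elements.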

Read XML (from a string) and get some fields - Problems reading XML

I have this XML (stored in a C# string called myXML):

<?xml version="1.0" encoding="utf-16"?>
<myDataz xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <listS>
    <sog>
      <field1>123</field1>
      <field2>a</field2>
      <field3>b</field3>
    </sog>
    <sog>
      <field1>456</field1>
      <field2>c</field2>
      <field3>d</field3>
    </sog>
  </listS>
</myDataz>

and I'd like to browse all <sog> elements. For each of them, I'd like to print the child <field1>.

This is my code:

XmlDocument xmlDoc = new XmlDocument();
string myXML = "<?xml version=\"1.0\" encoding=\"utf-16\"?><myDataz xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"><listS><sog><field1>123</field1><field2>a</field2><field3>b</field3></sog><sog><field1>456</field1><field2>c</field2><field3>d</field3></sog></listS></myDataz>";
xmlDoc.Load(myXML);
XmlNodeList parentNode = xmlDoc.GetElementsByTagName("listS");
foreach (XmlNode childrenNode in parentNode)
{
    HttpContext.Current.Response.Write(childrenNode.SelectSingleNode("//field1").Value);
}

But it seems I can't read a string as XML: I get a System.ArgumentException.

Source: (StackOverflow)
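
In .NET, XmlDocument.Load(string) treats its argument as a file path or URL, which explains the ArgumentException; XmlDocument.LoadXml(string) parses a string of markup. Two further points: "//field1" starts from the document root on every iteration, and XmlNode.Value is null for element nodes (InnerText holds their text). The Java sketch below performs the intended walk with javax.xml.xpath. It uses a relative path so that each field1 comes from its own <sog>. SogFields and field1Values are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SogFields {

    // Returns the text of <field1> inside each <sog>, in document order.
    static List<String> field1Values(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList sogs = (NodeList) xpath.evaluate("/myDataz/listS/sog", doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < sogs.getLength(); i++) {
            // Relative path: evaluated against this <sog>, not the whole document.
            values.add(xpath.evaluate("field1", sogs.item(i)));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<myDataz xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
                + "<listS><sog><field1>123</field1><field2>a</field2></sog>"
                + "<sog><field1>456</field1><field2>c</field2></sog></listS></myDataz>";
        System.out.println(field1Values(xml)); // [123, 456]
    }
}
```

The string overload of XPath.evaluate returns the element's string value, which corresponds to InnerText in .NET, not Value.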

EzDevInfo.com
