[SOLVED] How to prevent lxml from converting ‘&’ character to ‘&amp;’?

Issue

This Content is from Stack Overflow. Question asked by Krupniok

I need to send the control characters &#x0D; and &#x0A; in my XML file so that the text is displayed correctly in the target system.

For the creation of the XML file I use the lxml library. This is my attempt:

from lxml import etree as et
import lxml.builder

e = lxml.builder.ElementMaker()

xml_doc = e.newOrderRequest(
    e.Orders(
        e.Order(
            e.OrderNumber('12345'),
            e.OrderID('001'),
            e.Articles(
                e.Article(
                    e.ArticleNumber('000111'),
                    e.ArticleName('Logitec Mouse'),
                    e.ArticleDescription('* 4 Buttons&#x0D;&#x0A;* 600 DPI&#x0D;&#x0A;* Bluetooth')
                )
            )
        )
    )
)

tree = et.ElementTree(xml_doc)
tree.write('output.xml', pretty_print=True, xml_declaration=True, encoding="utf-8")

This is the result:

<?xml version='1.0' encoding='UTF-8'?>
<newOrderRequest>
  <Orders>
    <Order>
      <OrderNumber>12345</OrderNumber>
      <OrderID>001</OrderID>
      <Articles>
        <Article>
          <ArticleNumber>000111</ArticleNumber>
          <ArticleName>Logitec Mouse</ArticleName>
          <ArticleDescription>* 4 Buttons&amp;#x0D;&amp;#x0A;* 600 DPI&amp;#x0D;&amp;#x0A;* Bluetooth</ArticleDescription>
        </Article>
      </Articles>
    </Order>
  </Orders>
</newOrderRequest>

This is what I need:

<ArticleDescription>* 4 Buttons&#x0D;&#x0A;* 600 DPI&#x0D;&#x0A;* Bluetooth</ArticleDescription>

Is there a function in the lxml library to turn off the conversion or does anyone know a way to solve this problem? Thanks in advance.



Solution

This is not a Python or lxml issue; it is how XML parsers and serializers work.
If you want a specific character in the document, put that character itself into the string in your programming language (in Python, '\r\n' rather than the text '&#x0D;&#x0A;'). The serializer will convert it into a character or entity reference if required, and the parser will convert it back when reading the document. You cannot turn this off: an unescaped & in element text would not be well-formed XML, so it would be against the specification.
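
A minimal sketch of that approach, assuming the target system uses a conforming XML parser. The element name is taken from the question; the variable names are only illustrative.

```python
from lxml import etree as et

# Use the real CR and LF characters in the Python string,
# not the text "&#x0D;&#x0A;" typed out literally.
description = "* 4 Buttons\r\n* 600 DPI\r\n* Bluetooth"

elem = et.Element("ArticleDescription")
elem.text = description

# The serializer decides how each character is written to the output.
xml_bytes = et.tostring(elem)
print(xml_bytes)

# Parsing the result gives back the original characters.
round_tripped = et.fromstring(xml_bytes).text
print(repr(round_tripped))

# For comparison: typing the reference as text puts a literal "&" into
# the string, which the serializer must escape as "&amp;".
typed_out = et.Element("ArticleDescription")
typed_out.text = "* 4 Buttons&#x0D;&#x0A;* 600 DPI"
typed_out_bytes = et.tostring(typed_out)
print(typed_out_bytes)
```

As far as I know, lxml (through libxml2) writes the carriage return as the character reference &#13;, which is equivalent to &#x0D;, and leaves the line feed as a plain newline. A parser on the receiving side should read both forms back as the same characters.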

An exception might be to use a CDATA section, as explained in "What does <![CDATA[]]> in XML mean?".
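
A rough sketch of the CDATA variant using lxml's et.CDATA wrapper. Note the caveat: inside a CDATA section the & is written unescaped, but the text &#x0D; is then plain characters, not a reference to a carriage return, so whether this helps depends on how the target system treats that text.

```python
from lxml import etree as et

elem = et.Element("ArticleDescription")
# et.CDATA wraps the text so the serializer emits a CDATA section.
elem.text = et.CDATA("* 4 Buttons&#x0D;&#x0A;* 600 DPI")

cdata_bytes = et.tostring(elem)
print(cdata_bytes)

# Reading it back yields the literal text, "&#x0D;" included;
# no carriage return or line feed character appears.
parsed_text = et.fromstring(cdata_bytes).text
print(repr(parsed_text))
```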


This question was asked on Stack Overflow by Krupniok and answered by Hiran Chaudhuri. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.
