Working with XML in Python is quite nice, but when you render a document back to XML after manipulating it, the namespace prefixes in the result can come out looking odd. Take the following example:
    from xml.etree import ElementTree

    xml = ElementTree.fromstring("""
    <document xmlns="http://www.lol.com/"
              xmlns:foo="http://www.foo.com/"
              xmlns:bar="http://www.bar.com/">
      <something>
        <foo:thing>Hello,</foo:thing>
        <bar:thing>world!</bar:thing>
      </something>
    </document>
    """)

    # encoding="unicode" makes tostring() return str instead of bytes (Python 3)
    print(ElementTree.tostring(xml, encoding="unicode"))
    <ns0:document xmlns:ns0="http://www.lol.com/" xmlns:ns1="http://www.foo.com/" xmlns:ns2="http://www.bar.com/">
      <ns0:something>
        <ns1:thing>Hello,</ns1:thing>
        <ns2:thing>world!</ns2:thing>
      </ns0:something>
    </ns0:document>
ElementTree's parser doesn't keep the namespace prefixes from the source document, so the serializer invents its own (ns0, ns1, and so on) in the output XML. The result is perfectly valid XML, just a bit unsightly. If you know the namespaces in advance, a simple fix is to register them with ElementTree before rendering:
    from xml.etree import ElementTree

    xml = ElementTree.fromstring("""
    <document xmlns="http://www.lol.com/"
              xmlns:foo="http://www.foo.com/"
              xmlns:bar="http://www.bar.com/">
      <something>
        <foo:thing>Hello,</foo:thing>
        <bar:thing>world!</bar:thing>
      </something>
    </document>
    """)

    # The empty prefix stands for the default namespace.
    namespaces = {
        '': 'http://www.lol.com/',
        'foo': 'http://www.foo.com/',
        'bar': 'http://www.bar.com/',
    }

    # Registration is global to the ElementTree module.
    for prefix, uri in namespaces.items():
        ElementTree.register_namespace(prefix, uri)

    print(ElementTree.tostring(xml, encoding="unicode"))
    <document xmlns="http://www.lol.com/" xmlns:bar="http://www.bar.com/" xmlns:foo="http://www.foo.com/">
      <something>
        <foo:thing>Hello,</foo:thing>
        <bar:thing>world!</bar:thing>
      </something>
    </document>
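Registered prefixes are not tied to one parsed document. ElementTree stores tags internally in `{uri}local` form, so the same registration also applies to elements you build by hand. Here is a minimal sketch, assuming Python 3 and the `http://www.foo.com/` URI from the example above; the `build_thing` helper is a made-up name for illustration:

```python
from xml.etree import ElementTree

FOO_NS = "http://www.foo.com/"  # same URI as in the example above

# The registry is module-global: every later serialization in this
# process will use the "foo" prefix for this URI.
ElementTree.register_namespace("foo", FOO_NS)


def build_thing(text):
    """Hypothetical helper: build a namespaced <thing> element from scratch."""
    elem = ElementTree.Element("{%s}thing" % FOO_NS)
    elem.text = text
    return elem


rendered = ElementTree.tostring(build_thing("Hello,"), encoding="unicode")
print(rendered)
# <foo:thing xmlns:foo="http://www.foo.com/">Hello,</foo:thing>
```

Since the registry is shared by the whole process, it is probably best to register prefixes once at startup rather than per document.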
If you don't know the namespaces in advance, one reasonable option is to use ElementTree.iterparse to collect them while the document is being parsed, like so:
    from io import StringIO
    from xml.etree import ElementTree

    xmlin = """
    <document xmlns="http://www.lol.com/"
              xmlns:foo="http://www.foo.com/"
              xmlns:bar="http://www.bar.com/">
      <something>
        <foo:thing>Hello,</foo:thing>
        <bar:thing>world!</bar:thing>
      </something>
    </document>
    """

    xml = None
    namespaces = {}

    # 'start-ns' events carry a (prefix, uri) tuple; 'start' events carry an
    # element. The loop runs to the end of the input so that the tree is
    # fully built and declarations on nested elements are seen as well.
    for event, payload in ElementTree.iterparse(StringIO(xmlin),
                                                events=('start', 'start-ns')):
        if event == 'start-ns':
            prefix, uri = payload
            if prefix in namespaces and namespaces[prefix] != uri:
                # NOTE: It is perfectly valid to have the same prefix refer
                # to different URI namespaces in different parts of the
                # document. This exception serves as a reminder that this
                # solution is not robust. Use at your own peril.
                raise KeyError("Duplicate prefix with different URI found.")
            namespaces[prefix] = uri
        elif xml is None:
            # The first 'start' event belongs to the root element.
            xml = payload

    for prefix, uri in namespaces.items():
        ElementTree.register_namespace(prefix, uri)

    print(ElementTree.tostring(xml, encoding="unicode"))
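To make the caveat in that comment concrete, here is a small sketch of the case this approach cannot handle: a document that rebinds the same prefix to a second URI. It assumes Python 3; the `collect_namespaces` helper and the sample document are made up for illustration, and the helper raises `ValueError` where the listing above raises `KeyError`.

```python
from io import StringIO
from xml.etree import ElementTree


def collect_namespaces(xml_text):
    """Hypothetical helper: map each prefix to its URI, or raise on conflict."""
    namespaces = {}
    # Only 'start-ns' events are requested, so each payload is (prefix, uri).
    for _event, (prefix, uri) in ElementTree.iterparse(
            StringIO(xml_text), events=("start-ns",)):
        if namespaces.get(prefix, uri) != uri:
            raise ValueError("prefix %r is bound to more than one URI" % prefix)
        namespaces[prefix] = uri
    return namespaces


# Made-up document: "foo" is rebound on the second child element.
conflicting = """
<document xmlns:foo="http://www.foo.com/">
  <foo:thing>Hello,</foo:thing>
  <foo:thing xmlns:foo="http://www.bar.com/">world!</foo:thing>
</document>
"""

try:
    collect_namespaces(conflicting)
    conflict_detected = False
except ValueError:
    conflict_detected = True

print(conflict_detected)
# True
```

The global registry holds a single URI per prefix, so a document like this one cannot be rendered with its original prefixes through `register_namespace` alone.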