Sometimes, we want to convert an XML file to a Python Pandas dataframe.
In this article, we’ll look at how to convert an XML file to a Python Pandas dataframe.
How to convert an XML file to a Python Pandas dataframe?
To convert an XML file to a Python Pandas dataframe, we can use the iter method of ElementTree elements to walk the XML and build one dictionary per row.
For instance, we write
import io
import xml.etree.ElementTree as ET

import pandas as pd


def iter_docs(author):
    # Attributes of the author element are shared by every row.
    author_attr = author.attrib
    for doc in author.iter('document'):
        # Copy so each row gets its own dictionary.
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


xml_data = io.StringIO(u'''...''')
etree = ET.parse(xml_data)
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))
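to create the iter_docs function, which iterates through the document elements under the author element.
For each document, we copy the author's attributes into the doc_dict dictionary, add the document's own attributes, and store the document's text under the data key.
Then we use yield to hand doc_dict back to the caller, one dictionary per document.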
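To make the shape of the yielded dictionaries concrete, here is a minimal sketch that runs iter_docs on a small XML string. The sample XML, including its attribute names and values, is a hypothetical stand-in, since the article's own XML string is elided.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data (an assumption; the article elides its XML string).
SAMPLE_XML = '''<author type="XXX" language="EN" gender="xx">
    <document KEY="e95a9a6c">A first document</document>
    <document KEY="bc360cfb">A second document</document>
</author>'''


def iter_docs(author):
    # Attributes of the author element are shared by every row.
    author_attr = author.attrib
    for doc in author.iter('document'):
        # Copy so each row gets its own dictionary.
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


root = ET.parse(io.StringIO(SAMPLE_XML)).getroot()
rows = list(iter_docs(root))
for row in rows:
    print(row)
```

Each dictionary combines the author's attributes with one document's attributes and text, so each one maps to a single row of the eventual data frame.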
Next, we wrap the XML string in a file-like object with
xml_data = io.StringIO(u'''...''')
Then we parse the XML string with
etree = ET.parse(xml_data)
And then we call iter_docs on the root element to get the dictionaries with
iter_docs(etree.getroot())
Then we convert the generator's items to a list with list and pass that list to DataFrame to create the data frame.
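Putting the steps together, the sketch below wraps them in a helper and builds a data frame from a small XML string. The sample XML and the xml_to_df helper name are hypothetical and only serve as illustration, since the article's own XML string is elided.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data (an assumption; the article elides its XML string).
SAMPLE_XML = '''<author type="XXX" language="EN" gender="xx">
    <document KEY="e95a9a6c">A first document</document>
    <document KEY="bc360cfb">A second document</document>
</author>'''


def iter_docs(author):
    # Yield one dictionary per document element under the author element.
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


def xml_to_df(xml_text):
    # Hypothetical helper: parse the XML text, then build one row per document.
    root = ET.parse(io.StringIO(xml_text)).getroot()
    return pd.DataFrame(list(iter_docs(root)))


doc_df = xml_to_df(SAMPLE_XML)
print(doc_df)
```

The resulting data frame has one row per document element and one column per attribute, plus the data column that holds the element text.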
Conclusion
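To convert an XML file to a Python Pandas dataframe, we can use the iter method to loop through the elements, build a dictionary for each one, and pass the list of dictionaries to DataFrame.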