How to convert an XML file to a Python Pandas dataframe?

Sometimes, we want to convert an XML file to a Python Pandas dataframe.

In this article, we’ll look at how to convert an XML file to a Python Pandas dataframe.

How to convert an XML file to a Python Pandas dataframe?

To convert an XML file to a Python Pandas dataframe, we can parse it with xml.etree.ElementTree and use the element's iter method to build one dictionary per row.

For instance, we write

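import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    # Attributes of the author element are shared by every row.
    author_attr = author.attrib
    for doc in author.iter('document'):
        # Copy so each row gets its own dictionary.
        doc_dict = author_attr.copy()
        # Add the document's own attributes (same-named keys are overwritten).
        doc_dict.update(doc.attrib)
        # Store the element's text content.
        doc_dict['data'] = doc.text
        yield doc_dict

# Wrap the XML string in a file-like object so ET.parse can read it.
xml_data = io.StringIO(u'''...''')

etree = ET.parse(xml_data)
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))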

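to create the iter_docs generator function, which loops over every document element under the author element with author.iter('document').

For each document, we copy the author's attributes into the doc_dict dictionary, add the document's own attributes with update, and store the element's text under the 'data' key.

Then we use yield to return doc_dict, so each document produces one dictionary.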
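To see what iter_docs yields, here is a minimal sketch that runs it on a small XML sample. The sample XML and its attribute names (type, language, KEY, web) are made up for illustration, since the article's own XML string is elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample XML, invented for illustration only.
SAMPLE_XML = """
<author type="novelist" language="EN">
  <documents>
    <document KEY="doc1" web="example.com/1">First text</document>
    <document KEY="doc2" web="example.com/2">Second text</document>
  </documents>
</author>
""".strip()


def iter_docs(author):
    """Yield one dict per <document>, merging author and document attributes."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()   # start from the author's attributes
        doc_dict.update(doc.attrib)     # add the document's own attributes
        doc_dict['data'] = doc.text     # keep the element's text
        yield doc_dict


root = ET.fromstring(SAMPLE_XML)
rows = list(iter_docs(root))
for row in rows:
    print(row)
```

Each printed dictionary holds the author's attributes, the document's attributes, and the text under 'data'. Because update overwrites existing keys, a document attribute with the same name as an author attribute replaces the author's value in that row.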

Next, we wrap the XML string in a file-like object with

xml_data = io.StringIO(u'''...''')

Then we parse the XML string with

etree = ET.parse(xml_data)
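ET.parse expects a file name or a file object, which is why the string is wrapped in StringIO first. As a rough sketch, assuming a made-up one-line XML string, the following compares it with ET.fromstring, which takes the string directly and returns the root element.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-line sample, invented for illustration only.
xml_text = '<author language="EN"><document KEY="doc1">Hello</document></author>'

# ET.parse reads from a file name or file object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_text))
root_from_parse = tree.getroot()

# ET.fromstring parses the string itself and returns the root Element.
root_from_string = ET.fromstring(xml_text)

same_tag = root_from_parse.tag == root_from_string.tag
print(root_from_parse.tag, root_from_string.tag, same_tag)
```

Either root element can be passed to iter_docs. When the XML lives in a file on disk, ET.parse can be given the file path instead of a StringIO object.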

And then we call iter_docs on the root element to get a generator of dictionaries with

iter_docs(etree.getroot())

Then we convert the generator's items to a list with list and pass that list to DataFrame, which creates a data frame with one row per document and one column per dictionary key.
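The whole pipeline can be sketched end to end as follows. The sample XML is hypothetical, and its second document deliberately lacks the web attribute to show how DataFrame handles dictionaries with different keys.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample; the second document has no "web" attribute.
SAMPLE_XML = """<author language="EN">
  <document KEY="doc1" web="example.com/1">First text</document>
  <document KEY="doc2">Second text</document>
</author>"""


def iter_docs(author):
    """Yield one dict per <document>, merging author and document attributes."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


tree = ET.parse(io.StringIO(SAMPLE_XML))
rows = list(iter_docs(tree.getroot()))
doc_df = pd.DataFrame(rows)

print(doc_df)
# True marks rows where the "web" key was missing from the dictionary.
print(doc_df['web'].isna().tolist())
```

When a key is absent from one of the dictionaries, DataFrame fills that cell with a missing value, so documents with different attributes can still share one data frame.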

Conclusion

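To convert an XML file to a Python Pandas dataframe, we can use the iter method of an ElementTree element to collect each element's data into a dictionary, and then pass the list of dictionaries to DataFrame.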

By John Au-Yeung
