How to convert an XML file to a Python Pandas dataframe?


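To convert an XML file to a Python Pandas dataframe, we parse the XML into an element tree and then we create a dataframe from it. The example below parses an XML string with ET.fromstring; for a file on disk, ET.parse(path).getroot() gives you the same kind of root element.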

For instance, we write

import pandas as pd
import xml.etree.ElementTree as ET

xml_str = '<?xml version="1.0" encoding="utf-8"?>\n<response>\n <head>\n  <code>\n   200\n  </code>\n </head>\n <body>\n  <data id="0" name="All Categories" t="2018052600" tg="1" type="category"/>\n  <data id="13" name="RealEstate.com.au [H]" t="2018052600" tg="1" type="publication"/>\n </body>\n</response>'

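etree = ET.fromstring(xml_str)
dfcols = ['id', 'name']
rows = []

# Collect the id and name attributes of every <data> element
for i in etree.iter(tag='data'):
    rows.append([i.get('id'), i.get('name')])

# Build the data frame in one call (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows, columns=dfcols)

df.head()

to call ET.fromstring with xml_str to create an XML tree object.

Next, we use a for loop to loop through the data elements returned by etree.iter(tag='data').

In it, we call rows.append to add a list with the id and name attribute values of each node. We collect the rows in a plain list because DataFrame.append no longer exists in pandas 2.0 and later, and appending to a list is also much faster than growing a data frame one row at a time.

Then we create the data frame from rows with DataFrame, using dfcols as the column names.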
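If you want every attribute of each data element instead of only id and name, one variation is to collect each element's attrib dictionary and pass the list of dicts to DataFrame. This is a minimal sketch assuming the same flat layout as xml_str above; the helper name xml_attrs_to_df is hypothetical.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def xml_attrs_to_df(xml_str: str, tag: str) -> pd.DataFrame:
    """Hypothetical helper: one row per <tag> element, one column per attribute."""
    root = ET.fromstring(xml_str)
    # el.attrib is a dict of the element's attributes; copy it so the
    # records do not share state with the parsed tree
    records = [dict(el.attrib) for el in root.iter(tag)]
    # A list of dicts becomes one row per dict; an attribute missing
    # from some elements shows up as NaN in those rows
    return pd.DataFrame(records)


xml_str = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<response><body>'
    '<data id="0" name="All Categories" t="2018052600" tg="1" type="category"/>'
    '<data id="13" name="RealEstate.com.au [H]" t="2018052600" tg="1" type="publication"/>'
    '</body></response>'
)

df = xml_attrs_to_df(xml_str, 'data')
print(df)
```

The columns follow the attribute order of the first element (id, name, t, tg, type). All values come back as strings, so convert numeric columns yourself if needed, for example with pd.to_numeric(df['id']).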


By John Au-Yeung
