Sometimes, we want to convert an XML file to a Pandas DataFrame.
In this article, we’ll look at how to convert an XML file to a Pandas DataFrame.
How to convert an XML file to a Pandas DataFrame?
To convert an XML file to a Pandas DataFrame, we can use the xml.etree.ElementTree module.
For instance, we write:
import pandas as pd
import xml.etree.ElementTree as ET
xml_str = '''<?xml version="1.0" encoding="utf-8"?>
<response>
<head>
<code> 200 </code>
</head>
<body>
<data id="0" name="All Categories" t="2018052600" tg="1" type="category"/>
<data id="13" name="RealEstate.com.au [H]" t="2018052600" tg="1" type="publication"/>
</body>
</response>
'''
etree = ET.fromstring(xml_str)
dfcols = ['id', 'name']
# Collect one row per <data> element, then build the DataFrame once.
# DataFrame.append was removed in pandas 2.0, so we avoid it here.
rows = []
for i in etree.iter(tag='data'):
    rows.append([i.get('id'), i.get('name')])
df = pd.DataFrame(rows, columns=dfcols)
h = df.head()
print(h)
We have an XML string assigned to xml_str.
And we parse it by passing it as the argument of ET.fromstring, which returns the root element.
Next, we define the columns of the DataFrame.
Then we loop through the data elements we get from etree.iter(tag='data') with the for loop.
In each iteration, we read the id and name attribute values with get and append them to the rows list as one row.
After the loop, we create the DataFrame by passing rows and dfcols to the DataFrame constructor.
Then we get the first 5 rows with df.head.
Therefore, print should output:
   id                   name
0   0         All Categories
1  13  RealEstate.com.au [H]
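The output above holds only the id and name attributes, and both columns contain strings. If we want every attribute of each data element, one possible approach is to build the rows from Element.attrib, which is a dictionary of attribute names and values. The following is a minimal sketch under that assumption. The names records and df_all are made up for illustration, and the XML is a trimmed copy of the sample above.

```python
import xml.etree.ElementTree as ET

import pandas as pd

xml_str = '''<response>
<body>
<data id="0" name="All Categories" t="2018052600" tg="1" type="category"/>
<data id="13" name="RealEstate.com.au [H]" t="2018052600" tg="1" type="publication"/>
</body>
</response>
'''

root = ET.fromstring(xml_str)

# Element.attrib maps attribute names to string values,
# so a list of these dicts maps directly onto DataFrame rows.
records = [dict(el.attrib) for el in root.iter('data')]
df_all = pd.DataFrame(records)

# Attribute values are always strings, so we convert numeric columns explicitly.
df_all = df_all.astype({'id': int, 'tg': int})

print(df_all)
```

With this approach, we don't have to list the columns by hand, since pandas takes them from the dictionary keys.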
Conclusion
To convert an XML file to a Pandas DataFrame, we can use the xml.etree.ElementTree module.
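The example in this article parses an XML string. When the XML is stored in a file, we can likely follow the same steps with ET.parse, which reads the file, and getroot, which returns the root element. Here is a minimal sketch, assuming a hypothetical file named sample.xml that we write to a temporary directory first so the code runs on its own. The names xml_path and df_file are also made up for illustration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

import pandas as pd

xml_text = '''<?xml version="1.0" encoding="utf-8"?>
<response>
<body>
<data id="0" name="All Categories"/>
<data id="13" name="RealEstate.com.au [H]"/>
</body>
</response>
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file, created here only to keep the sketch self-contained.
    xml_path = Path(tmp_dir) / 'sample.xml'
    xml_path.write_text(xml_text, encoding='utf-8')

    # ET.parse reads the file and returns an ElementTree;
    # getroot gives us the root element, like ET.fromstring does for a string.
    root = ET.parse(xml_path).getroot()

# From here on, the steps match the string-based example.
rows = [[el.get('id'), el.get('name')] for el in root.iter('data')]
df_file = pd.DataFrame(rows, columns=['id', 'name'])
print(df_file)
```

In real code, we would pass the path of our own XML file to ET.parse and skip the temporary directory.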