Import XML into Pandas and Convert to CSV

In this article we will cover importing XML, including nested XML, into Python using Pandas. You can then save the pandas DataFrame to a CSV file in order to convert the XML to CSV. We will be using the example XML file below, which is a dummy API response generated by ChatGPT.

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <status>200</status>
    <message>Success</message>
    <data>
        <users>
            <user>
                <id>123</id>
                <username>johndoe</username>
                <email>johndoe@example.com</email>
                <first_name>John</first_name>
                <last_name>Doe</last_name>
                <address>
                    <street>123 Main St</street>
                    <city>Anytown</city>
                    <zip_code>12345</zip_code>
                </address>
            </user>
            <user>
                <id>456</id>
                <username>janedoe</username>
                <email>janedoe@example.com</email>
                <first_name>Jane</first_name>
                <last_name>Doe</last_name>
                <address>
                    <street>456 Oak St</street>
                    <city>AnotherTown</city>
                    <zip_code>54321</zip_code>
                </address>
            </user>
        </users>
        <products>
            <product>
                <id>789</id>
                <name>Smartphone</name>
                <price>399.99</price>
                <description>A feature-rich smartphone with the latest technology.</description>
                <category>Electronics</category>
                <manufacturer>TechCo</manufacturer>
                <availability>In Stock</availability>
            </product>
            <product>
                <id>101</id>
                <name>Laptop</name>
                <price>899.99</price>
                <description>A powerful laptop for productivity and entertainment.</description>
                <category>Computers</category>
                <manufacturer>PCMaster</manufacturer>
                <availability>Out of Stock</availability>
            </product>
        </products>
    </data>
</response>

We will work with the users table within the XML file, as this table is more complex. Since the address has several sub-elements, we need to collapse them into one so that this can be loaded into Pandas. For this, we will use the stylesheet below. Make sure both the XML file and XSLT file are in your working directory if you’re following along.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
    <xsl:for-each select="response/data/users/user">
    <user>
      <id><xsl:value-of select="id"/></id>
      <username><xsl:value-of select="username"/></username>
      <email><xsl:value-of select="email"/></email>
      <first_name><xsl:value-of select="first_name"/></first_name>
      <last_name><xsl:value-of select="last_name"/></last_name>
      <address><xsl:value-of select="address/street"/>,&#032;<xsl:value-of select="address/city"/>,&#032;<xsl:value-of select="address/zip_code"/></address>
    </user>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet> 

The final thing to do is make sure that you have both pandas and the lxml library.

pip install pandas
pip install lxml

With our files read and the relevant libraries installed, you can go ahead and convert the XML file to a CSV using only 3 lines of Python.

import pandas as pd
df = pd.read_xml('samplefile.xml', stylesheet='samplefilestylesheet.xslt', xpath='//user')
df.to_csv('xml_to_csv.csv', index=False)

That’s it!