Parsing an XML Feed into an Array with XPath

Recently, while working on a project, I found myself needing to parse several different types of files through the same mechanism (CSV, pipe delimited, XML, and more). I decided it would be best to turn each type of feed into an identical object that could then be run through the same methods regardless of the input type. This tutorial will walk you through using PHP and XPath to parse the values from an XML file and store them in an array for later manipulation.

To start, for those who are unfamiliar with XPath: it is a language that lets you easily navigate and retrieve elements and attributes from XML and HTML documents. It's pretty simple to understand and can make life a lot easier when dealing with these formats.
Let's take a look at the following XML example. An XPath expression is essentially just a path to whatever attribute or element you want to get.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

The expression //catalog/book/title selects every title element inside a book element under catalog. In this case, it would return 2 nodes.
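To see this from PHP, here is a minimal sketch that runs the same expression with SimpleXMLElement::xpath() against a trimmed copy of the catalog above (the inlined XML and the variable names are just for illustration). It also selects the id attributes with //book/@id, since XPath can address attributes as well as elements.

```php
<?php
// Trimmed copy of the sample catalog, inlined so the sketch is self-contained
$catalog = <<<'XML'
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
   </book>
</catalog>
XML;

$xml = new SimpleXMLElement($catalog);

// Element nodes: every <title> under a <book> under <catalog>
$titles = $xml->xpath('//catalog/book/title');
echo count($titles), "\n"; // 2
foreach ($titles as $title) {
    echo (string) $title, "\n";
}

// Attribute nodes: the id attribute of every <book>
$ids = $xml->xpath('//book/@id');
foreach ($ids as $id) {
    echo (string) $id, "\n";
}
```

Casting each result to a string gives you the text (or attribute value) rather than the SimpleXMLElement object itself.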

This is the basic idea behind XPath. If you want to learn more, the XPath tutorial over at w3schools covers everything you need to get started.

First, let's create the function that will do the actual data collection for us:

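    function parseXML($feed_url, $field_xpaths)
    {
        //Get the file contents from the URL; bail out if it can't be read
        $file_contents = file_get_contents($feed_url);
        if ($file_contents === false) {
            return array();
        }

        //Create the XML object (throws an Exception on malformed XML)
        $xml = new SimpleXMLElement($file_contents);

        $rows = array();

        //$field_xpaths is the database result; its first row maps each
        //column name (key) to an XPath expression (value)
        foreach ($field_xpaths[0] as $key => $field_xpath)
        {
            //Get an array of nodes that match the current XPath
            $xpath_returns = $xml->xpath($field_xpath);
            if (!is_array($xpath_returns)) {
                continue; //invalid XPath expression, skip this field
            }

            //Store the n-th match in row n under this field's name.
            //foreach replaces each(), which was removed in PHP 8, and the
            //cast stores the text instead of a SimpleXMLElement object.
            foreach ($xpath_returns as $count => $node)
            {
                $rows[$count][$key] = (string) $node;
            }
        }

        return $rows;
    }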

When you break this down, you can see how simple this really is. The function accepts an array for the $field_xpaths parameter and works with its first row ($field_xpaths[0]), treating each value in that row as its own XPath expression. Each value uses its key in the array as its name (in my example, these values are pulled from a database, so the column name becomes the key for that value in the array).
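To make the key-as-name behavior concrete, here is a sketch using a hypothetical parseXMLString() variant. It assumes the XML is already in a string and that the mapping is a flat name => XPath array (instead of a database result read through [0]), so it runs without a feed URL or a database. The row-building loop uses the same index-based logic.

```php
<?php
// Hypothetical variant for illustration: takes an XML string and a flat
// array of name => XPath pairs instead of a URL and a database result set.
function parseXMLString($xml_string, array $field_xpaths)
{
    $xml = new SimpleXMLElement($xml_string);
    $rows = array();

    foreach ($field_xpaths as $key => $field_xpath) {
        $matches = $xml->xpath($field_xpath);
        if (!is_array($matches)) {
            continue; // skip invalid expressions
        }
        // The n-th match of every XPath lands in row n under the same key
        foreach ($matches as $count => $node) {
            $rows[$count][$key] = (string) $node;
        }
    }

    return $rows;
}

$catalog = <<<'XML'
<catalog>
   <book id="bk101"><title>XML Developer's Guide</title><price>44.95</price></book>
   <book id="bk102"><title>Midnight Rain</title><price>5.95</price></book>
</catalog>
XML;

// The keys play the role of the database column names
$field_xpaths = array(
    'book_id' => '//catalog/book/@id',
    'title'   => '//catalog/book/title',
    'price'   => '//catalog/book/price',
);

$rows = parseXMLString($catalog, $field_xpaths);
print_r($rows);
```

Each key (book_id, title, price) becomes a named column in every row, so $rows[1]['title'] holds Midnight Rain.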

It first grabs the entire contents of the file and loads them into a SimpleXMLElement object, which lets us use XPath expressions to navigate through the nodes. For each key/value pair in the array, we get all nodes that match the XPath provided. We then iterate through the returned array and store each value in a master array (in this case just called $rows), using the match's position as the row index. When this is done processing, all values for each XPath are stored in a multidimensional array that you can access by name. For example, I said that my XPaths come from a database, and one of my column names (which becomes a key in the array of results) is store_name. I can then access the store names at $rows[$i]['store_name'], where $i is the row index.
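One thing to watch with this index-based approach: it assumes every record contains every field. If one book were missing its price, the next price would slide up into the wrong row. A different technique, sketched below under the assumption that all records share a common wrapper element (book here), is to select each record first and then evaluate relative XPaths against it. The function name parseXMLByRecord and its parameters are hypothetical.

```php
<?php
// Sketch of a per-record variant (not the approach used above): select each
// record node, then evaluate relative XPaths with that node as the context,
// so a missing field yields null instead of shifting values between rows.
function parseXMLByRecord($xml_string, $record_xpath, array $field_xpaths)
{
    $xml = new SimpleXMLElement($xml_string);
    $rows = array();

    $records = $xml->xpath($record_xpath);
    if (!is_array($records)) {
        return $rows;
    }

    foreach ($records as $record) {
        $row = array();
        foreach ($field_xpaths as $key => $field_xpath) {
            // A relative XPath such as "./price" is evaluated against $record
            $matches = $record->xpath($field_xpath);
            $row[$key] = (is_array($matches) && count($matches) > 0)
                ? (string) $matches[0]
                : null; // field absent in this record
        }
        $rows[] = $row;
    }

    return $rows;
}

// The first book has no <price>, which would misalign the index-based version
$catalog = <<<'XML'
<catalog>
   <book id="bk101"><title>XML Developer's Guide</title></book>
   <book id="bk102"><title>Midnight Rain</title><price>5.95</price></book>
</catalog>
XML;

$rows = parseXMLByRecord($catalog, '//catalog/book', array(
    'title' => './title',
    'price' => './price',
));
print_r($rows);
```

With this sketch the first row gets a null price instead of borrowing 5.95 from the second book. The trade-off is that the stored XPaths must be relative to a record element, which is one more value to keep per feed.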

    $field_locations = getFieldLocations($result['feed_id']); //Method to query the database and get the XPath values from it.
    $feed_array = parseXML($result['feed_url'], $field_locations); //Calls the parser method.

The above code just calls the function and stores the result in $feed_array. You can now take this array and do whatever you need with it. That concludes this tutorial. I hope it was easy to follow; as always, feel free to comment or ask questions. Thanks for reading.