PHP DomXPath: Read Complex XML files easily.

XPath makes it easy to traverse XML elements and attributes. For complex XML documents, using XPath can significantly reduce the amount of code you need to write.

A good XPath tutorial is available on W3Schools. The reference for DomXPath is in the PHP manual.

XPath is useful when you need to extract a specific node by running a query, rather than parsing the entire XML. Below I will explain how to use DomDocument and DomXPath to read XML. I will start with a simple XML file and then move on to a more complex one.

Here’s a basic XML called ‘test.xml’:

<?xml version="1.0" encoding="ISO-8859-1"?>
<library>
	<book isbn="781">
		<name>SCJP 1.5</name>
		<info><![CDATA[Sun Certified Java Programmer book]]></info>
	</book>
	<book isbn="980">
		<name>jQuery How To</name>
		<info><![CDATA[jQuery Reference Book]]></info>
	</book>
</library>

A few details first. Here are a couple of examples of DomXPath query syntax:

// get all book elements that have an isbn attribute and are children of library
query("//library/book[@isbn]");

// get all name elements that are children of book, which is a child of library (library is the root node)
query("/library/book/name");

// note: above I use a single slash before library to specify that it is the root node

For the full specification, see the W3C's page on XPath syntax.

To register a namespace with PHP's DomXPath, use the following:

$xpath->registerNamespace('prefix', 'namespaceURI');

Below is a way to parse it. First load the XML file into DomDocument, then initialize DomXPath, and then run the query method.

$dom = new DomDocument("1.0", "ISO-8859-1");
$dom->load('test.xml');
$xpath = new DomXPath($dom);

Let's say I want to get the names of all the books. Then just do the following:

$bookList  = array();
$bookNodes = $xpath->query('//book/name'); // selects all name element
for($i=0;$i<$bookNodes->length;$i++) {
 $bookList[] = $bookNodes->item($i)->nodeValue;
}

// below is a var_dump of $bookList
array(2) {
  [0]=>
  string(8) "SCJP 1.5"
  [1]=>
  string(13) "jQuery How To"
}
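
XPath predicates can also filter on attribute values, and attribute nodes themselves can be selected with @. The sketch below illustrates both on the same library XML. It assumes the XML is loaded from an inline string with loadXML() instead of test.xml, only so that the snippet runs on its own.

```php
<?php
// Inline copy of test.xml so the example is self-contained (assumption).
$xml = <<<XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<library>
	<book isbn="781">
		<name>SCJP 1.5</name>
		<info><![CDATA[Sun Certified Java Programmer book]]></info>
	</book>
	<book isbn="980">
		<name>jQuery How To</name>
		<info><![CDATA[jQuery Reference Book]]></info>
	</book>
</library>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// name of the book whose isbn attribute equals 980
$nameNodes = $xpath->query('/library/book[@isbn="980"]/name');
$bookName  = $nameNodes->length > 0 ? $nameNodes->item(0)->nodeValue : null;

// collect every isbn value; the query returns attribute nodes
$isbnList = array();
foreach ($xpath->query('/library/book/@isbn') as $attr) {
	$isbnList[] = $attr->nodeValue;
}
```

Here $bookName ends up as "jQuery How To" and $isbnList as the two ISBN strings, in document order.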

OK, simple enough. The above example doesn't really demonstrate how XPath makes life easier, so let's parse YouTube's featured RSS playlist, which uses a namespace and has a whole lot of elements. To keep it simple, I am only going to fetch the titles of the recently added videos and their corresponding URLs.

From the RSS file, it can be seen that the URL is stored in the "href" attribute of the 'link' element whose type attribute is text/html. Below I show a snippet of the 'link' and 'title' elements from the XML.


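<feed xmlns='http://www.w3.org/2005/Atom'>
 ...
  <entry>
    <title type='text'>YouTube Symphony Orchestra @ Carnegie Hall - Act One</title>
    ...
    <link type='text/html' href='http://www.youtube.com/watch?v=ueJcRmfweSM'/>
   ...
  </entry>
 ...
</feed>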

OK, so we need to get the following:

  1. The node value of the 'title' element that has the attribute type='text' and sits inside entry, which in turn is inside feed.
  2. The value of the 'href' attribute of the link element inside entry, which is inside feed.

Below is the full code for reading YouTube's RSS:

// initialize DomDocument and load the feed into it
$youTubeDom = new DomDocument();
$youTubeDom->load('http://gdata.youtube.com/feeds/api/standardfeeds/recently_featured');

// initialize a DomXPath object
$xPath = new DomXPath($youTubeDom);

// register the namespace used by YouTube (it's declared on the feed element)
$xPath->registerNamespace('yte', 'http://www.w3.org/2005/Atom');

// now run the 2 queries; add the prefix of that namespace because feed, entry etc. belong to it
$linkNodes  = $xPath->query("/yte:feed/yte:entry/yte:link[@type='text/html']");
$titleNodes = $xPath->query("/yte:feed/yte:entry/yte:title[@type='text']");


$recentList	= array();
for($i=0;$i<$titleNodes->length;$i++) {
	$recentList[$i] = array(
		'title' => $titleNodes->item($i)->nodeValue ,
		'url'	=> $linkNodes->item($i)->getAttribute('href')
	);
	
}

And that's it. Here's a var_dump of $recentList:


array(25) {
  [0]=>
  array(2) {
    ["title"]=>
    string(52) "YouTube Symphony Orchestra @ Carnegie Hall - Act One"
    ["url"]=>
    string(42) "http://www.youtube.com/watch?v=ueJcRmfweSM"
  }
  [1]=>
  array(2) {
    ["title"]=>
    string(38) ""The Internet Symphony" Global Mash Up"
    ["url"]=>
    string(42) "http://www.youtube.com/watch?v=oC4FAyg64OI"
  }
  [2]=>
  array(2) {
    ["title"]=>
    string(49) "Harmony: The Road to Carnegie Hall Teaser Trailer"
    ["url"]=>
    string(42) "http://m.youtube.com/details?v=oC4FAyg64OI"
  }
  [3]=>
  array(2) {
    ["title"]=>
    string(44) "The YouTube Symphony Orchestra Summit Begins"
    ["url"]=>
    string(42) "http://www.youtube.com/watch?v=wBZviTce94Q"
  }
  [4]=>
  array(2) {
    ["title"]=>
    string(47) "4/13@A.M. YouTubeSymphonyOrchestra Vlog by Eiko"
    ["url"]=>
    string(42) "http://m.youtube.com/details?v=wBZviTce94Q"
  }
  [5]=>
  array(2) {
    ["title"]=>
    string(50) ""Internet Symphony, Eroica" Rehearsal with Tan Dun"
    ["url"]=>
    string(42) "http://www.youtube.com/watch?v=lwVtmH9k-SI"
  }
  // ...... shortened .... 
  
}
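
One caveat is visible in the dump above: some URLs point to m.youtube.com and repeat the video ID of the preceding item, which suggests that an entry can contain more than one link of type text/html, so pairing two separate node lists by index can drift. A possible way around this, sketched below, is to select each entry once and run both queries relative to it. The inline feed is a made-up stand-in (its titles and URLs are hypothetical), not the real YouTube response.

```php
<?php
// Hypothetical miniature Atom feed: each entry carries two text/html links.
$feedXml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<entry>
		<title type="text">First video</title>
		<link type="text/html" href="http://example.com/watch?v=aaa"/>
		<link type="text/html" href="http://m.example.com/details?v=aaa"/>
	</entry>
	<entry>
		<title type="text">Second video</title>
		<link type="text/html" href="http://example.com/watch?v=bbb"/>
		<link type="text/html" href="http://m.example.com/details?v=bbb"/>
	</entry>
</feed>
XML;

$feedDom = new DOMDocument();
$feedDom->loadXML($feedXml);

$feedXPath = new DOMXPath($feedDom);
$feedXPath->registerNamespace('yte', 'http://www.w3.org/2005/Atom');

$pairedList = array();
foreach ($feedXPath->query('/yte:feed/yte:entry') as $entry) {
	// the second argument makes the query relative to this entry
	$title = $feedXPath->query("yte:title[@type='text']", $entry)->item(0);
	$link  = $feedXPath->query("yte:link[@type='text/html']", $entry)->item(0);
	if ($title === null || $link === null) {
		continue; // skip entries missing either piece
	}
	$pairedList[] = array(
		'title' => $title->nodeValue,
		'url'   => $link->getAttribute('href'),
	);
}
```

Because each title and link is looked up inside the same entry, the result has one item per entry, using the first matching link of each.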

I hope this explains how XPath simplifies reading XML files.

PHP 5 XMLReader: Reading XML with Namespace (Part 2)

This is a continuation of the previous article, where I wrote about how to use PHP 5's DOM to read XML files easily. For large files, however, it's better to use XMLReader. Unlike DomDocument, XMLReader does not load the entire file into memory; it reads an XML file one node at a time. I hardly use XMLReader because it requires writing a lot of code, but for extremely large XML files it's the better choice. The full reference for XMLReader is in the PHP manual. It requires PHP 5.2.

The following are the functions I use:

  1. open – opens XML File
  2. read – reads node of XML
  3. getAttribute – gets the attribute of a node
  4. moveToNextAttribute – moves to next Attribute
  5. next – moves to next node
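
moveToNextAttribute is not used in the full example further down, so here is a small sketch of it. It assumes the XML is passed as an inline string through XMLReader's XML() method instead of open(), purely to keep the snippet self-contained, and the second attribute (lang) is made up for illustration.

```php
<?php
$reader = new XMLReader();
// load from a string instead of a file (assumption, for a self-contained example)
$reader->XML('<library><book isbn="781" lang="en"><name>SCJP 1.5</name></book></library>');

$attributes = array();
while ($reader->read()) {
	if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == 'book') {
		// step through every attribute of the current <book> element
		while ($reader->moveToNextAttribute()) {
			$attributes[$reader->name] = $reader->value;
		}
		// move the cursor from the last attribute back to the element itself
		$reader->moveToElement();
	}
}
$reader->close();
```

This is handy when the attribute names are not known in advance; when they are, getAttribute is shorter.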

Here’s a Basic XML File called ‘test.xml’:

<?xml version="1.0" encoding="ISO-8859-1"?>
<library>
	<book isbn="781">
		<name>SCJP 1.5</name>
		<info><![CDATA[Sun Certified Java Programmer book]]></info>
	</book>
	<book isbn="980">
		<name>jQuery How To</name>
		<info><![CDATA[jQuery Reference Book]]></info>
	</book>
</library>

Below is a way to read it:

At first initialize and open the XML file

$xmlReader = new XMLReader();
// open the file for reading
$xmlReader->open('test.xml');

Now keep reading nodes until the end has been reached, which is done with a while loop:

while($xmlReader->read()) { 

}

Below is the full code:

$bookList = array();
$i=0;
$xmlReader = new XMLReader();
$xmlReader->open('test.xml');
while($xmlReader->read()) {
	// make sure the node is an element, not an attribute or a text node
	if($xmlReader->nodeType == XMLReader::ELEMENT) {
		if($xmlReader->localName == 'book') {
			$bookList[$i]['book_isbn'] = $xmlReader->getAttribute('isbn');
		}
		if($xmlReader->localName == 'name') {
			// move to its textnode / child
			$xmlReader->read(); 
			$bookList[$i]['name'] = $xmlReader->value;
		}
		if($xmlReader->localName == 'info') {
			// move to its textnode / child
			$xmlReader->read(); 
			$bookList[$i]['info'] = $xmlReader->value;
			$i++;
		}
		
	}
}

Here’s a var_dump of $bookList

array(2) {
  [0]=>
  array(3) {
    ["book_isbn"]=>
    string(3) "781"
    ["name"]=>
    string(8) "SCJP 1.5"
    ["info"]=>
    string(34) "Sun Certified Java Programmer book"
  }
  [1]=>
  array(3) {
    ["book_isbn"]=>
    string(3) "980"
    ["name"]=>
    string(13) "jQuery How To"
    ["info"]=>
    string(21) "jQuery Reference Book"
  }
}
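
The title mentions namespaces, which test.xml does not use. As a rough sketch, the same loop can check XMLReader's namespaceURI property next to localName when the document is namespaced. The inline Atom-style feed, its second namespace and its titles are hypothetical, and readString() is used here to fetch the element's text in one call instead of a second read().

```php
<?php
// Hypothetical namespaced document, loaded from a string so the example is self-contained.
$feedXml = '<feed xmlns="http://www.w3.org/2005/Atom" xmlns:x="urn:example:other">'
	. '<entry><title>First</title><x:title>ignored</x:title></entry>'
	. '<entry><title>Second</title></entry>'
	. '</feed>';

$atomNs = 'http://www.w3.org/2005/Atom';
$titles = array();

$reader = new XMLReader();
$reader->XML($feedXml);
while ($reader->read()) {
	// match on local name AND namespace, so x:title is not picked up
	if ($reader->nodeType == XMLReader::ELEMENT
		&& $reader->localName == 'title'
		&& $reader->namespaceURI == $atomNs) {
		$titles[] = $reader->readString();
	}
}
$reader->close();
```

Comparing the namespace URI rather than the prefix keeps the check working even if the feed author picks a different prefix.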

That's about it. It requires writing a lot of code, but it's useful for large (by large I mean extremely large) XML files.

However, when the XML file is very complex (and not extremely large), I find that neither DomDocument nor XMLReader is an ideal solution. I would rather use XPath (DomXPath), which I will hopefully write about in my next article.

PHP-5 DomDocument: Creating a Basic XML

DomDocument is a powerful PHP 5 library for creating XML. In this article, I will try to explain the basics of DomDocument and then create a couple of simple XML files.

First, let's have a look at a simple XML file, taken from a W3Schools example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
	<to>Tove</to>
	<from>Jani</from>
	<heading>Reminder</heading>
	<desc>Don't forget me this weekend!</desc>
</note>

The first line holds the XML version and character encoding, followed by the XML tags/elements. I am going to create the above XML using DomDocument.

In PHP, at first let’s create an instance of DomDocument and initialize it, and set its version and character encoding

// 1st param takes version and 2nd param takes encoding;
$dom = new DomDocument("1.0", "ISO-8859-1");

// they can also be set later, like below, if you decide not to pass them to the constructor:
$dom->xmlVersion = "1.0";
$dom->encoding   = "ISO-8859-1";

To create a Node, the following method is used:

$dom->createElement('NODE_NAME');
// OR
$dom->createElement('NODE_NAME', 'NODE_VALUE');

To set a node as a child of another node:

$parentNode->appendChild( $childNode ); // both are nodes created earlier, e.g. with createElement

Now, we are going to create Nodes:

// we create an XML node and store it in a variable called $noteElem
$noteElem  = $dom->createElement('note');

// createElement also accepts 2 params: the 1st is the node name, the 2nd is the node value
$toElem    = $dom->createElement('to', 'Tove');

// now, we add $toElem as a child of $noteElem
$noteElem->appendChild( $toElem );

// we don't need a new variable for each node; we can do the following to save steps:
$noteElem->appendChild ( $dom->createElement('from', 'Jani') );
$noteElem->appendChild ( $dom->createElement('heading', 'Reminder') );
$noteElem->appendChild ( $dom->createElement('desc', "Don't forget me this weekend!") );

$noteElem now has all its children added properly. So we add $noteElem to $dom, and then we can save it to a file or output it as XML like below:

// add $noteElem to the main dom
$dom->appendChild( $noteElem );

// $dom now holds the entire XML, but it's not nicely formatted, i.e. there are no spaces or new lines between tags, so I do this:
$dom->formatOutput = true; // adds indentation and new lines, making the XML more readable

// now that $dom holds the complete XML, we can output it as a string,
$xmlString = $dom->saveXML(); // $xmlString contains the entire XML

// or save it to a file
$dom->save('filename.xml'); // returns the number of bytes written, or false on failure
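
One detail worth knowing about the two-argument form of createElement: the PHP manual notes that the value is not escaped, so text containing & is safer to add through createTextNode. A minimal sketch, with a made-up heading text:

```php
<?php
$dom  = new DOMDocument('1.0', 'ISO-8859-1');
$note = $dom->createElement('note');

// createTextNode lets the serializer escape special characters such as &
$heading = $dom->createElement('heading');
$heading->appendChild($dom->createTextNode('Reminder & notes'));
$note->appendChild($heading);
$dom->appendChild($note);

$xmlString = $dom->saveXML();
```

In the resulting string, the ampersand is written as &amp;, so the output stays well-formed.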

The above was a very simple XML, with no attributes or CDATA/PCDATA on any element. So let's say we have the following XML instead:

<?xml version="1.0" encoding="ISO-8859-1"?>
<library>
 <book isbn="781">
   <name>SCJP 1.5</name>
   <info><![CDATA[Sun Certified Java Programmer book]]></info>
  </book>
</library>

To set an attribute on an element, call setAttribute on the element itself (not on $dom):

$element->setAttribute('name', 'value');

To create a CDATA, use:

$dom->createCDATASection('The CData for the Node');

Below, I will create the above XML

$dom 	  = new DomDocument("1.0", "ISO-8859-1");
$library  = $dom->createElement('library');

// 1st item
$bookElem = $dom->createElement('book');
// set its attribute
$bookElem->setAttribute('isbn', '781');
$bookElem->appendChild( $dom->createElement('name', 'SCJP 1.5') );

// create the info element and append a CDATA section as its child
$infoElem = $dom->createElement('info');
$infoElem->appendChild( $dom->createCDATASection('Sun Certified Java Programmer book') );

$bookElem->appendChild( $infoElem );
$library->appendChild( $bookElem );
$dom->appendChild( $library );

$xmlData  = $dom->saveXML();

Sometimes you may want to have comments in an XML file. To create one, use:

$dom->createComment('some comment data');
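
Like createElement, createComment only builds the node; it still has to be appended somewhere to show up in the output. A short sketch of that, reusing the library element from above (passing a node to saveXML to serialize only that subtree is used here for brevity):

```php
<?php
$dom     = new DOMDocument('1.0', 'ISO-8859-1');
$library = $dom->createElement('library');

// the comment becomes the first child of <library>
$library->appendChild($dom->createComment('some comment data'));
$library->appendChild($dom->createElement('book'));
$dom->appendChild($library);

// serialize just the <library> subtree, without the XML declaration
$xmlData = $dom->saveXML($library);
```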

I think that's about it. One more thing worth knowing: DomDocument can also be used to create HTML files. It has several functions for this, such as saveHTML; please read PHP 5's official documentation to learn more.