PHP 5 DOM and XMLReader: Reading XML with Namespace (Part 1)

PHP 5's DOM extension and XMLReader make it easy to read XML files. The good thing about PHP 5's DOM classes (mainly DOMDocument, DOMNodeList, and DOMNode) is that they implement the standard DOM features specified by the W3C. W3C's reference on DOM can be viewed here. So anyone who has used the DOM before (say, in JavaScript) should find PHP 5's DOM easy to grasp.

The following are the PHP 5 DOM methods and properties I use most often:

  1. getElementsByTagName
  2. getAttribute
  3. childNodes
  4. nodeName
  5. nodeValue
  6. getElementsByTagNameNS

Here's a simple XML file called test.xml:



 
   SCJP 1.5
   
 
 
   jQuery is Awesome!
   
 	


Below I explain how to read this XML. First, load the file into a DomDocument:

$dom = new DomDocument();
$dom->load('test.xml');

$dom now holds the parsed XML file. Next, I use getElementsByTagName to get the list of elements (nodes) named 'book':

$bookElemList = $dom->getElementsByTagName('book');

$bookElemList is a DomNodeList object containing one DomNode for each 'book' element. Its 'length' property gives the number of nodes in the list, and its item(index) method returns the node at the given index. Below, I loop through $bookElemList and store the contents of each 'book' in an associative array. To read an attribute, I use the getAttribute method, as shown below:

$bookList = array();
// Iterate over every index in $bookElemList.
for ($i = 0; $i < $bookElemList->length; $i++) {
	$bookList[$i] = array(
		// Read the 'isbn' attribute of the book element and store it as book_isbn.
		'book_isbn' => $bookElemList->item($i)->getAttribute('isbn'),
		// Get the first 'name' element inside the book at index $i.
		'name'      => $bookElemList->item($i)->getElementsByTagName('name')->item(0)->nodeValue,
		// Get the first 'info' element the same way.
		'info'      => $bookElemList->item($i)->getElementsByTagName('info')->item(0)->nodeValue
	);
}
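
For readers who want to try the loop without a file on disk, here is a minimal self-contained sketch of the same approach. The inline XML is an assumption: the root element name 'books' is hypothetical, and the values are copied from the print_r output shown in this article. It uses loadXML() to parse a string rather than load(), and iterates the node list with foreach, which a DomNodeList supports directly.

```php
<?php
// Assumed sample data: the root element name <books> is hypothetical;
// element names, the isbn attribute and the values come from this article.
$xml = <<<XML
<books>
  <book isbn="781">
    <name>SCJP 1.5</name>
    <info>Sun Certified Java Programmer book</info>
  </book>
  <book isbn="194">
    <name>jQuery is Awesome!</name>
    <info>jQuery Reference Book</info>
  </book>
</books>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml); // loadXML() parses a string; load() reads a file or URL

$bookList = array();
// Each $bookElem is the DOM element for one <book>.
foreach ($dom->getElementsByTagName('book') as $bookElem) {
    $bookList[] = array(
        'book_isbn' => $bookElem->getAttribute('isbn'),
        'name'      => $bookElem->getElementsByTagName('name')->item(0)->nodeValue,
        'info'      => $bookElem->getElementsByTagName('info')->item(0)->nodeValue,
    );
}

print_r($bookList);
```

The foreach form avoids calling item($i) repeatedly, but the indexed for loop works just as well.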

Instead of fetching name and info separately, I could have used the childNodes property to walk the child elements, as below. Note that I have to check nodeType to make sure each node is an element, because the DOM treats whitespace between tags as text nodes. If you want to avoid checking nodeType, remove the whitespace from the XML before reading it. The values of nodeType are listed on W3C's page.

$bookList = array();
for ($i = 0; $i < $bookElemList->length; $i++) {
	$bookList[$i]['book_isbn'] = $bookElemList->item($i)->getAttribute('isbn');

	foreach ($bookElemList->item($i)->childNodes as $eachChild) {
		// Skip text nodes; XML_ELEMENT_NODE is the predefined constant for nodeType 1.
		if ($eachChild->nodeType == XML_ELEMENT_NODE) {
			$bookList[$i][$eachChild->nodeName] = $eachChild->nodeValue;
		}
	}
}
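
One way to drop the whitespace-only text nodes is the preserveWhiteSpace property of DOMDocument, which has to be set before the XML is loaded. The sketch below compares the number of child nodes with and without it. The one-book XML string and the variable names are assumptions made up for illustration.

```php
<?php
// Hypothetical one-book snippet, indented so that it contains whitespace text nodes.
$xml = "<book isbn=\"781\">\n  <name>SCJP 1.5</name>\n  <info>Sun Certified Java Programmer book</info>\n</book>";

// Default behaviour: whitespace between elements is kept as text nodes.
$withSpace = new DOMDocument();
$withSpace->loadXML($xml);
echo $withSpace->documentElement->childNodes->length, "\n"; // 5: text, name, text, info, text

// preserveWhiteSpace must be set before load()/loadXML() to have any effect.
$noSpace = new DOMDocument();
$noSpace->preserveWhiteSpace = false;
$noSpace->loadXML($xml);
echo $noSpace->documentElement->childNodes->length, "\n"; // 2: name, info

// With the blank text nodes gone, the children can be copied without a nodeType check.
$book = array();
foreach ($noSpace->documentElement->childNodes as $child) {
    $book[$child->nodeName] = $child->nodeValue;
}
print_r($book);
```

Comments and other non-element nodes would still appear in childNodes, so keeping the nodeType check is the safer choice when the input is not under your control.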

However, I prefer to fetch the contents explicitly, because in most cases I need the text of only a few elements. For a large XML file with many elements, copying every child through childNodes would store values I never use and waste memory.

Here's what print_r($bookList) prints:

Array
(
    [0] => Array
        (
            [book_isbn] => 781
            [name] => SCJP 1.5
            [info] => Sun Certified Java Programmer book
        )

    [1] => Array
        (
            [book_isbn] => 194
            [name] => jQuery is Awesome!
            [info] => jQuery Reference Book
        )

)

The above was a very simple XML file. Now let's parse a slightly more complex one that uses namespaces. An XML namespace avoids name conflicts between XML elements by qualifying them with a prefix. Brief info on XML Namespaces can be viewed here.

I chose to read an XML file featured on JWPlayer's setup wizard. It can be viewed here: JWPlayer's RSS XML

Here’s the XML:


	
		Example media RSS playlist for the JW Player
		http://www.longtailvideo.com

		
			Big Buck Bunny - FLV Video
			http://www.bigbuckbunny.org/

			Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
			the Peach Open Movie Project
			
		

		
			Big Buck Bunny - MP3 Audio with thumb
			http://www.bigbuckbunny.org/

			Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
			the Peach Open Movie Project
			
			
		

		
			Big Buck Bunny - PNG Image with start

			http://www.bigbuckbunny.org/
			Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
			
				the Peach Open Movie Project
				
			
		

	


Here's the first tag from the file, which declares the XML namespace:


The namespace is declared on the first line with xmlns:media. So 'media' is the namespace prefix used in this XML file, and 'http://search.yahoo.com/mrss/' is the namespace URI it stands for. In an element such as media:credit, 'credit' is the local name.

To read a node with a namespace, the following method can be used:

$dom->getElementsByTagNameNS('namespaceURI', 'local_Name_of_Node');
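
To make the terms concrete, the sketch below parses a tiny MRSS-like fragment and prints the prefix, local name, and namespace URI of a media:credit element. The fragment is an assumption for illustration, not the actual JW Player feed; only the namespace URI and the element name are taken from this article. The prefix is just an alias, so the lookup is done with the URI and the local name.

```php
<?php
// Hypothetical fragment for illustration; only the namespace URI comes from the article.
$xml = '<rss xmlns:media="http://search.yahoo.com/mrss/">'
     . '<channel><item><title>Demo</title>'
     . '<media:credit>the Peach Open Movie Project</media:credit>'
     . '</item></channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Resolve the prefix to its URI from the document instead of hard-coding it.
$mediaNs = $dom->documentElement->lookupNamespaceURI('media');

// Look the element up by namespace URI plus local name (no prefix).
$credit = $dom->getElementsByTagNameNS($mediaNs, 'credit')->item(0);

echo $credit->prefix, "\n";       // media
echo $credit->localName, "\n";    // credit
echo $credit->namespaceURI, "\n"; // http://search.yahoo.com/mrss/
echo $credit->nodeName, "\n";     // media:credit
```

Because matching is done on the URI, the same code keeps working if a feed binds a different prefix to the same namespace.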

The code below shows how to read the XML above:

// load the file on the DOM
$dom = new DomDocument();
$dom->load('http://www.longtailvideo.com/jw/upload/mrss.xml');

$itemList 		= array();

// get the list of Items.
$itemElemList 	= $dom->getElementsByTagName('item');
for($i=0;$i<$itemElemList->length;$i++) {
	$itemList[$i] = array (
		'title'       => $itemElemList->item($i)->getElementsByTagName('title')->item(0)->nodeValue,
		'link'        => $itemElemList->item($i)->getElementsByTagName('link')->item(0)->nodeValue,
		'description' => $itemElemList->item($i)->getElementsByTagName('description')->item(0)->nodeValue,			
		'credit'      => $itemElemList->item($i)->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'credit')->item(0)->nodeValue,
		'content_url' => $itemElemList->item($i)->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url'),
	);

}

Here's what print_r($itemList) prints:

Array
(
    [0] => Array
        (
            [title] => Big Buck Bunny - FLV Video
            [link] => http://www.bigbuckbunny.org/
            [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
            [credit] => the Peach Open Movie Project
            [content_url] => http://www.longtailvideo.com/jw/upload/bunny.flv
        )

    [1] => Array
        (
            [title] => Big Buck Bunny - MP3 Audio with thumb
            [link] => http://www.bigbuckbunny.org/
            [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
            [credit] => the Peach Open Movie Project
            [content_url] => http://www.longtailvideo.com/jw/upload/bunny.mp3
        )

    [2] => Array
        (
            [title] => Big Buck Bunny - PNG Image with start
            [link] => http://www.bigbuckbunny.org/
            [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
            [credit] => the Peach Open Movie Project
            [content_url] => http://www.longtailvideo.com/jw/upload/bunny.png
        )

)
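
The loop above assumes that every item contains each element it asks for. item(0) returns null when nothing matches, and reading nodeValue on null then fails. A minimal sketch of a guard follows; the helper name firstValueNS and the inline feed are hypothetical, introduced only for illustration.

```php
<?php
// Hypothetical helper: text of the first matching namespaced descendant, or null if absent.
function firstValueNS(DOMElement $parent, $nsUri, $localName)
{
    $node = $parent->getElementsByTagNameNS($nsUri, $localName)->item(0);
    return $node === null ? null : $node->nodeValue;
}

// Assumed sample feed: the second item deliberately has no media:credit.
$xml = '<rss xmlns:media="http://search.yahoo.com/mrss/"><channel>'
     . '<item><title>With credit</title>'
     . '<media:credit>the Peach Open Movie Project</media:credit></item>'
     . '<item><title>Without credit</title></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$mrss = 'http://search.yahoo.com/mrss/';
$credits = array();
foreach ($dom->getElementsByTagName('item') as $item) {
    $credits[] = firstValueNS($item, $mrss, 'credit');
}

var_dump($credits); // first entry is the credit text, second is NULL
```

Whether a missing element should become null, an empty string, or an error depends on how the resulting array is used later.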

So far I have explained how to read XML by loading it into a DomDocument. An important thing to realize is that when XML is loaded into a DomDocument, the entire document is parsed into an in-memory tree, which is what lets you navigate to any node in it.

However, if the XML is very large, loading it through DomDocument is unwise, because the whole file is held in memory. For that case PHP 5 provides the XMLReader class. In Part 2 of this article, I explain how to use XMLReader.