PHP 5 DOM and XMLReader: Reading XML with Namespaces (Part 1)


PHP 5's DOM extension and XMLReader make it easy to read XML files. The good thing about PHP 5's DOM classes (mainly DomDocument, DomNodeList and DomNode) is that they implement the standard DOM interfaces specified by the W3C (see the W3C DOM specification). So anyone who has used the DOM before (say, in JavaScript) will find PHP 5's DOM easy to grasp.

The following are the DOM methods and properties I use most often:
  1. getElementsByTagName() (method)
  2. getAttribute() (method)
  3. childNodes (property)
  4. nodeName (property)
  5. nodeValue (property)
  6. getElementsByTagNameNS() (method)
Here's a simple XML file called test.xml:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <library>
      <book isbn="781">
        <name>SCJP 1.5</name>
        <info><![CDATA[Sun Certified Java Programmer book]]></info>
      </book>
      <book isbn="194">
        <name>jQuery is Awesome!</name>
        <info><![CDATA[jQuery Reference Book]]></info>
      </book>
    </library>

Below I explain how to read this XML. First, load the file into a DomDocument:

    $dom = new DomDocument();
    $dom->load('test.xml');
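load() returns false when the file cannot be read or parsed, so it is worth checking the result before walking the tree. The sketch below assumes PHP 7.1 or later (for the type hints); the helper name loadXmlOrReport() is hypothetical, and it calls loadXML() on a string so the example runs without test.xml. PHP class names are case-insensitive, so DOMDocument and DomDocument are the same class.

```php
<?php
// Hypothetical helper: parse an XML string into a DOMDocument, or print
// libxml's error messages and return null if the XML is malformed.
function loadXmlOrReport(string $xml): ?DOMDocument
{
    // Collect parse errors ourselves instead of letting libxml emit PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // loadXML() parses a string; load('test.xml') is the file-based equivalent.
    // Both return false when the document cannot be parsed.
    $ok = $dom->loadXML($xml);

    if (!$ok) {
        foreach (libxml_get_errors() as $error) {
            echo 'XML error on line ', $error->line, ': ', trim($error->message), "\n";
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $ok ? $dom : null;
}

$dom = loadXmlOrReport('<library><book isbn="781"><name>SCJP 1.5</name></book></library>');
echo $dom->documentElement->nodeName, "\n"; // library

// Mismatched closing tag: prints the parser errors, then dumps NULL.
var_dump(loadXmlOrReport('<library><book></library>'));
```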

$dom now holds the parsed XML file. Next, I use getElementsByTagName to get the list of 'book' elements:

    $bookElemList = $dom->getElementsByTagName('book');

$bookElemList is a DomNodeList containing one DomNode (in practice a DomElement) for each 'book' element. Its length property holds the number of nodes in the list, and its item($index) method returns the node at the given index (or null if the index is out of range). Below, I loop through $bookElemList and store the contents of each 'book' in an associative array. To read an attribute, I use the getAttribute() method:

    $bookList = array();
    // iterate over every index of $bookElemList
    for ($i = 0; $i < $bookElemList->length; $i++) {
        $bookList[$i] = array(
            // read the isbn attribute of the book element and store it as book_isbn
            'book_isbn' => $bookElemList->item($i)->getAttribute('isbn'),
            // text of the first 'name' element inside the book at index $i
            'name' => $bookElemList->item($i)->getElementsByTagName('name')->item(0)->nodeValue,
            // text of the first 'info' element (its CDATA content comes back as plain text)
            'info' => $bookElemList->item($i)->getElementsByTagName('info')->item(0)->nodeValue
        );
    }
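DomNodeList is also Traversable, so the same loop can be written with foreach, which avoids repeating item($i) on every line. A minimal sketch, using an inline copy of test.xml so it runs on its own:

```php
<?php
// Sketch: the same extraction as the for loop above, written with foreach.
$xml = <<<XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<library>
<book isbn="781">
<name>SCJP 1.5</name>
<info><![CDATA[Sun Certified Java Programmer book]]></info>
</book>
<book isbn="194">
<name>jQuery is Awesome!</name>
<info><![CDATA[jQuery Reference Book]]></info>
</book>
</library>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$bookList = array();
// DOMNodeList is Traversable, so foreach yields each book element in document order.
foreach ($dom->getElementsByTagName('book') as $book) {
    $bookList[] = array(
        'book_isbn' => $book->getAttribute('isbn'),
        'name'      => $book->getElementsByTagName('name')->item(0)->nodeValue,
        'info'      => $book->getElementsByTagName('info')->item(0)->nodeValue,
    );
}

print_r($bookList);
```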

Instead of fetching name and info separately, I could have used the childNodes property to loop over every child of each book, as below. Note that I have to check nodeType to make sure each child is an element: the DOM represents the whitespace between elements (line breaks and indentation) as text nodes. If you don't want to check nodeType, remove that whitespace from the XML before reading it. The possible nodeType values are listed in the W3C DOM specification; PHP also exposes them as constants such as XML_ELEMENT_NODE (1) and XML_TEXT_NODE (3).

    $bookList = array();
    for ($i = 0; $i < $bookElemList->length; $i++) {
        $bookList[$i]['book_isbn'] = $bookElemList->item($i)->getAttribute('isbn');

        foreach ($bookElemList->item($i)->childNodes as $eachChild) {
            // skip whitespace text nodes: keep only element children (nodeType 1)
            if ($eachChild->nodeType == XML_ELEMENT_NODE) {
                $bookList[$i][$eachChild->nodeName] = $eachChild->nodeValue;
            }
        }
    }
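Another option, not used in the code above, is to set DomDocument's preserveWhiteSpace property to false before loading. libxml then drops whitespace-only text nodes between elements, so childNodes contains only the elements. A sketch, with the caveat that libxml decides heuristically which whitespace is ignorable, so it is worth checking the result on your own documents:

```php
<?php
// Sketch: drop whitespace-only text nodes at load time instead of checking nodeType.
// preserveWhiteSpace must be set *before* load()/loadXML() to take effect.
$xml = <<<XML
<library>
  <book isbn="781">
    <name>SCJP 1.5</name>
    <info><![CDATA[Sun Certified Java Programmer book]]></info>
  </book>
</library>
XML;

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadXML($xml);

$book = $dom->getElementsByTagName('book')->item(0);
echo $book->childNodes->length, "\n"; // 2: only <name> and <info> remain

$bookList = array();
$bookList[0]['book_isbn'] = $book->getAttribute('isbn');
foreach ($book->childNodes as $child) {
    // No nodeType check here: the indentation text nodes were never created.
    $bookList[0][$child->nodeName] = $child->nodeValue;
}
print_r($bookList);
```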

Still, I prefer to fetch the values explicitly. In most cases I only need the text of a few elements, and copying every child into the array via childNodes wastes memory on large XML files with many elements.

Here's the print_r output of $bookList:
    Array
    (
        [0] => Array
            (
                [book_isbn] => 781
                [name] => SCJP 1.5
                [info] => Sun Certified Java Programmer book
            )

        [1] => Array
            (
                [book_isbn] => 194
                [name] => jQuery is Awesome!
                [info] => jQuery Reference Book
            )

    )

That was a very simple XML file. Now let's parse a slightly more complex one that uses namespaces. An XML namespace avoids conflicts between element names: elements are qualified with a prefix that is bound to a namespace URI.

As the example, I chose the media RSS playlist featured in JW Player's setup wizard (http://www.longtailvideo.com/jw/upload/mrss.xml).

Here's the XML:

    <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
      <channel>
        <title>Example media RSS playlist for the JW Player</title>
        <link>http://www.longtailvideo.com</link>

        <item>
          <title>Big Buck Bunny - FLV Video</title>
          <link>http://www.bigbuckbunny.org/</link>
          <description>Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.</description>
          <media:credit role="author">the Peach Open Movie Project</media:credit>
          <media:content url="http://www.longtailvideo.com/jw/upload/bunny.flv" type="video/x-flv" duration="33" />
        </item>

        <item>
          <title>Big Buck Bunny - MP3 Audio with thumb</title>
          <link>http://www.bigbuckbunny.org/</link>
          <description>Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.</description>
          <media:credit role="author">the Peach Open Movie Project</media:credit>
          <media:content url="http://www.longtailvideo.com/jw/upload/bunny.mp3" type="audio/mpeg" duration="33" />
          <media:thumbnail url="http://www.longtailvideo.com/jw/upload/bunny.jpg" />
        </item>

        <item>
          <title>Big Buck Bunny - PNG Image with start</title>
          <link>http://www.bigbuckbunny.org/</link>
          <description>Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.</description>
          <media:group>
            <media:credit role="author">the Peach Open Movie Project</media:credit>
            <media:content url="http://www.longtailvideo.com/jw/upload/bunny.png" type="image/png" duration="20" start="10" />
          </media:group>
        </item>

      </channel>
    </rss>

Here's the root element of the file, which declares the XML namespace:

    <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">

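The namespace is declared on the root element by the xmlns:media attribute: 'media' is the prefix used in this file, and 'http://search.yahoo.com/mrss/' is the namespace URI it is bound to. In an element such as media:credit, 'media' is the prefix and 'credit' is the local name.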
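To see the difference between prefix, local name and namespace URI, the sketch below reads them from a media:credit element through the standard DOM properties nodeName, prefix, localName and namespaceURI. The XML is a trimmed-down inline copy of the feed, written without whitespace between tags so that firstChild is the element itself.

```php
<?php
// Sketch: how DOM exposes the parts of a prefixed element name.
$xml = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">'
     . '<channel><item>'
     . '<media:credit role="author">the Peach Open Movie Project</media:credit>'
     . '</item></channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// The first (and only) child of <item> is the <media:credit> element.
$credit = $dom->getElementsByTagName('item')->item(0)->firstChild;

echo $credit->nodeName, "\n";     // media:credit  (qualified name)
echo $credit->prefix, "\n";       // media         (prefix used in this file)
echo $credit->localName, "\n";    // credit        (local name)
echo $credit->namespaceURI, "\n"; // http://search.yahoo.com/mrss/
```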

To read elements in a namespace, use getElementsByTagNameNS(), which takes the namespace URI (not the prefix) and the local name:

    $dom->getElementsByTagNameNS('namespaceURI', 'local_Name_of_Node');
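Because getElementsByTagNameNS() matches on the namespace URI, the lookup keeps working even if a feed uses a different prefix for the same URI. If you would rather not hard-code the URI, the sketch below resolves it from the 'media' prefix with lookupNamespaceURI() (which does depend on the prefix), and also shows the '*' wildcard, which matches a local name in any namespace. The inline XML is a trimmed copy of the feed.

```php
<?php
// Sketch: two ways to avoid hard-coding the namespace URI in every call.
$xml = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">'
     . '<channel>'
     . '<item><media:content url="http://www.longtailvideo.com/jw/upload/bunny.flv" type="video/x-flv" duration="33" /></item>'
     . '<item><media:content url="http://www.longtailvideo.com/jw/upload/bunny.mp3" type="audio/mpeg" duration="33" /></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// 1. Resolve the URI from the prefix declared on the root element.
$mediaNs = $dom->documentElement->lookupNamespaceURI('media');
echo $mediaNs, "\n"; // http://search.yahoo.com/mrss/

$urls = array();
foreach ($dom->getElementsByTagNameNS($mediaNs, 'content') as $content) {
    $urls[] = $content->getAttribute('url');
}

// 2. '*' matches the local name in any namespace. Use it with care: it can also
//    match same-named elements from unrelated namespaces.
$anyNs = $dom->getElementsByTagNameNS('*', 'content');
echo $anyNs->length, "\n"; // 2
```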

The code below reads the XML above:

    // load the file into the DOM
    $dom = new DomDocument();
    $dom->load('http://www.longtailvideo.com/jw/upload/mrss.xml');

    $itemList = array();

    // get the list of item elements
    $itemElemList = $dom->getElementsByTagName('item');
    for ($i = 0; $i < $itemElemList->length; $i++) {
        $itemList[$i] = array(
            'title' => $itemElemList->item($i)->getElementsByTagName('title')->item(0)->nodeValue,
            'link' => $itemElemList->item($i)->getElementsByTagName('link')->item(0)->nodeValue,
            'description' => $itemElemList->item($i)->getElementsByTagName('description')->item(0)->nodeValue,
            // namespaced elements are looked up by namespace URI + local name;
            // the search covers all descendants, so elements inside media:group are found too
            'credit' => $itemElemList->item($i)->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'credit')->item(0)->nodeValue,
            'content_url' => $itemElemList->item($i)->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url'),
        );
    }
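The loop above assumes every item has a credit and a content element. item(0) returns null when an element is missing, and calling getAttribute() on null is a fatal error. A defensive sketch, assuming PHP 7.1+; the helpers firstValue() and firstAttrNS() and the thumbnail_url key are hypothetical additions, and the inline XML is a cut-down version of the feed in which the items have no media:credit:

```php
<?php
// Sketch: a defensive variant of the loop above. The helpers are hypothetical,
// not part of PHP's DOM API; they return null instead of failing on missing elements.
const MRSS_NS = 'http://search.yahoo.com/mrss/';

// Text of the first descendant with this name (namespace-aware if $ns is given), or null.
function firstValue(DOMElement $parent, string $name, ?string $ns = null): ?string
{
    $list = $ns === null
        ? $parent->getElementsByTagName($name)
        : $parent->getElementsByTagNameNS($ns, $name);
    $node = $list->item(0);
    return $node === null ? null : $node->nodeValue;
}

// Attribute of the first descendant $name in namespace $ns, or null if the element is absent.
function firstAttrNS(DOMElement $parent, string $ns, string $name, string $attr): ?string
{
    $node = $parent->getElementsByTagNameNS($ns, $name)->item(0);
    return $node === null ? null : $node->getAttribute($attr);
}

$xml = '<rss version="2.0" xmlns:media="' . MRSS_NS . '"><channel>'
     . '<item><title>FLV Video</title>'
     . '<media:content url="http://www.longtailvideo.com/jw/upload/bunny.flv" /></item>'
     . '<item><title>MP3 Audio with thumb</title>'
     . '<media:content url="http://www.longtailvideo.com/jw/upload/bunny.mp3" />'
     . '<media:thumbnail url="http://www.longtailvideo.com/jw/upload/bunny.jpg" /></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$itemList = array();
foreach ($dom->getElementsByTagName('item') as $item) {
    $itemList[] = array(
        'title'         => firstValue($item, 'title'),
        'credit'        => firstValue($item, 'credit', MRSS_NS),          // absent here: null
        'content_url'   => firstAttrNS($item, MRSS_NS, 'content', 'url'),
        'thumbnail_url' => firstAttrNS($item, MRSS_NS, 'thumbnail', 'url'), // only the 2nd item has one
    );
}
print_r($itemList);
```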

Here's the print_r output of $itemList:

    Array
    (
        [0] => Array
            (
                [title] => Big Buck Bunny - FLV Video
                [link] => http://www.bigbuckbunny.org/
                [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
                [credit] => the Peach Open Movie Project
                [content_url] => http://www.longtailvideo.com/jw/upload/bunny.flv
            )

        [1] => Array
            (
                [title] => Big Buck Bunny - MP3 Audio with thumb
                [link] => http://www.bigbuckbunny.org/
                [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
                [credit] => the Peach Open Movie Project
                [content_url] => http://www.longtailvideo.com/jw/upload/bunny.mp3
            )

        [2] => Array
            (
                [title] => Big Buck Bunny - PNG Image with start
                [link] => http://www.bigbuckbunny.org/
                [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
                [credit] => the Peach Open Movie Project
                [content_url] => http://www.longtailvideo.com/jw/upload/bunny.png
            )

    )

So far I have read XML by loading it into a DomDocument. Keep in mind that DomDocument parses the entire XML file into an in-memory tree of nodes; that is what lets you navigate freely to any node in the document.

But if the XML file is very large, loading it into a DomDocument is unwise, because the whole document has to be held in memory. For such cases PHP 5 provides the XMLReader class, which I explain in part 2 of this article.
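As a preview of part 2, here is a minimal sketch that collects the isbn attributes from test.xml-style data with XMLReader. XMLReader moves forward one node at a time, so it keeps only a small part of the document in memory instead of the whole tree. The inline string stands in for the file.

```php
<?php
// Sketch: stream <book> elements with XMLReader instead of building a DOM tree.
$xml = <<<XML
<library>
<book isbn="781"><name>SCJP 1.5</name></book>
<book isbn="194"><name>jQuery is Awesome!</name></book>
</library>
XML;

$reader = new XMLReader();
$reader->XML($xml); // for a file: $reader->open('test.xml')

$isbns = array();
while ($reader->read()) {
    // read() visits every node (elements, text, end tags); react only to opening <book> tags.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'book') {
        $isbns[] = $reader->getAttribute('isbn');
    }
}
$reader->close();

print_r($isbns);
```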

Attachments:
  • test.xml (284 bytes)
  • test.php.txt (867 bytes)
  • jwplayer_rss.php.txt (906 bytes)

