PHP 5 XMLReader: Reading XML with Namespace (Part 2)

This is a continuation of the previous article, where I wrote about how to use PHP 5's DOM to read XML files easily. For large files, however, it's better to use XMLReader. Unlike DomDocument, XMLReader does not load the entire file into memory; it reads an XML file one node at a time. I rarely use XMLReader because it requires writing a lot of code, but for extremely large XML files it's the better choice. The full reference on XMLReader is available in the PHP manual. It requires PHP 5.2.

The following are the functions I use:

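  1. open - opens the XML file
  2. read - moves to the next node in the document
  3. getAttribute - gets the value of a named attribute of the current node
  4. moveToNextAttribute - moves to the next attribute of the current node
  5. next - moves to the next node, skipping any subtree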
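
Since moveToNextAttribute and next are not used in the listing further down, here is a minimal sketch of how they might be used. It loads an inline string with XML() instead of open() so it is self-contained; the "lang" attribute and the notes/draft elements are hypothetical and exist only for illustration.

```php
<?php
// Inline sample. The "lang" attribute and the <notes> subtree are
// hypothetical; they are not part of test.xml.
$xml = '<books>'
     . '<notes><draft>skip me</draft></notes>'
     . '<book isbn="781" lang="en"><name>SCJP 1.5</name></book>'
     . '</books>';

$reader = new XMLReader();
$reader->XML($xml);

$attributes = array(); // attribute name => value for <book>
$visited    = array(); // element names the loop actually lands on

$more = $reader->read();
while ($more) {
    if ($reader->nodeType == XMLReader::ELEMENT) {
        $visited[] = $reader->localName;

        if ($reader->localName == 'notes') {
            // next() jumps past the whole <notes> subtree,
            // so <draft> is never visited
            $more = $reader->next();
            continue;
        }

        if ($reader->localName == 'book') {
            // walk every attribute without knowing the names up front
            while ($reader->moveToNextAttribute()) {
                $attributes[$reader->name] = $reader->value;
            }
            // return from the last attribute to the owning element
            $reader->moveToElement();
        }
    }
    $more = $reader->read();
}
$reader->close();
```

getAttribute is the shorter choice when the attribute name is known; moveToNextAttribute is mainly useful when it is not.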

Here's a basic XML file called 'test.xml':

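<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- the root element name is assumed; the tags were lost from this listing -->
<books>
	<book isbn="781">
		<name>SCJP 1.5</name>
		<info><![CDATA[Sun Certified Java Programmer book]]></info>
	</book>
	<book isbn="194">
		<name>jQuery is Awesome!</name>
		<info><![CDATA[jQuery Reference Book]]></info>
	</book>
</books>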

Below is a way to read it:

First, initialize the reader and open the XML file:

$xmlReader = new XMLReader();
// open the file for reading
$xmlReader->open('test.xml');
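
open() returns false when the file cannot be opened, so it may be worth checking the result before starting to read. A minimal sketch, assuming a hypothetical helper named openXmlFile (it is not part of XMLReader) and a temporary file standing in for test.xml:

```php
<?php
// Hypothetical helper: returns an opened XMLReader, or null when the
// file is missing, unreadable, or cannot be opened.
function openXmlFile($path)
{
    if (!is_readable($path)) {
        return null;
    }
    $reader = new XMLReader();
    // open() returns false on failure
    if (!$reader->open($path)) {
        return null;
    }
    return $reader;
}

// Temporary file, used only to keep the sketch self-contained.
$tmpFile = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($tmpFile, '<books><book isbn="781"/></books>');

$okReader      = openXmlFile($tmpFile);
$missingReader = openXmlFile($tmpFile . '.does-not-exist');

if ($okReader !== null) {
    $okReader->close();
}
unlink($tmpFile);
```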

Now keep reading nodes until the end of the file is reached, which is done with a while loop:

while($xmlReader->read()) { 

}
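
To see what read() actually stops on, the loop can simply record each node's type and name. This is a small sketch with an inline string (loaded with XML() rather than open()) so that it runs on its own; the $typeNames lookup table is just a convenience for this example.

```php
<?php
$reader = new XMLReader();
$reader->XML('<book isbn="781"><name>SCJP 1.5</name></book>');

// Readable labels for the node types that occur in this sample.
$typeNames = array(
    XMLReader::ELEMENT     => 'ELEMENT',
    XMLReader::TEXT        => 'TEXT',
    XMLReader::END_ELEMENT => 'END_ELEMENT',
);

$trace = array();
while ($reader->read()) {
    $type = isset($typeNames[$reader->nodeType])
        ? $typeNames[$reader->nodeType]
        : 'OTHER';
    $trace[] = $type . ':' . $reader->name;
}
$reader->close();

// $trace now holds:
//   ELEMENT:book, ELEMENT:name, TEXT:#text, END_ELEMENT:name, END_ELEMENT:book
```

Every opening tag, text node and closing tag is a separate stop, which is why the full code checks nodeType before looking at the name.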

Below is the full code:

$bookList = array();
$i=0;
$xmlReader = new XMLReader();
$xmlReader->open('test.xml');
while($xmlReader->read()) {
	// only handle element nodes, not #text or other node types
	if($xmlReader->nodeType == XMLReader::ELEMENT) {
		if($xmlReader->localName == 'book') {
			$bookList[$i]['book_isbn'] = $xmlReader->getAttribute('isbn');
		}
		if($xmlReader->localName == 'name') {
			// move to its textnode / child
			$xmlReader->read(); 
			$bookList[$i]['name'] = $xmlReader->value;
		}
		if($xmlReader->localName == 'info') {
			// move to its textnode / child
			$xmlReader->read(); 
			$bookList[$i]['info'] = $xmlReader->value;
			$i++;
		}
		
	}
}
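
The code above moves on to the next book when it sees 'info', so it relies on 'info' being the last child of every book. A variation that does not depend on child order is to finish each record when the closing book tag is reached. This is only a sketch: the inline XML mirrors the data shown in this article, and its root element name is an assumption.

```php
<?php
// Inline copy of the sample data; the <books> root name is assumed.
$xml = <<<XML
<books>
    <book isbn="781">
        <name>SCJP 1.5</name>
        <info><![CDATA[Sun Certified Java Programmer book]]></info>
    </book>
    <book isbn="194">
        <info><![CDATA[jQuery Reference Book]]></info>
        <name>jQuery is Awesome!</name>
    </book>
</books>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$bookList = array();
$current  = null; // record for the <book> currently being read

while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT) {
        switch ($reader->localName) {
            case 'book':
                // start a new record at the opening tag
                $current = array('book_isbn' => $reader->getAttribute('isbn'));
                break;
            case 'name':
            case 'info':
                $field = $reader->localName;
                // move to the text or CDATA child and take its value
                $reader->read();
                $current[$field] = $reader->value;
                break;
        }
    } elseif ($reader->nodeType == XMLReader::END_ELEMENT
        && $reader->localName == 'book') {
        // closing </book>: the record is complete, whatever the child order
        $bookList[] = $current;
        $current = null;
    }
}
$reader->close();
```

The second book deliberately lists 'info' before 'name' to show that both fields still end up in the same record.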

Here's a var_dump of $bookList

array(2) {
  [0]=>
  array(3) {
    ["book_isbn"]=>
    string(3) "781"
    ["name"]=>
    string(8) "SCJP 1.5"
    ["info"]=>
    string(34) "Sun Certified Java Programmer book"
  }
  [1]=>
  array(3) {
    ["book_isbn"]=>
    string(3) "194"
    ["name"]=>
    string(18) "jQuery is Awesome!"
    ["info"]=>
    string(21) "jQuery Reference Book"
  }
}

That's about it. It requires writing a lot of code, but it's useful for large (by large I mean extremely large) XML files.

However, when an XML file is very complex (and not extremely large), I find that neither DomDocument nor XMLReader is an ideal solution. I'd rather use XPath (DomXPath), which I will hopefully write about in my next article.

Attachment                Size
test.xml                  300 bytes
xmlreader_test.php.txt    677 bytes

Thank you for sharing your code, I found it very useful.
Karel

Hello, I find your code very useful. What would you recommend if I am going to parse a huge XML file? I tried using DOM and it really eats a lot of memory; I got a memory error. I believe XMLReader is okay, like your code above. Does it support parsing Media RSS? Because in my RSS I have this value inside the node

Yes, it can be done, RSS is also a type of XML file

Hi,

Yes, it can definitely be done. I received your email, so I will explain it in the email.

You can use the DOM object for small files and XMLReader for huge files.
