IMDB Details Grabber: Using PHP DOM XPath to extract Movie details

May 19, 2010 update: Grab movie details by entering its name. i.e. Just type "The Matrix".
Jan 22, 2010 update: Now you can grab "cast" in the movie as well.
Recently, a friend of mine asked me whether there is an IMDB parser that can fetch information about any movie on IMDB. Unfortunately, IMDB does not offer an API. After a bit of googling I found a couple of nice IMDB grabbers (PHP IMDB Grabber, PHP Classes IMDB). Both are very good, but I decided to develop my own class. It relies on XPath, and it lets developers extract additional information easily without writing much more code.

Brief Explanation of Methods:

The class in 'class.imdb.php' has 5 useful methods (in addition to other ones). They are:

get($url) Fetches the details of a specific movie on IMDB. Pass the movie's URL (or its name) and the method returns an associative array containing the title, image, url, Plot, Director, Release Date and Runtime.
add($name,$value='') Accepts either 1 or 2 parameters. It lets developers extract additional data on top of the default fields (i.e. title, image, url etc.) mentioned above. See the usage sections below.
useCSV($status) By default, when get($url) is invoked, the values are returned in CSV format. For example, if a movie has multiple directors, the value of 'Director:' is a CSV string containing all the director names. If useCSV(false) is invoked before get($url), the value of 'Director:' is instead an array containing the list of directors. See the usage sections below for more info.
showCast($t) The parameter is a boolean. If you wish to grab the cast (some people have requested it), call this method with 'true'. Then, when you grab an IMDB page, the returned value also contains the cast of the movie, with thumbnail image, real name and character name.
showRating($t) The parameter is a boolean. The rating is grabbed by default; if you do not want it, call this method with 'false'.

Usage (Basic):

Using the class is very easy. Include the class file, instantiate the class and then invoke get($url), where $url is the IMDB URL of the movie.

include 'path/to/class/class.imdb.php';
$imdbObj   = new Imdb();

// movieInfo contains the details of Movies in associative array format
$movieInfo = $imdbObj->get('http://www.imdb.com/title/tt0167260/');

// OR simply enter the movie name :)
$movieInfo = $imdbObj->get('The Matrix');

Here's a vardump of movieInfo:

Array
(
    [title:] => The Lord of the Rings: The Return of the King (2003)
    [image:] => http://ia.media-imdb.com/images/M/MV5BMjE4MjA1NTAyMV5BMl5BanBnXkFtZTcwNzM1NDQyMQ@@._V1._SX94_SY140_.jpg
    [url:] => http://www.imdb.com/title/tt0167260/
    [Plot:] => The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo & Sam approach Mount Doom to destroy the One Ring.|
    [Director:] => Peter Jackson
    [Release Date:] => 17 December 2003 (USA)
    [Runtime:] => 201 min | 251 min (extended edition)
)

That's it, simple enough! By default, title, image, url, Plot, Director, Release Date and Runtime are returned. All keys of the associative array are suffixed with ':' (colon); I leave it up to developers to either strip it out or just display it.
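If you would rather work with keys that have no trailing colon, a small helper can strip them. This is only a sketch: stripKeyColons() is a hypothetical function that is not part of class.imdb.php, and the sample array below just mimics the shape of the dump above, so no request is made to IMDB.

```php
<?php
// Hypothetical helper (not part of class.imdb.php): returns a copy of the
// array with the trailing ':' removed from every key.
function stripKeyColons(array $info)
{
    $clean = array();
    foreach ($info as $key => $value) {
        $clean[rtrim($key, ':')] = $value;
    }
    return $clean;
}

// Sample data shaped like the dump above; no network request is made.
$movieInfo = array(
    'title:'    => 'The Lord of the Rings: The Return of the King (2003)',
    'Director:' => 'Peter Jackson',
    'Runtime:'  => '201 min | 251 min (extended edition)',
);

$clean = stripKeyColons($movieInfo);
echo $clean['Director'], "\n"; // prints "Peter Jackson"
```

Only the keys are touched; a colon inside a value (as in the title) is left alone.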

Usage (Customization and Advanced Usage):

By default, the class fetches the title, image, url, Plot, Director, Release Date and Runtime, which is fine for most everyday work. But a developer may want additional information about a movie, like Awards, Aspect Ratio, Writers, etc. The class makes that easy: just call the add method before calling get. Also, I have developed the class using the method chaining technique (thanks to jQuery for teaching me that), so you can chain methods easily.

If you wish to grab the cast in the movie, then call the following method:

$imdbObj->showCast(true);

Below, I will be using this URL: http://www.imdb.com/title/tt0167260/
Let's say that, in addition to the default fields, you want to know the awards a movie has won. Open the IMDB page ('http://www.imdb.com/title/tt0167260/') and view its source around the Awards area (in Firefox you can select that region, right-click and choose View Selection Source, or you can use Firebug). You will see something like this in the source code:
//..
Awards:
Won 11 Oscars. Another 106 wins & 68 nominations more
//..
You want to extract the raw text 'Won 11 Oscars. ...'. So, call the add method and pass 'Awards:' as the 1st parameter (the exact text inside the h5) and '/div/text()' as the 2nd parameter (as it's raw text):

$imdbObj   = new Imdb();
$imdbObj->add('Awards:','/div/text()');
$movieInfo = $imdbObj->get('http://www.imdb.com/title/tt0167260/');

In fact, the class allows you to chain methods, so you could just do this:


$imdbObj   = new Imdb();
$movieInfo = $imdbObj->add('Awards:','/div/text()')->get('http://www.imdb.com/title/tt0167260/');

Now, $movieInfo will also contain information on Awards. Here's a dump of $movieInfo:

Array
(
    [title:] => The Lord of the Rings: The Return of the King (2003)
    [image:] => http://ia.media-imdb.com/images/M/MV5BMjE4MjA1NTAyMV5BMl5BanBnXkFtZTcwNzM1NDQyMQ@@._V1._SX94_SY140_.jpg
    [url:] => http://www.imdb.com/title/tt0167260/
    [Plot:] => The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo & Sam approach Mount Doom to destroy the One Ring.|
    [Director:] => Peter Jackson
    [Release Date:] => 17 December 2003 (USA)
    [Runtime:] => 201 min  | 251 min (extended edition)
    [Awards:] => Won 11 Oscars. 
Another 106 wins 
& 68 nominations
)

Simple, right? Now, let's say you want to know both the Awards and the Writers of that movie. Look at the source code of the IMDB page again, around the Writers area. Here's what it looks like:


Writers (WGA):
J.R.R. Tolkien (novel)
Fran Walsh (screenplay) ...
more

So, call the add method again. The add method also accepts an associative array as its 1st parameter, so you don't have to chain or call add multiple times on $imdbObj. For the writers, pass 'Writers (WGA):' as the key (the raw text inside the h5, without tags) and '/div/a' as the value, because all the writers' names are inside 'a' (anchor) tags. Here's how to do it:


$imdbObj = new Imdb();
$movieInfo = $imdbObj->add( array('Awards:' => '/div/text()' , 
                            'Writers (WGA):' => '/div/a'))
                     ->get('http://www.imdb.com/title/tt0167260/');

// OR just provide the movie name
$movieInfo = $imdbObj->add( array('Awards:' => '/div/text()' , 
                            'Writers (WGA):' => '/div/a'))
                     ->get('The Lord of the Rings King');

That's about it. Now, here's a dump of $movieInfo:

Array
(
    [title:] => The Lord of the Rings: The Return of the King (2003)
    [image:] => http://ia.media-imdb.com/images/M/MV5BMjE4MjA1NTAyMV5BMl5BanBnXkFtZTcwNzM1NDQyMQ@@._V1._SX94_SY140_.jpg
    [url:] => http://www.imdb.com/title/tt0167260/
    [Plot:] => The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo & Sam approach Mount Doom to destroy the One Ring.|
    [Director:] => Peter Jackson
    [Release Date:] => 17 December 2003 (USA)
    [Runtime:] => 201 min  | 251 min (extended edition)
    [Awards:] => Won 11 Oscars.
 Another 106 wins
&
68 nominations
    [Writers (WGA):] => J.R.R. Tolkien,Fran Walsh,more
)
Note: I leave it up to the developer to strip out the 'more' keyword from the values.
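One possible way to drop that trailing 'more' entry is sketched below. dropMore() is a hypothetical helper, not something the class provides. It assumes the value is either a CSV string (the default) or an array (after useCSV(false)), and that no name contains a comma.

```php
<?php
// Hypothetical helper: removes the 'more' link text from a field value.
// Accepts a CSV string (default mode) or an array (after useCSV(false))
// and returns the same kind of value it was given.
function dropMore($value)
{
    $isCsv = !is_array($value);
    $items = $isCsv ? explode(',', $value) : $value;

    $kept = array();
    foreach ($items as $item) {
        if (trim($item) !== 'more') {
            $kept[] = $item;
        }
    }
    return $isCsv ? implode(',', $kept) : $kept;
}

// Sample values taken from the dump above; no network request is made.
$writersCsv = dropMore('J.R.R. Tolkien,Fran Walsh,more');
$writersArr = dropMore(array('J.R.R. Tolkien', 'Fran Walsh', 'more'));
```

Here $writersCsv ends up as 'J.R.R. Tolkien,Fran Walsh' and $writersArr as a two-element array with the same names.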

Almost all the information on an IMDB movie page is stored in this format:


Name_of_Data:
_RAW_TEXT_ TEXT_1TEXT_2
You will always want to extract either the '_RAW_TEXT_' or the set of 'TEXT_1', 'TEXT_2' contained in the tag 'TAG_NAME'. So, when calling the add method, if you want to grab '_RAW_TEXT_', just pass '/div/text()' as the 2nd parameter. If you want to grab 'TEXT_1', 'TEXT_2' etc., which are inside other tags, pass '/div/TAG_NAME' as the 2nd parameter. For the 1st parameter, always pass the exact text, including spaces, contained in the h5 ('Name_of_Data:').
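To see how the two kinds of suffix behave, here is a self-contained sketch using PHP's DOMDocument and DOMXPath. The markup is an assumption modelled on the description above (an h5 with the field name, followed by a div holding raw text and anchors); the real IMDB page has more attributes and may differ. grab() is a hypothetical helper, and the way it builds the base query is only a guess at how such a class could work, not the actual code of class.imdb.php.

```php
<?php
// Assumed markup, modelled on the format described above.
$html = '<html><body>'
      . '<div class="info"><h5>Awards:</h5>'
      . '<div>Won 11 Oscars. <a href="#">more</a></div></div>'
      . '<div class="info"><h5>Writers (WGA):</h5>'
      . '<div><a href="#">J.R.R. Tolkien</a> (novel) '
      . '<a href="#">Fran Walsh</a> (screenplay)</div></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: find the block whose h5 text equals $name, then
// apply the same kind of suffix that would be passed to add().
// $name must not contain a double quote.
function grab(DOMXPath $xpath, $name, $suffix)
{
    $values = array();
    $query  = '//div[h5="' . $name . '"]' . $suffix;
    foreach ($xpath->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $values[] = $text;
        }
    }
    return $values;
}

$awards  = grab($xpath, 'Awards:', '/div/text()');   // raw text nodes only
$writers = grab($xpath, 'Writers (WGA):', '/div/a'); // one entry per anchor
```

With this markup, '/div/text()' yields only the raw text 'Won 11 Oscars.' (the anchor is skipped), while '/div/a' yields the two writer names.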

CSV: By default, all values are returned in CSV format. You may prefer a value as an array instead; for example, you may want the list of Writers as an array rather than a CSV string. In that case, do the following:


$imdbObj   = new Imdb();
$movieInfo = $imdbObj->useCSV(false)->get('http://www.imdb.com/title/tt0167260/');

View Demo

Attachment            Size
test.imdb.php.txt     642 bytes
class.imdb.php.txt    5.87 KB

HI, i'm on the road of

Hi,
I'm on the road of learning web scraping.
Can you please explain how to keep formatting while scraping content? My scraped articles are not formatted, so they are useless.

When I use your IMDB demo

When I use your IMDB demo with a title + year (eg. "The Italian Job (1969)"), I get a correct result. When I use your script locally, nothing is returned.

Any ideas ?

Thats strange...I just tried

That's strange... I just tried it locally and it seems to work for me. Can you please explain what error you are getting? Are you getting no result (just blank) or something else?

Thanks,

$IMDB empty when $Title is

$IMDB is empty when $Title is "The Italian Job (1969)", but
is filled when it is "The Italian Job"

$ErrorLevel = error_reporting(E_ERROR);
$imdbObj = new Imdb();
$IMDB = $imdbObj->showCast(true)
->add('Genre:', '/div/a')
->add('Tagline:', '/div/text()')
->add('Certification:', '/div/a')
->get($Title);
error_reporting($ErrorLevel);

echo "IMDB(".$Title.")";
echo '';
print_r($IMDB);
echo '';

I tried the code and it seems

I tried the code and it seems to be working for me. So, I am not sure why you are getting that error :| :S

This is what I tried (same as yours):

$Title = 'The Italian Job (1969)';
$imdbObj = new Imdb();
$IMDB = $imdbObj->showCast(true)
->add('Genre:', '/div/a')
->add('Tagline:', '/div/text()')
->add('Certification:', '/div/a')
->get($Title);

// Wrap the dump in <pre> so the array stays readable in the browser
echo "<pre>";
print_r($IMDB);
echo "</pre>";

The above works for me. Have you tried other search queries? Do they work for you, or do you get an empty result as well?

The code on my blog (Demo Page) and the code for download are the same.

It was last modified on July 8th, 2010 for a minor fix.

Fatal error (this could be

Fatal error (this could be when the IMDB query fails)

PHP Fatal error: Call to a member function getAttribute() on a non-object

public function getImdbURL($url) {
$searchURL = 'http://www.imdb.com/find?q='.urlencode($url);
$searchDom = new DomDocument();
$searchLoad = $searchDom->loadHTMLFile($searchURL);
$xpath = new DomXPath($searchDom);
return 'http://www.imdb.com'.$xpath->query('//p[@style]/b/a')->item(0)->getAttribute('href');
}

Thanks for the catch. I

Thanks for the catch.

I noticed it throws an error when an invalid search term (say: 'x98359835') is entered. In which case, Imdb doesn't return any valid Movie.

The error also occurs if you enter an invalid url (say 'http://www.google.ca/'), which isn't a valid imdb url.

Both the issues should be fixed now. Please download the updated files (class.imdb.php.txt and test.imdb.php.txt).
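
For readers curious about what such a guard can look like, here is a minimal sketch. It is not necessarily the fix that went into class.imdb.php. firstResultHref() is a hypothetical helper that takes the HTML as a string (so the example runs without a network request) and reuses the '//p[@style]/b/a' query from the snippet above.

```php
<?php
// Hypothetical helper: returns the href of the first search hit, or null
// when the page contains no matching link.
function firstResultHref($html)
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//p[@style]/b/a');

    // item(0) is null when nothing matched, so check the length first.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('href');
}

// Made-up sample pages: one with a hit, one without.
$hit  = firstResultHref('<p style="margin:0"><b><a href="/title/tt0167260/">Result</a></b></p>');
$miss = firstResultHref('<p>No matches.</p>');
```

The caller can then test for null instead of running into "Call to a member function getAttribute() on a non-object".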

Thanks,

Well done - a failed search

Well done - a failed search (eg, "Shutter Island") now returns no results and not a FATAL error reported by PHP.

The search term is valid, it's just that no results are obtained.

Thanks

I just noticed the error. It

I just noticed the error. It seems for some search results, Imdb simply redirects to the Movie URL without showing any search Result. For others, Imdb returns a listing of Movies as strong match.

For example: If you search 'The Matrix'.
http://www.imdb.com/find?s=all&q=Matrix

Imdb will return a search page with the Search Result.

Whereas, if you type 'Shutter Island', it simply redirects to the movie page without displaying the results.
http://www.imdb.com/find?s=all&q=Shutter+Island

Redirects to: http://www.imdb.com/title/tt1130884/

without showing any search result. Thanks for pointing out the issue. I have fixed it for now.
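
One way to tell the two cases apart, once the final URL is known, is to check whether it already looks like a movie page. The sketch below is an illustration only and not the code used in the class. titleIdFromUrl() is a hypothetical helper, and it assumes you have some means of obtaining the URL after the redirect (for example cURL's CURLINFO_EFFECTIVE_URL).

```php
<?php
// Hypothetical helper: returns the IMDB title id (e.g. 'tt1130884') when
// $url points at a movie page, or null for anything else.
function titleIdFromUrl($url)
{
    if (preg_match('#^https?://www\.imdb\.com/title/(tt\d+)/?#', $url, $m)) {
        return $m[1];
    }
    return null;
}

// The two URLs mentioned above.
$redirected = titleIdFromUrl('http://www.imdb.com/title/tt1130884/');
$searchPage = titleIdFromUrl('http://www.imdb.com/find?s=all&q=Matrix');
```

If the result is not null, the page can be parsed directly as a movie page; otherwise it is a search listing and the first result still has to be looked up.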

If there are any other issues you have come across, let me know,

Thanks,

This is a great piece of

This is a great piece of script. It's much easier to use.

Hey Alif, thanks for this

Hey Alif,

thanks for this nice script (-:
but how can I show the results in my page with HTML?

like "echo "Film: " + $movieInfo[????];

I'm not the biggest pro (-: Sorry..

It would be nice if you could help me (-:

Thanks, Dennis

Hi Dennis, If you want to

Hi Dennis,

If you want to display the infos in php pages, then you can use:

Film: <?=$movieInfo['title:']?>
Plot: <?=$movieInfo['Plot:']?>
Director: <?=$movieInfo['Director:']?>

...and so on (if you use the above syntax, make sure short_open_tag is turned on in your PHP settings).

You can view all the associative array keys in the test.imdb.php file.

I hope it helps,

why doesn't this work

why doesn't this work

$movieInfo = $imdbObj
->add('Archive Footage:', '/div/a')
->get('http://www.imdb.com/name/nm0004710/');

http://www.imdb.com/name/nm1200692/

Hello, It won't work for

Hello, It won't work for Grabbing Celebrity details from a Celebrity page. This Grabber works for Grabbing Movie Details of a Movie. For example, it will work for:

http://www.imdb.com/title/tt0320661/

Or even if you type 'Kingdom of Heaven', that should also work (for the above result).

any chance you can get

any chance you can get celebrity details eg from http://www.imdb.com/name/nm1200692/??

Hello Alif, Thank you. I was

Hello Alif,

Thank you. I was looking for information on how to do this myself when I stumbled on your code. This is directly usable. Nice work.

I have one question. I would also like to grab the "User Rating:" but I have trouble locating the text.

Is this possible to do at all?

Thanks in advance,
EJEE

Yes, it's definitely possible.

Yes, it's definitely possible. In fact, I just updated my class to reflect it. Now, the class should return User Rating and Total Votes by default. But you can turn it off by calling the method showRating(false).

Thanks,

Nice script... Its simple and

Nice script...
Its simple and easy to use. Thanks for sharing.

Is there any way the cast can be

Is there any way the cast can be grabbed? I am looking at this script, but I need a way to grab the cast. Is it possible?

Yes, I have added the ability

Yes, I have added the ability to grab cast now :)

Requesting some help

Thanks for making this. I have been looking for something like this to manage my personal movie collection using PHP / MySQL.

When I put your 2 files in my xampp/htdocs/moviecritc folder and run it through http://localhost/moviecritc/test.imdb.php in the browser, I get the following error:

http://i.imgur.com/YGRxG.png

Can you please help identify this? Thanks.

nice writing

This piece of writing is excellent and I liked it a lot.

Awesome!

This is really awesome. Really easy to use.

Thanks

cant get the cast

Hi there

I have used your script. It works great, but whatever I do I can't retrieve the cast of the movie from that page. Please let me know if it's possible.
cheers

I think yours will be even

I think yours will be even better than these alternatives. Looks pretty straight forward.

Doesn't seem to work !

The code doesn't seem to work, I get error messages, it cannot find DomDocument ? Where is that declared..???

Can you elaborate more on the error?

Can you please explain what error you are getting? DomDocument is a standard class in PHP 5. Ensure that you are using at least PHP 5.1.

Error with PHP

For some reason, I'm getting the following error:
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://www.imdb.com/title/tt0167260/, line: 642 in class.imdb.php on line 146

Over and over and over again, but the result does show at the end.

Thanks, I will look into it

Thanks for trying it out. From the messages on your comment, it seems to be a warning, not an error. I will look into it. For the time being, if you wish the warning message not to be displayed, you can use error_reporting feature of php to not show the warning.

http://us2.php.net/manual/en/function.error-reporting.php
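
Another option, instead of lowering error_reporting globally, is to let libxml collect its parse errors internally with libxml_use_internal_errors(). The sketch below uses a made-up HTML string containing a bare '&', the kind of input that typically produces "htmlParseEntityRef: no name". It is an illustration, not a change that has been made to class.imdb.php.

```php
<?php
// Made-up sample markup with a bare '&' in the text.
$html = '<html><body><p>Frodo & Sam</p></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors();          // array of LibXMLError objects, if any
libxml_clear_errors();                  // free the collected errors
libxml_use_internal_errors($previous);  // restore the earlier setting

// The document is still parsed and usable.
$text = $dom->getElementsByTagName('p')->item(0)->nodeValue;
```

The same calls can be placed around loadHTMLFile(), and $errors can be inspected or logged if you want to know what the parser complained about.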

thanks :-) i'll try to use it

thanks :-)
i'll try to use it soon...
