Scraping with Curl using Cookies

Okay, I got a much-needed lesson in scraping today using curl and cookies. I quickly discovered that you cannot get the contents of some webpages without sending cookies, because those pages use cookies to validate requests for pages or other data.

My mission was to scrape the following URL from expireddomains.net:

http://member.expireddomains.net/domains/expiredcom/?start=0&o=changes&r=d

If you know expireddomains.net at all, you probably know that the URL above is for members only, so the goal of this exercise is to make the web server at expireddomains.net think that you are a logged-in user. Here is how:

So, first I tried this:

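// the members-only page from above
$url = 'http://member.expireddomains.net/domains/expiredcom/?start=0&o=changes&r=d';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
$html = curl_exec($ch);
curl_close($ch);

echo "HTML:<br>$html<hr>";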
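If a request like this comes back empty, it helps to check whether the server answered with a redirect instead of the page. Here is a minimal sketch for that check. It assumes that a redirect to any URL containing "login" means you were bounced to the sign-in page; that naming is a guess, not something confirmed for this site, and both function names are my own.

```php
<?php
// Sketch: request a page WITHOUT following redirects and report whether the
// server bounced us to a login page. Treating any Location containing "login"
// as the login page is an ASSUMPTION, not confirmed for expireddomains.net.

function is_login_redirect(int $status, $location): bool
{
    // A 3xx status plus a redirect target that mentions "login".
    return $status >= 300 && $status < 400
        && is_string($location)
        && stripos($location, 'login') !== false;
}

function check_for_login_redirect(string $url): bool
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response
    curl_exec($ch);

    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The URL curl would have gone to next; false when there was no redirect.
    $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_close($ch);

    return is_login_redirect($status, $location);
}
```

Calling check_for_login_redirect($url) on the members-only URL without a cookie should return true if the site redirects anonymous visitors to a login URL, and false once a valid cookie is being sent.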

That returned nothing but a redirect to the login page of expireddomains.net. So then I read up online on how to send a cookie along with the page request in curl, and found that this worked:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// cookies to be sent, in "name=value; name2=value2" form
curl_setopt($ch, CURLOPT_COOKIE, "PUT_COOKIE_HERE");
$html = curl_exec($ch);
curl_close($ch);

echo "HTML:<br>$html<hr>";
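The string passed to CURLOPT_COOKIE is the same thing a browser sends in its Cookie header: name=value pairs separated by "; ". If you have several cookies to send, a small helper keeps that format straight. This is a sketch; the function names and the cookie names in the usage line are hypothetical placeholders, not anything the site actually uses.

```php
<?php
// Sketch: build the CURLOPT_COOKIE string ("name=value; name2=value2") from an
// array. Values are sent as-is, so paste them exactly as the browser shows them.

function cookie_string(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

function fetch_with_cookies(string $url, array $cookies): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIE, cookie_string($cookies));
    $html = curl_exec($ch);
    curl_close($ch);
    // curl_exec returns false on failure; hand back an empty string instead.
    return $html === false ? '' : $html;
}
```

For example, fetch_with_cookies($url, ['session' => 'PUT_VALUE_HERE', 'lang' => 'en']) would send the header value "session=PUT_VALUE_HERE; lang=en".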

That worked, so I was home free. More importantly, here are the steps I took to get the value to replace PUT_COOKIE_HERE:

  1. First, I signed up for a free membership at expireddomains.net.
  2. Then I went to the page I wanted to scrape and made a note of the exact URL, which was: http://member.expireddomains.net/domains/expiredcom/?start=8550&o=changes&r=d
  3. I copied the URL from step 2 to my clipboard and pasted it into the Firefox address bar. Don’t hit Enter yet.
  4. Then I opened Firebug and let the URL load while Firebug was open, so I could monitor the HTTP request headers being sent.
  5. Look for the request header named “Cookie”, copy its value, and paste it in place of PUT_COOKIE_HERE in the code above. Make sure the value stays inside the quotation marks.
  6. Now run your code again, and you’ll get results as long as the cookie is all the webpage is checking for. Some pages I have scraped also want a referrer set, which you can also do with curl. Hint: look up curl’s CURLOPT_REFERER option for more information.
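Steps 5 and 6 can be wrapped in one function. This sketch assumes you may have copied the whole header line, including the "Cookie:" label, and strips that label before handing the value to curl; the referrer is optional and is sent with curl's CURLOPT_REFERER option. The function names are hypothetical.

```php
<?php
// Sketch: turn text copied from the browser's request headers into the value
// for CURLOPT_COOKIE, then fetch the page with an optional Referer header.

function cookie_value_from_header(string $copied): string
{
    $value = trim($copied);
    // Drop a leading "Cookie:" label if it was copied along with the value.
    if (stripos($value, 'cookie:') === 0) {
        $value = trim(substr($value, strlen('cookie:')));
    }
    return $value;
}

function fetch_as_member(string $url, string $copiedCookie, ?string $referer = null): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIE, cookie_value_from_header($copiedCookie));
    if ($referer !== null) {
        curl_setopt($ch, CURLOPT_REFERER, $referer); // sent as the Referer: header
    }
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? '' : $html;
}
```

Which referrer a given page expects is site-specific; a reasonable first guess is the page you would normally click through from, but that is only a guess.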

9 Replies to “Scraping with Curl using Cookies”

    1. I am not sure exactly what you mean. You execute it with PHP and you can see all the actions, but I doubt you can use curl directly from Firebug… Let us know what you discover, please.
