The Archive Base

Editorial Team
Asked: May 17, 2026 (2026-05-17T02:56:12+00:00)

I want to crawl some data out of a phpBB forum I’m a member of

I want to crawl some data out of a phpBB forum I’m a member of, but viewing it requires a login. I can log in using cURL, but when I then try to crawl the data, the page still says I need to log in first. Is it possible to log in using cURL and keep that session for further requests?

Another thing: after logging in, the forum shows a confirmation page and then, after 5 seconds, automatically redirects to the index page. When I log in using cURL, my script also follows that redirect and shows me that page.

Is there any workaround for this?

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 2:56 am

    This is what usually works for me

    
    $timeout=5;
    $file='cookies.jar';
    $this->handle=curl_init('');
    curl_setopt($this->handle, CURLOPT_COOKIEFILE,  $file); // read saved cookies from this file
    curl_setopt($this->handle, CURLOPT_COOKIEJAR,   $file); // write cookies back to it when the handle is released
    curl_setopt($this->handle, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
    curl_setopt($this->handle, CURLOPT_FOLLOWLOCATION, 1); // follow HTTP redirects
    curl_setopt($this->handle, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)");
    curl_setopt($this->handle, CURLOPT_TIMEOUT, (int) round($timeout)); // limit for the whole request, in seconds
    curl_setopt($this->handle, CURLOPT_CONNECTTIMEOUT, (int) round($timeout)); // limit for establishing the connection
    

    And I generally use it like this:

    
    $now=grab_first_page();
    if(not_logged_in($now)) {
       send_login_info();
       $now=grab_first_page(); // fetch the page again now that the login cookies are set
    }
    if(not_logged_in($now)) { end_of_script_with_error(); }
    // rest of script
    

    This way the cookies are kept across runs, and the script does not have to log in every time it does something.
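The helper names in the usage snippet above are placeholders. Here is a minimal sketch of how they could be filled in, assuming a phpBB 3 style login form at ucp.php?mode=login that accepts username, password and login fields. That form layout is an assumption, so check the form on your forum; some versions may also need hidden token fields. make_handle(), fetch(), fetch_protected_page() and the marker string are hypothetical names for illustration, not part of cURL or phpBB.

```php
<?php
// Assumption: a logged-out phpBB page contains a link to the login form.
// Adjust the marker to whatever your forum prints for guests.
function not_logged_in(string $html, string $marker = 'ucp.php?mode=login'): bool
{
    // An empty body (failed request) is treated as "not logged in" too.
    return $html === '' || stripos($html, $marker) !== false;
}

// One handle, one cookie file: every request made through this handle
// sends the cookies received by the previous ones.
function make_handle(string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_COOKIEFILE     => $cookieFile, // load cookies saved by earlier runs
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    return $ch;
}

// GET when $post is null, otherwise a form-encoded POST.
function fetch($ch, string $url, ?array $post = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($post === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true);
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    $body = curl_exec($ch);
    return $body === false ? '' : $body;
}

// Fetch $pageUrl, logging in first only if the saved cookies are no longer valid.
function fetch_protected_page(string $pageUrl, string $loginUrl, array $credentials, string $cookieFile): string
{
    $ch = make_handle($cookieFile);
    $page = fetch($ch, $pageUrl);
    if (not_logged_in($page)) {
        fetch($ch, $loginUrl, $credentials); // cookies from this response stay on the handle
        $page = fetch($ch, $pageUrl);        // same handle, so the session cookie is sent
    }
    if (not_logged_in($page)) {
        throw new RuntimeException('Login failed or session was not kept');
    }
    return $page;
}
```

A call could look like fetch_protected_page('http://forum.example/viewtopic.php?t=1', 'http://forum.example/ucp.php?mode=login', ['username' => 'blue', 'password' => 'black', 'login' => 'Login'], 'cookies.jar'), where the forum.example URLs are placeholders. The point of the design is that the login request and the crawl requests share the same handle or at least the same cookie file; logging in with one handle and crawling with a fresh one that has no cookie file is a likely cause of the "you need to login" page.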

    Explanation of the setup code above:

    I’m using an object, but you can replace $this->handle with a simple variable named $mycurl. The lines will then look like this:

    
    $mycurl=curl_init('');
    curl_setopt($mycurl, CURLOPT_COOKIEFILE, $file);

    What the setup code does:
    – initializes “a curl instance” (to keep it simple) (3rd line)
    – 4th and 5th lines: load cookies from a file and save them back to it. Curl works just like a browser, so when you log in to a page with curl, it keeps the cookies with the authentication data in memory. I’m telling it to save them to a file so that the second time I run the script it will have the same cookies and will not need to authenticate again. You can also have multiple scripts using the same cookie file, and just one login script that you run every 24 hours or whenever you’re logged out.
    – other settings:
    * followlocation – when curl receives an HTTP redirect, it should return the page it was redirected to, not the redirect response
    * useragent – curl presents itself as Firefox
    * timeout – how long curl should wait, in seconds: CURLOPT_CONNECTTIMEOUT limits establishing the connection and CURLOPT_TIMEOUT limits the whole request; 5 or 10 is usually more than enough
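The settings listed above can also be collected in one array and applied with curl_setopt_array(). The sketch below is only an illustration: build_curl_options(), describe_response() and the user agent string are hypothetical names and values, not part of the class mentioned in this answer. It also shows that turning followlocation off lets you inspect an HTTP redirect instead of following it, which may help if you do not want to land on the page the forum redirects to.

```php
<?php
// Hypothetical helper: returns the options discussed above as one array.
function build_curl_options(string $cookieFile, int $timeout = 5, bool $followRedirects = true): array
{
    return [
        CURLOPT_COOKIEFILE     => $cookieFile,      // read cookies saved earlier
        CURLOPT_COOKIEJAR      => $cookieFile,      // write cookies when the handle is released
        CURLOPT_RETURNTRANSFER => true,             // return the body as a string
        CURLOPT_FOLLOWLOCATION => $followRedirects, // false: stop at the redirect response
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-crawler)', // placeholder value
        CURLOPT_CONNECTTIMEOUT => $timeout,         // seconds allowed to establish the connection
        CURLOPT_TIMEOUT        => $timeout,         // seconds allowed for the whole request
    ];
}

// Hypothetical helper: call after curl_exec() to see the status code and,
// when redirects are not followed, the URL the server wanted to send us to.
function describe_response($ch): array
{
    return [
        'status'   => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'redirect' => curl_getinfo($ch, CURLINFO_REDIRECT_URL),
    ];
}
```

With a handle from curl_init(), curl_setopt_array($mycurl, build_curl_options($file, 5, false)) applies everything in one call. Note that this only covers redirects sent as HTTP headers; a confirmation page that redirects through a meta refresh tag in the HTML is just a normal page as far as curl is concerned.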

    I have put a simple class I use here: http://pastebin.com/Rfpc103X

    You can use it like this:

    
    
    // -- initialize curl
    $ec=new easyCurl;
    
    // -- set some options
    //if the file you are in right now is named file_a.php it will create a file_a.jar cookie file
    $ec->start(str_replace('.php','.jar',__FILE__));
    $ec->headersPrepare(false);
    $ec->prepareTimeOut(20);
    
    $url='http://www.google.com/';
    
    // --- set url
    $ec->curlPrepare($url);
    
    // --- get the actual data
    $page=$ec->grab();
    
    echo $page;
    
    // to send GET data
    $get_data=array('id'=>10);
    $ec->curlPrepare($url,$get_data);
    
    // and to post data
    $post_data=array('user'=>'blue','password'=>'black');
    $ec->curlPrepare($url,array(),$post_data);
    

    It automatically handles the settings for POST/GET and other options I usually encounter. I hope the examples above will be useful to you. Good luck.

