Editorial Team
Asked: June 11, 2026

Another website uses cURL and a fake HTTP referer to copy my website's content.

Is there any way to detect cURL, or any other client that is not a real web browser?

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 7:34 pm

    There is no magic solution to prevent automatic crawling. Everything a human can do, a robot can do too. There are only ways to make the job harder, hard enough that only highly skilled people will try to get past them.

    I had the same trouble some years ago, and my first advice is: if you have time, be a crawler yourself (by “crawler” I mean the person who crawls your website). It is the best school for the subject. By crawling several websites I learned about different kinds of protection, and combining them has worked well for me.

    Here are some examples of protections you may try.


    Sessions per IP

    If a user opens 50 new sessions each minute, that user is probably a crawler that does not handle cookies. Of course, curl manages cookies perfectly, but if you couple this check with a visit counter per session (explained later), or if the crawler's author is a beginner with cookies, it can be effective.

    It is hard to imagine 50 people on the same shared connection arriving on your website simultaneously (this of course depends on your traffic; that is up to you). If it does happen, you can lock the pages of your website until a captcha is filled in.

    Idea:

    1) Create 2 tables: one to store banned IPs and one to store IPs and their sessions.

    create table if not exists sessions_per_ip (
      ip int unsigned,
      session_id varchar(32),
      creation timestamp default current_timestamp,
      primary key(ip, session_id)
    );
    
    create table if not exists banned_ips (
      ip int unsigned,
      creation timestamp default current_timestamp,
      primary key(ip)
    );
    

    2) At the beginning of your script, delete entries that are too old from both tables.

    3) Next, check whether your user's IP is banned (if so, set a flag to true).

    4) If not, count how many sessions exist for that IP.

    5) If there are too many sessions, insert the IP into your banned table and set the flag.

    6) Insert the IP and session into the sessions-per-IP table if they are not already recorded.

    Here is a code sample that illustrates the idea.

    <?php
    
    try
    {
    
        // Some configuration (small values for demo)
        $max_sessions = 5; // 5 simultaneous sessions allowed per ip
        $check_duration = 30; // 30 secs max lifetime of an ip on the sessions_per_ip table
        $lock_duration = 60; // time to lock your website for this ip if max_sessions is reached
    
        // Mysql connection
        require_once("config.php");
        $dbh = new PDO("mysql:host={$host};dbname={$base}", $user, $password);
        $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    
        // Delete old entries in tables
        $query = "delete from sessions_per_ip where timestampdiff(second, creation, now()) > {$check_duration}";
        $dbh->exec($query);
    
        $query = "delete from banned_ips where timestampdiff(second, creation, now()) > {$lock_duration}";
        $dbh->exec($query);
    
        // Get useful info attached to our user...
        session_start();
        // Note: ip2long() only handles IPv4 and returns false for IPv6 addresses
        $ip = ip2long($_SERVER['REMOTE_ADDR']);
        $session_id = session_id();
    
        // Check if IP is already banned
        $banned = false;
        $count = $dbh->query("select count(*) from banned_ips where ip = '{$ip}'")->fetchColumn();
        if ($count > 0)
        {
            $banned = true;
        }
        else
        {
            // Count entries in our db for this ip
            $query = "select count(*)  from sessions_per_ip where ip = '{$ip}'";
            $count = $dbh->query($query)->fetchColumn();
            if ($count >= $max_sessions)
            {
                // Lock website for this ip
                $query = "insert ignore into banned_ips ( ip ) values ( '{$ip}' )";
                $dbh->exec($query);
                $banned = true;
            }
    
            // Insert a new entry on our db if user's session is not already recorded
            $query = "insert ignore into sessions_per_ip ( ip, session_id ) values ('{$ip}', '{$session_id}')";
            $dbh->exec($query);
        }
    
        // At this point you have a $banned if your user is banned or not.
        // The following code will allow us to test it...
    
        // We do not display anything now because we'll play with sessions :
        // to make the demo more readable I prefer going step by step like
        // this.
        ob_start();
    
        // Displays your current sessions
        echo "Your current sessions keys are : <br/>";
        $query = "select session_id from sessions_per_ip where ip = '{$ip}'";
        foreach ($dbh->query($query) as $row) {
            echo "{$row['session_id']}<br/>";
        }
    
        // Display and handle a way to create new sessions
        echo str_repeat('<br/>', 2);
        echo '<a href="' . basename(__FILE__) . '?new=1">Create a new session / reload</a>';
        if (isset($_GET['new']))
        {
            session_regenerate_id();
            session_destroy();
            header("Location: " . basename(__FILE__));
            die();
        }
    
        // Display if you're banned or not
        echo str_repeat('<br/>', 2);
        if ($banned)
        {
            echo "<span style=\"color:red;\">You are banned: wait {$lock_duration} secs to be unbanned... a captcha would be friendlier, of course!</span>";
            echo '<br/>';
            echo '<img src="http://4.bp.blogspot.com/-PezlYVgEEvg/TadW2e4OyHI/AAAAAAAAAAg/QHZPVQcBNeg/s1600/feu-rouge.png" />';
        }
        else
        {
            echo '<span style="color:blue;">You are not banned!</span>';
            echo '<br/>';
            echo '<img src="http://identityspecialist.files.wordpress.com/2010/06/traffic_light_green.png" />';
        }
        ob_end_flush();
    }
    catch (PDOException $e)
    {
        // Log the error instead of silently discarding it
        error_log($e->getMessage());
    }
    
    ?>
    

    Visit Counter

    If your user keeps the same cookie while crawling your pages, you can use his session to block him. The idea is quite simple: is it plausible that a real user visits 60 pages in 60 seconds?

    Idea :

    1. Create an array in the user session; it will contain the time() of each visit.
    2. Remove visits older than X seconds from this array.
    3. Add a new entry for the current visit.
    4. Count the entries in this array.
    5. Ban your user if he visited more than Y pages.

    Sample code :

    <?php
    
    $visit_counter_pages = 5; // maximum number of pages to load
    $visit_counter_secs = 10; // maximum amount of time before cleaning visits
    
    session_start();
    
    // initialize an array for our visit counter
    if (array_key_exists('visit_counter', $_SESSION) == false)
    {
        $_SESSION['visit_counter'] = array();
    }
    
    // clean old visits
    foreach ($_SESSION['visit_counter'] as $key => $time)
    {
        if ((time() - $time) > $visit_counter_secs) {
            unset($_SESSION['visit_counter'][$key]);
        }
    }
    
    // we add the current visit into our array
    $_SESSION['visit_counter'][] = time();
    
    // check if user has reached limit of visited pages
    $banned = false;
    if (count($_SESSION['visit_counter']) > $visit_counter_pages)
    {
        // puts ip of our user on the same "banned table" as earlier...
        $banned = true;
    }
    
    // At this point you have a $banned if your user is banned or not.
    // The following code will allow us to test it...
    
    echo '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js"></script>';
    
    // Display counter
    $count = count($_SESSION['visit_counter']);
    echo "You visited {$count} pages.";
    echo str_repeat('<br/>', 2);
    
    echo <<< EOT
    
    <a id="reload" href="#">Reload</a>
    
    <script type="text/javascript">
    
      $('#reload').click(function(e) {
        e.preventDefault();
        window.location.reload();
      });
    
    </script>
    
    EOT;
    
    echo str_repeat('<br/>', 2);
    
    // Display if you're banned or not
    echo str_repeat('<br/>', 2);
    if ($banned)
    {
        echo '<span style="color:red;">You are banned! Wait for a short while (10 secs in this demo)...</span>';
        echo '<br/>';
        echo '<img src="http://4.bp.blogspot.com/-PezlYVgEEvg/TadW2e4OyHI/AAAAAAAAAAg/QHZPVQcBNeg/s1600/feu-rouge.png" />';
    }
    else
    {
        echo '<span style="color:blue;">You are not banned!</span>';
        echo '<br/>';
        echo '<img src="http://identityspecialist.files.wordpress.com/2010/06/traffic_light_green.png" />';
    }
    ?>
    

    An image to download

    When a crawler does its dirty work, it handles a large amount of data in the shortest possible time. That is why crawlers usually do not download the images on a page: it takes too much bandwidth and slows the crawl down.

    This idea (I think the most elegant and the easiest to implement) uses mod_rewrite to hide code behind an image file (.jpg, .png, ...). The image should be present on every page you want to protect. It could be your website's logo, but choose a small image, because it must not be cached.

    Idea :

    1/ Add these lines to your .htaccess

    RewriteEngine On
    RewriteBase /tests/anticrawl/
    RewriteRule ^logo\.jpg$ logo.php
    

    2/ Create logo.php, which contains the check

    <?php
    
    // start session and reset counter
    session_start();
    $_SESSION['no_logo_count'] = 0;
    
    // forces image to reload next time
    header("Cache-Control: no-store, no-cache, must-revalidate");
    
    // displays image
    header("Content-Type: image/jpeg");
    readfile("logo.jpg");
    die();
    

    3/ Increment no_logo_count on each page you want to protect, and check whether it has reached your limit.

    Sample code :

    <?php
    
    $no_logo_limit = 5; // number of allowed pages without logo
    
    // start session and initialize
    session_start();
    if (array_key_exists('no_logo_count', $_SESSION) == false)
    {
        $_SESSION['no_logo_count'] = 0;
    }
    else
    {
        $_SESSION['no_logo_count']++;
    }
    
    // check if user has reached limit of "undownloaded image"
    $banned = false;
    if ($_SESSION['no_logo_count'] >= $no_logo_limit)
    {
        // puts ip of our user on the same "banned table" as earlier...
        $banned = true;
    }
    
    // At this point you have a $banned if your user is banned or not.
    // The following code will allow us to test it...
    
    echo '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js"></script>';
    
    // Display counter
    echo "You did not load the image {$_SESSION['no_logo_count']} times.";
    echo str_repeat('<br/>', 2);
    
    // Display "reload" link
    echo <<< EOT
    
    <a id="reload" href="#">Reload</a>
    
    <script type="text/javascript">
    
      $('#reload').click(function(e) {
        e.preventDefault();
        window.location.reload();
      });
    
    </script>
    
    EOT;
    
    echo str_repeat('<br/>', 2);
    
    // Display "show image" link : note that we're using .jpg file
    echo <<< EOT
    
    <div id="image_container">
        <a id="image_load" href="#">Load image</a>
    </div>
    <br/>
    
    <script type="text/javascript">
    
      // In your implementation, you'll of course use <img src="logo.jpg" /> directly
      $('#image_load').click(function(e) {
        e.preventDefault();
        $('#image_container').html('<img src="logo.jpg" />');
      });
    
    </script>
    
    EOT;
    
    // Display if you're banned or not
    echo str_repeat('<br/>', 2);
    if ($banned)
    {
        echo '<span style="color:red;">You are banned: click on "load image" and reload...</span>';
        echo '<br/>';
        echo '<img src="http://4.bp.blogspot.com/-PezlYVgEEvg/TadW2e4OyHI/AAAAAAAAAAg/QHZPVQcBNeg/s1600/feu-rouge.png" />';
    }
    else
    {
        echo '<span style="color:blue;">You are not banned!</span>';
        echo '<br/>';
        echo '<img src="http://identityspecialist.files.wordpress.com/2010/06/traffic_light_green.png" />';
    }
    ?>
    

    Cookie check

    You can create a cookie on the JavaScript side to check whether your users' clients interpret JavaScript (a crawler using curl does not, for example).

    The idea is quite simple: it works much like the image check.

    1. Initialize a $_SESSION counter to 0 and increment it on each visit where the cookie is missing
    2. If the cookie (set in JavaScript) exists with the expected value, reset the counter to 0
    3. If the counter reaches a limit, ban your user

    Code :

    <?php
    
    $no_cookie_limit = 5; // number of allowed pages without a valid cookie
    
    // Start session and reset counter
    session_start();
    
    if (array_key_exists('cookie_check_count', $_SESSION) == false)
    {
        $_SESSION['cookie_check_count'] = 0;
    }
    
    // Check the cookie value (note: rename the cookie to something more discreet, of course)
    if ((array_key_exists('cookie_check', $_COOKIE) == false) || ($_COOKIE['cookie_check'] != 42))
    {
        // Cookie does not exist or is incorrect...
        $_SESSION['cookie_check_count']++;
    }
    else
    {
        // Cookie is properly set so we reset counter
        $_SESSION['cookie_check_count'] = 0;
    }
    
    // Check if user has reached limit of "cookie check"
    $banned = false;
    if ($_SESSION['cookie_check_count'] >= $no_cookie_limit)
    {
        // puts ip of our user on the same "banned table" as earlier...
        $banned = true;
    }
    
    // At this point you have a $banned if your user is banned or not.
    // The following code will allow us to test it...
    
    echo '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js"></script>';
    
    // Display counter
    echo "Cookie check failed {$_SESSION['cookie_check_count']} times.";
    echo str_repeat('<br/>', 2);
    
    // Display "reload" link
    echo <<< EOT
    
    <br/>
    <a id="reload" href="#">Reload</a>
    <br/>
    
    <script type="text/javascript">
    
      $('#reload').click(function(e) {
        e.preventDefault();
        window.location.reload();
      });
    
    </script>
    
    EOT;
    
    // Display "set cookie" link
    echo <<< EOT
    
    <br/>
    <a id="cookie_link" href="#">Set cookie</a>
    <br/>
    
    <script type="text/javascript">
    
      // On your implementation, you'll of course put the cookie set on a $(document).ready()
      $('#cookie_link').click(function(e) {
        e.preventDefault();
        var expires = new Date();
        expires.setTime(new Date().getTime() + 3600000);
        document.cookie="cookie_check=42;expires=" + expires.toUTCString();
      });
    
    </script>
    EOT;
    
    
    // Display "unset cookie" link
    echo <<< EOT
    
    <br/>
    <a id="unset_cookie" href="#">Unset cookie</a>
    <br/>
    
    <script type="text/javascript">
    
      // This handler only exists for the demo: it removes the cookie so the check fails again
      $('#unset_cookie').click(function(e) {
        e.preventDefault();
        document.cookie="cookie_check=;expires=Thu, 01 Jan 1970 00:00:01 GMT";
      });
    
    </script>
    EOT;
    
    // Display if you're banned or not
    echo str_repeat('<br/>', 2);
    if ($banned)
    {
        echo '<span style="color:red;">You are banned: click on "Set cookie" and reload...</span>';
        echo '<br/>';
        echo '<img src="http://4.bp.blogspot.com/-PezlYVgEEvg/TadW2e4OyHI/AAAAAAAAAAg/QHZPVQcBNeg/s1600/feu-rouge.png" />';
    }
    else
    {
        echo '<span style="color:blue;">You are not banned!</span>';
        echo '<br/>';
        echo '<img src="http://identityspecialist.files.wordpress.com/2010/06/traffic_light_green.png" />';
    }
    

    Protection against proxies

    A few words about the different kinds of proxies found on the web:

    • A “normal” proxy passes along information about the user's connection (notably, his IP).
    • An anonymous proxy does not reveal the IP, but its headers show that a proxy is in use.
    • A high-anonymity proxy does not reveal the user's IP, and does not send anything that a browser would not send.

    It is easy to find a proxy to connect to any website, but it is very hard to find high-anonymity proxies.

    Some $_SERVER keys may be present specifically when your user is behind a proxy (list taken from another question):

    • CLIENT_IP
    • FORWARDED
    • FORWARDED_FOR
    • FORWARDED_FOR_IP
    • HTTP_CLIENT_IP
    • HTTP_FORWARDED
    • HTTP_FORWARDED_FOR
    • HTTP_FORWARDED_FOR_IP
    • HTTP_PC_REMOTE_ADDR
    • HTTP_PROXY_CONNECTION
    • HTTP_VIA
    • HTTP_X_FORWARDED
    • HTTP_X_FORWARDED_FOR
    • HTTP_X_FORWARDED_FOR_IP
    • HTTP_X_IMFORWARDS
    • HTTP_XROXY_CONNECTION
    • VIA
    • X_FORWARDED
    • X_FORWARDED_FOR

    You may apply a different behavior (lower limits, etc.) in your anti-crawl protections if you detect one of those keys in your $_SERVER variable.
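
    As a minimal sketch of that last point, the function below looks for the keys listed above in a $_SERVER-like array and picks a stricter limit when one is found. The names PROXY_KEYS, detect_proxy_keys() and pick_limit(), and the limit values, are hypothetical and only there for illustration. Keep in mind that these headers are sent by the client side and can be forged or omitted, and that legitimate visitors behind a corporate proxy or a load balancer may carry them too, so this is a hint rather than proof.

```php
<?php

// Keys from the list above; trim or extend it to match your own traffic.
const PROXY_KEYS = [
    'CLIENT_IP', 'FORWARDED', 'FORWARDED_FOR', 'FORWARDED_FOR_IP',
    'HTTP_CLIENT_IP', 'HTTP_FORWARDED', 'HTTP_FORWARDED_FOR',
    'HTTP_FORWARDED_FOR_IP', 'HTTP_PC_REMOTE_ADDR', 'HTTP_PROXY_CONNECTION',
    'HTTP_VIA', 'HTTP_X_FORWARDED', 'HTTP_X_FORWARDED_FOR',
    'HTTP_X_FORWARDED_FOR_IP', 'HTTP_X_IMFORWARDS', 'HTTP_XROXY_CONNECTION',
    'VIA', 'X_FORWARDED', 'X_FORWARDED_FOR',
];

/**
 * Returns the proxy-related keys found in a $_SERVER-like array,
 * in the order they appear in PROXY_KEYS.
 */
function detect_proxy_keys(array $server): array
{
    return array_values(array_intersect(PROXY_KEYS, array_keys($server)));
}

/**
 * Returns $proxy_limit when at least one proxy key is present,
 * $normal_limit otherwise.
 */
function pick_limit(array $server, int $normal_limit, int $proxy_limit): int
{
    return detect_proxy_keys($server) ? $proxy_limit : $normal_limit;
}

// Example: allow 5 sessions per IP normally, only 2 behind a detected proxy.
$max_sessions = pick_limit($_SERVER, 5, 2);
```

    The same pick_limit() call could feed the visit counter, the image check or the cookie check, so that every protection tightens together when a proxy is suspected.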


    Conclusion

    There are many ways to detect abuse on your website, so you should be able to find a solution that fits. But you need to know precisely how your website is used, so that your protections are not too aggressive toward your “normal” users.
