I’m looking for the best solution on how I can ensure I am doing


Asked: May 19, 2026


I’m looking for the best way to make sure I am doing this correctly:

I have a calendar on my website. Users can take the calendar’s iCal feed and import it into the external calendar of their choice (Outlook, iCal, Google Calendar, etc.).

To deter bad actors from crawling or searching my website for the *.ics files, I’ve set up robots.txt to disallow the folders in which the feeds are stored.

So, essentially, an iCal feed might look like: webcal://www.mysite.com/feeds/cal/a9d90309dafda390d09/feed.ics

I understand the above is still a public URL. However, I have a function that lets the user change the address of their feed if they want.

The problem: all external calendars import or subscribe to the feed without trouble, except Google Calendar. It throws the message: Google was unable to crawl the URL due to a robots.txt restriction. (See Google’s answer to this.)

Consequently, after searching around, I’ve found that the following works:

1) Set up a PHP file (which I am now using) that essentially forces a download of the file. It basically looks like this:

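<?php
// Base directory that holds the feed files (path as given in the post).
$base = realpath("/home/path/to/local/feed");
// Resolve the requested path; realpath() returns false if the file does not exist.
$path = realpath($base . "/" . (isset($_GET['url']) ? $_GET['url'] : ""));
// Refuse anything that resolves outside the feed directory (e.g. "../../etc/passwd").
if ($base === false || $path === false
    || strpos($path, $base . DIRECTORY_SEPARATOR) !== 0 || !is_file($path)) {
    header("HTTP/1.1 404 Not Found");
    echo "Unable to open feed file.\n";
    exit;
}
// Identify the response as an iCalendar feed, then stream the file to the client.
header("Content-Type: text/calendar");
readfile($path);
?>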

I tried this script, and it appeared to work with Google Calendar with no issues. (I’m not yet sure whether the calendar updates/refreshes, though. I’m still waiting to see if this works.)

My question is this: is there a better way to approach this issue? I’d like to keep the current robots.txt in place to disallow crawling my directories for *.ics files and keep the files hidden.


1 Answer

Editorial Team · Answer 1 · 2026-05-19T03:28:03+00:00

It looks to me like you have two problems:

1) Prevent badly behaved bots from accessing the website.
2) After installing robots.txt, allow Googlebot to access your site.

The first problem cannot be solved by robots.txt. As Marc B points out in a comment, robots.txt is a purely voluntary mechanism. To block bad bots once and for all, I would suggest using some kind of behavior-analysis program or firewall to detect bad bots and deny access from their IPs.

For the second problem, robots.txt does allow you to whitelist a particular bot. See http://facebook.com/robots.txt for an example. Note that Google identifies its bots under different names (for AdSense, search, image search, mobile search); I am not sure whether the Google Calendar bot uses the generic Googlebot name or not.
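As a rough sketch of what such a whitelist could look like, the fragment below allows one named crawler into the feed folder and keeps it closed to every other compliant bot. Two things here are assumptions rather than facts from the question: the folder is taken to be /feeds/ (based on the example feed URL), and the user-agent token is taken to be the generic Googlebot, which may or may not be the name Google Calendar's fetcher answers to.

```
# Assumption: Google Calendar's fetcher matches the generic "Googlebot" token.
User-agent: Googlebot
Allow: /feeds/

# Every other compliant crawler stays out of the feed folder.
User-agent: *
Disallow: /feeds/
```

A crawler that follows the standard uses the most specific group that matches its name, so the named bot reads only the first group and ignores the catch-all one. This still does nothing against bots that ignore robots.txt, so the hard-to-guess feed address remains the real protection.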
