The Archive Base

Editorial Team
Asked: May 12, 2026 (2026-05-12T16:55:01+00:00)

For completely non-nefarious purposes – machine learning specifically, I’d like to download a huge

For completely non-nefarious purposes (machine learning, specifically), I’d like to download a huge dataset of CAPTCHA images. However, CAPTCHAs are typically implemented with obfuscated JavaScript, which makes getting at the actual images without a browser a non-trivial task, at least for a JavaScript novice like me.

So, can anyone give me some helpful pointers on how to download the image of the obscured word using a script completely outside of a browser? And please don’t point me to a dataset of already collected obscured words – I need to collect the images from a specific website for this particular experiment.

Thanks!

Edit: Another way to ask this question is very simple. When you click “view source” on a website with complicated JavaScript, you see the script references, but that’s all you see. However, if you click “Save Page As…” in Firefox and then view the source of the saved page, the JavaScript has been resolved, and the new HTML and the images (at least in the case of ASIRRA and reCAPTCHA) are in the source. How can I mimic this “Save Page As…” behavior using a script? This is an important web coding question in general, so please stop questioning my motives! This is knowledge I can use from now on in all web development involving scripting, and I’m sure other Stack Overflow visitors can as well!

1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 4:55 pm

    While waiting for an answer here, I kept digging and eventually figured out a somewhat hacky way of doing what I wanted.

    First off, the reason this is a somewhat complicated problem (at least to a JavaScript novice like me) is that the images from ASIRRA are loaded onto the webpage via JavaScript, which runs on the client side. This is a problem when you download the webpage using something like wget or curl, because those tools don’t actually run the JavaScript; they just download the source HTML. Therefore, you don’t get the images.
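
    To make the difference concrete, here is a minimal Python sketch. It is not part of my original setup, and the two HTML strings and their file names are made up for illustration. It lists the script and image sources found in a page's markup, once as a non-browser download would see it and once as a browser might save it after the script has run.

```python
from html.parser import HTMLParser


class SourceCollector(HTMLParser):
    """Collect the src attributes of <script> and <img> tags."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if src is None:
            return
        if tag == "script":
            self.scripts.append(src)
        elif tag == "img":
            self.images.append(src)


def list_sources(html):
    """Return (script_srcs, image_srcs) found in an HTML string."""
    parser = SourceCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts, parser.images


# Hypothetical markup: what a plain download sees (a script reference only).
STATIC_HTML = '<html><body><script src="challenge.js"></script></body></html>'

# Hypothetical markup: the same page after a browser ran the script and
# the page was saved, with the image now present in the source.
RENDERED_HTML = (
    '<html><body><script src="challenge.js"></script>'
    '<img src="challenge_files/cat1.jpg"></body></html>'
)

if __name__ == "__main__":
    print(list_sources(STATIC_HTML))
    print(list_sources(RENDERED_HTML))
```

    The static markup yields a script reference and no images, while the rendered markup yields both. That gap is what a non-browser download misses.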

    However, I realized that Firefox’s “Save Page As…” did exactly what I needed. It ran the JavaScript, which loaded the images, and then saved everything into the familiar directory structure on my hard drive. That’s exactly what I wanted to automate. So I found a Firefox add-on called “iMacros” and wrote this macro:

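    VERSION BUILD=6240709 RECORDER=FX
    TAB T=1
    ' Load the page in Firefox so its JavaScript runs and pulls in the images
    URL GOTO=http://www.asirra.com/examples/ExampleService.html
    ' Save the complete rendered page (TYPE=CPL), images included
    SAVEAS TYPE=CPL FOLDER=C:\Cat-Dog\Downloads FILE=*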

    Set to loop 10,000 times, it worked perfectly. In fact, since it was always saving to the same folder, duplicate images were overwritten (which is what I wanted).
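
    The same loop-and-overwrite idea can be sketched in Python. This is only an illustration under assumptions, not what I actually ran: `collect_images` and its `fetch_images` argument are hypothetical names, and `fetch_images` stands in for whatever step loads the page in a real browser and returns the image bytes it finds. The demo uses a fake fetcher so the sketch runs on its own.

```python
import itertools
import tempfile
from pathlib import Path


def collect_images(fetch_images, dest, iterations):
    """Call fetch_images() repeatedly and write each result into dest.

    fetch_images is a hypothetical callable returning {filename: bytes}
    for one page load. A file whose name already exists is overwritten,
    so dest ends up with one copy of each distinct filename.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for _ in range(iterations):
        for name, data in fetch_images().items():
            # Path(name).name drops any directory part of the filename.
            (dest / Path(name).name).write_bytes(data)
    return sorted(p.name for p in dest.iterdir())


if __name__ == "__main__":
    # Fake page loads (made-up names and bytes) that repeat an image.
    fake_batches = itertools.cycle([
        {"cat1.jpg": b"a", "dog1.jpg": b"b"},
        {"cat1.jpg": b"a", "dog2.jpg": b"c"},
    ])
    with tempfile.TemporaryDirectory() as folder:
        print(collect_images(lambda: next(fake_batches), folder, iterations=4))
```

    With the fake fetcher, four iterations leave three files in the folder, because the repeated names simply overwrite the earlier copies.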
