My test uses Selenium to loop through a CSV list of URLs via an

Asked: May 12, 2026

My test uses Selenium to loop through a CSV list of URLs via an HTTP proxy (working script below). As I watch the script run, I can see that about 10% of the calls produce “Proxy error: 502” (“Bad_Gateway”). However, these errors are not caught by my catch-all “except Exception” clause. Instead of writing ‘error’ in the appropriate row of “output.csv”, they fall through to the else clause and produce a short piece of HTML that starts “Proxy error: 502 Read from server failed: Unknown error.” Also, if I collect all the URLs that returned 502s and re-run the script, they all pass, which leads me to believe this is a sporadic network path issue.

Question: Can the script be made to recognize the 502 errors, sleep for a minute, and then retry the URL instead of moving on to the next URL in the list?

The only alternative I can think of is to apply re.search("Proxy error: 502", html) after get_html_source as a way to catch the bad calls. If the regex matches, the script would sleep for a minute and then retry sel.open(row[0]) on the URL that produced the 502. Any advice would be much appreciated. Thanks!
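The regex-and-retry idea above can be sketched as a small helper. This is a minimal sketch, written for Python 3, that assumes the page HTML comes from a callable you pass in; fetch_with_retry, fetch, retries, delay and sleep are hypothetical names, not part of Selenium. In the original script, fetch would wrap sel.open(url) followed by sel.get_html_source().

```python
import re
import time

# Marker text seen at the start of the proxy's error page. Matching on page
# text is fragile: the proxy may change its wording.
PROXY_502 = re.compile(r"Proxy error: 502")


def fetch_with_retry(fetch, url, retries=3, delay=60, sleep=time.sleep):
    """Fetch `url` through the hypothetical `fetch(url) -> html` callable.

    If the returned HTML looks like the proxy's 502 page, wait `delay`
    seconds and try again, up to `retries` extra attempts. Returns the HTML
    of the first good response, or None if every attempt hit a 502 page.
    """
    for attempt in range(retries + 1):
        html = fetch(url)
        if not PROXY_502.search(html):
            return html
        if attempt < retries:
            # Give the flaky network path time to recover before retrying.
            sleep(delay)
    return None
```

With Selenium RC, fetch could be a two-line function that calls sel.open(url) and returns sel.get_html_source(); a None result is the point where the script would write ‘error’ to output.csv. Passing sleep in as a parameter is only there so the wait can be skipped when testing the helper.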

#python 2.6
from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://baseDomain.com")
        self.selenium.start()
        self.selenium.set_timeout("60000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('ListOfSubDomains.csv', 'rb'))
        for row in spamReader:
            try:
                # row[0] is the URL on this line of the CSV
                sel.open(row[0])
            except Exception:
                # sel.open raised: record a placeholder for this URL
                ofile = open('output.csv', 'ab')
                ofile.write("error" + '\n')
                ofile.close()
            else:
                # The page loaded (a proxy 502 page also ends up here)
                time.sleep(5)
                html = sel.get_html_source()
                ofile = open('output.csv', 'ab')
                ofile.write(html.encode('utf-8') + '\n')
                ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()
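Judging from the behaviour described, sel.open does not raise when the proxy answers with its own 502 page; it simply loads that page, so the else branch runs. One way to route these cases into the except branch is to raise your own exception when the page looks like a proxy error, after a few retries. The sketch below (Python 3) keeps the question's try/except/else shape but takes open_url and get_source callables in place of sel.open and sel.get_html_source, so it can run without a Selenium server. ProxyError, load_page, process_urls and their parameters are hypothetical names.

```python
import re
import time


class ProxyError(Exception):
    """Raised when the loaded page is still the proxy's 502 error page."""


# Text-based detection, as proposed in the question.
PROXY_502 = re.compile(r"Proxy error: 502")


def load_page(open_url, get_source, url, retries=2, delay=60, sleep=time.sleep):
    """Open `url` and return its HTML, retrying on proxy 502 pages.

    Raises ProxyError if the page is still a 502 after `retries` retries.
    """
    for attempt in range(retries + 1):
        open_url(url)
        html = get_source()
        if not PROXY_502.search(html):
            return html
        if attempt < retries:
            sleep(delay)  # wait before hitting the same URL again
    raise ProxyError(url)


def process_urls(rows, out, open_url, get_source, **retry_opts):
    """Write one entry per CSV row to `out`: the page HTML, or 'error'."""
    for row in rows:
        try:
            html = load_page(open_url, get_source, row[0], **retry_opts)
        except Exception:
            # Catches ProxyError as well as anything open_url itself raises.
            out.write("error\n")
        else:
            out.write(html + "\n")
```

Against the real script, the call would be something like process_urls(csv.reader(f), out, sel.open, sel.get_html_source), with out opened in text mode with encoding='utf-8' under Python 3 instead of encoding the HTML by hand.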

1 Answer

Editorial Team · Answer 1 · 2026-05-12T21:27:53+00:00

I think the alternative you propose is OK. Rather than relying on get_html_source, though, you can use the captureNetworkTraffic command to read the HTTP status from the response headers. That would be safer, because the text of the 502 page can change.
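A status-code check could look roughly like the sketch below (Python 3). How the traffic log is obtained, and in what format, depends on the Selenium RC version, and the Python wrapper has a known bug here, so this sketch assumes the captured traffic has already been decoded into a list of dicts with url and statusCode keys. Those key names, like the helper names status_for_url and is_bad_gateway, are assumptions to check against the real output.

```python
def status_for_url(entries, url):
    """Return the status code of the last captured request for `url`.

    `entries` is assumed to be a list of dicts with 'url' and 'statusCode'
    keys (an assumption about the decoded traffic format). Returns None if
    `url` does not appear in the capture.
    """
    status = None
    for entry in entries:
        if entry.get("url") == url:
            status = entry.get("statusCode")
    return status


def is_bad_gateway(entries, url):
    """True if the last response recorded for `url` was HTTP 502."""
    status = status_for_url(entries, url)
    # int() tolerates the status being recorded as a string such as "502".
    return status is not None and int(status) == 502
```

Keying on the status code rather than on the page text is the point of this suggestion: the wording of the proxy's error page may change, but a Bad Gateway response is still a 502.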

Be careful: there is a bug in the captureNetworkTraffic support of the Selenium Python wrapper, although it can be patched around. See: http://jira.openqa.org/browse/SRC-758
