The only programming language I really know right now is Python, and I am pretty new to JavaScript. I’m trying to write a simple program that goes through a website and gathers information for me.
On the website, there is a long list of links to other pages. If you hover near one, another link will come up on the side that says “Find Dupes” (short for duplicates). I found this in the page source:
<a href="javascript:void(0)" onclick="getDuplictes(1020347166, true)">Find Dupes</a>
So when you click on the javascript link, an iframe will pop up:
</div>
</center>
<div id="ActionDiv" style="position: absolute;z-index: 400; width:400; display:none">
<iframe id="ActionFrame" src="" style="width:400;height:400" scrolling="no" frameborder="0" ></iframe>
</div>
<div id="DuplicatesDiv" style="position: absolute;z-index: 200; width:600; display:none">
<iframe id="DuplicatesFrame" src="" style="width:600;height:400" scrolling="auto" frameborder="0" ></iframe>
</div>
<script>
function getDuplictes(placeId, findInLoca, feedId){
if(isUndefined(feedId)){
feedId = 0;
}
if(isUndefined(findInLoca)){
duplicatesUrl = "/places/duplicates.jsp?inPID=" + placeId + "&inFeedID=" + feedId;
}else{
duplicatesUrl = "/places/duplicates.jsp?inPID=" + placeId + "&inFindInLoca=" + findInLoca + "&inFeedID=" + feedId;
}
showFrameDiv( duplicatesUrl, "DuplicatesFrame", "DuplicatesDiv", "LocaBlur")
}
</script>
And the information will be different each time, based on which link you click.
What I want to do is get the information displayed in the iframe in an organized, readable form without actually opening and using a web browser. I want to be able to look at the content of one iframe, decide whether I need it, and click next to see the next one. There are about 100 of these “Find Dupes” iframes per page and maybe 50 pages. My main problem is how to get the content of a specific iframe using Python or JavaScript or something (I’m clueless with JavaScript…)
Thanks.
With Python you could use PyQuery to get the onclick attribute of each anchor tag, parse that with a regular expression to get the placeId, build the /places/duplicates.jsp?inPID= URL yourself, use requests to load the content at that URL, then use PyQuery again on the content to get the data you need.
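Here is a rough sketch of those steps, not a tested solution for your site. To keep it short, it runs the regular expression over the raw page source instead of going through PyQuery first. BASE_URL is a placeholder I made up, and the function names (duplicates_url, extract_duplicates_urls, fetch_duplicates) are just for illustration. The URL building mirrors what the getDuplictes function in your page source appears to do.

```python
import re
from urllib.parse import urljoin

# Hypothetical base URL: replace it with the real site's address.
BASE_URL = "https://example.com"

# Matches calls such as getDuplictes(1020347166, true) in the page source.
# The misspelled function name is copied from the page as-is.
ONCLICK_RE = re.compile(
    r"getDuplictes\(\s*(\d+)\s*(?:,\s*(true|false)\s*)?(?:,\s*(\d+)\s*)?\)"
)


def duplicates_url(place_id, find_in_loca=None, feed_id="0"):
    """Build the iframe URL the same way the page's JavaScript does."""
    path = "/places/duplicates.jsp?inPID=" + place_id
    if find_in_loca is not None:
        path += "&inFindInLoca=" + find_in_loca
    path += "&inFeedID=" + feed_id
    return urljoin(BASE_URL, path)


def extract_duplicates_urls(page_html):
    """Return one duplicates URL per "Find Dupes" link found in the HTML."""
    urls = []
    for place_id, find_in_loca, feed_id in ONCLICK_RE.findall(page_html):
        # findall yields "" for optional groups that did not match.
        urls.append(duplicates_url(place_id, find_in_loca or None, feed_id or "0"))
    return urls


def fetch_duplicates(url, session=None):
    """Download the HTML that the iframe would have displayed."""
    # Imported here so the helpers above work without the third-party package.
    import requests

    http = session or requests.Session()
    response = http.get(url, timeout=30)
    response.raise_for_status()
    return response.text
```

You would call extract_duplicates_urls on the source of each list page, then fetch_duplicates on each URL and pick the fields you need out of the returned HTML (with PyQuery, for example). If the site requires a login, you will probably need to pass in a requests.Session that already carries the right cookies.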