I’ve been trying to get either an <object> or an <embed> tag using:
HtmlNode videoObjectNode = doc.DocumentNode.SelectSingleNode("//object");
HtmlNode videoEmbedNode = doc.DocumentNode.SelectSingleNode("//embed");
This doesn’t seem to work.
Can anyone please tell me how to get these tags and their InnerHtml?
A YouTube embedded video looks like this:
<embed height="385" width="640" type="application/x-shockwave-flash"
src="http://s.ytimg.com/yt/swf/watch-vfl184368.swf" id="movie_player" flashvars="..."
allowscriptaccess="always" allowfullscreen="true" bgcolor="#000000">
I’ve got a feeling the JavaScript might stop the SWF player from working; I hope not.
Cheers
Update 2010-08-26 (in response to OP’s comment):
I think you’re thinking about it the wrong way, Alex. Suppose I wrote some C# code that looked like this:
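A minimal stand-in for that kind of snippet might look like the following. Only the variable name codeBlock comes from the discussion; the contents of the literal and the class name are made up for illustration.

```csharp
using System;

class StringLiteralDemo
{
    static void Main()
    {
        // The text between the quotes looks like C#, but the compiler
        // treats it purely as character data assigned to codeBlock.
        string codeBlock = "int x = 5; Console.WriteLine(x);";

        // Nothing inside the literal runs; this prints the characters as-is.
        Console.WriteLine(codeBlock);
    }
}
```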
Now, if I wrote a C# parser, should it recognize the contents of the string literal above as C# code and highlight it (or whatever) as such? No, because in the context of a well-formed C# file, that text represents a string to which the codeBlock variable is being assigned.
Similarly, in the HTML on YouTube’s pages, the <object> and <embed> elements are not really elements at all in the context of the current HTML document. They are the contents of string values residing within JavaScript code.
In fact, if HtmlAgilityPack did ignore this fact and attempted to recognize all portions of text that could be HTML, it still wouldn’t succeed with these elements because, being inside JavaScript, they’re heavily escaped with \ characters (notice the precarious Unescape method in the code I posted to get around this issue).
I’m not saying my hacky solution below is the right way to approach this problem; I’m just explaining why obtaining these elements isn’t as straightforward as grabbing them with HtmlAgilityPack.
YouTubeScraper
OK, Alex: you asked for it, so here it is. Some truly hacky code to extract your precious <object> and <embed> elements out from that sea of JavaScript.
And in case you’re interested, here’s a little demo I threw together (super fancy, I know):
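Setting the demo aside, the escaping problem described above can be illustrated with a small sketch. This is not the Unescape method from the scraper; it is a simplified stand-in with made-up names (JsStringSketch, Unescape) that assumes only backslash-escaped quotes and slashes need handling.

```csharp
using System;
using System.Text.RegularExpressions;

static class JsStringSketch
{
    // Hypothetical helper: drops the backslash in front of every escaped
    // character (\" becomes ", \/ becomes /). It does not interpret sequences
    // such as \n or \uXXXX (a \n would simply become n), so it is only
    // adequate for simple cases like the one below.
    public static string Unescape(string escaped)
    {
        return Regex.Replace(escaped, @"\\(.)", "$1");
    }

    static void Main()
    {
        // Escaped markup as it might sit inside a JavaScript string literal.
        string escaped = @"<embed src=\""http:\/\/s.ytimg.com\/yt\/swf\/watch-vfl184368.swf\"" id=\""movie_player\"">";

        Console.WriteLine(Unescape(escaped));
        // prints: <embed src="http://s.ytimg.com/yt/swf/watch-vfl184368.swf" id="movie_player">
    }
}
```

Until the backslashes are removed, an HTML parser sees attribute values that begin with \" rather than ", which is one reason the raw script text does not parse as the markup it appears to contain.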
Original Answer
Why not try using the element’s Id instead?
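Something along these lines, as a rough sketch: it assumes the HtmlAgilityPack package is referenced and that the element is real markup in the parsed document. The class and method names (FindById, OuterHtmlById) are made up for the example.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

static class FindById
{
    // Hypothetical helper: returns the element's outer HTML, or null when no
    // element with that id exists in the parsed tree.
    public static string OuterHtmlById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Note the lowercase "b" in HtmlAgilityPack's GetElementbyId.
        HtmlNode node = doc.GetElementbyId(id);
        return node == null ? null : node.OuterHtml;
    }

    static void Main()
    {
        string html = "<div><embed id=\"movie_player\" "
                    + "src=\"http://s.ytimg.com/yt/swf/watch-vfl184368.swf\"></div>";

        Console.WriteLine(OuterHtmlById(html, "movie_player") ?? "(not found)");
    }
}
```

If the lookup returns null, the element is not part of the document tree that HtmlAgilityPack built, whatever the page source appears to show.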
Update: Oh man, you’re searching for HTML tags that are themselves within JavaScript? That’s definitely why this isn’t working. (They aren’t really tags to be parsed from the perspective of HtmlAgilityPack; all of that JavaScript is really one big string inside a <script> tag.) Maybe there’s some way you can parse the <script> tag’s inner text itself as HTML and go from there.
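One way that idea might be sketched is shown below. It makes several assumptions: the HtmlAgilityPack package is referenced, the markup sits in a double-quoted JavaScript string inside the first <script> element, and stripping backslashes is enough to unescape it. The names ScriptEmbedSketch, FindEmbedInScript, and swfHtml are invented for the example, and real pages will likely need more care.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

static class ScriptEmbedSketch
{
    // Hypothetical helper: looks inside the first <script> for a double-quoted
    // JavaScript string containing "<embed", unescapes it, and parses it as HTML.
    public static HtmlNode FindEmbedInScript(string pageHtml)
    {
        var page = new HtmlDocument();
        page.LoadHtml(pageHtml);

        HtmlNode script = page.DocumentNode.SelectSingleNode("//script");
        if (script == null) return null;

        // A double-quoted JS string literal: escaped pairs or ordinary characters.
        foreach (Match m in Regex.Matches(script.InnerHtml, @"""((?:\\.|[^""\\])*)"""))
        {
            // Strip the backslash from each escaped character (\" -> ", \/ -> /).
            string markup = Regex.Replace(m.Groups[1].Value, @"\\(.)", "$1");
            if (!markup.Contains("<embed")) continue;

            // Parse the recovered text as its own small HTML document.
            var fragment = new HtmlDocument();
            fragment.LoadHtml(markup);
            return fragment.DocumentNode.SelectSingleNode("//embed");
        }
        return null;
    }

    static void Main()
    {
        // Stand-in page: the variable name swfHtml is made up for this example.
        string pageHtml =
            @"<html><body><script>var swfHtml = ""<embed src=\""http:\/\/s.ytimg.com\/yt\/swf\/watch-vfl184368.swf\"" id=\""movie_player\"">"";</script></body></html>";

        HtmlNode embed = FindEmbedInScript(pageHtml);
        Console.WriteLine(embed == null ? "(not found)" : embed.GetAttributeValue("src", ""));
    }
}
```

The second HtmlDocument is the key step: only after the string has been pulled out of the script and unescaped does the <embed> become an element that an XPath query can reach.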