I have an HTML file with inline CSS:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Page 1</TITLE>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<DIV style="position:relative;width:612;height:792;">
<STYLE type="text/css">
.ft0{font-size:108px;font-family:Helvetica;color:#000000;}
.ft1{font-size:16px;font-family:Times;color:#000000; }
</STYLE>
</HEAD>
<BODY bgcolor="#A0A0A0" vlink="blue" link="blue">
<DIV style="position:absolute;top:457;left:225"><nobr><span class="ft0">Sample</span> </nobr></DIV>
<DIV style="position:absolute;top:62;left:241"><nobr><span class="ft1"><b>HTML</b></span></nobr></DIV>
</BODY>
</HTML>
I am trying to parse the inline CSS using Ruby’s css_parser library. Note that the inline CSS defines two classes, .ft0 and .ft1.
My code is:
require 'css_parser'
parser = CssParser::Parser.new
parser.load_file!('filename.html')
puts parser.to_s
Which outputs:
<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<HTML>\n<HEAD> \n<TITLE>Page 1</TITLE>\n<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n<DIV style=\"position:relative;width:612;height:792;\">\n<STYLE type=\"text/css\">\n.ft0 {\nfont-size: 108px; font-family: Helvetica; color: #000000;\n}\n.ft1 {\nfont-size: 16px; font-family: Times; color: #000000;\n}\n"
When I call:
parser.find_by_selector(".ft0")
it returns an empty array.
It appears as though css_parser is seeing the entire string
<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<HTML>\n<HEAD>\n<TITLE>Page 1</TITLE>\n<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n<DIV style=\"position:relative;width:612;height:792;\">\n<STYLE type=\"text/css\">\n.ft0
as the selector instead of just the class .ft0
Is there a way that I can fix this, so that it just finds the class .ft0?
CssParser doesn’t locate CSS inside HTML; it expects a plain style-sheet definition. You need to extract the CSS from the HTML first, then pass it to CssParser.
This might get you started:
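As a minimal sketch of the extraction step, the snippet below pulls the contents of every STYLE element out of an HTML string using only the Ruby standard library. The helper name extract_style_css is hypothetical, and the regular expression is an assumption that holds for simple, well-delimited markup like the file in the question; a real HTML parser such as Nokogiri would be more robust. The resulting string is plain CSS, so it should be possible to hand it to css_parser with parser.add_block!(css) and then call parser.find_by_selector('.ft0').

```ruby
# Hypothetical helper (not part of css_parser): return the text of every
# <style> element found in an HTML string, joined by newlines.
# Assumes the style blocks are simple and not nested inside comments.
def extract_style_css(html)
  html.scan(%r{<style\b[^>]*>(.*?)</style>}mi) # m: dot matches newlines, i: any tag case
      .flatten                                 # one capture group per match
      .map(&:strip)
      .join("\n")
end

# Trimmed-down copy of the page from the question.
html = <<~HTML
  <HTML>
  <HEAD>
  <STYLE type="text/css">
  .ft0{font-size:108px;font-family:Helvetica;color:#000000;}
  .ft1{font-size:16px;font-family:Times;color:#000000; }
  </STYLE>
  </HEAD>
  </HTML>
HTML

css = extract_style_css(html)
puts css
```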
Which outputs: