I want to extract only the attribute1 and attribute3 values. I don’t understand why my charset doesn’t seem to work to “skip” the other attributes (attribute3 is not extracted as I expected):
```rebol
content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}
attribute1: [{attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [{attribute3="} copy valueattribute3 to {"} thru {"}]
spacer: charset reduce [tab newline #" "]
letter: complement spacer
to-space: [some letter | end]
attributes-rule: [(valueattribute1: none valueattribute3: none) [attribute1 | none] any letter [attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]
rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]
parse content rule
```
The output is:

```
>> parse content rule
valueattribute1
none
== true
>>
```
Firstly, you’re not using `parse/all`. In Rebol 2 that means whitespace has effectively been stripped out before the parse runs. That’s not true in Rebol 3: if your parse rules are in block format (as yours are here), then `/all` is implied.

(Note: There seemed to be consensus that Rebol 3 would throw out the non-block form of parse rules in favor of the `split` function for those “minimal” parse scenarios. That would get rid of `/all` entirely. Unfortunately, no action has yet been taken on this.)

Secondly, your code has bugs, which I’m not going to spend time sorting out. (That’s mostly because I think using Rebol’s parse to process XML/HTML is a fairly silly idea :P)
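To make the `/all` point concrete, here is a minimal sketch assuming a Rebol 2 interpreter. It is not from the question. It uses toy strings so you can see how whitespace handling differs with and without the refinement. In Rebol 3, where `/all` is implied for block rules, the first call would behave like the `/all` forms.

```rebol
REBOL [Title: "parse vs parse/all (Rebol 2 sketch)"]

; Without /all, Rebol 2 skips whitespace between matches,
; so the space between "a" and "b" never has to be matched.
probe parse "a b" ["a" "b"]

; With /all every character counts, so the same rule fails (prints false)...
probe parse/all "a b" ["a" "b"]

; ...unless the space is matched explicitly (prints true).
probe parse/all "a b" ["a" " " "b"]
```

The `{<tag }` literals in the question end with a space, and its `spacer`/`letter` charsets are meant to step over whitespace. Both rely on seeing the whitespace in the input, which is why `parse/all` matters here in Rebol 2.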
But don’t forget you have an important tool. If you use a set-word in a parse rule, it captures the current parse position in a variable. You can then print that variable and see where you are. Change the part of `attributes-rule` where you first say `any letter` to `pos: (print pos) any letter`, and you’ll see that the printed position starts with a space. Your rules right before the `any letter` leave you at a space… and since you said `any letter` was OK, matching no letters at all is fine, so everything after that is thrown off.

(Note: Rebol 3 has an even better debugging tool: the word `??`. When you put it in the parse block, it tells you which token/rule you’re currently processing as well as the state of the input. With this tool you can more easily find out what’s going on, though it’s really buggy on R3 Mac Intel right now.)
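The set-word technique can be sketched in isolation, assuming Rebol 2. This example reuses the question’s `spacer` and `letter` definitions. `input-text`, `before`, `after` and `positions` are illustrative names, not part of the original. Two set-words bracket `any letter`, so you can compare the position before and after it runs right after an attribute’s closing quote.

```rebol
REBOL [Title: "Inspecting the parse position with set-words"]

; Same charsets as in the question
spacer: charset reduce [tab newline #" "]
letter: complement spacer

input-text: {attribute1="v1" attribute3="v3"}

parse/all input-text [
    thru {attribute1="} thru {"}   ; consume attribute1 and its closing quote
    before: any letter after:      ; set-words capture the input positions
    (
        positions: reduce [index? before index? after]
        ; prints: before: 16 after: 16
        print ["before:" index? before "after:" index? after]
    )
    to end
]
```

Both positions are 16, the space after `attribute1="v1"`. `any letter` succeeded by matching zero characters, which is exactly the situation described above.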
Additionally, if you’re not using `copy`, then your pattern of `to X thru X` is unnecessary; you can achieve the same thing with just `thru X`. If you do want a copy, you can write it more briefly as `copy Y to X X`, or, if X is just a single character, with the clearer `copy Y to X skip`. In places where you find yourself writing repetitive code, remember that Rebol can go a step above by using `compose` etc.
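Here is one possible way these suggestions could fit together for the original task. It is a sketch assuming Rebol 2 and `parse/all`, and assuming every attribute is written as `name="value"` with no spaces around `=`. The helper `attr-rule` and the names `name-char`, `other-attribute`, `tag-rule`, `value1`, `value3`, `results` and `ok` are hypothetical. `attr-rule` uses `compose` to build each repetitive per-attribute rule, and that rule uses the `copy Y to X skip` form. Unknown attributes are consumed with `thru` rather than `to X thru X`.

```rebol
REBOL [Title: "Extract attribute1 and attribute3 (sketch)"]

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}

; Hypothetical helper: composes [{name="} copy word to {"} skip]
; so each attribute rule does not have to be written out by hand.
attr-rule: func [name [string!] 'word [word!]] [
    compose [(rejoin [name {="}]) copy (word) to {"} skip]
]

spacer: charset reduce [tab newline #" "]
; Attribute-name characters: anything except whitespace, =, > and #"^(22)" (double quote)
name-char: complement charset reduce [tab newline #" " #"=" #">" #"^(22)"]

attribute1: attr-rule "attribute1" value1
attribute3: attr-rule "attribute3" value3
; Any other name="value" pair is skipped without copying
other-attribute: [some name-char {="} thru {"}]

results: copy []
tag-rule: [
    {<tag} (value1: value3: none)   ; reset values for each tag
    any [some spacer | attribute1 | attribute3 | other-attribute]
    {>} (append/only results reduce [value1 value3])
]

ok: parse/all content [any [to {<tag} tag-rule] to end]

; One line per tag; a missing attribute shows as none
foreach pair results [print pair]
```

Because attributes are tried as alternatives inside `any`, their order in the tag no longer matters. Whitespace is matched explicitly by `some spacer` instead of being skipped implicitly.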