I want to extract attribute1 and attribute3 values only. I don’t understand why charset

Question

Asked: May 14, 2026

I want to extract attribute1 and attribute3 values only. I don’t understand why charset doesn’t seem to work in my case to “skip” any other attributes (attribute3 is not extracted as I would like):

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}


attribute1: [{attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [{attribute3="} copy valueattribute3 to {"} thru {"}]

spacer: charset reduce [tab newline #" "]
letter: complement spacer 
to-space: [some letter | end]

attributes-rule: [(valueattribute1: none valueattribute3: none) [attribute1 | none] any letter [attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]

rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]

parse content rule

The output is:

>> parse content rule
valueattribute1
none
== true
>>

1 Answer

Editorial Team · Answer 1 · 2026-05-14T01:02:21+00:00

First, you’re not using parse/all. In Rebol 2, that means whitespace is effectively stripped out before the parse runs. That’s not true in Rebol 3: if your parse rules are in block format (as they are here), then /all is implied.

(Note: There seemed to be consensus that Rebol 3 would throw out the non-block form of parse rules, in favor of the split function for those “minimal” parse scenarios. That would get rid of /all entirely. No action has yet been taken on this, unfortunately.)
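To make the whitespace point concrete, here is a minimal sketch, assuming a Rebol 2 console. The sample string and rules are made up for illustration and are not taken from the question.

```rebol
; Assumes Rebol 2. Without /all, whitespace between matched patterns is skipped.
r1: parse "a b" ["a" "b"]            ; expected to succeed: the space is skipped for you
r2: parse/all "a b" ["a" "b"]        ; expected to fail: no rule accounts for the space
r3: parse/all "a b" ["a" " " "b"]    ; expected to succeed: the space is matched explicitly
print [r1 r2 r3]
```

With /all, every character of the input, spaces included, has to be accounted for by a rule.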

Secondly your code has bugs, which I’m not going to spend time sorting out. (That’s mostly because I think using Rebol’s parse to process XML/HTML is a fairly silly idea :P)

But don’t forget you have an important tool. If you use a set-word in the parse rule, it captures the current parse position into a variable. You can then print it out and see where you are. Change the part of attributes-rule where you first say any letter to pos: (print pos) any letter and you’ll see this:

>> parse/all content rule
 attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>

valueattribute1
none
== true

See the leading space? Your rules right before the any letter put you at a space… and since you said any letter was ok, no letters are fine, and everything’s thrown off.
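The same position-marker trick can be tried on a smaller input. This is only a sketch, assuming Rebol 2; the sample tag and the word pos are illustrative.

```rebol
; Assumes Rebol 2. The set-word pos: stores the current input position.
parse/all {<tag a="1" b="2">} [
    thru {a="} to {"} thru {"}    ; consume the first attribute and its value
    pos: (print mold pos)         ; mold keeps the leading space visible
    to end
]
```

Using mold rather than a plain print wraps the remaining input in delimiters, so a leading space is easier to spot.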

(Note: Rebol 3 has an even better debugging tool…the word ??. When you put it in the parse block it tells you what token/rule you’re currently processing as well as the state of the input. With this tool you can more easily find out what’s going on:

>> parse "hello world" ["hello" ?? space ?? "world"]
space: " world"
"world": "world"
== true

…though it’s really buggy on r3 mac intel right now.)

Additionally, if you’re not using copy, then your pattern of to X thru X is unnecessary; you can achieve the same thing with just thru X. If you do want a copy, you can write the briefer copy Y to X X, or, if X is just a single character, the clearer copy Y to X skip.
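A short sketch of the two forms side by side, assuming Rebol 2. The sample string and the words a and b are hypothetical, chosen only for this comparison.

```rebol
; Assumes Rebol 2.
sample: {<tag attribute1="value1" attribute2="value2">}

; Long form: move up to the marker, then past it.
parse/all sample [to {attribute1="} thru {attribute1="} copy a to {"} thru {"} to end]

; Short form: thru alone jumps past the marker, and skip steps over the
; single closing quote after the copy.
parse/all sample [thru {attribute1="} copy b to {"} skip to end]

print [a b]
```

Both rules should leave the same text in a and b; the second simply says it with fewer words.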

In places where you find yourself writing repetitive code, remember that Rebol can go a step further and build the rules for you, using compose etc:

>> temp: [thru (rejoin [{attribute} num {="}])
          copy (to-word rejoin [{valueattribute} num]) to {"} thru {"}]

>> num: 1
>> attribute1: compose temp
== [thru {attribute1="} copy valueattribute1 to {"} thru {"}]

>> num: 2
>> attribute2: compose temp
== [thru {attribute2="} copy valueattribute2 to {"} thru {"}]
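For what it's worth, one way to sidestep the attribute-ordering problem is to copy each tag's attribute text first and then look up each attribute independently. The following is only a sketch under stated assumptions, not a fix of the original rules: it assumes Rebol 2, well-formed input, no empty tags, and no attribute whose name ends with another attribute's name. The helper get-attribute and the word attrs are hypothetical names introduced here.

```rebol
; Assumes Rebol 2.
content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}

; Hypothetical helper: returns the value of attribute `name` found in the
; string `attrs`, or none when the attribute is absent.
get-attribute: func [attrs [string!] name [string!] /local pattern val] [
    pattern: rejoin [name {="}]
    parse/all attrs [thru pattern copy val to {"} to end]
    val
]

parse/all content [
    any [
        ; grab everything between "<tag " and the closing ">"
        thru {<tag } copy attrs to {>} skip
        (print [get-attribute attrs "attribute1" get-attribute attrs "attribute3"])
    ]
    to end
]
```

Because each lookup starts again from the beginning of the copied attribute text, the order of the attributes inside a tag no longer matters, and a missing attribute simply comes back as none.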
