I ran into ss64.com which provides good help regarding how to write batch scripts that the Windows Command Interpreter will run.
However, I have been unable to find a good explanation of the grammar of batch scripts, how things expand or do not expand, and how to escape things.
Here are sample questions that I have not been able to solve:
- How is the quote system managed? I made a TinyPerl script
(foreach $i (@ARGV) { print '*' . $i ; }), compiled it and called it this way:
  - my_script.exe "a ""b"" c" → output is *a "b*c
  - my_script.exe """a b c""" → output is *"a*b*c"
- How does the internal echo command work? What is expanded inside that command?
- Why do I have to use for [...] %%I in file scripts, but for [...] %I in interactive sessions?
- What are the escape characters, and in what context? How to escape a percent sign? For example, how can I echo %PROCESSOR_ARCHITECTURE% literally? I found that echo.exe %""PROCESSOR_ARCHITECTURE% works, is there a better solution?
- How do pairs of % match? Example:
  - set b=a, echo %a %b% c% → %a a c%
  - set a =b, echo %a %b% c% → bb% c%
- How do I ensure a variable passes to a command as a single argument if ever this variable contains double quotes?
- How are variables stored when using the set command? For example, if I do set a=a" b and then echo.%a% I obtain a" b. If I however use echo.exe from the UnxUtils, I get a b. How come %a% expands in a different way?
We performed experiments to investigate the grammar of batch scripts. We also investigated differences between batch and command line mode.
Batch Line Parser:
Here is a brief overview of phases in the batch file line parser:
Phase 0) Read Line:
Phase 1) Percent Expansion:
Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes.
Phase 3) Echo the parsed command(s) Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.
Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.
Phase 5) Delayed Expansion: Only if delayed expansion is enabled
Phase 5.3) Pipe processing: Only if commands are on either side of a pipe
Phase 5.5) Execute Redirection:
Phase 6) CALL processing/Caret doubling: Only if the command token is CALL
Phase 7) Execute: The command is executed
Here are details for each phase:
Note that the phases described below are only a model of how the batch parser works. The actual cmd.exe internals may not reflect these phases. But this model is effective at predicting behavior of batch scripts.
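Phase 0) Read Line: Read line of input through first <LF>.
- When reading a line to be parsed as a command, <Ctrl-Z> (0x1A) is read as <LF> (LineFeed 0x0A)
- When GOTO or CALL reads lines while scanning for a :label, <Ctrl-Z> is treated as itself - it is not converted to <LF>
Phase 1) Percent Expansion: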
- A double %% is replaced by a single %
- Expansion of arguments (%*, %1, %2, etc.)
- Expansion of %var%, if var does not exist replace it with nothing
- Line is truncated at first <LF> not within %var% expansion
Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes. What follows is an approximation of this process.
There are concepts that are important throughout this phase.
- Tokens are separated by token delimiters. The standard token delimiters are <space> <tab> ; , = <0x0B> <0x0C> and <0xFF>
  Consecutive token delimiters are treated as one - there are no empty tokens between token delimiters
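The delimiter rule above can be illustrated with a small Python sketch. This is only a model of the stated rule, not cmd.exe code: the name split_tokens is made up for illustration, the input is treated as raw bytes so that <0xFF> stays a single byte, and quotes and carets (covered by the rules that follow) are deliberately ignored.

```python
# Standard token delimiters as listed in the text:
# <space> <tab> ; , = <0x0B> <0x0C> <0xFF>
TOKEN_DELIMITERS = b" \t;,=\x0b\x0c\xff"


def split_tokens(raw: bytes) -> list:
    """Split raw bytes on token delimiters (hypothetical helper).

    Runs of consecutive delimiters count as one, so no empty tokens
    are ever produced. Quotes and carets are NOT handled here.
    """
    tokens = []
    current = bytearray()
    for byte in raw:
        if byte in TOKEN_DELIMITERS:
            # A delimiter closes the current token, if there is one.
            if current:
                tokens.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        tokens.append(bytes(current))
    return tokens


print(split_tokens(b"copy  a.txt,;=b.txt"))
```

Under this model, the run `,;=` between the two file names behaves exactly like a single space.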
The following characters may have special meaning in this phase, depending on context: <CR> ^ ( @ & | < > <LF> <space> <tab> ; , = <0x0B> <0x0C> <0xFF>
Look at each character from left to right:
- If it is a <CR>, then remove it, as if it were never there (except for weird redirection behavior)
- If it is a caret (^), the next character is escaped, and the escaping caret is removed. Escaped characters lose all special meaning (except for <LF>).
- If it is a quote ("), toggle the quote flag. If the quote flag is active, then only " and <LF> are special. All other characters lose their special meaning until the next quote toggles the quote flag off. It is not possible to escape the closing quote. All quoted characters are always within the same token.
- <LF> always turns off the quote flag. Other behaviors vary depending on context, but quotes never alter the behavior of <LF>.
  - Escaped <LF>
    - <LF> is stripped
    - The next character is escaped. If the next character is <LF>, then it is treated as a literal, meaning this process is not recursive.
  - Unescaped <LF> not within parentheses
    - <LF> is stripped and parsing of the current line is terminated.
  - Unescaped <LF> within a FOR IN parenthesized block
    - <LF> is converted into a <space>
  - Unescaped <LF> within a parenthesized command block
    - <LF> is converted into <LF><space>, and the <space> is treated as part of the next line of the command block.
- If it is one of the special characters & | < or >, split the line at this point in order to handle pipes, command concatenation, and redirection.
  - In the case of a pipe (|), each side is a separate command (or command block) that gets special handling in phase 5.3
  - In the case of &, &&, or || command concatenation, each side of the concatenation is treated as a separate command.
  - In the case of <, <<, >, or >> redirection, the redirection clause is parsed, temporarily removed, and then appended to the end of the current command. A redirection clause consists of an optional file handle digit, the redirection operator, and the redirection destination token.
- If the very first token for this command begins with @, then the @ has special meaning. (@ is not special in any other context)
  - The special @ is removed.
  - If the @ is before an opening (, then the entire parenthesized block is excluded from the phase 3 echo.
- Process parentheses (they provide for compound statements across multiple lines):
  - If the parser is not looking for a command token, then ( is not special.
  - If the parser is looking for a command token and finds (, then start a new compound statement and increment the parenthesis counter
  - If the parenthesis counter is > 0, then ) terminates the compound statement and decrements the parenthesis counter.
  - If the parenthesis counter is 0 and the parser is looking for a command, then ) functions similar to a REM statement as long as it is immediately followed by a token delimiter, special character, newline, or end-of-file
    - All special characters lose their meaning except ^ (line concatenation is possible)
- Each command is parsed into a series of tokens. The first token is always treated as a command token (after special @ have been stripped and redirection moved to the end).
  - When parsing the command token, ( functions as a command token delimiter, in addition to the standard token delimiters
- The IN parenthesized clause of a FOR command treats <LF> as <space>. After the IN clause is parsed, all tokens are concatenated together to form a single token.
- For REM, if there is only one argument token that ends with an unescaped ^ that ends the line, then the argument token is thrown away, and the subsequent line is parsed and appended to the REM. This repeats until there is more than one token, or the last character is not ^.
- If the command token begins with :, and this is the first round of phase 2 (not a restart due to CALL in phase 6) then
  - The token is normally treated as an Unexecuted Label.
    - The remainder of the line is parsed, however ), <, >, & and | no longer have special meaning. The entire remainder of the line is considered to be part of the label "command".
    - The ^ continues to be special, meaning that line continuation can be used to append the subsequent line to the label.
    - ( no longer has special meaning for the first command that follows the Unexecuted Label.
  - A label can instead be treated as an Executed Label that continues parsing through phase 7, for example when there is redirection that precedes the label token, and there is a | pipe or &, &&, or || command concatenation on the line.
Phase 3) Echo the parsed command(s) Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.
Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.
- At this point, phase 1 of batch processing will have already converted a FOR variable like %%X into %X. The command line has different percent expansion rules for phase 1. This is the reason that command lines use %X but batch files use %%X for FOR variables.
- FOR variable names are case sensitive, but ~modifiers are not case sensitive.
- ~modifiers take precedence over variable names. If a character following ~ is both a modifier and a valid FOR variable name, and there exists a subsequent character that is an active FOR variable name, then the character is interpreted as a modifier.
---- From this point onward, each command identified in phase 2 is processed separately.
---- Phases 5 through 7 are completed for one command before moving on to the next.
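The %%X versus %X distinction noted under phase 4 follows from the phase 1 percent rules. As a rough illustration, here is a small Python model of phase 1 in both modes. It is a sketch under stated assumptions, not the real algorithm: the function name expand_percent and its parameters are invented for this example, modifiers such as %~1 and substring or replace syntax like %var:~0,2% are not handled, and the batch-mode removal of a lone unpaired % is an assumption that is not spelled out in the text above. The command line differences it models are the ones listed later in the Command Line Parser section.

```python
def expand_percent(line, variables, args=(), batch=True):
    """Simplified model of phase 1 percent expansion (hypothetical helper).

    variables: dict of environment variable names to values
    args:      script arguments, args[0] being the script itself (batch only)
    batch:     True for batch file rules, False for command line rules
    """
    # Environment variable names are matched case-insensitively.
    env = {name.upper(): value for name, value in variables.items()}
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        nxt = line[i + 1] if i + 1 < len(line) else ""
        if batch and nxt == "%":
            # Batch mode: a double %% is replaced by a single %.
            out.append("%")
            i += 2
            continue
        if batch and nxt == "*":
            # Batch mode: %* expands to all arguments after the script name.
            out.append(" ".join(args[1:]))
            i += 2
            continue
        if batch and len(nxt) == 1 and nxt in "0123456789":
            # Batch mode: %0 .. %9 expand to arguments (empty if missing).
            n = int(nxt)
            out.append(args[n] if n < len(args) else "")
            i += 2
            continue
        end = line.find("%", i + 1)
        name = line[i + 1:end].upper() if end != -1 else None
        if name is not None and name in env:
            out.append(env[name])
            i = end + 1
        elif batch:
            # Batch mode: an undefined %var% expands to nothing.
            # Assumption: a lone unpaired % is dropped as well.
            i = end + 1 if end != -1 else i + 1
        else:
            # Command line mode: the % is kept and scanning resumes
            # at the next character, so undefined %var% is unchanged.
            out.append("%")
            i += 1
    return "".join(out)


# Batch mode turns %%I into %I, ready for phase 4.
print(expand_percent("for %%I in (a b) do echo %%I", {}))
# Command line mode leaves %I alone.
print(expand_percent("for %I in (a b) do echo %I", {}, batch=False))
```

Run in command line mode, this model also reproduces the two "pairs of %" results quoted in the question (%a a c% and bb% c%).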
Phase 5) Delayed Expansion: Only if delayed expansion is on, the command is not in a parenthesized block on either side of a pipe, and the command is not a "naked" batch script (script name without parentheses, CALL, command concatenation, or pipe).
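- For each parsed token, first check if it contains any !. If not, then the token is not parsed - important for ^ characters. If the token does contain !, then scan each character from left to right:
  - If it is a caret (^) the next character has no special meaning, the caret itself is removed
  - If it is an exclamation mark, search for the next exclamation mark (carets are not observed anymore), expand to the value of the variable.
    - Consecutive opening ! are collapsed into a single !
    - Any remaining unpaired ! is removed
  - Expanding vars at this stage is "safe", because special characters are not detected anymore (even <CR> or <LF>)
  - For a more complete explanation, see the "Exclamation Point Phase" answer in the same thread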
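The per-token scan for delayed expansion can be modeled in a few lines of Python. This is a sketch under assumptions rather than the actual implementation: delayed_expand is a made-up name, it models batch mode only (an undefined !var! is assumed to expand to nothing, matching the batch rule for %var%), it assumes the text between a pair of ! is looked up as a variable name, and substring or replace syntax inside !var! is not handled.

```python
def delayed_expand(token, variables):
    """Simplified model of phase 5 for one token (hypothetical helper)."""
    if "!" not in token:
        # Tokens without any ! are not parsed at all, so carets survive.
        return token
    # Environment variable names are matched case-insensitively.
    env = {name.upper(): value for name, value in variables.items()}
    out = []
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "^":
            # The caret is removed and the next character is taken literally.
            if i + 1 < len(token):
                out.append(token[i + 1])
            i += 2
        elif ch == "!":
            # Consecutive opening ! are collapsed into a single one.
            while i + 1 < len(token) and token[i + 1] == "!":
                i += 1
            end = token.find("!", i + 1)
            if end == -1:
                # A remaining unpaired ! is removed.
                i += 1
            else:
                # Carets between the two ! are not treated specially.
                out.append(env.get(token[i + 1:end].upper(), ""))
                i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(delayed_expand("a^b", {}))                   # no !, token is untouched
print(delayed_expand("!var! ^!", {"var": "x"}))    # caret protects the last !
```

The first call shows why the initial check for ! matters: without an exclamation mark anywhere in the token, the caret is left in place.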
Phase 5.3) Pipe processing: Only if commands are on either side of a pipe
Each side of the pipe is processed independently and asynchronously.
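- If the command is internal to cmd.exe, or it is a batch file, or if it is a parenthesized command block, then it is executed in a new cmd.exe thread via %comspec% /S /D /c" commandBlock", so the command block gets a phase restart, but this time in command line mode.
  - If a parenthesized command block, then all <LF> with a command before and after are converted to <space>&. Other <LF> are stripped.
Phase 5.5) Execute Redirection: Any redirection that was discovered in phase 2 is now executed.
- If the redirection fails, then the remainder of the command is aborted. Note that failed redirection does not set ERRORLEVEL to 1 unless || is used.
Phase 6) CALL processing/Caret doubling: Only if the command token is CALL, or if the text before the first occurring standard token delimiter is CALL. If CALL is parsed from a larger command token, then the unused portion is prepended to the arguments token before proceeding.
- Scan the arguments token for an unquoted /?. If found anywhere within the tokens, then abort phase 6 and proceed to Phase 7, where the HELP for CALL will be printed.
- Remove the first CALL, so multiple CALLs can be stacked
- Phase 1 changes a bit - Expansion errors in step 1.2 or 1.3 abort the CALL, but the error is not fatal - batch processing continues.
- The CALL is aborted without error if any of the following are detected
  - Newly appearing unquoted, unescaped & or |
  - The resultant command token begins with unquoted, unescaped (
  - The very first token after the removed CALL began with @
- If the resultant command is a seemingly valid IF or FOR, then execution will subsequently fail with an error stating that IF or FOR is not recognized as an internal or external command.
- The CALL is not aborted in this second round of phase 2 if the resultant command token is a label beginning with :.
- If the resultant command token is a batch script or a label that begins with :, then execution of the CALL is fully handled by the remainder of Phase 6.
  Phase 7 is not executed for CALLed scripts or :labels.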
Phase 7) Execute: The command is executed
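- 7.1 - Execute internal command:
  - Break the command token before the first occurrence of + / [ ] <space> <tab> , ; or =
    If the preceding text is an internal command, then remember that command
  - Else break the command token before the first occurrence of . \ or :
    If the preceding text is not an internal command, then goto 7.2
    Else the preceding text may be an internal command. Remember this command.
  - Break the command token before the first occurrence of + / [ ] <space> <tab> , ; or =
    If the preceding text is a path to an existing file, then goto 7.2
    Else execute the remembered internal command.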
  - All internal commands will print help instead of performing their function if /? is detected. Most recognize /? if it appears anywhere in the arguments. But a few commands like ECHO and SET only print help if the first argument token begins with /?.
  - SET has some interesting semantics:
    - If a SET command has a quote before the variable name
      set "name=content" ignored --> value=content
      then the text between the first equal sign and the last quote is used as the content (first equal and last quote excluded). Text after the last quote is ignored. If there is no quote after the equal sign, then the rest of the line is used as content.
    - If a SET command does not have a quote before the name
      set name="content" not ignored --> value="content" not ignored
      then the entire remainder of the line after the equal is used as content, including any and all quotes that may be present.
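The two SET rules above can be expressed as a short Python sketch. It is only a model of the behavior as described here: parse_set is a hypothetical name, the input is the argument text that follows SET, and cases such as SET /A, SET /P, or an argument with no equal sign are not modeled.

```python
def parse_set(args):
    """Return (name, value) for the text following SET (hypothetical helper)."""
    if args.startswith('"'):
        # Quote before the variable name: the content runs from the first
        # equal sign to the last quote, both excluded. Text after the last
        # quote is ignored.
        name, _, rest = args[1:].partition("=")
        last_quote = rest.rfind('"')
        # With no quote after the equal sign, the rest of the line is used.
        value = rest if last_quote == -1 else rest[:last_quote]
        return name, value
    # No quote before the name: everything after the first equal sign is
    # content, including any and all quotes.
    name, _, value = args.partition("=")
    return name, value


print(parse_set('"name=content" ignored'))
print(parse_set('name="content" not ignored'))
```

The quoted form is the one that lets trailing text, including accidental trailing spaces, fall outside the stored value.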
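- 7.2 - Execute volume change:
  - A command token of :: will always result in an error unless SUBST is used to define a volume for ::
    If SUBST is used to define a volume for ::, then the volume will be changed, it will not be treated as a label.
- 7.3 - Execute external command:
  - If in command line mode and the command is not quoted and does not begin with a volume specification, white-space, ,, ;, = or + then break the command token at the first occurrence of <space> , ; or = and prepend the remainder to the argument token(s).
  - If the 2nd character of the command token is a colon, then verify the volume specified by the 1st character can be found.
    If the volume cannot be found, then abort with an error.
  - If in batch mode and the command token begins with :, then goto 7.4
    Note that if the label token begins with ::, then this will not be reached because the preceding step will have aborted with an error unless SUBST is used to define a volume for ::.
  - If in command line mode and the command token begins with :, then goto 7.4
    Note that this is rarely reached because the preceding step will have aborted with an error unless the command token begins with ::, and SUBST is used to define a volume for ::, and the entire command token is a valid path to an external command.
- 7.4 - Ignore a label: Ignore the command and all its arguments if the command token begins with :.
  Rules in 7.2 and 7.3 may prevent a label from reaching this point.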
Command Line Parser:
Works like the BatchLine-Parser, except:
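Phase 1) Percent Expansion:
- No %*, %1 etc. argument expansion
- If var is undefined, then %var% is left unchanged.
- No special handling of %%. If var=content, then %%var%% expands to %content%.
Phase 3) Echo the parsed command(s)
Phase 5) Delayed Expansion: only if DelayedExpansion is enabled
- If var is undefined, then !var! is left unchanged.
Phase 7) Execute Command
- As documented in phase 7, an executed label may result in an error under different scenarios.
  - Batch executed labels can only cause an error if they begin with ::
  - Command line executed labels almost always result in an error
Parsing of integer values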
There are many different contexts where cmd.exe parses integer values from strings, and the rules are inconsistent:
- SET /A
- IF
- %var:~n,m% (variable substring expansion)
- FOR /F "TOKENS=n"
- FOR /F "SKIP=n"
- FOR /L %%A in (n1 n2 n3)
- EXIT [/B] n
Details for these rules may be found at Rules for how CMD.EXE parses numbers
For anyone wishing to improve the cmd.exe parsing rules, there is a discussion topic on the DosTips forum where issues can be reported and suggestions made.
Jan Erik (jeb) – Original author and discoverer of phases
Dave Benham (dbenham) – Much additional content and editing