I wonder whether it is feasible to implement an optimal string generator class meeting the following second thought requirements:
- Generation criteria using regex
- Lexicographical order enumeration
- Count property
- Indexed access
I don’t feel comfortable with regular expressions. I cannot come up with a starting piece of code; I can only think of a naive implementation that uses a TList as a base class and applies a filter (regex) to “brute force” generated strings.
What are the other optimal alternatives?
- Ordering: First by length (shortest first), then lexicographically.
- Specification of the range of characters to be used in the generation: all printable characters, or any combination of [A-Z], [a-z], digits, special symbols, and optionally space (regex?).
- String length bounded by a given Min/Max.
- Search space constrained by bounds: a Start string and an End string, with the possibility of filtering (regex?).
Last Edit
To begin with, I rephrased the header using “regex-like” instead of “regex”.
I am considering revising the first requirement, as it is an open door that may lead to intractable issues.
I need suggestions and help for the correct wording.
Second thought requirements edit done. Still open to suggestions for refinement.
I’d do this by constructing the minimal deterministic finite automaton (DFA) for the language. If you are starting with a regex, this can be done automatically by Thompson’s Construction, followed by the Subset Construction and minimization. See this description for example.
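To make the Subset Construction step a little more concrete, here is a minimal Python sketch. The NFA below is a hypothetical hand-written one for `a?b` (the kind of epsilon-NFA Thompson’s Construction would produce), and the names `EPS`, `eps_closure` and `subset_construction` are my own, not from any library. Minimization is left out.

```python
# Hypothetical epsilon-NFA for the regex a?b.
# State 0 is the start state, state 2 is accepting; EPS marks epsilon moves.
EPS = ""
NFA = {
    0: {"a": {1}, EPS: {1}},
    1: {"b": {2}},
}
NFA_START = 0
NFA_ACCEPT = {2}


def eps_closure(nfa, states):
    """Return every NFA state reachable from `states` via epsilon moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, accept):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = eps_closure(nfa, {start})
    dfa = {}
    todo = [start_set]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue  # this subset has already been expanded
        dfa[current] = {}
        symbols = {c for s in current for c in nfa.get(s, {}) if c != EPS}
        for c in sorted(symbols):
            target = set()
            for s in current:
                target |= nfa.get(s, {}).get(c, set())
            nxt = eps_closure(nfa, target)
            dfa[current][c] = nxt
            todo.append(nxt)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accept = {subset for subset in dfa if subset & accept}
    return dfa, start_set, dfa_accept


DFA, DFA_START, DFA_ACCEPT = subset_construction(NFA, NFA_START, NFA_ACCEPT)
```

Each DFA state is a set of NFA states, so the result can be exponentially larger than the NFA in the worst case; for typical small patterns it stays manageable.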
With a DFA in hand, you can use something like this algorithm:
Note that the step marked ** needs to be true set insertion, as duplicates can easily crop up. This is a core algorithm.
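As a rough illustration of enumerating from a DFA, here is a minimal Python sketch. It uses a simpler scheme than the pseudocode: it keeps one sorted list of (prefix, state) pairs per length, which yields strings shortest first and then lexicographically. The DFA (strings over {a, b} that end in "b") and all function names are hypothetical, chosen only for the example. A counting helper is included because the question asks for a Count property.

```python
# Hypothetical DFA for strings over {a, b} that end in "b".
START = 0
ACCEPTING = {1}
TRANSITIONS = {
    0: {"a": 0, "b": 1},
    1: {"a": 0, "b": 1},
}


def enumerate_shortlex(transitions, start, accepting, min_len, max_len):
    """Yield accepted strings, shortest first, then in lexicographic order."""
    # All prefixes of the current length, kept in sorted order.
    frontier = [("", start)]
    for length in range(max_len + 1):
        if length >= min_len:
            for prefix, state in frontier:
                if state in accepting:
                    yield prefix
        if length == max_len:
            break
        # Extending sorted prefixes with sorted characters keeps the list sorted.
        # A DFA has one path per string, so no duplicates can appear here.
        frontier = [
            (prefix + ch, nxt)
            for prefix, state in frontier
            for ch, nxt in sorted(transitions.get(state, {}).items())
        ]


def count_accepted(transitions, start, accepting, min_len, max_len):
    """Count accepted strings with min_len <= length <= max_len, without listing them."""
    counts = {start: 1}  # state -> number of strings of the current length ending there
    total = 0
    for length in range(max_len + 1):
        if length >= min_len:
            total += sum(n for s, n in counts.items() if s in accepting)
        next_counts = {}
        for s, n in counts.items():
            for t in transitions.get(s, {}).values():
                next_counts[t] = next_counts.get(t, 0) + n
        counts = next_counts
    return total


words = list(enumerate_shortlex(TRANSITIONS, START, ACCEPTING, 1, 3))
total = count_accepted(TRANSITIONS, START, ACCEPTING, 1, 3)
```

With these inputs `words` is `['b', 'ab', 'bb', 'aab', 'abb', 'bab', 'bbb']` and `total` is 7. The frontier is not pruned of dead states here, so memory use grows with the number of live prefixes; the counting helper only needs one integer per DFA state.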
P can grow exponentially with output length, but this is just the price of tracking all possibilities for a future output string. The order/size/space constraints you mentioned can be ensured by maintaining sorted order in the lists L and by cutting off the search when resource limits are reached.

Edit
Here is a toy Java example where I’ve hard-coded the DFA for simple binary floating-point literals with an optional minus sign. This uses a slightly different scheme than the pseudocode above to get strict sorted order of output and to accommodate character ranges.