In my regex, I want to say that within the sample text, any characters are allowed, including a-z in upper and lower case, numbers and special characters.
For example, my regular expression may be checking that a document is HTML. Therefore:
"/\n<html>[]+</html>\n/"
I have tried `[]+`, but it does not seem to like this?
Using `[XXX]+` means any character that’s between `[` and `]`, one or more times.
Here, you didn’t put any character between `[` and `]`, hence the problem.
If you want to say “any possible character”, you can use a `.`
Note : by default, it will not match newlines ; you’ll have to play with [**Pattern Modifiers**][1] if you want it to.
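To make the newline behaviour concrete, here is a minimal PHP sketch. The `$text` sample and the variable names are made up for illustration; the `s` modifier (PCRE_DOTALL) is one of the pattern modifiers that changes what `.` matches.

```php
<?php
// Hypothetical sample text spanning two lines.
$text = "first line\nsecond line";

// Without any modifier, `.` does not match the newline,
// so the match ends at the end of the first line.
preg_match('/.+/', $text, $m);
$withoutS = $m[0];

// With the `s` modifier, `.` also matches newlines,
// so the match covers both lines.
preg_match('/.+/s', $text, $m);
$withS = $m[0];

echo $withoutS, PHP_EOL; // first line
echo ($withS === $text ? 'whole text matched' : 'partial match'), PHP_EOL;
```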
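If you want to say any letter, you can use :

- `[a-z]` : any lower-case letter
- `[A-Z]` : any upper-case letter
- `[a-zA-Z]` : any lower-case or upper-case letter

And, for numbers :

- `[0-9]` : any digit
- `[a-zA-Z0-9]` : any lower-case or upper-case letter, and any number.

At that point, you will probably want to take a look at :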
the `\w` meta-character, which means "any word character".
After that, when you begin using a regex such as `.+` with the `s` modifier, which should match any possible character, including newlines, one or more times, you’ll see that it doesn’t "stop" where you expect it to. That’s because matching is greedy by default: you’ll have to use a `?` after the `+`, or use the `U` modifier ; see the Repetition section of the manual for more information.
Well, actually, the best thing to do would be to *invest* some time carefully reading everything in the [**PCRE Patterns**][4] section of the manual, if you want to start working with regexes 😉
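As a rough illustration of greedy versus lazy matching, the sketch below runs three variants of the same pattern against a made-up string. The `$text` sample and the variable names are assumptions for the example, not something from the question.

```php
<?php
// Hypothetical sample: two <b> elements on one line.
$text = '<b>one</b> and <b>two</b>';

// Greedy (default): .+ takes as much as it can,
// so the match runs up to the LAST </b>.
preg_match('/<b>.+<\/b>/', $text, $m);
$greedy = $m[0];

// Lazy: .+? takes as little as it can,
// so the match ends at the FIRST </b>.
preg_match('/<b>.+?<\/b>/', $text, $m);
$lazy = $m[0];

// The U modifier inverts the default greediness,
// so a plain .+ behaves like .+? here.
preg_match('/<b>.+<\/b>/U', $text, $m);
$ungreedy = $m[0];

echo $greedy, PHP_EOL;   // <b>one</b> and <b>two</b>
echo $lazy, PHP_EOL;     // <b>one</b>
echo $ungreedy, PHP_EOL; // <b>one</b>
```

Using `?` on a single quantifier is usually easier to read than `U`, since `U` silently flips the meaning of every quantifier in the pattern.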
Oh, and, BTW : **using regex to *parse* HTML is a bad idea…**
It’s generally much better to use a DOM parser, such as `DOMDocument`, with `DOMDocument::loadHTML`.
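For what it's worth, a minimal sketch of the DOM approach could look like the following. The `$html` snippet and the variable names are invented for the example. Keep in mind that the underlying parser is forgiving about malformed markup, so a successful load is not proof that the input was valid HTML.

```php
<?php
// Hypothetical HTML snippet to inspect.
$html = '<html><body><p>Hello</p></body></html>';

$doc = new DOMDocument();

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$loaded = $doc->loadHTML($html);
libxml_clear_errors();

// Navigate the parsed tree instead of pattern-matching the raw text.
$htmlCount = $doc->getElementsByTagName('html')->length;
$paragraph = $doc->getElementsByTagName('p')->item(0)->textContent;

echo $htmlCount, PHP_EOL; // 1
echo $paragraph, PHP_EOL; // Hello
```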