I want to write a custom language for access logs in Notepad++.
The problem is that numbers (here: HTTP status codes) won’t be highlighted like real keywords (e.g. GET). Notepad++ only provides a single highlight color for numbers in general.
How do I handle numbers like text?
Sample log file
192.23.0.9 - - [10/Sep/2012:13:46:42 +0200] "GET /js/jquery-ui.custom.min.js HTTP/1.1" 200 206731
192.23.0.9 - - [10/Sep/2012:13:46:43 +0200] "GET /js/onmediaquery.min.js HTTP/1.1" 200 1229
192.23.0.9 - - [10/Sep/2012:13:46:43 +0200] "GET /en/contact HTTP/1.1" 200 12836
192.23.0.9 - - [10/Sep/2012:13:46:44 +0200] "GET /en/imprint HTTP/1.1" 200 17380
192.23.0.9 - - [10/Sep/2012:13:46:46 +0200] "GET /en/nothere HTTP/1.1" 404 2785
Sample custom languages
http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=User_Defined_Language_Files
I also tried editing and importing a predefined language like this:
http://notepad-plus.sourceforge.net/commun/userDefinedLang/Log4Net.xml
I thought the custom language should look like this:
<KeywordLists>
[...]
<Keywords name="Words1">404 501</Keywords>
<Keywords name="Words2">301 303</Keywords>
<Keywords name="Words3">200</Keywords>
</KeywordLists>
<Styles>
<WordsStyle name="DEFAULT" styleID="11" fgColor="000000" bgColor="FFFFFF" colorStyle="0" fontName="Courier New" fontStyle="0"/>
[...]
<WordsStyle name="KEYWORD1" styleID="5" fgColor="FF0000" bgColor="FFFFFF" colorStyle="1" fontName="" fontStyle="0"/>
<WordsStyle name="KEYWORD2" styleID="6" fgColor="0000FF" bgColor="FFFFFF" colorStyle="1" fontName="" fontStyle="1"/>
<WordsStyle name="KEYWORD3" styleID="7" fgColor="00FF00" bgColor="FFFFFF" colorStyle="1" fontName="" fontStyle="0"/>
[...]
<!-- This line causes number highlighting. Deleting it doesn't help either. -->
<WordsStyle name="NUMBER" styleID="4" fgColor="0F7F00" bgColor="FFFFFF" fontName="" fontStyle="0"/>
</Styles>
Unfortunately numbers will be colored in the same color.
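I’d like to color them per keyword list, e.g. 404 and 501 in the KEYWORD1 color, 301 and 303 in the KEYWORD2 color, 200 in the KEYWORD3 color, etc.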
Any suggestions? How to handle the numbers like keywords?
It isn’t possible to highlight numbers as keywords, because the built-in lexers (parsers/language definitions) treat a numeric as a token. The only way to differentiate between a numeric and your keyword would be to parse the whole numeric block and then compare it to the keyword list, in which case it also becomes necessary to parse the delimiters around the numeric block to ensure that .200. doesn’t highlight as 200. This is why your numbers were all highlighted in the same color, namely the ‘number’ color.
While this could be done with a custom lexer using either fixed-position tokens or regex matching, you’ll find the user defined languages (the last I heard) do not have this capability.
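To illustrate what such a custom lexer would have to do, here is a minimal sketch in plain Python (it is not Notepad++ code). It treats a run of digits as a token only when whitespace or a line boundary surrounds it, then compares the token to the keyword lists from the question. The names KEYWORDS, NUMERIC_TOKEN and classify_numbers are made up for this example.

```python
import re

# Keyword lists copied from the question's Words1..Words3 entries.
KEYWORDS = {
    "KEYWORD1": {"404", "501"},
    "KEYWORD2": {"301", "303"},
    "KEYWORD3": {"200"},
}

# A numeric block only counts as a token when it is delimited by
# whitespace (or the start/end of the text) on both sides.
NUMERIC_TOKEN = re.compile(r"(?<!\S)\d+(?!\S)")


def classify_numbers(line):
    """Return (token, style_name) pairs for each standalone number in line."""
    result = []
    for match in NUMERIC_TOKEN.finditer(line):
        token = match.group(0)
        style = "NUMBER"  # fallback: the generic number style
        for name, words in KEYWORDS.items():
            if token in words:
                style = name
                break
        result.append((token, style))
    return result


if __name__ == "__main__":
    print(classify_numbers('"GET /en/nothere HTTP/1.1" 404 2785'))
    print(classify_numbers("192.200.0.9"))
```

The delimiter check is what keeps the 200 inside an IP address such as 192.200.0.9 from being treated as a status code.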
As your request is actually fairly simple, from what I understand, being as general as possible (as requested in your comment)…
We can use the ‘Mark’ feature of the ‘Find’ dialog with that regex, but then everything is marked in the same color, as in your failed experiment.
Perhaps what would be simple and suit your needs would be to use an npp Python Script and the Mark Style settings in the Style Configurator to get the desired result?
Something like this crude macro style:
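A minimal sketch of such a script (not necessarily the original one) could look like the following. It assumes the Python Script plugin exposes `editor.getText()`, `editor.setIndicatorCurrent()` and `editor.indicatorFillRange()`, and that Mark Style 1 to 5 are backed by Scintilla indicators 25 down to 21; check both assumptions against your Notepad++ version. The mapping from status class to Mark Style and the helper name find_status_codes are made up for this example.

```python
import re

# Status code directly after the quoted request, e.g. ..." 404 2785
STATUS_RE = re.compile(r'" (\d{3}) ')

# Hypothetical mapping: first digit of the status code -> Mark Style number.
MARK_STYLE_FOR_CLASS = {"2": 3, "3": 2, "4": 1, "5": 1}

# Assumption: Mark Style 1..5 use Scintilla indicators 25..21.
INDICATOR_FOR_MARK_STYLE = {1: 25, 2: 24, 3: 23, 4: 22, 5: 21}


def find_status_codes(text):
    """Return (start, length, mark_style) for every status code in text."""
    hits = []
    for match in STATUS_RE.finditer(text):
        code = match.group(1)
        style = MARK_STYLE_FOR_CLASS.get(code[0])
        if style is not None:
            hits.append((match.start(1), len(code), style))
    return hits


try:
    # Only importable inside Notepad++ with the Python Script plugin.
    from Npp import editor
except ImportError:
    editor = None

if editor is not None:
    # Offsets are used as document positions, which assumes an ASCII log.
    for start, length, style in find_status_codes(editor.getText()):
        editor.setIndicatorCurrent(INDICATOR_FOR_MARK_STYLE[style])
        editor.indicatorFillRange(start, length)
```

The colors themselves come from the Mark Style 1 to 5 entries in the Style Configurator, so the script only decides which code gets which style.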
To use it, install Python Script with the plugin manager, go to the plugin menu and select New Script, then paste and save. Select the tab for the doc you want to parse and execute the script (once again from the plugin menu).
Obviously you could use all 5 Mark styles for different terms, you could assign the script to a shortcut, and you could get more into the ‘scripting’ -vs- ‘macro’ style of npp Python and make a full-blown script to parse whatever you want… Having a script trigger whenever you select a particular lexer style is doable too.