I have installed Eggdrop on a new Debian server, but it keeps having trouble processing special characters.
Eggdrop is running with UTF-8. I have even manually forced the Tcl encoding to UTF-8 in the script, and I have tried recompiling Eggdrop following the instructions at http://eggwiki.org/Utf-8.
22:00 <@me> !tr fr I have prepared lots of cookies for the entire family.
22:00 <@bot> J'ai préparé beaucoup de biscuits pour toute la famille.
22:00 <@me> !tr ar The special characters are processed.
22:00 <@bot> êêÃE ÃEùçÃDìé çÃDãÃÂñÃA çÃDîçõé.
(Also see a previous question that was never solved: Issues with TCL encoding on Eggdrop)
namespace eval gTranslator {
    # Factor this out into a helper
    proc getJson url {
        set tok [http::geturl $url]
        set res [json::json2dict [http::data $tok]]
        http::cleanup $tok
        return $res
    }

    # How to decode _decimal_ entities; WARNING: high magic factor within!
    proc decodeEntities str {
        set str [string map {\[ {\[} \] {\]} \$ {\$} \\ \\\\} $str]
        subst [regsub -all {&#(\d+);} $str {[format %c \1]}]
    }

    bind pub - !tr gTranslator::translate
    proc translate { nick uhost handle chan text } {
        package require http
        package require json
        set lngto [string tolower [lindex [split $text] 0]]
        # http::formatQuery returns "q=<url-encoded text>", so $text already carries the q= prefix
        set text [http::formatQuery q [join [lrange [split $text] 1 end]]]
        set dturl "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&$text"
        set lng [dict get [getJson $dturl] responseData language]
        if { $lng == $lngto } {
            putserv "PRIVMSG $chan :\002Error\002 translating $lng to $lngto."
            return 0
        }
        set trurl "http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=$lng%7c$lngto&$text"
        putlog $trurl
        set res [getJson $trurl]
        putlog $res
        #putserv "PRIVMSG $chan :Language detected: $lng"
        set translated [decodeEntities [dict get $res responseData translatedText]]
        putserv "PRIVMSG $chan :[encoding convertto utf-8 $translated]"
    }
}
That ugly mess you are seeing is UTF-8 interpreted as ISO 8859-1. It indicates that somewhere there’s a misinterpretation of what characters mean, and can be caused by either getting wires crossed over a communication channel, or by an extra round of encoding being applied. Because there are rather a lot of moving parts involved (IRC client, IRC server, eggdrop, your script, Google translate) it is necessary to talk you through debugging.
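To make the two failure modes concrete, here is a small sketch that runs in a plain tclsh, outside eggdrop. The sample word and the variable names are made up for illustration; only the encoding command is real. It shows that decoding UTF-8 bytes as ISO 8859-1 and applying one extra round of UTF-8 encoding produce the same garbage.

```tcl
# The real characters, as Tcl holds them in memory: "préparé".
set original "pr\u00e9par\u00e9"

# What goes on the wire when the sender encodes as UTF-8.
set wire [encoding convertto utf-8 $original]

# Crossed wires: the receiver decodes those bytes as ISO 8859-1.
set crossed [encoding convertfrom iso8859-1 $wire]

# Extra round: the already-encoded bytes are encoded a second time,
# and the receiver then decodes UTF-8 once.
set doubled [encoding convertfrom utf-8 [encoding convertto utf-8 $wire]]

puts [string length $original]        ;# 7 characters
puts [string length $crossed]         ;# 9 characters: each accented letter became two
puts [expr {$crossed eq $doubled}]    ;# 1: both routes give the same mangled string
```

Because the two routes are indistinguishable from the output alone, you have to check each stage in turn rather than guess from the garbage.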
Tcl and Google communicate correctly with each other (I’ve double-checked the code) so we can eliminate that possibility. The problem is therefore between your IRC client, the IRC server, and eggdrop; if they don’t agree on what the interpretation of the bytes “on the wire” is, you get mangling.
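One way to see which side of that boundary the damage happens on is to log the code points the script holds just before it sends. The helper below is a hypothetical debugging aid (the name codepoints is mine, not part of eggdrop or Tcl). If the logged values fall in the Arabic block (U+0600 to U+06FF), the script has the right characters and the mangling happens later; a run of values around U+00C3 to U+00D9 instead suggests the text was already mangled before it reached that line.

```tcl
# Hypothetical helper: render every character of a string as U+XXXX.
proc codepoints {str} {
    set out {}
    foreach ch [split $str ""] {
        # scan with %c and no variable returns the character's code point
        lappend out [format U+%04X [scan $ch %c]]
    }
    return [join $out " "]
}

puts [codepoints "\u062a\u0645"]   ;# U+062A U+0645 (genuine Arabic letters)
puts [codepoints "\u00c3\u00a9"]   ;# U+00C3 U+00A9 (UTF-8 bytes mistaken for characters)
```

Inside the bot you would call it as putlog [codepoints $translated] right before the final putserv.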
You can add (or remove) mangling in the script through the use of encoding convertto (and encoding convertfrom), but it is necessary to be clear about what you are doing in order to get it right. In memory, Tcl represents strings as sequences of abstract Unicode characters; the way in which they are “written down” in memory is not your business (and in fact varies from time to time in a complex way that’s almost always highly efficient in terms of run-time). If there is general agreement that the IRC server’s channel will be passing through UTF-8, your requirement then is to:
1. Ensure that eggdrop is sending UTF-8 to the channel.
2. Ensure that your client is interpreting the UTF-8 on the channel correctly.
Dealing with the first point, I can’t remember if eggdrop handles encodings automatically for you or not. If it does, you just do this in the final stage of your binding:
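That is, the string goes out untouched. A minimal sketch, assuming the chan and translated variables from your translate proc; the putserv stand-in and the sample values are only there so the snippet also runs in a plain tclsh:

```tcl
# Stand-in for eggdrop's putserv so the sketch runs outside the bot;
# inside eggdrop the real command already exists and this is skipped.
if {[info commands putserv] eq ""} {
    proc putserv {line} { lappend ::sent $line; puts $line }
}

set chan "#test"                      ;# example channel (assumption)
set translated "pr\u00e9par\u00e9"    ;# stands in for the decoded translation

# No explicit conversion: eggdrop is trusted to encode the characters on output.
putserv "PRIVMSG $chan :$translated"
```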
If it does not, you do this:
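Here the script turns the characters into UTF-8 bytes itself. Again a sketch under the same assumptions (variable names from your translate proc, a putserv stand-in for running outside eggdrop):

```tcl
# Stand-in for eggdrop's putserv so the sketch runs outside the bot;
# inside eggdrop the real command already exists and this is skipped.
if {[info commands putserv] eq ""} {
    proc putserv {line} { lappend ::sent $line; puts $line }
}

set chan "#test"                      ;# example channel (assumption)
set translated "pr\u00e9par\u00e9"    ;# stands in for the decoded translation

# Explicit conversion: encode to UTF-8 bytes before handing the line to eggdrop.
putserv "PRIVMSG $chan :[encoding convertto utf-8 $translated]"
```

This is the form the script in the question already uses, so if the output is mangled with it, the plain form is the one to test next.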
Experiment. Use the right one.
On the second point (the client), explore its settings and get it right. Be aware that there can be additional problems if the client is running in a situation where it cannot display all Unicode characters correctly (a common problem if running in a terminal). There’s nothing that your eggdrop script can do to fix that.