I am running a Hive query that uses get_json_object to read JSON strings from files in HDFS,
and I ran into some strange behavior.
If the JSON is as follows:
{"data":{"oneSlash":"aaa\bbb","twoSlashes":"ccc\\ddd","threeSlashes":"eee\\\fff"}}
The result of the query is:
{"oneSlash":"aaabbb","twoSlashes":"ccc\\ddd","threeSlashes":"eee\\fff"}
I understand the ‘oneSlash’ and ‘threeSlashes’ results, but why is ‘twoSlashes’ not equal to “ccc\ddd”?
After all, ‘\\’ should be unescaped to ‘\’.
BTW, the query is:
SELECT get_json_object(escaping_test.data, '$.data') FROM escaping_test
It’s because \b and \f are valid JSON escape sequences, whereas \d is not. There’s a post that covers this in more detail: “Where can I find a list of escape characters required for my JSON ajax return type?”
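To see how the escape rules apply to the strings from the question, here is a minimal sketch using Python's standard json module. This is not Hive's implementation, and get_json_object's parser may be more lenient than Python's, so treat it only as an illustration of the JSON rules. The variable names (raw, parsed, roundtrip, invalid_rejected) are made up for this example.

```python
import json

# The inner object from the question, exactly as it sits in the file.
# The r'' prefix keeps every backslash literal on the Python side.
raw = r'{"oneSlash":"aaa\bbb","twoSlashes":"ccc\\ddd","threeSlashes":"eee\\\fff"}'

parsed = json.loads(raw)

# \b is the JSON escape for backspace (U+0008), so this is "aaa" + backspace + "bb".
print(repr(parsed["oneSlash"]))

# \\ is the JSON escape for one backslash, so the decoded value holds a single "\".
print(len(parsed["twoSlashes"]))

# \\ then \f: one backslash followed by a form feed (U+000C), then "ff".
print(repr(parsed["threeSlashes"]))

# Serializing the object back to JSON text escapes the backslash again,
# so "twoSlashes" is written as ccc\\ddd even though the value is ccc\ddd.
roundtrip = json.dumps(parsed, separators=(",", ":"))
print(roundtrip)

# \d is not a defined JSON escape, so a strict parser rejects it.
try:
    json.loads(r'"ccc\ddd"')
    invalid_rejected = False
except json.JSONDecodeError:
    invalid_rejected = True
print(invalid_rejected)
```

In this sketch the decoded value of twoSlashes does contain a single backslash. The doubled backslash only reappears when the whole object is written back out as JSON text, which is also what selecting '$.data' returns (an object rather than a plain string).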