I am trying to extract some part of string and store it to hbase in columns.
Files Content :
msgType1 Person xyz has opened Internet:www.google.com from IP:192.123.123.123 for duration 00:15:00
msgType2 Person xyz denied for opening Internet:202.x.x.x from IP:192.123.123.123 reason:unautheticated
msgType1 Person xyz has opened Internet:202.x.x.x from IP:192.123.123.123 for duration 00:15:00
pattern of messages corresponding to msgType is fixed. Now i am trying to store person name, destination , source , duration etc in hbase.
I am trying to to wrtie script in PIG to do this task.
But i am stuck at extracting part.(extracting IP or website name from ‘Internet:202.x.x.x’ token inside string).
I tried Regular expression but its not working for me. Regex alway throw this error :
ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.REGEX_EXTRACT as multiple or none of them fit. Please use an explicit cast.
is there any other way to extract these value and store it to hbase in PIG or other than PIG?
How do you use the REGEX_EXTRACT function ? Have you seen the REGEX_EXTRACT_ALL function ? According to the documentation (http://pig.apache.org/docs/r0.9.2/func.html#regex-extract-all), it should be like this :
My file is like that :