I have a Flume consolidator which writes every entry on a S3 bucket on AWS.
The problem is with the directory path.
The events are supposed to be written on /flume/events/%y-%m-%d/%H%M, but they’re on //flume/events/%y-%m-%d/%H%M.
It seems that Flume is appending one more “/” at the beginning.
Any ideas for this issue? Is that a problem with my path configuration?
master.sources = source1
master.sinks = sink1
master.channels = channel1
master.sources.source1.type = netcat
# master.sources.source1.type = avro
master.sources.source1.bind = 0.0.0.0
master.sources.source1.port = 4555
master.sources.source1.interceptors = inter1
master.sources.source1.interceptors.inter1.type = timestamp
master.sinks.sink1.type = hdfs
master.sinks.sink1.hdfs.path = s3://KEY:SECRET@BUCKET/flume/events/%y-%m-%d/%H%M
master.sinks.sink1.hdfs.filePrefix = event
master.sinks.sink1.hdfs.round = true
master.sinks.sink1.hdfs.roundValue = 5
master.sinks.sink1.hdfs.roundUnit = minute
master.channels.channel1.type = memory
master.channels.channel1.capacity = 1000
master.channels.channel1.transactionCapactiy = 100
master.sources.source1.channels = channel1
master.sinks.sink1.channel = channel1
The Flume NG HDFS sink doesn’t implement anything special for S3 support. Hadoop has some built-in support for S3, but I don’t know of anyone actively working on it. From what I have heard, it is somewhat out of date and may have some durability issues under failure.
That said, I know of people using it because it’s “good enough”.
Are you saying that “//xyz” (with multiple adjacent slashes) is a valid path name on S3? As you probably know, most Unixes collapse adjacent slashes.