Flume must use the Hadoop native libraries when uploading gz files
The Problem
Recently, one of my projects required uploading real-time log records to a Hadoop cluster, and I chose the open source tool Flume for the job. After I installed Flume, the log records were transferred to the Hadoop cluster successfully as files with a .gz suffix. But I found that the gz file was larger than its decompressed counterpart (6571 bytes versus 942 bytes):
-rw-r--r-- 1 root root 942 Dec 27 17:28 ngaancache-access.log.2016122321.1482498035352
-rw-r--r-- 1 root root 6571 Dec 27 17:32 ngaancache-access.log.2016122321.1482498035352.gz
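A compressed file that is larger than its own content can be inspected by decompressing only the first gzip member and counting any bytes left over after it. The following is a minimal Python sketch of that check, not part of the original Flume setup; the function name inspect_gzip and the sample data are hypothetical and chosen only for illustration.

```python
import gzip
import zlib


def inspect_gzip(data: bytes) -> dict:
    """Decompress the first gzip member and report sizes and leftover bytes."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=31)
    payload = decomp.decompress(data)
    return {
        "compressed_size": len(data),
        "decompressed_size": len(payload),
        # Bytes that follow the end of the first gzip member, if any.
        "trailing_bytes": len(decomp.unused_data),
    }


if __name__ == "__main__":
    # Made-up sample data standing in for a real log file.
    sample = gzip.compress(b"GET /index.html 200\n" * 50)
    print(inspect_gzip(sample))
    # The same stream with extra bytes appended after the gzip trailer.
    print(inspect_gzip(sample + b"\x00" * 4096))
```

To check a real file, read it in binary mode and pass the bytes to inspect_gzip; a non-zero trailing_bytes value means the file holds data beyond the first gzip stream.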
When I used the gzip command to decompress this file, it reported the warning "trailing garbage ignored", as follows:
#gzip ...