Some tips for configuring Flume properties

The Tips

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. I have installed many Flume systems to collect streaming log data, but I ran into some problems while using Flume. I wrote this post to record the problems and their solutions so that others can avoid them.

The running environment

  • CDH version 5.8.0+
  • Flume 1.6.0+
  • Java 1.7.0+
  • Linux 2.6.32-573.el6.x86_64
  • CentOS 6.6+

Tip one: rotation not taking effect

With the following configuration, Flume uploads and rotates files to Hadoop ...


Why does the Linux du command print no result and occupy 100% CPU

The Problem

Today a colleague ran into a strange problem and asked me for the cause. I recorded the analysis steps below.

The problem is that the Linux du command prints no results while the du process uses 100% CPU. The output of the top command is as follows:

# top
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                    
36343 root      20   0  110m  13m 1780 R 99.8  0.0   1:37.66 du                                                                                                         
36370 root      20   0  110m  13m 1780 R 94.2  0.0   1:20.98 du
   86 root ...


How to upgrade the Tomcat version used by the CDH HttpFS service

The Problem

Tomcat released a patch that fixes CVE-2016-8745. In my CDH cluster, the HttpFS service provides web HTTP access and runs on Tomcat 6.0.44. We must upgrade Tomcat from 6.0.44 to 6.0.50+ to guard against attacks.

The CDH environment

  • CDH version 5.7.0+
  • Java 1.7.0+
  • Linux 2.6.32-573.el6.x86_64
  • CentOS 6.6+

The upgrade steps

Download the newest version of tomcat

Tomcat version 6.0.53 is available for download. After downloading it, I extract the gz package:

tar xvfz apache-tomcat-6.0 ...


How to limit the virtual memory usage of the Flume process in CentOS 6.x

The Problem

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. I have installed many Flume systems to collect streaming log data, but I found that the Flume process was occupying a large amount of virtual memory and comparatively little physical memory. Why does this happen? This post explains the cause and provides methods to solve the problem.
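The gap between virtual and physical memory can be checked for any process by reading its /proc status file. The following is only a minimal sketch, not the method used in this post: it inspects the current shell (`$$`) as a stand-in, and the variable names are hypothetical. To look at a real Flume agent, `pid` would be set to that process ID instead.

```shell
# Stand-in PID: the current shell. Replace with the Flume PID (assumption).
pid=$$

# VmSize is the virtual memory size, VmRSS the resident (physical) size, both in kB.
vm_size_kb=$(awk '/^VmSize:/ {print $2}' "/proc/$pid/status")
vm_rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")

echo "pid=$pid VmSize=${vm_size_kb} kB VmRSS=${vm_rss_kb} kB"
```

A VmSize far larger than VmRSS matches the situation described above: a lot of address space is reserved, but little of it is backed by physical memory.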

The running environment

  • CDH version 5.8.0+
  • Flume 1.6.0+
  • Java 1.7.0+
  • Linux 2.6.32-573.el6.x86_64
  • CentOS 6.6+

The top command result for the Flume process

26963 root ...


How to configure and query the Impala SQL interface of CDH with the Kerberos mechanism

The Problem

Recently, I spent several days researching the Impala SQL interface with security mechanisms. There are two methods to query Impala data: one is the Kerberos mechanism, the other is the LDAP method, which authenticates with a user name and password. The first is difficult to set up and is usually suited to internal use within the Hadoop cluster, so I chose the LDAP method for external applications such as the JDBC interface. This post provides the configuration steps and a query demo for using LDAP with Impala databases.

The test environment

  • CDH version 5.8.0+
  • Kerberos software
  • LDAP service
  • Linux 2.6.32-573.el6 ...


Flume must use the Hadoop native libraries when uploading gz files

The Problem

Recently, one of my projects required uploading real-time log records into a Hadoop cluster. I chose the open source software Flume. After installing Flume, the log records were transferred to the Hadoop cluster with a gz suffix successfully. But I found that the gz file was larger than the decompressed one.

-rw-r--r-- 1 root   root        942 Dec 27 17:28 ngaancache-access.log.2016122321.1482498035352
-rw-r--r-- 1 root   root       6571 Dec 27 17:32 ngaancache-access.log.2016122321.1482498035352.gz
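For comparison, a healthy gzip file of repetitive log text should be much smaller than its source. The sketch below is only an illustration under stated assumptions: the sample file name and log line are hypothetical, not the files listed above. It compresses a small sample, compares the two sizes, and runs `gzip -t`, which exits non-zero when it hits an error or a warning.

```shell
# Build a small, repetitive sample log (hypothetical content, not the real data).
workdir=$(mktemp -d)
sample="$workdir/sample-access.log"
i=0
while [ "$i" -lt 200 ]; do
    echo "127.0.0.1 - - GET /index.html 200" >> "$sample"
    i=$((i + 1))
done

# Compress to a separate file so both sizes can be compared.
gzip -c "$sample" > "$sample.gz"

# Arithmetic expansion strips any padding that wc may print.
raw_size=$(( $(wc -c < "$sample") ))
gz_size=$(( $(wc -c < "$sample.gz") ))
echo "raw=$raw_size bytes, gz=$gz_size bytes"

# gzip -t tests the compressed file without writing any output.
if gzip -t "$sample.gz"; then
    integrity=ok
else
    integrity=bad
fi
echo "integrity=$integrity"

rm -rf "$workdir"
```

Running the same size comparison and `gzip -t` check against a file produced by Flume is a quick way to confirm whether it shows the abnormal behavior described here.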

When I used the gzip command to decompress this file, the warning "trailing garbage ignored" was reported as follows:

#gzip ...


How to run the SSL splitting and caching Web proxies demo

barnraising

This project is provided by Chris Lesniewski-Laas and M. Frans Kaashoek for reducing the bandwidth load on Web servers. The thesis is public and the demo program is open source. We can access the thesis and download the demo.

But neither the thesis nor the demo comes with specific installation documentation. I ran into a lot of problems, and sometimes I had to modify the source code. After 3 weeks, I finally installed and ran it successfully. I wrote the installation process in this file, and this git repository holds my modified, working version of barnraising. My modified version of Barnraising is ...


How to stop Bind from writing logs to system messages

Bind is open source software that implements the Domain Name System (DNS) protocols for the Internet, and many companies use it. This article discusses the logging function of Bind. With the default configuration, Bind writes its logs to the Linux system messages file, /var/log/messages, like:

 Jun 13 10:10:09 Test_Host named[19304]: success resolving 'ns4.servodns.com/A' (in 'servodns.com'?) after reducing the advertised EDNS UDP packet size to 512 octets
 Jun 13 10:10:09 Test_Host named[19304]: success resolving 'ns4.servodns.com/AAAA' (in 'servodns.com'?) after reducing the advertised EDNS UDP packet size to 512 ...
