hadoop-ansible/README.new


httpfs: it only needs to be installed on one machine (two for redundancy). Let's use the namenodes.

Move the second jobtracker to a dedicated machine.

hbase thrift: let's have two of them, on the nodes that run the hbase masters.

Impala: needs to be installed on all the datanodes. After that, hue-impala can be installed on the hue server.

NB: /etc/zookeeper/conf/zoo.cfg needs to be distributed to all
datanodes.
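
A minimal plain-shell sketch of that distribution step. It is not the
playbook's own method, and the host names in DATANODES are hypothetical
placeholders, as are the DRY_RUN switch and the function name. It assumes
passwordless ssh/rsync access to each datanode. By default it only prints
the commands it would run.

```shell
#!/bin/sh
# Sketch: copy the ZooKeeper client configuration to every datanode.
# DATANODES is a hypothetical host list; replace it with the real inventory.
DATANODES="${DATANODES:-datanode1.example.org datanode2.example.org}"
ZOO_CFG="/etc/zookeeper/conf/zoo.cfg"
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

distribute_zoo_cfg() {
    for host in $DATANODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "rsync -a $ZOO_CFG $host:$ZOO_CFG"
        else
            # Report a failed host but keep going with the others.
            rsync -a "$ZOO_CFG" "$host:$ZOO_CFG" \
                || echo "copy to $host failed" >&2
        fi
    done
}

distribute_zoo_cfg
```

Run it with DRY_RUN=0 and a real DATANODES list to perform the copy.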

Create the new disks:
lvcreate -l 238465 -n node11.t.hadoop.research-infrastructures.eu-data-hdfs dlibsan6 /dev/md3

Move the data:
rsync -qaxvH --delete --numeric-ids /mnt/disk/ dlibsan7:/mnt/disk/

----------
dfs.socket.timeout: read timeout
dfs.datanode.socket.write.timeout: write timeout

The read timeout value is in fact used for various connections in
DFSClient, so if you only increase dfs.datanode.socket.write.timeout
the timeouts can keep occurring.

I tried to generate 1 TB of data with teragen across more than 40
datanodes; increasing only the write timeout did not fix the problem.
When I increased both values above 600000 (milliseconds), it disappeared.
----------
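
The two properties above would normally be set in hdfs-site.xml. The
fragment below is only a sketch: the note says the problem went away with
values "above 600000" but does not give the exact figure, so 600000 is a
placeholder rather than a tuned recommendation. In newer Hadoop releases
dfs.socket.timeout is deprecated in favour of dfs.client.socket-timeout,
so check which name your version expects.

```xml
<!-- hdfs-site.xml fragment; both values are in milliseconds.
     600000 is a placeholder taken from the note above, not a tested value. -->
<property>
  <name>dfs.socket.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>600000</value>
</property>
```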


To configure yarn:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html