httpfs: it needs to be installed on only one machine (two for redundancy). Let's use the namenodes.
Move the second jobtracker to a dedicated machine.
hbase thrift: let's run two of them, on the nodes that run the hbase masters.
Impala: needs to be installed on all the datanodes. After that, hue-impala can be installed on the hue server.
NB: /etc/zookeeper/conf/zoo.cfg needs to be distributed to all
datanodes.
Create the new disks:
lvcreate -l 238465 -n node11.t.hadoop.research-infrastructures.eu-data-hdfs dlibsan6 /dev/md3
Move the data:
rsync -qaxvH --delete --numeric-ids /mnt/disk/ dlibsan7:/mnt/disk/
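The extent count passed to lvcreate -l above is easier to sanity-check as a size. The sketch below converts it, assuming the volume group uses LVM's default physical extent size of 4 MiB; the notes do not state the extent size, so check the "PE Size" reported by vgdisplay before relying on the result. The function name is only illustrative.

```python
# Assumption: the volume group uses LVM's default physical extent size (4 MiB).
# The notes do not record the real value; vgdisplay reports it as "PE Size".
MIB_PER_EXTENT = 4


def extents_to_gib(extents, mib_per_extent=MIB_PER_EXTENT):
    """Convert a number of LVM extents to GiB."""
    return extents * mib_per_extent / 1024


if __name__ == "__main__":
    # 238465 is the value given to "lvcreate -l" in the notes.
    print(f"{extents_to_gib(238465):.1f} GiB")  # prints "931.5 GiB"
```

Under that assumption the logical volume is about 931.5 GiB, roughly the usable size of a nominal 1 TB disk.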
----------
dfs.socket.timeout: read timeout (in milliseconds)
dfs.datanode.socket.write.timeout: write timeout (in milliseconds)
The read timeout value is in fact used for various connections in
DFSClient, so if you only increase dfs.datanode.socket.write.timeout,
the timeouts can continue to happen.
I tried to generate 1 TB of data with teragen across more than 40
datanodes; increasing the write timeout alone did not fix the problem.
When I increased both values above 600000 (10 minutes), it disappeared.
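As a rough illustration of the point above, the following sketch renders both timeouts as property entries of the kind that go inside the configuration element of hdfs-site.xml. The helper name is hypothetical, and 600000 is only the threshold mentioned in the note; the note says the values actually used were above it, so treat the number as a placeholder.

```python
from xml.sax.saxutils import escape

# The two properties named in the notes; both are expressed in milliseconds.
TIMEOUT_PROPERTIES = (
    "dfs.socket.timeout",                 # read timeout
    "dfs.datanode.socket.write.timeout",  # write timeout
)


def render_timeout_properties(timeout_ms):
    """Return <property> entries that set both timeouts to the same value."""
    blocks = []
    for name in TIMEOUT_PROPERTIES:
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{int(timeout_ms)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Placeholder value: the notes report success with values above 600000.
    print(render_timeout_properties(600000))
```

Setting both properties together is the point of the exercise: raising only the write timeout leaves the read timeout in place for the other DFSClient connections.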
----------
To configure yarn:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html