httpfs: needs to be installed on only one machine (two for redundancy). Let's use the namenodes.
Move the second jobtracker to a dedicated machine.
hbase thrift: let's run two of them, on the nodes that run the hbase masters.
Impala: needs to be installed on all the datanodes. After that, hue-impala can be installed on the hue server.
NB: /etc/zookeeper/conf/zoo.cfg needs to be distributed to all datanodes.

# Create the new disks:
lvcreate -l 238465 -n node11.t.hadoop.research-infrastructures.eu-data-hdfs dlibsan6 /dev/md3
# Move the data:
rsync -qaxvH --delete --numeric-ids /mnt/disk/ dlibsan7:/mnt/disk/
----------
dfs.socket.timeout: read timeout
dfs.datanode.socket.write.timeout: write timeout

In fact, the read timeout value is used for various connections in DFSClient, so if you only increase dfs.datanode.socket.write.timeout, the timeouts can continue to happen. I tried to generate 1 TB of data with teragen across more than 40 datanodes; increasing the write timeout alone did not fix the problem. When I increased both values above 600000, the timeouts disappeared.
----------
To configure YARN:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html
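The two timeout properties belong in hdfs-site.xml on the clients and datanodes. A sketch using the 600000 ms value that fixed the teragen run above (values are in milliseconds; check the property names against your Hadoop version, since later releases renamed some client-side settings):

```xml
<!-- hdfs-site.xml fragment; 600000 ms matches the value reported above.
     Restart the affected daemons after changing these. -->
<property>
  <name>dfs.socket.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>600000</value>
</property>
```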