# Hadoop cluster based on the CDH 4 packages.
This is the playbook that I used to install and configure the Hadoop cluster @CNR, based on the deb packages found in the Cloudera repositories.

Cloudera Manager was neither used nor installed.
## The cluster.

The cluster structure is the following:
- jobtracker.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
  - mapreduce HA jobtracker
  - zookeeper quorum
  - HA HDFS journal
- quorum4.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
  - mapreduce HA jobtracker
  - zookeeper quorum
  - HA HDFS journal
- nn1.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
  - hdfs HA namenode
  - zookeeper quorum
  - HA HDFS journal
- nn2.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
  - hdfs HA namenode
  - zookeeper quorum
  - HA HDFS journal
- hbase-master.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
  - hbase primary master
  - hbase thrift
  - zookeeper quorum
  - HA HDFS journal
- hbase-master2.t.hadoop.research-infrastructures.eu (2GB RAM, 2 CPUs):
  - hbase secondary master
  - hbase thrift
- node{2..13}.t.hadoop.research-infrastructures.eu (9GB RAM, 8 CPUs, 1000GB external storage for HDFS each):
  - mapreduce tasktracker
  - hdfs datanode
  - hbase regionserver
  - solr (sharded)
- hive.t.hadoop.research-infrastructures.eu:
  - hue
  - hive
  - oozie
  - sqoop
- db.t.hadoop.research-infrastructures.eu:
  - postgresql instance for hue and hive
The scripts that manage all the services are installed on jobtracker.t.hadoop.research-infrastructures.eu. It is possible to stop/start the individual services or the whole cluster, respecting the correct order.

They all have the "service-" prefix, and the script name gives an idea of the operations that will be performed:
    service-global-hadoop-cluster
    service-global-hbase
    service-global-hdfs
    service-global-mapred
    service-global-zookeeper
    service-hbase-master
    service-hbase-regionserver
    service-hbase-rest
    service-hdfs-datanode
    service-hdfs-httpfs
    service-hdfs-journalnode
    service-hdfs-namenode
    service-hdfs-secondarynamenode
    service-mapreduce-jobtracker
    service-mapreduce-tasktracker
    service-zookeeper-server

They take "start", "stop", "status" or "restart" as a parameter.
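The argument handling of these scripts can be sketched as follows. This is an illustrative dry-run wrapper, not one of the installed scripts: it only validates the action and prints the command that would be executed on the jobtracker.

```shell
# Hypothetical wrapper around the service-* scripts (illustrative only):
# validate the action, then print the command that would run on jobtracker.
run_service() {
  local script="$1" action="$2"
  case "$action" in
    start|stop|status|restart) echo "$script $action" ;;
    *) echo "usage: run_service <script> {start|stop|status|restart}" >&2; return 1 ;;
  esac
}

run_service service-global-hadoop-cluster stop
run_service service-mapreduce-tasktracker restart
```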
- jobtracker URL:
  http://jobtracker.t.hadoop.research-infrastructures.eu:50030/jobtracker.jsp
- HDFS URL:
  http://namenode.t.hadoop.research-infrastructures.eu:50070/dfshealth.jsp
- HBASE master URL:
  http://hbase-master.t.hadoop.research-infrastructures.eu:60010/master-status
- HUE Web Interface:
  http://quorum2.t.hadoop.research-infrastructures.eu:8888
- Ganglia URL, for the cluster metrics:
  http://monitoring.research-infrastructures.eu/ganglia/?r=hour&cs=&ce=&s=by+name&c=Openaire%252B%2520Hadoop%2520TEST&tab=m&vn=
- Nagios URL, for the status of the services (to be activated):
  http://monitoring.research-infrastructures.eu/nagios3/
------------------------------------------------------------------------------------------------

dom0/nodes/san map data

    dlib18x: *node8*  e90.6 (dlibsan9)
    dlib19x: *node9*  e90.7 (dlibsan9)
    dlib20x: *node10* e90.8 (dlibsan9)
    dlib22x: *node11* e90.5 (dlibsan9)
             *node7*  e63.4 (dlibsan6)
    dlib23x: *node12* e80.3 (dlibsan8)
             *node13* e80.4 (dlibsan8)
    dlib24x: *node2*  e25.1 (dlibsan2)
             *node3*  e74.1 (dlibsan7)
    dlib25x: *node4*  e83.4 (dlibsan8)
    dlib26x: *node5*  e72.1 (dlibsan7)
             *node6*  e63.3 (dlibsan6)

------------------------------------------------------------------------------------------------
Submitting a job (supporting multiple users)

To support multiple users, create the UNIX user accounts on the master node only.

On the namenode:

    # groupadd supergroup

(to be executed only once)

    # adduser claudio
    ...

    # su - hdfs
    $ hadoop dfs -mkdir /home/claudio
    $ hadoop dfs -chown -R claudio:supergroup /home/claudio

(add claudio to the supergroup group)
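The steps above can be collected into a small helper that prints the provisioning commands for review before running them as root on the namenode. This is illustrative, not part of the playbook, and `usermod -aG supergroup` is one assumed way of doing the "add the user to supergroup" step.

```shell
# Illustrative helper: print the provisioning commands for a new Hadoop user
# so they can be reviewed, then pasted into a root shell on the namenode.
# "usermod -aG supergroup" is an assumption for the add-to-group step.
provision_user_cmds() {
  local user="$1"
  echo "adduser $user"
  echo "usermod -aG supergroup $user"
  echo "sudo -u hdfs hadoop dfs -mkdir /home/$user"
  echo "sudo -u hdfs hadoop dfs -chown -R $user:supergroup /home/$user"
}

provision_user_cmds claudio
```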
Important:

If you do not create /tmp properly, with the right permissions as shown below, you may have problems with CDH components later. Specifically, if you don't create /tmp yourself, another process may create it automatically with restrictive permissions that will prevent your other applications from using it.

Create the /tmp directory after HDFS is up and running, and set its permissions to 1777 (drwxrwxrwt), as follows:

    $ sudo -u hdfs hadoop fs -mkdir /tmp
    $ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

Note:

If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail with a security error. Instead, use the following commands:

    $ kinit <user>                    (if you are using a password)
    $ kinit -kt <keytab> <principal>  (if you are using a keytab)

and then, for each command executed by this user:

    $ <command>
Step 8: Create MapReduce /var directories

    $ sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
    $ sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
    $ sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
Step 9: Verify the HDFS File Structure

    $ sudo -u hdfs hadoop fs -ls -R /

You should see:

    drwxrwxrwt   - hdfs   supergroup   0 2012-04-19 15:14 /tmp
    drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var
    drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var/lib
    drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var/lib/hadoop-hdfs
    drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var/lib/hadoop-hdfs/cache
    drwxr-xr-x   - mapred supergroup   0 2012-04-19 15:19 /var/lib/hadoop-hdfs/cache/mapred
    drwxr-xr-x   - mapred supergroup   0 2012-04-19 15:29 /var/lib/hadoop-hdfs/cache/mapred/mapred
    drwxrwxrwt   - mapred supergroup   0 2012-04-19 15:33 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
Step 10: Create and Configure the mapred.system.dir Directory in HDFS

After you start HDFS and create /tmp, but before you start the JobTracker (see the next step), you must also create the HDFS directory specified by the mapred.system.dir parameter (by default ${hadoop.tmp.dir}/mapred/system) and configure it to be owned by the mapred user.

To create the directory in its default location:

    $ sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
    $ sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system

Important:

If you create the mapred.system.dir directory in a different location, specify that path in the conf/mapred-site.xml file.

When starting up, MapReduce sets the permissions for the mapred.system.dir directory to drwx------, assuming the user mapred owns that directory.
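For example, if the directory were moved to a non-default location, conf/mapred-site.xml would carry the override. The path below is a hypothetical example, not the value used on this cluster:

```xml
<!-- conf/mapred-site.xml: only needed when mapred.system.dir is moved away
     from its default; /hadoop/mapred/system is a hypothetical example path. -->
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>
```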
Step 11: Start MapReduce

To start MapReduce, start the TaskTracker and JobTracker services.

On each TaskTracker system:

    $ sudo service hadoop-0.20-mapreduce-tasktracker start

On the JobTracker system:

    $ sudo service hadoop-0.20-mapreduce-jobtracker start
Step 12: Create a Home Directory for each MapReduce User

Create a home directory for each MapReduce user. It is best to do this on the NameNode; for example:

    $ sudo -u hdfs hadoop fs -mkdir /user/<user>
    $ sudo -u hdfs hadoop fs -chown <user> /user/<user>

where <user> is the Linux username of each user.

Alternatively, you can log in as each Linux user (or write a script to do so) and create the home directory as follows:

    sudo -u hdfs hadoop fs -mkdir /user/$USER
    sudo -u hdfs hadoop fs -chown $USER /user/$USER
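The script mentioned above can be sketched as a loop over the usernames. This version only prints the two commands for each user so the list can be reviewed before execution; the usernames passed in the example call are placeholders.

```shell
# Sketch of the per-user home directory script (illustrative): print the
# mkdir/chown commands for each user instead of executing them.
home_dir_cmds() {
  for u in "$@"; do
    echo "sudo -u hdfs hadoop fs -mkdir /user/$u"
    echo "sudo -u hdfs hadoop fs -chown $u /user/$u"
  done
}

home_dir_cmds claudio alice bob
```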
------------------------------------------------------------------------------------------------

We use the jobtracker as provisioning server.

Correct start order (reverse it to obtain the stop order):

• HDFS (NB: substitute secondarynamenode with journalnode when we will have HA)
• MapReduce
• Zookeeper
• HBase
• Hive Metastore
• Hue
• Oozie
• Ganglia
• Nagios

The init commands can be found in the "init.sh" file in the ansible repository.
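The "reverse the start order to obtain the stop order" rule can be sketched in bash. The short names below are just labels for the services listed above, not real script or package names:

```shell
# Stop order is the start order reversed (labels only, illustrative).
start_order=(hdfs mapreduce zookeeper hbase hive-metastore hue oozie ganglia nagios)

stop_order=()
for ((i=${#start_order[@]}-1; i>=0; i--)); do
  stop_order+=("${start_order[i]}")
done

echo "start: ${start_order[*]}"
echo "stop:  ${stop_order[*]}"
```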
Error to investigate:
http://stackoverflow.com/questions/6153560/hbase-client-connectionloss-for-hbase-error

# GC hints
http://stackoverflow.com/questions/9792590/gc-tuning-preventing-a-full-gc?rq=1
HBASE troubleshooting

- If some regions stay in "transition" indefinitely, you can try to fix the problem from the shell:

        # su - hbase
        $ hbase hbck -fixAssignments

  It could also be useful to run:

        $ hbase hbck -repairHoles

-----------------------------------------------------
When "ROOT stuck in assigning forever" happens, you need to:

- verify that there are no zookeeper-related errors. If there are, restart zookeeper and then the whole hbase cluster
- restart the hbase master only

-----------------------------------------------------
When there are disabled tables that turn out to be impossible to enable or delete:

    # su - hbase
    $ hbase hbck -fixAssignments

* Restart the hbase master

-----------------------------------------------------
See: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/32838
And in general, to understand how it works: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

Tool for monitoring hbase when it is configured for manual splitting:
https://github.com/sentric/hannibal
---------------------------------------------------------------------------------

TaskTracker startup log excerpt:

    2013-02-22 10:24:46,492 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@41a7fead
    2013-02-22 10:24:46,492 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_node2.t.hadoop.research-infrastructures.eu:localhost/127.0.0.1:47798
    2013-02-22 10:24:46,492 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1 and reserved physical memory is not configured. TaskMemoryManager is disabled.
    2013-02-22 10:24:46,571 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760

---
Interesting post covering the configuration and the various parameters: http://gbif.blogspot.it/2011/01/setting-up-hadoop-cluster-part-1-manual.html

List of deprecated parameter names and their new names: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

---
How to decommission a worker node

1. If many nodes are being decommissioned, reduce the HDFS replication factor
2. Stop the regionserver on the node
3. Add the node to the hdfs and jobtracker exclude list:

        ./run.sh mapred.yml -i inventory/hosts.production -l jt_masters --tags=hadoop_workers
        ./run.sh hadoop-hdfs.yml -i inventory/hosts.production -l hdfs_masters --tags=hadoop_workers

4. Refresh the hdfs and jobtracker configuration:

        hdfs dfsadmin -refreshNodes
        mapred mradmin -refreshNodes

5. Remove the node from the list of allowed ones:

   5a. Edit the inventory

   5b. Run:

        ./run.sh hadoop-common.yml -i inventory/hosts.production --tags=hadoop_workers
        ./run.sh mapred.yml -i inventory/hosts.production -l jt_masters --tags=hadoop_workers
        ./run.sh hadoop-hdfs.yml -i inventory/hosts.production -l hdfs_masters --tags=hadoop_workers
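Step 3's exclude-list change can also be sketched as done by hand (on this cluster the ansible playbooks manage these files). The function below appends a node's FQDN to the exclude files that the namenode and jobtracker re-read on refreshNodes; the real file paths must be taken from the dfs.hosts.exclude / mapred.hosts.exclude settings in the cluster configuration, so the paths in the usage line are assumptions.

```shell
# Illustrative sketch: append a node's FQDN to the given exclude files,
# avoiding duplicate entries. Step 4's refreshNodes commands then make
# the daemons re-read the lists.
add_to_exclude() {
  local node="$1"; shift
  for f in "$@"; do
    grep -qx "$node" "$f" 2>/dev/null || echo "$node" >> "$f"
  done
}
```

Usage (hypothetical paths): `add_to_exclude node5.t.hadoop.research-infrastructures.eu /etc/hadoop/conf/hosts.exclude /etc/hadoop/conf/mapred.exclude`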
---------------------------------------------------------------------------------

Nagios monitoring

- The handlers that restart the services are managed via nrpe. To get them working, we need to:
  - Add an entry in nrpe.cfg. The command name needs to start with "global_restart_" and the remaining part of the name must coincide with the name of the service. For example:

        command[global_restart_hadoop-0.20-mapreduce-tasktracker]=/usr/bin/sudo /usr/sbin/service hadoop-0.20-mapreduce-tasktracker restart

  - Add a handler to the nagios service. The command needs the service name as a parameter. Example:

        event_handler restart-service!hadoop-0.20-mapreduce-tasktracker
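The naming convention above can be checked mechanically: stripping the "global_restart_" prefix from the nrpe command name must yield exactly the init service name. A one-line sketch:

```shell
# The "global_restart_<service>" convention: the suffix is the service name.
cmd="global_restart_hadoop-0.20-mapreduce-tasktracker"
service="${cmd#global_restart_}"
echo "$service"
```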
---------------------------------------------------------------------------------