The old playbook files. Some are missing.

hadoop-cdh4-legacy
Andrea Dell'Amico 6 months ago
parent 53fcee684d
commit b821ee815b
Signed by: andrea.dellamico
GPG Key ID: 147ABE6CEB9E20FF

@ -0,0 +1,86 @@
2013-11-06 Andrea Dell'Amico <adellam@sevenseas.org>
* templates/hadoop-nrpe.cfg.j2: Add the correct entry for the tasktracker event handler.
2013-11-04 Andrea Dell'Amico <adellam@sevenseas.org>
* templates/iptables-rules.v4.j2: Add access to the hbase master rpc port from the isti network.
2013-10-10 Andrea Dell'Amico <adellam@sevenseas.org>
* templates/hbase-site.j2: Add the property hbase.master.loadbalance.bytable to try a better balance for our workload. Reference here: http://answers.mapr.com/questions/7049/table-only-on-single-region-server
2013-10-09 Andrea Dell'Amico <adellam@sevenseas.org>
* templates/nagios-server/hadoop-cluster/services.cfg.j2: Handler to restart the tasktracker when it fails.
* templates/iptables-rules.*.j2: iptables rules to block access to the services ports from outside CNR.
2013-10-07 Andrea Dell'Amico <adellam@sevenseas.org>
* templates/nagios-server: added checks for the logstash host and services
* logstash.yml: Add logstash with remote syslog to aggregate all the workers logs. Needed for solr.
2013-10-01 Andrea Dell'Amico <adellam@sevenseas.org>
* templates/: management portal that redirects to the services web interfaces.
2013-09-23 Andrea Dell'Amico <adellam@sevenseas.org>
* tasks/jobtracker-ha.yml: HA configuration for the jobtracker. jobtracker.t.hadoop and quorum4.t.hadoop are the two masters.
2013-09-19 Andrea Dell'Amico <adellam@sevenseas.org>
* all.yml: HDFS is now HA. All the datanodes lists are generated from the hosts file and are not static anymore. Changed nagios to reflect the new configuration.
2013-09-17 Andrea Dell'Amico <adellam@sevenseas.org>
* templates: Changed the system-* scripts to manage the second namenode instance. Removed the secondary namenode start/stop script
2013-09-12 Andrea Dell'Amico <adellam@sevenseas.org>
* hadoop-test.yml: New quorum4.t.hadoop node. Zookeeper now has 5
quorum nodes. HBASE master HA. quorum4 is the other instance.
2013-07-29 Andrea Dell'Amico <adellam@sevenseas.org>
* templates/datanode-hdfs-site.j2: Added "support_append" as "true" and max_xcievers as 1024
2013-06-20 Andrea Dell'Amico <adellam@sevenseas.org>
* hadoop-ganglia.yml: The ganglia configuration is now differentiated between datanodes, jobtracker, hbase master, hdfs namenode, hdfs secondary namenode
2013-02-27 Andrea Dell'Amico <adellam@sevenseas.org>
* vars/hadoop-global-vars.yml: mapred_tasktracker_reduce_tasks_maximum: 5, hbase_regionserver_heap_size: 3192
2013-02-22 Andrea Dell'Amico <adellam@isti.cnr.it>
* init.sh: Create hdfs directory /jobtracker to store the jobtracker history
* templates/mapred-site-jobtracker.j2: Activate permanent jobtracker history
* jobtracker.yml: Cleanup
2013-02-18 Andrea Dell'Amico <adellam@isti.cnr.it>
* vars/hadoop-global-vars.yml: mapred_child_java_opts: "-Xmx3092M", mapred_map_child_java_opts: "-Xmx2048M", mapred_reduce_child_java_opts: "-Xmx1512M", hbase_regionserver_heap_size: 4092
2013-02-18 Andrea Dell'Amico <adellam@isti.cnr.it>
* vars/hadoop-global-vars.yml: hbase_master_heap_size: 5120, hbase_regionserver_heap_size: 3192
2013-02-18 Andrea Dell'Amico <adellam@isti.cnr.it>
* vars/hadoop-global-vars.yml (hbase_thrift_heap_size): mapred_child_java_opts: "-Xmx1512M", mapred_map_child_java_opts: "-Xmx3092M", mapred_reduce_child_java_opts: "-Xmx2048M", hbase_master_heap_size: 3072
2013-02-16 Andrea Dell'Amico <adellam@isti.cnr.it>
* templates/hbase-thrift-env.sh.j2: Disabled the jmx console for hbase thrift
* templates/hbase-master-env.sh.j2: disabled the master jmx console
* vars/hadoop-global-vars.yml: zookeeper_max_timeout: 240000, fixed the zookeeper quorum host naming
2013-02-16 Andrea Dell'Amico <adellam@isti.cnr.it>
* vars/hadoop-global-vars.yml: mapred_child_java_opts: "-Xmx2G", mapred_reduce_child_java_opts: "-Xmx2512M", hbase_regionserver_heap_size: 5000

@ -1,3 +1,303 @@
# hadoop-ansible
Ansible playbook that installs and configures a Hadoop cluster.
# Hadoop cluster based on the CDH 4 packages.
This is the playbook that I used to install and configure the Hadoop cluster @CNR, based on the deb packages found in the Cloudera repositories.
No Cloudera Manager was installed or used.
## The cluster.
The cluster structure is the following:
- jobtracker.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
- mapreduce HA jobtracker
- zookeeper quorum
- HA HDFS journal
- quorum4.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
- mapreduce HA jobtracker
- zookeeper quorum
- HA HDFS journal
- nn1.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
- hdfs HA namenode
- zookeeper quorum
- HA HDFS journal
- nn2.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
- hdfs HA namenode
- zookeeper quorum
- HA HDFS journal
- hbase-master.t.hadoop.research-infrastructures.eu (3.5GB RAM, 4 CPUs):
- hbase primary master
- hbase thrift
- zookeeper quorum
- HA HDFS journal
- hbase-master2.t.hadoop.research-infrastructures.eu (2GB RAM, 2 CPUs):
- HBASE secondary master
- hbase thrift
- node{2..13}.t.hadoop.research-infrastructures.eu (9GB RAM, 8 CPUs, 1000GB external storage for HDFS each):
- mapreduce tasktracker
- hdfs datanode
- hbase regionserver
- solr (sharded)
- hive.t.hadoop.research-infrastructures.eu:
- hue
- hive
- oozie
- sqoop
- db.t.hadoop.research-infrastructures.eu:
- postgresql instance for hue and hive
The scripts that manage all the services are installed on
jobtracker.t.hadoop.research-infrastructures.eu. It is possible to stop/start
individual services or the whole cluster, respecting the correct
order.
They all have the "service-" prefix, and the script name gives an idea of the operations that will be performed:
service-global-hadoop-cluster
service-global-hbase
service-global-hdfs
service-global-mapred
service-global-zookeeper
service-hbase-master
service-hbase-regionserver
service-hbase-rest
service-hdfs-datanode
service-hdfs-httpfs
service-hdfs-journalnode
service-hdfs-namenode
service-hdfs-secondarynamenode
service-mapreduce-jobtracker
service-mapreduce-tasktracker
service-zookeeper-server
They take one of "start", "stop", "status", "restart" as a parameter.
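For example, from jobtracker.t.hadoop.research-infrastructures.eu (a usage sketch; check each script for the exact behaviour):
# service-global-hadoop-cluster status
# service-global-hdfs restart
# service-mapreduce-tasktracker stop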
- jobtracker URL:
http://jobtracker.t.hadoop.research-infrastructures.eu:50030/jobtracker.jsp
- HDFS URL:
http://namenode.t.hadoop.research-infrastructures.eu:50070/dfshealth.jsp
- HBASE master URL:
http://hbase-master.t.hadoop.research-infrastructures.eu:60010/master-status
- HUE Web Interface:
http://quorum2.t.hadoop.research-infrastructures.eu:8888
- Ganglia URL, for the cluster metrics:
http://monitoring.research-infrastructures.eu/ganglia/?r=hour&cs=&ce=&s=by+name&c=Openaire%252B%2520Hadoop%2520TEST&tab=m&vn=
- Nagios URL, for the status of the services (still to be enabled):
http://monitoring.research-infrastructures.eu/nagios3/
------------------------------------------------------------------------------------------------
dom0/nodes/san map data
dlib18x: *node8* e90.6 (dlibsan9)
dlib19x: *node9* e90.7 (dlibsan9)
dlib20x: *node10* e90.8 (dlibsan9)
dlib22x: *node11* e90.5 (dlibsan9)
*node7* e63.4 (dlibsan6)
dlib23x: *node12* e80.3 (dlibsan8)
*node13* e80.4 (dlibsan8)
dlib24x: *node2* e25.1 (dlibsan2)
*node3* e74.1 (dlibsan7)
dlib25x: *node4* e83.4 (dlibsan8)
dlib26x: *node5* e72.1 (dlibsan7)
*node6* e63.3 (dlibsan6)
------------------------------------------------------------------------------------------------
Submitting a job (supporting multiple users)
To support multiple users, create UNIX user accounts on the master node only.
On the namenode:
#groupadd supergroup
(to be run only once)
#adduser claudio
...
# su - hdfs
$ hadoop dfs -mkdir /home/claudio
$ hadoop dfs -chown -R claudio:supergroup /home/claudio
(add claudio to the supergroup group)
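For example, on a standard Debian/Ubuntu system this would be something like:
# usermod -a -G supergroup claudio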
Important:
If you do not create /tmp properly, with the right permissions as shown below, you may have problems with CDH components later. Specifically, if you don't create /tmp yourself, another process may create it automatically with restrictive permissions that will prevent your other applications from using it.
Create the /tmp directory after HDFS is up and running, and set its permissions to 1777 (drwxrwxrwt), as follows:
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Note:
If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail with a security error. Instead, use the following commands: $ kinit <user> (if you are using a password) or $ kinit -kt <keytab> <principal> (if you are using a keytab) and then, for each command executed by this user, $ <command>
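For example, with Kerberos enabled the /tmp creation above would become something like (placeholders as in the note):
$ kinit -kt <keytab> <hdfs principal>
$ hadoop fs -mkdir /tmp
$ hadoop fs -chmod -R 1777 /tmp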
Step 8: Create MapReduce /var directories
sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
Step 9: Verify the HDFS File Structure
$ sudo -u hdfs hadoop fs -ls -R /
You should see:
drwxrwxrwt - hdfs supergroup 0 2012-04-19 15:14 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var/lib
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs/cache
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:19 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:29 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt - mapred supergroup 0 2012-04-19 15:33 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
Step 10: Create and Configure the mapred.system.dir Directory in HDFS
After you start HDFS and create /tmp, but before you start the JobTracker (see the next step), you must also create the HDFS directory specified by the mapred.system.dir parameter (by default ${hadoop.tmp.dir}/mapred/system) and configure it to be owned by the mapred user.
To create the directory in its default location:
$ sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system
Important:
If you create the mapred.system.dir directory in a different location, specify that path in the conf/mapred-site.xml file.
When starting up, MapReduce sets the permissions for the mapred.system.dir directory to drwx------, assuming the user mapred owns that directory.
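For example, if mapred.system.dir were moved to a hypothetical /mapred/system path, conf/mapred-site.xml would need something like:
<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
</property>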
Step 11: Start MapReduce
To start MapReduce, start the TaskTracker and JobTracker services:
On each TaskTracker system:
$ sudo service hadoop-0.20-mapreduce-tasktracker start
On the JobTracker system:
$ sudo service hadoop-0.20-mapreduce-jobtracker start
Step 12: Create a Home Directory for each MapReduce User
Create a home directory for each MapReduce user. It is best to do this on the NameNode; for example:
$ sudo -u hdfs hadoop fs -mkdir /user/<user>
$ sudo -u hdfs hadoop fs -chown <user> /user/<user>
where <user> is the Linux username of each user.
Alternatively, you can log in as each Linux user (or write a script to do so) and create the home directory as follows:
sudo -u hdfs hadoop fs -mkdir /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER
------------------------------------------------------------------------------------------------
We use the jobtracker as the provisioning server.
Correct start order (reverse it to obtain the stop order):
- HDFS (NB: substitute secondarynamenode with journalnode once we have HA)
- MapReduce
- Zookeeper
- HBase
- Hive Metastore
- Hue
- Oozie
- Ganglia
- Nagios
The init commands can be found in the "init.sh" file in the ansible repository.
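A sketch of the start sequence using the service-* wrappers described above (Hive metastore, Hue, Oozie, Ganglia and Nagios are assumed to be started through their usual init scripts):
# service-global-hdfs start
# service-global-mapred start
# service-global-zookeeper start
# service-global-hbase start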
Error to investigate:
http://stackoverflow.com/questions/6153560/hbase-client-connectionloss-for-hbase-error
# GC hints
http://stackoverflow.com/questions/9792590/gc-tuning-preventing-a-full-gc?rq=1
HBASE troubleshooting
- If some regions remain in "transition" indefinitely, you can try to fix the problem from the shell:
# su - hbase
$ hbase hbck -fixAssignments
The following could also be useful:
$ hbase hbck -repairHoles
-----------------------------------------------------
When "ROOT stuck in assigning forever" occurs,
you need to:
- check that there are no zookeeper-related errors. If there are, restart zookeeper and then the whole hbase cluster
- restart only the hbase master
-----------------------------------------------------
When there are disabled tables that turn out to be impossible to enable or delete:
# su - hbase
$ hbase hbck -fixAssignments
* Restart the hbase master
-----------------------------------------------------
See: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/32838
And in general, to understand how it works: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
Tool for monitoring hbase when it is configured for manual splitting:
https://github.com/sentric/hannibal
---------------------------------------------------------------------------------
2013-02-22 10:24:46,492 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@41a7fead
2013-02-22 10:24:46,492 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_node2.t.hadoop.research-infrastructures.eu:localhost/127.0.0.1:47798
2013-02-22 10:24:46,492 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1 and reserved physical memory is not configured. TaskMemoryManager is disabled.
2013-02-22 10:24:46,571 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
---
Interesting post covering the configuration and the various parameters: http://gbif.blogspot.it/2011/01/setting-up-hadoop-cluster-part-1-manual.html
List of deprecated parameter names and their new names: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
---
How to decommission a worker node
1. If there are many nodes to decommission, reduce the HDFS replication factor first
2. Stop the regionserver on the node
3. Add the node to the hdfs and jobtracker exclude list
./run.sh mapred.yml -i inventory/hosts.production -l jt_masters --tags=hadoop_workers
./run.sh hadoop-hdfs.yml -i inventory/hosts.production -l hdfs_masters --tags=hadoop_workers
4. Refresh the hdfs and jobtracker configuration
hdfs dfsadmin -refreshNodes
mapred mradmin -refreshNodes
5. Remove the node from the list of allowed ones
5a. Edit the inventory
5b. Run
./run.sh hadoop-common.yml -i inventory/hosts.production --tags=hadoop_workers
./run.sh mapred.yml -i inventory/hosts.production -l jt_masters --tags=hadoop_workers
./run.sh hadoop-hdfs.yml -i inventory/hosts.production -l hdfs_masters --tags=hadoop_workers
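The exclude lists are plain text files with one worker FQDN per line; a hypothetical entry while decommissioning node13 would be:
node13.t.hadoop.research-infrastructures.eu
After the refreshNodes commands above, HDFS re-replicates the node's blocks elsewhere before the datanode can be safely stopped.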
---------------------------------------------------------------------------------
Nagios monitoring
- The handlers to restart the services are managed via nrpe. To get them to work, we need to:
- Add an entry in nrpe.cfg. The command name needs to start with "global_restart_" and
the remaining part of the name must coincide with the name of the service.
For example:
command[global_restart_hadoop-0.20-mapreduce-tasktracker]=/usr/bin/sudo /usr/sbin/service hadoop-0.20-mapreduce-tasktracker restart
- Add a handler to the nagios service. The command needs the service name as a parameter.
Example:
event_handler restart-service!hadoop-0.20-mapreduce-tasktracker
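A sketch of the matching Nagios service definition (the service template and check command names are assumptions; only the event_handler line and contact group come from this setup):
define service{
        use                     generic-service
        host_name               node2.t.hadoop.research-infrastructures.eu
        service_description     hadoop-0.20-mapreduce-tasktracker
        check_command           check_nrpe!check_tasktracker
        event_handler           restart-service!hadoop-0.20-mapreduce-tasktracker
        contact_groups          hadoop-managers
}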
---------------------------------------------------------------------------------

@ -0,0 +1,32 @@
httpfs: We need to install it on only one machine (two for redundancy). Let's use the namenodes.
Move the second jobtracker on a dedicated machine.
hbase thrift: let's have two of them, on the nodes that run the hbase masters
Impala: needs to be installed on all the datanodes. After that, hue-impala can be installed on the hue server
NB: /etc/zookeeper/conf/zoo.cfg needs to be distributed on all
datanodes.
Create the new disks: lvcreate -l 238465 -n node11.t.hadoop.research-infrastructures.eu-data-hdfs dlibsan6 /dev/md3
# Move the data:
rsync -qaxvH --delete --numeric-ids /mnt/disk/ dlibsan7:/mnt/disk/
----------
dfs.socket.timeout, for read timeout
dfs.datanode.socket.write.timeout, for write timeout
In fact, the read timeout value is used for various connections in
DFSClient, so if you only increase dfs.datanode.socket.write.timeout, the
timeouts can still happen.
I tried to generate 1TB of data with teragen across more than 40 data
nodes; increasing the write timeout alone did not fix the problem. When I
increased both values above 600000, the problem disappeared.
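On this cluster both values are set to 600000 through the hdfs_dfs_socket_timeout and hdfs_dfs_socket_write_timeout variables; the rendered hdfs-site.xml entries should look roughly like:
<property>
  <name>dfs.socket.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>600000</value>
</property>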
----------
To configure yarn:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html

@ -0,0 +1,2 @@
monitorRole readonly
controlRole readwrite

@ -0,0 +1,374 @@
---
# Generic machines data
time_zone: 'Europe/Rome'
cpu_cores: 8
datanode_ram: 11000
nagios_enabled: True
ganglia_enabled: False
ganglia_gmond_hdfs_datanodes_port: "8640:8660"
ganglia_gmond_jobtracker_port: "8640:8660"
ganglia_gmond_hbmaster_port: "8640:8660"
ganglia_gmond_namenode_port: "8640:8660"
configure_munin: True
# JDK (Oracle)
jdk_version:
- 7
- 8
jdk_default: 8
java_home: '/usr/lib/jvm/java-{{ jdk_default }}-oracle'
jdk_java_home: '{{ java_home }}'
# PKG state: latest or present. Set to 'latest' when you want to upgrade the installed packages version.
hadoop_pkg_state: present
#
#
# Global data
#
worker_nodes_num: 4
worker_node_start: 2
worker_node_end: 5
worker_node_swappiness: 0
dns_domain: t.hadoop.research-infrastructures.eu
namenode_hostname: 'nn1.{{ dns_domain }}'
secondary_nm_hostname: 'nn2.{{ dns_domain }}'
quorum_0_node_hostname: 'quorum0.{{ dns_domain }}'
quorum_1_node_hostname: 'quorum1.{{ dns_domain }}'
quorum_2_node_hostname: 'quorum2.{{ dns_domain }}'
quorum_3_node_hostname: 'quorum3.{{ dns_domain }}'
quorum_4_node_hostname: 'quorum4.{{ dns_domain }}'
hbase_master_1_hostname: 'hbase-master1.{{ dns_domain }}'
hbase_master_2_hostname: 'hbase-master2.{{ dns_domain }}'
ldap:
server: ldap://ldap.sub.research-infrastructures.eu
search_bind_auth: False
username_pattern: "uid=<username>,ou=People,o=Users,ou=Organizations,dc=research-infrastructures,dc=eu"
hadoop_ldap_uri: ldap://ldap.sub.research-infrastructures.eu
hadoop_ldap_base_dn: "dc=research-infrastructures,dc=eu"
hadoop_ldap_search_bind_auth: False
hadoop_ldap_username_pattern: "uid=<username>,ou=People,o=Users,ou=Organizations,dc=research-infrastructures,dc=eu"
#
# LOGGING
#
# WARN,INFO,DEBUG,ERROR
hadoop_log_level: INFO
#
# RFA is the rolling file appender
hadoop_log_appender: RFA
hadoop_log_appender_max_filesize: 256MB
# max backup index is ignored if the appender is daily rolling file
hadoop_log_appender_max_backupindex: 10
#
# We can use a logstash collector
hadoop_send_to_logstash: False
# Ditch the local appender if you want a logstash only solution
hadoop_logstash_appender: RFA,LOGSTASH
hadoop_logstash_collector_host: 'logstash.{{ dns_domain }}'
hadoop_logstash_collector_socketappender_port: 4560
hadoop_logstash_collector_socketappender_reconndelay: 10000
#
# rsyslog
rsyslog_install_newer_package: True
rsyslog_send_to_elasticsearch: False
rsyslog_use_queues: False
rsyslog_use_elasticsearch_module: False
rsys_elasticsearch_collector_host: '{{ hadoop_logstash_collector_host }}'
rsys_elasticsearch_collector_port: 9200
#
# General hadoop
#
initialize_hadoop_cluster: False
hadoop_cluster_name: "nmis-hadoop-cluster"
hadoop_data_dir: /data
hadoop_conf_dir: '/etc/hadoop/conf.{{ hadoop_cluster_name|lower }}'
hadoop_mapred_home: /usr/lib/hadoop-0.20-mapreduce
hadoop_hdfs_data_disk:
- { mountpoint: '/data', device: 'xvda3', fstype: 'xfs' }
#
# Hadoop default heapsize
# The default is 1000
hadoop_default_heapsize: 1024
hadoop_default_java_opts: "-server -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8"
hadoop_jmx_enabled: False
#
# HDFS
#
hdfs_cluster_id: '{{ hadoop_cluster_name }}'
hdfs_cluster_nn_id_1: nn1
hdfs_cluster_nn_id_2: nn2
hdfs_cluster_ids: "{{ hdfs_cluster_nn_id_1 }},{{ hdfs_cluster_nn_id_2 }}"
hdfs_namenode_1_hostname: '{{ namenode_hostname }}'
hdfs_namenode_2_hostname: '{{ secondary_nm_hostname }}'
hdfs_data_dir: '{{ hadoop_data_dir }}/dfs'
hdfs_nn_data_dir: nn
hdfs_dn_data_dir: dn
hdfs_dn_balance_bandwidthPerSec: 2097152
hdfs_support_append: "true"
hdfs_nn_rpc_port: 8020
hdfs_nn_http_port: 50070
hdfs_nn_client_port: 57045
# handler count. Recommended: ln(number of datanodes) * 20
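# e.g. with the 12 worker nodes of this cluster: ln(12) * 20 = 2.48 * 20 ~ 50, which matches the value below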
hdfs_nn_handler_count: 50
# Recommended: up to 128MB, 134217728 bytes (this is the default; it is a client parameter)
hdfs_block_size: 16777216
hdfs_repl_max: 256
hdfs_replication: 1
# Set to 0 to disable the trash use. Note that the client can enable it.
hdfs_fs_trash_interval: 10060
hdfs_datanode_max_xcievers: 1024
hdfs_datanode_http_port: 50075
hdfs_datanode_ipc_port: 50020
hdfs_datanode_rpc_port: 50010
hdfs_dfs_socket_timeout: 600000
hdfs_dfs_socket_write_timeout: 600000
# See http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_topic_11_6.html
hdfs_read_shortcircuit: True
hdfs_read_shortcircuit_cache_size: 3000
hdfs_read_shortcircuit_cache_expiry: 50000
hdfs_read_shortcircuit_cache_dir: '/var/run/hadoop-hdfs'
hdfs_journal_id: '{{ hdfs_cluster_id }}'
hdfs_journal_port: 8485
hdfs_journal_0: '{{ quorum_0_node_hostname }}'
hdfs_journal_1: '{{ quorum_1_node_hostname }}'
hdfs_journal_2: '{{ quorum_2_node_hostname }}'
hdfs_journal_3: '{{ quorum_3_node_hostname }}'
hdfs_journal_4: '{{ quorum_4_node_hostname }}'
hdfs_journal_data_dir: jn
hdfs_journal_http_port: 8480
hdfs_zkfc_port: 8019
hdfs_webhdfs_enabled: True
hdfs_users_supergroup: supergroup
# The following is used to retrieve the ssh key needed for the HA failover
hdfs_user_home: /usr/lib/hadoop
httpfs_user: httpfs
httpfs_host: 'hue.{{ dns_domain }}'
httpfs_host_1: 'nn1.{{ dns_domain }}'
httpfs_host_2: 'nn2.{{ dns_domain }}'
httpfs_port: 14000
httpfs_catalina_work_dir: /usr/lib/hadoop-httpfs/work
#
# Zookeeper
zookeeper_conf_dir: '/etc/zookeeper/conf.{{ hadoop_cluster_name|lower }}'
zookeeper_log_dir: '/var/log/zookeeper'
zookeeper_client_port: 2182
zookeeper_quorum_port: 4182
zookeeper_leader_port: 3182
zookeeper_min_timeout: 30000
zookeeper_max_timeout: 240000
zookeeper_quorum_0: '{{ quorum_0_node_hostname }}'
zookeeper_quorum_1: '{{ quorum_1_node_hostname }}'
zookeeper_quorum_2: '{{ quorum_2_node_hostname }}'
zookeeper_quorum_3: '{{ quorum_3_node_hostname }}'
zookeeper_quorum_4: '{{ quorum_4_node_hostname }}'
zookeeper_maxclient_connections: 240
zookeeper_nodes: "{{ zookeeper_quorum_0 }},{{ zookeeper_quorum_1 }},{{ zookeeper_quorum_2 }},{{ zookeeper_quorum_3 }},{{ zookeeper_quorum_4 }}"
zookeeper_cluster: "{{ zookeeper_quorum_0 }}:{{ zookeeper_client_port }},{{ zookeeper_quorum_1 }}:{{ zookeeper_client_port }},{{ zookeeper_quorum_2 }}:{{ zookeeper_client_port }},{{ zookeeper_quorum_3 }}:{{ zookeeper_client_port }},{{ zookeeper_quorum_4 }}:{{ zookeeper_client_port }}"
#
# Jobtracker
#
jobtracker_cluster_id: nmis-hadoop-jt
jobtracker_node_1_hostname: 'jobtracker.{{ dns_domain }}'
jobtracker_node_2_hostname: 'jobtracker2.{{ dns_domain }}'
jobtracker_cluster_id_1: jt1
jobtracker_cluster_id_2: jt2
jobtracker_cluster_id1_rpc_port: 8021
jobtracker_cluster_id2_rpc_port: 8022
jobtracker_cluster_id1_ha_rpc_port: 8023
jobtracker_cluster_id2_ha_rpc_port: 8024
jobtracker_cluster_id1_http_port: 50030
jobtracker_cluster_id2_http_port: 50031
jobtracker_http_port: 9290
jobtracker_persistent_jobstatus: 'true'
jobtracker_restart_recover: 'false'
jobtracker_failover_connect_retries: 3
jobtracker_auto_failover_enabled: 'true'
jobtracker_zkfc_port: 8018
# handler count. Recommended: ln(number of datanodes) * 20
jobtracker_handler_count: 50
# We have 12 nodes and 6 CPUs per node
# reduce tasks formula: 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum)
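# e.g. with 12 nodes and mapred_tasktracker_reduce_tasks_maximum: 4 -> 0.95 * (12 * 4) ~ 45 reduce tasks per job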
# Cloudera defaults: 2 mappers, 2 reducers max
# ------
# tested. too much stress on the hardware
#mapred_tasktracker_map_tasks_maximum: 6
#mapred_tasktracker_reduce_tasks_maximum: 68
#mapred_reduce_child_java_opts: "-Xmx2G"
# ------
mapred_tasktracker_http_port: 50060
mapred_tasktracker_map_tasks_maximum: 2
mapred_tasktracker_reduce_tasks_maximum: 4
mapred_use_fair_scheduler: True
mapred_fair_scheduler_pools:
- { name: 'solr', map: '12', reduce: '18' }
mapred_fair_scheduler_use_poolnameproperty: True
mapred_fair_scheduler_poolnameproperty: user.name
mapred_fair_scheduler_undecl_pools: True
mapred_fair_scheduler_preemption: False
mapred_fair_scheduler_assignmultiple: True
mapred_fair_scheduler_allocation_file: '{{ hadoop_conf_dir }}/fair-scheduler.xml'
# reducer parallel copies. Recommended: ln(number of datanodes) * 4
# with a minimum of 10
mapred_reduce_parallel_copies: 10
# Recommended: 80
mapred_tasktracker_http_threads: 80
# Default: 0.05. Recommended: 0.8. Used by the jobtracker
mapred_reduce_slowstart_maps: 0.9
# Default: 100. We could increase it
mapred_tasktracker_io_sort_mb: 256
mapred_io_sort_factor: 25
mapreduce_job_counters_max: 5000
mapred_userlog_retain_hours: 24
mapred_jt_completeuserjobs_max: 150
mapred_jt_persist_jobstatus_hours: 4320
mapred_user_jobconf_limit: 5242880
mapred_jt_retirejob_interval: 86400000
mapreduce_jt_split_metainfo_maxsize: 10000000
mapred_queue_names: default
#
mapred_staging_root_dir: /user
mapred_old_staging_root_dir: /home
mapred_local_dir: /data/mapred/local
# Java parameters
mapred_child_java_opts: "-Xmx3092M"
mapred_map_child_java_opts: "-Xmx3092M"
#mapred_reduce_child_java_opts: "-Xmx1512M"
mapred_reduce_child_java_opts: "-Xmx2048M"
#
# HBASE
#
# Raw formula to calculate the needed regionserver heap size:
# regions.hbase.hregion.max.filesize /
# hbase.hregion.memstore.flush.size *
# dfs.replication *
# hbase.regionserver.global.memstore.lowerLimit
# See: http://hadoop-hbase.blogspot.it/2013/01/hbase-region-server-memory-sizing.html
#
hbase_user: hbase
hbase_conf_dir: '/etc/hbase/conf.{{ hadoop_cluster_name|lower }}'
# HBASE heap size
hbase_master_heap_size: 5120
hbase_thrift_heap_size: 1024
hbase_regionserver_heap_size: 4500
hbase_master_java_opts: '-Xmx{{ hbase_master_heap_size }}M'
hbase_regionserver_maxdirectmemory_size: "-XX:MaxDirectMemorySize=2G"
hbase_regionserver_java_opts: '-Xmx{{ hbase_regionserver_heap_size }}M'
hbase_thrift_java_opts: '-Xmx{{ hbase_thrift_heap_size }}M'
hbase_zookeeper_java_opts: -Xmx1G
hbase_thrift_port: 9090
hbase_thrift_jmx_port: 9591
# hbase zookeeper timeout
hbase_zookeeper_timeout: '{{ zookeeper_max_timeout }}'
# rpc timeout needs to be greater than lease period
# See http://hbase.apache.org/book/trouble.client.html
hbase_rpc_timeout: 600000
hbase_lease_period: 400000
hbase_open_files: 65536
hbase_master_rpc_port: 60000
hbase_master_http_port: 60010
hbase_regionserver_http_port: 60030
hbase_regionserver_http_1_port: 60020
# This is controversial. When set to 'true' the HBase balancer balances
# each table separately, without paying attention to the global balancing
hbase_loadbalance_bytable: True
# Default is 0.2
hbase_regions_slop: 0.15
# Default is 10. The recommendation is to keep it low when the payload per request grows
# We have mixed payloads.
hbase_handler_count: 12
# Default was 256M. It's 10737418240 (10GB) since 0.94
# The recommendation is to have it big to decrease the total number of regions
# 1288490188 is circa 1.2GB
hbase_hregion_max_file_size: 1288490188
hbase_hregion_memstore_mslab_enabled: True
# The default is 134217728 (128MB). We set it to 256MB
hbase_hregion_memstore_flush_size: 268435456
# The default is 0.4
hbase_regionserver_global_memstore_lowerLimit: 0.35
#
hbase_regionserver_global_memstore_upperLimit: 0.45
hbase_hregion_memstore_block_multiplier: 3
# HBASE thrift server
hbase_thrift_server_1: '{{ hbase_master_1_hostname }}'
hbase_thrift_server_2: '{{ hbase_master_2_hostname }}'
#
# nginx is used as a reverse proxy to all the web interfaces
#
nginx_use_ldap_pam_auth: True
nginx_pam_svc_name: nginx
nginx_ldap_uri: '{{ hadoop_ldap_uri }}'
nginx_ldap_base_dn: '{{ hadoop_ldap_base_dn }}'
portal_nginx_conf: management-portal
portal_pam_svc_name: '{{ nginx_pam_svc_name }}'
portal_title: "NeMIS Hadoop Cluster"
portal_web_root: /usr/share/nginx/www
#
# OOZIE and HIVE DB data
#
oozie_db_type: postgresql
oozie_db_name: oozie
oozie_db_user: oozie
oozie_db_host: db.t.hadoop.research-infrastructures.eu
hive_db_type: '{{ oozie_db_type }}'
hive_db_name: hive
hive_db_user: hive
hive_db_host: '{{ oozie_db_host }}'
hive_metastore_db_type: '{{ oozie_db_type }}'
hive_metastore_db_name: metastore
hive_metastore_db_user: metastore
hive_metastore_db_host: '{{ oozie_db_host }}'
hue_db_type: '{{ oozie_db_type }}'
hue_db_name: hue
hue_db_user: hue
hue_db_host: '{{ oozie_db_host }}'
hue_http_port: 8888
oozie_ip: 146.48.123.66
hive_ip: '{{ oozie_ip }}'
hue_ip: '{{ oozie_ip }}'
# Iptables
other_networks:
# Marek
icm_pl: 213.135.59.0/24
# eri.katsari
icm_pl_1: 195.134.66.216/32
# Antonis' addresses; they need to reach hdfs and zookeeper (ARC). Also Glykeria Katsari
ilsp_gr: [ '194.177.192.226/32', '194.177.192.223/32', '195.134.66.96/32', '194.177.192.218/32', '194.177.192.231/32', '195.134.66.216/32', '195.134.66.145/32', '194.177.192.118/32', '195.134.66.244' ]
# Needed by marek. It's the IIS cluster gateway.
iis_pl_1: 213.135.60.74/32
# Jochen
icm_1: 129.70.43.118/32
monitoring_group_name: hadoop-cluster
nagios_local_plugins_dir: /usr/lib/nagios/plugins/hadoop
nagios_common_lib: check_library.sh
nagios_monitoring_dir: '/etc/nagios3/objects/{{ monitoring_group_name }}'
nagios_root_disk: /
nagios_check_disk_w: 10%
nagios_check_disk_c: 7%
nagios_service_contacts:
- andrea.dellamico
- claudio.atzori
nagios_contactgroup: hadoop-managers
nagios_monitoring_server_ip: 146.48.123.23
iptables_default_policy: REJECT

@ -0,0 +1,27 @@
---
# Ganglia
ganglia_unicast_mode: False
ganglia_gmond_jobtracker_cluster: "Openaire+ Hadoop Cluster - Jobtrackers"
ganglia_gmond_namenode_cluster: "Openaire+ Hadoop Cluster - HDFS namenodes"
ganglia_gmond_hbmaster_cluster: "Openaire+ Hadoop Cluster - HBASE masters"
ganglia_gmond_workers_cluster: "Openaire+ Hadoop Cluster - Worker nodes"
ganglia_gmond_cluster: '{{ ganglia_gmond_workers_cluster }}'
#
# To play nice with iptables
ganglia_gmond_mcast_addr: 239.2.11.0
ganglia_gmond_cluster_port: "8640:8660"
# jmx ports
hadoop_namenode_jmx_port: 10103
hadoop_secondary_namenode_jmx_port: 10104
hadoop_datanode_jmx_port: 10105
hadoop_balancer_jmx_port: 10106
hadoop_jobtracker_jmx_port: 10107
hbase_master_jmx_port: 10101
hbase_regionserver_jmx_port: 10102
hbase_thrift_jmx_port: 10109
hbase_zookeeper_jmx_port: 10110
zookeeper_jmx_port: 10108

@ -0,0 +1,32 @@
---
# jmx ports
hadoop_namenode_jmx_port: 10103
hadoop_secondary_namenode_jmx_port: 10104
hadoop_datanode_jmx_port: 10105
hadoop_balancer_jmx_port: 10106
hadoop_jobtracker_jmx_port: 10107
hbase_master_jmx_port: 10101
hbase_regionserver_jmx_port: 10102
hbase_thrift_jmx_port: 10109
hbase_zookeeper_jmx_port: 10110
zookeeper_jmx_port: 10108
#
# Used by nagios
hadoop_plugins_dir: /usr/lib/nagios/plugins/hadoop
root_disk: /dev/xvda2
data_disk: /dev/xvda3
root_disk_warn: 20%
disk_warn: '{{ root_disk_warn }}'
root_disk_crit: 10%
disk_crit: '{{ root_disk_crit }}'
data_disk_warn: 7%
data_disk_crit: 4%
hbase_check_user: hbasecheck
hbase_check_timeout: 560
hdfs_warn: 90
hdfs_crit: 95
nagios_proclist_red: '{{ redprocs }}'
nagios_proclist_yellow: '{{ yellowprocs }}'
nagios_nrpe_port: 5666

@ -0,0 +1,23 @@
---
#
# The OOZIE users are a subset of the hdfs users.
#
hadoop_users:
- { login: 'marek.horst', name: "Marek Horst", ssh_key: '{{ marek_horst }}', shell: '/bin/bash' }
- { login: 'claudio.atzori', name: "Claudio Atzori", ssh_key: '{{ claudio_atzori }}', shell: '/bin/bash' }
- { login: 'sandro.labruzzo', name: "Sandro Labruzzo", ssh_key: '{{ sandro_labruzzo }}', shell: '/bin/bash' }
- { login: 'michele.artini', name: "Michele Artini", ssh_key: '{{ michele_artini }}', shell: '/bin/bash' }
- { login: 'alessia.bardi', name: "Alessia Bardi", ssh_key: '{{ alessia_bardi }}', shell: '/bin/bash' }
- { login: 'andrea.mannocci', name: "Andrea Mannocci", ssh_key: '{{ andrea_mannocci }}', shell: '/bin/bash' }
- { login: 'andrea.dellamico', name: "Andrea Dell'Amico", ssh_key: '{{ andrea_dellamico }}', shell: '/bin/bash' }
- { login: 'giorgos.alexiou', name: "Giorgos Alexiou", ssh_key: '{{ giorgos_alexiou }}', shell: '/bin/bash' }
- { login: 'antonis.lempesis', name: "Antonis Lempesis", ssh_key: '{{ antonis_lempesis }}', shell: '/bin/bash' }
- { login: 'dnet' }
- { login: 'claudio' }
- { login: 'michele' }
- { login: 'sandro' }
- { login: 'alessia' }
- { login: 'andrea' }
- { login: 'adellam' }
- { login: 'hbasecheck' }

@ -0,0 +1,6 @@
$ANSIBLE_VAULT;1.1;AES256
63613435386665626236306331353063626137386531346461646463623436376232303461653436
3934313830326366373364396630356630623935633230360a646439346530363762363966643534
30373331666537666266353666333632616465666331383231356661633838633432656536653233
3738636134393763650a623637326339653932323563346336366433333732373733656532353137
36306364343430303535373961646632656535666162363862613036356461343865

@ -0,0 +1,10 @@
$ANSIBLE_VAULT;1.1;AES256
39646636653439616665643935326563653435646462306639646266376232633436393834643933
3364336430396530646637383438663037366362663135320a373065343862653035653838323739
61646135626431643330363963666433303737663464396663353632646339653562666162393034
3363383435346364310a356439323431343336366635306461613462663436326431383266366231
39636262313038366135316331343939373064356336356239653631633435613736306131656363
37613864353931396435353431633765623330663266646666643632626666643436623939303538
34343461383338663466303131663336326230666532326335373862636437343739336136616435
35653763353436383537633932316434303539373237336161303165353962356336666161323765
6336

@ -0,0 +1,18 @@
---
psql_version: 9.1
psql_db_host: localhost
psql_db_data:
- { name: '{{ oozie_db_name }}', encoding: 'UTF8', user: '{{ oozie_db_user }}', roles: 'CREATEDB,NOSUPERUSER', pwd: '{{ psql_db_pwd }}', allowed_hosts: [ '{{ oozie_ip }}/32' ] }
- { name: '{{ hue_db_name }}', encoding: 'UTF8', user: '{{ hue_db_user }}', roles: 'CREATEDB,NOSUPERUSER', pwd: '{{ psql_db_pwd }}', allowed_hosts: [ '{{ hue_ip }}/32' ] }
- { name: '{{ hive_metastore_db_name }}', encoding: 'UTF8', user: '{{ hive_metastore_db_user }}', roles: 'CREATEDB,NOSUPERUSER', pwd: '{{ psql_db_pwd }}', allowed_hosts: [ '{{ hive_ip }}/32' ] }
psql_listen_on_ext_int: True
pg_backup_pgdump_bin: /usr/lib/postgresql/9.1/bin/pg_dump
pg_backup_retain_copies: 10
pg_backup_build_db_list: "no"
pg_backup_db_list: "'{{ oozie_db_name }}' '{{ hue_db_name }}' '{{ hive_metastore_db_name }}'"
pg_backup_destdir: /data/pgsql/backups
pg_backup_logfile: '{{ pg_backup_logdir }}/postgresql-backup.log'
pg_backup_use_nagios: "yes"
user_ssh_key: [ '{{ claudio_atzori }}' ]

@ -0,0 +1,2 @@
---
user_ssh_key: [ '{{ claudio_atzori }}', '{{ hadoop_test_cluster }}', '{{ sandro_labruzzo }}' ]

@ -0,0 +1,18 @@
---
#
# The hadoop logs are now sent to logstash directly by log4j
# - adellam 2015-02-04
#
# the log_state_file names must be unique when using the old rsyslog syntax. In the new one
# they are not used
# rsys_logfiles:
# - { logfile: '/var/log/hadoop-0.20-mapreduce/hadoop-{{ hadoop_cluster_name }}-jobtrackerha-{{ ansible_hostname }}.log', log_tag: 'hadoop-jobtracker', log_state_file: 'hadoop-jobtracker'}
# - { logfile: '/var/log/hadoop-0.20-mapreduce/hadoop-{{ hadoop_cluster_name }}-mrzkfc-{{ ansible_hostname }}.log', log_tag: 'hadoop-jt-mrzkfc', log_state_file: 'hadoop-jt-mrzkfc'}
# - { logfile: '/var/log/hadoop-0.20-mapreduce/mapred-audit.log', log_tag: 'hadoop-mapred-audit', log_state_file: 'hadoop-mapred-audit'}
# - { logfile: '/var/log/hadoop-hdfs/hadoop-{{ hadoop_cluster_name }}-namenode-{{ ansible_hostname }}.log', log_tag: 'hadoop-hdfs-namenode', log_state_file: 'hadoop-hdfs-namenode'}
# - { logfile: '/var/log/hadoop-hdfs/hadoop-{{ hadoop_cluster_name }}-zkfc-{{ ansible_hostname }}.log', log_tag: 'hadoop-hdfs-zkfc', log_state_file: 'hadoop-hdfs-zkfc'}
# - { logfile: '/var/log/hadoop-hdfs/hadoop-{{ hadoop_cluster_name }}-journalnode-{{ ansible_hostname }}.log', log_tag: 'hadoop-hdfs-journal', log_state_file: 'hadoop-hdfs-journal'}
# - { logfile: '/var/log/hbase/hbase.log', log_tag: 'hbase-master-log', log_state_file: 'hbase-master-log'}
# - { logfile: '/var/log/hbase/hbase-hbase-master-{{ ansible_hostname }}.log', log_tag: 'hbase-master-ha', log_state_file: 'hbase-master-ha'}
# - { logfile: '/var/log/hbase/hbase-hbase-thrift-{{ ansible_hostname }}.log', log_tag: 'hbase-thrift', log_state_file: 'hbase-thrift'}
# - { logfile: '{{ zookeeper_log_dir }}/zookeeper.log', log_tag: 'hadoop-zookeeper', log_state_file: 'hadoop-zookeeper'}

@ -0,0 +1,6 @@
---
# Ganglia gmond port
ganglia_gmond_cluster: '{{ ganglia_gmond_workers_cluster }}'
ganglia_gmond_cluster_port: '{{ ganglia_gmond_hdfs_datanodes_port }}'
ganglia_gmond_mcast_address: '{{ ganglia_gmond_workers_mcast_addr }}'

@ -0,0 +1,10 @@
---
iptables:
tcp_rules: True
tcp:
- { port: '{{ hdfs_datanode_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}' ] }
- { port: '{{ hdfs_datanode_ipc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}' ] }
- { port: '{{ hdfs_datanode_rpc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ mapred_tasktracker_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hbase_regionserver_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hbase_regionserver_http_1_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }

@ -0,0 +1,10 @@
---
#
# The hadoop logs are now sent to logstash directly by log4j
# - adellam 2015-02-04
#
# IMPORTANT: the log_state_file names must be unique
# rsys_logfiles:
# - { logfile: '/var/log/hadoop-0.20-mapreduce/hadoop-{{ hadoop_cluster_name }}-tasktracker-{{ ansible_hostname }}.log', log_tag: 'hadoop-tasktracker', log_state_file: 'hadoop-tasktracker'}
# - { logfile: '/var/log/hadoop-hdfs/hadoop-{{ hadoop_cluster_name }}-datanode-{{ ansible_hostname }}.log', log_tag: 'hadoop-hdfs-datanode', log_state_file: 'hadoop-hdfs-datanode'}
# - { logfile: '/var/log/hbase/hbase-hbase-regionserver-{{ ansible_hostname }}.log', log_tag: 'hbase-regionserver', log_state_file: 'hbase-regionserver'}

@ -0,0 +1,6 @@
---
# Ganglia gmond port
ganglia_gmond_cluster: '{{ ganglia_gmond_hbmaster_cluster }}'
ganglia_gmond_cluster_port: '{{ ganglia_gmond_hbmaster_port }}'
ganglia_gmond_mcast_address: '{{ ganglia_gmond_hbmaster_mcast_addr }}'

@ -0,0 +1,12 @@
---
iptables:
tcp_rules: True
tcp:
- { port: '{{ hbase_master_rpc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ hbase_master_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ hbase_thrift_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hdfs_journal_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hdfs_journal_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_leader_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_quorum_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_client_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }

@ -0,0 +1,6 @@
---
# Ganglia gmond port
ganglia_gmond_cluster: '{{ ganglia_gmond_namenode_cluster }}'
ganglia_gmond_cluster_port: '{{ ganglia_gmond_namenode_port }}'
ganglia_gmond_mcast_address: '{{ ganglia_gmond_namenode_mcast_addr }}'

@ -0,0 +1,13 @@
---
iptables:
tcp_rules: True
tcp:
- { port: '{{ hdfs_nn_rpc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ hdfs_nn_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ hdfs_nn_client_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hdfs_zkfc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hdfs_journal_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hdfs_journal_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_leader_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_quorum_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_client_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }

@ -0,0 +1,6 @@
---
# Ganglia gmond port
ganglia_gmond_cluster: '{{ ganglia_gmond_jobtracker_cluster }}'
ganglia_gmond_cluster_port: '{{ ganglia_gmond_jobtracker_port }}'
ganglia_gmond_mcast_address: '{{ ganglia_gmond_jobtracker_mcast_addr }}'

@ -0,0 +1,22 @@
---
iptables:
tcp_rules: True
tcp:
- { port: '80:95' }
- { port: '8100:8150' }
- { port: '8200:8250' }
- { port: '8300:8350' }
- { port: '8400:8450' }
- { port: '{{ jobtracker_cluster_id1_rpc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ jobtracker_cluster_id2_rpc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ jobtracker_cluster_id1_ha_rpc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ jobtracker_cluster_id2_ha_rpc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ jobtracker_cluster_id1_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ jobtracker_cluster_id2_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.icm_1 }}' ] }
- { port: '{{ jobtracker_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ jobtracker_zkfc_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hdfs_journal_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ hdfs_journal_http_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_leader_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_quorum_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '{{ zookeeper_client_port }}', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.ilsp_gr }}', '{{ other_networks.iis_pl_1 }}', '{{ other_networks.icm_1 }}' ] }

@ -0,0 +1,52 @@
---
user_ssh_key: [ '{{ claudio_atzori }}' ]
# logstash
logstash_collector_host: logstash.t.hadoop.research-infrastructures.eu
logstash_collector_listen_port: 5544
logstash_version: 1.3.3
logstash_file: 'logstash-{{ logstash_version }}-flatjar.jar'
logstash_url: 'download.elasticsearch.org/logstash/logstash/{{ logstash_file }}'
logstash_install_dir: /opt/logstash
logstash_conf_dir: '{{ logstash_install_dir }}/etc'
logstash_lib_dir: '{{ logstash_install_dir }}/share'
logstash_log_dir: /var/log/logstash
logstash_user: logstash
logstash_indexer_jvm_opts: "-Xms2048m -Xmx2048m"
kibana_nginx_conf: kibana
kinaba_nginx_root: /var/www/kibana/src
kibana_virtual_host: logs.t.hadoop.research-infrastructures.eu
elasticsearch_user: elasticsearch
elasticsearch_group: elasticsearch
elasticsearch_version: 1.0.0
elasticsearch_http_port: 9200
elasticsearch_transport_tcp_port: 9300
elasticsearch_download_path: download.elasticsearch.org/elasticsearch/elasticsearch
elasticsearch_cluster: hadoop-logstash
elasticsearch_node_name: logstash
elasticsearch_node_master: "true"
elasticsearch_node_data: "true"
elasticsearch_max_local_storage_nodes: 1
elasticsearch_log_dir: /var/log/elasticsearch
elasticsearch_heap_size: 5
elasticsearch_host: localhost
elasticsearch_curator_close_after: 10
elasticsearch_curator_retain_days: 20
elasticsearch_curator_optimize_days: 10
elasticsearch_curator_bloom_days: 7
elasticsearch_curator_timeout: 1200
elasticsearch_curator_manage_marvel: True
elasticsearch_disable_dynamic_scripts: True
# We use the nginx defaults here
nginx_use_ldap_pam_auth: True
iptables:
tcp_rules: True
tcp:
- { port: '{{ logstash.collector_listen_port }}', allowed_hosts: [ '{{ network.nmis }}' ] }
- { port: '{{ elasticsearch.http_port }}', allowed_hosts: [ '{{ ansible_fqdn }}', '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }
- { port: '80', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}' ] }

@ -0,0 +1,43 @@
---
user_ssh_key: [ '{{ claudio_atzori }}', '{{ michele_artini }}' ]
iptables:
tcp_rules: True
tcp:
- { port: '11000', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.icm_pl }}', '{{ other_networks.icm_pl_1 }}' ] }
- { port: '10000', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.icm_pl }}', '{{ other_networks.icm_pl_1 }}' ] }
- { port: '9083', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.icm_pl }}', '{{ other_networks.icm_pl_1 }}' ] }
- { port: '8888', allowed_hosts: [ '{{ network.isti }}', '{{ network.nmis }}', '{{ network.eduroam }}', '{{ other_networks.icm_pl }}', '{{ other_networks.icm_pl_1 }}', '0.0.0.0/0' ] }
oozie:
host: 'oozie.{{ dns_domain }}'
conf_dir: /etc/oozie/conf
user: oozie
catalina_work_dir: /usr/lib/oozie/oozie-server-0.20/work
http_port: 11000
#
# HIVE
#
hive:
host: 'hive.{{ dns_domain }}'
conf_dir: /etc/hive/conf
user: hive
metastore_port: 9083
server2_http_port: 10000
setugi: True
#
# HUE
#
hue:
user: hue
group: hue
host: 'hue.{{ dns_domain }}'
http_port: 8888
conf_dir: /etc/hue
hive_interface: hiveserver2
exec_path: /usr/share/hue/build/env/bin/hue
encoding: 'utf-8'
setuid_path: /usr/share/hue/apps/shell/src/shell/build/setuid

@ -0,0 +1,8 @@
$ANSIBLE_VAULT;1.1;AES256
35656164616131366466393935373064383333633237616435353030613234323463393363643961
6366343466396563666662396332666661636462313861630a376235623035633530656238623464
37636231343837363431396564363632343466306166343365356137646266656637313534353834
3561323334346135300a643731653463353564356332376162613864336539376530333534363032
36643532626433393939353030653762643636353331326565666164343761393533623461383165
33313736346537373364646332653538343034376639626335393065346637623664303264343237
326630336139303531346238383733633335

@ -0,0 +1,17 @@
---
- hosts: hadoop_worker_nodes:hadoop_masters
remote_user: root
max_fail_percentage: 10
serial: "25%"
# vars_files:
# - ../library/vars/isti-global.yml
roles:
- common
- cdh_common
- chkconfig
- hadoop_common
- hadoop_config
- hadoop_zookeeper
- hadoop_zookeeper_config

@ -0,0 +1,10 @@
[zookeeper_cluster]
quorum0.t.hadoop.research-infrastructures.eu zoo_id=0
quorum1.t.hadoop.research-infrastructures.eu zoo_id=1
quorum2.t.hadoop.research-infrastructures.eu zoo_id=2
quorum3.t.hadoop.research-infrastructures.eu zoo_id=3
quorum4.t.hadoop.research-infrastructures.eu zoo_id=4
[monitoring]
monitoring.research-infrastructures.eu

@ -0,0 +1,14 @@
---
- hosts: monitoring
user: root
vars_files:
- ../library/vars/isti-global.yml
roles:
- nagios-server
- hosts: hadoop_cluster:other_services:db
user: root
vars_files:
- ../library/vars/isti-global.yml
roles:
- nagios-monitoring

@ -0,0 +1,4 @@
---
dependencies:
- { role: ../library/roles/openjdk }
- role: '../../library/roles/ssh-keys'

@ -0,0 +1,14 @@
---
- name: Install the common CDH hadoop packages
apt: pkg={{ item }} state={{ hadoop_pkg_state }}
with_items:
- hadoop
- hadoop-0.20-mapreduce
- hadoop-client
- hadoop-hdfs
- hadoop-mapreduce
tags:
- hadoop
- mapred
- hdfs

@ -0,0 +1,23 @@
---
- name: Install the D-NET repository key
action: apt_key url=http://ppa.research-infrastructures.eu/dnet/keys/dnet-archive.asc
tags:
- hadoop
- cdh
- name: Install the CDH repository key
action: apt_key url=http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/archive.key
tags:
- hadoop
- cdh
- apt_repository: repo='{{ item }}' update_cache=yes
with_items:
- deb http://ppa.research-infrastructures.eu/dnet lucid main
- deb http://ppa.research-infrastructures.eu/dnet unstable main
- deb [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4 contrib
- deb [arch=amd64] http://archive.cloudera.com/gplextras/ubuntu/precise/amd64/gplextras precise-gplextras4 contrib
register: update_apt_cache
tags:
- hadoop
- cdh

@ -0,0 +1,4 @@
---
- import_tasks: cdh-setup.yml
- import_tasks: cdh-pkgs.yml
# See meta/main.yml for the involved library playbooks

@ -0,0 +1,4 @@
---
dependencies:
- role: '../../library/roles/ubuntu-deb-general'
- { role: '../../library/roles/iptables', when: iptables is defined }

@ -0,0 +1,2 @@
---
# See meta/main.yml for the involved library playbooks

@ -0,0 +1,3 @@
---
dependencies:
- role: '../../library/roles/ganglia'

@ -0,0 +1,31 @@
---
# See meta/main.yml for the basic installation and configuration steps
# The hadoop conf directory always exists
- name: Distribute the ganglia hadoop metrics properties
template: src={{ item }}.j2 dest={{ hadoop_conf_dir }}/{{ item }} owner=root group=root mode=444
with_items:
- hadoop-metrics.properties
- hadoop-metrics2.properties
tags: [ 'monitoring', 'ganglia', 'ganglia_conf' ]
- name: Check if the hbase conf directory exists
stat: path={{ hbase_conf_dir }}
register: check_hbase_confdir
tags: [ 'monitoring', 'ganglia', 'ganglia_conf' ]
- name: Distribute the ganglia hbase metrics properties
template: src={{ item }}.properties.j2 dest={{ hbase_conf_dir }}/{{ item }}-hbase.properties owner=root group=root mode=444
with_items:
- hadoop-metrics
- hadoop-metrics2
when: check_hbase_confdir.stat.exists
tags: [ 'monitoring', 'ganglia', 'ganglia_conf' ]
- name: Distribute the ganglia hbase metrics properties, maintain the old file name
file: src={{ hbase_conf_dir }}/{{ item }}-hbase.properties dest={{ hbase_conf_dir }}/{{ item }}.properties state=link force=yes
with_items:
- hadoop-metrics
- hadoop-metrics2
when: check_hbase_confdir.stat.exists
tags: [ 'monitoring', 'ganglia', 'ganglia_conf' ]

@ -0,0 +1,96 @@
# Configuration of the "dfs" context for null
dfs.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "dfs" context for file
#dfs.class=org.apache.hadoop.metrics.file.FileContext
#dfs.period=10
#dfs.fileName=/tmp/dfsmetrics.log
# Configuration of the "dfs" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# dfs.period=10
# dfs.servers=localhost:8649
# Configuration of the "mapred" context for null
mapred.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "mapred" context for file
#mapred.class=org.apache.hadoop.metrics.file.FileContext
#mapred.period=10
#mapred.fileName=/tmp/mrmetrics.log
# Configuration of the "mapred" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# mapred.period=10
# mapred.servers=localhost:8649
# Configuration of the "jvm" context for null
#jvm.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "jvm" context for file
#jvm.class=org.apache.hadoop.metrics.file.FileContext
#jvm.period=10
#jvm.fileName=/tmp/jvmmetrics.log
# Configuration of the "jvm" context for ganglia
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# jvm.period=10
# jvm.servers=localhost:8649
# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "rpc" context for file
#rpc.class=org.apache.hadoop.metrics.file.FileContext
#rpc.period=10
#rpc.fileName=/tmp/rpcmetrics.log
# Configuration of the "rpc" context for ganglia
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# rpc.period=10
# rpc.servers=localhost:8649
# Configuration of the "ugi" context for null
ugi.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "ugi" context for file
#ugi.class=org.apache.hadoop.metrics.file.FileContext
#ugi.period=10
#ugi.fileName=/tmp/ugimetrics.log
# Configuration of the "ugi" context for ganglia
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# ugi.period=10
# ugi.servers=localhost:8649
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# dfs.period=10
# dfs.servers={{ hdfs_namenode_1_hostname }}:{{ ganglia_gmond_namenode_port }}
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# mapred.period=10
# mapred.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
# hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# hbase.period=10
# hbase.servers={{ hbase_master_1_hostname }}:{{ ganglia_gmond_cluster_port }}
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# jvm.period=10
# jvm.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# rpc.period=10
# rpc.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# ugi.period=10
# ugi.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
# fairscheduler.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# fairscheduler.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}

@ -0,0 +1,34 @@
# Ganglia 3.1+ support
*.period=60
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# default for supportsparse is false
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers={{ hdfs_namenode_1_hostname }}:{{ ganglia_gmond_namenode_port }},{{ hdfs_namenode_2_hostname }}:{{ ganglia_gmond_namenode_port }}
datanode.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
jobtracker.sink.ganglia.servers={{ jobtracker_node_1_hostname }}:{{ ganglia_gmond_jobtracker_port }},{{ jobtracker_node_2_hostname }}:{{ ganglia_gmond_jobtracker_port }}
#tasktracker.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_tasktracker_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_tasktracker_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_tasktracker_port }}
#maptask.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_maptask_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_maptask_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_maptask_port }}
#reducetask.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_reducetask_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_reducetask_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_reducetask_port }}
tasktracker.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
maptask.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
reducetask.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
hbase.extendedperiod = 3600
hbase.sink.ganglia.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
hbase.servers=node2.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node5.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }},node11.{{ dns_domain }}:{{ ganglia_gmond_hdfs_datanodes_port }}
#hbase.sink.ganglia.servers={{ hbase_master_1_hostname }}:{{ ganglia_gmond_hbmaster_port }},{{ hbase_master_2_hostname }}:{{ ganglia_gmond_hbmaster_port }}
#hbase.servers={{ hbase_master_1_hostname }}:{{ ganglia_gmond_hbmaster_port }},{{ hbase_master_2_hostname }}:{{ ganglia_gmond_hbmaster_port }}
#resourcemanager.sink.ganglia.servers=
#nodemanager.sink.ganglia.servers=
#historyserver.sink.ganglia.servers=
#journalnode.sink.ganglia.servers=
#nimbus.sink.ganglia.servers=
#supervisor.sink.ganglia.servers=
#resourcemanager.sink.ganglia.tagsForPrefix.yarn=Queue

@ -0,0 +1,43 @@
---
- name: Directory for hdfs root under /data
file: dest={{ hdfs_data_dir }} state=directory
tags:
- hadoop
- mapred
- hdfs
# TODO: split and move to more specific roles.
- name: Directories for the hdfs services
file: dest={{ hdfs_data_dir }}/{{ item }} state=directory owner=hdfs group=hdfs mode=700
with_items:
- '{{ hdfs_dn_data_dir }}'
- '{{ hdfs_journal_data_dir }}'
tags:
- hadoop
- mapred
- hdfs
- name: Base directory for mapred under /data
file: dest=/data/mapred state=directory
tags:
- hadoop
- mapred
- hdfs
- name: Directories for mapred under /data/mapred
file: dest=/data/mapred/{{ item }} state=directory owner=mapred group=hadoop mode=700
with_items:
- jt
- local
tags:
- hadoop
- mapred
- hdfs
- name: JMX secrets directory
file: dest=/etc/hadoop-jmx/conf state=directory owner=hdfs group=root mode=0750
when: hadoop_jmx_enabled
tags:
- hadoop
- jmx

@ -0,0 +1,8 @@
node13.t.hadoop.research-infrastructures.eu
node12.t.hadoop.research-infrastructures.eu
node11.t.hadoop.research-infrastructures.eu
node10.t.hadoop.research-infrastructures.eu
node9.t.hadoop.research-infrastructures.eu
node8.t.hadoop.research-infrastructures.eu
node7.t.hadoop.research-infrastructures.eu
node6.t.hadoop.research-infrastructures.eu

@ -0,0 +1,39 @@
---
- name: Restart HDFS namenode
service: name=hadoop-hdfs-namenode state=restarted sleep=20
ignore_errors: true
- name: Restart HDFS journalnode
service: name=hadoop-hdfs-journalnode state=restarted sleep=20
ignore_errors: true
- name: Restart HDFS datanode
service: name=hadoop-hdfs-datanode state=restarted sleep=20
ignore_errors: true
- name: Restart HDFS httpfs