Configure Hadoop Using Ansible Playbook


Apache Hadoop is an open-source framework for storing and processing large datasets, ranging in size from gigabytes to petabytes. Instead of relying on one large computer, Hadoop clusters multiple computers so that massive datasets can be analyzed in parallel, and therefore more quickly.


DataNodes are the slave nodes in HDFS; they store the actual file data. Unlike the NameNode, a DataNode can run on commodity hardware, that is, an inexpensive system that is not built for high quality or high availability.


NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself.

Our task is to write an Ansible playbook that configures the DataNodes and the NameNode in the slave and master host groups, respectively.

I will use two DataNodes and one NameNode on AWS, plus one Ansible controller node running in a virtual machine.
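
To make the two host groups concrete, here is a minimal sketch of how they could be laid out in a YAML inventory. The host aliases, IP addresses, login user, and key path are all placeholders I made up for illustration; substitute the values of your own instances.

```yaml
# inventory.yml - illustrative only; every address, user, and path is hypothetical
all:
  vars:
    ansible_user: ec2-user                           # assumption: default user of the AMI
    ansible_ssh_private_key_file: ~/.ssh/hadoop.pem  # hypothetical key path
  children:
    master:              # the single NameNode
      hosts:
        namenode:
          ansible_host: 203.0.113.10
    slave:               # the two DataNodes
      hosts:
        datanode1:
          ansible_host: 203.0.113.11
        datanode2:
          ansible_host: 203.0.113.12
```

Running `ansible-inventory -i inventory.yml --graph` should print the master and slave groups with their hosts, which is a quick way to confirm that the file parses as intended.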

  • Launch three instances on AWS.
  • Add the three hosts to the Ansible inventory file.
  • Set up the Ansible configuration file.
  • Run the playbook.
  • Review the output of the playbook run.
  • Finally, check the NameNode and the DataNodes to confirm that they have been configured.
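
The steps above can be sketched as a playbook along the following lines. This is not the original playbook, only a minimal illustration under stated assumptions: Hadoop 3.x and a JDK are already installed on every node, the `hdfs` command is on the PATH of the privileged user, the configuration directory is `/etc/hadoop`, and the directories, port, and NameNode address are placeholders.

```yaml
# hadoop.yml - illustrative sketch, not the author's playbook.
# Assumes Hadoop 3.x and a JDK are installed, hdfs is on PATH for root,
# and the Hadoop configuration directory is /etc/hadoop.
- name: Tell every node where the NameNode is
  hosts: "master:slave"
  become: true
  vars:
    hadoop_conf_dir: /etc/hadoop     # assumed configuration directory
    namenode_address: 10.0.0.10      # hypothetical private IP of the NameNode
    namenode_port: 9001              # hypothetical RPC port
  tasks:
    - name: Write core-site.xml
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        mode: "0644"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ namenode_address }}:{{ namenode_port }}</value>
            </property>
          </configuration>

- name: Configure and start the NameNode
  hosts: master
  become: true
  vars:
    hadoop_conf_dir: /etc/hadoop
    namenode_dir: /nn                # hypothetical metadata directory
  tasks:
    - name: Create the metadata directory
      ansible.builtin.file:
        path: "{{ namenode_dir }}"
        state: directory
        mode: "0755"

    - name: Point hdfs-site.xml at the metadata directory
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        mode: "0644"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.namenode.name.dir</name>
              <value>file://{{ namenode_dir }}</value>
            </property>
          </configuration>

    - name: Format the NameNode once (skipped when current/ already exists)
      ansible.builtin.command:
        cmd: hdfs namenode -format -nonInteractive
        creates: "{{ namenode_dir }}/current"

    - name: Start the NameNode daemon (not idempotent, may fail if already running)
      ansible.builtin.command:
        cmd: hdfs --daemon start namenode

- name: Configure and start the DataNodes
  hosts: slave
  become: true
  vars:
    hadoop_conf_dir: /etc/hadoop
    datanode_dir: /dn                # hypothetical block storage directory
  tasks:
    - name: Create the data directory
      ansible.builtin.file:
        path: "{{ datanode_dir }}"
        state: directory
        mode: "0755"

    - name: Point hdfs-site.xml at the data directory
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        mode: "0644"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.datanode.data.dir</name>
              <value>file://{{ datanode_dir }}</value>
            </property>
          </configuration>

    - name: Start the DataNode daemon (not idempotent, may fail if already running)
      ansible.builtin.command:
        cmd: hdfs --daemon start datanode
```

A playbook like this would be run with `ansible-playbook -i <inventory> hadoop.yml`. Afterwards, `hdfs dfsadmin -report` on the NameNode should list both DataNodes as live if the configuration took effect. On older Hadoop releases the property names and the daemon start command differ, so treat the names used here as belonging to the 3.x line only.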




Karan Agrawal
