How to set up a Spark stand-alone cluster on Linux


I have been exposed to Spark lately, which has resulted in this second post about it.

This post assumes that you know the fundamentals of Spark. If not, you may want to go here first.

There are several alternatives for setting up a Spark cluster, of which the most basic is stand-alone mode, which does not require any external cluster-management tools. Stand-alone mode is generally sufficient for small clusters of up to 10 nodes.

Now, before you get bored, let's start with the cluster set-up:

For this example we will assume that we have three nodes with the host-names Node1, Node2 and Node3.

1) Download and install Spark from here on all nodes. For the installation process you can refer to this post. You can skip this step if you have already installed Spark on your nodes.
Note: Your Spark home directory path must be the same on all the nodes. If it's not, you can create a symbolic link to match the directory on all nodes.
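The symbolic-link trick can be sketched like this. The paths are hypothetical: on a real node you would link your actual install directory to one common path, e.g. sudo ln -s /opt/spark-1.6.0 /usr/local/spark. The demo below uses a throwaway directory instead, so it is safe to run anywhere:

```shell
# Hypothetical layout: the real install lives somewhere node-specific,
# but every node should expose Spark at one common path.
# Demonstrated with a throwaway directory instead of a real install:
real_install=$(mktemp -d)            # stands in for e.g. /opt/spark-1.6.0
common_path="${real_install}.link"   # stands in for e.g. /usr/local/spark
ln -s "$real_install" "$common_path"
readlink "$common_path"              # prints the directory the link resolves to
```

After linking, every node can use the common path as <spark-home> even though the real install directories differ.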

2) If you have not configured your host-names yet, add the respective host name to the /etc/hostname file on each node.
For example, on Node1 the above file should have the following content:

Node1

Restart your nodes after this change.

3) Configuring the hosts file of each node:
Append the following lines to the /etc/hosts file:

<IP of Node1> Node1
<IP of Node2> Node2
<IP of Node3> Node3

Replace <IP of NodeX> with the IP address of the respective node.

Ping the nodes from each other using the host names (i.e. Node1, Node2 etc.) to make sure that the above configuration is OK.
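Besides pinging by hand, you can sanity-check the name resolution from a small script; getent hosts looks names up through the system resolver, so it exercises the /etc/hosts entries added above (the host names are the ones assumed in this post):

```shell
# Check that every cluster host name resolves; complements the manual ping test.
resolve_check() {
    for h in "$@"; do
        if addr=$(getent hosts "$h"); then
            echo "$h -> $(echo "$addr" | awk '{print $1}')"
        else
            echo "$h: not resolvable"
        fi
    done
}

resolve_check Node1 Node2 Node3
```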

4) Setting up a password-less SSH connection between the master and slave nodes:

Suppose I want to make Node1 the master node; then we need to set up a password-less SSH connection between Node1 and the other slave nodes.

You can do this using the following commands:

Generating an SSH key pair:

Fire this command on all three nodes:

ssh-keygen -t rsa
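If you are scripting the setup, key generation can also be done non-interactively; a sketch that writes into a throwaway directory so it won't touch your real ~/.ssh (on a real node you would drop -f and accept the default ~/.ssh/id_rsa location):

```shell
# Non-interactive key generation: -N "" sets an empty passphrase, -q is quiet.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$keydir/id_rsa" -q
ls "$keydir"   # the private key and its .pub counterpart
```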
Copying the public key to the remote hosts:

On Node1, which is the master node:

ssh-copy-id -i ~/.ssh/id_rsa.pub Node1

ssh-copy-id -i ~/.ssh/id_rsa.pub Node2

ssh-copy-id -i ~/.ssh/id_rsa.pub Node3

Fire the following command from all nodes, including Node1:

ssh-copy-id -i ~/.ssh/id_rsa.pub Node1
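To confirm that the password-less logins actually work, here is a small check you can run from any node (the host names are the ones used above; BatchMode=yes makes ssh fail instead of prompting for a password, and ConnectTimeout keeps failures fast):

```shell
# Verify password-less SSH from this node to each cluster host.
ssh_check() {
    for h in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
            echo "$h: password-less ssh OK"
        else
            echo "$h: password-less ssh FAILED"
        fi
    done
}

ssh_check Node1 Node2 Node3
```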

5) Configuring the Spark config files:

a) spark-env.sh

Go to your <spark-home>/conf directory.

You will find a file named spark-env.sh.template. Rename it to spark-env.sh and add the following line to this file:

export SPARK_MASTER_IP=<IP of Node1>

Replace <IP of Node1> with the IP address of Node1 (the master).

Do this on all nodes.

b) slaves

Under the <spark-home>/conf directory, rename slaves.template to slaves.

Add the following lines to the slaves file (every host listed here will run a worker, including the master):

Node1
Node2
Node3

Again, do this on all the other nodes.
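If you'd rather not edit the conf files by hand on every machine, here is a sketch of pushing them from the master instead. It assumes the password-less SSH from step 4, and that <spark-home> is /usr/local/spark everywhere; both are assumptions to adjust for your cluster:

```shell
# Push spark-env.sh and slaves from the master to the other nodes.
push_conf() {
    conf_dir=$1; shift
    for h in "$@"; do
        if scp -o ConnectTimeout=5 "$conf_dir/spark-env.sh" "$conf_dir/slaves" \
               "$h:$conf_dir/" 2>/dev/null; then
            echo "$h: conf copied"
        else
            echo "$h: copy FAILED"
        fi
    done
}

push_conf /usr/local/spark/conf Node2 Node3
```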

6) Booting up the cluster:

Go to the <spark-home>/sbin directory on the master node and fire the following command:

./start-all.sh

On successful completion of the above command your cluster should be up and running, and you can monitor it from the following URL:

http://<IP of master node>:8080
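If you prefer to script the check, here is a small sketch; the host name and the default web-UI port 8080 are as above, and curl -sf treats any HTTP or connection error as failure:

```shell
# Probe the Spark master web UI; prints "up" or "down".
master_ui_check() {
    if curl -sf --max-time 5 "http://$1:8080/" >/dev/null 2>&1; then
        echo "master UI at $1 is up"
    else
        echo "master UI at $1 is down"
    fi
}

master_ui_check Node1   # or the master node's IP address
```

Once the workers appear in the UI, you can also attach a shell to the cluster with <spark-home>/bin/spark-shell --master spark://<IP of Node1>:7077 (7077 is the stand-alone master's default port).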

Feel free to throw your doubts, troubles, suggestions in the comment section.
