Scaling Down an Elasticsearch Cluster

Jinna Balu
Apr 24, 2020

Elasticsearch must be resilient to the failures of individual nodes. It achieves this resilience by considering cluster-state updates to be successful after a quorum of nodes have accepted them. A quorum is a carefully-chosen subset of the master-eligible nodes in a cluster.

Quorums must be carefully chosen so that the cluster cannot elect two independent masters that make inconsistent decisions, which would ultimately lead to data loss. To know more, see [1] and [2].
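As a small worked example of the quorum idea: under a plain majority rule, n master-eligible nodes need floor(n/2) + 1 votes. The sketch below only illustrates that arithmetic. The function name quorum_size is hypothetical, and the majority rule is an assumption about how you size the quorum. In Elasticsearch 6.x and earlier this is the value usually given to discovery.zen.minimum_master_nodes, while 7.x and later manage the voting configuration automatically.

```shell
# Hypothetical helper: size of a majority quorum for N master-eligible nodes.
# $1 = number of master-eligible nodes
quorum_size() {
    # Shell integer division rounds down, so this is floor(N/2) + 1.
    echo $(( $1 / 2 + 1 ))
}
```

For example, `quorum_size 3` gives 2 and `quorum_size 5` gives 3. Note that `quorum_size 2` is also 2, so under this arithmetic a cluster with two master-eligible nodes cannot tolerate losing either of them.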

Preparations before scaling down

  • Back up your cluster to have something to restore if things go wrong down the line.
  • Check that the minimum master nodes setting in the master node configuration still satisfies the quorum described above.
  • Stop all writes to your cluster if you can, since failover may not be safe while the cluster is being scaled down. This is a precaution and not mandatory if all goes fine.
  • Make sure the smaller cluster still has enough disk space and memory. If a node's disk usage crosses the low disk watermark, Elasticsearch stops allocating shards to it, and at the flood-stage watermark the affected indices become read-only. See [3].
  • Bring the index replication factor down to 1 to save disk space and speed up shard relocation during scaling, since fewer shard copies need to be created and moved around:
curl -X PUT "localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "number_of_replicas" : 1
  }
}
'
  • Re-balance the cluster gracefully before you start scaling down.
  • The cluster needs to be healthy, with green status. Check both the cluster health and the shard states, as described in the sections that follow.
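The disk space point in the list above can be checked with some rough arithmetic before removing anything. This is a minimal sketch, assuming shards spread evenly across the remaining data nodes and the default low disk watermark of 85%. The function name fits_below_watermark and its parameters are hypothetical. Real per-node numbers can be read from `curl -X GET "localhost:9200/_cat/allocation?v"`.

```shell
# Hypothetical helper: will the data still fit below the low disk watermark?
# $1 = total data on disk across the cluster in GB (replicas included)
# $2 = disk capacity per data node in GB
# $3 = number of data nodes left after scaling down
# $4 = watermark percentage (optional, defaults to 85)
fits_below_watermark() {
    used=$1; per_node=$2; nodes=$3; pct=${4:-85}
    # Usable space is the combined capacity scaled by the watermark.
    limit=$(( per_node * nodes * pct / 100 ))
    if [ "$used" -le "$limit" ]; then
        echo "ok: ${used}GB used, limit ${limit}GB"
    else
        echo "too small: ${used}GB used, limit ${limit}GB"
        return 1
    fi
}
```

For instance, `fits_below_watermark 300 200 2` checks whether 300 GB of data fits on two 200 GB nodes. An even spread is an optimistic assumption, so leave some headroom beyond what this reports.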

Health

curl -X GET "localhost:9200/_cluster/health?pretty"

Expected Output

{
"cluster_name" : "\"es-data-cluster\"",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}

Shards

curl -X GET "localhost:9200/_cat/shards"

Expected Output

twitter 2 p STARTED 0   0b 172.18.0.2 es-node
twitter 1 p STARTED 0   0b 172.18.0.2 es-node
twitter 0 p STARTED 0 230b 172.18.0.2 es-node

When the cluster status is green and all shards are STARTED then you are good to go with scaling down.
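The green check can also be scripted, so the next step does not start early. The sketch below is one possible way to do it, not the article's own method. It relies on the `wait_for_status` and `timeout` parameters of the cluster health API. The function names and the ES_URL variable are hypothetical, and the sed-based parsing assumes the "status" field appears once in the response.

```shell
# Hypothetical helper: pull the "status" value out of a _cluster/health
# JSON body read from stdin. sed is used so no extra tools are needed.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Ask the cluster to hold the request until it is green or 60s have passed,
# then succeed only if the reported status really is green.
# ES_URL is an assumption; point it at your own cluster.
wait_for_green() {
    url="${ES_URL:-localhost:9200}/_cluster/health?wait_for_status=green&timeout=60s"
    status=$(curl -s -X GET "$url" | health_status)
    [ "$status" = "green" ]
}
```

A typical use would be `wait_for_green && echo "safe to continue"`, repeated in a loop if a single 60 second wait is not enough.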

Steps to scale down

  • Remove one data node. The cluster will go into the yellow state. Now observe the following:
      • the logs of the cluster,
      • the STARTED and UNASSIGNED shards.
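A gentler variant of this step, which is not what the steps above describe, is to drain the node before removing it by excluding it from shard allocation. The sketch below uses the `cluster.routing.allocation.exclude._ip` cluster setting. The function names and the ES_URL variable are hypothetical, and the IP is taken from the sample shard listing.

```shell
# Hypothetical helper: build the settings body that excludes one node, by IP,
# from shard allocation so Elasticsearch relocates its shards elsewhere.
# $1 = IP address of the node to drain
exclude_body() {
    printf '{"persistent":{"cluster.routing.allocation.exclude._ip":"%s"}}\n' "$1"
}

# Send the exclusion to the cluster. ES_URL is an assumption.
drain_node() {
    curl -s -X PUT "${ES_URL:-localhost:9200}/_cluster/settings?pretty" \
        -H 'Content-Type: application/json' \
        -d "$(exclude_body "$1")"
}
```

After `drain_node 172.18.0.2`, the node can be stopped once `_cat/shards` no longer lists any shard on it. Because the setting is persistent, set it back to null afterwards so it does not affect a future node that reuses the address.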

If the logs say "Marking shards as stale", it means those shard copies are no longer available for assignment and will be removed. Elasticsearch's built-in allocation then starts re-balancing the shards.

curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

This command explains shard allocation decisions in detail. Called without a request body, it reports on the first unassigned shard it finds.
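To ask about one particular shard copy, the same endpoint accepts a request body with `index`, `shard` and `primary` fields. A minimal sketch follows. The function names and the ES_URL variable are hypothetical, and the index name is the twitter index from the earlier examples.

```shell
# Hypothetical helper: request body asking about one shard copy.
# $1 = index name, $2 = shard number, $3 = true for the primary, false for a replica
explain_body() {
    printf '{"index":"%s","shard":%s,"primary":%s}\n' "$1" "$2" "$3"
}

# Query the allocation explain API for that shard. ES_URL is an assumption.
explain_shard() {
    curl -s -X GET "${ES_URL:-localhost:9200}/_cluster/allocation/explain?pretty" \
        -H 'Content-Type: application/json' \
        -d "$(explain_body "$1" "$2" "$3")"
}
```

For example, `explain_shard twitter 0 false` asks why the replica of shard 0 is placed where it is, or why it is still unassigned.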

  • Wait for green. At that point the cluster has re-replicated the lost shards.

If the cluster health is red, there is at least one unassigned primary shard. You need to focus on the unassigned shards first.
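One way to find those shards is to filter the `_cat/shards` output, sketched below. The `h=` parameter selects the columns, including `unassigned.reason`. The function names and the ES_URL variable are hypothetical, and the filter assumes the state is the fourth column, as it is with the column list used here.

```shell
# Hypothetical helper: keep only UNASSIGNED rows from _cat/shards output
# read on stdin. Column 4 holds the shard state.
unassigned_only() {
    awk '$4 == "UNASSIGNED"'
}

# List unassigned shards together with the reason. ES_URL is an assumption.
list_unassigned() {
    curl -s -X GET "${ES_URL:-localhost:9200}/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
        | unassigned_only
}
```

Rows with `p` in the third column are the unassigned primaries that make the cluster red. Rows with `r` are replicas, which only make it yellow.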

References

[1]: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/modules-discovery-quorums.html
[2]: https://www.elastic.co/blog/a-new-era-for-cluster-coordination-in-elasticsearch
[3]: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html

[4]: https://blog.mapillary.com/tech/2017/01/12/scaling-down-an-elasticsearch-cluster.html
