Scaling Down an Elasticsearch Cluster
Elasticsearch must be resilient to the failures of individual nodes. It achieves this resilience by considering cluster-state updates to be successful after a quorum of nodes have accepted them. A quorum is a carefully-chosen subset of the master-eligible nodes in a cluster.
Quorums must be carefully chosen so that the cluster cannot elect two independent masters that make inconsistent decisions, which would ultimately lead to data loss. For more detail, see references [1] and [2].
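To make the quorum idea concrete, here is a minimal sketch of the arithmetic, assuming a simple majority rule. The function names are hypothetical, and Elasticsearch 7 manages its voting configuration automatically, so this is an illustration rather than something you need to configure by hand.

```shell
# Simple-majority quorum: more than half of the master-eligible nodes.
# $1 = number of master-eligible nodes
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of master-eligible nodes that can be lost while a majority survives.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Example: quorum_size 3 prints 2, tolerated_failures 3 prints 1.
```

Under this rule, going from three master-eligible nodes to two still requires both remaining nodes to agree, so the smaller cluster can no longer tolerate the loss of a master-eligible node.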
Preparations Before Scaling Down
- Back up your cluster to have something to restore if things go wrong down the line.
- Check that the minimum master nodes configuration of the master nodes still satisfies the quorum requirement described above.
- Stop all writes to your cluster if you can, because it will not be safe to fail over after the downscaling. This is a precaution rather than a requirement: it makes no difference if all goes well.
- Make sure you are not overloading the cluster by leaving it with too little disk space or memory. Once disk usage on a node passes the low watermark, Elasticsearch stops allocating shards to that node, and at the flood-stage watermark the affected indices become read-only [3].
- Bring the index replication factor down to 1. With fewer shard copies to create and move around, relocation during scaling is faster, and less disk space goes to duplicated data.
curl -X PUT "localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "number_of_replicas" : 1
  }
}
'
- Re-balance the cluster gracefully before you start scaling down.
- The cluster needs to be healthy, with green status. Check both the health status and the shards:
Health
curl -X GET "localhost:9200/_cluster/health?pretty"
Expected Output
{
  "cluster_name" : "\"es-data-cluster\"",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
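If you want to script this check instead of reading the output by eye, a minimal sketch could look like the following. The helper names are hypothetical, and the text matching assumes the health response shape shown above.

```shell
# Print the "status" value from a cluster health JSON document read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeed only if the health document on stdin reports green.
is_green() {
  [ "$(health_status)" = "green" ]
}

# Assumed usage against the local node from the example above:
#   curl -s "localhost:9200/_cluster/health?pretty" | is_green && echo "cluster is green"
```

Plain text matching keeps the sketch free of extra dependencies. A JSON-aware tool would be more robust if one is available on your hosts.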
Shards
curl -X GET "localhost:9200/_cat/shards"
Expected Output
twitter 2 p STARTED 0 0b 172.18.0.2 es-node
twitter 1 p STARTED 0 0b 172.18.0.2 es-node
twitter 0 p STARTED 0 230b 172.18.0.2 es-node
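The same check can be sketched for the shard listing. The helpers below are hypothetical and assume the default `_cat/shards` column layout shown above, where the fourth column is the shard state.

```shell
# Print every line of `_cat/shards` output (read on stdin) whose state
# column is not STARTED. Column 4 is the state in the default layout.
shards_not_started() {
  awk 'NF >= 4 && $4 != "STARTED"'
}

# Succeed only when no shard is in a state other than STARTED.
all_shards_started() {
  [ -z "$(shards_not_started)" ]
}

# Assumed usage against the local node from the example above:
#   curl -s "localhost:9200/_cat/shards" | all_shards_started && echo "all shards started"
```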
When the cluster status is green and all shards are STARTED, you are good to go with scaling down.
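The disk-space preparation can also be estimated before a node is removed. The sketch below is a rough calculation under stated assumptions: data spreads evenly over the remaining nodes, all nodes have the same disk size, and the threshold of 85 is the default low disk watermark. The function names and inputs are hypothetical.

```shell
# Projected disk usage (whole percent, rounded up) if the same data is
# spread over fewer nodes. All arguments are hypothetical inputs:
#   $1 = total data on disk across the cluster, in GB
#   $2 = disk capacity of one data node, in GB
#   $3 = number of data nodes that will remain (must be at least 1)
projected_disk_pct() {
  total=$(( $2 * $3 ))
  echo $(( ($1 * 100 + total - 1) / total ))
}

# Succeed if the projection stays below a watermark percentage
# (default 85, the default low disk watermark).
fits_below_watermark() {
  [ "$(projected_disk_pct "$1" "$2" "$3")" -lt "${4:-85}" ]
}

# Example: 600 GB of data on two remaining 500 GB nodes.
#   projected_disk_pct 600 500 2    # prints 60
```

Real shard placement is rarely perfectly even, so it is worth leaving a margin below the watermark rather than aiming for it exactly.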
Steps to Scale Down
- Remove one data node. The cluster will go into the yellow state. Now observe the following:
  - the logs of the cluster,
  - the STARTED and UNASSIGNED shards.
If the logs say "Marking shards as stale", the shard copies on the removed node are no longer available for assignment and will be removed. Elasticsearch's built-in allocation then starts rebalancing the shards.
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
This command explains in detail why a shard is or is not allocated. Called without a request body, it reports on the first unassigned shard it finds.
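To ask about one particular shard, the allocation explain API accepts a request body naming the index, the shard number, and whether you mean the primary. The helper below is a hypothetical sketch that only builds that body. The `twitter` index is the example index used earlier.

```shell
# Build the request body that asks the allocation explain API about one
# specific shard. Arguments: index name, shard number, true|false for primary.
explain_body() {
  printf '{"index": "%s", "shard": %s, "primary": %s}\n' "$1" "$2" "$3"
}

# Assumed usage, with the index name from the examples above:
#   curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
#     -H 'Content-Type: application/json' -d "$(explain_body twitter 0 true)"
```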
- Wait for green. At that point the cluster has replicated the lost shards onto the remaining nodes.
If the cluster health is red, at least one primary shard is unassigned. You need to resolve the unassigned shards first.
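The wait-for-green step can be scripted as a simple polling loop. This is a sketch, not a hardened tool: `wait_for_green` and `fetch_health` are hypothetical names, and the fetch command is passed in as an argument so the loop can be pointed at any host.

```shell
# Poll until the cluster reports green or the attempts run out.
#   $1 = command that prints the cluster health JSON (hypothetical hook)
#   $2 = maximum number of attempts (default 30)
#   $3 = seconds to sleep between attempts (default 10)
wait_for_green() {
  fetch=$1
  attempts=${2:-30}
  pause=${3:-10}
  status=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Extract the "status" field from the health JSON.
    status=$($fetch | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1)
    if [ "$status" = "green" ]; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$pause"
  done
  echo "cluster still ${status:-unknown} after $attempts attempts" >&2
  return 1
}

# Hypothetical fetch hook for the local node used in the examples above.
fetch_health() {
  curl -s "localhost:9200/_cluster/health"
}

# Usage: wait_for_green fetch_health 60 5
```

As an alternative to polling, the cluster health endpoint also accepts `wait_for_status=green` together with a `timeout` parameter, which makes the request itself block until the status is reached or the timeout expires.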
References
[1]: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/modules-discovery-quorums.html
[2]: https://www.elastic.co/blog/a-new-era-for-cluster-coordination-in-elasticsearch
[3]: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
[4]: https://blog.mapillary.com/tech/2017/01/12/scaling-down-an-elasticsearch-cluster.html