Elasticsearch Index Red / Yellow — Why?

Troubleshooting & Fixing Index Status

Red, Yellow, Green: the Stoplight

What Does Red or Yellow Mean?

First, a word on what the colors mean. They can seem complex, but in the end they are simple:

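  • Yellow — One or more indexes have missing (unassigned) replica shards. All primaries are allocated, so the index can still index, search, and serve data, but with less redundancy than configured.
  • The missing shards may be truly missing, damaged, or have other problems; or the cluster may just be in the middle of moving or rebuilding these missing shards.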
  • Our job is to manually or automatically recreate these missing replicas to get to green.
  • Red — One or more indexes have missing primary shards and are not fully functional, i.e. they cannot reliably index, search, or serve data.
  • Note this is on a per-shard basis, so even with 50 shards it only takes one to be dead to turn the index and the cluster red.
  • Our job is to manually find or fix these missing primaries, if we can, else the index is lost and must be recreated from snapshots or original source data.
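The color rules can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's actual implementation: the `(is_primary, is_assigned)` tuples and both function names are made up for illustration, and real clusters also track intermediate states such as initializing or relocating shards.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one shard copy: (is_primary, is_assigned).
ShardCopy = Tuple[bool, bool]


def index_health(shard_copies: Iterable[ShardCopy]) -> str:
    """Return 'red', 'yellow', or 'green' for a single index."""
    health = "green"
    for is_primary, is_assigned in shard_copies:
        if is_assigned:
            continue
        if is_primary:
            return "red"   # one missing primary is enough
        health = "yellow"  # missing replica; keep scanning for a missing primary
    return health


def cluster_health(index_healths: Iterable[str]) -> str:
    """The cluster is only as healthy as its worst index."""
    severity = {"green": 0, "yellow": 1, "red": 2}
    return max(index_healths, key=severity.__getitem__, default="green")
```

With 49 assigned primaries and one unassigned primary, `index_health` returns `"red"`, and any cluster containing that index is red as well.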

Finding Red & Yellow Indexes

1) The first step is to identify major issues you know about, such as a dead node, disk space issues, etc. that are likely to create problems. This helps inform what we look for and how we fix it later.

  • Shard Count Limits — Too many shards per node, common when new indexes are created or some nodes are removed and the system can’t find a place for them.
  • JVM or Heap Limits — Some versions can limit allocations when they are low on RAM.
  • Routing or Allocation Rules — Common in HA cloud or large, complex systems.
  • Corruption or Serious Problems — There are many more issues that can arise, each needing special attention or solutions, or, in many cases, just removing the old shards and adding new replicas or primaries.
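To see which indexes and shards are affected, and which of the causes above applies, one option is to query the `_cat/indices`, `_cat/shards`, and `_cluster/allocation/explain` APIs. The following is a minimal Python sketch using only the standard library. The `ES_URL` value (a local, unsecured cluster) and all helper names are assumptions for illustration; adjust authentication and TLS for your setup.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth


def non_green(cat_indices_rows):
    """Keep red/yellow rows from GET _cat/indices?format=json, red first."""
    bad = [r for r in cat_indices_rows if r.get("health") in ("red", "yellow")]
    return sorted(bad, key=lambda r: (r["health"] != "red", r["index"]))


def unassigned_shards(cat_shards_rows):
    """Keep UNASSIGNED rows from GET _cat/shards?format=json."""
    return [r for r in cat_shards_rows if r.get("state") == "UNASSIGNED"]


def explain_body(shard_row):
    """Build an allocation explain request body from one _cat/shards row."""
    return {
        "index": shard_row["index"],
        "shard": int(shard_row["shard"]),       # _cat returns numbers as strings
        "primary": shard_row["prirep"] == "p",  # 'p' = primary, 'r' = replica
    }


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request against the cluster."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def fetch_json(request):
    """Send a request built by es_request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example flow against a live cluster (not executed here):
#   indices = fetch_json(es_request("GET", "/_cat/indices?format=json"))
#   shards = fetch_json(es_request("GET", "/_cat/shards?format=json"))
#   for row in unassigned_shards(shards):
#       req = es_request("POST", "/_cluster/allocation/explain", explain_body(row))
#       print(json.dumps(fetch_json(req), indent=2))
```

The explain response describes, per node, why the shard cannot be placed there, which usually points at one of the categories listed above.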

Fixing Red & Yellow Indexes

The fourth step is to fix the problem. Fixes fall into a few categories:

  • Manually Allocate the Shard — Sometimes the cluster will not place a shard on its own and you must assign it to a node yourself, using the cluster reroute API.
  • Check Routing / Allocation Rules — Many HA or complex systems use routing or allocation rules to control placement, and as things change, this can create shards that can’t allocate. The allocation explain output should make this clear.
  • Remove all Replicas by setting number to 0 — Maybe you can’t fix the replica or manually move or assign it. In that case, as long as you have a primary (index is yellow, not red), you can set the replica count to 0, wait a minute, then set it back to 1 or whatever you want, using this index settings body: "index" : { "number_of_replicas" : 0 }
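Two of the fixes above map directly onto REST calls: the index settings update for the replica count, and the cluster reroute API for manual allocation. The sketch below only builds the requests so they can be inspected before sending. `ES_URL`, the helper names, and the example index and node names are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth


def _json_request(method, path, body):
    """Build (but do not send) a JSON request against the cluster."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode(),
        method=method,
        headers={"Content-Type": "application/json"},
    )


def set_replicas(index, count):
    """PUT /<index>/_settings with the replica count from the text above."""
    return _json_request(
        "PUT", f"/{index}/_settings", {"index": {"number_of_replicas": count}}
    )


def allocate_replica(index, shard, node):
    """POST /_cluster/reroute asking for a replica to be placed on a node."""
    command = {"allocate_replica": {"index": index, "shard": shard, "node": node}}
    return _json_request("POST", "/_cluster/reroute", {"commands": [command]})


# Example against a live cluster (not executed here; names are hypothetical):
#   urllib.request.urlopen(set_replicas("my-index", 0))
#   ... wait for the cluster to settle ...
#   urllib.request.urlopen(set_replicas("my-index", 1))
#   urllib.request.urlopen(allocate_replica("my-index", 0, "node-1"))
```

Dropping replicas to 0 is only safe while every primary is healthy, since the index has no redundancy until the replicas are rebuilt. The reroute API also has `allocate_stale_primary` and `allocate_empty_primary` commands for red indexes, but both require an explicit `accept_data_loss` flag, so they are left out of this sketch.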

CEO of ChinaNetCloud & Siglos.io — Global Entrepreneur in Shanghai & Silicon Valley