Splunk Issue: Indexers Failing to Rejoin the Cluster (Solved)

Indexers failing to rejoin the cluster can cause serious issues, but this blog provides simple steps to help resolve the issue in your environment.

First, indexers in a Splunk environment can be clustered or non-clustered. An indexer cluster contains two or more indexers, also called peer nodes. Each peer node indexes external data, stores it, and simultaneously sends and receives replicated data. These indexing and replication activities are coordinated by a cluster master. The cluster master is also responsible for managing the configuration of peer nodes, coordinating searches across peer nodes, and running remedial activities if a peer goes offline.

A peer node needs to connect to the cluster master and stay connected to receive instructions. There are, however, situations in which a peer node can become disconnected from the cluster master. A peer can go offline intentionally, when someone issues the CLI offline command, or unintentionally, for example when a server crashes or when there are intermittent or recurring network issues in the environment. When a peer is disconnected from the cluster master and the master does not receive a heartbeat after a set period, the master begins bucket-fixing activities to ensure the defined replication factor is met so that the cluster remains in a healthy state. Refer to the Splunk docs if you are interested in learning more about what happens when a peer goes down.

Depending on the reason for the disconnection, indexers in a cluster may be offline for a few minutes or for hours. In my situation, a partial datacenter outage had caused multiple appliances to fail and resulted in failed connections among the Splunk servers. This caused about half of the indexers to lose connection to the cluster master for several hours. The datacenter issue was later determined to be a result of storage shortfalls and was resolved within 24 hours. Once the datacenter was restored, however, we noticed the indexers failing to rejoin the cluster. The master would not allow them back into the cluster.

There were multiple errors in Splunk that pointed to issues with the indexers:

Messages indicating failed connections from the Cluster Master to the indexers
Messages indicating indexers being rejected from joining the cluster
Cluster Master in an unhealthy state: Replication and Search factor not met
Most Splunk servers, including search heads, in red status
No replication activities as seen on the Indexer Clustering dashboard on the Cluster Master

Below is a screenshot of some of the error messages seen in Splunk:

Indexers with “BatchAdding” status on the Indexer Clustering dashboard
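
To make the heartbeat behaviour described in this post more concrete, here is a minimal Python sketch of how a master might decide that a peer is down. It is a toy model for illustration only, not Splunk's implementation: the `HeartbeatTracker` class, the peer names, and the 60-second timeout are all assumptions (the post only says the master waits "a set period").

```python
from dataclasses import dataclass, field

# Hypothetical timeout chosen for illustration; the post only says "a set period".
HEARTBEAT_TIMEOUT_SECS = 60.0


@dataclass
class HeartbeatTracker:
    """Toy model of a master tracking the last heartbeat from each peer."""

    timeout: float = HEARTBEAT_TIMEOUT_SECS
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, peer: str, now: float) -> None:
        # Remember when this peer last checked in.
        self.last_seen[peer] = now

    def peers_down(self, now: float) -> list:
        # A peer counts as down once its last heartbeat is older than the timeout.
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


tracker = HeartbeatTracker()
tracker.record_heartbeat("idx1", now=0.0)
tracker.record_heartbeat("idx2", now=0.0)
tracker.record_heartbeat("idx1", now=50.0)  # idx1 keeps reporting, idx2 goes silent

down = tracker.peers_down(now=70.0)
print(down)
```

In this sketch, a peer that stops reporting is flagged only after the timeout elapses, which is the point at which a real master would start its bucket-fixing work. It also hints at why a long outage is different from a short one: by the time the peers come back, the master has already spent hours treating them as gone.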