An analysis of metrics can identify hidden performance issues to improve system efficiency.
This article is about a real-world case involving an error for a high-performance OpenSearch cluster.
The OpenSearch cluster was configured with 12 nodes to meet growing performance requirements, but performance couldn't improve anymore.
The article focuses on three high-level metrics - Indexing Data Rate, HTTP requests by response code and Search rate - to identify the flaw.
It was found that the root cause of the anomaly was due to how queries were executed by a component leveraging a shared library that was lacking the index pattern parameter.
The impact of the error was significant and led to an unintended consumption of resources.
Correcting the code to always specify the index pattern in queries led to a significant improvement in performance and resource utilization.
Horizontal scaling-in of the cluster reduced cloud costs, highlighting the importance of monitoring performance metrics.
This experience highlights the need to pay attention to detail in analysis, monitor metrics and avoid simple coding errors.
OpenSearch's good performances can hide issues that might go unnoticed and have severe consequences.