Cluster Optimizer: A cloud-native cluster optimization platform

This article introduces the main functions of the cluster optimization platform Cluster Optimizer.

Urgent Need for Cloud-Native Cost Optimization

Cloud-native technology is rapidly becoming a cornerstone of digital transformation. Gartner predicts that by 2025, more than 95% of new digital workloads will be deployed on cloud-native platforms. For enterprises, cloud-native approaches promise more efficient resource utilization, faster application delivery, and better scalability and reliability.

In practice, however, many organizations struggle with rising costs. The “2021 CNCF FinOps Kubernetes Report” found that after migrating to Kubernetes, 68% of enterprises reported an increase in computing costs, with 36% experiencing cost increases of more than 20%. The CNCF’s 2023 “Cloud Native and Kubernetes FinOps Microsurvey” found that 49% of respondents saw either slight or significant cost increases. Flexera’s 2024 State of the Cloud Report noted that 59% of users are increasingly concerned about cost optimization. Similarly, the 2023 China FinOps Industry Development Research Report by the FinOps Industry Promotion Alliance found that over half of enterprises experience IT resource waste, and more than 80% have a pressing need for IT resource and cost optimization.

These statistics underscore the urgent need for effective monitoring, management, and optimization of cloud-native application costs.

Cost Optimization Challenges

The key challenges in cloud-native cost optimization are:

  1. Idle Resources: Resources remain allocated but unused. Examples include application instances left running in development or testing environments, and associated resources such as elastic IPs that are not released promptly.

  2. Misconfiguration: Resources are over-provisioned, or scaling mechanisms are improperly configured. Examples include over-requesting CPU or memory to guarantee peak performance, and mistimed auto-scaling that leaves more application replicas running than the load requires.

  3. Lack of Automated Optimization: Without automated optimization mechanisms, every adjustment requires manual intervention, which complicates operations. For instance, a disk’s usage may fluctuate, but without automation the disk is not resized to match actual demand, so capacity is wasted.
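
The first two challenges can be detected by comparing what a workload requests with what it actually uses. The following is a minimal sketch of that idea, not Cluster Optimizer’s implementation: the WorkloadUsage type, the classify function, the sample workloads, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
IDLE_PEAK_RATIO = 0.05       # peak usage below 5% of the request counts as idle
OVERSIZED_PEAK_RATIO = 0.50  # peak usage below 50% of the request counts as over-provisioned


@dataclass
class WorkloadUsage:
    name: str
    cpu_request_millicores: int
    cpu_usage_samples_millicores: list[int]


def classify(workload: WorkloadUsage) -> str:
    """Return 'idle', 'over-provisioned', 'ok', or 'unknown' for one workload."""
    if workload.cpu_request_millicores <= 0 or not workload.cpu_usage_samples_millicores:
        return "unknown"  # nothing to compare against
    peak = max(workload.cpu_usage_samples_millicores)
    ratio = peak / workload.cpu_request_millicores
    if ratio < IDLE_PEAK_RATIO:
        return "idle"
    if ratio < OVERSIZED_PEAK_RATIO:
        return "over-provisioned"
    return "ok"


# Made-up sample data: (request, observed usage samples) in millicores.
workloads = [
    WorkloadUsage("test-env-api", 2000, [10, 20, 15]),
    WorkloadUsage("checkout", 4000, [900, 1200, 1100]),
    WorkloadUsage("search", 1000, [700, 850, 800]),
]
report = {w.name: classify(w) for w in workloads}
print(report)
```

Running such a check on a schedule, rather than by hand, is the kind of automation the third challenge refers to.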

Introducing Cluster Optimizer: A Cloud-Native Cluster Optimization Platform

Cluster Optimizer is designed to help enterprises reduce cloud costs through an automated and intelligent platform. It addresses the challenges of cost management and optimization in cloud-native architectures by leveraging advanced algorithms, including deep learning and sequential decision-making. Cluster Optimizer analyzes cloud resources, applications, user behavior, and cloud provider data to identify optimization opportunities, such as idle resources and misconfigurations. It offers multi-dimensional optimization recommendations and supports automation to help users reduce costs, improve performance, and enhance operational efficiency.

Comprehensive Optimization Recommendations by Cluster Optimizer:

  • Node Group: Provides strategies for node group configuration, including recommended instance types, auto-scaling settings, and optimal minimum and maximum node counts.
  • Node: Evaluates node utilization to recommend appropriate instance types for underutilized nodes.
  • GPU Node: Identifies low GPU utilization and suggests suitable instance types.
  • Disk: Assesses disk utilization and recommends appropriate disk capacities.
  • Persistent Volume: Identifies unused or low-utilization persistent volumes and helps users delete or resize them.
  • Application: Offers resource recommendations, alerts for workloads without resource quotas, and monitors out-of-memory (OOM) events.
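
As an illustration of the Application item, the sketch below flags containers that declare no CPU or memory requests or limits. It assumes that “resource quotas” here refers to the requests and limits in a Kubernetes pod manifest (spec.containers[].resources), and it works on a plain dictionary rather than a live cluster. The function name and the sample manifest are hypothetical, and this is not a description of how Cluster Optimizer performs the check.

```python
def containers_missing_resources(pod_manifest: dict) -> list[str]:
    """List the CPU/memory requests and limits that each container fails to declare."""
    findings = []
    for container in pod_manifest.get("spec", {}).get("containers", []):
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            values = resources.get(section) or {}
            for resource in ("cpu", "memory"):
                if resource not in values:
                    # section[:-1] turns "requests" into "request", "limits" into "limit"
                    findings.append(f"{container['name']}: no {resource} {section[:-1]}")
    return findings


# Made-up manifest: "app" lacks a CPU limit, "sidecar" declares nothing.
pod = {
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "resources": {
                    "requests": {"cpu": "250m", "memory": "256Mi"},
                    "limits": {"memory": "512Mi"},
                },
            },
            {"name": "sidecar"},
        ]
    },
}
findings = containers_missing_resources(pod)
for line in findings:
    print(line)
```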

Example: Node Group Recommendation

Typically, a cluster’s nodes are divided into several node groups, each serving a specific purpose (e.g., separating different business units). Cloud providers offer auto-scaling features for these node groups. However, configuring the instance type and setting the appropriate minimum and maximum node values for auto-scaling can be challenging. Cluster Optimizer’s node group recommendation strategy suggests the most cost-effective instance type, whether to enable auto-scaling, and optimal minimum and maximum node values based on current load metrics, cloud provider pricing, and geographical considerations. This strategy allows for continuous optimization as the node group’s load changes.
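
To make the idea concrete, here is a deliberately simplified sketch of such a recommendation. It is an assumption-laden illustration, not Cluster Optimizer’s algorithm: the instance names and prices are placeholders, demand is reduced to one low-load and one peak-load point, and the scoring rule (hourly cost averaged over those two points) is invented for the example. Geographical factors are ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    hourly_price: float  # placeholder value, not a real cloud price


@dataclass(frozen=True)
class Recommendation:
    instance_type: str
    min_nodes: int
    max_nodes: int
    enable_autoscaling: bool


def nodes_needed(vcpus: float, memory_gib: float, itype: InstanceType) -> int:
    """Smallest node count whose total CPU and memory cover the demand."""
    return max(1, math.ceil(vcpus / itype.vcpus), math.ceil(memory_gib / itype.memory_gib))


def recommend(candidates, low_demand, peak_demand) -> Recommendation:
    """Pick the candidate with the lowest cost averaged over low and peak load."""
    def plan(itype):
        low_nodes = nodes_needed(*low_demand, itype)
        peak_nodes = nodes_needed(*peak_demand, itype)
        avg_cost = (low_nodes + peak_nodes) / 2 * itype.hourly_price
        return avg_cost, low_nodes, peak_nodes

    best = min(candidates, key=lambda itype: plan(itype)[0])
    _, low_nodes, peak_nodes = plan(best)
    return Recommendation(best.name, low_nodes, peak_nodes, peak_nodes > low_nodes)


# Hypothetical candidates and demand, given as (vCPUs, memory in GiB).
candidates = [
    InstanceType("small", 2, 16, 0.10),
    InstanceType("medium", 8, 64, 0.40),
    InstanceType("large", 16, 128, 0.80),
]
result = recommend(candidates, low_demand=(2, 12), peak_demand=(12, 84))
print(result)
```

With these placeholder prices, cost per vCPU is the same for every type, so the smallest type wins because it tracks low load most closely. Real pricing and per-node overhead can change that outcome.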

In the example below, the node group us-pre-eks-cluster-node-r5a-20240229 currently uses the r6a.4xlarge instance type, with auto-scaling enabled and both the minimum and maximum number of nodes set to 2. The optimization settings recommend several instance types, including r5a.large, r6a.large, and r5a.2xlarge. The recommended maximum node count is 7, and the minimum is 1. By applying these optimization settings, the node group’s configuration can be adjusted, significantly reducing costs during periods of low utilization.

Figure: Node Group Optimization
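
The effect of the recommended settings can be checked with simple capacity arithmetic. The sketch below compares the current fixed capacity with the range the recommended settings allow. The vCPU and memory figures are the values AWS publishes for these instance types and should be verified against current AWS documentation. The calculation assumes the node group runs a single recommended type at a time. No prices are used, because the article gives none.

```python
# (vCPUs, memory in GiB) per instance; verify against current AWS documentation.
SPECS = {
    "r6a.4xlarge": (16, 128),
    "r5a.large": (2, 16),
    "r6a.large": (2, 16),
    "r5a.2xlarge": (8, 64),
}


def capacity(instance_type: str, nodes: int) -> tuple[int, int]:
    """Total (vCPUs, memory GiB) for a number of nodes of one instance type."""
    vcpus, memory_gib = SPECS[instance_type]
    return vcpus * nodes, memory_gib * nodes


# Current settings: minimum and maximum both 2, so capacity never changes.
current = capacity("r6a.4xlarge", 2)
print(f"current: {current[0]} vCPUs, {current[1]} GiB at all times")

# Recommended settings: between 1 and 7 nodes of a smaller type.
for itype in ("r5a.large", "r6a.large", "r5a.2xlarge"):
    floor = capacity(itype, 1)
    ceiling = capacity(itype, 7)
    print(f"{itype}: {floor[0]} to {ceiling[0]} vCPUs, {floor[1]} to {ceiling[1]} GiB")
```

Under these assumptions, the capacity that must be paid for at low load drops from 32 vCPUs to between 2 and 8 vCPUs, which is where the savings during periods of low utilization come from.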

How to Install and Try Cluster Optimizer

We offer a free community version of Cluster Optimizer that you can install and try. For more details, please visit: Free Trial.

If you encounter any issues during installation or trial, please contact us, and we will respond promptly.