Efficient resource allocation for generative AI workloads in cloud-native infrastructures: A multi-tiered approach

Kiran Randhi 1, * and Srinivas Reddy Bandarapu 2

1 Principal Solutions Architect.
2 Principal Cloud Architect.
 
Research Article
International Journal of Science and Research Archive, 2024, 13(02), 826-839.
Article DOI: 10.30574/ijsra.2024.13.2.2208
Publication history: 
Received on 07 October 2024; revised on 12 November 2024; accepted on 15 November 2024
 
Abstract: 
Effective resource management is essential for generative AI workloads in cloud-native infrastructures to perform well. This article describes a multi-tiered architecture that targets such workloads, whose resource usage fluctuates inherently and which are difficult to scale. The proposed framework divides resources into tiers so that each application receives support matched to its difficulty level. The methodology includes a performance assessment of how effectively resources are distributed, using metrics such as latency, throughput, and utilization rates. Examples illustrate the use of the approach and its efficiency in real-world settings. Based on these, applying the multi-tiered approach to resource management improves an organization's operational performance and minimizes the costs of resource provisioning. The study also underscores the importance of flexible and effective resource management tools, which are especially useful in modern generative AI development environments.
 
Keywords: 
Generative AI; Resource Allocation; Cloud-Native Infrastructure; Multi-Tiered Approach; Performance Metrics
 