A Complete Study on Cloud Virtual Machines for Data Analysis
Abstract
In today's data-driven era, cloud computing has become an essential technology for storing, processing, and distributing large amounts of data. Scheduling is a core component of cloud computing: it integrates distributed computing resources and manages distributed workloads. This article provides an overview of the key concepts and challenges in scheduling data-intensive applications in the cloud.
Cloud computing uses virtualization to reduce server power and hardware costs, and it allows large datasets to be spread across multiple storage services. To get the most out of cloud computing, developers need processes that optimize both the architecture and the deployment of usage models. The role of virtual machines (VMs) has become an important topic because cloud resources are delivered through virtualization technology. To improve the overall performance of cloud computing, VMs must be carefully managed so that they run efficiently and physical resources are allocated correctly.
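The point about mapping VMs onto physical resources can be illustrated with a classic first-fit placement heuristic. This is a minimal sketch, not the paper's method; the host capacities and per-VM core demands are illustrative assumptions.

```python
# Sketch of first-fit VM-to-host placement: assign each VM to the first
# physical host with enough remaining capacity, opening a new host when
# none fits. All numbers are hypothetical.

def first_fit(vm_demands, host_capacity):
    """Place VMs (given as CPU-core demands) onto hosts of equal
    capacity; return the list of VM demands assigned to each host."""
    free = []        # remaining capacity of each opened host
    placement = []   # VM demands assigned to each host
    for demand in vm_demands:
        for i, remaining in enumerate(free):
            if demand <= remaining:
                free[i] -= demand
                placement[i].append(demand)
                break
        else:
            # No existing host fits; open a new one.
            free.append(host_capacity - demand)
            placement.append([demand])
    return placement

layout = first_fit([4, 2, 7, 3, 1], host_capacity=8)
# layout → [[4, 2, 1], [7], [3]] : three hosts instead of one per VM
```

Packing VMs onto fewer hosts is one concrete way the careful management described above reduces power and hardware costs.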
There is much prior work on SLA-based big data processing and MapReduce scheduling in the cloud, but it does not configure the cloud environment dynamically. Instead, it builds a virtual cluster by statically provisioning resources in a private cloud first. We believe that cloud resources should be provisioned dynamically based on the application's workload and data size. This introduces new challenges, such as: a) how many VMs to provision, and of what type; b) whether a private cloud or a public cloud provider should be selected to meet demand under budget and deadline constraints; c) which data centers and resources should be selected to minimize data transfer and processing costs.
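Challenges (a) and (b) amount to a constrained selection problem: pick a VM type and count that finish the workload before the deadline at minimum cost within budget. The sketch below is a hypothetical greedy formulation; the VM catalog, prices, and throughputs are invented for illustration and do not come from the paper.

```python
# Hypothetical sketch of dynamic VM provisioning under budget and
# deadline constraints. VM types, hourly prices, and GB/hour
# throughputs are illustrative assumptions.

from dataclasses import dataclass
from math import ceil

@dataclass
class VMType:
    name: str
    price_per_hour: float   # assumed cost per VM-hour
    gb_per_hour: float      # assumed processing throughput per VM

def provision(data_gb, deadline_h, budget, vm_types):
    """Return the (VMType, count) pair that meets the deadline at the
    lowest total cost within budget, or None if nothing is feasible."""
    best = None
    for vm in vm_types:
        # VMs needed so that combined throughput covers data_gb in time.
        count = ceil(data_gb / (vm.gb_per_hour * deadline_h))
        cost = count * vm.price_per_hour * deadline_h
        if cost <= budget and (best is None or cost < best[2]):
            best = (vm, count, cost)
    return (best[0], best[1]) if best else None

catalog = [
    VMType("small", price_per_hour=0.05, gb_per_hour=10.0),
    VMType("large", price_per_hour=0.20, gb_per_hour=50.0),
]

choice = provision(data_gb=400, deadline_h=2, budget=5.0, vm_types=catalog)
# choice → 4 "large" VMs (total cost 1.60), cheaper here than 20 "small" VMs
```

A real scheduler would also model challenge (c), weighing data transfer costs between data centers, but the same cost-minimization structure applies.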