Huazhong University of Science and Technology
Shanghai Jiao Tong University
Institute of Computing Technology, Chinese Academy of Sciences
Alibaba Cloud Computing Co., Ltd.
Jiangnan Institute of Computing Technology, Wuxi
G-Cloud Technology Corporation
Research Institute of Information Science, China Electronics Technology Group Corporation
Cloud computing, which provides an Internet-based model for consuming, delivering, and expanding resources, has been widely adopted across many domains. The cloud operating system (OS) is the core technology of cloud computing. Although best practices for cloud OS have largely taken shape over the past decade, a unified Application Programming Interface (API) specification is still absent, leading to ecosystem fragmentation and vendor lock-in.
The project proposes building a cloud OS ecosystem around APIs and defines an API specification by analyzing existing cloud computing practices. The specification, which consists of basic and extended APIs, has gained support from many vendors, forming a preliminary, fully controllable cloud OS ecosystem.
Below are the detailed technical solutions and achievements of the project.
1. Develop an API specification out of existing practices and lay a foundation for the cloud OS ecosystem
The project first explored how general-purpose and mobile OS ecosystems evolved and established a set of principles for defining cloud OS APIs. Then, following the principle of "from the practice and for the practice", it developed a cloud OS API specification by analyzing and summarizing existing cloud computing practices such as Alibaba Cloud, G-Cloud, CETC Cloud, OpenStack, and Amazon AWS. The APIs defined fall into two categories: basic APIs and extended APIs. The basic APIs, 55 in total, are also called the cloud OS minimum kernel APIs. They are a common subset of the APIs supported by current cloud computing practices and encapsulate the functions necessary for building basic cloud computing services. The extended APIs, 311 in total, extend the cloud OS minimum kernel functions and further reduce the burden on developers.
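The idea of a "common subset" of APIs can be illustrated with a small set computation. The sketch below is only an illustration under stated assumptions: the platform names and API names are hypothetical placeholders, not entries from the actual specification, and treating every non-common API as a candidate for extension is an assumption rather than the project's documented method.

```python
from typing import Dict, Set, Tuple


def split_common_apis(platform_apis: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (common, remaining) API names across all platforms.

    common:    APIs offered by every platform (the "common subset").
    remaining: APIs offered by at least one platform but not by all.
    """
    if not platform_apis:
        return set(), set()
    api_sets = list(platform_apis.values())
    common = set.intersection(*api_sets)
    remaining = set.union(*api_sets) - common
    return common, remaining


if __name__ == "__main__":
    # Hypothetical platforms and API names, for illustration only.
    demo = {
        "platform_a": {"CreateInstance", "DeleteInstance", "CreateVolume"},
        "platform_b": {"CreateInstance", "DeleteInstance", "ResizeInstance"},
        "platform_c": {"CreateInstance", "DeleteInstance", "CreateVolume"},
    }
    common, remaining = split_common_apis(demo)
    print(sorted(common))     # ['CreateInstance', 'DeleteInstance']
    print(sorted(remaining))  # ['CreateVolume', 'ResizeInstance']
```

In practice, matching APIs across vendors would require mapping differently named but equivalent operations onto one another before any intersection is taken; the sketch assumes that normalization has already happened.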
2. Innovate independently and expand the ecosystem impact by providing a cutting-edge cloud OS reference implementation
Adhering to the above API specification, the project made the following breakthroughs, developed a cutting-edge cloud OS reference implementation, and integrated technologies into the ecosystem.
The project's innovations in container technology include rapid construction and distribution of container images, live migration of containers in data centers, and adaptive in-kernel container view isolation, which together form a core technology system for containers. These achievements have been applied to Alibaba Cloud, where they support the public container services and have had significant influence worldwide.
The project put forward new ways to virtualize complex new hardware such as GPUs and FPGAs, making them compatible with Chinese domestic processors. Moreover, it proposed a universal Type II "many-to-one" virtual machine architecture based on distributed QEMU and KVM, which is the first of its kind in the world and fills a gap in this field.
The project presented new resource pooling, management, and scheduling technologies, including heterogeneous resource pooling and management, distributed resource overselling, and ultra-large-scale scheduling of hybrid online-offline workloads. In particular, the large-scale mixed online-offline job deployment and resource scheduling technology has been successfully applied to Fuxi, the scheduling system of Alibaba Cloud, enabling resource management and hybrid online-offline workload scheduling on clusters of 10,000+ nodes, with failure recovery on the order of seconds.
The project advanced distributed file systems with technologies such as the implementation and optimization of large-scale erasure codes and geometric partitioning of regenerating codes. Of the two systems developed by the project, TStor, a highly reliable and self-maintaining storage system, is the first practical storage system in the world based on 32+16 erasure codes, and MadFS, an ultra-high-performance burst buffer file system, has twice ranked first by a large margin on the global IO500 list with the support of "Pengcheng Cloud Brain II".
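To give a sense of what a 32+16 erasure code implies, the following sketch works through the basic arithmetic. It assumes that "32+16" means 32 data fragments plus 16 parity fragments and that the code is maximum distance separable (as Reed-Solomon codes are), so that any 32 of the 48 fragments suffice to rebuild the data. Neither assumption is confirmed by the text above, and the function name is hypothetical.

```python
from fractions import Fraction


def erasure_code_profile(data_fragments: int, parity_fragments: int) -> dict:
    """Basic properties of a (k + m) MDS erasure code.

    Assumes any k of the k + m fragments are enough to recover the data.
    """
    if data_fragments <= 0 or parity_fragments < 0:
        raise ValueError("need k > 0 and m >= 0")
    total = data_fragments + parity_fragments
    return {
        "total_fragments": total,
        # Raw bytes stored per byte of user data.
        "storage_overhead": Fraction(total, data_fragments),
        # Number of fragments that can be lost without losing data.
        "tolerated_losses": parity_fragments,
    }


if __name__ == "__main__":
    ec = erasure_code_profile(32, 16)
    print(ec["total_fragments"], ec["storage_overhead"], ec["tolerated_losses"])
    # 48 3/2 16

    # Three-way replication viewed as a (1 + 2) code, for comparison.
    rep = erasure_code_profile(1, 2)
    print(rep["total_fragments"], rep["storage_overhead"], rep["tolerated_losses"])
    # 3 3 2
```

Under these assumptions, such a code stores 1.5 bytes per byte of data while tolerating the loss of 16 fragments, whereas three-way replication stores 3 bytes per byte and tolerates 2 losses. The trade-off is that wide codes make encoding and repair more expensive, which is presumably where the optimization work mentioned above comes in.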
3. Build an open-source community of cloud OS and foster efficient and effective interactions among ecosystem participants to ensure sustainable development
The project developed an API conformance testing tool, which can automatically analyze service APIs and their application scenarios, generate test cases that meet the coverage requirements, and run the tests. It ensures that all API implementations strictly follow the specification, thus avoiding ecosystem fragmentation. In addition, the project built its own open-source community (http://bbs.cloud-edu.cn/) as well as a code hosting platform, which hosts more than 10 projects open-sourced by the project, including the minimum kernel of cloud OS, live container migration, and the giant virtual machine. The community fosters efficient and effective interactions among project contributors, maintainers, and users.
The project outcomes promote interoperability among different cloud operating systems and services, avoid repeated development, and thus facilitate continuous technology iteration and overall industry innovation. To date, the cloud OS minimum kernel APIs have been officially supported by Alibaba Cloud, HUAWEI Cloud, Inspur Cloud, CETC Cloud, and others. In particular, a domestic heterogeneous cloud platform has been constructed based on the API specification, further demonstrating the benefits of the project outcomes.
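The simplest form of an API conformance check can be sketched as a comparison between what a specification requires and what an implementation declares. The sketch below is a toy illustration and not the project's tool: the data model (API name mapped to required parameter names), the function name, and the example APIs are all hypothetical, and it performs no test-case generation or scenario analysis.

```python
from typing import Dict, List, Set

# Hypothetical data model: API name -> set of required parameter names.
ApiTable = Dict[str, Set[str]]


def check_conformance(spec: ApiTable, implementation: ApiTable) -> List[str]:
    """Return a list of human-readable conformance problems (empty if none)."""
    problems: List[str] = []
    for api, required_params in sorted(spec.items()):
        if api not in implementation:
            problems.append(f"missing API: {api}")
            continue
        missing = required_params - implementation[api]
        if missing:
            problems.append(f"{api}: missing parameters {sorted(missing)}")
    return problems


if __name__ == "__main__":
    # Hypothetical specification and vendor implementation.
    spec = {
        "CreateInstance": {"image_id", "flavor"},
        "DeleteInstance": {"instance_id"},
    }
    vendor = {
        "CreateInstance": {"image_id"},
    }
    for problem in check_conformance(spec, vendor):
        print(problem)
    # CreateInstance: missing parameters ['flavor']
    # missing API: DeleteInstance
```

A real conformance tool would also exercise the APIs at run time and compare observed behavior against the specification; this static check only shows the kind of mismatch such a tool is meant to surface.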
Copyright © World Internet Conference. All rights Reserved
Presented by China Daily. 京ICP备13028878号-23