Case Sharing丨Building a New Model of Modern University Data Center

2020/11/25 19:40:04
The core value of a smart campus lies in data.

Based on data mining and mathematical modeling, a smart campus can build models on top of massive campus data, establish prediction methods, and mine, analyze, and forecast information.

It can also integrate data, information, and rules from all sources and respond quickly and proactively through intelligent reasoning, which is what makes it genuinely intelligent.

How data is used therefore directly affects the development of smart applications and of the smart campus as a whole.

Distributed software-defined storage


Traditional storage

Storage systems built on traditional storage area network (SAN) technology are partially closed. Although storage virtualization can improve their relatively independent architecture, back-end storage remains a data island, and operations such as data replication and backup between back-end devices significantly degrade storage performance and data security. As demand for data in the smart campus grows rapidly, such systems become prone to unstable concurrent streams, limited read/write bandwidth, and traversal difficulties caused by a sharp increase in the number of files.

In the early days, to break the data islands created by directly attached storage, most data centers used SANs to build shared storage networks. However, limits on SAN network scale and performance bottlenecks make the SAN an obstacle to unbounded storage expansion in today's large data centers. Storage devices with different performance characteristics and in different storage areas must rely on solutions such as storage gateways to share data with one another. The overall architecture can no longer adapt to cloud computing, microservices, big data, and other modern application scenarios; the SAN architecture itself has almost become a new data storage island.


A smart campus has many scenarios in which microservices act as storage consumers, which brings boot storms, massive synchronized reads and writes, data classification protection, and snapshot requirements. In the current era of software-defined cloud data centers, the technical characteristics of traditional storage can no longer support the rapid development of the data center.


Software Defined Storage

Software-defined storage (SDS) is a storage architecture that decouples storage software from hardware. Unlike traditional SAN storage or NAS (network-attached storage), software-defined storage generally runs on x86 or industry-standard servers, eliminating the storage system's dependence on proprietary hardware. Because software and hardware are decoupled, the storage system can be expanded on demand, and hardware can be reduced or downgraded, making the system far more flexible.


Advantages of distributed software-defined storage

Today's software-defined storage is mainly distributed software-defined storage. Its main advantages include the following:

Rich built-in software features. In addition to storage management, it provides performance analysis, automated operation and maintenance, snapshots, backups, compression, encryption, QoS (quality of service), and more, whereas traditional storage typically relies on external software servers or additional software licenses for the storage hardware. Software automation keeps the storage system reliable, so that even as the system scales out, the difficulty of operating and maintaining it stays low.

Capacity and performance scalability. Distributed software-defined storage adopts a scale-out (rather than scale-up) distributed structure, so capacity and performance can in theory expand without limit. To grow the storage pool, you only need to add disks to existing nodes or add new storage nodes. Because data slices are evenly distributed across all nodes, and all nodes participate in reads and writes in parallel, read/write performance grows roughly linearly with the number of data nodes.
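The even spread of data slices described above can be illustrated with a toy sketch (this is not Ceph's actual placement logic; the node names and object counts are invented): hashing object names across a node set distributes load roughly evenly, so adding nodes adds both capacity and parallel read/write throughput.

```python
import hashlib
from collections import Counter

def place(obj_name: str, nodes: list) -> str:
    """Map an object to a node by hashing its name (a stand-in for
    real placement logic such as CRUSH)."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = [f"node-{i}" for i in range(4)]
counts = Counter(place(f"object-{i}", nodes) for i in range(100_000))

# With 100k objects on 4 nodes, each node holds roughly 25k slices,
# so reads and writes are served by all nodes in parallel.
print(counts)
```

Because placement is a pure function of the object name and node list, no central directory has to be consulted on each read or write.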

Integration with cloud platforms. Most cloud platform infrastructure uses virtualization or container technology, and integration with it is a common application scenario for distributed software-defined storage. Through the iSCSI interface it can also interoperate with traditional storage architectures, so the original hardware assets can be fully utilized.

A standard application programming interface (API). The storage integrates with the cloud platform for device management and maintenance, seamlessly connecting pooled storage resources to the cloud platform and providing it with various storage services.

It can be seen that distributed software-defined storage makes full use of its unified storage platform. Expansion can be planned purely from capacity requirements: the distributed architecture increases both capacity and overall system performance as storage nodes are added. The software layer can also be used to predict future data growth and thereby quantify future expansion requirements.

Ceph distributed storage system


The concept of Ceph

The Ceph project originated in Sage Weil's Ph.D. work (the earliest results were published in 2004) and was later contributed to the open-source community. Its original goal was to provide better performance, reliability, and scalability. After years of development it has been adopted by many cloud computing vendors and is widely used.

Ceph is software-defined storage that runs on almost all mainstream Linux distributions (such as CentOS and Ubuntu) and other UNIX-like operating systems (typically FreeBSD). Ceph's distributed design makes it easy to manage large clusters of thousands of nodes with EB-scale and larger storage capacity. At the same time, its computation-based flat addressing lets a Ceph client communicate directly with any server node, avoiding the access hotspots that cause performance bottlenecks.


Ceph uses the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to dynamically calculate where an object is stored and accessed. Ceph is also a unified storage system: it supports traditional block and file storage protocols, as used over SAN and NAS, as well as object storage protocols such as S3 and Swift. Ceph is therefore a true software-defined storage solution, providing enterprise-grade storage features entirely in software: low cost, reliability, and scalability.
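The key property of CRUSH, that any client can compute an object's location deterministically instead of consulting a lookup table, can be sketched with rendezvous (highest-random-weight) hashing. This is a simplification, not the real CRUSH algorithm (which also models failure domains and device weights), and the OSD names are illustrative.

```python
import hashlib

def crush_like(obj: str, osds: list, replicas: int = 3) -> list:
    """Pick `replicas` OSDs for an object by scoring each (object, OSD)
    pair with a hash; every client computes the same answer with no
    central directory, which is the core idea behind CRUSH."""
    def score(osd):
        return hashlib.sha256(f"{obj}:{osd}".encode()).hexdigest()
    return sorted(osds, key=score)[:replicas]

osds = [f"osd.{i}" for i in range(8)]
placement = crush_like("rbd_data.1234", osds)
primary, *secondaries = placement
# The first OSD in the placement acts as the primary for this object.
print(primary, secondaries)
```

Any client with the same OSD list computes the same placement, so reads and writes go straight to the right daemons.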


Basic principles of Ceph

Ceph's daemons mainly include MON (monitor) and OSD (object storage daemon) processes. In addition, if the upper-layer interface needs to provide file services, an MDS (Ceph Metadata Server) process must also be added to the cluster, but this process is optional.

MON maintains cluster state by managing the cluster's key state and configuration information: it maintains cluster membership and status (the cluster map) and provides strong consistency. MONs run as a cluster to avoid a single point of failure, use the Paxos algorithm to keep the cluster map consistent, and continue to work normally as long as fewer than half of the MON nodes fail. Because the MONs must reach agreement on cluster state, their number should be odd.
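The "fewer than half may fail" rule follows from simple majority-quorum arithmetic, sketched below; the monitor counts are illustrative:

```python
def quorum_size(monitors: int) -> int:
    """A Paxos-style majority quorum needs more than half the members."""
    return monitors // 2 + 1

def tolerated_failures(monitors: int) -> int:
    """Failures survivable while a majority can still be formed."""
    return monitors - quorum_size(monitors)

# An even MON count tolerates no more failures than the odd count
# below it, which is why MON clusters are deployed with odd sizes.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Three MONs tolerate one failure; four MONs still tolerate only one, while adding another member that must agree; five tolerate two.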

OSD is the object storage daemon that serves storage objects to clients. For each placement, one OSD acts as the primary and the others as non-primary replicas. The primary OSD handles replication, data consistency, data rebalancing, and data recovery; non-primary OSDs perform operations, such as copying, under the control of the primary, and can be promoted to primary when necessary.

Clients access Ceph data through the following interface components:

The RADOSGW (Reliable, Autonomic Distributed Object Store Gateway) interface is an HTTP-based object access gateway that provides applications with REST-style interfaces compatible with the S3 and Swift protocols.

The RBD (RADOS Block Device) interface provides block devices to hosts and virtual machines, giving clients reliable, distributed, high-performance block storage. RBD is supported by the Linux kernel, and almost all Linux distributions support it. Beyond reliability and performance, RBD supports other enterprise-grade features such as full and incremental snapshots, thin provisioning, copy-on-write cloning, and full in-memory caching. RBD can also be exported over iSCSI to provide iSCSI-based storage services to hosts.

The CephFS (Ceph File System) interface is a POSIX-compliant file system that stores its data in a Ceph storage cluster. The Linux kernel driver supports CephFS, making it well suited to major Linux distributions. CephFS stores data and metadata separately, providing higher performance and reliability for upper-layer applications.

RADOSGW, RBD, and CephFS all access RADOS (Reliable, Autonomic Distributed Object Store) through librados, which also provides programming interfaces in multiple languages, including PHP, Ruby, Java, Python, C, and C++. The complete architecture is shown in Figure 1.


Figure 1 Ceph architecture

Distributed software-defined storage in smart campus practice

This article takes Southeast University's distributed storage project as an example, using a software-defined network as the data transmission foundation on which to build distributed software-defined storage and realize a software-defined data center infrastructure.

Southeast University's informatization effort spans more than a decade. Its current guiding principle takes "digital, smart Southeast" as the goal: establish a sound mechanism for sustainable informatization, insist on the deep integration of informatization with talent cultivation, scientific research, and management services, and advance application-driven development and mechanism innovation in a coordinated way. However, the school's existing information systems run on complex and diverse hardware, including minicomputers, x86 servers, and virtualized clusters. The storage layer likewise consists of various storage servers on a SAN architecture, varying greatly in performance and capacity.

Take Southeast University's existing SAN storage as an example. It comprises arrays of different sizes and disk performance, so coordinating data mirroring across storage of different capacities and performance levels, and maintaining storage pool sizes, consumes too many routine maintenance hours. As the data scale grows, the maintenance workload and the complexity of resource allocation grow with it. On the other hand, expanding capacity also requires improving read/write performance to match, for example by adding storage controllers, increasing path counts, or raising FC port rates, all of which pose a significant challenge for both the hardware and management costs of data storage.

To solve these problems, the construction of distributed software-defined storage at Southeast University had to consider the integration and interconnection of existing computing and storage resources in order to avoid data islands. Existing data mainly consists of databases, virtual machine data, and static files. In this project, the distributed software-defined storage mainly carries core application virtual machines and static files.

This project uses commercial software based on the Ceph architecture, which ensures data interoperability with the original VMware environment and FC networks, something open-source Ceph does not provide out of the box, and offers a friendlier graphical management interface. Figure 2 shows the converged architecture of distributed software-defined storage with VMware and the FC network.


Figure 2 Analysis of advantages of distributed software-defined storage

The network is an important part of a distributed software-defined storage system and mainly comprises three types:

The public network is used for client access and data interaction;

The cluster network is used for data synchronization within the cluster and for heartbeat messages;

The management network is used by the management module.

The three networks can be configured flexibly according to cluster size. Where conditions permit, it is generally recommended that the first two use at least 10 Gb of network bandwidth. On a traditional network, these three networks would have to be connected to different switch equipment.
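In a Ceph-based deployment, the public and cluster networks described above are typically declared in `ceph.conf`; the subnets below are placeholders, not the project's actual addressing:

```ini
# /etc/ceph/ceph.conf (excerpt) -- subnets are illustrative
[global]
public network  = 10.10.0.0/24   ; client access and data interaction
cluster network = 10.20.0.0/24   ; replication traffic and heartbeats
```

OSDs then bind replication and heartbeat traffic to the cluster network, keeping it off the client-facing public network.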

To meet the stringent network requirements of distributed software-defined storage, such as high bandwidth, network isolation, and a large layer-2 domain spanning data centers, the project adopts a software-defined network (SDN) architecture to carry the storage traffic. The SDN's leaf-spine distributed node architecture, VXLAN-based overlay, and unified network across data centers are important guarantees for the stable operation of distributed software-defined storage. The physical network architecture is shown in Figure 3.


Figure 3 Storage network physical topology

The distributed software-defined storage network can be managed intuitively through a graphical user interface, and storage services can be delivered quickly to application systems through a unified network platform, as shown in Figure 4.


Figure 4 Storage network logic topology

Each distributed software-defined storage server connects to the network with two Gigabit Ethernet ports and four 10-Gigabit Ethernet ports. All ports are bonded in LACP aggregation mode and connected to two separate leaf switches on the SDN network, which ensures network reliability and increases inter-node bandwidth. The management network is Gigabit Ethernet connected to an out-of-band management switch; the cluster-internal network and the service-facing network both use 10-Gigabit Ethernet.
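The LACP bonding described above might look like the following on a netplan-managed Linux host; the article does not specify the OS tooling, and the interface names are invented:

```yaml
# /etc/netplan/10-storage-bond.yaml (illustrative sketch)
network:
  version: 2
  ethernets:
    ens1f0: {}
    ens1f1: {}
  bonds:
    bond0:
      interfaces: [ens1f0, ens1f1]
      parameters:
        mode: 802.3ad            # LACP aggregation
        lacp-rate: fast
        transmit-hash-policy: layer3+4
```

With the bond's member links split across two leaf switches (using multi-chassis link aggregation on the switch side), a single switch failure does not take the node offline.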

Two application groups are established on the SDN switches, one for iSCSI and one for internal interconnection. An application group can be extended to the other campus as needed, achieving a large layer-2 domain across campuses to meet active-active requirements. The iSCSI network interconnects with the computing resources, including blade and rack servers, to provide them with block storage. The number of aggregated ports connecting the original blade servers' back end to the SDN network was increased to guarantee iSCSI bandwidth from the blades. At the same time, to communicate with the data center's existing FC network, FC HBA cards were added to the distributed software-defined storage servers so they can talk to the existing SAN switches. The storage nodes containing FC HBA cards are connected to the FC switches of the traditional storage resource pool, making full use of the original FC storage resources.

From a storage-management perspective, the distributed software-defined storage implemented in this project is considerably easier to manage than traditional storage:

The underlying architecture is clustered and decouples hardware from software. The entire storage system is a single data platform from which all storage resources can be allocated quickly, eliminating data storage islands;

Different types of storage can be allocated according to business needs. When allocating resources, the smart campus core databases and virtual machine data can be placed in high-speed, high-reliability storage pools, while static files such as archives and attachments can be placed in cost-effective pools. Because resources are allocated through the platform, there is no need to manage or maintain individual physical storage nodes at the bottom layer;

Data transmission in distributed software-defined storage uses Ethernet, so its operation integrates fully into the data center's existing Ethernet environment. This eliminates the investment in the independent SAN network hardware and fiber resources required by traditional storage and greatly reduces the data center's operation and maintenance workload.
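The tiered pool allocation described above can be sketched as a simple policy mapping; the data classes and pool names below are invented for illustration and are not the project's actual configuration:

```python
# Illustrative tiering policy: data classes and pool names are made up.
POOL_BY_CLASS = {
    "core-database":   "ssd-replica3",  # high-speed, high-reliability pool
    "virtual-machine": "ssd-replica3",
    "archive":         "hdd-capacity",  # cost-effective pool for static files
    "attachment":      "hdd-capacity",
}

def pick_pool(data_class: str) -> str:
    """Route a workload to a storage pool by its data class,
    falling back to the cost-effective capacity tier."""
    return POOL_BY_CLASS.get(data_class, "hdd-capacity")

print(pick_pool("core-database"), pick_pool("attachment"))
```

The point is that tiering becomes a platform-level policy decision rather than per-array configuration on individual physical storage nodes.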

As an advanced approach to data storage, distributed software-defined storage supports the future demands of artificial intelligence, big data, and mass data storage. By decoupling from physical systems, it makes storage access more precise and improves the utilization of information resources, making it an important direction for today's data center infrastructure. SDS still has problems to solve, and adopting it alone is not enough; but with its advantages in cost, flexibility, and performance, it will surely become a new cornerstone of the modern data center and play an even greater role in future smart campus construction.

Author: Tang Jie (Southeast University Network and Information Center)

Source: "China Education Network" magazine (November issue)

Compiled by: Zheng Yilong

posting, reprint, or cooperation, please contact: [email protected]

