The flexible and dynamic architecture practice of Alipay Super App

2021/06/12 11:01:48

The following article is from mPaaS, by Chongyue.

mPaaS

mPaaS (https://aliyun.com/product/mpaas) originates from Ant Financial and is dedicated to providing an efficient, flexible and stable mobile R&D and management platform.

| Introduction

This article is based on Chongyue's talk at the 2019 DevOps International Summit (Beijing). It introduces the challenges that Alipay's large business volume has posed in recent years, and the practice and thinking behind building a flexible, dynamic architecture on the mobile side. We hope it will be helpful to readers.

Meanwhile, five of mPaaS's component capabilities are now officially open for trial use (the application address is at the end of the article). You are welcome to try them.

As a national-level application, Alipay currently has more than 870 million annual active users in China and provides more than 200 services, while keeping its crash rate below 5 in 10,000, and it ships new features and improvements every day. These results were not easy to achieve; they are the product of long-term practical experience.

The evolution of Alipay's architecture has gone through three main stages, which can be compared to a canoe, a battleship and an aircraft carrier.

Canoe era

When Alipay first launched its mobile client, the structure was very simple: apart from a few tool components that were split into modules, all business code was mixed together. This was not much of a problem at first, but as our R&D staff grew rapidly, things started to get tricky. To name a few:

  1. Code that ran fine when a developer committed it at night would often be completely broken after the next morning's update, because other, unrelated teams had committed code that overwrote or polluted it.
  2. As a release approached, everyone was at their busiest, but not with features: they were busy fixing the problems caused by merged code, which wasted time and also ate into the testers' precious time.
  3. Even when a release reluctantly went out in the end, its stability and performance were very poor, because each module looked after only itself, with no unified specification and no unified monitoring.
  4. The biggest headache for Android development was the 65,535 method limit of a single dex file. At that time, Google had not yet launched its multidex solution.

These serious problems made our product iteration unsustainable, so we decided to carry out a thorough refactoring, which brought us into the age of battleships.

The Age of Battleships

When designing the new generation of client architecture, we considered three directions: teamwork, R&D efficiency, and performance and stability.

In terms of teamwork, we wanted the architecture to be reasonably layered: the lower layers sink general capabilities so that they can serve more upper-layer businesses and avoid reinventing the wheel, while at the business level each team can develop and manage its code independently without affecting unrelated businesses. With this goal in mind, we arrived at the following architecture:

The client architecture is divided into four layers: the business layer, service layer, component layer and framework layer.

  • Business layer: only needs to focus on implementing business logic and UI. When a general capability such as payment is needed, developers call the services provided by the lower layers instead of building their own, which keeps core capabilities centralized and easy to monitor.
  • Service layer: commonly used modules such as login, payment and marketing are businesses in their own right, but they also provide services to other businesses. We place such modules in the service layer.
  • Component layer: provides the client's general capabilities, such as security, networking, multimedia and storage. These components offer stable interfaces to upper-layer callers while continuously optimizing their own internal performance and stability. As the cornerstones of the client, they are vital.
  • Framework layer: the most critical part, including containers, micro-applications, the service framework and pipelines. Micro-applications and startup management on the client all depend on the framework layer.

We collectively call the service layer, component layer and framework layer mPaaS, that is, PaaS services on the mobile terminal. These PaaS services are reusable: we use them not only in Alipay but also in other group applications, such as Ant Fortune and MYbank.
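The layering above implies that dependencies should point only downward, from business toward framework. The article does not say how this is enforced, so the following is a minimal sketch under that assumption: each module is tagged with its layer, its declared dependencies are known (all module names are hypothetical), and any upward dependency is flagged. Same-layer dependencies, such as payment using login, are allowed here.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The four client layers, ordered from bottom (0) to top (3)."""
    FRAMEWORK = 0
    COMPONENT = 1
    SERVICE = 2
    BUSINESS = 3


def find_upward_dependencies(
    layers: dict[str, Layer], deps: dict[str, list[str]]
) -> list[tuple[str, str]]:
    """Return (module, dependency) pairs where a module depends on a higher layer."""
    violations = []
    for module, targets in deps.items():
        for target in targets:
            # A lower layer must never reach up into a higher one.
            if layers[target] > layers[module]:
                violations.append((module, target))
    return violations


# Hypothetical module names, used only for illustration.
layers = {
    "transfer": Layer.BUSINESS,
    "payment": Layer.SERVICE,
    "login": Layer.SERVICE,
    "network": Layer.COMPONENT,
}
deps = {
    "transfer": ["payment"],
    "payment": ["login", "network"],
    "network": ["transfer"],  # upward dependency: should be reported
}
print(find_upward_dependencies(layers, deps))  # [('network', 'transfer')]
```

A check like this could run at build time, so that a layering violation fails the build instead of surfacing later as a tangled dependency.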

| Business division

The best way to divide businesses is to isolate their code, so that teams no longer develop in the same codebase and merge conflicts are avoided. On Android this can usually be done with AAR, but unfortunately AAR did not yet exist when we refactored, and even with AAR, packaging time grows linearly with code size.

Our solution borrows the concepts of OSGi and divides the whole client into bundles, each of which can contain its own code, pages and resources. Readers may wonder how this differs from AAR. In fact, the difference is huge.

First, the code in a bundle is already compiled dex, so when building the APK we only need to merge dex files, rather than compiling classes into dex and then merging them as with AAR, which greatly reduces packaging time. Second, a bundle can run in its own ClassLoader, and basic components such as Activity can be loaded through a shell proxy, which makes it possible to deliver services dynamically. Finally, a bundle also contains configuration for micro-applications, services and pipelines, and the framework starts the corresponding components based on this information.
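Because each bundle carries its own configuration, the framework can discover micro-apps, services and pipeline tasks without bundles referencing one another. The sketch below is a hedged Python illustration, not the mPaaS manifest format: it assumes a simple manifest structure (all field names, IDs and class names are hypothetical) and shows how a framework might index installed bundles and reject duplicate IDs. Dex merging, per-bundle ClassLoaders and Activity proxies are Android-specific and are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class BundleManifest:
    """Hypothetical per-bundle configuration; field names are illustrative only."""
    name: str
    micro_apps: dict[str, str] = field(default_factory=dict)  # app ID -> entry class
    services: dict[str, str] = field(default_factory=dict)    # interface -> implementation class
    pipeline_tasks: list[str] = field(default_factory=list)   # startup tasks for the pipeline


@dataclass
class FrameworkIndex:
    """Lookup tables built from all installed bundles."""
    app_owner: dict[str, str] = field(default_factory=dict)      # app ID -> bundle name
    service_owner: dict[str, str] = field(default_factory=dict)  # interface -> bundle name
    pipeline_tasks: list[tuple[str, str]] = field(default_factory=list)  # (bundle, task)


def _claim(table: dict[str, str], key: str, bundle: str, kind: str) -> None:
    # Two bundles declaring the same ID would make lookups ambiguous, so reject it.
    if key in table:
        raise ValueError(f"{kind} {key!r} declared by both {table[key]} and {bundle}")
    table[key] = bundle


def build_index(bundles: list[BundleManifest]) -> FrameworkIndex:
    index = FrameworkIndex()
    for bundle in bundles:
        for app_id in bundle.micro_apps:
            _claim(index.app_owner, app_id, bundle.name, "micro-app")
        for interface in bundle.services:
            _claim(index.service_owner, interface, bundle.name, "service")
        index.pipeline_tasks.extend((bundle.name, task) for task in bundle.pipeline_tasks)
    return index


index = build_index([
    BundleManifest("transfer", micro_apps={"20000001": "TransferApp"}),
    BundleManifest("payment", services={"PaymentService": "PaymentServiceImpl"},
                   pipeline_tasks=["warm_up_payment_sdk"]),
])
print(index.app_owner["20000001"])  # transfer
```

Knowing which bundle owns each ID would let a framework defer loading a bundle until one of its micro-apps or services is first requested; that lazy-loading step is an assumption and is not shown.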

An mPaaS Service is similar to a Service in the Spring framework: it exposes an interface, and callers do not need to know how the service instance is initialized or how its life cycle is managed, since the framework handles all of that. Callers only need to know the method parameters of the target service interface and obtain an instance through the framework's API at call time. The service publisher declares the interface class and an implementation class derived from it in its bundle, and registers the relevant information in the bundle's manifest file. The essential idea is Inversion of Control, which reduces complex dependencies between classes and avoids tedious initialization work.

Developing against interfaces also frees service users from depending on the service provider. If the provider has not finished development, the user can mock the service without modifying their own business code, provided, of course, that both parties have agreed on the service interface.
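The service framework and the mock workflow described above can be sketched as a small Inversion of Control container. This is a minimal Python illustration, not the mPaaS API: `ServiceRegistry`, `find_service`, `PaymentService` and `MockPaymentService` are hypothetical names, and registration happens in code rather than through a bundle manifest.

```python
from abc import ABC, abstractmethod
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceRegistry:
    """Minimal IoC container: providers register factories, callers look up by interface."""

    def __init__(self) -> None:
        self._factories: dict[type, Callable[[], object]] = {}
        self._instances: dict[type, object] = {}

    def register(self, interface: type, factory: Callable[[], object]) -> None:
        self._factories[interface] = factory
        self._instances.pop(interface, None)  # forget any instance from an earlier registration

    def find_service(self, interface: type[T]) -> T:
        # Created lazily on first use and cached, so callers never manage the life cycle.
        if interface not in self._instances:
            factory = self._factories.get(interface)
            if factory is None:
                raise LookupError(f"no service registered for {interface.__name__}")
            self._instances[interface] = factory()
        return self._instances[interface]  # type: ignore[return-value]


class PaymentService(ABC):
    """Interface agreed on by the provider and its callers (hypothetical)."""

    @abstractmethod
    def pay(self, order_id: str, amount_cents: int) -> bool: ...


class MockPaymentService(PaymentService):
    """Stand-in a calling team can register before the real provider is ready."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, int]] = []

    def pay(self, order_id: str, amount_cents: int) -> bool:
        self.calls.append((order_id, amount_cents))
        return True


def checkout(registry: ServiceRegistry, order_id: str, amount_cents: int) -> bool:
    """Business code depends only on the PaymentService interface."""
    return registry.find_service(PaymentService).pay(order_id, amount_cents)


registry = ServiceRegistry()
registry.register(PaymentService, MockPaymentService)  # later: the real implementation
print(checkout(registry, "order-1", 1999))  # True
```

Swapping the mock for the real implementation only changes the `register` call; `checkout` stays untouched, which is the decoupling the paragraph describes.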

Alipay has so many pages that launching an Activity or ViewController directly is not enough for us. We add a layer on top of them called MicroApp, that is, the concept of a micro-application. Each micro-app has a unique application ID that identifies it in the framework. A micro-app has a unified entry point and manages its Activities or ViewControllers according to the dictionary parameters passed in by the caller. This brings many benefits:

  1. As long as the application ID and parameter protocol stay unchanged, callers need not worry about internal refactoring of the target application, and the proliferation of references caused by using Activity or ViewController class names directly no longer exists.
  2. A micro-app's ID and dictionary parameters can easily be turned into a URL, so external applications can use URLs to jump to in-app pages.
  3. From a data perspective, we can collect user behavior statistics by business dimension.
  4. The micro-app concept applies not only to native pages but also to H5 pages and applets. When an application ID is registered as an H5 or applet type, the framework automatically delegates the startup process to the H5 or applet container, and the caller does not need to care which application type an ID corresponds to.
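A micro-app layer like this can be sketched as a registry keyed by application ID, where each app type maps to a container that knows how to start it, and where an ID plus parameters round-trips through a URL. This is a hedged Python illustration: the `example://startapp` scheme, the `appId` query key and the launcher callbacks are assumptions standing in for real native, H5 and applet containers.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlparse

SCHEME = "example"  # hypothetical; the real URL scheme is not given in the article

Launcher = Callable[[str, dict[str, str]], str]


@dataclass(frozen=True)
class MicroApp:
    app_id: str
    app_type: str  # "native", "h5" or "applet"


def to_url(app_id: str, params: dict[str, str]) -> str:
    """Encode an app ID plus dictionary parameters as a URL for external jumps."""
    return f"{SCHEME}://startapp?" + urlencode({"appId": app_id, **params})


class MicroAppRouter:
    def __init__(self) -> None:
        self._apps: dict[str, MicroApp] = {}
        self._launchers: dict[str, Launcher] = {}  # app type -> container that starts it

    def register_app(self, app: MicroApp) -> None:
        self._apps[app.app_id] = app

    def register_launcher(self, app_type: str, launcher: Launcher) -> None:
        self._launchers[app_type] = launcher

    def start(self, app_id: str, params: dict[str, str]) -> str:
        # Callers pass only the ID; the registered type decides which container handles it.
        app = self._apps[app_id]
        return self._launchers[app.app_type](app_id, params)

    def start_from_url(self, url: str) -> str:
        query = dict(parse_qsl(urlparse(url).query))
        app_id = query.pop("appId")
        return self.start(app_id, query)


router = MicroAppRouter()
router.register_app(MicroApp("20000001", "native"))
router.register_app(MicroApp("60000002", "h5"))
router.register_launcher("native", lambda app_id, p: f"native page {app_id} {p}")
router.register_launcher("h5", lambda app_id, p: f"H5 container {app_id} {p}")

url = to_url("20000001", {"amount": "100"})
print(url)                         # example://startapp?appId=20000001&amount=100
print(router.start_from_url(url))  # native page 20000001 {'amount': '100'}
```

Since callers never see class names, the owning team can rename or restructure its pages freely as long as the ID and parameter protocol are kept stable.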

In short, the features provided by micro-applications and service interfaces greatly improve collaboration efficiency between teams. Dependencies between R&D teams become simpler, each team can move forward on its own, and teams can focus more on building their own services.

| Performance optimization

While making major architectural changes to improve R&D efficiency, we also kept optimizing performance to improve the user experience. We worked mainly at three levels:

  • Framework level
  1. We established unified development specifications: business teams use the unified thread pool, storage, network and other components, which are loaded on demand to avoid unnecessary startup work and time-consuming operations.
  2. We introduced the Pipeline mechanism. Any business module that needs initialization at app startup must go through the Pipeline, and the framework schedules the actual business initialization according to business priority.
  3. We use AOP aspects to measure time consumption on common paths and track down performance bottlenecks.
  • Basic indicators

We track common indicators such as crashes, ANRs, memory, storage, power and traffic over the long term. This lets us see clearly how these metrics differ between versions, and we use sampling analysis to locate and solve problems.

  • Breakthroughs

We not only optimize at the application level, but also explore performance improvements at the system level, and we have gained a lot in this area. For example, on some Android system versions, disabling GC during the startup phase reduces startup time by 20% to 30%; on iOS, using the system's Background Fetch mechanism extends the time the process stays alive, so the app opens almost instantly.
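The framework-level measures can be combined in one sketch: businesses register startup work with a pipeline that runs it in priority order, and an AOP-style decorator records how long each step takes. This is a minimal Python illustration under assumed names (`Pipeline`, `PipelineTask`, `timed`); a real mechanism would also handle threading, deferred tasks and reporting, which are omitted here.

```python
import time
from dataclasses import dataclass, field
from functools import wraps
from typing import Callable

timings: dict[str, float] = {}  # step name -> seconds taken by its most recent run


def timed(name: str):
    """AOP-style decorator: records how long the wrapped call takes."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[name] = time.perf_counter() - start
        return wrapper
    return decorator


@dataclass(order=True)
class PipelineTask:
    priority: int  # lower value runs earlier; only this field is compared
    name: str = field(compare=False)
    run: Callable[[], None] = field(compare=False)


class Pipeline:
    """Businesses register startup work here instead of running it themselves."""

    def __init__(self) -> None:
        self._tasks: list[PipelineTask] = []

    def add(self, task: PipelineTask) -> None:
        self._tasks.append(task)

    def execute(self) -> list[str]:
        executed = []
        # sorted() is stable, so tasks with equal priority keep registration order.
        for task in sorted(self._tasks):
            timed(f"pipeline.{task.name}")(task.run)()
            executed.append(task.name)
        return executed


pipeline = Pipeline()
pipeline.add(PipelineTask(20, "marketing", lambda: None))
pipeline.add(PipelineTask(0, "login", lambda: None))
pipeline.add(PipelineTask(10, "payment", lambda: None))
print(pipeline.execute())  # ['login', 'payment', 'marketing']
print(sorted(timings))     # ['pipeline.login', 'pipeline.marketing', 'pipeline.payment']
```

Routing all startup work through one place is what makes both the ordering and the timing possible: the framework, not each business, decides when initialization happens.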

Aircraft carrier era

As mobile payment has become ever more widespread, massive user and business demand has made high availability and flexible dynamism tougher challenges for the Alipay client. As a service platform integrating payment, finance and daily life, Alipay must be able to release its own services and bring in third-party services quickly and stably, while responding proactively and quickly to user feedback and demands.

| Dynamic R&D Model

We changed our R&D model to meet the demand for rapid business iteration, and businesses gradually migrated from native pages to Web hybrid pages. The original R&D model served team collaboration well, but as the business kept growing, the amount of code grew with it and the installation package became too large; at one point it exceeded the iOS code segment size limit and could not pass App Store review. In addition, iterative releases at centralized time points, usually one version a month, were far too slow for the business's update needs. Compared with native development, web applications have clear advantages:

  1. With a single codebase, a web application runs on both iOS and Android clients, which reduces the staffing required.
  2. Each user uses only a small part of Alipay's huge platform day to day. H5 applications can be delivered dynamically, which eliminates redundant storage and reduces package size.
  3. In recent years, dynamic rendering engines such as React Native and Weex have been very active in the community. After trying them on a small scale, and considering the continued development of Web technology and its widely recognized position in the industry, we ultimately chose Web technology as the foundation of our dynamic R&D model.
  4. Web application iteration is freed from the client's centralized release schedule, so each business line can plan its iterations autonomously.

| Polished Web Experience

Although web applications have obvious advantages, their shortcomings on mobile are just as obvious: user experience, performance and available capabilities fall well short of native applications. To close these gaps, we made many improvements, mainly in the following areas:

  1. Front-end and back-end separation with offline resources. Page resources are packaged for offline use, which saves the time spent on resource requests, significantly speeds up page opening, and solves the white-screen problem that easily occurs on poor networks. Meanwhile, data requests go through the Native network channel, which leaves more room for optimization and provides higher security.
  2. Differential update. When the client updates a business application to a new version, it does not download the complete resource package; instead it downloads a smaller differential package that the publishing platform generates based on the version installed locally. This saves bandwidth and traffic and also speeds up business updates.
  3. Push-pull combination, which ensures that the latest business version reaches users. Whenever a new version is released, the business can actively push a message to the client, which updates the application after receiving the notification. At the same time, the client periodically checks the server for newly released versions. Together, these ensure that most users get the latest version shortly after release.
  4. Fault-tolerant fallback. The client may be unable to use or obtain the offline package in time because of network, security or storage permission issues, and we account for this too. When offline resources are published, the publishing platform automatically generates a corresponding online URL and adds it to the application information. If the client finds the offline package unavailable when loading a web application, it immediately loads the content from that URL, maximizing business availability.
  5. A standalone browser kernel on Android. Android fragmentation has existed since the platform's inception and shows no sign of being resolved. Browser kernels differ across system versions and manufacturers, causing endless compatibility problems that give developers headaches and undermine the web's promise of running everywhere. To solve and control these problems once and for all, we integrated an independent UC browser kernel into the app, so that all kernel issues are handled by the UC team and become highly controllable. According to our statistics, browser-related crashes and ANRs dropped significantly after adopting the UC kernel. We can also fix and release security vulnerabilities immediately, far faster than waiting for manufacturer upgrades.
  6. All-round monitoring of web applications. Performance data such as resource loading errors, JS execution errors, blank screens and loading times are collected and reported to the backend so that anomalies can be detected in time.
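Items 2 to 4 above can be sketched together: one update handler serves both push and pull, prefers a differential package for the locally installed version, and page loading falls back to the online URL when the offline package is unusable. This is a hedged Python illustration; the `AppRelease` fields, the example.com URLs and the function names are hypothetical, and package verification and diff application are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AppRelease:
    """What the publishing platform knows about a web app's latest version (fields assumed)."""
    version: str
    full_package_url: str
    diff_package_urls: dict[str, str]  # installed version -> diff package to the latest version
    fallback_url: str                  # online URL generated when the offline package is published


def plan_update(installed: Optional[str], latest: AppRelease) -> Optional[str]:
    """Pick what to download: nothing, a diff against the installed version, or the full package."""
    if installed == latest.version:
        return None
    if installed is not None and installed in latest.diff_package_urls:
        return latest.diff_package_urls[installed]
    return latest.full_package_url


def on_update_signal(installed: Optional[str], latest: AppRelease,
                     download: Callable[[str], None]) -> Optional[str]:
    """Shared handler for both a server push and the client's periodic pull."""
    package = plan_update(installed, latest)
    if package is not None:
        download(package)
    return package


def entry_url(offline_ok: bool, offline_entry: str, latest: AppRelease) -> str:
    """Load from the offline package when usable, otherwise fall back to the online URL."""
    return offline_entry if offline_ok else latest.fallback_url


release = AppRelease(
    version="1.0.3",
    full_package_url="https://cdn.example.com/app-1.0.3.zip",
    diff_package_urls={"1.0.2": "https://cdn.example.com/app-1.0.2-to-1.0.3.diff"},
    fallback_url="https://m.example.com/app/index.html",
)
print(plan_update("1.0.2", release))  # https://cdn.example.com/app-1.0.2-to-1.0.3.diff
print(plan_update("0.9.0", release))  # https://cdn.example.com/app-1.0.3.zip
```

Users whose installed version has no prepared diff simply fall back to the full package, so the diff path is an optimization rather than a requirement.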

| Applets

We not only provide various services ourselves, but also need to bring in third-party services to serve more people. In the past we could only bring in simple third-party H5 pages, which could use only a few of Alipay's functions, and differences in developer skill led to a suboptimal user experience. Applets fully open up Alipay's capabilities, with complete toolchain support, such as an IDE, from development to testing, and a simple, easy-to-use DSL. Third parties can quickly develop and launch applets with a richer experience and more powerful functions than ever before.

| Online High Availability Guarantee System

At Alipay, every developer must understand the online risks before working on a business. A risk assessment, risk prevention measures, risk monitoring and a risk emergency plan must all be ready before going live. Alipay's online high-availability guarantee system consists of four parts: grayscale publishing, real-time monitoring, diagnosis and positioning, and disaster recovery.

  • Grayscale publishing

Grayscale publishing is an effective means of preventing risk. For the client, no matter how thorough offline testing is, it cannot guarantee that everything works in real user environments. Publishing directly to all users is very dangerous and counts as a serious violation within Alipay. Our publishing platform provides a variety of grayscale strategies, including whitelist, time-window and percentage grayscale, as well as grayscale by device model, region and system version. Before a new version is released, active users and device models prone to problems are prioritized for grayscale. Problems found during the grayscale period are fixed, and the grayscale range is expanded step by step until indicators such as the crash rate and freeze rate meet the release standard, at which point the version is fully released.
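Percentage grayscale needs each user to land consistently inside or outside the rollout, so that expanding the range only adds users. A common way to do this, shown here as an assumption rather than Alipay's documented method, is to hash the user ID together with a release ID into a stable bucket. `GrayscaleRule` and its fields are hypothetical and combine several of the strategies listed above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class GrayscaleRule:
    """Hypothetical rule combining whitelist, time-window, percentage, model and region filters."""
    percentage: int = 0                               # share of matching users, 0..100
    whitelist: set[str] = field(default_factory=set)  # always included
    models: set[str] = field(default_factory=set)     # empty set means any device model
    regions: set[str] = field(default_factory=set)    # empty set means any region
    start_ts: float = 0.0                             # time window, in epoch seconds
    end_ts: float = float("inf")


def bucket(user_id: str, release_id: str) -> int:
    """Stable bucket in [0, 100): a user keeps the same bucket for a given release."""
    digest = hashlib.sha256(f"{release_id}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_grayscale(rule: GrayscaleRule, release_id: str, user_id: str,
                 model: str, region: str, now: float) -> bool:
    if user_id in rule.whitelist:
        return True
    if not (rule.start_ts <= now < rule.end_ts):
        return False
    if rule.models and model not in rule.models:
        return False
    if rule.regions and region not in rule.regions:
        return False
    # Raising the percentage only adds users; nobody already included drops out.
    return bucket(user_id, release_id) < rule.percentage


rule = GrayscaleRule(percentage=10, whitelist={"tester-1"}, models={"ModelA"})
print(in_grayscale(rule, "10.2.0", "tester-1", "ModelB", "CN", now=0))  # True (whitelisted)
```

Mixing the release ID into the hash means a different slice of users goes first for each release, so the same people are not always the early adopters.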

  • Real-time monitoring

First, we define a range of online monitoring indicators, including crashes, freezes, smoothness, traffic, memory, storage and business unavailability.

Second, in the reporting strategy, high-priority indicators such as crashes, freezes and service unavailability are reported in real time so that anomalies are detected immediately. Data is reported from an independent process so that it does not affect the main business logic. During business peaks, such as large events like the Spring Festival red envelopes and Double 11, we can dynamically adjust the reporting strategy to relieve pressure on the log servers. In addition to automatic and periodic uploads, we can send diagnostic commands to the client to retrieve logs that normally stay on the device and are not uploaded, such as logcat logs.
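The reporting strategy can be sketched as a reporter that uploads high-priority events immediately and batches everything else, with a server-pushed policy that can hold back low-priority uploads during peaks. This is a hedged Python illustration: the event type names, the `ReportPolicy` fields and the batch size are assumptions, uploads are simulated with a list, and the independent reporting process and diagnostic commands are not modeled.

```python
from dataclasses import dataclass, field

# High-priority indicators that are uploaded immediately, per the strategy above.
REALTIME_TYPES = {"crash", "freeze", "service_unavailable"}


@dataclass
class ReportPolicy:
    """Server-tunable knobs; names and defaults are assumptions."""
    batch_size: int = 50              # low-priority events are uploaded in batches
    upload_low_priority: bool = True  # can be turned off during traffic peaks


@dataclass
class Reporter:
    policy: ReportPolicy
    sent: list[list[dict]] = field(default_factory=list)  # stand-in for network uploads
    buffer: list[dict] = field(default_factory=list)      # low-priority events kept locally

    def record(self, event_type: str, payload: dict) -> None:
        event = {"type": event_type, **payload}
        if event_type in REALTIME_TYPES:
            self.sent.append([event])  # report at once so anomalies surface immediately
            return
        self.buffer.append(event)
        if len(self.buffer) >= self.policy.batch_size:
            self.flush()

    def flush(self) -> None:
        # During a peak, low-priority data stays on the device instead of hitting the log servers.
        if self.buffer and self.policy.upload_low_priority:
            self.sent.append(self.buffer)
            self.buffer = []

    def apply_policy(self, policy: ReportPolicy) -> None:
        """Called when the server pushes a new reporting strategy."""
        self.policy = policy
```

Holding low-priority events locally, rather than dropping them, is a design choice in this sketch: the data can still be uploaded once the peak has passed.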

  • Diagnosis and positioning

Using the various tracking logs reported by the client, we can fully reconstruct the user's operation path and, based on it, try to reproduce the user's problem. This data is more reliable than information provided by users themselves and reduces interference from inaccurate reports. In addition, logcat logs uploaded via diagnostic commands provide more complete information and help us locate problems more precisely. We therefore ask developers to log useful information when writing code.
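Reconstructing an operation path essentially means filtering the tracking logs for one user and ordering them by time within each session. A minimal Python sketch, assuming hypothetical log fields (`session_id`, `timestamp_ms`, `page`, `action`), since the article does not describe the log format:

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class TrackingEvent:
    """One tracking log entry (fields are hypothetical)."""
    user_id: str
    session_id: str
    timestamp_ms: int
    page: str
    action: str


def operation_paths(events: list[TrackingEvent], user_id: str) -> dict[str, list[str]]:
    """Rebuild a user's steps per session, in time order, from unordered log entries."""
    mine = sorted(
        (e for e in events if e.user_id == user_id),
        key=lambda e: (e.session_id, e.timestamp_ms),
    )
    # After sorting, each session's events are contiguous, so groupby yields one group per session.
    return {
        session: [f"{e.page}:{e.action}" for e in group]
        for session, group in groupby(mine, key=lambda e: e.session_id)
    }


logs = [
    TrackingEvent("u1", "s1", 3000, "cashier", "tap_pay"),
    TrackingEvent("u1", "s1", 1000, "home", "tap_transfer"),
    TrackingEvent("u2", "s9", 1500, "home", "open"),
    TrackingEvent("u1", "s1", 2000, "transfer", "input_amount"),
]
print(operation_paths(logs, "u1"))
# {'s1': ['home:tap_transfer', 'transfer:input_amount', 'cashier:tap_pay']}
```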

  • Disaster recovery

When a fault occurs, the first priority is to stop the bleeding and keep losses from growing. We usually build switches into business logic in advance. When a business suffers large-scale errors or asset losses, the backend pushes the business switch to the client, so the service can be temporarily shielded and taken offline.
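A preset switch amounts to a check in business code against values the backend can push at any time. The sketch below is a hedged Python illustration; the switch key, the `SwitchCenter` class and the choice to default to enabled when no value has been pushed are all assumptions.

```python
class SwitchCenter:
    """Holds business switches pushed by the backend."""

    def __init__(self) -> None:
        self._switches: dict[str, bool] = {}

    def on_push(self, update: dict[str, bool]) -> None:
        self._switches.update(update)

    def is_enabled(self, key: str, default: bool = True) -> bool:
        # Until the backend says otherwise, the business runs normally.
        return self._switches.get(key, default)


def open_red_envelope_entry(switches: SwitchCenter) -> str:
    # The switch is preset in business logic so the backend can shield this entry remotely.
    if not switches.is_enabled("red_envelope.entry"):
        return "Service temporarily unavailable"
    return "red envelope page"


switches = SwitchCenter()
print(open_red_envelope_entry(switches))  # red envelope page
switches.on_push({"red_envelope.entry": False})
print(open_red_envelope_entry(switches))  # Service temporarily unavailable
```

Defaulting to enabled keeps the business available if the switch configuration has not arrived yet; a business with asset risk might reasonably choose the opposite default.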

If deadlocks, crashes or home page errors during the client's startup phase exceed a certain threshold, the app automatically clears its data and restores itself to its initial state, which can resolve some startup problems caused by corrupted data.
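One way to implement this self-healing behavior, offered as an assumption since the article gives no details, is a persistent counter of consecutive launches that never reached a healthy state such as a normally displayed home page. Each launch increments the counter up front, a successful launch resets it, and once it reaches a threshold the app wipes its data directory. `StartupGuard`, the file layout and the threshold of 3 are hypothetical.

```python
import shutil
from pathlib import Path


class StartupGuard:
    """Resets app data after too many consecutive launches that never became healthy."""

    def __init__(self, counter_file: Path, data_dir: Path, threshold: int = 3) -> None:
        self.counter_file = counter_file  # must live outside data_dir so the wipe keeps it
        self.data_dir = data_dir
        self.threshold = threshold

    def _read(self) -> int:
        try:
            return int(self.counter_file.read_text())
        except (FileNotFoundError, ValueError):
            return 0

    def _write(self, value: int) -> None:
        self.counter_file.write_text(str(value))

    def on_launch_begin(self) -> bool:
        """Call first thing at launch; returns True if app data was reset."""
        failures = self._read()
        reset = failures >= self.threshold
        if reset:
            shutil.rmtree(self.data_dir, ignore_errors=True)
            self.data_dir.mkdir(parents=True, exist_ok=True)
            failures = 0
        # Count this launch as failed until on_launch_success() proves otherwise.
        self._write(failures + 1)
        return reset

    def on_launch_success(self) -> None:
        """Call once the home page is shown normally."""
        self._write(0)
```

Counting pessimistically at the start of each launch means that a crash or deadlock, which never gets the chance to report itself, is still recorded.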

We use hotpatch technology to fix native code. However, hotpatching is itself risky, so patches must also go through grayscale release to verify online stability step by step. If a patch causes a problem, it must be rolled back immediately.
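Staged verification with rollback can be reduced to a small decision rule evaluated at each grayscale stage: roll back if patched users crash noticeably more than unpatched ones, otherwise expand. This Python sketch is an assumption for illustration; the stage percentages, the crash-rate comparison and the 10% tolerance are hypothetical, not Alipay's published thresholds.

```python
from dataclasses import dataclass

# Share of users receiving the patch at each stage; the values are illustrative.
ROLLOUT_STAGES = [1, 5, 20, 50, 100]


@dataclass
class StageMetrics:
    crash_rate: float           # crashes per launch among patched users
    baseline_crash_rate: float  # the same metric among unpatched users


def next_action(stage_index: int, metrics: StageMetrics, tolerance: float = 1.1) -> str:
    """Expand, finish or roll back a hotpatch rollout based on the current stage's metrics."""
    if metrics.crash_rate > metrics.baseline_crash_rate * tolerance:
        return "rollback"  # the patch makes things worse: withdraw it immediately
    if stage_index + 1 < len(ROLLOUT_STAGES):
        return f"expand to {ROLLOUT_STAGES[stage_index + 1]}%"
    return "complete"


print(next_action(0, StageMetrics(0.0004, 0.0004)))  # expand to 5%
print(next_action(1, StageMetrics(0.0010, 0.0004)))  # rollback
```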
