A large ocean shipping group was established in Shanghai in February 2016. It is a super-large central enterprise directly managed by the State-owned Assets Supervision and Administration Commission of the State Council (SASAC), and it is vital to the national economy and people's livelihood. The group's core industries are shipping, ports, and logistics. Shipping finance, equipment manufacturing, value-added services, and digital innovation serve as its enabling, value-adding industries. Together these form a "3+4" industrial ecosystem, and the group is committed to building a world-class, globally integrated logistics supply chain service ecosystem.
When the response performance of a transaction process degrades, how much business order volume does it affect? A shipping group handling data volumes in the tens or even hundreds of millions needs end-to-end, multi-perspective, multi-dimensional data collection and analysis. Only then can it tackle difficult troubleshooting, slow fault localization, and the pressure of massive business growth. This article describes how Xieyun's APM matrix application monitoring helps the shipping group "ride the wind and waves".
Multiple Challenges
A single monitoring tool cannot provide a complete view of application performance or locate problems quickly. Achieving that requires combining several monitoring tools and correlating their data, and it requires collecting and analyzing data end to end, from multiple viewpoints and across multiple dimensions. At the time, the Ocean Shipping Group's application monitoring was relatively outdated and could not keep up with massive business growth:
- No means of fault backtracking: it was difficult to reconstruct the overall operating state of the system at the time of a fault. That state spans traditional infrastructure monitoring, dynamic environment monitoring, network performance monitoring, log monitoring, and more.
- Poorly integrated monitoring platforms: there were many existing monitoring platforms, but they were isolated from one another. Locating problems in key business systems was difficult and slow, and there was no efficient method for fault localization.
- Faster-growing, more varied data: data was generated ever faster and in ever more types, so events and metrics had to be analyzed and transactions traced. Sources included wire data, network traffic data, flow telemetry data, and customer sentiment. Meanwhile, the rate of change within the IT architecture kept rising. Because of cloud-native and some ephemeral architectures, maintaining observability and improving engagement became a challenge.
Solution
To address the shipping group's actual situation and requirements, Xieyun tailored a new-generation application performance monitoring solution with APM at its core. The solution integrates performance data from the middleware, infrastructure, and network layers, top to bottom, to achieve full-stack performance data management.
Within the unified monitoring system, the APM suite plays the central role. It integrates traditional infrastructure monitoring, dynamic environment monitoring, network performance monitoring, log monitoring, and more. It is also key to performance visualization, root cause analysis, and operations automation.
End-to-end tracking technology
The platform architecture needed a way to monitor execution trajectories across components that responds quickly and scales easily. The solution therefore adopts the idea of recording the end-to-end, full-link execution trajectory of every call as the monitoring foundation for large-scale distributed systems, and builds its end-to-end tracking on full-link analysis.
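The article does not show the tracing data model, so the following is only a minimal sketch. It assumes each call is recorded as a span carrying a trace ID, its own span ID, and its parent's span ID, which is the usual convention in distributed tracing. The `Span` fields, the service names, and the `build_trace_tree` and `critical_path` helpers are hypothetical. The sketch links the spans of one request into a call tree and follows the slowest child at each level to show where the time went.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical span record: one timed call within a traced request."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span (request entry point)
    service: str
    operation: str
    start_ms: int
    duration_ms: int
    children: List["Span"] = field(default_factory=list)


def build_trace_tree(spans: List[Span]) -> Span:
    """Link the spans of one trace into a call tree and return the root."""
    by_id: Dict[str, Span] = {s.span_id: s for s in spans}
    root: Optional[Span] = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def critical_path(root: Span) -> List[str]:
    """Follow the slowest child at each level to find where time is spent."""
    node = root
    path = [f"{node.service}:{node.operation}"]
    while node.children:
        node = max(node.children, key=lambda c: c.duration_ms)
        path.append(f"{node.service}:{node.operation}")
    return path


# Usage with made-up spans from a single booking request.
spans = [
    Span("t1", "a", None, "gateway", "POST /booking", 0, 480),
    Span("t1", "b", "a", "booking-svc", "createOrder", 5, 450),
    Span("t1", "c", "b", "inventory-svc", "reserveSlot", 10, 60),
    Span("t1", "d", "b", "mysql", "INSERT orders", 80, 360),
]
root = build_trace_tree(spans)
print(critical_path(root))
# ['gateway:POST /booking', 'booking-svc:createOrder', 'mysql:INSERT orders']
```

In this made-up trace the database insert dominates the request time, which is the kind of answer full-link tracking is meant to surface without manual log digging.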
Support for billion-level data volumes
Building on the characteristics of full-link call trajectories over execution time, the solution uses an inverted index to locate application exceptions quickly. The location of a record is determined from its attribute values, so real-time queries over massive data sets (tens of millions of records or more) return quickly. The data is also compressed more effectively, reducing storage pressure.
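As an illustration of the inverted-index idea, and not Xieyun's actual storage engine, the sketch below maps each attribute value to the set of trace IDs that carry it. A query such as "error traces of booking-svc" then becomes an intersection of small sets instead of a full scan. The `TraceIndex` class and the attribute names are hypothetical. Production indexes typically also compress their posting lists, which this sketch omits.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class TraceIndex:
    """Hypothetical inverted index: (attribute, value) -> set of trace IDs."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, trace_id: str, attributes: Dict[str, str]) -> None:
        # Register the trace under every attribute value it carries.
        for key, value in attributes.items():
            self._postings[(key, value)].add(trace_id)

    def query(self, **conditions: str) -> Set[str]:
        """Return IDs of traces matching all key=value conditions."""
        postings = [self._postings.get((k, v), set()) for k, v in conditions.items()]
        if not postings:
            return set()
        # Start from the smallest posting set so the intersection stays cheap.
        postings.sort(key=len)
        result = set(postings[0])
        for p in postings[1:]:
            result &= p
        return result


index = TraceIndex()
index.add("t1", {"service": "booking-svc", "status": "error"})
index.add("t2", {"service": "booking-svc", "status": "ok"})
index.add("t3", {"service": "billing-svc", "status": "error"})
print(sorted(index.query(service="booking-svc", status="error")))  # ['t1']
```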
Accurate exception alarm
The solution performs performance anomaly analysis and exception alarming with a complex event processing (CEP) engine. Events flow through an event processing bus, access adapters, and registered engines. During processing, the CEP engine's internal cache, state engine, rule engine, and other components parse and filter events and trigger the corresponding alarm actions.
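The article names the components of the CEP pipeline but not its rules, so the following is a minimal sketch under assumed names. An `EventBus` forwards each published event to registered rules. A hypothetical `ErrorRateRule` keeps a sliding time window of one service's events as its internal state and raises an alarm when the error ratio in that window exceeds a threshold. The window length, threshold, and minimum event count are illustrative values, not ones from the source.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Event:
    ts: float      # seconds since an arbitrary epoch
    service: str
    error: bool


class ErrorRateRule:
    """Hypothetical rule: alarm when a service's error ratio in a sliding window exceeds a threshold."""

    def __init__(self, service: str, window_s: float, threshold: float, min_events: int) -> None:
        self.service = service
        self.window_s = window_s
        self.threshold = threshold
        self.min_events = min_events
        self.window: Deque[Event] = deque()  # the rule's internal state cache

    def on_event(self, ev: Event) -> Optional[str]:
        if ev.service != self.service:
            return None  # filter: this rule only watches one service
        self.window.append(ev)
        # Evict events older than the window, measured from the newest event.
        while ev.ts - self.window[0].ts > self.window_s:
            self.window.popleft()
        if len(self.window) < self.min_events:
            return None
        ratio = sum(e.error for e in self.window) / len(self.window)
        if ratio > self.threshold:
            return f"ALARM {self.service}: error rate {ratio:.0%} in last {self.window_s:.0f}s"
        return None


class EventBus:
    """Minimal event bus: adapters publish events, registered rules consume them."""

    def __init__(self) -> None:
        self.rules: List[ErrorRateRule] = []
        self.alarms: List[str] = []

    def register(self, rule: ErrorRateRule) -> None:
        self.rules.append(rule)

    def publish(self, ev: Event) -> None:
        for rule in self.rules:
            alarm = rule.on_event(ev)
            if alarm is not None:
                self.alarms.append(alarm)


bus = EventBus()
bus.register(ErrorRateRule("booking-svc", window_s=60, threshold=0.5, min_events=4))
for i, err in enumerate([False, True, True, False, True, True]):
    bus.publish(Event(ts=i * 5.0, service="booking-svc", error=err))
print(bus.alarms)
```

A real engine would also deduplicate or suppress repeated alarms. Here, both of the last two events fire because the ratio stays above the threshold.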
Operation and maintenance knowledge graph
Using machine learning, the solution mines historical operations data with a range of algorithms. From this it derives feature profiles and rules for each operations entity, along with the relationships between entities, and assembles them into an operations knowledge graph.
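The article does not specify which algorithms build the knowledge graph. As a deliberately simple stand-in, not the product's method, the sketch below uses association-rule counting over historical incidents. If entity B alarmed in most of the incidents in which A alarmed, it records a directed edge A → B weighted by that confidence. The `mine_relations` function, the incident data, and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def mine_relations(
    incidents: List[Set[str]], min_support: int, min_confidence: float
) -> Dict[Tuple[str, str], float]:
    """Return directed edges (a, b) -> P(b alarmed | a alarmed), estimated from history."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for alarmed in incidents:
        single.update(alarmed)
        for a, b in combinations(sorted(alarmed), 2):
            pair[(a, b)] += 1
    edges: Dict[Tuple[str, str], float] = {}
    for (a, b), together in pair.items():
        if together < min_support:
            continue  # co-occurrence too rare to trust
        for src, dst in ((a, b), (b, a)):
            confidence = together / single[src]
            if confidence >= min_confidence:
                edges[(src, dst)] = round(confidence, 2)
    return edges


# Each set lists the entities that alarmed together in one past incident (made-up data).
history = [
    {"mysql", "booking-svc", "gateway"},
    {"mysql", "booking-svc"},
    {"mysql", "booking-svc", "gateway"},
    {"gateway"},
    {"billing-svc"},
]
graph = mine_relations(history, min_support=2, min_confidence=0.8)
print(graph)
```

Co-occurrence alone cannot tell cause from effect. A fuller graph would combine edges like these with call topology from tracing and with learned entity profiles.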
Value delivered
With Xieyun APM matrix application monitoring, the customer solved these problems and gained the following benefits:
- System anomaly warning: the application performance monitoring platform builds application analysis models to track application health in real time. This enables intelligent early warning, so business staff no longer miss issues.
- Application situational awareness: the monitoring platform provides comprehensive visibility into how applications are running, improving business staff's ability to perceive and analyze the application system accurately.
- Troubleshooting process optimization: business staff spend significantly less time troubleshooting, and the effort has produced a troubleshooting architecture tailored to industry-specific scenarios.
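The article does not describe how the application analysis model decides when to warn. One common approach, shown here only as an assumption, is a dynamic baseline. The current value of a metric is compared with the mean and standard deviation of its own history, and values more than k standard deviations above the mean are flagged. The `baseline_alert` function, the sample latencies, and k = 3 are illustrative choices.

```python
import statistics
from typing import List


def baseline_alert(history: List[float], current: float, k: float = 3.0) -> bool:
    """Flag `current` if it exceeds the historical mean by more than k standard deviations."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    return current > mean + k * std


# Hypothetical p95 latency (ms) for the same time slot on previous days.
history = [120, 118, 125, 122, 119, 121, 124, 117]
print(baseline_alert(history, 128))  # False: within the normal band
print(baseline_alert(history, 180))  # True: far above the baseline
```

Because the threshold is derived from each metric's own history, no one has to hand-tune a fixed limit per service, which is what makes such warnings feel "intelligent" in practice.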
For cloud services, Xieyun provides comprehensive monitoring based on three types of data: metrics, traces, and logs. Aggregating and analyzing data across these three dimensions closes the loop between them and enables fine-grained analysis of host, virtual, and network resources. In addition, Xieyun offers cluster planning, resource scheduling strategies, elastic scaling strategies, and related capabilities tailored to each customer's specific big data platform requirements.
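To make the linking of metrics, traces, and logs concrete, here is a minimal sketch. It assumes that log lines carry the trace ID and that metrics are tagged with the host on which the trace ran. Given one slow trace, it collects that trace's logs by ID and the host's metrics within a time window around the trace. The record types, field names, and the 30-second slack are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogLine:
    ts: float
    trace_id: str
    message: str


@dataclass
class MetricPoint:
    ts: float
    host: str
    name: str
    value: float


@dataclass
class TraceSummary:
    trace_id: str
    host: str
    start: float
    end: float


def correlate(
    trace: TraceSummary,
    logs: List[LogLine],
    metrics: List[MetricPoint],
    slack_s: float = 30.0,
) -> Dict[str, list]:
    """Gather the logs (by trace ID) and host metrics (by time window) for one trace."""
    related_logs = [line for line in logs if line.trace_id == trace.trace_id]
    related_metrics = [
        m for m in metrics
        if m.host == trace.host and trace.start - slack_s <= m.ts <= trace.end + slack_s
    ]
    return {"logs": related_logs, "metrics": related_metrics}


slow = TraceSummary("t42", "node-3", start=100.0, end=102.5)
logs = [
    LogLine(101.0, "t42", "lock wait timeout on orders"),
    LogLine(101.2, "t43", "ok"),
]
metrics = [
    MetricPoint(90.0, "node-3", "cpu_util", 0.97),
    MetricPoint(200.0, "node-3", "cpu_util", 0.20),
    MetricPoint(101.0, "node-5", "cpu_util", 0.35),
]
view = correlate(slow, logs, metrics)
print([line.message for line in view["logs"]], [m.value for m in view["metrics"]])
```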
Xieyun's monitoring products currently provide performance monitoring to customers in finance, telecom operators, manufacturing, and other industries as they undergo cloud-native transformation and digitalization. Xieyun is the leader in performance monitoring for the new generation of cloud applications.