
Authors | Du Qinyuan et al.
Editor | Guo Rui
Produced by | CSDN (ID: CSDNnews)
A story once seen on Hacker News describes the daily life of an Oracle engineer dealing with bugs:
Spend about two weeks figuring out how a magical combination of 20 parameters triggers the bug.
Change a few lines of code in an attempted fix, submit it to the test cluster, and kick off nearly a million test cases, which usually takes 20 to 30 hours.
With luck, a hundred-odd cases fail; sometimes thousands do, so you pick a few to inspect and discover 10 more parameters you had never noticed. After another two weeks, you finally find the parameter combination that really triggers the bug and get all the tests to pass, then add more than 100 new test cases to make sure the fix is covered.
After more than a month of code review, the fix is finally merged, and you start on the next bug...
The engineer later sighed: "I don't work for Oracle anymore. I will never work for Oracle again!"
Oracle 12.2 has nearly 25 million lines of C code. Testing a system that complex is hard, and testing a distributed database is harder still. We never know what SQL users might write or how many combinations of table structures and indexes exist, and we must also consider node failures, network jitter, disk performance degradation, and more: the possibilities are almost unlimited.
So is there a way for the program to automatically check for bugs?

How to "make the program find bugs automatically while you sleep"?
The idea of the project is actually very simple: if we record the code path of every test case run and statistically analyze enough such runs, we can pick out the code suspected of containing the bug. The front end presents the result as stained (colored) source code, producing the effect shown in the figure below:

"The darker the color, the higher the brighter the" means the greater the possibility of containing the error logic. This method is not only suitable for testing database systems, but also for any other complex system. The principle behind

project was originally inspired by a paper by VLDB APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems. The paper mainly revolves around how to diagnose code that triggers database performance fallbacks, and its core idea is also applicable to troubleshooting bugs. The automatic diagnosis system mentioned in the paper consists of three modules: SQLFuzz, SQLMin and SQLDebug.

SQLFuzz: randomly generates SQL and binary-searches the version history for the adjacent pair of versions between which the performance regression appears (a sketch of this bisection follows the list), then passes the case to SQLMin.
SQLMin: simplifies the SQL generated by SQLFuzz with a pruning algorithm, obtaining the minimal SQL that still reproduces the problem, and passes it to SQLDebug. The purpose is to cut irrelevant code paths and reduce noise.
SQLDebug: instruments the source code so that it outputs the code execution path while executing SQL, then analyzes the code paths of the two versions and builds a statistical model to locate the problem.
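The bisection in SQLFuzz is ordinary binary search over an ordered version history. Below is a minimal sketch, assuming the regression persists once introduced; the function name, version strings, and isBad probe are our own illustration, not APOLLO's actual code.

package main

import "fmt"

// firstBadVersion returns the index of the first version that reproduces
// the regression, assuming versions are ordered oldest to newest and the
// regression persists once introduced. In a real system, isBad would run
// the workload against a build of that version and compare performance.
func firstBadVersion(versions []string, isBad func(string) bool) int {
	lo, hi := 0, len(versions)
	for lo < hi {
		mid := lo + (hi-lo)/2
		if isBad(versions[mid]) {
			hi = mid // regression already present; look earlier
		} else {
			lo = mid + 1 // still good; look later
		}
	}
	return lo // equals len(versions) if no version reproduces it
}

func main() {
	versions := []string{"v4.0.0", "v4.0.1", "v4.0.2", "v4.0.3"}
	// Hypothetical probe: pretend the regression appeared in v4.0.2.
	isBad := func(v string) bool { return v >= "v4.0.2" }
	fmt.Println(versions[firstBadVersion(versions, isBad)]) // v4.0.2
}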
The final system automatically generates a test report containing:
The commit that introduced the performance regression.
The source file containing the problem.
The specific function location.
In reality, once concurrency, loops, recursion, and the like are considered, code execution path analysis becomes very complicated. To have something demonstrable within the short span of a Hackathon, we drew on another paper, Visualization of Test Information to Assist Fault Localization. Its core idea is to count how many passing and failing cases execute each code block and then color the blocks according to an analysis algorithm. It is simple and practical.

This idea can actually be applied to other fields as well, which we will come back to later. Next, let's look at how SQLDebug is implemented.

Now for the hard-core details
How to automatically generate test cases?
Since the diagnosis is statistical, we first need to build enough test cases, and that process is best automated. Grammar-based testing has a long history in verifying compiler correctness, and the DBMS community has adopted similar approaches to verify database functionality: for example, the RAGS system developed by Microsoft's SQL Server team for continuous automated database testing, and the well-known community project SQLSmith. This year, another award-winning TiDB Hackathon project, sql-spider, pursued a similar goal.
Here we use go-randgen, PingCAP's open-source random testing framework, to implement SQL fuzzing. It asks the user to write rule files that guide the generation of random SQL test cases. A rule file consists of grammar productions; on each run, randgen performs a random walk over the productions starting from query and generates one SQL statement, tracing a path like the red line in the figure below.
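For illustration, a minimal sketch of what such a rule file might look like; the productions and built-in placeholders (_field, _table) are our own examples, with the syntax paraphrased from memory, so consult go-randgen's documentation for the exact format.

# start symbol: each random walk begins at "query"
query:
    select

select:
    SELECT _field FROM _table WHERE predicate

# recursive productions let the walk generate arbitrarily nested SQL
predicate:
    _field > 0
    | _field IS NULL
    | NOT ( predicate )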

For every production, we compute a color value from the numbers of passing and failing cases it participated in generating, and render the result as the SQLFuzz display page. This page makes it easy to see which productions are most likely to produce error-triggering SQL.

Code tracing
To track the code execution path of each SQL at runtime, the key step is dynamic instrumentation of the program under test. The VLDB paper mentions a binary instrumentation tool, DynamoRIO, but we were not sure it would work on binaries compiled from Go. Another idea: what if we instrument the source code directly, before compilation?
Following the implementation of the go cover tool, we wrote a dedicated source instrumentation tool, tidb-wrapper. It can process the source code of any TiDB version, generate the wrapped code, and inject an HTTP server into the program. Suppose the digest of some SQL is df6bfbff (the digest is the hexadecimal form of the 32-bit MurmurHash of the SQL text, used mainly to cut down the amount of data transferred); then visiting http://tidb-server-ip:43222/trace/df6bfbff returns the source files and code blocks that this SQL passed through.
// http://localhost:43222/trace/df6bfbff
{
  "sql": "show databases",
  "trace": [
    {
      "file": "executor/batch_checker.go",
      "line": [
        // ....
      ]
    },
    {
      "file": "infoschema/infoschema.go",
      "line": [
        [113, 113],
        [261, 261]
        // ....
      ]
    }
  ]
}
Each pair in the line field gives the start and end line numbers of one basic block (inclusive on both ends). A basic block is a stretch of code that can never branch internally, and it is the smallest granularity of our statistics. So how do we identify basic blocks in Go code? The workload is actually considerable. Fortunately, Go's own source code contains exactly this logic; we happened to spot it, carved it out, and it became go-blockscanner.
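To give a rough feel for what block scanning involves, here is a small go/ast sketch that records the line span of every block statement in a file. Real basic blocks are finer-grained, since a block also ends at any branch inside it; that finer logic is what go-blockscanner inherits from Go's cover tool, and this sketch is only our approximation.

package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

func main() {
	fset := token.NewFileSet()
	// Parse a source file of the program under test
	// ("example.go" is a placeholder path).
	f, err := parser.ParseFile(fset, "example.go", nil, 0)
	if err != nil {
		panic(err)
	}
	// Walk the AST and print the line span of each non-empty
	// block statement: a coarse stand-in for basic blocks.
	ast.Inspect(f, func(n ast.Node) bool {
		if blk, ok := n.(*ast.BlockStmt); ok && len(blk.List) > 0 {
			start := fset.Position(blk.List[0].Pos()).Line
			end := fset.Position(blk.List[len(blk.List)-1].End()).Line
			fmt.Printf("block: lines %d-%d\n", start, end)
		}
		return true
	})
}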
Because the main goal is correctness diagnosis, we restrict the system to never execute SQL concurrently on TiDB. That way, every basic block executed between the call to server/conn.go:handleQuery and the SQLDebug module's read of the trace interface can be treated as the execution path of that one SQL. When SQLDebug reads the HTTP interface, the trace information for that SQL is deleted at the same time so memory does not blow up.
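The trace endpoint above is keyed by the SQL digest described earlier. A minimal sketch of how such a digest could be computed, assuming the spaolacci/murmur3 library; tidb-wrapper's actual seed and input normalization may differ.

package main

import (
	"fmt"

	"github.com/spaolacci/murmur3" // a common Go MurmurHash implementation
)

// sqlDigest returns the hexadecimal form of the 32-bit MurmurHash of a
// SQL statement: the short key used to look up its trace data. Seed 0
// and hashing the raw text are assumptions, not the project's spec.
func sqlDigest(sql string) string {
	return fmt.Sprintf("%08x", murmur3.Sum32([]byte(sql)))
}

func main() {
	fmt.Println("/trace/" + sqlDigest("show databases"))
}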
Basic Block Statistics
After obtaining the basic blocks traversed by each SQL, the SQLDebug module builds the following visual model for every basic block.
First, color: the higher the proportion of failing cases among all the cases that pass through a block, the darker the block.

Then, brightness: the higher the proportion of failing cases that pass through a block among all failing cases, the brighter the block.

Since we already have a color indicator, why add brightness? The brightness indicator compensates for a bias in the color score. For example, a code path executed by only one failing case gets the top color score of 1, yet it is hardly representative: only one of many failing cases passed through it, so it is most likely not the real cause of the error. Brightness screens out such paths; only blocks that are both dark and bright are truly suspicious.
The two models above mainly come from the Visualization paper mentioned earlier. We also added a file ranking indicator: the greater the density of failing cases in a file (normalized by its basic blocks), the higher the file ranks.

After the front end receives these indicators, it displays the files in the order of that ranking; the higher a file appears, the more likely it contains the problem.
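To make the two block indicators concrete, here is a minimal sketch in Go following the definitions above; the type and function names are ours, not the project's code.

package main

import "fmt"

// blockStats counts how many passing and failing cases executed a block.
type blockStats struct {
	pass, fail int
}

// color: share of failing cases among all cases that hit the block.
func color(b blockStats) float64 {
	if b.pass+b.fail == 0 {
		return 0
	}
	return float64(b.fail) / float64(b.pass+b.fail)
}

// brightness: share of all failing cases that hit the block.
func brightness(b blockStats, totalFail int) float64 {
	if totalFail == 0 {
		return 0
	}
	return float64(b.fail) / float64(totalFail)
}

func main() {
	// A block hit by exactly one failing case and no passing ones gets
	// color 1.0, but with 50 failing cases in total its brightness is
	// only 0.02, so it is not flagged as truly suspicious.
	b := blockStats{pass: 0, fail: 1}
	fmt.Printf("color=%.2f brightness=%.2f\n", color(b), brightness(b, 50))
}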

Clicking to expand a file reveals the stained code blocks:

After some simple experiments, file-level diagnosis proved fairly accurate, while basic-block-level diagnosis is still rough. This has much to do with not having implemented SQLMin, which would strip a lot of noise out of the statistics.

Can we do something else with this?

Seeing this, you may think the project is merely automated testing for database systems. In fact, automatic code diagnosis can inspire much more.
Source code teaching
Reading and analyzing the source code of a complex system is a headache. Could runtime visual tracing of source code become a general-purpose tool? You could then watch intuitively how the code runs as the program executes, which would greatly help in understanding a codebase quickly. Going further, could it be combined with in-browser source execution to build an online teaching application?
Full-link test coverage statistics
Go itself ships a coverage tool for unit tests, but code generally must also pass e2e tests, integration tests, stability tests, and so on. Could the coverage of all these tests be measured in the same way and integrated with CI systems and automated testing platforms? With code staining we could likewise output a heatmap of code execution; combined with a profiler, could that help locate performance problems too?

Chaos Engineering
PingCAP runs many chaos testing platforms, such as Schrodinger and Jepsen, to verify the robustness of distributed systems. One drawback of chaos testing is that a problem is hard to reproduce once it has occurred, so you can only guess where the code might be wrong from the circumstances at the time. If the code execution path were recorded while the program runs, we could narrow the scope using the logs and monitoring around the time of the failure, then combine that with the recorded code path to locate the cause accurately and quickly.
Integration with distributed tracing systems
Google has a paper introducing Dapper, its internal distributed tracing system; the community has the well-known OpenTracing project, and Apache has the similar project SkyWalking. A typical tracing system tracks how a user request flows across services and uses visualization to assist troubleshooting, but its granularity generally stops at the service level. If we passed trace_id and span_id down into the code-block instrumentation as labels, could we drill from the tracing system's UI straight down into the source code? Doesn't that sound particularly cool?

Next steps
So far we have only completed a very simple prototype; there is still a way to go before the program finds bugs while we sleep, and we plan to keep improving the project.
The first step is to support running test cases in parallel, so that enough samples can be gathered in a short time and the analysis becomes more accurate. We also want to minimize the performance impact of the injected code so the approach can be applied more widely, for example to performance and stress testing, and even be switched on in production.
If you are itching to try it by now, here is the complete source code of the project:
https://github.com/fuzzdebugplatform/fuzz_debug_platform
Welcome to hack!
Author profiles:
Huang Baoling, PingCAP front-end development engineer, likes React and TypeScript.
Man Junpeng, an efficiency tooling engineer at PingCAP, currently works on Benchmark- and Stability-related tools.
Du Qinyuan, a graduate student at the University of Science and Technology of China, interned at PingCAP working on database testing tools.
Han Yubo, a graduate student at the University of Science and Technology of China, interned at Tradeshift working on front-end development.
[End]