Re-learning the principles of computer composition (10) - the origin of the "烫烫烫" ("hot hot hot") garbled characters

2019/08/27 00:52:14


program = algorithm + data structure

This corresponds, at the hardware level, to the principles of computer composition:

  • algorithm --- the various computer instructions
  • data structure --- binary data

The computer uses binary 0/1 to represent all information:

  • The machine code used by program instructions is expressed in binary
  • The strings, integers, and floating-point numbers stored in memory are all expressed in binary

Everything in the computer is 0s and 1s. Figuring out how the various kinds of data look at the binary level is a compulsory course for us.

The most common problem in practice is how a text string is represented in binary, and especially the garbled characters we will inevitably run into.

When developing, what is the relationship between Unicode and UTF-8?

Once you understand these, I believe you will be able to track down any garbled-text problem in the future.

1 Understanding binary: "every two, carry one"

There is no essential difference between binary and the decimal system we usually use. Decimal carries "every ten into one"; in binary it becomes "every two into one".

For each digit, we can only use the two digits 0 and 1, compared to the ten digits 0-9 in decimal.

Any decimal integer can be expressed in binary.

Converting a binary number to decimal is very simple: multiply the digit at position N (counting from right to left) by 2^N, then add everything up to get the decimal number.

Of course, since binary is a "language" for programmers, this position count from right to left naturally starts from 0.

For example, for the binary number 0011, the corresponding decimal representation is

$0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 3$

which is decimal 3.
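To make the place-value rule concrete, here is a minimal Python sketch; the function name `binary_to_decimal` is my own choice for illustration:

```python
# A minimal sketch: convert a binary string to decimal by summing
# digit * 2^position, with positions counted from the right starting at 0.
def binary_to_decimal(bits: str) -> int:
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total

print(binary_to_decimal("0011"))  # 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 3
```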

Correspondingly, if we want to convert a decimal number into binary, we use short division (短除法).

That is, take the remainder of dividing the decimal number by 2 as the rightmost digit. Then keep dividing the quotient by 2, placing each new remainder to the left of the previous one, and iterate until the quotient is 0.

  • For example, to convert the decimal number 13 into binary by short division, we go through the following steps:

13 ÷ 2 = 6, remainder 1
 6 ÷ 2 = 3, remainder 0
 3 ÷ 2 = 1, remainder 1
 1 ÷ 2 = 0, remainder 1


  • Therefore, reading the remainders from the last to the first, the corresponding binary number is 1101 (a Python sketch of this procedure follows below)
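Here is a short Python sketch of the short-division procedure just described (again, the function name is mine, not from the source):

```python
# A sketch of the short-division method: repeatedly divide by 2 and
# collect remainders; the remainders read in reverse order are the bits.
def decimal_to_binary(n: int) -> str:
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))

print(decimal_to_binary(13))  # 1101
```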

The examples we just gave are positive numbers. Is the situation the same for negative numbers?

We can treat the leftmost bit of a number as its sign: for example, 0 marks a positive number and 1 marks a negative number.

In this way, the 4-bit binary number 0011 expresses +3, while 1011, whose leftmost bit is 1, means -3. This is the sign-magnitude (原码) representation of an integer.

Sign-magnitude has a very unintuitive shortcoming: 0 can be represented by two different codes. 1000 represents -0 and 0000 represents +0. Programmers who are used to everything corresponding one-to-one will inevitably be driven crazy by this.

So we have another representation: two's complement (补码). We still judge whether the number is positive or negative by the leftmost 0 or 1. However, we no longer treat this bit as a separate sign bit that merely puts a sign in front of the value of the remaining bits. Instead, when computing the value of the whole binary number, the highest bit contributes a negative weight.

For example, the 4-bit two's complement value 1011, converted to decimal, is

$-1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = -5$

If the highest bit is 1, the number must be negative; if the highest bit is 0, it must be positive. Moreover, only 0000 means 0, and 1000 means -8. A 4-bit binary number can thus represent the 16 integers from -8 to 7, without wasting a single bit pattern.

Of course, the more important point is that representing negative numbers in two's complement makes integer addition easy: no special processing is needed; just treat it as ordinary binary addition and you get the correct result.

Let's keep it simple and compute with 4-bit integers, for example -5 + 4 = -1 and -5 + 6 = 1.

Let's convert them to binary and take a look. If they can be added in exactly the same way as unsigned binary integers, it means the same adder circuit works for both.

-5 + 4: 1011 + 0100 = 1111, which is -1
-5 + 6: 1011 + 0110 = 1 0001; the carry out of the 4-bit range is discarded, leaving 0001, which is 1
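The following Python sketch imitates that 4-bit adder by masking sums to 4 bits, showing that two's complement addition really is plain unsigned addition with the carry discarded (the helper names are hypothetical, chosen for illustration):

```python
# A minimal sketch of 4-bit two's complement arithmetic. Masking with
# 0b1111 plays the role of a 4-bit adder that simply discards the
# carry out of the highest bit.
BITS = 4
MASK = (1 << BITS) - 1          # 0b1111

def to_twos_complement(n: int) -> int:
    """Encode a small signed integer into its 4-bit pattern."""
    return n & MASK             # e.g. -5 -> 0b1011

def from_twos_complement(u: int) -> int:
    """Decode a 4-bit pattern back into a signed integer."""
    return u - (1 << BITS) if u >= (1 << (BITS - 1)) else u

for a, b in [(-5, 4), (-5, 6)]:
    raw = (to_twos_complement(a) + to_twos_complement(b)) & MASK
    print(f"{a} + {b}: {raw:04b} -> {from_twos_complement(raw)}")
# -5 + 4: 1111 -> -1
# -5 + 6: 0001 -> 1
```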

2 Representation of character strings: from encoding to numbers

Not only numerical values can be expressed in binary; characters and even richer information can be expressed in binary too.

The most typical example is the string (character string).

The earliest computers only needed English characters, plus digits and some special symbols, so an 8-bit binary byte was enough to represent all the characters needed daily. This is the familiar ASCII code (American Standard Code for Information Interchange).


The ASCII code is just like a dictionary, using 128 of the values an 8-bit binary byte can hold to map to 128 different characters.

For example, the lowercase letter a is number 97 in ASCII, which is binary 0110 0001, or 61 in hexadecimal. The capital letter A is number 65, which is binary 0100 0001, or 41 in hexadecimal.

In ASCII, the character 9 is no longer represented as 0000 1001, as it is in integer representation, but as 0011 1001. The string 15 is not represented by the 8 bits 0000 1111; instead it becomes the two characters 1 and 5 placed one after the other, that is, 0011 0001 and 0011 0101, which need two 8-bit bytes.
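A quick Python check of this difference (using Python's built-in `ord`, which returns a character's code point):

```python
# The integer 9 and the ASCII character "9" have different bit patterns,
# and the string "15" is two ASCII bytes, not one 8-bit integer.
print(f"{9:08b}")            # 00001001 (the integer nine)
print(f"{ord('9'):08b}")     # 00111001 (the character "9", hex 39)

for ch in "15":
    print(ch, f"{ord(ch):08b}", hex(ord(ch)))
# 1 00110001 0x31
# 5 00110101 0x35
```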

We can see that the largest 32-bit signed integer is 2147483647. In integer representation, it needs only 32 bits. But represented as a string, it is 10 characters, and at 8 bits per character it takes a full 80 bits, much more space than the integer representation.

This is why, when storing data, we often use binary serialization rather than simply serializing it into a text format such as CSV or JSON. Whether for integers or floating-point numbers, binary serialization saves a lot of space compared to storing text.
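As a rough illustration of this size gap, the following Python sketch compares the two forms, using the standard `struct` module for fixed-width binary serialization:

```python
import struct

n = 2147483647                    # the largest 32-bit signed integer

as_binary = struct.pack("<i", n)  # fixed-width little-endian int32
as_text = str(n).encode("ascii")  # one byte per decimal digit

print(len(as_binary) * 8)         # 32 bits
print(len(as_text) * 8)           # 80 bits
```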

ASCII represents only 128 characters, which was workable at first; after all, the computer was invented in the United States.

However, as more and more people from different countries began using computers, 128 characters were clearly not enough to express Chinese and other languages. So computer engineers set to work, each creating a charset and character encoding for their own country's language.

Charset

means a set of characters.

For example, "Chinese" is a character set, although that is not an accurate way to describe one. To be more precise, we could say "all Chinese characters appearing in the first edition of the Xinhua Dictionary": that is a character set, because it lets us know unambiguously whether a given character is in the set.

For example, the Unicode we talk about daily is actually a character set that contains 140,000 different characters in 150 languages.

Character encoding

is the dictionary that specifies, for the characters in a character set, how each one is represented in binary.

The Unicode mentioned above can be encoded and stored as binary using UTF-8, UTF-16, or even UTF-32. So with Unicode we are not limited to UTF-8: we could even invent our own GT-32 encoding, say "Geek Time 32". As long as others know this set of encoding rules, the encoded text can be transmitted and displayed normally.
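A small Python illustration of one character set with several encodings; the little-endian variants are my arbitrary choice, to keep the byte dumps free of byte-order marks:

```python
# The same Unicode string encoded three different ways.
s = "a汉"  # U+0061 and U+6C49

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    print(f"{codec:10} {s.encode(codec).hex(' ')}")
# utf-8      61 e6 b1 89
# utf-16-le  61 00 49 6c
# utf-32-le  61 00 00 00 49 6c 00 00
```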


If the same text is stored with one encoding but another program decodes and displays it with a different encoding, garbled characters appear. It is like two armies communicating in code: if they use the wrong codebook, the messages they see are incomprehensible. In the Chinese-speaking world, the most famous example is the meme "holding two 锟斤拷 in hand, shouting 烫烫烫" (kun-jin-kao and tang-tang-tang, the two classic strings of mojibake).

Inexperienced students, on seeing a program output "烫烫烫" ("hot hot hot"), have been known to think the program made the CPU overheat and trigger an alarm, and tried to fix the problem by underclocking the CPU.

Since we want to understand encodings thoroughly today, let's figure out the ins and outs of "锟斤拷" and "烫烫烫".

The source of "Kunjinko"

Suppose we want to use Unicode to record some text, especially legacy text from old character sets, but some of those characters do not exist in Unicode. Unicode uniformly records such characters with the code U+FFFD, the replacement character.

Stored in UTF-8, U+FFFD is \xef\xbf\xbd. Put two of these in a row and you get \xef\xbf\xbd\xef\xbf\xbd. If a program then decodes these bytes as GB2312, they become "锟斤拷". It is just like using the GB2312 codebook to decrypt a message that someone encrypted with UTF-8: naturally, nothing useful can be read out of it.
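This whole chain can be reproduced in a few lines of Python; note that I decode with GBK, a superset of GB2312 that maps these byte pairs the same way:

```python
# Two U+FFFD replacement characters, stored as UTF-8, then wrongly
# decoded as GBK (a superset of GB2312).
garbled = ("\ufffd" * 2).encode("utf-8")
print(garbled)                # b'\xef\xbf\xbd\xef\xbf\xbd'
print(garbled.decode("gbk"))  # 锟斤拷
```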

and "hot hot", it is because if you use Visual Studio debugger, the default MBCS character set

"hot" is represented by 0xCCCC, and 0xCC happens to be Assignment of uninitialized memory. As a result, when it reads an unassigned memory address or variable, the computer starts yelling "hot".

3 Summary and extension

By this point, I believe you can see that binary encoding can represent arbitrary information. As long as a character set and character encoding are established and everyone agrees on them, we can express that information in a computer. So, if you are so inclined, inventing a Klingon encoding of your own is not difficult at all.

However, understanding how to represent values and characters in binary at the logical level is not enough. In computer composition we care not only about the logical representation of values and characters, but also, at the hardware level, about how these values relate to the transistors and circuits we keep mentioning. In the next lecture, I will lift this veil of mystery: starting from clocks and D flip-flops, I will ultimately show you how addition in a computer is realized through circuits.

4 Recommended reading

  • "Code: Language Hidden Behind Computer Software and Hardware"


  • From the telegraph to the computer, this book tells the historical stories of many computing devices, and of course also covers binary and the circuit principles behind it.

Reference

  • "Introduction to the Principles of Computer Composition"
