Re-learning the principles of computer composition (10) - the origin of the "烫烫烫" ("hot hot hot") garbled characters

2019/08/27 00:52:14


program = algorithm + data structure

This corresponds, at the hardware level, to the principles of computer composition:

  • algorithm --- the various computer instructions
  • data structure --- binary data

The computer uses binary 0/1 to represent all information:

  • The machine code used by program instructions is expressed in binary
  • The strings, integers, and floating-point numbers stored in memory are all expressed in binary

Everything in the computer is 0s and 1s. Figuring out how the various kinds of data look at the binary level is a compulsory course for us.

The most common problem in practice is how a text string is represented in binary, and especially the garbled characters we will inevitably run into.

When developing, what is the relationship between Unicode and UTF-8?

Once you understand these, I believe you will be able to track down any garbled-text problem in the future.

1 Understanding binary: "every two, carry one"

There is no essential difference between binary and the decimal system we usually use. Decimal carries "every ten into one"; in binary it becomes "every two into one".

For each digit, we can only use the two digits 0 and 1, compared to the ten digits 0-9 in decimal.

Any decimal integer can be expressed in binary.

Converting a binary number to decimal is very simple: multiply the digit at position N (counting from right to left) by 2^N, then add everything up to get the decimal number.

Of course, since binary is a "language" for programmers, this position count from right to left naturally starts from 0.

For example, for the binary number 0011, the corresponding decimal representation is

$0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 3$

which is decimal 3.
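To make the place-value rule concrete, here is a minimal Python sketch; the function name `binary_to_decimal` is my own choice for illustration:

```python
# A minimal sketch: convert a binary string to decimal by summing
# digit * 2^position, with positions counted from the right starting at 0.
def binary_to_decimal(bits: str) -> int:
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total

print(binary_to_decimal("0011"))  # 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 3
```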

Correspondingly, if we want to convert a decimal number into binary, we use short division (短除法).

That is, take the remainder of dividing the decimal number by 2 as the rightmost digit. Then keep dividing the quotient by 2, placing each new remainder to the left of the previous one, and iterate until the quotient is 0.

  • For example, to convert the decimal number 13 into binary by short division, we go through the following steps:

13 ÷ 2 = 6, remainder 1
 6 ÷ 2 = 3, remainder 0
 3 ÷ 2 = 1, remainder 1
 1 ÷ 2 = 0, remainder 1


  • Therefore, reading the remainders from the last to the first, the corresponding binary number is 1101 (a Python sketch of this procedure follows below)
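Here is a short Python sketch of the short-division procedure just described (again, the function name is mine, not from the source):

```python
# A sketch of the short-division method: repeatedly divide by 2 and
# collect remainders; the remainders read in reverse order are the bits.
def decimal_to_binary(n: int) -> str:
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))

print(decimal_to_binary(13))  # 1101
```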

The examples we just gave are positive numbers. Is the situation the same for negative numbers?

We can treat the leftmost bit of a number as its sign: for example, 0 marks a positive number and 1 marks a negative number.

In this way, the 4-bit binary number 0011 expresses +3, while 1011, whose leftmost bit is 1, means -3. This is the sign-magnitude (原码) representation of an integer.

Sign-magnitude has a very unintuitive shortcoming: 0 can be represented by two different codes. 1000 represents -0 and 0000 represents +0. Programmers who are used to everything corresponding one-to-one will inevitably be driven crazy by this.

So we have another representation: two's complement (补码). We still judge whether the number is positive or negative by the leftmost 0 or 1. However, we no longer treat this bit as a separate sign bit that merely puts a sign in front of the value of the remaining bits. Instead, when computing the value of the whole binary number, the highest bit contributes a negative weight.

For example, the 4-bit two's complement value 1011, converted to decimal, is

$-1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = -5$

If the highest bit is 1, the number must be negative; if the highest bit is 0, it must be positive. Moreover, only 0000 means 0, and 1000 means -8. A 4-bit binary number can thus represent the 16 integers from -8 to 7, without wasting a single bit pattern.

Of course, the more important point is that representing negative numbers in two's complement makes integer addition easy: no special processing is needed; just treat it as ordinary binary addition and you get the correct result.

Let's keep it simple and compute with 4-bit integers, for example -5 + 4 = -1 and -5 + 6 = 1.

Let's convert them to binary and take a look. If they can be added in exactly the same way as unsigned binary integers, it means the same adder circuit works for both.

-5 + 4: 1011 + 0100 = 1111, which is -1
-5 + 6: 1011 + 0110 = 1 0001; the carry out of the 4-bit range is discarded, leaving 0001, which is 1
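The following Python sketch imitates that 4-bit adder by masking sums to 4 bits, showing that two's complement addition really is plain unsigned addition with the carry discarded (the helper names are hypothetical, chosen for illustration):

```python
# A minimal sketch of 4-bit two's complement arithmetic. Masking with
# 0b1111 plays the role of a 4-bit adder that simply discards the
# carry out of the highest bit.
BITS = 4
MASK = (1 << BITS) - 1          # 0b1111

def to_twos_complement(n: int) -> int:
    """Encode a small signed integer into its 4-bit pattern."""
    return n & MASK             # e.g. -5 -> 0b1011

def from_twos_complement(u: int) -> int:
    """Decode a 4-bit pattern back into a signed integer."""
    return u - (1 << BITS) if u >= (1 << (BITS - 1)) else u

for a, b in [(-5, 4), (-5, 6)]:
    raw = (to_twos_complement(a) + to_twos_complement(b)) & MASK
    print(f"{a} + {b}: {raw:04b} -> {from_twos_complement(raw)}")
# -5 + 4: 1111 -> -1
# -5 + 6: 0001 -> 1
```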

2 Representation of character strings: from encoding to numbers

Not only numerical values can be expressed in binary; characters and even richer information can be expressed in binary too.

The most typical example is the string (character string).

The earliest computers only needed English characters, plus digits and some special symbols, so an 8-bit binary byte was enough to represent all the characters needed daily. This is the familiar ASCII code (American Standard Code for Information Interchange).


The ASCII code is just like a dictionary, using 128 of the values an 8-bit binary byte can hold to map to 128 different characters.

For example, the lowercase letter a is number 97 in ASCII, which is binary 0110 0001, or 61 in hexadecimal. The capital letter A is number 65, which is binary 0100 0001, or 41 in hexadecimal.

In ASCII, the character 9 is no longer represented as 0000 1001, as it is in integer representation, but as 0011 1001. The string 15 is not represented by the 8 bits 0000 1111; instead it becomes the two characters 1 and 5 placed one after the other, that is, 0011 0001 and 0011 0101, which need two 8-bit bytes.
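A quick Python check of this difference (using Python's built-in `ord`, which returns a character's code point):

```python
# The integer 9 and the ASCII character "9" have different bit patterns,
# and the string "15" is two ASCII bytes, not one 8-bit integer.
print(f"{9:08b}")            # 00001001 (the integer nine)
print(f"{ord('9'):08b}")     # 00111001 (the character "9", hex 39)

for ch in "15":
    print(ch, f"{ord(ch):08b}", hex(ord(ch)))
# 1 00110001 0x31
# 5 00110101 0x35
```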

We can see that the largest 32-bit signed integer is 2147483647. In integer representation, it needs only 32 bits. But represented as a string, it is 10 characters, and at 8 bits per character it takes a full 80 bits, much more space than the integer representation.

This is why, when storing data, we often use binary serialization rather than simply serializing it into a text format such as CSV or JSON. Whether for integers or floating-point numbers, binary serialization saves a lot of space compared to storing text.
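As a rough illustration of this size gap, the following Python sketch compares the two forms, using the standard `struct` module for fixed-width binary serialization:

```python
import struct

n = 2147483647                    # the largest 32-bit signed integer

as_binary = struct.pack("<i", n)  # fixed-width little-endian int32
as_text = str(n).encode("ascii")  # one byte per decimal digit

print(len(as_binary) * 8)         # 32 bits
print(len(as_text) * 8)           # 80 bits
```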

ASCII represents only 128 characters, which was workable at first; after all, the computer was invented in the United States.

However, as more and more people from different countries began using computers, 128 characters were clearly not enough to express Chinese and other languages. So computer engineers set to work, each creating a charset and character encoding for their own country's language.

Charset

means a set of characters.

For example, "Chinese" is a character set, although that is not an accurate way to describe one. To be more precise, we could say "all Chinese characters appearing in the first edition of the Xinhua Dictionary": that is a character set, because it lets us know unambiguously whether a given character is in the set.

For example, the Unicode we talk about daily is actually a character set that contains 140,000 different characters in 150 languages.

Character encoding

is the dictionary that specifies, for the characters in a character set, how each one is represented in binary.

The Unicode mentioned above can be encoded and stored as binary using UTF-8, UTF-16, or even UTF-32. So with Unicode we are not limited to UTF-8: we could even invent our own GT-32 encoding, say "Geek Time 32". As long as others know this set of encoding rules, the encoded text can be transmitted and displayed normally.
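A small Python illustration of one character set with several encodings; the little-endian variants are my arbitrary choice, to keep the byte dumps free of byte-order marks:

```python
# The same Unicode string encoded three different ways.
s = "a汉"  # U+0061 and U+6C49

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    print(f"{codec:10} {s.encode(codec).hex(' ')}")
# utf-8      61 e6 b1 89
# utf-16-le  61 00 49 6c
# utf-32-le  61 00 00 00 49 6c 00 00
```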


If the same text is stored with one encoding but another program decodes and displays it with a different encoding, garbled characters appear. It is like two armies communicating in code: if they use the wrong codebook, the messages they see are incomprehensible. In the Chinese-speaking world, the most famous example is the meme "holding two 锟斤拷 in hand, shouting 烫烫烫" (kun-jin-kao and tang-tang-tang, the two classic strings of mojibake).

Inexperienced students, on seeing a program output "烫烫烫" ("hot hot hot"), have been known to think the program made the CPU overheat and trigger an alarm, and tried to fix the problem by underclocking the CPU.

Since we want to understand encodings thoroughly today, let's figure out the ins and outs of "锟斤拷" and "烫烫烫".

The source of "Kunjinko"

Suppose we want to use Unicode to record some text, especially legacy text from old character sets, but some of those characters do not exist in Unicode. Unicode uniformly records such characters with the code U+FFFD, the replacement character.

Stored in UTF-8, U+FFFD is \xef\xbf\xbd. Put two of these in a row and you get \xef\xbf\xbd\xef\xbf\xbd. If a program then decodes these bytes as GB2312, they become "锟斤拷". It is just like using the GB2312 codebook to decrypt a message that someone encrypted with UTF-8: naturally, nothing useful can be read out of it.
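This whole chain can be reproduced in a few lines of Python; note that I decode with GBK, a superset of GB2312 that maps these byte pairs the same way:

```python
# Two U+FFFD replacement characters, stored as UTF-8, then wrongly
# decoded as GBK (a superset of GB2312).
garbled = ("\ufffd" * 2).encode("utf-8")
print(garbled)                # b'\xef\xbf\xbd\xef\xbf\xbd'
print(garbled.decode("gbk"))  # 锟斤拷
```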

and "hot hot", it is because if you use Visual Studio debugger, the default MBCS character set

"hot" is represented by 0xCCCC, and 0xCC happens to be Assignment of uninitialized memory. As a result, when it reads an unassigned memory address or variable, the computer starts yelling "hot".

3 Summary and extension

By this point, I believe you can see that binary encoding can represent arbitrary information. As long as a character set and character encoding are established and everyone agrees on them, we can express that information in a computer. So, if you are so inclined, inventing a Klingon encoding of your own is not difficult at all.

However, understanding how to represent values and characters in binary at the logical level is not enough. In computer composition we care not only about the logical representation of values and characters, but also, at the hardware level, about how these values relate to the transistors and circuits we keep mentioning. In the next lecture, I will lift this veil of mystery: starting from clocks and D flip-flops, I will ultimately show you how addition in a computer is realized through circuits.

4 Recommended reading

  • "Code: Language Hidden Behind Computer Software and Hardware"


  • From the telegraph to the computer, this book tells the historical stories of many computing devices, and of course also covers binary and the circuit principles behind it.

Reference

  • "Introduction to the Principles of Computer Composition"
