
1. What is Requests?

Requests is an HTTP library written in Python, built on top of urllib3, and released under the Apache2 License.

It is more convenient than urllib, saves a lot of work, and fully meets the needs of HTTP testing.

In one sentence: a simple, easy-to-use HTTP library implemented in Python.

1.1 Basic usage

Install Requests

pip3 install requests

# Various request methods: the most commonly used are requests.get() and requests.post()
import requests
r = requests.get('https://api.github.com/events')
r = requests.post('http://httpbin.org/post', data={'key': 'value'})
r = requests.put('http://httpbin.org/put', data={'key': 'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')

1.3 GET Request

First, build the simplest GET request. The request URL is http://httpbin.org/get; if the site detects that the client initiated a GET request, it returns the corresponding request information:

import requests
r = requests.get('http://httpbin.org/get')
print(r.text)

GET request with parameters

It can be seen that we successfully initiated a GET request, and the returned result contains the request headers, URL, IP, and other information.

So, for a GET request, how do you usually attach extra information? You append it to the URL: a ? separates the URL from the parameters, and the parameters themselves are separated by &. For example, suppose we want to add two parameters, name=germey and age=22. To construct this request link, can we write it directly as:

r = requests.get('http://httpbin.org/get?name=germey&age=22')

This works, but usually this kind of data is stored in a dictionary. So how do you construct the link then? Use the params parameter. An example is as follows:

import requests
data = {'name': 'germey', 'age': 22}
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)

From the running result you can see that the request URL was automatically constructed as http://httpbin.org/get?name=germey&age=22.

Parsing JSON

In addition, the return type of the page is actually str, but it happens to be in JSON format. So, if you want to parse the returned result directly and get a dictionary, you can call the json() method. An example is as follows:

import requests
r = requests.get('http://httpbin.org/get')
print(type(r.text))
print(r.json())
print(type(r.json()))

You can see that calling the json() method converts the JSON-formatted string into a dictionary.

But note that if the returned result is not in JSON format, a parsing error occurs and a json.decoder.JSONDecodeError exception is thrown.
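For instance, here is a minimal sketch of catching that exception, using httpbin's /html endpoint (which returns HTML rather than JSON):

import requests

r = requests.get('http://httpbin.org/html')
try:
    print(r.json())
except ValueError as e:  # json.decoder.JSONDecodeError is a subclass of ValueError
    print('Response is not JSON:', e)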

# If the query keyword is Chinese or contains other special symbols, it has to be URL-encoded
from urllib.parse import urlencode
import requests

wb = "haiyan海燕"
encode_res = urlencode({"k": wb}, encoding="utf-8")
print(encode_res)  # k=haiyan%E6%B5%B7%E7%87%95
keywords = encode_res.split("=")[1]  # haiyan%E6%B5%B7%E7%87%95
url = "https://www.baidu.com/s?wd=%s&pn=1" % keywords
# url = "https://www.baidu.com/s?" + encode_res
print(url)  # then splice it into the URL
response = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
    },
)

Crawling web pages

The request link above returns a string in JSON form. If you request an ordinary web page, you will of course get the corresponding HTML content. Let's take Zhihu's Explore page as an example (the User-Agent below is only a typical browser identifier):

import requests
import re

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/72.0.3626.121 Safari/537.36'}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
pattern = re.compile('explore-feed.*?question_link.*?>(.*?)</a>', re.S)
titles = re.findall(pattern, r.text)
print(titles)

Here we add headers information, which contains User-Agent field information, that is, browser identification information. If you don't add this, Zhihu will prohibit crawling.

Next we used a basic regular expression to match all the question titles. I will write a separate blog post about regular expressions; here it simply serves as part of the example.

The running result is as follows:

['\nHow to conduct a complete and efficient legal search? \n', '\nHow to draw a Buddha in C language? \n', '\nWhat is it like to pick up a dog? \n', '\nWhy don't boys chase after halfway? \n', '\nWhy is Nobita Nobita called Nobita Sea King? What are the specific deeds? \n', '\nHow terrible can a person's efforts be? \n', '\nWhat is the most educated person you have ever met? \n', '\nIs there any book from contemporary Chinese writers in Europe and the United States that sells well? Any industry can do it? \n', '\nWhich actor in Deyun Club is the best at speaking/Double Supply High? \n', '\nWhy do Luffy's ship stop at the dock and the enemy won't destroy the ship? \n']

We can see that all the question titles were successfully extracted.

Crawling binary data

In the above example, we crawl a page on Zhihu, and in fact it returns an HTML document. What should I do if I want to capture pictures, audio, video, etc.?

Pictures, audio, and video files are essentially composed of binary data. It is only thanks to specific storage formats and corresponding parsing methods that we can see these various forms of multimedia. So, if you want to grab them, you have to get their binary data.

Let’s take GitHub’s site icon as an example to see:

import requests
response = requests.get('https://github.com/favicon.ico')
print(response.text)
print(response.content)

The content crawled here is the site icon, that is, the small icon shown on each browser tab.

Here two properties of the Response object are printed: one is text and the other is content.

In the running result, the first two lines are the output of response.text and the last line is the output of response.content.

You can notice that the former has garbled code, and the latter has a b in front of the result, which represents data of type bytes. Since the picture is binary data, the former is converted to str type when printing, that is, the picture is converted directly into a string, which is naturally garbled.

Next, we save the image we just extracted:

import requests
response = requests.get('https://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(response.content)

The open() method is used here. Its first parameter is the file name, and the second parameter opens the file in binary write mode so that binary data can be written into it.

After running, you will find an icon file named favicon.ico in the current folder.

Similarly, audio and video files can also be obtained in this way.

Add headers

As with urllib.request, we can also pass header information through the headers parameter.

For example, in the Zhihu example above, if headers are not passed, the request cannot succeed normally:

But if headers containing the User-Agent information are added, then there is no problem:
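A minimal sketch of the two cases (the exact response Zhihu returns without headers may vary):

import requests

# Without a User-Agent, Zhihu typically returns an error status instead of the page.
r = requests.get('https://www.zhihu.com/explore')
print(r.status_code)

# With a browser-style User-Agent the request goes through normally.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/72.0.3626.121 Safari/537.36'}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.status_code)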

Of course, we can add any other field information to the headers parameter as well.

1.4 POST request

Previously, we learned about the most basic GET request. Another common request method is POST. Implementing a POST request with requests is also very simple. An example is as follows:

import requests
data = {'name': 'germey', 'age': '22'}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)

Here we request http://httpbin.org/post. This site determines that if the request is a POST, it returns the relevant request information.

The running result is as follows:

{"args": {}, "data": "", "files": {}, "form": {"age": "22", "name": "germey"}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "18", "Content-Type": "application/x-www- form-urlencoded", "Host": "httpbin.org", "User-Agent": "python-requests/2.21.0"}, "json": null, "origin": "139.226.173.216, 139.226.173.216 ", "url": "https://httpbin.org/post"}

It can be seen that we successfully obtained the return result, where the form part is the submitted data, which proves that the POST request was sent successfully.

1.5 Response

After sending a request, a response is naturally returned. In the examples above, we used text and content to get the body of the response. In addition, there are many other attributes and methods for obtaining information, such as the status code, response headers, cookies, and so on. An example is as follows:

import requests
r = requests.get('http://www.jianshu.com')
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.url), r.url)
print(type(r.history), r.history)

Here we print the status_code attribute to get the status code, the headers attribute to get the response headers, the cookies attribute to get the cookies, the url attribute to get the URL, and the history attribute to get the request history.

The running result is as follows:

<class 'int'> 403
<class 'requests.structures.CaseInsensitiveDict'> {'Server': 'Tengine', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Date': 'Thu, 07 Mar 2019 11:58:21 GMT', 'Vary': 'Accept-Encoding', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip', 'x-alicdn-da-ups-status': 'endOs,0,403', 'Via': 'cache25.l2nu17-1[4,0], cache19.l2nu17[4,0], cache3.cn550[86,0]', 'Timing-Allow-Origin': '*', 'EagleId': '24faeb4315519599018091620e'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]>
<class 'str'> https://www.jianshu.com/
<class 'list'> [<Response [301]>]

Because some values are too long, the output is abbreviated. You can see that the headers and cookies attributes return values of type CaseInsensitiveDict and RequestsCookieJar, respectively.

Status code checking

The status code is often used to determine whether a request succeeded, and requests provides a convenient built-in status code query object, requests.codes. An example is as follows:

import requests
r = requests.get('http://www.jianshu.com')
exit() if not r.status_code == requests.codes.ok else print('Request Successfully')

Here, by comparing the returned status code with the built-in success code, we confirm that the request got a normal response: if so, a success message is printed; otherwise the program terminates. requests.codes.ok gives us the success status code 200.

Of course, ok is not the only available condition code. The return codes and the corresponding query names fall into the following categories:

Informational status codes (1xx)

Success status codes (2xx)

Redirect status codes (3xx)

Client error status codes (4xx)

Server error status codes (5xx)
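As a small hedged sketch, a few of the other condition names that requests.codes exposes (the names are assumed to follow the standard HTTP reason phrases):

import requests

print(requests.codes.ok)                     # 200
print(requests.codes.not_found)              # 404
print(requests.codes.internal_server_error)  # 500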

Advanced operations

So far, we have covered the basic usage of requests, such as basic GET and POST requests and the Response object. Next, let's look at some advanced usages of requests, such as file uploads, cookie settings, proxy settings, and so on.

1. File upload

We know that requests can simulate submitting data. If a website requires uploading files, we can also use requests to do that. It is very simple; an example is as follows:

import requests
files = {'file': open('favicon.ico', 'rb')}
r = requests.post('http://www.httpbin.org/post', files=files)
print(r.text)

Earlier we saved a file named favicon.ico, and this time we use it to simulate the file upload process. Note that favicon.ico needs to be in the same directory as the current script. Of course, you can upload other files instead; just change the code accordingly.

{"args": {}, "data": "", "files": {"file": "data:application/octet-stream;base64,AAABAAIAEBAAAAEAIAAoBQAAJgAAACAgAAABACAAKBQAAE4FAAAoAAAAEAAAACAAAAABACAAAAAAAAAFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABERE3YTExPFDg4OEgAAAAAAAAAADw8PERERFLETExNpAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABQUFJYTExT8ExMU7QAAABkAAAAAAAAAAAAAABgVFRf/FRUX/xERE4UAAAAAAAAAAAAAAAAAAAAAAAAAABEREsETExTuERERHhAQEBAAAAAAAAAAAAAAAAAAAAANExMU9RUVF/8VFRf/EREUrwAAAAAAAAAAAAAAABQUFJkVFRf/BgYRLA4ODlwPDw/BDw8PIgAAAAAAAAAADw8PNBAQEP8VFRf/ FRUX/xUVF/8UFBSPAAAAABAQEDAPDQ//AAAA+QEBAe0CAgL/AgIC9g4ODjgAAAAAAAAAAAgICEACAgLrFRUX/xUVF/8VFRf/FRUX/xERES0UFBWcFBQV/wEBAfwPDxH7DQ0ROwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA0NEjoTExTnFRUX/xUVF/8SEhKaExMT2RUVF/8VFRf/ExMTTwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAERERTBUVF/8VFRf/ExMT2hMTFPYVFRf/FBQU8AAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAITExTxFRUX/xMTFPYTExT3FRUX/xQUFOEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFBQU4RUVF/8TExT3FBQU3hUVF/8TExT5Dw8PIQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEBAQHxMTFPgVFRf/FBQU3hERFKIVFRf/FRUX/w8PDzQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAQEEAVFRf /FRUX/xERFKIODg44FRUX/xUVF/8SEhKYAAAAAAAAAAwAAAAKAAAAAAAAAAAAAAAMAAAAAQAAAAASEhKYFRUX/xUVF/8ODg44AAAAABERFKQVFRf/ERESwQ4ODjYAAACBDQ0N3BISFNgSEhTYExMU9wAAAHQFBQU3ERESwRUVF/8RERSkAAAAAAAAAAAAAAADExMTxhUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8TExPGAAAAAwAAAAAAAAAAAAAAAAAAAAMRERSiFRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF /8RERSiAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAQED4TExOXExMT2RISFPISEhTyExMT2RMTE5cQEBA+/NAQEAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABYWGO0WFhfzFhYYlRwcHCUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACQkJAcWFhiAFhYY+BUVF/8VFRf/FRUX/yAgIAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFRUX/hUVF/8VFRf/FhYY+RYWGIIgICAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAbGxscFhYX0BUVF/8VFRf/FRUX/xUVF/8VFRf/KysrBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVFRf9FRUX/xUVF/8VFRf/FRUX/xYWF9IaGhoeAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFhYbLxUVF+YVFRf/FRUX/BYWGLgWFhh0FhYZZxYWGH5VVVUDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABUVF/wVFRf/FRUX/ xUVF/8VFRf/FRUX/xUVF+YWFhsvAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABoaGh0VFRfmFRUX/xUVF/wYGBhJAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFRUX+xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF+YaGhodAAAAAAAAAAAAAAAAAAAAAAAAAAAkJCQHFhYX0RUVF/8VFRf/FRUYnQAAAAAVFSAYFhYYcxUVF5AXFxlmJCQkBwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABwcHBIVFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xYWF9EkJCQHAAAAAAAAAAAAAAAAAAAAABYWGIEVFRf/ FRUX/xUVF/EbGxscHBwcJRYWGOsVFRf/FRUX/xUVF/8XFxpOAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGBgYQBUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xYWGIAAAAAAAAAAAAAAAAAVFRwkFhYY+RUVF/8VFRjuFhYaRRUVKwwWFhfPFRUX/xUVF/8VFRf/FRUX/xYWF8SAgIACAAAAAAAAAAAAAAAAAAAAAAAAAAAVFRi/FRUX/xUVF/8VFRf/FRUX/ xUVF/8VFRf/FRUX/xUVF/8VFRf/FhYY+BYWHSMAAAAAAAAAABYWGJQVFRf/FRUX/xYWF44XFxpaFhYX0RUVF/8VFRf/FRUY4hYWGIAWFhpFHBwcEgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACIiIg8XFxdCFxcZexYWF9sVFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FxcYkwAAAAAnJycNFRUX8hUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/hYWGIIzMzMFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgICAAhYWGHQVFRf8FRUX/xUVF/ 8VFRf/FRUX/xUVF/8VFRfyFRUrDBYWGVIVFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8WFhh0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABUVGGAVFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8WFhlSFRUZkRUVF/8VFRf/FRUX/xUVF/8VFRf/FRUYyv///wEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABYWGLcVFRf/FRUX/xUVF/ 
8VFRf/FRUX/xUVGZEWFhjJFRUX/xUVF/8VFRf/FRUX/xUVF/8WFhlcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFhYZRxUVF/8VFRf/FRUX/xUVF/8VFRf/FhYYyBYWGOEVFRf/FRUX/xUVF/8VFRf/FRUX/xcXFxYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgICAIFhYY+BUVF/8VFRf/FRUX/xUVF/8WFhjgFhYY9RUVF/8VFRf/FRUX/ xUVF/8VFRfyAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWFhjeFRUX/xUVF/8VFRf/FRUX/xYWGPUWFhfzFRUX/xUVF/8VFRf/FRUX/xYWGN4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABUVGMoVFRf/FRUX/xUVF/8VFRf/FhYX8xUVGNkVFRf/FRUX/xUVF/8VFRf/FhYY9P///wEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFhYY4RUVF/8VFRf/FRUX/xUVF/8VFRjZFRUYvxUVF/8VFRf/ FRUX/xUVF/8VFRf/HBwcJQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAgIBAVFRf/FRUX/xUVF/8VFRf/FRUX/xUVGL8WFhiVFRUX/xUVF/8VFRf/FRUX/xUVF/8WFhh2AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFRUYYRUVF/8VFRf/FRUX/xUVF/8VFRf/FhYYlRYWGUcVFRf/FRUX/xUVF/8VFRf/FRUX/xYWGPQZGRkfAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABsbGxMWFhjrFRUX/xUVF/ 8VFRf/FRUX/xUVF/8WFhlHKysrBhUVF/EVFRf/FRUX/xUVF/8VFRf/FRUX/xYWGV0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGBgYSRUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX8SsrKwYAAAAAFhYYlxUVF/8VFRf/FRUX/xUVF/8VFRf/GRkZMwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaGhoeFRUX/xUVF/8VFRf/FRUX/xUVF/8WFhiXAAAAAAAAAAAVFSAYFhYY9BUVF/ 8VFRf/FRUX/xUVF/8YGBg1AAAAAAAAAAAAAAAAFRUrDBgYGCqAgIACAAAAAAAAAAAAAAAAAAAAAP///wEbGxsmHh4eEQAAAAAAAAAAAAAAABcXFyEVFRf/FRUX/xUVF/8VFRf/FhYY9BUVIBgAAAAAAAAAAAAAAAAWFhiCFRUX/xUVF/8VFRf/FRUX/xcXGWYAAAAAQEBABBcXF2IWFhfnFRUX/xYWF/MWFhfSFRUYwRUVGMAWFhfRFRUX8BUVF/8WFhjtFRUYbCsrKwYAAAAAFhYZUhUVF/8VFRf/FRUX/xUVF/8WFhiCAAAAAAAAAAAAAAAAAAAAACQkJAcWFhjIFRUX/xUVF/8VFRf/FRUY1hUVGKgWFhjsFRUX/xUVF/ 8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX7xUVGKoVFRjNFRUX/xUVF/8VFRf/FhYYyCQkJAcAAAAAAAAAAAAAAAAAAAAAAAAAABUVIBgVFRjjFRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/ FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVGOMVFSAYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABYWHC4VFRjjFRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/ FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRjjFhYcLgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABUVIBgWFhjIFRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FhYYyBUVIBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACQkJAcWFhiCFhYY9BUVF/8VFRf/ FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FhYY9BYWGIIkJCQHAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVFSAYFhYYlxUVF/EVFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX/xUVF/8VFRf/FRUX8RYWGJcVFSAYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAKysrBhYWGUcWFhiVFRUYvxUVGNkWFhfzFhYX8xUVGNkVFRi/=" }, "form": {}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "6665", "Content-Type ": "multipart/form-data; boundary=bc5b787ed0aaa230edb1b69ac4c30b65", "Host": "www.httpbin.org", "User-Agent": "python-requests/2.21.0"}, "json": null, " origin": "139.226.173.216, 139.226.173.216", "url": 
"https://www.httpbin.org/post"}

The website returns a response containing a files field, while the form field is empty, which shows that file uploads are identified by a separate files field.

Cookies

When we used urllib to handle cookies, the code was relatively complicated. With requests, obtaining and setting cookies takes only one step.

Let’s first look at the process of obtaining cookies with an example:

import requests
r = requests.get('https://www.baidu.com')
print(r.cookies)
for key, value in r.cookies.items():
    print(key + '=' + value)

The running result is as follows:

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315

Here we first call the cookies attribute to obtain the cookies; you can see that they are of type RequestsCookieJar.

Then we use the items() method to convert them into a list of tuples, and traverse it to output the name and value of each cookie.

Of course, we can also use cookies directly to maintain a login state. The following uses Zhihu as an example. First log in to Zhihu and copy the cookie content from the request headers, as shown below:


Cookie in the headers: _zap=e521c080-1303-4d99-ab5f-8ac180ec6688; _xsrf=bHjCx4sOokNf4gbxOfPSZiXpy6ql8x08; d_c0="ANBgHk5FFg-PTtXfZVwUnV1ZKgwKVqET2ao=|1551972600"; capsion_ticket="2|1:0|10:1551972647|14:capsion_ticket|44:NjdiNGU5MGMzYjlkNDg2MThjZTllYjU1MTQxMzk4ZDE=|ef3c8af44b4003a5c888809fc5983a6604c98ab3f4a7692b0b394aaf2a3f1aac"; tgw_l7_route=578107ff0d4b4f191be329db6089ff48; z_c0="2|1:0|10:1551972730|4:z_c0|92:Mi4xSExXQkNBQUFBQUFBMEdBZVRrVVdEeVlBQUFCZ0FsVk5lb2R1WFFDQjZUNC1kS29EYkJlcHlKSkNyZW1BMWwtQURB|3df7c4c89ca396eef71e2e6793cee37511f5415d33e8ead60500d245bb29dc7b"

Here you can replace this with your own cookies, set them in the headers, and then send the request. An example is as follows:
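A minimal sketch (the cookie value is elided; substitute your own string copied from the browser):

import requests

headers = {
    'Cookie': '_zap=...; _xsrf=...; d_c0=...; z_c0=...',  # replace with your own cookies
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.121 Safari/537.36',
}
r = requests.get('https://www.zhihu.com', headers=headers)
print(r.text)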

Of course, we can also set cookies through the cookies parameter, but then you need to construct a RequestsCookieJar object and split the cookie string yourself. This is more cumbersome, but the effect is the same. An example is as follows:
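A minimal sketch under the same assumption (your own cookie string pasted in place of the elided one):

import requests

cookies = '_zap=...; _xsrf=...; d_c0=...; z_c0=...'  # your own cookie string
jar = requests.cookies.RequestsCookieJar()
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/72.0.3626.121 Safari/537.36'}
for cookie in cookies.split(';'):
    key, value = cookie.split('=', 1)
    jar.set(key.strip(), value)  # put each name/value pair into the jar
r = requests.get('https://www.zhihu.com', cookies=jar, headers=headers)
print(r.text)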

Session maintenance

In requests, if you directly use methods such as get() or post(), you can indeed simulate web page requests, but each call actually belongs to a different session; it is as if you opened different pages with two separate browsers.

Imagine such a scenario: the first request logs in to a website using the post() method, and the second request, after logging in successfully, uses the get() method to fetch your personal information page. In fact, this is equivalent to opening two browsers with two completely unrelated sessions. Can you successfully obtain the personal information? Of course not.
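This is what the Session object solves. A minimal sketch using httpbin: the first request sets a cookie on the session, and the second request, made with the same Session object, still carries it:

import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)  # the 'number' cookie should appear in the response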

SSL certificate verification
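As a brief hedged sketch, the verify parameter controls whether requests checks the server's SSL certificate; passing verify=False skips the check (the URL below is only an illustration, and urllib3 will still print an InsecureRequestWarning unless it is suppressed):

import requests

r = requests.get('https://www.12306.cn', verify=False)  # skip certificate verification
print(r.status_code)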

Proxy settings

For some websites, requesting a few times during testing works and content can be obtained normally. However, once large-scale crawling begins, with large and frequent requests, the website may pop up captchas, redirect to a login or verification page, or even block the client's IP outright, making it inaccessible for a while.

So, to prevent this from happening, we need to set up a proxy, which is done with the proxies parameter. You can set it like this:

import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('https://www.taobao.com', proxies=proxies)

# If the proxy requires HTTP Basic Auth, credentials can be embedded in the proxy URL:
proxies = {'http': 'http://user:password@10.10.1.10:3128'}
requests.get('https://www.taobao.com', proxies=proxies)

Timeout setting

When the local network is poor, or the server responds too slowly or not at all, we may have to wait a long time for a response, or may never receive one and get an error in the end. To avoid waiting for the server indefinitely, we should set a timeout: if no response arrives within that time, an error is raised. This is done with the timeout parameter, and the time is measured from when the request is issued until the server returns a response. An example is as follows:
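A minimal sketch, assuming httpbin as the target and a 1-second limit:

import requests

try:
    r = requests.get('http://httpbin.org/get', timeout=1)
    print(r.status_code)
except requests.exceptions.Timeout:
    print('The request timed out')  # raised if no response arrives within 1 second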

Identity authentication
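For sites protected by HTTP Basic Auth, requests ships the HTTPBasicAuth class (or you can simply pass a (user, password) tuple to the auth parameter). A minimal sketch using httpbin's basic-auth test endpoint, where the credentials are the ones encoded in the URL:

import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://httpbin.org/basic-auth/user/passwd',
                 auth=HTTPBasicAuth('user', 'passwd'))
print(r.status_code)  # 200 if authentication succeeds, 401 otherwise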

Prepared Request
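A minimal sketch of the Prepared Request workflow: build a Request object, turn it into a PreparedRequest with Session.prepare_request(), and send it explicitly (the URL and data here are just illustrative):

import requests
from requests import Request, Session

url = 'http://httpbin.org/post'
data = {'name': 'germey'}
headers = {'User-Agent': 'python-requests-demo'}
s = Session()
req = Request('POST', url, data=data, headers=headers)
prepped = s.prepare_request(req)  # convert the Request into a PreparedRequest
r = s.send(prepped)
print(r.text)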

Original link: https://www.cnblogs.com/zhangrenguo/p/10491821.html
