Python crawler example tutorial: Douban movie rankings with the requests library

2021/09/05 21:59:07 technology 116

In the previous lessons we used the requests library for simple web page collection and for Baidu Translate. This lesson continues with another case from the Python Crawler Example Tutorial: the Douban movie ranking list. Like the case in the previous lesson, it involves the JSON module, asynchronous loading, and partial page loading. We will go through the steps one by one.


1. Main content to obtain


We will use the Douban movie ranking page (https://movie.douban.com/typerank?type_name=%E5%96%9C%E5%89%A7&type=24&interval_id=100:90&action=) to get information about each movie, such as its link, title, and rating.
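To make the query string of the ranking page easier to read, here is a small sketch that splits it into its parameters with the standard library. It assumes the type_name value is the percent-encoded string %E5%96%9C%E5%89%A7; the variable names are only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# Ranking page URL, written with its percent-encoded type_name value.
PAGE_URL = (
    "https://movie.douban.com/typerank"
    "?type_name=%E5%96%9C%E5%89%A7&type=24&interval_id=100:90&action="
)

# parse_qs decodes the percent-encoding; keep_blank_values=True keeps
# the empty "action" parameter instead of silently dropping it.
query = parse_qs(urlsplit(PAGE_URL).query, keep_blank_values=True)

for name, values in query.items():
    print(f"{name} = {values[0]!r}")
```

The type_name value decodes to 喜剧 (comedy), so this page is the comedy ranking; type and interval_id are the values that appear again when the data request is built.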



2. Analyze problem-solving ideas


First, open the URL we want to crawl. Dragging the scroll bar keeps loading more movies while the URL stays the same. This should immediately remind us of the Baidu case from the previous lesson, which behaves the same way: the page uses Ajax to load data asynchronously. To find the URL, headers, parameters, and other details of that request, we do not look through the "All" view in the browser's developer tools but select the XHR filter instead.




3. Write the code


The first step is to import the requests module.




The second step is to get the URL, parameters, headers, and related information.

These can be read from the XHR request shown in the browser's developer tools.




The request details also show that the request type is GET and the response type is JSON, so the code sends a GET request with requests and parses the response as JSON.
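The original code was published as a screenshot that is not available here, so the following is only a minimal sketch of what such a request could look like. The endpoint path `/j/chart/top_list`, the `start` parameter, the User-Agent string, and the function names are assumptions, not taken from the article; `type`, `interval_id`, and `action` are copied from the ranking page URL.

```python
# Assumed XHR endpoint (not shown in the article); note that it carries
# no query string, because the parameters are passed separately.
URL = "https://movie.douban.com/j/chart/top_list"

# Example browser-like User-Agent; copy the real one from your own browser.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_params(limit=100, start=0):
    """Return the query parameters that are sent alongside URL."""
    return {
        "type": "24",              # from the ranking page URL
        "interval_id": "100:90",   # from the ranking page URL
        "action": "",              # from the ranking page URL
        "start": str(start),       # assumed offset of the first movie
        "limit": str(limit),       # number of movies to return
    }


def fetch_movies(limit=100):
    """Send the GET request and return the decoded JSON response."""
    # Third-party dependency (pip install requests); imported here so that
    # build_params above can be used without it.
    import requests

    response = requests.get(
        URL, params=build_params(limit), headers=HEADERS, timeout=10
    )
    response.raise_for_status()   # fail loudly on HTTP errors
    return response.json()        # the response body is JSON
```

Calling `fetch_movies(100)` would then return the decoded JSON for up to 100 movies, provided the assumed endpoint and parameters match what the XHR request in your browser shows.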




Note that:


(1) The "limit" parameter has been removed from the URL, where its original value was 1.

(2) The value of "limit" in the parameters is changed to 100, because "limit" is the number of movies returned. We do not want information about just 1 movie; we want 100. The number can of course be changed as needed.

