


Crawling Data

Since the Korean restriction order took effect, Running Man has been removed from all major video sites except Bilibili, so this article starts from Bilibili and crawls all the comments on the related videos.

Because there are many related videos, this article uses the most representative one: the video with the most views.

After opening that page (https://www.bilibili.com/video/av18089528?from=search&seid=16848360519725142300), start capturing the network requests in the browser's developer tools.

Keep clicking "next page" in the comment section, and you will see requests named reply?callback=... appearing over and over.

[Screenshot: network panel showing the repeated reply?callback= requests]

Open one of these responses and you can see that it contains one page of comments; by constructing similar URLs we can crawl all the comments.

[Screenshot: JSON content of one reply?callback= response]

Let's analyze this URL:

https://api.bilibili.com/x/v2/reply?callback=jQuery17201477141935656543_1541165464647&jsonp=jsonp&pn=368&type=1&oid=18089528&sort=0&_=1541165714862

pn is the page number, and _ is just a timestamp (here in milliseconds since January 1, 1970) that can be derived from time.time(); the remaining parameters stay unchanged. The response data is JSON, but Bilibili is a bit cunning~

Bilibili wraps all of the JSON data inside the JSONP callback jQuery17201477141935656543_1541165464647(...).

So the wrapper has to be stripped off when extracting the data (talk is cheap, show me the code):

html = requests.get(url, headers=headers).text
data = json.loads(html.split('(', 1)[1][:-1])   # strip the "callback(" prefix and the trailing ")"
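For context, here is a minimal sketch of how the per-page URL can be assembled. The oid 18089528 and the callback string come from the captured URL above; the headers dict and the function itself are illustrative assumptions, not code from the original post.

import time
import requests

headers = {'User-Agent': 'Mozilla/5.0'}   # assumption: a plain browser-like header

def page_url(pn, oid=18089528):
    # pn is the page number; '_' is only a cache-busting timestamp in milliseconds
    return ('https://api.bilibili.com/x/v2/reply'
            '?callback=jQuery17201477141935656543_1541165464647&jsonp=jsonp'
            '&pn={}&type=1&oid={}&sort=0&_={}'.format(pn, oid, int(time.time() * 1000)))

With page_url(pn), the two lines above can simply be looped over the page numbers.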

Finally, we crawled all the comments and saved them to a CSV file (shown here opened in Excel). The data format is as follows:

[Screenshot: sample of the crawled comment data]

When writing the CSV, you must remember to pass encoding='utf-8'. Missing it meant the data kept coming out garbled, and I tried all sorts of odd fixes (clicking around, widening columns, re-saving in place) before finding the real cause.
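As a reference point, here is a minimal sketch of the save step with the encoding set explicitly; the column names are made up for illustration and are not the actual field list used in the post.

import csv

def save_comments(rows, path='comments.csv'):
    # rows: one dict per comment; these field names are illustrative only
    fields = ['user', 'sex', 'time', 'comment', 'likes']
    with open(path, 'w', newline='', encoding='utf-8') as f:   # utf-8 prevents the garbling described above
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)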

Data Cleaning

Missing values returned by Bilibili are simply replaced with 0. Poem-style comments are trickier: when such a multi-line comment is stored in the CSV, each line of the comment takes up its own row, and the rest of the comment's fields end up in the last of those rows.

So during processing, the first n-1 rows are appended onto the comment text in row n and then deleted. The timestamps Bilibili returns are Unix timestamps (e.g. 1540882722); they are converted with time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(ts)) into a readable form such as 2018-11-12 22:15:15.
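A minimal sketch of these two cleaning steps is below. The row-merging part assumes that a broken row carries text only in the comment column, which is an assumption about the CSV layout rather than something stated explicitly above.

import time

def merge_broken_rows(rows, text_col=3):
    # Assumption: rows produced by a multi-line comment are empty except for the
    # comment column, and the last row of the group carries the remaining fields.
    fixed, buffer = [], ''
    for row in rows:
        if all(v == '' for idx, v in enumerate(row) if idx != text_col):
            buffer += row[text_col]                  # collect the first n-1 fragments
        else:
            row[text_col] = buffer + row[text_col]   # glue them onto row n
            fixed.append(row)
            buffer = ''
    return fixed

def to_readable(ts):
    # converts a Unix timestamp such as 1540882722 into 'YYYY-MM-DD HH:MM:SS'
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(int(ts)))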

[Screenshot: the cleaned data]

Data analysis

After cleaning, we ended up with 7,513 rows of 11 fields. The data was then analysed with Python and R.
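The analysis functions below each take plain Python lists. As a rough sketch (the file name and column names here are placeholders for whatever the cleaned CSV actually uses), the data can be loaded like this:

import pandas

df = pandas.read_csv('comments_clean.csv', encoding='utf-8')
sex = list(df['sex'])        # passed to male()
week = list(df['week'])      # passed to ana_week()
hour = list(df['hour'])      # passed to ana_hour()
com = list(df['comment'])    # passed to snownlp(), hot(), com_zan(), comment()
zan = list(df['likes'])      # passed to com_zan()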

Gender distribution

[Figure: gender distribution pie chart]

The pie chart shows that nearly 60% of commenters keep their gender confidential; among those who do disclose it, women outnumber men by only about 3 percentage points. This was unexpected: it turns out that men and women like Running Man almost equally.

from pyecharts import Pie   # pyecharts 0.x API

def male(sex):
    att = ['male', 'female', 'confidential']
    val = [sex.count(i) for i in att]   # number of commenters in each category
    pie = Pie("", "Gender Pie Chart", title_pos="right", width=1200, height=600)
    pie.add("", att, val, label_text_color=None, is_label_show=True,
            legend_orient='vertical', is_more_utils=True, legend_pos='left')
    pie.render("sexPie.html")

Comment Weekly Distribution

[Figure: weekly comment distribution bar chart]

Running Man airs in South Korea on Sunday afternoon each week, but it is not updated on Bilibili until Monday.

The weekly distribution chart therefore shows far more comments on Monday than on any other day, followed by Tuesday and Sunday: just before and after the Running Man update, comment volume rises noticeably compared with the rest of the week.

import numpy
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

def ana_week(week):
    weeks = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
    output_file('week_bar.html')
    count = []
    for i in sorted(set(week)):
        if not numpy.isnan(i):
            count.append(week.count(i))   # comments per day of the week
    source = ColumnDataSource(data=dict(
        weeks=weeks, counts=count,
        color=['orange', 'yellowgreen', 'pink', 'darksalmon',
               'lightgreen', 'paleturquoise', 'lightsteelblue']))
    p = figure(x_range=weeks, y_range=(0, 4000), plot_height=250,
               title="Week Counts", toolbar_location=None, tools="")
    p.vbar(x='weeks', top='counts', color='color', width=0.9, legend="Week", source=source)
    p.legend.orientation = "horizontal"
    p.legend.location = "top_right"
    show(p)

Comment time distribution

Besides the weekly counts, I was also curious about the daily pattern: at what time of day do people usually post comments?

[Figure: hourly comment count line chart]

According to the figure, comments increase explosively after 6 o'clock, peak between 11 and 13 o'clock, and then show a second, smaller surge between 15 and 17 o'clock.

In the evening, apart from a dip around 20 o'clock, the hourly comment count stays close to 500. The small hours see the fewest comments, though there are still plenty of night owls.

from bokeh.plotting import figure, output_file, show

def ana_hour(hour):
    h, k = [], []
    for i in range(len(hour)):
        if isinstance(hour[i], str):
            h.append(hour[i][:2])      # keep only the hour, e.g. '22' from '22:15:15'
    for i in sorted(set(h)):
        k.append(h.count(i))
    print(k)
    output_file('hour_line.html')
    p = figure(plot_width=400, plot_height=400, title='Number of comments per hour')
    p.line(sorted(set(h)), k, line_width=2)
    p.circle(sorted(set(h)), k, fill_color="white", size=8)
    show(p)

Comment word count and likes

[Figure: box plots of likes by comment length]

Comparing each comment's word count with its number of likes, the figure shows that longer comments have a higher chance of being liked: comments of more than 100 words average far more likes than shorter ones, while comments of fewer than 10 words are barely liked at all. So as long as you comment thoughtfully and write what everyone is feeling, you can win everyone's recognition.

import plotly
import plotly.graph_objs as go

def com_zan(com, zan):
    q, w, e, r = [], [], [], []
    for i in range(len(com)):
        if len(com[i]) < 10:
            q.append(zan[i])
        if 10 <= len(com[i]) < 50:
            w.append(zan[i])
        if 50 <= len(com[i]) < 100:
            e.append(zan[i])
        if 100 <= len(com[i]):
            r.append(zan[i])
    a = go.Box(y=q, name='0-10 words')
    b = go.Box(y=w, name='10-50 words')
    c = go.Box(y=e, name='50-100 words')
    d = go.Box(y=r, name='100 or more words')
    e = go.Box(y=zan, name='all comments')
    data = [a, b, e, c, d]
    layout = go.Layout(legend=dict(font=dict(size=16)))
    fig = go.Figure(data=data, layout=layout)
    plotly.offline.plot(fig)

Sentiment analysis

[Figure: distribution of comment sentiment scores]

Sentiment analysis was run on each comment individually. The closer the score is to 1, the more positive the sentiment; conversely, the closer it is to 0, the more negative.

As the chart shows, although nearly 600 comments score as strongly negative, most comments score 0.9 or 1.0.

Running Man brings us joy and moves us, and everyone's comments are full of love for it.

from snownlp import SnowNLP
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

def snownlp(com):
    q = []
    for i in com:
        s = SnowNLP(i)
        q.append(round(s.sentiments, 1))   # sentiment score rounded to one decimal
    emotion, count = [], []
    for i in sorted(set(q)):
        emotion.append(str(i))
        count.append(q.count(i))
    # count   = [596, 481, 559, 566, 490, 617, 528, 601, 581, 809, 1685]
    # emotion = ['0.0', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7', '0.8', '0.9', '1.0']
    output_file('Comment sentiment analysis.html')
    source = ColumnDataSource(data=dict(emotion=emotion, counts=count))
    p = figure(x_range=emotion, y_range=(0, 2000), plot_height=250,
               title="Comment sentiment analysis", toolbar_location=None, tools="")
    p.vbar(x='emotion', top='counts', width=0.9, source=source)
    show(p)

Topic Ranking

[Figure: member topic ranking bar chart]

I have always been curious about which MC is the most popular with the audience, so I made a topic ranking by counting how often each member is mentioned in the comments. The chart shows that Haha is the most talked-about MC (a somewhat unexpected result), followed by Lee Kwang-soo and Song Ji-hyo.

Because the statistics only cover Running Man in 2018, Gary's numbers look rather bleak. Of the two newer members, Jeon So-min is mentioned considerably more than Yang Se-chan.

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

def hot(com):
    output_file('Topicity of each member.html')
    # Alias lists for each member. The original code matched Chinese nicknames,
    # which did not survive translation; the lists below keep what is recoverable
    # and should be extended with the nicknames you actually want to match.
    aliases = {
        'Ji Suk-jin':    ['Ji Suk-jin', 'Suk-jin', 'Big Nose'],
        'Yoo Jae-suk':   ['Yoo Jae-suk', 'Jae-suk', 'Grasshopper'],
        'Song Ji-hyo':   ['Song Ji-hyo', 'Ji-hyo'],
        'Lee Kwang-soo': ['Lee Kwang-soo', 'Kwang-soo'],
        'Kim Jong-kook': ['Kim Jong-kook', 'Jong-kook', 'Ability'],
        'Gary':          ['Gary', 'gary'],
        'Haha':          ['haha', 'HAHA', 'Haha'],
        'Jeon So-min':   ['Jeon So-min', 'So-min'],
        'Yang Se-chan':  ['Yang Se-chan', 'Se-chan'],
    }
    name = list(aliases)
    # A comment counts toward a member if it contains any of that member's aliases.
    count = [sum(any(a in c for a in al) for c in com) for al in aliases.values()]
    source = ColumnDataSource(data=dict(
        name=name, counts=count,
        color=['orange', 'yellowgreen', 'pink', 'darksalmon', 'lightgreen',
               'paleturquoise', 'lightsteelblue', 'hotpink', 'yellow']))
    p = figure(x_range=name, y_range=(0, 600), plot_height=250,
               title="topic ranking", toolbar_location=None, tools="")
    p.vbar(x='name', top='counts', color='color', width=0.9, source=source)
    show(p)

Running Man has never been short of "couples" (CPs). There were the Monday Couple, Gary and Song Ji-hyo; the power pair Yoo Jae-suk and Kim Jong-kook; the veteran line, Yoo Jae-suk and Ji Suk-jin; the "my brother" pair, Kim Jong-kook and Haha; and of course the Betrayer Alliance.

Now there are the "national siblings" Yoo Jae-suk and Jeon So-min, the Mapo siblings Song Ji-hyo and Haha, the barbecue line Kim Jong-kook and Haha, and so on.

Their relationships are complicated, so I decided to dig into how the audience sees these various "lines".

Member relationship matrix

[Figure: member relationship heatmap]

The scores are out of 100. The pairings Ji Suk-jin and Yoo Jae-suk, Yoo Jae-suk and Lee Kwang-soo, Kim Jong-kook and Song Ji-hyo, Gary and Song Ji-hyo, Haha and Lee Kwang-soo, and Jeon So-min and Song Ji-hyo all score very high. The Gary and Song Ji-hyo value even reaches 40, meaning that when a comment mentions Gary there is a 40% chance it also mentions Song Ji-hyo; the Monday Couple really is deeply rooted in people's hearts.

Next come Song Ji-hyo and Kim Jong-kook; the long-standing talk that the two will eventually marry is apparently not groundless. Yang Se-chan, on the other hand, is highly correlated with every other member, which means almost no one mentions him on his own. I hope Se-chan can find his place soon and win the audience's recognition!

import numpy as np
import pandas
import seaborn
import matplotlib.pyplot as pyl

def network_edg_csv(com):
    # Same alias lists as in hot(); the original Chinese nicknames were lost in
    # translation, so extend these with whatever nicknames you want to match.
    aliases = {
        'Ji Suk-jin':    ['Ji Suk-jin', 'Suk-jin', 'Big Nose'],
        'Yoo Jae-suk':   ['Yoo Jae-suk', 'Jae-suk', 'Grasshopper'],
        'Song Ji-hyo':   ['Song Ji-hyo', 'Ji-hyo'],
        'Lee Kwang-soo': ['Lee Kwang-soo', 'Kwang-soo'],
        'Kim Jong-kook': ['Kim Jong-kook', 'Jong-kook', 'Ability'],
        'Gary':          ['Gary', 'gary'],
        'Haha':          ['haha', 'HAHA', 'Haha'],
        'Jeon So-min':   ['Jeon So-min', 'So-min'],
        'Yang Se-chan':  ['Yang Se-chan', 'Se-chan'],
    }
    names = list(aliases)
    df = pandas.DataFrame(0.0, index=names, columns=names)
    # Co-occurrence counting: the diagonal counts comments that mention a member,
    # off-diagonal cells count comments that mention both members.
    for i in com:
        mentioned = [m for m, al in aliases.items() if any(a in i for a in al)]
        for x in mentioned:
            for y in mentioned:
                df.loc[x, y] += 1
    # Correlation = (co-occurrences of x and y / number of comments mentioning x) * 100,
    # so row x answers: "if a comment mentions x, how often does it also mention y?"
    for x in names:
        s = df.loc[x, x]
        if s:
            df.loc[x] = df.loc[x] / s * 100
    pyl.figure(figsize=(10, 10))
    ax = seaborn.heatmap(df, cmap='rainbow', linewidths=0.05, vmax=100, vmin=0,
                         annot=True, annot_kws={'size': 6, 'weight': 'bold'})
    pyl.xticks(np.arange(9) + 0.5, names, rotation=-90)
    pyl.yticks(np.arange(9) + 0.5, names, rotation=360)
    ax.set_title('Member correlation')
    pyl.show()

Social network graph

[Figure: member social network graph]

In the social network graph, the closeness of each connection is divided into four levels shown in red, yellow, green and blue: red means a very close connection, blue a weak one.

It shows that Lee Kwang-soo, Haha and Yoo Jae-suk are very closely connected, as are Kim Jong-kook and Song Ji-hyo. As for Gary, since he left Running Man his connections with the other members have become very weak.

import networkx as nx
import pandas
import matplotlib.pyplot as pyl

def network():
    data = pandas.read_csv('run_edge.csv', encoding='utf-8', engine='python')
    G = nx.Graph()
    pyl.figure(figsize=(20, 20))
    for i in data.index:
        G.add_weighted_edges_from([(data.loc[i]['one'], data.loc[i]['two'], data.loc[i]['count'])])
    pos = nx.spring_layout(G)
    # Split the edges into four closeness levels by weight.
    large = [(x, y) for (x, y, z) in G.edges(data=True) if z['weight'] > 100]
    middle = [(x, y) for (x, y, z) in G.edges(data=True) if 50 < z['weight'] <= 100]
    middlev = [(x, y) for (x, y, z) in G.edges(data=True) if 10 < z['weight'] <= 50]
    small = [(x, y) for (x, y, z) in G.edges(data=True) if z['weight'] <= 10]
    nx.draw_networkx_nodes(G, pos, alpha=0.6)
    nx.draw_networkx_edges(G, pos, edgelist=large, width=3, edge_color='red')
    nx.draw_networkx_edges(G, pos, edgelist=middle, width=2, edge_color='yellow')
    nx.draw_networkx_edges(G, pos, edgelist=middlev, width=1, edge_color='yellowgreen')
    nx.draw_networkx_edges(G, pos, edgelist=small, width=0.5, edge_color='green')
    nx.draw_networkx_labels(G, pos, font_size=10, font_family='simhei')
    pyl.axis('off')
    pyl.show()

Word Cloud Map

[Figure: comment word cloud]

I made this word cloud with R, but the background of R's word cloud can only be plain black and white, so I gave up the idea of giving the cloud a patterned shape.

Back to the word cloud itself: there is a lot of discussion about the show, and at the same time the comments express all kinds of love for Running Man.

import jieba
import pandas

def comment(com):
    pl = []
    # Punctuation and other tokens to skip (the original stopword list was garbled
    # in translation; this is a trimmed version covering common punctuation).
    stopword = [' ', '\n', ',', '.', '?', '!', ':', ';', '"', "'", '(', ')', '[', ']',
                '*', '=', '-', '。', '，', '？', '！', '：', '；', '（', '）', '“', '”']
    for i in range(len(com)):
        for j in jieba.cut(com[i], cut_all=False):
            if j not in stopword:
                pl.append(j)
    rows = []
    for s in set(pl):
        if len(s) > 1 and pl.count(s) > 50:      # keep words of 2+ characters seen more than 50 times
            rows.append({'word': s.strip('\n'), 'count': pl.count(s)})
    df = pandas.DataFrame(rows)
    print(df)
    df.to_csv('jieba.csv', encoding='utf-8', index=False, mode='a', header=False)

The word cloud itself is then generated in R:

library(wordcloud2)
data <- read.csv(header=FALSE, 'C:/Users/Yiya/PycharmProjects/untitled/venv/share/doc/jieba.csv')
f <- data.frame(data)
wordcloud2(f)

Finally, I hope Running Man keeps bringing us more and more joy, and that its ratings get better and better.

