I crawled my WeChat friends' data with Python, and this is what they turned out to be like

With the popularity of WeChat, more and more people have started to use it. WeChat has gradually changed from a simple piece of social software into a way of life: people need it for everyday communication and for work. Every friend in WeChat represents a different role that people play in society.

Today's article uses Python to analyze data on WeChat friends. The dimensions selected are gender, avatar, signature, and location, and the results are presented mainly as charts and word clouds. For text information, two methods are used: word frequency analysis and sentiment analysis. As the saying goes, a craftsman who wants to do his job well must first sharpen his tools. Before starting, let's briefly introduce the third-party modules used in this article:

  • itchat: a Python wrapper around the WeChat web interface, used in this article to obtain WeChat friend information.
  • jieba: the Python version of the "Jieba" (stutter) Chinese word segmenter, used in this article to segment text information.
  • matplotlib: a Python charting module, used in this article to draw bar charts and pie charts.
  • snownlp: a Python module for Chinese text processing, used in this article to judge the sentiment of text information.
  • PIL: an image processing module in Python, used in this article to process images.
  • numpy: a numerical computing module in Python, used in this article together with the wordcloud module.
  • wordcloud: a word cloud module in Python, used in this article to draw word cloud pictures.
  • Tencent Youtu: the Python SDK provided by Tencent, used in this article to detect faces and extract image tag information.

For how to use itchat, refer to the earlier article: How to use Python to view the messages withdrawn by WeChat friends?

The above modules can be installed through pip (PIL is installed as the Pillow package). For detailed instructions on the use of each module, please refer to their respective documentation.

01

Data analysis

The premise of analyzing WeChat friend data is to obtain friend information. By using the itchat module, all this will become very simple. We can achieve this with the following two lines of code:

itchat.auto_login(hotReload=True)
friends = itchat.get_friends(update=True)

Just as when we log in to the web version of WeChat, we log in by scanning the QR code with our mobile phone. The friends object returned here is a collection whose first element is the current user. Therefore, in the following analysis we always take friends[1:] as the original input data. Each element in the collection is a dictionary. Taking myself as an example, you can see that it contains the fields Sex, City, Province, HeadImgUrl, and Signature; the analysis below is built on these fields:
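To make the shape of this data concrete, here is a minimal sketch that does not call itchat at all. The sample list and the pick_fields helper are assumptions made up for illustration; a real friend dictionary contains many more keys.

```python
# Hypothetical sample mimicking the shape of the list returned by
# itchat.get_friends(); every value here is invented for illustration.
friends = [
    {'NickName': 'me', 'Sex': 1, 'Province': 'Shaanxi', 'City': "Xi'an",
     'HeadImgUrl': '', 'Signature': 'the current user'},
    {'NickName': 'friend-a', 'Sex': 2, 'Province': 'Ningxia', 'City': 'Yinchuan',
     'HeadImgUrl': '', 'Signature': 'work hard'},
    {'NickName': 'friend-b', 'Sex': 0, 'Province': '', 'City': '',
     'HeadImgUrl': '', 'Signature': ''},
]

# The fields the rest of the article works with.
FIELDS = ('Sex', 'Province', 'City', 'HeadImgUrl', 'Signature')


def pick_fields(friends):
    """Skip the first entry (the logged-in user) and keep only the analysed fields."""
    return [{key: friend.get(key, '') for key in FIELDS} for friend in friends[1:]]


records = pick_fields(friends)
print(len(records))
print(records[0])
```

Slicing once at the start, as above, avoids repeating friends[1:] in every analysis function.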

02

Friend gender

To analyze the gender of friends, we first need the gender information of all friends. Here we extract the Sex field from each friend's information and count the number of Male, Female, and Unknown values. We assemble these three counts into a list and then use the matplotlib module to draw a pie chart. The code is as follows:

from collections import Counter
import matplotlib.pyplot as plt

def analyseSex(friends):
    # Sex is 0 (Unknown), 1 (Male) or 2 (Female); friends[0] is the current user
    sexs = list(map(lambda x: x['Sex'], friends[1:]))
    counter = Counter(sexs)
    # Look each key up explicitly so the counts line up with the labels
    counts = [counter[key] for key in (0, 1, 2)]
    labels = ['Unknown', 'Male', 'Female']
    colors = ['red', 'yellowgreen', 'lightskyblue']
    plt.figure(figsize=(8, 5), dpi=80)
    plt.axes(aspect=1)
    plt.pie(counts,                # gender statistics
            labels=labels,         # gender labels
            colors=colors,         # color of each wedge
            labeldistance=1.1,     # distance of the labels from the center
            autopct='%3.1f%%',     # format of the percentage text
            shadow=False,          # whether the pie chart shows a shadow
            startangle=90,         # starting angle of the pie chart
            pctdistance=0.6        # distance of the percentage text from the center
    )
    plt.legend(loc='upper right')
    plt.title(u"%s's WeChat friend gender composition" % friends[0]['NickName'])
    plt.show()

Here is a brief explanation of this code. The Sex field in WeChat has three values: Unknown, Male, and Female, stored as 0, 1, and 2 respectively. Counter() from the collections module counts how often each value occurs. Its items() method returns (value, count) tuples in the order the values were first seen, not sorted by key, so the code looks up keys 0, 1, and 2 explicitly; this guarantees that the counts line up with the labels.

We then pass the counts to matplotlib, which works out the percentage of each value itself when drawing the pie chart. The following figure is the gender distribution of friends drawn by matplotlib:
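The ordering behaviour of Counter is easy to check on a small list. The sketch below uses an invented list of Sex values; it contrasts reading the counts in iteration order with looking them up by key.

```python
from collections import Counter

# Hypothetical Sex values; the first value seen is 2, not 0.
sexes = [2, 1, 1, 0, 1, 2]
counter = Counter(sexes)

# Iteration follows first-seen order (2, 1, 0 here), so these counts would
# be paired with the wrong labels if the labels are ordered 0, 1, 2.
first_seen_counts = [count for _, count in counter.items()]

# Looking up each key keeps the counts aligned with
# ['Unknown', 'Male', 'Female']; a missing key simply counts as 0.
aligned_counts = [counter[key] for key in (0, 1, 2)]

print(first_seen_counts)
print(aligned_counts)
```

The same lookup also handles the case where one of the three values never occurs, because a Counter returns 0 for a missing key instead of raising an error.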

03

Friend avatar

The analysis of friends' avatars covers two aspects. First, what proportion of friends use a face as their avatar? Second, what keywords can be extracted from those avatars?

Here we need to download each avatar to a local directory according to the HeadImgUrl field, and then use the API provided by Tencent Youtu to detect whether the avatar contains a face and to extract the tags in the image. The former is a classification summary, so we present the result as a pie chart; the latter is text analysis, so we present the result as a word cloud. The key code is as follows:

import os
import time
import numpy as np
from PIL import Image
from wordcloud import WordCloud
# FaceAPI is the author's own wrapper around the Tencent Youtu SDK

def analyseHeadImage(friends):
    # Init Path
    basePath = os.path.abspath('.')
    baseFolder = os.path.join(basePath, 'HeadImages')
    if not os.path.exists(baseFolder):
        os.makedirs(baseFolder)

    # Analyse Images
    faceApi = FaceAPI()
    use_face = 0
    not_use_face = 0
    image_tags = ''
    for index in range(1, len(friends)):
        friend = friends[index]
        # Save HeadImages
        imgFile = os.path.join(baseFolder, 'Image%s.jpg' % str(index))
        imgData = itchat.get_head_img(userName=friend['UserName'])
        if not os.path.exists(imgFile):
            with open(imgFile, 'wb') as file:
                file.write(imgData)

        # Detect Faces
        time.sleep(1)
        result = faceApi.detectFace(imgFile)
        if result == True:
            use_face += 1
        else:
            not_use_face += 1

        # Extract Tags (trailing comma keeps tags of different avatars apart)
        result = faceApi.extractTags(imgFile)
        image_tags += ','.join(list(map(lambda x: x['tag_name'], result))) + ','

    labels = [u'Use face avatar', u'Do not use face avatar']
    counts = [use_face, not_use_face]
    colors = ['red', 'yellowgreen', 'lightskyblue']
    plt.figure(figsize=(8, 5), dpi=80)
    plt.axes(aspect=1)
    plt.pie(counts,                # face avatar statistics
            labels=labels,         # face avatar labels
            colors=colors,         # color of each wedge
            labeldistance=1.1,     # distance of the labels from the center
            autopct='%3.1f%%',     # format of the percentage text
            shadow=False,          # whether the pie chart shows a shadow
            startangle=90,         # starting angle of the pie chart
            pctdistance=0.6        # distance of the percentage text from the center
    )
    plt.legend(loc='upper right')
    plt.title(u"%s's WeChat friends' use of face avatars" % friends[0]['NickName'])
    plt.show()

    image_tags = image_tags.encode('iso8859-1').decode('utf-8')
    back_coloring = np.array(Image.open('face.jpg'))
    wordcloud = WordCloud(
        font_path='simfang.ttf',
        background_color="white",
        max_words=1200,
        mask=back_coloring,
        max_font_size=75,
        random_state=45,
        width=800,
        height=480,
        margin=15
    )
    wordcloud.generate(image_tags)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

Here we create a new HeadImages directory in the current directory to store the avatars of all friends. We then use a class called FaceAPI, which wraps the Tencent Youtu SDK. It calls two API interfaces, face detection and image tag recognition. The former is used to count the friends who "use face avatars" and those who "do not use face avatars"; the latter accumulates the tags extracted from each avatar. The analysis results are shown in the figure below:
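The counting and tag accumulation described above can be tried without network access or a Tencent Youtu account. In the sketch below, StubFaceAPI is a hypothetical stand-in that returns canned answers; only the method names detectFace and extractTags and the 'tag_name' key are taken from the article's code, and everything else (the file names, the tags, the summarise helper) is an assumption for illustration.

```python
from collections import Counter


class StubFaceAPI:
    """Hypothetical stand-in for the article's FaceAPI wrapper (canned results only)."""

    def __init__(self, canned):
        self._canned = canned

    def detectFace(self, path):
        # True when the (pretend) image contains a face.
        return self._canned[path]['has_face']

    def extractTags(self, path):
        # Same shape the article's code expects: dicts with a 'tag_name' key.
        return [{'tag_name': tag} for tag in self._canned[path]['tags']]


def summarise(api, paths):
    """Count face / non-face avatars and tally every extracted tag."""
    use_face = 0
    not_use_face = 0
    tags = []
    for path in paths:
        if api.detectFace(path):
            use_face += 1
        else:
            not_use_face += 1
        tags.extend(item['tag_name'] for item in api.extractTags(path))
    return use_face, not_use_face, Counter(tags)


# Invented results for three pretend avatar files.
canned = {
    'a.jpg': {'has_face': True, 'tags': ['girl']},
    'b.jpg': {'has_face': False, 'tags': ['sky', 'sea']},
    'c.jpg': {'has_face': False, 'tags': ['sky', 'cartoon']},
}
use_face, not_use_face, tag_counts = summarise(StubFaceAPI(canned), sorted(canned))
print(use_face, not_use_face, tag_counts.most_common(1))
```

Collecting the tags in a list rather than one long string keeps tags from different avatars separate, and a Counter built from that list could be passed to WordCloud.generate_from_frequencies instead of a joined string.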

It can be seen that approximately 1/4 of my WeChat friends use face avatars, while nearly 3/4 do not. In other words, only about 25% of my friends are confident enough about their looks to show their face, or put differently, 75% of my friends keep a low profile and prefer not to use a face as their WeChat avatar.

Second, considering that Tencent Youtu cannot truly recognize "faces", we also extract the tags from friends' avatars to understand which keywords appear in them. The analysis results are shown in the figure below:

From the word cloud, we can see that in the avatar tags of my WeChat friends, the keywords with relatively high frequency are: girl, tree, house, text, screenshot, cartoon, group photo, sky, and sea. This means that the avatars chosen by my friends mainly come from four sources: daily life, travel, scenery, and screenshots.

The dominant style among the chosen avatars is cartoon, and the common elements are sky, sea, houses, and trees. Looking through all of my friends' avatars by hand, I found that 15 people use personal photos, 53 use pictures from the Internet, 25 use anime pictures, 3 use group photos, 5 use photos of children, 13 use landscape pictures, and 18 use photos of girls. This basically agrees with the results of the image tag extraction.

04

Friend signatures

Next we analyze friends' signatures. The signature is the richest piece of text in a friend's information. Following the human habit of "tagging" things, a signature can reveal the state of an individual during a certain period of time. Just as people laugh when they are happy and cry when they are sad, the two labels "laugh" and "cry" indicate the states of being happy and sad respectively.

Here we process the signatures in two ways. The first is to use jieba word segmentation to generate a word cloud, in order to understand which keywords appear in friends' signatures and which appear relatively often. The second is to use SnowNLP to analyze the sentiment of the signatures, that is, whether they are positive, negative, or neutral overall, and what share each category has. We extract the Signature field here; the core code is as follows:

import re
import jieba.analyse
from snownlp import SnowNLP
# also uses np, plt, Image and WordCloud (numpy, matplotlib.pyplot, PIL, wordcloud)

def analyseSignature(friends):
    signatures = ''
    emotions = []
    for friend in friends:
        signature = friend['Signature']
        if signature is not None:
            # Strip the HTML residue that WeChat leaves behind for emoji
            signature = signature.strip().replace('span', '').replace('class', '').replace('emoji', '')
            signature = re.sub(r'1f(\d.+)', '', signature)
            if len(signature) > 0:
                nlp = SnowNLP(signature)
                emotions.append(nlp.sentiments)
                # Keep the top 5 keywords, separated by spaces
                signatures += ' '.join(jieba.analyse.extract_tags(signature, 5)) + ' '
    with open('signatures.txt', 'wt', encoding='utf-8') as file:
        file.write(signatures)

    # Signature WordCloud
    back_coloring = np.array(Image.open('flower.jpg'))
    wordcloud = WordCloud(
        font_path='simfang.ttf',
        background_color="white",
        max_words=1200,
        mask=back_coloring,
        max_font_size=75,
        random_state=45,
        width=960,
        height=720,
        margin=15
    )
    wordcloud.generate(signatures)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()
    wordcloud.to_file('signatures.jpg')

    # Signature Emotional Judgment
    count_good = len(list(filter(lambda x: x > 0.66, emotions)))
    count_normal = len(list(filter(lambda x: x >= 0.33 and x <= 0.66, emotions)))
    count_bad = len(list(filter(lambda x: x < 0.33, emotions)))
    labels = [u'negative', u'neutral', u'positive']
    values = (count_bad, count_normal, count_good)
    plt.rcParams['font.sans-serif'] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False
    plt.xlabel(u'emotional judgment')
    plt.ylabel(u'frequency')
    plt.xticks(range(3), labels)
    plt.bar(range(3), values, color=['r', 'g', 'b'])
    plt.title(u"Sentiment analysis of the signatures of %s's WeChat friends" % friends[0]['NickName'])
    plt.show()

From the word cloud, we can see that in the signatures of my WeChat friends, the keywords with relatively high frequency are: hard work, growing up, beauty, happiness, life, distance, time, and walking.

From the bar chart, we can see that positive sentiment accounts for about 55.56% of the signatures, neutral sentiment for about 32.10%, and negative sentiment for about 12.35%. This is basically consistent with what the word cloud shows. Taken together, about 87.66% of the signatures (55.56% positive plus 32.10% neutral) convey a neutral or positive attitude.
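The percentages above come from bucketing each sentiment score by the 0.33 and 0.66 thresholds used in the code. A minimal sketch of that arithmetic, with invented scores instead of real SnowNLP output (the function names and the sample scores are assumptions for illustration):

```python
def bucket_sentiments(scores, low=0.33, high=0.66):
    """Split scores in [0, 1] into negative (< low), neutral, and positive (> high)."""
    negative = sum(1 for score in scores if score < low)
    positive = sum(1 for score in scores if score > high)
    neutral = len(scores) - negative - positive
    return {'negative': negative, 'neutral': neutral, 'positive': positive}


def as_percentages(buckets):
    """Convert bucket counts into percentages rounded to two decimals."""
    total = sum(buckets.values())
    if total == 0:
        return {name: 0.0 for name in buckets}
    return {name: round(100.0 * count / total, 2) for name, count in buckets.items()}


# Hypothetical scores; real values would come from SnowNLP(text).sentiments.
scores = [0.91, 0.72, 0.50, 0.40, 0.10]
buckets = bucket_sentiments(scores)
shares = as_percentages(buckets)
print(buckets)
print(shares)
```

Computing the neutral count as the remainder guarantees that the three buckets always add up to the number of signatures, whatever thresholds are chosen.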

05

Friend location

The location of friends is analyzed mainly by extracting the two fields Province and City. Map visualization in Python is mainly done through the Basemap module. This module needs to download map data from foreign websites, which makes it very inconvenient to use.

Baidu's ECharts is widely used on the front end. Although the community provides the pyecharts project, I noticed that, due to policy changes, ECharts no longer supports exporting maps, so customizing maps is still a problem. The mainstream technical solution is to configure the JSON data of all provinces and cities across the country.

Here I am using BDP Personal Edition, which is a zero-programming solution. We export a CSV file with Python and upload it to BDP; a visual map can then be made by simply dragging and dropping. Because it is so simple, here we only show the code that generates the CSV file:

   

import csv

def analyseLocation(friends):
    headers = ['NickName', 'Province', 'City']
    with open('location.csv', 'w', encoding='utf-8', newline='') as csvFile:
        writer = csv.DictWriter(csvFile, headers)
        writer.writeheader()
        for friend in friends[1:]:
            row = {}
            row['NickName'] = friend['NickName']
            row['Province'] = friend['Province']
            row['City'] = friend['City']
            writer.writerow(row)

The figure below is the geographical distribution map of my WeChat friends generated in BDP. It shows that my WeChat friends are mainly concentrated in Ningxia and Shaanxi provinces.
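Before uploading the file anywhere, the same distribution can be sanity-checked in plain Python by tallying the Province column. The sketch below reads from an in-memory string instead of a real file; the sample rows and the count_provinces helper are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the same NickName/Province/City layout as the CSV above.
SAMPLE_CSV = """NickName,Province,City
friend-a,Shaanxi,Xi'an
friend-b,Ningxia,Yinchuan
friend-c,Shaanxi,Baoji
friend-d,,
"""


def count_provinces(csv_file):
    """Tally friends per province; an empty Province is counted as 'Unknown'."""
    reader = csv.DictReader(csv_file)
    return Counter(row['Province'] or 'Unknown' for row in reader)


province_counts = count_provinces(io.StringIO(SAMPLE_CSV))
print(province_counts.most_common())
```

To run it against a real export, pass an open file object instead of the StringIO, for example one opened with encoding='utf-8' and newline=''.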

06

Summary

This article is another of my attempts at data analysis. It takes a simple look at WeChat friends along four dimensions (gender, avatar, signature, and location) and presents the results mainly as charts and word clouds. In a word, "data visualization is a means, not an end." What matters is not that we made these charts, but what the phenomena reflected in them can teach us. I hope this article can inspire everyone.
