
2024/05/20 07:51:33

Original link: https://medium.com/neo4j/finding-the-best-tennis-players-of-all-time-using-weighted-pagerank-6950ed5fc98e

The latest release of the Neo4j Graph Algorithms library adds support for the weighted variant of the PageRank algorithm.

My colleague Ryan (https://twitter.com/ryguyrg/) recently shared the paper "Who Is the Best Player Ever? A Complex Network Analysis of the History of Professional Tennis" (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017249) by Filippo Radicchi, which uses a variant of the PageRank algorithm. As a tennis enthusiast myself, I wondered whether I could do something with this algorithm too.

I originally planned to scrape the data myself, but Kevin Lin has already done the hard work. He has published all match results up to the end of 2017 as CSV files in the GitHub repository atp-world-tour-tennis-data (https://github.com/serve-and-volley/atp-world-tour-tennis-data).

Thanks to Kevin for his contribution.

Data import

Before importing data, we first create some constraints to avoid importing some duplicate data:

CREATE CONSTRAINT ON (p:Player) ASSERT p.id IS UNIQUE;
CREATE CONSTRAINT ON (m:Match) ASSERT m.id IS UNIQUE;

Next we import the data into Neo4j. First copy the CSV files created by Kevin into the Neo4j import directory. Once that is done, we can use Cypher's LOAD CSV command to load the data into Neo4j.

LOAD CSV FROM "file:///match_scores_1968-1990_UNINDEXED.csv" AS row
MERGE (winner:Player {id: row[8]})
  ON CREATE SET winner.name = row[7]
MERGE (loser:Player {id: row[11]})
  ON CREATE SET loser.name = row[10]
MERGE (m:Match {id: row[22]})
SET m.score = row[15], m.year = toInteger(split(row[0], "-")[0])
MERGE (m)-[w:WINNER]-(winner)
SET w.seed = toInteger(row[13])
MERGE (m)-[l:LOSER]-(loser)
SET l.seed = toInteger(row[14]);

LOAD CSV FROM "file:///match_scores_1991-2016_UNINDEXED.csv" AS row
MERGE (winner:Player {id: row[8]})
  ON CREATE SET winner.name = row[7]
MERGE (loser:Player {id: row[11]})
  ON CREATE SET loser.name = row[10]
MERGE (m:Match {id: row[22]})
SET m.score = row[15], m.year = toInteger(split(row[0], "-")[0])
MERGE (m)-[w:WINNER]-(winner)
SET w.seed = toInteger(row[13])
MERGE (m)-[l:LOSER]-(loser)
SET l.seed = toInteger(row[14]);

LOAD CSV FROM "file:///match_scores_2017_UNINDEXED.csv" AS row
MERGE (winner:Player {id: row[8]})
  ON CREATE SET winner.name = row[7]
MERGE (loser:Player {id: row[11]})
  ON CREATE SET loser.name = row[10]
MERGE (m:Match {id: row[22]})
SET m.score = row[15], m.year = toInteger(split(row[0], "-")[0])
MERGE (m)-[w:WINNER]-(winner)
SET w.seed = toInteger(row[13])
MERGE (m)-[l:LOSER]-(loser)
SET l.seed = toInteger(row[14]);
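To make the column positions easier to follow, here is a minimal Python sketch of what each LOAD CSV statement does with a row. It is only an illustration outside Neo4j, not part of the original import: the function name load_matches and the sample row are hypothetical, and the seed columns are left out for brevity.

```python
import csv
import io

def load_matches(csv_text):
    """Parse match rows using the column positions from the LOAD CSV statements."""
    players = {}  # player id -> name, set only on first sight (like ON CREATE SET)
    matches = {}  # match id -> record, one entry per id (like MERGE on Match.id)
    for row in csv.reader(io.StringIO(csv_text)):
        winner_id, loser_id = row[8], row[11]
        players.setdefault(winner_id, row[7])
        players.setdefault(loser_id, row[10])
        matches[row[22]] = {
            "year": int(row[0].split("-")[0]),  # same as toInteger(split(row[0], "-")[0])
            "score": row[15],
            "winner": winner_id,
            "loser": loser_id,
        }
    return players, matches

# Hypothetical 23-column row; only the columns read above are filled in.
row = [""] * 23
row[0], row[7], row[8] = "2017-001", "Player A", "a1"
row[10], row[11] = "Player B", "b1"
row[15], row[22] = "64 63", "2017-001-a1-b1"
csv_text = "\n".join([",".join(row)] * 2)  # the same row twice

players, matches = load_matches(csv_text)
print(players)       # each player appears once
print(len(matches))  # the duplicate row yields a single match
```

Keying both dictionaries by id plays the role of the uniqueness constraints and MERGE: importing the same row twice does not create duplicates.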

The model is very simple. You can run the following query to see a visual representation of it:

CALL db.schema()

You can see:

It looks good. Before continuing, let's write a simple query to take a look at the data:

MATCH p=()-[:LOSER]-()-[r:WINNER]-() RETURN p LIMIT 25

The player with the most wins

Now we want to find the players with the most wins. How should we write this query?

MATCH (p:Player)
WITH p,
     size((p)-[:WINNER]-()) AS wins,
     size((p)-[:LOSER]-()) AS defeats
RETURN p.name, wins, defeats,
       CASE WHEN wins + defeats = 0 THEN 0
            ELSE (wins * 100.0) / (wins + defeats) END AS percentageWins
ORDER BY wins DESC
LIMIT 10

Run the above statement and you will see the following output:

If you are also a tennis fan, you will probably recognize most of the names on this list. Most of them are considered among the best players of all time, but simply counting the number of matches won is not a very rigorous measure.
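The same wins, defeats, and win-percentage calculation can be sketched in plain Python. This is an illustration with made-up results, not data from the article; the function name win_table and the players "A", "B", "C" are hypothetical.

```python
from collections import Counter

def win_table(results):
    """results: iterable of (winner, loser) pairs. Returns rows sorted by wins, descending."""
    wins = Counter(winner for winner, _ in results)
    defeats = Counter(loser for _, loser in results)
    rows = []
    for player in set(wins) | set(defeats):
        played = wins[player] + defeats[player]
        # Guard against division by zero, mirroring the CASE expression in the query.
        percentage = wins[player] * 100.0 / played if played else 0.0
        rows.append((player, wins[player], defeats[player], percentage))
    # Most wins first; ties are broken alphabetically to keep the output stable.
    return sorted(rows, key=lambda r: (-r[1], r[0]))

# Made-up results: A beats B and C, B beats C, C beats A.
table = win_table([("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")])
for player, wins, defeats, percentage in table:
    print(player, wins, defeats, round(percentage, 1))
```

Note that a win counts the same no matter who the opponent was, which is exactly the limitation described above.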

At this point we can try a more sophisticated method: the PageRank algorithm.

Building a projected credibility graph

PageRank determines the credibility of a node from its incoming relationships. On the web, for example, a page confers credibility on another page by linking to it. How much credibility is conferred can be controlled by a weight property on the relationship.
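To illustrate the idea, here is a minimal sketch of the textbook weighted PageRank iteration in Python. It is not the Neo4j library's implementation, whose normalization and defaults may differ. The function name weighted_pagerank, the damping factor of 0.85, the iteration count, and the example graph are assumptions chosen for illustration.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """edges: dict mapping (source, target) -> weight. Returns node -> score."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    # Total outgoing weight per node, used to split a node's score among its targets.
    out_weight = {node: 0.0 for node in nodes}
    for (source, _), weight in edges.items():
        out_weight[source] += weight
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(scores[node] for node in nodes if out_weight[node] == 0)
        new_scores = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for (source, target), weight in edges.items():
            # Each target receives a share proportional to the relationship weight.
            new_scores[target] += damping * scores[source] * weight / out_weight[source]
        scores = new_scores
    return scores

# Hypothetical graph: C points to A with weight 3 and to B with weight 1,
# while A and B point to each other with weight 1.
scores = weighted_pagerank({("C", "A"): 3, ("C", "B"): 1, ("A", "B"): 1, ("B", "A"): 1})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

In the unweighted case A and B would receive equal shares from C. With weights, A receives three quarters of C's contribution and therefore ends up ahead of B.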

In our tennis world, a player confers credibility on another player according to how many matches they have lost to that player. For example, the following query shows how many times Federer and Nadal have beaten each other.

MATCH (p1:Player {name: "Roger Federer"}), (p2:Player {name: "Rafael Nadal"})
RETURN p1.name, p2.name,
       size((p1)-[:WINNER]-()-[:LOSER]-(p2)) AS p1Wins,
       size((p1)-[:LOSER]-()-[:WINNER]-(p2)) AS p2Wins

The running output is as follows:

Our projected graph should have a direct relationship between Federer and Nadal, with a weight representing the number of matches one has lost to the other. The relationship from Federer to Nadal has a weight of 23, meaning that Federer lost to Nadal 23 times. The relationship from Nadal to Federer has a weight of 15.

We can write the following query to project this graph:

MATCH (p1)-[:WINNER]-(match)-[:LOSER]-(p2)
WHERE p1.name IN ["Roger Federer", "Rafael Nadal"]
  AND p2.name IN ["Roger Federer", "Rafael Nadal"]
RETURN p2.name AS source, p1.name AS target, count(*) AS weight
LIMIT 10

The output of this query is as follows:

The next thing we need to do is to delete the WHERE condition so that this query can be performed on the entire graph.
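Outside the database, the same projection over all matches can be sketched in a few lines of Python. The function name project and the players "X" and "Y" are hypothetical. Each match adds one unit of weight to the relationship from the loser (source) to the winner (target), as count(*) does in the query.

```python
from collections import Counter

def project(matches):
    """matches: iterable of (winner, loser) pairs. Returns {(loser, winner): weight}."""
    # The relationship points from the loser to the winner, so that losing
    # to a player passes credibility on to that player.
    return dict(Counter((loser, winner) for winner, loser in matches))

# Made-up results: X beats Y twice, Y beats X once.
weights = project([("X", "Y"), ("X", "Y"), ("Y", "X")])
print(weights)
```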

Use weighted PageRank to discover the best tennis players

We can now run the weighted variant of PageRank by passing the weightProperty parameter to the PageRank procedure. By default, PageRank runs in unweighted mode. The following statement runs weighted PageRank over the entire graph:

CALL algo.pageRank.stream(
  "MATCH (p:Player) RETURN id(p) AS id",
  "MATCH (p1)-[:WINNER]-(match)-[:LOSER]-(p2)
   RETURN id(p2) AS source, id(p1) AS target, count(*) AS weight",
  {graph: "cypher", weightProperty: "weight"})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).name AS player, score
ORDER BY score DESC
LIMIT 10

The running results are as follows:

We can see that the top of our ranking differs from the ranking in Filippo Radicchi's paper. The main difference is that Federer, Nadal and Djokovic all appear in our top five. Radicchi's analysis only extends to 2010, and these three players have been very successful in the eight years since, which explains the difference.

We can restrict the projection to matches played up to and including 2010 with the following query:

CALL algo.pageRank.stream(
  "MATCH (p:Player) RETURN id(p) AS id",
  "MATCH (p1)-[:WINNER]-(match)-[:LOSER]-(p2)
   WHERE match.year <= $year
   RETURN id(p2) AS source, id(p1) AS target, count(*) AS weight",
  {graph: "cypher", weightProperty: "weight", params: {year: 2010}})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).name AS player, score
ORDER BY score DESC
LIMIT 10

The running effect is as follows:

Note that in this query the year value is passed into the Cypher projection query as a parameter via the params key.

The top two in our ranking are now the same as in Radicchi's, but Federer is in third place rather than seventh as in Radicchi's ranking, while Nadal and Djokovic have dropped out of our top ten.

We can also compute the PageRank ranking for a single year. The following query computes the ranking for 2017:

CALL algo.pageRank.stream(
  "MATCH (p:Player) RETURN id(p) AS id",
  "MATCH (p1)-[:WINNER]-(match)-[:LOSER]-(p2)
   WHERE match.year = $year
   RETURN id(p2) AS source, id(p1) AS target, count(*) AS weight",
  {graph: "cypher", weightProperty: "weight", params: {year: 2017}})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).name AS player, score
ORDER BY score DESC
LIMIT 10

The running effect is as follows:

The picture below shows the 2017 ATP World Tour year-end ranking:

This ranking is quite different from ours. Why? The official ranking gives different weight to different matches, while our PageRank ranking treats every match equally.

That concludes our use of weighted PageRank to find the best tennis players of all time. I look forward to seeing more people use weighted PageRank to solve other problems. If you do use it, please let me know.

Enjoy!

Translator's note: The author only showed how to apply the algorithm and did not explain what each parameter of the algo.pageRank.stream procedure does. I will look for relevant articles and share them with you in the future.
