Nadal or Djokovic? Predicting the winner of the US Open
Most pundits will have an opinion on who will triumph in this year's US Open men's final - Rafael Nadal or Novak Djokovic - but the best insights into who will be crowned champion will come from the same technology that has helped cities to lower crime rates and plan for extreme weather.
Deep in the bowels of Arthur Ashe Stadium in Flushing Meadows, Queens, New York, beats the data heart of the 2013 US Open.
In a bland room accessed through an unmarked door, more than 60 laptops are piled high, arranged like a command control centre for a mission to the moon.
This room is known as "scoring central", according to US Open officials.
It's where data is pushed to scoreboards on Louis Armstrong Court - the second largest US Open tennis court - or to TV screens across the globe.
But more than a power processing centre, this is where the results of matches are broken down and analysed, where it's determined not only who won, but why they won, according to the numbers.
"Say you wanted to see every backhand unforced error in a match. You would touch a button and all of those would come up," says IBM's vice-president of sports marketing, Rick Singer.
But seeing what has happened in past matches is rapidly giving way to better predicting what will happen in future pairings, explains Mr Singer.
To put it simply: the past might have centred around intuitively understanding that a player who gets a majority of their first serves in will win the match.
The future is pinpointing the exact percentage threshold the player must cross to win.
'Unusual statistics'
This year, IBM has gathered more than 41 million data points from eight years of Grand Slam tennis matches to better understand the small details that end up deciding a match.
The idea is that by crunching more and more data, patterns will emerge that can help better hone predictions.
So what should Novak Djokovic do if he wants to beat a resurgent Rafael Nadal, who has emerged this summer as the dominant force on hard courts?
Looking at data from the head-to-head matches between the two in Grand Slams, IBM says that if Djokovic wins more than 57% of medium-length rallies (of between four and nine shots) then he will emerge triumphant.
He also has to win more than 39% of return points on Nadal's first serve.
Nadal, on the other hand, has to dominate on his serve. If he wins more than 63% of points on his first serve then IBM predicts he will win.
However, the longer Nadal's service games go on, the less likely he is to win. He needs to keep his service games relatively short, averaging fewer than 6.5 points per game, according to IBM.
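As a rough illustration, the four thresholds quoted above can be written as simple checks. This is only a sketch: the `MatchStats` class, its field names and the sample figures are made up for the example, and IBM does not say how it weighs or combines the individual measures, so each one is reported separately.

```python
from dataclasses import dataclass


@dataclass
class MatchStats:
    """Hypothetical per-match figures for one player, as fractions of 1."""
    medium_rally_win_rate: float = 0.0        # rallies of four to nine shots
    first_serve_return_win_rate: float = 0.0  # return points won on the opponent's first serve
    first_serve_win_rate: float = 0.0         # points won on the player's own first serve
    points_per_service_game: float = 0.0      # average length of the player's service games


def djokovic_keys(stats: MatchStats) -> dict:
    """Check the two thresholds the article quotes for Djokovic."""
    return {
        "medium rallies > 57%": stats.medium_rally_win_rate > 0.57,
        "first-serve returns > 39%": stats.first_serve_return_win_rate > 0.39,
    }


def nadal_keys(stats: MatchStats) -> dict:
    """Check the two thresholds the article quotes for Nadal."""
    return {
        "first-serve points > 63%": stats.first_serve_win_rate > 0.63,
        "service game length < 6.5": stats.points_per_service_game < 6.5,
    }


# Invented figures, purely to show the checks running.
example = MatchStats(
    medium_rally_win_rate=0.60,
    first_serve_return_win_rate=0.35,
    first_serve_win_rate=0.70,
    points_per_service_game=6.0,
)
print(djokovic_keys(example))
print(nadal_keys(example))
```

The comparisons are strict because the article speaks of "more than" and "fewer than" each figure, so a player sitting exactly on a threshold does not count as having met it.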
"It's the same sort of statistical analysis and predictive analytics that we do for our clients all around the world, just applied to tennis," explains Mr Singer.
"What we're trying to do is find statistics that are unusual."
A backhanded solution
Statistics from the tournament so far also provide a pointer as to what each player should work on.
Djokovic, for instance, must focus on getting his backhand into play.
According to IBM's data, when Djokovic can hit his backhand deep to Nadal's forehand, his odds of winning the point dramatically increase.
However, during this tournament that stroke has been particularly difficult for Djokovic - he's had 32 backhand winners, but 70 backhand unforced errors.
Nadal will go into the final knowing that his most powerful weapon - his forehand - is working well. He has hit 113 forehand winners, compared with Djokovic's 73.
He will also know that as long as he can keep up his variety of serve, and go to the net occasionally - where he has won 81% of the points he has played - he might have the upper hand over Djokovic.
Serbia's world number one will also have to improve his consistency in the final. Although both players have hit the same number of winners in the tournament so far (206), Djokovic has made 167 unforced errors, far more than Nadal's 130.
And with the Spaniard having dropped serve just once all tournament, Djokovic will have to be more ruthless when taking any break point opportunities that come his way, having converted only 44% up until now.
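One way to put the consistency gap on a single scale is to divide winners by unforced errors. The short sketch below does that with the tournament figures quoted above; the ratio is an illustrative measure chosen for this example, not one the article attributes to IBM, and the function and variable names are invented.

```python
def winner_error_ratio(winners: int, unforced_errors: int) -> float:
    """Winners hit per unforced error; above 1 means more winners than errors."""
    return winners / unforced_errors


# (winners, unforced errors) for the tournament so far, as quoted in the article.
tournament = {
    "Djokovic": (206, 167),
    "Nadal": (206, 130),
}

ratios = {
    player: round(winner_error_ratio(winners, errors), 2)
    for player, (winners, errors) in tournament.items()
}

# Djokovic's backhand alone: 32 winners against 70 unforced errors.
djokovic_backhand = round(winner_error_ratio(32, 70), 2)

print(ratios)
print(djokovic_backhand)
```

On these numbers Nadal has hit about 1.58 winners for every unforced error against Djokovic's 1.23, while Djokovic's backhand has produced fewer than one winner for every two errors.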
Elephant brain
Companies like IBM say it is only with the advent of big data technologies and faster, better processing power that they have been able to gather these new insights quickly and cheaply.
Most of these big data crunching technologies, from predicting airline prices to sports champions, use something known as Apache Hadoop.
Designed by engineers who had been working at Yahoo and elsewhere ("Hadoop" was the name of a toy elephant belonging to the son of one of its creators), it is now just one of the components of IBM's predictive analytics toolkit.
The hope is that in the future, statistics like these might not just be of benefit to sports as a whole, but that athletes themselves will be better able to calibrate their performances.
"Each tournament we evolve a little bit further," says Mr Singer.
The goal, he says, is "to take the statistics beyond what people are expecting".
But for fans watching the US Open final who have no head for statistics, Rafael Nadal's coach and uncle, Toni Nadal, has this simple advice for what it takes to succeed: "You should play good, nothing else. You should play very well."