Can Big Data be used to Predict Outcomes of Court Cases?

Studies indicate that the outcomes of court cases can be predicted using big data. This area of practice, however, is still nascent. Current predictions are not completely accurate, but as the practice evolves, it is likely that the outcomes of court cases will be predicted with a high degree of certainty.

About Big Data

Big data refers to large volumes of data that can be collected and mined for valuable information. Such data can be collected automatically, as when Amazon records the items people buy and even the items they merely view, or provided voluntarily by users of a service, such as their address and contact number. Companies collect this data to track product performance, understand and target customers, improve decision-making and develop new products. They do so by using software to gather enormous amounts of data and identify patterns that would be invisible to a human analyst.

For instance, Nest Laboratories uses big data to design ‘smarter’ thermostats by gathering data about people’s energy consumption habits around the world. Having learned consumers’ energy usage patterns, the thermostat can regulate itself automatically and make households more energy efficient.

Similarly, Amazon collects data about consumers’ habits, preferences and dislikes to personalize its service for every user. This personalization has enabled Amazon to significantly strengthen its relationship with customers by making its service feel more personal. It also uses this data to automatically recommend products to customers who use its website.

Using Big Data to make Predictions

A field that big data has irrevocably altered is sports, and one sport in particular: basketball. The practice of incorporating big data into basketball is commonly referred to as ‘analytics.’

When making draft picks, teams have gathered data not just on a player’s college performance but also on personal details such as family lineage, coaching history, parenting, positions played and ability to bench press. Knowing how quickly a player jumped was not enough to measure quickness; the data had to be as nuanced as the timing of his first two steps.

While a lot of the data proved non-predictive, rebounds per minute has proved useful in measuring the future success of tall players, and steals per minute has been significant in predicting the success of smaller players. College players are drafted into NBA teams after all this data has been crunched by algorithms that predict how they would perform in the big leagues. This approach is vastly different from scouts and coaches basing their drafts on college statistics (points, rebounds, assists) and gut instinct. While teams such as the Warriors and Cavaliers are known to make extensive use of big data, this method was pioneered by Daryl Morey, the general manager of the Houston Rockets.
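Per-minute statistics normalise a player’s totals by playing time, so that a reserve who played few minutes can be compared fairly with a starter. The sketch below is a minimal illustration only: the field names and the simple tall/small flag are hypothetical, and real draft models combine many such features rather than relying on a single rate.

```python
from dataclasses import dataclass


@dataclass
class CollegeSeason:
    """Season totals for one college player (hypothetical fields)."""
    minutes: float
    rebounds: int
    steals: int


def per_minute(total: int, minutes: float) -> float:
    """Normalise a counting statistic by minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total / minutes


def key_rate(season: CollegeSeason, is_tall: bool) -> float:
    """Rebounds per minute for tall players, steals per minute for smaller ones."""
    stat = season.rebounds if is_tall else season.steals
    return per_minute(stat, season.minutes)


season = CollegeSeason(minutes=1000, rebounds=300, steals=40)
print(key_rate(season, is_tall=True))   # 0.3
print(key_rate(season, is_tall=False))  # 0.04
```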

The way the game is played has also been revolutionized. Studies now assign a value to every moment of a possession by predicting the points the offense is likely to score by the end of that possession, a measure called ‘expected possession value’ (EPV). Every dribble, pass and shot is assigned a value depending on the result that move is likely to produce. In this way, EPV assigns a value to every possible move a player could make, allowing teams to choose plays that are likely to produce the best outcome.
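EPV is an expected value, and a simplified version of the idea can be written down directly. At a given moment the ball handler has a few options, each with a probability of being chosen and an expected number of points if chosen; the current EPV is the probability-weighted average of those points, and a move whose expected points exceed the current EPV adds value. This sketch assumes the options, probabilities and point values are already known (they are invented here); the model by Cervone et al. estimates them from optical tracking data, which this sketch does not attempt.

```python
import math
from typing import Dict, Tuple

# Hypothetical options available to the ball handler at one moment:
# name -> (probability the player chooses it, expected points if chosen).
# The numbers are invented for illustration, not taken from any study.
Options = Dict[str, Tuple[float, float]]


def expected_possession_value(options: Options) -> float:
    """Probability-weighted average of expected points over all options."""
    total_prob = sum(prob for prob, _ in options.values())
    if not math.isclose(total_prob, 1.0):
        raise ValueError("option probabilities must sum to 1")
    return sum(prob * points for prob, points in options.values())


def decision_value(options: Options, chosen: str) -> float:
    """Points a chosen move adds (positive) or costs (negative) versus the current EPV."""
    return options[chosen][1] - expected_possession_value(options)


def best_option(options: Options) -> str:
    """The move with the highest expected points."""
    return max(options, key=lambda name: options[name][1])


moment = {
    "shoot": (0.5, 0.9),
    "pass": (0.3, 1.1),
    "dribble": (0.2, 1.0),
}
print(round(expected_possession_value(moment), 2))  # 0.98
print(best_option(moment))                          # pass
print(round(decision_value(moment, "shoot"), 2))    # -0.08
```

Here the pass is worth more than the current EPV, while the shot is worth slightly less, which is the kind of comparison EPV makes possible for every move.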

[Figure: an illustration of EPV, from Cervone et al.[1]]

The success of big data in basketball suggests that data, when expressed in objective, quantifiable terms, can be used to predict human behaviour. The natural next question is whether it can succeed in areas other than sports.

Big Data and Court Cases

If big data can help predict human behaviour, it is perhaps unsurprising that it can also be used to predict the outcomes of court cases. A model created by Daniel Katz, Michael Bommarito II and Josh Blackman does precisely this: it predicts both the outcomes of cases decided by the United States Supreme Court and the votes of individual justices. The model has demonstrated 70.2 per cent accuracy in predicting case outcomes and 71.9 per cent accuracy in predicting the votes of individual justices. The data used included the identity of the justice, the parties, the issue and issue area, the decision of the lower court, the reason for granting certiorari and the court of origin. These factors were then converted into binary variables so that they could be coded into the model. For instance, the authors identified 13 reasons for granting certiorari, which were converted into 13 binary variables, one for each possible reason.
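Converting a category into binary variables in this way is usually called one-hot encoding: a factor with k possible values becomes k columns, exactly one of which is 1. A minimal sketch follows, assuming the 13 certiorari reasons are simply numbered 1 to 13 (a hypothetical coding; the authors’ actual labels are not reproduced here). The same function could encode other categorical factors, such as issue area or court of origin.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical numbering of the 13 reasons for granting certiorari.
CERT_REASONS: List[int] = list(range(1, 14))


def one_hot(value: T, categories: Sequence[T]) -> List[int]:
    """Return one binary variable per category: 1 for `value`, 0 for the rest."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


encoded = one_hot(3, CERT_REASONS)
print(len(encoded), sum(encoded))  # 13 1
print(encoded.index(1))            # 2 (zero-based position of reason 3)
```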

A similar predictive exercise is being undertaken by Predictice, which uses an algorithm to predict the chances of resolution, the likely amount of compensation and the best means of obtaining a favourable judgment.

To further research in this area, Katz, Bommarito and Blackman have also started FantasySCOTUS, a Supreme Court fantasy league that allows people to predict the outcomes of Supreme Court cases and the votes of individual justices.

At its core, predicting the outcomes of court cases relies on ‘machine learning’, a branch of artificial intelligence that enables computers to ‘learn’ through training, observation and experience rather than being explicitly programmed for every task. Machines use data, such as past examples, to learn autonomously what is required to perform a particular function. When that data consists of examples labelled with the correct answer, such as past cases paired with their outcomes, this is called ‘supervised learning.’

For instance, to be able to tell cats and dogs apart, a machine would be given many labelled images of cats and dogs so that it can learn their respective characteristics. Using this knowledge, the machine becomes able to distinguish between cats and dogs over time, in a process analogous to how humans learn. YouTube’s video recommendations rely on a similar idea, learning from the videos a viewer has previously watched.
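The cat-and-dog example can be sketched with one of the simplest supervised learners, a nearest-centroid classifier: it averages the labelled training examples of each class and assigns a new example to the class whose average is closest. This assumes each image has already been reduced to two hypothetical numeric features (weight in kilograms and snout length in centimetres); real image classifiers learn from pixels with far more sophisticated models.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A labelled example: (numeric features, label).
Example = Tuple[Sequence[float], str]


def train_centroids(examples: List[Example]) -> Dict[str, List[float]]:
    """'Learn' each class by averaging the feature vectors of its examples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the label whose class average is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical features: (weight in kg, snout length in cm).
training = [
    ((4.0, 2.0), "cat"), ((5.0, 2.5), "cat"), ((3.5, 1.8), "cat"),
    ((20.0, 8.0), "dog"), ((30.0, 10.0), "dog"), ((10.0, 6.0), "dog"),
]
centroids = train_centroids(training)
print(classify(centroids, (4.5, 2.2)))   # cat
print(classify(centroids, (25.0, 9.0)))  # dog
```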

To predict court cases, the algorithm is first ‘trained’ by feeding it large volumes of data about cases going back several decades. The algorithm uses this data to look for patterns and builds a model to predict the outcomes of similar cases in the future. While such predictions are not completely accurate, as machine-learning techniques improve and more data becomes available, prediction is likely to become a prominent part of courtroom litigation.
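The train-then-predict workflow can be sketched with a perceptron, a simple linear classifier, trained on older cases and evaluated on newer ones. This is a stand-in for illustration, not the model used by Katz, Bommarito and Blackman. The two binary features, the years and the outcomes are all invented; in practice the features would be binary variables of the kind described for the Supreme Court model. Splitting by year means the model is only tested on cases decided after those it learned from, and the share of those later cases it gets right is its accuracy.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Case(NamedTuple):
    year: int
    features: Tuple[int, ...]  # binary variables describing the case (hypothetical)
    was_reversed: bool         # outcome: True if the lower court was reversed


def predict(weights: Sequence[int], bias: int, features: Sequence[int]) -> bool:
    """Predict 'reverse' when the weighted sum of features is positive."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return score > 0


def train_perceptron(cases: List[Case], epochs: int = 10) -> Tuple[List[int], int]:
    """Perceptron rule: nudge the weights toward every training case it gets wrong."""
    weights = [0] * len(cases[0].features)
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for case in cases:
            if predict(weights, bias, case.features) != case.was_reversed:
                step = 1 if case.was_reversed else -1
                weights = [w + step * x for w, x in zip(weights, case.features)]
                bias += step
                mistakes += 1
        if mistakes == 0:  # every training case is now classified correctly
            break
    return weights, bias


def accuracy(weights: Sequence[int], bias: int, cases: List[Case]) -> float:
    """Fraction of cases whose outcome the model predicts correctly."""
    correct = sum(predict(weights, bias, c.features) == c.was_reversed for c in cases)
    return correct / len(cases)


def chronological_split(cases: List[Case], cutoff_year: int) -> Tuple[List[Case], List[Case]]:
    """Train on cases before the cutoff year, test on cases from it onwards."""
    train = [c for c in cases if c.year < cutoff_year]
    test = [c for c in cases if c.year >= cutoff_year]
    return train, test


# Invented toy data: two binary features per case.
cases = [
    Case(1990, (1, 0), True),
    Case(1991, (0, 1), False),
    Case(1992, (1, 1), True),
    Case(1993, (0, 0), False),
    Case(2005, (1, 1), True),
    Case(2006, (0, 0), False),
    Case(2007, (0, 1), True),
]
train, test = chronological_split(cases, cutoff_year=2000)
weights, bias = train_perceptron(train)
print(weights, bias)                  # [2, 0] 0
print(accuracy(weights, bias, test))  # 0.6666666666666666
```

One of the three later cases does not follow the pattern in the earlier ones, so the sketch scores two out of three: a model can only reproduce patterns present in the data it was trained on.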


[1] Dan Cervone et al., POINTWISE: Predicting Points and Valuing Decisions in Real Time with NBA Optical Tracking Data, MIT Sloan Sports Analytics Conference, available at

