Artificial intelligence (AI) is increasingly becoming a feature of human life, with technology infiltrating nearly every domain of our society. The legal field is no exception (Nikolskaia & Naumov, 2020). In the United States, judges already use AI-infused statistical programs to predict chances of flight or recidivism for criminal defendants (Wisconsin Department of Corrections, n.d.). Some people hope that incorporating this new technology will bring a sense of scientific objectivity to the process (Xu, 2021). However, many others find this to be a disturbing dystopian development (Nikolskaia & Naumov, 2020).
Through research I conducted with a Summer Undergraduate Research Fellowship (SURF) from the Hamel Center for Undergraduate Research, I came to the conclusion that our economy and our justice system are structured such that in the coming years, AI is likely to seize the role of determining prison sentences from our judges. This does not mean that there will be no human judges—judges do much more than determine a sentence or grant parole or probation. But it would signify the beginning of AI taking over the work of one of our most venerated social positions, one that has tremendous impact on our sociopolitical world. I think we should take this prospect quite seriously.
In this article, I begin by defining some key terms and laying the foundation for this discussion. I explore the way artificial intelligence changes the landscape of judicial decision-making, and then I make my case about the aforementioned replacement. In the conclusion, I ask what we should do about this development.
Key Word Definitions
Before moving forward, I will define some key terms: systems, models, recidivism, sentencing, artificial intelligence, and machine learning. A system is a set of structures organized together as parts of a mechanism or interconnected network (O’Neil, 2016). Our justice system is composed of several subsystems, including a lawmaking system, policing system, court system, prison system, and many more. We aim to improve the systems we interact with by developing models, both formal and informal.
A model is “an abstract representation of some process . . . [that] takes what we know and uses it to predict responses in various situations” (O’Neil, 2016). We all carry hundreds of informal models in our heads, which we use to navigate the various systems in our lives. At work, we have informal models that tell us how to behave: we gather information, both about specific colleagues and about the generally agreed-upon standards of workplace conduct, and that information shapes how we behave professionally. Our models make life easier. Similarly, we can use formal mathematical models to improve the efficiency of systems. The predictive algorithms I discuss in this paper are formal mathematical models that help us better predict recidivism. The goal is to improve the judicial system.
Recidivism refers to when an individual is released from prison and reoffends (National Institute of Justice, n.d.). This is an important metric for criminologists and judges because repeated offenses form a pattern of troubling behavior over time. Sentencing refers to when a judge declares the punishment for a defendant who has been found guilty of a crime (Cornell Legal Information Institute, n.d.).
Artificial intelligence is a broad term that refers to the ability of computers to process and perform tasks that mimic human intelligence and/or are structured to achieve ideal rationality (Bringsjord et al., 2018). Of course, human intelligence is not structured with ideal rationality, so there are different aims for different AI technologies. The purpose of the AI in Snapchat’s “My AI” is to mimic human written language patterns, while the purpose of the AI in technology used by judges is ideal rationality.
Machine learning typically refers to the process by which this intelligence develops (Allen, 2020). AI technology analyzes data and uses algorithms to predict future outcomes based on that data. Not only is it capable of handling enormous data sets, but it learns from new experiences. This means that if the program receives new, contrary data, or data showing that its prediction was wrong, it rewrites its own algorithm; hence the “learning” (Ridgeway, 2013).
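To make the idea of a model that “learns” concrete, here is a minimal sketch in Python, assuming the scikit-learn library. It is purely illustrative: the features, labels, and model are hypothetical stand-ins, not COMPAS or any tool actually used in courts.

```python
# A model that updates itself as new outcome data arrives.
# All feature names and data here are hypothetical.
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical features per defendant: [age, prior_convictions]
X_initial = np.array([[22, 3], [45, 0], [31, 1], [19, 5]])
y_initial = np.array([1, 0, 0, 1])  # 1 = reoffended, 0 = did not

model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X_initial, y_initial, classes=[0, 1])

# Later, new (possibly contrary) outcome data arrives; the model
# incorporates it without being rebuilt from scratch.
model.partial_fit(np.array([[22, 3]]), np.array([0]))  # this person did not reoffend

# Estimated probability of recidivism for a new case
print(model.predict_proba(np.array([[30, 2]]))[0][1])
```

Each call to `partial_fit` nudges the model’s internal weights, which is the sense in which such a program “rewrites” itself as data accumulates.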
Methodology
This project synthesizes research I have done over the last several years. The true beginning was a philosophy course I took with Professor Nick Smith in the spring of 2022, the Future of Humanity—my formal introduction to AI. We discussed the ethics of AI integration in everything from the autocorrect feature on our phones to self-driving cars to medical diagnostic programs to space exploration.
I subsequently took another of Professor Smith’s courses, Intro to Law and Justice, in which my research topic really came into focus. I applied for the SURF to study how judges use predictive algorithms to inform their sentencing. My project was a philosophical investigation. I needed to become well acquainted with the landscape of judicial decision-making and AI-infused statistical programming, but my real research question was an ethical one: Should we allow judges to use AI?
I spent the summer learning how judges make sentencing decisions, how criminological statistics are used, and how artificial intelligence transforms this process. I read entire issues of publications like the Judges’ Journal and Criminology & Public Policy, along with the Research Handbook on Big Data Law. I also read Weapons of Math Destruction by data scientist Cathy O’Neil, Are Prisons Obsolete? by Angela Y. Davis, and The Rich Get Richer and the Poor Get Prison by Jeffrey Reiman and Paul Leighton. Over the summer and into the fall, I compiled a document of notes on everything I read.
However, a lot of philosophical research is done through Socratic debate—a method of argumentation in which as few as two people attempt to gain a better understanding of a topic through asking and answering questions. This often takes the form of quite rigorous debate, but the goal is not to “win” so much as it is to better understand. So, I discussed with my advisor (and anyone else who would talk about it) whether we should be using such powerful technology in our courtrooms.
Over time, it seemed to me that regardless of whether we endorse its use, AI will continue to be integrated into our judicial system. In fact, the more I read and discussed, the more likely it seemed to me that over the coming years we will begin to reduce the role of human judges as we turn over tasks like sentence determination to AI-powered machines. Thus, my project morphed from a question of whether we should allow judges to use AI into what we should do about the prospect of the technology beginning to take over the job. The outcome of my SURF was a full-length philosophy research paper in which I explain my findings and what I believe to be the implications of this information. I summarize the crux of that paper in this article.
Judicial Deliberation and Artificial Intelligence
Whether it happens through a court case or through a plea deal (Johnson, 2023), once a person has been convicted, a judge determines a prison sentence. This is a delicate and complex process. The judge receives input from many sources: mandatory minimums and maximums set by Congress, sentencing guidelines set by the US Sentencing Commission, and presentence reports with statements from the victims, defendant, and attorneys (Offices of the United States Attorneys, n.d.). The judge may also consider aggravating and mitigating factors, such as the defendant’s criminal history, whether they express remorse, the nature of the crime, mental health history, and much more (Offices of the United States Attorneys, n.d.). What follows is a complicated mixture of concrete legal guidelines, highly trained legal reasoning, and subjective assessments of the defendant’s disposition and perceived character.
The point of this character assessment is to predict the likelihood of recidivism. We should pause here to note how bizarre a practice it is to use predictions about future behavior to inform present punishment. On its face, it seems morally wrong: common sense tells us that punishments should be tied to the crime committed, not the crime predicted. But it simultaneously seems like an obvious thing to do when a judge has been placed in the position of guarding the safety of the public. We want to be able to distinguish between the person who killed in self-defense and is haunted by it and the person who killed out of curiosity about what it would feel like and shows no remorse. The question at the heart of this is whether we think the offender will commit the same crime, or anything similar, again. So, while this may seem like an ethically and logically questionable practice, it is a very practical part of the sentencing process.
So, what about AI? We should consider two metrics of performance that can be enhanced by technology: speed and accuracy. Comparatively speaking, humans are not particularly efficient at reviewing case history and writing judicial decisions. It’s a time-consuming process, requiring a considerable amount of reading, analysis, and writing, even for the very best human experts. This is why judges have clerks. Artificial intelligence, on the other hand, is incredibly efficient at analyzing massive data sets (Lederer, 2020). With the predictive coding available through AI, a judge can simply enter the relevant data points of the case, and an algorithm will offer the statistical likelihood of recidivism based on all relevant characteristics of that case and every similar case that has ever been tried in the United States. This technology can also account for sentencing rules and guidelines, produce a sentencing suggestion, and provide a written explanation of how it determined the sentence. A decision that might otherwise take weeks or months could be produced within minutes. Of course, there are very topical concerns over AI programs like ChatGPT fabricating case history (Bohannon, 2023), but this is a speed bump on the road to what would have been incomprehensible efficiency ten years ago.
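As a rough illustration of that workflow (and only that: the model, features, and guideline numbers below are invented, not any real courtroom system), a fitted model can turn a case’s data points into a risk estimate and map it onto a statutory range:

```python
# Toy workflow: case data points in, risk score and suggestion out.
# Everything here is hypothetical.
from sklearn.linear_model import LogisticRegression

# Hypothetical historical cases: [age, prior_convictions, offense_severity]
X_history = [[25, 4, 3], [52, 0, 1], [33, 2, 2], [19, 6, 3], [41, 1, 1]]
y_history = [1, 0, 0, 1, 0]  # 1 = later reoffended

model = LogisticRegression().fit(X_history, y_history)

def suggest_sentence(case, min_months, max_months):
    """Map estimated risk onto a statutory range (a made-up rule)."""
    risk = model.predict_proba([case])[0][1]
    months = min_months + risk * (max_months - min_months)
    return risk, round(months)

risk, months = suggest_sentence([30, 3, 2], min_months=12, max_months=60)
print(f"Estimated risk: {risk:.0%}; suggested term: {months} months")
```

A real system would also need to produce the written explanation described above; the point of the sketch is only how little manual work the numerical step requires.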
In terms of prediction accuracy, judges began using statistical programs to inform their deliberations a couple of decades ago. In these older statistical programs, a number of variables, such as age or level of education, would be run against each other, and the relationships between them and rates of recidivism would be used to create a graph showing the range that indicated the greatest likelihood of recidivism (Berk & Bleich, 2013). The judge could create a number of different graphs with the data points they believed to be most relevant and use them to inform their decision. This wouldn’t provide something as clear-cut as a risk score, but the judge would likely learn which variable relationships provided the most meaningful insight and weigh those graphs more heavily in their informal mental model. This is an instance of a judge’s informal model getting stronger with experience.
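A minimal sketch of that older, two-variable style of analysis, with entirely invented data, might look like this:

```python
# Bin one variable and inspect observed recidivism rates per bin,
# mimicking the threshold graphs described above. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
ages = rng.integers(18, 70, size=500)
# Synthetic outcomes in which younger defendants reoffend more often
reoffended = rng.random(500) < np.clip(0.8 - 0.012 * (ages - 18), 0.05, 0.9)

bins = [18, 25, 35, 45, 55, 70]
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (ages >= lo) & (ages < hi)
    print(f"ages {lo}-{hi - 1}: observed recidivism rate {reoffended[in_bin].mean():.0%}")
```

A judge reading the resulting table or graph would note which range carries the highest observed rate and weigh that in their informal model, exactly the kind of experience-driven refinement described above.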
However, as our technology improves, scientists develop predictive programs that can handle more complex combinations of variables. So, while an old program might provide an assortment of graphs, each with a threshold of predicted criminality based on the relationship between two variables, new programs can provide a single, complex graph incorporating all relevant variables (Berk & Bleich, 2013). This matters statistically because a defendant’s data points do not act in isolation; their risk profile is the composite of all of those points together. Having technology that can accommodate this complexity is extremely helpful, especially when it can also provide quick, clear answers like a recidivism risk score.
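The shift can be sketched in code as well: instead of separate two-variable views, one model ingests all the variables at once and returns a single composite score. Berk and Bleich (2013) discuss machine-learning forecasts of this general kind; the random forest and data below are my own illustrative stand-ins, not their actual procedure.

```python
# One multivariate model, one composite risk score per defendant.
# Columns and data are hypothetical: [age, prior_convictions, education_years, employed]
from sklearn.ensemble import RandomForestClassifier

X = [[22, 3, 10, 0], [45, 0, 16, 1], [31, 1, 12, 1],
     [19, 5, 9, 0], [58, 2, 12, 1], [27, 4, 11, 0]]
y = [1, 0, 0, 1, 0, 1]  # 1 = reoffended

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The score reflects all variables jointly rather than in isolated pairs
print(f"Risk score: {forest.predict_proba([[30, 2, 12, 1]])[0][1]:.2f}")
```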
In addition to being a considerable improvement on the technology that judges have already been using for years, AI addresses a concern surrounding the impressionistic nature of a judge making a character assessment of a defendant (Richards, 2016). It is argued that regardless of how much rigorous legal training a judge has, it is impossible to strip a practice like this of implicit bias. This is coupled with the fact that humans do not seem to be particularly good at predicting the future behavior of other humans, including recidivism. In one study (Dressel & Farid, 2018), randomly selected human subjects and an AI system, COMPAS, were each given information from twenty old court cases and asked to predict whether the defendant reoffended after their release from prison. COMPAS—which stands for Correctional Offender Management Profiling for Alternative Sanctions—predicts recidivism likelihood by weighing the answers to a 137-item questionnaire (Yong, 2018). This questionnaire asks about things like age, criminal history, level of education, gang affiliation, economic status, levels of boredom, and anger issues (Yong, 2018). The 800 participants had an average accuracy rate of 66.7%, while COMPAS had an accuracy rate of 65% (Dressel & Farid, 2018).
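The accuracy figures in that study are simply the share of predictions that matched the observed outcomes. A small sketch with invented predictions and outcomes (not the study’s actual data) shows the computation:

```python
# Accuracy = correct predictions / total predictions. Data are invented.
def accuracy(predictions, outcomes):
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)

outcomes  = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]  # 1 = actually reoffended
human     = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # one participant's guesses
algorithm = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # one model's predictions

print(f"human: {accuracy(human, outcomes):.0%}")          # 70%
print(f"algorithm: {accuracy(algorithm, outcomes):.0%}")  # 70%
```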
Neither of these results inspires a lot of confidence, given that random guessing should yield about 50% accuracy, but I argue that the takeaway from this study should be that in 2018 this technology was already as effective as many laypeople’s informal models. The crucial difference is that COMPAS’s algorithm will be fed more data every day and will modify its algorithm continuously for as long as we allow it to, while the human participants in that study won’t. Humans continue to modify their informal models until they seem good enough to serve them in whatever system they are working in (O’Neil, 2016). Any valuable improvements made are erased when the person retires or is no longer involved in that given system. A program like COMPAS doesn’t face this barrier. It can continue to improve open-endedly, its gains are not erased with retirement or death, and it’s only getting faster (Epps & Warren, 2020).
But, one might point out, that study was done with people who were randomly selected; don’t judges significantly improve in their ability to assess character through their training? Probably, but it’s hard to tell by how much (Ridgeway, 2013). It is almost certain that, through training, humans will develop more complex mental models for predicting behavior—but an increase in complexity does not necessarily correlate to an increase in accuracy.
Let’s grant, though, that the rigorous training involved in legal practice improves a human’s ability to predict future criminal behavior considerably. The question becomes: At what point in AI’s progression do we consider its superior accuracy to be something we cannot look past? Could it, at a certain point, become irresponsible to allow the human to make the character assessment? Is that point 86% accuracy? Is it 90%?
We seem to view something like character assessment as a uniquely human endeavor, and a highly trained one in this case. But it’s not, really: the data will tell us what we want to know. As data scientists continue to refine their models, they can build new programs that are more powerful and therefore learn even faster.
The practical implications of this may be better displayed by looking to another sector that has been profoundly changed by AI integration: marketing in the age of social media and big data (O’Neil, 2016). Social media companies accumulate massive amounts of data about their users and sell that information to retail companies, which feed it into AI programs that predict what a user will be likely to buy. Have you ever thought about buying a product (without even telling anyone!), only to receive an ad on Instagram for that specific product hours later? This is not because ad agencies can read your mind; it’s because your data tells these machines enough about you to predict what you are likely to want, sometimes with such accuracy that it seems like mind reading (O’Neil, 2016). This says something critical about the nature of our data: it tells a story so complete that the machine appears to read our minds. What if we could predict who the dangerous criminals were with such accuracy that it was like we were reading their minds?
The Threat of Human Replacement in Judicial Sentencing
What do we do when a computer is more efficient at someone’s job than they are? We replace them! During my research, this is the pattern that came into focus. As our technology improves, we improve our models, and thereby make our systems more efficient. We are then able to reduce the amount of work done by humans because they’re simply not as efficient as the technology we’ve made (Kochhar, 2023). We often do this because it’s cheaper. I argue that this pattern will extend to the work of our judges as well.
In fact, a cutback on the number of judges and clerks happened in New Hampshire as recently as 2011 (King, 2020). In 2010, New Hampshire was facing a budget crisis, and among other consolidations the state introduced a court restructuring plan and an e-filing program. These changes enabled the court system to run with fewer staff and allowed attorneys and self-represented litigants to file motions online rather than with a clerk in the courthouse (King, 2020). With these changes, New Hampshire reduced its number of clerkships from 118 to 18 (it now stands at 16): “In a few painful short weeks, we went from a structure of [118 to] 52 clerks down to 18,” with savings of “nearly $2.1 million the first year and, unadjusted for increased salaries and benefits, . . . nearly $19 million over the first nine years” (King, 2020). The first ten years also saw a reduction in circuit court judges, from 40 full-time and 29 part-time judges to 34 full-time and 8 part-time judges (King, 2020). The article in the New Hampshire Business Review that outlines this process shows that this reform saved New Hampshire taxpayers $55 million over the first nine years after its implementation (King, 2020).
Conclusion
I argue that the same sort of cost-benefit analysis will come back around as our technology improves to the point where these programs become viable candidates to replace clerks and speed up the work of the courts. Should we reduce the scope of work for our judges, cut the clerks out entirely, and replace them with faster, cheaper, equally or more accurate, ideally less biased, ever-improving AI-powered statistical programs? By my calculation, we could save many more millions of taxpayer dollars each year.
For now, technology hasn’t reached the point at which it is more accurate than humans, but I believe the time will come when it is. For all the comparisons my topic draws to movies like Minority Report and I, Robot, the impetus for replacing human judges won’t be robots taking over the world; it’ll just be regular old budget cuts.
The structure of our criminal justice system is ripe for AI integration, and our capitalist society is one that rewards cost efficiency. Thus, I see the replacement of human judges in sentence determination as highly likely under the system we have—and I believe this likelihood will grow as technology continues to advance. This research article aims to do justice to the many possible benefits of AI integration into our judicial deliberation process and the great potential for positive change with this technology. But despite these benefits, my research leads me to believe that we need to stop this development from progressing further—at least until we make some critical changes to our criminal justice system. I think it is paramount that we consider our next steps very carefully. If we fail to do so, the consequences could be dire for our community.
Acknowledgments
There are so many people who have made this project possible. First, Professor Nick Smith, for being my academic advisor, my several-time URC advisor, my SURF advisor, my senior thesis advisor, and for guiding me throughout my law school application process. You’ve mentored me since the day we first met. I’d also like to thank the entire faculty of the UNH Philosophy Department; I couldn’t possibly describe the impact you all have had on my life. I’d like to thank Mr. Dana Hamel and Mrs. Elizabeth Lunt Knowles, who funded my Summer Undergraduate Research Fellowship through the Hamel Center for Undergraduate Research. Research opportunities like this are part of what makes UNH so outstanding! Finally, I’d like to thank my family: my parents, Charles Newcomb and Cathy Duffy; my grandfather, James Duffy; my brother, Lachlan Newcomb; and my fiancée, Lena Beth Schneider. If I were to adequately describe how much this work relied upon their support, it would be its own paper.
Works Cited
Allen, J. C. (2020). Artificial intelligence in our legal system. The Judges’ Journal, 59(1), 1–39. https://heinonline.org/HOL/LandingPage?handle=hein.journals/judgej59&div=4&id=&page=
Berk, R. A., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior. Criminology & Public Policy, 12(3), 513–544. https://doi.org/10.1111/1745-9133.12047
Bohannon, M. (2023). Lawyer used ChatGPT in court and cited fake cases. A judge is considering sanctions. Forbes. https://www.forbes.com/sites/mollybohannon/2023/06/08/lawyer-used-chatg…
Bringsjord, S., Govindarajulu, N. S., Banerjee, S., & Hummel, J. (2018). Do machine-learning machines learn? In Philosophy and Theory of Artificial Intelligence 2017 (pp. 136–157). Springer. https://doi.org/10.1007/978-3-319-96448-5_14
Cornell Legal Information Institute. (n.d.). Sentencing. Legal Information Institute. https://www.law.cornell.edu/wex/sentencing
Davis, A. Y. (2003). Are prisons obsolete? Seven Stories Press.
Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580. https://doi.org/10.1126/sciadv.aao5580
Epps, W., & Warren, J. M. (2020). Artificial intelligence: Now being deployed in the field of law. The Judges’ Journal, 59(1), 16–19.
Johnson, C. (2023). The vast majority of criminal cases end in plea bargains, a new report finds. NPR. https://www.npr.org/2023/02/22/1158356619/plea-bargains-criminal-cases-justice
King, D. (2020). The circuit court after 9 years. NH Business Review. https://www.nhbr.com/the-circuit-court-after-9-years/
Kochhar, R. (2023). Which U.S. workers are more exposed to AI on their jobs? Pew Research Center. https://www.pewresearch.org/social-trends/2023/07/26/which-u-s-workers-are-more-exposed-to-ai-on-their-jobs/
Lederer, F. I. (2020). Here there be dragons: The likely interaction of judges with the artificial intelligence ecosystem. The Judges’ Journal, 59(1), 12–15.
National Institute of Justice. (n.d.). Recidivism. https://nij.ojp.gov/topics/corrections/recidivism
Nikolskaia, K., & Naumov, V. (2020). Artificial intelligence in law. 2020 International Multi-Conference on Industrial Engineering and Modern Technologies (FarEastCon). https://doi.org/10.1109/fareastcon50210.2020.9271095
O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Penguin Books.
Offices of the United States Attorneys. (n.d.). Plea bargaining. United States Department of Justice. https://www.justice.gov/usao/justice-101/pleabargaining
Offices of the United States Attorneys. (n.d.). Sentencing. United States Department of Justice. https://www.justice.gov/usao/justice-101/sentencing
Reiman, J., & Leighton, P. (2020). The rich get richer and the poor get prison. Routledge.
Richards, D. (2016). When judges have a hunch: Intuition and experience in judicial decision-making. ARSP: Archives for Philosophy of Law and Social Philosophy, 102(2), 245–260. http://www.jstor.org/stable/24756844
Ridgeway, G. (2013). Linking prediction and prevention. Criminology & Public Policy, 12(3), 545–550.
Ridgeway, G. (2013). The pitfalls of prediction. National Institute of Justice. https://nij.ojp.gov/topics/articles/pitfalls-prediction
Wisconsin Department of Corrections. (n.d.). COMPAS. https://doc.wi.gov/Pages/AboutDOC/COMPAS.aspx
Xu, Z. (2021). Human judges in the era of artificial intelligence: Challenges and opportunities. Applied Artificial Intelligence, 36(1). https://doi.org/10.1080/08839514.2021.2013652
Yong, E. (2018). A popular algorithm is no better at predicting crimes than random people. The Atlantic. https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/
Author and Mentor Bios
Kieran Newcomb will graduate in May 2024 with a bachelor of arts in Legal and Political Philosophy. His research project, funded by a Summer Undergraduate Research Fellowship (SURF) through the Hamel Center for Undergraduate Research at UNH, explored the rise of artificial intelligence (AI) and its potential to play a significant role in the legal field. Originally from Brooklyn, New York, Kieran moved to Manchester, New Hampshire, where he began his education at Manchester Community College. Kieran became a Wildcat after he was offered a trial period in the University Honors Program, which he loved! His research question was inspired by two philosophy courses he took with Professor Nick Smith, who would become his research advisor. Through his research, Kieran learned more about the court system and applied his education to thinking like a philosopher. Despite the challenge of sticking to just one avenue of fascinating research, Kieran hopes this experience with the SURF program is the first stepping stone of his postgraduation journey. He plans to attend Cornell Law School in the fall.
Dr. Nick Smith, J.D./Ph.D., is a professor of philosophy and has been teaching at the University of New Hampshire since 2002. Previously a litigator at a major New York law firm and a judicial clerk for the United States Court of Appeals for the Third Circuit, he teaches and writes on issues in law, politics, and society. Dr. Smith published I Was Wrong: The Meanings of Apologies in 2008 and Justice through Apologies: Remorse, Reform, and Punishment in 2014 (both with Cambridge University Press). He has been interviewed by or appeared in many major news outlets, including the New York Times, the Wall Street Journal, NPR, and the BBC. He began mentoring Kieran on this project after Kieran took several of his classes, beginning with the Future of Humanity course. He has mentored several undergraduate researchers and Inquiry authors. Dr. Smith describes Kieran as an exceptional and inspiring student, saying that “this is such a cutting-edge topic that we are continuously learning from each other. I regularly bring material from Kieran’s research into my current classes.”
Copyright 2024 © Kieran Newcomb