One of the areas of mathematics and statistics that interests me generally goes under the heading of data compression. It's closely related to "machine learning," if that helps. The idea is that one observes a whole lot of variables about something ("independent variables," to use the statistics term) and a result that follows ("a dependent variable"). And one has this data for a whole bunch of different instances. The idea behind the prevalent technique of multiple regression (well described, among other places, in Ian Ayres' fabulous book Super Crunchers) is that one assumes a "model" of how the independent variables relate to the dependent variable and then sees which coefficients "best predict" the result. The output is basically the statistical distribution of these coefficients, generally simplified to their most likely values. If one conceives of the law as some sort of system for mapping "facts" into results, one can also use this technique to deduce what the law must be. One "simply" categorizes "the facts" in some numerical fashion (1s can be used for the presence of some fact, 0s for its absence), categorizes the result numerically, and one can develop a regression whose results represent the best estimate of "the law."
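To make that concrete, here is a minimal sketch in Python, assuming an invented handful of cases and fact labels (and using scikit-learn's logistic regression as a stand-in for whatever regression technique one prefers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical encoding of a handful of negligence cases. Each column is
# a fact (1 = present, 0 = absent); y is the outcome (1 = liability).
facts = ["defendant_speeding", "plaintiff_jaywalking", "raining"]
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
])
y = np.array([1, 0, 0, 1, 0, 0])

model = LogisticRegression().fit(X, y)
# On this view, the fitted coefficients are the estimate of "the law":
# how strongly each fact pushes a case toward liability.
print(dict(zip(facts, model.coef_[0].round(2))))
```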
But where did the model come from? Often it comes from some physical intuition about how the system operates, or one uses Occam's Razor to assume a simple linear relationship. This often works well in modeling physical systems. But it may not work well with law, where it may be some combination of variables that best predicts the outcome, not just a linear combination of the individual variables. And so, in fact, what clever jurists often do is ferret out combinations of variables that seem to predict the outcome of cases and pronounce that to be the law. Indeed, doing this represents one of the chief skills learned in law school. For classic work by a really bright student, see Justice Cardozo's opinion in MacPherson v. Buick Motor Co., 217 N.Y. 382 (1916).
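A tiny numerical illustration of the point (mine, not drawn from any real case set): suppose liability follows only when two facts occur together, a conjunctive rule. No weighting of the individual facts reproduces that rule, but adding their combination as its own variable captures it exactly.

```python
import numpy as np

# Four fact patterns over facts A and B; liability only when both present.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

# Least-squares fit on the individual variables alone (with an intercept):
design = np.c_[np.ones(4), X]
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(design @ coef)  # predicts -0.25, 0.25, 0.25, 0.75 -- never a clean 0 or 1

# Adding the combination A*B as its own variable captures the rule exactly:
design2 = np.c_[design, X[:, 0] * X[:, 1]]
coef2, *_ = np.linalg.lstsq(design2, y, rcond=None)
print(design2 @ coef2)  # reproduces y exactly
```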
But for fat data sets, ones in which there are lots of variables, such as those attempting to capture the complexity of many legal cases, it turns out that there may be multiple "compressions" of the data in which just a few variables predict outcomes quite well. So, for example, in predicting the outcome of an automobile negligence action, one might do quite well by excluding from consideration the color of the plaintiff's vehicle but might not do well unless the variable "speed of defendant's vehicle" was present. The idea is to find "compressions" that predict well and that are parsimonious, i.e., expressible fairly tersely. Legal rules that take 100 pages to set forth are disfavored by humans, even if they predict well. (Although maybe one can think of a treatise as an example of a model that trades away parsimony in favor of accurate prediction.)
Anyway, it turns out that there are often multiple compressions of data with roughly equivalent parsimony and roughly equivalent predictive value. They often rely on different subsets of variables drawn from the much larger set of potential variables. Moreover, there may be a set of compressions that lie along a "Pareto Frontier," such that one cannot improve parsimony without sacrificing predictive value and vice versa. (You can read about this here.) Choosing which of these models in fact represents "the law" is an extraordinarily difficult task, one in which one often has to go outside statistics and invoke one's aesthetic preferences, as well as one's beliefs about what law should look like and care about.
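Here is a rough sketch of what that search might look like: enumerate subsets of candidate variables, fit a model on each, and keep only the subsets that are Pareto-efficient in (parsimony, accuracy). The data and variable names are invented, and a serious version would score accuracy on held-out cases rather than the training set.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
names = ["def_speed", "plf_car_color", "road_wet", "def_distracted"]
X = rng.integers(0, 2, size=(60, 4)).astype(float)
# Invented ground truth: liability turns on speed or distraction;
# the plaintiff's car color is pure noise.
y = ((X[:, 0] + X[:, 3]) >= 1).astype(int)

candidates = []  # (number of variables, training accuracy, variable names)
for k in range(1, len(names) + 1):
    for subset in combinations(range(len(names)), k):
        cols = list(subset)
        model = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        candidates.append((k, model.score(X[:, cols], y),
                           [names[i] for i in cols]))

# A compression is dominated if some rival is terser without being less
# accurate, or equally terse and strictly more accurate.
def dominated(c):
    return any((o[0] < c[0] and o[1] >= c[1]) or (o[0] <= c[0] and o[1] > c[1])
               for o in candidates)

frontier = [c for c in candidates if not dominated(c)]
for k, acc, kept in sorted(frontier):
    print(k, round(acc, 2), kept)
```

On this toy data the frontier typically pairs a terse one-variable rule that predicts decently with a two-variable rule that predicts perfectly; nothing on the frontier bothers with the car's color.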
Which brings me to Baker v. Carr. Although sometimes judicial disputes about precedential reasoning are based on different categorizations (encodings) of facts and outcomes, I do not see that going on here. Instead, the case features Justice Brennan's clever recompression of a data set composed of prior Supreme Court decisions on justiciability. He ends up with a rule that predicts justiciability based on a disjunctive combination of six factors ("textually demonstrable commitment," "lack of judicially manageable standards," etc.) and not based simply on whether the "Republican form of government" clause is potentially implicated by the case or whether the case is political. Justice Frankfurter in his dissent develops an alternative three-factor compression of the precedent data bank (including interference with matters of state government) that he finds predicts outcomes well. Among the things interesting to me are (1) that while both compressions predict the outcomes of prior cases well with hindsight, they arguably diverge on the outcome of this particular case — indeed, this is likely true of most contested Supreme Court decisions — and (2) the "ships passing in the night" character of the opinions, which do not fully grapple either with the success of the rival model in prediction or with the non-"statistical" bases for preferring one compression, one vision of the law, over the other. (Although Frankfurter comes close at the end in expressing a preference for justiciability rules that avoid the sort of judicial activism he thinks will create tension in federal-state relations.)
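Schematically, Brennan's compression is a disjunctive rule over six binary factors. The sketch below is my own paraphrase and encoding, not the opinion's text:

```python
# Brennan's six political-question factors, paraphrased from the opinion;
# the binary encoding and dict representation are my own gloss.
BRENNAN_FACTORS = [
    "textual_commitment_to_coordinate_branch",
    "no_judicially_manageable_standards",
    "requires_initial_nonjudicial_policy_determination",
    "resolution_would_disrespect_coordinate_branch",
    "unusual_need_to_adhere_to_decision_already_made",
    "risk_of_multifarious_pronouncements",
]

def political_question(case: dict) -> bool:
    # The compression is disjunctive: ANY one factor present makes the
    # controversy a nonjusticiable political question.
    return any(case.get(factor, False) for factor in BRENNAN_FACTORS)

# On Brennan's own reading, none of the six factors is present in Baker,
# so the apportionment claim is justiciable:
print(political_question({f: False for f in BRENNAN_FACTORS}))  # False

# Frankfurter's rival compression uses a different, three-factor encoding
# (one factor: interference with matters of state government); it could be
# rendered the same way with its own factor list.
```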
Query for students by the way: What values might Justice Frankfurter be sacrificing or ignoring here?
Moral: It is probably useful to see what rule of law emerges from an attempt to compress the dataset implicit in some bank of relevant precedents, but to recognize that alternative compressions are often likely to succeed as well. Choosing among the successful models is a yet more difficult problem, one that might be informed by concerns for the robustness of the models — a good model should not lead to absurd results in cases not yet examined — but that will likely be the product of external notions of justice, efficiency, and legal aesthetics.