Probabilistic blindspots in Goodman’s new riddle of induction

In *Fact, Fiction and Forecast*, Nelson Goodman revisits Hume's problem of induction. The problem Hume wrote about in the 18th century is: can we reliably know anything about the future, given experience of the past? Or, alternatively: can we produce generalizable knowledge from observations of particulars?

Goodman breezes through a couple of unworkable solutions to (or attempted dissolutions of) Hume's problem. He then zeroes in on the core issue: just as deductive inferences are justified if they conform to the rules of logic, Goodman argues that "predictions are justified if they conform to valid canons of induction." The problem of induction, then, is the problem of defining the valid rules of induction.

Today, an obvious suggestion would be that the laws of probability—which generalize the laws of classical deductive logic—should form the rule system that distinguishes valid from invalid predictions. (For some, maybe the more obvious suggestion is something like deep learning, though there aren’t any decipherable laws there so far.) This isn’t where Goodman was going: “As principles of deductive inference, we have the familiar and highly developed laws of logic; but there are available no such precisely stated and well-recognized principles of inductive inference.” He adds that “Elaborate and valuable treatises on probability usually leave certain fundamental questions untouched.” That’s the only reference he makes to probability in the book.

It's dangerous to project our current understanding of probabilistic inference onto Goodman's text, which is based on lectures he gave in the 1950s. Bayesian computation would not become practical for another couple of decades, and Monte Carlo methods were then in their infancy. Practical matters of computation aside, much of the applied work making the case for Bayesian inference was yet to be done. Still, the absence of more recognizably probabilistic thinking from Goodman's work is striking. (It's interesting to ask what Goodman thought at the time of the emerging field of cybernetics, which had a clear probabilistic bent, and whether he changed his mind later.)

It also wouldn’t be fair to dispute Goodman’s implication that the problem of deductive inference is basically “solved.” Much of the work on non-classical logics, like nonmonotonic logics, was developed only later. In the 1950s, the foundational rules of deductive logic may have been viewed as a done deal in Goodman’s circle.

***

Goodman focuses his search for a theory of induction by noting that valid inductive inferences have a "lawlike-ness" to them. He claims that "Only a statement that is lawlike—regardless of its truth or falsity or its scientific importance—is capable of receiving confirmation from an instance of it; accidental statements are not." For example, seeing that a piece of copper conducts electricity "increases the credibility of statements asserting that other pieces of copper conduct electricity." The observation thus confirms the lawlike hypothesis *All pieces of copper conduct electricity*.

Lawlike-ness stands in contrast to what Goodman calls “accidental” hypotheses, which aren’t supported by their own instantiation. Here’s Goodman’s example of an accidental hypothesis: “…the fact that a given man now in this room is a third son does not increase the credibility of statements asserting that other men now in this room are third sons, and so this does not confirm the hypothesis that all men now in this room are third sons.” The objective for Goodman is therefore to find a way to discriminate between lawlike and accidental hypotheses.

Goodman’s quest to define what makes a hypothesis “lawlike” doesn’t make much sense in the framework of probabilistic inference. The fact that an assertion appears to be generalizing about a class of objects as opposed to particulars (from a linguistic perspective) isn’t what determines how probable the assertion is.

Consider a simple inference problem. We observe a sequence of independent tosses of a coin with unknown weight $\theta$, and we’re asked to predict the outcome of the next toss. Given a finite sequence $X_1,…,X_n$ of Heads ($H$) and Tails ($T$):

$X_1,\dots,X_n = H, T, T, \dots, H, T, H$

our task is to predict the next toss. If $X_{n+1}$ is a random variable representing its outcome, then its probability is given by the posterior predictive distribution:

$P(X_{n+1} \mid X_1,\dots,X_n) = \displaystyle\int P(X_{n+1} \mid \theta)P(\theta \mid X_1,\dots,X_n)d\theta$

Simple tasks like these show that the lawlike-ness property is arbitrary. The assertion "The next flip will be Heads with probability 0.65" is as un-lawlike as they come: it refers to a singular event, the next toss of a coin, based on observing a specific short sequence of tosses. It is nothing like the lawlike hypotheses Goodman described about the properties of copper, or other statements relating objects and broad categories. And yet, given the right data, the assertion is entirely justified.
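For the coin model, the posterior predictive integral has a well-known closed form when the prior on $\theta$ is a Beta distribution (conjugate to the Bernoulli likelihood). A minimal sketch, with illustrative counts chosen so that a uniform prior yields exactly the 0.65 figure above (the function name and counts are mine, not from the text):

```python
def predictive_heads(n_heads, n_tails, alpha=1.0, beta=1.0):
    """Posterior predictive P(X_{n+1} = H | X_1..X_n) for a coin whose
    weight theta has a Beta(alpha, beta) prior. The integral in the text
    reduces to this closed form because the Beta prior is conjugate to
    the Bernoulli likelihood; alpha = beta = 1 (a flat prior) gives
    Laplace's rule of succession."""
    return (n_heads + alpha) / (n_heads + n_tails + alpha + beta)

# Under a flat prior, observing 12 heads in 18 tosses gives
# (12 + 1) / (18 + 2) = 13/20:
print(predictive_heads(12, 6))  # 0.65
```

So a short, particular run of data suffices to justify the singular assertion, with no appeal to lawlike-ness.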

The features that Goodman is seeking aren't the ones that matter for separating sound from unsound predictions. What matters, for probabilistic reasoning, are features like the size of a hypothesis's extension (the set of observations it predicts)—as elegantly illustrated by Josh Tenenbaum's "number game"—or the complexity of the hypothesis, as measured by its description length, for example. As Hájek and Hall suggest, algorithmic probability and complexity-based priors are good tools for thinking about induction.
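The "size principle" at work in Tenenbaum's number game can be sketched in a few lines (the domain, hypotheses, and data below are illustrative toys of mine, not Tenenbaum's actual materials): when several hypotheses are consistent with the data, the one with the smaller extension—the tighter predictions—earns the higher likelihood.

```python
def size_principle_likelihood(data, extension):
    """P(data | h) when each observation is drawn uniformly at random
    from the hypothesis's extension; zero if any observation falls
    outside the extension."""
    if not set(data) <= extension:
        return 0.0
    return (1.0 / len(extension)) ** len(data)

domain = range(1, 101)
hypotheses = {
    "even numbers":    {x for x in domain if x % 2 == 0},   # size 50
    "multiples of 10": {x for x in domain if x % 10 == 0},  # size 10
}

data = [10, 30, 60]  # consistent with both hypotheses
for name, ext in hypotheses.items():
    print(name, size_principle_likelihood(data, ext))
# "multiples of 10" is favored by a likelihood ratio of (50/10)**3 = 125,
# even though both hypotheses contain every observed number.
```

Note that nothing here depends on whether a hypothesis is phrased as a law-like generalization; only the sizes of the competing extensions matter.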

A rejoinder on behalf of Goodman might go as follows. The assertion that the next flip will be Heads with probability 0.65 is in fact not a statement about a singular event and a specific set of observations. It is actually a generalization over a very large space of coin tosses. In our example, we assumed that the coin tosses are independent and identically distributed (conditioned on $\theta$), so the assertion would be equally justified had we observed any sequence of tosses with the same counts of heads and tails. However, this form of reasoning, which appeals to the probabilistic structure of the hypothesis space, only reinforces the argument that the kinds of features Goodman was grasping for aren't the relevant ones—and that one has to consider the laws of probability to ascertain how probable an assertion is, rather than the assertion's linguistic lawlike-ness.
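The rejoinder's point is easy to check numerically. In this sketch (the grid approximation, flat prior, and example sequences are assumptions made for illustration), two sequences with the same head/tail counts but different orderings produce the same posterior over $\theta$ and hence the same prediction for the next toss:

```python
def predict_next_heads(sequence, grid_size=1001):
    """P(X_{n+1} = H | sequence), approximating the posterior predictive
    integral over a uniform grid of theta values (flat prior).
    Conditioned on theta, the tosses are i.i.d., so the likelihood
    depends only on the counts of H and T, not on their order."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    weights = []
    for t in thetas:
        w = 1.0  # unnormalized posterior: flat prior x Bernoulli likelihood
        for x in sequence:
            w *= t if x == "H" else 1.0 - t
        weights.append(w)
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)

# Same counts (4 heads, 3 tails), different orders -> same prediction:
a = predict_next_heads(list("HTTHTHH"))
b = predict_next_heads(list("HHHHTTT"))
print(abs(a - b) < 1e-9)  # True
```

The counts are a sufficient statistic here; the particular sequence observed carries no extra evidential weight.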

Goodman acknowledged that lawlike-ness, whatever it may be, isn't a syntactic feature of hypotheses. He also agreed that characterizing accidental hypotheses as ones that "involve some spatial or temporal restriction, or reference to some particular individual" isn't a workable distinction, either. He argued that lawlike-ness arises from complex and nested relations among hypotheses and the evidence that confirms them, making some hypotheses "projectible" and others not. In short, Goodman recognized that he had no real account of what distinguishes lawlike from accidental hypotheses.

***

Looking back on Goodman's work in light of modern probabilistic inference, it's tempting to see his intuitions, once formalized, as adding up to induction in a Bayesian framework. But it's not clear if there's room in this interpretation for anything like the distinction between lawlike and accidental hypotheses as Goodman saw it. For Goodman, the difference between these, and the basis for valid inductive inference, was fundamentally linguistic: "If I am at all correct, then the roots of inductive invalidity are to be found in our use of language. A valid prediction is, admittedly, one that is in agreement with past regularities in what has been observed; but the difficulty has always been to say what constitutes such agreement. The suggestion I have been developing here is that such agreement with regularities in what has been observed is a function of our linguistic practices." He concludes that "the line between valid and invalid predictions…is drawn upon the basis of how the world is and has been described and anticipated in words."

The insistence on framing valid induction in linguistic (or logical) terms is hard to salvage when we reinterpret Goodman in probabilistic terms. Perhaps wanting to identify Goodman as a progenitor of a successful Bayesian induction paradigm is another instance of the “founding-father fables” that the historian of science Jan Sapp wrote about. (There aren’t many “founding-mother” fables to write about, sadly.)

In trying to distinguish lawlike from accidental hypotheses, Goodman made several interesting observations about the knowledge required for successful inductive inference. He recognized that an agent’s previous experience with inductive inferences (including failed ones) has to factor into the way future inductive inferences are made, even if these new inferences concern a different domain. An account of valid inference has to include “some knowledge of past predictions and their successes and failures. I suppose that seldom, if ever, has there been any explicit proposal to preclude use of such knowledge in dealing with our problem. Rather a long standing habit of regarding such knowledge as irrelevant has led us to ignore it almost entirely. Thus what I am suggesting is less a reformulation of our problem than a reorientation: that we regard ourselves as coming to the problem not empty-headed but with some stock of knowledge, or of accepted statements, that may fairly be used in reaching a solution.” Nowadays, this might be what machine learning researchers call transfer learning, or learning to learn.

All of this led Goodman to view the mind as a non-stop prediction machine that cumulatively learns from the successes and failures of its predictions: “We…regard the mind as in motion from the start, striking out with spontaneous predictions in dozens of directions, and gradually rectifying and channeling its predictive processes. We ask not how predictions come to be made, but how—granting that they are made—they come to be sorted out as valid and invalid.”

***

Goodman ends the final lecture in *Fact, Fiction and Forecast* with humility:

“But none of these speculations should be taken for a solution. I am not offering any easy and automatic device for disposing of all, or indeed of any, of these problems. Ample warning of the distance from promising idea to tenable theory has been given by the complexities we have had to work through in developing our proposal concerning projectibility; and even this task is not complete in all details. I cannot reward your kind attention with the comforting assurance that all has been done, or with the perhaps almost as comforting assurance that nothing can be done. I have simply explored a not altogether familiar way of attacking some of all too familiar problems.”

The problem of induction is hard. Reading Goodman wrestle with it inside a logical-linguistic framework reminded me of an acknowledgment Judea Pearl made in the preface to his book *Probabilistic Reasoning in Intelligent Systems*: "Rina [Dechter] and Hector [Geffner], in particular, are responsible for wresting me from the security blanket of probability theory into the cold darkness of constraint networks, belief functions and nonmonotonic logic." [emphases added]

Yarden Katz is a fellow in the Dept. of Systems Biology at Harvard Medical School.