# The crazy bookmaker and the Cult of probability

## A Critique of the Dutch Book Argument

Many neutral observers agree that we are witnessing the formation of a new religion among hopelessly nerdy people.

I’m thinking, of course, of what has been called hardcore Bayesianism, the epistemology according to which each proposition (“Tomorrow it’ll rain”, “String theory is the true description of the world”, “There is no god”, etc.) has a probability which can and should be computed under almost every conceivable circumstance.

In a previous post I briefly explained the two main theories of probability, frequentism and Bayesianism. In another post, I laid out my own alternative view, called “knowledge-dependent frequentism”, which attempts to keep the objectivity of frequentism while taking the limited knowledge of the agent into account. An application to the Theory of Evolution can be found here.

It is not rare to hear Bayesians talk about their own view of probability as a life-saving truth you cannot live without, or, a bit more modestly, as THE “key to the universe“.

While trying to win new converts, they often put it as if it were all about accepting Bayes’ theorem, whose truth is certain since it has been mathematically proven. This is a tactic I have seen Richard Carrier employ repeatedly.

I wrote this post as a reply, to show that frequentists accept Bayes’ theorem as well, and that the dispute is not about its mathematical demonstration but about whether or not one accepts that, for every proposition, there exists a rational degree of belief behaving like a probability.

## Establishing the necessity of probabilistic coherence

One very popular argument aiming at establishing this is the “Dutch Book Argument” (DBA). I think it is no exaggeration to state that many committed Bayesians venerate it with almost the same degree of devotion a Conservative Evangelical feels towards the doctrine of Biblical inerrancy.

Put forward by Ramsey and de Finetti, it defines a very specific betting game whose participants are threatened with a sure loss (“being Dutch-booked”) if their betting quotients do not fulfill the basic axioms of probability, the so-called Kolmogorov axioms (I hope my non-geeky readers will forgive me one day for becoming so shamelessly boring…):

1) the probability of an event is always a non-negative real number

2) the probability of the event encompassing all possibilities (the whole sample space) is equal to 1

3) the probability of the union of disjoint events is equal to the sum of the probabilities of each event

The betting game upon which the DBA rests is defined as follows (you can skip this more technical part, whose comprehension isn’t necessary for following the basic thrust of my criticism of the DBA).

## A not very wise wager

Let us consider an event E upon which bets are to be placed.

The bookmaker fixes a sum of money S (say 100 €) that a person R (Receiver) will get from a person G (Giver) if E comes true. But R first has to pay p*S to G, where p is the betting quotient the bettor sets for E.

The bookmaker determines himself who is going to be R and who is going to be G.

Holding fast to these rules, it is possible to demonstrate that a clever bookmaker can set things up in such a way that any bettor whose quotients p do not respect the laws of probability will lose money regardless of the outcome of the event.

Let us consider for example a bettor who wagers upon the propositions

1) “Tomorrow it will snow” with P1 = 0.65  and upon

2) “Tomorrow it will not snow” with P2 = 0.70.

P1 and P2 violate the laws of probability because the sum of the probabilities of these two mutually exclusive and exhaustive events should be 1 instead of 1.35.

In this case, the bookmaker would choose to be G and first collect P1*S + P2*S = 100*(1.35) = 135 € from his bettor R. Afterwards, he wins in both cases:

– It snows. He must give 100 € to R because of 1). The bookmaker’s gain is 135 € – 100 € = 35 €.

– It doesn’t snow. He must give 100 € to R because of 2). The bookmaker’s gain is also 135 € – 100 € = 35 €.

Let us now consider the same example where this time the bettor comes up with P1 = 0.20 and P2 = 0.30, whose sum is well below 1.

The bookmaker would then choose to be R, paying 0.20*100 = 20 € on the snow proposition and 0.30*100 = 30 € on the no-snow proposition. Again, he wins in both cases:

– It snows. The bettor must give 100 € to R (the bookmaker) because of 1). The bookmaker’s gain is –30 – 20 + 100 = 50 €.

– It does not snow. The bettor must give 100 € to R (the bookmaker) because of 2). The bookmaker’s gain is also –30 – 20 + 100 = 50 €.

In both cases, choosing P1 and P2 so as to fulfill the probability axioms would have been both a necessary and a sufficient condition for avoiding the sure loss.

The same demonstration can be generalized to the other basic axioms of probability.
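The two worked examples above can be checked with a short script (a minimal sketch; the function name and structure are mine, not part of the original argument):

```python
# Numerical check of the two worked examples above (stake S = 100 €).
# The bettor posts betting quotients p1 ("it will snow") and p2 ("it will not snow")
# on two mutually exclusive and exhaustive events; the bookmaker then picks
# whichever role (Giver or Receiver) guarantees him a profit.

S = 100  # sum paid to the Receiver for each proposition that comes true

def bookmaker_gain(p1, p2):
    """Bookmaker's guaranteed gain; it is the same whether it snows or not,
    since exactly one of the two propositions comes true either way."""
    if p1 + p2 > 1:                 # quotients too high: bookmaker plays Giver,
        return (p1 + p2) * S - S    # collects both stakes, pays out one S
    elif p1 + p2 < 1:               # quotients too low: bookmaker plays Receiver,
        return S - (p1 + p2) * S    # pays both stakes, collects one S
    return 0.0                      # coherent quotients: no sure gain possible

print(bookmaker_gain(0.65, 0.70))  # first example:  35 € whatever the weather
print(bookmaker_gain(0.20, 0.30))  # second example: 50 € whatever the weather
print(bookmaker_gain(0.40, 0.60))  # coherent quotients: 0 €
```

Whatever incoherent pair (P1, P2) the bettor chooses, the gain is strictly positive and independent of the outcome, which is exactly the “sure loss” the argument threatens.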

## The thrust of the argument and its shortcomings

The Dutch Book Argument can be formulated as follows:

1) It is irrational to be involved in a bet where you’re bound to lose.

2) One can make up a betting game such that, for every proposition, you are doomed to lose if the amounts you set do not satisfy the rules of probability. Otherwise you are safe.

3) Thus you would be irrational if the amounts you set broke the rules of probability.

4) The amounts you set are identical to your psychological degrees of belief.

5) Hence you would be irrational if your psychological degrees of belief did not behave like probabilities.

Now I could bet any amount you wish there are demonstrably countless flaws in this reasoning.

### I’m not wagering

One unmentioned premise of this purely pragmatic argument is that the agent is willing to wager in the first place. In the large majority of situations, where he will have no opportunity to do so, he would not be irrational if his degrees of belief were non-probabilistic, because there would be no monetary stakes whatsoever.

Moreover, a great number of human beings refuse on principle ever to bet, and would of course face no such threat of “sure loss”.

Since it is a thought experiment, one could of course modify it in such a way that:

“If you don’t agree to participate, I’ll bring you to Guatemala where you’ll be water-boarded until you’ve given up”.

But to my eyes, and those of many observers, this would make the argument look incredibly silly and convoluted.

### I don’t care about money

Premise 1) is far from being airtight.

Let us suppose you are a billionaire who happens to enjoy betting moderate amounts of money for various psychological reasons. Let us further assume your amounts do not respect the axioms of probability and, as a consequence, you lose 300 €, that is 0.00003% of your wealth, while enjoying the whole game. One must use an extraordinarily question-begging notion of rationality to call you “irrational” in such a situation.

### Degrees of belief and actions

It is simply not true that our betting amounts HAVE to be identical, or even closely related, to our psychological degrees of belief.

Let us say that a lunatic bookie threatens to kill my children if I do not agree to engage in a series of bets concerning insignificant political events in some Chinese provinces I had never heard of before.

Being in a situation of total ignorance, my psychological degrees of belief are undefined and keep fluctuating in my brain. But since I want to avoid a sure loss, I make up amounts behaving like probabilities which will prevent me from getting “Dutch-booked”, i.e. amounts having nothing to do with my psychology.

So I avoid the sure loss even though my psychological states did not behave like probabilities at any moment.

### Propositions whose truth we’ll never discover

There are countless things we will never know (at least assuming atheism is true, as most Bayesians do).

Let us consider the proposition “There exists an unreachable parallel universe which is fundamentally governed by a rotation between string theory and loop quantum gravity”, and many related assertions.

Let us suppose I ask a Bayesian friend: “Why am I irrational if my corresponding degrees of belief in my brain do not fulfill the basic rules of probability?”

The best answer he could give me (based on the DBA) would be:

“Imagine we NOW had to set odds on each of these propositions. It is true we’ll never know anything about them during our earthly life. But imagine my atheism was wrong: there is a hell, we are both stuck in it, and the devil DEMANDS that we abide by the amounts we set back then.

You’re irrational because the non-probabilistic degrees of belief you’re holding right now mean you’ll get Dutch-booked by me in hell, in front of the malevolent laughter of fiery demons.”

Now, I have no doubt this might be a good joke for impressing a geeky girl who is not too picky (which is truly an extraordinarily unlikely combination).

But it is incredibly hard to take this as a serious philosophical argument, to say the least.

## A more modest Bayesianism is probably required

To their credit, many more moderate Bayesians have started backing away from the alleged strength and scope of the DBA, stating instead that:

“First of all, pretty much no serious Bayesian that I know of uses the Dutch book argument to justify probability. Things like the Savage axioms are much more popular, and much more realistic. Therefore, the scheme does not in any way rest on whether or not you find the Dutch book scenario reasonable. These days you should think of it as an easily digestible demonstration that simple operational decision making principles can lead to the axioms of probability rather than thinking of it as the final story. It is certainly easier to understand than Savage, and an important part of it, namely the “sure thing principle”, does survive in more sophisticated approaches.”

Given that the Savage axioms rely heavily on risk assessment, they are bound to be related to events very well treatable through my own knowledge-dependent frequentism, and I do not see how they could justify the existence and probabilistic nature of degrees of belief having no connection with our current concerns (such as the evolutionary path through which a small sub-species of dinosaurs evolved countless years ago).

To conclude, I think there is a gigantic gap between:

– the fragility of the arguments for radical Bayesianism and its serious problems, such as magically turning utter ignorance into specific knowledge,

and

– the boldness, self-righteousness and terrible arrogance of its most ardent defenders.

I am myself not a typical old-school frequentist and do find valuable elements in Bayesian epistemology, but I find it extremely unpleasant to discuss with disagreeable folks who are much more interested in winning an argument than in humbly improving human epistemology.

Thematic list of ALL posts on this blog (regularly updated)

My other blog on Unidentified Aerial Phenomena (UAP)

# A mathematical proof of Ockham’s razor?


Ockham’s razor is a principle often used to dismiss out of hand alleged phenomena deemed too complex. In the philosophy of religion, it is often invoked to argue that God’s existence is extremely unlikely to begin with, owing to his alleged incredible complexity. A geeky brain is desperately needed before entering this sinister realm.

In an earlier post I dealt with some of the most popular justifications for the razor and made the following distinction:

Methodological Razor: if theory A and theory B do the same job of describing all known facts C, it is preferable to use the simplest theory for the next investigations.

Epistemological Razor: if theory A and theory B do the same job of describing all known facts C, the simplest theory is ALWAYS more likely.

As last time, I won’t address the validity of the Methodological Razor (MR), which might be a useful tool in many situations.

My attention will be focused on the epistemological blade and its alleged mathematical grounding.


## Example: prior probabilities of models having discrete variables

### Presentation of the problem

We consider five functions that predict an output Y (e.g. the velocity of a particle in an agitated test tube) which depends on an input X (e.g. the rotation speed).

Those five functions themselves depend on a given number of unknown parameters $latex a_i$.

$latex f_1(a_1)[X]$
$latex f_2(a_1,a_2)[X]$
$latex f_3(a_1,a_2,a_3)[X]$
$latex f_4(a_1,a_2,a_3,a_4)[X]$
$latex f_5(a_1,a_2,a_3,a_4,a_5)[X]$

To make the discussion somewhat more accessible to lay people, we shall suppose that each $latex a_i$ can only take on five discrete values: {1, 2, 3, 4, 5}.
Let us suppose that an experiment was performed.
For x = 200 rpm (rotations per minute), the measured velocity of the particle was y = 0.123 m/s.

Suppose now that, for each function fi, there is exactly one set of parameter values that allows it to predict the measurement E.
For example
f1(2)[200 rpm]= f2(1,3)[200 rpm]= f3(5,2,1)[200 rpm]=f4(2,1,4,5)[200 rpm]=f5(3,5,1,3,2)[200 rpm]= 0.123 m/s.

Now we want to evaluate the strength of the different models.
How are we to proceed?

Many scientists (including myself) would say that the five functions fit the data perfectly and that we would need further experiments to discriminate between them.


### The objective Bayesian approach

Objective Bayesians would have a radically different approach.
They believe that every proposition (“The grass is greener in England than in Switzerland”, “Within twenty years, healthcare in Britain will no longer be free”, “The general theory of relativity is true”…) is associated with a unique, precise degree of belief that every rational agent knowing the same facts should have.

They further assert that degrees of belief ought to obey the laws of probability using diverse “proofs” such as the Dutch Book Argument (but see my critical analysis of it here).

Consequently, if at time t0 we believe that model M has a probability p(M) of being true, and if at a later time t1 we get a new measurement E, the probability of M should be updated according to Bayes’ theorem:

$latex p(M|E) = \frac{p(M)\,p(E|M)}{p(E|M)\,p(M)+p(E|\overline{M})\,p(\overline{M})}$

p(M|E) is called the posterior, p(M) the prior, and p(E|M) the likelihood of the experimental values given the truth of model M; the denominator p(E|M)p(M) + p(E|non M)p(non M) is the total probability p(E) of E.
A Bayesian framework can be extremely fruitful if the prior p(M) is itself based on other experiments.

But at the very beginning of the probability calculation chain, p(M) must be assigned while we are in a situation of “complete ignorance”, to use the phrase of philosopher of science John Norton.

Now back to our problem.

An objective Bayesian would apply Bayes’ theorem and conclude that the probability of a model fi is given by:

p(fi|E) = p(fi)*p(E|fi) / [p(E|f1)*p(f1) + … + p(E|f5)*p(f5)]

Objective Bayesians apply the principle of indifference, according to which in utterly unknown situations every rational agent assigns the same probability to each possibility.

As a consequence, we get p(f1)=p(f2)=…=p(f5)=0.2

p(E|fi) is trickier to compute. It is the probability that E would be produced if fi were true. Applying the principle of indifference once more, this time to the parameter values, each of the 5^i possible parameter combinations of fi is equally likely, and by assumption exactly one of them reproduces E. Hence p(E|fi) = (1/5)^i: the likelihood is 1/5 for f1, 1/25 for f2, and so on down to 1/3125 for f5. The ratio O(i,j) = p(E|fi)/p(E|fj) = 5^(j−i) thus systematically favors the model with fewer parameters.

For this reason O(i,j) is usually referred to as an Ockham factor, because it penalizes the likelihood of complex models. If you are interested in the case of models with continuous real parameters, you can take a look at this publication. The sticking point of the whole demonstration is its heavy reliance on the principle of indifference.
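Under the discrete setup above, the whole objective-Bayesian calculation fits in a few lines (a sketch under the text’s own assumptions: indifference over models and over parameter values, and exactly one parameter combination per model reproducing E; the variable names are mine):

```python
from fractions import Fraction

# Objective-Bayesian posteriors for the five discrete models f1..f5.
# Assumptions (from the text): p(fi) = 1/5 by indifference over models,
# and exactly one of the 5**i equally likely parameter combinations of fi
# reproduces the measurement E, so p(E|fi) = (1/5)**i.

N_VALUES = 5  # each parameter a_i ranges over {1, 2, 3, 4, 5}

priors = {i: Fraction(1, 5) for i in range(1, 6)}
likelihoods = {i: Fraction(1, N_VALUES**i) for i in range(1, 6)}

p_E = sum(priors[i] * likelihoods[i] for i in priors)   # total probability of E
posteriors = {i: priors[i] * likelihoods[i] / p_E for i in priors}

# The Ockham factor O(i, j) = p(E|fi) / p(E|fj) favors the simpler model:
print(likelihoods[1] / likelihoods[5])   # 625 = 5**4 in favor of f1 over f5
print(float(posteriors[1]))              # about 0.80: the simplest model dominates
```

Exact fractions make the penalty explicit: each extra parameter divides the likelihood by 5, so the one-parameter model ends up carrying about 80% of the posterior mass despite all five models fitting the data equally well.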

## The trouble with the principle of indifference

I already argued against the principle of indifference in an older post. Here I will repeat and reformulate my criticism.

### Turning ignorance into knowledge

The principle of indifference is not only unproven but also often leads to absurd consequences. Let us suppose that I want to know the probability of certain coins landing odd. After carrying out 10,000 trials with each coin, I find that the relative frequency tends to converge towards a given value, which was 0.35, 0.43, 0.72 and 0.93 for the last four coins I investigated. Let us now suppose that I find a new coin which I will never have the opportunity to toss more than once. According to the principle of indifference, before having even started the trial, I should think something like this:

“Since I know absolutely nothing about this coin, I know (or consider it extremely plausible) that it is as likely to land odd as even.”

I think this is magical thinking in its purest form. I am not alone in that assessment.

The great philosopher of science Wesley Salmon (who was himself a Bayesian) wrote the following: “Knowledge of probabilities is concrete knowledge about occurrences; otherwise it is useless for prediction and action. According to the principle of indifference, this kind of knowledge can result immediately from our ignorance of reasons to regard one occurrence as more probable than another. This is epistemological magic. Of course, there are ways of transforming ignorance into knowledge – by further investigation and the accumulation of more information. It is the same with all ‘magic’: to get the rabbit out of the hat you first have to put him in. The principle of indifference tries to perform ‘real magic’.”

Objective Bayesians often use the following syllogism for grounding the principle of indifference.

1) If we have no reason for favoring one of the outcomes, we should assign the same probability to each of them.

2) In an utterly unknown situation, we have no reason for favoring one of the outcomes.

3) Thus all of them have the same probability.

The problem is that, in a situation of utter ignorance, we not only have no reason for favoring one of the outcomes, but also no grounds for thinking that they are equally probable.

Having no reason to favor one outcome is a necessary condition for equiprobability, but obviously not a sufficient one.

This absurdity (and other paradoxes) led philosopher of science John Norton to conclude:

“The epistemic state of complete ignorance is not a probability distribution.”

The Dempster-Shafer theory of evidence offers us an elegant way to express indifference while avoiding absurdities and self-contradictions. According to it, a conviction is not represented by a probability (a real value between 0 and 1) but by an uncertainty interval [belief(h); 1 – belief(non h)], belief(h) and belief(non h) being the degrees of trust one has in the hypothesis h and in its negation.

For an unknown coin, indifference according to this epistemology would entail belief(odd) = belief(even) = 0, leading to the probability interval [0; 1].

### Non-existing prior probabilities

Philosophically speaking, it is controversial to speak of the probability of a theory before any observation has been taken into account. The great philosopher of evolutionary biology Elliott Sober has a nice way to put it: “Newton’s universal law of gravitation, when suitably supplemented with plausible background assumptions, can be said to confer probabilities on observations. But what does it mean to say that the law has a probability in the light of those observations? More puzzling still is the idea that it has a probability before any observations are taken into account. If God chose the laws of nature by drawing slips of paper from an urn, it would make sense to say that Newton’s law has an objective prior. But no one believes this process model, and nothing similar seems remotely plausible.”

It is hard to see how prior probabilities of theories can be something more than just subjective brain states.

## Conclusion

The alleged mathematical demonstration of Ockham’s razor rests on extremely shaky ground because:

1) it relies on the principle of indifference, which is not only unproven but also leads to absurd and unreliable results;

2) it assumes that a model already has a probability before any observation.

Philosophically this is very questionable. Now, if you are aware of other justifications for Ockham’s razor, I would be very glad if you were to mention them.

# Knowledge-dependent frequentist probabilities

This is going to be a (relatively) geeky post which I tried to make understandable for lay people.

Given the important role that epistemological assumptions play in debates between theists and atheists, I deemed it necessary to first write a groundwork upon which more interesting discussions (about the existence of God, the historicity of Jesus, miracles, the paranormal…) will rest.

## Bayesianism, Degrees of belief

In other posts I explained why I am skeptical about the Bayesian interpretation of probabilities as degrees of belief. I see no need to adjust the intensity of our belief in string theory (which is a subjective feeling) in order to do good science or to avoid irrationality.

Many Bayesians complain that if we don’t consider subjective probabilities, a great number of fields such as economics, biology, geography or even history would collapse.
This is a strong pragmatic ground for being a Bayesian which I hear over and over again.

## Central limit theorem and frequencies

I don’t think this is warranted, for I believe that the incredible successes brought about by probabilistic calculations concern events which are (in principle) repeatable and therefore open to a frequentist interpretation of the related likelihoods.

According to the knowledge-dependent interpretation of frequentism I rely on, the probability of an event is the frequency it would have if the known circumstances were repeated an infinite number of times.

Let us consider an ideal die which is thrown in a perfectly random way. Obviously we can only find approximations of this situation in the real world, but a computer simulation can reasonably do the job.

In the following graphics, I plotted the results for five series of trials.

The frequentist probability of the event is defined as

$latex p(3) = \lim_{n \to \infty} \frac{n_3}{n}$,

that is, the limit of the frequency of “3” (the number n_3 of 3s obtained, divided by the number n of trials) as the number of trials goes to infinity.

This limit is a mathematical abstraction which never exists in the real world, but from the 6000th trial onward the frequency is already a very good approximation of the probability, towards which it converges by the law of large numbers.
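The die experiment just described can be reproduced in a few lines (a sketch; the seed and trial counts are arbitrary choices of mine):

```python
import random

# Simulation of the die experiment described above: the running frequency of "3"
# approaches the frequentist probability 1/6 as the number of random throws grows.

def frequency_of_three(n_trials, rng):
    """Fraction of throws landing on 3 among n_trials throws of a fair die."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 3)
    return hits / n_trials

rng = random.Random(42)  # fixed seed so the run is reproducible
for n in (100, 10_000, 1_000_000):
    print(n, frequency_of_three(n, rng))  # the gap to 1/6 ≈ 0.1667 shrinks with n
```

Running it shows exactly the behavior plotted in the (missing) graphic: wild fluctuations for small n, then a frequency pinned ever closer to 1/6.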

Actually my knowledge-dependent frequentist interpretation allows me to consider the probability of unique events which have not yet occurred.

For example, a Bayesian wrote that “the advantage of this view over the frequency interpretation is that it can deal with cases where there is no relative frequency to draw on: for example, Gigerenzer mentions the first ever heart transplant patient who was given a 70% chance of survival by the surgeon. Under the frequency interpretation that statement made no sense, because there had never actually been any similar operations by then.“

I think there are several confusions going on here.
Let us call K the total knowledge of the physician, which might include the different bodily features of the patient, the state of his organs and the hazards of the novel procedure.

The frequentist probability would then be defined as the ratio of surviving patients to the total number of patients undergoing the operation, if the known circumstances underlying K were to be repeated a very great (actually infinite) number of times. Granted, for many people this does not seem as intuitive as the previous example with the die.
And it is obvious there existed for the physician no frequency he could have used to directly approximate the probability.
Nevertheless, this frequentist interpretation is by no means absurd.

The physician could very well have used Bayes’ theorem to approximate the probability while relying only on other frequentist probabilities, such as the probability that a certain bodily reaction would be followed by death, or the probability that introducing a device into some organ could have lethal consequences.
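As an illustration, here is a toy decomposition in that spirit (every number below is entirely invented; nothing comes from the actual case, and the decomposition itself is mine, not the surgeon’s):

```python
# Purely hypothetical sketch: assembling a survival probability for a novel
# operation from frequentist probabilities of its known components.
# All numbers are invented for illustration only.

p_rejection = 0.40                 # frequency of acute rejection in prior organ studies
p_survive_given_rejection = 0.25   # frequency of survival when rejection occurs
p_survive_given_no_rejection = 0.90

# Law of total probability, built only from (in principle) frequentist quantities:
p_survival = (p_survive_given_rejection * p_rejection
              + p_survive_given_no_rejection * (1 - p_rejection))
print(round(p_survival, 2))  # 0.64
```

The point is not the numbers but the structure: a probability for a one-off event can be computed entirely from quantities that do have frequentist interpretations.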

Another example is the estimation of the probability that it is going to rain tomorrow morning when you wake up.

While the situation you are confronted with might very well be unique in the whole history of mankind, the probability is well defined by the frequency of rain if all the circumstances you know of were to be repeated an extremely high number of times.

Given this extended, knowledge-dependent variant of frequentism, the probabilities of single events are meaningful, and many fields considered Bayesian (such as economic simulations, history or evolutionary biology) could just as well be interpreted according to this version of frequentism.

It has a great advantage: it allows us to bypass completely subjective degrees of belief and to focus on an objective concept of probability.

Now, some Bayesians could object that the frequentist probability of the survival of the first heart transplant patient, or of tomorrow’s weather, might not exist: in other words, if the known circumstances were repeated an infinite number of times, the frequency would keep oscillating instead of converging to a fixed value (such as 1/6 for the die).

This is a fair objection, but such a situation would not only show that the frequentist probability does not exist but also that the Bayesian interpretation is meaningless.

It seems utterly nonsensical to my mind to say that every rational agent ought to have a degree of belief of (say) 0.45 or 0.87 if the frequency of the event (given all known circumstances) would keep fluctuating between 0.01 and 0.99.
For in this case the event is completely unpredictable, and it seems entirely misguided to associate a probability with it.

Another related problem is that in such a situation a degree of belief could be nothing more than a pure mind state with no relation to the objective world whatsoever.

As professor Jon Williamson wrote:
“Since Bayesian methods for estimating physical probabilities depend on a given prior probability function, and it is precisely the prior that is in question here, this leaves classical (frequentist) estimation methods—in particular confidence interval estimation methods—as the natural candidate for determining physical probabilities. Hence the Bayesian needs the frequentist for calibration.”

But if this frequentist probability does not exist, the Bayesian has absolutely no way to relate his degree of belief to reality, since no prior can be defined and evaluated.

Fortunately, the incredible success of the mathematical treatment of uncertain phenomena (in biology, evolution, geology, history, economics and politics, to name only a few) shows that we are justified in believing in the meaningfulness of the probability of the underlying events, even if they might be quite unique.

In this way, I believe that many examples Bayesians use to argue for the indispensability of their subjectivist probabilistic concept ultimately fail because the same cases could have been handled using the frequentist concept I have outlined here.

However, this still leaves out an important aspect: what are we to do about theories such as universal gravitation, string theory or the existence of a multiverse?
It is obvious that no frequentist interpretation of their truth can be given.
Does that mean that without Bayesianism we would have no way to evaluate the relative merits of such competing models in these situations?
Fortunately no, but this will be the topic of a future post.
At the moment I would hate to kill the suspense 🙂

# Why probabilities matter

In real life, it’s pretty rare (some would even say utterly impossible) to be sure of anything at all: that it’s going to rain in one hour, that a conservative president is going to be elected, that you will be happily married in two years, and so on and so forth.

We all recognize that it is only meaningful to speak of the probability or likelihood of each of these events.

The question of how to interpret their profound nature (their ontology) is, however, far from being an easy one.

I will use the basic proposition “if I roll the die, there is a probability of 1/6 that I will get a 3” in order to illustrate the two main interpretations of the probability concept out there.

## 1. Frequentism

According to this interpretation, the probability of an event equals its frequency if the experiment is repeated an infinite number of times. If you roll a die a great number of times, the frequency of the event (that is, the number of 3s divided by the total number of rolls) will converge towards 1/6.

Mathematically it is a well-defined concept, and in many cases it can be relatively easily approximated. One of the main difficulties is that it apparently fails to account for the likelihood of unique situations, such as the proposition that (as far as we know in 2013) the Republicans are going to win the next American elections.

This brings us to the next popular interpretation of probability.

## 2. Bayesianism

For Bayesians, probabilities are degrees of belief and each degree of belief is a probability.

My degree of belief that the die will fall on 3 is 1/6.

But what, then, is a “degree of belief”? It is a psychological mind state which is correlated with a certain readiness for action.

According to many proponents of Bayesianism, degrees of belief are objective insofar as every rational creature possessing the same set of information would have exactly the same ones.

While such a claim is largely defensible for many situations, such as the rolling of dice, the spread of a disease or the results of the next elections, there are cases where it does not seem to make any sense at all.

Take for example the young Isaac Newton considering his newly developed theory of universal gravitation. What value should his degree of belief have taken on BEFORE he had begun to consider the first data of the real world?

And what would it mean ontologically to say that we have a degree of belief of 60% that the theory is true? What is the relation (in that particular situation) between the intensity of certain brain processes and the objective reality?

Such considerations have led other Bayesians to give up objectivity and define “degrees of belief” as subjective states of mind, which might however be objectively constrained in many situations.

Another criticism of (strong) Bayesianism is that it ties the concept of probability to the beliefs of intelligent creatures. Yet it is clear that even in a universe lacking conscious beings, the probability of the decay of an atom and of more fundamental quantum processes would still exist and be meaningful.

For completeness, I should mention the propensity interpretation of Karl Popper who viewed the likelihood of an event as an intrinsic tendency of a physical system to tend towards a certain state of affairs.

So these were my completely unbiased (pun intended!) views on probabilities.

When debating (and fighting!) each other, theists and atheists tend to take their own epistemology (theory of knowledge) for granted.

This often leads to fruitless and idle discussions.

This is why I want to take the time to examine how we can know, what it means to know, before discussing what we can (and cannot) know.


Next episode: Naked Bayesianism.