# A mathematical proof of Bayesianism?

This is going to be another boring post (at least for most people who are not nerds).

However, before approaching interesting questions such as the existence of God, morality, and history, a sound epistemology (theory of knowledge) must already be in place. During most (heated) debates between theists and atheists, people tend to take for granted many epistemological principles that are very questionable.

This is why I spend a certain amount of my time exploring such questions, as a groundwork for more applied discussions.

I highly recommend that all my readers first read my two other posts on the concept of probability before reading what follows.

Bayesianism is a theory of knowledge according to which our degrees of belief in theories are well defined probabilities taking on values between 0 and 1.

According to this view, saying that string theory has a probability of 0.2 of being true is as meaningful as saying that a fair die thrown at random has a probability of 1/6 of producing a “3”.

Bayesians like asserting over and over again that it is mathematically proven that we ought to compute the likelihood of all beliefs according to the laws of probability, first and foremost Bayes’ formula:

$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$$

Here I want to debunk this popular assertion. Bayes’ theorem can be mathematically proven for frequentist probabilities, but there is no such proof that ALL our degrees of belief behave that way.

Let us consider (as an example) the American population (360 million people) and two features a person might have.

CE (Conservative Evangelical): the individual believes that the Bible contains no error.

FH (Fag Hating): the individual passionately hates gay people.

Let us suppose that 30% of Americans are CE and that 5.8% of Americans hate homosexuals.

The frequencies are f(CE) = 0.30 and f(FH) = 0.058

Let us now consider a random event: you meet an American by chance.
What is the probability that you meet a CE person, and what is the probability that you meet an FH individual?
According to a frequentist interpretation, the probability equals the frequency of meeting such persons over a very large (ideally infinite) number of encounters.
From this it naturally follows that p(CE) = f(CE) = 0.30 and p(FH) = f(FH) = 0.058
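
To make the frequentist reading concrete, here is a minimal simulation sketch. It assumes that each encounter is an independent random draw from the population; the function name `encounter_frequency` and the fixed seed are illustrative choices, not part of the argument. The observed frequency should drift toward the population share of 0.30 as the number of encounters grows.

```python
import random


def encounter_frequency(true_share: float, n_encounters: int, seed: int = 0) -> float:
    """Fraction of simulated random encounters in which the person has the trait."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    hits = sum(1 for _ in range(n_encounters) if rng.random() < true_share)
    return hits / n_encounters


F_CE = 0.30  # share of CE people in the population, taken from the post

# The observed frequency approaches F_CE as the number of encounters increases.
for n in (100, 10_000, 200_000):
    print(n, encounter_frequency(F_CE, n))
```

With only a hundred encounters the observed frequency can be noticeably off; the identification p(CE) = f(CE) refers to the limit of many encounters.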

Let us now introduce the concept of conditional probability: if you meet a Conservative Evangelical, what is the probability p(FH|CE) that he hates gay people? (The | stands for “given”.)

If you meet an FH person, what is the probability p(CE|FH) that he believes in Biblical inerrancy?

To answer these questions (thereby proving Bayes’ theorem), it is necessary to get back to our consideration of frequencies.

Let us suppose that 10% of all Conservative Evangelicals and 4% of people who are not CE hate gay people: f(FH|CE) = 0.1 and f(FH|⌐CE) = 0.04. The symbol ⌐ stands for the negation (denial) of a proposition.

The proportion of Americans who are both CE and FH is f(FH∩CE) = f(FH|CE)*f(CE) = 0.1*0.3 = 0.03.

The proportion of Americans who are NOT CE but are FH is f(FH∩⌐CE) = f(FH|⌐CE)*f(⌐CE) = 0.04*0.7 = 0.028.

Logically, the frequency of FH people in the whole American population is equal to the sum of the two proportions:

f(FH) = f(FH∩CE) + f(FH∩⌐CE) = 0.03 + 0.028 = 0.058
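
The arithmetic of this decomposition can be checked with a short sketch. It uses exact fractions to avoid rounding noise; the variable names are illustrative, and the input values are the assumed ones from this post.

```python
from fractions import Fraction

# Starting values assumed in the post.
f_ce = Fraction(30, 100)              # f(CE)
f_fh_given_ce = Fraction(10, 100)     # f(FH|CE)
f_fh_given_not_ce = Fraction(4, 100)  # f(FH|not CE)

f_not_ce = 1 - f_ce                              # f(not CE) = 0.7
f_fh_and_ce = f_fh_given_ce * f_ce               # f(FH and CE)
f_fh_and_not_ce = f_fh_given_not_ce * f_not_ce   # f(FH and not CE)

# Total frequency of FH: sum over the two disjoint groups.
f_fh = f_fh_and_ce + f_fh_and_not_ce

print(float(f_fh_and_ce), float(f_fh_and_not_ce), float(f_fh))
```

The three printed values correspond to the joint proportions and the total frequency computed by hand above.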

But what if we want to know the probability p(CE|FH) that a person is a Conservative Evangelical IF that person is FH?

This corresponds to the frequency (proportion) of Conservative Evangelicals among FH people: f(CE|FH).

We know that f(FH∩CE) = f(CE∩FH) = f(CE|FH)*f(FH)

Thus f(CE|FH) = f(FH∩CE) / f(FH)

Given a frequentist interpretation of probability, this entails that

$$p(CE \mid FH) = \frac{p(FH \cap CE)}{p(FH)} = \frac{p(FH \mid CE)\,p(CE)}{p(FH)}$$

which is of course Bayes’ theorem. We have proven it mathematically in this particular case, but the rigorous demonstration would be much the same for any events expressible as frequencies.

If you meet an American who hates gays, the probability that he is a Conservative Evangelical is 0.03/0.058 ≈ 51.72% (given the validity of my starting values above).
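
As a cross-check of this figure, the sketch below computes the same quantity in two ways: through Bayes’ theorem, and by directly counting heads in a hypothetical population of 1000 people that has the assumed proportions. The function name `posterior` and the population size are illustrative assumptions.

```python
from fractions import Fraction


def posterior(prior, likelihood, likelihood_if_not):
    """p(H|E) via Bayes' theorem, with p(E) expanded over H and not-H."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Route 1: Bayes' theorem with the starting values of the post.
p_ce_given_fh = posterior(Fraction(30, 100), Fraction(10, 100), Fraction(4, 100))

# Route 2: direct counting in a hypothetical population of 1000 people.
ce_people = 300                       # 30% of 1000
fh_among_ce = 30                      # 10% of the 300 CE people
fh_among_not_ce = 28                  # 4% of the 700 non-CE people
counted = Fraction(fh_among_ce, fh_among_ce + fh_among_not_ce)

print(p_ce_given_fh, counted, f"{float(p_ce_given_fh):.2%}")
```

Both routes give the fraction 15/29, which is the frequentist point: here Bayes’ theorem is nothing more than bookkeeping about proportions.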

But let us now consider the Bayesian interpretation of probability (our degree of confidence in a theory) in a context having nothing to do with frequencies.

Let S be “String theory is true” and UEP be “an Undead Elementary Particle has been detected during an experiment at the LHC”.

In that context, the probabilities correspond to our confidence in the truth of theories and hypotheses.

We have no compelling grounds for thinking that

$$p(S \mid UEP) = \frac{p(UEP \mid S)\,p(S)}{p(UEP)}$$

that is to say, that this is the way our brains actually work, or the way they ought to work in order to strive for truth.

The mathematical demonstration used to prove Bayes’ theorem relies on related frequencies and cannot be employed in a context where propositions (such as S and UEP) cannot be understood as frequencies.
Considering ALL our degrees of belief as probabilities is a philosophical decision and not an inevitable result of mathematics.

I hope that I have not been too boring for lay people.

Now I have some homework for you: what is the probability p(HP|BB) that Homeschooling Parents would like to use my post as an introduction to the interpretation of probability, given that they live in the Bible Belt?

## 20 thoughts on “A mathematical proof of Bayesianism?”

1. Lotharson,
I am not sure that I entirely understand the mathematics concerning non-frequencies but I found the example interesting. Although I thought the percentage would be higher.

• Okay, thanks for your opinion!

According to your own experience, what percentage would you use?

• It’s only a guess but I would think it was more like 75%.

2. I had a bit of trouble understanding your differentiation between frequentist and Bayesian probability philosophy, so I found “Frequentist vs. Bayesian statistics: resources to help you choose”, where the author conveniently differentiated between the two. (P is probability, D is data, H is hypothesis)

P(D | H) — frequentist
P(H | D) — Bayesian

A lot of the choice between frequentist and Bayesian statistics comes down to whether you think science should comprise statements about the world, or statements about our beliefs.

I’m not sure about what the author says about statements about the world vs. beliefs; he hasn’t convinced me and neither have you. :-p I just don’t know enough about the two different views to settle on one over the other.

Now, you say:

The mathematical demonstration used to prove Bayes theorem relies on related frequencies and cannot be employed in a context where propositions (such as S and UEP) cannot be understood as frequencies.

First, you mean ‘related probabilities‘, else you’re mixing frequentism and Bayesianism. That’s just a quibble. Second, why do you say that P(S) is an invalid concept? We can certainly become more and less confident in a theory. Scientific observations are not sacrosanct; see the faster-than-light neutrino anomaly. But here I am talking about statements about belief, and not about reality. Then again, there is an inherent ‘fuzz’ in reality, whereby all models of it are only accurate to some level. Some may dispute this, but no convincing argument has been advanced.

• Hello Luke!

There are two issues which are raised here.

1) Let us suppose for the sake of the argument that the intensity of our beliefs in our brain can be mapped as a continuous function onto the interval [0;1].
What does it mean to say that:
– String theory has a probability of 0.3 to be true
– String theory has a probability of 0.6 to be true
– String theory has a probability of 0.8 to be true ?
What is the relation with the real world? I don’t see how one can avoid subjectivity.

2) Why should the probability that String Theory is true OBEY Bayes’ theorem? It is certainly not possible to demonstrate this mathematically. As I showed in the post, the demonstration only holds for probabilities understood as frequencies over an infinite number of events.
It goes without saying that you cannot use this line of reasoning for ST.

Cheers.

• 1) I cannot help but think that our confidence in propositions exists on (0, 1)—I prefer to exclude the endpoints—and that Bayesian inference well-models the updating of our confidences.

I’m reticent to talk about whether or not string theory is true, because what I really believe is that either it will or will not be a better model of reality than the ones we have so far. In other words, I think string theory is actually false, but it may be less false than what came before it. 🙂 I should adopt string theory for a given purpose if it is better at what I want than the other available options. Sometimes, good old F = ma will be sufficient. Note that I believe reality is infinite in description, and therefore we will only ever be able to approximate it better and better.

2) Bayes’ theorem merely talks about connected observations and beliefs. A single, distinct observation/proposition/belief cannot obey Bayes’ theorem because we haven’t claimed that it is connected to anything.

There is a way to tweak what you’ve said, which slightly draws on the multiverse theory, but only really as inspiration: what is the probability that I live in a world where string theory is a good model? It seems to me that the only way to answer such a question is to make observations and see whether they line up with string theory or do not line up with it.

3. Hmmm. Although I agree with your general pessimism on using Bayes in philosophy, I’m not convinced by your argument here. For a couple of reasons.

1. There are frequentist interpretations of confidence. So someone can argue that they are using confidence probabilities in that way. Carrier explicitly does this in Proving History, though he thinks he is inventing a new way to ‘solve’ the Bayesian / Frequentist problem, which of course he is not. To the extent that such an interpretation is valid, your argument disintegrates. Of course, even if the interpretation works, it has troubling implications on the ability to actually calculate probabilities for confidence. But then we’re back to the effect of error on calculations, rather than the applicability of Bayes’s theorem.

2. If we reject #1, we’re still left with the problem that Bayes’s theorem, acting on belief probabilities, can be checked empirically. And it appears to work. At least to within the kinds of tolerance that a debater would want to claim to use it in their arguments. I’ve relied on Bayesian interpretations of probability for my paycheck. If Bayes’s Theorem breaks down in such cases, then the errors it introduces are so small compared to the noise in any reasonable data set, that it is hardly worth mentioning as an objection in domains where the noise is so dominant.

• Dear Ian, thank you very much for your critical comment.

In this post, I only wanted to show that one cannot prove mathematically that ALL our degrees of belief (or confidence in a theory) obey Bayes’ theorem, because such a demonstration requires them to be related to frequencies.

You reply that there are frequentist interpretations of confidence. Fair enough. But what is the frequency with which string theory is the correct description of the multiverse?
Or what is the related frequency?

Otherwise I agree that applying Bayes theorem leads to success in many fields. But I have not seen a good justification of its use in fields where frequencies are not meaningful, such as String Theory or the existence of God for that matter.

Cheers.

• I agree. When you have terms that are so vaguely defined and ambiguous, there is simply no way to apply quantitative reasoning to it.

• Actually, I don’t think that the problem lies in the poor definition but in the non-frequentist nature of the beliefs.
The great philosopher of science Elliott Sober gave the example of the theory of gravitation:

“Newton’s universal law of gravitation, when suitably supplemented with plausible background assumptions, can be said to confer probabilities on observations. But what does it mean to say that the law has a probability in the light of those observations? More puzzling still is the idea that it has a probability before any observations are taken into account. If God chose the laws of nature by drawing slips of paper from an urn, it would make sense to say that Newton’s law has an objective prior. But no one believes this process model, and nothing similar seems remotely plausible.”

I think that his point is well taken, and I don’t see how you can define this probability as being something more than a subjective brain state.
And there would be no mathematical demonstration that, in that particular situation, Bayes’ theorem can be employed for updating our confidence.

Otherwise, I happen to think one can use an extended concept of frequentist probability for considering historical events.
This would have the advantage of getting rid of subjectivity altogether and computing objective probabilities.
In the coming weeks I will lay the groundwork for such an approach.

To be honest, I am not very excited at the prospect of buying Carrier’s book. While he certainly has some good ideas, I don’t like his arrogant and condescending tone towards those disagreeing with his pet theories.

Cheers from Lancaster.

• Ian says:

Good luck then, because I’m starting out disagreeing. But I look forward to being convinced! So far you’re pretty convincing, I’m just not buying yet… 🙂

4. Bayes’s theorem, or the definition of conditional probability, doesn’t rely on a frequentist interpretation of probability.

Bayes’s theorem follows from the formula for conditional probability (P(A|B) = P(AB)/P(B)) and a couple of obvious axioms of probability (the probability of the entire space is 1 and probabilities of disjoint events sum). So what you’re really saying is that P(A|B) = P(AB)/P(B) is only true in a frequentist interpretation. But that’s not true, either.

First one needs to define the conditional probability measure P(・|B). This is a probability measure on the subspace B (of the original, larger space); that is, it satisfies all the axioms of probability, with P(B|B) = 1. Moreover, ratios of probabilities for events contained within B shouldn’t depend on whether one uses the original probability measure or the conditional one. This has nothing to do with frequentism but is required for our notion of probability to make sense. To give an example, let’s say that the probability that someone is a left-handed man is 2 times the probability that he is a man over 6′ tall. This ratio shouldn’t change if we condition on the fact that he is a man. Otherwise we could change the probability ratio just by arbitrarily conditioning on any event which contains the 2 events in question, say, the fact that the person is human, or a mammal.

The fact that this ratio is invariant under conditioning means that P(A|B)/P(C|B) = P(A)/P(C) for any 2 events A, C contained in B. Choosing the event B as our event C (which is permissible since B is contained in itself) gives: P(A|B)/P(B|B) = P(A)/P(B), or P(A|B) = P(AB)/P(B) for events A contained in B. For an arbitrary event that is not wholly contained in B, just split it up into the parts inside and outside B and use additivity of probability and the fact that P(A|B) = 0 if A is disjoint from B to get the same formula.

So Bayes’s formula just follows from the axioms of probability and a basic consistency requirement for the meaning of conditional probability. It’s hard to see what a probability theory would look like if it didn’t fulfill these conditions.
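
For readers who prefer to see the steps laid out, the argument of this comment can be sketched as follows. This is only a restatement under the comment’s own assumptions (finite additivity, P(B|B) = 1, P(A|B) = 0 for A disjoint from B, and invariance of ratios under conditioning); the complement notation B^c and the final rearrangement into Bayes’ formula are added here for illustration.

$$
\begin{aligned}
&\text{Ratio invariance: } \frac{P(A \mid B)}{P(C \mid B)} = \frac{P(A)}{P(C)} \quad \text{for } A, C \subseteq B,\ P(C) > 0. \\
&\text{Take } C = B \text{ and use } P(B \mid B) = 1: \quad P(A \mid B) = \frac{P(A)}{P(B)} = \frac{P(A \cap B)}{P(B)} \quad (A \subseteq B). \\
&\text{For arbitrary } A: \quad P(A \mid B) = P(A \cap B \mid B) + P(A \cap B^{c} \mid B) = \frac{P(A \cap B)}{P(B)} + 0. \\
&\text{By symmetry: } P(B \mid A)\,P(A) = P(A \cap B) = P(A \mid B)\,P(B) \;\Longrightarrow\; P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.
\end{aligned}
$$

Nothing in these steps refers to frequencies; what they do presuppose is that the quantity P already satisfies the axioms, which is exactly the point under dispute in this thread.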

• Hello Malcom, thanks for your challenging comment.

My point was that we have no compelling reason to think that our degrees of beliefs have to obey the laws of conditional probabilities for situations having no connection with frequencies whatsoever.
Bayesians assume that in every situation our degrees of belief satisfy the axioms of probability. But what is a degree of belief in a situation involving no frequency at all?

Let us assume, for the sake of the argument, that one can map any degree of belief in a theory T as the intensity of an ensemble of brain processes which takes on continuous values between 0 and 1: I(T).

Let us consider the propositions S: String theory is true and M: we live in a multiverse.
Let us suppose that my degree of belief we live in a multiverse is I(M) = 0.30 and my degree of belief we live in a multiverse where String theory is true is I(S and M) = 0.05.

Why should I(S given M) = I(S and M) / I(M) = 0.05/0.30 ≈ 0.17?

Answering “because the intensity of ALL our degrees of belief in our brain must behave like mathematical probabilities” would seem to beg the question.

Cheers.

• You seem to be mixing up a few things here.

1. The laws of probability are not synonymous with frequentism. It’s true that Bayes’s theorem follows from the axioms of probability and the definition of conditional probability, but these things do not depend on frequentism but hold for other interpretations of probability as well. You can find a list of the 3 axioms of (Kolmogorov’s) probability theory online here: http://en.wikipedia.org/wiki/Probability_axioms. They are pretty basic, especially if one considers only finite additivity (which is all that is required for Bayes’s theorem). Conditional probability is usually added to the theory via a definition or additional axiom, which is just P(A|B) = P(AB)/P(B), but as I pointed out it can be derived from an even more intuitive axiom, namely, that odds ratios should be independent of extraneous events.

2. When one talks about “subjective probability” philosophically or mathematically, one is still assuming that these probabilities obey some rules. E.g., it wouldn’t make sense to talk about a negative probability or probabilities greater than 1, at least in any usual sense of the word “probability.” Of course, you are free to imagine any set of probabilities you like, regardless of whether they are consistent in any way. For example, you may think that there is a 10% chance that tomorrow’s high temperature is at least 20 degrees and a 20% chance it is at least 30 degrees, but one could hardly expect that other people would put much stock in your opinion in that case.

So when Carrier or anyone else applies Bayes’s theorem to a question involving “subjective” probabilities, the implicit assumption is being made that what is being called “probability” satisfies the axioms of probability theory. This doesn’t seem, to me at least, to be that restrictive a qualification.

• Hello Malcom!

I believe one can consider objective, frequency-based probabilities of historical events by considering a theoretical infinite population, which might really exist if we live in an infinite multiverse.
So I would advise historians to give up computing the intensity of their subjective beliefs and instead to use Bayes’ theorem as a means of calculating frequentist probabilities that exist objectively and independently of us.
Scientifically, it is obvious that the latter approach is superior, since it directly touches the real world. In future posts, I will explain how this can play out.

I am still not convinced that the intensity of my beliefs about universal gravitation or string theory ought to obey all axioms of probability, including the invariance of the ratios you mentioned.
Taking this step is a philosophical decision and is by no means compelled by mathematics itself.
Ontologically speaking, what does it mean to say that string theory has 20% or 60% odds of being true? A frequentist interpretation of these values seems impossible.

To my mind, it is perfectly possible to assume that the intensities of our convictions in situations NOT involving frequencies (such as the truth of quantum gravitation) have no objective meaning.

And there is a related problem concerning the determination and even existence of PRIORS for such situations.
As the great philosopher Elliott Sober wrote:

“Newton’s universal law of gravitation, when suitably supplemented with plausible background assumptions, can be said to confer probabilities on observations. But what does it mean to say that the law has a probability in the light of those observations? More puzzling still is the idea that it has a probability before any observations are taken into account. If God chose the laws of nature by drawing slips of paper from an urn, it would make sense to say that Newton’s law has an objective prior. But no one believes this process model, and nothing similar seems remotely plausible.”

So I think I am NOT compelled by rationality to accept that the intensity of my beliefs about this should behave like a probability, since they are not objectively definable.

But as I said, I think one can (theoretically) use Bayes’s theorem to calculate the objective frequentist probability of historical events given all that is known.
And such a probability would exist objectively and would not be merely a state of our brain.

Cheers from the UK.

• lotharson,

Here’s the first hit that you get when you google “interpretation of probability”: http://plato.stanford.edu/entries/probability-interpret/

Read section 3.3 on subjective probability, in particular. You can argue for a theory of complete subjectivity with no rationality bounds, but as the author states, “unconstrained subjectivism is not a serious proposal.” Rejecting the probability calculus also leaves you vulnerable to a Dutch book.
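
A minimal sketch of what such a Dutch book could look like, using the incoherent temperature beliefs from earlier in this thread (10% for “at least 20 degrees”, 20% for “at least 30 degrees”). It assumes the agent is willing to buy or sell, at a price equal to the stated probability, a ticket paying 1 unit if the event occurs; the names and the betting setup are illustrative.

```python
# Prices the agent regards as fair for a ticket paying 1 unit if the event occurs.
PRICE_AT_LEAST_20 = 0.10
PRICE_AT_LEAST_30 = 0.20


def agent_net_payoff(high_temperature: float) -> float:
    """Agent buys the 'at least 30' ticket and sells the 'at least 20' ticket."""
    upfront = PRICE_AT_LEAST_20 - PRICE_AT_LEAST_30  # receives 0.10, pays 0.20
    wins = 1.0 if high_temperature >= 30 else 0.0     # payout on the ticket bought
    owes = 1.0 if high_temperature >= 20 else 0.0     # payout owed on the ticket sold
    return upfront + wins - owes


# One representative temperature for each of the three possible cases.
for t in (15, 25, 35):
    print(t, round(agent_net_payoff(t), 2))
```

Whatever the temperature turns out to be, the agent ends up with a loss, because “at least 30” can never occur without “at least 20” also occurring. Note that this shows a cost of incoherence for someone who bets on their beliefs; whether every degree of belief has to be treated as a betting price is a separate question.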

Use of Bayes’s theorem, and all the other attempts, to discuss the likelihood of past events being true assume that such probabilities obey Kolmogorov’s axioms, at least up to finite additivity. Personally I don’t find this to be a big constraint, but no one is forcing you to be rational or consistent.

(BTW, no offense intended, but I would think that someone from the UK should be able to spell my name.)

5. Hi lotharson, interesting post but my non-mathematically-inclined brain isn’t quite up to the equations! I’m not sure I’ve fully understood Bayesianism yet (actually, I’m fairly sure I haven’t).

So for a layman like me, was the old London bus campaign ‘There’s probably no God’ using a Bayesian view of probability?

As a non-mathematician, I find it hard to see how measures of probability can be applied to religious questions like whether there’s a God, or which religion best represents God, etc.

I seem to remember a passage in Dawkins’ “God Delusion” which tried to put a numerical probability on the likelihood of God existing. That always struck me as one of his less convincing lines of reasoning.

Based on certain starting assumptions I suppose you *might* be able to argue that particular religious views are more or less likely to be true. I’d certainly like to think that it’s possible to argue that Scientology is 99.999% likely to be complete b*ll*cks, for example! But in practice I’m not sure how I could justify such a statement.

Anyway, interested to hear your thoughts.

6. Hi Lothar,

This is interesting, but I would have to read a book on Bayesianism before I could peruse the argument, and right now there are a lot of books higher on my priority list.