A mathematical proof of Ockham’s razor?


Ockham’s razor is a principle often used to dismiss out of hand alleged phenomena deemed to be too complex. In the philosophy of religion, it is often invoked to argue that God’s existence is extremely unlikely to begin with owing to his alleged incredible complexity.

A geeky brain is desperately needed before entering this sinister realm.

In an earlier post I dealt with some of the most popular justifications for the razor and made the following distinction:

Methodological Razor: if theory A and theory B do the same job of describing all known facts C, it is preferable to use the simplest theory for the next investigations.

Epistemological Razor: if theory A and theory B do the same job of describing all known facts C, the simplest theory is ALWAYS more likely.

Like the last time, I won’t address the validity of the Methodological Razor (MR), which might be a useful tool in many situations.

My attention will be focused on the epistemological blade and its alleged mathematical grounding.

Example: prior probabilities of models having discrete variables

Presentation of the problem

We consider five functions that predict an output Y (e.g. the velocity of a particle in an agitated test tube) which depends on an input X (e.g. the rotation speed).

Those five functions themselves depend on a given number of unknown parameters $latex a_i $.

$latex f_1(a_1)[X]$
$latex f_2(a_1,a_2)[X]$
$latex f_3(a_1,a_2,a_3)[X]$
$latex f_4(a_1,a_2,a_3,a_4)[X]$
$latex f_5(a_1,a_2,a_3,a_4,a_5)[X]$

To make the discussion somewhat more accessible to lay people, we shall suppose that the $latex a_i$ can only take on five discrete values: {1,2,3,4,5}.
Let us suppose that an experiment was performed.
For x = 200 rpm (revolutions per minute), the measured velocity of the particle was y = 0.123 m/s.

Suppose now that, for each function fi, there is exactly one set of parameter values that allows it to predict the measurement E (the pair x, y above).
For example:
f1(2)[200 rpm] = f2(1,3)[200 rpm] = f3(5,2,1)[200 rpm] = f4(2,1,4,5)[200 rpm] = f5(3,5,1,3,2)[200 rpm] = 0.123 m/s.

Now we want to evaluate the strength of the different models.
How are we to proceed?

Many scientists (including myself) would say that the five functions fit the data perfectly and that we would need further experiments to discriminate between them.

The objective Bayesian approach

Objective Bayesians would have a radically different approach.
They believe that every proposition (“The grass is greener in England than in Switzerland”, “Within twenty years, healthcare in Britain will no longer be free”, “The general theory of relativity is true”…) is associated with a unique, precise degree of belief that every rational agent knowing the same facts should have.

They further assert that degrees of belief ought to obey the laws of probability using diverse “proofs” such as the Dutch Book Argument (but see my critical analysis of it here).

Consequently, if at time t0 we believe that model M has a probability p(M) of being true, and if at time t1 we get a new measurement E, the probability of M should be updated according to Bayes’ theorem:

$latex p(M|E) = \frac{p(M)\,p(E|M)}{p(M)\,p(E|M)+p(\overline{M})\,p(E|\overline{M})}$.

p(M|E) is called the posterior, p(M) is the prior, p(E|M) is the likelihood of the experimental values given the truth of model M, and p(M)*p(E|M)+p(non M)*p(E|non M) is the total probability of E.
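To make the update rule concrete, here is a minimal Python sketch. The function name `bayes_update` and the three input numbers are purely illustrative assumptions of mine, not measurements. The denominator is the total probability of E, written out as p(M)*p(E|M) + p(non M)*p(E|non M).

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return p(M|E) from p(M), p(E|M) and p(E|non M)."""
    # Total probability of E: p(M)*p(E|M) + p(non M)*p(E|non M)
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    if evidence == 0.0:
        raise ValueError("E has probability zero under these inputs")
    return prior * likelihood_if_true / evidence


# Hypothetical numbers, chosen only for illustration.
posterior = bayes_update(prior=0.2, likelihood_if_true=0.9, likelihood_if_false=0.3)
print(round(posterior, 4))
```

With these made-up inputs the posterior is 0.18/0.42 = 3/7, so a measurement that is three times more probable under M than under its negation lifts M from 0.2 to roughly 0.43.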
A Bayesian framework can be extremely fruitful if the prior p(M) is itself based on other experiments.

But at the very beginning of the probability calculation chain, where p(M) cannot rest on any earlier experiment, we are in a situation of “complete ignorance”, to use the phrase of philosopher of science John Norton.

Now back to our problem.

An objective Bayesian would apply Bayes’ theorem and conclude that the probability of a model fi is given by:

p(fi|E) = p(fi)*p(E|fi)/(p(fi)*p(E|fi)+p(non fi)*p(E|non fi))

Objective Bayesians apply the principle of indifference, according to which in utterly unknown situations every rational agent assigns the same probability to each possibility.

As a consequence, we get p(f1)=p(f2)=…=p(f5)=0.2

p(E|fi) is trickier to compute. It is the probability that E would be produced if fi is true.
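The usual way this computation goes can be sketched in a few lines of Python. The sketch rests on assumptions that should be stated loudly: the principle of indifference is applied a second time, to the parameter values, so that each of the 5^i parameter combinations of model fi is taken to be equally probable; exactly one combination reproduces E, which gives p(E|fi) = 1/5^i; and O(i,j) is taken to be the ratio p(fi|E)/p(fj|E). The names `likelihood` and `ockham_factor` are mine.

```python
from fractions import Fraction

N_VALUES = 5  # each a_i takes a value in {1, 2, 3, 4, 5}
N_MODELS = 5  # models f1 ... f5; model fi has i parameters


def likelihood(n_params):
    """p(E|fi), ASSUMING indifference over the 5**i parameter combinations,
    exactly one of which reproduces the measurement E."""
    return Fraction(1, N_VALUES ** n_params)


# Indifference over the models themselves: p(f1) = ... = p(f5) = 1/5
prior = Fraction(1, N_MODELS)

# Total probability of E, summed over the five competing models
evidence = sum(prior * likelihood(k) for k in range(1, N_MODELS + 1))

# Posterior p(fi|E) for every model
posterior = {k: prior * likelihood(k) / evidence for k in range(1, N_MODELS + 1)}


def ockham_factor(i, j):
    """Assumed definition: O(i,j) = p(fi|E) / p(fj|E)."""
    return posterior[i] / posterior[j]


for k, p in posterior.items():
    print(f"p(f{k}|E) = {p}")
print("O(1,2) =", ockham_factor(1, 2))
```

Under these assumptions f1 ends up with a posterior of 625/781 and every additional parameter divides the posterior by 5, even though all five models fit the single measurement equally well.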


For this reason O(i,j) is usually referred to as an Ockham’s factor, because it penalizes the likelihood of complex models. If you are interested in the case of models with continuous real parameters, you can take a look at this publication. The sticking point of the whole demonstration is its heavy reliance on the principle of indifference.

The trouble with the principle of indifference

I already argued against the principle of indifference in an older post. Here I will repeat and reformulate my criticism.

Turning ignorance into knowledge

The principle of indifference is not only unproven but also often leads to absurd consequences. Let us suppose that I want to know the probability that certain coins land odd. After having carried out 10000 trials with each of them, I find that the relative frequency tends to converge towards a given value, which was 0.35, 0.43, 0.72 and 0.93 for the last four coins I investigated. Let us now suppose that I find a new coin I’ll never have the opportunity to test more than once. According to the principle of indifference, before having ever started the trial, I should think something like this:

Since I know absolutely nothing about this coin, I know (or consider here extremely plausible) it is as likely to land odd as even.

I think this is magical thinking in its purest form. I am not alone in that assessment.

The great philosopher of science Wesley Salmon (who was himself a Bayesian) wrote the following: “Knowledge of probabilities is concrete knowledge about occurrences; otherwise it is useless for prediction and action. According to the principle of indifference, this kind of knowledge can result immediately from our ignorance of reasons to regard one occurrence as more probable than another. This is epistemological magic. Of course, there are ways of transforming ignorance into knowledge – by further investigation and the accumulation of more information. It is the same with all “magic”: to get the rabbit out of the hat you first have to put him in. The principle of indifference tries to perform “real magic”.”

Objective Bayesians often use the following syllogism for grounding the principle of indifference.

1) If we have no reason for favoring one of the outcomes, we should assign the same probability to each of them

2) In an utterly unknown situation, we have no reason for favoring one of the outcomes

3) Thus all of them have the same probability.

The problem is that (in a situation of utter ignorance) we have not only no reason for favoring one of the outcomes, but also no grounds for thinking that they are equally probable.

The necessary condition in proposition 1) is obviously not sufficient.

This absurdity (and other paradoxes) led philosopher of mathematics John Norton to conclude:

“The epistemic state of complete ignorance is not a probability distribution.”

The Dempster-Shafer theory of evidence offers us an elegant way to express indifference while avoiding absurdities and self-contradictions. According to it, a conviction is not represented by a probability (a real value between 0 and 1) but by an uncertainty interval [ belief(h) ; 1 – belief(non h) ], belief(h) and belief(non h) being the degrees of trust one has in the hypothesis h and its negation.

For an unknown coin, indifference according to this epistemology would entail belief(odd) = belief(even) = 0, leading to the probability interval [0 ; 1].
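A small Python sketch may help to see how such an interval is obtained. It assumes the standard Dempster-Shafer bookkeeping, in which “mass” is spread over subsets of the possible outcomes and belief(h) adds up the mass of every subset contained in h. The names `FRAME`, `belief` and `uncertainty_interval`, as well as the mass assignment for the tested coin, are illustrative choices of mine.

```python
from fractions import Fraction

# The two possible outcomes of a toss, in the vocabulary used above.
FRAME = frozenset({"odd", "even"})


def belief(masses, hypothesis):
    """Sum the mass of every non-empty subset contained in the hypothesis."""
    return sum(m for subset, m in masses.items() if subset and subset <= hypothesis)


def uncertainty_interval(masses, hypothesis):
    """Return (belief(h), 1 - belief(non h))."""
    negation = FRAME - hypothesis
    return belief(masses, hypothesis), 1 - belief(masses, negation)


odd = frozenset({"odd"})

# Complete ignorance: all the mass sits on "odd or even", none on either outcome.
unknown_coin = {FRAME: Fraction(1)}

# Hypothetical well-tested coin: all the mass is committed to single outcomes.
tested_coin = {
    frozenset({"odd"}): Fraction(35, 100),
    frozenset({"even"}): Fraction(65, 100),
}

print(uncertainty_interval(unknown_coin, odd))
print(uncertainty_interval(tested_coin, odd))
```

The unknown coin yields the interval from 0 to 1, whereas the tested coin collapses to the single value 7/20, so ignorance and a known even split are no longer represented by the same number.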

Non-existing prior probabilities

Philosophically speaking, it is controversial to speak of the probability of a theory before any observation has been taken into account. The great philosopher of evolutionary biology Elliott Sober has a nice way to put it: “Newton’s universal law of gravitation, when suitably supplemented with plausible background assumptions, can be said to confer probabilities on observations. But what does it mean to say that the law has a probability in the light of those observations? More puzzling still is the idea that it has a probability before any observations are taken into account. If God chose the laws of nature by drawing slips of paper from an urn, it would make sense to say that Newton’s law has an objective prior. But no one believes this process model, and nothing similar seems remotely plausible.”

It is hard to see how prior probabilities of theories can be something more than just subjective brain states.

Conclusion

The alleged mathematical demonstration of Ockham’s razor rests on extremely shaky ground because:

1) it relies on the principle of indifference, which is not only unproven but leads to absurd and unreliable results as well

2) it assumes that a model already has a probability before any observation.

Philosophically this is very questionable. Now if you are aware of other justifications for Ockham’s razor, I would be very glad if you were to mention them.

John Loftus, probabilities and the Outsider Test of Faith

John Loftus is a former fundamentalist who has become an outspoken opponent of Christianity, which he desires to debunk.

He has created what he calls the “Outsider Test of Faith” which he described as follows:

“This whole inside/outside perspective is quite a dilemma and prompts me to propose and argue on behalf of the OTF, the result of which makes the presumption of skepticism the preferred stance when approaching any religious faith, especially one’s own. The outsider test is simply a challenge to test one’s own religious faith with the presumption of skepticism, as an outsider. It calls upon believers to “Test or examine your religious beliefs as if you were outsiders with the same presumption of skepticism you use to test or examine other religious beliefs.” Its presumption is that when examining any set of religious beliefs skepticism is warranted, since the odds are good that the particular set of religious beliefs you have adopted is wrong.”

But why are the odds very low (instead of unknown) to begin with? His reasoning seems to be as follows:

1) Before we start our investigation, we should consider each religion to possess the same likelihood.

2) Thus if there are (say) N = 70000 religions, the prior probability of any given religion being true is p(R)/70000, p(R) being the total probability of a religious worldview being true.

(I could not find a passage where Loftus says this explicitly, but it seems to be what he means. However, I could find one of the supporters of the OTF taking that line of reasoning.)
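The arithmetic behind step 2) is short enough to spell out in Python. This is only a sketch of the reasoning as I reconstructed it: the function name `indifference_prior` is mine, N = 70000 is the “say” figure used above, and the value p(R) = 1/2 is made up purely to get a number out.

```python
from fractions import Fraction


def indifference_prior(n_religions, p_religious):
    """Prior of a single religion when p(R) is split evenly over N religions."""
    return Fraction(p_religious) / n_religions


# Hypothetical inputs: N = 70000 from the text, p(R) = 1/2 invented for illustration.
prior = indifference_prior(70000, Fraction(1, 2))
print(prior)         # exact fraction
print(float(prior))  # same value as a decimal
```

Whatever value is plugged in for p(R), the result is a tiny prior, and it is tiny only because the even split over N possibilities was assumed from the outset.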

 

Objective Bayesianism and the principle of indifference

 

This is actually a straightforward application of the principle of indifference followed by objective Bayesians:

In completely unknown situations, every rational agent should assign the same probability to all outcomes or theories he is aware of.

While this principle can seem pretty intuitive to many people, it is highly problematic.

In the prestigious Stanford Encyclopedia of Philosophy, one can read in the article about Bayesian epistemology:

“it is generally agreed by both objectivists and subjectivists that ignorance alone cannot be the basis for assigning prior probabilities.”

To illustrate the problem, I concocted the following story.

Once upon a time, king Lothar of Lorraine had 1000 treasures he wanted to share with his people. He had 50000 red balls and 50000 white balls at his disposal.

Frederic the Knight (the hero of my trilingual Christmas tale) has to choose one of those in the hope of getting one of the “goldenen Wundern” (“golden wonders”).

On Monday, Lothar distributes his treasures in a perfectly random fashion.
Frederic knows that the probability of finding the treasure in a red or in a white ball is the same: p(r) = p(w) = 0.5

On Tuesday, the great king puts 10% of the treasure within red balls and 90% within white ones.

Frederic knows that the probabilities are p(r) = 0.10 and p(w) = 0.90

On Wednesday, the sovereign lord of Lorraine puts 67% of the treasures in red balls and 33% in white ones.

Frederic knows that the probabilities are p(r) = 0.67 and p(w) = 0.33

On Thursday, Frederic does not know what the wise king did with his treasure. He could have distributed them in the same way he did during one of the previous days but also have chosen a completely different method.

Therefore Frederic does not know the probabilities: p(r) = ? and p(w) = ?

According to the principle of indifference, Fred would be irrational because he ought to believe that p(r) = 0.5 and p(w) = 0.5 on the grounds that it is an unknown situation.

This is an extremely strong claim and I could not find in the literature any hint as to why Frederic would be irrational in accepting his ignorance of the probabilities.

Actually, I believe that quite the contrary is the case.

If the principle of indifference were true, Fred should reason like this:

“I know that on Monday my Lord mixed the treasures randomly so that p(r) = p(w) = 0.5
I know that on Tuesday He distributed 10% in the red ones and 90% in the white ones so that p(r) = 0.10 and p(w) = 0.90
I know that on Wednesday He distributed 67% in the red ones and 33% in the white ones so that p(r) = 0.67 and p(w) = 0.33
AND
I know absolutely nothing about what He did on Thursday, therefore I know that the probabilities are p(r) = p(w) = 0.5 exactly like on Monday.”

Now I think that this seems intuitively silly and even absurd to many people. There seems to be just no way to transform utter ignorance into specific knowledge.

Degrees of belief of a rational agent

More moderate Bayesians will probably agree with me that it is misguided to speak of a knowledge of probabilities in the fourth case. Nevertheless they might insist he should have the same confidence that the treasure is in a white ball as in a red one.

I’m afraid this does nothing to solve the problem. On Monday Fred has a perfect warrant for feeling the same confidence.
How can he have the same confidence on Thursday if he knows absolutely nothing about the distribution?

So Frederic would be perfectly rational in believing that he does not know the probabilities p(r) = ? and p(w) = ?

Likewise, an alien having just landed on earth would be perfectly rational not to know the initial likelihood of the religions:
p(Christianity) = ?     p(Islam) = ?     p(Mormonism) = ? and so on and so forth.

But there is an additional problem here.

The proposition “religion x is the true one” is not related to any event, and non-Bayesian (and moderate Bayesian) philosophers doubt that it is warranted to speak of probabilities in such a situation.

Either x is true or false and this cannot be related to any kind of frequency.

The great philosopher of science Elliott Sober (who is sympathetic to Bayesian epistemology) wrote this about the probability of a theory BEFORE any data has been taken into account:

“Newton’s universal law of gravitation, when suitably supplemented with plausible background assumptions, can be said to confer probabilities on observations. But what does it mean to say that the law has a probability in the light of those observations? More puzzling still is the idea that it has a probability before any observations are taken into account. If God chose the laws of nature by drawing slips of paper from an urn, it would make sense to say that Newton’s law has an objective prior. But no one believes this process model, and nothing similar seems remotely plausible.”

He rightly reminds us at the beginning of his article that “it is not inevitable that all propositions should have probabilities. That depends on what one means by probability, a point to which I’ll return. The claim that all propositions have probabilities is a philosophical doctrine, not a theorem of mathematics.”

So, it would be perfectly warranted for the alien either to confess his ignorance of the prior likelihoods of the various religions or perhaps even to consider that these prior probabilities do not exist, as Elliott Sober did with the theory of gravitation.

In future posts, I will lay out a non-Bayesian way to evaluate the goodness of a theory which depends only on the set of all known facts and does not assume the existence of a prior probability before any data has been considered.

As we shall see, many of Dr. Richard Carrier’s probabilistic challenges to Christianity dissolve if one drops the assertion that all propositions have objective prior probabilities.

To conclude, I think I have shown in this post that the probabilistic defense of the Outsider Test of Faith is unsound and depends on very questionable assumptions.

I have not, however, shown at all that the OTF is flawed, for it might very well be successfully defended on pragmatic grounds. This will be the topic of future conversations.