Wednesday, December 16, 2015

Dick pics

I sometimes come across videos like this


or pictures like this


This is very interesting to me because it shows how people have no idea what is going on in their own brains.

Let us start with the question: why does an unwanted image of a dick cause us to have an emotional reaction (and it seems that both men and women have a similar reaction after looking at a stranger’s dick – some mixture of shame and the feeling of being intimidated)? I remember talking to a male friend on Skype not so long ago. Another male friend of mine stood behind him, suddenly pulled his pants off, and showed his dick to the camera. My immediate reaction was to curse and look away in disgust. The thing is that unlike many people who send dick pics, my friend knew perfectly well what he was doing – he has read The Human Zoo by Desmond Morris. I had too, but the trick nevertheless worked.

So why do dicks evoke these emotions in us? Do we learn to be scared of dicks? When and how exactly do we learn this? Does somebody tell us? Or do we need an unpleasant experience with a dick that belongs to somebody else?

If you have had no such experiences, there is no reason to be afraid of a dick. Yet people still have these emotions. So what causes them?

Another example I know is from when I was just a few years old, playing at a riverbank with a group of female friends, all of them also a few years old. A dude with a mustache was riding a bike nearby. He stopped close to us, silently pulled down his pants, and showed us his junk with a smile on his face. The girls started screaming and ran away. I probably did the same thing, although I do not remember well, as it was so long ago (by the way, none of us thinks of this as a traumatic experience now; it was quite benign; after the incident the guy rode off and we never saw him again). Of course, nobody had explained to us beforehand that this was an appropriate reaction in such a situation. But it seemed appropriate. Why?

The answer is that this is an innate instinct. We have inherited from our ancestors some types of social interaction that are guided (among other things) by genital display. People who observe primates know that genital display is a way to communicate social status in a group. Dominant male individuals show their dicks much more often than other individuals. Human brains are wired in a similar way. Seeing somebody’s dick makes you feel intimidated, and human intuition sometimes makes guys show their dicks in order to intimidate others.

A woman often feels disgust after seeing a stranger’s dick, yet she may think that his intention was to arouse her and that the man had no idea his dick looked gross to her. A man asked why he sent a dick pic would probably say something like: “it was a joke; I wanted to embarrass her; I like to show off my masculinity.” What is really going on is that when a man wants to intimidate a woman (or, less often, another man), his primate intuition tells him to show off his dick. The woman verbally misidentifies his intentions but emotionally responds in the intended way. The act achieves its goal. Note that the man also somewhat misidentifies what really caused him to do this. It is because he acts on his animal instinct.

So, as it turns out, unwanted dick pics are a product of our ancestral way of ensuring group cohesion through an authority structure. These mechanisms have little use nowadays, but they still hang around aimlessly in our brains causing trouble. Moreover, the example of dick pics nicely shows how our verbal processing is disconnected from the part of the brain where we actually make decisions. Neither the man nor the woman verbally understands their own role in this situation – unless they are educated in anthropology or primatology.

Darkness in the sense of justice

People have an innate sense of justice. Our intuition tells us that if somebody did something wrong, they have to be punished. Notions of karma and the like are rooted in the human tendency to think that there is some cosmic justice. And people who have figured out that there is no objective justice take such justice as an ideal humans should strive for. A just world is what most of us are working towards.

So maybe before we start making decisions based on our sense of justice, it would be good to know where it comes from and whether we should trust it. And of course, like most other intuitions, the sense of justice is a product of natural selection. It is how nature wired our emotions in order to guarantee that we cooperate, enforce social cohesion, and so on. The problem is that nature tends to implement technological trade-offs in her designs, and the things she creates are not perfect. For an easy example, just look up the recurrent laryngeal nerve, which connects the brain with the larynx but detours down into the chest for no apparent reason (a detour which becomes grossly exaggerated in a giraffe).

And here is the question I have kept asking: should we trust our intuitions? I would say no. Our intuitions are the animal spirits that nature equipped us with to deal with circumstances very different from those of today. And even when the circumstances are right, the animal spirits are not guaranteed to be perfect. Like anything else designed by natural selection, they are technological trade-offs.

Following our innate sense of justice may lead to a sub-optimal design of society, and thus to more suffering than would be necessary if people behaved rationally. Rationality should be the yardstick against which we judge how efficiently our instincts help us shape society. If you want to rationally maximize social welfare, you should consider the specific consequences of the decisions you make rather than follow your intuition. For example, it is reasonable to think that some punishment for crimes is necessary in order to deter people from committing them. But if such deterrence cannot be achieved, there is no rational reason to punish a person. Moreover, there may be good reasons to offer the person help in order to make them a better citizen, rather than let them learn how to be a hardened criminal during their jail time.

You may cringe at the notion that some crimes should go unpunished. But this feeling is precisely the dark, irrational, revenge-seeking sense of justice that nature implemented in you. If you want to build a better society, you need to set these feelings aside and perform a strict cost-benefit analysis of the decisions you are facing. When you compare outcomes obtained with rationality to outcomes obtained with the human innate sense of justice, it is easy to see the darkness of our animal spirits.


Monday, December 14, 2015

Undeserved saliency of thoughts and words

It happens very often that we put great emphasis on what people think and express verbally. I can see it in philosophical texts where, for example, philosophers deliberate over what should be more important – expressed preferences or revealed preferences (see Decision Theory and Rationality by José Bermúdez, p. 64) – or while listening to people who keep on talking about their beliefs even in the absence of any decision-making problem these beliefs could influence.

My take on it is that what ultimately matters is behavior and decisions. A description of human mental processes can help us understand some aspects of human behavior, but it is only a part of the picture. For example, if we want to see what a person truly wants, the action and the actual choice should be taken into account rather than what the person says she wants – even, or especially, when the two are in conflict. Similarly, it does not matter what convoluted theories people come up with in order to explain how their thoughts interact with their behavior. If their behavior can be fully explained by a simpler theory, then the convoluted ones should be discarded.

But why? Why am I so eager to demean human thoughts? The reason is simple. Verbal processing and speech are devices that serve some evolutionary purposes. I do not believe that providing a perfect window into the operation of the human mind is one of these purposes. On the contrary, we know that a lot of mental processes are unconscious. The part of the brain responsible for verbal processing is not connected to all the other parts of the brain that are responsible for making decisions. Therefore, we are unable to fully describe what is going on in our heads. Furthermore, there is no reason to believe that spoken words are a perfect window even into the part of the mind that is available to verbal processing. It may well be a dirty window obstructing the view, or a distorting mirror.

To give you an analogy – imagine that human nature is a picture of a ruined city with a single nice flower in the foreground. What people say about their thoughts gives you access only to the part of the picture that holds the little flower, probably seen through a distorting lens. It is not wise to draw a conclusion about what the entire picture represents based on this little fragment only. If you want to know what human nature truly is, you must go beyond verbal processing and look at the entire picture.

The best way to think about it is to ask yourself: what could I learn about humans (and how) if they could not speak? Or even better: how would I go about learning about an alien species that I have no clue how to communicate with and that may differ from me in every respect? If you can think about humans and analyze them as an alien species, then you are well on your way to being objective in your analysis of human nature. But if you are focused on what people think is going on in their heads – then you may be bound for a dead end.

Wednesday, November 18, 2015

Big data regression

Problem formulation

You have a dataset in which each observation is an impression of a banner ad. You have a variable indicating success (say, click or conversion) and a lot of additional information (features), all of which are coded as binary: browser type, host, URL, placement, banner format, banner id, banner keywords, hosts the viewer has seen so far, how many times the viewer has seen particular banners, when did s/he see these banners, how many times did s/he click on which banners, how did s/he behave while shopping online, current date and time, geoid information, and so on.

The question is: which features increase the chance of success, and which decrease it? This is an important question if you want to allocate your advertising resources efficiently. The difficulty is that the number of observations is in the billions and the number of available features is in the millions.

A solution

A naïve approach is to create a table which shows how many successes and how many failures occurred when a given feature was present or absent. Then, you can compare the success ratio in the absence of the feature with the success ratio in its presence. If the latter is higher than the former, then the feature indicates a higher probability of success.

This approach is similar to calculating a simple correlation between the feature and the success indicator. And thus, it suffers from endogeneity. If a combination of two features often occurs together – say a particular host and a particular banner – and both of them seem to have high correlation with the success indicator, you do not really know whether it is the banner that drives success, the host, or both.

In order to separate the effects of features, you need to calculate partial correlation, conditional on the other features, rather than simple correlation. The straightforward way to do this is to perform an ordinary least squares regression on the data. Unfortunately, there exists no software that could handle the amounts of data you have. Even if you limit the dataset to the most common features – say the top 5000 – you still end up with several terabytes of data to be processed by the regression algorithm. To focus attention, let us say that we need a way to perform a regression on n = 4 billion observations and k = 10 thousand features. If each variable takes up 4 bytes, the amount of memory required to perform such an analysis equals nearly 160 terabytes.

Typically, linear least squares models are fit using an orthogonal decomposition of the data matrix. For example, R's lm function uses QR decomposition. One can also use singular value decomposition. Unfortunately, these methods require all data to be kept in memory and have algorithmic complexity of O(nk²).

Alternatively, one can calculate the Gram matrix. This has algorithmic complexity of O(nk²), which can be reduced to O(np²) if the data are sparse (where p is the quadratic mean number of features per observation), and it is very easily parallelized. Another advantage is that the memory requirement for calculating the Gram matrix is only O(k²), and for k = 10000 the exact amount of RAM required to keep the Gram matrix would be just under 200 MB (keep in mind that the Gram matrix is symmetric). The only problem here is that to calculate regression coefficients, it is necessary to invert the calculated Gram matrix (which is often discouraged due to inferior numerical stability and takes O(k³)). The viability of this solution thus depends on whether it is possible to do it with satisfactory numerical accuracy. As it turns out, it is.
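
To make this concrete, here is a minimal sketch of how the Gram matrix could be accumulated from a stream of sparse observations. The class name and observation format are illustrative; a production version would be parallelized across shards of the data, and the 200 MB figure above assumes 4-byte counters rather than the 64-bit ones used here for safety.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates the Gram matrix X'X for binary features. Each observation is a
// sorted list of indices of features that are equal to 1. Only the upper
// triangle is stored, since the Gram matrix is symmetric. All arithmetic is
// integer, so the counts are exact.
class GramAccumulator {
public:
    explicit GramAccumulator(std::size_t k)
        : k_(k), upper_(k * (k + 1) / 2, 0) {}

    void add_observation(const std::vector<std::uint32_t>& features) {
        for (std::size_t a = 0; a < features.size(); ++a)
            for (std::size_t b = a; b < features.size(); ++b)
                ++upper_[index(features[a], features[b])];
    }

    std::uint64_t count(std::uint32_t i, std::uint32_t j) const {
        return upper_[index(std::min(i, j), std::max(i, j))];
    }

private:
    // Maps (i, j) with i <= j to a position in the packed upper triangle.
    std::size_t index(std::uint32_t i, std::uint32_t j) const {
        std::size_t row = i;
        return row * k_ - row * (row + 1) / 2 + j;
    }

    std::size_t k_;
    std::vector<std::uint64_t> upper_;
};
```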

Note that popular machine learning engines like Vowpal Wabbit are not of much use in this situation. Machine learning usually concentrates on prediction rather than on accurate estimation of model parameters. Engines like VW are in principle less accurate than OLS. They allow multicollinearity among variables, which in turn forces the user to perform a separate data analysis in order to eliminate it in the first place. Finally, they do not allow for standard statistical inference on the model parameters.

Preliminaries

The plan was to create a C++ class able to do all operations necessary for this regression. The data were stored on a remote Linux server using Hadoop. I was planning to develop and debug my solution using Microsoft Visual Studio 2015 on my Windows 7 64-bit Dell computer (i7-4790 @ 3.6 GHz with 16 GB RAM) and then to port it to its final destination.

There were four initial things I had to take care of: (1) a way of measuring code performance, (2) a way of measuring numerical accuracy of matrix inversion, (3) C++ libraries for inverting matrices, and (4) a strategy for verifying accuracy of the entire algorithm.

Boy, was it hard to find a good way to precisely measure code execution time on Windows. Unfortunately, the usually recommended GetTickCount() Windows API function relies on the 55 Hz clock and thus has a resolution of around 18 milliseconds. Fortunately, I eventually found out about the QueryPerformanceCounter() function, whose resolution is much better.
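
For reference, a minimal timing helper along these lines (the class name is mine; Windows only):

```cpp
#include <windows.h>

// Simple stopwatch based on QueryPerformanceCounter().
class Stopwatch {
public:
    Stopwatch() {
        QueryPerformanceFrequency(&frequency_);
        QueryPerformanceCounter(&start_);
    }

    // Seconds elapsed since construction.
    double elapsed() const {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return static_cast<double>(now.QuadPart - start_.QuadPart) /
               static_cast<double>(frequency_.QuadPart);
    }

private:
    LARGE_INTEGER frequency_;
    LARGE_INTEGER start_;
};
```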

Next, I decided to use the following measure for numerical precision of matrix inversion. Let us say that you need to invert matrix A. You use an inversion algorithm on it which generates matrix B. If matrix B is a perfect inverse of A, then AB = I, where I is the identity matrix. Hence, I calculate matrix C = AB – I. Then, I find the element of matrix C that has the highest absolute value and call it r. This is my measure of numerical precision. In the world of infinite precision, r = 0. In the real world r < 1e-16 is perfect (I use double – a 64 bit floating point type for my calculations). r < 1e-5 is still acceptable. Otherwise there are reasons to worry.
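
A sketch of this measure using Armadillo (which I introduce below), assuming A and B are held as arma::mat; the function name is mine:

```cpp
#include <armadillo>

// r = max |(A * B - I)_{ij}|, the precision measure described above.
double inversion_precision(const arma::mat& A, const arma::mat& B) {
    arma::mat C = arma::abs(A * B - arma::eye(A.n_rows, A.n_cols));
    return C.max();
}
```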

With tools for measuring performance and accuracy, I was able to start testing libraries. I initially turned to Eigen, which was very easy to install and use with my Visual Studio. Eigen uses LU decomposition for calculating the matrix inverse and was satisfying in terms of speed and reliability – up to the point when I tried to invert a 7000x7000 matrix. Eigen kept on crashing and I could not figure out why. The second option was thus Armadillo. Armadillo did not have the same problems and worked well with bigger matrices, all the way up to 10000x10000.

As it turns out, Armadillo can take advantage of the fact that the Gram matrix is symmetric and positive-definite. The inversion is done by means of Cholesky decomposition, and after a few experiments I realized that it is not only faster but also numerically more reliable than the LU-based method. I was able to invert a 10001x10001 matrix in 283 seconds (in a single thread) with r = 3.13e-14. The irony is that both Cholesky decomposition and matrix multiplication work in O(k³), but the latter is over twice as slow, so it takes much more time to check the numerical precision than to perform the actual inversion.
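
In Armadillo, a sketch of this step could look as follows. inv_sympd() relies on a Cholesky-based LAPACK routine; the function name and error handling here are illustrative rather than a copy of my actual code.

```cpp
#include <armadillo>
#include <iostream>
#include <stdexcept>

// Invert the symmetric positive-definite Gram matrix with a Cholesky-based
// routine and report the precision measure r described earlier.
arma::mat invert_gram(const arma::mat& gram) {
    arma::mat inverse;
    if (!arma::inv_sympd(inverse, gram)) {
        throw std::runtime_error("Gram matrix is not positive definite");
    }
    arma::mat check = arma::abs(gram * inverse -
                                arma::eye(gram.n_rows, gram.n_cols));
    std::cout << "inversion precision r = " << check.max() << std::endl;
    return inverse;
}
```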

Finally, I designed a data generating process to test whether a least squares algorithm of my design can recover the parameters used to generate the data. Essentially, I created 10001 variables x_i for i = 0, 1, 2, …, 10000, with x_0 = 1 always. For i > 0 we have P(x_i = 1) = 1/(3+i) = 1 − P(x_i = 0). Then, I created a vector of parameters b_i: b_0 = 0.0015, and for any non-negative integer j, b_{4j+1} = 0.0001, b_{4j+2} = 0.0002, b_{4j+3} = 0.0003, and b_{4j+4} = -0.00005. Finally, P(y = 1) = x * b, where * indicates the dot product. This is a typical linear probability model.
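
A minimal sketch of this data generating process (the RNG choice, struct, and function names are mine):

```cpp
#include <cstdint>
#include <random>
#include <vector>

// One observation of the linear probability model described above, stored
// sparsely: indices of features equal to 1, plus the success indicator.
struct Observation {
    std::vector<std::uint32_t> features;
    int y;
};

// b_0 = 0.0015; then the repeating pattern 0.0001, 0.0002, 0.0003, -0.00005.
std::vector<double> make_parameters(std::size_t k) {
    static const double pattern[4] = {0.0001, 0.0002, 0.0003, -0.00005};
    std::vector<double> b(k);
    b[0] = 0.0015;
    for (std::size_t i = 1; i < k; ++i) b[i] = pattern[(i - 1) % 4];
    return b;
}

// x_0 = 1, P(x_i = 1) = 1/(3+i) for i > 0, and P(y = 1) = x * b.
// For these parameters x * b stays within [0, 1] with overwhelming probability.
Observation generate_observation(const std::vector<double>& b,
                                 std::mt19937_64& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    Observation obs;
    double xb = b[0];
    obs.features.push_back(0);  // the constant feature x_0
    for (std::size_t i = 1; i < b.size(); ++i) {
        if (u(rng) < 1.0 / (3.0 + static_cast<double>(i))) {
            obs.features.push_back(static_cast<std::uint32_t>(i));
            xb += b[i];
        }
    }
    obs.y = (u(rng) < xb) ? 1 : 0;
    return obs;
}
```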

Using the formula above I generated 4 billion observations (it took 11 days on 4 out of 8 cores of my Windows machine) and fed them into the regression algorithm. The algorithm was able to recover vector b with the expected convergence rate. Note that by design the aforementioned data generating process creates variables that are independently distributed. I thus had to tweak this and that to see whether the algorithm could handle correlated features as well as to investigate the bias (see more about that in the last section).

Statistical inference

The question of how to recover model parameters from the data is simple. In addition to the Gram matrix, you need a success count vector. The i-th element of this vector indicates how many successes there were when the i-th feature was present. Calculating this vector is at most O(np) in time and requires O(k) memory (note that none of the operations involved in calculating the Gram matrix and the success count vector are floating point operations – this is all integer arithmetic, since we operate on binary variables only; thus both the Gram matrix and the success count vector are calculated with perfect numerical precision). Once you have them both, you need to invert the Gram matrix and multiply it by the success count vector. The resulting vector contains the estimated model parameters.
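
Assuming the integer counts have already been converted to floating point, the estimation step itself is a few lines of Armadillo. This is a sketch, not my production code:

```cpp
#include <armadillo>
#include <stdexcept>

// Estimate OLS coefficients: beta = (X'X)^{-1} X'y.
// For binary data, X'X is the Gram matrix of counts and X'y is the success
// count vector, so we only need to invert and multiply.
arma::vec estimate_coefficients(const arma::mat& gram,       // k x k counts
                                const arma::vec& success) {  // k x 1 counts
    arma::mat gram_inv;
    if (!arma::inv_sympd(gram_inv, gram)) {
        throw std::runtime_error("Gram matrix could not be inverted");
    }
    return gram_inv * success;
}
```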

However, getting standard errors of the estimated coefficients is a bit more complicated. Typically, we would take the diagonal elements of the inverted Gram matrix, multiply them by the variance of the residuals, and take square roots. The problem is that calculating the residuals requires going through all observations all over again. This not only increases the execution time; it poses a major technical difficulty, as it requires the dataset to be invariant for the duration of the algorithm's execution (which is assumed to be at least several hours). To fix this, one would have to tinker with the data flow in the entire system, which could greatly inflate the project's costs.

Fortunately, there is a trick that can rescue us here. Instead of the quadratic mean of the residuals, one can use the standard deviation of the success variable. Note that the latter must be greater than the former: the former is the quadratic mean of residuals for the entire model and the latter is the quadratic mean of residuals for a model with a constant only. This guarantees that the standard errors will be overestimated, which is much better than having them underestimated or all over the place. Moreover, for a small average success ratio, the two will be close. In fact, it is easy to show that under some plausible conditions, as the average success ratio goes to zero, the two are the same in the limit. And for banner impressions, the average success ratio (e.g. CTR) is, no doubt, small.
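
A sketch of the corresponding computation; n is the total number of observations and total_successes the overall number of successes, both of which can be read off the constant feature's entries in the Gram matrix and the success count vector.

```cpp
#include <armadillo>
#include <cmath>

// Approximate standard errors: se_i = sd(y) * sqrt((X'X)^{-1}_{ii}),
// where sd(y) replaces the residual standard deviation as described above.
// For a binary outcome, sd(y) = sqrt(p * (1 - p)) with p = successes / n.
arma::vec approximate_standard_errors(const arma::mat& gram_inv,
                                      double n,
                                      double total_successes) {
    double p = total_successes / n;
    double sd_y = std::sqrt(p * (1.0 - p));
    return sd_y * arma::sqrt(arma::vec(gram_inv.diag()));
}
```

A 95% confidence interval for coefficient i is then beta(i) ± 1.96 * se(i).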

No amount of theoretical divagation can replace an empirical test. It is thus necessary to check ex post whether statistical inference using the above simplifications is indeed valid. To do that, I estimate a number of models (keep in mind that I have 10000 variables) and check how frequently the true coefficients fall within the estimated 95% confidence intervals. I expect them to be there slightly more often than 95% of the time (due to the overestimation of standard errors) and indeed, this is what I find.

Finally, I cannot write a section about statistical inference without bashing p-values and t-statistics. I strongly discourage you from using them. A single number is often not enough to facilitate good judgment about an estimated coefficient. A p-value typically answers a question like: “how likely is it that the coefficient is on the opposite side of zero?” – is this really what you want to know? The notion of statistical significance is often misleading. You can have a statistically insignificant coefficient whose confidence interval is so close to zero that any meaningful influence on the dependent variable is ruled out: you can then say that your data conclusively show that there is no influence (rather than that the data do not show that there is an influence). Also, you can have a statistically significant coefficient with a very high t-statistic which is economically insignificant, or one that is economically significant but estimated very imprecisely. Thus, instead of p-values and t-statistics, I suggest using confidence intervals. The question they answer is: what are the likely values of the coefficient? And this is what you actually want to know most of the time.

Data refinements

Oops. You have a nice OLS algorithm which supports valid statistical inference. You tested it on the data you generated and it works fine. Now you apply it to real data and the Gram matrix does not want to invert, or inverts with precision r > 1. You quickly realize that it is because the data contain a lot of constant, perfectly correlated, and multicollinear variables. How to deal with that?

Sure, you can force users to limit themselves to variables which are neither perfectly correlated nor multicollinear. But when they are using thousands of variables, it may take a lot of effort to figure out which ones those are. Also, running an algorithm for several hours only to learn that it failed because you stuffed it with a bad variable (and it does not tell you which one is bad!) simply does not seem right. Fortunately, as it turns out, all these problems can be fixed with analysis and manipulations of the already-calculated Gram matrix.

The first refinement I suggest is dropping features that are present too few times (e.g. fewer than 1000 times). You can find them by examining the diagonal entries of the Gram matrix. To drop a variable, you can just delete the appropriate row and column from the Gram matrix, as well as the corresponding entry from the success count vector. After such a delete operation, what you are left with is the same as if you had not considered the deleted variable to begin with. Clear cut.
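
In Armadillo terms, dropping feature i amounts to the following sketch (the book-keeping that maps remaining rows back to the original feature ids is the bigger part of the work and is omitted here):

```cpp
#include <armadillo>

// Remove feature i from the Gram matrix and the success count vector,
// as if the feature had never been part of the model.
void drop_feature(arma::mat& gram, arma::vec& success, arma::uword i) {
    gram.shed_row(i);
    gram.shed_col(i);
    success.shed_row(i);
}
```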

The second refinement I suggest is to drop features with not enough variability. Based on the Gram matrix and the success count vector, it is possible to construct a variability table for every feature (the same one I described as the naïve solution at the beginning of the article). This table has two rows and two columns – rows indicate whether there was a success and columns indicate whether the feature was present. Each cell contains the number of observations. So you have the number of observations that had the feature and a success, the number of observations that had the feature but no success, the number of observations without the feature but with a success, and the number of observations with neither the feature nor a success. I drop features for which at least one of the four cells has a value lower than 10.
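
The four cells can be read directly off the Gram matrix and the success count vector. A sketch, with n the total number of observations and total_successes the total number of successes (both available from the constant feature x_0):

```cpp
#include <cstdint>

struct VariabilityTable {
    std::uint64_t present_success;
    std::uint64_t present_failure;
    std::uint64_t absent_success;
    std::uint64_t absent_failure;
};

// gram_ii      = observations with feature i present (diagonal Gram entry),
// success_i    = successes among those observations,
// n            = total number of observations,
// total_successes = total number of successes.
VariabilityTable variability_table(std::uint64_t gram_ii, std::uint64_t success_i,
                                   std::uint64_t n, std::uint64_t total_successes) {
    VariabilityTable t;
    t.present_success = success_i;
    t.present_failure = gram_ii - success_i;
    t.absent_success  = total_successes - success_i;
    t.absent_failure  = (n - gram_ii) - (total_successes - success_i);
    return t;
}
```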

As we proceed to the third refinement, note that you can easily calculate the correlation between any two features based on the contents of the Gram matrix. Just write out the formula for correlation and simplify it, knowing that you are dealing with binary variables, to realize that all the information you need is in the Gram matrix. This of course allows you to identify all pairs of perfectly correlated or highly correlated variables in O(k²) time. I dropped one variable from each pair whose correlation exceeded .99 in absolute value (using, say, .95 instead of .99 did not dramatically improve the speed or numerical precision of the algorithm).
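
For completeness, the simplified formula and a sketch of it in code; G_ii and G_jj are the diagonal counts for features i and j, G_ij the count of observations where both are present, and n the total number of observations.

```cpp
#include <cmath>
#include <cstdint>

// Pearson correlation between two binary features i and j, from Gram counts:
//   corr = (n*G_ij - G_ii*G_jj) / sqrt(G_ii*(n - G_ii) * G_jj*(n - G_jj))
// Assumes both features actually vary, otherwise the denominator is zero.
double binary_feature_correlation(std::uint64_t g_ii, std::uint64_t g_jj,
                                  std::uint64_t g_ij, std::uint64_t n) {
    double numerator = static_cast<double>(n) * g_ij -
                       static_cast<double>(g_ii) * static_cast<double>(g_jj);
    double denominator = std::sqrt(static_cast<double>(g_ii) * (n - g_ii)) *
                         std::sqrt(static_cast<double>(g_jj) * (n - g_jj));
    return numerator / denominator;
}
```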

But now comes a biggie. How to find features that are perfectly multicollinear? One naïve approach is to find all triples of such variables and test them for multicollinearity, then all quadruples, quintuples, and so on. The trouble is that finding all n-tuples takes O(kⁿ) time, which is a nightmare. Alternatively, you can try to invert submatrices: if you can invert the matrix made up of the first p rows and columns of the original Gram matrix, but you cannot invert the matrix made up of the first p+1 rows and columns, it surely indicates that variable p+1 causes the Gram matrix to be singular. But this solution has a complexity of O(k⁴), which for high k may be very cumbersome. There must be a better way.

As it turns out, a better way is to perform a QR decomposition of the Gram matrix (not to be confused with the QR decomposition of the data matrix that is part of the standard linear least squares algorithm). The diagonal elements of the R matrix are what interest us – a zero indicates that a variable is causing problems and needs to be eliminated. QR decomposition generates the same results as the “invert submatrices” algorithm described above – but it runs in O(k³). And, of course, it is good practice to check its numerical precision in a similar way to how we checked the numerical precision of the matrix inversion algorithm.
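
A sketch of this check with Armadillo; the tolerance is an arbitrary placeholder and would need to be tuned against the data (and the decomposition itself checked for numerical precision, as discussed above).

```cpp
#include <armadillo>
#include <cmath>
#include <vector>

// Returns indices of features that make the Gram matrix singular, identified
// by (near-)zero diagonal entries of R in the QR decomposition of the Gram
// matrix: a zero at position i means column i is a linear combination of the
// preceding columns.
std::vector<arma::uword> multicollinear_features(const arma::mat& gram,
                                                 double tolerance = 1e-8) {
    arma::mat Q, R;
    arma::qr(Q, R, gram);
    std::vector<arma::uword> bad;
    for (arma::uword i = 0; i < R.n_cols; ++i) {
        if (std::abs(R(i, i)) < tolerance) bad.push_back(i);
    }
    return bad;
}
```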

Finally, note that you can sort the Gram matrix using its diagonal entries. I sort it in descending order, so that the features that get eliminated are always the less frequently occurring ones. It is probably possible to affect numerical precision by sorting the Gram matrix one way or the other, but I have not investigated this issue extensively. I only noticed that in some instances sorting the Gram matrix in ascending order made the LU inversion algorithm fail (too high an r), while sorting in descending order or not sorting did not affect the LU algorithm much.

All these operations require some effort to keep track of which variables were eliminated and why, and especially how variables in the final Gram matrix (the one undergoing inversion) map to the initial variables before the refinements. However, the results are worth the effort.

Integration and application

The task of integrating new solutions with legacy systems may be particularly hard. Fortunately, in my case, there already existed data processing routines that fed off the same input I needed (that is, a stream of observations in a sparsity-supporting format – a list of “lit-up” features), as well as input generating routines that filtered the original data with given observation and feature selectors.

I had a shared terminal session (using Screen) with the people responsible for maintaining the C++ code used for analyses on these data to date. We were able to link my class into the current setup so that users can run the regression through the same interface they previously used for other types of analyses. Later on, I had to do some debugging to account for unexpected differences in data format, but ultimately everything went well.

The first real data fed to the algorithm had 1.16 billion observations and 5050 features. Calculating the Gram matrix and the success count vector took around 7 hours. Due to the refinements, the number of features was reduced to 3104. Inverting the matrix took just a few seconds, and the achieved precision was around 2e-7.

Pitfalls

In this final section I would like to discuss three potential problems that do not have easy solutions: variable cannibalization, bias, and causality.

It often happens that a number of the available features refer to essentially the same thing. For example, you may have features that indicate a person who did not see this banner in the past minute, 5 minutes, hour, and day. These features will be correlated, and they have a clear hierarchy of implication. A user can make an attempt to run a regression using all these features, expecting that the chance of success will be a decreasing function of the number of impressions. However, the effect of a viewer who has never seen the banner will not be attributed entirely to any one of the aforementioned features. Instead, it will be split among them, making the estimated coefficients hard to interpret. This is the essence of cannibalization – similar variables split the effect they are supposed to pick up, and therefore none of them has the coefficient it should have (please let me know if you are aware of a better term than “cannibalization”). The simple but somewhat cumbersome remedy for this is to manually avoid using features with similar meanings in one regression.

Secondly, it is widely known that the linear probability model generates bias. The biased coefficients are usually closer to zero than they should be. To see why, consider a feature whose effect is to increase the probability of success by 10 percentage points. However, this feature often occurs together with another feature whose presence drives the probability of success to -25% (that is, zero). The presence of the feature in question can then at best increase the probability to -15% (that is, still zero). As a result, the feature in question does not affect the outcome in some sub-population due to the negative predicted probability. Its estimated effect is thus smaller (closer to zero) than the expected 10 percentage points.

Note that the reason why linear probability model generates biased results is not because the regression algorithm is flawed but because the model specification is flawed. The P(y = 1) = x * b model equation is incorrect if x * b is smaller than zero or bigger than one because probability by definition must be between 0 and 1. Whenever x * b is outside these bounds, the coefficients end up being biased. That is, OLS correctly estimates partial correlation between independent and dependent variables, but, due to data truncation, partial correlation is not what is needed to recover the linear probability model parameters.

The resolution of this issue may go in the direction of assuming that the model specification is correct and finding ways to alleviate the bias, or at least identifying features whose coefficients may be biased. On the other hand, it may also be possible to assume that the linear probability specification is incorrect and to investigate whether partial correlation is what is really needed for the decision problems the estimates are supposed to help with. I consider solving this problem an issue separate from the main topic of this article and I leave it at that.

Finally, I would like to make a note on causality. Partial correlation, like any correlation, does not imply causation. Therefore, it may turn out that a particular feature does not have a causal effect on the probability of success but is instead correlated with an omitted variable which is the true cause of the change in the observed behavior. For example, one host can have a higher conversion ratio than another. However, the reason may be that the advertised product is for females only. The population of females may be much smaller for the second host, even though a higher fraction of them buys the product. In such a case the second host is actually better at selling the product (that is, it is better to direct generic traffic to the second host rather than to the first one), but this information is obscured by our inability to distinguish between male and female viewers. It is thus important to remember that the regression provides us only with partial correlations rather than proofs of causality.

The issue of causality is of extreme importance when we are trying to predict the effects of policy (like redirecting traffic in the example above). However, when we are interested in prediction rather than in policy effects, partial correlation seems to be a sufficient tool. For example, you may want to know whether people using Internet Explorer are more likely to click on a banner, even though you have no ability to influence which browser they use. In such situations establishing causality is not necessary.

:-)

Wednesday, September 30, 2015

Consciousness and morality revisited

As I am investigating the topic of morality (whether I have anything interesting to say about it is yet to be discovered), I bought “The Moral Landscape: How Science Can Determine Human Values” by Sam Harris. I was not surprised to see that on the first page, in the Introduction, Harris writes: “I will argue, however, that questions about values – about meaning, morality, and life’s larger purpose – are really questions about the well-being of conscious creatures.” I am glad to have yet another example that humans use consciousness as a property defining the objects of morality.

The notion of “well-being of conscious creatures” is repeated numerous times throughout the book. On page 32, Harris explains why he chose consciousness as the basis for morality:

“Let us begin with the fact of consciousness: I think we can know, through reason alone, that consciousness is the only intelligible domain of value. What is the alternative? I invite you to try to think of a source of value that has absolutely nothing to do with the (actual or potential) experience of conscious beings. Take a moment to think about what this would entail: whatever this alternative is, it cannot affect the experience of any creature (in this life or in any other). Put this thing in a box, and what you have in that box is – it would seem, by definition – the least interesting thing in the universe.

So how much time should we spend worrying about such a transcendent source of value? I think the time I will spend typing this sentence is already too much. All other notions of value will bear some relationship to the actual or potential experience of conscious beings. So my claim that consciousness is the basis of human values and morality is not an arbitrary starting point.”

There are a couple of problems here. First of all, Harris does not present any constructive argument in favor of using consciousness as the starting point. He only says that all alternatives he can think of are either uninteresting or related to consciousness. This is an argument from ignorance, a logical fallacy, which Harris should be familiar with as an outspoken atheist. Unfortunately, Harris keeps on using arguments from ignorance in his book (see also p. 62 and p. 183).

Secondly, let us for a second consider the world of ants (rather than humans). Ants are insects with complex rules of social interaction. The problem is how to design these rules in order to maximize ants’ well-being (e.g. “thou shalt not kill another ant from your nest”). Or, to put it more generally, let us say we have any population of any social agents: they may be simple computer programs implemented in a cellular automaton, or super-intelligent aliens who have no characteristic that we would recognize as consciousness by any modern definition (they do not have brain tissue; they do not smile, frown, sleep, cry, or talk). How do we go about designing optimal interaction rules for their population, i.e. how do we design their morality? If the use of consciousness is necessary, does it mean that we cannot design morality for creatures that do not have it? It seems that this is what Harris thinks: “altruism must be (…) conscious (…) to exclude ants” (p. 92). Why not ants? If we are to solve the much more complicated problem for humans, maybe it would be a good idea to start with the much simpler problem for ants? Harris seems to think that we cannot optimize ants’ behavior but we can optimize human behavior. Why?

We already know that, despite what Harris claims, the choice of consciousness is arbitrary. It is what evolutionary psychology dictates, and Harris tries (and fails) to rationalize this human intuition. And the question that needs to be answered in the first place is: should we follow our intuitions? Or, more precisely: why, when, and which intuitions should we follow, and which should we discard?

Monday, September 14, 2015

Consciousness and morality

In his brilliant book, ‘Consciousness and the Brain,’ Stanislas Dehaene gives an overview of the neurological processes that give rise to consciousness. He uses the definition of consciousness I like and, generally, I agree with everything he has to say (especially with his critique of the grotesque modern philosophical theories of consciousness towards the end of the book). But there is one exception. In chapter 7, he unfortunately delves into a morality-related topic and seems to make a tacit assumption that virtually everybody else makes as well. And this is an assumption I do not like.

The question is: ‘is infanticide morally justified?’ The quoted argument in favor is: “The fact that a being is (…) a member of species Homo sapiens, is not relevant to the wrongness of killing it; it is, rather, characteristics like rationality, autonomy, and self-consciousness (…). Infants lack these characteristics. Killing them, therefore, cannot be equated with killing normal human beings, or any other self-conscious beings.” Dehaene strongly criticizes this point of view: “Such assertions are preposterous for many reasons. (…) Although the infant mind remains a vast terra incognita, behavior, anatomy, and brain imaging can provide much information about conscious states. (…) We can now safely conclude that conscious access exists in babies as in adults (…)” (p. 236-243)

Let us take a step back to see what is going on here. We see two people arguing about whether a thing (an infant in this case) deserves protection as an object of moral behavior (let us quickly explain the notion of 'object of moral behavior': e.g. unlike humans, cockroaches are not objects of morality, so we can kill them with no remorse). Both sides tacitly agree that a thing qualifies as an object of moral behavior if it has certain characteristics of an adult human being and that it does not qualify if it lacks them. Both sides tacitly agree that these characteristics revolve around the mental capabilities of a healthy human adult and gravitate towards something that both sides call consciousness. What the sides disagree on is whether a particular class of things (infants) has consciousness or not. I side with Dehaene that babies have consciousness, because the philosophers whom he cites, as often happens with philosophers, seem to have no idea what they are talking about. However, it may surprise you that whom I side with is actually irrelevant.

Connecting consciousness with morality seems to be very popular. As I wrote in my previous post, vegetarians often use the argument that animals are ‘conscious’ to propose that it is immoral to kill them. We can also see it in the debates about consciousness, about maintaining life support for vegetative-state patients, and so on. However, a person making such claims virtually never explains why consciousness is the necessary and sufficient condition for becoming an object of morality. And this omission goes unnoticed. Everybody in these discussions seems to be in tacit agreement that this is the way to go: you are conscious – you deserve a right to live; you are not conscious – you can be treated instrumentally. (In some other debates the question is whether the thing in question is capable of ‘suffering’ or ‘feeling’ rather than being conscious; I do not want to go too deep into that nuance here, because it is irrelevant.)

Why is consciousness the necessary and sufficient condition to become an object of morality? Why is it inherently wrong to kill a conscious being? If you read my previous essay, you probably know the answers to these questions. There is nothing objectively wrong in killing a conscious being, whatever the definition of consciousness might be. It is subjectively wrong from the point of view of the members of the Homo sapiens species, because they (we) are hardwired to perceive it as wrong. In other words, we have an evolved intuition that killing something that has consciousness is wrong because Mother Nature hardwired us not to kill each other. But she did not care to make the emotional mechanisms precise enough to spare us the ongoing confusion. Whatever works is good enough.

The battle described at the beginning of this essay can be deciphered in the following way. The two gentlemen have a certain concept (consciousness) whose perception is hardwired in their brains to activate the ‘I care’ system. The two gentlemen then argue about whether a certain stimulus (a baby) should be associated with this concept (and thus activate the system). But both of them skip the question of whether this setup, with the concept of consciousness invoking morality, makes sense at all. Neither of the gentlemen makes any argument about objective – as opposed to emotional – reasons to kill or not to kill infants.

Do such objective reasons exist? It may be surprising and depressing for some people to learn that our intuitions about the world, including the dearest and most deeply held feelings, are the work of natural selection and are imprecise products of technological trade-offs that often lead us astray. Our intuitions worked well to propel us to the status of the dominant species on planet Earth (in the sense that we technically have the power to destroy virtually all other species), but they also very efficiently generate confusion when we try to understand the nature of reality. Using intuition is not a pathway to truth. Rationality is.

But if we eliminate our intuitions as a source of morality, what are we left with? Can we answer the question ‘is it okay to murder babies?’ Can we even answer the question ‘is it okay to murder another human being?’ Can we build a rational argument that is completely independent of our hardwired intuitions, is fully rational, and provides guidance for the decisions we must constantly make? Finally, if we ever encounter more intellectually advanced aliens, who do not rely on their instincts like we do, what kind of morality should we expect from them?

Maybe it is possible to answer these questions in a meaningful way. But this is a topic for a separate post. Stay tuned. 

Saturday, August 29, 2015

The mind-body problem and the hard problem of consciousness explained

Abstract: In this essay I try to reconcile the two available sources of information about consciousness. One source is science, which gives us insights into how brains work. Another source is introspection and the opinions expressed by other people talking about their subjective experience. The hard problem of consciousness can be understood as an inconsistency between the information provided by these two sources. I propose a reductionist theory, grounded in evolutionary psychology, that accounts for subjective experience and explains why introspection yields thoughts that are seemingly incompatible with a fully reductionist point of view. By incorporating the thoughts of individuals and their opinions about the nature of consciousness into the reductionist theory, I reduce the set of unexplained or inconsistent observations on which the hard problem of consciousness is based to nothing. Given that there are no unexplained observations left, whether objective or subjective, the hard problem of consciousness appears to be solved.


Introduction

People are baffled by the notion of consciousness. There have been many theories of why consciousness arises, and none of them seems to be satisfying. Dualism (e.g. the mind is immaterial and somehow communicates with the brain) is pure speculation which offers no explanation in a scientific sense. Other theories, like panpsychism (every single molecule has consciousness, but the more complicated the system, the more consciousness it has), are not falsifiable and offer no predictive power. Finally, full reductionism (the mind is a product of a physical brain) fails to address the reason why we are able to distinguish between subjective experience and the objective world.

People discuss consciousness, the mind-body problem, and the existence of qualia – and the discussion itself is an objective fact about reality. Electrical and chemical impulses originating in their brains make the muscles in their mouths and throats contract so that the corresponding statements are uttered. It is thus an objective fact about reality that there is something in the brains of these people that causes them to perceive consciousness or qualia as something that cannot be explained by material science. What are the origins of these electrochemical signals? Is there some mysterious soul that, through a yet unidentified physical mechanism, facilitates the transfer of information between the spiritual realm and the material realm? Do the mental properties of elementary particles in a brain unite in some mysterious way to influence the neurons responsible for the perception of consciousness? Or, finally, is there a fully reductionist explanation of why these neurons get activated and why our brains produce behavioral outcomes like the writings of René Descartes?

It is bizarre. The best model of the Universe that science has been able to come up with so far – the model that is constantly being positively verified, yields good predictions, and helps us solve practical problems – is materialistic and reductionist. All objective evidence points to the fact that a mind is a product of a brain and nothing else. The brain has not been fully understood yet. Neither has the mind. So maybe understanding the brain will let us understand the mind? It is bizarre that philosophers are often so quick to reject the notion that once we fully understand the physical brain, we will fully understand human experience. Why is that?

A solution

To sketch a quick explanation, let me first focus attention on what the problem is. When you think about other humans as humans, or when you think about your own thoughts, there is something you perceive (the common notion of consciousness). But you no longer perceive this thing when you think about brain tissue, firing neurons, the electrical circuits of a robot, and such. The perception of it when you think about humans as a whole, and the failure to perceive it when you think about them in a reductionist way (as biological tissue), is what causes the dissonance in your brain. Your brain perceives this dissonance and makes your mouth utter a statement like: ‘The really hard problem of consciousness is the problem of experience. When we think and perceive there is a whir of information processing, but there is also a subjective aspect.’ The notion of “information processing” (the reductionist part) does not cause the feeling that you have about yourself when you ponder the phenomenon of your own thought. This is why a thought must be something more. This is what the problem was in the time of Descartes, and this is what the problem still is, as discussed by modern philosophers talking about the hard problem of consciousness and the existence of qualia.

To solve the hard problem of consciousness, we need to explain why people perceive consciousness in some things and not in others. The answer lies in evolutionary psychology. Every healthy human being has a system in their brain that is supposed to detect minds. This system is activated when we think about sentient beings but is not activated when we think about meat or a piece of silicon. The reason why we have this system is to facilitate our social interactions. We need to recognize other humans so that we can feel sympathy or compassion and help them when they are hurt, while not having the same feelings towards a broken cardboard box.

Evolutionary psychology explains

A human brain has a number of systems that allow for the recognition of objects that are important from the evolutionary point of view. Recognition of such salient objects usually triggers specific behavior – a response that increases the genes’ probability of survival. When you see the naked body of a potential sexual partner, you are inclined to get closer. When you see (or smell) the rotting carcass of a rat, you are inclined to move away. When you see a baby, you feel a need to care about it. And so on.

The perception of a salient object often generates a feeling. This feeling can be overcome under some circumstances. It cannot be treated as the sole determinant of human behavior; it is just guidance. There are other factors that can be more important (e.g. there is a lion between me and the naked body of my lover). But the important fact is that the detection of salient objects happens outside of our thoughts – we cannot choose to be attracted by a carcass instead of being repelled. We can at most ignore the feeling of repulsion.

Another important fact is that systems for recognition of salient objects are very crude and prone to mistakes. It is probably very hard to precisely shape the structure of a brain by genes. There is an engineering tradeoff involved – our brains are not designed to be perfect. They are designed to do their job in most situations at a moderate cost.

One of the examples of such crudeness is recognition of a sexual partner. It is very common both for humans and in the rest of animal kingdom to get aroused by things that are far from actual potential sexual partners. It is easy to find on the Internet a picture of a tortoise having sex with a stone or a dog humping his owner’s leg. It is also very easy to find a lot of pornography. Pornography allows humans to get aroused by a pattern of colorful dots rather than by the presence of an actual sexual partner.

Similar phenomena occur with respect to perceived cuteness. It is quite simple to discern what facial features activate a mental system that was most likely intended for the recognition of human infants (cartoonists know it too well). Seeing something with big eyes and a little nose mounted on a big head makes us feel like we need to care about it. But it is not only infants that have these features. Most of young mammals have them which is why we perceive kittens and puppies as cute. From the evolutionary point of view, this may be a mistake – we are misidentifying a salient object and misallocating our resources (as we waste time on caring for a puppy instead of a member of our own species, say a relative). But it is not easy to determine whether this phenomenon is really a bug or a feature – there is not only an engineering tradeoff involved in making the recognition mechanism crude. There may be also some yet-to-be-discovered benefits from making such “mistakes.”

The mind as a salient object

The toolbox our brains are equipped with contains a lot of systems that guide our behavior when we interact with other humans. For example, there is a subsystem responsible for basic moral behavior. It takes a great deal of emotional effort to kill somebody you consider to be a fellow human being. There is a feeling of compassion you feel towards somebody in need. There are feelings of unfairness that we have when others do not reciprocate. And so on. All these emotional responses are culturally universal and have simple evolutionary explanations.

Is there any other evidence that such a subsystem exists? Let us consider genocide. Something that happens very often before a genocide is a process of dehumanization of the group that will be subject to extermination. In the brains of the perpetrators, dehumanization effectively disconnects the system for recognizing fellow human beings from the input generated by the victimized group. You no longer perceive the victims as humans – you now perceive them as cockroaches or rats, which does not activate your moral intuition and makes them worthy of extermination.

When you watch debates or read articles about consciousness, you may notice that people often intuitively make a connection between consciousness and moral behavior. And it is not only vegetarians that refuse to eat “sentient beings.” People who otherwise eat meat sometimes say that since octopi, elephants, dolphins, and monkeys have been identified as self-aware (they can recognize themselves in a mirror), it is immoral to eat them. The question of what is conscious and what is not is important precisely because conscious beings seem to require moral treatment, while beings that are not conscious are just things that can be dealt with without as much respect. Note that lack of consciousness at the time of committing a crime is often a condition for a more lenient verdict.

Like other systems for the recognition of salient objects, the system for recognizing minds (or consciousness / sentience / souls / feelings / subjective experience) is not perfect. It can be overridden (as we saw in the example of genocide) and it can be activated by stimuli it was likely not intended for. Such a misidentification can yield a decision that results in a misallocation of resources and can be detrimental not only from the evolutionary point of view but even from the viewpoint of the subjective wellbeing of the individual.

There are a lot of examples of the system for recognizing minds being activated by mistake, when there is no actual human mind present. As I already mentioned, it is often activated by animals, which leads people to refuse to eat them and to chastise those who do. It is activated by fetuses, which leads people to strongly oppose abortion and demonize those who do not. It is activated by inanimate objects and natural occurrences in people who believe in spirits and worship gods (and trade favors with them by making offerings, including human sacrifice). And finally, some people make a connection between this system and the entire Universe, which leads to panpsychism.

The existence of such a system could in principle be verified empirically. It implies that there is a pattern of neurological activity that occurs when a human detects a mind. There is also a pattern corresponding to at least some moral judgements, which is then tied to identifying the subject of a judgement as a mind.

What is consciousness?

The main point of this essay is that there is no evidence, whether subjective or objective, that leads to a conclusion that mind is something more than a product of material brain, given the laws of physics as we know them. All the subjective “evidence” against reductionism can be explained by reductionism and there are no arguments left in favor of other theories.

Reductionism explains such “evidence” by showing that it is created as a result of the cognitive dissonance caused by the system for detecting minds, which gets activated while thinking about humans holistically or performing introspection, and is not activated while thinking about nerve tissue. The hard problem of consciousness thus has no origin in reality but is a product of the inadequate perception abilities of the human brain. In other words, the question is not whether the thing exists but why we have an illusion that it does. Similarly, when you are visiting a psychiatric ward, you do not find yourself wondering whether one of the patients really is Napoleon Bonaparte. You are more likely to analyze WHY he thinks so. The thought that all healthy humans have some mental deficits leading to delusions might seem depressing, but on the other hand we cannot just start off with the assumption that we are perfect.

Having addressed the main point of this essay, an interesting follow-up question arises: does consciousness even exist? Before discussing existence, one must have a good definition of the thing in question. A good definition identifies a phenomenon by its properties and enables us to objectively discern whether a phenomenon satisfies those properties or not. In other words, a good definition should let us objectively determine whether something is conscious or not. A good definition should also be as close to the common use of the word as possible. Most of the available definitions describe consciousness by vague associations with notions like sentience, awareness, subjectivity, or the ability to feel, and they fail to satisfy these criteria.

The word “consciousness” is commonly used to describe phenomena which activate the mind detection system in the brain of the person who speaks about it. Unfortunately, different objects activate this system for different people, which causes a lot of confusion. Despite trying hard, I was unable to come up with a reductionist definition based on this approach. Hence, a different approach is needed.

Another common way of talking about consciousness is to contrast conscious mental processes with unconscious mental processes. The former are what we can speak about after investigating our own minds through introspection; the latter seem to be “given” to us in the form of feelings or information, for example whom a face belongs to and whether it is attractive.

Hence, I espouse the following definition of consciousness: “Human consciousness is a collection of mental processes that can be remembered and talked about later on by a healthy human being.” This particular definition seems useful because it allows us to empirically verify what the content of human consciousness is and what it is not. It relates to what we care about – the experience we can share with each other, the experience perceived as the content of the Cartesian theater. It is what the mind detection system is most likely supposed to detect. Finally, a similar version of it has already been used by reputable scientists.

What about animals then? Do they have consciousness? Well, it depends on what scientific question we are asking. We are surely not interested in knowing whether animals possess the imaginary and emotionally loaded quality whose illusion is caused by a failure of the mind detection system in our brains. We know that animals are aware (that is, they have a working model of the environment in their brains) and often self-aware (that is, they have a working model of their bodies and maybe even of their own thoughts). But we don’t have to give these phenomena a new name. The word “consciousness” seems interesting from the perspective of humans willing to investigate what they are able to communicate about, but it has no obvious application in the case of other animals, since they cannot speak.

Let us now think about a philosophical zombie, a creature imagined by philosophers that behaves exactly like a human but does not have subjective experience – a very sophisticated robot. Consciousness does not exist in the sense that there is something mysterious differentiating a philosophical zombie from a human being. As a matter of fact, the notion of a philosophical zombie was engineered as a human being who does not activate the mind-recognition system. The difference between a human and a philosophical zombie lies in subjective perception, not in the objective properties of the objects being studied.

Subjective experience

Finally, let me address the notions of subjective experience and qualia, which seem so real and yet whose existence reductionism seems to deny. Do I claim that they do not exist? On the contrary. Subjective experience exists and manifests itself as, and only as, a state of a brain. Subjective experience is a pattern of neural activity. Qualia correspond to various states of the physical brain. Knowing the configuration of a brain perfectly is enough to fully understand human experience and to predict with the highest possible accuracy (subject to quantum uncertainties) how a person is going to react to a particular experience.

The current state of neuroscience does not allow us to model precisely the neural machinery that gives rise to our thoughts. I can only speculate that, for example, there may be a pattern of neural activity that gets activated when you see the color red. There may also be a pattern that gets activated when you want to say what you see. Neurons in the part of the brain responsible for speech detect that these two patterns have been activated and send a signal to your mouth. You say ‘I see red.’

There may be a pattern that gets activated when you perceive a mind. It gets activated when you think about your own perception of the color red, but it does not get activated when you think about your own brain tissue. The patterns associated with these two phenomena get activated by different inputs and cannot be merged. The failure to merge them is recognized, and a pattern corresponding to the detection of cognitive dissonance emerges. This pattern in turn makes your mouth say: ‘My subjective experience cannot be explained by reductionism.’


Note: Special thanks to Kamil Faber and Julia Madajczak for helping me make this essay slightly less unintelligible. 

Friday, July 24, 2015

Philosophic burden of proof in God debates

Abstract: Requiring a position to meet its burden of proof before it can be believed is a bad approach to decision-making. In reality, decisions often need to be made before any proofs can be produced. In this essay, I show what the flaws of the burden-of-proof logic are. I suggest a different way of evaluating beliefs, one consistent with the necessity of making decisions under uncertainty. I also address the most popular arguments put forward by the proponents of the burden-of-proof logic. Finally, I try to explain the reasons for and consequences of adopting their way of thinking.


Introduction

Burden of proof is a framework of logical thinking used to establish the validity of certain claims about reality. It indicates which side should provide arguments for their case. Many atheists argue that theists should provide arguments, because it is theists who make a claim about reality: “God exists.” Atheists, on the other hand, do not make a claim about reality; they merely reject the theist claim. The default position of a skeptic and a rational person is, according to the burden-of-proof logic, disbelief in a claim until sufficient evidence is provided.

A common response to this reasoning is that atheists put forward an alternative claim: “God does not exist.” But according to many atheists, this is not the case. The discussion is only about the claim made by theists. For example, a person who has never learnt the notion of a god is also an atheist. Such a person could never make a claim one way or the other. Moreover, when rejecting a claim, you do not automatically make yourself a proponent of an alternative claim. There are things about the Universe that we simply do not know yet. You cannot be forced to automatically accept a bad claim because you rejected another bad claim. Finally, there is an infinite number of claims that can be made up by a person with a good imagination. It is impossible to even name them, let alone produce reliable counterevidence to all of them. Therefore, the default position must be to disbelieve a claim until evidence for it is provided, without being forced to accept an alternative claim in the meantime. The evidence does not have to take a tangible form. It can come in the form of sound reasoning or, in some cases, even vague clues.

Although there is, no doubt, a considerable variety of opinions on this matter, the above explanation more or less summarizes why many atheists think that the burden of proof lies with theists. Some theists try to turn this argument around. They say that atheists are indeed making a claim, that “God does not exist”; that there is good evidence for the theist case and insufficient evidence for the atheist case; and that it is therefore the atheists’ turn to argue for themselves. However, these arguments do not go deep enough. In most debates, atheists present a much higher level of knowledge and sophistication in their thinking than their adversaries. Since theists have a hard time arguing otherwise, it may seem that they are indeed the ones who need to provide justification for their claims and that the burden of proof lies with them.

The philosophical questions that interest me are: is the burden of proof a valid framework of logical thinking? Will using it improve the quality of the inferences we make about reality, or should we use a different framework? What different framework would be better? And if the burden of proof is the best framework, how do we decide who has the burden of proof? Is there a general rule, or does the rule depend on the situation?

In this essay I intend to answer these questions by picking apart the burden-of-proof logic and by addressing the most common arguments supporting it. But before I start this job, let me briefly explain why it should be important to both sides of the debate: the application of the burden of proof shapes communication between atheists and theists. It generates some confusion and disagreement. It will be hard to come to an agreement if the parties use different logic, and especially hard if the parties use faulty logic. So, whatever side you are on, if you care for the truth to prevail, it is in your interest to get both your own and your opponent’s logic straight.

Decisions matter

Why are the debates between theists and atheists important? Why are debates important in general? The debates I want to discuss in this essay are a way of exchanging information that is supposed to help establish what the nature of reality is. You can have such a debate in your own head when you consider the pros and cons of some claims. We can also hold such a debate in public to educate people and show to the undecided which arguments are stronger. Regardless of whether the argument goes on in your head or in public, the same logic should govern the reasoning: the correct logic.

Why do we care what the nature of reality is? We care about it because our vision of reality influences our decisions, which in turn affect our wellbeing. If somebody harbors in his head a belief that has no effect on his decisions whatsoever, this belief is irrelevant to other people and to the entire outside world. For some people it may be interesting to indulge in “academic discussion” about such beliefs, but for the sake of this argument, I am going to consider only beliefs that influence our decisions.

The theory of decisions under uncertainty

Consider the following scenario: you are speeding on a highway. There is a turn ahead. You are wondering whether there are police waiting behind it. If there are, you are probably going to get a ticket. Now, you have to decide whether to slow down or maintain your speed.

This is a typical problem in which you have to make a decision under uncertainty. You have some prior information about the situation – you have your past experience with police and you have heard many stories. You also know the approximate costs and benefits of each choice: the approximate value of the ticket, the amount of time lost by being pulled over and the amount of time lost by slowing down to avoid the ticket.

So let us apply the burden-of-proof logic to this situation. The default position is disbelief in the claim that the police are there until sufficient evidence is provided. And since there is no proof that they are there, the decision should clearly be to keep on speeding. But we know this is a wrong decision under some circumstances, especially if you know that the police really like to wait there. Either the burden-of-proof logic is not working here or I have misapplied it. You can argue the latter by saying that the claim should be “there is a 10% chance that the police are behind the corner.” Your burden of proof is now met by your previous experience with cops, but you must have arguments both for and against their presence in order to evaluate the probability.

This is the way we seem to make decisions. We consider alternative versions of reality – in fact, the cops are either there or they are not. We assign a probability to each version of reality. Then we think about the costs and benefits of our decisions under each version of reality and we choose the option that is best for us. This is how rational decisions are usually modelled in the social sciences.
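
To make the speeding example concrete, here is a minimal sketch of the expected-cost calculation; all the numbers below (the 10% probability, the ticket value, the time costs) are my own illustrative assumptions, not anything taken from the example itself:

```python
# Expected-cost comparison for the speeding example.
# All numbers below are made up for illustration.

p_police = 0.10        # assumed probability that the police wait behind the turn
ticket_cost = 150.0    # assumed cost of a ticket
stop_time_cost = 20.0  # assumed cost of the time lost being pulled over
slowdown_cost = 2.0    # assumed cost of the time lost by slowing down

# Expected cost of each decision across the two versions of reality.
cost_keep_speeding = p_police * (ticket_cost + stop_time_cost)  # you pay only if caught
cost_slow_down = slowdown_cost                                  # you pay the delay for sure

best = "slow down" if cost_slow_down < cost_keep_speeding else "keep speeding"
print(f"keep speeding: expected cost {cost_keep_speeding:.2f}")
print(f"slow down:     expected cost {cost_slow_down:.2f}")
print(f"rational choice: {best}")
```

With these made-up numbers, speeding costs 17 in expectation against 2 for slowing down, so the rational choice is to slow down; shrink the probability or the ticket enough and the answer flips. The point is only that the decision requires a probability for each version of reality, not a verdict on a single claim in isolation.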

‘Wait a minute,’ you may think. ‘But many people don’t evaluate probabilities and don’t build a comprehensive cost-benefit analysis when they make decisions!’

That is correct. Many people do not consider consequences of their decisions and do not consider how probable their assumptions are. We call such decisions irrational. In virtually every decision made by humans, there is a component of irrationality. We do not have enough intellectual capacity to process all information efficiently. However, we are quite good at approximating rational decisions even when we do not explicitly think about probabilities. Moreover, it has been proven over and over again that the more irrationality in the decision-making process, the worse the results on average. An agent concerned with her wellbeing must aspire to be as rational as possible. Therefore, I am going to assume that the decisions we discuss are rational.

Consider a different example: a claim that some diseases are caused by germs. Belief in such a claim can yield significant benefits. For example, you can start washing your hands, which can extend your expected life span by a few years. Today people ascribe a very high probability to this claim. But a few centuries ago it did not even occur to them. They were acting as if they assigned zero probability to this claim.

In a rational decision-making process you need to consider alternative versions of reality and assign them probabilities. A version of reality that did not occur to you gets, by default, probability zero. Being rational means choosing the best option given the information you have. Being rational does not depend on the quality of the information you have. If you have insufficient or false information, your decision can still be rational if it is based solely on this information. People who were not washing their hands were rational; they just had insufficient information.

Let us go back to the question about a god. First, recall that we are interested in decisions and their consequences. Going to church can have different costs and benefits depending on which version of reality is true: “God exists” or “God does not exist.” A rational person concerned with her wellbeing should assign probabilities to both claims and choose the best option. The process of evaluating probabilities requires both arguments against and arguments in favor of existence of a god. If you had arguments for only one side, you would end up with probability 1 or 0. If you had no arguments whatsoever, the probability would be a half for each. 

It is impossible to square the burden-of-proof logic with rational decision making. Evaluating a single claim about the nature of reality in isolation does not make sense, because it cannot be cleanly translated into a decision. In order to make a decision, you need to evaluate at least two claims and assign them probabilities (keep in mind that the probabilities must sum to one). As a result, it turns out that the burden-of-proof logic is useless.

You can’t prove a negative

After establishing that the logic at its core is flawed, I need to debunk examples and lines of reasoning, built around this logic, that are presented by some atheists (and other skeptics). The first is the notion that it is impossible to prove a negative. As the argument goes, it is inherently hard to prove certain claims false, as they are constructed in such a way that they elude counterarguments. An often-cited example was coined by Bertrand Russell in 1952: “If I were to suggest that between the Earth and Mars there is a china teapot revolving about the sun in an elliptical orbit, nobody would be able to disprove my assertion provided I were careful to add that the teapot is too small to be revealed even by our most powerful telescopes.”

To break this argument down, consider the following scenario: your five-year-old kid comes to you, waves his stuffed kangaroo in front of your face and says: ‘Mommy, Timmy says he is hungry and wants you to put some marshmallows in his pouch.’

This is a serious decision-making challenge that you are now facing. A bad decision can have negative consequences. If you do not give Timmy marshmallows, and the claim made by your kid turns out to be true, the poor creature could suffer or even starve, making your child desolate. However, if the claim is false and you donate the marshmallows, they will surely end up in the stomach of your little love, after spending some time in the dirty and hairy interior of Timmy’s pouch. Similar decisions, in the long run, can harm your child’s health and erode your authority.
So what will you do? On the one hand you have the child’s testimony. On the other hand you have the following arguments:
  1. The claim is incompatible with the scientific worldview. You could dissect Timmy and see that there is nothing inside that could generate human speech.
  2. The general experience is that stuffed animals do not talk. There are no verified examples (except when they are stuffed with electronics, but Timmy is not).
  3. Children at this age tend to make such things up. There are a lot of verified examples of children making claims that turned out to be untrue.
  4. The kid himself may have an interest in making you believe the claim. Certainly, he is projecting his own desires onto Timmy, as marshmallows tend to be the favorite food of kids, not of stuffed kangaroos.
After performing this sophisticated analysis, you may conclude: ‘Well, I cannot prove that Timmy did not say that. I cannot prove a negative, therefore I must give marshmallows to Timmy.’

We know, however, from the previous section that this is faulty reasoning leading to an irrational decision. You cannot evaluate the kid’s claim in isolation. You have to evaluate it together with the claim “he made it up” and assign both claims probabilities. In the end, you will most likely assign a very high probability to the claim “he made it up” and act accordingly. Maybe Timmy did speak and you may never be one hundred percent sure. But for practical purposes, you have enough evidence to disprove it. The fact that you cannot “positively” prove that Timmy did not speak is irrelevant.

This example is not as ludicrous as it sounds. It is indeed very similar to the main topic of this essay. Note that supernatural religious claims have similar characteristics:
  1. They cannot be explained using known and verified laws of nature.
  2. There are no verified examples of anything supernatural ever happening.
  3. People have a natural tendency to make up supernatural claims.
  4. People often have an interest in persuading you and a tendency to engage in intellectual gymnastics to make their arguments about the supernatural look credible.
As a matter of fact, it is often impossible to prove a negative beyond any doubt (some may say that it is impossible to prove a positive beyond any doubt too). But for practical purposes it is not necessary. There are often good reasons to believe that a claim was made up, and this should be enough to force the person making the claim to produce more compelling evidence. For example, I have good reasons to believe that Bertrand Russell made up the celestial teapot. When forced to make a decision (e.g. make a bet), I will assign a very low probability to the existence of the teapot unless a more compelling argument in its favor is provided. Mr. Russell seems to be fully aware that this is how it is supposed to work, as after introducing his example he swiftly adds: “But if I were to go on to say that, since my assertion cannot be disproved, it is intolerable presumption on the part of human reason to doubt it, I should rightly be thought to be talking nonsense.”

In fact, proving a negative is very common and one does not have to come up with fancy examples to show it. Proof by contradiction is a valid logical way to prove the non-existence of an object. In mathematics, a well-defined object can be tested for the consistency of its properties. If the properties are inconsistent, the implication is that the object does not exist. Similarly, an object in physics can have a well-defined and ever-present influence on its environment. The lack of this influence implies that the object does not exist.

Null hypothesis

I sometimes hear the notion of a null hypothesis being brought up in God debates, although the context is never clear enough for me to pin the argument down. Nevertheless, it does not seem that the notion is used properly, so let me explain what it is about. You can also read more on Wikipedia.

In statistics, a null hypothesis is a set of assumptions about the available data. Statisticians use the data to calculate a certain value called a test statistic (example: the sample average). If the assumptions of the null hypothesis are satisfied, then it is mathematically proven that the test statistic has a particular probability distribution. This means that the test statistic is very likely to take on some values and less likely to take on other values (example: your null hypothesis is that men are on average 6 feet tall; it implies that your sample average is not likely to be around 7 feet). If the statistic takes a value that is considered unlikely (your sample average turned out to be 7 feet), it indicates that you should reject the null hypothesis.

However, if your statistic takes on a likely value (your average turns out to be 6.1 feet), it does not mean that you should accept the null hypothesis. Virtually always, there exist multiple different null hypotheses that could all be accepted. The sample size is never big enough to eliminate the possibility of an unexpected random deviation (even though the sample average is 6.1, it may turn out that the true average among all men is 5.9 feet instead of 6).

The null hypothesis is a tool for proving a negative. It is very similar to proof by contradiction. You start with some assumptions. If you end up with a result that is unlikely, your assumptions are likely to be wrong. But ending up with a likely result does not prove the assumptions, because even with different assumptions, you may end up with the same result just by chance.
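
To make the height example concrete, here is a minimal sketch of such a test; the sample, the 6-foot null value, and the 0.05 threshold are all illustrative assumptions of mine, not part of the original example:

```python
# One-sample t-test of the null hypothesis "men are on average 6 feet tall".
# The sample below is made up purely for illustration.
from scipy import stats

heights_ft = [6.2, 5.9, 6.4, 6.1, 6.3, 6.0, 6.2, 6.5]  # hypothetical sample
null_mean = 6.0                                         # value assumed by the null hypothesis

t_stat, p_value = stats.ttest_1samp(heights_ft, popmean=null_mean)

if p_value < 0.05:
    # The observed sample average would be very unlikely if the null were true.
    print(f"p = {p_value:.3f}: reject the null hypothesis")
else:
    # A likely result: the null is not rejected, but it is not thereby proven either.
    print(f"p = {p_value:.3f}: fail to reject the null hypothesis")
```

Note the asymmetry in the two outcomes: the test can reject the null, but it can only ever “fail to reject” it, for exactly the reason described above – many other null hypotheses would survive the same data.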

Gumballs in a jar

Wikipedia contains the following example that explains the burden-of-proof logic: there is a jar with gumballs. The number of gumballs is either even or odd. There are two claims about reality:
  1. Number of gumballs is even.
  2. Number of gumballs is odd.
If we have no information about the number of gumballs, then we have to suspend our belief in both claims until evidence for either of them is provided. This is the default position. End of the example.

The failure of this example lies in the fact that it is irrelevant, because there is no decision involved. To fix it, we can add a decision problem. Let us say that you have to guess whether the number is odd or even, and if you make a wrong guess (or refuse to guess) you will be shot in the forehead with a Magnum.

Now the situation is quite different. In the absolute absence of information about the number of gumballs, you will assign each claim a probability of one half and pick at random. If, on the other hand, there is no evidence except for somebody claiming that the number is even, you will not “disbelieve this person by default until sufficient evidence is provided.” Instead, you will choose “even” as your answer, unless you have reasons to believe that this person may be misleading you.
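
The same expected-value arithmetic as in the speeding example applies here. A minimal sketch, in which the unchallenged testimony is modeled simply as nudging the probability of “even” above one half (the 0.6 figure is my own illustrative assumption):

```python
# Gumball bet: guess "even" or "odd"; a wrong guess (or refusing to guess) is fatal,
# so the payoff of each guess is simply the probability that it is correct.

def survival_probability(p_even: float, guess: str) -> float:
    """Chance of surviving, given the probability that the count is even."""
    return p_even if guess == "even" else 1.0 - p_even

# No information at all vs. an unchallenged claim that the number is even.
for p_even in (0.5, 0.6):
    # At 0.5 the two guesses tie and either is fine; max() just returns the first.
    best_guess = max(("even", "odd"), key=lambda g: survival_probability(p_even, g))
    chance = survival_probability(p_even, best_guess)
    print(f"P(even) = {p_even}: guess {best_guess!r}, survival {chance:.0%} "
          f"(refusing to guess: 0%)")
```

Either way, refusing to pick – the supposed “default position” – is the one strictly dominated option.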

Innocent until proven guilty

Another popular example involves analogies to a court case. In this scenario, a claim about the nature of reality is compared to a claim that a person is guilty of murder. The burden of proof lies with the prosecutor to prove the guilt, and judges (or jury, depending on the legal system) have to produce a “guilty” or “not guilty” verdict (rather than “guilty” or “innocent”). “Not guilty” in this case does not mean that the person was determined to be innocent. It only means that there was insufficient evidence to ascertain guilt. In a similar way, the burden of proof should lie with a person making a claim about the nature of reality, and rejection of the claim does not automatically imply acceptance of an alternative claim.

The argument that judges (or a jury) vote “guilty – not guilty” is a play on words. Recall that the presumption of innocence is often phrased as “innocent until proven guilty.” To keep the wording consistent with the burden-of-proof logic, it should rather be phrased as “not guilty until proven guilty.” But do these words have a deep significance for the logic of the decision-making process? Or are they rooted in tradition and imply nothing beyond their literal meaning?

In fact, a court decision is often a good example of how the theory of decisions under uncertainty works. Both sides need to present their arguments. There are both a prosecutor and a defense attorney, who present different accounts of the nature of reality. Based on their arguments, the court assigns probabilities to both stories. Then, within its legal constraints, the court considers the costs and benefits of the possible decisions it can make. Finally, a ruling takes place.

The presumption of innocence indicates that courts ought to be risk-averse in their rulings. That is, the probability that the person committed the crime must be well above 50% for the person to be sentenced. This is a reflection of a simple preference of our society: we consider sentencing an innocent person to be a greater injustice than letting a criminal get away with his crimes. It is also valid for practical reasons: the information provided during a trial is hardly ever complete, and even judges are not perfectly rational, so we need a great degree of certainty before we let them decide to ruin somebody’s life. The reasons why the decision-making process in courts is structured this way are thus idiosyncratic. Using court rulings as a model for any decision is simply wrong.

Evidence for claims in science

‘Whenever scientists make a claim, they back it up with evidence. You do not often see scientists making up unbacked theories’ another argument goes.

That is true: a significant part of science is about assessing which claims about reality are true and which are false. For example, the framework of the null hypothesis discussed above is a tool of science that serves exactly that purpose. But there are still reasons why this analogy is wrong.

The concept of a god is not something new that was recently invented by a mad scientist. Theism is the incumbent idea. Similarly, the geocentric model of the Universe was the incumbent idea in medieval times. Are you saying that Copernicus could just have said ‘I do not accept the geocentric model of the Universe because it has not met its burden of proof’ and people should have been persuaded? Incumbent ideas usually have some arguments backing them, however flawed. If you want to dismantle an incumbent idea, you must provide new arguments to counter the old ones.

But dismantling an incumbent idea is not enough. People need information about the nature of reality in order to make decisions. If you point out flaws in the only existing theory without providing an alternative one, people have no choice but to still assign this theory probability 100% when they are forced to make decisions. The only way to change that is to introduce an alternative theory. And this new theory will require evidence.

The fact that science focuses so often on single claims rather than decisions has a good reason: efficiency of communication. It is much easier to discuss a single claim about the nature of reality than a decision that entails multiple claims or a single claim in context of all decisions it can be part of. Nevertheless, scientists often provide examples of decisions their work can influence, to show why it is relevant. And new theories hardly ever appear in vacuum. Virtually always, there is some sort of incumbent theory which needs to be replaced. For example, the arguments supporting germ theory were made to convince people whose previous theory was that disease was a punishment from God. To use a familiar analogy: a scientist is like a prosecutor or a defense attorney. It is not their job to make a decision; their job is to provide evidence in favor of their account of the nature of reality. 

Extraordinary claims require extraordinary evidence

For people who believe in a god, it is often as self-evident that their god exists as it was self-evident to medieval astronomers that the Sun revolved around the Earth. Imagine that you are one of those medieval astronomers. Somebody comes up to you and says that your claim about the Sun revolving around the Earth is extraordinary and you need to produce extraordinary evidence for it, otherwise your claim is unbelievable. This sounds ridiculous. Just look at the Sun.

It is all about perspective. Who decides which claims are extraordinary and what type of evidence is required? Atheists may label theist claims as “extraordinary,” but they cannot just impose this labeling on theists and expect them to accept it. If atheists want to have an actual dialog with theists, they need to explain why theist claims should be labeled “extraordinary.” This in turn entails providing and justifying an alternative vision of reality, that is, taking on the burden of proof.

There are many claims made by theists that can easily be recognized as extraordinary even by the people who make them. The communication failure happens when the rule “extraordinary claims require extraordinary evidence” is applied across the board without making sure that the other side agrees with the labeling and with what the labels imply.

The burden of proof

The decision to place the burden of proof on somebody who claims existence is arbitrary. Theists often turn this argument on its head, for example saying that we should then consider the claim that there exists a universe without a god. The typical response from atheists is that this is a cop-out. Is it?

As I mentioned before, I consider only relevant questions. Both the existence and the non-existence of a relevant object have their consequences (if there are no consequences, then the object is irrelevant by definition). The attempts to explain why we should prioritize one of these options over the other are made only with examples of irrelevant objects, such as the celestial teapot orbiting the Sun. On the contrary, people in general seem to be very interested in establishing the non-existence of relevant objects, especially if these objects could threaten their wellbeing. Before walking into a gas chamber, which of the following claims would you prefer to be proven: “Zyklon B is in the chamber” or “there is no Zyklon B in the chamber?”

The notion of default position and the burden-of-proof logical framework do not hold up when faced with practical problems. They are flawed and must be discarded. But should we abandon the notion of burden of proof entirely?

The burden of proof can be understood as a communication device. It can indicate what the presuppositions are and who should provide the arguments to counter them, as in the case of a judicial trial. In common discussions, the burden of proof often lies with the person who wants to change the beliefs of other people. In general, the burden of proof is most often on the person who challenges the incumbent idea. There is nothing wrong with that. But stating that it should always be applied in this or any other way, and making a rule of logic out of it, requires justification, not mere assertion.

Flying Spaghetti Monster

Does it all mean that we have to debunk every possible claim, no matter how ridiculous?

Let us consider the example of the Flying Spaghetti Monster (FSM hereafter). Let us imagine that the question of the existence of the FSM is relevant, that is, our decisions should be based on the true nature of reality, otherwise there is some sort of penalty: you will be sent to the FSM’s hell if it does exist and you do not observe the prescribed rites, or you will be wasting time and being obnoxious if it does not exist and you do observe the rites.

Regardless of whether the decision process happens in your head or is held in a public forum, one of the arguments can be: “The FSM does not exist because it seems to have been made up by some people in order to troll other people.” An argument in favor of the alternative point of view may be: “I had a personal revelation from the FSM that he exists.”

A person perceiving such an exchange has to consider both arguments and assign probabilities to the corresponding claims. How well a person evaluates the probabilities depends on the qualities of the person, for example how knowledgeable or intelligent she is. If a person comes to the conclusion that the first of the arguments has no merit and the second one is all that is needed, it just means that the person has either no clue how to evaluate evidence or no knowledge about contemporary societies. Discussing the claim itself with such a person is thus pointless; instead, this person probably needs to be educated in how to evaluate evidence in general, and such a discussion is recommended.

On the other hand, a person who knows how to evaluate such arguments will quickly conduct the entire reasoning in her head. It will happen in a split second upon the first time she hears about the FSM. In fact, this happens to most people. The results of this quick reasoning seem so obvious that people often forget the reasoning even took place. So, yes, unfortunately every ridiculous claim needs to be debunked. The good news is that it usually takes less than the time needed to utter it. 

Atheism

Atheism is most often defined as “a lack of belief in gods.” Definitions often underline that it is not “a belief that gods don’t exist.” The practical difference is that the latter definition includes only people who have thought about gods and concluded that they do not exist. The former also encompasses people who have never thought about gods because the notion never occurred to them, as well as people who have thought about gods but decline to state their position.

Atheists often complain that people conflate these two definitions. But the source of the complaints is oftentimes not that individuals get miscategorized, but rather the definitions’ supposed implications for the burden of proof. In the atheist worldview, the former definition shifts the burden of proof to theists. The latter definition indicates that atheists are making a positive claim, which would require evidence and would place the burden of proof on them. The question is: why do people keep confusing the two definitions?

A typical atheist neither goes to church nor observes other rituals prescribed by religions. Most of the time, atheists know that if the claim about God is correct, their wellbeing is in serious jeopardy. Nevertheless, they keep on behaving as if gods did not exist. Yet they say that they are atheists not because they think that gods do not exist. On the contrary, some of them say that, in addition to being atheists, they just happen to believe that gods do not exist.

Consider a rational person deciding whether to go to church. Under the theory of decisions under uncertainty, the person has to assign probabilities to at least two claims and evaluate the costs and benefits of the decision under each possible version of reality. Such a person would not go to church only if the probability assigned to “God does not exist” is high. Yet some atheists say that they evaluate only the “God exists” claim. They do not assign a probability to the claim “God does not exist.” How is that possible? There are a few possible explanations for this paradox:
  1. Atheism is a vapid notion that has no implications for the behavior of the people under this label.
  2. Atheists are irrational in their decisions.
  3. Atheists are disingenuous by claiming that they do not hold a certain belief and yet they consistently behave as if they did.
  4. The theory of decisions under uncertainty is wrong and cannot be applied to this situation.
Back to the topic: the current definition of atheism, focusing on beliefs, seems coherent with an alternative definition focused on behavior: “an atheist is a person who behaves as if gods did not exist.” People who consciously call themselves atheists, people who were never exposed to the notion of a god, as well as the “undeclared” tend to behave as if gods did not exist. There seems to be no need to change the definition. What needs to be changed is the silly burden-of-proof logic.

Source of confusion

The entire essay could probably be replaced with a single statement: “the claim that the default position and the burden-of-proof logical frameworks are valid has not met its burden of proof; it should thus be disbelieved by default.” However, as I am challenging an incumbent theory, I feel obliged to provide as good arguments as possible. And arguing for the new theory seems incomplete without speculating about why the previous theory was so readily accepted. I consider the reasons to be the following:
  1. It is hard to realize that considering claims in isolation does not make sense. You need to conceptually link up claims with decisions and have a working model of decision making.
  2. A lot of intellectual activity focuses on claims, especially in science. It may seem natural to use claims as a basic unit and consider them in isolation.
  3. It is not in anybody’s interest to voluntarily give up an advantage. Putting the burden of proof on theists is convenient for atheists. Maybe this is why not enough effort has been made by atheists themselves to debunk the burden-of-proof logic.
  4. As Bertrand Russell pointed out, objectively proving non-existence is in some cases impossible, especially if the object in question is irrelevant (that is, it does not have a measurable influence on the people who argue about it). Given the recent tendency of theists to roll back their claims and to make them less and less relevant, it seems natural to demand more evidence supporting such claims.
  5. Some atheists feel that arguing against the popular notion of God is similar to arguing against the Flying Spaghetti Monster. The idea just seems to them so ridiculous that no arguments against it need to be made – on the contrary, strong arguments must be made to support such an “extraordinary” claim before it is even worth considering.
I hope that by now all these reasons have been cleared up.

Consequences of confusion

The last topic I would like to cover is why this is all important.

First of all, using faulty logic is never good. The quality of the conclusions that can be drawn from the discussion at hand is not its only victim. Invalid patterns of thinking, once adopted by people, can be used in other discussions as well. Therefore, it is not just the conclusions about gods that are at stake, but the entire body of logic and all the future conclusions that could be reached for bad reasons.

The second reason is the easily observable failure of communication between atheists and theists. Theists often have a hard time trying to understand the argument about the burden of proof because it is so unintelligible. The reason for this is not that theists lack intelligence, but that the argument is wrong. As a result, it is much harder to reach an agreement. Theists are left unconvinced that what atheists are saying makes sense, and atheists are frustrated that theists do not want to accept their arguments. Such an impasse makes it easier to create hard feelings and harder to carry out a meaningful debate leading to a conclusion. In a nutshell, the use of faulty logic can impede the propagation of good ideas.

Finally, holding on to the burden-of-proof framework is highly counterproductive from the point of view of atheists themselves. Theists oftentimes leave the discussion with a feeling that atheists are disingenuous. Almost all atheists behave as if they believe that a god does not exist, yet they keep on saying that this is not their belief. This clear discrepancy between words and actions is easily noticeable, and many theists feel perplexed once exposed to it. And they cannot be blamed if in the end they conclude that atheists are selling them bullshit.

Disclaimer

I use the labels “atheists” and “theists” in the context of media personalities who have recently dominated the public sphere, mostly YouTube, as well as people who respond to these personalities. This essay does not intend to objectively represent the views of the groups that use these labels, but rather addresses some of the arguments made by some people in these groups.