Wednesday, December 16, 2015

Dick pics

I sometimes come across videos like this


or pictures like this


This is very interesting to me because it shows how people have no idea what is going on in their own brains.

Let us start with the question: why does an unwanted image of a dick cause us to have an emotional reaction (and it seems that both men and women have a similar reaction after looking at a stranger’s dick – some mixture of shame and the feeling of being intimidated)? I remember when I was talking to a male friend on Skype not so long ago. Another male friend of mine stood behind him and suddenly pulled his pants off and showed his dick to the camera. My immediate reaction was to curse and look away in disgust. The thing is that unlike many people who send dick pics, my friend knew perfectly well what he was doing – he had read The Human Zoo by Desmond Morris. I did too, but the trick nevertheless worked.

So why do dicks evoke these emotions in us? Do we learn to be scared of dicks? When and how exactly do we learn this? Does somebody tell us? Or do we need an unpleasant experience with a dick that belongs to somebody else?

There is no reason to be afraid of a dick if you have had no such experience. But people still have these emotions. So what causes them?

Another example I know is from when I was just a few years old and was playing at a riverbank with a group of female friends, all of them a few years old too. A dude with a mustache was riding a bike nearby. He stopped close to us, silently pulled down his pants and showed us his junk with a smile on his face. The girls started screaming and ran away. I probably did the same thing, although I do not remember well, as it was so long ago (by the way, none of us thinks about this as a traumatic experience now; it was quite benign; after the incident the guy rode off and we never saw him again). Of course, nobody had explained to us beforehand that this was an appropriate reaction in such a situation. But it seemed appropriate. Why?

The answer is that this is our innate instinct. We have inherited from our ancestors some types of social interaction that are guided (among other things) by genital display. People who observe primates know that genital display is a way to communicate social status in a group. Dominant male individuals show their dicks much more often than other individuals. Human brains are wired in a similar way. Seeing somebody’s dick makes you feel intimidated, and human intuition sometimes makes guys show their dicks in order to intimidate others.

A woman often feels disgust after seeing a stranger’s dick, yet she may think that his intention was to arouse her, while the man had no idea that his dick looked gross to her. A man asked why he sent a dick pic would probably say something like this: “it was a joke; I wanted to embarrass her; I like to show off my masculinity.” What is really going on is that when a man wants to intimidate a woman (or, less often, another man), his primate intuition tells him to show off his dick. The woman verbally misidentifies his intentions but emotionally responds in the intended way. The act achieves its goal. Note that the man also somewhat misidentifies what really caused him to do this. It is because he acts on his animal instinct.

So, as it turns out, unwanted dick pics are a product of our ancestral way of ensuring group cohesion through an authority structure. These mechanisms are of little use nowadays, but they still hang around aimlessly in our brains causing trouble. Moreover, the example of dick pics shows nicely how our verbal processing is disconnected from the part of the brain where we actually make decisions. Neither the man nor the woman verbally understands their own role in this situation – unless they are educated in anthropology or primatology.


Darkness in the sense of justice

People have an innate sense of justice. Our intuition tells us that if somebody did something wrong, they have to be punished. Notions of karma and the like are based on the human tendency to think that there is some cosmic justice. And people who have figured out that there is no objective justice take such justice as an ideal humans should strive for. A just world is what most of us are working towards.

So maybe before we start making decisions based on our sense of justice, it may be good to know where it comes from and whether we should trust it. And of course, as with most other intuitions, the sense of justice is a product of natural selection. It is how nature wired our emotions in order to guarantee that we cooperate, enforce social cohesion, and so on. The problem is that nature tends to implement technological trade-offs in her designs, and the things she creates are not perfect. For an easy example, just look up the recurrent laryngeal nerve, which connects the brain with the larynx but loops down into the chest for no apparent reason (an especially gross detour in a giraffe).

And here is the question I have kept on asking. Should we trust our intuitions? I would say – no. Our intuitions are the animal spirits that nature equipped us with to deal with circumstances very different from those of today. And even when the circumstances are right, the animal spirits are not guaranteed to be perfect. Like anything else designed by natural selection, they are technological trade-offs.

Following our innate sense of justice may lead to a sub-optimal design of society, and thus to more suffering than would be necessary if people behaved rationally. Rationality should be a yardstick against which we judge how efficiently our instincts help us shape society. If you want to rationally maximize social welfare, you should consider what the specific consequences of your decisions are, rather than follow your intuition. For example, it is reasonable to think that some punishment for crimes is necessary in order to deter people from committing crimes. But if such deterrence cannot be achieved, there is no rational reason to punish a person. Moreover – there may be good reasons to offer the person help in order to make them a better citizen rather than let them learn how to be a hardened criminal during their jail time.

You may cringe at the notion that some crimes should go unpunished. But this feeling is precisely the dark, irrational, revenge-seeking sense of justice that was implanted in you by nature. If you want to build a better society, you need to set these feelings aside and perform a strict cost-benefit analysis of the decisions you are facing. When you compare outcomes obtained with rationality to outcomes obtained with the human innate sense of justice, it is easy to see the darkness of our animal spirits.


Monday, December 14, 2015

Undeserved saliency of thoughts and words

It happens very often that we put great emphasis on what people think and express verbally. I can see it in philosophical texts where, for example, philosophers deliberate about what should be more important – expressed preferences or revealed preferences (see Decision Theory and Rationality by José Bermúdez, p. 64) – or while listening to people who keep on talking about their beliefs even in the absence of any decision-making problem these beliefs could influence.

My take on it is that what ultimately matters is behavior and decisions. Description of human mental processes can help us understand some aspects of human behavior but is only a part of the picture. For example, if we want to see what a person truly wants, the action and the actual choice should be taken into account rather than what the person says she wants, even, or especially, when the two are in conflict. Similarly, it does not matter what convoluted theories people come up with in order to explain how their thoughts interact with their behavior. If their behavior can be fully explained by a simpler theory, then the convoluted ones should be discarded.

But why? Why am I so eager to demean human thoughts? The reason is simple. Verbal processing and speech are devices that serve some evolutionary purposes. I do not believe that providing a perfect window into the operation of the human mind is one of these purposes. On the contrary, we know that a lot of mental processes are unconscious. The part of the brain responsible for verbal processing is not connected to all other parts of the brain that are responsible for making decisions. Therefore, we are unable to describe fully what is going on in our heads. Furthermore, there aren’t even reasons to believe that spoken words are a perfect window into the part of mind that is available to verbal processing. It may well be a dirty window obstructing the view or a distorting mirror.

To give you an analogy – imagine that human nature is a picture of a ruined city with a single nice flower in the foreground. What people say about their thoughts can give you access only to the part of the picture that has the little flower, probably seen through a distorting lens. It is not wise to draw a conclusion about what the entire picture represents based on this little image only. If you want to know what the human nature truly is, you must go beyond the verbal processing and look at the entire picture.

The best way to think about it is to ask yourself a question: what could I learn about humans (and how) if they could not speak? Or even better: how would I go about learning about an alien species that I have no clue how to communicate with and that may be different from me in every respect? If you can think about humans and analyze them as an alien species, then you are well on your way to being objective in your analysis of human nature. But if you are focused on what people think is going on in their heads – then you may be bound for a dead end.

Wednesday, November 18, 2015

Big data regression

Problem formulation

You have a dataset in which each observation is an impression of a banner ad. You have a variable indicating success (say, click or conversion) and a lot of additional information (features), all of which are coded as binary: browser type, host, URL, placement, banner format, banner id, banner keywords, hosts the viewer has seen so far, how many times the viewer has seen particular banners, when did s/he see these banners, how many times did s/he click on which banners, how did s/he behave while shopping online, current date and time, geoid information, and so on.

The question is which features increase the chance of success and which decrease it? This is an important question if you want to allocate your advertising resources efficiently. The difficulty is that the number of observations is in billions and the number of available features is in millions.

A solution

A naïve approach is to create a table which shows how many successes and how many failures occurred when a feature was present or absent. Then, you can compare the success ratio in the absence of the feature with the success ratio in its presence. If the latter is higher than the former, then the feature indicates a higher probability of success.
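To make the comparison concrete (the notation is mine, not from the original system): if s_1 and f_1 are the success and failure counts with the feature present, and s_0 and f_0 the corresponding counts with the feature absent, the naïve rule compares

$$ \hat{p}_1 = \frac{s_1}{s_1 + f_1} \qquad \text{with} \qquad \hat{p}_0 = \frac{s_0}{s_0 + f_0} $$

and flags the feature as helpful whenever the first ratio exceeds the second.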

This approach is similar to calculating simple correlation between the feature and the success indicator. And thus, it suffers from endogeneity. If a combination of two features often occurs together, say a particular host and a particular banner, and both of them seem to have high correlation with the success indicator, you do not really know whether this is the banner that drives success, the host, or both.

In order to separate the effects of features, you need to calculate partial correlation, conditional on other features, rather than simple correlation. The straightforward way to do it is to perform an ordinary least squares regression on the data. Unfortunately, there exists no software that could handle the amount of data you have. Even if you limit the dataset to the most common features – say the top 5000 – you still end up with several terabytes of data to be processed by the regression algorithm. To make things concrete, let us say that we need a way to perform a regression on n = 4 billion observations and k = 10 thousand features. If each variable takes up 4 bytes, the amount of memory required to perform such an analysis is about 160 terabytes.
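For the record, the back-of-the-envelope arithmetic behind that figure (assuming a dense matrix with 4 bytes per cell) is simply

$$ n \cdot k \cdot 4\ \text{bytes} = 4\times 10^{9} \cdot 10^{4} \cdot 4\ \text{bytes} = 1.6\times 10^{14}\ \text{bytes} \approx 160\ \text{TB}. $$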

Typically, linear least squares models are fit using an orthogonal decomposition of the data matrix. For example, R's built-in lm() uses QR decomposition. One can also use singular value decomposition. Unfortunately, these methods require all data to be kept in memory and have algorithmic complexity of O(nk²).

Alternatively, one can calculate the Gram matrix. This has algorithmic complexity of O(nk²), which can be reduced to O(np²) if the data are sparse (where p is the quadratic mean number of features per observation), and it is very easily parallelized. Another advantage is that the memory requirement for calculating the Gram matrix is only O(k²): for k = 10000 the exact amount of RAM required to keep the Gram matrix would be just under 200 MB (keep in mind that the Gram matrix is symmetric). The only problem here is that to calculate regression coefficients, it is necessary to invert the calculated Gram matrix (which is often discouraged due to inferior numerical stability and takes O(k³)). The viability of this solution thus depends on whether it is possible to do it with satisfactory numerical accuracy. As it turns out, it is.
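Spelling out the arithmetic behind the memory estimate (assuming 4-byte cells, as above, and storing only one triangle of the symmetric matrix):

$$ \frac{k(k+1)}{2}\cdot 4\ \text{bytes} = \frac{10000 \cdot 10001}{2}\cdot 4\ \text{bytes} \approx 2\times 10^{8}\ \text{bytes} \approx 200\ \text{MB}. $$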

Note that popular machine learning engines like Vowpal Wabbit are not of much use in this situation. Machine learning usually concentrates on prediction, rather than accurate estimation of model parameters. Engines like VW are in principle less accurate than OLS. They allow multi-collinearity of variables, which in turn forces the user to perform a separate data analysis in order to eliminate it in the first place. Finally, they do not allow for standard statistical inference with the model parameters.

Preliminaries

The plan was to create a C++ class able to do all operations necessary for this regression. The data were stored on a remote Linux server using Hadoop. I was planning to develop and debug my solution using Microsoft Visual Studio 2015 on my Windows 7 64-bit Dell computer (i7-4790 @ 3.6 GHz with 16 GB RAM) and then to port it to its final destination.

There were four initial things I had to take care of: (1) a way of measuring code performance, (2) a way of measuring numerical accuracy of matrix inversion, (3) C++ libraries for inverting matrices, and (4) a strategy for verifying accuracy of the entire algorithm.

Boy, was it hard to find a good way to precisely measure code execution time on Windows. Unfortunately, the usually recommended GetTickCount() Windows API function relies on the 55 Hz clock and thus has a resolution of around 18 milliseconds. Fortunately, I eventually found out about the QueryPerformanceCounter() function, whose resolution is much better.
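For reference, a minimal timing helper built around this function might look like the sketch below (illustrative, not the exact code from the project):

    #include <windows.h>

    // Returns elapsed wall-clock time in seconds between two high-resolution counter readings.
    double elapsed_seconds(const LARGE_INTEGER& start, const LARGE_INTEGER& stop)
    {
        LARGE_INTEGER freq;
        QueryPerformanceFrequency(&freq);   // counter ticks per second
        return double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
    }

    // Usage:
    //   LARGE_INTEGER t0, t1;
    //   QueryPerformanceCounter(&t0);
    //   ... code to be timed ...
    //   QueryPerformanceCounter(&t1);
    //   double s = elapsed_seconds(t0, t1);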

Next, I decided to use the following measure for numerical precision of matrix inversion. Let us say that you need to invert matrix A. You use an inversion algorithm on it which generates matrix B. If matrix B is a perfect inverse of A, then AB = I, where I is the identity matrix. Hence, I calculate matrix C = AB – I. Then, I find the element of matrix C that has the highest absolute value and call it r. This is my measure of numerical precision. In the world of infinite precision, r = 0. In the real world r < 1e-16 is perfect (I use double – a 64 bit floating point type for my calculations). r < 1e-5 is still acceptable. Otherwise there are reasons to worry.
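In code, the whole check fits in a few lines (a sketch using Armadillo, which makes an appearance below; any matrix library would do):

    #include <armadillo>

    // r = max |(A*B - I)_ij|, where B is the computed inverse of A.
    double inversion_error(const arma::mat& A, const arma::mat& B)
    {
        arma::mat C = A * B;          // should be (close to) the identity matrix
        C.diag() -= 1.0;              // C = A*B - I
        return arma::abs(C).max();    // largest absolute deviation from zero
    }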

With tools for measuring performance and accuracy, I was able to start testing libraries. I initially turned to Eigen, which was very easy to install and use with my Visual Studio. Eigen uses LU decomposition for calculating the matrix inverse and was satisfying in terms of speed and reliability – up to the point when I tried to invert a 7000x7000 matrix. Eigen kept on crashing and I could not figure out why. The second option was thus Armadillo. Armadillo did not have the same problems and worked well with bigger matrices, all the way up to 10000x10000.

As it turns out, Armadillo can take advantage of the fact that the Gram matrix is symmetric and positive-definite. The inversion is done by means of Cholesky decomposition and, after a few experiments, I realized that it is not only faster but also numerically more reliable than the LU-based method. I was able to invert a 10001x10001 matrix in 283 seconds (in a single thread) with r = 3.13e-14. The irony is that both Cholesky decomposition and matrix multiplication work in O(k³) but the latter is over twice as slow, so it takes much more time to check numerical precision than to perform the actual inversion.
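In Armadillo, taking advantage of symmetry and positive-definiteness is essentially a single call (a sketch; the error handling is illustrative):

    #include <armadillo>
    #include <stdexcept>

    // Inverts the Gram matrix using a Cholesky-based routine for
    // symmetric positive-definite matrices.
    arma::mat invert_gram(const arma::mat& G)
    {
        arma::mat Ginv;
        if (!arma::inv_sympd(Ginv, G))
            throw std::runtime_error("Gram matrix is not positive definite");
        return Ginv;
    }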

Finally, I designed a data generating process to test whether a least squares algorithm of my design can recover the parameters used to generate the data. Essentially, I created 10001 variables x_i for i = 0, 1, 2, …, 10000. x_0 = 1, always. For i > 0 we have P(x_i = 1) = 1/(3+i) = 1 – P(x_i = 0). Then, I created a vector of parameters b_i. b_0 = 0.0015 and for any non-negative integer j, b_{4j+1} = 0.0001, b_{4j+2} = 0.0002, b_{4j+3} = 0.0003, and b_{4j+4} = -0.00005. Finally, P(y = 1) = x * b, where * indicates the dot product. This is a typical linear probability model.
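A compact sketch of one draw from this data generating process (the parameter values follow the description above; everything else, including names, is illustrative):

    #include <random>
    #include <vector>

    // Draws one observation (x, y) from the linear probability model described above.
    void draw_observation(std::mt19937_64& rng, std::vector<char>& x, int& y)
    {
        const int k = 10000;
        static const std::vector<double> b = [] {
            std::vector<double> v(10001);
            v[0] = 0.0015;
            // b_{4j+4} = -0.00005, b_{4j+1} = 0.0001, b_{4j+2} = 0.0002, b_{4j+3} = 0.0003
            const double cycle[4] = { -0.00005, 0.0001, 0.0002, 0.0003 };
            for (int i = 1; i <= 10000; ++i) v[i] = cycle[i % 4];
            return v;
        }();

        std::uniform_real_distribution<double> u(0.0, 1.0);
        x.assign(k + 1, 0);
        x[0] = 1;                                                     // constant term, always present
        double p = b[0];
        for (int i = 1; i <= k; ++i)
            if (u(rng) < 1.0 / (3.0 + i)) { x[i] = 1; p += b[i]; }    // P(x_i = 1) = 1/(3+i)
        y = (u(rng) < p) ? 1 : 0;                                     // P(y = 1) = x * b
    }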

Using the formula above I generated 4 billion observations (it took 11 days on 4 out of 8 cores of my Windows machine) and fed them into the regression algorithm. The algorithm was able to recover vector b with the expected convergence rate. Note that by design the aforementioned data generating process creates variables that are independently distributed. I thus had to tweak this and that to see whether the algorithm could handle correlated features as well as to investigate the bias (see more about that in the last section).

Statistical inference

The question of how to recover model parameters from the data is simple. In addition to the Gram matrix, you need a success count vector. The i-th element of this vector indicates how many successes there were when the i-th feature was present. Calculating this vector is at most O(np) in time and requires O(k) memory (note that none of the operations involved in calculating the Gram matrix and the success count vector are floating point operations – this is all integer arithmetic, since we operate on binary variables only; thus both the Gram matrix and the success count vector are calculated with perfect numerical precision). Once you have them both, you need to invert the Gram matrix and multiply the inverse by the success count vector. The resulting vector contains the estimated model parameters.
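Put compactly, this is just the textbook least squares formula

$$ \hat{\beta} = (X^{\top}X)^{-1}\,X^{\top}y, $$

where X^T X is the Gram matrix and, because y is a 0/1 variable, X^T y is exactly the success count vector (its i-th entry counts the successes among observations with feature i present).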

However, getting standard errors of the estimated coefficients is a bit more complicated. Typically, we would take the square roots of the diagonal elements of the inverted Gram matrix and multiply them by the standard deviation of the residuals. The problem is that calculating residuals requires going through all observations all over again. This not only increases the execution time. It poses a major technical difficulty, as it requires the dataset to be invariant for the duration of the algorithm execution (which is assumed to be at least several hours). To fix this, one would have to tinker with the data flow in the entire system, which can greatly inflate the project’s costs.

Fortunately, there is a trick that can rescue us here. Instead of the quadratic mean of the residuals, one can use the standard deviation of the success variable. Note that the latter must be greater than the former: the former is the quadratic mean of the residuals for the entire model and the latter is the quadratic mean of the residuals for the model with a constant only. This guarantees that the standard errors will be overestimated, which is much better than having them underestimated or all over the place. Moreover, for a small average success ratio, the two will be close. In fact, it is easy to show that under some plausible conditions, as the average success ratio goes to zero, the two are the same in the limit. And for banner impressions, the average success ratio (e.g. CTR) is, no doubt, small.
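Spelled out in symbols (my notation), the approximation replaces the residual standard deviation in the usual OLS formula with the standard deviation of the binary outcome:

$$ \widehat{\mathrm{SE}}(\hat{\beta}_i) = \hat{\sigma}\,\sqrt{\left[(X^{\top}X)^{-1}\right]_{ii}}, \qquad \hat{\sigma} \le \sqrt{\bar{y}\,(1-\bar{y})} = \mathrm{sd}(y), $$

so using sd(y) in place of the residual standard deviation can only make the reported standard errors larger.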

No amount of theoretical divagations can replace an empirical test. It is thus necessary to check ex post whether statistical inference using the above simplifications is indeed valid. To do that, I estimate a number of models (keep in mind that I have 10000 variables) and check how frequently the true coefficients fall within the estimated 95% confidence intervals. I expect them to be there slightly more often than 95% of the time (due to the overestimation of standard errors) and indeed, this is what I find.

Finally, I cannot write a section about statistical inference without bashing p-values and t-statistics. I strongly discourage you from using them. A single number is often not enough to facilitate good judgment about the estimated coefficient. p-value typically answers a question like: “how likely is it, that the coefficient is on the opposite side of zero?” - Is this really what you want to know? The notion of statistical significance is often misleading. You can have a statistically insignificant coefficient whose confidence interval is so close to zero that any meaningful influence on the dependent variable is ruled out: you can then say that your data conclusively show that there is no influence (rather than that the data do not show that there is influence). Also, you can have a statistically significant coefficient with very high t-statistic, which is economically insignificant or economically significant but estimated very imprecisely. Thus, instead of p-values and t-statistics I suggest using confidence intervals. The question they answer is: what are the likely values of the coefficient? And this is what you actually want to know most of the time.

Data refinements

Oops. You have a nice OLS algorithm which supports valid statistical inference. You tested it with your generated data and it works fine. Now you apply it to real data and the Gram matrix does not want to invert, or it inverts with precision r > 1. You quickly realize that this is because the data contain a lot of constant, perfectly correlated, and multicollinear variables. How to deal with that?

Sure, you can force users to limit themselves only to variables which are neither perfectly correlated nor multi-collinear. But when they are using thousands of variables, it may take a lot of effort to figure it out. Also, running an algorithm for several hours only to learn that it fails because you stuffed it with a bad variable (and it does not tell you which one is bad!) simply does not seem right. Fortunately, as it turns out, all these problems can be fixed with analysis and manipulations on the already-calculated Gram matrix.

The first refinement I suggest is dropping features that are present too few times (e.g. less than 1000 times). You can find them by examining the diagonal entries of the Gram matrix. To drop a variable you can just delete the appropriate row and column from the Gram matrix as well as the corresponding entry from the success count vector. After such a delete operation, what you are left with is the same as if you had not considered the deleted variable to begin with. Clear cut.
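In Armadillo this deletion is a couple of calls (a sketch; the bookkeeping that maps remaining indices back to the original features is omitted):

    #include <armadillo>

    // Drops feature i from the Gram matrix G and the success count vector s.
    void drop_feature(arma::mat& G, arma::vec& s, arma::uword i)
    {
        G.shed_row(i);   // remove the i-th row
        G.shed_col(i);   // remove the i-th column
        s.shed_row(i);   // remove the corresponding success count
    }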

The second refinement I suggest is to drop features with not enough variability. Based on the Gram matrix and the success count vector, it is possible to construct a variability table for every feature (the same one I described as the naïve solution at the beginning of the article). This table has two rows and two columns – rows indicate whether there was a success and columns indicate whether the feature was present. Each cell contains the number of observations. So you have the number of observations that had the feature and a success, the number that had the feature but no success, the number without the feature but with a success, and the number with neither the feature nor a success. I drop features for which at least one of the four cells has a value lower than 10.
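All four cells can be read off quantities we already have. With N total observations, S total successes, G the Gram matrix, and s the success count vector (notation mine), the table for feature i is:

$$
\begin{array}{l|cc}
 & x_i = 1 & x_i = 0 \\ \hline
y = 1 & s_i & S - s_i \\
y = 0 & G_{ii} - s_i & N - G_{ii} - (S - s_i)
\end{array}
$$

If the constant term is included as feature 0, then N = G_{00} and S = s_0, so no extra pass over the data is needed.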

As we proceed to the third refinement, note that you can easily calculate the correlation between any two features based on the content of the Gram matrix. Just write out the formula for correlation and simplify it, knowing that you are dealing with binary variables, to realize that you have all the information you need in the Gram matrix. This of course allows you to identify all pairs of perfectly correlated or highly correlated variables in O(k²) time. I got rid of a variable if I saw a correlation whose absolute value exceeded .99 (using, say, .95 instead of .99 did not dramatically improve the speed or numerical precision of the algorithm).
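Carrying out that simplification (with N the total number of observations, again available as G_{00} when the constant is included) yields, for binary features:

$$ \mathrm{corr}(x_i, x_j) = \frac{N\,G_{ij} - G_{ii}\,G_{jj}}{\sqrt{G_{ii}\,(N - G_{ii})\;G_{jj}\,(N - G_{jj})}}. $$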

But now comes a biggie. How to find features that are perfectly multicollinear? One naïve approach is to try to find all triples of such variables and test them for multicollinearity, then all quadruples, quintuples, and so on. The trouble is that finding all n-tuples takes O(kⁿ) time, which is a nightmare. Alternatively, you can try to invert submatrices: if you can invert a matrix made up of the first p rows and columns of the original Gram matrix, but you cannot invert a matrix made up of the first p+1 rows and columns of the original, it surely indicates that variable number p+1 causes our Gram matrix to be singular. But this solution has a complexity of O(k⁴), which for high k may be very cumbersome. There must be a better way.

As it turns out, a better way is to perform a QR decomposition of the Gram matrix (not to be confused with the QR decomposition of the data matrix that is part of the standard linear least squares algorithm). The diagonal elements of the R matrix are of interest to us – a zero indicates that a variable is causing problems and needs to be eliminated. QR decomposition generates the same results as the “invert submatrices” algorithm described above – but it runs in O(k³). And, of course, it is good practice to check its numerical precision in a similar way to how we were checking the numerical precision of the matrix inversion algorithm.
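A sketch of this check with Armadillo (the relative tolerance is my own arbitrary choice and would need tuning in practice):

    #include <armadillo>
    #include <vector>
    #include <cmath>

    // Returns indices of variables whose diagonal entry of R is (nearly) zero,
    // i.e. variables that make the Gram matrix singular.
    std::vector<arma::uword> multicollinear_features(const arma::mat& G, double tol = 1e-8)
    {
        arma::mat Q, R;
        arma::qr(Q, R, G);                       // QR decomposition of the Gram matrix
        arma::vec d = R.diag();
        const double scale = arma::abs(d).max();
        std::vector<arma::uword> bad;
        for (arma::uword i = 0; i < d.n_elem; ++i)
            if (std::abs(d(i)) < tol * scale)    // near-zero pivot
                bad.push_back(i);
        return bad;
    }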

Finally, note that you can sort the Gram matrix using its diagonal entries. I sort it in descending order so that the features that get eliminated are always the features which occur less frequently. It is probably possible to achieve higher (or lower) numerical precision by sorting the Gram matrix; however, I have not investigated this issue extensively. I only noticed that in some instances sorting the Gram matrix in ascending order made the LU inversion algorithm fail (too high r), while sorting in descending order or not sorting did not affect the LU algorithm much.

All these operations require some effort to keep track of which variables were eliminated and why, and especially how variables in the final Gram matrix (the one undergoing inversion) map to the initial variables before the refinements. However, the results are worth the effort.

Integration and application

The task of integrating new solutions with legacy systems may be particularly hard. Fortunately, in my case, there already existed data processing routines that fed off of the same input I needed (that is a stream of observations in a sparsity supporting format – a list of “lit-up” features), as well as input generating routines that filtered original data with given observation and feature selectors.

I had a shared terminal session using Screen with the people responsible for maintaining the C++ code for analysis done on these data to date. We were able to link up my class within the current setup so that users can use the same interface to run the regression that they previously used to do other types of analyses. Later on, I had to do some code debugging to account for unexpected differences in data format, but ultimately everything went well.

The first real data fed to the algorithm had 1.16 billion observations and 5050 features. Calculating Gram matrix and success count vector took around 7 hours. Due to refinements, the number of features was reduced to 3104. Inverting matrix took just a few seconds, and the achieved precision was around 2e-7.

Pitfalls

In this final section I would like to discuss three potential problems that do not have easy solutions: variable cannibalization, bias, and causality.

It often happens that a number of available features refer to essentially the same thing. For example, you may have features that indicate a person who has not seen this banner in the past minute, 5 minutes, hour, or day. These features will be correlated and they have a clear hierarchy of implication. A user can make an attempt to run a regression using all these features, expecting that the chance of success will be a decreasing function of the number of impressions. However, the effect of a viewer who has never seen the banner will not be attributed entirely to any of the aforementioned features. Instead, it will be split among them, making the estimated coefficients hard to interpret. This is the essence of cannibalization – similar variables split the effect they are supposed to pick up and therefore none of them has the coefficient it should have (please let me know if you are aware of a better term than “cannibalization”). The simple but somewhat cumbersome remedy is to manually avoid using features with similar meanings in one regression.

Secondly, it is widely known that the linear probability model generates bias. The biased coefficients are usually closer to zero than they should be. To see why, consider a feature whose effect is to increase the probability of success by 10%. However, this feature often occurs together with another feature whose presence drives the probability of success to -25% (that is, zero). The presence of the feature in question can at best increase the probability to -15% (that is, still zero). As a result, the feature in question does not affect the outcome in some sub-population due to the negative predicted probability. Its estimated effect is thus smaller (closer to zero) than the expected 10%.

Note that the reason why linear probability model generates biased results is not because the regression algorithm is flawed but because the model specification is flawed. The P(y = 1) = x * b model equation is incorrect if x * b is smaller than zero or bigger than one because probability by definition must be between 0 and 1. Whenever x * b is outside these bounds, the coefficients end up being biased. That is, OLS correctly estimates partial correlation between independent and dependent variables, but, due to data truncation, partial correlation is not what is needed to recover the linear probability model parameters.

The resolution of this issue may go towards assuming that model specification is correct and finding ways to alleviate bias or at least towards identifying features whose coefficients may be biased. On the other hand it may be also possible to assume that the linear probability specification is incorrect and to investigate whether partial correlation is what is really needed for the decision problems the estimates are supposed to help with. I consider solving this problem an issue separate from the main topic of this article and I leave it at that.

Finally, I would like to make a note on causality. Partial correlation, as any correlation, does not imply causation. Therefore, it may turn out that a particular feature does not have a causal effect on probability of success but instead is correlated with an omitted variable which is the true cause of the change in the observable behavior. For example, one host can have a higher conversion ratio than the other. However, the reason for that may be that the advertised product is for females only. The population of females may be much smaller for the second host even though higher fraction of them buys the product. In such case the second host is actually better at selling the product (that is it is better to direct generic traffic to the second host rather than to the first one) but this information is obscured by inability to distinguish between male and female viewers. It is thus important to remember that the regression provides us only with partial correlation rather than proofs of causality.

The issue of causality is of extreme importance when we are trying to predict effects of policy (like redirecting traffic in the example above). However, when instead of policy effects, we are interested in predictions, partial correlation seems to be a sufficient tool. For example, you may want to know whether people using Internet Explorer are more likely to click on a banner, even though you do not have the ability to influence what browser they are using. In such situations establishing causality is not necessary.

:-)

Wednesday, September 30, 2015

Consciousness and morality revisited

As I am investigating the topic of morality (whether I have anything interesting to say about it is yet to be discovered), I bought “The Moral Landscape: How Science Can Determine Human Values” by Sam Harris. I was not surprised to see that on the first page, in the Introduction, Harris writes: “I will argue, however, that questions about values – about meaning, morality, and life’s larger purpose – are really questions about the well-being of conscious creatures.” I am glad to have yet another example that humans use consciousness as a property defining objects of morality.

The notion of “well-being of conscious creatures” is repeated numerous times throughout the book. On page 32, Harris explains why he chose consciousness as the basis for morality:

“Let us begin with the fact of consciousness: I think we can know, through reason alone, that consciousness is the only intelligible domain of value. What is the alternative? I invite you to try to think of a source of value that has absolutely nothing to do with the (actual or potential) experience of conscious beings. Take a moment to think about what this would entail: whatever this alternative is, it cannot affect the experience of any creature (in this life or in any other). Put this thing in a box, and what you have in that box is – it would seem, by definition – the least interesting thing in the universe.

So how much time should we spend worrying about such a transcendent source of value? I think the time I will spend typing this sentence is already too much. All other notions of value will bear some relationship to the actual or potential experience of conscious beings. So my claim that consciousness is the basis of human values and morality is not an arbitrary starting point.”

There are a couple of problems here. First of all, Harris does not present any constructive argument in favor of using consciousness as the starting point. He only says that all alternatives he can think of are either uninteresting or related to consciousness. This is an argument from ignorance, a logical fallacy, which Harris should be familiar with as an outspoken atheist. Unfortunately, Harris keeps on using arguments from ignorance in his book (see also p. 62 and p. 183).

Secondly, let us for a second consider the world of ants (rather than humans). We know that ants are social insects with complex rules of social interaction. The problem is how to design these rules in order to maximize ants’ well-being (e.g. “thou shalt not kill another ant from your nest”). Or to put it more generally, let us say we have any population of any social agents: they may be simple computer programs implemented in a cellular automaton, or super-intelligent aliens who have no characteristic that we would recognize as consciousness by any modern definition (they do not have brain tissue, they do not smile, frown, sleep, cry, nor talk). How do we go about designing optimal interaction rules for their population, i.e. how do we design their morality? If the use of consciousness is necessary, does it mean that we cannot design morality for creatures that do not have it? It seems that this is what Harris is thinking: “altruism must be (…) conscious (…) to exclude ants” (p. 92). Why not ants? It seems that if we are to solve the much more complicated problem for humans, maybe it would be a good idea to start with the much simpler problem for ants? Harris seems to think that we cannot optimize ants’ behavior but we can optimize human behavior. Why?

We already know that despite what Harris is claiming, the choice of consciousness is arbitrary. It is what evolutionary psychology dictates, and Harris tries (and fails) to rationalize this human intuition. And the question that needs to be answered in the first place is: should we follow our intuitions? Or, more precisely: why, when, and which intuitions should we follow, and which should we discard?

Monday, September 14, 2015

Consciousness and morality

In his brilliant book, ‘Consciousness and the brain,’ Stanislas Dehaene gives an overview of neurological processes that give rise to consciousness. He uses the definition of consciousness I like and, generally, I agree with everything he has to say (especially with his critique of the grotesque modern philosophical theories of consciousness towards the end of the book). But there is one exception. In chapter 7, he unfortunately delves into a morality-related topic and seems to make a tacit assumption that virtually everybody else makes as well. And this is an assumption I do not like.

The question is: ‘is infanticide morally justified?’ The quoted argument in favor is: “The fact that a being is (…) a member of species Homo sapiens, is not relevant to the wrongness of killing it; it is, rather, characteristics like rationality, autonomy, and self-consciousness (…). Infants lack these characteristics. Killing them, therefore, cannot be equated with killing normal human beings, or any other self-conscious beings.” Dehaene strongly criticizes this point of view: “Such assertions are preposterous for many reasons. (…) Although the infant mind remains a vast terra incognita, behavior, anatomy, and brain imaging can provide much information about conscious states. (…) We can now safely conclude that conscious access exists in babies as in adults (…)” (p. 236-243)

Let us take a step back to see what is going on here. We see two people arguing about whether a thing (an infant in this case) deserves protection as an object of moral behavior (let us quickly explain the notion of 'object of moral behavior': e.g. unlike humans, cockroaches are not objects of morality, so we can kill them with no remorse). Both sides tacitly agree that a thing qualifies as an object of moral behavior if it has certain characteristics of an adult human being and that it does not qualify as an object of moral behavior if it does not have these characteristics. Both sides tacitly agree that these characteristics revolve around the mental capabilities of a healthy human adult and gravitate towards something that both sides call consciousness. What the sides disagree upon is whether a particular class of things (infants) has consciousness or not. I side with Dehaene that babies have consciousness, because the philosophers whom he cites, as often happens to philosophers, seem to have no idea what they are talking about. However, it may surprise you that who I side with is actually irrelevant.

Connecting consciousness with morality seems to be very popular. As I wrote in my previous post, vegetarians often use the argument that animals are ‘conscious’ to propose that it is immoral to kill them. We can also see it in the debates about consciousness, maintaining life support for vegetative-state patients, etc. However, a person making such claims virtually never explains why consciousness is the necessary and sufficient condition to become an object of morality. And this omission goes unnoticed. Everybody in these discussions seems to be in tacit agreement that this is the way to go: you are conscious – you deserve a right to live; you are not conscious – you can be treated instrumentally. (In some other debates the question is whether the thing in question is capable of ‘suffering’ or ‘feeling’ rather than being conscious; I do not want to go too deep into the nuance here, because it is irrelevant.)

Why is consciousness the necessary and sufficient condition to become an object of morality? Why is it inherently wrong to kill a conscious being? If you read my previous essay, you probably know the answers to these questions. There is nothing objectively wrong in killing a conscious being, whatever the definition of consciousness might be. It is subjectively wrong from the point of view of the members of the Homo sapiens species, because they (we) are hardwired to perceive it as wrong. In other words, we have an evolved intuition that killing something that has consciousness is wrong because Mother Nature hardwired us not to kill each other. But she did not care to make the emotional mechanisms precise enough to spare us the ongoing confusion. Whatever works is good enough.

The battle described at the beginning of this essay can be deciphered in the following way. The two gentlemen have a certain concept (consciousness) whose perception is hardwired in their brains to activate the ‘I care’ system. The two gentlemen then argue whether a certain stimulus (a baby) should be associated with this concept (and thus activate the system). But both of them skip the question whether this setup with the concept of consciousness invoking morality makes sense at all. None of the gentlemen makes any arguments about objective – as opposed to emotional – reasons to kill or not to kill infants.

Do such objective reasons exist? It may be surprising and depressing for some people to learn that our intuitions about the world, including the dearest and most deeply held feelings, are the works of natural selection and are imprecise products of technological trade-offs that often lead us astray. Our intuitions worked well to propel us to the status of the dominant species on the planet Earth (in a sense that we technically have power to destroy virtually all other species) but they also very efficiently generate confusion when we try to understand the nature of reality. Using intuition is not a pathway to truth. Rationality is.

But if we eliminate our intuitions as a source of morality, what are we left with? Can we answer the question ‘is it okay to murder babies?’ Can we even answer the question ‘is it okay to murder another human being?’ Can we build a rational argument that is completely independent of our hardwired intuitions, is fully rational, and provides guidance for the decisions we must constantly make? Finally, if we ever encounter more intellectually advanced aliens, who do not rely on their instincts like we do, what kind of morality should we expect from them?

Maybe it is possible to answer these questions in a meaningful way. But this is a topic for a separate post. Stay tuned. 

Saturday, August 29, 2015

The mind-body problem and the hard problem of consciousness explained

Abstract: In this essay I try to reconcile the two available sources of information about consciousness. One source is science, which gives us insights into how brains work. The other source is introspection and the opinions expressed by other people talking about their subjective experience. The hard problem of consciousness can be understood as an inconsistency between the information provided by these two sources. I propose a reductionist theory based on evolutionary psychology that accounts for subjective experience and explains why introspection yields thoughts that are seemingly incompatible with a fully reductionist point of view. By incorporating the thoughts of individuals and their opinions about the nature of consciousness into the reductionist theory, I reduce the set of unexplained or inconsistent observations, on which the hard problem of consciousness is based, to null. Given that there are no unexplained observations left, regardless of whether they are objective or subjective, the hard problem of consciousness appears to be solved.


Introduction

People are baffled by the notion of consciousness. There have been many theories of why consciousness arises and none of them seems to be satisfying. Dualism (e.g. the mind is immaterial and somehow communicates with the brain) is just pure speculation which offers no explanation in a scientific sense. Other theories, like panpsychism (every single molecule has consciousness, but the more complicated the system, the more consciousness it has), are not falsifiable and offer no predictive power. Finally, full reductionism (the mind is a product of a physical brain) fails to address the reason why we are able to distinguish between subjective experience and the objective world.

People discuss consciousness, the mind-body problem, and the existence of qualia – the discussion itself is an objective fact about reality. Electrical and chemical impulses originating in their brains make the muscles in their mouths and throats contract so that the corresponding statements are uttered. It is thus an objective fact about reality that there is something in the brains of these people that causes them to perceive consciousness or qualia as something that cannot be explained by material science. What are the origins of these electrochemical signals? Is there some mysterious soul that, through a yet unidentified physical mechanism, facilitates the transfer of information between the realm of the spiritual and the realm of the material? Or do the mental properties of elementary particles in a brain unite in some mysterious way to influence the neurons responsible for the perception of consciousness? Or, finally, maybe there is a fully reductionist explanation of why these neurons get activated and why our brains produce behavioral outcomes like the writings of Rene Descartes?

It is bizarre. The best model of the Universe that science has been able to come up with so far, the model that is being constantly positively verified, yields good predictions, and helps us solve practical problems, is materialistic and reductionist. All objective evidence points to the fact that a mind is a product of a brain and nothing else. The brain has not been fully understood yet. Neither has the mind. So maybe understanding the brain will let us understand the mind? It is bizarre that philosophers are often so quick to reject the notion that once we fully understand the physical brain we will fully understand human experience. Why is that?

A solution

To sketch a quick explanation, let me first focus attention on what the problem is. When you think about other humans as humans, or when you think about your own thoughts, there is something you perceive (the common notion of consciousness). But you no longer perceive this thing when you think about brain tissue, firing neurons, the electrical circuits of a robot, and such. The perception of it when you think about humans as a whole, and the failure to perceive it when you think about them in a reductionist way (biological tissue), is what causes the dissonance in your brain. Your brain perceives this dissonance and makes your mouth utter a statement like: ‘The really hard problem of consciousness is the problem of experience. When we think and perceive there is a whir of information processing, but there is also a subjective aspect.’ The notion of “information processing” (the reductionist part) does not cause the feeling that you have about yourself when you ponder the phenomenon of your own thought. This is why it seems that a thought must be something more. This is what the problem was during the time of Descartes and this is what the problem still is as discussed by modern philosophers talking about the hard problem of consciousness and the existence of qualia.

To solve the hard problem of consciousness we need to explain why people perceive consciousness in some things and not in others. The answer lies in evolutionary psychology. Every healthy human being has a system in their brain that is supposed to detect minds. This system is activated when we think about sentient beings but is not activated when we think about meat or a piece of silicon. The reason why we have this system is to facilitate our social interactions. We need to recognize other humans so that we can feel sympathy or compassion, and help them when they are hurt, while not having the same feelings towards a broken cardboard box.

Evolutionary psychology explains

A human brain has a number of systems that allow for recognition of objects that are important from the evolutionary point of view. Recognition of such salient objects usually facilitates specific behavior which is a response increasing genes’ probability of survival. When you see a naked body of a potential sexual partner you are inclined to get closer. When you see (or smell) a rotting carcass of a rat, you are inclined to move away. When you see a baby, you feel a need to care about it. And so on.

The perception of a salient object often generates a feeling. This feeling can be overcome under some circumstances. It cannot be treated as the sole determinant of human behavior; it is just guidance. There are other factors that can be more important (e.g. there is a lion between me and the naked body of my lover). But the important fact is that detection of salient objects happens outside of our thoughts – we cannot choose to be attracted by a carcass instead of being repelled. We can at most ignore the feeling of repulsion.

Another important fact is that systems for recognition of salient objects are very crude and prone to mistakes. It is probably very hard to precisely shape the structure of a brain by genes. There is an engineering tradeoff involved – our brains are not designed to be perfect. They are designed to do their job in most situations at a moderate cost.

One of the examples of such crudeness is recognition of a sexual partner. It is very common both for humans and in the rest of animal kingdom to get aroused by things that are far from actual potential sexual partners. It is easy to find on the Internet a picture of a tortoise having sex with a stone or a dog humping his owner’s leg. It is also very easy to find a lot of pornography. Pornography allows humans to get aroused by a pattern of colorful dots rather than by the presence of an actual sexual partner.

Similar phenomena occur with respect to perceived cuteness. It is quite simple to discern what facial features activate a mental system that was most likely intended for the recognition of human infants (cartoonists know it too well). Seeing something with big eyes and a little nose mounted on a big head makes us feel like we need to care about it. But it is not only infants that have these features. Most of young mammals have them which is why we perceive kittens and puppies as cute. From the evolutionary point of view, this may be a mistake – we are misidentifying a salient object and misallocating our resources (as we waste time on caring for a puppy instead of a member of our own species, say a relative). But it is not easy to determine whether this phenomenon is really a bug or a feature – there is not only an engineering tradeoff involved in making the recognition mechanism crude. There may be also some yet-to-be-discovered benefits from making such “mistakes.”

The mind as a salient object

The toolbox our brains are equipped with contains a lot of systems that guide our behavior when we interact with other humans. For example, there is a subsystem responsible for basic moral behavior. It takes a great deal of emotional effort to kill somebody you consider to be a fellow human being. There is a feeling of compassion you feel towards somebody in need. There are feelings of unfairness that we have when others do not reciprocate. And so on. All these emotional responses are culturally universal and have simple evolutionary explanations.

Is there any other evidence that such a subsystem exists? Let us consider genocide. Something that happens very often before genocide is a process of dehumanization of the group that will be subject to extermination. In the brains of the perpetrators, dehumanization effectively disconnects the system for recognition of fellow human beings from the input generated by the victimized group. You no longer perceive the victims as humans – you now perceive them as cockroaches or rats, which does not activate your moral intuition and makes them worthy of extermination.

When you watch debates or read articles about consciousness, you may notice that people often intuitively make a connection between consciousness and moral behavior. And it is not only vegetarians that refuse to eat “sentient beings.” People who otherwise eat meat sometimes say that since octopi, elephants, dolphins, and monkeys have been identified as self-aware (they can recognize themselves in a mirror), it is immoral to eat them. The question of what is conscious and what is not is important precisely because conscious beings seem to require moral treatment, while beings that are not conscious are just things that can be dealt with without as much respect. Note that lack of consciousness at the time of committing a crime is often a condition for a more lenient verdict.

Like other systems for recognition of salient objects, the system for recognizing minds (or consciousness / sentience / souls / feelings / subjective experience) is not perfect. It can be overridden (as we saw in the example of genocide) and it can be activated by the stimuli it was not likely intended for. Such a misidentification can yield a decision that results in misallocation of resources and could be detrimental not only from the evolutionary point of view but even from the viewpoint of subjective wellbeing of the individual.

There are a lot of examples showing the system for recognizing minds being activated by a mistake, when there is no actual human mind present. As I already mentioned, it is often activated by animals, which leads people to refuse to eat them and chastise those who do. It is activated by fetuses, which leads people to strongly oppose abortion and demonize those who do not. It is activated by inanimate objects and natural occurrences in people who believe in spirits and worship gods (and trade favors with them by making offerings, including human sacrifice). And finally, some people make a connection between this system and the entire Universe, which leads to panpsychism.

The existence of such a system can be potentially empirically verified. It implies that there is a pattern of neurological activity that occurs when a human detects a mind. There is also a pattern corresponding to at least some moral judgements which is then tied to identifying the subject of a judgement as a mind.

What is consciousness?

The main point of this essay is that there is no evidence, whether subjective or objective, that leads to a conclusion that mind is something more than a product of material brain, given the laws of physics as we know them. All the subjective “evidence” against reductionism can be explained by reductionism and there are no arguments left in favor of other theories.

Reductionism explains such “evidence” by showing that it is created as a result of cognitive dissonance caused by the system for detecting minds which gets activated while thinking about humans holistically or performing introspection, and is not activated while thinking about nerve tissue. The hard problem of consciousness has thus no origins in reality but is a product of inadequate perception abilities of the human brain. In other words, the question is not whether the thing exists but why do we have an illusion that it does. Similarly, when you are visiting a psychiatric ward, you do not find yourself concerned whether one of the patients really is Napoleon Bonaparte. You are more likely to analyze WHY he thinks so. The thought that all healthy humans have some mental deficits leading to delusions might seem depressing but on the other hand we cannot just start off with an assumption that we are perfect.

Having addressed the main point of this essay, an interesting follow-up question arises: does consciousness even exist? Before discussing existence one must have a good definition of the thing in question. A good definition identifies a phenomenon by its properties and enables us to objectively discern whether a phenomenon satisfies the properties or not. In other words, a good definition should let us objectively determine whether something is conscious or not. A good definition should also be as close to the common use of the word as possible. Most of the available definitions describe consciousness by vague associations with notions like sentience, awareness, subjectivity, or the ability to feel, and they fail to satisfy these criteria.

The word “consciousness” is commonly used to describe phenomena which activate the mind detection system in a brain of the person who speaks about it. Unfortunately, for different people, different objects activate this system which causes a lot of confusion. Despite trying hard I was unable to come up with a reductionist definition that would be based on this approach. Hence, a different approach is needed.

Another common way of talking about consciousness is when we contrast conscious mental processes with unconscious mental processes. The former are what we can speak about after investigating our own minds through introspection, and the latter seem to be “given” to us in the form of feelings or information, for example whom the faces we see belong to and whether they are attractive.

Hence, I espouse the following definition of consciousness: “Human consciousness is a collection of mental processes that can be remembered and talked about later on by a healthy human being.” This particular definition seems to be useful as it allows us to empirically verify what the content of human consciousness is and what it is not. It relates to what we care about – the experience we can share with each other, the experience perceived as the content of the Cartesian theater. It is what the mind detection system is most likely supposed to detect. Finally, a similar version of it has already been used by reputable scientists.

What about animals then? Do they have consciousness? Well, it depends on what scientific question we are asking. We are surely not interested in knowing if animals possess the imaginary and emotionally loaded quality whose illusion is caused by a failure of the mind detection system in our brains. We know that animals are aware (that is they have a working model of the environment in their brains) and often self-aware (that is they have a working model of their bodies and maybe even their own thoughts). But we don’t have to give these phenomena a new name. Word “consciousness” seems to be interesting from the perspective of humans willing to investigate what they are able to communicate about, but it does not have any obvious application in case of other animals, since they cannot speak.

Let us now think about a philosophical zombie, a creature imagined by philosophers, a creature that behaves exactly like a human but does not have subjective experience – a very sophisticated robot. Consciousness does not exist in a sense that there is something mysterious that differentiates a philosophical zombie from a human being. As a matter of fact, the notion of a philosophical zombie was engineered as a human being who does not activate the mind-recognition system. The difference between a human and a philosophical zombie is in subjective perception, not in the objective properties of the objects being studied.

Subjective experience

Finally, let me address the notions of subjective experience and qualia, which seem so real and yet reductionism seems to deny their existence. Do I claim that they do not exist? On the contrary. Subjective experience exists and manifests itself as, and only as, a state of a brain. Subjective experience is a pattern of neural activity. Qualia correspond to various states of the physical brain. Knowing the configuration of a brain perfectly is enough to fully understand human experience and to predict with the highest possible accuracy (subject to quantum uncertainties) how a person is going to react to a particular experience.

The current state of neuroscience does not allow modeling precisely the neural machinery that gives rise to our thoughts. I can only speculate that, for example, there may be a pattern of neural activity that gets activated when you see red color. There also may be a pattern that gets activated when you want to say what you see. Neurons in the part of the brain responsible for speech detect that these two patterns have been activated and they send a signal to your mouth. You say ‘I see red.’

There may be a pattern that gets activated when you perceive a mind. It gets activated when you think about your own perception of red color. But it does not get activated when you think about your own brain tissue. The patterns associated with these two phenomena get activated due to different inputs and cannot be merged. Failure to merge them is recognized and a pattern corresponding to detection of cognitive dissonance emerges. This pattern in turn makes your mouth say: ‘My subjective experience cannot be explained by reductionism.’


Note: Special thanks to Kamil Faber and Julia Madajczak for helping me make this essay slightly less unintelligible.