# Universal laws are causal inference

Edit: It has come to my attention that I did a terrible job of explaining this. I think it’s very important, but the explanation needs to be improved.

OK, the title might be a bit of an exaggeration, but it’s an effective way of summarizing an amazing piece of insight I’ve been thinking about lately.

Suppose you have a causal system. For simplicity, we’ll say that it contains two variables, A and B. Being a causal system means that one of the variables might affect the other. But how can we tell, from pure observation, which variable is the cause and which is the effect? That should be impossible, right?

For instance, if A and B each have a variance of 1, and their correlation is 0.5, then that can either be due to the rule:

```
A ~ N(0, 1)
B ~ 0.5 * A + N(0, 0.75)
```

(which is to say, B is determined by a combination of A and random noise; N(µ, σ²) denotes the normal distribution with mean µ and variance σ²)

Or it can be due to the rule:

```
B ~ N(0, 1)
A ~ 0.5 * B + N(0, 0.75)
```

(where A is determined by a combination of B and random noise…)
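As a quick check that the two rules really are observationally equivalent, here is a minimal sketch, assuming NumPy (the function names are made up for illustration). It computes the covariance matrix each rule implies and then draws a large sample from the first rule. Note that NumPy’s `normal` takes a standard deviation, so a noise variance of 0.75 becomes `np.sqrt(0.75)`.

```python
import numpy as np

def cov_a_causes_b(coef, noise_a, noise_b):
    # A ~ N(0, noise_a), B = coef * A + N(0, noise_b)
    var_a = noise_a
    cov_ab = coef * var_a
    var_b = coef**2 * var_a + noise_b
    return np.array([[var_a, cov_ab], [cov_ab, var_b]])

def cov_b_causes_a(coef, noise_b, noise_a):
    # B ~ N(0, noise_b), A = coef * B + N(0, noise_a)
    var_b = noise_b
    cov_ab = coef * var_b
    var_a = coef**2 * var_b + noise_a
    return np.array([[var_a, cov_ab], [cov_ab, var_b]])

forward = cov_a_causes_b(0.5, 1.0, 0.75)
backward = cov_b_causes_a(0.5, 1.0, 0.75)
print(np.allclose(forward, backward))  # True: both are [[1, 0.5], [0.5, 1]]

# Sampled check of the first rule; normal() takes a standard deviation, not a variance
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200_000)
b = 0.5 * a + rng.normal(0.0, np.sqrt(0.75), 200_000)
sample_cov = np.cov(a, b)  # close to [[1, 0.5], [0.5, 1]], up to sampling noise
print(sample_cov.round(2))
```

Both rules imply Var(A) = Var(B) = 1 and Cov(A, B) = 0.5, so no amount of data from this single situation can tell them apart.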

These rules give the same observational data, yet they are literally opposites, which poses a problem for causal inference. There are ways of doing causal inference anyway, such as experiments, instrumental variables, and theory, but in many cases these are far too expensive or difficult. Is there an easier way?

## Detour: Some coefficients are unstable across contexts

One of the main cases where you will see this discussed is in genetics. Within genetics, one has what is known as the heritability coefficient, h², which is generally understood as a quantity that describes how much genetic influence there is on a trait. It is defined as the fraction of variance caused by genetics.

But because heritability is defined as a “fraction of variance”, nongenetic factors that increase variance will decrease the heritability. For instance, if you have a number of plants, and you place some of them in good conditions and some in bad conditions, then the growth of the plants will be less heritable than if they had all been placed in the same conditions, as there is now extra variance due to the environmental condition. If, on the other hand, the heritability had been unstandardized, that is, if one had talked about just the genetic variance in growth rather than the fraction of variance in growth, then the condition might not reduce the heritability.

(… or it might. If there are gene-environment interactions or other nonlinearities, as there likely are, then it could very well also affect the heritability. But we will ignore nonlinearities here.)
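To put numbers on the plant example, here is a minimal sketch that treats heritability as genetic variance divided by total variance, under the same additive, no-interaction simplification. The variance values are hypothetical and chosen only for illustration.

```python
def heritability(var_genetic, var_environment):
    # Fraction of total variance that is genetic, assuming an additive model
    # with no gene-environment interactions (the simplification used above).
    return var_genetic / (var_genetic + var_environment)

var_g = 1.0     # hypothetical genetic variance in plant growth
var_e = 1.0     # hypothetical environmental variance when all plants share conditions
var_cond = 1.0  # hypothetical extra variance from splitting plants into good vs bad conditions

h2_same = heritability(var_g, var_e)              # 0.5
h2_mixed = heritability(var_g, var_e + var_cond)  # about 0.333
print(h2_same, round(h2_mixed, 3))
# The unstandardized genetic variance is var_g = 1.0 in both setups.
```

Only the environmental part of the denominator changed, yet the standardized coefficient dropped from 0.5 to about 0.33.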

So one way that standardizing makes coefficients unstable is that it allows downstream conditions to affect the coefficient. To tie this into our previous examples with A and B: even if A causes B with a consistent coefficient of 0.5, the correlation between them is going to vary depending on the noise in B. In the previous example, the causal coefficient matched the correlation coefficient, but if B’s noise variance had been 0, the correlation coefficient would be 1, while if B’s noise variance had been 1, the correlation coefficient would be about 0.45.

Another way that standardizing makes coefficients unstable is that it introduces a dependence on upstream conditions. For instance, genetic variance can be lowered by inbreeding, population bottlenecks, an absence of assortative mating, and more. Or, in terms of the A/B example, if A has lower variance, then the correlation will be lower.
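Both effects can be computed directly for the A/B example. The sketch below uses plain Python (`corr_a_causes_b` is a hypothetical helper) and the A->B rule with Var(A) = `var_a` and noise variance `noise_b` on B, for which Cov(A, B) = coef · Var(A) and Var(B) = coef² · Var(A) + `noise_b`.

```python
import math

def corr_a_causes_b(coef, var_a, noise_b):
    # Under A -> B: Cov(A, B) = coef * Var(A), Var(B) = coef^2 * Var(A) + noise_b
    cov_ab = coef * var_a
    var_b = coef**2 * var_a + noise_b
    return cov_ab / math.sqrt(var_a * var_b)

# Downstream: more noise in B lowers the correlation (coefficient fixed at 0.5)
for noise_b in (0.0, 0.75, 1.0):
    print("noise_b =", noise_b, "corr =", round(corr_a_causes_b(0.5, 1.0, noise_b), 3))
# corr = 1.0, 0.5, 0.447

# Upstream: less variance in A also lowers the correlation
for var_a in (1.0, 0.25):
    print("var_a =", var_a, "corr =", round(corr_a_causes_b(0.5, var_a, 0.75), 3))
# corr = 0.5, 0.277
```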

## The core asymmetry

Notice an important thing in the above: if A causes B, then variance in A will increase the correlation, while variance in B will decrease the correlation. That’s an asymmetry between A and B! Exactly what is needed for causal inference.

Just to hammer it home, here are the covariance matrices for the two causal relations, each with two different noise variances for A and for B. (Figure, top: a structural equation model diagram showing the relationships between the variables; eA and eB are the noise terms, and a and b are their variances. Bottom: the covariance matrices implied by different values of a and b.)

Despite the causal effect being the same in each case, the covariance matrix ends up differing because of the different noise variances. And because of the asymmetry between A and B across these covariance matrices, only this linear causal relationship, with a single fixed coefficient, fits all of them; the one in the opposite direction does not.
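Here is a sketch of the same idea in numbers, assuming the A->B model with coefficient 0.5 and a small hypothetical grid of noise variances a and b. For each implied covariance matrix, it computes the coefficient each direction would have to use: Cov(A, B)/Var(A) for A->B and Cov(A, B)/Var(B) for B->A.

```python
import itertools
import numpy as np

COEF = 0.5  # causal effect of A on B, held fixed in every case

def cov_a_causes_b(a, b, coef=COEF):
    # a: variance of A's noise term eA; b: variance of B's noise term eB
    return np.array([[a, coef * a], [coef * a, coef**2 * a + b]])

rows = []
for a, b in itertools.product((1.0, 2.0), (0.75, 1.5)):
    cov = cov_a_causes_b(a, b)
    coef_if_a_to_b = cov[0, 1] / cov[0, 0]  # Cov(A, B) / Var(A)
    coef_if_b_to_a = cov[0, 1] / cov[1, 1]  # Cov(A, B) / Var(B)
    rows.append((a, b, coef_if_a_to_b, coef_if_b_to_a))
    print(f"a={a}, b={b}: A->B coef={coef_if_a_to_b:.3f}, B->A coef={coef_if_b_to_a:.3f}")
```

The A->B coefficient is 0.5 in every case, while the B->A coefficient moves around (0.5, about 0.29, 0.8, and 0.5). The two cases where it happens to agree, a = 1, b = 0.75 and a = 2, b = 1.5, share the same ratio b/a, which hints at why the situations need to differ in their relative variances and not just in overall scale.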

Or, to illustrate it another way, I can generate datasets for each causal direction with differing variances. In the figure, each circle represents an infinitely large (N = ∞) dataset generated by one of the previously described causal models, with blue circles generated by the A->B model and orange circles generated by the B->A model. The Y coordinate shows the correlation between the two variables in the dataset, and the X coordinate shows the relative amount of variance in A and B.

As you can see, while different causal models can overlap observationally, they trace out different curves of possible observational data in the space of covariance structures.
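The curves can also be sketched analytically. Assume the X axis is the ratio of the noise variances a/b (a guess at what “relative amount of variance” means, not necessarily what the figure used) and a coefficient of 0.5 in both directions. Then the A->B correlation is 0.5/√(0.25 + b/a) and the B->A correlation is 0.5/√(0.25 + a/b):

```python
import numpy as np

COEF = 0.5  # same coefficient in both causal directions

def corr_a_to_b(ratio, coef=COEF):
    # A ~ N(0, a), B = coef * A + N(0, b), with ratio = a / b
    return coef / np.sqrt(coef**2 + 1.0 / ratio)

def corr_b_to_a(ratio, coef=COEF):
    # B ~ N(0, b), A = coef * B + N(0, a), with ratio = a / b
    return coef / np.sqrt(coef**2 + ratio)

ratios = np.logspace(-3, 3, 7, base=2.0)  # a / b from 1/8 up to 8
fwd = corr_a_to_b(ratios)
bwd = corr_b_to_a(ratios)
for r, f, g in zip(ratios, fwd, bwd):
    print(f"a/b = {r:6.3f}   A->B corr = {f:.3f}   B->A corr = {g:.3f}")
```

The A->B curve rises with a/b while the B->A curve falls; they are mirror images of each other and cross at a/b = 1. Any single correlation value is reachable by both models, but the curves themselves differ.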

## Automagic causal inference

Now this is all well and good, but in reality any dataset is only going to have one noise variance for each of A and B, so how is this useful? This is where the “universal laws” part of the title comes in: if one can make one’s theory describe multiple distinct situations, then one can embed the variables A = A'(x), B = B'(x) into a larger family of situations indexed by x, and require the same theory to apply for each x.

In that case, as long as the situations are sufficiently distinct, simply fitting the theory successfully gives you much greater confidence in its causal validity than you would have in a standard analysis of only one situation. (The situations need to be sufficiently distinct, as otherwise a wrong theory might fit all of them through sheer luck.) This is because it’s easy for a wrong theory to accidentally fit a single situation, but hard for it to fit multiple situations.
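A minimal simulation of this idea, under stated assumptions: four hypothetical situations, each with its own noise variances but the same law B = 0.5 · A + noise. For each direction, the sketch estimates the coefficient that direction would need in each situation. Scoring a direction by how much its coefficients disagree across situations is a crude heuristic chosen for illustration; a proper analysis would more likely fit a multi-group structural equation model with the coefficient constrained to be equal across groups.

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_COEF = 0.5  # the "universal law": B = 0.5 * A + noise in every situation
N = 50_000       # sample size per situation

# Hypothetical situations x, each with its own noise variances (a_x, b_x)
situations = [(1.0, 0.75), (2.0, 0.5), (0.5, 2.0), (1.5, 1.5)]

coefs_a_to_b, coefs_b_to_a = [], []
for a_var, b_var in situations:
    a = rng.normal(0.0, np.sqrt(a_var), N)  # normal() takes a standard deviation
    b = TRUE_COEF * a + rng.normal(0.0, np.sqrt(b_var), N)
    cov = np.cov(a, b)
    coefs_a_to_b.append(cov[0, 1] / cov[0, 0])  # coefficient a model A -> B needs here
    coefs_b_to_a.append(cov[0, 1] / cov[1, 1])  # coefficient a model B -> A needs here

# A universal law must use one coefficient everywhere, so (crudely) prefer the
# direction whose per-situation coefficients disagree the least.
spread_a_to_b = np.std(coefs_a_to_b)
spread_b_to_a = np.std(coefs_b_to_a)
print("A->B spread:", round(float(spread_a_to_b), 4))
print("B->A spread:", round(float(spread_b_to_a), 4))
print("preferred:", "A->B" if spread_a_to_b < spread_b_to_a else "B->A")
```

With these settings the A->B coefficients all land near 0.5, while the population values of the B->A coefficients range from roughly 0.12 to 1.0, so the wrong direction cannot be squeezed into a single law.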

To give some examples of how that might work:

• You might wonder if people support some specific political policy because they believe it is beneficial to them. In that case, you could generalize and look at policies in general, examining whether there is support for the general theory that people support policies if they believe they benefit them.
• You might wonder how people answer a personality item, what influence factors like desirability, memories, actual applicability, etc., have on their response. In that case, you can consider the general theory of how people answer personality items.
• You might wonder what factors go into creating some kink. Is it porn depicting the kink, traumas surrounding the kink, taboo, etc.? You might also wonder how the kink influences behavior, and in particular whether there is some mediation going on, e.g. does fantasizing increase the likelihood of acting on it? In that case, rather than considering the specific kink, one could consider a general theory of kinks, as this then allows performing causal inference over these questions.
• And, particularly relevant for this blog, you might wonder what makes some people wish to be the opposite sex and what makes some people happy with their sex. Here, again, one could embed it into a general theory of how people’s desires to be something specific work.

These are just some beginning examples I’ve thought of, because they are relevant to the topics I’m researching. I would not be surprised if there are analogous examples for other topics, considering how many examples turn up everywhere I look.

(Uhm, though there is one major complication: All of the examples I gave are in the domain of psychology, where measurement error is rampant and correlated, data is ordinal rather than interval/ratio, constructs are dubious and generalizability is unlikely. So it’s pretty relevant to look into how big of a problem these things will be; this is something I’m currently examining in simulations, and I will look at it more in the future.)
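As one illustration of why measurement error matters here, consider a sketch in which A is measured with independent error of a fixed, hypothetical variance in every situation. The standard attenuation formula then gives an observed A->B coefficient of 0.5 · Var(A)/(Var(A) + error variance), which changes whenever Var(A) changes, so even the true causal direction would no longer look invariant across situations:

```python
COEF = 0.5      # true effect of A on B
MEAS_ERR = 0.5  # hypothetical measurement-error variance on A, the same in every situation

def observed_coef_a_to_b(var_a, coef=COEF, meas_err=MEAS_ERR):
    # A is observed as A + error, with the error independent of everything else, so
    # Cov(A_obs, B) = coef * var_a but Var(A_obs) = var_a + meas_err (attenuation).
    return coef * var_a / (var_a + meas_err)

for var_a in (0.5, 1.0, 2.0):
    print("Var(A) =", var_a, "observed A->B coef =", round(observed_coef_a_to_b(var_a), 3))
# observed coef = 0.25, 0.333, 0.4, although the true coefficient is 0.5 throughout
```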

What’s interesting to me is that, compared to all the other methods of causal inference, this method seems extremely… easy? You don’t need a good instrument, and you don’t need to carefully look at conditional independences; you just need to look at generalities. And considering how important theory is for doing anything practical, you need to look at generalities anyway, so this isn’t necessarily a big restriction. So this is likely a method I will look into more, to understand it better.