Autism as a sparsity prior

[Epistemic status: Speculative. Testing these ideas quantitatively seems hard.]
[Writing status: I have no idea whether this is even halfway comprehensible. I’m so deeply stuck in decision theory, causal inference and psychiatry that I’m not sure what is obvious to others and what needs further explanation.]

I’m autistic, but I’m also kind of confused about what autism even is. Apparently in some contexts I come off as being very autistic, while in other contexts I don’t come off as autistic at all. However, I’ve got an idea that helps me make sense of some things:

Autism is supposedly “characterized by difficulties with social interaction and communication, and by restricted and repetitive behavior”, and tends to involve rigidity and obsession. A common approach seems to place it at one end of a spectrum of mechanistic vs mentalistic thinking, with schizophrenia at the other end; this approach is summarized by Scott Alexander in his blog post on the Diametrical Model Of Autism And Schizophrenia.

I’m building on that model, but I have some thoughts or concerns about it. It relies on the idea of “mechanistic” thinking (or in related models, “systematizing”, “if-then-else” thinking). But what is that? And why would it “trade off” against mentalistic thinking? (I think I have a reasonably good understanding of what mentalistic thinking is, namely as an agency prior. More on that later.) This is where my idea comes in.

Mechanistic thinking as a sparsity prior…

… would probably have been a more accurate title to the blog post, but that wouldn’t make the connection to autism as clear. To understand what this means, we need to consider what a prior is more generally, and what a sparsity prior is specifically.

“Prior” is shorthand for “prior distribution”; it’s a mathematical object used in Bayesian statistics which describes the theory through which one interprets the data one encounters. It turns out that you can’t interpret data without theory, no matter what you do; even something as simple as extrapolating from the past into the future relies on the theory that the future will resemble the past. The prior formalizes exactly what assumptions you are making.

A sparsity prior is, in a sense, a formalization of the law of parsimony. Sparsity priors assert that it is more likely for a system to work in a simple way than in a complex way. The overwhelming majority of ways a system can work have “everything interacting with everything” (well, rather, most things interacting with most things); sparsity priors assert that this is implausible because there are so many interactions, and so they rule out the overwhelming majority of possible explanations.
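To make the counting argument concrete, here is a minimal Python sketch. It assumes (my choice of formalization, not something the post specifies) that a “way a system can work” is a directed graph saying which of `n_vars` variables directly influences which, and that a sparsity prior can be modelled as each possible influence being present independently with a small probability `p`. The function names `fraction_sparse` and `log_sparsity_prior` are hypothetical.

```python
from math import comb, log


def n_possible_edges(n_vars: int) -> int:
    """Number of possible directed influences between distinct variables."""
    return n_vars * (n_vars - 1)


def fraction_sparse(n_vars: int, max_edges: int) -> float:
    """Share of all interaction graphs on n_vars variables with at most max_edges edges."""
    m = n_possible_edges(n_vars)
    sparse_count = sum(comb(m, k) for k in range(max_edges + 1))
    return sparse_count / 2 ** m


def log_sparsity_prior(n_edges: int, n_vars: int, p: float) -> float:
    """Log prior probability of one specific graph with n_edges edges, assuming every
    possible edge is present independently with probability p (small p = sparse)."""
    m = n_possible_edges(n_vars)
    return n_edges * log(p) + (m - n_edges) * log(1 - p)


# With 6 variables there are 30 possible influences and 2**30 possible graphs.
print(fraction_sparse(6, 3))
# With small p, a sparse graph is far more probable than a dense one ...
print(log_sparsity_prior(3, 6, 0.1), log_sparsity_prior(27, 6, 0.1))
# ... while p = 0.5 makes every graph equally likely, sparse or dense.
print(log_sparsity_prior(3, 6, 0.5), log_sparsity_prior(27, 6, 0.5))
```

The first print gives roughly 4.2 × 10⁻⁶: fewer than five in a million of the possible graphs have three or fewer interactions, so a prior that favors them is ruling out almost everything. Setting `p = 0.5` is one way to read the “density prior” discussed later in the post: it spreads belief evenly over all graphs, so nearly all of it lands on dense ones.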

To see how this relates to mechanistic thinking, let’s consider some examples:

  • “If-then-else” thinking involves picking some key factor (the “if”) and making decisions on the basis of this. This makes sense only if there is some key factor that matters much more than anything else, i.e. if the system is sparse.
  • We tend to think of mechanistic systems as inanimate. What does that mean? One element of inanimacy is that they are not proactive; they don’t go out and modify the world in all sorts of ways. They just sit there. This is sparsity; they don’t have much influence on anything else.
  • Another element of inanimacy is that they are not reactive. They only have a limited, fixed set of ways to interact. Yes, a rock can be thrown, and it can fall to the ground, but it can’t really do much beyond that. That is, inanimate systems cannot be affected in subtle and complicated ways, only in simple and well-defined ones, which is a sort of sparsity.
  • Mechanistic thinking involves a sort of decontextualization; it tends to assume that things will work the same each time. This partly relies on sparsity, in that it assumes there aren’t a number of additional moderating variables that change the system’s behavior. (It also relies on symmetry, in the sense that it requires the system to be duplicated across different contexts, such as over time or over space. Symmetry could likely also be considered a prior, but it is not the same prior as a sparsity prior.)

Generally, the pattern is that a sparsity prior is rigid. It assumes there is one simple explanation at play, and cannot deal well with exceptions. It fails badly in cases that are not sparse. One such case is interpersonal interaction, and agency in general. Agents try to absorb as much information as possible from their surroundings (e.g. by looking around and seeing things) to build a model of what is going on, and they try to massively modify the world to suit them (e.g. building cities, reproducing to have a massive population). These are very non-sparse dynamics, as they require massive amounts of causal exchange, both in and out, in order to regulate the world effectively. Ultimately, any prior only works as well as the environment matches its assumptions, and agents don’t match a sparsity prior well, making it malfunction badly on them.

Relationship to the diametrical model

I think that’s a reasonable starting point; a sparsity prior seems to account well for the rigidity involved in mechanistic thinking. So that leads to the follow-up question: if autistic people have a sparsity prior, do allistic people have a density prior? And the answer is no: most possible models are dense. Therefore, assuming the world is dense does not narrow down the possibilities enough to let you infer anything about how it works.

As I see it, for the diametrical model, the relevant alternative to a sparsity prior is an agency prior. Under an agency prior, you assume that the world contains individuals who have goals, beliefs, and who act according to these to influence the world.

To illustrate the contrast, suppose you come across a tree that has fallen over. You don’t know how it fell; the main possibility you can think of is the wind, but it seems hard to imagine that the wind is strong enough to knock down a tree. Density priors, sparsity priors, and agency priors interpret this scenario in three different ways:

  • If you have a density prior, anything might explain it. Maybe an elaborate Rube Goldberg mechanism knocked it over. Maybe god did. Maybe it’s not knocked over but it’s just an optical illusion. Maybe it grew fallen down. How could you know? After all, anything is possible in this world.
  • If you have a sparsity prior, it’s pretty unlikely that it could be anything other than the wind. After all, that would require some new effect that you hadn’t considered to explain it; but that seems pretty unlikely. The world is simple and doesn’t have all sorts of complicated effects. It must have been a very strong wind, or a very weak tree, or some combination of the two.
  • If you have an agency prior, you still don’t know how it happened. But you do know one thing: someone must have wanted it to happen. After all, trees don’t usually fall over; such a special scenario requires an explanation, and the obvious explanation is that someone did it because they wanted to.
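The three readings above can be caricatured with Bayes’ rule over a fixed set of explanations. This is only a toy sketch: every number (the likelihoods, the prior weights, the 98 “exotic” explanations standing in for Rube Goldberg machines, optical illusions and so on) is made up purely to mirror the three bullets, and the function names are hypothetical. Wind gets a low likelihood because it seems too weak to knock down a tree.

```python
from __future__ import annotations

N_EXOTIC = 98  # stand-ins for Rube Goldberg machines, optical illusions, divine intervention, ...
EXOTIC = [f"exotic_{i}" for i in range(N_EXOTIC)]

# Hypothetical P(tree is lying on the ground | explanation).
LIKELIHOOD = {"wind": 0.05, "agent": 0.5, **{h: 0.5 for h in EXOTIC}}


def density_prior() -> dict[str, float]:
    """Every explanation is equally plausible a priori."""
    return {h: 1 / len(LIKELIHOOD) for h in LIKELIHOOD}


def sparsity_prior() -> dict[str, float]:
    """Almost all belief on the one simple, known mechanism."""
    return {"wind": 0.98, "agent": 0.01, **{h: 0.01 / N_EXOTIC for h in EXOTIC}}


def agency_prior() -> dict[str, float]:
    """Much of the belief on 'someone wanted this to happen'."""
    return {"wind": 0.2, "agent": 0.7, **{h: 0.1 / N_EXOTIC for h in EXOTIC}}


def posterior(prior: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {h: prior[h] * LIKELIHOOD[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


for name, prior in [("density", density_prior()),
                    ("sparsity", sparsity_prior()),
                    ("agency", agency_prior())]:
    post = posterior(prior)
    other = sum(post[h] for h in EXOTIC)
    print(f"{name:>8}: wind={post['wind']:.3f}  agent={post['agent']:.3f}  "
          f"other={other:.3f}  largest single={max(post.values()):.3f}")
```

Under the density prior no single explanation ends up with more than about 1% of the belief (“how could you know?”); the sparsity prior keeps most belief on the wind despite its low likelihood; the agency prior moves most belief to someone having done it. Since the inputs are invented, the sketch only shows the shape of the argument.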

A pure agency prior seems reminiscent of schizophrenia; in particular, it could explain the conspiratorial aspects, as well as the assumption that there is a deeper meaning behind every little thing one encounters. One could imagine that normal people have a balance between agency and sparsity priors, while autistic people skew towards applying sparsity priors, and schizotypal people skew towards applying agency priors. Seeing autism as a sparsity prior is thus not an explanation of how autism differs from allism (that would be via the lack of an agency prior), but instead an explanation of autism on its own terms, without reference to social cognition.

Deriving autism from sparsity priors

I will now go through some symptoms I see online characterizing autism, and discuss how they relate to sparsity priors.

Autistic people are said to do worse at social problems. To an extent, this probably follows simply from being different; anyone who is sufficiently different is going to have trouble dealing with others. But if autism is a dysfunction in agency priors that leads to application of sparsity priors in cases where they are ineffective, then that too should lead to social difficulties.

Humans are agentic. One characteristic of agency is trying to do everything possible to optimize one’s goals; in communication, this would involve trying to optimize every part of one’s message, from phrasing to body language. If this is interpreted from an agentic perspective, then one will ask “given the whole picture, what is this person trying to tell us?”; on the other hand, if it is interpreted from a sparse perspective, then one will focus on the clearest specific things, likely ending up overly literal, and ignoring context and subtle signals. More generally, it will not even occur to someone with a sparsity prior that there are subtle things that they are missing; after all, that is part of the sparsity prior.

This also goes the other way; if you have a sparsity prior, then you will assume that there are not many things that are relevant for how to optimize your message. You may end up blunt, focusing purely on the message, and not taking into account the side effects that sharing the message may have. Meanwhile, with an agency prior, you would to a greater extent take into account the social implications.

Autistic people are characterized by obsessions and restricted interests. This might also make sense in terms of a sparsity prior; if you determine that there is some factor that is important, then it makes sense to learn everything you can about that factor, as it likely accounts for a big fraction of everything that might ever be important about anything (by the assumption of sparsity, that there are not that many factors that influence things). On the other hand, if someone else is interested in something, then an agency prior would tend to infer that this other thing must also be important and worth looking into; and thus lead to broader interests.

There are some autistic characteristics where I don’t quite understand what they refer to, for instance ritualistic behavior. Certainly it seems like sparsity should imply some forms of ritualistic behavior: due to the assumption that there are only a few key ways things work, one could end up focusing excessively on a few parameters when making decisions, seemingly “ritualistically” ignoring alternative possibilities. I can sometimes recognize this in my own decisions, where in retrospect I have ignored many factors in favor of a single clearer factor. So perhaps sparsity can explain autistic ritualistic behavior too.

One needs to be careful with the point made in the previous paragraph. Allistic people also have a sparsity prior, because you need a prior to act. However, I think it comes down to the social element: if there is even the slightest social encouragement to do something in a different way, then an agency prior will assume that there is some good reason for that, and adjust. This will still lead to ritualistic behavior in cases where everyone socially agrees on it, but since everyone is used to this, it is not noticeable. (Unless you start doing anthropology, at which point you realize that humans are very ritualistic in general.)

This turns out to be important to take into account when one starts analyzing other factors. For instance, consider sensory overload. It would make sense that if you assume that only a few key factors are important, you would avoid “noisy” places (not just audibly but also visually and through other senses), because you cannot figure out what those key factors are when there is so much noise. But this shouldn’t be limited to autism; it should apply to allistic people too. I don’t know whether there is a social explanation here, like there is for ritualistic behavior.

Another characteristic with similar problems is that autistic people tend to be clumsy. I think clumsiness can easily be tied to sparsity. Per Moravec’s paradox, basic things like moving your body are much more difficult than humans usually realize, because you need to constantly optimize your movements and keep track of every little thing. It would make sense that sparsity would tend to lead to “robotic” and clumsy movements in such a case – but since everybody, not just autistic people, needs a sparsity prior to make sense of everything, it’s hard to see how this explains why autistic people specifically are clumsy. And here I also have trouble coming up with a social explanation.

One thing I’m still not sure how to derive is stimming, i.e. making repetitive movements. I only have a relatively limited form of stimming, consisting of sometimes tapping my feet. I don’t see any way this fits into a sparsity prior. Maybe there’s a subtle thing related to neuroscience or information gathering or something, but for now I will just consider this to be unaccounted for by the theory, indicating that there is a flaw.

I’m also not sure how to derive comorbidities like ADHD.

Dimensions of sparsity prior

Autistic people aren’t entirely lacking an agency prior, and allistic people aren’t lacking a sparsity prior. This raises the question, in what sense exactly can autistic people be said to skew towards a sparsity prior? More generally, why do the priors trade off against each other?

Let’s start with the second question. It might seem mysterious that mechanistic and mentalistic thinking trade off against each other. From the point of view of priors, the answer is that there isn’t really much of a tradeoff. There are some limits, in the sense that you only have so much probability mass to give out, so eventually you do run into a tradeoff. However, it seems like under ordinary circumstances, one would quickly figure out which model is appropriate, and apply that.

The apparent tradeoff probably arises from the fact that you need some sort of model. “The density prior” isn’t useful, as it makes no predictions; so there is some relatively limited set of models, of which sparsity and agency are two of the most obvious ones. The things you don’t model mechanistically must be modelled in some other way, and the main other way is mentalistically. Hence an apparent tradeoff.

When it comes to the question of in what sense autistic people skew towards sparsity, I think the core of it is just that the agency prior among autistic people works worse, so they use it less. I.e., autism is literally the same thing as cognitive difficulties with social things (low “emotional intelligence”, except plausibly emotional intelligence tests aren’t good at measuring the agency prior). If you struggle with modelling things in agentic ways, you will end up modelling them in mechanistic ways instead, regardless of how appropriate that is. But it’s worth noting that quality of prior isn’t the only “free parameter” to consider:

A sparsity prior generally has a free parameter describing the degree of sparsity it assumes. You might think of it as being akin to a dimensionality measure. If you have a bunch of dominoes standing in a line, they can affect their neighbors, such that one of them falling over can influence the entire line, transitively. Meanwhile, if the dominoes are more spread out, then you won’t get these sorts of chain reactions, but will instead see the effects dissipate. So on one end of the spectrum, a limiting case of a sparsity prior is the assumption that everything might affect everything, while on the other end of the spectrum, a limiting case is that everything is isolated and can’t affect anything else.
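A minimal sketch of the domino picture, under my own simplifying assumption that the spacing between dominoes can be summarized as a single probability `p` that a falling domino topples its neighbour. Here `p` plays the role of the sparsity parameter: near 0 every effect stays local, near 1 one push propagates through the whole line. The function names are hypothetical.

```python
import random


def expected_cascade(n: int, p: float) -> float:
    """Expected number of fallen dominoes when the first of n is pushed and each
    falling domino topples its right-hand neighbour with probability p."""
    # Domino k (0-indexed) falls only if all k links before it transmitted: prob p**k.
    return sum(p ** k for k in range(n))


def simulate_cascade(n: int, p: float, rng: random.Random) -> int:
    """One random run of the same process; returns how many dominoes fell."""
    fallen = 1
    while fallen < n and rng.random() < p:
        fallen += 1
    return fallen


rng = random.Random(0)
for p in (0.1, 0.5, 0.9, 1.0):
    runs = [simulate_cascade(100, p, rng) for _ in range(2_000)]
    print(f"p={p}: expected {expected_cascade(100, p):.2f}, "
          f"simulated mean {sum(runs) / len(runs):.2f}")
```

For `p` well below 1 the expected cascade stays around `1 / (1 - p)` dominoes no matter how long the line is, so effects dissipate; only at `p = 1` does a single push reach all 100.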

Finally, one free parameter is the “hyperparameter” that tells you how often a sparsity prior is appropriate to use, compared to any other prior you might have. This is essentially about whether your immediate instinct is to think of things in a mechanistic way or in an agentic way (or in some other way, assuming there are other priors of interest). In principle this should only have a limited effect on things; one should quickly be able to recognize when a situation calls for something other than a sparsity prior. Possibly, this might explain why engineers can seem more autistic; their “default mode of operation” ends up being autism-like, but they can adjust if they get a bit of time.
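One way to picture this hyperparameter (my framing; the post doesn’t commit to a formalism) is as the prior probability that “this situation calls for the agency model” before looking at it, updated by Bayes’ rule in odds form as observations come in. The likelihood ratio of 3 per observation is an arbitrary illustrative number, and `posterior_agency_weight` is a hypothetical name.

```python
from typing import Sequence


def posterior_agency_weight(prior_agency: float, likelihood_ratios: Sequence[float]) -> float:
    """Posterior probability that the agency model (rather than the sparse model)
    is the right one for a situation.

    prior_agency: the 'hyperparameter', i.e. the default weight on agency.
    likelihood_ratios: for each observation, P(obs | agency model) / P(obs | sparse model).
    """
    odds = prior_agency / (1.0 - prior_agency)
    for ratio in likelihood_ratios:
        odds *= ratio  # Bayes' rule in odds form
    return odds / (1.0 + odds)


# A mechanistic default: only 10% initial weight on the agency model.
for n_obs in range(6):
    print(n_obs, round(posterior_agency_weight(0.1, [3.0] * n_obs), 3))
```

Starting from a default of 0.1, two observations that each look three times likelier under the agency model bring the weight to 0.5, and four bring it to 0.9. A lower default only delays this by a few observations, which fits the idea that the default mainly shapes first impressions.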

I don’t know if all of these exist in the brain. (Heck, I don’t know if any specific one of them exists in the brain; all of this is speculation.) But they seem a lot more concrete to me than “if-then-else thinking”, “systematizing” or “mechanistic thinking”.

Relationship between autism and masculinity

One theory of autism that has come up is Simon Baron-Cohen’s “Extreme Male Brain” theory, which asserts that men think more “systemizingly” (basically mechanistically) and women think more empathetically, and that autism is due to a skew towards the male side of things. As support for that, it’s generally noted that more males are diagnosed as autistic, that autistic people have a more male systemizing-empathizing profile, and so on.

In the past, I have had trouble with this theory. One element is that SBC’s measures don’t reaaallly seem to overlap all that much with autism. And autistic men don’t seem to be all that masculine. Really, this sort of thing has also made me skeptical about the diametrical model in the past, and about the validity of “autism” as a category in general. But now I’m writing a huge blog post on it, so clearly I’m going to have some new thoughts on this:

I think men are interested in things and women are interested in people, while autistic people are bad at people and schizotypal people are bad at things. But experience trades off against priors; getting more experience can to some degree develop skills that compensate for a lack of ability. So if you’re autistic but interested in people, then you are much more likely to learn to compensate, and that leads to female autists being less likely to be diagnosed, unless they happen to have masculine interests, in which case they don’t develop their people skills. This seems to match what some people say about female autists “masking” their autism to make it less visible.

This model does introduce an ambiguity; do we define autism by the prior (possibly leading to equally many male and female autists?), or by its consequences? Probably the answer here should be that we shouldn’t get too hasty in applying speculative theoretical models, and that we should wait with using the prior as a definition until we have actually validated that this is an accurate theory of autism. Which it might very much not be.


For a while, autism hasn’t really made sense to me. This model makes it make a lot more sense to me, though I have no clue whether it’s right; I would encourage anyone to critique it. The model essentially just boils down to “autistic people are bad at social stuff” though, which is obvious enough; I guess the nonobvious claim of the model is about what is left in reasoning after removing the social stuff. And I don’t really think sparsity priors are the only thing left over; rather, I think they (or something like them) can be seen as a more formal way of understanding what “mechanistic thinking” is. But mechanistic vs mentalistic are not the only forms of thinking, and so there’s lots of things left over.

One big bottleneck for further development I see is a need to create something that measures it. This would probably be a form of intelligence test. I think current “emotional intelligence” tests don’t really focus much on ability to reason about agency vs sparseness per se, but instead on surface-level “content” vaguely associated with these things. Possibly problems akin to the “tree that has fallen over” situation that I described before might be relevant.

It might also be entertaining to ask whether other things can be mapped into this sort of “prior” viewpoint. For instance, there’s the concept of a person’s or place’s “vibe”; this seems non-sparse, but also not necessarily agentic. Certainly in some cases it might be agentic, but more generally it just seems similar to principal components regression. Roughly speaking, one can understand the assumption behind this as a combination of sparsity with latent variables; that is, rather than assuming that everything is observable, one assumes that there are some important unobserved variables that influence many things; these are then estimated using the “vibe”.
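The latent-variable reading of “vibe” can be sketched as the first step of principal components regression: estimate one hidden variable as the first principal component of many observed features. In the sketch below, all data is simulated and the loadings, noise level and function names are hypothetical; a single unobserved “vibe” variable drives eight observed features, and power iteration on their covariance matrix recovers it. Plain Python keeps it dependency-free; in practice one would use a linear-algebra library.

```python
import math
import random

LOADINGS = [0.8, 1.0, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0]  # how strongly the hidden variable drives each feature


def simulate(n_samples: int, noise_sd: float, seed: int = 0):
    """Draw a hidden 'vibe' z per sample and observed features x_j = a_j * z + noise."""
    rng = random.Random(seed)
    zs, xs = [], []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        zs.append(z)
        xs.append([a * z + rng.gauss(0.0, noise_sd) for a in LOADINGS])
    return zs, xs


def first_principal_component(xs, iters: int = 100):
    """Top eigenvector of the sample covariance (via power iteration) and each sample's score on it."""
    n, d = len(xs), len(xs[0])
    means = [sum(row[j] for row in xs) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in xs]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores


def correlation(a, b) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


zs, xs = simulate(n_samples=300, noise_sd=0.5)
_, scores = first_principal_component(xs)
# The estimated 'vibe' tracks a hidden variable it was never shown directly.
print(f"correlation with hidden variable: {abs(correlation(zs, scores)):.3f}")
```

A full principal components regression would then regress some outcome of interest on these scores instead of on the raw features.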

I also can’t help but notice that I’m very interested in mechanistic-like theories of agency. Artificial intelligence, decision theory, psychology, and so on. I wonder if learning such theories to a sufficiently advanced degree can function as a sort of self-treatment of autism. 🤷
