Using instrumental variables to test the direction of causality between autogynephilia and gender dissatisfaction

[Epistemic status: experimental. Half-way speculative with some input from data.]

TL;DR: I use instrumental variables (paraphilias and masculinity/femininity) to test the direction of causation between autogynephilia and gender issues, and find evidence that it goes mostly from autogynephilia to gender issues, but also to some degree from gender issues to autogynephilia.

Autogynephilia – a propensity to be aroused by the thought of being a woman – is extremely strongly associated with gender dissatisfaction and cross-gender ideation in males. One hypothesis, which I personally believe, is that this association is causal: autogynephilia → gender issues. However, this claim is controversial, as the causal aspect has not been properly demonstrated yet.

… mainly because causal studies are hard! It’s not like we have any simple way of randomly assigning autogynephilia or gender dysphoria to people and seeing what effects that would have, and even if we did, that would probably be considered unethical.

But there are alternative ways of testing causality than randomized controlled trials. An important one is through the use of instrumental variables. Suppose we want to test the causality X → Y. If we can find another variable Z, then we can use Z to test the X → Y causality, as long as Z satisfies certain assumptions:

  1. Z must affect X; we are essentially using Z as a stand-in for experimenting on X. If you are familiar with randomized controlled trials, then think of those high in Z as being the intervention group (except we are letting Z do the intervention, instead of doing it randomly), and those low in Z as being the control group.
  2. Z may not affect Y through other means than through X.
  3. Z may not be confounded with X or Y; that is, there must be no unmeasured factors that affect Z as well as X or Y.

Or graphically:


Assumptions made by instrumental variables estimation. Red arrows indicate forbidden connections. U represents any unmeasured confounders that may make the analysis invalid.
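To make the logic concrete, here is a minimal simulated sketch of the simplest instrumental variable estimate (the ratio of covariances, sometimes called the Wald estimator). Everything in it is an assumption chosen for illustration: the effect size, the noise levels, and the variable names are made up, and it is not the structural equation model fitted later in this post. It only shows that when Z satisfies the three assumptions, cov(Z, Y) / cov(Z, X) recovers the X → Y effect even though an unmeasured confounder U biases the ordinary regression slope.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


rng = random.Random(0)
n = 50_000
true_effect = 0.5  # assumed X -> Y effect, chosen purely for illustration

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects X only
u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder of X and Y
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Ordinary regression slope of Y on X: biased upwards because U drives both.
naive_estimate = cov(x, y) / cov(x, x)

# Instrumental variable estimate: only uses the variation in X that comes from Z.
iv_estimate = cov(z, y) / cov(z, x)

print(f"naive slope: {naive_estimate:.3f}")
print(f"IV estimate: {iv_estimate:.3f} (true effect {true_effect})")
```

If Z also affected Y directly, or shared a cause with Y, the ratio would no longer equal the X → Y effect, which is why the assumptions matter so much.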

So far, there are two potential instrumental variables I can think of for examining the question of autogynephilia’s causal relationship with gender issues:

  • Paraphilias all seem to correlate with each other. This makes sense if there is a “general factor of paraphilia” that affects all of them; thus we can use this general factor as an instrumental variable that affects autogynephilia, to measure the effect of autogynephilia on gender issues.
  • Masculinity/femininity can easily be thought to affect gender issues; if someone has a poor fit to gender norms, then it would make sense for them to become uncomfortable with their assigned gender.

So, that’s the theory, which I’ve been aware of for some time, but now I have some data that will allow me to start testing it in practice! My initial power calculations suggested that I would need a very large sample size (1000+) to have enough power to meaningfully examine this question, and it’s not super trivial to get this. However, I’ve sometimes done “porn surveys” where I show participants on /r/SampleSize some porn and have them rate it, and usually these surveys are very popular, easily achieving the needed sample size. Therefore I decided to include the questions necessary to test this in a porn survey that I was doing for other reasons (more on that later, hopefully), to achieve the sample size needed.

Model: Paraphilia → Autogynephilia

So, how do we measure this general factor of paraphilia so that we can test the direction of causality? Essentially, we look at a bunch of paraphilias unrelated to autogynephilia. These paraphilias will all have some degree of influence from the general factor, as well as some random unknown influence from other sources. Thus, each of them is a noisy indicator for the general factor of paraphilia. We can find out how noisy they are by looking at how much they correlate with each other; because if they correlate with the general factor at a strength of h, then their correlation with each other would be at a strength of h². (There’s some additional math that handles this in a more nuanced way, but I won’t go into that here. If you want to read up on it yourself, the keyword is “structural equation models”.)
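The h versus h² relationship can be checked with a small simulation. This is only a sketch under assumed numbers: the loading h = 0.6 and the two hypothetical items are invented for illustration, and real items would have different loadings from one another.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(1)
n = 50_000
h = 0.6  # assumed correlation between each item and the general factor

general = [rng.gauss(0, 1) for _ in range(n)]
noise_sd = math.sqrt(1 - h ** 2)  # keeps each item at unit variance

# Two hypothetical items, each driven by the general factor plus its own noise.
item_a = [h * g + noise_sd * rng.gauss(0, 1) for g in general]
item_b = [h * g + noise_sd * rng.gauss(0, 1) for g in general]

r_item_factor = corr(item_a, general)  # close to h
r_items = corr(item_a, item_b)         # close to h squared
h_recovered = math.sqrt(r_items)       # loading inferred from items alone

print(f"item-factor r: {r_item_factor:.3f}, item-item r: {r_items:.3f}")
print(f"loading recovered from item-item r: {h_recovered:.3f}")
```

The last line is the useful part: the general factor is never observed in practice, but the item-to-item correlation is, and it tells us how noisy each item is.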

So, we can use a set of paraphilias to estimate the correlation between the general factor of paraphilia and any other variable, by looking at how much the paraphilias correlate with the other variable, and adjusting for the noise inherent in using a proxy. However, there is one big complication to this: paraphilias have more structure than just the general factor. In addition to the general factor, paraphilias also correlate with each other in more specific ways. Consider for instance submissive paraphilias; they tend to correlate more with each other than they do with random paraphilias. This becomes a problem, because if one picks too many paraphilias within a single narrow domain, one ends up measuring this narrow domain instead of the broader general factor of paraphilia. So, when selecting the paraphilias, I tried to make them as unrelated to each other as possible, with mixed success. Here are the paraphilia items I selected for testing the model:

  • Treating your partner roughly in bed, e.g. spanking, shoving around, biting, scratching, or pulling hair
  • Being tied up by your partner
  • Exposing your genitals to an unsuspecting stranger
  • Watching a video of yourself masturbating
  • Having an older sexual partner take on a dominant parent-like role in the relationship
  • Imagining having sex with an anthropomorphic animal (furry)
  • Caressing your partner’s feet

For each of the above, participants were asked how arousing they found it. There were also a number of other sexual interests in the list, including normophilic ones (e.g. “Having sex with a woman”), and autogynephilic ones, of which I will use the following items:

  • Imagining being the opposite sex
  • Wearing clothes typically associated with the opposite sex (crossdressing)
  • Picturing a beautiful woman and imagining being her
  • Wearing sexy panties and bras
  • Imagining being hyperfeminized, i.e. turned into a sexy woman with exaggeratedly large breasts and wide hips

The survey I’m basing this on was a porn survey, and so I couldn’t easily fit in a detailed gender dysphoria measure. However, I included a handful of questions in a masculinity/femininity test and in a disgust sensitivity measure:

  • As a child I wanted to be the opposite sex
  • I feel I would be better off if I was the opposite sex
  • (“How disgusting do you find the following?”…) Imagining yourself being the opposite sex

I try to call this by the imprecise term “gender issues” instead of saying “gender dysphoria” because these do not measure very strong gender issues. One big improvement that could likely be made in future surveys would be to use a better measure of gender feelings.

Anyway, I then set up the following model in a statistics program, and ran it on the data from the cisgender male participants in the survey:


Structural equation model assuming paraphilias as an instrumental variable for autogynephilia.

(Here’s a bit of a technical point, so it might be worth skipping over if you don’t care: This model contains a cyclic causal connection, which is not usually allowed in causal models. I fit it as follows: If we let C be the matrix containing the coefficients for the SEM, and V be the matrix containing the residual variances, then I compute the implied covariance matrix as $(I - C)^{-1} V \left((I - C)^{-1}\right)^{T}$. This essentially treats the observed covariances as what you end up with once an equilibrium is reached after the causal effects are iteratively applied.)
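For readers who want to see the equilibrium interpretation in action, here is a small sketch restricted to two variables (autogynephilia and gender issues). The coefficient matrix borrows the rough magnitudes reported below purely as example numbers, and the residual variances are assumed to be 1; the model actually fitted has many more variables. The sketch checks that repeatedly applying the causal effects (the series I + C + C² + …) converges to the matrix inverse of I − C, and then forms the implied covariance matrix from it.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def inverse_2x2(m):
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


identity = [[1.0, 0.0], [0.0, 1.0]]

# C[i][j] is the effect of variable j on variable i.
# Index 0: autogynephilia, index 1: gender issues (illustrative values only).
C = [[0.0, 0.2],
     [0.56, 0.0]]
V = [[1.0, 0.0],
     [0.0, 1.0]]  # assumed residual variances

# Direct route: (I - C)^-1
i_minus_c = [[identity[i][j] - C[i][j] for j in range(2)] for i in range(2)]
total_effects = inverse_2x2(i_minus_c)

# Iterative route: apply the causal effects over and over until they settle.
series = [row[:] for row in identity]
for _ in range(200):
    step = matmul(C, series)
    series = [[identity[i][j] + step[i][j] for j in range(2)] for i in range(2)]

# Implied covariance matrix: (I - C)^-1 V ((I - C)^-1)^T
implied_cov = matmul(matmul(total_effects, V), transpose(total_effects))

for row in implied_cov:
    print([round(value, 4) for value in row])
```

The iteration only settles down when the feedback loop is weak enough (here the product of the two cyclic coefficients is well below 1); otherwise there is no equilibrium to speak of.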

If I fit this model, I get these results. The output here is a bit technical, so I will try to summarize:

  • The model finds evidence for bidirectional causality, but mostly in the autogynephilia → gender issues direction. (Specifically, B~0.56 from autogynephilia to gender issues, and B~0.2 from gender issues to autogynephilia.)
  • The model is very definitely wrong (as decided by the χ² test); this is to be expected with these kinds of models once one gets enough sample size, as obviously it is too simplistic to assume that there are only three major factors that account for the covariation between the traits. As people say, “all models are wrong but some are useful”.
  • The model is also kind of bad; the numbers labelled “NFI”, “TLI” and “RMSEA” are measures that essentially assume the model isn’t true, and try to quantify how bad the fit is. Generally you want the NFI and TLI to be in the 0.9’s, and the RMSEA to be 0.05 or lower, all of which this model fails to achieve. Future research should probably look into creating a model that isn’t this terrible.

It’s also worth testing how stable these results were, as some of the measures I included were kind of “funny”. For instance:

  • As part of the gender issues measure, I asked people how disgusting they found “Imagining yourself being the opposite sex”. This is a weird question, but if I drop it from the model, I get very similar results; B~0.49 from autogynephilia to gender issues, and B~0.21 from gender issues to autogynephilia.
  • One of the paraphilia items asked about ageplay in a way that might include a degree of “role reversal”, and role reversal could plausibly be associated with gender issues in some way. If I drop it, I get B~0.58 from AGP to gender issues, and B~0.18 from gender issues to AGP. If instead I allow it to have a residual correlation with gender issues, I find no effect. Thus role reversal is probably not problematic for this model, but it is hard to say for sure.
  • When people answer my questions, their answers get discretized into the specific categories I provide (e.g. agree/disagree), rather than me getting data from what we can only assume is a more continuous underlying distribution. If I control for this in an ad-hoc way, I get B~0.54 from autogynephilia to gender issues, and B~0.25 from gender issues to autogynephilia. Since the correction was ad-hoc, in the future this should be examined in a more numerically justified way.
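To illustrate why discretization matters at all, here is a simulated sketch. It is not the ad-hoc correction used above; it is a textbook special case under strong assumptions (two bivariate normal traits with an assumed true correlation of 0.5, each cut at its median into agree/disagree). In that special case the observed correlation shrinks to (2/π)·arcsin(ρ), and the shrinkage can be undone exactly.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(2)
n = 50_000
rho = 0.5  # assumed correlation between the underlying continuous traits

trait_a = [rng.gauss(0, 1) for _ in range(n)]
trait_b = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in trait_a]

# Respondents only report agree (1) or disagree (0), split at the median.
answer_a = [1 if a > 0 else 0 for a in trait_a]
answer_b = [1 if b > 0 else 0 for b in trait_b]

r_continuous = corr(trait_a, trait_b)
r_discrete = corr(answer_a, answer_b)

# Inverting the median-split formula recovers the underlying correlation.
rho_recovered = math.sin(math.pi / 2 * r_discrete)

print(f"continuous r: {r_continuous:.3f}")
print(f"discretized r: {r_discrete:.3f}")
print(f"recovered rho: {rho_recovered:.3f}")
```

Real survey items have more than two categories and are not split at the median, so a proper treatment would need something more general (polychoric correlations are the usual keyword).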

In conclusion, using paraphilias as an instrumental variable seems to support the causality going in both directions, but mostly from autogynephilia to gender issues.

Model: Masculinity/femininity → Gender issues

The concept behind this second model is that if someone has a poor fit into gender norms, it seems plausible that they would start feeling dissatisfaction with their gender, or at least openness to being the opposite gender. Thus, we can use masculinity/femininity as an instrumental variable for gender issues.

But first we need some philosophy on what masculinity/femininity even is. I want to eventually write a blog post going into more detail on this, but to keep it brief:

There are various psychological differences between males and females; for instance, males tend to be more horny. These are not necessarily the same as masculinity/femininity, and therefore I will call them “gender differences”. Some of these psychological differences, as well as some things that are not gender differences, end up included in expectations for men and women, and these expectations appear to be closer to what people mean when they use the phrase masculinity/femininity than the gender differences are. I have done some research to find some things that could plausibly be relevant for the concept of masculinity/femininity, and have come up with this preliminary list of items:

  • I prefer talking to people about their daily activities rather than their feelings
  • I like being well-dressed at all times
  • As a child I often played with girls
  • As a child I often played with boys
  • I would be interested in being a fighter pilot
  • I would be interested in working as a machinist
  • I keep myself well-groomed
  • As a child I played with toy weapons or objects meant to simulate them (e.g. gun-shaped sticks)
  • I am interested in medical shows
  • I do not enjoy watching dance performances
  • I am very sensitive and easily hurt
  • I am muscular
  • I have a curvy body
  • [Arousal to] Being treated roughly in bed, e.g. spanked, shoved around, bit, scratched, or pulled hair

I deliberately avoided aspects of masculinity/femininity that I perceived to be strongly overlapping with gender identity, such as whether one wears feminine clothes, or whether one considers oneself to be masculine/feminine, as I think that makes its connections to gender issues too tautological, and so plausibly makes the model invalid. The dataset includes some data related to this, though, so you can play around with it if you download it.

In the previous models, I defined traits by assuming that there is some underlying “true” trait that makes all of the items correlate with each other. I don’t currently think this can be done with masculinity/femininity; instead, I will treat these items as an “index”, so I say that masculinity/femininity is whichever way they affect gender satisfaction. Or graphically:


Individual indicators are assumed to cause a synthetic variable that we label masculinity/femininity, rather than be caused by this variable.

This is called a formative model, and it has some disadvantages relative to the model we used previously. In the previous models, called reflective models, the model inherently prescribes some relationships between the items, which allows it to be tested much more aggressively. In addition, reflective models automatically control for measurement error, whereas formative models don’t.

And I want to add: Currently, I don’t think we have a good idea of what constitutes masculinity/femininity. Most existing scales, including my own, do not correlate all that much with what is informally referred to as masculinity/femininity. (To be more precise: They seem to correlate on the order of magnitude of 0.4. As a correlation between two separate variables, this is quite high for the standards of psychology, but these are not intended to be separate variables, they are intended to be a measurement. Usually we want measurements to share at least 70% of their variance with what is being measured, whereas a correlation of 0.4 implies that they share only 16% of variance.) I interpret this to mean that we don’t really know what masculinity/femininity is, and so in the future the concept of masculinity/femininity I’ve written about here may change. But in the meantime, let’s look at the results.

So, I fit the following model:


Structural equation model assuming masculinity/femininity as an instrumental variable for gender issues.

The initial fit gave these results, which asserted that autogynephilia overwhelmingly affects gender issues (B~0.8), and that gender issues actually reduce autogynephilia (B~-0.24), but it contained some elements that I found dubious, so I modified the model:

  • For some reason, the model claimed that arousal to being treated roughly in bed was masculine, even though I had intended it to be added as a feminine item. This might be an artifact of item phrasing, in that the item I had found to be associated with self-perceived femininity was “My partner acting dominant in bed”, but I wanted something more specific for my current survey, and therefore replaced the item. If I delete this item, I get B~0.6 for autogynephilia affecting gender issues, and B~0.16 for gender issues affecting autogynephilia.
  • Another issue I have is that the masculinity/femininity factor ends up almost entirely defined by the “As a child I often played with boys” item. I am concerned that having the variable defined so narrowly might lead to problems, so I removed this item to have it be defined more broadly by the other items. Combining this with the other change yielded B~0.55 for autogynephilia affecting gender issues, and B~0.24 for gender issues affecting autogynephilia.

Doing those modifications yielded these results. Here, we can observe that the resulting model is not as bad as the model that used paraphilias as an instrumental variable, though it is still quite bad.

Overall, the results seem to agree with the results based on using paraphilias as an instrumental variable: the causality is bidirectional and mostly goes from autogynephilia to gender issues.

Combined Model

Autogynephilia and gender dysphoria might be related in three ways: one affecting the other, the other affecting the one, or confounding, where both are affected by some common factor. Due to the two instrumental variables we have, we can find the causal effect in each direction, and so whatever correlation remains must be confounding. (In theory – assuming that there aren’t any major problems with the models, even though there probably are…)
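The identification argument can be sketched with simulated data. All numbers here are assumptions made up for illustration (effects of 0.5 and 0.2, a confounder with variance 0.3, and one clean instrument per variable), and the estimator is the simple covariance ratio rather than a fitted structural equation model. The point is only that with one valid instrument on each side, both causal effects can be recovered, and the covariance left over between the residuals matches the confounding.

```python
import math
import random


def cov(p, q):
    """Sample covariance of two equal-length sequences."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    return sum((s - mean_p) * (t - mean_q) for s, t in zip(p, q)) / (n - 1)


rng = random.Random(3)
n = 50_000
a = 0.2      # assumed effect of Y on X
b = 0.5      # assumed effect of X on Y
var_u = 0.3  # assumed variance of a confounder that adds to both X and Y

z_x, z_y, x, y = [], [], [], []
for _ in range(n):
    zx = rng.gauss(0, 1)  # instrument that only affects X directly
    zy = rng.gauss(0, 1)  # instrument that only affects Y directly
    u = rng.gauss(0, math.sqrt(var_u))
    rx = zx + u + rng.gauss(0, 1)  # everything driving X other than Y
    ry = zy + u + rng.gauss(0, 1)  # everything driving Y other than X
    # Equilibrium solution of the cyclic system x = a*y + rx, y = b*x + ry.
    z_x.append(zx)
    z_y.append(zy)
    x.append((rx + a * ry) / (1 - a * b))
    y.append((ry + b * rx) / (1 - a * b))

b_hat = cov(z_x, y) / cov(z_x, x)  # X -> Y, identified by X's instrument
a_hat = cov(z_y, x) / cov(z_y, y)  # Y -> X, identified by Y's instrument

# Strip out the estimated causal effects; what still covaries is confounding.
resid_x = [xi - a_hat * yi for xi, yi in zip(x, y)]
resid_y = [yi - b_hat * xi for xi, yi in zip(x, y)]
confounding_hat = cov(resid_x, resid_y)

print(f"X -> Y: {b_hat:.3f}, Y -> X: {a_hat:.3f}, confounding: {confounding_hat:.3f}")
```

If the two instruments were themselves correlated, or one leaked into the other variable, the leftover covariance would no longer be a clean estimate of confounding.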

To test this, I simply fit a straightforward extension of the previous models to the data:


Structural equation model that allows confounded relationship between autogynephilia and gender issues.

Fitting this model yields these results. This model finds that most of the connection between the two variables is either autogynephilia causing gender issues (B~0.5) or confounding (0.15), with only negligible causality from gender issues to autogynephilia (B~0.05). This is kind-of sketchy, as both the previous models agreed that there was some causality from gender issues to autogynephilia.

It’s hard to tell for sure what happened, but it seems to me that some of the assumptions were violated. Specifically, masculine/feminine traits seem to have correlated with paraphilic interests. Thus, to fix this, I let the masc/fem items freely correlate with the paraphilia items too. This yielded these results, where autogynephilia causes gender issues (B~0.49), gender issues cause autogynephilia (B~0.23), and there is little confounding (0.05).


For all of this to be valid, the assumptions behind the models have to hold. There are a number of ways in which this might not be the case:

Using paraphilias as an instrumental variable for autogynephilia assumes a “factor model”; that is, it assumes that there is a latent factor which causes the covariance between the different paraphilias. As an alternative to factor models, some people think of things as being “networks”. For instance, perhaps people “start out with” some sexual interest, “pick up” adjacent sexual interests, and repeat. This would be compatible with conditioning models of sexual interests. In such a case, the relationship between autogynephilia and other paraphilias would be bidirectional causality, with the paraphilias strengthening each other.

Using paraphilias as an instrumental variable also assumes that there are no other paraphilias that affect gender issues. If, for example, submissiveness tends to affect them, then this assumption is invalid and the effect of autogynephilia on gender issues will have been overestimated. Even more generally, paraphilias have not been sufficiently demonstrated to satisfy the requirements for instrumental variables (though my initial examination into this looks optimistic – more on that another time).

The masculine/feminine traits are a grab-bag of different personality traits that I have lumped together. The assumptions behind instrumental variables need to be established for all of them, and so far we don’t even have an argument for why they should hold for any of them.

The masculine/feminine traits included questions about appearance. It is known that people tend to have extremely inaccurate ideas of how attractive they look. This raises the question of whether they also have equally inaccurate ideas about other aspects of their appearance. If so, this might have implications for the masculinity/femininity measure, though for now it’s hard to say how strong those implications will turn out to be.

The place where I got this data, namely reddit, has very high rates of paraphilias, and of autogynephilia specifically. This is going to increase the causal estimate for autogynephilia → gender issues, as there is more variance in autogynephilia on reddit than elsewhere. It also has very high rates of trans people; I have excluded trans women from this analysis due to a number of problems with including them (changes in traits due to transition, unsure about the self-report accuracy, …), but excluding them also leads to some biases (underestimation of effect sizes, particularly ones linked to gender transition).

It might be worthwhile to look into whether the bidirectional causality can be attributed to only certain narrower subtypes of autogynephilia. The current survey asked about autogynephilia quite broadly, so it is hard to say much about this.

There are thus lots of things that could productively be researched in the future.


This is by no means a perfect test, and I’m not sure people on either side of the issue are going to be convinced by it. (Certainly it will be interesting to see what people say.) It might be worth considering some intuition for both sides of the issue:

  • There’s a straightforward way that autogynephilia could cause a desire to be female, and that’s because it’d be hot. There are also more subtle ways, though; for instance, there’s evidence that paraphiles tend to get attached to their paraphilic objects of interest, perhaps in similar ways that romantic attraction operates. Furthermore, maybe engaging in autogynephilic fantasies and behavior helps “normalize” the concept of being the opposite sex to oneself, as one has to keep confronting oneself with it?
  • Some people think that sexual interests reveal hidden desires. I don’t think I believe that, but there are two other categories of explanations that I find more plausible: Autogynephilia questions like “How arousing would you find it to imagine being the opposite sex?” correlate almost perfectly with questions like “How often do you imagine being the opposite sex?”. Thus, plausibly, people infer their arousal to it on the basis of how often they fantasize about it. But a man who is comfortable with the idea of being female might be more comfortable fantasizing about it. And many claim that a man who is distressed about having a male body would also avoid fantasies where he has this, and likely replace them with fantasies where he has a female body.

It should also be noted that these results could easily be overthrown. You need massive sample sizes to estimate the parameters accurately enough that they can be used for instrumental variables, and even this data is a bit too “close for comfort”.

To make it easier for others to research, I’m releasing the data used for this analysis. I can’t release all the data, as some people opted to not have their responses shared, but here is a subset of the data from those who opted in to having it shared. I will also eventually be releasing the full survey results, so I guess stay tuned!

2 thoughts on “Using instrumental variables to test the direction of causality between autogynephilia and gender dissatisfaction”

  1. Some studies suggest that autistic people are more likely to be transsex, others that we’re more likely to have paraphiliae. So it could be another confounding variable.

