Quick heads up: Julia Serano believes in ETLE too

Autogynephilia is a sexual interest in being a woman. Erotic target location error is a theory which asserts that this sexual interest is connected to gynephilia (a sexual interest in women as partners); that there is ???something??? which usually prevents men’s gynephilia from finding the thought of being a woman to be erotic, but that this ???something??? is missing in autogynephiles.

Some people say that they find ETLE theory absurd, mock it, and call it debunked. And then they endorse people like Julia Serano, who claim to critique it. But here’s Julia Serano’s critique of it:

A third factor that may influence embodiment fantasies is sexual orientation, albeit not in the way that Blanchard envisioned. Specifically, if an individual is attracted to femaleness and femininity in a more general sense (e.g. they find such qualities erotic in their partners), then these same attributes might also be sexually salient with regard to their own embodiment, leading to more frequent or intense FEFs. (A similar correlation between attraction to maleness and masculinity, and MEFs, might also be expected.) Or to phrase this conversely: If an individual is not attracted to female or feminine attributes more generally, then they may be less likely to find FEFs arousing or compelling. This fairly simple explanation (which Blanchard never explored) is consistent with the correlations researchers have found between sexual orientation and embodiment fantasies, but without invoking direct causality.

Julia Serano, Autogynephilia: A scientific review, feminist analysis, and alternative ‘embodiment fantasies’ model

But this is literally just erotic target location error restated! Serano’s argument is that maybe there’s a link between autogynephilia and gynephilia, where whichever mechanism creates gynephilia also, for some reason, sometimes creates autogynephilia. That is precisely the same as what Blanchard postulated.

It may be worth quoting Blanchard and Freund to illustrate the similarities in the theories:

What kind of defect in a male’s capacity for sexual learning could produce anatomic autogynephilia, transvestism, and fetishism, singly and in various combinations? Common to all these phenomena is a kind of error in locating heterosexual targets in the environment. In fetishism, the individual orients toward a particular garment (e.g., panties, brassieres) rather than those parts of the female body the garment usually covers. In transvestism, the individual is aroused by the appearance of an attractively clad woman, but he locates this image on himself rather than another person. In anatomic autogynephilia, the individual is oriented toward the characteristic features of the feminine physique (e.g., breasts), but he attempts, in some way, to locate these features on his own body.

The above analysis suggests the failure of some developmental process that, in normal males, keeps heterosexual learning “on track,” perhaps by biasing erotic response toward external rather than internal stimuli, and inherent rather than variable features of the female appearance. This putative defect allows the development of various misdirected – but still recognizably heterosexual – behaviors, and makes it possible, if not probable, that more than one misplaced interest will appear in the same individual.

Ray Blanchard, Clinical observations and systematic studies of autogynephilia

So in other words, Blanchard’s ETLE theory is that for some reason gynephilia gets applied to one’s own embodiment, just as Serano describes. Both agree that some sort of gynephilic eroticism contributes to autogynephilia, and so both agree on ETLE.

Julia Serano is not the only one I’ve seen who has done this; e.g. I’ve seen someone else propose that ETLEs don’t exist and any correlation between autogynephilia and gynephilia is just because gynephilia makes it easier to sexualize having a female body… which of course is the core claim of ETLE theory, making it puzzling that someone might call that a contradiction of ETLE.

Examining the structure of male sexual interests

Sexuality keeps coming up in this Blanchardian sphere of gender research, and so it would be nice to have an overview of how it works. Fortunately, Pasha, the creator of the /r/AskAGP subreddit, recently did a HUGE survey on /r/SampleSize, where ~1000 people responded to 84 different sexual fantasy items [2]. Since the structure of male and female sexuality seems to differ, in this post I will focus on the responses from the 494 cisgender men who responded.

A good starting point for understanding a domain of variables is factor analysis [1]. Factor analysis tries to model the data using a lower number of “factors” which group together the variables that are highly correlated, thereby abstracting the data and revealing large-scale structures. I can then inspect the fantasies that it lumps together, and name the factors to summarize the results in a readable way. Here are the results for applying factor analysis using 1, 2, 3, 4, 5, and 8 factors:

Bass-ackwards factor analysis applied to the male sexuality data. Each level represents a factor analysis. At the first level, I extracted one factor; at the second level, I extracted two, and so on. The boxes at the bottom show some of the items that were assigned to each factor. The arrows between the levels show how the factors correlate between the different factor analyses.

At the final level, there were eight factors, which I tend to think of in two groups: four “broad” factors that I assume shape everything else, and four “narrow” factors that I assume are less important for sexuality in general (though they may be very important in specific contexts). I picked names for the broad factors using the following reasoning:

  • One very consistent factor was characterized by a very large number of generic partnered sex acts. This seems core to the definition of allosexuality (sexual attraction to other people), so I labelled it Allosexuality.
  • The second most consistent factor was primarily characterized by androgynous men. This made it seem like it denoted attraction to feminine men. However, it was also heavily characterized by masculine men, and it was negatively characterized by women (i.e. those who scored higher in this factor were less attracted to women). The feminine men were also secondarily placed on a different factor relating to androgyny, so I decided to label this factor Homosexuality.
  • A third very stable factor contained a variety of items relating to wild sex with many strangers. This seemed reminiscent of what social scientists call sociosexuality (essentially meaning promiscuity or “sluttiness”), so I labelled the factor Sociosexuality.
  • Fourth, all the way through the levels, there appeared to be a factor that contained a variety of peculiar sexual interests that did not seem to be defined or characterized by any common theme. My assumption is that this factor reflects the General Factor Of Paraphilia, so I labelled it Paraphilia.

I picked names for the narrow factors using the following reasoning:

  • One factor involved oneself being the opposite sex, often combined with various sex acts. On this blog, it is well-known that this represents Autogynephilia, a sexual interest in being a woman.
  • A related factor involved attraction to masculine women. It also to an extent involved attraction to feminine women, but my suspicion is that this is due to the factor analysis getting confused by bisexuals. Furthermore, androgynous men seemed to have a secondary loading on this factor. Therefore I emphasized the Androgyny part more than for the Homosexuality factor, and named it Androgyny/Gynephilia.
  • A well-known factor that popped up involved bondage, discipline, dominance, submission, sadism and masochism. Therefore I named it BDSM.
  • Finally, another factor that came up involved things with zoophilic and pedophilic themes, as well as themes involving bodily waste. Furthermore, my experience with looking at other survey data on sexuality makes me have some suspicions about what would also have been included if the items had been there, and that makes me label the factor Disgust/Taboo.

These generally seem like some reasonably interpretable factors. Furthermore, while it was difficult to fit into a diagram, it appeared I could coherently continue the factor analysis further, to 15 factors. An image can be seen here, but to summarize it split as follows:

  • Allosexuality remained as it was before
  • Homosexuality remained as it was before
  • The Disgust/Taboo factor split into three parts; a relatively pure Pedophilia/Ageplay factor, a Bodily Waste factor, and a Nonhuman Anthropomorphic factor.
  • The BDSM factor split into four parts; two relatively pure factors involving Submission/Masochism and Dominance/Sadism, plus the Bodily Waste factor, plus something that appeared to be a Fetishism factor (involving latex and leather)
  • The Paraphilia factor continued with a difficult to interpret generic Paraphilia factor, but it also spun off three other factors, namely a Fetishism factor, a Roleplay factor, and a Transvestism factor.
  • The Sociosexuality factor continued with a general Sociosexuality factor, but seemed to spin off a Roleplay factor as well as a factor that appeared to involve bimbos or body modifications.
  • The Androgyny/Gynephilia factor continued into an Androgyny/Gynephilia factor in the final layer, but it also appeared to reduce interest in a factor seemingly related to Transvestism.
  • The Autogynephilia factor seemed to split into an anatomic Autogynephilia factor and a Transvestism factor.

The full factor analysis can be seen here.

Towards a new general factor of paraphilia (GFP) measure

I have an idea for how to empirically prove that autogynephilia causes gender issues, but in order for the idea to work, I need some variable that influences autogynephilia. This study of paraphilias gives a good candidate for it. To see why, let’s recap some principles.

Almost all paraphilias are positively correlated with each other. This indicates that they have some sort of common underlying causes. If we lump all of these causes together into a single variable, then this variable is usually labelled the General Factor of Paraphilia. Of course, this is not very useful unless we can actually measure the variable. But fortunately, there’s an easy way of measuring it: simply measure a broad variety of the narrower paraphilias, and average them together. Causes that are specific to individual paraphilias will then disappear in the averaging, while causes that are common to all paraphilias will add up.
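To make the averaging argument concrete, here is a minimal simulated sketch. Everything in it is an assumption for illustration: the loading of 0.5, the nine items, the sample size, and the names `simulate_items` and `pearson` are all made up, and the simulated items are already standardized (real survey items would need to be standardized first). It is not the analysis used in this post.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_items(n_people=2000, n_items=9, loading=0.5, seed=0):
    """Each item = loading * general factor + item-specific noise (unit variance)."""
    rng = random.Random(seed)
    general = [rng.gauss(0, 1) for _ in range(n_people)]
    specific_sd = (1 - loading ** 2) ** 0.5
    items = [
        [loading * g + specific_sd * rng.gauss(0, 1) for g in general]
        for _ in range(n_items)
    ]
    return general, items


general, items = simulate_items()

# Average each person's answers across all items.
composite = [sum(answers) / len(answers) for answers in zip(*items)]

single_item_r = pearson(general, items[0])
composite_r = pearson(general, composite)
print(f"one item vs. general factor:  r = {single_item_r:.2f}")
print(f"item mean vs. general factor: r = {composite_r:.2f}")
```

In this toy setup the average tracks the simulated general factor much more closely than any single item does, because the item-specific noise partly cancels while the shared part does not.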

Because of the rich factor structure discovered in the previous section, it is important to sample paraphilias from a broad variety of factors, so that we don’t just end up measuring the narrower factors. Studies I’ve seen of the topic often fail to do this, and instead seem to sample mainly from the BDSM and Disgust/Taboo factors. In theory, this reduces the accuracy of their general factor measure.

To attempt to do better, I sampled paraphilias from a broad variety of factors. The correlation matrix can be seen here:

Correlation matrix between a broad range of paraphilic interests.

As described in previous posts, I then extracted the general factor of paraphilia, and looked at the correlations that remained after controlling for it. This yielded the following matrix:

Correlations after controlling for the general factor of paraphilia.
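For readers who want to see what “controlling for the general factor” can look like mechanically, here is a rough sketch on simulated data. It makes assumptions that may differ from what was actually done for the matrix above: the general factor is approximated by the plain item average, each item is regressed on that average, and the leftover residuals are correlated. The data, loadings, and helper names are hypothetical.

```python
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residualize(item, factor):
    """Remove the part of `item` that is linearly predicted by `factor`."""
    n = len(item)
    mi, mf = sum(item) / n, sum(factor) / n
    cov = sum((x - mi) * (f - mf) for x, f in zip(item, factor))
    var_f = sum((f - mf) ** 2 for f in factor)
    slope = cov / var_f
    return [(x - mi) - slope * (f - mf) for x, f in zip(item, factor)]


rng = random.Random(1)
n_people = 3000
general = [rng.gauss(0, 1) for _ in range(n_people)]
shared = [rng.gauss(0, 1) for _ in range(n_people)]  # extra cause shared by items 0 and 1

items = []
for i in range(10):
    if i < 2:
        # Items 0 and 1 load on the general factor AND on a shared narrow cause.
        items.append([0.5 * g + 0.5 * s + 0.5 ** 0.5 * rng.gauss(0, 1)
                      for g, s in zip(general, shared)])
    else:
        items.append([0.5 * g + 0.75 ** 0.5 * rng.gauss(0, 1) for g in general])

# Assumption: the item average stands in for the general factor.
gfp_estimate = [sum(answers) / len(answers) for answers in zip(*items)]
residuals = [residualize(item, gfp_estimate) for item in items]

raw_r_01 = pearson(items[0], items[1])
raw_r_23 = pearson(items[2], items[3])
resid_r_01 = pearson(residuals[0], residuals[1])
resid_r_23 = pearson(residuals[2], residuals[3])
print(f"items 0,1: raw r = {raw_r_01:.2f}, residual r = {resid_r_01:.2f}")
print(f"items 2,3: raw r = {raw_r_23:.2f}, residual r = {resid_r_23:.2f}")
```

The pair that shares an extra cause stays positively correlated after the adjustment, while an unrelated pair drops to around zero or slightly below (slightly below because each item is itself part of the average it is being adjusted for). Pairs that remain strongly correlated after the adjustment are the ones that are “too correlated” in the sense used for pruning.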

To shorten the list and obtain a purer measure, I then removed a number of items due to them being too correlated with other items on the list:

  • Anal penetration was too correlated with bondage, and so I removed anal penetration.
  • Attraction to women wearing men’s clothes was too correlated with autogynephilia. If I was optimizing purely for a measure of GFP, I would remove autogynephilia, but since the idea is to use this in conjunction with autogynephilia measures, I instead removed attraction to women wearing men’s clothes to avoid getting at anything too specific to this.
  • Having a cigarette-smoking partner was not very correlated with the GFP, but was relatively correlated with humiliation masochism, and so therefore I removed it.
  • Nyotaimori was too correlated with doctor roleplay, and therefore I removed it.
  • Flashing was not very correlated with the GFP, but was a bit too correlated with other variables for my liking, especially since it is a courtship disorder and therefore might be controversial to ask about in a survey. Therefore I removed it.
  • Bearded women were not very correlated with the GFP, but they were correlated with attraction to nippleless partners and to statues, so therefore to avoid introducing extra noise, I removed them from the list.
  • Similarly, balloons were mostly uncorrelated with the general factor of paraphilia, but were too correlated with latex and were therefore removed.
  • Getting peed on was correlated with humiliation masochism and therefore removed.
  • Making your partner adhere to a diet was mostly independent of the GFP, and was vaguely correlated to a variety of other things, and so was removed.
  • It might be worthwhile to investigate having sex with a religious figure. I removed it at this step because it was correlated with interest in sex with older partners, but as you will see, that item got removed at a later step in this test construction, and therefore this could be revisited.
  • Cat ears were removed because they correlated with a variety of other items.

This yielded the following items:

New selection of GFP items.

To further evaluate the items, it seemed appropriate to analyze them together with Allosexuality and Sociosexuality items, to ensure that they interact well. Upon doing so, some problems popped up:

Correlations between paraphilia items, sociosexuality items, and allosexuality items.

The bondage item appeared to be strongly associated with sociosexuality and allosexuality. The item about sex with older partners appeared to be strongly associated with sociosexuality. And the item about sucking on your partner’s tongue was strongly associated with allosexuality. Therefore, to achieve a cleaner paraphilia measure, these items were removed. I fit a confirmatory factor model to a reduced set of items, and it seemed to achieve a not-too-terrible fit. Thus the final set of items is:

  • Imagining being a member of the opposite sex
  • Having your partner call you slurs or insults
  • Imagining having sex with a vampire
  • Having your partner wear latex
  • Having a sexual partner with no nipples (blank skin where the nipples would be)
  • Rubbing your genitals on a piece of furniture
  • Having surgery to modify your body to be more erotic to your partner
  • Touching a naked statue
  • Pretending that you are a patient and your partner is your doctor as sexual role play

These are my current best attempt to make a brief general factor of paraphilia measure. For the psychometrically inclined, it has an alpha of 0.67, which is not so good, and indicates that the measure could use improvement. However, to me it seems like a reasonable starting point to work from.
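For anyone unfamiliar with alpha: Cronbach’s alpha compares the sum of the individual item variances to the variance of the total score. Here is a small sketch of the standard formula. The function name and the toy responses are made up for illustration, and this is not the survey data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 4 respondents answering 3 items.
toy_rows = [
    [1, 2, 1],
    [2, 1, 3],
    [3, 4, 2],
    [4, 3, 4],
]
toy_alpha = cronbach_alpha(toy_rows)
print(f"alpha = {toy_alpha:.3f}")
```

Alpha rises both when items correlate more strongly with each other and when there are more items, which is one reason a short nine-item scale built from deliberately dissimilar paraphilias ends up with a modest value.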

Who are the paraphiles?

It might be nice to get some idea of how paraphilias relate to other variables. Let’s start with other sexual interests. It is commonly claimed by Blanchardians that different sexual interests compete, so that if one is more into one thing, then one becomes less into other things. I found no trace of this in the sexuality survey, with the general factor of paraphilia instead being highly correlated with allosexuality all across the spectrum:

Essentially it was rare for participants to be paraphilic without being allosexual. The only form of paraphilia that I found evidence for being negatively associated with allosexuality was the disgust/taboo cluster of paraphilias.

Most likely, the correlation here is underestimated due to sampling effects; since this was a survey with a huge number of sexual fantasies, there wasn’t much reason for asexual or low-libido people to participate, and so they may end up undersampled. On the other hand, reddit has much higher rates of paraphilias than the general population, and this may lead to a higher correlation, due to there being more variance to examine.

I also found paraphilias to be even more correlated with sociosexuality. It might be entertaining to think about whether sociosexuality should be considered to be a paraphilia; it seemed like there were some paraphilias that it ended up closer to than it did to allosexuality.

I also decided to look at some group membership. I’ve heard some anecdotes and seen some studies to suggest that autism might be associated with paraphilias. However, when looking into it, I didn’t find much effect:

Shifts in paraphilic and other sexual interests for autistic men. The left three variables are the general factor of paraphilia, allosexuality, and sociosexuality (measured in standard deviation units), while the variables on the right are the specific paraphilias used to estimate the GFP (measured in absolute units). Black bars represent standard errors in the estimate. The numbers in the title refer to the sample size for autistic vs non-autistic men.

If anything, the main thing characterizing autistic men is that they were much less allosexual than non-autistic men. My hunch is that this is the key; being less allosexual, the proportion of paraphilic to normophilic activities they engage in will be paraphilic-skewed.

Another group of interest would be polyamorous men:

Shift in paraphilic and other sexual interests for polyamorous men.

As can be seen, they are much more sociosexual, but also much more paraphilic, than monogamous men. This matches previous observations that I have seen about polyamorous people having a kink for their partner having sex with someone else.

I’ve seen some people suppose that homosexuality is a paraphilia. However, this doesn’t really seem to be so; or at least, gay men don’t seem all that particularly paraphilic:

Shift in paraphilic and other sexual interests for gay men.

Bisexual men, on the other hand, seem to be more paraphilic:

Shift in paraphilic and other sexual interests for bisexual men.

This matches a hunch I’ve had for a while that bisexuality and homosexuality are more orthogonally related than continuously related. That is, I suspect that bisexuality results from a great level of sexual flexibility, or something like that.

We were also interested in the relationship between paraphilias and intelligence. Anecdotally, there seems to be a correlation between the two, with many of the communities that are highly paraphilic being known to also be highly intelligent. We had two measures of intelligence: first, we asked people if they had ever taken an IQ test, and if so, what their score was; second, we asked if they had taken the SAT, and if so, what their score was. The score was asked in broad buckets, with IQ being scored in buckets of 10 and SAT being scored in buckets of 100. Of the people who reported scores, most reported far above average, so YMMV if you believe that reddit is full of geniuses. But if you do believe the data, then I can say that there was a moderate correlation between the two cognitive scores, at r~0.36. To get an overall cognitive score, I averaged them together.

Scatterplot containing intelligence and paraphilias. To reduce the degree to which points overlap due to low measurement fidelity, I did some slight reweighting before taking the averages so they would be more noisy, but there is probably still overlap.

There was no correlation between intelligence and paraphilias, r~-0.02. So there goes that theory.

Attraction to androgyny

A final thing to investigate is the structure of attraction to androgyny. In surveys I often find I want to ask about attraction to androgynous people, but I don’t know what dimensions exactly to include. On my request, this survey included a bunch of androgynous archetypes, and so I can factor-analyze them:

  • a woman who has a full beard and a lot of body hair
  • an otherwise feminine woman who is mainly into penetrating you using a strapon
  • an assertive, muscular woman with masculine interests (a tomboy)
  • an ambitious career-focused woman who has a high position in a business job
  • an “Amazonian” woman; a woman who is taller and stronger than you are
  • a woman who exclusively wears masculine clothes, has short hair, is socially dominant, coarse, and has masculine interests
  • a nerdy woman who is awkward and not very interested in people
  • a woman who has small breasts and narrow hips
  • a man who has very effeminate, “campy” mannerisms and speech (but who still presents masculine)
  • a physically androgynous man who often wears women’s clothes (a femboy)
  • a very short, narrow-shouldered man with a soft face
  • a sweet/caring unambitious man who wants to be a househusband and start a family
  • an otherwise masculine man who is mainly into being anally penetrated by you
  • a sensitive/emotional artistic man, who is physically slender and tends to daydream
  • a physically masculine man who finds it hot to wear women’s clothes during sex
  • a pre-operative passing trans woman (MtF, a feminine-looking woman with a penis)
  • a pre-operative passing trans man (FtM, a masculine-looking man with a vagina)
  • a passing trans woman who has had surgery to get a vagina (MtF)
  • a passing trans man who has had surgery to get a penis (FtM)
  • a very androgynous person who you can’t tell whether is male or female

For most of the above, the question asked was how arousing the participants would find it to have sex with the archetype. However, for the final archetype the question was how arousing they would find it to make out with someone of that archetype.

I also included a number of nonandrogynous controls:

  • a physically fit man who likes to engage in sports
  • an ambitious career-focused man who has a high position in a business job
  • a nerdy man who is awkward and not very interested in people
  • a sweet/caring motherly woman, who wants to be a housewife and start a family
  • a female cheerleader
  • an artistic, feminine woman

Overall, I found I could squeeze four factors out of it: attraction to men, to women, to masculine women, and to trans/androgynous people.

Bass-ackwards analysis of the archetype items for men. The factor analysis can be seen here.

When looking at the data, I got the impression that there was some nonlinear structure that couldn’t be accounted for by the factor analysis. Perhaps it’s just bisexuals being more into androgyny, but it might be worth looking into in the future.


This is a rich dataset, and I’ve probably only scratched the surface. If there’s anything specific you want me to investigate, consider contacting me on discord via tailcalled#7006. I’m likely to also make further blog posts in the future on the basis of this dataset. This is a pretty big and aimless blogpost, so I have to find some way to end it, and I’m deciding to do so here.

1. Strictly speaking I used principal component analysis rather than factor analysis. PCA tends to yield nearly identical results to FA, but is computationally more readily available.

2. The items originate from a variety of sources. Some were included due to having been used with success in previous surveys. I suggested some because I wanted to study attraction to androgyny. Pasha included some to study “pairs” of self-related and other-related items (e.g. attraction to bimbos vs to being a bimbo). In order to get a broad sample, I also used GPT-3 to brainstorm items, with me picking the most plausibly relevant ones out of a big set.

Revisiting the instrumental variables strategy for testing AGP GD causation

Autogynephilia correlates with cross-gender ideation, gender dysphoria, and other gender issues. Usually Blanchardians attribute this to autogynephilia causing gender issues, but critics point out that correlation!=causation, and often argue that it is instead gender issues that cause autogynephilia, because someone who wants to be a woman would also want to engage in sexual activities as a woman and such.

A while ago, I had the idea that we could test the causal relationship between AGP and GD by looking at people who are more or less kinky. Specifically, the idea was that while some could imagine that wanting to be a woman would cause autogynephilia, it wouldn’t make much sense for it to cause kinkiness in general. Therefore, if we observe a correlation between kinkiness and gender issues, it would make most sense for this to be due to a kinkiness -> AGP -> GD effect, and therefore it would support an AGP -> GD causality. I found such an association, and therefore concluded that there was support for the AGP -> GD effect.

Shortly after I wrote the post, Michael Bailey sent me an email criticizing it by pointing out that applying instrumental variables in this way can be problematic, linking to a paper where he made the critique in detail. Which in retrospect is pretty obvious; I even emphasized these sorts of problems in my blog post, but perhaps I didn’t take them seriously enough, considering that I did still attempt to do this.

I think I’ve come up with a way to fix the method, and test the AGP -> GD effect in a much more solid way. This blog post intends to give an introduction to this concept; I still need more data before I can definitively test it, but I can use the previous data as an illustration.

Empirical causal inference in science 101

The main point of doing research is to uncover causal relationships. A common problem in science is that you’ve got two variables X and Y (in this case, AGP and GD), and you want to figure out the causal effect of X on Y. To solve this problem, a broad range of methods have been developed. Enumerating them all can be daunting, but luckily they mostly tend to follow a pretty consistent formula: To identify the effect of X on Y, you isolate some cause of X and look at how Y varies as this cause varies. So for instance, when you do a randomized controlled experiment, the cause of X that you isolate is your experiment, and then you look at how Y varies from your control group to your experimental group.

Most forms of quantitative causal inference between variables X and Y involve finding some cause Xc of X that doesn’t suffer from problems due to confounding or reverse causation. See the blog post for details.

The core assumption that this method makes is that the cause you isolate is not correlated with the outcome of interest, other than via its effect on X. Putting the case of autogynephilia and gender dysphoria into this framework, my strategy was to isolate general kinkiness as a cause of autogynephilia, and then look at how gender dysphoria varies between non-kinky and highly kinky people. But one could easily question whether the assumption holds here; for instance, you might suspect that people who are more sexually open-minded are both more kinky and more likely to want to be the opposite sex. Or really, lots of other things.

In particular, part of the problem is that “kinkiness” is a particularly difficult sort of variable to use for this approach. If I take the average interest across a wide range of sexual interests, then the variable I am measuring is “whatever things contribute to a wide range of sexual interests”. This is a pretty unbounded category of causes; while I have trouble thinking of any one single thing that would go into it (libido maybe?), it also seems unlikely that this is definitely going to be unconfounded. My plan after writing the blog post was to start investigating these sorts of hypotheses, searching for confounders and adjusting for them. But ultimately the problem is that you only need a very tiny violation of the assumptions to get wrong results, and therefore this is not a viable strategy.
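To illustrate how fragile this is, here is a simulated sketch of the simplest instrumental-variable estimate (the ratio of covariances, sometimes called the Wald estimator). All numbers are invented: `z` plays the role of the isolated cause (e.g. general kinkiness), `x` the variable of interest, `y` the outcome, and `u` an unobserved confounder of `x` and `y`. The parameter `direct_effect` is a hypothetical leak from `z` to `y` that bypasses `x`, i.e. a violation of the core assumption.

```python
import random


def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate(direct_effect, n=50000, seed=2):
    """True effect of x on y is 0.5; z affects x only weakly (0.3)."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                       # the isolated cause
        u = rng.gauss(0, 1)                       # unobserved confounder
        x = 0.3 * z + u + rng.gauss(0, 1)
        y = 0.5 * x + direct_effect * z + u + rng.gauss(0, 1)
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


def wald_estimate(zs, xs, ys):
    """IV estimate of the effect of x on y, using z as the instrument."""
    return covariance(zs, ys) / covariance(zs, xs)


def ols_slope(xs, ys):
    """Naive regression slope of y on x, ignoring confounding."""
    return covariance(xs, ys) / covariance(xs, xs)


zs, xs, ys = simulate(direct_effect=0.0)
ols_clean = ols_slope(xs, ys)
iv_clean = wald_estimate(zs, xs, ys)

zs, xs, ys = simulate(direct_effect=0.1)
iv_violated = wald_estimate(zs, xs, ys)

print(f"naive slope (confounded):         {ols_clean:.2f}")
print(f"IV estimate, assumption holds:    {iv_clean:.2f}")
print(f"IV estimate, small direct z -> y: {iv_violated:.2f}")
```

In this linear setup the bias from the leak is roughly the leak divided by the strength of the z -> x link, so a weak cause of X turns even a small violation into a large error. That is the sense in which a tiny violation is enough to get wrong results.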

This is a general problem with figuring out the AGP <-> GD causality

I investigated the causal direction using general kinkiness as a root cause, but there are other attempts to figure out AGP <-> GD causality that fall into the same general category, and which encounter the same problems.

Consider for instance time as a cause of autogynephilia. Kids are, for complicated evolutionary reasons, not very sexual, with libido instead firing up at puberty. As such, Blanchardians might want to use the contrast between childhood gender issues and adulthood gender issues as a measure of the contribution of autogynephilia. [1] This can be critiqued in a lot of ways, but perhaps the best critique is to point out that it’s far from obvious that this is unconfounded. Puberty is also a time where a lot of sexual differentiation happens, and where gender-related topics become relevant in new and different ways, so it’s very far from obvious that this is an unconfounded measure of the effect of AGP.

Another example involves relationship status. An AGP researcher I’ve talked to argued that you could use the differences in autogynephilia and gender issues between times where an autogynephile is single and times where the autogynephile is in a romantic relationship to estimate the effect of autogynephilia on gender issues. [2] The idea is that some autogynephiles feel that they are more autogynephilic when they don’t have a girlfriend. Leaving aside the issue that I am kinda skeptical of the effect of relationship status on autogynephilia, it seems far from obvious to me that relationship status doesn’t influence gender issues through other means. It seems to definitely influence the pros and cons of transitioning, and it seems like someone who has more opportunity to transition would also have a greater interest in doing so. Which makes relationship status an invalid variable to use to estimate these things.

I think the problem pops up all the time in these debates: HRT, random variation in GD over time, shifts in GD when seeing or thinking about sexy women, etc. Almost all the back-and-forth arguing about the validity of AGP models comes down to the issue that we’re trying to parse out causality from a bunch of related proxy variables, without having a definite idea of how these variables function.

It is worth saying that the problem is not that we know some specific confounding variable that makes the tests invalid. Rather, the bigger problem is that we have no idea how the variables are related, so there could easily be tons of confounders and unintended mediators that we don’t understand. These sorts of methods shouldn’t be taken lightly, with all the arguments mindlessly thrown at the wall to see what sticks. Rather, we need to take a step back and identify some more well-justified method for studying this.

May be an image of outdoors and text that says "ENDOGENEITY Me adding more controls to my regression"

Recently, I decided that this whole class of methods was inherently flawed for investigating things, and looked into alternate methods of causal inference, most notably analogy-based reasoning. For instance, one such argument would be “we know autogynephilia is a sexual interest, and sexual interests cause desires, rather than being caused by the desires”. But these alternate methods have their own new and exciting difficulties to struggle with, so I haven’t been able to do anything definite with them. But as I mentioned in the beginning of the post, I’ve come up with a way to fix the standard approach for causal inference, so let’s get around to this.

Maybe we should just investigate how our causes work

So back to the matter at hand. We want to know the effect of autogynephilia on gender dysphoria. So to do this, we look at the causes of autogynephilia, and identify general paraphilic tendencies as a cause. But the problem is, we don’t know how general paraphilic tendencies work, so maybe they have some hidden correlation with gender dysphoria (e.g. via sexual open-mindedness) that makes our tests invalid.

The problem illustrated diagrammatically. Each node represents a variable, and the arrows represent causal effects while the lines represent unknown effects. GFP refers to the general kinkiness variable that we estimate by asking people about a bunch of unrelated paraphilias. ??? refers to hidden confounders that may make our analysis invalid.

In fact, if we knew the strength of the hidden correlation, we could just subtract it off in order to make our tests valid again. There are some asterisks here that should be taken into account, but I think it’s at least a promising path forward. But that raises the question: how do we figure out the hidden correlation between general paraphilia and gender issues?

The obvious way to figure out whether this is the case would be to correlate general paraphilic tendencies with gender issues. If there is some sort of connection between them, then that connection should show up as a correlation between the two. But of course the problem is, the connections between paraphilias and GD also include the kinkiness -> AGP -> GD connection, which is precisely the connection that we want to estimate. We would end up subtracting the correlation from itself, yielding zero no matter what.

So is there some way that we can figure out the kinkiness <-> GD correlation, minus the kinkiness -> AGP -> GD path? Here’s my idea: Just look at the correlation between kinkiness and GD in non-AGP men. If the men aren’t AGP, then the kinkiness -> AGP -> GD path cannot be in play. Next, subtract this off from the correlation between kinkiness and GD overall, and you get your causal estimate.

Rather than investigate the associations among all men, we can simply investigate the associations among non-AGP men. This doesn’t include the kink -> AGP -> GD path, allowing us to investigate potential confounders.
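
The split-and-subtract idea can be sketched in a few lines of Python. This is a minimal sketch on simulated data, not the analysis code behind this post: the data-generating model, the coefficients (0.3, 0.5), the hidden confounder, and the function names `split_and_subtract` and `simulate` are all assumptions made up for illustration. Note that the simulated latent is symmetric around zero, which is the favorable case for the method.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_and_subtract(gfp, agp, gd):
    """Correlate GFP with GD separately among non-AGP (agp == 0) and AGP (agp > 0) rows."""
    non = [(k, g) for k, a, g in zip(gfp, agp, gd) if a == 0]
    yes = [(k, g) for k, a, g in zip(gfp, agp, gd) if a > 0]
    r_non = pearson([k for k, _ in non], [g for _, g in non])
    r_agp = pearson([k for k, _ in yes], [g for _, g in yes])
    # The difference is the estimate for the kinkiness -> AGP -> GD path.
    return {"r_non_agp": r_non, "r_agp": r_agp, "path_estimate": r_agp - r_non}


def simulate(n=40000, agp_effect=0.5, seed=0):
    """Hypothetical world: a hidden confounder plus a true AGP -> GD effect."""
    rng = random.Random(seed)
    gfp, agp, gd = [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)                  # hidden confounder (assumed)
        k = c + rng.gauss(0, 1)              # general kinkiness (GFP)
        latent = 0.5 * k + rng.gauss(0, 1)   # latent eroticism of being a woman
        a = max(0.0, latent)                 # AGP is zero unless the latent is positive
        g = 0.3 * c + agp_effect * a + rng.gauss(0, 1)
        gfp.append(k)
        agp.append(a)
        gd.append(g)
    return gfp, agp, gd


result = split_and_subtract(*simulate())
print(result)
```

Setting `agp_effect=0.0` in `simulate` gives a world where AGP has no effect on GD; in that case the two correlations should come out roughly equal and the path estimate should be close to zero.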


I figured this method out a while ago now, and I had actually intended to do a separate survey to collect new data to test it. But then I started getting distracted, and I figured, hey, I’ve got the previous porn survey that I originally tested this method in, I might as well try it again on that data. Later we will discuss some reasons why this survey isn’t ideal, but it seems like a reasonable starting point.

So, a bit of background: the dataset I’m going to analyze comes from a survey I posted to /r/SampleSize, titled “[Casual] Can you look at some porn For Science? Survey #5 (18+) NSFW”. In the survey, I showed people various erotic images containing men and women doing various erotic things. In addition to this, I also asked a number of questions, including questions about sexual interests and gender issues. I got about 1000 male responses, which is quite a large sample. That is good, because this method is incredibly data-intensive.

To measure general paraphilia, I had some items measuring sexual interests by asking about arousal on a rating scale from “Not at all” to “Very”. I took the average response to how aroused the participants said they would get by the following themes (alpha=0.52):

  • Being tied up by your partner
  • Exposing my genitals to an unsuspecting stranger
  • Watching a video of yourself masturbating
  • Having an older sexual partner take on a dominant parent-like role in the relationship
  • Imagining having sex with an anthropomorphic animal (furry)
  • Caressing your partner’s feet
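
For readers unfamiliar with the alpha values quoted for each scale: assuming they are Cronbach's alpha (the post does not say which alpha is meant), the statistic can be computed as below. The function name `cronbach_alpha` and the toy responses are hypothetical and are not survey data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])  # number of items in the scale
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])  # variance of the summed score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: four respondents answering two items (made up for illustration).
toy = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why a scale built from deliberately unrelated paraphilias would be expected to score lower than a scale built from closely related items.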

To measure autogynephilia, I took the average response to how aroused the participants said they would get by the following themes (alpha=0.81):

  • Imagining being the opposite sex
  • Wearing clothes typically associated with the opposite sex (crossdressing)
  • Picturing a beautiful woman and imagining being her
  • Wearing sexy panties and bras
  • Imagining being hyperfeminized, i.e. turned into a sexy woman with exaggeratedly large breasts and wide hips

Those who answered “Not at all” to all of the above were categorized as non-AGP (n=316), while the remainder were classified as AGP (n=828).

To measure gender dysphoria, I had some items that asked about how masculine/feminine the participants were, with a rating scale going from “Disagree Strongly” to “Agree Strongly”. Among those, I used the following two to assess gender issues (alpha=0.61):

  • As a child I wanted to be the opposite sex
  • I feel I would be better off if I was the opposite sex

Among non-AGPs, the correlation between GFP and GD was 0.02 (with a standard error of 0.06 according to bootstrap). This could be taken to indicate that there was no confounding between GFP and GD at all, though make sure to read the rest of the blog post to see an asterisk with this interpretation. Among AGPs, the correlation between GFP and GD was 0.15 (SE 0.03). Therefore, subtracting them yielded a correlation of 0.13 (SE 0.07).
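
The bootstrap standard errors and the standard error of the subtracted correlation can be sketched as follows. This is a minimal sketch, not the code used for the numbers above: the resampling details (`n_boot`, the seed) and the function names are assumptions, and combining the two standard errors in quadrature assumes the AGP and non-AGP estimates are independent, which is plausible since they come from disjoint groups of respondents.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_se(x, y, n_boot=500, seed=0):
    """Standard deviation of the correlation across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    mean = sum(stats) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / (n_boot - 1))


def se_of_difference(se_a, se_b):
    """SE of (a - b) when the two estimates are independent."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Numbers quoted in the text: 0.15 (SE 0.03) among AGPs, 0.02 (SE 0.06) among non-AGPs.
print(round(0.15 - 0.02, 2), round(se_of_difference(0.03, 0.06), 2))
```

With the rounded inputs from the text, the combined standard error comes out at roughly 0.07, matching the value reported for the subtracted correlation.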

This 0.13 number is pretty low, but it is the value for the GFP -> AGP -> GD path, not for the AGP -> GD step of it. To get the latter, I divide out by the GFP <-> AGP correlation among AGP men. This is a correlation of 0.37 (SE 0.03), yielding a value of 0.35 (SE 0.2) as the causal effect of autogynephilia on gender issues among autogynephiles.

This effect is technically not an effect for the whole sample, but instead only among the subset that are autogynephilic. I can assume it simply linearly extrapolates to the entire sample, in which case I get a total effect of 0.36 (SE 0.2). If I subtract this off from the original correlation of 0.45 (SE 0.03) between autogynephilia and gender issues, that leaves an effect of 0.1 (SE 0.2) that isn’t explained by the AGP -> GD effect. So this examination indicates that 80% of the correlation between autogynephilia and gender issues is causal AGP -> GD.
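
The arithmetic in the last two paragraphs can be retraced from the rounded numbers. In this sketch the standard error of the ratio uses a first-order (delta-method) approximation with independent errors, which is an assumption and not necessarily how the SE of 0.2 above was obtained; the extrapolated total effect of 0.36 is taken directly from the text rather than recomputed, and `ratio_with_se` is a hypothetical helper name.

```python
import math


def ratio_with_se(num, se_num, den, se_den):
    """Ratio num/den with a first-order SE, assuming independent errors."""
    ratio = num / den
    rel = math.sqrt((se_num / num) ** 2 + (se_den / den) ** 2)
    return ratio, abs(ratio) * rel


# GFP -> AGP -> GD path estimate, divided by the GFP <-> AGP correlation among AGP men.
path, se_path = 0.13, 0.07
effect, se_effect = ratio_with_se(path, se_path, 0.37, 0.03)

total_effect = 0.36   # extrapolated to the whole sample, as reported in the text
raw_corr = 0.45       # overall AGP <-> GD correlation
residual = raw_corr - total_effect
share = total_effect / raw_corr

print(round(effect, 2), round(se_effect, 2), round(residual, 2), round(share, 2))
```

Most of the uncertainty in the final effect comes from the numerator: the path estimate of 0.13 has a standard error more than half its own size, and dividing by 0.37 scales both up together.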


And here’s the bad news: the effect of 0.36 is not statistically significant. That’s not to say that it’s too “small” to be important or something like that. Rather, statistical significance is a technical term used to describe when the sample size is big enough that it would be hard for the result to have been achieved by chance, just from randomly picking people who happen to align with the theory. In order for a result to be statistically significant, it must be the case that if there were no effect, you’d only get results as extreme as that result 5% of the time. But that would require our effect to be greater than 0.4, which it is not.

The good news is that the remaining correlation of 0.1 also wasn’t statistically significant. It would have to exceed 0.38 to be significant, which it very much did not.
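
As a rough check on the significance statements, here is a sketch using a normal approximation (estimate divided by standard error, two-sided test at the 5% level). The normal approximation and the function names are assumptions; the thresholds quoted in the text may differ slightly from what this gives, since they are presumably based on unrounded standard errors.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided p-value for estimate/se under a standard normal approximation."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))


def critical_value(se, z_crit=1.96):
    """Smallest absolute estimate that would reach the 5% level for this SE."""
    return z_crit * se


for label, est, se in [("AGP -> GD effect", 0.36, 0.2),
                       ("remaining correlation", 0.1, 0.2)]:
    print(label, round(two_sided_p(est, se), 3), round(critical_value(se), 2))
```

With a standard error of 0.2, an estimate has to be around 0.39 or larger to count as significant, which is why 0.36 falls just short.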

What this lack of significance means is that this survey isn’t the final step in the story. We need to collect more, bigger data. Compared to just going with the direct correlation, this method needs very large sample sizes. I would estimate that this method requires about 15x as many participants as the more straightforward methods, though it depends very much on the details.

We also need better data. The paraphilia and gender issues measures used in this survey were very low-quality. I’ve been working on better measures, but I could still use improvements. The autogynephilia measure is also kind of ad-hoc, and could benefit from more coherence and thought.

It may also help to get more controls. If we can better account for other factors that influence gender dysphoria, then that can let us estimate the effects more precisely for autogynephilia. It may also be that we can somehow combine this with my analogy-based methods to improve things.

It should also be noted that this method can be used for other things than autogynephilia theory too. For instance, it could likely be used to test the “autoandrophobia” theory that is often brought up by critics of autogynephilia. This theory is rarely explicated, but I did once talk with a trans woman who gave me her idea of it. In that variant, people end up with certain random things that they are disgusted by, similar to how people end up with certain random things that they find erotic; and if one then ends up finding having male traits to be disgusting, then that would cause gender dysphoria. This theory could be tested by replacing the general factor of paraphilia with a general factor of disgust sensitivity, and replacing autogynephilia with autoandrophobia.

Finally, let’s discuss the potential problems and assumptions with this method. This is going to get technical, so be warned.

Conditioning is not a counterfactual

This first point is kind of abstract, so let’s instead discuss my favorite statistical paradox, Berkson’s paradox. I like the examples given in this Twitter thread: Why are handsome men jerks? Why don’t standardized test scores predict university performance well? Why are movies based on good books usually bad? Why are smart students less athletic? Why do taller NBA players not perform better at basketball?

Stolen slide illustrating Berkson’s paradox. By selecting a subset of the population, you introduce a negative correlation between the variables you select on.

If we filter our sample on the basis of some set of variables, then that filtering introduces a ton of spurious correlations between all of the variables that are upstream of our filtering. The usual pattern will be negative correlations between the causes, but we might have other things going on, depending on the specific details.
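
Berkson’s paradox is easy to reproduce in a simulation. The sketch below uses the “handsome men are jerks” example with two made-up, independent traits and a hypothetical selection rule (only men whose looks plus niceness exceed a cutoff end up in the sample); none of this comes from real data.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def berkson_demo(n=20000, cutoff=1.0, seed=0):
    """Return (correlation in everyone, correlation among the selected)."""
    rng = random.Random(seed)
    looks = [rng.gauss(0, 1) for _ in range(n)]
    nice = [rng.gauss(0, 1) for _ in range(n)]   # independent of looks by construction
    kept = [(l, k) for l, k in zip(looks, nice) if l + k > cutoff]
    r_all = pearson(looks, nice)
    r_kept = pearson([l for l, _ in kept], [k for _, k in kept])
    return r_all, r_kept


r_all, r_kept = berkson_demo()
print(round(r_all, 2), round(r_kept, 2))
```

In the full population the two traits are uncorrelated, but among the selected men the correlation is clearly negative, purely because of the filter.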

So when we compute things for the non-AGP and AGP men separately, we may very well introduce some additional correlations that don’t correspond to anything real. How big of a problem is this? Lemme give you my threat model, to evaluate what happens.

Threat model: AGP merely reflects a kinky way to express gender feelings. The association between the GFP and GD is not due to GFP -> AGP -> GD, but instead due to some underlying common cause, e.g. sexual open-mindedness or something abstract like that.

The most common critique of the AGP->GD hypothesis is to claim that it makes more sense for there to be a GD->AGP effect. If we then filter for those who are not AGP, then that seems like it should lead to exactly the sorts of classical Berkson’s paradox effect that I’ve brought up here: You would only be included in the sample if you are not AGP, which you would be unlikely to be if you were both kinky and GD, so you’d have to either be neither kinky nor GD, be only kinky, or be only GD. Further, if you were only GD, then you would probably need to be less kinky than average to cancel it out, while if you were only kinky, then you would probably need to be less GD than average to cancel it out. So this could explain why we got a correlation of 0.02 between kinkiness and gender issues among non-AGPs; maybe the “true” correlation was higher, but it was masked by the filter effect.

So that seems like a problem. But this isn’t the only filtering we did. We also looked at the correlation between kinkiness and GD among AGP men, and subtracted the correlations from each other. Thus, if the Berkson’s paradox effect is equally big for both of them, it should cancel out. Could that be the case? And if it isn’t the case, could we estimate the discrepancy and adjust for it?

Here’s one condition where it would be the case: All of the variables are normally distributed and linearly related, and when we filter for non-AGP men, we take the men who have below-average amounts of AGP, while when we filter for AGP men, we take the men who have above-average amounts of AGP. Because we’d then be filtering equally strongly when we took the below-average and above-average AGPs, it would exactly cancel out, and there would be nothing to be concerned about.

The problem with this condition is that it’s obviously wrong. For instance, the distribution of AGP looks like this:

That looks extremely non-normal to me.

But there are many ways that it could be rescued. Suppose, for instance, that you believe the participants see being a woman as having some degree of eroticism, which may be negative or positive, and suppose that a man ends up AGP if he sees being a woman as having a positive degree of eroticism. In that case, you’d expect to see some sort of distribution similar to the above, where there’s a large spike around 0, and a distribution above this. Further, if you believe that there are many factors that influence the latent eroticism (and you almost must, considering that we can’t find any factors that predict AGP), then it seems reasonable to suppose that this is normally distributed, as tends to happen in polyfactorial cases due to the central limit theorem. So in this model you would have AGP expressed as follows:

AGP = max(0, kinkiness + gender issues + ..?other factors?..)

An alternative would be a conjunctive model. The previous model assumes that if there is some factor that influences the latent eroticism of being a woman strongly enough, then that factor alone can cause AGP, by overpowering the other factors. But what if instead you think that factors need to interact to cause AGP? A simplistic example might be that you are AGP if you are both kinky and open to being a woman; but other more nuanced models are possible. Here you would express AGP as a product:

AGP = kinkiness * gender issues * ..?other factors?..

(Here, all of the factors would need to be positive; otherwise you get bizarre inversions where if a factor gets negative then all of the other factors end up having the opposite effect.)

It turns out that these models are approximately isomorphic! Specifically, first notice that the maximum function and the exponential function have approximately the same shape for small input values:

Shapes of the maximum function and the exponential function.

Therefore, we can approximately replace the first model with the following:

AGP = exp(kinkiness + gender issues + ..?other factors?..) = exp(kinkiness) * exp(gender issues) * …

Applying the exponential function to the other factors is exactly what is necessary to turn them strictly positive, as is expected by the conjunctive model. Overall I’ve spent a lot of time thinking of different models for how things could interact, and most of them seem like they end up approximately isomorphic to this model (though I’m open to hearing counterexamples if you have any), so I think it’s probably okay to use.[3]

So to recap, what this implies is that the Berkson’s paradox effect will be equally big if we filter equally hard on the AGP category and on the non-AGP category, which will happen if we have equally many in each of the categories.

And that’s actually part of the problem with the porn survey. I had 316 men in the non-AGP category, and 828 men in the AGP category, so that means only 28% of the respondents were non-AGP. Meanwhile, in the general population, about 3%-15% of men are AGP and the rest are non-AGP. So in neither case would I end up with an even split. However, on Reddit, the proportion of AGPs is actually often quite close to 50%, so it might be doable there. (I’m not sure what happened in the porn survey – I suspect it’s just that AGPs are horny.) Otherwise, it might also be interesting to look into whether there are any mathematical ways to adjust for the asymmetry.
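
To get a feel for how much the split proportion matters, here is a simulation sketch of the threat model. Everything in it is an assumption for illustration: kinkiness and gender issues are independent standard normals, both feed a normally distributed latent, and the lowest fraction of the latent is labelled non-AGP. The function name `split_correlations` is hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_correlations(frac_non_agp, n=60000, seed=0):
    """Kink-GD correlation among the lowest `frac_non_agp` of the latent and among the rest."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        kink, gd = rng.gauss(0, 1), rng.gauss(0, 1)   # truly uncorrelated
        latent = kink + gd + rng.gauss(0, 1)          # both raise the latent (GD -> AGP world)
        rows.append((latent, kink, gd))
    rows.sort()                                       # tuples sort by latent first
    cut = int(frac_non_agp * n)
    low, high = rows[:cut], rows[cut:]
    r_low = pearson([r[1] for r in low], [r[2] for r in low])
    r_high = pearson([r[1] for r in high], [r[2] for r in high])
    return r_low, r_high


print("even split:   ", split_correlations(0.5))
print("28% non-AGP:  ", split_correlations(0.28))
```

Under these assumptions the spurious negative correlation is about the same in both halves with an even split, so it cancels in the subtraction. With 28% non-AGP, the smaller group is filtered harder and picks up a stronger spurious correlation, so the subtraction no longer cancels it.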

Nonlinearities kill

Part of the assumption made in this method is that whichever confounders there may be between kinkiness and gender issues work the same way in AGPs and non-AGPs. If this is true, then I think the approach is in pretty good standing. However, what if they don’t? Suppose for instance that we have some sort of situation like this:

That is, suppose gender dysphoria is caused by some sort of neurological feminization (it’s not particularly important that this is so, but I had to pick some concrete variable), and suppose that gender issues arise from this. But suppose further that sexual openmindedness (or whatever, the particular variable isn’t very important) moderates this effect, such that the effect of ladybrains on gender issues is stronger for those who are sexually openminded (maybe the others repress, or are unwilling to admit their gender issues, or whatever).

In that case, AGPs would be more likely than non-AGPs to have ladybrains, and therefore the confounding between GFP and GD would be stronger for them. Which would lead to my method concluding that AGP causes GD, even though in this case it doesn’t.
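
A toy simulation shows how this kind of moderation could fool the method. The model below is only one hypothetical way to formalize the scenario: openness drives both kinkiness and AGP, “ladybrain” is a non-negative trait, its effect on GD only switches on for positive openness, and AGP has no effect on GD at all. All names and coefficients are made up.

```python
import math
import random


def pearson(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def moderation_demo(n=60000, seed=0):
    """Return (r_non_agp, r_agp) for GFP vs GD in a world with no AGP -> GD effect."""
    rng = random.Random(seed)
    non, agp = [], []
    for _ in range(n):
        openness = rng.gauss(0, 1)
        ladybrain = abs(rng.gauss(0, 1))          # non-negative by construction
        gfp = openness + rng.gauss(0, 1)          # kinkiness partly reflects openness
        is_agp = openness + rng.gauss(0, 1) > 0   # AGP depends on openness only
        # Openness moderates the ladybrain effect; AGP itself does nothing to GD.
        gd = ladybrain * max(0.0, openness) + rng.gauss(0, 1)
        (agp if is_agp else non).append((gfp, gd))
    return pearson(non), pearson(agp)


r_non, r_agp = moderation_demo()
print(round(r_non, 2), round(r_agp, 2), round(r_agp - r_non, 2))
```

In this toy world the GFP-GD correlation is noticeably larger among AGPs than among non-AGPs, so the subtraction would report a positive path estimate even though AGP was given no causal role.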

It would probably be a good idea to evaluate how sensitive this method is to nonlinearities. In addition, ways of making it more robust should be evaluated. Further, in the context of nonlinearities, it should be noted that the method sort of relies on something nonlinear-like going on. I split on the basis of AGP vs non-AGP, with the logic being that the GFP can’t influence AGP among non-AGPs. But for there to be some context where the GFP can’t influence AGP, there must be a nonlinear relationship between the GFP and AGP.

Estimation shenanigans

When I computed the effects, I did all sorts of subtractions of correlations and such from each other. This isn’t strictly valid; the correct way to adjust for the confounding between GFP and GD depends on the nature of how the confounding works, leading to a spectrum of possible adjustments. Furthermore, if variances differ (for instance, there’s more variance in AGP among AGPs than among non-AGPs, as non-AGPs have 0 variance in AGP), then using correlations rather than regression coefficients is invalid.
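
The point about variances can be illustrated with a small simulation. In this sketch (hypothetical variable names, made-up slope of 0.5), the same causal slope is applied in two groups that differ only in how much the predictor varies; the regression coefficient stays put while the correlation changes.

```python
import math
import random


def slope_and_corr(sd_x, n=20000, seed=0):
    """Simulate y = 0.5 * x + noise and return (regression slope, correlation)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, sd_x) for _ in range(n)]
    y = [0.5 * a + rng.gauss(0, 1) for a in x]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


print("low-variance group: ", slope_and_corr(1.0))
print("high-variance group:", slope_and_corr(2.0))
```

Both groups recover a slope near 0.5, but the correlation is much higher in the group with more spread in the predictor, so subtracting correlations across groups with different variances mixes up effect size and range.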

In fact, if I take this last point into account and reevaluate the coefficient from the data, then I get an effect size of 0.38 (SE 0.17), which just barely manages to be statistically significant. But this isn’t the only estimation shenanigan I did, and in order for the results to be believable, it would be good to go through and see if the estimation can be made more accurate. In cases where we don’t have sufficient information to make it more accurate, we should try varying the assumptions to see how sensitive it is to them.

Overall, due to all of these complications, this should merely be seen as a proof of concept, and not necessarily as a finished, definite solution. But I think the trick I presented in this post, of comparing the effect in AGPs and non-AGPs, makes me more open to the possibility that this class of methods for causal inference may be workable for deciding the validity of AGP->GD causality.

1. From the perspective of Blanchardian theory, what would be most convenient would be if AGPs didn’t have any childhood gender issues at all, because this would seriously cast doubt on the possibility of GD -> AGP. However, pursuing this argument is not very viable, because when pressed, Blanchardians admit that oftentimes AGPs do have some gender issues in childhood.

Blanchardians argue that this may be analogous to how children sometimes end up with childhood crushes, with the childhood gender issues corresponding to a sort of romantic ideation. Which, sure, whatever, seems like a fair enough possibility. But it complicates the idea of using time as a cause of autogynephilia for causal inference, and Blanchardians should stop making this argument.

2. The idea behind this proposal is that autogynephilia and gynephilia “compete”; at times where someone is more sexually engaged with women, they don’t have enough “left-over attraction” to be attracted to being women. I have not seen much convincing theory or hard data supporting this; as far as I can tell, it’s solely based on some clinical anecdotes. I don’t really buy it, which makes me extra critical about using it to estimate these things.

3. One interesting thought that comes up here is the question of, if there’s a continuous liability of eroticizing being female, is it really only the positive part that affects things? For instance, you could imagine that the negative part represents finding AGP themes to be a direct “turnoff”. But the estimation method I came up with ends up assuming that there is no effect in the negative part of the spectrum, and attributing any effect there is found to confounding. From a theory point of view, if there is such a thing as “negative AGP”, then that would obviously disprove Blanchardianism.