Almost all sexual interests are positively correlated. Going even further, almost all paraphilias are positively correlated, and typically more closely with each other than with normophilic sexual interests. To give an example, from a survey run by the creator of /r/AskAGP:

This presents a problem when trying to understand the structure of sexual interests, because an important method for doing so is to look at the pattern of correlations. But when everything correlates with everything, this pattern is difficult to interpret.

The pervasive correlations could be interpreted as representing a “general factor of paraphilia”; that is, some underlying factor that contributes to all abnormal sexual interests. Since paraphilias and ordinary sexual interests also correlate with each other to a degree, this factor might partly reflect something that the two have in common, such as libido. However, since paraphilias are more strongly correlated with each other than with ordinary sexual interests, it likely also represents *something else* besides libido. One possibility for this *something else* is that some common error or set of errors in the development of one’s sexuality can contribute to all abnormal sexual interests. Another possibility is that it represents “method factors”; i.e. it may partly be due to people who are open to admitting to one abnormal sexual interest also being more open to admitting to others.

But regardless of the reason for the presence of this general factor of paraphilia, it would be nice to have some way to control for it. One way of controlling for it is to assume that each paraphilia is influenced to some specific degree by the general factor; this influence is called the *factor loading* of the paraphilia. We can then fit a statistical model that estimates the factor loadings of the paraphilias, and use this model to distinguish between the variance and correlations due to the general factor, versus the variance unrelated to the general factor:

In order to understand things better, we can zoom further into the residual matrix. This matrix represents the correlation structure after taking the general factor into account:

This kind of matrix should give a clearer idea of the true structure of the paraphilic interests. For instance, we can see that being a furry is correlated with attraction to animals, as well as with AGP; this is predicted by the theory of *erotic target location errors*, which states that AGP and furryism represent an “inversion” onto oneself of attraction to women and to anthropomorphic animals, respectively, such that one is interested in *being* what one would otherwise be attracted to.

On the other hand, we do not see any particular correlation between autogynephilia and masochism. We do see one between forced feminization and masochism, but this is trivial due to content overlap. (Unfortunately, this dataset didn’t include anything asking about transvestic fetishism independent of masochism.) The lack of correlation between autogynephilia and masochism seems to contradict the theory of *masochistic emasculation fetish*, which asserts that autogynephilia isn’t really a sexual interest in being a woman per se, but instead a fundamentally masochistic interest. Since anecdotal correlations between autogynephilia and masochism are the main thing cited as evidence for MEF, I think this result disproves MEF, at least unless it turns out to have been a fluke somehow. (Incidentally, the creator of /r/AskAGP collected this dataset specifically with the purpose of trying to *prove* that AGP was correlated with masochism. Turns out it’s not that simple.)

Another interesting thing is that this method makes it more convenient to collect items that measure the general factor of paraphilia well. The issue with just picking any random set of items is that if the items have too much residual correlation, they might not measure the general factor of paraphilia, but might instead measure something narrower. For instance, if one includes both “Having your partner say insults and/or slurs to you” and “Being humiliated for having a small penis”, one might end up just measuring humiliation masochism. Thus, ideally one would pick items that have no correlations beyond the general factor of paraphilia at all. Inspecting the diagram, the following look like a promising set:

- Imagining being a woman and caressing your own (female) body
- Having your partner say insults and/or slurs to you
- Exposing your genitals to an unsuspecting stranger
- Sniffing your partner’s underwear
- Mate-swapping; having sex with someone else’s partner while they have sex with your partner
- Tying someone up
- Having sex with someone much older than yourself
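The selection idea above (prefer items that share no residual correlation beyond the general factor) can be sketched as a simple filter. This is only an illustration, not the procedure used for the list above, which was picked by inspecting the diagram. The function name `pick_clean_items`, the threshold of 0.1, the item labels, and every number in the toy residual matrix are hypothetical.

```python
def pick_clean_items(omega, names, threshold=0.1):
    """Greedily keep items whose residual correlation with every
    already-kept item is below the threshold (diagonal is ignored)."""
    kept = []
    for i in range(len(names)):
        if all(abs(omega[i][j]) < threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]


# Hypothetical residual matrix: two humiliation items that overlap in
# content (residual 0.3) and two items unrelated to everything else.
names = ["humiliation_1", "humiliation_2", "other_1", "other_2"]
omega = [
    [0.75, 0.30, 0.02, 0.05],
    [0.30, 0.75, 0.01, 0.04],
    [0.02, 0.01, 0.75, 0.03],
    [0.05, 0.04, 0.03, 0.75],
]

print(pick_clean_items(omega, names))
```

A greedy pass like this depends on item order, so in practice one would probably still want to check the result by eye.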

Since these items aren’t pure measures of the general factor, they are not going to perfectly measure it, even when aggregated. It might be nice to have an idea of how well they measure it. One such measure is the internal reliability, which uses the internal correlations between the items to estimate how much of the variance in the sum of the items is due to the general factor (that is, the squared correlation between the general factor and the sum score). The internal reliability of this set of items is 0.64, which is considered to be on the low side; ideally one would find a larger set of varied paraphilia items to create a better measure of the general factor of paraphilia.

I’ll end this post with a brief technical description of the math involved. In order to fit the general factor, I searched for a vector λ containing the factor loadings, as well as a matrix Ω containing the residuals. Given the correlation matrix Σ, I then searched for solutions to λ and Ω such that Σ=λλ^{T}+Ω.

This problem is underdetermined, as one can find a solution for any λ simply by taking Ω=Σ-λλ^{T}. To make the result well-defined, I picked λ so as to minimize the off-diagonal elements of Ω. The intuition behind this is that correlations between unrelated paraphilias are presumably due to the general factor, so we do not want Ω to contain any of these correlations. To keep them out of Ω, we simply minimize the off-diagonal elements. More specifically, I chose to minimize the sum of the absolute values of the off-diagonal elements of Ω; this should tend to set the median off-diagonal value of Ω to zero. As long as most of the paraphilias collected are unrelated to each other, this should accurately get at the general factor of paraphilia.
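To make the decomposition and the objective concrete, here is a minimal sketch in plain Python. It does not reproduce the actual search over λ; it only shows that any λ gives a valid Ω, and that different choices of λ score differently on the off-diagonal objective. The function names and the toy correlation matrix (four items driven by one factor, plus an extra 0.2 correlation between the first two) are assumptions made up for illustration.

```python
def residual_matrix(sigma, loadings):
    """Omega = Sigma - lambda lambda^T, as nested lists."""
    n = len(sigma)
    return [[sigma[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


def off_diagonal_l1(omega):
    """Sum of absolute values of the off-diagonal entries of Omega."""
    n = len(omega)
    return sum(abs(omega[i][j]) for i in range(n) for j in range(n) if i != j)


# Hypothetical toy data, not taken from the survey.
true_loadings = [0.6, 0.5, 0.4, 0.7]
n = len(true_loadings)
sigma = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
          for j in range(n)] for i in range(n)]
sigma[0][1] += 0.2  # content overlap between items 0 and 1
sigma[1][0] += 0.2

# Both loading vectors decompose Sigma exactly; the objective tells them apart.
flat_guess = [0.5, 0.5, 0.5, 0.5]
loss_true = off_diagonal_l1(residual_matrix(sigma, true_loadings))
loss_flat = off_diagonal_l1(residual_matrix(sigma, flat_guess))
print(loss_true, loss_flat)
```

With the generating loadings, the only off-diagonal residual left is the 0.2 overlap between the first two items, while the flat guess smears error across every pair and scores worse. The diagonal of Ω is 1-λ_{i}^{2} in either case, since the diagonal of a correlation matrix is 1.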

In order to estimate the internal reliability, I used the formula (λ_{0}+λ_{1}+…)^{2}/((λ_{0}+λ_{1}+…)^{2}+(1-λ_{0}^{2})+(1-λ_{1}^{2})+…). Intuitively speaking, the formula consists of two parts, G=(λ_{0}+λ_{1}+…)^{2}, and S=(1-λ_{0}^{2})+(1-λ_{1}^{2})+…, such that the total formula is G/(G+S). G and S each represent a fraction of the variance in the “sum score” of the paraphilic interests. Specifically, G represents the variance due to the **g**eneral factor, while S represents the **s**pecific variance for each paraphilia (which, when trying to measure the general factor, we think of as being measurement error). The G variance grows quadratically with the number of items, while the S variance grows linearly with the number of items, so as one increases the number of items, the sum score will to a greater degree represent the general factor compared to the specific variance; that is, more items means less measurement error.
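The formula translates directly into a few lines of Python. This is a sketch of the arithmetic only; the function names and the example loadings are hypothetical and are not the loadings estimated from the survey.

```python
def variance_parts(loadings):
    """Return (G, S): general-factor and specific variance of the sum score."""
    g = sum(loadings) ** 2
    s = sum(1.0 - lam ** 2 for lam in loadings)
    return g, s


def internal_reliability(loadings):
    """G / (G + S), the share of sum-score variance due to the general factor."""
    g, s = variance_parts(loadings)
    return g / (g + s)


# Hypothetical loadings for four items.
example_loadings = [0.5, 0.4, 0.6, 0.3]
g, s = variance_parts(example_loadings)
print(g, s, internal_reliability(example_loadings))
```

Duplicating the example list (eight items instead of four) quadruples G but only doubles S, which is the quadratic-versus-linear growth described above.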

The mean factor loading for the restricted set was 0.45. If additional sexual interests continue having factor loadings like this, the reliability should become adequate with about 10 items, good with about 16 items, and near-perfect with about 40 items. One project I would like to see completed would be to collect more items to better measure the general factor of paraphilia.
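These projections can be checked with a short calculation. The sketch below assumes, as a simplification, that every item has exactly the mean loading of 0.45, in which case G/(G+S) reduces to nλ²/(nλ²+1-λ²) for n items; the function name is hypothetical.

```python
def projected_reliability(mean_loading, n_items):
    """Reliability of a sum score of n_items items that all share one loading."""
    general = n_items * mean_loading ** 2   # G divided by n_items
    specific = 1.0 - mean_loading ** 2      # S divided by n_items
    return general / (general + specific)


for n_items in (7, 10, 16, 40):
    print(n_items, f"{projected_reliability(0.45, n_items):.2f}")
# 7 0.64
# 10 0.72
# 16 0.80
# 40 0.91
```

Under this equal-loading assumption, seven items give about 0.64, matching the reliability reported for the restricted set.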