If a cis kid was made to transition…

[Epistemic status: somewhat speculative, due to lack of data.]

One argument that’s sometimes made in favor of the existence of an innate gender identity is the case of David Reimer, a boy who was brought up as female due to damage to his penis after a circumcision. Later in life, he ended up gender dysphoric and transitioned back to living as male, but eventually committed suicide.

The Reimer case is likely not strong evidence, though. It’s only n=1 (obviously), and John Money made Reimer do things that could very easily be seen as abusive. Thus, it’s not a great case to rely on.

Instead, a better case to consider might be boys with cloacal exstrophy (a serious condition that among other things leads to an underdeveloped penis) who were raised as female. The main study of such boys that I’m aware of found high levels of gender dissatisfaction, with the majority returning to live as boys, and 2 to 4 of the 5 living as girls wishing to be boys.

At first glance, this supports the notion of innate gender identity. However, one thing that’s already worth noting is that all of these kids were very masculine from a young age; thus, it also supports some sort of link between this gender identity and masculine behavior, a link that isn’t very compatible with narratives of repressing or hiding gender nonconformity.

In addition, I think if we read the study more carefully, we see some issues with the idea that it supports notions of innate gender identity. The kid who had the least gender issues, and was the most satisfied with being a girl, subject #1, appeared to be in many ways similar to the other kids. So why the different outcomes? Here’s my suspicion:

If you read the study carefully, only two of the subjects living as male, 9 and 10, spontaneously declared their “gender identity”. These, along with two other subjects who were not living as male, were the only ones who formed the idea of “I am male” independently, indicating that whatever is motivating the gender issues of most of the subjects, it’s more complicated than an internally-generated feeling of being male (as gender identity is usually but not always defined). This leaves open the possibility that for all the subjects, including these two, the gender issues came from a more-complex interaction between their behavioral masculinity and society.

Subjects 11-13 adopted a male identity after their parents told them of their medical status at ages 5-7. Since there are quite a few people who have gender issues in their childhood but get over them when older, this makes them imperfect examples. For instance, we could hypothesize that telling masculine girls that they are in some sense male will often lead to a desire to transition. For these subjects, I’d wonder how many of them would’ve desisted if they had not been told.

Subject 14 assumed a male identity after being told at age 18. This is probably old enough that the effect above cannot explain it; thus, I’d categorize this subject in a way similar to subjects 9 and 10.

Subjects 7 and 8 are really similar to subjects 9 and 10, except that their families aren’t supportive, limiting their ability to start living as male. Both for these, and for subjects 9 and 10, there’s another issue that’s worth considering: due to being natal males, they need to take estrogen medically, rather than having the body produce it naturally at puberty. Referring to the desistance study again, it is worth noting that some masculine girls feel uncomfortable at puberty but eventually find that they like being girls:

The second factor the desisting girls associated with their decrease in gender discomfort was the feminization of their bodies, primarily the growth of their breasts. At first they reported that this was unpleasant. They felt embarrassed and uncomfortable, and felt it interfered with their freedom to move. However, before long their feelings shifted in a positive direction and they desired even more physical feminization.

♀ Desister #11
Before puberty, I disliked the thought of getting breasts. I did not want them to grow. But when they actually started to grow, I was glad they did. I really loved looking like a girl, so I was glad my body became more feminine.

One thing I would wonder is if the need to take the estrogen exogenously leads to more gender issues, as the effects of it are seen as more foreign and avoidable than if this is what the body naturally produces. As such, while subjects 7-10 are probably the most-unambiguously gender-dissatisfied of the bunch, the situation isn’t completely unambiguous.

Subjects 1 to 5 have their own set of ambiguities, though. The only info we had on how they did in adulthood was based on parent report, which raises the question of how accurate it is. However, the parents’ reports that the subjects are generally content are compatible with more-reliable observations that masculine girls with gender issues generally get over them when they grow up. (On the other hand, the kids in the cloacal exstrophy study were attracted to girls, while the masculine girls who tend to get over their gender issues tend to be attracted to boys.) For most of them, their gender issues were also somewhat limited in scope at the initial assessment, further supporting the possibility that they did fine in adulthood.

Subject 6 is really unclear, though. They appeared to be doing ok – not perfect, but ok – at the initial assessment, but after being told of their medical status, would not discuss the topic with anyone. However, they did comply with estrogen treatment. Due to lack of better info, I’d classify them with subjects 11-13, as having ambiguous gender issues.

Group                         Count  Subjects
Ambiguously no gender issues  5      1-5
Ambiguous gender issues       4      6, 11-13
Unambiguous gender issues     5      7-10, 14

I’d be inclined to drop subjects 6 and 11-13 for being told at a young age, making them very difficult to compare to e.g. masculine natal females. This yields about half with no gender issues, and about half with clear gender issues, and I can’t help but point out that this is similar to the rate of people who tend to identify as cis-by-default in surveys (this survey found 54% identifying as cis-by-default, 46% identifying as affirmatively cis).

[Epistemic status for the followup: questionable math, mainly for sanity-checking. The following math can probably be adjusted to “prove” anything by fiddling with the assumptions.]

How well does this fit with a model where masculinity interacting with society is the driving factor for these sorts of gender issues? I usually estimate there to be a D~2 gender difference in psychology, which implies that 15% of people are more like the opposite sex than like their natal sex. This is wayyy too much if just used directly, as this would suggest that 15%/2 = 7.5% of natal females become trans men.

However, doing this estimate directly would also be somewhat ridiculous, as the vast majority of the 15% would still be more feminine than the average boy, and because the gender issues appear to be much stronger among those attracted to girls than those attracted to boys.

Thus, the real question we need to know is how many lesbians are more masculine than the average man. The estimate of the difference between lesbians and straight women in masculinity/femininity varies depending on study, but let’s go with the gender diagnosticity difference from this study and assume d~0.5. This means that lesbians are d~1.5 more feminine than men. The estimated rate of lesbianism also varies, but let’s go with a middle ground answer of 2%.

By these estimates, about 7% of lesbians should end up with serious gender issues and transition to end up as trans men, which in total should make up a bit more than 0.1% of the natal female population, or, if we estimate trans rates to be about 0.3%, a bit less than half of the FtM population.
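Here’s a minimal sketch of the arithmetic above. It assumes that masculinity/femininity is normally distributed with unit variance within each group, so that “more masculine than the average man” becomes a normal tail probability; that assumption, and all the function and variable names, are mine and purely illustrative.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def estimate_rates(d_sex=2.0, d_lesbian=0.5, lesbian_rate=0.02, trans_rate=0.003):
    # Share of people closer to the other sex's mean than to their own:
    # the midpoint between the two means lies d_sex / 2 SDs from either mean.
    closer_to_other_sex = normal_cdf(-d_sex / 2)
    # Assumed gap between the lesbian mean and the male mean, in SDs.
    gap = d_sex - d_lesbian
    # Share of lesbians who are more masculine than the average man.
    lesbians_past_male_mean = normal_cdf(-gap)
    # Those lesbians as a share of all natal females, then of the FtM population.
    share_of_natal_females = lesbian_rate * lesbians_past_male_mean
    share_of_ftm = share_of_natal_females / trans_rate
    return closer_to_other_sex, lesbians_past_male_mean, share_of_natal_females, share_of_ftm


closer, past_mean, share_nf, share_ftm = estimate_rates()
# Roughly 16%, 7%, 0.13% and 45% respectively.
print(f"{closer:.1%} {past_mean:.1%} {share_nf:.2%} {share_ftm:.0%}")
```

Changing any of the four inputs shifts the final share a lot, which is the point of the epistemic status above.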

This is twice as high as the rate they truly make up (23%, according to the USTS), but there are infinitely many places where the numbers and calculations can be tweaked, so I don’t think this problem with fitting it should be taken too seriously.

My main conclusion from this is that probably a lot of men would do fine living as women if they were raised as girls, but also that quite a few probably wouldn’t. This appears to very roughly match the cis-by-default self-identification situation, at least to around an order of magnitude or two (which, admittedly, is a pretty bad match). I don’t think that these results are as compatible with a universal innate immutable gender identity as they might be with more-complex mechanics, involving significant individual variation in how well any given man would do living as female.

EDIT 2019-08-02: This study also appears to find that 50% of boys with cloacal exstrophy who are raised as female end up non-gender-dysphoric, but I haven’t read it very carefully so I don’t know. Might be worth looking into.

Playing around with “gendermetricity”

[Epistemic status: silly statistical experiments. Might eventually turn into something useful but for now everything should be taken with a grain of salt.]

[Apology: this is a badly-organized post. The explanation of what gendermetricity and gendermetric correlations are comes in the middle of the post, rather than in the beginning. I find the results in the end really interesting and promising, but it takes a while to get there.]

I love behavioral genetics, because I find the way that it allows you to summarize complex and opaque information into simple variance components interesting and enlightening. For this reason, I got excited when I saw Gwern post a tweet with a link to a study that generalized this approach from behavioral genetics to neuroanatomy. Does this mean we can use this for other domains too?

For some background: typically, behavioral genetics has used the known similarities between monozygotic and dizygotic twins to infer to what degree various traits are driven by heredity, shared environment, or nonshared environment. If more-genetically-similar twins are more phenotypically similar than less-genetically-similar twins, the trait in question is heritable. However, more recently, it has become possible to genotype extremely large numbers of unrelated individuals, which makes it possible to compare similarity without the individuals being related family-wise. This allows the technique of comparing degree of genetic similarity with degree of phenotypic similarity to work with non-twin samples, as long as they are big enough. This statistical tool is called GCTA (genome-wide complex trait analysis).
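As a rough sketch of the classic twin logic, Falconer’s formulas turn the monozygotic and dizygotic twin correlations into the three variance components. The function name and the twin correlations below are made up for illustration and don’t come from any study.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer's formulas for splitting trait variance into three components."""
    heritable = 2.0 * (r_mz - r_dz)   # MZ twins share about twice as many segregating genes as DZ twins
    shared_env = 2.0 * r_dz - r_mz    # twin similarity that genes don't account for
    nonshared_env = 1.0 - r_mz        # whatever makes identical twins differ (incl. noise)
    return {"heritable": heritable, "shared_env": shared_env, "nonshared_env": nonshared_env}


# Hypothetical twin correlations, chosen only to show the mechanics.
components = falconer_ace(r_mz=0.6, r_dz=0.35)
print(components)  # approximately 0.5 heritable, 0.1 shared, 0.4 nonshared
```

The three components always sum to 1 by construction; the formulas rely on the usual twin-study assumptions (e.g. equal environments for both twin types), which I’m not checking here.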

However, there’s nothing restricting you to genetic similarity. In principle, you can use any similarity metric you want, as long as it satisfies the conditions assumed by the GCTA statistics. This was what they did in the paper Gwern linked, replacing genetic similarity with neuroanatomic similarity, allowing them to study highly interesting questions of how strongly phenotypes can in principle be predicted from neuroanatomy, even though they haven’t yet discovered how to predict these neurotypes. They called this statistic morphometricity.
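To make the “any similarity metric” idea concrete, here’s a minimal sketch on simulated data. Note that it swaps in Haseman-Elston regression (regressing the cross-products of people’s trait values on their pairwise similarity), which is a much simpler estimator than the mixed-model approach that GCTA and the morphometricity paper use. The function names and the simulated data are my own assumptions.

```python
import numpy as np


def similarity_matrix(items: np.ndarray) -> np.ndarray:
    """Pairwise similarity between people from an (n_people, n_items) array."""
    z = (items - items.mean(axis=0)) / items.std(axis=0)  # standardize each item
    return z @ z.T / z.shape[1]


def he_regression(similarity: np.ndarray, trait: np.ndarray) -> float:
    """Haseman-Elston estimate of the share of trait variance tied to similarity."""
    y = (trait - trait.mean()) / trait.std()
    i, j = np.triu_indices(len(y), k=1)  # every pair of distinct people, once
    a = similarity[i, j]
    cross = y[i] * y[j]                  # similar pairs should have similar trait values
    a_centered = a - a.mean()
    # Ordinary least-squares slope of the cross-products on the similarities.
    return float(a_centered @ (cross - cross.mean()) / (a_centered @ a_centered))


# Simulated data (not survey data): 50 items that jointly explain half the trait variance.
rng = np.random.default_rng(0)
n_people, n_items, true_share = 1000, 50, 0.5
items = rng.normal(size=(n_people, n_items))
weights = rng.choice([-1.0, 1.0], size=n_items) * np.sqrt(true_share / n_items)
trait = items @ weights + rng.normal(scale=np.sqrt(1 - true_share), size=n_people)

estimate = he_regression(similarity_matrix(items), trait)
print(f"estimated share of variance: {estimate:.2f}")
```

Nothing in the estimator cares whether the columns of `items` are genotypes, brain measurements, or questionnaire answers, which is what makes the generalization possible.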

But if this works with genetics, and it works with neuroanatomy, then surely it works for just about anything! Gwern suggested gut microbiomes and leaf spectral imaging, but given my interests, my attention immediately shifts to personality, life experiences, or generally any sort of data that is sufficiently multidimensional that regressing directly with it becomes difficult.

As masculinity/femininity appears relatively high-dimensional, and as I get more and more interested in exploring massively high-dimensional data, I’m interested in this sort of tool for my surveys. However, the immediate question that comes to mind is, do I have the sample size needed? GCTAs are usually run with thousands of participants, whereas I typically have a few hundred (though I have a project in the works that might yield me thousands…), so it’s not looking promising. On the other hand, it seems that I have way fewer dimensions to work with, so perhaps this helps; after all, this is supposed to be less data-intensive than just plain linear regression…

After trying and failing for a while to translate their MATLAB code to Python, I decided to just follow Gwern’s advice and abuse the GCTA program to directly give me the results. I loaded it up with data from my survey on Gender, Sexuality and Other Things and gave it some test runs. Here’s some example results:

Demo   Trait               g^2  SE
all    gender              56%  6 pp
women  aap                 53%  12 pp
women  narcissism          47%  11 pp
men    feminism            46%  9 pp
women  gender issues       40%  11 pp
women  self-mf             29%  11 pp
women  age                 27%  10 pp
all    age                 24%  6 pp
men    age                 24%  8 pp
all    sexual orientation  21%  5 pp
men    sexual orientation  21%  7 pp
men    narcissism          20%  9 pp
men    self-mf             19%  7 pp
women  sexual orientation  13%  10 pp
all    quality of life     12%  5 pp
women  feminism            12%  10 pp
men    gender issues       8%   6 pp
men    agp                 2%   4 pp

In the above, I used the GCTA program to look at demographics and traits and compute their “””gendermetricity””” (“””g^2”””) – i.e. its estimate for how much variance in the trait can in theory be predicted linearly using the masculinity/femininity items I included in the survey. SE denotes the standard error that GCTA estimated. Self-mf refers to self-assessed masculinity/femininity.

The above table is… not very promising for the usability of this tool. The confidence intervals are very wide (though that’s to be expected with my sort of sample size), there’s relatively little connection between how strongly something appears to be related to masculinity/femininity and how high its gendermetricity is (though this is not what the tools promise either – in principle, they’re supposed to detect any variance that can be predicted from combinations of the items, even if these combinations are completely orthogonal to masculinity/femininity), and it’s kinda opaque if just considered directly. It did have some ups, though, e.g. placing gender as being the most-gendermetric trait, and placing AGP as being one of the least-gendermetric traits, but given the other problems, I wouldn’t trust gendermetricity in these domains either.

GCTA is supposed to have a “genetic correlation” function, which should be usable for figuring out the degree to which the gendermetric variance in two variables is correlated. However, I couldn’t get it to work, and the problems I mentioned before made me a bit uninterested in spending too much effort on making it work.

However… gendermetricity is basically an estimate for how well linear regression can in principle predict the traits in question. If we just ignore the “in principle” part, we can explore gendermetricity-like concepts by performing the relevant linear regressions directly!

Let z be a random vector containing the masculinity/femininity-related variables that we use to define gendermetricity, and let x (and y) be a random variable containing a trait whose gendermetricity we seek to estimate. Let x // z denote residualizing x for z. The gendermetricity of x is simply the fraction of the variance of x explained by z, which can be computed as var_z(x) = (var(x)-var(x//z))/var(x). Similarly, the gendermetric covariance of x and y must then be cov_z(x, y) = cov(x, y)-cov(x//z, y//z), and so their gendermetric correlation is cov_z(x, y)/√(var_z(x)var_z(y)). (For the scales in that last ratio to match, x and y should be standardized to unit variance, so that var_z is also the amount of gendermetric variance.)
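These definitions can be sketched in a few lines of NumPy. This is a plain in-sample version (ordinary least squares fitted and evaluated on the same people), with x and y standardized so that the variance and covariance formulas are on the same scale. The function names and the simulated data are hypothetical.

```python
import numpy as np


def residualize(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Return x // z: what is left of x after a least-squares fit on z plus an intercept."""
    design = np.column_stack([np.ones(len(x)), z])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef


def gendermetric_stats(x: np.ndarray, y: np.ndarray, z: np.ndarray):
    x = (x - x.mean()) / x.std()  # standardize so that var(x) = var(y) = 1
    y = (y - y.mean()) / y.std()
    rx, ry = residualize(x, z), residualize(y, z)
    var_zx = 1.0 - rx.var()       # (var(x) - var(x//z)) / var(x), with var(x) = 1
    var_zy = 1.0 - ry.var()
    cov_zxy = np.mean(x * y) - np.mean(rx * ry)  # cov(x, y) - cov(x//z, y//z)
    corr_z = cov_zxy / np.sqrt(var_zx * var_zy)  # gendermetric correlation
    resid_corr = np.mean(rx * ry) / np.sqrt(rx.var() * ry.var())  # residual correlation
    return var_zx, var_zy, corr_z, resid_corr


# Simulated example: x and y both depend on the same component of z, plus independent noise.
rng = np.random.default_rng(1)
z = rng.normal(size=(2000, 3))
x = z[:, 0] + rng.normal(size=2000)
y = z[:, 0] + rng.normal(size=2000)
var_zx, var_zy, corr_z, resid_corr = gendermetric_stats(x, y, z)
print(f"g^2(x)={var_zx:.2f} g^2(y)={var_zy:.2f} r_z={corr_z:.2f} r_resid={resid_corr:.2f}")
```

In this in-sample form, the gendermetric correlation is just the correlation between the two fitted predictions, so it cannot leave the [-1, 1] range; out-of-sample residuals do not come with that guarantee.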

To help with dealing with the amount of data I have, I use PCA to reduce the dimensionality of the masculinity/femininity test from 22 to 7. In addition, I residualize the variables in a “leave-one-out” manner, which is to say, I predict each individual with a model that has been fitted to all other individuals. To reduce noise variance, I test giving the regression different numbers of principal components as input, ranging from 1 to 7, and give the number that yields the highest gendermetricity. This yielded the following gendermetricities:
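A minimal sketch of that procedure, on simulated data, might look as follows. Several details are my assumptions rather than a record of the actual analysis: the PCA is fitted once on everyone (not refitted inside the leave-one-out loop), the residual variance is taken as the mean squared leave-one-out residual, and all names are hypothetical.

```python
import numpy as np


def pca_scores(items: np.ndarray, n_components: int) -> np.ndarray:
    """Project standardized items onto their leading principal components."""
    z = (items - items.mean(axis=0)) / items.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)  # rows of vt are the components
    return z @ vt[:n_components].T


def loo_residuals(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Residual for each person from a regression fitted to all other people."""
    design = np.column_stack([np.ones(len(x)), z])
    residuals = np.empty(len(x))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # leave person i out of the fit
        coef, *_ = np.linalg.lstsq(design[keep], x[keep], rcond=None)
        residuals[i] = x[i] - design[i] @ coef
    return residuals


def best_loo_gendermetricity(x: np.ndarray, items: np.ndarray, max_components: int = 7):
    x = (x - x.mean()) / x.std()
    scores = pca_scores(items, max_components)
    estimates = {}
    for k in range(1, max_components + 1):  # try 1..max_components components
        resid = loo_residuals(x, scores[:, :k])
        estimates[k] = 1.0 - np.mean(resid ** 2) / x.var()
    best_k = max(estimates, key=estimates.get)  # keep the most favorable setting
    return best_k, estimates[best_k]


# Simulated stand-in for a 22-item test driven by one latent dimension.
rng = np.random.default_rng(2)
latent = rng.normal(size=200)
items = latent[:, None] + rng.normal(size=(200, 22))
trait = latent + rng.normal(size=200)
best_k, g2 = best_loo_gendermetricity(trait, items)
print(best_k, round(g2, 3))
```

Picking the best of seven settings is itself a mild form of overfitting, so the reported number is slightly optimistic; also, an estimate built this way can come out negative when the items predict nothing.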

Demo   Trait               g^2
all    gender              42.3%
all    sexual orientation  20%
men    feminism            13.5%
men    sexual orientation  11.6%
women  gender issues       9.6%
women  self-mf             9.5%
men    self-mf             8.2%
men    age                 8.2%
all    age                 4.8%
women  aap                 3.7%
women  sexual orientation  3.6%
women  narcissism          3.3%
men    gender issues       2.5%
men    narcissism          2.1%
all    quality of life     1.0%
men    agp                 0%
women  age                 0%
women  feminism            0%

This doesn’t look too bad, but more importantly, we can now compute gendermetric correlations! But first, what actually is a gendermetric correlation? The best way I can explain a gendermetric correlation between two variables X and Y is the following: Suppose there’s some stuff that makes X correlate with the masculinity/femininity test (i.e. X is somewhat gendermetric). And suppose there’s some stuff that makes Y correlate with the masculinity/femininity test. The gendermetric correlation is then a measure of how much these two “stuffs” are the same stuff. Now let’s take a look at some examples!


Correlation matrix among the full sample. The above-diagonal correlations are the gendermetric correlations, while the below-diagonal correlations are the residual correlations.

So, how do we interpret the above? There’s a number of things that could be said. First, note that the gendermetric correlation between sexual orientation and quality of life exceeds the [-1, 1] bounds that are typically expected of correlations. This is not because gendermetric correlations are somehow able to correlate more strongly than ordinary correlations; rather, it is because my math sucks. (I could have removed these effects, e.g. by just clamping them to the relevant range, or by not doing the leave-one-out thing in my regression, but I think they serve as a useful reminder not to take the statistics in this post too seriously.)

Consider the gendermetric correlation between sexual orientation and gender. It is very close to one, which makes sense when you break it down: The variance in gender decomposes into the gendermetric variance, which boils down to the fact that men are more masculine than women, and the non-gendermetric variance, which boils down to the fact that some women are masculine and some men are feminine. Meanwhile, the variance in sexual orientation decomposes into the same gendermetric variance where men are more masculine and women are more feminine, plus a bit of extra gendermetric variance where gay people are more GNC than the baseline, plus a lot of non-gendermetric variance due to not all queer people being GNC, and not all straight people being gender-conforming.

The gendermetric correlation tells you how much the gendermetric variance in the two variables is shared. Since the main difference in the gendermetric variances is that sexual orientation also contains some GNC gay people, the bulk of the variance (namely that men tend to be more masculine than women) is shared, and so the gendermetric correlation is high. (It’s probably worth adding that I wouldn’t be surprised if the 0.98 number above is an overestimate.)

The residual correlation is much lower. This correlation tells you how much the variables are still correlated after taking the gendermetric variance into account. That is, it tells you the degree to which the non-gendermetric variance is shared. As you can see from the diagram, it is much lower than the gendermetric correlation, and I can also inform you that it is lower than the usual correlation, as in this sample, gender and sexual orientation are correlated at r~0.42.


Depicted: the gendermetricities for the traits mentioned before. It is only the gendermetric variance, and not all of the variance, that gendermetric correlations use to measure the connection between variables.

In the text above, I assumed the gendermetric variance was related to masculinity/femininity. This is likely in the case of gender or sexual orientation, but it doesn’t necessarily need to be the case in general. Since I allowed up to 7 dimensions from the masculinity/femininity test to be included in the regression, it is possible for the linear regression to form predictions that are not based on masculinity/femininity, but instead also on mixes, e.g. taking some “masculine” characteristics and some “feminine” characteristics and using them to form a new “trait”.

As an example, two traits that might be included in a masculinity/femininity test (but which weren’t included in mine) are Expressivity (caring about others) and Instrumentality (high agency and a strong sense of self). One might assume that gendermetricity computed using these traits would only use either the traits directly or their difference. However, gendermetricity might instead use their sum to predict things, which corresponds to Extraversion, a relatively ungendered trait.

Now that we understand gendermetricity (hopefully), let’s look at some more examples.


Gendermetric and residual correlations for women. A number of other traits were also tested but found to be nonsignificantly gendermetric, namely attraction to women, mimicry-autogynephilia, every paraphilia in the survey except for the ones listed in the diagram (including a different variant of exhibitionism), feminism, dislike of own appearance, age, life satisfaction, exclusive attraction to women, number of female partners, and number of male partners. 


The first thing to note is that including self-mf (i.e. self-assessed masculinity/femininity) in a gendermetric correlation is in some ways strange. Gendermetricity is meant to capture masculinity/femininity, so what exactly happens when this gets combined with self-mf? Well, basically, we’d expect the gendermetric variance in self-mf to be just that, masculinity/femininity. However, there is going to be some additional variance in self-mf, both because any self-report measure has some noise, and because our masculinity/femininity measure might not be complete. Thus, a gendermetric correlation with self-mf tells us something about whether the gendermetric variance in a trait is due to masculinity/femininity, or due to something else (such as the extraversion example earlier).

Thus, what the above diagram suggests to us is that the gendermetric variance in attraction to men, self-sexualization, and gender issues in women is due to masculinity/femininity, but that the gendermetric variance in autoandrophilia and narcissism is partly due to something else. I believe that like for ordinary correlations, the gendermetric correlations have to be squared in order to yield the shared variance; this means that 28% of the gendermetric variance of autoandrophilia is, according to this measure, due to masculinity, while the remaining 72% isn’t.

Despite this, autoandrophilia appears to gendermetrically correlate really strongly with gender issues, even though these should be mainly about masculinity/femininity. This is another example of how you should take the results here with a grain of salt; it is impossible for the “real” gendermetricities to work like this, but the estimates do.

One thing that’s worth noting is that autoandrophilia gendermetrically correlates with androphilia. Furthermore, while gynephilia and lesbianism aren’t statistically significant gendermetrically, if I force it to compute the gendermetric correlations between those and AAP, I also find those to be negatively correlated with autoandrophilia. Thus, autoandrophilia is gendermetrically correlated with heterosexuality; this is despite the fact that I usually find it to be negatively correlated with heterosexuality. I’m not yet sure how to interpret this finding, but I find it very intriguing that we finally have an AAP/heterosexuality “””correlation”””, as the lack of this is one of the arguments against AAP as a concept.


Gendermetricities of the traits under consideration.

One odd thing is that narcissism, autoandrophilia, and gender issues are all gendermetrically correlated. I’m not sure what’s up with that, and it’s worth keeping an eye on whether this replicates. (Is this pattern predicted by the ROGD model? I don’t know.)

It is also interesting to observe that autoandrophilia is negatively gendermetrically correlated with self-sexualization, even though it is otherwise positively correlated with self-sexualization. This might also be worth keeping an eye on in the future.

If you think about it, the matrix for women appears to suggest a two-factor solution, with a “general gendermetric factor” that all the dimensions load positively on, and a “courtship-vs-GID factor” where self-sexualization/androphilia/narcissism load in the courtship direction, and AAP/gender-issues/self-mf load in the GID direction. This approach might be worth considering looking into (though that would require me to first figure out how to do “gendermetric factor analysis”, which appears to be easy enough but might be trickier than it looks).

I don’t know if it was a fluke, or what happened, but for some reason of the two exhibitionism items I had, only one, which I’ve labelled “exhibitionism 1”, was gendermetric. This exhibitionism item is related to flashing; its item text is “Exposing my genitals to an attractive stranger”. Meanwhile, the other exhibitionism item, “exhibitionism 2”, is about public sex, “Performing sex acts while stranger watch”.

On to men!


The gendermetric-and-residual correlation matrix for men. Traits that were also tested for gendermetricity were life satisfaction and autogynephilia, which were found not to be significantly gendermetric, and number of male partners, which was found to be gendermetric in the obvious way (positive correlations with almost everything else) and thus excluded because the diagram was getting crowded.

This time, there is a strong, obvious structure in the graph that just screams that it wants to get noticed: There’s a gender nonconformity factor that involves self-mf, sexual orientation, gender issues, feminism, and disliking one’s own appearance (with all but the disliking-appearance dimension being completely gendermetrically correlated), and a courtship factor that involves liking one’s appearance, self-sexualization, being older, narcissism, and having had more female partners.

I think the first of these two factors is very cute; there appears to be a single general factor of gender nonconformity, rather than there being different forms of GNC that are relevant for different traits. (Alternatively, my data analysis is bad enough that I’m not able to detect different forms of GNC.)

The courtship factor is surprising to me. The masculinity/femininity test I’m using doesn’t have any items that are “obviously” related to courtship for men; there’s no “going to the gym” items, or anything similar to this. Presumably there’s an explanation for this that will become clear if I perform some sort of gendermetric factor analysis, but until then my best explanation is that gendermetricity is magic.

And it’s not even that the courtship-related variance it’s capturing is tiny. Here’s the gendermetricities of men’s traits:


Amount of variance explained by the masculinity/femininity test across a range of traits that were tested for men.


The number of female partners is the most gendermetric trait according to this analysis. It’s not that I’m complaining, because other than masculinity/femininity itself, courtship would probably be one of the most-relevant things for a masculinity/femininity test to capture. I just don’t understand how it does it.

One contrast between women’s and men’s correlation matrices is that for men, the residual correlations often appear to be smaller than for women. I’m not sure if that effect is real, but if it is, it indicates to me that this approach works better for men than for women.

I think there are four obvious followups to this post:

  1. Perform “gendermetric factor analysis”. It seems that this should allow us to extract highly-intuitive factors from the masculinity/femininity test, which might be useful for other things in the future. Plus, gendermetric factor analysis might help reduce some of the potential problems that can arise from overfitting in these cases. (When playing around with changing the number of principal components, it appears that the structure in the men’s gendermetricity matrix is just a result of the existence of the first two principal components. However, in the women’s gendermetricity matrix, the structure appears to require more principal components, despite appearing mostly 2D.)
  2. Expand the study of gendermetricity with more traits and better masculinity/femininity tests. Maybe we can discover even more structure within the traits, and at least we can verify the structure that is already found. Attractiveness is an obvious thing that might be worth including, as would sociosexuality.
  3. Apply these methods to other domains too; for instance, it would be interesting to see if the AGP/GAMP correlation is due to [attitudes to androgyny]metricity, or something similar for other correlations in sexuality. One complication is that in order to make this system work, the domain that is used must be multidimensional; otherwise the correlations will all be 1 or -1. At the same time, really it’s not the gendermetric correlation that needs to be used to see if some factor is a potential mediator, but instead the residual correlation.
  4. Improve the calculations of gendermetricity, e.g. by fixing the cases where gendermetric correlations greater than 1 or smaller than -1 are computed, or by figuring out a way to use the linear mixed model approach that e.g. the GCTA program uses.

Brief note on differences between the ROGD narrative and the transtrender narrative

In a picture:


Pictured: the stereotypes associated with stories labelled ROGD vs transtrender.

There are two narratives that have become popular among people critical of the trans community, and they have some surface-level similarity that I think might prevent people from noticing how different they really are. Briefly, both claim that there is a social trend of people taking on transgender identities, but they differ a lot in how they describe the nature of this trend. I think there are some serious issues with both narratives, but I think it’s worth writing an article that clearly distinguishes them before writing a response to either of them.

According to the transtrender narrative, there are a lot of normal girls who pick up transgender identities in order to get attention, but who aren’t gender dysphoric and aren’t seriously transitioning. People talking about “transtrenders” are usually mainly worried about them making “true trans people” look silly. They do sometimes worry about “transtrenders” engaging in medical transition, but in these cases, they generally consider regret to be inevitable.


Breakdown of issues pointed to by the transtrender narrative.

The ROGD narrative is different. Here, the idea still starts with relatively-normal girls who are in social groups that encourage taking on a trans identity. In the ROGD narrative, they’re also said to have a lot of mental health issues that they expect transition to fix. In addition, the followup is different: ROGDs start following the script that would be expected of trans men, discarding feminine behavior and putting a lot of energy into transition.


Breakdown of issues pointed to by the ROGD narrative.

The ROGD narrative isn’t worried about whether the trans community looks silly, but is instead worried about people ending up with expectations that transition will solve problems that it really doesn’t solve, that people who didn’t need transition will undergo medical interventions, and that the trans community might encourage suicide or self-harm.

In the above, I presented the narratives as being completely separate, but it can be far more continuous than that. There’s nothing contradictory in seeing these things as a continuum, as a progression, or as whatever else one might mix together.

Since the ROGD narrative is clearly the more alarming of the two, it’s also the one I intend to write a response to first. But that’ll have to wait until a later post.