Text-gender relationships

After commenting on Neil’s post about picking the ‘gender’ of a blog through its content, I sought out the data I used back when I was looking into the topic some time ago. Eventually I found it. What it is, and what I was nearly correct about in my comment, is a list of specific words whose use in a text sample (theoretically) denotes the gender of the writer. Do note that all this is based around ‘blog posts’ – there were slightly different lists for the ‘fiction’ and ‘nonfiction’ categories as well. The list goes like this for the ‘feminine’ words:

  • With (52)
  • If (47)
  • Not (27)
  • Where (18)
  • Be (17)
  • When (17)
  • Your (17)
  • Her (9)
  • We (8)
  • Should (7)
  • She (6)
  • And (4)
  • Me (4)
  • Myself (4)
  • Hers (3)
  • Was (1)

That’s the hierarchy of words from most ‘telling’ to least, ranked by points value (in brackets beside each word). Thus, a piece of text that has ‘with’ in it quite frequently is supposedly more likely to have a female author.

As for the ‘male’ words, they are as follows:

  • Around (42)
  • What (35)
  • More (34)
  • Are (28)
  • As (23)
  • Who (19)
  • Below (8)
  • Is (8)
  • These (8)
  • The (7)
  • A (6)
  • At (6)
  • It (6)
  • Many (6)
  • Said (5)
  • Above (4)
  • To (2)
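
As a rough sketch of how a points tally over these two lists could work: the Python below assumes each occurrence of a listed word simply adds that word's points to its side, which is a guess at the website's method rather than something it documents. The names `FEMININE`, `MASCULINE` and `score` are made up for illustration.

```python
import re
from collections import Counter

# Point values copied from the two lists above.
FEMININE = {
    "with": 52, "if": 47, "not": 27, "where": 18, "be": 17, "when": 17,
    "your": 17, "her": 9, "we": 8, "should": 7, "she": 6, "and": 4,
    "me": 4, "myself": 4, "hers": 3, "was": 1,
}
MASCULINE = {
    "around": 42, "what": 35, "more": 34, "are": 28, "as": 23, "who": 19,
    "below": 8, "is": 8, "these": 8, "the": 7, "a": 6, "at": 6, "it": 6,
    "many": 6, "said": 5, "above": 4, "to": 2,
}

def score(text):
    """Return (female_points, male_points) for a piece of text.

    Assumption: every occurrence of a listed word adds its points once.
    Words are runs of letters, so "it's" is counted as "it" plus "s".
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    female = sum(points * counts[word] for word, points in FEMININE.items())
    male = sum(points * counts[word] for word, points in MASCULINE.items())
    return female, male

female, male = score("She went with me to the shop")
print(female, male)  # 62 9
```

Under that assumption, whichever total is larger gives the ‘gender’ of the text, and the gap between the two totals is the points difference.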

What I found interesting is that there is quite a rapid shrinking of points from the highest value to the lower values in the female list (from 52 to 47 to 27 in 3 words), while it is more spread out and progressive in the male list (42 to 35 to 34 to 28 to 23 to 19 over the first 6 words). I wondered if it meant that ‘with’ was such a female word that the propensity of its appearance would average out to comparable use of the top 3 or 4 words in the male list. That is to say, ‘with’ occurs extremely frequently in ‘female’ writing, more so than any individual one of the top 3 or 4 ‘male’ words, but when those male words are added together they total a similar points value to the total for ‘with’. Perhaps it’s more than the first 3 or 4 ‘male’ words. I never really looked that hard into it.
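
To put numbers on that, here is a small sketch using only the point values from the lists. The last part is one possible reading of the ‘with’ idea, not an established result: it asks how many times ‘with’ would need to appear to match a single appearance of each of the top few ‘male’ words. The names `female_top`, `male_top` and `drops` are just for illustration.

```python
female_top = [52, 47, 27]              # with, if, not
male_top = [42, 35, 34, 28, 23, 19]    # around, what, more, are, as, who

def drops(weights):
    """Point drop between each word and the next one down the list."""
    return [a - b for a, b in zip(weights, weights[1:])]

print(drops(female_top))  # [5, 20]
print(drops(male_top))    # [7, 1, 6, 5, 4]

# Occurrences of 'with' needed to equal one occurrence each of the
# top n 'male' words (assumes points simply add up per occurrence).
for n in (3, 4):
    print(n, round(sum(male_top[:n]) / female_top[0], 2))
```

The biggest single step in either list is the 20-point fall from ‘if’ to ‘not’, and on this reading ‘with’ would have to turn up roughly two to three times for every one appearance of each of the top 3 or 4 ‘male’ words.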

As interesting as it might seem, it’s not flawless. I ran my previous ‘Holiday Road – Part XVIII’ entry through the website as an example, and it came out as female – by some 300 points out of the ~5000 combined. I thought about why this might be. Obviously I’m not female, but my story-retelling abilities must have ‘female’ characteristics. I then thought about the idea that storytelling in Western culture is (perhaps a broad generalisation here, but it’s how I remember it growing up) primarily something women do with their kids. I wonder if I picked up these ‘female’ traits from my introduction to storytelling by my mother and grandmothers.

Then I went and tested out a personal favourite of the parts of the holiday I’ve retold so far – Part V – and it was overwhelmingly male: a difference of over 1200 points in a ~7000 point contest. So then I thought it might actually have to do with the topic being written about, more than the actual ‘ability’ to retell a story. Part V is quite descriptive and personal. I suspect that a descriptive and personal recount by a female author would genuinely use the ‘feminine’ words, while a descriptive and personal recount by a male author would genuinely use the ‘male’ words. Part XVIII is pure recount, with little self-reflection or personal detail that would give away the gender to the reader – and thus the word choices there aren’t indicative of gender per se; they were simply the best fit.
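
The two results are easier to compare as shares of their totals. The sketch below assumes the ‘contest’ figure is the female and male points added together, which is an interpretation of the numbers rather than something the website states; `margin_share` and `split` are illustrative names, and the figures are the approximate ones quoted above.

```python
def margin_share(difference, total):
    """Points difference as a fraction of the combined points."""
    return difference / total

def split(difference, total):
    """Recover (winner_points, loser_points) from the gap and the sum."""
    return (total + difference) / 2, (total - difference) / 2

results = {"Part XVIII": (300, 5000), "Part V": (1200, 7000)}

for name, (difference, total) in results.items():
    print(name, f"{margin_share(difference, total):.1%}", split(difference, total))
```

On these rough figures Part XVIII leans female by about 6% of the points on offer, while Part V leans male by about 17%, so the second result is nearly three times as lopsided.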

Anyway, it’s all these unanswered questions and my lack of knowledge in this field that saw me give up my investigations into the topic years ago. I’m not about to start all over again, because I remember I didn’t get too far with all the work I put into reading up on the topic. Maybe it’s just all theory. Maybe it’s all hoopla. I don’t know and don’t think I ever will. I’ll leave it at that, and hope it satisfies my momentary curiosity.


P.S. This post has a 400 point leaning towards being written by a male author in a ~2,400 point contest.


7 thoughts on “Text-gender relationships”

  1. Like I said, too many unanswered questions that stand to be unanswerable altogether. I’m sure someone, somewhere can answer them in one way, but that would also start off a new round of questions (much like you have asked in your revised comment on your post). I don’t intend to be the person who can explain the whole theory. Or even believes it. The website I linked comes out correctly more often than not, though. But your doubts about the website are well founded.

  2. Pingback: Words and Gender « Rik O’Neill’s Nothing Chronicle

  3. I just read your post rikoneill. Left a comment of my own. You certainly provide some food for thought. I’ll be thinking about some of the things you wrote about. Maybe even write another post about this stuff. I’ll let you know if I do.

  4. Thanks for the link to the site, Thomas. You say you’ve researched this before, so it’s probably old news, but I did like the article in the Guardian re: the accuracy of the site.


    As you mentioned to the poster above, this sort of area is a difficult one, and there are probably no real answers (people are people and trying to “box them in” or generalize is always worrying!) but I find it interesting for anecdotal observations of the differences between men and women.

    It is a universal human subject! Are we fundamentally different, are we fundamentally the same! Can’t we just agree the toilet seat should always, always, be left up!

  5. Hahaha, you’re safe with a comment like that about toilet seats around here – but I know some places where you’d find yourself in trouble!

    You’re right, boxing people in is a worrying thing. But often I think that there might be enough fine details that differentiate a majority of one group from a majority of another group. I think that’s where websites like these come in. Yes, there’s the grey area in the middle, but for a majority of cases the principle of differences separates the groups. The principle might not be correct 200% of the time, but fundamentally it works with fine details to create broad differences.

    Then again, are broad differences even important or telling if we are dealing with individuals – those who are the fine details?
