Saturday 24 October 2015

On The Mathematics of Meaning


I have worked with co-occurrence models of semantics for a long time. These computational models try to bootstrap word meaning from analysis of patterns of word co-occurrence in large corpora of text. Recently, Google released a set of tools (word2vec) and associated materials for a new, and very good, kind of co-occurrence model that they have built. There is a nice explanation of the model here.

One of the things you can do with co-occurrence model word representations is subtract or add them, to see what the resultant word representation 'means' (I skip over the mathematical details since we are just here for fun). For example, in word2vec space:
king - man + woman = queen
The equality sign here has to be taken with a grain of salt; it really means 'is similar to'.
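
For readers who want to see the arithmetic, here is a minimal sketch. The three-dimensional vectors are invented for illustration (real word2vec vectors have hundreds of dimensions and are learned from text), and cosine similarity is an assumed stand-in for 'is similar to'.

```python
import math

# Toy vectors, invented for this example; NOT real word2vec output.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(target):
    """Return the vocabulary word whose vector is most similar to target."""
    return max(vectors, key=lambda word: cosine(target, vectors[word]))

# king - man + woman, computed one dimension at a time.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

result = nearest(target)
print(result)
```

With these made-up numbers the closest word to the result is 'queen', which is the sense in which the '=' above should be read.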

My colleague Geoff Hollis and I have been working with the word2vec model (using a smaller dictionary and a slightly different representation and similarity measure than Google). I added the ability to add fractions of representations instead of just adding or subtracting each word representation as a whole, and have spent some time looking for interesting semantic math results. I have defined '=' here as 'being in the top ten closest results' (and also restricted myself by requiring that the final result on the right of the '=' sign cannot be among the top ten closest neighbors of any of the input words on the left of that sign). This human flexibility (and the fact that I have deliberately searched for interesting results) means that this math is really a human-computer collaboration rather than a purely computational result.
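
The rule for '=' described above can be sketched roughly as follows. This is not our actual code: the two-dimensional vectors are invented, cosine similarity is an assumption (our similarity measure differs), and the names `combine`, `top_k` and `counts_as_equal` are hypothetical. The toy vocabulary is tiny, so the example uses k=2 where the real rule uses the top ten.

```python
import math

# Invented 2-D toy vectors for illustration only; NOT real model output.
VECTORS = {
    "love":       [1.0, 0.0],
    "sex":        [0.0, 1.0],
    "affection":  [0.9, 0.2],
    "romance":    [0.95, 0.3],
    "desire":     [0.2, 0.9],
    "lust":       [0.3, 0.95],
    "infidelity": [0.7, 0.7],
}

def cosine(a, b):
    """Cosine similarity (an assumed stand-in for the similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def combine(terms):
    """Weighted sum of word vectors; terms is a list of (weight, word)."""
    dims = len(next(iter(VECTORS.values())))
    total = [0.0] * dims
    for weight, word in terms:
        for i, x in enumerate(VECTORS[word]):
            total[i] += weight * x
    return total

def top_k(vec, k, skip=()):
    """The k vocabulary words most similar to vec, ignoring words in skip."""
    candidates = [w for w in VECTORS if w not in skip]
    candidates.sort(key=lambda w: cosine(vec, VECTORS[w]), reverse=True)
    return candidates[:k]

def counts_as_equal(terms, candidate, k=10):
    """True if candidate is in the top k for the combined vector and is
    not already a top-k neighbor of any single input word."""
    if candidate not in top_k(combine(terms), k):
        return False
    for _, word in terms:
        # Assumption: a word is not counted as its own neighbor.
        if candidate in top_k(VECTORS[word], k, skip=(word,)):
            return False
    return True

print(combine([(1, "love"), (0.4, "sex")]))
print(counts_as_equal([(1, "love"), (1, "sex")], "infidelity", k=2))
print(counts_as_equal([(1, "love"), (1, "sex")], "romance", k=2))
```

The second check is what keeps the results from being trivial: 'romance' is rejected in this toy setup because it is already one of the closest neighbors of 'love' on its own.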

Here are some of my most interesting results. Enjoy.
love + 0.4 * sex = friendship
love + sex = infidelity
love + 3 * sex = monogamy

murder + fun = gunplay

apple + pig = potato

cat + 0.7 * dog = poodle

despair + 0.5 * hope = frustration

wealth + 0.2 * dream + 3 * selfish = elitist

courage + 2 * stupidity - incompetence = audacity

hope + time = opportunity

logic + hope = principle

man - 2 * education = snake

tiger - cat = rhino

sex + drunken = debauchery

love + dream = passion

[Image from: Alfred Bray Kempe (1886) A Memoir on the Theory of Mathematical Form.]
