What we did

We investigated the extent to which semantically similar words also sound similar.

For efficient communication, words that are semantically similar are expected not to sound similar, since listeners would otherwise confuse them. Word pairs that are similar both semantically and phonetically arise by coincidence, and when such words are rarely used the evolutionary pressure on them to change is low.

There is a lot of noise in the data. We first filtered out pairs that share the same word stem. British/American spelling differences then became the largest source of noise, so we also removed pairs in which the two words were not both in the common subset of ispell's US and British word lists.
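
The two filtering steps could be sketched roughly as follows. This is only an illustration under assumptions: `crude_stem` is a hypothetical stand-in for whatever stemmer was actually used, and the two small sets stand in for the ispell US and British word lists, which would be loaded from files in practice.

```python
def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def filter_pairs(pairs, us_words, british_words):
    """Drop same-stem pairs and pairs not fully inside both word lists."""
    common = us_words & british_words  # words spelled the same in both lists
    kept = []
    for a, b in pairs:
        if crude_stem(a) == crude_stem(b):
            continue  # same stem, e.g. "walk" / "walked"
        if a not in common or b not in common:
            continue  # at least one word is missing from the common subset
        kept.append((a, b))
    return kept


# Toy word sets (assumed data, not the real ispell lists).
us = {"color", "walk", "walked", "night", "knight", "gray"}
british = {"colour", "walk", "walked", "night", "knight", "grey"}
pairs = [("walk", "walked"), ("color", "colour"),
         ("night", "knight"), ("gray", "grey")]

print(filter_pairs(pairs, us, british))  # [('night', 'knight')]
```

Intersecting the two word lists removes spelling variants such as color/colour without needing an explicit table of variants.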

Interesting word pairs








Heatmap of data

Based on 2 million word pairs.
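
A heatmap of this kind can be built by binning each pair's two scores into a grid and counting pairs per cell. The sketch below assumes both scores are already normalized to the range 0 to 1 and uses 10 bins per axis; neither detail comes from the original analysis, and the function names are hypothetical.

```python
def bin_index(value: float, bins: int) -> int:
    """Map a score in [0, 1] to a bin index in [0, bins - 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * bins), bins - 1)


def heatmap_counts(scores, bins=10):
    """Count pairs per (phonetic bin, semantic bin) cell."""
    grid = [[0] * bins for _ in range(bins)]
    for phonetic, semantic in scores:
        grid[bin_index(phonetic, bins)][bin_index(semantic, bins)] += 1
    return grid


# Toy (phonetic, semantic) similarity scores, assumed to lie in [0, 1].
scores = [(0.05, 0.95), (0.07, 0.91), (0.55, 0.52), (1.0, 0.0)]
grid = heatmap_counts(scores, bins=10)

print(grid[0][9], grid[5][5], grid[9][0])  # 2 1 1
```

The resulting grid of counts can then be passed to any plotting library that draws a matrix as an image.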

Phonetic vs semantic similarity.

Simple heat map.

After filtering out words spelled differently in British/US English (0.5 million word pairs).

Word (i.e., character-level) distance vs. semantic similarity.

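One common choice for a character-level distance is the Levenshtein edit distance. The notes do not say which measure was used, so the following is a sketch under that assumption. Because the function accepts any sequence, it can also compare lists of phonemes; the phoneme lists shown are hand-written for illustration, not taken from a pronunciation dictionary.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Character level: one inserted letter.
print(edit_distance("night", "knight"))  # 1

# Phoneme level, using illustrative hand-written phoneme lists.
print(edit_distance(["N", "AY", "T"], ["N", "AY", "T"]))  # 0
```

The night/knight pair shows why the two measures can disagree: the spellings differ by one character while the illustrative phoneme sequences are identical.
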
Phonetic vs. character-level distance.