What we did
Investigated the extent to which semantically similar words sound similar.
For efficient communication, words that are semantically similar are expected not to sound similar (otherwise confusion would ensue). Word pairs that are both semantically and phonetically similar arise by coincidence, and if their usage is rare the evolutionary pressure to change will be low.
The data is very noisy. Pairs sharing the same word stem were filtered out first. British/American spelling differences then became the largest source of noise, so pairs were also removed unless both words appear in the common subset of ispell's US and British word lists.
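The two filtering steps can be sketched roughly as below. This is an illustration only, not the code used for the analysis: `crude_stem` is a hypothetical stand-in for a real stemmer, `filter_pairs` is a name invented here, and the word lists are tiny in-memory sets rather than the actual ispell lists.

```python
def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def filter_pairs(pairs, us_words, british_words):
    """Drop same-stem pairs and pairs not wholly inside the shared vocabulary."""
    # Words spelled identically in both lists (assumed: plain set intersection).
    common = set(us_words) & set(british_words)
    kept = []
    for a, b in pairs:
        if crude_stem(a) == crude_stem(b):
            continue  # same stem, e.g. walk/walked
        if a not in common or b not in common:
            continue  # at least one word is specific to one spelling variant
        kept.append((a, b))
    return kept


# Toy data, invented for illustration.
us = {"color", "walk", "walked", "big", "large", "center"}
gb = {"colour", "walk", "walked", "big", "large", "centre"}
pairs = [("walk", "walked"), ("big", "large"), ("color", "large"), ("center", "big")]
print(filter_pairs(pairs, us, gb))
```

With the toy data only the pair (big, large) survives: one pair shares a stem and the other two contain a word that is missing from one of the lists.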
Interesting word pairs
Heatmap of data
Based on 2 million word pairs.
Phonetic vs semantic similarity.
Simple heat map.
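A heat map of this kind amounts to counting word pairs on a two-dimensional grid. A minimal sketch, assuming both similarity scores have been normalised to the range 0 to 1 (the function name `bin_counts` and the normalisation are assumptions, not taken from the analysis):

```python
def bin_counts(points, bins=10):
    """Count (x, y) points, each value in [0, 1], on a bins x bins grid."""
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        # Clamp so that a value of exactly 1.0 falls in the last bin.
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        grid[row][col] += 1
    return grid


# Invented (phonetic, semantic) similarity scores for four word pairs.
scores = [(0.0, 0.0), (0.05, 0.01), (1.0, 1.0), (0.5, 0.95)]
for row in bin_counts(scores, bins=2):
    print(row)
```

Each cell count can then be mapped to a colour by whatever plotting tool is used.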
After filtering out words spelled differently in British and US English (0.5 million word pairs).
Word (i.e., character-level) distance vs. semantic similarity.
Phonetic vs. character-level distance.
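The notes do not say which distance measure was used. One common choice for a character-level distance is the Levenshtein edit distance, sketched below as an assumption; the same function also works on lists of phoneme symbols, so it could serve as a simple phonetic distance too. The phoneme lists in the example are illustrative, not taken from any particular pronunciation dictionary.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or symbol lists)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(edit_distance("colour", "color"))                    # spelling variants
print(edit_distance(["K", "AE", "T"], ["B", "AE", "T"]))   # illustrative phonemes
```

Dividing the result by the length of the longer sequence gives a score between 0 and 1, which makes pairs of different lengths easier to compare.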