I have two word lists in which each line pairs a word with its counterpart. An example:
list 1   list 2
foot     fuut
barj     kijo
foio     fuau
fuim     fuami
kwim     kwami
lnun     lnun
kizm     kazm
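I’d like to find the replacement rules that turn the words of list 1 into those of list 2, for example (the numbers after # are the rows where the rule applies):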
o → u # 1 and 3
i → a # 3 and 7
im → ami # 4 and 5
The rules should be ordered by number of occurrences, so that I can
filter out the ones that don’t appear often.
The lists currently consist of 35k words; the calculation should
take about 6 hours on an average server.
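
As a rough illustration of the kind of rule counting asked for here, the following Python sketch strips the shared prefix and suffix of each word pair and treats what is left as the change. This is only a simple heuristic, not the approach I ended up using; the names `extract_rules` and `count_rules` are made up for this example, and reading an equal-length change as single-character swaps is an assumption that also produces noise for unrelated pairs such as barj/kijo.

```python
from collections import Counter
from os.path import commonprefix  # works character-wise on any strings


def extract_rules(src, tgt):
    """Return the set of (old, new) rules that turn src into tgt."""
    if src == tgt:
        return set()
    # Strip the shared prefix, then the shared suffix of what remains.
    p = len(commonprefix([src, tgt]))
    a, b = src[p:], tgt[p:]
    s = len(commonprefix([a[::-1], b[::-1]]))
    if s:
        a, b = a[:-s], b[:-s]
    if len(a) == len(b):
        # Assumption: same length means independent single-character swaps.
        return {(x, y) for x, y in zip(a, b) if x != y}
    # Different length: keep the whole changed span as one rule.
    return {(a, b)}


def count_rules(pairs):
    """Count in how many word pairs each rule occurs."""
    counts = Counter()
    for src, tgt in pairs:
        counts.update(extract_rules(src, tgt))
    return counts


pairs = [("foot", "fuut"), ("barj", "kijo"), ("foio", "fuau"),
         ("fuim", "fuami"), ("kwim", "kwami"), ("lnun", "lnun"),
         ("kizm", "kazm")]

# Print only the rules that occur in at least two word pairs.
for (old, new), n in count_rules(pairs).most_common():
    if n >= 2:
        print(f"{old} → {new}  # {n}")
```

On the example lists this keeps o → u, i → a and im → ami, each with a count of 2; the rules coming from barj/kijo occur only once and are filtered out. It is a single pass over the word pairs, so 35k pairs are not a problem.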
My final solution was to use mosesdecoder (the Moses statistical
machine translation toolkit). I split each word into single characters,
used the two lists as a parallel corpus, and worked with the model
extracted from it. The two lists I compared were Sursilvan and Vallader.
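
The character-splitting step can be sketched in Python roughly like this. The function names and the file names are placeholders chosen for this example, not anything Moses requires; the only assumption about the corpus format is the usual one for parallel text, namely one segment per line with space-separated tokens, and line i of one file corresponding to line i of the other.

```python
def to_char_tokens(word):
    """Turn 'fuim' into 'f u i m' so that each character acts as one token."""
    return " ".join(word)


def write_parallel_corpus(pairs, src_path, tgt_path):
    """Write one word per line; line i of both files forms one pair."""
    with open(src_path, "w", encoding="utf-8") as src, \
         open(tgt_path, "w", encoding="utf-8") as tgt:
        for a, b in pairs:
            src.write(to_char_tokens(a) + "\n")
            tgt.write(to_char_tokens(b) + "\n")
```

Calling `write_parallel_corpus(pairs, "corpus.l1", "corpus.l2")` with a list of word pairs would produce the two files (the names are hypothetical) that then serve as training input in place of sentence-level text.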