So, this is the script that has very kindly been given to me as a starter:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import with_statement # needed for Python 2.5
from itertools import chain

def chunk(s):
    """Split a string on whitespace or hyphens"""
    return chain(*(c.split("-") for c in s.split()))

def process(latin, gloss, trans):
    chunks = zip(chunk(latin), chunk(gloss))
    # now you have to DO SOMETHING with the chunks!

def main():
    with open("examples.txt") as inf:
        try:
            while True:
                latin = inf.next().strip()
                gloss = inf.next().strip()
                trans = inf.next().strip()
                process(latin, gloss, trans)
                inf.next() # skip blank line
        except StopIteration:
            # reached end of file
            pass

if __name__=="__main__":
    main()
However, I’ve just spoken to my lecturer, who has let me know that he doesn’t want us using the __ x __ function, as it is "too advanced for the students’ needs at this point in the course".
I’m absolutely stumped as to what I need to put into the "chunks" or "process" fields. Up until now I’ve been able to figure most of the other exercises out (with a few hints), but this one is just way beyond me. This particular part is worth 15 points out of 20, and it’s making me feel just a little bit sick!
Any further help would be greatly appreciated.
Original post (sorry it’s so long!)
I’m trying to do the following: I have a text in a language other than English, broken up into morphemes (parts of each word) using hyphens, with the English gloss (linguistic translation of each morpheme) and a direct translation below, e.g.
Itali-am fat-o profug-us Lavini-a-que ven-it
Italy-Fem:Sg:Acc fate-Neut:Sg:Abl fleeing-Masc:Sg:Nom Lavinian-Neut:Pl:Acc come:Perf-3-Sg:Indic:Act
‘in flight [driven] by fate came to Italy and the Lavinian [shores]’
I’ll have several texts such as the above in one file – i.e.
blank line
a line of latin broken up with hyphens
a line of gloss broken up with corresponding hyphens, using colons to join elements
a line of translation
blank line
latin
gloss
translation
ad infinitum.
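A minimal sketch of reading that layout with a plain loop, for illustration only. The helper name read_records is hypothetical, and it assumes every record is exactly three non-blank lines (Latin, gloss, translation) with blank lines used only as separators. The sample data is the example record above, repeated twice.

```python
def read_records(lines):
    """Group non-blank lines into (latin, gloss, translation) triples.

    Assumption: each record is exactly three non-blank lines, and
    blank lines appear only between (or before/after) records.
    """
    records = []
    current = []
    for line in lines:
        line = line.strip()
        if line:                      # ignore blank separator lines
            current.append(line)
        if len(current) == 3:         # a full record has been collected
            records.append(tuple(current))
            current = []
    return records


# The example record from the post, repeated to imitate a longer file.
sample = [
    "\n",
    "Itali-am fat-o profug-us Lavini-a-que ven-it\n",
    "Italy-Fem:Sg:Acc fate-Neut:Sg:Abl fleeing-Masc:Sg:Nom Lavinian-Neut:Pl:Acc come:Perf-3-Sg:Indic:Act\n",
    "'in flight [driven] by fate came to Italy and the Lavinian [shores]'\n",
    "\n",
    "Itali-am fat-o profug-us Lavini-a-que ven-it\n",
    "Italy-Fem:Sg:Acc fate-Neut:Sg:Abl fleeing-Masc:Sg:Nom Lavinian-Neut:Pl:Acc come:Perf-3-Sg:Indic:Act\n",
    "'in flight [driven] by fate came to Italy and the Lavinian [shores]'\n",
]

records = read_records(sample)
for latin, gloss, trans in records:
    print("%s | %s" % (latin, gloss))
```

With a real file, the same function could be fed the open file object instead of the sample list, since iterating over a file yields its lines.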
What I need to do is write a file that gives me the following output:
Itali: 1 Italy
am: 1 Fem:Sg:Acc
fat: 1 fate
o: 1 Neut:Sg:Abl
profug: 1 fleeing
us: 1 Masc:Sg:Nom
Lavini: 1 Lavinian
a: 1 Neut:Pl:Acc
que: 1 come:Perf
ven: 1 3
it: 1 Sg:Indic:Act
where the first column represents the first line of text without hyphens; the second column indicates the number of occurrences (it’s only 1 each in this example), and the third column is the English translation of the first column, as written in the text.
If there’s a Latin morpheme with no corresponding English gloss/translation, the Latin column will be as normal but the English column will print [unknown], like:
a: 1 [unknown]
And in the opposite case, i.e. an English morpheme with no corresponding Latin, it should print
[unknown]: 1 kitten
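One possible way to get the [unknown] entries, sketched with plain lists and loops. The function names split_morphemes and pair_with_unknown are made up for illustration. It assumes that a missing morpheme or gloss always shows up as one line being shorter than the other, so the shorter side is padded at the end; a gap in the middle of a line cannot be detected from position alone.

```python
def split_morphemes(line):
    """Split a line on whitespace, then split each word on hyphens."""
    pieces = []
    for word in line.split():
        pieces.extend(word.split("-"))
    return pieces


def pair_with_unknown(latin, gloss, filler="[unknown]"):
    """Pair morphemes with glosses, padding the shorter side with filler.

    Assumption: unmatched items are always at the end of the line.
    """
    latin_parts = split_morphemes(latin)
    gloss_parts = split_morphemes(gloss)
    longest = max(len(latin_parts), len(gloss_parts))
    latin_parts += [filler] * (longest - len(latin_parts))
    gloss_parts += [filler] * (longest - len(gloss_parts))
    return list(zip(latin_parts, gloss_parts))


# Latin has one morpheme more than the gloss.
print(pair_with_unknown("Lavini-a-que", "Lavinian-Neut:Pl:Acc"))
# A gloss with no Latin at all.
print(pair_with_unknown("", "kitten"))
```

The first call pairs "que" with "[unknown]", and the second pairs "[unknown]" with "kitten".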
Finally, the program needs to be able to deal with homophonous morphemes (i.e. two identically spelled Latin morphemes with different meanings), e.g.
a: 16 Neuter:Plural
a: 28 Feminine:Singular
Whenever you need to count occurrences, you need a dictionary.
Create a dictionary where the key is the tuple generated by zip, and the value is a list that has: [latin, amount, translation]. Each time you encounter the same tuple you increment the amount.
The dictionary has to outlive the function so you probably want to add it as a parameter.
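A rough sketch of that idea, with made-up names (count_pairs, counts). It assumes the pairs arrive as (latin, gloss) tuples, for example from zip, and that the caller creates the dictionary once and passes it in on every call. The sample pairs reuse the homophone example from the question.

```python
def count_pairs(pairs, counts):
    """Add each (latin, gloss) pair to counts, a dict owned by the caller."""
    for latin, gloss in pairs:
        key = (latin, gloss)
        if key not in counts:
            # First sighting: store [latin, amount, translation].
            counts[key] = [latin, 0, gloss]
        counts[key][1] += 1


counts = {}  # created once, outside the function, so it survives each call
count_pairs([("a", "Neuter:Plural"), ("a", "Feminine:Singular")], counts)
count_pairs([("a", "Neuter:Plural")], counts)

print(counts[("a", "Neuter:Plural")])      # ['a', 2, 'Neuter:Plural']
print(counts[("a", "Feminine:Singular")])  # ['a', 1, 'Feminine:Singular']
```

Because the key is the whole tuple rather than just the Latin morpheme, identically spelled morphemes with different glosses get separate counts.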
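Once you are done, you can sort the keys for output: result = dict.keys(); result.sort(), where dict stands for your dictionary (sorted() on the dictionary gives the same list in one step).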
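For instance, printing the sorted result in the three-column format from the question might look like the sketch below. The name format_counts is hypothetical, and the dictionary is filled in by hand here, using counts taken from the question's examples, just to keep the snippet self-contained.

```python
# Hand-built dictionary: key is (latin, gloss), value is [latin, amount, gloss].
counts = {
    ("fat", "fate"): ["fat", 1, "fate"],
    ("a", "Neuter:Plural"): ["a", 16, "Neuter:Plural"],
    ("a", "Feminine:Singular"): ["a", 28, "Feminine:Singular"],
}


def format_counts(counts):
    """Return one 'latin: amount gloss' line per entry, sorted by key."""
    lines = []
    for key in sorted(counts):  # tuples sort by latin first, then gloss
        latin, amount, gloss = counts[key]
        lines.append("%s: %d %s" % (latin, amount, gloss))
    return lines


for line in format_counts(counts):
    print(line)
# a: 28 Feminine:Singular
# a: 16 Neuter:Plural
# fat: 1 fate
```

Note that this sorts alphabetically (uppercase before lowercase), not in order of appearance in the text; if the original order matters, a separate list of keys would have to be kept.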
I’m not sure I understand the part about the unknowns. If this does not solve that part, you might need to show a relevant example.