Before I start, note that I’m working in the Linux shell (through subprocess.call() from Python), and I am using OpenFst.
I’ve been sifting through documents and questions about OpenFst, but I cannot seem to find an answer to this question: how does one actually give input to an OpenFst-defined, compiled and composed FST? Where does the output go? Do I simply execute ‘fstproject’? If so, how would I, say, give it a string to transduce, and print the various transductions when the end state(s) have been reached?
I apologize if this question seems obvious. I’m not very familiar with OpenFst as of yet.
One way is to build a machine (a transducer) that performs the transformation, encode the input string as a second machine, and compose the two.
A very simple example would be a transducer that uppercases a string.
M.wfst
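The listing for this file is missing here, so what follows is a reconstruction, not the original. It assumes OpenFst's text format (one arc per line as source state, destination state, input symbol, output symbol, followed by a line naming each final state) and a three-letter alphabet a, b, c chosen for illustration: a single state with one self-loop per letter.

```shell
# Write the text description of the uppercasing transducer.
# Columns: source state, destination state, input symbol, output symbol.
# The last line marks state 0 as final.
cat > M.wfst <<'EOF'
0 0 a A
0 0 b B
0 0 c C
0
EOF
```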
The accompanying symbols file contains one line for each symbol of the alphabet. Note that 0 is reserved for null (epsilon) transitions and has special meaning in many of the operations.
M.syms
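A symbols file matching that description might look like the sketch below. The name used for the epsilon entry and the particular integer ids are assumptions; what matters is that id 0 is kept for epsilon and every other symbol gets its own id.

```shell
# Write the symbol table: one "symbol id" pair per line.
# Id 0 is kept for epsilon; the remaining ids are arbitrary but distinct.
cat > M.syms <<'EOF'
<epsilon> 0
a 1
b 2
c 3
A 4
B 5
C 6
EOF
```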
Then compile the machine
For the input string “abc”, create a linear-chain automaton: a left-to-right chain with one arc per character. It is an acceptor, so only a single label column is needed for the input symbols.
I.wfst
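The file contents are missing here as well. For the string “abc”, a linear chain in OpenFst's acceptor text format would plausibly be the following (one label column per arc, with state 3 as the only final state):

```shell
# Write the linear-chain acceptor for the string "abc".
# Columns: source state, destination state, symbol.
# The last line marks state 3 as final.
cat > I.wfst <<'EOF'
0 1 a
1 2 b
2 3 c
3
EOF
```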
Compile as an acceptor
Then compose the machines and print
This will give the output
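The printed listing is missing from this copy. For an uppercasing transducer over a, b, c applied to “abc”, the result should be a four-state chain whose arcs pair each lowercase input with its uppercase output, roughly as follows (fstprint separates the columns with tabs):

```
0	1	a	A
1	2	b	B
2	3	c	C
3
```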
The output of fstcompose is a lattice of all transductions of the input string (in this case there is only one). If M.ofst is more complicated, fstshortestpath can extract the n best strings using the flags --unique --nshortest=n. Its output is again a transducer; you can either scrape the output of fstprint or use C++ code and the OpenFst library to run a depth-first search that extracts the strings.
Inserting fstproject --project_output into the pipeline converts the result to an acceptor containing only the output labels.
Gives the following
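The listing itself is missing here. Projecting the uppercased “abc” onto its output labels should give a chain in which both label columns hold the uppercase symbol, roughly:

```
0	1	A	A
1	2	B	B
2	3	C	C
3
```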
This is an acceptor because the input and output labels on every arc are the same; the --acceptor option of fstprint can be used to generate more succinct output.