I’m trying to use the word alignment in the BerkeleyAligner.jar file from http://code.google.com/p/berkeleyaligner/ in my own Java class.
I have already added the .jar file to my build path.
What parameters does edu.berkeley.nlp.wordAlignment.combine.CombinedAligner take?
What does edu.berkeley.nlp.wordAlignment.combine.CombinedAligner output?
What I have are two input files that are already sentence-aligned; i.e., the sentence on line X of sourceFile is the same sentence (in a different language) as the one on line X of targetFile.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

import edu.berkeley.*;
import edu.berkeley.nlp.wa.mt.Alignment;
import edu.berkeley.nlp.wa.mt.SentencePair;

public class TestAlign {
    public static void main(String[] args) throws IOException {
        BufferedReader brSrc = new BufferedReader(new FileReader("sourceFile"));
        BufferedReader brTrg = new BufferedReader(new FileReader("targetFile"));
        String currentSrcLine;
        while ((currentSrcLine = brSrc.readLine()) != null) {
            String currentTrgLine = brTrg.readLine();
            // Reads into BerkeleyAligner SentencePair format.
            // (sentCounter and params are not defined in this snippet.)
            SentencePair src2trg = new SentencePair(sentCounter, params.get("source"),
                    Arrays.asList(currentSrcLine.split(" ")), Arrays.asList(currentTrgLine.split(" ")));
            // How do I call the BerkeleyAligner?
            // - What parameters does the CombinedAligner take?
            // - What does the function/class return?
            // I assume it returns a list of strings.
            // Is there a class in BerkeleyAligner to read the output?
            // Please provide an example, thank you!
            Alignment output = edu.berkeley.nlp.wordAlignment.combine.CombinedAligner
                    .something.something(currentSrcLine, currentTrgLine);
        }
        brSrc.close();
        brTrg.close();
    }
}
e.g. sourceFile:
this is the first line in the textfile.
that is the second line.
foo bar likes to eat bar foo.
e.g. targetFile:
Dies ist die erste Textzeile in der Datei.
das ist die zweite Zeile.
foo bar gerne bar foo essen.
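For reference, the line-by-line pairing described above can be sketched without any BerkeleyAligner classes. This is only a minimal sketch: TokenPair and ParallelCorpusReader are hypothetical names made up for illustration, the tokenization is a naive whitespace split (punctuation stays attached to words, as with split(" ") in the snippet above), and the file names are the ones from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParallelCorpusReader {

    /** Hypothetical holder for one tokenized sentence pair; NOT a BerkeleyAligner class. */
    static final class TokenPair {
        final int id;
        final List<String> source;
        final List<String> target;

        TokenPair(int id, List<String> source, List<String> target) {
            this.id = id;
            this.source = source;
            this.target = target;
        }
    }

    /** Reads both files in lockstep: line X of the source is paired with line X of the target. */
    static List<TokenPair> readPairs(Path srcPath, Path trgPath) throws IOException {
        List<TokenPair> pairs = new ArrayList<>();
        try (BufferedReader src = Files.newBufferedReader(srcPath, StandardCharsets.UTF_8);
             BufferedReader trg = Files.newBufferedReader(trgPath, StandardCharsets.UTF_8)) {
            String srcLine;
            int id = 0;
            while ((srcLine = src.readLine()) != null) {
                String trgLine = trg.readLine();
                if (trgLine == null) {
                    throw new IOException("Target file has fewer lines than source file");
                }
                pairs.add(new TokenPair(id++, tokenize(srcLine), tokenize(trgLine)));
            }
            if (trg.readLine() != null) {
                throw new IOException("Target file has more lines than source file");
            }
        }
        return pairs;
    }

    /** Naive whitespace tokenizer; an empty line yields an empty token list. */
    static List<String> tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) throws IOException {
        for (TokenPair p : readPairs(Paths.get("sourceFile"), Paths.get("targetFile"))) {
            System.out.println(p.id + "\t" + p.source + "\t" + p.target);
        }
    }
}
```

Failing on a line-count mismatch is a deliberate choice here: if the two files drift out of step, every later pair would be wrong, so it seems safer to stop than to pair unrelated sentences.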
Actual Answer
You just wanted to align text (from a target file and a source file), right?
If so, after creating a sentence pair, you don’t even need to put it in a CombinedAligner. You can get an Alignment from it directly with the constructor
Alignment(SentencePair, boolean)
The boolean specifies whether you want a tree alignment. Passing the SentencePair to the constructor generates an Alignment automatically!
So simple!
This is where I got the code: http://code.google.com/p/berkeleyaligner/source/browse/trunk/src/edu/berkeley/nlp/wa/mt/Alignment.java
UPDATE
Unfortunately, I misunderstood your question, and posted an irrelevant response.
However, I downloaded the jar file, found CombinedAligner.class, and decompiled it.
Here’s what I got:
package edu.berkeley.nlp.wordAlignment.combine;
It seems that the Alignment class you’re using is edu.berkeley.nlp.mt.Alignment.
Anyway, CombinedAligner is abstract, so you can’t instantiate it. And I don’t know what the .something’s are, because there is no static method or field.
I think that what you want, however, is alignSentencePair(SentencePair).
To get this, you need to use a subclass of CombinedAligner, as CombinedAligner is abstract.
So, after poking around the files, I found these subclasses:
You can use these instead of CombinedAligner and insert your two sentences as a SentencePair!
After checking, I realized that WordAligner is also abstract!
import edu.berkeley.nlp.mt.Alignment;
import edu.berkeley.nlp.mt.SentencePair;
import fig.basic.LogInfo;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
public abstract class WordAligner implements Serializable {
}
I found a subclass, though:
Unfortunately, this is still abstract.
But there’s a subclass of IterWordAligner that isn’t: edu.berkeley.nlp.wordAlignment.EMWordAligner
However, the constructor is really weird.
It uses an INNER CLASS in the CONSTRUCTOR!? That’s terrible programming practice.
WAIT…
I found a word aligner!
http://code.google.com/p/tdx-nlp/source/browse/trunk/pa2/java/src/cs224n/assignments/WordAlignmentTester.java?r=67
Maybe that helps and you can resolve your problem with it.
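To illustrate the abstract-class point from the update: an abstract aligner can’t be created with new, but a concrete subclass can be assigned to a variable of the abstract type and called through it. The sketch below uses made-up toy classes (ToyAligner, DiagonalAligner), not the real BerkeleyAligner API, and the "aligner" just links word i to word i, so it only shows the call shape, not a trained model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AbstractAlignerSketch {

    /** Hypothetical stand-in for an abstract aligner base class. */
    static abstract class ToyAligner {
        /** Returns alignment links as {sourceIndex, targetIndex} pairs. */
        abstract List<int[]> alignSentencePair(List<String> source, List<String> target);
    }

    /** Concrete subclass: links position i to position i (a diagonal baseline). */
    static final class DiagonalAligner extends ToyAligner {
        @Override
        List<int[]> alignSentencePair(List<String> source, List<String> target) {
            List<int[]> links = new ArrayList<>();
            int n = Math.min(source.size(), target.size());
            for (int i = 0; i < n; i++) {
                links.add(new int[] {i, i});
            }
            return links;
        }
    }

    /** Formats links as space-separated "i-j" pairs. */
    static String format(List<int[]> links) {
        StringBuilder sb = new StringBuilder();
        for (int[] link : links) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(link[0]).append('-').append(link[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ToyAligner a = new ToyAligner();  // would not compile: ToyAligner is abstract
        ToyAligner aligner = new DiagonalAligner();
        List<String> src = Arrays.asList("that is the second line.".split(" "));
        List<String> trg = Arrays.asList("das ist die zweite Zeile.".split(" "));
        System.out.println(format(aligner.alignSentencePair(src, trg)));
    }
}
```

With the real library, the same pattern would presumably apply: find a non-abstract subclass, construct it, and call alignSentencePair on it.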