I’m learning about text processing in Java for a class. The example in class read data from a file, did some text processing, and wrote the data (a List) back to a file. I understand the example: he reads each line into a String, adds that line to the list, uses .split(" "), and then calls Collections.sort to sort the data on one of the resulting strings. However, if there are commas and extra whitespace, I don’t know how to handle those. I read up on regex, but wasn’t sure it was needed since we haven’t covered it, so I was going for the trim() method instead. But if I call trim() in the compare method of my class that implements Comparator (the one passed to Collections.sort), the correctly formatted string never makes it back into the list, since compare only returns an int. So I guess I’m looking for some general guidelines to help with this assignment, without giving away the answer completely. Thanks.
Edit:
Assignment is to write the list in order, deleting duplicates and extra whitespace.
    import java.io.IOException;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class TextProcess
    {
        public static void main(String[] args)
        {
            try {
                // get data from class file
                List<String> data = TextFileUtils.readTextFile("addressbooktest.txt");
                // process data. Really just the same address book that looks like
                // firstName, lastName, phone, email
                // with the commas, but deleting duplicates, the extra whitespace, and sorting alphabetically
                Collections.sort(data, FIRSTNAMECOMPARATOR);
                // write to output file
                TextFileUtils.writeTextFile(data, "parsedaddressbooktest.txt");
            }
            catch (IOException e) {
                e.printStackTrace();
            }
        }

        private static final FirstNameComparator FIRSTNAMECOMPARATOR = new FirstNameComparator();
    }

    class FirstNameComparator implements Comparator<String>
    {
        // orders lines by first name, then by last name, ignoring case and surrounding whitespace
        public int compare(String s1, String s2)
        {
            String[] st1 = s1.split(",");
            String[] st2 = s2.split(",");
            String firstName1 = st1[0].toUpperCase().trim();
            String lastName1 = st1[1].toUpperCase().trim();
            String firstName2 = st2[0].toUpperCase().trim();
            String lastName2 = st2[1].toUpperCase().trim();
            if (!(firstName1.equals(firstName2)))
                return firstName1.compareTo(firstName2);
            else
                return lastName1.compareTo(lastName2);
        }
    }
A Comparator is simply a way of determining the relative order of two items, nothing more. You’d use it when you want to control the order in which a collection of objects is sorted, but in this case it sounds like you’re trying to mutate the objects within your comparator; this isn’t going to work.
You’re right that the trim() method will get rid of leading and trailing whitespace (subject to its own definition of whitespace, namely any character at or below U+0020, which is fine for simple use cases like yours). You’ll need to use this earlier on; after you’ve extracted the “raw” data, of course, but before you add the data to the list.
Beyond that, you haven’t actually said what the requirements are. I can assume that you need to discard trailing whitespace, but what about the commas? Should these be interpreted as element separators, in a functionally equivalent way to newlines? Or is something else needed?
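To make the "clean first, compare later" idea concrete, here is a minimal sketch rather than a full solution. It assumes the commas separate fields within a line (one reading of the requirements, not confirmed above). The class name LineCleanup, the helper normalize, and the sample lines are all made up for illustration, and duplicate removal is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class LineCleanup {

    // Hypothetical helper: trims each comma-separated field of one line
    // and joins the fields back together with ", ".
    // Assumption: commas are field separators within a single record.
    static String normalize(String line) {
        List<String> cleaned = new ArrayList<>();
        for (String field : line.split(",")) {
            cleaned.add(field.trim());
        }
        return String.join(", ", cleaned);
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the contents of the file.
        List<String> raw = Arrays.asList(
                "  Zoe ,Adams,   555-0101 , zoe@example.com  ",
                "Al,   Baker ,555-0102,al@example.com");

        // Clean each line BEFORE it goes into the list that gets sorted.
        List<String> data = new ArrayList<>();
        for (String line : raw) {
            data.add(normalize(line));
        }

        // The comparator now only has to compare; it never modifies anything.
        Collections.sort(data, String.CASE_INSENSITIVE_ORDER);

        for (String line : data) {
            System.out.println(line);
        }
    }
}
```

Because the lines are already clean when they reach the list, whatever Comparator is passed to Collections.sort can stay a pure comparison.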
I think you’re on the right track in general; just think about the steps required and try to do each one separately as it’s cleaner that way. From what I can tell, your steps might be something like: