Fix handling of zero-length label written last in the lexicon
Nothing currently disallows zero-length labels, and in some use cases they make sense.
However, when the lexicon file includes a zero-length label as the last label in the
list, split(" ") drops it, because String.split discards trailing empty strings by
default. This can later lead to an ArrayIndexOutOfBoundsException when something gets
classified with the zero-length label.
Googulator authored Sep 22, 2021
1 parent aff939f commit 2d9ed73
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion src/main/java/com/digitalpebble/classification/Lexicon.java
@@ -303,7 +303,11 @@ private void loadFromFile(String filename) throws IOException {
 .readLine());
 this.normalizeVector = Boolean.parseBoolean(reader.readLine());
 this.classifierType = reader.readLine();
-this.labels = Arrays.asList(reader.readLine().split(" "));
+this.labels = new ArrayList<String>();
+// Need -1 to handle the case where the last label is zero length
+this.labels.addAll(Arrays.asList(reader.readLine().split(" ", -1)));
+// Remove the extra entry created by the terminating space
+this.labels.remove(labels.size() - 1);
 String[] tmp = reader.readLine().split(" ");
 for (String f : tmp) {
 // see if there is a custom weight for it
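
To illustrate the behaviour this change relies on, here is a minimal standalone sketch. The class `LabelSplitDemo` and the helper `parseLabels` are hypothetical names used only for illustration and are not part of Lexicon.java. The sketch assumes, as the comment in the patch suggests, that every label on the line is followed by a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LabelSplitDemo {

    // Hypothetical helper mirroring the patched logic (not the actual Lexicon code).
    // Assumes each label on the line is followed by one terminating space.
    public static List<String> parseLabels(String line) {
        // A negative limit makes split keep trailing empty strings.
        List<String> labels = new ArrayList<>(Arrays.asList(line.split(" ", -1)));
        // The terminating space yields one extra empty entry at the end; drop it.
        labels.remove(labels.size() - 1);
        return labels;
    }

    public static void main(String[] args) {
        // Three labels: "pos", "neg" and a zero-length label, each followed by a space.
        String line = "pos neg  ";

        // Default split discards trailing empty strings, so the empty label is lost.
        System.out.println(Arrays.asList(line.split(" ")).size()); // prints 2

        // With limit -1 and the extra entry removed, all three labels survive.
        System.out.println(parseLabels(line).size()); // prints 3
    }
}
```

Note that the approach depends on the terminating space being present; a line without one would lose its real last label when the final entry is removed.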