I have a web service that receives approximately 9GB of raw text data per day from various sources. The vast majority of it consists of relatively short strings (100-300 characters) that are repeated very often; there may be only a few thousand unique strings.
I usually prefer not to optimize prematurely, but storage is going to become a problem very early in development.
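To put rough numbers on why deduplication pays off here, the sketch below estimates how many strings arrive per day and how much space they would take if each occurrence were replaced by a 4-byte `int` id. It assumes 1 byte per character (ASCII or single-byte UTF-8), decimal gigabytes, and ignores row and index overhead; the class and method names are illustrative only.

```java
/** Back-of-the-envelope storage estimate (illustrative; assumes 1 byte per character). */
public class DedupEstimate {

    /** Strings per day if every string is exactly avgLength bytes long. */
    public static long stringsPerDay(long bytesPerDay, int avgLength) {
        return bytesPerDay / avgLength;
    }

    /** Bytes per day if each string occurrence is replaced by a fixed-size id reference. */
    public static long referenceBytesPerDay(long bytesPerDay, int avgLength, int idBytes) {
        return stringsPerDay(bytesPerDay, avgLength) * idBytes;
    }

    public static void main(String[] args) {
        long bytesPerDay = 9_000_000_000L; // ~9GB/day, treated as decimal gigabytes
        int idBytes = 4;                   // a Java int id, as in DeduplicatedString
        for (int len : new int[] {100, 300}) {
            System.out.printf("%d-char strings: %d per day -> ~%d MB/day of int ids%n",
                    len,
                    stringsPerDay(bytesPerDay, len),
                    referenceBytesPerDay(bytesPerDay, len, idBytes) / 1_000_000);
        }
    }
}
```

Under these assumptions that is 30-90 million strings per day, or roughly 120-360 MB/day of id references instead of 9GB, while the few thousand unique strings themselves are negligible by comparison.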
I have a JPA entity, simplified here for the sake of this post. It is a string/ID pair that is mapped to a parent table.
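import java.io.Serializable;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class DeduplicatedString implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue
    private int id;

    // The frequently repeated string that should be stored only once.
    private String value;

    public DeduplicatedString() {
        super();
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }
}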
I'd like to set up a JPA listener (something like beforeInsert?) that checks the existing data when a new string is added, and returns the existing record if an exact match is already present.
Normally I would just set up an ON INSERT trigger in the database, but I'm not sure how to do this in JPA.
Thanks!
The functionality you need is not directly supported by JPA. There is a @PrePersist annotation (along with several other lifecycle callbacks), but these can only be used to, e.g., monitor the system or make last-minute changes to the entity being persisted; they cannot substitute an existing record for it. JPA is unaware of any triggers executed in the database, and there is currently no mechanism to link the two.
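Since a callback cannot hand back a different row, one common workaround is to do the lookup yourself, in application code, before calling persist ("find or create"). Below is a minimal sketch of that logic, assuming a hypothetical `Store` interface standing in for the database: in a real JPA application `findIdByValue` would be a query on `DeduplicatedString.value` and `insert` would call `EntityManager.persist`. `StringDeduplicator`, `idFor` and `InMemoryStore` are illustrative names, not part of any library. Because there are only a few thousand unique strings, a value-to-id cache in memory should stay small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Find-or-create lookup for deduplicated strings (hypothetical helper, not a JPA API). */
public class StringDeduplicator {

    /** Hypothetical persistence boundary; with JPA this would wrap a query and EntityManager.persist. */
    public interface Store {
        Optional<Integer> findIdByValue(String value);
        int insert(String value);
    }

    private final Store store;
    // value -> id cache, so repeated strings skip the database lookup entirely.
    private final Map<String, Integer> cache = new HashMap<>();

    public StringDeduplicator(Store store) {
        this.store = store;
    }

    /** Returns the id of the row holding exactly this value, inserting a row only if none exists. */
    public synchronized int idFor(String value) {
        Integer cached = cache.get(value);
        if (cached != null) {
            return cached;
        }
        int id = store.findIdByValue(value).orElseGet(() -> store.insert(value));
        cache.put(value, id);
        return id;
    }

    /** In-memory Store for trying the logic without a database. */
    public static class InMemoryStore implements Store {
        private final Map<String, Integer> rows = new HashMap<>();
        private int nextId = 1;
        private int inserts = 0;

        @Override
        public Optional<Integer> findIdByValue(String value) {
            return Optional.ofNullable(rows.get(value));
        }

        @Override
        public int insert(String value) {
            inserts++;
            int id = nextId++;
            rows.put(value, id);
            return id;
        }

        public int getInserts() {
            return inserts;
        }
    }
}
```

Calling `idFor("hello")` twice on the same instance returns the same id and performs only one insert. Note that `synchronized` only protects a single JVM; if several nodes write to the same table, a unique constraint on the value column would still be needed so the database rejects a concurrent duplicate insert.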