I’m having a big problem with the embedded firebird database engine and Java SE.
I’m currently developing a filtering tool for users to filter out data.
So I have made two Options for filtering, user can chose one or both:
- Filter out from a black list(the black list is controled by user).
- Filter out according to a massive list that records every record ever uploaded and filtered out.
The data the user uploads its in plain text comma or token separated like this:
(SET OF COLUMNS)| RECORD TO FILTER |
0-MANY COLUMNS | ABC2 |
0-MANY COLUMNS | ABC5 |
When I Upload it to the DB, I add A FLAG for every filter
(SET OF COLUMNS) | RECORD TO FILTER | FLAG FOR FIlTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC2 | | |
0-MANY COLUMNS | ABC5 | | |
So, when it comes to the second filter, the program has a main empty table on the first run of the software, then it fills that table with all the records from the very first upload.
The main table will have unique records like the following table after a few text uploads made by the user:
Record | Date criteria for filtering |
ABC1 | 08/11/2012:1,07/11/2012:3,06/11/2012:5|
ABC2 | 05/11/2012:1,04/11/2012:0,03/11/2012:0|
ABC3 | 12/11/2012:3,11/11/2012:0,10/11/2012:0|
ABC4 | 12/11/2012:1,11/11/2012:0,10/11/2012:0|
ABC5 | 12/11/2012:3,11/11/2012:0,10/11/2012:3|
ABC9 | 11/11/2012:3,10/11/2012:1,09/11/2012:0|
When the data is processed, for example, the software updates both, the main table and the user table:
(SET OF COLUMNS| RECORD TO FILTER | FLAG FOR FIlTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC4 | | |
0-MANY COLUMNS | ABC9 | | |
So the main table will update:
Record | Day criteria for filtering |
ABC1 | 08/11/2012:1,07/11/2012:3,06/11/2012:5|
ABC2 | 05/11/2012:1,04/11/2012:0,03/11/2012:0|
ABC3 | 12/11/2012:3,11/11/2012:0,10/11/2012:0|
ABC4 | 12/11/2012:1,11/11/2012:0,10/11/2012:0| ->12/11/2012:2,11/11/2012:0,10/11/2012:0
ABC5 | 12/11/2012:3,11/11/2012:0,10/11/2012:3|
ABC9 | 11/11/2012:3,10/11/2012:1,09/11/2012:0| ->12/11/2012:1,11/11/2012:3,10/11/2012:1
If in the last three days the data criteria event has reached more than four, the user table will flag filter B. Notice that each date has an integer next to it.
(SET OF COLUMNS)| RECORD TO FILTER | FLAG FOR FIlTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC4 | | |
0-MANY COLUMNS | ABC9 | | X |
Both updates are in a single transaction, the problem is that when the user uploads more than 800,000 records my program throws the following exception in the while loop.
I use StringBuilder parsing and append methods for maximun performance on the mutable days string.
java.lang.OutOfMemoryError: Java heap space
Here it’s my code, I use five days:
FactoriaDeDatos factoryInstace = FactoriaDeDatos.getInstance();
Connection cnx = factoryInstace.getConnection();
cnx.setAutoCommit(false);
PreparedStatement pstmt = null;
ResultSet rs=null;
pstmt = cnx.prepareStatement("SELECT CM.MAIL,CM.FECHAS FROM TCOMERCIALMAIL CM INNER JOIN TEMPMAIL TMP ON CM.MAIL=TMP."+colEmail);
rs=pstmt.executeQuery();
pstmtDet = cnx.prepareStatement("ALTER INDEX IDX_MAIL INACTIVE");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("SET STATISTICS INDEX IDX_FECHAS");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("ALTER INDEX IDX_FECHAS INACTIVE");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("SET STATISTICS INDEX IDX_FECHAS");
pstmtDet.executeUpdate();
sql_com_local_tranx=0;
int trxNum=0;
int ix=0;
int ixE1=0;
int ixAc=0;
StringBuilder sb;
StringTokenizer st;
String fechas;
int pos1,pos2,pos3,pos4,pos5,pos6,pos7,pos8,pos9;
StringBuilder s1,s2,sSQL,s4,s5,s6,s7,s8,s9,s10;
long startLoop = System.nanoTime();
long time2 ;
boolean ejecutoMax=false;
//int paginador_sql=1000;
//int trx_ejecutada=0;
sb=new StringBuilder();
s1=new StringBuilder();
s2=new StringBuilder();
sSQL=new StringBuilder();
s4=new StringBuilder();
s6=new StringBuilder();
s8=new StringBuilder();
s10=new StringBuilder();
while(rs.next()){
//De aqui
actConteoDia=0;
sb.setLength(0);
sb.append(rs.getString(2));
pos1= sb.indexOf(":",0);
pos2= sb.indexOf(",",pos1+1);
pos3= sb.indexOf(":",pos2+1);
pos4= sb.indexOf(",",pos3+1);
pos5= sb.indexOf(":",pos4+1);
pos6= sb.indexOf(",",pos5+1);
pos7= sb.indexOf(":",pos6+1);
pos8= sb.indexOf(",",pos7+1);
pos9= sb.indexOf(":",pos8+1);
s1.setLength(0);
s1.append(sb.substring(0, pos1));
s2.setLength(0);
s2.append(sb.substring(pos1+1, pos2));
s4.setLength(0);
s4.append(sb.substring(pos3+1, pos4));
s6.setLength(0);
s6.append(sb.substring(pos5+1, pos6));
s8.setLength(0);
s8.append(sb.substring(pos7+1, pos8));
s10.setLength(0);
s10.append(sb.substring(pos9+1));
actConteoDia=Integer.parseInt(s2.toString());
actConteoDia++;
sb.setLength(0);
//sb.append(s1).a
if(actConteoDia>MAXIMO_LIMITE_POR_SEMANA){
actConteoDia=MAXIMO_LIMITE_POR_SEMANA+1;
}
sb.append(s1).append(":").append(actConteoDia).append(",").append(rs.getString(2).substring(pos2+1, rs.getString(2).length()));
//For every date record it takes aprox 8.3 milisec by record
sSQL.setLength(0);
sSQL.append("UPDATE TCOMERCIALMAIL SET FECHAS='").append(sb.toString()).append("' WHERE MAIL='").append(rs.getString(1)).append("'");
pstmtDet1.addBatch(sSQL.toString());
//actConteoDia=0;
//actConteoDia+=Integer.parseInt(s2.toString());
actConteoDia+=Integer.parseInt(s4.toString());
actConteoDia+=Integer.parseInt(s6.toString());
actConteoDia+=Integer.parseInt(s8.toString());
actConteoDia+=Integer.parseInt(s10.toString());
if(actConteoDia>MAXIMO_LIMITE_POR_SEMANA){
sSQL.setLength(0);
sSQL.append("UPDATE TEMPMAIL SET DIASLIMITE='S' WHERE ").append(colEmail).append("='").append(rs.getString(1)).append("'");
pstmtDet.addBatch(sSQL.toString());
}
sql_com_local_tranx++;
if(sql_com_local_tranx%2000==0 || sql_com_local_tranx%7000==0 ){
brDias.setString("PROCESANDO "+sql_com_local_tranx);
pstmtDet1.executeBatch();
pstmtDet.executeBatch();
}
if(sql_com_local_tranx%100000==0){
System.gc();
System.runFinalization();
}
}
pstmtDet1.executeBatch();
pstmtDet.executeBatch();
cnx.commit();
I’ve made telemetry tests so I can trace where the problem lies.
The big while it’s the problem, I think, but I don’t know where the problem may exactly be.
I’m adding some images of the telemtry tests, please I need to interpretate them properly.
The gc becomes inverse to the time the jvm to keep objects alive:
http://imageshack.us/photo/my-images/849/66780403.png
The memory heap goes from 50 MB to 250 MB, the used heap reaches the 250 MB creating the outOfMemory exception:
50 MB
http://imageshack.us/photo/my-images/94/52169259.png
REACHING 250 MB
http://imageshack.us/photo/my-images/706/91313357.png
OUT OF MEMORY
http://imageshack.us/photo/my-images/825/79083069.png
The final stack of objetcts generated ordered by LiveBytes:
http://imageshack.us/photo/my-images/546/95529690.png
Any help, suggestion, answer will be vastly appreciated.
The problem is that you are using
PreparedStatementas if it is aStatement, as you are callingaddBatch(string). The javadoc of this method says:This comment was added with JDBC 4.0, before that it said the method was optional. The fact that Jaybird allows you to call this method on
PreparedStatementis therefor a bug: I created issue JDBC-288 in the Jaybird tracker.Now to the cause of the
OutOfMemoryError: When you useaddBatch(String)on thePreparedStatementimplementation of Jaybird (FBPreparedStatement), it is added to a list internal to theStatementimplementation (FBStatement). In case ofFBStatement, when you callexecuteBatch(), it will execute all statements in this list and then clear it. InFBPreparedStatementhoweverexecuteBatch()is overridden to execute the originally prepared statement with batch parameters (in your example it won’t do anything, as you never actually add a PreparedStatement-style batch). It will never execute the statements you added withaddBatch(String), but it will also not clear the list of statements inFBStatementand that is most likely the cause of yourOutOfMemoryError.Based on this, the solution should be to create a
Statementusingcnx.createStatementand use that to execute your queries, or investigate if you could benefit from using one or morePreparedStatementobjects with a parameterized query. It looks like you should be able to use two separate PreparedStatements, but I am not 100% sure; the added benefit would be protection against SQL injection and a minor performance improvement.Addendum
This issue has been fixed since Jaybird 2.2.2
Full disclosure: I am the developer of the Jaybird / Firebird JDBC driver.