I am importing a huge amount of data from CSV files into MS SQL Server 2008. I am using plain JDBC (without any ORM framework) and communicating with the DB through the ‘sqljdbc4.jar’ driver provided by Microsoft.
My requirements are as follows:
- Parse the CSV file sequentially.
- Validate each record against the business requirements. (If the record is invalid, log the error against it in the error file.)
- Import the records that were not found invalid into the DB. (If the import fails, log the error against the record in the error file.)
- Save two result files, success and error. The success file will contain the good records with the same number of fields. The error file will have one additional field, ‘ERROR’, holding the error logged during the validate / import phase.
At the moment I am importing the records one by one, which takes a considerable amount of time even though I am using PreparedStatement.
I am not using batch import because I need to log the exact error for each record in the error file.
Kindly suggest any ideas to improve the performance without sacrificing accurate error logging. I am required to do this without any ORM tools.
Here is the sample code:
for (Map<String, String> csvRecord : csvAsList) {
    // Prepare category object using csvRecord.
    // invoke obj.insert(category);
}
public Category insert(Category category) throws SQLException {
    if (category == null) {
        return null;
    }
    String SQL = "INSERT INTO t1(c1,c2) VALUES(?,?)";
    PreparedStatement pstmt = null;
    ResultSet rs = null;
    try {
        pstmt = this.dbConnectionUtil.getConnection().prepareStatement(SQL,
                Statement.RETURN_GENERATED_KEYS);
        pstmt.setInt(1, category.getField1());
        pstmt.setString(2, category.getField2());
        int result = pstmt.executeUpdate();
        if (result < 1) {
            return null;
        }
        rs = pstmt.getGeneratedKeys();
        if (rs.next()) {
            category.setId(rs.getInt(1));
        }
    } finally {
        if (rs != null)
            rs.close();
        if (pstmt != null)
            pstmt.close();
        this.dbConnectionUtil.closeConnection();
    }
    return category;
}
Update on Sep 20, 2012.
I have modified the code so that only one PreparedStatement object is created to import one CSV file. The new code is below:
public void importCSV(){
    // Create a DB connection if it is null or closed.
    // Create PreparedStatement objects for selects and inserts if null or closed.
    for (Map<String, String> csvRecord : csvAsList) {
        // Prepare category object using csvRecord.
        // Check whether the category exists in the DB.
        // Import files can have up to 100,000 records, so tracking errors is critical.
        try {
            categoryDAO.findByName(categoryName,<PreparedStatement object>);
        }
        catch (Exception exp) {
            // log this to error.csv file
        }
        // If it is a new category, import it into the DB.
        try {
            categoryDAO.insert(category,<PreparedStatement object>);
        }
        catch (Exception exp) {
            // log this to error.csv file
        }
    }
    // Close PreparedStatement objects
    // Close DB connection
}
public Category insert(Category category, PreparedStatement pstmt) throws SQLException {
    if (category == null) {
        return null;
    }
    ResultSet rs = null;
    try {
        pstmt.setInt(1, category.getField1());
        pstmt.setString(2, category.getField2());
        int result = pstmt.executeUpdate();
        if (result < 1) {
            return null;
        }
        rs = pstmt.getGeneratedKeys();
        if (rs.next()) {
            category.setId(rs.getInt(1));
        }
    } finally {
        if (rs != null)
            rs.close();
    }
    return category;
}
Thanks.
It looks like you are calling prepareStatement over again for each row of your input data. This will pretty much eliminate the performance gain from using PreparedStatement. Instead, create the PreparedStatement outside your loop, and inside your loop keep only the setInt, setString and executeUpdate calls.
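To illustrate, here is a minimal sketch of that structure. It assumes Java 7 or later (for try-with-resources) and that each row has already passed validation and holds two fields. The class name ReusedStatementImport, the importRows method and the plain list used to collect errors are made up for the example and are not from your code. It also leaves out the generated-key handling to stay short.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ReusedStatementImport {

    /**
     * Inserts every row through a single PreparedStatement and records one
     * error message per rejected row. Returns the number of rows inserted.
     * (Hypothetical helper; table and columns are taken from the question.)
     */
    public static int importRows(Connection conn, List<String[]> rows,
                                 List<String> errors) throws SQLException {
        String sql = "INSERT INTO t1(c1,c2) VALUES(?,?)";
        int imported = 0;
        // Prepare once, outside the loop.
        try (PreparedStatement pstmt = conn.prepareStatement(sql)) {
            for (int i = 0; i < rows.size(); i++) {
                String[] row = rows.get(i);
                try {
                    // Inside the loop: only bind parameters and execute.
                    pstmt.setInt(1, Integer.parseInt(row[0]));
                    pstmt.setString(2, row[1]);
                    imported += pstmt.executeUpdate();
                } catch (SQLException | NumberFormatException e) {
                    // A failure affects only this row; keep the exact message
                    // and carry on with the next row.
                    errors.add("row " + (i + 1) + ": " + e.getMessage());
                }
            }
        }
        return imported;
    }
}
```

Because each row is still executed on its own, the exact error for every failed record remains available for the error file, while the statement is prepared only once per import.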