I have a MySQL database which is indexed by Solr. I carry out searches using Solr (fast), and I retrieve every result in the Solr search from the database using JPA. JPA runs a WHERE IN query on the database which is VERY slow.
Is there a way to make this process faster, or to refactor the design to improve performance?
I have just refactored the whole application from using MySQL’s fulltext search to use Solr, and now the performance is worse.
Note: I need all results immediately to carry out calculations on, and thus, I cannot use pagination.
Java code:
SolrDocumentList documentList = response.getResults();
Collection<String> listingIds = new ArrayList<>();
for (SolrDocument doc : documentList) {
    String listingId = (String) doc.getFirstValue("ListingId");
    listingIds.add(listingId);
}
Query query = em.createNamedQuery("getAllListingsWithId");
query.setParameter("listingIds", listingIds);
List<ListedItemDetail> listings = query.getResultList();
Named Query:
<query>Select listing from ListingSet listing where listing.listingId in :listingIds</query>
Additional Information:
SHOW CREATE TABLE ListingSet produces [shortened]:
CREATE TABLE `listingset` (
`LISTINGID` int(11) NOT NULL,
`STARTDATE` datetime DEFAULT NULL,
`STARTPRICE` decimal(10,2) DEFAULT NULL,
`TITLE` varchar(255) DEFAULT NULL,
PRIMARY KEY (`LISTINGID`),
KEY `FK_LISTINGSET_MEMBER_MEMBERID` (`MEMBER_MEMBERID`),
CONSTRAINT `FK_LISTINGSET_MEMBER_MEMBERID` FOREIGN KEY (`MEMBER_MEMBERID`) REFERENCES `member` (`MEMBERID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Investigating the generated SQL
Looking at the generated SQL, JPA runs a lot of SQL queries for a single JPA query. The ListingSet table is linked to 7 other tables, and JPA runs a separate SELECT query against each of those tables for EACH listing ID (of which there are 1,000–10,000). So my one JPA query gets blown up into what looks like ~7,000 queries!
The problem was caused by my use of JPA. Because my entity has many relationships, a single query exploded into 1,000–10,000 queries.
The solution is to use batch processing in JPA (EclipseLink calls this batch fetching) to prevent the ORM N+1 query problem. Batch processing causes JPA to request all relevant rows from a related table in one query, rather than issuing one query per entity. This solution is appropriate when a query returns many results and the entity being queried has many relationships.
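To make the difference concrete, here is a minimal sketch that only counts simulated queries. It does not use JPA at all: FakeRelatedTable and NPlusOneDemo are hypothetical names invented for illustration, and each method call stands in for one SELECT that the ORM would send to the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one related table; it only counts "queries".
class FakeRelatedTable {
    private final Map<Integer, String> rowsByListingId = new HashMap<>();
    int queryCount = 0;

    FakeRelatedTable(int listingCount) {
        for (int id = 1; id <= listingCount; id++) {
            rowsByListingId.put(id, "row-for-" + id);
        }
    }

    // Per-entity loading: simulates one SELECT ... WHERE listingId = ? per call.
    String findOne(int listingId) {
        queryCount++;
        return rowsByListingId.get(listingId);
    }

    // Batch loading: simulates one SELECT ... WHERE listingId IN (...) for all IDs.
    Map<Integer, String> findAll(List<Integer> listingIds) {
        queryCount++;
        Map<Integer, String> result = new HashMap<>();
        for (int id : listingIds) {
            result.put(id, rowsByListingId.get(id));
        }
        return result;
    }
}

class NPlusOneDemo {
    private static List<Integer> ids(int listings) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= listings; id++) {
            ids.add(id);
        }
        return ids;
    }

    // One query for the listings, then one query per listing per related table.
    static int countPerEntity(int listings, int relatedTables) {
        int total = 1; // the initial WHERE IN query for the listings themselves
        for (int t = 0; t < relatedTables; t++) {
            FakeRelatedTable table = new FakeRelatedTable(listings);
            for (int id : ids(listings)) {
                table.findOne(id);
            }
            total += table.queryCount;
        }
        return total;
    }

    // One query for the listings, then one batched query per related table.
    static int countBatched(int listings, int relatedTables) {
        int total = 1; // the initial WHERE IN query for the listings themselves
        for (int t = 0; t < relatedTables; t++) {
            FakeRelatedTable table = new FakeRelatedTable(listings);
            table.findAll(ids(listings));
            total += table.queryCount;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("per-entity: " + countPerEntity(1000, 7));
        System.out.println("batched:    " + countBatched(1000, 7));
    }
}
```

With 1,000 listings and 7 relationships this counts 7,001 simulated queries for per-entity loading against 8 for batched loading, which matches the ~7,000 queries seen in the generated SQL. In EclipseLink, batching is typically requested per relationship with the @BatchFetch annotation or per query with the "eclipselink.batch" hint, for example query.setHint("eclipselink.batch", "listing.member"), where the relationship name member is an assumption, since the entity mapping is not shown here.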
The easiest way to determine potential issues with JPA is to enable finer logging. For EclipseLink, add a logging property to persistence.xml.
Be wary that the logging produced under the default settings for EclipseLink only displays the JPQL form of the queries.
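The exact property is not reproduced above. As an assumption, the relevant EclipseLink settings are most likely eclipselink.logging.level (set to FINE so the generated SQL is logged) and, optionally, eclipselink.logging.parameters (so bound values appear in the log). The sketch below builds the same settings as a Java map; the class name EclipseLinkLoggingProps is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: collects EclipseLink logging overrides in one place.
class EclipseLinkLoggingProps {
    // These entries mirror <property name="..." value="..."/> elements
    // that would otherwise go inside persistence.xml.
    static Map<String, String> fineLogging() {
        Map<String, String> props = new HashMap<>();
        props.put("eclipselink.logging.level", "FINE");      // log generated SQL
        props.put("eclipselink.logging.parameters", "true"); // show bound parameter values
        return props;
    }

    public static void main(String[] args) {
        // In a real application the map could be passed as the second argument to
        // Persistence.createEntityManagerFactory(unitName, props); here it is only printed.
        fineLogging().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With SQL logging enabled, each of the per-listing SELECT statements shows up in the log, which is how the query explosion described above becomes visible.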