I have a MySQL database with a PHP front-end. I would like to implement a search function but I have a somewhat unique situation and need some advice before proceeding.
My employer has a large collection of archival research materials. Some of the collections have metadata and data available in digital format. However, not all of the digital data may be made accessible via the internet, due to donor agreements or copyright issues. Where the digital data is not allowed on the internet, people may physically visit our building and view the information. Right now each digital collection has its own database, but we are in the process of consolidating everything into a single db so that patrons can search across all collections at the same time.
It is my understanding that if I use Solr to index and search, the info is transmitted over HTTP between the Solr instance running on Tomcat and the client, and that this could potentially expose data that is not supposed to be public. To avoid this problem, I thought it might be better to use Lucene directly on the server to generate the index and then somehow access it from PHP on the same server. My questions are: (1) does my assessment of the situation sound correct, and (2) if not, how does it actually work? I do know Java. Thank you.
Transmitting data over HTTP and making that data publicly accessible are two entirely different concerns. The "client" of Solr is your PHP application, not the patron's browser: the browser talks to PHP, and PHP queries Solr on the server side. You can have a Solr server running on a physically different machine halfway across the globe and still configure it so that it can only be accessed from one particular machine which you explicitly allow. More commonly, you run Solr on the same machine as your application and configure the server’s firewall/port/routing settings so that only your application is publicly reachable, not the Solr server. So this is not a reason to choose Lucene over Solr; it is just a matter of configuration.
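Since you know Java, here is a rough sketch of the underlying idea rather than actual Solr configuration. A server that binds its listening socket to the loopback address only accepts connections that originate on the same machine, so HTTP traffic to it never leaves the box. The class and method names are made up for illustration, and port 0 (any free port) is used only so the demo does not collide with a running service. In a real deployment you would set this in the servlet container rather than in code; in Tomcat, for example, the HTTP Connector in server.xml takes an `address` attribute for this purpose, and a firewall rule blocking the Solr port from outside achieves the same effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustrative only: LoopbackOnlyDemo and its methods are hypothetical names,
// not part of Solr, Lucene, or Tomcat.
public class LoopbackOnlyDemo {

    // Opens a listening socket bound to the loopback interface (127.0.0.1 or ::1).
    // Connections from other machines cannot reach a socket bound this way.
    static ServerSocket openLoopbackOnly(int port) {
        try {
            // Arguments: port, connection backlog, address to bind to.
            return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reports whether the socket is bound to a loopback address.
    static boolean isLoopbackBound(ServerSocket socket) {
        return socket.getInetAddress().isLoopbackAddress();
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket socket = openLoopbackOnly(0)) {
            System.out.println("Listening on " + socket.getInetAddress()
                    + ":" + socket.getLocalPort());
            System.out.println("Loopback only: " + isLoopbackBound(socket));
        }
    }
}
```

With the search server bound like this, your PHP code on the same machine can still query it over HTTP at localhost, while a patron's browser can only ever reach the PHP front-end, which decides what to show.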