The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 15, 2026 (2026-06-15T09:34:50+00:00)

I have a MySQL DB on Amazon’s RDS (version 5.5.x). I have confirmed that the DB is configured for UTF-8. I can insert and select Chinese characters using PHP, and they look fine.

I have SOLR running on Tomcat6 (the stock Tomcat6 that ships with Ubuntu 10.04 LTS).

The issue now is plugging SOLR 4.0 into it using MySQL’s own JDBC driver.

I’ll provide the relevant configs below. The problem I’m trying to solve: once I index my DB table into SOLR, the ASCII chars are fine (of course), but the Unicode characters, such as Chinese, show up as gibberish. I cannot query on those chars either (but querying on English words works great!).
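On the querying side only, one setting that may be worth ruling out is Tomcat's URI decoding. Tomcat 6 decodes request URIs as ISO-8859-1 unless told otherwise, so non-ASCII query terms sent via GET can be mangled before SOLR sees them. The fragment below is a sketch of the HTTP connector in Tomcat's server.xml (typically under /etc/tomcat6 on Ubuntu); the port and timeout values are assumed defaults, not taken from this setup, and only the URIEncoding attribute is the point of the example. It would not by itself explain gibberish in the indexed data.

```xml
<!-- server.xml (Tomcat 6): sketch only; port, timeout and redirect values are assumed defaults -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

Tomcat needs a restart for the change to take effect.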

Has anyone encountered this issue?

Here’s my db conf xml file that is used for the DataImportHandler:

<dataConfig>
    <!-- JDBC connection to the MySQL instance; the URL requests UTF-8 for the connection and for result sets -->
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://db.host.com/db_name?useUnicode=yes&amp;characterEncoding=UTF-8&amp;characterSetResults=UTF-8"
                user="db_username"
                password="db_password"/>

    <document name="vspcm">
        <!-- The real query has been replaced with a placeholder -->
        <entity name="conversations"
                query="select query from mysql">
        </entity>
    </document>
</dataConfig>

I’ve played around with the parameters in the url attribute above, changing utf8 to UTF-8, etc., but nothing sticks. (I’ve replaced the real query above to hide my table structure, but I confirmed that the real one works; delta imports work fine.)

For schema.xml, I started from collection1/conf/schema.xml, removed all the unnecessary fields, and added my own. All text fields (which are varchars in MySQL) are set to text_general. Example field:

<field name="msg"       type="text_general"     indexed="true"  stored="true"  multiValued="true"/>

(The field above would have Unicode characters in it.)

The text_general field definition is:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Just to make sure that SOLR plays nice with Unicode chars, I imported the example utf8-example.xml file that ships with the SOLR 4 download into “collection1”, and it does indeed index the example chars. I added some Chinese to the XML file, imported it, queried against collection1, and got Chinese characters back.

So I suspect it is somewhere in the DataImportHandler.

Has anyone encountered this before and come up with a solution?

Thank you so much!
Chris

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 9:34 am

    Fixed it – was actually a DB issue.

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base