The Archive Base

Editorial Team
Asked: May 26, 2026

I’m using Solr 3.x with focus on German text, which works well.
Searching for umlauts (öäüß) also works well.

The problem is:
I received some archived text from the late 1980s, when most computers and software supported nothing beyond ASCII, so German umlauts were not available.
An alternative notation was used instead:

ae instead of ä
oe instead of ö
ue instead of ü
ss instead of ß

That means, the name Müller was saved as Mueller.
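
To make the notation concrete, here is a minimal sketch of the mapping in Java. The class name `UmlautNotation` and the method `toAscii` are hypothetical, and the uppercase replacements (Ae, Oe, Ue) are an assumption, since the list above only covers lowercase letters. It only illustrates how the archived text was written; it is not a suggestion to rewrite the source text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that illustrates the ASCII notation described above.
final class UmlautNotation {
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u00e4', "ae"); // a-umlaut
        REPLACEMENTS.put('\u00f6', "oe"); // o-umlaut
        REPLACEMENTS.put('\u00fc', "ue"); // u-umlaut
        REPLACEMENTS.put('\u00df', "ss"); // sharp s
        // Assumption: uppercase umlauts were written as capital letter + 'e'.
        REPLACEMENTS.put('\u00c4', "Ae");
        REPLACEMENTS.put('\u00d6', "Oe");
        REPLACEMENTS.put('\u00dc', "Ue");
    }

    // Replaces every umlaut and sharp s with its ASCII notation; other characters pass through.
    static String toAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("M\u00fcller"));
    }
}
```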

Back to Solr: I now need to find documents that contain ue even if the user searched for ü.

Example: if I search for all text messages from a person called Müller,
Solr has to find text containing Mueller as well as Müller.

How can I handle this?

Is this the right feature for the job? http://wiki.apache.org/solr/UnicodeCollation (I’m not sure I understand the documentation completely.)

By the way, changing the source text by “search and replace” (for example, all oe to ö) is not an option.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 6:36 am

    As Paige Cook already pointed out, you have found the relevant documentation. Since not every Solr user knows Java, I decided to write my own answer with a little more detail.

    The first step is to add the filter to your field definition:

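    <!-- The name is only an example; a fieldType needs a name and a class in schema.xml -->
    <fieldType name="collatedCUSTOM" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- BEGIN OF IMPORTANT PART -->
        <filter class="solr.CollationKeyFilterFactory"
            custom="customRules.dat"
            strength="primary"
        />
        <!-- END OF IMPORTANT PART -->
      </analyzer>
    </fieldType>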
    

    The next step is to create the necessary customRules.dat file:

    To follow the documentation you have to write a tiny Java program. For non-Java programmers this is a little difficult, because the code snippet in the documentation shows only the important parts. It also uses a third-party library that is not distributed with the JDK (Apache Commons IO).

    Here is the full Java 7 code needed to write a customRules.dat file without any external libraries:

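    import java.io.FileOutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.text.Collator;
    import java.text.RuleBasedCollator;
    import java.util.Locale;

    public class RulesWriter {
        public static void main(String[] args) throws Exception {
            // Start from the JDK's collation rules for German.
            RuleBasedCollator baseCollator = (RuleBasedCollator)
                    Collator.getInstance(new Locale("de", "DE"));

            // DIN 5007-2 tailorings: each umlaut, written in decomposed form
            // (letter + U+0308), is placed next to ae/oe/ue with only a minor difference.
            String DIN5007_2_tailorings =
              "& ae , a\u0308 & AE , A\u0308"+
              "& oe , o\u0308 & OE , O\u0308"+
              "& ue , u\u0308 & UE , U\u0308";

            RuleBasedCollator tailoredCollator = new RuleBasedCollator(
                    baseCollator.getRules() + DIN5007_2_tailorings);
            String tailoredRules = tailoredCollator.getRules();

            // Write the rules as UTF-8; adjust the output path to your system.
            try (Writer fw = new OutputStreamWriter(
                    new FileOutputStream("c:/customRules.dat"), "UTF-8")) {
                fw.write(tailoredRules);
            }
        }
    }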
    

    Disclaimer: The above code compiles and creates a customRules.dat file, but I didn’t actually test the created file with Solr.
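
    Short of a full Solr test, the rules can at least be checked in plain Java. The sketch below is an illustration, not part of the Solr documentation: the class name `TailoringCheck` and its methods are hypothetical. It builds a collator from the DIN 5007-2 tailorings used above (with an uppercase U followed by U+0308 in the `UE` rule), sets the strength to primary as in the filter definition, and compares two spellings. It says nothing about how your Solr schema behaves.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper for checking the tailored rules outside of Solr.
final class TailoringCheck {
    // Builds a German collator extended with the DIN 5007-2 tailorings.
    static RuleBasedCollator tailoredCollator() throws ParseException {
        RuleBasedCollator base = (RuleBasedCollator)
                Collator.getInstance(new Locale("de", "DE"));
        String tailorings =
              "& ae , a\u0308 & AE , A\u0308"
            + "& oe , o\u0308 & OE , O\u0308"
            + "& ue , u\u0308 & UE , U\u0308";
        RuleBasedCollator tailored =
                new RuleBasedCollator(base.getRules() + tailorings);
        // Primary strength, matching strength="primary" in the field definition.
        tailored.setStrength(Collator.PRIMARY);
        return tailored;
    }

    // True if both strings compare as equal under the tailored collator.
    static boolean sameAtPrimaryStrength(String a, String b) throws ParseException {
        return tailoredCollator().compare(a, b) == 0;
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(sameAtPrimaryStrength("M\u00fcller", "Mueller"));
        System.out.println(sameAtPrimaryStrength("M\u00fcller", "Meier"));
    }
}
```

    If the first comparison reports a match and the second does not, the rules behave as intended at the Java level; whether Solr then finds both spellings still depends on the tokenizer and on using the same field type at index and query time.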
