I’m using Solr 3.x with focus on German text, which works well. Searching for


Asked: May 26, 2026 (2026-05-26T06:36:11+00:00)


I’m using Solr 3.x with focus on German text, which works well.
Searching for umlauts (öäüß) also works well.

The problem is:
I received some archived text from the late 80s, when most computers and software supported nothing beyond ASCII, so German umlauts were not available.
An alternative notation was used instead:

ae instead of ä
oe instead of ö
ue instead of ü
ss instead of ß

That means, the name Müller was saved as Mueller.
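
To make the notation above concrete, here is a minimal Java sketch that folds umlauts into the archive spelling, so that Müller and Mueller end up as the same string. The class name UmlautFolding and the method fold are made up for illustration and are not part of Solr. The uppercase mappings (Ä to Ae and so on) are an assumption that goes beyond the list above. It only illustrates the transcription itself, not a Solr configuration.

```java
class UmlautFolding {

    // Hypothetical helper: rewrites umlauts and sharp s in the ASCII
    // notation used by the archived text. Uppercase handling is an assumption.
    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break; // a-umlaut
                case '\u00f6': out.append("oe"); break; // o-umlaut
                case '\u00fc': out.append("ue"); break; // u-umlaut
                case '\u00df': out.append("ss"); break; // sharp s
                case '\u00c4': out.append("Ae"); break; // A-umlaut (assumed)
                case '\u00d6': out.append("Oe"); break; // O-umlaut (assumed)
                case '\u00dc': out.append("Ue"); break; // U-umlaut (assumed)
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both spellings fold to the same ASCII form.
        System.out.println(fold("M\u00fcller")); // prints Mueller
        System.out.println(fold("Mueller"));     // prints Mueller
    }
}
```

If both the indexed text and the query went through the same folding, the two spellings would match without touching the stored source text.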

Back to Solr: I now need to find documents that contain ue even if the user searched for ü.

Example: If I search for all text messages from a person called Müller,
Solr has to find text containing Mueller as well as Müller.

How can I handle this?

Is this the right feature for that? http://wiki.apache.org/solr/UnicodeCollation (I’m not sure I understand the documentation completely.)

By the way, changing the source text with a “search and replace” of all oe to ö is not an option.


1 Answer

Editorial Team · Answer 1 · 2026-05-26T06:36:12+00:00

As Paige Cook already pointed out, you have found the relevant documentation. Since not every Solr user knows Java, I decided to write my own answer with a little more detail.

The first step is to add the filter to your field definition:

<fieldType>
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- BEGIN OF IMPORTANT PART -->
    <filter class="solr.CollationKeyFilterFactory"
        custom="customRules.dat"
        strength="primary"
    />
    <!-- END OF IMPORTANT PART -->
  </analyzer>
</fieldType>

The next step is to create the necessary customRules.dat file.

To follow the documentation you have to write a tiny Java program. For non-Java programmers this is a little difficult, since the code snippet there only shows the important parts. It also uses a third-party library that is not distributed with the JDK (Apache Commons IO).

Here’s the full Java 7 code necessary to write a customRules.dat without external libraries:

import java.io.*;
import java.text.*;
import java.util.*;

public class RulesWriter {
    public static void main(String[] args) throws Exception {
        RuleBasedCollator baseCollator = (RuleBasedCollator) 
                Collator.getInstance(new Locale("de", "DE"));

        // Tailorings: treat ae/oe/ue like the umlauts, in both cases.
        String DIN5007_2_tailorings =
          "& ae , a\u0308 & AE , A\u0308"+
          "& oe , o\u0308 & OE , O\u0308"+
          "& ue , u\u0308 & UE , U\u0308";

        RuleBasedCollator tailoredCollator = new RuleBasedCollator(
                baseCollator.getRules() + DIN5007_2_tailorings);
        String tailoredRules = tailoredCollator.getRules();

        Writer fw = new OutputStreamWriter(
                new FileOutputStream("c:/customRules.dat"), "UTF-8");
        fw.write(tailoredRules);
        fw.flush();
        fw.close();
    }
}

Disclaimer: The above code compiles and creates a customRules.dat file, but I didn’t actually test the created file with Solr.
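
Since the file itself is untested, one way to sanity-check the tailoring outside of Solr is to build the same collator in plain Java and compare a few strings at primary strength. This is only a sketch: the class name CollationCheck and its helper methods are made up for illustration, the uppercase rule uses U\u0308, and setting canonical decomposition explicitly is my assumption about how precomposed umlauts should be handled. If the tailoring behaves the way the documentation describes, the first comparison should report true, but I have not verified that here.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

class CollationCheck {

    // Builds the German collator plus the ae/oe/ue tailorings from the answer.
    static RuleBasedCollator buildTailored() throws ParseException {
        RuleBasedCollator base =
                (RuleBasedCollator) Collator.getInstance(Locale.GERMANY);
        String tailorings =
              "& ae , a\u0308 & AE , A\u0308"
            + "& oe , o\u0308 & OE , O\u0308"
            + "& ue , u\u0308 & UE , U\u0308";
        RuleBasedCollator tailored =
                new RuleBasedCollator(base.getRules() + tailorings);
        // Primary strength ignores accent and case differences.
        tailored.setStrength(Collator.PRIMARY);
        // Assumption: decompose precomposed umlauts so the rules can match them.
        tailored.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return tailored;
    }

    // True when the collator considers both strings equal.
    static boolean sameKey(RuleBasedCollator collator, String a, String b) {
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) throws ParseException {
        RuleBasedCollator collator = buildTailored();
        System.out.println("Umlaut vs. ue spelling equal: "
                + sameKey(collator, "M\u00fcller", "Mueller"));
        System.out.println("Mueller vs. Meier equal: "
                + sameKey(collator, "Mueller", "Meier"));
    }
}
```

Checking this way only covers the collator itself; whether Solr picks up customRules.dat correctly still has to be tested against a running instance.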
