I’m working on a system which allows imported files to be localized into other

Question

Asked: June 13, 2026 (2026-06-13T06:35:46+00:00)

I’m working on a system which allows imported files to be localized into other languages.

This is mostly a private project to get the hang of MVC3, Entity Framework, LINQ, et cetera. Because of that, I like doing some crazy things to spice up the end result. One of those things would be the recognition of similar strings.

Imagine you have the following list of strings – borrowed from a game I’ve worked with in the past:

Megabeth: Holy Roller Uniform – Includes Head, Torso, and Legs
Megabeth: Holy Roller Uniform Head
Megabeth: Holy Roller Uniform Legs
Megabeth: Holy Roller Uniform Torso
Megabeth: PAX East 2012 Uniform – Includes Head, Torso, and Legs
Megabeth: PAX East 2012 Uniform Head
Megabeth: PAX East 2012 Uniform Legs
Megabeth: PAX East 2012 Uniform Torso

As you can see, once users have translated the first 4 strings, the following 4 share a lot of text with them, in this case:

Megabeth
Uniform
Includes Head, Torso, and Legs
Head
Legs
Torso

Suppose the first 4 strings are indeed already translated. When a user selects the 5th string from the list, what kind of algorithm or technique can I use to show the user the 1st string (and potentially others) under a sub-header of “Similar strings”?

Edit – A little comment on the Levenshtein Distance:
I’m currently targeting 10k strings in the database. Levenshtein Distance compares one string against another, so in this case there are 10k × (10k − 1) possible combinations. How would I approach this in a feasible way? Is there a better solution than this particular algorithm?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T06:35:46+00:00

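You could look into the Levenshtein Distance, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. Pairs whose distance falls below a chosen threshold can be considered similar. Two strings that are identical will have a distance of zero.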

There’s a C# implementation, amongst other languages, on Rosetta Code.
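To make the threshold idea concrete, here is a minimal sketch in Python (not the Rosetta Code version, and not C#). The names `levenshtein` and `find_similar`, the `max_ratio` parameter, and its default of 0.5 are assumptions made for illustration. The distance is divided by the longer string's length so that one threshold can work for short and long strings alike. On the scale concern from the edit: when a user selects one string, only that string needs to be compared against the already translated ones, so a lookup costs roughly 10k distance computations rather than all pairs.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # iterate the longer string in the outer loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def find_similar(selected, translated, max_ratio=0.5):
    """Return strings from `translated` whose normalized distance to
    `selected` is at most `max_ratio` (an assumed cut-off), closest first."""
    scored = []
    for candidate in translated:
        longest = max(len(selected), len(candidate)) or 1  # avoid dividing by zero
        ratio = levenshtein(selected, candidate) / longest
        if ratio <= max_ratio:
            scored.append((ratio, candidate))
    scored.sort()
    return [candidate for _, candidate in scored]


already_translated = [
    "Megabeth: Holy Roller Uniform – Includes Head, Torso, and Legs",
    "Megabeth: Holy Roller Uniform Head",
    "Megabeth: Holy Roller Uniform Legs",
    "Megabeth: Holy Roller Uniform Torso",
]
similar = find_similar("Megabeth: PAX East 2012 Uniform Head", already_translated)
```

The right threshold depends on your data, so it is worth trying a few values against real strings before settling on one.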
