Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7645839
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 31, 20262026-05-31T10:02:59+00:00 2026-05-31T10:02:59+00:00

I’m doing a lot of natural language processing with a bit unsusual requirements. Often

  • 0

I’m doing a lot of natural language processing with a bit unsusual requirements. Often I get tasks similar to lemmatization – given a word (or just piece of text) I need to find some patterns and transform the word somehow. For example, I may need to correct misspellings, e.g. given word “eatin” I need to transform it to “eating”. Or I may need to transform words “ahahaha”, “ahahahaha”, etc. to just “ahaha” and so on.

So I’m looking for some generic tool that allows to define transormation rules for such cases. Rules may look something like this:

 {w}in   ->  {w}ing
 aha(ha)+  ->  ahaha

That is I need to be able to use captured patterns from the left side on the right side.

I work with linguists who don’t know programming at all, so ideally this tool should use external files and simple language for rules.

I’m doing this project in Clojure, so ideally this tool should be a library for one of JVM languages (Java, Scala, Clojure), but other languages or command line tools are ok too.

There are several very cool NLP projects, including GATE, Stanford CoreNLP, NLTK and others, and I’m not expert in all of them, so I could miss the tool I need there. If so, please let me know.

Note, that I’m working with several languages and perform very different tasks, so concrete lemmatizers, stemmers, misspelling correctors and so on for concrete languages do not fit my needs – I really need more generic tool.

UPD. It seems like I need to give some more details/examples of what I need.

Basically, I need a function for replacing text by some kind of regex (similar to Java’s String.replaceAll()) but with possibility to use caught text in replacement string. For example, in real world text people often repeat characters to make emphasis on particular word, e.g. someoone may write “This film is soooo boooring…”. I need to be able to replace these repetitive “oooo” with only single character. So there may be a rule like this (in syntax similar to what I used earlier in this post):

{chars1}<char>+{chars2}?  ->  {chars1}<char>{chars2}

that is, replace word starting with some chars (chars1), at least 3 chars and possibly ending with some other chars (chars2) with similar string, but with only a single . Key point here is that we catch on a left side of a rule and use it on a right side.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-31T10:03:00+00:00Added an answer on May 31, 2026 at 10:03 am

    I’ve found http://userguide.icu-project.org/transforms/general to be useful as well for some general pattern/transform tasks like this, ignore the stuff about transliteration, its nice for doing a lot of things.

    You can just load up rules from a file into a String and register them, etc.

    http://userguide.icu-project.org/transforms/general/rules

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

link Im having trouble converting the html entites into html characters, (&# 8217;) i
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
I am doing a simple coin flipping experiment for class that involves flipping a
I'm parsing an RSS feed that has an &#8217; in it. SimpleXML turns this
I am trying to understand how to use SyndicationItem to display feed which is
That's pretty much it. I'm using Nokogiri to scrape a web page what has
I have just tried to save a simple *.rtf file with some websites and
I want to count how many characters a certain string has in PHP, but
I would like to count the length of a string with PHP. The string
For some reason, after submitting a string like this Jack’s Spindle from a text

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.