I’m working on a Text Mining Solution with SQL and R. First I Import

Question

Editorial Team

Asked: May 24, 2026

I’m working on a text mining solution with SQL and R.

First I import data into R from my SQL selection, and then I do some data mining with it.

Here is what I have:

rawData = sqlQuery(dwhConnect,sqlString) 
a = data.frame(rawData$ENNOTE_NEU)

If I run

a[[1]][1:3]

I see this structure:

[1] lorem ipsum li ld ee wö wo di dd
[2] la kdin di da dogs chicken
[3] kd good i need some help

Now I want to do some data cleaning with my own dictionary.
For example, I want to replace li with lorem ipsum, and both kd and kdin with kunde.

My problem is how to do that for the whole data frame.

 for(i in 1:(nrow(a)))
    {
        a[[1]][i]=gsub( " kd | kdin " , " kunde " ,a[[1]][i])
        a[[1]][i]=gsub( " li " , " lorem ipsum " ,a[[1]][i])
...
    }

This works, but it is slow for a lot of data.

Is there a better way to do that?

Cheers, The Captain

1 Answer

Editorial Team · Answer 1 · 2026-05-24T01:36:11+00:00

gsub is vectorised, so you don’t need the loop.

a[[1]] <- gsub( " kd | kdin " , " kunde " , a[[1]])

is quicker.
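To extend this to a whole dictionary, one option is to loop over the dictionary entries instead of the rows, so that each gsub call still processes the full column at once. This is a minimal sketch: the sample data frame and the `dict` vector are made up for illustration and stand in for the real `rawData$ENNOTE_NEU` column and your own dictionary.

```r
# Hypothetical sample data standing in for rawData$ENNOTE_NEU
a <- data.frame(note = c("lorem ipsum li ld ee wö wo di dd",
                         "la kdin di da dogs chicken",
                         "kd good i need some help"),
                stringsAsFactors = FALSE)

# Hypothetical dictionary: names are regex patterns, values are replacements
dict <- c(" kd | kdin " = " kunde ",
          " li "        = " lorem ipsum ")

# One vectorised gsub call per dictionary entry, each covering every row
for (p in names(dict)) {
  a[[1]] <- gsub(p, dict[[p]], a[[1]])
}
```

The loop now runs once per dictionary entry rather than once per row, which is usually far fewer iterations. Note that the third row is left unchanged here, because its kd sits at the start of the string and has no leading space.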

Also, are you sure you want spaces inside your regexes? With the spaces, you won’t match words at the start or end of a string.
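If matching at the start or end of a string matters, one possible alternative is the word-boundary anchor `\b` (written `\\b` in an R string) in place of the literal spaces. The sketch below uses a made-up `notes` vector copied from the sample in the question; whether word boundaries suit your real data is an assumption you would need to check.

```r
# Hypothetical sample vector, taken from the structure shown in the question
notes <- c("lorem ipsum li ld ee wö wo di dd",
           "la kdin di da dogs chicken",
           "kd good i need some help")

# \\b matches at the edge of a word, so no surrounding spaces are needed
# and the replacement text does not have to re-insert them
notes <- gsub("\\b(kd|kdin)\\b", "kunde", notes)
notes <- gsub("\\bli\\b", "lorem ipsum", notes)
```

With this pattern the leading kd in the third string is replaced as well, while kd inside a longer word such as kdin is only matched as the whole word.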
