The Archive Base

Editorial Team
Asked: June 9, 2026

I want to prepare data for unsupervised learning with a random forest.
The procedure is as follows:

  • Take the data and add an attribute ‘class’ with value 1 to all examples
  • Generate synthetic data from the original data:
    • until you have as many synthetic examples as there are original examples, build examples:
      • sample a new attribute value from all values of that attribute in the original data
      • do that for all attributes and combine the values into a new example
  • assign the value 2 to the attribute ‘class’ of the synthetic data
  • bind both data sets together
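The procedure above can be sketched in base R as follows. This is a minimal sketch, not the asker's code: the function name `make_synthetic` is hypothetical, and it assumes the input data frame does not have a `class` column yet. Resampling each column separately with `lapply()` keeps every column as its own vector, so numeric columns stay numeric and factors keep their levels.

```r
# Hypothetical helper: returns the original rows (class 1) stacked on
# synthetic rows (class 2), as in the procedure described above.
make_synthetic <- function(dat) {
  # Resample one column with replacement. Indexing with sample.int() keeps
  # the column's type and avoids sample()'s special case for a single number.
  resample_col <- function(x) x[sample.int(length(x), replace = TRUE)]

  synth <- dat
  synth[] <- lapply(dat, resample_col)   # each attribute sampled independently

  dat$class   <- 1                       # original examples
  synth$class <- 2                       # synthetic examples
  rbind(dat, synth)                      # identical columns, so base rbind() works
}

set.seed(42)
toy <- data.frame(x = c(1.5, 2.5, 3.5), g = factor(c("a", "b", "a")))
out <- make_synthetic(toy)
str(out)
```

Because both halves share the same column names and types, plain `rbind()` is enough here; `smartbind()` is mainly useful when the column sets differ.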

At the end it looks like this:

        ...      Class
                |1
     Original   |1
     Data       |1
                |1
    --------------
                |2
     Synthetic  |2
     Data       |2
                |2

My R code looks like this:

library(gtools) #for smartbind()

sample1 <- function(X)   { sample(X, replace=T) } 
g1      <- function(dat) { apply(dat,2,sample1) }

data$class <- rep(1, times=nrow(data)) #add attribute 'class' with value 1

synthData<-data.frame(g1(data[,1:ncol(data)])) #generate synthetic data with sampling from data
synthData$class <- rep(2, times=nrow(synthData)) #attribute 'class' is 2
colnames(synthData) <- colnames(data)
newData <- smartbind(data, synthData) #bind the data together

It’s probably obvious that I’m really new to R, but it works, with one problem: the attribute types in the synthetic data are not the same as in the original data. Columns that are numeric in the original become factors. How can I preserve the original types while generating the synthetic data?

Thank you!

Data1 (nums become factors):

structure(list(V2 = c(1.51793, 1.51711, 1.51645, 1.51916, 1.51131
), V3 = c(13.21, 12.89, 13.44, 14.15, 13.69), V4 = c(3.48, 3.62,
3.61, 0, 3.2), V5 = c(1.41, 1.57, 1.54, 2.09, 1.81), V6 = c(72.64,
72.96, 72.39, 72.74, 72.81), V7 = c(0.59, 0.61, 0.66, 0, 1.76
), V8 = c(8.43, 8.11, 8.03, 10.88, 5.43), V9 = c(0, 0, 0, 0,
1.19), V10 = c(0, 0, 0, 0, 0), realClass = structure(c(1L, 2L,
2L, 5L, 6L), .Label = c("1", "2", "3", "5", "6", "7"), class = "factor")), .Names = c("V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "realClass"), row.names = c(27L,
138L, 77L, 183L, 186L), class = "data.frame")
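To see where the types change, you can look at what `apply()` returns for a few columns of Data1. The values below are copied from the dput above, keeping only three columns for brevity. `apply()` coerces the data frame to a matrix first, and because `realClass` is a factor the whole matrix becomes character, so the numeric columns come back as text too.

```r
# Three columns of Data1 from the question (V2, V3 numeric; realClass a factor).
d1 <- data.frame(
  V2 = c(1.51793, 1.51711, 1.51645, 1.51916, 1.51131),
  V3 = c(13.21, 12.89, 13.44, 14.15, 13.69),
  realClass = factor(c("1", "2", "2", "6", "7"),
                     levels = c("1", "2", "3", "5", "6", "7"))
)

sapply(d1, class)      # numeric, numeric, factor

m <- apply(d1, 2, sample, replace = TRUE)
class(m)               # a matrix, not a data frame
typeof(m)              # "character": the factor column forced everything to text
```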

Data2 (factors become chrs):

structure(list(realClass = structure(c(2L, 2L, 2L, 1L, 2L), .Label = c("e",
"p"), class = "factor"), V2 = structure(c(6L, 3L, 4L, 6L, 6L), .Label = c("b",
"c", "f", "k", "s", "x"), class = "factor"), V3 = structure(c(4L,
4L, 3L, 1L, 1L), .Label = c("f", "g", "s", "y"), class = "factor"),
V4 = structure(c(5L, 5L, 5L, 3L, 4L), .Label = c("b", "c",
"e", "g", "n", "p", "r", "u", "w", "y"), class = "factor"),
V5 = structure(c(1L, 1L, 1L, 2L, 1L), .Label = c("f", "t"
), class = "factor"), V6 = structure(c(3L, 9L, 3L, 6L, 3L
), .Label = c("a", "c", "f", "l", "m", "n", "p", "s", "y"
), class = "factor"), V7 = structure(c(2L, 2L, 2L, 2L, 2L
), .Label = c("a", "f"), class = "factor"), V8 = structure(c(1L,
1L, 1L, 1L, 1L), .Label = c("c", "w"), class = "factor"),
V9 = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("b", "n"
), class = "factor"), V10 = structure(c(1L, 1L, 1L, 10L,
4L), .Label = c("b", "e", "g", "h", "k", "n", "o", "p", "r",
"u", "w", "y"), class = "factor"), V11 = structure(c(2L,
2L, 2L, 2L, 1L), .Label = c("e", "t"), class = "factor"),
V12 = structure(c(NA, NA, NA, 1L, 1L), .Label = c("b", "c",
"e", "r"), class = "factor"), V13 = structure(c(3L, 2L, 3L,
3L, 2L), .Label = c("f", "k", "s", "y"), class = "factor"),
V14 = structure(c(3L, 3L, 2L, 3L, 2L), .Label = c("f", "k",
"s", "y"), class = "factor"), V15 = structure(c(7L, 8L, 7L,
4L, 7L), .Label = c("b", "c", "e", "g", "n", "o", "p", "w",
"y"), class = "factor"), V16 = structure(c(7L, 7L, 8L, 4L,
1L), .Label = c("b", "c", "e", "g", "n", "o", "p", "w", "y"
), class = "factor"), V17 = structure(c(1L, 1L, 1L, 1L, 1L
), .Label = "p", class = "factor"), V18 = structure(c(3L,
3L, 3L, 3L, 3L), .Label = c("n", "o", "w", "y"), class = "factor"),
V19 = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("n", "o",
"t"), class = "factor"), V20 = structure(c(1L, 1L, 1L, 5L,
3L), .Label = c("e", "f", "l", "n", "p"), class = "factor"),
V21 = structure(c(8L, 8L, 8L, 4L, 2L), .Label = c("b", "h",
"k", "n", "o", "r", "u", "w", "y"), class = "factor"), V22 = structure(c(5L,
5L, 5L, 5L, 6L), .Label = c("a", "c", "n", "s", "v", "y"), class = "factor"),
V23 = structure(c(3L, 3L, 5L, 1L, 2L), .Label = c("d", "g",
"l", "m", "p", "u", "w"), class = "factor")), .Names = c("realClass",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20",
"V21", "V22", "V23"), row.names = c(4105L, 6207L, 6696L, 2736L,
3756L), class = "data.frame")
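Data2 consists only of factors. Sampling a factor directly, rather than through `apply()`, returns a factor with the same level set, even for a column with missing values. A small check using `V12` from the dput above (the seed is arbitrary):

```r
# V12 of Data2: a factor with three missing values and four levels.
v12 <- factor(c(NA, NA, NA, "b", "b"), levels = c("b", "c", "e", "r"))

set.seed(1)
s <- v12[sample.int(length(v12), replace = TRUE)]

class(s)    # still "factor"
levels(s)   # the same four levels as the original column
```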


1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 2:25 am

    You can always use this trick to convert a factor column back to numeric:

    numcol <- as.numeric(as.character(factcol))
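The `as.character()` step matters: calling `as.numeric()` directly on a factor returns the internal level codes, not the printed values. A quick illustration with made-up values; the last line shows the equivalent `levels()`-based form, which converts each level only once:

```r
f <- factor(c("10", "2.5", "10", "7"))   # levels sort as "10", "2.5", "7"

as.numeric(f)                  # level codes: 1 2 1 3
as.numeric(as.character(f))    # actual values: 10.0 2.5 10.0 7.0
as.numeric(levels(f))[f]       # same values, indexing the converted levels by code
```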
    

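    But I suspect that you have a factor variable in your data.frame.
    apply() coerces the data frame to a matrix (and returns one), and a matrix holds a single type, so if even one column is a factor the whole matrix becomes character. When data.frame() converts that matrix back, every column, numeric ones included, becomes a factor (or stays character in R 4.0 and later, where stringsAsFactors defaults to FALSE).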

    Here is an example using a toy dataset:

    set.seed(123)
    toydat <- data.frame(A = 1:10, B = rnorm(10), C = LETTERS[1:10])
    str(toydat)
    
    ## 'data.frame':    10 obs. of  3 variables:
    ##  $ A: int  1 2 3 4 5 6 7 8 9 10
    ##  $ B: num  -0.5605 -0.2302 1.5587 0.0705 0.1293 ...
    ##  $ C: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
    
    set.seed(1)
    str(data.frame(apply(toydat[,1:2], 2, sample, replace = TRUE)))
    
    ## 'data.frame':    10 obs. of  2 variables:
    ##  $ A: num  3 4 6 10 3 9 10 7 7 1
    ##  $ B: num  1.5587 -0.2302 0.4609 0.0705 -1.2651 ...
    
    # with the factor column C     
    set.seed(2)
    str(data.frame(apply(toydat[,1:3], 2, sample, replace = TRUE)))
    
    ## 'data.frame':    10 obs. of  3 variables:
    ##  $ A: Factor w/ 6 levels "10"," 2"," 5",..: 2 5 4 2 1 1 2 6 3 4
    ##  $ B: Factor w/ 8 levels " 0.129288","-0.230177",..: 8 7 6 2 1 5 3 7 1 4
    ##  $ C: Factor w/ 6 levels "B","D","E","G",..: 4 2 5 1 2 3 1 2 6 1
    

    This is where the plyr package becomes useful, since it lets you control the output type (with the **ply family of functions). In this case, though, the colwise function is sufficient:

    library(plyr)
    set.seed(2)
    mysamplingfun <- colwise(function(x) sample(x, replace = TRUE))
    str(mysamplingfun(toydat[,1:3]))
    
    ## 'data.frame':    10 obs. of  3 variables:
    ##  $ A: int  2 8 6 2 10 10 2 9 5 6
    ##  $ B: num  1.715 1.559 -1.265 -0.23 0.129 ...
    ##  $ C: Factor w/ 10 levels "A","B","C","D",..: 7 4 9 2 4 5 2 4 10 2
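If you would rather not depend on plyr, base R can do the same column-wise resampling. `lapply()` over a data frame works column by column, and assigning the result back with `df[] <-` keeps the data-frame structure and each column's type. This is a sketch on the same toy data; `C` is wrapped in `factor()` explicitly because since R 4.0 `data.frame()` no longer converts strings to factors by default.

```r
set.seed(123)
toydat <- data.frame(A = 1:10, B = rnorm(10), C = factor(LETTERS[1:10]))

synth <- toydat
# Index with sample.int() rather than calling sample(x), so a column of length
# one is not mistaken for sample()'s "sample from 1:x" special case.
synth[] <- lapply(toydat, function(x) x[sample.int(length(x), replace = TRUE)])

str(synth)   # A stays int, B stays num, C stays a factor with 10 levels
```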
    

Related Questions

In my UITableViewController subclass I want to prepare the tabular data and reload the
I want to delete all data in an existing sqlite DB, this is the
I want to prepare some data after user login system. After some google, I
To speed up my application I want to prepare some data before DOM is
I want to prepare a text for the use in a LaTeX document. I
I want to use ADO.NET Prepare command will increase performance if query is repeatevely
I'm creating a website and I want it also to be prepare for mobile
HI, I want to use FOP to prepare a XML document for printing (ps/pdf).
want to know why String behaves like value type while using ==. String s1
i want to make a method async public static void PrepareData<T>() { // prepare

© 2021 The Archive Base. All Rights Reserved