
The Archive Base


Editorial Team
Asked: June 12, 2026


I need to create unique combinations while allowing some mismatches. Here is an example:

set.seed(1234)
dataf <- data.frame(var1 = sample(c("A", "B", "-"), 20, replace = TRUE),
                    var2 = sample(c("A"), 20, replace = TRUE),
                    var3 = sample(c("B", "B", "B", "-"), 20, replace = TRUE),
                    var4 = sample(c("A", "A", "A", "-"), 20, replace = TRUE),
                    var5 = sample(c("A", "B", "A", "A", "-"), 20, replace = TRUE))
dataf

Rules:

(1) Generate unique combinations:

A  B  A  B  B   - combination 1
A  A  A  B  B   - combination 2
B  B  B  A  A   - combination 3
and so on ...

(2) Allow one (or, in general, n) mismatch when forming a category. For example:

A  B  A  B  B
A  A  A  B  B
B  A  A  B  B
B  A  B  B  B
B  A  A  B  A

are considered the same combination: each row differs from another row in the list by a single mismatch, at a different variable each time.
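Read literally, rule (2) links these rows through a chain of single mismatches; it does not require every pair to differ by at most one. A quick check in R makes the pairwise counts explicit. Encoding each row as a five-letter string is only an illustrative choice, and the final single-linkage step is one possible way to express "chained" matching, not something the question prescribes:

```r
# The five example rows from rule (2), one string per row (illustrative encoding)
rows <- c("ABABB", "AAABB", "BAABB", "BABBB", "BAABA")
m <- do.call(rbind, strsplit(rows, ""))

# Number of variables at which rows i and j differ
n.mismatch <- function(i, j) sum(m[i, ] != m[j, ])
idx <- seq_len(nrow(m))
d <- outer(idx, idx, Vectorize(n.mismatch))
print(d)

# Single linkage cut at height 1: rows joined by a chain of one-mismatch steps
grp <- cutree(hclust(as.dist(d), method = "single"), h = 1)
print(grp)
```

Rows 1 and 3 already differ at two variables (var1 and var2), so putting all five rows in one category only works if rows may be connected through intermediate rows, which is what single linkage at height 1 does here.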

(3) “-” indicates a missing value. In matching it is treated like any other value, so a single “-” counts as the one allowed mismatch:

A  B  A  B  B
A  -  A  B  B
A  B  A  -  B

However, if a row has two missing values, its combination is declared unknown (-):

A  B  A  B  B
A  -  A  -  B
A  B  A  -  -
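Rule (3) can be expressed as a small R function. This is a sketch under the reading stated above: `mismatches` is a hypothetical helper, the reference row is taken from the examples, and an unknown combination is returned as NA:

```r
# Hypothetical helper: mismatches between `row` and a reference row `ref`.
# A single "-" counts as one ordinary mismatch; two or more "-" make the
# combination unknown, returned here as NA.
mismatches <- function(row, ref) {
  if (sum(row == "-") >= 2) return(NA_integer_)
  sum(row != ref)
}

ref <- c("A", "B", "A", "B", "B")
mismatches(c("A", "-", "A", "B", "B"), ref)  # 1
mismatches(c("A", "B", "A", "-", "B"), ref)  # 1
mismatches(c("A", "-", "A", "-", "B"), ref)  # NA (unknown)
mismatches(c("A", "B", "A", "-", "-"), ref)  # NA (unknown)
```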

The following is the expected result for the above data:

   var1 var2 var3 var4 var5 comb
1     A    A    B    -    -    -
2     B    A    B    A    A    1
3     B    A    B    A    A    1
4     B    A    B    A    A    1
5     -    A    B    A    A    1
6     B    A    B    A    -    1
7     A    A    B    A    B    2
8     A    A    B    A    B    2
9     B    A    B    A    A    1
10    B    A    -    A    -    -
11    -    A    B    A    A    1
12    B    A    B    -    -    -
13    A    A    B    A    A    2
14    -    A    B    -    A    -
15    A    A    B    A    A    2
16    -    A    B    A    A    2
17    A    A    B    A    B    2
18    A    A    -    A    A    3
19    A    A    B    A    B    2
20    A    A    -    A    A    3

Any ideas?


1 Answer

    Editorial Team added an answer on June 12, 2026 at 4:41 pm

    Here is how I would do it. The idea is to create a distance matrix and then cluster the data into groups of rows that all have zero distance between them.

    First, let’s remove (temporarily) the rows that have two or more dashes:

    # TRUE for rows with two or more dashes (combination unknown)
    two.dashes <- apply(dataf, 1, function(x) sum(x == '-') >= 2)
    subdata <- dataf[!two.dashes, ]
    

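    Next, compute a distance matrix. The distance between two rows is the number of variables at which they differ, ignoring any variable where either row has a dash: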

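    # Number of variables at which rows i and j differ,
    # ignoring positions where either row has a dash
    mydist.fun <- function(i, j, x = subdata) {
       row.i <- x[i, ]
       row.j <- x[j, ]
       idx   <- row.i != '-' & row.j != '-'
       sum(row.i[idx] != row.j[idx])
    }
    rows.idx  <- seq_len(nrow(subdata))
    # Evaluate mydist.fun for every pair of rows and convert to a "dist" object
    rows.dist <- as.dist(outer(rows.idx, rows.idx, Vectorize(mydist.fun)))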
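The same distance can also be built one column at a time instead of once per pair of rows, which may help on larger data. A minimal sketch; `mismatch.dist` is a hypothetical helper name, not part of the answer, and the three toy rows are made up for the check:

```r
# Hypothetical helper: pairwise mismatch counts between the rows of a
# character matrix, ignoring any position where either row has "-".
mismatch.dist <- function(m) {
  m <- as.matrix(m)
  d <- matrix(0L, nrow(m), nrow(m))
  for (k in seq_len(ncol(m))) {
    v <- m[, k]
    known <- v != "-"
    # TRUE where both rows have a known value at column k and the values differ
    d <- d + (outer(v, v, "!=") & outer(known, known, "&"))
  }
  as.dist(d)
}

m <- rbind(c("B", "A", "B", "A", "A"),
           c("-", "A", "B", "A", "A"),
           c("A", "A", "B", "A", "B"))
mismatch.dist(m)
```

Applied to the question's data, mismatch.dist(subdata) should give the same distances as rows.dist, since both count differing non-dash values; the loop runs over the five columns rather than over all pairs of rows.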
    

    Then, use clustering to group the data. I am using complete-linkage hierarchical clustering (the default method of hclust) and cutting the tree at height = 0, i.e., it creates groups of rows that all have a distance of zero between them.

    hc <- hclust(rows.dist)
    members <- cutree(hc, h = 0)
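The cut height is what decides how many mismatches are tolerated inside a group: with complete linkage, cutting at h = n keeps rows together only if every pair in the group differs at no more than n non-dash variables. A small illustration on a made-up distance matrix (the values are hypothetical):

```r
# Toy distance matrix between four rows (hypothetical values)
d <- as.dist(matrix(c(0, 0, 1, 2,
                      0, 0, 1, 2,
                      1, 1, 0, 2,
                      2, 2, 2, 0), nrow = 4))
hc.toy <- hclust(d)  # method = "complete" is the default
cutree(hc.toy, h = 0)  # 1 1 2 3: only zero-distance rows share a group
cutree(hc.toy, h = 1)  # 1 1 1 2: every pair in a group differs by at most 1
```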
    

    Let’s put everything together:

    comb <- rep('-', nrow(dataf))
    comb[!two.dashes] <- members
    dataf$comb <- comb
    dataf
    #    var1 var2 var3 var4 var5 comb
    # 1     A    A    B    -    -    -
    # 2     B    A    B    A    A    1
    # 3     B    A    B    A    A    1
    # 4     B    A    B    A    A    1
    # 5     -    A    B    A    A    1
    # 6     B    A    B    A    -    1
    # 7     A    A    B    A    B    2
    # 8     A    A    B    A    B    2
    # 9     B    A    B    A    A    1
    # 10    B    A    -    A    -    -
    # 11    -    A    B    A    A    1
    # 12    B    A    B    -    -    -
    # 13    A    A    B    A    A    3
    # 14    -    A    B    -    A    -
    # 15    A    A    B    A    A    3
    # 16    -    A    B    A    A    1
    # 17    A    A    B    A    B    2
    # 18    A    A    -    A    A    3
    # 19    A    A    B    A    B    2
    # 20    A    A    -    A    A    3
    

    This exposes contradictions in your expected output. For example, rows 7 and 13 should not belong to the same group, since they differ at var5. Also, some rows with a single dash could go to more than one group, e.g., row 16.
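To find rows like 16 automatically, one option is to check, for each row, whether its zero-distance neighbours fall in more than one group. A sketch with a hypothetical helper, ambiguous.rows, shown on three toy rows (B A B A A, A A B A A and - A B A A) whose distances are written out by hand, with group labels as a clustering step might assign them:

```r
# Hypothetical helper: indices of rows whose zero-distance neighbours
# (other than the row itself) belong to more than one group.
ambiguous.rows <- function(dist.mat, groups) {
  d <- as.matrix(dist.mat)
  which(sapply(seq_len(nrow(d)), function(i) {
    zero.nbrs <- setdiff(which(d[i, ] == 0), i)
    length(unique(groups[zero.nbrs])) > 1
  }))
}

# Distances between "B A B A A", "A A B A A" and "- A B A A",
# ignoring dashes: rows 1 and 2 differ at var1, row 3 matches both
d <- matrix(c(0, 1, 0,
              1, 0, 0,
              0, 0, 0), nrow = 3)
ambiguous.rows(d, groups = c(1, 2, 1))  # 3
```

With the answer's objects, a call such as ambiguous.rows(rows.dist, members) would return positions within subdata; rownames(subdata) maps them back to the original row numbers.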



Related Questions

i have a table e.g: CREATE TABLE places (id UNIQUE, name, latitude, longitude) i
I have tried with ado.net create table columnName with unique name. as uniquename I
I have three enums, which - in combination - should identify a unique state
I have a table schema similar to the following (simplified): CREATE TABLE Transactions (
I have the following data frame 'x' id,item,volume a,c1,2 a,c2,3 a,c3,2 a,c4,1 a,c5,4 b,c6,6
I have a table called tbl_jobs that stores the meta data of some background
I have a join table with the following structure: CREATE TABLE adjectives_friends ( adjective_id
I have this SQL table: CREATE TABLE DATA ( ID NUMBER NOT NULL, CODE
I have the following tables in MySQL server: Companies: - UID (unique) - NAME
I have Log and LogItem tables; I'm writing a query to grab some data
