Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 649661
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 13, 20262026-05-13T21:58:06+00:00 2026-05-13T21:58:06+00:00

After my last, failed, attempt at asking a question here I’m trying a more

  • 0

After my last, failed, attempt at asking a question here I’m trying a more precise question this time:

What I have:

  1. A huge dataset (finite, but I wasted days of multi-core processing time to compute it before…) of ISet<Point>.
  2. A list of input values between 0 to 2n, n≤17

What I need:

3) A table of [1], [2] where I map every value of [2] to a value of [1]

The processing:

For this computation I have a formula, that takes a bit value (from [2]) and a set of positions (from [1]) and creates a new ISet<Point>. I need to find out which of the original set is equal to the resulting set (i.e. The “cell” in the table at “A7” might point to “B”).

The naive way:

Compute the new ISet<Point> and use .Contains(mySet) or something similar on the list of values from [1]. I did that in previous versions of this proof of concept/pet project and it was dead slow when I started feeding huge numbers. Yes, I used a profiler. No, this wasn’t the only slow part of the system, but I wasted a considerable amount of time in this naive lookup/mapping.

The question, finally:

Since I basically just need to remap to the input, I thought about creating a List of hashed values for the list of ISet<Point>, doing the same for my result from the processing and therefor avoiding to compare whole sets.

Is this a good idea? Would you call this premature optimization (I know that the naive way above is too slow, but should I implement something less clever first? Performance is really important here, think days of runtime again)? Any other suggestions to ease the burdon here or ideas what I should read up on?


Update: Sorry for not providing a better explanation or a sample right away.

Sample for [1] (Note: These are real possible datapoints, but obviously it’s limited) :

new List<ISet<Point>>() {
  new HashSet() {new Point(0,0) },
  new HashSet() {new Point(0,0), new Point(2,1) },
  new HashSet() {new Point(0,1), new Point(3,1) }
}

[2] is just a boolean vector of the length n. For n = 2 it’s

  • 0,0
  • 0,1
  • 1,0
  • 1,1

I can do that one by using an int or long, basically.

Now I have a function that takes an vector and an ISet<Point> and returns a new ISet<Point>. It’s not a 1:1 transformation: An set of 5 might result in a set of 11 or whatever. The resulting ISet<Point> is however guaranteed to be part of the input.

Using letters for a set of points and numbers for the bit vectors, I’m starting with this

  A  B  C  D  E  F
1
2
3
4
5
6
7

What I need to have at the end is

  A  B  C  D  E  F
1 -  C  A  E  -  -
2 B  C  E  F  A  -
3 ................
4 ................
5 ................
6 F  C  B  A  -  - 
7 E  -  C  A  -  D

There are several costly operations in these project, one is the preparation of the sets of point ([1]). But this question is about the matching now: I can easily (more or less, not that important now) compute a target ISet for a given bit vector and a source ISet. Now I need to match/find that in the original set.

The whole beast is going to be a state machine, where the set of points is a valid state. Later I don’t care about the individual states, I can actually refer to them by anything (a letter, an index, whatever). I just need to keep the associations:

1, B => C


Update: Eric asked if a HashSet would be possible. The answer is yes, but only if the dataset stays small enough. My question (hashing) is: Might it be possible/a good idea to employ a hashing algorithm for this hashset? My idea is this:

  • Walk the (lazily generated) list/sequence of ISet<Point> (I could change this type, I just want to stress that it is a mathematical set of points, no duplicates).

    • Create a simpler representation of the input (a hash?) and store it (in a hashset?)
    • Compute all target sets for this input, but only store again a simple representation (see above)
    • Discard the set
  • Fix up the mapping (equal hash = equal state)

Good idea? Problems with this? One that I could come up with is a collision (how probable is that?) – and I wouldn’t know a good hashing function to begin with..

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-13T21:58:07+00:00Added an answer on May 13, 2026 at 9:58 pm

    OK, I think I understand the problem at least now. Let me see if I can rephrase.

    Let’s start by leaving sets out of it. We’ll keep it abstract.

    You have a large list L, containing instances of reference type S (for “set”). Such a list is of course logically a mapping from natural numbers N onto S.

    L: N --> S
    

    S has the property that two instances can be compared for both reference equality and value equality. That is, there can be two instances of S which are not reference equals, but logically represent the same value.

    You have a function F which takes a value of type V (for “vector”) and an instance of type S and produces another instance of type S.

    F: (V, S) --> S
    

    Furthermore, you know that if F is given an instance of S from L then the resulting instance of S will be value equals to something on the list, but not necessarily reference equals.

    The problem you face is: given an instance s of S which is the result of a call to F, which member L(n) is value-equals to s?

    Yes?

    The naive method — go down L(1), L(2), … testing set equality along the way will be dead slow. It’ll be at least linear in the size of L.

    I can think of several different ways to proceed. The easiest is your initial thought: make L something other than a list. Can you make it a HashSet<S> instead of List<S>? If you implement a hashing algorithm and equality method on S then you can build a fast lookup table.

    If that doesn’t suit then we’ll explore other options.

    UPDATE:

    OK, so I can see two basic ways to deal with your memory problem. (1) Keep everything in memory using data structures that are much smaller than your current implementation, or (2) change how you store stuff on disk so that you can store an “index” in memory and rapidly go to the right “page” of the disk file to extract the information you need.

    You could be representing a point as a single short where the top byte is x and the bottom byte is y, instead of representing it as two ints; a savings of 75%.

    A set of points could be implemented as a sorted array of shorts, which is pretty compact and easy to write a hash algorithm for.

    That’s probably the approach I’d go for since your data are so compressible.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

After having read Ian Boyd 's constructor series questions ( 1 , 2 ,
I'm trying to build a Chrome browser extension, that should enhance the way the
I am attempting to pull some information from my tnsnames file using regex. I
I'm in the process of porting some code from Linux to Mac OS X.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.