The Archive Base

Editorial Team
Asked: May 25, 2026

I have a scenario in which memory conservation is paramount. I am trying to read more than 1 GB of peptide sequences into memory and group together the Peptide instances that share the same sequence. I am storing the Peptide objects in a HashSet so I can quickly check for duplicates, but I found out that you cannot retrieve an object from the set, even when you know the set contains it.

Memory is really important and I don’t want to duplicate data if at all possible. (Otherwise I would have designed my data structure as peptides = Dictionary<string, Peptide>, but that would duplicate the string in both the dictionary and the Peptide class.) Below is the code to show what I would like to accomplish:

public class SomeClass {

       // Main Storage of all the Peptide instances, class provided below
       private HashSet<Peptide> peptides = new HashSet<Peptide>();

       public void SomeMethod(IEnumerable<string> files) {
            foreach(string file in files) {
                 using(PeptideReader reader = new PeptideReader(file)) {
                     foreach(DataLine line in reader.ReadNextLine()) {
                         Peptide testPep = new Peptide(line.Sequence);
                         if(peptides.Contains(testPep)) {

                            // ** Problem Is Here **
                            // I want to get the Peptide object that is in HashSet
                            // so I can add the DataLine to it, I don't want use the
                            // testPep object (even though they are considered "equal")
                            peptides[testPep].Add(line); // I know this doesn't work

                            testPep.Add(line); // THIS IS NO GOOD, since it won't be saved in the HashSet, which I use in other methods.

                         } else {
                            // The HashSet doesn't contain this peptide, so we can just add it
                            testPep.Add(line);
                            peptides.Add(testPep);
                         }
                     }   
                 }
            }
       }
}

public class Peptide : IEquatable<Peptide> {
     public string Sequence {get;private set;}
     private int hCode = 0;

     public PsmList PSMs {get;set;}

     public Peptide(string sequence) {
         Sequence = sequence.Replace('I', 'L');
         hCode = Sequence.GetHashCode();             
     }

     public void Add(DataLine data) {
         if(PSMs == null) {
             PSMs = new PsmList();
         } 
         PSMs.Add(data);
     }

     public override int GetHashCode() {
         return hCode;
     }

     public bool Equals(Peptide other) {
         return Sequence.Equals(other.Sequence);
     }
}

public class PsmList : List<DataLine> { /* and some other stuff that is not important */ }

Why does HashSet not let me get the object reference that is contained in the HashSet? I know people will try to say that if HashSet.Contains() returns true, your objects are equivalent. They may be equivalent in terms of values, but I need the references to be the same since I am storing additional information in the Peptide class.

The only solution I came up with is Dictionary<Peptide, Peptide> in which both the key and value point to the same reference. But this seems tacky. Is there another data structure to accomplish this?

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 6:22 am

    Basically you could reimplement HashSet<T> yourself, but that’s about the only solution I’m aware of. The Dictionary<Peptide, Peptide> or Dictionary<string, Peptide> solution is probably not that inefficient though – if you’re only wasting a single reference per entry, I would imagine that would be relatively insignificant.
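As a rough sketch of the Dictionary<string, Peptide> option: if the normalized string is created once and that same instance is used both as the dictionary key and as Peptide.Sequence, the characters are stored only once, and the dictionary holds just an extra reference per entry. The DataLine and Peptide classes below are simplified, hypothetical stand-ins for the types in the question, and PeptideGrouping.Group is an illustrative name, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the question's DataLine type.
public class DataLine
{
    public string Sequence { get; }
    public DataLine(string sequence) { Sequence = sequence; }
}

// Simplified Peptide: no cached hash, no IEquatable needed,
// because the dictionary is keyed on the sequence string itself.
public class Peptide
{
    public string Sequence { get; }
    public List<DataLine> PSMs { get; } = new List<DataLine>();

    public Peptide(string normalizedSequence) { Sequence = normalizedSequence; }

    public void Add(DataLine data) { PSMs.Add(data); }
}

public static class PeptideGrouping
{
    public static Dictionary<string, Peptide> Group(IEnumerable<DataLine> lines)
    {
        var peptides = new Dictionary<string, Peptide>();
        foreach (DataLine line in lines)
        {
            // Normalize once; this one string instance becomes both the key
            // and Peptide.Sequence, so only the reference is duplicated.
            string key = line.Sequence.Replace('I', 'L');
            if (!peptides.TryGetValue(key, out Peptide existing))
            {
                existing = new Peptide(key);
                peptides.Add(key, existing);
            }
            existing.Add(line); // always lands on the stored instance
        }
        return peptides;
    }

    public static void Main()
    {
        var lines = new[]
        {
            new DataLine("PEPTIDE"),
            new DataLine("PEPTLDE"),
            new DataLine("ACDK"),
        };
        Dictionary<string, Peptide> grouped = Group(lines);
        Console.WriteLine(grouped.Count);                  // 2
        Console.WriteLine(grouped["PEPTLDE"].PSMs.Count);  // 2
    }
}
```

TryGetValue does the lookup and the retrieval in one hash probe, so there is no separate Contains call followed by an indexer access.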

    In fact, if you remove the hCode member from Peptide, that will save you 4 bytes per object, which is the same size as a reference on x86 anyway… there’s no point in caching the hash as far as I can tell, as you’ll only compute the hash of each object once, at least in the code you’ve shown.

    If you’re really desperate for memory, I suspect you could store the sequence considerably more efficiently than as a string. If you give us more information about what the sequence contains, we may be able to make some suggestions there.
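One possible direction, purely as a sketch: a .NET string uses two bytes per character (UTF-16), so if the sequences turn out to contain only ASCII letters (an assumption, since the question does not say), a byte array holds the same residues in half the character storage. CompactSequence below is a hypothetical type, not something from the question, and the per-array object overhead means the saving is smaller for very short peptides.

```csharp
using System;
using System.Text;

// Hypothetical wrapper that stores one byte per residue instead of one
// UTF-16 char. Assumes the input contains ASCII characters only.
public sealed class CompactSequence : IEquatable<CompactSequence>
{
    private readonly byte[] residues;

    public CompactSequence(string sequence)
    {
        // Same I -> L normalization as the question's Peptide constructor.
        residues = Encoding.ASCII.GetBytes(sequence.Replace('I', 'L'));
    }

    public int Length => residues.Length;

    public bool Equals(CompactSequence other)
    {
        if (other == null || other.residues.Length != residues.Length)
        {
            return false;
        }
        for (int i = 0; i < residues.Length; i++)
        {
            if (residues[i] != other.residues[i])
            {
                return false;
            }
        }
        return true;
    }

    public override bool Equals(object obj) => Equals(obj as CompactSequence);

    public override int GetHashCode()
    {
        // Simple multiplicative hash over the bytes; overflow wraps.
        unchecked
        {
            int hash = 17;
            foreach (byte b in residues)
            {
                hash = hash * 31 + b;
            }
            return hash;
        }
    }

    // Rebuilds a string only when one is actually needed.
    public override string ToString() => Encoding.ASCII.GetString(residues);
}

public static class CompactSequenceDemo
{
    public static void Main()
    {
        var a = new CompactSequence("PEPTIDE");
        var b = new CompactSequence("PEPTLDE");
        Console.WriteLine(a.Equals(b));                           // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());    // True
        Console.WriteLine(a);                                     // PEPTLDE
    }
}
```

Because this version hashes the whole array on every call, caching the hash could matter again if the same instance is looked up repeatedly; for a single insert per object it does not.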

    I don’t know that there’s any particularly strong reason why HashSet doesn’t permit this, other than that it’s a relatively rare requirement – but it’s something I’ve seen requested in Java as well…
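For readers on a newer runtime: HashSet<T> later gained a TryGetValue method (available in .NET Framework 4.7.2 and .NET Core 2.0 onward), which returns the stored instance that compares equal to a probe. The sketch below assumes such a runtime; the Peptide class is a trimmed, hypothetical version of the one in the question, with plain strings standing in for DataLine.

```csharp
using System;
using System.Collections.Generic;

// Trimmed stand-in for the question's Peptide; PSMs holds raw strings
// here instead of DataLine objects to keep the example self-contained.
public class Peptide : IEquatable<Peptide>
{
    public string Sequence { get; }
    public List<string> PSMs { get; } = new List<string>();

    public Peptide(string sequence) { Sequence = sequence.Replace('I', 'L'); }

    public bool Equals(Peptide other) => other != null && Sequence == other.Sequence;
    public override bool Equals(object obj) => Equals(obj as Peptide);
    public override int GetHashCode() => Sequence.GetHashCode();
}

public static class TryGetValueDemo
{
    public static void Main()
    {
        var peptides = new HashSet<Peptide>();
        Peptide first = null;

        foreach (string raw in new[] { "PEPTIDE", "PEPTLDE" })
        {
            var probe = new Peptide(raw);

            // TryGetValue hands back the instance already in the set,
            // not the probe that was used for the lookup.
            if (!peptides.TryGetValue(probe, out Peptide stored))
            {
                peptides.Add(probe);
                stored = probe;
            }
            stored.PSMs.Add(raw);

            if (first == null)
            {
                first = stored;
            }
        }

        Console.WriteLine(peptides.Count);      // 1
        Console.WriteLine(first.PSMs.Count);    // 2
    }
}
```

On older frameworks without this method, the dictionary-based approaches described in this answer remain the practical option.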
