The Archive Base

Editorial Team
Asked: June 15, 2026

I am trying to optimize my code since when I try to load huge

I am trying to optimize my code because loading huge dictionaries is really slow. I think the cause is the key lookup in the dictionary. I have been reading about Python's defaultdict, and I think it might be a good improvement, but I have failed to implement it here. As you can see, it is a hierarchical dictionary structure. Any hint will be appreciated.

class Species:
    '''This structure contains all the information needed for all genes.
    One species has several genes, one gene several proteins'''
    def __init__(self, name):
        self.name = name  # name of the species
        self.genes = {}
    def addProtein(self, gene, protname, len):
        #Converting a line from the input file into a protein and/or an exon
        if gene in self.genes:
            #Gene in the structure
            self.genes[gene].proteins[protname] = Protein(protname, len)
            self.genes[gene].updateProts()
        else:
            self.genes[gene] = Gene(gene) 
            self.updateNgenes()
            self.genes[gene].proteins[protname] = Protein(protname, len)
            self.genes[gene].updateProts()
    def updateNgenes(self):
    #Updating the number of genes
        self.ngenes = len(self.genes.keys())    

The definitions of Gene and Protein are:

class Protein:
    #The class Protein contains information about the length of the protein and a list with its exons (with their own attributes)
    def __init__(self, name, len):
        self.name = name
        self.len = len

class Gene:
    #The class Gene contains information about the gene and a dict with its proteins (with their own attributes)
    def __init__(self, name):
        self.name = name
        self.proteins = {}
        self.updateProts()
    def updateProts(self):
        #Update number of proteins
        self.nproteins = len(self.proteins)

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 1:25 am

    You cannot use a defaultdict directly, because its default_factory is called with no arguments, while your __init__ methods require the key as an argument.
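
If the create-on-first-access behaviour of defaultdict is still wanted, one possible workaround is a dict subclass that defines __missing__, which Python calls with the missing key. The following is only a minimal sketch: GeneDict is a hypothetical helper that is not part of the original code, the Gene class is cut down to the essentials, and the gene and protein names are made up.

```python
class Gene:
    def __init__(self, name):
        self.name = name
        self.proteins = {}


class GeneDict(dict):
    """Hypothetical helper: a dict that builds Gene(key) for unknown keys."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ when the key is absent,
        # so the new Gene can be constructed from the key itself.
        gene = self[key] = Gene(key)
        return gene


genes = GeneDict()
genes["geneA"].proteins["protA1"] = 120  # first access creates Gene("geneA")
genes["geneA"].proteins["protA2"] = 95   # second access reuses the same Gene
print(len(genes), len(genes["geneA"].proteins))  # prints: 1 2
```

Note that membership tests such as "geneB" in genes do not trigger __missing__, so checking for a key never creates an entry.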

    This is probably one of your bottlenecks:

    def updateNgenes(self):
    #Updating the number of genes
        self.ngenes = len(self.genes.keys()) 
    

    In Python 2, len(self.genes.keys()) creates a list of all keys before calculating its length. This means that every time you add a gene, you create a list and throw it away, and building that list gets more expensive the more genes you have. To avoid the intermediate list, just use len(self.genes). (In Python 3, keys() returns a view rather than a list, so the overhead is small, but len(self.genes) is still the more direct form.)
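
To get a rough feel for the difference, a small timing sketch along these lines can help. It is an illustration under assumptions, not a benchmark of the original program: the dictionary contents and size are made up, and list(genes.keys()) is used to mimic the list that Python 2's keys() builds.

```python
import timeit

# Made-up stand-in for self.genes: 100,000 dummy entries.
genes = {"gene%d" % i: None for i in range(100000)}


def count_via_list():
    # Mimics Python 2's dict.keys(): build a full list, then measure it.
    return len(list(genes.keys()))


def count_direct():
    # Asks the dict for its size directly; no intermediate list.
    return len(genes)


# Both approaches agree on the count; only the cost differs.
assert count_via_list() == count_direct()
print("via list:", timeit.timeit(count_via_list, number=100))
print("direct:  ", timeit.timeit(count_direct, number=100))
```

The absolute numbers depend on the machine, so none are quoted here. The point is that the list-building version does work proportional to the number of genes on every call.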

    Better yet, make ngenes a property so that it is computed only when you need it, instead of being stored and updated on every insert.

    @property
    def ngenes(self):
        return len(self.genes)
    

    The same can be done with nproteins in the Gene class.

    Here is your code refactored:

    class Species:
        '''This structure contains all the information needed for all genes.
        One species has several genes, one gene several proteins'''
    
        def __init__(self, name):
            self.name = name  # name of the species
            self.genes = {}
    
        def addProtein(self, gene, protname, len):
            #Converting a line from the input file into a protein and/or an exon
            if gene not in self.genes:
                self.genes[gene] = Gene(gene) 
            self.genes[gene].proteins[protname] = Protein(protname, len)
    
        @property
        def ngenes(self):
            return len(self.genes)
    
    class Protein:
        #The class Protein contains information about the length of the protein and a list with its exons (with their own attributes)
        def __init__(self, name, len):
            self.name = name
            self.len = len
    
    class Gene:
        #The class Gene contains information about the gene and a dict with its proteins (with their own attributes)
        def __init__(self, name):
            self.name = name
            self.proteins = {}
    
        @property
        def nproteins(self):
            return len(self.proteins)
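
As a usage sketch, the refactored classes might be exercised as follows. The input format (one tab-separated "gene, protein, length" record per line) and all names in the sample data are assumptions made for illustration, since the original input file is not shown. The classes are repeated in compact form so the example runs on its own, and the parameter is called length here only to avoid shadowing the built-in len.

```python
class Protein:
    def __init__(self, name, length):
        self.name = name
        self.len = length


class Gene:
    def __init__(self, name):
        self.name = name
        self.proteins = {}

    @property
    def nproteins(self):
        return len(self.proteins)


class Species:
    def __init__(self, name):
        self.name = name
        self.genes = {}

    def addProtein(self, gene, protname, length):
        # Create the Gene only the first time it is seen.
        if gene not in self.genes:
            self.genes[gene] = Gene(gene)
        self.genes[gene].proteins[protname] = Protein(protname, length)

    @property
    def ngenes(self):
        return len(self.genes)


# Hypothetical input: one "gene<TAB>protein<TAB>length" record per line.
lines = ["geneA\tprotA1\t120", "geneA\tprotA2\t95", "geneB\tprotB1\t300"]

species = Species("example")
for line in lines:
    gene, protname, length = line.split("\t")
    species.addProtein(gene, protname, int(length))

print(species.ngenes)                    # prints 2
print(species.genes["geneA"].nproteins)  # prints 2
```

Because the counts are properties, nothing has to be updated while loading; they are computed from the dictionaries whenever they are read.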
    
© 2021 The Archive Base. All Rights Reserved