The Archive Base

Editorial Team
Asked: May 14, 2026

I have successfully debugged my own memory leak problem. However, I have noticed a very strange occurrence.

    for fid, fv in freqDic.iteritems():
        outf.write(fid+"\t")                #ID
        for i, term in enumerate(domain):   #Vector
            tfidf = self.tf(term, fv) * self.idf( term, docFreqDic)
            if i == len(domain) - 1:
                outf.write("%f\n" % tfidf)
            else:
                outf.write("%f\t" % tfidf)
        outf.flush()
        print "Memory increased by", int(self.memory_mon.usage()) - startMemory

    outf.close()

def tf(self, term, freqVector):
    total = freqVector[TOTAL]
    if total == 0:
        return 0
    if term not in freqVector:      ##  Without these two lines the memory leak occurs
        return 0                    ##
    return float(freqVector[term]) / freqVector[TOTAL]


def idf(self, term, docFrequencyPerTerm):
    if term not in docFrequencyPerTerm:
        return 0        
    return math.log( float(docFrequencyPerTerm[TOTAL])/docFrequencyPerTerm[term])

Let me describe my problem:
1) I am doing tf-idf calculations.
2) I traced the source of the memory leak to defaultdict.
3) I am using memory_mon from "How to get current CPU and RAM usage in Python?"
4) The reason for the memory leak is as follows: in self.tf, if the lines "if term not in freqVector: return 0" are left out, memory leaks. (I verified this myself using memory_mon and saw a sharp increase in memory that kept growing.)

The explanation is that fv is a defaultdict, so any lookup of a key that is not in fv creates an entry for it. Over a very large domain, this causes the memory growth.
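A minimal sketch of that behaviour, written in Python 3 syntax (the question's code is Python 2). The sample terms are made up for illustration. Reading a missing key with `[]` inserts it, while `in` and `.get()` do not:

```python
from collections import defaultdict

# Hypothetical frequency vector for one document; the terms are illustrative.
fv = defaultdict(int)
fv["apple"] = 3

size_before = len(fv)

# Membership tests and .get() only read; they never call the default factory.
has_pear = "pear" in fv
pear_count = fv.get("pear", 0)
size_after_reads = len(fv)

# Indexing a missing key calls the default factory and stores the result.
for term in ["pear", "plum", "fig"]:
    _ = fv[term]
size_after_indexing = len(fv)

print(size_before, size_after_reads, size_after_indexing)
```

With a domain of many thousands of terms, every document's vector would grow to the size of the whole domain this way.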

I decided to use dict instead of defaultdict and the memory problem did go away.
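One possible alternative to switching types, sketched in Python 3 syntax. It assumes `TOTAL` is a sentinel key holding the document's total term count (its real value is not shown in the question), and the standalone `tf` below is an illustrative rewrite, not the original method. A lookup through `.get()` never inserts, so it behaves the same for `dict` and `defaultdict`:

```python
from collections import defaultdict

TOTAL = "__total__"  # assumed sentinel key; the question does not show its value


def tf(term, freq_vector):
    """Term frequency that never inserts missing terms into freq_vector."""
    total = freq_vector.get(TOTAL, 0)
    if total == 0:
        return 0.0
    # .get() returns the fallback without storing it, even on a defaultdict.
    return freq_vector.get(term, 0) / total


fv = defaultdict(int, {TOTAL: 4, "apple": 3, "pear": 1})
scores = [tf(term, fv) for term in ["apple", "pear", "plum", "fig"]]
print(scores, len(fv))
```

The vector keeps its original three keys after the loop, because the missing terms were only read and never stored.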

My only puzzle is: since fv is created in “for fid, fv in freqDic.iteritems():”, shouldn’t fv be destroyed at the end of every loop iteration? I tried putting gc.collect() at the end of the for loop, but gc was not able to collect anything (it returns 0). Yes, the hypothesis is right, but memory should stay fairly constant across iterations if for loops destroy all their temporary variables.

This is what it looks like with those two lines in self.tf:

Memory increased by 12
Memory increased by 948
Memory increased by 28
Memory increased by 36
Memory increased by 36
Memory increased by 32
Memory increased by 28
Memory increased by 32
Memory increased by 32
Memory increased by 32
Memory increased by 40
Memory increased by 32
Memory increased by 32
Memory increased by 28

and without the two lines:

Memory increased by 1652
Memory increased by 3576
Memory increased by 4220
Memory increased by 5760
Memory increased by 7296
Memory increased by 8840
Memory increased by 10456
Memory increased by 12824
Memory increased by 13460
Memory increased by 15000
Memory increased by 17448
Memory increased by 18084
Memory increased by 19628
Memory increased by 22080
Memory increased by 22708
Memory increased by 24248
Memory increased by 26704
Memory increased by 27332
Memory increased by 28864
Memory increased by 30404
Memory increased by 32856
Memory increased by 33552
Memory increased by 35024
Memory increased by 36564
Memory increased by 39016
Memory increased by 39924
Memory increased by 42104
Memory increased by 42724
Memory increased by 44268
Memory increased by 46720
Memory increased by 47352
Memory increased by 48952
Memory increased by 50428
Memory increased by 51964
Memory increased by 53508
Memory increased by 55960
Memory increased by 56584
Memory increased by 58404
Memory increased by 59668
Memory increased by 61208
Memory increased by 62744
Memory increased by 64400

I look forward to your answer

EDIT:
It appears that my terminology might have been wrong (or appear to be wrong).

  1. The memory leak I was referring to was NOT the one caused by freqVector[term] (looking up a nonexistent key in a defaultdict).
  2. The actual memory leak I was talking about comes from "for fid, fv in freqDic.iteritems()"! I know fv increased in size because of 1), but it should still be destroyed at the end of the loop, so memory shouldn’t keep expanding. Is this not a memory leak?

1 Answer

Editorial Team
Added an answer on May 14, 2026 at 3:57 am

Iterating over freqDic does not create new values; it hands out references to the values already held by the dict. That means the entries you add to fv are added to the object that freqDic still holds after the loop ends.

Another solution would be to clear freqDic after looping over it.
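A small sketch of both points, in Python 3 syntax with made-up data (`freq_dic` and `domain` are hypothetical stand-ins for the question's variables). The loop variable is the same object the outer dict holds, so growth caused inside the loop survives the loop:

```python
from collections import defaultdict

# Hypothetical stand-in for freqDic: document ID -> frequency vector.
freq_dic = {
    "doc1": defaultdict(int, {"apple": 1}),
    "doc2": defaultdict(int, {"pear": 2}),
}
domain = ["apple", "pear", "plum", "fig"]

same_object = []
for fid, fv in freq_dic.items():
    # fv is not a copy; it is the very object stored in freq_dic.
    same_object.append(fv is freq_dic[fid])
    for term in domain:
        _ = fv[term]  # inserts every missing term into the stored vector

# After the loop the enlarged vectors are still reachable through freq_dic.
sizes_after_loop = {fid: len(vec) for fid, vec in freq_dic.items()}
print(same_object, sizes_after_loop)

# Removing the entries drops the dict's references; a vector can be freed
# once nothing else (such as the loop variable fv) still refers to it.
freq_dic.clear()
print(len(freq_dic))
```

This is also why gc.collect() finds nothing to collect in the question: the vectors are still referenced, so they are not garbage.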

In general, Python passes references to objects rather than copies, although it sometimes appears otherwise. Strings and integers are immutable, so an operation that seems to change one actually creates a new object and rebinds the name to it.
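The difference can be illustrated with two hypothetical helper functions (Python 3 syntax; `grow` and `bump` are names invented for this sketch). Mutating a dict is visible to the caller, while "changing" an integer only rebinds a local name:

```python
def grow(vector):
    # Mutates the object the caller passed in; the caller sees the change.
    vector["plum"] = 0


def bump(count):
    # Integers are immutable: += builds a new int and rebinds the local name only.
    count += 1
    return count


shared = {"apple": 1}
grow(shared)

n = 5
bumped = bump(n)

print(shared, n, bumped)
```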
