I am running a Python process on an Amazon EC2 Ubuntu instance that processes a large data file. Initially everything goes fine and I do not notice any continuous rise in RAM or CPU usage. Then, after processing part of the input data, the process runs out of memory and dies. Running dmesg -T produces the following, which does not tell me anything useful:
[Thu Jan 3 17:47:27 2013] python invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
[Thu Jan 3 17:47:27 2013] python cpuset=/ mems_allowed=0
[Thu Jan 3 17:47:27 2013] Pid: 1108, comm: python Not tainted 3.2.0-25-virtual #40-Ubuntu
[Thu Jan 3 17:47:27 2013] Call Trace:
[Thu Jan 3 17:47:27 2013] [<ffffffff810bdb9d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
[Thu Jan 3 17:47:27 2013] [<ffffffff81118231>] dump_header+0x91/0xe0
[Thu Jan 3 17:47:27 2013] [<ffffffff811185b5>] oom_kill_process+0x85/0xb0
[Thu Jan 3 17:47:27 2013] [<ffffffff8111895a>] out_of_memory+0xfa/0x220
[Thu Jan 3 17:47:27 2013] [<ffffffff8111e38a>] __alloc_pages_nodemask+0x7ea/0x800
[Thu Jan 3 17:47:27 2013] [<ffffffff810063dd>] ? pte_mfn_to_pfn+0x8d/0x110
[Thu Jan 3 17:47:27 2013] [<ffffffff811569fa>] alloc_pages_vma+0x9a/0x150
[Thu Jan 3 17:47:27 2013] [<ffffffff8113705c>] do_anonymous_page.isra.38+0x7c/0x2f0
[Thu Jan 3 17:47:27 2013] [<ffffffff8113acc1>] handle_pte_fault+0x1e1/0x200
[Thu Jan 3 17:47:27 2013] [<ffffffff8100647e>] ? xen_pmd_val+0xe/0x10
[Thu Jan 3 17:47:27 2013] [<ffffffff810052d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[Thu Jan 3 17:47:27 2013] [<ffffffff8113b098>] handle_mm_fault+0x1f8/0x350
[Thu Jan 3 17:47:27 2013] [<ffffffff81659f9b>] do_page_fault+0x14b/0x520
[Thu Jan 3 17:47:27 2013] [<ffffffff811425fd>] ? mprotect_fixup+0x17d/0x2b0
[Thu Jan 3 17:47:27 2013] [<ffffffff81142920>] ? sys_mprotect+0x1f0/0x250
[Thu Jan 3 17:47:27 2013] [<ffffffff81656bf5>] page_fault+0x25/0x30
[Thu Jan 3 17:47:27 2013] Mem-Info:
[Thu Jan 3 17:47:27 2013] Node 0 DMA per-cpu:
[Thu Jan 3 17:47:27 2013] CPU 0: hi: 0, btch: 1 usd: 0
[Thu Jan 3 17:47:27 2013] Node 0 DMA32 per-cpu:
[Thu Jan 3 17:47:27 2013] CPU 0: hi: 186, btch: 31 usd: 0
[Thu Jan 3 17:47:27 2013] active_anon:142435 inactive_anon:14 isolated_anon:0
[Thu Jan 3 17:47:27 2013] active_file:0 inactive_file:11 isolated_file:0
[Thu Jan 3 17:47:27 2013] unevictable:0 dirty:0 writeback:0 unstable:0
[Thu Jan 3 17:47:27 2013] free:1389 slab_reclaimable:1528 slab_unreclaimable:1686
[Thu Jan 3 17:47:27 2013] mapped:2 shmem:45 pagetables:793 bounce:0
[Thu Jan 3 17:47:27 2013] Node 0 DMA free:2460kB min:72kB low:88kB high:108kB active_anon:12296kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14524kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:8kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Jan 3 17:47:27 2013] lowmem_reserve[]: 0 597 597 597
[Thu Jan 3 17:47:27 2013] Node 0 DMA32 free:3096kB min:3088kB low:3860kB high:4632kB active_anon:557444kB inactive_anon:56kB active_file:0kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:611856kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:180kB slab_reclaimable:6104kB slab_unreclaimable:6744kB kernel_stack:1024kB pagetables:3156kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:27445 all_unreclaimable? yes
[Thu Jan 3 17:47:27 2013] lowmem_reserve[]: 0 0 0 0
[Thu Jan 3 17:47:27 2013] Node 0 DMA: 1*4kB 2*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2468kB
[Thu Jan 3 17:47:27 2013] Node 0 DMA32: 151*4kB 10*8kB 23*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3100kB
[Thu Jan 3 17:47:27 2013] 55 total pagecache pages
[Thu Jan 3 17:47:27 2013] 0 pages in swap cache
[Thu Jan 3 17:47:27 2013] Swap cache stats: add 0, delete 0, find 0/0
[Thu Jan 3 17:47:27 2013] Free swap = 0kB
[Thu Jan 3 17:47:27 2013] Total swap = 0kB
[Thu Jan 3 17:47:27 2013] 159472 pages RAM
[Thu Jan 3 17:47:27 2013] 8383 pages reserved
[Thu Jan 3 17:47:27 2013] 261 pages shared
[Thu Jan 3 17:47:27 2013] 149349 pages non-shared
[Thu Jan 3 17:47:27 2013] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[Thu Jan 3 17:47:27 2013] [ 238] 0 238 4306 47 0 0 0 upstart-udev-br
[Thu Jan 3 17:47:27 2013] [ 242] 0 242 5396 119 0 -17 -1000 udevd
[Thu Jan 3 17:47:27 2013] [ 287] 0 287 5362 99 0 -17 -1000 udevd
[Thu Jan 3 17:47:27 2013] [ 288] 0 288 5362 99 0 -17 -1000 udevd
[Thu Jan 3 17:47:27 2013] [ 361] 0 361 3795 48 0 0 0 upstart-socket-
[Thu Jan 3 17:47:27 2013] [ 419] 0 419 1814 123 0 0 0 dhclient3
[Thu Jan 3 17:47:27 2013] [ 643] 0 643 12487 151 0 -17 -1000 sshd
[Thu Jan 3 17:47:27 2013] [ 657] 101 657 63427 102 0 0 0 rsyslogd
[Thu Jan 3 17:47:27 2013] [ 663] 102 663 5981 89 0 0 0 dbus-daemon
[Thu Jan 3 17:47:27 2013] [ 725] 0 725 3624 42 0 0 0 getty
[Thu Jan 3 17:47:27 2013] [ 732] 0 732 3624 41 0 0 0 getty
[Thu Jan 3 17:47:27 2013] [ 741] 0 741 3624 42 0 0 0 getty
[Thu Jan 3 17:47:27 2013] [ 743] 0 743 3624 41 0 0 0 getty
[Thu Jan 3 17:47:27 2013] [ 747] 0 747 3624 41 0 0 0 getty
[Thu Jan 3 17:47:27 2013] [ 755] 0 755 1080 37 0 0 0 acpid
[Thu Jan 3 17:47:27 2013] [ 756] 0 756 4776 50 0 0 0 cron
[Thu Jan 3 17:47:27 2013] [ 757] 0 757 4225 39 0 0 0 atd
[Thu Jan 3 17:47:27 2013] [ 787] 0 787 3624 41 0 0 0 getty
[Thu Jan 3 17:47:27 2013] [ 790] 103 790 46895 300 0 0 0 whoopsie
[Thu Jan 3 17:47:27 2013] [ 797] 0 797 20467 216 0 0 0 sshd
[Thu Jan 3 17:47:27 2013] [ 800] 0 800 146074 260 0 0 0 console-kit-dae
[Thu Jan 3 17:47:27 2013] [ 867] 0 867 46645 154 0 0 0 polkitd
[Thu Jan 3 17:47:27 2013] [ 983] 1000 983 20467 213 0 0 0 sshd
[Thu Jan 3 17:47:27 2013] [ 984] 1000 984 6557 1766 0 0 0 bash
[Thu Jan 3 17:47:27 2013] [ 1108] 1000 1108 163815 138085 0 0 0 python
[Thu Jan 3 17:47:27 2013] Out of memory: Kill process 1108 (python) score 915 or sacrifice child
[Thu Jan 3 17:47:27 2013] Killed process 1108 (python) total-vm:655260kB, anon-rss:552336kB, file-rss:4kB
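The report does carry some information once the page counts are converted. Assuming the usual 4 KiB page size on x86-64 (an assumption; the log itself does not state it), the instance has roughly 620 MiB of RAM in total, there is no swap (Total swap = 0kB), and the python process (pid 1108) alone held about 540 MiB of anonymous memory when it was killed. So the growth happens inside the Python process itself rather than elsewhere on the machine. A small sketch of the conversion, using only figures copied from the log:

```python
# Figures copied from the OOM report above. The 4 KiB page size is an
# assumption (it is the usual page size on x86-64 Linux).
PAGE_KB = 4

total_ram_pages = 159472   # "159472 pages RAM"
python_rss_pages = 138085  # rss column for pid 1108 (python)


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the OOM report to kilobytes."""
    return pages * PAGE_KB


total_kb = pages_to_kb(total_ram_pages)
python_kb = pages_to_kb(python_rss_pages)

# Sanity check: the rss column should equal anon-rss + file-rss
# from the "Killed process 1108" line.
assert python_kb == 552336 + 4

print(f"total RAM:  {total_kb} kB (~{total_kb // 1024} MiB)")
print(f"python RSS: {python_kb} kB (~{python_kb // 1024} MiB), "
      f"{python_kb / total_kb:.0%} of total RAM")
print("swap:       0 kB (Total swap = 0kB)")
```

The sanity check passing is what supports the 4 KiB assumption: the rss page count and the kB figures on the "Killed process" line agree exactly.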
Is there a way to profile the process to figure out what’s going on and what causes the sudden surge in RAM usage? Thanks.
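One standard-library option is tracemalloc, available from Python 3.4 onwards (so it only helps if the job can run on Python 3; the sketch below also uses 3.6+ syntax). The idea is to take a snapshot of traced allocations after each chunk of input and compare consecutive snapshots, which points at the source lines where memory keeps growing. This is a minimal sketch: process_chunk, the _retained list and the chunked input are hypothetical stand-ins for the real job, with a "leak" built in deliberately so there is something to find.

```python
import tracemalloc

# Ignore allocations made by tracemalloc's own Python code (snapshots etc.).
_FILTERS = (tracemalloc.Filter(False, tracemalloc.__file__),)

# Hypothetical "leak": every processed record is kept alive here.
_retained = []


def process_chunk(chunk):
    """Hypothetical stand-in for the real per-chunk work."""
    _retained.extend(str(x) * 10 for x in chunk)


def profile_chunks(chunks, top_n=5):
    """Process chunks one at a time, printing where memory grew after each.

    Returns the net number of traced bytes gained per chunk.
    """
    growth = []
    tracemalloc.start()
    previous = tracemalloc.take_snapshot().filter_traces(_FILTERS)
    for i, chunk in enumerate(chunks):
        process_chunk(chunk)
        current = tracemalloc.take_snapshot().filter_traces(_FILTERS)
        # Group allocations by source line; the largest changes come first.
        diffs = current.compare_to(previous, "lineno")
        growth.append(sum(d.size_diff for d in diffs))
        print(f"--- after chunk {i} ---")
        for d in diffs[:top_n]:
            print(d)
        previous = current
    tracemalloc.stop()
    return growth


if __name__ == "__main__":
    # Hypothetical input: five chunks of 10,000 integers each.
    profile_chunks([range(n, n + 10_000) for n in range(0, 50_000, 10_000)])
```

Taking snapshots costs time and memory of its own, so on the real data it may be more practical to snapshot only every N chunks.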
I’ve used Dowser to help track down memory use in one of my projects. It runs as a simple web interface and produces a lot of information that will help you track down the issue. There is a Dowser blog post giving an example, and the Dowser wiki has more.
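If installing Dowser is not an option, the kind of per-type object counts it reports can be roughly approximated with the standard library's gc module: count live objects by type, do some work, count again, and look at which types grew. A minimal sketch, assuming the memory is held in objects the garbage collector tracks (containers and class instances; plain ints and strings are not tracked, so they are not counted directly). Record and cache are hypothetical stand-ins for the real job.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Count objects tracked by the garbage collector, keyed by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def report_growth(before, after, top_n=10):
    """Return (type name, increase) pairs for the types that grew the most."""
    # Counter subtraction drops types whose count stayed the same or fell.
    return (after - before).most_common(top_n)


class Record:
    """Hypothetical record type standing in for what the real job creates."""

    def __init__(self, value):
        self.value = value


if __name__ == "__main__":
    cache = []  # hypothetical leak: records are kept after processing
    before = count_objects_by_type()
    for i in range(50_000):
        cache.append(Record(i))
    after = count_objects_by_type()
    for name, increase in report_growth(before, after):
        print(f"{name:20} +{increase}")
```

Calling count_objects_by_type() every few chunks of input and logging the top entries gives a rough picture of which object types keep accumulating before the OOM killer steps in.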