Editorial Team
Asked: June 14, 2026

I have a table with a list of files. Its columns are id_folder, id_parrent_folder, and size (the file size):

create table sample_data (
    id_folder bigint ,
    id_parrent_folder bigint,
    size bigint
);

I would like to know how many files there are in each folder, counting the folder itself and all of its subfolders, starting with a given folder. Given the sample data posted below, I expect the following output:

id_folder     files
100623           35
100624           14

Sample data:

insert into sample_data values (100623,58091,60928);
insert into sample_data values (100623,58091,59904);
insert into sample_data values (100623,58091,54784);
insert into sample_data values (100623,58091,65024);
insert into sample_data values (100623,58091,25600);
insert into sample_data values (100623,58091,31744);
insert into sample_data values (100623,58091,27648);
insert into sample_data values (100623,58091,39424);
insert into sample_data values (100623,58091,30720);
insert into sample_data values (100623,58091,71168);
insert into sample_data values (100623,58091,68608);
insert into sample_data values (100623,58091,34304);
insert into sample_data values (100623,58091,46592);
insert into sample_data values (100623,58091,35328);
insert into sample_data values (100623,58091,29184);
insert into sample_data values (100623,58091,38912);
insert into sample_data values (100623,58091,38400);
insert into sample_data values (100623,58091,49152);
insert into sample_data values (100623,58091,14444);
insert into sample_data values (100623,58091,33792);
insert into sample_data values (100623,58091,14789);
insert into sample_data values (100624,100623,16873);
insert into sample_data values (100624,100623,32768);
insert into sample_data values (100624,100623,104920);
insert into sample_data values (100624,100623,105648);
insert into sample_data values (100624,100623,31744);
insert into sample_data values (100624,100623,16431);
insert into sample_data values (100624,100623,46592);
insert into sample_data values (100624,100623,28160);
insert into sample_data values (100624,100623,58650);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);

I’ve tried to adapt the example from the PostgreSQL docs, but it (obviously) can’t work this way. Any help appreciated.

Edit:

I’ve tried the following query:

WITH RECURSIVE included_files(id_folder, parrent_folder, dist_last_change) AS (
SELECT 
    id_folder, 
    id_parrent_folder, 
    size
FROM 
    sample_data p 
WHERE 
    id_folder = 100623
UNION ALL
SELECT 
    p.id_folder, 
    p.id_parrent_folder, 
    p.size
FROM 
    included_files if, 
    sample_data p
WHERE 
    p.id_parrent_folder = if.id_folder
)
select * from included_files

This doesn’t work: sample_data holds one row per file, so every row of a child folder joins to every row of its parent folder, and as a result the rows in child folders are multiplied.
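
The multiplication can be reproduced with a small sketch. It is not part of the original question: it uses Python's built-in sqlite3 module instead of PostgreSQL, placeholder file sizes, and hypothetical helper names (build_sample_db, count_naive, count_dedup). The second query recurses over distinct folder pairs instead of file rows, which is one possible way to avoid the duplicates, assuming the folder graph has no cycles.

```python
import sqlite3

ROOT = 100623  # starting folder from the question


def build_sample_db():
    """In-memory table shaped like sample_data: 21 files in 100623, 14 in 100624."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample_data "
        "(id_folder INTEGER, id_parrent_folder INTEGER, size INTEGER)"
    )
    rows = [(100623, 58091, 1000 + i) for i in range(21)]    # placeholder sizes
    rows += [(100624, 100623, 2000 + i) for i in range(14)]  # placeholder sizes
    con.executemany("INSERT INTO sample_data VALUES (?, ?, ?)", rows)
    return con


def count_naive(con, root=ROOT):
    """Recurse over file rows, as in the attempted query: child rows multiply."""
    sql = """
        WITH RECURSIVE included_files(id_folder, parrent_folder, size) AS (
            SELECT id_folder, id_parrent_folder, size
              FROM sample_data WHERE id_folder = ?
            UNION ALL
            SELECT p.id_folder, p.id_parrent_folder, p.size
              FROM included_files f
              JOIN sample_data p ON p.id_parrent_folder = f.id_folder
        )
        SELECT count(*) FROM included_files
    """
    return con.execute(sql, (root,)).fetchone()[0]


def count_dedup(con, root=ROOT):
    """Recurse over distinct folders first, then count the files they contain."""
    sql = """
        WITH RECURSIVE
        folders AS (
            SELECT DISTINCT id_folder, id_parrent_folder FROM sample_data
        ),
        subtree(id_folder) AS (
            SELECT ?
            UNION ALL
            SELECT f.id_folder
              FROM subtree s
              JOIN folders f ON f.id_parrent_folder = s.id_folder
        )
        SELECT count(*) FROM sample_data
         WHERE id_folder IN (SELECT id_folder FROM subtree)
    """
    return con.execute(sql, (root,)).fetchone()[0]


if __name__ == "__main__":
    con = build_sample_db()
    print(count_naive(con))  # 315 = 21 + 21 * 14
    print(count_dedup(con))  # 35
```

The naive count is 21 rows for the root plus 21 × 14 joined child rows, which is why the result grows far beyond the expected 35.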


1 Answer

    Editorial Team
    Added an answer on June 14, 2026 at 3:21 am

    Very nice problem to think about, I upvoted!

    As I see it, there are 2 cases to think about:

    1. multi-level paths and
    2. multi-child nodes.

    So far I’ve come up with the following query:

    WITH RECURSIVE tree AS (
        SELECT id_folder id, array[id_folder] arr
          FROM sample_data sd
         WHERE NOT EXISTS (SELECT 1 FROM sample_data s
                            WHERE s.id_parrent_folder=sd.id_folder)
        UNION ALL
        SELECT sd.id_folder,t.arr||sd.id_folder
          FROM tree t
          JOIN sample_data sd ON sd.id_folder IN (
            SELECT id_parrent_folder FROM sample_data WHERE id_folder=t.id))
    ,ids AS (SELECT DISTINCT id, unnest(arr) ua FROM tree)
    ,agg AS (SELECT id_folder id,count(*) cnt FROM sample_data GROUP BY 1)
    SELECT ids.id, sum(agg.cnt)
      FROM ids JOIN agg ON ids.ua=agg.id
     GROUP BY 1
     ORDER BY 1;
    

    I’ve added the following rows to the sample_data:

    INSERT INTO sample_data VALUES (100625,100623,123);
    INSERT INTO sample_data VALUES (100625,100623,456);
    INSERT INTO sample_data VALUES (100625,100623,789);
    INSERT INTO sample_data VALUES (100626,100625,1);
    

    This query is not optimal, though, and will slow down as the number of rows grows.


    Full-scale tests

    To simulate the original situation, I wrote a small Python script that scans the filesystem and stores it in the database (hence the delay; I’m not yet good at Python scripting).

    The following tables were created:

    CREATE TABLE fs_file(file_id bigserial, name text, type char(1), level int4);
    CREATE TABLE fs_tree(file_id int8, parent_id int8, size int8);
    

    Scanning the whole filesystem of my MBP took 7.5 minutes and left me with 870k entries in the fs_tree table, which is quite similar to the original task. After the upload, the following was run:

    CREATE INDEX i_fs_tree_1 ON fs_tree(file_id);
    CREATE INDEX i_fs_tree_2 ON fs_tree(parent_id);
    VACUUM ANALYZE fs_file;
    VACUUM ANALYZE fs_tree;
    

    I tried running my first query on this data and had to kill it after approximately 1 hour. The improved one takes around 2 minutes (on my MBP) to do the job on the whole filesystem. Here it is:

    WITH RECURSIVE descent AS (
        SELECT fs.file_id grp, fs.file_id, fs.size, 1 k, 0 AS lvl
          FROM fs_tree fs
         WHERE fs.parent_id = (SELECT file_id FROM fs_file WHERE name = '/')
        UNION ALL
        SELECT DISTINCT CASE WHEN k.k=0 THEN d.grp ELSE fs.file_id END AS grp,
               fs.file_id, fs.size, k.k, d.lvl+1
          FROM descent d
          JOIN fs_tree fs ON d.file_id=fs.parent_id
          CROSS JOIN generate_series(0,1) k(k))
    /* the query */
    SELECT grp, file_id, size, k, lvl
      FROM descent
     ORDER BY 1,2,3;
    

    The query uses my table names, but it shouldn’t be difficult to change them. It builds a set of groups, one for each file_id found in fs_tree. To get the desired output, you can replace the final SELECT with something like:

    SELECT grp AS file_id, count(*), sum(size)
      FROM descent GROUP BY 1;
    

    Some notes:

    1. the query will work only if there are no duplicates. I think this is the right way to go, because it is impossible to have 2 equally named entries in a single directory;
    2. the query doesn’t care about the depth or sibling count of the tree, though both do have an impact on performance;
    3. for me it was a good experience, as similar functionality is also needed for task planning systems (I’m working with one at the moment);
    4. where tasks are concerned, a single entry can have multiple parents (but not the other way round) and the query will still work;
    5. this problem can be solved in other ways too, such as traversing the tree in ascending order or using pre-calculated values to avoid the final grouping step, but this is getting a bit bigger than a simple question, so I leave it as an exercise for you.
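
The ascending traversal mentioned in note 5 could look roughly like the sketch below. It is plain Python rather than SQL and is not the answer's own method; the function name subtree_file_counts is hypothetical. It assumes the layout of sample_data (one row per file) and a single parent per folder, so it does not cover the multi-parent case from note 4.

```python
from collections import Counter


def subtree_file_counts(rows):
    """Count files per folder, including files in all nested subfolders.

    rows: iterable of (id_folder, id_parent_folder, size), one row per file.
    """
    own = Counter()   # files stored directly in each folder
    parent_of = {}    # folder -> its parent folder
    for folder, parent, _size in rows:
        own[folder] += 1
        parent_of[folder] = parent

    totals = Counter()
    for folder, n_files in own.items():
        node, seen = folder, set()
        # Walk upwards, crediting this folder's files to itself and each ancestor.
        # Stop at a parent that has no rows of its own, or if a cycle shows up.
        while node in parent_of and node not in seen:
            seen.add(node)
            totals[node] += n_files
            node = parent_of[node]
    return dict(totals)


if __name__ == "__main__":
    rows = [(100623, 58091, 0)] * 21 + [(100624, 100623, 0)] * 14
    print(subtree_file_counts(rows))  # {100623: 35, 100624: 14}
```

Each folder is visited once per ancestor, so the cost grows with the depth of the tree rather than with the number of file rows per folder.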

    Recommendations

    To get this query to work, you should prepare your data by aggregating it:

    WITH RECURSIVE
    fs_tree AS (
        SELECT id_folder file_id, id_parrent_folder parent_id,
               sum(size) AS size, count(*) AS cnt
          FROM sample_data GROUP BY 1,2)
    ,descent AS (
        SELECT fs.file_id grp, fs.file_id, fs.size, fs.cnt, 1 k, 0 AS lvl
          FROM fs_tree fs
         WHERE fs.parent_id = 58091
        UNION ALL
        SELECT DISTINCT CASE WHEN k.k=0 THEN d.grp ELSE fs.file_id END AS grp,
               fs.file_id, fs.size, fs.cnt, k.k, d.lvl+1
          FROM descent d
          JOIN fs_tree fs ON d.file_id=fs.parent_id
          CROSS JOIN generate_series(0,1) k(k))
    /* the query */
    SELECT grp file_id, sum(size) size, sum(cnt) cnt
      FROM descent
     GROUP BY 1
     ORDER BY 1,2,3;
    

    In order to speed things up, you can implement Materialized Views and pre-calculate some metrics.
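
The aggregate-first idea can be tried without a PostgreSQL server using the sketch below. It is only an approximation of the recommended query: it runs on Python's built-in sqlite3 module, the function name folder_totals is hypothetical, and instead of the generate_series(0,1) trick it seeds the recursion with every folder as its own group, so it reports all folders rather than only those below a given root. The 21 and 14 file sizes are placeholders; the last four rows are the extra rows added earlier in this answer.

```python
import sqlite3


def folder_totals(rows):
    """Return {id_folder: (total_size, file_count)} including nested subfolders."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample_data "
        "(id_folder INTEGER, id_parrent_folder INTEGER, size INTEGER)"
    )
    con.executemany("INSERT INTO sample_data VALUES (?, ?, ?)", rows)
    sql = """
        WITH RECURSIVE
        fs_tree AS (            -- one row per folder, pre-aggregated
            SELECT id_folder AS file_id, id_parrent_folder AS parent_id,
                   sum(size) AS size, count(*) AS cnt
              FROM sample_data
             GROUP BY id_folder, id_parrent_folder),
        descent(grp, file_id, size, cnt) AS (
            SELECT file_id, file_id, size, cnt FROM fs_tree  -- each folder opens a group
            UNION ALL
            SELECT d.grp, fs.file_id, fs.size, fs.cnt        -- descendants join that group
              FROM descent d
              JOIN fs_tree fs ON fs.parent_id = d.file_id)
        SELECT grp, sum(size), sum(cnt) FROM descent GROUP BY grp ORDER BY grp
    """
    result = {grp: (size, cnt) for grp, size, cnt in con.execute(sql)}
    con.close()
    return result


if __name__ == "__main__":
    rows = [(100623, 58091, 10)] * 21 + [(100624, 100623, 1)] * 14  # placeholder sizes
    rows += [(100625, 100623, 123), (100625, 100623, 456),
             (100625, 100623, 789), (100626, 100625, 1)]
    for folder, (size, cnt) in folder_totals(rows).items():
        print(folder, size, cnt)
```

Because the recursion runs over one row per folder, the number of files in a folder no longer affects how many rows the recursive step produces.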


    Sample data

    Here’s a small dump that will show the data inside the tables:

    INSERT INTO fs_file VALUES (1, '/Users/viy/prj/logs', 'D', 0),
        (2, 'jobs', 'D', 1),
        (3, 'pg_csv_load', 'F', 2),
        (4, 'pg_logs', 'F', 2),
        (5, 'logs.sql', 'F', 1),
        (6, 'logs.sql~', 'F', 1),
        (7, 'pgfouine-1.2.tar.gz', 'F', 1),
        (8, 'u.sql', 'F', 1),
        (9, 'u.sql~', 'F', 1);
    
    INSERT INTO fs_tree VALUES (1, NULL, 0),
        (2, 1, 0),
        (3, 2, 936),
        (4, 2, 706),
        (5, 1, 4261),
        (6, 1, 4261),
        (7, 1, 793004),
        (8, 1, 491),
        (9, 1, 491);
    

    Note that I’ve slightly updated the CREATE statements.

    And this is the script I’ve used to scan the filesystem:

    #!/usr/bin/env python3
    
    import os
    import psycopg2
    import sys
    from stat import S_ISDIR, S_ISREG
    
    def walk_tree(full, parent, level, call_back):
        '''Recursively descend the directory tree rooted at `full`,
           calling the callback for every entry found (links,
           directories, regular files and anything else).'''
    
        if not os.access(full, os.R_OK):
            return
    
        for f in os.listdir(full):
            path = os.path.join(full, f)
            if os.path.islink(path):
                # It's a link, register and continue
                e = entry(f, "L", level)
                call_back(parent, e, 0)
                continue
    
            mode = os.stat(path).st_mode
            if S_ISDIR(mode):
                e = entry(f, "D", level)
                call_back(parent, e, 0)
                # It's a directory, recurse into it
                try:
                    walk_tree(path, e, level+1, call_back)
                except OSError:
                    pass
    
            elif S_ISREG(mode):
                # It's a file, call the callback function
                call_back(parent, entry(f, "F", level), os.stat(path).st_size)
            else:
                # It's unknown, just register
                e = entry(f, "U", level)
                call_back(parent, e, 0)
    
    def register(parent, entry, size):
        db_cur.execute("INSERT INTO fs_tree VALUES (%s,%s,%s)",
                       (entry, parent, size))
    
    def entry(name, type, level):
        db_cur.execute("""INSERT INTO fs_file(name,type, level)
                       VALUES (%s, %s, %s) RETURNING file_id""",
                       (name, type, level))
        return db_cur.fetchone()[0]
    
    db_con=psycopg2.connect("dbname=postgres")
    db_cur=db_con.cursor()
    
    if len(sys.argv) != 2:
        raise SyntaxError("Root directory expected!")
    
    if not S_ISDIR(os.stat(sys.argv[1]).st_mode):
        raise SyntaxError("A directory is wanted!")
    
    e=entry(sys.argv[1], "D", 0)
    register(None, e, 0)
    walk_tree(sys.argv[1], e, 1, register)
    
    db_con.commit()
    
    db_cur.close()
    db_con.close()
    

    This script is for Python 3.2 and is based on the example from the official Python documentation.

    Hope this clarifies things for you.
