The Archive Base

Editorial Team
Asked: May 24, 2026 (2026-05-24T15:51:54+00:00)

Given input that shows tag assignments to images, as follows (read from php://stdin line by line, because the input can get rather large):

image_a tag_lorem
image_a tag_ipsum
image_a tag_amit
image_b tag_sit
image_b tag_dolor
image_b tag_ipsum
... (there are more lines, may get up to a million)

The expected output is shown below. It uses the same format, with one extra field showing whether the image-tag combination exists in the input. For every image, it lists all the available tags and marks each one with 1 or 0 at the end of the line, depending on whether the tag is assigned to that image.

image_a tag_sit 0
image_a tag_lorem 1
image_a tag_dolor 0
image_a tag_ipsum 1
image_a tag_amit 1
image_b tag_sit 1
image_b tag_lorem 0
image_b tag_dolor 1
image_b tag_ipsum 1
image_b tag_amit 0
... (more)

I have posted my not-so-efficient solution below. To give a better picture of the input and output: I fed 745 rows (describing the tag assignments of 10 images) into the script via stdin and received 555025 lines after execution, using about 0.4 MB of memory. However, it may wear out the hard disk faster because of the heavy disk I/O (writing to and reading from the temporary column cache file).

Is there any other way of doing this? I have another script that can turn the stdin into something like this (not sure if this is useful):

image_foo tag_lorem tag_ipsum tag_amit
image_bar tag_sit tag_dolor tag_ipsum
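
The grouped format above could help, because each line then carries everything known about one image. Here is a minimal sketch, assuming the grouped lines are stored in a file that can be read twice (stdin itself cannot be rewound). The function name `expand_grouped` and the file-based input are illustrative assumptions, not part of the original scripts:

```php
<?php
// Sketch only: expand_grouped() is a hypothetical helper. It assumes the
// grouped input ("image tag tag ...") lives in a seekable file.
function expand_grouped(string $path): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Pass 1: collect the distinct tags (the only structure kept in memory).
    $tags = [];
    while (($line = fgets($fh)) !== false) {
        $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_slice($fields, 1) as $tag) {
            $tags[$tag] = true;
        }
    }
    // Numeric-looking names become integer keys, so cast them back to strings.
    $tags = array_map('strval', array_keys($tags));
    sort($tags, SORT_STRING); // one fixed order shared by every image

    // Pass 2: re-read the file and emit one line per image/tag pair.
    rewind($fh);
    while (($line = fgets($fh)) !== false) {
        $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        if ($fields === []) {
            continue; // skip blank lines
        }
        $image = array_shift($fields);
        $assigned = array_flip($fields); // tags of this image as array keys
        foreach ($tags as $tag) {
            yield $image . ' ' . $tag . ' ' . (isset($assigned[$tag]) ? 1 : 0);
        }
    }
    fclose($fh);
}

// Possible usage (the file name is a placeholder):
// foreach (expand_grouped('grouped.txt') as $line) { echo $line, PHP_EOL; }
```

With this layout only the tag list and the tags of the current image are held in memory at any time; the cost is reading the file twice.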

p/s: the order of tag_* is not important, but it has to be the same for all images, i.e. this is not what I want (notice that the order of tag_* differs between image_foo and image_bar):

image_foo tag_lorem 1
image_foo tag_ipsum 1
image_foo tag_dolor 0
image_foo tag_sit 0
image_foo tag_amit 1
image_bar tag_sit 1
image_bar tag_lorem 0
image_bar tag_dolor 1
image_bar tag_ipsum 1
image_bar tag_amit 0

p/s2: I don’t know the range of tag_* until I finish reading stdin.

p/s3: I don’t understand why I got down-voted. If clarification is needed I am more than happy to provide it; I am not trying to make fun of anything or post nonsense here. I have rewritten the question to make it sound more like a real problem (?). However, the script really doesn’t have to care about what the input is or whether a database is used (well, the data is retrieved from an RDF data store if you MUST know), because I want the script to be usable for other types of data as long as the input is in the right format (hence the original version of this question was very general).

p/s4: I am trying to avoid using arrays because I want to avoid out-of-memory errors as much as possible (if 745 lines explaining just 10 images expand into 550k lines, just imagine having 100, 1000, or even 10000+ images).

p/s5: if you have an answer in another language, feel free to post it here. I have thought of solving this using Clojure but still couldn’t find a way to do it properly.

1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 3:51 pm

    Sorry, maybe I misunderstood you, but this looks too easy:

    $stdin = fopen('php://stdin', 'r');
    $columns_arr=array();
    $rows_arr=array();
    function set_empty_vals(&$value,$key,$columns_arr) {
        $value=array_merge($columns_arr,$value);
        ksort($value);
        foreach($value AS $val_name => $flag) {
            echo $key.' '.$val_name.' '.$flag.PHP_EOL;
        }
        $value=NULL;
    }
    while (($line = fgets($stdin)) !== false) {
        $line=trim($line);
        //Skip blank or malformed lines (no space separator)
        if(strpos($line,' ')===false)
            continue;
        list($row,$column)=explode(' ',$line,2);
        $row=trim($row);
        $column=trim($column);
        if(!isset($rows_arr[$row]))
            $rows_arr[$row]=array();
        $rows_arr[$row][$column]=1;
        $columns_arr[$column]=0;
    }
    array_walk($rows_arr,'set_empty_vals',$columns_arr);
    

    Update:

    1 million lines is easy for PHP:

    $columns_arr = array();
    $rows_arr = array();
    
    function set_null_arr(&$value, $key, $columns_arr) {
        $value = array_merge($columns_arr, $value);
        ksort($value);
        foreach($value AS $val_name => $flag) {
            //echo $key.' '.$val_name.' '.$flag.PHP_EOL;
        }
        $value=NULL;
    }
    
    for ($i = 0; $i < 100000; $i++) {
        for ($j = 0; $j < 10; $j++) {
            $row='row_foo'.$i;
            $column='column_ipsum'.$j;
            if (!isset($rows_arr[$row]))
                $rows_arr[$row] = array();
            $rows_arr[$row][$column] = 1;
            $columns_arr[$column] = 0;
        }
    }
    array_walk($rows_arr, 'set_null_arr', $columns_arr);
    
    echo memory_get_peak_usage();
    

    147 MB for me.

    Last update: this is how I would write a low-memory (but still rather fast) script:

    //Approximate stdin buffer size, 1Mb should be good
    define('MY_STDIN_READ_BUFF_LEN', 1048576);
    //Approximate tmpfile buffer size, 1Mb should be good
    define('MY_TMPFILE_READ_BUFF_LEN', 1048576);
    //Custom stdin line delimiter(\r\n, \n, \r etc.)
    define('MY_STDIN_LINE_DELIM', PHP_EOL);
    //Custom tmpfile line delimiter - choose the smallest possible
    define('MY_TMPFILE_LINE_DELIM', "\n");
    //Custom output line delimiter
    define('MY_OUTPUT_LINE_DELIM', "\n");
    
    function my_output_arr($field_name,$columns_data) {
        ksort($columns_data);
        foreach($columns_data AS $column_name => $column_flag) {
            echo $field_name.' '.$column_name.' '.$column_flag.MY_OUTPUT_LINE_DELIM;
        }
    }
    
    $tmpfile=tmpfile() OR die('Can\'t create/open temporary file!');
    $buffer_len = 0;
    $buffer='';
    //I don't think there is a point to save columns array in file -
    //it should be small enough to hold in memory.
    $columns_array=array();
    
    //Open stdin for reading
    $stdin = fopen('php://stdin', 'r') OR die('Failed to open stdin!');
    
    //Main stdin reading and tmp file writing loop
    //Using fread + explode + big buffer showed great performance boost
    //in comparison with fgets();
    while ($read_buffer = fread($stdin, MY_STDIN_READ_BUFF_LEN)) {
        $lines_arr=explode(MY_STDIN_LINE_DELIM,$buffer.$read_buffer);
        $read_buffer='';
        $lines_arr_size=count($lines_arr)-1;
        $buffer=$lines_arr[$lines_arr_size];
        for($i=0;$i<$lines_arr_size;$i++) {
            $line=trim($lines_arr[$i]);
            //There must be a space in each line - we break in it
            if(!strpos($line,' '))
                continue;
            list($row,$column)=explode(' ',$line,2);
            $columns_array[$column]=0;
            //Save line in temporary file
            fwrite($tmpfile,$row.' '.$column.MY_TMPFILE_LINE_DELIM);
        }
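    }
    //Process a final stdin line that has no trailing delimiter
    $line=trim($buffer);
    if(strpos($line,' ')) {
        list($row,$column)=explode(' ',$line,2);
        $columns_array[$column]=0;
        fwrite($tmpfile,$row.' '.$column.MY_TMPFILE_LINE_DELIM);
    }
    //Reset the carry-over buffer before re-reading the temporary file
    $buffer='';
    fseek($tmpfile,0);
    
    //Second pass: assumes all lines for one row are adjacent in the input
    $cur_row=NULL;
    $row_data=array();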
    while ($read_buffer = fread($tmpfile, MY_TMPFILE_READ_BUFF_LEN)) {
        $lines_arr=explode(MY_TMPFILE_LINE_DELIM,$buffer.$read_buffer);
        $read_buffer='';
        $lines_arr_size=count($lines_arr)-1;
        $buffer=$lines_arr[$lines_arr_size];
        for($i=0;$i<$lines_arr_size;$i++) {
            list($row,$column)=explode(' ',$lines_arr[$i],2);
            if($row!==$cur_row) {
                //Output array
                if($cur_row!==NULL)
                    my_output_arr($cur_row,array_merge($columns_array,$row_data));
                $cur_row=$row;
                $row_data=array();
            }
            $row_data[$column]=1;
        }
    }
    
    if(count($row_data)&&$cur_row!==NULL) {
        my_output_arr($cur_row,array_merge($columns_array,$row_data));
    }
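
One possible refinement of the last script, sketched under the same assumption it already makes (all lines for one image are adjacent in the input): sort the column names once, then walk that fixed list for every row instead of calling array_merge and ksort per row. The names `emit_rows`, `format_row`, and `$pairs` are illustrative assumptions; `$pairs` stands in for the row/column pairs read back from the temporary file.

```php
<?php
// Sketch only: emit_rows() and format_row() are hypothetical helpers,
// not part of the script above.

// Yield "row column flag" lines for one row, in the fixed column order.
function format_row(string $row, array $columns, array $row_data): Generator
{
    foreach ($columns as $name) {
        yield $row . ' ' . $name . ' ' . (isset($row_data[$name]) ? 1 : 0);
    }
}

// $pairs yields [row, column] pairs grouped by row; $columns lists every
// distinct column name collected during the first pass.
function emit_rows(iterable $pairs, array $columns): Generator
{
    // Numeric-looking names may arrive as integers, so cast them to strings.
    $columns = array_map('strval', $columns);
    sort($columns, SORT_STRING); // sort once instead of ksort() per row

    $cur_row = null;
    $row_data = [];
    foreach ($pairs as [$row, $column]) {
        if ($row !== $cur_row) {
            // A new row starts: flush the previous one first.
            if ($cur_row !== null) {
                yield from format_row($cur_row, $columns, $row_data);
            }
            $cur_row = $row;
            $row_data = [];
        }
        $row_data[$column] = 1;
    }
    // Flush the last row.
    if ($cur_row !== null) {
        yield from format_row($cur_row, $columns, $row_data);
    }
}
```

In the script above this could be fed with array_keys($columns_array) and a generator that yields the pairs read from the temporary file. The output order may differ from what ksort produces for numeric-looking names, but it stays identical for every row, which is all the question requires.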
    