(Sorry in advance for the long question. The problem is actually simple, but explaining it is maybe not so simple.)
My newbie PHP skills are challenged by this:
The input is 2 TXT files with a structure like this:
$rowidentifier //number,letter,string etc..
$some semi-fixed-string $somedelimiter $semi-fixed-string
$content // unknown length: any number of strings or lines
Reading the above: by “semi-fixed-string” I mean a string with a KNOWN structure but UNKNOWN content.
To give a practical example, let’s take an SRT file (I just use it as a guinea pig, as the structure is very similar to what I need):
1
00:00:12,759 --> 00:00:17,458
"some content here "
that continues here
2
00:00:18,298 --> 00:00:20,926
here we go again...
3
00:00:21,368 --> 00:00:24,565
...and this can go forever...
4
.
.
.
What I want to do is take the $content part from one file and put it IN THE RIGHT PLACE in the second file.
Going back to the SRT example, having:
//file1
1
00:00:12,759 --> 00:00:17,458
"this is the italian content "
which continues in italian here
2
00:00:18,298 --> 00:00:20,926
here we go talking italian again ...
and
//file2
1
00:00:12,756 --> 00:00:17,433
"this is the spanish, chinese, or any content "
which continues in spanish, or chinese here
2
00:00:16,293 --> 00:00:20,96
here we go talking spanish, chinese or german again ...
will result in
//file3
1
00:00:12,756 --> 00:00:17,433
"this is the italian content "
which continues in italian here
"this is the spanish, chinese, or any content "
which continues in spanish, or chinese here
2
00:00:16,293 --> 00:00:20,96
here we go talking italian again ...
here we go talking spanish, chinese or german again ...
or, more PHP-like:
$rowidentifier //unchanged
$some semi-fixed-string $somedelimiter $semi-fixed-string //unchanged, except maybe an option to choose whether to keep the one from file1 or file2 ...
$content //from file 1
$content //from file 2
So, after all this introduction, this is what I have (which amounts to nothing, actually):
$first_file = file('file1.txt'); // no need to comment, right?
$second_file = file('file2.txt'); // see above comment
$result_array = array(); // construct array
foreach ($first_file as $key => $value) { // loop array and...
    $result_array[] = trim($value) . "\r\n" . trim($second_file[$key] ?? ''); // ...here is my problem...
    // $value is $content, but LINE BY LINE, and in our case it could be 2, 3 or even 4 lines
    // should I go by delimiters (\r\n)? (not a good idea: how can I know they are there?)
    // or should I go for regex to look up string patterns? that is insane, no?
}
$fp = fopen('merge.txt', 'w+');
fwrite($fp, join("\r\n", $result_array));
fclose($fp);
This works line by line, which is not what I need. I need conditions.
Also, I am sure this is not smart code and that there are many better ways to go about it, so any help would be appreciated.
What you actually want to do is to iterate over both files in parallel and then combine the parts that belong together.
But you cannot use the line numbers, because those might differ. So you need to use the number of the entry (block) instead. More precisely, you need to get one entry after the other out of each file.
So you need an iterator for the data in question that is able to turn some lines into a block.
So instead of looping over the lines of each file, the loop runs over the blocks of each file.
This can be done by writing your own iterator that takes a file’s lines as input and converts lines into blocks on the fly. For that you need to parse the data, and a small state-based parser is enough to convert lines into blocks.
It just runs along the lines and changes its state. Based on that state, each line is processed as part of its block. If a new block begins, it is created. It works for the SRT file you’ve outlined in your question.
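A minimal sketch of such a state-based parser could look like the following. The function name `parseSrtBlocks` and the array shape (`number`, `times`, `text`) are assumptions made for illustration. It accepts blocks separated by a blank line, and also the layout shown in the question, where a timing line directly follows the next block's number.

```php
<?php
// Hypothetical helper (name and array shape are assumptions):
// groups SRT lines into blocks with a small state machine.
function parseSrtBlocks(array $lines): array
{
    // Matches a timing line such as "00:00:12,759 --> 00:00:17,458".
    $timing = '/^\d{2}:\d{2}:\d{2},\d+\s+-->\s+\d{2}:\d{2}:\d{2},\d+/';
    $blocks = [];
    $block  = null;
    $state  = 'number'; // what the parser expects next: number, times or text

    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");

        switch ($state) {
            case 'number':
                if (trim($line) !== '') { // skip blank lines between blocks
                    $block = ['number' => trim($line), 'times' => '', 'text' => []];
                    $state = 'times';
                }
                break;

            case 'times':
                $block['times'] = trim($line);
                $state = 'text';
                break;

            case 'text':
                if (trim($line) === '') {
                    // A blank line closes the block.
                    $blocks[] = $block;
                    $block    = null;
                    $state    = 'number';
                } elseif (preg_match($timing, $line) && $block['text'] !== []) {
                    // No blank line: the previous "text" line was really the
                    // number of the next block, so move it over.
                    $number   = trim(array_pop($block['text']));
                    $blocks[] = $block;
                    $block    = ['number' => $number, 'times' => trim($line), 'text' => []];
                } else {
                    $block['text'][] = $line;
                }
                break;
        }
    }

    if ($block !== null) {
        $blocks[] = $block; // the last block has no terminator
    }

    return $blocks;
}

// Usage with the sample from the question.
$lines = [
    "1", "00:00:12,759 --> 00:00:17,458", '"some content here "', "that continues here",
    "2", "00:00:18,298 --> 00:00:20,926", "here we go again...",
];
foreach (parseSrtBlocks($lines) as $block) {
    echo $block['number'], ': ', implode(' | ', $block['text']), "\n";
}
```

The explicit `$state` variable keeps each case short: every line is interpreted only in the light of what the parser expects at that point.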
To make it more flexible to use, turn it into an iterator which takes $lines in its constructor and offers the blocks while iterating. This needs a little adaptation in how the parser gets the lines to work on, but it works generally the same.
The basic usage is to create the object from the lines and loop over it with foreach, which yields one block per iteration.
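As a rough sketch of that idea, here is one way such an iterator could be written. `SRTBlocks` is the class name used in this answer, but the body below is an assumption, as is the array shape of a block. It targets PHP 8.0 or later (union types and `mixed`), and it reads only as many lines as it needs for the current block.

```php
<?php
// Sketch only: the internals of SRTBlocks are assumed, not the original code.
class SRTBlocks implements Iterator
{
    private const TIMING = '/^\d{2}:\d{2}:\d{2},\d+\s+-->\s+\d{2}:\d{2}:\d{2},\d+/';

    private Iterator $lines;
    private ?array $current = null; // block offered by current()
    private ?array $pending = null; // header of the next block, already read
    private int $index = 0;

    public function __construct(array|Iterator $lines)
    {
        $this->lines = is_array($lines) ? new ArrayIterator($lines) : $lines;
    }

    public function rewind(): void
    {
        $this->lines->rewind();
        $this->pending = null;
        $this->index   = 0;
        $this->current = $this->readBlock();
    }

    public function valid(): bool
    {
        return $this->current !== null;
    }

    public function current(): mixed
    {
        return $this->current;
    }

    public function key(): mixed
    {
        return $this->index;
    }

    public function next(): void
    {
        $this->index++;
        $this->current = $this->readBlock();
    }

    // Consumes lines until one block is complete; returns null at the end.
    private function readBlock(): ?array
    {
        $block = $this->pending;
        $this->pending = null;

        while ($this->lines->valid()) {
            // Strip trailing EOL characters so file-based iterators work too.
            $line = rtrim((string) $this->lines->current(), "\r\n");
            $this->lines->next();

            if ($block === null) {                // state: expecting a number
                if (trim($line) !== '') {
                    $block = ['number' => trim($line), 'times' => null, 'text' => []];
                }
            } elseif ($block['times'] === null) { // state: expecting the timing
                $block['times'] = trim($line);
            } elseif (trim($line) === '') {       // state: text, blank line ends it
                return $block;
            } elseif (preg_match(self::TIMING, $line) && $block['text'] !== []) {
                // The previous "text" line was really the next block's number.
                $number = trim(array_pop($block['text']));
                $this->pending = ['number' => $number, 'times' => trim($line), 'text' => []];
                return $block;
            } else {
                $block['text'][] = $line;
            }
        }

        return $block; // last block, or null when the input is exhausted
    }
}

// Basic usage: one block per iteration.
$lines = [
    "1", "00:00:12,759 --> 00:00:17,458", '"some content here "', "that continues here",
    "2", "00:00:18,298 --> 00:00:20,926", "here we go again...",
];
foreach (new SRTBlocks($lines) as $index => $block) {
    echo $index, ' => block ', $block['number'], ' with ', count($block['text']), " text line(s)\n";
}
```

Here the parser state is derived from how far the current block has been filled, which keeps the class free of a separate state property.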
So now it’s possible to iterate over all blocks in an SRT file. The only thing left is to iterate over both SRT files in parallel. Since PHP 5.3 the SPL comes with the MultipleIterator, which does this. It’s now pretty straightforward: attach one block iterator per file and loop over the MultipleIterator, which hands you the matching blocks of both files on each step.
Storing the string in a file (instead of outputting it) is rather trivial, so I leave that out of the answer.
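To illustrate only the parallel iteration, the sketch below feeds `MultipleIterator` with two plain arrays of already parsed blocks, wrapped in `ArrayIterator`. The block arrays (keys `number`, `times`, `text`) are assumed sample data modelled on the question; in a real script each `ArrayIterator` would be replaced by a block iterator over one file.

```php
<?php
// Assumed sample data: blocks as they might come out of a block parser.
$italian = [
    ['number' => '1', 'times' => '00:00:12,759 --> 00:00:17,458',
     'text' => ['"this is the italian content "', 'which continues in italian here']],
    ['number' => '2', 'times' => '00:00:18,298 --> 00:00:20,926',
     'text' => ['here we go talking italian again ...']],
];
$other = [
    ['number' => '1', 'times' => '00:00:12,756 --> 00:00:17,433',
     'text' => ['"this is the spanish, chinese, or any content "', 'which continues in spanish, or chinese here']],
    ['number' => '2', 'times' => '00:00:16,293 --> 00:00:20,96',
     'text' => ['here we go talking spanish, chinese or german again ...']],
];

// MIT_NEED_ALL stops as soon as one of the inputs runs out of blocks.
$both = new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_NUMERIC);
$both->attachIterator(new ArrayIterator($italian));
$both->attachIterator(new ArrayIterator($other));

$merged = [];
foreach ($both as [$first, $second]) {
    // Keep number and timing from the second file, then append both texts.
    $merged[] = implode("\r\n", array_merge(
        [$second['number'], $second['times']],
        $first['text'],
        $second['text']
    ));
}

echo implode("\r\n\r\n", $merged), "\r\n";
```

Choosing `$first` instead of `$second` for the number and timing lines would keep the header of the first file, which covers the option mentioned in the question.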
So what to remark? First, sequential data like lines in a file can easily be parsed with a loop and some state. That works not only for lines in a file but also for strings.
Second, why did I suggest an iterator here? First, it’s easy to use: it was only a small step from handling one file to handling two files in parallel. Next to that, the iterator can operate on another iterator as well, for example the SplFileObject class, which provides an iterator over all lines in a file. If you have large files, you can use an SplFileObject (instead of an array) and you won’t need to load both files into arrays first, after a small addition to SRTBlocks that removes trailing EOL characters from the end of each line. It just works.
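The EOL handling can be sketched on its own. Below, a hypothetical generator `stripEol()` wraps any line iterator and removes trailing `\r` and `\n` characters lazily, so only one line is held at a time. The temporary file stands in for `file1.txt` and exists purely to keep the example self-contained.

```php
<?php
// Hypothetical helper: lazily strips trailing EOL characters from each line.
function stripEol(iterable $lines): Generator
{
    foreach ($lines as $line) {
        yield rtrim((string) $line, "\r\n");
    }
}

// Demo file only; a real script would open file1.txt instead.
$path = tempnam(sys_get_temp_dir(), 'srt');
file_put_contents($path, "1\r\n00:00:12,759 --> 00:00:17,458\r\nfirst line\r\n");

// SplFileObject reads the file line by line instead of loading it whole.
$file = new SplFileObject($path);

$seen = [];
foreach (stripEol($file) as $line) {
    $seen[] = $line; // collected here only to show the result
}

$file = null;   // release the file handle before deleting the demo file
unlink($path);

print_r($seen);
```

Because `SplFileObject` is itself an iterator, it can be handed to anything that accepts an iterator of lines in place of an array from `file()`.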
That done, you can process even really large files with (nearly) the same code. Flexible, isn’t it?