Recently I’ve been busy with some PHP framework – completely off-topic by the way.
Anyhow, I have some specific HTML/template files I would like to parse with C++ (don’t ask me why; I just want to write it in C++). Besides, it might actually be the first useful thing I ever write in C++.
Anyway, to get back to the problem, imagine I have a file like the following:
<table>
<tr>
<th>ID</th>
<th>Title</th>
<th>Actions</th>
</tr>
{foreach from="$pages => $page"}
<tr>
<td>{$page.Id()}</td>
<td>{$page.Title()}</td>
<td><a href="page/edit/{$page.Id()}/">Edit</a> | <a href="page/delete/{$page.Id()}/">Delete</a></td>
</tr>
{foreachelse}
<tr>
<td colspan="3">There are no pages to be displayed</td>
</tr>
{/foreach}
</table>
And the output should be:
<table>
<tr>
<th>ID</th>
<th>Title</th>
<th>Actions</th>
</tr>
<?php if(count($pages) > 0): ?>
<?php foreach($pages as $page): ?>
<tr>
<td><?php echo $page->getId(); ?></td>
<td><?php echo $page->getTitle(); ?></td>
<td><a href="page/edit/<?php echo $page->getId(); ?>/">Edit</a> | <a href="page/delete/<?php echo $page->getId(); ?>/">Delete</a></td>
</tr>
<?php endforeach; ?>
<?php else: ?>
<tr>
<td colspan="3">There are no pages to be displayed</td>
</tr>
<?php endif; ?>
</table>
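Why I am doing this might not be entirely clear to you, but the problem stands on its own and applies elsewhere in any case.
Anyhow, some forward and backward lookups and modifications in the output are required. For example, a {foreachelse} only shows up after the loop body, yet it means an if(count(...)) line has to be emitted before the foreach. What is the right approach to this problem?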
For this type of problem I tend to lean towards regex, using either boost::regex, the GNU regex classes, or any other library. Identifying those markers and converting them is mostly a regex search-and-replace job (with parameters for variable names, values, etc.), and you don’t have to write code to actually parse the complete HTML and the special inserts.
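To make the search-and-replace idea concrete, here is a rough sketch. It uses std::regex from the standard library rather than boost::regex or GNU regex, purely to keep the example self-contained. The function name convert_template is made up for illustration, the Id() to getId() mapping is inferred from the sample output in the question, and the patterns assume that foreach blocks are not nested.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): rewrites the template
// tags shown in the question into PHP. Assumes foreach blocks are NOT nested.
std::string convert_template(const std::string& input) {
    // 1. Whole block with an else branch:
    //    {foreach from="$list => $item"} body {foreachelse} alt {/foreach}
    //    [\s\S]*? is a non-greedy match that also spans newlines.
    static const std::regex foreach_else(
        R"re(\{foreach from="(\$\w+) => (\$\w+)"\}([\s\S]*?)\{foreachelse\}([\s\S]*?)\{/foreach\})re");
    std::string out = std::regex_replace(
        input, foreach_else,
        "<?php if(count($1) > 0): ?>\n"
        "<?php foreach($1 as $2): ?>$3<?php endforeach; ?>\n"
        "<?php else: ?>$4<?php endif; ?>");

    // 2. Any remaining foreach without a {foreachelse} branch.
    static const std::regex foreach_open(
        R"re(\{foreach from="(\$\w+) => (\$\w+)"\})re");
    out = std::regex_replace(out, foreach_open, "<?php foreach($1 as $2): ?>");
    static const std::regex foreach_close(R"(\{/foreach\})");
    out = std::regex_replace(out, foreach_close, "<?php endforeach; ?>");

    // 3. Method calls: {$page.Id()} becomes <?php echo $page->getId(); ?>
    //    (the "get" prefix is an assumption taken from the sample output).
    static const std::regex method_call(R"(\{(\$\w+)\.(\w+)\(\)\})");
    out = std::regex_replace(out, method_call, "<?php echo $1->get$2(); ?>");

    return out;
}
```

Handling the {foreachelse} block as one match is what takes care of the backward lookup: the if(count(...)) line is written in the same replacement that sees the else branch. Nested loops would need a real parser or a stack-based scan instead, and non-greedy [\s\S]*? matches may get slow on very large files.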