I’m trying to figure out how to strip content after the closing HTML tag using only bash or common GNU tools. For example, given the following HTML template, what would be an efficient way to remove the trailing comment without touching the embedded comment, and without using an external language such as Python?
<!DOCTYPE html>
<html>
<head>
<title>Site | Page 1</title>
</head>
<body>
<!-- Don't delete me! -->
</body>
</html>
<!--
Man, I really wish to vanish!
-->
The only thing I can come up with is to read the whole file into memory and process it there, i.e. something as archaic as finding the location of the closing HTML tag with regex magic, truncating everything after it, and writing the result back out to disk.
Using sed, for example:
sed -n '1,/<\/html>/p'
Where:
-n suppresses automatic printing of the pattern space
1 is the first line of the range
/<\/html>/ matches the last line of the range
p prints these lines
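To actually write the result back to disk, the range print described above can be wrapped in a small function. This is only a sketch under a few assumptions: the function name strip_after_html is made up for illustration, the closing tag is written exactly as </html> and does not sit on line 1, and mktemp is available.

```shell
# Hypothetical helper (name is illustrative): keep everything up to and
# including the first line containing </html>, then overwrite the file.
strip_after_html() {
    sah_file=$1
    sah_tmp=$(mktemp) || return 1
    # -n suppresses default output; p prints only lines 1 through the match.
    # If no line matches, the range runs to end of file and nothing is lost.
    if sed -n '1,/<\/html>/p' "$sah_file" > "$sah_tmp"; then
        # Overwrite in place so the original file keeps its permissions.
        cat "$sah_tmp" > "$sah_file"
    fi
    rm -f "$sah_tmp"
}
```

Calling strip_after_html on the template from the question should leave the embedded comment alone and drop the trailing one. sed works line by line, so the file is not held in memory as a whole. A variant that stops reading at the match instead of scanning the rest of the file is sed '/<\/html>/q'.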