I have some Fortran 77 source files that I’m trying to convert from a non-standard STRUCTURE and RECORD syntax to the standardized Fortran 90 TYPE syntax. One tricky aspect of this is the different way that structure members are addressed.
Non-standard:
s.member = 1
Standard:
s%member = 1
So, I need to trap all uses of periods in these sorts of scenarios and replace them with % characters. Not too bad, except when you think about all of the ways that periods can be used (decimal points in numbers, filenames in include statements, punctuation in comments, Fortran 77 relational operators, maybe others). I’ve done some preprocessing to fix the relational operators to use the Fortran 90 symbols, and I don’t really care about mangling the grammar of comments, but I haven’t come up with a good approach to translate the . to % for the cases above. It seems like I should be able to do this with sed, but I’m not sure how to match the instances I need to fix. Here are the rules that I’ve thought of:
On a line-by-line basis:
- If the line begins with <whitespace>include, then we shouldn’t do anything to that line; pass it through to the output, so we don’t mess up the filename inside the include statement.
- The following strings are operators that don’t have symbolic equivalents, so they must be left alone: .not. .and. .or. .eqv. .neqv.
- Otherwise, if we find a period that is surrounded by 2 non-numeric characters (so it’s not a decimal point), then it should be the operator that I’m looking to replace. Change that period to a %.
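To make the rules concrete, here is a rough Python sketch of them, applied one line at a time. The names `convert_line`, `_OPS`, `_INCLUDE` and `_DOT` are made up for illustration. One assumption goes beyond the rule list: `.true.` and `.false.` are also protected, since they are dotted logical literals that the third rule would otherwise mangle. The sketch makes no attempt to recognise comments or string literals, so periods inside those are rewritten too.

```python
import re

# Dotted tokens that must be left alone. The first five come from the rule
# list; "true" and "false" are an added assumption (logical literals).
_OPS = r"(?:not|and|or|eqv|neqv|true|false)"

# Rule 1: a line that begins with optional whitespace followed by "include".
_INCLUDE = re.compile(r"\s*include\b", re.IGNORECASE)

# Rule 2 is tried first, so a whole operator is consumed before rule 3
# (a period with a non-digit character on each side) can see its periods.
_DOT = re.compile(rf"\.{_OPS}\.|(?<=[^0-9])\.(?=[^0-9])", re.IGNORECASE)


def convert_line(line):
    """Apply the three rules to a single source line."""
    if _INCLUDE.match(line):
        return line  # pass include lines through untouched
    # A match longer than one character is a protected operator: keep it.
    return _DOT.sub(lambda m: "%" if m.group(0) == "." else m.group(0), line)


if __name__ == "__main__":
    print(convert_line("if a.member < 4.0 .and. b.othermember == 1.0"))
```

Because this is purely textual, a filename in a non-include line (for example in an open statement) would still be damaged, so the output needs to be reviewed.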
I’m not a native Fortran speaker myself, so here are some examples:
include 'file.inc' ! We don't want to do anything here. The line can
! begin with some amount of whitespace
if x == 1 .or. y > 2.0 ! In this case, we don't want to touch the periods that
! are part of the logical operator ".or.". We also don't
! want to touch the period that is the decimal point
! in "2.0".
if a.member < 4.0 .and. b.othermember == 1.0 ! We don't want to touch the periods
! inside the numbers, but we need to
! change the "a." and "b." to "a%"
! and "b%".
Any good way of tackling this problem?
Edit: I actually found some additional operators that contain a dot in them that don’t have symbolic equivalents. I’ve updated the rule list above.
You can’t do this with a regexp, and it’s not that easy.
If I had to do what you have to, I would probably do it by hand, unless the codebase is huge. For a manageable codebase, first replace the period in every match of [a-zA-Z0-9]\.[a-zA-Z] with something very weird that is guaranteed never to compile, something like “@WHATEVER@”, then search for all of these markers and fix each one by hand after checking it manually.
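A minimal sketch of that marking pass in Python, assuming the whole file has been read into a string. The function name `mark_candidates` is hypothetical. It uses lookarounds so that only the period is swapped for the marker and the characters on either side are kept, which is a small departure from replacing the whole three-character match.

```python
import re

MARKER = "@WHATEVER@"  # deliberately something that can never compile

# A period with a letter or digit before it and a letter after it.
# The lookarounds leave the neighbouring characters in place.
CANDIDATE = re.compile(r"(?<=[a-zA-Z0-9])\.(?=[a-zA-Z])")


def mark_candidates(source):
    """Return (marked_text, number_of_markers) for manual review."""
    return CANDIDATE.subn(MARKER, source)


if __name__ == "__main__":
    text, count = mark_candidates("if a.member < 4.0 .and. b.othermember == 1.0")
    print(count, text)
```

This over-matches on purpose: filenames such as 'file.inc' and constants such as 1.e5 are flagged as well, and it is the manual pass that decides whether each marker becomes % or goes back to a period.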
If the amount of code is huge, then you need to write a parser. I would suggest using Python to tokenize basic Fortran constructs, but remember that Fortran is not an easy language to parse. Work “per routine”, and try to find all variable names used, using them as a filter. If you encounter something like a.whatever, and you know that a is in the list of local or global vars, apply the change.
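The filtering step could look roughly like the sketch below. It is not a parser: it assumes you have already collected the names of the RECORD variables for the routine (the `record_names` argument), and `convert_members` is a made-up name. It handles chains such as a.b.c, but not array subscripts such as a(i).b, and it does not distinguish code from comments or strings.

```python
import re

# Dotted operators and logical literals that must never be rewritten.
_OPS = r"(?:not|and|or|eqv|neqv|true|false)"

# Either a dotted operator (matched first and kept as-is), or an identifier
# followed by one or more ".member" parts. The negative lookahead stops a
# chain before something like ".and." written without surrounding spaces.
_TOKEN = re.compile(
    rf"\.{_OPS}\.|\b([A-Za-z]\w*)((?:\.(?!{_OPS}\.)[A-Za-z]\w*)+)",
    re.IGNORECASE,
)


def convert_members(line, record_names):
    """Rewrite name.member as name%member only when name is a known record."""
    known = {name.lower() for name in record_names}  # Fortran ignores case

    def fix(match):
        if match.group(1) is None:  # the operator alternative matched
            return match.group(0)
        base, members = match.group(1), match.group(2)
        if base.lower() in known:
            return base + members.replace(".", "%")
        return match.group(0)  # unknown name: leave it alone

    return _TOKEN.sub(fix, line)


if __name__ == "__main__":
    line = "if a.member < 4.0 .and. b.othermember == 1.0"
    print(convert_members(line, {"a", "b"}))
```

Filtering on known names is what keeps things like 'file.inc' intact: file is not in the list, so its period is never touched.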