I would like to improve the lambda-code that OCaml 3.12.1 generates for the “assert” construct. Here is an example:
let f x =
  assert (x = 4);
  assert (2 + x = 6);
  assert (x - x = 0);
  exit x
The file longfilename.ml above is representative of large OCaml modules for which I would like lambda-code generation to be improved. It compiles to:
$ ocamlopt -S longfilename.ml
$ cat longfilename.s
...
.data
.quad 3072
_camlLongfilename__2:
.quad L100007
.quad 9
.quad 9
.quad 2300
L100007: .L100007:
.ascii "longfilename.ml"
.byte 0
.data
.quad 3072
_camlLongfilename__3:
.quad L100006
.quad 7
.quad 9
.quad 2300
L100006: .L100006:
.ascii "longfilename.ml"
.byte 0
.data
.quad 3072
_camlLongfilename__4:
.quad L100005
.quad 5
.quad 9
.quad 2300
L100005: .L100005:
.ascii "longfilename.ml"
.byte 0
...
The above is terribly redundant: the name of the source file is emitted again for every assertion. The culprit appears to be bytecomp/translcore.ml:
let assert_failed loc =
  (* [Location.get_pos_info] is too expensive *)
  let fname = match loc.Location.loc_start.Lexing.pos_fname with
              | "" -> !Location.input_name
              | x -> x
  in
  let pos = loc.Location.loc_start in
  let line = pos.Lexing.pos_lnum in
  let char = pos.Lexing.pos_cnum - pos.Lexing.pos_bol in
  Lprim(Praise, [Lprim(Pmakeblock(0, Immutable),
          [transl_path Predef.path_assert_failure;
           Lconst(Const_block(0,
              [Const_base(Const_string fname);
               Const_base(Const_int line);
               Const_base(Const_int char)]))])])
;;
On the face of it, it looks like it would be enough to give a name to Const_base(Const_string fname), and to store and reuse it with a compile-time hash-table. For intra-module optimization, the changes just might be manageable (as long as the hash-table is reset at each compilation unit). I am a little out of my depth here, especially with the “reset at each compilation unit” part. Any hints?
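To make the hash-table idea above concrete, here is a minimal standalone sketch. It is not compiler code: shared_strings, reset_shared_strings and label_of_string are hypothetical names invented for illustration, and where a reset would actually be hooked into the compiler is exactly the part I am unsure about.

```ocaml
(* Hypothetical sketch: none of these names exist in the OCaml compiler. *)

(* Maps each string constant seen so far to the label chosen for it. *)
let shared_strings : (string, string) Hashtbl.t = Hashtbl.create 17

(* Counter used to generate fresh labels. *)
let next_id = ref 0

(* Assumed to be called once at the start of each compilation unit,
   so that labels never leak from one unit into the next. *)
let reset_shared_strings () =
  Hashtbl.clear shared_strings;
  next_id := 0

(* Return the label for [s], allocating a fresh one on first use. *)
let label_of_string s =
  try Hashtbl.find shared_strings s
  with Not_found ->
    let lbl = Printf.sprintf "shared_string_%d" !next_id in
    incr next_id;
    Hashtbl.add shared_strings s lbl;
    lbl
```

With something like this, every assertion in a module would ask for the label of the same file name and get the same label back, so the string would only need to be emitted once per compilation unit.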
There already is a mechanism in the OCaml compiler to share some constants: see asmcomp/compilenv.ml and its use, in particular of the structured_constants value, in asmcomp/cmmgen.ml. I am not familiar with this code, so I am not sure why your particular use case is not shared, but there seems to be a difference, in the lambda-code, between Const_base (Const_string foo) and Const_immstring foo; the latter are shared, and maybe the former are not.
I don’t know what the intended semantics of immstring is. It seems to be used by the compiler internally to compile method labels (bytecomp/translclass.ml), but it is not exposed to the input language.
(I suspect the distinction is because strings are mutable, so sharing user-visible strings would be observable and change program behavior. But string constants are already lambda-lifted so users can already observe semantically-inconsistent sharing. Increasing sharing of user-visible strings would probably still be rejected as a compatibility break.)
Looking at the way those immediate strings are handled by the constant-emitting code (asmcomp/cmmgen.ml:emit_constant), they are represented like the usual strings, so maybe you could just patch the compiler to use an immstring in assert_failed and things would work.
[EDIT BY OP]
Changing Const_base (Const_string fname) into Const_immstring fname, while slightly incompatible, allows OCaml to compile itself and to compile Frama-C, and the new Frama-C passes its regression tests. On the original example, the effect is as follows, which was exactly the desired result: