I found the concept as in a paper on dynamic instrumentation. But I couldnt find the explanation of this concept. Please explain, if possible…
EDIT: or is there any tutorial on how to achieve lightweight dynamic instrumentation (in user space, for syscalls and normal function calls)?
EDIT(Added paper details):
A code generation approach to optimizing high-performance distributed data stream processing
We present a code-generation-based
optimization approach to bringing
performance and scalability to
distributed stream processing
applications. We express stream
processing applications using an
operator-based, stream-centric
language called SPADE, which supports
composing distributed data flow graphs
out of toolkits of type-generic
operators. A major challenge in
building such applications is to find
an effective and flexible way of
mapping the logical graph of operators
into a physical one that can be
deployed on a set of distributed
nodes. This involves finding how best
operators map to processes and how
best processes map to computing nodes.
In this paper, we take a two-stage
optimization approach, where an
instrumented version of the
application is first generated by the
SPADE compiler to profile and collect
statistics about the processing and
communication characteristics of the
operators within the application. In
the second stage, the profiling
information is fed to an optimizer to
come up with a physical data flow
graph that is deployable across nodes
in a computing cluster. This approach
not only creates highly optimized
applications that are tailored to the
underlying computing and networking
infrastructure, but also makes it
possible to re-target the application
to a different hardware setup by
simply repeating the optimization step
and re-compiling the application to
match the physical flow graph produced
by the optimizer. Using real-world
applications, from diverse domains
such as finance and radio-astronomy,
we demonstrate the effectiveness of
our approach on System S — a
large-scale, distributed stream
processing platform.
Instrumentation means inserting code into a stream of instructions whose purpose is to measure something — execution time, function calls, data access, all sorts of things relating to profiling. That’s one of two ways to do profiling, and it’s the more accurate but slower one. The other one is sampling, where you periodically interrupt the program and look at its current state. This has less performance impact but isn’t as accurate, especially for short runs.