The question may seem vague, but let me explain it.
Suppose we have a function f(x,y,z ….) and we need to find its value at the point (x1,y1,z1 …..).
The most trivial approach is to just replace (x,y,z …) with (x1,y1,z1 …..).
Now suppose that the function is taking a lot of time in evaluation and I want to parallelize the algorithm to evaluate it. Obviously it will depend on the nature of function, too.
So my question is: what are the constraints that I have to look for while “thinking” to parallelize f(x,y,z…)?
If possible, please share links to study.
Asking the question in such a general way does not permit very specific advice to be given.
I’d begin the analysis by looking for ways to evaluate or rewrite the function using groups of variables that interact closely, creating intermediate expressions that can be used to make the final evaluation. You may find a way to do this involving a hierarchy of subexpressions that leads from the variables themselves to the final function.
In general the shorter and wider such an evaluation tree is, the greater the degree of parallelism. There are two cautionary notes to keep in mind that detract from “more parallelism is better.”
For one thing a highly parallel approach may actually involve more total computation than your original “serial” approach. In fact some loss of efficiency in this regard is to be expected, since a serial approach can take advantage of all prior subexpression evaluations and maximize their reuse.
For another thing the parallel evaluation will often have worse rounding/accuracy behavior than a serial evaluation chosen to give good or optimal error estimates.
A lot of work has been done on evaluations that involve matrices, where there is usually a lot of symmetry to how the function value depends on its arguments. So it helps to be familiar with numerical linear algebra and parallel algorithms that have been developed there.
Another area where a lot is known is for multivariate polynomial and rational functions.
When the function is transcendental, one might hope for some transformations or refactoring that makes the dependence more tractable (algebraic).
Not directly relevant to your question are algorithms that amortize the cost of computing function values across a number of arguments. For example in computing solutions to ordinary differential equations, there may be “multi-step” methods that share the cost of evaluating derivatives at intermediate points by reusing those values several times.
I’d suggest that your concern to speed up the evaluation of the function suggests that you plan to perform more than one evaluation. So you might think about ways to take advantage of prior evaluations or perform evaluations at related arguments in a way that contributes to your search for parallelism.
Added: Some links and discussion of search strategy
Most authors use the phrase “parallel function evaluation” to
mean evaluating the same function at multiple argument points.
See for example:
[Coarse Grained Parallel Function Evaluation — Rulon and Youssef]
http://cdsweb.cern.ch/record/401028/files/p837.pdf
A search strategy to find the kind of material Gaurav Kalra asks
about should try to avoid those. For example, we might include
“fine-grained” in our search terms.
It’s also effective to focus on specific kinds of functions, e.g.
“polynomial evaluation” rather than “function evaluation”.
Here for example we have a treatment of some well-known techniques
for “fast” evaluations applied to design for GPU-based computation:
[How to obtain efficient GPU kernels — Cruz, Layton, and Barba]
http://arxiv.org/PS_cache/arxiv/pdf/1009/1009.3457v1.pdf
(from their Abstract) “Here, we have tackled fast summation
algorithms (fast multipole method and fast Gauss transform),
and applied algorithmic redesign for attaining performance on
GPUs. The progression of performance improvements attained
illustrates the exercise of formulating algorithms for the
massively parallel architecture of the GPU.”
Another search term that might be worth excluding is “pipelined”.
This term invariably discusses the sort of parallelism that can
be used when multiple function evaluations are to be done. Early
stages of the computation can be done in parallel with later
stages, but on different inputs.
So that’s a search term that one might want to exclude. Or not.
Here’s a paper that discusses n-fold speedup for n-variate
polynomial evaluation over finite fields GF(p). This might be
of direct interest for cryptographic applications, but the
approach via modified Horner’s method may be interesting for
its potential for generalization:
[Comparison of Bit and Word Level Algorithms for Evaluating
Unstructured Functions over Finite Rings — Sunar and Cyganski]
http://www.iacr.org/archive/ches2005/018.pdf
“We present a modification to Horner’s algorithm for evaluating
arbitrary n-variate functions defined over finite rings and fields.
… If the domain is a finite field GF(p) the complexity of
multivariate Horner polynomial evaluation is improved from O(p^n)
to O((p^n)/(2n)). We prove the optimality of the presented algorithm.”
Multivariate rational functions can be considered simply as the
ratio of two such polynomial functions. For the special case
of univariate rational functions, which can be particularly
effective in approximating elementary transcendental functions
and others, can be evaluated via finite (resp. truncated)
continued fractions, whose convergents (partial numerators
and denominators) can be defined recursively.
The topic of continued fraction evaluations allows us to segue
to a final link that connects that topic with some familiar
parallelism of numerical linear algebra:
[LU Factorization and Parallel Evaluation of Continued Fractions
— Ömer Egecioglu]
http://www.cs.ucsb.edu/~omer/DOWNLOADABLE/lu-cf98.pdf
“The first n convergents of a general continued fraction
(CF) can be computed optimally in logarithmic parallel
time using O(n/log(n))processors.”