The function search below searches for two inputs which have the same output under some function. During the search it iterates over the input list xs twice, and this input list could be very large, e.g. [0..1000000000]. I’d rather use memory for storing the HashSet created by collision rather than storing the elements of xs, and my understanding is that even though xs could be lazily computed it would be kept around in case it was needed for the call to find.
Questions:
- is this understanding correct?
- if I keep it as a list, is there a way I can have xs recomputed if it is passed to find?
- is there an alternative data structure I can use for xs which allows me to control the space used? xs is just used to specify which inputs to check.
Note that there are no type restrictions on xs – it can be a collection of any type.
import qualified Data.HashSet as Set
import Data.Hashable
import Data.List

-- Find two inputs that map to the same output under h.
search :: (Hashable b, Eq b) => (a->b) -> [a] -> Maybe (a,a)
search h xs =
  do x0 <- collision h xs
     let h0 = h x0
     -- Second pass over xs: the first input whose output equals h0.
     x1 <- find (\x -> (h x) == h0) xs
     return (x0,x1)

-- Return the first input whose output has already been seen.
collision :: (Hashable b, Eq b) => (a->b) -> [a] -> Maybe a
collision h xs = go Set.empty xs
  where
    go _ [] = Nothing
    go s (x:xs) =
      if y `Set.member` s
        then Just x
        else go (Set.insert y s) xs
      where y = h x

main = print $ search (\x -> x `mod` 21) ([10,20..2100] :: [Int])
I answered basically this question here: https://stackoverflow.com/a/6209279/371753
Here’s the relevant code.
Long story short, instead of passing around a list, pass around a data type that describes how to generate a list. Now you can write functions directly over the stream, or you can use the usToList function to use the list functions you already have.
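As a rough illustration of that idea (a sketch under stated assumptions, not necessarily the code from the linked answer): a stream can be represented as a step function plus a starting seed, so every traversal regenerates the elements instead of retaining them. Only usToList is a name taken from the text above; UnfoldStream, rangeUS, findUS, collisionUS and searchUS are hypothetical names chosen for this example. To keep the sketch dependent only on packages shipped with GHC, it swaps Data.Set (which needs Ord) in for the question's Data.HashSet (which needs Hashable).

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (unfoldr)
import qualified Data.Set as Set

-- A description of how to generate elements: a step function and a seed.
-- The seed type is hidden, so any generator fits. (Hypothetical type name.)
data UnfoldStream a = forall s. UnfoldStream (s -> Maybe (a, s)) s

-- Build a fresh lazy list from the description, for reuse of list functions.
usToList :: UnfoldStream a -> [a]
usToList (UnfoldStream step seed) = unfoldr step seed

-- Hypothetical helper: the inputs lo, lo+d, lo+2d, ... not exceeding hi.
rangeUS :: Int -> Int -> Int -> UnfoldStream Int
rangeUS lo d hi = UnfoldStream step lo
  where
    step n
      | n > hi    = Nothing
      | otherwise = Just (n, n + d)

-- find written directly over the stream; no list is ever built.
findUS :: (a -> Bool) -> UnfoldStream a -> Maybe a
findUS p (UnfoldStream step seed) = go seed
  where
    go s = case step s of
      Nothing -> Nothing
      Just (x, s')
        | p x       -> Just x
        | otherwise -> go s'

-- collision over the stream. Assumption: Data.Set stands in for HashSet.
collisionUS :: Ord b => (a -> b) -> UnfoldStream a -> Maybe a
collisionUS h (UnfoldStream step seed) = go Set.empty seed
  where
    go seen s = case step s of
      Nothing -> Nothing
      Just (x, s')
        | y `Set.member` seen -> Just x
        | otherwise           -> go (Set.insert y seen) s'
        where y = h x

-- Both passes restart from the seed, so the inputs are recomputed each time.
searchUS :: Ord b => (a -> b) -> UnfoldStream a -> Maybe (a, a)
searchUS h us = do
  x0 <- collisionUS h us
  x1 <- findUS (\x -> h x == h x0) us
  return (x0, x1)

main :: IO ()
main = print (searchUS (`mod` 21) (rangeUS 10 10 2100))
```

With the inputs from the question this prints Just (220,10). The value being passed around is only a function and a seed, so the space used is dominated by the set of outputs seen so far rather than by the inputs themselves.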