I find it very common to want to model relational data in my functional programs. For example, when developing a web-site I may want to have the following data structure to store info about my users:
data User = User
{ name :: String
, birthDate :: Date
}
Next, I want to store data about the messages users post on my site:
data Message = Message
{ user :: User
, timestamp :: Date
, content :: String
}
There are multiple problems associated with this data structure:
- We don’t have any way of distinguishing users with identical names and birth dates.
- The user data will be duplicated on serialisation/deserialisation.
- Comparing the users requires comparing their data, which may be a costly operation.
- Updates to the fields of User are fragile: you can forget to update all the occurrences of User in your data structure.
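To make the last point concrete, here is a minimal sketch of how embedded copies of a User can drift apart. The Date stand-in, the sample values and the renameFirst helper are assumptions made up for this illustration, not part of the original code.

```haskell
-- Stand-in for the question's Date type (assumption for this sketch).
type Date = Int

data User = User
  { name      :: String
  , birthDate :: Date
  } deriving (Eq, Show)

data Message = Message
  { user      :: User
  , timestamp :: Date
  , content   :: String
  } deriving (Eq, Show)

alice :: User
alice = User "Alice" 19900101

messages :: [Message]
messages =
  [ Message alice 1 "hello"
  , Message alice 2 "world"
  ]

-- Hypothetical helper: rename the author of the first message only,
-- which is what a careless update might end up doing.
renameFirst :: String -> [Message] -> [Message]
renameFirst _ [] = []
renameFirst new (m : ms) = m { user = (user m) { name = new } } : ms

-- After the update, the two messages no longer agree on who wrote them.
authors :: [String]
authors = map (name . user) (renameFirst "Alicia" messages)
```

Here authors evaluates to ["Alicia", "Alice"]: the second message still carries the stale copy of the user.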
These problems are manageable while our data can be represented as a tree. For example, you can refactor like this:
data User = User
{ name :: String
, birthDate :: Date
, messages :: [(String, Date)] -- you get the idea
}
However, it is possible to have your data shaped as a DAG (imagine any many-to-many relation), or even as a general graph (OK, maybe not). In this case, I tend to simulate the relational database by storing my data in Maps:
newtype Id a = Id Integer
type Table a = Map (Id a) a
This kind of works, but is unsafe and ugly for multiple reasons:
- You are just an Id constructor call away from nonsensical lookups.
- On lookup you get Maybe a, but often the database structurally ensures that there is a value.
- It is clumsy.
- It is hard to ensure referential integrity of your data.
- Managing indices (which are very much necessary for performance) and ensuring their integrity is even harder and clumsier.
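A minimal sketch of the Map-based approach, showing the first two problems and the referential-integrity one. The Message shape with an author field and all sample data are assumptions for illustration only.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

newtype Id a = Id Integer
  deriving (Eq, Ord, Show)

type Table a = Map (Id a) a

data User = User { name :: String }
  deriving (Eq, Show)

data Message = Message
  { author  :: Id User   -- a reference instead of an embedded User
  , content :: String
  } deriving (Eq, Show)

users :: Table User
users = Map.fromList [(Id 1, User "Alice")]

messages :: Table Message
messages =
  Map.fromList
    [ (Id 1, Message (Id 1) "hello")
    , (Id 2, Message (Id 7) "orphan")  -- dangling reference: nothing prevents it
    ]

-- Every lookup is partial, even when the key is "known" to exist.
authorOf :: Message -> Maybe User
authorOf m = Map.lookup (author m) users

-- Nothing stops anyone from minting a key that was never issued.
bogus :: Maybe User
bogus = Map.lookup (Id 42) users
```

The phantom type parameter on Id at least keeps user keys and message keys apart, but it cannot tell an issued key from an invented one.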
Is there existing work on overcoming these problems?
It looks like Template Haskell could solve them (as it usually does), but I would like not to reinvent the wheel.
The ixset library (or ixset-typed, a more type-safe version) will help you with this. It’s the library that backs the relational part of acid-state, which also handles versioned serialization of your data and/or concurrency guarantees, in case you need it.
The Happstack Book has an IxSet tutorial.
The thing about ixset is that it manages "keys" for your data entries automatically.
For your example, one would create one-to-many relationships for your data types like this:
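A minimal sketch of such a one-to-many setup, assuming the ixset package's Data.IxSet module and a GHC recent enough (7.10 or later) to provide Typeable instances automatically. The Date stand-in, the sample users and messages, and the choice of ixFun for the index are assumptions of this sketch.

```haskell
import Data.IxSet (Indexable (..), IxSet, fromList, ixFun, ixSet, toList, (@=))

-- Stand-in for the question's Date type (assumption for this sketch).
type Date = Int

data User = User
  { name      :: String
  , birthDate :: Date
  } deriving (Eq, Ord, Show)

data Message = Message
  { user      :: User
  , timestamp :: Date
  , content   :: String
  } deriving (Eq, Ord, Show)

-- Index messages by their author. ixFun takes a function returning
-- the list of keys under which an entry should be found.
instance Indexable Message where
  empty = ixSet [ixFun (\m -> [user m])]

user1, user2 :: User
user1 = User "John Doe" 19800101
user2 = User "John Smith" 19850615

messageSet :: IxSet Message
messageSet =
  fromList
    [ Message user1 1 "bla"
    , Message user2 2 "blu"
    , Message user1 3 "blo"
    ]

-- All messages whose indexed User key equals user1.
user1Messages :: [Message]
user1Messages = toList (messageSet @= user1)
```

The query operator picks the index by the type of the key, so no explicit key management is needed on the caller's side.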
You can then find the messages of a particular user. If you have built up an IxSet like this:
… you can then find messages by user1 with:
If you need to find the user of a message, just use the user function as usual. This models a one-to-many relationship.
Now, for many-to-many relations, with a situation like this:
… you create an index with ixFun, which can be used with lists of indexes. Like so:
To find all the messages by a user, you still use the same function:
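A minimal sketch of the many-to-many case, again assuming Data.IxSet from the ixset package. To keep the sample values acyclic, this sketch gives only Message a list field (users) and leaves User without a messages field. That simplification, the Date stand-in and the sample data are all assumptions.

```haskell
import Data.IxSet (Indexable (..), IxSet, fromList, ixFun, ixSet, toList, (@=))

-- Stand-in for the question's Date type (assumption for this sketch).
type Date = Int

data User = User
  { name      :: String
  , birthDate :: Date
  } deriving (Eq, Ord, Show)

data Message = Message
  { users     :: [User]   -- a message may belong to several users
  , timestamp :: Date
  , content   :: String
  } deriving (Eq, Ord, Show)

-- ixFun accepts the list-valued field directly: the message is
-- indexed once under every user in the list.
instance Indexable Message where
  empty = ixSet [ixFun users]

user1, user2 :: User
user1 = User "John Doe" 19800101
user2 = User "John Smith" 19850615

messageSet :: IxSet Message
messageSet =
  fromList
    [ Message [user1, user2] 1 "shared"
    , Message [user2] 2 "solo"
    ]

-- Same query as in the one-to-many case.
messagesBy :: User -> [Message]
messagesBy u = toList (messageSet @= u)
```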
Additionally, provided that you have an index of users:
… you can find all the users for a message:
If you don’t want to have to update the users of a message or the messages of a user when adding a new user/message, you should instead create an intermediary data type that models the relation between users and messages, just like in SQL (and remove the users and messages fields):
Creating a set of these relations would then let you query for users by messages and messages for users without having to update anything.
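A minimal sketch of such a relation type, assuming Data.IxSet from the ixset package. The names UserMessage, umUser, umMessage, messagesOf and usersOf are hypothetical, as are the Date stand-in and the sample data.

```haskell
import Data.IxSet (Indexable (..), IxSet, fromList, ixFun, ixSet, toList, (@=))

-- Stand-in for the question's Date type (assumption for this sketch).
type Date = Int

data User = User
  { name      :: String
  , birthDate :: Date
  } deriving (Eq, Ord, Show)

data Message = Message
  { timestamp :: Date
  , content   :: String
  } deriving (Eq, Ord, Show)

-- Hypothetical join type: one row per (user, message) pair.
data UserMessage = UserMessage
  { umUser    :: User
  , umMessage :: Message
  } deriving (Eq, Ord, Show)

-- Two indexes with different key types, so a query by User and a
-- query by Message each select the matching index.
instance Indexable UserMessage where
  empty =
    ixSet
      [ ixFun (\r -> [umUser r])
      , ixFun (\r -> [umMessage r])
      ]

user1, user2 :: User
user1 = User "John Doe" 19800101
user2 = User "John Smith" 19850615

message1, message2 :: Message
message1 = Message 1 "shared"
message2 = Message 2 "solo"

relations :: IxSet UserMessage
relations =
  fromList
    [ UserMessage user1 message1
    , UserMessage user2 message1
    , UserMessage user2 message2
    ]

messagesOf :: User -> [Message]
messagesOf u = map umMessage (toList (relations @= u))

usersOf :: Message -> [User]
usersOf m = map umUser (toList (relations @= m))
```

Adding a new pairing is then a single insertion into relations; neither the User nor the Message value has to change.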
The library has a very simple interface considering what it does!
EDIT: Regarding your "costly data that needs to be compared":
ixset only compares the fields that you specify in your index (so to find all the messages by a user in the first example, it compares "the whole user").
You regulate which parts of the indexed field it compares by altering the Ord instance. So, if comparing users is costly for you, you can add a userId field and modify the instance Ord User to only compare this field, for example.
This can also be used to solve the chicken-and-egg problem: what if you have an id, but neither a User, nor a Message?
You could then simply create an explicit index for the id, find the user by that id (with userSet @= (12423 :: Id)) and then do the search.
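A minimal sketch of both ideas, assuming Data.IxSet from the ixset package: an Ord instance that compares only the id, plus an explicit index on that id. The UserId newtype (standing in for the Id type above), the findUser helper and the sample users are hypothetical.

```haskell
import Data.IxSet (Indexable (..), IxSet, fromList, getOne, ixFun, ixSet, (@=))
import Data.Ord (comparing)

-- Hypothetical id type for this sketch.
newtype UserId = UserId Integer
  deriving (Eq, Ord, Show)

data User = User
  { userId :: UserId
  , name   :: String
  } deriving Show

-- Compare users by id only, so comparisons stay cheap.
-- Note that two users with the same id now count as the same entry.
instance Eq User where
  a == b = userId a == userId b

instance Ord User where
  compare = comparing userId

-- Explicit index on the id, so a bare id is enough to find the user.
instance Indexable User where
  empty = ixSet [ixFun (\u -> [userId u])]

userSet :: IxSet User
userSet =
  fromList
    [ User (UserId 12423) "John Doe"
    , User (UserId 7) "John Smith"
    ]

-- getOne yields Just only when the query matches exactly one entry.
findUser :: UserId -> Maybe User
findUser uid = getOne (userSet @= uid)
```

The User found this way can then be used as the key for the message queries described earlier.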