I have a set of documents that represent work items:
public class WorkItem
{
    public string Id { get; set; }
    public string DocumentId { get; set; }
    public string FieldId { get; set; }
    public bool IsValidated { get; set; }
}
public class ExtractionUser
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string[] AssignedFields { get; set; }
}
A user has access to a set of FieldIds. I need to query the WorkItems based on this set of fields and get out a status per document:
public class UserWorkItems
{
    public string DocumentId { get; set; }
    public int Validated { get; set; }
    public int Total { get; set; }
}
The query I’m after is this:
using (var session = RavenDb.OpenSession())
{
    // The fields this user is allowed to work on.
    string[] userFields = session.Load<ExtractionUser>("users/1").AssignedFields;

    // One status row per document, counting only the user's fields.
    session.Query<WorkItem>()
        .Where(w => w.FieldId.In(userFields))
        .GroupBy(w => w.DocumentId)
        .Select(g => new
        {
            DocumentId = g.Key,
            Validated = g.Count(w => w.IsValidated),
            Total = g.Count()
        })
        .Skip(page * perPage)
        .Take(perPage)
        .ToArray();
}
I tried creating a Map/Reduce index, but the main problem is that I need to filter on FieldId, which is not part of the Reduce output because it is the property being counted.
I also tried a simple Map index on FieldId for the query part, with a TransformResults to perform the GroupBy. But since paging is applied before TransformResults, the pages and totals reflect the documents before grouping, not the grouped results.
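To make the paging problem concrete, here is a small in-memory sketch using plain LINQ to Objects rather than a RavenDB index. The `PagingDemo` class, its method names and the sample data are made up for illustration; it only shows that `Skip`/`Take` applied before the grouping pages over work items, while the result I want pages over documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WorkItem
{
    public string DocumentId { get; set; }
    public string FieldId { get; set; }
    public bool IsValidated { get; set; }
}

public class UserWorkItems
{
    public string DocumentId { get; set; }
    public int Validated { get; set; }
    public int Total { get; set; }
}

public static class PagingDemo
{
    // Made-up sample data: two documents with three work items each.
    public static List<WorkItem> SampleItems()
    {
        return new List<WorkItem>
        {
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/1", IsValidated = true },
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/2", IsValidated = false },
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/3", IsValidated = true },
            new WorkItem { DocumentId = "docs/2", FieldId = "fields/1", IsValidated = false },
            new WorkItem { DocumentId = "docs/2", FieldId = "fields/2", IsValidated = false },
            new WorkItem { DocumentId = "docs/2", FieldId = "fields/3", IsValidated = true },
        };
    }

    // One status row per document.
    private static IEnumerable<UserWorkItems> Summarize(IEnumerable<WorkItem> items)
    {
        return items
            .GroupBy(w => w.DocumentId)
            .Select(g => new UserWorkItems
            {
                DocumentId = g.Key,
                Validated = g.Count(w => w.IsValidated),
                Total = g.Count()
            });
    }

    // Paging over work items first, grouping afterwards
    // (the order I ran into with TransformResults).
    public static List<UserWorkItems> PageThenGroup(IEnumerable<WorkItem> items, int page, int perPage)
    {
        return Summarize(items.Skip(page * perPage).Take(perPage)).ToList();
    }

    // Grouping first, paging over the per-document rows (the order I want).
    public static List<UserWorkItems> GroupThenPage(IEnumerable<WorkItem> items, int page, int perPage)
    {
        return Summarize(items).Skip(page * perPage).Take(perPage).ToList();
    }

    public static void Main()
    {
        var items = SampleItems();

        Console.WriteLine("Page, then group:");
        foreach (var r in PageThenGroup(items, 0, 2))
            Console.WriteLine($"{r.DocumentId}: {r.Validated}/{r.Total}");

        Console.WriteLine("Group, then page:");
        foreach (var r in GroupThenPage(items, 0, 2))
            Console.WriteLine($"{r.DocumentId}: {r.Validated}/{r.Total}");
    }
}
```

With a page size of 2, paging first yields a single row for docs/1 with a total of 2, while grouping first yields two rows, each with a total of 3.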
Then I tried a Multi Map index that maps the users with their fields collection and also maps the work items with their field, and then tried to reduce the result to what I wanted. I’ve created a gist with the index definition. The reduce part involves a group by field, then multiple SelectMany calls and a final GroupBy and Select. The index was accepted by Raven, but it does not return any results. I’m a bit stuck on the Multi Map index because I don’t know how I could actually debug it.
I guess in the end my problem could be reduced (pun intended) to how to query on a “reduced” field?
Any ideas how I could achieve this? Are there any other options I could explore besides Map/MultiMap/Reduce/TransformResults?
UPDATE: While reading Ayende’s Map Reduce post I realised I’m approaching map/reduce the wrong way. Still looking for a solution …
UPDATE 2: After a bit more research I’ve ended up with this index, which looks like what I want to do but does not return any data (the index was defined directly in the studio):
Map:
from user in docs
where user["@metadata"]["Raven-Entity-Name"] == "ExtractionUsers"
from field in user.AssignedFields
from item in docs
where item["@metadata"]["Raven-Entity-Name"] == "WorkItems" && item.FieldId == field
select new {
    UserId = user.Id,
    DocumentId = item.DocumentId,
    Validated = item.Status == "Validated" ? 1 : 0,
    Count = 1
}
Reduce:
from r in results
group r by new { r.UserId, r.DocumentId } into g
select new {
    UserId = g.Key.UserId,
    DocumentId = g.Key.DocumentId,
    Validated = g.Sum(d => d.Validated),
    Count = g.Sum(d => d.Count),
}
The idea is to try to map in the index all the documents, and link from Users to Fields and to WorkItems.
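For reference, this is the output I expect from that index, written as a plain LINQ to Objects sketch over in-memory lists. It says nothing about how RavenDB evaluates the index; the `JoinDemo` class and the sample data are made up, and it uses the `IsValidated` flag from the `WorkItem` class above rather than the `Status` string used in the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WorkItem
{
    public string DocumentId { get; set; }
    public string FieldId { get; set; }
    public bool IsValidated { get; set; }
}

public class ExtractionUser
{
    public string Id { get; set; }
    public string[] AssignedFields { get; set; }
}

public static class JoinDemo
{
    // Returns one row per (user, document) pair, counting only the
    // work items whose field is assigned to that user.
    public static List<(string UserId, string DocumentId, int Validated, int Count)> Summarize(
        IEnumerable<ExtractionUser> users, IEnumerable<WorkItem> items)
    {
        // "Map": link users to fields and fields to work items.
        var mapped =
            from user in users
            from field in user.AssignedFields
            from item in items
            where item.FieldId == field
            select new
            {
                UserId = user.Id,
                item.DocumentId,
                Validated = item.IsValidated ? 1 : 0,
                Count = 1
            };

        // "Reduce": sum the counters per user and document.
        return (from r in mapped
                group r by new { r.UserId, r.DocumentId } into g
                select (g.Key.UserId, g.Key.DocumentId,
                        g.Sum(d => d.Validated), g.Sum(d => d.Count))).ToList();
    }

    public static void Main()
    {
        // Made-up sample data.
        var users = new[]
        {
            new ExtractionUser { Id = "users/1", AssignedFields = new[] { "fields/1", "fields/2" } }
        };
        var items = new[]
        {
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/1", IsValidated = true },
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/2", IsValidated = false },
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/3", IsValidated = true },
            new WorkItem { DocumentId = "docs/2", FieldId = "fields/2", IsValidated = true },
        };

        foreach (var row in Summarize(users, items))
            Console.WriteLine($"{row.UserId} {row.DocumentId}: {row.Validated}/{row.Count}");
    }
}
```

For the sample data this gives 1 of 2 validated for docs/1 and 1 of 1 for docs/2; the work item on fields/3 is ignored because that field is not assigned to the user.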
After a week I’ve managed to solve the problem. I took a slightly different (less relational) approach that is simple and seems to work fine. Here are the details in case somebody else runs into this kind of problem:
I group the WorkItems by DocumentId and put the Validated and the NonValidated fields into collections. The result of the map/reduce looks like this:
The Map function looks like this:
And the Reduce :
To query the index I now use the following expression:
The only thing I need to do client side is count only the fields that belong to the user.
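The steps above can be sketched in memory with plain LINQ to Objects. This is not the index from the gist: the `DocumentFields` class, the `SolutionDemo` method names and the sample data are assumptions made for illustration, and the filter inside `QueryForUser` is only a guess at how such an index could be queried.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WorkItem
{
    public string DocumentId { get; set; }
    public string FieldId { get; set; }
    public bool IsValidated { get; set; }
}

public class UserWorkItems
{
    public string DocumentId { get; set; }
    public int Validated { get; set; }
    public int Total { get; set; }
}

// Hypothetical shape of the reduce result: field IDs per document.
public class DocumentFields
{
    public string DocumentId { get; set; }
    public string[] Validated { get; set; }
    public string[] NonValidated { get; set; }
}

public static class SolutionDemo
{
    // "Map": each work item puts its field ID into one of the two collections.
    public static IEnumerable<DocumentFields> Map(IEnumerable<WorkItem> items)
    {
        return items.Select(w => new DocumentFields
        {
            DocumentId = w.DocumentId,
            Validated = w.IsValidated ? new[] { w.FieldId } : new string[0],
            NonValidated = w.IsValidated ? new string[0] : new[] { w.FieldId }
        });
    }

    // "Reduce": merge the collections per document.
    public static IEnumerable<DocumentFields> Reduce(IEnumerable<DocumentFields> results)
    {
        return results.GroupBy(r => r.DocumentId).Select(g => new DocumentFields
        {
            DocumentId = g.Key,
            Validated = g.SelectMany(r => r.Validated).ToArray(),
            NonValidated = g.SelectMany(r => r.NonValidated).ToArray()
        });
    }

    // Filter and page over documents, then count only the user's fields.
    public static List<UserWorkItems> QueryForUser(
        IEnumerable<WorkItem> items, string[] userFields, int page, int perPage)
    {
        var fields = new HashSet<string>(userFields);
        return Reduce(Map(items))
            .Where(d => d.Validated.Concat(d.NonValidated).Any(f => fields.Contains(f)))
            .Skip(page * perPage)
            .Take(perPage)
            .Select(d => new UserWorkItems
            {
                DocumentId = d.DocumentId,
                Validated = d.Validated.Count(f => fields.Contains(f)),
                Total = d.Validated.Concat(d.NonValidated).Count(f => fields.Contains(f))
            })
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data.
        var items = new[]
        {
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/1", IsValidated = true },
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/2", IsValidated = false },
            new WorkItem { DocumentId = "docs/1", FieldId = "fields/3", IsValidated = true },
            new WorkItem { DocumentId = "docs/2", FieldId = "fields/2", IsValidated = true },
            new WorkItem { DocumentId = "docs/3", FieldId = "fields/4", IsValidated = false },
        };

        foreach (var r in QueryForUser(items, new[] { "fields/1", "fields/2" }, 0, 10))
            Console.WriteLine($"{r.DocumentId}: {r.Validated}/{r.Total}");
    }
}
```

Because the reduce output keeps the field IDs instead of bare counts, the filter and the paging both operate on one row per document, and the per-user counts are derived afterwards.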
There is also a gist with the solution.