Monday, August 20, 2012

Mapped results "table"

Most of the tables in RavenDB are very simple "storage dumps".  There is an ID, some row properties, and a data section.  Then there is typically at most one secondary key, and one sorting key.  For example, the documents table.  There's the internal document ID, the secondary index which is the document key and the sorting key is the etag.  These tables turn out to be easy to achieve in BDB since it has built-in support for secondary key lookups.

The mapped results table is a little different since we are looking items up by more complicated keys and sorts.  I will attempt to describe how this is done in BDB as clearly as I can, since I had quite a few misses before achieving the results I wanted.

The mapped results table

The mapped results table holds raven's mapped results (obviously).  I don't fully understand what all of the columns (nor do I have to) are used for since I'm not that familiar with the specifics of how raven pulls off map-reduce.  But here are the columns for the table:
  • primary key ID
  • view name
  • document key
  • reduce key
  • reduce key/view hash
  • document etag
  • timestamp
  • document data
Now, there are a few ways that we access the data in the table (writing all of these out will help us model our  secondary indexes)
  1. get the most recent etag for a view
  2. get the document data for a view after a specified etag
  3. delete documents given a view
  4. get the document data documents given a view and a reduce key
  5. delete documents given a view and a document key
Once we make a slight change from the way the esent storage engine operates, most of these are easy to implements and involve a since secondary index combined with the primary row data.  The change we are going to make is to make the etag the primary key and remove the auto-incrementing ID.

We can then add a secondary index for the view.  This will map every document etag to the view it's part of.  We sort the view in ascending order and the etag in this secondary index in reverse order.  We tell BDB that we want duplicated in the index and we want to sort these indexes

this.indexByViewFile = env.CreateDatabase(DbCreateFlags.None);
indexByViewFile.PageSize = 512;
indexByViewFile.SetFlags(DbFlags.Dup | DbFlags.DupSort);
indexByViewFile.DupCompareFast = GuidCompareDesc;

Now we use this index to solve:
  1. open a cursor on indexByView, jump to the given view, then get the first primary key for the index.  Since the duplicates are stored in reverse order it will always be the most recent etag
  2. open a cursor on indexByView, jump to the given view, now continue to pull the duplicate secondary index records until we hit the stop etag.  Again since the duplicates are stored in reverse order, all of the keys we pull from the index until we hit the stop etag will be more recent than that stop etag.
  3. open a cursor on indexByView, jump to the given view, look at all duplicates and delete them
The next secondary index we need is based on the reduce key/view hash.  This secondary index will solve:
  1. open a cursor on indexByHash, jump to the given hash, then enumerate all duplicate keys.  This will give us all the etag that match the reduce key and view.  And just like the esent engine we also verify that the reduce key and view match in case of hash clashes.
The final access is the slightly more complicated scenario since we are trying to match two separate fields.  Of course BDB supports this, it's just a little more work.  We want to match by view and document key.  And we already have the secondary index on view, so we need to add another secondary index on document key.  Now, we use something called a join cursor in BDB.  This allows us to start up two cursors then join them together to access the primary data while both cursors continue to match.
  1. open a cursor on indexByView and jump to the given view.  Then open a cursor on indexByKey and jump to the given document key.  Now, with the primary row table, join these two cursors into another cursor.  Iterating over this cursor will give us etag that match the view and the document key.

using(var cursorView = indexByView.OpenCursor(txn))
using(var cursorKey = indexByKey.OpenCursor(txn))
{
  string currentView, currentKey;

  if(cursorView.JumpTo(view, out currentView) == ReadStatus.NotFound)
   return reduceKeys;
  if (currentView != view)
   return reduceKeys;

  if (cursorKey.JumpTo(documentId, out currentKey) == ReadStatus.NotFound)
   return reduceKeys;
  if (currentKey != documentId)
   return reduceKeys;

  using(var cursor = primaryTable.Join(new DbFileCursor[] { cursorView, cursorKey }, Db.JoinMode.None, 0))
  {
   var vKey = DbEntry.Out(new byte[16]);
   var vData = DbEntry.Out(new byte[128]);

   while (true)
   {
    var status = cursor.Get(ref vKey, ref vData, DbJoinCursor.GetMode.None, DbJoinCursor.ReadFlags.None);
    if (status == ReadStatus.NotFound)
     break;

    int viewLength = BitConverter.ToInt32(vData.Buffer, 40);
    int keyLength = BitConverter.ToInt32(vData.Buffer, 44);
    int reduceKeyLength = BitConverter.ToInt32(vData.Buffer, 48);
    string reduceKey = Encoding.Unicode.GetString(vData.Buffer, 56+viewLength+keyLength, reduceKeyLength);

    reduceKeys.Add(reduceKey);
    deletes.Add(new Guid(vKey.Buffer));
   }
  }
 }
}

No comments:

Post a Comment