There are times in the Raven storage engine protocol when only the document data is read, or only the metadata, or only the etag is checked. BDB supports partial reads from the data file, so we want to make sure we take advantage of that as well.
Where's the data
I am going to lay the document primary data section out as a fixed-size header followed by the three variable-size buffers.
[StructLayout(LayoutKind.Sequential, Pack = 0)]
private unsafe struct DocumentHeader
{
    public Guid Etag;
    public long LastModifiedFileTime;
    public int KeySize;
    public int MetadataSize;
    public int DocumentSize;
    //KEYDATA[KeySize]
    //METADATA[MetadataSize]
    //DOCUMENT[DocumentSize]
}
This is the structure for the raw data that will be stored in the data file.
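The snippets below also use a documentBaseLength for the size of this fixed header. Here's a minimal definition, assuming the marshalled size matches what we write to disk (the name is the one the later code uses; the definition is a sketch):

//Size of the fixed DocumentHeader that prefixes every record. The variable
//sized buffers (key, metadata, document) start at this offset, which is also
//what makes BDB partial reads practical: the etag is always bytes 0-15, the
//key always starts at documentBaseLength, and so on.
private static readonly int documentBaseLength = Marshal.SizeOf(typeof(DocumentHeader));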
Give me the data
Raven calls IDocumentStorageActions.AddDocument when it wants to add a document. The old etag (if an update) is passed in along with the metadata and the actual document data (there are some RavenDB transaction operations that I'm going to skip for now, since I'm just trying to get a document into the database).
We want to forward this call down to our document table class.
Guid newEtag = uuidGenerator.CreateSequentialUuid();
byte[] dataBuffer;
byte[] metadataBuffer;
using (var ms = new MemoryStream())
{
    data.WriteTo(ms);
    dataBuffer = ms.ToArray();
}
using (var ms = new MemoryStream())
{
    metadata.WriteTo(ms);
    metadataBuffer = ms.ToArray();
}
database.DocumentTable.AddDocument(transaction, key, newEtag,
    SystemTime.UtcNow, dataBuffer, metadataBuffer);
This will give us the raw data we want to store in the database.
Store it
public unsafe void AddDocument(Txn transaction, string key, Guid etag, DateTime dateTime, byte[] data, byte[] metadata)
{
    DbEntry dkey;
    var keyBuffer = Encoding.Unicode.GetBytes(key);
    var dataBuffer = new byte[documentBaseLength + keyBuffer.Length + data.Length + metadata.Length];
    var header = new DocumentHeader
    {
        Etag = etag,
        LastModifiedFileTime = dateTime.ToFileTime(),
        KeySize = keyBuffer.Length,
        DocumentSize = data.Length,
        MetadataSize = metadata.Length
    };

    //find the existing document key
    var existingId = GetDocumentIdByKey(transaction, key);

    //update or insert?
    if (existingId == 0)
    {
        //insert: find the highest primary key so far and add one to it
        long lastId = 0;
        var vlastId = DbEntry.Out(new byte[8]);
        var vlastData = DbEntry.EmptyOut();
        using (var cursor = dataTable.OpenCursor(transaction, DbFileCursor.CreateFlags.None))
        {
            if (cursor.Get(ref vlastId, ref vlastData, DbFileCursor.GetMode.Last, DbFileCursor.ReadFlags.None) != ReadStatus.NotFound)
                lastId = BitConverter.ToInt64(vlastId.Buffer, 0);
        }
        dkey = DbEntry.InOut(BitConverter.GetBytes(lastId + 1));
    }
    else
    {
        //update: reuse the existing primary key
        dkey = DbEntry.InOut(BitConverter.GetBytes(existingId));
    }

    //header, then key, then metadata, then document - the order DocumentHeader documents
    var offset = 0;
    Marshal.Copy(new IntPtr(&header), dataBuffer, offset, documentBaseLength);
    offset += documentBaseLength;
    Buffer.BlockCopy(keyBuffer, 0, dataBuffer, offset, keyBuffer.Length);
    offset += keyBuffer.Length;
    Buffer.BlockCopy(metadata, 0, dataBuffer, offset, metadata.Length);
    offset += metadata.Length;
    Buffer.BlockCopy(data, 0, dataBuffer, offset, data.Length);

    var dvalue = DbEntry.InOut(dataBuffer);
    dataTable.Put(transaction, ref dkey, ref dvalue);
}
That's quite a function, but it's fairly simple when you break it down:
- Form the document header structure
- Search the secondary index for the document key and get the primary key, if it exists (a sketch of that lookup follows this list)
- If we found the primary key then this is an update, not an insert.
- If it's an insert then we need to generate a new primary key (again, we have to emulate an auto-incrementing primary key ourselves)
- We find the highest current primary key by using a BDB cursor and jumping directly to the end.
- The new primary key is one more than that.
- If we are an update then we already have the primary key.
- Copy all of the header, key data, metadata and document data into a buffer.
- Put the data into the table.
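Here is a rough sketch of the GetDocumentIdByKey lookup used above. Treat it as a sketch: a plain Get on a BDB secondary index resolves to the primary data, so getting the primary key itself back needs BDB's pget, which I'm assuming the wrapper exposes as PGet; the indexByKeyTable handle name is also an assumption.

private long GetDocumentIdByKey(Txn transaction, string key)
{
    var dkey = DbEntry.InOut(Encoding.Unicode.GetBytes(key));
    var pkey = DbEntry.Out(new byte[8]);
    var dvalue = DbEntry.EmptyOut();
    using (var cursor = indexByKeyTable.OpenCursor(transaction, DbFileCursor.CreateFlags.None))
    {
        //GetMode.Set = BDB's DB_SET: position exactly on the given secondary key
        if (cursor.PGet(ref dkey, ref pkey, ref dvalue, DbFileCursor.GetMode.Set, DbFileCursor.ReadFlags.None) == ReadStatus.NotFound)
            return 0; //no such document; AddDocument treats 0 as "insert"
    }
    return BitConverter.ToInt64(pkey.Buffer, 0);
}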
That's pretty much it, except for the secondary key callback that fires when the Put operation happens. Remember, we are responsible for picking the secondary key out of the primary data field.
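I won't show the table-opening code here, but the callback does have to be registered against the secondary index when the tables are opened. Assuming the wrapper mirrors the C API's DB->associate, that registration is a one-liner of roughly this shape (the exact overload and flag name are a guess, not the wrapper's confirmed API):

//Registered once at open time: from here on BDB maintains indexByKey
//automatically on every Put into the primary table.
dataTable.Associate(indexByKeyTable, GetDocumentKeyForIndexByKey, DbFile.AssociateFlags.None);

The callback itself: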
private unsafe DbFile.KeyGenStatus GetDocumentKeyForIndexByKey(DbFile secondary, ref DbEntry key, ref DbEntry data, out DbEntry result)
{
    //extract the key for the secondary index of the document table
    var header = new DocumentHeader();
    Marshal.Copy(data.Buffer, 0, new IntPtr(&header), documentBaseLength);
    var keyBuffer = new byte[header.KeySize];
    Buffer.BlockCopy(data.Buffer, documentBaseLength, keyBuffer, 0, keyBuffer.Length);
    result = DbEntry.InOut(keyBuffer);
    return DbFile.KeyGenStatus.Success;
}
This will pull the document key from the primary data field and return it to BDB for storage in the secondary index. With a little more plumbing sprinkled through, we should be able to run the server, do a document put operation, and see it in the documents.db file:

curl -X PUT http://localhost:8080/docs/bobs_address -d "{ FirstName: 'Bob', LastName: 'Smith', Address: '5 Elm St' }"
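To peek at what actually landed on disk, Berkeley DB's stock db_dump utility prints every database in the file with keys and values as hex:

db_dump documents.db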
VERSION=3
format=bytevalue
database=data
type=btree
db_pagesize=8192
HEADER=END
 0100000000000000
 00000000000000000000000000000001e10574f8877ccd011800000005000000420000000000000062006f00620073005f0061006400640072006500730073000500000000420000000246697273744e616d650004000000426f6200024c6173744e616d650006000000536d69746800024164647265737300090000003520456c6d2053740000
DATA=END
VERSION=3
format=bytevalue
database=indexByKey
type=btree
db_pagesize=8192
HEADER=END
 62006f00620073005f006100640064007200650073007300
 0100000000000000
DATA=END
The dump displays each key followed by its value. So in the primary data section we have a key of 1, with the document header followed by all of the variable data (you can see that the first 16 bytes of the value are the etag, 00000000-0000-0000-0000-000000000001). In the secondary index you can see the unicode form of bobs_address as the key and the primary key 1 as the data.
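As a sanity check, the header fields can be read straight off that hex. A throwaway snippet (the byte literal is transcribed from the dump above, and the offsets follow the struct layout; note the four alignment-padding bytes that sit between DocumentSize and the key data in the dump):

//the first 36 bytes of the dumped primary value
var raw = new byte[]
{
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01, //Etag
    0xe1,0x05,0x74,0xf8,0x87,0x7c,0xcd,0x01, //LastModifiedFileTime
    0x18,0x00,0x00,0x00,                     //KeySize
    0x05,0x00,0x00,0x00,                     //MetadataSize
    0x42,0x00,0x00,0x00                      //DocumentSize
};
var etagBytes = new byte[16];
Array.Copy(raw, etagBytes, 16);
Console.WriteLine(new Guid(etagBytes));           //00000000-0000-0000-0000-000000000001
Console.WriteLine(BitConverter.ToInt32(raw, 24)); //24 - "bobs_address" in UTF-16
Console.WriteLine(BitConverter.ToInt32(raw, 28)); //5  - the metadata (an empty BSON document)
Console.WriteLine(BitConverter.ToInt32(raw, 32)); //66 - the BSON form of the document body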
We have successfully stored a document in the database, now let's try to get it back out.