I’ve used Simple.Data in a few production projects now (and it’s doing a great job so far). It’s not often you actually get to use the software that you write, but when you do, it’s a great opportunity to see it through users’ eyes, and I’ve made a few changes and improvements over the past year as a result.

The most recent project that we’ve used it on at Dot Net Solutions is the Met Office open data thing that was announced on Tuesday. And that forced me to bring forward an optimisation that had been way down my to-do list, partly because I didn’t realise just how much of an optimisation it would be.

The Met Office project involves inserting something like 8 million records a day into a SQL Azure database, which isn’t a huge amount, but enough to need you to be smart about how you do it. The version of Simple.Data that was on NuGet when we started supported bulk inserts, but it wasn’t friendly to the error handling we needed and it assumed it needed to return the inserted records, doing that whole ‘select just-inserted-record’ thing, which is often completely unnecessary.

(So it turns out that when you’re handling TryInvokeMember in a DynamicObject, you can actually find out whether the return value is used by the caller, and not bother if it isn’t. But that’s another blog post.)

Anyway, I tweaked a couple of things and shaved off a fraction of the time it was taking, but it was a small fraction, and things were still far too slow. So we did what we should have done in the first place, and used SqlBulkCopy.

If you haven’t used this (SqlClient-specific) method, you should read up on it and keep it in your mental list of “things that are good that I might need some day”. It lets you prepare a big batch of rows in a DataTable (turns out they’re still good for something) and then insert them in a single operation, and man, it’s quick.
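To give a feel for the pattern, here’s a minimal sketch of the DataTable-plus-SqlBulkCopy approach. The table name, column names and connection string are placeholders for this example, not the Met Office schema:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class BulkCopySketch
{
    static void Main()
    {
        // Build the whole batch in memory first.
        var table = new DataTable("Item");
        table.Columns.Add("Guid", typeof(Guid));
        table.Columns.Add("Text", typeof(string));

        for (int i = 0; i < 10000; i++)
        {
            var guid = Guid.NewGuid();
            table.Rows.Add(guid, guid.ToString("N"));
        }

        // Then push the batch to the server in a single operation.
        using (var connection = new SqlConnection(
            "data source=.;initial catalog=BulkInsertTest;integrated security=true"))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "Item";
                bulkCopy.ColumnMappings.Add("Guid", "Guid");
                bulkCopy.ColumnMappings.Add("Text", "Text");
                bulkCopy.WriteToServer(table);
            }
        }
    }
}
```

The win comes from the single round trip: instead of 10,000 INSERT statements, the rows stream to the server as one bulk operation.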

But it’s SQL Server specific, so I couldn’t support it in the generic ADO adapter code.

I’ve exposed a few interfaces in the Simple.Data.Ado assembly which providers can optionally implement if they need to do something a little differently or can do something better. The first instance was ICustomInserter, which is implemented in the Oracle provider to handle fetch-backs in a world without IDENTITY columns. Since then I’ve added more as I went along, and IBulkInserter was one of them because, as I said earlier, I had half a mind to implement this. And now I have.
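For a rough idea of the shape of such an extension point, here’s an illustrative sketch. The real IBulkInserter lives in Simple.Data.Ado and its actual signature may well differ; the AdoAdapter stub below is just a placeholder so the sketch compiles:

```csharp
using System.Collections.Generic;

// Placeholder standing in for the real Simple.Data.Ado adapter type.
public class AdoAdapter { }

// Illustrative only: the real interface in Simple.Data.Ado may differ.
// Rows travel through the ADO layer as dictionaries of column name to value,
// so a bulk inserter receives the whole batch at once.
public interface IBulkInserterSketch
{
    IEnumerable<IDictionary<string, object>> Insert(
        AdoAdapter adapter,
        string tableName,
        IEnumerable<IDictionary<string, object>> data);
}

// A SQL Server provider can implement this with SqlBulkCopy, while providers
// that don't supply an implementation fall back to the generic row-by-row code.
```

The point of the design is that the generic ADO adapter stays provider-agnostic: it only looks for the interface, and each provider opts in with whatever its database does best.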

Anyway, I’ll stop blathering now and just post the comparison code I wrote (measures time to insert 10,000 records, five times) and the before and after results.


[sourcecode language="csharp"]
namespace BulkInsertComparison
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using Simple.Data;

    class Program
    {
        static void Main(string[] args)
        {
            var db = Database.OpenConnection("data source=.;initial catalog=BulkInsertTest;integrated security=true");

            for (int i = 0; i < 5; i++)
            {
                Console.WriteLine(TimeInsert(db));
            }
        }

        private static TimeSpan TimeInsert(dynamic db)
        {
            var stopwatch = Stopwatch.StartNew();
            db.Items.Insert(GenerateItems(10000));
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        static IEnumerable<Item> GenerateItems(int number)
        {
            for (int i = 0; i < number; i++)
            {
                var guid = Guid.NewGuid();
                yield return new Item(0, guid, guid.ToString("N"));
            }
        }
    }

    class Item
    {
        private readonly int _id;
        private readonly Guid _guid;
        private readonly string _text;

        public Item(int id, Guid guid, string text)
        {
            _id = id;
            _guid = guid;
            _text = text;
        }

        public string Text { get { return _text; } }
        public Guid Guid { get { return _guid; } }
        public int Id { get { return _id; } }
    }
}
[/sourcecode]

00:00:00.4616911 (First run includes MEFing IBulkInserter)

Totally worth it.

0.12.2 on NuGet now. Mini-roadmap: eager-loading (0.14) and upserts (0.15).


