    Implement kj::Table, an alternative to STL maps/sets. · 8707e731
    Kenton Varda authored
    Hash-based (unordered) and tree-based (ordered) indexing are provided.
    
    kj::Table offers advantages over STL:
    - A Table can have multiple indexes (allowing lookup by multiple keys). Different indexes can use different algorithms (e.g. hash vs. tree) and have different uniqueness constraints.
    - The properties on which a Table is indexed need not be explicit fields -- they can be computed from the table's row type.
    - Tables use less memory and make fewer allocations than STL, because rows are stored in a contiguous array.
    - The hash indexing implementation uses linear probing rather than chaining, which again means far fewer allocations and more cache-friendliness.
    - The tree indexing implementation uses B-trees optimized for cache line size, whereas STL uses cache-unfriendly and allocation-heavy red-black binary trees. (However, STL trees are overall faster in the benchmark; see below.)
    - Most of the B-tree implementation is not templated. This reduces code bloat, at the cost of some performance due to virtual calls.
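
To make the row-storage and linear-probing points concrete, here is a minimal sketch using only the standard library. It is not kj::Table's code or API: `Person`, `PersonTable`, `kEmptySlot`, and the fixed load factor of 1/2 are all hypothetical names and choices made for illustration. Rows live in one contiguous vector, and the hash index is a flat array of row positions probed linearly, so an insert allocates nothing beyond the occasional growth of the two vectors.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type for illustration; not part of kj.
struct Person {
  std::string name;
  int birthYear;
};

// Marker for an unused slot in the hash index.
constexpr std::size_t kEmptySlot = static_cast<std::size_t>(-1);

// Toy table: rows sit in one contiguous vector, and the hash index stores
// row positions (not copies of keys) in a flat, linearly probed array.
class PersonTable {
public:
  PersonTable(): slots(16, kEmptySlot) {}

  // Unique index on name: returns false if the name is already present.
  bool insert(Person row) {
    if ((rows.size() + 1) * 2 > slots.size()) grow();  // keep load <= 1/2
    std::size_t i = probe(row.name);
    if (slots[i] != kEmptySlot) return false;
    slots[i] = rows.size();
    rows.push_back(std::move(row));
    return true;
  }

  // The returned pointer is invalidated by a later insert().
  const Person* find(const std::string& name) const {
    std::size_t i = probe(name);
    return slots[i] == kEmptySlot ? nullptr : &rows[slots[i]];
  }

  std::size_t size() const { return rows.size(); }

private:
  // The indexed key is computed from the row rather than stored separately.
  static const std::string& keyOf(const Person& p) { return p.name; }

  // Linear probing: start at hash % capacity and step forward until we hit
  // either the matching row or an empty slot.
  std::size_t probe(const std::string& key) const {
    std::size_t i = std::hash<std::string>{}(key) % slots.size();
    while (slots[i] != kEmptySlot && keyOf(rows[slots[i]]) != key) {
      i = (i + 1) % slots.size();
    }
    return i;
  }

  // Double the slot array and re-insert every row position.
  void grow() {
    std::vector<std::size_t> old = std::move(slots);
    slots.assign(old.size() * 2, kEmptySlot);
    for (std::size_t pos : old) {
      if (pos != kEmptySlot) slots[probe(keyOf(rows[pos]))] = pos;
    }
  }

  std::vector<Person> rows;        // contiguous row storage
  std::vector<std::size_t> slots;  // hash index over row positions
};
```

A second index (for example, on `birthYear`) would be another array of positions over the same `rows` vector, which is how one table can serve lookups by several keys without duplicating the rows.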
    
    In an ad hoc benchmark on large tables, the hash index implementation appears to outperform libc++'s `std::unordered_set` by ~60%. However, libc++'s `std::set` still outperforms the B-tree index by ~70%. It looks like the B-tree implementation suffers in part from the fact that keys are not stored inline in the tree nodes, forcing extra memory indirections. This is a price we pay for lower memory usage overall, and the ability to have multiple indexes on one table. The B-tree implementation also suffers somewhat from not being fully templated, compared to STL, but I think this is a reasonable trade-off. The most performance-critical use cases will use hash indexes anyway.
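
The indirection cost mentioned above can be illustrated with a small sketch. This is not the B-tree from this commit: a sorted `std::vector` of row positions stands in for the tree, and `City`, `CityTable`, and `byPopulation` are hypothetical names. The point it shows is that when an ordered index holds only row positions, every comparison must load the row to get at the key, whereas `std::set` keeps the key inside the node it is already visiting.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type for illustration; not part of kj.
struct City {
  std::string name;
  int population;
};

// Toy ordered index: a sorted vector of row positions stands in for a B-tree.
class CityTable {
public:
  void insert(City row) {
    std::size_t pos = rows.size();
    rows.push_back(std::move(row));
    // Each comparison dereferences rows[...] to reach the key: this is the
    // extra memory indirection paid for not storing keys in the index.
    auto it = std::lower_bound(
        byPopulation.begin(), byPopulation.end(), pos,
        [this](std::size_t a, std::size_t b) {
          return rows[a].population < rows[b].population;
        });
    byPopulation.insert(it, pos);
  }

  // First row in index order with population >= minimum, or nullptr.
  const City* lowerBound(int minimum) const {
    auto it = std::lower_bound(
        byPopulation.begin(), byPopulation.end(), minimum,
        [this](std::size_t pos, int value) {
          return rows[pos].population < value;
        });
    return it == byPopulation.end() ? nullptr : &rows[*it];
  }

  // Row names in ascending population order.
  std::vector<std::string> namesInOrder() const {
    std::vector<std::string> out;
    for (std::size_t pos : byPopulation) out.push_back(rows[pos].name);
    return out;
  }

private:
  std::vector<City> rows;                 // rows stay in insertion order
  std::vector<std::size_t> byPopulation;  // positions sorted by population
};
```

The index here costs one `std::size_t` per row regardless of key size, which reflects the memory side of the trade-off described above.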
function.h 9.89 KB