Commit 8707e731 authored by Kenton Varda

Implement kj::Table, an alternative to STL maps/sets.

Hash-based (unordered) and tree-based (ordered) indexing are provided.

kj::Table offers advantages over STL:
- A Table can have multiple indexes (allowing lookup by multiple keys). Different indexes can use different algorithms (e.g. hash vs. tree) and have different uniqueness constraints.
- The properties on which a Table is indexed need not be explicit fields -- they can be computed from the table's row type.
- Tables use less memory and make fewer allocations than STL, because rows are stored in a contiguous array.
- The hash indexing implementation uses linear probing rather than chaining, which again means far fewer allocations and more cache-friendliness.
- The tree indexing implementation uses B-trees optimized for cache line size, whereas STL uses cache-unfriendly and allocation-heavy red-black binary trees. (However, STL trees still perform better overall in practice; see below.)
- Most of the B-tree implementation is not templated. This reduces code bloat, at the cost of some performance due to virtual calls.
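
To make the first few points concrete, here is a minimal sketch in plain standard C++. It is not kj::Table's API or implementation: `Person`, `PersonTable`, `kEmpty`, and every member below are hypothetical names invented for illustration, and the sketch assumes a fixed capacity (no rehashing) and no row removal. Rows live in one contiguous vector, a unique hash index on `name` uses linear probing, and a non-unique ordered index uses a key computed from the row (the birth decade). The ordered index is a sorted vector standing in for a B-tree. Both indexes store row positions rather than copies of the keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type, invented for this sketch.
struct Person {
  std::string name;
  int birthYear;
};

// Marks an unused slot in the hash index.
constexpr std::size_t kEmpty = static_cast<std::size_t>(-1);

class PersonTable {
public:
  // Assumption: capacity is fixed up front; the hash index never grows.
  explicit PersonTable(std::size_t capacity) : slots(capacity * 2 + 1, kEmpty) {}

  // Adds a row to both indexes. Returns false if the name already exists.
  bool insert(Person p) {
    assert((rows.size() + 1) * 2 <= slots.size());  // keep the load factor under 1/2
    std::size_t slot = probe(p.name);
    if (slots[slot] != kEmpty) return false;  // unique index: reject duplicates
    int decade = decadeOf(p);
    std::size_t pos = rows.size();
    rows.push_back(std::move(p));
    slots[slot] = pos;
    // Ordered, non-unique index: keep row positions sorted by the computed key.
    auto it = std::upper_bound(byDecade.begin(), byDecade.end(), decade,
        [this](int d, std::size_t r) { return d < decadeOf(rows[r]); });
    byDecade.insert(it, pos);
    return true;
  }

  // Hash lookup by name; returns nullptr if there is no such row.
  const Person* findByName(const std::string& name) const {
    std::size_t slot = probe(name);
    return slots[slot] == kEmpty ? nullptr : &rows[slots[slot]];
  }

  // Ordered lookup: names of all rows whose computed decade equals `decade`.
  std::vector<std::string> namesInDecade(int decade) const {
    auto it = std::lower_bound(byDecade.begin(), byDecade.end(), decade,
        [this](std::size_t r, int d) { return decadeOf(rows[r]) < d; });
    std::vector<std::string> result;
    for (; it != byDecade.end() && decadeOf(rows[*it]) == decade; ++it) {
      result.push_back(rows[*it].name);
    }
    return result;
  }

private:
  // The ordered index's key is computed from the row, not stored as a field.
  static int decadeOf(const Person& p) { return p.birthYear / 10 * 10; }

  // Linear probing: step to the next slot until the name or an empty slot is found.
  std::size_t probe(const std::string& name) const {
    std::size_t slot = std::hash<std::string>()(name) % slots.size();
    while (slots[slot] != kEmpty && rows[slots[slot]].name != name) {
      slot = (slot + 1) % slots.size();
    }
    return slot;
  }

  std::vector<Person> rows;           // all rows, stored contiguously
  std::vector<std::size_t> slots;     // hash index: slot -> row position
  std::vector<std::size_t> byDecade;  // ordered index: row positions sorted by decade
};
```

Because the indexes hold only row positions, every key comparison has to go through `rows`. That is the same kind of extra indirection the commit message identifies as a cost of not storing keys inline.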

In an ad hoc benchmark on large tables, the hash index implementation appears to outperform libc++'s `std::unordered_set` by ~60%. However, libc++'s `std::set` still outperforms the B-tree index by ~70%. It looks like the B-tree implementation suffers in part from the fact that keys are not stored inline in the tree nodes, forcing extra memory indirections. This is a price we pay for lower memory usage overall, and the ability to have multiple indexes on one table. The B-tree implementation also suffers somewhat from not being fully templated, compared to STL, but I think this is a reasonable trade-off. The most performance-critical use cases will use hash indexes anyway.
parent 78c27314
@@ -530,8 +530,13 @@ template <typename T, size_t s>
 inline constexpr size_t size(T (&arr)[s]) { return s; }
 template <typename T>
 inline constexpr size_t size(T&& arr) { return arr.size(); }
+template <typename T, typename U, size_t s>
+inline constexpr size_t size(U (T::*arr)[s]) { return s; }
 // Returns the size of the parameter, whether the parameter is a regular C array or a container
 // with a `.size()` method.
+//
+// Can also be invoked on a pointer-to-member-array to get the declared size of that array,
+// without having an instance of the containing type. E.g.: kj::size(&MyType::someArray)
 class MaxValue_ {
 private:
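
As a rough illustration of the new pointer-to-member overload, the sketch below copies the three `size()` overloads from this hunk into a stand-in `demo` namespace so that it compiles without KJ. `demo`, `Packet`, and `demoUsage` are hypothetical names invented here. The only difference from the hunk is that `size_t` is spelled `std::size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace demo {  // stand-in namespace, not kj

template <typename T, std::size_t s>
inline constexpr std::size_t size(T (&arr)[s]) { return s; }
template <typename T>
inline constexpr std::size_t size(T&& arr) { return arr.size(); }
// The overload added by this commit: accepts a pointer to a member array.
template <typename T, typename U, std::size_t s>
inline constexpr std::size_t size(U (T::*arr)[s]) { return s; }

}  // namespace demo

// Hypothetical type with two member arrays.
struct Packet {
  unsigned char header[16];
  int samples[4];
};

// The declared array size is available at compile time, with no Packet instance.
static_assert(demo::size(&Packet::header) == 16, "header is declared with 16 elements");

inline void demoUsage() {
  int local[3] = {1, 2, 3};
  std::vector<int> v(5);
  assert(demo::size(local) == 3u);             // regular C array
  assert(demo::size(v) == 5u);                 // container with .size()
  assert(demo::size(&Packet::samples) == 4u);  // pointer to member array
}
```

Overload resolution prefers the pointer-to-member overload over the generic `T&&` one because it is the more specialized template, so the `.size()` body is never instantiated for that call.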
@@ -213,7 +213,8 @@ public:
 // All instances of Wrapper<Func> are two pointers in size: a vtable, and a Func&. So if we
 // allocate space for two pointers, we can construct a Wrapper<Func> in it!
-static_assert(sizeof(WrapperType) == sizeof(space));
+static_assert(sizeof(WrapperType) == sizeof(space),
+    "expected WrapperType to be two pointers");
 // Even if `func` is an rvalue reference, it's OK to use it as an lvalue here, because
 // FunctionParam is used strictly for parameters. If we captured a temporary, we know that
(The diffs for three further files are collapsed.)