File identifier feature.

Allows you to add, and test for the presence of, a magic 4-char string in a FlatBuffer.
Tested: on OS X.
Change-Id: I090692a9e4fb53bed3543279a28563e67132cba0

Commit 5da7bda8 authored Jul 31, 2014 by Wouter van Oortmerssen
13 changed files
--- a/.gitignore
+++ b/.gitignore
@@ -34,3 +34,4 @@ flatsamplebinary
 flatsampletext
 snapshot.sh
 tests/go_gen
+CMakeLists.txt.user
--- a/docs/html/md__cpp_usage.html
+++ b/docs/html/md__cpp_usage.html
@@ -74,7 +74,7 @@ mb.add_name(name);
 mb.add_inventory(inventory);
 auto mloc = mb.Finish();
 </pre><p>We start with a temporary helper class <code>MonsterBuilder</code> (which is also defined in our generated code), then call the various <code>add_</code> methods to set fields, and <code>Finish</code> to complete the object. This is pretty much the same code as you find inside <code>CreateMonster</code>, except we're leaving out a few fields. Fields may also be added in any order, though orderings with fields of the same size adjacent to each other are most efficient in size, due to alignment. You should not nest these Builder classes (serialize your data in pre-order).</p>
-<p>Regardless of whether you used <code>CreateMonster</code> or <code>MonsterBuilder</code>, you now have an offset to the root of your data, and you can finish the buffer using: </p><pre class="fragment">fbb.Finish(mloc);
+<p>Regardless of whether you used <code>CreateMonster</code> or <code>MonsterBuilder</code>, you now have an offset to the root of your data, and you can finish the buffer using: </p><pre class="fragment">FinishMonsterBuffer(fbb, mloc);
 </pre><p>The buffer is now ready to be stored somewhere, sent over the network, be compressed, or whatever you'd like to do with it. You can access the start of the buffer with <code>fbb.GetBufferPointer()</code>, and its size from <code>fbb.GetSize()</code>.</p>
 <p><code>samples/sample_binary.cpp</code> is a complete code sample similar to the code above, that also includes the reading code below.</p>
 <h3>Reading in C++</h3>

--- a/docs/html/md__schemas.html
+++ b/docs/html/md__schemas.html
@@ -113,7 +113,15 @@ root_type Monster;
 <p>These will generate the corresponding namespace in C++ for all helper code, and packages in Java. You can use <code>.</code> to specify nested namespaces / packages.</p>
 <h3>Root type</h3>
 <p>This declares what you consider to be the root table (or struct) of the serialized data. This is particularly important for parsing JSON data, which doesn't include object type information.</p>
-<h3>Comments &amp; documentation</h3>
+<h3>File identification and extension</h3>
+<p>Typically, a FlatBuffer binary buffer is not self-describing, i.e. it needs you to know its schema to parse it correctly. But if you want to use a FlatBuffer as a file format, it is convenient to have a "magic number" in it, as most file formats do, so you can sanity-check that you are reading the kind of file you expect.</p>
+<p>Now, you can always prefix a FlatBuffer with your own file header, but FlatBuffers has a built-in way to add an identifier to a FlatBuffer that takes up minimal space, and keeps the buffer compatible with buffers that don't have such an identifier.</p>
+<p>You can specify in a schema, similar to <code>root_type</code>, that you intend for this type of FlatBuffer to be used as a file format: </p><pre class="fragment">file_identifier "MYFI";
+</pre><p>Identifiers must always be exactly 4 characters long. These 4 characters will end up as bytes at offsets 4-7 (inclusive) in the buffer.</p>
+<p>For any schema that has such an identifier, <code>flatc</code> will automatically add the identifier to any binaries it generates (with <code>-b</code>), and generated calls like <code>FinishMonsterBuffer</code> also add the identifier. If you have specified an identifier and wish to generate a buffer without one, you can always still do so by calling <code>FlatBufferBuilder::Finish</code> explicitly.</p>
+<p>After loading a buffer, you can use a call like <code>MonsterBufferHasIdentifier</code> to check if the identifier is present.</p>
+<p>Additionally, by default <code>flatc</code> will output binary files as <code>.bin</code>. This declaration in the schema will change that to whatever you want: </p><pre class="fragment">file_extension "ext";
+</pre><h3>Comments &amp; documentation</h3>
 <p>May be written as in most C-based languages. Additionally, a triple comment (<code>///</code>) on a line by itself signals that a comment is documentation for whatever is declared on the line after it (table/struct/field/enum/union/element), and the comment is output in the corresponding C++ code. Multiple such lines per item are allowed.</p>
 <h3>Attributes</h3>
 <p>Attributes may be attached to a declaration, behind a field, or after the name of a table/struct/enum/union. These may either have a value or not. Some attributes like <code>deprecated</code> are understood by the compiler, others are simply ignored (like <code>priority</code>), but are available to query if you parse the schema at runtime. This is useful if you write your own code generators/editors etc., and you wish to add additional information specific to your tool (such as a help text).</p>

--- a/docs/source/CppUsage.md
+++ b/docs/source/CppUsage.md
@@ -88,7 +88,7 @@ Regardless of whether you used `CreateMonster` or `MonsterBuilder`, you
 now have an offset to the root of your data, and you can finish the
 buffer using:

-    fbb.Finish(mloc);
+    FinishMonsterBuffer(fbb, mloc);

 The buffer is now ready to be stored somewhere, sent over the network,
 be compressed, or whatever you'd like to do with it. You can access the

--- a/docs/source/Schemas.md
+++ b/docs/source/Schemas.md
@@ -147,6 +147,43 @@ This declares what you consider to be the root table (or struct) of the
 serialized data. This is particularly important for parsing JSON data,
 which doesn't include object type information.

+### File identification and extension
+
+Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
+needs you to know its schema to parse it correctly. But if you
+want to use a FlatBuffer as a file format, it is convenient to
+have a "magic number" in it, as most file formats do, so you can
+sanity-check that you are reading the kind of file you expect.
+
+Now, you can always prefix a FlatBuffer with your own file header,
+but FlatBuffers has a built-in way to add an identifier to a
+FlatBuffer that takes up minimal space, and keeps the buffer
+compatible with buffers that don't have such an identifier.
+
+You can specify in a schema, similar to `root_type`, that you intend
+for this type of FlatBuffer to be used as a file format:
+
+    file_identifier "MYFI";
+
+Identifiers must always be exactly 4 characters long. These 4 characters
+will end up as bytes at offsets 4-7 (inclusive) in the buffer.
+
+For any schema that has such an identifier, `flatc` will automatically
+add the identifier to any binaries it generates (with `-b`),
+and generated calls like `FinishMonsterBuffer` also add the identifier.
+If you have specified an identifier and wish to generate a buffer
+without one, you can always still do so by calling
+`FlatBufferBuilder::Finish` explicitly.
+
+After loading a buffer, you can use a call like
+`MonsterBufferHasIdentifier` to check if the identifier is present.
+
+Additionally, by default `flatc` will output binary files as `.bin`.
+This declaration in the schema will change that to whatever you want:
+
+    file_extension "ext";
+
 ### Comments & documentation

 May be written as in most C-based languages. Additionally, a triple

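The schema docs above place the identifier at bytes 4-7, right after the 4-byte root offset. A reader-side check can be sketched without the library as below. `HasFileIdentifier` is a hypothetical name, not a FlatBuffers API; unlike the `BufferHasIdentifier` helper added in this commit, it also takes the buffer size and rejects buffers shorter than 8 bytes before comparing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of FlatBuffers): checks for a 4-byte file
// identifier at offsets 4-7, after first making sure the buffer is long
// enough to hold the 4-byte root offset plus the identifier.
inline bool HasFileIdentifier(const uint8_t *buf, size_t size,
                              const char *identifier) {
  const size_t kRootOffsetSize = 4;    // stands in for sizeof(uoffset_t)
  const size_t kIdentifierLength = 4;  // stands in for kFileIdentifierLength
  if (buf == nullptr || size < kRootOffsetSize + kIdentifierLength) {
    return false;  // too short to contain an identifier at all
  }
  // Compare exactly four bytes starting right after the root offset.
  return std::memcmp(buf + kRootOffsetSize, identifier, kIdentifierLength) == 0;
}
```

The size check matters when the input comes from a file that may be truncated or is not a FlatBuffer at all.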
--- a/include/flatbuffers/flatbuffers.h
+++ b/include/flatbuffers/flatbuffers.h
@@ -613,10 +613,21 @@ class FlatBufferBuilder {
    return CreateVectorOfStructs(v.data(), v.size());
  }

+  static const int kFileIdentifierLength = 4;
+
  // Finish serializing a buffer by writing the root offset.
-  template<typename T> void Finish(Offset<T> root) {
+  // If a file_identifier is given, the buffer will be prefixed with a standard
+  // FlatBuffers file header.
+  template<typename T> void Finish(Offset<T> root,
+                                   const char *file_identifier = nullptr) {
    // This will cause the whole buffer to be aligned.
-    PreAlign(sizeof(uoffset_t), minalign_);
+    PreAlign(sizeof(uoffset_t) + (file_identifier ? kFileIdentifierLength : 0),
+             minalign_);
+    if (file_identifier) {
+      assert(strlen(file_identifier) == kFileIdentifierLength);
+      buf_.push(reinterpret_cast<const uint8_t *>(file_identifier),
+                kFileIdentifierLength);
+    }
    PushElement(ReferTo(root.o));  // Location of root.
  }

@@ -649,6 +660,12 @@ template<typename T> const T *GetRoot(const void *buf) {
    EndianScalar(*reinterpret_cast<const uoffset_t *>(buf)));
 }

+// Helper to see if the identifier in a buffer has the expected value.
+inline bool BufferHasIdentifier(const void *buf, const char *identifier) {
+  return strncmp(reinterpret_cast<const char *>(buf) + 4, identifier,
+                 FlatBufferBuilder::kFileIdentifierLength) == 0;
+}
+
 // Helper class to verify the integrity of a FlatBuffer
 class Verifier {
 public:

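To see why pushing the identifier before the root offset in `Finish` yields the documented layout (offset at bytes 0-3, identifier at bytes 4-7), here is a toy model of a back-to-front builder. `ToyBuilder` is hypothetical and greatly simplified: it prepends bytes to a `std::vector`, skips the `PreAlign` step entirely, and takes the root offset as an argument instead of computing it with `ReferTo`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified model of a back-to-front builder: every push is
// prepended, so later pushes end up at lower offsets in the finished buffer.
class ToyBuilder {
 public:
  // Prepend raw bytes, keeping their internal order.
  void Push(const uint8_t *bytes, size_t len) {
    buf_.insert(buf_.begin(), bytes, bytes + len);
  }

  // Prepend a 32-bit value in little-endian order (FlatBuffers' wire order).
  void PushU32(uint32_t v) {
    const uint8_t le[4] = {uint8_t(v), uint8_t(v >> 8), uint8_t(v >> 16),
                           uint8_t(v >> 24)};
    Push(le, sizeof(le));
  }

  // Same order of operations as the patched Finish(): identifier first,
  // then the root offset, which therefore lands at byte 0.
  // root_offset is the distance from byte 0 to the root table.
  void Finish(uint32_t root_offset, const char *file_identifier) {
    if (file_identifier) {
      Push(reinterpret_cast<const uint8_t *>(file_identifier), 4);
    }
    PushU32(root_offset);
  }

  const std::vector<uint8_t> &data() const { return buf_; }

 private:
  std::vector<uint8_t> buf_;
};
```

Because both the root offset and the identifier are pushed after any alignment padding, the patched `Finish` passes their combined size (`sizeof(uoffset_t)` plus the identifier length) to `PreAlign`, so that the start of the finished buffer still ends up aligned to `minalign_`.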
--- a/include/flatbuffers/idl.h
+++ b/include/flatbuffers/idl.h
@@ -284,6 +284,8 @@ class Parser {

  FlatBufferBuilder builder_;  // any data contained in the file
  StructDef *root_struct_def;
+  std::string file_identifier_;
+  std::string file_extension_;

 private:
  const char *source_, *cursor_;

--- a/src/flatc.cpp
+++ b/src/flatc.cpp
@@ -27,9 +27,10 @@ bool GenerateBinary(const Parser &parser,
                    const std::string &path,
                    const std::string &file_name,
                    const GeneratorOptions & /*opts*/) {
+  auto ext = parser.file_extension_.length() ? parser.file_extension_ : "bin";
  return !parser.builder_.GetSize() ||
         flatbuffers::SaveFile(
-           (path + file_name + ".bin").c_str(),
+           (path + file_name + "." + ext).c_str(),
           reinterpret_cast<char *>(parser.builder_.GetBufferPointer()),
           parser.builder_.GetSize(),
           true);

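The naming rule that `GenerateBinary` now follows can be pulled out into a small function for illustration. `BinaryOutputName` is a hypothetical helper, not part of `flatc`; it applies the same fallback to `"bin"` when the schema declares no `file_extension`.

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone version of the naming rule in GenerateBinary():
// use the schema's file_extension if one was declared, otherwise "bin".
std::string BinaryOutputName(const std::string &path,
                             const std::string &file_name,
                             const std::string &file_extension) {
  const std::string ext = file_extension.empty() ? "bin" : file_extension;
  return path + file_name + "." + ext;  // the dot is always added here
}
```

Because the dot is added by the code, the schema should declare the extension without one, e.g. `file_extension "mon";` as in `tests/monster_test.fbs`; a leading dot in the schema would produce a double dot.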
--- a/src/idl_gen_cpp.cpp
+++ b/src/idl_gen_cpp.cpp
@@ -466,18 +466,36 @@ std::string GenerateCPP(const Parser &parser, const std::string &include_guard_i
    code += decl_code;
    code += enum_code_post;

-    // Generate convenient root datatype accessor, and root verifier.
+    // Generate convenient global helper functions:
    if (parser.root_struct_def) {
+      // The root datatype accessor:
      code += "inline const " + parser.root_struct_def->name + " *Get";
      code += parser.root_struct_def->name;
      code += "(const void *buf) { return flatbuffers::GetRoot<";
      code += parser.root_struct_def->name + ">(buf); }\n\n";

+      // The root verifier:
      code += "inline bool Verify";
      code += parser.root_struct_def->name;
      code += "Buffer(const flatbuffers::Verifier &verifier) { "
              "return verifier.VerifyBuffer<";
      code += parser.root_struct_def->name + ">(); }\n\n";
+
+      // Finish a buffer with a given root object:
+      code += "inline void Finish" + parser.root_struct_def->name;
+      code += "Buffer(flatbuffers::FlatBufferBuilder &fbb, flatbuffers::Offset<";
+      code += parser.root_struct_def->name + "> root) { fbb.Finish(root";
+      if (parser.file_identifier_.length())
+        code += ", \"" + parser.file_identifier_ + "\"";
+      code += "); }\n\n";
+
+      if (parser.file_identifier_.length()) {
+        // Check if a buffer has the identifier.
+        code += "inline bool " + parser.root_struct_def->name;
+        code += "BufferHasIdentifier(const void *buf) { return flatbuffers::";
+        code += "BufferHasIdentifier(buf, \"" + parser.file_identifier_;
+        code += "\"); }\n\n";
+      }
    }

    // Close the namespaces.

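The generator change can be checked in isolation by replaying its string building outside the parser. In this sketch `GenerateRootHelpers` is hypothetical; `root` stands in for `parser.root_struct_def->name` and `file_identifier` for `parser.file_identifier_`. With `"Monster"` and `"MONS"` it should reproduce the two helpers added to `tests/monster_test_generated.h`.

```cpp
#include <cassert>
#include <string>

// Standalone sketch of the string building GenerateCPP() does for the two
// new helpers. An empty file_identifier means none was declared.
std::string GenerateRootHelpers(const std::string &root,
                                const std::string &file_identifier) {
  std::string code;
  // Finish<Root>Buffer() is always emitted; it passes the identifier only
  // when the schema declares one.
  code += "inline void Finish" + root;
  code += "Buffer(flatbuffers::FlatBufferBuilder &fbb, flatbuffers::Offset<";
  code += root + "> root) { fbb.Finish(root";
  if (!file_identifier.empty())
    code += ", \"" + file_identifier + "\"";
  code += "); }\n\n";

  // <Root>BufferHasIdentifier() is only emitted when an identifier exists.
  if (!file_identifier.empty()) {
    code += "inline bool " + root;
    code += "BufferHasIdentifier(const void *buf) { return flatbuffers::";
    code += "BufferHasIdentifier(buf, \"" + file_identifier;
    code += "\"); }\n\n";
  }
  return code;
}
```

Since `Finish<Root>Buffer` is emitted whenever a root type is declared, switching call sites from `builder.Finish(mloc)` to it (as `tests/test.cpp` does) also works for schemas without an identifier.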
--- a/src/idl_parser.cpp
+++ b/src/idl_parser.cpp
@@ -81,7 +81,9 @@ template<> inline Offset<void> atot<Offset<void>>(const char *s) {
  TD(Enum, 263, "enum") \
  TD(Union, 264, "union") \
  TD(NameSpace, 265, "namespace") \
-  TD(RootType, 266, "root_type")
+  TD(RootType, 266, "root_type") \
+  TD(FileIdentifier, 267, "file_identifier") \
+  TD(FileExtension, 268, "file_extension")
 #ifdef __GNUC__
__extension__  // Stop GCC complaining about trailing comma with -Wpedantic.
 #endif
@@ -194,6 +196,14 @@ void Parser::Next() {
          if (attribute_ == "union")     { token_ = kTokenUnion;     return; }
          if (attribute_ == "namespace") { token_ = kTokenNameSpace; return; }
          if (attribute_ == "root_type") { token_ = kTokenRootType;  return; }
+          if (attribute_ == "file_identifier") {
+            token_ = kTokenFileIdentifier;
+            return;
+          }
+          if (attribute_ == "file_extension") {
+            token_ = kTokenFileExtension;
+            return;
+          }
          // If not, it is a user-defined identifier:
          token_ = kTokenIdentifier;
          return;
@@ -802,11 +812,26 @@ bool Parser::Parse(const char *source) {
        Next();
        auto root_type = attribute_;
        Expect(kTokenIdentifier);
-        Expect(';');
        if (!SetRootType(root_type.c_str()))
          Error("unknown root type: " + root_type);
        if (root_struct_def->fixed)
          Error("root type must be a table");
+        Expect(';');
+      } else if (token_ == kTokenFileIdentifier) {
+        Next();
+        file_identifier_ = attribute_;
+        Expect(kTokenStringConstant);
+        if (file_identifier_.length() !=
+            FlatBufferBuilder::kFileIdentifierLength)
+          Error("file_identifier must be exactly " +
+                NumToString(FlatBufferBuilder::kFileIdentifierLength) +
+                " characters");
+        Expect(';');
+      } else if (token_ == kTokenFileExtension) {
+        Next();
+        file_extension_ = attribute_;
+        Expect(kTokenStringConstant);
+        Expect(';');
      } else {
        ParseDecl();
      }

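The new `file_identifier` branch in `Parser::Parse` rejects any string constant whose length is not `kFileIdentifierLength`. A standalone version of that check, under the hypothetical name `CheckFileIdentifier` and with `std::to_string` standing in for `NumToString`, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mirror of the length check in Parser::Parse(). Returns an
// empty string on success, otherwise the error text the parser reports.
std::string CheckFileIdentifier(const std::string &file_identifier) {
  const size_t kFileIdentifierLength = 4;  // FlatBufferBuilder::kFileIdentifierLength
  if (file_identifier.length() != kFileIdentifierLength) {
    return "file_identifier must be exactly " +
           std::to_string(kFileIdentifierLength) + " characters";
  }
  return "";
}
```

`file_extension` gets no comparable validation in this commit; any string constant is accepted.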
--- a/tests/monster_test.fbs
+++ b/tests/monster_test.fbs
@@ -36,3 +36,6 @@ table Monster {
 }

 root_type Monster;
+
+file_identifier "MONS";
+file_extension "mon";
--- a/tests/monster_test_generated.h
+++ b/tests/monster_test_generated.h
@@ -187,6 +187,10 @@ inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<

 inline bool VerifyMonsterBuffer(const flatbuffers::Verifier &verifier) { return verifier.VerifyBuffer<Monster>(); }

+inline void FinishMonsterBuffer(flatbuffers::FlatBufferBuilder &fbb, flatbuffers::Offset<Monster> root) { fbb.Finish(root, "MONS"); }
+
+inline bool MonsterBufferHasIdentifier(const void *buf) { return flatbuffers::BufferHasIdentifier(buf, "MONS"); }
+
 }  // namespace Example
 }  // namespace MyGame


--- a/tests/test.cpp
+++ b/tests/test.cpp
@@ -93,7 +93,7 @@ std::string CreateFlatBufferTest() {
                            Any_Monster, mloc2.Union(), // Store a union.
                            testv, vecofstrings, vecoftables, 0);

-  builder.Finish(mloc);
+  FinishMonsterBuffer(builder, mloc);

  #ifdef FLATBUFFERS_TEST_VERBOSE
  // print byte data for debugging:
@@ -116,6 +116,8 @@ void AccessFlatBufferTest(const std::string &flatbuf) {
    flatbuf.length());
  TEST_EQ(VerifyMonsterBuffer(verifier), true);

+  TEST_EQ(MonsterBufferHasIdentifier(flatbuf.c_str()), true);
+
  // Access the buffer from the root.
  auto monster = GetMonster(flatbuf.c_str());