Document Object Model(DOM) is a in-memory representation of JSON for query and manipulation. The basic usage of DOM is described in [Tutorial](tutorial.md). This section will describe some details and more advanced usages.
Document Object Model(DOM) is an in-memory representation of JSON for query and manipulation. The basic usage of DOM is described in [Tutorial](doc/tutorial.md). This section will describe some details and more advanced usages.
## Template
[TOC]
# Template {#Template}
In the tutorial, `Value` and `Document` were used. Similar to `std::string`, these are actually `typedef`s of template classes:
@@ -19,17 +23,21 @@ class GenericDocument : public GenericValue<Encoding, Allocator> {
typedef GenericValue<UTF8<> > Value;
typedef GenericDocument<UTF8<> > Document;
} // namespace rapidjson
~~~~~~~~~~
User can customize these template parameters.
### Encoding
## Encoding {#Encoding}
The `Encoding` parameter specifies the encoding of JSON string values in memory. Possible options are `UTF8`, `UTF16` and `UTF32`. Note that these three types are also template classes. `UTF8<>` is `UTF8<char>`, which means using `char` to store the characters. You may refer to [Encoding](encoding.md) for details.
Suppose a Windows application would query localization strings stored in JSON files. Unicode-enabled functions in Windows use UTF-16 (wide character) encoding. No matter what encoding was used in JSON files, we can store the strings in UTF-16 in memory.
~~~~~~~~~~cpp
using namespace rapidjson;
typedef GenericDocument<UTF16<> > WDocument;
typedef GenericValue<UTF16<> > WValue;
...
...
@@ -48,7 +56,7 @@ const WValue locale(L"ja"); // Japanese
The `Allocator` defines which allocator class is used when allocating/deallocating memory for `Document`/`Value`. `Document` owns, or references, an `Allocator` instance. On the other hand, `Value` does not do so, in order to reduce memory consumption.
...
...
@@ -56,44 +64,46 @@ The default allocator used in `GenericDocument` is `MemoryPoolAllocator`. This a
Another allocator is `CrtAllocator`, where CRT is short for C RunTime library. This allocator simply calls the standard `malloc()`/`realloc()`/`free()`. When there are a lot of add and remove operations, this allocator may be preferred. But this allocator is far less efficient than `MemoryPoolAllocator`.
## Parsing
# Parsing {#Parsing}
`Document` provides several functions for parsing. Below, (1) is the fundamental function, while the others are helpers which call (1).
The examples of [tutorial](tutorial.md) use (9) for normal parsing of a string. The examples of [stream](stream.md) use the first three. *In situ* parsing will be described soon.
...
...
@@ -108,11 +118,11 @@ Parse flags | Meaning
By using a non-type template parameter, instead of a function parameter, the C++ compiler can generate code which is optimized for the specified combinations, improving speed and reducing code size (if only a single specialization is used). The downside is that the flags need to be determined at compile time.
The `SourceEncoding` parameter defines what encoding is in the stream. This can differ from the `Encoding` of the `Document`. See the [Transcoding and Validation](#transcoding-and-validation) section for details.
The `SourceEncoding` parameter defines what encoding is in the stream. This can differ from the `Encoding` of the `Document`. See the [Transcoding and Validation](#TranscodingAndValidation) section for details.
And `InputStream` is the type of the input stream.
### Parse Error
## Parse Error {#ParseError}
When parsing succeeds, the `Document` contains the parse results. When there is an error, the original DOM is *unchanged*. The error state of parsing can be obtained by `bool HasParseError()`, `ParseErrorCode GetParseError()` and `size_t GetErrorOffset()`.
...
...
@@ -146,14 +156,12 @@ Here shows an example of parse error handling.
// TODO: example
~~~~~~~~~~
### In Situ Parsing
## In Situ Parsing {#InSituParsing}
From [Wikipedia](http://en.wikipedia.org/wiki/In_situ):
> *In situ* ... is a Latin phrase that translates literally to "on site" or "in position". It means "locally", "on site", "on the premises" or "in place" to describe an event where it takes place, and is used in many different contexts.
> ...
> (In computer science) An algorithm is said to be an in situ algorithm, or in-place algorithm, if the extra amount of memory required to execute the algorithm is O(1), that is, does not exceed a constant no matter how large the input. For example, heapsort is an in situ sorting algorithm.
In the normal parsing process, a large overhead is decoding JSON strings and copying them to other buffers. *In situ* parsing decodes those JSON strings at the place where they are stored. This is possible in JSON because the decoded string is always shorter than the one in JSON. In this context, decoding a JSON string means processing the escapes, such as `"\n"`, `"\u1234"`, etc., and adding a null terminator (`'\0'`) at the end of the string.
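The idea can be illustrated without RapidJSON. The following is a simplified, standalone sketch, not RapidJSON's actual implementation: the function name `DecodeInSitu()` is hypothetical, only the one-character escapes are handled, and `\uXXXX` escapes and error reporting are left out. Because each escape shrinks when decoded, the write cursor never overtakes the read cursor, so no second buffer is needed.

~~~~~~~~~~cpp
#include <cstddef>

// Hypothetical helper for illustration only (not part of RapidJSON).
// `s` points at the first character after the opening quote of a JSON string.
// The string is decoded in place, a '\0' terminator is written after the
// decoded characters, and the decoded length is returned.
// Assumption: \uXXXX escapes and malformed input are not handled here.
inline std::size_t DecodeInSitu(char* s) {
    char* src = s;  // read cursor
    char* dst = s;  // write cursor, always <= src
    while (*src != '"' && *src != '\0') {
        if (*src == '\\') {
            ++src;                    // skip the backslash
            if (*src == '\0') break;  // dangling backslash: stop decoding
            switch (*src) {
                case 'n': *dst++ = '\n'; break;
                case 't': *dst++ = '\t'; break;
                case 'r': *dst++ = '\r'; break;
                case 'b': *dst++ = '\b'; break;
                case 'f': *dst++ = '\f'; break;
                default:  *dst++ = *src; break;  // covers \" \\ and \/
            }
            ++src;
        } else {
            *dst++ = *src++;          // ordinary character: copy down
        }
    }
    *dst = '\0';  // overwrites the closing quote or an earlier position
    return static_cast<std::size_t>(dst - s);
}
~~~~~~~~~~

Note that the source buffer is modified, which is why *in situ* parsing requires a writable buffer.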
...
...
@@ -204,17 +212,17 @@ There are some limitations of *in situ* parsing:
*In situ* parsing is mostly suitable for short-term JSON that only needs to be processed once and is then released from memory. In practice, this situation is very common, for example, deserializing JSON to C++ objects, processing web requests represented in JSON, etc.
### Transcoding and Validation
## Transcoding and Validation {#TranscodingAndValidation}
RapidJSON supports conversion between Unicode formats (officially termed UCS Transformation Format) internally. During DOM parsing, the source encoding of the stream can be different from the encoding of the DOM. For example, the source stream contains UTF-8 JSON, while the DOM is using UTF-16 encoding. There is example code in [EncodedInputStream](stream.md#encodedinputstream).
RapidJSON supports conversion between Unicode formats (officially termed UCS Transformation Format) internally. During DOM parsing, the source encoding of the stream can be different from the encoding of the DOM. For example, the source stream contains UTF-8 JSON, while the DOM is using UTF-16 encoding. There is example code in [EncodedInputStream](doc/stream.md#EncodedInputStream).
When writing JSON from a DOM to an output stream, transcoding can also be used. An example is in [EncodedOutputStream](stream.md#encodedoutputstream).
When writing JSON from a DOM to an output stream, transcoding can also be used. An example is in [EncodedOutputStream](stream.md#EncodedOutputStream).
During transcoding, the source string is decoded into Unicode code points, and then the code points are encoded in the target format. During decoding, the byte sequence in the source string is validated. If it is not a valid sequence, the parser stops with a `kParseErrorStringInvalidEncoding` error.
When the source encoding of the stream is the same as the encoding of the DOM, the parser will *not* validate the sequence by default. Users may use `kParseValidateEncodingFlag` to force validation.
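The decode-then-encode procedure described above can be sketched in standalone C++. This is an illustration under stated assumptions, not RapidJSON's own code: the functions `DecodeUtf8()`, `EncodeUtf16()` and `TranscodeUtf8ToUtf16()` are hypothetical names, and returning `false` stands in for the point where a parser would stop with an invalid-encoding error.

~~~~~~~~~~cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes one code point from UTF-8 starting at s[i]
// and advances i. Returns false if the byte sequence is not valid UTF-8.
// Precondition: i < s.size().
inline bool DecodeUtf8(const std::string& s, std::size_t& i, unsigned& cp) {
    unsigned char c = static_cast<unsigned char>(s[i++]);
    int extra;       // number of continuation bytes expected
    unsigned minCp;  // smallest code point allowed for this length
    if (c < 0x80)                { cp = c;        extra = 0; minCp = 0; }
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; minCp = 0x80; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; minCp = 0x800; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; minCp = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte
    for (; extra > 0; --extra) {
        if (i >= s.size()) return false;  // truncated sequence
        unsigned char t = static_cast<unsigned char>(s[i++]);
        if ((t & 0xC0) != 0x80) return false;  // not a continuation byte
        cp = (cp << 6) | (t & 0x3F);
    }
    if (cp < minCp) return false;                    // overlong encoding
    if (cp > 0x10FFFF) return false;                 // outside Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
    return true;
}

// Hypothetical helper: appends one code point as UTF-16 code units.
inline void EncodeUtf16(unsigned cp, std::vector<unsigned short>& out) {
    if (cp < 0x10000) {
        out.push_back(static_cast<unsigned short>(cp));
    } else {
        cp -= 0x10000;  // split into a surrogate pair
        out.push_back(static_cast<unsigned short>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)));
    }
}

// Transcodes a whole UTF-8 string; stops and returns false at the first
// invalid sequence.
inline bool TranscodeUtf8ToUtf16(const std::string& in,
                                 std::vector<unsigned short>& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned cp;
        if (!DecodeUtf8(in, i, cp)) return false;
        EncodeUtf16(cp, out);
    }
    return true;
}
~~~~~~~~~~

Validation falls out of the decoding step: a transcoder must decode every sequence anyway, whereas copying between identical encodings can skip that work unless validation is requested.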
## Techniques
# Techniques {#Techniques}
Some techniques for using the DOM API are discussed here.
...
...
@@ -236,9 +244,9 @@ User may create customer handlers for transforming the DOM into other formats. F
// TODO: example
~~~~~~~~~~
For more about SAX events and handlers, please refer to [SAX](sax.md).
For more about SAX events and handlers, please refer to [SAX](doc/sax.md).
### User Buffer
## User Buffer {#UserBuffer}
Some applications may try to avoid memory allocations whenever possible.
In RapidJSON, `rapidjson::Stream` is a concept for reading/writing JSON. Here we first show how to use the provided streams, and then how to create a custom stream.
## Memory Streams
[TOC]
# Memory Streams {#MemoryStreams}
Memory streams store JSON in memory.
### StringStream (Input)
## StringStream (Input) {#StringStream}
`StringStream` is the most basic input stream. It represents a complete, read-only JSON stored in memory. It is defined in `rapidjson/rapidjson.h`.
...
...
@@ -34,7 +36,7 @@ d.Parse(json);
Note that `StringStream` is a typedef of `GenericStringStream<UTF8<> >`. Users may use other encodings to represent the character set of the stream.
### StringBuffer (Output)
## StringBuffer (Output) {#StringBuffer}
`StringBuffer` is a simple output stream. It allocates a memory buffer for writing the whole JSON. Use `GetString()` to obtain the buffer.
...
...
@@ -59,13 +61,13 @@ By default, `StringBuffer` will instantiate an internal allocator.
Similarly, `StringBuffer` is a typedef of `GenericStringBuffer<UTF8<> >`.
## File Streams
# File Streams {#FileStreams}
When parsing a JSON from file, you may read the whole JSON into memory and use ``StringStream`` above.
However, if the JSON is big, or memory is limited, you can use `FileReadStream`. It only reads a part of the JSON from the file into a buffer, and then lets that part be parsed. If it runs out of characters in the buffer, it reads the next part from the file.
### FileReadStream (Input)
## FileReadStream (Input) {#FileReadStream}
`FileReadStream` reads the file via a `FILE` pointer. The user needs to provide a buffer.
...
...
@@ -90,7 +92,7 @@ Different from string streams, `FileReadStream` is byte stream. It does not hand
Apart from reading file, user can also use `FileReadStream` to read `stdin`.
### FileWriteStream (Output)
## FileWriteStream (Output) {#FileWriteStream}
`FileWriteStream` is a buffered output stream. Its usage is very similar to `FileReadStream`.
...
...
@@ -117,7 +119,7 @@ fclose(fp);
It can also direct the output to `stdout`.
## Encoded Streams
# Encoded Streams {#EncodedStreams}
Encoded streams do not contain JSON themselves, but they wrap byte streams to provide basic encoding/decoding functions.
...
...
@@ -129,7 +131,7 @@ If the encoding of stream is known in compile-time, you may use `EncodedInputStr
Note that these encoded streams can be applied to streams other than files. For example, you may have a file in memory, or a custom byte stream, wrapped in encoded streams.
### EncodedInputStream
## EncodedInputStream {#EncodedInputStream}
`EncodedInputStream` has two template parameters. The first one is an `Encoding` class, such as `UTF8` or `UTF16LE`, defined in `rapidjson/encodings.h`. The second one is the class of the stream to be wrapped.
...
...
@@ -154,7 +156,7 @@ d.ParseStream<0, UTF16LE<> >(eis); // Parses UTF-16LE file into UTF-8 in memory
fclose(fp);
~~~~~~~~~~
### EncodedOutputStream
## EncodedOutputStream {#EncodedOutputStream}
`EncodedOutputStream` is similar but it has a `bool putBOM` parameter in the constructor, controlling whether to write BOM into output byte stream.
...
...
@@ -180,7 +182,7 @@ d.Accept(writer); // This generates UTF32-LE file from UTF-8 in memory
fclose(fp);
~~~~~~~~~~
### AutoUTFInputStream
## AutoUTFInputStream {#AutoUTFInputStream}
Sometimes an application may want to handle all supported JSON encodings. `AutoUTFInputStream` will detect the encoding by BOM first. If a BOM is unavailable, it will use characteristics of valid JSON to make the detection. If neither method succeeds, it falls back to the UTF type provided in the constructor.
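The first step, detection by BOM, can be sketched in standalone C++. The enum and function names below are hypothetical and are not RapidJSON's own types; the sketch only shows the well-known byte order mark patterns and leaves out the second step (detection from the characteristics of valid JSON).

~~~~~~~~~~cpp
#include <cstddef>

// Hypothetical result type for this sketch (not RapidJSON's UTFType).
enum DetectedBom {
    kNoBom, kBomUtf8, kBomUtf16LE, kBomUtf16BE, kBomUtf32LE, kBomUtf32BE
};

// Inspects the first bytes of a buffer for a byte order mark.
inline DetectedBom DetectBom(const unsigned char* p, std::size_t n) {
    // UTF-32 must be tested before UTF-16: the UTF-32LE BOM (FF FE 00 00)
    // begins with the UTF-16LE BOM (FF FE).
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return kBomUtf32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return kBomUtf32BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return kBomUtf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return kBomUtf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return kBomUtf16BE;
    return kNoBom;  // caller falls back to other detection or a default
}
~~~~~~~~~~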
...
...
@@ -211,7 +213,7 @@ When specifying the encoding of stream, uses `AutoUTF<CharType>` as in `ParseStr
You can obtain the type of UTF via `UTFType GetType()`, and check whether a BOM was found via `HasBOM()`.
### AutoUTFOutputStream
## AutoUTFOutputStream {#AutoUTFOutputStream}
Similarly, to choose the encoding for output at runtime, we can use `AutoUTFOutputStream`. This class is not automatic *per se*. You need to specify the UTF type and whether to write a BOM at runtime.
`AutoUTFInputStream` and `AutoUTFOutputStream` are more convenient than `EncodedInputStream` and `EncodedOutputStream`. They just incur a little runtime overhead.
## Custom Stream
# Custom Stream {#CustomStream}
In addition to memory/file streams, users can create their own stream classes which fit RapidJSON's API. For example, you may create a network stream, a stream from a compressed file, etc.
...
...
@@ -273,7 +275,7 @@ For input stream, they must implement `Peek()`, `Take()` and `Tell()`.
For output stream, they must implement `Put()` and `Flush()`.
There are two special interfaces, `PutBegin()` and `PutEnd()`, which are only for *in situ* parsing. Normal streams do not implement them. However, even if the interface is not needed for a particular stream, the stream still needs a dummy implementation; otherwise a compilation error is generated.
The following example is a wrapper of `std::istream`, which only implements 2 functions.
...
...
@@ -367,6 +369,6 @@ d.Accept(writer);
Note that, this implementation may not be as efficient as RapidJSON's memory or file streams, due to internal overheads of the standard library.
## Summary
# Summary {#Summary}
This section describes the stream classes available in RapidJSON. Memory streams are simple. File streams can reduce the memory required during JSON parsing and generation, if the JSON is stored in a file system. Encoded streams convert between byte streams and character streams. Finally, users may create custom streams using a simple interface.
This tutorial introduces the basics of the Document Object Model(DOM) API.
As shown in [Usage at a glance](../readme.md#usage-at-a-glance), a JSON can be parsed into DOM, and then the DOM can be queried and modified easily, and finally be converted back to JSON.
As shown in [Usage at a glance](readme.md), a JSON can be parsed into DOM, and then the DOM can be queried and modified easily, and finally be converted back to JSON.
## Value & Document
[TOC]
# Value & Document {#ValueDocument}
Each JSON value is stored in a type called `Value`. A `Document`, representing the DOM, contains the root of `Value`. All public types and functions of RapidJSON are defined in the `rapidjson` namespace.
### Query Value
# Query Value {#QueryValue}
In this section, we will use an excerpt from [`example/tutorial/tutorial.cpp`](../example/tutorial/tutorial.cpp).
In this section, we will use an excerpt from `example/tutorial/tutorial.cpp`.
Assume we have a JSON stored in a C string (`const char* json`):
~~~~~~~~~~js
...
...
@@ -38,7 +40,7 @@ document.Parse(json);
The JSON is now parsed into `document` as a *DOM tree*:
![tutorial](diagram/tutorial.png)
![DOM in the tutorial](diagram/tutorial.png)
The root of a conforming JSON should be either an object or an array. In this case, the root is an object.
~~~~~~~~~~cpp
...
...
@@ -115,7 +117,7 @@ Note that, RapidJSON does not automatically convert values between JSON types. I
In the following, details about querying individual types are discussed.
### Query Array
## Query Array {#QueryArray}
By default, `SizeType` is a typedef of `unsigned`. In most systems, an array is limited to storing up to 2^32-1 elements.
...
...
@@ -133,7 +135,7 @@ And other familiar query functions:
* `SizeType Capacity() const`
* `bool Empty() const`
### Query Object
## Query Object {#QueryObject}
Similar to array, we can iterate object members by iterator:
...
...
@@ -169,7 +171,7 @@ if (itr != document.MemberEnd())
    printf("%s\n", itr->value.GetString());
~~~~~~~~~~
### Querying Number
## Querying Number {#QueryNumber}
JSON provides a single numerical type called Number. A Number can be an integer or a real number. RFC 4627 says the range of Number is specified by the parser.
...
...
@@ -200,7 +202,7 @@ Note that, an integer value may be obtained in various ways without conversion.
When obtaining the numeric values, `GetDouble()` will convert the internal integer representation to a `double`. Note that `int` and `uint` can be safely converted to `double`, but `int64_t` and `uint64_t` may lose precision (since the mantissa of `double` is only 52 bits).
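The precision loss can be observed with plain C++, independent of RapidJSON. In the sketch below, the helper name `RoundTripsThroughDouble()` is hypothetical. It relies on the fact that an IEEE 754 `double` stores 52 mantissa bits plus one implicit leading bit, so every integer with magnitude up to 2^53 is exactly representable, while larger integers may be rounded.

~~~~~~~~~~cpp
#include <stdint.h>

// Hypothetical helper: returns true if converting v to double and back
// yields the same integer, i.e. no precision was lost.
inline bool RoundTripsThroughDouble(int64_t v) {
    double d = static_cast<double>(v);
    // 2^63 is not representable as int64_t; converting it back would be
    // undefined behavior, so treat it as a failed round trip.
    if (d >= 9223372036854775808.0) return false;
    return static_cast<int64_t>(d) == v;
}
~~~~~~~~~~

For example, `RoundTripsThroughDouble(9007199254740992LL)` (2^53) holds, while 2^53 + 1 does not survive the conversion on platforms with IEEE 754 doubles.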
### Query String
## Query String {#QueryString}
In addition to `GetString()`, the `Value` class also contains `GetStringLength()`. Here is why.
which accepts the length of string as parameter. This constructor supports storing null character within the string, and should also provide better performance.
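The need for an explicit length can be shown with plain C++. A JSON string such as `"a\u0000b"` decodes to three characters with a null character in the middle, so functions that rely on a terminator see only the first one. The names `kDecoded`, `kDecodedLength` and `LengthSeenByStrlen()` below are hypothetical and exist only for this sketch.

~~~~~~~~~~cpp
#include <cstddef>
#include <cstring>

// The decoded form of the JSON string "a\u0000b": 'a', '\0', 'b',
// followed by the usual terminating '\0'.
static const char kDecoded[] = { 'a', '\0', 'b', '\0' };

// The real length, which must be stored separately from the characters.
static const std::size_t kDecodedLength = 3;

// strlen() stops at the first null character, so it reports 1 here.
inline std::size_t LengthSeenByStrlen() {
    return std::strlen(kDecoded);
}
~~~~~~~~~~

Carrying the length alongside the pointer also avoids a scan of the string each time the length is needed.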
## Create/Modify Values
# Create/Modify Values {#CreateModifyValues}
There are several ways to create values. After a DOM tree is created and/or modified, it can be saved as JSON again using `Writer`.
### Changing Value Type
## Change Value Type {#ChangeValueType}
When creating a Value or Document by default constructor, its type is Null. To change its type, call `SetXXX()` or assignment operator, for example:
~~~~~~~~~~cpp
...
...
@@ -258,7 +260,7 @@ Value o(kObjectType);
Value a(kArrayType);
~~~~~~~~~~
### Move Semantics
## Move Semantics {#MoveSemantics}
A very special decision in the design of RapidJSON is that assignment of a value does not copy the source value to the destination value. Instead, the value from the source is moved to the destination. For example,
...
...
@@ -268,7 +270,7 @@ Value b(456);
b = a; // a becomes a Null value, b becomes number 123.
~~~~~~~~~~
![move1](diagram/move1.png)
![Assignment with move semantics.](diagram/move1.png)
Why? What is the advantage of this semantics?
...
...
@@ -287,7 +289,7 @@ Value o(kObjectType);
}
~~~~~~~~~~
![move2](diagram/move2.png)
![Copy semantics makes a lot of copy operations.](diagram/move2.png)
The object `o` needs to allocate a buffer of the same size as `contacts`, make a deep clone of it, and then finally `contacts` is destructed. This will incur a lot of unnecessary allocations/deallocations and memory copying.
...
...
@@ -307,11 +309,11 @@ Value o(kObjectType);
}
~~~~~~~~~~
![move3](diagram/move3.png)
![Move semantics makes no copying.](diagram/move3.png)
This is called a move assignment operator in C++11. As RapidJSON supports C++03, it adopts move semantics using the assignment operator and all other modifying functions, such as `AddMember()` and `PushBack()`.
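How an ordinary assignment operator can carry move semantics in C++03 can be sketched with a toy class. `ToyValue` below is hypothetical and is not RapidJSON's `Value`; it only assumes that the operator takes its right-hand side by non-const reference, so that it is allowed to reset the source after taking over its content.

~~~~~~~~~~cpp
// Toy class for illustration only; it holds either Null or an int.
class ToyValue {
public:
    ToyValue() : data_(0), isNull_(true) {}
    explicit ToyValue(int v) : data_(v), isNull_(false) {}

    // Assignment with move semantics: the right-hand side is a non-const
    // reference, so its content is transferred and it is reset to Null.
    ToyValue& operator=(ToyValue& rhs) {
        if (this != &rhs) {
            data_ = rhs.data_;
            isNull_ = rhs.isNull_;
            rhs.data_ = 0;
            rhs.isNull_ = true;
        }
        return *this;
    }

    bool IsNull() const { return isNull_; }
    int GetInt() const { return data_; }

private:
    ToyValue(const ToyValue&);  // copying is disabled (C++03 idiom)

    int data_;
    bool isNull_;
};
~~~~~~~~~~

With this class, after `ToyValue a(123); ToyValue b(456); b = a;`, `a` is Null and `b` holds 123. For a value that owns a large tree, the same pattern transfers ownership of the tree instead of cloning it.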
### Create String
## Create String {#CreateString}
RapidJSON provides two strategies for storing strings.
1. copy-string: allocates a buffer, and then copies the source data into it.
If we really need to copy a DOM tree, we can use two APIs for deep copy: constructor with allocator, and `CopyFrom()`.
~~~~~~~~~~cpp
...
...
@@ -408,7 +410,7 @@ v1.SetObject().AddMember( "array", v2, a );
d.PushBack(v1,a);
~~~~~~~~~~
### Swap Values
## Swap Values {#SwapValues}
`Swap()` is also provided.
...
...
@@ -422,15 +424,15 @@ assert(b.IsInt());
Swapping two DOM trees is fast (constant time), regardless of the complexity of the trees.
## What's next
# What's next {#WhatsNext}
This tutorial shows the basics of DOM tree query and manipulation. There are several important concepts in RapidJSON:
1. [Streams](stream.md) are channels for reading/writing JSON, which can be an in-memory string, a file stream, etc. Users can also create their own streams.
2. [Encoding](encoding.md) defines which character set is used in streams and memory. RapidJSON also provides Unicode conversion/validation internally.
3. [DOM](dom.md)'s basics are already covered in this tutorial. Uncover more advanced features such as *in situ* parsing, other parsing options and advanced usages.
4. [SAX](sax.md) is the foundation of the parsing/generating facility in RapidJSON. Learn how to use `Reader`/`Writer` to implement even faster applications. Also try `PrettyWriter` to format the JSON.
5. [Performance](performance.md) shows some in-house and third-party benchmarks.
6. [Internals](internals.md) describes some internal designs and techniques of RapidJSON.
1. [Streams](doc/stream.md) are channels for reading/writing JSON, which can be an in-memory string, a file stream, etc. Users can also create their own streams.
2. [Encoding](doc/encoding.md) defines which character set is used in streams and memory. RapidJSON also provides Unicode conversion/validation internally.
3. [DOM](doc/dom.md)'s basics are already covered in this tutorial. Uncover more advanced features such as *in situ* parsing, other parsing options and advanced usages.
4. [SAX](doc/sax.md) is the foundation of the parsing/generating facility in RapidJSON. Learn how to use `Reader`/`Writer` to implement even faster applications. Also try `PrettyWriter` to format the JSON.
5. [Performance](doc/performance.md) shows some in-house and third-party benchmarks.
6. [Internals](doc/internals.md) describes some internal designs and techniques of RapidJSON.
You may also refer to the [FAQ](faq.md), API documentation, examples and unit tests.