# FAQ

[TOC]

## General

1. What is RapidJSON?

   RapidJSON is a C++ library for parsing and generating JSON. See the full list of its [features](doc/features.md).

2. Why is it named RapidJSON?

   It is inspired by [RapidXML](http://rapidxml.sourceforge.net/), which is a fast XML DOM parser.

3. Is RapidJSON similar to RapidXML?

   RapidJSON borrowed some designs from RapidXML, including *in situ* parsing and being a header-only library. But the two APIs are completely different. Also, RapidJSON provides many features that are not in RapidXML.

4. Is RapidJSON free?

   Yes, it is free under the MIT license. It can be used in commercial applications. Please check the details in [license.txt](https://github.com/Tencent/rapidjson/blob/master/license.txt).

5. Is RapidJSON small? What are its dependencies? 

   Yes. A simple executable which parses a JSON and prints its statistics is less than 30KB on Windows.

   RapidJSON depends only on the C++ standard library.

6. How do I install RapidJSON?

   Check [Installation section](https://miloyip.github.io/rapidjson/).

7. Can RapidJSON run on my platform?

   RapidJSON has been tested by the community on many combinations of operating systems, compilers and CPU architectures. However, we cannot guarantee that it runs on your particular platform. Building and running the unit test suite will give you the answer.

8. Does RapidJSON support C++03? C++11?

   RapidJSON was originally implemented for C++03. Later it added optional support for some C++11 features (e.g., move constructor, `noexcept`). RapidJSON should be compatible with both C++03- and C++11-compliant compilers.

9. Does RapidJSON really work in real applications?

   Yes. It is deployed in real client and server applications. A community member reported that RapidJSON parses 50 million JSON documents daily in their system.

10. How is RapidJSON tested?

   RapidJSON contains a unit test suite for automatic testing. [Travis](https://travis-ci.org/Tencent/rapidjson/) (for Linux) and [AppVeyor](https://ci.appveyor.com/project/Tencent/rapidjson/) (for Windows) compile and run the unit test suite for every modification. On Linux, the test process also uses Valgrind to detect memory leaks.

11. Is RapidJSON well documented?

   RapidJSON provides a user guide and API documentation.

12. Are there alternatives?

   Yes, there are a lot of alternatives. For example, [nativejson-benchmark](https://github.com/miloyip/nativejson-benchmark) has a listing of open-source C/C++ JSON libraries. [json.org](http://www.json.org/) also has a list.

## JSON

1. What is JSON?

   JSON (JavaScript Object Notation) is a lightweight data-interchange format. It uses a human-readable text format. For more details, refer to [RFC7159](http://www.ietf.org/rfc/rfc7159.txt) and [ECMA-404](http://www.ecma-international.org/publications/standards/Ecma-404.htm).

2. What are applications of JSON?

   JSON is commonly used in web applications for transferring structured data. It is also used as a file format for data persistence.

3. Does RapidJSON conform to the JSON standard?

   Yes. RapidJSON is fully compliant with [RFC7159](http://www.ietf.org/rfc/rfc7159.txt) and [ECMA-404](http://www.ecma-international.org/publications/standards/Ecma-404.htm). It can handle corner cases, such as null characters and surrogate pairs in JSON strings.

4. Does RapidJSON support relaxed syntax?

   Currently no. RapidJSON only supports the strict standardized format. Support for relaxed syntax is under discussion in this [issue](https://github.com/Tencent/rapidjson/issues/36).

## DOM and SAX

1. What is DOM style API?

   Document Object Model (DOM) is an in-memory representation of JSON for query and manipulation.

2. What is SAX style API?

   SAX is an event-driven API for parsing and generation.

3. Should I choose DOM or SAX?

   DOM is easy to query and manipulate. SAX is very fast and memory-saving, but often more difficult to use.
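
   The trade-off can be seen in a minimal SAX sketch. It assumes RapidJSON's `Reader`, `StringStream` and `BaseReaderHandler` from `rapidjson/reader.h`; the names `SumHandler` and `SumNumbers()` are hypothetical. No DOM tree is built: the handler receives one event per value and keeps only a running sum.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstdint>
   #include "rapidjson/reader.h"

   // Hypothetical SAX handler that sums every number in a document.
   // BaseReaderHandler routes all events not overridden here to Default(),
   // which returns true (continue parsing).
   struct SumHandler : rapidjson::BaseReaderHandler<rapidjson::UTF8<>, SumHandler> {
       double sum;
       SumHandler() : sum(0) {}
       // Non-negative integers arrive via Uint()/Uint64(), negative ones via Int()/Int64().
       bool Int(int i)         { sum += i; return true; }
       bool Uint(unsigned u)   { sum += u; return true; }
       bool Int64(int64_t i)   { sum += static_cast<double>(i); return true; }
       bool Uint64(uint64_t u) { sum += static_cast<double>(u); return true; }
       bool Double(double d)   { sum += d; return true; }
   };

   // Hypothetical helper: returns false on a parse error, otherwise stores the sum in *out.
   bool SumNumbers(const char* json, double* out) {
       rapidjson::Reader reader;
       rapidjson::StringStream ss(json);
       SumHandler handler;
       if (reader.Parse(ss, handler).IsError())
           return false;
       *out = handler.sum;
       return true;
   }
   ~~~~~~~~~~

   With the DOM API the same task would first build the whole tree in memory and then walk it; here memory use does not grow with the number of values.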

4. What is *in situ* parsing?

   *In situ* parsing decodes the JSON strings directly into the input JSON buffer. This optimization can reduce memory consumption and improve performance, but the input JSON will be modified. Check [in-situ parsing](doc/dom.md) for details.
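
   A minimal sketch of the effect, assuming `Document::ParseInsitu()` from `rapidjson/document.h`; the helper name `StringPointsIntoBuffer()` is hypothetical. After parsing, the decoded string is read straight from the (now modified) input buffer rather than from a separate copy, which is why the buffer must be writable and must outlive the document.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstring>
   #include "rapidjson/document.h"

   // Hypothetical helper: parses a writable buffer in situ and reports whether
   // the decoded string value lives inside that buffer.
   bool StringPointsIntoBuffer() {
       char buffer[] = "{\"greeting\":\"hello\"}";  // writable, outlives `d`
       rapidjson::Document d;
       d.ParseInsitu(buffer);                        // modifies `buffer`
       if (d.HasParseError() || !d["greeting"].IsString())
           return false;
       const char* s = d["greeting"].GetString();
       return s >= buffer && s < buffer + sizeof(buffer) && std::strcmp(s, "hello") == 0;
   }
   ~~~~~~~~~~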

5. When does parsing generate an error?

   The parser generates an error when the input JSON contains invalid syntax, when a value cannot be represented (e.g., a number is too big), or when the parser's handler terminates the parsing. Check [parse error](doc/dom.md) for details.

6. What error information is provided? 

   The error is stored in `ParseResult`, which includes the error code and the offset (number of characters from the beginning of the JSON). The error code can be translated into a human-readable error message.
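
   A minimal sketch of reading that information, assuming the English message table `GetParseError_En()` in `rapidjson/error/en.h` and the implicit conversion from `Document` to `ParseResult`; `DescribeParseError()` is a hypothetical helper.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <string>
   #include "rapidjson/document.h"
   #include "rapidjson/error/en.h"

   // Hypothetical helper: returns an empty string if `json` parses,
   // otherwise "<English message> (offset N)".
   std::string DescribeParseError(const char* json) {
       rapidjson::Document d;
       rapidjson::ParseResult result = d.Parse(json);  // Document converts to ParseResult
       if (!result.IsError())
           return std::string();
       return std::string(rapidjson::GetParseError_En(result.Code())) +
              " (offset " + std::to_string(result.Offset()) + ")";
   }
   ~~~~~~~~~~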

7. Why not just use `double` to represent JSON numbers?

   Some applications use 64-bit unsigned/signed integers, which cannot be converted into `double` without loss of precision. So the parser detects whether a JSON number is convertible to different types of integers and/or `double`.
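
   The precision problem can be checked directly. This sketch assumes `IsUint64()` and `GetUint64()` from `rapidjson/document.h`; both helper names are hypothetical. 2^53 + 1 = 9007199254740993 is kept exactly as a 64-bit integer by the parser, but it does not survive a round trip through `double`.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstdint>
   #include "rapidjson/document.h"

   // Hypothetical helper: true if `json` (a single number) is stored as a
   // uint64_t equal to `expected`, i.e. without a detour through double.
   bool ParsedExactlyAsUint64(const char* json, uint64_t expected) {
       rapidjson::Document d;
       d.Parse(json);
       return !d.HasParseError() && d.IsUint64() && d.GetUint64() == expected;
   }

   // Hypothetical helper: true if `v` is unchanged after converting to double and back.
   // Every integer up to 2^53 is exactly representable; larger ones may not be.
   bool SurvivesDoubleRoundTrip(uint64_t v) {
       return static_cast<uint64_t>(static_cast<double>(v)) == v;
   }
   ~~~~~~~~~~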

8. How to clear-and-minimize a document or value?

   Call one of the `SetXXX()` methods; they call the destructor, which deallocates the DOM data:

   ~~~~~~~~~~cpp
   Document d;
   ...
   d.SetObject();  // clear and minimize
   ~~~~~~~~~~

   Alternatively, use the equivalent of the [C++ swap with temporary idiom](https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Clear-and-minimize):
   ~~~~~~~~~~cpp
   Value(kObjectType).Swap(d);
   ~~~~~~~~~~
   or equivalent, but slightly longer to type:
   ~~~~~~~~~~cpp
   d.Swap(Value(kObjectType).Move()); 
   ~~~~~~~~~~

9. How to insert a document node into another document?

   Let's take the following two DOM trees represented as JSON documents:
   ~~~~~~~~~~cpp
   Document person;
   person.Parse("{\"person\":{\"name\":{\"first\":\"Adam\",\"last\":\"Thomas\"}}}");
   
   Document address;
   address.Parse("{\"address\":{\"city\":\"Moscow\",\"street\":\"Quiet\"}}");
   ~~~~~~~~~~
   Let's assume we want to merge them in such a way that the whole `address` document becomes a node of `person`:
   ~~~~~~~~~~js
   { "person": {
      "name": { "first": "Adam", "last": "Thomas" },
      "address": { "city": "Moscow", "street": "Quiet" }
      }
   }
   ~~~~~~~~~~

   The most important requirement is to take care of the document and value life cycles, and to manage memory consistently by using the right allocator during the value transfer.
   
   The simplest and most efficient way to achieve that is to modify the `address` definition above to initialize it with the allocator of the `person` document; then we just add the root member of the value:
   ~~~~~~~~~~cpp
   Document address(&person.GetAllocator());  // the constructor takes an allocator pointer
   ...
   person["person"].AddMember("address", address["address"], person.GetAllocator());
   ~~~~~~~~~~
   Alternatively, if we don't want to explicitly refer to the root value of `address` by name, we can refer to it via an iterator:
   ~~~~~~~~~~cpp
   auto addressRoot = address.MemberBegin();
   person["person"].AddMember(addressRoot->name, addressRoot->value, person.GetAllocator());
   ~~~~~~~~~~
   
   The second way is to deep-copy the value from the `address` document:
   ~~~~~~~~~~cpp
   Value addressValue(address["address"], person.GetAllocator());
   person["person"].AddMember("address", addressValue, person.GetAllocator());
   ~~~~~~~~~~
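
   A self-contained sketch of the deep-copy approach, assuming `Writer` and `StringBuffer` for serialization; the helper name `MergeByDeepCopy()` is hypothetical. The `address` document lives in an inner scope and is destroyed before `person` is serialized, which shows that the copied subtree depends only on the allocator of `person`.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <string>
   #include "rapidjson/document.h"
   #include "rapidjson/stringbuffer.h"
   #include "rapidjson/writer.h"

   // Hypothetical helper: merges the address tree into person["person"]
   // by deep copy and returns the serialized result.
   std::string MergeByDeepCopy() {
       using namespace rapidjson;
       Document person;
       person.Parse("{\"person\":{\"name\":{\"first\":\"Adam\",\"last\":\"Thomas\"}}}");
       {
           Document address;  // uses its own allocator
           address.Parse("{\"address\":{\"city\":\"Moscow\",\"street\":\"Quiet\"}}");
           // Deep copy into memory owned by person's allocator.
           Value addressValue(address["address"], person.GetAllocator());
           // AddMember() moves addressValue into the tree (it becomes null).
           person["person"].AddMember("address", addressValue, person.GetAllocator());
       }  // `address` and its allocator are destroyed here

       StringBuffer buffer;
       Writer<StringBuffer> writer(buffer);
       person.Accept(writer);
       return buffer.GetString();
   }
   ~~~~~~~~~~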

## Document/Value (DOM)

1. What is move semantics? Why?

   `Value` uses move semantics instead of copy semantics. This means that when a source value is assigned to a target value, ownership of the source value is moved to the target value.

   Since moving is faster than copying, this design decision forces users to be aware of the copying overhead.
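
   A minimal sketch of what the move looks like in practice, assuming `Value::PushBack()` and the allocator of a `Document`; `AssignmentMoves()` is a hypothetical helper. After `target = source;` the array belongs to `target` and `source` is left as null.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include "rapidjson/document.h"

   // Hypothetical helper: shows that assignment transfers ownership
   // instead of copying.
   bool AssignmentMoves() {
       using namespace rapidjson;
       Document d;
       Document::AllocatorType& allocator = d.GetAllocator();

       Value source(kArrayType);
       source.PushBack(1, allocator).PushBack(2, allocator);

       Value target;
       target = source;  // move, not copy: source becomes null

       return source.IsNull() && target.IsArray() && target.Size() == 2;
   }
   ~~~~~~~~~~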

2. How to copy a value?

   There are two APIs: constructor with allocator, and `CopyFrom()`. See [Deep Copy Value](doc/tutorial.md) for an example.
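
   A short sketch of both APIs side by side, assuming the allocator-taking constructor and `CopyFrom(rhs, allocator)` of `Value`; `CopiesAreIndependent()` is a hypothetical helper. Unlike assignment, both leave the source untouched, and each copy can be modified independently.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include "rapidjson/document.h"

   // Hypothetical helper: deep-copies d["a"] in two ways and checks that the
   // copies are independent of the original.
   bool CopiesAreIndependent() {
       using namespace rapidjson;
       Document d;
       d.Parse("{\"a\":[1,2,3]}");
       Document::AllocatorType& allocator = d.GetAllocator();

       Value viaConstructor(d["a"], allocator);  // constructor with allocator
       Value viaCopyFrom;
       viaCopyFrom.CopyFrom(d["a"], allocator);  // CopyFrom()

       viaConstructor.PushBack(4, allocator);    // changes only this copy
       return d["a"].Size() == 3 && viaConstructor.Size() == 4 && viaCopyFrom.Size() == 3;
   }
   ~~~~~~~~~~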

3. Why do I need to provide the length of a string?

   Since a C string is null-terminated, its length must be computed via `strlen()`, which has linear runtime complexity. This incurs unnecessary overhead in many operations if the user already knows the length of the string.

   Also, RapidJSON can handle `\u0000` (null character) within a string. If a string contains null characters, `strlen()` cannot return its true length. In such cases the user must provide the length of the string explicitly.
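
   A minimal sketch of supplying the length explicitly, assuming `Value::SetString(const Ch*, SizeType, Allocator&)`; `KeepsEmbeddedNull()` is a hypothetical helper. The input has three characters with a null in the middle, so `strlen()` would report only 1.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include "rapidjson/document.h"

   // Hypothetical helper: stores a 3-character string whose middle
   // character is '\0'.
   bool KeepsEmbeddedNull() {
       rapidjson::Document d;
       const char s[] = { 'a', '\0', 'b' };
       rapidjson::Value v;
       v.SetString(s, 3, d.GetAllocator());  // explicit length; strlen(s) would give 1
       return v.GetStringLength() == 3 && v.GetString()[1] == '\0' && v.GetString()[2] == 'b';
   }
   ~~~~~~~~~~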

4. Why do I need to provide an allocator parameter in many DOM manipulation APIs?

   These APIs are member functions of `Value`, and we do not want to store an allocator pointer in every `Value` (that would make every `Value` larger). So the allocator is passed as a parameter instead.

5. Does it convert between numerical types?

   When using `GetInt()`, `GetUint()`, ... conversion may occur. For integer-to-integer conversion, it only converts when it is safe (otherwise it asserts). However, when converting a 64-bit signed/unsigned integer to `double`, it will convert, but be aware that it may lose precision. A number with a fraction, or an integer larger than 64 bits, can only be obtained by `GetDouble()`.
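
   To avoid the assertion, check the type predicates before calling a getter. A minimal sketch, assuming the `IsXXX()`/`GetXXX()` accessors of `Value`; `NumberAccessorsBehave()` is a hypothetical helper. 4294967296 (2^32) fits neither `int` nor `unsigned`, so only the 64-bit getters (or `GetDouble()`) may be used on it.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include "rapidjson/document.h"

   // Hypothetical helper: checks which accessors are safe for three numbers.
   bool NumberAccessorsBehave() {
       using namespace rapidjson;
       Document d;
       d.Parse("{\"smallNum\":42,\"bigNum\":4294967296,\"fracNum\":3.5}");
       const Value& smallNum = d["smallNum"];
       const Value& bigNum = d["bigNum"];    // 2^32: too large for int and unsigned
       const Value& fracNum = d["fracNum"];
       return smallNum.IsInt() && smallNum.GetInt() == 42 && smallNum.GetDouble() == 42.0 &&
              !bigNum.IsInt() && !bigNum.IsUint() && bigNum.IsInt64() &&
              bigNum.GetInt64() == 4294967296LL &&
              fracNum.IsDouble() && !fracNum.IsInt() && fracNum.GetDouble() == 3.5;
   }
   ~~~~~~~~~~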

## Reader/Writer (SAX)

1. Why don't we just `printf` a JSON? Why do we need a `Writer`? 

   Most importantly, `Writer` ensures that the output JSON is well-formed. Calling SAX events incorrectly (e.g. `StartObject()` paired with `EndArray()`) will assert. Besides, `Writer` escapes strings (e.g., `\n`). Finally, the numeric output of `printf()` may not be a valid JSON number, especially in some locales with different digit separators. And the number-to-string conversion in `Writer` is implemented with very fast algorithms, which outperform `printf()` and `iostream`.
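
   A minimal sketch of producing JSON through `Writer` events, assuming `Key()`, `String()`, `Int()` and `IsComplete()` of `Writer` and a `StringBuffer` as output; `WriteMessage()` is a hypothetical helper. The newline in the string is escaped automatically, which a plain `printf()` would not do.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <string>
   #include "rapidjson/stringbuffer.h"
   #include "rapidjson/writer.h"

   // Hypothetical helper: emits an object through SAX events.
   std::string WriteMessage() {
       using namespace rapidjson;
       StringBuffer sb;
       Writer<StringBuffer> writer(sb);
       writer.StartObject();
       writer.Key("msg");
       writer.String("line1\nline2");  // the newline is written as \n
       writer.Key("n");
       writer.Int(42);
       writer.EndObject();
       assert(writer.IsComplete());    // a complete JSON value has been written
       return sb.GetString();
   }
   ~~~~~~~~~~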

2. Can I pause the parsing process and resume it later?

   This is not directly supported in the current version due to performance considerations. However, if the execution environment supports multi-threading, the user can parse a JSON in a separate thread and pause it by blocking in the input stream.

## Unicode

1. Does it support UTF-8, UTF-16 and other formats?

   Yes. It fully supports UTF-8, UTF-16 (LE/BE), UTF-32 (LE/BE) and ASCII.

2. Can it validate the encoding?

   Yes, just pass `kParseValidateEncodingFlag` to `Parse()` as a template argument. If there is invalid encoding in the stream, it will generate a `kParseErrorStringInvalidEncoding` error.
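
   A minimal sketch comparing parsing with and without validation, assuming `Parse<kParseValidateEncodingFlag>()` and `GetParseError()`; `ParseCode()` is a hypothetical helper. The byte 0xC0 never appears in valid UTF-8: without the flag it is copied through silently, with the flag it is rejected.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include "rapidjson/document.h"

   // Hypothetical helper: parses `json` with or without encoding validation
   // and returns the error code (kParseErrorNone on success).
   rapidjson::ParseErrorCode ParseCode(const char* json, bool validate) {
       rapidjson::Document d;
       if (validate)
           d.Parse<rapidjson::kParseValidateEncodingFlag>(json);
       else
           d.Parse(json);
       return d.GetParseError();
   }
   ~~~~~~~~~~

   For example, with `const char bad[] = "[\"\xC0\"]";`, `ParseCode(bad, true)` yields `kParseErrorStringInvalidEncoding` while `ParseCode(bad, false)` succeeds.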

3. What is a surrogate pair? Does RapidJSON support it?

   JSON uses UTF-16 encoding when escaping Unicode characters, e.g. `\u5927` represents the Chinese character "big" (大). To handle characters outside the basic multilingual plane (BMP), UTF-16 encodes them with two 16-bit values, called a UTF-16 surrogate pair. For example, the Emoji character U+1F602 can be encoded as `\uD83D\uDE02` in JSON.

   RapidJSON fully supports parsing and generating UTF-16 surrogate pairs.
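
   A minimal sketch of the parsing side, assuming the default UTF-8 `Document`; `DecodesSurrogatePair()` is a hypothetical helper. The escaped pair `\uD83D\uDE02` should be decoded into the single code point U+1F602, whose UTF-8 form is the four bytes F0 9F 98 82.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstring>
   #include "rapidjson/document.h"

   // Hypothetical helper: parses the JSON string "\uD83D\uDE02" and checks
   // that it became the UTF-8 encoding of U+1F602.
   bool DecodesSurrogatePair() {
       rapidjson::Document d;
       d.Parse("\"\\uD83D\\uDE02\"");
       const unsigned char expected[] = { 0xF0, 0x9F, 0x98, 0x82 };
       return !d.HasParseError() && d.IsString() && d.GetStringLength() == 4 &&
              std::memcmp(d.GetString(), expected, 4) == 0;
   }
   ~~~~~~~~~~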

4. Can it handle `\u0000` (null character) in JSON string?

   Yes. RapidJSON fully supports null characters in JSON strings. However, users need to be aware of this and use `GetStringLength()` and related APIs to obtain the true length of a string.
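
   A minimal sketch of the pitfall, assuming `GetStringLength()` of `Value`; `NullCharacterPreserved()` is a hypothetical helper. The JSON string `"a\u0000b"` decodes to three characters, but `strlen()` stops at the embedded null.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstring>
   #include "rapidjson/document.h"

   // Hypothetical helper: compares GetStringLength() with strlen() for a
   // string containing \u0000.
   bool NullCharacterPreserved() {
       rapidjson::Document d;
       d.Parse("\"a\\u0000b\"");
       return !d.HasParseError() && d.IsString() &&
              d.GetStringLength() == 3 && std::strlen(d.GetString()) == 1;
   }
   ~~~~~~~~~~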

5. Can I output `\uxxxx` for all non-ASCII characters?

   Yes. Using `ASCII<>` as the output encoding template parameter of `Writer` enforces escaping of those characters.
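
   A minimal sketch, assuming `Writer<OutputStream, SourceEncoding, TargetEncoding>` with UTF-8 input and `ASCII<>` as the target; `WriteAsciiOnly()` is a hypothetical helper. Every non-ASCII character comes out as a `\uXXXX` escape, and a character outside the BMP comes out as a surrogate pair.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <string>
   #include "rapidjson/stringbuffer.h"
   #include "rapidjson/writer.h"

   // Hypothetical helper: writes a UTF-8 string as a JSON string that
   // contains only ASCII characters.
   std::string WriteAsciiOnly(const char* utf8) {
       using namespace rapidjson;
       StringBuffer sb;
       Writer<StringBuffer, UTF8<>, ASCII<> > writer(sb);  // target encoding: ASCII
       writer.String(utf8);
       return sb.GetString();
   }
   ~~~~~~~~~~

   For example, the UTF-8 bytes `"\xE5\xA4\xA7"` (大, U+5927) are written as `"\u5927"`, and `"\xF0\x9F\x98\x82"` (U+1F602) as `"\uD83D\uDE02"`.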

## Stream

1. I have a big JSON file. Should I load the whole file into memory?

   Users can use `FileReadStream` to read the file chunk by chunk. But for *in situ* parsing, the whole file must be loaded.
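
   A minimal sketch, assuming `FileReadStream(FILE*, char* buffer, size_t bufferSize)` and `Document::ParseStream()`; `ParseFileChunked()` is a hypothetical helper and the 64 KB buffer size is an arbitrary choice. Only the raw text is read in chunks; the resulting DOM is still built in memory.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstdio>
   #include "rapidjson/document.h"
   #include "rapidjson/filereadstream.h"

   // Hypothetical helper: parses a JSON file through a fixed-size buffer.
   // Returns false if the file cannot be opened or the JSON is invalid.
   bool ParseFileChunked(const char* path, rapidjson::Document& d) {
       std::FILE* fp = std::fopen(path, "rb");
       if (!fp)
           return false;
       char buffer[65536];  // refilled by the stream as parsing proceeds
       rapidjson::FileReadStream is(fp, buffer, sizeof(buffer));
       d.ParseStream(is);
       std::fclose(fp);
       return !d.HasParseError();
   }
   ~~~~~~~~~~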

2. Can I parse JSON while it is streamed from network?

   Yes. Users can implement a custom stream for this. Please refer to the implementation of `FileReadStream`.
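
   A sketch of such a stream, under the assumption that reading only needs `Peek()`, `Take()` and `Tell()` from RapidJSON's stream concept (the write-side members are declared but unsupported). `ChunkedInputStream` is hypothetical: it serves text that arrives in separate pieces, standing in for network packets. A real network stream would block where this one simply moves to the next chunk.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstddef>
   #include <string>
   #include <vector>
   #include "rapidjson/document.h"

   // Hypothetical read-only stream over JSON text split into chunks.
   class ChunkedInputStream {
   public:
       typedef char Ch;

       explicit ChunkedInputStream(const std::vector<std::string>& chunks)
           : chunks_(chunks), chunk_(0), pos_(0), count_(0) { SkipFinished(); }

       // Current character, or '\0' at the end of the input.
       Ch Peek() const { return chunk_ < chunks_.size() ? chunks_[chunk_][pos_] : '\0'; }

       Ch Take() {
           Ch c = Peek();
           if (chunk_ < chunks_.size()) { ++pos_; ++count_; SkipFinished(); }
           return c;
       }

       size_t Tell() const { return count_; }  // characters consumed so far

       // Write-side members of the concept; not supported by this stream.
       Ch* PutBegin() { assert(false); return 0; }
       void Put(Ch) { assert(false); }
       void Flush() { assert(false); }
       size_t PutEnd(Ch*) { assert(false); return 0; }

   private:
       // Moves past exhausted (or empty) chunks. A network stream would
       // instead wait here for the next packet.
       void SkipFinished() {
           while (chunk_ < chunks_.size() && pos_ >= chunks_[chunk_].size()) {
               ++chunk_;
               pos_ = 0;
           }
       }

       std::vector<std::string> chunks_;
       size_t chunk_, pos_, count_;
   };
   ~~~~~~~~~~

   Usage: build a `ChunkedInputStream` from pieces such as `{"a":`, `[1,2` and `]}`, then call `d.ParseStream(in)` on a `Document d`.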

3. I don't know what encoding the JSON will be in. How do I handle it?

   You may use `AutoUTFInputStream`, which detects the encoding of the input stream automatically. However, it incurs some performance overhead.
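
   A minimal sketch, assuming `MemoryStream` from `rapidjson/memorystream.h` as the byte source, `AutoUTFInputStream<unsigned, MemoryStream>` from `rapidjson/encodedstream.h`, and `ParseStream<0, AutoUTF<unsigned> >()` to transcode into the UTF-8 DOM; `ParseUnknownEncoding()` is a hypothetical helper. The stream inspects the BOM (or, without one, the byte pattern) before parsing starts.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <cstddef>
   #include "rapidjson/document.h"
   #include "rapidjson/encodedstream.h"
   #include "rapidjson/memorystream.h"

   // Hypothetical helper: parses raw bytes whose UTF encoding is unknown.
   bool ParseUnknownEncoding(const char* bytes, size_t size, rapidjson::Document& d) {
       using namespace rapidjson;
       MemoryStream ms(bytes, size);
       AutoUTFInputStream<unsigned, MemoryStream> is(ms);  // detects UTF type from BOM/bytes
       d.ParseStream<0, AutoUTF<unsigned> >(is);           // DOM stores UTF-8
       return !d.HasParseError();
   }
   ~~~~~~~~~~

   For example, the eight bytes `FF FE 5B 00 31 00 5D 00` (`[1]` in UTF-16LE with a BOM) parse into an array holding the number 1.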

4. What is a BOM? How does RapidJSON handle it?

   A [byte order mark (BOM)](http://en.wikipedia.org/wiki/Byte_order_mark) sometimes resides at the beginning of a file/stream to indicate its UTF encoding type.

   RapidJSON's `EncodedInputStream` can detect/consume a BOM. `EncodedOutputStream` can optionally write a BOM. See [Encoded Streams](doc/stream.md) for an example.
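
   A minimal sketch of the output side, assuming `EncodedOutputStream<UTF16LE<>, OutputByteStream>` whose second constructor argument requests a BOM, and a `StringBuffer` standing in for a file stream as the byte sink; `WriteUtf16LeWithBom()` is a hypothetical helper.

   ~~~~~~~~~~cpp
   #include <cassert>
   #include <string>
   #include "rapidjson/encodedstream.h"
   #include "rapidjson/stringbuffer.h"
   #include "rapidjson/writer.h"

   // Hypothetical helper: writes the JSON text [1] as UTF-16LE bytes preceded by a BOM.
   std::string WriteUtf16LeWithBom() {
       using namespace rapidjson;
       StringBuffer bytes;
       typedef EncodedOutputStream<UTF16LE<>, StringBuffer> OutputStream;
       OutputStream os(bytes, true);  // true: write a BOM first
       Writer<OutputStream, UTF8<>, UTF16LE<> > writer(os);
       writer.StartArray();
       writer.Int(1);
       writer.EndArray();
       // The output contains zero bytes, so use GetSize() rather than strlen().
       return std::string(bytes.GetString(), bytes.GetSize());
   }
   ~~~~~~~~~~

   The result should be the eight bytes `FF FE 5B 00 31 00 5D 00`: the BOM followed by `[1]` in little-endian UTF-16.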

5. Why is little/big endianness relevant?

   The little/big endianness of a stream is an issue for UTF-16 and UTF-32 streams, but not for UTF-8 streams, because UTF-8 is defined as a sequence of single bytes.

## Performance

1. Is RapidJSON really fast?

   Yes. It may be the fastest open source JSON library. There is a [benchmark](https://github.com/miloyip/nativejson-benchmark) for evaluating performance of C/C++ JSON libraries.

2. Why is it fast?

   Many design decisions of RapidJSON are aimed at time/space performance. These may reduce the user-friendliness of the APIs. Besides, it also employs low-level optimizations (intrinsics, SIMD) and special algorithms (custom double-to-string and string-to-double conversions).

3. What is SIMD? How is it applied in RapidJSON?

   [SIMD](http://en.wikipedia.org/wiki/SIMD) instructions can perform parallel computation in modern CPUs. RapidJSON supports Intel's SSE2/SSE4.2 and ARM's Neon to accelerate skipping of whitespace (space, tab, carriage return and line feed). This improves the performance of parsing indent-formatted JSON. Define the `RAPIDJSON_SSE2`, `RAPIDJSON_SSE42` or `RAPIDJSON_NEON` macro to enable this feature. However, running the executable on a machine without the corresponding instruction set will make it crash.
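
   One way to enable the feature is sketched below, assuming a GCC- or Clang-style compiler that predefines `__SSE4_2__`, `__SSE2__` or `__ARM_NEON` when the compilation target has those instruction sets (MSVC signals this differently). Tying the RapidJSON macro to the compiler's target flags means SIMD code is only built when the target was declared to support it. The macro must be defined before the first RapidJSON header is included; `ParsesIndentedJson()` is a hypothetical helper.

   ~~~~~~~~~~cpp
   // Enable at most one SIMD path, based on what the compiler targets.
   #if defined(__SSE4_2__)
   #  define RAPIDJSON_SSE42
   #elif defined(__SSE2__)
   #  define RAPIDJSON_SSE2
   #elif defined(__ARM_NEON)
   #  define RAPIDJSON_NEON
   #endif

   #include <cassert>
   #include "rapidjson/document.h"

   // Hypothetical helper: parses indented JSON, the case that benefits most
   // from SIMD whitespace skipping.
   bool ParsesIndentedJson() {
       rapidjson::Document d;
       d.Parse("{\n    \"a\": [\n        1,\n        2\n    ]\n}\n");
       return !d.HasParseError() && d["a"].Size() == 2;
   }
   ~~~~~~~~~~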

4. Does it consume a lot of memory?

   The design of RapidJSON aims at reducing memory footprint.

   In the SAX API, `Reader` consumes memory proportional to the maximum depth of the JSON tree, plus the maximum length of a JSON string.

   In the DOM API, each `Value` consumes exactly 16/24 bytes for 32/64-bit architecture respectively. RapidJSON also uses a special memory allocator to minimize overhead of allocations.

5. What is the purpose of being high performance?

   Some applications need to process very large JSON files. Some server-side applications need to process huge numbers of JSON documents. High performance can improve both latency and throughput. More broadly, it also saves energy.

## Gossip

1. Who are the developers of RapidJSON?

   Milo Yip ([miloyip](https://github.com/miloyip)) is the original author of RapidJSON. Many contributors from around the world have improved RapidJSON. Philipp A. Hartmann ([pah](https://github.com/pah)) has implemented many improvements, set up automatic testing and has been involved in many community discussions. Don Ding ([thebusytypist](https://github.com/thebusytypist)) implemented the iterative parser. Andrii Senkovych ([jollyroger](https://github.com/jollyroger)) completed the CMake migration. Kosta ([Kosta-Github](https://github.com/Kosta-Github)) provided a very neat short-string optimization. Thanks to all other contributors and community members as well.

2. Why do you develop RapidJSON?

   It was initially just a hobby project in 2011. Milo Yip is a game programmer who had just learned about JSON at that time and wanted to apply it in future projects. As JSON seemed very simple, he wanted to write a fast, header-only library.

3. Why was there a long period without development?

   It was mainly due to personal reasons, such as new family members. Also, Milo Yip spent a lot of spare time translating "Game Engine Architecture" by Jason Gregory into Chinese.

4. Why did the repository move from Google Code to GitHub?

   This is the trend. And GitHub is much more powerful and convenient.