Commit 3ed9527c authored Jul 19, 2014 by Milo Yip
Merge pull request #78 from thebusytypist/TransitionTable
Supplemental documents for iterative parsing
Parents: 19a2279a, 1ec83fb7
Showing 3 changed files with 152 additions and 9 deletions
doc/diagram/iterative-parser-states-diagram.dot (+37 -0)
doc/diagram/iterative-parser-states-diagram.png (+0 -0)
doc/internals.md (+115 -9)
doc/diagram/iterative-parser-states-diagram.dot
0 → 100644
digraph {
    splines = true;

    node [shape = doublecircle]; Start; Finish;
    node [shape = circle];

    Start -> ArrayInitial [label = "["];
    Start -> ObjectInitial [label = "{"];

    ObjectInitial -> ObjectFinish [label = "}"];
    ObjectInitial -> MemberKey [label = "string"];

    MemberKey -> KeyValueDelimiter [label = ":"];

    KeyValueDelimiter -> ArrayInitial [label = "[ (push MemberValue)"];
    KeyValueDelimiter -> ObjectInitial [label = "{ (push MemberValue)"];
    KeyValueDelimiter -> MemberValue [label = "string|false|true|null|number"];

    MemberValue -> ObjectFinish [label = "}"];
    MemberValue -> MemberDelimiter [label = ","];

    MemberDelimiter -> MemberKey [label = "string"];

    ArrayInitial -> ArrayInitial [label = "[ (push Element)"];
    ArrayInitial -> ArrayFinish [label = "]"];
    ArrayInitial -> ObjectInitial [label = "{ (push Element)"];
    ArrayInitial -> Element [label = "string|false|true|null|number"];

    Element -> ArrayFinish [label = "]"];
    Element -> ElementDelimiter [label = ","];

    ElementDelimiter -> ArrayInitial [label = "[ (push Element)"];
    ElementDelimiter -> ObjectInitial [label = "{ (push Element)"];
    ElementDelimiter -> Element [label = "string|false|true|null|number"];

    ArrayFinish -> Finish;
    ObjectFinish -> Finish;
}
doc/diagram/iterative-parser-states-diagram.png
0 → 100644
172 KB
doc/internals.md
@@ -2,20 +2,126 @@
 This section records some design and implementation details.
-# Value
+[TOC]
+# Value {#Value}
-## Data Layout
+## Data Layout {#DataLayout}
-## Flags
+## Flags {#Flags}
-# Allocator
+# Allocator {#Allocator}
-## MemoryPoolAllocator
+## MemoryPoolAllocator {#MemoryPoolAllocator}
-# Parsing Optimization
+# Parsing Optimization {#ParsingOptimization}
-## Skip Whitespace with SIMD
+## Skip Whitespace with SIMD {#SkipwhitespaceWithSIMD}
-## Pow10()
+## Pow10() {#Pow10}
-## Local Stream Copy
+## Local Stream Copy {#LocalStreamCopy}
# Parser {#Parser}
## Iterative Parser {#IterativeParser}
The iterative parser is a recursive descent LL(1) parser
implemented in a non-recursive manner.
### Grammar {#IterativeParserGrammar}
The grammar used for this parser is based on strict JSON syntax:
~~~~~~~~~~
S -> array | object
array -> [ values ]
object -> { members }
values -> non-empty-values | ε
non-empty-values -> value addition-values
addition-values -> ε | , non-empty-values
members -> non-empty-members | ε
non-empty-members -> member addition-members
addition-members -> ε | , non-empty-members
member -> STRING : value
value -> STRING | NUMBER | NULL | BOOLEAN | object | array
~~~~~~~~~~
Note that left factoring is applied to the non-terminals `values` and `members` to make the grammar LL(1).
### Parsing Table {#IterativeParserParsingTable}
Based on the grammar, we can construct the FIRST and FOLLOW sets.
The FIRST sets of the non-terminals are listed below:
| NON-TERMINAL | FIRST |
|:-----------------:|:--------------------------------:|
| array | [ |
| object | { |
| values | ε STRING NUMBER NULL BOOLEAN { [ |
| addition-values | ε COMMA |
| members | ε STRING |
| addition-members | ε COMMA |
| member | STRING |
| value | STRING NUMBER NULL BOOLEAN { [ |
| S | [ { |
| non-empty-members | STRING |
| non-empty-values | STRING NUMBER NULL BOOLEAN { [ |
The FOLLOW set is listed below:
| NON-TERMINAL | FOLLOW |
|:-----------------:|:-------:|
| S | $ |
| array | , $ } ] |
| object | , $ } ] |
| values | ] |
| non-empty-values | ] |
| addition-values | ] |
| members | } |
| non-empty-members | } |
| addition-members | } |
| member | , } |
| value | , } ] |
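For readers who want to reproduce the two tables, here is a minimal Python sketch of the usual fixed-point computation of FIRST and FOLLOW. It is not part of RapidJSON; the names `GRAMMAR`, `first_of_sequence`, `compute_first` and `compute_follow` are made up for this illustration, and `,` is used where the FIRST table writes COMMA.

```python
EPS = "ε"
END = "$"

# Productions of the grammar above; an empty body stands for ε.
# (Illustrative encoding, not RapidJSON code.)
GRAMMAR = {
    "S": [["array"], ["object"]],
    "array": [["[", "values", "]"]],
    "object": [["{", "members", "}"]],
    "values": [["non-empty-values"], []],
    "non-empty-values": [["value", "addition-values"]],
    "addition-values": [[], [",", "non-empty-values"]],
    "members": [["non-empty-members"], []],
    "non-empty-members": [["member", "addition-members"]],
    "addition-members": [[], [",", "non-empty-members"]],
    "member": [["STRING", ":", "value"]],
    "value": [["STRING"], ["NUMBER"], ["NULL"], ["BOOLEAN"], ["object"], ["array"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a symbol sequence; contains EPS only if every symbol can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first():
    """Grow the FIRST sets until no production adds anything new."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first, start="S"):
    """Grow the FOLLOW sets until no production adds anything new."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]  # the tail can vanish
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Under these assumptions, `FIRST["values"]` and `FOLLOW["value"]` contain the same symbols as the corresponding table rows.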
Finally, the parsing table can be constructed from the FIRST and FOLLOW sets:
| NON-TERMINAL | [ | { | , | : | ] | } | STRING | NUMBER | NULL | BOOLEAN |
|:-----------------:|:---------------------:|:---------------------:|:-------------------:|:-:|:-:|:-:|:-----------------------:|:---------------------:|:---------------------:|:---------------------:|
| S | array | object | | | | | | | | |
| array | [ values ] | | | | | | | | | |
| object | | { members } | | | | | | | | |
| values | non-empty-values | non-empty-values | | | ε | | non-empty-values | non-empty-values | non-empty-values | non-empty-values |
| non-empty-values | value addition-values | value addition-values | | | | | value addition-values | value addition-values | value addition-values | value addition-values |
| addition-values | | | , non-empty-values | | ε | | | | | |
| members | | | | | | ε | non-empty-members | | | |
| non-empty-members | | | | | | | member addition-members | | | |
| addition-members | | | , non-empty-members | | | ε | | | | |
| member | | | | | | | STRING : value | | | |
| value | array | object | | | | | STRING | NUMBER | NULL | BOOLEAN |
There is a great [tool](http://hackingoff.com/compilers/predict-first-follow-set) for the grammar analysis above.
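The step from the FIRST and FOLLOW sets to the parsing table can also be written out by hand. The Python sketch below follows the textbook rule: a production `A -> α` is entered under every terminal in FIRST(α), and under every symbol in FOLLOW(A) when α can derive ε. The sets are copied from the tables above (with `,` for COMMA); `PRODUCTIONS`, `first_of` and `build_table` are names invented for this illustration, not RapidJSON identifiers.

```python
EPS = "ε"

# FIRST and FOLLOW of the non-terminals, copied from the tables above.
VALUE_FIRST = {"STRING", "NUMBER", "NULL", "BOOLEAN", "{", "["}
FIRST = {
    "S": {"[", "{"}, "array": {"["}, "object": {"{"},
    "values": VALUE_FIRST | {EPS}, "non-empty-values": VALUE_FIRST,
    "addition-values": {EPS, ","},
    "members": {EPS, "STRING"}, "non-empty-members": {"STRING"},
    "addition-members": {EPS, ","},
    "member": {"STRING"}, "value": VALUE_FIRST,
}
FOLLOW = {
    "S": {"$"},
    "array": {",", "$", "}", "]"}, "object": {",", "$", "}", "]"},
    "values": {"]"}, "non-empty-values": {"]"}, "addition-values": {"]"},
    "members": {"}"}, "non-empty-members": {"}"}, "addition-members": {"}"},
    "member": {",", "}"}, "value": {",", "}", "]"},
}
# (head, body) pairs; an empty body stands for ε.
PRODUCTIONS = [
    ("S", ["array"]), ("S", ["object"]),
    ("array", ["[", "values", "]"]),
    ("object", ["{", "members", "}"]),
    ("values", ["non-empty-values"]), ("values", []),
    ("non-empty-values", ["value", "addition-values"]),
    ("addition-values", []), ("addition-values", [",", "non-empty-values"]),
    ("members", ["non-empty-members"]), ("members", []),
    ("non-empty-members", ["member", "addition-members"]),
    ("addition-members", []), ("addition-members", [",", "non-empty-members"]),
    ("member", ["STRING", ":", "value"]),
    ("value", ["STRING"]), ("value", ["NUMBER"]), ("value", ["NULL"]),
    ("value", ["BOOLEAN"]), ("value", ["object"]), ("value", ["array"]),
]


def first_of(body):
    """FIRST of a production body; a terminal's FIRST set is itself."""
    out = set()
    for sym in body:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}


def build_table():
    """Map (non-terminal, lookahead) to the production body to expand."""
    table = {}
    for head, body in PRODUCTIONS:
        body_first = first_of(body)
        lookaheads = body_first - {EPS}
        if EPS in body_first:
            lookaheads |= FOLLOW[head]
        for terminal in lookaheads:
            if (head, terminal) in table:
                # Two productions competing for one cell: not LL(1).
                raise ValueError(f"conflict at ({head}, {terminal})")
            table[(head, terminal)] = body
    return table


TABLE = build_table()
```

Building the table without hitting the `ValueError` is a quick check that no cell receives two productions, which is what LL(1) requires.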
### Implementation {#IterativeParserImplementation}
Based on the parsing table, a direct (or conventional) implementation
that pushes the production body onto the stack in reverse order
whenever a production is applied would work.
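As a rough illustration of such a direct implementation (not the code RapidJSON uses), the Python sketch below drives a symbol stack from the parsing table. It only recognizes token sequences; the token kinds are plain strings, and `TABLE`, `VALUE_START` and `accepts` are names assumed for this example.

```python
# Terminals on which a value may start.
VALUE_START = ["[", "{", "STRING", "NUMBER", "NULL", "BOOLEAN"]

# The parsing table above: TABLE[non-terminal][lookahead] -> production body.
# An empty body stands for ε.
TABLE = {
    "S": {"[": ["array"], "{": ["object"]},
    "array": {"[": ["[", "values", "]"]},
    "object": {"{": ["{", "members", "}"]},
    "values": {**{t: ["non-empty-values"] for t in VALUE_START}, "]": []},
    "non-empty-values": {t: ["value", "addition-values"] for t in VALUE_START},
    "addition-values": {",": [",", "non-empty-values"], "]": []},
    "members": {"STRING": ["non-empty-members"], "}": []},
    "non-empty-members": {"STRING": ["member", "addition-members"]},
    "addition-members": {",": [",", "non-empty-members"], "}": []},
    "member": {"STRING": ["STRING", ":", "value"]},
    "value": {"[": ["array"], "{": ["object"], "STRING": ["STRING"],
              "NUMBER": ["NUMBER"], "NULL": ["NULL"], "BOOLEAN": ["BOOLEAN"]},
}


def accepts(tokens):
    """Return True if the token sequence is derivable from S."""
    stack = ["$", "S"]              # "$" marks the bottom of the stack
    stream = list(tokens) + ["$"]   # "$" marks the end of input
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = stream[pos]
        if top in TABLE:
            body = TABLE[top].get(lookahead)
            if body is None:
                return False        # empty table cell: syntax error
            # Push the body in reverse so its first symbol ends up on top.
            stack.extend(reversed(body))
        elif top == lookahead:
            pos += 1                # terminal matched, consume the token
        else:
            return False
    return pos == len(stream)
```

For instance, `accepts(["[", "NUMBER", ",", "NUMBER", "]"])` holds, while a trailing comma as in `["[", "NUMBER", ",", "]"]` is rejected because the cell for `non-empty-values` under `]` is empty.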
RapidJSON makes several modifications to this direct implementation to fit its current design.
First, RapidJSON encodes the parsing table in a state machine.
States are constructed from the heads and bodies of productions.
State transitions are constructed from production rules.
In addition, extra states are added for productions involving `array` and `object`.
In this way, generating array values or object members takes a single state transition,
rather than several pop/push operations as in the direct implementation.
This also makes the stack size easier to estimate.
The final state diagram is shown below:
<img src="diagram/iterative-parser-states-diagram.png" alt="States Diagram" height="400px" />
Second, the iterative parser also keeps track of each array's value count and each object's member count
in its internal stack, which a conventional implementation may not do.
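To make the two modifications concrete, here is a small Python sketch of a state machine that follows the transitions in the state diagram and keeps a count next to each pushed state. It is an illustration under stated assumptions, not RapidJSON's implementation: the diagram only draws `ArrayFinish -> Finish` and `ObjectFinish -> Finish`, so the sketch assumes that closing a nested container pops the pushed state (`Element` or `MemberValue`) and resumes there, and it folds the two finish states into that pop. All identifiers (`parse`, `AFTER_VALUE`, `SIMPLE`, `CLOSE`) are hypothetical.

```python
SCALARS = {"STRING", "NUMBER", "NULL", "BOOLEAN"}
OPEN = {"[": "ArrayInitial", "{": "ObjectInitial"}

# States in which a value may start, mapped to the state reached once that
# value is complete (this is the state the diagram "pushes").
AFTER_VALUE = {
    "ArrayInitial": "Element",
    "ElementDelimiter": "Element",
    "KeyValueDelimiter": "MemberValue",
}
# Remaining single-token transitions from the diagram.
SIMPLE = {
    ("ObjectInitial", "STRING"): "MemberKey",
    ("MemberDelimiter", "STRING"): "MemberKey",
    ("MemberKey", ":"): "KeyValueDelimiter",
    ("MemberValue", ","): "MemberDelimiter",
    ("Element", ","): "ElementDelimiter",
}
# (state, token) pairs that close the current container.
CLOSE = {
    ("ArrayInitial", "]"): "array", ("Element", "]"): "array",
    ("ObjectInitial", "}"): "object", ("MemberValue", "}"): "object",
}


def parse(tokens):
    """Return [(kind, count), ...] in closing order, or None on a syntax error."""
    state = "Start"
    stack = []    # one frame per open container: [state to resume, count so far]
    closed = []
    for tok in tokens:
        if tok in OPEN and (state == "Start" or state in AFTER_VALUE):
            if stack:
                stack[-1][1] += 1   # the nested container is a value of its parent
            stack.append([AFTER_VALUE.get(state, "Finish"), 0])
            state = OPEN[tok]
        elif tok in SCALARS and state in AFTER_VALUE:
            stack[-1][1] += 1       # one more element or member value
            state = AFTER_VALUE[state]
        elif (state, tok) in SIMPLE:
            state = SIMPLE[(state, tok)]
        elif (state, tok) in CLOSE:
            resume, count = stack.pop()   # assumption: finishing pops the frame
            closed.append((CLOSE[(state, tok)], count))
            state = resume
        else:
            return None             # no transition for this token
    return closed if state == "Finish" else None
```

Each value costs one transition, and the stack grows by exactly one frame per nesting level, which is why the depth of the input bounds the stack size in this sketch.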