Commit d29e5f96 authored Jul 19, 2014 by thebusytypist
Add document for implementation of iterative parser.
parent 140dc066
Showing 1 changed file with 98 additions and 0 deletions
doc/internals.md
@@ -19,3 +19,101 @@ This section records some design and implementation details.
## Pow10()
## Local Stream Copy
# Parser
## Iterative Parser
The iterative parser is a recursive descent LL(1) parser
implemented in a non-recursive manner.
### Grammar
The grammar used for this parser is based on strict JSON syntax:
~~~~~~~~~~
S -> array | object
array -> [ values ]
object -> { members }
values -> non-empty-values | ε
non-empty-values -> value addition-values
addition-values -> ε | , non-empty-values
members -> non-empty-members | ε
non-empty-members -> member addition-members
addition-members -> ε | , non-empty-members
member -> STRING : value
value -> STRING | NUMBER | NULL | BOOLEAN | object | array
~~~~~~~~~~
Note that left factoring is applied to the non-terminals `values` and `members` to make the grammar LL(1).
### Parsing Table
Based on the grammar, we can construct the FIRST and FOLLOW sets.
The FIRST set of non-terminals is listed below:
| NON-TERMINAL | FIRST |
|:-----------------:|:--------------------------------:|
| array | [ |
| object | { |
| values | ε STRING NUMBER NULL BOOLEAN { [ |
| addition-values | ε COMMA |
| members | ε STRING |
| addition-members | ε COMMA |
| member | STRING |
| value | STRING NUMBER NULL BOOLEAN { [ |
| S | [ { |
| non-empty-members | STRING |
| non-empty-values | STRING NUMBER NULL BOOLEAN { [ |
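
As an illustration of how such a FIRST table can be derived mechanically, here is a minimal sketch of the usual fixed-point computation applied to the grammar above. It is not part of RapidJSON; `Production`, `JsonGrammar` and `ComputeFirst` are hypothetical names chosen for this example, and the comma terminal is spelled `,` rather than `COMMA`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: an empty body stands for an ε-production.
struct Production {
    std::string head;
    std::vector<std::string> body;
};

// The grammar from the section above, one alternative per entry.
std::vector<Production> JsonGrammar() {
    return {
        {"S", {"array"}}, {"S", {"object"}},
        {"array", {"[", "values", "]"}},
        {"object", {"{", "members", "}"}},
        {"values", {"non-empty-values"}}, {"values", {}},
        {"non-empty-values", {"value", "addition-values"}},
        {"addition-values", {}}, {"addition-values", {",", "non-empty-values"}},
        {"members", {"non-empty-members"}}, {"members", {}},
        {"non-empty-members", {"member", "addition-members"}},
        {"addition-members", {}}, {"addition-members", {",", "non-empty-members"}},
        {"member", {"STRING", ":", "value"}},
        {"value", {"STRING"}}, {"value", {"NUMBER"}}, {"value", {"NULL"}},
        {"value", {"BOOLEAN"}}, {"value", {"object"}}, {"value", {"array"}},
    };
}

// Repeats the FIRST rules over all productions until no set grows any more.
std::map<std::string, std::set<std::string>> ComputeFirst(
        const std::vector<Production>& grammar) {
    const std::string kEpsilon = "ε";
    std::set<std::string> nonTerminals;
    for (const Production& p : grammar) {
        assert(!p.head.empty());  // every production needs a head
        nonTerminals.insert(p.head);
    }
    std::map<std::string, std::set<std::string>> first;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Production& p : grammar) {
            std::set<std::string>& target = first[p.head];
            const std::size_t before = target.size();
            bool allNullable = true;  // stays true only if every symbol can derive ε
            for (const std::string& sym : p.body) {
                if (!nonTerminals.count(sym)) {  // terminal: it starts the body
                    target.insert(sym);
                    allNullable = false;
                    break;
                }
                const std::set<std::string> symFirst = first[sym];  // copy, may alias target
                for (const std::string& t : symFirst)
                    if (t != kEpsilon) target.insert(t);
                if (!symFirst.count(kEpsilon)) { allNullable = false; break; }
            }
            if (allNullable) target.insert(kEpsilon);
            if (target.size() != before) changed = true;
        }
    }
    return first;
}
```

Calling `ComputeFirst(JsonGrammar())` should reproduce the rows of the table above, for example `[` and `{` for `S`. The FOLLOW sets can be obtained with a similar fixed-point loop.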
The FOLLOW set is listed below:
| NON-TERMINAL | FOLLOW |
|:-----------------:|:-------:|
| S | $ |
| array | , $ } ] |
| object | , $ } ] |
| values | ] |
| non-empty-values | ] |
| addition-values | ] |
| members | } |
| non-empty-members | } |
| addition-members | } |
| member | , } |
| value | , } ] |
Finally, the parsing table can be constructed from the FIRST and FOLLOW sets:
| NON-TERMINAL | [ | { | , | : | ] | } | STRING | NUMBER | NULL | BOOLEAN |
|:-----------------:|:---------------------:|:---------------------:|:-------------------:|:-:|:-:|:-:|:-----------------------:|:---------------------:|:---------------------:|:---------------------:|
| S | array | object | | | | | | | | |
| array | [ values ] | | | | | | | | | |
| object | | { members } | | | | | | | | |
| values | non-empty-values | non-empty-values | | | ε | | non-empty-values | non-empty-values | non-empty-values | non-empty-values |
| non-empty-values | value addition-values | value addition-values | | | | | value addition-values | value addition-values | value addition-values | value addition-values |
| addition-values | | | , non-empty-values | | ε | | | | | |
| members | | | | | | ε | non-empty-members | | | |
| non-empty-members | | | | | | | member addition-members | | | |
| addition-members | | | , non-empty-members | | | ε | | | | |
| member | | | | | | | STRING : value | | | |
| value | array | object | | | | | STRING | NUMBER | NULL | BOOLEAN |
There is a great [tool](http://hackingoff.com/compilers/predict-first-follow-set) for the grammar analysis above.
### Implementation
Based on the parsing table, a direct (or conventional) implementation would work:
each time a production is selected, its body is pushed onto the stack in reverse order.
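
To make the direct approach concrete, here is a minimal sketch of a table-driven LL(1) recognizer for the grammar above. It is not RapidJSON code: the `Symbol` enum, `Predict` and `Recognize` are hypothetical names, the input is assumed to be already tokenized, and the recognizer only accepts or rejects a token sequence without building any values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Terminals are listed first so that IsTerminal() is a simple comparison.
enum Symbol {
    LBracket, LBrace, Comma, Colon, RBracket, RBrace,
    String, Number, Null, Boolean, End,                        // terminals (End is $)
    S, Array, Object, Values, NonEmptyValues, AdditionValues,
    Members, NonEmptyMembers, AdditionMembers, Member, Value   // non-terminals
};

static bool IsTerminal(Symbol s) { return s <= End; }

// Looks up the parsing table entry for (nt, la). On success `body` holds the
// production body (empty for an ε-production). Returns false for a blank entry.
static bool Predict(Symbol nt, Symbol la, std::vector<Symbol>& body) {
    const bool startsValue = la == LBracket || la == LBrace ||
                             (la >= String && la <= Boolean);
    body.clear();
    switch (nt) {
    case S:
        if (la == LBracket) body = {Array};
        else if (la == LBrace) body = {Object};
        else return false;
        return true;
    case Array:
        if (la != LBracket) return false;
        body = {LBracket, Values, RBracket};
        return true;
    case Object:
        if (la != LBrace) return false;
        body = {LBrace, Members, RBrace};
        return true;
    case Values:
        if (startsValue) body = {NonEmptyValues};
        else if (la != RBracket) return false;
        return true;
    case NonEmptyValues:
        if (!startsValue) return false;
        body = {Value, AdditionValues};
        return true;
    case AdditionValues:
        if (la == Comma) body = {Comma, NonEmptyValues};
        else if (la != RBracket) return false;
        return true;
    case Members:
        if (la == String) body = {NonEmptyMembers};
        else if (la != RBrace) return false;
        return true;
    case NonEmptyMembers:
        if (la != String) return false;
        body = {Member, AdditionMembers};
        return true;
    case AdditionMembers:
        if (la == Comma) body = {Comma, NonEmptyMembers};
        else if (la != RBrace) return false;
        return true;
    case Member:
        if (la != String) return false;
        body = {String, Colon, Value};
        return true;
    case Value:
        if (la == LBracket) body = {Array};
        else if (la == LBrace) body = {Object};
        else if (startsValue) body = {la};
        else return false;
        return true;
    default:
        return false;
    }
}

// Returns true if `tokens` (without the trailing End marker) is derivable from S.
bool Recognize(const std::vector<Symbol>& tokens) {
    std::vector<Symbol> stack = {End, S};
    std::vector<Symbol> body;
    std::size_t pos = 0;
    while (!stack.empty()) {
        const Symbol la = pos < tokens.size() ? tokens[pos] : End;
        assert(IsTerminal(la));  // the input must consist of terminals only
        const Symbol top = stack.back();
        stack.pop_back();
        if (IsTerminal(top)) {
            if (top != la) return false;  // mismatch between stack and input
            ++pos;
        } else {
            if (!Predict(top, la, body)) return false;
            // Push the body in reverse so that its first symbol ends up on top.
            for (auto it = body.rbegin(); it != body.rend(); ++it)
                stack.push_back(*it);
        }
    }
    return true;  // End was matched, so all input has been consumed
}
```

In this sketch, every array element costs several pop/push operations (`values`, `non-empty-values`, `value`, `addition-values`).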
In RapidJSON, several modifications (or adaptations to the current design) are made to the direct implementation.
First, the parsing table is encoded in a state machine in RapidJSON.
Extra states are added for productions involving `array` and `object`.
In this way, generating array values or object members takes a single state transition,
rather than several pop/push operations as in the direct implementation.
This also makes the stack size easier to estimate.
Second, the iterative parser also keeps track of each array's value count and each object's member count
in its internal stack, which may differ from a conventional implementation.
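
The two adaptations can be sketched together as follows. This is only an illustration under simplifying assumptions, not RapidJSON's actual reader: the state names, `Frame` and `Parse` are hypothetical, and the input is an already tokenized sequence. Each token triggers one state transition, and the stack holds one frame per open array or object, carrying its element or member count.

```cpp
#include <cassert>
#include <vector>

enum Token { LBracket, LBrace, Comma, Colon, RBracket, RBrace,
             String, Number, Null, Boolean };

// Hypothetical states; each one describes what the parser expects next.
enum State { Start, ArrayInitial, ObjectInitial, ExpectValue,
             ExpectKey, ExpectColon, AfterValue, Done };

// One stack entry per open container, with the number of values or members seen.
struct Frame { bool isArray; unsigned count; };

// Returns true on success. `closed` receives a copy of each frame at the moment
// its container is closed, so it records the final counts in closing order.
bool Parse(const std::vector<Token>& tokens, std::vector<Frame>& closed) {
    std::vector<Frame> stack;
    State state = Start;

    auto openContainer = [&](Token t) {
        stack.push_back(Frame{t == LBracket, 0});
        state = (t == LBracket) ? ArrayInitial : ObjectInitial;
    };
    auto valueDone = [&] {
        assert(!stack.empty());  // values only occur inside a container
        ++stack.back().count;
        state = AfterValue;
    };
    auto closeContainer = [&] {
        closed.push_back(stack.back());
        stack.pop_back();
        if (stack.empty()) state = Done;
        else valueDone();  // the closed container is a value of its parent
    };

    for (Token t : tokens) {
        switch (state) {
        case Start:
            if (t != LBracket && t != LBrace) return false;
            openContainer(t);
            break;
        case ArrayInitial:
            if (t == RBracket) { closeContainer(); break; }
            // fall through: otherwise the first element must follow
        case ExpectValue:
            if (t == LBracket || t == LBrace) openContainer(t);
            else if (t >= String) valueDone();  // STRING, NUMBER, NULL, BOOLEAN
            else return false;
            break;
        case ObjectInitial:
            if (t == RBrace) { closeContainer(); break; }
            // fall through: otherwise the first member key must follow
        case ExpectKey:
            if (t != String) return false;
            state = ExpectColon;
            break;
        case ExpectColon:
            if (t != Colon) return false;
            state = ExpectValue;
            break;
        case AfterValue:
            if (t == Comma) state = stack.back().isArray ? ExpectValue : ExpectKey;
            else if (t == (stack.back().isArray ? RBracket : RBrace)) closeContainer();
            else return false;
            break;
        case Done:
            return false;  // trailing tokens after the root value
        }
    }
    return state == Done;
}
```

Because the stack grows only when a container is opened, its depth in this sketch equals the nesting depth of the input, which is what makes the required stack size straightforward to bound.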