|
Inferno #07
31 мая 2005 |
|
For Coderz - Recognition and computation of arithmetic expressions on their character record.

Recognition and computation
arithmetic expressions
Vitamin/CAIG/2001
This article will be touched upon
compilers, and translators, in particular,
calculation of the arithmetic expression of
his character record. For example, how to
mainframe can evaluate the expression of type
(10 +20) * 30-40? Man looking at this expression
mapping, just tell the answer. And what if
in the expression contains a huge number of
brackets? Here even the most gifted mathematician
thoughtful, though the expression itself can solve
tsya very simple. A computer does not have
human intelligence, as it is sad
but he can only assume. That should be it
learn to count such things. This we
you now do.
Let's start with the lexical analiza.Leksiches
cue analysis is always preceded by sintaksiche
Scoma analysis and is the prefix
belorussian processing of expression and its perekodi
derivations to a form suitable for syntactic
one analysis. Lexical analysis made
separated from the parsing for two reasons
us:
- To reduce compilation time;
- The syntax of tokens is always easier than the blue
taxis programs.
Lexical analysis is reduced to the partition
the program text into lexical units
(Token) - design, acting as
terminal symbols for syntactic
analiza.Leksemy sometimes called symbolic
Lamy or atomami.T.e., symbols "("," 10 ","+"
personal computer, do not say. For him, it
just data. Moreover, different lengths. And our
task - to transform them into a readable
computer form. This is the purpose
lexical analysis. Parsing
(Check on the correctness of the expression) is usually
but after doing this a hundred leksicheskogo.V
THIEY he will not be discussed in detail,
but I will say a few words later.
Token can be represented as a pair
(Class value). The value can be directly
rectly or submit a link
at some tables, which stores
value tokens. In particular, for our
problem requires the use of three types
tokens: delimiters, operators, and numeric
konstanty.Pust we have a table again
dividers and operations. Here they are:
Separator Operation Index Index
gap 0 <tsikl.sdvig left 0
Grating # 1> tsikl.sdvig right one
% Rate 2! XOR 2
(Skobka1 3 'logical AND 3
) Skobka2 4 ~ 4 inversion
| Logical OR 5
- Subtract 6
7 + addition
Comparison = 8
/ 9 division
* Multiplication 10
. ml.chast operand 11
'St.chast operand 12
In the value field tokens Save
tsya index operation or splitter. For
numeric constants as a table
values, which is created in the course of analysis
for. Further, we denote the token combination
Niemi BukvaTsifra form, for example, the minus sign
be coded as O6 (Operation 6),
separator skobka1 be coded as
D3 (Delimiter 3). Tokens are constants
have the prefix N (Number).
The task of lexical analysis is sufficient
simple, the complexity can be only
analysis of multi-character tokens (in our
case number.) Here you can apply the method
samples. The input to the program serves the characters, and
She checks them for membership
Class razdeliteley.Esli successfully, the output
enters the corresponding token. If
failure, there is a check for membership
sensitivity to class operatsiy.I so dalee.Vychle
equation symbolic constants is done by
determine their left and right granits.Dalee
constants are translated into numerical form and for
rush to the table of constants. Moreover, if
is a constant in the table already exists, then it
not be added again, and returns the index
existing constant.
Note 1. Since the considered
the problem is quite limited, I will not
consider other classes can leksem.Eto
Gut be identifiers (if using
variables), keywords, tags, symbol
constants (if it is input
text of a higher level than assembler)
etc. It should also be allocated
Other multi-character structure: the same
keywords, tags, and mnogoliternye
operations and separators.
Since we have broached the subject of languages more
High-level (the problem is
Recognition arithmetic NE
a special case of the concepts of translation
and compilation, and it is a language of high
one level), it is necessary to return to the promised
syntactic subject analiza.Chastichno concentration
Troll the correct expression is satisfied
at the level of lexical analysis. Usually
a check for a valid combination of symmetry
oxen in the declaration of constants, identifiers
tori, etc. But the text of the program is absolutely
correct in terms of vocabulary can
be syntactically incorrect. In our
If this includes the control sequence
particular character (that operations were
in the allowed combinations), the numbers in parentheses
arithmetic vyrazheniyah.Zadacha syntactic
electric analysis is complicated enough and decides
tsya by compiling a grammar of the input
language and parse the set of tokens recursive
algorithm according to the grammar. In
our case, we can restrict the former has
been proved operations.
The next step is the recognition
translate the expression in the form of tokens in the formation
tnuyu Polish Polish zapis.Obratnaya for
pis (IPF) is a form of
writing expressions and statements, distinguishing
An important feature of which is that
all the arguments (or operands) are
before the operation (the operator).
For example, the expression written in the usual
bracket entries
(A + d) / c + b * (e + d),
being represented in the IPF receives the form:
ad + c / bed + * +.
RPN has received wide
some spread due to its basis
vnomu advantage: the expression (operator)
written in the form of the SCR can be executed
one view of the chain from left to right. In
system programming is widely applied
is handling strings of symbols, so
necessary algorithms for IPF in
ntsipu string processing: the transformation of input
Neu-line on weekends, which is about
inverse Polish notation.
For the construction of the SCR can be used
vyrazheniya.V tree tops of the tree is
tsya operations (operators), and the leaves are
are arguments. For example, for the reduced
example, the tree will look like this:
(+)
/
(/) (*)
/ /
(+) C b (+)
/ /
a d e d
In the top of each subtree
is an operation to be performed
tsya posledney.Dlya conversion tree
linear recording, Image de
Reva, starting from the lower left is
hundred, left to right. After viewing all
sheet transition occurs to the root, etc.
SCR finished tying with this tree, you will not
difficult to calculate the conversion algorithm.
But to formalize this algorithm to
it was clear the car, very slozhno.Poetomu
For this purpose, other sposoby.Nai
better known of these algorithms is
tsya Dijkstra's algorithm. It uses
stack, and places the transaction. Each operator
radio is assigned a priority. Except
this, define the rules for work with the degree
com.
1) Input string visible on the left
right.
2) If the symbol is considered
operand, then it is transferred to the output
string, otherwise processed stack trace
lows:
- If the stack is empty, the operation is logged in
stack.
- If the stack top element (the opera
tion) has a higher priority, then
considered the operation of pushing
tsya the stack.
- If the stack top element (the opera
tion) has a lower priority, then
from the stack are pushed all the elements
(Operation) until such time has not yet encountered
titsya operation with a priority higher than
have the operation in question, after which
This operation is pushed onto the stack.
- If the input symbol "(" he pro
repel the stack.
- If the input character "), then pull it
lkivaet from the stack, all characters up to the nearest
Shay ")." Brackets themselves mutually annihilate
are expressed in the output string is not getting
exist.
Table of precedence of operations:
Operations | Priority
. '~ | 0
* / | 1
+ - | 2
= | 3
'! | | 4
<> | 5
Consider the example of the algorithm:
Log in | (a + d) / c + b * (e + d) <end>
------+----------------------------------
C | (+ / + * (+ * +
m | (+ * (+
e | + *
to | +
------+---------------------------------- Out | ad + c / bed +
* +
At the end of input to output
transmitted to all the characters from the stack until
long as the stack is not freed. As a result of
we find that the output contains only
operations and their arguments. This recording
called back because of the operation of one hundred
um after the operands (there is also a line where
operation precedes the operands, but we do it
not necessary).
Note 2. Thus, t.e.sogla
CHO Dijkstra's algorithm can be translated into
SCR is not only an arithmetic expression, but
and a program in Java (the languages of high levels
AE). Thus it is necessary to determine the additional
nye classes of tokens: a description and call the function
tions, descriptions, and addressing arrays, conditional
nye, and unconditional transitions. As a result,
get a description of the language of low-level
structures could not be better adapted
lennyh further translation into machine
code. Information can be found in
literature on this topic.
The next and final step will be to direct
rectly evaluate the expression. As you
probably already had to make sure the calculation
SCR turns on fast and easy, holds
equate in one pass. Algorithm for computing
summarized as follows:
1) When is input operand (Cons
tantalum), he entered into the stack.
2) If the operator is input, then
stack is popped the required number of operators
ndov and operations. Result for
rushes back into stek.Neobhodimo remember
that the operand stack is stored in reverse
order.
3) After the top of the stack will
will lie the answer.
Here is an example of software vse.Dalee
implementation of the above. Program
has several features:
- All numbers have 16-bit format;
- The construction of the SCR are not included una
rnye + and -;
- Constants may be in three systems
notation;
- Supports all major arithmetic
cal and logical;
- The priority of the operations described in the table,
t.e.tsiklichesky shift, for example, holds
after all other operations;
- Boolean operations have the same
priority and are performed from left to right;
- Parse realized hour
partially (not control the number of brackets), so
better define the correct expression, otherwise
possible stack overflow.
After starting the program shows on the
screen, the original expression, a response record you
expressions in the form of tokens and the SCR as tokens.
Ed.: Full version of the program, see
Annex (COUNT.H), give here only
to the main fragments.
...
CALL LEXIC; call leksich.analiz-ra
RET C; was a mistake
LD HL, DESTIN; output tokens
LD DE, # 4800; character records
CALL WRITE_LEX
CALL BPR_CNV; translation in the SCR
RET C; was a mistake
LD HL, BPR_BUF; output tokens SCR
LD DE, # 5000
CALL WRITE_LEX
CALL EVALUATE; calculation
And the result - in BC
LD HL, STRIN
LD E, 4
DOANSW XOR A
DUP 4
RL C, B
RLA
EDUP
ADD A, 48
CP 1958
JR C, ADOK
ADD A, 7
ADOK LD (HL), A
INC HL
DEC E
JR NZ, DOANSW
LD HL, SOURCE
LD DE, # 4000
CALL PRINT
LD HL, STRING
LD DE, # 40C0
PRINT LD A, (HL)
AND A
RET Z
CALL WR_SYM
INC HL
JR PRINT
STRING DB "Answer is: #"
STRIN DB "0000", 0
; Classes of tokens
OPERATION = 1; operation
DELIMITER = OPERATION +1; separator
NUMBER = DELIMITER +1; numerical constant
SOURCE = # C000; string expression
DESTIN = # E000; lexical analysis
BPR_BUF = # D000; RPN
DIGITS = # 8000; buffer numerical constants
;---------- Lexical analysis ---------- LEXIC
LD (RETSP +1), SP; sohr.adr.vozvrata
LD HL, SOURCE
LD IX, DESTIN
XOR A; setting
LD (DIGITS), A
LD B, A
LD C, A
LEXCYC LD A, (HL); last character
LD (IX), A
AND A
RET Z
CALL IS_DELIM; a divider?
JR C, NODELIM
INC HL; enters
LD (IX), DELIMITER
LODCONT LD (IX +1), A
LD B, (IX)
LD C, A
INC IX, IX
JR LEXCYC
NODELIM CALL IS_OPER; this operation?
JR C, NOOPER
INC HL; enters
LD (IX), OPERATION
JR LODCONT
NOOPER CALL IS_DIGIT; it chisl.konstanta?
JP C, ERROR; if not, then the error
LD (IX), NUMBER
JR LODCONT
; Checks belong to the STP token diff. Classes
IS_DELIM
EXX
LD HL, DELIMITERS
IS_TST LD C, 0
CONSEAR INC (HL)
DEC (HL)
JR Z, ENDLST
CP (HL)
JR NZ, NOEL
LD A, C
EXX
RET
NOEL INC L
INC C
JR CONSEAR
ENDLST EXX
SCF
RET
IS_OPER
EXX
LD HL, OPERATIONS
JR IS_TST
IS_DIGIT
EXA
LD A, B
CP DELIMITER
JR NZ, NODELM
LD A, C; if the previous symbol =
DEC A; pointer sist.schisleniya.
JP Z, HEX; hex
DEC A
JP Z, BIN; binary
NODELM EXA
CP "0
RET C; if this is not the number of
CP "9" 1
JR C, ISDIG
CCF
RET
ISDIG LD E, 0; counter-allowed characters in
DIGOK INC E
INC HL
LD A, (HL)
CP "0
JR C, ENDDIG
CP "9" 1
JR C, DIGOK
CP "A
JR C, ENDDIG
CP "Z" +1
JP C, ERROR
CP "a
JR C, ENDDIG
CP "z" +1
JP C, ERROR
CP 128
JP NC, ERROR
ENDDIG
PUSH HL
, Translated from the line (HL) in the number of HL '
...
ENDDECOD
EXX; entry in the list of constants
PUSH IX
LD IX, DIGITS
LD E, 0
LD A, (NUMBERS)
AND A
JR Z, ADDNUM; no one there, puts
LD B, A
DOSEAR LD A, (IX); search
CP L
JR NZ, NEXNUM
LD A, (IX +1)
CP H
JR Z, RETNUM; found - do not add, but
And returns the index
NEXNUM INC E
INC IX, IX
DJNZ DOSEAR
ADDNUM LD (IX), L; ADDED th nov.konstanty
LD (IX +1), H
LD A, (NUMBERS); it increases their number
INC A
LD (NUMBERS), A
RETNUM POP IX
LD A, E; return index constants
AND A
EXX
POP HL
RET
; Hexadecimal constant
HEX DEC IX, IX; to remove specific
; Cator of the output string
ISHEX LD E, -1
DEC HL
HEXOK INC E; count the number of characters
INC HL
LD A, (HL)
CP "0
JR C, ENDHEX
CP "9" 1
JR C, HEXOK
CP "A
JR C, ENDHEX
CP "Z" +1
JP C, HEXOK
CP "a
JR C, ENDHEX
CP "z" +1
JP C, HEXOK
CP 128
JP NC, ERROR
ENDHEX
LD A, E; no simv.net => error
AND A
JR Z, ERROR
PUSH HL
EXX
LD HL, 0
EXX
LD A, E; return to the first character
BAKHEX DEC HL
DEC A
JR NZ, BAKHEX
DOHEX LD A, (HL); NeuStar tetrads nach.so
SUB "0; starshih.Esli among more
CP 10, 4 tetrads excess lost
JR C, H1
SUB 7
H1 EXX
RLA
RLA
RLA
RLA
DUP 4
RLA
RL L, H
EDUP
EXX
INC HL
DEC E
JR NZ, DOHEX
JP ENDDECOD
; Binary constant
BIN; similarly indiscriminately
... , Hexadecimal constants
JP ENDDECOD
ERROR LD A, 2; error indication
OUT (-2), A
RETSP LD SP, 0; & Returns
SCF
RET
; Translations RPN (SCR)
BPR_CNV
LD (RETSP +1), SP; setting
LD IX, DESTIN
LD HL, BPR_BUF
LD BC, 0
PUSH BC; initializing the stack -
; Entering the marker end of the stack
BPR_CYC LD A, (IX); read one token
LD E, (IX +1)
AND A
JR Z, FINISH; no more tokens
LD D, A
CP DELIMITER
JR Z, DELIMS; separator
CP OPERATION
JR Z, OPERA; operation
OUT_LEX LD (HL), D; output tokens to the output
INC HL
LD (HL), E
INC HL
NEXSY INC IX, IX
JR BPR_CYC
DELIMS LD A, E; parsing delimiters
AND A
JR Z, NEXSY; blank - skip
SUB 3; opening bracket -
JR Z, PUSHA; entry into the stack
DEC A
JR Z, POPA; asm
OPERA POP BC
PUSH BC; element from the top of the stack
CALL GET_PRIO; its priority
LD C, E; current element
LD B, D
PUSH BC; entry into the stack (*)
LD E, A
CALL GET_PRIO; and its priority
CP E; if priority is less
JR C, NEXSY; a Record (*: already there)
POP DE; otherwise - the current element
POP BC; and item from the top
LD (HL), B; top e-t - on the way out
INC HL
LD (HL), C
INC HL
JR OPERA; popup until
, Until a token
; With lower priority
PUSHA PUSH DE; token in the stack (bracket)
JR NEXSY
POPA; closing bracket - pushing
POP DE; all e-Tov to the nearest
LD A, D; opening
CP DELIMITER
JR NZ, COUT
LD A, E; opening bracket
CP 3; also ejected
JR Z, NEXSY; but not displayed
COUT LD (HL), D
INC HL
LD (HL), E
INC HL
JR POPA
FINISH POP DE; completion of translation -
LD (HL), D; expulsion of all
INC HL; e-Tov from the stack and output.
LD (HL), E
INC HL
LD A, D
AND A
JR NZ, FINISH
RET
; Take priority tokens
GET_PRIO; BC = LEXEM, A <= PRIO
LD A, B
CP OPERATION
JR Z, OP_PRIO
CP DELIMITER
JR NZ, OTHER
LD B, 'DEL_PRIO
LD A, C
ADD A, DEL_PRIO
LD C, A
LD A, (BC)
RET
OTHER LD A, -1; the highest priority in
RET; "extra" tokens (usually
And this marker end of the stack)
OP_PRIO
LD B, 'OPR_PRIO
LD A, C
ADD A, OPR_PRIO
LD C, A
LD A, (BC)
RET
;------ Calculation of expression of SCR ----- EVALUATE
LD (RETSP +1), SP; setting
LD IX, BPR_BUF
DOEVAL LD D, (IX)
LD E, (IX +1)
LD A, D
AND A
JR NZ, CONTEV
POP BC; end of the expression
RET
CONTEV CP NUMBER
JR NZ, NONUM
LD D, 0 number - entry of the stack
LD HL, DIGITS; search buffer
EX DE, HL
ADD HL, HL
ADD HL, DE
LD E, (HL)
INC HL
LD D, (HL)
PUSSHA PUSH DE
INC IX, IX
JR DOEVAL
NONUM CP OPERATION; there were only
, Operators, delimiters
JP NZ, ERROR; in the SCR should not be!
; Vypoln.oper th on e-Tami top of the stack
, (One or 2, while at the top of the stack
; Lies _vtoroy_ operand
LD A, E
AND A
JR NZ, SH_R
POP BC, DE; cyclic shift to the left
DOSHL LD A, D
RLA
RL E, D
DEC BC
LD A, B
OR C
JR NZ, DOSHL
JR PUSSHA
SH_R DEC A
JR NZ, XOR_
POP BC, DE; tsiklich.sdvig right
...
JR PUSSHA
XOR_ DEC A
JR NZ, AND_
POP BC, DE; bit XOR
...
JR PUSSHA
AND_ DEC A
JR NZ, CPL_
POP BC, DE; bits or
...
JR PUSSHA
CPL_ DEC A
JR NZ, OR_
POP DE; bitwise inversion
...
JR PUSSHA
OR_ DEC A
JR NZ, MINUS
POP BC, DE; bit and
...
JR PUSSHA
MINUS DEC A
JR NZ, PLUS
POP DE, HL; subtraction
AND A
SBC HL, DE
EX DE, HL
JR PUSSHA
PLUS DEC A
JR NZ, EQUAL
POP DE, HL; addition
ADD HL, DE
EX DE, HL
JR PUSSHA
EQUAL DEC A
JR NZ, DIVIDE
; Comparison of two numbers. Result: 0 or 1
POP BC, HL
...
JP PUSSHA
DIVIDE DEC A
JP NZ, MULT
POP BC, DE; division with rounding
...
JP PUSSHA
MULT DEC A
JP NZ, LOPART
POP DE, BC; multiplication
...
JP PUSSHA
LOPART DEC A
JR NZ, HIPART
POP DE; the lower part of the word
LD D, 0
JP PUSSHA
HIPART DEC A
JP NZ, ERROR
POP DE; upper part of the word
LD E, 0
JP PUSSHA
; Service F-function output tokens for recording.
WRITE_LEX
... , Takes the words of HL, byte 0 = end
WR_SYM ... ; Printing characters. A screen in DE
ORG '($ -1) +1 * 256
DELIMITERS DB "#%()", 0; dividers
OPERATIONS DB "<>!'~|-+=/*.'", 0; operation
NUMBERS DB 0; number of constants
; Priorities dividers and operations
DEL_PRIO DB 0,0,0, 6,6
OPR_PRIO DB 5,5, 4,4,0,4, 2,2, 3, 1,1, 0,0
ORG SOURCE
DB <expression>, 0
Other articles:
Similar articles:
В этот день... 26 October