Inferno #07
31 мая 2005

For Coderz - Recognition and computation of arithmetic expressions on their character record.

<b>For Coderz</b> - Recognition and computation of arithmetic expressions on their character record.
        Recognition and computation

         arithmetic expressions
 Vitamin/CAIG/2001

   This article will be touched upon
compilers, and translators, in particular,
calculation of the arithmetic expression of
his character record. For example, how to
mainframe can evaluate the expression of type
(10 +20) * 30-40? Man looking at this expression
mapping, just tell the answer. And what if
in the expression contains a huge number of
brackets? Here even the most gifted mathematician
thoughtful, though the expression itself can solve
tsya very simple. A computer does not have
human intelligence, as it is sad
but he can only assume. That should be it
learn to count such things. This we
you now do.

   Let's start with the lexical analiza.Leksiches
cue analysis is always preceded by sintaksiche
Scoma analysis and is the prefix
belorussian processing of expression and its perekodi
derivations to a form suitable for syntactic
one analysis. Lexical analysis made
separated from the parsing for two reasons
 us:

   - To reduce compilation time;

  - The syntax of tokens is always easier than the blue
taxis programs.

   Lexical analysis is reduced to the partition
the program text into lexical units
(Token) - design, acting as
terminal symbols for syntactic
analiza.Leksemy sometimes called symbolic
Lamy or atomami.T.e., symbols "("," 10 ","+"
personal computer, do not say. For him, it
just data. Moreover, different lengths. And our
task - to transform them into a readable
computer form. This is the purpose
lexical analysis. Parsing
(Check on the correctness of the expression) is usually
but after doing this a hundred leksicheskogo.V
THIEY he will not be discussed in detail,
but I will say a few words later.

   Token can be represented as a pair
(Class value). The value can be directly
rectly or submit a link
at some tables, which stores
value tokens. In particular, for our
problem requires the use of three types
tokens: delimiters, operators, and numeric
konstanty.Pust we have a table again
dividers and operations. Here they are:

Separator Operation Index Index

     gap 0 <tsikl.sdvig left 0

   Grating # 1> tsikl.sdvig right one

   % Rate 2! XOR 2

   (Skobka1 3 'logical AND 3

  ) Skobka2 4 ~ 4 inversion

                   | Logical OR 5

                   - Subtract 6

                   7 + addition

                   Comparison = 8

                   / 9 division

                   * Multiplication 10

                   . ml.chast operand 11

                   'St.chast operand 12


   In the value field tokens Save
tsya index operation or splitter. For
numeric constants as a table
values, which is created in the course of analysis
for. Further, we denote the token combination
Niemi BukvaTsifra form, for example, the minus sign
be coded as O6 (Operation 6),
separator skobka1 be coded as
D3 (Delimiter 3). Tokens are constants
have the prefix N (Number).

   The task of lexical analysis is sufficient
simple, the complexity can be only
analysis of multi-character tokens (in our
case number.) Here you can apply the method
samples. The input to the program serves the characters, and
She checks them for membership
Class razdeliteley.Esli successfully, the output
enters the corresponding token. If
failure, there is a check for membership
sensitivity to class operatsiy.I so dalee.Vychle
equation symbolic constants is done by
determine their left and right granits.Dalee
constants are translated into numerical form and for
rush to the table of constants. Moreover, if
is a constant in the table already exists, then it
not be added again, and returns the index
existing constant.

   Note 1. Since the considered
the problem is quite limited, I will not
consider other classes can leksem.Eto
Gut be identifiers (if using
variables), keywords, tags, symbol
constants (if it is input
text of a higher level than assembler)
etc. It should also be allocated
Other multi-character structure: the same
keywords, tags, and mnogoliternye
operations and separators.

   Since we have broached the subject of languages ​​more
High-level (the problem is
Recognition arithmetic NE
a special case of the concepts of translation
and compilation, and it is a language of high
one level), it is necessary to return to the promised
syntactic subject analiza.Chastichno concentration
Troll the correct expression is satisfied
at the level of lexical analysis. Usually
a check for a valid combination of symmetry
oxen in the declaration of constants, identifiers
tori, etc. But the text of the program is absolutely
correct in terms of vocabulary can
be syntactically incorrect. In our
If this includes the control sequence
particular character (that operations were
in the allowed combinations), the numbers in parentheses
arithmetic vyrazheniyah.Zadacha syntactic
electric analysis is complicated enough and decides
tsya by compiling a grammar of the input
language and parse the set of tokens recursive
algorithm according to the grammar. In
our case, we can restrict the former has
been proved operations.

   The next step is the recognition
translate the expression in the form of tokens in the formation
tnuyu Polish Polish zapis.Obratnaya for
pis (IPF) is a form of
writing expressions and statements, distinguishing
An important feature of which is that
all the arguments (or operands) are
before the operation (the operator).

   For example, the expression written in the usual
 bracket entries

             (A + d) / c + b * (e + d),
 being represented in the IPF receives the form:

               ad + c / bed + * +.

   RPN has received wide
some spread due to its basis
vnomu advantage: the expression (operator)
written in the form of the SCR can be executed
one view of the chain from left to right. In
system programming is widely applied
is handling strings of symbols, so
necessary algorithms for IPF in
ntsipu string processing: the transformation of input
Neu-line on weekends, which is about
inverse Polish notation.

   For the construction of the SCR can be used
vyrazheniya.V tree tops of the tree is
tsya operations (operators), and the leaves are
are arguments. For example, for the reduced
 example, the tree will look like this:

                    (+)

                  /

                (/) (*)

              / /

            (+) C b (+)

           / /

         a d e d

   In the top of each subtree
is an operation to be performed
tsya posledney.Dlya conversion tree
linear recording, Image de
Reva, starting from the lower left is
hundred, left to right. After viewing all
sheet transition occurs to the root, etc.
SCR finished tying with this tree, you will not
difficult to calculate the conversion algorithm.

   But to formalize this algorithm to
it was clear the car, very slozhno.Poetomu
For this purpose, other sposoby.Nai
better known of these algorithms is
tsya Dijkstra's algorithm. It uses
stack, and places the transaction. Each operator
radio is assigned a priority. Except
this, define the rules for work with the degree
 com.

  1) Input string visible on the left
 right.

  2) If the symbol is considered
operand, then it is transferred to the output
string, otherwise processed stack trace
lows:

     - If the stack is empty, the operation is logged in

     stack.

     - If the stack top element (the opera

      tion) has a higher priority, then

      considered the operation of pushing

     tsya the stack.

     - If the stack top element (the opera

      tion) has a lower priority, then

      from the stack are pushed all the elements

      (Operation) until such time has not yet encountered

      titsya operation with a priority higher than

      have the operation in question, after which

     This operation is pushed onto the stack.

     - If the input symbol "(" he pro

     repel the stack.

     - If the input character "), then pull it

      lkivaet from the stack, all characters up to the nearest

      Shay ")." Brackets themselves mutually annihilate

      are expressed in the output string is not getting

     exist.


       Table of precedence of operations:

            Operations | Priority

              . '~ | 0

              * / | 1

              + - | 2

              = | 3

              '! | | 4

             <> | 5


   Consider the example of the algorithm:
Log in | (a + d) / c + b * (e + d) <end>
 ------+----------------------------------
   C | (+ / + * (+ * +

   m | (+ * (+

   e | + *

  to | +
------+---------------------------------- Out | ad + c / bed + 
* + 


   At the end of input to output
transmitted to all the characters from the stack until
long as the stack is not freed. As a result of
we find that the output contains only
operations and their arguments. This recording
called back because of the operation of one hundred
um after the operands (there is also a line where
operation precedes the operands, but we do it
not necessary).

   Note 2. Thus, t.e.sogla
CHO Dijkstra's algorithm can be translated into
SCR is not only an arithmetic expression, but
and a program in Java (the languages ​​of high levels
AE). Thus it is necessary to determine the additional
nye classes of tokens: a description and call the function
tions, descriptions, and addressing arrays, conditional
nye, and unconditional transitions. As a result,
get a description of the language of low-level
structures could not be better adapted
lennyh further translation into machine
code. Information can be found in
literature on this topic.


   The next and final step will be to direct
rectly evaluate the expression. As you
probably already had to make sure the calculation
SCR turns on fast and easy, holds
equate in one pass. Algorithm for computing
 summarized as follows:

  1) When is input operand (Cons
 tantalum), he entered into the stack.

  2) If the operator is input, then
stack is popped the required number of operators
ndov and operations. Result for
rushes back into stek.Neobhodimo remember
that the operand stack is stored in reverse
 order.

  3) After the top of the stack will
will lie the answer.


   Here is an example of software vse.Dalee
implementation of the above. Program
 has several features:

   - All numbers have 16-bit format;

  - The construction of the SCR are not included una
 rnye + and -;

  - Constants may be in three systems
 notation;

  - Supports all major arithmetic
 cal and logical;

  - The priority of the operations described in the table,
t.e.tsiklichesky shift, for example, holds
 after all other operations;

  - Boolean operations have the same
 priority and are performed from left to right;

  - Parse realized hour
partially (not control the number of brackets), so
better define the correct expression, otherwise
possible stack overflow.

   After starting the program shows on the
screen, the original expression, a response record you
 expressions in the form of tokens and the SCR as tokens.

   Ed.: Full version of the program, see
Annex (COUNT.H), give here only
 to the main fragments.

         ...

         CALL LEXIC; call leksich.analiz-ra

         RET C; was a mistake

         LD HL, DESTIN; output tokens

         LD DE, # 4800; character records

         CALL WRITE_LEX

         CALL BPR_CNV; translation in the SCR

         RET C; was a mistake

         LD HL, BPR_BUF; output tokens SCR

         LD DE, # 5000

         CALL WRITE_LEX

         CALL EVALUATE; calculation

                          And the result - in BC

         LD HL, STRIN

        LD E, 4
 DOANSW XOR A

         DUP 4

         RL C, B

         RLA

         EDUP

         ADD A, 48

         CP 1958

         JR C, ADOK

        ADD A, 7
 ADOK LD (HL), A

         INC HL

         DEC E

         JR NZ, DOANSW

         LD HL, SOURCE

         LD DE, # 4000

         CALL PRINT

         LD HL, STRING

        LD DE, # 40C0
 PRINT LD A, (HL)

         AND A

         RET Z

         CALL WR_SYM

         INC HL

        JR PRINT
STRING DB "Answer is: #"
STRIN DB "0000", 0
; Classes of tokens
OPERATION = 1; operation
DELIMITER = OPERATION +1; separator
NUMBER = DELIMITER +1; numerical constant
SOURCE = # C000; string expression
DESTIN = # E000; lexical analysis
BPR_BUF = # D000; RPN
DIGITS = # 8000; buffer numerical constants
;---------- Lexical analysis ---------- LEXIC

         LD (RETSP +1), SP; sohr.adr.vozvrata

         LD HL, SOURCE

         LD IX, DESTIN

         XOR A; setting

         LD (DIGITS), A

         LD B, A

        LD C, A
 LEXCYC LD A, (HL); last character

         LD (IX), A

         AND A

         RET Z

         CALL IS_DELIM; a divider?

         JR C, NODELIM

         INC HL; enters

        LD (IX), DELIMITER
 LODCONT LD (IX +1), A

         LD B, (IX)

         LD C, A

         INC IX, IX

        JR LEXCYC
 NODELIM CALL IS_OPER; this operation?

         JR C, NOOPER

         INC HL; enters

         LD (IX), OPERATION

        JR LODCONT
 NOOPER CALL IS_DIGIT; it chisl.konstanta?

         JP C, ERROR; if not, then the error

         LD (IX), NUMBER

        JR LODCONT
; Checks belong to the STP token diff. Classes
 IS_DELIM

         EXX

        LD HL, DELIMITERS
IS_TST LD C, 0
 CONSEAR INC (HL)

         DEC (HL)

         JR Z, ENDLST

         CP (HL)

         JR NZ, NOEL

         LD A, C

         EXX

        RET
 NOEL INC L

         INC C

        JR CONSEAR
 ENDLST EXX

         SCF

        RET
 IS_OPER

         EXX

         LD HL, OPERATIONS

        JR IS_TST
 IS_DIGIT

         EXA

         LD A, B

         CP DELIMITER

         JR NZ, NODELM

         LD A, C; if the previous symbol =

         DEC A; pointer sist.schisleniya.

         JP Z, HEX; hex

         DEC A

        JP Z, BIN; binary
 NODELM EXA

         CP "0

         RET C; if this is not the number of

         CP "9" 1

         JR C, ISDIG

         CCF

        RET
ISDIG LD E, 0; counter-allowed characters in
 DIGOK INC E

         INC HL

         LD A, (HL)

         CP "0

         JR C, ENDDIG

         CP "9" 1

         JR C, DIGOK

         CP "A

         JR C, ENDDIG

         CP "Z" +1

         JP C, ERROR

         CP "a

         JR C, ENDDIG

         CP "z" +1

         JP C, ERROR

         CP 128

        JP NC, ERROR
 ENDDIG

        PUSH HL
 , Translated from the line (HL) in the number of HL '

        ...
 ENDDECOD

         EXX; entry in the list of constants

         PUSH IX

         LD IX, DIGITS

         LD E, 0

         LD A, (NUMBERS)

         AND A

         JR Z, ADDNUM; no one there, puts

        LD B, A
 DOSEAR LD A, (IX); search

         CP L

         JR NZ, NEXNUM

         LD A, (IX +1)

         CP H

         JR Z, RETNUM; found - do not add, but

                        And returns the index
 NEXNUM INC E

         INC IX, IX

        DJNZ DOSEAR
 ADDNUM LD (IX), L; ADDED th nov.konstanty

         LD (IX +1), H

         LD A, (NUMBERS); it increases their number

         INC A

        LD (NUMBERS), A
 RETNUM POP IX

         LD A, E; return index constants

         AND A

         EXX

         POP HL

        RET
; Hexadecimal constant
 HEX DEC IX, IX; to remove specific
                 ; Cator of the output string
 ISHEX LD E, -1

        DEC HL
 HEXOK INC E; count the number of characters

         INC HL

         LD A, (HL)

         CP "0

         JR C, ENDHEX

         CP "9" 1

         JR C, HEXOK

         CP "A

         JR C, ENDHEX

         CP "Z" +1

         JP C, HEXOK

         CP "a

         JR C, ENDHEX

         CP "z" +1

         JP C, HEXOK

         CP 128

        JP NC, ERROR
 ENDHEX

         LD A, E; no simv.net => error

         AND A

         JR Z, ERROR

         PUSH HL

         EXX

         LD HL, 0

         EXX

        LD A, E; return to the first character
 BAKHEX DEC HL

         DEC A

        JR NZ, BAKHEX
 DOHEX LD A, (HL); NeuStar tetrads nach.so

         SUB "0; starshih.Esli among more

         CP 10, 4 tetrads excess lost

         JR C, H1

        SUB 7
 H1 EXX

         RLA

         RLA

         RLA

         RLA

         DUP 4

         RLA

         RL L, H

         EDUP

         EXX

         INC HL

         DEC E

         JR NZ, DOHEX

        JP ENDDECOD
; Binary constant
 BIN; similarly indiscriminately

         ... , Hexadecimal constants

        JP ENDDECOD
 ERROR LD A, 2; error indication

        OUT (-2), A
 RETSP LD SP, 0; & Returns

         SCF

        RET
; Translations RPN (SCR)
 BPR_CNV

         LD (RETSP +1), SP; setting

         LD IX, DESTIN

         LD HL, BPR_BUF

         LD BC, 0

         PUSH BC; initializing the stack -

            ; Entering the marker end of the stack
 BPR_CYC LD A, (IX); read one token

         LD E, (IX +1)

         AND A

         JR Z, FINISH; no more tokens

         LD D, A

         CP DELIMITER

         JR Z, DELIMS; separator

         CP OPERATION

        JR Z, OPERA; operation
 OUT_LEX LD (HL), D; output tokens to the output

         INC HL

         LD (HL), E

        INC HL
 NEXSY INC IX, IX

        JR BPR_CYC

 DELIMS LD A, E; parsing delimiters

         AND A

         JR Z, NEXSY; blank - skip

         SUB 3; opening bracket -

         JR Z, PUSHA; entry into the stack

         DEC A

        JR Z, POPA; asm
 OPERA POP BC

         PUSH BC; element from the top of the stack

         CALL GET_PRIO; its priority

         LD C, E; current element

         LD B, D

         PUSH BC; entry into the stack (*)

         LD E, A

         CALL GET_PRIO; and its priority

         CP E; if priority is less

         JR C, NEXSY; a Record (*: already there)

         POP DE; otherwise - the current element

         POP BC; and item from the top

         LD (HL), B; top e-t - on the way out

         INC HL

         LD (HL), C

         INC HL

         JR OPERA; popup until

                , Until a token

                    ; With lower priority
 PUSHA PUSH DE; token in the stack (bracket)

        JR NEXSY
 POPA; closing bracket - pushing

         POP DE; all e-Tov to the nearest

         LD A, D; opening

         CP DELIMITER

         JR NZ, COUT

         LD A, E; opening bracket

         CP 3; also ejected

        JR Z, NEXSY; but not displayed
 COUT LD (HL), D

         INC HL

         LD (HL), E

         INC HL

        JR POPA

 FINISH POP DE; completion of translation -

         LD (HL), D; expulsion of all

         INC HL; e-Tov from the stack and output.

         LD (HL), E

         INC HL

         LD A, D

         AND A

         JR NZ, FINISH

        RET
; Take priority tokens
 GET_PRIO; BC = LEXEM, A <= PRIO

         LD A, B

         CP OPERATION

         JR Z, OP_PRIO

         CP DELIMITER

         JR NZ, OTHER

         LD B, 'DEL_PRIO

         LD A, C

         ADD A, DEL_PRIO

         LD C, A

         LD A, (BC)

        RET
 OTHER LD A, -1; the highest priority in

         RET; "extra" tokens (usually

                  And this marker end of the stack)
 OP_PRIO

         LD B, 'OPR_PRIO

         LD A, C

         ADD A, OPR_PRIO

         LD C, A

         LD A, (BC)

        RET
;------ Calculation of expression of SCR ----- EVALUATE

         LD (RETSP +1), SP; setting

        LD IX, BPR_BUF
 DOEVAL LD D, (IX)

         LD E, (IX +1)

         LD A, D

         AND A

         JR NZ, CONTEV

         POP BC; end of the expression

        RET
 CONTEV CP NUMBER

         JR NZ, NONUM

         LD D, 0 number - entry of the stack

         LD HL, DIGITS; search buffer

         EX DE, HL

         ADD HL, HL

         ADD HL, DE

         LD E, (HL)

         INC HL

        LD D, (HL)
 PUSSHA PUSH DE

         INC IX, IX

        JR DOEVAL
 NONUM CP OPERATION; there were only

                   , Operators, delimiters

        JP NZ, ERROR; in the SCR should not be!
; Vypoln.oper th on e-Tami top of the stack
, (One or 2, while at the top of the stack
 ; Lies _vtoroy_ operand

         LD A, E

         AND A

         JR NZ, SH_R

        POP BC, DE; cyclic shift to the left
 DOSHL LD A, D

         RLA

         RL E, D

         DEC BC

         LD A, B

         OR C

         JR NZ, DOSHL

        JR PUSSHA
 SH_R DEC A

         JR NZ, XOR_

         POP BC, DE; tsiklich.sdvig right

         ...

        JR PUSSHA
 XOR_ DEC A

         JR NZ, AND_

         POP BC, DE; bit XOR

         ...

        JR PUSSHA
 AND_ DEC A

         JR NZ, CPL_

         POP BC, DE; bits or

         ...

        JR PUSSHA
 CPL_ DEC A

         JR NZ, OR_

         POP DE; bitwise inversion

         ...

        JR PUSSHA
 OR_ DEC A

         JR NZ, MINUS

         POP BC, DE; bit and

         ...

        JR PUSSHA
 MINUS DEC A

         JR NZ, PLUS

         POP DE, HL; subtraction

         AND A

         SBC HL, DE

         EX DE, HL

        JR PUSSHA
 PLUS DEC A

         JR NZ, EQUAL

         POP DE, HL; addition

         ADD HL, DE

         EX DE, HL

        JR PUSSHA
 EQUAL DEC A

        JR NZ, DIVIDE
 ; Comparison of two numbers. Result: 0 or 1

         POP BC, HL

         ...

        JP PUSSHA
 DIVIDE DEC A

         JP NZ, MULT

         POP BC, DE; division with rounding

         ...

        JP PUSSHA
 MULT DEC A

         JP NZ, LOPART

         POP DE, BC; multiplication

         ...

        JP PUSSHA
 LOPART DEC A

         JR NZ, HIPART

         POP DE; the lower part of the word

         LD D, 0

        JP PUSSHA
 HIPART DEC A

         JP NZ, ERROR

         POP DE; upper part of the word

         LD E, 0

        JP PUSSHA
; Service F-function output tokens for recording.
 WRITE_LEX

        ... , Takes the words of HL, byte 0 = end
 WR_SYM ... ; Printing characters. A screen in DE

        ORG '($ -1) +1 * 256
DELIMITERS DB "#%()", 0; dividers
OPERATIONS DB "<>!'~|-+=/*.'", 0; operation
NUMBERS DB 0; number of constants
; Priorities dividers and operations
DEL_PRIO DB 0,0,0, 6,6
 OPR_PRIO DB 5,5, 4,4,0,4, 2,2, 3, 1,1, 0,0

         ORG SOURCE

        DB <expression>, 0




Other articles:

Classics - Almanashnik. Alexander Pushkin.

For Coderz - Recognition and computation of arithmetic expressions on their character record.

Inferno - The authors of the magazine.

For Coderz - the discipline to create large projects.

Interview - Questions Konstantin Sviridov (Conan) on the site zxnext.narod.ru.

Likbez - The principles of converting graphics PC-ZX.

For Coderz - Programming disc changer / drive in Scorpio.

Softinka - DNA_OS v0.431 - package of utilities for working with hard drives, RAM-drives and floppy disks.

For Coderz - Programming under DNA_OS ZET-9, a package of tools to work with storage devices.

Softinka - The problems and shortcomings package of tools to work with storage devices DNA_OS.

Likbez - details about disk formats that are FAT.

Inferno - Entered from the editor.

Inferno - Errors in the previous numbers.

For Coderz - Small programmers' tricks.

Gameland - On the new games: Oneyroid, Dizzy forever, Dridlock.

For Coderz - Writing archive. Practical principles LZ packaging.

Gameland - Passage of new shipments for the game "Black Crow".

For Coderz - Programming for the video mode 384x304.

Inferno - Letters to the Editor.

Sound - Eden Megus'a about the tracker for the AY / YM.

Inferno - On the shell.

For Coderz - Fundamentals of optimization for the processor Z80.

Likbez - The location of partitions on your hard drive.

Gamedev - 3D projection of the floor / road in the games.

Sound - Wild ideas for AY trackers.

Advertising - Ads by Roman Chuunin.

Advertising - Ads by V. Bogdanovich

For Coderz - How a large Flexible Program.

Repair - Faults Pentagon 128 + and their repair.

Inferno - Content.

Miscellaneous - Thoughts on the contest for the best software.

Others - Transfer software on ZX Spectrum with a PC.

Video - On packaging for a video ZX Spectrum.


Темы: Игры, Программное обеспечение, Пресса, Аппаратное обеспечение, Сеть, Демосцена, Люди, Программирование

Similar articles:
Chaos Construction 2001 - an interview with Stingrey and Steep from Izhevsk.
Story - Well, user, wait a minute! (Continued).
Other - Scholz bedwetter.

В этот день...   21 November