Inferno #07
31 мая 2005 |
|
For Coderz - Recognition and computation of arithmetic expressions on their character record.
Recognition and computation arithmetic expressions Vitamin/CAIG/2001 This article will be touched upon compilers, and translators, in particular, calculation of the arithmetic expression of his character record. For example, how to mainframe can evaluate the expression of type (10 +20) * 30-40? Man looking at this expression mapping, just tell the answer. And what if in the expression contains a huge number of brackets? Here even the most gifted mathematician thoughtful, though the expression itself can solve tsya very simple. A computer does not have human intelligence, as it is sad but he can only assume. That should be it learn to count such things. This we you now do. Let's start with the lexical analiza.Leksiches cue analysis is always preceded by sintaksiche Scoma analysis and is the prefix belorussian processing of expression and its perekodi derivations to a form suitable for syntactic one analysis. Lexical analysis made separated from the parsing for two reasons us: - To reduce compilation time; - The syntax of tokens is always easier than the blue taxis programs. Lexical analysis is reduced to the partition the program text into lexical units (Token) - design, acting as terminal symbols for syntactic analiza.Leksemy sometimes called symbolic Lamy or atomami.T.e., symbols "("," 10 ","+" personal computer, do not say. For him, it just data. Moreover, different lengths. And our task - to transform them into a readable computer form. This is the purpose lexical analysis. Parsing (Check on the correctness of the expression) is usually but after doing this a hundred leksicheskogo.V THIEY he will not be discussed in detail, but I will say a few words later. Token can be represented as a pair (Class value). The value can be directly rectly or submit a link at some tables, which stores value tokens. In particular, for our problem requires the use of three types tokens: delimiters, operators, and numeric konstanty.Pust we have a table again dividers and operations. Here they are: Separator Operation Index Index gap 0 <tsikl.sdvig left 0 Grating # 1> tsikl.sdvig right one % Rate 2! XOR 2 (Skobka1 3 'logical AND 3 ) Skobka2 4 ~ 4 inversion | Logical OR 5 - Subtract 6 7 + addition Comparison = 8 / 9 division * Multiplication 10 . ml.chast operand 11 'St.chast operand 12 In the value field tokens Save tsya index operation or splitter. For numeric constants as a table values, which is created in the course of analysis for. Further, we denote the token combination Niemi BukvaTsifra form, for example, the minus sign be coded as O6 (Operation 6), separator skobka1 be coded as D3 (Delimiter 3). Tokens are constants have the prefix N (Number). The task of lexical analysis is sufficient simple, the complexity can be only analysis of multi-character tokens (in our case number.) Here you can apply the method samples. The input to the program serves the characters, and She checks them for membership Class razdeliteley.Esli successfully, the output enters the corresponding token. If failure, there is a check for membership sensitivity to class operatsiy.I so dalee.Vychle equation symbolic constants is done by determine their left and right granits.Dalee constants are translated into numerical form and for rush to the table of constants. Moreover, if is a constant in the table already exists, then it not be added again, and returns the index existing constant. Note 1. Since the considered the problem is quite limited, I will not consider other classes can leksem.Eto Gut be identifiers (if using variables), keywords, tags, symbol constants (if it is input text of a higher level than assembler) etc. It should also be allocated Other multi-character structure: the same keywords, tags, and mnogoliternye operations and separators. Since we have broached the subject of languages more High-level (the problem is Recognition arithmetic NE a special case of the concepts of translation and compilation, and it is a language of high one level), it is necessary to return to the promised syntactic subject analiza.Chastichno concentration Troll the correct expression is satisfied at the level of lexical analysis. Usually a check for a valid combination of symmetry oxen in the declaration of constants, identifiers tori, etc. But the text of the program is absolutely correct in terms of vocabulary can be syntactically incorrect. In our If this includes the control sequence particular character (that operations were in the allowed combinations), the numbers in parentheses arithmetic vyrazheniyah.Zadacha syntactic electric analysis is complicated enough and decides tsya by compiling a grammar of the input language and parse the set of tokens recursive algorithm according to the grammar. In our case, we can restrict the former has been proved operations. The next step is the recognition translate the expression in the form of tokens in the formation tnuyu Polish Polish zapis.Obratnaya for pis (IPF) is a form of writing expressions and statements, distinguishing An important feature of which is that all the arguments (or operands) are before the operation (the operator). For example, the expression written in the usual bracket entries (A + d) / c + b * (e + d), being represented in the IPF receives the form: ad + c / bed + * +. RPN has received wide some spread due to its basis vnomu advantage: the expression (operator) written in the form of the SCR can be executed one view of the chain from left to right. In system programming is widely applied is handling strings of symbols, so necessary algorithms for IPF in ntsipu string processing: the transformation of input Neu-line on weekends, which is about inverse Polish notation. For the construction of the SCR can be used vyrazheniya.V tree tops of the tree is tsya operations (operators), and the leaves are are arguments. For example, for the reduced example, the tree will look like this: (+) / (/) (*) / / (+) C b (+) / / a d e d In the top of each subtree is an operation to be performed tsya posledney.Dlya conversion tree linear recording, Image de Reva, starting from the lower left is hundred, left to right. After viewing all sheet transition occurs to the root, etc. SCR finished tying with this tree, you will not difficult to calculate the conversion algorithm. But to formalize this algorithm to it was clear the car, very slozhno.Poetomu For this purpose, other sposoby.Nai better known of these algorithms is tsya Dijkstra's algorithm. It uses stack, and places the transaction. Each operator radio is assigned a priority. Except this, define the rules for work with the degree com. 1) Input string visible on the left right. 2) If the symbol is considered operand, then it is transferred to the output string, otherwise processed stack trace lows: - If the stack is empty, the operation is logged in stack. - If the stack top element (the opera tion) has a higher priority, then considered the operation of pushing tsya the stack. - If the stack top element (the opera tion) has a lower priority, then from the stack are pushed all the elements (Operation) until such time has not yet encountered titsya operation with a priority higher than have the operation in question, after which This operation is pushed onto the stack. - If the input symbol "(" he pro repel the stack. - If the input character "), then pull it lkivaet from the stack, all characters up to the nearest Shay ")." Brackets themselves mutually annihilate are expressed in the output string is not getting exist. Table of precedence of operations: Operations | Priority . '~ | 0 * / | 1 + - | 2 = | 3 '! | | 4 <> | 5 Consider the example of the algorithm: Log in | (a + d) / c + b * (e + d) <end> ------+---------------------------------- C | (+ / + * (+ * + m | (+ * (+ e | + * to | + ------+---------------------------------- Out | ad + c / bed + * + At the end of input to output transmitted to all the characters from the stack until long as the stack is not freed. As a result of we find that the output contains only operations and their arguments. This recording called back because of the operation of one hundred um after the operands (there is also a line where operation precedes the operands, but we do it not necessary). Note 2. Thus, t.e.sogla CHO Dijkstra's algorithm can be translated into SCR is not only an arithmetic expression, but and a program in Java (the languages of high levels AE). Thus it is necessary to determine the additional nye classes of tokens: a description and call the function tions, descriptions, and addressing arrays, conditional nye, and unconditional transitions. As a result, get a description of the language of low-level structures could not be better adapted lennyh further translation into machine code. Information can be found in literature on this topic. The next and final step will be to direct rectly evaluate the expression. As you probably already had to make sure the calculation SCR turns on fast and easy, holds equate in one pass. Algorithm for computing summarized as follows: 1) When is input operand (Cons tantalum), he entered into the stack. 2) If the operator is input, then stack is popped the required number of operators ndov and operations. Result for rushes back into stek.Neobhodimo remember that the operand stack is stored in reverse order. 3) After the top of the stack will will lie the answer. Here is an example of software vse.Dalee implementation of the above. Program has several features: - All numbers have 16-bit format; - The construction of the SCR are not included una rnye + and -; - Constants may be in three systems notation; - Supports all major arithmetic cal and logical; - The priority of the operations described in the table, t.e.tsiklichesky shift, for example, holds after all other operations; - Boolean operations have the same priority and are performed from left to right; - Parse realized hour partially (not control the number of brackets), so better define the correct expression, otherwise possible stack overflow. After starting the program shows on the screen, the original expression, a response record you expressions in the form of tokens and the SCR as tokens. Ed.: Full version of the program, see Annex (COUNT.H), give here only to the main fragments. ... CALL LEXIC; call leksich.analiz-ra RET C; was a mistake LD HL, DESTIN; output tokens LD DE, # 4800; character records CALL WRITE_LEX CALL BPR_CNV; translation in the SCR RET C; was a mistake LD HL, BPR_BUF; output tokens SCR LD DE, # 5000 CALL WRITE_LEX CALL EVALUATE; calculation And the result - in BC LD HL, STRIN LD E, 4 DOANSW XOR A DUP 4 RL C, B RLA EDUP ADD A, 48 CP 1958 JR C, ADOK ADD A, 7 ADOK LD (HL), A INC HL DEC E JR NZ, DOANSW LD HL, SOURCE LD DE, # 4000 CALL PRINT LD HL, STRING LD DE, # 40C0 PRINT LD A, (HL) AND A RET Z CALL WR_SYM INC HL JR PRINT STRING DB "Answer is: #" STRIN DB "0000", 0 ; Classes of tokens OPERATION = 1; operation DELIMITER = OPERATION +1; separator NUMBER = DELIMITER +1; numerical constant SOURCE = # C000; string expression DESTIN = # E000; lexical analysis BPR_BUF = # D000; RPN DIGITS = # 8000; buffer numerical constants ;---------- Lexical analysis ---------- LEXIC LD (RETSP +1), SP; sohr.adr.vozvrata LD HL, SOURCE LD IX, DESTIN XOR A; setting LD (DIGITS), A LD B, A LD C, A LEXCYC LD A, (HL); last character LD (IX), A AND A RET Z CALL IS_DELIM; a divider? JR C, NODELIM INC HL; enters LD (IX), DELIMITER LODCONT LD (IX +1), A LD B, (IX) LD C, A INC IX, IX JR LEXCYC NODELIM CALL IS_OPER; this operation? JR C, NOOPER INC HL; enters LD (IX), OPERATION JR LODCONT NOOPER CALL IS_DIGIT; it chisl.konstanta? JP C, ERROR; if not, then the error LD (IX), NUMBER JR LODCONT ; Checks belong to the STP token diff. Classes IS_DELIM EXX LD HL, DELIMITERS IS_TST LD C, 0 CONSEAR INC (HL) DEC (HL) JR Z, ENDLST CP (HL) JR NZ, NOEL LD A, C EXX RET NOEL INC L INC C JR CONSEAR ENDLST EXX SCF RET IS_OPER EXX LD HL, OPERATIONS JR IS_TST IS_DIGIT EXA LD A, B CP DELIMITER JR NZ, NODELM LD A, C; if the previous symbol = DEC A; pointer sist.schisleniya. JP Z, HEX; hex DEC A JP Z, BIN; binary NODELM EXA CP "0 RET C; if this is not the number of CP "9" 1 JR C, ISDIG CCF RET ISDIG LD E, 0; counter-allowed characters in DIGOK INC E INC HL LD A, (HL) CP "0 JR C, ENDDIG CP "9" 1 JR C, DIGOK CP "A JR C, ENDDIG CP "Z" +1 JP C, ERROR CP "a JR C, ENDDIG CP "z" +1 JP C, ERROR CP 128 JP NC, ERROR ENDDIG PUSH HL , Translated from the line (HL) in the number of HL ' ... ENDDECOD EXX; entry in the list of constants PUSH IX LD IX, DIGITS LD E, 0 LD A, (NUMBERS) AND A JR Z, ADDNUM; no one there, puts LD B, A DOSEAR LD A, (IX); search CP L JR NZ, NEXNUM LD A, (IX +1) CP H JR Z, RETNUM; found - do not add, but And returns the index NEXNUM INC E INC IX, IX DJNZ DOSEAR ADDNUM LD (IX), L; ADDED th nov.konstanty LD (IX +1), H LD A, (NUMBERS); it increases their number INC A LD (NUMBERS), A RETNUM POP IX LD A, E; return index constants AND A EXX POP HL RET ; Hexadecimal constant HEX DEC IX, IX; to remove specific ; Cator of the output string ISHEX LD E, -1 DEC HL HEXOK INC E; count the number of characters INC HL LD A, (HL) CP "0 JR C, ENDHEX CP "9" 1 JR C, HEXOK CP "A JR C, ENDHEX CP "Z" +1 JP C, HEXOK CP "a JR C, ENDHEX CP "z" +1 JP C, HEXOK CP 128 JP NC, ERROR ENDHEX LD A, E; no simv.net => error AND A JR Z, ERROR PUSH HL EXX LD HL, 0 EXX LD A, E; return to the first character BAKHEX DEC HL DEC A JR NZ, BAKHEX DOHEX LD A, (HL); NeuStar tetrads nach.so SUB "0; starshih.Esli among more CP 10, 4 tetrads excess lost JR C, H1 SUB 7 H1 EXX RLA RLA RLA RLA DUP 4 RLA RL L, H EDUP EXX INC HL DEC E JR NZ, DOHEX JP ENDDECOD ; Binary constant BIN; similarly indiscriminately ... , Hexadecimal constants JP ENDDECOD ERROR LD A, 2; error indication OUT (-2), A RETSP LD SP, 0; & Returns SCF RET ; Translations RPN (SCR) BPR_CNV LD (RETSP +1), SP; setting LD IX, DESTIN LD HL, BPR_BUF LD BC, 0 PUSH BC; initializing the stack - ; Entering the marker end of the stack BPR_CYC LD A, (IX); read one token LD E, (IX +1) AND A JR Z, FINISH; no more tokens LD D, A CP DELIMITER JR Z, DELIMS; separator CP OPERATION JR Z, OPERA; operation OUT_LEX LD (HL), D; output tokens to the output INC HL LD (HL), E INC HL NEXSY INC IX, IX JR BPR_CYC DELIMS LD A, E; parsing delimiters AND A JR Z, NEXSY; blank - skip SUB 3; opening bracket - JR Z, PUSHA; entry into the stack DEC A JR Z, POPA; asm OPERA POP BC PUSH BC; element from the top of the stack CALL GET_PRIO; its priority LD C, E; current element LD B, D PUSH BC; entry into the stack (*) LD E, A CALL GET_PRIO; and its priority CP E; if priority is less JR C, NEXSY; a Record (*: already there) POP DE; otherwise - the current element POP BC; and item from the top LD (HL), B; top e-t - on the way out INC HL LD (HL), C INC HL JR OPERA; popup until , Until a token ; With lower priority PUSHA PUSH DE; token in the stack (bracket) JR NEXSY POPA; closing bracket - pushing POP DE; all e-Tov to the nearest LD A, D; opening CP DELIMITER JR NZ, COUT LD A, E; opening bracket CP 3; also ejected JR Z, NEXSY; but not displayed COUT LD (HL), D INC HL LD (HL), E INC HL JR POPA FINISH POP DE; completion of translation - LD (HL), D; expulsion of all INC HL; e-Tov from the stack and output. LD (HL), E INC HL LD A, D AND A JR NZ, FINISH RET ; Take priority tokens GET_PRIO; BC = LEXEM, A <= PRIO LD A, B CP OPERATION JR Z, OP_PRIO CP DELIMITER JR NZ, OTHER LD B, 'DEL_PRIO LD A, C ADD A, DEL_PRIO LD C, A LD A, (BC) RET OTHER LD A, -1; the highest priority in RET; "extra" tokens (usually And this marker end of the stack) OP_PRIO LD B, 'OPR_PRIO LD A, C ADD A, OPR_PRIO LD C, A LD A, (BC) RET ;------ Calculation of expression of SCR ----- EVALUATE LD (RETSP +1), SP; setting LD IX, BPR_BUF DOEVAL LD D, (IX) LD E, (IX +1) LD A, D AND A JR NZ, CONTEV POP BC; end of the expression RET CONTEV CP NUMBER JR NZ, NONUM LD D, 0 number - entry of the stack LD HL, DIGITS; search buffer EX DE, HL ADD HL, HL ADD HL, DE LD E, (HL) INC HL LD D, (HL) PUSSHA PUSH DE INC IX, IX JR DOEVAL NONUM CP OPERATION; there were only , Operators, delimiters JP NZ, ERROR; in the SCR should not be! ; Vypoln.oper th on e-Tami top of the stack , (One or 2, while at the top of the stack ; Lies _vtoroy_ operand LD A, E AND A JR NZ, SH_R POP BC, DE; cyclic shift to the left DOSHL LD A, D RLA RL E, D DEC BC LD A, B OR C JR NZ, DOSHL JR PUSSHA SH_R DEC A JR NZ, XOR_ POP BC, DE; tsiklich.sdvig right ... JR PUSSHA XOR_ DEC A JR NZ, AND_ POP BC, DE; bit XOR ... JR PUSSHA AND_ DEC A JR NZ, CPL_ POP BC, DE; bits or ... JR PUSSHA CPL_ DEC A JR NZ, OR_ POP DE; bitwise inversion ... JR PUSSHA OR_ DEC A JR NZ, MINUS POP BC, DE; bit and ... JR PUSSHA MINUS DEC A JR NZ, PLUS POP DE, HL; subtraction AND A SBC HL, DE EX DE, HL JR PUSSHA PLUS DEC A JR NZ, EQUAL POP DE, HL; addition ADD HL, DE EX DE, HL JR PUSSHA EQUAL DEC A JR NZ, DIVIDE ; Comparison of two numbers. Result: 0 or 1 POP BC, HL ... JP PUSSHA DIVIDE DEC A JP NZ, MULT POP BC, DE; division with rounding ... JP PUSSHA MULT DEC A JP NZ, LOPART POP DE, BC; multiplication ... JP PUSSHA LOPART DEC A JR NZ, HIPART POP DE; the lower part of the word LD D, 0 JP PUSSHA HIPART DEC A JP NZ, ERROR POP DE; upper part of the word LD E, 0 JP PUSSHA ; Service F-function output tokens for recording. WRITE_LEX ... , Takes the words of HL, byte 0 = end WR_SYM ... ; Printing characters. A screen in DE ORG '($ -1) +1 * 256 DELIMITERS DB "#%()", 0; dividers OPERATIONS DB "<>!'~|-+=/*.'", 0; operation NUMBERS DB 0; number of constants ; Priorities dividers and operations DEL_PRIO DB 0,0,0, 6,6 OPR_PRIO DB 5,5, 4,4,0,4, 2,2, 3, 1,1, 0,0 ORG SOURCE DB <expression>, 0
Other articles:
Similar articles:
В этот день... 21 November