kodit Who's there? - An approximate search for a given sequence of bytes!


    (C) Wocen / Triumph

    -----------------

   Approximate search for a given sequence of bytes!


   Recently, while cracking some protection or other, I ran
into a small problem. I had to find a particular string (a
sequence of bytes). So what's the problem, you will say, dear
reader: compare byte by byte all the way to the result. The
problem was that I was not 100% sure the given sequence was
present in the original file at all. So I came up with a small
idea: write a search for the sequence closest to the given one.
The method is very simple.



   1) Take the source text:

Source text: I'am wocen!
Hex: #49, #27, #61, #6D, #20, #77, #6F, #63, #65, #6E, #21



   2) Specify the sequence you want to find:

Given sequence: Wocen
In hexadecimal: #57, #6F, #63, #65, #6E


   Obviously, the simplest byte-by-byte comparison will not
work here: the text contains "wocen" with a lowercase "w"
(#77), not "Wocen" (#57).
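For comparison, here is the exact search expressed in Python (used only as a neutral notation for these sketches; the author's own code is Z80 assembly). It merely confirms that a plain byte-by-byte search cannot find "Wocen" in this text:

```python
text = b"I'am wocen!"   # source text from step 1
pattern = b"Wocen"      # given sequence from step 2

# bytes.find returns -1 when there is no exact occurrence.
exact = text.find(pattern)
print(exact)  # -1
```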


   3) Take the given string and XOR it with the text (starting
from the beginning of the text). We get:



   #57 #6F #63 #65 #6E
XOR
   #49 #27 #61 #6D #20
 = ------------------
   #1E #48 #02 #08 #4E
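Step 3 as a minimal Python sketch (Python is only the notation here; `xor_window` is a hypothetical helper name, and the text and pattern are the ones from steps 1 and 2). Every bit that ends up set to 1 marks a mismatching bit:

```python
text = b"I'am wocen!"
pattern = b"Wocen"

def xor_window(text: bytes, pattern: bytes, offset: int) -> list:
    """XOR each pattern byte with the text byte at the same place in the window."""
    window = text[offset:offset + len(pattern)]
    return [p ^ t for p, t in zip(pattern, window)]

diff = xor_window(text, pattern, 0)
print([f"#{b:02X}" for b in diff])  # ['#1E', '#48', '#02', '#08', '#4E']
```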


   4) Add up the number of bits that remain set to one!


  #1E = 4 bits
+ #48 = 2 bits
+ #02 = 1 bit
+ #08 = 1 bit
+ #4E = 4 bits
= -------------
       12 bits set to one = 12 errors.
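Step 4 as a Python sketch: count the one bits in each XOR byte and add them up. The `ones` helper is illustrative, not part of the original code:

```python
diff = [0x1E, 0x48, 0x02, 0x08, 0x4E]  # XOR result from step 3

def ones(byte: int) -> int:
    """Number of bits set to 1 in a byte."""
    return bin(byte).count("1")

per_byte = [ones(b) for b in diff]
errors = sum(per_byte)
print(per_byte, errors)  # [4, 2, 1, 1, 4] 12
```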


   5) Store the address in the text where the comparison was
made, and keep the number of errors = 12.


   Next, we repeat the same operations, but starting not at the
beginning of the text but at offset 1. That is:


   #57 #6F #63 #65 #6E
XOR
   #27 #61 #6D #20 #77
 = ------------------
   #70 #0E #0E #45 #19

#70 = 3, #0E = 3, #0E = 3, #45 = 3, #19 = 3: 15 bits (of ones)



   6) Now we compare: if the number of ones is less than the
saved one, this position is more likely to hold the sequence we
are after, so we save its address in the text and the number of
errors. Here 15 is more than 12, so the result for offset 0 stays.
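A Python sketch of steps 5 and 6 for the first two offsets, assuming a hypothetical `errors_at` helper that combines the XOR and the bit count. A position replaces the saved one only if its error count is strictly smaller:

```python
text = b"I'am wocen!"
pattern = b"Wocen"

def errors_at(offset: int) -> int:
    """Bit errors between the pattern and the text window starting at offset."""
    window = text[offset:offset + len(pattern)]
    return sum(bin(p ^ t).count("1") for p, t in zip(pattern, window))

best_addr, best_err = 0, errors_at(0)  # step 5: remember offset 0 and its 12 errors
e1 = errors_at(1)                      # same operations at offset 1
if e1 < best_err:                      # step 6: keep only a strictly smaller count
    best_addr, best_err = 1, e1
print(e1, best_addr, best_err)  # 15 0 12
```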


   So we go through the entire text. In more detail, look at
the moment when the address in the text points at the string
"wocen", which differs from the given "Wocen" only in the first
letter:


   #57 #6F #63 #65 #6E = "Wocen"
XOR
   #77 #6F #63 #65 #6E = "wocen"
 = ------------------
   #20 #00 #00 #00 #00


   Number of errors = 1, because only 1 bit is set to one. Since
the number of errors is less than the one saved so far, we save
the address in the text and the number of errors.


   When we reach the end of the text, we take the address found
during the scan and hand it to the user!
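The whole procedure as a short Python sketch (`approx_find` is a hypothetical name, not from the original). Like the description, it keeps the earlier position on ties and stops as soon as it finds a window with zero errors; the starting value is set above the worst possible count, playing the role of the initial 255:

```python
def approx_find(text: bytes, pattern: bytes) -> tuple:
    """Return (offset, bit_errors) of the text window closest to pattern."""
    n = len(pattern)
    best_off, best_err = -1, 8 * n + 1  # start above the worst possible count
    for off in range(len(text) - n + 1):
        window = text[off:off + n]
        err = sum(bin(p ^ t).count("1") for p, t in zip(pattern, window))
        if err < best_err:              # strictly smaller: ties keep the earlier offset
            best_off, best_err = off, err
            if err == 0:                # exact match: no point scanning further
                break
    return best_off, best_err

print(approx_find(b"I'am wocen!", b"Wocen"))  # (5, 1)
```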



   Where else can this be used?


   Yes, anywhere. For example, in the assembler itself: not
instead of byte comparison, of course, but in addition to it (as
a bonus ;-)). From my own experience I know that once there are
already 4 source files, the label you need gets forgotten, and
remembering its exact / full name is beyond you. This is where
approximate search is specifically useful.


   Approximate search can also be used in programs that convert
a 'screen' into 'text'. In my subjective view, the best of the
existing programs of this kind is "Screen to text transformer"
by Death Moroz / Shales. I have a couple of similar programs as
well, but they are so wretched that it is better to keep quiet
about them. "Screen to text transformer" uses ordinary byte
comparison, and when I had to use it, it was quite unpleasant to
teach the program that a bold letter "A" is the same "A" without
bold, and so on for almost every character. Had it used the
bitwise method, you would hardly have had to touch the keyboard
to teach the program at all. Although some letters might get
swapped for others :-) So if in the future someone wants to write
the very best picture-to-text converter, let them take a few
points into account:



   1 - It must handle both color and b/w screens, with color
       written as code 16 (#10) followed by the color.

   2 - Files should be loaded by tagging them.

   3 - There should be 2-3 comparison fonts for each font size,
       i.e. for the 3 sizes 4x8, 6x8 and 8x8. You could also add
       3 fonts of size 5x8.

   4 - A 16 KB buffer for the recognized text, with control
       variables added.

   5 - A mouse is a must.

   6 - And finally, besides a single font of one fixed size on
       the picture, it should be possible to handle mixed sizes.
       How do you tell which size a character is printed in?
       Elementary! Look at a vertical column of pixels: if it is
       empty with 90% probability, we can assume it is the border
       of a character. Then we cut off the preceding character,
       look at its width and search for it among the fonts of the
       current size.
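The bitwise idea carries over directly to screen-to-text conversion: treat an 8x8 character cell as 8 bytes and pick the font glyph with the fewest differing bits. This is a Python sketch with hypothetical glyph bitmaps; `FONT`, `bit_errors` and `recognise` are illustrative names, not taken from any existing converter. A bold "A" still comes out as "A":

```python
# Hypothetical 8x8 glyphs, one byte per pixel row (bit 7 = leftmost pixel).
FONT = {
    "A": bytes([0x00, 0x3C, 0x42, 0x42, 0x7E, 0x42, 0x42, 0x00]),
    "I": bytes([0x00, 0x3E, 0x08, 0x08, 0x08, 0x08, 0x3E, 0x00]),
}

def bit_errors(a: bytes, b: bytes) -> int:
    """Number of pixels that differ between two 8-byte character cells."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def recognise(cell: bytes, font: dict) -> tuple:
    """Return (character, errors) for the glyph closest to the cell."""
    scores = {ch: bit_errors(cell, glyph) for ch, glyph in font.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]

# A bold "A": every row of the plain "A" ORed with itself shifted right by one.
bold_a = bytes(row | (row >> 1) for row in FONT["A"])
print(recognise(bold_a, FONT))  # ('A', 10)
```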



   And now here is the source of the approximate search, with
comments. I wrote it in Alasm, using some of its specific
directives.

;----------------------------------------------------------;
; WRITTEN BY WOCEN / ORION / TRIUMPH - May 1998
; Approximate search speed, for a given string of:
;   5 BYTES: 3270 BYTES/s (no turbo), 4680 BYTES/s (Scorpion turbo)
;  10 BYTES: 1630 BYTES/s (no turbo), 2520 BYTES/s (turbo)
;  20 BYTES:  850 BYTES/s (no turbo), 1260 BYTES/s (turbo)
;----------------------------------------------------------;

        ORG #8000

ADR_TXT EQU #0000       ; START ADDRESS OF THE SEARCH
LEN_TXT EQU 32768       ; LENGTH OF THE SEARCH AREA
LEN     EQU 5           ; LENGTH OF THE GIVEN STRING

        DI
        XOR A
        DEC A           ; SET THE START VALUE TO THE MAXIMUM:
        LD (OLD+1),A    ; ERROR = 255 (#FF)
        LD BC,LEN_TXT   ; LENGTH OF THE SEARCH (KEPT IN BC')
        EXX

        LD DE,ADR_TXT   ; INITIAL SEARCH ADDRESS
L1      LD B,LEN        ; LENGTH OF THE SEQUENCE
        LD HL,TEXT      ; START OF THE SEQUENCE
        PUSH DE         ; SAVE THE CURRENT ADDRESS IN THE TEXT
L2      LD A,(DE)       ; DATA BYTE
        XOR (HL)        ; COMPARE: EVERY 1 BIT IS A MISMATCH
ERRORS  LD C,0          ; NUMBER OF ERRORS SO FAR (SELF-MODIFIED)

; The DUP directive repeats the code that follows it, up to the
; EDUP directive, the number of times given after DUP.
; Unrolling the loop here speeds up, a little, one of the most
; time-consuming parts of the routine.

        DUP 8           ; CHECK ALL 8 BITS
        RRA             ; BIT = 0? (I.E. A MATCH)
        JR NC,$+2+1     ; JUMP OVER INC C IF IT MATCHES
        INC C           ; OTHERWISE INCREASE THE NUMBER OF
        EDUP            ; MISMATCHES (ERRORS)

        LD A,C
        LD (ERRORS+1),A ; KEEP THE RUNNING ERROR COUNT
        INC HL          ; NEXT ADDRESS IN THE GIVEN STRING
        INC DE          ; NEXT ADDRESS IN THE TEXT
        DJNZ L2         ; REPEAT
        POP DE          ; RESTORE THE ADDRESS IN THE TEXT

OLD     LD A,#FF        ; OLD RESULT (SELF-MODIFIED)
        CP C            ; OLD LESS THAN NEW?
        JR C,L3         ; YES!
        JR Z,L3         ; EQUAL! => KEEP THE OLD RESULT
        LD A,C          ; TAKE THE NEW RESULT
        LD (OLD+1),A    ; SAVE IT
        LD (ADRES+1),DE ; SAVE ITS ADDRESS IN THE TEXT
        AND A           ; NEW RESULT MATCHED WITHOUT ERRORS?
        JR Z,ADRES      ; GO IF YES!

; A piece of code can be inserted here

L3      XOR A
        LD (ERRORS+1),A ; ZERO THE ERROR COUNT
        INC DE          ; NEXT ADDRESS IN THE TEXT

        EXX
        DEC BC          ; DECREASE THE REMAINING
        LD A,B          ; LENGTH
        OR C
        EXX
        JR NZ,L1        ; GO IF NOT FINISHED

ADRES   LD HL,0         ; THE FOUND ADDRESS IN THE TEXT ENDS UP HERE
        EI              ; RE-ENABLE INTERRUPTS DISABLED AT THE START
        LD A,(OLD+1)    ; AND THE NUMBER OF ERRORS HERE!
        RET

TEXT    DEFB "Wocen"    ; THE GIVEN STRING
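To see why the `DUP 8` / `RRA` / `JR NC` / `INC C` block counts exactly the one bits, here is a Python model of it (an emulation for illustration only; `count_ones_rra` is a hypothetical name). `RRA` rotates A right through the carry flag, so after each rotation the carry holds the next original bit, starting from bit 0; eight rotations inspect every original bit exactly once:

```python
def count_ones_rra(a: int) -> int:
    """Model of DUP 8 / RRA / JR NC,$+2+1 / INC C for one byte in A."""
    carry = 0  # XOR (HL) just before the block always clears the carry flag
    c = 0      # mismatching bits counted for this byte
    for _ in range(8):               # DUP 8 ... EDUP
        bit0 = a & 1
        a = (a >> 1) | (carry << 7)  # RRA: the old carry goes into bit 7
        carry = bit0                 # and bit 0 goes into the carry
        if carry:                    # JR NC skips INC C when the carry is clear
            c += 1                   # INC C
    return c

# Every value 0..255 gives the same count as a plain popcount.
assert all(count_ones_rra(x) == bin(x).count("1") for x in range(256))
print(count_ones_rra(0x1E))  # 4
```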



   As you can see from the source, the routine exits only on a
100% match or once all of the specified memory has been scanned.
Therefore, we can insert the following piece of code before the
label 'L3':


ERROR   EQU 4           ; ERROR LIMIT: IF FEWER ERRORS THAN THIS
                        ; OCCURRED, WE ASSUME THE MATCH IS GOOD
                        ; ENOUGH. THIS SETS THE SENSITIVITY
        ...
        CP ERROR        ; CHECK AGAINST THE ERROR LIMIT
        JR C,ADRES      ; IF LESS THAN THE GIVEN LIMIT, EXIT
L3      ...             ; THE PROGRAM CONTINUES
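The effect of the `ERROR` limit, sketched in Python (`approx_find_limit` is a hypothetical name; the default limit of 4 mirrors `ERROR EQU 4`). As in the inserted assembly, the check happens only when a new best result is saved, so the search stops at the first position that beats the previous best and has fewer errors than the limit:

```python
ERROR = 4  # mirrors ERROR EQU 4: fewer errors than this counts as "good enough"

def approx_find_limit(text: bytes, pattern: bytes, limit: int = ERROR) -> tuple:
    """Best-match scan that stops at the first new best below the limit."""
    n = len(pattern)
    best_off, best_err = -1, 8 * n + 1
    for off in range(len(text) - n + 1):
        err = sum(bin(p ^ t).count("1") for p, t in zip(pattern, text[off:off + n]))
        if err < best_err:
            best_off, best_err = off, err
            if err == 0 or err < limit:  # JR Z,ADRES / CP ERROR + JR C,ADRES
                break
    return best_off, best_err

print(approx_find_limit(b"I'am wocen! Wocen", b"Wocen"))           # (5, 1)
print(approx_find_limit(b"I'am wocen! Wocen", b"Wocen", limit=1))  # (12, 0)
```

With the default limit, the slightly different "wocen" at offset 5 is accepted before the exact "Wocen" at offset 12 is ever reached; with `limit=1` only an exact match ends the search early.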



   Of course, this code makes no claim to coolness, and it can
probably be sped up by reading bytes from memory via the stack
and perhaps by replacing all the 'JR's with 'JP's.



   A reference to the author is required when reprinting and
desirable when using the code.



              ---=== Nourishing food to you ===---