diff(3m) MBA Library Functions diff(3m) NAME diff - compute a shortest edit script (SES) given two sequences SYNOPSIS#include <mba/diff.h>typedef const void *(*idx_fn)(const void *s, int idx, void *context);typedef int (*cmp_fn)(const void *e1, const void *e2, void *context);typedef enum {DIFF_MATCH = 1,DIFF_DELETE,DIFF_INSERT} diff_op;struct diff_edit {short op; /* DIFF_MATCH, DIFF_DELETE or DIFF_INSERT */int off; /* off into a if MATCH or DELETE, b if INSERT */int len;};int diff(const void *a,intaoff,intn,const void *b,intboff,intm,idx_fnidx,cmp_fncmp,void *context,intdmax,struct varray *ses,int *sn,struct varray *buf);DESCRIPTION Thediff(3m) module will compute theshortest edit script(SES) of two sequences. This algorithm is perhaps best known as the one used in GNU diff(1) although GNUdiffemploys additional optimizations specific to line oriented input such as source code files whereas this implementation is more generic. Formally, this implementation of the SES solution uses the dynamic programming algorithm described by Myers [1] with the Hirschberg linear space refinement. The objective is to compute the minimum set of edit operations necessary to transform a sequence A of length N into B of length M. This can be performed in O(N+M*D^2) expected time where D is theedit distance(number of elements deleted and inserted to transform A into B). Thus the algorithm is particularly fast and uses less space if the difference between sequences is small. The interface is generic such that sequences can be anything that can be indexed and compared with user supplied functions including arrays of structures, linked lists, arrays of pointers to strings in a file, etc. [1] E. Myers, ``An O(ND) Difference Algorithm and Its Variations,'' Algorithmica 1, 2 (1986), 251-266. http://www.cs.arizona.edu/people/gene/PAPERS/diff.psdiffThediff(3m) function computes the minimumedit distancebetween the sequencesa(fromaofffor lengthn) andb(frombofffor lengthm) and optionally collects theedit scriptnecessary to transformaintobstoring the result in thevarrayidentified by thesesparameter. Theidxparemeter is a pointer to anidx_fnthat returns the element at the numeric index in a sequence. Thecmpparameter is a pointer to acmp_fnfunction that returns zero if the two elementse1ande2are equal and non-zero otherwise. The suppliedcontextparameter will be passed directly to both functions with each invokation. IfidxandcmpareNULLaandbwill be compared as continuous sequences ofunsigned char(i.e. raw memory diff). If thesesparameter is notNULLit must be avarray(3m) with amembsizeofsizeof(struct diff_edit). Eachstruct diff_editelement in thevarray(3m) starting from 0 will be populated with theop,off, andlenthat together constitute the edit script. The number ofstruct diff_editelements in the edit script is written to the integer pointed to by thesnparameter. If thesesorsnparameter isNULL, the edit script will not be collected. If thedmaxparameter is not 0, the calculation will stop as soon as it is determined that the edit distance of the two sequences equals or exceeds the specified value. A value of 0 indicates that there is no limit. If thebufparameter is notNULLit must be avarray(3m) withmembsizeofsizeof(int)and will be used as temporary storage for the dynamic programming tables. IfbufisNULLstorage will be temporarily allocated and freed with malloc(3) and free(3). RETURNSdiffThediff(3m) function returns the edit distance between the two sequencesaandbordmaxif the edit distance has been determined to meet or exceed the specifieddmaxvalue. If an error occurs -1 is returned anderrnois set to indicate the error. libmba-0.9.1 Apr 29, 2005 diff(3m)