
libxdiff and patience diff

libxdiff and patience diff

Pierre Habouzit-4
Hi Davide,

I've been working tonight, trying to make libxdiff support the patience
diff algorithm, but I've totally failed, because I _thought_ I
understood what xdl_split was doing, but it appears I don't.


[ For the readers playing at home, the patience diff algorithm is
  explained after my sig. ]


What I did is to:
(1) add a flag to the xenv in xdl_split that says that I want a
    patience "split".
(2) Then in xdl_split, if that bit is set, I compute the longest common
    subsequence from the patience diff.
(3) For each split it computes, I call xdl_recs_cmp on that interval.


What I thought it would achieve is that I force some boundaries at which
libxdiff _must_ resync. Though, it seems that for some reason it doesn't
work, probably because the "ha" stuff and the boundaries absolutely
don't work the way I thought they did.

So where is the place where I should do that? I suspect it should be
partly in xprepare.c, but I'm a bit stuck right now.


Any pointer on how the stuff in xprepare.c and xdiffi.c works would help
greatly; it's really not self-evident to me :)


--
·O·  Pierre Habouzit
··O                                                [hidden email]
OOO                                                http://www.madism.org



Patience diff
=============

Basically, it's an algorithm that considers lines from the left file A
and the right file B.

It searches for lines that are unique *IN* A and unique *IN* B as well.
Then, using a clever O(n log log n) algorithm (where n is that number of
lines), you extract the longest common subsequence of those lines.

This defines matching splits of files A and B. For each pair of
corresponding chunks, you then trim the lines that match at the
beginning and the end, and reiterate the algorithm on the interior of
that span, until there are _no_ unique lines left, at which point you
apply a usual diff algorithm to generate the diff in that final section.
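The "clever" extraction step above is patience sorting: walking the
unique pairs in order of their position in B, each pair goes onto the
leftmost stack whose top has a larger A-position, and the number of
stacks at the end is the length of the common subsequence. A minimal
sketch of that step, using binary search over the stack tops (this is
the O(n log n) variant; the O(n log log n) bound needs a fancier search
structure). Names and layout are mine, not libxdiff's:

```c
#include <stdlib.h>

/* One line that is unique in both files: its position in A and in B. */
struct upair { long a, b; };

/* Given pairs sorted by their position in B, return the length of the
 * longest subsequence whose positions in A are also increasing -- the
 * common subsequence patience diff syncs on. */
static long patience_lcs_len(const struct upair *p, long n)
{
	long *tops = malloc(n * sizeof(long)); /* top A-index of each stack */
	long nstacks = 0, i;

	for (i = 0; i < n; i++) {
		/* find the leftmost stack whose top is >= p[i].a */
		long lo = 0, hi = nstacks;
		while (lo < hi) {
			long mid = (lo + hi) / 2;
			if (tops[mid] < p[i].a)
				lo = mid + 1;
			else
				hi = mid;
		}
		tops[lo] = p[i].a;      /* place the card */
		if (lo == nstacks)
			nstacks++;      /* it opened a new stack */
	}
	free(tops);
	return nstacks;
}
```

Keeping a back-pointer to the top of the stack to the left at insertion
time would let you recover the subsequence itself, not just its length.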

http://alfedenzo.livejournal.com/170301.html has a nice visual
explanation of the algorithm, even if it omits the trimming of matching
"zones" that helps efficiency. It's also often wrong to generate the
stacks in the order shown on the blog, because you recurse from the max
line down to the min line, which is not the natural order to work on a
diff. But that's just a matter of "inverting" all the comparisons,
which is pretty obvious to do.


The difference in output can be seen at http://glandium.org/blog/?p=120
where the patience diff picks the line
   "include $(topsrcdir)/config/rules.mk"
as being unique in both the left and the right file, and hence uses it
as a sync point rather than including it in a diff hunk.



Re: libxdiff and patience diff

Davide Libenzi
On Tue, 4 Nov 2008, Pierre Habouzit wrote:

> Hi Davide,
>
> I've been working tonight, trying to make libxdiff support the patience
> diff algorithm, but I've totally failed, because I _thought_ I
> understood what xdl_split was doing, but it appears I don't.
>
>
> [ For the readers playing at home, the patience diff algorithm is
>   explained after my sig. ]
>
>
> What I did is to:
> (1) add a flag to the xenv in xdl_split that says that I want a
>     patience "split".
> (2) Then in xdl_split, if that bit is set, I compute the longest common
>     subsequence from the patience diff.
> (3) for each split it computes I call xdl_recs_cmp on that interval.
>
>
> What I thought it would achieve is that I force some boundaries at which
> libxdiff _must_ resync. Though, it seems that for some reason it doesn't
> work, probably because the "ha" stuff and the boundaries absolutely
> don't work the way I thought it did.
>
> So where is the place I should do that ? I suspect it should be
> partly in xprepare.c but I'm a bit stuck right now.
>
>
> Any pointer on how the stuff in xprepare.c and xdiffi.c work would help
> greatly, it's really not self-evident to me :)

What makes you think it'd be self-evident to me? :)
Seriously, I forgot stuff I wrote only last month; this is way beyond my
memory limits.
You definitely need to look at xprepare.c, especially at xdl_trim_ends()
and xdl_cleanup_records(). Lines are re-arranged in indexes, and this
might screw up your logic if you're not prepared for it.
What xdl_split() does, is find the start of an LCS and return the
coordinate. Then xdl_recs_cmp() does the box reducing (first two "for"
loops) and different-lines marking (first and second "if").
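The "box reducing" mentioned here is just trimming the common prefix and
suffix of the two hashed-line ranges, so the real diff only looks at the
interior. A sketch of the idea (a hypothetical helper, not libxdiff's
actual code):

```c
/* Shrink the comparison window [off, lim) on both sides by trimming
 * lines whose hashes match at the top and at the bottom of the box. */
static void trim_box(const unsigned long *ha1, long *off1, long *lim1,
		     const unsigned long *ha2, long *off2, long *lim2)
{
	/* matching lines at the top of the box */
	while (*off1 < *lim1 && *off2 < *lim2 && ha1[*off1] == ha2[*off2]) {
		(*off1)++;
		(*off2)++;
	}
	/* matching lines at the bottom of the box */
	while (*off1 < *lim1 && *off2 < *lim2 &&
	       ha1[*lim1 - 1] == ha2[*lim2 - 1]) {
		(*lim1)--;
		(*lim2)--;
	}
}
```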



- Davide


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: libxdiff and patience diff

Johannes Schindelin
In reply to this post by Pierre Habouzit-4
Hi,

On Tue, 4 Nov 2008, Pierre Habouzit wrote:

> I've been working tonight, trying to make libxdiff support the patience
> diff algorithm, but I've totally failed, because I _thought_ I
> understood what xdl_split was doing, but it appears I don't.

I thought about it, too, at the GitTogether, although I want to finish my
jGit diff first.

The main idea I had about patience diff is that you can reuse a lot of
functions in libxdiff.

But the requirement of taking just unique lines/hashes into account, and
_after_ splitting up, _again_ determine uniqueness _just_ in the
between-hunk part made me think that it may be wiser to have a separate
function for the patience diff stuff.

Basically, you would start to have libxdiff split the lines and hash them
as normal, but then determine the unique hashes (I'd start with the
smaller text, just to have a better chance to end up with unique hashes).

Once they are determined, you can search for those exact lines (hash
first) in the post-document.

Once that is done, you'd have to find the longest common subsequence,
which you could do using the existing infrastructure, but that would
require more work (as we already know the lines are unique).

After that, you would have to recurse to the same algorithm _between_
known chunks.  Eventually, that would have to resort to classical libxdiff
(if there are no, or not enough, unique lines).
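The recursion structure described above can be sketched as a driver that
walks the matched unique lines and hands each in-between gap to a
recursive (or fall-back) diff. All names here are mine, and the sketch
assumes the sync points have already been computed:

```c
#include <stddef.h>

/* One matched unique line: its position in file A and in file B. */
struct sync { long i1, i2; };

/* Per-gap worker: in real code this would re-run patience on the gap,
 * eventually falling back to the classic algorithm. */
typedef void (*gap_fn)(long off1, long lim1, long off2, long lim2, void *st);

static void diff_between_syncs(const struct sync *s, long nsync,
			       long n1, long n2, gap_fn fallback, void *st)
{
	long i1 = 0, i2 = 0, k;

	for (k = 0; k < nsync; k++) {
		if (i1 < s[k].i1 || i2 < s[k].i2)   /* non-empty gap */
			fallback(i1, s[k].i1, i2, s[k].i2, st);
		i1 = s[k].i1 + 1;                   /* skip the matched line */
		i2 = s[k].i2 + 1;
	}
	if (i1 < n1 || i2 < n2)                     /* trailing gap */
		fallback(i1, n1, i2, n2, st);
}

/* tiny recorder used for demonstration: counts lines left in gaps */
static long gap_count;
static void count_gap(long off1, long lim1, long off2, long lim2, void *st)
{
	(void)st;
	gap_count += (lim1 - off1) + (lim2 - off2);
}
```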

Ciao,
Dscho



Re: libxdiff and patience diff

Pierre Habouzit-4
On Tue, Nov 04, 2008 at 05:39:48AM +0000, Johannes Schindelin wrote:

> Hi,
>
> On Tue, 4 Nov 2008, Pierre Habouzit wrote:
>
> > I've been working tonight, trying to make libxdiff support the patience
> > diff algorithm, but I've totally failed, because I _thought_ I
> > understood what xdl_split was doing, but it appears I don't.
>
> I thought about it, too, at the GitTogether, although I want to finish my
> jGit diff first.
>
> The main idea I had about patience diff is that you can reuse a lot of
> functions in libxdiff.
>
> But the requirement of taking just unique lines/hashes into account, and
> _after_ splitting up, _again_ determine uniqueness _just_ in the
> between-hunk part made me think that it may be wiser to have a separate
> function for the patience diff stuff.
>
> Basically, you would start to have libxdiff split the lines and hash them
> as normal, but then determine the unique hashes (I'd start with the
> smaller text, just to have a better chance to end up with unique hashes).
>
> Once they are determined, you can search for those exact lines (hash
> first) in the post-document.
Actually my current implementation just puts all the hashes into an
array, together with the position in the left or right file each comes
from (the struct has the left pos set to -1 if the right pos is set,
and vice versa), and sorts them by hash (which is O(n log n)), with
ties ordered by increasing right position.

Then I scan the array to find runs of exactly two consecutive equal
hashes, and collapse each run into the proper {left pos, right pos}
tuple if it was indeed a unique line in both files.

This results in an array that I then sort again by right pos; we can
work on that for the stack sorting, which I do, and then I have my LCS.
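The sort-and-collapse step described here can be sketched as follows (a
simplified stand-alone version, not the code attached in this thread;
struct and function names are mine):

```c
#include <stdlib.h>

/* One record per line: its hash, and its position in the left (off1)
 * or right (off2) file; the unused position is -1. */
struct hrec { unsigned long ha; long off1, off2; };

/* Sort by hash; for equal hashes, by increasing off2, so the left-file
 * record (off2 == -1) of a unique pair comes first. */
static int cmp_hrec(const void *va, const void *vb)
{
	const struct hrec *a = va, *b = vb;
	if (a->ha != b->ha)
		return a->ha < b->ha ? -1 : 1;
	return (a->off2 > b->off2) - (a->off2 < b->off2);
}

/* Keep only hashes that appear exactly twice, once from each side.
 * Collapses recs in place and returns the number of pairs kept. */
static long collapse_uniques(struct hrec *recs, long n)
{
	long i = 0, uniq = 0;

	qsort(recs, n, sizeof(*recs), cmp_hrec);
	while (i < n) {
		long j = i + 1;
		while (j < n && recs[j].ha == recs[i].ha)
			j++;
		/* run of exactly two, one record from each file? */
		if (j - i == 2 && recs[i].off1 >= 0 && recs[i + 1].off2 >= 0) {
			recs[uniq].ha = recs[i].ha;
			recs[uniq].off1 = recs[i].off1;
			recs[uniq].off2 = recs[i + 1].off2;
			uniq++;
		}
		i = j;
	}
	return uniq;
}
```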


This is the complete brute-force algorithm. It requires a temporary
array whose size is the number of lines on the left plus the right, and
a temporary array for the stacks which _may_, in the worst case, end up
roughly as large as the smaller of the two files' line counts, I'd say.

Then I just remember a list of split points, and I recurse into all the
sub-splits again. It has a fixed point which may or may not need
libxdiff recursion in it.

This code is actually written; it's naive and unoptimized, but it
doesn't work, probably because I didn't plug it in at the proper
place :)

> Once that is done, you'd have to find the longest common subsequence,
> which you could do using the existing infrastructure, but that would
> require more work (as we already know the lines are unique).

Patience diff gives you the algorithm to do that; it's pretty simple,
and is more efficient than the current infrastructure (in time; I don't
know about space, though).

> After that, you would have to recurse to the same algorithm _between_
> known chunks.  Eventually, that would have to resort to classical libxdiff
> (if there are no, or not enough, unique lines).

Yeah, that's the point. The problem is, I believe more and more that I
should prepare the LCS from patience diff in xprepare.c, but I grok
absolutely nothing of what chastore_t and similar stuff is. I
understand it's about hashing, but what exactly it does eludes me. In
fact, when I look at the records I have in xdiffi.c I had the
impression they were already somehow collapsed, which would make it too
late a point to apply the patience diff ...

--
·O·  Pierre Habouzit
··O                                                [hidden email]
OOO                                                http://www.madism.org


Re: libxdiff and patience diff

Pierre Habouzit-4
In reply to this post by Davide Libenzi
On Tue, Nov 04, 2008 at 03:17:37AM +0000, Davide Libenzi wrote:

> On Tue, 4 Nov 2008, Pierre Habouzit wrote:
>
> > Hi Davide,
> >
> > I've been working tonight, trying to make libxdiff support the patience
> > diff algorithm, but I've totally failed, because I _thought_ I
> > understood what xdl_split was doing, but it appears I don't.
> >
> >
> > [ For the readers playing at home, the patience diff algorithm is
> >   explained after my sig. ]
> >
> >
> > What I did is to:
> > (1) add a flag to the xenv in xdl_split that says that I want a
> >     patience "split".
> > (2) Then in xdl_split, if that bit is set, I compute the longest common
> >     subsequence from the patience diff.
> > (3) for each split it computes I call xdl_recs_cmp on that interval.
> >
> >
> > What I thought it would achieve is that I force some boundaries at which
> > libxdiff _must_ resync. Though, it seems that for some reason it doesn't
> > work, probably because the "ha" stuff and the boundaries absolutely
> > don't work the way I thought it did.
> >
> > So where is the place I should do that ? I suspect it should be
> > partly in xprepare.c but I'm a bit stuck right now.
> >
> >
> > Any pointer on how the stuff in xprepare.c and xdiffi.c work would help
> > greatly, it's really not self-evident to me :)
>
> What makes you think it'd be self-evident to me? :)
> Seriously, I forgot stuff I wrote the last month, this is way beyond my
> memory limits.
> You definitely need to look at xprepare.c, especially at xdl_trim_ends()
> and xdl_cleanup_records(). Lines are re-arranged in indexes, and this
> might screw up your logic if you're not prepared for it.
> What xdl_split() does, is find the start of an LCS and return the
> coordinate. Then xdl_recs_cmp() does the box reducing (first two "for"
> loops) and different-lines marking (first and second "if").
Well, it's what I thought it did indeed; that's why, before recursing
into that bit, I tried to extract a full LCS from the patience diff
algorithm and to recurse into each interval it gives, which _should_
work, but doesn't at all :/

Okay, maybe I should re-read my algorithm slowly and check that I've
not done something silly (like chaining the list in reverse order or
god knows what), but my debug functions lead me to believe that I did
that fine.

I'll look into it tonight.

--
·O·  Pierre Habouzit
··O                                                [hidden email]
OOO                                                http://www.madism.org


Re: libxdiff and patience diff

Johannes Schindelin
In reply to this post by Pierre Habouzit-4
Hi,

On Tue, 4 Nov 2008, Pierre Habouzit wrote:

> On Tue, Nov 04, 2008 at 05:39:48AM +0000, Johannes Schindelin wrote:
>
> > On Tue, 4 Nov 2008, Pierre Habouzit wrote:
> >
> > > I've been working tonight, trying to make libxdiff support the
> > > patience diff algorithm, but I've totally failed, because I
> > > _thought_ I understood what xdl_split was doing, but it appears I
> > > don't.
> >
> > I thought about it, too, at the GitTogether, although I want to finish
> > my jGit diff first.
> >
> > The main idea I had about patience diff is that you can reuse a lot of
> > functions in libxdiff.
> >
> > But the requirement of taking just unique lines/hashes into account,
> > and _after_ splitting up, _again_ determine uniqueness _just_ in the
> > between-hunk part made me think that it may be wiser to have a
> > separate function for the patience diff stuff.
> >
> > Basically, you would start to have libxdiff split the lines and hash
> > them as normal, but then determine the unique hashes (I'd start with
> > the smaller text, just to have a better chance to end up with unique
> > hashes).
> >
> > Once they are determined, you can search for those exact lines (hash
> > first) in the post-document.
>
> Actually my current implementation just puts all the hashes into an
> array, sorts them by hash (which is O(n log(n)) with the position from
> left or right file it's in, ordered by increasing right position. (and
> the struct is filled with the left pos to -1 if the right pos is set,
> and vice versa).

Yeah, that would be much more efficient using a hash-multiset.  Still
linear size (albeit twice as much).  And you can already discard double
entries early (although you have to keep one pointer in the hash-multiset
to prevent other identical lines from being misdetected as "unique").
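The hash-multiset idea could look roughly like this: a linear-probing
table that counts occurrences per side and remembers the first position
seen, so uniqueness on both sides can be decided without re-scanning.
Everything here is an illustrative sketch, not libxdiff code:

```c
#include <stdlib.h>

/* Toy hash multi-set over line hashes: counts occurrences per side and
 * keeps the first seen position, so a line that turns out unique on
 * both sides can still be paired up afterwards. */
struct mset_ent {
	unsigned long ha;
	long cnt1, cnt2;    /* occurrences in file 1 / file 2 */
	long pos1, pos2;    /* first position seen on each side */
};

struct mset { struct mset_ent *tab; unsigned long mask; };

static struct mset *mset_new(unsigned long size_pow2)
{
	struct mset *s = malloc(sizeof(*s));
	s->tab = calloc(size_pow2, sizeof(*s->tab));
	s->mask = size_pow2 - 1;
	return s;
}

static struct mset_ent *mset_find(struct mset *s, unsigned long ha)
{
	unsigned long i = ha & s->mask;
	/* linear probing; a slot with zero counts is free */
	while (s->tab[i].cnt1 + s->tab[i].cnt2 && s->tab[i].ha != ha)
		i = (i + 1) & s->mask;
	s->tab[i].ha = ha;
	return &s->tab[i];
}

static void mset_add(struct mset *s, unsigned long ha, int side, long pos)
{
	struct mset_ent *e = mset_find(s, ha);
	if (side == 1) { if (!e->cnt1++) e->pos1 = pos; }
	else           { if (!e->cnt2++) e->pos2 = pos; }
}

/* A line is a sync candidate iff it occurs exactly once on each side. */
static int mset_is_unique_pair(struct mset *s, unsigned long ha)
{
	struct mset_ent *e = mset_find(s, ha);
	return e->cnt1 == 1 && e->cnt2 == 1;
}
```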

> Then I scan the array to find patterns of two consecutive hashes exactly,
> and collapse it into the proper {left pos, right pos} tuple if it was
> indeed a unique line in both files.
>
> This results into an array I sort again by right pos then, and we can
> work on that for the stack sorting, and I do it, and then I have my LCS.
>
>
> This is the complete brute-force algorithm which requires a temporary
> array of the size of the number of lines on the left + the right, and a
> temporary array for the stacks which _may_ end up being as large as the
> smallest number of lines between the left or right file in the worst
> case I'd say (roughly).
>
> Then I just remember a list of split points, and I recurse in all the
> sub splits again. It has a fixed point which may or may not need
> libxdiff recursion in it.

I am not sure that you really end up with patience diff that way.  I
_think_ that you need to determine the longest sequence of unique lines
which has the property of being ordered in both texts first, and only
_then_ recurse into the not-yet-handled lines.

> > Once that is done, you'd have to find the longest common subsequence,
> > which you could do using the existing infrastructure, but that would
> > require more work (as we already know the lines are unique).
>
> Patience diff gives you the algorithm to do that, it's pretty simple,
> and is more efficient than the current infrastructure (in time, I don't
> know for space though).

Actually, IIRC it is pretty easy to see that the time complexity is linear
(and therefore, the space complexity, too).

> > After that, you would have to recurse to the same algorithm _between_
> > known chunks.  Eventually, that would have to resort to classical
> > libxdiff (if there are no, or not enough, unique lines).
>
> Yeah, that's the point, the problem is, I believe more and more that I
> should prepare the LCS from patience diff in xprepare.c, but I grok
> absolutely nothing at what the chastore_t and similar stuff is. I
> understand it's about hashing, but the exact stuff it does eludes me.

Yes, I do not like the short and unintuitive names either.

AFAIU chastore_t is just a generic extensible array of elements that have
size "isize", and initially there are "icount" of them.
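Under that reading, a chastore_t-style store is a linked list of
fixed-capacity chunks that hands out "isize"-byte elements and never
relocates them, so pointers into the store stay valid as it grows. A
sketch of the idea, with made-up field names (libxdiff's actual struct
differs):

```c
#include <stdlib.h>

struct chunk { struct chunk *next; long used; /* elements follow */ };

struct chastore {
	struct chunk *head, *tail;
	long isize;   /* size of one element */
	long icount;  /* elements per chunk  */
};

/* Hand out the next element, growing by whole chunks; elements already
 * handed out are never moved. */
static void *cha_alloc(struct chastore *cha)
{
	struct chunk *c = cha->tail;

	if (!c || c->used == cha->icount) {
		c = malloc(sizeof(*c) + cha->isize * cha->icount);
		if (!c)
			return NULL;
		c->next = NULL;
		c->used = 0;
		if (cha->tail)
			cha->tail->next = c;
		else
			cha->head = c;
		cha->tail = c;
	}
	return (char *)(c + 1) + cha->isize * c->used++;
}
```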

> In fact when I look at the records I have in xdiffi.c I had the
> impression they were already somehow collapsed, which makes it a too
> late point to apply the patience diff ...

AFAICS xdiffi.c contains the classical diff algorithm (incidentally,
the inventor of that algorithm is about 50 meters away from me at this
very moment).  It should not have anything of interest to you, except
for the fall-back case.

So I think that you should add a new file xpatience.c.

In that, I'd implement that hash multi-set, and use a prepared xdfenv_t to
fill it (smaller file first, then you can traverse the other file,
checking for uniqueness in that file and for a match in the other file at
the same time).

You _could_ build the longest list of ordered pairs at the same time, too,
but that may make the code a bit too complex.

Ciao,
Dscho


Re: libxdiff and patience diff

Pierre Habouzit-4
On Tue, Nov 04, 2008 at 02:34:37PM +0000, Johannes Schindelin wrote:

> Hi,
>
> On Tue, 4 Nov 2008, Pierre Habouzit wrote:
>
> > On Tue, Nov 04, 2008 at 05:39:48AM +0000, Johannes Schindelin wrote:
> >
> > > On Tue, 4 Nov 2008, Pierre Habouzit wrote:
> > >
> > > > I've been working tonight, trying to make libxdiff support the
> > > > patience diff algorithm, but I've totally failed, because I
> > > > _thought_ I understood what xdl_split was doing, but it appears I
> > > > don't.
> > >
> > > I thought about it, too, at the GitTogether, although I want to finish
> > > my jGit diff first.
> > >
> > > The main idea I had about patience diff is that you can reuse a lot of
> > > functions in libxdiff.
> > >
> > > But the requirement of taking just unique lines/hashes into account,
> > > and _after_ splitting up, _again_ determine uniqueness _just_ in the
> > > between-hunk part made me think that it may be wiser to have a
> > > separate function for the patience diff stuff.
> > >
> > > Basically, you would start to have libxdiff split the lines and hash
> > > them as normal, but then determine the unique hashes (I'd start with
> > > the smaller text, just to have a better chance to end up with unique
> > > hashes).
> > >
> > > Once they are determined, you can search for those exact lines (hash
> > > first) in the post-document.
> >
> > Actually my current implementation just puts all the hashes into an
> > array, sorts them by hash (which is O(n log(n)) with the position from
> > left or right file it's in, ordered by increasing right position. (and
> > the struct is filled with the left pos to -1 if the right pos is set,
> > and vice versa).
>
> Yeah, that would be much more efficient using a hash-multiset.  Still
> linear size (albeit twice as much).  And you can already discard double
> entries early (although you have to keep one pointer in the hash-multiset
> to prevent other identical lines from being misdetected as "unique").
Probably, I just wanted something to work first, and then optimize it
using the proper structure.

> I am not sure that you really end up with patience diff that way.  I
> _think_ that you need to determine the longest sequence of unique lines
> which has the property of being ordered in both texts first, and only
> _then_ recurse into the not-yet-handled lines.

I must have been using really bad English, because that's what I do ;)

> > > Once that is done, you'd have to find the longest common subsequence,
> > > which you could do using the existing infrastructure, but that would
> > > require more work (as we already know the lines are unique).
> >
> > Patience diff gives you the algorithm to do that, it's pretty simple,
> > and is more efficient than the current infrastructure (in time, I don't
> > know for space though).
>
> Actually, IIRC it is pretty easy to see that the time complexity is linear
> (and therefore, the space complexity, too).
Well, the space complexity of the patience diff would be linear too, by
my reckoning, though I'm afraid that the current state of my
implementation has a big constant factor in front ;)


> > In fact when I look at the records I have in xdiffi.c I had the
> > impression they were already somehow collapsed, which makes it a too
> > late point to apply the patience diff ...
>
> AFAICS xdiffi.c contains the classical diff algorithm (incidentally, the
> inventor of that algorithm is about 50 meters away from me at this very
> moment).  It should not have anything of interest to you, except for the
> fall-back case.
>
> So I think that you should add a new file xpatience.c.
>
> In that, I'd implement that hash multi-set, and use a prepared xdfenv_t to
> fill it (smaller file first, then you can traverse the other file,
> checking for uniqueness in that file and for a match in the other file at
> the same time).
>
> You _could_ build the longest list of ordered pairs at the same time, too,
> but that may make the code a bit too complex.
Well, technically, if I'm correct, xdiffi.c already has a pruned list
of hashed lines (I suppose ->ha is a hash value somehow?), pruned of
lines without a match on the left or right, and I was massaging that.
Though it absolutely fails, so I fear I didn't understand what the ha
stuff really is.

Attached is my current work (YES it is horrible, it's WIP); probably
most of it can be put in an xpatience.c file, but still, I absolutely
don't understand what goes wrong. On the simple example from
glandium.org it _looks_ like it catches the "unique" include line fine,
but somehow it doesn't.

The nasty thing about the patience diff is that it still needs the
usual diff algorithm once it has split the file into chunks separated
by "unique lines", so you cannot make it fully independent. What I
could do is put most of the patience diff into xpatience.c, but it
would still have to use some functions from xdiffi.c that are currently
private, so it messes up the files more than it's worth, IMHO.


Anyways, I'm waiting for a client to finally set up his network for
the test I'm supposed to do right now; I'll try to have a closer look
tonight...

--
·O·  Pierre Habouzit
··O                                                [hidden email]
OOO                                                http://www.madism.org

diff --git a/diff.c b/diff.c
index f644947..0901cdc 100644
--- a/diff.c
+++ b/diff.c
@@ -2447,6 +2447,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
  options->xdl_opts |= XDF_IGNORE_WHITESPACE_CHANGE;
  else if (!strcmp(arg, "--ignore-space-at-eol"))
  options->xdl_opts |= XDF_IGNORE_WHITESPACE_AT_EOL;
+ else if (!strcmp(arg, "--patience"))
+ options->xdl_opts |= XDF_USE_PATIENCE;
 
  /* flags options */
  else if (!strcmp(arg, "--binary")) {
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 84fff58..bba915c 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,6 +32,7 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
+#define XDF_USE_PATIENCE (1 << 5)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
 #define XDL_PATCH_NORMAL '-'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 9d0324a..1a5b13a 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -37,8 +37,235 @@ typedef struct s_xdpsplit {
  int min_lo, min_hi;
 } xdpsplit_t;
 
+typedef struct s_xduniqmatch {
+ struct s_xduniqmatch *next;
+ long i1, i2, len;
+} xduniqmatch_t;
+
+typedef struct s_xdp_ha {
+ struct s_xdp_ha *up, *down;
+ struct s_xdp_ha *left;
+ long ha;
+ long off1;
+ long off2;
+} xdp_ha_t;
+
+
+static xduniqmatch_t *xdl_uniq_match_alloc(long i1, long i2, long len)
+{
+ xduniqmatch_t *match = xdl_malloc(sizeof(xduniqmatch_t));
+ if (match) {
+ match->next = NULL;
+ match->i1 = i1;
+ match->i2 = i2;
+ match->len = len;
+ }
+ return match;
+}
+
+static xduniqmatch_t *xdl_uniq_popfree(xduniqmatch_t *lst)
+{
+ xduniqmatch_t *next = lst->next;
+ xdl_free(lst);
+ return next;
+}
+
+static int xdp_ha_cmp_by_ha(const void *a1, const void *a2)
+{
+ const xdp_ha_t *ha1 = a1;
+ const xdp_ha_t *ha2 = a2;
+ if (ha1->ha == ha2->ha) {
+ return ha1->off2 - ha2->off2;
+ }
+ return ha1->ha - ha2->ha;
+}
+
+static int xdp_ha_cmp_by_off2(const void *a1, const void *a2)
+{
+ const xdp_ha_t *ha1 = a1;
+ const xdp_ha_t *ha2 = a2;
+ return ha2->off2 - ha1->off2;
+}
+
+static int patience_bisect_stack(xdp_ha_t **stacks, long len, long off1)
+{
+ long l = 0, r = len;
+
+ while (l < r) {
+ long i = (r + l) / 2;
+
+ if (off1 < stacks[i]->off1) {
+ l = i + 1;
+ } else {
+ r = i;
+ }
+ }
+ return l;
+}
+
+static int xdl_patience_split_aux(xduniqmatch_t *lcs, xdp_ha_t *ha)
+{
+ xduniqmatch_t *tmp;
+
+ while (ha) {
+ tmp = xdl_uniq_match_alloc(ha->off1, ha->off2, 1);
+ if (!tmp)
+ return -1;
+ tmp->next = lcs->next;
+ lcs->next = tmp;
+ lcs = tmp;
+
+ while ((ha = ha->down ? ha->down : ha->left)) {
+ if (lcs->i1 + lcs->len + 1 == ha->off1 && lcs->i2 + lcs->len + 1 == ha->off2) {
+ lcs->len++;
+ continue;
+ }
+ break;
+ }
+ }
+ return 0;
+}
+
+static int xdl_patience_split(unsigned long const *ha1, unsigned long const *ha2,
+      xduniqmatch_t *begin, xduniqmatch_t *end)
+{
+ xdp_ha_t *recs;
+ long off1 = begin->i1 + begin->len, lim1 = end->i1;
+ long off2 = begin->i2 + begin->len, lim2 = end->i2;
+ long len, i, j, uniq;
+
+ len = lim1 - off1 + lim2 - off2;
+ recs = (xdp_ha_t *)xdl_malloc(sizeof(xdp_ha_t) * len);
+ if (recs == NULL)
+ return -1;
+
+ for (i = 0, j = off1; j < lim1; j++, i++) {
+ recs[i].ha = ha1[j];
+ recs[i].off1 = j;
+ recs[i].off2 = -1;
+ recs[i].up = recs[i].down = recs[i].left = NULL;
+ }
+ for (j = off2; j < lim2; j++, i++) {
+ recs[i].ha = ha2[j];
+ recs[i].off1 = -1;
+ recs[i].off2 = j;
+ recs[i].up = recs[i].down = recs[i].left = NULL;
+ }
+
+ qsort(recs, len, sizeof(xdp_ha_t), xdp_ha_cmp_by_ha);
+
+ uniq = 0;
+ for (i = 0; i < len - 1; ) {
+ long ha = recs[i].ha;
+
+ if (ha != recs[i + 1].ha) {
+ i++;
+ continue;
+ }
+
+ if (i < len - 2 && ha == recs[i + 2].ha) {
+ i += 3;
+ while (i < len - 1 && recs[i].ha == ha) {
+ i++;
+ }
+ continue;
+ }
+
+ if (recs[i].off2 < 0 && recs[i + 1].off1 < 0) {
+ long a, b;
+ recs[uniq].ha   = ha;
+ a = recs[uniq].off1 = recs[i].off1;
+ b = recs[uniq].off2 = recs[i + 1].off2;
+ uniq++;
+ }
+ i += 2;
+ }
+
+ if (uniq) {
+ xdp_ha_t **stacks;
+ long alloc, len;
+
+ qsort(recs, uniq, sizeof(xdp_ha_t), xdp_ha_cmp_by_off2);
+
+ alloc = xdl_bogosqrt(uniq);
+ stacks = xdl_malloc(sizeof(xdp_ha_t *) * alloc);
 
+ if (stacks == NULL)
+ goto error;
 
+ len = 1;
+ stacks[0] = recs;
+
+ for (i = 1; i < uniq; i++) {
+ long off1 = recs[i].off1;
+ long k;
+
+ if (off1 < stacks[len - 1]->off1) {
+ if (len >= alloc) {
+ alloc *= 2;
+ stacks = xdl_realloc(stacks, sizeof(xdp_ha_t *) * alloc);
+ if (!stacks)
+ goto error;
+ }
+ stacks[k = len++] = NULL;
+ } else {
+ k = patience_bisect_stack(stacks, len - 1, off1);
+ }
+
+ if (k > 0) {
+ recs[i].left = stacks[k - 1];
+ }
+ if (stacks[k]) {
+ stacks[k]->down = &recs[i];
+ recs[i].up = stacks[k];
+ }
+ stacks[k] = &recs[i];
+ }
+
+ if (xdl_patience_split_aux(begin, stacks[len - 1]) < 0) {
+ xdl_free(stacks);
+ goto error;
+ }
+
+ xdl_free(stacks);
+ }
+
+ xdl_free(recs);
+ return 0;
+
+error:
+ xdl_free(recs);
+ return -1;
+}
+
+static int xdl_patience_lcs(xdfenv_t *xe, xduniqmatch_t *begin, xduniqmatch_t *end)
+{
+ unsigned long const *ha1 = xe->xdf1.ha, *ha2 = xe->xdf2.ha;
+ long off1 = begin->i1 + begin->len, lim1 = end->i1;
+ long off2 = begin->i2 + begin->len, lim2 = end->i2;
+ xduniqmatch_t *next;
+
+ for (; off1 < lim1 && off2 < lim2 && ha1[off1] == ha2[off2]; off1++, off2++);
+ for (; off1 < lim1 && off2 < lim2 && ha1[lim1 - 1] == ha2[lim2 - 1]; lim1--, lim2--);
+
+ begin->len += off1 - begin->i1;
+ end->len   += end->i1 - lim1;
+ end->i1     = lim1;
+ end->i2     = lim2;
+
+ if (off1 == lim1 || off2 == lim2)
+ return 0;
+
+ if (xdl_patience_split(ha1, ha2, begin, end))
+ return -1;
+
+ for (next = begin->next; next != end; begin = next, next = begin->next) {
+ if (xdl_patience_lcs(xe, begin, next) < 0)
+ return -1;
+ }
+
+ return 0;
+}
 
 static long xdl_split(unsigned long const *ha1, long off1, long lim1,
       unsigned long const *ha2, long off2, long lim2,
@@ -321,13 +548,13 @@ int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
  return 0;
 }
 
-
 int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
  xdfenv_t *xe) {
  long ndiags;
  long *kvd, *kvdf, *kvdb;
  xdalgoenv_t xenv;
  diffdata_t dd1, dd2;
+ int need_min = (xpp->flags & XDF_NEED_MINIMAL) != 0;
 
  if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
 
@@ -364,12 +591,54 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
  dd2.rchg = xe->xdf2.rchg;
  dd2.rindex = xe->xdf2.rindex;
 
- if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
- kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, &xenv) < 0) {
+ if (xpp->flags & XDF_USE_PATIENCE) {
+ xduniqmatch_t *lcs;
+ long i1, i2;
+
+ lcs = xdl_uniq_match_alloc(0, 0, 0);
+ if (!lcs)
+ goto error;
+ lcs->next = xdl_uniq_match_alloc(xe->xdf1.nreff, xe->xdf2.nreff, 0);
+ if (!lcs->next || xdl_patience_lcs(xe, lcs, lcs->next) < 0) {
+ while ((lcs = xdl_uniq_popfree(lcs)));
+ goto error;
+ }
 
- xdl_free(kvd);
- xdl_free_env(xe);
- return -1;
+ i1 = i2 = lcs->len;
+ if (lcs->len) {
+ fprintf(stderr, "skip  %ld:%ld -> %ld:%ld\n",
+ lcs->i1, i1, lcs->i2, i2);
+ }
+
+ while ((lcs = xdl_uniq_popfree(lcs))) {
+ fprintf(stderr, "usual %ld:%ld -> %ld:%ld\n",
+ i1, lcs->i1, i2, lcs->i2);
+ fprintf(stderr, "l/r: %ld / %ld\n",
+ xe->xdf1.rindex[lcs->i1],
+ xe->xdf2.rindex[lcs->i2]);
+ if (xdl_recs_cmp(&dd1, i1, lcs->i1, &dd2, i2, lcs->i2,
+ kvdf, kvdb, need_min, &xenv) < 0) {
+ while ((lcs = xdl_uniq_popfree(lcs)));
+ goto error;
+ }
+ i1 = lcs->i1 + lcs->len;
+ i2 = lcs->i2 + lcs->len;
+ if (lcs->len) {
+ fprintf(stderr, "skip  %ld:%ld -> %ld:%ld (len %ld)\n",
+ lcs->i1, i1, lcs->i2, i2, lcs->len);
+ fprintf(stderr, "l/r: %ld / %ld\n",
+ xe->xdf1.rindex[lcs->i1 + lcs->len],
+ xe->xdf2.rindex[lcs->i2 + lcs->len]);
+ }
+ }
+ } else {
+ if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
+ kvdf, kvdb, need_min, &xenv) < 0) {
+error:
+ xdl_free(kvd);
+ xdl_free_env(xe);
+ return -1;
+ }
  }
 
  xdl_free(kvd);


Re: libxdiff and patience diff

Johannes Schindelin
Hi,

On Tue, 4 Nov 2008, Pierre Habouzit wrote:

> The nasty thing about the patience diff is that it still needs the usual
> diff algorithm once it has split the file into chunks separated by
> "unique lines".

Actually, it should try to apply patience diff again in those chunks,
separately.

> So you cannot make really independant stuff. What I could do is put most
> of the xpatience diff into xpatience.c but it would still have to use
> some functions from xdiffi.c that are currently private, so it messes
> somehow the files more than it's worth IMHO.

I think it is better that you use the stuff from xdiffi.c through a well
defined interface, i.e. _not_ mess up the code by mingling it together
with the code in xdiffi.c.  The code is hard enough to read already.

Oh, BTW, "ha" is a hash of the lines which is used to make the line
matching more performant.  You will see a lot of "ha" comparisons before
actually calling xdl_recmatch() for that reason.  Incidentally, this is
also the hash that I'd use for the hash multi-set I was referring to.

Oh, and I am not sure that it is worth your time trying to get it to run
with the linear list, since you cannot reuse that code afterwards, and
have to spend the same amount of time to redo it with the hash set.

I am awfully short on time, so it will take some days until I can review
what you have already, unfortunately.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: libxdiff and patience diff

Pierre Habouzit-4
On Tue, Nov 04, 2008 at 03:57:44PM +0000, Johannes Schindelin wrote:

> Hi,
>
> On Tue, 4 Nov 2008, Pierre Habouzit wrote:
>
> > The nasty thing about the patience diff is that it still needs the usual
> > diff algorithm once it has split the file into chunks separated by
> > "unique lines".
>
> Actually, it should try to apply patience diff again in those chunks,
> separately.
Yes, that is what I do, but it reaches a fixed point as soon as you find
no unique lines between the newly found ones, or when that space is empty.
E.g. you could have the two following hunks:

File A       File B
1            2
2            1
1            2
2            1
1            2
2
1

The simple leading/trailing reduction will do nothing, and there are no
shared unique lines, so you must apply the usual diff algorithm to that
chunk.
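
To make the example above concrete: patience diff only ever pairs lines
that occur exactly once on each side. A minimal standalone sketch of that
counting step (the helper names are invented for illustration; they are
not libxdiff functions) confirms that the two hunks above yield zero
candidate pairs, so the usual algorithm has to finish the job:

```c
#include <assert.h>
#include <string.h>

/* How many times does "line" occur among the n entries of "lines"? */
static int count_occurrences(const char **lines, int n, const char *line)
{
	int i, c = 0;
	for (i = 0; i < n; i++)
		if (!strcmp(lines[i], line))
			c++;
	return c;
}

/*
 * Number of lines occurring exactly once in A *and* exactly once in B:
 * these are the only pairs patience diff may use as anchors.
 */
static int count_unique_pairs(const char **a, int na, const char **b, int nb)
{
	int i, pairs = 0;
	for (i = 0; i < na; i++)
		if (count_occurrences(a, na, a[i]) == 1 &&
		    count_occurrences(b, nb, a[i]) == 1)
			pairs++;
	return pairs;
}
```

With file A = 1 2 1 2 1 2 1 and file B = 2 1 2 1 2, every line repeats,
so count_unique_pairs() returns 0 and the patience pass has nothing to
anchor on.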

> > So you cannot make really independent stuff. What I could do is put most
> > of the xpatience diff into xpatience.c but it would still have to use
> > some functions from xdiffi.c that are currently private, so it messes
> > somehow the files more than it's worth IMHO.
>
> I think it is better that you use the stuff from xdiffi.c through a well
> defined interface, i.e. _not_ mess up the code by mingling it together
> with the code in xdiffi.c.  The code is hard enough to read already.

Hmmm. I'll see to that later, once I have something that works.

> Oh, BTW, "ha" is a hash of the lines which is used to make the line
> matching more performant.  You will see a lot of "ha" comparisons before
> actually calling xdl_recmatch() for that reason.  Incidentally, this is
> also the hash that I'd use for the hash multi-set I was referring to.

Yeah, that's what I assumed it would be.

> Oh, and I am not sure that it is worth your time trying to get it to run
> with the linear list, since you cannot reuse that code afterwards, and
> have to spend the same amount of time to redo it with the hash set.

Getting the linear list (actually an array) to work would show me that I
am hooking in at the proper place. Replacing a data structure doesn't
scare me, because I've split the functions properly.

> I am awfully short on time, so it will take some days until I can review
> what you have already, unfortunately.

NP, it was just in case, because I'm horribly stuck with that code right
now ;)

--
·O·  Pierre Habouzit
··O                                                [hidden email]
OOO                                                http://www.madism.org


[PATCH 0/3] Teach Git about the patience diff algorithm

Johannes Schindelin
In reply to this post by Pierre Habouzit-4

Nothing fancy, really, just a straightforward implementation of the
heavily under-documented and under-analyzed patience diff algorithm.

One thing is a bit ugly: the libxdiff interface does not allow
calculating diffs over ranges of lines.  So instead of reusing an initialized
environment (with line hashes and all), the simple structure of mmfile_t
is exploited to fake an mmfile_t of a file _part_, reusing the file's
buffer.

And this mmfile_t pair gets a new environment, recalculating hashes and
all.
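
The trick reads roughly like this (simplified stand-in types, and a
made-up helper name; in the patch below the real code lives in
fall_back_to_classic_diff()): because every record of a prepared file
points back into the one contiguous file buffer, a run of lines can be
re-presented as its own mmfile_t without copying a byte.

```c
#include <assert.h>

/* Simplified stand-ins for the xdiff structures. */
typedef struct { char *ptr; long size; } mmfile_t;
typedef struct { const char *ptr; long size; } xrecord_t;

/*
 * Fake an mmfile_t covering records [first, last] (1-based, inclusive)
 * by reusing the underlying file buffer -- no copying involved.
 */
static mmfile_t subfile(xrecord_t **recs, int first, int last)
{
	mmfile_t mf;
	mf.ptr = (char *)recs[first - 1]->ptr;
	mf.size = recs[last - 1]->ptr + recs[last - 1]->size - mf.ptr;
	return mf;
}
```

The pair of fake files then goes through xdl_prepare_env() again, which
is where the redundant re-hashing of just that window happens.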

Davide, I think it would be easier to refactor xdl_do_diff() to take line
ranges, and use that interface both in xpatience.c and in xmerge.c.  
(Although I do not know if you took xmerge.c at all; you seemed a bit
reluctant about it back when I sent it to you.)

For those interested in studying the code, I suggest starting with the
short comment at the beginning of xpatience.c and then working yourself up
from the end (i.e. xdl_do_patience_diff()).

It might be a good idea to think about using this code in our merging code
once it is well reviewed and tested, as it might help resolve a substantial
number of otherwise non-trivial merge conflicts.

Oh, and the bash completions are so trivial I did not even bother to test
them.

Happy new year.

Johannes Schindelin (3):
  Implement the patience diff algorithm
  Introduce the diff option '--patience'
  bash completions: Add the --patience option

 Documentation/diff-options.txt         |    3 +
 Makefile                               |    2 +-
 contrib/completion/git-completion.bash |    2 +
 diff.c                                 |    2 +
 t/t4033-diff-patience.sh               |  168 ++++++++++++++
 xdiff/xdiff.h                          |    1 +
 xdiff/xdiffi.c                         |    3 +
 xdiff/xdiffi.h                         |    2 +
 xdiff/xpatience.c                      |  374 ++++++++++++++++++++++++++++++++
 9 files changed, 556 insertions(+), 1 deletions(-)
 create mode 100755 t/t4033-diff-patience.sh
 create mode 100644 xdiff/xpatience.c



[PATCH 1/3] Implement the patience diff algorithm

Johannes Schindelin

The patience diff algorithm produces slightly more intuitive output
than the classic Myers algorithm, as it does not try to minimize the
number of +/- lines first, but tries to preserve the lines that are
unique.

To this end, it first determines the lines that are unique in both files,
then extracts the maximal sequence of them which preserves the order
relative to both files.

Starting from this initial set of common lines, the rest of the lines
are handled recursively, with Myers' algorithm as a fallback when
the patience algorithm fails (because no common unique lines are left).
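
The "maximal sequence" mentioned above is a longest-increasing-subsequence
computation: take the unique pairs in file1 order and find the longest run
whose file2 line numbers also increase. A self-contained sketch of just
that step (illustration only; the patch's find_longest_common_sequence()
additionally keeps back-pointers so the sequence itself can be recovered):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Length of the longest increasing subsequence of v[0..n-1], computed
 * the "patience sorting" way: piles[k] holds the smallest value that
 * can end an increasing subsequence of length k+1.  O(n log n) overall.
 */
static int lis_length(const long *v, int n)
{
	long *piles = malloc(n * sizeof(long));
	int npiles = 0, i;

	if (!piles)
		return -1;
	for (i = 0; i < n; i++) {
		/* leftmost pile whose top is >= v[i] */
		int lo = 0, hi = npiles;
		while (lo < hi) {
			int mid = (lo + hi) / 2;
			if (piles[mid] < v[i])
				lo = mid + 1;
			else
				hi = mid;
		}
		piles[lo] = v[i];
		if (lo == npiles)
			npiles++;
	}
	free(piles);
	return npiles;
}
```

The card game gives the algorithm its name: each value is placed on the
leftmost pile whose top is not smaller, exactly as when playing patience.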

Signed-off-by: Johannes Schindelin <[hidden email]>
---
 xdiff/xdiff.h     |    1 +
 xdiff/xdiffi.c    |    3 +
 xdiff/xdiffi.h    |    2 +
 xdiff/xpatience.c |  374 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 380 insertions(+), 0 deletions(-)
 create mode 100644 xdiff/xpatience.c

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 361f802..4da052a 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,6 +32,7 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
+#define XDF_PATIENCE_DIFF (1 << 5)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
 #define XDL_PATCH_NORMAL '-'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 9d0324a..3e97462 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -329,6 +329,9 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
  xdalgoenv_t xenv;
  diffdata_t dd1, dd2;
 
+ if (xpp->flags & XDF_PATIENCE_DIFF)
+ return xdl_do_patience_diff(mf1, mf2, xpp, xe);
+
  if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
 
  return -1;
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index 3e099dc..ad033a8 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -55,5 +55,7 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr);
 void xdl_free_script(xdchange_t *xscr);
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
   xdemitconf_t const *xecfg);
+int xdl_do_patience_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
+ xdfenv_t *env);
 
 #endif /* #if !defined(XDIFFI_H) */
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
new file mode 100644
index 0000000..6687940
--- /dev/null
+++ b/xdiff/xpatience.c
@@ -0,0 +1,374 @@
+/*
+ *  LibXDiff by Davide Libenzi ( File Differential Library )
+ *  Copyright (C) 2003-2009 Davide Libenzi, Johannes E. Schindelin
+ *
+ *  This library is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ *
+ *  This library is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  Lesser General Public License for more details.
+ *
+ *  You should have received a copy of the GNU Lesser General Public
+ *  License along with this library; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *  Davide Libenzi <[hidden email]>
+ *
+ */
+#include "xinclude.h"
+#include "xtypes.h"
+#include "xdiff.h"
+
+/*
+ * The basic idea of patience diff is to find lines that are unique in
+ * both files.  These are intuitively the ones that we want to see as
+ * common lines.
+ *
+ * The maximal ordered sequence of such line pairs (where ordered means
+ * that the order in the sequence agrees with the order of the lines in
+ * both files) naturally defines an initial set of common lines.
+ *
+ * Now, the algorithm tries to extend the set of common lines by growing
+ * the line ranges where the files have identical lines.
+ *
+ * Between those common lines, the patience diff algorithm is applied
+ * recursively, until no unique line pairs can be found; these line ranges
+ * are handled by the well-known Myers algorithm.
+ */
+
+#define NON_UNIQUE ULONG_MAX
+
+/*
+ * This is a hash mapping from line hash to line numbers in the first and
+ * second file.
+ */
+struct hashmap {
+ int nr, alloc;
+ struct entry {
+ unsigned long hash;
+ /*
+ * 0 = unused entry, 1 = first line, 2 = second, etc.
+ * line2 is NON_UNIQUE if the line is not unique
+ * in either the first or the second file.
+ */
+ unsigned long line1, line2;
+ /*
+ * "next" & "previous" are used for the longest common
+ * sequence;
+ * initially, "next" reflects only the order in file1.
+ */
+ struct entry *next, *previous;
+ } *entries, *first, *last;
+ /* were common records found? */
+ unsigned long has_matches;
+ mmfile_t *file1, *file2;
+ xdfenv_t *env;
+ xpparam_t const *xpp;
+};
+
+/* The argument "pass" is 1 for the first file, 2 for the second. */
+static void insert_record(int line, struct hashmap *map, int pass)
+{
+ xrecord_t **records = pass == 1 ?
+ map->env->xdf1.recs : map->env->xdf2.recs;
+ xrecord_t *record = records[line - 1], *other;
+ /*
+ * After xdl_prepare_env() (or more precisely, due to
+ * xdl_classify_record()), the "ha" member of the records (AKA lines)
+ * is _not_ the hash anymore, but a linearized version of it.  In
+ * other words, the "ha" member is guaranteed to start with 0 and
+ * the second record's ha can only be 0 or 1, etc.
+ *
+ * So we multiply ha by 2 in the hope that the hashing was
+ * "unique enough".
+ */
+ int index = (int)((record->ha << 1) % map->alloc);
+
+ while (map->entries[index].line1) {
+ other = map->env->xdf1.recs[map->entries[index].line1 - 1];
+ if (map->entries[index].hash != record->ha ||
+ !xdl_recmatch(record->ptr, record->size,
+ other->ptr, other->size,
+ map->xpp->flags)) {
+ if (++index >= map->alloc)
+ index = 0;
+ continue;
+ }
+ if (pass == 2)
+ map->has_matches = 1;
+ if (pass == 1 || map->entries[index].line2)
+ map->entries[index].line2 = NON_UNIQUE;
+ else
+ map->entries[index].line2 = line;
+ return;
+ }
+ if (pass == 2)
+ return;
+ map->entries[index].line1 = line;
+ map->entries[index].hash = record->ha;
+ if (!map->first)
+ map->first = map->entries + index;
+ if (map->last) {
+ map->last->next = map->entries + index;
+ map->entries[index].previous = map->last;
+ }
+ map->last = map->entries + index;
+ map->nr++;
+}
+
+/*
+ * This function has to be called for each recursion into the inter-hunk
+ * parts, as previously non-unique lines can become unique when being
+ * restricted to a smaller part of the files.
+ *
+ * It is assumed that env has been prepared using xdl_prepare().
+ */
+static int fill_hashmap(mmfile_t *file1, mmfile_t *file2,
+ xpparam_t const *xpp, xdfenv_t *env,
+ struct hashmap *result,
+ int line1, int count1, int line2, int count2)
+{
+ result->file1 = file1;
+ result->file2 = file2;
+ result->xpp = xpp;
+ result->env = env;
+
+ /* We know exactly how large we want the hash map */
+ result->alloc = count1 * 2;
+ result->entries = (struct entry *)
+ xdl_malloc(result->alloc * sizeof(struct entry));
+ if (!result->entries)
+ return -1;
+ memset(result->entries, 0, result->alloc * sizeof(struct entry));
+
+ /* First, fill with entries from the first file */
+ while (count1--)
+ insert_record(line1++, result, 1);
+
+ /* Then search for matches in the second file */
+ while (count2--)
+ insert_record(line2++, result, 2);
+
+ return 0;
+}
+
+/*
+ * Find the longest sequence with a smaller last element (meaning a smaller
+ * line2, as we construct the sequence with entries ordered by line1).
+ */
+static int binary_search(struct entry **sequence, int longest,
+ struct entry *entry)
+{
+ int left = -1, right = longest;
+
+ while (left + 1 < right) {
+ int middle = (left + right) / 2;
+ /* by construction, no two entries can be equal */
+ if (sequence[middle]->line2 > entry->line2)
+ right = middle;
+ else
+ left = middle;
+ }
+ /* return the index in "sequence", _not_ the sequence length */
+ return left;
+}
+
+/*
+ * The idea is to start with the list of common unique lines sorted by
+ * the order in file1.  For each of these pairs, the longest (partial)
+ * sequence whose last element's line2 is smaller is determined.
+ *
+ * For efficiency, the sequences are kept in a list containing exactly one
+ * item per sequence length: the sequence with the smallest last
+ * element (in terms of line2).
+ */
+static struct entry *find_longest_common_sequence(struct hashmap *map)
+{
+ struct entry **sequence = xdl_malloc(map->nr * sizeof(struct entry *));
+ int longest = 0, i;
+ struct entry *entry;
+
+ for (entry = map->first; entry; entry = entry->next) {
+ if (!entry->line2 || entry->line2 == NON_UNIQUE)
+ continue;
+ i = binary_search(sequence, longest, entry);
+ entry->previous = i < 0 ? NULL : sequence[i];
+ sequence[++i] = entry;
+ if (i == longest)
+ longest++;
+ }
+
+ /* No common unique lines were found */
+ if (!longest)
+ return NULL;
+
+ /* Iterate starting at the last element, adjusting the "next" members */
+ entry = sequence[longest - 1];
+ entry->next = NULL;
+ while (entry->previous) {
+ entry->previous->next = entry;
+ entry = entry->previous;
+ }
+ return entry;
+}
+
+static int match(struct hashmap *map, int line1, int line2)
+{
+ xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
+ xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
+ return xdl_recmatch(record1->ptr, record1->size,
+ record2->ptr, record2->size, map->xpp->flags);
+}
+
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+ xpparam_t const *xpp, xdfenv_t *env,
+ int line1, int count1, int line2, int count2);
+
+static int walk_common_sequence(struct hashmap *map, struct entry *first,
+ int line1, int count1, int line2, int count2)
+{
+ int end1 = line1 + count1, end2 = line2 + count2;
+ int next1, next2;
+
+ for (;;) {
+ /* Try to grow the line ranges of common lines */
+ if (first) {
+ next1 = first->line1;
+ next2 = first->line2;
+ while (next1 > line1 && next2 > line2 &&
+ match(map, next1 - 1, next2 - 1)) {
+ next1--;
+ next2--;
+ }
+ } else {
+ next1 = end1;
+ next2 = end2;
+ }
+ while (line1 < next1 && line2 < next2 &&
+ match(map, line1, line2)) {
+ line1++;
+ line2++;
+ }
+
+ /* Recurse */
+ if (next1 > line1 || next2 > line2) {
+ struct hashmap submap;
+
+ memset(&submap, 0, sizeof(submap));
+ if (patience_diff(map->file1, map->file2,
+ map->xpp, map->env,
+ line1, next1 - line1,
+ line2, next2 - line2))
+ return -1;
+ }
+
+ if (!first)
+ return 0;
+
+ while (first->next &&
+ first->next->line1 == first->line1 + 1 &&
+ first->next->line2 == first->line2 + 1)
+ first = first->next;
+
+ line1 = first->line1 + 1;
+ line2 = first->line2 + 1;
+
+ first = first->next;
+ }
+}
+
+static int fall_back_to_classic_diff(struct hashmap *map,
+ int line1, int count1, int line2, int count2)
+{
+ /*
+ * This probably does not work outside Git, since
+ * we have a very simple mmfile structure.
+ *
+ * Note: ideally, we would reuse the prepared environment, but
+ * the libxdiff interface does not (yet) allow for diffing only
+ * ranges of lines instead of the whole files.
+ */
+ mmfile_t subfile1, subfile2;
+ xpparam_t xpp;
+ xdfenv_t env;
+
+ subfile1.ptr = (char *)map->env->xdf1.recs[line1 - 1]->ptr;
+ subfile1.size = map->env->xdf1.recs[line1 + count1 - 2]->ptr +
+ map->env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
+ subfile2.ptr = (char *)map->env->xdf2.recs[line2 - 1]->ptr;
+ subfile2.size = map->env->xdf2.recs[line2 + count2 - 2]->ptr +
+ map->env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
+ xpp.flags = map->xpp->flags & ~XDF_PATIENCE_DIFF;
+ if (xdl_do_diff(&subfile1, &subfile2, &xpp, &env) < 0)
+ return -1;
+
+ memcpy(map->env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
+ memcpy(map->env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
+
+ return 0;
+}
+
+/*
+ * Recursively find the longest common sequence of unique lines,
+ * and if none was found, ask xdl_do_diff() to do the job.
+ *
+ * This function assumes that env was prepared with xdl_prepare_env().
+ */
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+ xpparam_t const *xpp, xdfenv_t *env,
+ int line1, int count1, int line2, int count2)
+{
+ struct hashmap map;
+ struct entry *first;
+ int result = 0;
+
+ /* trivial case: one side is empty */
+ if (!count1) {
+ while(count2--)
+ env->xdf2.rchg[line2++ - 1] = 1;
+ return 0;
+ } else if (!count2) {
+ while(count1--)
+ env->xdf1.rchg[line1++ - 1] = 1;
+ return 0;
+ }
+
+ memset(&map, 0, sizeof(map));
+ if (fill_hashmap(file1, file2, xpp, env, &map,
+ line1, count1, line2, count2))
+ return -1;
+
+ /* are there any matching lines at all? */
+ if (!map.has_matches) {
+ while(count1--)
+ env->xdf1.rchg[line1++ - 1] = 1;
+ while(count2--)
+ env->xdf2.rchg[line2++ - 1] = 1;
+ return 0;
+ }
+
+ first = find_longest_common_sequence(&map);
+ if (first)
+ result = walk_common_sequence(&map, first,
+ line1, count1, line2, count2);
+ else
+ result = fall_back_to_classic_diff(&map,
+ line1, count1, line2, count2);
+
+ return result;
+}
+
+int xdl_do_patience_diff(mmfile_t *file1, mmfile_t *file2,
+ xpparam_t const *xpp, xdfenv_t *env)
+{
+ if (xdl_prepare_env(file1, file2, xpp, env) < 0)
+ return -1;
+
+ /* environment is cleaned up in xdl_diff() */
+ return patience_diff(file1, file2, xpp, env,
+ 1, env->xdf1.nrec, 1, env->xdf2.nrec);
+}
--
1.6.1.rc3.412.ga72b



[PATCH 2/3] Introduce the diff option '--patience'

Johannes Schindelin
In reply to this post by Johannes Schindelin

This commit teaches Git to produce diff output using the patience diff
algorithm with the diff option '--patience'.

Signed-off-by: Johannes Schindelin <[hidden email]>
---

        git log --check complains about this patch, for obvious reasons.

 Documentation/diff-options.txt |    3 +
 Makefile                       |    2 +-
 diff.c                         |    2 +
 t/t4033-diff-patience.sh       |  168 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 174 insertions(+), 1 deletions(-)
 create mode 100755 t/t4033-diff-patience.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 671f533..15ef35a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -36,6 +36,9 @@ endif::git-format-patch[]
 --patch-with-raw::
  Synonym for "-p --raw".
 
+--patience::
+ Generate a diff using the "patience diff" algorithm.
+
 --stat[=width[,name-width]]::
  Generate a diffstat.  You can override the default
  output width for 80-column terminal by "--stat=width".
diff --git a/Makefile b/Makefile
index 154cf34..2217873 100644
--- a/Makefile
+++ b/Makefile
@@ -1287,7 +1287,7 @@ $(LIB_FILE): $(LIB_OBJS)
  $(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(LIB_OBJS)
 
 XDIFF_OBJS=xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
- xdiff/xmerge.o
+ xdiff/xmerge.o xdiff/xpatience.o
 $(XDIFF_OBJS): xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
  xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
diff --git a/diff.c b/diff.c
index 56b80f9..67718b7 100644
--- a/diff.c
+++ b/diff.c
@@ -2472,6 +2472,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
  options->xdl_opts |= XDF_IGNORE_WHITESPACE_CHANGE;
  else if (!strcmp(arg, "--ignore-space-at-eol"))
  options->xdl_opts |= XDF_IGNORE_WHITESPACE_AT_EOL;
+ else if (!strcmp(arg, "--patience"))
+ options->xdl_opts |= XDF_PATIENCE_DIFF;
 
  /* flags options */
  else if (!strcmp(arg, "--binary")) {
diff --git a/t/t4033-diff-patience.sh b/t/t4033-diff-patience.sh
new file mode 100755
index 0000000..63c1b00
--- /dev/null
+++ b/t/t4033-diff-patience.sh
@@ -0,0 +1,168 @@
+#!/bin/sh
+
+test_description='patience diff algorithm'
+
+. ./test-lib.sh
+
+cat > file1 << EOF
+#include <stdio.h>
+
+// Frobs foo heartily
+int frobnitz(int foo)
+{
+    int i;
+    for(i = 0; i < 10; i++)
+    {
+        printf("Your answer is: ");
+        printf("%d\n", foo);
+    }
+}
+
+int fact(int n)
+{
+    if(n > 1)
+    {
+        return fact(n-1) * n;
+    }
+    return 1;
+}
+
+int main(int argc, char **argv)
+{
+    frobnitz(fact(10));
+}
+EOF
+
+cat > file2 << EOF
+#include <stdio.h>
+
+int fib(int n)
+{
+    if(n > 2)
+    {
+        return fib(n-1) + fib(n-2);
+    }
+    return 1;
+}
+
+// Frobs foo heartily
+int frobnitz(int foo)
+{
+    int i;
+    for(i = 0; i < 10; i++)
+    {
+        printf("%d\n", foo);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    frobnitz(fib(10));
+}
+EOF
+
+cat > expect << EOF
+diff --git a/file1 b/file2
+index 6faa5a3..e3af329 100644
+--- a/file1
++++ b/file2
+@@ -1,26 +1,25 @@
+ #include <stdio.h>
+
++int fib(int n)
++{
++    if(n > 2)
++    {
++        return fib(n-1) + fib(n-2);
++    }
++    return 1;
++}
++
+ // Frobs foo heartily
+ int frobnitz(int foo)
+ {
+     int i;
+     for(i = 0; i < 10; i++)
+     {
+-        printf("Your answer is: ");
+         printf("%d\n", foo);
+     }
+ }
+
+-int fact(int n)
+-{
+-    if(n > 1)
+-    {
+-        return fact(n-1) * n;
+-    }
+-    return 1;
+-}
+-
+ int main(int argc, char **argv)
+ {
+-    frobnitz(fact(10));
++    frobnitz(fib(10));
+ }
+EOF
+
+test_expect_success 'patience diff' '
+
+ test_must_fail git diff --no-index --patience file1 file2 > output &&
+ test_cmp expect output
+
+'
+
+test_expect_success 'patience diff output is valid' '
+
+ mv file2 expect &&
+ git apply < output &&
+ test_cmp expect file2
+
+'
+
+cat > uniq1 << EOF
+1
+2
+3
+4
+5
+6
+EOF
+
+cat > uniq2 << EOF
+a
+b
+c
+d
+e
+f
+EOF
+
+cat > expect << EOF
+diff --git a/uniq1 b/uniq2
+index b414108..0fdf397 100644
+--- a/uniq1
++++ b/uniq2
+@@ -1,6 +1,6 @@
+-1
+-2
+-3
+-4
+-5
+-6
++a
++b
++c
++d
++e
++f
+EOF
+
+test_expect_success 'completely different files' '
+
+ test_must_fail git diff --no-index --patience uniq1 uniq2 > output &&
+ test_cmp expect output
+
+'
+
+test_done
--
1.6.1.rc3.412.ga72b



[PATCH 3/3] bash completions: Add the --patience option

Johannes Schindelin
In reply to this post by Johannes Schindelin

Signed-off-by: Johannes Schindelin <[hidden email]>
---
 contrib/completion/git-completion.bash |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index a046441..b3e1e22 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -777,6 +777,7 @@ _git_diff ()
  --no-prefix --src-prefix= --dst-prefix=
  --base --ours --theirs
  --inter-hunk-context=
+ --patience
  "
  return
  ;;
@@ -969,6 +970,7 @@ _git_log ()
  --parents --children --full-history
  --merge
  --inter-hunk-context=
+ --patience
  "
  return
  ;;
--
1.6.1.rc3.412.ga72b



Re: [PATCH 0/3] Teach Git about the patience diff algorithm

Linus Torvalds-3
In reply to this post by Johannes Schindelin


On Thu, 1 Jan 2009, Johannes Schindelin wrote:
>
> Nothing fancy, really, just a straightforward implementation of the
> heavily under-documented and under-analyzed patience diff algorithm.

Exactly because the patience diff is so under-documented, could you
perhaps give a few examples of how it differs in the result, and why it's
so wonderful? Yes, yes, I can google, and no, no, nothing useful shows up
except for *totally* content-free fanboisms.

So could we have some actual real data on it?

                        Linus

Re: [PATCH 0/3] Teach Git about the patience diff algorithm

Linus Torvalds-3


On Thu, 1 Jan 2009, Linus Torvalds wrote:
>
> So could we have some actual real data on it?

.. and some testing. I tried to get some limited data for the kernel
myself, by doing

        git log --patience -p v2.6.28.. > ~/patience

but I just got a core-dump instead.

Pinpointing it to a specific commit shows a smaller failure case:

        git show -p --patience 05d564fe00c05bf8ff93948057ca1acb5bc68e10

which might help you debug this.

                        Linus

---
#0  0x00000000004cce73 in xdl_get_rec (xdf=0x7fffcb1e08d0, ri=-9, rec=0x7fffcb1e0778) at xdiff/xemit.c:36
#1  0x00000000004cced1 in xdl_emit_record (xdf=0x7fffcb1e08d0, ri=-9, pre=0x4eef79 "-", ecb=0x7fffcb1e0bc0)
    at xdiff/xemit.c:46
#2  0x00000000004cd4e6 in xdl_emit_diff (xe=0x7fffcb1e08d0, xscr=0x1111daf0, ecb=0x7fffcb1e0bc0,
    xecfg=0x7fffcb1e0b80) at xdiff/xemit.c:179
#3  0x00000000004caa2c in xdl_diff (mf1=0x7fffcb1e0a40, mf2=0x7fffcb1e0a30, xpp=0x7fffcb1e0bd0,
    xecfg=0x7fffcb1e0b80, ecb=0x7fffcb1e0bc0) at xdiff/xdiffi.c:559
#4  0x00000000004c088d in xdi_diff (mf1=0x7fffcb1e0c00, mf2=0x7fffcb1e0bf0, xpp=0x7fffcb1e0bd0,
    xecfg=0x7fffcb1e0b80, xecb=0x7fffcb1e0bc0) at xdiff-interface.c:137
#5  0x00000000004c0914 in xdi_diff_outf (mf1=0x7fffcb1e0c00, mf2=0x7fffcb1e0bf0, fn=0x475448 <fn_out_consume>,
    consume_callback_data=0x7fffcb1e0b40, xpp=0x7fffcb1e0bd0, xecfg=0x7fffcb1e0b80, xecb=0x7fffcb1e0bc0)
    at xdiff-interface.c:154
#6  0x00000000004780dc in builtin_diff (name_a=0x25cf6f0 "fs/nfs/nfs4xdr.c", name_b=0x25cf6f0 "fs/nfs/nfs4xdr.c",
    one=0x25cf690, two=0x26ae110, xfrm_msg=0xf659900 "index 7dde309..29656c5 100644", o=0x7fffcb1e1088,
    complete_rewrite=0) at diff.c:1486
#7  0x00000000004796e4 in run_diff_cmd (pgm=0x0, name=0x25cf6f0 "fs/nfs/nfs4xdr.c", other=0x0,
    attr_path=0x25cf6f0 "fs/nfs/nfs4xdr.c", one=0x25cf690, two=0x26ae110,
    xfrm_msg=0xf659900 "index 7dde309..29656c5 100644", o=0x7fffcb1e1088, complete_rewrite=0) at diff.c:2024
#8  0x0000000000479e2e in run_diff (p=0xaffece0, o=0x7fffcb1e1088) at diff.c:2158
#9  0x000000000047b959 in diff_flush_patch (p=0xaffece0, o=0x7fffcb1e1088) at diff.c:2743
#10 0x000000000047c942 in diff_flush (options=0x7fffcb1e1088) at diff.c:3184
#11 0x0000000000488b75 in log_tree_diff_flush (opt=0x7fffcb1e0f40) at log-tree.c:451
#12 0x0000000000488d17 in log_tree_diff (opt=0x7fffcb1e0f40, commit=0x2673198, log=0x7fffcb1e0ec0) at log-tree.c:503
#13 0x0000000000488da4 in log_tree_commit (opt=0x7fffcb1e0f40, commit=0x2673198) at log-tree.c:526
#14 0x000000000043218d in cmd_log_walk (rev=0x7fffcb1e0f40) at builtin-log.c:201
#15 0x0000000000432bae in cmd_log (argc=4, argv=0x7fffcb1e14b0, prefix=0x0) at builtin-log.c:423
#16 0x000000000040486b in run_command (p=0x70c7b0, argc=4, argv=0x7fffcb1e14b0) at git.c:243
#17 0x0000000000404a1c in handle_internal_command (argc=4, argv=0x7fffcb1e14b0) at git.c:387
#18 0x0000000000404c6e in main (argc=4, argv=0x7fffcb1e14b0) at git.c:484

Re: [PATCH 0/3] Teach Git about the patience diff algorithm

Adeodato Simó
In reply to this post by Linus Torvalds-3
* Linus Torvalds [Thu, 01 Jan 2009 11:45:21 -0800]:

> On Thu, 1 Jan 2009, Johannes Schindelin wrote:

> > Nothing fancy, really, just a straightforward implementation of the
> > heavily under-documented and under-analyzed patience diff algorithm.

> Exactly because the patience diff is so under-documented, could you
> perhaps give a few examples of how it differs in the result, and why it's
> so wonderful? Yes, yes, I can google, and no, no, nothing useful shows up
> except for *totally* content-free fanboisms.

> So could we have some actual real data on it?

For me, the cases where I find patience output to be of substantially
higher readability are those involving a rewrite of several consecutive
paragraphs (i.e., lines of code separated by blank lines). Compare:

-8<- git -8<-
@@ -51,29 +51,30 @@ def mbox_update(bug):
             f.close()
     else:
         # make a list of Message-Id we have
-        fp1 = file(path, 'ab+')
-        ids1 = [ x.get('Message-Id') for x in mailbox.UnixMailbox(fp1) ]
+        msgids = { x.get('Message-Id') for x in mailbox.mbox(path) }
 
-        # get remote mbox again
-        fp2 = tempfile.TemporaryFile()
-        retrieve_mbox(bug, fp2)
+        with tempfile.NamedTemporaryFile() as tmpfd:
+            # retrieve the remote mbox again
+            retrieve_mbox(bug, tmpfd)
 
-        # parse its messages
-        fp2.seek(0)
-        parser = email.Parser.Parser()
-        msgs2 = dict((x['Message-Id'], x)
-                    for x in mailbox.UnixMailbox(fp2, parser.parse))
+            # parse its messages
+            parser = email.parser.Parser()
+            new_msgids = { x['Message-Id']: x
+                            for x in mailbox.mbox(tmpfd.name, parser.parse) }
 
-        # now append the new ones
-        for msgid in set(msgs2.keys()) - set(ids1):
-            fp1.write('\n' + msgs2[msgid].as_string(unixfrom=True))
+        with open(path, 'a+') as fd:
+            # now append the new messages
+            for msgid in new_msgids.keys() - msgids:
+                fd.write('\n' + new_msgids[msgid].as_string(unixfrom=True))
 
     return path
->8- git ->8-

with:

-8<- bzr patience -8<-
@@ -51,29 +51,30 @@
             f.close()
     else:
         # make a list of Message-Id we have
-        fp1 = file(path, 'ab+')
-        ids1 = [ x.get('Message-Id') for x in mailbox.UnixMailbox(fp1) ]
-
-        # get remote mbox again
-        fp2 = tempfile.TemporaryFile()
-        retrieve_mbox(bug, fp2)
-
-        # parse its messages
-        fp2.seek(0)
-        parser = email.Parser.Parser()
-        msgs2 = dict((x['Message-Id'], x)
-                    for x in mailbox.UnixMailbox(fp2, parser.parse))
-
-        # now append the new ones
-        for msgid in set(msgs2.keys()) - set(ids1):
-            fp1.write('\n' + msgs2[msgid].as_string(unixfrom=True))
+        msgids = { x.get('Message-Id') for x in mailbox.mbox(path) }
+
+        with tempfile.NamedTemporaryFile() as tmpfd:
+            # retrieve the remote mbox again
+            retrieve_mbox(bug, tmpfd)
+
+            # parse its messages
+            parser = email.parser.Parser()
+            new_msgids = { x['Message-Id']: x
+                            for x in mailbox.mbox(tmpfd.name, parser.parse) }
+
+        with open(path, 'a+') as fd:
+            # now append the new messages
+            for msgid in new_msgids.keys() - msgids:
+                fd.write('\n' + new_msgids[msgid].as_string(unixfrom=True))
 
     return path
->8- bzr patience ->8-

I don't know about you, but I find the latter much easier to read,
because the whole context of each version is always available.

As you can see, in (at least) this case it is just a matter of considering
the blank lines worthy of being presented as common, or not.

I'll note that in this particular case, `git diff` yielded the very same
results with or without --patience. I don't know why that is; Johannes?
I'll also note that /usr/bin/diff produces (in this case) something
closer to patience than to git.

I'm attaching both versions of the file in case they are useful to
anybody.

--
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
I promise you. Once I enter into an exclusive relationship, I sleep with
very few people.
                -- Denny Crane

Re: [PATCH 0/3] Teach Git about the patience diff algorithm

Linus Torvalds-3


On Thu, 1 Jan 2009, Adeodato Simó wrote:
>
> For me, the cases where I find patience output to be of substantially
> higher readability are those involving a rewrite of several consecutive
> paragraphs (i.e., lines of code separated by blank lines). Compare:

I don't think that's a "patience diff" issue.

That's simply an issue of merging consecutive diff fragments together if
they are close-by, and that's independent of the actual diff algorithm
itself.

> I'll note that in this particular case, `git diff` yielded the very same
> results with or without --patience. I don't know why that is, Johannes?
> I'll also note that /usr/bin/diff produces (in this case) something
> closer to patience than to git.

See above - I really don't think this has anything to do with "patience vs
non-patience". It's more akin to the things we do for our merge conflict
markers: if we have two merge conflicts next to each other, with just a
couple of lines in between, we coalesce the merge conflicts into one
larger one instead.

We don't do that for regular diffs - they're always kept minimal (ok, not
really minimal, but as close to minimal as the algorithm finds them).

See commit f407f14deaa14ebddd0d27238523ced8eca74393 for the git merge
conflict merging. We _could_ do similar things for regular diffs. It's
sometimes useful, sometimes not.
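
That coalescing idea can be sketched roughly like this (an illustrative
simplification, not git's actual code; the hunk representation and the
gap threshold are made up for the example):

```python
def coalesce_hunks(hunks, max_gap=3):
    """Merge diff hunks separated by at most max_gap lines of context.

    Each hunk is a (start, end) line range in the original file, and
    the input list is assumed sorted by start line.
    """
    if not hunks:
        return []
    merged = [hunks[0]]
    for start, end in hunks[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end <= max_gap:
            # Close enough: fold this hunk into the previous one.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

With a gap threshold of 3, two hunks two lines apart collapse into one
larger hunk, while a distant hunk stays separate.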

                Linus

Re: [PATCH 0/3] Teach Git about the patience diff algorithm

Clemens Buchacher
On Thu, Jan 01, 2009 at 05:56:13PM -0800, Linus Torvalds wrote:
> See above - I really don't think this has anything to do with "patience vs
> non-patience". It's more akin to the things we do for our merge conflict
> markers: if we have two merge conflicts next to each other, with just a
> couple of lines in between, we coalesce the merge conflicts into one
> larger one instead.

But patience diff does more than that. Take a look at "git diff" and "git
diff --patience" output below (taken from [1]). You can see that patience
diff produces two separate hunks, one removing a function and one adding a
function. Merging consecutive diff fragments would have produced one big
hunk instead.

*** git diff ****************************** git diff --patience **********
 #include <stdio.h>                   | #include <stdio.h>
                                      |
+int fib(int n)                       |-// Frobs foo heartily
+{                                    |-int frobnitz(int foo)
+    if(n > 2)                        |+int fib(int n)
+    {                                | {
+        return fib(n-1) + fib(n-2);  |-    int i;
+    }                                |-    for(i = 0; i < 10; i++)
+    return 1;                        |+    if(n > 2)
+}                                    |     {
+                                     |-        printf("Your answer is: ");
 // Frobs foo heartily                |-        printf("%d\n", foo);
 int frobnitz(int foo)                |+        return fib(n-1) + fib(n-2);
 {                                    |     }
     int i;                           |+    return 1;
     for(i = 0; i < 10; i++)          | }
     {                                |
-        printf("Your answer is: ");  |-int fact(int n)
         printf("%d\n", foo);         |+// Frobs foo heartily
     }                                |+int frobnitz(int foo)
 }                                    | {
                                      |-    if(n > 1)
-int fact(int n)                      |+    int i;
-{                                    |+    for(i = 0; i < 10; i++)
-    if(n > 1)                        |     {
-    {                                |-        return fact(n-1) * n;
-        return fact(n-1) * n;        |+        printf("%d\n", foo);
-    }                                |     }
-    return 1;                        |-    return 1;
-}                                    | }
-                                     |
 int main(int argc, char **argv)      | int main(int argc, char **argv)
 {                                    | {
-    frobnitz(fact(10));              |-    frobnitz(fact(10));
+    frobnitz(fib(10));               |+    frobnitz(fib(10));
 }                                    | }

[1] http://alfedenzo.livejournal.com/170301.html
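
The anchoring that produces those two clean hunks — lines unique in both
files, matched via a longest increasing subsequence — can be sketched as
follows (an illustrative simplification of the patience core, not git's
xdiff implementation):

```python
import bisect

def patience_anchors(a, b):
    """Return (i, j) index pairs of lines that are unique in both a and
    b, forming the longest match sequence increasing in both files."""
    count_a, count_b = {}, {}
    for line in a:
        count_a[line] = count_a.get(line, 0) + 1
    for line in b:
        count_b[line] = count_b.get(line, 0) + 1
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}

    # (index in a, index in b) for lines unique on both sides, in a-order.
    pairs = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]

    # Longest increasing subsequence of the b-indices ("patience
    # sorting"), keeping backpointers so we can recover the sequence.
    tails, tail_idx, prev = [], [], [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        k = bisect.bisect_left(tails, j)
        if k > 0:
            prev[idx] = tail_idx[k - 1]
        if k == len(tails):
            tails.append(j)
            tail_idx.append(idx)
        else:
            tails[k] = j
            tail_idx[k] = idx

    anchors, idx = [], tail_idx[-1] if tail_idx else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = prev[idx]
    return anchors[::-1]
```

The diff then recurses between consecutive anchors, which is why the
moved function bodies above stay intact instead of being interleaved.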

Re: [PATCH 0/3] Teach Git about the patience diff algorithm

Clemens Buchacher
Only two choices, and I still get it wrong. The diffs should be labeled the
other way around, of course.

*** git diff --patience ******************** git diff ********************
 #include <stdio.h>                   | #include <stdio.h>
                                      |
+int fib(int n)                       |-// Frobs foo heartily
+{                                    |-int frobnitz(int foo)
+    if(n > 2)                        |+int fib(int n)
+    {                                | {
+        return fib(n-1) + fib(n-2);  |-    int i;
+    }                                |-    for(i = 0; i < 10; i++)
+    return 1;                        |+    if(n > 2)
+}                                    |     {
+                                     |-        printf("Your answer is: ");
 // Frobs foo heartily                |-        printf("%d\n", foo);
 int frobnitz(int foo)                |+        return fib(n-1) + fib(n-2);
 {                                    |     }
     int i;                           |+    return 1;
     for(i = 0; i < 10; i++)          | }
     {                                |
-        printf("Your answer is: ");  |-int fact(int n)
         printf("%d\n", foo);         |+// Frobs foo heartily
     }                                |+int frobnitz(int foo)
 }                                    | {
                                      |-    if(n > 1)
-int fact(int n)                      |+    int i;
-{                                    |+    for(i = 0; i < 10; i++)
-    if(n > 1)                        |     {
-    {                                |-        return fact(n-1) * n;
-        return fact(n-1) * n;        |+        printf("%d\n", foo);
-    }                                |     }
-    return 1;                        |-    return 1;
-}                                    | }
-                                     |
 int main(int argc, char **argv)      | int main(int argc, char **argv)
 {                                    | {
-    frobnitz(fact(10));              |-    frobnitz(fact(10));
+    frobnitz(fib(10));               |+    frobnitz(fib(10));
 }                                    | }

Re: [PATCH 0/3] Teach Git about the patience diff algorithm

Junio C Hamano
In reply to this post by Clemens Buchacher
Clemens Buchacher <[hidden email]> writes:

> But patience diff does more than that. Take a look at "git diff" and "git
> diff --patience" output below (taken from [1]).
>
> *** git diff ****************************** git diff --patience **********
>  #include <stdio.h>                   | #include <stdio.h>

I suspect you have the above labels backwards...