purging unwanted history

Geoff Russell-3
I have a repository with 5 years' worth of history. I only want to keep
1 year, so I want to purge the first 4 years. As it happens, the
repository has only a single branch, which should simplify the problem.

Cheers,

Geoff Russell

P.S. Apologies: I've asked this question before but didn't get an answer
that I understood or that worked, so perhaps my description of the
problem was faulty. This is a second attempt.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: purging unwanted history

Björn Steinbrink
On 2008.11.17 10:56:23 +1030, Geoff Russell wrote:
> I have a repository with 5 years worth of history, I only want to keep
> 1 year, so I want to purge the first 4 years. As it happens, the
> repository only has a single branch which should simplify the problem.

Use filter-branch to drop the parents of the first commit you want to
keep, and then drop the old cruft.

Let's say $drop is the full hash of the latest commit you want to drop
(the sed expression below matches against full hashes, so an abbreviated
one won't work). To keep things sane and simple, make sure the first
commit you want to keep, i.e. the child of $drop, is not a merge commit.
Then you can use:

git filter-branch --parent-filter "sed -e 's/-p $drop//'" \
        --tag-name-filter cat -- \
        --all ^$drop

The above rewrites the parents of all commits that come "after" $drop.

Check the results with gitk.
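
If gitk isn't handy, the result can also be checked from the command
line. What follows is only a sketch in a scratch repository, not part of
the recipe itself: the temporary directory, file.txt, the five sample
commits and the FILTER_BRANCH_SQUELCH_WARNING setting (which newer Git
versions honor to skip a warning pause) are assumptions made for
illustration.

```shell
#!/bin/sh
# Sketch: run the parent rewrite in a scratch repository and inspect it.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: only needed on newer Git

repo=$(mktemp -d)                        # hypothetical scratch location
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Five small commits stand in for five years of history.
for n in 1 2 3 4 5; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# Full hash of the latest commit to drop ("commit 2" here).
drop=$(git rev-parse HEAD~3)

# Same rewrite as above: strip $drop from the parent list of its child.
git filter-branch --parent-filter "sed -e 's/-p $drop//'" \
    --tag-name-filter cat -- \
    --all ^"$drop" >/dev/null

# Text-mode checks in place of gitk.
git log --format='%h %s'                 # remaining commits, newest first
git rev-list --count HEAD                # number of commits left on the branch
git rev-list --max-parents=0 HEAD        # the new root commit
```

In this scratch run the branch ends up with three commits, and the new
root is the rewritten "commit 3".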


Then clean out all the old cruft.

First, the backup references from filter-branch:

git for-each-ref --format='%(refname)' refs/original | \
        while read ref
        do
                git update-ref -d "$ref"
        done

Then clean your reflogs:

git reflog expire --expire=0 --all

And finally, repack and drop all the old unreachable objects:

git repack -ad
git prune # For objects that repack -ad might have left around

At that point, everything leading up to and including $drop should be
gone.
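
To convince yourself the old objects really are gone, something like the
following may help. It is a hedged sketch, not the exact procedure above:
it fakes the leftover history in a scratch repository with a hand-made
commit under refs/original (the directory, file name and messages are
made up), and it uses --expire=now, a spelling current Git accepts, in
place of --expire=0.

```shell
#!/bin/sh
# Sketch: the three cleanup steps, followed by a check that an
# unreachable commit has actually been removed from the object store.
set -e

repo=$(mktemp -d)                        # hypothetical scratch location
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
echo "kept" > file.txt
git add file.txt
git commit -q -m "kept history"

# Simulate old cruft: a commit reachable only through a backup ref of
# the kind filter-branch leaves under refs/original.
tree=$(git rev-parse 'HEAD^{tree}')
old=$(git commit-tree "$tree" -m "old cruft")
git update-ref refs/original/refs/heads/old "$old"

# 1. Delete the backup references.
git for-each-ref --format='%(refname)' refs/original |
while read -r ref; do
    git update-ref -d "$ref"
done

# 2. Expire all reflog entries.
git reflog expire --expire=now --all

# 3. Repack what is reachable, then prune loose leftovers.
git repack -adq
git prune

# cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$old" 2>/dev/null; then
    echo "old commit is still present"
else
    echo "old commit is gone"
fi
git count-objects -v                     # loose/packed object statistics
```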

HTH
Björn

Re: purging unwanted history

Björn Steinbrink
On 2008.11.17 03:24:12 +0100, Björn Steinbrink wrote:

> At that point, everything leading up to and including $drop should be
> gone.

Hm, on second thought, if you have tags referencing some of the old
history, they'll still be around, I think. Just delete those before you
start the rewriting.

And of course do the above with a copy of your repo. Just in case.
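
One possible way to find and delete the tags that point into the old
history, sketched in a scratch repository. The tag names, file.txt and
the temporary directory are invented for the example, and it assumes a
Git recent enough to support "git tag --merged".

```shell
#!/bin/sh
# Sketch: delete every tag that is reachable from $drop, i.e. every tag
# that points into the history about to be thrown away.
set -e

repo=$(mktemp -d)                        # hypothetical scratch location
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Three commits, each with its own (made-up) tag.
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
    git tag "v$n"
done

drop=$(git rev-parse HEAD~1)             # "commit 2": latest commit to drop

# --merged lists tags whose commits are reachable from $drop.
git tag --merged "$drop" |
while read -r tag; do
    git tag -d "$tag" >/dev/null
done

git tag                                  # tags that survive
```

Here v1 and v2 are removed and only v3 is left, since it points at a
commit that will be kept.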

Björn

Re: purging unwanted history

Geoff Russell-3
On Mon, Nov 17, 2008 at 12:57 PM, Björn Steinbrink <[hidden email]> wrote:

> Hm, on second thought, if you have tags referencing some of the old
> history, they'll still be around, I think. Just delete those before you
> start the rewriting.
>
> And of course do the above with a copy of your repo. Just in case.
>
> Björn
>

Great, I've just tested this and it is exactly what I want. I'm still
getting my head around why, but understanding will arrive with a little
more thought.


Many thanks,

Geoff




Re: purging unwanted history

Marcel M. Cary-2
Geoff,

I'm able to prune history with git filter-branch.  For example, to throw
away history on the current branch before commit
171d7661eda111d3e35f6e8097a1a3a07b30026c, I tried:

git filter-branch --parent-filter '
    if [ "$GIT_COMMIT" = 171d7661eda111d3e35f6e8097a1a3a07b30026c ]; then
        echo ""
    else
        cat
    fi'

I found the diff between that commit and its rewritten version was
empty, and diffs to subsequent commits looked sane.  It took an hour on
the git repository with about 16k commits.  I probably should have
excluded all the commits I didn't want to keep to reduce processing time.
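
A rough sketch of how the unwanted commits might be excluded, so that
filter-branch only walks the commits being kept. It runs in a scratch
repository; the $keep and $parent variables, file.txt, the sample
commits and the FILTER_BRANCH_SQUELCH_WARNING setting (honored by newer
Git versions) are assumptions for illustration, not what I actually ran.

```shell
#!/bin/sh
# Sketch: same parent filter, but limited to commits not reachable from
# the parent of the first commit to keep.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: only needed on newer Git

repo=$(mktemp -d)                        # hypothetical scratch location
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

for n in 1 2 3 4 5; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

keep=$(git rev-parse HEAD~2)             # "commit 3": first commit to keep
parent=$(git rev-parse "$keep^")         # everything up to here is skipped
export keep                              # visible inside the filter script

# "HEAD ^$parent" restricts the rewrite to the commits after $parent.
git filter-branch --parent-filter '
    if [ "$GIT_COMMIT" = "$keep" ]; then
        echo ""
    else
        cat
    fi' -- HEAD ^"$parent" >/dev/null

git log --format='%h %s'                 # remaining commits, newest first
```

In the scratch run only three commits are rewritten, and "commit 3"
becomes the new root.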

However, after deleting all but the rewritten branch and cloning the
repository, I didn't notice any decrease in the size of .git/, so I'm
not sure why you'd want to do that.  Also, all the remaining commit IDs
changed, so any previous clones would have a tough time merging with
yours.

Another possibility, whose results might be similar in runtime and
repository size, would be to run git rebase --interactive and squash
together all the commits before the ones you want to keep.

Marcel


