How to make git diff-* ignore some patterns?

Dirk Süsserott
Hi list,

is there a way to tell "git diff-index" to ignore some special patterns,
such that /^-- Dump completed on .*$/ is NOT recognized as a difference
and "git diff-index" returns 0 if that's the only difference?

     -- Dirk

<Background>
I have a MySQL database which I back up daily using mysqldump (cronjob).
The result is a text file (*.sql) with all the "create" and "insert"
statements and some metadata.
I used to use tar and gzip to back up these files and ended up with a huge
collection of backups over the last three years (500+ MB).
Then I switched to Git and recorded only the diffs between day X and day
X-1. My repository shrank to 16 MB for the very same data, which was great!

My database doesn't change every day, but I back it up anyway and store
the backup files with Git and a cronjob. It does:

---------------
mysqldump ... -r <backupfile> # that's the output file ;-)
git add <backupfile>
if ! git diff-index --quiet HEAD --; then
     git commit -m "Backup of <database> at <timestamp>"
fi
---------------

This way, a new commit is only done when the backupfile has changed. So
far, so perfect.
A few days ago my web host (where the database actually resides)
changed the MySQL version.
mysqldump now writes "-- Dump completed on <timestamp>" to the file and
Git correctly recognizes this as a change and my script creates a new
commit. Every day, even if only that line has changed.

I'd like to skip these commits if only the "Dump completed" line has
changed.
</Background>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: How to make git diff-* ignore some patterns?

MichaelJGruber
Dirk Süsserott venit, vidit, dixit 21.11.2009 17:40:

> Hi list,
>
> is there a way to tell "git diff-index" to ignore some special patterns,
> such that /^-- Dump completed on .*$/ is NOT recognized as a difference
> and "git diff-index" returns 0 if that's the only difference?
>
>      -- Dirk
>
> <Background>
> I have a MySQL database which I back up daily using mysqldump (cronjob).
> The result is a text file (*.sql) with all the "create" and "insert"
> statements and some metadata.
> I used to use tar and gzip to back up these files and ended up with a huge
> collection of backups over the last three years (500+ MB).
> Then I switched to Git and recorded only the diffs between day X and day
> X-1. My repository shrank to 16 MB for the very same data, which was great!
>
> My database doesn't change every day, but I back it up anyway and store
> the backup files with Git and a cronjob. It does:
>
> ---------------
> mysqldump ... -r <backupfile> # that's the output file ;-)
> git add <backupfile>
> if ! git diff-index --quiet HEAD --; then
>      git commit -m "Backup of <database> at <timestamp>"
> fi
> ---------------
>
> This way, a new commit is only done when the backupfile has changed. So
> far, so perfect.
> A few days ago my web host (where the database actually resides)
> changed the MySQL version.
> mysqldump now writes "-- Dump completed on <timestamp>" to the file and
> Git correctly recognizes this as a change and my script creates a new
> commit. Every day, even if only that line has changed.
>
> I'd like to skip these commits if only the "Dump completed" line has
> changed.
> </Background>

Is the dump guaranteed to be in a specific order? If yes, then this
procedure makes sense. (PDFs etc. are problematic because of reordering.)

You can either egrep -v through the output of git diff-index -p, or define
a diff driver: set the "diff" attribute to, say, "dumpdiff" for dump files
(see gitattributes) and define the diff driver as
git config diff.dumpdiff.textconv dumpdiff.sh
where dumpdiff.sh is "egrep -v ...". You may need to call diff-index
with --textconv. I haven't tried, though ;)
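
The egrep -v idea can also be sketched without parsing patch output at all: drop the timestamp line from both versions and compare what is left. This is a variation on the suggestion, not what was posted. The helper name same_ignoring_dump_line and its two-file interface are made up for illustration; only the pattern comes from the thread.

```shell
#!/bin/sh
# Pattern taken from the thread; everything else here is illustrative.
DUMP_LINE='^-- Dump completed on .*$'

# Hypothetical helper: exit status 0 when the two files are identical
# once lines matching DUMP_LINE are dropped from both, 1 when they differ.
same_ignoring_dump_line() {
    old_filtered=$(mktemp) && new_filtered=$(mktemp) || return 2
    # grep -v keeps every line that does NOT match the pattern.
    grep -Ev -e "$DUMP_LINE" "$1" > "$old_filtered"
    grep -Ev -e "$DUMP_LINE" "$2" > "$new_filtered"
    # cmp -s is silent and reports equality through its exit status only.
    cmp -s "$old_filtered" "$new_filtered"
    status=$?
    rm -f "$old_filtered" "$new_filtered"
    return "$status"
}
```

In the cron script, the previous version could come from `git show HEAD:<backupfile>` written to a temporary file, with `git add` and `git commit` run only when the helper reports a difference.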

Cheers,
Michael

Re: How to make git diff-* ignore some patterns?

Björn Steinbrink
In reply to this post by Dirk Süsserott
On 2009.11.21 17:40:14 +0100, Dirk Süsserott wrote:
> is there a way to tell "git diff-index" to ignore some special
> patterns, such that /^-- Dump completed on .*$/ is NOT recognized as
> a difference and "git diff-index" returns 0 if that's the only
> difference?

If you don't mind losing that line, you could use a clean filter via
.gitattributes:

echo '*.sql filter=mysql_dump' >> .gitattributes
git config filter.mysql_dump.clean "sed -e '/^-- Dump completed on .*$/d'"

That way, git will filter all *.sql paths through that sed command
before storing them as blobs, dropping that "Dump completed" line from
the data stored in the repo.
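
To see the effect before touching a real backup repository, the filter can be tried in a throwaway repository. This is only a sketch: the file name dump.sql, the sample contents, and the dummy identity are assumptions for illustration; the two configuration commands are the ones given above.

```shell
#!/bin/sh
# Sketch: exercise the clean filter in a temporary repository.
# dump.sql and its contents are made up for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Backup Test"            # dummy identity so commit works
git config user.email "backup@example.invalid"

# The two commands from the message above.
echo '*.sql filter=mysql_dump' >> .gitattributes
git config filter.mysql_dump.clean "sed -e '/^-- Dump completed on .*$/d'"

printf 'INSERT INTO t VALUES (1);\n-- Dump completed on 2009-11-20 3:00:00\n' > dump.sql
git add .gitattributes dump.sql
git commit -q -m "Initial backup"

# Simulate the next day's dump: only the timestamp line differs.
printf 'INSERT INTO t VALUES (1);\n-- Dump completed on 2009-11-21 3:00:00\n' > dump.sql
git add dump.sql

# Same check as in the original cron script.
if git diff-index --quiet HEAD --; then
    result="no commit needed"
else
    result="commit needed"
fi
echo "$result"
```

With the filter active, the check should report that no commit is needed, and `git cat-file -p HEAD:dump.sql` should show the dump without the timestamp line. The file in the working tree keeps the line; only the stored blob loses it.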

Björn

Re: How to make git diff-* ignore some patterns?

Dirk Süsserott
On 21.11.2009 19:07, Björn Steinbrink wrote:

> On 2009.11.21 17:40:14 +0100, Dirk Süsserott wrote:
>> is there a way to tell "git diff-index" to ignore some special
>> patterns, such that /^-- Dump completed on .*$/ is NOT recognized as
>> a difference and "git diff-index" returns 0 if that's the only
>> difference?
>
> If you don't mind losing that line, you could use a clean filter via
> .gitattributes:
>
> echo '*.sql filter=mysql_dump' >> .gitattributes
> git config filter.mysql_dump.clean "sed -e '/^-- Dump completed on .*$/d'"
>
> That way, git will filter all *.sql paths through that sed command
> before storing them as blobs, dropping that "Dump completed" line from
> the data stored in the repo.
>
> Björn
>

Thank you Björn and Michael,

Your suggestions were really helpful. I decided to use Björn's 'clean
filter' approach. It works great.

-- Dirk