Make the git codebase thread-safe

classic Classic list List threaded Threaded
49 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Stefan Zager
On Fri, Feb 14, 2014 at 11:04 AM, Karsten Blees <[hidden email]> wrote:
>
> Damn...you're right, multi-threaded git-index-pack works fine, but some tests fail badly. Mixed reads would have to be from git_mmap, which is the only other caller of pread().

msysgit used git_mmap() as defined in compat/win32mmap.c, which does
not use pread.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Karsten Blees-2
In reply to this post by Zachary Turner
Am 14.02.2014 20:16, schrieb Zachary Turner:

> For the mixed read, we wouldn't be looking for another caller of
> pread() (since it doesn't care what the file pointer is), but instead
> a caller of read() or lseek() (since those do depend on the current
> file pointer).  In index-pack.c, I see two possible culprits:
>
> 1) A call to xread() from inside fill()
> 2) A call to lseek in parse_pack_objects()
>
> Do you think these could be related?  If so, maybe that opens up some
> other solutions?
>

Yeah, I think that's it. The problem is that the single-threaded part (parse_pack_objects/parse_pack_header) _also_ calls pread (via sha1_object -> get_data_from_pack -> unpack_data). So a pread() that modifies the file position would naturally be bad in this single-threaded scenario. Incidentally, that's exactly what the lstat64 in the version below fixes (similar to git_pread).

> BTW, the version you posted isn't thread safe.

It is true that, in a multi-threaded scenario, my version modifies the file position in some indeterministic way. However, as you noted above, the file position is irrelevant to pread(), so that's perfectly thread-safe, as long as all threads use pread() exclusively.

Using [x]read() in one of the threads would _not_ be thread-safe, but we're not doing that here. Both fill()/xread() and parse_pack_objects()/lseek() are unreachable from threaded_second_pass(), and the main thread just waits for the background threads to complete...

>>> A simple alternative to ReOpenHandle is to reset the file pointer to its
>>> original position, as in compat/pread.c::git_pread. Thus single-theaded code
>>> can mix read()/pread() at will, but multi-threaded code has to use pread()
>>> exclusively (which is usually the case anyway). A main thread using read()
>>> and background threads using pread() (which is technically allowed by POSIX)
>>> will fail with this solution.
>>>
>>> This version passes the test suite on msysgit:
>>>
>>> ----8<----
>>> ssize_t mingw_pread(int fd, void *buf, size_t count, off64_t offset)
>>> {
>>>         DWORD bytes_read;
>>>         OVERLAPPED overlapped;
>>>         off64_t current;
>>>         memset(&overlapped, 0, sizeof(overlapped));
>>>         overlapped.Offset = (DWORD) offset;
>>>         overlapped.OffsetHigh = (DWORD) (offset >> 32);
>>>
>>>         current = lseek64(fd, 0, SEEK_CUR);
>>>
>>>         if (!ReadFile((HANDLE)_get_osfhandle(fd), buf, count, &bytes_read,
>>> &overlapped)) {
>>>                 errno = err_win_to_posix(GetLastError());
>>>                 return -1;
>>>         }
>>>
>>>         lseek64(fd, current, SEEK_SET);
>>>
>>>         return (ssize_t) bytes_read;
>>> }
>>>
>>

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Duy Nguyen
In reply to this post by Zachary Turner
On Sat, Feb 15, 2014 at 2:16 AM, Zachary Turner <[hidden email]> wrote:

> (Gah, sorry if you're receiving multiple emails to your personal
> addresses, I need to get used to manually setting Plain-text mode
> every time I send a message).
>
> For the mixed read, we wouldn't be looking for another caller of
> pread() (since it doesn't care what the file pointer is), but instead
> a caller of read() or lseek() (since those do depend on the current
> file pointer).  In index-pack.c, I see two possible culprits:
>
> 1) A call to xread() from inside fill()
> 2) A call to lseek in parse_pack_objects()
>
> Do you think these could be related?  If so, maybe that opens up some
> other solutions?

For index-pack alone, what's wrong with open one file handle per thread?
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Stefan Zager
On Fri, Feb 14, 2014 at 4:45 PM, Duy Nguyen <[hidden email]> wrote:

> On Sat, Feb 15, 2014 at 2:16 AM, Zachary Turner <[hidden email]> wrote:
>> (Gah, sorry if you're receiving multiple emails to your personal
>> addresses, I need to get used to manually setting Plain-text mode
>> every time I send a message).
>>
>> For the mixed read, we wouldn't be looking for another caller of
>> pread() (since it doesn't care what the file pointer is), but instead
>> a caller of read() or lseek() (since those do depend on the current
>> file pointer).  In index-pack.c, I see two possible culprits:
>>
>> 1) A call to xread() from inside fill()
>> 2) A call to lseek in parse_pack_objects()
>>
>> Do you think these could be related?  If so, maybe that opens up some
>> other solutions?
>
> For index-pack alone, what's wrong with open one file handle per thread?

Nothing wrong with that, except that it would mean either using
thread-local storage (which the code doesn't currently use); or
plumbing pack_fd through the call stack, which doesn't sound very fun.

Stefan

> --
> Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Duy Nguyen
On Sat, Feb 15, 2014 at 7:50 AM, Stefan Zager <[hidden email]> wrote:

> On Fri, Feb 14, 2014 at 4:45 PM, Duy Nguyen <[hidden email]> wrote:
>> On Sat, Feb 15, 2014 at 2:16 AM, Zachary Turner <[hidden email]> wrote:
>>> (Gah, sorry if you're receiving multiple emails to your personal
>>> addresses, I need to get used to manually setting Plain-text mode
>>> every time I send a message).
>>>
>>> For the mixed read, we wouldn't be looking for another caller of
>>> pread() (since it doesn't care what the file pointer is), but instead
>>> a caller of read() or lseek() (since those do depend on the current
>>> file pointer).  In index-pack.c, I see two possible culprits:
>>>
>>> 1) A call to xread() from inside fill()
>>> 2) A call to lseek in parse_pack_objects()
>>>
>>> Do you think these could be related?  If so, maybe that opens up some
>>> other solutions?
>>
>> For index-pack alone, what's wrong with open one file handle per thread?
>
> Nothing wrong with that, except that it would mean either using
> thread-local storage (which the code doesn't currently use); or
> plumbing pack_fd through the call stack, which doesn't sound very fun.

Current code does use thread-local storage (struct thread_local and
get_thread_data). Adding a new file handle when NO_THREAD_SAFE_PREAD
is defined is simpler imo.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Zachary Turner
Even if we make that change to use TLS for this case, the
implementation of pread() will still change in such a way that the
semantics of pread() are different on Windows.  Is that ok?

Just to summarize, here's the viable approaches I've seen discussed so far:

1) Use _WINVER at compile time to select either a thread-safe or
non-thread-safe implementation of pread.  This is the easiest possible
code change, but would necessitate 2 binary distributions of git for
windows.
2) Use TLS as you suggest and have one fd per pack thread.  Probably
the most complicated code change (at least for me, being a first-time
contributor)
3) Use Karsten's suggested implementation from earlier in the thread.
Seems to work, but it's a little confusing from a readability
standpoint since the implementation is not-thread safe except in this
specific usage context.

On Fri, Feb 14, 2014 at 4:56 PM, Duy Nguyen <[hidden email]> wrote:

> On Sat, Feb 15, 2014 at 7:50 AM, Stefan Zager <[hidden email]> wrote:
>> On Fri, Feb 14, 2014 at 4:45 PM, Duy Nguyen <[hidden email]> wrote:
>>> On Sat, Feb 15, 2014 at 2:16 AM, Zachary Turner <[hidden email]> wrote:
>>>> (Gah, sorry if you're receiving multiple emails to your personal
>>>> addresses, I need to get used to manually setting Plain-text mode
>>>> every time I send a message).
>>>>
>>>> For the mixed read, we wouldn't be looking for another caller of
>>>> pread() (since it doesn't care what the file pointer is), but instead
>>>> a caller of read() or lseek() (since those do depend on the current
>>>> file pointer).  In index-pack.c, I see two possible culprits:
>>>>
>>>> 1) A call to xread() from inside fill()
>>>> 2) A call to lseek in parse_pack_objects()
>>>>
>>>> Do you think these could be related?  If so, maybe that opens up some
>>>> other solutions?
>>>
>>> For index-pack alone, what's wrong with open one file handle per thread?
>>
>> Nothing wrong with that, except that it would mean either using
>> thread-local storage (which the code doesn't currently use); or
>> plumbing pack_fd through the call stack, which doesn't sound very fun.
>
> Current code does use thread-local storage (struct thread_local and
> get_thread_data). Adding a new file handle when NO_THREAD_SAFE_PREAD
> is defined is simpler imo.
> --
> Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Duy Nguyen
On Sat, Feb 15, 2014 at 8:15 AM, Zachary Turner <[hidden email]> wrote:

> Even if we make that change to use TLS for this case, the
> implementation of pread() will still change in such a way that the
> semantics of pread() are different on Windows.  Is that ok?
>
> Just to summarize, here's the viable approaches I've seen discussed so far:
>
> 1) Use _WINVER at compile time to select either a thread-safe or
> non-thread-safe implementation of pread.  This is the easiest possible
> code change, but would necessitate 2 binary distributions of git for
> windows.
> 2) Use TLS as you suggest and have one fd per pack thread.  Probably
> the most complicated code change (at least for me, being a first-time
> contributor)

It's not so complicated. I suggested a patch [1] before (surprise!).

> 3) Use Karsten's suggested implementation from earlier in the thread.
> Seems to work, but it's a little confusing from a readability
> standpoint since the implementation is not-thread safe except in this
> specific usage contex

[1] http://article.gmane.org/gmane.comp.version-control.git/196042
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Junio C Hamano
Duy Nguyen <[hidden email]> writes:

> On Sat, Feb 15, 2014 at 8:15 AM, Zachary Turner <[hidden email]> wrote:
> ...
>> 2) Use TLS as you suggest and have one fd per pack thread.  Probably
>> the most complicated code change (at least for me, being a first-time
>> contributor)
>
> It's not so complicated. I suggested a patch [1] before (surprise!).
> ...
> [1] http://article.gmane.org/gmane.comp.version-control.git/196042

That message is at the tail end of the discussion. I wonder why
nothing came out of it back then.

While I do not see anything glaringly wrong with the change from a
quick glance over it, it would be nice to hear how well it performs
on their platform from Windows folks.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Make the git codebase thread-safe

Zachary Turner
It shouldn't be hard for us to run some tests with this patch applied.
 Will report back in a day or two.

On Tue, Feb 18, 2014 at 9:55 AM, Junio C Hamano <[hidden email]> wrote:

> Duy Nguyen <[hidden email]> writes:
>
>> On Sat, Feb 15, 2014 at 8:15 AM, Zachary Turner <[hidden email]> wrote:
>> ...
>>> 2) Use TLS as you suggest and have one fd per pack thread.  Probably
>>> the most complicated code change (at least for me, being a first-time
>>> contributor)
>>
>> It's not so complicated. I suggested a patch [1] before (surprise!).
>> ...
>> [1] http://article.gmane.org/gmane.comp.version-control.git/196042
>
> That message is at the tail end of the discussion. I wonder why
> nothing came out of it back then.
>
> While I do not see anything glaringly wrong with the change from a
> quick glance over it, it would be nice to hear how well it performs
> on their platform from Windows folks.
>
> Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
123