Windows threading problems in drivers



avid
August 3rd, 2004, 07:18 AM
Ben,

This might get a bit complicated, but stick with it!

I have analysed the issues which caused Rob’s lock-ups when using scrolling (marquee) text and the steps I had to take to avoid them. These have significant consequences for safe usage of the driver API.

Rob’s example showed up two problems.

1) In his use of LDJ, the Girder LUA was sending changes to NR variables which were being displayed in scrolling transparent text boxes. Each change calls the text box driver’s VariableChanged method. As part of the implementation of the text boxes I have to act on the “window” which is the on-screen representation of the text button. In particular it is necessary to first hide and then show transparent text, to capture the correct background. But the call to VariableChanged is made on the Girder driver’s listen thread! This thread is not allowed to do Windows actions in-line. So I had to add a deferred mechanism to move the Windows actions from the VariableChanged method onto a “safe” thread.

2) More subtly, it is possible for the first call of PaintButton for a button to be made on the Girder driver’s listen thread. This arises when a button is on a hidden frame, and then Girder LUA sends a change to the STATE variable for the frame. If this makes the frame visible, the driver sees a PaintButton call for the first time. It is not allowed to create a window on this thread, and doing so will lead to threading issues. Effectively it means that it is unsafe to do any window painting inside PaintButton!! Again I had to add a deferred mechanism to move the Windows actions from the PaintButton method onto a “safe” thread.

In both cases, I could manage the deferred call as I was sub-classing your main window and so could post it a couple of private WM_ messages which my WndProc would handle knowing that it was safe to act. I have not yet uploaded the source with these changes, but will do so this evening.
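
The deferred mechanism described above can be sketched in portable C++. This is only an illustration of the idea, not the AvidUtils source: the class name DeferredQueue and its Post/Drain methods are hypothetical, and a mutex-protected queue stands in for the private WM_ messages posted to the sub-classed main window. The point it shows is that the listen thread only records what it wants done, and the owning thread does the work later.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for "post a private WM_ message to the main window".
// Any thread may call Post(); only the owning ("safe") thread calls Drain().
class DeferredQueue {
public:
    DeferredQueue() : owner_(std::this_thread::get_id()) {}

    // Called from any thread, e.g. from inside VariableChanged or PaintButton.
    void Post(std::function<void()> action) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(action));
    }

    // Called on the owning thread, e.g. from the WndProc handler.
    // Returns the number of actions that were run.
    int Drain() {
        assert(std::this_thread::get_id() == owner_);
        int ran = 0;
        for (;;) {
            std::function<void()> action;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (pending_.empty()) break;
                action = std::move(pending_.front());
                pending_.pop();
            }
            action();  // run outside the lock so an action may Post() again
            ++ran;
        }
        return ran;
    }

private:
    std::thread::id owner_;
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};

// Demo: a "listen thread" requests a repaint; the owner thread performs it.
// Returns true if the action ran on the calling (owner) thread.
bool DemoDeferredRepaint() {
    DeferredQueue queue;
    std::thread::id ran_on;
    std::thread listener([&] {
        queue.Post([&] { ran_on = std::this_thread::get_id(); });
    });
    listener.join();
    queue.Drain();
    return ran_on == std::this_thread::get_id();
}
```

In a real driver, Post() would presumably be followed by a PostMessage to the main window so that Drain() is triggered from the message loop rather than called by hand.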

Does the explanation make sense? Do you see the issues??

As I see it, there are four options going forward post-1.0:

1) Leave it as it is, with the AvidUtils source and this posting acting as guidance to future driver writers. It may then be necessary for any (every) driver which displays anything to sub-class the main window.

2) Leave it as it is, but make it clear in SDK documentation what the Windows re-entrancy restrictions may be.

3) Add a new “call me back on a safe thread” call which the driver can use to ensure that Windows operations are safely allowed.

4) Change NR so that calls to VariableChanged and PaintButton are only made from the Windows message thread, which is always safe.

What do you think?

Brian

Ben S
August 3rd, 2004, 08:01 AM
It sounds like 4 would be the best option. When the variable changes, it can see if there are any watches on the variable.

If so, it can post a variable changed message to the main window's thread, which will then be responsible for dispatching the variable change.
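
A rough sketch of that flow, under stated assumptions: VariableDispatcher, AddWatch, SetVariable and DispatchPending are hypothetical names invented for illustration, not NetRemote's actual API, and a locked vector stands in for the Windows message queue. It shows the watch check at the time of the change and the dispatch happening later on the main thread.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of option 4: the host queues variable changes and the
// main thread alone calls the drivers' watch callbacks.
class VariableDispatcher {
public:
    using Watch = std::function<void(const std::string& name,
                                     const std::string& value)>;

    void AddWatch(const std::string& name, Watch watch) {
        std::lock_guard<std::mutex> lock(mutex_);
        watches_[name].push_back(std::move(watch));
    }

    // Safe to call from any thread. Returns true if a change was queued,
    // false if nobody is watching the variable.
    bool SetVariable(const std::string& name, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        values_[name] = value;
        if (watches_.find(name) == watches_.end()) return false;
        queued_.emplace_back(name, value);
        return true;
    }

    // Main thread only: deliver queued changes to their watchers.
    // Returns the number of watch callbacks invoked.
    int DispatchPending() {
        std::vector<std::pair<std::string, std::string>> batch;
        std::map<std::string, std::vector<Watch>> watches;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(queued_);
            watches = watches_;  // copy so callbacks run without the lock
        }
        int calls = 0;
        for (const auto& change : batch) {
            for (const auto& watch : watches[change.first]) {
                watch(change.first, change.second);
                ++calls;
            }
        }
        return calls;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::string> values_;
    std::map<std::string, std::vector<Watch>> watches_;
    std::vector<std::pair<std::string, std::string>> queued_;
};
```

With this shape a driver's callback never runs on the thread that set the variable, so the driver itself needs no deferral logic.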

As this is causing some stability problems, I am going to escalate this to a fix for 1.0. I'll do so this evening and get something into your hands to check.

Thanks Brian.

avid
August 3rd, 2004, 08:26 AM
Sounds good to me. Obviously that leads to the "cleanest" driver solution. I am only worried that by raising this *now* I am just adding a further delay to 1.0.

I will need to get Rob to test any changes to NR and my driver, as I don't have a LDJ setup. But I guess we can turn it around pretty quickly.

Brian

Ben S
August 3rd, 2004, 08:32 AM
The rest of 1.0 is pretty much "ready" to go, we're waiting for some work to be finished on the IR dll that the IR plugin uses, so I have time for this now.

I've been working on the migration here (done) and the designer (coming along nicely).

avid
August 3rd, 2004, 08:52 AM
Quote (Ben S): The rest of 1.0 is pretty much "ready" to go
Does that mean you have found (or at least understood) the disconnection issues I was having??

Ben S
August 3rd, 2004, 08:54 AM
No, did you get the code I sent you to take a look at?

avid
August 3rd, 2004, 09:03 AM
I did, but I haven't spotted anything yet. I was more concerned about my windows threading problems.

Is there any chance you could try the recipe I posted with my pre-release uploaded ZP driver and CCF, and with any recent copy of Zoom (4.0 or 4.01):

I would think that reproducing it should be as simple as: Run ZP and play a media file; Open the ZP CCF, confirming that it is updating OK; Quit ZP; Try to do something else with NR - it will be like treacle. At least that's what I am seeing. There seems to be no need to wait for a suspend.

There is no problem with the Win32 version, which lulled me into a false sense of security.
Meanwhile, I will re-read your sockets code to see if I have any misconceptions as to how it should be used.

Brian

Ben S
August 3rd, 2004, 09:09 AM
Okay. I'll recheck Zoom this evening, bringing Zoom down while connected via PPC.

If needed, I can pass you the other build files for the PluginSocket library so you can run some checks in there.

Thanks Brian.

avid
August 3rd, 2004, 09:17 AM
Quote (Ben S): If needed, I can pass you the other build files for the PluginSocket library so you can run some checks in there.
It might be useful. That way I can at least add tracing to see the flow. But it might be a day or two before I can get a large enough window of time in the evenings to have a good run at it.

Brian

avid
August 3rd, 2004, 05:31 PM
Ben,

One piece of "suspicious" coding is in PluginSocket::ListenThread. Are you totally sure that if select() returns SOCKET_ERROR, the fd_set parameters are cleared? And are you sure on all platforms - like on a PPC??

Imagine if this is not so and the remote end of a socket were closed. The select would return an error, but it would not get noticed because m_socket would still be in fdset, and so ReceivePending would be called. ReceivePending calls ReadLine, which calls recv. I am not sure that recv behaves well in this case.

Maybe the select() call should check for an error?

This is probably *not* the problem, as I can't see how the client then gets stuck in a loop, but I thought it worth mentioning just in case.
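
A minimal sketch of such a check, written against POSIX select() rather than Winsock so that it is self-contained. WaitReadable and WaitResult are hypothetical names, not part of PluginSocket. On POSIX systems the descriptor sets are left unmodified when select() fails, which is exactly the case worried about above; what Winsock on a PPC does is not asserted here. With Winsock the failure test would compare against SOCKET_ERROR instead.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

enum class WaitResult { Ready, Timeout, Error };

// Waits up to timeout_ms for fd to become readable.
// The fd_set is inspected only after select() has reported success.
WaitResult WaitReadable(int fd, int timeout_ms) {
    if (fd < 0 || fd >= FD_SETSIZE) return WaitResult::Error;

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &readfds, nullptr, nullptr, &tv);
    if (rc < 0) return WaitResult::Error;    // includes EINTR in this sketch
    if (rc == 0) return WaitResult::Timeout;
    return FD_ISSET(fd, &readfds) ? WaitResult::Ready : WaitResult::Timeout;
}
```

A listen loop built on this would call its receive routine only for Ready, and treat Error as a reason to close or reconnect the socket.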

Brian

Ben S
August 3rd, 2004, 07:40 PM
Okay. I'll send you the PluginSocket project so you can take a look.

It's working -great- from here. Exiting ZoomPlayer (newest version just downloaded) and NetRemote PPC still working fine. Opening ZP back up and NR PPC picks it back up again.

Regarding select, it was my understanding that if the socket was closed on the server it would not be in the readfds, only if it was closed on the client. There is no harm in wrapping the select with a check for error. I'll do that and add the threading code we spoke about and fire something off to you.

avid
August 4th, 2004, 03:47 AM
Very strange that it's working fine for you. The recipe shows the problem 100% for me. So I guess I need to instrument a copy of the sockets code to try to find out where it's looping or hanging on my PPC.

Brian

avid
August 4th, 2004, 03:22 PM
Well it's working fine for me now. And *all* I changed was to build with the newer SDK you sent me. That implies to me that somehow you "accidentally" fixed it in your other sockets changes.

The one thing I can't understand is how it worked for you - unless you decided to build it yourself from the source instead of using the uploaded DLL.

But - so far so good. Thanks.

Brian

Ben S
August 17th, 2004, 11:49 PM
No problem. Not sure what was going on exactly, but we're working flawlessly now.