Pages: [1]
cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-08 05:01:48

I am getting this problem "error while computing" on a new computer I set up a few days ago. BOINC is running as a service under Windows Vista Home Premium on a dual core machine. It is using both processors and no GPU. It is a fresh install of the latest client software. Not all units are having this problem. The last day's report came back with 6 good and 10 errors, all the same.


Here is the link to the computer in question:

http://www.primegrid.com/results.php?hostid=135616


Here is what the errors all look like in greater detail:

Name LLR_SGS_42411056_2
Workunit 100035414
Created 4 Jan 2010 0:47:46 UTC
Sent 4 Jan 2010 0:58:36 UTC
Received 7 Jan 2010 3:20:52 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -185 (0xffffffffffffff47)
Computer ID 135616
Report deadline 11 Jan 2010 0:58:36 UTC
Run time 0
CPU time 0
stderr out

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
CreateProcess() failed - Access is denied. (0x5)
</message>
]]>

Validate state Invalid
Claimed credit 0
Granted credit 0
application version Sophie Germain Prime Search (LLR) v5.11
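
A note on reading the two codes in the result above. The exit status is printed both as a signed number and as an unsigned hex value, and the two are the same bit pattern read in two's complement. The sketch below only shows that arithmetic; the helper name `to_signed` is made up for illustration and is not part of BOINC. The `(0x5)` in the stderr text is the standard Win32 code ERROR_ACCESS_DENIED, which matches the "Access is denied" wording.

```python
def to_signed(hex_text):
    """Interpret a hex string as a two's-complement signed integer.

    The bit width is inferred from the number of hex digits, so the
    64-bit value printed by the project server is handled as written.
    """
    digits = hex_text[2:] if hex_text.lower().startswith("0x") else hex_text
    bits = 4 * len(digits)
    value = int(digits, 16)
    # If the top bit is set, the value represents a negative number.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


# Exit status exactly as shown in the task report.
print(to_signed("0xffffffffffffff47"))
```

So -185 and 0xffffffffffffff47 are one value, not two separate errors.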


Is this some sort of resource issue on the computer or what? Any ideas on what this is and how to fix it would be greatly appreciated.

Thanks,
cygnus
Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7312
Credits: 511,134,235
World-rank: 2,292

2010-01-08 14:27:57


Cygnus:

An occasional error [on the order of 2%] is not indicative of a problem.

. . . it would have to be real habitual before I would get concerned.



Guest

2010-01-08 15:06:52

cygnus wrote:
I am getting this problem "error while computing" on a new computer I set up a few days ago. BOINC is running as a service under Windows Vista Home Premium on a dual core machine. It is using both processors and no GPU. It is a fresh install of the latest client software. Not all units are having this problem. The last day's report came back with 6 good and 10 errors, all the same.


you posted the explanation:

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
CreateProcess() failed - Access is denied. (0x5)
</message>
]]>


most likely some kind of anti-virus thing is blocking them. try to exclude the boinc-folders if you dare.
Guest

2010-01-08 15:09:15

Sid2 wrote:

Cygnus:

An occasional error [on the order of 2%] is not indicative of a problem.

. . . it would have to be real habitual before I would get concerned.



I would really get concerned if i thought 10 out of 16 was less than 2%..
Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7312
Credits: 511,134,235
World-rank: 2,292

2010-01-08 17:05:29

Sid2 wrote:

Cygnus:

An occasional error [on the order of 2%] is not indicative of a problem.

. . . it would have to be real habitual before I would get concerned.





Let's write this response off as an example of what happens when DSL reading meets dialup brain.

cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-09 03:45:04

frankhagen wrote:
most likely some kind of anti-virus thing is blocking them. try to exclude the boinc-folders if you dare.


Thanks for a sensible reply - here is the update:

I don't think it is the virus checker and here is why: the computer is running two projects, PrimeGrid and MilkyWay. I have PrimeGrid set up to send only workunits from projects that have posted an average compute time of one hour or less. This is because the machine is not on 24/7.

As you can see, if you use the links below, the computer has turned in its first MilkyWay workunit without incident, and that uses a lot longer compute time than any of the many PrimeGrid workunits that have been turned in so far. The computer is turned on and off (using shutdown) a few times a day and the client is set to switch projects once an hour. If it were the virus checker doing something every so often when a file is writing I'd think that the odds would have been nearly 100% that the MilkyWay workunit would have been hit given that 45 out of 68 PrimeGrid units that have been returned to date were "Error while computing" status.
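
A rough check of those odds, as a sketch only. It assumes that each task fails independently at the observed PrimeGrid rate of 45 out of 68, and that the failure happens when the task is launched (the failed results report 0 CPU time), so a longer run time would not add extra exposure. Both assumptions are mine and may not hold for this machine.

```python
# Observed PrimeGrid results on this host, taken from the post.
errors = 45
returned = 68

p_fail = errors / returned   # chance that a single task start fails
p_ok = 1 - p_fail            # chance that a single task start succeeds


def p_all_ok(n_tasks):
    """Chance that n independent tasks all start cleanly."""
    return p_ok ** n_tasks


print(f"single task fails: {p_fail:.1%}")
print(f"single task is fine: {p_ok:.1%}")
print(f"five tasks in a row are fine: {p_all_ok(5):.2%}")
```

Under those assumptions a single clean MilkyWay result has roughly a one in three chance of happening anyway, so one good workunit does not say much either way. A longer run of clean MilkyWay results would be stronger evidence.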

Link to this computer on MilkyWay: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=135509
Link to this computer on PrimeGrid: http://www.primegrid.com/results.php?hostid=135616&offset=0&show_names=0&state=0&appid=0

I got administrator access to this machine for a few hours today and cleaned house (deleted useless registry 'Run' items, uninstalled garbage SW, did IE 8 and SP 2 updates that hadn't yet been done, scandisk and defrag, verified that sufficient RAM and hard disk space were free, etc.) and to this moment it hasn't had any further problems, but the night is young. Does any of this give anyone any ideas about this issue?

Again, I appreciate getting a sensible answer here. The message board on the PrimeGrid site ignored the heck out of me.

cygnus
Guest

2010-01-09 09:22:02

cygnus wrote:
I don't think it is the virus checker and here is why: the computer is running two projects, PrimeGrid and MilkyWay. I have PrimeGrid set up to send only workunits from projects that have posted an average compute time of one hour or less. This is because the machine is not on 24/7.


ok, i remember there is a bug in that wrapper of those LLR-apps.

check if you have a zero value in advanced-preferences-disk and memory usage-write to disk at most every xx seconds.

try 60 and it might work.
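
For anyone who would rather check the value in the preferences file than in the manager, here is a minimal sketch. It assumes the setting is stored in a `<disk_interval>` element, which is the tag name BOINC's global preferences files use as far as I know. The sample XML is invented for illustration, so paste in the contents of your own preferences file (for example global_prefs_override.xml in the BOINC data directory, if you have one).

```python
import xml.etree.ElementTree as ET

# Invented sample for illustration; replace with the text of your own file.
SAMPLE_PREFS = """<global_preferences>
    <disk_interval>60</disk_interval>
</global_preferences>"""


def read_disk_interval(xml_text):
    """Return the 'write to disk at most every N seconds' value, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(".//disk_interval")
    if node is None or node.text is None:
        return None
    return float(node.text)


def is_suspect(interval):
    """A zero (or negative) interval is the case worth changing to 60."""
    return interval is not None and interval <= 0


interval = read_disk_interval(SAMPLE_PREFS)
print("disk interval:", interval, "suspect:", is_suspect(interval))
```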
cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-11 21:19:47

frankhagen wrote:
... check if you have a zero value in advanced-preferences-disk and memory usage-write to disk at most every xx seconds.

try 60 and it might work. ...


Okay, took me a day to get access to the machine ... it already had a value of 60 in that field. This is the same as my BOINCstats global prefs as well. It is still grinding out more errors than anything else for PrimeGrid. Only 1 of the errors was on a project without LLR in the title/description though, so I think you may be onto something. I appreciate the helpful ideas, but so far no luck. I may resort to dropping PrimeGrid on that machine, as artless of a solution as that is ... sigh ...
ritterm
    Donator
Tester - BOINCstats SOFA member
BAM!ID: 53610
Joined: 2008-06-03
Posts: 1771
Credits: 1,020,580,751
World-rank: 1,379

2010-01-11 21:45:55

cygnus wrote:
I may resort to dropping PrimeGrid on that machine, as artless of a solution as that is ... sigh ...

Or, you could be a little less drastic and simply de-select the LLR apps (and anything else that might cause you problems) in your PG preferences.
cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-12 19:04:31
last modified: 2010-01-12 19:07:30

I had already considered that, ritterm, but that would mean dropping those projects for all of the machines that I have running PrimeGrid without incident. That would leave me with only a few projects selected on those machines as well, and if those few ran out of workunits it would be as if I had dropped PrimeGrid on all of them. That seems a little more drastic to me.
Guest

2010-01-12 20:07:10

cygnus wrote:
I had already considered that, ritterm, but that would mean dropping those projects for all of the machines that I have running PrimeGrid without incident. That would leave me with only a few projects selected on those machines as well, and if those few ran out of workunits it would be as if I had dropped PrimeGrid on all of them. That seems a little more drastic to me.


nope - you can use "separate preferences" for work, school and home - that way you are able to run different sub-projects on selected hosts.

cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-12 21:17:47

frankhagen wrote:

nope - you can use "separate preferences" for work, school and home - that way you are able to run different sub-projects on selected hosts.


I had considered that as a long term solution to the problem. I could simply call the errant computer a "school" computer and then set different prefs at PrimeGrid's end for that computer, as you say. What I did for now is to drop LLR (2 of the 5 projects I was working on) from my prefs at PrimeGrid to see if that clears up the problem for that one machine over the next few days. If it does I'll probably implement the fix you mentioned.

Thanks for your help, Frank. I'll update here in a few days to let you know how it is going.
cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-13 06:32:44

Okay, it's a little early to tell for sure what is going on, but so far for 13 January UTC four workunits have been turned in (no LLRs, obviously). One of the four has the same error message. Maybe the guy who suggested the antivirus issue had something there. This computer is the only one running a different anti-virus software from the others I use for this purpose. I'll continue to watch the results to see what happens for now.


Everybody got to elevate from the norm - N. Peart
Marty
 
BOINCstats SOFA member
BAM!ID: 2256
Joined: 2006-06-16
Posts: 873
Credits: 1,541,017,612
World-rank: 1,018

2010-01-15 18:59:20

This might also be a permission/corruption problem on one or more of the slot folders (those are the directories for the tasks that are currently running) in the BOINC Data directory.
You could try to run the host completely dry (no tasks left), stop BOINC, delete everything in the slots directory, reinstall BOINC (to let the installer set the correct permissions on all files and directories) and start BOINC again.
cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-16 04:35:05

First off, thanks for your suggestion, Marty. I will take note of it and use it if I think it is warranted.

Here's the update:

With the tasks limited to non-LLR type workunits ....
13 Jan - 3 good, 1 error - the error was 321 Sieve
14 Jan - 13 good, 2 errors - the errors were both 321 Sieve
15 Jan - 6 good, no errors

I'm going to let a few more days pass to see how this goes. I don't think this is a definitive result yet. Even if it does work, in the end getting the LLR workunits going is really the best solution. I now have two strategies given to me here to try (mark the BOINC directories off-limits to the virus checker and/or run the workunits out and reinstall).

Thanks, again. More updates soon ...


Everybody got to elevate from the norm - N. Peart
cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-21 07:42:51

Here is a further update - limiting the tasks to non-LLR workunits is not successful in resolving this issue. I am contemplating the earlier proposed solution of telling the virus checker to ignore BOINC's workspace.

With the tasks limited to non-LLR type workunits ....
13 Jan - 3 good, 1 error - the error was 321 Sieve
14 Jan - 13 good, 2 errors - the errors were both 321 Sieve
15 Jan - 6 good, no errors
16 Jan - 3 good, no errors
17 Jan - 8 good, 2 errors - the errors were both 321 Sieve
18 Jan - 10 good, 3 errors - two errors were 321 Sieve, one was AP26 Search
19 Jan - 5 good, 2 errors - the errors were both 321 Sieve
20 Jan - 3 good, no errors
21 Jan (so far) - 3 good, 1 error - the error was 321 Sieve

Total: 54 good, 11 errors - the errors were all 321 Sieve except one which was AP26 Search (this computer is also running Prime Sierpinski without error to this point)
Error Percentage = 16.9 (11 errors out of 65 returned tasks)
Not so good.
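
The totals can be re-tallied from the daily counts listed above with a few lines of Python. This is only a sketch of the arithmetic, and it also shows how the percentage depends on the denominator: 11 errors against all 65 returned tasks is about 16.9%, while 11 errors against the 54 good results alone gives about 20.4%.

```python
# (day, good, errors) as reported in the daily list above.
daily = [
    ("13 Jan", 3, 1),
    ("14 Jan", 13, 2),
    ("15 Jan", 6, 0),
    ("16 Jan", 3, 0),
    ("17 Jan", 8, 2),
    ("18 Jan", 10, 3),
    ("19 Jan", 5, 2),
    ("20 Jan", 3, 0),
    ("21 Jan", 3, 1),
]

good = sum(g for _, g, _ in daily)
errors = sum(e for _, _, e in daily)
returned = good + errors

# Error rate as a share of everything returned.
error_rate = 100 * errors / returned
# Errors divided by good results only (a ratio, not a failure rate).
errors_per_good = 100 * errors / good

print(f"good={good} errors={errors} returned={returned}")
print(f"error rate: {error_rate:.1f}%")
print(f"errors per 100 good results: {errors_per_good:.1f}")
```

Either way the failure rate is far above the occasional 2% mentioned earlier in the thread.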


Everybody got to elevate from the norm - N. Peart
ebahapo
 
BAM!ID: 239
Joined: 2006-05-12
Posts: 659
Credits: 29,820,514
World-rank: 17,664

2010-01-21 16:39:36

I'd never add such an exception to the anti-virus. Projects, especially the smaller ones, may not be very well managed and a virus may be distributed via BOINC.

I'd add that if the WUs fail immediately, without using up much CPU time, let the project handle the errors. Otherwise, decrease PG's share of the resources so that you have fewer of its WUs to worry about.

Can you please post the link to a failed WU?

TIA
Crystal Pellet
 
Tester - BOINCstats SOFA member
BAM!ID: 64136
Joined: 2009-01-12
Posts: 4516
Credits: 350,549,237
World-rank: 3,049

2010-01-21 17:23:16
last modified: 2010-01-21 17:25:17

Augustine wrote:

Can you please post the link to a failed WU?

TIA


CPU type GenuineIntel
Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz [x86 Family 6 Model 23 Stepping 10]
Number of processors 2
Coprocessors ---
Operating System Microsoft Windows Vista
Home Premium x86 Edition, Service Pack 2, (06.00.6002.00)
BOINC client version 6.10.18
Memory 3068.24 MB
Cache 3072 KB
Measured floating point speed 2168.73 million ops/sec
Measured integer speed 4276.17 million ops/sec
Link to the errors: http://www.primegrid.com/results.php?hostid=135616&offset=0&show_names=0&state=5&appid=0

All errors with 0.00 seconds cpu-time!!

ebahapo
 
BAM!ID: 239
Joined: 2006-05-12
Posts: 659
Credits: 29,820,514
World-rank: 17,664

2010-01-21 17:30:59

Perhaps you could try stopping BOINC and then deleting all "slot" directories. Of course, you would lose any ongoing work. I wonder if you have a problem with access permissions to some "slot" directories.

HTH
cygnus
BAM!ID: 79066
Joined: 2009-12-21
Posts: 9
Credits: 588,879
World-rank: 254,429

2010-01-23 04:20:17
last modified: 2010-01-23 04:28:13

Yeah, I kinda thought that the idea about the virus checker was risky, but I didn't have a better idea and the machine is the only one I have running BOINC with a different virus checker installed (not one of my choosing). I haven't done it yet, and now I won't.

The machine is having a fan issue right now and I'll have to take it apart tomorrow to check it out. After that I can try deleting the slot directories as suggested. What are the names of the slot directories under a Windows service install? Are they under Program Files/BOINC someplace?

This machine runs Vista Home Premium, and I used a separate admin account to install BOINC as a service. This separate admin account is not usually used. I wonder if this could be a source of privilege problems? I doubt it though, since I have other machines under XP and 2000 that have similar service installs done and they work fine.

Crystal Pellet posted the correct link to the error WUs.


Everybody got to elevate from the norm - N. Peart
Marty
 
BOINCstats SOFA member
BAM!ID: 2256
Joined: 2006-06-16
Posts: 873
Credits: 1,541,017,612
World-rank: 1,018

2010-01-23 17:23:38
last modified: 2010-01-23 17:26:18

There is a directory 'slots' in your BOINC Data directory (if you don't know where this is, check the registry or rerun the BOINC setup up to the directory selection step; the client_state.xml file is also in the Data directory, so you could search for that). The slot directories themselves are only numbered directories, e.g. 0, 1, 2 ..., within the 'slots' folder.

Since you used a different account for installation, make sure that the option similar to 'allow every user to run BOINC' (don't know the exact wording right now) was checked during the setup.
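
To see which slot directories exist and whether the current account can use them, something like the sketch below could help. The data directory path is an assumption (C:\ProgramData\BOINC is the usual default on Vista, but yours may differ), and the function names are made up for illustration. Note that `os.access` gives only a rough answer on Windows because it does not fully evaluate NTFS permissions, and the BOINC service may run under a different account than the one running the script.

```python
import os
from pathlib import Path

# Assumed default location on Vista; change it to your own Data directory.
ASSUMED_DATA_DIR = r"C:\ProgramData\BOINC"


def list_slot_dirs(data_dir):
    """Return the numbered slot directories under <data_dir>/slots, in order."""
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []
    numbered = [p for p in slots.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(numbered, key=lambda p: int(p.name))


def check_access(path):
    """Rough read/write/execute check for the account running this script."""
    return {
        "read": os.access(path, os.R_OK),
        "write": os.access(path, os.W_OK),
        "execute": os.access(path, os.X_OK),
    }


if __name__ == "__main__":
    for slot in list_slot_dirs(ASSUMED_DATA_DIR):
        print(slot, check_access(slot))
```

Run it while BOINC is stopped. If a slot shows up as not writable, that slot is a good candidate for the permission problem described above.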

Index :: The Projects :: Error while computing for PrimeGrid