Message boards : Help! : New Host entry after each contact to scheduler
Author | Message |
---|---|
Hi, the title says it all: I came home and found about 25 new hosts under the same name... after each contact to the scheduler it created one more ;) Now the problem for me is not too big, because merging worked fine, but it's a bit annoying... | |
ID: 2319 | Rating: 0 | rate: / Reply Quote | |
Hi, the title says it all: I came home and found about 25 new hosts under the same name... after each contact to the scheduler it created one more ;) Now the problem for me is not too big, because merging worked fine, but it's a bit annoying... Hmmm, I saw something like this when the directory in which BOINC lives/runs was no longer writeable by BOINC. That machine created thousands of new entries on Einstein@Home before we tracked that one down. Check the permissions on C:\Program Files\BOINC or wherever you installed it. (If you have XP you can still do this, but it's not accessible by default. See here.) ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2320 | Rating: 0 | rate: / Reply Quote | |
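A quick way to test the suggestion above is to try creating a file in the BOINC directory. This is only a minimal sketch: the helper name `dir_is_writable` is made up for illustration, and the install path is the default mentioned in the post, which may differ on your machine. Run it as the same user account that BOINC runs under, otherwise the result says little.

```python
import os
import tempfile


def dir_is_writable(path: str) -> bool:
    """Return True if a temporary file can actually be created in `path`."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real file tests the effective permissions directly.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed install location taken from the post; adjust as needed.
    boinc_dir = r"C:\Program Files\BOINC"
    print(boinc_dir, "writable:", dir_is_writable(boinc_dir))
```

If this reports that the directory is not writable, the client may be unable to save the host ID it receives from the scheduler, which would fit the symptom described here.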
Check the permissions on C:\Program Files\BOINC or wherever you installed it. Thanks, will try this! ;) Funny, that it only happens with Pirates, because PG and SIMAP are doing well... perhaps my laptop wants to enter the ship, hehe! Edit#1: didn't help with the file permission... just "created" a new host again :( Edit#2: message from BOINC is: 17.01.2006 22:03:14|Pirates@Home|Computer ID: not assigned yet; location: home; project prefs: default ____________ Arrrrr! I'm a proud member o' the pirates' ship crew BOINC@Heidelberg | |
ID: 2321 | Rating: 0 | rate: / Reply Quote | |
| |
ID: 2364 | Rating: 0 | rate: / Reply Quote | |
I had that problem too, in my case the reason was that I had used the old account file with just the new URL inserted. | |
ID: 2417 | Rating: 0 | rate: / Reply Quote | |
I had that problem too, in my case the reason was that I had used the old account file with just the new URL inserted. I used my old account, too, but when I (re-) attached, BOINC used the new url and also created a new account.xml with the matching name. So this shouldn't be the reason, mhh... nevertheless I merged over 50 hosts again. :P I wonder why there's a message like "Computer ID: not assigned yet..." or how this can be solved. ____________ Arrrrr! I'm a proud member o' the pirates' ship crew BOINC@Heidelberg | |
ID: 2441 | Rating: 0 | rate: / Reply Quote | |
Sorry for posting again, but the time for edit was over... :P Just wanted to add that I'm using a development client version (CPDNBBC 5.3.9) and (re-) attached to Pirates via an Account Management System which is currently beta tested. I don't have multiple host entries with 'normal' attaching via BOINC manager (just checked this out...) but as other people had the same problem I'm not sure where the bug is... ;) | |
ID: 2443 | Rating: 0 | rate: / Reply Quote | |
I'm using same AMS system as Cori does. I did not play with any xml files. And I'm using GridRepublic client. | |
ID: 2465 | Rating: 0 | rate: / Reply Quote | |
Heck, I had three host entries again to merge... :P | |
ID: 2468 | Rating: 0 | rate: / Reply Quote | |
Just wanted to add that I'm using a development client version (CPDNBBC 5.3.9) and (re-) attached to Pirates via an Account Management System which is currently beta tested. I don't have multiple host entries with 'normal' attaching via BOINC manager (just checked this out...) but as other people had the same problem I'm not sure where the bug is... ;) Could it be that the AMS is fighting with the BOINC manager or client about the URL? At some point I may switch the Vassar link back to the archive for mission 1 (as it should be) and we'll see what that does for this. And how many new problems it creates. :-) ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2484 | Rating: 0 | rate: / Reply Quote | |
Just wanted to add that I'm using a development client version (CPDNBBC 5.3.9) and (re-) attached to Pirates via an Account Management System which is currently beta tested. I don't have multiple host entries with 'normal' attaching via BOINC manager (just checked this out...) but as other people had the same problem I'm not sure where the bug is... ;) I'm the developer of the above mentioned AMS, and I have looked into this problem from the AMS side. There is no fighting going on. The AMS lets the host connect and disconnect without a problem. It also 'sees' that a client is connected and doesn't send a new attach command. Even if it did send a new attach command, the host would reject it and put an 'Already attached' message in the message tab (there is a catch, but it doesn't apply in this case). I also saw the creation of multiple host entries in my account. Whenever the host connected to the Pirates scheduler (manually or automatically) it created a new host entry. As far as I know (David or Rom will know for sure), a host that is connected to an AMS doesn't send different requests to projects than a host that isn't connected to an AMS. ____________ Join team BOINCstats | |
ID: 2486 | Rating: 0 | rate: / Reply Quote | |
I'm the developer of the above mentioned AMS, and I have looked into this problem from the AMS side. Willy - thanks for all the information. I certainly meant no disrespect to your code when suggesting something to check for. Right now the Pirates server is using BOINC 5.2.15, which is supposed to be the 'stable' branch. Perhaps there is a problem with my server configuration. Earlier there were problems with the permissions in the upload directory, which caused problems for the file deleter, but I don't think would cause this behavior. We'll stay at 5.2.15 for a week or so, then upgrade to 5.3 and live life on the cutting edge for a bit. It will be interesting to see if this is fixed in 5.3 or continues. Thanks for your help. ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2495 | Rating: 0 | rate: / Reply Quote | |
Willy - thanks for all the information. I certainly meant no disrespect to your code when suggesting something to check for. NP. There could easily be a bug in my code. My AMS is in BETA, you're running a stable version of the server code, so I'm betting on the AMS when there is a problem. But in this case Pirates is the only project (so far) with this particular fluke. ____________ Join team BOINCstats | |
ID: 2497 | Rating: 0 | rate: / Reply Quote | |
At some point I may switch the Vassar link back to the archive for mission 1 (as it should be) and we'll see what that does for this. And how many new problems it creates. :-) When you switch, consider this url. Since i've learned to work with php, it may be more accurate. ____________ Click and enter your name for your BOINC Statistics | |
ID: 2498 | Rating: 0 | rate: / Reply Quote | |
This cloning of hosts isn't only a Pirates problem: a teammate who crunches Rosetta just told me that he has the same problem. It must be the new server-side software version. | |
ID: 2558 | Rating: 0 | rate: / Reply Quote | |
5.2.15 failed in alpha testing. Don't really know the reason, AFAIK 5.2.13 is the "recommended" release: | |
ID: 2568 | Rating: 0 | rate: / Reply Quote | |
No problems here!! | |
ID: 2579 | Rating: 0 | rate: / Reply Quote | |
To me it happened with 4.19 | |
ID: 2581 | Rating: 0 | rate: / Reply Quote | |
NP. There could easily be a bug in my code. My AMS is in BETA, you're running a stable version of the server code, so I'm betting on the AMS when there is a problem. It could also be a strange interaction between the project server and the account manager, right? Quite by accident I found what looks like an SQL problem in am_set_info.php. Would this kind of error show up if that script, called during a dialogue with the AM, caused an error in setting the host ID? I've made a small change which corrects the error, and when there is work again we can see if this clears up the problem. ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2599 | Rating: 0 | rate: / Reply Quote | |
Could be, but I don't think so. The duplicates are created during contact between the client and the project. The AMS does not participate in that communication. ____________ Join team BOINCstats | |
ID: 2603 | Rating: 0 | rate: / Reply Quote | |
I can imagine where it comes from : | |
ID: 2606 | Rating: 0 | rate: / Reply Quote | |
I can imagine where it comes from : Interesting theory, I like it for several reasons, one being that it's easily checked. The two schedulers are in fact the same code. I just don't like calling the thing 'cgi'. So I'll cut down to only one, and we will see what that does. ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2607 | Rating: 0 | rate: / Reply Quote | |
Pirates@Home - 2006-01-26 08:25:34 - Host ID not assigned yet ... --- - 2006-01-26 08:25:38 - Insufficient work; requesting more Pirates@Home - 2006-01-26 08:25:38 - Requesting 146479 seconds of work Pirates@Home - 2006-01-26 08:25:38 - Sending request to scheduler: http://pirates.spy-hill.net/cgi-bin/scheduler Pirates@Home - 2006-01-26 08:25:41 - Scheduler RPC to http://pirates.spy-hill.net/cgi-bin/scheduler failed Pirates@Home - 2006-01-26 08:25:41 - No schedulers responded Pirates@Home - 2006-01-26 08:25:41 - Deferring communication with project for 1 minutes and 0 seconds It still added one ID on this attempt, but only one; before, it was quite often two. So it's basically the same missing handshake ACK that creates those phantom WUs, the ones that get assigned but never arrive. The time values are CET, and the host that was created on this first attempt is 18668. It did create a new sched_reply.xml file, by the way, and it looks OK too; I have no idea why the BOINC client doesn't use it. Unfortunately it didn't fix the problem, only the "new hosts creation rate" dropped. I wonder why the communication fails so often, even though the transferred data seem to arrive. | |
ID: 2612 | Rating: 0 | rate: / Reply Quote | |
*sigh* I give up, I tried several MTU sizes from 1500 down to 1000 but no change so this is probably not the reason. | |
ID: 2636 | Rating: 0 | rate: / Reply Quote | |
It gives me a sched_reply with a fresh Host-ID on each contact but CC 4.19 still insists "no scheduler responding". I wonder if 'distance' to the server has any relevance. How far are you from various sites (in ms, as reported by ping or traceroute)? If it's too far, then the request for scheduler contact could time out even if a reply comes back eventually. This morning I pointed pirates.vassar.edu back to the archive, so anybody still pointing to that old address will no longer get a scheduler response at all. ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2641 | Rating: 0 | rate: / Reply Quote | |
3 39 ms 39 ms 39 ms 217.0.72.50 4 131 ms 131 ms 132 ms 62.156.131.182 5 133 ms 134 ms 135 ms so-6-0-0.gar1.Washington1.Level3.net [67.29.172.1] 6 142 ms 132 ms 144 ms ae-32-56.ebr2.Washington1.Level3.net [4.68.121.190] 7 137 ms 140 ms 147 ms ae-1.ebr2.Washington1.Level3.net [4.69.132.30] 8 140 ms 136 ms 149 ms ae-3.ebr2.NewYork1.Level3.net [4.69.132.94] 9 138 ms 134 ms 137 ms ae-22-56.car2.NewYork1.Level3.net [4.68.97.181] 10 136 ms 135 ms 134 ms pos15-0-nycmnyrdc-rtr1.nyc.rr.com [24.29.113.157] 11 135 ms 139 ms 137 ms pos15-0-nycmnyrdc-rtr1.nyc.rr.com [24.29.113.157] 12 139 ms 139 ms 140 ms 24.164.160.41 13 147 ms 149 ms 142 ms 24.164.160.136 14 158 ms 149 ms 154 ms cpe-204-210-158-6.hvc.res.rr.com [204.210.158.6] The times are not bad, 14 hops isn't a lot either. Berkeley's Galileo is nearly twice the distance (time-wise) and 14 hops too but works fine. A big ping (1200 bytes) has quite good response times of ~ 230ms The web browser never had any timeouts on the site either. _________________ The strangest thing is, that I always get a sched_reply which looks normal and contains the new HostID. CC 4.19 just won't use it | |
ID: 2643 | Rating: 0 | rate: / Reply Quote | |
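To confirm that a saved scheduler reply really carries a host ID, one can search it for the ID element. This is a rough sketch only: the tag name `hostid` is an assumption about how the reply file is laid out, the function name is hypothetical, and a regular expression is used instead of an XML parser in case the file is not strictly well-formed.

```python
import re
from typing import Optional


def extract_host_id(reply_text: str) -> Optional[int]:
    """Return the first host ID found in the reply text, or None if absent."""
    # Assumed element name; check it against your own sched_reply file.
    match = re.search(r"<hostid>\s*(\d+)\s*</hostid>", reply_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Made-up sample text using the host number reported earlier in the thread.
    sample = "<scheduler_reply>\n<hostid>18668</hostid>\n</scheduler_reply>"
    print(extract_host_id(sample))
```

If the reply contains an ID but the client log still says the ID is not assigned, the reply is arriving and the problem lies in how the client handles it.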
This is the condition for this error - just in case you didn't keep the ancient source ;-) : | |
ID: 2644 | Rating: 0 | rate: / Reply Quote | |
The strangest thing is, that I always get a sched_reply which looks normal and contains the new HostID. CC 4.19 just won't use it What about upgrading to a newer CC? ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2645 | Rating: 0 | rate: / Reply Quote | |
*sigh* much work and 4.19 isn't as bloated as later versions - plus I prefer to have control over the scheduling, the new scheduler encroaches into things I prefer to keep in my hands. | |
ID: 2649 | Rating: 0 | rate: / Reply Quote | |
I'm not sure if it would be a good idea anyway, they changed the firewall/proxy handling to cURL lately and some of my boxes are behind a Squid with auth - others have trouble with this and I didn't bring other DC programs using cURL through this Squid either so far (wget works though, it uses cURL) Ah, but that is a very worthwhile problem to solve. Perhaps if you worked on it together, or even enlisted a few more helpers... I'm worried that it would be a waste of time to try to solve a problem that is really due to using 4.19. Or might be. Though I believe Cori is not using 4.19, right? I've finally had time to catch up on some reading on the boinc_dev list, and it sounds like the behaviour reported here might be related to the "error 500" discussion there. It seems to be caused by a mismatch in the use of HTTP 1.0 versus 1.1 between client and scheduler. The proposed fix is to have cURL always use 1.1. So we will be testing that when we upgrade to 5.3. Meanwhile, I think Ageless already tried out a client built with this change and it cleared up some errors, though it was not the problem of new hosts being created, right Ageless? ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 2656 | Rating: 0 | rate: / Reply Quote | |
...Though I believe Cori is not using 4.19, right? No, I'm now using version 5.3.15, a development version which supports the AMS testing. ;) (Before that I used the CPDNBBC 5.3.9 development client.) ____________ Arrrrr! I'm a proud member o' the pirates' ship crew BOINC@Heidelberg | |
ID: 2657 | Rating: 0 | rate: / Reply Quote | |
There has been one interesting effect with FaD (which uses cURL) on Linux. | |
ID: 2675 | Rating: 0 | rate: / Reply Quote | |
A member of my team seems to have a similar problem on SZDG. It happened recently, when Sztaki wasn't supplying work units. | |
ID: 2676 | Rating: 0 | rate: / Reply Quote | |
I've finally had time to catch up on some reading on the boinc_dev list, and it sounds like the behaviour reported here might be related to the "error 500" discussion there. It seems to be caused by a mis-match in use of HTTP 1.0 versus 1.1 between client and scheduler. The proposed fix is to have cURL always use 1.1. So we will be testing that when we upgrade to 5.3. The original version of 5.3.6 that I had would get HTTP errors here whenever the project had been out of work for a while. A computer reboot and a reset of BOINC/Pirates wouldn't give me anything other than the HTTP error. So I tested Carl's 5.3.13 app with forced HTTP 1.1 use, and immediately upon restart I got a scheduler request succeeded message... though the project was still out of work. So yes, for me it cleared that up. I think this addition propagated into the 5.3.15 I am using, as I still only get scheduler request succeeded and project out of work. It hasn't reverted to the dreaded HTTP error yet. :) But anyone with the continuous error 500/HTTP error should look at this post by Wander Saito on the Seti helpdesk forums. It has cleared up the errors for some who have had them for a long time. ____________ Jord. The BOINC FAQ Service. | |
ID: 2687 | Rating: 0 | rate: / Reply Quote | |
Looking at the checkin_notes today gives me hope that this problem may have been found and fixed in the CC: + David 3 Mar 2006 + - core client: on scheduler RPC, if our host ID is zero, + set RPC seqno to zero also. + This avoids a bug where the scheduler creates a new host record + with rpc_seqno zero, and then on the next RPC creates + ANOTHER host record because the client's rpc_seqno is > 0 We will of course have to wait until it can be tested to be sure this is it. ____________ -- Eric Myers "Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats | |
ID: 3106 | Rating: 0 | rate: / Reply Quote | |
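The check-in note above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `ToyScheduler` and `ToyClient` are hypothetical stand-ins, and the rule that the scheduler creates a new record whenever the client's sequence number does not follow the stored one is a simplification made for the example, not the actual BOINC scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical scheduler: maps host_id to the last rpc_seqno it saw."""
    hosts: dict = field(default_factory=dict)

    def _new_host(self) -> int:
        host_id = len(self.hosts) + 1
        self.hosts[host_id] = 0  # new records start with rpc_seqno zero
        return host_id

    def rpc(self, host_id: int, rpc_seqno: int) -> int:
        # Assumed rule: an unknown host, or a sequence number that does not
        # follow the stored one, is treated as a different machine.
        if host_id not in self.hosts or rpc_seqno != self.hosts[host_id] + 1:
            return self._new_host()
        self.hosts[host_id] = rpc_seqno
        return host_id


@dataclass
class ToyClient:
    """Hypothetical client holding a host ID and an RPC sequence number."""
    host_id: int = 0
    rpc_seqno: int = 0
    reset_seqno_when_unassigned: bool = False  # models the change in the note

    def contact(self, scheduler: ToyScheduler) -> None:
        if self.reset_seqno_when_unassigned and self.host_id == 0:
            self.rpc_seqno = 0
        self.host_id = scheduler.rpc(self.host_id, self.rpc_seqno)
        self.rpc_seqno += 1


def count_host_records(with_fix: bool, contacts: int = 5, stale_seqno: int = 7) -> int:
    """Count host records created after `contacts` scheduler contacts."""
    scheduler = ToyScheduler()
    client = ToyClient(rpc_seqno=stale_seqno, reset_seqno_when_unassigned=with_fix)
    for _ in range(contacts):
        client.contact(scheduler)
    return len(scheduler.hosts)


if __name__ == "__main__":
    print("records without the change:", count_host_records(with_fix=False))
    print("records with the change:", count_host_records(with_fix=True))
```

In this model a client that starts with host ID zero and a left-over sequence number gets a fresh record on every contact, while resetting the sequence number along with the host ID leaves a single record. Whether this matches what happened on Pirates@Home still has to be confirmed by testing, as noted above.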