
Pirates@Home

BOINC: Berkeley Open Infrastructure for Network Computing

Einstein@Home is down


Message boards : Announcements : Einstein@Home is down

Profile Wormholio
Captain
United States
Away
Credit: 4,009.8
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 3346 - Posted: 29 May 2006 | 13:21:34 UTC

Einstein@Home is down due to a failure of the air conditioning in the server room and the subsequent crash of the database server. The project server was shut down cleanly. There is no word yet on the extent of the damage to the server and no estimate of how long repairs will take (it's a holiday weekend in the US).


____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

Profile KSMarksPsych
Volunteer tester
United States
Machinae Supremacy
Credit: 4,127.4
RAC: 0.00
Joined: Jan 19, 2006
Verified: Sep 24, 2010
Dubloons: 3
Pieces of Eight: 8
Punishment: Mess Duty
Message 3347 - Posted: 29 May 2006 | 13:49:20 UTC - in response to Message 3346.

Many thanks.

Was wondering what happened.

Hopefully not too much damage.

____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System

Profile Wormholio
Captain
United States
Away
Credit: 4,009.8
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 3349 - Posted: 29 May 2006 | 16:34:03 UTC - in response to Message 3347.

> Many thanks.
>
> Was wondering what happened.
>
> Hopefully not too much damage.


We hope, but won't know until they get the power restored and try to bring the servers up. Bruce Allen has promised an update in 24 hrs. They will be down for at least that long.

____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

Jan Gnodde
Netherlands
Credit: 0.0
RAC: 0.00
Joined: May 30, 2006
Verified: NEVER
Message 3363 - Posted: 30 May 2006 | 14:42:08 UTC
Last modified: 30 May 2006 | 14:42:32 UTC

Those 24 hours are almost done. I hope everything will be well, because some of my processors are getting cold (no more work_units left...).

Jan.

____________
Small but mean: http://www.damnsmalllinux.org/

RobertKN
Austria
Credit: 0.0
RAC: 0.00
Joined: May 30, 2006
Verified: NEVER
Message 3364 - Posted: 30 May 2006 | 17:57:10 UTC

Is there any good news about Einstein somewhere?

The status message we are waiting for still hasn't appeared.

Jan Gnodde
Netherlands
Credit: 0.0
RAC: 0.00
Joined: May 30, 2006
Verified: NEVER
Message 3365 - Posted: 30 May 2006 | 23:09:57 UTC

This is getting a bit frustrating! Still no info on the server status of Einstein@Home. Most of my machines have run out of work_units.
This is not being handled in a professional way. Imagine Google going down for two days...

Jan.

____________
Small but mean: http://www.damnsmalllinux.org/

Profile Ageless
Chief Petty Officer
Volunteer tester
Netherlands
Machinae Supremacy
Credit: 1,295.9
RAC: 0.00
Joined: Jul 20, 2004
Verified: Jul 9, 2011
Dubloons: 3
Pieces of Eight: 7
Punishment: Cat o' Nine Tails
Message 3366 - Posted: 31 May 2006 | 1:05:29 UTC - in response to Message 3365.

Google is a multi-million dollar company.
Einstein@Home is running on a shoe-string budget.

You cannot compare the two.

BOINC is made to run multiple projects so that, if one goes down, you always have work. So complaining that EAH isn't back yet and your computers don't have work is not a valid complaint. All you have to do is attach to a secondary project.

____________
Jord.

The BOINC FAQ Service.

Jan Gnodde
Netherlands
Credit: 0.0
RAC: 0.00
Joined: May 30, 2006
Verified: NEVER
Message 3367 - Posted: 31 May 2006 | 4:50:45 UTC
Last modified: 31 May 2006 | 4:51:04 UTC

I don't know how it is in the US, but from a university you may expect the server room(s) to be well equipped and secured, with adequate backups, etc. For every university those computers are vital; without them a university cannot exist nowadays. So IMHO a server crash should be handled within hours.

Secondly: even with the server down, it should still be possible to get some information out about how the repairs are going. It's this lack of information that's the most frustrating.

And, as a last point: I don't want to run other projects. I have chosen to support EAH and I stick to that (for the moment).

Jan.

____________
Small but mean: http://www.damnsmalllinux.org/

Profile [B^S] sTrey
Volunteer tester
International
BOINC Synergy
Credit: 2,300.5
RAC: 0.00
Joined: Feb 6, 2005
Verified: Mar 30, 2009
Dubloons: 3
Pieces of Eight: 2
Punishment: Mess Duty
Message 3368 - Posted: 31 May 2006 | 7:19:58 UTC
Last modified: 31 May 2006 | 7:26:27 UTC

Sorry no news here, just comments on comments.

I have no knowledge of E@H's setup, but I highly doubt the Einstein servers have stringent uptime requirements or are considered critical to the university's functioning (no matter how important we consider them to be to OUR functioning ;)

People busy trying to get things running often don't communicate much until after the fact, even though they mean to... I'm frustrated also not knowing more, but please appreciate that they took the trouble to get some word out initially.

Sure, this is worse than the more usual bumps in the road, especially for E@H, which has been one of the more stable projects I've run overall. For me, the ticking clock adds some concern about the kinds of hardware and/or software damage that can cause long delays. Good luck folks, anyone who's ever done any kind of support is empathizing & pulling for you.
____________

Profile Wormholio
Captain
United States
Away
Credit: 4,009.8
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 3369 - Posted: 31 May 2006 | 11:42:28 UTC

As of Wednesday morning (EDT) there is still no news on the status of Einstein@Home.


____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

Profile [B^S] Paul@home
Volunteer tester
Ireland
BOINC Synergy
Credit: 2,617.5
RAC: 0.00
Joined: Feb 21, 2005
Verified: NEVER
Dubloons: 3
Message 3370 - Posted: 31 May 2006 | 11:55:07 UTC - in response to Message 3368.
Last modified: 31 May 2006 | 11:56:20 UTC

> Good luck folks, anyone who's ever done any kind of support is empathizing & pulling for you.

Hear, hear!

As my old team leader used to say to the customer when stuff went wrong: "Do you want me to fix it, or do you want me to talk about fixing it?"


____________
Join BOINC Synergy Team

RobertKN
Austria
Credit: 0.0
RAC: 0.00
Joined: May 30, 2006
Verified: NEVER
Message 3371 - Posted: 31 May 2006 | 12:12:03 UTC - in response to Message 3370.



I hope you are smart enough to fix it and to talk to the customers.

Profile Ageless
Chief Petty Officer
Volunteer tester
Netherlands
Machinae Supremacy
Credit: 1,295.9
RAC: 0.00
Joined: Jul 20, 2004
Verified: Jul 9, 2011
Dubloons: 3
Pieces of Eight: 7
Punishment: Cat o' Nine Tails
Message 3372 - Posted: 31 May 2006 | 12:42:12 UTC - in response to Message 3367.

> I don't know how it is in the US, but from a university you may expect the server room(s) to be well equipped and secured, with adequate backups, etc. For every university those computers are vital; without them a university cannot exist nowadays. So IMHO a server crash should be handled within hours.

Had you read the actual news given by Eric, you'd have seen it was the air-conditioning that broke down, which caused the database server to crash, probably due to excessive heat.

So wouldn't you want to fix the AC first? Or do you think they have backup ACs in place? Plus, one never knows what damage the database server suffered from overheating before it crashed. It's possible that it has lost all the tables for the data currently out there, which in essence would mean a restart of the S4 program.

> Secondly: even with the server down, it should still be possible to get some information out about how the repairs are going. It's this lack of information that's the most frustrating.

The regular server that houses the forums and so on sits in the same room, which is probably the size of a cupboard. If the AC unit is in that same room, I can imagine them taking all the computers out of it to make enough space to reach the AC.

And don't think these AC units are the size of your AC unit, or your table fan. They are big refrigeration units.

Eric, while you were around there, did you ever see this?

> And, as a last point: I don't want to run other projects. I have chosen to support EAH and I stick to that (for the moment).

Then that's your own choice. Yet everyone complaining that EAH doesn't come back quickly enough for their liking, that they aren't handling things professionally, that they should do this and that, should just have patience. If your main complaint is that your computers are now sitting idle, then just shut them down. Essent won't like you, but your wallet may.

(Just be patient.)

____________
Jord.

The BOINC FAQ Service.

Profile Wormholio
Captain
United States
Away
Credit: 4,009.8
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 3373 - Posted: 31 May 2006 | 14:49:10 UTC - in response to Message 3372.
Last modified: 31 May 2006 | 14:50:08 UTC

> And don't think these AC units are the size of your AC unit, or your table fan. They are big refrigeration units.
>
> Eric, while you were around there, did you ever see this?


Yes.

I have no real new information, but maybe I can provide some context. Hopefully without causing undue speculation. I expect the folks at UWM are working very hard on the problem, and Bruce will provide an update when he knows something definite. I will pass on whatever I learn when I hear it.

First of all, as I understand it, the AC failure was in the *new* cluster room. So it is possible that there was a problem with the installation or with new AC hardware. But I do not know this.

I do know that the Einstein@Home servers are only a small part of a much larger computer installation at UWM, which includes the 300-node "Medusa" Beowulf cluster. When I toured the machine room (what I expect is now the *old* cluster room) I found it very impressive (pictures here and here). The power consumption was also very impressive, and the AC system has to remove the excess heat from all of those nodes.

I know that there were plans to upgrade the cluster to new hardware, but I do not know if the "new" cluster room has new computing hardware or the original Medusa nodes. In any case, it is possible that there is damage to many of the nodes in the cluster, not just the Einstein@Home servers. But I do not know this.

I have not attempted to contact Bruce because I am sure he is extremely busy right now, and I also know that he likes to wait to release information until he knows the full extent of the situation. I will pass on whatever I hear as soon as I can.

Meanwhile, BOINC was designed so that participants can crank on other projects when one project is off-line for whatever reason. Or those who only want to do work for Einstein@Home can wait until it comes back up, which it will.

I'm sorry I don't have any further news or information. We will all just have to wait patiently and hope for the best.



____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

Profile Ageless
Chief Petty Officer
Volunteer tester
Netherlands
Machinae Supremacy
Credit: 1,295.9
RAC: 0.00
Joined: Jul 20, 2004
Verified: Jul 9, 2011
Dubloons: 3
Pieces of Eight: 7
Punishment: Cat o' Nine Tails
Message 3374 - Posted: 31 May 2006 | 15:02:46 UTC

Thank you, Eric. So my estimate of a small room was a bit off as well. :)
____________
Jord.

The BOINC FAQ Service.

Profile Ageless
Chief Petty Officer
Volunteer tester
Netherlands
Machinae Supremacy
Credit: 1,295.9
RAC: 0.00
Joined: Jul 20, 2004
Verified: Jul 9, 2011
Dubloons: 3
Pieces of Eight: 7
Punishment: Cat o' Nine Tails
Message 3377 - Posted: 31 May 2006 | 20:13:49 UTC
Last modified: 31 May 2006 | 20:19:07 UTC

Einstein is back up. You can upload your work.

May 31, 2006
We are making good progress in restarting the project. Now that the web pages are up, we will first enable file uploads and then enable the scheduler. It may be some hours before we are again handing out new work and everything is functioning normally. We'll post further reports here when we have more news.


May 31, 2006
Einstein@Home is back up and running. Because of the backlog of work that has been completed by YOUR computers in the past few days, our systems may be somewhat slow at uploading this completed work and handing out new work. So please be patient if it takes a bit of time before your computers are busy crunching away! Note: experience shows that recovery from this type of 'hard' failure can take some time; there may be new problems that appear, which may require rapid shut-down and additional corrective action at our end. But we will try hard to avoid this if possible.
____________
Jord.

The BOINC FAQ Service.

Profile Wormholio
Captain
United States
Away
Credit: 4,009.8
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 3378 - Posted: 31 May 2006 | 20:36:50 UTC - in response to Message 3374.

> Thank you, Eric. So my estimate of a small room was a bit off as well. :)


For Einstein@Home, yes, your estimate was off. But it's not far off for Pirates@Home.

____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

Profile Fuzzy Hollynoodles
Volunteer tester
International
BOINC Synergy
Credit: 90.6
RAC: 0.00
Joined: Jan 18, 2006
Verified: NEVER
Dubloons: 2
Punishment: Misfit
Message 3379 - Posted: 31 May 2006 | 21:53:36 UTC - in response to Message 3378.



Do you have a Beowulf? How many nodes?



____________

Those who can, do. Those who can't, bully.
From here

Profile KSMarksPsych
Volunteer tester
United States
Machinae Supremacy
Credit: 4,127.4
RAC: 0.00
Joined: Jan 19, 2006
Verified: Sep 24, 2010
Dubloons: 3
Pieces of Eight: 8
Punishment: Mess Duty
Message 3384 - Posted: 1 Jun 2006 | 13:43:59 UTC

Re: The pics...

D***! I'd hate to troubleshoot that when it goes down.
____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System

Profile Wormholio
Captain
United States
Away
Credit: 4,009.8
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 3385 - Posted: 1 Jun 2006 | 13:54:49 UTC - in response to Message 3379.

> Do you have a Beowulf? How many nodes?


No, I have 5 machines in a small (cozy?) office, but they are not configured as a beowulf cluster.

____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

Profile Fuzzy Hollynoodles
Volunteer tester
International
BOINC Synergy
Credit: 90.6
RAC: 0.00
Joined: Jan 18, 2006
Verified: NEVER
Dubloons: 2
Punishment: Misfit
Message 3409 - Posted: 7 Jun 2006 | 18:47:31 UTC - in response to Message 3385.
Last modified: 7 Jun 2006 | 18:48:35 UTC


> No, I have 5 machines in a small (cozy?) office, ...


So you are saying that you have a small, nice, warm office that's cosy to be in during the winter cold?! ;-D



____________
Those who can, do. Those who can't, bully.
From here



Copyright © 2013 Capt. Jack Sparrow