[Aces-support] Problems on ao nodes

aces-admin at techsquare.com
Tue Aug 28 14:03:26 EDT 2007


hello jmc-

yes, both nodes are alright now.
fwiw, a54-1727-033 was having trouble earlier today.

[greg]

> Date: Tue, 28 Aug 2007 10:52:03 -0400
> From: Jean-Michel Campin <jmc at ocean.mit.edu>
> Mime-Version: 1.0
> Cc: 
> Reply-To: ACES-support at mitgcm.org
> 
> Hi,
> 
> I have 2 MPI jobs which did not run last night, between 1 am & 3:30 am;
> both were on the same compute nodes:
> a54-1727-033
> a54-1727-035
> 
> and both generated this error message:
> poll: protocol failure in circuit setup
> p0_30263:  p4_error: Child process exited while making connection to remote process on a54-1727-035: 0
> p0_30263: (2.010777) net_send: could not write to fd=4, errno = 32
> 
> Are these 2 nodes OK now for MPI jobs?
> Thanks,
> Jean-Michel


