[Aces-support] Problems on ao nodes

Jean-Michel Campin jmc at ocean.mit.edu
Fri Aug 31 10:06:35 EDT 2007


Hi Greg,

I ran into a similar problem last night (between 1 am and 3:30 am)
with these two nodes:
 a54-1727-037
 a54-1727-045
and got the same error message:
> poll: protocol failure in circuit setup
> p0_12955:  p4_error: Child process exited while making connection to remote process on a54-1727-037: 0
> p0_12955: (2.009466) net_send: could not write to fd=4, errno = 32

Jean-Michel

On Tue, Aug 28, 2007 at 02:03:26PM -0400, aces-admin at techsquare.com wrote:
> hello jmc-
> 
> yes, these two nodes are all right now.
> fwiw, a54-1727-033 was troubled earlier today.
> 
> [greg]
> 
> > Date: Tue, 28 Aug 2007 10:52:03 -0400
> > From: Jean-Michel Campin <jmc at ocean.mit.edu>
> > Reply-To: ACES-support at mitgcm.org
> > 
> > Hi,
> > 
> > I have two MPI jobs that did not run last night, between 1 am and 3:30 am;
> > both were on the same compute nodes:
> > a54-1727-033 
> > a54-1727-035
> > 
> > and they generated this error message:
> > poll: protocol failure in circuit setup
> > p0_30263:  p4_error: Child process exited while making connection to remote process on a54-1727-035: 0
> > p0_30263: (2.010777) net_send: could not write to fd=4, errno = 32
> > 
> > Are these two nodes OK for MPI jobs now?
> > Thanks,
> > Jean-Michel
> > _______________________________________________
> > Aces-support mailing list
> > Aces-support at acesgrid.org
> > http://acesgrid.org/mailman/listinfo/aces-support
> > 


