[Aces-support] cannot start new job on ao and geo

aces-admin at techsquare.com
Wed Aug 29 08:56:27 EDT 2007


hello lurr-

this is not strange at all, if the 
"cluster environment" doesn't change. 

compute nodes are chosen in a fixed order. 
if you submit 2 jobs, quit them, and then
submit 2 more jobs, you will get the 
same compute nodes as long as nothing
else has changed in the "cluster environment" -
e.g., other users' jobs, machines down for repairs, etc.
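
The fixed-order behaviour described above can be illustrated with a small sketch. It is not the scheduler's actual code: the node names and the `submit` helper are hypothetical, and the only assumption is that each job receives the first free node in a fixed list.

```python
# Hypothetical node names; a real scheduler keeps its own node list.
NODES = ["node001", "node002", "node003", "node004"]

def submit(busy):
    """Give a job the first node in NODES that is not busy (fixed-order scan)."""
    for node in NODES:
        if node not in busy:
            busy.add(node)
            return node
    raise RuntimeError("no free node")

busy = set()
first_round = [submit(busy), submit(busy)]
busy.clear()                      # both jobs quit, nothing else changes
second_round = [submit(busy), submit(busy)]
print(first_round == second_round)

# Another user's job holding node001 changes which nodes come next.
busy = {"node001"}
third_round = [submit(busy), submit(busy)]
print(third_round)
```

Under this assumption, the second job of each pair always lands on the same node until something else in the cluster changes, which would be consistent with the second job failing every time.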

i am working to find the root cause of this
breakage, so please bear with me...

[greg]

> Date: Tue, 28 Aug 2007 16:28:46 -0400
> From: Richard Lu <lurr at mit.edu>
> MIME-Version: 1.0
> Cc: 
> Reply-To: ACES-support at mitgcm.org
> 
> It is strange: if I close all the jobs and then submit two job 
> requests, the first one starts normally, but the second job 
> has a problem. For example, I requested two jobs a moment ago; the 
> first job, 68906.geo, got started from the queue, and the other job, 
> 68907.geo, had a problem. I hope this gives you more clues about what's 
> going on. Thanks.
> 
> [lurr at geo:~]
> $ nt1
> qsub: waiting for job 68907.geo to start
> qsub: job 68907.geo ready
> 
> 
> qsub: job 68907.geo completed
> 
> 
> 
> 
> aces-admin at techsquare.com wrote:
> > hmm, and again, please ? 
> > 
> > [greg]
> > 
> >> Date: Tue, 28 Aug 2007 10:30:33 -0400
> >> From: Richard Lu <lurr at mit.edu>
> >> MIME-Version: 1.0
> >> Cc: 
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> No, it still has a problem:
> >>
> >> [lurr at geo:~/scratch/s40/deimos]
> >> $ nt1
> >> qsub: waiting for job 68882.geo to start
> >> qsub: job 68882.geo ready
> >>
> >>
> >> qsub: job 68882.geo completed
> >>
> >>
> >>
> >> aces-admin at techsquare.com wrote:
> >>> hello lurr-
> >>>
> >>> is this still happening for you ? 
> >>> i've just checked both geo and ao 
> >>> and they seem to be fine...
> >>>     
> >>> actually, i just tweaked geo a bit.
> >>> does that help for you ? 
> >>>
> >>> [greg]
> >>>
> >>> ps. i killed your MATLAB on the head node.
> >>>     please do not run computationally intensive
> >>>     code on the head nodes.
> >>>
> >>>> Date: Tue, 28 Aug 2007 10:05:23 -0400
> >>>> From: Richard Lu <lurr at mit.edu>
> >>>> MIME-Version: 1.0
> >>>> Cc: 
> >>>> Reply-To: ACES-support at mitgcm.org
> >>>>
> >>>> Hi, there,
> >>>>
> >>>> I cannot start any new job on either ao or geo. When I submit a job 
> >>>> request, it says the job is ready, and then the job is immediately 
> >>>> killed. Is anything wrong? Thanks.
> >>>>
> >>>> [lurr at ao:~] $ qsub -I -q long -l nodes=1
> >>>> qsub: waiting for job 86713.ao to start
> >>>> qsub: job 86713.ao ready
> >>>>
> >>>>
> >>>> qsub: job 86713.ao completed
> >>>>
> >>>>
> >>>> [lurr at geo:~]
> >>>> $ qsub -I -q long -l nodes=1
> >>>> qsub: waiting for job 68879.geo to start
> >>>> qsub: job 68879.geo ready
> >>>>
> >>>>
> >>>> qsub: job 68879.geo completed
> >>>>
> >>>>
> >>>> Rongrong Lu
> >>>>
> >>>> --------------------------------------------
> >>>> Earth Resources Laboratory, MIT
> >>>> 77 Massachusetts Ave., Bldg.54-1815, Cambridge, MA 02139
> >>>> Tel:     617-253-7835 (o)  617-230-6729 (m)
> >>>> Email:   lurr at mit.edu
> >>>> Web:     http://web.mit.edu/lurr
> >>>> --------------------------------------------
> >>>> _______________________________________________
> >>>> Aces-support mailing list
> >>>> Aces-support at acesgrid.org
> >>>> http://acesgrid.org/mailman/listinfo/aces-support
> >>>>


