[Aces-support] cannot start new job on ao and geo

Richard Lu lurr at MIT.EDU
Wed Aug 29 11:44:59 EDT 2007


Thanks for the effort.

Rongrong Lu

--------------------------------------------
Earth Resources Laboratory, MIT
77 Massachusetts Ave., Bldg.54-1815, Cambridge, MA 02139
Tel:     617-253-7835 (o)  617-230-6729 (m)
Email:   lurr at mit.edu
Web:     http://web.mit.edu/lurr
--------------------------------------------

aces-admin at techsquare.com wrote:
> hello lurr-
> 
> this is not strange at all, if the 
> "cluster environment" doesn't change. 
> 
> compute nodes are chosen in a fixed order.
> if you submit 2 jobs, quit them, and then
> submit 2 more jobs, you will get the
> same compute nodes as long as nothing
> else has changed in the "cluster environment" -
> e.g., other user jobs, machines down for repairs, etc.
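
A minimal sketch of the fixed-order selection described above, assuming a simple "first free node in list order" policy. The node names and the `pick_nodes` helper are hypothetical and are not taken from the actual ACES scheduler configuration; this only illustrates why repeated submissions can land on the same nodes.

```shell
#!/usr/bin/env bash
# Hypothetical node list, in the fixed order the scheduler is assumed to scan.
NODES=(node01 node02 node03 node04)

# pick_nodes COUNT [BUSY_NODE...]
# Print the first COUNT nodes, in list order, that are not named as busy.
pick_nodes() {
  local want=$1; shift
  local busy=" $* "          # pad with spaces so whole names can be matched
  local picked=()
  local n
  for n in "${NODES[@]}"; do
    (( ${#picked[@]} >= want )) && break
    [[ $busy == *" $n "* ]] && continue   # skip nodes that are in use
    picked+=("$n")
  done
  echo "${picked[*]}"
}

# Same environment, same answer: both calls print the same two nodes.
pick_nodes 2
pick_nodes 2

# The answer changes only when the environment does (here node01 is busy).
pick_nodes 2 node01
```

Under this assumption, if the second node in the list is the faulty one, the second of two jobs would hit it every time until something else in the environment changes.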
> 
> i am working to find the root cause of this
> breakage, so please bear with me...
> 
> [greg]
> 
>> Date: Tue, 28 Aug 2007 16:28:46 -0400
>> From: Richard Lu <lurr at mit.edu>
>> MIME-Version: 1.0
>> Cc: 
>> Reply-To: ACES-support at mitgcm.org
>>
>> It is strange: if I close all the jobs and then submit two job 
>> requests, the first one starts normally, but the second one has the 
>> problem. For example, I requested two jobs a moment ago; the first job 
>> in the queue, 68906.geo, started, and the other job, 68907.geo, had 
>> the problem. I hope this gives you more of a clue about what's going 
>> on. Thanks.
>>
>> [lurr at geo:~]
>> $ nt1
>> qsub: waiting for job 68907.geo to start
>> qsub: job 68907.geo ready
>>
>>
>> qsub: job 68907.geo completed
>>
>>
>>
>>
>> aces-admin at techsquare.com wrote:
>>> hmm, and again, please ? 
>>>
>>> [greg]
>>>
>>>> Date: Tue, 28 Aug 2007 10:30:33 -0400
>>>> From: Richard Lu <lurr at mit.edu>
>>>> MIME-Version: 1.0
>>>> Cc: 
>>>> Reply-To: ACES-support at mitgcm.org
>>>>
>>>> No, it still has the problem:
>>>>
>>>> [lurr at geo:~/scratch/s40/deimos]
>>>> $ nt1
>>>> qsub: waiting for job 68882.geo to start
>>>> qsub: job 68882.geo ready
>>>>
>>>>
>>>> qsub: job 68882.geo completed
>>>>
>>>>
>>>>
>>>> aces-admin at techsquare.com wrote:
>>>>> hello lurr-
>>>>>
>>>>> is this still happening for you ? 
>>>>> i've just checked both geo and ao 
>>>>> and they seem to be fine...
>>>>>     
>>>>> actually, i just tweaked geo a bit.
>>>>> does that help for you ? 
>>>>>
>>>>> [greg]
>>>>>
>>>>> ps. i killed your MATLAB on the head-node.
>>>>>     please do not run computationally intensive
>>>>>     code on the head nodes, etc.
>>>>>
>>>>>> Date: Tue, 28 Aug 2007 10:05:23 -0400
>>>>>> From: Richard Lu <lurr at mit.edu>
>>>>>> MIME-Version: 1.0
>>>>>> Cc: 
>>>>>> Reply-To: ACES-support at mitgcm.org
>>>>>>
>>>>>> Hi, there,
>>>>>>
>>>>>> I cannot start any new job on either ao or geo. When I submit a job 
>>>>>> request, it says the job is ready, and then the job is immediately 
>>>>>> killed. Is anything wrong? Thanks.
>>>>>>
>>>>>> [lurr at ao:~] $ qsub -I -q long -l nodes=1
>>>>>> qsub: waiting for job 86713.ao to start
>>>>>> qsub: job 86713.ao ready
>>>>>>
>>>>>>
>>>>>> qsub: job 86713.ao completed
>>>>>>
>>>>>>
>>>>>> [lurr at geo:~]
>>>>>> $ qsub -I -q long -l nodes=1
>>>>>> qsub: waiting for job 68879.geo to start
>>>>>> qsub: job 68879.geo ready
>>>>>>
>>>>>>
>>>>>> qsub: job 68879.geo completed
>>>>>>
>>>>>>
>>>>>> Rongrong Lu
>>>>>>
>>>>>> _______________________________________________
>>>>>> Aces-support mailing list
>>>>>> Aces-support at acesgrid.org
>>>>>> http://acesgrid.org/mailman/listinfo/aces-support
>>>>>>


