[Aces-support] walltime question

aces-admin at techsquare.com
Tue Dec 2 11:57:24 EST 2008


hello huajian-

what you suggest will not work, as
the max_walltime for the 'one' queue
is less than 12h. 

secondly, the default walltime for the
'one' queue is 6h. if you point me 
to the url where there is stale data
i will fix that up.

thirdly, if you are hoping to game the
queue-system, then you will have to either
modify your job or become one of the PIs 
who set policy ;)

;; job-modification

have your job checkpoint every hour or
two, then set the pieces free (each with
the appropriate pickup files) in the 'one'
queue, so that each piece fits within that
queue's walltime.
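
a minimal sketch of such a self-resubmitting job script, under stated
assumptions: the names PICKUP, TOTAL_STEPS, STEPS_PER_JOB, SUBMIT and
job.sh are hypothetical, and the real model step is left as a comment.
by default it only prints the qsub line rather than submitting anything.

```shell
#!/bin/sh
# sketch only: PICKUP, TOTAL_STEPS, STEPS_PER_JOB and job.sh are hypothetical
PICKUP=${PICKUP:-pickup.txt}          # file holding the last finished step
TOTAL_STEPS=${TOTAL_STEPS:-6}         # whole run, in checkpoint intervals
STEPS_PER_JOB=${STEPS_PER_JOB:-2}     # intervals that fit in one 'one' job
SUBMIT=${SUBMIT:-echo qsub}           # dry run; set SUBMIT=qsub to submit

# print the last finished step, or 0 when there is no pickup file yet
last_step() {
    if [ -f "$PICKUP" ]; then cat "$PICKUP"; else echo 0; fi
}

# run at most STEPS_PER_JOB steps, writing the pickup file after each one
run_slice() {
    step=$(last_step)
    end=$((step + STEPS_PER_JOB))
    if [ "$end" -gt "$TOTAL_STEPS" ]; then end=$TOTAL_STEPS; fi
    while [ "$step" -lt "$end" ]; do
        step=$((step + 1))
        # the real work for this step would go here, restarting from $PICKUP
        echo "$step" > "$PICKUP"
    done
}

run_slice
if [ "$(last_step)" -lt "$TOTAL_STEPS" ]; then
    # not finished: hand the rest to a fresh job in the 'one' queue
    $SUBMIT -q one -l nodes=1,walltime=06:00:00 job.sh
fi
```

each submitted piece asks for 6h, the default walltime of the 'one'
queue, so no single piece needs the 12h that the queue will not grant.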

;; job-submission modification

request all 16 nodes available to your
job in the 'long' queue and 'manually
multiplex' your job that way. 
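
one way to read 'manually multiplex', sketched under assumptions: the
scheduler is PBS/Torque-like and sets PBS_NODEFILE to a file listing
the allocated nodes, the nodes accept ssh from one another, and NTASKS
and run_task.sh are hypothetical names for the number of runs and the
per-run script. the loop is a dry run that only prints the launch lines.

```shell
#!/bin/sh
# sketch only: assumes a PBS/Torque-style scheduler that sets PBS_NODEFILE;
# NTASKS and run_task.sh are hypothetical

# print "node task" pairs, dealing tasks 1..ntasks round-robin over the nodes
assign_tasks() {
    awk -v n="$2" '
        NF { node[++count] = $1 }
        END {
            if (count == 0) exit 1
            for (t = 1; t <= n; t++) print node[(t - 1) % count + 1], t
        }' "$1"
}

if [ -n "${PBS_NODEFILE:-}" ]; then
    assign_tasks "$PBS_NODEFILE" "${NTASKS:-16}" |
    while read -r node task; do
        # dry run: prints the launch line instead of executing it
        echo "ssh -n $node ./run_task.sh $task"
    done
fi
```

to launch for real, the echo would be replaced by the ssh command run
in the background, followed by a wait inside the same loop body, so the
job does not exit before its runs finish.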

i hope that helps...

[greg]


> Date: Tue, 02 Dec 2008 11:12:23 -0500
> From: Huajian Yao <hjyao at MIT.EDU>
> Reply-To: ACES-support at mitgcm.org
> 
> Greg,
> 
> Thanks a lot for your quick reply!
> Maybe you misunderstood my last email. The reason I want to switch to the
> 'one' queue is that I can run up to 64 jobs at the same time. But with the
> 'long' queue, I can only have a maximum of 8 jobs. However, the maximum
> walltime for 'one' seems to be 2 hours (according to acesgrid.org), which is
> less than the computation time of each job. Is it possible to request a longer
> walltime for the 'one' queue, like: qsub -q one -l nodes=1,walltime=12:00:00 ?
> 
> Huajian
> 
> Quoting aces-admin at techsquare.com:
> 
> > hello hjyao-
> >
> > i believe that you may be over-thinking this.
> >
> > your job, for example, does not take any longer
> > to run when executed in the 'long' queue than
> > when executed in the 'one' queue
> >
> >  unless your job does not terminate and you
> >  simply wait for the queue-system to beat it
> >  up. in this case, you should add some termination
> >  conditions to your job.
> >
> > what you may notice, however, is that jobs in
> > the 'one' queue are scheduled more readily than
> > jobs in the 'long' queue. this has more to do
> > with the default resource requirements for each
> > queue and you should be able to achieve similar
> > scheduling results by submitting your job to the
> > 'long' queue and requesting only 12h of walltime.
> >
> >  qsub -q long -l nodes=1,walltime=12:00:00
> >
> > did i mis-read your message entirely or is this
> > 'on the right track' ?
> >
> > [greg]
> >
> >
> >
> >
> >> Date: Tue, 02 Dec 2008 10:25:46 -0500
> >> From: Huajian Yao <hjyao at mit.edu>
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> Hi,
> >>
> >> I am running (many) jobs on the cluster, each using a single node and one
> >> cpu. The computation usually takes about 9-12 hours for each job. Usually I
> >> run my code as a "long" job, which works pretty well but takes too much
> >> time. I have heard from someone that the walltime for "one" type jobs has
> >> changed to 12 hours. However, on acesgrid.org, the walltime for "one" is
> >> still 2 hours. Yesterday I tried some "one" type jobs on the cluster, but
> >> all were killed after about 6 hours. Could you please let me know what the
> >> real walltime is for "one" type jobs? Thanks a lot!
> >> And I really hope the walltime for "one" can be extended to 12 hours.
> >>
> >> Huajian
> >> _______________________________________________
> >> Aces-support mailing list
> >> Aces-support at acesgrid.org
> >> http://acesgrid.org/mailman/listinfo/aces-support
> >>
> >
> 
> 
> 


