[Aces-support] Error with submitting job

Chinnawat Surussavadee surusc at MIT.EDU
Mon Oct 1 10:37:13 EDT 2007


Hello again Greg,

Sorry for not providing enough info in the last email.

Below are my pbs job, pbs output file, and pbs error file.  Please let me know
if you need other info.

Thanks,

Pop


This is my pbs job:
[surusc@geo tbamsua_nadir]$ more January_2_9.pbs
#!/bin/bash
#PBS -q one
#PBS -N January_2_9
#PBS -l nodes=1:ppn=2
#PBS -V
echo $PBS_NODEFILE
cat  $PBS_NODEFILE
cd $PBS_O_WORKDIR

export MODULE_VERSION="3.1.6"
source /usr/local/pkg/modules/modules-$MODULE_VERSION/init/bash

module load matlab

matlab < January_2_9.m > matlab.out

-------------------
This is the pbs output file:
[surusc@geo tbamsua_nadir]$ more January_2_9.o93573
/var/torque/aux/93573.geo
aE34-500-025
aE34-500-025

-------------------
This is the pbs error file:
[surusc@geo tbamsua_nadir]$ more January_2_9.e93573
ModuleCmd_Load.c(199):ERROR:105: Unable to locate a modulefile for 'man'
??c(3):ERROR:105: Unable to locate a modulefile for 'modules'
??c(4):ERROR:105: Unable to locate a modulefile for 'rsh'
??c(5):ERROR:105: Unable to locate a modulefile for 'torque'
$(7):ERROR:105: Unable to locate a modulefile for 'matlab'
/var/torque/mom_priv/jobs/93573.geo.SC: line 15: matlab: command not found
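
The errors above are consistent with (though they do not prove) the modulefile
directories not being visible from the compute node. A minimal diagnostic
sketch follows, assuming the job runs under bash; check_modules_env is a
hypothetical helper and is not part of the original job script. It only
reports whether the Modules init file and each MODULEPATH directory can be
seen from the node where it runs.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (an assumption, not from the original job).
# Prints whether the given Modules init file is readable and whether each
# directory listed in MODULEPATH exists on the current node.
# Returns 0 if everything is found, 1 otherwise.
check_modules_env() {
    local init_file="$1"
    local status=0

    if [ -r "$init_file" ]; then
        echo "init file readable: $init_file"
    else
        echo "init file missing or unreadable: $init_file"
        status=1
    fi

    if [ -z "${MODULEPATH:-}" ]; then
        echo "MODULEPATH is empty"
        status=1
    else
        local dir
        local IFS=':'            # split MODULEPATH on colons
        for dir in $MODULEPATH; do
            if [ -d "$dir" ]; then
                echo "modulefile dir present: $dir"
            else
                echo "modulefile dir missing: $dir"
                status=1
            fi
        done
    fi
    return "$status"
}
```

In the job script this could be called right after the source line and before
"module load matlab", for example:
check_modules_env "/usr/local/pkg/modules/modules-$MODULE_VERSION/init/bash"
so that its output lands in the pbs output file.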



Quoting aces-admin at techsquare.com:

> hello pop-
>
> as you know, i really need a bit more data
> here -- machine-name, job-id, something to
> go on.
>
> thank you
>
> [greg]
>
>
>> Date: Mon, 01 Oct 2007 10:22:07 -0400
>> From: Chinnawat Surussavadee <surusc at MIT.EDU>
>> Reply-to: pop at alum.mit.edu
>> Cc: ACES-support at mitgcm.org
>> MIME-Version: 1.0
>>
>> Hello Greg,
>>
>> Thank you for the reply.
>>
>> 1. I just tried to qsub the pbs job again on geo and got the same error.
>> By the way, the pbs job will read/write data from/to /net/ds-01/scratch-0/.
>> I also tried to submit a similar pbs job on itrda as you suggested and got
>> the same error, attached below.
>>
>> [surusc@itrda tbamsua_nadir]$ more February_12_9.e18899
>> ModuleCmd_Load.c(199):ERROR:105: Unable to locate a modulefile for 'man'
>> ??c(3):ERROR:105: Unable to locate a modulefile for 'modules'
>> ??c(4):ERROR:105: Unable to locate a modulefile for 'rsh'
>> ??c(5):ERROR:105: Unable to locate a modulefile for 'torque'
>> $(7):ERROR:105: Unable to locate a modulefile for 'matlab'
>> /var/torque/mom_priv/jobs/18899.itrda.SC: line 15: matlab: command not found
>>
>> 2. I still could not get into /net/ds-0a/raid1/, as the error message
>> below shows.
>>
>> [surusc@geo ~]$ cd /net/ds-0a/raid1/
>> -bash: cd: /net/ds-0a/raid1/: No such file or directory
>>
>> Thanks,
>>
>> Pop
>>
>>
>>
>> Quoting aces-admin at techsquare.com:
>>
>> > hello pop-
>> >
>> >> 1. I tried to submit a job using qsub, but I got some errors.  Could
>> >> anyone please point out how I should fix this problem?  Both pbs job
>> >> and error message are shown below.  Please note that this same pbs job
>> >> worked fine before (some time last year)
>> >
>> > looks like transient network issue.
>> >
>> > for what it's worth, the itrda nodes are much
>> > closer to ds-0a than the geo-nodes.
>> >
>> >> 2. I could not access /net/ds-0a/raid1/.  Could the admin please
>> >> check if the disk is mounted properly?
>> >
>> > yes, it is. if there was lots of network
>> > activity, though, this may have timed-out.
>> >
>> > i have just verified that things at ds-0a
>> > are well and will check out the several switches
>> > throughout the day today.
>> >
>> > [greg]
>> >
>> >
>>
>>
>
