
...

By default, users are assigned one core at a time, and a chunk of memory (~4GB RAM) associated with that core. 

For many applications, this may not be enough memory.  To get around this, you can reserve more cores:

Code Block
languagebash
srun --account=<myPIRG> --cpus-per-task=28 --pty bash

This will reserve all 28 cores on a basic node, which should suffice for most applications. 

Alternatively, if you just want to reserve more memory, do:

Code Block
languagebash
srun --account=<myPIRG> --mem=50G --pty bash

This will reserve 1 core but 50GB RAM on a node. 

It may be useful to reserve a full node and determine how much RAM your application actually needs; future jobs can then be adjusted accordingly.
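
One way to find out how much memory a finished job actually used is SLURM's sacct command; the MaxRSS field reports the peak resident memory of the job's steps (the job ID below is just a placeholder):

Code Block
languagebash
# Peak memory (MaxRSS), requested memory, and runtime for a completed job
sacct -j 1234567 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed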

...

You can now view your disk usage and your current quota using the "quota" command, located at /usr/local/bin/quota.

The first lines give your usage and quota, and additional lines give group usages and quotas for all the groups that you are a member of.
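
For example, assuming /usr/local/bin is on your PATH (output formatting may vary):

Code Block
languagebash
quota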


3.  "Oops!  I just accidentally deleted some of my files and directories.  Can you please recover them?

You just might be in luck.  Talapas does NOT have backups, but it does take regular filesystem snapshots.  These are located in the directory /gpfs/.snapshots/.
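
To see which snapshots are currently available, you can simply list that directory:

Code Block
languagebash
ls /gpfs/.snapshots/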

Snapshots are taken at the following times:

...

For example, to copy a deleted file from last Friday's snapshot of your project directory:

Code Block
languagebash
cp /gpfs/.snapshots/daily-Fri/projects/myPIRG/myDataDir/path/to/myDeletedFile .
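
To recover an entire directory rather than a single file, the same pattern works with a recursive copy (the paths below are placeholders, as in the example above):

Code Block
languagebash
# Recursively copy a deleted directory out of last Friday's snapshot
cp -rp /gpfs/.snapshots/daily-Fri/projects/myPIRG/myDataDir/path/to/myDeletedDir .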


Your data may be accessible from within the snapshot, but once the old snapshot has been replaced by a new snapshot, data unique to the older snapshot will be gone.

...

Perhaps the easiest way to do this is with rsync, e.g.:

Code Block
languagebash
rsync -auv myDuckID@dtn01.uoregon.edu:/projects/myPIRG/myDuckID/myDataDirectory myDestinationDirectory


Alternatively, you can use scp:

Code Block
languagebash
scp -rp myDuckID@dtn01.uoregon.edu:/projects/myPIRG/myDuckID/myDataDirectory myDestinationDirectory


Each of the examples above will create a directory named myDestinationDirectory on your local Mac/Linux machine.

If your transfer gets interrupted, try running rsync again with the "-u" (update) option.
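
For example, rerunning the earlier command will skip files that have already arrived; adding rsync's --partial flag (an optional extra, not shown above) also keeps partially transferred files so that large files do not restart from the beginning:

Code Block
languagebash
rsync -auv --partial myDuckID@dtn01.uoregon.edu:/projects/myPIRG/myDuckID/myDataDirectory myDestinationDirectory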

From a Windows machine: 

...

After your transfer has completed, you can reorganize as you please.  If you need further assistance with your file transfer, let us know.


45. "How can I extend my jobs on Talapas?"

It is currently not possible for a user to extend their jobs on Talapas.


56. "Why is my job pending with 'ReqNodeNotAvail' message when I type squeue?"

The most likely explanation is that your job cannot start yet because it would overlap with an existing reservation, e.g. a reservation made for a maintenance outage.  If this is the case, and if you know that your job can complete before the outage window, you can change the TimeLimit of your queued job as follows:

scontrol update jobid=1234567 TimeLimit=2-12:00:00

This will change the TimeLimit of jobid to 2 days and 12 hours.  To resubmit jobs with non-default time limits, use the "--time" option.  For example, to submit a job to the long queue but for only 4 days rather than the default 14 days, add this SBATCH directive to your batch script:

#SBATCH --time=4-00:00:00

If the maintenance window is scheduled for the next day and you want an interactive job on the short queue for just six hours rather than the default 24 hours, try this:

srun -p short --time=0-06:00:00 --pty bash -i
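
You can confirm a job's current time limit (before or after changing it) with scontrol; the job ID below is a placeholder:

Code Block
languagebash
# Show the scheduler's record for the job, including its TimeLimit
scontrol show job 1234567 | grep -i timelimit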

Efforts will be made to communicate maintenance outages 30 days, 14 days, and 1 day before the outage begins.  In addition, the current maintenance schedule for Talapas is published here:

Scheduled Maintenance Windows
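
You can also check for active or upcoming reservations yourself from a login node, e.g.:

Code Block
languagebash
# List any scheduler reservations (e.g. maintenance windows) and their start times
scontrol show reservation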

If you get this "ReqNodeNotAvail" message and there is no maintenance scheduled, please leave the job in the queue and submit a ticket, and we will investigate.  See FAQ: Why is my SLURM job not running?


7. "I'm seeing an error mentioning a directory /run/user, and it's causing trouble.  How do I fix it?"

The SLURM commands pass all environment variables into your job by default.  One of these variables, XDG_RUNTIME_DIR, references a special directory under /run/user that will not exist on your compute node.  Programs are supposed to ignore this, but some do not.  You can work around this by adding the line below to your script, immediately after all #SBATCH directives.

Code Block
languagebash
#!/bin/bash
#SBATCH --account=<myPIRG>
...

unset XDG_RUNTIME_DIR


8.  "How do I view my compute usage on Talapas?"

Check out the link below:

How-to view Talapas compute usage


9. "I have exceeded my disk quota, can you increase it?"

Most quota issues involve the home directory.  All users get a hard 10GB quota that is fixed and will not be increased.  We recommend that users use their project directory, /projects/<myPIRG>/<myUsername>/, to write and store their data.  By default, PIRGs have a minimum of 2 TB to work with in their project directories, and users have a shortcut in their home directory that points to their project directory, so it can be reached from the home directory by simply running cd <myPIRG>.

If you do not have this shortcut and would like to have it, perform the following commands:

Code Block
languagebash
cd ~
ln -s /projects/<myPIRG>/<myUsername> <myPIRG>
cd <myPIRG>

You can now work from your projects folder.

Your project usage and your PIRG's usage and quota are displayed at login.  Your project quota is a soft quota - if your group's project quota has been exceeded, your group will have a 30 day grace period to bring usage back under the quota.  If that does not happen, it becomes a hard quota and no new files can be created or stored.
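
If you need to track down what is using the space, a disk-usage summary of your project directory is one place to start (a generic example, not a Talapas-specific tool):

Code Block
languagebash
# Summarize space used by each top-level item in your project directory
du -sh /projects/<myPIRG>/<myUsername>/*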

If that becomes the case and your group needs your project quota increased, let us know at racs@uoregon.edu.


10.  "Why can't I submit a job, I keep getting 'srun: error: AssocGrpSubmitJobsLimit' or 'Job violates accounting/QOS policy' message?"

You may be getting this message if you have not specified your PIRG in your job submission.  This is now required.  To fix this, do the following:

For srun,

Code Block
languagetext
srun -A <myPIRG> ...

or

Code Block
languagetext
srun --account=<myPIRG> ...

NOTE: The 'srun' command will NOT pay attention to '#SBATCH' directives, so if the account is not specified using an argument, you will get this error!
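
If you use srun interactively and tend to forget the flag, note that on most SLURM installations srun also honors the SLURM_ACCOUNT environment variable (and sbatch honors SBATCH_ACCOUNT), so you could optionally set a default in your shell startup file:

Code Block
languagebash
# Optional: set default accounts so -A/--account can be omitted
# (SLURM_ACCOUNT is read by srun, SBATCH_ACCOUNT by sbatch)
export SLURM_ACCOUNT=<myPIRG>
export SBATCH_ACCOUNT=<myPIRG>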


For sbatch, include the following directive in your submission script:

Code Block
languagebash
#SBATCH -A <myPIRG>

or

Code Block
languagebash
#SBATCH --account=<myPIRG>


If you have added the correct account information and you still cannot submit your jobs, contact us.
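
If you are not sure which account name (PIRG) to use, you can list your SLURM associations; this assumes the sacctmgr query below is available to regular users:

Code Block
languagebash
# List the accounts (PIRGs) your user is associated with
sacctmgr show associations user=$USER format=Account,User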