1. "My jobs on Talapas are crashing immediately, what's wrong??"
By default, users are assigned one core at a time, and a chunk of memory (~4GB RAM) associated with that core. For many applications, this may not be enough memory. To get around this, one can reserve more cores:
[user@ln1 ~]$ srun -n28 --pty bash
This will reserve all 28 cores on a basic node, which should suffice for most applications. Alternatively, if you just want to reserve more memory, do:
[user@ln1 ~]$ srun --mem=50G --pty bash
This will reserve 1 core but 50GB RAM on a node. It may be useful to reserve a full node once, determine how much RAM your application actually needs, and then adjust future jobs accordingly.
If your jobs are crashing for a different reason, check that you are not exceeding a disk quota (e.g. your home directory quota). If it is not the quota, please submit a ticket and let us know the jobid of the job that crashed.
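The same resource requests can go directly into a batch script instead of an interactive srun session. A minimal sketch of a submission script, where the job name, memory amount, and program are placeholders rather than recommended values:

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --mem=50G               # request 50GB RAM rather than the ~4GB per-core default
#SBATCH --cpus-per-task=1       # one core; raise this if your program is threaded
#SBATCH --output=myjob-%j.out   # %j expands to the job ID

# replace with your actual application
./my_program
```

Submit it with `sbatch myscript.sh`; the jobid it prints is what to include in a ticket if the job crashes.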
2. "How do I view my disk usage and quota?"
You can now view your disk usage and your current quota using the "quota" command, located in /usr/local/bin. The first lines give your usage and quota, and additional lines give group usages and quotas for all the groups that you are a member of.
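If you want to see which directories are consuming your quota (for example, before cleaning up), `du` gives a quick per-directory estimate. A sketch using a throwaway directory so the commands can be run anywhere; the paths are illustrative:

```shell
# build an illustrative directory with one small file
mkdir -p /tmp/du_demo
printf 'hello\n' > /tmp/du_demo/file.txt

# -s summarizes the total for the directory; -k reports in 1KB blocks
du -sk /tmp/du_demo
```

On Talapas you would point this at your home or project directory, e.g. `du -sk ~/*` to find the largest subdirectories.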
3. "Oops! I just accidentally deleted some of my files and directories. Can you please recover them?"
You just might be in luck. Talapas does NOT have backups, but it does take regular filesystem snapshots. These are located in /gpfs/.snapshots/ and are taken at the following times:
end of each day (before midnight, Sunday through Friday): daily-{Sun-Fri}
end of each week (before midnight on Saturday): weekly-{week # of month}
For example, to copy your deleted file from your project directory from last Friday:
[user@ln1 ~]$ cp /gpfs/.snapshots/daily-Fri/projects/myPIRG/myDataDir/path/to/myDeletedFile .
Your data may still be accessible from within a snapshot, but once an old snapshot has been replaced by a new one, any data unique to the older snapshot is gone.
Users are responsible for backing up their own data! Please see the FAQ below on how to transfer your files off of Talapas.
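The restore pattern above boils down to copying out of a read-only snapshot root into a writable location. The sketch below builds a mock snapshot tree under /tmp so the commands can be run anywhere; on Talapas the root would be /gpfs/.snapshots/daily-Fri (or another snapshot name) and the rest of the path mirrors your normal directory layout:

```shell
# mock snapshot layout (illustrative only; real snapshots live under /gpfs/.snapshots/)
SNAP=/tmp/mock_snapshots/daily-Fri
mkdir -p "$SNAP/projects/myPIRG/myDataDir"
printf 'recovered data\n' > "$SNAP/projects/myPIRG/myDataDir/myDeletedFile"

# restore: copy out of the (read-only) snapshot, preserving timestamps with -p
mkdir -p /tmp/restore
cp -p "$SNAP/projects/myPIRG/myDataDir/myDeletedFile" /tmp/restore/
```

Snapshots are read-only, so `cp` out of them is safe; you cannot accidentally modify the snapshot itself.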
4. "What is the best way to transfer my files to/from Talapas?"
From a Mac or Linux machine:
Perhaps the easiest way for you to do this is by using rsync, e.g.:
[user@ln1 ~]$ rsync -auv myDuckID@dtn01.uoregon.edu:/projects/myPIRG/myDuckID/myDataDirectory myDestinationDirectory
Alternatively, you can use scp:
[user@ln1 ~]$ scp -rp myDuckID@dtn01.uoregon.edu:/projects/myPIRG/myDuckID/myDataDirectory myDestinationDirectory
This will create a directory "myDestinationDirectory" on your Mac/Linux machine. If your transfer gets interrupted, simply re-run the rsync command; the "u" (update) option skips files that have already been transferred.
From a Windows machine:
You will need an SFTP/SCP client like WinSCP or Cyberduck (Cyberduck is also available for Mac).
Globus and GridFTP are also available on Talapas (see HOWTOs for more info).
After your transfer has completed, you can reorganize as you please. If you need further assistance on your file transfer, let us know.
5. "How can I extend my jobs on Talapas?"
It is currently not possible for users to extend their own jobs on Talapas.
6. "Why is my job pending with a 'ReqNodeNotAvail' message when I type squeue?"
The most likely explanation is that your job cannot start yet because it would overlap with an existing reservation, e.g. a reservation made for a maintenance outage. If this is the case, and you know that your job can complete before the outage window, you can shorten the TimeLimit of your queued job as follows:
scontrol update jobid=1234567 TimeLimit=2-12:00:00
This changes the TimeLimit of job 1234567 to 2 days and 12 hours. To submit new jobs with non-default time limits, use the "--time" option. For example, to submit a job to the long queue for only 4 days rather than the default 14 days, add this SBATCH directive to your batch script:
#SBATCH --time=4-00:00:00
If the maintenance window is scheduled for the next day and you want an interactive job on the short queue for just six hours rather than the default 24 hours, try this:
srun -p short --time=0-06:00:00 --pty bash -i
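When deciding whether a job fits before an outage window, it can help to convert Slurm's D-HH:MM:SS time specifications into seconds for comparison. A small helper sketch; the function name is ours for illustration, not a Slurm tool:

```shell
# Convert a Slurm "D-HH:MM:SS" (or "HH:MM:SS") time limit to seconds.
# Function name is illustrative, not part of Slurm.
slurm_time_to_seconds() {
    spec=$1
    days=0
    case $spec in
        *-*) days=${spec%%-*}; spec=${spec#*-} ;;
    esac
    h=${spec%%:*}
    rest=${spec#*:}
    m=${rest%%:*}
    s=${rest#*:}
    # strip a leading zero so e.g. "08" is not read as octal
    echo $(( days*86400 + ${h#0}*3600 + ${m#0}*60 + ${s#0} ))
}

slurm_time_to_seconds 2-12:00:00   # 2 days 12 hours = 216000 seconds
```

Comparing that number against the seconds remaining until the outage begins tells you whether a shortened TimeLimit will let the job run.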
Efforts will be made to communicate maintenance outages 30 days, 14 days, and 1 day before the outage begins. In addition, the current maintenance schedule for Talapas is published here:
If you get this "ReqNodeNotAvail" message and there is no maintenance scheduled, please leave the job in the queue, submit a ticket and we will investigate.