df -h /u/ml at the command line. You can (and should) use condor to distribute the computation (see below) here.
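If you would rather do the same free-space check from inside a script, a minimal Python sketch is below. It relies only on the standard library's shutil.disk_usage. The function names disk_report and has_room are made up for illustration, and the numbers describe the whole filesystem, not your personal share of it.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB, the same power-of-two units that df -h uses


def disk_report(path):
    """Return total, used, and free space (in GiB) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_gib": usage.total / GIB,
        "used_gib": usage.used / GIB,
        "free_gib": usage.free / GIB,
    }


def has_room(path, needed_gib):
    """True if the filesystem holding `path` reports at least `needed_gib` GiB free."""
    return disk_report(path)["free_gib"] >= needed_gib
```

For example, has_room("/u/ml", 50) could guard a job that is about to write roughly 50GB of output.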
A side note: on CS space that is backed up, such as your home directory and /u/ml, you can type "cd .snapshot" in any directory to access snapshots of that directory from each of the last 5 hours, the last 12 days, and the last 4 weeks. This can be a lifesaver when you've accidentally overwritten important files (although it is no substitute for a good source code revision control system, such as git or SVN).
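Recovering a file from a snapshot can also be scripted. The sketch below assumes that .snapshot contains one subdirectory per snapshot, each mirroring the current directory; that layout and the function names are assumptions, so look inside .snapshot before relying on it. It refuses to overwrite an existing file, so the restored copy goes to a new name.

```python
import shutil
from pathlib import Path


def snapshot_versions(file_path, snapshot_dirname=".snapshot"):
    """List snapshot copies of `file_path`, most recently modified first.

    Assumes each entry under the snapshot directory is itself a directory
    that mirrors the directory containing `file_path`.
    """
    file_path = Path(file_path)
    snap_root = file_path.parent / snapshot_dirname
    if not snap_root.is_dir():
        return []
    copies = [
        snap / file_path.name
        for snap in snap_root.iterdir()
        if (snap / file_path.name).is_file()
    ]
    return sorted(copies, key=lambda p: p.stat().st_mtime, reverse=True)


def restore(snapshot_copy, destination):
    """Copy a snapshot version to `destination` without overwriting anything."""
    destination = Path(destination)
    if destination.exists():
        raise FileExistsError(f"{destination} already exists; pick another name")
    shutil.copy2(snapshot_copy, destination)
    return destination
```

A typical use would be to call snapshot_versions("results.csv"), inspect the candidates, and then restore the one you want to something like results.recovered.csv (both file names are placeholders).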
There is a large unbacked-up scratch disk accessible only on
the Mastodon nodes. It is currently 24TB.
To access it, e-mail gripe and ask for
Mastodon scratch space, which they will grant by giving you a
directory called /scratch/cluster/$USER. You cannot access this
directory directly from your desktop; instead, ssh into one of the
Mastodon submit nodes (submit64.cs.utexas.edu) and work from
there, then submit jobs to condor with the line
InMastodon" to process them.
Starting July 2013, each /scratch user will have a 100GB initial quota. If you're working on a project and need more space, you can email gripe and ask for a temporary increase.
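To keep an eye on how close you are to the quota, something like the following Python sketch may help. It simply adds up file sizes under a directory and compares the total to 100GB. How the quota is actually accounted (apparent size versus disk blocks, decimal versus binary gigabytes) is not stated here, so treat the result as an estimate; the function names are illustrative.

```python
import os

# 100GB initial quota from the text; binary gigabytes are an assumption.
QUOTA_BYTES = 100 * 1024 ** 3


def directory_usage_bytes(root):
    """Sum the apparent sizes of all files under `root` (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def quota_fraction(root, quota_bytes=QUOTA_BYTES):
    """Fraction of the quota used by `root`, e.g. 0.5 means half full."""
    return directory_usage_bytes(root) / quota_bytes
```

On a Mastodon submit node you would point it at your own /scratch/cluster/$USER directory.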
More about Condor and our cluster. If you use Condor heavily, it is very useful to make a script to help you submit jobs. Others have written much more complex systems that simulate MapReduce.
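As a starting point for such a script, here is a minimal Python sketch that writes a Condor submit description with one job per argument string and hands it to condor_submit. The submit commands used (universe, executable, arguments, output, error, log, requirements, queue) are standard Condor ones, but the function names, the file naming scheme, and the choice of the vanilla universe are assumptions. The requirements expression is left as a parameter; take the exact line for our cluster from the department's Condor documentation, along with any other site-specific lines it asks for.

```python
import subprocess


def submit_description(executable, argument_sets, log_dir, requirements=None):
    """Build the text of a submit file that queues one job per entry in `argument_sets`."""
    lines = ["universe = vanilla", f"executable = {executable}"]
    if requirements:
        # Site-specific expression supplied by the caller; not invented here.
        lines.append(f"requirements = {requirements}")
    lines.append(f"log = {log_dir}/jobs.log")
    for i, args in enumerate(argument_sets):
        lines += [
            "",
            f"arguments = {args}",
            f"output = {log_dir}/job{i}.out",
            f"error = {log_dir}/job{i}.err",
            "queue",
        ]
    return "\n".join(lines) + "\n"


def submit(executable, argument_sets, log_dir, submit_path, requirements=None):
    """Write the submit file to `submit_path` and pass it to condor_submit."""
    text = submit_description(executable, argument_sets, log_dir, requirements)
    with open(submit_path, "w") as fh:
        fh.write(text)
    subprocess.run(["condor_submit", submit_path], check=True)
```

Generating the text separately from submitting it makes it easy to inspect the submit file before anything is sent to the cluster. The log directory must already exist when the jobs run.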
You may also ask for local storage on your machine. E-mail gripe
and ask them to make you a directory on /var/local/ on your machine.
You may want to see how big that is by typing
df -h /var/local.
Depending on your machine, it will be roughly 40GB to 750GB. This space is
not accessible through condor at all, is not backed up, and may be
lost on system upgrades, hard drive failures, etc. Gripe will send
you a list of all the warnings about storage in /var/local when you
request the space.
Ray may also be willing to buy external hard drives that you can attach to your computer. This will give you a few TB of personal disk space, but it is not available on the network and is not backed up.
There are some resources available outside of the department, particularly through TACC and Matt Lease and Jason Baldridge's recent LIFT grant. TACC operates a variety of high performance clusters and has up to 6PB of storage. You can run Hadoop MapReduce jobs on some of these clusters, and there is a page describing the process. If you want MPI, this is your place. You may need to get added to an allocation for it, so talk to Ray, Matt, or Jason if you think this is your best option.
The computational linguists also operate a small cluster of machines that have some storage and maintain some corpora. If you are working with people in the linguistics department and would like access, contact Katrin or Jason. This cluster is currently without a real admin, and its future is not entirely certain. There is also a small Hadoop cluster (Markov) in CLA, with somewhere in the 12TB range of HDFS storage.