Author Topic: I get "Segmentation Fault", SIGSEGV or "Memory fault"  (Read 31231 times)

uwe

  • Global Moderator
I get "Segmentation Fault", SIGSEGV or "Memory fault"
« on: February 05, 2007, 02:23:37 PM »
Please check if your user limits are sufficient.

sh/bash/ksh users: please run ulimit -a to see your current limits. The output should look like:

core file size (blocks)     0
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max locked memory (kbytes)  unlimited
max memory size (kbytes)    unlimited
open files                  1024
pipe size (512 bytes)       8
stack size (kbytes)         unlimited
cpu time (seconds)          unlimited
max user processes          8191
virtual memory (kbytes)     unlimited


Important entries: data seg size, stack size, max memory size, and virtual memory should either be unlimited or at least as large as your total RAM.

To set, for example, the stack size to unlimited, run:

ulimit -s unlimited
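
Note that this only affects the current shell session. To have it applied in every new shell, you can put the same command into your shell's startup file, for example (assuming bash is your login shell):

# in $HOME/.bashrc (assumption: bash as login shell; other shells use different startup files)
ulimit -s unlimited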

csh/tcsh users: please run limit and check the output.
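
The output will look similar to the following (a rough sketch; the exact categories and values vary between systems):

cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize 0 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  1024
maxproc      8191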

Again, as in the list above, the limits should be at least as high as your available memory.
The syntax for changing the limits in csh/tcsh is:

limit stacksize unlimited

Please note that on 32-bit machines, unlimited may be the same as 4 GB (4194303 kbytes).


If you are using a queuing system:

Note that if you are submitting jobs to a queue, the user limits might differ from what you get when you log in to the machines interactively! To check your limits, add ulimit or limit to the script that is sent to the queue, like:

....
ulimit -a > mylimits.out
jobex -ri -c 200 -statpt > jobex.out
...

Submit it to the queue and look into the file mylimits.out to see what the settings really are.
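
A full job script could then look roughly like this (a minimal sketch; the actual queue directives and jobex options depend on your batch system and calculation):

#!/bin/sh
# hypothetical job script - adapt the queue directives and options to your site
ulimit -s unlimited            # try to raise the stack size inside the job
ulimit -a > mylimits.out       # record the limits the job actually runs with
jobex -ri -c 200 -statpt > jobex.out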

Parallel version:

The parallel binaries are started by the mpirun command, which often uses ssh to start processes on remote nodes. In that case the stack size limit cannot be set by the user, so entries in $HOME/.profile, $HOME/.bashrc, etc. will not help to get rid of the problem.

To check the limits on a node, try (sh/bash/ksh syntax):

ssh <hostname> ulimit -a

If the ssh command reports a stack size lower than unlimited (or than a sufficiently large number), you have to change the file

/etc/security/limits.conf

on all nodes where the parallel binaries might run, and add the following line there (example for a 4 GB limit):

*                soft    stack           4194303
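
If the hard limit on the nodes is lower than that, the soft limit cannot be raised above it; in that case a matching hard entry may also be needed (an assumption, please check with your administrator):

*                hard    stack           4194303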

Rerun ssh <hostname> ulimit -a and you should now get a 4 GB stack size limit, as set in limits.conf.
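
To check several nodes in one go, you can loop over the host names (node01, node02, ... are placeholders for your actual node names):

# sh/bash/ksh syntax; prints the stack size limit reported by each node
for host in node01 node02 node03; do
    echo "$host: $(ssh $host ulimit -s)"
done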
 

« Last Edit: April 20, 2007, 09:22:45 AM by uwe »