A few weeks ago I attended a symposium on HPC and Open Source, and ever since I've been wanting to set up my own HPC cluster. So I did; here are the instructions for setting up an HPC cluster using CentOS 6.2.
I have set up a two node cluster, but these instructions can be used for any number of nodes. The servers I used each have a single 74 GB hard drive, a single NIC, 8 GB of RAM and two quad-core CPUs, so the cluster has 16 cores and 16 GB of RAM in total.
- Install CentOS using a minimal install to ensure that the smallest number of packages gets installed.
- Enable the NIC by editing its config file (/etc/sysconfig/network-scripts/ifcfg-eth0). I used the text install, which seems to leave the NIC disabled, but it's quicker to navigate from the iLO interface:
DEVICE="eth0"
ONBOOT="yes"
BOOTPROTO=dhcp
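After editing the file, the interface can be brought up without a reboot. A minimal sketch, assuming the standard CentOS 6 network service:
service network restart
Alternatively, ifup eth0 brings up just that one interface.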
- Disable and stop the firewall (I'm assuming no internet access for your cluster, of course):
chkconfig iptables off; service iptables stop
- Install ssh clients and man. This installs the ssh client and scp, among other things, as well as man, which is always handy to have:
yum -y install openssh-clients man
- Modify ssh client configuration to allow seamless addition of hosts to the cluster. Add this line to /etc/ssh/ssh_config (Note that this is a security risk if your cluster has access to the internet):
StrictHostKeyChecking no
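If you prefer not to open an editor, the line can be appended from the shell (a sketch; it assumes you are running as root):
echo "StrictHostKeyChecking no" >> /etc/ssh/ssh_config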
- Generate a passphrase-free key. This will make it easier to add hosts to the cluster (just press Enter repeatedly after running ssh-keygen):
ssh-keygen
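The prompts can also be skipped entirely. A sketch, assuming an RSA key in the default location:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa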
- Install compilers and libraries (Note that development packages were obtained from here and yum was run from the directory containing them):
yum -y install gcc gcc-c++ atlas blas lapack mpich2 make mpich2-devel atlas-devel
- Add the node's hostname to /etc/hosts.
- Create the file $(HOME)/hosts and add the node's hostname to it.
DEVICE="eth0"
ONBOOT="yes"
BOOTPROTO=dhcp
- Add each extra node to the hosts file (/etc/hosts) of all nodes [a DNS server could be set up instead] and to $(HOME)/hosts. An example layout follows.
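For illustration, with two nodes (the hostnames and addresses below are made up), /etc/hosts on every node might contain:
192.168.1.101 node01
192.168.1.102 node02
while $(HOME)/hosts simply lists one hostname per line:
node01
node02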
- Copy the key generated in step 5 to all nodes (if you don't have a head node, i.e. a node that does not do any calculations, remember to add the key to the node itself too):
ssh-copy-id hostname
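With the $(HOME)/hosts file in place, the key can be pushed to every node in one go (a sketch; it assumes one hostname per line in the file):
for h in $(cat ~/hosts); do ssh-copy-id "$h"; done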
I have not made any comments on networking because the servers I have been using only have a single NIC, as mentioned above. There are gains to be made by forcing as much intra-node communication as possible through the loopback interface, but this requires a unique /etc/hosts file for each node, and my original plan was to set up a 16 node cluster.
SELinux does not seem to have any negative effects, so I have left it on. I plan to test without it to see whether performance is improved.
At this point all that remains is to add some software that can run on the cluster and there is nothing better than HPL or Linpack, which is widely used to measure cluster efficiency (the ratio between theoretical and actual performance). Do the following steps on all nodes:
- Download HPL from netlib.org and extract it to your home directory (a consolidated command sketch for these steps appears after the makefile listing below).
- Copy the Make.Linux_PII_CBLAS file from $(HOME)/hpl-2.0/setup/ to $(HOME)/hpl-2.0/.
- Edit the Make.Linux_PII_CBLAS file as shown in the listing after these steps (the changes were originally highlighted in bold; note that the MPI section is commented out because mpicc is used as the compiler and linker).
- Run make arch=Linux_PII_CBLAS.
- You can now run Linpack (on a single node):
cd bin/Linux_PII_CBLAS
mpiexec.hydra -n 4 ./xhpl
Here are the relevant sections of Make.Linux_PII_CBLAS after editing:
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = $(HOME)/hpl-2.0
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
#MPdir        = /usr/lib64/mpich2
#MPinc        = -I$(MPdir)/include
#MPlib        = $(MPdir)/lib/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /usr/lib64/atlas
LAinc        =
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = /usr/bin/mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = /usr/bin/mpicc
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------
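As a recap, the HPL steps can be scripted roughly like this (a sketch: the download URL is an assumption based on where netlib.org hosts HPL 2.0, and the makefile still has to be edited by hand as shown above):
wget http://www.netlib.org/benchmark/hpl/hpl-2.0.tar.gz
tar -xzf hpl-2.0.tar.gz -C ~
cd ~/hpl-2.0
cp setup/Make.Linux_PII_CBLAS .
# edit Make.Linux_PII_CBLAS (TOPdir, LAdir, CC, LINKER; comment out the MPI section)
make arch=Linux_PII_CBLAS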
Repeat steps 1-5 on all nodes and then you can run Linpack on all nodes like this (from the directory $(HOME)/hpl-2.0/bin/Linux_PII_CBLAS/):
mpiexec.hydra -f $(HOME)/hosts -n x ./xhpl
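Here x is the total number of processes, typically one per core, so 16 for this two node cluster. Efficiency is then the measured result divided by the theoretical peak (cores × clock frequency × FLOPs per cycle per core). A back-of-the-envelope sketch, assuming hypothetical 2.5 GHz CPUs that retire 4 double-precision FLOPs per cycle:
# 16 cores x 2.5 GHz x 4 FLOPs/cycle = 160 GFLOPS theoretical peak (assumed figures)
echo "16 * 2.5 * 4" | bc
If xhpl then reported, say, 120 GFLOPS, the efficiency would be 120/160 = 75%.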
For results of running Linpack, see my next post here.
Comments:
On a CentOS 6.3 system, after completing steps 1-3 of these instructions, I get the following error on step 4:
"Make.inc: No such file or directory"
What directory are you running step 4 from?
$HOME/hpl-2.0
I also moved the hpl-2.0 directory to the system root "/" and tried to run "make arch=Linux_PII_CBLAS" from /hpl-2.0:
make[2]: Entering directory `/home/gvtlinux/hpl-2.0/src/auxil/Linux_PII_CBLAS'
Makefile:47: Make.inc: No such file or directory
make[2]: *** No rule to make target `Make.inc'. Stop.
I just ran through the instructions again on a single VM and could not reproduce your issue.
Did you check that everything in step 7 (I've modified it so it's no longer version specific) installed?
I had the same issue on one of my nodes. I removed the hpl-2.0 folder and the tar file, then followed the steps again.
There are some issues with libmpich.a; to solve this problem, replace:
#MPlib = $(MPdir)/lib/libmpich.a
with:
#MPlib = $(MPdir)/lib/libmpich.a $(MPdir)/lib/libmpl.a
Did you mean?
MPlib = $(MPdir)/lib/libmpich.a $(MPdir)/lib/libmpl.a
otherwise it's commented out ;)
Thanks a lot for sharing this; it helped me get HPL running easily, with little time and effort.