Description of the Lisa system

What is the Lisa system

The Lisa system is a cluster computer consisting of several hundred multi-core nodes running the Linux operating system. The system is installed and maintained by SURFsara.

Research Capacity Computing Services (RCCS)

The Lisa system is used for the SURFsara service Research Capacity Computing Services (RCCS). RCCS is a compute service for researchers who need to run many large computational tasks. It covers workloads that typically consist of a large number of independent, moderately parallel computing tasks. The tasks themselves can be run in parallel on the computing system. RCCS is intended for compute tasks that, because of their size, cannot in practice be run on departmental or university computing systems.

Participants

The following participants are involved in the Lisa system:

  • The University of Amsterdam (UvA)
  • The VU University Amsterdam (VU)
  • The SURF organization (SURF, which has taken over from NWO)

System configuration

The Lisa system is constantly evolving and growing to meet the needs of the participants. The current configuration is as follows:

Number  Type        Clock     Scratch  Memory  QPI        Cache  Cores  InfiniBand
32      L5640       2.26 GHz  220 GB   24 GB   5.86 GT/s  12 MB  12     -
64      L5640       2.26 GHz  220 GB   24 GB   5.86 GT/s  12 MB  12     Mellanox DDR
144     E5-2650L    1.80 GHz  750 GB   32 GB   8.00 GT/s  20 MB  16     -
32      E5-2650 v2  2.60 GHz  870 GB   32 GB   8.00 GT/s  20 MB  16     -
280     E5-2650 v2  2.60 GHz  870 GB   64 GB   8.00 GT/s  20 MB  16     -
32      E5-2650 v2  2.60 GHz  870 GB   64 GB   8.00 GT/s  20 MB  16     Mellanox FDR
1       E7-8857 v2  3.00 GHz  13 TB    1 TB    8.00 GT/s  30 MB  48     -

Summary

Total number of cores    9008
Total amount of memory   31 TB
Total peak performance   159 TFlop/sec
Disk space               100 TB for the home file systems
Operating system         Debian Linux (AMD64)
Network                  Mellanox InfiniBand
                         Bandwidth DDR: 20 Gbit/sec, FDR: 56 Gbit/sec
                         Latency   DDR: 2.6 µsec,    FDR: 1.3 µsec
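
The core total can be cross-checked against the configuration table by multiplying each row's node count by its core count and summing the results. The following is a minimal Python sketch of that arithmetic. It assumes the Cores column gives cores per node, and the names NODE_GROUPS, total_cores and total_nodes are illustrative only; they are not part of any SURFsara tooling.

```python
# Node groups taken from the configuration table:
# (number of nodes, cores per node). Treating "Cores" as a
# per-node figure is an assumption of this sketch.
NODE_GROUPS = [
    (32, 12),   # L5640, no InfiniBand
    (64, 12),   # L5640, Mellanox DDR
    (144, 16),  # E5-2650L
    (32, 16),   # E5-2650 v2, 32 GB memory
    (280, 16),  # E5-2650 v2, 64 GB memory
    (32, 16),   # E5-2650 v2, 64 GB memory, Mellanox FDR
    (1, 48),    # E7-8857 v2
]


def total_cores(groups):
    """Sum nodes * cores-per-node over all node groups."""
    return sum(nodes * cores for nodes, cores in groups)


def total_nodes(groups):
    """Sum the node counts over all node groups."""
    return sum(nodes for nodes, _ in groups)


print(total_cores(NODE_GROUPS))  # 9008
print(total_nodes(NODE_GROUPS))  # 585
```

Under that assumption the sum reproduces the 9008 cores listed in the summary, spread over 585 nodes.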

Managing Lisa

Most of the software used to manage the Lisa system is open source. We use the following software:

  • CFEngine 3, a configuration engine
  • Torque and Maui, as batch software. For Torque we have developed a pbs Python interface, so batch utilities can now be developed in Python instead of C.
  • torque_2_deb, software developed by SURFsara to build a Debian package from the Torque source
  • Ganglia. For Ganglia, SURFsara has developed a jobmonarch add-on to monitor OpenPBS/Torque queues. https://ganglia.surfsara.nl
  • SALI (Sara Automatic Linux Installer), a tool for installing Linux on multiple machines at once. It supports several protocols for downloading, by way of aria2, the data needed to install a machine; for example, BitTorrent and rsync are supported. SALI originates from SystemImager and still follows the same philosophy: it is a scalable method for performing unattended installations. SALI is mostly used in cluster setups.