Trying Out Univa Grid Engine on AWS EC2

Post date: Nov 09, 2013 11:40:1 AM

Last week I tried setting up Univa Grid Engine 8.1.4 on AWS EC2.

AWS EC2 is a great platform for trying out software that involves multiple servers and a network.  While it is possible to set up Grid Engine on virtual machines on one computer, most people's computers probably lack the I/O capacity and memory for it.

I set up five hosts on EC2: Master, Shadow Master, Submit, and two Execution hosts.

All hosts were in a VPC with a single public subnet of 10.0.0.0/24.  This allowed me to set up the hosts with fixed internal IP addresses in a contiguous range.  I provisioned four of the hosts in one go and got contiguous addresses from 10.0.0.38 to 10.0.0.41.  The last Execution host was provisioned later and got the address 10.0.0.88.
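As a quick sanity check of the addressing described above, here is a minimal Python sketch using the standard ipaddress module.  The subnet and the addresses come from this post; the helper names all_in_subnet and is_contiguous are my own, for illustration only.

```python
import ipaddress

# Subnet and addresses as described in the post.
subnet = ipaddress.ip_network("10.0.0.0/24")
first = ipaddress.ip_address("10.0.0.38")
batch = [first + i for i in range(4)]      # 10.0.0.38 .. 10.0.0.41
late = ipaddress.ip_address("10.0.0.88")   # the host provisioned later
hosts = batch + [late]


def all_in_subnet(addrs, net):
    """Return True if every address belongs to the given network."""
    return all(a in net for a in addrs)


def is_contiguous(addrs):
    """Return True if the addresses form one unbroken run."""
    nums = sorted(int(a) for a in addrs)
    return all(b - a == 1 for a, b in zip(nums, nums[1:]))


if __name__ == "__main__":
    print("all in subnet:", all_in_subnet(hosts, subnet))
    print("first four contiguous:", is_contiguous(batch))
    print("all five contiguous:", is_contiguous(hosts))
```

The same check could be run against whatever addresses a VPC hands out before writing them into host files.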

All were of instance type M1 Standard Large (m1.large): 2 vCPUs, 7.5 GiB memory.  There is supposed to be instance storage of 2 x 420 GB, but I could not see it.

The OS was SUSE Linux Enterprise Server 11 SP3, on a 30 GB EBS volume.  The default image had a 9.9 GB ext3 file system, and I had to run resize2fs to extend the file system to 30 GB.  No swap was set up.

Grid Engine was installed in "/sge" on Master, and that directory was exported over NFS4 to the entire cluster.  I used classic spooling.

I set up SSH access for Grid Engine using the access keys of both the Master and Shadow hosts.

The Submit host was set up with the user home directories exported over NFS4 to the entire cluster.  This requires identity mapping to be set up properly on both the NFS4 server and the Execution host clients.  See http://www.novell.com/support/kb/doc.php?id=7005060

I used local password files for users on the Submit and Execution hosts.

The last time I set up NFS4 on SUSE Linux Enterprise 11, I did not have to put a "[Translation]" section containing "Method=nsswitch" in "/etc/idmapd.conf", if my memory is correct.  This is now necessary.
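To confirm that the setting is present on a host, something like the following Python sketch could be used.  It relies only on the standard configparser module.  The "[Translation]" section and "Method=nsswitch" come from this post; the SAMPLE text (including the Domain value) and the function names are made up for illustration.

```python
import configparser

# Illustrative file content; the Domain value is a placeholder, not from the post.
SAMPLE = """\
[General]
Domain = example.internal

[Translation]
Method = nsswitch
"""


def translation_methods(conf_text):
    """Return the methods listed under [Translation] Method, or an empty list."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("Translation", "Method", fallback="")
    return [m.strip() for m in raw.split(",") if m.strip()]


def uses_nsswitch(conf_text):
    """Return True if nsswitch is among the configured translation methods."""
    return "nsswitch" in translation_methods(conf_text)


if __name__ == "__main__":
    print("nsswitch configured:", uses_nsswitch(SAMPLE))
```

On a real host one would read "/etc/idmapd.conf" and pass its text to uses_nsswitch, on both the NFS4 server and the clients.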

The Execution hosts were set up with local spooling.

I also set up ARCo with PostgreSQL.  Note that the PostgreSQL data and configuration files are located in "/var/lib/pgsql".

I tested the fail-over by killing the sge_qmaster process on the Master host.  After five minutes the Shadow host took over the Master role by starting up an sge_qmaster process of its own.  That was neat.

The entire exercise cost US$61.77, which was deducted from credits I had.  It was a great way to try out a Univa Grid Engine installation in a realistic networked environment at an affordable cost.

There are other installations, like Hadoop, that one would do well to test in a cloud environment like EC2, since they can be I/O intensive.  On virtual machines on one computer, one can do only functional testing.
