
UGRGrid

General Diagram

[Image: UGRGrid structure diagram]

Description

UGRGrid is a computing cluster composed of 281 servers: 16 of them are dedicated to management tasks (management, storage, administration, ...) and the rest to computing. There are two kinds of computing servers: some have 2 AMD Opteron dual-core processors and others have 8 AMD Opteron dual-core processors.

All nodes are interconnected through two network technologies: InfiniBand for the computing nodes, providing high speed, low latency and high bandwidth, and Ethernet for the management network.

The storage system is based on a SAN (Storage Area Network). UGRGrid has two types of SAN: one to store data on disk and one for backup on tape. The first is a Sun StorageTek 6540 with 24 TB of capacity. Its volumes are exported throughout the cluster via two network file systems, Lustre and NFS: the first for temporary data storage and the second for permanent storage. The data in this SAN are backed up to the tape library, which holds 40 TB of capacity.

In total, there are 1264 processing cores interconnected through a high-performance InfiniBand network (10 Gb/s and 3 microseconds of latency), with 3 TB of RAM and 24 TB of disk storage (of which 14 TB are usable).
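As a quick cross-check, these totals follow from the per-node figures given in the Computing Servers section below (248 X2200 nodes with 2 dual-core CPUs and 8 GB RAM each; 17 X4600 nodes with 8 dual-core CPUs and 64 GB RAM each):

```shell
# 248 X2200 nodes x 2 dual-core CPUs, plus 17 X4600 nodes x 8 dual-core CPUs
cores=$(( 248 * 2 * 2 + 17 * 8 * 2 ))
# 248 nodes x 8 GB RAM, plus 17 nodes x 64 GB RAM
ram_gb=$(( 248 * 8 + 17 * 64 ))
echo "cores=$cores ram_gb=$ram_gb"   # cores=1264 ram_gb=3072 (3 TB)
```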

Management Servers

  • Job scheduler. Distributes the workload among the computing nodes.
  • Remote management and monitoring.

Computing Servers

All servers have the AMD Opteron 275 processor, whose clock frequency is 2200 MHz. It is compatible with the x86 family and capable of 64-bit processing. The processor diagram is shown in the following image:

[Image: AMD Opteron 275 processor diagram]

As the diagram shows, the multiprocessor system has no Front Side Bus, unlike other x86 architectures. This design has the following advantages:

  • The RAM controller is integrated into the CPU chip, allowing a significant latency reduction.
  • Communication with the other processors (and with memory and I/O) is direct, point to point, so there is no bottleneck, unlike traditional FSB architectures. These connections use low-latency HyperTransport links (8 GB/s bandwidth).
  • The L1 and L2 caches are integrated into the CPU chip.

These processors are installed in two kinds of servers, Sun Fire X2200 M2 and Sun Fire X4600 M2, with the following features:

Sun Fire X2200 M2 (248 nodes)

  • 2 AMD Opteron dual-core processors.
  • 8 GB RAM DDR2-667
  • 2 × 2.5'' SATA disks (250 GB each)
  • 4 Gigabit Ethernet network interfaces
  • Size: 1U
  • Integrated management card with IPMI, SNMP and remote KVMS

Sun Fire X4600 M2 (17 nodes)

  • 8 AMD Opteron dual-core processors.
  • 64 GB RAM DDR2-667
  • 2 Serial Attached SCSI disks (73 GB each)
  • 4 Gigabit Ethernet network interfaces
  • Size: 4U
  • Integrated management card with IPMI, SNMP and remote KVMS

Storage

Users have two types of storage: one to keep permanent data and one for temporary data. The first is the permanent data directory (home directory) and the second the temporary work directory (scratch directory), from which users launch their jobs. Each research group and each user has a personal space in both of them to save their data.

The home directory is where users must save their data once the application execution has finished. Note that the scratch directory is limited and shared with other users, so when a job completes, users must transfer the important output files to their home directory or personal computer. Otherwise, they would hurt system performance and the other users. Also, files that have not been accessed in the past 20 days are automatically deleted.

The home directory is limited to 50 GB and 120000 inodes (regular files, directories, links, etc.) per user. When users come close to these limits, they receive an email informing them of the situation.
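A rough way to compare your own usage against these limits from a shell on the cluster is sketched below; `du` and `find` are standard tools, while the function name `home_usage` is only illustrative, not something provided by the system:

```shell
# Report disk usage and an approximate inode count (files, directories,
# links) for a directory, for comparison with the 50 GB / 120000-inode quota.
# Note: the count includes the directory itself.
home_usage() {
    dir=${1:-$HOME}
    du -sh "$dir"
    printf 'inodes: %s\n' "$(find "$dir" | wc -l)"
}
```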

For example, for the user anonymous, member of the research group whose code is ANM, the directories would be:

Home Directory        Scratch Directory
/home/ANM/anonymous   /SCRATCH/ANM/anonymous

To run a job, users access the ugrgrid.ugr.es server via sftp and transfer the files needed to run their application to their scratch directory (/SCRATCH/ANM/anonymous). Once the job has finished, users move any files they want to keep to their home directory (/home/ANM/anonymous).
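The post-job step above can be sketched as a small helper run on the cluster. The function name `save_results` and the `.out` file pattern are illustrative assumptions, not part of the system:

```shell
# Copy result files from a scratch directory to a home directory, as
# described above. Arguments: source (scratch) dir, destination (home) dir.
save_results() {
    src=$1   # e.g. /SCRATCH/ANM/anonymous
    dst=$2   # e.g. /home/ANM/anonymous
    mkdir -p "$dst"
    cp "$src"/*.out "$dst"/
}
# Usage on the cluster (example user from the text):
#   save_results /SCRATCH/ANM/anonymous /home/ANM/anonymous
```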

A separate page lists all applications supported on UGRGrid.

Each node has additional temporary storage as local scratch. This space is accessible as /LOCALSCRATCH and its capacity is 150 GB on X2200 nodes and 100 GB on X4600 nodes.

Networks

Data network

Based on Gigabit Ethernet, this network is used to transfer application data, to access the home directories and to connect all cluster services.

Computing network

Based on InfiniBand, this network is used for communication in parallel applications that use distributed memory (for example, MPI applications), as well as to access the application scratch directories, which reside on shared storage. Its main advantages over Gigabit Ethernet are higher bandwidth (up to 10 Gb/s) and much lower latency (about 3 microseconds), which is as important as, or even more important than, bandwidth for the performance of scientific applications.
