The main purposes of the European Databases of Seismogenic Faults (EDSF) installation are to publish datasets through the open standards developed by the Open Geospatial Consortium (OGC) and to host a large part of the data behind those datasets. This document illustrates the main characteristics of the IT infrastructure, called SEISMOFAULTS.EU, that serves these purposes.
The front end of the infrastructure is the EDSF portal, although a few other portals and websites are hosted on SEISMOFAULTS.EU to provide a user interface to the activities of related sibling projects.
Implementation
Hardware
SEISMOFAULTS.EU is hosted on three dedicated physical servers belonging to the INGV IT infrastructure:
- One Hewlett Packard Enterprise (HPE) ProLiant BL460c (Gen9) equipped with two Intel® Xeon® E5-2640 v4 @ 2.40 GHz processors (10 cores and 20 threads each), 128 GB of RAM per CPU at 2133 MHz, and two 300 GB HPE HDDs in RAID 1.
This server is entirely dedicated to development and testing.
- Two HP ProLiant DL560 (Gen10) servers, each equipped with four Intel® Xeon® Gold 5118 @ 2.30 GHz processors (12 cores and 24 threads each), 64 GB of RAM per CPU at 2400 MHz, four 300 GB HDDs in RAID 5, and four 1920 GB SSDs in RAID 5.
In addition, 50 TB of storage hosted on the Storage Area Network (SAN) of the Centro Servizi Informativi (CSI) at the INGV premises in Rome is dedicated to our installation.
The ProLiant BL460c and one of the DL560s are hosted in the CSI data center; the other DL560 is hosted at the INGV premises in Bologna.
Software
Following the recommendations of the Agenzia per l’Italia Digitale (AGID) contained in the “Linee guida acquisizione e riuso software PA” (guidelines on software acquisition and reuse for public administrations), SEISMOFAULTS.EU relies almost exclusively on open-source software.
On each physical server, we installed the XenServer hypervisor to create and manage virtual machines (VMs). Debian GNU/Linux is installed on all VMs except one, which runs MS Windows Server 2016 for applications devoted to activities not directly related to data distribution (figure below).
The web portals and websites are managed through the Joomla! or WordPress content management systems (CMS).
OGC web services are published through GeoServer, while backend data management relies on PostGIS.
All services and websites are containerized using Docker.
Docker containers are a virtualization technology for running isolated applications or services inside an autonomous environment. This environment includes everything the application or service needs to run: dependencies, libraries, and configuration files.
Despite some similarities, containers differ from VMs in several ways:
- They share the same kernel and operating system, making them more efficient and less resource-intensive than VMs;
- They can be created, deleted, and run faster, making them ideal for microservices implementation and applications that need few resources and great scalability;
- They are portable: once created and customized, they can be deployed and run on almost every platform that can run the Docker runtime engine;
- They can be easily deployed in the cloud or mixed environments;
- They offer higher security: a (carefully configured) container is isolated from the host OS and from other containers, so a malfunction or compromise does not affect them.
The typical file system organization of a “dockerized” service inside SEISMOFAULTS.EU can be summarized as follows:
- A main directory “to hold them all”, with a name reminiscent of the service. It may also be a Git repository (see below);
- A docker-compose.yml file containing details about the services, networks, and volumes for setting up the application’s environment. It is used to create, deploy, and manage service-related containers;
- A directory containing the volumes. Volumes are a way to store and manage data generated and used by Docker containers; they exist outside the containers and persist data even if a container is destroyed or moved. They are typically used to store and share data between multiple containers or with the host OS (see figure below).
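As an illustration of this layout, the following Python sketch scaffolds a hypothetical service directory with its docker-compose.yml and volumes/ subdirectory. The service name, images, paths, and credentials are examples only, not the actual SEISMOFAULTS.EU configuration.

```python
# Sketch of the per-service layout described above: a top-level directory,
# a docker-compose.yml, and a volumes/ directory whose contents outlive the
# containers. All names here are hypothetical.
from pathlib import Path

SERVICE_DIR = Path("edsf-geoserver")  # hypothetical service directory

COMPOSE_FILE = """\
services:
  geoserver:
    image: kartoza/geoserver:2.23.0          # example image, not necessarily the one in use
    ports:
      - "8080:8080"
    volumes:
      - ./volumes/geoserver_data:/opt/geoserver/data_dir
  db:
    image: postgis/postgis:15-3.4            # example PostGIS image
    environment:
      POSTGRES_PASSWORD: change-me           # placeholder credential
    volumes:
      - ./volumes/pg_data:/var/lib/postgresql/data
"""

def scaffold() -> None:
    """Create the service skeleton: main directory, compose file, volume directories."""
    for sub in ("volumes/geoserver_data", "volumes/pg_data"):
        (SERVICE_DIR / sub).mkdir(parents=True, exist_ok=True)
    (SERVICE_DIR / "docker-compose.yml").write_text(COMPOSE_FILE)
    # The stack would then be started from inside SERVICE_DIR with:  docker compose up -d

if __name__ == "__main__":
    scaffold()
```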
For a full understanding of how Docker works, we recommend the official documentation.
Several websites and web services run on the same VM. Every website or web service includes its own web server (Nginx or Apache), web application server (e.g., Apache Tomcat), and related RDBMS.
A (dockerized) reverse proxy routes incoming and outgoing network traffic to and from the container hosting the service requested by the user.
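For readers unfamiliar with the mechanism, the following is a purely conceptual Python sketch of the host-based routing a reverse proxy performs: the requested Host header selects the backend container. The production installation uses a dedicated dockerized reverse proxy, and the host names and ports below are hypothetical.

```python
# Conceptual sketch only: route each request to a backend chosen from its Host header.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical mapping: requested host name -> internal container address.
BACKENDS = {
    "edsf.example.org": "http://127.0.0.1:8081",
    "geoserver.example.org": "http://127.0.0.1:8082",
}

class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        host = (self.headers.get("Host") or "").split(":")[0]
        backend = BACKENDS.get(host)
        if backend is None:
            self.send_error(502, "Unknown site")
            return
        try:
            # Forward the request to the matching container and relay the response.
            with urlopen(backend + self.path) as upstream:
                body = upstream.read()
                status = upstream.status
        except OSError:
            self.send_error(502, "Backend unreachable")
            return
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Proxy).serve_forever()
```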
Git
Service-related directories are managed through Git.
Git is a distributed version control system designed to handle software development projects: each user keeps a local copy of the repository, which makes collaboration with other users easier.
Every Docker project directory is also a Git repository. In this way, Git tracks every configuration change and keeps the sysadmin workstation and the server running the services synchronized in a straightforward, often automatic way.
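A minimal sketch of such an automatic commit-and-sync step is shown below; the directory paths, remote, and branch names are assumptions, and the actual workflow on SEISMOFAULTS.EU may differ (e.g., pulling instead of pushing, or running manually).

```python
# Sketch of an automatic "commit any config change and sync" step, suitable for a cron job.
import subprocess
from datetime import datetime
from pathlib import Path

SERVICE_DIRS = [Path("/srv/edsf-geoserver"), Path("/srv/edsf-portal")]  # hypothetical

def sync_repo(repo: Path) -> None:
    """Commit local configuration changes (if any) and push them to the shared remote."""
    def git(*args: str) -> subprocess.CompletedProcess:
        return subprocess.run(["git", "-C", str(repo), *args],
                              capture_output=True, text=True)

    if git("status", "--porcelain").stdout.strip():          # something changed
        git("add", "-A")
        git("commit", "-m", f"config sync {datetime.now():%Y-%m-%d %H:%M}")
    git("push", "origin", "main")                            # remote/branch are assumptions

if __name__ == "__main__":
    for d in SERVICE_DIRS:
        sync_repo(d)
```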
Security
All VMs run a deliberately minimal set of active services to limit unnecessary port exposure to the internet as far as possible.
Some VMs are available only from the institutional intranet and are accessible only through a Virtual Private Network (VPN). The VMs that publish web services are protected by a small set of simple firewall rules.
Software updates
OS and application software updates are performed by scripts that are launched periodically and automatically using standard Unix tools. The system administrator manually reviews the updates on a weekly schedule.
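A sketch of an unattended update-check step of this kind, assuming a Debian/apt system, might look like the following; the report path is hypothetical and the real scripts may apply the updates differently.

```python
# Sketch of a scheduled update check (intended to be run as root, e.g. from cron):
# refresh the package index and write the list of upgradable packages to a report
# that the administrator reviews during the weekly manual revision.
import subprocess
from datetime import date
from pathlib import Path

REPORT_DIR = Path("/var/log/update-reports")  # hypothetical location

def list_upgradable() -> str:
    subprocess.run(["apt-get", "update", "-qq"], check=True)
    result = subprocess.run(["apt", "list", "--upgradable"],
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    REPORT_DIR.mkdir(parents=True, exist_ok=True)
    report = REPORT_DIR / f"upgradable-{date.today():%Y%m%d}.txt"
    report.write_text(list_upgradable())
```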
Docker image updates are manually curated by the system administrator and generally performed when image distributors release security patches.
Backup
System backups are performed at several independent levels:
- VM snapshots are periodically made through the Hypervisor user interface;
- Several scripts are executed nightly to back up the main directories of the VM operating systems.
Backups are moved to disk partitions not normally used by the OS and periodically copied to a Network Attached Storage (NAS) device.
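The following Python sketch illustrates a nightly backup job of the kind described above; the source directories, the spare partition mount point, and the NAS path are all hypothetical.

```python
# Sketch of a nightly backup: archive the main directories to a partition not used
# by the OS, then copy the archive to the NAS when its mount point is available.
import shutil
import tarfile
from datetime import date
from pathlib import Path

SOURCES = [Path("/etc"), Path("/srv"), Path("/var/www")]     # example directories
LOCAL_TARGET = Path("/backup")                               # spare partition (hypothetical)
NAS_TARGET = Path("/mnt/nas/seismofaults-backups")           # NAS mount (hypothetical)

def nightly_backup() -> Path:
    LOCAL_TARGET.mkdir(parents=True, exist_ok=True)
    archive = LOCAL_TARGET / f"backup-{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in SOURCES:
            if src.exists():
                tar.add(src, arcname=src.name)
    if NAS_TARGET.exists():                                   # periodic copy to the NAS
        shutil.copy2(archive, NAS_TARGET / archive.name)
    return archive

if __name__ == "__main__":
    print(f"Backup written to {nightly_backup()}")
```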
Monitoring
Monitoring is the process of collecting, analyzing, and displaying data used to measure the performance of an IT system. The goal is to ensure the system’s availability, performance, and security. We monitor the SEISMOFAULTS.EU installation to measure the performance of applications, databases, networks, and servers. Metrics such as latency, throughput, availability, and utilization provide a clear picture of the system's health and help identify areas of improvement so that corrective action can be taken when needed. Monitoring also provides insight into the security of the system by detecting unauthorized access attempts, malware, and other malicious activity.
The SEISMOFAULTS.EU infrastructure is monitored at multiple levels. The first level is performed by the VM hypervisor, which notifies the system administrator if a VM shows anomalous behavior, such as excessive disk or CPU activity over an extended time, memory saturation, or abnormal network traffic (figure below).
A second level is provided by Nagios, software specifically designed for this task, configured on a dedicated VM.
Nagios monitors the VMs both at the OS level (processes, CPU load, RAM usage, disk partition usage, logged-in users, etc.) and at the process level (services and websites). It is configured to notify the system administrator by email or Telegram in case of malfunction (figures below).
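As an example of the notification side, the sketch below shows a small helper that a Nagios notification command could call to alert the administrator on Telegram via the Bot API; the bot token and chat id are placeholders, and the actual notification setup on SEISMOFAULTS.EU may differ.

```python
# Sketch of a Telegram notification helper; Nagios can invoke it as an external
# command, passing the host/service state as arguments. Token and chat id are placeholders.
import json
import sys
from urllib.request import Request, urlopen

BOT_TOKEN = "123456:ABC-placeholder"   # hypothetical Telegram bot token
CHAT_ID = "-1000000000000"             # hypothetical chat id

def notify(text: str) -> None:
    """Send a message through the Telegram Bot API (sendMessage method)."""
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    payload = json.dumps({"chat_id": CHAT_ID, "text": text}).encode()
    req = Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        resp.read()

if __name__ == "__main__":
    notify(" ".join(sys.argv[1:]) or "test notification")
```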
In addition to the measures described in the Security section, the security administration policy for the systems therefore includes software updates, backups, and monitoring.
Access Statistics
SEISMOFAULTS.EU uses two approaches to monitor access and report usage statistics for websites and web services because of the different technologies underlying their operation.
Website access statistics are produced through the Google Analytics platform, whereas web service access statistics are produced with the software AWStats. Google Analytics works through a JavaScript snippet that “intercepts” the connections to every webpage of the website and stores the data in the Google cloud for a wide range of analyses (figure below). AWStats, instead, is a Perl script installed on the same VMs that publish the services. It analyzes the logs generated by the web application server that runs GeoServer and produces as output a webpage with a set of statistics (figure below).
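As a purely conceptual illustration of what a log analyzer such as AWStats does, the sketch below counts requests per day from a web server access log in the Combined Log Format; the log path is hypothetical, and the real statistics are produced by AWStats itself.

```python
# Conceptual sketch: count requests per day from an access log (Combined Log Format).
import re
from collections import Counter
from pathlib import Path

LOG_FILE = Path("/var/log/nginx/access.log")    # hypothetical path
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4})")  # matches e.g. [21/Mar/2023:10:15:32 ...

def requests_per_day(log: Path) -> Counter:
    counts: Counter = Counter()
    for line in log.read_text(errors="replace").splitlines():
        match = DATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    for day, hits in sorted(requests_per_day(LOG_FILE).items()):
        print(day, hits)
```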
Acknowledgements
We thank all the INGV staff of Centro Servizi Informativi and particularly Giovanni Scarpato, Stefano Cacciaguerra, Pietro Ficeli, Manuela Sbarra, Gianpaolo Sensale, Diego Sorrentino, Stefano Vazzoler, Francesco Zanolin for their continuous IT support; the EPOS Integrated Core Services staff of the EPOS office in Rome: Daniele Bailo, Kety Giuliacci, Rossana Paciello, and Valerio Vinciarelli for their effort in EPOS/SEISMOFAULTS.EU cooperation; and our colleagues Valentino Lauciani for suggestions on the Docker configuration, Matteo Quintiliani for managing the INGV GitLab installation, and Giorgio Maria De Santis, Mario Locati, and Gabriele Tarabusi for the fruitful exchange of views.
The development of the SEISMOFAULTS.EU infrastructure benefitted from the funding of the H2020 projects EPOS IP (grant agreement No. 676564) and SERA (grant agreement No. 730900), the JRU EPOS ITALIA Piano di Attività 2021-2024 supported by the Italian Ministry of University and Research (MUR), and the DPC-INGV Agreement 2012-2021 (Annex A) and 2022-2024 (Annex A).
If you want more detailed information, please read the full report.