IBM Spectrum Scale

Developer(s)	IBM
Operating system	AIX / Linux / Windows Server
Type	File system
License	Proprietary
Website	IBM Spectrum Scale

IBM Spectrum Scale
Limits
Developer(s)	IBM
Full name	IBM Spectrum Scale
Introduced	1998 with AIX
Max. volume size	8 YB
Max. file size	8 EB
Max. number of files	2⁶⁴ per file system
Features
File system permissions	POSIX
Transparent encryption	yes
Other
Supported operating systems	AIX, Linux, Windows Server

IBM Spectrum Scale is a high-performance clustered file system developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the Top 500 List.^[1] For example, under its previous name of GPFS it was the filesystem of the ASC Purple Supercomputer,^[2] which was composed of more than 12,000 processors and had 2 petabytes of total disk storage spanning more than 11,000 disks.

Before 2015, Spectrum Scale was known as IBM General Parallel File System (GPFS).^[3] GPFS 4.1 was the last release under the old name, and Spectrum Scale 4.1.1 the first release under the current name. The most recent release is Spectrum Scale 5.0.

In common with typical cluster filesystems, Spectrum Scale provides concurrent high-speed file access to applications executing on multiple nodes of clusters. It can be used with AIX 5L clusters, Linux clusters, on Microsoft Windows Server, or a heterogeneous cluster of AIX, Linux and Windows nodes. In addition to providing filesystem storage capabilities, Spectrum Scale provides tools for management and administration of the Spectrum Scale cluster and allows for shared access to file systems from remote Spectrum Scale clusters.

Spectrum Scale has been available on IBM's AIX since 1998, on Linux since 2001, and on Windows Server since 2008, and it is offered as part of the IBM System Cluster 1350.

History

Spectrum Scale, then known as GPFS, began as the Tiger Shark file system, a research project at IBM's Almaden Research Center as early as 1993. Tiger Shark was initially designed to support high throughput multimedia applications. This design turned out to be well suited to scientific computing.^[4]

Another ancestor of Spectrum Scale is IBM's Vesta filesystem, developed as a research project at IBM's Thomas J. Watson Research Center between 1992 and 1995.^[5] Vesta introduced the concept of file partitioning to accommodate the needs of parallel applications that run on high-performance multicomputers with parallel I/O subsystems. With partitioning, a file is not a single sequence of bytes, but rather multiple disjoint sequences that may be accessed in parallel. The partitioning abstracts away the number and type of I/O nodes hosting the filesystem, and it allows a variety of logical partitioned views of files, regardless of the physical distribution of data within the I/O nodes. The disjoint sequences are arranged to correspond to individual processes of a parallel application, allowing for improved scalability.^[6]
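
The idea of disjoint, per-process views of one file can be illustrated with a very small sketch. This is a simplified model only, not Vesta's actual interface: the function name partition_view and the interleaved (strided) layout are assumptions chosen for illustration.

```python
from typing import List, Sequence


def partition_view(records: Sequence, num_views: int, view: int) -> List:
    """Return the records belonging to one logical view of a file.

    Hypothetical layout: records are interleaved, so view v sees records
    v, v + num_views, v + 2 * num_views, and so on. The views are disjoint
    and together cover the whole file.
    """
    if not 0 <= view < num_views:
        raise ValueError("view must be in the range [0, num_views)")
    return list(records[view::num_views])


if __name__ == "__main__":
    file_records = list(range(10))
    for process in range(3):
        # Each process of the parallel application reads only its own view.
        print(process, partition_view(file_records, 3, process))
```

Because no record appears in two views, each process can read or write its own view without coordinating with the others.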

Vesta was commercialized as the PIOFS filesystem around 1994,^[7] and was succeeded by GPFS around 1998.^[8]^[9] The main difference between the older and newer filesystems was that GPFS replaced the specialized interface offered by Vesta/PIOFS with the standard Unix API: all the features to support high performance parallel I/O were hidden from users and implemented under the hood.^[4]^[9] Today, Spectrum Scale is used by many of the supercomputers listed on the Top 500 Supercomputing Sites web site. Since its inception, Spectrum Scale has been deployed for many commercial applications, including digital media, grid analytics, and scalable file services.

In 2010 IBM previewed a version of GPFS that included a capability known as GPFS-SNC, where SNC stands for Shared Nothing Cluster. This was officially released with GPFS 3.5 in December 2012, and is now known as FPO (File Placement Optimizer).^[10] This allows Spectrum Scale to use locally attached disks on a cluster of network-connected servers rather than requiring dedicated servers with shared disks (e.g. using a SAN). FPO is suitable for workloads with high data locality, such as shared-nothing database clusters like SAP HANA and DB2 DPF, and can be used as an HDFS-compatible filesystem.

Architecture

Spectrum Scale provides high performance by allowing data to be accessed over multiple computers at once. Most existing file systems are designed for a single server environment, and adding more file servers does not improve performance. Spectrum Scale provides higher input/output performance by striping blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel. Other features provided by Spectrum Scale include high availability, support for heterogeneous clusters, disaster recovery, security, DMAPI, HSM and ILM.

According to Schmuck and Haskin,^[1] a file that is written to the filesystem is broken up into blocks of a configured size, less than 1 megabyte each. These blocks are distributed across multiple filesystem nodes, so that a single file is fully distributed across the disk array. This results in high reading and writing speeds for a single file, as the combined bandwidth of the many physical drives is high. Striping also makes the filesystem vulnerable to disk failures: any one disk failing would be enough to lose data. To prevent data loss, the filesystem nodes have RAID controllers, so that multiple copies of each block are written to the physical disks on the individual nodes. It is also possible to opt out of RAID-replicated blocks, and instead store two copies of each block on different filesystem nodes.
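
The striping and two-copy replication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Spectrum Scale's actual allocation algorithm: the 256 KiB block size, the round-robin placement, and all function names are hypothetical.

```python
from typing import Dict, List

# Assumed block size for illustration; the text only says blocks are under 1 megabyte.
BLOCK_SIZE = 256 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a file's contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, num_nodes: int, copies: int = 2) -> Dict[int, List[int]]:
    """Assumed round-robin striping: block b starts on node b % num_nodes, and
    each extra copy goes to the next node, so copies never share a node."""
    if not 1 <= copies <= num_nodes:
        raise ValueError("copies must be between 1 and num_nodes")
    return {
        block: [(block + k) % num_nodes for k in range(copies)]
        for block in range(num_blocks)
    }


def survives_failure(placement: Dict[int, List[int]], failed_node: int) -> bool:
    """True if every block still has at least one copy off the failed node."""
    return all(any(n != failed_node for n in nodes) for nodes in placement.values())


if __name__ == "__main__":
    layout = place_blocks(num_blocks=8, num_nodes=4, copies=2)
    print(layout)
    print(survives_failure(layout, failed_node=0))
```

With a single copy per block, losing any node that holds a block loses data; with two copies on different nodes, every block remains readable after one node fails.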

Other features of the filesystem include:

Distributed metadata, including the directory tree. There is no single "directory controller" or "index server" in charge of the filesystem.
Efficient indexing of directory entries for very large directories. Many filesystems are limited to a small number of files in a single directory (often, 65536 or a similar small binary number). Spectrum Scale does not have such limits.
Distributed locking. This allows for full POSIX filesystem semantics, including locking for exclusive file access.
Partition awareness. A failure of the network may partition the filesystem into two or more groups of nodes that can only see the nodes in their group. This can be detected through a heartbeat protocol, and when a partition occurs, the filesystem remains live for the largest partition formed. This offers a graceful degradation of the filesystem: some machines will remain working.
Online maintenance. Most of the filesystem maintenance chores (adding new disks, rebalancing data across disks) can be performed while the filesystem is live. This keeps the filesystem available more often, which in turn keeps the supercomputer cluster itself available for longer.
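
The partition-handling rule in the list above (keep the largest group of mutually reachable nodes alive) can be sketched as a connected-components search. This is an illustrative model only: the links map stands in for the result of heartbeat checks, the function names are hypothetical, and the tie-breaking choice is an assumption that the text does not specify.

```python
from collections import deque
from typing import Dict, List, Set


def partitions(links: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group nodes into sets that can still reach each other.

    links maps each node to the peers whose heartbeats it still sees
    (assumed symmetric for this sketch).
    """
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(links):
        if start in seen:
            continue
        group = {start}
        queue = deque([start])
        while queue:  # breadth-first search over live links
            node = queue.popleft()
            for peer in links.get(node, set()):
                if peer not in group:
                    group.add(peer)
                    queue.append(peer)
        seen |= group
        groups.append(group)
    return groups


def surviving_partition(links: Dict[str, Set[str]]) -> Set[str]:
    """Pick the largest group; ties go to the group found first (an assumption)."""
    return max(partitions(links), key=len)


if __name__ == "__main__":
    # A network failure has cut nodes d and e off from a, b and c.
    links = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
    print(sorted(surviving_partition(links)))
```

A real cluster also needs an explicit rule for equally sized groups, which this sketch does not model beyond picking the first one found.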

Spectrum Scale can be compared with Hadoop's HDFS filesystem, which is designed to store similar or greater quantities of data on commodity hardware, that is, in datacenters without RAID disks and a Storage Area Network (SAN):

HDFS also breaks files up into blocks, and stores them on different filesystem nodes.
HDFS does not expect reliable disks, so it instead stores copies of the blocks on different nodes. The failure of a node containing a single copy of a block is a minor issue, dealt with by creating another copy from the remaining valid replicas to bring the replication count back up to the desired number. In contrast, while Spectrum Scale supports recovery from a lost node, it is a more serious event, one that may include a higher risk of data being (temporarily) lost.
Spectrum Scale supports full POSIX filesystem semantics. HDFS and GFS are not fully POSIX-compliant.
Spectrum Scale distributes its directory indices and other metadata across the filesystem. Hadoop, in contrast, keeps this on the Primary and Secondary Namenodes, large servers which must store all index information in RAM.
Spectrum Scale breaks files up into small blocks. Hadoop HDFS prefers blocks of 64 MB or more, as this reduces the storage requirements of the Namenode. Small blocks or many small files fill up a filesystem's indices fast, which limits the filesystem's size.
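
The effect of block size on the amount of index information in the last point above can be shown with simple arithmetic. The 1 TiB file size and the 256 KiB small-block size are assumptions for illustration; only the 64 MB figure comes from the comparison above.

```python
def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks needed to hold a file (integer ceiling division)."""
    return -(-file_size_bytes // block_size_bytes)


KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB

file_size = 1 * TIB                                # assumed example file
large_blocks = block_count(file_size, 64 * MIB)    # HDFS-style block size
small_blocks = block_count(file_size, 256 * KIB)   # assumed small block size

if __name__ == "__main__":
    print(large_blocks)                  # 16384
    print(small_blocks)                  # 4194304
    print(small_blocks // large_blocks)  # 256
```

Under these assumptions the same file needs 256 times as many block entries with small blocks, which is why a design that keeps all block metadata in one server's memory favors large blocks, while a design that distributes metadata can afford small ones.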

Information lifecycle management

Storage pools allow for the grouping of disks within a file system. Tiers of storage can be created by grouping disks based on performance, locality or reliability characteristics. For example, one pool could be high performance Fibre Channel disks and another more economical SATA storage.

A fileset is a sub-tree of the file system namespace and provides a way to partition the namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and be specified in a policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is based on a set of rules in a user defined policy.

There are two types of user-defined policies in Spectrum Scale: file placement and file management. File placement policies direct file data to the appropriate storage pool as files are created. File placement rules are determined by attributes such as the file name, the user name or the fileset. File management policies allow a file's data to be moved or replicated, or files to be deleted. File management policies can be used to move data from one pool to another without changing the file's location in the directory structure. File management rules are determined by file attributes such as last access time, path name or size of the file.
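
The difference between placement and management policies can be modeled in a few lines. This sketch is not the product's policy language or API; the class names, the first-match rule evaluation, and the pool names "fast", "economy" and the default "system" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FileInfo:
    name: str
    owner: str
    fileset: str
    size_bytes: int
    days_since_access: int
    pool: Optional[str] = None  # storage pool currently holding the data


@dataclass
class Rule:
    description: str
    matches: Callable[[FileInfo], bool]
    target_pool: str


def first_match(rules: List[Rule], info: FileInfo, default: Optional[str]) -> Optional[str]:
    """Return the pool of the first rule that matches, else the default."""
    for rule in rules:
        if rule.matches(info):
            return rule.target_pool
    return default


def place(info: FileInfo, placement_rules: List[Rule], default_pool: str = "system") -> FileInfo:
    """Placement policy: choose the initial pool when the file is created."""
    info.pool = first_match(placement_rules, info, default_pool)
    return info


def migrate(files: List[FileInfo], management_rules: List[Rule]) -> List[str]:
    """Management policy: move data between pools; names and paths are untouched."""
    moved = []
    for info in files:
        target = first_match(management_rules, info, info.pool)
        if target != info.pool:
            info.pool = target
            moved.append(info.name)
    return moved


if __name__ == "__main__":
    placement = [Rule("scratch fileset on fast disks", lambda f: f.fileset == "scratch", "fast")]
    management = [Rule("cold data to economy disks", lambda f: f.days_since_access > 30, "economy")]
    a = place(FileInfo("a.dat", "alice", "scratch", 10, 0), placement)
    b = place(FileInfo("b.dat", "bob", "home", 10, 40), placement)
    print(a.pool, b.pool)
    print(migrate([a, b], management), b.pool)
```

Placement rules run once at creation time using attributes known then (name, owner, fileset), whereas management rules are re-evaluated later against attributes that change over time, such as access age.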

The Spectrum Scale policy processing engine is scalable and can be run on many nodes at once. This allows management policies to be applied to a single file system with billions of files and complete in a few hours.

References

1. Schmuck, Frank; Roger Haskin (January 2002). "GPFS: A Shared-Disk File System for Large Computing Clusters" (pdf). Proceedings of the FAST'02 Conference on File and Storage Technologies. Monterey, California, US: USENIX. pp. 231–244. ISBN 1-880446-03-0. Retrieved 2008-01-18.
2. "Storage Systems - Projects - GPFS". IBM. Retrieved 2008-06-18.
3. "IBM Redefines Storage Economics with New Software".
4. May, John M. (2000). Parallel I/O for High Performance Computing. Morgan Kaufmann. p. 92. ISBN 1-55860-664-5. Retrieved 2008-06-18.
5. Corbett, Peter F.; Feitelson, Dror G.; Prost, J.-P.; Baylor, S. J. (1993). "Parallel access to files in the Vesta file system". Supercomputing. Portland, Oregon, United States: ACM/IEEE. pp. 472–481. doi:10.1145/169627.169786.
6. Corbett, Peter F.; Feitelson, Dror G. (August 1996). "The Vesta parallel file system" (pdf). Transactions on Computer Systems. ACM. 14 (3): 225–264. doi:10.1145/233557.233558. Retrieved 2008-06-18.
7. Corbett, P. F.; D. G. Feitelson; J.-P. Prost; G. S. Almasi; S. J. Baylor; A. S. Bolmarcich; Y. Hsu; J. Satran; M. Snir; R. Colao; B. D. Herr; J. Kavaky; T. R. Morgan; A. Zlotek (1995). "Parallel file systems for the IBM SP computers" (pdf). IBM Systems Journal. 34 (2): 222–248. doi:10.1147/sj.342.0222. Retrieved 2008-06-18.
8. Barris, Marcelo; Terry Jones; Scott Kinnane; Mathis Landzettel; Safran Al-Safran; Jerry Stevens; Christopher Stone; Chris Thomas; Ulf Troppens (September 1999). Sizing and Tuning GPFS (pdf). IBM Redbooks, International Technical Support Organization. See page 1 ("GPFS is the successor to the PIOFS file system").
9. Snir, Marc (June 2001). "Scalable parallel systems: Contributions 1990-2000" (pdf). HPC seminar, Computer Architecture Department, Universitat Politècnica de Catalunya. Retrieved 2008-06-18.
10. "IBM GPFS FPO (DCS03038-USEN-00)" (pdf). IBM Corporation. 2013. Retrieved 2012-08-12.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.