Network File System Version 4 T. Haynes Internet-Draft P. Evenou Intended status: Standards Track Hammerspace Expires: 18 April 2025 15 October 2024 The Mojette Transformation for the Erasure Encoding of Files in NFSv4.2 draft-haynes-nfsv4-mojette-encoding-00 Abstract Parallel NFS (pNFS) allows a separation between the metadata (onto a metadata server) and data (onto a storage device) for a file. The flex file layout type version 2 further allows for erasure encoding types to provide data integrity. In this document, a new erasure encoding type for the Mojette Transformation is introduced. Note This note is to be removed before publishing as an RFC. Discussion of this draft takes place on the NFSv4 working group mailing list (nfsv4@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group information can be found at https://datatracker.ietf.org/wg/nfsv4/ about/. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 18 April 2025. Copyright Notice Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. Haynes & Evenou Expires 18 April 2025 [Page 1] Internet-Draft mojette-encoding October 2024 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Notations . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 2. Mojette Transform . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 3 2.1.1. Encoding . . . . . . . . . . . . . . . . . . . . . . 4 2.1.2. Decoding . . . . . . . . . . . . . . . . . . . . . . 4 2.1.3. Systematic and Non Systematic Implementations . . . . 5 2.1.4. Data Block Representation . . . . . . . . . . . . . . 5 2.2. Non-Systematic Mojette Transform . . . . . . . . . . . . 5 2.2.1. Block Encoding . . . . . . . . . . . . . . . . . . . 5 2.2.2. Block Decoding . . . . . . . . . . . . . . . . . . . 5 2.2.3. Example . . . . . . . . . . . . . . . . . . . . . . . 5 2.3. Systematic Mojette Transform . . . . . . . . . . . . . . 6 2.3.1. Block Encoding . . . . . . . . . . . . . . . . . . . 6 2.3.2. Block Decoding . . . . . . . . . . . . . . . . . . . 6 2.3.3. Example . . . . . . . . . . . . . . . . . . . . . . . 6 2.4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . 7 2.4.1. Benefits of Non-Systematic Mojette Transform: . . . . 7 2.4.2. Benefits of Systematic Mojette Transform: . . . . . . 7 3. Extension of Flexible File Layout Type Version 2 Encoding Type . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.1. ffv2_encoding_type . . . . . . . . . . . . . . . . . . . 7 3.2. ffv2_mojette_faulty_devices . . . . . . . . . . . . . . . 8 3.3. ffv2_encoding_data . . . . . . . . . . . . . . . . . . . 8 4. Examples Tying It All Together . . . . . . . . . . . . . . . 9 4.1. Block_Sizes . . . . . . . . . . . . . . . . . . . . . . . 9 5. Extraction of XDR . . . . . . . . . . . . . . . . . . . . . . 10 6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 8.1. Normative References . . . . . . . . . . . . . . . . . . 11 8.2. Informative References . . . . . . . . . . . . . . . . . 12 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 12 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 12 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 Haynes & Evenou Expires 18 April 2025 [Page 2] Internet-Draft mojette-encoding October 2024 1. Introduction In Parallel NFS (pNFS) (see Section 12 of [RFC8881]), the metadata server returns layout type structures that describe where file data is located. There are different layout types for different storage systems and methods of arranging data on storage devices. [RFCTBD09] defined the Flexible File Version 2 Layout Type used with file-based data servers that are accessed using the NFS protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC8881], and NFSv4.2 [RFC7862]. This document introduces a new Erasure Encoding Type called Mojette Transformation. Using the process detailed in [RFC8178], the revisions in this document become an extension of NFSv4.2 [RFC7862]. They are built on top of the external data representation (XDR) [RFC4506] generated from [RFC7863]. 1.1. Notations x_i: The ith value of x. sum_{x=0}^{10} f(x_i): The sum of the function f applied to the values of x ranging from 0 to 10. I.e, The first '{}' describes the lower range of x and the second '{}' describes the upper range of x. 1.2. Requirements Language The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and 'OPTIONAL' in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. 2. Mojette Transform 2.1. Introduction The Mojette Transform is an erasure coding technique that provides fault tolerance for data storage systems by enabling the recovery of lost data blocks. This section describes the integration of the systematic Mojette Transform into the NFS protocol, focusing on encoding and decoding file system blocks, typically sized at 4KB or 8KB. Haynes & Evenou Expires 18 April 2025 [Page 3] Internet-Draft mojette-encoding October 2024 2.1.1. Encoding The Mojette Transform involves the following steps to encode a data block: Initialization: Each data block is treated as a 2D grid of data elements (pixels). Typically, a block is structured as a matrix of size PxQ, where P and Q are the dimensions of the grid. Projections Calculation: Projections are computed along specific directions defined by pairs of coprime integers (p_i, q_i). Each projection sums the values of the data elements (pixels) along a line defined by these directions. The size of a projection is given as in Figure 1. For a given projection direction (p_i, q_i), the projection values are calculated as in Figure 2, where Delta is 1 if the argument is zero and 0 otherwise. Size of projection = (P - 1) x |q_i| + (Q - 1) * |p_i| + 1 Figure 1 Projection(b, p_i, q_i) = sum_{k=0}^{Q-1} sum_{l=0}^{P-1} Data(k, l) x Delta(b - l x p_i + k x q_i) Figure 2 2.1.2. Decoding Decoding the Mojette Transform is the inverse of encoding, involving the reconstruction of data from projection data to fill an empty grid. This involves solving a system of linear equations defined by the projection differences and the projection directions (p_i, q_i). The algorithm iterates to refine the values of the missing data elements until the original data block is reconstructed. Data reconstruction is possible if Katz's criterion holds, which was extended to any convex shape. The Katz's criterion specifies that reconstruction is valid if for a given set of n projections along n directions (p_i, q_i) either sum_{i=0}^{n} q_i ≥ Q or sum_{i=0}^{n} p_i ≥ P. Adjusting the number of lines Q and the projections set allows setting a desired fault-tolerance threshold. For example a 64x4 grid can be decoded by the projection set {(0, 1), (1, 1), (2, 1), (3, 1)} as sum_{i=1}^{4} q_i = 4. Haynes & Evenou Expires 18 April 2025 [Page 4] Internet-Draft mojette-encoding October 2024 2.1.3. Systematic and Non Systematic Implementations A systematic code is an error-correcting code where the input data is embedded directly in the encoded output. In contrast, a non- systematic code produces an output that does not contain the original input symbols. The Mojette Transform can be implemented in both ways, allowing it to adapt to various use cases. 2.1.4. Data Block Representation In the context of NFS, a data block corresponds to a file system block, which is a contiguous segment of data, typically 4KB or 8KB in size. The Mojette Transform encodes these blocks to ensure data integrity and availability in distributed storage environments. 2.2. Non-Systematic Mojette Transform 2.2.1. Block Encoding In the non-systematic version of the Mojette Transform, the original data block is not directly included in the encoded output. Instead, the entire encoded output consists of projections computed from the original data. The number of computed projections n is larger than the number of projections m required to rebuild the initial data. 2.2.2. Block Decoding To decode a file system block that has undergone the non-systematic Mojette Transform, the following steps are followed: Identify Available Projections: Determine which projections are available. A least m projections (what ever they are) out of n must be available. Recompute Data Block: Apply the inverse Mojette Transform to rebuild the original Data. 2.2.3. Example Assume a file system block of 4KB is divided into a 128x4 matrix of 128-bit elements. Using the non-systematic Mojette Transform, we compute projections along selected directions, such as (-2,1), (-1, -1), (0,1), (1,1), (2,1) and (3,1). The original data is not stored directly; instead, the projections are stored. Haynes & Evenou Expires 18 April 2025 [Page 5] Internet-Draft mojette-encoding October 2024 If a data loss occurs, for instance, if two projections are lost, the missing elements can be recovered by using the remaining projections and solving the inverse problem. Any set of 4 projections among the 6 generated can rebuild exactly the original data block 2.3. Systematic Mojette Transform 2.3.1. Block Encoding In the systematic version, the original data block (file system block) is part of the encoded output. Additional projections are calculated to provide redundancy. If k is the number of original data blocks and n is the total number of encoded blocks (including projections), the systematic code will have the first k blocks as the original data and the remaining n - k blocks as projections. 2.3.2. Block Decoding To decode a file system block that has undergone the Systematic Mojette Transform, the following steps are followed: Identify Missing Data: Determine which data lines are missing in the block. Let e be the number of missing lines. Recompute Projections: Compute the projections of the available (partial) data blocks. Calculate the differences between the projections of the full data and the partial data. Combine Existing Data and Recomputed projection: Recreate the full block of data by combining the existing data with the reconstructed data according to their positions in the block. 2.3.3. Example Assume a file system block of 4KB is divided into a $64\times4$ matrix of 128-bit elements. Using the systematic Mojette Transform, we first compute projections along selected directions, such as (0,1), and (1,1). The original 4 blocks of 64 128-bit elements remains part of the encoded data, and the 2 additional projections are stored for redundancy. If a data loss occurs, the missing elements can be recovered by using the projections and solving the inverse problem. Haynes & Evenou Expires 18 April 2025 [Page 6] Internet-Draft mojette-encoding October 2024 2.4. Conclusion The Mojette Transform provides two implementations for an efficient and effective way to enhance data reliability by encoding file system blocks with additional projections. These methods ensure that data can be reconstructed even in the presence of failures, thereby enhancing the fault tolerance of the file system. In summary, the Mojette Transform offers robust solutions for data reliability in file systems, balancing redundancy, efficiency, and performance to ensure data integrity and quick recovery from failures. 2.4.1. Benefits of Non-Systematic Mojette Transform: Fast Failure Detection: By storing only encoded data and having the ability to rebuild the original block from any sufficient subset of projections, the non-systematic encoding implementation offers great flexibility and allows fast failure detection and recovery. Constant Performance: The non-systematic decoding algorithm is the most performant of all implementations. Although decoding always occurs, the overhead is low, and unlike systematic encoding, performance remains constant regardless of the number of failures. 2.4.2. Benefits of Systematic Mojette Transform: Redundancy Reduction: Systematic Mojette coding reduces redundancy by integrating the original data blocks into the encoded data, unlike non-systematic codes that generate entirely new data from the original. Efficiency: Fewer projections need to be calculated and stored, reducing both computational and storage overhead. Performance: Decoding is faster and simpler, especially when some original data blocks are available, enabling quicker data access. However, performance is slightly degraded in the case of failures and depends on the number of failures. 3. Extension of Flexible File Layout Type Version 2 Encoding Type 3.1. ffv2_encoding_type Haynes & Evenou Expires 18 April 2025 [Page 7] Internet-Draft mojette-encoding October 2024 /// enum ffv2_encoding_type { /// FFV2_ENCODING_MIRRORED = 0x1; /// FFV2_ENCODING_MOJETTE_SYSTEMATIC = 0x2; /// FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC = 0x3; /// }; Figure 3 ffv2_encoding_type is extended in Figure 3 to introduce two different erasure encoding types: FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC. They are intoduced at this level instead of at the ffv2_encoding_data (see Figure 5) in order for the client to negotiate the support of one over the other with the NFS4ERR_ERASURE_ENCODING_NOT_SUPPORTED error (see Section ZZZ of [RFCTBD09]). 3.2. ffv2_mojette_faulty_devices /// enum ffv2_mojette_faulty_devices { /// FFV2_MOJETTE_FAULTY_DEVICES_2_1 = 0x1; /// FFV2_MOJETTE_FAULTY_DEVICES_4_1 = 0x2; /// FFV2_MOJETTE_FAULTY_DEVICES_4_2 = 0x3; /// FFV2_MOJETTE_FAULTY_DEVICES_8_1 = 0x4; /// FFV2_MOJETTE_FAULTY_DEVICES_8_2 = 0x5; /// FFV2_MOJETTE_FAULTY_DEVICES_8_3 = 0x6; /// FFV2_MOJETTE_FAULTY_DEVICES_8_4 = 0x7; /// }; Figure 4 The ffv2_mojette_faulty_devices (see Figure 4) can be used in both the layout_hint (see both Section 5.12.4 of [RFC8881] and Section XXX of [RFCTBD09]) and the ffl_encoding_type_data (see Section YYY of [RFCTBD09]) to convey the distribution of FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_SPARE projection blocks (see Section XXX of [RFCTBD09]) in the layouts for the Flexible File Version 2 Layout Type. The name of each of the enum targets ends with 'X_Y', which states that there is a need for X+Y files to compose the projection. X is the number of FFV2_DS_FLAGS_ACTIVE blocks and Y is the number of FFV2_DS_FLAGS_SPARE blocks. 3.3. ffv2_encoding_data Haynes & Evenou Expires 18 April 2025 [Page 8] Internet-Draft mojette-encoding October 2024 /// union ffv2_encoding_type_data /// switch (ffv2_encoding_type fetd_type) { /// case FFV2_ENCODING_MIRRORED: /// void; /// case FFV2_ENCODING_MOJETTE_SYSTEMATIC: /// case FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC: /// ffv2_mojette_faulty_devices /// fetd_mojette_potection_configuration; /// }; Figure 5 This addition of FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC to the ffv2_encoding_type_data (see Figure 5) allows for the metadata server to inform the client as to the expected distribution of FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_SPARE projection blocks inside the ffm_data_servers arm for the Mojette erasure encoding types. 4. Examples Tying It All Together 4.1. Block_Sizes Consider a FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC encoding type which needs 6 active blocks and 2 spare blocks in the payload (Not a valid ffv2_mojette_faulty_devices, but needed to show varied block sizes). As can be seen in Table 1 , 4kB data blocks need blocks about 1kB in size. But not all blocks are the same size. +===============+======+======+======+======+======+======+===+===+ | Projection ID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | +===============+======+======+======+======+======+======+===+===+ | p value | -3 | -2 | -1 | 0 | 1 | 2 | 0 | 0 | +---------------+------+------+------+------+------+------+---+---+ | q value | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | +---------------+------+------+------+------+------+------+---+---+ | size (bytes) | 1048 | 1080 | 1080 | 1048 | 1064 | 1048 | 0 | 0 | +---------------+------+------+------+------+------+------+---+---+ Table 1: Example sizes of a Mojette Projection Haynes & Evenou Expires 18 April 2025 [Page 9] Internet-Draft mojette-encoding October 2024 5. Extraction of XDR This document contains the external data representation (XDR) [RFC4506] description of the uncacheable attribute. The XDR description is embedded in this document in a way that makes it simple for the reader to extract into a ready-to-compile form. The reader can feed this document into the following shell script to produce the machine readable XDR description of the new flags: #!/bin/sh grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' That is, if the above script is stored in a file called 'extract.sh', and this document is in a file called 'spec.txt', then the reader can do: sh extract.sh < spec.txt > mojette_encoding_prot.x The effect of the script is to remove leading white space from each line, plus a sentinel sequence of '///'. XDR descriptions with the sentinel sequence are embedded throughout the document. Note that the XDR code contained in this document depends on types from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]), the Flex Files Layout Type flexfiles.x file (generated from [RFC8435]), and the Flex Files Layout Type version 2 flexfilesv2.x (generated from [RFCTBD09]). This includes both nfs types that end with a 4, such as offset4, length4, etc., as well as more generic types such as uint32_t and uint64_t. While the XDR can be appended to that from [RFC7863], the various code snippets belong in their respective areas of that XDR. 6. Security Considerations This document has the same security considerations as both Flex Files Layout Type version 1 (see Section 15 of [RFC8435]) and NFSv4.2 (see Section 17 of [RFC7862]). Haynes & Evenou Expires 18 April 2025 [Page 10] Internet-Draft mojette-encoding October 2024 7. IANA Considerations This document introduces changes in the 'Flex Files V2 Erasure Encoding Type Registry'. This document defines both the FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC types for Client-Side Mojette Transformations. +======================================+=====+========+===+========+ | Erasure Encoding Type Name |Value|RFC |How|Minor | | | | | |Versions| +======================================+=====+========+===+========+ | FFV2_ENCODING_MOJETTE_SYSTEMATIC |2 |RFCTBD10|L |2 | +--------------------------------------+-----+--------+---+--------+ | FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC |3 |RFCTBD10|L |2 | +--------------------------------------+-----+--------+---+--------+ Table 2: Flex Files V2 Erasure Encoding Type Assignments 8. References 8.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC4506] Eisler, M., Ed., "XDR: External Data Representation Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May 2006, . [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015, . [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, November 2016, . [RFC7863] Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 External Data Representation Standard (XDR) Description", RFC 7863, DOI 10.17487/RFC7863, November 2016, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . Haynes & Evenou Expires 18 April 2025 [Page 11] Internet-Draft mojette-encoding October 2024 [RFC8178] Noveck, D., "Rules for NFSv4 Extensions and Minor Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017, . [RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, . [RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 8881, DOI 10.17487/RFC8881, August 2020, . [RFCTBD09] Haynes, T., "Erasure Encoding of Files in NFSv4.2", October 2024, . 8.2. Informative References [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, June 1995, . Appendix A. Acknowledgments The following from Hammerspace were instrumental in driving the incorporation of the Mojette Transformation into an encoding type for Flexible Files Version 2 Layout Type: David Flynn, Trond Myklebust, Tom Haynes, Didier Feron, Jean-Pierre Monchanin, Pierre Evenou, and Brian Pawlowski. Appendix B. RFC Editor Notes This section is to be removed before publishing as an RFC. [RFC Editor: prior to publishing this document as an RFC, please replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the RFC number of this document] [RFC Editor: prior to publishing this document as an RFC, please replace all occurrences of RFCTBD09 with RFCyyyy where yyyy is the RFC number of the document: draft-haynes-nfsv4-erasure-encoding.xml] Authors' Addresses Thomas Haynes Hammerspace Haynes & Evenou Expires 18 April 2025 [Page 12] Internet-Draft mojette-encoding October 2024 Email: loghyr@gmail.com Pierre Evenou Hammerspace Email: pierre.evenou@hammerspace.com Haynes & Evenou Expires 18 April 2025 [Page 13]