This section describes how to increase SSD endurance and maximize steady-state write performance for write-intensive workloads by configuring volume groups and DDPs to have free capacity. The topics of endurance, overprovisioning, write amplification factor, and workload conditioning will be explored to provide a basis for understanding how leaving free capacity effectively increases the level of overprovisioning in the drives in each group or pool. Increasing overprovisioning can be expected to increase both SSD endurance and maximum sustained write performance, especially for lower-capacity drives.

SSD endurance

SSD endurance is typically specified in terms of drive writes per day (DWPD), which is just a convenient way to specify an amount of data. The NVMe SSDs used in the DE6400/DE6600 are rated for 1 DWPD. That means that you could nominally write an amount of data equal to the capacity of each SSD once per day without exceeding its rated endurance during the warranty period. Because endurance is a measure of the amount of data that can be written, the rated endurance for a 3.84TB SSD expressed as terabytes written is twice that of a 1.92TB SSD because it has twice the capacity. Similarly, the endurance for a 15.3TB SSD is twice that of a 7.68TB SSD.
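The following Python sketch shows the arithmetic behind this scaling by converting a DWPD rating into terabytes written (TBW). This section does not state the warranty period, so the five-year value used below is an illustrative assumption, and the function name is hypothetical.

```python
import math

DAYS_PER_YEAR = 365


def rated_tbw(capacity_tb, dwpd, warranty_years):
    """Terabytes that can nominally be written over the warranty period."""
    return capacity_tb * dwpd * DAYS_PER_YEAR * warranty_years


# Hypothetical 5-year warranty; 1 DWPD is the rating cited above.
WARRANTY_YEARS = 5
for cap in (1.92, 3.84, 7.68, 15.3):
    print(f"{cap} TB SSD: about {rated_tbw(cap, 1, WARRANTY_YEARS):,.0f} TBW")

# Because DWPD is relative to capacity, TBW scales linearly with capacity.
print(math.isclose(rated_tbw(3.84, 1, 5), 2 * rated_tbw(1.92, 1, 5)))
```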

There is an endurance limit specified for SSDs because solid-state memory can wear out. The NAND flash memory in an SSD is repeatedly programmed and erased over time as data is written to the drive. NAND flash memory can only be programmed and erased a limited number of times before wearing out, which means that there is an upper limit on how much data can be written to each SSD during its lifetime.

The smallest amount of data that can be written from the perspective of the array (and from the attached hosts) is one logical block, which is 4096 bytes for the NVMe drives used in a DE6400/DE6600. Inside the SSD, the smallest amount of data that can be written is a NAND flash memory page, which may be larger than a logical block. The smallest amount of data that can be erased is a NAND block, which can contain hundreds of pages. After a page is written, it cannot be overwritten until the entire NAND block that contains it is erased. The exact page and NAND block sizes vary between SSD models. In general, the NAND block size increases as NAND flash memory density and capacity increase.
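To illustrate the program and erase constraints just described, the following Python sketch models a single NAND block. The class name, the tiny block size, and the use of None to mark an erased page are simplifications chosen for illustration only; real page and block sizes are much larger and vary by SSD model.

```python
class NandBlock:
    """Toy model of one NAND block (hypothetical; real geometry varies by SSD model).

    Pages are the unit of programming; the whole block is the unit of erase.
    """

    def __init__(self, pages_per_block):
        self.pages = [None] * pages_per_block  # None marks an erased page

    def program(self, page_index, data):
        # A programmed page cannot be rewritten in place.
        if self.pages[page_index] is not None:
            raise ValueError(f"page {page_index} already programmed; erase the block first")
        self.pages[page_index] = data

    def erase(self):
        # Erasing resets every page in the block at once.
        self.pages = [None] * len(self.pages)


block = NandBlock(pages_per_block=4)  # tiny block size chosen for readability
block.program(0, b"v1")
try:
    block.program(0, b"v2")           # an in-place overwrite is refused
except ValueError as err:
    print(err)
block.erase()
block.program(0, b"v2")               # allowed only after the whole block is erased
```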

Overprovisioning

All SSDs have more internal solid-state storage than the amount specified as the usable capacity. The extra capacity is referred to as overprovisioning (OP). OP is expressed as the extra capacity as a percentage of the usable capacity, and the rated endurance is directly related to it. OP values of 7%, 28%, and 100% typically correspond to rated endurances of 1 DWPD, 3 DWPD, and 10 DWPD, respectively. However, the exact amount of OP required for a given endurance rating is an implementation detail and can vary between vendors or between generations of drives.

So, a drive with a stated usable capacity of Ux and 7% OP has internal (raw) storage in the amount of R = Ux + 0.07*Ux, or R = 1.07*Ux. If the same drive of raw capacity R were instead configured for an OP of 28%, the usable capacity would be Uy = (1.07*Ux)/1.28. If the drive were configured for an OP of 100%, the usable capacity would be Uz = (1.07*Ux)/2. As an example, a drive with a stated usable capacity of 3.84TB when configured for 7% OP to support an endurance of 1 DWPD would have a usable capacity of about 3.2TB when configured for 28% OP to support an endurance of 3 DWPD. If it were configured for 100% OP to support 10 DWPD, it would have a usable capacity of about 2.1TB.
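The following Python sketch reproduces this arithmetic, assuming that OP is defined as extra raw capacity divided by usable capacity, as described above. The function names are illustrative.

```python
def raw_capacity(usable_tb, op):
    """Raw NAND capacity R = U * (1 + OP)."""
    return usable_tb * (1 + op)


def usable_capacity(raw_tb, op):
    """Usable capacity U = R / (1 + OP) for a fixed raw capacity."""
    return raw_tb / (1 + op)


r = raw_capacity(3.84, 0.07)  # R = 1.07 * Ux for a 3.84TB, 1 DWPD drive
for op, dwpd in ((0.07, 1), (0.28, 3), (1.00, 10)):
    print(f"OP {op:.0%} ({dwpd} DWPD): usable ~ {usable_capacity(r, op):.2f} TB")
```

Printed to two decimal places, the 100% OP case shows about 2.05 TB, which the text above rounds to 2.1TB.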

As the capacity of SSDs has increased, the amount of raw capacity needed to provide a given level of OP has also increased, because OP is specified as a percentage of the usable capacity. For example, an 800GB SSD rated for 3 DWPD (28% OP) needs a raw capacity of approximately 1024GB, or 224GB more than the usable capacity. By comparison, a 3.84TB SSD configured for an endurance of 3 DWPD would require approximately 1.1TB of additional capacity, as opposed to only about 270GB of additional capacity to support an endurance of 1 DWPD. This difference of over 800GB of raw capacity is not directly visible to the end user and increases the cost of the drive per unit of usable capacity.
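A short Python sketch of the same comparison, treating the extra raw capacity as the usable capacity multiplied by the OP fraction (7% for 1 DWPD and 28% for 3 DWPD, as described above). The function name is hypothetical.

```python
def extra_raw_gb(usable_gb, op):
    """Raw NAND capacity beyond the usable capacity needed for a given OP."""
    return usable_gb * op


print(f"800 GB at 28% OP: {extra_raw_gb(800, 0.28):.0f} GB extra")
at_3dwpd = extra_raw_gb(3840, 0.28)  # 3.84TB drive configured for 3 DWPD
at_1dwpd = extra_raw_gb(3840, 0.07)  # 3.84TB drive configured for 1 DWPD
print(f"3.84 TB drive: {at_3dwpd:.0f} GB vs {at_1dwpd:.0f} GB extra "
      f"(difference {at_3dwpd - at_1dwpd:.0f} GB)")
```

With these assumptions, the 3.84TB case needs about 1075GB of extra raw capacity at 28% OP versus about 269GB at 7% OP, a difference of about 806GB.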

Write amplification factor

SSD endurance is specified as an amount of data that can be written to each drive during its lifetime. It is not really that simple, however, because the endurance rating is based on a random write workload that assumes a certain write amplification factor (WAF). Recall that data can be written to the NAND flash memory with page granularity but can only be erased a whole NAND block at a time, and a NAND block may contain hundreds of pages. To reclaim space occupied by stale data and to ensure even wear on all NAND blocks, the SSD performs both garbage collection and wear leveling in the background.

  • Garbage collection: When the contents of a logical block are overwritten with new data, the SSD writes the new data to a page that is currently erased. The old data for that logical block is no longer needed and can be discarded. After a large enough percentage of pages in a NAND block no longer contain valid data, the SSD copies the pages that still hold valid data into erased pages in another NAND block so that it can erase the entire NAND block.

  • Wear leveling: When a NAND block contains data that is never overwritten, the SSD periodically copies the data to another block so that all blocks can be used (in other words, programmed and erased) evenly throughout the life of the drive.

All of this means that the amount of data written to the NAND flash memory exceeds the amount of data written to the SSD by the array (which is the host, as viewed by the SSD). The ratio of NAND writes to host writes is referred to as the write amplification factor, or WAF.

Note: In general, increasing the OP lowers the WAF, especially for random write workloads. Lowering the WAF in turn increases endurance and can also increase steady-state performance for write-intensive workloads.
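To make the definition of WAF concrete, the following Python sketch runs a toy simulation. It is not a model of the DE6400/DE6600 drives or of any vendor's firmware: it assumes uniform random single-page overwrites, an arbitrary block size of 64 pages, and a greedy garbage-collection policy (reclaim the full block with the fewest valid pages), and it ignores wear leveling. WAF is computed as NAND page writes (host writes plus garbage-collection copies) divided by host writes, measured after every logical page has been written once.

```python
import random


def simulate_waf(op, logical_pages=4096, pages_per_block=64,
                 host_writes=100_000, seed=1):
    """Estimate WAF under uniform random single-page overwrites.

    Toy model (hypothetical, not any real SSD's firmware): pages are written
    out of place, erases happen a whole block at a time, and greedy garbage
    collection reclaims the full block with the fewest valid pages. Wear
    leveling is not modeled. Assumes op is at least a few percent so that
    spare blocks exist.
    """
    rng = random.Random(seed)
    num_blocks = int(logical_pages * (1 + op)) // pages_per_block
    valid = [set() for _ in range(num_blocks)]  # logical pages whose current copy is in each block
    written = [0] * num_blocks                  # pages programmed since each block's last erase
    location = {}                               # logical page -> block holding its current copy
    free_blocks = list(range(num_blocks))       # erased blocks
    active = free_blocks.pop()                  # block currently receiving writes
    nand_writes = 0

    def program(lpn):
        # Program one page out of place; any older copy of lpn becomes stale.
        nonlocal active, nand_writes
        if written[active] == pages_per_block:
            active = free_blocks.pop()
        old = location.get(lpn)
        if old is not None:
            valid[old].discard(lpn)
        valid[active].add(lpn)
        location[lpn] = active
        written[active] += 1
        nand_writes += 1

    def collect():
        # Greedy GC: copy the victim's valid pages elsewhere, then erase it.
        full = [b for b in range(num_blocks)
                if b != active and written[b] == pages_per_block]
        victim = min(full, key=lambda b: len(valid[b]))
        for lpn in list(valid[victim]):
            program(lpn)  # relocation counts as a NAND write, not a host write
        written[victim] = 0
        free_blocks.append(victim)

    def host_write(lpn):
        while len(free_blocks) < 2:  # keep a spare erased block for GC
            collect()
        program(lpn)

    for lpn in range(logical_pages):  # precondition: write every logical page once
        host_write(lpn)
    start = nand_writes
    for _ in range(host_writes):      # random overwrites after preconditioning
        host_write(rng.randrange(logical_pages))
    return (nand_writes - start) / host_writes


for op in (0.07, 0.28, 1.00):
    print(f"OP {op:.0%}: estimated WAF {simulate_waf(op):.2f}")
```

Under these assumptions, the estimated WAF should fall as op increases, in line with the statement that increasing OP lowers the WAF. The absolute values depend heavily on the assumed block size, garbage-collection policy, and workload and should not be read as predictions for real drives.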

Steady-state performance

The maximum achievable write performance for an SSD eventually reaches a steady-state level. For most workloads, the maximum obtainable write performance can be expected to decrease from the peak values that can be obtained when the drive is mostly erased. As the host continues to write data to the drive, the SSD must perform garbage collection in the background to free space as logical blocks are overwritten. Over time, the drive must also perform background wear leveling. The maximum obtainable write performance starts to decrease as data is written to the drive but can be expected to stabilize at a steady-state value for a given workload. When the maximum obtainable performance stabilizes, the drive is said to be conditioned for that workload.

Note: Write performance for a given workload does not necessarily drop after the SSD has been conditioned to that workload if the write rate is at or below the maximum steady-state write performance.

The amount of data that must be written before the maximum write performance stabilizes varies with the workload and the amount of overprovisioning. As a rule, maximum write performance can be expected to stabilize after an amount of data equal to two to three times the capacity of the drive has been written to it. Maximum steady-state write performance is also correlated with overprovisioning: in general, it increases with higher levels of overprovisioning.
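As a planning aid, the following Python sketch estimates how much data must be written, and roughly how long that takes, to condition a drive using the two-to-three-times-capacity rule of thumb above. The sustained write rate is a hypothetical input, not a DE6400/DE6600 specification, and the function name is illustrative.

```python
def conditioning_estimate(capacity_tb, write_gb_per_s, drive_writes=(2, 3)):
    """Data volume (TB) and time (hours) to write N times the drive capacity."""
    results = []
    for n in drive_writes:
        data_tb = capacity_tb * n
        hours = data_tb * 1000 / write_gb_per_s / 3600  # decimal TB -> GB, seconds -> hours
        results.append((n, data_tb, hours))
    return results


# Hypothetical sustained per-drive write rate of 1 GB/s; substitute your own.
for n, data_tb, hours in conditioning_estimate(3.84, write_gb_per_s=1.0):
    print(f"{n}x capacity: {data_tb:.2f} TB, about {hours:.1f} h")
```

Because write throughput falls as the drive approaches steady state, the actual time will be longer than the estimate if the rate you supply is the fresh-drive rate.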

Reserving free capacity

When creating volume groups and DDPs, consider leaving some free capacity in the group or pool rather than allocating all available capacity to volumes. The DE6400/DE6600 automatically unmaps free capacity, so free capacity effectively increases the OP level for the constituent drives in that group or pool. This can result in a lower WAF for both random write and multi-stream sequential write workloads. Lowering the WAF for a given workload inherently increases endurance and can improve steady-state performance for write-intensive workloads, especially for lower-capacity drives. With lower-capacity drives and no free capacity in the group or pool, the maximum steady-state write performance of the drives is expected to be less than half of the system's throughput capability.

The maximum steady-state IOPS and bandwidth capability for each individual SSD in a group or pool increases as free capacity is increased in the group or pool. Equally important, increasing free capacity decreases the WAF for most workloads, increasing SSD endurance. The decrease in WAF should occur for most workloads even if the performance requirements of the workload are significantly lower than the maximum steady-state values.

The following table shows the effective OP for various amounts of free capacity held back, expressed as a percentage of the usable capacity of each drive. For a holdback fraction H, the effective OP is (1.07/(1 - H)) - 1. Because the usable capacity of a volume group varies considerably with the RAID level and group size, the free capacity reserved in a volume group should be based on the total capacity of the drives rather than on the usable capacity of the group. A holdback of 16.4% equates to an effective OP of 28%, which is the OP level nominally used to configure drives for 3 DWPD endurance.

Table 1. Per-drive capacity holdback (in GiB) required to reach effective OP.

Holdback (%)   Effective OP   1.92 TB SSD   3.84 TB SSD   7.68 TB SSD   15.3 TB SSD
0              7.0%           0             0             0             0
4              11.5%          71.54         143.08        286.16        572.32
8              16.3%          143.08        286.16        572.32        1144.63
12             21.6%          214.62        429.24        858.47        1716.95
16.4           28.0%          293.31        586.63        1173.25       2346.50
20             33.8%          357.70        715.40        1430.79       2861.58
24             40.8%          429.24        858.48        1716.95       3433.90
28             48.6%          500.78        1001.56       2003.11       4006.21
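The following Python sketch computes the relationships in Table 1, assuming the drives ship with 7% OP and that free capacity held back in a group or pool is unmapped and therefore behaves as additional OP. It uses nominal decimal drive capacities (for example, 3.84e12 bytes), so the GiB values it prints can differ slightly from the table. The function names are illustrative.

```python
BASE_OP = 0.07  # factory OP of the 1 DWPD drives described above


def effective_op(holdback):
    """Effective OP when a fraction of usable capacity is left unallocated (and unmapped)."""
    return (1 + BASE_OP) / (1 - holdback) - 1


def holdback_for_op(target_op):
    """Fraction of usable capacity to hold back to reach target_op."""
    return 1 - (1 + BASE_OP) / (1 + target_op)


def holdback_gib(capacity_tb, holdback):
    """Per-drive holdback in GiB, using the nominal decimal capacity (an approximation)."""
    return capacity_tb * 1e12 * holdback / 2**30


print(f"Holdback for 28% OP: {holdback_for_op(0.28):.1%}")
for h in (0.04, 0.08, 0.12, 0.164, 0.20, 0.24, 0.28):
    print(f"{h:>6.1%} -> effective OP {effective_op(h):.1%}, "
          f"3.84 TB drive: {holdback_gib(3.84, h):.2f} GiB")
```

For example, holdback_for_op(0.28) returns about 0.164, matching the 16.4% holdback that the table lists for 28% effective OP.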