matharmin 13 hours ago

> Overall, I haven’t seen many issues with the drives, and when I did, it was a Linux kernel issue.

Reading the linked post, it's not a Linux kernel issue. Rather, the Linux kernel was forced to disable queued TRIM and maybe even NCQ for these drives, due to issues in the drives.

  • happyPersonR 5 hours ago

    Hopefully there are drives that don’t have that issue?

Prunkton 12 hours ago

Since it’s kind of related, here’s my anecdote/data point on the bit rot topic: I did a 'btrfs scrub' (checksum) on my two 8 TB Samsung 870 QVO drives. One of them has always been on (10k hours), while the other hasn’t been powered on at all in the last 9 months, and only once in the last 16 months.

No issues were found on either of them.
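
For anyone who wants to reproduce the check, the scrub itself is a one-liner (mount point assumed):

    # start a scrub, then check the result once it finishes
    btrfs scrub start /mnt/data
    btrfs scrub status /mnt/data   # reports bytes scrubbed and any checksum errors

Any errors it does find also show up in dmesg.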

  • diggan 11 hours ago

    How much has been written to each of them across their lifetimes?

    • Prunkton 11 hours ago

      Very little: about 25 TB written on the always-on one. The offline one just does diffs, so probably <12 TB. Both are kind of data dumps, which is outside their designed use case. That's why I included data integrity checks in my backup script before the actual rsync backup runs. But again, no issues so far.
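
      A minimal sketch of that kind of pre-backup check (paths assumed, not the actual script):

          #!/bin/sh
          # scrub the source first; only run the backup if the scrub comes back clean
          set -e
          btrfs scrub start -B /mnt/source   # -B waits; recent btrfs-progs exit non-zero on uncorrectable errors
          rsync -a --delete /mnt/source/ /mnt/backup/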

  • modzu 8 hours ago

    if only I could trust btrfs scrub

tracker1 7 hours ago

3 of my first SATA SSDs from over a decade ago are still in use... I first had them in my home server as OS and cache drives respectively; they later went into desktop use for a couple of years when the server crashed and I replaced it with a mini PC that was smaller, faster and quieter. They eventually wound up in a few DeskPi cases with RPi 4 8GB units, and I handed them off to a friend a couple of months ago. They're all still working, with only 1 error between the 3 of them. IIRC, all 240GB Crucial drives from the early 2010s.

I've never had any spinning drives come close to that level of reliability. I've only actually had one SSD or NVMe drive fail, and that was the first-gen Intel drive in my desktop that had a firmware bug and one day showed up as an 8MB empty drive. It was a 64GB unit and I was so impressed by the speed, but tired of symlinking directories to the HDD for storage needs, I just bumped to 240GB+ models and never looked back.

Currently using a Corsair MP700 Pro drive (Gen 5 NVMe) in my desktop. Couldn't be happier... Rust and JS projects build crazy fast.

vardump 2 days ago

I wonder how long those drives can be powered off before they lose the data. And how long until they lose all functionality, once the critical bookkeeping data disappears.

  • magicalhippo 2 days ago

    This would depend on how worn they are. Here's an article describing a test a YouTuber did[1] that I watched some time ago. The worn drives did not fare that well, while the fresh ones did OK. Those were TLC drives, though; for QLC I expect the results are much worse overall.

    [1]: https://www.tomshardware.com/pc-components/storage/unpowered...

    • 0manrho 2 days ago

      I remember that post. Typical Tom's quality (or lack thereof).

      The only insight you can glean from that is that bad flash is bad, and worn bad flash is even worse, and that's frankly a stretch given the lack of sample sizes or a control group.

      The reality is that it's non-trivial to determine data retention/resilience in a powered-off state, at least as it pertains to coming to a useful and reasonably accurate generalization of "X characteristics/features result in poor data retention/endurance when powered off in Y types of devices" and being able to provide the receipts to back that up. There are far more variables than most people realize going on under the hood with flash, and in how different controllers and drives are architected (hardware) and programmed (firmware). Thermal management is a huge factor that is often overlooked or misunderstood, and it has a substantial impact on flash endurance (and performance). I could go into more specifics if interested (storage at scale/speed is my bread and butter), but this post is long enough.

      All that said, the general mantra remains true: more layers per cell generally means data per cell is more fragile/sensitive, but that's generally in the context of write cycle endurance.

      • ffsm8 13 hours ago

        This is the first time I've heard such negativity about Tom's Hardware, but the only time I actually looked at one of their tests in detail was their series testing burn-in on consumer OLED TVs and displays. The other reviews I glanced at in that context looked pretty solid, at least at a casual glance.

        Can you elaborate on the reason for your critique, considering they're pretty much just testing from the perspective of the consumer? I thought their explicit goal is not to provide highly technical analysis for niche preferences, but instead to look at it for a John Doe who's thinking about buying X and what it would mean for his use cases. From my mental model of that perspective, their reporting was pretty spot on and not shoddy, but I'm not an expert on the topic.

        • magicalhippo 12 hours ago

          The article I linked to is basically just a retelling of the YouTuber's video. I decided to link to it as I prefer linking to text sources rather than videos.

          The video isn't perfect, but I thought it had some interesting data points regardless.

        • AdrianB1 12 hours ago

          As someone who has read Tom's since it was run by Thomas, I find the quality of the articles a lot lower than almost 30 years ago. I don't remember when I stopped checking it daily, but I guess it was over 15 years ago.

          Maybe the quality looks good to you, but maybe you don't know what it was like 25 years ago to compare it to. Maybe it is a problem of the wrong baseline.

      • tart-lemonade 4 hours ago

        > I could go into more specifics if interested (storage at scale/speed is my bread and butter), but this post is long enough.

        I would read an entire series of blog posts about this.

      • Eisenstein 9 hours ago

        > I could go into more specifics if interested (storage at scale/speed is my bread and butter), but this post is long enough.

        I would love this.

Havoc 13 hours ago

I've had enough consumer SSDs fail on me that I ended up building a NAS with mirrored enterprise ones... but second-hand ones. Figured that between mirrored and enterprise, that's an OK gamble.

Still to be seen how that works out in the long run, but so far so good.
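
If anyone wants to do something similar, the mirror itself is quick to set up; with ZFS (device paths assumed; mdadm or btrfs RAID1 would work too) it's roughly:

    # create a two-way mirror and confirm both drives are online
    zpool create tank mirror /dev/disk/by-id/ata-DRIVE_A /dev/disk/by-id/ata-DRIVE_B
    zpool status tank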

  • Yokolos 11 hours ago

    For data storage, I just avoid SSDs outright. I only use them for games and my OS. I've seen too many SSDs fail without warning into a state where no data is recoverable, which is extremely rare for HDDs unless they're physically damaged.

    • yabones 9 hours ago

      SSDs are worth it to me because the restore and rebuild times are so much faster. Larger HDDs can take several days to rebuild a damaged array, and other drives have a higher risk of failure when they're being thrashed by IO and running hot. And if it does have subsequent drives fail during the rebuild, it takes even longer to restore from backup. I'm much happier to just run lots of SSDs in a configuration where they can be quickly and easily replaced.

    • Havoc 11 hours ago

      I just don't have the patience for HDDs anymore. Mirrored arrays and backups are going to have to do for data loss.

      That said, I only have a couple of TBs... a bit more and HDDs do become unavoidable.

      • shim__ 10 hours ago

        I'm using an HDD with an SSD cache for /home; everything that isn't stale gets cached by the SSD.
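
        bcache is one way to wire that up (device names assumed; lvmcache is an alternative):

            # format the SSD as cache and the HDD as backing device; giving both in one call attaches them
            make-bcache -C /dev/nvme0n1p1 -B /dev/sda1
            mkfs.ext4 /dev/bcache0
            mount /dev/bcache0 /home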

    • vardump 5 hours ago

      I’m worried about my drives that contain helium. So far so good, all show 100% helium level, but I wonder for how long.

  • PaulKeeble 10 hours ago

    You can't fully trust SSDs or HDDs; fundamentally they still have high failure rates regardless. Modern filesystems with checksums, scrub cycles, etc. are going to be necessary for a long time yet.
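
    The scrub part is easy to automate, e.g. a monthly cron entry along these lines (ZFS shown with an assumed pool name; 'btrfs scrub start' is the analogue):

        # /etc/cron.d/zfs-scrub: kick off a scrub at 03:00 on the 1st of each month
        0 3 1 * * root /usr/sbin/zpool scrub tank

        # later, check whether it found or repaired anything
        zpool status -v tank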

bullen 6 hours ago

I have used a Samsung PM897 1.92TB for a year and it's OK-ish... it is slow on indexing when folders become saturated, and as the drive fills up it becomes slower.

The solution is to remove some files... and pray it lasts half as long as a 64GB Intel X25-E!

Should it last 1/30th as long because it is 30x larger?

Or is this game only about saturation rate?
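
Rough arithmetic (with assumed per-cell endurance figures, not datasheet numbers) suggests it's more about cell type than size, since the total write budget scales with capacity times rated P/E cycles:

    # write budget ~ capacity x rated P/E cycles (very rough model, cycle counts assumed)
    echo "X25-E 64 GB SLC,   ~100k cycles: $(( 64 * 100000 / 1000 )) TB of raw writes"
    echo "PM897 1.92 TB TLC, ~3k cycles:   $(( 1920 * 3000 / 1000 )) TB of raw writes"

So the budgets come out in the same ballpark despite the 30x size difference; the per-cell endurance gap roughly cancels the capacity gap (again, assumed numbers).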

spaceport 4 hours ago

At what size of apartment does HDD noise become an issue?

8cvor6j844qw_d6 13 hours ago

I wonder what's the best SATA SSD (M.2 2280) one could get now?

I have an old Asus with an M.2 2280 slot that only takes SATA III.

I recall the 840 EVO M.2 (if my memory serves me right) is the current drive, but looking for a new replacement doesn't seem straightforward, as most SATA drives are 2.5 in. Or if it is the right M.2 2280 form factor, it's for NVMe.

  • Marsymars 9 hours ago

    I don’t know about best but you can filter pcpartpicker.com to M.2 SATA interface drives.

  • Hendrikto 10 hours ago

    Most companies stopped making and selling SATA M.2 drives years ago.

    • jonbiggums22 7 hours ago

      Most companies seem to be well on the way to that for SATA drives in general. There aren't many non-garbage-tier options available anymore. I keep ending up buying 870 EVOs even though I don't really love them.

  • more_corn 8 hours ago

    Samsung and Intel have come out on top on all my tests.

more_corn 8 hours ago

The critical failure profile is when you fill them up almost full and then hit them with a bunch of writes. If you can avoid that, you’re good for years.
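
One cheap mitigation is making sure the controller actually knows which blocks are free, e.g. on Linux:

    df -h                                 # keep an eye on how full the filesystems really are
    fstrim -av                            # discard unused blocks on all mounted filesystems that support it
    systemctl enable --now fstrim.timer   # or let the periodic systemd timer handle it

Leaving a chunk of the drive unpartitioned as extra over-provisioning helps for the same reason.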

justsomehnguy 12 hours ago

> The reported SSD lifetime is reported to be around 94%, with over 170+ TB of data written

Glad for the guy, but here's a bit different view of the same QVO series:

    Device Model:     Samsung SSD 870 QVO 1TB
    User Capacity:    1,000,204,886,016 bytes [1.00 TB]
   
    == /dev/sda
      9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
    177 Wear_Leveling_Count     0x0013   059   059   000    Pre-fail  Always       -       406
    241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       354606366027
    == /dev/sdb
      9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
    177 Wear_Leveling_Count     0x0013   060   060   000    Pre-fail  Always       -       402
    241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       354366033251
    == /dev/sdc
      9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
    177 Wear_Leveling_Count     0x0013   059   059   000    Pre-fail  Always       -       409
    241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       352861545042
    == /dev/sdd
      9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40778
    177 Wear_Leveling_Count     0x0013   060   060   000    Pre-fail  Always       -       403
    241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       354937764042
    == /dev/sde
      9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
    177 Wear_Leveling_Count     0x0013   059   059   000    Pre-fail  Always       -       408
    241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       353743891717
NB: you need to look at the first decimal number in 177 Wear_Leveling_Count to get the 'remaining endurance percent' value, i.e. 59 and 60 here.

While overall it's not that bad - losing only 40% after 4.5 years - it means that in another 3-4 years it would be down to 20%, if the usage pattern doesn't change and the system doesn't hit write amplification. Sure, someone had the "brilliant" idea ~5 years ago to use desktop-grade QLC flash as ZFS storage for PVE...

  • yrro 11 hours ago

    Have a look at the SSD Statistics page of the device statistics log (smartctl -l devstat). This has one "Percentage Used Endurance Indicator" value, which is 5 for three of these disks, and 6 for one of them. So based on that, the drives still have ~95% of their useful life left.

    As I understand it, the values in the device statistics log have standardized meanings that apply to any drive model, whereas any details about SMART attributes (as in the meaning of a particular attribute or any interpretation of its value apart from comparing the current value with the threshold) are not. So absent a data sheet for this particular drive documenting how to interpret attribute 177, I would not feel confident interpreting the normalized value as a percentage; all you can say is that the current value is > the threshold so the drive is healthy.

    • justsomehnguy 8 hours ago

          == /dev/sda
          177 Wear_Leveling_Count     PO--C-   059   059   000    -    406
          0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
          == /dev/sdb
          177 Wear_Leveling_Count     PO--C-   060   060   000    -    402
          0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
          == /dev/sdc
          177 Wear_Leveling_Count     PO--C-   059   059   000    -    409
          0x07  0x008  1              43  N--  Percentage Used Endurance Indicator
          == /dev/sdd
          177 Wear_Leveling_Count     PO--C-   060   060   000    -    403
          0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
          == /dev/sde
          177 Wear_Leveling_Count     PO--C-   059   059   000    -    408
          0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
      
      Yeah, it's better for sure. I did 'smartctl -x | grep Percent'; it's easier.

      I mentioned 177 because those are the same numbers that PVE shows in the web UI, and I didn't find the obvious 'wearout/life left' value I'm accustomed to seeing in the SMART attributes.

  • wtallis 6 hours ago

    Building an array of five 1TB QLC drives seems like a really odd decision, like somebody started with the constraint that they must use exactly five SSDs and then tried to optimize for cost.

    The 4TB models obviously will hold up better under 170+ TB of writes than the 1TB drives will, and it wouldn't be surprising to see less write amplification on the larger drives.