XPost: linux.debian.bugs.dist
From: pronoiac@gmail.com
Whoops, I hadn't intended to top-post... I'll do it correctly this time.
On Thu, Sep 25, 2025 at 12:57 PM james young wrote:
>
> I'm not sure what timeline to expect for a response.
> Would a tarball of the outer image preserve everything needed for diagnostics?
>
> -James
>
> On Tue, Sep 23, 2025 at 2:10 PM james young wrote:
> >
> > I hit an issue with btrfs compression; I reported it to Debian, which
> > I was using, and they suggested that I take it upstream.
> >
> > Thanks, Salvatore. My apologies to everyone if I misunderstood.
> >
> > -James
> >
> > On Tue, Sep 23, 2025 at 1:50 PM Salvatore Bonaccorso wrote:
> > >
> > > Control: tags -1 + moreinfo
> > >
> > > Hi James,
> > >
> > > On Tue, Sep 23, 2025 at 08:04:25PM +0200, James Young wrote:
> > > > Package: src:linux
> > > > Version: 6.1.129-1
> > > > Severity: normal
> > > > X-Debbugs-Cc: pronoiac@gmail.com
> > > >
> > > > Dear Maintainer,
> > > >
> > > >
> > > > * What led up to the situation?
> > > > We made empty files in a loop, in parallel, under CPU and I/O load.
> > > > > We had an outer Btrfs image file with compression, which contained a Btrfs image file, which contained billions of empty files.
> > > > We wrote around 100TB to the inner image file.
> > > > Around 60TB in, compression quietly shut off.
> > > > We ran out of space; both mounts presented i/o errors.
> > > >
> > > > > * What exactly did you do (or not do) that was effective (or ineffective)?
> > > > * I unmounted the inner and outer images.
> > > > I didn't take note of memory usage before this point.
> > > > > * dump debug info for the outer image - `btrfs inspect-internal dump-tree --dfs ...`
> > > > > * We started a btrfsck. (twice, actually; breadth-first hit memory limits, I think)
> > > > > After that, I learned about `btrfs check`, but didn't interrupt the btrfsck, due to Sunk Cost Fallacy.
> > > > The btrfsck is still running. It's of extremely dubious value now.
> > > > * check the kernel logs
> > > > > * I grepped for btrfs, the mount points, compress, and zstd. I didn't find a smoking gun in the right timeframe.
> > > >
> > > > not done yet:
> > > > * mount the outer image
> > > > * rebooted
> > > > > * tried a newer kernel. we're currently on kernel 6.1.129; we could go to newer 6.1 or 6.12 kernels
> > > > > * redo live file system compression, with e.g. `btrfs filesystem defrag -czstd`
> > > > * fstrim the outer image
> > > >
> > > > goals:
> > > > * work out what happened.
> > > > How can we help?
> > > > * help avoid it happening again, to others
> > > > * salvage what we can
> > > >
> > > > > I've run `bugreport` as a non-privileged user. Let me know if root access would give a fuller picture.
> > >
> > > I believe the best thing you could do here is to contact actually
> > > upstream people directly. get_maintainers and the MAINTAINERS file
> > > has:
> > >
> > > BTRFS FILE SYSTEM
> > > M: Chris Mason
> > > M: Josef Bacik
> > > M: David Sterba
> > > L: linux-btrfs@vger.kernel.org
> > > S: Maintained
> > > W: https://btrfs.readthedocs.io
> > > Q: https://patchwork.kernel.org/project/linux-btrfs/list/
> > > C: irc://irc.libera.chat/btrfs
> > > > T: git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
> > > F: Documentation/filesystems/btrfs.rst
> > > F: fs/btrfs/
> > > F: include/linux/btrfs*
> > > F: include/trace/events/btrfs.h
> > > F: include/uapi/linux/btrfs*
> > >
> > > So I would suggest you to contact above maintainers including the
> > > list.
> > >
> > > Please keep this downstream bugreport as well in the recipients list.
> > >
> > > Regards,
> > > Salvatore
I made a tarball of the file system, then mounted and looked at the
file systems.
I attempted to recompress (with btrfs defrag) and fstrim, with little
success in freeing up space.
I started btrfs check with the progress option; within two hours, it
had gotten to "[2/7] checking extents, 82 items checked".
I confused the extents with the compressed chunk length - 128KiB - so
that seemed woefully low on progress.
Over a week later, it's still "82 items checked".
It's still taking CPU (3% right now) and gigs of memory; it's doing
something, though slowly.
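
To tell "slow but progressing" apart from "stuck", the memory use of the check process could be sampled over time. A minimal Python sketch, assuming a Linux /proc filesystem; the function names are hypothetical and the PID has to be looked up by hand:

```python
import re


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (KiB) from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmrss_kib(pid):
    """Read the resident set size (KiB) of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())
```

Calling `read_vmrss_kib(pid)` every few minutes and logging the result with a timestamp would show whether memory is still growing or has gone flat.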
So, a few questions:
* is this business as usual for a btrfs check?
* is this a clue about what happened?
* is this a symptom?
If this is a useful metric for file system robustness, is this
something I could / should experiment with to shorten?
* run `sync`
* periodically pause writes, to let the buffers empty
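
The two ideas above could be combined in the write loop itself. A minimal Python sketch, not the tooling we actually used; the function name, file naming scheme, and default interval are all assumptions made for illustration:

```python
import os
import time


def create_empty_files(directory, count, sync_every=1000, pause_seconds=0.0):
    """Create `count` empty files; flush buffers every `sync_every` files.

    Returns the number of sync points reached.
    """
    syncs = 0
    for i in range(count):
        path = os.path.join(directory, f"empty-{i:09d}")
        # Mode "x" creates the file and fails if it already exists.
        with open(path, "x"):
            pass
        if (i + 1) % sync_every == 0:
            os.sync()  # ask the kernel to write out dirty buffers
            syncs += 1
            if pause_seconds:
                time.sleep(pause_seconds)  # optional pause so writeback can catch up
    return syncs
```

Whether this changes the behaviour of compression or of a later `btrfs check` is exactly the open question; the sketch only shows where the sync and pause would go.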
[continued in next message]