Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent unrelated files from being overwritten #7138

Closed
glassez opened this issue Oct 10, 2022 · 20 comments
Closed

Prevent unrelated files from being overwritten #7138

glassez opened this issue Oct 10, 2022 · 20 comments
Labels

Comments

@glassez
Copy link
Contributor

glassez commented Oct 10, 2022

(This Issue is extracted from discussion
started at qbittorrent/qBittorrent#7146 (comment).)

Currently, when libtorrent starts processing an existing file (either when adding a new torrent, or when moving an existing one with dont_replace flag, when the file name matches the name of one of the files belonging to the torrent), it "naively" believes that this file belongs to the torrent. And if, when checking hashes, the hash of some piece does not match, then such a piece is considered not downloaded. Because of this behavior, any unrelated file with the matched name is simply considered by libtorrent as not having any downloaded piece, as a result of which such a file is overwritten by a file from the torrent, which is undesirable behavior from the point of view of users.
As mentioned in the linked Issue, it would be nice to make the initial file check smarter, or at least make the behavior described above less "naive".

@glassez
Copy link
Contributor Author

glassez commented Oct 10, 2022

There are three possible variants of the existing file in relation to torrent:

  1. Related file that has all the data, i.e. complete file
  2. Related file that partially has the data, i.e. incomplete file
  3. Unrelated file

So, in order to make the behavior less "naive", libtorrent could set an error and pause the torrent when any unrelated file is found, thereby preventing unrelated files from being unexpectedly overwritten. This will still allow overwriting such a file if necessary, if the user explicitly wants it (by clearing error and resuming the torrent).

@glassez
Copy link
Contributor Author

glassez commented Oct 12, 2022

either when adding a new torrent, or when moving an existing one with dont_replace flag

Apparently, the key action is checking files, so it doesn't matter what it is initiated by, adding a new torrent, moving storage, or just forced rechecking, at the end of it the torrent should get an errored status if at least one unrelated file is found.

@glassez

This comment was marked as off-topic.

@AllSeeingEyeTolledEweSew
Copy link
Contributor

How would libtorrent know if a file is "related" or not?

@arvidn
Copy link
Owner

arvidn commented Oct 14, 2022

For example how I outline above; if a file contains non-zero pieces that fail the hash check (where all blocks are non-zero, naturally)

@glassez
Copy link
Contributor Author

glassez commented Oct 15, 2022

How would libtorrent know if a file is "related" or not?

  1. If file matches the size and all the piece hashes it is exactly "related".
  2. If at least one piece hash matches (especially if size is matches too) it is highly likely "related" file (incompletely downloaded previously).
  3. If no piece hash matches but only size does, it is probably "related" file (only preallocated previously).
  4. If neither size nor any piece hash matches it is highly likely "unrelated" file.

So if it falls in "errored" state at least in case 4. it would be really better than currently. And I believe it should cover most of regular cases.

There also could be some setting for sensitivity level of checking to optionally consider case 3. and case 2. as error.

@AllSeeingEyeTolledEweSew
Copy link
Contributor

It seems like there are edge cases:

  • What if the file was partially downloaded in sparse mode? IIUC the file size may be smaller than the fully-allocated size.
  • Files smaller than 1 piece may not have enough data to hash check

Is there a reason these checks couldn't be done in the app, prior to calling move_storage()?

The goal seems to be to answer "did the user really mean to do X?". That's ill-defined, and may be different for different apps. Libtorrent's codebase is already huge. Personally I think there needs to be a really compelling reason to extend that codebase with fuzzy logic like this.

@glassez
Copy link
Contributor Author

glassez commented Oct 16, 2022

The goal seems to be to answer "did the user really mean to do X?". That's ill-defined, and may be different for different apps.

This is completely wrong. The goal is to prevent something that the user definitely does not want (i.e. overwriting the files that are unrelated to torrent and only have the matched names by accident).

And in case it is not clear from the context, the point is not to prohibit overwriting files in the cases described above (I agree, it is desirable to have clearer conditions for this), but namely to prevent accidental overwriting of files (anyone can still clear the error and resume the torrent).
It can also be a configurable option, so that libtorrent could behave either as before or have a different level of sensitivity to such ambiguous files.

@arvidn
Copy link
Owner

arvidn commented Oct 17, 2022

a change that would be simple to make is; if no_verify_files flag nor the seed_mode flag is set when adding a torrent, any existing file is considered an error.

It would support the following cases:

  1. When starting to seed existing files for the first time: specify the seed_mode flag. Existing files are expected and assumed to match
  2. When starting a download for the first time. Do not set seed_mode nor no_verify_files flags. Existing files are considered an error (with this proposal)
  3. When resuming a torrent from a previous session (either downloading or seeding), set the no_verify_files flag. Existing files are expected to belong to the torrent already

Perhaps a better way would be to add a new flag saying any existing files are to be considered an error. Say, no_overwrite_existing or something like that.

@glassez
Copy link
Contributor Author

glassez commented Oct 17, 2022

3. When resuming a torrent from a previous session

This case is not relevant at all.

As for 1. and 2.
The prerequisite is that user doesn't know enough about the files on disk and content of added torrents. It's either about magnet links, or about some automated ways to add torrents, for example from RSS.
I don't want to worsen the current UX by killing the ability to download previously unfinished torrents (again, I don't mean torrents from a resumed session, but those that were either incompletely downloaded by another torrent client, or transferred from another computer, etc.).

File checking should be done completely as currently and only then some more advanced analysis could be performed. And if it detects any possible unrelated files, then it should prevent them from being overwritten without explicit permission from the user side. That's the whole point.
BTW, now it seems to me that it can even be implemented at the application level (of course, not without using tricks with libtorrent extensions, which I have to use for closer integration with libtorrent in such cases).

@arvidn
Copy link
Owner

arvidn commented Oct 17, 2022

This case is not relevant at all.

You mean it's not relevant to this ticket, right? It is a case that needs to be supported,

@arvidn
Copy link
Owner

arvidn commented Oct 17, 2022

then it should prevent them from being overwritten without explicit permission from the user side.

The "explicit permission" could be in the form of a torrent_flag to add_torrent_params or by clear_error() on the torrent after being stopped with an error.

Did you have any other mechanism in mind?

@glassez
Copy link
Contributor Author

glassez commented Oct 17, 2022

This case is not relevant at all.

You mean it's not relevant to this ticket, right? It is a case that needs to be supported,

It isn't a case that needs to be supported.
Of course unless resume data is rejected and libtorrent checks the files from the scratch.

@arvidn
Copy link
Owner

arvidn commented Oct 17, 2022

It isn't a case that needs to be supported.
Of course unless resume data is rejected and libtorrent checks the files from the scratch.

It was added as part of this discussion: #5163

@glassez
Copy link
Contributor Author

glassez commented Oct 17, 2022

It isn't a case that needs to be supported.
Of course unless resume data is rejected and libtorrent checks the files from the scratch.

It was added as part of this discussion: #5163

Sorry, I'm confused. What do you mean?

I mentioned that the case of restoring torrents from a previous session (neither with no_verify_files flag nor without it) does not apply to this Issue.

@glassez
Copy link
Contributor Author

glassez commented Oct 17, 2022

then it should prevent them from being overwritten without explicit permission from the user side.

The "explicit permission" could be in the form of a torrent_flag to add_torrent_params or by clear_error() on the torrent after being stopped with an error.

Explicit permission to overwrite a "suspicious" files is clear_error() (after some file has been identified as possibly unrelated).
But as I have repeatedly mentioned above, there may also be a global session setting and/or torrent_flag in add_torrent_params to enable/disable this advanced analysis (and optionally adjust its sensitivity).

Did you have any other mechanism in mind?

No.

@EdwinKM
Copy link

EdwinKM commented Dec 7, 2022

Not native English, tried my best though :)

Maybe i am completely missing the obvious here. But we are trying really hard to solve a easy problem the hard way. (With fuzzy detection of related files)
To me the issue is not a libtorrent problem. The clients (Transmission/qbittorent) are downloading all files in 1 directory and if a torrent contains the same file it will overwrite each other data. Relation suggestion to the topic starter at the end.

Easy solution:

  • Create a unique folder for each torrent
  • User can change the final destination but the client should NOT move the data until it is finished.
  • On completion the client will move the download to the destination (without the unique foldername, with the "torrent" name as folder, or the custom folder if the user so set)
  • At this move-moment a collision is also possible. Just rename the file/folder and append a sequential number (aka: .1)

This fixes:

  • Even if torrents have the same structure (aka: torrent1=/dir/file1 & torrent2=/dir/file2) and a user decides to delete a torrent. This will never destroy the data for the other torrent because both folders are stored in a unique folder.
  • Current system is unclear to end users. User is adding a torrent. The client is showing it is downloading to "/path/to/downloads". But actually is downloading to "/path/to/incomplete". If however the user is changing it to "/path/to/downloads2" (because it is learning us to set the destination) the incomplete data is transferred to this folder. And a user can assume the download is thus finished.
  • Not sure about this one, because multiple torrents are writing/reading to the same file. Maybe it is now transferring a lot of bad data to other clients?

Possible downsides:

  • The "force directory" makes it maybe difficult to be backwards compatible. Users with already incomplete downloads who updates to the updated client (only introduce this for new added torrents?)
  • It will be more difficult for users to import their progress in a new installation without the database. Aka the "mapping" between the unique folder (with incomplete data) and the torrent (save torrent in the destination folder itself? This also fixes the issue stated by the topic starter. Nobody has to guess. The data in this folder belongs to the torrent in the same folder)

As stated, i do not have knowledge about the technical side. But this seems a better and simple solution.

@stale
Copy link

stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 18, 2023
@stale stale bot closed this as completed Aug 13, 2023
@calumapplepie
Copy link

@arvidn Should this be fixed in libtorrent? Yes. Will I be submitting a fix for it in qBittorrent instead? Also yes. Was this issue closed for being "stale" without any good reason? You betchya.

@luzpaz
Copy link
Contributor

luzpaz commented Feb 13, 2024

bumping for re-open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants