-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python bindings: add bytes accessors for data #5985
Comments
Also: URIs and HTTP header names and values are meant to be limited to ASCII by various underlying protocols, so IMO it's not important to make these data accessible as bytes. We can do this for API consistency if we want, though. |
If my perspective in #5984 (comment) is right, I'll close this.
|
It seems right to return bytes there. The tests should also make sure bytes could be used to set the peer id |
Per a lot of discussion, I stripped this down to just add a bytes version of |
A number of fields in the torrent protocol (or libtorrent) are encoding-agnostic byte strings.
We should ensure there's always a way to access
bytes
, ifbytes
may be passed as input, or if the underlying data is encoding-agnostic.One pattern in python for this is "runtime templating", where the type of output changes with the type of input. For instance,
os.listdir(".")
returns a list ofstr
, whileos.listdir(b".")
returns a list ofbytes
. However, the python bindings have historically allowed eitherbytes
orstr
as input forchar *
orstd::string
, but tend to always returnstr
. If we make output type vary with input, we'll probably break a lot of clients.The only alternative I can think of is have
*_bytes
versions of functions.Here's a proposed list of functions returning
str
, which should have pair functions returningbytes
:EDIT: I changed my mind about most of these, realizing that the underlying C++ logic munges inputs and presents utf-8 strings.
generate_fingerprint()
is the only one that seems relevant now.file_storage.symlink()
:symlink_bytes()
file_storage.file_path()
:file_path_bytes()
file_storage.file_name()
:file_name_bytes()
file_storage.name()
:name_bytes()
generate_fingerprint()
:generate_fingerprint_bytes()
torrent_info.web_seeds()
:web_seeds_bytes()
(the"auth"
field may be binary data)torrent_info.name()
:name_bytes()
torrent_info.creator()
:creator_bytes()
torrent_info.comment()
:comment_bytes()
torrent_info.collections()
:collections_bytes()
I'll add more as I find them.
Adding these new functions should be a non-breaking change.
Thoughts on file_storage
EDIT: never mind
There's an argument that the fields offile_storage
should be given "pathname treatment", and use the filesystem encoding as detailed in #5984. This would make sense when creating torrents off the local filesystem.However, currentlyfile_storage
is used in lots of different contexts. A torrent found randomly on the internet and converted into atorrent_info
has afile_storage
whose bytes may represent an encoding from some other system.So I think the best thing is to add*_bytes()
accessors tofile_storage
, and clients can apply whatever decoding (or guesses, with fallbacks) as necessary.Notably, that means that the proper usage for creating files from the filesystem is a little funky:The only alternative I can think of here is thatfile_storage
could hold a flag to remember how its inputs are encoded, such thatfs.file_path(i)
is the same asos.fsdecode(fs.file_path_bytes(i))
iffs
was populated from the filesystem, but I can't see any way to add this feature without making it confusing and error prone.The text was updated successfully, but these errors were encountered: