Python bindings: add bytes accessors for data #5985

AllSeeingEyeTolledEweSew · 2021-02-17T20:37:32Z

A number of fields in the torrent protocol (or libtorrent) are encoding-agnostic byte strings.

We should ensure there's always a way to access bytes, if bytes may be passed as input, or if the underlying data is encoding-agnostic.

One pattern in Python for this is "runtime templating", where the type of the output follows the type of the input. For instance, os.listdir(".") returns a list of str, while os.listdir(b".") returns a list of bytes. However, the Python bindings have historically accepted either bytes or str as input for char * or std::string, but tend to always return str. If we make the output type vary with the input, we'll probably break a lot of clients.
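As a quick illustration of the os.listdir() behavior mentioned here, the following sketch uses only the standard library. The temporary directory and the file name `example.txt` are made up for the demo.

```python
import os
import tempfile

# Create a throwaway directory containing a single file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "example.txt"), "w") as f:
    f.write("hello")

# A str path in gives str names out.
str_names = os.listdir(demo_dir)

# A bytes path in gives bytes names out.
bytes_names = os.listdir(os.fsencode(demo_dir))

print(str_names)
print(bytes_names)
```

The same call returns a different element type depending purely on the argument type, which is exactly what would surprise existing clients if the bindings adopted it retroactively.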

The only alternative I can think of is to have *_bytes versions of functions.

Here's a proposed list of functions returning str, which should have paired functions returning bytes:

EDIT: I changed my mind about most of these, realizing that the underlying C++ logic munges inputs and presents utf-8 strings. generate_fingerprint() is the only one that seems relevant now.

~~file_storage.symlink(): symlink_bytes()~~
~~file_storage.file_path(): file_path_bytes()~~
~~file_storage.file_name(): file_name_bytes()~~
~~file_storage.name(): name_bytes()~~
generate_fingerprint(): generate_fingerprint_bytes()
~~torrent_info.web_seeds(): web_seeds_bytes() (the "auth" field may be binary data)~~
~~torrent_info.name(): name_bytes()~~
~~torrent_info.creator(): creator_bytes()~~
~~torrent_info.comment(): comment_bytes()~~
~~torrent_info.collections(): collections_bytes()~~

I'll add more as I find them.

Adding these new functions should be a non-breaking change.
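To show why a paired *_bytes accessor is additive, here is a minimal pure-Python sketch. The `Record` class, its `comment()` method, and the choice of UTF-8 decoding with replacement are all hypothetical stand-ins for illustration; they are not the actual libtorrent bindings.

```python
class Record:
    """Hypothetical stand-in for a bound object that stores raw bytes."""

    def __init__(self, raw_comment: bytes) -> None:
        self._raw_comment = raw_comment

    def comment(self) -> str:
        # Existing accessor: same name, same return type as before.
        # The decoding policy here is an assumption made for this sketch.
        return self._raw_comment.decode("utf-8", errors="replace")

    def comment_bytes(self) -> bytes:
        # New paired accessor: exposes the underlying bytes untouched.
        return self._raw_comment


rec = Record(b"caf\xc3\xa9")
old_result = rec.comment()        # existing callers keep getting str
new_result = rec.comment_bytes()  # new callers can opt in to bytes

print(repr(old_result), repr(new_result))
```

Because `comment()` is unchanged, code written against the old API keeps working, and only callers that ask for `comment_bytes()` see bytes.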

Thoughts on file_storage

EDIT: never mind

There's an argument that the fields of file_storage should be given "pathname treatment", and use the filesystem encoding as detailed in #5984. This would make sense when creating torrents off the local filesystem.

However, currently file_storage is used in lots of different contexts. A torrent found randomly on the internet and converted into a torrent_info has a file_storage whose bytes may represent an encoding from some other system.

~~So I think the best thing is to add *_bytes() accessors to file_storage, and clients can apply whatever decoding (or guesses, with fallbacks) as necessary.~~

~~Notably, that means that the proper usage for creating files from the filesystem is a little funky:~~

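import os
import libtorrent as lt

fs = lt.file_storage()
lt.add_files(fs, my_root_dir)
# file_path_bytes() is the accessor proposed in this issue, not an existing binding
for i in range(fs.num_files()):
    # file_storage is abstract, but we know the encoding came from the filesystem
    print(os.fsdecode(fs.file_path_bytes(i)))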

The only alternative I can think of here is that file_storage could hold a flag to remember how its inputs are encoded, such that fs.file_path(i) is the same as os.fsdecode(fs.file_path_bytes(i)) if fs was populated from the filesystem, but I can't see any way to add this feature without making it confusing and error prone.
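For bytes of unknown origin, such as names from a torrent found on the internet, the "guesses, with fallbacks" decoding a client might apply could look roughly like this. The helper name `decode_with_fallbacks` and the candidate encodings are assumptions picked for the sketch; a real client would choose its own list.

```python
def decode_with_fallbacks(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Hypothetical helper: try each candidate encoding in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep undecodable bytes recoverable as surrogate escapes.
    return raw.decode("utf-8", errors="surrogateescape")


utf8_name = decode_with_fallbacks(b"caf\xc3\xa9")  # valid UTF-8
legacy_name = decode_with_fallbacks(b"caf\xe9")    # not UTF-8, decodes as cp1252
opaque_name = decode_with_fallbacks(b"\x81")       # neither, falls through

print(utf8_name, legacy_name, repr(opaque_name))
```

The surrogateescape fallback means the original bytes can still be recovered with `.encode("utf-8", errors="surrogateescape")`, so a wrong guess is never lossy in the last-resort case.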


AllSeeingEyeTolledEweSew · 2021-02-18T01:12:35Z

Also: URIs and HTTP header names and values are meant to be limited to ASCII by various underlying protocols, so IMO it's not important to make these data accessible as bytes. We can do this for API consistency if we want, though.

AllSeeingEyeTolledEweSew · 2021-04-16T18:30:26Z

If my perspective in #5984 (comment) is right, I'll close this.

generate_fingerprint() returning bytes is the only ask that would still be legitimate, but I don't feel strongly about it

arvidn · 2021-04-17T13:42:49Z

It seems right to return bytes there. The tests should also make sure bytes could be used to set the peer id
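A short sketch of why bytes are the natural type here: a BitTorrent peer id is 20 raw bytes, and nothing guarantees they form valid text. The prefix `b"-LT1230-"` and the helper `make_peer_id` are illustrative assumptions, not output from or part of libtorrent.

```python
# Assumed example prefix for illustration only.
FINGERPRINT = b"-LT1230-"


def make_peer_id(fingerprint: bytes, tail: bytes) -> bytes:
    """Hypothetical helper: join a fingerprint and a tail into a 20-byte peer id."""
    peer_id = fingerprint + tail
    if len(peer_id) != 20:
        raise ValueError("peer id must be exactly 20 bytes")
    return peer_id


# The tail deliberately contains bytes that are not valid UTF-8.
peer_id = make_peer_id(FINGERPRINT, bytes([0xFF, 0xFE]) + b"0123456789")

try:
    peer_id.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(len(peer_id), decodes_as_utf8)
```

A str-only API would have to either reject such an id or mangle it, which is why a test that sets the peer id from bytes is a useful check.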

AllSeeingEyeTolledEweSew · 2021-07-30T16:58:36Z

After a lot of discussion, I stripped this down to just adding a bytes version of generate_fingerprint(), which I did in #6349.

arvidn added this to the 1.2.14 milestone Apr 8, 2021

arvidn modified the milestones: 1.2.14, 1.2.15 Jun 7, 2021

AllSeeingEyeTolledEweSew mentioned this issue Jul 30, 2021

Add generate_fingerprint_bytes #6349

Merged

arvidn modified the milestones: 1.2.15, 1.2.16 Dec 27, 2021

arvidn modified the milestones: 1.2.16, 1.2.17 Apr 17, 2022

arvidn modified the milestones: 1.2.17, 1.2.19 Apr 10, 2023
