[feature] document #1

Open
wyapx opened this issue Jul 19, 2022 · 15 comments
Labels
documentation Improvements or additions to documentation

Comments

@wyapx

wyapx commented Jul 19, 2022

Please add some documentation.

@johaven

johaven commented Jul 22, 2022

+1 ;)

@synodriver
Owner

The API looks much like the original one that used ctypes. This was actually a Cython teaching project, but the student ran off, so I finished it myself.

synodriver added the documentation label Jul 22, 2022
@synodriver
Owner

Ok will add it soon.

@johaven

johaven commented Jul 22, 2022

Just one question, why not maintain the ctypes version? It's no longer maintained https://github.com/smartfile/python-librsync and it doesn't require any dependencies. I think there is no big difference in performance between CFFI and CTYPES for this library.

@synodriver
Owner

> Just one question, why not maintain the ctypes version? It's no longer maintained https://github.com/smartfile/python-librsync and it doesn't require any dependencies. I think there is no big difference in performance between CFFI and CTYPES for this library.

I'm not the maintainer of the original repo, so I don't know why they stopped maintaining it either. I wrote this wrapper with Cython, which should be far faster than the ctypes one. As for the cffi backend, it is just a copy of the Cython one that runs on PyPy. Both the Cython and cffi backends expose the same functions and signatures.

@synodriver
Owner

And here is a simple example

@synodriver
Owner

Simple usage has been added.

@johaven

johaven commented Jul 25, 2022

> I'm not the maintainer of the original repo, so I don't know why they stopped maintaining it either. I wrote this wrapper with Cython, which should be far faster than the ctypes one. As for the cffi backend, it is just a copy of the Cython one that runs on PyPy. Both the Cython and cffi backends expose the same functions and signatures.

It seems to me that they stopped maintaining the ctypes version because they no longer use the module in their product. The ctypes option is interesting because it imposes no dependency on the code. Moreover, I think offering all 3 solutions would make this library a really good reference. Regarding performance, I would be curious to compare, because the call into the library is rather direct with the ctypes approach: there is very little code between Python and the C functions.

Regarding installation, it would be interesting to use pip's extras (extras_require). That would allow installing with pip install pyrsync[cffi] or pip install pyrsync[cython] and avoid the flags (--use-cffi ...).
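A minimal sketch of how such extras could be declared, assuming a setuptools-based setup.py. The extra names and version pins are illustrative assumptions, not taken from the project. Note that extras only pull in additional dependencies; on their own they cannot decide which backend gets compiled.

```python
# Hypothetical extras table for a setup.py; names and pins are assumptions.
EXTRAS_REQUIRE = {
    "cffi": ["cffi>=1.0.0"],
    "cython": ["cython>=0.29"],
}


def requirements_for(extras):
    """Return the sorted, de-duplicated requirements for the requested extras."""
    unknown = sorted(set(extras) - set(EXTRAS_REQUIRE))
    if unknown:
        raise ValueError("unknown extras: %s" % ", ".join(unknown))
    reqs = set()
    for name in extras:
        reqs.update(EXTRAS_REQUIRE[name])
    return sorted(reqs)


# In setup.py the table would be passed as:
#   setup(..., extras_require=EXTRAS_REQUIRE)
print(requirements_for(["cffi"]))
```

The helper only mirrors what pip does when it resolves an extra; it is there to make the mapping easy to inspect.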

I don't have time to test right now, but the compatible versions of librsync should be documented.
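One way to make the compatible versions explicit is a check at import time. The sketch below assumes the version banner has the form "librsync 2.3.2" (the ctypes code below splits it the same way); the minimum version is a placeholder assumption, not a tested bound.

```python
import re

# Placeholder lower bound; the real supported range has not been validated.
MIN_SUPPORTED = (2, 0, 0)


def parse_librsync_version(banner):
    """Turn a banner such as 'librsync 2.3.2' into the tuple (2, 3, 2)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("unexpected version banner: %r" % banner)
    return tuple(int(group) for group in match.groups())


def is_supported(banner, minimum=MIN_SUPPORTED):
    """True when the parsed version is at least `minimum`."""
    return parse_librsync_version(banner) >= minimum


print(parse_librsync_version("librsync 2.3.2"), is_supported("librsync 2.3.2"))
```

Comparing tuples of integers avoids the string-comparison trap where "2.10.0" sorts before "2.9.0".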

Regarding the ctypes code, I can provide the version that I modified to work with librsync 2.x; I still have to revalidate it against the latest version (2.3.x).

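import ctypes
import ctypes.util
import functools
import sys
import tempfile

# `is_windows` was not defined in the snippet as posted; this definition is an assumption.
is_windows = sys.platform.startswith('win')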

paths = ['lib/librsync', '../lib/librsync', 'librsync', 'librsync1', 'rsync']
if is_windows:
    _librsync = None
    for p in paths:
        try:
            _librsync = ctypes.cdll.LoadLibrary(p)
        except EnvironmentError:
            continue
        else:
            break
    if not _librsync:
        raise ImportError('Could not find librsync, make sure it is installed')
else:
    path = next((ctypes.util.find_library(p) for p in paths if ctypes.util.find_library(p)), None)
    if path:
        try:
            _librsync = ctypes.cdll.LoadLibrary(path)
        except OSError:  # LoadLibrary raises OSError, not ImportError
            raise ImportError('Could not load librsync at "%s"' % path)
    else:
        raise ImportError('Could not find librsync, make sure it is installed')


VERSION = bytes(ctypes.cast(_librsync.rs_librsync_version, ctypes.c_char_p).value).decode().split()[1]

MAX_SPOOL = 1024 ** 2 * 5

TRACE_LEVELS = (0, 1, 2, 3, 4, 5, 6, 7)

RS_DONE = 0
RS_BLOCKED = 1

# Size of the I/O buffers used when running a job, in bytes.
RS_JOB_BLOCKSIZE = 65536
# Default length of strong signatures, in bytes.  The MD4 checksum is truncated to this size.
RS_DEFAULT_STRONG_LEN = 8
# Default block length, if not determined by any other factors (such as file size).
RS_DEFAULT_BLOCK_LEN = 2048

RS_DELTA_MAGIC = 0x72730236  # r s \2 6
RS_MD4_SIG_MAGIC = 0x72730136  # r s \1 6
RS_BLAKE2_SIG_MAGIC = 0x72730137  # r s \1 7

# PREFERRED_MAGIC_HASH = RS_MD4_SIG_MAGIC if parse_version(VERSION) < parse_version('1.0.0') else RS_BLAKE2_SIG_MAGIC

#############################
#  DEFINES FROM librsync.h  #
#############################


# librsync.h: rs_buffers_s
class Buffer(ctypes.Structure):
    _fields_ = [
        ('next_in', ctypes.c_char_p),
        ('avail_in', ctypes.c_size_t),
        ('eof_in', ctypes.c_int),

        ('next_out', ctypes.c_char_p),
        ('avail_out', ctypes.c_size_t),
    ]


# char const *rs_strerror(rs_result r);
_librsync.rs_strerror.restype = ctypes.c_char_p
_librsync.rs_strerror.argtypes = (ctypes.c_int,)

# rs_job_t *rs_sig_begin(size_t block_len, size_t strong_len, rs_magic_number sig_magic);
_librsync.rs_sig_begin.restype = ctypes.c_void_p
_librsync.rs_sig_begin.argtypes = (ctypes.c_size_t, ctypes.c_size_t, ctypes.c_int,)

# rs_job_t *rs_loadsig_begin(rs_signature_t **);
_librsync.rs_loadsig_begin.restype = ctypes.c_void_p
_librsync.rs_loadsig_begin.argtypes = (ctypes.c_void_p,)

# rs_job_t *rs_delta_begin(rs_signature_t *);
_librsync.rs_delta_begin.restype = ctypes.c_void_p
_librsync.rs_delta_begin.argtypes = (ctypes.c_void_p,)

# rs_job_t *rs_patch_begin(rs_copy_cb *, void *copy_arg);
_librsync.rs_patch_begin.restype = ctypes.c_void_p
_librsync.rs_patch_begin.argtypes = (ctypes.c_void_p, ctypes.c_void_p,)

# rs_result rs_build_hash_table(rs_signature_t* sums);
_librsync.rs_build_hash_table.restype = ctypes.c_int
_librsync.rs_build_hash_table.argtypes = (ctypes.c_void_p,)

# rs_result rs_job_iter(rs_job_t *, rs_buffers_t *);
_librsync.rs_job_iter.restype = ctypes.c_int
_librsync.rs_job_iter.argtypes = (ctypes.c_void_p, ctypes.c_void_p,)

# void rs_trace_set_level(rs_loglevel level);
_librsync.rs_trace_set_level.restype = None
_librsync.rs_trace_set_level.argtypes = (ctypes.c_int,)

# void rs_free_sumset(rs_signature_t *);
_librsync.rs_free_sumset.restype = None
_librsync.rs_free_sumset.argtypes = (ctypes.c_void_p,)

# rs_result rs_job_free(rs_job_t *);
_librsync.rs_job_free.restype = ctypes.c_int
_librsync.rs_job_free.argtypes = (ctypes.c_void_p,)

# A function declaration for our read callback.
patch_callback = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_longlong,
                                  ctypes.c_size_t, ctypes.POINTER(Buffer))


class LibrsyncError(Exception):
    def __init__(self, r):
        # rs_strerror returns bytes under Python 3; decode for a readable message.
        super(LibrsyncError, self).__init__(_librsync.rs_strerror(ctypes.c_int(r)).decode())


def seekable(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        s = args[0]
        assert callable(getattr(s, 'seek', None)), 'Must provide seekable file-like object'
        return f(*args, **kwargs)

    return wrapper


def _execute(job, f, o=None):
    """
    Executes a librsync "job" by reading bytes from `f` and writing results to
    `o` if provided. If `o` is omitted, the output is ignored.
    """
    # Re-use the same buffer for output, we will read from it after each
    # iteration.
    out = ctypes.create_string_buffer(RS_JOB_BLOCKSIZE)
    while 1:
        block = f.read(RS_JOB_BLOCKSIZE)
        buff = Buffer()
        # provide the data block via input buffer.
        buff.next_in = ctypes.c_char_p(block)
        buff.avail_in = ctypes.c_size_t(len(block))
        buff.eof_in = ctypes.c_int(not block)
        # Set up our buffer for output.
        buff.next_out = ctypes.cast(out, ctypes.c_char_p)
        buff.avail_out = ctypes.c_size_t(RS_JOB_BLOCKSIZE)
        r = _librsync.rs_job_iter(job, ctypes.byref(buff))
        if o:
            o.write(out.raw[:RS_JOB_BLOCKSIZE - buff.avail_out])
        if r == RS_DONE:
            break
        elif r != RS_BLOCKED:
            raise LibrsyncError(r)
        if buff.avail_in > 0:
            # There is data left in the input buffer, librsync did not consume
            # all of it. Rewind the file a bit so we include that data in our
            # next read. It would be better to simply tack data to the end of
            # this buffer, but that is very difficult in Python.
            f.seek(f.tell() - buff.avail_in)
    if o and callable(getattr(o, 'seek', None)):
        # As a matter of convenience, rewind the output file.
        o.seek(0)
    return o


def debug(level=7):
    assert level in TRACE_LEVELS, "Invalid log level %i" % level
    _librsync.rs_trace_set_level(level)


@seekable
def signature(f,
              s=None,
              block_size=RS_DEFAULT_BLOCK_LEN,
              block_checksum=RS_DEFAULT_STRONG_LEN,
              magic=RS_MD4_SIG_MAGIC):
    """
    Generate a signature for the file `f`. The signature will be written to `s`.
    If `s` is omitted, a temporary file will be used. This function returns the
    signature file `s`. You can specify the size of the blocks using the
    optional `block_size` parameter.
    """
    if s is None:
        s = tempfile.SpooledTemporaryFile(max_size=MAX_SPOOL, mode='wb+')
    job = _librsync.rs_sig_begin(block_size, block_checksum, magic)
    try:
        _execute(job, f, s)
    finally:
        _librsync.rs_job_free(job)
    return s


@seekable
def delta(f, s, d=None):
    """
    Create a delta for the file `f` using the signature read from `s`. The delta
    will be written to `d`. If `d` is omitted, a temporary file will be used.
    This function returns the delta file `d`. All parameters must be file-like
    objects.
    """
    if d is None:
        d = tempfile.SpooledTemporaryFile(max_size=MAX_SPOOL, mode='wb+')
    sig = ctypes.c_void_p()
    try:
        job = _librsync.rs_loadsig_begin(ctypes.byref(sig))
        try:
            _execute(job, s)
        finally:
            _librsync.rs_job_free(job)
        r = _librsync.rs_build_hash_table(sig)
        if r != RS_DONE:
            raise LibrsyncError(r)
        job = _librsync.rs_delta_begin(sig)
        try:
            _execute(job, f, d)
        finally:
            _librsync.rs_job_free(job)
    finally:
        _librsync.rs_free_sumset(sig)
    return d


@seekable
def patch(f, d, o=None):
    """
    Patch the file `f` using the delta `d`. The patched file will be written to
    `o`. If `o` is omitted, a temporary file will be used. This function returns
    the patched file `o`. All parameters should be file-like objects. `f` is
    required to be seekable.
    """
    if o is None:
        o = tempfile.SpooledTemporaryFile(max_size=MAX_SPOOL, mode='wb+')

    @patch_callback
    def read_cb(opaque, pos, length, buff):
        f.seek(pos)
        size_p = ctypes.cast(length, ctypes.POINTER(ctypes.c_size_t)).contents
        size = size_p.value
        block = f.read(size)
        size_p.value = len(block)
        buff_p = ctypes.cast(buff, ctypes.POINTER(ctypes.c_char_p)).contents
        buff_p.value = block
        return RS_DONE

    job = _librsync.rs_patch_begin(read_cb, None)
    try:
        _execute(job, d, o)
    finally:
        _librsync.rs_job_free(job)
    return o

@synodriver
Owner

The --use-cffi flag is only for compilation. If you just want to install the package, pip install python-rsync is enough.

By the way, it is worth pointing out that compiling librsync correctly on Windows is quite difficult, and you will probably get a bunch of link errors, which means you can only use the bundled librsync on Windows. On Linux, users can specify which dynamic library to link against with --use-lib. Have a look at setup.py; the script probably needs some modification, but the package is basically finished. More tests are needed and PRs are welcome.

@synodriver
Owner

As for performance, the built-in ctypes module is itself built on the Python C API, and all type conversions are performed in Python. It is just a wrapper around libffi, so how can it be faster than the native Python C API that Cython compiles against?

As for cffi, I'm using API mode, which also compiles to native Python C API code. The ctypes module is like cffi's ABI mode, so with ffi.dlopen you would get something like ctypes.
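To put a rough number on the per-call cost discussed above, here is a small sketch that times a trivial foreign call made through ctypes against a plain builtin call. It assumes CPython (it uses ctypes.pythonapi) and measures call overhead only; it is not a benchmark of librsync or of the Cython backend.

```python
import ctypes
import timeit

# PyLong_AsLong is part of the CPython C API; ctypes.pythonapi exposes it.
as_long = ctypes.pythonapi.PyLong_AsLong
as_long.argtypes = [ctypes.py_object]
as_long.restype = ctypes.c_long


def per_call_seconds(func, calls=100000):
    """Average wall-clock time of a single func(12345) call."""
    return timeit.timeit(lambda: func(12345), number=calls) / calls


ctypes_cost = per_call_seconds(as_long)
builtin_cost = per_call_seconds(int)
print("ctypes call : %.0f ns" % (ctypes_cost * 1e9))
print("builtin call: %.0f ns" % (builtin_cost * 1e9))
```

Whether this overhead matters in practice depends on how much work each call does. With 64 KiB handed to rs_job_iter per iteration, as in the ctypes code above, the fixed per-call cost may be small next to the hashing work, so only a real end-to-end comparison would settle the question.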

@johaven

johaven commented Jul 25, 2022

Now it's very easy to compile librsync on windows, only two commands:

cmake -A Win32 -D BUILD_RDIFF=OFF -D BUILD_SHARED_LIBS=OFF .

-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.17763.
-- The C compiler identification is MSVC 19.29.30146.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x86/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- DO_RS_TRACE=0
.....
-- Could NOT find LIBB2 (missing: LIBB2_LIBRARY_RELEASE LIBB2_INCLUDE_DIR)
-- Using included blake2 implementation.
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- CMAKE_C_FLAGS  = /DWIN32 /D_WINDOWS /W3 /D_CRT_SECURE_NO_WARNINGS
-- Configuring done
-- Generating done
-- Build files have been written

cmake --build . --config Release

Checking Build System
Building Custom Rule CMakeLists.txt
checksum_test.c
checksum.c
rollsum.c
...
Génération de code en cours...
sumset_test.vcxproj -> librsync-2.3.2\Release\sumset_test.exe
Building Custom Rule librsync-2.3.2/CMakeLists.txt

When I have more time after the holidays, I will try to push a ctypes implementation, if you are OK with that.

@synodriver
Owner

Looks good. So there may be a third backend like pyrsync.backends.ctypes?

@johaven

johaven commented Jul 25, 2022

yes, that would be the idea :)
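With three backends exposing the same functions, selection could be a simple import fallback. This is only a sketch: apart from pyrsync.backends.ctypes, which is the name proposed above, the module names and the preference order are assumptions.

```python
import importlib

# Hypothetical preference order: compiled backends first, ctypes as fallback.
BACKENDS = (
    "pyrsync.backends.cython",
    "pyrsync.backends.cffi",
    "pyrsync.backends.ctypes",
)


def load_first_available(module_names):
    """Import and return the first module in `module_names` that is importable."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of the backends could be imported: %s" % ", ".join(module_names)
    )


# Demonstration with standard-library names so the sketch runs anywhere.
print(load_first_available(["no_such_module_xyz", "json"]).__name__)
```

In the package itself the call would be load_first_available(BACKENDS), with the result re-exported so callers never need to know which backend was picked.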

@synodriver
Owner

Besides, just compiling for windows is not enough. You must also set the visibility of functions so that ctypes/cffi can load the module. I suffered a lot from this.
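A quick way to see whether the visibility problem is present is to check, after loading the library, that the functions a backend needs can actually be resolved. The sketch below uses ctypes for this; the list of required names is taken from the ctypes code earlier in this thread, and the demonstration runs against the interpreter's own library (CPython assumed) because librsync may not be installed.

```python
import ctypes

# Functions the ctypes wrapper above binds; a DLL built without exporting
# them would load fine but fail on first attribute access.
REQUIRED = [
    "rs_sig_begin",
    "rs_loadsig_begin",
    "rs_delta_begin",
    "rs_patch_begin",
    "rs_job_iter",
    "rs_job_free",
]


def missing_symbols(lib, names):
    """Return the names that cannot be resolved in the loaded library."""
    return [name for name in names if not hasattr(lib, name)]


# Demonstration: one symbol CPython exports and one that does not exist.
print(missing_symbols(ctypes.pythonapi, ["Py_GetVersion", "definitely_not_exported"]))
```

For librsync the call would be missing_symbols(ctypes.cdll.LoadLibrary(path), REQUIRED); a non-empty result points at an export problem rather than at a missing file.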

@synodriver
Owner

With extra_objects passed to the setup script, Linux platforms can easily link to a shared library. But I just don't know how to do this on Windows.
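One possible approach on Windows, offered as an untested sketch: instead of extra_objects, pass the library through the libraries and library_dirs arguments of the setuptools Extension, so that the MSVC linker picks up the .lib file produced by the CMake build. The file names used here (rsync.lib, librsync.so) and the directory layout are assumptions.

```python
import os


def link_kwargs(platform, lib_dir):
    """Build platform-specific linker keyword arguments for an Extension."""
    if platform.startswith("win"):
        # MSVC resolves "rsync" to rsync.lib inside library_dirs (assumed name).
        return {"libraries": ["rsync"], "library_dirs": [lib_dir]}
    # Elsewhere, hand the shared object directly to the linker.
    return {"extra_objects": [os.path.join(lib_dir, "librsync.so")]}


# In setup.py this would be used roughly as:
#   Extension("pyrsync.backends.cython._rsync", sources, **link_kwargs(sys.platform, lib_dir))
# where the extension name and `sources` are placeholders.
print(link_kwargs("win32", "build"))
print(link_kwargs("linux", "build"))
```

This only covers the compiled backends. A ctypes or cffi ABI-mode backend loads the DLL at runtime and does not need a link step at all.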
