Skip to content

Commit

Permalink
struct Dav1dFilmGrainData::ar_coeff_shift: 16-byte align to fix a `…
Browse files Browse the repository at this point in the history
…dav1d` alignment bug.

x86 asm uses `psrad` on a pointer to `Dav1dFilmGrainData`::ar_coeff_shift`.
When `psrad`'s shift operand is a memory address, i.e. a `XMMWORD PTR`,
it loads 128 bits from it, shifts by the lower 64 bits,
and requires the 128 bits to be 128-bit/16-byte aligned.

Previously, and still in `dav1d`, `ar_coeff_shift`
is only 8-byte aligned, as is `Dav1dFilmGrainData`.
However, in `dav1d`, `Dav1dFilmGrainData` is part of
`Dav1dFrameHeader`, which is allocated with `malloc`.
`malloc` happens to return 16-byte aligned pointers usually,
but is not required to, so this is UB and will segfault if not aligned.

Due to the `Rav1dFilmGrainData` to `Dav1dFilmGrainData`
conversion done now, the `Dav1dFilmGrainData` is stored on the stack,
and often will not be 16-byte aligned, and thus will often segfault.

To fix this, `ar_coeff_shift` must be 16-byte aligned.
This cannot be done only for the field without changing the offsets of
the subsequent fields, however, so we instead align
`Dav1dFilmGrainData` itself with `#[repr(align(16))]`.
`ar_coeff_shift` is at offset `0xB0`/`176`,
which is divisible by 16.

`psrad` also loads a full 128 bits, not just the 64 bits of
`ar_coeff_shift`, even if it doesn't read them,
so we must ensure that the following 64 bits are also deferenceable.
They indeed are in `dav1d`, but we must be careful,
as `ar_coeff_shift` being the last field would
read 8 bytes out of bounds and be UB.

Note that this bug fix comes before the next commit, which actually triggered the bug
due to putting `Dav1dFilmGrainData` on the stack and at a slightly different spot than previously.
However, this was always a bug and UB all along, just not triggered by a segfault yet.
Putting this bug fix commit first means all the commits will continue to pass all tests.
  • Loading branch information
kkysen committed Nov 3, 2023
1 parent 87ba91f commit 2d2ddfc
Show file tree
Hide file tree
Showing 2 changed files with 39 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,4 @@ target/
/retranspile/

.gdb_history
.gdbinit
38 changes: 38 additions & 0 deletions include/dav1d/headers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1053,8 +1053,11 @@ pub struct Rav1dFilmGrainData {
pub uv_offset: [c_int; 2],
}

/// Must be 16-byte aligned for `psrad` on [`Self::ar_coeff_shift`].
/// See the docs for [`Self::ar_coeff_shift`] for an explanation.
#[derive(Clone)]
#[repr(C)]
#[repr(align(16))]
pub struct Dav1dFilmGrainData {
pub seed: c_uint,
pub num_y_points: c_int,
Expand All @@ -1066,6 +1069,41 @@ pub struct Dav1dFilmGrainData {
pub ar_coeff_lag: c_int,
pub ar_coeffs_y: [i8; 24],
pub ar_coeffs_uv: [[i8; 28]; 2],
/// Must be 16-byte aligned for `psrad`.
///
/// TODO(kkysen) This appears to be a bug in `dav1d`.
///
/// x86 asm uses `psrad` on a pointer to [`Self::ar_coeff_shift`].
/// When `psrad`'s shift operand is a memory address, i.e. a `XMMWORD PTR`,
/// it loads 128 bits from it, shifts by the lower 64 bits,
/// and requires the 128 bits to be 128-bit/16-byte aligned.
///
/// Previously, and still in `dav1d`, [`Self::ar_coeff_shift`]
/// is only 8-byte aligned, as is [`Self`]/[`Dav1dFilmGrainData`].
/// However, in `dav1d`, [`Dav1dFilmGrainData`] is part of
/// [`Dav1dFrameHeader`], which is allocated with [`malloc`].
/// [`malloc`] happens to return 16-byte aligned pointers usually,
/// but is not required to, so this is UB and will segfault if not aligned.
///
/// Due to the [`Rav1dFilmGrainData`] to [`Dav1dFilmGrainData`]
/// conversion done now, the [`Dav1dFilmGrainData`] is stored on the stack,
/// and often will not be 16-byte aligned, and thus will often segfault.
///
/// To fix this, [`Self::ar_coeff_shift`] must be 16-byte aligned.
/// This cannot be done only for the field without changing the offsets of
/// the subsequent fields, however, so we instead align
/// [`Dav1dFilmGrainData`] itself with `#[repr(align(16))]`.
/// [`Self::ar_coeff_shift`] is at offset `0xB0`/`176`,
/// which is divisible by 16.
///
/// `psrad` also loads a full 128 bits, not just the 64 bits of
/// [`Self::ar_coeff_shift`], even if it doesn't read them,
/// so we must ensure that the following 64 bits are also deferenceable.
/// They indeed are in `dav1d`, but we must be careful,
/// as [`Self::ar_coeff_shift`] being the last field would
/// read 8 bytes out of bounds and be UB.
///
/// [`malloc`]: libc::malloc
pub ar_coeff_shift: u64,
pub grain_scale_shift: c_int,
pub uv_mult: [c_int; 2],
Expand Down

0 comments on commit 2d2ddfc

Please sign in to comment.