Releases: NVIDIA/warp
v1.6.0
Changelog
[1.6.0] - 2025-02-03
Added
- Add preview of Tile Cholesky factorization and solve APIs through `wp.tile_cholesky()`, `tile_cholesky_solve()`, and `tile_diag_add()` (preview APIs are subject to change).
- Support for loading tiles from arrays whose shapes are not multiples of the tile dimensions. Out-of-bounds reads will be zero-filled and out-of-bounds writes will be skipped.
- Support for higher-dimensional (up to 4D) tile shapes and memory operations.
- Add intersection-free self-contact support in `wp.sim.VBDIntegrator` by passing `handle_self_contact=True`. See `warp/examples/sim/example_cloth_self_contact.py` for a usage example.
- Add functions `wp.norm_l1()`, `wp.norm_l2()`, `wp.norm_huber()`, `wp.norm_pseudo_huber()`, and `wp.smooth_normalize()` for vector types to a new `wp.math` module.
- `wp.sim.SemiImplicitIntegrator` and `wp.sim.FeatherstoneIntegrator` now have an optional `friction_smoothing` constructor argument (defaults to 1.0) that controls softness of the friction norm computation.
- Support `assert` statements in kernels (docs). Assertions can only be triggered in `"debug"` mode (GH-366).
- Support CUDA IPC on Linux. Call the `ipc_handle()` method to get an IPC handle for a `wp.Event` or a `wp.array`, and call `wp.from_ipc_handle()` or `wp.event_from_ipc_handle()` in another process to open the handle (docs).
- Add per-module option to disable fused floating point operations, use `wp.set_module_options({"fuse_fp": False})` (GH-379).
- Add per-module option to add CUDA-C line information for profiling, use `wp.set_module_options({"lineinfo": True})`.
- Support operator overloading for `wp.struct` objects by defining `wp.func` functions (GH-392).
- Add built-in function `wp.len()` to retrieve the number of elements for vectors, quaternions, matrices, and arrays (GH-389).
- Add `warp/examples/optim/example_softbody_properties.py` as an optimization example for soft-body properties (GH-419).
- Add `warp/examples/tile/example_tile_walker.py`, which reworks the existing `example_walker.py` to use Warp's tile API for matrix multiplication.
- Add `warp/examples/tile/example_tile_nbody.py` as an example of an N-body simulation using Warp tile primitives.
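The boundary behavior described for tile loads and stores (zero-filled out-of-bounds reads, skipped out-of-bounds writes) can be pictured with a small plain-Python emulation. This is only a sketch of the semantics on nested lists, not Warp code: `load_tile` and `store_tile` are hypothetical helper names, and offsets are assumed to be given in array elements.

```python
def load_tile(array, shape, offset, fill=0.0):
    """Emulate a 2D tile load; reads outside the array return `fill` (zero)."""
    rows, cols = len(array), len(array[0])
    m, n = shape
    i0, j0 = offset
    return [
        [array[i][j] if (0 <= i < rows and 0 <= j < cols) else fill
         for j in range(j0, j0 + n)]
        for i in range(i0, i0 + m)
    ]


def store_tile(array, tile, offset):
    """Emulate a 2D tile store; writes outside the array are skipped."""
    rows, cols = len(array), len(array[0])
    i0, j0 = offset
    for di, tile_row in enumerate(tile):
        for dj, value in enumerate(tile_row):
            i, j = i0 + di, j0 + dj
            if 0 <= i < rows and 0 <= j < cols:
                array[i][j] = value


# A 3x3 array is not a multiple of a 2x2 tile, so the tile at
# element offset (2, 2) hangs over the bottom-right corner.
data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
edge_tile = load_tile(data, shape=(2, 2), offset=(2, 2))

# Storing a full tile at the same offset only touches the one in-bounds element.
out = [[0.0] * 3 for _ in range(3)]
store_tile(out, [[1.0, 2.0], [3.0, 4.0]], offset=(2, 2))
```

Here `edge_tile` holds the single in-bounds value `9.0` padded with zeros, and `out` is modified only at position `[2][2]`.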
Changed
- Breaking: Change `wp.tile_load()` and `wp.tile_store()` indexing behavior so that indices are now specified in terms of array elements instead of tile multiples.
- Breaking: Tile operations now take `shape` and `offset` parameters as tuples, e.g.: `wp.tile_load(array, shape=(m,n), offset=(i,j))`.
- Breaking: Change exception types and error messages thrown by tile functions for improved consistency.
- Add an implicit tile synchronization whenever a shared memory tile's data is reinitialized (e.g. in dynamic loops). This could result in lower performance.
- `wp.Bvh` constructor now supports various construction algorithms via the `constructor` argument, including `"sah"` (Surface Area Heuristics), `"median"`, and `"lbvh"` (docs).
- Improve the query efficiency of `wp.Bvh` and `wp.Mesh`.
- Improve memory consumption, compilation and runtime performance when using in-place vector/matrix assignments in kernels that have `enable_backward` set to `False` (GH-332).
- Vector/matrix/quaternion component `+=` and `-=` operations compile and run faster in the backward pass (GH-332).
- Name files in the kernel cache according to their directory. Previously, all files began with `module_codegen` (GH-431).
- Avoid recompilation of modules when changing `block_dim`.
- `wp.autograd.gradcheck_tape()` now has additional optional arguments `reverse_launches` and `skip_to_launch_index`.
- `wp.autograd.gradcheck()`, `wp.autograd.jacobian()`, and `wp.autograd.jacobian_fd()` now also accept arbitrary Python functions that have Warp arrays as inputs and outputs.
- `update_vbo_transforms` kernel launches in the OpenGL renderer are no longer recorded onto the tape.
- Skip emitting backward functions/kernels in the generated C++/CUDA code when `enable_backward` is set to `False`.
- Emit deprecation warnings for the use of the `owner` and `length` keywords in the `wp.array` initializer.
- Emit deprecation warnings for the use of `wp.mlp()`, `wp.matmul()`, and `wp.batched_matmul()`. Use tile primitives instead.
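For code written against the earlier tile-multiple indexing, the breaking change to `wp.tile_load()` and `wp.tile_store()` amounts to scaling each index by the tile extent along that dimension. The helper below is a plain-Python sketch of that conversion; `tile_index_to_offset` is a hypothetical name and not part of Warp.

```python
def tile_index_to_offset(tile_index, tile_shape):
    """Convert tile-multiple indices into element offsets."""
    if len(tile_index) != len(tile_shape):
        raise ValueError("tile_index and tile_shape must have the same rank")
    return tuple(i * s for i, s in zip(tile_index, tile_shape))


TILE_M, TILE_N = 16, 8

# The tile at tile-multiple index (2, 3) starts at element (32, 24).
offset = tile_index_to_offset((2, 3), (TILE_M, TILE_N))

# Inside a kernel, a call in the new form would pass element offsets directly, e.g.
#   wp.tile_load(array, shape=(TILE_M, TILE_N), offset=(i * TILE_M, j * TILE_N))
```

In practice the multiplication is usually written inline in the kernel, as in the trailing comment, rather than through a helper.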
Fixed
- Fix unintended modification of non-Warp arrays during the backward pass (GH-394).
- Fix so that `wp.Tape.zero()` zeroes gradients passed via the `grads` parameter in `wp.Tape.backward()` (GH-407).
- Fix errors during graph capture caused by module unloading (GH-401).
- Fix potential memory corruption errors when allocating arrays with strides (GH-404).
- Fix `wp.array()` not respecting the target `dtype` and `shape` when the given data is another array with a CUDA interface (GH-363).
- Negative constants evaluate to compile-time constants (GH-403).
- Fix `ImportError` exception being thrown during interpreter shutdown on Windows when using the OpenGL renderer (GH-412).
- Fix the OpenGL renderer not working when multiple instances exist at the same time (GH-385).
- Fix `AttributeError` crash in the OpenGL renderer when moving the camera (GH-426).
- Fix the OpenGL renderer not correctly displaying duplicate capsule, cone, and cylinder shapes (GH-388).
- Fix the overriding of `wp.sim.ModelBuilder` default parameters (GH-429).
- Fix indexing of `wp.tile_extract()` when the block dimension is smaller than the tile size.
- Fix scale and rotation issues with the rock geometry used in the granular collision SDF example (GH-409).
- Fix autodiff Jacobian computation in `wp.autograd.jacobian()` where in some cases gradients were not zeroed-out properly.
- Fix plotting issues in `wp.autograd.jacobian_plot()`.
- Fix the `len()` operator returning the total size of a matrix instead of its first dimension.
- Fix gradient instability in rigid-body contact handling for `wp.sim.SemiImplicitIntegrator` and `wp.sim.FeatherstoneIntegrator` (GH-349).
- Fix overload resolution of generic Warp functions with default arguments.
- Fix rendering of arrows with different `up_axis`, `color` in `OpenGLRenderer` (GH-448).
v1.5.1
Changelog
[1.5.1] - 2025-01-02
Added
- Add PyTorch basics and custom operators notebooks to the `notebooks` directory.
- Update PyTorch interop docs to include section on custom operators (docs).
Fixed
- warp.sim: Fix a bug in which the color-balancing algorithm was not updating the colorings.
- Fix custom colors not being updated when rendering meshes with static topology in OpenGL (GH-343).
- Fix `wp.launch_tiled()` not returning a `Launch` object when passed `record_cmd=True`.
- Fix default arguments not being resolved for `wp.func` when called from Python's runtime (GH-386).
- Array overwrite tracking: Fix issue with not marking arrays passed to `wp.atomic_add()`, `wp.atomic_sub()`, `wp.atomic_max()`, or `wp.atomic_min()` as being written to (GH-378).
- Fix for occasional failure to update `.meta` files into Warp kernel cache on Windows.
- Fix the OpenGL renderer not being able to run without a CUDA device available (GH-344).
- Fix incorrect CUDA driver function versions (GH-402).
v1.5.0
Changelog
[1.5.0] - 2024-12-02
Added
- Support for cooperative tile-based primitives using cuBLASDx and cuFFTDx; see the tile documentation for details.
- Expose a `reversed()` built-in for iterators (GH-311).
- Support for saving Volumes into `.nvdb` files with the `save_to_nvdb` method.
- warp.fem: Add `wp.fem.Trimesh3D` and `wp.fem.Quadmesh3D` geometry types for 3D surfaces with new `example_distortion_energy` example.
- warp.fem: Add `"add"` option to `wp.fem.integrate()` for accumulating integration result to existing output.
- warp.fem: Add `"assembly"` option to `wp.fem.integrate()` for selecting between more memory-efficient or more computationally efficient integration algorithms.
- warp.fem: Add Nédélec (first kind) and Raviart-Thomas vector-valued function spaces providing conforming discretization of `curl` and `div` operators, respectively.
- warp.sim: Add a graph coloring module that supports converting trimesh into a vertex graph and applying coloring. The `wp.sim.ModelBuilder` now includes methods to color particles for use with `wp.sim.VBDIntegrator()`; users should call `builder.color()` before finalizing assets.
- warp.sim: Add support for a per-particle radius for soft-body triangle contact using the `wp.sim.Model.particle_radius` array (docs), replacing the previous hard-coded value of 0.01 (GH-329).
- Add a `particle_radius` parameter to `wp.sim.ModelBuilder.add_cloth_mesh()` and `wp.sim.ModelBuilder.add_cloth_grid()` to set a uniform radius for the added particles.
- Document `wp.array` attributes (GH-364).
- Document time-to-compile tradeoffs when using vector component assignment statements in kernels.
- Add introductory Jupyter notebooks to the `notebooks` directory.
Changed
- Drop support for Python 3.7; Python 3.8 is now the minimum-supported version.
- Promote the `wp.Int`, `wp.Float`, and `wp.Scalar` generic annotation types to the public API.
- warp.fem: Simplify querying neighboring cell quantities when integrating on sides using new `wp.fem.cells()`, `wp.fem.to_inner_cell()`, `wp.fem.to_outer_cell()` operators.
- Show an error message when the type returned by a function differs from its annotation, which would have led to the compilation stage failing.
- Clarify that `wp.randn()` samples a normal distribution of mean 0 and variance 1.
- Raise error when passing more than 32 variadic arguments to the `wp.printf()` built-in.
Fixed
- Fix `place` setting of paddle backend.
- warp.fem: Fix tri-cubic shape functions on quadrilateral meshes.
- warp.fem: Fix caching of integrand kernels when changing code-generation options.
- Fix `wp.expect_neq()` overloads missing for scalar types.
- Fix an error when a `wp.kernel` or a `wp.func` object is annotated to return a `None` value.
- Fix error when reading multi-volume, BLOSC-compressed `.nvdb` files.
- Fix `wp.printf()` erroring out when no variadic arguments are passed (GH-333).
- Fix memory access issues in soft-rigid contact collisions (GH-362).
- Fix gradient propagation for in-place addition/subtraction operations on custom vector-type arrays.
- Fix the OpenGL renderer's window not closing when clicking the X button.
- Fix the OpenGL renderer's camera snapping to a different direction from the initial camera's orientation when first looking around.
- Fix custom colors being ignored when rendering meshes in OpenGL (GH-343).
- Fix topology updates not being supported by the OpenGL renderer.
v1.4.2
Changelog
[1.4.2] - 2024-11-13
Changed
- Make the output of `wp.print()` in backward kernels consistent for all supported data types.
Fixed
- Fix to relax the integer types expected when indexing arrays (regression in `1.3.0`).
- Fix printing vector and matrix adjoints in backward kernels.
- Fix kernel compile error when printing structs.
- Fix an incorrect user function being sometimes resolved when multiple overloads are available with array parameters with different `dtype` values.
- Fix error being raised when static and dynamic for-loops are written in sequence with the same iteration variable names (GH-331).
- Fix an issue with the `Texture Write` node, used in the Mandelbrot Omniverse sample, sometimes erroring out in multi-GPU environments.
- Fix code generation of in-place multiplication and division operations (regression introduced in a69d061) (GH-342).
v1.4.1
Changelog
[1.4.1] - 2024-10-15
Fixed
- Fix `iter_reverse()` not working as expected for ranges with steps other than 1 (GH-311).
- Fix potential out-of-bounds memory access when a `wp.sparse.BsrMatrix` object is reused for storing matrices of different shapes.
- Fix robustness to very low desired tolerance in `wp.fem.utils.symmetric_eigenvalues_qr`.
- Fix invalid code generation error messages when nesting dynamic and static for-loops.
- Fix caching of kernels with static expressions.
- Fix `ModelBuilder.add_builder(builder)` to correctly update `articulation_start` and thereby `articulation_count` when `builder` contains more than one articulation.
- Re-introduced the `wp.rand*()`, `wp.sample*()`, and `wp.poisson()` onto the Python scope to revert a breaking change.
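The `iter_reverse()` fix concerns reversing ranges whose step is not 1, where the last element is generally not `stop - 1`. The plain-Python sketch below shows the arithmetic involved, using Python's own `reversed(range(...))` as the reference. It assumes that Warp's reverse iteration is meant to match Python semantics, and `reversed_range_reference` is a hypothetical name rather than Warp's implementation.

```python
def reversed_range_reference(start, stop, step):
    """Return the values of range(start, stop, step) in reverse order."""
    if step == 0:
        raise ValueError("step must not be zero")

    # Number of elements in the forward range (ceiling division, clamped at 0).
    if step > 0:
        count = max(0, (stop - start + step - 1) // step)
    else:
        count = max(0, (start - stop - step - 1) // (-step))

    # The last forward element is where reverse iteration must begin.
    last = start + (count - 1) * step
    return [last - k * step for k in range(count)]


# range(0, 8, 3) yields 0, 3, 6, so the reverse must start at 6, not at stop - 1 = 7.
values = reversed_range_reference(0, 8, 3)
```

Starting the reverse walk at `stop - 1` is only correct when the step is 1, which is the kind of case the entry above singles out.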
v1.4.0
Changelog
[1.4.0] - 2024-10-01
Added
- Support for a new `wp.static(expr)` function that allows arbitrary Python expressions to be evaluated at the time of function/kernel definition (docs).
- Support for stream priorities to hint to the device that it should process pending work in high-priority streams over pending work in low-priority streams when possible (docs).
- Adaptive sparse grid geometry to `warp.fem` (docs).
- Support for defining `wp.kernel` and `wp.func` objects from within closures.
- Support for defining multiple versions of kernels, functions, and structs without manually assigning unique keys.
- Support for default argument values for user functions decorated with `wp.func`.
- Allow passing custom launch dimensions to `jax_kernel()` (GH-310).
- JAX interoperability examples for sharding and matrix multiplication (docs).
- Interoperability support for the PaddlePaddle ML framework (GH-318).
- Support `wp.mod()` for vector types (GH-282).
- Expose the modulo operator `%` to Python's runtime scalar and vector types.
- Support for fp64 `atomic_add`, `atomic_max`, and `atomic_min` (GH-284).
- Support for quaternion indexing (e.g. `q.w`).
- Support shadowing builtin functions (GH-308).
- Support for redefining function overloads.
- Add an ocean sample to the `omni.warp` extension.
- `warp.sim.VBDIntegrator` now supports body-particle collision.
- Add a contributing guide to the Sphinx docs.
- Add documentation for dynamic code generation (docs).
Changed
- `wp.sim.Model.edge_indices` now includes boundary edges.
- Unexposed `wp.rand*()`, `wp.sample*()`, and `wp.poisson()` from the Python scope.
- Skip unused functions in module code generation, improving performance.
- Avoid reloading modules if their content does not change, improving performance.
- `wp.Mesh.points` is now a property instead of a raw data member; its reference can be changed after the mesh is initialized.
- Improve error message when invalid objects are referenced in a Warp kernel.
- `if`/`else`/`elif` statements with constant conditions are resolved at compile time with no branches being inserted in the generated code.
- Include all non-hidden builtins in the stub file.
- Improve accuracy of symmetric eigenvalues routine in `warp.fem`.
Fixed
- Fix for `wp.func` erroring out when defining a `Tuple` as a return type hint (GH-302).
- Fix array in-place op (`+=`, `-=`) adjoints to compute gradients correctly in the backwards pass.
- Fix vector, matrix in-place assignment adjoints to compute gradients correctly in the backwards pass, e.g.: `v[1] = x`.
- Fix a bug in which Python docstrings would be created as local function variables in generated code.
- Fix a bug with autograd array access validation in functions from different modules.
- Fix a rare crash during error reporting on some systems due to glibc mismatches.
- Handle `--num_tiles 1` in `example_render_opengl.py` (GH-306).
- Fix the computation of body contact forces in `FeatherstoneIntegrator` when bodies and particles collide.
- Fix bug in `FeatherstoneIntegrator` where `eval_rigid_jacobian` could give incorrect results or reach an infinite loop when the body and joint indices were not in the same order. Added `Model.joint_ancestor` to fix the indexing from a joint to its parent joint in the articulation.
- Fix wrong vertex index passed to `add_edges()` called from `ModelBuilder.add_cloth_mesh()` (GH-319).
- Add a workaround for uninitialized memory read warning in the `compute-sanitizer` initcheck tool when using `wp.Mesh`.
- Fix name clashes when Warp functions and structs are returned from Python functions multiple times.
- Fix name clashes between Warp functions and structs defined in different modules.
- Fix code generation errors when overloading generic kernels defined in a Python function.
- Fix issues with unrelated functions being treated as overloads (e.g., closures).
- Fix handling of `stream` argument in `array.__dlpack__()`.
- Fix a bug related to reloading CPU modules.
- Fix a crash when kernel functions are not found in CPU modules.
- Fix conditions not being evaluated as expected in `while` statements.
- Fix printing Boolean and 8-bit integer values.
- Fix array interface type strings used for Boolean and 8-bit integer values.
- Fix initialization error when setting struct members.
- Fix Warp not being initialized upon entering a `wp.Tape` context.
- Use `kDLBool` instead of `kDLUInt` for DLPack interop of Booleans.
v1.3.3
[1.3.3] - 2024-09-04
- Bug fixes
  - Fix an aliasing issue with zero-copy array initialization from NumPy introduced in Warp 1.3.0.
  - Fix `wp.Volume.load_from_numpy()` behavior when `bg_value` is a sequence of values.
v1.3.2
[1.3.2] - 2024-08-30
- Bug fixes
  - Fix accuracy of 3x3 SVD `wp.svd3` with fp64 numbers (GH-281).
  - Fix module hashing when a kernel argument contained a struct array (GH-287).
  - Fix a bug in `wp.bvh_query_ray()` where the direction instead of the reciprocal direction was used (GH-288).
  - Fix errors when launching a CUDA graph after a module is reloaded. Modules that were used during graph capture will no longer be unloaded before the graph is released.
  - Fix a bug in `wp.sim.collide.triangle_closest_point_barycentric()` where the returned barycentric coordinates may be incorrect when the closest point lies on an edge.
  - Fix 32-bit overflow when array shape is specified using `np.int32`.
  - Fix handling of integer indices in the `input_output_mask` argument to `autograd.jacobian` and `autograd.jacobian_fd` (GH-289).
  - Fix `ModelBuilder.collapse_fixed_joints()` to correctly update the body centers of mass and the `ModelBuilder.articulation_start` array.
  - Fix precedence of closure constants over global constants.
  - Fix quadrature point indexing in `wp.fem.ExplicitQuadrature` (regression from 1.3.0).
- Documentation improvements
  - Add missing return types for built-in functions.
  - Clarify that atomic operations also return the previous value.
  - Clarify that `wp.bvh_query_aabb()` returns parts that overlap the bounding volume.
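The documentation note that atomic operations return the previous value describes fetch-and-add semantics: the caller learns what the value was just before its own update. The plain-Python sketch below emulates this with a lock. `AtomicCounter` and `fetch_add` are hypothetical names used only for illustration and are not Warp APIs.

```python
import threading


class AtomicCounter:
    """Minimal fetch-and-add emulation: update the value, return the old one."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        # The lock makes the read-modify-write step indivisible.
        with self._lock:
            previous = self._value
            self._value = previous + delta
            return previous

    @property
    def value(self):
        return self._value


# Each caller receives a distinct previous value, which can serve as a
# unique output slot (the usual reason the return value matters).
counter = AtomicCounter()
slots = [counter.fetch_add(1) for _ in range(4)]
```

After the four calls, `slots` contains the values seen before each increment and `counter.value` holds the total.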
v1.3.1
[1.3.1] - 2024-07-27
- Remove `wp.synchronize()` from PyTorch autograd function example.
- `Tape.check_kernel_array_access()` and `Tape.reset_array_read_flags()` are now private methods.
- Fix reporting unmatched argument types.
v1.3.0
[1.3.0] - 2024-07-25
- Warp Core improvements
  - Update to CUDA 12.x by default (requires NVIDIA driver 525 or newer), please see README.md for commands to install CUDA 11.x binaries for older drivers
  - Add information to the module load print outs to indicate whether a module was compiled `(compiled)`, loaded from the cache `(cached)`, or was unable to be loaded `(error)`.
  - `wp.config.verbose = True` now also prints out a message upon the entry to a `wp.ScopedTimer`.
  - Add `wp.clear_kernel_cache()` to the public API. This is equivalent to `wp.build.clear_kernel_cache()`.
  - Add code-completion support for `wp.config` variables.
  - Remove usage of a static task (thread) index for CPU kernels to address multithreading concerns (GH-224)
  - Improve error messages for unsupported Python operations such as sequence construction in kernels
  - Update `wp.matmul()` CPU fallback to use dtype explicitly in `np.matmul()` call
  - Add support for PEP 563's `from __future__ import annotations` (GH-256).
  - Allow passing external arrays/tensors to `wp.launch()` directly via `__cuda_array_interface__` and `__array_interface__`, up to 2.5x faster conversion from PyTorch
  - Add faster Torch interop path using `return_ctype` argument to `wp.from_torch()`
  - Handle incompatible CUDA driver versions gracefully
  - Add `wp.abs()` and `wp.sign()` for vector types
  - Expose scalar arithmetic operators to Python's runtime (e.g.: `wp.float16(1.23) * wp.float16(2.34)`)
  - Add support for creating volumes with anisotropic transforms
  - Allow users to pass function arguments by keyword in a kernel using standard Python calling semantics
  - Add additional documentation and examples demonstrating `wp.copy()`, `wp.clone()`, and `array.assign()` differentiability
  - Add `__new__()` methods for all class `__del__()` methods to handle when a class instance is created but not instantiated before garbage collection
  - Implement the assignment operator for `wp.quat`
  - Make the geometry-related built-ins available only from within kernels
  - Rename the API-facing query types to remove their `_t` suffix: `wp.BVHQuery`, `wp.HashGridQuery`, `wp.MeshQueryAABB`, `wp.MeshQueryPoint`, and `wp.MeshQueryRay`
  - Add `wp.array(ptr=...)` to allow initializing arrays from pointer addresses inside of kernels (GH-206)
- `warp.autograd` improvements:
  - New `warp.autograd` module with utility functions `gradcheck()`, `jacobian()`, and `jacobian_fd()` for debugging kernel Jacobians (docs)
  - Add array overwrite detection: if `wp.config.verify_autograd_array_access` is true, in-place operations on arrays on the Tape that could break gradient computation will be detected (docs)
  - Fix bug where modification of `@wp.func_replay` functions and native snippets would not trigger module recompilation
  - Add documentation for dynamic loop autograd limitations
- `warp.sim` improvements:
  - Improve memory usage and performance for rigid body contact handling when `self.rigid_mesh_contact_max` is zero (default behavior).
  - The `mask` argument to `wp.sim.eval_fk()` now accepts both integer and boolean arrays to mask articulations.
  - Fix handling of `ModelBuilder.joint_act` in `ModelBuilder.collapse_fixed_joints()` (affected floating-base systems)
  - Fix and improve implementation of `ModelBuilder.plot_articulation()` to visualize the articulation tree of a rigid-body mechanism
  - Fix ShapeInstancer `__new__()` method (missing instance return and `*args` parameter)
  - Fix handling of `upaxis` variable in `ModelBuilder` and the rendering thereof in `OpenGLRenderer`
- `warp.sparse` improvements:
  - Sparse matrix allocations (from `bsr_from_triplets()`, `bsr_axpy()`, etc.) can now be captured in CUDA graphs; exact number of non-zeros can be optionally requested asynchronously.
  - `bsr_assign()` now supports changing block shape (including CSR/BSR conversions)
  - Add Python operator overloads for common sparse matrix operations, e.g. `A += 0.5 * B`, `y = x @ C`
- `warp.fem` new features and fixes:
  - Support for variable number of nodes per element
  - Global `wp.fem.lookup()` operator now supports `wp.fem.Tetmesh` and `wp.fem.Trimesh2D` geometries
  - Simplified defining custom subdomains (`wp.fem.Subdomain`), free-slip boundary conditions
  - New field types: `wp.fem.UniformField`, `wp.fem.ImplicitField` and `wp.fem.NonconformingField`
  - New `streamlines`, `magnetostatics` and `nonconforming_contact` examples, updated `mixed_elasticity` to use a nonlinear model
  - Function spaces can now export VTK-compatible cells for visualization
  - Fixed edge cases with NanoVDB function spaces
  - Fixed differentiability of `wp.fem.PicQuadrature` w.r.t. positions and measures