Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find_Sequence Function for Arrays, and Array-Like classes #11722

Open
ColinSORourke opened this issue Feb 8, 2025 · 1 comment
Open

Find_Sequence Function for Arrays, and Array-Like classes #11722

ColinSORourke opened this issue Feb 8, 2025 · 1 comment

Comments

@ColinSORourke
Copy link

ColinSORourke commented Feb 8, 2025

Describe the project you are working on

Hi! I'm currently working on a games-adjacent tool, for editing NDS ROM files. Working with these binary files, I often have to search a file's entire bytes for a small (or large) byte subsequence. I use this the most in the compression algorithms, which are fairly niche compression variants, not included in Godot's built-in compression types.

Describe the problem or limitation you are having in your project

While GDScript String's String::Find will search for a substring, GDScript Array's (and Array-Like classes) Array::Find will only search for a singular value, not a subarray.

So while GDScript can do:
"ABCDEF".find("DE")

It cannot do
[1, 2, 3, 4, 5, 6].find([1,2])

Making it more complex, and less performant to try and find a sub-array in a large array.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

I suggest adding an Array::Find_Seq function to GDScript Arrays (and Array-Like classes) which will search for a Sub-Array. (As well as the Reverse variant, Array::RFind_Seq).

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

I am prepared to implement this proposal, and have already implemented some of it in the arrayFindSeq Branch of my Godot Fork. I compiled Godot with my adjusted functions, and ran tests to confirm the intended behaviors, and confirm my intuitions about performance. Most of the relevant code updates are in core/variant/array.cpp, with necessary bindings in core/variant/array.h & core/variant/variant_call.cpp

[Edited for Brevity]
Across the 3 basic tests I performed, the new functions behaved as expected, dramatically out-performing a GDScript function (30x - 45x) to find the sequence, and performing on par (1x) or significantly better (4x) than using String::Find as a workaround.

This is not a complete implementation, a complete implementation would include:

  • Updates to the documentation
  • Implementations of Find_Seq in either Vector.h and/or CowData.cpp, as PackedByteArrays and other ArrayLike classes inherit from those rather than Array.cpp
  • Discussion of exact intended behaviors with Typed Arrays & Reverse Finds
  • Perhaps an implementation of Find_Custom_Sequence
  • Discussion of how to handle the exponential cost of searching an Array of Arrays for a subsequence of Arrays.

If this enhancement will not be used often, can it be worked around with a few lines of script?

[Edited for Brevity]
The workaround options are:

  • Write a GDScript function to perform the search
  • Perform an Array->String Conversion and use String::Find to find the subsequence

I think neither is satisfactory. Writing a function in GDScript produces dramatically worse performance than a native CPP search. Using a String Conversion can make the code rather confusing, if not extremely difficult to even produce a correct result. And in the intended use-cases, the String::Find is often less performant than a C++ Array:Find_Seq, due to having perform significantly more comparisons per array element.

Is there a reason why this should be core and not an add-on in the asset library?

I think it's important that GDScript have robust & performant Array capabilities. These functions will make complex Array interactions more approachable, and make it friendlier to users. I myself was fairly surprised that Strings (a nice wrapper around a form of Array) have a subsequence Find while actual Arrays do not.

@Ivorforce
Copy link

Ivorforce commented Feb 8, 2025

I am actually planning implement the backend of this function anyway! Stemming from a similar background :)

In short, String's find implementation could be abstracted to run on Span instead without loss of efficiency. Packed* arrays and Arrays could then use the same implementation without needing new code.
For me, this is part of a longer term refactor in trying to reduce duplicate code.

I am currently waiting for the Span PR to get merged (hoping for 4.5 dev 1). When that's done, find can be moved out to something like spans::find_subspan, and exposing that to GDScript users could be achieved with just a binding to that from Vector, without needing any new code. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants