Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for binding parameters by index #83

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

chands10
Copy link

@chands10 chands10 commented Dec 18, 2024

*Issue number of the reported bug or feature request: #82 *

Describe your changes
Added support for binding by index by using a sequence of parameter values. Also checks if the value provided in this case is an array, which is not supported.
Examples:
handle.execute("select ?", [10])
This would output 10

NOTE: when binding by index % does not need to be escaped in dbapi2. In all other cases using dbapi2 (even if not parameter binding), % needs to be escaped

https://peps.python.org/pep-0249/#id53 supports using a sequence of values with qmark param style

Testing performed
cdb2:
Added test to make sure binding an array by index does not work.
Added test that inserts parameters binded by index, similar to test that inserts parameters binded by value.

dbapi2:
Tested different behavior with %
Tested different sequences. Using list, tuple which should pass, string should fail.
Tested all data types, datetimes, overflow -- similar to binding by name
Tested array binding failure

@chands10 chands10 force-pushed the bind_by_index branch 2 times, most recently from 2149a58 to d666570 Compare December 18, 2024 21:47
@chands10 chands10 changed the title Add support for binding parameters by index Add support for binding parameters by index to cdb2 Dec 18, 2024
@chands10 chands10 force-pushed the bind_by_index branch 2 times, most recently from 3280dfd to 6cc553f Compare December 19, 2024 03:22
@chands10 chands10 marked this pull request as draft January 6, 2025 20:31
@chands10 chands10 force-pushed the bind_by_index branch 8 times, most recently from 1540fdd to ec9ddc5 Compare January 7, 2025 18:35
@chands10 chands10 marked this pull request as ready for review January 7, 2025 18:35
@chands10 chands10 changed the title Add support for binding parameters by index to cdb2 Add support for binding parameters by index Jan 7, 2025
@godlygeek
Copy link
Contributor

OK, I've finally had a chance to review this, and to sit down and think about it a bit more thoroughly. Overall, this looks very good to me. I've pushed a fixup commit that tweaks the wording of some of the documentation and exceptions, but left almost everything else alone. I did change my mind about one thing: I decided to accept other sequence types, rather than just limiting it to list and tuple. It might lead to confusing errors if the user accidentally passes a string, but the advantage of being able to pass a numpy array probably outweighs that, and I don't like our type annotations saying that Sequence is accepted when in actuality only a handful of sequence types are accepted.

While thinking more about it, though, I wonder if we should use :1 and :2 style placeholders instead of ? style. We'd get some advantages from that:

  • PEP 249 says "Module implementors should prefer numeric, named or pyformat over the other formats because these offer more clarity and flexibility.", so the :1 format is preferred over ?
  • We'd implement this by always binding by name, instead of by index - we'd just use the numeric position as the name, which sqlite weirdly but explicitly allows. This means that binding arrays would work, and we could remove all of the caveats about arrays not being able to be bound positionally.

What do you think of that idea, @chands10 ?

@godlygeek
Copy link
Contributor

Ooh, I've also encountered a bug while testing with this locally:

>>> empty_blob = b""
>>> list(hndl.execute("select :blob", {"blob": empty_blob}))
[[b'']]
>>> list(hndl.execute("select ?", [empty_blob]))
[[None]]
>>>

It seems like, when binding by index, an empty blob is being turned into a null, but when binding by name it isn't. I think this might be a bug in cdb2_bind_index itself, and if so, that's another reason to implement positional binding in terms of cdb2_bind_param and cdb2_bind_array...

@godlygeek
Copy link
Contributor

Slightly better demo for that bug:

>>> list(hndl.execute("select typeof(:x)", {"x": b""}))
[['blob']]
>>> list(hndl.execute("select typeof(?)", [b""]))
[['null']]

@chands10
Copy link
Author

chands10 commented Jan 18, 2025

Thanks for reviewing! I just checked and seems like there is one conditional missing regarding blobs in cdb2_bind_index that is present in cdb2_bind_params. Nice find! These two functions should really use the same helper function given that a lot of the code is the same, I will combine these so that one function does not become out of date.

The open source version of cdb2api does not have this problem thankfully.

>>> from comdb2 import dbapi2
>>> conn = dbapi2.connect("testdb", "local", autocommit=True)
>>> cursor=conn.cursor()
>>> cursor.execute("select @hi", {"hi": b""})
<comdb2.dbapi2.Cursor object at 0x7f45092dd960>
>>> cursor.fetchall()
[[b'']]
>>> cursor.execute("select ?", [b""])
<comdb2.dbapi2.Cursor object at 0x7f45092dd960>
>>> cursor.fetchall()
[[b'']]
>>> 

I will reply to other comments next week. Have a good weekend!

@chands10
Copy link
Author

I also think it will be simple to add binding arrays by index to cdb2api. I made a quick copy paste implementation on my local copy of cdb2api to bind index and added functionality to python-comdb2

>>> from comdb2 import cdb2
>>> c = cdb2.CHandle("testdb", "local")
>>> list(c.execute('select 1 in carray(?)', [[1,2,3]]))
BINDING ARRAY BY INDEX
GOT RC 0
[[1]]
>>> list(c.execute('select 1 in carray(?)', [[2,3]]))
BINDING ARRAY BY INDEX
GOT RC 0
[[0]]

@chands10
Copy link
Author

I think it would be confusing to implement binding by index in python using the binding by name functions in cdb2api.
Also the docs state that cdb2_bind_index is faster than cdb2_bind_param so I think it would be good to implement it using the same function https://bloomberg.github.io/comdb2/c_api.html#cdb2_bind_index

Additionally it looks like the sqlite3 python api supports only named params and qmark, and that numeric params will be interpreted as named params https://docs.python.org/3/library/sqlite3.html#how-to-use-placeholders-to-bind-values-in-sql-queries

Personally I am implementing a query replayer program given a sample of queries from various dbs, and need a way to support rerunning all types of queries. Ideally I would like to use python cdb2 for this, but would need it to support using the cdb2_bind_index function so that qmark style queries would also be able to be rerun.

Mike is out of office today but I'll also ask him for his input

@godlygeek
Copy link
Contributor

I think it would be confusing to implement binding by index in python using the binding by name functions in cdb2api.

Hm, I don't think I agree with this. Users shouldn't need to know or care what library functions we use to implement the interface we're exposing. The fact that python-comdb2 even uses libcdb2api at all is an implementation detail, the particular functions we use under the hood doubly so.

Additionally it looks like the sqlite3 python api supports only named params and qmark, and that numeric params will be interpreted as named params

I'm not sure that this really matters, either - except that a small handful of people have tried using in-memory sqlite databases as mocks to stand in for Comdb2 databases. But there are tons of differences between them, so what's one more for the pile...

I would like to use python cdb2 for this, but would need it to support using the cdb2_bind_index function so that qmark style queries would also be able to be rerun.

That's reasonable enough, though... I guess you're aware of which parameter binding API was used by the query you're trying to replay?

I have a slight preference for numeric over qmark, but this use case might be enough to tip me the other way and convince me that qmark is best after all.

Mike is out of office today but I'll also ask him for his input

OK, please do. We can easily implement either qmark or numeric, but whichever one we implement, we'll be stuck on that path and won't be able to make the other one work, at least without the user passing some sort of extra flag. I suppose qmark is good enough - we'll be able to use the ?2 ?1 style for free if we go down the qmark path. It's non-standard, but that's OK, we'd just need to document it. But I do want to point out that we're forever giving up the PEP 249 numeric style by taking that route.

Thanks for reviewing! I just checked and seems like there is one conditional missing regarding blobs in cdb2_bind_index that is present in cdb2_bind_params. Nice find!

I also think it will be simple to add binding arrays by index to cdb2api.

Can you give an ETA for when we could get each of these improvements into the version of cdb2api packaged internally for Bloomberg? I'd be thrilled if we could get array binding working from the start, we could remove big chunks of the documentation in this PR if we had it, and make it easier to announce and talk about this new feature.

@chands10
Copy link
Author

chands10 commented Jan 22, 2025

That's fair, I'll let you know what Mike says once I speak with him! Just was thinking it might be confusing if someone asks us to look at a bug regarding a python query that looks like it's being bound by index in cdb2api but actually isn't.

Yes I would know if a query from the samples is binding parameters by value or index.

My guess is that it would take a few weeks to review and roll out, but I have already made prs to support it!
bloomberg/comdb2#4973

@godlygeek
Copy link
Contributor

Just was thinking it might be confusing if someone asks us to look at a bug

Ah, yeah. Won't be confusing for end users, but you're right, it's a bit confusing for maintainers.

Personally I am implementing a query replayer program

It's not really a goal of python-comdb2 to work as a libcdb2api test driver, so I wouldn't want to make design decisions based solely on this sort of use case... but since I'm pretty on the fence between the two options, I do think this is enough to tip the scales. I think I'm leaning towards going with the qmark approach, but I do still want to hear Mike's opinion before we commit to it.

@mponomar
Copy link

I don't have a strong opinion. The ?-style binding approach seems nicer from the user perspective, since it doesn't require you to count your parameters - a number that'll be used exactly nowhere else. The only advantage of :1-style names is that you can use them out of sequence, but you still have binding by name for that and it clashes with how SQLite interprets parameters (it'd treat them as names).

@mponomar
Copy link

A non-standard (from a PEP perspective) format seems fine since we're not changing the SQL string - the python library just packages up the parameters and leaves their meaning to the database (eg: if we really need to support ^1, ^2 as a format, nothing in the python library needs to change).

@godlygeek
Copy link
Contributor

godlygeek commented Jan 27, 2025

eg: if we really need to support ^1, ^2 as a format, nothing in the python library needs to change

Given that sqlite supporst ?1, ?2 out of the box, we can just use that. The only disadvantage being that it's not a standard placeholder format per PEP 249, but otherwise it seems fine enough.

OK, I'm convinced, let's go with the ? qmark style, and forever abandon my dreams of supporting the :1 numeric style, heh.

@godlygeek
Copy link
Contributor

seems like there is one conditional missing regarding blobs in cdb2_bind_index that is present in cdb2_bind_params. Nice find!

I also think it will be simple to add binding arrays by index to cdb2api.

OK, then, last real decision: should we wait to land this until after these two improvements are in place, or should we land it with documented caveats about the inability to bind arrays and empty blobs, and then revisit the docs/implementation once those improvements are in place?

@chands10
Copy link
Author

Great!

I think it'll be easier if we wait, it shouldn't take too long to get those improvements in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants