Add support for binding parameters by index #83

chands10 · 2024-12-18T21:29:19Z

*Issue number of the reported bug or feature request: #82 *

Describe your changes
Added support for binding by index by using a sequence of parameter values. Also checks if the value provided in this case is an array, which is not supported.
Examples:
handle.execute("select ?", [10])
This would output 10

NOTE: when binding by index % does not need to be escaped in dbapi2. In all other cases using dbapi2 (even if not parameter binding), % needs to be escaped

https://peps.python.org/pep-0249/#id53 supports using a sequence of values with qmark param style

Testing performed
cdb2:
Added test to make sure binding an array by index does not work.
Added test that inserts parameters binded by index, similar to test that inserts parameters binded by value.

dbapi2:
Tested different behavior with %
Tested different sequences. Using list, tuple which should pass, string should fail.
Tested all data types, datetimes, overflow -- similar to binding by name
Tested array binding failure

Signed-off-by: Salil Chandra <[email protected]>

godlygeek · 2025-01-17T23:29:05Z

OK, I've finally had a chance to review this, and to sit down and think about it a bit more thoroughly. Overall, this looks very good to me. I've pushed a fixup commit that tweaks the wording of some of the documentation and exceptions, but left almost everything else alone. I did change my mind about one thing: I decided to accept other sequence types, rather than just limiting it to list and tuple. It might lead to confusing errors if the user accidentally passes a string, but the advantage of being able to pass a numpy array probably outweighs that, and I don't like our type annotations saying that Sequence is accepted when in actuality only a handful of sequence types are accepted.

While thinking more about it, though, I wonder if we should use :1 and :2 style placeholders instead of ? style. We'd get some advantages from that:

PEP 249 says "Module implementors should prefer numeric, named or pyformat over the other formats because these offer more clarity and flexibility.", so the :1 format is preferred over ?
We'd implement this by always binding by name, instead of by index - we'd just use the numeric position as the name, which sqlite weirdly but explicitly allows. This means that binding arrays would work, and we could remove all of the caveats about arrays not being able to be bound positionally.

What do you think of that idea, @chands10 ?

godlygeek · 2025-01-17T23:58:22Z

Ooh, I've also encountered a bug while testing with this locally:

>>> empty_blob = b""
>>> list(hndl.execute("select :blob", {"blob": empty_blob}))
[[b'']]
>>> list(hndl.execute("select ?", [empty_blob]))
[[None]]
>>>

It seems like, when binding by index, an empty blob is being turned into a null, but when binding by name it isn't. I think this might be a bug in cdb2_bind_index itself, and if so, that's another reason to implement positional binding in terms of cdb2_bind_param and cdb2_bind_array...

godlygeek · 2025-01-18T00:13:12Z

Slightly better demo for that bug:

>>> list(hndl.execute("select typeof(:x)", {"x": b""}))
[['blob']]
>>> list(hndl.execute("select typeof(?)", [b""]))
[['null']]

chands10 · 2025-01-18T02:45:03Z

Thanks for reviewing! I just checked and seems like there is one conditional missing regarding blobs in cdb2_bind_index that is present in cdb2_bind_params. Nice find! These two functions should really use the same helper function given that a lot of the code is the same, I will combine these so that one function does not become out of date.

The open source version of cdb2api does not have this problem thankfully.

>>> from comdb2 import dbapi2
>>> conn = dbapi2.connect("testdb", "local", autocommit=True)
>>> cursor=conn.cursor()
>>> cursor.execute("select @hi", {"hi": b""})
<comdb2.dbapi2.Cursor object at 0x7f45092dd960>
>>> cursor.fetchall()
[[b'']]
>>> cursor.execute("select ?", [b""])
<comdb2.dbapi2.Cursor object at 0x7f45092dd960>
>>> cursor.fetchall()
[[b'']]
>>>

I will reply to other comments next week. Have a good weekend!

chands10 · 2025-01-18T03:23:26Z

I also think it will be simple to add binding arrays by index to cdb2api. I made a quick copy paste implementation on my local copy of cdb2api to bind index and added functionality to python-comdb2

>>> from comdb2 import cdb2
>>> c = cdb2.CHandle("testdb", "local")
>>> list(c.execute('select 1 in carray(?)', [[1,2,3]]))
BINDING ARRAY BY INDEX
GOT RC 0
[[1]]
>>> list(c.execute('select 1 in carray(?)', [[2,3]]))
BINDING ARRAY BY INDEX
GOT RC 0
[[0]]

chands10 · 2025-01-21T16:11:14Z

I think it would be confusing to implement binding by index in python using the binding by name functions in cdb2api.
Also the docs state that cdb2_bind_index is faster than cdb2_bind_param so I think it would be good to implement it using the same function https://bloomberg.github.io/comdb2/c_api.html#cdb2_bind_index

Additionally it looks like the sqlite3 python api supports only named params and qmark, and that numeric params will be interpreted as named params https://docs.python.org/3/library/sqlite3.html#how-to-use-placeholders-to-bind-values-in-sql-queries

Personally I am implementing a query replayer program given a sample of queries from various dbs, and need a way to support rerunning all types of queries. Ideally I would like to use python cdb2 for this, but would need it to support using the cdb2_bind_index function so that qmark style queries would also be able to be rerun.

Mike is out of office today but I'll also ask him for his input

godlygeek · 2025-01-21T19:09:53Z

I think it would be confusing to implement binding by index in python using the binding by name functions in cdb2api.

Hm, I don't think I agree with this. Users shouldn't need to know or care what library functions we use to implement the interface we're exposing. The fact that python-comdb2 even uses libcdb2api at all is an implementation detail, the particular functions we use under the hood doubly so.

Additionally it looks like the sqlite3 python api supports only named params and qmark, and that numeric params will be interpreted as named params

I'm not sure that this really matters, either - except that a small handful of people have tried using in-memory sqlite databases as mocks to stand in for Comdb2 databases. But there are tons of differences between them, so what's one more for the pile...

I would like to use python cdb2 for this, but would need it to support using the cdb2_bind_index function so that qmark style queries would also be able to be rerun.

That's reasonable enough, though... I guess you're aware of which parameter binding API was used by the query you're trying to replay?

I have a slight preference for numeric over qmark, but this use case might be enough to tip me the other way and convince me that qmark is best after all.

Mike is out of office today but I'll also ask him for his input

OK, please do. We can easily implement either qmark or numeric, but whichever one we implement, we'll be stuck on that path and won't be able to make the other one work, at least without the user passing some sort of extra flag. I suppose qmark is good enough - we'll be able to use the ?2 ?1 style for free if we go down the qmark path. It's non-standard, but that's OK, we'd just need to document it. But I do want to point out that we're forever giving up the PEP 249 numeric style by taking that route.

Thanks for reviewing! I just checked and seems like there is one conditional missing regarding blobs in cdb2_bind_index that is present in cdb2_bind_params. Nice find!

I also think it will be simple to add binding arrays by index to cdb2api.

Can you give an ETA for when we could get each of these improvements into the version of cdb2api packaged internally for Bloomberg? I'd be thrilled if we could get array binding working from the start, we could remove big chunks of the documentation in this PR if we had it, and make it easier to announce and talk about this new feature.

chands10 · 2025-01-22T19:15:18Z

That's fair, I'll let you know what Mike says once I speak with him! Just was thinking it might be confusing if someone asks us to look at a bug regarding a python query that looks like it's being bound by index in cdb2api but actually isn't.

Yes I would know if a query from the samples is binding parameters by value or index.

My guess is that it would take a few weeks to review and roll out, but I have already made prs to support it!
bloomberg/comdb2#4973

godlygeek · 2025-01-22T19:47:29Z

Just was thinking it might be confusing if someone asks us to look at a bug

Ah, yeah. Won't be confusing for end users, but you're right, it's a bit confusing for maintainers.

Personally I am implementing a query replayer program

It's not really a goal of python-comdb2 to work as a libcdb2api test driver, so I wouldn't want to make design decisions based solely on this sort of use case... but since I'm pretty on the fence between the two options, I do think this is enough to tip the scales. I think I'm leaning towards going with the qmark approach, but I do still want to hear Mike's opinion before we commit to it.

mponomar · 2025-01-27T19:31:55Z

I don't have a strong opinion. The ?-style binding approach seems nicer from the user perspective, since it doesn't require you to count your parameters - a number that'll be used exactly nowhere else. The only advantage of :1-style names is that you can use them out of sequence, but you still have binding by name for that and it clashes with how SQLite interprets parameters (it'd treat them as names).

mponomar · 2025-01-27T19:34:08Z

A non-standard (from a PEP perspective) format seems fine since we're not changing the SQL string - the python library just packages up the parameters and leaves their meaning to the database (eg: if we really need to support ^1, ^2 as a format, nothing in the python library needs to change).

godlygeek · 2025-01-27T23:33:31Z

eg: if we really need to support ^1, ^2 as a format, nothing in the python library needs to change

Given that sqlite supporst ?1, ?2 out of the box, we can just use that. The only disadvantage being that it's not a standard placeholder format per PEP 249, but otherwise it seems fine enough.

OK, I'm convinced, let's go with the ? qmark style, and forever abandon my dreams of supporting the :1 numeric style, heh.

godlygeek · 2025-01-27T23:40:40Z

seems like there is one conditional missing regarding blobs in cdb2_bind_index that is present in cdb2_bind_params. Nice find!

I also think it will be simple to add binding arrays by index to cdb2api.

OK, then, last real decision: should we wait to land this until after these two improvements are in place, or should we land it with documented caveats about the inability to bind arrays and empty blobs, and then revisit the docs/implementation once those improvements are in place?

chands10 · 2025-01-28T15:27:37Z

Great!

I think it'll be easier if we wait, it shouldn't take too long to get those improvements in.

Signed-off-by: Salil Chandra <[email protected]>

chands10 force-pushed the bind_by_index branch 2 times, most recently from 2149a58 to d666570 Compare December 18, 2024 21:47

chands10 changed the title ~~Add support for binding parameters by index~~ Add support for binding parameters by index to cdb2 Dec 18, 2024

chands10 force-pushed the bind_by_index branch 2 times, most recently from 3280dfd to 6cc553f Compare December 19, 2024 03:22

chands10 marked this pull request as draft January 6, 2025 20:31

chands10 force-pushed the bind_by_index branch 8 times, most recently from 1540fdd to ec9ddc5 Compare January 7, 2025 18:35

chands10 marked this pull request as ready for review January 7, 2025 18:35

chands10 changed the title ~~Add support for binding parameters by index to cdb2~~ Add support for binding parameters by index Jan 7, 2025

chands10 force-pushed the bind_by_index branch from ec9ddc5 to ac89678 Compare January 7, 2025 19:25

chands10 force-pushed the bind_by_index branch from ac89678 to 3066d8a Compare January 17, 2025 18:30

Add support for binding parameters by index

2486583

Signed-off-by: Salil Chandra <[email protected]>

godlygeek force-pushed the bind_by_index branch from 6d0f8f9 to f095660 Compare January 17, 2025 23:16

fixup! Add support for binding parameters by index

6bba605

godlygeek force-pushed the bind_by_index branch from f095660 to 6bba605 Compare January 17, 2025 23:28

Add support for binding arrays by index

a6a728d

Signed-off-by: Salil Chandra <[email protected]>

chands10 force-pushed the bind_by_index branch from c844cba to a6a728d Compare January 29, 2025 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for binding parameters by index #83

Add support for binding parameters by index #83

chands10 commented Dec 18, 2024 •

edited

Loading

godlygeek commented Jan 17, 2025

godlygeek commented Jan 17, 2025

godlygeek commented Jan 18, 2025

chands10 commented Jan 18, 2025 •

edited

Loading

chands10 commented Jan 18, 2025

chands10 commented Jan 21, 2025

godlygeek commented Jan 21, 2025

chands10 commented Jan 22, 2025 •

edited

Loading

godlygeek commented Jan 22, 2025

mponomar commented Jan 27, 2025

mponomar commented Jan 27, 2025

godlygeek commented Jan 27, 2025 •

edited

Loading

godlygeek commented Jan 27, 2025

chands10 commented Jan 28, 2025

Add support for binding parameters by index #83

Are you sure you want to change the base?

Add support for binding parameters by index #83

Conversation

chands10 commented Dec 18, 2024 • edited Loading

godlygeek commented Jan 17, 2025

godlygeek commented Jan 17, 2025

godlygeek commented Jan 18, 2025

chands10 commented Jan 18, 2025 • edited Loading

chands10 commented Jan 18, 2025

chands10 commented Jan 21, 2025

godlygeek commented Jan 21, 2025

chands10 commented Jan 22, 2025 • edited Loading

godlygeek commented Jan 22, 2025

mponomar commented Jan 27, 2025

mponomar commented Jan 27, 2025

godlygeek commented Jan 27, 2025 • edited Loading

godlygeek commented Jan 27, 2025

chands10 commented Jan 28, 2025

chands10 commented Dec 18, 2024 •

edited

Loading

chands10 commented Jan 18, 2025 •

edited

Loading

chands10 commented Jan 22, 2025 •

edited

Loading

godlygeek commented Jan 27, 2025 •

edited

Loading