Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ADO string support for mssql #295

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

1Dragoon
Copy link

A common way to connect to MSSQL servers is using an ADO string that tiberius natively supports. Can we have connectorx pass it through? The way I've done it here is pretty simplistic and may not be desirable because the end user may not get the best debugging information from it. Often the ADO string itself is referred to as a connection string, so it was what I tried to use first rather than a URL given the parameter is a string named 'conn'. I had to look through the code to realize that it was looking for a different format.

I'm a noob developer so let me know if there's another preferred way of doing this.

@wangxiaoying
Copy link
Contributor

Hi @1Dragoon , thanks for the PR! I think adding ADO string support will benefit many mssql users!

Other than the construction of mssql source, there are several other places that we need to parse and use the conn string (mainly for python users):
[1]. Identifying the database: https://github.com/sfu-db/connector-x/blob/main/connectorx-python/src/source_router.rs#L48
[2]. Get partition range for mssql: https://github.com/sfu-db/connector-x/blob/main/connectorx-python/src/source_router.rs#L356

To support these, do you think we can encapsulate the ADO string into a valid URI, such as "mssql://{ado string}"? We might need to ask the user to do url encoding on the ado string and decode it before Config::from_ado_string. We also need to have a different way to distinguish between the current conn string and ado string.

@1Dragoon
Copy link
Author

1Dragoon commented Jun 3, 2022

Yeah I'll look into doing those others as well. I'm not sure whether it would be commonly understood as a URI format though. I think it should be relatively easy to distinguish between the two rather than the haphazard trial/error approach I used in my local implementation.

I know one goal of the project is to be very fast though, so one concern of mine is that if we do any excessive parsing then that would work counter to that goal, or are we concerned about performance that much for the setup stage? I don't have any formal training as a developer whatsoever, just purely a hobbyist, so I don't know what most professionals expect.

One approach is we could accept with or without mssql://{ado string}, like for example by looking for i.e. "[Ss]erver=\w+" in the string

@wangxiaoying
Copy link
Contributor

I know one goal of the project is to be very fast though, so one concern of mine is that if we do any excessive parsing then that would work counter to that goal, or are we concerned about performance that much for the setup stage? I don't have any formal training as a developer whatsoever, just purely a hobbyist, so I don't know what most professionals expect.

I think we don’t need to worry about the performance here. Since our targets are scenarios in fetching large datasets, the overhead in parsing the conn string should be negligible.

One approach is we could accept with or without mssql://{ado string}, like for example by looking for i.e. "[Ss]erver=\w

One reason I suggest to encapsulate the ado string into a url is because our workflow is to parse a conn strong into a uri [1] and some of our functions takes the uri object as input.( for example in [2]) If the conn string is in different format we need some refactoring to handle both.

@olivermenschick
Copy link

If I understand correctly, this fixes the issue that many MSSQL have, correct?

@1Dragoon @wangxiaoying Is there anything in the pipeline yet? Can you make a suggestion on how to move forward?

@wangxiaoying
Copy link
Contributor

If I understand correctly, this fixes the issue that many MSSQL have, correct?

@1Dragoon @wangxiaoying Is there anything in the pipeline yet? Can you make a suggestion on how to move forward?

Hi @olivermenschick , I think the next step here is to embed the connection string into the whole process. Like mentioned above, other than really fetching the data, we also use the input url string in other places such as identify the data source and partition the query. Currently we put everything in the Url type, which cannot be converted from the ADO string directly.

One approach is to define a url special scheme for mssql ADO string like mssqlado://{ado string} or mssqlado://{file of the ado string} so that we can wrap it into Url type and know how to parse and configure the connection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants