Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated code formatting and lint (latest ruff, mypy) #91

Merged

Conversation

alexander-beedie
Copy link
Contributor

@alexander-beedie alexander-beedie commented Mar 19, 2024

What is the motivation?

Was taking a look at the SurrealDB Python API, with a view to possibly adding some degree of Polars1 support for the async client/driver (I am one of the Polars devs), and realised that the lint/formatting could be made faster & brought up to date. Seemed like a good way to both look through the code and make an initial contribution 😎👍

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🥂 Improvement (non-breaking change which improves an existing feature)

What does this change do?

  • Updates ruff from 0.0.245 to 0.3.3 and expands lint scope.
  • Updates to the latest mypy version (and pins linter/formatter versions).
  • Drops use of black in favour of ruff format2 (nearly identical but much faster).
  • Updates related pre-commit hooks to use the newer versions (from the official "astral" repo).

Formatting differences

The formatting is (as you'd hope!) essentially the same, with most differences being marginal (such as an initial line break after the leading """ inside multiline docstrings). Some additional lines longer than the 88-char limit were found/fixed. For consistency, applied the formatter to the "tests" and "surrealdb" dirs, and enabled code formatting in docstrings.

Additional linting

Despite the substantial increase in the (potential) lint rules now covered, only a small handful of minor tidy-ups and Exception improvements were actually flagged.

Rules enabled before:

["I","D","N","UP"]

Rules enabled after:

["B","D","E","EM","F","FA","FBT001","I","N","PIE","PT",
 "PTH","RUF","SIM","TCH","TD","TID","TRY","UP","W"]

Exception improvements:

try:
    ...
except Exception as e:
    raise SurrealDbError(e) from None   # << the "from None" is new

Where a SurrealDbError is raised from within another Exception a "from None" designation3 is now added. The result is that the same error, with the same message, is raised, but without the unnecessary/misleading "During handling of the above exception, another exception occurred" context. This noticeably improves signal-to-noise when dealing with Exception traces.

Also:

A few super calls were also automatically simplified:

# before
instance = super(ConnectionController, cls).__call__(*args, **kwargs)
# after
instance = super().__call__(*args, **kwargs)

What is your testing strategy?

  • No functional changes were made, so no new tests were added.

  • Confirmed that the existing unit tests all run successfully.

    Screenshot 2024-03-20 at 09 55 24

Is this related to any issues?

Have you read the Contributing Guidelines?

Footnotes

  1. Polars: https://github.com/pola-rs/polars

  2. Ruff formatter: https://docs.astral.sh/ruff/formatter/

  3. PEP409: https://peps.python.org/pep-0409/#proposal

@maxwellflitton
Copy link
Contributor

@alexander-beedie thanks for the pull request. Looks good. Out of curiosity, what type of things are you planning to do with surrealDB.py and polars? A lot of friends have switched from pandas to polars and are amazed at the speed increases they are getting.

@alexander-beedie
Copy link
Contributor Author

alexander-beedie commented Mar 20, 2024

@alexander-beedie thanks for the pull request. Looks good. Out of curiosity, what type of things are you planning to do with surrealDB.py and polars? A lot of friends have switched from pandas to polars and are amazed at the speed increases they are getting.

Nice to hear!

Initially I'm just looking to allow use of a Surreal client as a connection-equivalent in the Polars read_database function. (If SurrealDB adds Arrow export support then we could have some really efficient integration here, otherwise we'll have to eat the expensive intermediate Python materialisation ;)

Experimenting on a local branch, I have the following test working already, which builds on the about-to-merge Polars support1 for async drivers that I committed yesterday:

from surrealdb import Surreal
import polars as pl
import asyncio

async def get_frame(query: str) -> pl.DataFrame:
    async with Surreal("ws://localhost:8000/rpc") as client:
        await client.use("test", "test")
        return pl.read_database(query=query, connection=client)

asyncio.run(get_frame("SELECT * FROM person LIMIT 1"))

# shape: (1, 5)
# ┌─────────────────────────────┬───────────┬────────┬─────────────────────────┬──────────────┐
# │ id                          ┆ marketing ┆ pass   ┆ tags                    ┆ user         │
# │ ---                         ┆ ---       ┆ ---    ┆ ---                     ┆ ---          │
# │ str                         ┆ bool      ┆ str    ┆ list[str]               ┆ str          │
# ╞═════════════════════════════╪═══════════╪════════╪═════════════════════════╪══════════════╡
# │ person:fvmbtxcv79dveedch4vk ┆ false     ┆ 123456 ┆ ["polars", "surrealdb"] ┆ Stroop Wafel │
# └─────────────────────────────┴───────────┴────────┴─────────────────────────┴──────────────┘

I added KuzuDB support2 to Polars (along with polars DataFrame export on their side) recently, and am interested generally in extending what we can accept beyond the standard relational DBs, in order to unlock some more interesting use-cases.

Footnotes

  1. https://github.com/pola-rs/polars/pull/15162

  2. https://github.com/pola-rs/polars/pull/14822

@maxwellflitton
Copy link
Contributor

Hey! Looping @CarolineMorton who is working on a project with SurrealDB on creating realistic synthetic data for healthcare research. She has used polars a fair bit in the past, and I know she wants the output of the synthetic data to be a polars df.

@CarolineMorton
Copy link

Hey @alexander-beedie

This is great. Thanks for the @ @maxwellflitton.

Thank you for your efforts on integrating SurrealDB with Polars. My project involves importing significant reference data (SNOMED/ICD10) into SurrealDB to leverage its query functionality for navigating through a graph of related information. This includes asynchronously sampling from these nodes and compiling the results into a structure similar to a table, with a preference for exporting these as Polars DataFrames. The use of Arrow in Polars for data handling is a key factor for this preference.

The work to allow a Surreal client to act as a connection in the Polars read_database function sounds really promising. The prospect of SurrealDB incorporating Arrow export capabilities could significantly benefit the "big data" community. Would you be open to discussing potential collaboration on this front?

Cheers

@AlexFrid
Copy link
Contributor

Thanks for the PR @alexander-beedie! 🎉

Adding Polars and Arrow support would be phenomenal 🚀
It's something that's been on my personal wishlist as well, so happy to support in that where I can.

@alexander-beedie
Copy link
Contributor Author

alexander-beedie commented Mar 24, 2024

Thank you for your efforts on integrating SurrealDB with Polars. My project involves importing significant reference data (SNOMED/ICD10) into SurrealDB to leverage its query functionality for navigating through a graph of related information. This includes asynchronously sampling from these nodes and compiling the results into a structure similar to a table, with a preference for exporting these as Polars DataFrames. The use of Arrow in Polars for data handling is a key factor for this preference.

@CarolineMorton, FYI the initial commit just landed: pola-rs/polars#15269. I'd expect us to make a release that contains the new code sometime in the coming week; would be very interested in your feedback and/or feature requests (or bug reports, but lets hope there aren't any ;) as Polars support for async database drivers is also new (landing in 0.20.17).

The work to allow a Surreal client to act as a connection in the Polars read_database function sounds really promising. The prospect of SurrealDB incorporating Arrow export capabilities could significantly benefit the "big data" community. Would you be open to discussing potential collaboration on this front?

Sure, though any Surreal → Arrow engineering work would realistically have to come from the Surreal folks - I'm more than happy to help test/integrate with Polars should such support arrive though! I think it would be a big win on many fronts 😎

@alexander-beedie
Copy link
Contributor Author

alexander-beedie commented Mar 24, 2024

@maxwellflitton: Anything else I need to do to help get this one merged? I already have another small PR pending that will deal with a minor Exception-related issue that I spotted while testing the new Polars read_database support ;)

Copy link
Contributor

@AlexFrid AlexFrid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with @maxwellflitton that its ready to merge, so I'll do that 🙌

@AlexFrid AlexFrid merged commit c8a4c84 into surrealdb:main Mar 25, 2024
5 checks passed
@alexander-beedie alexander-beedie deleted the update-ruff-lint-and-format branch March 25, 2024 13:03
@alexander-beedie
Copy link
Contributor Author

alexander-beedie commented Mar 27, 2024

FYI: @CarolineMorton (and @AlexFrid / @maxwellflitton ;) - pola-rs/polars#15269 just merged. We'll have our initial SurrealDB support, validated locally against both both Surreal ("ws") and SurrealHTTP ("http"), available in our (imminent) 0.20.17 release.

If the Surreal Python SDK/driver is eventually able to return Arrow data then we'll be able to take a significant step forward performance-wise (true zero-copy to Polars), but this should still be a good enabler for plenty of use-cases👌

Once it's released (likely in the next few days), let me know if you try it out and see any rough spots; async driver support as a whole is new to this release, and it will be good to get some "real world" testing (and feel free to tag me directly in any feedback).

@CarolineMorton
Copy link

Fantastic @alexander-beedie - Will check out the polars changes. Excited about moving this all forward.

@AlexFrid Shall we have a chat about adding arrow support some time

@alexander-beedie
Copy link
Contributor Author

✅ Released!
https://github.com/pola-rs/polars/releases/tag/py-0.20.17

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants