Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNDB-12503 remove column name restrictions in SAI #1539

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

k-rus
Copy link

@k-rus k-rus commented Jan 30, 2025

What is the issue

Trying to create an SAI or other index on a column, which name is either long or has
non-alphanumeric characters, fails, while the table was successfully
created with such name. This happens if an index name is not provided.
The reason for this limitation is that the column name is used in
the default index name, which is used in the file names.

What does this PR fix and why was it fixed

Fixes https://github.com/riptano/cndb/issues/12503, fixes https://github.com/riptano/cndb/issues/12609

This PR removes the check for column names during an index creation. Thus it allows to create indexes on existing columns with any names including qualified names and long names.

  1. The check was preventing qualified column names with characters, which cannot be used in file names. This was never an issue as the index name construction from table and column names already removes any non-alphanumeric or non-underscore characters.
  2. The check was preventing long column names. This limitation is removed by
  • Adding calculation of maximum amount characters, which can be used in index name, so it will allow to construct file names without exceeding the limit of 255 chars (255 is added as a constant).
  • Then the index name is truncated.
  1. The uniqueness of index names is already fixed by existing implementation, which adds counter prefix when needed.

This change to generated index name does not affect existing indexes as the existing name from IndexMetadata is always serialized to JSON, thus avoiding to re-generating the name.

This fix required to fix CQL tester, since it was assuming that the restriction is in place. So column name regex pattern matcher is fixed in createIndex to allow non-restricted column names. There might still be possible corner cases on some column names used for generating index names, which will not work with createIndex. In such case execute should be used.

It also includes several fixes in affected files:

  • Improve the name and usage of constants.
  • Fix warnings with minor improvements in affected files.

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@k-rus k-rus force-pushed the rf-cndb-12503-sai-syntax branch from 5f71894 to 569922d Compare January 30, 2025 10:20
@k-rus k-rus requested a review from a team January 30, 2025 11:16
@k-rus k-rus force-pushed the rf-cndb-12503-sai-syntax branch from 569922d to 12441d7 Compare January 31, 2025 11:03
private static int calculateAllowedLength(int keyspaceNameLength, int tableNameLength)
{
// Prefixes and suffixes from IndexContext.getFullName
int addedLength1 = keyspaceNameLength + tableNameLength + 2 + 4;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic numbers - can you make them into named constants?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer to avoid magic numbers. In this case I started with constants, but it became unreadable due to many constants used only once and capitalized names does not look readable. I am open for suggestion how to make maintainable and correct code without investing too much time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I exploded the code with named variables for values and as result putting into different functions as recommended by IntelliJ.

Comment on lines 117 to 120
// Prefixes and suffixes constructed by Version.stargazerFileNameFormat
int addedLength2 = SAI_DESCRIPTOR.length() + 4 + IndexComponentType.KD_TREE_POSTING_LISTS.representation.length() + 3;
// Prefixes from Descriptor constructor
addedLength2 += 2 + 2 + SSTableFormat.Type.BTI.name().length() + 16;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: those constant numbers scream they need names

@k-rus k-rus requested review from pkolaczk and a team February 3, 2025 11:42
@k-rus k-rus changed the title CNDB-12503 remove column restrictions in SAI CNDB-12503 remove column name restrictions in SAI Feb 3, 2025
@k-rus k-rus force-pushed the rf-cndb-12503-sai-syntax branch 2 times, most recently from cc08613 to c6cd552 Compare February 3, 2025 17:47
@k-rus
Copy link
Author

k-rus commented Feb 4, 2025

Fixed usage of renames in CNDB: https://github.com/riptano/cndb/pull/12715

@k-rus k-rus requested a review from adelapena February 7, 2025 09:20
*/
private static int calculateGeneratedIndexNameLength(int keyspaceNameLength, int tableNameLength)
{
int uniquenessSuffixLength = 3;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this sets a maximum of 100 indexes with a long name per keyspace. So for example a keyspace with a very long name could have 20 tables with 5 indexes each. It looks like a reasonable limit, albeit I wonder if for peace of mind we should use a max suffix length of 4 chars.

In any case, I would add a comment here about how this limits the number of indexes with a long name, and perhaps put that number in a constant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be fully correct it will be limited to 100 only in the case of long keyspace and table names. An index name is table + column names and it will be trimmed to allow later addition of keyspace and table names.
Currently the limit of 3 chars suffix is soft, i.e., if this condition satisfied, then later it can be more than 3 chars be added as it's not controlled there. It will not fail until the number of chars is big enough, since the name length constant of 222 chars is arbitrary to my understanding and is below the file name limit by OSs.
I avoid this corner case with uniqueness suffix, I think to control the result index name to fit the allowed size as it's now done only for the given index name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I increased to 4 chars, i.e., 999 repetition of the same index name. It should be safe enough. There is still a bug of not validating length of index names, https://github.com/riptano/cndb/issues/13198

k-rus added 16 commits March 3, 2025 16:55
Several fixes:
- Improve the name of the utility function to check database object
names to provide more semantics.
- Remove duplicates of utility functions and constants.
- Fix warnings with minor improvements in affected files.
Also fix CNDB-12609

Trying to create an SAI on a column, which name is either long or has
non-alphanumeric characters, fails, while the table was successfully
created with such name. This happens if an index name is not provided.
The reason for this limitation is that the column name is used in
the default index name, which is used in the file names.

This commit fixes this limitation by removing characters other than
alphabetic, numeric or underscore when creating an index name and
trimming the index name to be acceptable for a file name.
Adds tests with long keyspace and table names.
Improves readability of calculating additional future length by
putting values into variables.
CQLTester.createIndex assumed non-qualified column names. Thus it was
necessary to adjust it, since this PR removes the restriction.
Allowed length of index name is used to check given index in addition to
truncate generated index name. The generated name needs also to take in
account the generated suffix. Thus, splitting in tow functions.
When an index name is generated from a column name, it keeps the same
capitalization for qualified column name, while SQL tester and existing
tests assume that any column name is switched to lower case only.
@k-rus k-rus force-pushed the rf-cndb-12503-sai-syntax branch from 2f4f862 to fc5f50e Compare March 3, 2025 16:06
k-rus added 4 commits March 4, 2025 21:53
After checking through code and test it was found that only one option
is used for file name with index name in it. So it's corrected and the
logic is moved into SAI.

Removed validation of index name length to not break existing indexes.
@k-rus k-rus force-pushed the rf-cndb-12503-sai-syntax branch from ad74ff0 to fa313cb Compare March 5, 2025 13:06
@k-rus k-rus requested a review from adelapena March 5, 2025 13:29
@cassci-bot
Copy link

✔️ Build ds-cassandra-pr-gate/PR-1539 approved by Butler


Approved by Butler
See build details here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants