Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Variable page size for batch PQ writes #9508

Open
wants to merge 2 commits into
base: queue-timeouts-plus-batches
Choose a base branch
from

Conversation

andrewvc
Copy link
Contributor

@andrewvc andrewvc commented May 1, 2018

Part of #9389 , this implements a variably sized page allowing us to write the entire batch to the given page.

This is in a WIP state, its ready for a pre-review to assess its approach, but needs tests and some code cleanup potentially.

This introduces a new variable and the idea of pageDataCapacity on top of pageCapacity. The pageCapacity is really the page file size, pageDataCapacity is that size - the size of a file header for when we write the batch, since we need to know if it will fit the exact batch size.

@andrewvc andrewvc changed the title Variable page size for batch PQ writes [WIP] Variable page size for batch PQ writes May 1, 2018
@andrewvc andrewvc force-pushed the variable-page-size branch from db1ec23 to 9ea962b Compare May 1, 2018 22:35
First attempt at variable page sizes that passes all tests
@andrewvc andrewvc force-pushed the variable-page-size branch from 9ea962b to 04c0841 Compare May 1, 2018 22:37
@andrewvc andrewvc mentioned this pull request May 2, 2018
3 tasks
@original-brownbear
Copy link
Member

@andrewvc sorry for the delay, taking a look now :)

this.offsetMap = new IntVector();
this.checkSummer = new CRC32();
this.file = dirPath.resolve("page." + pageNum).toFile();
int existingFileSize = (int) file.length();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewvc this is somewhat optimistic I think. We should do a safe cast here (using Guava's Ints for example) here and hard fail if the length exceeds 2GB because we can't mmap that kind of file.
I know what we won't be able to write that kind of file in the first place, so this shouldn't ever come up but with the amount of weird bugs we ran into I think we should have visibility on that kind of problem if it occurs :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to handling the edge cases.

@original-brownbear
Copy link
Member

@andrewvc this LGTM for a merge to a feature branch for sure btw. Wouldn't wanna block you here if this blocks you with my nitpick on the typecast :)

@andrewvc
Copy link
Contributor Author

andrewvc commented May 3, 2018

@original-brownbear I'll do a check against Integer.MAX_VALUE and then merge if that works for you. I agree, we should handle that case.

@original-brownbear
Copy link
Member

@andrewvc sure sounds good :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants