Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement show_chunks in C and have drop_chunks use it #642

Merged
merged 1 commit into from
Nov 28, 2018

Conversation

Ngalstyan4
Copy link
Contributor

Timescale provides an efficient and easy to use api to drop individual
chunks from timescale database through drop_chunks. This PR builds on
that functionality and through a new show_chunks function gives the
opportunity to see the chunks that would be dropped if drop_chunks was run.
Additionally, it adds a newer_than option to drop_chunks (also supported
by show_chunks) that allows to see/drop chunks in an interval or newer
than a point in time.

This commit includes:
- Implementation of show_chunks in C
- Additional helper functions to work with chunks
- New version of drop_chunks in sql that uses show_chunks. This
also adds a newer_than option to drop_chunks
- More enhanced tests of drop_chunks and new tests for show_chunks

Among other reasons, show_chunks was implemented in C in order
to be able to have both older_than and newer_than arguments be null. This
was not possible in SQL because the arguments had to have polymorphic types
and whether they are used in function body or not, PL/pgSQL requires these
arguments to typecheck.
Implements and resolves #572

Copy link
Contributor Author

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some questions for discussion before merge

sql/ddl_api.sql Outdated
SELECT _timescaledb_internal.time_to_internal(older_than, pg_typeof(older_than)) INTO older_than_internal;
PERFORM _timescaledb_internal.drop_chunks_impl(older_than_internal, table_name, schema_name, cascade);
PERFORM _timescaledb_internal.drop_chunks_impl(older_than, table_name, schema_name, cascade,
truncate_before => FALSE, newer_than_time => newer_than);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is truncate_before argument of drop_chunks_impl for?
it is not ever used and there is no public facing api for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is functionality hooks for an enterprise feature that might not be used anymore. @cevian would know more.

src/chunk.c Outdated Show resolved Hide resolved
src/chunk.c Outdated
if (IS_INTEGER_TYPE(time_column_type) && IS_INTEGER_TYPE(arg_type))
return;

if (arg_type == INTERVALOID)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we give a deprecation warning when drop_chunks or show_chunks is used with INTERVAL argument type?
During discussion of show_chunks api it has come up several times.
Main arguments for deprecating

  • It is as simple as passing now()- interval instead of interval when the option is not supported but makes it more explicit
  • It becomes less clear what exactly it means when there also is a newer_than interval (is it now() - interval or now() + interval ?)

@mfreed @cevian @erimatnor @JLockerman @amytai

@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 2 times, most recently from 1995bf9 to a8e7698 Compare August 6, 2018 17:51
@codecov-io
Copy link

codecov-io commented Aug 6, 2018

Codecov Report

Merging #642 into master will increase coverage by 0.06%.
The diff coverage is 89.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #642      +/-   ##
==========================================
+ Coverage   89.92%   89.99%   +0.06%     
==========================================
  Files          79       80       +1     
  Lines        8827     8978     +151     
==========================================
+ Hits         7938     8080     +142     
- Misses        889      898       +9
Impacted Files Coverage Δ
src/catalog.c 86.66% <ø> (ø) ⬆️
src/dimension_slice.c 99.44% <ø> (ø) ⬆️
src/extension_utils.c 74.28% <ø> (ø) ⬆️
src/planner.c 93.91% <ø> (ø) ⬆️
src/plan_add_hashagg.c 97.09% <ø> (ø) ⬆️
src/compat.c 0% <0%> (ø)
src/dimension.c 96.51% <100%> (-0.14%) ⬇️
src/hypertable.c 88.1% <100%> (+0.19%) ⬆️
src/utils.c 61.9% <78.57%> (-0.05%) ⬇️
src/chunk.c 92.04% <93.75%> (+0.45%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d461959...37fc267. Read the comment docs.

@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 4 times, most recently from daeaf1c to 955977d Compare August 6, 2018 20:15
@@ -0,0 +1 @@
DROP FUNCTION IF EXISTS _timescaledb_internal.dimension_get_time(integer);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on which release this function ends up in, we should remember to put this in relevant update file.

@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 3 times, most recently from a02defd to 9e9c00e Compare August 6, 2018 20:38
DROP FUNCTION IF EXISTS drop_chunks(INTERVAL, NAME, NAME, BOOLEAN);
DROP FUNCTION IF EXISTS drop_chunks(ANYELEMENT, NAME, NAME, BOOLEAN);
DROP FUNCTION IF EXISTS _timescaledb_internal.drop_chunks_impl(BIGINT, NAME, NAME, BOOLEAN, BOOLEAN)
DROP FUNCTION IF EXISTS _timescaledb_internal.dimension_get_time(INTEGER);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on which release this function ends up in, we should remember to put this in relevant update file.

@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 4 times, most recently from 84f42fa to 630971f Compare August 6, 2018 21:38
DROP FUNCTION IF EXISTS drop_chunks(ANYELEMENT, NAME, NAME, BOOLEAN);
DROP FUNCTION IF EXISTS _timescaledb_internal.drop_chunks_impl(BIGINT, NAME, NAME, BOOLEAN, BOOLEAN);
DROP FUNCTION IF EXISTS _timescaledb_internal.drop_chunks_type_check(REGTYPE, NAME, NAME);
DROP FUNCTION IF EXISTS _timescaledb_internal.dimension_get_time(INTEGER);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on which release this function ends up in, we should remember to put this in relevant update file.
Also if we plan a 0.12.0 release, this file should be renamed right @RobAtticus ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may have to rebase and move it there manually, not entirely sure how this will be handled. good job surfacing it early though

sql/ddl_api.sql Show resolved Hide resolved
sql/ddl_api.sql Outdated
SELECT _timescaledb_internal.time_to_internal(older_than, pg_typeof(older_than)) INTO older_than_internal;
PERFORM _timescaledb_internal.drop_chunks_impl(older_than_internal, table_name, schema_name, cascade);
PERFORM _timescaledb_internal.drop_chunks_impl(older_than, table_name, schema_name, cascade,
truncate_before => FALSE, newer_than_time => newer_than);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is functionality hooks for an enterprise feature that might not be used anymore. @cevian would know more.

sql/ddl_api.sql Outdated
-- if and only if all the hypertables in the database
-- have the same type as the given time constraint argument
CREATE OR REPLACE FUNCTION show_chunks(
hypertable_name REGCLASS = NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a REGCLASS the parameter should probably not be hypertable_name since the argument is really an OID (a name gets implicitly converted to the REGCLASS). In other words, you can do show_chunks(12423, ...);

-- specifying the caller name. This makes it easier to taylor
-- error messages to the caller function context.
CREATE OR REPLACE FUNCTION _timescaledb_internal.show_chunks_impl(
hypertable_name REGCLASS = NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some here as above w.r.t. name vs regclass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed the discrepancy.
The reason I went with this was to be compatible with drop_chunks naming.
I think it would be ideal fro the two to have identical APIs so any call of drop_chunks could be refactored into a call of show_chunks by only changing function name.
I am happy to change this to hypertable or hypertable_regclass if you think that is better.
Could also add a drop_chunks implementation with similar API to provide interoperability

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer hypertable. Regarding drop_chunks, it is using two NAME parameters as opposed to a REGCLASS so there _namesuffix is appropriate. Still, I think that drop_chunks should probably take a REGCLASS instead. I don't think there's a good reason why it now takes two NAME params (apart from being legacy and probably my fault).

We should discuss changing drop_chunks to use REGCLASS and I think for most use-cases it would be safe since with both old and new function you can do:

drop_chunks(now() - '1 day', 'mytable');

And only in case of specifying a schema in the old way it would break.

sql/ddl_api.sql Show resolved Hide resolved
src/chunk.c Outdated Show resolved Hide resolved
src/chunk.c Outdated
}
else
{
ht = hypertable_cache_get_entry(hypertable_cache, table_relid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do hypertable_get_by_name here to avoid the cache altogether.

src/chunk.c Outdated
hypertable_cache = hypertable_cache_pin();
if (PG_ARGISNULL(0))
{
hypertable_get_all(&hypertables, CurrentMemoryContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually need only a set of dimension_scans here. But maybe getting the full hypertables are fine, since that's probably what you do in the single case.

src/hypertable.c Outdated
.scandirection = ForwardScanDirection,
.result_mctx = mctx,
.tuplock = {//not sure what correct values here would be
.waitpolicy = LockWaitBlock,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to set anything here if not grabbing tuple locks. But why can't this function simply use hypertable_scan_limit_internal?

src/utils.c Outdated
* set returning function for this to work.
*/
Datum
srf_return_list(FunctionCallInfo fcinfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this as a utility function if its not used elsewhere? I also think this obscures what is going on in the SRF functions that call this list, especially since they also have to do SRF_IS_FIRSTCALL().

@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 2 times, most recently from ee941e3 to d3566b9 Compare August 14, 2018 15:47
Copy link
Contributor Author

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered @erimatnor's questions and made some clarifying comments in the code.
Will squash and rebase once we are done discussing.
Did not rebase now to preserve some of the old comments. Also will need to talk with Rob to see what I need to change in update files for 0.12.

sql/ddl_api.sql Show resolved Hide resolved
sql/ddl_api.sql Show resolved Hide resolved
-- specifying the caller name. This makes it easier to taylor
-- error messages to the caller function context.
CREATE OR REPLACE FUNCTION _timescaledb_internal.show_chunks_impl(
hypertable_name REGCLASS = NULL,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed the discrepancy.
The reason I went with this was to be compatible with drop_chunks naming.
I think it would be ideal fro the two to have identical APIs so any call of drop_chunks could be refactored into a call of show_chunks by only changing function name.
I am happy to change this to hypertable or hypertable_regclass if you think that is better.
Could also add a drop_chunks implementation with similar API to provide interoperability

-- Note that currently this failure is triggered by SQL typecheker when it tries and fails to resolve
-- ANYELEMENT args to a concrete type. Explicit error checking may be necessary if drop_chunks implementation
-- is fully ported to C.
SELECT drop_chunks();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test to make sure drop_chunks without args does not succeed. Discussed this above

@@ -833,6 +852,14 @@ set_complete_chunk(ChunkScanCtx *scanctx, Chunk *chunk)
return false;
}

static bool
chunk_scan_context_add_chunk(ChunkScanCtx *scanctx, Chunk *chunk)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE(erik):

Why is this function name set_all_chunks although only one chunk is set?
chunk_scan_context_append_chunk() seems like more accurate name.
Not even sure why this function is necessary since the chunks are already in the scan context and you can iterate the context similar to how you iterate a list?

What do you mean chunks are already in the scan context?
I could not find a direct way of extracting them and anything I tried involved quite some code repetition. Looks like to do it directly, I would need to repeat a lot of chunk_scan_ctx_foreach_chunk code which includes quite some boilerplate.

Copy link
Contributor

@erimatnor erimatnor Nov 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was that the scan context already contains the chunks you need, so instead of returning a new "container" of the chunks "extracted" from the context, just return the context. So, you would do like follows:

For each hypertable:

  1. Scan for chunks -> return scan context
  2. Add the size of the scan context (# hash table entries) to a cumulative num_chunks count.
  3. Add scan context to a list or array of size num_hypertables

Once done:

  1. Allocate a Chunk * array of size num_chunks
  2. Iterate each scan context and add each chunk to the Chunk * array
  3. Sort the Chunk * array

This would avoid the boilerplate code needed for the Chunks container and avoid a lot of repallocs

* set returning function for this to work.
*/
static Datum
chunks_return_srf(FunctionCallInfo fcinfo)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

Do we really need this as a utility function if its not used elsewhere? I also think this obscures what is going on in the SRF functions that call this list, especially since they also have to do SRF_IS_FIRSTCALL().

refactored and made static
having this exposed made even less sense when it became so Chunk-specific

src/chunk.c Outdated
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("When calling the internal function for show_chunks "
"caller_name cannot be null")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I error here?
I can just add a check and consider this to be same "show_chunks" default case as above?
This C function is registered as 2 SQL functions. when external SQL show_chunks is called, only 3 arguments are passed and this case never arises. This will arise iff the function is called in C with 4 args and the last one is set to NULL explicitly.

src/chunk.c Outdated
Chunk v1 = *((const Chunk *) ch1);
Chunk v2 = *((const Chunk *) ch2);

if (v1.fd.hypertable_id < v2.fd.hypertable_id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be hypertable id or hypertable oid?
@erimatnor in your initial review comment you said hypertable_oid but I thought id would make more sense.

Initially, @cevian suggested to just order everything by chunk oid without considering hypertables since it gives them in creation time order. Just something to think about when deciding the final sort order.

src/chunk.h Outdated Show resolved Hide resolved
src/chunk.c Outdated

if (SRF_IS_FIRSTCALL())
{
MemoryContext oldcontext;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if does 3 main things

  • declare and populate local variables from passed arguments
  • typecheck all these variables
  • for each hypertable call other functions to get chunks

first two take most of the space. the second cannot easily be moved to a different function because each paragraph of code that does checking works with >7 variables from current context and it would not be ideal to pass all these variables around.

One option would be to pass fcinfo of current context to another function and do all the checking there but postgres C function definition guide recommends against this style.
I just though there is nothing complex in this block and most of it is routine typechecking and breaking it up might make it less readable so was not sure how to change.

I did get rid of some list->array ->list conversions which made this slightly shorter.

src/chunk.c Outdated
}
else
{
ht = hypertable_cache_get_entry(hypertable_cache, table_relid);
Copy link
Contributor Author

@Ngalstyan4 Ngalstyan4 Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

You can do hypertable_get_by_name here to avoid the cache altogether.

I am not sure I can do that because the function, unlike drop_chunks takes regclass hypertable oid and not schema NAME and table NAME.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you can easily get the relation name and schema from the relid (e.g., get_rel_name).

@alanhamlett
Copy link

What's the status of this PR? I'm interested in using newer_than with show_chunks and drop_chunks to facilitate exporting and archiving ranges of data.

@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 2 times, most recently from eb5b133 to a4bd4bc Compare October 5, 2018 20:07
Copy link
Contributor

@cevian cevian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Narek, still some stuff to do but I think it's coming along.

sql/ddl_internal.sql Outdated Show resolved Hide resolved
src/chunk.c Outdated Show resolved Hide resolved
src/chunk.h Outdated Show resolved Hide resolved
src/utils.h Outdated Show resolved Hide resolved
src/utils.h Outdated Show resolved Hide resolved
src/chunk.c Outdated Show resolved Hide resolved
foreach(lc, hypertables)
{
ht = lfirst(lc);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we break out line 1190 to line 1245 into a separate function? Another word it would return chunks_added based on a hypertable, time_dimension, older_than, older_than_type, newer_than, newer_than_type, funcctx->multi_call_memory_ctx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I factored it out in a separate function. Let me know what you think. That function takes a lot of arguments and really is useful only if arguments depend directly on user input. I am not sure it would be reusable in other contexts

src/chunk.c Outdated

if (SRF_IS_FIRSTCALL())
{
MemoryContext oldcontext;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can (and should) put the main logic inside this if in a separate function. You can do all the argument checking before passing them into a function. In particular, that separate function will be reusable (I've already refactored it myself in a C implementation of drop_chunks). It's true that the function will take something like 7 variables, but if it is reusable and makes for more readable code, I don't think that matters.

src/chunk.c Outdated Show resolved Hide resolved
@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 3 times, most recently from a705ba7 to 578edb8 Compare November 5, 2018 21:47
Copy link
Contributor Author

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the comments above and added some comments that I would like response to.
Left my individual commits in case you would like to see the diffs between them but will squash before merge.

@@ -82,6 +82,7 @@ set(MOD_FILES
updates/1.0.0-rc2--1.0.0-rc3.sql
updates/1.0.0-rc3--1.0.0.sql
updates/1.0.0--1.0.1-dev.sql
updates/1.0.1--1.1.0.sql
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which update file these will end up in but looks like it will not be 1.0.*

*/
/* qsort comparison function for Oids */
int
oid_cmp(const void *p1, const void *p2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation of chunk sorting uses their hypertable and chunk IDs to sort. So oid_cmp is in fact not used as comparator. I can remove this if you think it is not likely to be useful later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still the case.

foreach(lc, hypertables)
{
ht = lfirst(lc);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I factored it out in a separate function. Let me know what you think. That function takes a lot of arguments and really is useful only if arguments depend directly on user input. I am not sure it would be reusable in other contexts

src/chunk.c Outdated
{
MemoryContext oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

chunks = chunks_alloc(0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added this. Previously, client crashed if there were no hypertables and show_chunks was called.
As an alternative, it would make sense to error in here instead of returning 0 rows. That is what, for example \dt does in psql. Let me know what you think. @cevian

-- specifying the caller name. This makes it easier to taylor
-- error messages to the caller function context.
CREATE OR REPLACE FUNCTION _timescaledb_internal.show_chunks_impl(
hypertable_name REGCLASS = NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer hypertable. Regarding drop_chunks, it is using two NAME parameters as opposed to a REGCLASS so there _namesuffix is appropriate. Still, I think that drop_chunks should probably take a REGCLASS instead. I don't think there's a good reason why it now takes two NAME params (apart from being legacy and probably my fault).

We should discuss changing drop_chunks to use REGCLASS and I think for most use-cases it would be safe since with both old and new function you can do:

drop_chunks(now() - '1 day', 'mytable');

And only in case of specifying a schema in the old way it would break.

src/chunk.c Outdated
}
else
{
ht = hypertable_cache_get_entry(hypertable_cache, table_relid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you can easily get the relation name and schema from the relid (e.g., get_rel_name).

@@ -833,6 +852,14 @@ set_complete_chunk(ChunkScanCtx *scanctx, Chunk *chunk)
return false;
}

static bool
chunk_scan_context_add_chunk(ChunkScanCtx *scanctx, Chunk *chunk)
Copy link
Contributor

@erimatnor erimatnor Nov 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was that the scan context already contains the chunks you need, so instead of returning a new "container" of the chunks "extracted" from the context, just return the context. So, you would do like follows:

For each hypertable:

  1. Scan for chunks -> return scan context
  2. Add the size of the scan context (# hash table entries) to a cumulative num_chunks count.
  3. Add scan context to a list or array of size num_hypertables

Once done:

  1. Allocate a Chunk * array of size num_chunks
  2. Iterate each scan context and add each chunk to the Chunk * array
  3. Sort the Chunk * array

This would avoid the boilerplate code needed for the Chunks container and avoid a lot of repallocs

src/chunk.c Outdated
static List *
chunk_scan_ctx_get_all_chunks(ChunkScanCtx *ctx)
{
ctx->data = chunks_alloc(CHUNKS_DEFAULT_CAPACITY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can just allocate with the number of entries in the hash table within the scan context. I am not sure you even need the expandable Chunks container. I think you might be able to get away with a fixed size Chunk * array without all the boilerplate code for the Chunks type. See also my comment above about this.

@Ngalstyan4 Ngalstyan4 force-pushed the show_chunks branch 3 times, most recently from 256b549 to 3b1d846 Compare November 26, 2018 03:31
Copy link
Contributor Author

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of expanding Chunk-array data structure and made sure all allocations are of predetermined size.

* num_chunks can safely be 0 as palloc protects against unportable
* behavior.
*/
chunks = palloc(sizeof(Chunk *) * num_chunks);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I allocate for Chunk* pointers here? Note that this is called after hash context destroy. It looks like hash context destroy only affects ctx->htab and this allocation works so Chunks are not freed.
But I am not sure where exactly are Chunks structs freed. There is a function for doing that but is never called.
@erimatnor just wanted to draw your attention on this to make sure this is not going to result in some kind of leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the option of actually allocating Chunk structs as opposed to pointers and the behavior in that case will definitely be clearer to understand from the code. However, Chunks are pretty big.
On the other hand, we could make it an array of Oids and it would actually simplify and generalize the chunks_return_srf function. But this becomes problematic if we want to sort the oids in the order defined by chunk_cmp as it uses other info from Chunk for sorting.

src/chunk.c Outdated
{
/* Get all the chunks from the context */
ctxs[i]->data = current;
chunk_scan_ctx_foreach_chunk(ctxs[i], chunk_scan_context_add_chunk, -1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hypertables already come in sorted order so we could technically do this instead of sorting everything later
qsort(current, (Chunk **) ctxs[i]->data - current, sizeof(Chunk *), chunk_cmp);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it is worth it. Also silently breaks if someone changes ordering of hypertables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, if you'd do a partial sort, I a further optimization is probably to combine that with a lazy scan of chunks. So, on firstcall you just scan for hypertables, then in chunks_return_srf you lazily scan for chunks for each hypertable as needed. This would reduce memory usage as you only keep one per-hypertable chunks array at a time and you need not scan some chunks in case of a LIMIT clause.

Just suggesting as feedback, but I don't think it is required that you do that now.

*/
/* qsort comparison function for Oids */
int
oid_cmp(const void *p1, const void *p2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still the case.

CREATE VIEW dependent_view AS SELECT * FROM _timescaledb_internal._hyper_1_1_chunk;
\set ON_ERROR_STOP 0
SELECT drop_chunks(2);
ERROR: cannot drop table _timescaledb_internal._hyper_1_1_chunk because other objects depend on it
SELECT drop_chunks(NULL::interval);
ERROR: can only use drop_chunks with an INTERVAL for TIMESTAMP, TIMESTAMPTZ, and DATE types
ERROR: older_than and newer_than timestamps provided to drop_chunks cannot both be NULL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erimatnor @svenklemm note that the error messages here differ after introducing show_chunks
I think the new ones make more sense in this context but just wanted to get your opinion on this as well.
Tests introduced at #861

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ngalstyan4 Sure, i didnt focus on the error messages themselves only that there are tests for NULL and that NULL doesnt crash the backend or produces weird errors. Feel free to improve the wording of the error messages.

Copy link
Contributor

@erimatnor erimatnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but added some suggestions for clarity fixes.

src/hypertable.c Outdated Show resolved Hide resolved
src/chunk.c Outdated Show resolved Hide resolved
src/chunk.c Outdated Show resolved Hide resolved
src/chunk.c Outdated
{
/* Get all the chunks from the context */
ctxs[i]->data = current;
chunk_scan_ctx_foreach_chunk(ctxs[i], chunk_scan_context_add_chunk, -1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it is worth it. Also silently breaks if someone changes ordering of hypertables.

src/chunk.c Outdated Show resolved Hide resolved
src/chunk.c Show resolved Hide resolved
src/chunk.c Outdated
{
/* Get all the chunks from the context */
ctxs[i]->data = current;
chunk_scan_ctx_foreach_chunk(ctxs[i], chunk_scan_context_add_chunk, -1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, if you'd do a partial sort, I a further optimization is probably to combine that with a lazy scan of chunks. So, on firstcall you just scan for hypertables, then in chunks_return_srf you lazily scan for chunks for each hypertable as needed. This would reduce memory usage as you only keep one per-hypertable chunks array at a time and you need not scan some chunks in case of a LIMIT clause.

Just suggesting as feedback, but I don't think it is required that you do that now.

src/chunk.c Outdated Show resolved Hide resolved
Timescale provides an efficient and easy to use api to drop individual
chunks from timescale database through drop_chunks. This PR builds on
that functionality and through a new show_chunks function gives the
opportunity to see the chunks that would be dropped if drop_chunks was run.
Additionally, it adds a newer_than option to drop_chunks (also supported
by show_chunks) that allows to see/drop chunks in an interval or newer
than a point in time.

This commit includes:
    - Implementation of show_chunks in C
    - Additional helper functions to work with chunks
    - New version of drop_chunks in sql that uses show_chunks. This
      	  also adds a newer_than option to drop_chunks
    - More enhanced tests of drop_chunks and new tests for show_chunks

Among other reasons, show_chunks was implemented in C in order
to be able to have both older_than and newer_than arguments be null. This
was not possible in SQL because the arguments had to have polymorphic types
and whether they are used in function body or not, PL/pgSQL requires these
arguments to typecheck.
@cevian cevian merged commit 9a34028 into timescale:master Nov 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: support for exporting data before deleting it via drop_chunks
8 participants