
Correct implementation of sync requests #154

Open · steve-chavez opened this issue Oct 2, 2024 · 3 comments

@steve-chavez (Member) commented Oct 2, 2024

Problem

Sync requests were previously supported via net.collect_response (now deprecated), which is bad for performance since it polls in an infinite loop with pg_sleep.

From pg_net/sql/pg_net.sql, lines 50 to 76 at 28b26d8:

create or replace function net._await_response(
    request_id bigint
)
    returns bool
    volatile
    parallel safe
    strict
    language plpgsql
as $$
declare
    rec net._http_response;
begin
    while rec is null loop
        select *
        into rec
        from net._http_response
        where id = request_id;

        if rec is null then
            -- Wait 50 ms before checking again
            perform pg_sleep(0.05);
        end if;
    end loop;

    return true;
end;
$$;

Some simple use cases require sync requests (https://github.com/orgs/supabase/discussions/28771). While this is bad for performance, it can help some users get going.

Solution

Now that #139 is merged, we can implement sync requests efficiently by waiting on the socket file descriptor. No need for loops or querying the response table.

This would be implemented as a PROCEDURE, so users can only invoke it with CALL, which prevents combining it with other queries in the same statement. This way it'll be clearer that this is a special (and slow) operation.
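
A minimal sketch of how the user-facing API could look (the procedure name net.wait_response is an assumption, not a settled design):

-- Hypothetical usage; net.wait_response is not an existing pg_net API.
select net.http_get('https://example.com'); -- suppose this returns request id 1

call net.wait_response(1); -- blocks until the response for request 1 arrives

-- Only now is the response guaranteed to be in the response table:
select status_code, content
from net._http_response
where id = 1;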

@MathNerd28 commented

Would it be possible to implement a version that waits on multiple requests?

I'm using pg_cron jobs to import data from an external service using HTTP GET requests. pgsql-http is not an option for me, because these jobs can potentially execute hundreds of requests (in batches); I need them to run asynchronously.

Here's what I use right now:

CREATE OR REPLACE PROCEDURE sync.await_responses(
  request_ids bigint[],
  timeout interval
)
AS $$
DECLARE
  start_time CONSTANT timestamptz := now();
BEGIN
  DROP TABLE IF EXISTS remaining;
  CREATE TEMP TABLE remaining AS
  SELECT id
  FROM unnest(request_ids) requests(id);

  WHILE TRUE LOOP
    -- commit so responses inserted by the background worker become visible
    COMMIT;

    -- if a response has an error of any kind, abort
    IF EXISTS (
      SELECT 1
      FROM
        net._http_response response
        JOIN remaining request ON response.id = request.id
      WHERE
        response.timed_out OR
        response.error_msg IS NOT NULL OR
        NOT (response.status_code = 200 OR response.status_code = 304)
      LIMIT 1
    )
    THEN
      RAISE EXCEPTION 'server responded with error';
    END IF;

    -- if a request returned successfully, remove it from remaining
    DELETE FROM remaining
      USING net._http_response response
    WHERE remaining.id = response.id;

    -- if there are no more requests, return successfully
    IF (SELECT count(*) FROM remaining) = 0 THEN
      RETURN;
    END IF;

    -- timeout if this is taking too long
    IF now() - start_time >= timeout THEN
      RAISE EXCEPTION 'timeout while waiting for responses';
    END IF;

    -- wait for more responses
    PERFORM pg_sleep(0.1);
  END LOOP;
END;
$$ LANGUAGE plpgsql;
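
For reference, invoking it looks something like this (the urls and request ids are placeholders):

-- Enqueue a batch of async GETs; each call returns a request id (say 1 and 2).
SELECT net.http_get('https://example.com/a');
SELECT net.http_get('https://example.com/b');

-- Block until both responses arrive, or error out after 30 seconds.
CALL sync.await_responses(ARRAY[1, 2], interval '30 seconds');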

@steve-chavez (Member, Author) commented

@MathNerd28 Interesting use case. I guess we could have a function like:

call wait_response(array[1,2,3]); -- where the numbers are the request ids

But I'm thinking of a better way that doesn't involve waiting, using the idea in #62:

select cron.schedule('30 3 * * 6', $_$
    select net.http_get(
      url := 'http://domain.com'
    , success_cb := $$
        insert into import_table values ($1, $2, $3) -- $1=status, $2=headers, $3=body
      $$
    );
$_$);

This way we don't have to add a sync function, which could be misused.

@MathNerd28 commented

That's definitely a much cleaner solution that would work for most use cases.

I still think it's worth implementing a wait_responses procedure though, because there may still be times when you need to perform a single action after multiple requests and don't want the hassle of maintaining and cleaning up state. At least give users the option, even if the documentation makes it clear this is not a recommended practice.

An example might be if your post-request action required a lock on a table, and you wanted to minimize the downtime for other write operations.
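
Sketched out (assuming the sync.await_responses procedure above; import_table, its payload column, and the request ids are placeholders):

-- Wait outside the lock, then hold the lock only for the final write.
CALL sync.await_responses(ARRAY[1, 2, 3], interval '1 minute');

BEGIN;
LOCK TABLE import_table IN EXCLUSIVE MODE;
INSERT INTO import_table (payload)
SELECT response.content
FROM net._http_response response
WHERE response.id = ANY (ARRAY[1, 2, 3]);
COMMIT;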
