
Correct implementation of sync requests #154

Open · steve-chavez opened this issue Oct 2, 2024 · 3 comments

@steve-chavez (Member) commented Oct 2, 2024

Problem

Sync requests were previously supported via net.collect_response (now deprecated), which is bad for performance since it polls in an infinite loop with pg_sleep.

From pg_net/sql/pg_net.sql, lines 50 to 76 at 28b26d8:

create or replace function net._await_response(
    request_id bigint
)
    returns bool
    volatile
    parallel safe
    strict
    language plpgsql
as $$
declare
    rec net._http_response;
begin
    while rec is null loop
        select *
        into rec
        from net._http_response
        where id = request_id;

        if rec is null then
            -- Wait 50 ms before checking again
            perform pg_sleep(0.05);
        end if;
    end loop;

    return true;
end;
$$;

Some simple use cases require sync requests (https://github.com/orgs/supabase/discussions/28771). While this is bad for performance, it can help some users get going.

Solution

Now that #139 is merged, we can implement sync requests efficiently by waiting on the socket file descriptor. No need for loops or querying the response table.

This would be implemented as a PROCEDURE, so users can only invoke it with CALL, which prevents combining it with other queries in the same statement. This way it'll be clearer that this is a special (and slow) operation.
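
A minimal sketch of how the user-facing API could look (the procedure name net.wait_response is an assumption, not a settled design):

-- Hypothetical usage; net.wait_response is not an existing pg_net API.
select net.http_get('https://example.com'); -- suppose this returns request id 1

call net.wait_response(1); -- blocks until the response for request 1 arrives

-- Only now is the response guaranteed to be in the response table:
select status_code, content
from net._http_response
where id = 1;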

@MathNerd28 commented

Would it be possible to implement a version that waits on multiple requests?

I'm using pg_cron jobs to import data from an external service using HTTP GET requests. pgsql-http is not an option for me, because these jobs can potentially execute hundreds of requests (in batches); I need them to run asynchronously.

Here's what I use right now:

CREATE OR REPLACE PROCEDURE sync.await_responses(
  request_ids bigint[],
  timeout interval
)
AS $$
DECLARE
  start_time CONSTANT timestamptz := now();
BEGIN
  DROP TABLE IF EXISTS remaining;
  CREATE TEMP TABLE remaining AS
  SELECT id
  FROM unnest(request_ids) requests(id);

  WHILE TRUE LOOP
    -- commit so responses inserted by the background worker become visible
    COMMIT;

    -- if a response has an error of any kind, abort
    IF EXISTS (
      SELECT 1
      FROM
        net._http_response response
        JOIN remaining request ON response.id = request.id
      WHERE
        response.timed_out OR
        response.error_msg IS NOT NULL OR
        NOT (response.status_code = 200 OR response.status_code = 304)
      LIMIT 1
    )
    THEN
      RAISE EXCEPTION 'server responded with error';
    END IF;

    -- if a request returned successfully, remove it from remaining
    DELETE FROM remaining
      USING net._http_response response
    WHERE remaining.id = response.id;

    -- if there are no more requests, return successfully
    IF (SELECT count(*) FROM remaining) = 0 THEN
      RETURN;
    END IF;

    -- timeout if this is taking too long
    IF now() - start_time >= timeout THEN
      RAISE EXCEPTION 'timeout while waiting for responses';
    END IF;

    -- wait for more responses
    PERFORM pg_sleep(0.1);
  END LOOP;
END;
$$ LANGUAGE plpgsql;
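
For reference, invoking it looks something like this (the urls and request ids are placeholders):

-- Enqueue a batch of async GETs; each call returns a request id (say 1 and 2).
SELECT net.http_get('https://example.com/a');
SELECT net.http_get('https://example.com/b');

-- Block until both responses arrive, or error out after 30 seconds.
CALL sync.await_responses(ARRAY[1, 2], interval '30 seconds');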

@steve-chavez (Member, Author) commented

@MathNerd28 Interesting use case. I guess we could have a function like:

call wait_response(array[1,2,3]); -- where the numbers are the request ids

But I'm thinking of a better way that doesn't involve waiting, using the idea in #62:

select cron.schedule('30 3 * * 6', $_$
    select net.http_get(
      url := 'http://domain.com'
    , success_cb := $$
        insert into import_table values ($1, $2, $3) -- $1=status, $2=headers, $3=body
      $$
    );
$_$);

This way we don't have to add a sync function, which could be misused.

@MathNerd28 commented

That's definitely a much cleaner solution that would work for most use cases.

I still think it's worth implementing a wait_responses procedure though, because there may still be times when you need to perform a single action after multiple requests and don't want the hassle of maintaining and cleaning up state. At least give users the option, even if the documentation makes it clear this is not a recommended practice.

An example might be if your post-request action required a lock on a table, and you wanted to minimize the downtime for other write operations.
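
Sketched out (assuming the sync.await_responses procedure above; import_table, its payload column, and the request ids are placeholders):

-- Wait outside the lock, then hold the lock only for the final write.
CALL sync.await_responses(ARRAY[1, 2, 3], interval '1 minute');

BEGIN;
LOCK TABLE import_table IN EXCLUSIVE MODE;
INSERT INTO import_table (payload)
SELECT response.content
FROM net._http_response response
WHERE response.id = ANY (ARRAY[1, 2, 3]);
COMMIT;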
