docs: update readme latency (#10)

* Update README.md with information about memory/latency
apify · Sep 4, 2024 · 90c7bd0
1 parent 1cfe48c
commit 90c7bd0
Showing 3 changed files with 108 additions and 36 deletions.
diff --git a/README.md b/README.md
@@ -1,21 +1,21 @@
-## RAG Web Browser
+## 🌐 RAG Web Browser
 
 This Actor retrieves website content from the top Google Search Results Pages (SERPs).
 Given a search query, it fetches the top Google search result URLs and then follows each URL to extract the text content from the targeted websites.
 
 The RAG Web Browser is designed for Large Language Model (LLM) applications or LLM agents to provide up-to-date Google search knowledge.
 
-**Main features**:
+**🚀 Main features**:
 - Searches Google and extracts the top Organic results.
 - Follows the top URLs to scrape HTML and extract website text, excluding navigation, ads, banners, etc.
 - Capable of extracting content from JavaScript-enabled websites and bypassing anti-scraping protections.
 - Output formats include plain text, markdown, and HTML.
 
 This Actor is a combination of two specialized Actors:
-- [Google Search Results Scraper](https://apify.com/apify/google-search-scraper)
-- [Website Content Crawler](https://apify.com/apify/website-content-crawler)
+- Are you looking to scrape Google Search Results? Check out the [Google Search Results Scraper](https://apify.com/apify/google-search-scraper) actor.
+- Do you need to extract content from a list of URLs? Explore the [Website Content Crawler](https://apify.com/apify/website-content-crawler) actor.
 
-### Fast responses using the Standby mode
+### 🏎️ Fast responses using the Standby mode
 
 This Actor can be run in both normal and [standby modes](https://docs.apify.com/platform/actors/running/standby).
 Normal mode is useful for testing and running in ad-hoc settings, but it comes with some overhead due to the Actor's initial startup time.
@@ -26,7 +26,7 @@ This allows the Actor to stay active, enabling it to retrieve results with lower
 *Limitations*: Running the Actor in Standby mode does not support changing crawling and scraping configurations using query parameters.
 Supporting this would require creating crawlers on the fly, which would add an overhead of 1-2 seconds.
 
-#### How to start the Actor in a Standby mode?
+#### 🔥 How to start the Actor in a Standby mode?
 
 You need the Actor's standby URL and `APIFY_API_TOKEN`. Then, you can send requests to the `/search` path along with your `query` and the number of results (`maxResults`) you want to retrieve.
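
The request described above can be sketched in TypeScript as follows. This is a minimal illustration, not the Actor's own client code: the standby URL is a placeholder, the helper names `buildSearchUrl` and `buildAuthHeaders` are hypothetical, and passing the token as a Bearer header is an assumption rather than something this README states.

```typescript
// Sketch: assemble a Standby-mode /search request for the RAG Web Browser.
// ASSUMPTIONS: the standby URL is a placeholder, both helper names are
// hypothetical, and the Bearer-header token scheme is assumed.

interface SearchParams {
    query: string;      // search term, sent as the `query` parameter
    maxResults: number; // number of results, sent as `maxResults`
}

// Appends the /search path and the two query parameters to the standby URL.
function buildSearchUrl(standbyUrl: string, params: SearchParams): string {
    const url = new URL('/search', standbyUrl);
    url.searchParams.set('query', params.query);
    url.searchParams.set('maxResults', String(params.maxResults));
    return url.toString();
}

// Builds request headers that carry the API token (assumed scheme).
function buildAuthHeaders(apiToken: string): Record<string, string> {
    return { Authorization: `Bearer ${apiToken}` };
}

const exampleUrl = buildSearchUrl('https://example-standby-url.invalid', {
    query: 'apify',
    maxResults: 1,
});
const exampleHeaders = buildAuthHeaders('APIFY_API_TOKEN');
console.log(exampleUrl);
```

The resulting URL and headers can then be passed to any HTTP client, for example `fetch(exampleUrl, { headers: exampleHeaders })`.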
 
@@ -60,7 +60,7 @@ Here’s an example of the server response (truncated for brevity):
 The Standby mode has several configuration parameters, such as Max Requests per Run, Memory, and Idle Timeout.
 You can find the details in the [Standby Mode documentation](https://docs.apify.com/platform/actors/running/standby#how-do-i-customize-standby-configuration).
 
-## API parameters
+## 📧 API parameters
 
 When running in Standby mode, the RAG Web Browser accepts the following query parameters:
 
@@ -72,36 +72,34 @@ When running in the standby mode the RAG Web Browser accept the following query
 | `requestTimeoutSecs` | Timeout (in seconds) for making the search request and processing its response                       |
 
 
-### What is the best way to run the RAG Web Browser?
+### 🏃 What is the best way to run the RAG Web Browser?
 
 The RAG Web Browser is designed to be run in Standby mode for optimal performance.
 The Standby mode allows the Actor to stay active, enabling it to retrieve results with lower latency.
+
+### 🕒 What is the expected latency?
+
 The latency depends on the memory allocated to the Actor and the number of results requested: more memory lowers it, and more results raise it.
 
 Here is a typical latency breakdown for the RAG Web Browser.
-Please note the these results are only indicative and may vary based on the search term and the target websites.
+Please note that these results are only indicative and may vary based on the search term, the target websites,
+and network latency.
 
-The numbers below are based on the following search terms: "apify", "Donald Trump", "boston". Results were averaged for the three queries.
+The numbers below are based on the following search terms: "apify", "Donald Trump", "boston".
+Results were averaged for the three queries.
 
 | Memory (GB) | Max Results | Latency (s) |
 |-------------|-------------|-------------|
 | 2           | 1           | 36          |
 | 2           | 5           | 88          |
 | 4           | 1           | 22          |
+| 4           | 3           | 31          |
 | 4           | 5           | 46          |
 
+Based on your requirements, if low latency is a priority, consider running the Actor with 4GB or more of memory.
+However, if you're looking for a cost-effective solution, you can run the Actor with 2GB of memory.
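
The trade-off between memory and latency can be made concrete with a small lookup over the measurements in the table. This is only a sketch: the `cheapestMemoryFor` helper is hypothetical, it considers only the five measured configurations, and the numbers are indicative, as noted above.

```typescript
// Latency measurements copied from the README table (indicative only).
interface LatencyRow {
    memoryGb: number;
    maxResults: number;
    latencySecs: number;
}

const measurements: LatencyRow[] = [
    { memoryGb: 2, maxResults: 1, latencySecs: 36 },
    { memoryGb: 2, maxResults: 5, latencySecs: 88 },
    { memoryGb: 4, maxResults: 1, latencySecs: 22 },
    { memoryGb: 4, maxResults: 3, latencySecs: 31 },
    { memoryGb: 4, maxResults: 5, latencySecs: 46 },
];

// Hypothetical helper: returns the smallest measured memory size that met
// the latency budget for the given maxResults, or undefined if none did.
function cheapestMemoryFor(maxResults: number, budgetSecs: number): number | undefined {
    const candidates = measurements
        .filter((row) => row.maxResults === maxResults && row.latencySecs <= budgetSecs)
        .map((row) => row.memoryGb);
    return candidates.length > 0 ? Math.min(...candidates) : undefined;
}

console.log(cheapestMemoryFor(1, 40)); // 2
console.log(cheapestMemoryFor(5, 60)); // 4
console.log(cheapestMemoryFor(5, 30)); // undefined
```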
 
-#### Looking to scrape Google Search Results?
-- Check out the [Google Search Results Scraper](https://apify.com/apify/google-search-scraper) actor.
-
-#### Need to extract content from a list of URLs?
-- Explore the the [Website Content Crawler](https://apify.com/apify/website-content-crawler) actor.
-
-Browsing Tool
-- https://community.openai.com/t/new-assistants-browse-with-bing-ability/479383/27
-
-
-### Development
+### 👷🏼 Development
 
 #### Run STANDBY mode using apify-cli for development
 ```bash

diff --git a/data/performance_measures.md b/data/performance_measures.md
@@ -22,7 +22,7 @@ playwright-wait-dynamic-content: 7029
 playwright-remove-cookie: 1073
 playwright-parse-with-cheerio: 5564
 playwright-process-html: 3829
-playwright-before-response-send: 236 
+playwright-before-response-send: 236
 Time taken for each request: [ 49762, 16004, 42676 ]
 Time taken on average 36147.333333333336
 
@@ -132,15 +132,88 @@ before-cheerio-queue-add: 123
 cheerio-request-handler-start: 2637
 before-playwright-queue-add: 12
 playwright-request-start: 8517
-playwright-wait-dynamic-content: 6013 
-playwright-remove-cookie: 497 
-playwright-parse-with-cheerio: 2296 
-playwright-process-html: 1664 
-playwright-before-response-send: 110 
+playwright-wait-dynamic-content: 6013
+playwright-remove-cookie: 497
+playwright-parse-with-cheerio: 2296
+playwright-process-html: 1664
+playwright-before-response-send: 110
 Time taken for each request: [ 25433, 14899, 25276 ]
 Time taken on average 21869.333333333332
 ```
 
+# Memory 4GB, Max Results 3, Proxy: auto
+
+```text
+Average time for each time measure event: Map(10) {
+  'request-received' => [
+    0, 0, 0, 0, 0,
+    0, 0, 0, 0
+  ],
+  'before-cheerio-queue-add' => [
+    157, 157, 157,
+    107, 107, 107,
+    122, 122, 122
+  ],
+  'cheerio-request-handler-start' => [
+    1699, 1699, 1699,
+    4312, 4312, 4312,
+    2506, 2506, 2506
+  ],
+  'before-playwright-queue-add' => [
+    10, 10, 10, 13, 13,
+    13,  5,  5,  5
+  ],
+  'playwright-request-start' => [
+    16249, 17254, 26159,
+     6726,  9821, 11124,
+     7349,  8212, 29345
+  ],
+  'playwright-wait-dynamic-content' => [
+    1110, 10080, 10076,
+    6132,  1524, 18367,
+    3077,  2508, 10001
+  ],
+  'playwright-remove-cookie' => [
+    1883,  914, 133,
+    1176, 5072, 241,
+     793, 4234, 120
+  ],
+  'playwright-parse-with-cheerio' => [
+    1203, 1490,  801,
+     698, 2919,  507,
+     798, 1378, 2756
+  ],
+  'playwright-process-html' => [
+    2597, 1304, 1398,
+    1099, 6756, 1031,
+    2110, 5416, 2028
+  ],
+  'playwright-before-response-send' => [
+    105,  112, 74,
+    501, 3381, 26,
+    101, 1570, 69
+  ]
+}
+request-received: 0 ms
+before-cheerio-queue-add: 129 ms
+cheerio-request-handler-start: 2839 ms
+before-playwright-queue-add: 9 ms
+playwright-request-start: 14693 ms
+playwright-wait-dynamic-content: 6986 ms
+playwright-remove-cookie: 1618 ms
+playwright-parse-with-cheerio: 1394 ms
+playwright-process-html: 2638 ms
+playwright-before-response-send: 660 ms
+Time taken for each request: [
+  25013, 33020,
+  40507, 20764,
+  33905, 35728,
+  16861, 25951,
+  46952
+]
+Time taken on average 30966.777777777777
+```
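
The "Time taken on average" figure in this log can be reproduced from the per-request totals. The sketch below is not the code in `src/performance-measures.ts`; it only re-derives the average, and it assumes the totals are in milliseconds, which matches the 31 s reported in the README table for 4 GB and 3 results.

```typescript
// Per-request total times copied from the log for 4 GB, maxResults=3.
// ASSUMPTION: values are milliseconds (consistent with the README's 31 s).
const requestTimesMs: number[] = [
    25013, 33020, 40507, 20764, 33905, 35728, 16861, 25951, 46952,
];

// Arithmetic mean; rejects an empty list instead of returning NaN.
function mean(values: number[]): number {
    if (values.length === 0) {
        throw new Error('Cannot average an empty list');
    }
    return values.reduce((sum, value) => sum + value, 0) / values.length;
}

const averageMs = mean(requestTimesMs);
console.log(`Average: ${averageMs.toFixed(0)} ms (~${(averageMs / 1000).toFixed(0)} s)`);
// prints "Average: 30967 ms (~31 s)"
```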
+
 # Memory 4GB, Max Results 5, Proxy: auto
 
 ```text
@@ -205,15 +278,15 @@ Time taken on average 21869.333333333332
   ]
 }
 request-received: 0 s
-before-cheerio-queue-add: 145 
-cheerio-request-handler-start: 3117 
-before-playwright-queue-add: 41 
-playwright-request-start: 31449 
-playwright-wait-dynamic-content: 4987 
-playwright-remove-cookie: 1742 
-playwright-parse-with-cheerio: 2020 
-playwright-process-html: 2451 
-playwright-before-response-send: 558 
+before-cheerio-queue-add: 145
+cheerio-request-handler-start: 3117
+before-playwright-queue-add: 41
+playwright-request-start: 31449
+playwright-wait-dynamic-content: 4987
+playwright-remove-cookie: 1742
+playwright-parse-with-cheerio: 2020
+playwright-process-html: 2451
+playwright-before-response-send: 558
 Time taken for each request: [
   26517, 33101, 58388,
   71906, 81101, 30794,

diff --git a/src/performance-measures.ts b/src/performance-measures.ts
@@ -7,6 +7,7 @@ import { Actor } from 'apify';
 // const datasetId = 'aDnsnaBqGb8eTdpGv'; // 2GB, maxResults=1
 // const datasetId = 'giAPLL8dhd2PDqPlf'; // 2GB, maxResults=5
 // const datasetId = 'VKzel6raVqisgIYfe'; // 4GB, maxResults=1
+// const datasetId = 'KkTaLd70HbFgAO35y'; // 4GB, maxResults=3
 const datasetId = 'fm9tO0GDBUagMT0df'; // 4GB, maxResults=5
 
 // set environment variables APIFY_TOKEN