bug: setValue and useState functions fail for large objects #2815

Open
1 task
lhotanok opened this issue Jan 16, 2025 · 0 comments
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.
lhotanok commented Jan 16, 2025

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/core

Issue description

Actor.setValue and Actor.useState / crawler.useState use JSON.stringify internally to serialize objects. For very large objects, JSON.stringify fails with an Invalid string length error because the resulting string would exceed the engine's maximum string length. Crawlee already catches this error and re-throws it:

const error = e as Error;
// Give more meaningful error message
if (error.message?.indexOf('Invalid string length') >= 0) {
    error.message = 'Object is too large';
}
throw new Error(`The "value" parameter cannot be stringified to JSON: ${error.message}`);
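For context, here is a minimal sketch of how that fragment might sit inside a complete function. The function name stringifyValue is made up for illustration and is not Crawlee's actual internal helper; only the body of the catch block follows the snippet above.

```typescript
// Hypothetical wrapper; the name and signature are assumptions, not Crawlee internals.
function stringifyValue(value: unknown): string {
    try {
        return JSON.stringify(value);
    } catch (e) {
        const error = e as Error;
        // Give more meaningful error message
        if (error.message?.includes('Invalid string length')) {
            error.message = 'Object is too large';
        }
        throw new Error(`The "value" parameter cannot be stringified to JSON: ${error.message}`);
    }
}
```

Any other serialization failure (for example a circular reference) goes through the same catch block and keeps its original message after the prefix.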

But it would be better to handle large objects rather than re-throw the error. There are libraries for big JSON data, such as big-json, that are based on stream processing.
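To illustrate what "stream processing" means here, below is a rough sketch of the idea, not the implementation of any particular library. The generator jsonChunks is a hypothetical name, and it assumes plain JSON-compatible data (no Date, toJSON, functions, or top-level undefined). It emits the JSON text in small pieces, so no single string has to hold the whole document.

```typescript
// Hypothetical helper: yields the JSON text of `value` piece by piece.
// Assumes plain JSON-compatible data only.
function* jsonChunks(value: unknown): Generator<string> {
    if (Array.isArray(value)) {
        yield '[';
        for (let i = 0; i < value.length; i++) {
            if (i > 0) yield ',';
            // JSON.stringify serializes undefined array items as null
            yield* jsonChunks(value[i] === undefined ? null : value[i]);
        }
        yield ']';
    } else if (value !== null && typeof value === 'object') {
        yield '{';
        let first = true;
        for (const [key, child] of Object.entries(value)) {
            // JSON.stringify skips properties whose value is undefined
            if (child === undefined) continue;
            if (!first) yield ',';
            first = false;
            yield `${JSON.stringify(key)}:`;
            yield* jsonChunks(child);
        }
        yield '}';
    } else {
        // Primitives are serialized in one piece
        yield JSON.stringify(value);
    }
}
```

Each chunk could be written to a writable stream as it is produced; joining the chunks back into one string would of course hit the same length limit.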

The issue can be avoided when using Actor.setValue by stringifying the object before passing it to the setValue function:

import { Actor } from 'apify';
import { stringify } from 'big-json';

const stringifiedObj = await stringify({ body: objectToStringify });

await Actor.setValue(
    'KVS_KEY',
    stringifiedObj,
    { contentType: 'application/json' },
);

But useState cannot be used with large objects. It would help if the useState function accepted a callback function as a parameter where the serialization of the state object could be customized.
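A rough sketch of what such a callback could look like, using an in-memory stand-in rather than the real storage. Everything here is hypothetical: StateStoreSketch, the Serializer type, and the third serialize parameter do not exist in Crawlee or the Apify SDK and only illustrate the proposal.

```typescript
// Hypothetical types and class; none of these names exist in Crawlee or the Apify SDK.
type Serializer<T> = (state: T) => string | Promise<string>;

class StateStoreSketch {
    private readonly entries = new Map<string, { value: unknown; serialize: Serializer<any> }>();

    // Stand-in for the key-value store: holds the serialized states.
    readonly persisted = new Map<string, string>();

    // Returns the live state object; `serialize` defaults to JSON.stringify.
    useState<T extends object>(key: string, defaultValue: T, serialize: Serializer<T> = JSON.stringify): T {
        const existing = this.entries.get(key);
        if (existing) return existing.value as T;
        this.entries.set(key, { value: defaultValue, serialize });
        return defaultValue;
    }

    // Serializes every state with its own serializer, as a persist step would.
    async persistAll(): Promise<void> {
        for (const [key, { value, serialize }] of this.entries) {
            this.persisted.set(key, await serialize(value));
        }
    }
}
```

With this shape, a caller could pass a stream-based stringifier for one large state object while all other states keep the default JSON.stringify behavior.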

Steps to reproduce the error:

  • get a large JSON file, e.g. bigFile.json from my js-stringify-examples and parse its content to the JS object obj
    • or generate a large JS object in-memory
  • call JSON.stringify(obj) - the call should fail with Invalid string length error. JSON.stringify can be tested with process.js from js-stringify-examples.
  • call Actor.setValue('KEY', obj) - the call should fail immediately with The "value" parameter cannot be stringified to JSON error
  • call Actor.useState('STATE', obj) - this should fail when the state object is persisted (not immediately after the call)
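For the "generate a large JS object in-memory" option, something like the following sketch might work. The helper names and default sizes are assumptions; the default target of 2 ** 30 characters is chosen on the assumption that it exceeds the maximum string length of current Node.js versions, and the exact limit depends on the runtime.

```typescript
// Hypothetical helper: builds an object whose JSON text is longer than `targetLength` characters.
// The same chunk string is referenced many times, so the object itself stays small in memory.
function buildLargeObject(targetLength = 2 ** 30, chunkLength = 1_000_000): { items: string[] } {
    const chunk = 'x'.repeat(chunkLength);
    const count = Math.ceil(targetLength / chunkLength);
    return { items: new Array<string>(count).fill(chunk) };
}

// Hypothetical helper: reports whether JSON.stringify succeeds instead of throwing.
function tryStringify(value: unknown): { ok: true; length: number } | { ok: false; message: string } {
    try {
        return { ok: true, length: JSON.stringify(value).length };
    } catch (e) {
        return { ok: false, message: (e as Error).message };
    }
}
```

Calling tryStringify(buildLargeObject()) should then report the Invalid string length failure described above, and the same object can be passed to Actor.setValue or Actor.useState for the remaining steps.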

Code sample

await Actor.setValue('KEY', obj);
await Actor.useState('STATE', obj);

Package version

v3.12.1

Node.js version

v22.9.0

Operating system

Tested on both Windows and WSL with Debian distro

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

@lhotanok lhotanok added the bug Something isn't working. label Jan 16, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Jan 16, 2025
@B4nan B4nan added this to the 4.0 milestone Jan 20, 2025