CGI support #83
Comments
Constantine Peresypkin writes:
Looks good to me.
Also fine.
Looks good, but we are missing other HTTP headers sent by the client. I'm testing how Apache does it right now and so far I can see that a Foo http://www.ietf.org/rfc/rfc3875 When I try adding a Range header, well then Apache seems to handle it! I had expected it to be the CGI script that should handle the Range I've read questions about the header with PHP and judging from some SO I'll have to test this some more.
So you want the name as seen inside the sandbox? That is probably okay. /~mg/cgi-bin/test.py I would say that it is mostly used to generate URLs that refer back to
I don't see this header in RFC 3875.
In traditional CGI, this contains extra path components. So if I request http://localhost/~mg/cgi-bin/test.py/hey/extra/stuff I see that PATH_INFO is set to '/hey/extra/stuff'. So for the /open/
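To make the traditional split concrete, here is a minimal sketch of how a request path divides into SCRIPT_NAME and PATH_INFO. The helper name `split_script_path` and the idea of passing the script's URL explicitly are assumptions for illustration; a real server such as Apache finds the script by walking the filesystem.

```python
def split_script_path(request_path, script_url):
    """Split a request path into (SCRIPT_NAME, PATH_INFO).

    script_url is the URL path that maps to the CGI script itself
    (a hypothetical input; a real server discovers it on disk).
    """
    if request_path == script_url:
        # Nothing follows the script name, so PATH_INFO is empty.
        return script_url, ""
    if request_path.startswith(script_url + "/"):
        # Everything after the script name is the extra path.
        return script_url, request_path[len(script_url):]
    raise ValueError("request path does not address this script")


script_name, path_info = split_script_path(
    "/~mg/cgi-bin/test.py/hey/extra/stuff", "/~mg/cgi-bin/test.py"
)
print(script_name)  # /~mg/cgi-bin/test.py
print(path_info)    # /hey/extra/stuff
```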
Sounds good.
This is a little weird. The document root is normally a quite static Will a script not always "know" where the attached object is? In the
I (only) see these two variables defined if I POST to my little test I hope there cannot be a situation where we have both POST data coming
Yeah, it will be nice to have the meta data directly there.
I was first thinking that this should be passed in the QUERY_STRING...
I fear that most current web servers have no real standard on which headers are handled by the server and which ones by the CGI script...
Yeah, exactly that kind of thing.
But it's visible in most current servers, AFAIK.
Yep, this is the exact value it has right now. But we will have a problem when we try to implement "arbitrary RESTful interfaces" on Zerocloud.
Because job description is quite static and can be compared to specific mod_* config in apache.conf the
Probably that will never happen, but it depends on how the "arbitrary RESTful interfaces" are going to be implemented. Probably the POST data from the REST API frontend will be first materialized as an object and then ZeroVM will act upon it in another session. But there are other ways to do it, for example packing it in tar on-the-fly.
Not quite right, you can obviously change any part of the job description at runtime. This is the problem with current "standard" interfaces like CGI: they don't quite fit the Zerocloud execution paradigm. But inventing a new, good API for that is something I would still like to avoid; it would probably be too complex.
Constantine Peresypkin writes:
There seems to be a difference between handling the header and exposing
Some more testing shows that Apache will "help" the CGI script if it More concretely, if it doesn't send out a Status header, Apache will add a If the script does add a Status header, Apache will do less. If I add a So all in all, I think we can expose more headers in the environment. We
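For reference, RFC 3875 treats 200 OK as the default status when a script's document response has no Status header field. A minimal sketch of that rule follows. `parse_cgi_status` and `split_head_body` are hypothetical helpers, not ZeroCloud or Apache code, and the sketch assumes header names do not repeat.

```python
def split_head_body(raw):
    """Split raw CGI output at the first blank line (LF or CRLF style)."""
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(pos, sep) for pos, sep in found if pos != -1]
    if not found:
        return raw, b""
    pos, sep = min(found)  # earliest separator wins
    return raw[:pos], raw[pos + len(sep):]


def parse_cgi_status(raw_output):
    """Return (status_code, reason, headers, body) for a CGI response."""
    head, body = split_head_body(raw_output)
    headers = {}
    for line in head.decode("latin-1").splitlines():
        if not line:
            continue
        name, _, value = line.partition(":")
        headers[name.strip().title()] = value.strip()
    # No Status header from the script means the server assumes 200 OK.
    status = headers.pop("Status", "200 OK")
    code, _, reason = status.partition(" ")
    return int(code), reason, headers, body
```

An NPH script would bypass this step entirely, since it writes the full HTTP response, status line included, by itself.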
We can do that, but it will make things problematic, i.e. we will need to put safeguards against abusing headers. :)
We are supporting CGI and CGI NPH right now. Which means that if a job claims to support
Constantine Peresypkin writes:
Yes, I suggest we take a look at how other servers do this and implement
Thanks for that keyword, I did not know that this behavior had a name
Yeah, I see this documented in the Servlet.md document.
Is this because of the fan-out execution that we have behind the I actually think we should handle this case already today. Something Today, we get a concatenation of the output from the stdout. It seems It might make more sense to say that we send back at most one output. Martin Geisler
It means reverse engineering apache/nginx source code, yuck. :)
Yes. But not only that. You can save output of
Yep, this probably needs to be stated explicitly in the docs. Now it's more like "send a job with multiple nodes without a path and see what happens". :)
That was considered, but it really makes life more miserable. Earlier we also supported multiple channels concatenated in one node output, but dropped it. Right now some people still miss that feature (even without knowing it ever existed). I think we need to make a clear distinction here. And I think we probably need to make some "CGI mode" available, i.e. a mode that behaves a lot like classical CGI (with some Zerocloud added bonuses). And it will probably mean that its job description will be a fixed one.
Constantine Peresypkin writes:
Apache adds most headers, it only filters out Authorization and https://github.com/apache/httpd/blob/53823ebd5c/modules/generators/mod_cgi.c#L805
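The naming rule itself comes from RFC 3875: a request header becomes a meta-variable by upper-casing it, replacing "-" with "_" and prefixing "HTTP_", while Content-Type and Content-Length map to CONTENT_TYPE and CONTENT_LENGTH. A minimal sketch of that mapping follows. The function name is hypothetical, and the filter set contains only the Authorization header named above; it would need extending to match whichever server is being imitated.

```python
# Headers withheld from the script. Only "authorization" is taken from
# the comment above; anything further is for the implementer to decide.
FILTERED = {"authorization"}

# Headers that RFC 3875 exposes without the HTTP_ prefix.
UNPREFIXED = {
    "content-type": "CONTENT_TYPE",
    "content-length": "CONTENT_LENGTH",
}


def headers_to_environ(headers):
    """Map request headers to CGI meta-variables."""
    environ = {}
    for name, value in headers.items():
        key = name.lower()
        if key in FILTERED:
            continue
        if key in UNPREFIXED:
            environ[UNPREFIXED[key]] = value
        else:
            environ["HTTP_" + key.upper().replace("-", "_")] = value
    return environ
```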
It sounds like we might just want to consider that a configuration error
Guess how I tested this before writing my previous mail :)
Well, the current situation is only non-miserable if the behavior makes
I'm not sure I follow you here. What is a "fixed" job description?
Up until now, I had not even noticed that you could attach these content
This was why I wanted to restrict the output to a single node -- then it
I think I would try not to have different modes. My thinking is that if I think it would be better overall if we can define some sensible Martin Geisler
That's an easy one. If a user wants to write an object back to Swift but the object's MIME type is only known at runtime, the job can use the CGI interface to emit a proper 'Content-Type' header, and that header will be used when the object is PUT. The same goes for other object-related headers, like metadata or encoding.
That would be a superset, i.e. a CGI app can produce a bunch of headers; only specific ones will be used for the PUT, and all the others can be passed on to the user, if the response is to be returned to the user.
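A rough sketch of that split, assuming the object-related headers are Content-Type, Content-Encoding and anything starting with X-Object-Meta- (an assumption based on the examples given here, not a list taken from the Zerocloud code). The function name is hypothetical.

```python
# Assumed set of headers that describe the stored object itself.
OBJECT_HEADERS = {"content-type", "content-encoding"}
OBJECT_PREFIXES = ("x-object-meta-",)


def split_response_headers(headers):
    """Return (put_headers, other_headers) from a CGI app's response headers."""
    put_headers, other_headers = {}, {}
    for name, value in headers.items():
        key = name.lower()
        if key in OBJECT_HEADERS or key.startswith(OBJECT_PREFIXES):
            # Applied to the object when it is PUT to Swift.
            put_headers[name] = value
        else:
            # Left over for the response sent back to the user.
            other_headers[name] = value
    return put_headers, other_headers
```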
And Content-Length + Content-Type "for no specific reason". :)
Not good, see the first paragraph. :)
Not a coincidence. But the guarantees are that the order will be constant between invocations and deterministic. For deterministic reasons.
It means that you do not send it with the request, but it's implied from other things. Like the current GET behavior.
Obviously not. :)
But obviously the end result for the user will be the same, as the proxy will add its own headers to both of them anyway; it will just add more headers to the former than to the latter.
Constantine Peresypkin writes:
They are actually not filtered -- they're just added by hand for no
I cannot see that paragraph any longer in your mail :) And GitHub In other words, this discussion format is a little primitive.
Okay, I had not considered that the determinism would apply to this
Okay, thanks!
Okay, but what is the 'message/http' content type then? I thought 'message/http' was the content type that told ZeroCloud that
Yeah, that makes sense. Martin Geisler
Constantine Peresypkin writes:
Okay, thanks! Martin Geisler
This is the current typical environment for Zerocloud on Zebra
Some CGI variables are exposed to the application to make it aware of the environment.
Currently it looks like this:
We can divide the vars into groups.
Static ones, which usually never change.
Related to the Swift setup; these will change if Swift is installed differently.
Related to the request. The values are taken verbatim from the HTTP request.
All others. These need the most attention, as we are "emulating" things here.
SCRIPT_NAME - right now it's a path to executable taken from job description. I want to alter it and use the actual name/path of the executable here (like "python" or "/bin/wc" for example).
SCRIPT_FILENAME - right now unused. I want it to be the path from job description (like "swift://account/cont/app.nexe" or "file://python:python")
PATH_INFO - path to account if not connected to swift object, path to swift object if attached to object ("/account" in the former case, "/account/container/object" in the latter)
REQUEST_METHOD - it's "GET" for requests that do not have attached data files (the ones that have only "stdout" or "output" channels, and maybe network ones), and it's "POST" for requests that have attached data ("stdin", "input", "image" and so on)
DOCUMENT_ROOT - device name of the attached object (if attached, otherwise unset)
CONTENT_LENGTH - size of the attached object (if attached, otherwise unset)
CONTENT_TYPE - content-type of the attached object (if attached, otherwise unset)
HTTP_X_TIMESTAMP, HTTP_ETAG, HTTP_CONTENT_ENCODING, HTTP_X_OBJECT_META_* - metadata from attached object (if attached, otherwise unset)

Additional things.
command line args - probably we need to pass them as env variable also (very useful for daemon mode)
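The proposed variables can be sketched as a small function. This is only an illustration of the list above, not the actual Zerocloud implementation. The function name, its parameters and the keys of the `attached` dict are all hypothetical. It also treats "has an attached object" and "has attached data" as the same condition, which is a simplification.

```python
def build_cgi_environ(exe_name, job_path, account, attached=None):
    """Build the emulated CGI variables proposed in the list above.

    attached is None, or a dict describing the attached Swift object with
    hypothetical keys: "path", "device", "size", "content_type", "metadata".
    """
    environ = {
        "SCRIPT_NAME": exe_name,      # e.g. "python" or "/bin/wc"
        "SCRIPT_FILENAME": job_path,  # path from the job description
    }
    if attached is None:
        # No object: point at the account and behave like a GET.
        environ["PATH_INFO"] = "/" + account
        environ["REQUEST_METHOD"] = "GET"
        return environ
    environ["PATH_INFO"] = attached["path"]
    environ["REQUEST_METHOD"] = "POST"
    environ["DOCUMENT_ROOT"] = attached["device"]
    environ["CONTENT_LENGTH"] = str(attached["size"])
    environ["CONTENT_TYPE"] = attached["content_type"]
    # Object metadata headers become HTTP_* variables.
    for header, value in attached.get("metadata", {}).items():
        environ["HTTP_" + header.upper().replace("-", "_")] = value
    return environ


env = build_cgi_environ("python", "file://python:python", "account")
print(env["PATH_INFO"], env["REQUEST_METHOD"])  # /account GET
```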