ceph-mgr: Implement new pecan-based rest api #14457
Conversation
Initial comments (sorry if this is all a bit critical, such is the nature of code review...)

When someone authenticates by CephX key, you're sending a remote command to the mon for every single HTTP request -- that adds latency to the requests and load to the mons. I imagine you're hoping that users will be well behaved and use the "token" mechanism most of the time -- I think that is quite optimistic. IMHO, if a user is going to need cephx credentials to initially create/destroy tokens, then that initial setup might as well just be a CLI thing, avoiding the risk of users using cephx keys for everything.

Encouraging people to send CephX tokens to an unauthenticated host is also just plain bad security. While the connection is SSL, it ships with a self-signed certificate, so there is no authentication.

I can't see where the logic for modifying pg_num/pgp_num went (the part where we increase one first and then gradually increase the other). One of the main reasons I wrote that for Calamari back in the day was as an example of how API operations weren't always trivial things and sometimes had multiple steps -- the structure that enabled that should persist.

There doesn't seem to be any update to the ubuntu/debian packaging; are the dependencies present in Xenial?

I hope this doesn't seem too picky, but please can we avoid picking deliberately obscure names. This module should probably just be called rest, or if we're going to have the old one co-existing, it should be called rest2 or rest-flask or something like that.

Exposing documentation on /doc is neat, but exposing it on docs.ceph.com is much more important; is there a path to doing that?

If we're going to the trouble of having a brand new module, let's have some documentation to go with it, at least demonstrating how to authenticate.
src/pybind/mgr/coleoid/common.py (outdated):

        return " ".join(out)

    def HUMANIFY_REQUEST(request):
Any reason this isn't a method on CommandsRequest?
It was on a TODO list, looks like I should have written it down. :)
src/pybind/mgr/coleoid/common.py (outdated):

    # Transform command to a human readable form
    def HUMANIFY_COMMAND(command):
Function names should be lower_case
This was from back when I used 'from common import *' and used upper case to identify the bits from the common file. I will change that.
src/pybind/mgr/coleoid/module.py (outdated):

    @staticmethod
    def run(commands, uuid=str(uuid4())):
This currently gets self.uuid passed in always, so no default needed, and instead of a static method that takes a uuid argument, this could be a class method that just uses self.uuid.
True, it is a relic of one of my older ideas that I have since dropped.
src/pybind/mgr/coleoid/module.py
Outdated
try: | ||
results = self.run(commands_arrays[0], self.uuid) | ||
except: | ||
instance.log.error(str(traceback.format_exc())) |
This should mark itself failed at this stage, presumably.
Looks like I forgot to remove this debugging try/except, which should no longer be necessary (I previously had some issues with logging from different places, so I had to catch exceptions early, but that is no longer the case).
Yep, I was hoping users would use tokens when necessary. To create a persistent token, all you need to do is visit the /auth endpoint while authenticated by a CephX key. That generates a persistent token (well, until you delete it with the DELETE method or refresh it by visiting the /auth endpoint again). You can access this easily via a browser: since the server sends back 401 with www-authenticate, the browser will prompt for the username/password. From python, all you need to do is add auth=('user', 'password') to the request method.

btw: As far as speed goes, I have tested this in my VM with 1500 authenticated requests and it took 13.5 seconds when sending a remote command for auth each time, compared to about 10 seconds when using tokens.
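As a sketch of the authenticated-request pattern described above (the helper name, user, and key values here are illustrative, not from the PR):

```python
import base64

def basic_auth_header(user, key):
    # What requests builds internally when you pass auth=(user, key):
    # an HTTP Basic Authorization header. Helper name is hypothetical.
    token = base64.b64encode("{}:{}".format(user, key).encode()).decode()
    return {"Authorization": "Basic " + token}

# With the requests library the same call is simply (not executed here):
#   requests.get('https://mgr-host:8002/auth',
#                auth=('client.admin', cephx_key), verify=False)
```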
I would not exactly call the host unauthenticated. It is running on a monitor, and that is a host under the control of a Ceph admin. Also, paranoid admins can use their own keys for this and do the authentication properly. The packaging changes do not overwrite existing keys on purpose. That being said, I admit I did not spend that much time on this part and I would be happy to do it better. I would, e.g., like all the nodes to have the same cert/key, have some mechanism to verify the self-signed keys, etc.

btw: One of the reasons I wanted to use the CephX keys was that I was hoping we could automate the initial creation of the ceph-mgr keys: we could use the key provided by a user to authenticate ceph-mgr at a later point, and only then would it start serving requests. Although I have not looked into how easy/difficult that would be yet.
The submit_request method accepts a list of lists. The pg_num is changed in the first iteration, and only if that succeeds will the command change the pgp_num. I may have missed something there, though.
The request mechanism will run the commands in the first sub-list in parallel, and once those are done it will run the commands in the second sub-list (well, the one command). If you say "gradually", then maybe there should be more sub-lists?
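The list-of-lists structure described above can be sketched like this (the helper is hypothetical; the field names follow the JSON shape of ceph's mon commands):

```python
def make_pg_resize_request(pool, new_pg_num):
    # Hypothetical helper building the list-of-lists payload: each inner
    # list is a batch; the second batch runs only after the first succeeds.
    return [
        # step 1: raise pg_num
        [{'prefix': 'osd pool set', 'pool': pool,
          'var': 'pg_num', 'val': str(new_pg_num)}],
        # step 2 (only on success): raise pgp_num to match
        [{'prefix': 'osd pool set', 'pool': pool,
          'var': 'pgp_num', 'val': str(new_pg_num)}],
    ]
```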
Sigh, I always forget about these. The ssl bits will definitely be there. I have just checked for the python-flask-restful extension on pkgs.org and it seems to be available in ubuntu xenial as well as debian sid.
Sure, the primary reason I went for an obscure name was that 'rest' was already taken and I did not want to do a PR that would patch the old rest directory as that would look like a total mess on github. It might be ok if I did that in two commits I suppose (one that removed the old rest api and one that added the flask-based one). btw: My working name was neverest, I can switch to that if you like it better?
I do plan to create an (html-formatted) snapshot of the /doc endpoint once the API stabilizes and document it on docs.ceph.com. No automated path for that, yet. :)
I'm not so concerned about the API performance as I am about it placing extra load on the mons (and also spamming the audit log). If someone is running e.g. a UI that's doing several requests every few seconds, it's unreasonable to be sending a command to the mon for every one of those. I really wouldn't rely on users being well-behaved enough to switch auth methods to the more efficient tokens. If you just avoid giving them the cephx option to begin with, we don't have to worry. Users/tools would have to go to the CLI or their keyring file to get their cephx token to begin with, so it's probably not at all painful to just have them use the CLI to create their token for API access.
From the REST client's point of view, it is just talking to an IP address and hoping the thing responding is really the monitor. This is a "red address bar" situation, where we have a SSL connection but we are only hoping that the thing sending response packets is really the host we thought we were talking to. The host definitely is unauthenticated in the security sense of the word, there's no ambiguity about that.
At the risk of sounding a bit preachy, we have a duty to ship things as secure as we can by default, not rely on admins to secure them after the fact. Self-signed certificates are an unfortunate reality in some environments, but we can mitigate this by avoiding sending over-privileged secrets (like cephx keys) over that channel. I think there's also an argument that we should really wave this issue in the admin's face by requiring them to either load their own SSL key, or at least requiring them to manually type a command that tells them they are creating a self-signed certificate.
Look at PgCreatingRequest. It needs to batch the creation of PGs and wait for them to be done before creating some more. That reminds me of a more general point -- the commands in the current rest code (OsdMapModifyingRequest etc) will make sure the up-to-date cluster maps are loaded before considering the job complete. That's so that if someone does a POST and then a GET, the results reflect the change they just made. It could be reasonable to just not try and do that any more, but it is a change you are making that you should be aware of.
That's better but my preference would be for something that starts with "rest". See how ceph-devel feels about just replacing the old one outright. You need to make some effort to find out about anyone who is using the old one, to have the discussion about whether to kill it.
Ah, good. Hopefully that will include some text explaining things, and an example piece of client code etc, not just a list of fields.
Force-pushed from a3f9337 to 4a04d13. Latest changelog:
security: We do not authenticate the servers in the SSL sense, but afaik we don't do that with Ceph tools either (*). The SSL certificates just make it more obvious because they are designed to be signed by a CA, but they have a use even when they are only self-signed: you can easily detect a change of host (the certificate changed), the communication is encrypted, and so on. It is only on the first encounter with the server that you are vulnerable to a MITM attack, just like you are with Ceph itself (imagine someone spoofing your DNS records and acting as a middle man resending the data to the real Ceph node; that person would have your data, know your key, ...). In any case, the new way of handling the certificates allows you to verify the certificate via the usual channels.

There is also the point that if you use a non-standard auth system, the admins are very likely not to take the API that seriously, and that is really rather bad -- the API is rather powerful (you can even delete pools with it...). If you use the CephX key for auth, it makes this much more obvious and should make users much more cautious about it, which is a good thing.

(*) We would need a third party -- some kind of CA -- for that. I really can't imagine how server authentication without some kind of CA would work, and afaik we do not have a CA in Ceph.

speed: My point was that a 20-30% performance penalty if you are running the API as a user (i.e. one request per few seconds or minutes) does not sound that bad. On the other hand, if you are doing several requests per second with a web UI, it is really rather easy to generate a token and use token-based authentication.
Without having double checked this, I believe that CephX authentication doesn't transmit the key. It's shared secret crypto: the server proves to the client that it already knows the client's key, and the client proves to the server that it has the key, all without actually sending the key. The central issue with what you're doing here is that you're transmitting that key like a password, but it's not a password. It's a key for use in shared secret crypto.
I don't care about the penalty for the rest api consumer, I care about it spamming the monitors. I don't want every monitor log we see to be a long stream of "auth get" commands, and I don't want monitors to have to service these extra requests unnecessarily.
Certainly it's easy to generate a token, but that doesn't mean users will do it. There's no need to offer them the choice -- just let the tokens be configured via the CLI, and then only allow HTTP access with tokens.
src/pybind/mgr/restful/module.py (outdated):

    def _shutdown(self):
        # We need request module to get the shutdown function
        requests.post(
            'https://localhost:8002/shutdown',
Does the werkzeug server really not have a better way to shut down than making a process POST to itself?
Werkzeug does, but Flask (which runs the werkzeug server) does not expose it directly, so you need a request to get at the environ entry that points to the shutdown function. Some more details can be found e.g. here:
http://stackoverflow.com/questions/15562446/how-to-stop-flask-application-without-using-ctrl-c
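The environ-based shutdown described in that answer looks roughly like this (a plain-WSGI sketch, not the PR's actual code; Flask exposes the same environ via its request object):

```python
def shutdown_handler(environ, start_response):
    # Werkzeug's development server puts its shutdown hook into the WSGI
    # environ, so the process has to issue a request to itself to reach it.
    func = environ.get('werkzeug.server.shutdown')
    if func is not None:
        func()
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Shutting down...']
```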
@@ -173,7 +173,8 @@ void MonCapGrant::expand_profile_mon(const EntityName& name) const

       profile_grants.push_back(MonCapGrant("mon", MON_CAP_R));
       profile_grants.push_back(MonCapGrant("mds", MON_CAP_R));
       profile_grants.push_back(MonCapGrant("osd", MON_CAP_R | MON_CAP_W));
    -  profile_grants.push_back(MonCapGrant("config-key", MON_CAP_R));
    +  profile_grants.push_back(MonCapGrant("auth", MON_CAP_R | MON_CAP_X));
    +  profile_grants.push_back(MonCapGrant("config-key", MON_CAP_R | MON_CAP_W));
Hmm, so it looks like @liewegas rolled back the permissions of the mgr quite a bit, in practice I suspect we're going to end up adding back read/write access to most of these in order to enable mgr modules to do management stuff.
Actually, when I first tested this, even the "osd" had only MON_CAP_R so I could not e.g. create a pool. I think @tchaikov changed that in the last couple of days.
@jcsp will review this change (and yours) in a day or two.
that's fine!
I've been trying to find any rationale or discussion, and failing. Is there some background for "why are we doing this"?
@dmick Dependencies: the django-based REST api won't run unless you manually install all the dependencies (with pip, I believe). This one only requires stuff that is already in EPEL. It was easier for me to do a complete rewrite based on the django-based REST api than to try to strip the django dependencies from the current API. This runs out of the box, no manual steps are necessary, and we can package it easily.
The new rest API uses pecan for the restful functionality and simplifies the code significantly. It should be mostly equivalent in functionality to the django-based rest API. The api is self-documenting via /doc endpoint. Signed-off-by: Boris Ranto <branto@redhat.com>
Signed-off-by: Boris Ranto <branto@redhat.com>
The patch moves decorators into a separate file so they could be shared amongst various files. It also fixes key generation in werkzeug 0.10+ and fixes get input method on various endpoints. Signed-off-by: Boris Ranto <branto@redhat.com>
The patch makes the REST api being served by a module, not a single monolithic file. Signed-off-by: Boris Ranto <branto@redhat.com>
This allows users to create their own certificates that are signed by a CA. If the keys are not present then a self-signed certificate will be created and distributed amongst all the mgr nodes. It also allows us to get rid of the pyOpenSSL dependency. Signed-off-by: Boris Ranto <branto@redhat.com>
The send_command function now requires 6 arguments, switching back to mon commands as that is what we used before. Signed-off-by: Boris Ranto <branto@redhat.com>
This also contains a documentation fix for authentication in / endpoint. Signed-off-by: Boris Ranto <branto@redhat.com>
This is the simplest way to generate the keys and probably the least likely to cause trouble in the future. Signed-off-by: Boris Ranto <branto@redhat.com>
This allows us to run both calamari and the rest api at the same time. Signed-off-by: Boris Ranto <branto@redhat.com>
This removes the catch decorator which was useful primarily for testing purposes and instead switches to a pecan error hook which will print the traceback to the ceph-mgr log. Signed-off-by: Boris Ranto <branto@redhat.com>
Signed-off-by: Boris Ranto <branto@redhat.com>
Signed-off-by: Boris Ranto <branto@redhat.com>
This commit adds pagination support via a paginate decorator. The decorator will paginate the output if it is told to. The first page is 0, the last page is -1. The decorator was applied to the /request endpoint so you can get requests in hundreds if you supply the endpoint with ?page=N. If no ?page=N is supplied, the decorator will output the full list. Signed-off-by: Boris Ranto <branto@redhat.com>
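A minimal sketch of such a paginate decorator, assuming a fixed page size (all names here are illustrative, not the commit's actual code):

```python
from functools import wraps

def paginate(page_size=100):
    # Hypothetical sketch: page 0 is the first page, page -1 the last,
    # and omitting the page argument returns the full list.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, page=None, **kwargs):
            items = func(*args, **kwargs)
            if page is None:
                return items
            if page == -1:
                return items[-page_size:]
            start = page * page_size
            return items[start:start + page_size]
        return wrapper
    return decorator

@paginate(page_size=2)
def list_requests():
    # Stand-in for an endpoint that returns a long list of requests.
    return ['r0', 'r1', 'r2', 'r3', 'r4']
```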
@liewegas I have rebased and fixed the conflicts. I use this simple script to test the code: It mostly just checks the code integrity by going through almost all the endpoints and checking that they return http 200. It could use a bit more automation, e.g. to pass the admin key, etc.
Cool, can you add that script to qa/workunits/rest/test_mgr_rest_api.py (or something like that), and make it take the endpoint via the command line? Thanks!
This commit adds a simple test that will send the requests to most of the API endpoints and check the status code of the requests. Signed-off-by: Boris Ranto <branto@redhat.com>
I guess you meant the admin key should be a cli argument. I have added the test script.
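The smoke test described above boils down to something like this (the endpoint list and the injectable opener are illustrative, not the PR's actual script):

```python
import urllib.request
import urllib.error

def check_endpoints(base_url, endpoints, opener=urllib.request.urlopen):
    # GET every endpoint and collect those that do not answer HTTP 200.
    # `opener` is injectable so the loop can run without a live server.
    failures = []
    for ep in endpoints:
        try:
            status = opener(base_url + ep).status
        except urllib.error.HTTPError as exc:
            status = exc.code
        if status != 200:
            failures.append((ep, status))
    return failures
```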
Can we merge this?
Oops! Yeah, I thought we already had.
Hello everyone, I hope I'm not rude for commenting after a while, but I'm trying to work with this module right now and have some problems. Actually, my only problem is with the The command I'm issuing right now looks something like: The logs say thus:
I must admit I don't understand the prefix thing. And maybe I'm wrong in my understanding of the endpoint itself as a whole - shouldn't I use it to pass ANY ceph command? It'd be great if somebody could explain. Maybe the developer himself, @b-ranto?
Hi, that method is a simple pass through method for arbitrary commands as defined in
The prefix is I.e. You should send something like The endpoint is hard to use by design. It is not recommended to post to the endpoint unless you know exactly what you are doing. There is no parsing of output, no error handling, nothing. It will just pass the python structure and return the output. |
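For illustration, a pass-through payload in the JSON shape of ceph's mon commands might be built like this (a hypothetical helper; check the exact fields the endpoint accepts against the module's source):

```python
def make_passthrough_command(prefix, **args):
    # 'prefix' carries the command name (e.g. 'osd pool ls'); the
    # remaining keys are that command's arguments. Hypothetical helper.
    cmd = {'prefix': prefix, 'format': 'json'}
    cmd.update(args)
    return cmd
```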
Thank you SO much @b-ranto!
Yeah, feel free to play with it or use it, just really don't expect anything special out of it. We had some issues in the past with a similar pass-through method that was easier to use, where people expected it to do way more than it actually did.
@b-ranto No, be certain I'm not after anything special, just running basic commands via REST. :)
    else:
        self.keys[command['key_name']] = str(uuid4())
        self.set_config_json('keys', self.keys)
Hi, a quick question: self.set_config_json could fail, e.g. on bad authorization. Should we handle that case here?
This code has changed a lot since this PR has been merged. Feel free to create an upstream issue or PR if you think there is something wrong with the current code.
However, in general, set_config_json is unlikely to fail. If it does it may point to some deeper issues with ceph-mgr code.
thanks @b-ranto roger that.
@b-ranto I guess the answer is 'no', but I have to ask just in case you know better - is it possible to invoke the Ceph admin socket (under
If this discussion could move to a ceph-devel mailing list thread for broader visibility (and so that I don't keep getting notifications from this long-closed PR), I would appreciate it.
The new rest API uses Flask with flask-restful extension and simplifies
the code significantly. It should be mostly equivalent in functionality
to the django-based rest API.
The new API uses Ceph auth system (requires caps mon allow *) and is
self-documenting via /doc endpoint.
Signed-off-by: Boris Ranto <branto@redhat.com>