Skip to content

Conversation

simonw
Copy link
Owner

@simonw simonw commented Jun 23, 2019

Most of the code here was fleshed out in comments on #272 (Port Datasette to ASGI) - this pull request will track the final pieces:

  • Update test harness to more correctly simulate the raw_path issue
  • Use raw_path so table names containing / can work correctly
  • Bug: JSON not served with correct content-type
  • Get ?_trace=1 working again
  • Replacement for @app.listener("before_server_start")
  • Bug: /fixtures/table%2Fwith%2Fslashes.csv?_format=json downloads as CSV
  • Replace Sanic request and response objects with my own classes, so I can remove Sanic dependency
  • Final code tidy-up before merging to master

simonw added 17 commits June 15, 2019 14:27
Refs #272

This reverts commit 9fdb47c.

Now that ASGI supports raw_path we don't need our own encoding scheme!
Lots still to do:

* Static files are not being served
* Streaming CSV files don't work
* Tests all fail
* Some URLs (e.g. the 'next' link on tables) are incorrect

But... the server does start up and you can browse databases/tables
CSV tests still all fail.

Also I marked test_trace() as skip because I have not yet re-implemented trace.
Not yet using aiofiles so will not correctly handle larger static assets.

Still needs security tightening.

Still needs tests.

But the CSS and JS now work
Mainly so you can tell if a Datasette instance is running
on ASGI or not.
Refactored utils.py into a datasette/utils package, refactored some of
the ASGI helper code into datasette/utils/asgi.py
@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

This is strange: on my local machine http://127.0.0.1:8001/fixtures/table%2Fwith%2Fslashes.csv is returning a 404 BUT there's a test for that which is passing under pytest:

datasette/tests/test_api.py

Lines 721 to 727 in d60fbfc

def test_table_with_slashes_in_name(app_client):
response = app_client.get(
"/fixtures/table%2Fwith%2Fslashes.csv?_shape=objects&_format=json"
)
assert response.status == 200
data = response.json
assert data["rows"] == [{"pk": "3", "content": "hey"}]

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

Mystery solved: that's because I'm constructing my own scope object and testing via ApplicationCommunicator rather than exercising Uvicorn directly.

async def _get(self, path, allow_redirects=True, redirect_count=0):
query_string = b""
if "?" in path:
path, _, query_string = path.partition("?")
query_string = query_string.encode("utf8")
instance = ApplicationCommunicator(
self.asgi_app,
{
"type": "http",
"http_version": "1.0",
"method": "GET",
"path": path,
"query_string": query_string,
"headers": [[b"host", b"localhost"]],
},
)

I don't want to introduce the complexity of launching a real Uvicorn as part of the tests, so I guess I'll have to carefully update my ApplicationCommunicator test harness to more correctly emulate real life.

@simonw simonw self-assigned this Jun 23, 2019
@simonw simonw added this to the Datasette 1.0 milestone Jun 23, 2019
@simonw simonw changed the title Use ASGI internally, serve using Uvicorn instead of Sanic #272 Port Datasette from Sanic to ASGI + Uvicorn Jun 23, 2019
@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

Another bug: JSON is being served without a content-type header:

~ $ curl -i 'http://127.0.0.1:8001/fivethirtyeight/ahca-polls%2Fahca_polls.json'
HTTP/1.1 200 OK
date: Sun, 23 Jun 2019 16:04:01 GMT
server: uvicorn
referrer-policy: no-referrer
transfer-encoding: chunked

{"database": "fivethirtyeight", "table": "ahca-polls/ahca_polls", ...

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

OK, for Get ?_trace=1 working again. The old code lives in two places:

datasette/datasette/app.py

Lines 546 to 560 in 35429f9

class TracingSanic(Sanic):
async def handle_request(self, request, write_callback, stream_callback):
if request.args.get("_trace"):
request["traces"] = []
request["trace_start"] = time.time()
with capture_traces(request["traces"]):
await super().handle_request(
request, write_callback, stream_callback
)
else:
await super().handle_request(
request, write_callback, stream_callback
)
app = TracingSanic(__name__)

And then:

datasette/datasette/app.py

Lines 653 to 672 in 35429f9

@app.middleware("response")
async def add_traces_to_response(request, response):
if request.get("traces") is None:
return
traces = request["traces"]
trace_info = {
"request_duration_ms": 1000 * (time.time() - request["trace_start"]),
"sum_trace_duration_ms": sum(t["duration_ms"] for t in traces),
"num_traces": len(traces),
"traces": traces,
}
if "text/html" in response.content_type and b"</body>" in response.body:
extra = json.dumps(trace_info, indent=2)
extra_html = "<pre>{}</pre></body>".format(extra).encode("utf8")
response.body = response.body.replace(b"</body>", extra_html)
elif "json" in response.content_type and response.body.startswith(b"{"):
data = json.loads(response.body.decode("utf8"))
if "_trace" not in data:
data["_trace"] = trace_info
response.body = json.dumps(data).encode("utf8")

So it's stashing something on the request to tell the rest of the code it should be tracing, then using that collected data from the request to add information to the final body.

One possible shape for the replacement is a new ASGI middleware that wraps everything else. We don't have a mutable request object here though, so we will need to untangle this entirely from the request object.

Also tricky is that in ASGI land we handle streams - we don't usually wait around for the entire response body to be compiled for us. This means the code that modifies the response (adding to the JSON or appending inside the </body>) needs to be reconsidered.

As usual, Starlette seems to have figured this out: https://github.com/encode/starlette/blob/8c8cc2ec0a5cb834a9a15b871ae8b480503abb67/starlette/middleware/gzip.py

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

Replacement for @app.listener("before_server_start") - this is what the ASGI lifespan protocol is for.

I know Uvicorn supports this because it keeps saying ASGI 'lifespan' protocol appears unsupported on the console.

I think the solution here will be to introduce another ASGI wrapper class similar to AsgiTracer. I'll model this on the example in the ASGI lifespan spec.

simonw added 2 commits June 23, 2019 13:31
Also did a little bit of lint cleanup
@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

The big one: Replace Sanic request and response objects with my own classes, so I can remove Sanic dependency

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

The InvalidUsage exception is thrown by Sanic when it gets an unhandled request - usually a HEAD. It was added in efbb4e8

I'm going to replace it with specific handling for HEAD requests plus a unit test.

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

The Sanic stuff I import at the moment is:

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

For the request object.... what are the fields of it I actually use?

  • request.url
  • request.query_string
  • request.path
  • request.method
  • request.args
  • request.raw_args

ALL of those are things that can be derived from the scope - so I think my new Request class (in utils/asgi.py) is just going to be a wrapper around a scope.

@simonw
Copy link
Owner Author

simonw commented Jun 23, 2019

Last thing is to replace sanic.response:

  • response.text("")
  • response.html()
  • response.redirect(path)
  • response.HTTPResponse

Implementations here: https://github.com/huge-success/sanic/blob/0.7.0/sanic/response.py#L175-L285

@simonw simonw merged commit ba8db96 into master Jun 24, 2019
@simonw
Copy link
Owner Author

simonw commented Jun 24, 2019

It's live! https://a559123.datasette.io/

simonw added a commit that referenced this pull request Jul 7, 2019
Datasette now uses ASGI internally, and no longer depends on Sanic.

It now uses Uvicorn as the underlying HTTP server.

This was thirteen months in the making... for full details see the issue:

#272

And for a full sequence of commits plus commentary, see the pull request:

#518
simonw added a commit that referenced this pull request Nov 11, 2019
Datasette now uses ASGI internally, and no longer depends on Sanic.

It now uses Uvicorn as the underlying HTTP server.

This was thirteen months in the making... for full details see the issue:

#272

And for a full sequence of commits plus commentary, see the pull request:

#518
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant