In previous posts, I spent some time describing a protocol design for negotiating independent channels over a shared connection of sorts. That’s all well and good, but it raises the question of what purpose channels serve.
In the abstract, a channel is just what I wrote above: a mechanism for conducting multiple parts of a larger communication in parallel, and independent of each other. But while that description is apt enough, it also leaves a lot of questions open.
I have a few answers here, and they all go back to where the inspiration for channels comes from. The answers, then, don’t exactly match up 100% with each other, but I consider each of them entirely valid. Let’s run through them before drawing a conclusion.
As I wrote in a previous post, TCP and UDP feature ports, and channels could be seen as roughly analogous. I also explored how these ports serve a purpose on the ISO stack.
At the data link layer, the concern is the equivalent of cables connecting machines, i.e. physical addressing.
At the network layer, we’re more concerned with logical addressing, which introduces the necessity for routing: which computer do I need to speak to to reach my destination?
Somewhere between the transport and session layers, I become interested in which applications — which servers — I’m speaking to.
Ports come in at this last layer: they serve the purpose of identifying a particular server on the source and destination machines.
So far, so good, but there’s a little bit of glue missing here, and that is essentially /etc/services: that file on UNIX-like machines contains a mapping of service names to TCP and UDP port numbers.
As we’ve probably all learned at some point, there is no particular reason to run a particular server on a particular port other than that it’s necessary to know the port in order to connect to it. Convention and general applicability of clients is what assigns SSH to port 22 and HTTPS to 443. And the above mapping file documents that convention.
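This database is directly queryable on most systems. As a minimal sketch, Python’s standard socket module wraps the very same service database:

```python
import socket

# Look up the conventional port for a service name, as recorded in
# /etc/services (or the platform's equivalent service database).
print(socket.getservbyname("ssh", "tcp"))    # 22
print(socket.getservbyname("https", "tcp"))  # 443

# The reverse lookup also exists: which service name does convention
# assign to a given port?
print(socket.getservbyport(22, "tcp"))       # ssh
```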
This convention matters, because clients tend to speak a single protocol only — sure, there are multi-protocol clients, but they usually want to know in which protocol they should introduce themselves. That means that this convention isn’t really about server processes as such, but about protocols. Clients connecting to port 23 expect the remote server to speak telnet, and will start speaking telnet once the connection is established.
Ports, then, identify the protocols that applications communicate by. Fixed ports on the server side ensure that any client has a chance of picking the right protocol, and dynamic ports on the client side ensure that multiple client processes can connect to the same service via the same protocol.
QUIC also knows channels (it calls them streams), but it has a simplifying characteristic that makes the concept of picking the right channel for the right purpose superfluous: an endpoint will speak the same protocol on all channels, and it’s QUIC.
Here, channels are typically associated with parallel concerns within the same protocol — or in simpler terms, downloading resources from multiple URLs at the same time.
It’s even more simplified for QUIC because all other characteristics we want to negotiate per channel are fixed, such as reliability characteristics.
More Flexibility — For What?
We designed things to have more flexibility in the protocol, but at what price and for what gain?
First of all, we need a mechanism for determining the purpose of a channel.
Secondly, we need to ensure that the channel purpose is not at odds with the channel characteristics. Some protocols will require reliability features that others do not.
The added flexibility, however, lets us treat our protocol as an essentially application-agnostic mechanism, a transport for all kinds of purposes. It would, for example, not be unreasonable to view it as a universal tunnelling protocol: one that provides the flexibility and security features applications require.
So let’s look at some solutions for keeping that flexibility.
The Simple Solution
The solution inspired by how protocols layered over IP, like TCP and UDP, handle this is to… do nothing.
How would that work?
Well, for a client wanting to, say, speak FTP, it would have to negotiate a reliable channel with or without security — because FTP requires reliability features much like TCP.
And then, the client would start sending FTP commands.
If your implementation is one where, as in QUIC, a single application protocol is supported, then that’s really all that’s necessary. Job done, let’s move on.
Yes, well, for this kind of scenario we really do not need more. But if we imagine an inetd-like service, something that handles communications for many different kinds of protocols, downsides crop up. You’d have to implement the server to try and interpret at least the first incoming packet via any supported protocol, and make a choice of how to respond based on how well that worked out for each.
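Concretely, such a server would have to guess from the first bytes it sees. A sketch, with the byte patterns and return values invented purely for illustration:

```python
# Guess the protocol from a channel's first incoming packet. The signatures
# below are illustrative and far from exhaustive.
def guess_protocol(first_packet: bytes):
    if first_packet.startswith(b"SSH-2.0-"):
        return "ssh"
    if first_packet.split(b" ")[0] in (b"GET", b"HEAD", b"POST", b"PUT"):
        return "http"
    if first_packet.startswith((b"EHLO", b"HELO")):
        return "smtp"  # except that in SMTP the *server* speaks first...
    return None  # no match: close the channel? send an error? in which protocol?

print(guess_protocol(b"SSH-2.0-OpenSSH_9.6\r\n"))  # ssh
print(guess_protocol(b"\x00\x01\x02"))             # None
```

Even this toy version shows the cracks: protocols don’t even agree on which side sends the first bytes at all.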
Less than ideal.
The Explicit Solution
One solution that’s eminently feasible is to treat each new channel’s purpose as essentially undefined. Then before the client gets to send its protocol’s commands, we explicitly negotiate the purpose.
I imagine this as something where the client sends a message containing what amounts to “hey, I want to speak SMTP here, is that cool?” — and the server either hooks up the right protocol handler and responds “sure”, or it doesn’t.
It’s a simple two-message protocol. Easy to implement, shouldn’t hurt.
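A toy version of that exchange, assuming the channel itself already exists; the handler table and return values are invented here:

```python
# Responder-side table mapping purpose strings to protocol handlers.
# The entries are purely illustrative.
HANDLERS = {"smtp": "smtp-handler", "http": "http-handler"}

def negotiate(requested_purpose: str):
    # Client: "hey, I want to speak SMTP here, is that cool?"
    handler = HANDLERS.get(requested_purpose)
    if handler is None:
        return ("DENIED", None)  # ...and the freshly built channel is torn down
    return ("OK", handler)       # hook up the right protocol handler

print(negotiate("smtp"))    # ('OK', 'smtp-handler')
print(negotiate("gopher"))  # ('DENIED', None)
```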
Unfortunately, channel establishment means that the server must allocate resources (see previous posts). While we manage to avoid this for a while via the cookie mechanism, once the channel exists, resources are committed. If, then, the channel needs to be closed immediately because the inner protocol can’t be negotiated, well, that’s a bummer.
By which I mean it could be used to create a denial-of-service attack, asking for nonsensical protocols en masse.
Of course, the server can implement defensive measures against clients who take that approach, but let’s not leave the design wide open here, hmm?
The Negotiation Solution
I really don’t love negotiating features in channel handshakes, as I’ve previously stated, but I think this is actually one of the few scenarios where it makes the most sense.
You may recall that the initiator sends MSG_CHANNEL_NEW, which the responder may reply to with MSG_CHANNEL_ACKNOWLEDGE. That response may get lost, or just not sent if the server decides it’s not appropriate.
The initiator must then send MSG_CHANNEL_FINALIZE or MSG_CHANNEL_COOKIE for the responder to finally allocate resources for the channel.
Let’s extend these messages.
If the initiator encodes the purpose in MSG_CHANNEL_NEW, this gives the server a reason to either respond MSG_CHANNEL_ACKNOWLEDGE if the purpose is acceptable, or just ignore the request. We could add some kind of explicit denial here, but for a number of reasons we decided against that before.
The most important reason, in the context of channel purposes, is that an explicit denial acts as an oracle: it reveals what services a responder provides. Silence may require some wait time on the client side, but let’s just have the server decline to respond, and let the client use timeouts to notice.
MSG_CHANNEL_ACKNOWLEDGE doesn’t need the purpose encoded. If the responder acknowledges a channel, it’s clear that it’s content with the requested purpose.
The initiator must, however, send the purpose again in its last message (whether it is MSG_CHANNEL_FINALIZE or MSG_CHANNEL_COOKIE), because only on receipt of this last message is the server required to allocate resources — it can completely forget that MSG_CHANNEL_NEW was previously sent.
So that’s it then. MSG_CHANNEL_NEW, MSG_CHANNEL_FINALIZE, and/or the MSG_CHANNEL_COOKIE sent by the initiator must contain the channel purpose.
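In sketch form, the responder side might look like the following. The purpose strings and names are my assumptions; the point is the stateless-until-finalize behaviour:

```python
SUPPORTED = {"iana:https", "iana:imap"}  # purposes this responder accepts

def on_channel_new(purpose: str):
    # Unacceptable purpose: stay silent. No explicit denial, no oracle.
    if purpose not in SUPPORTED:
        return None
    return "MSG_CHANNEL_ACKNOWLEDGE"  # no need to echo the purpose back

def on_channel_finalize(purpose: str):
    # The responder kept no state from MSG_CHANNEL_NEW, so the re-sent
    # purpose is checked again here; only now are resources committed.
    if purpose not in SUPPORTED:
        return None
    return {"purpose": purpose, "state": "allocated"}

print(on_channel_new("iana:https"))   # MSG_CHANNEL_ACKNOWLEDGE
print(on_channel_new("iana:gopher"))  # None
```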
Hang on. One second. What about preventing clients from cheating?
Imagine for a moment that a protocol is so ubiquitous that it’s hard for a server not to establish a channel for it. Let’s pick, oh, I don’t know, HTTP for it.
So the initiator sends “Hey, could I have a channel please, and by the way it’s for HTTP”. The responder replies “sure, just send me this cookie back when you’re ready”. And then the initiator sends “Hey here’s a cookie for a channel, and by the way it’s for IMAP”.
Again, the server could be prompted into a denial-of-service situation since it allocates resources here — or it could be made to act as an oracle again if the initiator proceeds to send IMAP commands next. Either is a little more complicated now than in the prior scenarios, but let’s still do neither.
The implication is, then, that the responder must include the channel purpose in the cookie it returns to the initiator. In this way, it can verify that the MSG_CHANNEL_FINALIZE/_COOKIE of an initiator isn’t manipulated. Note that this is generally the case for these cookies.
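One way to bind the purpose into the cookie is an HMAC over the channel parameters plus the purpose. The use of HMAC-SHA256 and the field layout here are my assumptions, not anything the protocol prescribes:

```python
import hashlib
import hmac
import os

SECRET = os.urandom(32)  # responder-local key; never leaves the machine

def make_cookie(channel_id: bytes, purpose: bytes) -> bytes:
    # The cookie spans the channel parameters *and* the purpose bytes.
    return hmac.new(SECRET, channel_id + purpose, hashlib.sha256).digest()

def verify_cookie(channel_id: bytes, purpose: bytes, cookie: bytes) -> bool:
    return hmac.compare_digest(make_cookie(channel_id, purpose), cookie)

# Cookie issued for an HTTP(S) channel...
cookie = make_cookie(b"\x00\x01", b"iana:https")
print(verify_cookie(b"\x00\x01", b"iana:https", cookie))  # True
# ...and the bait-and-switch to IMAP fails verification.
print(verify_cookie(b"\x00\x01", b"iana:imap", cookie))   # False
```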
Putting It Together
In the end, the whole thing isn’t too complicated, but it gets us off the ground. Let’s just keep the whole purpose-pinning mechanism optional, such that single-protocol implementations similar to QUIC can skip it.
MSG_CHANNEL_NEW must include a length byte for the purpose, plus this many bytes of purpose data. The protocol itself can be entirely agnostic to the meaning of these bytes. Use a zero length for single-protocol implementations.
MSG_CHANNEL_ACKNOWLEDGE’s cookie must also span those purpose bytes. Responders can skip sending this message if the purpose doesn’t match their expectations, such as when a sub-protocol is not implemented, or a purpose was sent to a single-protocol implementation, or vice versa.
Actually, the initiator’s cookie should also span the purpose for much the same reason, to prevent a responder from messing around with it. The risk here is pretty low, but it’s an easy mechanism to default to.
MSG_CHANNEL_FINALIZE/_COOKIE from the initiator must contain the purpose again, unchanged, for the server to accept the channel and do any real work.
This now leaves things open for single-protocol implementations to essentially work as before; they just encode one more zero-valued byte in their cookie. Multi-protocol implementations now have the option of consulting a responder-side mapping of purpose data to some actual protocol implementation.
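The encoding itself is then trivial. A minimal sketch, assuming nothing about the rest of the message framing:

```python
def encode_purpose(purpose: bytes) -> bytes:
    # One length byte, followed by that many bytes of opaque purpose data.
    if len(purpose) > 255:
        raise ValueError("purpose must fit a one-byte length prefix")
    return bytes([len(purpose)]) + purpose

def decode_purpose(buf: bytes):
    # Returns (purpose, remainder of the message).
    length = buf[0]
    return buf[1 : 1 + length], buf[1 + length :]

print(encode_purpose(b""))  # b'\x00' -- the single-protocol case
wire = encode_purpose(b"iana:https") + b"rest of message"
print(decode_purpose(wire)[0])  # b'iana:https'
```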
I’ve deliberately steered away from any format for these purposes, as it’s very difficult to pin this down for all use cases. I expect a human-readable string would be preferred.
As a first tentative proposal, I would suggest a URI-style syntax, which explicitly permits namespaces via the URI scheme. I’d furthermore suggest an initial URI scheme “iana”, which is then followed by a service name from IANA’s Service Name and Transport Protocol Port Number Registry — which is after all the basis for the /etc/services file mentioned at the outset of this post.
There isn’t a better registry for “standard” protocols that I’m aware of, and the service names are low enough on detail that protocol revisions and extensions remain the domain of the protocol itself to negotiate. So an “iana:https” purpose is sufficient detail to send.
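A tentative parser for such purposes: the “iana” scheme is, of course, just the proposal above, and validating names against the system’s service database (itself built from the IANA registry) is one possible policy, not a requirement:

```python
import socket

def resolve_purpose(purpose: str) -> str:
    scheme, sep, name = purpose.partition(":")
    if not sep or scheme != "iana":
        raise ValueError(f"unsupported purpose scheme: {purpose!r}")
    # Raises OSError for names absent from the service database.
    socket.getservbyname(name)
    return name

print(resolve_purpose("iana:https"))  # https
```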
A quick note on subscriptions: I’m trying to keep doing this work, and have set up subscriptions so that people can help me pay my bills. That means some articles are effectively paywalled — if you can’t pay, fear not: there is a special launch promotion open.
You’ll get free lifetime access, and I can keep the paywall up for folks coming from the outside. And if you are in the right space to be extra awesome and pay for a subscription, that’s all the more appreciated!