On November 28, the HTTPbis group within the IETF published the first draft for the upcoming HTTP2 protocol. What is being posted now is a start and a foundation for further discussions and changes. It is basically an import of the SPDY version 3 protocol draft.
There’s been a lot of resistance within the HTTPbis to the mandated TLS that SPDY has been promoting so far and it seems unlikely to reach a consensus as-is. There’s also been a lot of discussion and debate over the compression SPDY uses. Not only because of the pre-populated dictionary that might already be a little out of date or the fact that gzip compression consumes a notable amount of memory per stream, but also recently the security aspect to compression thanks to the CRIME attack.
Meanwhile, the discussions on the spdy development list have brought up several changes to the version 3 that are suggested and planned to become part of the version 4 that is work in progress. Including a new compression algorithm, shorter length fields (now 16bit) and more. Recently discussions have brought up a need for better flexibility when it comes to prioritization and especially changing prio run-time. For like when browser users switch tabs or simply scroll down the page and you rather have the images you have in sight to load before the images you no longer have in view…
I started my work on Spindly a little over a year ago to build a stand-alone library, primarily intended for libcurl so that we could soon offer SPDY downloads for it. We’re still only on SPDY protocol 2 there and I’ve failed to attract any fellow developers to the project and my own lack of time has basically made the project not evolve the way I wanted it to. I haven’t given up on it though. I hope to be able to get back to it eventually, very much also depending on how the HTTPbis talk goes. I certainly am determined to have libcurl be part of the upcoming HTTP2 experiments (even if that is not happening very soon) and spindly might very well be the infrastructure that powers libcurl then.
For the readers of my blog, this is a copy of what I posted to the httpbis mailing list on July 12th 2012.
This is a response to the httpis call for expressions of interest
I am the project leader and maintainer of the curl project. We are the open source project that makes libcurl, the transfer library and curl the command line tool. It is among many things a client-side implementation of HTTP and HTTPS (and some dozen other application layer protocols). libcurl is very portable and there exist around 40 different bindings to libcurl for virtually all languages/enviornments imaginable. We estimate we might have upwards 500
million users or so. We’re entirely voluntary driven without any paid developers or particular company backing.
HTTP/1.1 problems I’d like to see adressed
Pipelining – I can see how something that better deals with increasing bandwidths with stagnated RTT can improve the end users’ experience. It is not easy to implement in a nice manner and provide in a library like ours.
Many connections – to avoid problems with pipelining and queueing on the connections, many connections are used and and it seems like a general waste that can be improved.
We’ve implemented HTTP/1.1 and we intend to continue to implement any and all widely deployed transport layer protocols for data transfers that appear on the Internet. This includes HTTP/2.0 and similar related protocols.
curl has not yet implemented SPDY support, but fully intends to do so. The current plan is to provide SPDY support with the help of spindly, a separate SPDY library project that I lead.
We’ve selected to support SPDY due to the momentum it has and the multiple existing implementaions that A) have multi-company backing and B) prove it to be a truly working concept. SPDY seems to address HTTP’s pipelining and many-connections problems in a decent way that appears to work in reality too. I believe SPDY keeps enough HTTP paradigms to be easily upgraded to for most parties, and yet the ones who can’t or won’t can remain with HTTP/1.1 without too much pain. Also, while Spindly is not production-ready, it has still given me the sense that implementing a SPDY protocol engine is not rocket science and that the existing protocol specs are good.
By relying on external libs for protocol and implementation details, my hopes is that we should be able to add support for other potentially coming HTTP/2.0-ish protocols that gets deployed and used in the wild. In the curl project we’re unfortunately rarely able to be very pro-active due to the nature of our contributors, which tends to make us follow the rest and implement and go with what others have already decided to go with.
I’m not aware of any competitors to SPDY that is deployed or used to any particular and notable extent on the public internet so therefore no other ”HTTP/2.0 protocol” has been considered by us. The two biggest protocol details people will keep mention that speak against SPDY is SSL and the compression requirements, yet I like both of them. I intend to continue to participate in dicussions and technical arguments on the ietf-http-wg mailing list on HTTP details for as long as I have time and energy.
curl currently supports Basic, Digest, NTLM and Negotiate for both host and proxy.
Similar to the HTTP protocol, we intend to support any widely adopted authentication protocols. The HOBA, SCRAM and Mutual auth suggestions all seem perfectly doable and fine in my perspective.
However, if there’s no proper logout mechanism provided for HTTP auth I don’t forsee any particular desire from browser vendor or web site creators to use any of these just like they don’t use the older ones either to any significant extent. And for automatic (non-browser) uses only, I’m not sure there’s motivation enough to add new auth protocols to HTTP as at least historically we seem to rarely be able to pull anything through that isn’t pushed for by at least one of the major browsers.
The “updated HTTP auth” work should be kept outside of the HTTP/2.0 work as far as possible and similar to how RFC2617 is separate from RFC2616 it should be this time around too. The auth mechnism should not be too tightly knit to the HTTP protocol.
The day we can clense connection-oriented authentications like NTLM from the HTTP world will be a happy day, as it’s state awareness is a pain to deal with in a generic HTTP library and code.
As a protocol geek I love working in my open source projects curl, libssh2, c-ares and spindly. I also participate in a few related IETF working groups around these protocols, and perhaps primarily I enjoy the HTTPbis crowd.
Meanwhile, I’m a consultant during the day and most of my projects and assignments involve embedded systems and primarily embedded Linux. The protocol part of my life tends to be left to get practiced during my “copious” amount of spare time – you know that time after your work, after you’ve spent time with your family and played with your kids and done the things you need to do at home to keep the household in a decent shape. That time when the rest of the family has gone to bed and you should too but if you did when would you ever get time to do that fun things you really want to do?
IETF has these great gatherings every now and then and they’re awesome places to just drown in protocol mumbo jumbo for several days. They’re being hosted by various cities all over the world so often I deem them too far away or too awkward to go to, also a lot because I rarely have any direct monetary gain or compensation for going but rather I’d have to do it as a vacation and pay for it myself.
IETF 83 is going to be held in Paris during March 25-30 and it is close enough for me to want to go and HTTPbis and a few other interesting work groups are having scheduled meetings. I really considered going, at least to meet up with HTTP friends.
Something very rare instead happened that prevents me from going there! My customer (for whom I work full-time since about six months and shall remain nameless for now) asked me to join their team and go visit the large embedded conference ESC in San Jose, California in the exact same week! It really wasn’ t a hard choice for me, since this is my job and being asked to do something because I’m wanted is a nice feeling and position – and they’re paying me to go there. It will also be my first time in California even though I guess I won’t get time to actually see much of it.
I hope to write a follow-up post later on about what I’m currently working with, once it has gone public.
There’s this person within IETF who seems to possess endless energy and a never-ending wish to clean up tiny details within the IETF processes. He continuously digs up specifications that need to be registered or submitted again somewhere due to some process. Often under loud protests from fellow IETFers since it steals time and energy from people on the lists for discussions and reviews – only to satisfy some administrative detail. Time and energy perhaps better spent on things like new protocols and interesting new technologies.
From my work with curl I have of course figured out a few problems with RFC1738 that I don’t think we should just repeat in a new version of the spec. It turns out I’m not alone in thinking this work isn’t really good like this, and I posted a second mail to clarify my points.
We’re not working on fixing the problems with FTP URIs that are present in RFC1738 so just rephrasing those into a new spec is a bad idea.
We could possibly start the work on fixing the problems, but so far I’ve seen no such will or efforts and I don’t plan on pushing for that myself either.
Please tell me or the ftpext2 group where I or the others are wrong or right or whatever!
Back when I was a HTTP rookie in the late 90s, I once expected that there was this fine RFC document somewhere describing how to do HTTP cookies. I was wrong. A lot of others have missed that document too, both before and after my initial search.
I was wrong in the sense that sure there were RFCs for cookies. There were even two of them (RFC2109 and RFC2965)! The only sad thing was however that both of them were totally pointless as in effect nobody (servers nor clients) implemented cookies like that so they documented idealistic protocols that didn’t exist in the real world. This sad state has made people fall into cookie problems all the way into modern days when they’ve implemented services according to those RFCs and then blame their browser for failing.
It turned out that the only document that existed that were being used, was the original Netscape cookie document. It can’t even be called a specification because it is so short and is so lacking in details that it leaves large holes open and forces implementors to guess about the missing pieces. A sweet irony in itself is the fact that even Netscape removed the document from their site so the only place to find this document is at archive.org or copies like the one I link to above at the curl.haxx.se site. (For some further and more detailed reading about the history of cookies and a bunch of the flaws in the protocol/design, I recommend Michal Zalewski’s excellent blog post HTTP cookies, or how not to design protocols.)
While HTTP was increasing in popularity as a protocol during the 00s and still is, and more and more stuff get done in browsers and everything and everyone are using cookies, the protocol was still not documented anywhere as it was actually used.
Somewhat modeled after the httpbis working group (which is working on updating and bugfixing the HTTP 1.1 spec), IETF setup a mailing list named httpstate in the early 2009 to start discussing what problems there are with cookies and all related matters. After lively discussions throughout the year, the working group with the same name as the mailinglist was founded at December 11th 2009.
One of the initial sparks to get the httpstate group going came from Bill Corry who said this about the start:
In late 2008, Jim Manico and I connected to create a specification for HTTPOnly — we saw the security issues arising from how the browser vendors were implementing HTTPOnly in varying ways due to a lack of a specification and formed an ad-hoc working group to tackle the issue.
I must stress that the work has only involved to document how things work today and not to invent or create anything new. We don’t fix any of the many known problems with cookies, but we describe how you write your protocol implementation if you want to interact fine with existing infrastructure.
The new spec explicitly obsoletes the older RFC2965, but doesn’t obsolete RFC2109. That was done already by RFC2965. (I updated this paragraph after my initial post.)
Oh, and yours truly is mentioned in the ending “acknowledgements” section. It’s actually the second RFC I get to be mentioned in, the first being RFC5854.
I am convinced that I will get reason to get back to the cookie topic soon and describe what is being worked on for the future. Once the existing cookies have been documented, there’s a desire among people to design something that overcomes the problems with the existing protocol. Adam’s CAKE proposal being one of the attempts and ideas in the pipe.
Another parallel IETF effort is the http-auth mailing list in which lots of discussions around HTTP authentication is being held, and as they often today involve cookies there’s a lot of talk about them there as well. See for example Timothy D. Morgan’s document Weaning the Web off of Session Cookies.
I’ll certainly track the development. And possibly even participate in shaping how this will go. We’ll see.
So yesterday we held a little HTTP-related event in Stockholm, arranged by OWASP Sweden. We talked a bit about cookies, websockets and recent HTTP headers.
Below are all the slides from the presentations I, Martin Holst Swende and John Wilanders did. (The entire event was done in Swedish.)
Martin Holst Swende’s talk:
John Wilander’s slides from his talk are here:
In the IETF ftpext2 working group there have been some talks around clients’ and servers’ ability to do and support “ranged” file transfers, that is transferring only a piece of any given file. FTP supports the REST command and has done so since the dawn of man (RFC765 – June 1980), and using that command, a client can set the starting point for a transfer but there is no way to set the end point. HTTP has supported the Range: header since the first HTTP 1.1 spec back in January 1997, and that supports both a start and an end point. The HTTP header does in fact support multiple ranges within the same header, but let’s not overdo it here!
Currently, to avoid getting an entire file a client would simply close the data connection when it has got all the data it wants. The unfortunate reality is that some servers don’t notice clients doing this, so in order for this to work reliably a client also has to send ABOR, and after this command has been sent there is no way for the client to reliably figure out the state of the control connection so it has to get closed as well (which is crap in case more files are to be transferred to or from the same host). It primarily becomes unreliable because when ABOR is sent, the client gets one or two responses back due to a race condition between the closing and the actual end of transfer etc, and it isn’t possible to tell exactly how to continue.
A solution for the future is being worked on. I’ve joined up the effort to write a spec that will suggest a new FTP command that sets the end point for a transfer in the same vein REST sets the start point. For the moment, we’ve named our suggested command RANG (as short for range). “We” in this context means Tatsuhiro Tsujikawa, Anthony Bryan and myself but we of course hope to get further valuable feedback by the great ftpext2 people.
There already are use cases that want range request for FTP. The people behind metalinks for example want to download the same file from many servers, and then it makes sense to be able to download little pieces from different sources.
The people who found the libcurl bugs I linked to above use libcurl as part of the Fedora/Redhat installer Anaconda, and if I understand things right they use this feature to just get the beginning of some files to check them out and avoid having to download the full file before it knows it truly wants it. Thus it saves lots of bandwidth.
In short, the use-cases for ranged FTP retrievals are quite likely pretty much the same ones as they are for HTTP!
The first RANG draft is now available.
SFTP, the SSH File Transfer Protocol, is a misleading name. It gives you the impression that it might be something like a secure version of FTP, perhaps something like FTPS but modeled over SSH instead of SSL. But it isn’t!
I think a more suitable name would’ve been SNFS or FSSSH. That is: networked file system operations over SSH, as that is in fact what SFTP is. The SFTP protocol is closer to NFS in nature than FTP. It is a protocol for sending and receiving binary packets over a (secure) SSH channel to read files, write files, and so on. But not on the basis of entire files, like FTP, but by sending OPEN file as FILEHANDLE, “WRITE this piece of data at OFFSET using FILEHANDLE” etc.
SFTP was being defined by a working group with IETF but the effort died before any specification was finalized. I wasn’t around then so I don’t know how this happened. During the course of their work, they released several drafts of the protocol using different protocol versions. Version 3, 4, 5 and 6 are the ones most used these days. Lots of SFTP implementations today still only implement the version 3 draft. (like libssh2 does for example)
Each packet in the SFTP protocol gets a response from the server to acknowledge it was received. It also includes an error code etc. So, the basic concept to write a file over SFTP is:
[client] OPEN <filehandle>
[server] OPEN OK
[client] WRITE <data> <filehandle> <offset 0> <size N>
[server] WRITE OK
[client] WRITE <data> <filehandle> <offset N> <size N>
[server] WRITE OK
[client] WRITE <data> <filehandle> <offset N*2> <size N>
[server] WRITE OK
[client] CLOSE <filehandle>
[server] CLOSE OK
This example obviously assumes the whole file was written in three WRITE packets. A single SFTP packet cannot be larger than 32768 bytes so if your client could read the entire file into memory, it can only send it away using very many small chunks. I don’t know the rationale for selecting such a very small maximum packet size, especially since the SSH channel layer over which SFTP packets are transferred over doesn’t have the same limitation but allows much larger ones! Interestingly, if you send a READ of N bytes from the server, you apparently imply that you can deal with packets of that size as then the server can send packets back that are N bytes (plus header)…
Enter network latency.
More traditional transfer protocols like FTP, HTTP and even SCP work on entire files. Roughly like “send me that file and keep sending until the entire thing is sent”. The use of windowing in the transfer layer (TCP for FTP and HTTP and within the SSH channels for SCP) allows flow control to work without having to ACK every single little packet. This is a great concept to keep the flow going at high speed and still allow the receiver to not get drowned. Even if there’s a high network latency involved.
The nature of SFTP and its ACK for every small data chunk it sends, makes an initial naive SFTP implementation suffer badly when sending data over high latency networks. If you have to wait a few hundred milliseconds for each 32KB of data then there will never be fast SFTP transfers. This sort of naive implementation is what libssh2 has offered up until and including libssh2 1.2.7.
To achieve speedy transfers with SFTP, we need to “pipeline” the packets. We need to send out several packets before we expect the answers to previous ones, to make the sending of an SFTP packet and the checking of the corresponding ACKs asynchronous. Like in the above example, we would send all WRITE commands before we wait for/expect the ACKs to come back from the server. Then the round-trip time essentially becomes a non-factor (or at least a very small one).
In tests I did over a high-latency connection, I could boost libssh2’s SFTP upload performance 8 (eight) times compared to the former behavior. In fact, that’s compared to earlier git behavior, comparing to the latest libssh2 release version (1.2.7) would most likely show an even greater difference.
My plan is now to implement this same concept for SFTP downloads in libssh2, and then look over if we shouldn’t offer a slightly modified API to allow applications to use pipelined transfers better and easier.
Ok, so NFS has never really been my cup of tea. Complicated and the problem with the root user and locking and what not have always made me get all itchy when thinking about or using NFS.
Enter pNFS, the NFS 4.1 invention that truly suddenly makes the NFS technology so much interesting for high performance solutions. I also think it is a bit unknown so I thought I’d help to share the knowledge about this to you my dear readers. The p in pNFS stands for parallel. The whole idea is that the single NFS server just provides meta data back to the client, with enough information to allow the client to read the actual payload data directly from the storage server(s), that then supposedly are different ones than the main meta data server
(picture from www.nexenta.org)
As you can see on this fancy picture, it allows each client to speak directly to the storage device to get or send the data. This allows them to avoid using a single bottleneck NFS server.