curl doesn’t spew binary anymore

One of curl’s least-liked habits over the years, I’ve been told, is what happens when users forget to tell the command line tool where to store the downloaded file: curl then sends a lot of binary “gunk” to the terminal. At best the result is just a busload of weird-looking characters on the screen, but with a little bit of bad luck it can also lock up the terminal completely or change it in other ways.

Starting in curl 7.55.0 (from this commit), curl inspects the beginning of each download that is headed for the terminal (tty!) and attempts to detect raw binary output and prevent it from being sent there. The code simply looks for a binary zero in the data.

$ curl https://example.com/image.jpg
Warning: Binary output can mess up your terminal. Use "--output -" to tell curl to output it to your terminal anyway, or consider "--output <FILE>" to save to a file.
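To make the heuristic concrete, here is a minimal sketch of the decision described above. It is not curl's actual code: the function names and the SNIFF_WINDOW constant are hypothetical, and the 2000-byte window is an assumption taken from the commit discussion in the comments below.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: how many bytes at the start of a transfer
   are still subject to the check (assumed value, see lead-in). */
#define SNIFF_WINDOW 2000

/* Returns 1 if the chunk contains a zero byte, else 0. */
int chunk_has_nul(const void *chunk, size_t len)
{
  return memchr(chunk, 0, len) != NULL;
}

/*
 * Decide whether to stop and warn instead of writing this chunk.
 *   to_tty        nonzero when the output goes to a terminal
 *   bytes_so_far  bytes already written for this transfer
 *   binary_ok     nonzero when the user explicitly asked for "--output -"
 */
int should_refuse(int to_tty, size_t bytes_so_far, int binary_ok,
                  const void *chunk, size_t len)
{
  if(!to_tty || binary_ok || bytes_so_far >= SNIFF_WINDOW)
    return 0;
  return chunk_has_nul(chunk, len);
}
```

Because only the start of the transfer is examined, output can begin immediately; the trade-off is that a zero byte appearing later in the stream goes unnoticed.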

As the warning message says, there’s an option to switch off this emergency check for when you truly know what you’re doing and don’t need curl to protect you. Just tell curl explicitly that you want the output sent to stdout, with “--output -” (or “-o -” for short):

$ curl -o - https://example.com/binblob.img

We’re eager to get your input and feedback on how this works. We are aware of the risk of false positives for UTF-16 and UTF-32 outputs, but we think they are rare enough to not make this a huge problem.
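The UTF-16 false positive is easy to see with a tiny example. The sketch below encodes the text "hi" by hand in UTF-8 and in UTF-16LE and applies the same zero-byte test; the array and function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* The two-character text "hi" in two encodings. */
const unsigned char hi_utf8[]    = { 0x68, 0x69 };
const unsigned char hi_utf16le[] = { 0x68, 0x00, 0x69, 0x00 };

/* The zero-byte test described in the post (hypothetical name). */
int has_zero_byte(const unsigned char *data, size_t len)
{
  return memchr(data, 0, len) != NULL;
}
```

Calling has_zero_byte() on hi_utf16le reports a zero byte even though the content is ordinary text, while hi_utf8 passes. Any ASCII-range character in UTF-16 or UTF-32 carries at least one zero byte, which is why such output trips the check.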

This feature should drastically reduce the risk of ending up with a terminal full of binary garbage.

Pipes

(Update, added after the initial posting.)

Many have remarked on, or asked about, how this affects the case when stdout is piped into something else. It doesn’t affect that! The whole point of this check is to show the warning message only if the binary output is sent to the terminal. If you instead pipe the output to another program, or redirect the output with >, that will not trigger this warning but will continue just like before. Just like you’d expect it to.
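The distinction between a terminal and a pipe can be checked with the POSIX isatty() call. The following is a small sketch of that idea, not curl's implementation; the helper names are hypothetical.

```c
#include <unistd.h>

/* Returns 1 when the descriptor refers to a terminal, else 0. */
int fd_is_terminal(int fd)
{
  return isatty(fd) ? 1 : 0;
}

/* In this sketch the binary check only applies when stdout
   itself is a terminal. */
int binary_check_applies(void)
{
  return fd_is_terminal(STDOUT_FILENO);
}
```

When a program's stdout is a pipe (as in "curl URL | cat") or a file (as in "curl URL > file"), isatty() returns 0 for that descriptor, so a check gated this way is skipped.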

23 thoughts on “curl doesn’t spew binary anymore”

  1. > The code is only simply looking for a binary zero in the data.

    What do you mean by that? I have a tool that parses files and needs to ignore binary files. Right now I’m just parsing them anyway, and breaking if the line I’m reading is not a valid UTF-8 string.

    1. I’m guessing that this check is done while the download is still in progress, so the entire file can’t yet be parsed.

  2. I just looked at your commit, and I was surprised at how nice the code is! The relevant line to my question is here:

    if(isatty && (outs->bytes < 2000) && !config->terminal_binary_ok) {
      if(memchr(buffer, 0, bytes)) {

    memchr returns the first occurrence of the byte 0 (your second argument), or NULL.

    So a few things:

    * what if your output is more than 2000 bytes?
    * what if your output is binary but doesn’t contain a byte 0?
    * what if your output is a normal UTF-8 string but contains a byte 0? ( see https://stackoverflow.com/questions/6907297/can-utf-8-contain-zero-byte )

    1. 1) You don’t want this searching the entire file or it will have to do that before displaying any of it. This is a problem for any long text file, but a special problem for text streams, which could sit and wait, or actively stream data, for quite some time. This is obviously a heuristic.

      2) This is obviously a heuristic.

      3) The ‘NUL’ in UTF-8 is generally used for end-of-string and nothing else. Thus, if your fetch contains a NUL, it consists of binary data by definition, because text data is not ‘a bunch of strings’ or ‘arbitrary data structures containing text’, it’s ‘text’.

    1. I just consider the fact that my cat spews binary as part of the price of pet ownership. Just mop it up before it dries. Obviously I’d prefer it if it spewed JSON but such is life.

  3. Love the idea but I’m slightly concerned about data loss. You request data from a server, it sends a response, but curl neither shows nor saves it. Maybe you can request the same data again, but maybe the server response is different next time. Maybe it was a rare one time glitch that caused a NULL byte to be sent, and showing the output would have helped to track it down. I don’t know. Perhaps this feature should be disabled by default for POST?

    1. @Jim: I don’t rule out a very small risk. But most terminals will silently hide binary zeroes anyway so you’d already have a data loss by sending such data to the terminal. Not to mention how some sequences will cause “funny effects” and not be a safe “storage” either. So I truly think that risk is minimal.

  4. It’s an ok change. All programs should actually consider behaving like this by default.

    I cannot recall a SINGLE time when I needed binary data to be output to stdout (tty). It may be different when I redirect streams, but that is only about 1% or 2% of the times I needed it.

  5. I spot a bug:

    bool isatty = config->global->isatty;
    […]
    if(!config)
    return failure;

    1. @Name: that would be true if ‘config’ could ever be NULL at that point. Can you explain to me how that would happen there? But yes, the check is certainly wrong or the code is wrong.

  6. Did you consider automatically outputting to a temp file, and printing the temp file name (perhaps just prior to starting the transfer)? This might be helpful when scripting curl.

  7. Thank you! The gaps between the times I use cURL tend to be big enough that I do this quite a bit.

  8. The usability of the prior behavior, messing up the state of your terminal, is bad and I am sure we have all hit that many times. That said, if you know you are connected to a terminal, would it not be better to always escape problematic output rather than relying on a heuristic to detect if it is needed? I don’t know how to do that, btw, but I like the idea of delegating that to less, as Bobby suggested. It is easy to disable with “| cat”.
