Tag Archives: Books

Review: curl programming

Title: Curl Programming
Author: Dan Gookin
ISBN: 9781704523286
Weight: 181 grams

A book for my library is a book about my library!

Not long ago I discovered that someone had written this book about curl and that someone wasn’t me! (I believe this is a first) Thrilled of course that I could check off this achievement from my list of things I never thought would happen in my life, I was also intrigued and so extremely curious that I simply couldn’t resist ordering myself a copy. The book is dated October 2019, edition 1.0.

I don’t know the author of this book. I didn’t help out. I wasn’t aware of it and I bought my own copy through an online bookstore.

First impressions

It’s very thin! The first page with content is numbered 13 and the last page before the final index is page 110 (6-7 mm thick). Also, as the photo shows somewhat: it’s not a big format book either: 225 x 152 mm. I suppose a positive spin on that could be that it probably fits in a large pocket.

Size comparison with the 2018 printed version of Everything curl.

I’m not the target audience

As the founder of the curl project and my role as lead developer there, I’m not really a good example of whom the author must’ve imagined when he wrote this book. Of course, my own several decades long efforts in documenting curl in hundreds of man pages and the Everything curl book makes me highly biased. When you read me say anything about this book below, you must remember that.

A primary motivation for getting this book was to learn. Not about curl, but how an experienced tech author like Dan teaches curl and libcurl programming, and try to use some of these lessons for my own writing and manual typing going forward.

What’s in the book?

Despite its size, the book is still packed with information. It contains the following chapters after the introduction:

  1. The amazing curl … 13
  2. The libcurl library … 25
  3. Your basic web page grab … 35
  4. Advanced web page grab … 49
  5. curl FTP … 63
  6. MIME form data … 83
  7. Fancy curl tricks … 97

As you can see it spends a total of 12 pages initially on explanations about curl the command line tool and some of the things you can do with it and how before it moves on to libcurl.

The book is explanatory in its style and it is sprinkled with source code examples showing how to do the various tasks with libcurl. I don’t think it is a surprise to anyone that the book focuses on HTTP transfers but it also includes sections on how to work with FTP and a little about SMTP. I think it can work well for someone who wants to get an introduction to libcurl and get into adding Internet transfers for their applications (at least if you’re into HTTP). It is not a complete guide to everything you can do, but then I doubt most users need or even want that. This book should get you going good enough to then allow you to search for the rest of the details on your own.

I think maybe the biggest piece missing in this book, and I really thing it is an omission mr Gookin should fix if he ever does a second edition: there’s virtually no mention of HTTPS or TLS at all. On the current Internet and web, a huge portion of all web pages and page loads done by browsers are done with HTTPS and while it is “just” HTTP with TLS on top, the TLS part itself is worth some special attention. Not the least because certificates and how to deal with them in a libcurl world is an area that sometimes seems hard for users to grasp.

A second thing I noticed no mention of, but I think should’ve been there: a description of curl_easy_getinfo(). It is a versatile function that provides information to users about a just performed transfer. Very useful if you ask me, and a tool in the toolbox every libcurl user should know about.

The author mentions that he was using libcurl 7.58.0 so that version or later should be fine to use to use all the code shown. Most of the code of course work in older libcurl versions as well.

Comparison to Everything curl

Everything curl is a free and open document describing everything there is to know about curl, including the project itself and curl internals, so it is a much wider scope and effort. It is however primarily provided as a web and PDF version, although you can still buy a printed edition.

Everything curl spends more space on explanations of features and discussion how to do things and isn’t as focused around source code examples as Curl Programming. Everything curl on paper is also thicker and more expensive to buy – but of course much cheaper if you’re fine with the digital version.

Where to buy?

First: decide if you need to buy it. Maybe the docs on the curl site or in Everything curl is already good enough? Then I also need to emphasize that you will not sponsor or help out the curl project itself by buying this book – it is authored and sold entirely on its own.

But if you need a quick introduction with lots of examples to get your libcurl usage going, by all means go ahead. This could be the book you need. I will not link to any online retailer or anything here. You can get it from basically anyone you like.

Mistakes or errors?

I’ve found some mistakes and ways of phrasing the explanations that I maybe wouldn’t have used, but all in all I think the author seems to have understood these things and describes functionality and features accurately and with a light and easy-going language.

Finally: I would never capitalize curl as Curl or libcurl as Libcurl, not even in a book. Just saying…

HTTP/3 Explained

I’m happy to tell that the booklet HTTP/3 Explained is now ready for the world. It is entirely free and open and is available in several different formats to fit your reading habits. (It is not available on dead trees.)

The book describes what HTTP/3 and its underlying transport protocol QUIC are, why they exist, what features they have and how they work. The book is meant to be readable and understandable for most people with a rudimentary level of network knowledge or better.

These protocols are not done yet, there aren’t even any implementation of these protocols in the main browsers yet! The book will be updated and extended along the way when things change, implementations mature and the protocols settle.

If you find bugs, mistakes, something that needs to be explained better/deeper or otherwise want to help out with the contents, file a bug!

It was just a short while ago I mentioned the decision to change the name of the protocol to HTTP/3. That triggered me to refresh my document in progress and there are now over 8,000 words there to help.

The entire HTTP/3 Explained contents are available on github.

If you haven’t caught up with HTTP/2 quite yet, don’t worry. We have you covered for that as well, with the http2 explained book.

A book status update

— How’s Daniel’s curl book going?

I can hear absolutely nobody asking. I’ll just go ahead and tell you anyway since I had a plan to get a first version “done” by “the summer” (of 2016). I’m not sure I believe in that time frame anymore.

everything-curl-coverI’m now north of 40,000 words with a bunch of new chapters and sections added recently and I’m now generating an index that looks okay. The PDF version is exactly 200 pages now.

The index part is mainly interesting since the platform I use to write the book on, gitbook.com, doesn’t offer any index functionality of its own so I  had to hack one up and add. That’s just one additional beauty of having the book made entirely in markdown.

Based on what I’ve written so far and know I still have outstanding, I am about 70% done, indicating there are about 17,000 words left for me. At this particular point in time. The words numbers tend to grow over time as the more I write (and the completion level is sort of stuck), the more I think of new sections that I should add and haven’t yet written…

On this page you can get the latest book stats, right off the git repo.

http2 explained in markdown

http2 explainedAfter twelve  releases and over 140,000 downloads of my explanatory document “http2 explained“, I eventually did the right thing and converted the entire book over to markdown syntax and put the book up on gitbook.com.

Better output formats, now epub, MOBI, PDF and everything happens on every commit.

Better collaboration, github and regular pull requests work fine with text content instead of weird binary word processor file formats.

Easier for translators. With plain text commits to aid in tracking changes, and with the images in a separate directory etc writing and maintaining translated versions of the book should be less tedious.

I’m amazed and thrilled that we already have Chinese, Russian, French and Spanish translations and I hear news about additional languages in the pipe.

I haven’t yet decided how to do with “releases” now, as now we update everything on every push so the latest version is always available to read. Go to http://daniel.haxx.se/http2/ to find out the latest about the document and the most updated version of the document.

Thanks everyone who helps out. You’re the best!

Screen scraping expert witness

This is a slightly edited version of a genuine email I received in May 2012:

Dear Mr. Stenberg –

I recently came across the text you co-authored with Michael Schrenk, Webbots, Spiders, and Screen Scrapers, and was wondering if you might be interested in being a paid expert witness in a lawsuit we’re handling.

One of the major claims in the suit is unauthorized computer access in the form of a massive, multi-year campaign of screen scraping, and we’re looking for a qualified expert who can make the activity make sense to a jury (in the unlikely event that this matter reaches trial; fewer than 2% of cases do, in federal court).

We’re in Los Angeles, California, as is the case (and naturally would cover travel expenses, an hourly or per diem expert witness fee, etc). If you’re interested (or even if you’re not), please let me know? You can reach me via email or at (xyz) xyz-xyzx.

Many thanks,
[withheld]

Link to the book.

I responded to this mail saying that I’d rather not due to the distance and travel it’d require, but I never heard back from them again so I have no idea whatever happened in this case or who got to be the expert in the end…

The updated web scraping howto

webbots-spiders-and-screen-scrapers

Web scraping is a practice that is basically as old as the web. The desire to extract contents or to machine- generate things from what perhaps was primarily intended to be presented to a browser and to humans pops up all the time.

When I first created the first tool that would later turn into curl back in 1997, it was for the purpose of scraping. When I added more protocols beyond the initial HTTP support it too was to extend its abilities to “scrape” contents for me.

I’ve not (yet!) met Michael Schrenk in person, although I’ve communicated with him back and forth over the years and back in 2007 I got a copy of his book Webbots, Spiders and Screen Scrapers in its 1st edition. Already then I liked it to the extent that I posted this positive little review on the curl-and-php mailing list saying:

this book is a rare exception and previously unmatched to my knowledge in how it covers PHP/CURL. It explains to great details on how to write web clients using PHP/CURL, what pitfalls there are, how to make your code behave well and much more.

Fast-forward to the year 2011. I was contacted by Mike and his publisher at Nostarch, and I was asked to review the book with special regards to protocol facts and curl usage. I didn’t hesitate but gladly accepted as I liked the first edition already and I believe an updated version could be useful to people.

Now, in the early 2012 Mike’s efforts have turned out into a finished second edition of his book. With updated contents and a couple of new chapters, it is refreshed and extended. The web has changed since 2007 and so has this book! I hope that my contributions didn’t only annoy Mike but possibly I helped a little bit to make it even more accurate than the original version. If you find technical or factual errors in this edition, don’t feel shy to tell me (and Mike of course) about them!

The Mythical Man-Month

The Mythical Man-MonthFrederick Brooks wrote this classical book already back in 1975 and added a few extra chapters for the twenty years anniversary 1995…

Large portions of it feels of the age and there’s a lot of talk about Fortran, System/360 and PL-1 as if we should know about them (which made me fast forward over some chapters). But there are gems as well, and the most significant things people seem to remember Brooks’ book for are still pretty valid and fine.

Adding more people to a project leads to the need for more communication and thus it may slow down development rather than speed it up. Also known as Brooks’s law.

Given the complexity of software and software development, there’s no single method or concept that will lead to an improvement by an order of magnitude – within a decade. There’s No Silver Bullet. (This section was not in the original edition of the book.)

The risks involved when rewriting something and wants to fix everything that was wrong in the previous version so you over-work and over-design the successor. The so called Second system effect.

A lot of the book is spent on thoughts and theories around how to manage really really large software projects, like when you involve thousands of persons. Is it even possible to make such huge projects successful and if so, what does it take? The extra chapters do indeed add value since they offered Brooks a chance to re-evaluate his earlier claims and ideas and to check what seemed to be truths and what mistakes he did in the original edition.

A very interesting read that I’m glad I finally got time to get through!