Screen scraping expert witness

Thursday, August 23rd, 2012

This is a slightly edited version of a genuine email I received in May 2012:

Dear Mr. Stenberg -

I recently came across the text you co-authored with Michael Schrenk, Webbots, Spiders, and Screen Scrapers, and was wondering if you might be interested in being a paid expert witness in a lawsuit we’re handling.

One of the major claims in the suit is unauthorized computer access in the form of a massive, multi-year campaign of screen scraping, and we’re looking for a qualified expert who can make the activity make sense to a jury (in the unlikely event that this matter reaches trial; fewer than 2% of cases do, in federal court).

We’re in Los Angeles, California, as is the case (and naturally would cover travel expenses, an hourly or per diem expert witness fee, etc). If you’re interested (or even if you’re not), please let me know? You can reach me via email or at (xyz) xyz-xyzx.

Many thanks,
[withheld]

Link to the book.

I responded to this mail saying that I’d rather not due to the distance and travel it’d require, but I never heard back from them again so I have no idea whatever happened in this case or who got to be the expert in the end…

US patent 6,098,180

Wednesday, May 11th, 2011

(I am not a lawyer, this is not legal advice and these are not legal analyses, just my personal observations and ramblings. Please correct me where I’m wrong or add info if you have any!)

At 3:45 pm on March  18th 2011, the company Content Delivery Solutions LLC filed a complaint in a court in Texas, USA. The defendants are several bigwigs and the list includes several big and known names of the Internet:

  • Akamai
  • AOL
  • AT&T
  • CD Networks
  • Globalscape
  • Google
  • Limelight Networks
  • Peer 1 Network
  • Research In Motion
  • Savvis
  • Verizon
  • Yahoo!

The complaint was later amended with an additional patent (filed on April 18th), making it list three patents that these companies are claimed to violate (I can’t find the amended version online though). Two of the patents ( 6,393,471 and 6,058,418) are for marketing data and how to use client info to present ads basically. The third is about file transfer resumes.

I was contacted by a person involved in the case at one of the defendants’. This unspecified company makes one or more products that use “curl“. I don’t actually know if they use the command line tool or the library – but I figure that’s not too important here. curl gets all its superpowers from libcurl anyway.

This Patent Troll thus basically claims that curl violates a patent on resumed file transfers!

The patent in question that would be one that curl would violate is the US patent 6,098,180 which basically claims to protect this idea:

A system is provided for the safe transfer of large data files over an unreliable network link in which the connection can be interrupted for a long period of time.

The patent describes several ways in how it may detect how it should continue the transfer from such a break. As curl only does transfer resumes based on file name and an offset, as told by the user/application, that could be the only method that they can say curl would violate of their patent.

The patent goes into detail in how a client first sends a “signature” and after an interruption when the file transfer is about to continue, the client would ask the server about details of what to send in the continuation. With a very vivid imagination, that could possibly equal the response to a FTP SIZE command or the Content-Length: response in a HTTP GET or HEAD request.

A more normal reader would rather say that no modern file transfer protocol works as described in that patent and we should go with “defendant is not infringing, move on nothing to see here”.

But for the sake of the argument, let’s pretend that the patent actually describes a method of file transfer resuming that curl uses.

The ‘180 (it is referred to with that name within the court documents) patent was filed at February 18th 1997 (and issued on August 1, 2000). Apparently we need to find prior art that was around no later than February 17th 1996, that is to say one year before the filing of the stupid thing. (This I’ve been told, I had no idea it could work like this and it seems shockingly weird to me.)

What existing tools and protocols did resumed transfers in February 1996 based on a file name and a file offset?

Lots!

Thank you all friends for the pointers and references you’ve brought to me.

  • The FTP spec RFC 959 was published in October 1985. FTP has a REST command that tells at what offset to “restart” the transfer at. This was being used by FTP clients long before 1996, and an example is the known Kermit FTP client that did offset-based file resumed transfer in 1995.
  • The HTTP header Range: introduces this kind of offset-based resumed transfer, although with a slightly fancier twist. The Range: header was discussed before the magic date, as also can be seen on the internet already in this old mailing list post from December 1995.
  • One of the protocols from the old days that those of us who used modems and BBSes in the old days remember is zmodem. Zmodem was developed in 1986 and there’s this zmodem spec from 1988 describing how to do file transfer resumes.
  • A slightly more modern protocol that I’ve unfortunately found no history for before our cut-off date is rsync, as I could only find the release mail for rsync 1.0 from June 1996. Still long before the patent was filed obviously, and also clearly showing that the one year margin is silly as for all we know they could’ve come up with the patent idea after reading the rsync releases notes and still rsync can’t be counted as prior art.
  • Someone suggested GetRight as a client doing this, but GetRight wasn’t released in 1.0 until Febrary 1997 so unfortunately that didn’t help our case even if it seems to have done it at the time.
  • curl itself does not pre-date the patent filing. curl was first released in March 1998, and the predecessor was started around summer-time 1997. I don’t have any remaining proofs of that, and it still wasn’t before “the date” so I don’t think it matters much now.

At the time of this writing I don’t know where this will end up or what’s going to happen. Time will tell.

This Software patent obviously is a concern mostly to US-based companies and those selling products in the US. I am neither a US citizen nor do I have or run any companies based in the US. However, since curl and libcurl are widely used products that are being used by several hundred companies already, I want to help bring out as much light as possible onto this problem.

The patent itself is of course utterly stupid and silly and it should never have been accepted as it describes trivially thought out ideas and concepts that have been thought of and implemented already decades before this patent was filed or granted although I claim that the exact way explained in the patent is not frequently used. Possibly the protocol using a method that is closed to the description of the patent is zmodem.

I guess I don’t have to mention what I think about software patents.

I’m convinced that most or all download tools and browsers these days know how to resume a previously interrupted transfer this way. Why wouldn’t these guys also approach one of the big guys (with thick wallets) who also use this procedure? Surely we can think of a few additional major players with file tools that can resume file transfers and who weren’t targeted in this suit!

I don’t know why. Clearly they’ve not backed down from attacking some of the biggest tech and software companies.

patent drawing

(Illustration from the ‘180 patent.)

foss-sthlm second meetup

Wednesday, April 28th, 2010

Within the network of people we call foss-sthlm, the time is closing in towards our second meetup. Or “Möte #2″ as we so imaginatively call it (translates to “meeting #2″), to be held in the same great place as the first one, on May 19th 2010.

Our first gettogether was an astounding success and I don’t expect us to be able to repeat that massive attention and number of attendees this time. Things still look good and the number of attendees just reached 80 with exactly three weeks to go, so there’s still good time for more people to decide.

This time we’ve lined up 5 speakers and we’ve made an deliberate effort to not re-use any of the speakers from the first metting to make sure more people get the chance. Unfortunately, that means I won’t do any talk! ;-) Topics this time (for those lazy enough to not visit the link or if you for any reason think Swedish is hard to digest) include:

Smalltalk, FOSS legalities in Sweden, Cross platform Qt, Women in Open Source and FFmpeg.

I’m glad that we once again have sponsors that will let us use the room for free, provide sandwiches and coffee for the break without charge so that we can do this thing admission free once again. We learned from the first meeting that we need a little break in the middle.

If you’re in the area and still haven’t signed up, please go ahead. We’ll get a full afternoon with hardcore FOSS talks, followed by the rest of the evening at a pub talking FOSS with Stockholm’s FOSS hackers that matter.

Logotype Competition

We’re also throwing out different logo alternatives on the mailing list as a kind of competition and I guess we’ll just vote for a winning contribution at some point. My personal favorite so far is probably this version created by Tommy Nevtelen:

Tommy-Nevtelen-FOSS_STHLM

Open Android Alliance

Monday, September 28th, 2009

In the past: cyanogenmod made one of the most popular 3rd party Android ROMs for HTC devices. Personally I haven’t yet tried it on my Magic, but friends tell me it’s the ROM to use.Android

On September 24th 2009, Google sets their legal team on the ROM creator, asking him to stop distributing the parts of Android that aren’t open source but in fact are good old traditional closed source apps – made by Google. Cyanogen himself (Steve Kondik) responded something in the spirit that since the ROM only runs on hardware that already runs the apps users already have a license to use them. Google responded, saying they protect the Google Phone Experience.

This C&D act triggered a huge reaction in the Android communities as people suddenly became aware of the fact that A) parts of the Android core OS aren’t at all open (source) and B) Google is not the cuddly Teddy Bear we all want it to be.

In the xda-developers.com front, where a lot of the custom ROMs are being discussed and users of them hang out, they created the Open Android Alliance with the intent of creating a completely open source Android.

In another end and indepedently of the xda-developers it seems, lots of participants in the google group android-platform pretty much decided the same thing but they rather started out discussing exactly what would be needed to do and what code there is and so on.

Currently, both camps have been made aware of each other and there have been expressed intents of joining into a single effort. I don’ t think such subtleties matter much, but we just might see the beginning of a more open more free Android project getting started here. I’ll certainly be interested in seeing where this is going…

Updated: they now have their own domain. Link in article updated.

Apple patents another Rockbox idea

Saturday, January 24th, 2009

We’ve just read about Apple’s patent application (that seems to have been filed on July 17 2007) to alter the volume of a media player based on the external surrounding.

It’s funny how this was suggested to the Rockbox project already back in September 2002 and is logged fine independently by archive.org – and in fact also on Sourceforge where we hosted our request-tracker back then.

This is not the first time we see this consumer electronics giant patent ideas we’ve already implemented or discussed publicly a very long time before in the Rockbox project.

In 2006 Apple filed to patent a system to read up audio clips to the user to help menu navigation, a concept we at that time already had implemented and I must say was fairly polished in Rockbox. (Link to their patent application.)

Two obvious cases of where the ideas certainly were not new. Not that it tend to prevent patent applications, but still…

Rockbox

How to hack firmwares and get away with it

Wednesday, April 2nd, 2008

It is with interest we in the Rockbox camp checked out the recent battle in Creative land where they shot down a firmware (driver really) hack by the hacker Daniel_K as seen in this forum thread.

We’re of course interested since we do a lot of custom firmwares for all sorts of targets by all sorts of companies, and recently there are efforts in progress on the Creative series of players so could this take-down move possibly be a threat to us?

But no.

In the Rockbox community we have already since day one struggled to never ever release anything, not code nor images or anything else, that originates from a company or other property owner. We don’t distribute other’s firmwares, not even parts of them.

For several music players the install process involves patching the original firmware file and flashing that onto the target. But then we made tools that get the file from the source, or let the user himself get the file from the right place, and then our tool does the necessary magic.

I’m not the only one that think Daniel Kawakami should’ve done something similar. If he would just have released tools and documentation written entirely by himself, that would do the necessary patching and poking on the drivers that the users could’ve downloaded from Creative themselves, then big bad Creative wouldn’t have much of legal arguments to throw at Daniel. It would’ve saved Daniel from this attack and it would’ve taken away the ammunition from Creative.Lots of Rockbox Targets

I’m not really defending Creative’s actions, although I must admit it wasn’t really a surprising action seeing that Daniel did ask for money (donations) for patching and distributing derivates of Creative’s software.

So far in our 6+ years of history, the Rockbox project has been target of legal C&D letter threats multiple times, but never from one of the companies for which targets we develop firmwares for. It has been other software vendors: two game companies (Tetris Company and PopCap games) fighting to prevent us from using their trademarked names (and we could even possibly agree that our name selections were a bit too similar to the original ones) and AT&T banning us from distributing sound files generated with their speech engine software. Both PopCap and Tetris of course also waved with laywers saying that we infringed on their copyrights on “game play” and “look” and what not, but they really have nothing on us there so we just blanked-faced them on those silly demands.

The AT&T case is more of a proof of greedy software companies having very strict user licenses and we really thought we had a legitimate license that we could use to produce output and distribute for users – sound files that are to a large extent used by blind or visually impaired users to get the UI spelled out. We pleaded that we’re an open source, no-profit, no-money really organization and asked for permission, but were given offers to get good deals on “proper” licenses for multiple thousands of dollars per year.

Ok, so the originating people of the Rockbox project is based in Sweden which may also be a factor as we’re not as vulnerable to scary US company tactics where it seems they can sue companies/people who then will have to spend a fortune of their own money just to defend themselves and then you have to counter-sue to get any money back even if you were found not guilty in the first case. Neither is Rockbox an attempt to circumvent any copy protections, as if it were it would have violated laws in multiple countries and regions. Also, reverse engineering is perfectly legal in many regions of the world contrary to what many people seem to believe.

If this isn’t sticking your chin out, then what is? ;-)

Update 4-apr-2008: Creative backpedals when their flame thrower backfired.