Wednesday, April 11, 2012

Good deal? Census 1940 site built on barter

National Archives

The website at 1940census.archives.gov is operated by a private company, for free. In exchange, it can use the free public records on its for-profit site as well. Other companies paid $200,000 for the records.

Who says there's no free lunch?

You may have read over the past week about the release of 1940 Census records on a new U.S. government website, a site that buckled under the huge demand from people looking up details on the lives of their friends and relatives from the Great Depression.

You may not have realized that the site was built for U.S. taxpayers for the price of ... not one dime. A company from Silicon Valley built the site, and is operating it, for free. Genealogy buffs have been using the site for a week now to check millions of records. (See our earlier story for tips on searching the 1940 Census, and examples of people who have found relatives.)

Of course, the company, Inflection LLC of Redwood City, Calif., did get something in return for its effort: a free copy of those 3.8 million images of records from the 1940 Census. While other companies paid $200,000 for a set of the public records, Inflection can use those records in its for-profit business, a genealogy site called Archives.com.

It's a barter system for federal records: the public gets a free official U.S. website, and the company gets free data. It's been done before, as when the U.S. Patent and Trademark Office gave data to Google, which since 2006 has hosted the site for free as Google Patents.


Inflection also was hoping to get a boost to its reputation for building websites that could withstand a storm of traffic.

Performance standards in the contract
Both the company and the National Archives and Records Administration (NARA) had anticipated that the site would draw a crowd, as 72-year privacy restrictions expired and the records became available. What happened next lends credence to the boast that genealogy is the country's favorite hobby.

The contract says, "Drawing from NARA's experience in releasing the 1930 Census, and the experience of the National Archives of the United Kingdom when they released their 1901 and 1911 Censuses, NARA anticipates immense interest in the 1940 Census and a tremendous increase in traffic to its www.archives.gov web site." (Here's the contract in a PDF file.)

But how much of a crowd?

Here are the performance standards in the contract:

  • "When browsing from one image to another, each image should be presented to the user in 3 seconds or less."
  • "When moving from the standard rendered image to each zoom level (e.g. zoom 1x, 2x, 3x), the reformatted image should be rendered in 2 seconds or less."
  • "Support up to 10 million hits per day while providing response times of less than three seconds for keyword searches of the descriptive metadata."
  • "Support up to 25,000 concurrent users."

There was one more element in the contract, a somewhat vague requirement that Inflection increase service if demand was greater than anticipated.

  • "Scale on demand in the event that 10 million hits and/or 25,000 concurrent users are exceeded to ensure that the performance requirements ... are still achieved."
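As a rough illustration, the standards quoted above can be written down as a simple check. This is only a sketch: the class and function names are hypothetical, and nothing here comes from the contract beyond the quoted thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractLimits:
    """Thresholds quoted in the contract's performance standards."""
    max_browse_seconds: float = 3.0   # image to image: "3 seconds or less"
    max_zoom_seconds: float = 2.0     # each zoom level: "2 seconds or less"
    max_search_seconds: float = 3.0   # keyword search: "less than three seconds"
    hits_per_day: int = 10_000_000
    concurrent_users: int = 25_000


def evaluate(limits, hits, users, browse_s, zoom_s, search_s):
    """Return (scale_needed, failures) for one day's measurements.

    scale_needed is True when either volume threshold is exceeded, which is
    when the "scale on demand" clause would apply. failures lists the timing
    standards that were missed.
    """
    scale_needed = hits > limits.hits_per_day or users > limits.concurrent_users
    failures = []
    if browse_s > limits.max_browse_seconds:
        failures.append("browse")
    if zoom_s > limits.max_zoom_seconds:
        failures.append("zoom")
    if search_s >= limits.max_search_seconds:  # contract says "less than"
        failures.append("search")
    return scale_needed, failures
```

Calling `evaluate(ContractLimits(), hits=100_000_000, users=20_000, browse_s=10.0, zoom_s=1.5, search_s=2.0)` would flag that scaling is needed and that the browse standard was missed. Only the hit count in that call reflects a reported estimate; the user count and timings are made-up inputs for illustration.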

The crowd certainly exceeded those levels, as the most old-fashioned sounding search term possible, "1940 Census," became a top "trending topic" on Google and Twitter.

Most people seemed to get little or nothing from the site on the first day, including Census leaders, who were prepared to show off how easy it was to look up their grandparents. When the site stuck on "loading image," as it did for many other users, the officials resorted to showing a PowerPoint presentation with the results from an earlier search.

A 'tsunami'
As Inflection's general manager, Joe Godfrey, told us last week, "We were expecting a flood, but we got a tsunami."

  • On Day One, Monday, an estimated 100 million hits, or requests, with 22.5 million hits in just the first three hours. Though Inflection scrambled to improve service, the site was unusable for many users on the first day. The company added more servers through Amazon Simple Storage Service, its cloud data service provider, and also restricted some features on the site (such as zooming of images), until finally it was able to get on top of the traffic.
  • On Day Two, Tuesday, the hits haven't been totaled, but the count is believed to be higher than on Day One, with an estimated 40.1 million hits in the three-hour peak.
  • By Friday, the site was stable with about 60 million hits per day, and had served up more than 80 million images, or about 61 terabytes of data, the National Archives said. (That's more than the data contained in the first 20 years of astronomical observations by the Hubble Space Telescope.) The service quality was better than called for in the contract, with a load time of about 1.8 seconds per page, according to the Archives.

In other words, this might have been a good project for a "soft launch."
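To put the reported traffic in perspective, here is a back-of-the-envelope calculation using only the figures cited above. The function names are hypothetical, and the per-image size assumes decimal units (1 terabyte = 10^12 bytes), which the National Archives did not specify.

```python
CONTRACT_HITS_PER_DAY = 10_000_000   # volume the contract planned for
PEAK_WINDOW_SECONDS = 3 * 60 * 60    # the three-hour windows cited above


def hits_per_second(hits, seconds=PEAK_WINDOW_SECONDS):
    """Average request rate over a window."""
    return hits / seconds


def multiple_of_contract(hits_per_day):
    """How many times the contract's daily volume a given day represents."""
    return hits_per_day / CONTRACT_HITS_PER_DAY


def average_image_megabytes(terabytes, images):
    """Average size per image served, assuming 1 TB = 1e12 bytes."""
    return terabytes * 1e12 / images / 1e6


if __name__ == "__main__":
    print(f"Day One, first three hours: {hits_per_second(22_500_000):,.0f} hits/sec")
    print(f"Day Two, three-hour peak:   {hits_per_second(40_100_000):,.0f} hits/sec")
    print(f"Day One vs. contract:       {multiple_of_contract(100_000_000):.0f}x")
    print(f"Friday vs. contract:        {multiple_of_contract(60_000_000):.0f}x")
    print(f"Average image served:       {average_image_megabytes(61, 80_000_000):.2f} MB")
```

By this arithmetic, the first three hours averaged roughly 2,100 requests per second, and Day One as a whole ran at about 10 times the daily volume written into the contract. Because the Archives reported "more than" 80 million images, the per-image average is an upper estimate.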

The contract called for extensive load testing before the release. We asked the National Archives for copies of those test results, but its spokeswoman said it wouldn't be able to provide them. She did say the site was tested to handle more than 70,000 simultaneous users, which is more than the contract called for, but fewer than the number that actually showed up.

A 'no-cost contract'
No-cost contracts are allowed under Federal Acquisition Regulation competitive procedures. This contract has a one-year base period and options to extend for four more one-year periods.

"NARA provided a copy of the data to Inflection at no cost, copies that were sold to others for $200K," said spokeswoman Laura Diachenko of the National Archives. "Why Inflection agreed to this is a better question for them, but we are very happy to have them as a partner. They have experience with Census data, and managing access to large data sets, the capabilities we were seeking for this project."

She added, "Even though this is called a no-cost contract, the Government did incur costs: in this case, aside from our resources, we also provided a copy of the 1940 Census to Inflection, at no cost. In this particular case, we provided them data that they wanted in exchange for hosting access to this data. Their interest was in getting the data (for their archives.com business), and for business development (attracting users to their site and eventually converting them to a subscriber)."

Inflection's Godfrey said, "The primary value for us was in building our brand/notoriety, leveraging and expanding our technical expertise/infrastructure and helping to get this extremely valuable record collection into the hands of as many people as possible. Also, our engineering team (like all great engineers) are motivated by tackling challenging technical problems, and so the team was very excited to work on this."

Competition
All or most of the 1940 Census is now available free from several other companies, which had to pay for the public records. As a sort of loss leader, other genealogy sites, even the commercial ones, are making the 1940 Census records available for free, to subscribers and non-subscribers alike.

Here's how the race worked: All the commercial sites that chose to buy the data for $200,000 were handed a rack of hard drives full of 20 terabytes of images, taken from 4,745 rolls of microfilm, at 12:01 a.m. on April 2, or 72 years and a day after the Census Day in 1940.

By Thursday, a relatively new genealogy site called myHeritage was the first to have all the images online. Also making images available for free are Ancestry.com, a commercial site, and FamilySearch.org, owned by the Church of Jesus Christ of Latter-day Saints.

Thousands of volunteers are working on the next step: indexing the records by name, just as previous Census releases have been indexed by volunteers. Until those indexes are finished, searching is done only by address or neighborhood.

Your view
Do you approve of the approach that the National Archives took, giving the data away in exchange for the free website? And what stories have you found in the 1940 Census? Add your story in the comments below or on our Open Channel page on Facebook. See our earlier story for tips on searching the 1940 Census.
