One of the top requests we get for NewsBreak is that we download and cache the web pages that the “Read More” links go to. The reason people want this is that many news feeds don’t provide the full story. They just give you a teaser. If you’re offline, this means you can’t read the entire story without going back online. Annoying, I know, especially for those feeds that only give you like one line of the story.
So why don’t we do it? I’m glad you asked! Read on for the full story…
How a Newsfeed Works
Before I go into the details, I’ll offer the world’s shortest lesson on RSS. No technical details here. Just the nuts and bolts of where the content of a news feed comes from. When you subscribe to a news feed you get a specific bunch of content. We, as the news reader, have no control over this content. The provider determines how much you get, whether images are included, etc.
So if the provider only wants you to read the first two lines, all we can provide you with are the first two lines. If the provider sends the entire article, we give you the entire article! Unfortunately, most providers only send one or two lines of a story and then include a link to the whole thing.
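For readers who do want a peek under the hood, here is a minimal sketch in Python of what a news reader has to work with. The feed below is invented for illustration (the titles and example.com URLs are hypothetical, not from any real provider), and this is not how NewsBreak itself is implemented. It only shows that each item arrives with a title, whatever text the provider chose to put in the description, and a link.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, made up for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>Big Story</title>
      <link>http://example.com/big-story</link>
      <description>Only the first line or two of the story...</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return what a reader can show for each item: title, teaser, and link."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            # The description holds as much (or as little) as the provider sent.
            "teaser": item.findtext("description", default=""),
            # The link is where "Read More" would take you.
            "read_more": item.findtext("link", default=""),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"])
    print(entry["teaser"])
    print("Read More:", entry["read_more"])
```

Nothing in the feed contains the rest of the story. The only way to get it is to follow the link to the provider's site.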
The Economics of Newsfeeds
Why do they do this? Another excellent question! The fact of the matter is that these sites typically make money when people actually visit them. Unique visits, click-throughs, ad clicks, and other typical “web money makers” provide the source site with income. If they send you the entire story, you never visit the site, and they never make their money! For many sites the news feed isn’t so much a service as it is a teaser to get you to visit the actual site!
The Trouble with Caching: Part I
This brings us to the problem. If we start going out to the links in a news feed and grabbing the entire page that each one leads to, we’ve just killed the potential income that the source site hoped to make. When you read the article offline, not only can this bypass their metrics (their means of measuring hits), but it may even eliminate the ads completely. Needless to say, the providers really, really, really don’t approve of this!
Technical Note: Yep…this is a simplification but the general point is valid. The sites want you to visit, not download their pages.
The Trouble with Caching: Part II
Here is the really ugly one though. Copyright. If we go out, grab an entire web page and copy it to your device, we’ve just duplicated copyrighted content.
Is this legal?
Maybe it is, maybe it isn’t. To my knowledge this hasn’t been played out in court yet (not in the situation we are in, anyhow).
Do I want Ilium Software to be the company CNN, Disney, Time Warner, or others send their lawyers after to resolve this issue?
Do I need to answer this one?
And this isn’t just a legal issue. We’re in the business of selling software. Copyright is a big deal to us. There is no way we are going to do anything that even comes close to violating the copyright of someone else. It goes against our core beliefs as individuals and as a company.
The Moral of the Story
For now, I don’t intend to put page caching on our list of new features for the next version of NewsBreak. Anything is possible of course and we never close the door completely on a good idea. At the same time, until we hear something that solidly invalidates the problems listed above, it just isn’t something we feel comfortable doing.