LJ Archive

UpFront

Various

Issue #83, March 2001

Stop the Presses, LJ Index and more.

LJ Bashed in WSJ

It was P.T. Barnum who famously said, “I don't care what you say about me, but just spell my name right.”

So that's our rationale, here at Linux Journal, for enduring a bit of publicity that came our way via the December 14, 2000 issue of The Wall Street Journal. The “Digits” column on page B6 (the Technology Journal page) leads off with a six-inch item titled “Linux Battles”. Normally we scan pubs like the Journal for tidbits about anything and everything that might remotely relate to Linux. But this time the news wasn't just close to home—it was home. The story was about Linux Journal.

“Digits” is the WSJ's form of “UpFRONT” and shares the same appetite for irony. Unfortunately, the irony in question here involved the apparent fact that our modest little on-line store sold police-style barricade tape that says “Microsoft Free Zone” while the company that actually runs the store, WAS, Inc., was hardly Microsoft-free. It seems that the site was at least partly served up by (shudder) Microsoft Windows NT—at least while WAS moved its operation onto some kind of UNIX.

We have been working with WAS to hasten the end of this irony and expect to return the store to an equally Microsoft- and news-free condition.

Storage System Turns Latino Research Center Into Publishing Powerhouse

Founded in 1989 at Michigan State University, the Julian Samora Research Institute has one purpose: to generate, transmit and apply knowledge that will serve the needs of Latino communities in the Midwest. Grants to the Institute help to fund empirical research done by scholars and to publish their research as books or monographs. This research looks at relevant social, economic, educational and political conditions of Latino communities in both the US as a whole and the Midwest in particular. The Institute's forthcoming data will serve as a resource on and for Latinos.

The Institute started publishing small books and reports a decade ago. Since then, it has increased both its research and publishing volume tenfold. Up until three years ago, the Institute could get away with publishing a book on paper and then filing it away.

Danny Layne, who divides his time between network administration and publishing production, says: “If we needed to print a book, we'd pull it out of the file and then put it away.”

To keep up with the volume of research it had to publish, the Institute found itself producing more electronic files—files that kept getting larger and more complex. Researchers broke down chapters into multiple files. One book could consist of 20 different files. Researchers also generated electronic charts and graphics along with PowerPoint presentations. Books got published not only in hard-copy format but also on-line on the Institute's web site. Layne says: “To this end, we were generating new types of files that we never had before.”

Disk space on a desktop personal computer couldn't handle the volume being churned out. Layne says that the Institute didn't want to start adding large hard disk drives to its desktop PCs. “If one PC's disk drive failed, then we'd have to restore files from a previous backup tape and recreate what we lost. That's inefficient.”

So, with technology funds from the government and the University, the Institute decided to buy a central storage system to house all publications and the files for the web site. Since the Institute has a small computing staff and limited resources, the storage device had to be highly reliable, easy to set up and maintain and able to accommodate more storage space with the addition of more disk drives as needed. Layne notes, “Our search for a storage system brought us to Winchester Systems. We purchased a FlashDisk external RAID storage system with seven 9GB disk drives.”

Layne observes that just three years ago the Institute had virtually no storage—only a few desktop PCs. “During this time, the FlashDisk has allowed a small research department, within a large university, to turn itself into a publishing powerhouse on a small purse. Some of the other departments on campus are in awe of our storage system. And there are good reasons for it.” Two side-by-side Dell PowerEdge 2200s, one running Windows NT and one running Linux, plug directly into the FlashDisk. It provides fast, highly reliable RAID 5 storage to multiple servers with different operating systems. This feature eliminates the expense of buying storage for each server. Layne says that managing one storage system is easier than managing two or three of them.

The Windows NT server, which connects to the intranet within the building, functions as a central repository for all active publications and for the databases used to inventory and to track these publications. The FlashDisk allows each researcher to have his or her own storage space, apart from the desktop. Using either a Windows-based PC or a Mac, researchers can access Windows NT, Mac and Linux files stored on the FlashDisk.

Cross-platform programs allow the system to function as a quasi-network attached storage filer. These programs include Services for Macintosh running on the Windows NT server and Netatalk running on the Linux server. The latter program permits printers to function as network devices.

The FlashDisk also contains a large collection of artwork. Overall, the FlashDisk provides the researchers with fast access to a large bank of files: everything from text to graphics, regardless of the format, over an intranet. When a book is no longer going to be published, it gets archived to a CD-ROM or a DVD.

Meanwhile, the Linux server, which connects to the external network, contains all of the Institute's web files, as well as the web site itself. About 700 web pages reside on the FlashDisk. The web site gets about 3,000 hits each day (roughly 100,000 hits a month). Setting the space aside on the FlashDisk to store the Linux files, as well as the Linux operating system, turned out to be easier than Layne thought it would be: “We just followed the FlashDisk's instructions in the manual and made one telephone call to technical support, and then we were up and running,” he says.

While the FlashDisk provides a large amount of disk space, Layne wants to avoid having it become clogged with multiple versions of old files. He says that keeping storage neat and trim shouldn't become a time-consuming burden for a network administrator. “We've mentored our researchers to perform a number of storage housekeeping procedures. After all, they're responsible for overseeing their flow of information, including creating it, updating it, storing it, deleting it or archiving it.” For example, researchers learn how to name their files so they can easily locate them and remove them if they get old. Layne has also put a regular storage clean-up program in place. Researchers have to go through their storage space and either delete multiple copies of files or move old files to a CD. While researchers do a good job of maintaining their space, Layne says that the Institute's publishing volume has a healthy appetite for more storage space.
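The article does not say which tools, if any, the researchers use for this clean-up. As a minimal sketch of how the "multiple copies" part of the job could be automated, the following groups files with identical contents by hashing them. The function name `find_duplicates` and the choice of SHA-256 are assumptions made for illustration, not anything the Institute is reported to run.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return lists of paths under `root` whose file contents are identical.

    Hypothetical helper: it reports duplicates only and deletes nothing.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            # Read in chunks so large publication files do not fill memory.
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    # Keep only the groups that contain more than one copy.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

A researcher could run this against a personal storage directory and then decide by hand which copies to delete or move to CD.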

According to Layne, “We're planning to upgrade our 9GB drives to 18GB drives to double our amount of storage. We can do this inexpensively because Winchester Systems will give us credit toward a trade-in on the drives. The service folks at Winchester Systems must feel like the repair people at Maytag. The FlashDisk has never broken down, not even hiccuped.”
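As a rough check on the doubling claim, here is a minimal sketch of the usual RAID 5 capacity arithmetic, in which one drive's worth of space goes to parity. It assumes all seven drives sit in a single RAID 5 set with no hot spare, which the article does not state, and it ignores formatting overhead. The function name is hypothetical.

```python
def raid5_usable_gb(drive_count, drive_size_gb):
    """Usable capacity of one RAID 5 set: (n - 1) drives hold data.

    Assumes a single set with no hot spare (not confirmed by the article).
    """
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


current = raid5_usable_gb(7, 9)    # seven 9GB drives
upgraded = raid5_usable_gb(7, 18)  # the same seven bays with 18GB drives
print(current, upgraded, upgraded / current)
```

Under those assumptions, swapping every drive for one twice the size doubles usable space, because the parity overhead stays at one drive either way.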

Money from the University will allow the Institute to produce audio and video clips for the Web. Layne says, “We've already tested accessing and storing multimedia on the FlashDisk. Everything worked fine.”

—Elizabeth M. Ferrarini

Job Opening Trends

by Reginald Charney

Fuzzy Data

Analyzing the job-opening descriptions turned up a plethora of terms. For example, terms for Windows include 95, NT, CE, Win, Windows NT, etc. Besides various abbreviations, there are also ambiguous words. Is Win followed by NT to be counted as Windows and Windows NT or just Windows NT alone? Generally, the “maximum munch” rule is used: the longest recognizable term possible is taken. Then there are still the misspellings and unknown words. All this makes determining what is required fuzzy. Having said that, here are some of the statistics drawn from the job descriptions:

  • number of job openings: 129,000

  • number of unique words: 12,200

  • number of unique skill sets: 68,800
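The “maximum munch” rule described above can be sketched as a longest-match scan over the words of a job description. This is only an illustration: the vocabulary in `KNOWN_TERMS` and the function name `munch` are hypothetical, since the column does not list the terms or the code actually used.

```python
# Hypothetical vocabulary mapping word sequences to a canonical term.
KNOWN_TERMS = {
    ("win",): "Windows",
    ("windows",): "Windows",
    ("nt",): "Windows NT",
    ("win", "nt"): "Windows NT",
    ("windows", "nt"): "Windows NT",
    ("c++",): "C++",
    ("java",): "Java",
}
MAX_TERM_LEN = max(len(key) for key in KNOWN_TERMS)


def munch(text):
    """Return canonical terms found in `text`, preferring the longest match."""
    words = text.lower().split()
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink one word at a time.
        for size in range(min(MAX_TERM_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + size])
            if key in KNOWN_TERMS:
                found.append(KNOWN_TERMS[key])
                i += size
                break
        else:
            i += 1  # unknown or misspelled word: skip it
    return found


print(munch("Win NT C++ Java"))  # ['Windows NT', 'C++', 'Java']
```

Because the two-word match is tried first, “Win NT” counts once as Windows NT rather than as Windows plus Windows NT.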

Six most often used words:

  • C++: 19,951

  • Java: 16,920

  • SQL: 9,121

  • vb: 9,003

  • HTML: 8,788

  • JavaScript: 4,793

As you would expect with all these terms, skill sets made up of a couple of words would tend to appear more often than skill sets made up of three or more words. The top five skill sets were:

  • frame relay: 445

  • Oracle dba: 445

  • C++ Java: 412

  • WinNT server: 361

  • Shell script: 361

Thanks to DICE at www.dice.com for a truly valuable service and allowing me to analyze some of their data.

Reginald Charney currently heads the US chapter of the Association of the C and C++ Users. Visit their site at http://www.accu-usa.org/ to learn more.

Linus Torvalds, Then and Now

Linus Torvalds [from LJ March 1995]: “I'll make a [kernel] 2.0 someday...” “No wonder Linux works so well. He has an alpha-testing lab of ten thousand people!” says someone from Novell.

Mr. Torvalds [from LJ November 1999], married with children, well into the 2.x series of kernels and eight million users strong.

LJ Index—March 2001

  1. Market cap in billions of USD at which Yahoo is worth more than all magazines put together: 30

  2. Number of billions of USD IBM is investing in Linux software, hardware, services, partnerships and the Open Source community in 2001: 1

  3. Number of X-series servers in what IBM claims will be the world's largest Linux supercomputer: 1,024

  4. Number of racks used to house all those Linux boxes: 32

  5. Sum in trillions of USD that will be spent on Internet infrastructure and e-commerce by 2003, exceeding the gross domestic products of Germany, France and the UK: 2.8

  6. Year by which Vint Cerf expects the number of net-connected devices will outnumber the world's telephones: 2006

  7. Millions of net-connected devices, independent of cell phones, Vint Cerf predicts will exist by 2006: 900

  8. Number of airline flights in the next 24 hours: 42,300

  9. Number of people who will take those flights: 3,000,000

  10. Number of flights that will crash: 0

  11. Number of travelers in 1999 who used the Net to plan trips and make reservations: 52,000,000

  12. Percent of application developers worldwide who say they plan to write wireless applications in the next year: 40

  13. Position of Linux as a web server platform: #1

  14. Percentage of web servers that ran on Linux as of May, 2000: 36

  15. Position of Linux among fastest-growing server operating systems: #1

  16. Internet-related applications as a percentage of spending on Linux servers: 40

  17. Number of handheld and notebook information appliances by 2002: 55,000,000

  18. Year by which shipments of handhelds and notebook appliances will exceed shipments of PCs: 2005

Sources:

  • 1: Forbes

  • 2-4: IBM

  • 5: Nortel & International Data Corp. (IDC)

  • 6-7: Domain Street

  • 8-10: Boeing

  • 11-12: Evans Data Corp.

  • 13-14: Netcraft

  • 15-18: IDC

Linux Bytes Other Markets—Sticky Game Site Powered by Apache and Linux

After only a few months of operation, NeoPets.com, a web site built on a Red Hat Linux/Apache platform, is already turning a profit, recording billions of page views monthly. Targeting youths aged 20 or younger, the site enables users to create and care for their own personal virtual critter known as a “NeoPet”. It also boasts a series of constantly changing “universes” complete with games, stories, contests and entertainment. According to recent figures from PC Data Online, NeoPets attracts 2.1 billion page views and 2.3 million unique users each month, who each stay for an average of seven hours 48 minutes, making this the stickiest site on the Web.

Based on August numbers, NeoPets ranks higher in page views than Excite, Lycos and Amazon. What's more, it engenders far more loyalty (termed stickiness) among users. The average AOL user, for example, visits for 35 minutes a month, while Yahoo users spend three hours 22 minutes. In the Gen Y market, NeoPets' total of seven hours 48 minutes trounces the competition, with ten times the page views of Disney.

Initially created in a college dorm with a “launch campaign” that consisted of sending a couple of e-mails to other virtual pet sites, the site chalked up 200 sign-ups on its first day and was soon scoring as many as 200,000 page views a day. A management and technical team was then formed to create the corporate platform needed to help NeoPets expand. The team added more staff and moved the site's servers to Pixelgate, a Westlake Village, California-based web hosting and internet services company. “After being off-line for several days, we surpassed 600,000 page views within three days of getting back on-line,” said NeoPets Chairman and CEO Doug Dohring.

The company increased the number of Apache/Linux boxes from two to five, using single-CPU P3-600s as image servers and dual P3-600s for web servers, each with 512MB to 1GB RAM. Continual load expansion eventually pushed NeoPets' MySQL database technology to the limit. By this time, NeoPets was serving up to ten million page views a day. Reorganization again became a necessity.

The company secured the services of Web Zone Inc. of Santa Clara, California and Broomfield, Colorado-based Level 3, a multinational Tier One provider with hosting facilities in Los Angeles. This provided enough bandwidth to deal comfortably with anticipated traffic volumes. NeoPets then added yet more staff and purchased about 50 Red Hat/Apache web and image servers, two more MySQL Servers and a Sun server to run an Oracle database. Once the Oracle conversion was completed, page views soared to over 40 million a day.

The current NeoPets architecture comprises a Red Hat 6.2 and Apache front end, with a Solaris and Oracle back end. At the same time, MySQL is still used for a wide range of database operations.

Despite the introduction of Oracle, NeoPets remains one of the larger users of Apache on the Web. Though Oracle had to be introduced to provide a heavy-duty database, NeoPets believes that open source ultimately offers better quality and greater product reliability and remains committed to further expanding the robustness and capacity of PHP, Apache and MySQL as an alternative to Oracle.

“We are looking for people who can modify these open-source applications and take them to a new plateau,” said CTO Bill McCaffrey. “If we involve the right people, we believe that we can take these applications to the point where they can be used for even the largest sites on the Web.”

In anticipation of another summertime boom in site usage, NeoPets is planning to add many more web developers and open-source programmers, as well as system administrators and IT support staff.

—Drew Robb

The Linux Journal Award for Applied Calamari Goes To—

Open source is a fine development model, but with the obvious exception of Eric Raymond it kind of sucks at PR.

Okay, let's qualify that. There are some fine companies that get mileage out of open source as a virtue, but as an editor I can tell you that there are too darn few pure, .org-type open-source projects with a PR department (we suspect that number is zero), or with much PR instinct, by which I mean they bother editors like me with interesting information about what they're up to. Sure, we get flamed to a cinder when we neglect to mention the obvious, such as early last year when we wrongly reported that Borland's InterBase was about to become the first open-source database project, earning the outrage of some PostgreSQL folks (though surprisingly few, considering). But there isn't much outreach by the growing assortment of nuts-and-bolts open-source projects that simply make something handy that a lot of others can use.

Take proxy caching, which is very handy if you've got a lot of traffic to manage—but not much of a conversation starter except for those who (for professional or other reasons) obsess about it.

As it happens there are more than a few obsessives out there, and one of them (I forget who) told me that Squid (http://www.squid-cache.org/) is the cat's pajamas of open-source proxy servers. Well, it seems there are a pile of proprietary (presumably closed-source, certainly not free) proxy servers in the world. You can get them from Lucent, Novell, IBM, Cisco, Microsoft and the other usual suspects. Their prices run from zero to six figures. Squid is at the bottom of that range. As their FAQ puts it, “You can download Squid via FTP from the primary FTP site or one of the many worldwide mirror sites. Many sushi bars also have Squid.”

The product is competitive—literally. A group called IRCache holds frequent bake-offs (which they now call cache-offs) using Web Polygraph (http://www.polygraph.ircache.net/), a benchmarking tool developed by the National Science Foundation and a bunch of those same usual suspects. The results (also on the IRCache site) for each bake/cache-off run through many pages, many tables and many graphs. Squid leads in some places and lags in others, but it runs in the thick of every race.

Perhaps the most telling results come from this level-5 post from Matthew P. Barnson on Slashdot last year:

I can personally say that the three I've had experience with, Novell's ICS caches (which comprised ten of the twenty entrants), Network Appliance's NetCache, and Squid (on Solaris, in our case) all rock. Squid 2.3-stable1 was a dream to compile, install, and configure.

When we contacted him directly, he added this about Squid: “As an outgrowth of the Harvest Project, this venerable, free-software proxy cache sets the benchmark by which all other caches are measured.... For the price, Squid kicks some serious butt!” He also has kind words for another open-source project:

Apache web server was not specifically mentioned in the bake-off, but in my experience is extremely popular for caching services because the same server that can serve your web pages from your dorm room can also speed up your web surfing.

And he should know his stuff: he works at one of the most heavily visited sites on Earth (iMALL.com).

So let's raise a glass of sake to the Squid team and invite all the other open-source and free-software developers who envy this kind of coverage to let us know what they're up to.

Stop the Presses

What happens when the Penguin-in-Chief himself issues an oh-by-the-way e-mail to the Kernel Mailing List that takes the form of a halfhearted (dare we say half-minded?) and clearly tongue-in-cheek press release that just happens to be entirely about the long-awaited version 2.4 of the Linux kernel?

A worldwide sigh of thanks, followed by traffic jams at the ftp servers.

It went like this:

Date: Thu, 4 Jan 2001 16:01:22 -0800 (PST)
From: Linus Torvalds <torvalds@transmeta.com>
To: Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: And oh, btw...

In a move unanimously hailed by the trade press and industry analysts as being a sure sign of incipient brain damage, Linus Torvalds (also known as the “father of Linux” or, more commonly, as “mush-for-brains”) decided that enough is enough, and that things don't get better from having the same people test it over and over again. In short, 2.4.0 is out there.

Anxiously awaited for the last too many months, 2.4.0 brings to the table many improvements, none of which come to mind to the exhausted release manager right now. “It's better”, was the only printable quote. Pressed for details, Linus bared his teeth and hissed at reporters, most of whom suddenly remembered that they'd rather cover “Home and Gardening” than the IT industry anyway.

Anyway, have fun. And don't bother reporting any bugs for the next few days. I won't care anyway.

—Linus

Context: The kernel had a bit of a Y2K problem of its own. In January 2000, Linus said the 2.4 kernel would be out in the summer. Then in November he said the kernel would be released in early December. Now here it is, pretty much exactly one year, um, later. And hey: so what?

The features? USB support, symmetric multiprocessing support, a rewritten networking layer, driver updates, other good stuff.

2.6 is next. Let the wait begin.

—Doc Searls

They Said It

Never attribute malice to what can be adequately explained as pure-unfiltered-idiocy.

—Joseph E. Arruda

Black holes are where God divided by zero.

—Steven Wright

Eagles may soar, but weasels don't get sucked into jet engines.

—Anonymous Psychopath on Slashdot

Any technology distinguishable from magic is insufficiently advanced.

—Don Marti (as far as we know)

Every revolution has been preceded by hard critical thinking, the diffusion of culture, and the spread of ideas among men who are at first unwilling to listen, men concerned with solving their private economic and political problems.

—Antonio Gramsci

What is wanted is not the will to believe, but the will to find out, which is the exact opposite.

—Bertrand Russell

It is a myth that people resist change. People resist what other people make them do, not what they themselves choose to do. . . . That's why companies that innovate successfully year after year seek their people's ideas, let them initiate new projects and encourage more experiments.

—Rosabeth Moss Kanter

Long-range planning does not deal with future decisions, but with the future of present decisions.

—Peter F. Drucker

You can't depend on your judgment when your imagination is out of focus.

—Mark Twain

Discovery consists of seeing what everybody has seen and thinking what nobody has thought.

—Albert Szent-Gyorgyi

The Internet never retreats.

—Vint Cerf

Linux (le-nuks, lin-uks) noun. A version of the UNIX System V Release 3.0 kernel developed for PCs with 80386 and higher microprocessors. Developed by Linus Torvalds of Sweden (for whom it is named).

—from Microsoft Bookshelf (spotted by Wojciech Tatina)

The only piece of software I've never cursed is emacs. It changes modes effortlessly. When I'm editing a Perl script it adds the tags and checks the parens. When I edit a letter it gives me all the carriage returns in the right place. It's one piece of software, but it understands file extensions. emacs knows what I'm up to. It's okay with what I do and it tries to help. I often find Word trying to add bullet points or numbers where I don't want them. emacs never does that to me. Of course emacs and I grew up in the same environment, so maybe that makes sense.

—Clay Shirky

...being a Linux user is sort of like living in a house inhabited by a large family of carpenters and architects. Every morning when you wake up, the house is a little different. Maybe there is a new turret, or some walls have moved. Or perhaps someone has temporarily removed the floor under your bed.

—John R. Levine and Margaret Levine Young

To try to do something that is inherently impossible is always a corrupting enterprise.

—Michael Oakeshott

Bored people are the best consumers.

—John Taylor Gatto

You people just don't get it, do you? All Linux applications run on Solaris, which is our implementation of Linux.

—Scott McNealy
