LJ Archive

Letters

Hardware Interrogation

I've just read Federico Kereki's article about interrogating a Linux system titled “What's in the Box? Interrogate Your Hardware” in the December 2015 issue. I love this kind of article and hope to see more!


Brian Clark

Federico Kereki replies: Thanks, Mr Clark, for your kind words. The article grew out of my actual need to know about the hardware in my own machine, and because of Linux's openness, I learned even more than I had expected. I'm glad you liked my results!

Server Hardening

Regarding Greg Bledsoe's “Server Hardening” article in the November 2015 issue: great article, with lots of detailed help in one source. One question: could you name the specific logs you refer to in the article? I quickly get lost among the many logs scattered around my Debian server.


Tom Browder

Bug in Script

I believe that the final version of the script that Dave Taylor came up with in his Work the Shell column titled “Analyzing Comma-Separated Values (CSV) Files” in the December 2015 issue of Linux Journal contains an oversight. Specifically, it does not handle the case in which more than one field contains commas (for example, the dollar amount field and the comment field). I have modified Dave's script to take this into account. Hopefully, this will be of some help. I always enjoy Dave's column and have learned a lot from it. Here's the modified script:


#! /bin/bash -

# fixcsv

# fix CSV files with embedded commas

# The problem is that some spreadsheet fields may contain
# commas. In the sample case, this includes the dollar
# amount and comment fields. I believe you overlooked the
# case in which both the dollar amount and comment fields
# contain commas. Your script assumes that there is at
# most one such instance.

# The simplest solution is to export the spreadsheet
# contents with some field delimiter that can never appear
# in any field, e.g., a tab. Then write the script
# using this delimiter.

# Original code

# while read inline
# do
#   if [ ! -z "$(echo $inline | grep \")" ]
#   then
#     f1=$(echo $inline | cut -d\" -f1)
#     f2=$(echo $inline | cut -d\" -f2)
#     f3=$(echo $inline | cut -d\" -f3)
#     echo $f1`echo $f2|sed 's/,//g'`$f3
#   else
#     echo $inline
#   fi
# done
# exit 0

# This works correctly ONLY when there is EXACTLY ONE
# field enclosed in double quotes.

# Revised code

# For each line that contains at least one field enclosed
# in double quotes, process each such field from left to
# right until no fields are enclosed in double quotes and
# all remaining commas are field separators. The steps are:
# (1) replace the double quotes enclosing the field being
# processed with a temporary delimiter to isolate that
# specific field, (2) remove any commas embedded in the
# isolated field, (3) reconstruct the line without the
# temporary delimiters. The temporary delimiter must be
# a single character (for the cut command) that cannot
# appear in the input file. I selected an asterisk (*),
# but other characters can be used. Some characters (such
# as asterisk, colon, hyphen, and equals) work fine, while
# others (such as tab and semicolon) do not.

td=*  # temporary delimiter

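while IFS= read -r inline   # -r keeps backslashes, IFS= keeps leading blanks
do
  # "$inline" is quoted throughout so the shell never glob-expands
  # the * delimiter or collapses runs of spaces
  while [ -n "$(echo "$inline" | grep \")" ]
  do
    inline=$(echo "$inline" | sed "s/\"/$td/" | sed "s/\"/$td/")
    f1=$(echo "$inline" | cut -d"$td" -f1)   # text before the field
    f2=$(echo "$inline" | cut -d"$td" -f2)   # the isolated field
    f3=$(echo "$inline" | cut -d"$td" -f3)   # text after the field
    inline="$f1$(echo "$f2" | sed 's/,//g')$f3"
  done
  echo "$inline"
done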
exit 0

# Test input file fixcsvtest.txt:

$ cat fixcsvtest.txt
4/7/14,subscriptions,199.99,Ask Dave Taylor Monthly
4/10/14,subscriptions,"1,300.99",Linux Journal
4/10/14,subscriptions,"1,300.99","Linux Journal, APR 2014"
4/10/14,subscriptions,19.99,"Linux Journal, annual"

ab,cd,ef,gh
ab,cd,ef,"g,h"
ab,cd,"e,f",gh
ab,cd,"e,f","g,h"
ab,"c,d",ef,gh
ab,"c,d",ef,"g,h"
ab,"c,d","e,f",gh
ab,"c,d","e,f","g,h"
"a,b",cd,ef,gh
"a,b",cd,ef,"g,h"
"a,b",cd,"e,f",gh
"a,b",cd,"e,f","g,h"
"a,b","c,d",ef,gh
"a,b","c,d",ef,"g,h"
"a,b","c,d","e,f",gh
"a,b","c,d","e,f","g,h"
$

# Test Results

$ ./fixcsv < fixcsvtest.txt
4/7/14,subscriptions,199.99,Ask Dave Taylor Monthly
4/10/14,subscriptions,1300.99,Linux Journal
4/10/14,subscriptions,1300.99,Linux Journal APR 2014
4/10/14,subscriptions,19.99,Linux Journal annual

ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
$




Jeff Mumma
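The same three steps (isolate the next quoted field, strip its commas, rebuild the line) can also be sketched with bash parameter expansion alone. This is not Jeff's code: the function names fix_line and fixcsv_pe are hypothetical, the sketch needs bash rather than plain sh, and it assumes quoted fields never contain escaped quotes (""). Because the quotes are located directly with ${var%%\"*} and ${var#*\"} instead of being rewritten to a temporary delimiter, no delimiter character has to be reserved, and no sed, cut, or grep process is started for each field.

```shell
#!/bin/bash
# fix_line (hypothetical name): apply the left-to-right steps to one CSV
# line using bash parameter expansion instead of sed and cut.
# Assumes quoted fields never contain escaped quotes ("").
fix_line() {
  local line=$1 pre rest field post
  while [[ $line == *\"* ]]; do
    pre=${line%%\"*}        # text before the opening quote
    rest=${line#*\"}        # text after the opening quote
    field=${rest%%\"*}      # field contents up to the closing quote
    if [[ $rest == *\"* ]]; then
      post=${rest#*\"}      # text after the closing quote
    else
      post=''               # unbalanced quote: the field runs to end of line
    fi
    line=$pre${field//,/}$post   # rebuild without the quotes or embedded commas
  done
  printf '%s\n' "$line"
}

# fixcsv_pe (hypothetical name): filter stdin to stdout, one line at a time.
# The "|| [ -n ... ]" also processes a last line that lacks a trailing newline.
fixcsv_pe() {
  local inline
  while IFS= read -r inline || [ -n "$inline" ]; do
    fix_line "$inline"
  done
}
```

Sourced into bash, fixcsv_pe < fixcsvtest.txt should reproduce the Test Results above, since that sample input contains no escaped quotes.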

Dave Taylor replies: Thanks for your note, Jeff. I do believe you're correct that I didn't test the case where more than a single field of the input data included commas. Bah, pesky debugging! I like your mods, and yet I still have a niggling sense that the entire problem can be sidestepped with the perfect regular expression. If only I had a few weeks to create it!
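A regular expression can in fact get quite close. A comma belongs inside a quoted field exactly when an odd number of double quotes precede it, and a POSIX extended regular expression can express that. The following is a minimal sketch, not from the column: strip_quoted_commas is a hypothetical name, it assumes fields never contain escaped quotes (""), and it relies on sed -E, which GNU sed and BSD sed both accept. The sed script deletes one embedded comma per pass, loops with the :a label and the t command until nothing more matches, and then removes all the quotes.

```shell
#!/bin/bash
# strip_quoted_commas (hypothetical name): reads CSV lines on stdin.
# The regex matches from the start of the line up to a comma preceded by
# an odd number of double quotes, i.e. a comma inside a quoted field:
#   ([^"]*"[^"]*")*   zero or more complete "..." fields (even quote count)
#   [^"]*"            text up to one more, opening, quote
#   [^",]*,           field text up to the first embedded comma
# \1 keeps everything before that comma, so the comma is dropped;
# "ta" repeats until no embedded comma is left, then all quotes go.
strip_quoted_commas() {
  sed -E \
    -e ':a' \
    -e 's/^(([^"]*"[^"]*")*[^"]*"[^",]*),/\1/' \
    -e 'ta' \
    -e 's/"//g'
}

printf '%s\n' 'ab,"c,d","e,f",gh' | strip_quoted_commas
# prints ab,cd,ef,gh
```

Each pass removes exactly one comma, so the loop always terminates; for Jeff's sample file the output should match his Test Results.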

Photo of the Month

I thought you would like to see an unusual place where LJ is being read this month: 49 degrees north, 35 degrees west. That's the middle of the Atlantic Ocean, at 20 knots heading for NYC. Satellite Internet is rather expensive on board, so I brought a few issues with me. Must dash, as the sun is over the yardarm.


Roger Greenwood
