Thursday, June 28, 2012

How To: Update Your Ubuntu GNU/Linux sources.list the Geeky Way

Here's my geeky tip for updating your /etc/apt/sources.list on Ubuntu GNU/Linux.

This tip is especially useful around April/October when the new Ubuntu releases are freed into the wild and the main servers are very busy.

I know what you're saying: This can easily be done from the Ubuntu Software Center via the Edit > Software Sources menu. Yes, this is true, but that's not a very geeky (or terminal-fast) thing to do, now is it? Besides, I like it better when I can initiate the sources update myself with sudo apt-get update, vs. having the Software Center do it on exit.

Here's how to change your sources.list package server setting from the command line.

1.) Open the Terminal. Simply hit CTRL+ALT+T.

2.) Run this command to update your sources.list file:

sudo sed -i.backup 's/us.archive.ubuntu.com/<your.mirror.here>/' /etc/apt/sources.list

(Substitute <your.mirror.here> with the hostname of the mirror you've chosen.)

3.) Run this command to see if your change took effect (you should see your new mirror's hostname instead of us.archive.ubuntu.com in the output).

sudo apt-get update

Related Notes:

a.) The sed command edits your sources.list file in place (and makes a backup of your current sources.list as /etc/apt/sources.list.backup). Keep in mind that if you run the command twice, the backup will be overwritten.
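If you want to see exactly what -i.backup does before touching the real file, here's a minimal sketch in a scratch directory (no sudo needed) using a hypothetical mirror name:

```shell
# Same sed pattern as above, but against a throwaway copy, with
# mirror.example.com standing in for whatever mirror you picked.
mkdir -p /tmp/sed-demo && cd /tmp/sed-demo
echo "deb http://us.archive.ubuntu.com/ubuntu precise main" > sources.list
sed -i.backup 's/us\.archive\.ubuntu\.com/mirror.example.com/' sources.list
cat sources.list         # now points at mirror.example.com
cat sources.list.backup  # the untouched original
```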

b.) Assumes your Ubuntu was installed in the USA (you can probably swap the us for your country code) - hence the original us.archive.ubuntu.com setting.

c.) Assumes you want to replace your current package source with the mirror of your choice (the one I picked is fast for me here in Seattle, WA). See your list of options for package servers by running this command:

cat /usr/share/update-manager/mirrors.cfg

Alternatively, check out the Ubuntu archivemirrors page for a list with speeds and other information.

Feel free to leave your suggestions for the better way below. Thanks!

Shannon VanWagner

Tuesday, June 19, 2012

How To Use xargs To grep (or rm) a Million Files

Sometimes even when a tidbit of technology one is studying is already very well documented, one still seeks to test it out for oneself to get a solid sense of the true behaviour of the subject. Plus, if you're like me, writing about a particular subject has the added benefit of committing it to memory.

And so it is for the reason of teaching myself that I document these already well-known points about grep and xargs.

Of course, as a side effect, if anyone else out there ends up learning from my writings too, that would be perfectly fabulous in my eyes as well.

Basically, the question in my mind is this: How do I successfully grep (search) for something in a directory that contains hundreds of thousands, or perhaps more, individual files?

To illustrate an example: Using the grep command by itself to search through hundreds of thousands of files provides the following result on my Ubuntu 12.04 GNU/Linux system. The below directory contains 200,000 files.

$ grep 'rubies' *
bash: /bin/grep: Argument list too long

So why would I receive the error "Argument list too long" for this example? The key is to look at the number of characters in the argument list I'm dealing with when using grep * in a directory with a large number of files (as in the example above). Take a look at this example, which counts and displays the number of characters in the arguments passed to echo.

$ echo * | wc -c

The above command uses echo to enumerate all the names of the files in the current directory with the wildcard "*". The results are then piped to the word-count (wc) program, which shows the number of characters via the -c flag.

So as you can see, when applying "*" to a command, the problem isn't really the number of files retrieved as arguments, but the total length of all the filenames globbed together when they're handed to the command as its argument list.

If the number of characters you retrieve with the command above is greater than the pre-set "ARG_MAX" value on your system, that's when you will get the "Argument list too long" error with a command being used to process a great number of files.

Here's one example of how to find the ARG_MAX value:

$ getconf ARG_MAX

Obviously, if the number of characters submitted to my grep command is greater than the number shown for the ARG_MAX setting, I will not be able to process a command that uses * with that size of argument.
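The comparison between the two numbers can be scripted directly; here's a small sketch that reports whether a "*" glob in the current directory would fit under the limit:

```shell
# Compare the byte length of the glob-expanded argument list in the
# current directory against the kernel's ARG_MAX limit.
args_len=$(echo * | wc -c)
max_len=$(getconf ARG_MAX)
echo "arguments: ${args_len} bytes, ARG_MAX: ${max_len} bytes"
if [ "$args_len" -ge "$max_len" ]; then
    echo "a '*' glob here would overflow the argument list"
else
    echo "a '*' glob here still fits"
fi
```

(In practice the usable limit is a bit smaller than ARG_MAX, since environment variables count against the same budget, but this gives the right order of magnitude.)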

So the answer to this argument-list problem is to use GNU xargs from the Free Software Foundation.

Here's an excerpt from the xargs man page:

       xargs - build and execute command lines from standard input

       This manual page documents the GNU version of xargs.  xargs reads items from the
       standard input, delimited by blanks (which can be protected with double or  sin‐
       gle  quotes  or  a  backslash) or newlines, and executes the command (default is
       /bin/echo) one or more times with any initial-arguments followed by  items  read
       from standard input.  Blank lines on the standard input are ignored.

       Because  Unix  filenames can contain blanks and newlines, this default behaviour
       is often problematic; filenames containing blanks and/or newlines are incorrect‐
       ly  processed  by xargs.  In these situations it is better to use the -0 option,
       which prevents such problems.   When using this option you will need  to  ensure
       that  the  program which produces the input for xargs also uses a null character
       as a separator.  If that program is GNU find for  example,  the  -print0  option
       does this for you.

(For the complete manual, please run man xargs.)

In this writeup, I want to focus on the details of the second paragraph. Specifically, I want to document some tests that show why you should use the find command's -print0 option together with the xargs -0 option to overcome problems like spaces in filenames, and to overcome the "Argument list too long" error.

Anyways, here's how you can see how things respond, and which way is the wrong way vs. the right. DISCLAIMER: These tests are experimental only, and I cannot be held responsible for any damage you cause to your machine while testing these commands for yourself. So make a backup of your important data and use caution when entering the commands.

Let's start by making a directory for our 200,000 files, then cd into it. (Creating the files took my computer about 8.3 seconds.)

mkdir dirWith200KFiles
cd dirWith200KFiles

Now, create 200,000 files (named file-1 thru file-200000), and echo some text into them (with just a few taps of your fingers). Note: this same process will work for a million or more files, e.g., just replace {1..200000} with {1..1000000}.

for eachfile in {1..200000}; do
    echo "yes there is something here" > file-$eachfile
done

Now, let's hide a gem in one of the files so we can search for it with grep later.

echo "rubies diamonds and gold" >> file-78432

And, let's add a file with spaces in the name so we can break some commands with that too.

echo "spaces in filename" > "myfile spaces inname"

At this point we can conduct a search with grep, and experience what might happen when one is trying to find a gem in such a large set of files and in a file with spaces in the name.

$ grep 'rubies' *
bash: /bin/grep: Argument list too long

So, in the above example, grep fails because of "Argument list too long". To resolve the problem, see the CORRECT example below.

A CORRECT way to use xargs with grep:

$ find . -type f -print0 | xargs -0 grep 'rubies'
./file-78432:rubies diamonds and gold

In the above example, the find command lists the regular files (-type f) in the current directory, and -print0 makes it terminate each filename with a null character instead of a newline. That null-delimited stream is piped to xargs, which with -0 splits its input only on null characters (the format -print0 produces), so filenames containing spaces survive intact, and runs grep 'rubies' on the results. As you can see in the output, this is how it's supposed to work.
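If you don't feel like generating 200,000 files to try this, here's a tiny reproducible version of the same pipeline in a scratch directory, with hypothetical filenames:

```shell
# Two small files, one with spaces in its name, one without.
mkdir -p /tmp/xargs-demo && cd /tmp/xargs-demo
echo "rubies diamonds and gold" > "file with spaces"
echo "nothing to see here" > plainfile
# -print0 / -0 keep the space-laden name intact; -l prints the
# matching filename rather than the matching line.
found=$(find . -type f -print0 | xargs -0 grep -l 'rubies')
echo "$found"
# prints: ./file with spaces
```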

Here are a few variations with explanations that show how these are the WRONG way to use the grep and xargs commands.

$ find . -type f | xargs -0 grep 'rubies'
xargs: argument line too long

In the above example, find emits newline-separated names, but xargs -0 splits its input only on null characters. Since the input contains no nulls, xargs treats the entire 200,000-file listing as one gigantic argument, which exceeds the length limit.

$ find . -type f -print0 | xargs grep 'rubies'
xargs: Warning: a NUL character occurred in the input.
It cannot be passed through in the argument list.
Did you mean to use the --null option?
xargs: argument line too long

In the above example, find terminates each name with a null character because of -print0, but xargs without -0 doesn't expect nulls in its input, so it refuses to pass them through - and, as the warning suggests, the fix is the --null (-0) option.

And finally, the most chaotic example that has the potential to cause problems. Especially if using xargs to do something more destructive than grep, e.g. rm (remove) files:

$ find . -type f | xargs grep 'rubies'
grep: ./myfile: No such file or directory
grep: spaces: No such file or directory
grep: inname: No such file or directory
./file-78432:rubies diamonds and gold

In the above example, xargs word-splits the output from find, so our "myfile spaces inname" file arrives at grep as three separate (nonexistent) filenames. The grep still succeeds in finding the correct result, but with a more destructive command the damage could already be done.
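To see the word-splitting directly, here's a sketch using a throwaway helper script (count-args is a name I made up) that just reports how many arguments it received:

```shell
# Scratch directory with one file whose name contains two spaces.
mkdir -p /tmp/split-demo && cd /tmp/split-demo
printf '%s\n' '#!/bin/sh' 'echo "$# argument(s)"' > count-args
chmod +x count-args
touch "name with spaces"
# Plain pipe: the single filename is word-split into three arguments.
find . -type f -name 'name*' | xargs ./count-args
# Null-delimited pipe: the filename arrives as one intact argument.
find . -type f -name 'name*' -print0 | xargs -0 ./count-args
```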

Here are some of the same tests using the command "rm" instead:


CORRECT:

$ find . -type f -print0 | xargs -0 rm

WRONG (In my case of deleting 200K files anyway):

$ rm *
bash: /bin/rm: Argument list too long


WRONG (word-splitting mangles the filename with spaces):

$ find . -type f | xargs rm
rm: cannot remove `./myfile': No such file or directory
rm: cannot remove `spaces': No such file or directory
rm: cannot remove `inname': No such file or directory
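As an aside - and this is a different technique than the xargs one this post is about - GNU find can do the removal itself with -delete, sidestepping both the argument-list limit and the word-splitting problem. A sketch in a scratch directory:

```shell
# Make some disposable files, including one with spaces in its name,
# then let find delete everything it matches with no xargs involved.
mkdir -p /tmp/delete-demo && cd /tmp/delete-demo
touch file-{1..100} "a file with spaces"
find . -type f -delete
ls -A   # prints nothing: the directory is now empty
```

Be extra careful with -delete: it acts on whatever the preceding tests match, so always dry-run the same find command without -delete first.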

So there it is. Problem solved. I'm definitely not saying this is the only way to do it. But now you can get your searching for text in large sets of files on like never before.


Shannon VanWagner

Friday, June 8, 2012

Ubuntu 12.04 GNU/Linux + HP 8100 or Ricoh Aficio MP 3500 = Printing Success

Here's a quick write-up on my real-life experience with adding the HP LaserJet and Ricoh Aficio MP 3500 as printers in Ubuntu GNU/Linux.

I chose to make a note of this simple task because I was tripped up by it at first. The problem? The default driver setting produced nothing but garbage at the printer. After some simple trial and error, I figured out that I needed to switch the driver settings as noted below.

Ricoh Aficio 3500 Driver:
Ricoh Aficio MP 3500 PXL

HP Laserjet 8000 series Driver:
HP LaserJet 8000 Series pcl3, hpcups 3.12.x

Adding a printer in Ubuntu GNU/Linux 12.04 is really easy, simply follow these steps:
  1.  Click the Power icon > Printers > Add + 
  2. Expand the Network Printer section > click AppSocket/HP jetDirect
  3. Enter the hostname or IP address for the printer, click Forward (and pause, as the system will attempt to detect the printer)
  4. Select the printer from the database (or leave as detected) > click Forward
  5. If you aren't given the option to select a specific driver, accept the default; you can come back later via Printers > Properties (for the printer you want to modify) and set the driver that way.
Basically, if your printer is not working with the default setting (usually postscript), I suggest trying the pcl3 or pxl drivers instead.
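For the terminal-inclined, the same JetDirect printer can also be added from the command line with CUPS's lpadmin tool. This is only a sketch - the queue name, IP address, and PPD path below are all assumptions you'd replace with your own values:

```shell
# Hypothetical values throughout: adjust the queue name (-p), the
# printer's address in the socket:// URI, and the path to the PPD (-P).
sudo lpadmin -p Ricoh3500 -E \
     -v socket://192.168.1.50:9100 \
     -P /path/to/Ricoh-Aficio_MP_3500_PXL.ppd
```

The -E flag enables the queue and accepts jobs immediately; afterwards you can check it with lpstat -p.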

Typically, printing works great with the default settings in Ubuntu anyway; I just wanted to point out that if it doesn't, you should try switching to an alternate driver.

Hopefully this helps someone out there. Please feel free to leave your on-subject, constructive comments below.

Note: If you're looking for the *.ppd file for Ricoh Aficio MP 3500 PXL (can be imported as a printer driver), see this link.


Shannon VanWagner