Split and Reassemble Files

Posted by gmendoza on June 3, 2007 under Tech Tips

If you ever need to work with a large file and wish you could split it into smaller pieces, you’ll be pleased to know that it’s extremely easy to do in Linux. You can use the “split” utility that comes standard with most *nix variations. Let’s take a look at a couple of easy examples.

To create a test file to work with, the following command creates one that’s exactly 100 megabytes (104857600 bytes; dd reports this as 105 MB because it counts in units of 1,000,000 bytes). Note, I am using ‘dd’ with /dev/urandom to fill the file with random data, so I can demonstrate that the results of the split and reassembly are completely accurate. This will be verified via MD5 hash comparisons at the end of this process.

$ dd if=/dev/urandom of=testfile bs=1k count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 23.2982 seconds, 4.5 MB/s

$ ls -lh testfile
-rw-r--r-- 1 gmendoza gmendoza 100M 2007-06-03 22:45 testfile

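To split the file into five 20 MB files (20971520 bytes each), use the split command as shown below. The -b option sets the size of each piece in bytes, -d requests numeric suffixes, and the final argument, “splitfiles”, is the prefix used to name the new files.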

$ split -b 20971520 -d testfile splitfiles
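The byte counts used here follow from simple arithmetic, which the shell itself can check. A small sketch (the variable names are only for illustration):

```shell
# dd wrote 102400 blocks of 1k (1024 bytes) each.
total=$((102400 * 1024))

# Each piece is 20 MB, counted in units of 1024 * 1024 bytes.
chunk=$((20 * 1024 * 1024))

echo "total bytes: $total"
echo "chunk bytes: $chunk"
echo "pieces:      $((total / chunk))"
```

This prints 104857600, 20971520 and 5, matching the dd output and the value passed to -b.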

Verify by listing all files that begin with “splitfiles”. Below, you see the new files with the appropriate sequence numbers as a result of the split command.

$ ls -l splitfiles*
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles00
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles01
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles02
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles03
-rw-r--r-- 1 gmendoza gmendoza 20971520 2007-06-03 22:47 splitfiles04
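If you would rather not work out the byte count by hand, GNU split also accepts size suffixes. The sketch below assumes GNU coreutils and uses a small scratch file in a temporary directory so it runs quickly; the names “demo.bin” and “part.” are made up for illustration. Here -b 20K means 20 × 1024 bytes, in the same way that -b 20M would mean 20 × 1024 × 1024 bytes.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 100 KiB of random data stands in for the 100 MB test file.
dd if=/dev/urandom of=demo.bin bs=1k count=100 2>/dev/null

# -b 20K: pieces of 20 * 1024 bytes; -d: numeric suffixes; "part." is the prefix.
split -b 20K -d demo.bin part.

# Count the pieces and list them with their sizes.
pieces=$(ls part.* | wc -l | tr -d ' ')
echo "pieces: $pieces"
ls -l part.*
```

This should leave five pieces, part.00 through part.04, of 20480 bytes each.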

To reassemble the smaller files into the original, concatenate them using a simple redirect. The shell expands the wildcard in sorted order, so the numeric suffixes keep the pieces in the right sequence.

$ cat splitfiles* > newtestfile

… and list again to show your handiwork…

$ ls -lh newtestfile
-rw-r--r-- 1 gmendoza gmendoza 100M 2007-06-03 22:52 newtestfile
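Reassembly with a wildcard depends on the pieces sorting correctly by name, which fixed-width numeric suffixes take care of. As a rough sketch, assuming GNU split, the -a option sets the suffix length explicitly, which is handy when a file is cut into many pieces. The file names below are hypothetical.

```shell
# Scratch directory for the demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 64 KiB file cut into 1 KiB pieces gives 64 pieces.
dd if=/dev/urandom of=demo.bin bs=1k count=64 2>/dev/null

# -a 3 asks for three-character suffixes: chunk.000, chunk.001, ...
split -b 1K -d -a 3 demo.bin chunk.

# The glob expands in sorted order, so the pieces are concatenated in sequence.
cat chunk.* > rebuilt.bin

first=$(ls chunk.* | head -n 1)
last=$(ls chunk.* | tail -n 1)
echo "first piece: $first"
echo "last piece:  $last"
```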

As proof that the original and the newly reassembled file are exactly the same, check that their MD5 hashes match:

$ md5sum testfile newtestfile
54a07d5011ca893eddfab29960a7f232 testfile
54a07d5011ca893eddfab29960a7f232 newtestfile
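Comparing hashes by eye works, but the check can also be scripted. The following is a minimal sketch rather than the only way to do it: it uses “cmp” for a direct byte-for-byte comparison and, as an alternative to md5sum, compares SHA-256 hashes with “sha256sum” from GNU coreutils. All file names are made up for the demonstration.

```shell
# Scratch directory with a small split-and-reassemble round trip.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/urandom of=original.bin bs=1k count=100 2>/dev/null
split -b 20K -d original.bin piece.
cat piece.* > rebuilt.bin

# cmp -s prints nothing; an exit status of 0 means the files are identical.
if cmp -s original.bin rebuilt.bin; then
    result="identical"
else
    result="different"
fi
echo "cmp says: $result"

# Compare the hash values themselves instead of reading them by eye.
orig_sum=$(sha256sum original.bin | cut -d' ' -f1)
new_sum=$(sha256sum rebuilt.bin | cut -d' ' -f1)
if [ "$orig_sum" = "$new_sum" ]; then
    echo "hashes match"
else
    echo "hashes differ"
fi
```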

Cool stuff.

Useful APT Aliases

Posted by gmendoza on June 1, 2007 under Tech Tips

If you’re an avid user of Ubuntu or other Debian-based Linux distributions, then you’re probably very familiar with using APT and its related command line utilities. You might, however, find it useful to create some command line aliases that shorten the time it takes to type out these repetitive tasks.

For example,

"sudo apt-get update" can be shortened to "agu".
"sudo apt-get install" can be shortened to "agi".
"sudo apt-get dist-upgrade" can be shortened to "agd".

A very simple way to create a set of command line aliases is to add them to the ~/.bashrc file in your home directory. Here’s an example of some of my favorite APT aliases.

# Favorite Aliases
alias agu='sudo apt-get update'
alias agi='sudo apt-get install'
alias agd='sudo apt-get dist-upgrade'
alias agr='sudo apt-get remove'
alias ags='aptitude search'
alias agsh='apt-cache show'
alias afs='apt-file search'
alias afsh='apt-file show'
alias afu='sudo apt-file update'
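Before relying on an alias, it can help to see exactly what it expands to. The sketch below defines two of the aliases above in the current shell, prints one definition with the “alias” builtin, and removes the other with “unalias”. Nothing here runs sudo or apt-get, and the variable names are only for illustration.

```shell
# Define two of the aliases for the current shell only.
alias agu='sudo apt-get update'
alias agi='sudo apt-get install'

# Given a name and no "=", the alias builtin prints the stored definition.
agi_def=$(alias agi)
echo "$agi_def"

# unalias drops a definition from the current session; ~/.bashrc is not touched.
unalias agu
if alias agu >/dev/null 2>&1; then
    agu_state="still defined"
else
    agu_state="removed"
fi
echo "agu is $agu_state"
```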

To apply the changes to your current shell session without having to log out, simply run the following command:

. ~/.bashrc

Now, if you want to install the “vim-full” package, simply issue the following command:

agi vim-full

Remember, because “sudo” has been added to your alias, you don’t have to type it every time. It will prompt you for your password the first time, and won’t ask again for the duration of the defined timeout period. Cool?

“apt-file” is a very useful package you should install. The aliases for it are defined above, but the package itself is not installed by default. It allows you to search for file names in all packages from all your defined repositories. For example, let’s say you’ve tried to run an application and it claims that you’re missing the library “libstdc++.so.5.0.7”. The following example tells you which packages contain a file with that name, so you can then install the right one.

$ afs libstdc++.so.5.0.7
libstdc++5: usr/lib/libstdc++.so.5.0.7
libstdc++5-3.3-dbg: usr/lib/debug/libstdc++.so.5.0.7

$ agi libstdc++5

Although these examples have been geared towards Debian and Ubuntu, you can use aliases on any Unix-like operating system. How you define and load them just varies depending on the shell you are using. Have fun!