While working through large text files for work, I found several command-line tools I hadn't known about before. They are very useful, especially if you deal with comma- or tab-delimited files (I do, a great deal).
For example you can cut specific columns out of a file with the cut command:
cut -f [fieldnumber] textfile
You can also specify a list of fields (-f1,5), a range (-f1-5), or a starting point through the end of the line (-f2-). By default the field delimiter is TAB, but you can change it with the -d option:
cut -d , -f1,2 textfile
By default it writes the result to standard output, so you will need to redirect (>) to a file if needed. The --output-delimiter option (available in GNU cut) lets you change the delimiter of the output.
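Here is a small sketch of the cut options above. The file names and the sample data are made up for illustration, and the last command assumes GNU cut, since --output-delimiter is not available in every implementation.

```shell
# Create a small comma-delimited sample file (hypothetical data).
printf 'id,name,city\n1,alice,paris\n2,bob,rome\n' > sample.csv

# Keep only fields 1 and 3, using a comma as the input delimiter.
cut -d , -f1,3 sample.csv > cols.csv

# Keep everything from field 2 to the end of each line.
cut -d , -f2- sample.csv > rest.csv

# GNU cut only: same fields as above, but written with ';' between them.
cut -d , -f1,3 --output-delimiter=';' sample.csv
```

After running this, cols.csv holds the id and city columns, and rest.csv holds the name and city columns.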
The paste command does exactly the reverse:
paste file1 file2 > resultfile
It joins each line of file1 with the corresponding line of file2, separated by a TAB, which is useful if you have to add a specific column from another file.
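To make that concrete, here is a minimal sketch with two made-up one-column files. The file names are hypothetical. The -d option is standard in paste and sets the delimiter used between the joined columns.

```shell
# Two hypothetical one-column files with the same number of lines.
printf 'alice\nbob\n' > names.txt
printf 'paris\nrome\n' > cities.txt

# Join the files line by line; columns are separated by a TAB by default.
paste names.txt cities.txt > people.tsv

# Same join, but with a comma between the columns.
paste -d , names.txt cities.txt > people.csv
```

people.tsv now has one tab-delimited line per person, and people.csv has the same data delimited by commas.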
Another nice utility is the “comm” command. I recently found it, and it’s a lifesaver if you have two files that contain similar elements and you want to find out which lines are common and which aren’t. As a prerequisite, both files must be sorted, otherwise you won’t get the right results. comm then prints three columns: lines only in the first file, lines only in the second file, and lines common to both.
sort file1 > file1.sorted
sort file2 > file2.sorted
comm -13 file1.sorted file2.sorted
In this particular example I tell comm to suppress the lines that are only in the first file (-1) and the common lines (-3), so that you effectively get only the lines unique to file2. The other option is -2, which suppresses the lines unique to file2. Again, the output goes to standard output, so if you need to save it you’ll have to redirect it to a file.
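The three option combinations can be tried on a pair of tiny files. This is only a sketch with invented file names and contents; it assumes sort and comm run under the same locale so that they agree on the ordering.

```shell
# Two hypothetical unsorted lists that share some entries.
printf 'pear\napple\ncherry\n' > list1.txt
printf 'cherry\nbanana\napple\n' > list2.txt

# comm expects sorted input.
sort list1.txt > list1.sorted
sort list2.txt > list2.sorted

# Lines found only in the second file.
comm -13 list1.sorted list2.sorted > only2.txt

# Lines found only in the first file.
comm -23 list1.sorted list2.sorted > only1.txt

# Lines common to both files.
comm -12 list1.sorted list2.sorted > common.txt
```

With this data, only2.txt contains banana, only1.txt contains pear, and common.txt contains apple and cherry.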
That’s all for now. As you can see, the command line can do a few interesting things (try that, cmd.exe), and this merely scratches the surface (search for other powerful commands such as awk or sed and you’ll see what I mean). By doing things this way I manage my files far more efficiently than with a GUI.