CSVs and the command line
I occasionally work with large csv files. Full-featured programs often slow to a crawl on files that size, which is frustrating when I only have a few simple questions. By large, I mean csvs ranging from a thousand lines all the way up into the gigabyte range. It can be much faster to get those answers from a few simple command line tools. Here are a few tips that I've picked up over the years. I'll use "csv" to stand for path/to/csv for simplicity.
wc -l csv => number of lines in the csv
head -100 csv => Shows the first 100 lines. Use -1 to see just the header row
tail -100 csv => Shows the last 100 lines
You can combine the commands to access any contiguous range of lines
head -1000 csv | tail -50 => Shows lines 951-1000
tail -50 csv | head -20 => Shows the first 20 of the last 50 lines of the file
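To check the line arithmetic, here is a minimal sketch on a throwaway file. The file and its contents are made up for illustration, and head -n 1000 is just the spelled-out form of head -1000. The general pattern for lines M through N is head -n N followed by tail -n (N - M + 1).

```shell
# Build a hypothetical 2,000-line file where each line carries its own line number.
csv=$(mktemp)
seq 1 2000 | sed 's/$/,value/' > "$csv"

# Lines 951-1000: keep the first 1000 lines, then the last 50 of those.
range=$(head -n 1000 "$csv" | tail -n 50)

first_line=$(printf '%s\n' "$range" | head -n 1)
last_line=$(printf '%s\n' "$range" | tail -n 1)
line_count=$(printf '%s\n' "$range" | wc -l | tr -d ' ')
echo "$first_line .. $last_line ($line_count lines)"

rm -f "$csv"
```

If the file has fewer than 1000 lines, head returns everything and tail takes the last 50 of that, so the line numbers shift.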
With head and tail you can access any lines in the csv you want, but they give you every column. That can be difficult to read through if the csv you're dealing with has a lot of columns, or a lot of content in the columns. Luckily we can use the cut command to filter for the columns we want
cut -d , -f 1 csv => Shows you the first column of the csv. NOTE: the -d flag specifies the delimiter. You can specify any single character. The default is the TAB character, so if you're working with a tsv (tab separated values) you can omit the -d flag entirely. On the other hand, cut is not sophisticated: it splits on every comma, so if your csv has commas inside quoted values it will return incorrect values. This works best for large csvs that don't have commas in the values. Okay, back to more copy-pastable commands!
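To make the comma caveat concrete, here is a small sketch with a made-up two-row file (the names and values are invented for the example). The second row has a comma inside a quoted value, and cut splits right through it.

```shell
# Hypothetical file: a header row plus one row with a comma inside a quoted value.
csv=$(mktemp)
printf '%s\n' 'id,name,city' '1,"Smith, Jane",Boston' > "$csv"

# Ask for the third column, which should be the city.
cities=$(cut -d , -f 3 "$csv")
printf '%s\n' "$cities"
# The header row gives "city" as expected, but the data row gives ' Jane"'
# because cut treated the comma inside the quotes as a delimiter.

rm -f "$csv"
```

If your data looks like this, a csv-aware tool is the safer choice for column extraction.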
cut -d , -f 2,5,10 csv => Shows the 2nd, 5th, and 10th columns
If you're working with a large csv you might have realized that this isn't particularly useful on its own because it prints out every line. However, we can combine cut with head and tail to see any part of the csv we want!
cut -d , -f 2,4 csv | head -20 => View columns 2 and 4 but only the first 20 lines
head -20 csv | cut -d , -f 2,4 => Same output as above, but this should run faster because cut only has to process the 20 lines that head passes along instead of reading from the whole file. These commands are usually so fast that the order doesn't matter, though. Unless your csvs are pushing the 100MB range you probably won't notice a difference
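If you want to convince yourself that the two orderings really give the same result, a quick check along these lines should do it. The sample file is generated on the spot and is purely hypothetical.

```shell
# Hypothetical 5,000-row file with four columns per row, e.g. "1,a1,b1,c1".
csv=$(mktemp)
seq 1 5000 | awk '{ print $1 ",a" $1 ",b" $1 ",c" $1 }' > "$csv"

# Same columns and same 20 lines, with the two commands in either order.
cut_then_head=$(cut -d , -f 2,4 "$csv" | head -n 20)
head_then_cut=$(head -n 20 "$csv" | cut -d , -f 2,4)

if [ "$cut_then_head" = "$head_then_cut" ]; then
  same=yes
else
  same=no
fi
echo "identical output: $same"

rm -f "$csv"
```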
Let's put them all together
head -1000 csv | tail -50 | cut -d , -f 10,15 => View columns 10 and 15 of lines 951 to 1000
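If you find yourself typing that pipeline a lot, you could wrap it in a small shell function. The name csv_slice and its argument order are my own invention for this sketch, and it carries the same assumption as before: no commas inside values.

```shell
# csv_slice FILE START END COLS
# Hypothetical helper: print columns COLS (a cut -f list such as 10,15)
# for lines START through END of FILE.
csv_slice() {
  file=$1; start=$2; end=$3; cols=$4
  count=$((end - start + 1))
  head -n "$end" "$file" | tail -n "$count" | cut -d , -f "$cols"
}

# Demo on a made-up 100-line, 3-column file with rows like "11,x11,y11".
csv=$(mktemp)
seq 1 100 | awk '{ print $1 ",x" $1 ",y" $1 }' > "$csv"

# Columns 1 and 3 of lines 11 through 13.
slice=$(csv_slice "$csv" 11 13 1,3)
printf '%s\n' "$slice"

rm -f "$csv"
```

With this in place, the example above would be csv_slice csv 951 1000 10,15.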
There are plenty more options to uncover. Run man cut to view them all. There are a few other commands you can put in the pipeline too which can be helpful, such as sort.
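As one example of where sort fits in, here is a sketch that orders rows by a numeric column while keeping the header on top. The file contents are made up. For sort, -t sets the field separator, -k 2,2 restricts the sort key to column 2, -n sorts numerically, and -r reverses the order.

```shell
# Hypothetical file with a header row and three data rows.
csv=$(mktemp)
printf '%s\n' 'name,score' 'carol,72' 'alice,95' 'bob,88' > "$csv"

# Print the header as-is, then the remaining rows sorted by column 2, highest first.
# tail -n +2 means "start at line 2", which skips the header.
sorted=$(head -n 1 "$csv"; tail -n +2 "$csv" | sort -t , -k 2,2 -n -r)
printf '%s\n' "$sorted"

rm -f "$csv"
```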
Happy CSV-ing!











