Datamash, a tool for common data-related operations in Unix shells


Having enjoyed the flexibility of PowerShell and its rich set of cmdlets, e.g., Where-Object, ForEach-Object, and Group-Object, I find the bash shell limiting. Most of the capabilities offered by PowerShell cmdlets have to be realized in bash either by combining other commands or by programming them yourself. Specifically, I am annoyed by the lack of pipeline-friendly commands for basic data-related operations, e.g., summing the numbers in a file, grouping values in a file, or getting the frequency of words in a file.

This changed today when I learned about GNU datamash, a command that performs basic numeric, textual, and statistical operations on textual input files. Now, instead of running (seq 10 | tr '\n' '+' ; echo '0') | bc to sum a sequence of numbers, I can just run seq 10 | datamash sum 1 :)
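To make this concrete, here is a minimal sketch of the three tasks mentioned earlier (summing, grouping, and word frequency) done with datamash. It assumes GNU datamash is installed and on the PATH; the sample data and the variable names (total, grouped, freq) are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU datamash is installed. Sample data is invented.

# 1. Sum the numbers 1..10 (one number per line, i.e. field 1).
total=$(seq 10 | datamash sum 1)
echo "sum: $total"

# 2. Group by field 1 and sum field 2.
#    datamash expects tab-separated fields by default.
#    -s sorts the input first (grouping needs sorted input); -g 1 groups on field 1.
grouped=$(printf 'a\t1\nb\t2\na\t3\n' | datamash -s -g 1 sum 2)
echo "$grouped"

# 3. Word frequency: put one word per line, then count occurrences per word.
freq=$(printf 'to be or not to be\n' | tr ' ' '\n' | datamash -s -g 1 count 1)
echo "$freq"
```

Each call reads from standard input, so it slots into a pipeline the same way sort or uniq would. For space-separated input, datamash's -W option treats runs of whitespace as the field delimiter.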

For my purposes, I found datamash’s support for the following operations most useful.

While I have used csvkit in the past for some of the above operations, I suspect that I will be using datamash to perform these operations in the future.

If you crunch data, then you should try out datamash.


