The join command, as the name suggests, outputs the contents of two files joined together based on a common field. For example, consider the following files of names:
$ cat names
1 Pike
2 Grog
3 Scanlan
4 Shaun
5 Tiberius
6 Taryon
$ cat surnames
1 Trickfoot
2 Strongjaw
3 Shorthalt
4 Gilmore
5 Stormwind
6 Darrington
I would like to see the full names for each individual. It’s as simple as join names surnames:
$ join names surnames
1 Pike Trickfoot
2 Grog Strongjaw
3 Scanlan Shorthalt
4 Shaun Gilmore
5 Tiberius Stormwind
6 Taryon Darrington
That was easy! Note the numbers at the start of each line: this is the join field, which join uses to decide which lines of the two files belong together. Unless otherwise specified, join uses the first field of each file, and it expects both files to be sorted on that field. If I append another line to names, for example 7 Percy, then join the files again, there is no seventh line in surnames to join with:
$ join names surnames
[...]
6 Taryon Darrington
No Percy there. However, if I add the -a option with a value of 1, we get the following:
$ join names surnames -a1
[...]
6 Taryon Darrington
7 Percy
The -a switch controls how join deals with unmatched lines; the value of 1 here means “the first file”. In addition to the joined lines, join then prints every line from that file that has no match. In our example, there is no matching line in surnames, so the line from names is printed on its own. If I wanted to display the unmatched lines from surnames, I’d use -a2 instead. If I wanted to see all lines from both files, match or not, I could use -a1 -a2.
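As a small sketch of that last case, the snippet below joins two cut-down files in which each side has one unmatched line. The scratch directory, the shortened file contents and the surname on line 8 are made up for illustration and are not part of the files above.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# names has an unmatched line 7; surnames has an unmatched line 8.
# (Line 8 is a hypothetical record, invented for this example.)
printf '%s\n' '6 Taryon' '7 Percy' > "$workdir/names"
printf '%s\n' '6 Darrington' '8 Briarwood' > "$workdir/surnames"

# -a1 -a2 keeps the unpaired lines from both files alongside the joined ones.
both=$(join -a1 -a2 "$workdir/names" "$workdir/surnames")
printf '%s\n' "$both"

rm -r "$workdir"
```

The joined line for 6 comes out first, followed by the lone 7 Percy from the first file and the lone 8 Briarwood from the second.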
Bringing it all together
Let’s put all of this together in a single file of full names, but without those line numbers. While we’re at it, we can also add a surname for Percy. The -o switch controls the output fields, using the format file.field: 1.1 is the first field in the first file (in this case, the numbers), 1.2 is the second field in the first file (Pike, Grog) and so on. These arguments are given as a comma-separated list.
$ join names surnames -o 1.2,2.2 -a1
Pike Trickfoot
Grog Strongjaw
Scanlan Shorthalt
Shaun Gilmore
Tiberius Stormwind
Taryon Darrington
Percy
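To make the file.field notation a little more concrete, here is a rough sketch that reorders the output so the surname comes before the first name. It also uses the special field 0, which stands for the join field itself. The scratch directory and the two-line files are assumptions made to keep the example short.

```shell
# Scratch copies of the first two records from each file.
workdir=$(mktemp -d)
printf '%s\n' '1 Pike' '2 Grog' > "$workdir/names"
printf '%s\n' '1 Trickfoot' '2 Strongjaw' > "$workdir/surnames"

# 0 is the join field, 2.2 the surname (file 2, field 2),
# 1.2 the first name (file 1, field 2).
reordered=$(join -o 0,2.2,1.2 "$workdir/names" "$workdir/surnames")
printf '%s\n' "$reordered"

rm -r "$workdir"
```

The fields are printed in exactly the order listed after -o, so each line reads number, surname, first name.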
An additional -e switch, used together with -o, specifies the string to print in place of any output field that is missing, such as the surname of an unmatched line.
$ join names surnames -o 1.2,2.2 -a1 -e "has no surname"
Pike Trickfoot
Grog Strongjaw
Scanlan Shorthalt
Shaun Gilmore
Tiberius Stormwind
Taryon Darrington
Percy has no surname
Let’s give Percy his surname and redirect the output to a file.
$ join names surnames -o 1.2,2.2 -a1 -e "DeRolo" > fullnames
$ cat fullnames
Pike Trickfoot
Grog Strongjaw
Scanlan Shorthalt
Shaun Gilmore
Tiberius Stormwind
Taryon Darrington
Percy DeRolo
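Because -e fills every missing output field with the same string, using it to supply a surname only works while Percy is the sole unmatched line. One alternative, sketched here under the assumption that editing surnames is acceptable, is to append the missing record to that file and join as normal. The scratch directory and shortened files are for illustration only.

```shell
# Scratch copies holding just the last two records.
workdir=$(mktemp -d)
printf '%s\n' '6 Taryon' '7 Percy' > "$workdir/names"
printf '%s\n' '6 Darrington' > "$workdir/surnames"

# Append the missing record; 7 sorts after 6, so the file stays in order.
printf '%s\n' '7 DeRolo' >> "$workdir/surnames"

# Every line now has a match, so neither -a1 nor -e is needed.
fullnames=$(join -o 1.2,2.2 "$workdir/names" "$workdir/surnames")
printf '%s\n' "$fullnames"

rm -r "$workdir"
```

This keeps the data in the files rather than in the command line, so a second unmatched name would not silently inherit Percy's surname.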
Other formats
What if the data is not space-separated? The -t switch specifies the field separator to use. For example, we can specify a comma for use with CSV files:
$ cat species
Grog Strongjaw,Goliath
Percy DeRolo,Human
Pike Trickfoot,Gnome
Scanlan Shorthalt,Gnome
Shaun Gilmore,Human
Taryon Darrington,Human
Tiberius Stormwind,Dragonborn
$ cat players
Travis,Grog Strongjaw
Taliesin,Percy DeRolo
Ashley,Pike Trickfoot
Sam,Scanlan Shorthalt
Matt,Shaun Gilmore
Sam,Taryon Darrington
Orion,Tiberius Stormwind
When joined, they become:
$ join -t, players species -1 2 -2 1
Grog Strongjaw,Travis,Goliath
Percy DeRolo,Taliesin,Human
Pike Trickfoot,Ashley,Gnome
Scanlan Shorthalt,Sam,Gnome
Shaun Gilmore,Matt,Human
Taryon Darrington,Sam,Human
Tiberius Stormwind,Orion,Dragonborn
Those final arguments, -1 2 -2 1, select the join fields rather than the output. They tell join which field of each file to compare: field two for file one (-1 2) and field one for file two (-2 1). Note that the join field is printed first, followed by the remaining fields from each file. The -2 1 isn’t strictly speaking necessary, as join assumes the first field unless otherwise instructed, but being explicit is never a bad thing.
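The files above happen to be sorted on their join fields already, which join relies on. When that is not the case, a sketch along these lines may help: sort each file on its own join field first, then join. The deliberately shuffled three-line files, the scratch directory and the .sorted file names are assumptions for the example, and LC_ALL=C is set only so that sort and join agree on ordering.

```shell
# Use one fixed collation order for both sort and join.
export LC_ALL=C

workdir=$(mktemp -d)

# Deliberately unsorted subsets of the players and species files.
printf '%s\n' 'Travis,Grog Strongjaw' 'Ashley,Pike Trickfoot' \
    'Taliesin,Percy DeRolo' > "$workdir/players"
printf '%s\n' 'Pike Trickfoot,Gnome' 'Grog Strongjaw,Goliath' \
    'Percy DeRolo,Human' > "$workdir/species"

# Sort each file on its own join field: field 2 for players, field 1 for species.
sort -t, -k2,2 "$workdir/players" > "$workdir/players.sorted"
sort -t, -k1,1 "$workdir/species" > "$workdir/species.sorted"

# Join on those same fields.
joined=$(join -t, -1 2 -2 1 "$workdir/players.sorted" "$workdir/species.sorted")
printf '%s\n' "$joined"

rm -r "$workdir"
```

The same -t separator is passed to sort and join so that both tools split the lines into fields in the same way.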