How do I perform different types of SQL-style joins like inner, outer, left, and right joins on data frames in R?

anjuyadav.1398 · June 11, 2025, 6:30pm

I have two data frames in R, df1 and df2, and I want to join them in ways similar to SQL joins:

Inner join: only rows with matching keys in both data frames.

Outer join: all rows from both data frames, matched where possible.

Left join: all rows from df1, with matching rows from df2.

Right join: all rows from df2, with matching rows from df1.

How can I do these joins in R using these two data frames?

Rashmihasija · June 21, 2025, 10:47am

Hey @anjuyadav.1398 , I’ve been working with joins in R for quite a while, and honestly, I find that the dplyr package makes SQL-style joins super straightforward. For your two data frames, df1 and df2, here’s how I’d go about it:

Inner join: inner_join(df1, df2, by = "key")
Left join: left_join(df1, df2, by = "key")
Right join: right_join(df1, df2, by = "key")
Full outer join: full_join(df1, df2, by = "key")

These functions mimic SQL joins perfectly, and the syntax is really clean and intuitive. Plus, the nice thing about dplyr is how it handles the merging logic smoothly, saving you from dealing with any tricky edge cases.

emma-crepeau · June 21, 2025, 10:48am

Totally agree with you, @Rashmihasija ! When I first started with joins in R, I relied heavily on base R’s merge() function. It’s still a solid option, especially if you want to avoid extra dependencies. Here’s how you can perform SQL-style joins using base R:

Inner join: merge(df1, df2, by = "key")
Left join: merge(df1, df2, by = "key", all.x = TRUE)
Right join: merge(df1, df2, by = "key", all.y = TRUE)
Full outer join: merge(df1, df2, by = "key", all = TRUE)

Though merge() works well, I’ve shifted to dplyr over time because of the cleaner and more readable syntax, especially for operations like the left join r. But, honestly, for smaller datasets, merge() gets the job done just fine.

Rashmihasija · June 21, 2025, 10:48am

Good point, @emma-crepeau ! I’ve definitely used merge() before, but lately, I’ve been all about performance with larger datasets, so I’ve turned to the data.table package. It’s a game-changer, especially when handling huge data frames. Here’s how I perform SQL-style joins in data.table:

First, you’ll want to convert your data frames into data.table objects: setDT(df1); setDT(df2)

Inner join: df1[df2, on = "key", nomatch = 0]
Left join: df1[df2, on = "key"]
Right join: df2[df1, on = "key"]

As for outer join in data.table, it’s a bit more work, but you can still use merge() with the all = TRUE argument, or you could use more specific data.table syntax.

Overall, if you’re working with large datasets, data.table is blazing fast and efficient. It really shines when you need speed alongside SQL-style joins, like a left join r, but if you’re working with smaller data or want simplicity, dplyr or even merge() are great options.