Good point, @emma-crepeau ! I’ve definitely used merge()
before, but lately, I’ve been all about performance with larger datasets, so I’ve turned to the data.table
package. It’s a game-changer, especially when handling huge data frames. Here’s how I perform SQL-style joins in data.table
:
First, you’ll want to convert your data frames into data.table
objects:
setDT(df1); setDT(df2)
- Inner join:
df1[df2, on = "key", nomatch = 0]
- Left join:
df1[df2, on = "key"]
- Right join:
df2[df1, on = "key"]
As for outer join in data.table
, it’s a bit more work, but you can still use merge()
with the all = TRUE
argument, or you could use more specific data.table
syntax.
Overall, if you’re working with large datasets, data.table
is blazing fast and efficient. It really shines when you need speed alongside SQL-style joins, like a left join r, but if you’re working with smaller data or want simplicity, dplyr
or even merge()
are great options.