How can I properly use curl download file to fetch a CSV without corruption?

apksha.shukla · July 4, 2025, 6:05am

I’m trying to curl download file from a public repository to my Ubuntu 20 terminal. Here’s what I tried:

curl --output /home/.../test2.csv

https://cloudstor.aarnet.edu.au/plus/s/2DhnLGDdEECo4ys/download?path=%2FUNSW-NB15%20-%20CSV%20Files&files=UNSW-NB15_1.csv

The command runs and creates the CSV file, but when I open it in Excel, it shows garbled characters. Oddly, if I download the file directly via the browser, it works fine.

Is there something wrong with how I’m using curl download file here? Do I need to add extra flags or headers to make sure the file downloads in the correct format?

Rashmihasija · July 9, 2025, 6:16pm

In my experience with curl download file, sometimes when pulling from URLs that trigger dynamic downloads (like CloudStor or SharePoint), the response might be HTML or a compressed binary instead of the actual CSV file. This happens, especially if there’s a redirect or a confirmation step before the file starts downloading.

To avoid this, add the -L flag to ensure curl download file follows redirects properly:

curl -L -o test2.csv "https://your-url-here"

This usually fixes issues with encoding, where you might get HTML or an incomplete download. If you’re seeing “garbled” data, it’s often because Excel can’t read HTML or partial data correctly.

charity-majors · July 10, 2025, 4:57am

Great point, @rashmijhaa! You might also want to check the server’s content type. If the file isn’t being served as text/csv, tools like Excel might fail to open it correctly. A quick way to check is by running:

curl -I "your-url-here"

Look at the Content-Type header. If it’s something like application/octet-stream, Excel might struggle with it. If that’s the case, you can download the file and convert it explicitly using iconv:

iconv -f utf-8 -t utf-8 test2.csv -o cleaned.csv

This ensures that the encoding is consistent and that your CSV is readable on systems that expect specific formats or BOM (Byte Order Mark).

dimplesaini.230 · July 10, 2025, 5:29am

Exactly, @charity-majors! In some cases, the issue is that curl download file might get cut off or partially saved due to SSL checks or user-agent restrictions. To handle this, you can mimic a browser by adding the -A (user-agent) header. This tricks the server into sending the correct version of the file, just like a browser would:

curl -L -A "Mozilla/5.0" -o test2.csv "your-url-here"

I’ve found that this works really well with academic datasets or other sources that behave differently with command-line tools. Another tip is to inspect the download link via dev tools to check if there are any tokens or redirects that need to be handled.