I have filenames that follow a pattern like:
someletters_12345_moreletters.ext
I want to extract just the 5-digit number (in this case, 12345) and store it in a variable using a Bash substring method or any other shell-compatible approach.
The format is consistent: random characters, then an underscore, a 5-digit number, another underscore, and more characters. I’m curious about the different ways this can be done, using tools like bash string manipulation, grep, sed, or awk.
What are some reliable and efficient methods for extracting that number?
Hey! If you want a clean, dependency-free way that uses just built-in Bash, try this regex approach:
filename="someletters_12345_moreletters.ext"
if [[ $filename =~ _([0-9]{5})_ ]]; then
    digits="${BASH_REMATCH[1]}"
    echo "$digits"
fi
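This uses Bash's built-in =~ regex test: the capture group ([0-9]{5}) is stored in BASH_REMATCH[1], and the if body only runs when the match succeeds. It's great if you want to avoid external commands, but [[ ... =~ ... ]] and BASH_REMATCH are Bash features, so this won't run under plain POSIX sh (e.g. dash). Also keep the regex unquoted: since Bash 3.2, quoted parts of the pattern are matched as literal strings.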
I use this all the time in my deployment scripts.
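If you'd rather use plain substring operations with no regex at all, parameter expansion can do it too. A minimal sketch: the # / %% form works in any POSIX shell but assumes the leading characters contain no underscore, while the ${filename:12:5} form is Bash-only and additionally assumes the prefix is always exactly 11 characters long (like "someletters"), which "random characters" doesn't guarantee. Neither form checks that the result is actually five digits.

```bash
#!/usr/bin/env bash
filename="someletters_12345_moreletters.ext"

# Remove everything up to and including the first underscore
rest="${filename#*_}"    # 12345_moreletters.ext
# Remove everything from the first remaining underscore onward
digits="${rest%%_*}"     # 12345
echo "$digits"

# Bash-only substring expansion ${var:offset:length}, offsets start at 0.
# Only valid if the prefix is always exactly 11 characters plus "_".
fixed="${filename:12:5}" # 12345
echo "$fixed"
```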
If you’re more comfortable with Unix tools, sed gives you a quick way to do this:
filename="someletters_12345_moreletters.ext"
digits=$(echo "$filename" | sed -n 's/.*_\([0-9]\{5\}\)_.*/\1/p')
echo "$digits"
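This matches exactly five digits surrounded by underscores. Because of -n plus the p flag, sed only prints when the substitution succeeds, so digits ends up empty for a filename that doesn't fit the pattern. Note that .* is greedy: if a name ever contained more than one _NNNNN_ group, you'd get the last one. It's super handy when piping filenames or looping over the files in a directory.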
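Here's a sketch of that kind of loop. The directory and sample filenames are made up for the demo (created with mktemp and touch), and names that don't fit the pattern are reported on stderr instead of silently yielding an empty value:

```bash
#!/usr/bin/env bash
# Demo directory with hypothetical sample names
dir=$(mktemp -d)
touch "$dir/someletters_12345_moreletters.ext" \
      "$dir/abc_54321_xyz.txt" \
      "$dir/no_match_here.ext"

found=()
for path in "$dir"/*; do
  name="${path##*/}"   # strip the directory part without calling basename
  # Same sed expression as above; prints nothing when there is no match
  digits=$(printf '%s\n' "$name" | sed -n 's/.*_\([0-9]\{5\}\)_.*/\1/p')
  if [ -n "$digits" ]; then
    found+=("$digits")
    echo "$name -> $digits"
  else
    echo "$name -> no 5-digit field" >&2
  fi
done

rm -rf "$dir"
```

Each iteration starts a sed process; for very large directories, the [[ =~ ]] test from the Bash answer does the same job without a subprocess per file.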
I’ve also done this with awk when I had to extract multiple tokens from complex filenames:
filename="someletters_12345_moreletters.ext"
digits=$(echo "$filename" | awk -F'_' '{print $2}')
echo "$digits"
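Since your format is consistent and the number is always the second underscore-separated field, this works without any regex matching: -F'_' simply splits on underscores. It does assume the leading "random characters" never contain an underscore, and it doesn't check that field 2 really is five digits. It's especially nice when you're dealing with multiple files and want clean, field-based extraction.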
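If field 2 might not always be a clean number, a small guard in the awk program helps. A sketch, using a hypothetical helper name extract_digits, that prints field 2 only when it is exactly five digits. The five [0-9] brackets are spelled out instead of [0-9]{5} because some older awk implementations (older mawk releases, for example) don't support interval expressions. The last lines show the same field split done in pure Bash with IFS and read:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print field 2 only if it is exactly five digits
extract_digits() {
  printf '%s\n' "$1" |
    awk -F'_' '$2 ~ /^[0-9][0-9][0-9][0-9][0-9]$/ { print $2 }'
}

digits=$(extract_digits "someletters_12345_moreletters.ext")
none=$(extract_digits "some_other_name.ext")   # field 2 is "other": no output
echo "digits=$digits none=$none"

# Pure-Bash field split: the last variable receives the remainder of the line
filename="someletters_12345_moreletters.ext"
IFS=_ read -r prefix number rest <<< "$filename"
echo "$prefix | $number | $rest"
```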