How can I read a binary file in Python?

suneelak.673 · November 22, 2024, 6:30pm

I’m having difficulty reading a binary file with Python. The file format is as follows:

Bytes 1-4: The integer 8
Bytes 5-8: The number of particles, N
Bytes 9-12: The number of groups
Bytes 13-16: The integer 8
Bytes 17-20: The integer 4*N
Following bytes: The group ID numbers for all particles
Last 4 bytes: The integer 4*N
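
If you don't have a sample file at hand, you can generate one with exactly this layout using Python's struct module. Doing so also makes the byte offsets concrete. This is a minimal sketch that assumes little-endian byte order and 4-byte integers and floats (matching the Fortran real*4 group IDs). The helper write_group_file and the sample values are hypothetical.

```python
import os
import struct
import tempfile

def write_group_file(path, group_ids, n_groups):
    """Write a file with the layout described above (hypothetical helper).

    Assumes little-endian byte order ('<') and 4-byte int/float fields.
    """
    n = len(group_ids)
    with open(path, 'wb') as f:
        # Record 1: marker 8, N, number of groups, marker 8
        f.write(struct.pack('<4i', 8, n, n_groups, 8))
        # Record 2: marker 4*N, N float32 group IDs, marker 4*N
        f.write(struct.pack('<i', 4 * n))
        f.write(struct.pack(f'<{n}f', *group_ids))
        f.write(struct.pack('<i', 4 * n))

# Usage: 3 particles in 2 groups; total size is 16 + 4 + 4*N + 4 = 24 + 4*N bytes
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
write_group_file(path, [1.0, 2.0, 1.0], n_groups=2)
print(os.path.getsize(path))  # 36
```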

In Fortran 90, the file is read easily with the following code:

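integer*4 n_particles, n_groups, j
real*4, allocatable :: group_id(:)
open (10, file='your_binary_file.bin', form='unformatted', status='old')  ! placeholder unit and file name
read (10) n_particles, n_groups
allocate (group_id(n_particles))
read (10) (group_id(j), j=1, n_particles)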
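
This layout is what Fortran unformatted sequential I/O produces. Each read/write statement corresponds to one record, and each record is framed by a leading and a trailing byte count (8 for the two integers, 4*N for the array). A small generic record reader makes that structure explicit. This is a sketch that assumes 4-byte little-endian markers. That is the usual default for compilers such as gfortran, but it is compiler-dependent. read_record is a hypothetical helper name.

```python
import io
import struct

def read_record(f, endian='<'):
    """Return the payload of one Fortran unformatted sequential record.

    Hypothetical helper; assumes 4-byte record markers.
    """
    head = f.read(4)
    if len(head) < 4:
        raise EOFError('no more records')
    (nbytes,) = struct.unpack(endian + 'i', head)
    payload = f.read(nbytes)
    tail = f.read(4)
    # The trailing marker must repeat the leading byte count
    if len(payload) != nbytes or len(tail) != 4 or struct.unpack(endian + 'i', tail)[0] != nbytes:
        raise ValueError('corrupt record, wrong byte order, or wrong marker size')
    return payload

# Usage with an in-memory copy of the two records (N = 2, 5 groups)
raw = struct.pack('<4i', 8, 2, 5, 8) + struct.pack('<i2fi', 8, 1.0, 3.0, 8)
f = io.BytesIO(raw)
n_particles, n_groups = struct.unpack('<2i', read_record(f))
group_ids = struct.unpack(f'<{n_particles}f', read_record(f))
print(n_particles, n_groups, group_ids)  # 2 5 (1.0, 3.0)
```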

How can I read this binary file with Python? I’ve tried several approaches, but none of them worked. Is there a way to use a Fortran 90 program in Python to read the binary file and then save the data that I need to work with?

shashank_watak · November 22, 2024, 6:32pm

Using struct to Unpack the Binary Data: Python’s struct module is a great way to interpret binary data. Here’s a quick example of how you can read and unpack binary files according to a specific format:

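import struct

def read_binary_file(filename):
    # '<' = little-endian with standard 4-byte int/float sizes. This assumes the
    # file was written on a little-endian machine (e.g. x86); use '>' otherwise.
    with open(filename, 'rb') as f:
        # Read the first 4 bytes (record marker: the integer 8)
        int_8 = struct.unpack('<i', f.read(4))[0]
        if int_8 != 8:
            raise ValueError('first marker is not 8; check the byte order')

        # Read the next 4 bytes for the number of particles (n_particles)
        n_particles = struct.unpack('<i', f.read(4))[0]

        # Read the next 4 bytes for the number of groups (n_groups)
        n_groups = struct.unpack('<i', f.read(4))[0]

        # Skip the next 4 bytes (closing marker: the integer 8 again)
        f.read(4)

        # Read the integer 4*N (byte length of the group-ID record)
        group_id_length = struct.unpack('<i', f.read(4))[0]
        if group_id_length != 4 * n_particles:
            raise ValueError('group-ID record length does not match 4*N')

        # Read the group IDs, one 4-byte float per particle
        group_ids = [struct.unpack('<f', f.read(4))[0] for _ in range(n_particles)]

        # Read the last 4 bytes (closing marker: the integer 4*N)
        f.read(4)

    return n_particles, n_groups, group_ids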

# Usage
filename = 'your_binary_file.bin'
n_particles, n_groups, group_ids = read_binary_file(filename)
print(n_particles, n_groups, group_ids)

This method gives you byte-level control over how every field is interpreted, but you have to track the whole layout, including the record markers, yourself.
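
There are two possible refinements to the struct approach. First, because the leading marker must equal 8, it can be used to detect the file's byte order. Second, a repeat count in the format string (for example '<3f') unpacks all group IDs in one call instead of a loop. This is a minimal sketch that assumes the whole file fits in memory. parse_group_file is a hypothetical name.

```python
import struct

def parse_group_file(data):
    """Parse the raw bytes of a group file (hypothetical helper).

    Detects the byte order from the first marker, which must be 8.
    """
    if struct.unpack('<i', data[:4])[0] == 8:
        endian = '<'
    elif struct.unpack('>i', data[:4])[0] == 8:
        endian = '>'
    else:
        raise ValueError('first record marker is not 8; unexpected layout')

    # Bytes 1-20: marker 8, N, number of groups, marker 8, marker 4*N
    _, n_particles, n_groups, _, nbytes = struct.unpack_from(endian + '5i', data, 0)
    if nbytes != 4 * n_particles:
        raise ValueError('group-ID record length does not match 4*N')

    # All N float32 group IDs in a single unpack, starting at byte offset 20
    group_ids = struct.unpack_from(f'{endian}{n_particles}f', data, 20)
    return n_particles, n_groups, list(group_ids)

# Usage: build a big-endian sample in memory and parse it
sample = struct.pack('>5i3fi', 8, 3, 2, 8, 12, 1.0, 2.0, 2.0, 12)
print(parse_group_file(sample))  # (3, 2, [1.0, 2.0, 2.0])
```

For a real file, read the bytes inside a with open(filename, 'rb') block and pass them to the function.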

madhurima_sil · December 2, 2024, 11:37am

@shashank_watak, that’s a solid approach with struct, but have you tried using numpy for this kind of task? It simplifies things a lot, especially when dealing with numeric data. Here’s an example that leverages numpy for efficiency:

import numpy as np

def read_binary_file_with_numpy(filename):
    # '<i4' / '<f4' = little-endian int32 / float32 (assumed; use '>' for big-endian files)
    with open(filename, 'rb') as f:
        # Skip the first 4 bytes (record marker: the integer 8)
        f.read(4)

        # Read the number of particles (n_particles)
        n_particles = int(np.fromfile(f, dtype='<i4', count=1)[0])

        # Read the number of groups (n_groups)
        n_groups = int(np.fromfile(f, dtype='<i4', count=1)[0])

        # Skip the next 4 bytes (integer 8 again)
        f.read(4)

        # Skip the marker that opens the group-ID record (the integer 4*N)
        f.read(4)

        # Read group IDs (4 bytes for each particle)
        group_ids = np.fromfile(f, dtype='<f4', count=n_particles)

        # Skip the last 4 bytes (integer 4*N)
        f.read(4)

    return n_particles, n_groups, group_ids

# Usage
filename = 'your_binary_file.bin'
n_particles, n_groups, group_ids = read_binary_file_with_numpy(filename)
print(n_particles, n_groups, group_ids)

This method is shorter. Because np.fromfile reads all N group IDs into a float32 array in a single call, it also avoids the per-element loop of the struct version and is much faster for large datasets.
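
If the file fits in memory, another option is a single np.fromfile call on the filename that reads everything as 4-byte integers. N and the group count come from the header words, and ndarray.view reinterprets the group-ID slice as float32 without a copy. This is a sketch that assumes little-endian data and 4-byte fields throughout. load_groups is a hypothetical name.

```python
import os
import tempfile

import numpy as np

def load_groups(filename):
    """Read the whole file as int32 words and split it (hypothetical helper).

    Assumes little-endian data ('<i4'); use '>i4' for big-endian files.
    """
    words = np.fromfile(filename, dtype='<i4')
    n_particles, n_groups = int(words[1]), int(words[2])
    # words[4] (bytes 17-20) is the 4*N marker; the group IDs follow it
    if words[4] != 4 * n_particles or words[5 + n_particles] != words[4]:
        raise ValueError('record markers do not match 4*N')
    # Reinterpret the same bytes as float32 without copying
    group_ids = words[5:5 + n_particles].view('<f4')
    return n_particles, n_groups, group_ids

# Usage: write a sample file (4 particles, 3 groups) with NumPy, then load it
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
header = np.array([8, 4, 3, 8, 16], dtype='<i4')
ids = np.array([0.0, 1.0, 2.0, 1.0], dtype='<f4')
with open(path, 'wb') as f:
    f.write(header.tobytes() + ids.tobytes() + np.array([16], dtype='<i4').tobytes())
print(load_groups(path))
```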