Skip to content

Speed up read.FCS() #280

@SamGG

Description

@SamGG

In my hands, most current FCS files use float (or double?) representation for intensity. I think that reading their matrix of intensities uses the following call.

flowCore/R/IO.R

Lines 906 to 907 in e898b05

readBin(con = con, what = dattype, n = count, size = size,
signed = signed, endian=endian)

readBin() is doing so much stuff that it makes reading a matrix from file very slow. As a workaround, I replaced the matrix reading with a Cpp code. In my workflow, reading and sampling 69 FCS goes from 180 sec to 30 sec. I use ff <- read.FCS(..., which.lines = 1:5) to get a flowFrame and all information from a FCS file. Then I get intensities matrix using Cpp code and replace the exprs(ff) of the flowFrame with the matrix. Unfortunately, it is sound difficult/dangereous to replace flowCore code above by infering the C file pointer from the R connection.
The following Cpp code works fine for float and should be adapted for double (or integers).
I think the great developpers of flowCore will perform a better job than me.

#include <Rcpp.h>
#include <fstream>
#include <stdexcept>

using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix read_fcs_data(
        const std::string& file_path,
        long byte_offset,
        long n_row,
        long n_par,
        bool swap
) {
    // open file in binary mode
    std::ifstream con(file_path, std::ios::binary);
    if (!con.is_open())
        stop("Cannot open file: " + file_path);
    
    // seek to byte offset
    con.seekg(byte_offset, std::ios::beg);
    if (con.fail())
        stop("Failed to seek to offset " + std::to_string(byte_offset));
    
    // read n_vals float32 values into a flat buffer
    int n_vals = n_row * n_par;
    std::vector<float> buf(n_vals);
    con.read(reinterpret_cast<char*>(buf.data()), n_vals * sizeof(float));
    if (con.fail())
        stop("Failed to read " + std::to_string(n_vals) + " values from file");
    
    // swap bytes if file endian differs from host
    if (swap) {
        for (int k = 0; k < n_vals; k++) {
            char* p = reinterpret_cast<char*>(&buf[k]);
            std::swap(p[0], p[3]);
            std::swap(p[1], p[2]);
        }
    }
    
    // fill matrix row-by-row (byrow = TRUE means values are row-major)
    NumericMatrix data_mat(n_row, n_par);
    for (int row = 0; row < n_row; row++) {
        for (int col = 0; col < n_par; col++) {
            data_mat(row, col) = static_cast<double>(buf[row * n_par + col]);
        }
    }
    
    return data_mat;
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions