OpenMS
Shared utilities for reading, writing, and packaging Parquet-based file formats. More...
#include <OpenMS/FORMAT/ParquetFile.h>
Static Public Member Functions | |
Arrow builder helpers | |
| static void | appendOrThrow (const arrow::Status &status, const std::string &column) |
| Check the status returned by an Arrow builder append call, throwing on failure. | |
| template<typename BuilderT > | |
| static std::shared_ptr< arrow::Array > | finishArray (BuilderT &builder, const std::string &name) |
| Finish an Arrow builder and return the resulting Array. | |
Parquet file I/O | |
| static void | writeTable (const std::shared_ptr< arrow::Table > &table, const String &filename, int64_t row_group_size=262144) |
| Write an Arrow Table to a Parquet file. | |
| static std::shared_ptr< arrow::Table > | readTable (const String &filename) |
| Read a Parquet file into an Arrow Table. | |
| static std::shared_ptr< arrow::Table > | readTable (const std::shared_ptr< arrow::io::RandomAccessFile > &infile) |
| Read a Parquet file from an Arrow RandomAccessFile into an Arrow Table. | |
Column accessors | |
| static std::shared_ptr< arrow::Array > | getColumn (const std::shared_ptr< arrow::Table > &table, const std::string &name) |
| Get a required column from an Arrow Table by name. | |
| static std::shared_ptr< arrow::Array > | getOptionalColumn (const std::shared_ptr< arrow::Table > &table, const std::string &name) |
| Get an optional column from an Arrow Table by name. | |
Type-safe value accessors | |
| static int64_t | getInt64 (const std::shared_ptr< arrow::Array > &array, int64_t row, int64_t default_value, bool allow_null) |
| Read an integer value from an Arrow Array with type coercion. | |
| static double | getDouble (const std::shared_ptr< arrow::Array > &array, int64_t row, double default_value, bool allow_null) |
| Read a floating-point value from an Arrow Array with type coercion. | |
| static bool | getBool (const std::shared_ptr< arrow::Array > &array, int64_t row, bool default_value, bool allow_null) |
| Read a boolean value from an Arrow Array with type coercion. | |
| static std::string | getString (const std::shared_ptr< arrow::Array > &array, int64_t row) |
| Read a string value from an Arrow Array. | |
| static std::vector< std::string > | getStringList (const std::shared_ptr< arrow::Array > &array, int64_t row) |
| Read a list of strings from an Arrow Array. | |
Misc helpers | |
| static std::string | jsonEscape (const String &input) |
| Escape a string for safe embedding into JSON values. | |
| static int64_t | rowCount (const String &filename) |
| Return the number of rows in a Parquet file, read from the file metadata via the low-level Parquet reader. Returns 0 if the file does not exist. | |
Static Private Member Functions | |
| static void | throw_finish_error_ (const std::string &name, const std::string &error) |
| Internal helper to throw a consistent error from finishArray. | |
Shared utilities for reading, writing, and packaging Parquet-based file formats.
This class provides static helpers used by multiple OpenMS Parquet-backed I/O classes (e.g. TransitionParquetFile, OpenSwathOSWParquetWriter, XICParquetFile).
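Capabilities include Arrow builder helpers, Parquet file I/O, column accessors, type-safe value accessors, and miscellaneous helpers such as JSON escaping and row counting.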
All Parquet zip archives use store-only compression (-0) because Parquet files are already internally compressed; re-compressing with deflate wastes CPU for negligible size reduction.
| void appendOrThrow (const arrow::Status &status, const std::string &column) | static |
Check the status returned by an Arrow builder append call, throwing on failure.
| [in] | status | Arrow status returned by the append call |
| [in] | column | Column name (used in error messages) |
| Exception::InvalidValue | if the status is not OK |
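| template<typename BuilderT > std::shared_ptr< arrow::Array > finishArray (BuilderT &builder, const std::string &name) | inline static |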
Finish an Arrow builder and return the resulting Array.
| [in,out] | builder | Any Arrow ArrayBuilder subclass |
| [in] | name | Column name (used in error messages) |
| Exception::InvalidValue | if finishing fails |
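| bool getBool (const std::shared_ptr< arrow::Array > &array, int64_t row, bool default_value, bool allow_null) | static |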
Read a boolean value from an Arrow Array with type coercion.
Supports Boolean and all integer types (non-zero = true).
| [in] | array | The Arrow Array (may be nullptr if allow_null) |
| [in] | row | Row index |
| [in] | default_value | Value returned when array is nullptr or value is null |
| [in] | allow_null | If true, null values return default_value instead of throwing |
| Exception::MissingInformation | if value is null and allow_null is false |
| Exception::InvalidValue | for unsupported column types |
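| std::shared_ptr< arrow::Array > getColumn (const std::shared_ptr< arrow::Table > &table, const std::string &name) | static |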
Get a required column from an Arrow Table by name.
| [in] | table | The Arrow Table |
| [in] | name | Column name |
| Exception::MissingInformation | if column not found |
| Exception::InvalidValue | if column has no chunks |
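| double getDouble (const std::shared_ptr< arrow::Array > &array, int64_t row, double default_value, bool allow_null) | static |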
Read a floating-point value from an Arrow Array with type coercion.
Supports Float, Double, and all integer types (coerced to double).
| [in] | array | The Arrow Array (may be nullptr if allow_null) |
| [in] | row | Row index |
| [in] | default_value | Value returned when array is nullptr or value is null |
| [in] | allow_null | If true, null values return default_value instead of throwing |
| Exception::MissingInformation | if value is null and allow_null is false |
| Exception::InvalidValue | for unsupported column types |
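| int64_t getInt64 (const std::shared_ptr< arrow::Array > &array, int64_t row, int64_t default_value, bool allow_null) | static |
Read an integer value from an Arrow Array with type coercion.
Supports the signed (Int8–Int64) and unsigned (UInt8–UInt64) integer types.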
| [in] | array | The Arrow Array (may be nullptr if allow_null) |
| [in] | row | Row index |
| [in] | default_value | Value returned when array is nullptr or value is null |
| [in] | allow_null | If true, null values return default_value instead of throwing |
| Exception::MissingInformation | if value is null and allow_null is false |
| Exception::InvalidValue | for unsupported column types |
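The null handling documented for getInt64 (and likewise for getDouble and getBool) can be modelled without Arrow. The following is a minimal sketch under stated assumptions, not the OpenMS implementation: NullableColumn and readInt64Sketch are hypothetical names, a null pointer stands in for a nullptr array, std::nullopt stands in for a null cell, and std::runtime_error stands in for Exception::MissingInformation. It also assumes that a nullptr array with allow_null set to false is treated like a null value and throws.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a nullable integer column;
// std::nullopt models a null cell.
using NullableColumn = std::vector<std::optional<std::int64_t>>;

// Hypothetical model of the documented contract:
//  - a present value is returned as-is,
//  - a missing column or null cell returns default_value if allow_null,
//  - otherwise an exception is thrown (std::runtime_error stands in
//    for Exception::MissingInformation).
std::int64_t readInt64Sketch(const NullableColumn* column, std::size_t row,
                             std::int64_t default_value, bool allow_null)
{
  const bool is_null = (column == nullptr) || !column->at(row).has_value();
  if (!is_null)
  {
    return *column->at(row);
  }
  if (allow_null)
  {
    return default_value;
  }
  throw std::runtime_error("null value where a value is required");
}
```

In this model, a caller reading an optional column passes allow_null = true together with a sentinel default, while a caller reading a required column passes allow_null = false and lets the exception propagate.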
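| std::shared_ptr< arrow::Array > getOptionalColumn (const std::shared_ptr< arrow::Table > &table, const std::string &name) | static |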
Get an optional column from an Arrow Table by name.
| [in] | table | The Arrow Table |
| [in] | name | Column name |
| Exception::InvalidValue | if column exists but has no chunks |
| std::string getString (const std::shared_ptr< arrow::Array > &array, int64_t row) | static |
Read a string value from an Arrow Array.
Supports String and LargeString types.
| [in] | array | The Arrow Array (may be nullptr) |
| [in] | row | Row index |
| Exception::InvalidValue | for unsupported column types |
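| std::vector< std::string > getStringList (const std::shared_ptr< arrow::Array > &array, int64_t row) | static |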
Read a list of strings from an Arrow Array.
Supports String/LargeString (semicolon-delimited) and List/LargeList of strings.
| [in] | array | The Arrow Array (may be nullptr) |
| [in] | row | Row index |
| Exception::InvalidValue | for unsupported column types |
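For String/LargeString columns, the list is encoded in a single semicolon-delimited cell. The sketch below illustrates that decoding step in standard C++. The helper name splitSemicolonList is hypothetical, and the edge-case behaviour (an empty cell yields an empty list, empty items are preserved) is an assumption; getStringList may treat these cases differently.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a semicolon-delimited cell into its items.
// Assumptions: an empty cell yields an empty list, and empty items
// (e.g. from "a;;b") are kept rather than dropped.
std::vector<std::string> splitSemicolonList(const std::string& cell)
{
  std::vector<std::string> items;
  if (cell.empty())
  {
    return items;
  }
  std::string::size_type start = 0;
  while (true)
  {
    const std::string::size_type pos = cell.find(';', start);
    if (pos == std::string::npos)
    {
      // Last item: everything after the final delimiter.
      items.push_back(cell.substr(start));
      break;
    }
    items.push_back(cell.substr(start, pos - start));
    start = pos + 1;
  }
  return items;
}
```

With this sketch, the cell "a;b;c" decodes to the three items a, b, and c.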
| std::string jsonEscape (const String &input) | static |
Escape a string for safe embedding into JSON values.
Mirrors the ad-hoc jsonEscape_ implementations used in several Parquet-writing sources so callers can reuse a single canonical implementation.
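As an illustration of what escaping for JSON involves, the following is a minimal sketch in standard C++. It is not the OpenMS implementation: the name jsonEscapeSketch is hypothetical, and the sketch assumes that only the quotation mark, the backslash, and control characters below U+0020 need escaping, which is the minimum RFC 8259 requires inside a JSON string.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in, not the OpenMS implementation: escapes the
// characters that must be escaped inside a JSON string (RFC 8259).
std::string jsonEscapeSketch(const std::string& input)
{
  std::string out;
  out.reserve(input.size());
  for (unsigned char c : input)
  {
    switch (c)
    {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\b': out += "\\b"; break;
      case '\f': out += "\\f"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20)
        {
          // Remaining control characters use the \u00XX form.
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        }
        else
        {
          // Everything else (including UTF-8 bytes) passes through unchanged.
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}
```

The result is intended to be placed between double quotes by the caller; the sketch does not add the surrounding quotes itself.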
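| std::shared_ptr< arrow::Table > readTable (const std::shared_ptr< arrow::io::RandomAccessFile > &infile) | static |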
Read a Parquet file from an Arrow RandomAccessFile into an Arrow Table.
Allows reading Parquet data directly from an in-archive RandomAccessFile (e.g. libzip-backed).
| std::shared_ptr< arrow::Table > readTable (const String &filename) | static |
Read a Parquet file into an Arrow Table.
The table is returned with chunks combined into single arrays per column.
| [in] | filename | Input Parquet file path |
| Exception::InvalidValue | if reading fails |
| int64_t rowCount (const String &filename) | static |
Return the number of rows in a Parquet file, read from the file metadata via the low-level Parquet reader. Returns 0 if the file does not exist.
| void throw_finish_error_ (const std::string &name, const std::string &error) | static private |
Internal helper to throw a consistent error from finishArray.
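| void writeTable (const std::shared_ptr< arrow::Table > &table, const String &filename, int64_t row_group_size=262144) | static |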
Write an Arrow Table to a Parquet file.
| [in] | table | The Arrow Table to write |
| [in] | filename | Output file path |
| [in] | row_group_size | Number of rows per row group (default: 262144) |
| Exception::FileNotWritable | if the file cannot be opened |
| Exception::InvalidValue | if writing fails |
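To get a feel for the row_group_size parameter, the sketch below estimates how many row groups a table of a given size produces. The helper name rowGroupCount is hypothetical, and the estimate assumes the writer starts a new row group every row_group_size rows; the actual layout is decided by the underlying Parquet writer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of row groups needed for num_rows rows,
// computed by ceiling division.
// Assumption: the writer starts a new row group every row_group_size rows.
std::int64_t rowGroupCount(std::int64_t num_rows,
                           std::int64_t row_group_size = 262144)
{
  assert(num_rows >= 0 && row_group_size > 0);
  return (num_rows + row_group_size - 1) / row_group_size;
}
```

Under this assumption, a table of 1,000,000 rows written with the default of 262144 (2^18) rows per group yields 4 row groups: three full groups and one partial group holding the remaining 213,568 rows.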