@@ -354,3 +354,44 @@ def to_dataframe(file_path: str,
If any specified column does not exist in the table schema.
"""
```

### TsFileDataFrame
TsFileDataFrame is built around three core types:

* **TsFileDataFrame**: The entry object that loads one or more TsFiles and provides a unified view. Only metadata is scanned during initialization; actual data values are **not** read.
* **Timeseries**: A lazy-loaded handle for a single time series. Obtained via array-style indexing with `df[...]`, it carries the series metadata but reads no values up front; data is read only when the handle is indexed by row number.
* **AlignedTimeseries**: The time-aligned result of multiple series. Obtained via `df.loc[...]`, it aligns multiple specified series to the same timeline within a given time range and loads them into memory in one operation.

#### TsFileDataFrame

| Example | Operation | Return Type |
| -------------------------------------------- | -------------------------------------- | ----------------- |
| `TsFileDataFrame(paths)` | Load file(s) / directory | TsFileDataFrame |
| `len(df)` | Get total number of time series | int |
| `df.list_timeseries("weather")` | Get / filter series names by prefix | List[str] |
| `df["weather.beijing.humidity"], df[0], df[-1]` | Get a single time series | Timeseries |
| `df[0:3], df[[0,2,5]]` | Get multiple time series | List[Timeseries] |
| `df.loc[start:end, series_list]` | Query with timestamp alignment | AlignedTimeseries |

#### Timeseries

| Example | Operation | Return Type |
| ------------------- | ----------------------- | ----------- |
| `ts.name` | Series name | str |
| `len(ts)` | Number of data points | int |
| `ts.stats` | Series statistics | dict |
| `ts[20]` | Read single value | float |
| `ts[20:100]` | Slice by row range | np.ndarray |
| `ts.timestamps` | Timestamps array | np.ndarray |

#### AlignedTimeseries

| Example | Operation | Return Type |
| -------------------------------------------- | ----------------------- | ------------------------------- |
| `data.timestamps` | Timestamps array | `np.ndarray` |
| `data.values` | Value matrix | `np.ndarray, shape=(N, M)` |
| `data.series_names` | Series names list | `List[str]` |
| `data.shape` | Shape | `(N, M)` |
| `len(data)` | Number of rows | `int` |
| `data[0]`, `data[0:10]`, `data[0, 1]` | Row / element indexing | `np.ndarray` / scalar |
| `print(data)`, `data.show(50)` | Formatted output | Auto-truncated table |
18 changes: 18 additions & 0 deletions src/UserGuide/develop/QuickStart/QuickStart-PYTHON.md
@@ -154,6 +154,24 @@ table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile")
print(ts.to_dataframe(table_data_dir))
```

`TsFileDataFrame` allows you to read time series data from TsFile just like operating a DataFrame, without worrying about the underlying file format and data loading details.

```Python
from iotdb_ai import TsFileDataFrame

df = TsFileDataFrame("data/")             # Load all TsFiles in the directory
names = df.list_timeseries("weather")     # List series names with the "weather" prefix

ts = df["weather.Beijing.humidity"] # Retrieve a single time series
window = ts[20:100] # Slice by row index -> np.ndarray

# start and end are the timestamps that bound the query; define them before this call
data = df.loc[start:end, [                # Align multiple time series by timestamp
"weather.Beijing.temperature",
"weather.Beijing.humidity",
]]
data.values # -> np.ndarray, shape=(N, 2)
```
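
The example above uses two different kinds of slicing: `ts[20:100]` selects by row position, while `df.loc[start:end, ...]` selects by timestamp. The sketch below illustrates the difference with plain Python lists. It does not use the library; the function name `time_range_to_rows`, the sorted timestamp list, and the inclusive treatment of `end` are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def time_range_to_rows(timestamps: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map the time range [start, end] to a half-open row range [lo, hi).

    Assumes timestamps are sorted ascending and that end is inclusive.
    """
    lo = bisect_left(timestamps, start)   # first row with timestamp >= start
    hi = bisect_right(timestamps, end)    # one past the last row with timestamp <= end
    return lo, hi


timestamps = [1000, 2000, 3000, 4000, 5000]
values = [10.0, 11.0, 12.0, 13.0, 14.0]

# Selecting by time: rows whose timestamps lie in [1500, 4000]
lo, hi = time_range_to_rows(timestamps, 1500, 4000)
by_time = values[lo:hi]

# Selecting by row position: rows 1 and 2, regardless of their timestamps
by_row = values[1:3]
```

Whether the real `df.loc[start:end, ...]` treats `end` as inclusive is not stated in this document, so check the library's behavior before relying on it.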

## Sample Code

Sample code that uses these interfaces is available at: https://github.com/apache/tsfile/blob/develop/python/examples/example.py
@@ -352,4 +352,45 @@ def to_dataframe(file_path: str,
ColumnNotExistError
If any specified column does not exist in the table schema.
"""
```
```

### TsFileDataFrame
TsFileDataFrame is built around three core types:

* **TsFileDataFrame**: The entry object that loads one or more TsFiles and provides a unified view. Only metadata is scanned during initialization; actual data values are **not** read.
* **Timeseries**: A lazy-loaded handle for a single time series. Obtained via array-style indexing with `df[...]`, it carries the series metadata but reads no values up front; data is read only when the handle is indexed by row number.
* **AlignedTimeseries**: The time-aligned result of multiple series. Obtained via `df.loc[...]`, it aligns multiple specified series to the same timeline within a given time range and loads them into memory in one operation.
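
The lazy-loading behavior described for `Timeseries` can be pictured with a small stand-alone sketch. This is not the library's implementation: `LazySeries`, `read_rows`, and the in-memory `BACKING` list are hypothetical names that stand in for a handle, a file reader, and on-disk data. The point is only that metadata such as the length is answered without a read, while indexing triggers one.

```python
from typing import Callable, List

BACKING: List[float] = [float(i) for i in range(1000)]  # stands in for values stored on disk


def read_rows(start: int, stop: int) -> List[float]:
    """Hypothetical reader: return the values for rows [start, stop)."""
    return BACKING[start:stop]


class LazySeries:
    """Toy lazy handle: holds metadata, fetches values only when indexed."""

    def __init__(self, name: str, length: int, loader: Callable[[int, int], List[float]]):
        self.name = name        # metadata, available without reading values
        self._length = length   # metadata, e.g. taken from a file index
        self._loader = loader   # called only when values are requested
        self.reads = 0          # number of times values were actually fetched

    def __len__(self) -> int:
        return self._length     # answered from metadata alone

    def __getitem__(self, key):
        if isinstance(key, slice):
            rows = range(*key.indices(self._length))
            if len(rows) == 0:
                return []
            lo, hi = min(rows), max(rows) + 1
            self.reads += 1
            chunk = self._loader(lo, hi)           # one read covering the slice
            return [chunk[i - lo] for i in rows]
        if isinstance(key, int):
            index = key + self._length if key < 0 else key
            if not 0 <= index < self._length:
                raise IndexError("row index out of range")
            self.reads += 1
            return self._loader(index, index + 1)[0]
        raise TypeError("index must be an int or a slice")


handle = LazySeries("weather.beijing.humidity", len(BACKING), read_rows)
```

Creating `handle` and calling `len(handle)` leaves `handle.reads` at zero; each later `handle[20]` or `handle[20:100]` increments it by one.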

#### TsFileDataFrame

| Example | Operation | Return Type |
| -------------------------------------------- | -------------------------------------- | ----------------- |
| `TsFileDataFrame(paths)` | Load file(s) / directory | TsFileDataFrame |
| `len(df)` | Get total number of time series | int |
| `df.list_timeseries("weather")` | Get / filter series names by prefix | List[str] |
| `df["weather.beijing.humidity"], df[0], df[-1]` | Get a single time series | Timeseries |
| `df[0:3], df[[0,2,5]]` | Get multiple time series | List[Timeseries] |
| `df.loc[start:end, series_list]` | Query with timestamp alignment | AlignedTimeseries |
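
The table lists several key types accepted by `df[...]`. A minimal sketch of how such dispatch could work is shown below, using a toy `SeriesCatalog` class that returns series names in place of `Timeseries` handles. The class, its default empty prefix in `list_timeseries`, and the error handling are assumptions for illustration and are not taken from the library.

```python
from typing import List


class SeriesCatalog:
    """Toy catalogue mimicking the indexing forms in the table above."""

    def __init__(self, names: List[str]):
        self._names = list(names)

    def __len__(self) -> int:
        return len(self._names)          # total number of series

    def list_timeseries(self, prefix: str = "") -> List[str]:
        # An empty prefix matches every name (assumed default)
        return [n for n in self._names if n.startswith(prefix)]

    def __getitem__(self, key):
        if isinstance(key, str):         # lookup by full series name
            if key not in self._names:
                raise KeyError(key)
            return key
        if isinstance(key, int):         # lookup by position, negatives allowed
            return self._names[key]
        if isinstance(key, slice):       # contiguous range -> list
            return self._names[key]
        if isinstance(key, list):        # explicit positions -> list
            return [self._names[i] for i in key]
        raise TypeError("unsupported key type")


catalog = SeriesCatalog([
    "weather.beijing.humidity",
    "weather.beijing.temperature",
    "power.plant1.load",
])
```

With this toy class, a string or integer key yields one item, and a slice or list yields a list, matching the single versus multiple return types in the table.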

#### Timeseries

| Example | Operation | Return Type |
| ------------------- | ----------------------- | ----------- |
| `ts.name` | Series name | str |
| `len(ts)` | Number of data points | int |
| `ts.stats` | Series statistics | dict |
| `ts[20]` | Read single value | float |
| `ts[20:100]` | Slice by row range | np.ndarray |
| `ts.timestamps` | Timestamps array | np.ndarray |

#### AlignedTimeseries

| Example | Operation | Return Type |
| -------------------------------------------- | ----------------------- | ------------------------------- |
| `data.timestamps` | Timestamps array | `np.ndarray` |
| `data.values` | Value matrix | `np.ndarray, shape=(N, M)` |
| `data.series_names` | Series names list | `List[str]` |
| `data.shape` | Shape | `(N, M)` |
| `len(data)` | Number of rows | `int` |
| `data[0]`, `data[0:10]`, `data[0, 1]` | Row / element indexing | `np.ndarray` / scalar |
| `print(data)`, `data.show(50)` | Formatted output | Auto-truncated table |
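
To make the `(N, M)` shape concrete, here is a small sketch of aligning several series onto one timeline. It is written in plain Python and does not reflect how the library aligns data internally. The `align` function, the union-of-timestamps timeline, the inclusive range, and the use of NaN for missing points are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

Series = Dict[int, float]  # timestamp -> value


def align(series: Dict[str, Series], start: int, end: int) -> Tuple[List[int], List[List[float]], List[str]]:
    """Align series on the union of their timestamps within [start, end].

    Returns (timestamps, values, series_names), where values has one row per
    timestamp (N rows) and one column per series (M columns). Points a series
    lacks at a given timestamp are filled with NaN (assumed behavior).
    """
    names = list(series)
    timeline = sorted({t for s in series.values() for t in s if start <= t <= end})
    values = [[series[name].get(t, math.nan) for name in names] for t in timeline]
    return timeline, values, names


raw = {
    "temperature": {1: 20.0, 2: 21.0, 4: 23.0},
    "humidity": {2: 0.5, 3: 0.6, 9: 0.9},
}
timestamps, values, series_names = align(raw, 1, 4)
shape = (len(timestamps), len(series_names))
```

Here `shape` corresponds to `(N, M)`: four timestamps in the range and two series.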
18 changes: 18 additions & 0 deletions src/UserGuide/latest/QuickStart/QuickStart-PYTHON.md
@@ -154,6 +154,24 @@ table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile")
print(ts.to_dataframe(table_data_dir))
```

`TsFileDataFrame` allows you to read time series data from TsFile just like operating a DataFrame, without worrying about the underlying file format and data loading details.

```Python
from iotdb_ai import TsFileDataFrame

df = TsFileDataFrame("data/")             # Load all TsFiles in the directory
names = df.list_timeseries("weather")     # List series names with the "weather" prefix

ts = df["weather.Beijing.humidity"] # Retrieve a single time series
window = ts[20:100] # Slice by row index -> np.ndarray

# start and end are the timestamps that bound the query; define them before this call
data = df.loc[start:end, [                # Align multiple time series by timestamp
"weather.Beijing.temperature",
"weather.Beijing.humidity",
]]
data.values # -> np.ndarray, shape=(N, 2)
```

## Sample Code

Sample code that uses these interfaces is available at: https://github.com/apache/tsfile/blob/develop/python/examples/example.py
@@ -265,6 +265,7 @@ class ResultSet:
```
### to_dataframe
```Python
@@ -331,6 +332,46 @@ def to_dataframe(file_path: str,
ColumnNotExistError
Raised when a specified column does not exist in the table schema.
"""
```
### TsFileDataFrame
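TsFileDataFrame is built around three core types:
* **TsFileDataFrame:** The entry object that loads one or more TsFiles and provides a unified view. Only metadata is scanned during initialization; actual data values are not read.
* **Timeseries:** A lazy-loaded handle for a single time series. Obtained via array-style indexing with `df[...]`, it carries the series metadata but reads no values up front; data is read only when the handle is indexed by row number.
* **AlignedTimeseries:** The time-aligned result of multiple series. Obtained via `df.loc[...]`, it aligns the specified series to the same timeline within a given time range and loads them into memory in one operation.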
#### TsFileDataFrame
| Example                                          | Operation                            | Return Type       |
| ------------------------------------------------ | ------------------------------------ | ----------------- |
| `TsFileDataFrame(paths)`                         | Load file(s) / directory             | TsFileDataFrame   |
| `len(df)`                                        | Get total number of time series      | int               |
| `df.list_timeseries("weather")`                  | Get / filter series names by prefix  | List[str]         |
| `df["weather.beijing.humidity"], df[0], df[-1]`  | Get a single time series             | Timeseries        |
| `df[0:3], df[[0,2,5]]`                           | Get multiple time series             | List[Timeseries]  |
| `df.loc[start:end, series_list]`                 | Query with timestamp alignment       | AlignedTimeseries |
#### Timeseries
| Example         | Operation              | Return Type |
| --------------- | ---------------------- | ----------- |
| `ts.name`       | Series name            | str         |
| `len(ts)`       | Number of data points  | int         |
| `ts.stats`      | Series statistics      | dict        |
| `ts[20]`        | Read single value      | float       |
| `ts[20:100]`    | Slice by row range     | np.ndarray  |
| `ts.timestamps` | Timestamps array       | np.ndarray  |
#### AlignedTimeseries
| Example                                | Operation               | Return Type                 |
| -------------------------------------- | ----------------------- | --------------------------- |
| `data.timestamps`                      | Timestamps array        | `np.ndarray`                |
| `data.values`                          | Value matrix            | `np.ndarray, shape=(N, M)`  |
| `data.series_names`                    | Series names list       | `List[str]`                 |
| `data.shape`                           | Shape                   | `(N, M)`                    |
| `len(data)`                            | Number of rows          | `int`                       |
| `data[0]`, `data[0:10]`, `data[0, 1]`  | Row / element indexing  | `np.ndarray` / scalar       |
| `print(data)`, `data.show(50)`         | Formatted output        | Auto-truncated table        |
18 changes: 17 additions & 1 deletion src/zh/UserGuide/develop/QuickStart/QuickStart-PYTHON.md
@@ -91,7 +91,7 @@ mvn clean install -P with-python -DskipTests
mvnw.cmd clean install -P with-python -DskipTests
```

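* After a successful build, the wheel file is located in the `tsfile/python/dist` directory and can be installed locally with the pip install command (assuming it is named tsfile.wheel)
* After a successful build, the wheel file is located in the `tsfile/python/dist` directory and can be installed locally with the pip install command (assuming it is named `tsfile.wheel`)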

```bash
pip install tsfile.wheel
@@ -158,6 +158,22 @@ table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile")
print(ts.to_dataframe(table_data_dir))
```

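`TsFileDataFrame` lets you read time series data from a TsFile as if you were working with a DataFrame, without worrying about the underlying file format or data loading details.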

```Python
from iotdb_ai import TsFileDataFrame

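df = TsFileDataFrame("data/")             # Load all TsFiles in the directory
names = df.list_timeseries("weather")     # List series names with the "weather" prefix

ts = df["weather.Beijing.humidity"]       # Retrieve a single time series
window = ts[20:100]                       # Slice by row index -> np.ndarray

# start and end are the timestamps that bound the query; define them before this call
data = df.loc[start:end, [                # Align multiple time series by timestamp
    "weather.Beijing.temperature",
    "weather.Beijing.humidity",
]]
data.values                               # -> np.ndarray, shape=(N, 2)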
```

## Sample Code

@@ -332,5 +332,46 @@ def to_dataframe(file_path: str,
ColumnNotExistError
Raised when a specified column does not exist in the table schema.
"""

```

### TsFileDataFrame

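TsFileDataFrame is built around three core types:

* **TsFileDataFrame:** The entry object that loads one or more TsFiles and provides a unified view. Only metadata is scanned during initialization; actual data values are not read.
* **Timeseries:** A lazy-loaded handle for a single time series. Obtained via array-style indexing with `df[...]`, it carries the series metadata but reads no values up front; data is read only when the handle is indexed by row number.
* **AlignedTimeseries:** The time-aligned result of multiple series. Obtained via `df.loc[...]`, it aligns the specified series to the same timeline within a given time range and loads them into memory in one operation.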

#### TsFileDataFrame

| Example                                          | Operation                            | Return Type       |
| ------------------------------------------------ | ------------------------------------ | ----------------- |
| `TsFileDataFrame(paths)`                         | Load file(s) / directory             | TsFileDataFrame   |
| `len(df)`                                        | Get total number of time series      | int               |
| `df.list_timeseries("weather")`                  | Get / filter series names by prefix  | List[str]         |
| `df["weather.beijing.humidity"], df[0], df[-1]`  | Get a single time series             | Timeseries        |
| `df[0:3], df[[0,2,5]]`                           | Get multiple time series             | List[Timeseries]  |
| `df.loc[start:end, series_list]`                 | Query with timestamp alignment       | AlignedTimeseries |

#### Timeseries

| Example         | Operation              | Return Type |
| --------------- | ---------------------- | ----------- |
| `ts.name`       | Series name            | str         |
| `len(ts)`       | Number of data points  | int         |
| `ts.stats`      | Series statistics      | dict        |
| `ts[20]`        | Read single value      | float       |
| `ts[20:100]`    | Slice by row range     | np.ndarray  |
| `ts.timestamps` | Timestamps array       | np.ndarray  |

#### AlignedTimeseries

| Example                                | Operation               | Return Type                 |
| -------------------------------------- | ----------------------- | --------------------------- |
| `data.timestamps`                      | Timestamps array        | `np.ndarray`                |
| `data.values`                          | Value matrix            | `np.ndarray, shape=(N, M)`  |
| `data.series_names`                    | Series names list       | `List[str]`                 |
| `data.shape`                           | Shape                   | `(N, M)`                    |
| `len(data)`                            | Number of rows          | `int`                       |
| `data[0]`, `data[0:10]`, `data[0, 1]`  | Row / element indexing  | `np.ndarray` / scalar       |
| `print(data)`, `data.show(50)`         | Formatted output        | Auto-truncated table        |
16 changes: 16 additions & 0 deletions src/zh/UserGuide/latest/QuickStart/QuickStart-PYTHON.md
@@ -158,6 +158,22 @@ table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile")
print(ts.to_dataframe(table_data_dir))
```

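`TsFileDataFrame` lets you read time series data from a TsFile as if you were working with a DataFrame, without worrying about the underlying file format or data loading details.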

```Python
from iotdb_ai import TsFileDataFrame

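df = TsFileDataFrame("data/")             # Load all TsFiles in the directory
names = df.list_timeseries("weather")     # List series names with the "weather" prefix

ts = df["weather.Beijing.humidity"]       # Retrieve a single time series
window = ts[20:100]                       # Slice by row index -> np.ndarray

# start and end are the timestamps that bound the query; define them before this call
data = df.loc[start:end, [                # Align multiple time series by timestamp
    "weather.Beijing.temperature",
    "weather.Beijing.humidity",
]]
data.values                               # -> np.ndarray, shape=(N, 2)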
```

## Sample Code
