diff --git a/src/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md b/src/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md index 2ed66d8f5..746e9910e 100644 --- a/src/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md +++ b/src/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md @@ -354,3 +354,44 @@ def to_dataframe(file_path: str, If any specified column does not exist in the table schema. """ ``` + +### TsFileDataFrame +TsFileDataFrame is built around three core types: + +* **TsFileDataFrame**: The entry object that loads one or more TsFiles and provides a unified view. Only metadata is scanned during initialization; actual data values are **not** read. +* **Timeseries**: A lazy-loaded handle for a single time series. Obtained via array-style indexing with `df[...]`, it contains series metadata but does not load data immediately – data reading is only triggered when indexed by row number. +* **AlignedTimeseries**: The time-aligned result of multiple series. Obtained via `df.loc[...]`, it aligns multiple specified series to the same timeline within a given time range and loads them into memory in one operation. 
+ +#### TsFileDataFrame + +| Example | Operation | Return Type | +| -------------------------------------------- | -------------------------------------- | ----------------- | +| `TsFileDataFrame(paths)` | Load file(s) / directory | TsFileDataFrame | +| `len(df)` | Get total number of time series | int | +| `df.list_timeseries("weather")` | Get / filter series names by prefix | List[str] | +| `df["weather.beijing.humidity"], df[0], df[-1]` | Get a single time series | Timeseries | +| `df[0:3], df[[0,2,5]]` | Get multiple time series | List[Timeseries] | +| `df.loc[start:end, series_list]` | Query with timestamp alignment | AlignedTimeseries | + +#### Timeseries + +| Example | Operation | Return Type | +| ------------------- | ----------------------- | ----------- | +| `ts.name` | Series name | str | +| `len(ts)` | Number of data points | int | +| `ts.stats` | Series statistics | dict | +| `ts[20]` | Read single value | float | +| `ts[20:100]` | Slice by row range | np.ndarray | +| `ts.timestamps` | Timestamps array | np.ndarray | + +#### AlignedTimeseries + +| Example | Operation | Return Type | +| -------------------------------------------- | ----------------------- | ------------------------------- | +| `data.timestamps` | Timestamps array | `np.ndarray` | +| `data.values` | Value matrix | `np.ndarray, shape=(N, M)` | +| `data.series_names` | Series names list | `List[str]` | +| `data.shape` | Shape | `(N, M)` | +| `len(data)` | Number of rows | `int` | +| `data[0]`, `data[0:10]`, `data[0, 1]` | Row / element indexing | `np.ndarray` / scalar | +| `print(data)`, `data.show(50)` | Formatted output | Auto-truncated table | \ No newline at end of file diff --git a/src/UserGuide/develop/QuickStart/QuickStart-PYTHON.md b/src/UserGuide/develop/QuickStart/QuickStart-PYTHON.md index c29dfba4c..30dd23271 100644 --- a/src/UserGuide/develop/QuickStart/QuickStart-PYTHON.md +++ b/src/UserGuide/develop/QuickStart/QuickStart-PYTHON.md @@ -154,6 +154,24 @@ table_data_dir = 
os.path.join(os.path.dirname(__file__), "table_data.tsfile") print(ts.to_dataframe(table_data_dir)) ``` +`TsFileDataFrame` allows you to read time series data from TsFile just like operating a DataFrame, without worrying about the underlying file format and data loading details. + +```Python +from iotdb_ai import TsFileDataFrame + +df = TsFileDataFrame("data/") # Load all TsFiles in the directory + # Browse all time series + +ts = df["weather.Beijing.humidity"] # Retrieve a single time series +window = ts[20:100] # Slice by row index -> np.ndarray + +data = df.loc[start:end, [ # Align multiple time series by timestamp + "weather.Beijing.temperature", + "weather.Beijing.humidity", +]] +data.values # -> np.ndarray, shape=(N, 2) +``` + ## Sample Code The sample code of using these interfaces is in:https://github.com/apache/tsfile/blob/develop/python/examples/example.py diff --git a/src/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md b/src/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md index 849f1c000..f6d07bdc8 100644 --- a/src/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md +++ b/src/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md @@ -352,4 +352,45 @@ def to_dataframe(file_path: str, ColumnNotExistError If any specified column does not exist in the table schema. """ -``` \ No newline at end of file +``` + +### TsFileDataFrame +TsFileDataFrame is built around three core types: + +* **TsFileDataFrame**: The entry object that loads one or more TsFiles and provides a unified view. Only metadata is scanned during initialization; actual data values are **not** read. +* **Timeseries**: A lazy-loaded handle for a single time series. Obtained via array-style indexing with `df[...]`, it contains series metadata but does not load data immediately – data reading is only triggered when indexed by row number. 
+* **AlignedTimeseries**: The time-aligned result of multiple series. Obtained via `df.loc[...]`, it aligns multiple specified series to the same timeline within a given time range and loads them into memory in one operation. + +#### TsFileDataFrame + +| Example | Operation | Return Type | +| -------------------------------------------- | -------------------------------------- | ----------------- | +| `TsFileDataFrame(paths)` | Load file(s) / directory | TsFileDataFrame | +| `len(df)` | Get total number of time series | int | +| `df.list_timeseries("weather")` | Get / filter series names by prefix | List[str] | +| `df["weather.beijing.humidity"], df[0], df[-1]` | Get a single time series | Timeseries | +| `df[0:3], df[[0,2,5]]` | Get multiple time series | List[Timeseries] | +| `df.loc[start:end, series_list]` | Query with timestamp alignment | AlignedTimeseries | + +#### Timeseries + +| Example | Operation | Return Type | +| ------------------- | ----------------------- | ----------- | +| `ts.name` | Series name | str | +| `len(ts)` | Number of data points | int | +| `ts.stats` | Series statistics | dict | +| `ts[20]` | Read single value | float | +| `ts[20:100]` | Slice by row range | np.ndarray | +| `ts.timestamps` | Timestamps array | np.ndarray | + +#### AlignedTimeseries + +| Example | Operation | Return Type | +| -------------------------------------------- | ----------------------- | ------------------------------- | +| `data.timestamps` | Timestamps array | `np.ndarray` | +| `data.values` | Value matrix | `np.ndarray, shape=(N, M)` | +| `data.series_names` | Series names list | `List[str]` | +| `data.shape` | Shape | `(N, M)` | +| `len(data)` | Number of rows | `int` | +| `data[0]`, `data[0:10]`, `data[0, 1]` | Row / element indexing | `np.ndarray` / scalar | +| `print(data)`, `data.show(50)` | Formatted output | Auto-truncated table | \ No newline at end of file diff --git a/src/UserGuide/latest/QuickStart/QuickStart-PYTHON.md 
b/src/UserGuide/latest/QuickStart/QuickStart-PYTHON.md index c29dfba4c..30dd23271 100644 --- a/src/UserGuide/latest/QuickStart/QuickStart-PYTHON.md +++ b/src/UserGuide/latest/QuickStart/QuickStart-PYTHON.md @@ -154,6 +154,24 @@ table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile") print(ts.to_dataframe(table_data_dir)) ``` +`TsFileDataFrame` allows you to read time series data from TsFile just like operating a DataFrame, without worrying about the underlying file format and data loading details. + +```Python +from iotdb_ai import TsFileDataFrame + +df = TsFileDataFrame("data/") # Load all TsFiles in the directory + # Browse all time series + +ts = df["weather.Beijing.humidity"] # Retrieve a single time series +window = ts[20:100] # Slice by row index -> np.ndarray + +data = df.loc[start:end, [ # Align multiple time series by timestamp + "weather.Beijing.temperature", + "weather.Beijing.humidity", +]] +data.values # -> np.ndarray, shape=(N, 2) +``` + ## Sample Code The sample code of using these interfaces is in:https://github.com/apache/tsfile/blob/develop/python/examples/example.py diff --git a/src/zh/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md b/src/zh/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md index 60515bff4..102a74555 100644 --- a/src/zh/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md +++ b/src/zh/UserGuide/develop/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md @@ -265,6 +265,7 @@ class ResultSet: ``` + ### to_dataframe ```Python @@ -331,6 +332,46 @@ def to_dataframe(file_path: str, ColumnNotExistError 当指定的列在表结构中不存在时抛出。 """ - ``` +### TsFileDataFrame + +TsFileDataFrame 围绕着三个核心类型: + +* **TsFileDataFrame:** 入口对象,加载一至多个 TsFile 并提供统一视图。初始化时只扫描元数据,不读取实际数值。 +* **Timeseries:** 单条时间序列的懒加载句柄。通过 `df[...]`的数组操作获得,包含序列元信息但不立即读取,仅在行号索引时才触发数据读取 +* **AlignedTimeseries:** 多条序列的时间对齐结果。通过`df.loc[...]`获取,一次性将指定时间范围内的多条序列对齐到同一时间轴并读入内存 + +#### 
TsFileDataFrame + +| 示例 | 操作 | 返回类型 | +| ----------------------------------------------------- | ----------------------- | ------------------- | +| `TsFileDataFrame(paths)` | 加载文件/目录 | TsFileDataFrame | +| `len(df)` | 获取时间序列总数 | int | +| `df.list_timeseries("weather")` | 获取/按前缀筛选序列名 | List[str] | +| `df["weather.beijing.humidity"], df[0], df[-1]` | 获取单条序列 | Timeseries | +| `df[0:3], df[[0,2,5]]` | 获取多条序列 | List[Timeseries] | +| `df.loc[start:end, series_list]` | 按时间戳对齐查询 | AlignedTimeseries | + +#### Timeseries + +| 示例 | 操作 | 返回类型 | +| ----------------------------- | -------------- | ------------ | +| `ts.name` | 序列名 | str | +| `len(ts)` | 序列点数 | int | +| `ts.stats` | 序列统计信息 | dict | +| `ts[20]` | 单值读取 | float | +| `ts[20:100]` | 行范围切片 | np.ndarray | +| `ts.timestamps` | 时间戳数组 | np.ndarray | + +#### AlignedTimeseries + +| 示例 | 操作 | 返回类型 | +| --------------------------------------------------- | --------------- | -------------------------------- | +| `data.timestamps` | 时间戳数组 | `np.ndarray` | +| `data.values` | 值矩阵 | `np.ndarray, shape=(N, M)` | +| `data.series_names` | 序列名列表 | `List[str]` | +| `data.shape` | 形状 | `(N, M)` | +| `len(data)` | 行数 | `int` | +| `data[0]`、`data[0:10]`、`data[0, 1]` | 行 / 元素索引 | `np.ndarray`/ scalar | +| `print(data)`、`data.show(50)` | 格式化输出 | 自动截断的表格 | diff --git a/src/zh/UserGuide/develop/QuickStart/QuickStart-PYTHON.md b/src/zh/UserGuide/develop/QuickStart/QuickStart-PYTHON.md index 8b435f37c..1ed7790d0 100644 --- a/src/zh/UserGuide/develop/QuickStart/QuickStart-PYTHON.md +++ b/src/zh/UserGuide/develop/QuickStart/QuickStart-PYTHON.md @@ -91,7 +91,7 @@ mvn clean install -P with-python -DskipTests mvnw.cmd clean install -P with-python -DskipTests ``` -* 编译成功后,wheel 文件将位于 `tsfile/python/dist` 目录下, 可通过 pip install 命令进行本地安装(假设他的名字是 tsfile.wheel) +* 编译成功后,wheel 文件将位于 `tsfile/python/dist` 目录下, 可通过 pip install 命令进行本地安装(假设他的名字是 `tsfile.wheel`) ```bash pip install tsfile.wheel @@ -158,6 +158,22 @@ table_data_dir =
os.path.join(os.path.dirname(__file__), "table_data.tsfile") print(ts.to_dataframe(table_data_dir)) ``` +TsFileDataFrame 能够让你像操作 DataFrame 一样读取 TsFile 中的时序数据,无需关心底层文件格式和数据加载细节。 + +```Python +from iotdb_ai import TsFileDataFrame + +df = TsFileDataFrame("data/") # 加载目录下所有 TsFile # 浏览所有序列 + +ts = df["weather.Beijing.humidity"] # 取一条序列 +window = ts[20:100] # 按行号切片 -> np.ndarray + +data = df.loc[start:end, [ # 按时间戳对齐多条序列 + "weather.Beijing.temperature", + "weather.Beijing.humidity", +]] +data.values # -> np.ndarray, shape=(N, 2) +``` ## 示例代码 diff --git a/src/zh/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md b/src/zh/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md index 08a4b2f6c..102a74555 100644 --- a/src/zh/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md +++ b/src/zh/UserGuide/latest/QuickStart/InterfaceDefinition/InterfaceDefinition-Python.md @@ -332,5 +332,46 @@ def to_dataframe(file_path: str, ColumnNotExistError 当指定的列在表结构中不存在时抛出。 """ - ``` + +### TsFileDataFrame + +TsFileDataFrame 围绕着三个核心类型: + +* **TsFileDataFrame:** 入口对象,加载一至多个 TsFile 并提供统一视图。初始化时只扫描元数据,不读取实际数值。 +* **Timeseries:** 单条时间序列的懒加载句柄。通过 `df[...]`的数组操作获得,包含序列元信息但不立即读取,仅在行号索引时才触发数据读取 +* **AlignedTimeseries:** 多条序列的时间对齐结果。通过`df.loc[...]`获取,一次性将指定时间范围内的多条序列对齐到同一时间轴并读入内存 + +#### TsFileDataFrame + +| 示例 | 操作 | 返回类型 | +| ----------------------------------------------------- | ----------------------- | ------------------- | +| `TsFileDataFrame(paths)` | 加载文件/目录 | TsFileDataFrame | +| `len(df)` | 获取时间序列总数 | int | +| `df.list_timeseries("weather")` | 获取/按前缀筛选序列名 | List[str] | +| `df["weather.beijing.humidity"], df[0], df[-1]` | 获取单条序列 | Timeseries | +| `df[0:3], df[[0,2,5]]` | 获取多条序列 | List[Timeseries] | +| `df.loc[start:end, series_list]` | 按时间戳对齐查询 | AlignedTimeseries | + +#### Timeseries + +| 示例 | 操作 | 返回类型 | +| ----------------------------- | -------------- | ------------ | +| `ts.name` | 序列名 |
str | +| `len(ts)` | 序列点数 | int | +| `ts.stats` | 序列统计信息 | dict | +| `ts[20]` | 单值读取 | float | +| `ts[20:100]` | 行范围切片 | np.ndarray | +| `ts.timestamps` | 时间戳数组 | np.ndarray | + +#### AlignedTimeseries + +| 示例 | 操作 | 返回类型 | +| --------------------------------------------------- | --------------- | -------------------------------- | +| `data.timestamps` | 时间戳数组 | `np.ndarray` | +| `data.values` | 值矩阵 | `np.ndarray, shape=(N, M)` | +| `data.series_names` | 序列名列表 | `List[str]` | +| `data.shape` | 形状 | `(N, M)` | +| `len(data)` | 行数 | `int` | +| `data[0]`、`data[0:10]`、`data[0, 1]` | 行 / 元素索引 | `np.ndarray`/ scalar | +| `print(data)`、`data.show(50)` | 格式化输出 | 自动截断的表格 | diff --git a/src/zh/UserGuide/latest/QuickStart/QuickStart-PYTHON.md b/src/zh/UserGuide/latest/QuickStart/QuickStart-PYTHON.md index 430a324ef..d2a8fe53b 100644 --- a/src/zh/UserGuide/latest/QuickStart/QuickStart-PYTHON.md +++ b/src/zh/UserGuide/latest/QuickStart/QuickStart-PYTHON.md @@ -158,6 +158,22 @@ table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile") print(ts.to_dataframe(table_data_dir)) ``` +TsFileDataFrame 能够让你像操作 DataFrame 一样读取 TsFile 中的时序数据,无需关心底层文件格式和数据加载细节。 + +```Python +from iotdb_ai import TsFileDataFrame + +df = TsFileDataFrame("data/") # 加载目录下所有 TsFile # 浏览所有序列 + +ts = df["weather.Beijing.humidity"] # 取一条序列 +window = ts[20:100] # 按行号切片 -> np.ndarray + +data = df.loc[start:end, [ # 按时间戳对齐多条序列 + "weather.Beijing.temperature", + "weather.Beijing.humidity", +]] +data.values # -> np.ndarray, shape=(N, 2) +``` ## 示例代码
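
The docs added in this diff describe `Timeseries` as a lazy-loaded handle: metadata is available immediately, and values are read only when the series is indexed by row number. The sketch below illustrates that pattern with a hypothetical `LazySeries` class and an in-memory `loader` callable. Both names, and the `load_count` counter, are assumptions made for illustration; this is not the `TsFileDataFrame` implementation, and it returns plain lists rather than `np.ndarray`.

```python
from typing import Callable, List, Optional, Sequence


class LazySeries:
    """Hypothetical stand-in for a lazily loaded series handle (not the real Timeseries)."""

    def __init__(self, name: str, length: int, loader: Callable[[], Sequence[float]]):
        self.name = name        # metadata, known without reading values
        self._length = length   # metadata, known without reading values
        self._loader = loader   # invoked only on the first row access
        self._values: Optional[List[float]] = None
        self.load_count = 0     # illustration only: counts how often data was read

    def __len__(self) -> int:
        # Answered from metadata; does not trigger a read.
        return self._length

    def __getitem__(self, index):
        # The first row access triggers the read; later accesses reuse the cache.
        if self._values is None:
            self._values = list(self._loader())
            self.load_count += 1
        return self._values[index]


# Assumed sample data standing in for values stored in a file.
series = LazySeries("weather.Beijing.humidity", 5, lambda: [0.1, 0.2, 0.3, 0.4, 0.5])
```

Calling `len(series)` leaves `load_count` at 0, while the first `series[1:3]` raises it to 1 and later accesses reuse the cached values.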