
Commit 649fe0c

committed
refine ufo2 doc
1 parent 4afe7be commit 649fe0c

33 files changed

Lines changed: 1508 additions & 1230 deletions

documents/docs/infrastructure/agents/design/processor.md

Lines changed: 1 addition & 1 deletion
@@ -748,7 +748,7 @@ Different agent types implement platform-specific processors:
748748
| **Windows HostAgent** | `HostAgentProcessor` | Desktop screenshot + app list | Application selection | Launch app, create AppAgent | App selection history |
749749
| **Linux** | `LinuxAgentProcessor` | Screenshot + shell output | Shell command generation | Shell command execution | Command history |
750750

751-
See the [Agent Types documentation](../overview.md#agent-types) for platform-specific processor implementations.
751+
See the [Agent Types documentation](../agent_types.md) for platform-specific processor implementations.
752752

753753
---
754754

documents/docs/infrastructure/agents/design/strategy.md

Lines changed: 1 addition & 1 deletion
@@ -961,7 +961,7 @@ Different platforms implement platform-specific strategies while following the s
961961
- [Command Layer](command.md): How ACTION_EXECUTION strategies dispatch commands
962962
- [Memory System](memory.md): How MEMORY_UPDATE strategies use Memory and Blackboard
963963
- [State Layer](state.md): How AgentState delegates to Processor
964-
- [Agent Types](../overview.md#agent-types): Platform-specific strategy implementations
964+
- [Agent Types](../agent_types.md): Platform-specific strategy implementations
965965

966966
---
967967

Lines changed: 43 additions & 26 deletions
@@ -1,67 +1,84 @@
11
# Batch Mode
22

3-
Batch mode is a feature of UFO, the agent allows batch automation of tasks.
3+
Batch mode allows automated execution of tasks on specific applications or files using predefined plan files. This mode is particularly useful for repetitive tasks on Microsoft Office applications (Word, Excel, PowerPoint).
44

55
## Quick Start
66

7-
### Step 1: Create a Plan file
7+
### Step 1: Create a Plan File
88

9-
Before starting the Batch mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:
9+
Create a JSON plan file that defines the task to be automated. The plan file should contain the following fields:
1010

1111
| Field | Description | Type |
1212
| ------ | -------------------------------------------------------------------------------------------- | ------- |
1313
| task | The task description. | String |
1414
| object | The application or file to interact with. | String |
1515
| close | Determines whether to close the corresponding application or file after completing the task. | Boolean |
1616

17-
Below is an example of a plan file:
17+
Example plan file:
1818

1919
```json
2020
{
2121
"task": "Type in a text of 'Test For Fun' with heading 1 level",
2222
"object": "draft.docx",
23-
"close": False
23+
"close": false
2424
}
2525
```
2626

27-
!!! note
28-
The `object` field is the application or file that the agent will interact with. The object **must be active** (can be minimized) when starting the Batch mode.
29-
The structure of your files should be as follows, where `tasks` is the directory for your tasks and `files` is where your object files are stored:
27+
**Important:** The `close` field must be a JSON boolean (`true` or `false`). Python-style literals (`True` or `False`) are not valid JSON and will cause the plan file to fail to parse.
3028

31-
- Parent
32-
- tasks
33-
- files
29+
The file structure should be organized as follows:
3430

31+
```
32+
Parent/
33+
├── tasks/
34+
│ └── plan.json
35+
└── files/
36+
└── draft.docx
37+
```
38+
39+
The `object` field in the plan file refers to files in the `files` directory. The plan reader will automatically resolve the full file path by replacing `tasks` with `files` in the directory structure.
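A minimal Python sketch of the plan handling described above, assuming the `Parent/tasks` and `Parent/files` layout shown. `load_batch_plan` is a hypothetical helper written for illustration, not the `PlanReader` API; it also shows why a plan containing `"close": False` is rejected by a strict JSON parser.

```python
import json
from pathlib import Path

# Hypothetical helper for illustration only; not UFO's PlanReader API.
REQUIRED_FIELDS = {"task": str, "object": str, "close": bool}


def load_batch_plan(plan_path: str) -> dict:
    """Load a batch plan and resolve `object` to a path under the sibling `files/` dir."""
    path = Path(plan_path)
    # json.loads rejects Python-style literals such as False,
    # so "close": False raises json.JSONDecodeError here.
    plan = json.loads(path.read_text(encoding="utf-8"))

    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(plan.get(field), expected_type):
            raise ValueError(f"'{field}' must be of type {expected_type.__name__}")

    # Assumed layout: Parent/tasks/plan.json -> Parent/files/<object>
    tasks_dir = path.parent
    if tasks_dir.name != "tasks":
        raise ValueError("plan file is expected to live in a 'tasks' directory")
    files_dir = tasks_dir.with_name("files")
    plan["object_path"] = str(files_dir / plan["object"])
    return plan
```

For example, `load_batch_plan("C:/Parent/tasks/plan.json")` would return the plan with `object_path` pointing at `C:/Parent/files/draft.docx`.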
3540

36-
### Step 2: Start the Batch Mode
37-
To start the Batch mode, run the following command:
41+
### Step 2: Start Batch Mode
42+
43+
Run the following command to start batch mode:
3844

3945
```bash
40-
# assume you are in the cloned UFO folder
41-
python ufo.py --task_name {task_name} --mode batch_normal --plan {plan_file}
46+
# Assume you are in the cloned UFO folder
47+
python -m ufo --task {task_name} --mode batch_normal --plan {plan_file}
4248
```
4349

44-
!!! tip
45-
Replace `{task_name}` with the name of the task and `{plan_file}` with the `Path_to_Parent/Plan_file`.
50+
**Parameters:**
51+
- `{task_name}`: Name for this task execution (used for logging)
52+
- `{plan_file}`: Full path to the plan JSON file (e.g., `C:/Parent/tasks/plan.json`)
53+
54+
### Supported Applications
4655

56+
Batch mode currently supports the following Microsoft Office applications:
4757

58+
- **Word** (`.docx` files) - `WINWORD.EXE`
59+
- **Excel** (`.xlsx` files) - `EXCEL.EXE`
60+
- **PowerPoint** (`.pptx` files) - `POWERPNT.EXE`
61+
62+
The application is launched automatically when batch mode starts, and the specified file is opened and maximized.
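The extension-to-executable mapping in the list above can be expressed as a small lookup table. The sketch below is a hypothetical illustration, not UFO code; only the three extensions and executable names come from the list.

```python
from pathlib import Path

# Table taken from the supported-applications list; the helper is hypothetical.
SUPPORTED_APPS = {
    ".docx": "WINWORD.EXE",
    ".xlsx": "EXCEL.EXE",
    ".pptx": "POWERPNT.EXE",
}


def app_for_object(object_name: str) -> str:
    """Return the Office executable expected to open `object_name`."""
    suffix = Path(object_name).suffix.lower()  # case-insensitive extension match
    try:
        return SUPPORTED_APPS[suffix]
    except KeyError:
        raise ValueError(
            f"batch mode does not support '{suffix or object_name}' files"
        ) from None
```

A lookup like this lets a plan with an unsupported `object` fail before any application is launched.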
4863

4964
## Evaluation
50-
You may want to evaluate the `task` is completed successfully or not by following the plan. UFO will call the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.
5165

52-
You can check the evaluation log in the `logs/{task_name}/evaluation.log` file.
66+
UFO can automatically evaluate whether the task was completed successfully. To enable evaluation, ensure `EVA_SESSION` is set to `True` in the `config/ufo/system.yaml` file.
67+
68+
Check the evaluation results in `logs/{task_name}/evaluation.log`.
69+
70+
## References
71+
72+
The batch mode uses a `PlanReader` to parse the plan file and creates a `FromFileSession` to execute the plan.
5373

54-
# References
55-
The batch mode employs a `PlanReader` to parse the plan file and create a `FromFileSession` to follow the plan.
74+
### PlanReader
5675

57-
## PlanReader
58-
The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.
76+
The `PlanReader` is located at `ufo/module/sessions/plan_reader.py`.
5977

6078
:::module.sessions.plan_reader.PlanReader
6179

62-
<br>
63-
## FollowerSession
80+
### FromFileSession
6481

65-
The `FromFileSession` is also located in the `ufo/module/sessions/session.py` file.
82+
The `FromFileSession` is located at `ufo/module/sessions/session.py`.
6683

6784
:::module.sessions.session.FromFileSession
Lines changed: 26 additions & 13 deletions
@@ -1,24 +1,37 @@
11
# Customization
22

3-
Sometimes, UFO may need additional context or information to complete a task. These information are important and customized for each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.
3+
UFO can ask users for additional context or information when needed and save it in local memory for future reference. This customization feature enables a more personalized user experience by remembering user-specific information across sessions.
44

5-
## Scenario
5+
## Example Scenario
66

7-
Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again.
7+
Consider a task where UFO needs to book a cab. To complete this task, UFO requires the user's address. UFO will:
88

9+
1. Ask the user for their address
10+
2. Save the address in local memory
11+
3. Use the saved address automatically in future tasks that require it
912

10-
## Implementation
11-
We currently implement the customization feature in the `HostAgent` class. When the `HostAgent` needs additional information, it will transit to the `PENDING` state and ask the user for the information. The user will provide the information, and the `HostAgent` will save it in the local memory base for future reference. The saved information is stored in the `blackboard` and can be accessed by all agents in the session.
13+
This eliminates the need to repeatedly provide the same information.
1214

13-
!!! note
14-
The customization memory base is only saved in a **local file**. These information will **not** upload to the cloud or any other storage to protect the user's privacy.
15+
## How It Works
16+
17+
The customization feature is implemented across multiple agent types (`HostAgent`, `AppAgent`, and `OpenAIOperatorAgent`). When an agent needs additional information:
18+
19+
1. The agent transitions to the `PENDING` state
20+
2. The agent asks the user for the required information (if `ASK_QUESTION` is enabled)
21+
3. The user's answer is recorded on the shared `blackboard` and saved to the QA pairs file
22+
4. All agents in the session can access this information from the shared `blackboard`
23+
24+
The saved QA pairs are stored locally as JSON lines in the file specified by `QA_PAIR_FILE`. Privacy is preserved as this information never leaves the local machine.
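A minimal sketch of appending one QA pair to a JSON Lines file, one JSON object per line. The `save_qa_pair` helper and the `question`/`answer` record schema are assumptions for illustration; UFO's actual record format may differ.

```python
import json
from pathlib import Path


def save_qa_pair(qa_file: str, question: str, answer: str) -> None:
    """Append one question/answer record as a single JSON line (hypothetical helper)."""
    path = Path(qa_file)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create customization/
    record = {"question": question, "answer": answer}  # assumed schema
    with path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII answers readable in the local file
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Appending in `"a"` mode keeps earlier answers intact, so the file grows as a local history that never leaves the machine.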
1525

1626
## Configuration
1727

18-
You can configure the customization feature by setting the following field in the `config_dev.yaml` file.
28+
Configure the customization feature in `config/ufo/system.yaml`:
29+
30+
| Configuration Option | Description | Type | Default Value |
31+
|------------------------|------------------------------------------------------------------|---------|---------------------------------------|
32+
| `ASK_QUESTION` | Whether to allow agents to ask users questions | Boolean | False |
33+
| `USE_CUSTOMIZATION` | Whether to load and use saved QA pairs from previous sessions | Boolean | False |
34+
| `QA_PAIR_FILE` | Path to the file storing historical QA pairs | String | "customization/global_memory.jsonl" |
35+
| `QA_PAIR_NUM` | Maximum number of recent QA pairs to load into memory | Integer | 20 |
1936

20-
| Configuration Option | Description | Type | Default Value |
21-
|------------------------|----------------------------------------------|---------|---------------------------------------|
22-
| `USE_CUSTOMIZATION` | Whether to enable the customization. | Boolean | True |
23-
| `QA_PAIR_FILE` | The path for the historical QA pairs. | String | "customization/historical_qa.txt" |
24-
| `QA_PAIR_NUM` | The number of QA pairs for the customization.| Integer | 20 |
37+
**Note:** Both `ASK_QUESTION` and `USE_CUSTOMIZATION` need to be enabled for the full customization experience. `ASK_QUESTION` controls whether agents can prompt users for information, while `USE_CUSTOMIZATION` controls whether previously saved information is loaded.
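The loading side of these options can be sketched as follows: return nothing when `USE_CUSTOMIZATION` is off, otherwise read at most `QA_PAIR_NUM` of the most recent lines from `QA_PAIR_FILE`. The function name and the plain `dict` config are assumptions for illustration, not UFO's configuration API.

```python
import json
from collections import deque
from pathlib import Path


def load_recent_qa_pairs(config: dict) -> list:
    """Return up to QA_PAIR_NUM most recent QA records, or [] if disabled (hypothetical)."""
    if not config.get("USE_CUSTOMIZATION", False):
        return []
    path = Path(config.get("QA_PAIR_FILE", "customization/global_memory.jsonl"))
    if not path.exists():
        return []
    limit = config.get("QA_PAIR_NUM", 20)
    recent = deque(maxlen=limit)  # older lines are discarded as newer ones arrive
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                recent.append(json.loads(line))
    return list(recent)
```

Using a bounded `deque` keeps memory flat even when the history file grows large.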
Lines changed: 30 additions & 29 deletions
@@ -1,20 +1,20 @@
11
# Follower Mode
22

3-
The Follower mode is a feature of UFO that the agent follows a list of pre-defined steps in natural language to take actions on applications. Different from the normal mode, this mode creates an `AppAgent` that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging and software testing or verification.
3+
Follower mode enables UFO to execute a predefined list of steps in natural language. Unlike normal mode where the agent generates its own plan, follower mode creates an `AppAgent` that follows user-provided steps to interact with applications. This mode is particularly useful for debugging, software testing, and verification.
44

55
## Quick Start
66

7-
### Step 1: Create a Plan file
7+
### Step 1: Create a Plan File
88

9-
Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:
9+
Create a JSON plan file containing the steps for the agent to follow:
1010

1111
| Field | Description | Type |
1212
| --- | --- | --- |
1313
| task | The task description. | String |
1414
| steps | The list of steps for the agent to follow. | List of Strings |
1515
| object | The application or file to interact with. | String |
1616

17-
Below is an example of a plan file:
17+
Example plan file:
1818

1919
```json
2020
{
@@ -31,53 +31,54 @@ Below is an example of a plan file:
3131
}
3232
```
3333

34-
!!! note
35-
The `object` field is the application or file that the agent will interact with. The object **must be active** (can be minimized) when starting the Follower mode.
34+
The `object` field specifies the application or file the agent will interact with. This object should be opened and accessible before starting follower mode.
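A small validation sketch for the follower plan fields listed in the table (`task`, `steps`, `object`). `validate_follower_plan` is a hypothetical helper, not part of `PlanReader`.

```python
import json


def validate_follower_plan(raw: str) -> dict:
    """Parse a follower plan and check the field types from the table (hypothetical)."""
    plan = json.loads(raw)
    if not isinstance(plan.get("task"), str) or not isinstance(plan.get("object"), str):
        raise ValueError("'task' and 'object' must be strings")
    steps = plan.get("steps")
    # `steps` must be a list whose items are all natural-language strings
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError("'steps' must be a list of strings")
    return plan
```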
3635

36+
### Step 2: Start Follower Mode
3737

38-
### Step 2: Start the Follower Mode
39-
To start the Follower mode, run the following command:
38+
Run the following command:
4039

4140
```bash
42-
# assume you are in the cloned UFO folder
43-
python ufo.py --task_name {task_name} --mode follower --plan {plan_file}
41+
# Assume you are in the cloned UFO folder
42+
python -m ufo --task {task_name} --mode follower --plan {plan_file}
4443
```
4544

46-
!!! tip
47-
Replace `{task_name}` with the name of the task and `{plan_file}` with the path to the plan file.
48-
45+
**Parameters:**
46+
- `{task_name}`: Name for this task execution (used for logging)
47+
- `{plan_file}`: Path to the plan JSON file
4948

5049
### Step 3: Run in Batch (Optional)
5150

52-
You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command:
51+
To execute multiple plan files sequentially, provide a folder containing multiple plan files:
5352

5453
```bash
55-
# assume you are in the cloned UFO folder
56-
python ufo.py --task_name {task_name} --mode follower --plan {plan_folder}
54+
# Assume you are in the cloned UFO folder
55+
python -m ufo --task {task_name} --mode follower --plan {plan_folder}
5756
```
5857

59-
UFO will automatically detect the plan files in the folder and run them one by one.
60-
61-
!!! tip
62-
Replace `{task_name}` with the name of the task and `{plan_folder}` with the path to the folder containing plan files.
58+
UFO will automatically detect and execute all plan files in the folder sequentially.
6359

60+
**Parameters:**
61+
- `{task_name}`: Name for this batch execution (used for logging)
62+
- `{plan_folder}`: Path to the folder containing plan JSON files
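Folder discovery can be sketched as collecting the `*.json` files and processing them in a fixed order. Sorting by file name is an assumption made here for reproducibility; UFO's actual ordering may differ, and `find_plan_files` is hypothetical.

```python
from pathlib import Path


def find_plan_files(plan_folder: str) -> list:
    """Return the JSON plan files in `plan_folder`, sorted by name (hypothetical)."""
    folder = Path(plan_folder)
    if not folder.is_dir():
        raise NotADirectoryError(plan_folder)
    # Non-recursive: only files directly inside the folder are considered.
    return sorted(p for p in folder.glob("*.json") if p.is_file())
```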
6463

6564
## Evaluation
66-
You may want to evaluate the `task` is completed successfully or not by following the plan. UFO will call the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.
6765

68-
You can check the evaluation log in the `logs/{task_name}/evaluation.log` file.
66+
UFO can automatically evaluate task completion. To enable evaluation, ensure `EVA_SESSION` is set to `True` in `config/ufo/system.yaml`.
67+
68+
Check the evaluation results in `logs/{task_name}/evaluation.log`.
69+
70+
## References
71+
72+
Follower mode uses a `PlanReader` to parse the plan file and creates a `FollowerSession` to execute the steps.
6973

70-
# References
71-
The follower mode employs a `PlanReader` to parse the plan file and create a `FollowerSession` to follow the plan.
74+
### PlanReader
7275

73-
## PlanReader
74-
The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.
76+
The `PlanReader` is located at `ufo/module/sessions/plan_reader.py`.
7577

7678
:::module.sessions.plan_reader.PlanReader
7779

78-
<br>
79-
## FollowerSession
80+
### FollowerSession
8081

81-
The `FollowerSession` is also located in the `ufo/module/sessions/session.py` file.
82+
The `FollowerSession` is located at `ufo/module/sessions/session.py`.
8283

8384
:::module.sessions.session.FollowerSession
Lines changed: 26 additions & 20 deletions
@@ -1,46 +1,52 @@
11
# Operator as an AppAgent
22

3-
UFO² supports **wrapping any third-party agent as an AppAgent**, allowing it to be invoked by the HostAgent within a multi-agent workflow. This section demonstrates how to run **Operator**, an OpenAI-based Conversational UI Agent (CUA), as an AppAgent inside the UFO² ecosystem.
3+
UFO² supports wrapping third-party agents as AppAgents, enabling them to be orchestrated by the HostAgent in multi-agent workflows. This guide demonstrates how to run **Operator**, built on OpenAI's Computer-Using Agent (CUA), within the UFO² ecosystem.
44

5-
<div align="center">
6-
<img src="/img/everything.png" alt="Speculative Multi-Action Execution" />
7-
</div>
5+
![Operator Integration](../../img/everything.png)
86

9-
<br><br>
7+
## Prerequisites
108

11-
## 📦 Prerequisites
9+
Before proceeding, ensure that Operator has been properly configured. Follow the setup instructions in the [OpenAI CUA (Operator) guide](../../configuration/models/operator.md).
1210

13-
Before proceeding, please ensure that the Operator has been properly configured. You can follow the setup instructions in the [OpenAI CUA (Operator) guide](../../configuration/models/operator.md).
11+
## Running the Operator
1412

15-
## 🚀 Running the Operator
13+
UFO² provides two modes for running Operator:
1614

17-
UFO² provides two modes for running the Operator:
15+
1. **Single Agent Mode (`operator`)** — Run Operator independently through UFO² as a launcher
16+
2. **AppAgent Mode (`normal_operator`)** — Run Operator as an `AppAgent` orchestrated by the `HostAgent`
1817

19-
1. **Single Agent Mode** — Use UFO² as the launcher to run Operator in standalone mode.
20-
2. **AppAgent Mode** — Run Operator as an `AppAgent`, enabling it to be orchestrated by the `HostAgent` as part of a broader task decomposition.
18+
### Single Agent Mode
2119

22-
### 🔹 Single Agent Mode
20+
In single agent mode, Operator functions independently but is launched through UFO². This mode is useful for debugging or quick prototyping.
2321

24-
In this mode, the Operator functions independently but is launched through UFO². This is useful for debugging or quick prototyping.
22+
```powershell
23+
python -m ufo --mode operator --task <your_task_name> --request <your_request>
24+
```
2525

26+
**Example:**
2627
```powershell
27-
python -m ufo -m operator -t <your_task_name> -r <your_request>
28+
python -m ufo --mode operator --task test_operator --request "Open Notepad and type Hello World"
2829
```
2930

30-
### 🔸 AppAgent Mode
31+
### AppAgent Mode
32+
33+
In AppAgent mode, Operator is wrapped as an `AppAgent` and can be triggered as a sub-agent within the HostAgent workflow. This enables task decomposition where the HostAgent coordinates multiple agents including Operator.
3134

32-
This mode wraps Operator as an AppAgent (`normal_operator`) so that it can be triggered as a sub-agent within a full HostAgent workflow.
35+
```powershell
36+
python -m ufo --mode normal_operator --task <your_task_name> --request <your_request>
37+
```
3338

39+
**Example:**
3440
```powershell
35-
python -m ufo -m normal_operator -t <your_task_name> -r <your_request>
41+
python -m ufo --mode normal_operator --task test_integration --request "Search for Python documentation and open the first result"
3642
```
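When launching either mode from a script, the commands above can be assembled as an argument list. This is a hypothetical wrapper, assuming it runs from the cloned UFO folder; only the `--mode`, `--task`, and `--request` flags come from the examples.

```python
import subprocess
import sys

OPERATOR_MODES = {"operator", "normal_operator"}


def build_operator_command(mode: str, task: str, request: str) -> list:
    """Build the `python -m ufo` argument list for an Operator run (hypothetical)."""
    if mode not in OPERATOR_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return [sys.executable, "-m", "ufo", "--mode", mode, "--task", task, "--request", request]


def run_operator(mode: str, task: str, request: str) -> int:
    """Run UFO² with the given mode; must be called from the cloned UFO folder."""
    result = subprocess.run(build_operator_command(mode, task, request), check=False)
    return result.returncode
```

Passing a list instead of a shell string means a request such as `"Open Notepad and type Hello World"` needs no extra quoting.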
3743

38-
## 📝 Logs
44+
## Logs
3945

40-
In both modes, execution logs will be saved in the following directory:
46+
In both modes, execution logs are saved in:
4147

4248
```
4349
logs/<your_task_name>/
4450
```
4551

46-
These logs follow the same structure and conventions as previous UFO² sessions.
52+
These logs follow the same structure and conventions as other UFO² sessions.
