README.md: 21 additions & 3 deletions
@@ -44,12 +44,30 @@ ArcFlow now generates standalone creator documents in addition to collection rec
 - Are marked with `is_creator: true` to distinguish from collections
 - Must be fed into a Solr instance with fields to match their specific facets (See: Configure Solr Schema below)

+### Agent Filtering
+
+**ArcFlow automatically filters agents to include only legitimate creators** of archival materials. The following agent types are **excluded** from indexing:
+- ✗ **System-generated agents** - Auto-created for users (identified by `system_generated` field)
+- ✗ **Software agents** - Excluded by not querying the `/agents/software` endpoint
+- ✗ **Repository agents** - Corporate entities representing the repository itself (identified by `is_repo_agent` field)
+- ✗ **Donor-only agents** - Agents with only the 'donor' role and no creator role
+
+**Agents are included if they meet any of these criteria:**
+
+- ✓ Have the **'creator' role** in linked_agent_roles
+- ✓ Are **linked to published records** (and not excluded by filters above)
+
+This filtering ensures that only legitimate archival creators are discoverable in ArcLight, while protecting privacy and security by excluding system users and donors.
+
 ### How Creator Records Work

 1. **Extraction**: `get_all_agents()` fetches all agents from ArchivesSpace
-2. **Processing**: `task_agent()` generates an EAC-CPF XML document for each agent with bioghist notes
-3. **Linking**: Handled via Solr using the persistent_id field (agents and collections linked through bioghist references)
-4. **Indexing**: Creator XML files are indexed to Solr using `traject_config_eac_cpf.rb`
+2. **Filtering**: `is_target_agent()` filters out system users, donors, and non-creator agents
+3. **Processing**: `task_agent()` generates an EAC-CPF XML document for each target agent with bioghist notes
+4. **Linking**: Handled via Solr using the persistent_id field (agents and collections linked through bioghist references)
+5. **Indexing**: Creator XML files are indexed to Solr using `traject_config_eac_cpf.rb`
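
As a reading aid, the filtering rules added to the README above can be sketched in Python roughly as follows. This is not ArcFlow's actual `is_target_agent()`; the function names `is_target_agent_sketch` and `select_target_agents`, the dict-shaped agent records, the `title` key, and the `is_linked_to_published_record` field name are all assumptions made for illustration. Only `system_generated`, `is_repo_agent`, and `linked_agent_roles` are named in the README text. Software agents do not appear in the predicate because, per the README, they are excluded earlier by never querying `/agents/software`.

```python
from typing import Any, Iterable, Iterator, Mapping

# Hypothetical record shape: a plain mapping of agent fields.
Agent = Mapping[str, Any]


def is_target_agent_sketch(agent: Agent) -> bool:
    """Return True if the agent would be indexed as a creator (illustrative only)."""
    # Exclusions run first, mirroring "and not excluded by filters above".
    if agent.get("system_generated"):
        return False  # auto-created for a user account
    if agent.get("is_repo_agent"):
        return False  # represents the repository itself
    roles = set(agent.get("linked_agent_roles") or [])
    # One reading of "only the 'donor' role and no creator role":
    # the role set is exactly {"donor"}.
    if roles == {"donor"}:
        return False
    # Inclusions: meeting either criterion is enough.
    if "creator" in roles:
        return True
    # Assumed field name; the README only says "linked to published records".
    return bool(agent.get("is_linked_to_published_record"))


def select_target_agents(agents: Iterable[Agent]) -> Iterator[Agent]:
    """Filtering step: keep only agents that pass the predicate."""
    return (a for a in agents if is_target_agent_sketch(a))


# Made-up sample records, not real ArchivesSpace data.
sample_agents = [
    {"title": "Jane Doe", "linked_agent_roles": ["creator"]},
    {"title": "admin", "system_generated": True, "linked_agent_roles": ["creator"]},
    {"title": "Our Repository", "is_repo_agent": True},
    {"title": "A. Donor", "linked_agent_roles": ["donor"],
     "is_linked_to_published_record": True},
    {"title": "Linked Subject", "linked_agent_roles": ["subject"],
     "is_linked_to_published_record": True},
    {"title": "Orphan", "linked_agent_roles": []},
]

kept = [a["title"] for a in select_target_agents(sample_agents)]
print(kept)  # ['Jane Doe', 'Linked Subject']
```

Checking exclusions before inclusions means a system-generated agent stays out even if it carries the creator role, and a donor-only agent stays out even if it is linked to a published record. Whether an agent with, say, both donor and subject roles should count as "donor-only" is not settled by the README wording, so the strict set comparison here is a judgment call.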