
Commit e326983

Copilot and alexdryden authored
Reorder traject config discovery to follow collection records pattern (#14)
* Improve traject config discovery and logging

- Add fallback search in arcflow package directory for development
- Add clear logging showing which traject config is being used
- Add warning when using arcflow package version (development mode)
- Improve error messages when traject config not found
- Document that traject config belongs in arcuit gem, not arcflow
- Update README with traject config location guidance

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

* Address code review feedback

- Change log level from error to warning for missing traject config
- Update example path to clarify arcuit gem location
- Show actual searched paths in error message for better troubleshooting

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

* Reorder traject config search to follow collection records pattern

- Change search order: arcuit_dir (1st) → bundle show (2nd) → example file (3rd)
- Rename traject_config_eac_cpf.rb to example_traject_config_eac_cpf.rb
- Prioritize arcuit_dir parameter as most up-to-date user control
- Fall back to example file for module usage without arcuit
- Update README with new search order and example file guidance

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

* Address code review feedback on example file

- Update usage comment to reference correct filename
- Improve log message formatting for consistency
- Add note about copying to arcuit for production use

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

* Update traject config search paths to follow ArcLight pattern

- Remove arcuit_dir/arcflow path (development artifact)
- Add arcuit_dir/lib/arcuit/traject path (matches EAD traject location)
- Apply same paths to both arcuit_dir and bundle show arcuit searches
- Update debug message to reflect new subdirectory checked

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

* Simplify example traject config search to single known location

- Remove candidate paths loop for example file
- Directly check the one known location at repo root
- Add comment explaining we know the exact location

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
1 parent b0bcf33 commit e326983
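
The search order described in the commit message can be illustrated with a small standalone sketch. This is not code from the commit: `candidate_paths` and `first_existing` are hypothetical helper names, and the gem root is passed in as a plain argument instead of being obtained from `bundle show arcuit`, so the ordering can be exercised without Bundler installed.

```python
import os
from typing import Callable, List, Optional, Tuple

CONFIG_NAME = "traject_config_eac_cpf.rb"
EXAMPLE_NAME = "example_traject_config_eac_cpf.rb"


def candidate_paths(
    arcuit_dir: Optional[str],
    gem_root: Optional[str],
    arcflow_repo_root: str,
) -> List[Tuple[str, str]]:
    """Return (source label, path) pairs in priority order."""
    candidates: List[Tuple[str, str]] = []
    # 1st: explicit arcuit_dir; 2nd: gem root (as `bundle show arcuit` would report it).
    # Both roots are checked at the top level and under lib/arcuit/traject/.
    for label, root in (("arcuit_dir", arcuit_dir), ("arcuit gem", gem_root)):
        if root:
            candidates.append((label, os.path.join(root, CONFIG_NAME)))
            candidates.append(
                (label, os.path.join(root, "lib", "arcuit", "traject", CONFIG_NAME))
            )
    # 3rd: the example file shipped at the arcflow repo root.
    candidates.append(("example", os.path.join(arcflow_repo_root, EXAMPLE_NAME)))
    return candidates


def first_existing(
    candidates: List[Tuple[str, str]],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[Tuple[str, str]]:
    """Return the first candidate whose path exists, or None if none do."""
    for label, path in candidates:
        if exists(path):
            return label, path
    return None


if __name__ == "__main__":
    # Hypothetical locations; `exists` is faked so the demo needs no real files.
    paths = candidate_paths("/srv/arcuit", "/gems/arcuit", "/opt/arcflow")
    print(first_existing(paths, exists=lambda p: p.startswith("/gems")))
```

Separating path construction from the filesystem check is one possible design choice: the full list of searched paths is then available for error messages, which is what the commit's `searched_paths` list is used for.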

3 files changed

Lines changed: 74 additions & 32 deletions


README.md

Lines changed: 16 additions & 1 deletion
````diff
@@ -137,7 +137,22 @@ This is a **one-time setup** per Solr instance.

 ---

-To index creator documents to Solr:
+### Traject Configuration for Creator Indexing
+
+The `traject_config_eac_cpf.rb` file defines how EAC-CPF creator records are mapped to Solr fields.
+
+**Search Order**: arcflow searches for the traject config following the collection records pattern:
+1. **arcuit_dir parameter** (if provided via `--arcuit-dir`) - Highest priority, most up-to-date user control
+2. **arcuit gem** (via `bundle show arcuit`) - For backward compatibility when arcuit_dir not provided
+3. **example_traject_config_eac_cpf.rb** in arcflow - Fallback for module usage without arcuit
+
+**Example File**: arcflow includes `example_traject_config_eac_cpf.rb` as a reference implementation. For production:
+- Copy this file to your arcuit gem as `traject_config_eac_cpf.rb`, or
+- Specify the location with `--arcuit-dir /path/to/arcuit`
+
+**Logging**: arcflow clearly logs which traject config file is being used when creator indexing runs.
+
+To index creator documents to Solr manually:

 ```bash
 bundle exec traject \
````
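
The README's "copy this file to your arcuit gem" step could be scripted along the following lines. This is a minimal sketch, not part of the commit: `install_example_config` is a hypothetical helper, and the choice between the gem root and `lib/arcuit/traject/` is an assumption based on the two locations the new search code checks.

```python
import shutil
from pathlib import Path
from typing import Union

PRODUCTION_NAME = "traject_config_eac_cpf.rb"


def install_example_config(
    example_path: Union[str, Path],
    arcuit_dir: Union[str, Path],
    use_traject_subdir: bool = False,
) -> Path:
    """Copy the example config into an arcuit checkout under the production name.

    With use_traject_subdir=True the file goes to lib/arcuit/traject/ instead of
    the arcuit root; both locations are searched by find_traject_config.
    """
    target_dir = Path(arcuit_dir)
    if use_traject_subdir:
        target_dir = target_dir / "lib" / "arcuit" / "traject"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / PRODUCTION_NAME
    if target.exists():
        # Refuse to clobber a config that may already carry local changes.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")

    shutil.copyfile(example_path, target)
    return target
```

Refusing to overwrite is a deliberately conservative choice here, since a production config is likely to diverge from the example over time.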

arcflow/main.py

Lines changed: 55 additions & 30 deletions
```diff
@@ -965,14 +965,15 @@ def process_creators(self):
             self.log.info(f'{indent}Indexing {len(creator_ids)} creator records to Solr...')
             traject_config = self.find_traject_config()
             if traject_config:
+                self.log.info(f'{indent}Using traject config: {traject_config}')
                 indexed = self.index_creators(agents_dir, creator_ids)
                 self.log.info(f'{indent}Creator indexing complete: {indexed}/{len(creator_ids)} indexed')
             else:
-                self.log.info(f'{indent}Skipping creator indexing (traject config not found)')
+                self.log.warning(f'{indent}Skipping creator indexing (traject config not found)')
                 self.log.info(f'{indent}To index manually:')
                 self.log.info(f'{indent} cd {self.arclight_dir}')
                 self.log.info(f'{indent} bundle exec traject -u {self.solr_url} -i xml \\')
-                self.log.info(f'{indent} -c /path/to/arcuit/arcflow/traject_config_eac_cpf.rb \\')
+                self.log.info(f'{indent} -c /path/to/arcuit-gem/traject_config_eac_cpf.rb \\')
                 self.log.info(f'{indent} {agents_dir}/*.xml')
         elif self.skip_creator_indexing:
             self.log.info(f'{indent}Skipping creator indexing (--skip-creator-indexing flag set)')
@@ -984,15 +985,32 @@ def find_traject_config(self):
         """
         Find the traject config for creator indexing.

-        Tries:
-        1. bundle show arcuit (finds installed gem)
-        2. self.arcuit_dir (explicit path)
-        3. Returns None if neither works
+        Search order (follows collection records pattern):
+        1. arcuit_dir if provided (most up-to-date user control)
+        2. arcuit gem via bundle show (for backward compatibility)
+        3. example_traject_config_eac_cpf.rb in arcflow (fallback when used as module without arcuit)

         Returns:
             str: Path to traject config, or None if not found
         """
-        # Try bundle show arcuit first
+        self.log.info('Searching for traject_config_eac_cpf.rb...')
+        searched_paths = []
+
+        # Try 1: arcuit_dir if provided (highest priority - user's explicit choice)
+        if self.arcuit_dir:
+            self.log.debug(f' Checking arcuit_dir parameter: {self.arcuit_dir}')
+            candidate_paths = [
+                os.path.join(self.arcuit_dir, 'traject_config_eac_cpf.rb'),
+                os.path.join(self.arcuit_dir, 'lib', 'arcuit', 'traject', 'traject_config_eac_cpf.rb'),
+            ]
+            searched_paths.extend(candidate_paths)
+            for traject_config in candidate_paths:
+                if os.path.exists(traject_config):
+                    self.log.info(f'✓ Using traject config from arcuit_dir: {traject_config}')
+                    return traject_config
+            self.log.debug(' traject_config_eac_cpf.rb not found in arcuit_dir')
+
+        # Try 2: bundle show arcuit (for backward compatibility when arcuit_dir not provided)
         try:
             result = subprocess.run(
                 ['bundle', 'show', 'arcuit'],
@@ -1003,39 +1021,46 @@ def find_traject_config(self):
             )
             if result.returncode == 0:
                 arcuit_path = result.stdout.strip()
-                # Prefer config at gem root, fall back to legacy subdirectory layout
+                self.log.debug(f' Found arcuit gem at: {arcuit_path}')
                 candidate_paths = [
                     os.path.join(arcuit_path, 'traject_config_eac_cpf.rb'),
-                    os.path.join(arcuit_path, 'arcflow', 'traject_config_eac_cpf.rb'),
+                    os.path.join(arcuit_path, 'lib', 'arcuit', 'traject', 'traject_config_eac_cpf.rb'),
                 ]
+                searched_paths.extend(candidate_paths)
                 for traject_config in candidate_paths:
                     if os.path.exists(traject_config):
-                        self.log.info(f'Found traject config via bundle show: {traject_config}')
+                        self.log.info(f'✓ Using traject config from arcuit gem: {traject_config}')
                         return traject_config
-                self.log.warning(
-                    'bundle show arcuit succeeded but traject_config_eac_cpf.rb '
-                    'was not found in any expected location under the gem root'
+                self.log.debug(
+                    ' traject_config_eac_cpf.rb not found in arcuit gem '
+                    '(checked root and lib/arcuit/traject/ subdirectory)'
                 )
             else:
-                self.log.debug('bundle show arcuit failed (gem not installed?)')
+                self.log.debug(' arcuit gem not found via bundle show')
         except Exception as e:
-            self.log.debug(f'Error running bundle show arcuit: {e}')
-        # Fall back to arcuit_dir if provided
-        if self.arcuit_dir:
-            candidate_paths = [
-                os.path.join(self.arcuit_dir, 'traject_config_eac_cpf.rb'),
-                os.path.join(self.arcuit_dir, 'arcflow', 'traject_config_eac_cpf.rb'),
-            ]
-            for traject_config in candidate_paths:
-                if os.path.exists(traject_config):
-                    self.log.info(f'Using traject config from arcuit_dir: {traject_config}')
-                    return traject_config
-            self.log.warning(
-                'arcuit_dir provided but traject_config_eac_cpf.rb was not found '
-                'in any expected location'
+            self.log.debug(f' Error checking for arcuit gem: {e}')
+
+        # Try 3: example file in arcflow package (fallback for module usage without arcuit)
+        # We know exactly where this file is located - at the repo root
+        arcflow_package_dir = os.path.dirname(os.path.abspath(__file__))
+        arcflow_repo_root = os.path.dirname(arcflow_package_dir)
+        traject_config = os.path.join(arcflow_repo_root, 'example_traject_config_eac_cpf.rb')
+        searched_paths.append(traject_config)
+
+        if os.path.exists(traject_config):
+            self.log.info(f'✓ Using example traject config from arcflow: {traject_config}')
+            self.log.info(
+                ' Note: Using example config. For production, copy this file to your '
+                'arcuit gem or specify location with --arcuit-dir.'
             )
-        # No config found
-        self.log.warning('Could not find traject config (bundle show arcuit failed and arcuit_dir not provided or invalid)')
+            return traject_config
+
+        # No config found anywhere - show all paths searched
+        self.log.error('✗ Could not find traject_config_eac_cpf.rb in any of these locations:')
+        for i, path in enumerate(searched_paths, 1):
+            self.log.error(f' {i}. {path}')
+        self.log.error('')
+        self.log.error(' Add traject_config_eac_cpf.rb to your arcuit gem or specify with --arcuit-dir.')
         return None


```
Lines changed: 3 additions & 1 deletion
```diff
@@ -4,7 +4,9 @@
 # Persons, and Families) XML documents from ArchivesSpace archival_contexts endpoint.
 #
 # Usage:
-# bundle exec traject -u $SOLR_URL -c traject_config_eac_cpf.rb /path/to/agents/*.xml
+# bundle exec traject -u $SOLR_URL -c example_traject_config_eac_cpf.rb /path/to/agents/*.xml
+#
+# For production, copy this file to your arcuit gem as traject_config_eac_cpf.rb
 #
 # The EAC-CPF XML documents are retrieved directly from ArchivesSpace via:
 # /repositories/{repo_id}/archival_contexts/{agent_type}/{id}.xml
```
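
The usage line above relies on the shell to expand `*.xml`. When the same command is assembled from Python as an argument list, no shell is involved, so the glob has to be expanded explicitly. The sketch below is an illustration under stated assumptions, not code from the commit: `build_traject_command` is a hypothetical helper, and the `-u`, `-i xml`, and `-c` flags simply mirror the manual command that `process_creators` logs.

```python
import glob
import os
import shlex
from typing import List


def build_traject_command(solr_url: str, config_path: str, agents_dir: str) -> List[str]:
    """Build the argument list for the manual traject invocation.

    The *.xml pattern is expanded here because subprocess does not go through
    a shell when given a list of arguments.
    """
    xml_files = sorted(glob.glob(os.path.join(agents_dir, "*.xml")))
    if not xml_files:
        raise FileNotFoundError(f"no .xml files found in {agents_dir}")
    return [
        "bundle", "exec", "traject",
        "-u", solr_url,
        "-i", "xml",
        "-c", config_path,
        *xml_files,
    ]


def format_for_log(command: List[str]) -> str:
    """Render the command as a copy-pasteable, shell-quoted string."""
    return shlex.join(command)
```

The resulting list could then be passed to `subprocess.run(command, cwd=arclight_dir, check=True)`, where `arclight_dir` stands for the directory whose Gemfile provides traject, as in the logged `cd {self.arclight_dir}` step.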
