Commit 24e604e

Remove category from AI responsibilities
1 parent f949dcd commit 24e604e

4 files changed

Lines changed: 9 additions & 139 deletions

lib/archivist.ex

Lines changed: 1 addition & 4 deletions
@@ -4,9 +4,7 @@ defmodule Archivist do
   """

   @callback extract_pdf_information(path :: Path.t()) ::
-              {:ok,
-               %{category: String.t(), date: String.t(), source: String.t(), title: String.t()}}
-              | :error
+              {:ok, %{date: String.t(), source: String.t(), title: String.t()}} | :error

   @callback init :: :ok | :error

@@ -24,7 +22,6 @@ defmodule Archivist do

       iex> Archivist.extract_pdf_information(pdf_path)
       {:ok, %{
-        category: "money",
         date: "2025-01-30",
         source: "abc-corp",
         title: "invoice-for-jan"
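
Note on the narrowed contract: the following is a minimal sketch, assuming hypothetical module names `ArchivistSketch` and `ArchivistSketch.Stub` (neither appears in the commit), of what an implementation of the updated callback returns. Only the callback shapes are taken from the diff; the stub body and its `.pdf` extension check are purely illustrative.

```elixir
# Hypothetical stand-in for the Archivist behaviour after this commit.
# Only the callback shapes come from the diff.
defmodule ArchivistSketch do
  @callback extract_pdf_information(path :: Path.t()) ::
              {:ok, %{date: String.t(), source: String.t(), title: String.t()}} | :error

  @callback init() :: :ok | :error
end

# Hypothetical stub implementation; it returns the fixed values used in the
# doc example instead of calling a model.
defmodule ArchivistSketch.Stub do
  @behaviour ArchivistSketch

  @impl ArchivistSketch
  def init, do: :ok

  @impl ArchivistSketch
  def extract_pdf_information(path) do
    # Illustrative check only: anything that is not a .pdf path is rejected.
    if Path.extname(path) == ".pdf" do
      {:ok, %{date: "2025-01-30", source: "abc-corp", title: "invoice-for-jan"}}
    else
      :error
    end
  end
end
```

Any caller that still pattern-matches on `category:` in the result map would stop matching, so such patterns need to drop that key.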

lib/archivist/system_calls.ex

Lines changed: 2 additions & 124 deletions
@@ -8,131 +8,12 @@ defmodule Archivist.SystemCalls do
   @model "llama3.2"
   @num_ctx 8192

-  @categories [
-    "Vital Records and Identification",
-    "Financial Documents",
-    "Tax Records",
-    "Insurance Documents",
-    "Medical and Health Records",
-    "Property and Real Estate",
-    "Housing and Household",
-    "Vehicle and Transportation",
-    "Employment and Career",
-    "Legal and Estate Planning",
-    "Education and Professional Development",
-    "Family and Household Members",
-    "Warranties and Manuals",
-    "Memberships and Subscriptions",
-    "Travel and Leisure",
-    "Digital Assets and Online Accounts",
-    "Sentimental and Historical",
-    "Miscellaneous and Other"
-  ]
-
   @slug_length 25

   @system """
   You are a text classification and metadata extraction assistant. You will be given text
   extracted from a PDF, and your job is to return the following information in valid JSON format:

-  - category (string)
-    - Must be exactly one of these: #{@categories |> Enum.map(&~s/"#{&1}"/) |> Enum.join(", ")}.
-    - This refers to the overall subject area or domain of the document.
-    - Below are the category explanations for reference:
-      - Vital Records and Identification
-        - Description: Documents that establish or verify an individual's identity and
-          significant life events.
-        - Examples: Birth certificates, marriage or divorce certificates, death certificates
-          (for family members), passports, Social Security cards (or equivalents), citizenship
-          or naturalization papers, name change documents.
-      - Financial Documents
-        - Description: Paperwork related to banking, credit, investments, and recurring
-          expenses.
-        - Examples: Bank statements, credit card statements, loan agreements (mortgage,
-          student, car), investment records (stocks, bonds, mutual funds, cryptocurrency),
-          budget worksheets, utility bills, subscription invoices.
-      - Tax Records
-        - Description: All documents needed for tax filing, verification, and historical
-          reference.
-        - Examples: Past tax returns, W-2/1099 forms (or international equivalents), receipts
-          for deductible expenses (charitable donations, medical, business), property tax
-          statements.
-      - Insurance Documents
-        - Description: Policies and claims information for various types of insurance.
-        - Examples: Health insurance policy details, life insurance contracts, auto or
-          homeowners policies, coverage schedules, claim forms, renewal notices.
-      - Medical and Health Records
-        - Description: Personal and family health documentation, including treatments and
-          prescriptions.
-        - Examples: Immunization records, physician or hospital visit summaries, lab test
-          results, prescription information, dental/vision care records, documentation of
-          chronic conditions.
-      - Property and Real Estate
-        - Description: Paperwork detailing real property ownership, transactions, and
-          improvements.
-        - Examples: Mortgage agreements, deeds and titles, closing documents, lease agreements
-          for rental properties, receipts for major renovations, HOA (Homeowners Association)
-          guidelines.
-      - Housing and Household
-        - Description: Day-to-day living documents and service agreements for your home.
-        - Examples: Rental lease agreements (if renting), utility contracts and bills
-          (electricity, water, internet), service or maintenance contracts (e.g., lawn care,
-          pest control), appliance manuals, home repair receipts.
-      - Vehicle and Transportation
-        - Description: Records associated with car ownership, maintenance, and usage.
-        - Examples: Vehicle titles, registration papers, auto insurance policies, maintenance
-          and service records, warranty details, driver's license copies, parking permits.
-      - Employment and Career
-        - Description: Information related to current and past employment, as well as
-          professional growth.
-        - Examples: Employment contracts, offer letters, pay stubs, performance evaluations,
-          benefits guides, separation or termination documents, professional certifications,
-          résumés/CVs.
-      - Legal and Estate Planning
-        - Description: Legally binding papers covering estates, end-of-life directives, and
-          other legal matters.
-        - Examples: Wills, trusts, power of attorney documents, living wills or advance
-          directives, guardianship papers, and court or legal settlement documents.
-      - Education and Professional Development
-        - Description: Records of academic achievements, certifications, and ongoing
-          education.
-        - Examples: Transcripts, diplomas, course certificates, scholarships or grant info,
-          professional licenses, continuing education credits, conference attendance records.
-      - Family and Household Members
-        - Description: Personal documents specific to each household member or dependent.
-        - Examples: Spouse or partner's documents (if kept separately), children's birth
-          certificates, school records, immunization details, childcare arrangements, pet
-          adoption or vaccination papers.
-      - Warranties and Manuals
-        - Description: Documentation for product guarantees and user guides.
-        - Examples: Warranty information for electronics or appliances, user manuals, extended
-          service contracts, purchase receipts for large items or equipment.
-      - Memberships and Subscriptions
-        - Description: Details on recurring membership-based services or organizations.
-        - Examples: Gym memberships, club or association memberships, magazine or streaming
-          subscriptions, loyalty or frequent flyer program statements, renewal notices.
-      - Travel and Leisure
-        - Description: Arrangements and records related to vacations, trips, and leisure
-          activities.
-        - Examples: Travel itineraries, flight tickets, hotel confirmations, visa
-          documentation, travel insurance policies, timeshare contracts, past trip expense
-          receipts.
-      - Digital Assets and Online Accounts
-        - Description: Information and credentials for online identities, cloud services, and
-          digital platforms.
-        - Examples: Password manager references (stored securely), domain registrations, cloud
-          storage subscriptions, digital payment account details (PayPal, etc.), important
-          email or social media account notes.
-      - Sentimental and Historical
-        - Description: Keepsakes and personal or family history items with emotional or
-          genealogical importance.
-        - Examples: Family photos, letters, journals, genealogy research, copies of heirlooms,
-          scrapbooks, memorabilia.
-      - Miscellaneous and Other
-        - Description: A catch-all for documents that do not neatly fit into other categories.
-        - Examples: Personal or hobby-related projects, unusual one-off contracts, event
-          memorabilia, or temporary items awaiting proper classification.
-
   - date (string)
     - Must be a valid ISO 8601 date in the format YYYY-MM-DD.
     - If the text contains multiple dates, choose the one most relevant to the document (e.g.,
@@ -156,19 +37,18 @@ defmodule Archivist.SystemCalls do
   Your output must follow exactly this JSON structure (example with placeholder values):

   ```
-  {"category": "money", "date": "2025-01-30", "source": "abc-corp", "title": "invoice-for-jan"}
+  {"date": "2025-01-30", "source": "abc-corp", "title": "invoice-for-jan"}
   ```
   """

   @format %{
     type: :object,
     properties: %{
-      category: %{type: :string, enum: @categories},
       date: %{type: :string, format: :date},
       source: %{type: :string},
       title: %{type: :string}
     },
-    required: [:category, :date, :source, :title]
+    required: [:date, :source, :title]
   }

   @impl Archivist
@@ -215,14 +95,12 @@ defmodule Archivist.SystemCalls do
            ),
          {:ok,
           %{
-            "category" => category,
             "date" => date,
             "source" => source,
             "title" => title
           }} <- JSON.decode(response) do
       {:ok,
        %{
-         category: category,
          date: date,
          source: Slug.slugify(source, truncate: @slug_length),
          title: Slug.slugify(title, truncate: @slug_length)
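
Note on the decode pattern: a minimal sketch, assuming a hypothetical helper module `ResponseShapeSketch` (not part of the commit), of how the three-key map pattern behaves. Elixir map patterns match any map that contains the listed keys, so a decoded response that still carried a stray `"category"` key would match and the extra key would simply be ignored. Slugification is left out here because `Slug` is an external dependency.

```elixir
# Hypothetical helper, not part of the commit. It mirrors the shape of the
# map pattern used in the `with` clause above, without the Slug calls.
defmodule ResponseShapeSketch do
  # Matches any map that has at least these three string keys.
  def to_info(%{"date" => date, "source" => source, "title" => title}) do
    {:ok, %{date: date, source: source, title: title}}
  end

  # Anything else (a missing key, a non-map) is treated as an error here.
  def to_info(_other), do: :error
end

decoded = %{
  "category" => "money",
  "date" => "2025-01-30",
  "source" => "abc-corp",
  "title" => "invoice-for-jan"
}

# The stray "category" key does not prevent the match; it is dropped.
IO.inspect(ResponseShapeSketch.to_info(decoded))
```

How the real `with` expression handles a non-matching response is not visible in this hunk, so the `:error` fallback above is an assumption of the sketch.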

lib/archivist/worker.ex

Lines changed: 3 additions & 5 deletions
@@ -56,13 +56,11 @@ defmodule Archivist.Worker do
   end

   defp move_file(path, extracted_info, state) do
-    category_path = Path.join(state.archive, extracted_info.category)
-
     archive_filename =
       "#{extracted_info.date}_#{extracted_info.source}_#{extracted_info.title}.pdf"

-    File.mkdir_p!(category_path)
-    archive_path = Path.join(category_path, archive_filename)
+    File.mkdir_p!(state.archive)
+    archive_path = Path.join(state.archive, archive_filename)

     Logger.info("Moving #{inspect(Path.basename(path))} to #{inspect(archive_path)}")

@@ -75,7 +73,7 @@ defmodule Archivist.Worker do
         [
           System.system_time(:second),
           Path.basename(path),
-          Path.join(extracted_info.category, archive_filename)
+          archive_filename
         ]
       ])
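
Note on the flattened layout: a minimal sketch, assuming a hypothetical module `FlatArchiveSketch` and a scratch directory under the system temp dir (neither is in the commit), of the path logic `move_file/3` uses after this change. Files now land directly in the archive directory, with no category subdirectory in between.

```elixir
# Hypothetical helper mirroring the path construction in move_file/3 after
# this commit. The module name and the temp directory are assumptions.
defmodule FlatArchiveSketch do
  def archive_path(archive_dir, info) do
    filename = "#{info.date}_#{info.source}_#{info.title}.pdf"

    # Create the archive directory itself (and any missing parents).
    File.mkdir_p!(archive_dir)

    # No category component: the file sits at the top level of the archive.
    Path.join(archive_dir, filename)
  end
end

archive = Path.join(System.tmp_dir!(), "flat_archive_sketch")
info = %{date: "2025-01-30", source: "abc-corp", title: "invoice-for-jan"}

path = FlatArchiveSketch.archive_path(archive, info)
IO.puts(Path.relative_to(path, archive))
```

With a single flat directory, two documents that yield the same date, source, and title map to the same path, so handling of an already existing target file matters more than it did with per-category folders.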

test/archivist/worker_test.exs

Lines changed: 3 additions & 6 deletions
@@ -131,7 +131,6 @@ defmodule Archivist.WorkerTest do
       File.touch!(pdf_path)

       pdf_info = %{
-        category: "money",
         date: "2025-01-30",
         source: "abc-corp",
         title: "invoice-for-jan"
@@ -148,7 +147,7 @@ defmodule Archivist.WorkerTest do

       assert {:noreply, state, {:continue, :next_file}} == result

-      archive_path = Path.join([archive, "money", "2025-01-30_abc-corp_invoice-for-jan.pdf"])
+      archive_path = Path.join([archive, "2025-01-30_abc-corp_invoice-for-jan.pdf"])
       assert File.exists?(archive_path)

       assert info_log =~ "Archiving #{inspect(pdf_path)}"
@@ -203,15 +202,13 @@ defmodule Archivist.WorkerTest do
       File.touch!(pdf_path)

       pdf_info = %{
-        category: "money",
         date: "2025-01-30",
         source: "abc-corp",
         title: "invoice-for-jan"
       }

-      category_path = Path.join(archive, "money")
-      File.mkdir_p!(category_path)
-      archive_path = Path.join(category_path, "2025-01-30_abc-corp_invoice-for-jan.pdf")
+      File.mkdir_p!(archive)
+      archive_path = Path.join(archive, "2025-01-30_abc-corp_invoice-for-jan.pdf")
       File.touch!(archive_path)

       Archivist.Mock
