@@ -8,131 +8,12 @@ defmodule Archivist.SystemCalls do
   @model "llama3.2"
   @num_ctx 8192
 
-  @categories [
-    "Vital Records and Identification",
-    "Financial Documents",
-    "Tax Records",
-    "Insurance Documents",
-    "Medical and Health Records",
-    "Property and Real Estate",
-    "Housing and Household",
-    "Vehicle and Transportation",
-    "Employment and Career",
-    "Legal and Estate Planning",
-    "Education and Professional Development",
-    "Family and Household Members",
-    "Warranties and Manuals",
-    "Memberships and Subscriptions",
-    "Travel and Leisure",
-    "Digital Assets and Online Accounts",
-    "Sentimental and Historical",
-    "Miscellaneous and Other"
-  ]
-
   @slug_length 25
 
   @system """
   You are a text classification and metadata extraction assistant. You will be given text
   extracted from a PDF, and your job is to return the following information in valid JSON format:
 
-  - category (string)
-    - Must be exactly one of these: #{@categories |> Enum.map(&~s/"#{&1}"/) |> Enum.join(", ")}.
-    - This refers to the overall subject area or domain of the document.
-    - Below are the category explanations for reference:
-      - Vital Records and Identification
-        - Description: Documents that establish or verify an individual's identity and
-          significant life events.
-        - Examples: Birth certificates, marriage or divorce certificates, death certificates
-          (for family members), passports, Social Security cards (or equivalents), citizenship
-          or naturalization papers, name change documents.
-      - Financial Documents
-        - Description: Paperwork related to banking, credit, investments, and recurring
-          expenses.
-        - Examples: Bank statements, credit card statements, loan agreements (mortgage,
-          student, car), investment records (stocks, bonds, mutual funds, cryptocurrency),
-          budget worksheets, utility bills, subscription invoices.
-      - Tax Records
-        - Description: All documents needed for tax filing, verification, and historical
-          reference.
-        - Examples: Past tax returns, W-2/1099 forms (or international equivalents), receipts
-          for deductible expenses (charitable donations, medical, business), property tax
-          statements.
-      - Insurance Documents
-        - Description: Policies and claims information for various types of insurance.
-        - Examples: Health insurance policy details, life insurance contracts, auto or
-          homeowners policies, coverage schedules, claim forms, renewal notices.
-      - Medical and Health Records
-        - Description: Personal and family health documentation, including treatments and
-          prescriptions.
-        - Examples: Immunization records, physician or hospital visit summaries, lab test
-          results, prescription information, dental/vision care records, documentation of
-          chronic conditions.
-      - Property and Real Estate
-        - Description: Paperwork detailing real property ownership, transactions, and
-          improvements.
-        - Examples: Mortgage agreements, deeds and titles, closing documents, lease agreements
-          for rental properties, receipts for major renovations, HOA (Homeowners Association)
-          guidelines.
-      - Housing and Household
-        - Description: Day-to-day living documents and service agreements for your home.
-        - Examples: Rental lease agreements (if renting), utility contracts and bills
-          (electricity, water, internet), service or maintenance contracts (e.g., lawn care,
-          pest control), appliance manuals, home repair receipts.
-      - Vehicle and Transportation
-        - Description: Records associated with car ownership, maintenance, and usage.
-        - Examples: Vehicle titles, registration papers, auto insurance policies, maintenance
-          and service records, warranty details, driver's license copies, parking permits.
-      - Employment and Career
-        - Description: Information related to current and past employment, as well as
-          professional growth.
-        - Examples: Employment contracts, offer letters, pay stubs, performance evaluations,
-          benefits guides, separation or termination documents, professional certifications,
-          résumés/CVs.
-      - Legal and Estate Planning
-        - Description: Legally binding papers covering estates, end-of-life directives, and
-          other legal matters.
-        - Examples: Wills, trusts, power of attorney documents, living wills or advance
-          directives, guardianship papers, and court or legal settlement documents.
-      - Education and Professional Development
-        - Description: Records of academic achievements, certifications, and ongoing
-          education.
-        - Examples: Transcripts, diplomas, course certificates, scholarships or grant info,
-          professional licenses, continuing education credits, conference attendance records.
-      - Family and Household Members
-        - Description: Personal documents specific to each household member or dependent.
-        - Examples: Spouse or partner's documents (if kept separately), children's birth
-          certificates, school records, immunization details, childcare arrangements, pet
-          adoption or vaccination papers.
-      - Warranties and Manuals
-        - Description: Documentation for product guarantees and user guides.
-        - Examples: Warranty information for electronics or appliances, user manuals, extended
-          service contracts, purchase receipts for large items or equipment.
-      - Memberships and Subscriptions
-        - Description: Details on recurring membership-based services or organizations.
-        - Examples: Gym memberships, club or association memberships, magazine or streaming
-          subscriptions, loyalty or frequent flyer program statements, renewal notices.
-      - Travel and Leisure
-        - Description: Arrangements and records related to vacations, trips, and leisure
-          activities.
-        - Examples: Travel itineraries, flight tickets, hotel confirmations, visa
-          documentation, travel insurance policies, timeshare contracts, past trip expense
-          receipts.
-      - Digital Assets and Online Accounts
-        - Description: Information and credentials for online identities, cloud services, and
-          digital platforms.
-        - Examples: Password manager references (stored securely), domain registrations, cloud
-          storage subscriptions, digital payment account details (PayPal, etc.), important
-          email or social media account notes.
-      - Sentimental and Historical
-        - Description: Keepsakes and personal or family history items with emotional or
-          genealogical importance.
-        - Examples: Family photos, letters, journals, genealogy research, copies of heirlooms,
-          scrapbooks, memorabilia.
-      - Miscellaneous and Other
-        - Description: A catch-all for documents that do not neatly fit into other categories.
-        - Examples: Personal or hobby-related projects, unusual one-off contracts, event
-          memorabilia, or temporary items awaiting proper classification.
-
   - date (string)
     - Must be a valid ISO 8601 date in the format YYYY-MM-DD.
     - If the text contains multiple dates, choose the one most relevant to the document (e.g.,
@@ -156,19 +37,18 @@ defmodule Archivist.SystemCalls do
   Your output must follow exactly this JSON structure (example with placeholder values):
 
   ```
-  {"category": "money", "date": "2025-01-30", "source": "abc-corp", "title": "invoice-for-jan"}
+  {"date": "2025-01-30", "source": "abc-corp", "title": "invoice-for-jan"}
   ```
   """
 
   @format %{
     type: :object,
     properties: %{
-      category: %{type: :string, enum: @categories},
       date: %{type: :string, format: :date},
       source: %{type: :string},
       title: %{type: :string}
     },
-    required: [:category, :date, :source, :title]
+    required: [:date, :source, :title]
   }
 
   @impl Archivist
@@ -215,14 +95,12 @@ defmodule Archivist.SystemCalls do
            ),
          {:ok,
           %{
-            "category" => category,
             "date" => date,
             "source" => source,
             "title" => title
           }} <- JSON.decode(response) do
       {:ok,
        %{
-         category: category,
          date: date,
          source: Slug.slugify(source, truncate: @slug_length),
          title: Slug.slugify(title, truncate: @slug_length)
        }}
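
After this change the model's reply is expected to carry only `date`, `source`, and `title`. The following is a minimal, hedged sketch of that decode-and-match step in isolation, not the module's actual code: `ResponseSketch` and `parse/1` are hypothetical names, the `Date.from_iso8601/1` check is an added assumption (the diff does not show date validation), and the slugification step is left out so the snippet has no dependencies. It assumes Elixir 1.18 or later for the built-in `JSON` module.

```elixir
defmodule ResponseSketch do
  # Hypothetical helper, not part of Archivist: decodes the model's JSON reply
  # and checks the three keys that the trimmed schema still requires.
  def parse(response) when is_binary(response) do
    with {:ok, %{"date" => date, "source" => source, "title" => title}}
         when is_binary(date) <- JSON.decode(response),
         # Assumption: also reject dates that are not valid ISO 8601 (YYYY-MM-DD).
         {:ok, _parsed} <- Date.from_iso8601(date) do
      {:ok, %{date: date, source: source, title: title}}
    else
      _ -> {:error, :invalid_response}
    end
  end
end
```

Because a map pattern matches on a subset of keys, a reply that still includes a stray `"category"` field would also match here; the extra key is simply ignored.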