@startuml ShowScraper System Architecture
!define RECTANGLE class
skinparam componentStyle rectangle
skinparam backgroundColor #FEFEFE
' The tier stereotypes are attached to packages, so the colors are set on packages.
skinparam package {
  BackgroundColor<<scraper>> #FFE5E5
  BackgroundColor<<llm>> #E5F5FF
  BackgroundColor<<frontend>> #E5FFE5
  BackgroundColor<<external>> #FFF5E5
  BackgroundColor<<storage>> #F5E5FF
}
title ShowScraper - Bay Area Concert Aggregation Platform
' === SCRAPER TIER ===
package "Scraper Backend (Ruby + Selenium)" <<scraper>> {
  component [bin/run_scraper\nCLI Entry Point] as CLI
  component [scraper.rb\nMain Orchestrator] as Scraper
  component [54+ Venue Scrapers\n(DnaLounge.rb, Fillmore.rb, etc.)] as Venues
  component [GCS Upload\n(gcs.rb)] as GCSUpload
  component [Selenium WebDriver\nFirefox/Chrome] as Selenium
}
' === STORAGE TIER ===
package "Google Cloud Storage" <<storage>> {
  database [show-scraper-data\n• sources.json\n• {VenueName}.json (54+ files)] as GCS
}
' === LLM TIER ===
package "LLM Server (Python FastAPI)" <<llm>> {
  component [FastAPI Server\nlocalhost:8000] as FastAPI
  component [Concert Research Handler\n(concert_research.py)] as Handler
  package "Research Modes" {
    component [Quick Mode\n(2-3 sentence summary)] as Quick
    component [Artists List Mode\n(multi-artist research)] as ArtistsList
    component [Artists Fields Mode\n(deep dive research)] as ArtistsFields
  }
  component [Cache System\n(tasks/cache/)] as Cache
  component [Logging\n(tasks/logs/)] as Logs
}
' === FRONTEND TIER ===
package "Frontend (React + GitHub Pages)" <<frontend>> {
  component [React App\n(App.jsx)] as ReactApp
  component [GcsDataLoader\n(fetch venue & event data)] as DataLoader
  package "State Management (Recoil)" {
    component [eventsState\nvenuesState\naiModalState] as State
  }
  package "Views" {
    component [EventListView\n(Text + Images)] as EventList
    component [MapView\n(Leaflet)] as MapView
    component [VenuesList] as VenuesList
    component [AIResearchModal] as AIModal
  }
  component [Utilities\n(date, filter, calendar, map)] as Utils
}
' === EXTERNAL SERVICES ===
package "External Services" <<external>> {
  component [Venue Websites\n(54+ venues)] as Websites
  component [OpenAI API\n(LLM)] as OpenAI
  component [SerpAPI\n(Web Search)] as SerpAPI
  component [GitHub Pages\nbayareashows.org] as GitHubPages
}
' === SCRAPER FLOW ===
CLI --> Scraper : launches
Scraper --> Venues : orchestrates 54+ scrapers\n(3 min timeout each)
Venues --> Selenium : browser automation
Selenium --> Websites : HTTP requests
Websites --> Selenium : HTML responses
Venues --> Scraper : event arrays\n[{url, title, date, img, details}]
Scraper --> GCSUpload : validated events
GCSUpload --> GCS : upload JSON files
' === FRONTEND DATA FLOW ===
GCS --> DataLoader : fetch sources.json +\n54+ venue JSON files
DataLoader --> State : populate eventsState\n& venuesState
State --> EventList : render events by date
State --> MapView : render venue markers
State --> VenuesList : render venue list
State --> AIModal : event selection
' === AI RESEARCH FLOW ===
AIModal --> FastAPI : SSE request\n/tasks/concert-research?date&title&venue&mode
FastAPI --> Handler : route request
Handler --> Quick : mode=quick
Handler --> ArtistsList : mode=artists_list
Handler --> ArtistsFields : mode=artists_fields
Quick --> OpenAI : simple LLM call
ArtistsList --> OpenAI : LLM with SerpAPI tool
ArtistsList --> SerpAPI : web searches
ArtistsFields --> OpenAI : multi-query LLM
ArtistsFields --> SerpAPI : multiple searches
Quick --> Cache : store results
ArtistsList --> Cache : store results
ArtistsFields --> Cache : store results
Handler --> Logs : log requests/responses
Handler --> AIModal : SSE stream response
AIModal --> State : cache to localStorage
' === DEPLOYMENT ===
ReactApp --> GitHubPages : yarn deploy\n(gh-pages branch)
' === NOTES ===
note right of Scraper
**Scraper Configuration**
• runs on demand or via cron
• headless mode option
• dry-run mode
• per-venue timeout
• event validation
end note
note right of GCS
**Data Format**
Event: {
url: string
title: string
date: DateTime
img: string
details: string
}
end note
note right of FastAPI
**Infrastructure**
• Rate limit: 10/min, 200/day
• CORS: localhost + production
• AgentOps tracing
• SSE streaming
end note
note right of State
**Key State Atoms**
• eventsState (grouped by MM-DD)
• venuesState (54+ venues)
• currentDayState
• timeGroupingModeState
end note
@enduml