[TOC]

Data flows through the system as follows:

- Raw files are stored in a blob store
- Riviera extracts features and metadata from these files
- The extracted information passes through third-party connectors
- Kafka message brokers transport the data
- Transformers process and structure the information
- Finally, the structured data populates the search index
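
The stages above can be illustrated with a minimal in-memory sketch. Every name here (`blob_store`, `extract`, `connector`, `topic`, `transform`, `search_index`) is hypothetical. A `deque` stands in for a Kafka topic, and a plain dict stands in for the search index. This is not Dropbox's implementation, only a toy model of the order in which data moves.

```python
from collections import deque

# Hypothetical stand-in for the blob store: path -> raw bytes.
blob_store: dict[str, bytes] = {"photos/beach.jpg": b"...jpeg bytes..."}


def extract(path: str, data: bytes) -> dict:
    """Toy stand-in for Riviera: derive simple metadata from raw bytes."""
    return {"path": path, "size": len(data), "ext": path.rsplit(".", 1)[-1]}


def connector(record: dict) -> dict:
    """Toy stand-in for a connector: wrap a record as a message to publish."""
    return {"doc_id": record["path"], "fields": record}


# Stand-in for a Kafka topic: producers append, consumers pop from the left.
topic: deque = deque()


def transform(message: dict) -> dict:
    """Toy stand-in for a transformer: turn fields into indexable terms."""
    fields = message["fields"]
    words = fields["path"].replace("/", " ").replace(".", " ").split()
    return {"doc_id": message["doc_id"], "terms": {fields["ext"], *words}}


# Inverted index: term -> set of document ids.
search_index: dict[str, set[str]] = {}


def run_pipeline() -> None:
    # Produce: blob store -> extraction -> connector -> topic.
    for path, data in blob_store.items():
        topic.append(connector(extract(path, data)))
    # Consume: topic -> transformer -> search index.
    while topic:
        doc = transform(topic.popleft())
        for term in doc["terms"]:
            search_index.setdefault(term, set()).add(doc["doc_id"])
```

After `run_pipeline()`, `search_index["beach"]` contains `"photos/beach.jpg"`. Placing a broker between extraction and transformation lets the two sides run and scale independently, which the queue only hints at here.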

During indexing, when a file's metadata contains GPS coordinates, Dropbox converts those coordinates into a hierarchical chain of location IDs. For example, a photo taken in San Francisco would produce a chain linking San Francisco to California to the United States. This hierarchy matters because it enables searching at different geographic levels: the same photo can match a query for "San Francisco", "California", or "United States".

When a search returns results, the system generates preview URLs that the frontend can fetch. These URLs point to a preview service, built on top of Riviera, that generates thumbnails and previews at multiple resolutions on the fly. To avoid regenerating the same preview repeatedly, the system caches each generated preview for 30 days, trading storage cost against rendering performance.
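
The caching behaviour can be sketched as a read-through cache with a 30-day time-to-live. This is a hypothetical, in-process illustration: `PreviewCache`, the `render` callback, and keying on `(file_id, width)` are assumptions, not Dropbox's design, and the storage backend of the real cache is not described in the source. The clock is injectable so expiry can be tested without waiting 30 days.

```python
import time
from typing import Callable

PREVIEW_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days


class PreviewCache:
    """Read-through cache: serve a fresh preview, else render and store it."""

    def __init__(
        self,
        render: Callable[[str, int], bytes],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._render = render  # hypothetical renderer: (file_id, width) -> bytes
        self._clock = clock
        # (file_id, width) -> (expiry timestamp, preview bytes)
        self._entries: dict[tuple[str, int], tuple[float, bytes]] = {}

    def get(self, file_id: str, width: int) -> bytes:
        key = (file_id, width)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the cached preview has not expired yet
        # Miss or expired entry: render again and restart the 30-day window.
        preview = self._render(file_id, width)
        self._entries[key] = (now + PREVIEW_TTL_SECONDS, preview)
        return preview
```

Each resolution is cached separately, so a 256-pixel thumbnail and a 1024-pixel preview of the same file are rendered once each. In practice the key would likely also need a file revision or content hash so that editing a file does not serve a stale preview, and expired entries would need eviction, which this sketch omits.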