
ETL: batch geocoding pipeline with rate limiting and resume support #26

@francescobianco


Problem

When geocoding large address lists (10k+ records), the API rate limits kick in. There is currently no built-in mechanism in the SDK to handle this gracefully.

Typical pattern people try

$client = new Client($token);
foreach ($addresses as $addr) {
    $result = $client->get('https://geocoding.openapi.com/geocode', ['address' => $addr]);
    // if rate limit hit (429), this throws immediately and the whole batch is lost
}

What would help

A retry/backoff decorator on top of HttpTransportInterface:

use Openapi\Transports\RetryTransport;

$client = new Client($token, new RetryTransport(
    maxRetries: 3,
    backoffMs: 500,
    retryOn: [429, 503]
));
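To make the proposal concrete, here is a minimal sketch of what such a decorator could look like. The real `HttpTransportInterface` signature is not shown in this issue, so `SketchTransport` below is a hypothetical stand-in, and `RetryingTransportSketch` is an illustrative name. Unlike the constructor proposed above, this version takes the wrapped transport explicitly, plus an injectable sleeper (also an assumption) so it can be tested without waiting.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the SDK's HttpTransportInterface; the real
// method names and signature may differ.
interface SketchTransport
{
    /** @return array{status:int, body:string} */
    public function request(string $method, string $url, array $params = []): array;
}

// Illustrative decorator: retries the wrapped transport on selected status codes.
final class RetryingTransportSketch implements SketchTransport
{
    /** @var callable(int):void */
    private $sleeper;

    public function __construct(
        private SketchTransport $inner,
        private int $maxRetries = 3,
        private int $backoffMs = 500,
        private array $retryOn = [429, 503],
        ?callable $sleeper = null
    ) {
        // Injectable sleeper so tests do not have to actually wait.
        $this->sleeper = $sleeper ?? static function (int $ms): void {
            usleep($ms * 1000);
        };
    }

    public function request(string $method, string $url, array $params = []): array
    {
        for ($attempt = 0; ; $attempt++) {
            $response = $this->inner->request($method, $url, $params);
            $retryable = in_array($response['status'], $this->retryOn, true);

            // Return on success, on a non-retryable status, or once retries run out.
            if (!$retryable || $attempt >= $this->maxRetries) {
                return $response;
            }

            // Exponential backoff: backoffMs, then 2x, 4x, ...
            ($this->sleeper)($this->backoffMs * (2 ** $attempt));
        }
    }
}
```

With `maxRetries: 3` the wrapped transport is called at most four times (one initial attempt plus three retries). Whether the backoff should be exponential, fixed, or jittered is a design choice this sketch does not settle.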

A checkpoint mechanism would also help, so that an interrupted batch can resume from the last successful record instead of starting over.
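As a rough illustration of the checkpoint idea, the sketch below stores the index of the next record to process in a plain file. This assumes the file-based option; the function name `geocodeWithCheckpoint`, the `$geocode` callable, and the checkpoint format are all hypothetical and not part of the SDK.

```php
<?php
declare(strict_types=1);

/**
 * Runs $geocode over $addresses, persisting the index of the next record
 * to process so that an interrupted run can resume where it stopped.
 *
 * @param list<string> $addresses
 * @param callable(string):mixed $geocode hypothetical per-address call
 * @return array<int, mixed> results from this run only, keyed by address index
 */
function geocodeWithCheckpoint(array $addresses, callable $geocode, string $checkpointFile): array
{
    // Resume from the stored index, or start at 0 on a first run.
    $start = is_file($checkpointFile) ? (int) file_get_contents($checkpointFile) : 0;
    $results = [];

    for ($i = $start, $n = count($addresses); $i < $n; $i++) {
        // If this throws (for example after retries are exhausted), the
        // checkpoint still points at record $i, so the next run retries it.
        $results[$i] = $geocode($addresses[$i]);

        // Record progress only after the record succeeded.
        file_put_contents($checkpointFile, (string) ($i + 1), LOCK_EX);
    }

    return $results;
}
```

Note that this only returns the results produced by the current run. A real pipeline would need to persist each result before advancing the checkpoint, otherwise a crash between the two steps loses that record's output.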

Open questions

  • Should RetryTransport be part of the core SDK or a separate optional package?
  • Should the checkpoint state be file-based, Redis-based, or left to the consumer?
  • Is 429 the only rate-limit signal used across all Openapi endpoints?
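On the last question: besides the status code, HTTP defines a standard `Retry-After` response header, which carries either a number of seconds or an HTTP date. Whether any Openapi endpoint actually sends it is not known here, so the helper below is only a sketch of how a retry layer could honor it if present. The name `retryAfterMs` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Converts a Retry-After header value into a wait time in milliseconds.
 * Returns null when the header is absent or cannot be parsed.
 *
 * @param int|null $now current Unix time, injectable for testing
 */
function retryAfterMs(?string $headerValue, ?int $now = null): ?int
{
    if ($headerValue === null) {
        return null;
    }

    $value = trim($headerValue);
    if ($value === '') {
        return null;
    }

    // Form 1: delay-seconds, a non-negative integer such as "120".
    if (ctype_digit($value)) {
        return (int) $value * 1000;
    }

    // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    $timestamp = strtotime($value);
    if ($timestamp === false) {
        return null;
    }

    $now ??= time();

    // A date in the past means no extra wait is needed.
    return max(0, $timestamp - $now) * 1000;
}
```

A retry decorator could prefer this value over its own backoff when it is non-null, and fall back to the configured `backoffMs` otherwise.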


Labels: enhancement, question
