README.md: 19 additions & 4 deletions
@@ -150,21 +150,36 @@ validators:
 
 As seen above, mdox supports validate configuration supports a few parameters and passing an array of link validators with types and regexes. The supported configuration parameters are:
 
-* `timeout`: The HTTP client's timeout. Defaults to "10s".
-* `parallelism`: The maximum amount of concurrent HTTP requests. Defaults to 100.
+* `timeout`: The HTTP client's timeout. Defaults to "30s".
+* `parallelism`: The maximum amount of concurrent HTTP requests. Defaults to 25.
 * `host_max_conns`: The maximum amount of HTTP connections open per host. Defaults to 2.
-* `random_delay`: A random delay between 0 and this value is added between requests. It takes values like "500ms", "1s", "1m", or "1m30s". Defaults to no delay.
+* `random_delay`: A random delay between 0 and this value is added between requests. It takes values like "500ms", "1s", "1m", or "1m30s". Defaults to "500ms".
 
 There are three types of validators:
 
 * `ignore`: This type of validator makes sure that `mdox` does not check links with provided regex. This is the most common use case.
 * `githubPullsIssues`: This is a smart validator which only accepts a specific type of regex of the form `(^http[s]?:\/\/)(www\.)?(github\.com\/){ORG}\/{REPO}(\/pull\/|\/issues\/)`. It performs smart validation on GitHub PR and issues links, by fetching GitHub API to get the latest pull/issue number and matching regex. This makes sure that mdox doesn't get rate limited by GitHub, even when checking a large number of GitHub links(which is pretty common in documentation)!
-* `roundtrip`: All links are checked with the roundtrip validator by default(no need for including into config explicitly) which means that each link is visited and fails if http status code is not 200(even after retries).
+* `roundtrip`: All links are checked with the roundtrip validator by default(no need for including into config explicitly) which means that each link is visited and fails if http status code is not 200(even after retries). Transient failures (HTTP 429, 503, 504, and network timeouts) are retried up to 3 times with exponential backoff. Permanent failures (404, DNS NXDOMAIN, TLS/certificate errors) fail immediately.
 
 Relative link checking *is not* affected by this configuration, as it is expected that such links will work.
 
 YAML can be passed in directly as well using `links.validate.config` flag! For more details [go.dev reference](https://pkg.go.dev/github.com/bwplotka/mdox) or [Go struct](https://github.com/bwplotka/mdox/blob/main/pkg/mdformatter/linktransformer/config.go).
 
+#### Recommended CI configuration
+
+For CI environments, enabling caching avoids re-checking links that were recently verified. This is the most effective way to reduce flaky CI failures from transient network issues or rate limiting:
+
+```yaml
+version: 1
+timeout: '1m'
+parallelism: 25
+host_max_conns: 2
+random_delay: '500ms'
+cache:
+  type: 'sqlite'
+  jitter: '24h'
+```
+
 ### Link localization
 
 It is expected for documentation to contain remote links to the project website. However, in such cases, it creates problems for multi-version docs or multi-domain websites (links would need to be updated for each version which is cumbersome). Also, it would not be navigatable locally or through GitHub (would always redirect to the website) and requires additional link checking.
 v.remoteLinks[response.Ctx.Get(originalURLKey)] = fmt.Errorf("remote link retry %v: %w", response.Ctx.Get(originalURLKey), err)
-break
-}
-v.remoteLinks[response.Ctx.Get(originalURLKey)] = fmt.Errorf("%q rate limited even after retry; status code %v: %w", response.Request.URL.String(), response.StatusCode, err)
 v.remoteLinks[response.Ctx.Get(originalURLKey)] = fmt.Errorf("remote link retry %v: %w", response.Ctx.Get(originalURLKey), err)
-break
-}
-v.remoteLinks[response.Ctx.Get(originalURLKey)] = fmt.Errorf("%q not accessible even after retry; status code %v: %w", response.Request.URL.String(), response.StatusCode, err)
-default:
-v.remoteLinks[response.Ctx.Get(originalURLKey)] = fmt.Errorf("%q not accessible; status code %v: %w", response.Request.URL.String(), response.StatusCode, err)