Add sharding feature for k8s_cluster receiver #45311

@ChrsMark

Description

Component(s)

receiver/k8scluster

Is your feature request related to a problem? Please describe.

Centralised polling with a single Collector might not scale well on large clusters.

With deterministic sharding we could support horizontal scaling: each Collector instance handles a disjoint subset of objects.
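
As a rough sketch of what "deterministic" means here (illustrative only; the shardFor helper below is not from the proposal): each object's UID is hashed and reduced modulo the shard count, so every object maps to exactly one replica and the mapping stays stable across restarts.

package main

import (
	"fmt"

	"github.com/cespare/xxhash/v2"
)

// shardFor is a hypothetical helper that deterministically maps an object UID
// to one of totalShards replicas.
func shardFor(uid string, totalShards uint64) uint64 {
	return xxhash.Sum64String(uid) % totalShards
}

func main() {
	uids := []string{
		"0b1a7c1e-aaaa-4f7e-9b1a-111111111111",
		"4d2f9c3a-bbbb-4f7e-9b1a-222222222222",
		"9e8d7c6b-cccc-4f7e-9b1a-333333333333",
	}
	for _, uid := range uids {
		// With 3 replicas, each UID lands on exactly one shard; the other
		// two replicas drop the object in their informers' filters.
		fmt.Printf("%s -> shard %d\n", uid, shardFor(uid, 3))
	}
}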

Describe the solution you'd like

Kube-state-metrics already implements this idea.

The idea is to back the receiver's SharedIndexInformers with a cache.ListWatch that applies extra filtering, so each replica only keeps the objects that hash to its shard:

import (
	"context"
	"time"

	"github.com/cespare/xxhash/v2" // assuming the xxhash/v2 module, as used by kube-state-metrics
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/tools/cache"
)

// sharding identifies this replica's shard and the total number of shards.
type sharding struct {
	shard       uint64
	totalShards uint64
}

// keep reports whether this replica owns the object: the object's UID is
// hashed and assigned to exactly one shard via modulo.
func (s *sharding) keep(o metav1.Object) bool {
	h := xxhash.New()
	h.Write([]byte(o.GetUID()))
	return (h.Sum64() % s.totalShards) == s.shard
}

// newShardedInformer builds a SharedIndexInformer with a ListWatch that filters objects by the sharding rule.
func newShardedInformer(
	s sharding,
	objType runtime.Object,
	listWithCtx func(ctx context.Context, options metav1.ListOptions) (runtime.Object, error),
	watchWithCtx func(ctx context.Context, options metav1.ListOptions) (watch.Interface, error),
	resyncPeriod time.Duration,
	indexers cache.Indexers,
) cache.SharedIndexInformer {
	lw := &cache.ListWatch{
		// ListFunc is the non-context fallback; the sharding filter for lists
		// is applied in ListWithContextFunc below, which recent client-go
		// prefers when it is set.
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			return listWithCtx(context.Background(), options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			w, err := watchWithCtx(context.Background(), options)
			if err != nil {
				return nil, err
			}
			return watch.Filter(w, func(in watch.Event) (out watch.Event, keep bool) {
				a, err := meta.Accessor(in.Object)
				if err != nil {
					return in, true
				}
				return in, s.keep(a)
			}), nil
		},
		// The context-aware list filters the returned items down to this shard.
		ListWithContextFunc: func(ctx context.Context, options metav1.ListOptions) (runtime.Object, error) {
			obj, err := listWithCtx(ctx, options)
			if err != nil {
				return nil, err
			}
			items, err := meta.ExtractList(obj)
			if err != nil {
				// If extraction fails, do not drop the list.
				return obj, nil
			}
			kept := make([]runtime.Object, 0, len(items))
			for _, it := range items {
				a, err := meta.Accessor(it)
				if err != nil {
					kept = append(kept, it)
					continue
				}
				if s.keep(a) {
					kept = append(kept, it)
				}
			}
			_ = meta.SetList(obj, kept)
			return obj, nil
		},
		// The context-aware watch drops events for objects outside this shard.
		WatchFuncWithContext: func(ctx context.Context, options metav1.ListOptions) (watch.Interface, error) {
			w, err := watchWithCtx(ctx, options)
			if err != nil {
				return nil, err
			}
			return watch.Filter(w, func(in watch.Event) (out watch.Event, keep bool) {
				a, err := meta.Accessor(in.Object)
				if err != nil {
					return in, true
				}
				return in, s.keep(a)
			}), nil
		},
	}
	return cache.NewSharedIndexInformer(
		lw,
		objType,
		resyncPeriod,
		indexers,
	)
}
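
For illustration, a hypothetical usage sketch (the clientset wiring and the newShardedPodInformer name are assumptions, not part of the POC) showing how such an informer could be built for Pods:

// Hypothetical wiring example; additionally assumes
// corev1 "k8s.io/api/core/v1" and "k8s.io/client-go/kubernetes" are imported.
func newShardedPodInformer(client kubernetes.Interface, s sharding) cache.SharedIndexInformer {
	return newShardedInformer(
		s,
		&corev1.Pod{},
		func(ctx context.Context, options metav1.ListOptions) (runtime.Object, error) {
			return client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, options)
		},
		func(ctx context.Context, options metav1.ListOptions) (watch.Interface, error) {
			return client.CoreV1().Pods(metav1.NamespaceAll).Watch(ctx, options)
		},
		10*time.Minute, // resync period chosen arbitrarily for the example
		cache.Indexers{},
	)
}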

This addition would be a nice-to-have feature for the k8s_cluster receiver.

Sample config

receivers:
  k8s_cluster:
    collection_interval: 10s
    sharding:
      shard_instance_id: ${env:REPLICA_ID}
      total_shards: 3

where REPLICA_ID is set from the apps.kubernetes.io/pod-index label that the StatefulSet controller adds to each Pod.

This feature mostly makes sense when the Collector runs as a StatefulSet (as in kube-state-metrics), leveraging the deterministic, stable pod identities that the StatefulSet workload type provides.

A sample values file for the existing Helm Chart:

mode: statefulset
replicaCount: 3


image:
  repository: otelcontribcol-dev
  tag: "latest"
  pullPolicy: IfNotPresent

presets:
  clusterMetrics:
    enabled: true

resources:
  limits:
    cpu: 1
    memory: 1Gi

extraEnvs:
  - name: REPLICA_ID
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['apps.kubernetes.io/pod-index']


config:
  receivers:
    k8s_cluster:
      collection_interval: 10s
      sharding:
        shard_instance_id: ${env:REPLICA_ID}
        total_shards: 3
  exporters:
    debug:
      verbosity: detailed

  service:
    extensions: [health_check]
    telemetry:
      logs:
        level: INFO
    pipelines:
      metrics:
        receivers: [ k8s_cluster ]
        processors: [ ]
        exporters: [ debug ]

Describe alternatives you've considered

No response

Additional context

A draft POC patch: https://github.com/open-telemetry/opentelemetry-collector-contrib/compare/main...ChrsMark:opentelemetry-collector-contrib:k8scluster_sharding?expand=1

