Skip to content

Add log type include filter#714

Open
jorbaum wants to merge 20 commits intocloudfoundry:mainfrom
jorbaum:add-log-type-filter
Open

Add log type include filter#714
jorbaum wants to merge 20 commits intocloudfoundry:mainfrom
jorbaum:add-log-type-filter

Conversation

@jorbaum
Copy link
Contributor

@jorbaum jorbaum commented Jan 5, 2026

Description

Addresses #711 :

  • Support both ?include-source-types= and ?exclude-source-types (calling it source type instead of log type as log type is too generic and is already used in this code base)
  • Throw error in case both are configured. Do not throw an error on misconfigured values for the new config option (consistent with previous HTTP query param usage in binding config), but log a warning (new).
  • Only exclude filter shall pass unknown/nil source type, include filter will discard

I still need to verify this, but I think a feature flag is not necessary as I tried to make string parsing more efficient by:

  • running only when source type tag and log type filter is present
  • strings.IndexByte might take some time, but it is used only with a single char / and strings.IndexByte() should be more efficient than strings.Index()
  • Looking up content in a map logTypePrefixes rather than in each log line, which should be more efficient

Related PRs:

Disclaimer: Claude helped me in crafting this code.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

@jorbaum jorbaum force-pushed the add-log-type-filter branch from 43b0941 to 9fcc372 Compare January 8, 2026 09:41
@jorbaum jorbaum force-pushed the add-log-type-filter branch from 9fcc372 to bccd92c Compare January 8, 2026 09:43
Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorbaum Thanks for the changes.

Generally it looks good, but I have some smaller objections which should be addressed.

DrainData DrainData `json:"type,omitempty"`
OmitMetadata bool
InternalTls bool
LogFilter *LogTypeSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this has to be a pointer.

Copy link
Contributor Author

@jorbaum jorbaum Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that at first, but in manager.go line 45 Binding is used as a map key:

	sourceDrainMap    map[string]map[syslog.Binding]drainHolder

My understanding is that maps in Go are not comparable, so adding LogTypeSet (which is a map[LogType]struct{}) to the Binding struct makes it non-comparable.

AFAICS the alternative to using a pointer would be to implement a custom comparable key for the map, but using a pointer seemed simpler to me.

default:
// Unknown log type, skip it
// TODO add unit test for unknown log type
d.log.Printf("Unknown log type '%s' in log type filter, ignoring", logType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a log message here and an app owner wrongly configures the drain, than for every app instance and every app log, a log ,message will be written.

The filter settings should be validated upfront and we should not end up in this situation. At this point we should only have valid drain configuration.
I guess we have to bring in #633 first and than this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this code does not run for every log message, but only parsing the syslog drain config. I intentionally left logging out of filtering_drain_writer.go to avoid creating a log for each app log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @chombium we concluded that yhis log gets created for each wrongly configured value in the log types of each syslog drain. This is emitted every few seconds on each VM within cloud foundry. While this is better than for each log line, we would still keep log load at a minimum.

To address this I now made sure to log at most one log line in this method.

set := make(syslog.LogTypeSet, len(logTypes))

for _, logType := range logTypes {
logType = strings.ToLower(strings.TrimSpace(logType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query string in the URL is case sensitive. We should only accept lowercase and not convert eVerY POssibLE string to lower case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did so.

Still, I wonder if we should be strict here about this. Not caring about casing might bring us some benefits as this might be something that users accidentally misconfigure. It should not hurt much to support both I think.

@chombium
Copy link
Contributor

chombium commented Jan 9, 2026

@ZPascal Wdyt about this PR? It addresses #711

@jorbaum
Copy link
Contributor Author

jorbaum commented Jan 12, 2026

@chombium I left a bunch of replies for points that are not yet clear to me. All the rest look good.

Thanks for the thorough review! You found a bunch of nice things.

Copy link
Contributor Author

@jorbaum jorbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked in some comments.

set := make(syslog.LogTypeSet, len(logTypes))

for _, logType := range logTypes {
logType = strings.ToLower(strings.TrimSpace(logType))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did so.

Still, I wonder if we should be strict here about this. Not caring about casing might bring us some benefits as this might be something that users accidentally misconfigure. It should not hurt much to support both I think.

Copy link
Contributor Author

@jorbaum jorbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked in most comments. Still 2 left to clarify.

default:
// Unknown log type, skip it
// TODO add unit test for unknown log type
d.log.Printf("Unknown log type '%s' in log type filter, ignoring", logType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @chombium we concluded that yhis log gets created for each wrongly configured value in the log types of each syslog drain. This is emitted every few seconds on each VM within cloud foundry. While this is better than for each log line, we would still keep log load at a minimum.

To address this I now made sure to log at most one log line in this method.

@jorbaum
Copy link
Contributor Author

jorbaum commented Jan 19, 2026

Feel free to take another look at it.

I noticed that as my changes depend on `RawQuery` not being empty, but
actually they were.
@jorbaum jorbaum force-pushed the add-log-type-filter branch from 382854c to 21d9380 Compare February 11, 2026 14:24
@jorbaum jorbaum force-pushed the add-log-type-filter branch from 21d9380 to 108740d Compare February 11, 2026 14:26
Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jorbaum. Thanks for the improvements. Unfortunately, I've found few more problematic things.

return true
}

if sourceTypeTag == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source_type is always set. Why is this check needed?

type SourceType string

const (
SOURCE_API SourceType = "API"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add log somewhere in the name of the constants, so it is clear that these are for logs. Something like LOG_SOURCE_STG for example

Expect(err).To(HaveOccurred())
})

Context("when source_type tag is missing", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a source type be missing for log envelopes?
Please read the Loggregator API spec for Log Envelopes.

}
})

It("omits logs when source type include filter is configured with LOGS", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds odd: It omits logs when the include filter is configured.

Does the test test if the wanted logs are sent (included)? If an include filter is set , only these logs should be included

Expect(fakeWriter.received).To(Equal(0))
})

It("sends logs when source type exclude filter is configured with LOGS", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above. The naming of the test is odd.

}

// AllSourceTypes returns all valid source types
func AllSourceTypes() []SourceType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this function used?

set := make(syslog.SourceTypeSet, len(sourceTypes))

for _, sourceType := range sourceTypes {
sourceType = strings.TrimSpace(sourceType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not trim, leading and trailing space will be url encoded as %20 or +. Nevertheless, if there is a space in the url that will be misformed url, an invalid Syslog Drain in our case.

for _, sourceType := range sourceTypes {
sourceType = strings.TrimSpace(sourceType)
t, _ := syslog.ParseSourceType(sourceType)
set.Add(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a source type is invalid? Shouldn't that throw an error and mark the drain as invalid?

}

func (d *DrainParamParser) getLogFilter(u *url.URL) *syslog.LogFilter {
includeSourceTypes := u.Query().Get("include-source-types")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it would be better if we add log in the names of the parameters, so that they have clear descriptive names. Maybe something like include-log-source-types and exclude-log-source-types will be better...

},
{
name: "include-source-types=app",
url: "https://test.org/drain?include-source-types=app",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be types=APP. The urls are case sensitive. As the values of the source types are defined uppercase, I'm not sure if lower case should be accepted for the values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants