Add EEP draft for in-app export visibility #85
application name. Using the OTP application name is recommended, but the
attribute defines a visibility domain. This lets the mechanism remain useful
for code that is not packaged as an OTP application while preserving the common
OTP use case.
There was a problem hiding this comment.
This also enables having a group of applications that share the same scope. For example Cowlib has many internal functions for Cowboy that should not be used by Cowboy users. A common scope would enforce that.
Worth pointing out that this can be circumvented by having a module use the same scope and proxying functions to that module. I still think there is value since the user has to explicitly use the scope, demonstrating intent to use internal functions.
There was a problem hiding this comment.
Yes, that is true. A module can be placed anywhere and declare -app(my_app), and then it can call functions restricted to my_app.
That is intentional in the current design. The boundary is based on explicit module metadata, not filesystem layout, code path location, or OTP application discovery. I think that is preferable because those alternatives become implicit and brittle, especially with releases, generated code, test modules, vendored code, and custom build systems.
This is not meant to be a security sandbox. It is an encapsulation mechanism for making internal API boundaries explicit and enforceable against accidental use. If someone deliberately adds a false -app(...) attribute to circumvent the boundary, they are already modifying code in the system and should understand that they are opting into that visibility domain.
A module can only declare one app visibility domain, so the attribute still gives tools and the VM a clear, inspectable rule: this module belongs to exactly one visibility group, and app-local calls are allowed only within that group.
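For readers following along, here is a minimal sketch of what a module using the two attributes might look like. The module and function names (cow_demo_parser, split_header/1) are hypothetical, and listing the restricted function in both -export/1 and -app_export/1 is an assumption based on the lint warning described in the prototype. On a current OTP release the two attributes are just ordinary user-defined attributes, so this compiles but nothing is enforced.

```erlang
%% Sketch only: module and function names are hypothetical.
-module(cow_demo_parser).
-app(cow).                       %% proposed: this module's visibility domain
-export([parse/1, split_header/1]).
-app_export([split_header/1]).   %% proposed: callable only from -app(cow) modules

%% Public API: any module may call this.
parse(Line) when is_binary(Line) ->
    split_header(Line).

%% Internal cross-module API: exported, but under the proposal only
%% modules that also declare -app(cow) could call it remotely.
split_header(Line) ->
    case binary:split(Line, <<": ">>) of
        [Name, Value] -> {Name, Value};
        _ -> error(badarg)
    end.
```

Under the proposal, a module without -app(cow) calling cow_demo_parser:split_header/1 would get a badappcall error, while parse/1 stays callable from anywhere.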
There was a problem hiding this comment.
A module can only declare one app visibility domain
That seems too limiting to me. E.g. cowboy_module_x declares -app(cowlib) so it can use app-exported functions from the cowlib app scope, but then it can't declare its own scope to use app_exports for its own functions? Should there be a distinction between -app/1 and e.g. a -use_app/1 attribute, where -app/1 is one per module and -use_app/1 is many per module?
There was a problem hiding this comment.
You could put everything in the app cow if you want.
There was a problem hiding this comment.
You could, but it would be a semantic mess.
What if -app/1 would just be used when you want to use app, and -app_export would take 2 arguments: app and list of exports? That would count both as app definition and app export.
There was a problem hiding this comment.
I think I understand the argument, or at least have a case where the current proposal is limited.
- Cowboy and Gun both depend on Cowlib, which contains many private functions made specifically for Cowboy, for Gun, or for both. Even if used by only one of them, they should still sit in Cowlib to facilitate testing (e.g. parser+builder of RFC components together).
- Cowlib defines the cow scope.
- Cowboy and Gun also define the cow scope to use the functions from Cowlib.
- Problem: Cowboy uses Gun only for testing (and vice versa), so the scope is not enough to ensure Cowboy itself doesn't call Gun functions internally (and vice versa).
Using separate scopes for separate modules helps a little, but the problem can still occur. The problem is likely caught elsewhere (xref possibly).
The problem would not happen if a module's scope and the scopes used by the module were separate. Cowboy could have all its modules be in the cowboy scope, Gun in gun and Cowlib in cow, and Cowboy modules that need access to Cowlib (and not to Gun) could say they use the cow scope.
So you'd have the scope of the module, and extra scopes used by the module.
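To make the "own scope plus used scopes" idea concrete, here is a small model of the rule written as plain Erlang. It is only a sketch of the semantics discussed in this thread, not anything from the draft or the prototype: the module name scope_model, the map shape for the caller, and the public / {restricted, Scope} tags are all assumptions made for illustration.

```erlang
%% Sketch only: models the two-level rule, nothing here is VM behaviour.
-module(scope_model).
-export([may_call/2]).

%% Caller :: #{scope := atom(), uses := [atom()]}
%% Target :: public | {restricted, Scope :: atom()}
may_call(_Caller, public) ->
    %% Ordinary exports stay callable by everyone.
    true;
may_call(#{scope := Scope}, {restricted, Scope}) ->
    %% Caller belongs to the same scope as the target.
    true;
may_call(#{uses := Uses}, {restricted, Scope}) ->
    %% Otherwise the caller must explicitly use the target's scope.
    lists:member(Scope, Uses).
```

With Cowboy modelled as #{scope => cowboy, uses => [cow]}, a call into {restricted, cow} is allowed while a call into {restricted, gun} is not, which is the separation the shared cow scope cannot express.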
Implements the runtime side of EEP 80 "BEAM-Level In-App Export Visibility"
as a single squashed patch on top of upstream erlang/otp.

Public spec: erlang/eep#85

Mechanism
=========

Two new module attributes:

    -app(App).
    -app_export([Name/Arity, ...]).

A function listed in -app_export/1 is exported normally, but remote calls
to it succeed only when the caller module declares the same -app(App)
value. Cross-app callers and callers with no -app/1 raise

    error:{badappcall, [{TargetModule, TargetFunction, ArgsOrArity, []}]}

Modules without -app/1 behave exactly as today. Functions not listed in
-app_export/1 behave exactly as today.

Implementation summary
======================

Loader (erts/emulator/beam/beam_load.c)
    Parses the new attributes. Stores the module's app atom on the Module
    struct, and for each app-restricted export sets the app_restricted
    flag plus the app_atom field on the Export entry.

Export and Module state (erts/emulator/beam/export.{c,h}, module.{c,h})
    Adds Eterm app_atom and int app_restricted to Export. Adds Eterm
    app_atom to Module.

Process state (erts/emulator/beam/erl_process.{c,h})
    Adds c_p->caller_app (preserved-caller cache, used for the
    error_handler bounce) and c_p->caller_saved_i (a PC inside the calling
    JIT'd function that call_error_handler / apply() can resolve back to
    the caller's module).

Atoms (erts/emulator/beam/atom.names)
    am_app, am_app_export, am_badappcall, am_undefined as needed.

Interpreter check (erts/emulator/beam/beam_common.c, emu/instrs.tab,
emu/macros.tab, emu/beam_emu.c)
    DISPATCH_EXPORT and DISPATCH_EXPORT_TAILCALL gate the dispatch through
    erts_check_app_visibility, passing the caller PC. i_call_ext and the
    apply opcodes write c_p->caller_saved_i so the apply() runtime and
    call_error_handler can recover the caller after a tail call or an
    error_handler bounce.

apply() / fixed_apply() (beam_common.c)
    Populates c_p->caller_app from caller_saved_i when the immediate
    caller has a real app atom; preserves the cached value when the caller
    is system code (e.g. error_handler).

call_error_handler() (beam_common.c)
    When dispatch hits a stub Export and is about to bounce through
    error_handler:undefined_function/3, capture the original caller's app
    from the topmost Erlang-stack CP and, failing that, from
    c_p->caller_saved_i. This is what makes a direct same-app call to a
    never-loaded restricted function succeed: the first dispatch finds
    ep->app_restricted == 0 (the loader has not run for the target module
    yet), so the gate skips, but the bounce preserves the caller's
    identity for the second visibility check after error_handler loads the
    module and re-applies.

ARM JIT (erts/emulator/beam/jit/arm/{beam_asm.hpp,instr_call.cpp,
instr_fun.cpp})
    emit_check_app_visibility calls erts_check_app_visibility_jit with the
    caller module atom baked in as an immediate. emit_app_visibility_gate
    wraps the check with the app_restricted fast-path skip and writes
    c_p->caller_saved_i = adr(gate_pc) before the cbz so
    call_error_handler has a PC fallback. i_call_ext, i_call_ext_only,
    both bypass branches of move_call_ext_last, and the apply paths
    (variable_apply / fixed_apply) route their dispatch through the gate.
    emit_call_fun tests FUN_HEADER_KIND_OFFS and gates external funs
    (fun M:F/A); local funs and closures skip the gate (their bodies are
    JIT'd in the originating module and the next call_ext inside the body
    is already gated against that module's app).

x86 JIT (erts/emulator/beam/jit/x86/{beam_asm.hpp,instr_call.cpp,
instr_fun.cpp})
    emit_check_app_visibility / emit_save_app_visibility_caller_pc /
    emit_check_external_fun_app_visibility provide the same surface as the
    ARM helpers and are wired into the equivalent x86 emitters.

Compiler tooling (lib/stdlib/src/erl_lint.erl,
lib/compiler/src/v3_core.erl)
    Recognises -app/1 and -app_export/1 attributes and warns when
    -app_export/1 lists a function not exported or when -app_export/1 is
    present without -app/1.

module_info (erts/emulator/beam/erl_bif_info.c)
    Reports app and app_export in module_info(attributes).

Verified scenarios
==================

All paths tested with both -emu_flavor jit and -emu_flavor emu where
applicable (an external escript test_comprehensive plus an internal
validate.erl plus an internal dist_test):

- Direct calls, same-app and cross-app
- Tail calls, same-app and cross-app
- apply/3 with literal args
- Dynamic apply/3 (M, F, A from runtime values)
- Cross-app calls through a stub Export (target not yet loaded), via both
  direct call and apply/3
- Same-app calls through a stub Export
- Caller has no -app at all (shell / escript / erl_eval)
- Cache-pollution sequences (a same-app gated call does not allow a later
  cross-app call to slip through)
- External funs (fun M:F/A) same-app and cross-app
- Closures: same-app closure body that invokes a restricted function is
  allowed; cross-app closure body is blocked at its own call_ext.
- rpc:call between two nodes, with no-app caller and with bar_c-style
  cross-app caller resident on the callee; enforcement happens on the
  callee node using its metadata as the EEP specifies.

Co-authored-by: Erik Stenman <erik@stenmans.org>
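Since the commit message says app and app_export are reported in module_info(attributes), a tool could read the metadata roughly as sketched below. This assumes the values keep the shape that ordinary user-defined attributes have today, that is {app, [App]} and {app_export, [{Name, Arity}]}; the patch does not spell out the exact format. The module name vis_info and its functions are hypothetical.

```erlang
%% Sketch only: assumes the usual wild-attribute shape in module_info/1.
-module(vis_info).
-app(demo_app).                  %% proposed attribute, used here as sample data
-export([domain/1, restricted/1, internal/0]).
-app_export([internal/0]).       %% proposed attribute, used here as sample data

%% Returns {ok, App} if Mod declares a visibility domain, otherwise none.
domain(Mod) ->
    case proplists:get_value(app, Mod:module_info(attributes)) of
        [App] when is_atom(App) -> {ok, App};
        _ -> none
    end.

%% Returns the {Name, Arity} pairs that Mod lists as app-restricted.
restricted(Mod) ->
    Attrs = Mod:module_info(attributes),
    lists:append(proplists:get_all_values(app_export, Attrs)).

%% A sample app-restricted function.
internal() ->
    ok.
```

For example, vis_info:domain(vis_info) gives {ok, demo_app}, and a module with no -app/1 attribute, such as lists, gives none.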
The feature adds package-private-style visibility without adding new Erlang
syntax and without changing the behavior of existing modules. Code that does
not use the new attributes behaves as it does today. The proposal is intended
to make internal APIs explicit, prevent accidental coupling between OTP
applications or libraries, and give tools a shared source of visibility
metadata.
There was a problem hiding this comment.
If this is the goal, then I don't see why the runtime should enforce it.
It should be sufficient to give this information to the static checking tools, e.g. xref.
Runtime checks are very problematic for various reasons:
- It complicates live debugging and troubleshooting. Sometimes we have to call internal functions from a remote shell, either to query the state of a live system or to recover it.
- The concept of an application leaks from the standard library into the VM. I could be wrong, but I don't think the VM currently has any direct knowledge of applications; rather, the application controller uses VM primitives like the group leader to implement the concept.
- Other BEAM languages may have different views on what is considered an application, and how they are organized.
- Additional costs for every remote call. The document already mentions it, but given the problems above, I don't think it's worth it.
There was a problem hiding this comment.
Static tooling would help, and the EEP explicitly expects xref/Dialyzer/language servers to use the metadata. The reason I proposed runtime enforcement is that static checks do not cover all call paths reliably: apply/3, remote funs, dynamically loaded modules, shell-created calls, generated code, mixed compile pipelines, and modules compiled without the checker. Without runtime behavior, the attribute becomes documentation plus optional tooling, which is close to what EEP 67 explored.
On the specific concerns:
Debugging/recovery: I agree this is a real cost. The current design would make intentionally calling an app-restricted function from the shell fail unless the shell/eval context has matching app metadata. One possible mitigation is an explicit unsafe escape hatch for privileged debugging, but I would want that to be clearly outside normal call semantics rather than silently weakening the boundary.
Application concept in the VM: The intent is not for the VM to understand OTP applications or .app files. -app/1 is just an atom carried as module metadata and copied into export metadata. It defines a visibility domain, not an OTP application. That said, the name may be misleading; this is one reason I listed naming as an open question. A name like -visibility_domain/1 or similar might avoid implying that the VM is learning OTP application structure.
Other BEAM languages: This is also why I prefer explicit BEAM/module metadata over deriving anything from OTP application layout, code paths, or build tools. Other languages could either emit the metadata or ignore it. Modules that do not emit it keep current behavior.
Remote-call cost: The prototype numbers suggest public calls are within noise, while allowed restricted calls pay a few ns/op in the x86 JIT microbench. But the larger question is whether runtime enforcement is valuable enough to justify any cost and complexity. That is exactly what I am hoping to validate with the EEP discussion.
So I think the disagreement is not about whether static tooling is useful. It is useful. The question is whether the feature should define enforceable semantics, or remain advisory metadata. My argument for enforcement is that “internal API boundary” is much more predictable when every dynamic call path observes the same rule.
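As an illustration of the dynamic call paths mentioned above, the sketch below shows two cases where the target is only known at runtime, so a static checker looking at this module cannot tell which function is reached. The module name dyn_call and its function names are hypothetical; the calls themselves (apply/3, fun M:F/A, list_to_existing_atom/1) are standard Erlang.

```erlang
%% Sketch only: shows call targets that exist only as runtime values.
-module(dyn_call).
-export([call_by_name/3, remote_fun/3]).

%% The module and function arrive as strings, e.g. from a config file
%% or a network request, and are resolved to atoms at runtime.
call_by_name(ModStr, FunStr, Args) ->
    M = list_to_existing_atom(ModStr),
    F = list_to_existing_atom(FunStr),
    apply(M, F, Args).

%% Builds an external fun from runtime values; it can be passed around
%% and called later from any module.
remote_fun(M, F, A) ->
    fun M:F/A.
```

Under the proposal, both paths would be subject to the same visibility check as a direct remote call, whereas a static-only checker would have to treat them as unknown.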
There was a problem hiding this comment.
static checks do not cover all call paths reliably: apply/3, remote funs, dynamically loaded modules, shell-created calls, generated code
As a library developer, I am not concerned about someone going out of their way to shoot themselves in the foot by calling an internal API. If I were, for some reason, absolutely determined not to allow anyone to call a common function, then I'd put it in an hrl or something.
Remote-call cost: The prototype numbers suggest public calls are within noise, while allowed restricted calls pay a few ns/op in the x86 JIT microbench.
Based on common sense, I presume that the majority of remote calls happen between modules of the same application (scope, namespace, ...), with cross-scope calls being relatively rare. The feature penalizes the former. So if the feature becomes widely used, then we can assume that most calls will be penalized.
|
I think the attribute names are misleading. Also, as I commented, there should be a distinction between when you "use" some app (namespace) and when you define it, e.g.
Performance: Those benchmark results seem really promising. I was thinking about an implementation in the VM, but then concluded that it would probably be too slow. However, I think there should be a flag for the BEAM to completely disable those checks at runtime (i.e. I'd never enforce those; ideally it should be opt-in). I'd also like it if you could turn it on/off at runtime without having to restart the VM.
|
I second the point that if the attribute is or suggests an application, then in this context it'd be assumed to be an OTP application. Calling it a namespace, scope, or boundary (there's prior art in the BEAM ecosystem...), could help clarify it.
Strong 👍 I think it's one of BEAM's great strengths that it's just easy to "look under the hood" and do troubleshooting like this. The java-esque private/package/public model is a bit in conflict with that approach.
Since the feature is supposed to be opt-in, we can just choose not to use it in a codebase. The problem is dependencies. Advisory metadata is a strong enough signal for those willing to leverage it. Enforcement on a dependency that's chosen to use the feature, without an escape hatch, could make troubleshooting and fixing dependencies much less pleasant 😬
|
How will the I additionally voice all concerns brought by @ieQu1. In my opinion this feature fully belongs to a static analyzer. Even if you eventually want to make it part of the VM, I'd suggest to first implement it as a static analysis tool, otherwise it will be a pain to find out only at runtime that I have introduced an exception due to visibility changes. For example, imagine |
|
Thanks everyone. I think the discussion has surfaced several separate design questions, and I should treat them separately instead of folding them into one large argument.
First, I agree that -app/1 is probably the wrong name. The proposal is about a visibility domain, not necessarily an OTP application. I will revise the draft toward terminology such as scope or visibility_domain, unless there is a better established BEAM term.
Second, the Cowlib/Cowboy/Gun example shows a real limitation of the current one-domain model. Separating “the domain this module belongs to” from “additional domains this module may use” would express more cases. I still think the single-domain model is the smallest useful semantics, but I will document the two-domain/use-domain model as an alternative or possible extension so the trade-off is explicit.
Third, functions required as callbacks by behaviours outside the visibility domain should use ordinary -export/1. Scoped exports are intended for internal cross-module APIs, not for callbacks invoked by external framework code such as gen_server, supervisor, application, etc. A separate callback-specific export form might be useful in the future, but I do not think this EEP should try to solve that.
Fourth, I still think static-only metadata is a different feature from this proposal. Static tooling is valuable and should use the metadata, but it does not define behaviour for apply/3, remote funs, dynamically loaded modules, generated code, or modules compiled without the checker. The core question is whether Erlang should have enforceable internal API semantics at all. If the answer is yes, the VM needs to participate; if the answer is no, then the proposal becomes advisory metadata and is much closer to EEP 67.
Static analysis could also be used to report what would happen if an existing export were given a narrower scope. That can help maintainers, but it cannot fully answer how library users call the code in the wild. A user calling internal functions would discover quickly that upgrading to a version with stricter visibility breaks their code, which is similar to relying on any undocumented internal API.
The debugging concern is valid. I will think through an explicit unsafe/debug escape hatch rather than silently weakening normal call semantics. In practice, when I need to debug a running system, I often recompile a module with export_all, so this may be solvable as an explicit debug mode rather than part of the normal visibility rule.
|
Small clarification on the debugging point: I think Erlang debugging already has a precedent for explicitly loading code with different visibility during investigation, for example: That is a deliberate local debugging action, with the usual caveats around code loading and source access in a live system. For example, loading a module again while a process is still running old code can purge the old code and kill that process. In my experience, if an operator is allowed to run a shell on a live system, they are usually also allowed access to the source code or to a way of producing an equivalent debug build.
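One way the export_all debugging practice mentioned in this thread can look in code is sketched below. It is not the snippet from the comment above and not part of the proposal; the module name debug_reload is hypothetical, and whether export_all would also lift an -app_export/1 restriction is not something the draft specifies. The compile and code calls are the standard ones.

```erlang
%% Sketch only: recompile a module from source with every function
%% exported and load the result into the running node.
-module(debug_reload).
-export([reload_export_all/1]).

reload_export_all(SrcFile) ->
    Opts = [export_all, nowarn_export_all, debug_info, binary, return_errors],
    case compile:file(SrcFile, Opts) of
        {ok, Mod, Bin} ->
            %% The running version becomes "old" code. If an old version
            %% is still present, this returns {error, not_purged}.
            code:load_binary(Mod, SrcFile, Bin);
        {error, Errors, _Warnings} ->
            {error, Errors}
    end.
```

Calling debug_reload:reload_export_all("my_mod.erl") returns {module, my_mod} on success, after which functions that were previously local to my_mod can be called from the shell. The usual code-loading caveats described above still apply.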
|
For general information, OTP is reading the comments and checking the community opinion on this matter. |
IMO the shell should have a mode (possibly the default in interactive) that allows calling any functions regardless of scope. It would be far too inconvenient to use the shell for development otherwise. |
|
Some late comments on debugging:
|
This PR proposes application-scoped function visibility for Erlang by adding -app/1 and -app_export/1 module attributes with BEAM runtime enforcement.
The goal is to provide a package-private-style boundary for functions that are exported for use inside one application but are not public API for external callers.
The draft includes semantics for remote calls, apply/3, remote funs, distributed calls, hot code loading, backwards compatibility, security considerations, prior art from EEP 5/67, and implementation notes.
A prototype has been used to validate the basic design, but a clean public implementation fork will be linked later.