Skip to content

Comments

Cleanup FIML, enhance em_mvn()#303

Merged
Maximilian-Stefan-Ernst merged 12 commits intoStructuralEquationModels:develfrom
alyst:alyst/enhance_em_cor
Feb 20, 2026
Merged

Cleanup FIML, enhance em_mvn()#303
Maximilian-Stefan-Ernst merged 12 commits intoStructuralEquationModels:develfrom
alyst:alyst/enhance_em_cor

Conversation

@alyst
Copy link
Contributor

@alyst alyst commented Feb 9, 2026

This is hopefully one of the 2 big remaining refactorings from #193:

  • FIML: introduce SemFIMLPattern -- it handles FIML calculation for a specific missing pattern.
    Previously, we introduced SemObservedMissingPattern to collect observed samples with a specific missing pattern; this PR applies a similar change for FIML.
  • em_mvn(): decouple from SemObservedMissing. Previously one could only use em_mvn() from SemObservedMissing after it's already created.
    Now it could be used outside, and there is no EmMVNModel struct.
    obs_cov and obs_mean for SemObservedMissing are still lazily calculated, but now it follows the common SemObserved API, so there is less
    need for overloading methods to handle SemObservedMissing
  • em_mvn() was optimized:
    • do more inplace linear algebra to reduce memory footprint
    • use Cholesky factorizations of the matrices (obs cov within a missing pattern) instead of matrices themselves.
    • more subtle numerical stability optimizations to work with z (observed data of the current pattern x minus the current mean estimate) rather than with x directly.
  • em_mvn() some options to help with large matrices:
    • max_nsamples_em limits how many samples from each missing pattern is used -- to handle data with thousands of samples
    • min_eigval to enforce pos-def of the resulting matrix
  • add progress bar support to em_mvn() for monitoring the convergence of large matrices (thousands of samples, >1000 observed vars)

@alyst
Copy link
Contributor Author

alyst commented Feb 9, 2026

cc @Maximilian-Stefan-Ernst

@codecov
Copy link

codecov bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 85.02415% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.45%. Comparing base (1f8d2a9) to head (4989950).
⚠️ Report is 28 commits behind head on devel.

Files with missing lines Patch % Lines
src/additional_functions/helper.jl 0.00% 16 Missing ⚠️
src/observed/EM.jl 90.82% 10 Missing ⚠️
src/loss/ML/FIML.jl 94.73% 3 Missing ⚠️
src/observed/missing.jl 88.88% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #303      +/-   ##
==========================================
- Coverage   71.83%   71.45%   -0.39%     
==========================================
  Files          51       51              
  Lines        2223     2288      +65     
==========================================
+ Hits         1597     1635      +38     
- Misses        626      653      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Maximilian-Stefan-Ernst
Copy link
Collaborator

Looks nice, also the imporvements to the EM algorithm. I made some small comments, apart from that, I'm happy to merge.

@alyst alyst force-pushed the alyst/enhance_em_cor branch 4 times, most recently from 31593be to a4f5d7a Compare February 13, 2026 23:58
@alyst
Copy link
Contributor Author

alyst commented Feb 14, 2026

@Maximilian-Stefan-Ernst thank you for the review! I've addressed your comments in the recent commits (If you approve the PR, I will cleanup the history before the merge).
I've also rebased the branch to the latest devel branch. The CI test is passing, the format checks fail -- but in the fiels that were not touched by this PR.

@alyst alyst force-pushed the alyst/enhance_em_cor branch 2 times, most recently from 7386699 to 17ce0bb Compare February 16, 2026 00:01
@Maximilian-Stefan-Ernst
Copy link
Collaborator

Thank you, I would be happy to merge now - do you want to clean anything up still, or should I go ahead?

alyst and others added 12 commits February 18, 2026 09:06
* refactor FIML code to consolidate pattern methods under SemFIMLPattern
* rename semfiml/lossfun to loss for consistency
so EM MVN could be done when SemObsMissing is constructed
- report rel_error if not converged
- add max_nsamples_em opttion to limit number of samples used
it's ok to return vector
@alyst alyst force-pushed the alyst/enhance_em_cor branch from 17ce0bb to 4989950 Compare February 18, 2026 17:10
@alyst
Copy link
Contributor Author

alyst commented Feb 18, 2026

@Maximilian-Stefan-Ernst I've just rebased it to the devel branch and cleaned up the commit history. It should be ready to merge. Thank you!

@Maximilian-Stefan-Ernst Maximilian-Stefan-Ernst merged commit 9b05e61 into StructuralEquationModels:devel Feb 20, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants