This app allows users to easily scrape the contents from specific sections of SEC 10-K filings and apply named entity recognition (NER) to the results. The results are output to .xlsx. Written in Python and NodeJS/Electron, it builds to an installer that allows users to use the app on all major operating systems without any dependencies.
To install all dependencies for the entire project, run npm install in the root directory.
To build the entire project, run npm run build:full either in the root directory or the /ui/ subdirectory.
This will:
- build the Python backend
- move the resulting executable into /ui/extraResources/ (see the section "Frontend")
- build the Electron frontend
The resulting product can be found in /ui/dist/. This only builds the app for the OS (Windows / Mac / Linux) that your computer uses! To build for all three OSes, you will need to set up this repo on a Windows machine, a Mac machine, and a Linux machine.
This repository is divided into two main directories: backend and ui.
This is the poetry project containing the Python backend to the app.
backend_server.py is the entry point. In it, we configure the zerorpc server we use to communicate with the frontend.
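As a rough illustration of that setup, the sketch below shows how a zerorpc server is typically wired up. The class name BackendApi, its echo method, and the tcp://127.0.0.1:4242 address are placeholders chosen for this example, not the names or port used in backend_server.py.

```python
class BackendApi:
    """Hypothetical object whose public methods become remote procedures."""

    def echo(self, message: str) -> str:
        # Trivial method, handy for checking that the frontend can reach the backend.
        return message


def serve(address: str = "tcp://127.0.0.1:4242") -> None:
    """Bind a zerorpc server to `address` and block while serving requests."""
    # zerorpc is a third-party package; importing it here keeps BackendApi
    # importable (and unit-testable) without it.
    import zerorpc

    server = zerorpc.Server(BackendApi())
    server.bind(address)
    server.run()
```

Calling serve() starts the server and blocks until the process is stopped.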
Business logic is organized by functionality.
Contains helper code for rate-limiting requests to the SEC API and for serializing complex return types with msgpack (see developer manual).
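To give a feel for what rate-limiting looks like, here is a minimal sketch that spaces requests evenly. It is not the project's helper: the RateLimiter class and its parameters are made up for this example, and the figure of 10 requests per second is the SEC's published fair-access guideline at the time of writing, so check it before relying on it.

```python
import time
from typing import Callable


class RateLimiter:
    """Space calls at least `1 / max_per_second` seconds apart (illustrative only)."""

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the following slot relative to when this request goes out.
        self._next_allowed = max(now, self._next_allowed) + self._min_interval
        return delay
```

A caller would create one limiter, e.g. RateLimiter(10), and call wait() immediately before each request. The clock and sleep parameters exist only so the behavior can be tested without real delays.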
Contains code for writing processed data to Excel.
Contains code to communicate with the SEC API, to search for and fetch documents.
Contains code for parsing 10-K filings, extracting key sections, and applying Named Entity Recognition to the results.
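As a simplified illustration of section extraction, the sketch below pulls one "Item" section out of plain text using a regular expression. The function name extract_item and the heading pattern are assumptions made for this example; the project's actual parser may work quite differently.

```python
import re
from typing import Optional

# Matches headings such as "Item 1A." or "ITEM 7" at the start of a line.
ITEM_HEADING = re.compile(
    r"^[ \t]*item[ \t]+(\d{1,2}[a-c]?)\b", re.IGNORECASE | re.MULTILINE
)


def extract_item(text: str, item: str) -> Optional[str]:
    """Return the text from the heading for `item` up to the next item heading."""
    matches = list(ITEM_HEADING.finditer(text))
    for i, match in enumerate(matches):
        if match.group(1).lower() == item.lower():
            # The section ends where the next heading starts, or at end of text.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            return text[match.start():end].strip()
    return None
```

For example, extract_item(filing_text, "1A") would return the Risk Factors section of a plain-text filing. Real filings usually repeat these headings in a table of contents, so a production parser needs extra logic to skip those entries.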
Unimportant. A temporary directory created when downloading the NER model.
This directory contains the extracted version of the NER model downloaded during the install process. You don't need to touch this, but make sure its contents don't get committed; they vary slightly across platforms.
We have a number of tools to help us write good code. These include:
- isort, to optimize import ordering
- mypy, a static type-checker to force accurate type use
- flake8, a linter
- black, a hassle-free code formatter
- docformatter, to clean up docstrings (enforce triple-quoting, the formatting of summary vs. detailed description, etc.) in compliance with PEP 257
To run these tools, as well as unit tests, run poetry run pypyr pypyr/quality-check in the backend directory.
We use pyinstaller to build our code to a platform-specific executable, which is then bundled into the Electron app.
These tools are all called automatically:
- before each build
- before commits
Neither operation will succeed if any of the tools or unit tests fail.
In the backend directory, run poetry install to install Python dependencies, and poetry run pypyr pypyr/download-ner-model to download and bundle the backend's Named Entity Recognition model for your platform.
Both of these operations are done automatically when you run npm install in the root directory.
poetry uses a virtual environment to isolate a python version and set of dependencies from the rest of your system. To run commands, prefix them with poetry run to make sure you're using the appropriate installations of everything for this project.
For example, to manually perform type-checking, which would normally be mypy ., run
poetry run mypy .
To launch the backend, which would normally be python backend_server.py, run
poetry run python backend_server.py
To build the backend executable on its own, run
poetry run pyinstaller backend_server.spec
The resulting executable and supporting files will be in backend/dist.
You generally don't want or need to do this yourself; to actually use the backend, it needs to be bundled with the frontend.
Use npm run build:full to build the whole app, resulting in a new build of the full app in ui/dist that will use the new backend build.
This is the npm package containing the code to our frontend, an Electron app.
Our code is specifically in the ui/packages subdirectory. To build to an executable, it is bundled by vite (similar to webpack) and rebuilt and packaged by electron-builder.
See the electron-builder docs and the README within ui for more details.
Intermediate build products used by tooling. Do not touch.
For information like the app installer's icon. Not important.
Where build products go. npm run build:full will produce a platform-specific unpacked directory inside ui/dist/, e.g. ui/dist/linux-unpacked, containing the Electron executable and associated files. The final distributable will be an installer; for development and debugging, just run the executable, fractracker-sec-ui{.exe/.app}, in the unpacked folder.
The electron-builder extraResources folder. This is where we place the Python backend executable to be included in the Electron app.
This is where our code lies. It's divided into three directories:
- main, for the code of the main process
- preload, for preload scripts, most importantly exposeInRealWorld for context bridging
- renderer, for the actual renderer process
To understand the distinction between these three kinds of code, check out the documentation for Electron's process model and context isolation.
Each directory is itself composed of:
- dist, which is for the build process and shouldn't be touched
- src, where source code goes
- tests, for tests
Unit testing will be done using vitest, here.
Metadata for tooling. Don't worry about this.
- eslint for linting and light autoformatting
- tsc for static type checking and, of course, the build process
- vite for the build process
- electron-builder for the build process
- nano-staged for cross-platform precommit hooks
Before committing, nano-staged runs eslint on every applicable file in packages that is staged for commit, fixing minor errors like missing semicolons automatically. tsc checks that types are valid and compatible across the whole project. If either utility finds any issues it can't resolve automatically, they are reported in the command line and the commit is stopped.
First, build the Python executable and place it in extraResources.
Then run npm run build:full. The output will be in ui/dist/{platform}-unpacked (see above).
The UI communicates with the Python backend using zerorpc. More details to come in the future, but the gist is:
The Python backend exposes any methods (static or otherwise) of the object passed to the zerorpc.Server() constructor in backend_server.py. We call these methods in the Electron renderer process by calling window.requestRPC.procedure(funcName: string, args: any[]). Behind the scenes, zerorpc uses MessagePack (msgpack); the details of the encoding/decoding process aren't clearly documented anywhere, so we recommend sticking to passing and returning ints, strings, and JSON objects for the time being. (Further experimentation to be done.)
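One way to follow that recommendation is to flatten richer Python values into JSON-style data before returning them from an exposed method. The sketch below is only an illustration: to_plain and FilingSummary are hypothetical names, and this is not the project's own msgpack helper.

```python
import dataclasses
from datetime import date, datetime
from typing import Any, List


def to_plain(value: Any) -> Any:
    """Recursively convert `value` into dicts, lists, strings, and numbers."""
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        # Dataclass instance: convert field by field.
        return {f.name: to_plain(getattr(value, f.name)) for f in dataclasses.fields(value)}
    if isinstance(value, (datetime, date)):
        # Dates travel as ISO 8601 strings.
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    # ints, floats, strings, bools, and None pass through unchanged.
    return value


@dataclasses.dataclass
class FilingSummary:  # hypothetical return type, for illustration only
    company: str
    filed: date
    entities: List[str]
```

An exposed method could then end with return to_plain(result), so the renderer only ever receives plain objects, arrays, strings, and numbers.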