How to set up your local machine.
- Python > 3.11
- Node.js
- Yarn
- Create a Virtual Environment
  python -m venv venv
  .\venv\Scripts\activate
- Install Dependencies
  pip install -r requirements.txt
- Configure environment variables (optional)
  - Copy api-keys.env.example to api-keys.env and add your API keys.
    - Required fields differ between providers; refer to the LiteLLM setup guide for details.
    - Currently only endpoint, model, api_key, api_base, and api_version are supported.
    - This lets Data Formulator load the API keys automatically when you run the app, so you don't need to enter them in the app UI.
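To catch a misnamed field before launching the app, a provider entry can be checked against the supported field names listed above. This is a minimal sketch: the function name `unsupported_fields`, the dictionary representation, and the example values are illustrative assumptions, not part of Data Formulator.

```python
# Field names come from the list above; everything else is hypothetical.
SUPPORTED_FIELDS = {"endpoint", "model", "api_key", "api_base", "api_version"}


def unsupported_fields(config: dict) -> list:
    """Return the keys of a provider config that are not supported fields."""
    return sorted(set(config) - SUPPORTED_FIELDS)


# Hypothetical provider entry with one field that is not supported.
example = {
    "endpoint": "openai",
    "model": "some-model",
    "api_key": "<your key>",
    "organization": "my-org",
}
print(unsupported_fields(example))
```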
  - Set .env to configure server properties:
    - Copy .env.template to .env
    - Configure settings as needed:
      - DISABLE_DISPLAY_KEYS: if true, API keys are not shown in the frontend
      - EXEC_PYTHON_IN_SUBPROCESS: if true, Python code runs in a subprocess (safer but slower); consider enabling it when you host Data Formulator for others
      - LOCAL_DB_DIR: directory to store the local database (uses a temp directory if not set)
      - External database settings (when USE_EXTERNAL_DB=true):
        - DB_NAME: name to refer to this database connection
        - DB_TYPE: mysql or postgresql (currently only these two are supported)
        - DB_HOST: database host address
        - DB_PORT: database port
        - DB_DATABASE: database name
        - DB_USER: database username
        - DB_PASSWORD: database password
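The settings above can be read from the environment along the following lines. This is a sketch for illustration only: the `ServerSettings` class, the `load_settings` function, and the rule that only the string "true" (case-insensitive) counts as enabled are assumptions, and Data Formulator's own loading code may differ.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Variable names are taken from the settings list above.
EXTERNAL_DB_KEYS = ("DB_NAME", "DB_TYPE", "DB_HOST", "DB_PORT",
                    "DB_DATABASE", "DB_USER", "DB_PASSWORD")


def _is_true(value: Optional[str]) -> bool:
    # Assumption: only "true" (any case) enables a flag.
    return (value or "").strip().lower() == "true"


@dataclass
class ServerSettings:  # hypothetical container, not a Data Formulator class
    disable_display_keys: bool
    exec_python_in_subprocess: bool
    local_db_dir: Optional[str]
    use_external_db: bool
    external_db: dict = field(default_factory=dict)


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Build a ServerSettings object from environment-style key/value pairs."""
    use_external = _is_true(env.get("USE_EXTERNAL_DB"))
    external = {}
    if use_external:
        external = {key: env.get(key) for key in EXTERNAL_DB_KEYS}
        if external["DB_TYPE"] not in ("mysql", "postgresql"):
            raise ValueError("DB_TYPE must be mysql or postgresql")
    return ServerSettings(
        disable_display_keys=_is_true(env.get("DISABLE_DISPLAY_KEYS")),
        exec_python_in_subprocess=_is_true(env.get("EXEC_PYTHON_IN_SUBPROCESS")),
        local_db_dir=env.get("LOCAL_DB_DIR") or None,  # None means temp directory
        use_external_db=use_external,
        external_db=external,
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out a configuration without touching the real environment.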
- Run the app
  - Windows
    .\local_server.bat
  - Unix-based
    ./local_server.sh
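If you launch the server from a helper script, the choice between the two launchers can be made from the detected platform. This is a small sketch; the function name `server_script` is a hypothetical helper, and only the two script names come from this guide.

```python
import platform


def server_script(system: str = "") -> str:
    """Return the launcher script for the given (or detected) platform."""
    system = system or platform.system()
    # platform.system() reports "Windows" on Windows; everything else is
    # treated as Unix-based here.
    return r".\local_server.bat" if system == "Windows" else "./local_server.sh"


print(server_script())
```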
- Install NPM packages
  yarn
- Development mode
  Run the front-end in development mode, which allows real-time edits and previews:
  yarn start
  Open http://localhost:5173 to view it in the browser. The page reloads when you make edits, and any lint errors appear in the console.
- Build the frontend and then the backend
  Compile the TypeScript files and bundle the project:
  yarn build
  This builds the app for production to the py-src/data_formulator/dist folder.
  Then, build the Python package:
  pip install build
  python -m build
  This creates a Python wheel in the dist/ folder, named data_formulator-<version>-py3-none-any.whl.
- Test the artifact
  You can then install the built wheel (testing in a virtual environment is recommended):
  # replace <version> with the actual build version.
  pip install dist/data_formulator-<version>-py3-none-any.whl
  Once installed, you can run Data Formulator with:
  data_formulator
  or
  python -m data_formulator
  Open http://localhost:5000 to view it in the browser.
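Rather than typing the version by hand, the wheel can be located by its file name pattern. The sketch below assumes it is run from the repository root so that dist/ is the build output folder; `find_wheels` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def find_wheels(dist_dir: str = "dist") -> list:
    """Return matching wheels in dist_dir, most recently modified first."""
    root = Path(dist_dir)
    if not root.is_dir():
        return []
    wheels = root.glob("data_formulator-*-py3-none-any.whl")
    return sorted(wheels, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    found = find_wheels()
    if found:
        # Print the command instead of running it, so it can be reviewed first.
        print(f"pip install {found[0]}")
    else:
        print("No wheel found in dist/; run `python -m build` first.")
```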
When deploying Data Formulator to production, please be aware of the following security considerations:
- Local DuckDB Files: When database functionality is enabled (default), Data Formulator stores DuckDB database files locally on the server. These files contain user data and are stored in the system's temporary directory or a configured LOCAL_DB_DIR.
- Session Management:
  - When database is enabled: Session IDs are stored in Flask sessions (cookies) and linked to local DuckDB files
  - When database is disabled: No persistent storage is used, and no cookies are set. Session IDs are generated per request for API consistency
- Data Persistence: User data processed through Data Formulator may be temporarily stored in these local DuckDB files, which could be a security risk in multi-tenant environments.
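One way to limit how long user data lingers on a shared server is to periodically review database files that have not been touched for a while. The sketch below only lists candidates and deletes nothing. The `*.duckdb` file pattern and the `stale_files` helper are assumptions for illustration; this guide does not state how the files are named, so check the actual contents of your LOCAL_DB_DIR first.

```python
import time
from pathlib import Path
from typing import Optional


def stale_files(db_dir: str, max_age_hours: float,
                pattern: str = "*.duckdb",
                now: Optional[float] = None) -> list:
    """List files in db_dir matching pattern, unmodified for max_age_hours."""
    root = Path(db_dir)
    if not root.is_dir():
        return []
    current = time.time() if now is None else now
    cutoff = current - max_age_hours * 3600
    return sorted(
        p for p in root.glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

After reviewing the returned paths, each one could be removed with `Path.unlink()`, for example from a scheduled job.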
For production deployment, consider:
- Use the --disable-database flag for stateless deployments where no data persistence is needed
- Implement proper authentication, authorization, and other security measures as needed for your specific use case, for example:
  - Store DuckDB file in a database
  - User authentication (OAuth, JWT tokens, etc.)
  - Role-based access control
  - API rate limiting
  - HTTPS/TLS encryption
  - Input validation and sanitization
# For stateless deployment (recommended for public hosting)
python -m data_formulator.app --disable-database
See the Usage section on the README.md page.