This Simulator module aims to automate experimentation by sampling EEG data from prior sessions and running given models in a task loop, thus simulating a live session.
main.py is the entry point for the program. After following the BciPy README setup steps, run the module from a terminal:
```
(venv) $ bcipy-sim -h
usage: bcipy-sim [-h] [-i] [--gui] [-d DATA_FOLDER] [-m MODEL_PATH] [-p PARAMETERS] [-n N] [-s SAMPLER] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -i, --interactive     Use interactive command line for selecting simulator inputs
  --gui                 Use interactive GUI for selecting simulator inputs
  -d DATA_FOLDER, --data_folder DATA_FOLDER
                        Raw data folders to be processed. Multiple values can be provided, or a single parent folder.
  -m MODEL_PATH, --model_path MODEL_PATH
                        Signal models to be used. Multiple models can be provided.
  -p PARAMETERS, --parameters PARAMETERS
                        Parameter File to be used
  -n N                  Number of times to run the simulation
  -s SAMPLER, --sampler SAMPLER
                        Sampling strategy
  --sampler_args SAMPLER_ARGS
                        Sampler args structured as a JSON string.
  -o OUTPUT, --output OUTPUT
                        Sim output path
  -v, --verbose         Verbose mode for more detailed logging.
```

For example:

```
$ bcipy-sim -d my_data_folder/ -p my_parameters.json -m my_models/ -n 5
```
- `-i`, `--interactive`: Interactive command line interface. Provide this flag by itself to be prompted for each parameter.
- `--gui`: A graphical user interface for configuring a simulation. This mode will output the command line arguments that can be used to repeat the simulation.
- `-d`, `--data_folder`: Raw data folders to be processed. Data folders should contain EEG responses to Copy Phrase tasks. Each session data folder should contain raw_data.csv, triggers.txt, and parameters.json. These files will be used to construct a data pool from which the simulator will sample EEG data. The parameters file in each data folder will be used to check compatibility with the simulation/model parameters.
- `-p`, `--parameters`: Path to the parameters.json file used to run the simulation. These parameters will be applied to all raw_data files when loading. This file can specify various aspects of the simulation, including the language model to be used, the text to be spelled, etc. Timing-related parameters should generally match the parameters file used for training the signal model(s).
- `-m`, `--model_path`: Path to a pickled (.pkl) signal model. One or more models can be provided.
- `-n`: Number of simulation runs.
- `-o`, `--output`: Output directory for all simulation artifacts.
- `-s`, `--sampler`: Sampling strategy to use; by default the TargetNonTargetSampler is used. The value provided should be the class name of a Sampler.
- `--sampler_args`: Arguments to pass to the selected Sampler. Some samplers can be customized with further parameters. These should be structured as a JSON dictionary mapping keys to values. For example: `--sampler_args='{"inquiry_end": 4}'`
- `-v`, `--verbose`: Execute the simulation in verbose mode for more detailed logging. Useful for debugging.
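Because the `--sampler_args` value is parsed as JSON, it is worth validating the string before launching a long simulation. A minimal standard-library sketch (the `inquiry_end` key is just the example from above; valid keys depend on the chosen Sampler, and `parse_sampler_args` is a hypothetical helper, not a BciPy function):

```python
import json


def parse_sampler_args(arg_string: str) -> dict:
    """Parse a --sampler_args value; it must be a JSON object (dict)."""
    parsed = json.loads(arg_string)
    if not isinstance(parsed, dict):
        raise ValueError("sampler_args must be a JSON dictionary")
    return parsed


# The example from above round-trips to a keyword-argument dict.
print(parse_sampler_args('{"inquiry_end": 4}'))  # {'inquiry_end': 4}
```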
Output folders are generally located in the data/simulator directory, but the location can be configured per simulation. Each simulation will create a new directory. The directory name will be prefixed with SIM and will include the current date and time (e.g., `SIM_%m-%d-%Y_%H_%M_%S`).
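The timestamped directory name can be reproduced with `datetime.strftime` using the same format string:

```python
from datetime import datetime


def sim_dir_name(now: datetime) -> str:
    """Build an output directory name like SIM_01-31-2025_13_45_02."""
    return "SIM_" + now.strftime("%m-%d-%Y_%H_%M_%S")


print(sim_dir_name(datetime(2025, 1, 31, 13, 45, 2)))  # SIM_01-31-2025_13_45_02
```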
At the top level of the output directory, the following files are created:
- `parameters.json` - captures the parameters used for the simulation.
- `sim.log` - log file for the overall simulation; metrics will be output here.
- `summary_data.json` - summarizes session data from each of the runs into a single data structure.
- `metrics.png` - boxplots for several metrics summarizing all simulation runs.
A directory is created for each simulation run. The directory contents are similar to the session output in a normal bcipy task. Each run directory contains:
- `run_{n}.log` - log file specific to the run, where n is the run number.
- `session.json` - session data output for the task, including evidence generated for each inquiry and overall metrics.
- `session.xlsx` - session data summarized in an Excel spreadsheet with charts for easier visualization.
- Task - a simulation task to be run (ex. RSVP Copy Phrase)
- TaskRunner - runs one or more iterations of a simulation
- TaskFactory - constructs the hierarchy of objects needed for the simulation.
- DataEngine - loads data to be used in a simulation and provides an API to query for data.
- DataProcessor - used by the DataEngine to pre-process data. Pre-processed data can be classified by a signal model.
- Sampler - strategy for sampling data from the data pool stored in the DataEngine.
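The way these components fit together can be sketched roughly as follows. These are simplified stand-ins to show the object hierarchy, not the actual BciPy classes or their signatures:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataEngine:
    """Loads data for a simulation and answers queries; the 'pool' here is just a list."""
    pool: List[float] = field(default_factory=list)

    def query(self, n: int) -> List[float]:
        return self.pool[:n]


@dataclass
class Sampler:
    """Strategy for sampling data from the engine's data pool."""
    engine: DataEngine

    def sample(self, n: int) -> List[float]:
        return self.engine.query(n)


@dataclass
class Task:
    """A simulation task (e.g. RSVP Copy Phrase) consuming sampled data."""
    sampler: Sampler

    def run(self) -> int:
        return len(self.sampler.sample(3))


class TaskFactory:
    """Constructs the hierarchy of objects needed for the simulation."""
    @staticmethod
    def make_task(data: List[float]) -> Task:
        return Task(Sampler(DataEngine(pool=data)))


class TaskRunner:
    """Runs one or more iterations of a simulation."""
    def __init__(self, task: Task, n_runs: int):
        self.task, self.n_runs = task, n_runs

    def run_all(self) -> List[int]:
        return [self.task.run() for _ in range(self.n_runs)]


runner = TaskRunner(TaskFactory.make_task([0.1, 0.2, 0.3, 0.4]), n_runs=2)
print(runner.run_all())  # [3, 3]
```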
The simulator is structured to support evidence from multiple devices (multimodal). However, it currently only includes processing for EEG device data. To provide support for models trained on data from other devices (ex. Gaze), a RawDataProcessor must be added for that device. The Processor pre-processes data collected from that device and prepares it for sampling. A RawDataProcessor is matched up to a given signal model using that model's metadata (metadata.device_spec.content_type). See the data_process module for more details.
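One common way to implement this kind of matching is a registry keyed by content type; the sketch below is illustrative only (BciPy's actual lookup lives in the data_process module, and the registry and function names here are hypothetical):

```python
from typing import Callable, Dict, List

# Registry mapping a device content type to its pre-processing function.
PROCESSORS: Dict[str, Callable[[List[float]], List[float]]] = {}


def register_processor(content_type: str):
    """Decorator registering a processor for a device content type."""
    def wrap(fn):
        PROCESSORS[content_type] = fn
        return fn
    return wrap


@register_processor("EEG")
def eeg_processor(raw: List[float]) -> List[float]:
    # Placeholder pre-processing: e.g. scaling before classification.
    return [x * 2 for x in raw]


def processor_for(content_type: str):
    """Look up the processor matching a model's device_spec.content_type."""
    if content_type not in PROCESSORS:
        raise KeyError(f"No processor registered for {content_type}")
    return PROCESSORS[content_type]


print(processor_for("EEG")([1.0, 2.0]))  # [2.0, 4.0]
```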
The parameters file is used to configure various aspects of the simulation. Timing-related parameters should generally match the parameters file used for training the signal model(s). The following are some specific parameters that you may want to modify, depending on the goals of a particular simulation:
- `task_text` - the text to spell.
- `lang_model_type` - language model to use in the simulation.
- `summarize_session` - if set to true, a session.xlsx summary will be generated for each simulation run.
Parameters which define task stoppage criteria are important to ensure that the simulation runs to completion without getting stuck in an infinite loop. The values for these parameters may also affect analysis of results.
- `min_inq_len` - the minimum number of inquiries to present before making a decision in copy/spelling tasks.
- `max_inq_len` - the maximum number of inquiries to display before stopping the task.
- `max_selections` - the maximum number of selections for copy/spelling tasks. The task will end if this number is reached.
- `max_incorrect` - the maximum number of consecutive incorrect selections for copy/spelling tasks. The task will end if this number is reached.
- `max_inq_per_series` - the maximum number of inquiries to present before making a decision in copy/spelling tasks.
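The task-level criteria can be summarized as a single predicate. A minimal sketch, with parameter names mirroring the list above and example default values (the actual BciPy defaults may differ):

```python
def should_stop_task(n_inquiries: int, n_selections: int,
                     n_consecutive_incorrect: int,
                     max_inq_len: int = 50, max_selections: int = 25,
                     max_incorrect: int = 5) -> bool:
    """True when any task-level stoppage criterion is met."""
    return (n_inquiries >= max_inq_len
            or n_selections >= max_selections
            or n_consecutive_incorrect >= max_incorrect)


print(should_stop_task(10, 3, 5))  # True: hit max_incorrect
print(should_stop_task(10, 3, 0))  # False: no criterion met
```

Checking all criteria each inquiry is what keeps a simulation from looping forever when a model never accumulates enough evidence to decide.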
A simulation can be started using a graphical user interface.
```
$ bcipy-sim --gui
```
This provides a way to explore the file system when providing the parameters.json file, simulation model, and input data sources. After all required inputs have been provided, the user can initiate the simulation from the GUI. The command line used to run the simulation is output to the console prior to the run, making it easier to start subsequent simulations with the same set of arguments.
The simulator also includes a Task for replaying a recorded session using a different signal model. This is useful for testing whether changes to a model result in more easily differentiated signals.
This functionality currently has a different entry point.
```
(venv) $ bcipy-replay -h
usage: bcipy-replay [-h] [-d DATA_FOLDER] -m MODEL_PATH [-p PARAMETERS] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -d DATA_FOLDER, --data_folder DATA_FOLDER
                        Raw data folders to be processed. Multiple values can be provided, or a single parent folder.
  -m MODEL_PATH, --model_path MODEL_PATH
                        Signal model to be used.
  -p PARAMETERS, --parameters PARAMETERS
                        Parameter File to be used
  -o OUTPUT, --output OUTPUT
                        Sim output path
```

Current limitations:

- Only one sampler may be provided for all devices. Ideally, a different sampling strategy should be supported for each device.
- Only Copy Phrase is currently supported.
A group simulation demo is provided in the demo directory. This demo includes the ability to run a simulation across multiple users, phrases, and language_models. The demo is run using the following command while in a virtual environment with the bcipy package installed:
```
python bcipy/simulator/demo/demo_group_simulation.py
```

See the demo file for more details on how to configure the simulation.
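Conceptually, a group simulation iterates over the cross product of users, phrases, and language models, invoking one simulation per combination. A hypothetical sketch of that iteration (the folder and parameter-file names below are invented for illustration; the real configuration lives in demo_group_simulation.py):

```python
from itertools import product

users = ["user_01", "user_02"]          # data folders, one per user (examples)
phrases = ["HELLO_WORLD"]               # task_text values (examples)
language_models = ["UNIFORM", "KENLM"]  # lang_model_type values (examples)

# One bcipy-sim invocation per (user, phrase, language model) combination.
commands = [
    f"bcipy-sim -d {user} -p params_{lm}_{phrase}.json"
    for user, phrase, lm in product(users, phrases, language_models)
]
print(len(commands))  # 4 combinations
```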
The switch_data_processor and switch_model are used to demonstrate a multimodal simulation using a button/switch as an example. To run a simulation with these inputs, you will need to perform the following steps:
- Ensure that the devices.json file has an entry for a switch:

```json
{
    "name": "Switch",
    "content_type": "MARKERS",
    "channels": [
        { "name": "Marker", "label": "Marker" }
    ],
    "sample_rate": 0.0,
    "description": "Switch used for button press inputs",
    "excluded_from_analysis": [],
    "status": "active",
    "static_offset": 0.0
}
```

- Ensure that the switch signal model can be loaded, or create a switch signal model. To create a new one:
```python
from pathlib import Path

from bcipy.acquisition.devices import preconfigured_device
from bcipy.io.save import save_model
from bcipy.signal.model.base_model import SignalModelMetadata
from bcipy.signal.model.switch_model import SwitchModel

dirname = ""  # TODO: enter the directory
model = SwitchModel()
# name should match the devices.json spec. Alternatively, use
# bcipy.acquisition.datastream.mock.switch.switch_device()
device = preconfigured_device("Switch")
model.metadata = SignalModelMetadata(device_spec=device,
                                     evidence_type="BTN",
                                     transform=None)
save_model(model, Path(dirname, "switch_model.pkl"))
```

- Set the appropriate simulation parameters in the parameters.json file:
    - Set the `acq_mode` parameter to 'EEG+MARKERS'.
    - Ensure that the `preview_inquiry_progress_method` parameter is set to '1' or '2'.
    - You may also want to set the `summarize_session` parameter to `true` to see how the evidences get combined during decision-making.
- Ensure that the data directories have a raw data file (csv) for markers in addition to the EEG data. If your data does not have marker data, you can extract it from the triggers.txt file using the script `bcipy.simulator.util.generate_marker_data`. If the task was run with Inquiry Preview, the script can use the button press events recorded in the trigger file. Otherwise, you can use the `--mock` flag along with a parameters file to mock what a raw data file would look like if the user pressed the button according to the configured button press mode.

```
$ python -m bcipy.simulator.util.generate_marker_data -h
usage: generate_marker_data.py [-h] [-m] [-p PARAMETERS] data_folder

Create raw marker data for a given session.

positional arguments:
  data_folder           Data directory (must contain triggers.txt file)

optional arguments:
  -h, --help            show this help message and exit
  -m, --mock            Mock data; use when button presses are not recorded in trigger file.
  -p PARAMETERS, --parameters PARAMETERS
                        Optional parameters file to use when mocking data.
```
- Run a simulation:
    - Set the simulation parameters for both the EEG and the Button models (.pkl files).
    - Use the InquirySampler.
Run with verbose mode and inspect the detailed run logs to ensure that the evidence is being sampled correctly.
Along with the 'eeg' evidence, the output session.json (and session.xlsx) should record 'btn' evidence for each inquiry. These evidences should be fused to provide the 'likelihood' values.
For inquiries in which the target is shown:
- evidence values for symbols in the inquiry should be boosted relative to non-inquiry symbols (default values are 0.95 for boosted and 0.05 for degraded).
For inquiries in which the target is not shown:
- evidence values for symbols in the inquiry should be degraded.
Note that it does not matter whether the progress method (`preview_inquiry_progress_method` parameter) is set to "press to accept" or "press to skip": the SwitchDataProcessor interprets this and outputs a 1.0 for inquiries that should be supported and a 0.0 for those that shouldn't.
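That interpretation can be sketched as a small function. This is illustrative, not the actual SwitchDataProcessor code, and it assumes '1' means press-to-accept and '2' means press-to-skip:

```python
PRESS_TO_ACCEPT = 1
PRESS_TO_SKIP = 2


def switch_evidence(progress_method: int, button_pressed: bool) -> float:
    """Return 1.0 when the inquiry should be supported, else 0.0."""
    if progress_method == PRESS_TO_ACCEPT:
        return 1.0 if button_pressed else 0.0
    if progress_method == PRESS_TO_SKIP:
        return 0.0 if button_pressed else 1.0
    raise ValueError("progress method 0 is not supported")


print(switch_evidence(PRESS_TO_ACCEPT, True))  # 1.0
print(switch_evidence(PRESS_TO_SKIP, True))    # 0.0
```

Under either progress method, a "support this inquiry" outcome maps to the same 1.0 output, which is why the choice between the two methods does not change the simulation results.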
- A `preview_inquiry_progress_method` of 0 is currently not supported, and an exception will be thrown. Ideally, all inquiries should get an evidence value of 1.0 (no change) with this mode.
- Button evidence only works correctly with the InquirySampler, since all trials in the same inquiry receive the same value.