BciPy Simulator

Overview

This Simulator module aims to automate experimentation by sampling EEG data from prior sessions and running given models in a task loop, thus simulating a live session.

Run steps

main.py is the entry point for the program. After following the BciPy README setup steps, run the module from the terminal:

(venv) $ bcipy-sim -h
usage: bcipy-sim [-h] [-i] [--gui] [-d DATA_FOLDER] [-m MODEL_PATH] [-p PARAMETERS] [-n N] [-s SAMPLER] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -i, --interactive     Use interactive command line for selecting simulator inputs
  --gui                 Use interactive GUI for selecting simulator inputs
  -d DATA_FOLDER, --data_folder DATA_FOLDER
                        Raw data folders to be processed. Multiple values can be provided, or a single parent folder.
  -m MODEL_PATH, --model_path MODEL_PATH
                        Signal models to be used. Multiple models can be provided.
  -p PARAMETERS, --parameters PARAMETERS
                        Parameter File to be used
  -n N                  Number of times to run the simulation
  -s SAMPLER, --sampler SAMPLER
                        Sampling strategy
  --sampler_args SAMPLER_ARGS
                        Sampler args structured as a JSON string.
  -o OUTPUT, --output OUTPUT
                        Sim output path
  -v, --verbose         Verbose mode for more detailed logging.

For example,

$ bcipy-sim -d my_data_folder/ -p my_parameters.json -m my_models/ -n 5

Program Args

  • i : Interactive command line interface. Provide this flag by itself to be prompted for each parameter.
  • gui: A graphical user interface for configuring a simulation. This mode will output the command line arguments which can be used to repeat the simulation.
  • d : Raw data folders to be processed. Data folders should contain EEG responses to Copy Phrase tasks. Each session data folder should contain raw_data.csv, triggers.txt, and parameters.json. These files will be used to construct a data pool from which the simulator will sample EEG. The parameters file in each data folder will be used to check compatibility with the simulation/model parameters.
  • p : Path to the parameters.json file used to run the simulation. These parameters will be applied to all raw_data files when loading. This file can specify various aspects of the simulation, including the language model to be used, the text to be spelled, etc. Timing-related parameters should generally match the parameters file used for training the signal model(s).
  • m: Path to a pickled (.pkl) signal model. One or more models can be provided.
  • n: Number of simulation runs
  • o: Output directory for all simulation artifacts.
  • s: Sampling strategy to use; by default the TargetNonTargetSampler is used. The value provided should be the class name of a Sampler.
  • sampler_args: Arguments to pass in to the selected Sampler. Some samplers can be customized with further parameters. These should be structured as a JSON dictionary mapping keys to values. For example: --sampler_args='{"inquiry_end": 4}'
  • -v or --verbose: Execute the simulation in verbose mode for more detailed logging. Useful for debugging.
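
When a simulation is launched from a script, quoting the JSON passed to --sampler_args by hand is error-prone. The following is a minimal sketch that assembles the arguments listed above in Python. The helper name build_sim_command and the example paths are hypothetical and not part of BciPy; whether a given sampler accepts the inquiry_end key depends on the sampler.

```python
import json
import shlex


def build_sim_command(data_folder, model_path, parameters, runs=1,
                      sampler=None, sampler_args=None):
    """Assemble a bcipy-sim argument list (hypothetical helper, not part of BciPy)."""
    cmd = ["bcipy-sim", "-d", data_folder, "-m", model_path,
           "-p", parameters, "-n", str(runs)]
    if sampler:
        # Class name of the Sampler to use
        cmd += ["-s", sampler]
    if sampler_args:
        # json.dumps produces a valid JSON dictionary string
        cmd += ["--sampler_args", json.dumps(sampler_args)]
    return cmd


cmd = build_sim_command("my_data_folder/", "my_models/", "my_parameters.json",
                        runs=5, sampler_args={"inquiry_end": 4})

# shlex.join quotes each argument for a POSIX shell
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting altogether.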

Sim Output Details

Output folders are generally located in the data/simulator directory, but can be configured per simulation. Each simulation will create a new directory. The directory name will be prefixed with SIM and will include the current date and time (e.g., "SIM_%m-%d-%Y_%H_%M_%S").
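
The naming pattern uses standard strftime format codes. As a small sketch (the helper names are hypothetical, not part of BciPy), the same format string can be used both to predict a directory name and to recover the start time from an existing one:

```python
from datetime import datetime

# Format string quoted from the description above
SIM_DIR_FORMAT = "SIM_%m-%d-%Y_%H_%M_%S"


def sim_dir_name(when: datetime) -> str:
    """Return the directory name produced by this pattern for a given start time."""
    return when.strftime(SIM_DIR_FORMAT)


def parse_sim_dir_name(name: str) -> datetime:
    """Recover the start time from a directory name; raises ValueError on a mismatch."""
    return datetime.strptime(name, SIM_DIR_FORMAT)


name = sim_dir_name(datetime(2024, 3, 7, 14, 5, 9))
print(name)
```

Because the month comes before the year, these names do not sort chronologically as plain strings; sorting on the parsed datetime does.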

At the top level of the output directory, the following files are created:

  • parameters.json captures params used for the simulation.
  • sim.log is a log file for the overall simulation; metrics will be output here.
  • summary_data.json summarizes session data from each of the runs into a single data structure.
  • metrics.png contains boxplots for several metrics summarizing all simulation runs.

A directory is created for each simulation run. The directory contents are similar to the session output in a normal bcipy task. Each run directory contains:

  • run_{n}.log log file specific to the run, where n is the run number.
  • session.json session data output for the task, including evidence generated for each inquiry and overall metrics.
  • session.xlsx session data summarized in an excel spreadsheet with charts for easier visualization.
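
For custom analysis, the per-run session.json files can be gathered with a few lines of Python. This is a sketch that assumes each run directory is a direct child of the simulation output directory; the function name load_run_sessions is hypothetical, and no assumption is made about the keys inside session.json.

```python
import json
from pathlib import Path


def load_run_sessions(sim_dir):
    """Map each run directory name to its parsed session.json contents."""
    sessions = {}
    # Assumption: run directories sit directly under the simulation directory
    for session_file in sorted(Path(sim_dir).glob("*/session.json")):
        with open(session_file, encoding="utf-8") as f:
            sessions[session_file.parent.name] = json.load(f)
    return sessions
```

Calling load_run_sessions with the path of a SIM output directory returns one entry per run that produced a session.json; top-level files such as sim.log are ignored.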

Main Components

  • Task - a simulation task to be run (ex. RSVP Copy Phrase)
  • TaskRunner - runs one or more iterations of a simulation
  • TaskFactory - constructs the hierarchy of objects needed for the simulation.
  • DataEngine - loads data to be used in a simulation and provides an API to query for data.
  • DataProcessor - used by the DataEngine to pre-process data. Pre-processed data can be classified by a signal model.
  • Sampler - strategy for sampling data from the data pool stored in the DataEngine.

Device Support

The simulator is structured to support evidence from multiple devices (multimodal). However, it currently only includes processing for EEG device data. To provide support for models trained on data from other devices (ex. Gaze), a RawDataProcessor must be added for that device. The Processor pre-processes data collected from that device and prepares it for sampling. A RawDataProcessor is matched up to a given signal model using that model's metadata (metadata.device_spec.content_type). See the data_process module for more details.
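
The matching step can be pictured as a lookup keyed on content type. The sketch below is purely illustrative: every class in it is a hypothetical stand-in rather than a BciPy class, and only the attribute path metadata.device_spec.content_type is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class DemoDeviceSpec:
    """Stand-in for a device spec; not the BciPy class."""
    name: str
    content_type: str


@dataclass
class DemoModelMetadata:
    """Stand-in for signal model metadata; not the BciPy class."""
    device_spec: DemoDeviceSpec
    evidence_type: str


class DemoEegProcessor:
    consumes = "EEG"  # content type this hypothetical processor handles


class DemoSwitchProcessor:
    consumes = "MARKERS"


def find_processor(metadata, processors):
    """Return the first processor whose content type matches the model's device."""
    wanted = metadata.device_spec.content_type
    for processor in processors:
        if processor.consumes == wanted:
            return processor
    raise ValueError(f"No processor available for content type {wanted!r}")


switch_meta = DemoModelMetadata(DemoDeviceSpec("Switch", "MARKERS"), "BTN")
print(find_processor(switch_meta, [DemoEegProcessor, DemoSwitchProcessor]).__name__)
```

Under this reading, supporting a new device amounts to adding one more processor to the list with the matching content type.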

Parameters

The parameters file is used to configure various aspects of the simulation. Timing-related parameters should generally match the parameters file used for training the signal model(s). Following are some specific parameters that you may want to modify, depending on the goals of a particular simulation:

  • task_text - the text to spell.
  • lang_model_type - language model to use in the simulation.
  • summarize_session - if set to true, a session.xlsx summary will be generated for each simulation run.

Stoppage Criteria

Parameters which define task stoppage criteria are important to ensure that the simulation runs to completion without getting stuck in an infinite loop. The values for these parameters may also affect analysis of results.

  • min_inq_len - Specifies the minimum number of inquiries to present before making a decision in copy/spelling tasks.
  • max_inq_len - maximum number of inquiries to display before stopping the task.
  • max_selections - The maximum number of selections for copy/spelling tasks. The task will end if this number is reached.
  • max_incorrect - The maximum number of consecutive incorrect selections for copy/spelling tasks. The task will end if this number is reached.
  • max_inq_per_series - Specifies the maximum number of inquiries to present before making a decision in copy/spelling tasks.
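
One way to read these descriptions is as two separate checks: task-level limits that end the run, and series-level limits that govern when a selection is made. The sketch below encodes that reading. It is an illustration, not BciPy's implementation; the function names and the parameter values are hypothetical (the values are not BciPy defaults).

```python
# Hypothetical values for illustration; these are not BciPy defaults.
params = {
    "min_inq_len": 1,
    "max_inq_len": 50,
    "max_selections": 25,
    "max_incorrect": 5,
    "max_inq_per_series": 10,
}


def task_should_end(inquiries, selections, consecutive_incorrect, params):
    """True once any task-level limit is reached."""
    return (
        inquiries >= params["max_inq_len"]
        or selections >= params["max_selections"]
        or consecutive_incorrect >= params["max_incorrect"]
    )


def decision_state(series_inquiries, params):
    """Classify a series by how many inquiries it has presented so far."""
    if series_inquiries < params["min_inq_len"]:
        return "continue"   # too early to make a decision
    if series_inquiries >= params["max_inq_per_series"]:
        return "required"   # a decision must be made now
    return "allowed"        # a decision may be made if evidence suffices


print(task_should_end(50, 3, 0, params), decision_state(4, params))
```

Setting the task-level limits too high lengthens runs in which the model never converges on the target, while setting them too low truncates runs and can bias the summary metrics.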

GUI

A simulation can be started using a graphical user interface.

$ bcipy-sim --gui

This provides a way to explore the file system when providing the parameters.json file, simulation model, and input data sources. After all required inputs have been provided, the user can initiate the simulation from the GUI. The command-line arguments used to run the simulation are output to the console prior to the run, making it easier to start subsequent simulations with the same set of arguments.

Replay Session

The simulator also includes a Task for replaying a recorded session using a different signal model. This is used for testing if changes to a model result in more easily differentiated signals.

This functionality currently has a different entry point.

(venv) $ bcipy-replay -h
usage: bcipy-replay [-h] [-d DATA_FOLDER] -m MODEL_PATH [-p PARAMETERS] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -d DATA_FOLDER, --data_folder DATA_FOLDER
                        Raw data folders to be processed. Multiple values can be provided, or a single parent folder.
  -m MODEL_PATH, --model_path MODEL_PATH
                        Signal model to be used.
  -p PARAMETERS, --parameters PARAMETERS
                        Parameter File to be used
  -o OUTPUT, --output OUTPUT
                        Sim output path

Current Limitations

  • Only one sampler may be provided for all devices. Ideally we should support a different sampling strategy for each device.
  • Only Copy Phrase is currently supported.

Group Demo

A group simulation demo is provided in the demo directory. This demo includes the ability to run a simulation across multiple users, phrases, and language_models. The demo is run using the following command while in a virtual environment with the bcipy package installed:

python bcipy/simulator/demo/demo_group_simulation.py

See the demo file for more details on how to configure the simulation.

Multimodal

The switch_data_processor and switch_model are used to demonstrate a multimodal simulation using a button/switch as an example. To run a simulation with these inputs, you will need to perform the following steps:

  1. Ensure that the devices.json file has an entry for a switch:

    {
      "name": "Switch",
      "content_type": "MARKERS",
      "channels": [
          { "name": "Marker", "label": "Marker" }
      ],
      "sample_rate": 0.0,
      "description": "Switch used for button press inputs",
      "excluded_from_analysis": [],
      "status": "active",
      "static_offset": 0.0
    }
  2. Ensure that the switch signal model can be loaded, or create a switch signal model. To create a new one:

    from pathlib import Path
    from bcipy.acquisition.devices import preconfigured_device
    from bcipy.io.save import save_model
    from bcipy.signal.model.base_model import SignalModelMetadata
    from bcipy.signal.model.switch_model import SwitchModel

    dirname = "" # TODO: enter the directory
    model = SwitchModel()

    # The name should match the devices.json spec. Alternatively, use
    # bcipy.acquisition.datastream.mock.switch.switch_device()
    device = preconfigured_device("Switch")

    # Attach metadata so the model can be matched to the switch device
    model.metadata = SignalModelMetadata(device_spec=device, evidence_type="BTN", transform=None)
    save_model(model, Path(dirname, "switch_model.pkl"))
  3. Set the appropriate simulation parameters in the parameters.json file.

    • Set the acq_mode parameter to 'EEG+MARKERS'.
    • Ensure that the preview_inquiry_progress_method parameter is set to '1' or '2'.
    • You may also want to set the summarize_session parameter to true to see how the evidences get combined during decision-making.
  4. Ensure that the data directories have a raw data file (csv) for markers in addition to the EEG data. If your data does not have marker data, you can extract this from the triggers.txt file using the script bcipy.simulator.util.generate_marker_data. If the task was run with Inquiry Preview, the script can use the button press events recorded in the trigger file. Otherwise, you can use the --mock flag along with a parameters file to mock what a raw data file would look like if the user pressed the button according to the configured button press mode.

    $ python -m bcipy.simulator.util.generate_marker_data -h
    usage: generate_marker_data.py [-h] [-m] [-p PARAMETERS] data_folder
    
    Create raw marker data for a given session.
    
    positional arguments:
      data_folder           Data directory (must contain triggers.txt file)
    
    optional arguments:
      -h, --help            show this help message and exit
      -m, --mock            Mock data; use when button presses are not recorded in trigger file.
      -p PARAMETERS, --parameters PARAMETERS
                            Optional parameters file to use when mocking data.
  5. Run a simulation.

    • Set the simulation parameters for both the EEG and the Button models (.pkl files).
    • Use the InquirySampler.

Run with verbose mode and inspect the detailed run logs to ensure that the evidence is being sampled correctly.

Expected Behavior

Along with the 'eeg' evidence, the output session.json (and session.xlsx) should record 'btn' evidence for each inquiry. These evidences should be fused to provide the 'likelihood' values.

For inquiries in which the target is shown:

  • evidence values for symbols in the inquiry should be boosted relative to non-inquiry symbols (default values are 0.95 for boosted and 0.05 for degraded).

For inquiries in which the target is not shown:

  • evidence values for symbols in the inquiry should be degraded.

Note that it doesn't matter whether the progress method (preview_inquiry_progress_method parameter) is set to "press to accept" or "press to skip", since the SwitchDataProcessor interprets this and outputs a 1.0 for inquiries that should be supported and 0.0 for those that shouldn't.
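
The expected values can be reasoned about with a small sketch. It is illustrative only and is not BciPy's SwitchModel or fusion code. It makes two assumptions: when an inquiry is not supported, non-inquiry symbols receive the boosted value, and evidence is fused by elementwise multiplication followed by normalization. The function names are hypothetical.

```python
def button_evidence(alphabet, inquiry, supported, boost=0.95, degrade=0.05):
    """Per-symbol button evidence for one inquiry (illustrative only)."""
    shown = set(inquiry)
    if supported:
        # Target was shown: favor the symbols that appeared in the inquiry
        return [boost if s in shown else degrade for s in alphabet]
    # Target was not shown: degrade inquiry symbols (boosting the rest is an assumption)
    return [degrade if s in shown else boost for s in alphabet]


def fuse(*evidences):
    """Multiply evidence elementwise and normalize to sum to 1 (an assumption)."""
    fused = [1.0] * len(evidences[0])
    for evidence in evidences:
        fused = [f * e for f, e in zip(fused, evidence)]
    total = sum(fused)
    return [f / total for f in fused]


alphabet = ["A", "B", "C", "D"]
btn = button_evidence(alphabet, inquiry=["A", "B"], supported=True)
eeg = [1.0, 2.0, 1.0, 1.0]  # made-up EEG evidence values
likelihood = fuse(eeg, btn)
print(btn, likelihood)
```

Under multiplicative fusion, a button evidence of 1.0 for every symbol would leave the other evidence unchanged.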

Multimodal Limitations

  • A preview_inquiry_progress_method of 0 is currently not supported and an exception will be thrown. Ideally, all inquiries should get an evidence value of 1.0 (no change) with this mode.
  • Button evidence only works correctly with the InquirySampler. This is due to all trials in the same inquiry receiving the same value.