Commit fd747b3

Add data directory documentation with structure and acquisition instructions
1 parent 4632a93 commit fd747b3

1 file changed: data/README.md (106 additions, 0 deletions)

@@ -0,0 +1,106 @@
# Data Directory

This directory contains neural and behavioral data for the ClickDV project, organized by animal subject and recording session.

## Directory Structure

```
data/
├── README.md                    # This file
├── raw/                         # Raw neural data files (.mat format)
│   ├── A324/                    # Subject A324 data
│   │   ├── 2023-07-27/
│   │   │   └── A324_pycells_20230727.mat
│   │   └── 2023-07-28/
│   │       └── A324_pycells_20230728.mat
│   ├── A327/                    # Subject A327 data
│   │   ├── 2023-09-09/
│   │   │   └── A327_pycells_20230909.mat
│   │   ├── 2023-09-10/
│   │   │   └── A327_pycells_20230910.mat
│   │   ├── 2023-09-11/
│   │   │   └── A327_pycells_20230911.mat
│   │   └── 2023-09-12/
│   │       └── A327_pycells_20230912.mat
│   ├── C211/                    # Subject C211 data
│   │   ├── 2024-01-03/
│   │   │   └── C211_pycells_20240103.mat
│   │   ├── 2024-01-04/
│   │   │   └── C211_pycells_20240104.mat
│   │   ├── 2024-01-05/
│   │   │   └── C211_pycells_20240105.mat
│   │   ├── 2024-01-06/
│   │   │   └── C211_pycells_20240106.mat
│   │   ├── 2024-01-07/
│   │   │   └── C211_pycells_20240107.mat
│   │   ├── 2024-01-08/
│   │   │   └── C211_pycells_20240108.mat
│   │   └── 2024-01-10/
│   │       └── C211_pycells_20240110.mat
│   └── Copy of twoarmedbandit_trainingrecordings.csv
└── processed/                   # Processed data outputs
    ├── aligned_sessions/        # Time-aligned neural activity
    ├── click_times/             # Extracted click timing data
    └── decision_variables/      # Computed decision variables
```

## Data Format

### Raw Data Files
- **Format**: MATLAB `.mat` files
- **Naming Convention**: `{ANIMAL_ID}_pycells_{YYYYMMDD}.mat` (for example, `A324_pycells_20230727.mat`, stored under `raw/A324/2023-07-27/`)
- **Content**: Neural spike times, behavioral timestamps, trial information

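The naming convention can be checked mechanically. The following is a minimal sketch, not part of the project's code: the function and type names are hypothetical, and the ID pattern (one uppercase letter followed by digits) is an assumption inferred from the three subjects listed here (A324, A327, C211).

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Pattern for {ANIMAL_ID}_pycells_{YYYYMMDD}.mat. The ID shape (one uppercase
# letter followed by digits) is an assumption inferred from A324, A327, C211.
SESSION_FILE = re.compile(r"(?P<animal>[A-Z]\d+)_pycells_(?P<ymd>\d{8})\.mat")


class SessionFile(NamedTuple):
    animal_id: str
    session_date: date


def parse_session_filename(name: str) -> Optional[SessionFile]:
    """Return the animal ID and date encoded in a raw file name, or None."""
    match = SESSION_FILE.fullmatch(name)
    if match is None:
        return None
    try:
        session_date = datetime.strptime(match["ymd"], "%Y%m%d").date()
    except ValueError:  # eight digits that do not form a real calendar date
        return None
    return SessionFile(match["animal"], session_date)


parsed = parse_session_filename("A324_pycells_20230727.mat")
print(parsed.animal_id, parsed.session_date.isoformat())  # A324 2023-07-27
```

The ISO form of the parsed date (`2023-07-27`) is also the name of the session directory the file belongs in.
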
### Key Data Fields
Each `.mat` file contains:
- `raw_spike_time_s`: Raw neural spike times in seconds
- `filt_spike_time`: Filtered spike times (quality-approved units)
- `clicks_on`: Click event timestamps
- `cpoke_in`/`cpoke_out`: Center poke entry/exit times
- `spoke`: Side poke timestamps (choice indicators)
- `feedback`: Trial feedback timestamps
- `region`: Brain region labels (e.g., 'ADS', 'NAc', 'MGB')
- `hemisphere`: Recording hemisphere ('left'/'right')

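A quick sanity check after loading a session is to confirm that the fields listed above are present. The sketch below assumes the file has already been loaded into a dict-like object keyed by field name; `scipy.io.loadmat` returns such a dict for MAT-files older than v7.3, while v7.3 files need an HDF5 reader. Whether these fields sit at the top level of the file or inside a nested struct is not stated here, so treat the flat layout as an assumption. The helper name `missing_fields` is hypothetical.

```python
from typing import Iterable, List, Mapping

# Field names listed in this README; a real file may carry additional keys.
EXPECTED_FIELDS = (
    "raw_spike_time_s", "filt_spike_time", "clicks_on", "cpoke_in",
    "cpoke_out", "spoke", "feedback", "region", "hemisphere",
)


def missing_fields(session: Mapping[str, object],
                   expected: Iterable[str] = EXPECTED_FIELDS) -> List[str]:
    """Return the expected field names that are absent from a loaded session."""
    return [name for name in expected if name not in session]


# Stand-in for a loaded file; the values are placeholders, not real data.
example = {"raw_spike_time_s": [], "clicks_on": [], "region": []}
print(missing_fields(example))
```

An empty list means every documented field was found.
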
## Data Acquisition

### For New Users
1. **Contact**: Request data access from the Brody-Daw lab. The copy used here was provided by Julie Charlton.
2. **Download**: Request access to the lab's data repository and download the session files
3. **Placement**: Place each file in the matching `data/raw/ANIMAL_ID/DATE/` directory, with `DATE` written as `YYYY-MM-DD`
4. **Verification**: Check that each file name follows the naming convention above

### Data Sources
- **Origin**: Brody-Daw lab, Princeton University
- **Recording Type**: Multi-unit neural recordings during a two-armed bandit task
- **Species**: Rat (behavioral experiments)
- **Recording Regions**: Anterior Dorsal Striatum (ADS), Nucleus Accumbens (NAc), and others

### File Sizes
- Individual session files: ~10-50 MB each
- Total raw data: ~500 MB
- Processed outputs: Variable, typically <100 MB

## Setup Instructions

1. **Create directory structure**:
   ```bash
   # Run from the project root; -p creates any missing parent directories
   mkdir -p data/raw data/processed/aligned_sessions data/processed/click_times data/processed/decision_variables
   ```

2. **Obtain raw data**:
   - Contact the lab for data access credentials
   - Download session files to the appropriate `data/raw/ANIMAL_ID/DATE/` folders
   - Verify file integrity and naming conventions

3. **Processed data**:
   - Generated by running the analysis scripts
   - Intermediate outputs are saved in the `processed/` subdirectories
   - Can be regenerated from raw data as needed

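The naming and placement check in step 2 can be automated. The sketch below walks `data/raw` and reports any `.mat` file whose name does not match the convention or whose location is not `ANIMAL_ID/YYYY-MM-DD/`. It is an illustration only: `misplaced_sessions` is a hypothetical helper, the ID pattern is inferred from the subjects listed in this README, and it does not verify file integrity (for example, checksums).

```python
import re
from pathlib import Path
from typing import List

# Same convention as the raw file names: {ANIMAL_ID}_pycells_{YYYYMMDD}.mat
# (the ID shape is an assumption inferred from A324, A327, C211).
SESSION_FILE = re.compile(r"(?P<animal>[A-Z]\d+)_pycells_(?P<ymd>\d{8})\.mat")


def misplaced_sessions(raw_dir: Path) -> List[Path]:
    """List .mat files whose name or location breaks the documented layout."""
    problems = []
    for path in sorted(raw_dir.rglob("*.mat")):
        match = SESSION_FILE.fullmatch(path.name)
        if match is None:
            problems.append(path)  # name does not follow the convention
            continue
        ymd = match["ymd"]
        date_dir = f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:]}"
        expected = raw_dir / match["animal"] / date_dir / path.name
        if path != expected:
            problems.append(path)  # valid name, wrong directory
    return problems


if __name__ == "__main__":
    for path in misplaced_sessions(Path("data/raw")):
        print(f"unexpected name or location: {path}")
```

Run it from the project root after downloading; no output means every `.mat` file is where the directory structure above expects it.
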
## Notes

- Raw data files are ignored by git (see `.gitignore`)
- Only commit processed outputs that are small and essential
- For reproducibility, document any data preprocessing steps
- Consider a data versioning tool such as DVC for larger datasets
