Analysis Overview

Experiment workflow

The data pipeline has five stages:

  1. Run the experiment — A MATLAB protocol script (e.g. protocol_27.m) controls the LED arena and camera, producing a UFMF video file and a LOG.mat file with stimulus timing and metadata.
  2. Track the flies — FlyTracker processes the video offline, extracting each fly’s position, heading, and basic features into trx.mat (trajectories) and feat.mat (features).
  3. Compute behavioural metrics — combine_data_one_cohort(feat, trx) filters bad tracking, interpolates gaps, and computes 12 behavioural metrics (forward velocity, angular velocity, turning rate, distance from centre, etc.).
  4. Split by condition — comb_data_one_cohort_cond(LOG, comb_data) uses LOG frame indices to slice the continuous data into per-condition segments.
  5. Merge experiments — comb_data_across_cohorts_cond(protocol_dir) combines all sessions into a single hierarchical DATA struct for group analysis.

Steps 2–3 can be fully automated by the processing pipeline.

Overview

The data acquired from freely-walking optomotor experiments, particularly during the screen using protocol 27, are analysed in two main steps, followed by a statistical summary step.

The first step (process_freely_walking_data) is done per cohort (each vial of flies that was run). This creates several “overview” level plots for the individual cohort.

The second step (process_screen_data) combines data across cohorts and also parses it by condition. This creates plots that compare the behaviour of each strain against the empty-split control flies.

A third step (make_summary_heat_maps_p27) performs statistical testing across all strains and conditions.

Requirements for analysing the data from the MIC screen

For the processing pipeline to run, each experiment folder must contain:

  • a .ufmf video of the entire experiment
  • a .mat LOG file
  • a subdirectory that contains the trx.mat and -feat.mat files output by FlyTracker.

The .ufmf file is a compressed video format generated by BIAS that stores the differences between frames rather than every full frame. The LOG file contains metadata about the experiment (fly strain, date and time, which pattern was used for which condition) and the frame numbers at which each condition started and ended. These frame numbers are recorded during the experiment by MATLAB interfacing with BIAS.

The two FlyTracker output files serve different purposes:

  • trx.mat — trajectory data. A MATLAB table with one row per tracked fly containing arrays of x position, y position, heading angle, and timestamps across all video frames.
  • -feat.mat — behavioural features. Contains per-frame measurements computed by FlyTracker such as distance from the arena edge, wing angles, and body dimensions.

Both files are generated during the FlyTracker tracking pipeline — trx is produced first during tracking, then feat is computed from the trajectories in a second pass. All rows should have arrays of the same length corresponding to the total number of frames in the video.

Behavioural metrics

The following metrics are computed from the FlyTracker output by combine_data_one_cohort and stored in the comb_data structure:

Variable | Units | Source | Computation
fv_data | mm/s | Computed | Forward velocity in heading direction (two-point derivative, negative values set to NaN)
av_data | deg/s | Computed | Angular velocity via least-squares line fit to heading (window = 16 frames)
vel_data | mm/s | Computed | Total velocity magnitude (three-point central difference)
curv_data | deg/mm | Computed | Turning rate: av_data / fv_data
x_data | mm | trx | X position, converted from pixels via PPM (4.1691 px/mm)
y_data | mm | trx | Y position, converted from pixels via PPM
heading_data | deg | trx | Continuous (unwrapped) heading angle
heading_wrap | deg | trx | Heading wrapped to −180° to 180°
dist_data | mm | feat | Distance from arena centre
dist_data_delta | mm | Computed | Change in distance relative to stimulus onset
view_dist | mm | Computed | Viewing distance to arena wall (ray-circle intersection)
IFD_data | mm | Computed | Distance to nearest fly
IFA_data | deg | Computed | Angle to nearest fly
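
To make the relationship between fv_data, av_data and curv_data concrete, here is a minimal sketch on a synthetic straight-line trajectory. The variable names, the heading convention (degrees, measured from the +x axis) and the constant angular velocity are assumptions for illustration only. The pipeline's own smoothing and interpolation steps are left out.

```matlab
% Sketch only: synthetic data, not the pipeline's own implementation.
fps = 30;                        % camera frame rate (frames per second)
t = (0:9) / fps;                 % timestamps for ten frames (s)
speed = 12;                      % assumed true walking speed (mm/s)
heading = 30 * ones(1, 10);      % constant heading (deg), assumed convention
x = speed * t * cosd(30);        % x position (mm)
y = speed * t * sind(30);        % y position (mm)

% Two-point derivative of position, scaled to mm/s.
vx = diff(x) * fps;
vy = diff(y) * fps;

% Project the velocity vector onto the heading direction.
h = heading(1:end-1);
fv = vx .* cosd(h) + vy .* sind(h);
fv(fv < 0) = NaN;                % negative forward velocity is set to NaN

% Turning rate (deg/mm) = angular velocity (deg/s) / forward velocity (mm/s).
av = 60 * ones(size(fv));        % assumed angular velocity (deg/s)
curv = av ./ fv;
```

Because the fly moves exactly along its heading here, fv recovers the walking speed and curv is simply the ratio of the two assumed rates.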
Note: Processing parameters

Parameter | Value | Description
Max velocity threshold | 50 mm/s | Frames with velocity above this are set to NaN (assumed tracking errors)
Angular velocity window | 16 frames | Least-squares fitting window for heading derivative
Interpolation method | spline | Used for filling NaN values in position/distance data; heading uses previous-value fill
Frame rate | 30 fps | Camera acquisition rate
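
The angular velocity window can be illustrated with a short sketch that fits a straight line to 16 consecutive heading samples and takes the slope as the angular velocity. This is not vel_estimate itself: the synthetic heading trace, the use of polyfit, and assigning each slope to the approximate window centre are all assumptions.

```matlab
% Sketch only: sliding least-squares line fit to an unwrapped heading trace.
fps = 30;                              % frame rate (frames per second)
win = 16;                              % fitting window (frames)
n = 60;                                % number of frames in this toy trace
true_av = 90;                          % assumed constant turn rate (deg/s)
heading = true_av * (0:n-1) / fps;     % unwrapped heading (deg)

t_win = (0:win-1) / fps;               % time axis of one window (s)
av = NaN(1, n);                        % frames without a full window stay NaN
for k = 1:(n - win + 1)
    % First-order polynomial fit; p(1) is the slope in deg/s.
    p = polyfit(t_win, heading(k:k+win-1), 1);
    av(k + win/2) = p(1);              % store at the approximate window centre
end
```

Fitting a line over a window, rather than differencing adjacent frames, trades temporal resolution for robustness to frame-to-frame heading jitter.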

Tree structure of processing functions

Functions in red are used for processing the data. Functions in blue are used for plotting the data.

  • process_freely_walking_data
    • process_data_features
      • combine_data_one_cohort
      • make_overview
      • plot_all_features_filt
      • plot_all_features_acclim
      • comb_data_one_cohort_cond
      • plot_allcond_onecohort_tuning
      • plot_errorbar_tuning_curve_diff_contrasts
      • plot_errorbar_tuning_diff_speeds
      • generate_circ_stim_ufmf
        • create_stim_video_loop
  • process_screen_data
    • comb_data_across_cohorts_cond
    • generate_exp_data_struct
    • plot_allcond_acrossgroups_tuning

Level 1 — analyse per cohort: process_freely_walking_data

Inputs

Takes a date string in the format 'YYYY_MM_DD'. All experiments conducted on that day are processed, regardless of which protocol was used.

Runs the function process_data_features per cohort and experiment.

Outputs

  • Exports a text file of the number of flies run per protocol and per strain.
  • Results .mat file per vial containing: LOG, feat, trx, comb_data, n_fly_data
  • Figures:
    • Acclimation timeseries
    • Full-experiment timeseries overview
    • Timeseries per behavioural metric per vial

Description of process_data_features

Processes the tracked data from FlyTracker. Loads LOG, feat, and trx from each experiment folder.

Saves in the results file *_data.mat:

  • LOG — original experiment metadata
  • feat — FlyTracker features with poorly tracked flies removed
  • trx — FlyTracker trajectories with poorly tracked flies removed
  • comb_data — combined behavioural metrics for all flies across the entire experiment
  • n_fly_data — [3 x 1] array of [n_flies_in_arena, n_flies_tracked, n_flies_removed]

The function proceeds through four steps:

1. Combine the tracking data for all flies within one vial across the entire experiment

The function combine_data_one_cohort combines data from all flies within a single experiment into the comb_data struct. Each field (e.g. fv_data) contains a [n_flies x n_frames] array. The data is not parsed by condition at this stage.

Tracking quality is checked first by check_tracking_FlyTrk, which compares the frame count for each tracked object against the mode. Flies with a different frame count are removed — this catches cases where tracking was split across multiple identities or non-fly objects were tracked.

Data extracted directly from FlyTracker output:

  • Distance from the arena edge (from feat)
  • Heading angle (from trx)
  • X and Y position (from trx)

Data computed from these:

  • Angular velocity — least-squares line fit to heading over a 16-frame window (vel_estimate with method = 'line_fit')
  • Forward velocity — position derivative projected onto heading direction, smoothed with Gaussian convolution. Negative values and values exceeding 50 mm/s are set to NaN and filled with linear interpolation.
  • Three-point velocity — total speed from central difference of position (calculate_three_point_velocity)
  • Turning rate — av_data / fv_data (degrees per millimetre)
  • Viewing distance — distance from the fly to the arena wall along its heading direction, computed via ray-circle intersection (calculate_viewing_distance). A ray is cast from the fly’s position along its heading and the intersection with the arena circle (centre [126.6, 124.7] mm, radius 119.0 mm) is found by solving the resulting quadratic equation.
  • Inter-fly distance and angle — distance and angle to the nearest other fly (calculate_distance_to_nearest_fly)
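
The ray-circle intersection used for viewing distance can be sketched as follows, using the arena centre and radius quoted above. This is not calculate_viewing_distance itself: the example fly positions and the heading convention (degrees from the +x axis) are assumptions. For a fly inside the arena the quadratic always has one positive root, which is the distance to the wall.

```matlab
% Sketch only: viewing distance via ray-circle intersection.
c = [126.6, 124.7];                 % arena centre (mm)
R = 119.0;                          % arena radius (mm)

% Three hypothetical flies: [x, y] position (mm) and heading (deg).
pos = [126.6, 124.7;                % at the centre, facing +x
       176.6, 124.7;                % 50 mm right of centre, facing +x
       176.6, 124.7];               % same position, facing -x
heading = [0; 0; 180];

d = [cosd(heading), sind(heading)]; % unit direction vectors
f = bsxfun(@minus, pos, c);         % position relative to the centre

% Solve |f + s*d|^2 = R^2, i.e. s^2 + 2*b*s + q = 0, for s >= 0.
b = sum(f .* d, 2);
q = sum(f .^ 2, 2) - R^2;           % negative for flies inside the arena
view_dist = -b + sqrt(b .^ 2 - q);  % positive root (mm)
```

In this example the three flies see the wall at 119 mm, 69 mm and 169 mm respectively.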

2. Create overview plots of behaviour during the entire experiment

  • make_overview — histogram subplots of general locomotion metrics (forward velocity, angular velocity, turning rate distributions) over the full protocol.

  • plot_all_features_filt — timeseries for all flies over the full protocol, showing forward velocity, angular velocity, turning rate, and distance from arena centre. Coloured background rectangles indicate when each stimulus condition occurred.

  • plot_all_features_acclim — timeseries during the 5-minute dark acclimation period only, showing forward velocity, angular velocity, turning rate, and both absolute and relative distance from centre.

    acclim_end = LOG.acclim_off1.stop_f;
    range_of_data_to_plot = 1:acclim_end;

3. Parse the behavioural data based on conditions

The function comb_data_one_cohort_cond organises the combined data into the nested DATA structure, with fields for each condition (e.g. R1_condition_1, R2_condition_1) and each behavioural metric within those conditions.

4. Plot the condition-parsed data

plot_allcond_onecohort_tuning generates a [(n_conditions/2) x 2] subplot figure showing mean ± SEM timeseries during each condition for all flies in the vial.
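
As a rough illustration of the mean ± SEM computation on a [n_flies x n_frames] array, the sketch below averages across flies at each frame. Whether the plotting function ignores NaN values in this way is an assumption, and the data are invented.

```matlab
% Sketch only: per-frame mean and SEM across flies, skipping NaN entries.
data = [1 2 NaN;
        3 4 5;
        5 6 7];                         % [n_flies x n_frames], toy values

valid = ~isnan(data);
n = sum(valid, 1);                      % number of valid flies per frame

d0 = data;
d0(~valid) = 0;                         % zero out NaNs so they do not affect the sum
mu = sum(d0, 1) ./ n;                   % mean across flies

dev = bsxfun(@minus, data, mu);
dev(~valid) = 0;
sd = sqrt(sum(dev .^ 2, 1) ./ (n - 1)); % sample standard deviation
sem = sd ./ sqrt(n);                    % standard error of the mean
```

The resulting mu and sem are [1 x n_frames] vectors that could be drawn as a line with a shaded band.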


Explanation of the different functions used to combine data

combine_data_one_cohort

[comb_data, feat, trx] = combine_data_one_cohort(feat, trx)

Combines data from all flies within a single experiment into the comb_data struct. Each field contains a [n_flies x n_frames] array. This function checks for bad tracking, filters high-velocity frames (> 50 mm/s) as tracking errors (setting them to NaN), and fills missing values using spline interpolation for position/distance data and previous-value interpolation for heading. The processed data is saved to the results file and used for all downstream analyses. The original data is never altered.

comb_data_one_cohort_cond

Both comb_data_one_cohort_cond and comb_data_across_cohorts_cond create the nested DATA structure based on experimental conditions. The single-cohort version is only used within process_data_features to create the DATA struct for the per-vial overview timeseries plots.

comb_data_across_cohorts_cond

Used within process_screen_data to combine data from all flies across multiple cohorts. The resulting DATA struct is organised hierarchically:

DATA.(strain).(sex)(cohort_idx).(condition).(data_type)

For example: DATA.jfrc100_es_shibire_kir.F(1).R1_condition_1.fv_data returns a [n_flies x n_frames] array. This function requires that the protocol saves condition numbers to the LOG file — older protocols that do not include this information cannot be processed with this function.
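
A small sketch of how the hierarchy can be navigated with dynamic field names is given below. The toy DATA struct is built by hand with placeholder values, and pooling flies by vertical concatenation assumes every cohort has the same number of frames for the condition.

```matlab
% Sketch only: build a toy DATA struct and pool flies across cohorts.
DATA = struct();
DATA.jfrc100_es_shibire_kir.F(1).R1_condition_1.fv_data = zeros(3, 5); % cohort 1: 3 flies
DATA.jfrc100_es_shibire_kir.F(2).R1_condition_1.fv_data = ones(4, 5);  % cohort 2: 4 flies

strain    = 'jfrc100_es_shibire_kir';
sex       = 'F';
cond_name = 'R1_condition_1';
metric    = 'fv_data';

cohorts = DATA.(strain).(sex);          % struct array, one element per cohort
pooled = [];
for k = 1:numel(cohorts)
    % Stack the [n_flies x n_frames] arrays of each cohort.
    pooled = [pooled; cohorts(k).(cond_name).(metric)]; %#ok<AGROW>
end
```

Here pooled ends up as a [7 x 5] array containing all flies of that strain and sex for the chosen condition and metric.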


Level 2 — analyse across cohorts: process_screen_data

This function uses the .mat results files generated by process_freely_walking_data to combine data across all cohorts.

  • Runs comb_data_across_cohorts_cond to generate the hierarchical DATA struct across all strains and cohorts.
  • Runs plot_allcond_acrossgroups_tuning to create [(n_conditions/2) x 2] subplot figures for each strain versus the empty-split control flies. It creates 5 figures per strain, one for each data type: fv_data, av_data, curv_data, dist_data, dist_data_delta.

Inputs

  • String of the protocol e.g. 'protocol_27'
  • .mat results files from process_data_features

Outputs

  • 5 figures per strain (timeseries per condition vs empty-split controls)
  • Text file and 2 plots of the number of vials per strain and the number of flies per strain

Level 3 — Statistical analysis: make_summary_heat_maps_p27

This function generates a red-blue heatmap of p-values comparing each strain to the empty-split control across all conditions and behavioural metrics.

The statistical pipeline:

  1. Combines all data for protocol_27 using comb_data_across_cohorts_cond.
  2. Computes p-values for each strain × condition × metric comparison using make_pvalue_heatmap_across_strains (Wilcoxon rank-sum test).
  3. Applies a False Discovery Rate (FDR) correction using fdr_bh with an alpha threshold of 0.001 and the dependent assumption ('dep' method).
  4. Plots the corrected p-values as a heatmap using plot_pval_heatmap_strains. Red indicates the test strain has a significantly higher value than the control; blue indicates a significantly lower value.
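
The correction in step 3 can be sketched without any toolbox, assuming the 'dep' option of fdr_bh corresponds to the Benjamini-Yekutieli procedure for dependent tests. The p-values below are invented, and this hand-written version is only an illustration of the procedure, not a replacement for fdr_bh.

```matlab
% Sketch only: Benjamini-Yekutieli step-up procedure on toy p-values.
p = [0.01, 1e-6, 0.2, 5e-4, 2e-4];     % invented p-values
q = 0.001;                             % FDR threshold (alpha)
m = numel(p);

[p_sorted, order] = sort(p);           % ascending p-values
c_m = sum(1 ./ (1:m));                 % harmonic correction for dependence
thresh = (1:m) * q / (m * c_m);        % per-rank critical values

% Largest rank whose p-value falls at or below its critical value.
k_max = find(p_sorted <= thresh, 1, 'last');

h = false(size(p));                    % true where the null is rejected
if ~isempty(k_max)
    h(order(1:k_max)) = true;
end
```

With these numbers only the smallest p-value (1e-6) survives. Dropping the c_m factor gives the less conservative Benjamini-Hochberg thresholds, under which three of the five tests would be rejected.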

Processing of other protocols

Several analysis scripts handle data from protocols other than the main screen protocol:

Protocol | Script | Analysis
protocol_30 | p30_different_contrasts_analysis.m | Contrast tuning curves — compares optomotor responses across different contrast levels
protocol_31 | p31_different_speeds_analysis.m | Speed tuning curves — compares responses across 4 speeds (32, 64, 127 px/s) at two spatial frequencies
protocol_25 | p25_single_lady_analysis.m | Single-fly analysis — tests individual fly behaviour in isolation
protocol_33/34 | analyse_p33_p34.m | Eye-painted fly experiments
protocol_35 | analyse_p35_shiftedCoR.m | Shifted centre of rotation experiments

Additional analysis scripts for phototaxis (analyse_phototaxis.m) and viewing distance (analyse_viewing_distance.m) can be applied to data from any protocol.


Processing details

Tracking quality control

Before computing behavioural metrics, the function check_tracking_FlyTrk removes badly tracked flies. FlyTracker sometimes produces artifacts — a single fly split into two tracks, or a non-fly object (dust, shadow) tracked. The function detects these by comparing the frame count for each tracked object against the mode (the most common frame count, which corresponds to the true video length). Any fly whose frame count differs from the mode is removed from both trx and feat.data.
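
The frame-count check can be sketched as below. This is not check_tracking_FlyTrk itself: the cell array of per-object x traces is a simplified stand-in for the trx layout, and the frame counts are invented.

```matlab
% Sketch only: keep tracked objects whose frame count matches the mode.
lens = [100, 100, 40, 100, 60];        % invented frame counts per tracked object
x_per_obj = cell(1, numel(lens));
for i = 1:numel(lens)
    x_per_obj{i} = zeros(1, lens(i));  % placeholder x trace for object i
end

n_frames = cellfun(@numel, x_per_obj); % frame count of each tracked object
expected = mode(n_frames);             % most common count = true video length
keep = (n_frames == expected);         % logical mask of well-tracked flies

x_clean = x_per_obj(keep);             % apply the same mask to trx and feat
n_removed = sum(~keep);
```

Using the mode rather than the maximum means a single spuriously long or short track cannot define the reference length.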

High-velocity filtering

For each fly, frames where the FlyTracker-reported velocity exceeds 50 mm/s are marked as tracking errors. This threshold is well above the typical maximum walking speed of Drosophila (~30 mm/s) and reliably catches tracking jumps. At these frames, position, heading, and distance data are set to NaN for interpolation.
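
A minimal sketch of this masking step, on invented [n_flies x n_frames] arrays:

```matlab
% Sketch only: mark frames above the velocity threshold as missing.
max_vel = 50;                          % threshold (mm/s)
vel = [5 12 80   9;
       3  4  6 120];                   % invented velocities (mm/s)
x   = [10 11 40 12;
       20 20 21 90];                   % invented x positions (mm)

bad = vel > max_vel;                   % frames treated as tracking errors
x_filt = x;
x_filt(bad) = NaN;                     % left as NaN for later interpolation
```

The same logical mask would be applied to the y position, heading and distance arrays so that all metrics stay aligned frame by frame.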

Interpolation methods

Missing values from filtering are filled using method-appropriate interpolation:

Data | Method | Reason
Distance to wall (d_wall_data) | Spline | Smooth, continuous changes in distance
Heading angle (heading_data) | Previous value | Avoids introducing artificial heading jumps
X and Y position | Spline | Smooth position trajectories
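
The two fill strategies can be sketched for a single fly as follows. The use of interp1 for the spline fill and an explicit loop for the previous-value fill are assumptions about one way to do this, and the traces are invented.

```matlab
% Sketch only: spline fill for position, previous-value fill for heading.
frames = 1:7;
x = [0 1 2 NaN 4 5 6];                 % position with one missing frame (mm)
heading = [10 20 NaN NaN 50 60 70];    % heading with a two-frame gap (deg)

% Spline interpolation through the valid position samples.
ok = ~isnan(x);
x_filled = x;
x_filled(~ok) = interp1(frames(ok), x(ok), frames(~ok), 'spline');

% Previous-value fill: carry the last valid heading forward.
h_filled = heading;
for k = 2:numel(h_filled)
    if isnan(h_filled(k))
        h_filled(k) = h_filled(k - 1);
    end
end
```

Holding the previous heading avoids interpolating through a wrap or a sudden turn, while the spline keeps the position trace smooth across the gap.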

Condition splitting

The function comb_data_one_cohort_cond slices continuous data into per-condition segments using LOG frame indices. For each condition, the data slice runs from start_f(1) - 300 to stop_f(end), where the 300-frame (10-second) pre-buffer captures baseline behaviour before stimulus onset. Conditions are named R1_condition_N or R2_condition_N for repetitions 1 and 2 respectively, plus acclim_off1, acclim_patt, and acclim_off2 for the acclimation phases.
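
The slicing rule can be sketched as below. The layout of LOG for stimulus conditions (a field per condition holding start_f and stop_f vectors) and the frame numbers are assumptions for illustration. Only the 300-frame pre-buffer and the start_f(1) and stop_f(end) limits come from the description above.

```matlab
% Sketch only: cut one condition out of the continuous data.
pre_buffer = 300;                                % 10 s at 30 fps
n_frames = 2000;
comb_data = struct();
comb_data.fv_data = repmat(1:n_frames, 2, 1);    % 2 flies, value = frame index

LOG = struct();
LOG.R1_condition_1.start_f = [501 801];          % hypothetical onset frames
LOG.R1_condition_1.stop_f  = [800 1100];         % hypothetical offset frames

cond_name = 'R1_condition_1';
first_f = LOG.(cond_name).start_f(1) - pre_buffer;   % include the baseline
last_f  = LOG.(cond_name).stop_f(end);

DATA = struct();
DATA.(cond_name).fv_data = comb_data.fv_data(:, first_f:last_f);
```

With these numbers the slice covers frames 201 to 1100, so each fly contributes 900 frames, the first 300 of which are pre-stimulus baseline.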

Tip: Further reading

The training guide PDF (docs/training_guide/training_guide.pdf in the freely-walking-optomotor repository) provides detailed walkthroughs of every processing function, including the mathematical derivations for angular velocity, forward velocity, viewing distance, and inter-fly distance computations.