Companion Data Page

Behavioral individuality reveals genetic control of phenotypic variability

Julien Ayroles1,2,3, Sean Buchanan4, Chelsea Jenney1,4,5, Kyobi Skutt-Kakaria1,5,
Jennifer Grenier3, Andrew Clark3, Daniel Hartl1, Benjamin de Bivort1,4,5

1 Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA.
2 Harvard Society of Fellows, Harvard University, Cambridge, Massachusetts, USA.
3 Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA.
4 The Rowland Institute at Harvard, Cambridge, Massachusetts, USA.
5 Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA.

Final publication coming soon.

Instrument control .VIs (LabVIEW 8.6)

y maze Main data acquisition interface. Tracks the centroid of pixels that are dark relative to a reference image acquired once. Tracking is done within user-positionable regions of interest (ROIs), which the user centers over each Y-maze. Has fields for experimental metadata. Exports two files: one with the extension ".labels.txt", which contains the metadata, and one with the extension ".data.txt", which contains the tracking data. The file path that serves as the base name of these two files is the input to flyY120LoadData.m, which reads the data into MATLAB and converts the tracking data into directional choices.
Calls y maze Save data and y maze ROI centroid.
y maze ROI centroid Within each user-designated ROI, determines the centroid of the pixels that exceed a darkness threshold relative to the reference image. Overlays the difference image between the current camera view and the reference image with red circles marking the centroid positions and yellow boxes indicating the ROI boundaries.
Called by y maze.
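The per-ROI tracking step described for y maze and y maze ROI centroid can be sketched in Python with NumPy. This is a hypothetical re-implementation, not the LabVIEW code. The function name roi_centroids, the (row0, row1, col0, col1) box format, the fixed threshold, and the unweighted (binary-mask) centroid are all assumptions.

```python
import numpy as np


def roi_centroids(frame, reference, rois, threshold):
    """Centroid of 'dark' pixels inside each ROI (hypothetical sketch).

    frame, reference: 2-D grayscale arrays of the same shape.
    rois: list of (row0, row1, col0, col1) boxes, half-open like Python slices.
    threshold: minimum darkening (reference - frame) for a pixel to count.
    Returns one (row, col) centroid per ROI in full-image coordinates,
    or None when no pixel in that ROI passes the threshold.
    """
    # The fly is darker than the background, so darkening = reference - frame.
    diff = reference.astype(float) - frame.astype(float)
    centroids = []
    for r0, r1, c0, c1 in rois:
        rows, cols = np.nonzero(diff[r0:r1, c0:c1] > threshold)
        if rows.size == 0:
            centroids.append(None)
        else:
            # Unweighted mean position of the above-threshold pixels.
            centroids.append((r0 + rows.mean(), c0 + cols.mean()))
    return centroids
```

Calling this once per frame and writing the resulting coordinates out would correspond to the roles of y maze and y maze Save data. The actual output line format is not specified here.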
y maze Save data Writes one line of output per frame to the .data.txt file produced by y maze.
Called by y maze.

Data .mat (MATLAB 2014a)

vofv_AllYMazeData.mat Individual Y-maze left-right choice data for all the flies reported in the study. Contains four variables: expData_DGRPScreen (choice data for the DGRP line screen, Figure 1), expData_VarHerit (choice data for the heritability of high and low variability phenotype crosses, Figure 2), expData_TenaMutants (choice data for the Ten-a allele experiments, Figure 3), and expData_TenaTimeCourse (choice data for the Ten-a RNAi sliding window knockdown experiment and controls, Figure 4).

expData_DGRPScreen, expData_VarHerit and expData_TenaTimeCourse are n x 15 cell arrays, where the columns are: 1) experiment filename string, 2) maze number within the array, 3) experiment date, 4) experiment start time, 5) imaging rig number, 6) tray number, 7) experimental group / genotype, 8) fly sex, 9) comment field, 10) maze ROI x coordinate, 11) maze ROI y coordinate, 12) number of turns, 13) turn bias, 14) vector of turn directions (0=left, 1=right), 15) vector of turn times in ms (0=start of experiment).

expData_TenaMutants is an n x 9 cell array in the format of the data from Buchanan et al., 2014. Columns are: 1) maze number within the array, 2) Buchanan experiment ID, 3) tray number, 4) imaging rig number, 5) experimental group / genotype, 6) number of turns, 7) turn bias, 8) vector of turn directions (0=left, 1=right), 9) vector of turn times in ms (0=start of experiment).
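The Python sketch below maps the 1-based MATLAB columns of the two layouts above to 0-based indices. Several elements are assumptions for illustration only: the field names, the helper names, and the definition of turn bias as the fraction of right turns (1 = right). The loader also assumes the .mat file was not saved in the v7.3 (HDF5) format, which scipy.io.loadmat cannot read.

```python
# 0-based indices for the 1-based MATLAB columns listed above.
COLS_15 = {
    "filename": 0, "maze": 1, "date": 2, "start_time": 3, "rig": 4,
    "tray": 5, "group": 6, "sex": 7, "comment": 8, "roi_x": 9,
    "roi_y": 10, "n_turns": 11, "turn_bias": 12, "turns": 13, "turn_times": 14,
}
COLS_9 = {
    "maze": 0, "buchanan_id": 1, "tray": 2, "rig": 3, "group": 4,
    "n_turns": 5, "turn_bias": 6, "turns": 7, "turn_times": 8,
}
LAYOUTS = {15: COLS_15, 9: COLS_9}


def row_to_dict(row):
    """Name the fields of one cell-array row, choosing the layout by width."""
    cols = LAYOUTS.get(len(row))
    if cols is None:
        raise ValueError(f"expected 9 or 15 columns, got {len(row)}")
    return {name: row[i] for name, i in cols.items()}


def turn_bias(turns):
    """Fraction of right turns (1 = right); assumed definition of turn bias."""
    return sum(turns) / len(turns) if len(turns) else float("nan")


def load_variable(path, name):
    # Cell arrays come back as NumPy object arrays whose elements may
    # themselves be small nested arrays; unwrap as needed.
    from scipy.io import loadmat
    return loadmat(path)[name]
```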

Analysis .m functions (MATLAB 2014a)

basic analysis and data handling
flyY120LoadData.m Loads turn choice data from the output of y maze into MATLAB. Takes one parameter, pathname, which is the file path to the data to be imported, including directories as needed, but excluding the .dat.txt and .label.txt extensions created by the instrument control vi. Integrates turn direction and timing vectors with experimental meta data from the .label.txt file.
Calls flyY120.m.
flyY120.m Parses the data in the .dat.txt file output by the VI to extract the turn direction and turn timing vectors. Accounts for which mazes within each 120-maze array are right side up (1-64) and which are upside down (65-120).
Called by flyY120LoadData.m.
flyY120StatusSummary.m Generates a basic summary table (a cell array) of the sample size of each experimental group, as enumerated by the labels in a raw turn-choice master data variable such as expData_DGRPScreen. Determines the list of experimental groups by taking the unique labels in column 7 of the master variable. Can be used on nine-column data variables by changing the hard-coded parameter strainCol. Used during the DGRP line screen to monitor the overall progress of the screen and of individual lines.
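The group-counting step of flyY120StatusSummary.m can be sketched in Python, assuming each row of the master variable is a sequence. The names status_summary and STRAIN_COL are hypothetical. Following the column layouts above, the group label sits in column 7 of 15-column data and column 5 of nine-column data.

```python
from collections import Counter

# 1-based, as in the MATLAB description; use 5 for nine-column data.
STRAIN_COL = 7


def status_summary(data, strain_col=STRAIN_COL):
    """Return [(group_label, n_flies), ...] sorted by label."""
    counts = Counter(row[strain_col - 1] for row in data)
    return sorted(counts.items())
```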
handGetExpGroup.m Collects all the individual turn data from a single experimental group / genotype from within a master cell array variable such as expData_DGRPScreen. Output is a cell array of just the turn data for that experimental group.
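A Python sketch of the filtering done by handGetExpGroup.m is shown below. It assumes that "just the turn data" means columns 12 to 15 of the 15-column layout (number of turns, turn bias, turn directions, turn times). That column choice and the function name get_exp_group are hypothetical.

```python
def get_exp_group(data, group, group_col=7, first_turn_col=12):
    """Return the turn-data columns (first_turn_col onward, 1-based) of
    every row whose column group_col (1-based) equals group."""
    return [list(row[first_turn_col - 1:])
            for row in data
            if row[group_col - 1] == group]
```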
strainPicker.m Generates a randomly permuted vector of DGRP line numbers from a vector of such line labels passed as input. Used during the DGRP screen to determine which lines (among those not yet completed, i.e. still being tested) would go into the maze array currently being loaded.
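The permutation performed by strainPicker.m amounts to a shuffle. The Python sketch below uses the standard library and leaves the input list unchanged. The function name strain_picker and the optional seeded generator are illustrative assumptions.

```python
import random


def strain_picker(lines, rng=None):
    """Return a random permutation of the DGRP line labels passed in."""
    rng = rng if rng is not None else random.Random()
    picked = list(lines)  # copy so the caller's list is not reordered
    rng.shuffle(picked)
    return picked
```

Passing a seeded random.Random makes a loading order reproducible, which may help when auditing which lines went into which array.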
flyY120DataWorkup.m Generates additional statistics from individual turn score data in 15-column format, such as expData_DGRPScreen. Output is a structure with three fields:

.summary: n x 13 cell array, where n is the number of unique experimental groups present in the input data. Columns are as follows: 1) experimental group name, 2) sample size, 3) median turning bias, 4) median number of turns completed, 5) median switchiness (the streakiness of the turn direction vector, a measure of the mutual information between successive turns), 6) median clumpiness (a measure of the lack of uniformity of turn timing), 7) the median absolute deviation from the median (MAD) of turn bias, 8) the MAD of the number of turns completed, 9) MAD of switchiness, 10) MAD of clumpiness, 11) median of unimplemented alternative clumpiness metric, 12) MAD of unimplemented alternative clumpiness metric, 13) the vector of turn biases from this experimental group.

.newData: an appended version of the single fly turning data that was used as an input with three additional columns: 16) the switchiness of each fly's turn sequence, 17) the clumpiness of each fly's turn timing, 18) unimplemented alternative clumpiness metric.

.corrs: a 4 x 4 x m array of correlation coefficients, where each 4 x 4 subarray is the correlation matrix among the four individual fly phenotypes (turn bias, number of turns, turn sequence switchiness and turn timing clumpiness) across individual flies. One such matrix is given for each of the m experimental groups in the input data with a sample size >= 50.
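The median, MAD and per-group correlation parts of flyY120DataWorkup.m can be sketched in Python with NumPy. Switchiness and clumpiness are not reimplemented; the phenotype matrix is assumed to be given. MAD here is the unscaled median absolute deviation from the median, as defined above. Note that MATLAB's built-in mad returns the mean absolute deviation unless called as mad(x,1). The use of Pearson correlation (np.corrcoef) and all function names are assumptions.

```python
import numpy as np


def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return float(np.median(np.abs(x - np.median(x))))


def group_summary(bias_by_group):
    """{group: turn biases} -> {group: (sample size, median, MAD)}."""
    return {g: (len(v), float(np.median(v)), mad(v))
            for g, v in bias_by_group.items()}


def phenotype_corr(phenos, min_n=50):
    """phenos: (n_flies, 4) array of turn bias, number of turns,
    switchiness and clumpiness for one group. Returns the 4 x 4
    Pearson correlation matrix, or None if n_flies < min_n."""
    phenos = np.asarray(phenos, dtype=float)
    if phenos.shape[0] < min_n:
        return None
    return np.corrcoef(phenos, rowvar=False)  # columns are variables
```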
vofvTenATimecourseBS.m Estimates the probability of observing greater variation in MAD across experimental groups than in the experimental data, under the null hypothesis that all groups are drawn from the same distribution of values. Used to generate the p-values of Figure 4d. Resamples each experimental group from the pool of turn biases from all experimental groups and calculates the test statistic M = abs(MAD1-MADall) + abs(MAD2-MADall) + ... + abs(MADn-MADall), where n is the number of experimental groups, MADi is the resampled MAD of group i, and MADall is the MAD of the concatenated turn bias scores of all groups. Performs 10,000 replicates.

Takes as input the .summary field of the output of flyY120DataWorkup.m (or a subset of its rows).
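The resampling test described for vofvTenATimecourseBS.m can be sketched in Python with NumPy. Several details are assumptions: resampling is done with replacement, each group keeps its original sample size, and MADall is fixed at the MAD of the observed pooled data. The sketch takes plain arrays of turn biases rather than the .summary cell array, and its function names are hypothetical.

```python
import numpy as np


def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    return float(np.median(np.abs(x - np.median(x))))


def mad_spread_test(groups, n_reps=10_000, rng=None):
    """groups: list of 1-D arrays of turn biases, one per experimental group.
    Returns (M_observed, p_estimate, k), where k counts replicates whose
    resampled M is >= the observed M."""
    rng = np.random.default_rng() if rng is None else rng
    groups = [np.asarray(g, dtype=float) for g in groups]
    pool = np.concatenate(groups)
    mad_all = mad(pool)

    def stat(gs):
        # M = sum over groups of |MAD_i - MAD_all|
        return sum(abs(mad(g) - mad_all) for g in gs)

    m_obs = stat(groups)
    k = 0
    for _ in range(n_reps):
        # Null hypothesis: every group is a draw from the pooled biases.
        fake = [rng.choice(pool, size=g.size, replace=True) for g in groups]
        if stat(fake) >= m_obs:
            k += 1
    return m_obs, k / n_reps, k
```

The returned k and n_reps are the inputs a likelihood-interval calculation such as bootstrapCI.m would need.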
vofvMadzTest.m Resamples the MAD of a vector of turn bias scores; the second parameter is the number of resamples to perform. Used to generate standard error intervals, such as those in Figure 4d, or to estimate the statistical support for the MAD of an experimental group being greater or less than a null hypothesis value (e.g. the global MAD presented in Figure 1a), by calculating the frequency with which the resampled MAD falls below (or above) the null hypothesis value.
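The bootstrap behind vofvMadzTest.m can be sketched in Python with NumPy, assuming resampling with replacement at the original sample size. The standard error is then the standard deviation of the returned values. The function names are hypothetical.

```python
import numpy as np


def bootstrap_mad(bias, n_resamples, rng=None):
    """Return n_resamples bootstrap MADs of a vector of turn biases."""
    rng = np.random.default_rng() if rng is None else rng
    bias = np.asarray(bias, dtype=float)
    # Each row of samples is one resample drawn with replacement.
    idx = rng.integers(0, bias.size, size=(n_resamples, bias.size))
    samples = bias[idx]
    med = np.median(samples, axis=1, keepdims=True)
    return np.median(np.abs(samples - med), axis=1)


def frac_below(resampled, null_value):
    """Fraction of resampled MADs below a null value such as a global MAD."""
    return float(np.mean(np.asarray(resampled) < null_value))
```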
vofvMadtTest.m Resamples the difference in MAD between two vectors of turn bias scores; the third parameter is the number of resamples to perform. Used to generate p-values for the difference between two experimental group MADs under the null hypothesis that both are drawn from the same distribution. Concatenates both vectors, resamples each group from that pool of turn biases, and outputs the difference between the MADs of the two resampled groups. Can be one- or two-tailed, by considering only differences greater than zero or differences both less than and greater than zero. The frequency of resampled differences greater than or equal to the observed difference in MAD between the two groups is used to estimate the p-value. Used to estimate the p-values in Figure 3a-c.
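The pooled-resampling test described for vofvMadtTest.m can be sketched in Python with NumPy. The following are assumptions: resampling is done with replacement at each group's original size, and the two-tailed variant is implemented by comparing absolute differences. The one-tailed variant tests MAD(a) > MAD(b). Function names are hypothetical.

```python
import numpy as np


def _row_mads(samples):
    """MAD of each row of a 2-D array."""
    med = np.median(samples, axis=1, keepdims=True)
    return np.median(np.abs(samples - med), axis=1)


def mad_diff_test(a, b, n_resamples, two_tailed=False, rng=None):
    """Return (observed MAD(a) - MAD(b), resampling p-value)."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = float(_row_mads(a[None, :])[0] - _row_mads(b[None, :])[0])
    pool = np.concatenate([a, b])
    # Null hypothesis: both groups come from the pooled turn biases.
    fake_a = pool[rng.integers(0, pool.size, size=(n_resamples, a.size))]
    fake_b = pool[rng.integers(0, pool.size, size=(n_resamples, b.size))]
    diffs = _row_mads(fake_a) - _row_mads(fake_b)
    if two_tailed:
        p = float(np.mean(np.abs(diffs) >= abs(observed)))
    else:
        p = float(np.mean(diffs >= observed))
    return observed, p
```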
bootstrapCI.m Calculates "95% likelihood intervals" used to interpret bootstrap-derived p-values. That is, if bootstrap resampling under the null hypothesis yielded k of n observations of a test statistic as extreme as or more extreme than the experimental data, it calculates the lowest value of p such that binomialCDF(n,k,p) > 0.025 and the greatest p such that binomialCDF(n,k,p) < 0.975. When n >> k, we conservatively report the former as our estimate of p based on bootstrapping. Input parameters are k and n.
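The description of bootstrapCI.m leaves the direction of each binomial tail implicit. The Python sketch below uses the standard exact (Clopper-Pearson) construction, which we assume matches the intent, with X ~ Binomial(n, p). The lower bound is the smallest p for which P(X >= k) reaches 0.025. The upper bound is the largest p for which P(X <= k) is still 0.025. Whether bootstrapCI.m computes the bounds in exactly this way is not stated, and the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                          - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def _bisect(f, target, increasing, iters=100):
    """Solve f(p) = target on [0, 1] for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Move lo up while the root still lies to the right of mid.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def likelihood_interval(k, n, alpha=0.05):
    """Exact binomial interval for p given k 'as extreme' of n replicates."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper
```

For example, k = 0 of n = 10,000 replicates gives the interval [0, 1 - 0.025^(1/n)], roughly [0, 3.7e-4].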
vofvBetaResample.m Resamples the best-fit beta distribution of a vector of turn bias data by resampling the individual turn bias scores with replacement and then calculating a beta distribution from the mean and variance of each resample. Used to generate the confidence intervals in the right panels of Figure 3a-c, which are +/- 2 standard deviations of the resampled distribution values.
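The moment-matched beta fit and its bootstrap band, as described for vofvBetaResample.m, can be sketched in Python with NumPy. The following choices are assumptions: the sample variance uses n - 1 (as MATLAB's var does by default), the band is centered on the fit to the full data, and the density is evaluated on a grid strictly inside (0, 1). Function names are hypothetical. A resample with zero variance would make the fit undefined.

```python
import math

import numpy as np


def beta_params(x):
    """Method-of-moments beta parameters from the mean and variance of x."""
    m = float(np.mean(x))
    v = float(np.var(x, ddof=1))  # n - 1 denominator; requires 0 < v < m(1-m)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def beta_pdf(grid, a, b):
    """Beta(a, b) density on points strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(grid) + (b - 1) * np.log1p(-grid))


def beta_band(bias, n_resamples, grid, rng=None):
    """Return (lower, center, upper): fitted density +/- 2 SD of the
    bootstrap densities at each grid point."""
    rng = np.random.default_rng() if rng is None else rng
    bias = np.asarray(bias, dtype=float)
    grid = np.asarray(grid, dtype=float)
    curves = np.empty((n_resamples, grid.size))
    for r in range(n_resamples):
        sample = rng.choice(bias, size=bias.size, replace=True)
        curves[r] = beta_pdf(grid, *beta_params(sample))
    center = beta_pdf(grid, *beta_params(bias))
    half = 2 * curves.std(axis=0)
    return center - half, center, center + half
```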