Efficient algorithms for adaptive data collection with very large action spaces
In many science and industry applications, data-driven discovery is limited by the rate of data collection like the time it takes skilled labor to operate a pipette or the cost of expensive reagents or use of experimental apparatuses. When measurement budgets are necessarily small, adaptive data collection that uses previously collected data to inform future data collection in a closed loop can make the difference between inferring a phenomenon or not. While methods like multi-armed bandits have provided great insights into optimal means of collecting data in the last several years, these algorithms require a number of measurements that scales linearly with the total number of possible actions or measurements that can be made, even if discovering just one among possibly many true positives is desired. For example, if many of our 20,000 genes are critical for cell-growth and a measurement corresponds to knocking out just one gene and measuring a noisy phenotype signal, one may expect that we can find a single influential gene with far fewer than 20,000 total measurements. In this talk I will ground this intuition in a theoretical framework and describe several applications where I have applied this perspective and new algorithms including crowd-sourcing preferences, multiple testing with false discovery control, hyperparameter tuning, and crowdfunding.