FAQs

I. Data Challenge

Q: What is the overall goal of the CogPilot Data Challenge?
A: This data challenge focuses on developing AI approaches that can turn multimodal physiological measures into accurate quantitative assessments of cognitive state/cognitive workload. Such technology is applicable to ground training, in-flight training, and any cognitively demanding task where there is potential to optimize the training outcome.

Q: What are the modeling tasks for this data challenge?
A: There are two predictive modeling tasks for you to complete in this data challenge. Given the physiological data measured from a subject during a run:
• Challenge Task 1: Predict the difficulty level of the run (there are 4 difficulty levels, so this is a multiclass classification task)
• Challenge Task 2: Predict the performance error of the run (a regression task)

Q: Is there a targeted deployment platform for a final solution?
A: Not for this data challenge, but solutions that leverage open-source resources are preferred.

Q: What are the options for me to post my questions and get answers?
A: You can always reach the challenge organizers by sending an email to: cogpilot@mit.edu.

You can also sign up for the CogPilot Data Challenge Slack channel and post your questions and comments there. It’s a community for participants to discuss data challenge-related topics.

II. Registration

Q: Who is the target audience for the CogPilot Data Challenge?
A: Anyone interested in the intersection of AI, human performance, physiology, cognition, and flying! All participants are welcome.

Q: When will the CogPilot Data Challenge take place and what is the time commitment of participants?
A: The data challenge will begin in the middle of October and continue until Spring 2023. Registration will remain open for the entire Data Challenge.

Q: Is the CogPilot Data Challenge in-person or virtual?
A: This challenge will be hosted fully virtually.

Q: If I want to participate as a team, does each member register individually?
A: Please have all members register individually and list their team's name.

Q: Is there a limit to team size?
A: No, but a team of 4-8 members is typically recommended.

III. Dataset

Q: How do I access the dataset?
A: The dataset is freely available on PhysioNet: https://doi.org/10.13026/azwa-ge48

Q: How do we learn more about the challenge data, including the data collection set-up, recording, modalities, and preprocessing?
A: Please check out the reference folder that comes with the challenge dataset download. It contains a wealth of information for this data challenge.

Q: Is this data labeled?
A: Yes. The dataset includes a PerfMetrics.csv file that contains, for each run, both the difficulty level label (“Difficulty”) for Challenge Task #1 and the total flight performance error (“Cumulative_Total_Error”) for Challenge Task #2.
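
As an illustrative sketch only: assuming pandas and scikit-learn, and using a random placeholder in place of the per-run physiological features you would actually extract, the two label columns map onto the two tasks like this (the exact location of PerfMetrics.csv within the download is an assumption):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    # Per-run labels; the exact path of this file within the download is an assumption.
    labels = pd.read_csv("PerfMetrics.csv")

    y_difficulty = labels["Difficulty"]            # Challenge Task 1 target: 4 difficulty levels
    y_error = labels["Cumulative_Total_Error"]     # Challenge Task 2 target: performance error

    # Placeholder feature matrix (one row per run). In a real solution, each row would
    # hold features extracted from that run's physiological recordings.
    X = np.random.default_rng(0).normal(size=(len(labels), 16))

    clf = RandomForestClassifier(random_state=0).fit(X, y_difficulty)   # Task 1: classification
    reg = RandomForestRegressor(random_state=0).fit(X, y_error)         # Task 2: regression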

Q: Can you provide a process flow with approximate times that covers what the trainees go through before data recording starts until after data recording stops?
A: Trainees are first outfitted with a suite of wearable sensors. Then, they perform a few practice runs on the lowest difficulty level to familiarize themselves with the scenario, which usually takes about 5-10 mins. Trainees then get ready for the 12 runs of the experiment. The experimenter first loads the scenario (i.e., one of the 4 levels of difficulty). The scenario loads in “paused” mode, so there is no aircraft movement. The experimenter then starts the data recording. Every flight begins with the aircraft in the same position and orientation; the only differences between levels of difficulty are changes in weather (visibility and wind). Data recording starts slightly before the start of the flight: the simulation is paused, the data logging begins, the simulation is then un-paused, and the participant begins the run. One could plot the aircraft airspeed to see that the initial few points have zero velocity, but it jumps to ~100 knots once the simulation is un-paused. After the trainee lands the aircraft (or crashes), the experimenter asks the trainee to take their hands off the controls and then ends the data recording. At this point, the next trial is loaded.
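
As a hedged illustration, here is one way to locate that un-pause moment from the airspeed trace; the file name and column name below are hypothetical, and the actual simulator log schema is documented in the reference folder that comes with the dataset:

    import pandas as pd

    # Hypothetical file and column names for the simulator log of one run.
    sim = pd.read_csv("Subject001_run01_simulator.csv")

    time = sim.iloc[:, 0]              # first column holds the timestamps
    airspeed = sim["airspeed_knots"]   # hypothetical airspeed column (in knots)

    # The flight starts when the simulation is un-paused and the airspeed jumps
    # from ~0 to ~100 knots; take the first sample that is clearly non-zero.
    flight_start_idx = (airspeed > 10).idxmax()
    print("Flight starts at t =", time.loc[flight_start_idx])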

Q: What is included in the “Rest” periods?
A: There is a 5-minute Rest period before and after the 12 runs. During the rest periods, trainees sit quietly with all the sensors recording data. The rest periods may provide information about an individual's physiological baseline.
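
For example, one way to use the rest data (a sketch only; the file and column names are hypothetical) is to z-score each run's signal against the subject's own rest baseline:

    import pandas as pd

    # Hypothetical file and column names for one subject's EDA recordings.
    rest = pd.read_csv("Subject001_rest_pre_EDA.csv")
    run = pd.read_csv("Subject001_run01_EDA.csv")

    # Baseline statistics computed over the 5-minute rest period.
    baseline_mean = rest["eda"].mean()
    baseline_std = rest["eda"].std()

    # Z-score the run signal against the subject's own baseline to reduce
    # between-subject differences.
    run["eda_z"] = (run["eda"] - baseline_mean) / baseline_std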

Q: How long is each run?
A: Runs are approximately 7 to 10 mins long; the exact duration depends on the speed of the flight and errors in movement. Sometimes a novice subject may crash during the virtual flight, in which case the run is shorter. Each of the 12 runs uses one of the four difficulty levels.

Q: Are the recordings synchronized?
A: For each modality, the data and time vectors are aligned, and the linking time point is listed in the first column. The time listed in the first column is universal across modalities. However, the first timestamps of two different modality files (e.g., Subject001_EDAfile vs Subject001_EMGfile) may not be the same; one modality stream might start slightly ahead of the other. Nevertheless, the timestamps are the ground truth for identifying and correcting this slight offset.
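
A minimal sketch of aligning two modality streams on that shared time base (assuming pandas and CSV files whose first column is the timestamp; the exact file names and extensions may differ):

    import pandas as pd

    # Example modality files from the answer above; extensions and columns are assumptions.
    eda = pd.read_csv("Subject001_EDAfile.csv")
    emg = pd.read_csv("Subject001_EMGfile.csv")

    t_eda = eda.columns[0]   # first column = universal timestamp
    t_emg = emg.columns[0]

    # For each EDA sample, attach the EMG sample with the nearest timestamp.
    aligned = pd.merge_asof(
        eda.sort_values(t_eda),
        emg.sort_values(t_emg),
        left_on=t_eda,
        right_on=t_emg,
        direction="nearest",
    )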

Q: For eye tracking, is there an association between the X and Y coordinates and the instrument the pilot is looking at?
A: No. Because of head movement and rotation, the position of the instrument panel in the VR space changes relative to the person's eye-tracking gaze.

Q: Is a "Trial" the same as a "Run" or are there multiple trials per run?
A: We use Trial and Run interchangeably. In the data files, we use "runs".

Q: Should we hold out subjects for evaluation?
A: The entire dataset available for download is for you to develop your AI models. You can partition the data however you like for model development; a typical approach is to use cross-validation. We have an independent dataset, outside of the downloadable dataset, that is used for evaluation.
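
For example, a subject-wise cross-validation keeps all runs from a subject in the same fold, which better mimics evaluation on unseen subjects. A minimal sketch with purely illustrative synthetic data (the real feature matrix, labels, and subject IDs would come from the dataset; the subject/run counts below are placeholders):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold, cross_val_score

    # Illustrative synthetic data: one row per run.
    rng = np.random.default_rng(0)
    n_subjects, runs_per_subject = 10, 12                        # placeholder counts
    n_runs = n_subjects * runs_per_subject
    X = rng.normal(size=(n_runs, 16))                            # per-run features
    y = rng.integers(1, 5, size=n_runs)                          # difficulty labels (Task 1)
    groups = np.repeat(np.arange(n_subjects), runs_per_subject)  # subject ID per run

    # GroupKFold ensures no subject appears in both the training and validation folds.
    scores = cross_val_score(
        RandomForestClassifier(random_state=0), X, y,
        groups=groups, cv=GroupKFold(n_splits=5),
    )
    print("Mean CV accuracy:", scores.mean())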

IV. Model Development

Q: Is there a preference for programming language or technology stack?
A: The starter code has been written in Python. However, you are welcome to use any language that you are comfortable with.

Q: Can you suggest an ML course to help us get up to speed on the key concepts of ML?
A: While there are good ML courses online, here we recommend an ML course taught by one of the MIT co-PIs of the CogPilot project. Course Link: https://tamarabroderick.com/ml.html

V. Model Evaluation

Q: How are submitted entries evaluated?
A: Please refer to the Data Challenge Description Tab for details.

VI. Miscellaneous

Q: How can I participate in the data collection?
A: If you'd like to be a subject, please email cogpilot@mit.edu (data collections occur on the MIT campus in Cambridge, MA).

Q: Can we run this simulator ourselves and is it the same as the publicly available one on Steam?
A: If you have a Valve Index, then you can access X-Plane 11 at https://store.steampowered.com/app/269950/XPlane_11/. However, it will not include the aircraft that was used in the data collection, which was a T-6A Texan II model developed by FliteAdvantage. This aircraft model is custom built based on the Air Force T-6 and is not freely available.