NeuroScenes - Perception Engineering Group¶
Overview¶
This project is part of my master’s thesis. This page serves as a kind of research diary that reflects the ongoing process of generating and testing ideas. We investigate how certain statistical regularities modulate the N300 ERP. The N300 is a negative deflection observed over fronto-central areas of the scalp, peaking roughly 300 ms after stimulus onset. Currently, the N300 is thought to result from the semantic processing of visual information, for example, classifying and recognizing images, and more importantly the detection of semantic mismatches, i.e., when something doesn’t belong or doesn’t look right.
Predictive coding refers to the idea that the brain generates predictions about incoming sensory data instead of just reacting to it. This way, top-down predictions are compared with incoming sensory data, and only the error is sent back up to update the internal model. In this case, the N300 is the result of that mismatch.

This GIF illustrates brain activity dynamics over time following scene onset. Note the increase in activity around the occipital region (back of the head) at approximately 100 ms, corresponding to the brain’s processing of the visual scene. This highlights the early involvement of the visual cortex in scene perception.
While this visualization is from our pilot study and serves as an illustration rather than a precise representation of the N300, it provides a general overview of the temporal dynamics of brain responses to visual stimuli.
First impressions¶
For the first pilot, we decided to test our ideas by presenting the stimuli in a VR headset, expecting that more immersive images would better reflect the role of the N300 in real-life situations, rather than only in highly controlled environments. We used naturalistic scenes as our stimuli. Half of the images were immersive 360° panoramas that allowed participants to look around. The other half were the same images, cut out to resemble a regular camera photo and covering a 14° × 10° visual field, to emulate previous experiments using 2D screens. These cutouts were presented over a phase-scrambled background, which provided the same low-level features as the immersive images but no useful information. This way, we can more directly compare both conditions.
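Inside the headset the 14° × 10° size is specified directly in visual angle, but for a flat-screen replication that angle has to be converted to pixels. A minimal sketch of the standard conversion, assuming a hypothetical 60 cm viewing distance and 38 px/cm display (both values are made up for illustration):

```python
import math

def visual_angle_to_pixels(angle_deg, viewing_distance_cm, pixels_per_cm):
    """Convert a visual angle (degrees) to on-screen size in pixels."""
    size_cm = 2 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2)
    return size_cm * pixels_per_cm

# Hypothetical setup: 60 cm viewing distance, 38 px/cm display.
width_px = visual_angle_to_pixels(14, 60, 38)   # ~560 px
height_px = visual_angle_to_pixels(10, 60, 38)  # ~399 px
```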

The experiment lasts about 2 hours including setup, the main experiment, and cleanup/debriefing. Participants actively label images for about 1 hour, for a total of 200 trials. The duration of the experiment might lead to a decrease in attention, reducing the quality of our signal. In an attempt to mitigate this problem, we decided to include a one-minute break scene every 25 images. Some studies have shown that virtual nature can increase attention; therefore, our virtual break environment consisted of a snowy forest, where participants could listen to music and control the weather.
For each image, we asked participants to label it (beach/city/other) and to rate it on representativeness (1–5), with no time limit. Previous experiments suggest that the representativeness of naturalistic scenes might drive the N300 effect, such as in Center’s study. However, that study used a simple binary paradigm, where images were classified as good or bad exemplars. We aimed to extend this idea, expecting the least representative images to elicit a more prominent N300 than the most representative ones, following a linear relation across ratings.
Results¶
After running a small pilot with 12 participants, the relation between perceived representativeness and the N300’s amplitude did not match our hypothesis; it was not even monotonic.
Average amplitude over the electrodes F3, Fz and F4 using data from the 2D images
Average amplitude over the electrodes F3, Fz and F4 using data from the immersive 360 images
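The frontal-ROI averages shown in the figures above can be computed from an epoched data array. A sketch of that step, with hypothetical data shapes, channel names, and an approximate N300 time window:

```python
import numpy as np

# Hypothetical epoched data: (n_trials, n_channels, n_times),
# with a matching channel-name list and a time vector in seconds.
rng = np.random.default_rng(0)
epochs = rng.normal(size=(200, 32, 300))   # 200 trials, 32 channels, 300 samples
ch_names = [f"ch{i}" for i in range(32)]
ch_names[2], ch_names[3], ch_names[4] = "F3", "Fz", "F4"
times = np.linspace(-0.2, 1.0, 300)

roi = [ch_names.index(ch) for ch in ("F3", "Fz", "F4")]
window = (times >= 0.25) & (times <= 0.35)  # approximate N300 window

# Mean amplitude: average over trials, ROI channels, and the time window.
mean_amp = epochs[:, roi, :][:, :, window].mean()
```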
This study is still in progress, and we have two hypotheses that might explain our results:
- Previous experiments investigating the N300 used short presentation times of around 200 ms; in contrast, the average response time in our experiment was around 10 s. Participants’ ratings would therefore be driven mostly by later processes, obscuring the relation between the N300 amplitude and the rating.
- The stimuli in the previous experiment may not have captured varying levels of category representativeness, but rather the ease of recognition of an image, i.e., how easy it is to figure out what is in the image.
Current state¶
Given the results from the previous pilot, we opted to drop the VR headset in order to mimic typical experimental setups in ERP research as closely as possible, and we reduced the stimulus presentation time to 200 ms. To test whether it is the representativeness or the ease of recognition of an image that drives the N300 effect, we used two different types of stimuli: phase-scrambled and unaltered. Phase-scrambled images keep the same amplitude (frequency) spectrum while progressively losing their spatial structure, achieved by blending the image’s phase with random noise; the level of scrambling is controlled by an interpolation weight (α).

We used four increasing levels of scrambling (α ≈ 0.32, 0.49, 0.67, 0.84).
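One way to implement this kind of phase scrambling is sketched below. This is an illustration, not necessarily the exact procedure used in the experiment: here the original and random phases are blended by simple linear interpolation, weighted by α.

```python
import numpy as np

def phase_scramble(img, alpha, rng):
    """Blend an image's Fourier phase with random phase.

    alpha = 0 returns the original image; alpha = 1 is fully scrambled.
    The amplitude spectrum (low-level frequency content) is preserved.
    """
    f = np.fft.fft2(img)
    amp, phase = np.abs(f), np.angle(f)
    # Random phase with the conjugate symmetry of a real image's spectrum.
    noise = np.angle(np.fft.fft2(rng.normal(size=img.shape)))
    mixed = (1 - alpha) * phase + alpha * noise
    return np.real(np.fft.ifft2(amp * np.exp(1j * mixed)))

rng = np.random.default_rng(42)
img = rng.random((64, 64))
levels = [phase_scramble(img, a, rng) for a in (0.32, 0.49, 0.67, 0.84)]
```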
In the current experiment, every stimulus has a recognizability level ranging from 1 to 4, with unaltered images assigned a phase-scrambling level of 0, giving us a total of 5 recognizability levels. As in our previous pilot, we ask participants to rate each stimulus image on a scale from 1 to 5 based on how representative it is of a city.
We now aim to directly compare how an image’s representativeness and recognizability modulate the N300. We will use mixed-effects models to compare the goodness of fit of models using our two variables as predictors.
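The planned analysis uses mixed-effects models with per-participant random effects. As a simplified stand-in for that comparison, the sketch below fits two ordinary least-squares models on synthetic data and compares them by AIC; all variable names, effect sizes, and data are hypothetical.

```python
import numpy as np

def ols_aic(y, x):
    """AIC of an ordinary least-squares fit (Gaussian likelihood)."""
    X = np.column_stack([np.ones(len(y)), x])  # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = len(y), X.shape[1]
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (k + 1) - 2 * loglik  # +1 parameter for the noise variance

rng = np.random.default_rng(7)
n = 500
recognizability = rng.integers(1, 6, n).astype(float)    # scrambling-based, 1-5
representativeness = rng.integers(1, 6, n).astype(float)  # self-reported, 1-5

# Synthetic amplitude driven mostly by recognizability (made-up effect sizes).
amplitude = (-0.8 * recognizability + 0.1 * representativeness
             + rng.normal(0, 1, n))

aic_recog = ols_aic(amplitude, recognizability)
aic_repr = ols_aic(amplitude, representativeness)  # worse fit -> higher AIC
```

Lower AIC means a better trade-off between fit and model complexity; the real analysis would make the same comparison with mixed models instead of plain OLS.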
ERP wave using the phase scrambling levels as grouping factor.
ERP wave using the self-reported representativeness as grouping factor.
The current results indicate that it is indeed the recognizability of an image that drives the N300 effect. There might still be a non-linear relation between representativeness and N300 amplitude, which is yet to be tested. To confirm that the topography of the signal is consistent with the N300, we ran a cluster-based permutation test using the highest scrambling level as the target bin. This confirmed that, although the effect is weak, our signal is mostly located at fronto-central electrodes, matching the N300 in a ~250–350 ms time window.
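The actual cluster-based permutation test runs over electrodes and time jointly (e.g., as implemented in MNE-Python). To illustrate the idea, here is a minimal time-only version using sign-flip permutations of subject-wise difference waves; the data, thresholds, and effect are simulated, not our results.

```python
import numpy as np

def cluster_mass(tvals, threshold):
    """Max sum of |t| over contiguous suprathreshold runs (cluster mass)."""
    best = cur = 0.0
    for t in np.abs(tvals):
        cur = cur + t if t > threshold else 0.0
        best = max(best, cur)
    return best

def cluster_perm_test(diff, threshold=2.0, n_perm=1000, seed=0):
    """One-sample cluster permutation test on (n_subjects, n_times) differences.

    Randomly sign-flipping subjects under H0 builds the null distribution
    of the maximum cluster mass (simplified: no +1 correction on p).
    """
    rng = np.random.default_rng(seed)
    n = len(diff)
    tstat = lambda d: d.mean(0) / (d.std(0, ddof=1) / np.sqrt(n))
    observed = cluster_mass(tstat(diff), threshold)
    null = np.empty(n_perm)
    for i in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=n)[:, None]
        null[i] = cluster_mass(tstat(diff * flips), threshold)
    return observed, (null >= observed).mean()

# Simulated example: 12 subjects, an N300-like dip in a 250-350 ms window.
rng = np.random.default_rng(1)
times = np.linspace(0, 0.6, 120)
diff = rng.normal(0, 1, (12, 120))
diff[:, (times >= 0.25) & (times <= 0.35)] -= 1.2
mass, p = cluster_perm_test(diff)
```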

This work is in progress, feel free to ask any questions at hinojosa.garcia@oulu.fi
Unity Project¶
This Unity project facilitates communication with external lab equipment, such as EEG devices, using the Lab Streaming Layer (LSL). The project was created using Unity 2022.3.9f1.
Previously, the project used the Meta hand tracking system and the Quest 3 headset, but it has now transitioned to using the Varjo Aero for enhanced precision and immersive experience.
Project Details¶
- Unity Version: 2022.3.9f1
- Headset: Varjo Aero (previously used Meta Quest 3)
- LSL Integration: Used for communication with EEG and other lab equipment
- Scenes: 360° beach and street scenes displayed to the HMD
- Research Group: Perception Engineering Group, UBICOMP, University of Oulu
- Website: UBICOMP Research Unit
- Repo: GitLab Repo
Project Team¶
The main researchers behind this project are Evan Center and Matti Pouke, both from the University of Oulu.
- Evan Center: Researcher Profile
- Matti Pouke: Researcher Profile
The Unity implementation, including the integration of 360° immersive scenes and Lab Streaming Layer (LSL) communication, was developed by me.